### Generalized context-free grammar

Generalized Context-free Grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context free composition functions to rewrite rules. Head grammar (and its weak equivalents) is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.

## Description

A GCFG consists of two components: a set of composition functions that combine string tuples, and a set of rewrite rules. The composition functions all have the form $f\left(\langle x_1, ..., x_m \rangle, \langle y_1, ..., y_n \rangle, ...\right) = \gamma$, where $\gamma$ is either a single string tuple, or some use of a (potentially different) composition function which reduces to a string tuple. Rewrite rules look like $X \to f\left(Y, Z, ...\right)$, where $Y$, $Z$, ... are string tuples or non-terminal symbols.

The rewrite semantics of GCFGs is fairly straight forward. An occurrence of a non-terminal symbol is rewritten using rewrite rules as in a context-free grammar, eventually yielding just compositions (composition functions applied to string tuples or other compositions). The composition functions are then applied, reducing successively reducing the tuples to a single tuple.

## Example

A simple translation of a context-free grammar into a GCFG can be performed in the following fashion. Given the grammar in (1), which generates the palindrome language , where $w^R$ is the string reverse of $w$, we can define the composition function conc as in (2a) and the rewrite rules as in (2b).

1. $S \to \epsilon ~|~ aSa ~|~ bSb$
1. $conc\left(\langle x \rangle, \langle y \rangle, \langle z \rangle\right) = \langle xyz \rangle$
2. $S \to conc\left(\langle \epsilon \rangle, \langle \epsilon \rangle, \langle \epsilon \rangle\right) ~|~ conc\left(\langle a \rangle, S, \langle a \rangle\right) ~|~ conc\left(\langle b \rangle, S, \langle b \rangle\right)$

The CF production of abbbba is

S

aSa

abSba

abbSbba

abbbba

and the corresponding GCFG production is

$S \to conc\left(\langle a \rangle, S, \langle a \rangle\right)$

$conc\left(\langle a \rangle, conc\left(\langle b \rangle, S, \langle b \rangle\right), \langle a \rangle\right)$

$conc\left(\langle a \rangle, conc\left(\langle b \rangle, conc\left(\langle b \rangle, S, \langle b \rangle\right), \langle b \rangle\right), \langle a \rangle\right)$

$conc\left(\langle a \rangle, conc\left(\langle b \rangle, conc\left(\langle b \rangle, conc\left(\langle \epsilon \rangle, \langle \epsilon \rangle, \langle \epsilon \rangle\right), \langle b \rangle\right), \langle b \rangle\right), \langle a \rangle\right)$

$conc\left(\langle a \rangle, conc\left(\langle b \rangle, conc\left(\langle b \rangle, \langle \epsilon \rangle, \langle b \rangle\right), \langle b \rangle\right), \langle a \rangle\right)$

$conc\left(\langle a \rangle, conc\left(\langle b \rangle, \langle bb \rangle, \langle b \rangle\right), \langle a \rangle\right)$

$conc\left(\langle a \rangle, \langle bbbb \rangle, \langle a \rangle\right)$

$\langle abbbba \rangle$

## Linear Context-free Rewriting Systems (LCFRSs)

Weir (1988) describes two properties of composition functions, linearity and regularity. A function defined as $f\left(x_1, ..., x_n\right) = ...$ is linear if and only if each variable appears at most once on either side of the =, making $f\left(x\right) = g\left(x, y\right)$ linear but not $f\left(x\right) = g\left(x, x\right)$. A function defined as $f\left(x_1, ..., x_n\right) = ...$ is regular if the left hand side and right hand side have exactly the same variables, making $f\left(x, y\right) = g\left(y, x\right)$ regular but not $f\left(x\right) = g\left(x, y\right)$ or $f\left(x, y\right) = g\left(x\right)$.

A grammar in which all composition functions are both linear and regular is called a Linear Context-free Rewriting System (LCFRS), a subset of the GCFGs with strictly less computational power than the GCFGs as a whole, which is weakly equivalent to multicomponent Tree adjoining grammars. Head grammar is an example of an LCFRS that is strictly less powerful than the class of LCFRSs as a whole.