Generalized context-free grammar

Generalized Context-free Grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context free composition functions to rewrite rules.[1] Head grammar (and its weak equivalents) is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.

Description

A GCFG consists of two components: a set of composition functions that combine string tuples, and a set of rewrite rules. The composition functions all have the form f(\langle x_1, ..., x_m \rangle, \langle y_1, ..., y_n \rangle, ...) = \gamma, where \gamma is either a single string tuple, or some use of a (potentially different) composition function which reduces to a string tuple. Rewrite rules look like X \to f(Y, Z, ...), where Y, Z, ... are string tuples or non-terminal symbols.

The rewrite semantics of GCFGs is fairly straight forward. An occurrence of a non-terminal symbol is rewritten using rewrite rules as in a context-free grammar, eventually yielding just compositions (composition functions applied to string tuples or other compositions). The composition functions are then applied, reducing successively reducing the tuples to a single tuple.

Example

A simple translation of a context-free grammar into a GCFG can be performed in the following fashion. Given the grammar in (1), which generates the palindrome language \{ ww^R : w \in \{a, b\}^{*} \}, where w^R is the string reverse of w, we can define the composition function conc as in (2a) and the rewrite rules as in (2b).

  1. S \to \epsilon ~|~ aSa ~|~ bSb
    1. conc(\langle x \rangle, \langle y \rangle, \langle z \rangle) = \langle xyz \rangle
    2. S \to conc(\langle \epsilon \rangle, \langle \epsilon \rangle, \langle \epsilon \rangle) ~|~ conc(\langle a \rangle, S, \langle a \rangle) ~|~ conc(\langle b \rangle, S, \langle b \rangle)

The CF production of abbbba is

S

aSa

abSba

abbSbba

abbbba

and the corresponding GCFG production is

S \to conc(\langle a \rangle, S, \langle a \rangle)

conc(\langle a \rangle, conc(\langle b \rangle, S, \langle b \rangle), \langle a \rangle)

conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, S, \langle b \rangle), \langle b \rangle), \langle a \rangle)

conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, conc(\langle \epsilon \rangle, \langle \epsilon \rangle, \langle \epsilon \rangle), \langle b \rangle), \langle b \rangle), \langle a \rangle)

conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, \langle \epsilon \rangle, \langle b \rangle), \langle b \rangle), \langle a \rangle)

conc(\langle a \rangle, conc(\langle b \rangle, \langle bb \rangle, \langle b \rangle), \langle a \rangle)

conc(\langle a \rangle, \langle bbbb \rangle, \langle a \rangle)

\langle abbbba \rangle

Linear Context-free Rewriting Systems (LCFRSs)

Weir (1988)[1] describes two properties of composition functions, linearity and regularity. A function defined as f(x_1, ..., x_n) = ... is linear if and only if each variable appears at most once on either side of the =, making f(x) = g(x, y) linear but not f(x) = g(x, x). A function defined as f(x_1, ..., x_n) = ... is regular if the left hand side and right hand side have exactly the same variables, making f(x, y) = g(y, x) regular but not f(x) = g(x, y) or f(x, y) = g(x).

A grammar in which all composition functions are both linear and regular is called a Linear Context-free Rewriting System (LCFRS), a subset of the GCFGs with strictly less computational power than the GCFGs as a whole, which is weakly equivalent to multicomponent Tree adjoining grammars. Head grammar is an example of an LCFRS that is strictly less powerful than the class of LCFRSs as a whole.

References

This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and USA.gov, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for USA.gov and content contributors is made possible from the U.S. Congress, E-Government Act of 2002.
 
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
 
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a non-profit organization.