World Library  
Flag as Inappropriate
Email this Article

Range concatenation grammars

Article Id: WHEBN0026256535
Reproduction Date:

Title: Range concatenation grammars  
Author: World Heritage Encyclopedia
Language: English
Subject: Aperiodic finite state automaton, Embedded pushdown automaton, Deterministic context-free language, Context-sensitive language, Pattern language (formal languages)
Collection:
Publisher: World Heritage Encyclopedia
Publication
Date:
 

Range concatenation grammars

Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier [1] in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and German word order scrambling, which are outside the bounds of the Mildly context-sensitive languages.[2]

From a theoretical point of view, any language that can be parsed in polynomial time belongs to the subset of RCG called positive range concatenation grammars, and reciprocally.[3]

Though intended as a variant on Groenink's Literal movement grammars, RCGs treat the grammatical process more as a proof than as a production. Whereas LMGs produce a terminal string from a start predicate, RCGs aim to reduce a start predicate (which predicates of a terminal string) to the empty string, which constitutes a proof of the terminal strings membership in the language.

Description

Formal definition

A Positive Range Concatenation Grammar (PRCG) is a tuple G = (N,~T,~V,~S,~P), where:

  • N, T and V are disjoint finite sets of (respectively) predicate names, terminal symbols and variable names. Each predicate name has an associated arity given by the function dim: N \rightarrow \mathbb{N}\setminus\{0\}.
  • S \in N is the start predicate name and verify dim(S)=1.
  • P is a finite set of clauses of the form \psi_0 \rightarrow \psi_1 \ldots \psi_m, where the \psi_i are predicates of the form A_i(\alpha_1, \ldots, \alpha_{dim(A_i)}) with A_i \in N and \alpha_i \in (T \cup V)^\star.

A Negative Range Concatenation Grammar (NRCG) is defined like a PRCG, but with the addition that some predicates occurring in the right-hand side of a clause can have the form \overline{A_i(\alpha_1, \ldots, \alpha_{dim(A_i)})}. Such predicates are called negative predicates.

A Range Concatenation Grammar is a positive or a negative one. Although PRCGs are technically NRCGs, the terms are used to highlight the absence (PRCG) or presence (NRCG) of negative predicates.

A range in a word w \in T^\star is a couple \langle l, r \rangle_w, with 0 \leq l \leq r \leq n, where n is the length of w. Two ranges \langle l_1, r_1 \rangle_w and \langle l_2, r_2 \rangle_w can be concatenated iff r_1 = l_2, and we then have: \langle l_1, r_1 \rangle_w \cdot \langle l_2, r_2 \rangle_w = \langle l_1, r_2 \rangle_w.

For a word w = w_1w_2\ldots w_n, with w_i \in T, the dotted notation for ranges is: \langle l, r \rangle_w = w_1\ldots w_{l-1} \bullet w_l\ldots w_{r-1} \bullet w_r\ldots w_n.

Recognition of strings

Like LMGs, RCG clauses have the general schema A(x_1, ..., x_n) \to \alpha, where in an RCG, \alpha is either the empty string or a string of predicates. The arguments x_i consist of strings of terminal symbols and/or variable symbols, which pattern match against actual argument values like in LMG. Adjacent variables constitute a family of matches against partitions, so that the argument xy, with two variables, matches the literal string ab in three different ways: x = \epsilon,\ y = ab;\ x = a,\ y = b;\ x = ab,\ y = \epsilon.

Predicate terms come in two forms, positive (which produce the empty string on success), and negative (which produce the empty string on failure/if the positive term does not produce the empty string). Negative terms are denoted the same as positive terms, with an overbar, as in \overline{A(x_1, ..., x_n)}.

The rewrite semantics for RCGs is rather simple, identical to the corresponding semantics of LMGs. Given a predicate string A(\alpha_1, ..., \alpha_n), where the symbols \alpha_i are terminal strings, if there is a rule A(x_1, ..., x_n) \to \beta in the grammar that the predicate string matches, the predicate string is replaced by \beta, substituting for the matched variables in each x_i.

For example, given the rule A(x, ayb) \to B(axb, y), where x and y are variable symbols and a and b are terminal symbols, the predicate string A(a, abb) can be rewritten as B(aab, b), because A(a, abb) matches A(x, ayb) when x = a,\ y = b. Similarly, if there were a rule A(x, ayb) \to A(x, x)\ A(y, y), A(a, abb) could be rewritten as A(a, a)\ A(b, b).

A proof/recognition of a string \alpha is done by showing that S(\alpha) produces the empty string. For the individual rewrite steps, when multiple alternative variable matches are possible, any rewrite which could lead the whole proof to succeed is considered. Thus, if there is at least one way to produce the empty string from the initial string S(\alpha), the proof is considered a success, regardless of how many other ways to fail exist.

Example

RCGs are capable of recognizing the non-linear index language \{ www : w \in \{a,b\}^{*} \} as follows:

Letting x, y, and z be variable symbols:


S(xyz) \to A(x, y, z)

A(ax, ay, az) \to A(x, y, z)

A(bx, by, bz) \to A(x, y, z)

A(\epsilon, \epsilon, \epsilon) \to \epsilon


The proof for abbabbabb is then

S(abbabbabb) \Rightarrow A(abb, abb, abb) \Rightarrow A(bb, bb, bb) \Rightarrow A(b, b, b) \Rightarrow A(\epsilon, \epsilon, \epsilon) \Rightarrow \epsilon

Or, using the more correct dotted notation for ranges:

S(\bullet{}abbabbabb\bullet{}) \Rightarrow A(\bullet{}abb\bullet{}abbabb, abb\bullet{}abb\bullet{}abb, abbabb\bullet{}abb\bullet{}) \Rightarrow A(a\bullet{}bb\bullet{}abbabb, abba\bullet{}bb\bullet{}abb, abbabba\bullet{}bb\bullet{}) \Rightarrow A(ab\bullet{}b\bullet{}abbabb, abbab\bullet{}b\bullet{}abb, abbabbab\bullet{}b\bullet{}) \Rightarrow A(\epsilon, \epsilon, \epsilon) \Rightarrow \epsilon

References

  1. ^ Boullier, Pierre (Jan 1998). Proposal for a Natural Language Processing Syntactic Backbone (Technical report) 3342. INRIA Rocquencourt (France). 
  2. ^ Pierre Boullier (1999). "Chinese Numbers, MIX, Scrambling, and Range Concatenation Grammars". Proc. EACL. pp. 53–60. 
  3. ^ Laura Kallmeyer (2010). Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 37.   citing http://mjn.host.cs.st-andrews.ac.uk/publications/2001d.pdf
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and USA.gov, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for USA.gov and content contributors is made possible from the U.S. Congress, E-Government Act of 2002.
 
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
 
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a non-profit organization.
 


Copyright © World Library Foundation. All rights reserved. eBooks from Project Gutenberg are sponsored by the World Library Foundation,
a 501c(4) Member's Support Non-Profit Organization, and is NOT affiliated with any governmental agency or department.