Incremental Parsing with Parallel Multiple Context-Free Grammars

Krasimir Angelov
Chalmers University of Technology
Göteborg, Sweden
krasimir@chalmers.se

Abstract

Parallel Multiple Context-Free Grammar (PMCFG) is an extension of context-free grammar for which the recognition problem is still solvable in polynomial time. We describe a new parsing algorithm that has the advantage of being incremental and of supporting PMCFG directly, rather than the weaker MCFG formalism. The algorithm is also top-down, which allows it to be used for grammar-based word prediction.

1 Introduction

Parallel Multiple Context-Free Grammar (PMCFG) (Seki et al., 1991) is one of the grammar formalisms that have been proposed for the syntax of natural languages. It is an extension of context-free grammar (CFG) in which the right-hand side of a production rule is a tuple of strings instead of a single string. Using tuples, the grammar can model discontinuous constituents, which makes it more powerful than context-free grammar. At the same time, PMCFG has the advantage of being parseable in polynomial time, which makes it attractive from a computational point of view.

A parsing algorithm is incremental if it reads the input one token at a time and calculates all possible consequences of the token before the next token is read. There is substantial evidence that humans process language in an incremental fashion, which makes incremental algorithms attractive from a cognitive point of view.

If the algorithm is also top-down, then it is possible to predict the next word from the sequence of preceding words using the grammar. This can be used, for example, in text-based dialog systems or in text editors for controlled languages, where the user might not be aware of the grammar coverage. In this case the system can suggest the possible continuations.

A restricted form of PMCFG that is still stronger than CFG is Multiple Context-Free Grammar (MCFG).
In Seki and Kato (2008) it has been shown that MCFG is equivalent to string-based Linear Context-Free Rewriting Systems and Finite-Copying Tree Transducers, and that it is stronger than Tree Adjoining Grammars (Joshi and Schabes, 1997). Efficient recognition and parsing algorithms for MCFG have been described in Nakanishi et al. (1997), Ljunglöf (2004) and Burden and Ljunglöf (2005). They can also be used with PMCFG, but the grammar has to be approximated with an overgenerating MCFG, and post-processing is needed to filter out the spurious parse trees.

We present a parsing algorithm that is incremental, top-down and supports PMCFG directly. The algorithm exploits a view of PMCFG as an infinite context-free grammar in which new context-free categories and productions are generated during parsing. It is trivial to turn the algorithm into a statistical one by attaching probabilities to each rule.

In Ljunglöf (2004) it has been shown that the Grammatical Framework (GF) formalism (Ranta, 2004) is equivalent to PMCFG. The algorithm was implemented as part of the GF interpreter and was evaluated with the resource grammar library (Ranta, 2008), which is the largest collection of grammars written in this formalism. The incrementality was used to build a help system which suggests the next possible words to the user.

Section 2 gives a formal definition of PMCFG. Section 3 defines the procedure for "linearization", i.e. the derivation of a string from a syntax tree; the definition is needed for a better understanding of the formal proofs in the paper. The algorithm is introduced informally in section 4, and the formal rules are given in section 5. The implementation details are outlined in section 6, followed by comments on the evaluation in section 7. Section 8 gives a conclusion.
[Proceedings of the 12th Conference of the European Chapter of the ACL, pages 69-76, Athens, Greece, 30 March - 3 April 2009. © 2009 Association for Computational Linguistics]

2 PMCFG definition

Definition 1 A parallel multiple context-free grammar is an 8-tuple G = (N, T, F, P, S, d, r, a) where:

- N is a finite set of categories, and a positive integer d(A), called the dimension, is given for each A ∈ N.
- T is a finite set of terminal symbols, disjoint from N.
- F is a finite set of functions, where the arity a(f) and the dimensions r(f) and d_i(f) (1 ≤ i ≤ a(f)) are given for every f ∈ F. For every positive integer d, (T*)^d denotes the set of all d-tuples of strings over T. Each function f ∈ F is a total mapping from (T*)^d_1(f) × (T*)^d_2(f) × ... × (T*)^d_a(f)(f) to (T*)^r(f), defined as:

      f := (α_1, α_2, ..., α_r(f))

  Here each α_i is a sequence of terminals and ⟨k;l⟩ pairs, where 1 ≤ k ≤ a(f) is called the argument index and 1 ≤ l ≤ d_k(f) is called the constituent index.
- P is a finite set of productions of the form:

      A → f[A_1, A_2, ..., A_a(f)]

  where A ∈ N is called the result category, A_1, A_2, ..., A_a(f) ∈ N are called the argument categories and f ∈ F is the function symbol. For the production to be well formed, the conditions d_i(f) = d(A_i) (1 ≤ i ≤ a(f)) and r(f) = d(A) must hold.
- S is the start category and d(S) = 1.

We use the same definition of PMCFG as Seki and Kato (2008) and Seki et al. (1993), with the minor difference that they use variable names like x_kl while we use ⟨k;l⟩ to refer to the function arguments.

As an example we will use the aⁿbⁿcⁿ language:

    S → c[N]
    N → s[N]
    N → z[]

    c := (⟨1;1⟩ ⟨1;2⟩ ⟨1;3⟩)
    s := (a ⟨1;1⟩, b ⟨1;2⟩, c ⟨1;3⟩)
    z := (ε, ε, ε)

Here the dimensions are d(S) = 1 and d(N) = 3, the arities are a(c) = a(s) = 1 and a(z) = 0, and ε is the empty string.

3 Derivation

The derivation of a string in PMCFG is a two-step process. First we have to build a syntax tree of category S, and then linearize this tree to a string. The definition of a syntax tree is recursive:

Definition 2 (f t_1 ... t_a(f)) is a tree of category A if t_i is a tree of category B_i and there is a production:

    A → f[B_1 ... B_a(f)]

The abstract notation for "t is a tree of category A" is t : A. When a(f) = 0 the tree does not have children, and the node is called a leaf.

The linearization is bottom-up. The functions in the leaves do not have arguments, so the tuples in their definitions already contain constant strings. If the function has arguments, then they have to be linearized and the results combined. Formally this can be defined as a function L applied to the syntax tree:

    L(f t_1 t_2 ... t_a(f)) = (x_1, x_2 ... x_r(f))

where x_i = K(L(t_1), L(t_2) ... L(t_a(f))) α_i and f := (α_1, α_2 ... α_r(f)) ∈ F.

The function L uses a helper function K, which takes the tuple σ of already linearized arguments and a sequence α_i of terminals and ⟨k;l⟩ pairs, and returns a string. The string is produced by simple substitution of each ⟨k;l⟩ with the string for constituent l of argument k:

    K σ (β_1 ⟨k_1;l_1⟩ β_2 ⟨k_2;l_2⟩ ... β_n) = β_1 σ_{k_1 l_1} β_2 σ_{k_2 l_2} ... β_n

where β_i ∈ T*.

The recursion in L terminates when a leaf is reached. In the example aⁿbⁿcⁿ language, the function z does not have arguments and corresponds to the base case n = 0. Every application of s over another tree t : N increases n by one. For example, the syntax tree (s (s z)) will produce the tuple (aa, bb, cc). Finally, the application of c combines all elements of the tuple into a single string, i.e. (c (s (s z))) will produce the string aabbcc.

4 The Idea

Although PMCFG is not context-free, it can be approximated with an overgenerating context-free grammar. The problem with this approach is that the parser produces many spurious parse trees that have to be filtered out. A direct parsing algorithm for PMCFG should avoid this, and a careful look at the difference between PMCFG and CFG gives an idea.

The context-free approximation of aⁿbⁿcⁿ is the language a*b*c* with the grammar:

    S → ABC
    A → ε | aA
    B → ε | bB
    C → ε | cC

The string "aabbcc" is in the language, and it can be derived with the following steps:

    S ⇒ ABC ⇒ aABC ⇒ aaABC ⇒ aaBC ⇒ aabBC ⇒ aabbBC ⇒ aabbC ⇒ aabbcC ⇒ aabbccC ⇒ aabbcc

The grammar is only an approximation because there is no enforcement that we will use an equal number of reductions for A, B and C. This can be guaranteed if we replace B and C with new categories B′ and C′ after the derivation of A:

    B′ → bB′′    B′′ → bB′′′    B′′′ → ε
    C′ → cC′′    C′′ → cC′′′    C′′′ → ε

In this case the only possible derivation from aaB′C′ is aabbcc.

The PMCFG parser presented in this paper works like a context-free parser, except that during parsing it generates fresh categories and productions which are specializations of the original ones. The newly generated productions are always versions of already existing productions in which some category is replaced with a new, more specialized category. The generation of specialized categories prevents the parser from recognizing phrases that are otherwise within the scope of the context-free approximation of the original grammar.

5 Parsing

The algorithm is described as a deductive process in the style of Shieber et al. (1995). The process derives a set of items, where each item is a statement about the grammatical status of some substring in the input. The inference rules are in natural deduction style:

    X_1 ... X_n
    ───────────    ⟨side conditions on X_1, ..., X_n⟩
        Y

where the premises X_i are items and Y is the derived item. We assume that w_1 ... w_n is the input string.

5.1 Deduction Rules

The deduction system deals with three types of items: active, passive and production items. Below, B abbreviates the sequence of argument categories B_1 ... B_a(f), and B{d := N} is the same sequence with the d-th element replaced by N.

Productions In Shieber's deduction systems the grammar is a constant, and the existence of a given production is specified as a side condition. In our case the grammar is incrementally extended at runtime, so the set of productions is part of the deduction set. The productions from the original grammar are axioms and are included in the initial deduction set.

Active Items The active items represent partial parsing results. We write them as:

    [j..k; A → f[B]; l : α • β],  j ≤ k

The interpretation is that there is a function f with a corresponding production:

    A → f[B],   f := (α_1, ..., α_{l−1}, αβ, ..., α_r(f))

such that the tree (f t_1 ... t_a(f)) will produce the substring w_{j+1} ... w_k as a prefix of constituent l, for any sequence of arguments t_i : B_i. The sequence α is the part that produced the substring:

    K(L(t_1), L(t_2) ... L(t_a(f))) α = w_{j+1} ... w_k

and β is the part that is not processed yet.

Passive Items The passive items are of the form:

    [j..k; A; l; N],  j ≤ k

and state that there exists at least one production

    A → f[B],   f := (α_1, α_2, ..., α_r(f))

and a tree (f t_1 ... t_a(f)) : A such that constituent l in the linearization of the tree is equal to w_{j+1} ... w_k. Contrary to the active items, in the passive items the whole constituent is matched:

    K(L(t_1), L(t_2) ... L(t_a(f))) α_l = w_{j+1} ... w_k

Each time we complete an active item, a passive item is created, and at the same time we create a new category N which accumulates all productions for A that produce the w_{j+1} ... w_k substring from constituent l. All trees of category N must produce w_{j+1} ... w_k in constituent l.

There are five inference rules (see figure 1).

    INITIAL PREDICT:
        S → f[B]
        ─────────────────────────    S is the start category, α = rhs(f, 1)
        [0..0; S → f[B]; 1 : • α]

    PREDICT:
        B_d → g[C]    [j..k; A → f[B]; l : α • ⟨d;r⟩ β]
        ───────────────────────────────────────────────    γ = rhs(g, r)
        [k..k; B_d → g[C]; r : • γ]

    SCAN:
        [j..k; A → f[B]; l : α • s β]
        ───────────────────────────────    s = w_{k+1}
        [j..k+1; A → f[B]; l : α s • β]

    COMPLETE:
        [j..k; A → f[B]; l : α •]
        ───────────────────────────    N = (A, l, j, k)
        N → f[B]    [j..k; A; l; N]

    COMBINE:
        [j..u; A → f[B]; l : α • ⟨d;r⟩ β]    [u..k; B_d; r; N]
        ──────────────────────────────────────────────────────
        [j..k; A → f[B{d := N}]; l : α ⟨d;r⟩ • β]

    Figure 1: Deduction Rules

The INITIAL PREDICT rule derives one item spanning the empty range 0..0 for each production with the start category S on the left-hand side. The function rhs(f, l) returns the constituent with index l of function f. In the PREDICT rule, for each active item with the dot before a ⟨d;r⟩ pair and for each production for B_d, a new active item is derived in which the dot is at the beginning of constituent r of g. When the dot is before some terminal s and s is equal to the current terminal w_{k+1}, the SCAN rule derives a new item in which the dot is moved to the next position.

When the dot is at the end of an active item, the item is converted to a passive item by the COMPLETE rule. The category N in the passive item is a fresh category, created for each unique quadruple (A, l, j, k). A new production is derived for N, with the same function and arguments as in the active item.

The item in the premise of COMPLETE was at some point predicted in PREDICT from some other item. The COMBINE rule will later replace the occurrence of A in the original item (the premise of PREDICT) with the specialization N.

The COMBINE rule has two premises: one active item and one passive item. The passive item starts at position u, and the only inference rule that can derive items with different start positions is PREDICT. Also, the passive item must have been predicted from an active item in which the dot is before ⟨d;r⟩, the category of argument number d must have been B_d, and the item must end at u. The active item in the premise of COMBINE is such an item, so it was one of the items used to predict the passive one. This means that we can move the dot after ⟨d;r⟩, and the d-th argument is replaced with its specialization N.

If the string contains another reference to the d-th argument, then the next time it has to be predicted, the PREDICT rule will generate active items only for those productions that were successfully used to parse the previous constituents. If a context-free approximation were used, this would have been equivalent to unification of the redundant subtrees. Instead, this is done at runtime, which also reduces the search space.
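The soundness and completeness arguments that follow appeal repeatedly to the linearization functions L and K from Section 3. As a concrete reference point, here is a minimal Python sketch of both, hardcoding the aⁿbⁿcⁿ grammar of Section 2. The encoding of each ⟨k;l⟩ pair as a 1-based (k, l) tuple is our own choice for illustration, not the representation used in the GF interpreter.

```python
# f := (alpha_1, ..., alpha_r(f)): one list per constituent; elements are
# terminal strings or (argument index, constituent index) pairs, 1-based.
FUNS = {
    "c": [[(1, 1), (1, 2), (1, 3)]],                  # c := (<1;1> <1;2> <1;3>)
    "s": [["a", (1, 1)], ["b", (1, 2)], ["c", (1, 3)]],
    "z": [[], [], []],                                # z := (eps, eps, eps)
}

def K(sigma, alpha):
    """K sigma alpha: substitute each <k;l> with constituent l of argument k."""
    out = []
    for sym in alpha:
        if isinstance(sym, tuple):
            k, l = sym
            out.append(sigma[k - 1][l - 1])
        else:
            out.append(sym)
    return "".join(out)

def L(tree):
    """Linearize a syntax tree (f, t_1, ..., t_a(f)) to a tuple of strings."""
    f, *children = tree
    sigma = [L(t) for t in children]
    return tuple(K(sigma, alpha) for alpha in FUNS[f])

print(L(("s", ("z",))))                 # ('a', 'b', 'c')
print(L(("c", ("s", ("s", ("z",))))))   # ('aabbcc',)
```

Running L on (c (s (s z))) reproduces the string aabbcc from the worked example in Section 3.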
The parsing is successful if we have derived the item [0..n; S; 1; S′], where n is the length of the text, S is the start category and S′ is the newly created category. The parser is incremental because all active items span up to position k, and the only way to move to the next position is the SCAN rule, in which a new symbol from the input is consumed.

5.2 Soundness

The parsing system is sound if every derivable item represents a valid grammatical statement under the interpretation given to each type of item.

The derivation in INITIAL PREDICT and PREDICT is sound because the item is derived from an existing production and the string before the dot is empty, so:

    K σ ε = ε

The rationale for SCAN is that if

    K σ α = w_{j+1} ... w_k   and   s = w_{k+1}

then

    K σ (α s) = w_{j+1} ... w_{k+1}

If the item in the premise is valid, then it is based on an existing production and function, and so is the item in the consequent.

In the COMPLETE rule the dot is at the end of the string. This means that w_{j+1} ... w_k is not just a prefix of constituent l of the linearization, but the full string. This is exactly what is required by the semantics of the passive item. The passive item is derived from a valid active item, so there is at least one production for A. The category N is unique for each quadruple (A, l, j, k), so it uniquely identifies the passive item in which it is placed. There might be many productions that can produce the passive item, but all of them must be able to generate w_{j+1} ... w_k, and they are exactly the productions that are added to N. From all these arguments it follows that COMPLETE is sound.

The COMBINE rule is sound because from the active item in the premise we know that

    K σ α = w_{j+1} ... w_u

for every context σ built from the trees

    t_1 : B_1;  t_2 : B_2;  ...  t_a(f) : B_a(f)

From the passive item we know that every production for N produces w_{u+1} ... w_k in constituent r. From this it follows that

    K σ′ (α ⟨d;r⟩) = w_{j+1} ... w_k

where σ′ is the same as σ except that B_d is replaced with N. Note that the last conclusion would not hold if we were using the original context, because B_d is a more general category and can contain productions that do not derive w_{u+1} ... w_k.

5.3 Completeness

The parsing system is complete if it derives an item for every valid grammatical statement. In our case we have to prove that for every possible parse tree the corresponding items will be derived. The proof of completeness requires the following lemma:

Lemma 1 For every possible syntax tree (f t_1 ... t_a(f)) : A with linearization L(f t_1 ... t_a(f)) = (x_1, x_2 ... x_d(A)), where x_l = w_{j+1} ... w_k, the system will derive the item [j..k; A; l; A′] if the item [j..j; A → f[B]; l : • α_l] was predicted before that. We assume that the function definition is:

    f := (α_1, α_2 ... α_r(f))

The proof is by induction on the depth of the tree. If the tree has only one level, then the function f does not have arguments, and from the linearization definition and from the premise of the lemma it follows that α_l = w_{j+1} ... w_k. From the active item in the lemma, by applying the SCAN rule iteratively and finally the COMPLETE rule, the system will derive the requested item.

If the tree has subtrees, then we assume that the lemma is true for every subtree and prove it for the whole tree. We know that

    K σ α_l = w_{j+1} ... w_k

Since the function K does simple substitution, it is possible for each ⟨d;s⟩ pair in α_l to find a new range j′–k′ in the input string such that the lemma is applicable to the corresponding subtree t_d : B_d. The terminals in α_l will be processed by the SCAN rule. The PREDICT rule will generate the active items required for the subtrees, and the COMBINE rule will consume the produced passive items. Finally the COMPLETE rule will derive the requested item for the whole tree.

From the lemma we can prove the completeness of the parsing system. For every possible tree t : S such that L(t) = (w_1 ... w_n), we have to prove that the item [0..n; S; 1; S′] will be derived. Since the top-level function of the tree must come from a production for S, the INITIAL PREDICT rule will generate the active item in the premise of the lemma. From this and from the assumptions for t it follows that the requested passive item will be derived.

5.4 Complexity

The algorithm is very similar to the Earley (1970) algorithm for context-free grammars. The similarity is even more apparent when the inference rules in this paper are compared to the inference rules for the Earley algorithm presented in Shieber et al. (1995) and Ljunglöf (2004). This suggests that the space and time complexity of the PMCFG parser should be similar to the complexity of the Earley parser, which is O(n²) for space and O(n³) for time. However, we generate new categories and productions at runtime, and this has to be taken into account.

Let P(j) be the maximal number of productions generated from the beginning up to the state where the parser has just consumed terminal number j. P(j) is also an upper limit for the number of categories created, because in the worst case there is only one production for each new category.

The active items have two variables that directly depend on the input size: the start index j and the end index k. If an item starts at position j, then there are (n − j + 1) possible values for k, because j ≤ k ≤ n. The item also contains a production, and there are P(j) possible choices for it. In total there are

    Σ_{j=0}^{n} (n − j + 1) P(j)

possible choices for one active item. The possibilities for all other variables contribute only a constant factor. The function P(j) is monotonic, because the algorithm only adds new productions and never removes any. From this follows the inequality:

    Σ_{j=0}^{n} (n − j + 1) P(j)  ≤  P(n) Σ_{j=0}^{n} (n − j + 1)

which gives the approximation for the upper limit:

    P(n) n(n + 1)/2

The same result applies to the passive items. The only difference is that the passive items contain only a category instead of a full production, but the upper limit for the number of categories is the same. Finally, the upper limit for the total number of active, passive and production items is:

    P(n)(n² + n + 1)

The expression for P(n) is grammar dependent, but we can estimate that it is polynomial, because the set of productions corresponds to the compact representation of all parse trees in the context-free approximation of the grammar. The exponent, however, is grammar dependent. From this we can expect the asymptotic space complexity to be O(n^e), where e is some parameter of the grammar. This is consistent with the results in Nakanishi et al. (1997) and Ljunglöf (2004), where the exponent also depends on the grammar.

The time complexity is proportional to the number of items times the time needed to derive one item. The time is dominated by the most complex rule, which in this algorithm is COMBINE. All variables that depend on the input size appear both in the premises and in the consequent, except u. There are n possible values for u, so the time complexity is O(n^{e+1}).

5.5 Tree Extraction

If the parsing is successful, we need a way to extract the syntax trees. Everything that we need is already in the set of newly generated productions. If the goal item is [0..n; S; 1; S′], then every tree t of category S′ that can be constructed is a syntax tree for the input sentence (see definition 2 in section 3 again).

Note that the grammar can be erasing; i.e., there might be productions like this:

    S → f[B_1, B_2, B_3]    f := (⟨1;1⟩ ⟨3;1⟩)

There are three arguments, but only two of them are used. When the string is parsed, this will generate a new specialized production:

    S′ → f[B_1′, B_2, B_3′]

Here S, B_1 and B_3 are specialized to S′, B_1′ and B_3′, but the B_2 category stays the same. This is correct, because any subtree for the second argument will produce the same result. Despite this, it is sometimes useful to know which parts of the tree were used and which were not.
In the GF interpreter such unused branches are replaced by meta variables. In this case the tree extractor should check whether the category also exists in the original set of categories N of the grammar.

Just as with context-free grammars, the parsing algorithm is polynomial, but the chart can contain an exponential or even infinite number of trees. Despite this, the chart is a compact finite representation of the set of trees.

    Language    Productions    Constituents
    Bulgarian   3516           75296
    English     1165           8290
    German      8078           21201
    Swedish     1496           8793

    Table 1: GF Resource Grammar Library size in number of PMCFG productions and discontinuous constituents

6 Implementation

Every implementation requires a careful design of the data structures in the parser. For efficient access, the set of items is split into four subsets: A, S_j, C and P. A is the agenda, i.e. the set of active items that have to be analyzed. S_j contains the items for which the dot is before an argument reference and which span up to position j. C is the set of possible continuations, i.e. a set of items for which the dot is just after a terminal. P is the set of productions. In addition, the set F is used internally for the generation of fresh categories. The sets C, S_j and F are used as association maps. They contain associations of the form k ↦ v, where k is the key and v is the value. All maps except F can contain more than one value for one and the same key.

The pseudocode of the implementation is given in figure 2. There are two procedures, Init and Compute. Init computes the initial values of S, P and A. The initial agenda A is the set of all items that can be predicted from the start category S (the INITIAL PREDICT rule). Compute consumes items from the current agenda and applies the SCAN, PREDICT, COMBINE or COMPLETE rule. The case statement matches the current item against the patterns of the rules and selects the proper rule.
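To make the Init/Compute loop concrete, here is a compact, runnable Python sketch of the deduction rules, hardcoding the aⁿbⁿcⁿ grammar from Section 2. It follows the runtime item representation [j; A → f[B]; l; p], but the concrete encoding (tuples for items, one dictionary per position for S_j) is our own simplification; it omits parts of the real implementation such as tree extraction and any special handling of ε-cycles.

```python
from collections import defaultdict

# Toy PMCFG for a^n b^n c^n (Section 2). Constituent symbols are either
# terminal strings or (argument, constituent) pairs, both 1-based.
FUNS = {
    "c": [[(1, 1), (1, 2), (1, 3)]],
    "s": [["a", (1, 1)], ["b", (1, 2)], ["c", (1, 3)]],
    "z": [[], [], []],
}
PRODS = {"S": {("c", ("N",))}, "N": {("s", ("N",)), ("z", ())}}

def parse(tokens, start="S"):
    prods = defaultdict(set)                      # P: grows at runtime
    for cat in PRODS:
        prods[cat] |= PRODS[cat]
    S = defaultdict(lambda: defaultdict(list))    # S[j][(cat, r)] -> [(item, d)]

    def compute(k, agenda):
        C = defaultdict(list)                     # continuations: terminal -> items
        F = {}                                    # (j, A, l) -> fresh category N
        seen, agenda = set(), list(agenda)
        while agenda:
            item = agenda.pop()
            if item in seen:
                continue
            seen.add(item)
            j, A, f, args, l, p = item            # [j; A -> f[args]; l; p]
            rhs = FUNS[f][l - 1]
            if p < len(rhs) and isinstance(rhs[p], str):     # SCAN (deferred to C)
                C[rhs[p]].append((j, A, f, args, l, p + 1))
            elif p < len(rhs):                               # dot before <d;r>
                d, r = rhs[p]
                Bd = args[d - 1]
                if (item, d) not in S[k][(Bd, r)]:
                    S[k][(Bd, r)].append((item, d))
                    for g, gargs in prods[Bd]:               # PREDICT
                        agenda.append((k, Bd, g, gargs, r, 0))
                if (k, Bd, r) in F:                          # COMBINE (empty-span passive)
                    N = F[(k, Bd, r)]
                    agenda.append((j, A, f, args[:d-1] + (N,) + args[d:], l, p + 1))
            else:                                            # dot at the end
                if (j, A, l) in F:
                    N = F[(j, A, l)]
                    for cat, r in list(S[k]):                # PREDICT new production
                        if cat == N:
                            agenda.append((k, N, f, args, r, 0))
                else:
                    N = (A, l, j, k)                         # COMPLETE: fresh category
                    F[(j, A, l)] = N
                    for wit, d in list(S[j].get((A, l), ())):  # COMBINE waiting items
                        j2, A2, f2, args2, l2, p2 = wit
                        agenda.append((j2, A2, f2,
                                       args2[:d-1] + (N,) + args2[d:], l2, p2 + 1))
                prods[N].add((f, args))
        return C, F

    C, F = compute(0, [(0, start, f, a, 1, 0) for f, a in prods[start]])  # INITIAL PREDICT
    for k, w in enumerate(tokens, start=1):
        if w not in C:
            return False
        C, F = compute(k, C[w])
    return (0, start, 1) in F                     # goal passive item [0..n; S; 1; S']

print(parse("aabbcc"), parse("aabbc"), parse("acb"))  # True False False
```

The keys of the continuation map C at each step are exactly the tokens the grammar allows next, which is what the word-prediction help system reads off.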
The PREDICT and COMBINE rules have two premises, so they are used in two places in the pseudocode. In both cases one of the premises is related to the current item, and a loop is needed to find the items matching the other premise.

The passive items are not independent entities but just the combination of a key and a value in the set F. Only the start position of every item is kept, because the end position of the interesting passive items is always the current position, and the active items are either in the agenda, if they end at the current position, or in the S_j set, if they end at position j. The active items also keep only the dot position in the constituent, because the constituent definition can be retrieved from the grammar. For this reason the runtime representation of the items is [j; A → f[B]; l; p], where j is the start position of the item and p is the dot position inside the constituent.

The Compute function returns the updated S and P sets and the set of possible continuations C. The set of continuations is a map indexed by a terminal, and the values are active items. The parser computes the set of continuations at each step, and if the current terminal is one of the keys, the set of values for it is taken as the agenda for the next step.

    Figure 3: Parser performance in milliseconds per token (x-axis: number of tokens; one curve each for German, Bulgarian, Swedish and English)

7 Evaluation

The algorithm was evaluated with four languages from the GF resource grammar library (Ranta, 2008): Bulgarian, English, German and Swedish. These grammars are not primarily intended for parsing but serve as a resource from which smaller, domain-dependent grammars are derived for every application. Despite this, the resource grammar library is a good benchmark for the parser, because these are the biggest GF grammars. The compiler converts a grammar written in the high-level GF language to a low-level PMCFG grammar which the parser can use directly.
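The word-prediction behaviour evaluated here can be illustrated without any chart machinery: for the toy aⁿbⁿcⁿ language of Section 2, the set of grammatically possible next tokens after a prefix can be brute-forced. This is only an illustration of the interface; the real parser reads the keys of the continuation map C instead of enumerating the language.

```python
def continuations(prefix, max_n=8):
    """Next tokens that allow prefix to grow into some a^n b^n c^n word.

    Brute force over the language up to max_n; purely illustrative,
    not the chart-based mechanism of the paper.
    """
    nxt = set()
    for n in range(max_n + 1):
        word = "a" * n + "b" * n + "c" * n
        if word.startswith(prefix) and len(word) > len(prefix):
            nxt.add(word[len(prefix)])
    return nxt

print(sorted(continuations("a")))       # ['a', 'b']
print(sorted(continuations("aab")))     # ['b']
print(sorted(continuations("aabbcc")))  # []  (a complete word; nothing can follow)
```

This is the behaviour a help system exposes to the user: after "aab" the only grammatical continuation is "b", so only that token is suggested.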
The sizes of the grammars, in terms of the number of productions and the number of unique discontinuous constituents, are given in table 1. The number of constituents roughly corresponds to the number of productions in the context-free approximation of the grammar.

The parser performance in terms of milliseconds per token is shown in figure 3. In the evaluation, 34272 sentences were parsed, and the average time for parsing a given number of tokens is drawn in the chart. As can be seen, although the theoretical complexity is polynomial, the real-time performance for practically interesting grammars tends to be linear.

    procedure Init() {
        k = 0
        S_i = ∅, for every i
        P = the set of productions in the grammar
        A = ∅
        forall (S → f[B]) ∈ P do                     // INITIAL PREDICT
            A = A + [0; S → f[B]; 1; 0]
        return (S, P, A)
    }

    procedure Compute(k, (S, P, A)) {
        C = ∅
        F = ∅
        while A ≠ ∅ do {
            let x ∈ A, x = [j; A → f[B]; l; p]
            A = A − x
            case the dot in x is {
                before s ∈ T:
                    C = C + (s ↦ [j; A → f[B]; l; p + 1])          // SCAN
                before ⟨d;r⟩:
                    if ((B_d, r) ↦ (x, d)) ∉ S_k then {
                        S_k = S_k + ((B_d, r) ↦ (x, d))
                        forall (B_d → g[C]) ∈ P do                 // PREDICT
                            A = A + [k; B_d → g[C]; r; 0]
                    }
                    forall ((k; B_d, r) ↦ N) ∈ F do                // COMBINE
                        A = A + [j; A → f[B{d := N}]; l; p + 1]
                at the end:
                    if ∃N. ((j, A, l) ↦ N) ∈ F then {
                        forall ((N, r) ↦ (x′, d′)) ∈ S_k do        // PREDICT
                            A = A + [k; N → f[B]; r; 0]
                    } else {
                        generate fresh N                            // COMPLETE
                        F = F + ((j, A, l) ↦ N)
                        forall ((A, l) ↦ ([j′; A′ → f′[B′]; l′; p′], d)) ∈ S_j do   // COMBINE
                            A = A + [j′; A′ → f′[B′{d := N}]; l′; p′ + 1]
                    }
                    P = P + (N → f[B])
            }
        }
        return (S, P, C)
    }

    Figure 2: Pseudocode of the parser implementation

8 Conclusion

The algorithm has proven useful in the GF system. It accomplished the initial goal of providing suggestions in text-based dialog systems and in editors for controlled languages. Additionally, the algorithm has properties that were not envisaged in the beginning. It works with PMCFG directly, rather than by approximation with MCFG or some other weaker formalism.
Since Linear Context-Free Rewriting Systems, Finite-Copying Tree Transducers and Tree Adjoining Grammars can be converted to PMCFG, the algorithm presented in this paper can be used with the converted grammars. This makes it attractive in applications where some of these other formalisms are already in use. The approach of representing a context-dependent grammar as an infinite context-free grammar might be applicable to other formalisms as well.

References

Håkan Burden and Peter Ljunglöf. 2005. Parsing linear context-free rewriting systems. In Proceedings of the Ninth International Workshop on Parsing Technologies (IWPT), pages 11-17, October.

Jay Earley. 1970. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94-102.

Aravind Joshi and Yves Schabes. 1997. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages. Vol 3: Beyond Words, chapter 2, pages 69-123. Springer-Verlag, Berlin/Heidelberg/New York.

Peter Ljunglöf. 2004. Expressivity and Complexity of the Grammatical Framework. Ph.D. thesis, Department of Computer Science, Gothenburg University and Chalmers University of Technology, November.

Ryuichi Nakanishi, Keita Takada, and Hiroyuki Seki. 1997. An Efficient Recognition Algorithm for Multiple Context-Free Languages. In Fifth Meeting on Mathematics of Language. The Association for Mathematics of Language, August.

Aarne Ranta. 2004. Grammatical Framework: A Type-Theoretical Grammar Formalism. Journal of Functional Programming, 14(2):145-189, March.

Aarne Ranta. 2008. GF Resource Grammar Library. digitalgrammars.com/gf/lib/.

Hiroyuki Seki and Yuki Kato. 2008. On the Generative Power of Multiple Context-Free Grammars and Macro Grammars. IEICE Transactions on Information and Systems, E91-D(2):209-221.

Hiroyuki Seki, Takashi Matsumura, Mamoru Fujii, and Tadao Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science, 88(2):191-229, October.

Hiroyuki Seki, Ryuichi Nakanishi, Yuichi Kaji, Sachiko Ando, and Tadao Kasami. 1993. Parallel Multiple Context-Free Grammars, Finite-State Translation Systems, and Polynomial-Time Recognizable Subclasses of Lexical-Functional Grammars. In 31st Annual Meeting of the Association for Computational Linguistics, pages 130-140. Ohio State University, Association for Computational Linguistics, June.

Stuart M. Shieber, Yves Schabes, and Fernando C. N. Pereira. 1995. Principles and Implementation of Deductive Parsing. Journal of Logic Programming, 24(1&2):3-36.