Highly constrained unification grammars

Daniel Feinstein
Department of Computer Science
University of Haifa
31905 Haifa, Israel
daniel@cs.haifa.ac.il

Shuly Wintner
Department of Computer Science
University of Haifa
31905 Haifa, Israel
shuly@cs.haifa.ac.il

Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 1089–1096, Sydney, July 2006. © 2006 Association for Computational Linguistics

Abstract
Unification grammars are widely accepted as an expressive means for describing the structure of natural languages. In general, the recognition problem is undecidable for unification grammars. Even for restricted variants of the formalism, such as off-line parsable grammars, the problem is computationally hard. We present two natural constraints on unification grammars which limit their expressivity. We first show that non-reentrant unification grammars generate exactly the class of context-free languages. We then relax the constraint and show that one-reentrant unification grammars generate exactly the class of tree-adjoining languages. We thus relate the commonly used and linguistically motivated formalism of unification grammars to more restricted, computationally tractable classes of languages.

1 Introduction
Unification grammars (UG) (Shieber, 1986; Shieber, 1992; Carpenter, 1992) originated as an extension of context-free grammars, the basic idea being to augment the context-free rules with non-context-free annotations (feature structures) in order to express additional information. They can describe phonological, morphological, syntactic and semantic properties of languages simultaneously and are thus linguistically suitable for modeling natural languages. Several formulations of unification grammars have been proposed, and they are used extensively by computational linguists to describe the structure of a variety of natural languages.

Unification grammars are Turing equivalent: determining whether a given string is generated by a given grammar is as hard as deciding whether a Turing machine halts on the empty input (Johnson, 1988). Therefore, the recognition problem for unification grammars is undecidable in the general case. To ensure its decidability, several constraints on unification grammars, commonly known as the off-line parsability (OLP) constraints, were suggested, such that the recognition problem is decidable for off-line parsable grammars (Jaeger et al., 2005). The idea behind all the OLP definitions is to rule out grammars which license trees in which an unbounded amount of material is generated without expanding the frontier word. This can happen due to two kinds of rules: ε-rules (whose bodies are empty) and unit rules (whose bodies consist of a single element). However, even for unification grammars with no such rules the recognition problem is NP-hard (Barton et al., 1987).

In order for a grammar formalism to make predictions about the structure of natural language, its generative capacity must be constrained. It is now generally accepted that context-free grammars (CFGs) lack the generative power needed for this purpose (Savitch et al., 1987), due to natural language constructions such as reduplication, multiple agreement and crossed agreement. Several linguistic formalisms have been proposed as capable of modeling these phenomena, including Linear Indexed Grammars (LIG) (Gazdar, 1988), Head Grammars (Pollard, 1984), Tree Adjoining Grammars (TAG) (Joshi, 2003) and Combinatory Categorial Grammars (Steedman, 2000). In a seminal work, Vijay-Shanker and Weir (1994) prove that all four formalisms are weakly equivalent. They all generate the class of mildly context-sensitive languages (MCSL), all members of which have recognition algorithms with time complexity $O(n^6)$ (Vijay-Shanker and Weir, 1993; Satta, 1994).¹ As a result of the weak equivalence of four independently developed (and linguistically motivated) extensions of CFG, the class MCSL is considered to be linguistically meaningful, a natural class of languages for characterizing natural languages.

Several authors tried to approximate unification grammars by means of context-free grammars (Rayner et al., 2001; Kiefer and Krieger, 2004) and even finite-state grammars (Pereira and Wright, 1997; Johnson, 1998), but we are not aware of any work which relates unification grammars to the class MCSL.

The main objective of this work is to define constraints on UGs which naturally limit their generative capacity. We define two natural and easily testable syntactic constraints on UGs which ensure that grammars satisfying them generate the context-free and the mildly context-sensitive languages, respectively. The contribution of this result is twofold:
• From a theoretical point of view, constraining unification grammars to generate exactly the class MCSL results in a grammatical formalism which is, on the one hand, powerful enough for linguists to express linguistic generalizations in, and on the other hand cognitively adequate, in the sense that its generative capacity is constrained;
• Practically, such a constraint can provide efficient recognition algorithms for the limited class of unification grammars.

We define some preliminary notions in section 2 and then show a constrained version of UG which generates the class CFL of context-free languages in section 3. Section 4 presents the main result, namely a restricted version of UG and a mapping of its grammars to LIG, establishing the proposition that such grammars generate exactly the class MCSL.
For lack of space, we favor intuitive explanation over rigorous proofs; the full details can be found in Feinstein (2004).

¹ The term mildly context-sensitive was coined by Joshi (1985), in reference to a less formally defined class of languages. Strictly speaking, what we call MCSL here is also known as the class of tree-adjoining languages.

2 Preliminary notions
A CFG is a four-tuple $G_{cf} = \langle V_N, V_t, R_{cf}, S\rangle$, where $V_t$ is a set of terminals, $V_N$ is a set of non-terminals, including the start symbol $S$, and $R_{cf}$ is a set of productions, assumed to be in a normal form where each rule has either (zero or more) non-terminals or a single terminal in its body, and where the start symbol never occurs in the right-hand side of rules. The set of all such context-free grammars is denoted CFGS.

In a linear indexed grammar (LIG),² strings are derived from non-terminals with an associated stack, denoted $A[l_1 \ldots l_n]$, where $A$ is a non-terminal, each $l_i$ is a stack symbol, and $l_1$ is the top of the stack. Since stacks can grow to be of unbounded size during a derivation, some way of partially specifying unbounded stacks in LIG productions is needed. We use $A[l_1 \ldots l_n\,..]$ to denote the non-terminal $A$ associated with any stack whose top $n$ symbols are $l_1, l_2, \ldots, l_n$. The set of all non-terminals in $V_N$, associated with stacks whose symbols come from $V_s$, is denoted $V_N[V_s]$.

Definition 1. A Linear Indexed Grammar is a five-tuple $G_{li} = \langle V_N, V_t, V_s, R_{li}, S\rangle$, where $V_t$, $V_N$ and $S$ are as above, $V_s$ is a finite set of indices (stack symbols) and $R_{li}$ is a finite set of productions in one of the following two forms:
• fixed stack: $N_i[p_1 \ldots p_n] \to \alpha$
• unbounded stack: $N_i[p_1 \ldots p_n\,..] \to \alpha$ or $N_i[p_1 \ldots p_n\,..] \to \alpha\, N_j[q_1 \ldots q_m\,..]\, \beta$
where $N_i, N_j \in V_N$, $p_1, \ldots, p_n, q_1, \ldots, q_m \in V_s$, $n, m \ge 0$ and $\alpha, \beta \in (V_t \cup V_N[V_s])^*$.

A crucial characteristic of LIG is that the stack of the head of a rule can be copied to at most one element in the body of the rule. If more than one copy were allowed, the expressive power would grow beyond MCSL.

Definition 2. Given a LIG $\langle V_N, V_t, V_s, R_{li}, S\rangle$, the derivation relation $\Rightarrow_{li}$ is defined as follows: for all $\Psi_1, \Psi_2 \in (V_N[V_s] \cup V_t)^*$ and $\eta \in V_s^*$,
• If $N_i[p_1 \ldots p_n] \to \alpha \in R_{li}$ then $\Psi_1\, N_i[p_1 \ldots p_n]\, \Psi_2 \Rightarrow_{li} \Psi_1\, \alpha\, \Psi_2$
• If $N_i[p_1 \ldots p_n\,..] \to \alpha \in R_{li}$ then $\Psi_1\, N_i[p_1 \ldots p_n\, \eta]\, \Psi_2 \Rightarrow_{li} \Psi_1\, \alpha\, \Psi_2$
• If $N_i[p_1 \ldots p_n\,..] \to \alpha\, N_j[q_1 \ldots q_m\,..]\, \beta \in R_{li}$ then $\Psi_1\, N_i[p_1 \ldots p_n\, \eta]\, \Psi_2 \Rightarrow_{li} \Psi_1\, \alpha\, N_j[q_1 \ldots q_m\, \eta]\, \beta\, \Psi_2$

The language generated by $G_{li}$ is $L(G_{li}) = \{w \in V_t^* \mid S[\,] \Rightarrow^*_{li} w\}$, where $\Rightarrow^*_{li}$ is the reflexive, transitive closure of $\Rightarrow_{li}$.

² The definition is based on Vijay-Shanker and Weir (1994).

Unification grammars are defined over feature structures (FSs), which are directed, connected, rooted, labeled graphs, usually depicted as attribute-value matrices (AVM). A feature structure $A$ can be characterized by its set of paths, $\Pi_A$, an assignment of atomic values to the ends of some paths, $\Theta_A(\cdot)$, and a reentrancy relation $\leftrightarrow_A$ relating paths which lead to the same node. A sequence of feature structures, where some nodes may be shared by more than one element, is a multi-rooted structure (MRS).

Definition 3. Unification grammars are defined over a signature consisting of a finite set ATOMS of atoms, a finite set FEATS of features and a finite set WORDS of words. A unification grammar is a tuple $G_u = \langle R_u, A_s, \mathcal{L}\rangle$, where $R_u$ is a finite set of rules, each of which is an MRS of length $n \ge 1$; $\mathcal{L}$ is a lexicon, which associates with every word $w \in \mathrm{WORDS}$ a finite set of feature structures, $\mathcal{L}(w)$; and $A_s$ is a feature structure, the start symbol.

Definition 4. A unification grammar $\langle R_u, A_s, \mathcal{L}\rangle$ over the signature $\langle \mathrm{ATOMS}, \mathrm{FEATS}, \mathrm{WORDS}\rangle$ is non-reentrant iff for every rule $r_u \in R_u$, $r_u$ is non-reentrant. It is one-reentrant iff for every rule $r_u \in R_u$, $r_u$ includes at most one reentrancy, between the head of the rule and some element of the body. Let $UG_{nr}$ and $UG_{1r}$ be the sets of all non-reentrant and one-reentrant unification grammars, respectively.
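Definition 4 can be tested mechanically on the graph view of a rule. The following Python sketch is our own illustration under stated assumptions, not code from the paper: a rule is assumed to be given as a multi-rooted graph, with `roots[0]` the head node, `roots[1:]` the body elements, and `edges` mapping each node to its outgoing feature arcs; atomic labels are omitted because they play no role in reentrancy. The names `classify` and `count_paths` are hypothetical.

```python
from collections import Counter


def count_paths(node, target, edges, on_stack=frozenset()):
    """Number of distinct feature paths from node to target (cycles are cut off)."""
    if node == target:
        return 1
    if node in on_stack:
        return 0
    on_stack = on_stack | {node}
    return sum(count_paths(n, target, edges, on_stack)
               for n in edges.get(node, {}).values())


def classify(roots, edges):
    """Classify a rule as 'non-reentrant', 'one-reentrant' or 'other'.

    roots[0] is the head of the rule, roots[1:] are its body elements;
    edges maps a node to a dict {feature: target node}.
    """
    indeg = Counter(roots)  # being an element of the MRS counts as one way in
    for succ in edges.values():
        indeg.update(succ.values())
    shared = [v for v, d in indeg.items() if d > 1]
    if not shared:
        return "non-reentrant"
    if len(shared) > 1 or indeg[shared[0]] != 2:
        return "other"
    # Which elements do the (exactly two) paths into the shared node leave from?
    origins = sorted(i for i, r in enumerate(roots)
                     for _ in range(count_paths(r, shared[0], edges)))
    if len(origins) == 2 and origins[0] == 0 and origins[1] >= 1:
        return "one-reentrant"  # one path from the head, one from a body element
    return "other"


# Head and body element share the node reached by their "stack" feature.
print(classify([0, 1], {0: {"cat": 10, "stack": 20}, 1: {"stack": 20}}))
```

Here `"other"` covers every rule that is neither non-reentrant nor one-reentrant, for instance a reentrancy internal to the head or one shared by two body elements.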
Informally, a rule is non-reentrant if (on an AVM view) no reentrancy tags occur in it. When the rule is viewed as a (multi-rooted) graph, it is non-reentrant if the in-degree of all nodes is at most 1. A rule is one-reentrant if (on an AVM view) at most one reentrancy tag occurs in it, exactly twice: once in the head of the rule and once in an element of its body. When the rule is viewed as a (multi-rooted) graph, it is one-reentrant if the in-degree of all nodes is at most 1, with the exception of one node whose in-degree can be 2, provided that the only two distinct paths that lead to this node leave from the roots of the head of the rule and an element of the body.

FSs and MRSs are partially ordered by subsumption, denoted $\sqsubseteq$. The least upper bound with respect to subsumption is unification, denoted $\sqcup$. Unification is partial; when $A \sqcup B$ is undefined we say that the unification fails and denote it as $A \sqcup B = \top$. Unification is lifted to MRSs: given two MRSs $\sigma$ and $\rho$, it is possible to unify the $i$-th element of $\sigma$ with the $j$-th element of $\rho$. This operation, called unification in context and denoted $(\sigma, i) \sqcup (\rho, j)$, yields two modified variants of $\sigma$ and $\rho$: $(\sigma', \rho')$.

In unification grammars, forms are MRSs. A form $A = \langle A_1, \ldots, A_k\rangle$ immediately derives another form $B = \langle B_1, \ldots, B_m\rangle$ (denoted $A \Rightarrow_u B$) iff there exists a rule $r_u \in R_u$ of length $n$ that licenses the derivation. The head of $r_u$ is matched against some element $A_i$ in $A$ using unification in context: $(A, i) \sqcup (r_u, 0) = (A', r')$. If the unification does not fail, $B$ is obtained by replacing the $i$-th element of $A'$ with the body of $r'$. The reflexive transitive closure of $\Rightarrow_u$ is denoted by $\Rightarrow^*_u$.

Definition 5. The language of a unification grammar $G_u$ is $L(G_u) = \{w_1 \cdots w_n \in \mathrm{WORDS}^* \mid A_s \Rightarrow^*_u \langle A_1, \ldots, A_n\rangle\}$, where $A_i \in \mathcal{L}(w_i)$ for $1 \le i \le n$.

3 Context-free unification grammars
We define a constraint on unification grammars which ensures that grammars satisfying it generate the class CFL.
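The mapping ug2cfg given in Definition 6 of this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; it assumes (our choice) that non-reentrant FSs are nested dicts from features to values with atoms as string leaves, so that unification of such tree-shaped structures reduces to a recursive compatibility check. The names `freeze`, `unifiable`, `ug2cfg`, `generate` and `S_CF`, and the toy grammar, are hypothetical.

```python
S_CF = ("S_cf",)  # hypothetical fresh start symbol of the CFG


def freeze(fs):
    """Hashable canonical form of a non-reentrant FS, used as a CFG non-terminal."""
    if isinstance(fs, str):
        return ("@", fs)  # tag atoms so they cannot clash with words
    return tuple(sorted((f, freeze(v)) for f, v in fs.items()))


def unifiable(a, b):
    """A ⊔ B ≠ ⊤ for tree-shaped (non-reentrant) FSs."""
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if isinstance(a, str) or isinstance(b, str):
        other = b if isinstance(a, str) else a
        return not other  # an atom unifies only with an empty, featureless node
    return all(unifiable(a[f], b[f]) for f in a.keys() & b.keys())


def ug2cfg(rules, start, lexicon):
    """rules: list of (head, body) pairs; lexicon: word -> list of FSs."""
    cf = set()
    for head, body in rules:
        if unifiable(start, head):                    # rule type 2
            cf.add((S_CF, tuple(freeze(b) for b in body)))
        for a in body:
            for word, entries in lexicon.items():     # rule type 1
                if any(unifiable(a, e) for e in entries):
                    cf.add((freeze(a), (word,)))
            for head2, body2 in rules:                # rule type 3
                if unifiable(a, head2):
                    cf.add((freeze(a), tuple(freeze(b) for b in body2)))
    return cf


def generate(cf, max_len):
    """Word strings of length <= max_len; assumes no empty bodies, so pruning by length is safe."""
    results, seen, todo = set(), set(), [(S_CF,)]
    while todo:
        form = todo.pop()
        i = next((j for j, s in enumerate(form) if isinstance(s, tuple)), None)
        if i is None:  # only words left
            results.add(" ".join(form))
            continue
        for lhs, rhs in cf:
            if lhs == form[i]:  # leftmost non-terminal
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    todo.append(new)
    return results


# Toy non-reentrant grammar (hypothetical): S -> NP VP, VP -> V NP.
rules = [({"cat": "s"}, [{"cat": "np"}, {"cat": "vp"}]),
         ({"cat": "vp"}, [{"cat": "v"}, {"cat": "np"}])]
lexicon = {"John": [{"cat": "np", "num": "sg"}], "Mary": [{"cat": "np"}],
           "loves": [{"cat": "v"}]}
cf = ug2cfg(rules, {"cat": "s"}, lexicon)
print(sorted(generate(cf, 3)))
```

Because the non-terminals of the resulting CFG are frozen copies of the grammar's own feature structures, the number of CF rules is bounded by the number of unifiable pairs of body elements with heads and lexical entries, in line with the polynomial size of ug2cfg(G_u).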
The constraint disallows any reentrancies in the rules of the grammar. When rules are non-reentrant, applying a rule implies that an exact copy of the body of the rule is inserted into the generated (sentential) form, without affecting neighboring elements of the form the rule is applied to. The only difference between rule application in $UG_{nr}$ and the analogous operation in CFGS is that the former requires unification whereas the latter only calls for an identity check. This small difference does not affect the generative power of the formalisms, since unification can be pre-compiled in this simple case.

The trivial direction is to map a CFG to a non-reentrant unification grammar, since every CFG is, trivially, such a grammar (where terminal and non-terminal symbols are viewed as atomic feature structures). For the inverse direction, we define a mapping from $UG_{nr}$ to CFGS. The non-terminals of the CFG in the image of the mapping are the set of all feature structures defined in the source UG.

Definition 6. Let $\mathit{ug2cfg} : UG_{nr} \to \mathrm{CFGS}$ be a mapping of $UG_{nr}$ to CFGS, such that if $G_u = \langle R_u, A_s, \mathcal{L}\rangle$ is over the signature $\langle \mathrm{ATOMS}, \mathrm{FEATS}, \mathrm{WORDS}\rangle$ then $\mathit{ug2cfg}(G_u) = \langle V_N, V_t, R_{cf}, S_{cf}\rangle$, where:
• $V_N = \{A_i \mid A_0 \to A_1 \ldots A_n \in R_u,\ i \ge 0\} \cup \{A \mid A \in \mathcal{L}(a),\ a \in \mathrm{WORDS}\} \cup \{A_s\}$. $V_N$ is the set of all the feature structures occurring in any of the rules or the lexicon of $G_u$.
• $S_{cf} = A_s$
• $V_t = \mathrm{WORDS}$
• $R_{cf}$ consists of the following rules:
1. Let $A_0 \to A_1 \ldots A_n \in R_u$ and $B \in \mathcal{L}(b)$. If for some $i$, $1 \le i \le n$, $A_i \sqcup B \neq \top$, then $A_i \to b \in R_{cf}$.
2. If $A_0 \to A_1 \ldots A_n \in R_u$ and $A_s \sqcup A_0 \neq \top$, then $S_{cf} \to A_1 \ldots A_n \in R_{cf}$.
3. Let $r_1 = A_0 \to A_1 \ldots A_n$ and $r_2 = B_0 \to B_1 \ldots B_m$, where $r_1, r_2 \in R_u$. If for some $i$, $1 \le i \le n$, $A_i \sqcup B_0 \neq \top$, then $A_i \to B_1 \ldots B_m \in R_{cf}$.

The size of $\mathit{ug2cfg}(G_u)$ is polynomial in the size of $G_u$. By induction on the lengths of the derivation sequences, we prove the following theorem:

Theorem 1. If $G_u = \langle R_u, A_s, \mathcal{L}\rangle$ is a non-reentrant unification grammar and $G_{cf} = \mathit{ug2cfg}(G_u)$, then $L(G_{cf}) = L(G_u)$.

Corollary 2. Non-reentrant unification grammars are weakly equivalent to CFGS.

4 Mildly context-sensitive UG
In this section we show that one-reentrant unification grammars generate exactly the class MCSL. In such grammars each rule can have at most one reentrancy, reflecting the LIG situation where stacks can be copied to exactly one daughter in each rule.

4.1 Mapping LIG to $UG_{1r}$
In order to simulate a given LIG with a unification grammar, a dedicated signature is defined based on the parameters of the LIG.

Definition 7. Given a LIG $\langle V_N, V_t, V_s, R_{li}, S\rangle$, let $\Sigma$ be $\langle \mathrm{ATOMS}, \mathrm{FEATS}, \mathrm{WORDS}\rangle$, where $\mathrm{ATOMS} = V_N \cup V_s \cup \{\mathit{elist}\}$, $\mathrm{FEATS} = \{\mathrm{HEAD}, \mathrm{TAIL}\}$, and $\mathrm{WORDS} = V_t$.

We use $\Sigma$ throughout this section as the signature over which UGs are defined. We use FSs over $\Sigma$ to represent and simulate LIG symbols. In particular, FSs will encode lists in the natural way, hence the features HEAD and TAIL. For the sake of brevity, we use standard list notation when FSs encode lists. LIG symbols are mapped to FSs thus:

Definition 8. Let toFs be a mapping of LIG symbols to feature structures, such that:
1. If $t \in V_t$ then $\mathit{toFs}(t) = t$.
2. If $N \in V_N$ and $p_i \in V_s$, $1 \le i \le n$, then $\mathit{toFs}(N[p_1, \ldots, p_n]) = \langle N, p_1, \ldots, p_n\rangle$.
The mapping toFs is extended to sequences of symbols by setting $\mathit{toFs}(\alpha\beta) = \mathit{toFs}(\alpha)\,\mathit{toFs}(\beta)$. Note that toFs is one-to-one.

When FSs that are images of LIG symbols are concerned, unification is reduced to identity:

Lemma 3. Let $X_1, X_2 \in V_N[V_s] \cup V_t$. If $\mathit{toFs}(X_1) \sqcup \mathit{toFs}(X_2) \neq \top$ then $\mathit{toFs}(X_1) = \mathit{toFs}(X_2)$.

When a feature structure which is represented as an unbounded list (a list that is not terminated by elist) is unifiable with an image of a LIG symbol, the former is a prefix of the latter.

Lemma 4. Let $C = \langle p_1, \ldots, p_n \mid \boxed{i}\,\rangle$ be a non-reentrant feature structure (a list whose tail $\boxed{i}$ is left unconstrained), where $p_1, \ldots, p_n \in V_s$, and let $X \in V_N[V_s] \cup V_t$. Then $C \sqcup \mathit{toFs}(X) \neq \top$ iff $\mathit{toFs}(X) = \langle p_1, \ldots, p_n\, \eta\rangle$ for some $\eta \in V_s^*$.

To simulate LIGs with UGs we represent each symbol in the LIG as a feature structure, encoding the stacks of LIG non-terminals as lists. Rules that propagate stacks (from mother to daughter) are simulated by means of reentrancy in the UG.

Definition 9. Let lig2ug be a mapping of LIGS to $UG_{1r}$, such that if $G_{li} = \langle V_N, V_t, V_s, R_{li}, S\rangle$ and $G_u = \langle R_u, A_s, \mathcal{L}\rangle = \mathit{lig2ug}(G_{li})$, then $G_u$ is over the signature $\Sigma$ (Definition 7), $A_s = \mathit{toFs}(S[\,])$, for all $t \in V_t$, $\mathcal{L}(t) = \{\mathit{toFs}(t)\}$, and $R_u$ is defined by:
• A LIG rule of the form $X_0 \to \alpha$ is mapped to the unification rule $\mathit{toFs}(X_0) \to \mathit{toFs}(\alpha)$.
• A LIG rule of the form $N_i[p_1, \ldots, p_n\,..] \to \alpha\, N_j[q_1, \ldots, q_m\,..]\, \beta$ is mapped to the unification rule $\langle N_i, p_1, \ldots, p_n \mid \boxed{1}\,\rangle \to \mathit{toFs}(\alpha)\ \langle N_j, q_1, \ldots, q_m \mid \boxed{1}\,\rangle\ \mathit{toFs}(\beta)$.
Evidently, $\mathit{lig2ug}(G_{li}) \in UG_{1r}$ for any LIG $G_{li}$.

Theorem 5. If $G_{li} = \langle V_N, V_t, V_s, R_{li}, S\rangle$ is a LIG and $G_u = \mathit{lig2ug}(G_{li})$, then $L(G_u) = L(G_{li})$.

4.2 Mapping $UG_{1r}$ to LIG
We are now interested in the reverse direction, namely mapping UGs to LIG. Of course, since UGs are more expressive than LIGs, only a subset of the former can be correctly simulated by the latter. The differences between the two formalisms can be summarized along three dimensions:
• The basic elements: UG manipulates feature structures, and rules (and forms) are MRSs; whereas LIG manipulates terminals and non-terminals with stacks of elements, and rules (and forms) are sequences of such symbols.
• Rule application: in UG a rule is applied by unification in context of the rule and a sentential form, both of which are MRSs, whereas in LIG, the head of a rule and the selected element of a sentential form must have the same non-terminal symbol and consistent stacks.
• Propagation of information in rules: in UG information is shared through reentrancies, whereas in LIG information is propagated by copying the stack from the head of the rule to one element of its body.
We show that one-reentrant UGs can all be correctly mapped to LIG.
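To make the LIG side of this comparison concrete, the derivation relation of Definition 2 can be sketched in Python. The encoding is our assumption: a terminal is a string, a non-terminal with its stack is a pair `(name, stack)` with the top of the stack first, and each body item of a rule either carries a fixed stack or inherits the unspecified rest of the head's stack (at most one item should do so). The toy grammar for $\{a^n b^n c^n \mid n \ge 0\}$, a standard non-context-free language, is our own example; `Rule`, `apply` and `language` are hypothetical names. Leftmost derivations suffice because each step rewrites a single symbol independently of its neighbors.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    head: str                # N_i
    prefix: Tuple[str, ...]  # p_1 ... p_n (top of stack first)
    unbounded: bool          # True for N_i[p_1 ... p_n ..]
    body: tuple              # terminals (str) or (N_j, q_1...q_m, inherits_rest)


def apply(form, i, rule):
    """One step of =>_li (Definition 2) on form[i]; None if the rule does not apply."""
    sym = form[i]
    if isinstance(sym, str):
        return None
    nt, stack = sym
    k = len(rule.prefix)
    if nt != rule.head or stack[:k] != rule.prefix:
        return None
    if not rule.unbounded and len(stack) != k:
        return None  # fixed-stack rules require the whole stack to match
    eta = stack[k:]  # the unspecified rest of the stack
    new = []
    for item in rule.body:
        if isinstance(item, str):
            new.append(item)
        else:
            nt_j, q, inherits = item
            new.append((nt_j, q + eta if inherits else q))
    return form[:i] + tuple(new) + form[i + 1:]


def language(rules, start, max_len, max_forms=100_000):
    """Strings of length <= max_len reachable by leftmost derivations from start[]."""
    results, seen, todo = set(), set(), [((start, ()),)]
    while todo and len(seen) < max_forms:
        form = todo.pop()
        i = next((j for j, s in enumerate(form) if not isinstance(s, str)), None)
        if i is None:
            results.add("".join(form))
            continue
        for rule in rules:
            new = apply(form, i, rule)
            if new is None or sum(isinstance(s, str) for s in new) > max_len:
                continue
            if new not in seen:
                seen.add(new)
                todo.append(new)
    return results


# Toy LIG (hypothetical) for {a^n b^n c^n | n >= 0}:
#   S[..] -> a S[i ..] c    S[..] -> T[..]    T[i ..] -> T[..] b    T[] -> (empty)
RULES = [
    Rule("S", (), True, ("a", ("S", ("i",), True), "c")),
    Rule("S", (), True, (("T", (), True),)),
    Rule("T", ("i",), True, (("T", (), True), "b")),
    Rule("T", (), False, ()),
]
print(sorted(language(RULES, "S", 9), key=len))
```

Note how `apply` returns `None` unless the non-terminals agree and the head's stack prefix is a prefix of the selected symbol's stack (or, for a fixed-stack rule, equal to it), which is the LIG notion of rule application contrasted above with unification in context.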
For the rest of this section we fix a signature $\langle \mathrm{ATOMS}, \mathrm{FEATS}, \mathrm{WORDS}\rangle$ over which UGs are defined. Let NRFSS be the set of all non-reentrant FSs over this signature. One-reentrant UGs induce highly constrained (sentential) forms: in such forms, there are no reentrancies whatsoever, neither between distinct elements nor within a single element. Hence all the FSs in forms induced by a one-reentrant UG are non-reentrant.

Definition 10. Let $A$ be a feature structure with no reentrancies. The height of $A$, denoted $|A|$, is the length of the longest path in $A$. This is well-defined since non-reentrant feature structures are acyclic. Let $G_u = \langle R_u, A_s, \mathcal{L}\rangle \in UG_{1r}$ be a one-reentrant unification grammar. The maximum height of the grammar, $\mathit{maxHt}(G_u)$, is the height of the highest feature structure in the grammar. This is well-defined since all the feature structures of one-reentrant grammars are non-reentrant.

The following lemma indicates an important property of one-reentrant UGs. Informally, in any FS that is an element of a sentential form induced by such grammars, if two paths are long (specifically, longer than the maximum height of the grammar), they must have a long common prefix.

Lemma 6. Let $G_u = \langle R_u, A_s, \mathcal{L}\rangle \in UG_{1r}$ be a one-reentrant unification grammar. Let $A$ be an element of a sentential form induced by $G_u$. If $\pi \cdot \langle F_j\rangle \cdot \pi_1,\ \pi \cdot \langle F_k\rangle \cdot \pi_2 \in \Pi_A$, where $F_j, F_k \in \mathrm{FEATS}$, $j \neq k$ and $|\pi_1| \le |\pi_2|$, then $|\pi_1| \le \mathit{maxHt}(G_u)$.

Lemma 6 facilitates a view of all the FSs induced by such a grammar as (unboundedly long) lists of elements drawn from a finite, predefined set. The set consists of all features in FEATS and all the non-reentrant feature structures whose height is limited by the maximal height of the unification grammar. Note that even with one-reentrant UGs, feature structures can be unboundedly deep.
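Definition 10 and the shape property of Lemma 6 can be illustrated with a short Python sketch. It assumes (our encoding) that a non-reentrant FS is a nested dict with atom leaves, and it encodes lists with HEAD and TAIL, terminated by `elist`, as in Definition 7. The names `height`, `max_height`, `satisfies_lemma6` and `as_list` are hypothetical; `satisfies_lemma6` only checks the property stated by the lemma for a given bound h, it does not prove the lemma.

```python
def height(fs):
    """|A| (Definition 10): length of the longest path; atoms and empty nodes have height 0."""
    if isinstance(fs, str) or not fs:
        return 0
    return 1 + max(height(v) for v in fs.values())


def max_height(rules, start, lexicon):
    """maxHt(G_u): height of the highest FS in the rules, the start symbol or the lexicon.

    rules: list of MRSs, each a list of FSs (head first); lexicon: word -> list of FSs.
    """
    fss = [start] + [fs for rule in rules for fs in rule]
    fss += [fs for entries in lexicon.values() for fs in entries]
    return max(height(fs) for fs in fss)


def satisfies_lemma6(fs, h):
    """True iff at every branching node all branches except the deepest have height <= h."""
    if isinstance(fs, str) or not fs:
        return True
    depths = sorted((height(v) for v in fs.values()), reverse=True)
    if len(depths) > 1 and depths[1] > h:
        return False
    return all(satisfies_lemma6(v, h) for v in fs.values())


def as_list(items):
    """Encode a list with HEAD/TAIL features, terminated by the atom elist."""
    fs = "elist"
    for x in reversed(items):
        fs = {"HEAD": x, "TAIL": fs}
    return fs


deep = as_list(list("abcdef"))
bushy = {"F": as_list(["a", "b"]), "G": as_list(["a", "b"])}
print(height(deep), satisfies_lemma6(deep, 0), satisfies_lemma6(bushy, 1))
```

A long list passes the check with h = 0 however deep it is, because its depth is concentrated on a single TAIL core, whereas a structure with two deep branches fails for any h smaller than the height of its second-deepest branch.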
What Lemma 6 establishes is that if a feature structure induced by a one-reentrant unification grammar is deep, then it can be represented as a single "core" path which is long, and all the sub-structures which "hang" from this core are depth-bounded. We use this property to encode such feature structures as cords.

Definition 11. Let $\chi : \mathrm{NRFSS} \times \mathrm{PATHS} \to (\mathrm{FEATS} \cup \mathrm{NRFSS})^*$ be a mapping such that if $A$ is a non-reentrant FS and $\pi = \langle F_1, \ldots, F_n\rangle \in \Pi_A$, then the cord $\chi(A, \pi)$ is $\langle A_1, F_1, \ldots, A_n, F_n, A_{n+1}\rangle$, where for $1 \le i \le n+1$, the $A_i$ are non-reentrant FSs such that:
• $\Pi_{A_i} = \{\langle G\rangle \cdot \pi' \mid \langle F_1, \ldots, F_{i-1}, G\rangle \cdot \pi' \in \Pi_A,\ G \neq F_i \text{ if } i \le n\} \cup \{\epsilon\}$
• $\Theta_{A_i}(\pi') = \Theta_A(\langle F_1, \ldots, F_{i-1}\rangle \cdot \pi')$ (if it is defined).
We also define $\mathit{last}(\chi(A, \pi)) = A_{n+1}$. The height of a cord is defined as $|\chi(A, \pi)| = \max_{1 \le i \le n+1} |A_i|$. For each cord $\chi(A, \pi)$ we refer to $A$ as the base feature structure and to $\pi$ as the base path. The length of a cord is the length of the base path. The function $\chi$ is one-to-one: given $\chi(A, \pi)$, both $A$ and $\pi$ are uniquely determined.

Lemma 7. Let $G_u$ be a one-reentrant unification grammar and let $A$ be an element of a sentential form induced by $G_u$. Then there is a path $\pi \in \Pi_A$ such that $|\chi(A, \pi)| \le \mathit{maxHt}(G_u)$.

Lemma 7 implies that every FS induced by a one-reentrant grammar (all such FSs are non-reentrant) can be represented as a height-limited cord. This mapping resolves the first difference between LIG and UG, by providing a representation of the basic elements. We use cords as the stack contents of LIG non-terminals: cords can be unboundedly long, but so can LIG stacks; the crucial point is that cords are height-limited, implying that they can be represented using a finite number of elements.

We now show how to simulate, in LIG, the unification in context of a rule and a sentential form. The first step is to have exactly one non-terminal symbol (in addition to the start symbol); when all non-terminal symbols are identical, only the content of the stack has to be taken into account.
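Definition 11 translates directly into code. The sketch below assumes the same nested-dict encoding of non-reentrant FSs (our assumption) and represents a cord as a Python list alternating FSs and features; `cord`, `uncord` and `cord_height` are hypothetical names. `uncord` rebuilds the base FS, illustrating why $\chi$ is one-to-one.

```python
def height(fs):
    """Length of the longest path in a non-reentrant FS (nested dicts, atom leaves)."""
    if isinstance(fs, str) or not fs:
        return 0
    return 1 + max(height(v) for v in fs.values())


def cord(fs, path):
    """chi(A, pi) = [A_1, F_1, ..., A_n, F_n, A_{n+1}] (Definition 11)."""
    out, node = [], fs
    for feat in path:
        if isinstance(node, str) or feat not in node:
            raise ValueError(f"{path} is not a path of the FS")
        out.append({f: v for f, v in node.items() if f != feat})  # A_i: side material
        out.append(feat)                                           # F_i: core edge
        node = node[feat]
    out.append(node)  # A_{n+1}: everything below the end of the base path
    return out


def uncord(c):
    """Rebuild the base FS from a cord, showing that chi is one-to-one."""
    node = c[-1]
    for i in range(len(c) - 3, -1, -2):
        node = {**c[i], c[i + 1]: node}
    return node


def cord_height(c):
    """|chi(A, pi)|: the maximal height of the FSs A_1 ... A_{n+1}."""
    return max(height(a) for a in c[0::2])


A = {"HEAD": "a", "TAIL": {"HEAD": "b", "TAIL": {"HEAD": "c", "TAIL": "elist"}}}
c = cord(A, ("TAIL", "TAIL"))
print(c, height(A), cord_height(c))
```

For the three-element list `A`, the full structure has height 3, while its cord along the base path ⟨TAIL, TAIL⟩ has height 1: the depth has moved into the length of the cord, which is what allows cords to serve as LIG stacks.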
Recall that in order for a LIG rule to be applicable to a sentential form, the stack of the rule's head must be a prefix of the stack of the selected element in the form. The only question is whether the two stacks are equal (fixed rule head) or not (unbounded rule head). Since the contents of stacks are cords, we need a property relating two cords, on the one hand, with unifiability of their base feature structures, on the other. Lemma 8 establishes such a property. Informally, if the base path of one cord is a prefix of the base path of the other cord and all feature structures along the common path of both cords are unifiable, then the base feature structures of both cords are unifiable. The reverse direction also holds.

Lemma 8. Let $A, B \in \mathrm{NRFSS}$ be non-reentrant feature structures and $\pi_1, \pi_2 \in \mathrm{PATHS}$ be paths such that $\pi_1 \in \Pi_B$, $\pi_1 \cdot \pi_2 \in \Pi_A$, $\chi(A, \pi_1 \cdot \pi_2) = \langle t_1, F_1, \ldots, F_{|\pi_1|}, t_{|\pi_1|+1}, F_{|\pi_1|+1}, \ldots, t_{|\pi_1 \cdot \pi_2|+1}\rangle$, $\chi(B, \pi_1) = \langle s_1, F_1, \ldots, s_{|\pi_1|+1}\rangle$, and $\langle F_{|\pi_1|+1}\rangle \notin \Pi_{s_{|\pi_1|+1}}$. Then $A \sqcup B \neq \top$ iff for all $i$, $1 \le i \le |\pi_1| + 1$, $s_i \sqcup t_i \neq \top$.

The length of the cord of an element of a sentential form induced by the grammar cannot be bounded, but the length of any cord representation of a rule head is limited by the grammar height. By Lemma 8, unifiability of two feature structures can be reduced to a comparison of two cords representing them, and only the prefix of the longer cord (as long as the shorter cord) affects the result. Since the cord representation of any grammar rule's head is limited by the height of the grammar, we always choose it as the shorter cord in the comparison.

We now define, for a feature structure $C$ (which is a head of a rule) and some path $\pi$, the set that includes all feature structures that are both unifiable with $C$ and can be represented as a cord whose height is limited by the grammar height and whose base path is $\pi$.
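The comparison licensed by Lemma 8 can be sketched as follows, again assuming (our encoding) nested dicts with atom leaves. `unifiable` is the usual recursive check for tree-shaped FSs, used here as a reference; `unifiable_by_cords`, which inspects only the first $|\pi_1| + 1$ feature-structure elements of the two cords, is a hypothetical name and raises an error when the side condition of the lemma fails.

```python
def unifiable(a, b):
    """A ⊔ B ≠ ⊤ for non-reentrant FSs encoded as nested dicts with atom leaves."""
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if isinstance(a, str) or isinstance(b, str):
        other = b if isinstance(a, str) else a
        return not other  # an atom unifies only with an empty node
    return all(unifiable(a[f], b[f]) for f in a.keys() & b.keys())


def cord(fs, path):
    """chi(A, pi) as in Definition 11; assumes path is a path of fs."""
    out, node = [], fs
    for feat in path:
        out += [{f: v for f, v in node.items() if f != feat}, feat]
        node = node[feat]
    return out + [node]


def unifiable_by_cords(a, b, p1, p2):
    """Lemma 8: with p1 in B and p1+p2 in A, compare s_i with t_i for i <= |p1|+1."""
    t = cord(a, p1 + p2)[0::2]  # t_1 ... t_{|p1.p2|+1}
    s = cord(b, p1)[0::2]       # s_1 ... s_{|p1|+1}
    if p2 and isinstance(s[-1], dict) and p2[0] in s[-1]:
        raise ValueError("Lemma 8 requires F_{|p1|+1} not to be a feature of s_{|p1|+1}")
    return all(unifiable(si, ti) for si, ti in zip(s, t))


A = {"HEAD": "a", "TAIL": {"HEAD": "b", "TAIL": {"HEAD": "c", "TAIL": "elist"}}}
B_ok = {"HEAD": "a", "TAIL": {"HEAD": "b"}}
B_bad = {"HEAD": "a", "TAIL": {"HEAD": "x"}}
for B in (B_ok, B_bad):
    print(unifiable_by_cords(A, B, ("TAIL",), ("TAIL",)), unifiable(A, B))
```

The rule head plays the role of B: its cord is short, so only a bounded prefix of the (possibly very long) cord of the sentential-form element A is ever consulted.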
We call this set the compatibility set of $C$ and $\pi$, and use it to define the set of all possible prefixes of cords whose base FSs are unifiable with $C$ (see Definition 13). Crucially, the compatibility set of $C$ is finite for any feature structure $C$, since the heights and the lengths of the cords are limited.

Definition 12. Given a non-reentrant feature structure $C$, a path $\pi = \langle F_1, \ldots, F_n\rangle \in \Pi_C$ and a natural number $h$, the compatibility set, $\Delta(C, \pi, h)$, is defined as the set of all feature structures $A$ such that $C \sqcup A \neq \top$, $\pi \in \Pi_A$, and $|\chi(A, \pi)| \le h$.

The compatibility set is defined for a feature structure and a given path (when $h$ is taken to be the grammar height). We now define two similar sets, FH and UH, for a given FS, independently of a path. When rules of a one-reentrant unification grammar are mapped to LIG rules (Definition 14), FH and UH are used to define heads of fixed and unbounded LIG rules, respectively. A single unification rule is mapped to a set of LIG rules, each with a different head. The stack of the head is some member of the sets FH and UH. Each such member is a prefix of the stack of potential elements of sentential forms that the LIG rule can be applied to.

Definition 13. Let $C$ be a non-reentrant feature structure and $h$ be a natural number. Then (with $\uparrow$ denoting an undefined value):
$FH(C, h) = \{\chi(A, \pi) \mid \pi \in \Pi_C,\ A \in \Delta(C, \pi, h)\}$
$UH(C, h) = \{\chi(A, \pi) \cdot F \mid \chi(A, \pi) \in FH(C, h),\ \Theta_C(\pi)\!\uparrow,\ F \in \mathrm{FEATS},\ \mathit{val}(\mathit{last}(\chi(C \sqcup A, \pi)), F)\!\uparrow\}$

This accounts for the second difference between LIG and one-reentrant UG, namely rule application. We now briefly illustrate our account of the last difference, propagation of information in rules. In $UG_{1r}$ information is shared between the rule's head and a single element in its body. Let $r_u = \langle C_0, \ldots, C_n\rangle$ be a reentrant unification rule in which the path $\mu_e$, leaving the $e$-th element of the body, is reentrant with the path $\mu_0$ leaving the head. This rule is mapped to a set of LIG rules, corresponding to the possible rule heads induced by the compatibility set of $C_0$.
Let $r$ be a member of this set, and let $X_0$ and $X_e$ be the head and the $e$-th element of $r$, respectively. Reentrancy in $r_u$ is modeled in the LIG rule by copying the stack from $X_0$ to $X_e$. The major complication is the contents of this stack, which vary according to the cord representations of $C_0$ and $C_e$ and to the reentrant paths.

Summing up, in a LIG simulating a one-reentrant UG, FSs are represented as stacks of symbols. The set of stack symbols $V_s$, therefore, is defined as a set of height-bounded non-reentrant FSs. Also, all the features of the UG are stack symbols. $V_s$ is finite due to the restriction on FSs (no reentrancies and height-boundedness). The set of terminals, $V_t$, is the set of words of the UG. There are exactly two non-terminal symbols, $S$ (the start symbol) and $N$. The set of rules is divided into four:
• The start rule only applies once in a derivation, simulating the situation in UGs of a rule whose head is unifiable with the start symbol.
• Terminal rules are a straightforward implementation of the lexicon in terms of LIG.
• Non-reentrant rules are simulated in a similar way to how rules of a non-reentrant UG are simulated by CFG (section 3). The major difference is the head of the rule, $X_0$, which is defined as explained above.
• One-reentrant rules are simulated similarly to non-reentrant ones, the only difference being the selected element of the rule body, $X_e$, which is defined as follows.

Definition 14. Let ug2lig be a mapping of $UG_{1r}$ to LIGS, such that if $G_u = \langle R_u, A_s, \mathcal{L}\rangle \in UG_{1r}$ then $\mathit{ug2lig}(G_u) = \langle V_N, V_t, V_s, R_{li}, S\rangle$, where $V_N = \{N, S\}$ (fresh symbols), $V_t = \mathrm{WORDS}$, $V_s = \mathrm{FEATS} \cup \{A \mid A \in \mathrm{NRFSS},\ |A| \le \mathit{maxHt}(G_u)\}$, and $R_{li}$ is defined as follows:³
1. $S[\,] \to N[\chi(A_s, \epsilon)]$
2. For every $w \in \mathrm{WORDS}$ such that $\mathcal{L}(w) = \{C_0\}$ and for every $\pi_0 \in \Pi_{C_0}$, the rule $N[\chi(C_0, \pi_0)] \to w$ is in $R_{li}$.
3. If $\langle C_0, \ldots, C_n\rangle \in R_u$ is a non-reentrant rule, then for every $X_0 \in \mathit{LIGHEAD}(C_0)$ the rule $X_0 \to N[\chi(C_1, \epsilon)] \ldots N[\chi(C_n, \epsilon)]$ is in $R_{li}$.
4. Let $r_u = \langle C_0, \ldots, C_n\rangle \in R_u$ and $(0, \mu_0) \leftrightarrow (e, \mu_e)$, where $1 \le e \le n$. Then for every $X_0 \in \mathit{LIGHEAD}(C_0)$ the rule
$X_0 \to N[\chi(C_1, \epsilon)] \ldots N[\chi(C_{e-1}, \epsilon)]\ X_e\ N[\chi(C_{e+1}, \epsilon)] \ldots N[\chi(C_n, \epsilon)]$
is in $R_{li}$, where $X_e$ is defined as follows. Let $\pi_0$ be the base path of $X_0$ and $A$ be the base feature structure of $X_0$. Applying the rule $r_u$ to $A$, define $(\langle A\rangle, 0) \sqcup (r_u, 0) = (\langle P_0\rangle, \langle P_0, \ldots, P_e, \ldots, P_n\rangle)$.
(a) If $\mu_0$ is not a prefix of $\pi_0$ then $X_e = N[\chi(P_e, \mu_e)]$.
(b) If $\pi_0 = \mu_0 \cdot \nu$, $\nu \in \mathrm{PATHS}$, then
i. If $X_0 = N[\chi(A, \pi_0)]$ then $X_e = N[\chi(P_e, \mu_e \cdot \nu)]$.
ii. If $X_0 = N[\chi(A, \pi_0), F\,..]$ then $X_e = N[\chi(P_e, \mu_e \cdot \nu), F\,..]$.

³ For a non-reentrant FS $C_0$, we define $\mathit{LIGHEAD}(C_0)$ as $\{N[\eta] \mid \eta \in FH(C_0, \mathit{maxHt}(G_u))\} \cup \{N[\eta\,..] \mid \eta \in UH(C_0, \mathit{maxHt}(G_u))\}$.

By induction on the lengths of the derivations we prove that the mapping is correct:

Theorem 9. If $G_u \in UG_{1r}$, then $L(G_u) = L(\mathit{ug2lig}(G_u))$.

5 Conclusions
The main contribution of this work is the definition of two constraints on unification grammars which dramatically limit their expressivity. We prove that non-reentrant unification grammars generate exactly the class of context-free languages, and that one-reentrant unification grammars generate exactly the class of mildly context-sensitive languages. We thus obtain two linguistically plausible constrained formalisms whose computational processing is tractable.

This main result is primarily a formal grammar result. However, we maintain that it can be easily adapted such that its consequences for (practical) computational linguistics are more evident. The motivation behind this observation is that reentrancy only adds to the expressivity of a grammar formalism when it is potentially unbounded, i.e., when infinitely many feature structures can be the possible values at the end of the reentrant paths.
It is therefore possible to modestly extend the class of unification grammars which can be shown to generate exactly the class of mildly context-sensitive languages, by also allowing a limited form of multiple reentrancies among the elements in a rule (e.g., to handle agreement phenomena). This can be most useful for grammar writers, and at the same time adds nothing to the expressivity of the formalism. We leave the formal details of such an extension to future work.

This work can also be extended in other directions. The mapping of one-reentrant UGs to LIG is highly verbose, resulting in LIGs with a huge number of rules. We believe that it should be possible to optimize the mapping such that much smaller grammars are generated. In particular, we are looking into mappings of one-reentrant UGs to other MCSL formalisms, notably TAG.

The two constraints on unification grammars (non-reentrant and one-reentrant) are parallel to the first two classes of the Weir (1992) hierarchy of languages. A possible extension of this work could be a definition of constraints on unification grammars that would generate all the classes of the hierarchy. Another direction is an extension of one-reentrant unification grammars, where the reentrancy does not have to be between the head and one element of the body. Also of interest are two-reentrant unification grammars, possibly with limited kinds of reentrancies.

Acknowledgments
This research was supported by The Israel Science Foundation (grant no. 136/01). We are grateful to Yael Cohen-Sygal, Nissim Francez and James Rogers for their comments and help.

References
G. Edward Barton, Jr., Robert C. Berwick, and Eric Sven Ristad. 1987. The complexity of LFG. In G. Edward Barton, Jr., Robert C. Berwick, and Eric Sven Ristad, editors, Computational Complexity and Natural Language, Computational Models of Cognition and Perception, chapter 3, pages 89–102. MIT Press, Cambridge, MA.
Bob Carpenter. 1992. The Logic of Typed Feature Structures. Cambridge University Press.
Daniel Feinstein. 2004. Computational investigation of unification grammars. Master's thesis, University of Haifa.
Gerald Gazdar. 1988. Applicability of indexed grammars to natural languages. In Uwe Reyle and Christian Rohrer, editors, Natural Language Parsing and Linguistic Theories, pages 69–94. Reidel.
Efrat Jaeger, Nissim Francez, and Shuly Wintner. 2005. Unification grammars and off-line parsability. Journal of Logic, Language and Information, 14(2):199–234.
Mark Johnson. 1988. Attribute-Value Logic and the Theory of Grammar, volume 16 of CSLI Lecture Notes. CSLI, Stanford, California.
Mark Johnson. 1998. Finite-state approximation of constraint-based grammars using left-corner grammar transforms. In Proceedings of the 17th International Conference on Computational Linguistics, pages 619–623.
Aravind K. Joshi. 1985. Tree Adjoining Grammars: How much context sensitivity is required to provide a reasonable structural description. In D. Dowty, L. Karttunen, and A. Zwicky, editors, Natural Language Parsing, pages 206–250. Cambridge University Press, Cambridge, U.K.
Aravind K. Joshi. 2003. Tree-adjoining grammars. In Ruslan Mitkov, editor, The Oxford Handbook of Computational Linguistics, chapter 26, pages 483–500. Oxford University Press.
Bernd Kiefer and Hans-Ulrich Krieger. 2004. A context-free superset approximation of unification-based grammars. In Harry Bunt, John Carroll, and Giorgio Satta, editors, New Developments in Parsing Technology, pages 229–250. Kluwer Academic Publishers.
Fernando C. N. Pereira and Rebecca N. Wright. 1997. Finite-state approximation of phrase-structure grammars. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, Language, Speech and Communication, chapter 5, pages 149–174. MIT Press, Cambridge, MA.
Carl Pollard. 1984. Generalized phrase structure grammars, head grammars and natural language. Ph.D. thesis, Stanford University.
Manny Rayner, John Dowding, and Beth Ann Hockey. 2001. A baseline method for compiling typed unification grammars into context free language models. In Proceedings of EUROSPEECH 2001, Aalborg, Denmark.
Giorgio Satta. 1994. Tree-adjoining grammar parsing and boolean matrix multiplication. In Proceedings of the 20th Annual Meeting of the Association for Computational Linguistics, volume 20.
Walter J. Savitch, Emmon Bach, William Marsh, and Gila Safran-Naveh, editors. 1987. The Formal Complexity of Natural Language, volume 33 of Studies in Linguistics and Philosophy. D. Reidel, Dordrecht.
Stuart M. Shieber. 1986. An Introduction to Unification-Based Approaches to Grammar. Number 4 in CSLI Lecture Notes. CSLI.
Stuart M. Shieber. 1992. Constraint-Based Grammar Formalisms. MIT Press, Cambridge, Mass.
Mark Steedman. 2000. The Syntactic Process. Language, Speech and Communication. The MIT Press, Cambridge, Mass.
K. Vijay-Shanker and David J. Weir. 1993. Parsing some constrained grammar formalisms. Computational Linguistics, 19(4):591–636.
K. Vijay-Shanker and David J. Weir. 1994. The equivalence of four extensions of context-free grammars. Mathematical Systems Theory, 27:511–545.
David J. Weir. 1992. A geometric hierarchy beyond context-free languages. Theoretical Computer Science, 104:235–261.