Optimal k -arization of Synchronous Tree-Adjoining Grammar
Rebecca Nesson School of Engineering and Applied Sciences Harvard University Cambridge, MA 02138
nesson@seas.harvard.edu

Giorgio Satta Department of Information Engineering University of Padua I-35131 Padova, Italy
satta@dei.unipd.it

Stuart M. Shieber School of Engineering and Applied Sciences Harvard University Cambridge, MA 02138
shieber@seas.harvard.edu

Abstract
Synchronous Tree-Adjoining Grammar (STAG) is a promising formalism for syntaxaware machine translation and simultaneous computation of natural-language syntax and semantics. Current research in both of these areas is actively pursuing its incorporation. However, STAG parsing is known to be NP-hard due to the potential for intertwined correspondences between the linked nonterminal symbols in the elementary structures. Given a particular grammar, the polynomial degree of efficient STAG parsing algorithms depends directly on the rank of the grammar: the maximum number of correspondences that appear within a single elementary structure. In this paper we present a compile-time algorithm for transforming a STAG into a strongly-equivalent STAG that optimally minimizes the rank, k , across the grammar. The algorithm performs in O(|G| + |Y | · L3 ) G time where LG is the maximum number of links in any single synchronous tree pair in the grammar and Y is the set of synchronous tree pairs of G.

the application of synchronous tree-adjoining grammar (STAG) to this problem (Nesson, Shieber, and Rush, 2006; Chiang and Rambow, 2006). In a parallel development, interest in incorporating semantic computation into the TAG framework has led to the use of STAG for this purpose (Nesson and Shieber, 2007; Han, 2006b; Han, 2006a; Nesson and Shieber, 2006). Although STAG does not increase the expressivity of the underlying formalisms (Shieber, 1994), STAG parsing is known to be NPhard due to the potential for intertwined correspondences between the linked nonterminal symbols in the elementary structures (Satta, 1992; Weir, 1988). Without efficient algorithms for processing it, its potential for use in machine translation and TAG semantics systems is limited. Given a particular grammar, the polynomial degree of efficient STAG parsing algorithms depends directly on the rank of the grammar: the maximum number of correspondences that appear within a single elementary structure. This is illustrated by the tree pairs given in Figure 1 in which no two numbered links may be isolated. (By "isolated", we mean that the links can be contained in a fragment of the tree that contains no other links and dominates only one branch not contained in the fragment. A precise definition is given in section 3.) An analogous problem has long been known to exist for synchronous context-free grammars (SCFG) (Aho and Ullman, 1969). The task of producing efficient parsers for SCFG has recently been addressed by binarization or k -arization of SCFG grammars that produce equivalent grammars in which the rank, k , has been minimized (Zhang

1

Introduction

Tree-adjoining grammar is a widely used formalism in natural-language processing due to its mildlycontext-sensitive expressivity, its ability to naturally capture natural-language argument substitution (via its substitution operation) and optional modification (via its adjunction operation), and the existence of efficient algorithms for processing it. Recently, the desire to incorporate syntax-awareness into machine translation systems has generated interest in 604

Proceedings of ACL-08: HLT, pages 604­612, Columbus, Ohio, USA, June 2008. c 2008 Association for Computational Linguistics


1 : A B D
w
1

2 : A C E
x
2

A B

1 2 3

A B C D w

2 4 1 3

3 :
5

A
4 2

5

B G z
4

C E x
4

C G3 z D w

B D w

3

C E x

B D w

2 3

C E
x

1 4

F y

3

D w

2

F
y

1

4

1

A

Figure 1: Example of intertwined links that cannot be binarized. No two links can be isolated in both trees in a tree pair. Note that in tree pair 1 , any set of three links may be isolated while in tree pair 2 , no group of fewer than four links may be isolated. In 3 no group of links smaller than four may be isolated.
(a) S N P 1 V likes N N VP S N P 1 V aime NP N
1

( b)
1

likes
2

(c) S NP John VP V likes

S NP V P N P Jean V N N NP N Adj

VP N P 2

John

candies
1

N P 2

red NP NP N
1

aime Det les N

NP

Adj

Adj N N Adj red rouges

D et

John Jean

red candies

bonbons rouges

candies les bonbons

Figure 2: An example STAG derivation of the English/French sentence pair "John likes red candies"/"Jean aime les bonbons rouges". The figure is divided as follows: (a) the STAG grammar, (b) the derivation tree for the sentence pair, and (c) the derived tree pair for the sentences.

and Gildea, 2007; Zhang et al., 2006; Gildea, Satta, and Zhang, 2006). The methods for k -arization of SCFG cannot be directly applied to STAG because of the additional complexity introduced by the expressivity-increasing adjunction operation of TAG. In SCFG, where substitution is the only available operation and the depth of elementary structures is limited to one, the k -arization problem reduces to analysis of permutations of strings of nonterminal symbols. In STAG, however, the arbitrary depth of the elementary structures and the lack of restriction to contiguous strings of nonterminals introduced by adjunction substantially complicate the task. In this paper we offer the first algorithm addressing this problem for the STAG case. We present a compile-time algorithm for transforming a STAG into a strongly-equivalent STAG that optimally minimizes k across the grammar. This is a critical minimization because k is the feature of the grammar that appears in the exponent of the complexity of parsing algorithms for STAG. Following the method of Seki 605

et al. (1991), an STAG parser can be implemented with complexity O(n4·(k+1) · |G|). By minimizing k , the worst-case complexity of a parser instantiated for a particular grammar is optimized. The k arization algorithm performs in O(|G| + |Y | · L3 ) G time where LG is the maximum number of links in any single synchronous tree pair in the grammar and Y is the set of synchronous tree pairs of G. By comparison, a baseline algorithm performing exhaustive search requires O(|G| + |Y | · L6 ) time.1 G The remainder of the paper proceeds as follows. In section 2 we provide a brief introduction to the STAG formalism. We present the k -arization algorithm in section 3 and an analysis of its complexity in section 4. We prove the correctness of the algorithm in section 5.
1 In a synchronous tree pair with L links, there are O(L4 ) pairs of valid fragments. It takes O(L) time to check if the two components in a pair have the same set of links. Once the synchronous fragment with the smallest number of links is excised, this process iterates at most L times, resulting in time O(L6 ). G


:
1

A1 :B n4 : D
3

H n3 : I
2

1

3
J
3

k -arization Algorithm

C E y
5

2

n

Lz
4

n2 : F z

n5 : G x

M w

K
5

N
x

4

y

Figure 3: A synchronous tree pair containing fragments L = L (n1 , n2 ) and R = R (n3 ). Since links(n1 , n2 ) = links(n3 ) = { 2 , 4 , 5 }, we can define synchronous fragment  = L , R . Note also that node n3 is a maximal node and node n5 is not.  (n1 ) = 2 5 5 3 3 2 4 4 ;  (n3 ) = 2 5 5 4 4 2 .

2

Synchronous Tree-Adjoining Grammar

A tree-adjoining grammar (TAG) consists of a set of elementary tree structures of arbitrary depth, which are combined by substitution, familiar from contextfree grammars, or an operation of adjunction that is particular to the TAG formalism. Auxiliary trees are elementary trees in which the root and a frontier node, called the foot node and distinguished by the diacritic , are labeled with the same nonterminal A. The adjunction operation involves splicing an auxiliary tree in at an internal node in an elementary tree also labeled with nonterminal A. Trees without a foot node, which serve as a base for derivations, are called initial trees. For further background, refer to the survey by Joshi and Schabes (1997). We depart from the traditional definition in notation only by specifying adjunction and substitution sites explicitly with numbered links. Each link may be used only once in a derivation. Operations may only occur at nodes marked with a link. For simplicity of presentation we provisionally assume that only one link is permitted at a node. We later drop this assumption. In a synchronous TAG (STAG) the elementary structures are ordered pairs of TAG trees, with a linking relation specified over pairs of nonterminal nodes. Each link has two locations, one in the left tree in a pair and the other in the right tree. An example of an STAG derivation including both substitution and adjunction is given in Figure 2. For further background, refer to the work of Shieber and Schabes (1990) and Shieber (1994). 606

For a synchronous tree pair  = L , R , a fragment of L (or R ) is a complete subtree rooted at some node n of L , written L (n), or else a subtree rooted at n with a gap at node n , written L (n, n ); see Figure 3 for an example. We write links(n) and links(n, n ) to denote the set of links of L (n) and L (n, n ), respectively. When we do not know the root or gap nodes of some fragment L , we also write links(L ). We say that a set of links  from  can be isolated if there exist fragments L and R of L and R , respectively, both with links . If this is the case, we can construct a synchronous fragment  = L , R . The goal of our algorithm is to decompose  into synchronous fragments such that the maximum number of links of a synchronous fragment is kept to a minimum, and  can be obtained from the synchronous fragments by means of the usual substitution and adjunction operations. In order to simplify the presentation of our algorithm we assume, without any loss of generality, that all elementary trees of the source STAG have nodes with at most two children. 3.1 Maximal Nodes

A node n of L (or R ) is called maximal if (i) links(n) = , and (ii) it is either the root node of L or, for its parent node n , we have links(n ) = links(n). Note that for every node n of L such that links(n ) =  there is always a unique maximal node n such that links(n ) = links(n). Thus, for the purpose of our algorithm, we need only look at maximal nodes as places for excising tree fragments. We can show that the number of maximal nodes Mn in a subtree L (n) always satisfies |links(n)|  Mn  2 × |links(n)| - 1. Let n be some node of L , and let l(n) be the (unique) link impinging on n if such a link exists, and l(n) =  otherwise. We associate n with a string  (n), defined by a pre- and post-order traversal of fragment L (n). The symbols of  (n) are the links in links(n), viewed as atomic symbols. Given a node n with p children n1 , . . . , np , 0  p  2, we define  (n) = l(n)  (n1 ) · · ·  (np ) l(n). See again Figure 3 for an example. Note that | (n)| = 2 × |links(n)|.


L : X n1 : R n2 : G
1
2

X R X G
e
3
2

X

X
1

X R
1 1
2

R
X

G


R X R
X

G

G

xcise

adjoin

G

2

transform

1 2 5 5 3 3 2 4 4 1

1 0 0 0 0 0 0 0 0 1
1

0 1 0 0 0 0 1 0 0 0
2

0 0 1 1 0 0 0 0 0 0
5

0 0 1 1 0 0 0 0 0 0
5

0 0 0 0 0 0 0 1 1 0
4

0 0 0 0 0 0 0 1 1 0
4

0 1 0 0 0 0 1 0 0 0
2

0 0 0 0 1 1 0 0 0 0
3

0 0 1 0 0 1 1 0 0 0 0
3

1 0 0 0 0 0 0 0 0 1
1

Figure 4: A diagram of the tree transformation performed when fragment L (n1 , n2 ) is removed. In this and the diagrams that follow, patterned or shaded triangles represent segments of the tree that contain multiple nodes and at least one link. Where the pattern or shading corresponds across trees in a tree pair, the set of links contained within those triangles are equivalent.

Figure 5: Table  with synchronous fragment L (n1 , n2 ), R (n3 ) from Figure 3 highlighted.

in the process. 3.3 Method

3.2 Excision of Synchronous Fragments Although it would be possible to excise synchronous fragments without creating new nonterminal nodes, for clarity we present a simple tree transformation when a fragment is excised that leaves existing nodes intact. A schematic depiction is given in Figure 4. In the figure, we demonstrate the excision process on one half of a synchronous fragment: L (n1 , n2 ) is excised to form two new trees. The excised tree is not processed further. In the excision process the root and gap nodes of the original tree are not altered. The material between them is replaced with a single new node with a fresh nonterminal symbol and a fresh link number. This nonterminal node and link form the adjunction or substitution site for the excised tree. Note that any link impinging on the root node of the excised fragment is by our convention included in the fragment and any link impinging on the gap node is not. To regenerate the original tree, the excised fragment can be adjoined or substituted back into the tree from which it was excised. The new nodes that were generated in the excision may be removed and the original root and gap nodes may be merged back together retaining any impinging links, respectively. Note that if there was a link on either the root or gap node in the original tree, it is not lost or duplicated 607

Let nL and nR be the root nodes of trees L and R , respectively. We know that links(nL ) = links(nR ), and | (nL )| = | (nR )|, the second string being a rearrangement of the occurrences of symbols in the first one. The main data structure of our algorithm is a Boolean matrix  of size | (nL )| × | (nL )|, whose rows are addressed by the occurrences of symbols in  (nL ), in the given order, and whose columns are similarly addressed by  (nR ). For occurrences of links x1 , x2 , the element of  at a row addressed by x1 and a column addressed by x2 is 1 if x1 = x2 , and 0 otherwise. Thus, each row and column of  has exactly two non-zero entries. See Figure 5 for an example. For a maximal node n1 of L , we let  (n1 ) denote the stripe of adjacent rows of  addressed by substring  (n1 ) of  (nL ). If n1 dominates n2 in L , we let  (n1 , n2 ) denote the rows of  addressed by  (n1 ) but not by  (n2 ). This forms a pair of horizontal stripes in  . For nodes n3 , n4 of R , we similarly define  (n3 ) and  (n3 , n4 ) as vertical stripes of adjacent columns. See again Figure 5. Our algorithm is reported in Figure 6. For each synchronous tree pair  = L , R from the input grammar, we maintain an agenda B with all candidate fragments L from L having at least two links. These fragments are processed greedily in order of increasing number of links. The function I S O L AT E(), described in more detail be-


1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13:

Function K A R I Z E(G) {G a binary STAG} G  STAG with empty set of synch trees; for all  = L , R in G do init  and B ; while B =  do L  next fragment from B ; R  I S O L AT E(L ,  , R ); if R = null then add L , R to G ;   excise L , R from  ; update  and B ; add  to G ; return G F
igure 6: Main algorithm.

1: 2:

3: 4: 5:

6: 7: 8:

Function I S O L AT E(L ,  , R ) select n  R such that  (n) is the shortest string within  (nR ) including left/right boundaries of L in  ; if | (n)| = 2 × |links(L )| then return R (n); select n  R such that  (n ) is the gap string within  (n) for which links(n) - links(n ) = links(L ); if n is not defined then return null; {more than one gap} return R (n, n );
Figure 7: Find synchronous fragment.

low, looks for a right fragment R with the same links as L . Upon success, the synchronous fragment  = L , R is added to the output grammar. Furthermore, we excise  from  and update data structures  and B . The above process is iterated until B becomes empty. We show in section 5 that this greedy strategy is sound and complete. The function I S O L AT E() is specified in Figure 7. We take as input a left fragment L , which is associated with one or two horizontal stripes in  , depending on whether L has a gap node or not. The left boundary of L in  is the index x1 of the column containing the leftmost occurrence of a 1 in the horizontal stripes associated with L . Similarly, the right boundary of L in  is the index x2 of the column containing the rightmost occurrence of a 1 in these stripes. We retrieve the shortest substring  (n) of  (nR ) that spans over indices x1 and x2 . This means that n is the lowest node from R such that the links of L are a subset of the links of R (n). If the condition at line 3 is satisfied, all of the matrix entries of value 1 that are found from column x1 to column x2 fall within the horizontal stripes associated with L . In this case we can report the right fragment R = R (n). Otherwise, we check whether the entries of value 1 that fall outside of the two horizontal stripes in between columns x1 and x2 occur within adjacent columns, say from column x3  x1 to column x4  x2 . In this case, we check whether there exists some node n such that the substring of  (n) from position x3 to x4 is 608

i an occurrence of string  (n ). This means that n s the gap node, and we report the right fragment L = R (n, n ). See again Figure 5. We now drop the assumption that only one link may impinge on a node. When multiple links impinge on a single node n, l(n) is an arbitrary order over those links. In the execution of the algorithm, any stripe that contains one link in l(n) it must include every link in l(n). This prevents the excision of a proper subset of the links at any node. This preserves correctness because excising any proper subset would impose an order over the links at n that is not enforced in the input grammar. Because the links at a node are treated as a unit, the complexity of the algorithm is not affected.

4 Complexity
We discuss here an implementation of the algorithm of section 3 resulting in time complexity O(|G| + |Y | · L3 ), where Y is the set of synG chronous tree pairs of G and LG is the maximum number of links in a synchronous tree pair in Y . w Consider a synchronous tree pair  = L , R ith L links. If M is the number of maximal nodes in L or R , we have M = (L) (Section 3.1). We implement the sparse table  in O(L) space, recording for each row and column the indices of its two non-zero entries. We also assume that we can go back and forth between maximal nodes n and strings  (n) in constant time. Here each  (n) is represented by its boundary positions within  (nL ) or  (nR ), nL and nR the root nodes of L and R , respectively.


At line 2 of the function I S O L AT E() (Figure 7) we retrieve the left and right boundaries by scanning the rows of  associated with input fragment L . We then retrieve node n by visiting all maximal nodes of L spanning these boundaries. Under the above assumptions, this can be done in time O(L). In a similar way we can implement line 5, resulting in overall run time O(L) for function I S O L AT E(). In the function K A R I Z E() (Figure 6) we use buckets Bi , 1  i  L, where each Bi stores the candidate fragments L with |links(L )| = i. To populate these buckets, we first process fragments L (n) by visiting bottom up the maximal nodes of L . The quantity |links(n)| is computed from the quantities |links(ni )|, where ni are the highest maximal nodes dominated by n. (There are at most two such nodes.) Fragments L (n, n ) can then be processed using the relation |links(n, n )| = |links(n)| - |links(n )|. In this way each fragment is processed in constant time, and population of all the buckets takes O(L2 ) time. We now consider the while loop at lines 5 to 11 in function K A R I Z E(). For a synchronous tree pair  , the loop iterates once for each candidate fragment L in some bucket. We have a total of O(L2 ) iterations, since the initial number of candidates in the buckets is O(L2 ), and the possible updating of the buckets after a synchronous fragment is removed does not increase the total size of all the buckets. If the links in L cannot be isolated, one iteration takes time O(L) (the call to function I S O L AT E()). If the links in L can be isolated, then we need to restructure  and to repopulate the buckets. The former can be done in time O(L) and the latter takes time O(L2 ), as already discussed. Crucially, the updating of  and the buckets takes place no more than L - 1 times. This is because each time we excise a synchronous fragment, the number of links in  is reduced by at least one. We conclude that function K A R I Z E() takes time O(L3 ) for each synchronous tree  , and the total running time is O(|G| + |Y | · L3 ), where Y is the G set of synchronous tree pairs of G. The term |G| accounts for the reading of the input, and dominates the complexity of the algorithm only in case there are very few links in each synchronous tree pair. 609

:

n1 : A
3

5


4 2

:

n3 :
A A
6 3 1

n2 : B D w

C E x

B n4 : D w

1

Figure 8: In  links 3 and 5 cannot be isolated because the fragment would have to contain two gaps. However, after the removal of fragment  (n1 , n2 ), an analogous fragment  (n3 , n4 ) may be removed.

5 Proof of Correctness
The algorithm presented in the previous sections produces an optimal k -arization for the input grammar. In this section we sketch a proof of correctness of the strategy employed by the algorithm.2 The k -arization strategy presented above is greedy in that it always chooses the excisable fragment with the smallest number of links at each step and does not perform any backtracking. We must therefore show that this process cannot result in a non-optimal solution. If fragments could not overlap each other, this would be trivial to show because the excision process would be confluent. If all overlapping fragments were cases of complete containment of one fragment within another, the proof would also be trivial because the smallest-to-largest excision order would guarantee optimality. However, it is possible for fragments to partially overlap each other, meaning that the intersection of the set of links contained in the two fragments is non-empty and the difference between the set of links in one fragment and the other is also non-empty. Overlapping fragment configurations are given in Figure 9 and discussed in detail below. The existence of partially overlapping fragments complicates the proof of optimality for two reasons. First, the excision of a fragment  that is partially overlapped with another fragment  necessarily precludes the excision of  at a later stage in the exNote that the soundness of the algorithm can be easily verified from the fact that the removal of fragments can be reversed by performing standard STAG adjunction and substitution operations until a single STAG tree pair is produced. This tree pair is trivially homomorphic to the original tree pair and can easily be mapped to the original tree pair.
2


(1, 1 ) n1 : n2 : B n3 : C n4 :A D

(2) n5 : A n6 : B n7 : C

(3) n8 : A n9 : B n10 : C n11 : D

n1 : A n2 : B n3 : C n4 : D

n5 : E n6 : F n7 : G

n1 : A n5 : E Hx Ix

n1 : A n5 : E n2 : B Kx n4 : D Jx n4 : D

n3 : C n6 : F

remove  remove 

n4 : D

Figure 9: The four possible configurations of overlapped fragments within a single tree. For type 1, let  =  (n1 , n3 ) and  =  (n2 , n4 ). The roots and gaps of the fragments are interleaved. For type 1 , let  =  (n1 , n3 ) and  =  (n2 ). The root of  dominates the gap of . For type 2, let  =  (n5 , n6 ) and  =  (n5 , n7 ). The fragments share a root and have gap nodes that do not dominate each other. For type 3 let  =  (n8 , n10 ) and  =  (n9 , n11 ). The root of  dominates the root of  , both roots dominate both gaps, but neither gap dominates the other.

Figure 10: Removal from a tree pair  containing type 1­ type 2 fragment overlap. The fragment  is represented by the horizonal-lined pieces of the tree pair. The fragment  is represented by the vertical-lined pieces of the tree pair. Cross-hatching indicates the overlapping portion of the two fragments.

cision process. Second, the removal of a fragment may cause a previously non-isolatable set of links to become isolatable, effectively creating a new fragment that may be advantageous to remove. This is demonstrated in Figure 8. These possibilities raise the question of whether the choice between removing fragments  and  may have consequences at a later stage in the excision process. We demonstrate that this choice cannot affect the k found for a given grammar. We begin by sketching the proof of a lemma that shows that removal of a fragment  that partially overlaps another fragment  always leaves an analogous fragment that may be removed. 5.1 Validity Preservation Consider a STAG tree pair  containing the set of links  and two synchronous fragments  and  with  containing links links() and  containing links( ) (links(), links( ) ). If  and  do not overlap, the removal of  is defined as validity preserving with respect to . If  and  overlap, removal of  from  is validity preserving with respect to  if after the removal there exists a valid synchronous fragment (containing at most one gap on each side) that contains all and only the links (links() - links( ))  { x } where x is the new link added to  . 610

We prove a lemma that removal of any synchronous fragment from an STAG tree pair is validity preserving with respect to all of the other synchronous fragments in the tree pair. It suffices to show that for two arbitrary synchronous fragments  and  , the removal of  is validity preserving with respect to . We show this by examination of the possible configurations of  and  . Consider the case in which  is fully contained within . In this case links( ) links(). The removal of  leaves the root and gap of  intact in both trees in the pair, so it remains a valid fragment. The new link is added at the new node inserted where  was removed. Since  is fully contained within , this node is below the root of  but not below its gap. Thus, the removal process leaves  with the links (links() - links( ))  { x }, where x is the link added in the removal process; the removal is validity preserving. Synchronous fragments may partially overlap in several different ways. There are four possible configurations for an overlapped fragment within a single tree, depicted in Figure 9. These different singletree overlap types can be combined in any way to form valid synchronous fragments. Due to space constraints, we consider two illustrative cases and leave the remainder as an exercise to the reader. An example of removing fragments from a tree set containing type 1­type 2 overlapped fragments is given in Figure 10. Let  = L (n1 , n3 ), R (n5 , n6 ) . Let


= L (n2 , n4 ), R (n5 , n7 ) . If  is removed, the validity preserving fragment for  is L (n1 , n4 ), R (n5 ) . It contains the links in the vertical-lined part of the tree and the new link x . This forms a valid fragment because both sides contain at most one gap and both contain the same set of links. In addition, it is validity preserving for  because it contains exactly the set of links that were in links( ) and not in links() plus the new link x . If we instead choose to remove  , the validity preserving fragment for  is L (n1 , n4 ), R (n5 ) . The links in each side of this fragment are the same, each side contains at most one gap, and the set of links is exactly the set left over from links() once links( ) is removed plus the newly generated link x . An example of removing fragments from a tree set containing type 1 ­type 3 (reversed) overlapped fragments is given in Figure 11. If  is removed, the validity preserving fragment for  is L (n1 ), R (n4 ) . If  is removed, the validity preserving fragment for  is L (n1 , n8 ), R (n4 ) . Similar reasoning follows for all remaining types of overlapped fragments. 5.2 Proof Sketch We show that smallest-first removal of fragments is optimal. Consider a decision point at which a choice is made about which fragment to remove. Call the size of the smallest fragments at this point m, and let the set of fragments of size m be X with ,   X . There are two cases to consider. First, consider two partially overlapped fragments   X and   X . Note that |links()| < |links( )|. Valid/ ity preservation of  with respect to  guarantees that  or its validity preserving analog will still be available for excision after  is removed. Excising  increases k more than excising  or any fragment that removal of  will lead to before  is considered. Thus, removal of  cannot result in a smaller value for k if it is removed before  rather than after . Second, consider two partially overlapped fragments ,   X . Due to the validity preservation lemma, we may choose arbitrarily between the fragments in X without jeopardizing our ability to later remove other fragments (or their validity preserving analogs) in that set. Removal of fragment  cannot increase the size of any remaining fragment. Removal of  or  may generate new fragments 611

n1 : A n2 : B n3 : C

n4 : D n5 : E n6 : F n7 : G

n1 : A

n4 : D

n1 : A n4 : D n2 : B Kx

Hx n : 5E n3 : C Ix n6 : F

n7 : G n8 : J x

remove  remove 

Figure 11: Removal from a tree pair  containing a type 1 ­type 3 (reversed) fragment overlap. The fragment  is represented by the horizontal lined pieces of the tree pair. The fragment  is represented by the vertical-lined pieces of the tree pair. Cross-hatching indicates the overlapping portion of the two fragments.

that were not previously valid and may reduce the size of existing fragments that it overlaps. In addition, removal of  may lead to availability of smaller fragments at the next removal step than removal of  (and vice versa). However, since removal of either  or  produces a k of size at least m, the later removal of fragments of size less than m cannot affect the k found by the algorithm. Due to validity preservation, removal of any of these smaller fragments will still permit removal of all currently existing fragments or their analogs at a later step in the removal process. If the removal of  generates a new fragment  of size larger than m all remaining fragments in X (and all others smaller than  ) will be removed before  is considered. Therefore, if removal of  generates a new fragment smaller than  , the smallest-first strategy will properly guarantee its removal before  .

6 Conclusion
In order for STAG to be used in machine translation and other natural-language processing tasks it must be possible to process it efficiently. The difficulty in parsing STAG stems directly from the factor k that indicates the degree to which the correspondences are intertwined within the elementary structures of the grammar. The algorithm presented in this paper is the first method available for k -arizing a synchronous TAG grammar into an equivalent grammar with an optimal value for k . The algorithm operates offline and requires only O(|G| + |Y | · L3 ) time. G Both the derivation trees and derived trees produced are trivially homomorphic to those that are produced by the original grammar.


References
Aho, Alfred V. and Jeffrey D. Ullman. 1969. Syntax directed translations and the pushdown assembler. Journal of Computer and System Sciences, 3(1):37­56. Chiang, David and Owen Rambow. 2006. The hidden TAG model: synchronous grammars for parsing resource-poor languages. In Proceedings of the 8th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+ 8), pages 1­8. Gildea, Daniel, Giorgio Satta, and Hao Zhang. 2006. Factoring synchronous grammars by sorting. In Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL-06), July. Han, Chung-Hye. 2006a. Pied-piping in relative clauses: Syntax and compositional semantics based on synchronous tree adjoining grammar. In Proceedings of the 8th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+ 8), pages 41­48, Sydney, Australia. Han, Chung-Hye. 2006b. A tree adjoining grammar analysis of the syntax and semantics of it-clefts. In Proceedings of the 8th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+ 8), pages 33­40, Sydney, Australia. Joshi, Aravind K. and Yves Schabes. 1997. Treeadjoining grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages. Springer, pages 69­124. Nesson, Rebecca and Stuart M. Shieber. 2006. Simpler TAG semantics through synchronization. In Proceedings of the 11th Conference on Formal Grammar, Malaga, Spain, 29­30 July. Nesson, Rebecca and Stuart M. Shieber. 2007. Extraction phenomena in synchronous TAG syntax and semantics. In Proceedings of Syntax and Structure in Statistical Translation (SSST), Rochester, NY, April. Nesson, Rebecca, Stuart M. Shieber, and Alexander Rush. 2006. Induction of probabilistic synchronous tree-insertion grammars for machine translation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas (AMTA 2006), Boston, Massachusetts, 8-12 August. Satta, Giorgio. 1992. Recognition of linear context-free rewriting systems. In Proceedings of the 10th Meeting of the Association for Computational Linguistics (ACL92), pages 89­95, Newark, Delaware. Seki, H., T. Matsumura, M. Fujii, and T. Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science, 88:191­229. Shieber, Stuart M. 1994. Restricting the weak-generative capacity of synchronous tree-adjoining grammars.

Computational Intelligence, 10(4):371­385, November. Shieber, Stuart M. and Yves Schabes. 1990. Synchronous tree adjoining grammars. In Proceedings of the 13th International Conference on Computational Linguistics (COLING '90), Helsinki, August. Weir, David. 1988. Characterizing mildly contextsensitive grammar formalisms. PhD Thesis, Department of Computer and Information Science, University of Pennsylvania. Zhang, Hao and Daniel Gildea. 2007. Factorization of synchronous context-free grammars in linear time. In NAACL Workshop on Syntax and Structure in Statistical Translation (SSST), April. Zhang, Hao, Liang Huang, Daniel Gildea, and Kevin Knight. 2006. Synchronous binarization for machine translation. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics (HLT/NAACL).

612