Graph Branch Algorithm: An Optimum Tree Search Method for Scored Dependency Graph with Arc Co-occurrence Constraints
Hideki Hirakawa Toshiba R&D Center 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki 210, JAPAN hideki.hirakawa@toshiba.co.jp

Abstract
Various kinds of scored dependency graphs are proposed as packed shared data structures in combination with optimum dependency tree search algorithms. This paper classifies the scored dependency graphs and discusses the specific features of the "Dependency Forest" (DF) which is the packed shared data structure adopted in the "Preference Dependency Grammar" (PDG), and proposes the "Graph Branch Algorithm" for computing the optimum dependency tree from a DF. This paper also reports the experiment showing the computational amount and behavior of the graph branch algorithm.

1 Introduction
The dependency graph (DG) is a packed shared data structure which consists of the nodes corresponding to the words in a sentence and the arcs showing dependency relations between the nodes. The scored DG has preference scores attached to the arcs and is widely used as a basis of the optimum tree search method. For example, the scored DG is used in Japanese Kakari-uke analysis1 to represent all possible kakari-uke(dependency) trees(Ozeki, 1994),(Hirakawa, 2001). (McDonald et al., 2005) proposed a dependency analysis method using a scored DG and some maximum spanning tree search algorithms. In this method, scores on arcs are computed from a set of features obtained from the dependency trees based on the
Kakari-uke relation, widely adopted in Japanese sentence analysis, is projective dependency relation with a constraint such that the dependent word is located at the left-hand side of its governor word.
1

optimum parameters for scoring dependency arcs obtained by the discriminative learning method. There are various kinds of dependency analysis methods based on the scored DGs. This paper classifies these methods based on the types of the DGs and the basic well-formed constraints and explains the features of the DF adopted in PDG(Hirakawa, 2006). This paper proposes the graph branch algorithm which searches the optimum dependency tree from a DF based on the branch and bound (B&B) method(Ibaraki, 1978) and reports the experiment showing the computational amount and behavior of the graph branch algorithm. As shown below, the combination of the DF and the graph branch algorithm enables the treatment of non-projective dependency analysis and optimum solution search satisfying the single valence occupation constraint, which are out of the scope of most of the DP(dynamic programming)based parsing methods.

2 Optimum Tree Search in a Scored DG
2.1 Basic Framework Figure 1 shows the basic framework of the optimum dependency tree search in a scored DG. In general, nodes in a DG correspond to words in the sentence and the arcs show some kind of dependency relations between nodes. Each arc has
Optimum Tree Search Algorithm s1

Well-formed dependency tree constraint D e p e n d e n cy Tree

s3 s4 s5

s2

Set of Scored Wellformed Dependency Trees
˘Ą 

Scored Dependency Graph

Well-formed Dependency Tree with the highest score

(score=s1+s2+s3+s4+s5 )

Figure 1: The optimum tree search in a scored DG

361
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 361­368, Sydney, July 2006. c 2006 Association for Computational Linguistics


a preference score representing plausibility of the relation. The well-formed dependency tree constraint is a set of well-formed constraints which should be satisfied by all dependency trees representing sentence interpretations. A DG and a wellformed dependency tree constraint prescribe a set of well-formed dependency trees. The score of a dependency tree is the sum total of arc scores. The optimum tree is a dependency tree with the highest score in the set of dependency trees. 2.2 Dependency Graph

(C3) Projectivity constraint(PJC): No arc crosses another arc5 (C4) Single valence occupation constraint(SVOC): No two arcs in a tree occupy the same valence of a predicate (C1) and (C2), collectively referred to as "covering constraint", are basic constraints adopted by almost all dependency parsers. (C3) is adopted by the majority of dependency parsers which are called projective dependency parsers. A projective dependency parser fails to analyze non-projective sentences. (C4) is a basic constraint for valency but is not adopted by the majority of dependency parsers. Graph search algorithms, such as the ChuLiu-Edmonds maximum spanning tree algorithm (Chu and Liu, 1965; Edmonds, 1967), algorithms based on the dynamic programming (DP) principle (Ozeki, 1994; Eisner, 1996) and the algorithm based on the B&B method (Hirakawa, 2001), are used for the optimum tree search in scored DGs. The applicability of these algorithms is closely related to the types of DGs and/or well-formedness constraints. The Chu-Liu-Edmonds algorithm is very fast (Ç ´Ňž ľ for sentence length Ň), but it works correctly only on word DGs. DP-based algorithms can satisfy (C1)-(C3) and run efficiently, but seems not to satisfy (C4) as shown in 2.4. (C2)-(C4) can be described as a set of cooccurrence constraints between two arcs in a DG. As described in Section 2.6, the DF can represent (C2)-(C4) and more precise constraints because it can handle co-occurrence constraints between two arbitrary arcs in a DG. The graph branch algorithm described in Section 3 can find the optimum tree from the DF. 2.4 SVOC and DP (Ozeki and Zhang, 1999) proposed the minimum cost partitioning method (MCPM) which is a partitioning computation based on the recurrence equation where the cost of joining two partitions is the cost of these partitions plus the cost of combining these partitions. MCPM is a generalization of (Ozeki, 1994) and (Katoh and Ehara, 1989) which compute the optimum dependency tree in a scored DG. MCPM is also a generalization of the probabilistic CKY algorithm and the Viterbi algo5 Another condition for projectivity, i.e. "no arc covers top node" is equivalent to the crossing arc constraint if special root node , which is a governor of top node, is introduced at the top (or end) of a sentence.

DGs are classified into some classes based on the types of nodes and arcs. This paper assumes three types of nodes, i.e. word-type, WPP-type2 and concept-type3 . The types of DGs are called a word DG, a WPP DG and a concept DG, respectively. DGs are also classified into non-labeled and labeled DGs. There are some types of arc labels such as syntactic label (ex. "subject","object") and semantic label (ex. "agent","target"). Various types of DGs are used in existing systems according to these classifications, such as non-label word DG(Lee and Choi, 1997; Eisner, 1996; McDonald et al., 2005)4 , syntactic-label word DG (Maruyama, 1990), semantic-label word DG(Hirakawa, 2001), non-label WPP DG(Ozeki, 1994; Katoh and Ehara, 1989), syntactic-label WPP DG(Wang and Harper, 2004), semantic-label concept DG(Harada and Mizuno, 2001). 2.3 Well-formedness Constraints and Graph Search Algorithms

There can be a variety of well-formedness constraints from very basic and language-independent constraints to specific and language-dependent constraints. This paper focuses on the following four basic and language-independent constraints which may be embedded in data structure and/or the optimum tree search algorithm. (C1) Coverage constraint: Every input word has a corresponding node in the tree (C2) Single role constraint(SRC): No two nodes in a dependency tree occupy the same input position
2 WPP is a pair of a word and a part of speech (POS). The word "time" has WPPs such as "time/n" and "time/v". 3 One WPP (ex. "time/n") can be categorized into one or more concepts semantically (ex. "time/n/period time" and "time/n/clock time"). 4 This does not mean that these algorithms can not handle labeled DGs.

362


target2,10 agent1,15 Isha-mo (doctor) Wakaranai (not_know) target4,7 agent3,5

in-state7,10

OS3[22]: (agent1,15) + (target4,7) OS4[25]: (agent5,15) + (in-state7,10)

NOS1[10]: (target2,10) OS1[15]: (agent1,15)

OS4[25]: (agent5,15) + (in-state7,10) NOS2[20]: (target4,10) + (in-state7,10)

Meaning of Arc Name sub : subject obj : object npp : noun-preposition vpp : verb-preposition pre : preposition nc : noun compound det : determiner rt : root

Well-formed optimum solutions for covering whole phrase

Figure 2: Optimum tree search satisfying SVOC
§ŚĽ¤Ł

Constraint Matrix

rithm6 . The minimum cost partition of the whole sentence is calculated very efficiently by the DP principle. The optimum partitioning obtained by MCPM constitutes a tree covering the whole sentence satisfying the SRC and PJC. However, it is not assured that the SVOC is satisfied by MCPM. Figure 2 shows a DG for the Japanese phrase "Isha-mo Wakaranai Byouki-no Kanja" encompassing dependency trees corresponding to "a patient suffering from a disease that the doctor doesn't know", "a sick patient who does not know the doctor", and so on. Ç Ë˝ -Ç Ë represent the optimum solutions for the phrases specified by their brackets computed based on MCPM. For example, Ç Ëż gives an optimum tree with a score of žž (consisting of ŇŘ˝ and Ř Ö Ř ) for the phras e "Isha-mo Wakaranai Byouki-no". The optimum s oolution for the whole phrase is either Ç Ë˝ ˇ Ç Ë r Ç Ëż ˇ Ç Ëž due to MCPM. The former has the highest score ź´ ˝ ˇ ž ľ but does not satisfy the SVOC because it has ŇŘ˝ and ŇŘ simultaneously. The optimum solutions satisfying the SVOC are Ć Ç Ë˝ ˇ Ç Ë and Ç Ë˝ ˇ Ć Ç Ëž shown at the bottom of Figure 2. Ć Ç Ë˝ and Ć Ç Ëž are not opti mum solut ions for their word coverages. This shows that it is not assured that MCPM will obtain the optimum solution satisfying the SVOC. On the contrary, it is assured that the graph branch algorithm computes the optimum solution(s) satisfying the SVOC because it computes the optimum solution(s) satisfying any cooccurrence constraints in the constraint matrix. It is an open problem whether an algorithm based on the DP framework exists which can handle the SVOC and arbitrary arc co-occurrence constraints.
Specifically, MTCM corresponds to probabilistic CKY and the Viterbi algorithm because it computes both the optimum tree score and its structure.
6

Figure 3: Scored dependency forest

2.5 Semantic Dependency Graph (SDG) The SDG is a semantic-label word DG designed for Japanese sentence analysis. The optimum tree search algorithm searches for the optimum tree satisfying the well-formed constraints (C1)-(C4) in a SDG(Hirakawa, 2001). This method is lacking in terms of generality in that it cannot handle backward dependency and multiple WPP because it depends on some linguistic features peculiar to Japanese. Therefore, this method is inherently inapplicable to languages like English that require backward dependency and multiple POS analysis. The DF described below can be seen as the extension of the SDG. Since the DF has none of the language-dependent premises that the SDG has, it is applicable to English and other languages. 2.6 Dependency Forest (DF) The DF is a packed shared data structure encompassing all possible dependency trees for a sentence adopted in PDG. The DF consists of a dependency graph (DG) and a constraint matrix (CM). Figure 3 shows a DF for the example sentence "Time flies like an arrow." The DG consists of nodes and directed arcs. A node represents a WPP and an arc shows the dependency relation between nodes. An arc has its ID and preference score. CM is a matrix whose rows and columns are a set of arcs in DG and prescribes the cooccurrence constraint between arcs. Only when CM(i,j) is , Ö and Ö are co-occurrable in one dependency tree. The DF is generated by using a phrase structure parser in PDG. PDG grammar rule is an extended CFG rule, which defines the mapping between a sequence of constituents (the body of a CFG rule) and a set of arcs (a partial dependency tree).
363

$ % %% % % $ $ $ $ %% %% %% %% % % % % $% % % % %$ %%%% % % $% % %%%%%$%%%%% % % %$ % %% % $ % % % $ % % %% $ % % %% % $ %% %% %% %% % % % % % % % % % %% %      # "  !    

 %  %  %  %  %  # % % " % %  ! %   %   $ $ $ $     $  

OS1[15]: (agent1,15)

OS2[10]: (in-state7,10)

0,time/v obj4

1,fly/n sub23 2,like/v

    
¨

Š Š Š Š Š

Š¨ Š¨ Š¨ Š¨ Š¨

Š

Byouki-no (sickness)

Kanja (patient)

nc2

npp19

rt29

obj16

Š¨


¨

0,time/n

sub24

1,fly/v

rt31 vpp18

2,like/p

3,an/det

¨

    
target6,5 agent5,15

rt32

vpp20

root pre15

det14

4,arrow/n Dependency Graph

˘Ą 


Figure 4: Graph branch algorithm
˘Ą 

(1) Partial-problem Partial-problem Č in the graph branch algorithm is a problem searching for all the wellformed optimum trees in a DF consisting of the dependency graph and con strain t matrix Ĺ . Č consi sts of the following elements . ( (a) Depend ency graph Ĺ (b) Constr aint matrix (c) Feasibl e solut ion value Ä (d) Upper boun d value Í ČÄ T e) Inco nsiste nt arc pair list Á he constraint matrix is common to all partialproblems, so one Ĺ is shared by all partialproblems. is repres ented by "Ö Ń " which shows a set of arcs to be removed from the whole dependency graph . For example , "Ö Ń " represents a partial dependency graph   in  . Á Č Ä is a list of the case inconsistent arc pairs. An inconsistent arc pair is an arc pair which does not satisfy some cooccurrence constraint.
364

3.2

Graph Branch Algorithm

The graph branch algorithm is obtained by defining the components of the original B&B skeleton algorithm, i.e. the partial-problem, the feasible solution, the lower bound value, the upper bound value, the branch operation, and so on(Ibaraki, 1978). Figure 4 shows the graph branch algorithm which has been extended from the original B&B skeleton algorithm to search for all the optimum trees in a DF. The following sections explain the B&B components of the graph branch algorithm.
7 The dominance test is not used in the graph branch algorithm.

7 ŠBD  "

) 0 P G  @ §  C  "   "          ¨ Š    C ¨ D

6

) 0 3 ¨ Ł &  §  ¨ Š  § ¨    Š2 e % 0 )

i  shr xy si

7 3 P G  @  ¨ Ł 2 D §    B D    A 4Ľ

6 §  ¨ Š '  "  § "  D

 Š  A 7 6 ¨ Ł5

) 0 3 §  ¨ Š      A § ¨ D §   2 c % 0 ) Ľ D §   

7   Š B D    "  Š  A 7 6 ¨ Ł5

w x whr v s q ur ts r qi p ih g 5

8 Š " ¨ P       Ł C ¨ D G 9 Ł  4Ľ

¤Ł

     Ł 5

) 0 3   Š "2 f % 0 ) Ľ Š ¨ F 

Ľ       B  Š  §¨    Š

Š " ¨ P       Ł C ¨ D G

 Š §  d 0)

3 1 8 4 4 (2 & ¨

8 Ł  4Ľ

 " 

Ł

Ł

The B&B method(Ibaraki, 1978) is a principle for solving computationally hard problems such as NP-complete problems. The basic strategy is that the original problem is decomposed into easier partial-problems (branching) and the original problem is solved by solving them. Pruning called a bound operation is applied if it turns out that the optimum solution to a partial-problem is inferior to the solution obtained from some other partialproblem (dominance test)7 , or if it turns out that a partial-problem gives no optimum solutions to the original problem (maximum value test). Usually, the B&B algorithm is constructed to minimize the value of the solution. The graph branch algorithm in this paper is constructed to maximize the score of the solution because the best solution is the maximum tree in the DF.

6 6 7       B  Š  § ¨    Š  Š  A 5

6 7 D §   ŠA 5

3 65

4b

P G  @2 & ¨

3.1

Outline of B&B Method

) 0 §  ¨ Š '  "  '  ¨ Š   U 4 C § '       '  Š "   ' Y  C § '     I  P 0 )

) 0 §  ¨ Š '  "

) 0 ¨  

)0     !¨   &   6 7 D §     Š  A 7 P Ł Ś 4Ľ P G  @ 5

Š §    ' §¨

) 0 3 " §  ¨ Š '  "


7 3 % H2 "  ! ¨ Š  §   Š   B D Š ¨ I B "  

Š §  Š "¨ " §  §¨

"  % H "¨ D Š C C  0 ) 7 $ `

 ' ¨ Š       D    " 2 a % 0 )

§ & 

5

§  Š " ¨ F  3 2

§  Š " ¨ F  3 2

3 65

3@9

4b

6 % H5

4 4 @ P2 & ¨ "  

P Ł Ś 2 & ¨

4Ľ

P G @

 " 

4Ľ

0)

0)

$

This section shows the graph branch algorithm based on the B&B principle, which searches for the optimum well-formed tree in a DF by applying problem expansions called graph branching.

)0

)0

" " 

Q § ¨ Š      D §    D    A  Š '  F   C § '       ' §  D Š 0 )

"¨

3 §  ¨ Š '  "    ¨ "   & &     "2

6 7 D §     Š  A 7 P Ł Ś

4Ľ

P G @ 5

C § '     I 

3@9

X

&Ś

@ P2 & ¨

0)

3 The Optimum Tree Search in DF

) 0 Q         D Š  Š  § ¨    Š  C § '  & §  ¨ Š '  "   ¨ "   &  § & Ś

)0

)0

 T §  D Š   Š Š   " ¨ @ P & Ś Ľ 3  Š  C  '  '  ! Š §    ' § ¨ 2 S % 0 ) 6 7        B  Š  § ¨    Š  Š  A 5 3 §  ¨ Š '  " B  § 4 4 % H 2 & ¨

)0

) 0   I 

Q  '  ! Š §    ' § ¨

 '  ! Š §    ' § Ś ) 0 Q P Ł Ś DŠ C§ Š "¨ 

         ¨ Š   Ł Ľ Q¨ Ł  & % H §  ¨ Š '  " 6 7 6 % H5 Ľ ¨   # Š "¨  7 65 4Ľ  4Ľ

) 0 ¤Ł &  C§ '     9

6 7       B  Š  § ¨    Š  Š  A 5

) 0 3  ' Š  "  '  !   ¨ Š¨ § ¨ 2 1 % 0 ) Ľ Š   Š "

6 7 3 Ł 2        B Š    "

C §  §  ¨ Š '  " Š §    ' § ¨

(  " §  ¨ Š '  " $ 7 @P

3%H &    "  DŠ

6 7 Š¨ F   Š  A 5

) 0 Ľ 3 Š "  Š C § '       '2 V % 0 )

) 0 3 D    "2 E % 0 ) Ľ   Š B D    "

         ¨ Š     !¨ Š Š §  Š "¨ " §  §¨ 7 3¨ Ł 2 R & B Š  A 4Ľ 7 3 ¨ Ł 2    ¨ B Š  A 4Ľ $ 7 1 8 4 ( 7 6 ¤ Ł5 Š §    ' §¨ 4Ľ 42 (5 3 65 @P 4Ľ 3( 3( 44 4Ľ

  ¨ "   &  D Š  Š '    G 0 )

¨ Ł          ¨ Š ¨ §Ś Ľ 0 ) 7 3 ¤ Ł2  ' B Š  A 4 @ 9 Š'G 0)  Š C ' 0 ) ¨ Ł 5  "  &  Š% Ľ C§ '  0 ) 3 @ P  % H2 U @ P2 & ¨ W @ 92 & ¨ Ł 2 & ¨ P Ł Ś Ł 0)

 Ľ Ł $

¤Ł

The generated CM assures that the parse trees in the parse forest and the dependency trees in the DF have mutual correspondence(Hirakawa, 2006). CM can represent (C2)-(C4) in 2.3 and more precise constraints. For example, PDG can generate a DF encompassing non-projective dependency trees by introducing the grammar rules defining non-projective constructions. This is called the controlled non-projectivity in this paper. Treatment of non-projectivity as described in (Kanahe et al., 1998; Nivre and Nilsson, 2005) is an important topic out of the scope of this paper.


(3) Algorithm for Obtaining Upper Bound Given a set of arcs which is a subset of , 9 of arcs in if the set of dependent nodes satisfies the covering constraint, the arc set is called the well-covered arc set. The maximum well-covered arc set is defined as a well-covered arc set with the highest score. In general, the maximum wellcovered arc set does not satisfy the SRC and does not form a tree. In the graph branch algorithm, the score of the maximum well-covered arc set of a dependency graph is assigned as the upper bound value Í of the partial-problem Č . Upper bound function Ř Ů calculates Í by scanning the arc lists sorted by the surface position of the dependent nodes of the arcs. (4) Branch Operation Figure 5 shows a branch operation called a graph branch operation. Child partial-problems of Č are con struct ed as follows: (a) Search for an inconsistent arc pair ´ Ö Ö  in the maximum well-covered arc set of the DG of Č . (b) Create child partial-problems Č , Č which have new DGs   Ö and   Ö respectively.
ľ

(5) Selection of Partial-problem × Đ Ř ÔÖ Ó Đ Ń employ s the best bou nd sear ch strategy, i.e. it selects the partial-problem which has the maximum bound value among the active partial-problems. It is known that the number of partial-problems decomposed during computation is minimized by this strategy in the case that no dominance tests are applied (Ibaraki, 1978). (6) Computing All Optimum Solutions In order to obtain all optimum solutions, partialproblems whose upper bound values are equal to the score of the optimum solution(s) are expanded at Ë in Figure 4. In the case that at least one inconsistent arc pair remains in a partial-problem (i.e. Á Č Ä ), graph branch is performed based on the inconsistent arc pair. Otherwise, the obtained optimum solution Ë is checked if one of the arcs in Ë has an equal rival arc by Ö× Ű Ř ĐŘ Ö Ň Ř Ú × func tion. The equa l rival arc of arc is an arc whose position and score are equal to those of arc . If an equal rival arc of an arc in Ë exists, a new partial-problem is generated by removing the arc in Ë . Ë assures that no partial-problem has an upper bound value
365

Since a solution to Č cannot have both Ö and Ö simulta neous ly due to the co-o ccurre nce con straint, the optimum solution of Č is obtained from either/both Č or/and Č . The child partialA feasible solution may not be optimum but is a possible interpretation of a sentence. Therefore, it can be used as an approximate output when the search process is aborted. 9 The dependent node of an arc is the node located at the source of the arc.
8

˘Ą 

(2) Algorithm for Obtaining Feasible Solution and Lower Bound Value In the graph branch algorithm, a well-formed dependency tree in the dependency graph of the partial-problem Č is assigned as the feasible solution Ë of Č 8 . The score of the feasible solution Ë is assigned as the lower bound value Ä . The function for computing these values Ř × is called a feasible solution/lower bound value function. The details are not shown due to space limitations, but Ř × is realized by the backtrackbased depth-first search algorithm with the optimization based on the arc scores. Ř × assures that the obtained solution satisfies the covering constraint and the arc co-occurrence constraint. The incumbent value Ţ (the best score so far) is replaced by the Ä at Ë ż in Figure 4 if needed.

arci
Remove arcj

DG: Dependency graph of parent problem

arcj

Remove arci

arci arcj
DGi: Dependency graph for child problem Pi DGj: Dependency graph for child problem Pj

Figure 5: Graph branching problem is easier than the parent partial-problem because the size of the DG of the child partialproblem is less than that of its parent. In Figure 4, Ř ÔĐ computes the list of inconsistent arc pairs Á Č Ä(Inconsistent Arc Pair List) for the maximum well-covered arc set of Č . Then the graph branch function Ö Ô Ö Ň selects one inconsistent arc pair ´ Ö Ö  ľ from Á Č Ä for branch operation. The selection criteria for ´ Ö Ö  ľ affect s the efficiency of the algo rithm. ÖÔ Ö Ň selects the incon sisten t arc pair containing the highest score arc in Ä(Branch Arc Candidates List). Ö Ô Ö Ň calculates the upper bound value for a child partial-problem by Ř Ů and sets it to the child partial-problem.


P0
$    # $   Š  # $ Š   # $    #¤ Ł  " ! § Ľ Ľ Ľ $Ą @& 9 8 7 ' & # $Ą @ & 9 8 7 ' & #    Š      ¤    Ł Ľ  Š   ¤ Ł Ľ   Š  ¨    ¤   Ł Ľ  ¤ Ł Ľ (1 0 ) Ą ( ' & Ł    % Ł 6Ľ  Š  ¨    ¤ Ł      Ł   Š      ¤  Š ¨ Ł   Š      ¤  Š ¨ Ł Ľ ¤Ł Ľ ˘ Ą  §Ś §

P1
$   Š  # $ Š    # $ Š   #¤ Ł  " ! §   " !§ Ľ   Š      ¤Ł     Ł § Ľ   Š       ¤  ¨ Ł Ľ ¤ Ł Ľ ˘Ą   §Ś

P2
$    # $   ¨ # $   Š  #¤ Ł  " ! §    " !§ Ľ ˘Ą   §Ś §

P3
$    # $    #¤ Ł  " !§  " !§ Ľ         Š  ¤ Ł      Ł § Ľ         ¤    Ł Ľ   Š ¤ Ł Ľ ˘ Ą  §Ś

P4
¤ Ł  " !§  " !§ Ľ (1 0 ) Ą ( ' & Ł    % Ł § ˘Ą   §Ś

search diagram). Ë ż´ ŇŮŃ ŇŘ Ú ĐŮ ŮÔ Ř ľ updates Ţ and Ç to new values. Then, Ř ÔĐ´Čź ľ comput es the incon sisten t arc pair list ´ž ˝ ľ ´˝ žżľ ´žż ˝ ľ ´ž ˝ ľ from the maximum well-covered arc set ˝ ž ˝ žż ˝  and set it to Á Č Ä. Ë ´Ń Ü ŃŮŃ Ú ĐŮ Ř ×Řľ compares the upper bound value Í and the feasi h ble solut ion valu e Ä . In this case, Ä Í olds, so Ä is assig ned the valu e of Á Č Ä. The next step Ë ´ Ö Ň ÓÔ Ö Ř ÓŇľ executes the
Ö Ô Ö Ň function. Ö Ô Ö Ň selects

Figure 6: Search diagram greater than or equal to the score of the optimum solutions when the computation stopped.

4 Example of Optimum Tree Search
This section shows an example of the graph branch algorithm execution using the DF in Fig.3. 4.1 Example of Graph Branch Algorithm

The search process of the B&B method can be shown as a search diagram constructing a partialproblem tree representing the parent-child relation between the partial-problems. Figure 6 is a search diagram for the example DF showing the search process of the graph branch algorithm. In this figure, box Č is a partial-problem with its dependency graph Ö Ń, upper bound value a Í , feasib le solu tion and lower boun d value Ä nd inconsistent arc pair list Á Č Ä. Suffix of Č i ndicates the generation order of partial-problems. Updating of global variable Ţ (incumbent value) and Ç (set of incumbent solutions) is shown under the box. The value of the left-hand side of the arrow is updated to that of right-hand side of the arrow during the partial-problem processing. Details of the behavior of the algorithm in Figure 4 are described below. In Ë ˝´ Ň Ř Đ Ţ ľ, Ţ , Ç and Č are set to  ˝, and Čź respectively. The DG of Čź is that of the example DF. This is represented by ÖŃ . ŘŮ sets the upp er bound valu e (=63) of Čź to Í . In practice, this is calculated by obtaining the maximum well-covered arc set of Čź . In Ë ž´× Ö ľ, × Đ Ř ÔÖ Ó Đ Ń selects Čź and Ř ×´Čź ľ is execut ed. The feasi ble solution Ë and its score Ä are calculated to set Ë ˝ ž ˝ žż ž , Ä ź (Čź in the
366

   Š      ¤ 5Ł 4  4 6 Ľ $ Ą @ & 9 8 7 ' & #  Ł 2

$Ą @& 9 8 7 ' &#   Ł $Ą @& 9 8 7 ' &#   Ł 2

 Š  ¨    ¤ 5 Ł 4

6Ľ

 Š  ¨    ¤ 5 3

   Š      ¤ 5 3 6 Ľ  Š  ¨    ¤ 5 Ł 4   4 6 Ľ   3  Ł 2

5 Ł4 6  3  % Ł  3  % Ł 2 $ Ą @& 9 8 7 ' & # $ Ą @& 9 8 7 ' & #    Š     ¤ 5Ł    Š      ¤ 5Ł 4 6Ľ  $ Ą @ & 9 8 7 ' & #  Ł 2

the arc pair with the highest arc score and performs the graph branch operation with the selected arc pair. The following is a Ä shown with the arc names and arc scores.
´ ´Ňž

˝

ÔÖ ˝

˝źľ ´ÔÖ

˝

˝ź

×Ů žż ÚÔÔ˝

˝źľ ľ

×Ů žż

˝ź

ÚÔÔ˝

ľ ´Ňž ˝ 

Scores are shown in . The arc pair containing the highest arc score is ´ž ˝ ľ and ´ž ˝ ľ containing Ňž ˝ . Here, ´ž ˝ ľ is selected and partial-problems Č˝ ´Ö Ń žľ and Čž ´Ö Ń ˝ ľ are generated. Čź is removed from Č and the new two partial-problems are added to Č resulting in Č Č˝ Čž . Then, based on the best bound search strategy, Ë ž´× Ö ľ is tried again. Č˝ upda tes Ţ and Ç beca use Č˝ obtain ed a feasible solution better than that in Ç obtained by Čź . Čž and Č are terminated because they have no feasible solution. Čż generates a feasible solution but Ţ and Ç are not updated. This is because the obtained feasible solution is inferior to the incumbent solution in Ç . The optimum solution(= ˝ ž ˝ ż˝ ˝  ) is obtained by Č˝ . The computation from Čž to Č is required to assure that the feasible solution of Č˝ is optimum.

5 Experiment
This section describes some experimental results showing the computational amount of the graph branch algorithm. 5.1 Environment and Performance Metric An existing sentence analysis system10 (called the oracle system) is used as a generator of the test corpus, the preference knowledge source and the correct answers. Experiment data of 125,320 sentences11 extracted from English technical docu10 A real-world rule-based machine translation system with a long development history 11 Sentences ending with a period and parsable by the oracle system.


Figure 7: EPN-T, EPN-F EPN-F and OSN system for 136 sentences in this sentence set is 97.2% with respect to human analysis results. All optimum trees are computed by the graph branch algorithm described in Section 3.2. Figure 7 shows averages of EPN-T, EPN-L, EPN-F and OSN with respect to sentence length. Overall averages of EPN-T, EPN-L, EPN-F and OSN for the test sentences are 3.0, 1.67, 1.43 and 1.15. The result shows that the average number of problems required is relatively small. The gap between Ave:EPN-T and Ave:EPN-L (3.0-1.67=1.33) is much greater than the gap between Ave:EPN-L and Ave:OSN(1.67-1.15=0.52). This means that the major part of the computation is performed only for checking if the obtained feasible solutions are optimum or not. According to (Hirakawa, 2001), the experiment on the B&B-based algorithm for the SDG shows the overall averages of AVE:EPN-T, AVE:EPNF are 2.91, 1.33 and the average CPU time is 305.8ms (on EWS). These values are close to those in the experiment based on the graph branch algorithm. Two experiments show a tendency for the optimum solution to be obtained in the early stage of the search process. The graph branch algorithm is expected to obtain the comparable performance with the SDG search algorithm. (Hirakawa, 2001) introduced the improved upper bound function g'(P) into the B&B-based algorithm for the SDG and found Ave:EPN-T is reduced from 2.91 to 1.82. The same technique is introduced to the graph branch algorithm and has obtained the reduction of the Ave:EPN-T from 3.00 to 2.68. The tendency for the optimum solution to be obtained in the early stage of the search process suggests that limiting the number of problems to expand is an effective pruning strategy. Figure 8 shows the ratios of the sentences obtaining the whole problem expansion, the first optimum solu-

(a) EPN in total (EPN-T): The number of the expanded problems which are generated in the entire search process. (b) EPN for the first optimum solution (EPN-F): The number of the expanded problems when the first optimum solution is obtained. (c) EPN for the last optimum solution (EPN-L): The number of the expanded problems when the last optimum solution is obtained. At this point, all optimum solutions are obtained. Optimum solution number (OSN) for a problem, i.e. the number of optimum dependency trees in a given DF, gives the lower bound value for all these metrics because one problem generates at most one solution. The minimum value of OSN is 1 because every DF has at least one dependency tree. As the search process proceeds, the algorithm finds the first optimum solution, then the last optimum solution, and finally terminates the process by confirming no better solution is left. Therefore, the three metrics and OSN have the relation OSN EPN-F EPN-L EPN-T. Average values for these are described as Ave:OSN, Ave:EPNF, Ave:EPN-L and Ave:EPN-T. 5.2 Experimental Results

An evaluation experiment for the open data is performed using a prototype PDG system implemented in Prolog. The sentences containing more than 22 words are neglected due to the limitation of Prolog system resources in the parsing process. 4334 sentences out of the remaining 6882 test sentences are parsable. Unparsable sentences (2584 sentences) are simply neglected in this experiment. The arc precision ratio12 of the oracle
12 Correct arc ratio with respect to arcs in the output dependency trees (Hirakawa, 2005).

367

    Ľ Ľ ¤ Ľ Ł Ľ ¤ ¤ ¤ Š¤ ¨¤ §¤ Ś¤ Ľ¤ ¤ ¤ Ł¤

ED DC BA @ 9 87 6 5 4 3 2 EI H CG A @ 9 87 F 5 4 3 2 E B QCG A @ 9 87 P 5 4 3 2 ETG CG A @ 9 87 4 S R    Š ¨ § Ś ¤ ¨ Ś Ľ § Ł

˘Ą 

ments is divided into open data (8605 sentences) and closed data (116,715 sentences). The preference score source, i.e. the WPP frequencies and the dependency relation frequencies are obtained from the closed data. The basic PDG grammar (907 extended CFG rules) is used for generating the DFs for the open data. The expanded problem number (EPN), a principal computational amount factor of the B&B method, is adopted as the base metric. The following three metrics are used in this experiment.

Š % 1 0 )(" ' & % $ #" !  


Figure 8: ARs for EPS-F, EPS-A, EPS-T tion and the last optimum solution to whole sentences with respect to the EPNs. This kind of ratio is called an achievement ratio (AR) in this paper. From Figure 8, the ARs for EPN-T, EPN-L, EPNF (plotted in solid lines) are 97.1%,99.6%,99.8% respectively at the EPN 10. The dotted line shows the AR for EPN-T of the improved algorithm using g'(P). The use of g'(P) increases the AR for EPN-T from 97.1% to 99.1% at the EPN 10. However, the effect of g'(P) is quite small for EPNF and EPN-L. This result shows that the pruning strategy based on the EPN is effective and g'(P) works for the reduction of the problems generated in the posterior part of the search processes.

6 Concluding Remarks
This paper has described the graph branch algorithm for obtaining the optimum solution for a DF used in PDG. The well-formedness dependency tree constraints are represented by the constraint matrix of the DF, which has flexible and precise description ability so that controlled nonprojectivity is available in PDG framework. The graph branch algorithm assures the search for the optimum trees with arbitrary arc co-occurrence constraints, including the SVOC which has not been treated in DP-based algorithms so far. The experimental result shows the averages of EPNT, EPN-L and EPN-F for English test sentences are 3.0, 1.67 and 1.43, respectively. The practical code implementation of the graph branch algorithm and its performance evaluation are subjects for future work.

References
Y. J. Chu and T. H. Liu. 1965. On the shortest arborescence of a directed graph. Science Sinica, 14. J. Edmonds. 1967. Optimum branchings. Journal of Research of the National Bureau of Standards, 71B:233­240.


3 2 9 8 ! 7 6 5' 4 % $ # " !   

3 2 ! 0 2 1 0 ) ('  % $ # " !   

3 2 ! 0 2 1 0 ) (' 4 % $ # " !   

¨Š

3 2 ! 0 2 1 0 ) (' & % $ # " !   

Š

§Š

Š

ŠŠ

¨


§


Š

 Ś Ľ ĽŠ Ś§¨ ŚĽ¨ Ś§¤ ŚĽ¤ Ł˘ Ą  

J. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of COLING'96, pages 340­345. M. Harada and T. Mizuno. 2001. Japanese semantic analysis system sage using edr (in japanese). Transactions of JSAI, 16(1):85­93. H. Hirakawa. 2001. Semantic dependency analysis method for japanese based on optimum tree search algorithm. In Proceedings of the PACLING2001. H. Hirakawa. 2005. Evaluation measures for natural language analyser based on preference dependency grammar. In Proceedings of the PACLING2005. H. Hirakawa. 2006. Preference dependency grammar and its packed shared data structure 'dependency forest' (in japanese). To appear in Natural Language Processing, 13(3). T. Ibaraki. 1978. Branch-and-bounding procedure and state-space representation of combinatorial optimization problems. Information and Control, 36,127. S. Kanahe, A. Nasr, and O. Rambow. 1998. Pseudo-projectivity: A polynomially parsable nonprojective dependency grammar. In COLINGACL'98, pages 646­52. N. Katoh and T. Ehara. 1989. A fast algorithm for dependency structure analysis (in japanese). In Proceedings of 39th Annual Convention of the Information Processing Society. S. Lee and K. S. Choi. 1997. Reestimation and bestfirst parsing algorithm for probablistic dependency grammars. In Proceedings of the Fifth Workshop on Very Large Corpora, pages 41­55. H. Maruyama. 1990. Constraint dependency grammar and its weak generative capacity. Computer Software. R. McDonald, F. Pereira, K. Ribarov, and J. Hajic. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLTEMNLP, pages 523­530. J. Nivre and J. Nilsson. 2005. Pseudo-projective dependency parsing. In ACL-05, pages 99­106. K. Ozeki and Y. Zhang. 1999.  . In Proceeding of the Workshop of The Fifth Annual Meeting of The Association for Natural Language Processing, pages 9­14. K. Ozeki. 1994. Dependency structure analysis as combinatorial optimization. Information Sciences, 78(1-2):77­99. W. Wang and M. P. Harper. 2004. A statistical constraint dependency grammar (cdg) parser. In Workshop on Incremental Parsing: Bringing Engineering and Cognition Together (ACL), pages 42­49.

368