Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution

Ryu Iida, Kentaro Inui and Yuji Matsumoto
Graduate School of Information Science, Nara Institute of Science and Technology
8916-5 Takayama, Ikoma, Nara, 630-0192, Japan
{ryu-i,inui,matsu}@is.naist.jp

Abstract

We approach the zero-anaphora resolution problem by decomposing it into intra-sentential and inter-sentential zero-anaphora resolution. For the former problem, syntactic patterns of the appearance of zero-pronouns and their antecedents are useful clues. Taking Japanese as a target language, we empirically demonstrate that incorporating rich syntactic pattern features in a state-of-the-art learning-based anaphora resolution model dramatically improves the accuracy of intra-sentential zero-anaphora resolution, which consequently improves the overall performance of zero-anaphora resolution.

1 Introduction

Zero-anaphora is a gap in a sentence that has an anaphoric function similar to a pro-form (e.g. a pronoun) and is often described as "referring back" to an expression that supplies the information necessary for interpreting the sentence. For example, in the sentence "There are two roads to eternity, a straight and narrow, and a broad and crooked," the gaps in "a straight and narrow (gap)" and "a broad and crooked (gap)" have a zero-anaphoric relationship to "two roads to eternity."

The task of identifying zero-anaphoric relations in a given discourse, zero-anaphora resolution, is essential in a wide range of NLP applications. This is the case particularly in a language such as Japanese, where even obligatory arguments of a predicate are often omitted when they are inferable from the context. In fact, in our Japanese newspaper corpus, for example, 45.5% of the nominative arguments of verbs are omitted. Since such gaps cannot be interpreted by shallow syntactic parsing alone, a model specialized for zero-anaphora resolution needs to be devised on top of shallow syntactic and semantic processing.

Recent work on zero-anaphora resolution can be located in two different research contexts. First, zero-anaphora resolution is studied in the context of anaphora resolution (AR), in which zero-anaphora is regarded as a subclass of anaphora. In AR, the research trend has been shifting from rule-based approaches (Baldwin, 1995; Lappin and Leass, 1994; Mitkov, 1997, etc.) to empirical, or corpus-based, approaches (McCarthy and Lehnert, 1995; Ng and Cardie, 2002a; Soon et al., 2001; Strube and Müller, 2003; Yang et al., 2003), because the latter have been shown to be a cost-efficient solution achieving performance comparable to the best-performing rule-based systems (see the Coreference task in MUC (http://www-nlpir.nist.gov/related_projects/muc/) and the Entity Detection and Tracking task in the ACE program (http://projects.ldc.upenn.edu/ace/)). The same trend is also observed in Japanese zero-anaphora resolution, where the findings made in rule-based or theory-oriented work (Kameyama, 1986; Nakaiwa and Shirai, 1996; Okumura and Tamura, 1996, etc.) have been successfully incorporated in machine learning-based frameworks (Seki et al., 2002; Iida et al., 2003). Second, the task of zero-anaphora resolution has some overlap with PropBank-style (http://www.cis.upenn.edu/~mpalmer/project_pages/ACE.htm) semantic role labeling (SRL), which has been intensively studied, for example, in the context of the CoNLL SRL task (http://www.lsi.upc.edu/~srlconll/).
In this task, given a sentence "To attract younger listeners, Radio Free Europe intersperses the latest in Western rock groups", an SRL model is asked to identify the NP Radio Free Europe as the A0 (Agent) argument of the verb attract. This can be seen as the task of finding the zero-anaphoric relationship between a nominal gap (the A0 argument of attract) and its antecedent (Radio Free Europe), under the condition that the gap and its antecedent appear in the same sentence.

In spite of this overlap between AR and SRL, there are some important findings that are yet to be exchanged between them, partly because the two fields have been evolving somewhat independently. The AR community has recently made two important findings:

· A model that identifies the antecedent of an anaphor by a series of comparisons between candidate antecedents has a remarkable advantage over a model that estimates the absolute likelihood of each candidate independently of other candidates (Iida et al., 2003; Yang et al., 2003).

· An AR model that carries out antecedent identification before anaphoricity determination, the decision whether a given NP is anaphoric or not (i.e. discourse-new), significantly outperforms a model that executes those subtasks in the reverse order or simultaneously (Poesio et al., 2004; Iida et al., 2005).

To the best of our knowledge, however, existing SRL models do not exploit these advantages. In SRL, on the other hand, it is common to use syntactic features derived from the parse tree of a given input sentence for argument identification. A typical syntactic feature is the path on a parse tree from a target predicate to the noun phrase in question (Gildea and Jurafsky, 2002; Carreras and Màrquez, 2005). However, existing AR models deal with intra- and inter-sentential anaphoric relations in a uniform manner; that is, they do not use syntactic features as rich as those used by state-of-the-art SRL models, even in finding intra-sentential anaphoric relations. We believe that the AR and SRL communities can learn more from each other.

Given this background, in this paper we show that combining the aforementioned techniques derived from each research trend makes a significant impact on zero-anaphora resolution, taking Japanese as a target language. More specifically, we demonstrate the following:

· Incorporating rich syntactic features in a state-of-the-art AR model dramatically improves the accuracy of intra-sentential zero-anaphora resolution, which consequently improves the overall performance of zero-anaphora resolution. This is to be considered a contribution to AR research.

· Analogously to inter-sentential anaphora, decomposing the antecedent identification task into a series of comparisons between candidate antecedents works remarkably well also in intra-sentential zero-anaphora resolution. We hope this finding will be adopted in SRL.

The rest of the paper is organized as follows. Section 2 describes the task definition of zero-anaphora resolution in Japanese. In Section 3, we review previous approaches to AR.
Section 4 describes how the proposed model effectively incorporates syntactic features into the machine learning-based approach. We then report the results of our experiments on Japanese zero-anaphora resolution in Section 5 and conclude in Section 6.

2 Zero-anaphora resolution

In this paper, we consider only zero-pronouns that function as an obligatory argument of a predicate, for two reasons:

· Providing a clear definition of zero-pronouns appearing in adjunctive argument positions involves awkward problems, which we believe should be postponed until obligatory zero-anaphora is well studied.

· Resolving obligatory zero-anaphora tends to be more important than resolving adjunctive zero-pronouns in actual applications.

A zero-pronoun may have its antecedent in the discourse; in this case, we say the zero-pronoun is anaphoric. On the other hand, a zero-pronoun whose referent does not explicitly appear in the discourse is called a non-anaphoric zero-pronoun. A zero-pronoun is typically non-anaphoric when it refers to an extralinguistic entity (e.g. the first or second person) or its referent is unspecified in the context.

The following are Japanese examples. In sentence (1), zero-pronoun φi is anaphoric as its antecedent, "shusho (prime minister)", appears in the same sentence. In sentence (2), on the other hand, φj is considered non-anaphoric if its referent (i.e. the first person) does not appear in the discourse.

(1) shushoi-wa houbeisi-te , ryoukoku-no gaikou-o (φi-ga) suishinsuru houshin-o akirakanisi-ta .
    prime ministeri-TOP visit-U.S.-CONJ PUNC both countries-BETWEEN diplomacy-OBJ (φi-NOM) promote-ADNOM plan-OBJ unveil-PAST PUNC
    "The prime minister visited the United States and unveiled the plan to push diplomacy between the two countries."

(2) (φj-ga) ie-ni kaeri-tai .
    (φj-NOM) home-DAT want to go back PUNC
    "(I) want to go home."

Given this distinction, we consider the task of zero-anaphora resolution as the combination of two sub-problems, antecedent identification and anaphoricity determination, which is analogous to NP-anaphora resolution: for each zero-pronoun in a given discourse, find its antecedent if it is anaphoric; otherwise, conclude it to be non-anaphoric.
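This distinction can be made concrete with a small sketch (ours, not the authors'; the class and field names are hypothetical): an anaphoric zero-pronoun carries a link to its antecedent, while a non-anaphoric one carries none.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ZeroPronoun:
        """A zero-pronoun instance: the governing predicate and the omitted case slot."""
        predicate: str                    # predicate whose argument is omitted
        case: str                         # omitted case slot, e.g. "ga" (nominative)
        antecedent: Optional[str] = None  # None means non-anaphoric

    # Example (1): the omitted nominative of "suishinsuru" (promote) is
    # anaphoric; its antecedent "shusho" appears in the same sentence.
    phi_i = ZeroPronoun(predicate="suishinsuru", case="ga", antecedent="shusho")

    # Example (2): the omitted nominative of "kaeri-tai" (want to go back)
    # refers to the extralinguistic first person, hence non-anaphoric.
    phi_j = ZeroPronoun(predicate="kaeri-tai", case="ga", antecedent=None)

    for phi in (phi_i, phi_j):
        status = "anaphoric" if phi.antecedent else "non-anaphoric"
        print(phi.predicate, phi.case, status)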
3 Previous work

3.1 Antecedent identification

Previous machine learning-based approaches to antecedent identification can be classified as either the candidate-wise classification approach or the preference-based approach. In the former approach (Soon et al., 2001; Ng and Cardie, 2002a, etc.), given a target anaphor, TA, the model estimates the absolute likelihood of each of the candidate antecedents (i.e. the NPs preceding TA), and selects the best-scored candidate. If all the candidates are classified negative, TA is judged non-anaphoric.

In contrast, the preference-based approach (Yang et al., 2003; Iida et al., 2003) decomposes the task into comparisons of the preference between candidates and selects the most preferred one as the antecedent. For example, Iida et al. (2003) propose a method called the tournament model. This model conducts a tournament consisting of a series of matches in which candidate antecedents compete with each other for a given anaphor. While the candidate-wise classification model computes the score of each single candidate independently of the others, the tournament model learns the relative preference between candidates, which has been empirically proved to be a significant advantage over candidate-wise classification (Iida et al., 2003).

3.2 Anaphoricity determination

There are two alternative ways for anaphoricity determination: the single-step model and the two-step model. The single-step model (Soon et al., 2001; Ng and Cardie, 2002a) determines the anaphoricity of a given anaphor indirectly as a by-product of the search for its antecedent. If an appropriate candidate antecedent is found, the anaphor is classified as anaphoric; otherwise, it is classified as non-anaphoric. One disadvantage of this model is that it cannot employ the preference-based model, because the preference-based model is not capable of identifying non-anaphoric cases.

The two-step model (Ng, 2004; Poesio et al., 2004; Iida et al., 2005), on the other hand, carries out anaphoricity determination in a separate step from antecedent identification. Poesio et al. (2004) and Iida et al. (2005) claim that the latter subtask should be done before the former. For example, given a target anaphor (TA), Iida et al.'s selection-then-classification model:

1. selects the most likely candidate antecedent (CA) of TA using the tournament model,

2. classifies TA paired with CA as either anaphoric or non-anaphoric using an anaphoricity determination model.

If the CA-TA pair is classified as anaphoric, CA is identified as the antecedent of TA; otherwise, TA is concluded to be non-anaphoric. The anaphoricity determination model learns the non-anaphoric class directly from non-anaphoric training instances, whereas the single-step model cannot use non-anaphoric cases in training.
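As an illustration of the preference-based approach, the following is a minimal sketch of the tournament model's decision procedure. The `prefer_right` callback stands in for the trained pairwise classifier; the toy distance-based preference used here is our own simplification, not the authors' feature model.

    from typing import Callable, List

    def tournament(candidates: List[str],
                   prefer_right: Callable[[str, str], bool]) -> str:
        """Run a series of pairwise matches over candidate antecedents.

        Candidates are visited in order; the current winner is compared
        with the next candidate, and the one preferred by the classifier
        survives. The final survivor is the predicted antecedent.
        """
        winner = candidates[0]
        for challenger in candidates[1:]:
            if prefer_right(winner, challenger):
                winner = challenger
        return winner

    # Toy preference: prefer the candidate closer to the anaphor
    # (a real model would compare rich features of both candidates).
    order = {"NP1": 1, "NP2": 2, "NP3": 3}
    print(tournament(["NP1", "NP2", "NP3"],
                     lambda left, right: order[right] > order[left]))  # -> NP3

Note how the model never assigns an absolute score to any single candidate: every decision is relative, which is exactly what distinguishes it from candidate-wise classification.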
4 Proposal

4.1 Task decomposition

We approach the zero-anaphora resolution problem by decomposing it into two subtasks: intra-sentential and inter-sentential zero-anaphora resolution. For the former problem, syntactic patterns in which zero-pronouns and their antecedents appear may well be useful clues, which, however, does not apply to the latter problem. We therefore build a separate component for each subtask, adopting Iida et al. (2005)'s selection-then-classification model for each component (the cascade is sketched in code after this list):

1. Intra-sentential antecedent identification: for a given zero-pronoun ZP in a given sentence S, select the most likely candidate antecedent C1 from the candidates appearing in S by the intra-sentential tournament model.

2. Intra-sentential anaphoricity determination: estimate the plausibility p1 that C1 is the true antecedent, and return C1 if p1 ≥ θintra (θintra is a preselected threshold), or go to 3 otherwise.

3. Inter-sentential antecedent identification: select the most likely candidate antecedent C2 from the candidates appearing outside of S by the inter-sentential tournament model.

4. Inter-sentential anaphoricity determination: estimate the plausibility p2 that C2 is the true antecedent, and return C2 if p2 ≥ θinter (θinter is a preselected threshold), or return non-anaphoric otherwise.
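The four steps amount to the cascaded control flow sketched below, under the assumption that the trained tournament and anaphoricity models are available as callables; all function names are hypothetical.

    def resolve_zero_pronoun(zp, intra_cands, inter_cands,
                             select_intra, plaus_intra,
                             select_inter, plaus_inter,
                             theta_intra, theta_inter):
        """Cascaded selection-then-classification over the two subtasks.

        select_* pick the best candidate by a tournament model; plaus_*
        return the plausibility that the selected candidate is the true
        antecedent; theta_* are the preselected thresholds.
        """
        # Steps 1-2: intra-sentential search, then anaphoricity check.
        if intra_cands:
            c1 = select_intra(zp, intra_cands)
            if plaus_intra(zp, c1) >= theta_intra:
                return c1
        # Steps 3-4: fall back to inter-sentential search.
        if inter_cands:
            c2 = select_inter(zp, inter_cands)
            if plaus_inter(zp, c2) >= theta_inter:
                return c2
        return None  # judged non-anaphoric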
4.2 Representation of syntactic patterns

In the first two of the above four steps, we use syntactic pattern features. Analogously to SRL, we extract the parse path between a zero-pronoun and its antecedent to capture the syntactic pattern of their occurrence. Among many alternative ways of representing a path, in the experiments reported in the next section we adopted the method described below, leaving the exploration of other alternatives as future work.

Given a sentence, we first use a standard dependency parser to obtain the dependency parse tree, in which words are structured according to the dependency relations between them. Figure 1(a), for example, shows the dependency tree of sentence (1) given in Section 2. We then extract the path between a zero-pronoun and its antecedent as in Figure 1(b). Finally, to encode the order of siblings and reduce data sparseness, we further transform the extracted path as in Figure 1(c):

· A path is represented by a subtree consisting of backbone nodes: ZP (zero-pronoun), Ant (antecedent), Node (the lowest common ancestor), LeftNode (left-branch node) and RightNode (right-branch node).

· Each backbone node has daughter nodes, each corresponding to a function word associated with it.

· Content words are deleted.

(Figure 1: Representation of the path between a zero-pronoun and its antecedent.)

This way of encoding syntactic patterns is used in intra-sentential anaphoricity determination. In antecedent identification, on the other hand, the tournament model allows us to incorporate three paths, one for each pair of the zero-pronoun and the left and right candidate antecedents, as shown in Figure 2. (To indicate which node belongs to which subtree, the label of each node is prefixed with L, R or I.)

(Figure 2: Paths used in the tournament model.)

4.3 Learning algorithm

As noted in Section 1, the use of zero-pronouns in Japanese is relatively less constrained by syntax compared, for example, with English. This forces the above way of encoding path information to produce an explosive number of different paths, which inevitably leads to serious data sparseness.

This issue can be addressed in several ways. The SRL community has devised a range of variants of the standard path representation to reduce the complexity (Carreras and Màrquez, 2005). Applying kernel methods such as tree kernels (Collins and Duffy, 2001) and hierarchical DAG kernels (Suzuki et al., 2003) is another strong option. The boosting-based algorithm proposed by Kudo and Matsumoto (2004) is designed to learn subtrees useful for classification. Leaving the question of selecting learning algorithms open, in our experiments we have so far examined Kudo and Matsumoto (2004)'s algorithm, which is implemented as the BACT system (http://chasen.org/~taku/software/bact/).

Given a set of training instances, each of which is represented as a tree labeled either positive or negative, the BACT system learns a list of weighted decision stumps with a boosting algorithm. Each decision stump is associated with a tuple ⟨t, l, w⟩, where t is a subtree appearing in the training set, l a label, and w a weight, indicating that if a given input includes t, it gives w votes to l. The strength of this algorithm is that it deals with structured features and allows us to analyze the utility of features.

In antecedent identification, we train the tournament model by providing a set of labeled trees as a training set, where a label is either left or right. Each labeled tree has (i) path trees TL, TR and TI (as given in Figure 2) and (ii) a set of nodes corresponding to the binary features summarized in Figure 3, each of which is linked to the root node as illustrated in Figure 4. This way of organizing a labeled tree allows the model to learn, for example, the combination of a subtree of TL and some of the binary features. Analogously, for anaphoricity determination, we use trees ⟨TC, f1, ..., fn⟩, where TC denotes a path subtree as in Figure 1(c).

(Figure 4: Tree representation of features for the tournament model.)
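To make the encoding concrete, the following sketch (our illustration; the node labels and the bracketed serialization are assumptions rather than BACT's exact input format) builds a transformed path like that of Figure 1(c) as a nested structure and flattens it into the kind of labeled tree string a subtree-learning classifier can consume.

    # A node is (label, [children]); content words are already deleted,
    # so only backbone labels and function-word daughters remain.
    Tree = tuple

    def bracket(node: Tree) -> str:
        """Serialize a (label, children) tree to a bracketed string."""
        label, children = node
        if not children:
            return f"({label})"
        return f"({label} {' '.join(bracket(c) for c in children)})"

    # Transformed path for example (1): the zero-pronoun slot and the
    # antecedent chunk meet at their lowest common ancestor; each backbone
    # node keeps only the function words (case markers) attached to it.
    path = ("Node", [
        ("Ant", [("wa", [])]),   # antecedent chunk, topic marker "wa"
        ("ZP", [("ga", [])]),    # zero-pronoun slot, nominative "ga"
    ])
    print(bracket(path))  # (Node (Ant (wa)) (ZP (ga)))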
5 Experiments

We conducted an evaluation of our method using Japanese newspaper articles. The following four models were compared:

1. BM: Ng and Cardie (2002a)'s model, which identifies antecedents by the candidate-wise classification model and determines anaphoricity using the single-step model.

2. BM_STR: BM with the syntactic features such as those in Figure 1(c).

3. SCM: the selection-then-classification model explained in Section 3.

4. SCM_STR: SCM with all types of syntactic features shown in Figure 2.

5.1 Setting

We created an anaphoric-relation-tagged corpus consisting of 197 newspaper articles (1,803 sentences), 137 articles annotated by two annotators and 60 by one. The agreement ratio between the two annotators on the 137 doubly annotated articles was 84.6%, which indicated that the annotation was sufficiently reliable. In the experiments, we removed from the above data set the zero-pronouns on which the two annotators did not agree. Consequently, the data set contained 995 intra-sentential anaphoric zero-pronouns, 754 inter-sentential anaphoric zero-pronouns, and 603 non-anaphoric zero-pronouns (2,352 zero-pronouns in total), with each anaphoric zero-pronoun annotated as linked to its antecedent.

For each of the following experiments, we conducted five-fold cross-validation over the 2,352 zero-pronouns so that the set of zero-pronouns from a single text was not divided into the training and test sets. In the experiments, all the features were automatically acquired with the help of the following NLP tools: the Japanese morphological analyzer ChaSen (http://chasen.naist.jp/hiki/ChaSen/) and the Japanese dependency structure analyzer CaboCha (http://chasen.org/~taku/software/cabocha/), which also carried out named-entity chunking.

Figure 3: Feature set.

Lexical:
  HEAD_BF — characters of the right-most morpheme in NP (PRED).
Grammatical:
  PRED_IN_MATRIX — 1 if PRED exists in the matrix clause; otherwise 0.
  PRED_IN_EMBEDDED — 1 if PRED exists in the relative clause; otherwise 0.
  PRED_VOICE — 1 if PRED contains auxiliaries such as '(ra)reru'; otherwise 0.
  PRED_AUX — 1 if PRED contains auxiliaries such as '(sa)seru', 'hosii', 'morau', 'itadaku', 'kudasaru', 'yaru' and 'ageru'; otherwise 0.
  PRED_ALT — 1 if PRED_VOICE is 1 or PRED_AUX is 1; otherwise 0.
  POS — part-of-speech of NP according to IPADIC (Asahara and Matsumoto, 2003).
  DEFINITE — 1 if NP contains the article corresponding to DEFINITE 'the', such as 'sore' or 'sono'; otherwise 0.
  DEMONSTRATIVE — 1 if NP contains the article corresponding to DEMONSTRATIVE 'that' or 'this', such as 'kono' or 'ano'; otherwise 0.
  PARTICLE — particle following NP, such as 'wa (topic)', 'ga (subject)', 'o (object)'.
Semantic:
  NE — named-entity type of NP: PERSON, ORGANIZATION, LOCATION, ARTIFACT, DATE, TIME, MONEY, PERCENT or N/A.
  EDR_HUMAN — 1 if NP is included in the concept 'a human being' or 'attribute of a human being' in the EDR dictionary (Japan Electronic Dictionary Research Institute, 1995); otherwise 0.
  PRONOUN_TYPE — pronoun type of NP (e.g. 'kare (he)' → PERSON, 'koko (here)' → LOCATION, 'sore (this)' → OTHERS).
  SELECT_REST — 1 if NP satisfies selectional restrictions in Nihongo Goi Taikei (Japanese Lexicon) (Ikehara et al., 1997); otherwise 0.
  COOC — the score of the well-formedness model estimated from a large number of triplets ⟨Noun, Case, Predicate⟩, proposed by Fujita et al. (2004).
Positional:
  SENTNUM — distance between NP and PRED.
  BEGINNING — 1 if NP is located at the beginning of the sentence; otherwise 0.
  END — 1 if NP is located at the end of the sentence; otherwise 0.
  PRED_NP — 1 if PRED precedes NP; otherwise 0.
  NP_PRED — 1 if NP precedes PRED; otherwise 0.
  DEP_PRED — 1 if NP depends on PRED; otherwise 0.
  DEP_NP — 1 if PRED depends on NP; otherwise 0.
  IN_QUOTE — 1 if NP exists in quoted text; otherwise 0.
Heuristic:
  CL_RANK — the rank of NP in the forward looking-center list based on Centering Theory (Grosz et al., 1995).
  CL_ORDER — the order of NP in the forward looking-center list based on Centering Theory (Grosz et al., 1995).

NP and PRED stand for the bunsetsu-chunk of a candidate antecedent and the bunsetsu-chunk of the predicate that has a target zero-pronoun, respectively.

5.2 Results on intra-sentential zero-anaphora resolution

In both intra-sentential anaphoricity determination and antecedent identification, we investigated the effect of introducing the syntactic features on performance.

First, the results of antecedent identification are shown in Table 1. The comparison of BM (SCM) with BM_STR (SCM_STR) indicates that introducing the structural information effectively contributes to this task. In addition, the large improvement from BM_STR to SCM_STR indicates that the use of the preference-based model has a significant impact on intra-sentential antecedent identification. This finding may well contribute to semantic role labeling because these two tasks have a large overlap, as discussed in Section 1.

Table 1: Accuracy of antecedent identification.

  BM       48.0% (478/995)
  BM_STR   63.5% (632/995)
  SCM      65.1% (648/995)
  SCM_STR  70.5% (701/995)

Second, to evaluate the performance of intra-sentential zero-anaphora resolution, we plotted recall-precision curves altering the threshold parameter θintra for intra-sentential anaphoricity determination, as shown in Figure 5, where recall R and precision P were calculated by:

  R = (# of antecedents detected correctly) / (# of anaphoric zero-pronouns),
  P = (# of antecedents detected correctly) / (# of zero-pronouns classified as anaphoric).
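In code, the two measures reduce to simple counts over gold and predicted antecedent assignments. A minimal sketch (ours), assuming hypothetical dictionaries that map each zero-pronoun to an antecedent:

    def recall_precision(gold: dict, predicted: dict):
        """R and P as defined above, over zero-pronoun -> antecedent maps.

        gold maps each anaphoric zero-pronoun to its annotated antecedent;
        predicted maps each zero-pronoun the system judged anaphoric to the
        antecedent it returned.
        """
        correct = sum(1 for zp, ant in predicted.items()
                      if gold.get(zp) == ant)
        recall = correct / len(gold) if gold else 0.0
        precision = correct / len(predicted) if predicted else 0.0
        return recall, precision

    gold = {"zp1": "NP1", "zp2": "NP2", "zp3": "NP3"}
    pred = {"zp1": "NP1", "zp2": "NP9", "zp4": "NP4"}
    print(recall_precision(gold, pred))  # (0.333..., 0.333...)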
(Figure 5: Recall-precision curves of intra-sentential zero-anaphora resolution.)

The curves indicate the upper bound of the performance of these models; in practical settings, the parameters have to be trained beforehand. Figure 5 shows that BM_STR (SCM_STR) outperforms BM (SCM), which indicates that incorporating syntactic pattern features works remarkably well for intra-sentential zero-anaphora resolution. Furthermore, SCM_STR is significantly better than BM_STR. This result supports the claim that the former has the advantage of learning non-anaphoric zero-pronouns (181 instances) as negative training instances in intra-sentential anaphoricity determination, which enables it to reject non-anaphoric zero-pronouns more accurately than the others.

5.3 Discussion

Our error analysis reveals that a majority of errors can be attributed to the current way of handling quoted phrases and sentences. Figure 6 shows the difference in resolution accuracy between zero-pronouns appearing in a quotation (262 zero-pronouns) and the rest (733 zero-pronouns), where "IN_Q" denotes the former (in-quote zero-pronouns) and "OUT_Q" the latter. The accuracy on the IN_Q problems is considerably lower than that on the OUT_Q cases, which indicates that we should deal with in-quote cases with a separate model so that it can take into account the nested structure of discourse segments introduced by quotations.

(Figure 6: Recall-precision curves of resolving in-quote and out-quote zero-pronouns.)

5.4 Impact on overall zero-anaphora resolution

We next evaluated the effects of introducing the proposed model on overall zero-anaphora resolution, including inter-sentential cases. As a baseline model, we implemented the original SCM, designed to resolve intra-sentential zero-anaphora and inter-sentential zero-anaphora simultaneously with no syntactic pattern features. Here, we adopted Support Vector Machines (Vapnik, 1998) to train the classifier for the baseline model and for inter-sentential zero-anaphora resolution in the SCM using structural information.

For the proposed model, we plotted several recall-precision curves by selecting different values for the threshold parameters θintra and θinter. The results are shown in Figure 7, which indicates that the proposed model significantly outperforms the original SCM if θintra is appropriately chosen.

(Figure 7: Recall-precision curves of overall zero-anaphora resolution.)

We then investigated the feasibility of parameter selection for θintra by plotting the AUC values for different θintra values. Here, each AUC value is the area under a recall-precision curve. The results are shown in Figure 8. Since the original SCM does not use θintra, its AUC value is constant, depicted as the SCM line. As shown in Figure 8, the AUC-value curve of the proposed model is not peaky, which indicates that the selection of the parameter θintra is not difficult.

(Figure 8: AUC curves plotted by altering θintra.)
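This feasibility check amounts to sweeping θintra, computing a recall-precision curve at each value, and summarizing each curve by its AUC. A minimal sketch (ours) of that loop, assuming a hypothetical curve_at(theta) that returns the (recall, precision) points obtained at a given threshold:

    import numpy as np

    def auc_of_curve(points):
        """Area under a recall-precision curve by trapezoidal integration."""
        pts = sorted(points)  # sort points by recall
        recalls = [r for r, _ in pts]
        precisions = [p for _, p in pts]
        return float(np.trapz(precisions, recalls))

    def sweep(thetas, curve_at):
        """AUC for each candidate threshold; a flat profile means the
        choice of theta_intra is not critical, as observed in Figure 8."""
        return {theta: auc_of_curve(curve_at(theta)) for theta in thetas}

    # Usage with a made-up curve generator:
    # aucs = sweep(np.linspace(-0.05, 0.05, 11), curve_at)
    # best_theta = max(aucs, key=aucs.get)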
6 Conclusion

In intra-sentential zero-anaphora resolution, syntactic patterns of the appearance of zero-pronouns and their antecedents are useful clues. Taking Japanese as a target language, we have empirically demonstrated that incorporating rich syntactic pattern features in a state-of-the-art learning-based anaphora resolution model dramatically improves the accuracy of intra-sentential zero-anaphora resolution, which consequently improves the overall performance of zero-anaphora resolution.

In our next step, we are going to address the issue of how to find zero-pronouns, which requires us to design a broader framework that allows zero-anaphora resolution to interact with predicate-argument structure analysis. Another important issue is how to find a globally optimal solution to the set of zero-anaphora resolution problems in a given discourse, which leads us to explore methods such as the one discussed by McCallum and Wellner (2003).

References

M. Asahara and Y. Matsumoto. 2003. IPADIC User Manual. Nara Institute of Science and Technology, Japan.

B. Baldwin. 1995. CogNIAC: A Discourse Processing Engine. Ph.D. thesis, Department of Computer and Information Sciences, University of Pennsylvania.

X. Carreras and L. Màrquez. 2005. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth CoNLL, pages 152–164.

M. Collins and N. Duffy. 2001. Convolution kernels for natural language. In Proceedings of NIPS, pages 625–632.

A. Fujita, K. Inui, and Y. Matsumoto. 2004. Detection of incorrect case assignments in automatically generated paraphrases of Japanese sentences. In Proceedings of the First IJCNLP, pages 14–21.

D. Gildea and D. Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

B. J. Grosz, A. K. Joshi, and S. Weinstein. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2):203–226.

R. Iida, K. Inui, H. Takamura, and Y. Matsumoto. 2003. Incorporating contextual cues in trainable models for coreference resolution. In Proceedings of the 10th EACL Workshop on The Computational Treatment of Anaphora, pages 23–30.

R. Iida, K. Inui, and Y. Matsumoto. 2005. Anaphora resolution by antecedent identification followed by anaphoricity determination. ACM Transactions on Asian Language Information Processing (TALIP), 4:417–434.

S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo, H. Nakaiwa, K. Ogura, Y. Ooyama, and Y. Hayashi. 1997. Nihongo Goi Taikei (in Japanese). Iwanami Shoten.

Japan Electronic Dictionary Research Institute, Ltd. 1995. EDR Electronic Dictionary Technical Guide.

M. Kameyama. 1986. A property-sharing constraint in centering. In Proceedings of the 24th ACL, pages 200–206.

T. Kudo and Y. Matsumoto. 2004. A boosting algorithm for classification of semi-structured text. In Proceedings of EMNLP 2004, pages 301–308.

S. Lappin and H. J. Leass. 1994. An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4):535–561.

A. McCallum and B. Wellner. 2003. Object consolidation by graph partitioning with a conditionally trained distance metric. In Proceedings of the KDD-2003 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, pages 19–24.

J. F. McCarthy and W. G. Lehnert. 1995. Using decision trees for coreference resolution. In Proceedings of the 14th IJCAI, pages 1050–1055.

R. Mitkov. 1997. Factors in anaphora resolution: they are not the only things that matter. A case study based on two different approaches. In Proceedings of the ACL'97/EACL'97 Workshop on Operational Factors in Practical, Robust Anaphora Resolution.

H. Nakaiwa and S. Shirai. 1996.
Anaphora resolution of Japanese zero pronouns with deictic reference. In Proceedings of the 16th COLING, pages 812–817.

V. Ng. 2004. Learning noun phrase anaphoricity to improve coreference resolution: Issues in representation and optimization. In Proceedings of the 42nd ACL, pages 152–159.

V. Ng and C. Cardie. 2002a. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th ACL, pages 104–111.

M. Okumura and K. Tamura. 1996. Zero pronoun resolution in Japanese discourse based on centering theory. In Proceedings of the 16th COLING, pages 871–876.

M. Poesio, O. Uryupina, R. Vieira, M. Alexandrov-Kabadjov, and R. Goulart. 2004. Discourse-new detectors for definite description resolution: A survey and a preliminary proposal. In Proceedings of the 42nd ACL Workshop on Reference Resolution and its Applications, pages 47–54.

K. Seki, A. Fujii, and T. Ishikawa. 2002. A probabilistic method for analyzing Japanese anaphora integrating zero pronoun detection and resolution. In Proceedings of the 19th COLING, pages 911–917.

W. M. Soon, H. T. Ng, and D. C. Y. Lim. 2001. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544.

M. Strube and C. Müller. 2003. A machine learning approach to pronoun resolution in spoken dialogue. In Proceedings of the 41st ACL, pages 168–175.

J. Suzuki, T. Hirao, Y. Sasaki, and E. Maeda. 2003. Hierarchical directed acyclic graph kernel: Methods for structured natural language data. In Proceedings of the 41st ACL, pages 32–39.

V. N. Vapnik. 1998. Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons.

X. Yang, G. Zhou, J. Su, and C. L. Tan. 2003. Coreference resolution using competition learning approach. In Proceedings of the 41st ACL, pages 176–183.