Language Specific Issue and Feature Exploration in Chinese Event Extraction

Zheng Chen                      Heng Ji
Department of Computer Science
The Graduate Center             Queens College and The Graduate Center
The City University of New York
365 Fifth Avenue, New York, NY 10016, USA
zchen1@gc.cuny.edu              hengji@cs.qc.cuny.edu

Proceedings of NAACL HLT 2009: Short Papers, pages 209-212, Boulder, Colorado, June 2009. (c) 2009 Association for Computational Linguistics

Abstract

In this paper, we present a Chinese event extraction system. We point out a language-specific issue in Chinese trigger labeling, and then discuss the contributions of lexical, syntactic and semantic features applied in trigger labeling and argument labeling. As a result, we achieve competitive performance: an F-measure of 59.9 in trigger labeling and an F-measure of 43.8 in argument labeling.

1 Introduction

In this paper we address the event extraction task defined in the Automatic Content Extraction (ACE) program (http://www.nist.gov/speech/tests/ace/). The ACE program defines the following terminology for the event extraction task:

Trigger: the word that most clearly expresses an event's occurrence
Argument: an entity, a temporal expression, or a value that plays a certain role in the event instance
Event mention: a phrase or sentence with a distinguished trigger and participant arguments

Some English event extraction systems based on supervised learning have been reported (Ahn, 2006; Ji and Grishman, 2008). In this paper we develop a modularized Chinese event extraction system. We handle the language-specific issue in trigger labeling and explore effective lexical, syntactic and semantic features for trigger labeling and argument labeling. Tan et al. (2008) addressed the same task as we do in this paper. However, to our knowledge, the language-specific issue and the feature contributions for Chinese event extraction have not been reported by earlier researchers.

The remainder of the paper is organized as follows. Section 2 points out a language-specific issue in Chinese trigger labeling and discusses two strategies of trigger labeling: word-based and character-based. Section 3 presents argument labeling. Section 4 discusses the experimental results. Section 5 concludes the paper.

2 Trigger Labeling

We split trigger labeling into two steps: 1) trigger identification: to recognize the event trigger; 2) trigger classification: to assign an event type to the trigger. The two strategies we discuss for trigger labeling (word-based and character-based) differ only in the first step.

2.1 A Language-Specific Issue

Chinese and some other languages, e.g., Japanese, do not have delimiters between words. Thus, segmentation is usually an indispensable step before further processing such as part-of-speech tagging and parsing. However, segmentation may cause problems in some tasks, e.g., named entity recognition (Jing et al., 2003) and event trigger identification.

For example, the Chinese word glossed as "shoot and kill" is segmented as a single word. However, it contains two triggers: one meaning "shoot", with the event type Attack, and the other meaning "kill", with the event type Die. A trigger may also cross two or more words; e.g., the trigger meaning "public letter" crosses two words, "public" and "letter".

In the ACE Chinese corpus, 2902 triggers match their corresponding words exactly one-to-one, while 431 triggers are inconsistent with the words (either within a word or across words). The inconsistency rate is as high as 13% (431 of 3333 triggers). We therefore discuss two strategies of trigger labeling: a word-based one, in which we use a global errata table to alleviate the inconsistency problem, and a character-based one, which solves the inconsistency problem.
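The mismatch between triggers and segmented words described above can be checked mechanically by comparing character offsets. The following is a minimal sketch rather than the system's actual code: the function names, the half-open offset convention, and the toy segmentation (Latin placeholder strings standing in for Chinese words) are assumptions for illustration. Only the counts 2902 and 431 come from the text.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end) (assumed convention)


def word_spans(words: List[str]) -> List[Span]:
    """Turn a segmented sentence into one character span per word."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def trigger_matches_word(trigger: Span, words: List[str]) -> bool:
    """True if the trigger span coincides exactly with one segmented word."""
    return trigger in word_spans(words)


def inconsistency_rate(matched: int, mismatched: int) -> float:
    """Fraction of triggers that do not align with a segmented word."""
    return mismatched / (matched + mismatched)


# Hypothetical toy segmentation: three words covering five characters.
words = ["ab", "cd", "e"]
print(trigger_matches_word((0, 2), words))  # True: trigger equals the first word
print(trigger_matches_word((0, 1), words))  # False: trigger lies inside a word
print(trigger_matches_word((2, 5), words))  # False: trigger crosses two words

# Counts reported for the ACE Chinese corpus.
print(round(100 * inconsistency_rate(2902, 431), 1))  # 12.9
```

A trigger that fails this check is exactly the kind of case that the errata table or the character-based tagger has to handle.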
2.2 Word-based Trigger Labeling

We apply Maximum-Entropy based classifiers for trigger identification and trigger classification. The two classifiers share the same set of features:

Lexical features: word, POS of the word, previous word + word, word + next word, previous POS + POS, and POS + next POS.

Syntactic features:
1) depth: the depth of the trigger in the parse tree
2) path to root: the path from the leaf node of the trigger to the root of the parse tree
3) sub-categorization: the phrase structure expanded by the parent of the trigger
4) phrase type: the phrase type of the trigger

Semantic dictionaries:
1) predicate existence: a boolean value indicating whether the trigger exists in a predicate list produced from the Chinese Propbank (Xue and Palmer, 2008)
2) synonym entry: the entry number of the trigger in a Chinese synonym dictionary

Nearest entity information:
1) the entity type of the syntactically nearest entity to the trigger in the parse tree
2) the entity type of the physically nearest entity to the trigger in the sentence

To deal with the language-specific issue in trigger identification, we construct a global errata table that records the inconsistencies found in the training set. At test time, if the scanned word has an entry in the errata table, we select the possible triggers in that entry as candidate triggers.

2.3 Character-based Trigger Labeling

Although the errata table significantly helps to reduce segmentation inconsistencies, it is not a perfect solution, since it only recognizes the inconsistencies that occur in the training data. To take a further step, we build a separate character-based trigger identification classifier for comparison. We use a MEMM (Maximum Entropy Markov Model) to label each character with a tag indicating whether it is outside a trigger (O), the beginning of a trigger (B), or a part of a trigger other than the beginning (I).
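The B/I/O scheme above can be illustrated with a small sketch. It only shows how trigger spans map to per-character tags and back; it is not the MEMM itself. The function names `spans_to_tags` and `tags_to_spans`, the half-open offset convention, and the lenient handling of a stray I tag are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end) (assumed convention)


def spans_to_tags(n_chars: int, triggers: List[Span]) -> List[str]:
    """Encode trigger spans as one B/I/O tag per character."""
    tags = ["O"] * n_chars
    for start, end in triggers:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def tags_to_spans(tags: List[str]) -> List[Span]:
    """Decode a B/I/O tag sequence back into trigger spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new trigger closes the previous one
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        elif start is None:
            # Stray "I" without a preceding "B": treated as a trigger start here
            # (a lenient choice made for this sketch only).
            start = i
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Two adjacent one-character triggers inside a single segmented word,
# followed by a three-character trigger later in the sentence.
tags = spans_to_tags(8, [(1, 2), (2, 3), (5, 8)])
print(tags)                 # ['O', 'B', 'B', 'O', 'O', 'B', 'I', 'I']
print(tags_to_spans(tags))  # [(1, 2), (2, 3), (5, 8)]
```

Because tags are assigned per character, a trigger no longer has to coincide with a segmented word, which is why this representation sidesteps the segmentation inconsistency.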
Our MEMM classifier performs sequential classification by assigning each character one of the three tags. We then apply the Viterbi algorithm to decode the tag sequence and identify the triggers in the sequence. Features used in our MEMM classifier include: the character, the previous character, the next character, the previous tag, and the word-based features that the character carries. For trigger classification, we apply the same set of features as in word-based trigger labeling.

3 Argument Labeling

We also split argument labeling into two steps: 1) argument identification: to recognize an entity, a temporal expression, or a value as an argument; 2) role classification: to assign a role to the argument. We apply Maximum-Entropy based classifiers for the two steps, and they share the same set of features:

Basic features: trigger, event subtype of the event mention, type of the ACE entity mention, head word of the entity mention, combined value of event subtype and head word, combined value of event subtype and entity subtype.

Neighbor words:
1) left neighbor word of the entity, temporal expression, or value
2) right neighbor word of the entity, temporal expression, or value

Syntactic features:
1) sub-categorization: the phrase structure expanding the parent of the trigger
2) position: the position of the entity relative to the trigger (before or after)
3) path: the minimal path from the entity to the trigger
4) distance: the length of the shortest path from the entity to the trigger in the parse tree

4 Experimental Results

4.1 Data and Scoring Metric

We used the 2005 ACE training corpus for our experiments. The corpus contains 633 Chinese documents. In this paper we follow the setting of the ACE diagnostic tasks and use the ground-truth entities, times and values for training and testing. We randomly selected 558 documents as the training set and 66 documents as the test set. Within the training set, we reserved 33 documents as a development set.
We define the following standards to determine the correctness of an event mention:

A trigger is correctly labeled if its event type and offsets exactly match a reference trigger.
An argument is correctly labeled if its event type, offsets, and role match the reference argument mention.

4.2 Overall System Performance

Table 1 shows the overall Precision (P), Recall (R) and F-Measure (F) scores of our baseline system (a word-based system with only lexical features in trigger labeling and basic features in argument labeling), the word-based system with fully integrated features, and the character-based system with fully integrated features. Compared with the Chinese event extraction system reported by Tan et al. (2008), our scores are much lower. However, we apply much stricter evaluation metrics.

4.3 Comparison between Word-based and Character-based Trigger Labeling

Table 1 lists the comparison between character-based and word-based trigger labeling. It indicates that the character-based method outperforms the word-based method, mostly due to better performance in the trigger identification step (a 3.3% absolute improvement in F-Measure), with precision as high as 82.4% (a 14.3% absolute improvement) and a small loss in recall (2.1%).

4.4 Feature Contributions for Trigger Labeling

Table 2 presents the feature contributions for word-based trigger labeling. We observe similar feature contributions for character-based trigger labeling, since it differs from the word-based approach only in trigger identification and works in the same way in trigger classification (we omit the results here).

Table 2 shows that maintaining an errata table is an effective strategy for word-based trigger identification and that dictionary resources improve performance. It is worth noting that performance drops when the syntactic features are integrated. A possible explanation is that the trigger, unlike the predicate in the semantic role labeling task, can be not only a verb but also a noun or another part of speech. Thus the syntactic position of the trigger in the parse tree is much more flexible than that of the predicate in semantic role labeling. For this reason, syntactic features are not very discriminative in trigger labeling. Furthermore, the syntactic features cannot discriminate the word senses of a candidate trigger. Consider the following example:

S1: The players are entering the stadium to prepare for the coming game.
S2: Many farm products have rotted before entering the market.

The word glossed as "entering" indicates a "Transport" event in S1 but not in S2. The phrase structures around the word in both sentences are exactly the same (VP VP-NP). However, if an entity of type "PERSON" appears ahead of the word, the word is much more likely to be a trigger. Hence the nearby entity information can be effective.

4.5 Feature Contributions for Argument Labeling

Table 3 shows the feature contributions for argument labeling after word-based trigger labeling. We observe the same feature contributions for argument labeling after character-based trigger labeling (results are omitted).

Table 3 shows that the two neighbor word features are fairly effective. We observe that in some patterns of event description, the left word is informative for telling that the following entity mention is an argument. For example, the pattern glossed as "killed by [Entity]" is a common way to describe an attack event, and the left neighbor word of the entity, meaning "by", strongly implies that the entity is an argument with the role "Attacker". Meanwhile, the right word can help reduce spurious arguments. For example, in the Chinese "of" structure, the word meaning "of" strongly suggests that the entity on its left side is not an argument. The sub-categorization feature contributes little, since it is shared by all the arguments in the parse tree.
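The two neighbor word features discussed above can be sketched as follows. This is a minimal illustration, assuming the sentence is already segmented into a token list and the entity mention is given as a half-open token span. The function name `neighbor_word_features`, the feature keys, and the boundary symbols `<BOS>` and `<EOS>` are hypothetical, and the English tokens merely stand in for Chinese words.

```python
from typing import Dict, List, Tuple


def neighbor_word_features(tokens: List[str], entity: Tuple[int, int]) -> Dict[str, str]:
    """Return the words immediately left and right of an entity mention.

    `entity` is a half-open token span [start, end); sentence boundaries are
    represented by placeholder symbols (an assumption of this sketch).
    """
    start, end = entity
    left = tokens[start - 1] if start > 0 else "<BOS>"
    right = tokens[end] if end < len(tokens) else "<EOS>"
    return {"left_word": left, "right_word": right}


# English stand-in for a passive "killed by [Entity]" description.
tokens = ["suspect", "killed", "by", "police", "yesterday"]
print(neighbor_word_features(tokens, (3, 4)))
# {'left_word': 'by', 'right_word': 'yesterday'}

# An entity at the start of the sentence has no left neighbor.
print(neighbor_word_features(tokens, (0, 1)))
# {'left_word': '<BOS>', 'right_word': 'killed'}
```

In a Maximum-Entropy classifier, each such key/value pair would typically become one binary indicator feature.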
Table 3 also shows that Path and Distance are two effective features. In the parse tree, each argument attached to the trigger appears in a certain syntactic configuration. For example, the path "NP VP VV" implies that the structure might be Subject-Verb, and thus the entity in the NP is highly likely to be an argument of the trigger (VV). The Position feature helps to discriminate argument roles in syntactically identical structures, e.g., the "Subject Verb Object" structure.

Table 1. Overall system performance (%)

Performance                    Baseline   Word-based   Character-based
Trigger Identification    P      61.0        68.1           82.4
                          R      50.0        52.7           50.6
                          F      54.9        59.4           62.7
Trigger Labeling          P      58.7        65.7           78.8
                          R      48.2        50.9           48.3
                          F      52.9        57.4           59.9
Argument Identification   P      49.5        56.1           64.4
                          R      38.2        38.2           36.4
                          F      43.1        45.4           46.5
Argument Labeling         P      44.6        53.1           60.6
                          R      34.4        36.2           34.3
                          F      38.9        43.1           43.8

Table 2. Feature contributions for word-based trigger labeling (%)

                                   Trigger Identification   Trigger Labeling
Feature set                          P      R      F          P      R      F
Lexical features: (1)               61.0   50.0   54.9       58.7   48.2   52.9
(1) + Errata table: (2)             64.0   52.0   57.4       61.3   49.8   54.9
(2) + Dictionaries: (3)             64.9   53.5   58.6       62.7   51.6   56.6
(3) + Syntactic features: (4)       64.3   51.8   57.4       60.6   48.9   54.1
(3) + Entity information: (5)       68.1   52.7   59.4       65.7   50.9   57.4

Table 3. Feature contributions for argument labeling after word-based trigger labeling (%)

                                   Argument Identification   Argument Labeling
Feature set                          P      R      F           P      R      F
Basic feature set: (1)              40.5   32.8   36.2        37.7   30.5   33.7
(1) + Left word: (2)                45.2   35.4   39.7        41.6   32.5   36.5
(1) + Right word: (3)               47.7   35.6   40.8        44.1   32.9   37.7
Feature set 2: (2) + (3)            49.0   35.7   41.3        46.1   33.6   38.9
(1) + Sub-categorization: (4)       41.9   33.1   37.0        38.7   30.5   34.1
(1) + Path: (5)                     46.6   36.2   40.7        43.4   33.7   38.0
(1) + Distance: (6)                 49.5   37.0   42.3        45.0   33.6   38.5
(1) + Position: (7)                 43.8   35.3   39.1        41.0   33.1   36.6
Feature set 3 (from 4 to 7)         56.2   36.1   43.9        51.2   32.9   40.0
Total                               56.1   38.2   45.4        53.1   36.2   43.1

5 Conclusions and Future Work

In this paper, we took a close look at a language-specific issue in Chinese event extraction and explored effective features for the Chinese event extraction task. All our work contributes to setting up a high-performance Chinese event extraction system.

For future work, we intend to explore an approach to cross-lingual event extraction and to investigate whether cross-lingual inference can bootstrap either side when two language-specific event extraction systems are run in parallel.

Acknowledgments

This material is based upon work supported by the Defense Advanced Research Projects Agency under Contract No. HR0011-06-C-0023 via 27001022, and the CUNY Research Enhancement Program and GRTI Program.

References

D. Ahn. 2006. The stages of event extraction. Proc. COLING/ACL 2006 Workshop on Annotating and Reasoning about Time and Events. Sydney, Australia.

H. Ji and R. Grishman. 2008. Refining Event Extraction Through Cross-document Inference. Proc. ACL 2008. Ohio, USA.

H. Jing, R. Florian, X. Luo, T. Zhang, and A. Ittycheriah. 2003. HowtogetaChineseName(Entity): Segmentation and combination issues. Proc. EMNLP 2003.

H. Tan, T. Zhao, and J. Zheng. 2008. Identification of Chinese Event and Their Argument Roles. Proc. of the 2008 IEEE 8th International Conference on Computer and Information Technology Workshops.

N. Xue and M. Palmer. 2008. Adding Semantic Role to the Chinese Treebank. Natural Language Engineering. Cambridge University Press.