Disambiguation of Preposition Sense Using Linguistically Motivated Features

Stephen Tratz and Dirk Hovy
Information Sciences Institute
University of Southern California
4676 Admiralty Way, Marina del Rey, CA 90292
{stratz,dirkh}@isi.edu

Proceedings of the NAACL HLT Student Research Workshop and Doctoral Consortium, pages 96-100, Boulder, Colorado, June 2009. © 2009 Association for Computational Linguistics

Abstract

In this paper, we present a supervised classification approach for disambiguation of preposition senses. We use the SemEval 2007 Preposition Sense Disambiguation datasets to evaluate our system and compare its results to those of the systems participating in the workshop. We derived linguistically motivated features from both sides of the preposition. Instead of restricting these to a fixed window size, we utilized the phrase structure. Testing with five different classifiers, we can report an increased accuracy that outperforms the best system in the SemEval task.

1 Introduction

Classifying instances of polysemous words into their proper sense classes (aka sense disambiguation) is potentially useful to any NLP application that needs to extract information from text or build a semantic representation of the textual information. However, to date, disambiguation between preposition senses has not been an object of great study. Instead, most word sense disambiguation work has focused upon classifying noun and verb instances into their appropriate WordNet (Fellbaum, 1998) senses. Prepositions have mostly been studied in the context of verb complements (Litkowski and Hargraves, 2007). Like instances of other word classes, many prepositions are ambiguous, carrying different semantic meanings (including notions of instrumental, accompaniment, location, etc.), as in "He ran with determination", "He ran with a broken leg", or "He ran with Jane". As NLP systems take more and more semantic content into account, disambiguating between preposition senses becomes increasingly important for text processing tasks.

In order to disambiguate different senses, most systems to date use a fixed window size to derive classification features. These may or may not be syntactically related to the preposition in question, resulting in the worst case in an arbitrary bag of words. In our approach, we make use of the phrase structure to extract words that have a certain syntactic relation with the preposition. From the words collected that way, we derive higher-level features.

In 2007, the SemEval workshop presented participants with a formal preposition sense disambiguation task to encourage the development of systems for the disambiguation of preposition senses (Litkowski and Hargraves, 2007). The training and test data sets used for SemEval have been released to the general public, and we used these data to train and test our system. The SemEval workshop data consists of instances of 34 prepositions in natural text that have been tagged with the appropriate sense from the list of the common English preposition senses compiled by The Preposition Project, cf. Litkowski (2005). The SemEval data provides a natural method for comparing the performance of preposition sense disambiguation systems. In our paper, we follow the task requirements and can thus directly compare our results to the ones from the study.

For evaluation, we compared our results to those of the three systems that participated in the task (MELB: Ye and Baldwin (2007); KU: Yuret (2007); IRST: Popescu et al. (2007)). We also used the "first sense" and the "most frequent sense" baselines (see section 3 and table 1). These baselines are determined by the TPP listing and the frequency in the training data, respectively. Our system beat the baselines and outperformed the three participating systems.

2 Methodology

2.1 Data Preparation

We downloaded the test and training data provided by the SemEval-2007 website for the preposition sense disambiguation task.
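The released data is distributed as XML. The sketch below shows how one such instance could be pulled apart; the instance/answer/context/head element and attribute names are our illustrative assumption, not the verified SemEval file schema:

```python
# Hypothetical reader for one SemEval-style training instance. The
# element and attribute names (instance, answer/senseid, context, head)
# are assumptions for illustration, not the verified file schema.
import xml.etree.ElementTree as ET

SAMPLE = """
<instance id="about.1">
  <answer senseid="1(1)"/>
  <context>What are your beliefs <head>about</head> these emotions ?</context>
</instance>
"""

def read_instance(xml_text):
    inst = ET.fromstring(xml_text)
    sense = inst.find("answer").get("senseid")
    context = inst.find("context")
    preposition = context.find("head").text
    # Flatten the mixed content around the <head> tag into one sentence.
    sentence = "".join(context.itertext()).strip()
    return sentence, preposition, sense

sentence, prep, sense = read_instance(SAMPLE)
```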
These are 34 separate XML files, one for each preposition, comprising 16557 training and 8096 test example sentences, each sentence containing one example of the respective preposition, e.g.

What are your beliefs <head>about</head> these emotions ?

The preposition is annotated by a head tag, and the meaning of the preposition in question is given as defined by TPP. Each preposition had between 2 and 25 different senses (9.76 on average). For the case of "about", these would be

1. on the subject of; concerning
2. so as to affect
3. used to indicate movement within a particular area
4. around
5. used to express location in a particular place
6. used to describe a quality apparent in a person

We parsed the sentences using the Charniak parser (Charniak, 2000). Note that the Charniak parser, even though among the best available English parsers, occasionally fails to parse a sentence correctly. This might result in an erroneous extraction, such as an incorrect word or no word at all. However, these cases are fairly rare, and we did not correct them manually, but rather relied on the size of the data to compensate for such errors. After this preprocessing step, we were able to extract the features.

2.2 Feature Extraction

Following O'Hara and Wiebe (2003) and Alam (2004), we assumed that there is a meaningful connection between syntactically related words on both sides of the preposition. We thus focused on specific words that are syntactically related to the preposition via the phrase structure. This has the advantage that it is not limited to a certain window size; phrases might stretch over dozens of words, so the extracted word may occur far away from the actual preposition. These words were chosen based on a manual analysis of training data. Using Tregex (Levy and Andrew, 2006), a utility for expressing "regular expressions over trees", we created a set of rules to extract the words in question. Each rule matched words that exhibited a specific relationship with the preposition or were within a two-word window to cover collocations. An example rule is given below.
IN > (PP > (VP <# (!AUX=x)))

This particular rule finds the head (denoted by x) of a verb phrase that governs the prepositional phrase containing the preposition, unless x is an auxiliary verb. Tregex rules were used to identify the following words for feature generation:

· the head verb/noun that immediately dominates the preposition, along with all of its modifying determiners, quantifiers, numbers, and adjectives
· the head verb/noun immediately dominated by the preposition, along with all of its modifying determiners, quantifiers, numbers, and adjectives
· the subject, negator, and object(s) of the immediately dominating verb
· neighboring prepositional phrases dominated by the same verb/noun ("sister" prepositional phrases)
· words within 2 positions to the left or right of the preposition

For each word extracted using these rules, we collected the following items:

· the word itself
· lemma
· part-of-speech (both exact and conflated, e.g. both 'VBD' and 'verb' for 'VBD')
· all synonyms of the first WordNet sense
· all hypernyms of the first WordNet sense
· a boolean indicator for capitalization

Each feature is a combination of the extraction rule and the extracted item. The values the feature can take on are binary: present or absent. For some prepositions, this resulted in several thousand features.
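In miniature, the extraction and feature-generation steps above look like the following sketch. It substitutes nested (label, children...) tuples for real Charniak parser output and a hand-written search for the example Tregex rule (the verb heading the VP that governs the PP containing the preposition); the function and feature names are illustrative, and the head search is deliberately naive:

```python
# Sketch of the phrase-structure extraction step over toy parse trees.
# Trees are nested tuples (label, children...); this is a stand-in for
# real parser output, and the names below are illustrative.

def find_governing_verb(tree, preposition):
    """Return the verb of a VP that directly dominates a PP whose
    head is `preposition`, or None if there is no such VP."""
    label, children = tree[0], tree[1:]
    if label == "VP":
        # Does this VP directly dominate a PP headed by the preposition?
        dominates_pp = any(
            c[0] == "PP" and any(g[0] == "IN" and g[1] == preposition
                                 for g in c[1:] if isinstance(g, tuple))
            for c in children if isinstance(c, tuple)
        )
        if dominates_pp:
            for c in children:
                # Charniak-style trees tag auxiliaries AUX/AUXG, so
                # matching VB* skips them (mirroring the !AUX condition).
                if isinstance(c, tuple) and c[0].startswith("VB"):
                    return c[1]
    for c in children:
        if isinstance(c, tuple):
            found = find_governing_verb(c, preposition)
            if found:
                return found
    return None

def make_features(rule_name, word):
    """Each feature is a rule/item combination; values are binary."""
    return {f"{rule_name}:word={word}", f"{rule_name}:lower={word.lower()}"}

tree = ("S",
        ("NP", ("PRP", "He")),
        ("VP", ("VBD", "ran"),
               ("PP", ("IN", "with"), ("NP", ("NN", "determination")))))
verb = find_governing_verb(tree, "with")
features = make_features("gov_verb", verb)
```

A real implementation would use Tregex itself over the parser's trees and proper head-finding rules; this sketch only illustrates how a syntactically related word, rather than a fixed-window neighbor, becomes a binary feature.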
In order to reduce computation time, we used the following steps: for each preposition classifier, we ranked the features using information gain (Forman, 2003). From the resulting lists, we included the top 4000 features. Thus, not all classifiers used the same features.

2.3 Classifier Training

We chose maximum entropy (Berger et al., 1996) as our primary classifier because the highest performing systems in both the SemEval-2007 preposition sense disambiguation task (Ye and Baldwin, 2007) and the general word sense disambiguation task (Tratz et al., 2007) used it. We used the implementation provided by the Mallet machine learning toolkit (McCallum, 2002). For the sake of comparison, we also built several other classifiers, including multinomial naïve Bayes, SVMs, kNN, and decision trees (J48), using the WEKA toolkit (Witten, 1999). We chose the radial basis function (RBF) kernel for the SVMs and left all other parameters at their default values.
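The feature-selection step can be sketched as a pure-Python stand-in for information-gain ranking; the names here are illustrative, and k would be 4000 in the setup described above:

```python
# Sketch of the feature-selection step: rank binary features by
# information gain with respect to the sense label, keep the top k.
# Pure-Python stand-in for illustration; names are not from the paper.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_present, labels):
    """IG of one binary feature: H(labels) - H(labels | feature)."""
    n = len(labels)
    with_f = [l for f, l in zip(feature_present, labels) if f]
    without_f = [l for f, l in zip(feature_present, labels) if not f]
    cond = sum(len(part) / n * entropy(part)
               for part in (with_f, without_f) if part)
    return entropy(labels) - cond

def top_k_features(instances, labels, k=4000):
    """instances: one set of feature names per training example."""
    all_feats = set().union(*instances)
    scored = {f: information_gain([f in inst for inst in instances], labels)
              for f in all_feats}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Tiny example: feature "a" perfectly predicts the sense, "b" is noise.
ranked = top_k_features([{"a", "b"}, {"a"}, {"b"}, {"c"}],
                        ["s1", "s1", "s2", "s2"], k=2)
```

Because the ranking is computed per preposition classifier, each classifier ends up with its own top-4000 list, which is why not all classifiers share the same features.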
As expected, the highest accuracy was achieved using the maximum entropy classifier. Overall, our system outperformed the winning system by 0.058, an 8 percent improvement. A simple proportion test shows this to be statistically significant at 0.001.

Table 1: Accuracy by classifier: kNN, SVM (RBF kernel), J48 decision trees, multinomial naïve Bayes, maximum entropy.
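The significance check above can be reproduced with a standard two-proportion z-test on test-set accuracy. The accuracies below are illustrative stand-ins chosen only to match the reported gap (0.058 absolute, roughly 8 percent relative, on the 8096 test instances); they are not the actual system scores:

```python
# Two-proportion z-test sketch for the significance claim. The input
# accuracies are illustrative stand-ins, not the paper's exact scores.
import math

def two_proportion_z(p1, p2, n1, n2):
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# A 0.058 absolute gap (~8% relative) on 8096 test examples per system.
z = two_proportion_z(0.725, 0.783, 8096, 8096)
# |z| > 3.29 corresponds to two-sided significance at the 0.001 level.
```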