Discovering asymmetric entailment relations between verbs using selectional preferences
Fabio Massimo Zanzotto DISCo University of Milano-Bicocca
Via Bicocca degli Arcimboldi 8, Milano, Italy zanzotto@disco.unimib.it

Marco Pennacchiotti, Maria Teresa Pazienza ART Group - DISP University of Rome "Tor Vergata"
Viale del Politecnico 1, Roma, Italy {pennacchiotti, pazienza}@info.uniroma2.it

Abstract
In this paper we investigate a novel method to detect asymmetric entailment relations between verbs. Our starting point is the idea that some point-wise verb selectional preferences carry relevant semantic information. Experiments using WordNet as a gold standard show promising results. Where applicable, our method, used in combination with other approaches, significantly increases the performance of entailment detection. A combined approach including our model improves the AROC of 5% absolute points with respect to standard models.

1 Introduction
Natural Language Processing applications often need to rely on large amount of lexical semantic knowledge to achieve good performances. Asymmetric verb relations are part of it. Consider for example the question "What college did Marcus Camby play for?". A question answering (QA) system could find the answer in the snippet "Marcus Camby won for Massachusetts" as the question verb play is related to the verb win. The viceversa is not true. If the question is "What college did Marcus Camby won for?", the snippet "Marcus Camby played for Massachusetts" cannot be used. Winnig entails playing but not vice-versa, as the relation between win and play is asymmetric. Recently, many automatically built verb lexicalsemantic resources have been proposed to support lexical inferences, such as (Resnik and Diab, 2000; Lin and Pantel, 2001; Glickman and Dagan, 2003). All these resources focus on symmetric semantic relations, such as verb similarity. Yet, 849

not enough attention has been paid so far to the study of asymmetric verb relations, that are often the only way to produce correct inferences, as the example above shows. In this paper we propose a novel approach to identify asymmetric relations between verbs. The main idea is that asymmetric entailment relations between verbs can be analysed in the context of class-level and word-level selectional preferences (Resnik, 1993). Selectional preferences indicate an entailment relation between a verb and its arguments. For example, the selectional preference {human} win may be read as a smooth constraint: if x is the subject of win then it is likely that x is a human, i.e. win(x)  human(x). It follows that selectional preferences like {player} win may be read as suggesting the entailment relation win(x)  play (x). Selectional preferences have been often used to infer semantic relations among verbs and to build symmetric semantic resources as in (Resnik and Diab, 2000; Lin and Pantel, 2001; Glickman and Dagan, 2003). However, in those cases these are exploited in a different way. The assumption is that verbs are semantically related if they share similar selectional preferences. Then, according to the Distributional Hypothesis (Harris, 1964), verbs occurring in similar sentences are likely to be semantically related. The Distributional Hypothesis suggests a generic equivalence between words. Related methods can then only discover symmetric relations. These methods can incidentally find verb pairs as (win,play) where an asymmetric entailment relation holds, but they cannot state the direction of entailment (e.g., winplay). As we investigate the idea that a single relevant verb selectional preference (as {player}

Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 849­856, Sydney, July 2006. c 2006 Association for Computational Linguistics


win) could produce an entailment relation between verbs, our starting point can not be the Distributional Hypothesis. Our assumption is that some point-wise assertions carry relevant semantic information (as in (Robison, 1970)). We do not derive a semantic relation between verbs by comparing their selectional preferences, but we use pointwise corpus-induced selectional preferences. The rest of the paper is organised as follows. In Sec. 2 we discuss the intuition behind our research. In Sec. 3 we describe different types of verb entailment. In Sec. 4 we introduce our model for detecting entailment relations among verbs . In Sec. 5 we review related works that are used both for comparison and for building combined methods. Finally, in Sec. 6 we present the results of our experiments.

modifies v rather than x has this property in the general case, i.e.: p(c(x)|v (x)) > p(c(x)) (1)

The probabilistic setting of selectional preferences also suggests an entailment: the implication v (x)  c(x) holds with a given degree of certainty. This definition is strictly related to the probabilistic textual entailment setting in (Glickman et al., 2005). We can use selectional preferences, intended as probabilistic entailment rules, to induce entailment relations among verbs. In our case, if a verb vt expects that the subject "has the property of doing an action vh ", this may be used to induce that the verb vt probably entails the verb vh , i.e.: vt (x)  vh (x) (2)

2 Selectional Preferences and Verb Entailment
Selectional restrictions are strictly related to entailment. When a verb or a noun expects a modifier having a predefined property it means that the truth value of the related sentences strongly depends on the satisfiability of these expectations. For example, "X is blue" implies the expectation that X has a colour. This expectation may be seen as a sort of entailment between "being a modifier of that verb or noun" and "having a property". If the sentence is "The number three is blue", then the sentence is false as the underlying entailment blue(x)  has colour(x) does not hold (cf. (Resnik, 1993)). In particular, this rule applies to verb logical subjects: if a verb v has a selectional restriction requiring its logical subjects to satisfy a property c, it follows that the implication: v (x)  c(x) should be verified for each logical subject x of the verb v . The implication can also be read as: if x has the property of doing the action v this implies that x has the property c. For example, if the verb is to eat, the selectional restrictions of to eat would imply that its subjects have the property of being animate. Resnik (1993) introduced a smoothed version of selectional restrictions called selectional preferences. These preferences describe the desired properties a modifier should have. The claim is that if a selectional preference holds, it is more probable that x has the property c given that it 850

As for class-based selectional preference acquisition, corpora can be used to estimate these particular kinds of preferences. For example, the sentence "John McEnroe won the match..." contributes to probability estimation of the class-based selectional preference win(x)  human(x) (since John McEnroe is a human). In particular contexts, it contributes also to the induction of the entailment relation between win and play, as John McEnroe has the property of playing. However, as the example shows, classes relevant for acquiring selectional preferences (such as human) are explicit, as they do not depend from the context. On the contrary, properties such as "having the property of doing an action" are less explicit, as they depend more strongly on the context of sentences. Thus, properties useful to derive entailment relations among verbs are more difficult to find. For example, it is easier to derive that John McEnroe is a human (as it is a stable property) than that he has the property of playing. Indeed, this latter property may be relevant only in the context of the previous sentence. However, there is a way to overcome this limitation: agentive nouns such as runner make explicit this kind of property and often play subject roles in sentences. Agentive nouns usually denote the "doer" or "performer" of some action. This is exactly what is needed to make clearer the relevant property vh (x) of the noun playing the logical subject role. The action vh will be the one entailed by the verb vt heading the sentence. As an example in the sentence "the player wins", the action play


evocated by the agentive noun player is entailed by win.

3 Verb entailment: a classification
The focus of our study is on verb entailment. A brief review of the WordNet (Miller, 1995) verb hierarchy (one of the main existing resources on verb entailment relations) is useful to better explain the problem and to better understand the applicability of our hypothesis. In WordNet, verbs are organized in synonymy sets (synsets) and different kinds of semantic relations can hold between two verbs (i.e. two synsets): troponymy, causation, backwardpresupposition, and temporal inclusion. All these relations are intended as specific types of lexical entailment. According to the definition in (Miller, 1995) lexical entailment holds between two verbs vt and vh when the sentence Someone vt entails the sentence Someone vh (e.g. "Someone wins" entails "Someone plays"). Lexical entailment is then an asymmetric relation. The four types of WordNet lexical entailment can be classified looking at the temporal relation between the entailing verb vt and the entailed verb vh . Troponymy represents the hyponymy relation between verbs. It stands when vt and vh are temporally co-extensive, that is, when the actions described by vt and vh begin and end at the same times (e.g. limpwalk). The relation of temporal inclusion captures those entailment pairs in which the action of one verb is temporally included in the action of the other (e.g. snoresleep). Backwardpresupposition stands when the entailed verb vh happens before the entailing verb vt and it is necessary for vt . For example, win entails play via backward-presupposition as it temporally follows and presupposes play. Finally, in causation the entailing verb vt necessarily causes vh . In this case, the temporal relation is thus inverted with respect to backward-presupposition, since vt precedes vh . In causation, vt is always a causative verb of change, while vh is a resultative stative verb (e.g. buyown, and givehave). As a final note, it is interesting to notice that the Subject-Verb structure of vt is generally preserved in vh for all forms of lexical entailment. The two verbs have the same subject. The only exception is causation: in this case the subject of the entailed verb vh is usually the object of vt (e.g., X give Y  Y have). In most cases the subject of vt carries 851

out an action that changes the state of the object of vt , that is then described by vh . The intuition described in Sec. 2 is then applicable only for some kinds of verb entailments. First, the causation relation can not be captured since the two verbs should have the same subject (cf. eq. (2)). Secondly, troponymy seems to be less interesting than the other relations, since our focus is more on a logic type of entailment (i.e., vt and vh express two different actions one depending from the other). We then focus our study and our experiments on backward-presupposition and temporal inclusion. These two relations are organized in WordNet in a single set (called ent) parted from troponymy and causation pairs.

4 The method
Our method needs two steps. Firstly (Sec. 4.1), we translate the verb selectional expectations in specific Subject-Verb lexico-syntactic patterns P (vt , vh ). Secondly (Sec. 4.2), we define a statistical measure S (vt , vh ) that captures the verb preferences. This measure describes how much the relations between target verbs (vt , vh ) are stable and commonly agreed. Our method to detect verb entailment relations is based on the idea that some point-wise assertions carry relevant semantic information. This idea has been firstly used in (Robison, 1970) and it has been explored for extracting semantic relations between nouns in (Hearst, 1992), where lexico-syntactic patterns are induced by corpora. More recently this method has been applied for structuring terminology in isa hierarchies (Morin, 1999) and for learning question-answering patterns (Ravichandran and Hovy, 2002). 4.1 Nominalized textual entailment lexico-syntactic patterns The idea described in Sec. 2 can be applied to generate Subject-Verb textual entailment lexicosyntactic patterns. It often happens that verbs can undergo an agentive nominalization, e.g., play vs. player. The overall procedure to verify if an entailment between two verbs (vt , vh ) holds in a pointwise assertion is: whenever it is possible to apply the agentive nominalization to the hypothesis vh , scan the corpus to detect those expressions in which the agentified hypothesis verb is the subject of a clause governed by the text verb vt . Given a verb pair (vt , vh ) the assertion is for-


Lexico-syntactic patterns
Pnom (vt , vh ) =

nominalization
Phb (vt , vh ) =

happens-before (Chklovski and Pantel, 2004)

probabilistic entailment (Glickman et al., 2005) additional sets

Ppe (vt , vh ) =

{"ag ent(vh )|num:sing vt |person:third,t:pres ", "ag ent(vh )|num:plur vt |person:nothird,t:pres ", "ag ent(vh )|num:sing vt |t:past ", "ag ent(vh )|num:plur vt |t:past "} {"vh |t:inf and then vt |t:pres ", "vh |t:inf * and then vt |t:pres ", "vh |t:past and then vt |t:pres ", "vh |t:past * and then vt |t:pres ", "vh |t:inf and later vt |t:pres ", "vh |t:past and later vt |t:pres ", "vh |t:inf and subsequently vt |t:pres ", "vh |t:past and subsequently vt |t:pres ", "vh |t:inf and eventually vt |t:pres ", "vh |t:past and eventually vt |t:pres "} {"vh |person:third,t:pres "  "vt |person:third,t:pres ", "vh |t:past "  "vt |t:past ", "vh |t:pres cont "  "vt |t:pres cont ", "vh |person:nothird,t:pres "  "vt |person:nothird,t:pres "}

Fagent (v ) = F (v ) = Fall (v ) =

{"ag ent(v )|num:sing ", "ag ent(v )|num:plur "} {"v |person:third,t:present ", "v |person:nothird,t:present ", "v |t:past "} {"v |person:third,t:pres ", "v |t:pres cont , "v |person:nothird,t:present ", "v |t:past "}

Table 1: Nominalization and related textual entailment lexico-syntactic patterns

malized in a set of textual entailment lexicosyntactic patterns, that we call nominalized patterns Pnom (vt , vh ). This set is described in Tab. 1. ag ent(v ) is the noun deriving from the agentification of the verb v . Elements such as l|f1 ,...,fN are the tokens generated from lemmas l by applying constraints expressed via the feature-value pairs f1 , ..., fN . For example, in the case of the verbs play and win, the related set of textual entailment expressions derived from the patterns are Pnom (win, play ) = {"player wins", "players win", "player won", "players won"}. In the experiments hereafter described, the required verbal forms have been obtained using the publicly available morphological tools described in (Minnen et al., 2001). Simple heuristics have been used to produce the agentive nominalizations of verbs1 . Two more sets of expressions, Fagent (v ) and F (v ) representing the single events in the pair, are needed for the second step (Sec. 4.2). This two additional sets are described in Tab. 1. In the example, the derived expressions are Fagent (play ) = {"player","players"} and F (win) = {"wins","won"}. 4.2 Measures to estimate the entailment strength

The related entailment can not be considered commonly agreed. For example, the sentence "Like a writer composes a story, an artist must tell a good story through their work." suggests that compose entails write. However, it may happen that these correctly detected entailments are accidental, that is, the detected relation is only valid for the given text. For example, if the text fragment "The writers take a simple idea and apply it to this task" is taken in isolation, it suggests that take entails write, but this could be questionable. In order to get rid of these wrong verb pairs, we perform a statistical analysis of the verb selectional preferences over a corpus. This assessment will validate point-wise entailment assertions. Before introducing the statistical entailment indicator, we provide some definitions. Given a corpus C containing samples, we will refer to the absolute frequency of a textual expression t in the corpus C with fC (t). The definition can be easily extended to a set of expressions T . Given a pair vt and vh we define the following entailment strength indicator S (vt , vh ). Specifically, the measure Snom (vt , vh ) is derived from point-wise mutual information (Church and Hanks, 1989): Snom (vt , vh ) = log p(vt , vh |nom) p(vt )p(vh |pers) (3)

The above textual entailment patterns define pointwise entailment assertions. If pattern instances are found in texts, the related verb-subject pairs suggest but not confirm a verb selectional preference.
1 Agentive nominalization has been obtained adding "-er" to the verb root taking into account possible special cases such as verbs ending in "-y". A form is retained as a correct nominalization if it is in WordNet.

where nom is the event of having a nominalized textual entailment pattern and pers is the event of having an agentive nominalization of verbs. Probabilities are estimated using maximum-likelihood: p(vt , vh |nom)  fC (Pnom (vt , vh )) P th, fC ( nom (v , v ))

852


p(vt )  fC (F (vt ))/fC ( (v )), and F p(vh |pers)  fC (Fagent (vh ))/fC ( ag ent (v )). Counts are considered useful when they are greater or equal to 3. The measure Snom (vt , vh ) indicates the relatedness between two elements composing a pair, in line with (Chklovski and Pantel, 2004; Glickman et al., 2005) (see Sec. 5). Moreover, if Snom (vt , vh ) > 0 the verb selectional preference property described in eq. (1) is satisfied.
F

indicator. The two methods are then fairly in line. The other approach we experiment is the "quasi-pattern" used in (Glickman et al., 2005) to capture lexical entailment between two sentences. The pattern has to be discussed in the more general setting of the probabilistic entailment between texts: the text T and the hypothesis H . The idea is that the implication T  H holds (with a degree of truth) if the probability that H holds knowing that T holds is higher that the probability that H holds alone, i.e.: p(H |T ) > p(H ) (4)

5 Related "non-distributional" methods and integrated approaches
Our method is a "non-distributional" approach for detecting semantic relations between verbs. We are interested in comparing and integrating our method with similar approaches. We focus on two methods proposed in (Chklovski and Pantel, 2004) and (Glickman et al., 2005). We will shortly review these approaches in light of what introduced in the previous sections. We also present a simple way to combine these different approaches. The lexico-syntactic patterns introduced in (Chklovski and Pantel, 2004) have been developed to detect six kinds of verb relations: similarity, strength, antonymy, enablement, and happensbefore. Even if, as discussed in (Chklovski and Pantel, 2004), these patterns are not specifically defined as entailment detectors, they can be useful for this purpose. In particular, some of these patterns can be used to investigate the backwardpresupposition entailment. Verb pairs related by backward-presupposition are not completely temporally included one in the other (cf. Sec. 3): the entailed verb vh precedes the entailing verb vt . One set of lexical patterns in (Chklovski and Pantel, 2004) seems to capture the same idea: the happens-before (hb) patterns. These patterns are used to detect not temporally overlapping verbs, whose relation is semantically very similar to entailment. As we will see in the experimental section (Sec. 6), these patterns show a positive relation with the entailment relation. Tab. 1 reports the happens-before lexico-syntactic patterns (Phb ) as proposed in (Chklovski and Pantel, 2004). In contrast to what is done in (Chklovski and Pantel, 2004) we decided to directly count patterns derived from different verbal forms and not to use an estimation factor. As in our work, also in (Chklovski and Pantel, 2004), a mutualinformation-related measure is used as statistical 853

This equation is similar to equation (1) in Sec. 2. In (Glickman et al., 2005), words in H and T are supposed to be mutually independent. The previous relation between H and T probabilities then holds also for word pairs. A special case can be applied to verb pairs: p(vh |vt ) > p(vh ) (5)

Equation (5) can be interpreted as the result of the following "quasi-pattern": the verbs vh and vt should co-occur in the same document. It is possible to formalize this idea in the probabilistic entailment "quasi-patterns" reported in Tab. 1 as Ppe , where verb form variability is taken into consideration. In (Glickman et al., 2005) point-wise mutual information is also a relevant statistical indicator for entailment, as it is strictly related to eq. (5). For both approaches, the strength indicator Shb (vt , vh ) and Spe (vt , vh ) are computed as follows: p(vt , vh |y ) Sy (vt , vh ) = log (6) p(vt )p(vh ) where y is hb for the happens-before patterns and pe for the probabilistic entailment patterns. Probabilities are estimated as in the previous section. Considering independent the probability spaces where the three patterns lay (i.e., the space of subject-verb pairs for nom, the space of coordinated sentences for hb, and the space of documents for pe), the combined approaches are obtained summing up Snom , Shb , and Spe . We will then experiment with these combined approaches: nom + pe, nom + hb, nom + hb + pe, and hb + pe.

6 Experimental Evaluation
The aim of the experimental evaluation is to establish if the nominalized pattern is useful to help


(a) 1 0.8 0.6 S e(t) 0.4 0.2 0 0 0.2 0.4 0.6 1 - S p(t) 0.8 1 nom hb pe hb + pe hb + pe + nom S e(t) 0.4 0.2 0 0 0.2 1 0.8 0.6

(b)

hb hb + pe hb + pe + n hb + pe + n 0.4 0.6 1 - S p(t) 0.8 1

Figure 1: ROC curves of the different methods in detecting verb entailment. We experiment with the method by itself or in combination with other sets of patterns. We are then interested only in verb pairs where the nominalized pattern is applicable. The best pattern or the best combined method should be the one that gives the highest values of S to verb pairs in entailment relation, and the lowest value to other pairs. We need a corpus C over which to estimate probabilities, and two dataset, one of verb entailment pairs, the True Set (T S ), and another with verbs not in entailment, the Control Set (C S ). We use the web as corpus C where to estimate Smi and GoogleT M as a count estimator. The web has been largely employed as a corpus (e.g., (Turney, 2001)). The findings described in (Keller and Lapata, 2003) suggest that the count estimations we need in our study over Subject-Verb bigrams are highly correlated to corpus counts. 6.1 Experimental settings the test is positive, while S p(t) is the probability of belonging to C ontrolS et if the test is negative, i.e.: S e(t) = p((vh , vt )  T S |S (vh , vt ) > t) S p(t) = p((vh , vt )  C S |S (vh , vt ) < t) The ROC curve (S e(t) vs. 1 - S p(t)) naturally follows (see Fig. 1). Better methods will have ROC curves more similar to the step function f (1 - S p(t)) = 0 when 1 - S p(t) = 0 and f (1 - S p(t)) = 1 when 0 < 1 - S p(t)  1. The ROC analysis provides another useful evaluation tool: the AROC, i.e. the total area under the ROC curve. Statistically, AROC represents the probability that the method in evaluation will rank a chosen positive example higher than a randomly chosen negative instance. AROC is usually used to better compare two methods that have similar ROC curves. Better methods will have higher AROCs. As True Set (T S ) we use the controlled verb entailment pairs ent contained in WordNet. As described in Sec. 3, the entailment relation is a semantic relation defined at the synset level, standing in the verb sub-hierarchy. That is, each pair of synsets (St , Sh ) is an oriented entailment relation between St and Sh . WordNet contains 409 entailed synsets. These entailment relations are consequently stated also at the lexical level. The pair (St , Sh ) naturally implies that vt entails vh for each possible vt  St and vh  Sh . It is possible to derive from the 409 entailment synset a test set of 2,233 verb pairs. As Control Set we use two sets: random and ent. The random set 854

Since we have a predefined (but not exhaustive) set of verb pairs in entailment, i.e. ent in WordNet, we cannot replicate a natural distribution of verb pairs that are or are not in entailment. Recall and precision lose sense. Then, the best way to compare the patterns is to use the ROC curve (Green and Swets, 1996) mixing sensitivity and specificity. ROC analysis provides a natural means to check and estimate how a statistical measure is able to distinguish positive examples, the True Set (T S ), and negative examples, the Control Set (C S ). Given a threshold t, S e(t) is the probability of a candidate pair (vh , vt ) to belong to True Set if


is randomly generated using verb in ent, taking care of avoiding to capture pairs in entailment relation. A pair is considered a control pair if it is not in the True Set (the intersection between the True Set and the Control Set is empty). The ent is the set of pairs in ent with pairs in the reverse order. These two Control Sets will give two possible ways of evaluating the methods: a general and a more complex task. As a pre-processing step, we have to clean the two sets from pairs in which the hypotheses can not be nominalized, as our pattern Pnom is applicable only in these cases. The pre-processing step retains 1,323 entailment verb pairs. For comparative purposes the random Control Set is kept with the same cardinality of the True Set (in all, 1400 verb pairs). S is then evaluated for each pattern over the True Set and the Control Set, using equation (3) for Pnom , and equation (6) for Ppe and Phb . The best pattern or combined method is the one that is able to most neatly split entailment pairs from random pairs. That is, it should in average assign higher S values to pairs in the True Set. 6.2 Results and analysis

hb pe nom nom + pe hb + pe hb + nom + pe hb hb + pe hb + nom + pe

AROC 56.00 57.00 59.94 64.40 61.44 66.44 61.64 69.03 70.82

best accuracy 57.11 55.75 59.86 61.33 58.98 63.09 62.73 64.71 66.07

Table 2: Performances in the general case: ent vs. random
hb nom hb hb + nom hb + nom AROC 43.82 54.91 56.18 49.35 57.67 best accuracy 50.11 54.94 57.16 51.73 57.22

Table 3: Performances in the complex case: ent vs. ent tion hb + pe that excludes the Pnom pattern (61%), the improvement in the AROC is of 5% and 3%. Moreover, the shape of the nom + hb + pe ROC curve in Fig. 1.(a) is above all the other in all the points. In the second experiment we compared methods in the more complex task of dividing the ent set from the ent set. In this case methods are asked to determine if win  play is a correct entailment and play  win is not. Results of these set of experiments is presented in Tab. 3. The nominalized pattern nom preserves its discriminative power. Its AROC is over the chance line even if, as expected, it is worse than the one obtained in the general case. Surprisingly, the happensbefore (hb) set of patterns seems to be not correlated the entailment relation. The temporal relation vh -happens-before-vt does not seem to be captured by those patterns. But, if this evidence is seen in a positive way, it seems that the patterns are better capturing the entailment when used in the reversed way (hb). This is confirmed by its AROC value. If we observe for example one of the implications in the True Set, reach  g o what is happening may become clearer. Sample sentences respectively for the hb case and the hb case are "The group therefore elected to go to Tyso and then reach Anskaven" and "striving to reach personal goals and then go beyond them". It seems that in the second case then assumes an enabling role more than only a temporal role. After this sur855

In the first experiment we compared the performances of the methods in dividing the ent test set and the random control set. The compared methods are: (1) the set of patterns taken alone, i.e. nom, hb, and pe; (2) some combined methods, i.e. nom + pe, hb + pe, and nom + hb + pe. Results of this first experiment are reported in Tab. 2 and Fig. 1.(a). As Figure 1.(a) shows, our nominalization pattern Pnom performs better than the others. Only Phb seems to outperform nominalization in some point of the ROC curve, where Pnom presents a slight concavity, maybe due to a consistent overlap between positive and negative examples at specific values of the S threshold t. In order to understand which of the two patterns has the best discrimination power a comparison of the AROC values is needed. As Table 2 shows, Pnom has the best AROC value (59.94%) indicating a more interesting behaviour with respect to Phb and Ppe . It is respectively 2 and 3 absolute percent point higher. Moreover, the combinations nom + hb + pe and nom + pe that includes the Pnom pattern have a very high performance considering the difficulty of the task, i.e. 66% and 64%. If compared with the combina-


prising result, as we expected, in this experiment even the combined approach hb + nom behaves better than hb + nom and better than hb, respectively around 8% and 1.5% absolute points higher (see Tab. 3). The above results imposed the running of a third experiment over the general case. We need to compare the entailment indicators derived exploiting the new use of hb, i.e. hb, with respect to the methods used in the first experiment. Results are reported in Tab. 2 and Fig. 1.(b). As Fig. 1.(b) shows, the hb has a very interesting behaviour for small values of 1 - S p(t). In this area it behaves extremely better than the combined method nom + hb + pe. This is an advantage and the combined method nom + hb + pe exploit it as both the AROC and the shape of the ROC curve demonstrate. Again the method nom + hb + pe that includes the Pnom pattern has 1,5% absolute points with respect to the combined method hb + pe that does not include this information.

ceedings of the 1st Pascal Challenge Workshop, Southampton, UK. David M. Green and John A. Swets. 1996. Signal Detection Theory and Psychophysics. John Wiley and Sons, New York, USA. Zellig Harris. 1964. Distributional structure. In Jerrold J. Katz and Jerry A. Fodor, editors, The Philosophy of Linguistics, New York. Oxford University Press. Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 15th International Conference on Computational Linguistics (CoLing-92), Nantes, France. Frank Keller and Mirella Lapata. 2003. Using the web to obtain frequencies for unseen bigrams. Computational Linguistics, 29(3), September. Dekan Lin and Patrick Pantel. 2001. DIRT-discovery of inference rules from text. In Proc. of the ACM Conference on Knowledge Discovery and Data Mining (KDD-01), San Francisco, CA. George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39­41, November. Guido Minnen, John Carroll, and Darren Pearce. 2001. Applied morphological processing of english. Natural Language Engineering, 7(3):207­223. Emmanuel Morin. 1999. Extraction de liens semantiques entre termes a partir de corpus de ´ ` ´ textes techniques. Ph.D. thesis, Univesite de Nantes, ´ Faculte des Sciences et de Techniques. Deepak Ravichandran and Eduard Hovy. 2002. Learning surface text patterns for a question answering system. In Proceedings of the 40th ACL Meeting, Philadelphia, Pennsilvania. Philip Resnik and Mona Diab. 2000. Measuring verb similarity. In Twenty Second Annual Meeting of the Cognitive Science Society (COGSCI2000), Philadelphia. Philip Resnik. 1993. Selection and Information: A Class-Based Approach to Lexical Relationships. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania. Harold R. Robison. 1970. Computer-detectable semantic structures. Information Storage and Retrieval, 6(3):273­288. Peter D. Turney. 2001. Mining the web for synonyms: Pmi-ir versus lsa on toefl. In Proc. of the 12th European Conference on Machine Learning, Freiburg, Germany.

7 Conclusions
In this paper we presented a method to discover asymmetric entailment relations between verbs and we empirically demonstrated interesting improvements when used in combination with similar approaches. The method is promising and there is still some space for improvements. As implicitly experimented in (Chklovski and Pantel, 2004), some beneficial effect can be obtained combining these "non-distributional" methods with the methods based on the Distributional Hypothesis.

References
Timoty Chklovski and Patrick Pantel. 2004. VerbOCEAN: Mining the web for fine-grained semantic verb relations. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcellona, Spain. Kenneth Ward Church and Patrick Hanks. 1989. Word association norms, mutual information and lexicography. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics (ACL), Vancouver, Canada. Oren Glickman and Ido Dagan. 2003. Identifying lexical paraphrases from a single corpus: A case study for verbs. In Proceedings of the International Conference Recent Advances of Natural Language Processing (RANLP-2003), Borovets, Bulgaria. Oren Glickman, Ido Dagan, and Moshe Koppel. 2005. Web based probabilistic textual entailment. In Pro-

856