A Logic-based Semantic Approach to Recognizing Textual Entailment

Marta Tatu and Dan Moldovan
Language Computer Corporation
Richardson, Texas, 75080, United States of America
{marta,moldovan}@languagecomputer.com

Abstract

This paper proposes a knowledge representation model and a logic proving setting with axioms on demand successfully used for recognizing textual entailment. It also details a lexical inference system which boosts the performance of the deep, semantically oriented approach on the RTE data. The linear combination of two slightly different logical systems with the third lexical inference system achieves 73.75% accuracy on the RTE 2006 data.

1 Introduction

While communicating, humans use different expressions to convey the same meaning. One of the central challenges for natural language understanding systems is to determine whether different text fragments have the same meaning or, more generally, whether the meaning of one text can be derived from the meaning of another. A module that recognizes the semantic entailment between two text snippets can be employed by many NLP applications. For example, Question Answering systems have to identify texts that entail expected answers. In Multi-document Summarization, the redundant information should be recognized and omitted from the summary.

Trying to boost research in textual inferences, the PASCAL Network proposed the Recognizing Textual Entailment (RTE) challenges (Dagan et al., 2005; Bar-Haim et al., 2006). Given a pair of text fragments, the task is to determine whether the meaning of one text (the entailed hypothesis, denoted by h) can be inferred from the meaning of the other text (the entailing text, t).

In this paper, we propose a model to represent the knowledge encoded in text and a logical setting suitable for a semantic entailment recognition system. We cast the textual inference problem as a logic implication between meanings: text t semantically entails hypothesis h if the meaning of t logically implies the meaning of h. Thus, we first transform both text fragments into logic form, capture their meaning by detecting the semantic relations that hold between their constituents, and load these rich logic representations into a natural language logic prover which decides whether or not the entailment holds. Figure 1 illustrates our approach to RTE. The following sections of the paper detail the logic proving methodology, our logical representation of text, and the various types of axioms that the prover uses.

To our knowledge, there are few logical approaches to RTE. (Bos and Markert, 2005) represents t and h in a first-order logic translation of the DRS language used in Discourse Representation Theory (Kamp and Reyle, 1993) and uses a theorem prover and a model builder with some generic, lexical and geographical background knowledge to prove the entailment between the two texts. (de Salvo Braz et al., 2005) proposes a Description Logic-based knowledge representation language used to induce the representations of t and h, and uses an extended subsumption algorithm to check if any of t's representations obtained through equivalent transformations entails h.

2 COGEX - A Logic Prover for NLP

Our system uses COGEX (Moldovan et al., 2003), a natural language prover originating from OTTER (McCune, 1994). Once its set of support is loaded with t and the negated hypothesis (¬h), and its usable list is loaded with the axioms needed to generate inferences, COGEX begins to search for proofs. Every inference is assigned an appropriate weight depending on the axiom used for its derivation. If a refutation is found, the proof is complete; if a refutation cannot be found, then predicate arguments are relaxed. When argument relaxation fails to produce a refutation, entire predicates are dropped from the negated hypothesis until a refutation is found.
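COGEX itself is not distributed with this paper, so the following is only a toy Python sketch of the strategy just described: h-atoms that cannot be derived from t (directly or through a one-step axiom) are dropped, and the proof is penalized accordingly. The atom encoding, the single-premise axiom table, and the scoring are all illustrative assumptions, not the OTTER-based implementation.

```python
# Toy sketch of the COGEX strategy: every h-atom that cannot be derived
# from t (directly or via a one-step axiom) is dropped, and the proof
# score is the perfect score minus the normalized penalty (Section 2.1).
# Purely illustrative; the real prover performs resolution with
# argument relaxation inside OTTER.

AXIOMS = {("murder_VB",): "kill_VB"}   # e.g. murder -> kill (Section 4.1)

def derivable(atom, text_atoms):
    if atom in text_atoms:
        return True
    return any(conclusion == atom and all(p in text_atoms for p in premises)
               for premises, conclusion in AXIOMS.items())

def proof_score(text_atoms, hyp_atoms):
    """1.0 = perfect proof; the penalty for dropped h-predicates is
    divided by the maximum assessable penalty (all of h dropped)."""
    if not hyp_atoms:
        return 1.0
    dropped = sum(1 for atom in hyp_atoms if not derivable(atom, text_atoms))
    return 1.0 - dropped / len(hyp_atoms)

t = {"murder_VB", "man_NN"}            # "[Someone] murdered the man"
h = {"kill_VB", "man_NN"}              # "The man was killed"
print(proof_score(t, h))               # 1.0: kill follows from murder
```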
Figure 1: COGEX's Architecture

2.1 Proof scoring algorithm

Once a proof by contradiction is found, its score is computed by starting with an initial perfect score and deducting points for each axiom utilized in the proof, every relaxed argument, and every dropped predicate. The computed score is a measure of the kinds of axioms used in the proof and of the significance of the dropped arguments and predicates.

If we assume that both text fragments are existentially quantified, then t → h if and only if h's entities are a subset of t's entities (Some smart people read → Some people read), and penalizing a pair whose h contains predicates that cannot be inferred is a correct way to ensure entailment (Some people read ↛ Some smart people read). But, if both t and h are universally quantified, then the groups mentioned in t must be a subset of the ones from h (All people read → All smart people read, and All smart people read ↛ All people read). Thus, for universally quantified fragments, the scoring module adds back the points for the modifiers dropped from h and subtracts points for t's modifiers not present in h. The remaining two cases are summarized in Table 1.

Table 1: The quantification of t and h influences the proof scoring algorithm

  (∀t, ∃h)   All people read → Some smart people read;     Add back the dropped points
             All smart people read → Some people read      for h's modifiers
  (∃t, ∀h)   Some people read ↛ All smart people read;     Subtract points for t's
             Some smart people read ↛ All people read      modifiers not present in h

Because pairs with longer sentences can potentially drop more predicates and receive a lower score, COGEX normalizes the proof scores by dividing the assessed penalty by the maximum assessable penalty (the one incurred when all the predicates from h are dropped). If this final proof score is above a threshold learned on the development data, then the pair is labeled as a positive entailment.

3 Knowledge Representation

For the textual entailment task, our logic prover uses a two-layered logical representation which captures the syntactic and semantic propositions encoded in a text fragment.

3.1 Logic Form Transformation

In the first stage of our representation process, COGEX converts t and h into logic forms (Moldovan and Rus, 2001). More specifically, a predicate is created for each noun, verb, adjective and adverb. The nouns that form a noun compound are gathered under an nn_NNC predicate. Each named entity class of a noun has a corresponding predicate which shares its argument with the noun predicate it modifies. Predicates for prepositions and conjunctions are also added to link the text's constituents. This syntactic layer of the logic representation is automatically derived from a full parse tree and acknowledges syntax-based relationships such as syntactic subjects, syntactic objects, prepositional attachments, complex nominals, and adjectival/adverbial adjuncts.

In order to objectively evaluate our representation, we derived it from two different sources: constituency parse trees (generated with our implementation of (Collins, 1997)) and dependency parse trees (created using Minipar (Lin, 1998)) [1]. The two logic forms are slightly different: the dependency representation captures the syntactic dependencies between the concepts more accurately, but lacks the semantic information that our semantic parser extracts from the constituency parse trees.

For instance, the sentence Gilda Flores was kidnapped on the 13th of January 1990 [2] has the "constituency" representation Gilda_NN(x1) & Flores_NN(x2) & nn_NNC(x3,x1,x2) & human_NE(x3) & kidnap_VB(e1,x9,x3) & on_IN(e1,x8) & 13th_NN(x4) & of_NN(x5) & January_NN(x6) & 1990_NN(x7) & nn_NNC(x8,x4,x5,x6,x7) & date_NE(x8), and its "dependency" logic form is Gilda_Flores_NN(x2) & human_NE(x2) & kidnap_VB(e1,x4,x2) & on_IN(e1,x3) & 13th_NN(x3) & of_IN(x3,x1) & January_1990_NN(x1).

[1] The experimental results described in this paper were produced using two systems: the logic prover receiving as input the constituency logic representation (COGEX_C) and the one receiving the dependency representation (COGEX_D).
[2] All examples shown in this paper are from the entailment corpus released as part of the Second RTE challenge (www.pascal-network.org/Challenges/RTE2). The RTE datasets are described in Section 7.
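The predicate-generation conventions above are mechanical enough to sketch. Below is a minimal illustration over pre-tagged tokens, showing only the one-predicate-per-open-class-word rule and the nn_NNC compound predicate; it is an assumption-laden toy, not the actual transformation, which operates on full parse trees.

```python
# Minimal sketch of the syntactic layer of the logic form (Section 3.1):
# one predicate per open-class word, plus an nn_NNC predicate gathering
# a trailing noun compound. Toy code over pre-tagged tokens.

OPEN_CLASS = {"NN", "VB", "JJ", "RB"}

def logic_form(tagged):                         # [(word, pos), ...]
    predicates, compound = [], []
    for i, (word, pos) in enumerate(tagged, start=1):
        if pos not in OPEN_CLASS:
            continue
        predicates.append(f"{word}_{pos}(x{i})")
        compound = compound + [f"x{i}"] if pos == "NN" else []
    if len(compound) > 1:                       # e.g. Gilda + Flores
        predicates.append(f"nn_NNC(x{len(tagged) + 1},{','.join(compound)})")
    return " & ".join(predicates)

print(logic_form([("Gilda", "NN"), ("Flores", "NN")]))
# -> Gilda_NN(x1) & Flores_NN(x2) & nn_NNC(x3,x1,x2)
```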
3.1.1 Negation

The exceptions to the one-predicate-per-open-class-word rule include the adverbs not and never. In cases similar to further details were not released, the system removes not_RB(x3,e1) and negates the verb's predicate (-release_VB(e1,x1,x2)). Similarly, for nouns whose determiner is no, for example, No case of indigenously acquired rabies infection has been confirmed, the verb's predicate is negated (case_NN(x1) & -confirm_VB(e2,x15,x1)).

3.2 Semantic Relations

The second layer of our logic representation adds the semantic relations, the underlying relationships between concepts. They provide the semantic background for the text, which allows for a denser connectivity between the concepts expressed in text. Our semantic parser takes free English text or parsed sentences and extracts a rich set of semantic relations [3] between words or concepts in each sentence. It focuses not only on the verb and its arguments, but also on semantic relations encoded in syntactic patterns such as complex nominals, genitives, adjectival phrases, and adjectival clauses. Our representation module maps each semantic relation identified by the parser to a predicate whose arguments are the events and entities that participate in the relation, and adds these semantic predicates to the logic form. For example, the previous logic form is augmented with the THEME_SR(x3,e1) & TIME_SR(x8,e1) relations [4] (Gilda Flores is the theme of the kidnap event and 13th of January 1990 shows the time of the kidnapping).

[3] We consider relations such as AGENT, THEME, TIME, LOCATION, MANNER, CAUSE, INSTRUMENT, POSSESSION, PURPOSE, MEASURE, KINSHIP, ATTRIBUTE, etc.
[4] R_SR(x,y) should be read as "x is R of y".

3.3 Temporal Representation

In addition to the semantic predicates, we represent every date/time in a normalized form: time_TMP(BeginFn(event), year, month, date, hour, minute, second) & time_TMP(EndFn(event), year, month, date, hour, minute, second). Furthermore, temporal reasoning predicates are derived both from the detected semantic relations and from a module which utilizes a learning algorithm to detect temporally ordered events, producing (e1, signal, e2) triples, where signal is the temporal signal linking the two events e1 and e2 (Moldovan et al., 2005). From each triple, temporally related SUMO predicates are generated based on hand-coded rules for the signal classes (sequence → earlier_TMP(e1,e2), contain → during_TMP(e1,e2), etc.). In the above example, 13th of January 1990 is normalized to the interval time_TMP(BeginFn(e2), 1990, 1, 13, 0, 0, 0) & time_TMP(EndFn(e2), 1990, 1, 13, 23, 59, 59), and during_TMP(e1,e2) is added to the logical representation to show when the kidnapping occurred.
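The interval normalization just described is straightforward to sketch. The helper below is hypothetical (the real module also handles underspecified times and emits the SUMO ordering predicates); unknown fields default naively.

```python
# Sketch of the date normalization in Section 3.3: a date is mapped to
# a [BeginFn, EndFn] pair of time_TMP predicates. Hypothetical helper;
# unknown fields default crudely (e.g. end day 31 regardless of month).

def normalize_date(event, year, month=None, day=None):
    begin = (year, month or 1, day or 1, 0, 0, 0)
    end = (year, month or 12, day or 31, 23, 59, 59)
    pred = lambda fn, v: f"time_TMP({fn}({event}), {', '.join(map(str, v))})"
    return f"{pred('BeginFn', begin)} & {pred('EndFn', end)}"

print(normalize_date("e2", 1990, 1, 13))
# time_TMP(BeginFn(e2), 1990, 1, 13, 0, 0, 0) &
# time_TMP(EndFn(e2), 1990, 1, 13, 23, 59, 59)
```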
4 Axioms on Demand

COGEX's usable list consists of all the axioms generated either automatically or by hand. The system generates axioms on demand for a given (t, h) pair whenever the semantic connectivity between two concepts needs to be established in a proof. The axioms on demand are lexical chains and world knowledge axioms. We are keen on the idea of axioms on demand since it is not possible to derive a priori all the axioms needed in an arbitrary proof. This brings a considerable level of robustness to our entailment system.

4.1 eXtended WordNet lexical chains

For the semantic entailment task, the ability to recognize two semantically related words is an important requirement. Therefore, we automatically construct lexical chains of WordNet relations from t's constituents to h's (Moldovan and Novischi, 2002). In order to avoid errors introduced by a Word Sense Disambiguation system, we used only the top-ranked senses for each word [5], unless the source and the target of the chain are synonyms. If a chain exists [6], the system generates, on demand, an axiom with the predicates of the source (from t) and the target (from h). For example, given the IS-A relation between murder#1 and kill#1, the system generates, when needed, the axiom murder_VB(e1,x1,x2) → kill_VB(e1,x1,x2).

[5] Because WordNet senses are ranked based on their frequency, the correct sense is most likely among the first few senses.
[6] Each lexical chain is assigned a weight based on its properties: shorter chains are better than longer ones, the relations are not equally important, and their order in the chain influences its strength. If the weight of a chain is above a given threshold, the lexical chain is discarded.

The remainder of this section details some of the requirements for creating accurate lexical chains. Because our extended version of WordNet has named entities attached to each noun synset, the lexical chain axioms append the entity name of the target concept whenever it exists. For example, the logic prover uses the axiom Nicaraguan_JJ(x1,x2) → Nicaragua_NN(x1) & country_NE(x1) when it tries to infer electoral campaign is held in Nicaragua from Nicaraguan electoral campaign.

We ensured the relevance of the lexical chains by limiting the path length to three relations and by restricting the set of WordNet relations used to create the chains, discarding the paths that contain certain relations in a particular order. For example, the automatic axiom generation module does not consider chains with an IS-A relation followed by a HYPONYMY link. Similarly, the system rejects chains with more than one HYPONYMY relation. Although these relations link semantically related concepts, the type of semantic similarity they introduce is not suited for inferences. Another restriction imposed on the lexical chains generated for entailment is not to start from or include too general concepts [7]. Therefore, we assigned to each noun and verb synset from WordNet a generality weight based on its relative position within its hierarchy and on its frequency in a large corpus: if depth(c) is the depth of concept c, maxDepth(c) is the maximum depth in c's hierarchy, and IC(c) is the information content of c measured on the British National Corpus, then the generality weight of c increases as both depth(c)/maxDepth(c) and IC(c) decrease. In our experiments, we discarded the chains with concepts whose generality weight exceeded 0.8, such as object_NN#1, act_VB#1, be_VB#1, etc.

[7] There are no restrictions on the target concept.
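The chain restrictions above can be summarized in a small filter. In the sketch below, the gw() combination is an outright assumption (the text specifies only its inputs, its monotonicity, and the 0.8 threshold), and the per-synset weights passed in are presumed precomputed.

```python
# Sketch of the lexical-chain restrictions in Section 4.1. The gw()
# formula is an illustrative assumption: the paper states only that the
# weight reflects a synset's relative depth in its hierarchy and its
# BNC information content, and that chains reaching concepts with
# weight above 0.8 (object#1, be#1, ...) are discarded.

def gw(depth, max_depth, ic, max_ic):
    """Generality weight: close to 1 for shallow, frequent synsets."""
    return 1.0 - (depth / max_depth) * (ic / max_ic)

def acceptable_chain(relations, concept_weights, threshold=0.8):
    """relations: WordNet links along the chain, source to target."""
    if len(relations) > 3:                       # path length limit
        return False
    if any(w > threshold for w in concept_weights):
        return False                             # too-general concept
    if relations.count("HYPONYMY") > 1:          # at most one HYPONYMY
        return False
    # forbidden order: IS-A immediately followed by HYPONYMY
    return all(not (r1 == "IS-A" and r2 == "HYPONYMY")
               for r1, r2 in zip(relations, relations[1:]))

print(acceptable_chain(["IS-A", "HYPONYMY"], [0.3, 0.4, 0.5]))  # False
print(acceptable_chain(["IS-A"], [0.3, 0.4]))                   # True
```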
Another important change that we introduced in our extension of WordNet is the refinement of the DERIVATION relation, which links verbs with their corresponding nominalizations. Because the relation is ambiguous regarding the role of the noun, we split it in three: ACT-DERIVATION, AGENT-DERIVATION and THEME-DERIVATION. The role of the nominalization determines the argument given to the noun predicate. For instance, the axioms act_VB(e1,x1,x2) → acting_NN(e1) (ACT) and act_VB(e1,x1,x2) → actor_NN(x1) (AGENT) reflect different types of derivation.

4.2 NLP Axioms

Our NLP axioms are linguistic rewriting rules that help break down complex logic structures and express syntactic equivalence. After analyzing the logic form and the parse trees of each text fragment, the system automatically generates axioms which break down complex nominals and coordinating conjunctions into their constituents, so that other axioms can be applied individually to the components. These axioms are made available only to the (t, h) pair that generated them. For example, the axiom nn_NNC(x3,x1,x2) & francisco_NN(x1) & merino_NN(x2) → merino_NN(x3) breaks down the noun compound Francisco Merino into Francisco and Merino and helps COGEX infer Merino's home from Francisco Merino's home.

4.3 World Knowledge Axioms

Because, sometimes, the lexical or the syntactic knowledge cannot solve an entailment pair, we exploit the WordNet glosses, an abundant source of world knowledge. We used the logic forms of the glosses provided by eXtended WordNet [8] to automatically create our world knowledge axioms. For example, the first sense of the noun Pope and its definition the head of the Roman Catholic Church introduce the axiom Pope_NN(x1) → head_NN(x1) & of_IN(x1,x2) & Roman_Catholic_Church_NN(x2), which is used by the prover to show the entailment between t: A place of sorrow, after Pope John Paul II died, became a place of celebration, as Roman Catholic faithful gathered in downtown Chicago to mark the installation of new Pope Benedict XVI. and h: Pope Benedict XVI is the new leader of the Roman Catholic Church.

We also incorporate in our system a small common-sense knowledge base of 383 hand-coded world knowledge axioms, where 153 have been manually designed based on the entire development set data and 230 originate from previous projects. These axioms express knowledge that could not be derived from WordNet regarding employment [9], family relations, awards, etc.

[8] http://xwn.hlt.utdallas.edu
[9] For example, the axiom country_NE(x1) & negotiator_NN(x2) & nn_NNC(x3,x1,x2) → work_VB(e1,x2,x4) & for_IN(e1,x1) helps the prover infer that Christopher Hill works for the US from top US negotiator, Christopher Hill.
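Outside the prover, a gloss-derived axiom behaves like a rewrite rule over ground predicates. Below is a minimal sketch using the Pope axiom, treating predicates as opaque strings with no real variable unification; it illustrates the mechanism, not the prover's inference procedure.

```python
# Sketch of a gloss-derived world-knowledge axiom (Section 4.3) viewed
# as a rewrite rule over a set of ground predicates. The tiny matcher
# is illustrative only; the real axioms operate inside the prover with
# proper unification.

POPE_AXIOM = (
    ["Pope_NN(x1)"],                                 # antecedent
    ["head_NN(x1)", "of_IN(x1,x2)",                  # consequent
     "Roman_Catholic_Church_NN(x2)"],
)

def apply_axiom(facts, axiom):
    antecedent, consequent = axiom
    if all(a in facts for a in antecedent):
        return facts | set(consequent)
    return facts

facts = {"Pope_NN(x1)", "Benedict_XVI_NN(x1)"}
print(sorted(apply_axiom(facts, POPE_AXIOM)))
# adds head_NN(x1), of_IN(x1,x2) and Roman_Catholic_Church_NN(x2)
```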
5 Semantic Calculus

The Semantic Calculus axioms combine two semantic relations identified within a text fragment and increase the semantic connectivity of the text (Tatu and Moldovan, 2005). A semantic axiom which combines two relations, R1 and R2, is devised by observing the semantic connection between the x and z words for which there exists at least one other word, y, such that R1(x,y) and R2(y,z) hold true. We validated these axioms by checking all the pairs from the LA Times text collection for which the axioms' premises hold. We note that not any two semantic relations can be combined: R1 and R2 have to be compatible with respect to the part-of-speech of the common argument. Depending on their properties, there are up to 8 combinations between any two semantic relations and their inverses, not counting the combinations between a semantic relation and itself [10]. Many combinations are not semantically significant; for example, KINSHIP_SR(x1,x2) & TEMPORAL_SR(x2,e1) is unlikely to be found in text. Trying to solve the semantic combinations one comes upon in text corpora, we analyzed the RTE development corpora and devised rules for some of the combinations encountered.

We have identified 82 semantic axioms that show how semantic relations can be combined. These axioms enable inference of unstated meaning from the semantics detected in text. For example, if t explicitly states the KINSHIP (KIN) relations between Nicholas Cage and Alice Kim Cage and between Alice Kim Cage and Kal-el Coppola Cage, the logic prover uses the KIN_SR(x1,x2) & KIN_SR(x2,x3) → KIN_SR(x1,x3) semantic axiom (the transitivity of the blood relation) and the symmetry of this relationship (KIN_SR(x1,x2) ↔ KIN_SR(x2,x1)) to infer h's statement (KIN(Kal-el Coppola Cage, Nicholas Cage)). Another frequent axiom is LOCATION_SR(x1,x2) & PART-WHOLE_SR(x2,x3) → LOCATION_SR(x1,x3). Given the text John lives in Dallas, Texas and using this axiom, the system infers that John lives in Texas. The system applies the 82 axioms independently of the concepts involved in the semantic composition. However, there are rules that can be applied only if the concepts that participate satisfy a certain condition or if the relations are of a certain type. For example, LOCATION_SR(x1,x2) & LOCATION_SR(x2,x3) → LOCATION_SR(x1,x3) holds only if the LOCATION relations show inclusion (John is in the car in the garage → LOCATION_SR(John,garage); John is near the car behind the garage ↛ LOCATION_SR(John,garage)).

[10] Harabagiu and Moldovan (1998) lists the exact number of possible combinations for several WordNet relations and part-of-speech classes.
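Applied to ground facts, the KIN axioms amount to a symmetric-transitive closure. The sketch below illustrates this on the example above; it is a stand-alone closure pass for clarity, whereas in COGEX these are ordinary prover axioms.

```python
# Sketch of Semantic Calculus axioms (Section 5) applied to ground
# KIN_SR facts: symmetry plus transitivity closure. Illustrative only.

def kin_closure(pairs):
    kin = set(pairs) | {(b, a) for a, b in pairs}      # symmetry
    changed = True
    while changed:                                     # transitivity
        new = {(a, c) for a, b1 in kin for b2, c in kin
               if b1 == b2 and a != c}
        changed = not new <= kin
        kin |= new
    return kin

facts = [("Nicholas Cage", "Alice Kim Cage"),
         ("Alice Kim Cage", "Kal-el Coppola Cage")]
print(("Kal-el Coppola Cage", "Nicholas Cage") in kin_closure(facts))  # True
```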
6 Temporal Axioms

One of the types of temporal axioms that we load in our logic prover links specific dates to more general time intervals. For example, October 2000 entails the year 2000. These axioms are automatically generated before the search for a proof starts. Additionally, the prover uses a SUMO knowledge base of temporal reasoning axioms that consists of axioms for a representation of time points and time intervals, Allen (Allen, 1991) primitives, and temporal functions. For example, during is a transitive Allen primitive: during_TMP(e1,e2) & during_TMP(e2,e3) → during_TMP(e1,e3).

7 Experiments and Results

The benchmark corpus for the RTE 2005 task consists of seven subsets with a 50%-50% split between the positive and negative entailment examples. Each subgroup corresponds to a different NLP application: Information Retrieval (IR), Comparable Documents (CD), Reading Comprehension (RC), Question Answering (QA), Information Extraction (IE), Machine Translation (MT), and Paraphrase Acquisition (PP). The RTE data set includes 1367 English pairs from the news domain (political, economical, etc.). The RTE 2006 data covered only four NLP tasks (IE, IR, QA and Multi-document Summarization (SUM)) with an identical split between positive and negative examples. Table 2 presents the data statistics. For the RTE 2005 data, we list the confidence-weighted score (cws) (Dagan et al., 2005) and, for the RTE 2006 data, the average precision (ap) measure (Bar-Haim et al., 2006).

Table 2: Dataset statistics

                    RTE 2005   RTE 2006
  Development set      567        800
  Test set             800        800

7.1 COGEX's Results

Tables 3 and 4 summarize COGEX's performance on the RTE datasets when it received as input the different-source logic forms. On the RTE 2005 data, the overall performance on the test set is similar for both logic proving runs, COGEX_C and COGEX_D. On the development set, the semantically enhanced logic forms helped the prover distinguish the positive entailments better (COGEX_C has an overall higher precision than COGEX_D). If we analyze the performance on the test data, then COGEX_C performs slightly better on MT, CD and PP, and worse on the RC, IR and QA tasks. The major differences between the two logic forms are the semantic content (incomplete for the dependency-derived logic forms) and, because the text's tokenization is different, the number of predicates in the logic forms, which leads to completely different proof scores.

On the RTE 2006 test data, the system which uses the dependency logic forms outperforms COGEX_C. COGEX_D performs better on almost all tasks (except SUM) and brings a significant improvement over COGEX_C on the IR task. Some of the positive examples that the systems did not label correctly require world knowledge that we do not have encoded in our axiom set. One example for which both systems returned the wrong answer is pair 353 (test set, RTE 2006) where, from China's decade-long practice of keeping its currency valued at around 8.28 yuan to the dollar, the system should recognize the relation between the yuan and China's currency and infer that the currency used in China is the yuan because a country's currency → currency used in the country. Some of the pairs that the prover currently cannot handle involve numeric calculus and human-oriented estimations. Consider, for example, pair 359 (dev set, RTE 2006), labeled as positive, for which the logic prover could not determine that 15 safety violations → numerous safety violations. The deeper analysis of the systems' output showed that, while WordNet lexical chains and NLP axioms are the most frequently used axioms throughout the proofs, the semantic and temporal axioms bring the highest improvement in accuracy for the RTE data.

Table 3: RTE 2005 data results (accuracy, confidence-weighted score, and f-measure for the true class)

  Task       COGEX_C              COGEX_D              LEXALIGN             COMBINATION
          acc    cws    f      acc    cws    f      acc    cws    f      acc    cws    f
  IE     58.33  60.90  60.31  57.50  57.03  51.42  56.66  53.41  59.99  62.50  67.63  57.14
  IR     52.22  62.41  15.68  53.33  59.67  27.58  50.00  55.92   0.00  68.88  75.77  64.10
  CD     82.00  88.90  79.69  79.33  87.15  74.38  82.00  88.04  80.57  84.66  91.73  82.70
  QA     50.00  56.27   0.00  51.53  42.37  64.80  53.07  43.76  63.90  60.76  55.05  63.82
  RC     53.57  56.38  38.09  57.14  59.32  58.33  57.85  60.26  49.57  60.00  62.89  50.00
  MT     55.83  55.83  53.91  52.50  58.17  27.84  51.66  45.94  67.04  64.16  63.80  66.66
  PP     56.00  63.11  26.66  54.00  58.15  30.30  50.00  47.03   0.00  68.00  75.27  63.63
  TEST   59.37  63.09  48.00  59.12  57.17  54.52  59.12  55.74  59.17  67.25  67.64  64.69
  DEV    63.66  63.44  64.48  61.19  63.63  57.52  62.08  59.94  60.83  70.37  71.89  66.66

Table 4: RTE 2006 data results (accuracy, average precision, and f-measure for the true class)

  Task       COGEX_C              COGEX_D              LEXALIGN             COMBINATION
          acc    ap     f      acc    ap     f      acc    ap     f      acc    ap     f
  IE     58.00  49.71  57.57  59.00  59.74  63.71  54.00  49.70  67.14  71.50  62.99  71.36
  IR     62.50  65.91  56.14  73.50  72.50  73.89  64.50  69.45  65.02  74.00  74.30  72.92
  QA     62.00  67.30  48.64  64.00  68.16  57.64  58.50  55.78  57.86  70.50  75.10  66.67
  SUM    74.50  77.60  74.62  74.00  79.68  73.73  70.50  76.82  73.05  79.00  80.33  78.13
  TEST   64.25  66.31  60.16  67.62  70.69  67.50  61.87  57.64  66.07  73.75  71.33  72.37
  DEV    64.50  64.05  66.19  69.00  70.92  69.31  62.25  62.66  62.72  75.12  76.28  76.83
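For reference, the confidence-weighted score reported in Table 3 can be computed as follows, per its definition in (Dagan et al., 2005): rank the judgments by decreasing confidence and average the precision at every rank. A small sketch:

```python
# Sketch of the confidence-weighted score (cws) used for the RTE 2005
# runs: sort the n pairs by decreasing confidence, then average, over
# every rank i, the fraction of correct judgments within the top i.

def cws(predictions):
    """predictions: list of (confidence, is_correct) tuples."""
    ranked = sorted(predictions, key=lambda p: -p[0])
    correct_so_far, total = 0, 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += is_correct
        total += correct_so_far / i
    return total / len(ranked)

print(cws([(0.9, True), (0.7, True), (0.4, False)]))  # ~0.889
```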
7.2 Lexical Alignment

Inspired by the positive examples whose h is, to a large degree, lexically subsumed by t, we developed a shallow system which measures their overlap by computing an edit distance between the text and the hypothesis. The cost of deleting a word from t is equal to 0; the cost of replacing a word from t with a word from h that is not its WordNet synonym is infinite (we do not allow replace operations); and the cost of inserting a word from h varies with the part-of-speech of the inserted word (higher values for WordNet nouns, adjectives or adverbs, lower for verbs, and a minimum value for everything else). Table 5 shows a minimum cost alignment. The performance of this lexical method (LEXALIGN) is shown in Tables 3 and 4.

The alignment technique performs significantly better on the pairs in the CD (RTE 2005) and SUM (RTE 2006) tasks. For these tasks, all three systems performed best because the text of the false pairs does not entail the hypothesis even at the lexical level. For pair 682 (test set, RTE 2006), t and h have very few words overlapping and there are no axioms that can be used to derive knowledge that supports the hypothesis. Contrarily, for the IE task, the systems were fooled by the high word overlap between t and h. For example, pair 678's text (test set, RTE 2006) contains the entire hypothesis in its if clause. For this task, we had the highest number of false positives, around double when compared to the other applications.

LEXALIGN works surprisingly well on the RTE data. It outperforms the semantic systems on the 2005 QA test data, but it has its limitations. The logic representations are generated from parse trees which are not always accurate (around 86% accuracy). Once syntactic and semantic parsers are perfected, the logical semantic approach shall prove its potential.

Table 5: The lexical alignment for RTE 2006 pair 615 (test set)

  t                          op    h
  The Council of Europe            The Council of Europe
  has                        DEL   *
  *                          INS   is made up by
  45 member states.                45 member states.
  Three countries from ...   DEL   *
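The cost scheme behind Table 5 maps naturally onto a word-level edit distance. In the sketch below, the numeric insertion costs and the synonym set are illustrative assumptions; only the structure (free deletions from t, forbidden non-synonym replacements, POS-dependent insertions) follows the description above.

```python
# Sketch of the LEXALIGN cost scheme (Section 7.2) as a word-level DP.
# Numeric costs are assumptions; deletions from t are free, replacement
# is allowed only between (exact-match or) WordNet synonyms, insertions
# from h cost more for nouns/adjectives/adverbs than for verbs.

INS_COST = {"NN": 3.0, "JJ": 3.0, "RB": 3.0, "VB": 2.0}  # else 1.0

def align_cost(t_words, h_words, synonyms=frozenset()):
    n, m = len(t_words), len(h_words)
    ins = lambda j: INS_COST.get(h_words[j][1], 1.0)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins(j - 1)
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0]                 # deleting from t is free
        for j in range(1, m + 1):
            same = (t_words[i - 1][0] == h_words[j - 1][0]
                    or (t_words[i - 1][0], h_words[j - 1][0]) in synonyms)
            d[i][j] = min(
                d[i - 1][j],                  # delete from t (cost 0)
                d[i][j - 1] + ins(j - 1),     # insert from h
                d[i - 1][j - 1] + (0.0 if same else float("inf")),
            )
    return d[n][m]

# Simplified fragment of RTE 2006 pair 615:
t = [("The", "DT"), ("Council", "NN"), ("has", "VB"), ("45", "CD"),
     ("member", "NN"), ("states", "NN")]
h = [("The", "DT"), ("Council", "NN"), ("is", "VB"), ("made", "VB"),
     ("up", "RP"), ("by", "IN"), ("45", "CD"), ("member", "NN"),
     ("states", "NN")]
print(align_cost(t, h))   # 6.0: inserting "is made up by" costs 2+2+1+1
```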
7.3 Merging three systems

Because the two logical representations and the lexical method are very different and perform better on different sets of tasks, we combined the scores returned by each system [12] to see if a mixed approach performs better than each individual method. For each NLP task, we built a classifier based on the linear combination of the three scores: each task's classifier labels a pair as positive if the weighted sum of the three scores exceeds a threshold, where the optimum values of the classifier's real-valued parameters (the weights and the threshold) were determined using a grid search on each development set. Given the different nature of each application, the parameters vary with each task. For example, the final score given to each IE 2006 pair is highly dependent on the score given by COGEX_C, the system which received as input the logic forms created from the constituency parse trees, with a small correction from the dependency parse trees logic form system. For the IE task, the lexical alignment performs the worst among the three systems. On the other hand, for the IR task, the score given by LEXALIGN is taken into account.

Tables 3 and 4 summarize the performance of the three-system combination. This hybrid approach performs better than all other systems for all measures on all tasks. It displays the same behavior as its dependents: high accuracy on the CD and SUM tasks and many false positives for the IE task.

[12] Each system returns a score between 0 and 1, a number close to 0 indicating a probable negative example and a number close to 1 indicating a probable positive example. Each pair's lexical alignment score is the normalized average edit distance cost.
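The per-task combination can be sketched as a brute-force grid search. The weight names, the [0, 1] grid and its step below are assumptions; the paper states only that the real-valued parameters were grid-searched per task on the development set.

```python
# Sketch of the per-task combination classifier (Section 7.3): label a
# pair positive when a linear combination of the three system scores
# clears a threshold, with parameters grid-searched on the dev set.

import itertools

def predict(scores, weights, threshold):
    return sum(w * s for w, s in zip(weights, scores)) > threshold

def grid_search(dev_pairs, step=0.1):
    """dev_pairs: list of ((s_C, s_D, s_L), gold_label) tuples."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    best, best_acc = None, -1.0
    for w1, w2, w3, th in itertools.product(grid, repeat=4):
        acc = sum(predict(s, (w1, w2, w3), th) == gold
                  for s, gold in dev_pairs) / len(dev_pairs)
        if acc > best_acc:
            best, best_acc = ((w1, w2, w3), th), acc
    return best, best_acc

dev = [((0.9, 0.8, 0.4), True), ((0.2, 0.3, 0.6), False)]
(weights, threshold), acc = grid_search(dev)
print(weights, threshold, acc)
```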
8 Conclusion

In this paper, we present a logic form representation of knowledge which captures syntactic dependencies as well as semantic relations between concepts and includes special temporal predicates. We implemented several changes to our WordNet lexical chains module which lead to fewer unsound axioms, and we incorporated in our logic prover semantic and temporal axioms which decrease its dependence on world knowledge. We plan to improve our logic prover to detect false entailments even when the two texts have a high word overlap, and to expand our axiom set.

References

J. Allen. 1991. Time and Time Again: The Many Ways to Represent Time. International Journal of Intelligent Systems, 4(6):341-355.

R. Bar-Haim, I. Dagan, B. Dolan, L. Ferro, D. Giampiccolo, B. Magnini, and I. Szpektor. 2006. The Second PASCAL Recognising Textual Entailment Challenge. In Proceedings of the Second PASCAL Challenges Workshop.

J. Bos and K. Markert. 2005. Recognizing Textual Entailment with Logical Inference. In Proceedings of HLT/EMNLP 2005, Vancouver, Canada, October.

M. Collins. 1997. Three Generative, Lexicalized Models for Statistical Parsing. In Proceedings of ACL-97.

I. Dagan, O. Glickman, and B. Magnini. 2005. The PASCAL Recognising Textual Entailment Challenge. In Proceedings of the PASCAL Challenges Workshop, Southampton, U.K., April.

R. de Salvo Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. 2005. An Inference Model for Semantic Entailment in Natural Language. In Proceedings of AAAI-2005.

S. Harabagiu and D. Moldovan. 1998. Knowledge Processing on Extended WordNet. In Christiane Fellbaum, editor, WordNet: an Electronic Lexical Database and Some of its Applications, pages 379-405. MIT Press.

H. Kamp and U. Reyle. 1993. From Discourse to Logic: Introduction to Model-theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer Academic Publishers.

D. Lin. 1998. Dependency-based Evaluation of MINIPAR. In Workshop on the Evaluation of Parsing Systems, Granada, Spain, May.

W. McCune. 1994. OTTER 3.0 Reference Manual and Guide.

D. Moldovan and A. Novischi. 2002. Lexical Chains for Question Answering. In Proceedings of COLING, Taipei, Taiwan, August.

D. Moldovan and V. Rus. 2001. Logic Form Transformation of WordNet and its Applicability to Question Answering. In Proceedings of ACL, France.

D. Moldovan, C. Clark, S. Harabagiu, and S. Maiorano. 2003. COGEX: A Logic Prover for Question Answering. In Proceedings of HLT/NAACL.

D. Moldovan, C. Clark, and S. Harabagiu. 2005. Temporal Context Representation and Reasoning. In Proceedings of IJCAI, Edinburgh, Scotland.

M. Tatu and D. Moldovan. 2005. A Semantic Approach to Recognizing Textual Entailment. In Proceedings of HLT/EMNLP.