SIGIR 2007 Proceedings Poster

A Comparison of Sentence Retrieval Techniques
Niranjan Balasubramanian, James Allan and W. Bruce Croft
Center for Intelligent Information Retrieval
Department of Computer Science
University of Massachusetts, Amherst, MA 01003
niranjan,allan,croft@cs.umass.edu

1. INTRODUCTION
Identifying redundant information in sentences is useful for several applications such as summarization, document provenance, detecting text reuse, and novelty detection. The task is defined as follows: given a query sentence, retrieve sentences from a given collection that express all or some subset of the information present in the query sentence. Sentence retrieval techniques rank sentences based on some measure of their similarity to a query, so the effectiveness of such techniques depends on the similarity measure used to rank sentences. An effective retrieval model should be able to handle low word overlap between query and candidate sentences and go beyond simple word overlap. Simple language modeling techniques such as query likelihood retrieval have outperformed TF-IDF and word overlap based methods for ranking sentences. In this paper, we compare the performance of sentence retrieval using different language modeling techniques for the problem of identifying redundant information.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval
General Terms: Algorithms, Experimentation, Theory
Keywords: Sentence Retrieval, Language Modeling

2. RELATED WORK
Previous work on novelty detection [9] and summarization [3] has explored several sentence-level similarity measures and retrieval techniques for identifying novel information. Metzler et al. [6] showed that query likelihood outperformed TF-IDF and word overlap based measures for identifying sentences that contain specific facts from a query sentence. Murdock [8] compared query likelihood and a monolingual translation based model for the task of identifying restatements. Jeon et al. [4] describe a mixture model of query likelihood and a translation based model that successfully identifies similar questions. Our work extends [6] and [8] by considering more sophisticated language modeling techniques, namely topic based smoothing, dependence models, and relevance modeling, to improve retrieval effectiveness for identifying redundant information.

3. TEST COLLECTION
The experiments to detect redundant information were conducted on a dataset prepared by Murdock [8]. The dataset consists of documents and 50 queries that are answer sentences for some TREC 2003 QA track questions. For each query, documents were retrieved using query likelihood retrieval. The top 1000 retrieved documents were sentence-segmented, stemmed using the Krovetz stemmer, and the sentences were indexed. The queries were then used to retrieve sentences from their corresponding sentence indexes. For each query, the top 50 sentences retrieved using query likelihood and translation based retrieval models were manually judged for the amount of redundant information they contain.

4. COMPARISON OF TECHNIQUES
We compare advanced techniques within the language modeling framework to improve sentence retrieval effectiveness.

4.1 Topic based Smoothing
Language modeling techniques primarily rely on estimates of word generation probabilities from document and query models. Estimates of word generation probabilities from small units of text such as sentences are not reliable and need to be smoothed. The probabilities are smoothed using a linear interpolation of estimates from a topic model, built from the top 1000 documents retrieved from the document index for each query, and general English.
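A minimal sketch of this topic-smoothed scoring follows. The interpolation weights and function names are our assumptions for illustration; the paper does not specify them.

```python
from collections import Counter
from math import log

def smoothed_score(query_terms, sentence_terms, topic_tf, topic_len,
                   coll_tf, coll_len, lam_s=0.6, lam_t=0.3, lam_c=0.1):
    """Query-likelihood score for a sentence, with each word probability
    linearly interpolated from three sources: the sentence itself, a
    query-specific topic model (built from top-retrieved documents), and
    a general-English collection model. lam_s + lam_t + lam_c = 1."""
    s_counts = Counter(sentence_terms)
    s_len = len(sentence_terms)
    score = 0.0
    for w in query_terms:
        p_sent = s_counts[w] / s_len if s_len else 0.0
        p_topic = topic_tf.get(w, 0) / topic_len
        p_coll = coll_tf.get(w, 0) / coll_len
        p = lam_s * p_sent + lam_t * p_topic + lam_c * p_coll
        score += log(p) if p > 0 else float("-inf")
    return score
```

Sentences are ranked by this score in descending order; the topic and collection terms keep the estimate nonzero for query words absent from a short sentence.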
4.2 Dependence Models
Modeling query term dependencies has been shown to improve retrieval effectiveness for document retrieval [7]. We use the sequential dependence model described in [7] and ignore the full dependence model because it does not scale to long queries. Sequential dependence models capture dependencies between adjacent query terms and thereby relax, to some degree, the independence assumptions made by the query likelihood model.

4.3 Relevance Models
Relevance modeling [5] is a technique for estimating query models from top-ranked retrieved documents. Diaz et al. [2] show that relevance models built using a larger external corpus perform better than relevance models built from the target collection alone. We compare the effectiveness of relevance models built from target and external collections.

Copyright is held by the author(s)/owner(s). SIGIR'07, July 23-27, 2007, Amsterdam, The Netherlands. Copyright 2007 ACM 978-1-59593-597-7/07/0007 ...$5.00.

4.4 Translation based Models
Murdock [8] showed that Model-S, a translation based model, is effective for sentence retrieval, especially for question answering and novelty detection tasks. However, for the task of identifying redundant information at the sentence level, Model-S is only as effective as simple query likelihood retrieval. We compare the effectiveness of Model-S and a mixture model [4] using a large, relatively robust monolingual translation dictionary [4].

5. EXPERIMENTS AND RESULTS
Results are reported for the following retrieval techniques.
1. Query Likelihood Baseline (QL)
2. Topic Smoothing (QL-TS) - Collection estimates linearly interpolated with topic model estimates obtained from the top 1000 documents retrieved for each query.
3. Sequential Dependence Model (DM) - A weighted combination of the original query terms with a sequential dependence model query.
4.
Translation Model (Model-S) - A translation based model described in [8] using a translation dictionary, Webfaq, built from FAQ pairs on the web [4].
5. Mixture Model (MM) - A mixture model of query likelihood and a translation based model (IBM Model 1 [1]) described in [4].
6. Relevance Model-Target (RM-T) - Query model built using the target collection alone.
7. Relevance Model-External + Target (RM-E) - Query model built on a collection of external documents (the Gigaword news corpus) and the target collection.
8. Interpolated Queries (RM+DM) - Best performing dependence model queries interpolated with best performing relevance model queries.
9. Two stage (DMRM) - Best performing dependence model queries used to retrieve documents that are then used to build relevance model queries.

Topic based smoothing and dependence models provide modest improvements over the query likelihood baseline. The translation based models provide no improvements over topic based smoothing. Using relevance models leads to small but significant gains over the best topic based smoothing run. Using a large external collection resulted in minor improvements over relevance models built from the smaller target collection alone. Dependence model queries provide a different form of evidence of relevance than relevance model queries, and therefore their combination yields improvements over the individual queries. DMRM, which uses the best performing dependence model query to build relevance models, achieves the best performance by boosting the quality of the documents used to build the relevance models.

6. CONCLUSIONS
Previous work on sentence retrieval techniques shows that simple query likelihood models outperform word overlap and TF-IDF based measures.
In this short paper, we compared advanced language modeling techniques for the task of identifying redundant information in sentences and showed that they outperform simple query likelihood and topic based smoothing methods.

Acknowledgments
This work was supported in part by the Center for Intelligent Information Retrieval, in part by the Defense Advanced Research Projects Agency (DARPA) under contract number HR0011-06-C-0023, and in part by NSF grant #IIS-0534383. Any opinions, findings and conclusions or recommendations expressed in this material are the authors' and do not necessarily reflect those of the sponsor.

7. REFERENCES
[1] P. Brown, V. Della Pietra, S. Della Pietra, and R. Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263-311, 1993.
[2] F. Diaz and D. Metzler. Improving the estimation of relevance models using large external corpora. In Proceedings of the ACM SIGIR conference, pages 154-161, 2006.
[3] G. Erkan and D. Radev. LexRank: graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22:457-479, 2004.
[4] J. Jeon, W. B. Croft, and J. H. Lee. Finding similar questions in large question and answer archives. In Proceedings of the ACM CIKM conference, pages 84-90, 2005.
[5] V. Lavrenko and W. B. Croft. Relevance based language models. In Proceedings of the ACM SIGIR conference, pages 120-127, 2001.
[6] D. Metzler, Y. Bernstein, W. B. Croft, A. Moffat, and J. Zobel. Similarity measures for tracking information flow. In Proceedings of the ACM CIKM conference, pages 517-524, 2005.
[7] D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proceedings of the ACM SIGIR conference, pages 472-479, 2005.
[8] V. Murdock. Aspects of Sentence Retrieval. PhD thesis, University of Massachusetts Amherst, 2006.
[9] I. Soboroff and D. Harman. Overview of the TREC 2003 Novelty Track. In The Twelfth Text REtrieval Conference (TREC 2003), 2003.

Table 1: Comparison of effectiveness.
Significance of improvements over QL-TS and DM was assessed with a two-tailed paired t-test (p < 0.05). Best results are underlined.

Method    P@5     P@10    P@15    P@20    MAP
QL        0.6776  0.5531  0.4639  0.4102  0.6066
QL-TS     0.6857  0.5694  0.4735  0.4143  0.6248
DM        0.6980  0.5653  0.4735  0.4061  0.6264
Model-S   0.6735  0.5653  0.4803  0.4143  0.6189
MM        0.6735  0.5653  0.4748  0.4153  0.6198
RM-T      0.6939  0.5714  0.4789  0.4276  0.6351
RM-E      0.6980  0.5735  0.4912  0.4276  0.6384
RM+DM     0.7020  0.5755  0.4857  0.4133  0.6417
DMRM      0.7061  0.5714  0.4789  0.4204  0.6438

Table 1 shows retrieval effectiveness in terms of precision at the top ranks and the mean average precision (MAP).
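As a concrete illustration of the relevance model estimation used by RM-T and RM-E, and by the second stage of DMRM, the following sketch estimates an RM1-style query model from top-ranked documents. The smoothing parameter, function names, and weighting details are our assumptions, not the authors' implementation.

```python
from collections import Counter
from math import exp, log

def query_likelihood(query_terms, doc_terms, coll_tf, coll_len, mu=2500):
    """Dirichlet-smoothed log P(Q|D) for a document."""
    counts = Counter(doc_terms)
    dlen = len(doc_terms)
    return sum(
        log((counts[w] + mu * coll_tf.get(w, 0) / coll_len) / (dlen + mu))
        for w in query_terms)

def relevance_model(query_terms, top_docs, coll_tf, coll_len, k=10):
    """RM1 estimate: P(w|R) is proportional to the sum over the top k
    retrieved documents of P(w|D) * P(Q|D)."""
    rm = Counter()
    for doc in top_docs[:k]:
        weight = exp(query_likelihood(query_terms, doc, coll_tf, coll_len))
        counts = Counter(doc)
        for w, c in counts.items():
            rm[w] += weight * c / len(doc)
    total = sum(rm.values())
    return {w: p / total for w, p in rm.items()}
```

The resulting distribution serves as an expanded query model; in the two-stage DMRM run, the documents fed to this estimator would come from a sequential dependence model retrieval instead of plain query likelihood.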