SIGIR 2007 Proceedings Poster

Incorporating Term Dependency in the DFR Framework

Jie Peng, Craig Macdonald, Ben He, Iadh Ounis
University of Glasgow, Glasgow, UK
{pj, craigm, ben, ounis}@dcs.gla.ac.uk

Vassilis Plachouras
Yahoo! Research, Barcelona, Spain
vassilis@yahoo-inc.com

ABSTRACT

Term dependency, or co-occurrence, has been studied in language modelling, for instance by Metzler & Croft [3], who showed that retrieval performance could be significantly enhanced using term dependency information. In this work, we show how term dependency can be modelled within the Divergence From Randomness (DFR) framework. We evaluate our term dependency model on two adhoc retrieval tasks using the TREC .GOV2 Terabyte collection. Furthermore, we examine the effect of varying the term dependency window size on the retrieval performance of the proposed model. Our experiments show that term dependency can indeed be successfully incorporated within the DFR framework.

Categories and Subject Descriptors: H.3.3 [Information Storage & Retrieval]: Information Search & Retrieval

General Terms: Performance, Experimentation

Keywords: Term Dependency, DFR

1. INTRODUCTION

Document weighting models, such as BM25 and language modelling, rank documents using the occurrences of single query terms in documents, assuming that the query terms are independent. However, previous studies have shown that taking the dependency of query terms in documents into account can improve retrieval performance [3, 4]. In particular, Metzler & Croft developed a formal framework for modelling term dependencies via Markov random fields [3]. They explored three possible relation variants between query terms: Full Independence, which assumes query terms are independent of each other; Sequential Dependence, which only assumes a dependence between neighbouring query terms; and Full Dependence, which assumes all query terms are dependent on each other. Their experiments showed that their term dependency model could significantly improve retrieval performance.

In this paper, we show how term dependency can be naturally modelled within the Divergence From Randomness framework [1] (Section 2). We evaluate the proposed model in the context of the TREC 2005 and TREC 2006 Terabyte track adhoc tasks (Section 3). In particular, we observe comparable conclusions to Metzler & Croft's [3], namely that the incorporation of term dependencies into a DFR-based model can significantly enhance retrieval performance.

Copyright is held by the author/owner(s). SIGIR'07, July 23-27, 2007, Amsterdam, The Netherlands. ACM 978-1-59593-597-7/07/0007.

2. TERM DEPENDENCE IN THE DFR FRAMEWORK

We use the DFR paradigm to capture the dependence of query terms in documents. The proposed model assigns scores to pairs of query terms, in addition to the single query terms. Hence, the score of a document d for a query Q is given as follows:

  score(d, Q) = λ1 · Σ_{t∈Q} score(d, t) + λ2 · Σ_{p∈Q2} score(d, p)    (1)

where score(d, t) is the score assigned to a query term t in the document d, p corresponds to a pair of query terms that appear within the query Q, score(d, p) is the score assigned to a pair of query terms p in the document d, and Q2 is a set of pairs of query terms, as defined below. The two scores are combined linearly, using λ1 & λ2 as weights. For simplicity, we use only binary weights. In Equation (1), the score Σ_{t∈Q} score(d, t) can be estimated by any DFR weighting model. For example, we can use the DFR PL2 document weighting model [1].

We now introduce three possible variants in using term dependencies between query terms: For full independence (FI), the introduced weighting model only computes the first component of Equation (1), as it ignores the term dependencies between query terms (λ1 = 1, λ2 = 0).
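The linear combination of Equation (1), together with the construction of the pair set Q2 for the three variants, can be sketched as follows. This is a minimal illustration under assumed names (score_document, score_term, and score_pair are hypothetical helpers, not the paper's implementation); any DFR weighting model, such as PL2, could be plugged in for the single-term scores, and a pair Randomness model for the pair scores:

```python
# Minimal sketch of Equation (1): combining DFR single-term scores with
# pair scores over Q2. The DFR models themselves are abstracted as callables;
# all names and signatures here are illustrative assumptions.
from itertools import combinations

def score_document(doc, query_terms, score_term, score_pair,
                   variant="FD", lambda1=1.0, lambda2=1.0):
    """Return score(d, Q) = lambda1 * sum_t score(d, t) + lambda2 * sum_p score(d, p)."""
    single = sum(score_term(doc, t) for t in query_terms)

    if variant == "FI":
        # Full independence: only the first component (lambda2 = 0).
        return lambda1 * single
    if variant == "SD":
        # Sequential dependence: ordered pairs of neighbouring query terms.
        q2 = list(zip(query_terms, query_terms[1:]))
    elif variant == "FD":
        # Full dependence: unordered pairs of any two query terms.
        q2 = list(combinations(query_terms, 2))
    else:
        raise ValueError(f"unknown variant: {variant}")

    pairs = sum(score_pair(doc, p) for p in q2)
    return lambda1 * single + lambda2 * pairs
```

For a three-term query, SD yields two ordered neighbouring pairs while FD yields all three unordered pairs, so FD incurs quadratically more pair scores as queries grow.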
The FI variant is equivalent to PL2 alone. For sequential dependence (SD), we compute both components of Equation (1) (λ1 = 1, λ2 = 1), and in this case, Q2 is the set that contains ordered pairs of neighbouring query terms. For full dependence (FD), we compute both components of Equation (1) (λ1 = 1, λ2 = 1), and in this case, Q2 is the set that contains unordered pairs of any two query terms. In this paper, we consider only pairs of query terms, even if we could easily extend the model to more than two terms.

The weight score(d, p) of a pair of query terms in a document is computed as follows:

  score(d, p) = -log2(P_p1) · (1 - P_p2)    (2)

where P_p1 corresponds to the probability that there is a document in which a pair of query terms p occurs a given number of times. P_p1 can be computed with any Randomness model from the DFR framework. P_p2 corresponds to the probability of seeing the pair of query terms once more, after having seen it a given number of times. P_p2 can be computed using any of the After-effect models in the DFR framework. The difference between score(d, p) and score(d, t) is that the former depends on counts of occurrences of the pair of query terms p, while the latter depends on counts of occurrences of the query term t.

To compute the weight score(d, p), we use a Randomness model which does not consider the collection frequency of the pair of query terms. It is based on the binomial Randomness model, given as follows [2]:

  score(d, p) = (1 / (pfn + 1)) · ( -log2((l - 1)!) + log2(pfn!) + log2((l - 1 - pfn)!)
                - pfn · log2(p_p) - (l - 1 - pfn) · log2(p_p') )    (3)

where l is the length of the document in tokens, p_p = 1/(l - 1), p_p' = 1 - p_p, and pfn is the normalised frequency of the pair of query terms p, using Normalisation 2 [1]:

  pfn = pf · log2(1 + cp · (avg_l - 1)/(l - 1))    (cp > 0)    (4)

pf is the frequency of the pair of query terms p that appear within window_size tokens in the document (for SD, these must appear in the same order as in pair p), avg_l is the average document length in the collection, and cp is a hyper-parameter that controls the normalisation applied to the pair of query terms frequency against document length.

3. EXPERIMENTS AND ANALYSIS

We use the TREC .GOV2 Terabyte test collection, and its associated TREC 2005 and 2006 adhoc title-only topics and relevance assessment sets. For indexing and retrieval, we use Terrier (http://ir.dcs.gla.ac.uk/terrier/), with Porter's weak stemming and removal of stopwords. For all our experiments, we set c = 6 and cp = 0.05 for the term frequency normalisation parameter of PL2 and the pair of query terms frequency normalisation, respectively, as suggested by [2]. For the window_size, we use 5, the default setting suggested by [2]. We report Mean Average Precision (MAP), binary preference (b-Pref), and Precision at 10 (P@10).

               TREC 2006 adhoc task
        MAP              b-Pref           P@10
  FI    0.3062           0.3572           0.5640
  SD    0.3175 (+3.7%)   0.3670 (+2.7%)   0.5940 (+5.3%)
  FD    0.3297 (+7.7%)   0.3811 (+6.7%)   0.5600 (-0.7%)

               TREC 2005 adhoc task
        MAP              b-Pref           P@10
  FI    0.3407           0.3770           0.6180
  SD    0.3384 (-0.2%)   0.3767 (0.0%)    0.6280 (+1.6%)
  FD    0.3488 (+2.4%)   0.3868 (+2.6%)   0.6240 (+1.0%)

Table 1: MAP, b-Pref, and P@10 scores for each variant on the TREC 2006 and TREC 2005 adhoc tasks, respectively. Values in parentheses denote the percentage improvement over full independence (FI). Statistical significance is measured with the Wilcoxon Matched-Pairs Signed-Ranks Test (p < 0.05). The best result in each column is emphasised.

Firstly, in Table 1, we assess the extent to which the incorporation of term dependency enhances retrieval performance over a full independence baseline (FI). We observe that the FD variant always outperforms the baseline, except for P@10 on the TREC 2006 queries (the improvements on the TREC 2006 queries are statistically significant). Moreover, the SD variant outperforms the FI baseline (these improvements are statistically significant for MAP and b-Pref on the TREC 2006 queries), except for MAP and b-Pref on the TREC 2005 queries, which are slightly (but not significantly) lower than the baseline.

Secondly, we compare the SD and FD variants. From Table 1, we observe that while FD outperforms SD for MAP and b-Pref (these improvements are significant on the TREC 2006 queries), P@10 is enhanced more by SD than by FD.

Moreover, we vary the window_size parameter to investigate its effect on retrieval effectiveness. For lack of space, we consider only the TREC 2006 queries and the MAP measure.

[Figure 1: MAP distribution for TREC 2006 for varying window_size (0 to 50), comparing Full Independence (FI), Sequential Dependence (SD), and Full Dependence (FD).]

From Figure 1, we observe that both SD and FD, in most cases, outperform the FI baseline. The performance is stable for the SD variant as window_size is varied. However, the improvement decreases as window_size is increased for the FD variant. In general, a window_size around 5 produces the best MAP scores for the FD variant. Furthermore, for 2 ≤ window_size ≤ 24, the FD variant outperforms SD, while for window_size > 24, SD performs better.

Finally, we notice that the results observed in our experiments mirror those observed in [3], namely that FD can significantly outperform SD, and both FD and SD can significantly outperform the FI baseline, even though, unlike [3], we only consider pairs of query terms. Furthermore, our retrieval performance would likely be enhanced by suitable training of λ1 & λ2 in Equation (1).

4. CONCLUSIONS

We have introduced a novel DFR-centric approach for incorporating term dependencies.
Our approach is shown to be robust, as retrieval effectiveness is enhanced on a large-scale adhoc test collection, using a variety of window sizes. Further enhancement of retrieval performance is likely to be achieved by an appropriate training of the model, similar to that in [3].

5. REFERENCES

[1] G. Amati. Probability Models for Information Retrieval based on Divergence From Randomness. PhD thesis, University of Glasgow, 2003.
[2] C. Lioma, C. Macdonald, V. Plachouras, J. Peng, B. He, and I. Ounis. Experiments in Terabyte and Enterprise tracks with Terrier. In Proceedings of TREC 2006, 2007.
[3] D. Metzler and W. B. Croft. A Markov Random Field Model for Term Dependencies. In Proceedings of SIGIR 2005, pages 472-479, 2005.
[4] M. Srikanth and R. Srihari. Biterm Language Models for Document Retrieval. In Proceedings of SIGIR 2002, pages 425-426, 2002.