SIGIR 2007 Proceedings Poster

Improving Retrieval Accuracy by Weighting Document Types with Clickthrough Data

Peter C. K. Yeung, Charles L. A. Clarke, Stefan Büttcher
University of Waterloo
Waterloo, Canada
{p2yeung, claclark, sbuettch}@plg.uwaterloo.ca

ABSTRACT
For enterprise search, there is a relationship between work task and document type that can be used to refine search results [3]. In this poster, we adapt the popular Okapi BM25 scoring function to weight term frequency according to the relevance of a document type to a work task. We also use the click frequency of each task-type pair to estimate a realistic weight. In evaluations on the W3C collection from the TREC Enterprise track, our approach leads to significant improvements in search precision.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Search process

General Terms
Performance, Human Factors

Keywords
Enterprise Search, Clickthrough Data

1. INTRODUCTION
Recently, Robertson et al. [4] introduced a modified version of Okapi BM25 that incorporates weights for the different fields of a structured document. The intuition is to rank structured documents according to the importance of each field. Although the modified Okapi BM25 was intended for weighting the fields of a structured document, it can also be used to weight another useful attribute of a document: its type. This poster also introduces an approach that uses clickthrough data to estimate a realistic weight for each work task-document type pair.

Previous research [3] has shown that there is a relationship between work task and document type (or genre) in an enterprise search environment. A document genre is a class of documents grouped together based on similar subject, form, and content. For our purpose, we consider document type, which defines the source of a document (e.g., WWW pages, emails, discussion threads). If a user's work task is known to a retrieval system, retrieval accuracy can be improved by returning documents of the relevant types and ranking them higher in the result list. Therefore, document type is an important factor to consider in the retrieval process.

Clickthrough data is a record of user-submitted queries and of the documents users selected on the corresponding search result pages. Although clickthrough data does not provide a direct indication of document relevance, it provides useful hints for determining which documents (or types of documents) are relevant to a user's need. Many approaches to using clickthrough data to improve retrieval performance have been proposed (e.g., [1]). In this poster, we take a simpler approach to using clickthrough data in the retrieval process.

In our approach, clickthrough data are grouped by task-type pair. To determine the weight of each task-type pair, we consider how often documents of that type were clicked when the work task was given. For example, given a work task, if type A is clicked more frequently than type B, then type A receives a larger weight than type B. Depending on the document's type and on the given work task, we apply the corresponding weight in the modified BM25 to compute the relevance score.

Copyright is held by the author/owner.
SIGIR'07, July 23–27, 2007, Amsterdam, The Netherlands.
ACM 978-1-59593-597-7/07/0007.

2. WEIGHTING DOCUMENT TYPES
The extended version of Okapi BM25 (BM25F) computes a relevance score for each document from weighted term frequencies, each of which is a linear combination of per-field term frequencies and field weights. For query terms $Q_1, Q_2, \ldots, Q_n$, the weighted BM25 relevance score of a document $D$ is

$$S_{BM25F}(D) = \sum_{i=1}^{n} w_{Q_i}\,\frac{(k_1 + 1)\,\tilde{f}_{D,Q_i}}{\tilde{f}_{D,Q_i} + k_1\left((1 - b) + b\,\frac{|D|}{avgdl}\right)} \tag{1}$$

where $|D|$ is the length of $D$ and $avgdl$ is the average document length. $k_1$ (= 1.2) and $b$ (= 0.75) are free parameters.
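Equation 1 can be sketched in Python for a single document whose only weighted field is its type, so that the weighted term frequency is the raw frequency scaled by a per-type weight (the form given later as Equation 4). This is a minimal illustration under stated assumptions, not the authors' implementation: IDF values are assumed to be precomputed and passed in, because the poster does not say which IDF variant is used, and the function and argument names are hypothetical.

```python
from typing import Mapping, Sequence


def bm25f_score(
    query_terms: Sequence[str],
    term_freqs: Mapping[str, int],
    idf: Mapping[str, float],
    doc_len: int,
    avg_doc_len: float,
    type_weight: float = 1.0,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Score one document with Equation (1).

    Hypothetical helper. Assumes the document's type is the only weighted
    field, so the weighted term frequency is type_weight * raw frequency.
    With type_weight = 1.0 this reduces to plain BM25.
    """
    # Length normalisation: (1 - b) + b * |D| / avgdl
    norm = (1.0 - b) + b * doc_len / avg_doc_len
    score = 0.0
    for term in query_terms:
        raw_tf = term_freqs.get(term, 0)
        if raw_tf == 0:
            continue  # a term absent from the document contributes nothing
        weighted_tf = type_weight * raw_tf
        # IDF weight times the saturated, length-normalised weighted tf
        score += idf.get(term, 0.0) * (k1 + 1.0) * weighted_tf / (weighted_tf + k1 * norm)
    return score


if __name__ == "__main__":
    plain = bm25f_score(["a"], {"a": 1}, {"a": 1.0}, 10, 10.0)
    boosted = bm25f_score(["a"], {"a": 1}, {"a": 1.0}, 10, 10.0, type_weight=2.0)
    print(round(plain, 3), round(boosted, 3))  # → 1.0 1.375
```

Setting `type_weight=1.0` recovers ordinary BM25, so the baseline is a special case of the weighted model; a larger type weight raises a document's score while the $k_1$ saturation keeps the gain bounded.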
$w_{Q_i}$ is the inverse document frequency (IDF) weight of $Q_i$. $\tilde{f}_{D,Q_i}$ is the weighted term frequency of $Q_i$; it combines the unweighted frequencies of $Q_i$ with the corresponding field weights $w_j$. Suppose that there are $N$ different fields to be weighted. Then

$$\tilde{f}_{D,Q_i} = \sum_{j=1}^{N} w_j\, f_{D_j,Q_i} \tag{2}$$

where $f_{D_j,Q_i}$ is the unweighted frequency of $Q_i$ in field $j$ of $D$.

For our purpose, document type is the only field that is weighted. If the work task is known, a search system can use the corresponding set of document-type weights to calculate relevance scores.

To obtain a realistic estimate of the weight for each document type, we consider the click frequency of each document type for the given work task. Each weight should have these properties:
· the weight is one if the click frequency is zero;
· the weight increases monotonically with click frequency; and
· the weight increases to an asymptotic maximum.

Given a work task, let $cf_T$ denote the click frequency of document type $T$. A rough model for estimating each weight is

$$w_T = \frac{cf_T + S}{|C| + |T|\,S}\,|T| + 1 \tag{3}$$

where $|T|$ is the number of types, $|C|$ is the total number of clicks, and $S$ (= 1.5) is a smoothing parameter. First, if $cf_T$ is zero, then $w_T \approx 1$ (assuming $S$ is small relative to $|C|$). Second, Equation 3 is linear in $cf_T$ with a positive slope, so $w_T$ increases monotonically as click frequency increases. Finally, if a particular document type dominates the clicks, $cf_T$ approaches $|C|$, and $w_T$ approaches $(|C| + S)|T| / (|C| + |T|S) + 1$, which is close to $|T| + 1$ when $|C|$ is large relative to $S$. Hence, the weight increases to an asymptotic maximum, and Equation 3 satisfies all the properties listed above. Given the weight of each document type for a specific work task, the weighted term frequency is

$$\tilde{f}_{D,Q_i} = w_T\, f_{D,Q_i} \tag{4}$$

where $T$ is the type of $D$.

3. EXPERIMENTAL RESULTS
For our experiments, we use the W3C collection from the TREC 2006 Enterprise track [2]. The W3C collection contains 331,037 documents with a total uncompressed size of 5.7 gigabytes. These documents are categorized into six types: mailing lists, public CVS repositories, public Web pages, wiki pages, personal pages of the W3C team, and other pages.

The evaluation is limited by the nature of the TREC Enterprise track. The queries were created for the expert search task of the TREC Enterprise track, with the objective of finding an expert on a particular topic. Thus, only one work task, expert search, corresponds to these queries. Our objective is to find the relevant document type(s) for the expert search task and to rank documents of these types higher to improve search precision.

We used the clickthrough data collected during the creation of the evaluation topics. However, some clicked-on documents are also listed in the qrels file, and it would be inappropriate to train our model on these clicks and then evaluate it using the same topics and qrels file. Therefore, we removed those clicks from our experiments and used only the ones whose clicked-on document is not identified in the qrels file.

Table 1 shows that our model (BM25+CF) increases precision at 5 documents from 0.6286 to 0.7469, a 19% improvement. The improvement over BM25 is statistically significant for precision at 5 and at 10 documents.

Table 1: Precisions for BM25 and BM25+CF.

| Model   | P@5    | P@10   | P@15   | P@20   |
|---------|--------|--------|--------|--------|
| BM25    | 0.6286 | 0.6143 | 0.6082 | 0.6031 |
| BM25+CF | 0.7469 | 0.6939 | 0.6653 | 0.6459 |

Table 2 shows the number of documents retrieved of each document type over all query topics. The BM25 model mainly retrieved documents from public Web pages and mailing lists, along with a small number of documents from the other four types. However, BM25+CF retrieved the majority of its documents from mailing lists. Therefore, for the expert search task, mailing lists are a more relevant document type, and retrieval performance can be improved by placing more weight on their documents.

Table 2: Number of Documents Retrieved for Each Document Type.

| Type             | BM25  | BM25+CF |
|------------------|-------|---------|
| Web pages        | 21129 | 444     |
| mailing lists    | 22043 | 51930   |
| CVS repositories | 4927  | 0       |
| wiki pages       | 4078  | 178     |
| people           | 4     | 0       |
| other            | 353   | 0       |

4. CONCLUSIONS
We have proposed a basic approach for weighting document types and for estimating the appropriate weight of each document type using click frequency. Click frequency indicates users' judgments of each type for the work task, so it is a helpful source for estimating the weights. Our model incorporates these weights into a weighted term frequency, which is then used to compute relevance scores. In our experiments, the model improves P@5 by 19% compared to a BM25 baseline. The improvement is statistically significant according to a paired t-test (confidence level: 95%).

5. REFERENCES
[1] E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. In SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 19–26, New York, NY, USA, 2006. ACM Press.
[2] N. Craswell, I. Soboroff, and A. de Vries. Overview of the TREC-2006 enterprise track. In Proceedings of the 15th Text REtrieval Conference. ACM Forum, November 2006.
[3] L. Freund, E. Toms, and C. L. A. Clarke. Modeling task-genre relationships for IR in the workplace. In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 441–448, New York, NY, USA, 2005. ACM Press.
[4] S. Robertson, H. Zaragoza, and M. Taylor. Simple BM25 extension to multiple weighted fields. In CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management, pages 42–49, New York, NY, USA, 2004. ACM Press.
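To make the click-based estimate of Section 2 concrete, Equation 3 can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code. The function name `type_weights` is invented. The click counts in the usage example are also invented and are not data from the W3C experiments. The sketch assumes that `clicks` lists every document type for one work task, including types with zero clicks, so that $|T|$ is the number of keys.

```python
from typing import Dict, Mapping


def type_weights(clicks: Mapping[str, int], smoothing: float = 1.5) -> Dict[str, float]:
    """Equation (3): w_T = (cf_T + S) / (|C| + |T| * S) * |T| + 1.

    Hypothetical helper. `clicks` maps every document type (including
    zero-click types) to its click count for a single work task.
    """
    num_types = len(clicks)              # |T|
    total_clicks = sum(clicks.values())  # |C|
    denom = total_clicks + num_types * smoothing
    return {
        doc_type: (cf + smoothing) / denom * num_types + 1.0
        for doc_type, cf in clicks.items()
    }


if __name__ == "__main__":
    # Invented click counts for one work task (illustration only).
    w = type_weights({"mailing lists": 90, "web pages": 10, "wiki pages": 0})
    print({t: round(v, 3) for t, v in w.items()})
    # → {'mailing lists': 3.627, 'web pages': 1.33, 'wiki pages': 1.043}
```

One consequence of the formula is worth noting. The excess weights $w_T - 1$ always sum to exactly $|T|$. With no clicks at all, every type receives a weight of exactly 2. The "weight is one for zero clicks" property therefore holds only approximately, when $|C|$ is large relative to $S$.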