Using Collaborative Queries to Improve Retrieval for Difficult Topics

Xin Fu, Diane Kelly & Chirag Shah
University of North Carolina, Chapel Hill, NC 27599-3360, USA
[fu | dianek | chirags] @ email.unc.edu

ABSTRACT
We describe a preliminary analysis of queries created by 81 users for 4 topics from the TREC Robust Track. Our goal was to explore the potential benefits of using queries created by multiple users to improve retrieval performance for difficult topics. We first examine the overlap in users' queries and the overlap in results with respect to different queries for the same topic. We then explore the potential benefits of combining users' queries in various ways. Our results provide some evidence that having access to multiple users' queries can improve retrieval for individual searchers and for difficult topics.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems - Human factors

General Terms
Performance, Human Factors

Keywords
Collaborative queries, polyrepresentation

1. INTRODUCTION
One of the major goals of the TREC Robust Track was to investigate topics that had performed poorly in previous TRECs [6]. Participants experimented with a variety of techniques; the more successful ones used external sources of information, such as Web search engine results, to expand queries that had been created from the title or description fields of the topic. Although participants were able to improve retrieval performance for some topics, performance for other topics remained low.

We investigate an alternative approach to improving performance on difficult topics, which makes use of queries generated by multiple users for the same topics. Belkin et al. [1] investigated the effectiveness of multiple query representations for ten TREC topics that had been generated by ten expert online searchers and found that a progressive combination of query formulations led to a progressive improvement in results. This result can be explained, in part, by Ingwersen's [2] theory of polyrepresentation in IR, which suggests that obtaining multiple representations of a single information need is a better approach to retrieval than using solitary queries. Our work is also related to the notion of collaborative queries and social search systems [5]. The idea behind these systems is that retrieval for users can be improved by incorporating results from previous users' searches for similar topics. These systems often recommend previously posed queries that are believed to be similar to the current query (presumably because they are about the same topic). This approach leverages the knowledge and experiences of multiple users with similar interests to improve retrieval performance. We adopt this approach in our investigation of retrieval for difficult topics.

2. METHOD
The queries analyzed in this study were collected in a previous study in which 81 users were presented with four topics, posed a single query for each topic and evaluated the relevance of a set of 10 search results [4]. Users were undergraduate students at a large university and were experienced Web searchers. The TREC Robust Track [6] collection was used, which consists of a 3 GB corpus of English newswire text, 50 topics and a set of relevance judgments. Although 50 topics were used in the Robust Track, only 4 were used in this study; this was due to the nature of the previous study and a desire not to overburden our users. The topics used in this study are displayed in Table 1. Users were shown the title, description and narrative fields, but only the topic numbers and titles appear in the table. The table also includes the number of relevant documents in the corpus for each topic, as well as the difficulty of the topic [6]. The difficulty number is a rank that indicates how well the 2005 Robust Track participants did with the topic; for instance, Topic 374 was the least difficult of the 50 original topics and was ranked '1'. The Lemur Toolkit (http://www.lemurproject.org/), Okapi BM25, the Libbow stop word list and the Porter stemmer were used for retrieval.

Table 1. Topics investigated in the study

Topic   Title                 No. Relevant   Difficulty
354     Journalist Risks      376            44
374     Nobel Prize Winners   278            1
408     Tropical Storms       183            23
448     Ship Losses           121            15
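The paper reports only that the Lemur Toolkit's Okapi BM25 was used; no parameter settings are given. As a rough, self-contained illustration of the ranking function behind such runs, the sketch below scores toy documents with Okapi BM25 in Python. The parameter values (k1 = 1.2, b = 0.75), the whitespace tokenization and the example documents are assumptions for illustration only; the study's stop word removal and Porter stemming are omitted.

# Minimal Okapi BM25 sketch (illustrative only; the study itself used the
# Lemur Toolkit with a stop word list and Porter stemming, omitted here).
import math
from collections import Counter

K1, B = 1.2, 0.75  # assumed defaults; the paper does not report parameters

def bm25_scores(query_terms, docs):
    """Score each document (a list of tokens) against the query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term never occurs in the collection
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (K1 + 1)
            den = tf[t] + K1 * (1 - B + B * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

# Toy usage: rank two documents for the title query of Topic 408.
documents = ["tropical storms caused heavy flooding".split(),
             "nobel prize winners were announced today".split()]
print(bm25_scores("tropical storms".split(), documents))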
3. RESULTS
Table 2 characterizes users' queries. On average, users' queries were about 3 terms long. Users entered the most unique queries for Topic 354 (i.e., 40 out of 81 queries were unique) and the fewest for Topic 374. The number of users who used the topic title as their query is displayed in column four; users entered more title queries for Topic 374 than for any other topic, and 26 users used titles-as-queries for all 4 topics. Column five shows that users' queries for Topic 408 contained the most unique terms.

Table 2. Characteristics of users' queries

Topic   Length Mean (std.)   No. Unique Queries   No. 'Title' Queries   No. Unique Terms
354     2.73 (1.04)          40                   34                    28
374     3.30 (1.27)          21                   56                    18
408     2.95 (1.18)          38                   31                    33
448     3.09 (1.09)          33                   29                    25

The performance of the unique queries is presented in Table 3, which gives precision at 20 (rounded down to the nearest .1). Although gmap is the official measure for the Robust Track, it is a run-level statistic and so is not suitable for the topic-level analysis here. We chose precision at 20 since it is accepted that users typically only search through the first 1-2 pages of results [3], and in this instance we are interested in whether having access to multiple queries for the same topic could help users. The values for queries submitted for Topic 354 ranged from .00 to .65. The range of values was much smaller for Topic 374 and concentrated around the highest values, while the reverse was true for Topic 448.

Table 3. Frequency of precision for unique queries per topic

Topic    .0    .1    .2    .3    .4    .5    .6    .7
354       8     6     8     6     5     5     2     0
374       0     0     0     1     8     7     3     2
408       7     9     9     5     4     2     2     0
448      11    11     7     3     1     0     0     0
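Precision at 20 can be computed directly from a ranked result list and the topic's relevance judgments. The sketch below is a minimal illustration in Python; the data structures (a ranked list of document IDs and a set of judged-relevant IDs) and the toy values are assumptions, not the study's data or code.

def precision_at_k(ranked_doc_ids, relevant_ids, k=20):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

# Toy usage: 7 judged-relevant documents in the top 20 gives P@20 = 0.35,
# which the binning in Table 3 would round down to the .3 column.
ranked = [f"doc{i}" for i in range(1, 31)]
relevant = {"doc1", "doc3", "doc4", "doc8", "doc12", "doc15", "doc19"}
print(precision_at_k(ranked, relevant))  # 0.35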
The average performance for the four topics was .243, .486, .211 and .115, respectively. Although the performance measures were not the same, the easiest topic according to the ranking in Table 1 performed the best in our study, while the most difficult topic performed the second best. Our users' queries were the least effective for Topic 448, even though it represented only a mid-range of difficulty. Finally, the performances of the title queries were about average for Topics 354 and 374 (.30 and .55), but were quite poor for Topics 408 and 448 (.10 and .00). Overall, it appears that having access to previous queries could help some users: the potential improvement gain for each topic was +.65, +.40, +.65 and +.40, respectively.

We combined unique terms from users' queries in various ways to see if different combinations could improve performance. To create queries for this comparison, we progressively combined the most frequently occurring terms in users' queries for each topic: Query 1 was the term occurring most frequently in users' queries, while Query 2 was the combination of this term and the term occurring second most frequently, and so on. We also created a 'super' query for each topic which contained every unique query term for that topic (e.g., Topic 354's super query contained 28 terms).
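The progressive and 'super' queries described above can be generated mechanically from the pooled user queries. The following is a minimal sketch of one plausible way to do this in Python; the whitespace tokenization, lower-casing, frequency tie-breaking and the example queries are assumptions, and the study's stop word removal and stemming are omitted.

from collections import Counter

def progressive_queries(user_queries, max_terms=7):
    """Build Query 1..max_terms from the most frequent terms across users'
    queries for one topic, plus a 'super' query of every unique term."""
    term_counts = Counter(term for q in user_queries for term in q.lower().split())
    ranked_terms = [term for term, _ in term_counts.most_common()]
    progressive = [" ".join(ranked_terms[:i]) for i in range(1, max_terms + 1)]
    super_query = " ".join(ranked_terms)  # every unique query term
    return progressive, super_query

# Toy usage with invented queries (not the study's data):
queries = ["tropical storms", "tropical storm damage",
           "hurricanes and tropical storms"]
prog, sup = progressive_queries(queries)
print(prog[0])   # "tropical"
print(prog[1])   # "tropical storms"
print(sup)       # all unique terms, most frequent first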
Figure 1 presents the average precision for each query, for each topic; only queries containing up to the seven most frequently occurring terms for each topic are included. The figure also includes the median and best runs from the Robust Track [6]. In no case did the progressive user queries outperform the best-performing TREC run, although the performance of Query 3 for Topic 354 was very close to the best TREC run. For these queries there is no apparent relationship between the number of terms and performance; queries of different lengths performed best for different topics. Super queries never performed best, although for Topic 448, for which users' queries were generally unsuccessful, the lengthier queries did perform better. For Topics 354 and 448, queries constructed from the most frequently used query terms performed much better than the TREC median.

[Figure 1. Average precision of progressive user queries. Series: Query 1-7, the super query, the TREC median and the TREC best run; x-axis: Topics 354, 374, 408 and 448; y-axis: precision from 0.00 to 0.70.]

4. CONCLUSIONS
The goal of this study was to explore the potential benefits of using queries created by multiple users to improve retrieval performance for difficult topics. Our results provide some evidence that collaborative user queries can improve retrieval for individual searchers and for difficult topics. Unique user queries generated a range of precision values, which suggests that query recommendations could help those who initially pose unsuccessful queries.

In this study, a progressive combination of the most frequently occurring terms in users' queries did not outperform the best runs from TREC. In some cases the progressive queries performed better than the TREC median, but there was no consistent pattern. Finally, this study only examined four topics, which limits the generalizability of the results. Currently, we are analyzing a second dataset collected from another experiment that contains queries for all 50 topics, although there are fewer queries per topic (5-15 as opposed to 81).

5. REFERENCES
[1] Belkin, N. J., Cool, C., Croft, W. B., & Callan, J. P. (1993). The effect of multiple query representations on information retrieval system performance. Proc. of SIGIR '93, 339-346.
[2] Ingwersen, P. (1996). Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory. Journal of Documentation, 52, 3-50.
[3] Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005). Accurately interpreting clickthrough data as implicit feedback. Proc. of SIGIR '05, 154-161.
[4] Kelly, D., Fu, X., & Shah, C. (2007). Effects of rank and precision of search results on users' evaluations of system performance. UNC SILS Technical Report TR-2007-02. Available at: http://sils.unc.edu/research/techreports.html
[5] Smyth, B., Balfe, E., Freyne, J., Briggs, P., Coyle, M., & Boydell, O. (2004). Exploiting query repetition and regularity in an adaptive community-based Web search engine. User Modeling and User-Adapted Interaction, 14(5), 382-423.
[6] Voorhees, E. M. (2006). Overview of the TREC 2005 Robust Retrieval Track. Proc. of TREC-14.