Making Mind and Machine Meet: A Study of Combining Cognitive and Algorithmic Relevance Feedback

Chirag Shah, Diane Kelly, and Xin Fu
School of Information and Library Science, University of North Carolina, Chapel Hill, NC 27599, USA
{chirags,dianek,fu}@email.unc.edu

ABSTRACT
Using Saracevic's relevance types, we explore approaches to combining algorithmic and cognitive relevance in a term relevance feedback scenario. We use data collected from 21 users who provided relevance feedback about terms suggested by a system for 50 TREC HARD topics. The users' term selections are treated as cognitive relevance, and the system-suggested terms as algorithmic relevance. We construct retrieval runs using these two types of relevance feedback and experiment with ways of combining them using simple Boolean operators. Results show minimal differences in performance across the different combination techniques.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Relevance feedback

General Terms
Human Factors, Performance, Experimentation

Keywords
Relevance Feedback, Cognitive & Algorithmic Relevance

1. INTRODUCTION
Investigations of term relevance feedback generally focus on one of three things: (1) how to identify terms that will be used for feedback, (2) how to present these terms to users, or (3) how to incorporate user feedback into retrieval. Studying all three aspects is usually beyond the scope of any one study; in this paper, we focus on the third aspect, and more specifically on how to combine user and system feedback.

We frame our investigation in terms of Saracevic's [7] five relevance types, which were later elaborated by Borlund [2]: system or algorithmic, topical, pertinence or cognitive, situational, and motivational. System or algorithmic relevance describes the relationship between a query and the collection of information objects. This is the most objective type of relevance, since it is operationalized by a particular algorithm and does not involve user judgment. Topical relevance is associated with the aboutness of a particular document; this type of relevance judgment is made by TREC assessors. Pertinence, or cognitive relevance, describes the relationship between a user's perception of his or her information need, what he or she currently knows about that need, and a document. Situational relevance is concerned with the relationship between the object and the task at hand; Borlund comments that it is difficult to separate this type of relevance from cognitive relevance. Finally, motivational or affective relevance is related to the intentions, goals, and motivations of the user. In this paper, we adopt the view of Borlund, who states that this last type is not an independent type of relevance, but rather an ever-present characteristic of the other types.

There have been several efforts at using algorithmic as well as cognitive relevance [6]. Each type has its own benefits. For instance, Ruthven and Lalmas [6] point out that since systems have access to the internal statistical information of a collection, they can select good discriminatory terms for feedback. On the other hand, users may be able to make more informed decisions about terms, since the information need is theirs. Thus, it may be beneficial to combine these two types of relevance. In this paper we present a preliminary investigation into how algorithmic and cognitive term relevance feedback can be combined to improve retrieval.

2. METHOD
We use data from a user study by Kelly and Fu [3] in which 21 users provided relevance feedback about terms suggested by the system. Fifty topics from TREC's HARD data set [1] were used, and each user evaluated 10 topics. This data set consists of the 3 GB AQUAINT corpus of English newswire text, 50 topics in standard TREC format (title, description, and narrative) all taken from the TREC Robust track, and relevance judgments. These topics have all been designated 'difficult' by TREC's Robust track coordinators because systems have not performed well on them in previous TRECs. It is important to note that this evaluation framework limits the relevance type we are trying to approximate to topical relevance.

Users were presented with a web-based interface displaying a topic and were asked to create an initial query for the topic. Following this, users were presented with 20 terms and asked to indicate which terms they would add to their queries. To populate the interface with terms, the title component of each topic was used as a query, and the Okapi BM25 [5] retrieval model with pseudo relevance feedback was employed with the aid of the Lemur toolkit (http://www.lemurproject.org/). Basic stop word and acronym lists were used, but no stemmer. The 20 terms presented to users were drawn from the top 10 retrieved documents. In this study, we consider these terms to represent algorithmic relevance, while the terms selected by the user represent cognitive relevance.
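To make the term-suggestion step concrete, the sketch below approximates the pipeline described above in plain Python: documents are ranked with a hand-rolled Okapi BM25 scorer, the top 10 are treated as pseudo-relevant, and the 20 most frequent non-stopword, non-query terms are returned as candidate feedback terms. This is an illustrative stand-in rather than the Lemur implementation used in the study; in particular, raw frequencies within the pseudo-relevant set replace the Okapi term weights [4] that Lemur would compute, and the function and parameter names are our own.

    import math
    from collections import Counter

    def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
        """Score each tokenized document against the query with Okapi BM25."""
        N = len(docs)
        avgdl = sum(len(d) for d in docs) / N
        df = Counter(t for d in docs for t in set(d))  # document frequencies
        scores = []
        for d in docs:
            tf, dl, s = Counter(d), len(d), 0.0
            for t in query_terms:
                if tf[t] == 0:
                    continue
                idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
            scores.append(s)
        return scores

    def suggest_terms(query_terms, docs, stopwords, n_docs=10, n_terms=20):
        """Pseudo relevance feedback: treat the top-ranked documents as relevant
        and return the most frequent non-stopword, non-query terms in them."""
        scores = bm25_scores(query_terms, docs)
        top = sorted(range(len(docs)), key=scores.__getitem__, reverse=True)[:n_docs]
        counts = Counter(t for i in top for t in docs[i]
                         if t not in stopwords and t not in query_terms)
        return counts.most_common(n_terms)  # list of (term, pseudo-weight) pairs

With real TREC data, docs would hold the tokenized documents retrieved for the title query, and the returned list would correspond to the 20 terms shown to users.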
3. EXPERIMENTS AND RESULTS
We present a set of system runs and then a set of runs that combine user feedback in various ways. The system runs are described below.

1. Original query run: We used the titles of the 50 HARD topics as queries, with the Lemur toolkit and the Okapi BM25 model for retrieval.

2. Unweighted expansion query run: We let the Okapi retrieval model implemented in Lemur extract the top 20 terms from the top 10 documents and used them as the feedback terms.

3. Weighted expansion query run: Same as above, except that this time we used the weights of the terms as given by Okapi. An explanation of these weights is given in [4].

The results of these three system runs are reported in Table 1. They show that expanded queries performed better than the original ones, and that the weighted expansion run outperformed the unweighted one. These results, for runs that used only algorithmic relevance, are perhaps not surprising.

Table 1: Average MAP for system runs. Improvements are compared to the original query run.

    System                 Avg. MAP   Improvement
    Original query         0.1598     -
    Unweighted expansion   0.2240     40.18%
    Weighted expansion     0.2331     45.87%

We now turn our attention to cognitive relevance. A simple way of using cognitive relevance is to keep the terms that the user selected and discard the rest; this amounts to ANDing cognitive relevance with algorithmic relevance. Another possibility is to combine them with the OR operator, which keeps the terms found by the system as well as those selected by the user; this reduces to our second system run described above. However, when we consider the weights of the terms, things become more interesting. For the AND operation, considering weights simply means using the feedback terms that the user selected with the weights that the algorithm assigned. For the OR operation, we need to add the weight computed by the system and the weight given by the user. Since the user provided binary relevance judgments as feedback, we used these judgments directly as weights: a user-selected term was assigned a weight of 1 and a non-selected term a weight of 0.

The results of the four runs produced by these configurations are presented in Table 2. As we can see, the OR operator gave better results than the AND operator. In addition, weighted runs performed better than unweighted ones, and combining user feedback with the OR operator while including weights performed best.

Table 2: Average MAP for combination runs. Improvements are compared to the original query run.

    System           Avg. MAP   Improvement
    Unweighted AND   0.1917     19.96%
    Unweighted OR    0.2240     40.18%
    Weighted AND     0.1921     20.21%
    Weighted OR      0.2359     47.62%
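The four combination configurations in Table 2 can be summarized in a few lines of code. The sketch below is a minimal illustration of the logic described above, assuming the algorithmic feedback arrives as a term-to-weight mapping and the cognitive feedback as the set of terms the user selected; the example terms and weights are hypothetical and not taken from the study data.

    def combine_feedback(system_terms, user_selected, mode="OR", weighted=True):
        """Combine algorithmic feedback (system-suggested terms with weights)
        with cognitive feedback (the user's binary term selections).

        mode "AND" keeps only the terms the user selected;
        mode "OR"  keeps every system-suggested term.
        If weighted is False, every kept term gets weight 1 (unweighted runs).
        If weighted is True, AND keeps the system weight, while OR adds the
        user's binary judgment (1 if selected, else 0) to the system weight.
        """
        if mode == "AND":
            kept = {t: w for t, w in system_terms.items() if t in user_selected}
        else:
            kept = dict(system_terms)
        if not weighted:
            return {t: 1.0 for t in kept}
        if mode == "OR":
            return {t: w + (1.0 if t in user_selected else 0.0) for t, w in kept.items()}
        return kept

    # The four configurations evaluated in Table 2 (hypothetical inputs):
    system_terms = {"ferry": 2.1, "sinking": 1.7, "estonia": 1.4}
    user_selected = {"ferry", "estonia"}
    for mode in ("AND", "OR"):
        for weighted in (False, True):
            print(mode, "weighted" if weighted else "unweighted",
                  combine_feedback(system_terms, user_selected, mode, weighted))

The resulting term-weight mapping would then be used to form the expanded query, analogous to the expansion runs described above.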
4. CONCLUSION
Eliciting feedback from users can potentially be a useful component of an IR system, and identifying effective ways of combining system and user feedback remains an important research problem. In this paper, we combined algorithmic and cognitive relevance feedback and presented preliminary results of our experiments. We used data from a previously conducted user study that included feedback from 21 users on 50 HARD topics. For each topic, queries constructed from topic titles were used to generate 20 feedback terms (algorithmic relevance), and each user was asked to select relevant terms from this set (cognitive relevance). Various runs combining these two types of relevance feedback in different ways were evaluated. Retrieval results indicated that (1) weighted feedback outperforms unweighted feedback, and (2) combining user feedback with the OR operator outperforms combinations with the AND operator.

An important avenue for further exploration would be comparing the effects of the kind of combination described in this paper with state-of-the-art pseudo relevance feedback schemes such as relevance models. Unfortunately, due to the nature of the data collected from the previous user study, it was not meaningful to compare our results with those of such models. Although we examined only two of Saracevic's relevance types in this work, understanding more about the relationships among these types and how they can be used in combination to improve retrieval is an important direction for future research. The treatment of user weights, which was limited to binary values in this study, should also be explored further.

5. REFERENCES
[1] James Allan. HARD track overview in TREC 2005: High accuracy retrieval from documents. In Proceedings of TREC, pages 1-17, 2006.
[2] Pia Borlund. The concept of relevance in IR. JASIST, 54(10):913-925, 2003.
[3] Diane Kelly and Xin Fu. Elicitation of term relevance feedback: An investigation of term source and context. In Proceedings of the ACM Conference on Research and Development in Information Retrieval (SIGIR), 2006.
[4] S. E. Robertson. On term selection for query expansion. Journal of Documentation, 46:359-364, 1990.
[5] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proceedings of TREC, pages 21-30, 1994.
[6] Ian Ruthven and Mounia Lalmas. A survey on the use of relevance feedback for information access systems. The Knowledge Engineering Review, 18(2):95-145, 2003.
[7] Tefko Saracevic. Relevance reconsidered. In Proceedings of CoLIS 2, pages 201-218, 1996.