SIGIR 2007 Proceedings Poster

Retrieval of Discussions from Enterprise Mailing Lists

Maheedhar Kolla
mkolla@uwaterloo.ca
University of Waterloo
Canada. N2L 3G1

Olga Vechtomova
ovechtom@uwaterloo.ca
University of Waterloo
Canada. N2L 3G1

ABSTRACT
Mailing list archives in an enterprise are a valuable source for employees who want to dig into past proceedings of the organization that could be relevant to their present task. Going through the proceedings of discussions about certain topics can be cumbersome, and regular search techniques might not work in this context because of the genre the documents belong to. In this paper, we propose methods, based on the theory of subjectivity, to retrieve email messages that could contain argumentative discussions about the topic that the user is interested in.

Categories and Subject Descriptors: H.4.3 [INFORMATION SYSTEMS APPLICATIONS]: Communications Applications

General Terms: Experimentation, Theory.

Keywords: Enterprise Search, Email Search, Subjectivity Theory.

Copyright is held by the author/owner(s). SIGIR'07, July 23-27, 2007, Amsterdam, The Netherlands. ACM 978-1-59593-597-7/07/0007.

1. INTRODUCTION
Mailing lists of an organization offer employees a chance to discuss technical problems and to debate policies in the making in order to reach an acceptable solution. Archived mailing lists are a valuable source of information that gives users an opportunity to analyze the proceedings that took place over a certain period of time [6], and they may contain answers to questions such as: What points in the policy did most users agree or disagree with? Users interested in past discussions might like to know both the pro and the con arguments made in those discussions. Hence, a retrieved email message should satisfy two constraints: 1) it should be on the topic that the user is interested in, and 2) it should have a pro/con argument about the topic in discussion.

Several approaches have been proposed to identify the discussions with a pro/con argument on the topic [2]. We based our experiments on the hypothesis that emails containing a pro/con argument tend to express a subjective opinion of the author towards the topic in discussion. For example:

- Title: Smileys and language
- Narrative: Relevant message should either express preference for the use of emoticons or present arguments against their use

In discussions relating to the above example, participants contribute to the flow by expressing their viewpoints or opinions on the ideas proposed previously (or on a new idea, if the author is initiating the discussion). Other participants then take turns, either agreeing with the previous author(s) or suggesting some alternative. This motivates our hypothesis that by identifying the subjective opinion of the author about the topic, we may be able to retrieve the discussions that have a pro/con argument in their content.

2. SUBJECTIVITY IN DISCUSSIONS
Skomorowski [4] proposed a method to determine whether a text expresses an opinion targeted towards a topic by taking into account the subjective adjectives surrounding the topic terms in the text. Adjectives, in noun phrases or in the context of nouns, act as noun modifiers. Therefore, by identifying the likelihood of a subjective adjective modifying a noun, he proposed to infer the subjective opinion directed at that noun.

In our work, we used the 1336 subjective adjectives manually compiled by Hatzivassiloglou and McKeown [3] to retrieve messages having arguments about the topic. Instead of using all 1336 subjective adjectives, we selected a subset of adjectives, based on the idea that subjective adjective usage varies with the topic in discussion. For example, subjective adjectives such as "incompatible" and "unsupported" would be more frequent in discussions about "systems/products" or "devices", and may also express the author's subjective opinion towards the topic of those discussions.

We used the pseudo-feedback term selection method proposed by Carpineto et al. [1] to select the subset of subjective adjectives. We computed the KL divergence value for each subjective adjective in the list of Hatzivassiloglou and McKeown [3] as follows:

$$\mathrm{kldvalue}(t) = p_R(t)\,\log\frac{p_R(t)}{p_C(t)} \tag{1}$$

where $p_R(t)$ is the probability of the term $t$ in the relevant (pseudo-relevant) document set and $p_C(t)$ is the probability of the term $t$ in the whole collection. The probability values were computed as follows:

$$p_{(C|R)}(t) = \frac{f_{(C|R)}(t)}{NT_{(C|R)}} \tag{2}$$

where $f_{(C|R)}(t)$ is the count of the term $t$ in the respective document set (the collection $C$ or the relevant set $R$), and $NT_{(C|R)}$ is the number of tokens in that document set. In the current experiments, we used the top 25 documents retrieved by Okapi BM25 retrieval [5] as the pseudo-relevant document set and extracted the top 40 subjective adjectives. We then used the selected set of adjectives for re-ranking the messages initially retrieved. We also used these subjective adjectives for expanding the initial query, by treating them as pseudo-relevance feedback terms.

2.1 Re-ranking with Subjective Adjectives
Using the selected subset of subjective adjectives, we re-ranked the messages retrieved for the initial query. We computed the BM25 term weight [5] for each subjective term as follows:

$$w_{t,\mathrm{Doc}_k} = \frac{TF_t\,(k_1 + 1)}{k_1\left((1 - b) + b\,\frac{DL}{AVDL}\right) + TF_t}\; w_t \tag{3}$$

where $w_t$ is the IDF value of the term $t$; $TF_t$ is the frequency of the term $t$ in document $k$; $DL$ is the length of document $k$; $AVDL$ is the average document length in the collection; and $k_1$ and $b$ are constants set to 1.2 and 0.5 respectively. The IDF value of a term $t$ is computed as follows:

$$w_t = \log\frac{\text{Number of documents in collection}}{\text{Number of documents containing } t} \tag{4}$$

We then add the BM25 term weights $w_{t,\mathrm{Doc}_k}$, as computed above, of all subjective terms in the document to the document's initial score to obtain an updated score.

2.2 Pseudo-relevance Feedback with Subjective Adjectives
We selected the top 25 subjective adjectives, ranked by their KL divergence values, for query expansion. To avoid query drift caused by term expansion, we assigned lower weights to the subjective adjective terms. We trained our system using the Discussion Search 2005 test data [2] and found that it performed better when each expansion term was assigned a weight of 0.03. In the current experiments, we assigned the same weight (0.03) to each expansion term.

3. EVALUATION
To evaluate the methods, we used the topic set and the W3C corpus provided by NIST in the context of the Discussion Search 2006 task. For the baseline run, t.baseline, we extracted the title terms from the topics as queries and ranked the messages using the Okapi BM25 ranking function [5]. For the re-rank run, s.rerank, we re-ranked the messages initially retrieved using the method explained in Section 2.1. For the feedback run, s.feedback, we followed the method explained in Section 2.2. All runs were compared on two levels of performance measures (Table 1, Figure 1):

• Performance of the system in retrieving messages that are on-topic, but may or may not contain a pro/con argument about the topic (level 1).
• Performance of the system in retrieving messages that have a pro/con argument about the topic (level 2).

Run         Level  MAP     P@5     P@10    bpref
t.baseline  1      0.2898  0.4640  0.4580  0.3084
t.baseline  2      0.1853  0.2739  0.2522  0.2109
s.rerank    1      0.2493  0.5640  0.4580  0.3501
s.rerank    2      0.2012  0.3913  0.3283  0.3449
s.feedback  1      0.3025  0.5280  0.4860  0.3176
s.feedback  2      0.2048  0.3304  0.2978  0.2941

Table 1: Using subjective adjective terms for pseudo-feedback and re-ranking purposes. Only 46 queries had at least one judged document that contains a pro/con argument about the topic. Marked measures indicate statistical significance over the baseline run (paired t-test, p < 0.05).

[Plot "Precision Values at K": x-axis K (0 to 200), y-axis P at K (0.15 to 0.6); series: t.baseline, s.rerank and s.feedback, each for "on topic" and "on topic + pro/con".]

Figure 1: Precision values (P) at various levels of documents retrieved (K) using first (on topic) and second (on topic + pro/con) level relevance judgements.

4. CONCLUSION AND FUTURE WORK
We proposed methods to retrieve discussions containing a pro/con argument about a topic. We found that the presence of subjective adjectives indicates the presence of a pro/con argument about the topic. We would like to experiment further with various combinations of the number of pseudo-relevant documents selected and the number of subjective adjectives extracted for query expansion. Further, we would like to determine the polarity of a message, that is, whether it argues for or against the topic. We would also like to study the subjectivity property combined with the thread structure of emails in a mailing list.

5. REFERENCES
[1] C. Carpineto, R. de Mori, G. Romano, and B. Bigi. An information-theoretic approach to automatic query expansion. ACM Trans. Inf. Syst., 19(1):1-27, 2001.
[2] N. Craswell. Overview of the TREC-2005 enterprise track. In Proceedings of the 14th TREC, 2005.
[3] V. Hatzivassiloglou and K. R. McKeown. Predicting the semantic orientation of adjectives. In Proceedings of the eighth conference on EACL, pages 174-181, 1997.
[4] J. Skomorowski. Topical opinion retrieval. Master's thesis, University of Waterloo, 2006.
[5] K. Sparck-Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval: development and comparative experiments - part 2. Information Processing and Management, 36(6):809-840, 2000.
[6] Y. Wu and D. W. Oard. Indexing emails and email threads for retrieval. In Proceedings of the 28th annual international ACM SIGIR conference, pages 665-666, 2005.
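As an illustration only, the KL-divergence adjective selection (Equations 1 and 2) and the BM25 adjective weighting used for re-ranking (Equations 3 and 4) might be sketched as follows. This is not the authors' implementation. The function names, the representation of each message as a list of tokens, the use of the natural logarithm, and the choice to skip adjectives that do not occur in the pseudo-relevant set are all assumptions made for the sketch.

```python
import math
from collections import Counter


def kl_scores(pseudo_relevant_docs, collection_docs, adjectives):
    """Score each adjective by p_R(t) * log(p_R(t) / p_C(t)) (Equation 1).

    Both document arguments are lists of token lists (hypothetical format).
    """
    rel_counts = Counter(tok for doc in pseudo_relevant_docs for tok in doc)
    col_counts = Counter(tok for doc in collection_docs for tok in doc)
    rel_total = sum(rel_counts.values())  # NT_R in Equation 2
    col_total = sum(col_counts.values())  # NT_C in Equation 2
    scores = {}
    for adj in adjectives:
        # Assumption: adjectives unseen in either set are skipped.
        if rel_counts[adj] == 0 or col_counts[adj] == 0:
            continue
        p_r = rel_counts[adj] / rel_total
        p_c = col_counts[adj] / col_total
        scores[adj] = p_r * math.log(p_r / p_c)
    return scores


def select_adjectives(scores, n):
    """Return the n adjectives with the highest KL divergence value."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def bm25_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.5):
    """BM25 term weight (Equation 3) using the IDF of Equation 4."""
    idf = math.log(n_docs / doc_freq)
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (norm + tf) * idf


def rerank(initial_scores, docs, adjectives):
    """Add the BM25 weights of the selected adjectives to each initial score.

    initial_scores maps doc id -> retrieval score; docs maps doc id -> tokens
    for the whole collection (hypothetical format).
    """
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    updated = {}
    for doc_id, score in initial_scores.items():
        tokens = docs[doc_id]
        tf = Counter(tokens)
        bonus = sum(
            bm25_weight(tf[a], len(tokens), avg_len, n_docs, doc_freq[a])
            for a in adjectives
            if tf[a] > 0
        )
        updated[doc_id] = score + bonus
    return sorted(updated, key=updated.get, reverse=True)
```

In this sketch, `kl_scores` would be called with the top-ranked messages of the initial BM25 run as the pseudo-relevant set, `select_adjectives` would keep the highest-scoring adjectives, and `rerank` would reorder the initially retrieved messages by their updated scores. The defaults `k1=1.2` and `b=0.5` follow the constants stated in Section 2.1.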