Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs

Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign
Department of EECS, Vanderbilt University

ABSTRACT

In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent topical facets in a Weblog collection, the subtopics in the results of an ad hoc query, and their associated sentiments. It can also provide general sentiment models that are applicable to any ad hoc topics. With a specifically designed HMM structure, the sentiment models and topic models estimated with TSM can be utilized to extract topic life cycles and sentiment dynamics. Empirical experiments on different Weblog datasets show that this approach is effective for modeling the topic facets and sentiments and for extracting their dynamics from Weblog collections. The TSM model is quite general; it can be applied to any text collection with a mixture of topics and sentiments, and thus has many potential applications, such as search result summarization, opinion tracking, and user behavior prediction.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Text Mining
General Terms: Algorithms
Keywords: topic-sentiment mixture, weblogs, mixture model, topic models, sentiment analysis

1. INTRODUCTION

More and more internet users now publish online diaries and express their opinions with Weblogs (i.e., blogs). The wide coverage of topics, the dynamics of discussion, and the abundance of opinions in Weblogs make blog data extremely valuable for mining user opinions about all kinds of topics (e.g., products, political figures), which in turn would enable a wide range of applications, such as opinion search for ordinary users, opinion tracking for business intelligence, and user behavior prediction for targeted advertising.

Technically, the task of mining user opinions from Weblogs boils down to sentiment analysis of blog data: identifying and extracting positive and negative opinions from blog articles. Although much work has been done recently on blog mining [11, 7, 6, 15], most existing work aims at extracting and analyzing the topical content of blog articles without any analysis of the sentiments in an article. The lack of sentiment analysis in such work often limits the effectiveness of the mining results. For example, in [6], a burst of blog mentions about a book has been shown to be correlated with a spike of sales of the book on Amazon.com. However, a burst of criticism of a book is unlikely to indicate a growth of the book's sales. Similarly, a decrease of blog mentions about a product might actually be caused by a decrease of complaints about its defects. Thus understanding the positive and negative opinions about each topic/subtopic of a product is critical to making more accurate predictions and decisions.

There has also been some work trying to capture the positive and negative sentiments in Weblogs.
For example, Opinmind [20] is a commercial Weblog search engine which can categorize search results into positive and negative opinions. Mishne and others analyze the sentiments [18] and moods [19] in Weblogs, and use the temporal patterns of sentiments, as opposed to simple blog mentions, to predict book sales. However, a common deficiency of all this work is that the proposed approaches extract only the overall sentiment of a query or a blog article; they can neither distinguish different subtopics within a blog article nor analyze the sentiment of a subtopic. Since a blog article often covers a mixture of subtopics and may hold different opinions about different subtopics, it would be more useful to analyze sentiments at the level of subtopics. For example, a user may like the price and fuel efficiency of a new Toyota Camry, but dislike its power and safety aspects. Indeed, people tend to have different opinions about different features of a product [28, 13]. As another example, a voter may agree with some points made by a presidential candidate, but disagree with others. In reality, a general statement of good or bad about a query is not very informative to the user, who usually wants to drill down into different facets and explore more detailed information (e.g., the "price", "battery life", or "warranty" of a laptop). In all these scenarios, an in-depth analysis of the sentiments on specific aspects of a topic would be much more useful than an analysis of the overall sentiment of a blog article.

To improve the accuracy and utility of opinion mining from blog data, we propose to conduct an in-depth analysis of blog articles to reveal the major topics in an article, associate each topic with sentiment polarities, and model the dynamics of each topic and its corresponding sentiments. Such topic-sentiment analysis can potentially support many applications. For example, it can be used to generate a more detailed topic-sentiment summary of Weblog search results, as shown in Figure 1.

[Figure 1: A possible application of topic-sentiment analysis. Left: a topic-sentiment summary for the query "Dell Laptop", organizing example sentences by facet (Topic 1: Price; Topic 2: Battery) and by polarity (positive, negative, neutral), e.g., "One thing I really like about this Dell battery is the Express Charge feature." (positive) versus "my Dell battery sucks" (negative). Right: topic-sentiment dynamics for the Price facet, plotting the strength of the positive, negative, and neutral opinions over time.]

In Figure 1, given a query word representing a user's ad hoc information need (e.g., a product), the system extracts the latent facets (subtopics) in the search results and associates each subtopic with positive and negative sentiments. From the example sentences on the left, which are organized in a two-dimensional structure, the user can understand the pros and cons of each facet of the product, or what its best and worst aspects are. From the strength dynamics of a topic and its associated sentiments on the right, the user can get a deeper understanding of how the opinions about a specific facet change over time. To the best of our knowledge, no existing work can simultaneously extract multiple topics and different sentiments from Weblog articles.
In this paper, we study the novel problem of modeling subtopics and sentiments simultaneously in Weblogs. We formally define the Topic-Sentiment Analysis (TSA) problem and propose a probabilistic mixture model called Topic-Sentiment Mixture (TSM) to model and extract the multiple subtopics and sentiments in a collection of blog articles. Specifically, a blog article is assumed to be "generated" by sampling words from a mixture model involving a background language model, a set of topic language models, and two (positive and negative) sentiment language models. With this model, we can extract the topics/subtopics from blog articles, reveal the correlation of these topics with different sentiments, and further model the dynamics of each topic and its associated sentiments. We evaluate our approach on different Weblog data sets. The results show that our method is effective for all the tasks of topic-sentiment analysis.

The proposed approach is quite general and has many potential applications. The mining results are useful for summarizing search results, monitoring public opinions, predicting user behaviors, and making business decisions. Our method requires no prior knowledge about a domain, and can extract general sentiment models applicable to any ad hoc queries. Although we only tested TSM on Weblog articles, it is applicable to any text data with mixed topics and sentiments, such as customer reviews and emails.

The rest of the paper is organized as follows. In Section 2, we formally define the problem of Topic-Sentiment Analysis. In Section 3, we present the Topic-Sentiment Mixture model and discuss the estimation of its parameters. We show how to extract the dynamics of topics and sentiments in Section 4, and present our experimental results in Section 5. In Sections 6 and 7, we discuss the related work and conclude.

2. PROBLEM FORMULATION

In this section, we formally define the general problem of Topic-Sentiment Analysis. Let $C = \{d_1, d_2, ..., d_m\}$ be a set of documents (e.g., blog articles). We assume that C covers a number of topics, or subtopics (also known as themes), and some related sentiments. Following [9, 1, 16, 17], we further assume that there are k major topics (subtopics) in the documents, $\{\theta_1, \theta_2, ..., \theta_k\}$, each being characterized by a multinomial distribution over all the words in our vocabulary V (also known as a unigram language model). Following [23, 21, 13], we assume that there are two sentiment polarities in Weblog articles, the positive and the negative sentiment. The two sentiments are associated with each topic in a document, representing the positive and negative opinions about the topic.

Definition 1 (Topic Model) A topic model in a text collection C is a probabilistic distribution of words $\{p(w|\theta)\}_{w \in V}$ that represents a semantically coherent topic. Clearly, we have $\sum_{w \in V} p(w|\theta) = 1$. Intuitively, the high-probability words of a topic model often suggest what theme the topic captures. For example, a topic about the movie "Da Vinci Code" may assign a high probability to words like "movie", "Tom", and "Hanks". This definition can be easily extended to a distribution over multiword phrases. We assume that there are k such topic models in the collection.

Definition 2 (Sentiment Model) A sentiment model in a text collection C is a probabilistic distribution of words representing either positive opinions ($\{p(w|\theta_P)\}_{w \in V}$) or negative opinions ($\{p(w|\theta_N)\}_{w \in V}$). We have $\sum_{w \in V} p(w|\theta_P) = 1$ and $\sum_{w \in V} p(w|\theta_N) = 1$.
Sentiment models are orthogonal to topic models in the sense that they assign high probabilities to general words that are frequently used to express sentiment polarities, whereas topic models assign high probabilities to words representing topical content with neutral opinions.

Definition 3 (Sentiment Coverage) A sentiment coverage of a topic in a document (or a collection of documents) is the relative coverage of the neutral, positive, and negative opinions about the topic in the document (or the collection of documents). Formally, we define the sentiment coverage of topic i in document d as $c_{i,d} = \{\delta_{i,d,F}, \delta_{i,d,P}, \delta_{i,d,N}\}$, where $\delta_{i,d,F}$, $\delta_{i,d,P}$, and $\delta_{i,d,N}$ are the coverages of neutral, positive, and negative opinions, respectively; they form a probability distribution and satisfy $\delta_{i,d,F} + \delta_{i,d,P} + \delta_{i,d,N} = 1$.

In many applications, we also want to know how the neutral discussions, the positive opinions, and the negative opinions about a topic (subtopic) change over time. For this purpose, we introduce two additional concepts, "topic life cycle" and "sentiment dynamics", as follows.

Definition 4 (Topic Life Cycle) A topic life cycle, also known as a theme life cycle in [16], is a time series representing the strength distribution of the neutral content of a topic over the time line. The strength can be measured based on either the amount of text that a topic can explain [16] or the relative strength of topics in a time period [15, 17]. In this paper, we follow [16] and model topic life cycles with the amount of document content that is generated with each topic model in different time periods.

Definition 5 (Sentiment Dynamics) The sentiment dynamics for a topic $\theta$ is a time series representing the strength distribution of a sentiment $s \in \{P, N\}$ associated with $\theta$. The strength indicates how much positive/negative opinion there is about the given topic in each time period. To be consistent with topic life cycles, we model sentiment dynamics with the amount of text associated with topic $\theta$ that is generated with each sentiment model.

Based on the concepts above, we define the major tasks of Topic-Sentiment Analysis (TSA) on Weblogs as:

(1) Learning General Sentiment Models: Learn a sentiment model for positive opinions and a sentiment model for negative opinions that are general enough to be used in new unlabeled collections.

(2) Extracting Topic Models and Sentiment Coverages: Given a collection of Weblog articles and the general sentiment models learned, customize the sentiment models to this collection, extract the topic models, and extract the sentiment coverages.

(3) Modeling Topic Life Cycles and Sentiment Dynamics: Model the life cycle of each topic and the dynamics of each sentiment associated with that topic in the given collection.

This problem as defined above is more challenging than many existing topic extraction and sentiment classification tasks, for several reasons. First, it is not immediately clear how to model topics and sentiments simultaneously with a mixture model. No existing topic extraction work [9, 1, 16, 15, 17] can extract sentiment models from text, and no sentiment classification algorithm can model a mixture of topics simultaneously. Second, it is unclear how to obtain sentiment models that are independent of the specific contents of topics and can be generally applied to any collection representing a user's ad hoc information need.
Most existing sentiment classification methods overfit the specific training data provided. Finally, computing and distinguishing topic life cycles and sentiment dynamics is also a challenging task. In the next section, we present a unified probabilistic approach to solve these challenges.

3. A MIXTURE MODEL FOR THEME AND SENTIMENT ANALYSIS

3.1 The Generation Process

A lot of previous work has shown the effectiveness of mixtures of multinomial distributions (mixture language models) in extracting topics (themes, subtopics) from either plain text collections or contextualized collections [9, 1, 16, 15, 17, 12]. However, none of this work models topics and sentiments simultaneously; if we apply an existing topic model to Weblog articles directly, none of the extracted topics can capture the positive or negative sentiment well. To model both topics and sentiments, we also use a mixture of multinomials, but extend the model structure to include two sentiment models that naturally capture sentiments.

In previous work [15, 17], the words in a blog article are classified into two categories: (1) common English words (e.g., "the", "a", "of") and (2) words related to a topical theme (e.g., "nano", "price", "mini" in documents about iPod). The common English words are captured with a background component model [28, 16, 15], and the topical words are captured with topic models. In our topic-sentiment model, we extend the categorization of the topical words in existing approaches. Specifically, for the words related to a topic, we further divide them into three sub-categories: (1) words about the topic with neutral opinions (e.g., "nano", "price"); (2) words representing positive opinions about the topic (e.g., "awesome", "love"); and (3) words representing negative opinions about the topic (e.g., "hate", "bad"). Correspondingly, we introduce four multinomial distributions: (1) a background topic model $\theta_B$ to capture common English words; (2) k topic models $\Theta = \{\theta_1, ..., \theta_k\}$ to capture the neutral descriptions of k global subtopics in the collection; (3) a positive sentiment model $\theta_P$ to capture positive opinions; and (4) a negative sentiment model $\theta_N$ to capture negative opinions for all the topics in the collection.

According to this mixture model, an author would "write" a Weblog article by making the following decisions stochastically and sampling each word from the component models: (1) The author first decides whether the word will be a common English word. If so, the word is sampled according to $\theta_B$. (2) If not, the author decides which of the k subtopics the word should describe. (3) Once the author decides which topic the word is about, the author further decides whether the word is used to describe the topic neutrally, positively, or negatively. (4) Let the topic picked in step (2) be the j-th topic $\theta_j$. The author finally samples a word using $\theta_j$, $\theta_P$, or $\theta_N$, according to the decision in step (3). This generation process is illustrated in Figure 2.

[Figure 2: The generation process of the topic-sentiment mixture model. A word w in document d is drawn from the background model $\theta_B$ with probability $\lambda_B$; otherwise a theme $\theta_j$ is chosen with probability $\pi_{d,j}$, and the word is drawn from $\theta_j$, $\theta_P$, or $\theta_N$ according to the sentiment coverage $(\delta_{j,d,F}, \delta_{j,d,P}, \delta_{j,d,N})$.]

We now formally present the Topic-Sentiment Mixture model and the estimation of its parameters based on blog data.
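As a concrete illustration of steps (1) through (4) above, here is a minimal Python sketch of the generation process. The function name, the toy vocabulary, and all parameter values are our own illustrative assumptions, not part of the paper.

```python
import random

def sample_document(length, lam_B, theta_B, thetas, theta_P, theta_N, pi_d, delta_d):
    """Sample one pseudo-document from the TSM generation process.

    theta_B, theta_P, theta_N : dicts mapping word -> probability
    thetas  : list of k topic models (dicts)
    pi_d    : list of k topic probabilities pi_{d,j} for this document
    delta_d : list of k triples (delta_F, delta_P, delta_N), one per topic
    """
    def draw(dist):
        words, probs = zip(*dist.items())
        return random.choices(words, weights=probs, k=1)[0]

    doc = []
    for _ in range(length):
        if random.random() < lam_B:          # step (1): background word?
            doc.append(draw(theta_B))
            continue
        j = random.choices(range(len(thetas)), weights=pi_d, k=1)[0]  # step (2)
        dF, dP, dN = delta_d[j]              # step (3): neutral / positive / negative
        s = random.choices("FPN", weights=[dF, dP, dN], k=1)[0]
        model = {"F": thetas[j], "P": theta_P, "N": theta_N}[s]       # step (4)
        doc.append(draw(model))
    return doc

# Toy example: one topic about batteries, with made-up word distributions.
doc = sample_document(
    length=10, lam_B=0.3,
    theta_B={"the": 0.6, "of": 0.4},
    thetas=[{"battery": 0.7, "charge": 0.3}],
    theta_P={"love": 0.5, "awesome": 0.5},
    theta_N={"hate": 0.5, "sucks": 0.5},
    pi_d=[1.0], delta_d=[(0.6, 0.2, 0.2)])
print(" ".join(doc))
```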
3.2 The Topic-Sentiment Mixture Model

Let $C = \{d_1, ..., d_m\}$ be a collection of Weblog articles, $\Theta = \{\theta_1, ..., \theta_k\}$ be k topic models, and $\theta_P$ and $\theta_N$ be the positive and negative sentiment models, respectively. The log-likelihood of the whole collection C according to the TSM model is

$$\log p(C) = \sum_{d \in C} \sum_{w \in V} c(w,d) \log \Big[ \lambda_B\, p(w|\theta_B) + (1-\lambda_B) \sum_{j=1}^{k} \pi_{d,j} \big( \delta_{j,d,F}\, p(w|\theta_j) + \delta_{j,d,P}\, p(w|\theta_P) + \delta_{j,d,N}\, p(w|\theta_N) \big) \Big]$$

where $c(w,d)$ is the count of word w in document d, $\lambda_B$ is the probability of choosing $\theta_B$, $\pi_{d,j}$ is the probability of choosing the j-th topic in document d, and $\{\delta_{j,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$ is the sentiment coverage of topic j in document d, as defined in Section 2.

Similar to existing work [28, 16, 15, 17], we regularize this model by fixing some parameters. $\lambda_B$ is set to an empirical constant between 0 and 1, which indicates how much noise we believe exists in the Weblog collection. We then set the background model as

$$p(w|\theta_B) = \frac{\sum_{d \in C} c(w,d)}{\sum_{w' \in V} \sum_{d \in C} c(w',d)}.$$

The parameters remaining to be estimated are: (1) the topic models $\Theta = \{\theta_1, ..., \theta_k\}$; (2) the sentiment models $\theta_P$ and $\theta_N$; (3) the document topic probabilities $\pi_{d,j}$; and (4) the sentiment coverage of each document, $\{\delta_{j,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$. We denote the whole set of free parameters by $\Lambda$.

Without any prior knowledge, we may use the maximum likelihood estimator to estimate all the parameters. Specifically, we can use the Expectation-Maximization (EM) algorithm [3] to compute the maximum likelihood estimate iteratively; the updating formulas are shown in Figure 3. In these formulas, $\{z_{d,w,j,s}\}$ is a set of hidden variables ($s \in \{F, P, N\}$), and $p(z_{d,w,j,s} = 1)$ is the probability that word w in document d is generated from the j-th topic using topic/sentiment model s.

Figure 3: EM updating formulas for the topic-sentiment mixture model. E-step:

$$p(z_{d,w,j,F} = 1) = \frac{(1-\lambda_B)\,\pi_{d,j}^{(n)}\,\delta_{j,d,F}^{(n)}\,p^{(n)}(w|\theta_j)}{\lambda_B\, p(w|\theta_B) + (1-\lambda_B)\sum_{j'=1}^{k}\pi_{d,j'}^{(n)}\big(\delta_{j',d,F}^{(n)}\, p^{(n)}(w|\theta_{j'}) + \delta_{j',d,P}^{(n)}\, p^{(n)}(w|\theta_P) + \delta_{j',d,N}^{(n)}\, p^{(n)}(w|\theta_N)\big)}$$

and analogously for $p(z_{d,w,j,P} = 1)$ and $p(z_{d,w,j,N} = 1)$, with numerators $(1-\lambda_B)\,\pi_{d,j}^{(n)}\,\delta_{j,d,P}^{(n)}\, p^{(n)}(w|\theta_P)$ and $(1-\lambda_B)\,\pi_{d,j}^{(n)}\,\delta_{j,d,N}^{(n)}\, p^{(n)}(w|\theta_N)$, respectively. M-step:

$$\pi_{d,j}^{(n+1)} = \frac{\sum_{w \in V} c(w,d) \sum_{s \in \{F,P,N\}} p(z_{d,w,j,s}=1)}{\sum_{j'=1}^{k}\sum_{w \in V} c(w,d) \sum_{s \in \{F,P,N\}} p(z_{d,w,j',s}=1)}$$

$$\delta_{j,d,s}^{(n+1)} = \frac{\sum_{w \in V} c(w,d)\, p(z_{d,w,j,s}=1)}{\sum_{w \in V} c(w,d) \sum_{s' \in \{F,P,N\}} p(z_{d,w,j,s'}=1)}, \quad s \in \{F,P,N\}$$

$$p^{(n+1)}(w|\theta_j) = \frac{\sum_{d \in C} c(w,d)\, p(z_{d,w,j,F}=1)}{\sum_{w' \in V}\sum_{d \in C} c(w',d)\, p(z_{d,w',j,F}=1)}$$

$$p^{(n+1)}(w|\theta_P) = \frac{\sum_{d \in C}\sum_{j=1}^{k} c(w,d)\, p(z_{d,w,j,P}=1)}{\sum_{w' \in V}\sum_{d \in C}\sum_{j=1}^{k} c(w',d)\, p(z_{d,w',j,P}=1)}$$

and analogously for $p^{(n+1)}(w|\theta_N)$.

However, in reality, if we do not put any constraint on the model, the sentiment models estimated with the EM algorithm will be heavily biased towards the specific contents of the collection, and the topic models will be "contaminated" with sentiments. This is because opinion words and topical words may co-occur with each other, and thus will not be separated by the EM algorithm. This is unsatisfactory, as we want our sentiment models to be independent of the topics, while the topic models should be neutral. To solve this problem, we introduce a regularized two-phase estimation framework, in which we first learn a general prior distribution on the sentiment models and then combine this prior with the data likelihood to estimate the parameters using the maximum a posteriori (MAP) estimator.

3.3 Defining Model Priors

The prior distribution should tell TSM what the sentiment models should look like in the working collection. This knowledge may be obtained from domain-specific lexicons, or from training data in the domain as in [23]. However, it is impossible to have such knowledge or training data for every ad hoc topic or query. Therefore, we want the prior sentiment models to be general enough to apply to any ad hoc topics. In this section, we show how we may exploit an online sentiment retrieval service such as Opinmind [20] to induce a general prior on the sentiment models. When given a query, Opinmind can retrieve positive sentences and negative sentences; thus we can obtain examples with sentiment labels for a topic (i.e., the query) from Opinmind. The query can be regarded as a topic label. To ensure diversity of topics, we can submit various queries to Opinmind and mix all the results to form a training collection. Presumably, if the topics in this training collection are diversified enough, the sentiment models learned will be very general.

With such a training collection, we have topic labels and sentiment labels for each document. Formally, we have $C = \{(d, t_d, s_d)\}$, where $t_d$ indicates which topic the document is about, and $s_d$ indicates whether d holds positive or negative opinions about the topic. We then use the topic-sentiment model presented in Section 3.2 to fit the training data and estimate the sentiment models. Since we have topic and sentiment labels, we impose the following constraints: (1) $\pi_{d,j} = 1$ if $t_d = j$ and $\pi_{d,j} = 0$ otherwise; (2) $\delta_{j,d,P} = 0$ if $s_d$ is negative and $\delta_{j,d,N} = 0$ if $s_d$ is positive. In Section 5, we will show that this estimation method is effective for extracting general sentiment models, and that the diversity of topics helps improve the generality of the sentiment models learned.

Rather than directly using the learned sentiment models to analyze our target collection, we use them to define a prior on the sentiment models and estimate the sentiment models (and the topic models) using the maximum a posteriori estimator. This allows us to adapt the general sentiment models to our collection and further improve their accuracy, which is traditionally done in a domain-dependent way. Specifically, let $\bar{\theta}_P$ and $\bar{\theta}_N$ be the positive and negative sentiment models learned from some training collection. We define the following two conjugate Dirichlet priors for the sentiment models $\theta_P$ and $\theta_N$, respectively: $Dir(\{1 + \mu_P\, p(w|\bar{\theta}_P)\}_{w \in V})$ and $Dir(\{1 + \mu_N\, p(w|\bar{\theta}_N)\}_{w \in V})$, where the parameters $\mu_P$ and $\mu_N$ indicate how strong our confidence in the sentiment model prior is. Since the prior is conjugate, $\mu_P$ (or $\mu_N$) can be interpreted as an "equivalent sample size", which means that the impact of adding the prior is equivalent to adding $\mu_P\, p(w|\bar{\theta}_P)$ (or $\mu_N\, p(w|\bar{\theta}_N)$) pseudo counts for word w when estimating the sentiment model $p(w|\theta_P)$ (or $p(w|\theta_N)$).

If we have prior knowledge about the topic models, we can also encode it as a conjugate prior for some $\theta_j$. Indeed, given a topic, a user often has some knowledge about which aspects are interesting. For example, when a user is searching for laptops, we know that he is very likely interested in "price" and "configuration". It is desirable to "guide" the model so that some of the topic models stay as close as possible to such predefined facets. Therefore, in general, we may assume that the prior on all the parameters in the model is

$$p(\Lambda) \propto p(\theta_P)\, p(\theta_N) \prod_{j=1}^{k} p(\theta_j) = \prod_{w \in V} p(w|\theta_P)^{\mu_P p(w|\bar{\theta}_P)} \prod_{w \in V} p(w|\theta_N)^{\mu_N p(w|\bar{\theta}_N)} \prod_{j=1}^{k} \prod_{w \in V} p(w|\theta_j)^{\mu_j p(w|\bar{\theta}_j)}$$

where $\mu_j = 0$ if we do not have prior knowledge about $\theta_j$.
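For readers who prefer code, the following Python sketch renders one EM iteration (Figure 3) together with the optional prior pseudo counts of the MAP update presented in Section 3.4 below. It is our own simplified rendering under stated assumptions (plain word-count dicts, no smoothing, hypothetical names such as lam_B and mu_P), not the authors' implementation.

```python
from collections import defaultdict

def em_iteration(docs, lam_B, theta_B, thetas, theta_P, theta_N, pi, delta,
                 mu_P=0.0, mu_N=0.0, prior_P=None, prior_N=None):
    """One EM iteration for TSM on a list of docs (each a word->count dict).

    pi[d][j] and delta[d][j] = [dF, dP, dN] hold per-document parameters.
    With mu_P/mu_N > 0 and prior models given, the M-step becomes the MAP
    update of Section 3.4 (prior mass added as pseudo counts).
    """
    k = len(thetas)
    tj = [defaultdict(float) for _ in range(k)]   # expected counts for theta_j
    tP, tN = defaultdict(float), defaultdict(float)
    new_pi = [[0.0] * k for _ in docs]
    new_delta = [[[0.0, 0.0, 0.0] for _ in range(k)] for _ in docs]

    for d, doc in enumerate(docs):                # E-step + count accumulation
        for w, c in doc.items():
            denom = lam_B * theta_B.get(w, 0.0)
            comps = []
            for j in range(k):
                dF, dP, dN = delta[d][j]
                zF = (1 - lam_B) * pi[d][j] * dF * thetas[j].get(w, 0.0)
                zP = (1 - lam_B) * pi[d][j] * dP * theta_P.get(w, 0.0)
                zN = (1 - lam_B) * pi[d][j] * dN * theta_N.get(w, 0.0)
                comps.append((zF, zP, zN))
                denom += zF + zP + zN
            if denom == 0.0:
                continue
            for j, (zF, zP, zN) in enumerate(comps):
                zF, zP, zN = zF / denom, zP / denom, zN / denom  # posteriors of z
                tj[j][w] += c * zF; tP[w] += c * zP; tN[w] += c * zN
                new_pi[d][j] += c * (zF + zP + zN)
                for i, z in enumerate((zF, zP, zN)):
                    new_delta[d][j][i] += c * z

    def normalize(counts, mu=0.0, prior=None):    # M-step, with pseudo counts
        total = sum(counts.values()) + mu or 1.0
        return {w: (counts[w] + mu * (prior or {}).get(w, 0.0)) / total
                for w in set(counts) | set(prior or {})}

    thetas = [normalize(tj[j]) for j in range(k)]  # mu_j = 0, i.e., ML update
    theta_P = normalize(tP, mu_P, prior_P)
    theta_N = normalize(tN, mu_N, prior_N)
    for d in range(len(docs)):                     # renormalize pi and delta
        s = sum(new_pi[d]) or 1.0
        pi[d] = [x / s for x in new_pi[d]]
        for j in range(k):
            sj = sum(new_delta[d][j]) or 1.0
            delta[d][j] = [x / sj for x in new_delta[d][j]]
    return thetas, theta_P, theta_N, pi, delta
```

Setting mu_P and mu_N to 0 gives the maximum likelihood update; positive values reproduce the pseudo-count M-step formulas of Section 3.4.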
3.4 Maximum A Posteriori Estimation

With the prior defined above, we may use the MAP estimator:

$$\hat{\Lambda} = \arg\max_{\Lambda}\; p(C|\Lambda)\, p(\Lambda).$$

It can be computed by rewriting the M-step of the EM algorithm in Section 3.2 to incorporate the pseudo counts given by the prior [14]. The new M-step updating formulas are:

$$p^{(n+1)}(w|\theta_P) = \frac{\mu_P\, p(w|\bar{\theta}_P) + \sum_{d \in C}\sum_{j=1}^{k} c(w,d)\, p(z_{d,w,j,P}=1)}{\mu_P + \sum_{w' \in V}\sum_{d \in C}\sum_{j=1}^{k} c(w',d)\, p(z_{d,w',j,P}=1)}$$

$$p^{(n+1)}(w|\theta_N) = \frac{\mu_N\, p(w|\bar{\theta}_N) + \sum_{d \in C}\sum_{j=1}^{k} c(w,d)\, p(z_{d,w,j,N}=1)}{\mu_N + \sum_{w' \in V}\sum_{d \in C}\sum_{j=1}^{k} c(w',d)\, p(z_{d,w',j,N}=1)}$$

$$p^{(n+1)}(w|\theta_j) = \frac{\mu_j\, p(w|\bar{\theta}_j) + \sum_{d \in C} c(w,d)\, p(z_{d,w,j,F}=1)}{\mu_j + \sum_{w' \in V}\sum_{d \in C} c(w',d)\, p(z_{d,w',j,F}=1)}$$

The parameters $\mu$ can be either empirically set to constants, or set through regularized estimation [25], in which we start with very large $\mu$'s and gradually discount them in each EM iteration until some stopping condition is satisfied.

3.5 Utilizing the Model

Once the parameters of the model are estimated, many tasks can be accomplished by utilizing them.

1. Rank sentences for topics: Given a set of sentences and a topic $\theta_j$, we can rank the sentences according to $\theta_j$ with the score

$$Score_j(s) = -D(\theta_j \| \theta_s) = -\sum_{w \in V} p(w|\theta_j) \log \frac{p(w|\theta_j)}{p(w|\theta_s)}$$

where $\theta_s$ is a smoothed language model of sentence s.

2. Categorize sentences by sentiments: Given a sentence s assigned to topic $\theta_j$, we can assign s to the positive, negative, or neutral sentiment according to

$$\arg\max_x\; -D(\theta_s \| \theta_x) = \arg\max_x\; -\sum_{w \in V} p(w|\theta_s) \log \frac{p(w|\theta_s)}{p(w|\theta_x)}$$

where $x \in \{j, P, N\}$ and $\theta_x$ is the corresponding language model.

3. Reveal the overall opinions about documents/topics: Given a document d and a topic $\theta_j$, the overall sentiment distribution for $\theta_j$ in d is the sentiment coverage $\{\delta_{j,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$. The overall sentiment strength (e.g., of the positive sentiment) for topic $\theta_j$ is

$$S(j, P) = \frac{\sum_{d \in C} \pi_{d,j}\, \delta_{j,d,P}}{\sum_{d \in C} \pi_{d,j}}.$$

4. SENTIMENT DYNAMICS ANALYSIS

While the TSM model can be directly used to analyze topics and sentiments in many ways, it does not directly model topic life cycles or sentiment dynamics. In addition to associating the sentiments with multiple subtopics, we would also like to show how the positive/negative opinions about a given subtopic change over time. The comparison of such temporal patterns (i.e., topic life cycles and the corresponding sentiment dynamics) could potentially provide a more in-depth understanding of public opinions than [20], and yield more accurate predictions of user behavior than the methods proposed in [6] and [19].

To achieve this goal, we could approximate these temporal patterns by partitioning documents into their corresponding time periods and computing the posterior probabilities $p(t|\theta_j)$, $p(t|\theta_j, P)$, and $p(t|\theta_j, N)$, where t is a time period. This approach has the limitation that these posterior distributions are not well defined, because the time variable t is nowhere involved in the original model. An alternative approach would be to model the time variable t explicitly as in [15, 17], but this would bring many more free parameters into the model, making it harder to estimate all the parameters reliably. Defining a good partition of the time line is also a challenging problem: too coarse a partition would miss many bursting patterns, while with too fine a granularity a time period may not be estimated reliably because of data sparseness.

In this work, we present another approach to extracting topic life cycles and sentiment dynamics, similar to the method used in [16]. Specifically, we use a hidden Markov model (HMM) to tag every word in the collection with a topic and a sentiment polarity. Once all words are tagged, the topic life cycles and sentiment dynamics can be extracted by counting the words with the corresponding labels. We first sort the documents by their time stamps and convert the whole collection into a long sequence of words. On the surface, it appears that we could follow [16] and construct an HMM with each state corresponding to a topic model (including the background model), setting the output probability of state j to $p(w|\theta_j)$. A topic state can either stay on itself or transit to another topic state through the background state. The system can learn (from our collection) the transition probabilities with the Baum-Welch algorithm [24] and decode the collection sequence with the Viterbi algorithm [24]. We could easily model sentiments by adding two sentiment states to this HMM. Unfortunately, such a structure cannot decode which sentiment word is about which topic. Below, we present an alternative HMM structure (shown in Figure 4) that better serves our purpose.

[Figure 4: The hidden Markov model used to extract topic life cycles and sentiment dynamics. A state E connects a series of pseudo states T1, T2, T3, ...; each pseudo state contains four fully connected real states corresponding to $\theta_j$, $\theta_P$, $\theta_N$, and $\theta_B$, with no direct transition between the two sentiment states.]

In Figure 4, state E controls the transitions between topics. In addition to E, there is a series of pseudo states, each of which corresponds to a subtopic. These pseudo states can only transit to each other through state E. A pseudo state is not a real single state, but a substructure of states and transitions. For example, pseudo state T1 consists of four real states: three of them correspond to the topic model $\theta_1$, the positive sentiment model, and the negative sentiment model; the remaining state corresponds to the background model. The four states in each pseudo state are fully connected, except that there is no direct transition between the two sentiment states. The output probabilities of all states (except state E) are fixed according to the corresponding topic or sentiment models. This HMM structure can decode both topic segments (with the pseudo states Tj) and the sentiment segments associated with each topic (with the states inside Tj). We force the model to start with state E, and use the Baum-Welch algorithm to learn the transition probabilities and the output probabilities of E. Once all the parameters are estimated, we use the Viterbi algorithm to decode the collection sequence. Finally, as in [16], we compute the topic life cycles and sentiment dynamics by counting the number of words labeled with the corresponding state over time.
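The final counting step is simple enough to sketch. Assuming the Viterbi decoder emits (date, topic, label) triples with label in {'F', 'P', 'N'} (our own intermediate representation; the paper does not specify one), the life cycle and sentiment dynamics of a topic are just bucketed counts:

```python
from collections import Counter
from datetime import date, timedelta

def dynamics(tagged_words, topic_id, bucket_days=7):
    """Topic life cycle and sentiment dynamics as bucketed word counts.

    tagged_words: iterable of (date, topic_id, label) triples produced by
    Viterbi decoding, where label is 'F' (neutral), 'P', or 'N'.
    Returns {bucket_start_date: {'F': n, 'P': n, 'N': n}}.
    """
    series = {}
    for day, tid, label in tagged_words:
        if tid != topic_id:
            continue
        start = day - timedelta(days=day.toordinal() % bucket_days)
        series.setdefault(start, Counter())[label] += 1
    return series

# Toy decoded output for topic 0: two neutral words, one positive, one negative.
tagged = [(date(2005, 9, 5), 0, 'F'), (date(2005, 9, 6), 0, 'F'),
          (date(2005, 9, 7), 0, 'P'), (date(2005, 10, 20), 0, 'N')]
for start, counts in sorted(dynamics(tagged, topic_id=0).items()):
    print(start, dict(counts))
```

The 'F' counts give the topic life cycle, and the ratios of 'P' to 'N' counts per bucket give relative-strength curves like those in Figure 6(d).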
5. EXPERIMENTS AND RESULTS

5.1 Data Sets

We need two types of data sets for evaluation. One is used to learn the general sentiment priors, and thus should have labels for positive and negative sentiments. In order to extract very general sentiment models, we want the topics in this data set to be as diversified as possible. We construct this training data set by leveraging an existing Weblog sentiment retrieval system (Opinmind.com [20]): we submit different queries to Opinmind and mix the downloaded classified results. This also gives us natural boundaries between topics in the training collection. The composition of this training data set (denoted "OPIN") is shown in Table 1.

Table 1: Basic statistics of the OPIN data sets

| Topic        | # Pos. | # Neg. | Topic      | # Pos. | # Neg. |
|--------------|--------|--------|------------|--------|--------|
| laptops      | 346    | 142    | people     | 441    | 475    |
| movies       | 396    | 398    | banks      | 292    | 229    |
| universities | 464    | 414    | insurances | 354    | 297    |
| airlines     | 283    | 400    | nba teams  | 262    | 191    |
| cities       | 500    | 500    | cars       | 399    | 334    |

The other type of data is used to evaluate the extraction of topic models, topic life cycles, and sentiment dynamics. Such data do not need sentiment labels, but should have time stamps and be able to represent users' ad hoc information needs. Following [16], we construct these data sets by submitting time-bounded queries to Google Blog Search (http://blogsearch.google.com) and collecting the blog entries returned. We restrict the search domain to spaces.live.com, since schema matching is not our focus. The basic information about these test collections (denoted "TEST") is shown in Table 2.

Table 2: Basic statistics of the TEST data sets

| Data Set      | # doc. | Time Period        | Query Term    |
|---------------|--------|--------------------|---------------|
| iPod          | 2988   | 1/11/05 - 11/01/06 | ipod          |
| Da Vinci Code | 1000   | 1/26/05 - 10/31/06 | da+vinci+code |

For all the Weblog collections, the Krovetz stemmer [10] is used to stem the text.

5.2 Sentiment Model Extraction

Our first experiment evaluates the effectiveness of learning the prior models for sentiments. As discussed in Section 3.3, a good $\bar{\theta}_s$ should not depend on the specific features of the topics, and should be general enough to guide the learning of sentiment models for unseen topics. The more diversified the topics of the training set are, the more general the estimated sentiment models should be. To evaluate the effectiveness of the TSM model on this task, we collect labeled results for 10 different topics from Opinmind, each consisting of an average of 5 queries. We then construct a series of training data sets such that for each k ($1 \le k \le 9$) there are 10 training data sets, each a mixture of k topics. We apply the TSM model to each data set and extract sentiment models accordingly. We also construct a data set with the mixture of all 10 topics. The top words of the sentiment models extracted from the 10-topic-mixture data set and those from single-topic data sets are compared in Table 3.
Table 3: Sentiment models learned from a mixture of topics are more general

| P-mix     | N-mix    | P-movies  | N-movies  | P-cities  | N-cities  |
|-----------|----------|-----------|-----------|-----------|-----------|
| love      | suck     | love      | hate      | beautiful | hate      |
| awesome   | hate     | harry     | harry     | love      | suck      |
| good      | stupid   | pot       | pot       | awesome   | people    |
| miss      | ass      | brokeback | mountain  | amaze     | traffic   |
| amaze     | fuck     | mountain  | brokeback | live      | drive     |
| pretty    | horrible | awesome   | suck      | good      | fuck      |
| job       | shitty   | book      | evil      | night     | stink     |
| god       | crappy   | beautiful | movie     | nice      | move      |
| yeah      | terrible | good      | gay       | time      | weather   |
| bless     | people   | watch     | bore      | air       | city      |
| excellent | evil     | series    | fear      | greatest  | transport |

The left two columns of Table 3 present the two sentiment models extracted from the 10-topic-mixture data set, which are more general than the two pairs of columns to their right, extracted from two single-topic data sets ("movies" and "cities"), respectively. In the two middle columns, we see terms like "harry", "pot", "brokeback", and "mountain" ranked highly in the sentiment models. These words are actually part of our query terms. We also see other domain-specific terms such as "movie", "series", "gay", and "watch". For the sentiment models from the "cities" data set, we removed all query terms from the top words. However, we can still notice words like "night" and "air" in the positive model, and "traffic", "weather", and "transport" in the negative model. This indicates that the sentiment models are highly biased towards the specific features of the topic if the training data set contains only one topic.

To evaluate this in a more principled way, we conduct a 10-fold cross validation, which numerically measures the closeness of the sentiment models learned from a mixture of topics (k = 1, ..., 9) to those from an unseen topic (i.e., a topic not in the mixture). Intuitively, a sentiment model is less biased if it is closer to unseen topics. The closeness of two sentiment models is measured with the Kullback-Leibler (KL) divergence,

$$D(\theta_x \| \theta_y) = \sum_{w \in V} p(w|\theta_x) \log \frac{p(w|\theta_x)}{p(w|\theta_y)}$$

where $\theta_x$ and $\theta_y$ are two sentiment models (e.g., a $\theta_P$ learned from a mixed-topic collection and a $\theta_P$ from a single-topic collection). We use a simple Laplace smoothing method to guarantee $p(w|\theta_y) > 0$. The result of the cross validation is presented in Figure 5.

[Figure 5: Sentiment models learned from diversified topics better fit unseen topics. Both the "KL Positive" and the "KL Negative" curve plot the average KL divergence (y-axis, roughly 40 down toward 15) against the number of mixed topics (x-axis, 1 to 9).]

Figure 5 measures the average KL divergence between the positive (negative) sentiment model learned from a k-topic-mixture data set and the positive (negative) sentiment model learned from an unseen single-topic data set. We notice that when k is larger, i.e., when the topics in the training data set are more diversified, the sentiment models learned from the collection are closer to the sentiment models of unseen topics. This validates our assumption that a more diversified training collection provides more general sentiment prior models for new topics. The sentiment models $\bar{\theta}_P$ and $\bar{\theta}_N$ estimated from the 10-topic-mixture collection are used as the prior sentiment models in the following experiments.

5.3 Topic Model Extraction

Our second experiment fits TSM to ad hoc Weblog collections and extracts the topic models and sentiment coverages. As discussed in Section 3.3, the general sentiment models learned from the OPIN data set are used as a strong prior for the sentiment models in a given collection. We expect the extracted topic models to be unbiased towards sentiment polarities and to simply represent the neutral content of the topics. In the experiments, we set the initial values of the $\mu$'s reasonably large (>10,000), and use the regularized estimation strategy in [25] to gradually decay them. $\lambda_B$ is empirically set between 0.85 and 0.95. Some informative topic models extracted from the TEST data sets are shown in Tables 4 and 5.

Table 4: Example topic models with TSM: iPod

No-Prior:
| batt., nano | marketing | ads, spam |
|-------------|-----------|-----------|
| battery     | apple     | free      |
| shuffle     | microsoft | sign      |
| charge      | market    | offer     |
| nano        | zune      | freepay   |
| dock        | device    | complete  |
| itune       | company   | virus     |
| usb         | consumer  | free ipod |
| hour        | sale      | trial     |

With-Prior:
| Nano  | Battery      |
|-------|--------------|
| nano  | battery      |
| color | shuffle      |
| thin  | charge       |
| hold  | usb          |
| model | hour         |
| 4gb   | mini         |
| dock  | life         |
| inch  | rechargeable |

Table 5: Example topic models: Da Vinci Code

No-Prior:
| content | book      | background  |
|---------|-----------|-------------|
| langdon | author    | jesus       |
| secret  | idea      | mary        |
| murder  | holy      | gospel      |
| louvre  | court     | magdalene   |
| thrill  | brown     | testament   |
| clue    | blood     | gnostic     |
| neveu   | copyright | constantine |
| curator | publish   | bible       |

With-Prior:
| Movie  | Religion  |
|--------|-----------|
| movie  | religion  |
| hank   | belief    |
| tom    | cardinal  |
| film   | fashion   |
| watch  | conflict  |
| howard | metaphor  |
| ron    | complaint |
| actor  | communism |

As discussed in Section 3.4, we can either extract topic models in a completely unsupervised way, or base them on some prior of what the topic models should look like. In Tables 4 and 5, the left columns are topic models extracted without prior knowledge, and the right columns are those extracted with the bold titles as priors. We see that the topics extracted either way are informative and coherent. The ones extracted with priors are extremely clear and distinctive, such as "Nano" and "Battery" for the query "iPod". This is quite desirable for summarizing search results, where the system can extract topics interactively with the user. For example, the user can input several words as expected facets, and the system uses these words (e.g., "movie", "book", "history" for the query "Da Vinci Code", or "battery", "price" for the query "iPod") as priors on some topic models, letting the remaining topics be extracted in an unsupervised way.

With the topic models and sentiment models extracted, we can summarize the sentences in blog search results by first ranking sentences according to different topics and then assigning them to sentiment categories. Table 6 shows the summarized results for the query "Da Vinci Code". We show two facets of the results: "movie" and "book". Although both the movie and the book are named "The Da Vinci Code", many people hold different opinions about them. Table 6 organizes the sentences retrieved for the query "da vinci code" by their relevance to each facet and by their categorization into positive, negative, and neutral opinions. The sentences do not have to contain the facet name, as in "Tom Hanks stars in the movie". The sentence "... so sick of people making such a big deal about a FICTION book and movie.", which appears under both facets, clearly presents an example of mixed topics. We also notice that the system sometimes makes wrong classifications. For example, the sentence "Anybody is interested in it?" is misclassified as positive. This is because we rely on a unigram language model for the sentiments, and the "bag of words" assumption does not consider word dependency and linguistics. This problem can be tackled when phrases are used as the basis of the sentiment models.

Table 6: Topic-sentiment summarization: Da Vinci Code

Topic 1 (Movie)
- Neutral: "... Ron Howards selection of Tom Hanks to play Robert Langdon."; "Directed by: Ron Howard Writing credits: Akiva Goldsman ..."; "... After watching the movie I went online and some research on ..."
- Thumbs Up: "Tom Hanks stars in the movie, who can be mad at that?"; "Tom Hanks, who is my favorite movie star act the leading role."; "Anybody is interested in it?"
- Thumbs Down: "But the movie might get delayed and even killed off if he loses."; "protesting ... will lose your faith by ... watching the movie."; "... so sick of people making such a big deal about a FICTION book and movie."

Topic 2 (Book)
- Neutral: "I knew this because I was once a follower of feminism."; "I remembered when i first read the book, I finished the book in two days."; "I'm reading 'Da Vinci Code' now."
- Thumbs Up: "And I'm hoping for a good book too."; "Awesome book."; "So still a good book to past time."
- Thumbs Down: "... so sick of people making such a big deal about a FICTION book and movie."; "This controversy book cause lots conflict in west society."; "in the feeling of deeply anxious and fear, to ... read books calmly was quite difficult."

In Table 7, we compare the query summarization of our model to that of Opinmind. The left two columns are search results summarized with TSM and the right two columns are top results from Opinmind, for the same query "iPod". We see that Opinmind tends to rank the sentences with the strongest sentiments at the top, but many of them are not very informative. For example, although the sentences "I love iPod" and "I hate iPod" do reflect strong attitudes, they do not give the user as much useful information as "out of battery again". Our system, on the other hand, reveals the hidden facets of people's opinions. In the results from Opinmind, we do notice some sentences about specific aspects of iPod, such as "battery", "video", and "microsoft" (indicating marketing). Unfortunately, this useful information is mixed together. Our system organizes the sentences according to the hidden facets, which gives the user a deeper understanding of the opinions about the query.

Table 7: Topic-sentiment summarization: iPod

TSM:
1. Thumbs Up: "(sweat) iPod Nano ok so ..."; "Ipod Nano is a cool design, ..." / Thumbs Down: "WAT IS THIS SHIT??!!"; "ipod nanos are TOO small!!!!"
2. Thumbs Up: "the battery is one serious example of excellent relibability ..." / Thumbs Down: "Poor battery life ..."; "...iPod's battery completely died"
3. Thumbs Up: "My new VIDEO ipod arrived!!!"; "Oh yeah! New iPod video" / Thumbs Down: "fake video ipod"; "Watch video podcasts ..."

Opinmind:
- Thumbs Up: "I love my iPod, I love my G5..."; "I love my little black 60GB iPod"; "I LOVE MY iPOD"; "I love my iPod."; "- I love my iPod."; "... iPod video looks SO awesome ..."
- Thumbs Down: "I hate ipod."; "Stupid ipod out of batteries..."; ""hate ipod" = 489.."; "my iPod looked uglier...surface..."; "i hate my ipod. ..."; "microsoft ... the iPod sucks"
5.4 Topic Life Cycle and Sentiment Dynamics

Based on the topic models and sentiment models learned from the TEST collections, we evaluate the effectiveness of the HMM-based method presented in Section 4 for extracting topic life cycles and sentiment dynamics. Intuitively, we expect the sentiment models to explain as much information as possible, since the most useful patterns are the sentiment dynamics. In our experiments, we force the transition probabilities from topic states to sentiment states, and those from sentiment states to themselves, to be reasonably large (e.g., >0.25). The results of topic life cycles and sentiment dynamics are selectively presented in Figure 6.

[Figure 6: Topic life cycles and sentiment dynamics from January 2005 through October 2006. (a) Da Vinci Code: Book, with neutral, positive, and negative strength over time; (b) Da Vinci Code: Religion; (c) iPod: Nano; (d) iPod: relative strength, plotting Overall Pos/Neg, Nano Pos/Neg, and Nano Neg/Pos (on a different scale).]

In Figure 6(a) and (b), we present the dynamics of two facets in the Da Vinci Code collection: "book" and "religion, belief". The neutral line in each plot corresponds to the topic life cycle. In both plots, we see a significant burst in May 2006, which was caused by the release of the movie "Da Vinci Code". However, before the movie, we can still notice some bursts of discussion about the book. In the plot for the "book" facet, the positive sentiment consistently dominates the opinions during the burst. For the religion issues and the conflict of beliefs, however, the negative opinions are stronger than the positive opinions during the burst, which is consistent with the heated debates about the movie around that period of time. In fact, the book and the movie were boycotted or even banned in some countries because of the conflict with their religious beliefs.

In Figure 6(c) and (d), we present the topic life cycle and sentiment dynamics of the subtopic "Nano" for "iPod". In Figure 6(c), we see that both the neutral topic and the positive sentiment about the Nano burst around early September 2005. That is consistent with the time of the official introduction of the iPod Nano. The negative sentiment, however, does not burst until several weeks later. This is reasonable, since people need to experience the product for a while before discovering its defects. In Figure 6(d), we alternatively plot the relative strength of the positive (negative) sentiment over the negative (positive) sentiment. This relative strength clearly reveals which sentiment dominates the opinions, and the trend of this domination. Since there are generally fewer negative opinions, we plot the Neg/Pos line with a different scale. Again, we see that around the time the iPod Nano was introduced, the positive sentiments dominated the opinions. However, in October 2005, the negative sentiment shows a sudden increase of coverage. This overlaps with the time period in which there was a burst of complaints, followed by a lawsuit, about the "scratch problem" of the iPod Nano. We also plot the Pos/Neg dynamics of the overall sentiments about "iPod". We see that its shape is quite different from the Pos/Neg plot of "Nano". The positive sentiment holds a larger proportion of the opinions, but this domination is getting weaker. This also suggests that it is not reasonable to use the overall blog mentions (not distinguishing subtopics or sentiments), or the general sentiment dynamics (not distinguishing subtopics), to predict user behavior (e.g., buying a Nano). All these results show that our method is effective at extracting the dynamics of topics and sentiments.

6. RELATED WORK

To the best of our knowledge, modeling the mixture of topics and sentiments has not been addressed in existing work. However, there are several lines of related work.

Weblogs have been attracting increasing attention from researchers, who consider them a suitable test bed for many novel research problems and algorithms [11, 7, 6, 15, 19]. Much new research has found applications in Weblog analysis, such as community evolution [11], spatiotemporal text mining [15], opinion tracking [20, 15, 19], information propagation [7], and user behavior prediction [6]. Mei and others introduced a mixture model to extract the subtopics in Weblog collections and track their distribution over time and locations [16]. Gruhl and others [7] proposed a model of information propagation and detected spikes in the diffusion of topics in Weblogs, and later used bursts of blog mentions to predict spikes in the sales of a book in the near future [6]. However, all these models tend to ignore the sentiments in Weblogs, and only capture the general description of topics. This may limit the usefulness of their results. Mishne and others instead used the temporal patterns of sentiments to predict book sales [19]. Opinmind [20] summarizes Weblog search results with positive and negative categories. On the other hand, researchers have also used facets to categorize the latent topics in search results [8]. However, all this work ignores the correlation between topics and sentiments. This limitation is shared with other sentiment analysis work such as [18].

Sentiment classification has been a challenging topic in natural language processing (see, e.g., [26, 2]). The most common definition of the problem is a binary classification task assigning a sentence either the positive or the negative polarity [23, 21]. Since traditional text categorization methods perform poorly on sentiment classification [23], Pang and Lee proposed a method using a mincut algorithm to extract sentiments and subjective summarization for movie reviews [21]. In some recent work, the definition of the sentiment classification problem is generalized to a rating scale [22]. The goal of this line of work is to improve classification accuracy, while we aim at mining useful information (topic/sentiment models, sentiment dynamics) from Weblogs. These methods neither consider the correlation of sentiments and topics nor model sentiment dynamics. Some recent work is aware of this limitation. Engström studied how topic dependence influences the accuracy of sentiment classification and tried to reduce this dependence [5].
In very recent work [4], the authors proposed a topic-dependent method for sentiment retrieval, which assumes that a sentence is generated from a probabilistic model consisting of both a topic language model and a sentiment language model. A similar approach can be found in [27]. Their vision of topic-sentiment dependency is similar to ours. However, they do not consider the mixture of topics in text, while we assume that a document can cover multiple subtopics and different sentiments. Their model requires a set of topic keywords to be given by the user, while our method is more flexible and can extract the topic models in an unsupervised/semi-supervised way with an EM algorithm. They also require sentiment training data for every topic, or manually input sentiment keywords, while we can learn general sentiment models applicable to ad hoc topics.

Most opinion extraction work tries to find general opinions on a given topic but does not distinguish sentiments [28, 15]. Liu and others extracted product features and opinion features for a product, and were thus able to provide sentiments for different features of a product [13]. However, those product opinion features are highly dependent on the training data sets, and thus are not flexible enough to deal with ad hoc queries and topics. The same problem is shared by [27]. They also do not provide a way to model sentiment dynamics.

There is yet another line of research in text mining that models the mixture of topics (themes) in documents [9, 1, 16, 15, 17, 12]. The mixture model we presented is along this line. However, none of this work has tried to model the sentiments associated with the topics, and thus it cannot be applied to our problem. We do notice, however, that the TSM model is a special case of some very general topic models, such as the CPLSA model [17], which mixes themes with different views (topic, sentiment) and different coverages (sentiment coverages). The generation structure in Figure 2 is also related to the general DAG structure presented in [12].

7. CONCLUSIONS

In this paper, we formally define the problem of topic-sentiment analysis and propose a new probabilistic topic-sentiment mixture model (TSM) to solve it. With this model, we can effectively (1) learn general sentiment models; (2) extract topic models orthogonal to sentiments, which represent the neutral content of subtopics; and (3) extract topic life cycles and the associated sentiment dynamics. We evaluate our model on different Weblog collections; the results show that the TSM model is effective for topic-sentiment analysis, generating more useful topic-sentiment summaries of blog search results than a state-of-the-art blog opinion search engine (Opinmind).

There are several interesting extensions to our work. In this work, we assume that the content of the sentiment models is the same for all topics in a collection. It would be interesting to customize the sentiment models to each topic and obtain different contextual views [17] of sentiments on different facets. Another interesting future direction is to further explore other applications of TSM, such as user behavior prediction.

8. ACKNOWLEDGMENTS

We thank the three anonymous reviewers for their comments. This work is in part supported by the National Science Foundation under award numbers 0425852, 0347933, and 0428472.

9. REFERENCES

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993-1022, 2003.
[2] Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan. Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of HLT-EMNLP 2005, 2005.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1-38, 1977.
[4] K. Eguchi and V. Lavrenko. Sentiment retrieval using generative models. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 345-354, 2006.
[5] C. Engström. Topic dependence in sentiment classification. Master's thesis, University of Cambridge, 2004.
[6] D. Gruhl, R. Guha, R. Kumar, J. Novak, and A. Tomkins. The predictive power of online chatter. In Proceedings of KDD '05, pages 78-87, 2005.
[7] D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. Information diffusion through blogspace. In Proceedings of the 13th International Conference on World Wide Web, pages 491-501, 2004.
[8] M. A. Hearst. Clustering versus faceted categories for information exploration. Commun. ACM, 49(4):59-61, 2006.
[9] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of SIGIR '99, pages 50-57, 1999.
[10] R. Krovetz. Viewing morphology as an inference process. In Proceedings of SIGIR '93, pages 191-202, 1993.
[11] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In Proceedings of the 12th International Conference on World Wide Web, pages 568-576, 2003.
[12] W. Li and A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 577-584, 2006.
[13] B. Liu, M. Hu, and J. Cheng. Opinion observer: Analyzing and comparing opinions on the Web. In WWW '05: Proceedings of the 14th International Conference on World Wide Web, pages 342-351, 2005.
[14] G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley, 1997.
[15] Q. Mei, C. Liu, H. Su, and C. Zhai. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In WWW '06: Proceedings of the 15th International Conference on World Wide Web, pages 533-542, 2006.
[16] Q. Mei and C. Zhai. Discovering evolutionary theme patterns from text: An exploration of temporal text mining. In Proceedings of KDD '05, pages 198-207, 2005.
[17] Q. Mei and C. Zhai. A mixture model for contextual text mining. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 649-655, 2006.
[18] G. Mishne and M. de Rijke. MoodViews: Tools for blog mood analysis. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW 2006), pages 153-154, 2006.
[19] G. Mishne and N. Glance. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW 2006), 2006.
[20] Opinmind. http://www.opinmind.com.
[21] B. Pang and L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, pages 271-278, 2004.
[22] B. Pang and L. Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL, pages 115-124, 2005.
[23] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79-86, 2002.
[24] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-285, 1989.
[25] T. Tao and C. Zhai. Regularized estimation of mixture models for robust pseudo-relevance feedback. In SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 162-169, 2006.
[26] J. Wiebe, T. Wilson, and C. Cardie. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation (formerly Computers and the Humanities), 39, 2005.
[27] J. Yi, T. Nasukawa, R. C. Bunescu, and W. Niblack. Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Proceedings of ICDM 2003, pages 427-434, 2003.
[28] C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In Proceedings of KDD '04, pages 743-748, 2004.