WWW 2008 / Poster Paper · April 21-25, 2008 · Beijing, China

Mining Tag Clouds and Emoticons behind Community Feedback

Kavita A. Ganesan, eBay Inc., 2145 Hamilton Avenue, San Jose, CA 95125, USA, +1 408 967 5895, kaganesan@ebay.com
Neel Sundaresan, eBay Research Labs, 2145 Hamilton Avenue, San Jose, CA 95125, USA, +1 408 376 8422, nsundaresan@ebay.com
Harshal Deo, eBay Inc., 2145 Hamilton Avenue, San Jose, CA 95125, USA, +1 408 967 8725, hdeo@ebay.com

ABSTRACT
In this paper we describe our mining system, which automatically mines tags from feedback text in an eCommerce scenario and renders these tags in a visually appealing manner. Further, emoticons are attached to the mined tags to add sentiment to the visual presentation.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Clustering, Information filtering, Selection process, Retrieval models

General Terms
Algorithms, Experimentation, Design

Keywords
Algorithms, Search, Feedback, Experimentation, Web 2.0, Measurement, Precision

1. Motivation and Background
In a social commerce environment such as eBay, community feedback is imperative. Buyers want to buy from sellers whom they can truly trust to deliver what is promised, and many times the only way of knowing this is through the comments left by other buyers. Superlative phrases like "AAA+++" are as common as "good packaging" or "no response". Although the feedback score is an indication of how good a seller is, buyers would like to know whether the seller typically offers good customer support, fast shipping, or good packaging; a prospective buyer may compromise on some needs but not on others. Analyzing the reasons behind a seller's score, however, poses challenges, especially when the seller has thousands of reviews.

Our work focuses on mining tags from a collection of small blocks of text that have opinion and positive or negative feedback attached to them. Our goal is to identify the most distinguishing tags on a per-entity (user/category) basis from among a collection of entities (users/categories). For visual representation we use font size and weight to represent density. For visual impact, and to normalize across text, we use representative emoticons as proxies.

2. Implementation and Architecture
The emotion mining system starts with a request for a seller's feedback information, which returns the last pre-chosen number of feedback comments (positive, negative or neutral) for the named seller. This text goes to the pre-processor, which cleans the text and builds a dictionary of words. The text is then fed into a suffix tree data structure for n-gram phrase extraction. Extracted phrases are weighted and the best phrases are retained. Short-listed phrases are then mapped to corresponding emoticons, and finally the feedback summary, rendered as tag clouds [2] of varying font size, is presented to users. Figure 1 shows a screenshot of our mining system, which we call Emosi Sosial. The emoticons attached to the tags give a visual understanding of what the community feels about a seller, and the emoticon legend on the left makes it easy for users to figure out what each emoticon means.

Figure 1: A screenshot of Emosi Sosial. The tag clouds in the middle, under the headings positive, negative and neutral, are the summarized feedback in their respective categories. The pop-up seen on the far right highlights all comments related to the tag that was clicked on. Note that tags rendered in larger fonts are more representative than smaller ones.

2.1 Weighting and Scoring of Feedback Tags
Each extracted tag is weighted using the tf-idf (term frequency-inverse document frequency) [3] scheme. Tags that meet a minimum weight threshold are short-listed and eventually displayed. This weight is a statistical measure of how important a phrase is to a document in a collection or corpus. For user-level mining, the tf of a given phrase P for a seller S is multiplied by the idf of phrase P in the global dataset. Category-level mining works the same way; the only difference is that the tf over all sellers within a given category is used instead of the tf for a single seller. The tf-idf scoring formula used is as follows:

TF_normalized = (TF_{p,seller_k} or TF_{p,category_k}) / TF_max
IDF = log_2(N / TF_{p,global})
TF-IDF = TF_normalized × IDF

where:
TF_{p,seller_k}: term frequency of phrase p for a seller k
TF_{p,category_k}: term frequency of phrase p in a category k
TF_{p,global}: global term frequency of phrase p

We normalize the term frequencies so that longer documents are not unfairly given more weight.
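To make the scoring step concrete, the following is a minimal sketch, not the paper's implementation: the function names (extract_ngram_counts, score_phrases), the default weight threshold, the use of plain Python dictionaries for counts, and the naive n-gram counter standing in for the suffix-tree extraction are all assumptions made for illustration.

```python
# Illustrative sketch of the tf-idf weighting described in Section 2.1.
# A naive n-gram counter stands in for the paper's suffix-tree extraction;
# names and the threshold are made up for the example.
import math
from collections import Counter
from typing import Dict, List

def extract_ngram_counts(comments: List[str], max_n: int = 3) -> Counter:
    """Count 1..max_n word n-grams across a seller's feedback comments
    (a simplified stand-in for suffix-tree phrase extraction)."""
    counts: Counter = Counter()
    for comment in comments:
        words = comment.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts

def score_phrases(entity_tf: Counter, global_tf: Dict[str, int],
                  num_docs: int, min_weight: float = 1.0) -> Dict[str, float]:
    """Weight each phrase by normalized TF times IDF = log2(N / global TF)
    and keep only phrases that meet the minimum weight threshold."""
    if not entity_tf:
        return {}
    tf_max = max(entity_tf.values())
    scores = {}
    for phrase, tf in entity_tf.items():
        tf_norm = tf / tf_max                                  # TF_normalized
        idf = math.log2(num_docs / global_tf.get(phrase, 1))   # IDF
        weight = tf_norm * idf                                 # TF-IDF
        if weight >= min_weight:
            scores[phrase] = weight
    return scores
```

For seller-level mining, entity_tf would hold the phrase counts of a single seller; for category-level mining it would hold the counts summed over all sellers in the category, mirroring the formula above.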
Once we have the short-listed phrases, we map each of them to the appropriate emoticon.

2.2 Phrase to Emoticon Mapping
Emoticons are a simple and concise way of visually showing the user the general emotion behind a feedback tag. Figure 2 depicts the set of emoticons used for mapping and an example of possible mappings. The emoticon dictionary uses different combinations of words to map phrases to emoticons, and these words are typically kept in their root form to avoid ambiguity. For a given phrase, the phrase-to-emoticon mapping is looked up in the emoticon dictionary: the phrase is first converted to its root form using a WordNet library, so that a phrase like "slow shipping" becomes "slow/ship", and a lookup is then performed over the different combinations of words to find the best mapping.

Figure 2: Set of emoticons used to depict emotions behind community feedback, together with example word combinations from the emoticon dictionary, such as "fast,speed / deliver,ship,arrive / fast,quick", "no,never / receive,response,reply" and "nice,beautiful,pretty,gorgeous,cute". The descriptions are not specific to the tags; a broad range of tags can map to one emoticon. Tags like "fast delivery", "arrived fast" and "arrived quickly" would all map to the 'quick shipment' emoticon.
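A minimal sketch of the dictionary lookup described in Section 2.2 follows, under some assumptions: the dictionary entries and emoticon labels are toy examples, NLTK's WordNet lemmatizer is used as one possible "WordNet library", and the function names are ours rather than the system's.

```python
# Illustrative sketch of the phrase-to-emoticon lookup in Section 2.2.
# The dictionary contents and emoticon labels are made up for the example;
# NLTK's WordNet lemmatizer stands in for "a WordNet library"
# (requires nltk.download("wordnet") to have been run once).
from typing import Optional
from nltk.stem import WordNetLemmatizer

# Each entry maps a combination of root-form words to an emoticon label.
EMOTICON_DICTIONARY = {
    frozenset({"fast", "ship"}): "quick shipment",
    frozenset({"slow", "ship"}): "slow shipment",
    frozenset({"no", "response"}): "unresponsive seller",
    frozenset({"beautiful"}): "lovely item",
}

_lemmatizer = WordNetLemmatizer()

def to_root_words(phrase: str) -> frozenset:
    """Reduce a phrase such as 'slow shipping' to its root words, e.g. {'slow', 'ship'}."""
    return frozenset(_lemmatizer.lemmatize(w, pos="v") for w in phrase.lower().split())

def map_to_emoticon(phrase: str) -> Optional[str]:
    """Return the emoticon whose word combination is fully contained in the
    phrase's root words, preferring the most specific (largest) combination."""
    roots = to_root_words(phrase)
    best, best_size = None, 0
    for words, emoticon in EMOTICON_DICTIONARY.items():
        if words <= roots and len(words) > best_size:
            best, best_size = emoticon, len(words)
    return best
```

For example, map_to_emoticon("slow shipping") reduces the phrase to {"slow", "ship"} and returns "slow shipment"; a fuller dictionary would list alternative word groups per emoticon (e.g. deliver/ship/arrive), as in Figure 2.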
3. Some Experiments
We would like to study how one user differs from another in terms of tags and emotions. Similarly, we would like to understand the typical tags and emoticons of different product categories such as "Jewellery" or "Electronics". Figures 3 and 4 show part of the feedback tags within two categories; note that larger tags are more important than smaller ones. Tags like "lovely bracelet" and "lovely necklace" are representative positive emotions in the Jewellery category, while "bad battery" or "no power supply" are typical negative emotions in the Laptops category.

Figure 3: Snapshot of positive feedback tags in the Jewellery category. Note that in addition to generic positive comments like "prompt shipping", category-specific ones like "lovely bracelet" are prominent.

Figure 4: Snapshot of negative feedback tags in the Laptops category. The tags "bad battery" and "no power supply" are very interesting tags within the Laptops category.

4. Evaluation
One dimension of evaluating the tag cloud mining system is how much noise it reduces while improving the quality of the information presented [1]. We took the feedback text from 100 sellers and plotted its size against the number of resulting tags for each seller. For a seller with about 4,000 words of feedback text the resulting tags number around 25, and for a seller with about 14,000 feedback words the resulting tags number 90. This significant reduction in information is the result of presenting common phrases of high significance as a single phrase. The removal of noise and the use of stop words have also contributed to a more readable feedback summary.

5. Conclusions and Future Work
In this paper we described a system that mines tags from short text, namely user feedback in a social commerce application. It identifies tags that are more representative of a user among the users in a community, or of a category among the categories in an application. Tags that carry similar emotions can be combined into a common emotion represented by an emoticon. We believe that this system combines fun and utility in a unique way. For future work, we plan to continue to improve the tag mining and emoticon mapping systems, and we are working on making the emoticon mapping automatic rather than dictionary-based.

6. REFERENCES
[1] M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan and A. Tomkins. Visualizing Tags over Time. International World Wide Web Conference, Scotland, May 2006.
[2] A. Rivadeneira, D. Gruen, M. Muller and D. Millen. Getting Our Head in the Clouds: Toward Evaluation Studies of Tagclouds. ACM SIGCHI 2007, pages 995-998.
[3] S. Robertson. Understanding Inverse Document Frequency. Journal of Documentation, 60(5):503-520, 2004.