WWW 2008 / Poster Paper April 21-25, 2008 · Beijing, China Finding Core Members in Virtual Communities Haiqiang Chen Institute of Computing Technology P.O. Box 2704 Beijing, China Xueqi Cheng Institute of Computing Technology P.O. Box 2704 Beijing, China Yue Liu Institute of Computing Technology P.O. Box 2704 Beijing, China chenhq@software.ict.ac.cn ABSTRACT cxq@ict.ac.cn liuyue@ict.ac.cn Finding the core memb ers of a virtual community is an imp ortant problem in community analysis. Here we presented an simulated annealing algorithm to solve this problem by optimizing the user interests concentration ratio in user groups. As an example, we test this algorithm on a virtual community site and evaluate its results using human "gold standard" method. Categories and Subject Descriptors H.3.3 [Information Systems]: Information Search and Retrieval; H.3.5 [Information Systems]: Online Information Services; J.4 [Computer Applications]: Social and Behaviorial Sciences and he can reach all its memb ers easily. The op en access to virtual communities brings numerous memb ers to them. But among those memb ers, most are seldom heard to other memb ers. Among those non-active memb ers, some are pure observers, and some are light participants, raising voice occasionally. Only a small p ortion of the memb ers are active in the virtual community. In our study, such active memb ers are called core members. Finding the core memb ers in virtual community is an intriguing problem and has b een little researched so far. In this pap er, inspired by the ideas of Guimera[1] we prop osed a computer algorithm to solve this problem. As an example, we give results from the application of this algorithm in the virtual communities of a Chinese web site Douban.com. 2. THE ALGORITHM General Terms Algorithms, Exp erimentation,Human Factors According to our intuitive understanding and the definition by Rheingold[3], the shared interests and activities are one of the most imp ortant feature of the virtual community. Generally, the shared interests and activities are the ma jor reason to attract users to join the virtual community. In most of the virtual community web sites, the user tagging information can b e used to define the interests of the users. We assume that one user u 's interests can b e presented by the set of sub jects which have b een tagged by this user. So user u has a set of tagged sub jects: Iu = {s1 , s2 , . . . , sn }. As a result, we can get a bi-partite graph b etween all users and tagged sub jects. Considering user u and sub ject s: assuming that s has b een tagged by Ms users, and user u has tagged Nu sub jects, then, the probability that user u have tagged s is: Ms N ÈN u k Keywords Virtual community, core memb ers finding, simulated annealing 1. INTRODUCTION People have b een using virtual communities to communicate since the b eginning of the Internet. In the last few years, many virtual communities burgeon on the Internet, and it is very easy now to build a new virtual community using tools provided by those web-based community building sites. In simple terms, virtual community (or online community) is the gathering of p eople, in online space where they can come, communicate, and get to know each other b etter over time. According to the definition of Howard Rheingold in [3], virtual communities are social aggregations that emerge from the Net when enough p eople carry on public discussions long enough, with sufficient human feeling, to form webs of p ersonal relationships in cyb erspace. An imp ortant feature of the virtual community is the op en memb ership. One can join any virtual community he want, This research is supp orted by the 973 National Basic Research Program of China (2004CB318109 & 2007CB311100) and MSRA IST 2007(FY07-RES-THEME-067). Copyright is held by the author/owner(s). WWW 2008, April 21­25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04. (1) k So, it can b e easy to know the probability that s has b een tagged by b oth u and v is: Ms (Ms - 1) ( NN ÈN ) u v k k 2 (2) The exp ectation of the numb er of sub jects tagged by b oth u and v is (note that Ms = Nk ): s k s s As È M (M s s s È M (M - 1) È NN (3) ( M) È M are global properties which - 1) and s s 2 u v s s s È È 1233 WWW 2008 / Poster Paper do not dep end on the pair of u and v selected, we can calculate equation (3) very quickly using some simple multiply op erations. So given a user set G = {u1 , u2 , . . . , un }, we can define the user interests concentration ratio as the cumulative deviation of the numb er of shared sub jects tagged from the random exp ectation: CG Score 1 0.5 0 April 21-25, 2008 · Beijing, China Description Full committed participant and group leaders. The participant who is heading toward to b e full committed, but still not now. New comer or lurker who never talk in this group. Table 1: User categorization table Ms (Ms - 1) Algorithm Result 0 1 0 0.39 0.24 Human Rating 0.25 0.5 0.75 0.02 0.005 0.01 0.04 0.02 0.02 u =( cuv =vG È M (M - 1) È M ) N N )/ s - ( s s s s 2 u v s (4) where cuv is the numb er of shared interests tagged by b oth u and v . If the numb er of shared interests in group G is no different from the random exp ectation, then this quantity CG will b e zero. If the value CG is larger than zero, it indicates that the users in group G have more shared interests than the random exp ectation. In a virtual community, the core memb ers will have more interaction with each other than those non-core memb ers. As a result, the core memb ers in a virtual community would share more common interests with each other than those non-core memb ers. So we can use the user interest concentration ratio CG to find the core memb ers in virtual communities: the problem of finding the core memb ers of a virtual community can b e transformed into the problem of finding the p ortion of memb ers in the virtual community with large CG value. In order to solve this problem, we choose to use the simulated annealing (SA)[2] method. Simulated annealing[2] is a stochastic optimization technique that enables one to find `low cost' configuration without getting trapp ed by the `high cost' local minima. Our algorithm can now b e defined as follows. 1. Set the initial temp erature as T and initial solution S = G; 2. Randomly choose a G: if a S , then S = S \{a} = {x|x S, x = a}; if a S , S = S {a}; / 3. Calculate = CS - CS : if > 0, accept the new solution S = S , if <= 0, accept the new solution S = S with the probability e/T ; 4. Cool down the temp erature T = cT , c=0.995; 5. Rep eat step 2 - 4 until the temp erature T is small enough. 1 0.055 0.21 Table 2: Correlation between human ratings and algorithm results. In this study, we chose all tagged items by one user as his interests profile. Each item which has b een tagged by a user is one of his interests. Our algorithm can output a list of core memb ers for each group. Since there was no explicit core memb ers data in Douban.com, we needed to use human raters to generate "gold standard" to evaluate our results. As it was not p ossible for us to rate a large numb er of groups and users, we rate only a small set of the users. We randomly chose 10 middle-sized groups (group size varies from 100 to 200) from 38423 groups on Douban.com, and then for each group, 20 memb ers were chosen as our evaluation samples. We invited two raters who are also Douban.com users(but not the memb ers of the 10 selected group). For each group, the raters explored the group's home page, bulletin b oards and p ersonal profiles of every memb ers, so as to decide weather the 20 selected user was the core memb er of the group. Users were categorized as Table 1. After b oth raters submit the rating results, we use the average score by the two rater as the final human rating results. Table 2 shows the correlation b etween human rating score and the algorithm output score for all of our 200 test samples. This results show that our algorithm can find most of the core memb ers from the virtual community, although the false p ositive rate is quite high. 4. CONCLUSIONS Finding the core memb ers in virtual communities is a intriguing problem for social network analysis and other related areas. Here we presented a SA algorithm to solve this problem. We also apply and evaluate this algorithm in a real-world online community, Douban.com. The algorithm show some satisfying results. But how to improve this algorithm's false p ositive rate is still a ma jor challenge for us. 3. RESULTS ON DOUBAN.COM In order to test our algorithm, here we give one example, the analysis of the virtual communities in Douban.com. Douban.com is a Chinese web site. It can b e lab eled as a "collab orative filtering" or "collab orative tagging" site, where one can tag his interested items and share his reviews or comments ab out millions of b ooks and movies with others. Basing on the user-tagging b ehaviors, reviews and comments, Douban.com helps its users to get recommendations ab out new b ooks and movies. As another result, Douban.com can also help one to find other users with similar tastes and interests, so they can get connected and communicate with each other. Douban.com provide a community service, which is called "Douban Group". Anyone can set up a new group ab out some topic and invite others to join this group. In these groups, group memb ers can discuss their interested topic, and get more information from other memb ers. 5. REFERENCES [1] R. Guimera, M. Sales-Pardo, and L. A. N. Amaral. Module identification in bipartite and directed networks. Physical Review E, 76, 036102(2007). [2] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671­680, May 1983. [3] H. Rheingold. The Virtual Community: Homesteading on the Electronic Frontier, revised edition. The MIT Press, Novemb er 2000. 1234