WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities FacetNet: A Framework for Analyzing Communities and Their Evolutions in Dynamic Networks Yu-Ru Lin1 1 2 Yun Chi2 Shenghuo Zhu2 Hari Sundaram1 Belle L. Tseng3 Ar ts, Media and Engineering Program, Arizona State University, Tempe, AZ 85281, USA NEC Laboratories America, 10080 N. Wolfe Rd, SW3-350, Cuper tino, CA 95014, USA 3 YAHOO! Inc., 2821 Mission College Blvd, Santa Clara, CA 95054 USA 1 {yu-ru.lin,hari.sundaram}@asu.edu, 2 {ychi,zsh}@sv.nec-labs.com, 3 belle@yahoo-inc.com 1. INTRODUCTION Data from many social network datasets, including pap er co-authorship networks and the blogosphere, is a graph where nodes represent individuals (e.g., club memb ers, authors, and bloggers) and edges represent the relationship and interactions among individuals (e.g., interactions in a club, co-authorship, and hyp erlinks in blogs). In such social networks, individuals form communities by building relationships and interactions with each other. The analysis of these communities (memb ership, structure and temp oral dynamics) is an imp ortant research issue. Traditional analysis of social networks treats the network as as a static graph, where the static graph is either derived from aggregation of data over all time or taken as a snapshot of data at a particular time. These studies range from wellestablished social network analysis [22] to recent successful applications such as HITS and PageRank [7, 15]. However, this research omits one imp ortant feature of communities in networked data-- the temp oral evolution of communities. By ignoring community evolution, prior works have overlooked a key asp ect of online communities. More recently, there has b een a growing b ody of work on the analysis of communities and their temp oral evolution in dynamic networks [1, 8, 9, 10, 11, 16, 19, 21]. However, a common weakness in these studies, as we will discuss in detail in related work, is that communities and their evolutions are studied separately--usually community structures are indep endently extracted at consecutive timesteps and then in retrosp ect, evolutionary characteristics are introduced to explain the difference b etween these community structures over time. Such a two-stage approach may make sense when the community structure is unambiguous (e.g., when the community affiliation is available). However, more often than not, data from real-world networks are ambiguous and sub ject to noise. Under such scenarios, if an algorithm extracts community structure for each timestep indep endently of other timesteps, it often results in community structures with high temp oral variation. Consequently, undesirable evolutionary characteristics may have to b e introduced in order to explain the high variation in the community structures. Therefore, we argue that a more appropriate approach is to analyze communities and their evolutions in a unified framework where the community structure provides evidence ab out community evolutions and at the same time, the evolutionary history offers hints on what community structure is more appropriate. For example, a community structure that introduces dramatic evolutions in a very short p eriod of time is less desirable. ABSTRACT We discover communities from social network data, and analyze the community evolution. These communities are inherent characteristics of human interaction in online social networks, as well as pap er citation networks. Also, communities may evolve over time, due to changes to individuals' roles and social status in the network as well as changes to individuals' research interests. We present an innovative algorithm that deviates from the traditional two-step approach to analyze community evolutions. In the traditional approach, communities are first detected for each time slice, and then compared to determine corresp ondences. We argue that this approach is inappropriate in applications with noisy data. In this pap er, we prop ose FacetNet for analyzing communities and their evolutions through a robust unified process. In this novel framework, communities not only generate evolutions, they also are regularized by the temp oral smoothness of evolutions. As a result, this framework will discover communities that jointly maximize the fit to the observed data and the temp oral evolution. Our approach relies on formulating the problem in terms of non-negative matrix factorization, where communities and their evolutions are factorized in a unified way. Then we develop an iterative algorithm, with proven low time complexity, which is guaranteed to converge to an optimal solution. We p erform extensive exp erimental studies, on b oth synthetic datasets and real datasets, to demonstrate that our method discovers meaningful communities and provides additional insights not directly obtainable from traditional methods. Categories and Subject Descriptors H.2.8 [Database Management]: Database Applications-- Data mining ; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--Information filtering ; J.4 [Computer Applications]: Social and Behavioral Sciences-- Economics General Terms Algorithms, Exp erimentation, Measurement, Theory Keywords Community, Evolution, Soft Memb ership, Non-negative Matrix Factorization, Community Net, Evolution Net Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2008, April 21­25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04. 685 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities Another common problematic issue in current community analysis techniques is that an individual is usually assigned to only one community at a time. On the contrary, an individual may b e engaged in multiple communities at the same time. For example, a blogger who is a dance guru may also b e an amateur photographer at the same time. Because of this, an individual who usually participates in multiple communities should b e assigned to multiple communities at the same time. Therefore, instead of a hard community partition, we argue that a soft community memb ership is more informative, as it provides more details ab out how an individual participates in each of the communities. In this pap er, we prop ose a systematic framework for analyzing communities and their evolutions in dynamic networks, and we term our framework FacetNet 1 . Our main contributions are threefold: 1. We introduce the FacetNet framework to analyze communities and their evolutions in a unified process. In our framework, the community structure at a given timestep t is determined b oth by the networked data at t and by the historic community evolution patterns. As a result, the discovered communities and their evolutions are more robust to noise and more reasonable (e.g., dramatic change in a short time is unlikely). 2. We extend the soft clustering algorithm by Yu et al. [24] from static graphs to dynamic networks. In contrast to a hard community partition, in our framework an individual can participate in multiple communities at the same time and with different participation levels. Similarly, an observed relationship is generated due to a combined effect from various communities. Based on the soft community memb ership, we further define two novel concepts--Community Net and Evolution Net --to represent community structures and their evolutions, resp ectively. 3. We provide an iterative algorithm that is guaranteed to converge to (local) optimal solutions to the prop osed formulation. We prove the correctness and convergence of our algorithm and show that this algorithm has low time complexity. We also provide principled solutions to some practical issues, such as how to determine the numb er of communities and how to handle adding and removing of individuals in a dynamic network. We use synthetic and real datasets (including a blog dataset and a pap er co-authorship dataset) to demonstrate that compared to traditional methods, our framework provides more reasonable results on communities and their evolutions. We also show that our framework is able to discover interesting insights in dynamic networks that are not directly obtainable from existing methods. The rest of the pap er is organized at follows. First, we discuss related work. In Section 2, we describ e our basic framework in detail. In Section 3, we introduce extensions of our framework to handle some practical issues. In Section 4, we provide exp erimental studies. Finally in Section 5, we give the conclusion. 1 FacetNet stands for "a Framework for Analyzing Communities and EvoluTions in dynamic NETworks". Related Work Community formation has b een extensively studied in various research areas such as social network analysis, Web community analysis, computer vision, etc. In social network analysis, an imp ortant research topic is to identify cohesive subgroups of individuals within a network where cohesive subgroups are defined as "subsets of actors among whom there are relatively strong, direct, intense, frequent, or p ositive ties" ([22]). Many approaches, such as cliquebased, degree-based, and matrix-p erturbation-based, have b een prop osed to extract cohesive subgroups from social network [22]. Communities also play an imp ortant role in Web analysis. For example, Flake et al. [6] defined Web communities as "a set of sites that have more links to memb ers of the community than to non-memb ers", and prop osed algorithms to identify Web communities based on a maximum flow/minmum cut framework. Newman et al. [13] defined a metric called modularity measure to quantify the strength of community structure which we will discuss in detail in a later section. In computer vision, community extraction is closely related to image segmentation problem. One effective method in this area is the sp ectral clustering algorithm [4, 5, 18, 25] where the eigenvectors of certain normalized similarity matrices are used for the clustering purp ose. Later, White et al. [23] p ointed out the close relationship b etween Newman's modularity and the sp ectral clustering and prop osed several algorithms to combine the two approaches. Yu et al. [24] prop osed a novel clustering framework on graphs where the cluster memb erships are assigned in a probabilistic way. In Yu's framework, cluster memb erships can b e extracted in different resolutions, representing local or global cluster structures. A common issue in all the ab ove studies is that they only analyzed static networks where no temp oral analysis is used for evolution study. Another issue in these studies is that they treat community extraction as a graph partition problem and therefore always result in hard community memb erships, which disallows an individual to participate multiple communities at the same time. Recently, there exists a growing b ody of literature on analyzing communities and their evolutions in dynamic networks. Kumar et al. [8] studied the evolution of the blogosphere as a graph in terms of the change of characteristics, (such as in-degree, out-degree, strongly connected comp onents), the change of communities, as well as the burstiness in blog community. Leskovec et al. [10] studied the patterns of growth for graphs in various fields and prop osed generators that produce graphs exhibiting the discovered patterns. Palla et al [16] analyzed a co-authorship network and a mobile phone network, where b oth networks are dynamic, by using the clique p ercolation method (CPM). Toyoda et al. [21] studied the evolution of Web communities from a series of Web achieves by defining different typ es of community changes, such as emerge, dissolve, grow, and shrink, as well as a set of metrics to quantify such changes for community evolution analysis. Spiliop oulou et al. [19] prop osed a framework, MONIC, to model and monitor cluster transitions over time. They defined a set of external transitions such as survive, split, disapp ear, to model transactions among different clusters and a set of internal transitions, such as size and location transitions to model changes within a community. Asur et al. [1] introduced a family of events on b oth communities and individuals to characterize evolution of communities. They also defined a set of metrics to measure the 686 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities stability, sociability, influence and p opularity for communities and individuals. Sun et al. [20] prop osed a parameterfree algorithm, GraphScop e, to mine time-evolving graphs where the Minimum Description Length (MDL) principle is employed to extract communities and to detect community changes. Mei et al. [12] extracted latent themes from text and used the evolution graph of themes for temp oral text mining. All these studies, however, have a common weak p oint--community extraction and community evolution are analyzed in two separated stages. That is, when communities are extracted at a given timestep, historic community structure, which contains valuable information related to current community structure, is not taken into account. There are some recent studies on evolutionary emb edding and clustering that are closely related to our work. In [17], Sarkar et al. prop osed a dynamic method that emb eds nodes into latent spaces where the locations of the nodes at consecutive timesteps are regularized so that dramatic change is unlikely. In [2], Chakrabarti et al. prop osed the first evolutionary clustering methods where the cluster memb ership at time t is influenced by the clusters at time t-1. Chi et al. [3] extended similar ideas and prop osed the first evolutionary sp ectral clustering algorithms. They used graph cut as a metric for measuring community structures and community evolutions. All these studies differ from our work in that they regularize the current community memb ership at time t by using historic community memb ership indirectly. In Chakrabarti et al.'s evolutionary hierarchical clustering algorithm, historic community structure affects the tree-node merging step in the current time. In their evolutionary kmeans clustering algorithm, historic centroids affect the kmean process at the current time. In Chi et al.'s algorithms, certain eigenvectors, instead of the community structure, are regularized over time. In the work of Sarkar et al., although the relationship among nodes in latent spaces is preserved over time, the issue of communities are not directly addressed. In contrast, in our prop osed framework, the community memb ership itself is directly regularized over time. 2.2 Basic Formulation As mentioned in the introduction, we want to analyze communities and their evolutions in a unified process. That is, at time t, we prefer a community structure so that the community evolution from t-1 to t is not unreasonably dramatic. To achieve this goal, we prop ose to use the community structure at time t-1 (already extracted) to regularize the community structure at current time t (to b e extracted). To incorp orate such a regulation, we introduce a cost function to measure the quality of community structure at time t, where the cost consists of two parts--a snapshot cost and a temporal cost : cost = · C S + (1 - ) · C T (1) This cost function is first prop osed by Chakrabarti et al. [2] in the context of evolutionary clustering. In this cost function, the snapshot cost C S measures how well a community structure fits W , the observed interactions at time t. The temp oral cost C T measures how consistent the community structure is with resp ect to historic community structure (at time t-1 ). The parameter is set by the user to control the level of emphasis on each part of the total cost. 2 . 2 . 1 S n a p s h o t Co s t A community structure at time t should fit W well, where W is the observed interaction (similarity) matrix at time t. This requirement is reflected in the snapshot cost C S in the cost function Eq. (1). We first describ e how we model the community structure and how we define the snapshot cost. Assume there exist m communities at time t. We further assume that the interaction (similarity) wij is a combined effect due to all the m communities. m hat is, we approximate T wij using a mixture model wij k=1 pk ·pki ·pkj , where pk is the prior probability that the interaction wij is due to the k-th community, pki and pkj are the probabilities that an interaction in community k involves node vi and vj , resp ectively. Written in a matrix form, we have W X X T where X Rn×m is a non-negative matrix with xik = pki + i and xik = 1. In addition, is an m × m non-negative diagonal matrix with k = pk , where k is short for kk . Matrices X and (or equivalently, their product X ) fully characterize the community structure in the mixture model. This model was first prop osed by Yu et al. in [24]. v2 v1 v2 c1 v3 v6 v3 c2 v4 v5 v4 v5 v6 v1 v2 v1 2. FORMULATION 2.1 Notation First, a note on notations. In this pap er, we use lower-case letters, e.g., x, to represent scalars, vector-formed letters, e.g., v, to represent vectors, and upp er-case letters, e.g., W , to represent matrices. Both wij and (W )ij represent the element at the i-th row and j -th column of W . We use vec (W ) to denote the vectorization of W , i.e., stacking the columns of W into a column vector. A subscript t on a variable, e.g., Wt or wt;ij , denotes the value of that variable at time t. However, to avoid notation clutter, we try not to use the subscript t unless it is needed for clarity. We assume that edges in the networked data are associated with discrete timesteps. We use a snapshot graph Gt (Vt , Et ) to model interactions at time t where in Gt , each node vi Vt represents an individual and each edge eij Et denotes the presence of interactions b etween vi and vj . Assuming Gt has n nodes, we use a matrix W Rn×n (which + is short for Wt ) to represent the similarity b etween nodes in Gt , where wij > 0 if eij Et and otherwisie wij = 0. Without loss of generality, we assume that ,j wij = 1. Over time, the interaction history is captured by a sequence of snapshot graphs G1 , · · · , Gt , · · · indexed by time. x31 c1 1 v3 x32 v6 x41 c2 2 x4 2 v4 v5 ( a) ( b) w34 1x31x41 +2x32x42 (c) Figure 1: Schematic illustration of soft communities: (a) the original graph, (b) the bipartite graph with two communities, and (c) how to approximate an edge (w34 ) In Fig. 1, we use a toy example with 6 nodes and 2 communities to illustrate this model of community structure. For a general graph (a), we use a sp ecial bipartite graph (b) to approximate (a). Note that (b) has two more nodes, i.e., c1 and c2 , corresp onding to the two communities. However, 687 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities because (b) is a bipartite graph (i.e., an edge can only occur b etween a node v and a community c), it has less degree of freedom and so it is a more parsimonious explanation of (a). In (c), we show how an edge w34 is generated in the mixture model as the sum of 1 x31 x41 and 2 x32 x42 . Equivalently, we are approximating W , which has rank n, by a product in the form of X X T , which has rank m. Based on this model, we define the snapshot cost C S as the error introduced by such an approximation, i.e., C S = D(W XX T ) i aij where D(A B) = ,j (aij log bij - aij + bij ) is the KLdivergence b etween A and B . So the snapshot cost is high when the approximate community structure X X T fails to fit the observed data W well. time t-1 to t will b e more dramatic, and therefore the temp oral cost C T will b e higher. 2.3.2 A Probabilistic Interpretation Next we provide a probabilistic interpretation of our framework by using a first-order Markov generative model. The basic ideas are that (1) the currently observed data Wt is generated from the current community structure following a certain distribution and (2) the current community structure at time t is generated by using the community structure at time t-1 as the prior distribution. Let Ut b e the community parameters at time t and Ut-1 b e those at time t-1. The goal is then to estimate the unseen parameters Ut , given Wt and Ut-1 , i.e., U = arg max log P (Wt , Ut |Ut-1 ) Ut 2.2.2 Temporal Cost In the cost function Eq. (1), the temp oral cost C T is used to regularize the community structure so that it is less probable for unreasonably dramatic community evolution from time t-1 to t. We prop ose to achieve this regularization by defining C T as the difference b etween the community structure at time t and that at time t-1. Recall that the community structure is captured by X . Therefore, with . Y = Xt-1 t-1 , the temp oral cost is defined as C T = D(Y X) where D is the KL-divergence as defined b efore. So the temp oral cost C T is high when there is a dramatic change of community structure from time t-1 to t. We further assume a first-order Markov model, as illustrated in Fig. 2, and we have P (Wt , Ut |Ut-1 ) = P (Wt |Ut )P (Ut |Ut-1 ). Therefore the log-likelihood function can b e written as L(Ut ) = log P (Wt |Ut ) + log P (Ut |Ut-1 ) (3) Dirichlet U0 Ut-1 Ut multinomial W0 Wt-1 Wt 2.2.3 Putting It Together Putting the snapshot cost C S and the temp oral cost C T together, we have an optimization problem as to find the b est community structure at time t, expressed by X and , that minimizes the following total cost cost = · D(W XX T ) + (1 - ) · D(Y X) (2) i n×m sub ject to X R+ , xik = 1, and b eing an m-bym non-negative diagonal matrix. Solving this optimization problem is the core of our FacetNet framework. Figure 2: Schematic illustration of the probabilistic model, where U = (X, ) In our model, Ut-1 = (Xt-1 , t-1 ) and Ut = (Xt , t ). Under the ab ove probabilistic model, we have the following theorem, whose proof is given in the App endix. Theorem 1. Under the assumptions that (1) vec (Wt ) fol lows a multinomial dis, ribution with paramt X T eter t , where t = vec t t Xt and 2.3 Justification We now provide two interpretations to the cost function Eq. (2), one from the p oint of view of information theory and the other from that of probabilistic generative model. (2) vec (Xt t ) fol lows a Dirichlet distribution with parameter t , where t = vec (Xt-1 t-1 ) + 1 and = (1 - )/; 2.3.1 An Information Theory Interpretation In information theory, the KL-divergence D(P Q) is also known as the relative entropy, and it represents the information gain if we use the precise distribution P instead of the approximate model Q (where Q tries to model P ). In our community structure, X X T is the marginal distribution induced from the bipartite model and it tries to approximate W . As a result, D(W XX T ) gives us the information gain (or the error introduced) from our community structure X X T to the true distribution W . A higher information gain suggests a larger error introduced by X X T and therefore implies a higher snapshot cost C S . Similarly, in D(Y X), Y represents the community structure at time t-1. When we try to use the current community structure X to explain Y , if the information gain from X to Y is larger, then the change of community structure from the parameter estimation of Xt and t by maximum a posterior (MAP) in Eq. (3) is equivalent to that by minimizing the cost function in Eq. (2). 2.4 Solution In this subsection, we first provide an iterative algorithm to solve the optimization problem defined by Eq. (2) and then show the time complexity of our algorithm. 2.4.1 An Iterative Algorithm In our algorithm, we use the following up date rules and as stated in the following theorem, in each iteration, the algorithm up dates the values of X and in such a way that the cost function defined in Eq. (2) is monotonically decreased. 688 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities Theorem 2. The fol lowing update rules wil l monotonical ly decrease the cost function defined in Eq. (2) and therefore converge to an optimal solution to the objective function: j wij · k · xj k xik xik · 2 · + (1 - ) · yik (4) (X X T )ij i then normalize such that xik = 1, k i wij · xik · xj k k k · · + (1 - ) · yik T) (X X ij j k then normalize such that k = 1. i (5) t-1 v1 c1 v2 v3 c3 v4 c2 v5 v6 t c2 t c3 c2 c2 t-1 t c1 c1 c2 c1 c3 c1 ( a) ( b) (c) The proof for the correctness and the convergence of the ab ove up date rules is skipp ed due to space limit. 2.4.2 Time Complexity We now show the time complexity for each iteration of the up dates in Theorem 2. The most time-consuming part is to compute (X X T )ij for all i, j {1, . . . , n}. However, it turns out that we do not have to compute (X X T )ij for each pair of (i, j ), thanks to the sparseness of W . In W , the numb er of non-zero elements is the numb er of edges in the snapshot graph, which we denote by . Then for each nonzero wij , we compute the corresp onding (X X T )ij , which takes O(m) time with m b eing the numb er of communities. As a result, the total complexity is O(m). If we consider m, the numb er of communities, to b e a constant and if the degree of nodes in the snapshot graph is b ounded by another constant, then the complexity is reduced to O(n), i.e., linear in the numb er of nodes in the snapshot graph. Figure 3: Schematic illustration of communities and their evolutions: (a) two bipartite graphs at time t-1 and time t (merged by vi 's), (b) the Community Net at time t induced by the bipartite graph at time t, and (c) the Evolution Net from t-1 to t induced by the two bipartite graphs - T this bipartite graph, t Xt Dt 1 Xt t gives a marginal distribution on the subgraph with nodes {c1 , c2 , c3 } (Fig. 3(b)) and this is exactly the community structure we are looking for. We call this induced subgraph on the community nodes (i.e.,{c1 , c2 , c3 }) a Community Net. Note that to induce the community net, each node vi participates in all the communities, with different levels. This is more reasonable than traditional methods in which each node can only contribute to a single community. 2.5 Communities and Their Evolutions After obtaining the solution to Eq. (2) by using our algorithm in Theorem 2, here are the ways we utilize the solution to analyze communities and their evolutions. 2.5.3 Evolution Net To derive the community evolutions, we align the two bipartite graphs, that at time t-1 and that at time t, side by side by merging the corresp onding network nodes vi 's, as illustrated in Fig. 3(a). Then a natural definition of community evolution (from ct-1;i at time t-1 to ct;j at time t) is the probability of starting from ct-1;i , walking through the merged bipartite graphs, and reaching ct;j . Such a walking process produces what we call the Evolution Net to represent community evolutions, as illustrated in Fig. 3(c). A simple T - derivation shows that P (ct-1;i , ct;j ) = (t-1 Xt-1 Dt 1 Xt t )ij -1 T and P (ct;j |ct-1;i ) = (Xt-1 Dt Xt t )ij . Again, each node and each edge contribute to the evolution from ct-1;i to ct;j . That is, all individuals and all interactions are related to all the community evolutions, with different levels. We b elieve this is more reasonable than how community evolutions are derived in traditional methods. In tradition methods, usually the intersection and union of community members at different time are used alone to compute community evolutions, with a questionable assumption that all memb ers in a community should b e treated with identical imp ortance. 2.5.1 Community Membership Assume we have computed the result at time t-1, i.e., (Xt-1 , t-1 ), and the result at time t, i.e., (Xt , t ). In addition, we define a diagonal matrix Dt , whose diagonal j elements are the row sums of Xt t , i.e., dt;ii = (Xt t )ij . - Then we claim that the i-th row of Dt 1 Xt t indicates the soft community memb erships of vi at time t. We illustrate this by using an example shown in Fig. 3. Recall that in the bipartite graph at time t (the right side of Fig. 3(a)), the weights of edges connecting vi to c1 , c2 , and c3 represent the joint probability P (vi , c1 ), P (vi , c2 ), and P (vi , c3 ). The - Dt 1 part normalizes this joint probability to get P (c1 |vi ), P (c2 |vi ), and P (c3 |vi ), i.e., the conditional probability that vi b elongs to c1 , c2 , and c3 , resp ectively. And this conditional probability is exactly the soft community memb ership we are looking for. Furthermore, we can see that the i-th diagonal element of Dt provides information ab out the level of activity of vi at time t. 3. EXTENSIONS In this section we introduce two extensions to our basic framework in order to handle inserting and removing of individuals and to determine the numb er of communities in a dynamic network over time. 2.5.2 Community Net The community structure itself, on the other hand, is ex- T pressed by t Xt Dt 1 Xt t . For this we again look at the bipartite graph at time t (the right side of Fig. 3(a)). InT duced from this bipartite graph, Xt Xt gives a marginal distribution on the subgraph with nodes {v1 , . . . , v6 } in order to approximate Wt . In a dual fashion, also induced from 3.1 Inserting and Removing Nodes In real applications, it occurs very often that some new individuals join a dynamic network (e.g., a new author in a 689 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities paper co-authorship network) and existing ones leave (e.g., a blogger who stops blogging). We provide the following heuristic techniques in our algorithm to handle such inserting and removing of nodes. Assume that at time t, out of the n nodes in the network, n1 existing nodes are removed from and n2 new nodes are inserted into the network. We first handle the n1 removed nodes by removing the corresp onding n1 rows from Y in Equations (4) and (5) to get Y . Next, we scale Y to get Y so that Y is a valid joint distribution, i.e., Y = i Y / j yij . The basic idea b ehind this heuristic is that we assume the n1 nodes are randomly selected, indep endent of their community memb ership. Under such an assumption, Y is a conditional distribution, conditioning on the remaining n - n1 nodes in the network. To add the n2 nodes, we ^ pad n2 rows of zeros to Y to get Y . This heuristic is actually equivalent to assuming that these n2 nodes have already existed at time t-1 but as isolated nodes. 3.2.2 Extended Formulas If we allow different community numb ers at time t and t-1, then we have to revise Eq. (2) accordingly b ecause in Eq. (2), the term D(Y X) requires Y (the community structure at t-1 ) and X (the community structure at t) to have the same numb er of columns and therefore the same numb er . of communities. To solve this issue, we first define Z = T Xt-1 t-1 Xt-1 and then revise the cost function to b e cost = · D(W XX T ) + (1 - ) · D(Z XX T ) (8) The basic idea is that when the community numb ers are different at time t and t-1, instead of regularizing the community structure itself, we regularize the marginal distribution induced by the community structure at time t (i.e., X X T , which approximates Wt ) so that it is not too far away from T that at time t-1 (Xt-1 t-1 Xt-1 ). For the cost function given in Eq. (8), the following up date rules are used. Theorem 4. The fol lowing update rules wil l monotonical ly decrease the cost function defined in Eq. (8) and therefore converge to an optimal solution to the objective function. j ( · wij + (1 - ) · zij ) · k · xj k xik xik · (9) (X X T )ij i then normalize such that xik = 1, k k k · ( · wij + (1 - ) · zij ) · xik · xj k (X X T )ij j k then normalize such that k = 1. i (10) 3.2 Changing Community Numbers So far we have assumed that the numb er of communities, m, is given b eforehand by the user. However, such an assumption will limit the scop e of application of our framework. In this subsection we try to answer two questions: how to automatically determine the numb er of communities at a given time t and how to revise our framework to allow the numb er of communities to change in different timesteps. 3.2.1 Soft Modularity In [13], Newman et al. introduces an elegant concept, the modularity Q, to measure the goodness of a community partition Pm where Q is defined as A ( A 2 km (V k , V k ) (V k , V ) Q (P m ) = - 6) A(V , V ) A(V , V ) =1 i with A(Vp , Vq ) = Vp ,j Vq wij . Basically, Q measures the deviation b etween the chance for edges among communities to b e generated due to the community structure and the chance for the edges to b e generated randomly. Extensive exp erimental results [13, 23] have demonstrated that Q is an effective measure for the community quality, where a maximal Q is a good indicator of the b est community structure and therefore the b est community numb er m. Here we extend the concept of modularity to handle soft memb ership by defining a Soft Modularity Qs : -( Qs =T r D-1 X )T W (D-1 X ) 1T W T (D-1 X )(D-1 X )T W 1 (7) where 1 is a vector whose elements are all ones. Qs has the following nice prop erty, whose proof is given in the App endix. Theorem 3. The Qs defined in Eq. (7) has the same probabilistic interpretation as the Q defined in Eq. (6), but in the context of soft community membership. In addition, Qs is a generalized modularity in that Qs is identical to Q when D-1 X becomes a hard community membership (i.e., each row of D-1 X has one 1 and m-1 zeros). So to detect the b est community numb er m at time t, we run our algorithm for a range of candidates for m and pick the b est one determined by Qs . The proof for the correctness and the convergence of the ab ove up date rules is skipp ed due to space limit. 4. EXPERIMENTAL STUDIES In this section, we use several synthetic datasets, a blog dataset, and a pap er co-authorship dataset to study the p erformance of our FacetNet framework. 4.1 Synthetic Datasets 4.1.1 Synthetic Dataset # 1 We start with the first synthetic dataset, which is a static network, to illustrate some good prop erties of our framework. This dataset was first studied by White et al. [23] and is shown in Fig. 4(a). The network contains 15 nodes which roughly form 3 communities--C 1, C 2, and C 3-- where edges tend to occur b etween nodes in the same community. We first check our soft modularity measure. We apply our algorithm to the network with various community numb ers m and the resulting Qs values are plotted in Fig. 4(b). In addition, in Fig. 4(b) we also show the modularity values Q that are rep orted by White et al. in [23]. As can b e seen from the plot, b oth Qs and Q show distinct p eaks when m = 3, which corresp onds to the correct community numb er. Next, after our algorithm correctly partitions the 15 nodes into three communities, we illustrate the soft community memb ership by studying two communities among the three-- C 1 = {6, 7, 8, 9, 10} and C 2 = {11, 12, 13, 14, 15}. In Fig. 4(a), we use the same circle shap e to represent these 10 nodes but use different gray levels to indicate their community 690 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities (a) z = 3 Mutual Information 0.5 5 12 13 11 2 3 4 (b) z = 5 Mutual Information 0.25 EvolSpec FacetNet NCut SNMF 1 Modularity Q Soft Modularity Qs 1.4 1.38 1.36 1.34 1.32 1.3 1.28 EvolSpec FacetNet NCut SNMF C3 6 C2 14 15 0.4 0.3 0.2 0.1 0 1 0.2 10 8 7 9 0.15 C1 2 3 4 5 6 Community Number 2 4 6 8 10 0.1 2 4 6 8 10 (b ) (a) Figure 4: (a) The synthetic dataset (b) Modularity and soft modularity under different community numbers memb ership--we use white color to illustrate the level that a node b elongs to C 1 and dark color to show the level that a node b elongs to C 2. As can b e seen, while node 7, node 14, and node 15 have very clear community memb erships, node 10 and node 13, who are on the b oundary b etween C 1 and C 2, have rather fuzzy memb ership. That is, our algorithm is capable of assigning meaningful soft memb ership to a node to indicate to which level the node b elongs to a certain community. Timestep Timestep Figure 5: Mutual information with respect to the ground truth over 10 timesteps when (a) z = 3 and (b) z = 5 4.1.2 Synthetic Dataset # 2 The second dataset is generated according to the description by Newman et al. in [13]. This dataset contains 128 nodes, which are divided 4 communities of 32 nodes each. We generate data for 10 consecutive timesteps. In each timestep from 2 to 10, dynamics are introduced in the following way: from each community we randomly select 3 memb ers to leave their original community and to join randomly the other three communities. Edges are added randomly with a higher probability pin for within-community edges and a lower probability pout for b etween-community edges. However, the average degree for the nodes is set to 16. As a result, a single parameter z , which represents the mean numb er of edges from a node to nodes in other communities, is enough to describ e the data. Because we have the ground truth for the community memb ership at each timestep, we directly study the accuracy of the community structure obtained by our framework. We compare our FacetNet framework with 3 baseline algorithms. The first baseline, which we call EvolSpec, is the evolutionary sp ectral clustering algorithm prop osed by Chi et al. [3]. Because EvolSpec is an evolutionary version of the Normalized Cut (NCut ) algorithm by Shi et al. [18], we take NCut as our second baseline. Similarly, FacetNet is essentially an evolutionary version of the soft clustering method (SNMF ) by Yu et al. [24], we take SNMF as our third baseline. Notice that FacetNet and EvolSpec are evolutionary algorithms whereas NCut and SNMF are not--NCut and SNMF work on each snapshot graph indep endently of other snapshot graphs. In addition, to make the results comparable, for FacetNet and SNMF we convert the soft memb ership into 0/1 indicators by assigning each node to the community it most likely b elongs to. Furthermore, in all the exp eriments, for FacetNet and EvolSpec we set to b e 0.9. Fig. 5(a) and 5(b) show the accuracy and standard error of the community memb ership obtained by the four algorithms for two datasets generated with z = 3 and z = 5, resp ectively. The accuracy is computed by the mutual information b etween the derived community memb ership and the ground truth, where a higher mutual information indicates b etter accuracy. From the figures we can see that when z = 3, i.e., when there is less noise and hence the community structure is easy to detect, b oth FacetNet and EvolSpec have accuracy improved starting from the second timestep, which suggests that an evolution framework is b eneficial in this dynamic network. In comparison, NCut and SNMF have relatively flat accuracy over all timesteps, with NCut slightly outp erforms SNMF. For the data where z = 5, there are more edges going b etween communities and therefore the community structure is more difficult to detect. From Fig. 5(b) we can see that although at the first few timesteps FacetNet does not p erform as good as EvolSpec, as time going further, FacetNet starts to outp erform EvolSpec. This suggests that the b enefits of FacetNet accumulates over time more than EvolSpec. In addition, as for the case with z = 3, when z = 5, b oth FacetNet and EvolSpec clearly outp erform their non-evolutionary version, NCut and SNMF. (a) Time per Iteration (sec) 0.2 800 (b) Iteration Numbers 0.15 600 0.1 400 0.05 200 0 0 2000 4000 6000 0 0 2000 4000 6000 Size of Network Size of Network Figure 6: Running time for networks of different sizes (a) time per iteration (sec) and (b) number of iterations until converge Next, we study the time p erformance of FacetNet. We rep eat the ab ove exp eriment over a family of networks of various sizes (node numb ers). In Fig. 6(a) we show the average running time p er iteration of our algorithm on networks with different sizes. In 6(b) we showed the numb er of iteration needed for convergence when the convergence criterion is that the change b etween two consecutive iterations is b elow a threshold of 1e-5. As can b e seen, first, the running time p er iteration scales linearly with the size of network, which validates our theoretical analysis in Section 2; second, the numb er of iteration needed for convergence is insensitive to the network size, which implies that the overall running time of our algorithm scales linearly to the size of network. 691 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities 4.2 NEC Blog Dataset The blog data was collected by an NEC in-house blog crawler. Given seeds of manually picked highly ranked blogs, the crawler discovered blogs that are densely connected with the seeds, resulting in an expanded set of blogs that communicate with each other. The crawler then continued monitoring for new entries over a long time p eriod. This NEC blog data set contains 148,681 entry-to-entry links among 407 blogs crawled during 12 consecutive month (a timestep is one month), b etween August 2005 and Sepp emb er 2006. t Following [14], we define W by wij = wij / ~ ~ ,q wpq where wii = 1, wij = exp(-1/( · lij )) if eij Et , and otherwise ~ ~ wij = 0. In the formula, lij is the edge weight (e.g., # of ~ links) of eij and , which is set to 0.2, is a parameter to control marginal effect when lij is increased. Table 1: Top keywords among the four communities in the NEC dataset, sorted by the tf-idf score C1 adsense, beta, skype, firefox, msn, rss, aol, yahoo, google, ebay, deskC2 C3 C4 top, wordpress, voip, feeds, myspace, p o dcasting, technorati, search, engine, browser, ads, gmail, windows, os, develop er, venture, marketing, apple, p o dcasts, develop ers, engines, mac, publishers, ceo, linux gop, uranium, hezb ollah, demo crats, rove, cia, republicans, saddam, qaeda, tax, republican, iraqi, rob erts, bush, clinton, iraq, senate, tro ops, terrorists, administration, terrorist, wilson, conservative, taxes, lib eral, intelligence, israel, terror, iran, weap ons, war, soldiers shanghai, rob ots, installation, japan, japanese, architecture, art, chinese, china, saudi, phones, filed, mobile, games, korea, rfid, sex, green, camera, sound, cell, b o dy, africa, phone, entertainment, film, gay, india, fuel, archive, design, elections, flash, device, water, wireless, south library, learning, digital, resources, collection, conference, staff, communities, students, session, b o oks, database, access, survey, university, science, canada, myspace, articles, education, technologies, knowledge, filed, virtual, to ols, research, david, learn, services, flickr, computers (a) Soft Modularity Qs 0.23 1 0.225 0.8 0.22 0.6 0.4 0.215 0.2 0.21 2 4 6 8 0 (b) Mutual Information = 0.1 = 0.5 = 0.9 2 4 6 8 10 12 Community Number Timestep Figure 7: (a) Soft modularity and (b) mutual information under different for the NEC dataset We start with analyzing the overall picture of the dataset. We first aggregate all the edges over all timesteps into a single network and apply our algorithm to compute the soft modularity score Qs under different community numb ers. As can b e seen in Fig. 7(a), a clear p eak shows when the community numb er is 4. We draw the aggregated graph in Fig. 8, according to the main community each blog most likely b elongs to. In addition, in Table 1 we list the top keywords, measured by the tf-idf score, that occur in the p osts of these four communities. It seems that C 1 focuses on technology, C 2 on p olitics, C 3 on international entertainment, and C 4 on digital libraries. C1 C3 four communities stay rather stable over all the timesteps. This effect is partially due to the way these blogs are selected by our focused crawler--our crawler chose to crawl a densely connected subgraph of the blogosphere where each node in the subgraph has large numb er of links and high level of interaction intensities. Therefore, most of the selected blogs b elong to some well-known bloggers and they seldom move around b etween communities. We apply our FacetNet algorithm on the data with different . And we compute the mutual information b etween the extracted communities at each timestep and the four communities shown in Fig. 8. Fig. 7(b) shows the results under =0.1, 0.5, and 0.9. As can b e seen, as increases, our algorithm emphasizes less on the temp oral smoothness and as a result, the community structure has higher variation over time. In addition, as increases, the communities at each timestep deviate further from the communities obtained from the aggregated data. These results on one hand justify our arguments in the introduction section and on the other hand demonstrate that our FacetNet framework is capable of controlling the tradeoff b etween the snapshot cost and the temp oral cost in the cost function Eq. (1). Aug 05 Sep 05 Oct 05 Nov 05 Dec 05 Jan 06 Feb 06 c3 c3 0 .3 4 c3 0 .4 9 c3 0.33 0.41 c3 0.34 c3 0.59 c1 0 .4 2 0 .4 5 0 .4 2 c1 c4 c4 0 .6 9 c2 0 .8 8 c2 0 .7 5 c2 0 .8 7 c2 0 .2 2 0 .6 7 c2 0 .8 0 c2 0 .8 4 c2 Aug 05 Sep 05 c3 c3 0 .5 7 c1 c4 c2 0 .8 7 c2 0 .3 6 0 .7 1 0 .2 4 0 .5 9 c1 0.32 0.71 c1 0 .2 0 0 .7 1 c1 0 .2 0 0 .7 2 c1 c4 0.31 0.69 0.24 0.53 c3 c1 c4 0 .5 9 0.41 0.42 0.87 c4 c2 d © Oct 05 c3 0.45 0.38 0.76 0.41 0.25 c1 0.51 0.76 c1 0.39 0.75 c1 0.24 0.77 c1 c4 0.48 0.43 c4 0.23 0.53 c4 0.28 0.57 c4 0.59 0.34 0.40 0.38 0.86 c4 c2 Nov 05 c3 0.58 0.33 0.27 0.61 0.79 c4 c2 Dec 05 c3 0.59 0.33 0.23 0.70 c3 c1 0.22 0.53 0.79 c4 c2 Jan 06 Feb 06 c3 0.55 0.33 0.22 0.69 c1 0 .6 0 0 .8 4 c4 c2 (a) = 1 (b) = 0.5 C2 s d C4 Figure 9: The Evolution Net of the NEC dataset when (a) = 1 and (b) = 0.5 Fig. 9 shows the Evolution Net derived from our framework when =1 and 0.5 ( = 1 means no temp oral smoothness is considered). In the Evolution Net, the size of a node is prop ortional to k and it represents the size of the corresp onding community. The edge lab el indicates the probability of a transition from the source community at t-1 to the Figure 8: Four communities in the NEC dataset Next, we analyze the blog data as a dynamic network. After studying the content of the blogs, we find that the ab ove 692 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities Table 2: Top members in the three communities during 2003­2006, sorted by xik , i.e., pki Data Mining Database Divesh Srivastava Surajit Chaudhuri Nick Koudas Elke A. Rundensteiner Jennifer Widom Raghu Ramakrishnan Jeffrey F. Naughton David J. DeWitt Rajeev Motwani H. V. Jagadish Artificial Intelligence Hans-Peter Kriegel William C. Regli Maxim Peysakhov Vincent A. Cicirello Evan Sultanik Gustave Anderson Andrew Burnheimer David Dorsey Mos he Ka m Joseph Kop ena Philip S. Yu Jiawei Han Wei Wang Jian Pei Divyakant Agrawal Kian-Lee Tan Beng Chin Ooi Stanley B. Zdonik Nikos Mamoulis Walid G. Aref C1 C2 C3 C4 Figure 10: The Community Net for the NEC dataset in September 2005 target community at t. (To avoid clutter, we did not show edges with probabilities less than 0.2). From Fig. 9(a) we see that when no temp oral smoothness is considered, C 4 disapp eared at the second timestep (Sep 05) and re-app eared in the third timestep (Oct 05). However, by carefully examining the original data, we did not find any significant events so supp ort such changes. Therefore, we conjecture that these changes are due to the data noises at the second timestep, which triggered the algorithm to split C 1 into two communities and merge C 4 to one of them. In comparison, as can b e seen from Fig. 9(b), when there is a temp oral smoothness term, the four communities remain relatively stable. That is, although there exist transitions among different typ es of communities, the ma jority of transitions are b etween communities of the same typ e. These results demonstrate that the FacetNet framework is more robust to data noise. From the Evolution Net, we can also obtain some other observations. For example, the p olitical community C 2 is rather isolated from the rest communities over all the time. In comparison, b oth C 3 and C 4 interacts with C 1 heavily. In addition, in Fig. 10 we show the Community Net at an arbitrary timestep (Sep 05). In the Community Net, the node sizes are prop ortional to t;k and the edge weights are - T prop ortional to the corresp onding entries in t Xt Dt 1 Xt t (self-loops are not shown). We can see that this Community Net is a good synopsis of the aggregated network in Fig. 8. its contribution to the community structure. For example, in Table 2, some of the top authors in the AI area are ranked high partially b ecause they participated in a series of pap ers with up to 20 co-authors and these large co-author cliques heavily influenced the community structure of the AI community in our dataset. 5. CONCLUSION The analysis of communities and their evolutions in dynamic temp oral networks is a challenging research problem with broad applications. In this pap er, we prop osed a framework, FacetNet, to solve this problem. Unlike traditional two-stage techniques that separate the task of community extraction and the task of evolution extraction, the FacetNet framework combines these two tasks in a unified process. Therefore, not only community structures determine the evolutions, but also the historic evolution patterns regularize current community structure. As a result, it is less likely for the current community structure to deviate too dramatically from the most recent history. In addition to the basic framework, we also prop osed to use soft community memb ership and we introduced several novel concepts such as Community Net, Evolution Net, and Soft Modularity, to measure and to visualize the resulting communities and their evolutions. Extensive exp erimental studies demonstrated that our framework provide communities and evolutions that are more accurate and more robust to data noise. 4.3 DBLP Co-authorship Dataset The DBLP co-authorship dataset is a subset of that used by Asur et al. [1]. It contains pap ers in 28 conferences over 10 years (1997­2006), which span three areas--database, data mining, and artificial intelligence. We selected a dense subgraph of 2950 authors from the original dataset and partition the time into three p eriods with overlap: 1997­2000, 2000­2003, 2003­2006. Due to the space limit, we skip the detailed p erformance study and only p oint out two interesting issues ab out this dataset. First, in the first two p eriods, the soft modularity shows (local) p eaks at m = 4 while in the third p eriod, m = 3 is optimal. As a result, for this exp eriment we used the up date algorithm describ ed in Theorem 4. Second, in Table 2 we list the top authors in the three communities detected by FacetNet in the third p eriod (2003­2006). Here the rank is determined by the value xik , i.e., pki . Recall that pki indicates to what level the k-th community involves the i-th node. So from our framework, we can directly infer who are the imp ortant memb ers in each community. However, notice that by important, we are not judging the quality or quantity of pap ers by an author. Instead, in our framework the imp ortance of a node in a community is determined by Acknowledgments We thank Kai Yu for providing us with the SNMF source code and for the helpful discussion; thank Junichi Tatemura and Ko ji Hino for helping prepare the blog dataset; thank Sitaram Asur and Srinivasan Parthasarathy for providing us with the DBLP dataset. 6. REFERENCES [1] S. Asur, S. Parthasarathy, and D. Ucar. An event-based framework for characterizing the evolutionary b ehavior of interaction graphs. In Proc. of the 13th ACM SIGKDD Conference, 2007. [2] D. Chakrabarti, R. Kumar, and A. Tomkins. Evolutionary clustering. In Proc. of the 12th ACM SIGKDD Conference, 2006. [3] Y. Chi, X. Song, D. Zhou, K. Hino, and B. L. Tseng. Evolutionary sp ectral clustering by incorp orating temp oral smoothness. In Proc. of the 13th ACM SIGKDD Conference, 2007. 693 WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities [4] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997. [5] I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: sp ectral clustering and normalized cuts. In Proc. of the 10th ACM SIGKDD Conference, 2004. [6] G. Flake, S. Lawrence, and C. Giles. Efficient identification of web communities. In Proc. of the 6th ACM SIGKDD Conference, 2000. [7] J. M. Kleinb erg. Authoritative sources in a hyp erlinked environment. J. of the ACM, 46(5), 1999. [8] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In Proc. of the 12th WWW Conference, 2003. [9] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In Proc. of the 12th ACM SIGKDD Conference, 2006. [10] J. Leskovec, J. Kleinb erg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and p ossible explanations. In Proc. of the 11th ACM SIGKDD Conference, 2005. [11] Y. Lin, H. Sundaram, Y. Chi, J. Tatemura, and B. L. Tseng. Blog community discovery and evolution based on mutual awareness expansion. In Proc. of the Int. Conf. on Web Intel ligence, 2007. [12] Q. Mei and C. Zhai. Discovering evolutionary theme patterns from text: an exploration of temp oral text mining. In Proc. of the 11th ACM SIGKDD Conference, 2005. [13] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 2004. [14] H. Ning, W. Xu, Y. Chi, Y. Gong, and T. Huang. Incremental sp ectral clustering with application to monitoring of evolving blog communities. In SIAM Int. Conf. on Data Mining, 2007. [15] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. In Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford, CA, USA, 1998. [16] G. Palla, A.-L. Barabasi, and T. Vicsek. Quantifying social group evolution. Nature, 446, 2007. [17] P. Sarkar and A. W. Moore. Dynamic social network analysis using latent space models. SIGKDD Explor. Newsl., 7(2), 2005. [18] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intel ligence, 22(8), 2000. [19] M. Spiliop oulou, I. Ntoutsi, Y. Theodoridis, and R. Schult. Monic: modeling and monitoring cluster transitions. In Proc. of the 12th ACM SIGKDD Conference, 2006. [20] J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. GraphScop e: parameter-free mining of large time-evolving graphs. In Proc. of the 13th ACM SIGKDD Conference, 2007. [21] M. Toyoda and M. Kitsuregawa. Extracting evolution of web communities from a series of web archives. In HYPERTEXT '03: Proc. of the 14th ACM conference on hypertext and hypermedia, 2003. [22] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994. [23] S. White and P. Smyth. A sp ectral clustering approach to finding communities in graph. In SDM, 2005. [24] K. Yu, S. Yu, and V. Tresp. Soft clustering on graphs. In NIPS, 2005. [25] H. Zha, X. He, C. H. Q. Ding, M. Gu, and H. D. Simon. Sp ectral relaxation for k-means clustering. In NIPS, 2001. APPENDIX Proof for Theorem 1 Proof. Given that Xt Rn×m , each column of which + sums up to one, and t is an m-by-m diagonal matrix and sum to one, we have P (Ut |Ut-1 ) = P (Yt |Ut-1 ), where Yt = Xt t . Because of the constraint, for any given Yt , there is a unique pair of Xt and t such that Yt = Xt t , and Xt and t satisfy the constraint. Thus, Yt uniquely determines Ut , which implies P (Ut |Ut-1 ) = P (Yt |Ut-1 ). The logarithm of MAP is L(Ut ) = log P (Wt |Ut )P (Ut |Ut-1 ) = log P (Wt |Ut )P (Xt t |Ut-1 ) = log multinom(vec (Wt ) ; t ) + log Dirichlet(vec (Xt t ) ; t ) i ( j Wt;ij )! i Wt;ij i = log t;ij j Wt;ij ! j + log 1i Y Yt;ikt-1;ik B ( ) k i i = Wt;ij log t;ij + Yt-1;ik log Yt;ik + const. j k Since t;kk = 1 and j t;ij = 1, we can further derive L(Ut ) = - D(Wt t;ij ) - D(Yt-1 Yt ) + const. 1 T = - [D(Wt Xt t Xt ) + (1 - )D(Yt-1 Xt t )] + const. Thus, maximizing L(Ut ) is equivalent to minimizing the cost function in Eq. (2). Proof for Theorem 3 Proof. In the standard Q formula Eq. (6), the first term A(Vk ,Vk ) is the empirical probability that a randomly seA(V ,V ) lected edge has b oth ends in communitiy k. For our case, this empirical probability should b e ,j wij P (k |i)P (k |j ). i k i k Yt;ik = k t;kk = Vk For the second term in Eq. (6), A((V ,,V)) is the empirical A V probability that a randomly selected edge is related to (i.e., has at least one end in) communiity k. Forj our case, this empirical probability should b e P (k|i) wij . By summing these two terms over all k's and noticing that P (k|i) = (D-1 X )ik , we get formula Eq. (7). In addition, it is straightforward to verify that Qs is equal to Q when D-1 X b ecomes a 0/1 indicator matrix. 694