Cyclizing Clusters via Zeta Function of a Graph

Deli Zhao and Xiaoou Tang
Department of Information Engineering, Chinese University of Hong Kong
Hong Kong, China
{dlzhao,xtang}@ie.cuhk.edu.hk

(Pre-proceedings version for NIPS'08.)

Abstract

Detecting underlying clusters from large-scale data plays a central role in machine learning research. In this paper, we tackle the problem of clustering complex data with multiple distributions and multiple scales. To this end, we develop an algorithm named Zeta l-links (Zell), which consists of two parts: Zeta merging with a similarity graph, and an initial set of small clusters derived from local l-links of samples. More specifically, we propose to structurize a cluster using the cycles in the associated subgraph. A new mathematical tool, the Zeta function of a graph, is introduced to integrate all cycles, leading to a structural descriptor of a cluster in determinantal form. The popularity character of a cluster is conceptualized as the global fusion of variations of this structural descriptor under a leave-one-out strategy within the cluster. Zeta merging proceeds in the hierarchical agglomerative fashion, according to the maximum incremental popularity among all pairs of clusters. Experiments on toy data clustering, imagery pattern clustering, and image segmentation show the competitive performance of Zell. An accuracy of 98.1%, in the sense of normalized mutual information (NMI), is obtained on the FRGC face data of 16028 samples and 466 facial clusters.

1 Introduction

Pattern clustering is a classic topic in pattern recognition and machine learning. In general, clustering algorithms fall into two categories: partitional clustering and hierarchical clustering. Hierarchical clustering proceeds by merging small clusters (agglomerative) or dividing large clusters into small ones (divisive). The key point of agglomerative merging is the measurement of structural affinity between clusters. This paper is devoted to handling the problem of data clustering via hierarchical agglomerative merging.

1.1 Related work

Representative algorithms for partitional clustering are the traditional K-means and the more recent Affinity Propagation (AP) [1]. It is known that K-means is sensitive to the selection of the initial K centroids. The AP algorithm addresses this issue by initially viewing each sample as an exemplar; exemplar-to-member and member-to-exemplar messages are then competitively transmitted among all samples until a group of good exemplars and their corresponding clusters emerges. Besides its ability to find good clusters, AP exhibits a surprising capacity for handling large-scale data. However, AP is computationally expensive when the number of clusters is set in advance. Both K-means and AP encounter difficulty on data mixed from multiple manifolds.

The classic algorithms for agglomerative clustering include three kinds of linkage algorithms: the single, complete, and average Linkages. Linkages are free from restrictions on data distributions, but are quite sensitive to local noisy links. A novel agglomerative clustering algorithm was recently developed by Ma et al. [2] based on the lossy coding theory of multivariate mixed data.
The core of their algorithm is to characterize the structures of clusters by the variational coding length of coding two merged clusters jointly against coding them separately. The coding-length-based algorithm exhibits exceptional performance for clustering multivariate Gaussian data or subspace data. However, it is not suitable for manifold-valued data. Spectral clustering algorithms are another group of popular algorithms developed in recent years. The Normalized Cuts (Ncuts) algorithm [3] was developed for image segmentation and data clustering. Ng et al.'s algorithm [4] is mainly aimed at data clustering, and Newman's work [5] is applied to community detection in complex networks. Spectral clustering can handle complex data of multiple distributions. However, it is sensitive to noise and to variations of local data scales.

In general, the following four factors pertaining to data are still problematic for most clustering algorithms: 1) mixed distributions, such as multivariate Gaussians of different deviations, subspaces of different dimensions, or globally curved manifolds of different dimensions; 2) multiple scales; 3) varying global sampling densities; and 4) noise. To attack these problems, it is worthwhile to develop new approaches that are conceptually different from existing ones.

1.2 Our work

To address these issues in complex data clustering, we develop a new clustering approach called Zeta l-links, or Zell. The core of the algorithm is a new cluster descriptor that is essentially the integration of all cycles in the cluster by means of the Zeta function of the corresponding graph. The Zeta function leads to a rational form of the cyclic interactions of the members in the cluster, where cycles are employed as primitive structures of clusters. With this cluster descriptor, the popularity of a cluster is quantified as the global fusion of variations of the structural descriptor under a leave-one-out strategy within the cluster. This definition of popularity is expressible by the diagonal entries of a matrix inverse. Structural inference between clusters can then be performed with this popularity character. Based on this popularity character, we propose a clustering method, named Zeta merging, that works in the hierarchical agglomerative fashion. The method makes no additional assumptions on data distributions or data scales. As a subsidiary procedure for Zeta merging, we present a simple method, called l-links, to find the initial set of clusters used as the input of Zeta merging. The Zell algorithm is the combination of Zeta merging and l-links. The construction of a directed graph is derived from l-links.

Figure 1: A small graph with four vertices and five edges can be decomposed into three cycles. The complexity of the graph can be characterized by the collective dynamics of these basic cycles.

2 Cyclizing a cluster with Zeta function

Our ideas are mainly inspired by recent progress in the study of the collective dynamics of complex networks. Experiments have validated that the stochastic states of a neuronal network are partially modulated by information that is transmitted cyclically [6], and that the proportion of cycles in a network is strongly related to the level of its complexity [7]. Recent studies [8], [9] unveil that short cycles and Hamilton cycles in graphs play a critical role in the structural connectivity and community structure of a network. This progress inspires us to formalize the structural complexity of a cluster by means of the cyclic interactions of its members. As illustrated in Figure 1, the relationships between samples can be characterized by the combination of all cycles in the graph. Thus the structural complexity of the graph can be conveyed by the collective dynamics of these basic cycles.
Therefore, we may characterize a cluster by the global combination of the structural cycles in the associated graph. To do so, we need to model cycles of different lengths and combine them together into a structural descriptor.

2.1 Modeling cycles of equal length

We model cycles using sum-product codes in order to structurize a cluster. Formally, let $C = \{x_1, \dots, x_n\}$ denote the set of sample vectors in a cluster $C$. Suppose that $W$ is the weighted adjacency matrix of the graph associated with $C$; a vertex of the graph represents a member of $C$. For generality, the graph is assumed to be directed, meaning that $W$ may be asymmetric. Let $\gamma_\ell = \{p_1 \to p_2 \to \cdots \to p_{\ell-1} \to p_\ell, \, p_\ell \to p_1\}$ denote any cycle of length $\ell$ defined on $W$. We apply the factorial code to retrieve the structural information of the cycle $\gamma_\ell$, defining $\varpi_{\gamma_\ell} = W_{p_\ell p_1} \prod_{k=1}^{\ell-1} W_{p_k p_{k+1}}$, where $W_{p_k p_{k+1}}$ is the $(p_k, p_{k+1})$ entry of $W$. The value $\varpi_{\gamma_\ell}$ provides a degree measure of the interactions among the vertices on $\gamma_\ell$. For the set $K_\ell$ of all cycles of length $\ell$, the sum-product code is written as

    $\omega_\ell = \sum_{\gamma_\ell \in K_\ell} \varpi_{\gamma_\ell} = \sum_{\gamma_\ell \in K_\ell} W_{p_\ell p_1} \prod_{k=1}^{\ell-1} W_{p_k p_{k+1}}.$    (1)

The value $\omega_\ell$ may be viewed as a quantified indication of the global interactions among the members of $C$ at the cycle scale $\ell$. The structural complexity of the graph is measured by these quantities for cycles of all lengths, i.e., $\{\omega_1, \dots, \omega_\ell, \dots\}$. Further, we need to perform a functional integration of these individual measures. The Zeta function of a graph plays this role.

2.2 Integrating cycles using Zeta function

Zeta functions are widely applied in pure mathematics as tools for performing statistics in number theory, computing algebraic invariants in algebraic geometry, and measuring complexities in dynamical systems. The forms of Zeta functions are diverse. The Zeta function we use here is defined as

    $\zeta_z = \exp\left( \sum_{\ell=1}^{\infty} \frac{\omega_\ell}{\ell} z^\ell \right),$    (2)

where $z$ is a real-valued variable. Here $\zeta_z$ may be viewed as a functional organization of all the cycles in $\{K_1, \dots, K_\ell, \dots\}$ in a global sense. Interestingly, $\zeta_z$ admits a rational form [10], which makes the intractable manipulations arising in (1) tractable.

Theorem 1. $\zeta_z = 1 / \det(I - zW)$, where $|z| < 1/\rho(W)$ and $\rho(W)$ denotes the spectral radius of the matrix $W$.

From Theorem 1, we see that the global interaction of the elements in $C$ is quantified by a quite simple expression in determinantal form.

2.3 Modeling popularity

The popularity of a group of samples measures the degree to which the samples in the group are perceived to form a whole cluster. To model the popularity, we first formalize the complexity descriptor of the cluster $C$. With the cyclic integration $\zeta_z$ of the preceding section, the complexity of the cluster can be measured by the polynomial entropy $\xi_C$ of logarithmic form:

    $\xi_C = \ln \zeta_z = \sum_{\ell=1}^{\infty} \frac{\omega_\ell}{\ell} z^\ell = -\ln \det(I - zW).$    (3)

The entropy $\xi_C$ will be employed to model the popularity of $C$. As analyzed at the beginning of Section 2, cycles are strongly associated with the structural communities of a network. To model the popularity, therefore, we may investigate the variational information of cycles obtained by successively leaving one member of $C$ out. More precisely, let $\varepsilon_C$ denote the popularity character of $C$. Then $\varepsilon_C$ is defined as the averaged sum of the reductive entropies:

    $\varepsilon_C = \frac{1}{n} \sum_{p=1}^{n} \left( \xi_C - \xi_{C \setminus x_p} \right).$    (4)

Let $^T$ denote the transpose of a matrix and let $e_p$ be the $p$-th standard basis vector, whose $p$-th element is 1 and whose other elements are 0. We have the following theorem.

Theorem 2. $\varepsilon_C = \frac{1}{n} \sum_{p=1}^{n} \ln\left[ e_p^T (I - zW)^{-1} e_p \right].$

By analysis of inequalities, we obtain that $\varepsilon_C$ is bounded as $0 < \varepsilon_C \le \xi_C / n$. (Interested readers may refer to the full version of this paper for proofs.)
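To make the determinantal quantities concrete, the following minimal NumPy sketch evaluates the entropy of Eq. (3) and the popularity character of Theorem 2 by dense matrix inversion. It is an illustration only: the function names are ours, and the paper's implementation instead exploits a specialized algorithm for the diagonal of a sparse inverse [11].

```python
import numpy as np

def cluster_entropy(W, z=0.01):
    """Polynomial entropy xi_C = -ln det(I - zW) of a cluster, Eq. (3)."""
    n = W.shape[0]
    sign, logdet = np.linalg.slogdet(np.eye(n) - z * W)
    return -logdet  # sign is +1 whenever z < 1/rho(W)

def popularity(W, z=0.01):
    """Popularity character eps_C via the diagonal of (I - zW)^{-1}, Theorem 2."""
    n = W.shape[0]
    diag = np.diag(np.linalg.inv(np.eye(n) - z * W))
    return np.mean(np.log(diag))

# Tiny usage example on a random row-stochastic affinity matrix.
rng = np.random.default_rng(0)
W = rng.random((6, 6))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
print(cluster_entropy(W), popularity(W))  # the bound 0 < eps_C <= xi_C/n should hold
```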
The popularity measure $\varepsilon_C$ is a structural character of $C$, which can be exploited to handle learning problems such as clustering, ranking, and classification. The computation of $\varepsilon_C$ involves the inverse of $(I - zW)$. In general, the complexity of computing $(I - zW)^{-1}$ is $O(n^3)$. However, $\varepsilon_C$ depends only on the diagonal entries of $(I - zW)^{-1}$ rather than the full dense matrix. This property reduces the computation of $\varepsilon_C$ to a complexity of $O(n^{1.5})$ by a specialized algorithm for computing the diagonal of the inverse of a sparse matrix [11].

2.4 Structural affinity measurement

Given a set of initial clusters $C_c = \{C_1, \dots, C_m\}$ and the adjacency matrix $P$ of the corresponding samples, the affinities between clusters or data groups can be measured via the popularity character $\varepsilon_C$. Under our framework, an intuitive inference is that the two clusters sharing the largest reciprocal popularity have the most consistent structures, meaning that these two clusters are the most relevant from the structural point of view. Formally, for two given data groups $C_i$ and $C_j$ from $C_c$, the criterion of reciprocal popularity may be written as

    $\delta_{C_i \cup C_j} = \delta\varepsilon_{C_i} + \delta\varepsilon_{C_j} = (\varepsilon_{C_i | C_i \cup C_j} - \varepsilon_{C_i}) + (\varepsilon_{C_j | C_i \cup C_j} - \varepsilon_{C_j}),$    (5)

where the conditional popularity $\varepsilon_{C_i | C_i \cup C_j}$ is defined as $\varepsilon_{C_i | C_i \cup C_j} = \frac{1}{|C_i|} \sum_{x_p \in C_i} \ln\left[ e_p^T (I - z P_{C_i \cup C_j})^{-1} e_p \right]$ and $P_{C_i \cup C_j}$ is the submatrix of $P$ corresponding to the samples in $C_i$ and $C_j$. The incremental popularity $\delta\varepsilon_{C_i}$ embodies the information gain of $C_i$ after being merged with $C_j$. The larger the value of $\delta_{C_i \cup C_j}$, the more likely the two data groups $C_i$ and $C_j$ are to be perceived as one cluster. Therefore, $\delta_{C_i \cup C_j}$ may be exploited to measure the structural affinity between two groups of samples drawn from a whole set of samples.

3 Zeta merging

We now develop the clustering algorithm using the structural character $\varepsilon_C$. The automatic detection of the number of clusters is also taken into consideration.

3.1 Algorithm of Zeta merging

With the criterion of structural affinity in Section 2.4, it is straightforward to write the clustering procedure in the hierarchical agglomerative way. The algorithm proceeds from the pair $\{C_i, C_j\}$ that has the largest incremental popularity $\delta_{C_i \cup C_j}$, i.e., $\{C_i, C_j\} = \arg\max_{i,j} \delta_{C_i \cup C_j}$. We name this method Zeta merging; its procedure is given in Algorithm 1. In general, Zeta merging proceeds smoothly if the damping factor $z$ is bounded as $0 < z < 1/\rho(P)$.

Algorithm 1 Zeta merging
inputs: the weighted adjacency matrix $P$, the $m$ initial clusters $C_c = \{C_1, \dots, C_m\}$, and the number $m_c$ ($m_c \le m$) of resulting clusters.
Set $t = m$.
while 1 do
    if $t = m_c$ then break; end if
    Search two clusters $C_i$ and $C_j$ such that $\{C_i, C_j\} = \arg\max_{\{C_i, C_j\} \subseteq C_c} \delta_{C_i \cup C_j}$.
    $C_c \leftarrow \{C_c \setminus \{C_i, C_j\}\} \cup \{C_i \cup C_j\}$; $t \leftarrow t - 1$.
end while

The merits of Zeta merging are that it is free from restrictions on data distributions and is little affected by multiple scales in the data. Affinity propagation in Zeta merging proceeds on the graph according to cyclic associations, requiring no specification of data distributions. Moreover, the popularity character $\varepsilon_C$ of each cluster is obtained from the averaged amount of variational information conveyed by $C$, so the size of a cluster has little influence on the value $\delta_{C_i \cup C_j}$. Most importantly, the cycles rooted at each point in $C$ interact globally with all other points.
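A naive sketch of the Zeta merging loop (Algorithm 1), under the same dense-inversion assumption as above, is given below. The exhaustive search over all cluster pairs is written for clarity rather than efficiency, and the helper names are illustrative.

```python
import numpy as np
from itertools import combinations

def conditional_popularity(P, members, union, z=0.01):
    """eps_{C_i | C_i u C_j}: mean log-diagonal of (I - z P_union)^{-1} over C_i's entries."""
    sub = P[np.ix_(union, union)]
    diag = np.diag(np.linalg.inv(np.eye(len(union)) - z * sub))
    idx = [union.index(p) for p in members]
    return np.mean(np.log(diag[idx]))

def zeta_merging(P, clusters, m_c, z=0.01):
    """Greedily merge the pair with the largest delta popularity until m_c clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > m_c:
        best, best_pair = -np.inf, None
        for i, j in combinations(range(len(clusters)), 2):
            union = clusters[i] + clusters[j]
            delta = (conditional_popularity(P, clusters[i], union, z)
                     - conditional_popularity(P, clusters[i], clusters[i], z)
                     + conditional_popularity(P, clusters[j], union, z)
                     - conditional_popularity(P, clusters[j], clusters[j], z))
            if delta > best:
                best, best_pair = delta, (i, j)
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

In a practical implementation one would restrict the candidate pairs to clusters that are actually linked in the graph and cache the per-cluster popularities, rather than recomputing every pair at every step.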
Thus, the global descriptor $\xi_C$ and the popularity character $\varepsilon_C$ are not sensitive to the local data scale at each point, which makes Zeta merging robust against variations of data scales.

3.2 Number of clusters in Zeta merging

In some circumstances, it is necessary to automatically detect the number of underlying clusters in the given data. This functionality can be realized in Zeta merging if each cluster corresponds to a diagonal block structure in $P$, up to some permutation. The principle is that the minimum $\delta_{C_i \cup C_j}$ drops to zero when a set of separable clusters emerges; behind this is the mathematical fact that inverting a block-diagonal matrix is equivalent to inverting the matrices on its diagonal blocks. In practice, however, due to the perturbation of interlinks between clusters, the minimum $\delta_{C_i \cup C_j}$ shows a jumping variation on the stable part of its curve rather than exactly reaching zero. The number of clusters then corresponds to the step at the jumping point.

4 The Zell algorithm

An issue arising in Zeta merging is the determination of the initial set of clusters. Here, we give a method that performs local single Linkages (message passing by minimum distances). The method of graph construction is also discussed.

Figure 2: Schematic illustration of l-links. From left to right: data with two seed points (red markers), 2-links grown from two seed points, and 2-links from four seed points. The same cluster is denoted by markers with the same color of edges.

4.1 Detecting l-links

Given the sample set $C_y = \{y_1, \dots, y_{m_o}\}$, we first obtain the set $S_i^{2K}$ of the $2K$ nearest neighbors of the point $y_i$. Then, starting from $y_i$, messages are passed among $S_i^{2K}$ in the sense of minimum distances (or general dissimilarities), thus locally forming an acyclic directed subgraph at each point. We call such an acyclic directed subgraph an l-link, where $l$ is the number of steps of message passing among $S_i^{2K}$. In general, $l$ is a small integer, e.g., $l \in \{2, 3, 4, \dots\}$. The further manipulation is to merge l-links that share common vertices. A simple schematic example is shown in Figure 2. The specific procedure is given in Algorithm 2.

Algorithm 2 Detecting l-links
inputs: the sample set $C_y = \{y_1, \dots, y_{m_o}\}$, the number $l$ of l-links, and the number $K$ of nearest neighbors for each point, where $l < K$.
Initialization: $C_c = \{C_i \mid C_i = \{y_i\}, i = 1, \dots, m_o\}$ and $q = 1$.
for $i$ from 1 to $m_o$ do
    Search the $2K$ nearest neighbors of $y_i$ and form $S_i^{2K}$.
    Iteratively perform $C_i \leftarrow C_i \cup \{y_{j^*}\}$, where $y_{j^*} = \arg\min_{y_j \in S_i^{2K}} \min_{y \in C_i} \mathrm{distance}(y, y_j)$, until $|C_i| \ge l$.
    Perform $C_j \leftarrow C_i \cup C_j$, $C_c \leftarrow C_c \setminus C_i$, and $q \leftarrow q + 1$, if $|C_i \cap C_j| > 0$, where $j = 1, \dots, q$.
end for
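The l-link growth and the merging of overlapping l-links can be sketched as follows; the union-find bookkeeping and the helper names are our own illustration of Algorithm 2, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def detect_l_links(Y, l=2, K=20):
    """Grow an l-link from each point, then merge l-links sharing vertices (Algorithm 2 sketch)."""
    m = len(Y)
    D = cdist(Y, Y)
    order = np.argsort(D, axis=1)
    labels = np.arange(m)                        # each point starts as its own cluster

    def find(i):                                 # union-find root with path halving
        while labels[i] != i:
            labels[i] = labels[labels[i]]
            i = labels[i]
        return i

    for i in range(m):
        neighbors = list(order[i, 1:2 * K + 1])  # S_i^{2K}, excluding the point itself
        chain = [i]
        while len(chain) <= l and neighbors:
            # pick the neighbor closest to the current chain (local single linkage)
            j = min(neighbors, key=lambda q: D[np.ix_(chain, [q])].min())
            neighbors.remove(j)
            chain.append(j)
        root = find(i)
        for j in chain[1:]:                      # merge clusters that share vertices with this l-link
            labels[find(j)] = root

    clusters = {}
    for i in range(m):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```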
4.2 Graph construction

The directional connectivity of l-links leads us to build a directed graph in which each vertex $y_i$ points directionally to its $K$ nearest neighbors. The method of graph construction is presented in Algorithm 3. The free parameter $\sigma$ in (6) is estimated according to the criterion that the geometric mean of all similarities between each point and its three nearest neighbors equals $a$, where $a$ is a given parameter in $(0, 1]$. It is easy to see that $\rho(P) < 1$ here.

Algorithm 3 Directed graph construction
inputs: the sample set $C_y$, the number $K$ of nearest neighbors, and a free parameter $a \in (0, 1]$.
Estimate the parameter $\sigma$ by $\sigma^2 = -\frac{1}{3 m_o \ln a} \sum_{y_i \in C_y} \sum_{y_j \in S_i^3} [\mathrm{distance}(y_i, y_j)]^2$.
Define the entry of the $i$-th row and $j$-th column of the weighted adjacency matrix $P$ as

    $P_{ij} = \begin{cases} \exp\left(-\frac{[\mathrm{distance}(y_i, y_j)]^2}{\sigma^2}\right), & \text{if } y_j \in S_i^K, \\ 0, & \text{otherwise.} \end{cases}$    (6)

Perform the sum-to-one operation for each row, i.e., $P_{ij} \leftarrow P_{ij} / \sum_{j=1}^{m_o} P_{ij}$.

4.3 Zeta l-links (Zell)

Our algorithm for data clustering is, in effect, to perform Zeta merging on the initial set of small clusters derived from l-links. We therefore name the algorithm Zeta l-links, or Zell. The complete implementation of Zell is to consecutively perform Algorithm 3, Algorithm 2, and Algorithm 1. In practice, the steps of Algorithm 3 and Algorithm 2 are carried out together to enhance the efficiency of Zell. Zeta merging may also be combined with K-means or Affinity Propagation: these two algorithms work well for producing small clusters, so they can be employed to generate initial clusters as the input of Zeta merging.
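A sketch of the directed graph construction of Algorithm 3 (Eq. (6)) follows; combined with the l-link and Zeta merging sketches above, it gives a rough end-to-end Zell pipeline. The defaults mirror the experimental settings reported below (z = 0.01, a = 0.95, K = 20, l = 2), and the function names are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_directed_graph(Y, K=20, a=0.95):
    """Directed K-NN affinity graph with row-stochastic weights (Algorithm 3, Eq. (6))."""
    m = len(Y)
    D = cdist(Y, Y)
    order = np.argsort(D, axis=1)
    # sigma^2 chosen so the geometric mean of similarities to the 3 nearest neighbors is a
    d3 = np.take_along_axis(D, order[:, 1:4], axis=1)
    sigma2 = -np.sum(d3 ** 2) / (3 * m * np.log(a))
    k = min(K, m - 1)
    P = np.zeros((m, m))
    rows = np.repeat(np.arange(m), k)
    cols = order[:, 1:k + 1].ravel()
    P[rows, cols] = np.exp(-D[rows, cols] ** 2 / sigma2)
    return P / P.sum(axis=1, keepdims=True)   # sum-to-one per row

# Rough Zell pipeline using the sketches above (assumed helpers: detect_l_links, zeta_merging).
# P = build_directed_graph(Y, K=20, a=0.95)
# initial = detect_l_links(Y, l=2, K=20)
# clusters = zeta_merging(P, initial, m_c=12, z=0.01)
```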
Figure 3: Clustering on toy data. (a) Generated data of 12 clusters; the number of points in each cluster is shown in the figure. The data are of different distributions, consisting of multiple manifolds (two circles and a hyperbola), subspaces (two pieces of lines and a piece of a rectangular strip), and six Gaussians. The densities of the clusters are diverse, the differences between cluster sizes are large, and the scales of the data vary. For each cluster in the manifold and subspace data, the points are randomly generated with different deviations. (b) Clusters yielded by Zell (given the number of clusters); different colors denote different clusters. (c) Clusters automatically detected by Zell on the data composed of the six Gaussians and the short line. (d) Curve of the minimum Delta popularity ($\min \delta_{C_i \cup C_j}$). (e) Enlarged part of (d) and the curve of its first-order differences; the point marked by the square is the detected jumping point. (f) The block structures of $P$ corresponding to the data in (c).

5 Experiment

Experiments are conducted on clustering toy data, hand-written digits, and cropped faces from captured images, and on segmenting images, to test the performance of Zell. The quantitative performance of the algorithms is measured by the normalized mutual information (NMI) [12], which is widely used in the learning community. The NMI quantifies the normalized statistical information shared between two distributions; the larger the NMI, the better the clustering performance. Four representative algorithms are taken for comparison: K-centers, (average) Linkage, Affinity Propagation (AP), and Normalized Cuts (Ncuts). We use K-centers instead of K-means because it can handle the case where distances between points are not measured by Euclidean norms. For fair comparison, we run Ncuts on the graph whose parameters are set the same as those of the graph used by Zell. The parameters for Zell are set as z = 0.01, a = 0.95, K = 20, and l = 2.

5.1 On toy data

We first perform an experiment on a group of toy data of diverse distributions with multiple densities, multiple scales, and significantly different cluster sizes. As shown in Figures 3 (b) and (c), the Zell algorithm accurately detects the underlying clusters. In particular, Zell is capable of simultaneously differentiating the cluster with five members and the cluster with 1500 members. This functionality is critically important for finding genes from microarray expressions in bioinformatics. Figures 3 (d) and (e) show the curves of the minimum incremental popularity (for the data in Figure 3 (c)), where the number of clusters is determined at the largest gap in the stable part of the curve. However, the method presented in Section 3.2 fails to automatically detect the number of clusters for the data in Figure 3 (a), because the corresponding $P$ matrix has no clear diagonal block structure.

5.2 On imagery data

The imagery patterns we adopt are the hand-written digits in the MNIST and USPS databases and the facial images in the ORL and FRGC (Face Recognition Grand Challenge, http://www.frvt.org/FRGC/) databases. The MNIST and USPS data sets are downloaded from Sam Roweis's homepage (http://www.cs.toronto.edu/~roweis). For MNIST, we select all images of the digits 0 to 4 in the testing set. For FRGC, we use the facial images in the target set of experiment 4 in FRGC version 2. Besides the whole target set, we also select a subset from it (sFRGC): a person is taken as a cluster if the number of faces for that person is no less than forty. The information of the data sets is provided in Table 1.

Table 1: Imagery data. MNIST and USPS: digit databases. ORL and FRGC: face databases. The last row shows the numbers of clusters automatically detected by Zell on the five data sets.

Data set                         MNIST       USPS       ORL        sFRGC      FRGC
Number of samples                5139        11000      400        11092      16028
Number of clusters               5           10         40         186        466
Average size of each cluster     1027 ± 64   1100 ± 0   10 ± 0     60 ± 14    34 ± 24
Dimension of each sample         784         256        2891       2891       2891
Detected number of clusters      11          8          85 (K = 5) 229        511

For digit patterns, the Frobenius norm is employed to measure distances between digit pairs without feature extraction. For face patterns, however, we extract visual features of each face by means of the local binary pattern (LBP) algorithm. The Chi-square metric is exploited to compute distances, defined as $\mathrm{distance}(\hat{y}, y) = \sum_i \frac{(\hat{y}_i - y_i)^2}{\hat{y}_i + y_i}$.

Table 2: Quantitative clustering results (NMI: normalized mutual information) on imagery data. The `pref' is the preference value used in Affinity Propagation for clustering with the given numbers of clusters. K = 5 for the ORL data set.

Algorithm                     MNIST                USPS                 ORL             sFRGC            FRGC
K-centers                     0.228                0.183                0.393           0.106            0.187
Linkage                       0.496                0.095                0.878           0.934            0.950
Ncuts                         0.737                0.443                0.939           0.953            0.924
Affinity propagation (pref)   0.451 (-871906470)   0.313 (-417749850)   0.877 (-6268)   0.899 (-16050)   0.906 (-7877)
Zell                          0.865                0.772                0.940           0.988            0.981

The quantitative results are given in Table 2. Zell consistently outperforms the other algorithms across the five data sets. In particular, the performance of Zell is encouraging on the FRGC data set, which has the largest numbers of clusters and samples. As reported in [1], AP significantly outperforms K-centers. However, AP shows unsatisfactory performance on the digit data, where manifold structures may occur because the styles of the digits vary significantly; the average Linkage exhibits the same phenomenon. The results achieved by Ncuts are also competitive. However, Ncuts is unstable overall, for example yielding low accuracy on the USPS data.
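For reference, the Chi-square distance used on the LBP histograms can be written as the following small function; the epsilon that guards against empty bins is our own addition.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms: sum_i (h1_i - h2_i)^2 / (h1_i + h2_i)."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```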
The results in Table 3 confirm the stability of Zell under variations of the free parameters. In fact, l affects the performance of Zell only when it is large, because a large l may incur incorrect initial clusters.

Table 3: Results yielded by Zell over variations of the free parameters on the sFRGC data. The initial setting is {z = 0.01, a = 0.95, K = 20, l = 3}; when one parameter varies, the others remain fixed.

Parameter    Range                         NMI
z            10^{-{1, 2, 3, 4}}            0.988 ± 0
a            0.2 × {1, 2, 3, 4, 4.75}      0.988 ± 0.00019
K            10 × {2, 3, 4, 5}             0.987 ± 0.0015
l            {2, 3, 4}                     0.988 ± 0.0002

5.3 Image segmentation

We show several examples of applying Zell to image segmentation on the Berkeley segmentation database. The weighted adjacency matrix $P$ is defined as $P_{ij} = \exp\left(-\frac{(I_i - I_j)^2}{\sigma^2}\right)$ if $I_j \in N_i^8$ and 0 otherwise, where $I_i$ is the intensity value of a pixel in the image and $N_i^8$ denotes the set of pixels in the 8-neighborhood of $I_i$. Figure 4 displays segmentation results with different numbers of segments for each image. Overall, attentional regions are merged by Zell. Note that small attentional regions take priority over large ones for being merged; therefore, Zell yields many small attentional regions as final clusters.

Figure 4: Image segmentation by Zell on images from the Berkeley segmentation database.

6 Conclusion

An algorithm, named Zell, has been developed for data clustering. The cyclization of a cluster is the fundamental principle of Zell. The key point of the algorithm is the integration of structural cycles by the Zeta function of a graph. A popularity character measuring the compactness of a cluster is defined via the Zeta function, on which the core of Zell for agglomerative clustering is based. An approach for finding initial small clusters is presented, based on the merging of local links among samples. The directed graph used in this paper is derived from the directionality of l-links. Experimental results on toy data, hand-written digits, facial images, and image segmentation show the competitive performance of Zell. We hope that Zell brings a new perspective to complex data clustering.

Acknowledgement

We thank Yaokun Wu and Sergey Savchenko for their continuing help on algebraic graph theory. We are also grateful for the interesting discussions with Yi Ma and John Wright on clustering and classification. Feng Li and Xiaodi Hou are acknowledged for their kind help. The reviewers' insightful comments and suggestions are also greatly appreciated.

References

[1] Frey, B.J. & Dueck, D. (2007) Clustering by passing messages between data points. Science 315:972-976.
[2] Ma, Y., Derksen, H., Hong, W. & Wright, J. (2007) Segmentation of multivariate mixed data via lossy data coding and compression. IEEE Trans. on Pattern Analysis and Machine Intelligence 29:1546-1562.
[3] Shi, J. & Malik, J. (2000) Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence 22(8):888-905.
[4] Ng, A.Y., Jordan, M.I. & Weiss, Y. (2001) On spectral clustering: analysis and an algorithm. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press.
[5] Newman, M.E.J. (2006) Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74(3).
[6] Destexhe, A. & Contreras, D. (2006) Neuronal computations with stochastic network states. Science 314:85-90.
[7] Sporns, O., Tononi, G. & Edelman, G.M. (2000) Theoretical neuroanatomy: relating anatomical and functional connectivity in graphs and cortical connection matrices.
Cerebral Cortex 10:127-141.
[8] Bagrow, J., Bollt, E. & Costa, L.F. (2007) On short cycles and their role in network structure. http://arxiv.org/abs/cond-mat/0612502.
[9] Bianconi, G. & Marsili, M. (2005) Loops of any size and Hamilton cycles in random scale-free networks. Journal of Statistical Mechanics P06005.
[10] Savchenko, S.V. (1993) The zeta-function and Gibbs measures. Russ. Math. Surv. 48(1):189-190.
[11] Li, S., Ahmed, S., Klimeck, G. & Darve, E. (2008) Computing entries of the inverse of a sparse matrix using the FIND algorithm. Journal of Computational Physics 227:9408-9427.
[12] Strehl, A. & Ghosh, J. (2002) Cluster ensembles -- a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3:583-617.