Correlation Clustering: Maximizing Agreements via Semidefinite Programming

Chaitanya Swamy
Cornell University, Ithaca, NY 14853. swamy@cs.cornell.edu. Research supported by NSF grant CCR-9912422.

Abstract

We consider the Correlation Clustering problem introduced in [2]. Given a graph G = (V, E) where each edge is labeled either "+" (similar) or "-" (different), we want to cluster the nodes so that the + edges lie within the clusters and the - edges lie between clusters. Specifically, we want to maximize agreements -- the number of + edges within clusters and - edges between clusters. This problem is NP-hard [2]. We give a 0.7666-approximation algorithm for maximizing agreements on any graph, even when the edges have non-negative weights (along with labels) and we want to maximize the weight of agreements. These were posed as open problems in [2]. Previously the only results known were a trivial 0.5-approximation for arbitrary edge-weighted graphs, and a PTAS with unit edge weights when |E| = Ω(|V|²). Somewhat surprisingly, our algorithm always produces a clustering with at most 6 clusters. As a corollary we get a 0.7666-approximation algorithm for the k-clustering variant of the problem, where we may create at most k clusters. A major component of this algorithm is a simple, easy-to-analyze algorithm that by itself achieves an approximation ratio of 0.75, opening at most 4 clusters.

1 Problem Description

We consider a somewhat more general version of correlation clustering to maximize agreements. Each edge e has two weights w_in(e), w_out(e) ≥ 0. An edge e contributes w_in(e) to the total agreement weight if it lies within a cluster and w_out(e) otherwise. The problem is to find a clustering that maximizes

    Σ_{e within a cluster} w_in(e) + Σ_{e between clusters} w_out(e).

The weights w_in(e), w_out(e) can be viewed as confidence estimates of whether e should be labeled + or - respectively, thus giving a soft labeling. The correlation clustering problem considered in [2] is the special case obtained by setting w_in(e) = 1 if e is labeled + and 0 otherwise, and w_out(e) = 1 - w_in(e).
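To make the objective concrete, the following Python snippet (a minimal sketch; the function and variable names are ours, not the paper's) computes the total agreement weight of a given clustering:

    # Sketch: total agreement weight of a clustering. `edges` maps each
    # edge (u, v) to its pair of weights (w_in, w_out); `cluster` maps
    # each vertex to the index of its cluster.
    def agreement_weight(edges, cluster):
        total = 0.0
        for (u, v), (w_in, w_out) in edges.items():
            # e contributes w_in(e) if it lies within a cluster, w_out(e) otherwise
            total += w_in if cluster[u] == cluster[v] else w_out
        return total

    # The +/- labeled special case of [2]: a + edge has (w_in, w_out) = (1, 0),
    # a - edge has (w_in, w_out) = (0, 1).
    edges = {(0, 1): (1, 0), (1, 2): (0, 1), (0, 2): (0, 1)}
    print(agreement_weight(edges, {0: 0, 1: 0, 2: 1}))  # all 3 labels agree -> 3.0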
Applications and Related Work. Bansal, Blum and Chawla [2] introduced the correlation clustering problem and motivated it by a document clustering application. We have a corpus of documents, and each node in G represents a document in the corpus. An edge (u, v) with a + label means that the documents corresponding to nodes u, v are similar, and a - label means that they are different. The goal is to cluster the documents so that similar documents (+ edges) lie in the same cluster and dissimilar documents (- edges) lie in different clusters. Correlation clustering can also be viewed as an agnostic learning problem. Each edge (u, v) is an "example" and we want to represent the target function f using a hypothesis class of vertex clusterings. Our result implies that any function f has a representation using a hypothesis of at most 6 clusters that is close to the best possible representation of f by a hypothesis from the class of all possible vertex clusterings.

There is a trivial 0.5-approximation algorithm for maximizing agreements: putting all vertices in one big cluster, or placing every vertex in a separate cluster, agrees with at least half the edge labels. Bansal et al. give an algorithm that approximates agreements with an additive error of ε|V|², yielding a PTAS when |E| = Ω(|V|²). An equivalent optimization problem is to minimize disagreements -- the number of - edges within clusters and + edges between clusters. In [2] a constant-factor approximation algorithm is given for minimizing disagreements on a complete unweighted graph. Recently, [3], [4] and [5] independently gave an O(log |V|)-approximation algorithm for minimizing disagreements on weighted graphs. Independent of our work, [3] gives a 0.7664-approximation algorithm for maximizing agreements and shows that the problem is APX-hard.

The k-clustering variant is a special case of the MAX-2-LIN-MOD-k problem [1]. Here we have a set of weighted linear equations and inequations of the form y_i - y_j ≡ z (mod k) and y_i - y_j ≢ z (mod k). We want to set y_i ∈ {0, ..., k-1} so as to maximize the total weight of the satisfied (in)equations. We get a 0.7666-approximation for the special case where we only have equations y_i - y_j ≡ 0 (mod k) and inequations y_i - y_j ≢ 0 (mod k). In contrast, for the general case the best approximation ratio known, due to Andersson, Engebretsen and Håstad [1], is only marginally better than the trivial 1/k.

Our Techniques. We consider a semidefinite programming relaxation of the problem and round its optimal solution. We consider two rounding procedures. The first one extends the Goemans-Williamson random-hyperplane rounding procedure for MAX CUT [7]. We choose 2 hyperplanes, producing a solution with at most 4 clusters in which the weight of agreements is, in expectation, at least 0.75 times the optimal solution value. The second procedure is based on the rounding scheme of [6]. We show that by randomly choosing one of these two procedures we obtain a clustering in which the total weight of agreements is at least 0.7666 times the optimal solution value. Thus choosing the better of the two rounding procedures gives a 0.7666-approximation algorithm.

1.1 A Semidefinite Program. Let e_i ∈ R^n be the vector with 1 in the i-th coordinate and 0s everywhere else. We can formulate the problem as the following mathematical program:

    maximize    Σ_{e=(u,v)} w_in(e)(x_u · x_v) + w_out(e)(1 - x_u · x_v)
    subject to  x_v ∈ {e_1, ..., e_n}   for every v ∈ V.

Vector e_i represents a possible cluster i. For any clustering, if we set x_v = e_i for every vertex v assigned to cluster i, i = 1, ..., k, the objective function value is exactly the weight of agreements in the clustering. We relax the constraints x_v ∈ {e_1, ..., e_n} to get a semidefinite program:

    (SP)  max   Σ_{e=(u,v)} w_in(e)(x_u · x_v) + w_out(e)(1 - x_u · x_v)
          s.t.  x_v · x_v = 1    for all v
                x_u · x_v ≥ 0    for all u, v, u ≠ v.        (1.1)

Our formulation resembles the MAX k-CUT relaxation in [6], but they relax a mathematical program involving k vectors {a_i} s.t. a_i · a_i = 1 and a_i · a_j = -1/(k-1) for i ≠ j.
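For illustration, (SP) can be expressed in terms of the Gram matrix X with X[u, v] = x_u · x_v, which must be positive semidefinite with unit diagonal; constraint (1.1) becomes entrywise non-negativity. A minimal sketch using the cvxpy package (our choice of modeling interface, not one prescribed by the paper):

    import cvxpy as cp

    def solve_sp(n, edges):
        # Gram matrix of the vectors {x_v}: PSD, unit diagonal,
        # entrywise non-negative (constraint (1.1)).
        X = cp.Variable((n, n), PSD=True)
        constraints = [cp.diag(X) == 1, X >= 0]
        objective = sum(w_in * X[u, v] + w_out * (1 - X[u, v])
                        for (u, v), (w_in, w_out) in edges.items())
        cp.Problem(cp.Maximize(objective), constraints).solve()
        return X.value  # Gram matrix of an (approximately) optimal solution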
2 The Algorithm

We solve (SP) and round the optimal solution. We consider two rounding procedures. Due to space limitations we only describe one of these, which by itself gives an approximation ratio of 0.75, and sketch the improvements. We extend the Goemans-Williamson rounding for MAX CUT by choosing multiple hyperplanes. Let {x_v ∈ R^n} be the optimal solution to (SP). While rounding we need to ensure that both the probability that edge e lies inside a cluster and the probability that e lies between clusters are comparable to the coefficients of w_in(e) and w_out(e) respectively in the objective function. Choosing too many random hyperplanes rapidly decreases the former probability, while with too few hyperplanes, e.g., 1, the latter probability can drop to 0.5 times the coefficient of w_out(e). We choose 2 hyperplanes passing through the origin independently at random, with normals distributed uniformly on the unit sphere. Let q_1, q_2 be the normals to the hyperplanes. These partition the vertices into 4 sets, some possibly empty, based on the signs of the x_v · q_i. Let R_{s_1,s_2} = {v : (-1)^{s_i}(x_v · q_i) ≥ 0, i = 1, 2} where s_1, s_2 ∈ {0, 1}. Each such non-empty set defines a cluster.

Analysis. Let p_in(θ) and p_out(θ) = 1 - p_in(θ) denote the probabilities that nodes u and v with x_u · x_v = cos θ lie in the same cluster and in different clusters respectively. Note that by (1.1), θ ∈ [0, π/2] for every edge.

Lemma 2.1. p_in(θ) = (1 - θ/π)². (A single random hyperplane separates u and v with probability θ/π [7], and the two hyperplanes are chosen independently.)

Lemma 2.2. For any θ ∈ [0, π/2], p_in(θ) ≥ 0.75 cos θ and p_out(θ) ≥ 0.75(1 - cos θ).

Proof. Let f(θ) = p_in(θ)/cos θ and g(θ) = p_out(θ)/(1 - cos θ). On [0, π/2], f is minimized at the unique point θ_0 s.t. tan θ_0 = 2/(π - θ_0). So θ_0 < 0.68288 and f(θ) ≥ f(θ_0) = (1 - θ_0/π)²/cos θ_0 > 0.7895 for θ ∈ [0, π/2]. Also, dg/dθ has the sign of 2(π - θ)(1 - cos θ) - θ(2π - θ) sin θ, which is at most 0 for θ ∈ [0, π/2] since cos θ ≥ 1 - θ²/2 and sin θ ≥ 2θ/π. So g(θ) ≥ g(π/2) = 0.75 for θ ∈ [0, π/2].

Theorem 2.1. The above rounding procedure delivers a solution of expected value at least 0.75 · OPT.

Proof. Let C be the clustering obtained by rounding. Let X_e be the contribution of edge e = (u, v) to the agreement weight of C, and let θ = arccos(x_u · x_v). Then E[X_e] = w_in(e) p_in(θ) + w_out(e) p_out(θ) ≥ 0.75 (w_in(e)(x_u · x_v) + w_out(e)(1 - x_u · x_v)) by Lemma 2.2. Summing over the edges and noting that the optimal value of (SP) is at least OPT, we get E[Σ_e X_e] ≥ 0.75 · OPT.
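The following sketch (ours, continuing the hypothetical solve_sp above) rounds the Gram matrix with 2 random hyperplanes as just described. A vector of independent standard normals has a uniformly distributed direction, so it serves as a random hyperplane normal:

    import numpy as np

    def round_two_hyperplanes(X, seed=None):
        rng = np.random.default_rng(seed)
        # Factor X = V V^T so that row v of V is (up to rotation) x_v;
        # a rotation does not change the dot products x_v . q_i.
        vals, vecs = np.linalg.eigh(X)
        V = vecs * np.sqrt(np.clip(vals, 0.0, None))
        q1 = rng.standard_normal(V.shape[1])   # normals to the 2 hyperplanes
        q2 = rng.standard_normal(V.shape[1])
        # Vertex v joins cluster R_{s1,s2}, where s_i records the side of
        # hyperplane i that x_v falls on; at most 4 clusters result.
        return [(int(V[v] @ q1 >= 0), int(V[v] @ q2 >= 0))
                for v in range(X.shape[0])]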
2.1 Improvements. For the other rounding procedure we adapt a rounding scheme of [6]. We choose 6 random vectors r_1, ..., r_6 ∈ R^n whose coordinates are independent standard normal random variables. Each r_i defines a (possibly empty) cluster C_i = {v : v · r_i = max_j v · r_j} in our clustering (a code sketch of this procedure appears after the references). The analysis of this algorithm is, however, significantly more involved. Randomly choosing between this scheme and the 2-hyperplane rounding algorithm gives a 0.7666-approximation algorithm that produces at most 6 clusters, so this also works for the k-clustering variant when k ≥ 6. For k ≤ 5 we use a relaxation where (1.1) is replaced by x_u · x_v ≥ -1/(k-1) and the objective function is

    max Σ_{e=(u,v)} w_in(e) · (1 + (k-1)(x_u · x_v))/k + w_out(e) · ((k-1)(1 - x_u · x_v))/k,

and round this by choosing either 1 or 2 hyperplanes. This achieves a ratio of 0.77.

Theorem 2.2. There is a 0.7666-approximation algorithm for maximizing agreements. This also gives a 0.7666-approximation algorithm for the k-clustering variant.

Acknowledgments. I thank David Shmoys for giving helpful suggestions. I also thank Dotan Emanuel, Amos Fiat and Moses Charikar for sharing their results.

References

[1] G. Andersson, L. Engebretsen, and J. Håstad. A new way of using semidefinite programming with applications to linear equations mod p. Journal of Algorithms, 39:162-204, 2001.
[2] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. In Proc. 43rd IEEE FOCS, pages 238-247, 2002.
[3] M. Charikar, V. Guruswami, and A. Wirth. Clustering with qualitative information. To appear in Proc. 44th IEEE FOCS, 2003.
[4] E. Demaine and N. Immorlica. Correlation clustering with partial information. In Proc. 6th APPROX, pages 1-13, 2003.
[5] D. Emanuel and A. Fiat. Correlation clustering -- minimizing disagreements on arbitrary weighted graphs. In Proc. 11th ESA, pages 208-220, 2003.
[6] A. Frieze and M. Jerrum. Improved approximation algorithms for MAX k-CUT and MAX BISECTION. Algorithmica, 18:67-81, 1997.
[7] M. Goemans and D. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42:1115-1145, 1995.
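For completeness, here is a sketch of the second rounding procedure of Section 2.1 (again our own illustration, not code from the paper, reusing the factor matrix V computed in round_two_hyperplanes above):

    import numpy as np

    def round_six_vectors(V, seed=None):
        # V: matrix whose row v is the SDP vector x_v. Choose 6 random
        # vectors r_1, ..., r_6 with independent standard normal
        # coordinates; C_i collects the vertices whose projection onto
        # r_i is largest, giving at most 6 (possibly empty) clusters.
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((6, V.shape[1]))
        return list(np.argmax(V @ R.T, axis=1))

In this illustration, the 0.7666 guarantee would be realized by running both rounding procedures and keeping the clustering of larger agreement weight, e.g., as measured by the agreement_weight function sketched in Section 1.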