Ensemble Clustering using Semidefinite Programming Vikas Singh1, Lopamudra Mukherjee2, Jiming Peng3, Jinhui Xu2 1 W2 Biostatistics and Medical Informatics University of Wisconsin ­ Madison vsingh@biostat.wisc.edu Solution 1 Computer Science & Engineering State University of New York at Buffalo {lm37, jinhui}@cse.buffalo.edu 2 Industrial and Enterprise Systems Eng. University of Illinois Urbana Champaign pengj@uiuc.edu 3 The Problem: How to best combine the ensemble of multiple clustering solutions in a maximum consent sense? Goal Solution 2 · Let Generate Clusterings Solution 3 Integrate Generate Clusterings Integrate P1, P2, . . . , Pm be m partitions of the data. · Each partition Pi is produced by a different algorithm, Ci in an ensemble. · Goal is to derive a partition P for the ensemble. · No Solution 4 Solution 4 single clustering algorithm is perfect. · In most real life applications, no ground truth is known. · An ensemble will most likely be superior to individual solutions. Motivations Do pairwise voting strategies work? Existing Graph Partitioning based approaches Base Partitioning D D B A C D E C1 1 {C3} D w 1 {C4} 1 {C3} 1 {C4} =4 C A w Our contributions C 1 {C3} Similarity Matrix 1 {C4} A 2 {C1, C2} 2 {C1, C2} A B D E C C2 A A B 1 2 1 0 B 1 C 2 1 D 1 0 1 E 0 1 A 2 {C1, C2} C w =4 1 {C1} 1 {C1} 1 {C3} 1 {C1} =4 1 {C1} 1 {C3} 1 {C1} 1 0 1 1 1 0 1 0 B A D CE D C E C C3 1 {C1} 1 {C3} B E B E 1 {C2} 1 {C4} 1 {C4} 1 {C2} D E B E 1 {C2} 1 {C4} B A C4 How many times are A and C in the same cluster? (A, C, D) and (A, B , C ) are same weight-wise, but Which algorithms put (A, C, D) together? Answer: None Which algorithms put (A, B , C ) together? Answer: At least one (C1) Is this information useful? Answer:Yes, as cohesiveness measures in clusters of size 2 new string encoding based strategy captures this cohesiveness/agreement better. · A new IP model to optimize similarity. · Model can be convexified to a precise SDP. · Show applications to novel domains (such as segmentation) in addition to regular classification tasks. · Empirically, also optimizes Normalized Mutual Information well. ·A Compute togetherness frequency/votes/weight of items, but considered pairwise So, we'll see you at the poster then! Advances in Neural Information Processing Systems, 2007.