Ensemble Clustering using Semidefinite Programming
Vikas Singh1, Lopamudra Mukherjee2, Jiming Peng3, Jinhui Xu2
1

W2

Biostatistics and Medical Informatics University of Wisconsin ­ Madison vsingh@biostat.wisc.edu
Solution 1

Computer Science & Engineering State University of New York at Buffalo {lm37, jinhui}@cse.buffalo.edu

2

Industrial and Enterprise Systems Eng. University of Illinois Urbana Champaign pengj@uiuc.edu

3

The Problem: How to best combine the ensemble of multiple clustering solutions in a maximum consent sense? Goal

Solution 2

· Let

Generate Clusterings Solution 3

Integrate

Generate Clusterings

Integrate

P1, P2, . . . , Pm be m partitions of the data. · Each partition Pi is produced by a different algorithm, Ci in an ensemble.  · Goal is to derive a partition P for the ensemble.
· No

Solution 4
Solution 4

single clustering algorithm is perfect. · In most real life applications, no ground truth is known. · An ensemble will most likely be superior to individual solutions.

Motivations

Do pairwise voting strategies work? Existing Graph Partitioning based approaches
Base Partitioning
D D

B

A C

D E
C1

1 {C3}
D
w

1 {C4}

1 {C3}

1 {C4}

=4
C A
w

Our contributions
C

1 {C3}
Similarity Matrix

1 {C4}

A

2 {C1, C2}

2 {C1, C2}

A B

D E C

C2

A A B
1 2 1 0

B
1

C
2 1

D
1 0 1

E
0 1

A

2 {C1, C2}

C
w

=4

1 {C1}

1 {C1}

1 {C3}

1 {C1}

=4
1 {C1}

1 {C3}

1 {C1}
1 0 1 1 1 0 1 0

B

A

D CE D C E

C
C3

1 {C1}

1 {C3}

B

E

B

E

1 {C2} 1 {C4} 1 {C4}

1 {C2}

D E

B

E

1 {C2} 1 {C4}

B

A

C4

How many times are A and C in the same cluster?

(A, C, D) and (A, B , C ) are same weight-wise, but
Which algorithms put (A, C, D) together? Answer: None Which algorithms put (A, B , C ) together? Answer: At least one (C1) Is this information useful? Answer:Yes, as cohesiveness measures in clusters of size  2

new string encoding based strategy captures  this cohesiveness/agreement better. · A new IP model to optimize similarity. · Model can be convexified to a precise SDP. · Show applications to novel domains (such as segmentation) in addition to regular classification tasks. · Empirically, also optimizes Normalized Mutual Information well.

·A

Compute togetherness frequency/votes/weight of items, but considered pairwise

So, we'll see you at the poster then!

Advances in Neural Information Processing Systems, 2007.