Fast search for Dirichlet process mixture models
Hal Daume III ()

This code implements the search algorithms (modulo a few minor changes) described in the Fast search for DPMMs paper at AI-Stats 2007. It should work out of the box with a reasonably recent version of matlab. Currently the code only contains the Dirichlet/Multinomial case, but the Gaussian case can be hacked in in about 5 minutes.

First, download the tar bundle: DPsearch.tgz and untar it somewhere useful. Next, load up matlab and run:

  DPsearch(X, alpha, G0alpha, beamSize, Y, heuristic)
Here, X is the data (X(n,:) is the n-th data point, X(n,i) is the number of times word "i" is observed in document "n"). alpha is the concentration parameter of the DP, G0alpha is the concentration parameter of the prior mean Dirichlet. beamSize is the size of the beam you want to use. If you have "true" clusters, Y should contain these; otherwise, use ones(1,N), where N is the number of data points. Finally, heuristic should be 'n' to run with no heuristic, 'a' to use the admissible heuristic and 'i' to use the inadmissible heuristic.

If you want a baseline, see DPgibbs.m.

Please email me me AT hal3 DOT name with questions and comments.