Fast search for Dirichlet process mixture models
Hal Daume III
This code implements the search algorithms (modulo a few minor changes)
described in the "Fast search for DPMMs" paper at AISTATS 2007. It should
work out of the box with a reasonably recent version of MATLAB. Currently
the code contains only the Dirichlet/Multinomial case, but the Gaussian case
could be added in about 5 minutes.
First, download the tar bundle DPsearch.tgz and
untar it somewhere useful. Next, start MATLAB and run:
DPsearch(X, alpha, G0alpha, beamSize, Y, heuristic)
Here, X is the data (X(n,:) is the n-th data point,
X(n,i) is the number of times word "i" is observed in document "n").
alpha is the concentration parameter of the DP, and G0alpha is
the concentration parameter of the Dirichlet base distribution (the prior mean). beamSize is
the size of the beam you want to use. If you have "true" clusters, Y
should contain these; otherwise, use ones(1,N), where N is
the number of data points. Finally, heuristic should be 'n'
to run with no heuristic, 'a' to use the admissible heuristic and
'i' to use the inadmissible heuristic.
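As a rough illustration of the call described above, here is a minimal sketch that builds a small document-by-word count matrix and passes it to DPsearch. The counts and the values chosen for alpha, G0alpha and beamSize are made up for the example and are not recommended settings. The exist check is only a guard added here so that the snippet runs even when DPsearch.m is not on the path, and since the return values are not documented above, none are captured.

```matlab
% Toy corpus: 6 documents over a 4-word vocabulary (made-up counts).
% X(n,i) is the number of times word i occurs in document n.
X = [5 1 0 0;
     4 2 0 0;
     6 0 1 0;
     0 0 5 3;
     0 1 4 4;
     0 0 6 2];
N = size(X, 1);           % number of data points (documents)

alpha     = 1.0;          % DP concentration parameter (illustrative value)
G0alpha   = 1.0;          % concentration of the prior mean Dirichlet (illustrative value)
beamSize  = 10;           % beam size (illustrative value)
Y         = ones(1, N);   % no "true" clusters known, so use ones(1,N)
heuristic = 'a';          % 'n' = none, 'a' = admissible, 'i' = inadmissible

% Only call the search if DPsearch.m can be found on the MATLAB path.
if exist('DPsearch', 'file') == 2
    DPsearch(X, alpha, G0alpha, beamSize, Y, heuristic);
else
    disp('DPsearch.m not found; addpath the directory where you untarred it.');
end
```

Switching heuristic to 'n' or 'i' on the same data is an easy way to compare the three search modes.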
If you want a baseline, see DPgibbs.m.
Please email me with
questions and comments.