Fast search for Dirichlet process mixture models

Hal Daume III ()
This code implements the search algorithms (modulo a few minor changes)
described in the Fast search for
DPMMs paper at AI-Stats 2007. It should work out of the box with a
reasonably recent version of matlab. Currently the code only contains the
Dirichlet/Multinomial case, but the Gaussian case can be hacked in in about 5
minutes.

First, download the tar bundle: DPsearch.tgz and
untar it somewhere useful. Next, load up matlab and run:

DPsearch(X, alpha, G0alpha, beamSize, Y, heuristic)

Here, `X` is the data (`X(n,:)` is the n-th data point,
`X(n,i)` is the number of times word "i" is observed in document "n").
`alpha` is the concentration parameter of the DP, `G0alpha` is
the concentration parameter of the prior mean Dirichlet. `beamSize` is
the size of the beam you want to use. If you have "true" clusters, `Y`
should contain these; otherwise, use `ones(1,N)`, where `N` is
the number of data points. Finally, `heuristic` should be `'n'`
to run with no heuristic, `'a'` to use the admissible heuristic and
`'i'` to use the inadmissible heuristic.
If you want a baseline, see DPgibbs.m.

Please email me with
questions and comments.