HBC: Hierarchical Bayes Compiler
Pre-release version 0.3 (27 Oct 2007)
Older versions: 0.2, 0.1
HBC is a toolkit for implementing hierarchical Bayesian models. HBC was
created because I felt that I spent too much time writing boilerplate
code for inference problems in Bayesian models. HBC has several
goals:
1. Allow a natural implementation of hierarchical models.
2. Enable quick and dirty debugging of models for standard data types.
3. Focus on large-dimension discrete models.
4. Be more general than simple Gibbs sampling (e.g., allowing for
maximizations, EM and message passing).
5. Allow for hierarchical models to be easily embedded in larger
programs.
6. Support automatic Rao-Blackwellization (aka collapsing).
7. Allow efficient execution via compilation to other languages (such
as C, Java, Matlab, etc.).
These goals distinguish HBC from other Bayesian modeling software,
such as Bugs (or WinBugs). In
particular, our primary goal is that models created in HBC can be used
directly, rather than only as a first-pass test. Moreover, we aim for
scalability with respect to data size. Finally, since the goal of HBC
is to compile hierarchical models into standard programming
languages (like C), these models can easily be used as part of a
larger system. This last point is in the spirit of the dynamic
programming language Dyna.
Note that some of these goals aren't yet supported (in particular, parts of
goal 4 and full support for goal 6), but they should be coming soon!
New in Version 0.3
- (Major update) Bug fixes for doubly indexed variables,
i.e., things like a_{b_{c}}. These hardly worked, if at
all. Many more things are possible with these fixes.
- (Minor update) --dump now works with compiled
code (with the exception of best dumping), so you can
actually see what the compiled sampler is doing.
- (Minor update) You can elect to maximize some variables
(instead of sampling them) by saying --maximize VAR.
- (Minor update) Now comes with an implementation of IBM model 1 for machine translation.
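The difference between sampling a variable and maximizing it, which is what --maximize VAR selects, can be illustrated for a single discrete variable. The Python below is a hypothetical sketch and not the code HBC generates. Given unnormalized conditional weights for a variable such as z_{n}, with everything else held fixed, a Gibbs step draws from the normalized distribution. A maximization step instead takes the most probable value, in the style of iterated conditional modes. The names gibbs_step, max_step and update are made up for this illustration.

```python
import random

def gibbs_step(weights, rng=random):
    """Draw an index with probability proportional to weights (a Gibbs update)."""
    u = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if u < acc:
            return k
    return len(weights) - 1  # guard against floating-point round-off

def max_step(weights):
    """Pick the most probable index instead of sampling (a maximization update)."""
    return max(range(len(weights)), key=lambda k: weights[k])

def update(weights, maximize, rng=random):
    """Hypothetical dispatcher: maximize if the variable was marked, else sample."""
    return max_step(weights) if maximize else gibbs_step(weights, rng)
```

For example, update([0.2, 1.5, 0.3], maximize=True) always returns 1. With maximize=False, it returns 1 only about 75% of the time.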
New in Version 0.2
There are three new things in version 0.2:
- (Major update) Dirichlet/Multinomial pairs can now be
marginalized out! This means that you can, for instance, obtain the
"collapsed Gibbs sampler" for LDA. To marginalize out a
multinomial variable called, say, theta, you need only
specify "--collapse theta" as an argument to
hbc.
- (Minor update) The generated C code now keeps track of the
best sample thus far and displays an asterisk whenever a better sample
is encountered.
- (Minor update) Command-line options can be specified
directly in .hier source files. Simply begin a line with
--# (which would otherwise be treated as a comment) and you
can just write out any command-line option; see LDA.hier for an example.
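To see what marginalizing out a Dirichlet/Multinomial pair means, take theta ~ DirSym(alpha, K). Integrating theta out leaves a predictive rule that depends only on counts: after observing counts n_1..n_K with total N, the next outcome is k with probability (n_k + alpha) / (N + K*alpha). The product of these sequential predictives equals the closed-form Dirichlet-multinomial marginal likelihood. The Python sketch below checks that identity. It is a textbook illustration of collapsing, not HBC's implementation, and the function names are hypothetical.

```python
import math

def predictive(counts, k, alpha):
    """p(next = k | counts) with theta ~ DirSym(alpha, K) integrated out."""
    return (counts[k] + alpha) / (sum(counts) + len(counts) * alpha)

def log_marginal(counts, alpha):
    """Closed-form log probability of any one ordered sequence with these counts."""
    K, N = len(counts), sum(counts)
    result = math.lgamma(K * alpha) - math.lgamma(N + K * alpha)
    for n_k in counts:
        result += math.lgamma(n_k + alpha) - math.lgamma(alpha)
    return result

def log_sequential(seq, K, alpha):
    """Score the same sequence one item at a time, updating counts as we go."""
    counts = [0] * K
    total = 0.0
    for k in seq:
        total += math.log(predictive(counts, k, alpha))
        counts[k] += 1
    return total
```

For instance, log_sequential([0, 2, 2, 1, 2], 3, 0.5) agrees with log_marginal([1, 1, 3], 0.5) up to rounding. Because only the counts matter, a collapsed sampler can update one variable by decrementing and re-incrementing counts rather than resampling theta.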
A Quick Example
To give a flavor of what HBC is all about, here is a complete
implementation of a Bayesian mixture of Gaussians model in HBC format:
alpha ~ Gam(10,10)
mu_{k} ~ NorMV(vec(0.0,1,dim), 1) , k \in [1,K]
si2 ~ IG(10,10)
pi ~ DirSym(alpha, K)
z_{n} ~ Mult(pi) , n \in [1,N]
x_{n} ~ NorMV(mu_{z_{n}}, si2) , n \in [1,N]
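Reading the model top to bottom also gives a recipe for simulating data from it (ancestral sampling). The Python sketch below does this under several assumptions that HBC's documentation should confirm:

- Gam(a,b) takes a shape a and a scale b.
- IG(a,b) is an inverse Gamma with shape a and scale b.
- NorMV(m, s2) is a Gaussian with mean vector m and isotropic covariance s2*I.
- vec(0.0,1,dim) is the all-zeros vector of length dim.

Indices are 0-based, and simulate_mix_gauss is a hypothetical name.

```python
import random

def dirichlet_sym(alpha, K, rng):
    """pi ~ DirSym(alpha, K) via normalized Gamma(alpha, 1) draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
    s = sum(g)
    return [x / s for x in g]

def categorical(p, rng):
    """z ~ Mult(p) for a single draw, returning a 0-based index."""
    u, acc = rng.random(), 0.0
    for k, pk in enumerate(p):
        acc += pk
        if u < acc:
            return k
    return len(p) - 1

def simulate_mix_gauss(N, K, dim, seed=0):
    rng = random.Random(seed)
    alpha = rng.gammavariate(10, 10)            # alpha ~ Gam(10,10), assumed shape/scale
    mu = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
          for _ in range(K)]                    # mu_k ~ NorMV(0, 1)
    si2 = 1.0 / rng.gammavariate(10, 1.0 / 10)  # si2 ~ IG(10,10), assumed shape/scale
    pi = dirichlet_sym(alpha, K, rng)           # pi ~ DirSym(alpha, K)
    z = [categorical(pi, rng) for _ in range(N)]  # z_n ~ Mult(pi)
    sd = si2 ** 0.5
    x = [[rng.gauss(mu[z[n]][d], sd) for d in range(dim)]
         for n in range(N)]                     # x_n ~ NorMV(mu_{z_n}, si2)
    return alpha, mu, si2, pi, z, x
```

Data simulated this way, with K set to 2, is a convenient sanity check for inference: a correct sampler should roughly recover the simulated z and mu.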
If you are used to reading hierarchical models, it should be quite
clear what this model does. Moreover, by keeping to a very LaTeX-like
style, it is quite straightforward to automatically typeset any
hierarchical model. If this file were stored in
mix_gauss.hier, and if we had data for x stored in a
file called X, we could run this model (with two Gaussians)
directly by saying:
hbc simulate --loadM X x N dim --define K 2 mix_gauss.hier
Perhaps closer to my heart would be a six-line implementation of the
Latent Dirichlet Allocation model, complete with hyperparameter
estimation:
alpha ~ Gam(0.1,1)
eta ~ Gam(0.1,1)
beta_{k} ~ DirSym(eta, V) , k \in [1,K]
theta_{d} ~ DirSym(alpha, K) , d \in [1,D]
z_{d,n} ~ Mult(theta_{d}) , d \in [1,D] , n \in [1,N_{d}]
w_{d,n} ~ Mult(beta_{z_{d,n}}) , d \in [1,D] , n \in [1,N_{d}]
This code can either be run directly (e.g., by a simulate
call as above) or compiled to native C code for (much) faster
execution.
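For the LDA model above, collapsing theta and beta yields the well-known collapsed Gibbs sampler. Each z_{d,n} is resampled with probability proportional to (n_dk + alpha) * (n_kw + eta) / (n_k + V*eta). The counts in this formula exclude the token currently being resampled. The Python below is a hand-written sketch of that standard update, not the C code HBC emits. To keep it short, alpha and eta are held fixed rather than given Gamma priors as in the model. The function name lda_collapsed_gibbs is hypothetical.

```python
import random

def lda_collapsed_gibbs(docs, K, V, alpha, eta, iters, seed=0):
    """Collapsed Gibbs sampling for LDA with theta and beta integrated out.

    docs is a list of documents, each a list of word ids in range(V).
    alpha and eta are fixed here, unlike the HBC model, which samples them.
    """
    rng = random.Random(seed)
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total word count per topic

    def adjust(d, k, w, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic assignments.
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            adjust(d, z[d][n], w, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                adjust(d, z[d][n], w, -1)  # remove the current assignment
                # p(z_{d,n} = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + eta) / (n_k + V * eta)
                weights = [(n_dk[d][k] + alpha) * (n_kw[k][w] + eta)
                           / (n_k[k] + V * eta) for k in range(K)]
                z[d][n] = rng.choices(range(K), weights=weights)[0]
                adjust(d, z[d][n], w, +1)
    return z, n_dk, n_kw
```

The sampler never stores theta or beta, only count tables. This is the practical payoff of collapsing for large discrete models.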
User's Guide
The User's Guide can be downloaded in Adobe Acrobat (PDF) format.
Distribution
You can download either the source code as a tar bundle or a Linux executable, also as
a tar bundle. You can build the
source using GHC by saying: ghc --make -fglasgow-exts Main -o hbc.
Both include sample hierarchical models and data for:
The source, executables and examples are all completely free for any
purpose whatsoever.
Questions, Comments and Bugs
Please email me directly with supporting information at .