Downloads And Media
- Source code for a C++ Gibbs sampler for latent Dirichlet allocation and latent Dirichlet allocation with WordNet (written to be easily extensible to other models) [Related Paper]
- Dataset for Networks Uncovered by Bayesian Inference: an LDA-style corpus that uses pair mentions in a text as a "document." Drawn from Wikipedia and the Bible.
- Around two hundred thousand sense-disambiguated lexical similarity pairs, each annotated by multiple humans. (Note: These data differ slightly from what was published; less filtering was applied.) [First paper, Mechanical Turk Paper]
- Source code for syntactic topic models
- Source code for an R Gibbs sampler for Networks Uncovered by Bayesian Inference (Nubbi), as well as related models like RTL and LDA. [Related Paper]
- Human evaluation of topics derived by pLSI, LDA, and CTM. [Related Paper]
- Human evaluations of transitivity on multiple axes [Related Paper]
- German film reviews. [Related Paper]
- Spoilers from TV Tropes. [Related Paper]
- Quiz Bowl questions, answers, and buzzes. [Related Paper]
- Source code for a distributed implementation of LDA (Mr. LDA) [Related Paper]
Media
- My VideoLectures.net Page
- Jonathan and I discussing a poster of this work
- Recap in song of our 2009 Workshop on Applications of Topic Models [lyrics]
- Videos of verbs (script by Xiaojuan Ma, performed by Ge Wang)
- Video of me presenting topic models for machine translation [Related Paper]
- Video of presentation about incremental trivia games [Related Paper]