1. Process the data

    mallet import-file --remove-stopwords --keep-sequence \
        --input nsf-30k.txt --output nsf.mallet

    mallet import-file --remove-stopwords --keep-sequence \
        --input nrc.txt --output nrc.mallet --use-pipe-from nsf.mallet

   The second import reuses the pipe from nsf.mallet so that both collections share the same vocabulary.

2. Learn topics

    mallet train-topics --input nsf.mallet \
        --num-topics 10 --num-iterations 100 \
        --output-model nsf_10 \
        --output-state nsf_10.state.gz \
        --output-doc-topics nsf_10.doc \
        --output-topic-keys nsf_10.topics \
        --inferencer-filename nsf_10.inf \
        --optimize-interval 10 --optimize-burn-in 50

3. Inspect the output files

    gzcat nsf_10.state.gz | more

4. Infer topics on the NRC data

    mallet infer-topics --input nrc.mallet --inferencer nsf_10.inf \
        --output-doc-topics nrc.doc
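
Paging through the state file with more shows individual token assignments, but a quick summary can be easier to read. The sketch below counts how many tokens were assigned to each topic. It assumes the usual MALLET state layout (header lines starting with '#', then one line per token with the fields doc, source, pos, typeindex, type, topic); the function name topic_counts is made up for this example and is not part of MALLET.

```shell
# Count token-level topic assignments in a gzipped MALLET state file.
# Assumption: '#' header lines, then one token per line with six
# whitespace-separated fields, the sixth being the topic id.
topic_counts() {
    # gzip -dc is used in place of gzcat because it exists on both
    # Linux and macOS.
    gzip -dc "$1" \
        | awk '!/^#/ && NF >= 6 { n[$6]++ } END { for (t in n) print t, n[t] }' \
        | sort -n
}

# Example use after step 2 (prints one "topic count" pair per line):
# topic_counts nsf_10.state.gz
```

Topics with very few assigned tokens may be worth checking against nsf_10.topics before reading too much into them.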