This is a preliminary release of the joint part-of-speech (POS) tagger and syntactic chunker described in the original ICML 2005 paper "Learning as Search Optimization". It is similar, but not identical, to the system described in that paper: it was trained on a different subset of the data, and less care was taken to tune hyperparameters against the development data. There will be a subsequent release based on the new learning technique described in the NIPS 2005 paper "Search-based Structured Prediction", but that is not yet ready for mass consumption. The one released here is also significantly more efficient.
% tagchunk.i686 -predict . (weights file) (test file) (resource directory)

You should replace (weights file) with the name of the weights you wish to use, (test file) with the name of the file you wish to tag, and (resource directory) with the name of the directory into which you extracted the lists. The program writes its output to stdout.
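If you want to call the tagger from a script, the usage line above can be assembled programmatically. The following is only a minimal sketch: the helper names tagchunk_command and run_tagchunk are hypothetical and not part of this release, and the sketch assumes tagchunk.i686 is on your PATH. The arguments are passed exactly in the order shown in the usage line.

```python
import os
import subprocess
from typing import List


def tagchunk_command(weights: str, test_file: str, resource_dir: str,
                     binary: str = "tagchunk.i686") -> List[str]:
    """Build the argument list from the usage line (hypothetical helper).

    Mirrors: tagchunk.i686 -predict . (weights file) (test file) (resource directory)
    """
    return [binary, "-predict", ".", weights, test_file, resource_dir]


def run_tagchunk(weights: str, test_file: str, resource_dir: str,
                 out_path: str) -> None:
    """Run the tagger and save what it writes to stdout in out_path.

    Assumes the binary is installed; nothing here is checked beyond the
    exit status. "~" is expanded here because no shell is involved.
    """
    cmd = tagchunk_command(weights, test_file, os.path.expanduser(resource_dir))
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Only builds and shows the command; it does not run the tagger.
print(" ".join(tagchunk_command("w-1", "test", "~/projects/chunking/")))
```

Calling run_tagchunk("w-1", "test", "~/projects/chunking/", "test.out") would then be the scripted counterpart of redirecting the program's stdout to test.out by hand.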
% cat test
The man with the telescope saw me across the street .
% tagchunk.i686 -predict . w-1 test ~/projects/chunking/ > test.out
Loading lists...list-locations1...list-locations2...list-locations3...list-locations4...list-locations5...list-names1...list-names2...list-names3...list-namesA...list-namesB...list-nes...list-positions1...list-positions2...list-positions3...list-verbs1...list-verbs2...list-verbs3...list-tags-wsj...list-ulfreq...list-tags-all...list-tags-all2...list-mp-address...list-mp-adj...list-mp-aux...list-mp-beforeorg...list-mp-begwords...list-mp-dist...list-mp-nn...list-mp-nn1...list-mp-noun...list-mp-subj...list-mp-units...
% cat test.out
The_DT_B-NP man_NN_I-NP with_IN_B-PP the_DT_B-NP telescope_NN_I-NP saw_VBD_B-VP me_PRP_B-NP across_IN_B-PP the_DT_B-NP street_NN_I-NP ._._B-O

You can ignore the "Loading lists..." line. The output should be fairly clear if you know what POS tags and chunk labels look like.
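For readers less familiar with the format, each output token in the example has the shape word_POS_chunk. The sketch below reads the sample output line and groups words into chunks. It assumes the usual begin/inside reading of the labels (B- opens a chunk, I- continues it), and it splits from the right on the assumption that tags and labels never contain an underscore. The function names parse_line and group_chunks are illustrative only.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, POS tag, chunk label)


def parse_line(line: str) -> List[Token]:
    """Split one output line into (word, POS, chunk) triples."""
    tokens = []
    for item in line.split():
        # Split from the right so a word that itself contains "_" stays intact.
        word, pos, chunk = item.rsplit("_", 2)
        tokens.append((word, pos, chunk))
    return tokens


def group_chunks(tokens: List[Token]) -> List[Tuple[str, List[str]]]:
    """Group words into chunks (assumed convention: B- opens, I- extends)."""
    chunks: List[Tuple[str, List[str]]] = []
    for word, _pos, label in tokens:
        prefix, _, kind = label.partition("-")
        if prefix == "I" and chunks and chunks[-1][0] == kind:
            chunks[-1][1].append(word)
        else:
            chunks.append((kind, [word]))
    return chunks


# The sample output line from the session above.
sample = ("The_DT_B-NP man_NN_I-NP with_IN_B-PP the_DT_B-NP telescope_NN_I-NP "
          "saw_VBD_B-VP me_PRP_B-NP across_IN_B-PP the_DT_B-NP street_NN_I-NP "
          "._._B-O")

for kind, words in group_chunks(parse_line(sample)):
    print(kind, " ".join(words))
```

On the sample line this yields eight chunks, starting with the noun phrase "The man" and ending with the final period, which carries the label B-O.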
% tc.pl [-faster|-lc] file1 ... fileN

Here -faster means to use a beam of size 1 and -lc means to use the lower-case weights. The files (file1, ..., fileN) are tagged and the outputs are written to (file1.tc, ..., fileN.tc).
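As a rough illustration of the tc.pl calling convention, the sketch below builds the argument list for a batch of files and computes where the outputs should appear. The helper name tc_invocation is hypothetical. The sketch also assumes that at most one of -faster and -lc is given, which is one reading of the usage line and not something the script is known to enforce.

```python
from typing import List, Optional, Sequence, Tuple


def tc_invocation(files: Sequence[str], option: Optional[str] = None,
                  script: str = "tc.pl") -> Tuple[List[str], List[str]]:
    """Return (argv, expected output files) for a tc.pl run (hypothetical helper)."""
    if option not in (None, "-faster", "-lc"):
        raise ValueError("option must be None, '-faster' or '-lc'")
    argv = [script] + ([option] if option else []) + list(files)
    # Per the description above, each input file gets a sibling "<name>.tc".
    outputs = [name + ".tc" for name in files]
    return argv, outputs


argv, outputs = tc_invocation(["file1", "file2"], "-faster")
print(argv)
print(outputs)
```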