
LBSC 671 - Creating Information Infrastructures
Spring 2014 - Section 0101
Assignment G13 - Machine-Assisted Indexing


This homework is due before the start of the class session indicated on the syllabus. It should be submitted using your ELMS Wiki page, in the usual way. Partial credit may be awarded.

The goal of this assignment is to gain some experience with text classification and to get a sense of how well it works. To begin, select an online text classification system. You can use any classifier that you wish, but I have tested the following online demos:

Or, if you are feeling particularly ambitious (which is not required!), you can build your own classifier at etcML by providing positive and negative training examples. Or, if you are feeling REALLY ambitious, you can run NLTK or Weka (warning: installing these can be a major undertaking).

Start by trying things out a bit to get a sense of what the classifier you have selected is trying to do. Then search the Web to select a dozen or so examples to be classified and (this is important) classify them yourself before you run the classifier. For example, if your classifier guesses whether the text was written by a man or a woman (none of them do; this is just an example), then you would label each text based on YOUR guess. You can base your guess on certain knowledge (because of the way you selected your examples), or you can just make your best guess (if you really don't know). Then run the classifier on each example and compare the classifier's results with your guess (or knowledge) of the right answer. Tabulate the correct and incorrect answers from the classifier (treating your answers as true), and then see if you can explain why the ones that are right are right and why the ones that are wrong are wrong (for example: Are they misleading examples? Does the classifier pay too much attention to simple features and miss some nuance?).

In your answer, tell me:

  1. Which classifier you selected and what classification task it was performing. If you use an online demo, please provide the URL.
  2. One example of a document that you tried to classify.
  3. The accuracy (percent correct) of your classifier.
  4. Why you think your classifier made the mistakes it made (if it made any!)
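
If you would rather not tally by hand, the tabulation and the percent-correct calculation described above can be sketched in a few lines of Python. This is only an illustration, not a required part of the assignment: the function name `tabulate`, the labels "pos" and "neg", and the dozen example labels are all made up for the sketch, and your own labels are treated as the right answers.

```python
from collections import Counter


def tabulate(my_labels, classifier_labels):
    """Compare classifier output with your own labels (treated as true).

    Returns the counts for each (your label, classifier label) pair,
    the number of correct answers, and the accuracy as a percentage.
    """
    if len(my_labels) != len(classifier_labels):
        raise ValueError("both label lists must have the same length")
    if not my_labels:
        raise ValueError("need at least one example")

    # Count how often each (your label, classifier label) pair occurs.
    pairs = Counter(zip(my_labels, classifier_labels))

    # An answer is correct when the classifier agrees with your label.
    correct = sum(n for (truth, guess), n in pairs.items() if truth == guess)
    accuracy = 100.0 * correct / len(my_labels)
    return pairs, correct, accuracy


# Hypothetical labels for a dozen examples (made up for illustration).
mine = ["pos", "pos", "pos", "pos", "pos", "pos",
        "neg", "neg", "neg", "neg", "neg", "neg"]
system = ["pos", "pos", "pos", "pos", "neg", "neg",
          "neg", "neg", "neg", "neg", "neg", "pos"]

pairs, correct, accuracy = tabulate(mine, system)
for (truth, guess), n in sorted(pairs.items()):
    print(f"my label={truth}  classifier={guess}  count={n}")
print(f"{correct} of {len(mine)} correct ({accuracy:.1f}%)")
```

The pair counts show which kind of mistake is more common (here, texts you labeled "pos" that the classifier called "neg"), which is a useful starting point for explaining the errors.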

Doug Oard Last modified: Sun May 4 22:35:05 2014