- Introduction
- Judging Classifier Goodness *[20%]*
- Reductions for Multiclass Classification *[40%]*
- Gradient Descent and Linear Classification *[40%]*
- Extra Credit: Collective Classification *[EC: 20%]*

| Files you'll edit: | |
|---|---|
| `gd.py` | Where you will put your gradient descent implementation. |
| `linear.py` | Where your generic "regularized linear classifier" implementation will go. |
| `multiclass.py` | Where your multiclass reduction implementations (AVA and the tree-based reduction) will go. |

| Files you might want to look at: | |
|---|---|
| `binary.py` | Our generic interface for binary classifiers (actually works for regression and other types of classification, too). |
| `datasets.py` | Where a handful of test data sets are stored. |
| `mlGraphics.py` | A few useful plotting commands. |
| `runClassifier.py` | A few wrappers for doing useful things with classifiers, like training them, generating learning curves, etc. |
| `util.py` | A handful of useful utility functions: these will undoubtedly be helpful to you, so take a look! |
| `data/*` | All the datasets that we'll use. |

**What to submit:** You will hand in all of the Python files listed
above under "Files you'll edit", as well as a `partners.txt` file that
lists the **names** and **uids** (first four digits) of all members in
your team. Finally, you'll hand in a `writeup.pdf` file that answers all
the written questions in this assignment (denoted by **WU#:** in
this `.html` file).

**Evaluation:** Your code will be autograded for
technical correctness. Please *do not* change the names of any
provided functions or classes within the code, or you will wreak havoc
on the autograder. However, the correctness of your implementation --
not the autograder's output -- will be the final judge of your score.
If necessary, we will review and grade assignments individually to
ensure that you receive due credit for your work.

**Academic Dishonesty:** We will be checking your code
against other submissions in the class for logical redundancy. If you
copy someone else's code and submit it with minor changes, we will
know. These cheat detectors are quite hard to fool, so please don't
try. We trust you all to submit your own work only; *please*
don't let us down. If you do, we will pursue the strongest
consequences available to us.

**Getting Help:** You are not alone! If you find
yourself stuck on something, contact the course staff for help.
Office hours, class time, and the mailing list are there for your
support; please use them. If you can't make our office hours, let us
know and we will schedule more. We want these projects to be
rewarding and instructional, not frustrating and demoralizing. But,
we don't know when or how to help unless you ask. One more piece of
advice: if you don't know what a variable is, print it out.

**WU2 (10%):** On the sentiment data, use FastDT to train decision
trees of every depth from 1 to 20. Use the development data to choose
an optimal depth, call it d*. What development error do you get for
d*? Which other depths are *not statistically significantly worse*
than d*? Use a t-test at the 95% confidence level to answer this
question. Please write a couple of sentences describing what you did
to evaluate this, as well as what your answer is.
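One way to set up the comparison in WU2 is a paired t-test on per-example errors. The sketch below is only an illustration under stated assumptions: it assumes you have collected a 0/1 error for every development example under two different depths, and the function name `paired_t_statistic` and the toy error lists are hypothetical, not part of the provided code.

```python
import math
import statistics

def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic for two aligned lists of per-example errors.

    Assumption: errors_a[i] and errors_b[i] are the 0/1 errors of two
    classifiers on the same development example i.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    if std_err == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / std_err

# Made-up errors for illustration only (1 = mistake, 0 = correct).
errors_best = [0, 1, 0, 0, 1, 0, 0, 0]
errors_other = [1, 1, 0, 1, 1, 0, 1, 0]
t = paired_t_statistic(errors_best, errors_other)
```

The statistic has `n - 1` degrees of freedom. Its magnitude is then compared against the critical value for your chosen level (for large `n`, about 1.96 for a two-sided test at the 5% level).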

You must implement two reductions: AVA and a tree-based reduction. The `multiclass.py` file that comes with this project is identical to the one from the lab, except for the addition of an extra class for trees. First, implement AVA; see the lab for test cases. Second, implement the tree-based reduction. Most of `train` is given to you, but you must write `predict` entirely on your own. I've provided a tree class to help you:

    >>> t = makeBalancedTree(range(6))
    >>> t
    [[0 [1 2]] [3 [4 5]]]
    >>> t.isLeaf
    False
    >>> t.getLeft()
    [0 [1 2]]
    >>> t.getLeft().getLeft()
    0
    >>> t.getLeft().getLeft().isLeaf
    True
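To make the idea of predicting with a label tree concrete, here is a small standalone sketch. It does not use the provided tree class: the nested-tuple representation and the names `make_balanced`, `leaves`, `predict_with_tree` and `go_right` are all hypothetical stand-ins. In the real reduction, each internal node would consult its own trained binary classifier instead of the oracle used here.

```python
def make_balanced(labels):
    """Build a balanced binary tree over labels as nested tuples."""
    labels = list(labels)
    if len(labels) == 1:
        return labels[0]          # a leaf is just the label itself
    mid = len(labels) // 2
    return (make_balanced(labels[:mid]), make_balanced(labels[mid:]))

def leaves(tree):
    """List the labels under a node, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def predict_with_tree(tree, go_right):
    """Walk from the root to a leaf.

    go_right(node) is a stand-in for the binary decision made at an
    internal node; it returns True to descend into the right child.
    """
    node = tree
    while isinstance(node, tuple):
        node = node[1] if go_right(node) else node[0]
    return node

tree = make_balanced(range(6))

# A toy oracle that always steers toward label 4.
oracle = lambda node: 4 in leaves(node[1])
predicted = predict_with_tree(tree, oracle)
```

Each prediction makes one binary decision per level of the tree, so a balanced tree over K classes needs roughly log2(K) decisions.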

**WU4 (10%):** Using decision trees of constant depth for each
classifier (but you choose it as well as you can!), train AVA, OVA and
Tree (using balanced trees) for the wine data. Which does best?

**WEC (5%):** Build a better tree (any way you want) other than the
balanced binary tree. Fill in your code for this
in `getMyTreeForWine`, which defaults to a balanced tree.

In each iteration of gradient descent, we will compute the gradient
and take a step in the opposite direction (we are minimizing), with
step size `eta`. We will have an *adaptive* step size, where `eta` is
computed as `stepSize` divided by the square root of the iteration
number (counting from one).
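The update rule described above can be sketched in a few lines. This is a minimal illustration, not the provided `gd.py` stub: the name `gd_sketch` and its argument names are assumptions, and it records the objective values in a plain Python list where the project code returns a NumPy array.

```python
import math

def gd_sketch(func, grad, x0, num_iter, step_size):
    """Gradient descent with step size step_size / sqrt(iteration)."""
    x = x0
    trajectory = [func(x)]               # objective value at the start
    for t in range(1, num_iter + 1):     # iterations count from one
        eta = step_size / math.sqrt(t)   # decaying step size
        x = x - eta * grad(x)            # move against the gradient
        trajectory.append(func(x))
    return x, trajectory

# Minimize x^2 starting from 10, for 10 iterations with stepSize 0.2.
x, trajectory = gd_sketch(lambda x: x**2, lambda x: 2*x, 10, 10, 0.2)
```

Because the update uses only subtraction and scalar multiplication, the same loop also works when `x0` is a NumPy array.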

Once you have an implementation running, we can check it on a simple
example of minimizing the function `x^2`:

    >>> gd.gd(lambda x: x**2, lambda x: 2*x, 10, 10, 0.2)
    (1.0034641051795872, array([ 100. , 36. , 18.5153247 ,
           10.95094653, 7.00860578, 4.72540613, 3.30810578,
           2.38344246, 1.75697198, 1.31968118, 1.00694021]))

You can see that the "solution" found is about 1, which is not great (it should be zero!), but it's better than the initial value of ten! If yours is going up rather than going down, you probably have a sign error somewhere!

We can let it run longer and plot the trajectory:

    >>> x, trajectory = gd.gd(lambda x: x**2, lambda x: 2*x, 10, 100, 0.2)
    >>> x
    0.003645900464603937
    >>> plot(trajectory)

It's now found a value close to zero and you can see that the objective is decreasing by looking at the plot.

**WU5 (5%):** Find a few values of step size where it converges and
a few values where it diverges. Where does the threshold seem to
be?

**WU6 (5%):** Come up with a *non-convex* univariate
optimization problem. Plot the function you're trying to minimize and
show two runs of `gd`, one where it gets caught in a local
minimum and one where it manages to make it to a global minimum. (Use
different starting points to accomplish this.)

If you implemented it well, this should work in multiple dimensions, too:

    >>> x, trajectory = gd.gd(lambda x: linalg.norm(x)**2, lambda x: 2*x, array([10,5]), 100, 0.2)
    >>> x
    array([ 0.0036459 , 0.00182295])
    >>> plot(trajectory)

Our generic linear classifier implementation is in

There are three loss function stubs: `SquaredLoss` (which is
implemented for you!), `LogisticLoss` and `HingeLoss`
(both of which you'll have to implement). My suggestion is to hold off
implementing the other two until you have the linear classifier
working.

The `LinearClassifier` class is a stub implementation of a
generic linear classifier with an L2 regularizer. It
is *unbiased* (it has no bias term), so all you have to take care of
are the weights. Your implementation should go in `train`, which has a
handful of stubs. The idea is to just pass appropriate functions
to `gd` and have it do all the work. See the comments inline
in the code for more information.
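As a rough picture of what "appropriate functions" might look like, the sketch below builds an objective and its gradient for squared loss with an L2 regularizer. It is an illustration under assumptions, not the provided stub: the helper name `make_objective`, the factor of 1/2 on both terms, and the use of a sum (not a mean) over examples are all choices made here, and the provided code may scale things differently.

```python
import numpy as np

def make_objective(X, Y, lam):
    """Return (func, grad) for 0.5*||Y - Xw||^2 + 0.5*lam*||w||^2."""
    def func(w):
        residual = Y - X.dot(w)
        return 0.5 * residual.dot(residual) + 0.5 * lam * w.dot(w)

    def grad(w):
        residual = Y - X.dot(w)
        # Derivative of the loss term plus derivative of the regularizer.
        return -X.T.dot(residual) + lam * w

    return func, grad

# Tiny made-up data set: three examples, two features, labels in {-1, +1}.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([1.0, -1.0, 1.0])
func, grad = make_objective(X, Y, lam=1.0)
w0 = np.zeros(2)   # unbiased model: weights only, no bias term
```

The two closures `func` and `grad`, together with the starting point `w0`, are the kind of arguments a gradient descent routine expects. Comparing `grad` against a finite-difference estimate of `func` is a quick way to catch sign errors.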

Once you've implemented the function evaluation and gradient, we can
test this. We'll begin with a very simple 2D example data set so that
we can plot the solutions. We'll also start with *no
regularizer* to help you figure out where errors might be if you
have them. (You'll have to import `mlGraphics` to make this
work.)

    >>> h = linear.LinearClassifier({'lossFunction': linear.SquaredLoss(), 'lambda': 0, 'numIter': 100, 'stepSize': 0.5})
    >>> runClassifier.trainTestSet(h, datasets.TwoDAxisAligned)
    Training accuracy 0.91, test accuracy 0.86
    >>> h
    w=array([ 2.73466371, -0.29563932])
    >>> mlGraphics.plotLinearClassifier(h, datasets.TwoDAxisAligned.X, datasets.TwoDAxisAligned.Y)

Note that even though this data is clearly linearly separable, the

If we change the regularizer, we'll get a slightly different solution:

    >>> h = linear.LinearClassifier({'lossFunction': linear.SquaredLoss(), 'lambda': 10, 'numIter': 100, 'stepSize': 0.5})
    >>> runClassifier.trainTestSet(h, datasets.TwoDAxisAligned)
    Training accuracy 0.9, test accuracy 0.86
    >>> h
    w=array([ 1.30221546, -0.06764756])

As expected, the weights are

Now, we can try different loss functions. Implement logistic loss and hinge loss. Here are some simple test cases:

    >>> h = linear.LinearClassifier({'lossFunction': linear.SquaredLoss(), 'lambda': 10, 'numIter': 100, 'stepSize': 0.5})
    >>> runClassifier.trainTestSet(h, datasets.TwoDDiagonal)
    Training accuracy 0.98, test accuracy 0.86
    >>> h
    w=array([ 0.33864367, 1.28110942])
    >>> h = linear.LinearClassifier({'lossFunction': linear.HingeLoss(), 'lambda': 1, 'numIter': 100, 'stepSize': 0.5})
    >>> runClassifier.trainTestSet(h, datasets.TwoDDiagonal)
    Training accuracy 0.98, test accuracy 0.86
    >>> h
    w=array([ 1.17110065, 4.67288657])
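For reference, the textbook forms of the three losses and their gradients with respect to the weights are written out below. These are the standard definitions, not necessarily the exact scaling used in the provided code: here the prediction is $\hat{y} = w \cdot x$, labels are $y \in \{-1, +1\}$, and the factor of $\tfrac{1}{2}$ on the squared loss and on the regularizer is an assumption you should check against `SquaredLoss` and the inline comments.

```latex
\begin{align*}
\ell_{\text{sq}}(y, \hat{y})    &= \tfrac{1}{2}\,(y - \hat{y})^2,
  & \nabla_w \ell_{\text{sq}}    &= -(y - \hat{y})\, x \\
\ell_{\text{log}}(y, \hat{y})   &= \log\bigl(1 + e^{-y \hat{y}}\bigr),
  & \nabla_w \ell_{\text{log}}   &= -\frac{y\, x}{1 + e^{\,y \hat{y}}} \\
\ell_{\text{hinge}}(y, \hat{y}) &= \max\bigl(0,\; 1 - y \hat{y}\bigr),
  & \nabla_w \ell_{\text{hinge}} &=
    \begin{cases}
      -y\, x & \text{if } y \hat{y} < 1 \\
      0      & \text{otherwise}
    \end{cases}
\end{align*}
% Regularized objective over N examples, and its gradient:
\begin{align*}
F(w) &= \sum_{n=1}^{N} \ell\bigl(y_n,\, w \cdot x_n\bigr) + \tfrac{\lambda}{2}\,\lVert w \rVert^2,
  & \nabla F(w) &= \sum_{n=1}^{N} \nabla_w \ell\bigl(y_n,\, w \cdot x_n\bigr) + \lambda\, w
\end{align*}
```

The hinge loss is not differentiable at $y\hat{y} = 1$; the expression above is one valid subgradient choice.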

**WU8 (5%):** For each of the loss functions, train a model on the
binary version of the wine data (called WineDataBinary) and evaluate
it on the test data. You should use lambda=1 in all cases. Which works
best? For that best model, look at the learned weights. Find
the *words* corresponding to the weights with the greatest
positive value and those with the greatest negative value (this is
like LAB3). Hint: look at WineDataBinary.words to get the id-to-word
mapping. List the top 5 positive and top 5 negative and explain.
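Pulling out the most strongly weighted words can be done with a simple sort. The sketch below is a generic illustration: the function name `top_weighted_words` and the toy weights and words are made up, and it assumes only that the weight vector and the word list are indexed by the same feature ids.

```python
def top_weighted_words(weights, words, k=5):
    """Return the k most positive and k most negative (word, weight) pairs."""
    # Feature ids sorted from most negative weight to most positive.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    most_negative = [(words[i], weights[i]) for i in order[:k]]
    most_positive = [(words[i], weights[i]) for i in reversed(order[-k:])]
    return most_positive, most_negative

# Made-up example: five features with placeholder words.
toy_weights = [0.5, -1.2, 0.0, 2.0, -0.3]
toy_words = ["w0", "w1", "w2", "w3", "w4"]
positive, negative = top_weighted_words(toy_weights, toy_words, k=2)
```

The function only indexes into `weights`, so a NumPy weight vector can be passed in place of the list.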