\documentclass[fleqn]{article}
\usepackage{haldefs}
\usepackage{notes}
\usepackage{url}
\begin{document}
\lecture{Machine Learning}{HW04: Complex predictions}{CS 726, Fall 2011}
% IF YOU ARE USING THIS .TEX FILE AS A TEMPLATE, PLEASE REPLACE
% "CS 726, Fall 2011" WITH YOUR NAME AND UID.
Hand in at: \url{http://www.cs.utah.edu/~hal/handin.pl?course=cs726}.
Remember that only PDF submissions are accepted. We encourage using
\LaTeX\ to produce your writeups. See \verb+hw00.tex+ for an example
of how to do so. You can make a \verb+.pdf+ out of the \verb+.tex+ by
running ``\verb+pdflatex hw00.tex+''.
\begin{enumerate}
\item Define, in a manner analogous to the way Tasks are defined in
Chapter 5, the regression problem under squared loss (refer back to
Section 1.4 if you need to).
%\begin{solution}
% Given: TODO
%
% Compute: TODO
%\end{solution}
\item All of the theoretical results for complex classification say
something like ``if a binary classifier gets error at most $\ep$,
then the error on my more complex problem will be at most $g(\ep)$''
(where $g$ is whatever is appropriate for the particular
algorithm). Hopefully you realize that there are multiple types of
error that matter, for instance: training error and expected test
error. To what type(s) of error do these theorems apply?
%\begin{solution}
% TODO
%\end{solution}
\item On the face of it, AVA seems more computationally intensive at
  training time than OVA because it trains $\cO(K^2)$ classifiers
  rather than $\cO(K)$ classifiers. However, each of the $K$-many OVA
  classifiers is trained on the full data set of $N$ examples, while
  each of the $\cO(K^2)$ AVA classifiers is trained on only a subset
  of the data. Suppose that you have $N$ data points, divided evenly
  into $K$ classes (so that there are $N/K$ examples per class).
\begin{enumerate}
\item Suppose that the training time for your binary classifier is
linear in the number of examples it receives. What is the
complexity of training OVA and AVA, as a function of $N$ and $K$?
%\begin{solution}
% TODO
%\end{solution}
  \item Now suppose that the training time is quadratic in the number
    of examples. What is the complexity of training OVA and AVA?
%\begin{solution}
% TODO
%\end{solution}
\end{enumerate}
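To make the setup concrete (this is only a sanity check of the
problem statement, not the requested asymptotic analysis; the values
of $N$ and $K$ are arbitrary illustrative choices), the following
sketch counts how many training examples each scheme feeds to its
binary classifiers:

\begin{verbatim}
# Count, for OVA and AVA, the total number of training examples
# seen across all binary classifiers, assuming N points divided
# evenly into K classes (N = 1200 and K = 6 are illustrative).
from itertools import combinations

N, K = 1200, 6
per_class = N // K

# OVA: K classifiers, each trained on all N examples.
ova_total = sum(N for _ in range(K))

# AVA: one classifier per unordered pair of classes, each trained
# only on the examples of those two classes (2N/K of them).
ava_total = sum(2 * per_class for _ in combinations(range(K), 2))

print(ova_total)  # K * N = 7200
print(ava_total)  # (K choose 2) * 2N/K = 6000
\end{verbatim}

Under linear-time training, these totals are exactly the quantities
whose growth in $N$ and $K$ part (a) asks you to characterize.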
\item Define a ranking preference function $\om$ that penalizes
mispredictions \emph{linearly} up to a threshold $K$. In other
words, for $K=20$, if I put the object that should be in position
$5$ in position $20$, then I pay $\$15$; if I put it in position
$30$, I only pay $\$20$ because nothing costs more than $K=\$20$.
%\begin{solution}
% TODO
%\end{solution}
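The dollar figures in the problem statement can be reproduced with a
short script. This is only a numeric illustration of the intended
cost behavior, not the formal definition of $\om$ that the problem
asks for:

\begin{verbatim}
# Truncated linear penalty from the example above: placing an item
# that belongs in position i into position j costs |i - j|, capped
# at the threshold K.
def truncated_cost(i, j, K=20):
    return min(abs(i - j), K)

print(truncated_cost(5, 20))  # 15, as in the example
print(truncated_cost(5, 30))  # 20, capped at K
\end{verbatim}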
\end{enumerate}
\end{document}