\documentclass[fleqn]{article}
\usepackage{haldefs}
\usepackage{notes}
\usepackage{url}
\begin{document}
\lecture{Machine Learning}{HW05: Gradient descent and friends}{CS 726, Fall 2011}
% IF YOU ARE USING THIS .TEX FILE AS A TEMPLATE, PLEASE REPLACE
% "CS 726, Fall 2011" WITH YOUR NAME AND UID.
Hand in at: \url{http://www.cs.utah.edu/~hal/handin.pl?course=cs726}.
Remember that only PDF submissions are accepted. We encourage using
\LaTeX\ to produce your writeups. See \verb+hw00.tex+ for an example
of how to do so. You can make a \verb+.pdf+ out of the \verb+.tex+ by
running ``\verb+pdflatex hw00.tex+''.
\begin{enumerate}
\item Show that logistic loss (Equation 6.5 on p87 of the book) is
  convex for a fixed value of $y \in \pm 1$ and \emph{as a function of
    $\hat y$}. It's easiest (shortest, least cumbersome) to do this in
  terms of derivatives, but you could also do it directly from the
  definition of convexity in terms of chords if you prefer.
%\begin{solution}
%\end{solution}
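As a reminder (not a solution), the derivative route uses the standard second-order characterization of convexity; assuming the usual form of logistic loss (the book's Equation 6.5 may include a normalizing constant), the object to differentiate is:
\[
  \ell^{(\mathrm{log})}(y, \hat y) = \log\bigl(1 + \exp(-y \hat y)\bigr),
  \qquad
  \ell \text{ convex in } \hat y
  \iff
  \frac{\partial^2 \ell}{\partial \hat y^2} \geq 0 \text{ for all } \hat y.
\]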
\item Show that if $f(z)$ is convex in $z$, then $f(\vec w \cdot \vec
x)$ is convex in $\vec w$ for a fixed $\vec x$. Note: show it
\emph{directly}: simply saying that linear functions are convex and
composition of convex functions is convex is \emph{not} an
acceptable answer. In particular, show it in terms of the chord
definition of convexity. I've started the solution below to set up
some notation that you're free to use or erase.
\begin{solution}
  Let $\vec u$ and $\vec w$ be given, and let $\be \in [0,1]$. Let
  $\vec v = \be \vec u + (1-\be) \vec w$ (so that $\vec v$ is between
  $\vec u$ and $\vec w$). Define $f_u = f(\vec u \cdot \vec x)$ and
  similarly for $f_v$ and $f_w$. We wish to show that $f_v \leq \be
  f_u + (1-\be) f_w$.
TODO: your part here
\end{solution}
\item You might notice that Algorithm 23 (for subgradient descent on
  regularized hinge loss) looks a \emph{lot} like Algorithm 5 (the
  original perceptron algorithm). In fact, the most substantial
  differences are in line 5, where HingeRegularizedGD compares $y(\vec
  w \cdot \vec x + b) \leq 1$ whereas Perceptron compares $\dots \leq
  0$, and in line 10, where the weights are regularized. How would you
  have to change the hinge loss and change the regularizer to make
  these two differences go away? (Note that even with these changes,
  Perceptron would make updates after \emph{each} example, while
  HingeRegularizedGD doesn't update until after processing \emph{all}
  examples, so they're not completely identical.)
%\begin{solution}
%\end{solution}
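To make the comparison concrete, here is a small sketch of the two update schemes. This is \emph{not} the book's pseudocode: the function names, step size \texttt{eta}, and regularization strength \texttt{lam} are illustrative choices, but the two differences described above (the margin test and the regularization step) appear exactly where noted.

```python
import numpy as np

def perceptron_epoch(w, b, X, Y):
    """One pass of the classic perceptron: update immediately on each
    example that is misclassified, i.e. y * (w . x + b) <= 0."""
    for x, y in zip(X, Y):
        if y * (np.dot(w, x) + b) <= 0:   # mistake: update right away
            w = w + y * x
            b = b + y
    return w, b

def hinge_regularized_gd_epoch(w, b, X, Y, lam=0.1, eta=0.5):
    """One step of subgradient descent on L2-regularized hinge loss:
    accumulate a subgradient over ALL examples (margin test
    y * (w . x + b) <= 1), then take a single step and shrink w."""
    g_w = np.zeros_like(w)
    g_b = 0.0
    for x, y in zip(X, Y):
        if y * (np.dot(w, x) + b) <= 1:   # inside the margin
            g_w += y * x
            g_b += y
    w = w + eta * g_w - eta * lam * w     # gradient step + regularization
    b = b + eta * g_b
    return w, b
```

Running both on the same data makes the two differences easy to see: the perceptron touches $\vec w$ only on mistakes, while the hinge-loss version also updates inside the margin and always shrinks $\vec w$ via the regularizer.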
\end{enumerate}
\end{document}