Machine Learning
CMSC 726
Fall 2011
Machine learning is all about finding patterns in data. The whole
idea is to replace the "human writing code" with a "human supplying
data" and then let the system figure out what it is that the person
wants to do by looking at the examples. The most central concept in
machine learning is generalization: how to generalize beyond
the examples that have been provided at "training time" to new
examples that you see at "test time." A very large fraction of what
we'll talk about has to do with figuring out what generalization
means. We'll look at it from lots of different perspectives and
hopefully gain some understanding of what's going on.
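To make the "training time" vs. "test time" distinction concrete, here is a minimal sketch of a learner that memorizes labeled examples and then labels new points by their nearest neighbor. The points and labels below are made up purely for illustration (they are not course data); the point is just that the system is judged on examples it never saw while training.

```python
# A toy illustration of generalization: learn from "training time" examples,
# then predict labels for new "test time" points with a 1-nearest-neighbor
# rule. The data below is invented purely for illustration.
import math

# Training examples the learner is allowed to see: (features, label).
train = [((1.0, 1.0), "recommend"), ((1.2, 0.8), "recommend"),
         ((4.0, 4.2), "avoid"),     ((4.5, 3.8), "avoid")]

def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def predict(x):
    """Copy the label of the closest training example."""
    closest = min(train, key=lambda ex: euclidean(ex[0], x))
    return closest[1]

# Test-time examples: never seen during training, but the learner
# generalizes from nearby training points.
print(predict((1.1, 0.9)))  # -> "recommend"
print(predict((4.2, 4.0)))  # -> "avoid"
```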
This class will showcase machine learning technology in the context of
recommender systems, à la what you see on Amazon or Netflix (or
eHarmony). The data we'll be working with is recommendations for CS
courses at UMD!
There are a few cool things about machine learning that I hope to get
across in class. The first is that it's broadly applicable. These
techniques have led to significant advances in many fields, including
stock trading, robotics, machine translation, computer vision,
medicine, etc. The second is that there is a very close connection
between theory and practice. While this course is more on the
"practical" side of things, almost everything we will talk about has a
huge amount of accompanying theory. The third is that once you
understand the basics of machine learning technology, it's a very open
field, and a lot of progress can be made quickly, essentially by
finding ways to formalize whatever we can figure out about the
world.
Prerequisites: I take prerequisites seriously. There will be a
lot of math in this class and if you do not come prepared, life
will be rough. You should be able to take derivatives by hand
(preferably of multivariate functions), you should know what dot
products are and how they are related to projections onto subspaces,
you should know what Bayes' rule is and you should know that it's okay
for the density of a Gaussian probability distribution to be greater
than one. I've provided some reading
material to refresh these topics in your head, but if you haven't
at least seen these things before, you should beef up your math
background before class begins. On the
programming side, projects will be in Python; you should understand basic
computer science concepts (like recursion), basic data structures
(trees, graphs), and basic algorithms (search, sorting, etc.). (If
you know Matlab, here's a nice cheat sheet.)
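If you want a quick way to check yourself on those math prerequisites, here is a small Python sketch (the specific numbers are mine, purely for illustration; it assumes NumPy and SciPy are installed) that works through a projection, an application of Bayes' rule, and a Gaussian density greater than one:

```python
# Quick self-check of the math prerequisites. The numbers are made up
# for illustration; requires NumPy and SciPy.
import numpy as np
from scipy.stats import norm

# Dot products and projections: the projection of b onto the line
# spanned by a is (a.b / a.a) * a.
a = np.array([3.0, 4.0])
b = np.array([2.0, 1.0])
proj = (np.dot(a, b) / np.dot(a, a)) * a   # -> [1.2, 1.6]

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B).
p_a, p_b_given_a, p_b = 0.01, 0.9, 0.05
p_a_given_b = p_b_given_a * p_a / p_b      # -> 0.18

# A Gaussian *density* can exceed one: N(0, sigma=0.1) evaluated at 0
# is about 3.99, even though probabilities never exceed one.
print(norm.pdf(0.0, loc=0.0, scale=0.1))   # ~3.99
```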
The purpose of grading (in my mind) is to provide extra incentive for
you to keep up with the material and to ensure that you exit the class
as a machine learning genius. If everyone gets an A, that
would make me happy (sadly, it hasn't happened yet). The components
of grading are:

| 27% | Programming projects: There are three programming projects, each worth 9% of your final grade. You will be graded on both code correctness and your analysis of the results. These must be completed in teams of two or three students. |
| 18% | Written homeworks: There are thirteen written homeworks (one per week), each worth 1.5% of your final grade (lowest one dropped). They will be graded on a high-pass (100%), low-pass (50%), or fail (0%) basis. These are to be completed individually. (The initial homework, HW00, is not graded, but is required if you do not want to fail.) |
| 25% | Midterm exam: Roughly halfway through the semester, there will be a midterm exam that covers everything up until that point. Obviously it is to be completed individually, but it is open-book. |
| 25% | Final (practical) exam: Everyone is to complete a final project, in teams of arbitrary size, which will play the role of a practical final exam. We will discuss the scope of the project later in class. |
| 5% | Class participation: You will be graded on your in-class presentations of homework questions and other general participation, including participation in the comments on the blog. This is mostly subjective. |
| Date | Topics | Readings | Due | Notes |
| 01 Sep | [1] What is machine learning? | CIML 1-1.2 | - | - |
| 06 Sep | [2] Decision trees and inductive bias | CIML 1.3-1.9 | HW00 | - |
| 08 Sep | [3] Geometry and nearest neighbors | CIML 2-2.3 | HW01 | - |
| 13 Sep | [4] K-means clustering | CIML 2.4-2.6 | - | - |
| 15 Sep | [5] Perceptrons | CIML 3-3.4 | HW02 | - |
| 20 Sep | [6] Perceptrons II | CIML 3.5-3.7 | - | - |
| 22 Sep | [7] Practical issues and evaluation | CIML 4-4.8 | HW03 | - |
| 27 Sep | [8] Imbalanced and multiclass classification | CIML 5-5.2 | P1 | - |
| 29 Sep | [9] Ranking and collective classification | CIML 5.3-5.5 | HW04 | - |
| 04 Oct | [10] Linear models and gradient descent | CIML 6-6.4 | - | - |
| 06 Oct | [11] Subgradient descent and support vector machines | CIML 6.5-6.7 | HW05 | - |
| 11 Oct | [12] Probabilistic modeling | CIML 7 | - | - |
| 13 Oct | [13] Probabilistic modeling II | CIML 7 | HW06 | - |
| 18 Oct | [14] Neural networks | CIML 8 | - | - |
| 20 Oct | [15] Neural networks II | CIML 8 | HW07 | - |
| 25 Oct | [16] Kernel methods | CIML 9-9.4 | - | - |
| 27 Oct | [17] Kernel methods II | CIML 9.5-9.6 | HW08 | - |
| 01 Nov | [18] Ensemble methods | CIML 11 | P2 | - |
| 03 Nov | [19] Efficient learning | CIML 12 | HW09 | - |
| 08 Nov | [20] Linear unsupervised learning | CIML 13-13.2 | Midterm | - |
| 10 Nov | [21] Non-linear unsupervised learning | CIML 13.3-13.5 | HW10 | - |
| 15 Nov | [22] Expectation maximization | CIML 14-14.3 | - | - |
| 17 Nov | [23] Expectation maximization II | CIML 14.4-14.5 | HW11 | - |
| 22 Nov | [24] Semi-supervised learning | ssl_survey (sec 2-4) | - | - |
| 29 Nov | [25] Hidden Markov models | hmms-sl | - | - |
| 01 Dec | [26] Graphical models | bp | HW12 | - |
| 06 Dec | [27] Online learning | online (1-75) | P3 | - |
| 08 Dec | [28] Structured learning | - | - | - |
| 13 Dec | [29] Bayesian learning | bayes-slides | HW13 | - |