Machine Learning
CMSC 726
Fall 2011
Machine learning is all about finding patterns in data. The whole
idea is to replace the "human writing code" with a "human supplying
data" and then let the system figure out what it is that the person
wants to do by looking at the examples. The most central concept in
machine learning is generalization: how to generalize beyond
the examples that have been provided at "training time" to new
examples that you see at "test time." A very large fraction of what
we'll talk about has to do with figuring out what generalization
means. We'll look at it from lots of different perspectives and
hopefully gain some understanding of what's going on.
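To make the "training time" vs. "test time" distinction concrete, here is a minimal sketch of a learner that memorizes labeled examples and then labels new points by their nearest neighbor. The points and labels below are made up purely for illustration (they are not course data); the point is just that the system is judged on examples it never saw while training.

```python
# A toy illustration of generalization: learn from "training time" examples,
# then predict labels for new "test time" points with a 1-nearest-neighbor
# rule. The data below is invented purely for illustration.
import math

# Training examples the learner is allowed to see: (features, label).
train = [((1.0, 1.0), "recommend"), ((1.2, 0.8), "recommend"),
         ((4.0, 4.2), "avoid"),     ((4.5, 3.8), "avoid")]

def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def predict(x):
    """Copy the label of the closest training example."""
    closest = min(train, key=lambda ex: euclidean(ex[0], x))
    return closest[1]

# Test-time examples: never seen during training, but the learner
# generalizes from nearby training points.
print(predict((1.1, 0.9)))  # -> "recommend"
print(predict((4.2, 4.0)))  # -> "avoid"
```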
This class will showcase machine learning technology in the context of
recommender systems, à la what you see on Amazon or Netflix (or
eHarmony). The data we'll be working with is recommendations for CS
courses at UMD!
There are a few cool things about machine learning that I hope to get
across in class. The first is that it's broadly applicable. These
techniques have led to significant advances in many fields, including
stock trading, robotics, machine translation, computer vision,
medicine, etc. The second is that there is a very close connection
between theory and practice. While this course is more on the
"practical" side of things, almost everything we will talk about has a
huge amount of accompanying theory. The third is that once you
understand the basics of machine learning technology, it's a very open
field, and a lot of progress can be made quickly, essentially by
finding ways to formalize whatever we can figure out about the
world.
Prerequisites: I take prerequisites seriously. There will be a
lot of math in this class and if you do not come prepared, life
will be rough. You should be able to take derivatives by hand
(preferably of multivariate functions), you should know what dot
products are and how they are related to projections onto subspaces,
you should know what Bayes' rule is and you should know that it's okay
for the density of a Gaussian probability distribution to be greater
than one. I've provided some reading
material to refresh these topics in your head, but if you haven't
at least seen these things before, you should beef up your math
background before class begins. On the
programming side, projects will be in Python; you should understand basic
computer science concepts (like recursion), basic data structures
(trees, graphs), and basic algorithms (search, sorting, etc.). (If
you know Matlab, here's a nice cheat sheet.)
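If you want a quick way to check yourself on those math prerequisites, here is a small Python sketch (the specific numbers are mine, purely for illustration; it assumes NumPy and SciPy are installed) that works through a projection, an application of Bayes' rule, and a Gaussian density greater than one:

```python
# Quick self-check of the math prerequisites. The numbers are made up
# for illustration; requires NumPy and SciPy.
import numpy as np
from scipy.stats import norm

# Dot products and projections: the projection of b onto the line
# spanned by a is (a.b / a.a) * a.
a = np.array([3.0, 4.0])
b = np.array([2.0, 1.0])
proj = (np.dot(a, b) / np.dot(a, a)) * a   # -> [1.2, 1.6]

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B).
p_a, p_b_given_a, p_b = 0.01, 0.9, 0.05
p_a_given_b = p_b_given_a * p_a / p_b      # -> 0.18

# A Gaussian *density* can exceed one: N(0, sigma=0.1) evaluated at 0
# is about 3.99, even though probabilities never exceed one.
print(norm.pdf(0.0, loc=0.0, scale=0.1))   # ~3.99
```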
The purpose of grading (in my mind) is to provide extra incentive for
you to keep up with the material and to ensure that you exit the class
as a machine learning genius. If everyone gets an A, that
would make me happy (sadly, it hasn't happened yet). The components
of grading are:

| 27% | Programming projects: There are three programming projects, each worth 9% of your final grade. You will be graded on both code correctness and your analysis of the results. These must be completed in teams of two or three students. |
| 18% | Written homeworks: There are thirteen written homeworks (one per week), each worth 1.5% of your final grade (lowest one dropped). They will be graded on a high-pass (100%), low-pass (50%), or fail (0%) basis. These are to be completed individually. (The initial homework, HW00, is not graded, but is required if you do not want to fail.) |
| 25% | Midterm exam: Roughly halfway through the semester, there will be a midterm exam that covers everything up until that point. Obviously it is to be completed individually, but it is open-book. |
| 25% | Final (practical) exam: Everyone is to complete a final project, in teams of arbitrary size, which will play the role of a practical final exam. We will discuss the scope of the project later in class. |
| 5% | Class participation: You will be graded on your in-class presentations of homework questions and other general participation, including participation in the comments on the blog. This is mostly subjective. |
| Date | Topics | Readings | Due | Notes |
| 01 Sep | [1] What is machine learning? | CIML 1-1.2 | - | - |
| 06 Sep | [2] Decision trees and inductive bias | CIML 1.3-1.9 | HW00 | - |
| 08 Sep | [3] Geometry and nearest neighbors | CIML 2-2.3 | HW01 | - |
| 13 Sep | [4] K-means clustering | CIML 2.4-2.6 | - | - |
| 15 Sep | [5] Perceptrons | CIML 3-3.4 | HW02 | - |
| 20 Sep | [6] Perceptrons II | CIML 3.5-3.7 | - | - |
| 22 Sep | [7] Practical issues and evaluation | CIML 4-4.8 | HW03 | - |
| 27 Sep | [8] Imbalanced and multiclass classification | CIML 5-5.2 | P1 | - |
| 29 Sep | [9] Ranking and collective classification | CIML 5.3-5.5 | HW04 | - |
| 04 Oct | [10] Linear models and gradient descent | CIML 6-6.4 | - | - |
| 06 Oct | [11] Subgradient descent and support vector machines | CIML 6.5-6.7 | HW05 | - |
| 11 Oct | [12] Probabilistic modeling | CIML 7 | - | - |
| 13 Oct | [13] Probabilistic modeling II | CIML 7 | HW06 | - |
| 18 Oct | [14] Neural networks | CIML 8 | - | - |
| 20 Oct | [15] Neural networks II | CIML 8 | HW07 | - |
| 25 Oct | [16] Kernel methods | CIML 9-9.4 | - | - |
| 27 Oct | [17] Kernel methods II | CIML 9.5-9.6 | HW08 | - |
| 01 Nov | [18] Ensemble methods | CIML 11 | P2 | - |
| 03 Nov | [19] Efficient learning | CIML 12 | HW09 | - |
| 08 Nov | [20] Linear unsupervised learning | CIML 13-13.2 | Midterm | - |
| 10 Nov | [21] Non-linear unsupervised learning | CIML 13.3-13.5 | HW10 | - |
| 15 Nov | [22] Expectation maximization | CIML 14-14.3 | - | - |
| 17 Nov | [23] Expectation maximization II | CIML 14.4-14.5 | HW11 | - |
| 22 Nov | [24] Semi-supervised learning | ssl_survey (sec 2-4) | - | - |
| 29 Nov | [25] Hidden Markov models | hmms-sl | - | - |
| 01 Dec | [26] Graphical models | bp | HW12 | - |
| 06 Dec | [27] Online learning | online (1-75) | P3 | - |
| 08 Dec | [28] Structured learning | - | - | - |
| 13 Dec | [29] Bayesian learning | bayes-slides | HW13 | - |