Hal Daumé III

What is ML?

Machine learning is the study of computer systems that learn from data and experience. It is applied in an incredibly wide variety of areas, from medicine to advertising, from the military to the everyday. Any area in which you need to make sense of data is a potential customer of machine learning.

What is CIML?

CIML (A Course in Machine Learning) is an introductory textbook that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire book. The topics covered are a small superset of what I typically teach.


The primary goal of CIML is to serve as the textbook for a first course in machine learning (along the lines of Mitchell's textbook) rather than as a reference for experienced researchers (e.g., Bishop's). It is written primarily for people who want to learn about and use machine learning but will not necessarily become machine learning researchers. You can see my thoughts on how to teach machine learning on my blog.

Getting CIML?

If you are an instructor and would like an advance copy (roughly mid-summer 2011), please let me know and we can work something out. I will use this book in Fall 2011, and if you'd like to "beta test" it at the same time, I will make it available to your students at cost. (It is published on demand through Amazon.) If you are a student, I'm afraid you will need to wait until it is officially released (probably Spring 2012).


CIML is made up of "bite-sized" chapters (each about 10-15 pages), which can be covered in depth at a pace of one chapter per week. If you're more aggressive, some chapters could be combined into a single week. Many are optional, and dependencies between chapters are marked. There are 19 planned "content" chapters, which should let instructors pick and choose a bit based on their tastes. It is currently about one-third written, and I estimate it will be about 350-400 pages (full color!) when finished. CIML is written so that within a few weeks, students should be able to tackle complicated problems (not just binary classification!).

Student Level?

Discrete math, calculus, and data structures are important prerequisites. I believe strongly in math as an aid to understanding, but the goal of CIML is not to make everyone a mathematician. Unlike many other books, CIML builds up its math piece by piece and explains it inline. There are also dozens of algorithms spelled out in pseudo-code.
Last updated on twenty-eight February, two thousand eleven; contact me AT hal3 DOT name