Classification via Minimum Incremental Coding Length (MICL)
NIPS 2007 Spotlight: Poster ID #T40
John Wright, ECE, University of Illinois at Urbana-Champaign

Supervised learning: training data X_1, ..., X_k from k classes, unlabelled test sample x.

Idea: use the lossy coding length (up to distortion \epsilon^2) as a finite-sample surrogate for the Shannon-optimal code length w.r.t. the true (unknown) distribution:

    L_\epsilon(X) = \frac{m+n}{2} \log_2 \det\!\left( I + \frac{n}{\epsilon^2 m} X X^T \right)        (# bits to encode X up to distortion \epsilon^2, with X \in \mathbb{R}^{n \times m}: m samples in ambient dimension n)

Minimize the extra bits needed to encode the test sample x and the training data X_i jointly:

    class(x) := \arg\min_i \; L_\epsilon(X_i \cup \{x\}) - L_\epsilon(X_i)

[Figure: heat maps of R_MAP - R_MICL and R_RDA - R_MICL as functions of the number of training samples and the ambient dimension.]

Theorem: asymptotically, MICL approaches a regularized MAP classifier:

    class(x) \to \arg\max_i \; \hat{P}\big(x \mid \mathcal{N}(\hat{\mu}_i, \hat{\Sigma}_i + \epsilon^2 I)\big)\, \hat{P}(\text{class } i)

Strengths:
- lossy coding has a built-in regularization effect
- performance gains with small samples in high-dimensional spaces
- local and kernel extensions for non-Gaussian data

J. Wright, Y. Tao, Z. Lin, Y. Ma, H. Shum
Neural Information Processing Systems, December 4, 2007
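Below is a minimal sketch of the decision rule summarized above, assuming the coding-length form L_\epsilon(X) = (m+n)/2 log_2 det(I + n/(\epsilon^2 m) X X^T) with X an n x m matrix of samples. The helper names (coding_length, micl_classify), the choice of distortion eps, and the toy Gaussian data are illustrative assumptions, not the authors' released code; the simplified criterion here also omits the cost of coding class labels and means.

```python
# Sketch of MICL classification (illustrative; not the authors' implementation).
import numpy as np

def coding_length(X, eps):
    """Approximate bits needed to code the columns of X (n x m) up to distortion eps^2."""
    n, m = X.shape
    gram = np.eye(n) + (n / (eps**2 * m)) * (X @ X.T)
    # slogdet is numerically safer than det for high-dimensional data
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * (m + n) * logdet / np.log(2.0)   # convert natural log to log2

def micl_classify(x, class_data, eps=1.0):
    """Assign x to the class whose training set needs the fewest extra bits to absorb it."""
    increments = []
    for X_i in class_data:
        X_aug = np.column_stack([X_i, x])          # X_i union {x}
        increments.append(coding_length(X_aug, eps) - coding_length(X_i, eps))
    return int(np.argmin(increments))

# Toy usage: two Gaussian classes in R^10, test point drawn near class 1's mean.
rng = np.random.default_rng(0)
class_data = [rng.normal(0.0, 1.0, size=(10, 30)),
              rng.normal(3.0, 1.0, size=(10, 30))]
x_test = rng.normal(3.0, 1.0, size=10)
print(micl_classify(x_test, class_data, eps=1.0))  # expected output: 1
```

The distortion eps plays the role of the regularizer in the asymptotic MAP interpretation (the \epsilon^2 I added to each class covariance), so in practice it would be chosen by cross-validation.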