Classification via Minimum Incremental Coding Length (MICL)
NIPS 2007 Spotlight: Poster ID #T40
John Wright, ECE, University of Illinois at Urbana-Champaign

Supervised learning: training data X_1, ..., X_k from k classes, unlabelled test sample x.

Idea: use the lossy coding length (up to distortion \epsilon^2) as a finite-sample surrogate for the Shannon-optimal code length w.r.t. the true (unknown) distribution:

    L_\epsilon(X) = \frac{m+n}{2} \log_2 \det\!\left( I + \frac{n}{\epsilon^2 m} X X^T \right)        (# bits to encode X up to distortion \epsilon^2, with X \in \mathbb{R}^{n \times m}: m samples in ambient dimension n)

Minimize the extra bits needed to encode the test sample x and the training data X_i jointly:

    class(x) := \arg\min_i \; L_\epsilon(X_i \cup \{x\}) - L_\epsilon(X_i)

[Figure: heat maps of R_MAP - R_MICL and R_RDA - R_MICL as functions of the number of training samples and the ambient dimension.]

Theorem: asymptotically, MICL approaches a regularized MAP classifier:

    class(x) \to \arg\max_i \; \hat{P}\big(x \mid \mathcal{N}(\hat{\mu}_i, \hat{\Sigma}_i + \epsilon^2 I)\big)\, \hat{P}(\text{class } i)

Strengths:
- lossy coding has a built-in regularization effect
- performance gains with small samples in high-dimensional spaces
- local and kernel extensions for non-Gaussian data

J. Wright, Y. Tao, Z. Lin, Y. Ma, H. Shum
Neural Information Processing Systems, December 4, 2007
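Below is a minimal sketch of the decision rule summarized above, assuming the coding-length form L_\epsilon(X) = (m+n)/2 log_2 det(I + n/(\epsilon^2 m) X X^T) with X an n x m matrix of samples. The helper names (coding_length, micl_classify), the choice of distortion eps, and the toy Gaussian data are illustrative assumptions, not the authors' released code; the simplified criterion here also omits the cost of coding class labels and means.

```python
# Sketch of MICL classification (illustrative; not the authors' implementation).
import numpy as np

def coding_length(X, eps):
    """Approximate bits needed to code the columns of X (n x m) up to distortion eps^2."""
    n, m = X.shape
    gram = np.eye(n) + (n / (eps**2 * m)) * (X @ X.T)
    # slogdet is numerically safer than det for high-dimensional data
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * (m + n) * logdet / np.log(2.0)   # convert natural log to log2

def micl_classify(x, class_data, eps=1.0):
    """Assign x to the class whose training set needs the fewest extra bits to absorb it."""
    increments = []
    for X_i in class_data:
        X_aug = np.column_stack([X_i, x])          # X_i union {x}
        increments.append(coding_length(X_aug, eps) - coding_length(X_i, eps))
    return int(np.argmin(increments))

# Toy usage: two Gaussian classes in R^10, test point drawn near class 1's mean.
rng = np.random.default_rng(0)
class_data = [rng.normal(0.0, 1.0, size=(10, 30)),
              rng.normal(3.0, 1.0, size=(10, 30))]
x_test = rng.normal(3.0, 1.0, size=10)
print(micl_classify(x_test, class_data, eps=1.0))  # expected output: 1
```

The distortion eps plays the role of the regularizer in the asymptotic MAP interpretation (the \epsilon^2 I added to each class covariance), so in practice it would be chosen by cross-validation.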