Sparse Online Learning via Truncated Gradient [T33]
John Langford (Yahoo! Research), Lihong Li (Rutgers University), Tong Zhang (Rutgers University)

Goal: sparse solutions in large-scale online learning.

Properties of Truncated Gradient (TG):
- Small regret, via L1 regularization
- Efficient: per-step complexity is independent of the number of features
- Effective in our application: $3 \times 10^9$ features, $3 \times 10^7$ examples

Standard stochastic gradient descent updates the weights as
$$w_{i+1} = w_i - \eta \nabla L(w_i, (x_i, y_i)).$$

Our approach first takes the same gradient step,
$$\tilde{w}_{i+1} = w_i - \eta \nabla L(w_i, (x_i, y_i)),$$
and then applies a truncation operator,
$$w_{i+1} = T(\tilde{w}_{i+1}, \eta g, \theta),$$
which decays weights toward 0 (truncated gradient).
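The two-step update above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the squared loss is an assumed example, and `truncate` uses the standard piecewise shrinkage (coordinates with magnitude at most `theta` are moved toward 0 by `alpha`; larger coordinates are left unchanged). Parameter names (`eta` for the learning rate, `g` for gravity, `theta` for the truncation threshold) are illustrative.

```python
import numpy as np

def truncate(v, alpha, theta):
    """T(v, alpha, theta): shrink coordinates in [-theta, theta] toward 0 by alpha."""
    out = v.copy()
    pos = (v >= 0) & (v <= theta)
    neg = (v < 0) & (v >= -theta)
    out[pos] = np.maximum(0.0, v[pos] - alpha)   # positive small weights decay down
    out[neg] = np.minimum(0.0, v[neg] + alpha)   # negative small weights decay up
    return out

def tg_step(w, x, y, eta, g, theta):
    """One truncated-gradient step, assuming squared loss L = (w.x - y)^2 / 2."""
    grad = (w @ x - y) * x          # gradient of the (assumed) squared loss
    w_tilde = w - eta * grad        # ordinary SGD step
    return truncate(w_tilde, eta * g, theta)  # then truncate toward 0
```

With gravity `g = 0` the update reduces to plain SGD; increasing `g` drives more small weights exactly to 0, which is what yields sparsity.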