Sparse Online Learning via Truncated Gradient [T33]
John Langford (Yahoo! Research), Lihong Li and Tong Zhang (Rutgers University)

Goal: obtain sparse solutions in large-scale online learning with stochastic gradient descent (SGD).

Properties of TG (Truncated Gradient):
- Small regret
- Induces sparsity via L1 regularization
- Efficient: per-update complexity is independent of the number of features
- Effective in our application: 3x10^9 features, 3x10^7 examples

Standard SGD update:

    w_{i+1} = w_i - \eta \nabla_1 L(w_i, (x_i, y_i))

Our approach:

    \tilde{w}_{i+1} = w_i - \eta \nabla_1 L(w_i, (x_i, y_i))
    w_{i+1} = T(\tilde{w}_{i+1}, \eta g, \theta)

Truncated Gradient

The operator T performs a weight decay towards 0: each coordinate with magnitude at most \theta is shrunk towards 0 by \eta g (truncated to exactly 0 if it would cross 0), while larger coordinates are left unchanged. This is what produces sparse weight vectors.
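The truncated-gradient step above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: squared loss is assumed for concreteness, and the names `truncate` and `tg_step` are illustrative.

```python
import numpy as np

def truncate(v, alpha, theta):
    """Truncation operator T(v, alpha, theta): shrink each coordinate
    with |v_j| <= theta towards 0 by alpha, clipping at 0; leave
    larger coordinates unchanged."""
    out = v.copy()
    small = np.abs(v) <= theta
    out[small] = np.sign(v[small]) * np.maximum(np.abs(v[small]) - alpha, 0.0)
    return out

def tg_step(w, x, y, eta, g, theta):
    """One truncated-gradient update, assuming squared loss
    L(w, (x, y)) = 0.5 * (w.x - y)^2."""
    grad = (w @ x - y) * x                     # \nabla_1 L(w, (x, y))
    w_tilde = w - eta * grad                   # standard SGD step
    return truncate(w_tilde, eta * g, theta)   # decay small weights towards 0

# One step from w = 0: small coordinates of w_tilde are pushed to exactly 0.
w = np.zeros(5)
x = np.array([1.0, 0.0, 0.2, 0.0, 0.1])
w = tg_step(w, x, y=1.0, eta=0.5, g=0.1, theta=1.0)
```

After this step, the coordinate corresponding to x[4] (whose SGD value 0.05 equals the decay eta*g) lands exactly on 0, showing how truncation yields sparsity that plain L2-style decay never produces.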