Sparse Feature Learning for Deep Belief Networks
M. Ranzato, Y. Boureau, Y. LeCun
Courant Institute, New York University
Poster ID M55

Unsupervised Algorithm
- Learns an encoder by coupling it with a decoder.
- [Diagram: each stage maps INPUT -> ENCODER (W) -> CODE -> DECODER (W') -> OUTPUT; the 2nd-stage encoder-decoder is trained on the 1st-stage codes.]
- The encoder-decoder symmetry avoids the need for filter normalization.
- Simple iterative online algorithm.
- A sparsity penalty on the code removes the need to consider the partition function. (A code sketch of this encoder-decoder appears at the end of this summary.)

Training Deep Networks
- Training proceeds stage by stage; the top stage produces higher-level representations. (See the stacking sketch below.)

Inference
- A feedforward pass through the chain of encoders.

Comparison with other algorithms: PCA and the Restricted Boltzmann Machine (RBM)
- Experiments on the MNIST dataset: by trading off RMSE against the sparsity level, this machine achieves better performance using fewer bits in the code.

Results
- [Figure: some features learned at the 1st stage.]
- [Figure: reconstructions from 1-of-N codes at the 2nd stage.]
- The nonlinear mapping from input pixel intensities to class labels was discovered in a totally unsupervised way.
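
Since the poster gives no equations, the following is a minimal sketch of the encoder-decoder idea under stated assumptions: a logistic encoder with weights W, a decoder tied as W' = W.T (the symmetry mentioned above), and an L1 penalty standing in for the paper's sparsity term. The names SparseCoder, train_step, lam, and alpha are illustrative, not from the poster.

```python
# Minimal sketch, assuming a logistic encoder, a tied linear decoder
# (W' = W.T), and an L1 sparsity penalty on the code. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseCoder:
    def __init__(self, n_input, n_code, lam=0.1, alpha=0.01):
        # W: encoder weights; the decoder reuses W.T, so the symmetry
        # removes the need to renormalize filters after each update.
        self.W = rng.normal(0.0, 0.01, size=(n_code, n_input))
        self.lam = lam      # weight of the sparsity penalty
        self.alpha = alpha  # learning rate for online updates

    def encode(self, x):
        return sigmoid(self.W @ x)

    def decode(self, z):
        return self.W.T @ z

    def train_step(self, x):
        # One online update: follow the gradient of
        # 0.5 * ||x - W.T z||^2 + lam * ||z||_1 with respect to W.
        z = self.encode(x)
        x_hat = self.decode(z)
        err = x_hat - x                      # reconstruction residual
        dz = self.W @ err + self.lam * np.sign(z)
        dz *= z * (1.0 - z)                  # back through the logistic
        self.W -= self.alpha * (np.outer(dz, x) + np.outer(z, err))
        return 0.5 * np.dot(err, err) + self.lam * np.abs(z).sum()
```

Because the loss is a plain reconstruction-plus-penalty objective, each update is a simple gradient step; no partition function over codes ever needs to be evaluated, in contrast to RBM training.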
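Likewise, a minimal sketch of the stage-by-stage training and feedforward inference described above, reusing the hypothetical SparseCoder: each stage is trained on the codes produced by the stage below it, and inference is a single pass through the chain of encoders. The names train_stack and infer, and the layer sizes shown, are illustrative.

```python
# Minimal sketch of stage-wise training and feedforward inference,
# assuming the SparseCoder class defined above.
def train_stack(data, layer_sizes, epochs=10):
    stages, inputs = [], data
    for n_in, n_code in zip(layer_sizes[:-1], layer_sizes[1:]):
        stage = SparseCoder(n_in, n_code)
        for _ in range(epochs):
            for x in inputs:
                stage.train_step(x)
        # The next stage sees this stage's codes as its input.
        inputs = [stage.encode(x) for x in inputs]
        stages.append(stage)
    return stages

def infer(stages, x):
    # Inference: a feedforward pass through the chain of encoders.
    for stage in stages:
        x = stage.encode(x)
    return x

# Usage with 784-dim MNIST-like vectors (sizes are illustrative):
# stages = train_stack(train_vectors, [784, 200, 10])
# code = infer(stages, train_vectors[0])
```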