Instructor:

Héctor Corrada Bravo

Center for Bioinformatics and Computational Biology

Department of Computer Science

Office: 3114F Biomolecular Sciences Building

Phone Number: 301-405-2481

Meeting times

Monday, 1:45pm-2:45pm

Room BSB 3118

Date | Topic | Readings |
---|---|---|
9/16 | Organizational Meeting | |

Major advances in technology for genomic studies are bringing the prospect of personalized and individualized medicine closer to reality. Many of these advances are predicated on the ability to generate data at an unprecedented rate, posing a significant need for computational data analysis that is clinically and biologically useful and robust.

This reading course will concentrate on the fundamental statistical methods required to meet this need. The goal is to be familiar with the key articles and concepts in the field of penalized regression, as well as have a sense of the unanswered questions and current research directions. No prior knowledge of biology is required.
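To make the central object of the course concrete, here is a minimal NumPy sketch (toy data, illustrative only) contrasting the two most-cited penalized estimators on the reading list: ridge regression, which has a closed-form solution, and the lasso, solved here by cyclic coordinate descent with soft-thresholding in the spirit of the pathwise coordinate optimization papers below. All variable names and the penalty value are assumptions for this toy example, not from the readings.

```python
import numpy as np

# Toy data: sparse true signal, Gaussian noise.
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]          # only 3 of 10 coefficients are nonzero
y = X @ beta_true + 0.5 * rng.normal(size=n)

lam = 25.0                                 # penalty level (chosen by hand here)

# Ridge: argmin ||y - Xb||^2 + lam ||b||_2^2 has the closed form
# (X'X + lam I)^{-1} X'y.
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Lasso: argmin (1/2)||y - Xb||^2 + lam ||b||_1 by cyclic coordinate descent.
lasso = np.zeros(p)
for _ in range(200):                       # fixed sweep count for simplicity
    for j in range(p):
        r = y - X @ lasso + X[:, j] * lasso[j]     # partial residual excluding j
        zj = X[:, j] @ r
        lasso[j] = soft_threshold(zj, lam) / (X[:, j] @ X[:, j])

print(np.round(ridge, 2))   # all coefficients shrunk, none exactly zero
print(np.round(lasso, 2))   # weak coefficients set exactly to zero
```

The qualitative contrast this illustrates is the one the readings develop in depth: the L2 penalty shrinks every coefficient toward zero, while the L1 penalty performs variable selection by setting some coefficients exactly to zero.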

The idea is to create an informal atmosphere where students can freely interact and discuss the statistical and computational literature. Below is a list of the (mathematical) background required. Interested students without this background are still encouraged to attend: the main ideas of each paper will be thoroughly discussed.

Several of the papers we will discuss are covered in this excellent textbook, which is freely available online:

Hastie T., Tibshirani R., and Friedman, J. (2009) The Elements of Statistical Learning. Second edition. Springer.

Basic knowledge of linear regression models, as covered in a machine learning or statistics course. What is a linear regression model? How is a linear regression model used to make predictions / inferences?

Very basic knowledge of matrix algebra.
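For students checking whether they have the required background, the following short NumPy sketch covers it: fitting a linear regression model by least squares and using it for prediction. The simulated data and variable names are illustrative assumptions, not course material.

```python
import numpy as np

# Simulate a small regression problem: y = X @ beta + noise.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Ordinary least squares: beta_hat = argmin_b ||y - X b||^2,
# computed here with NumPy's least-squares solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction for new observations is the matrix-vector product X_new @ beta_hat.
X_new = rng.normal(size=(5, p))
y_pred = X_new @ beta_hat

print(np.round(beta_hat, 2))   # close to beta_true
```

If both the estimator and the prediction step here are familiar, the background requirement is met; the matrix-algebra requirement amounts to being comfortable with the products and inverses appearing in expressions like the normal equations.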

Hoerl, A. and Kennard, R. (1970). Ridge regression: applications to nonorthogonal problems. Technometrics, 12 69-82

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B, 58 267-288

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B, 67 301-320

Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Stat., 28 1356-1378

Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Stat., 32 407-451

Friedman, J., Hastie, T., Hofling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat., 1 302-332

Wu, T. and Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat., 2 224-244

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc., 96 1348-1360

Zhang, C. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Stat., 38 894-942

Zou, H. (2006). The adaptive lasso and its oracle properties. J. Am. Stat. Assoc., 101 1418-1429

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Stat. Soc. B, 68 49-67

Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. Roy. Stat. Soc. B, 67 91-108

Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9 432-441

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. Roy. Stat. Soc. B, 70 849-911

Witten, D. and Tibshirani, R. (2008). Testing significance of features by lassoed principal components. Ann. Appl. Stat., 2 986-1012

Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. Roy. Stat. Soc. B, 71 1009-1030