Instructor:

Héctor Corrada Bravo

Center for Bioinformatics and Computational Biology

Department of Computer Science

Office: 3114F Biomolecular Sciences Building

Phone Number: 301-405-2481

Lecture Meeting times:

Tuesday and Thursday, 9:30am-10:45am

Room CSIC 3120

Office Hours: Friday 1:00pm-2:00pm AVW 3223 (or BSB 3114F if so posted in Piazza) and by appointment

TA: Wikum Dinalankara

Office Hours: Monday 1:00pm-3:00pm BSB 3119

Discussion site: https://piazza.com/class/hqr5f9kmxwg6dc

Handin: http://inclass.umiacs.umd.edu/perl/handin.pl?course=cmsc702

Grades server: http://grades.cs.umd.edu

*Lectures linked are from last semester and very likely to change near lecture time*

[1] Gentleman, R., Carey, V.J., et al. *Bioinformatics and Computational Biology Solutions Using R and Bioconductor.* Springer, 2005.

[2] Hastie, T., Tibshirani, R., and Friedman, J. *The Elements of Statistical Learning.* Springer, 2009.

Slides are borrowed from a number of sources (hopefully cited in the slides), many of them from Rafael A. Irizarry.

Homework | Date posted | Due date | Hints and solutions |
---|---|---|---|
Homework 1 | Feb 15 | Mar 13 | |
Homework 2 | Mar 25 | Apr 1 | |
Homework 3 | May 1 | May 9 | |

- R/Bioconductor: R is an open-source environment for data analysis. The Bioconductor project provides a large number of useful libraries for high-throughput genomic data analysis. You can browse them by task.
- The RStudio IDE is highly recommended. The Revolution IDE is also very good, but it is available only for Linux and Windows.
- Bioconductor Workflows include some of the analyses we will be performing and implementing in class.
- R Task Views: The Machine Learning and Optimization Task Views list useful packages in R we may use.
- R/Matlab references: A short R guide for Matlab users. A longer one.
- R/Python references: A short R guide for Python users.

- Scitable is an excellent resource from the journal *Nature* that provides introductory reading material on most of the topics in biology/genetics/medicine we discuss in class.
- Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer 2009. This book covers many of the Machine/Statistical Learning methods we will use in class. You can download it here.
- Diez, Barr and Cetinkaya-Rundel. OpenIntro Statistics. 2011. A good introductory statistics textbook. You can download it here.
- Rafael Irizarry's lecture videos are good to watch.
- Rafa is also offering a related course as a MOOC on edX.

The official syllabus detailing class policies, calendar and other details can be found here [pdf]

Major advances in technology for genomic studies are bringing the prospect of personalized and individualized medicine closer to reality. Many of these advances are predicated on the ability to generate data at an unprecedented rate, posing a significant need for computational data analysis that is clinically and biologically useful and robust.

This course will concentrate on the fundamental computational and statistical methods required to meet this need. It will cover topics in functional genomics, population genetics and epigenetics. Computational methods studied for this type of analysis include: supervised, unsupervised and semi-supervised learning, data visualization, statistical modeling and inference, probabilistic graphical models, sparse methods, and numerical optimization. Machine learning methods will be a core component of this class. No prior knowledge of biology is required.

- Introduction
  - A primer on molecular biology for computer scientists and statisticians

- Functional genomics
  - Gene expression analysis by probabilistic modeling and statistical inference
  - Supervised, unsupervised and semi-supervised learning models for classification and clustering from gene expression data
  - Deriving medical diagnosis and prognosis models from expression data using sparse methods in machine learning
  - Probabilistic graphical models of gene co-regulation

- Genetics and epigenetics
  - Genotype-phenotype association analysis by probabilistic modeling and statistical inference
  - Genomic-environmental-clinical data integration methods

- Emerging Topics
  - Sequencing technologies
  - Individualized Medicine