| Date | Lecture Name | Readings (recommended) | Work Due |
|---|---|---|---|
| 1/28 | Course introduction and administrivia | | |
| 1/30 | The R data analysis environment | John Cook's intro; Baltimore Analysis; The source | |
| 2/4 | Molecular biology for computer scientists and statisticians | Hunter, Molecular Biology for Computer Scientists | |
| 2/6 | The R/Bioconductor genomics analysis environment | Bioconductor paper; The anatomy of successful computational biology software; The setup script: setup.R; The script we're using in class: bioconductor.R; Microarray analysis example | |
| 2/11 | Genome architecture; Notes on HMMs for CpG Islands | Wu et al., 2010; GenomicRanges paper | |
| 2/13 | CAMPUS CLOSED | | |
| 2/18 | Overview of second-generation sequencing technology; Bowtie | Bowtie | |
| 2/20 | Gene expression analysis: RNA sequencing analysis | DESeq; RNA-seq review article; Myrna | |
| 2/25 | RNA sequencing analysis (II); Lecture Notes | DESeq; RNA-seq review article; Myrna | |
| 2/27 | Isoform expression quantification and transcriptome assembly | Jiang and Wong; IsoLasso; Cufflinks; Salzman, Jiang and Wong | |
| 3/4 | Isoform expression quantification and transcriptome assembly | Jiang and Wong; IsoLasso; Cufflinks; Salzman, Jiang and Wong | |
| 3/6 | Isoform expression quantification and transcriptome assembly; Lessons learned from RNA-seq | Jiang and Wong; IsoLasso; Cufflinks; Salzman, Jiang and Wong | |
| 3/11 | Data Analyst Bag of Tricks I: Empirical Bayes Methods; Lecture Notes | limma; [1] Chs. 11 and 14 | |
| 3/13 | Data Analyst Bag of Tricks II: Multiple Testing; Lecture Notes | q-value; Noble, "How does multiple testing correction work?"; SAM | HW1 |
| 3/18 | No class: Spring Break | | |
| 3/20 | No class: Spring Break | | |
| 3/25 | Unsupervised methods; Notes on EM | [1] Chs. 12 and 13 | |
| 3/27 | Unsupervised methods (II) | [2] Secs. 14.3 and 14.5; SVA; Leek et al., batch effects | |
| 3/27 | Unsupervised methods (II) | [2] Secs. 14.3 and 14.5; SVA; Leek et al., batch effects | HW2 |
| 4/3 | Classification and prediction methods | [2] Ch. 4 | |
| 4/4 | THIS IS NOT A LECTURE DATE | | Project proposal |
| 4/8 | Recap | | |
| 4/10 | Genetics: brief genotyping intro; Genotype/phenotype association discovery and analysis | SOAP; RAPID; Lirnet | |
| 4/14 | THIS IS NOT A LECTURE DATE | | Midterm |
| 4/15 | Group presentations | | |
| 4/17 | Group presentations | | |
| 4/22 | Genetics: Genotype/phenotype association discovery and analysis | RAPID | |
| 4/24 | Regulatory network discovery | Segal et al., 2003 | |
| 4/29 | Analysis of differential methylation with sequencing | Hansen et al., 2012; BSmooth | |
| 5/1 | Approaching the promise of individualized medicine | | |
| 5/2 | THIS IS NOT A LECTURE DATE | | Project progress report |
| 5/3 | THIS IS NOT A LECTURE DATE | | |
| 5/6 | Project presentations (1) | | |
| 5/8 | Project presentations (2) | | HW3 |
| 5/13 | Project presentations (3) | | |
| 5/15 | THIS IS NOT A LECTURE DATE | | Final project |
|
Linked lecture materials are from last semester and are likely to change close to each lecture date.
Legend: Under construction; Not updated yet
|
[1] Gentleman, R., Carey, V. J., et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005.
[2] Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer, 2009.
Many slides are borrowed from other sources (cited in the slides, we hope); a large share of them come from Rafael A. Irizarry.
The official syllabus, detailing class policies, the calendar and other details, can be found here [pdf].
Major advances in technology for genomic studies are bringing personalized and individualized medicine closer to reality. Many of these advances rest on the ability to generate data at an unprecedented rate, which creates a significant need for computational data analysis that is robust as well as clinically and biologically useful.
This course will concentrate on the fundamental computational and statistical methods required to meet this need. It will cover topics in functional genomics, population genetics and epigenetics. Computational methods studied for this type of analysis include: supervised, unsupervised and semi-supervised learning, data visualization, statistical modeling and inference, probabilistic graphical models, sparse methods, and numerical optimization. Machine learning methods will be a core component of this class. No prior knowledge of biology is required.
- Introduction
  - A primer on molecular biology for computer scientists and statisticians
- Functional genomics
  - Gene expression analysis by probabilistic modeling and statistical inference
  - Supervised, unsupervised and semi-supervised learning models for classification and clustering from gene expression data
  - Deriving medical diagnosis and prognosis models from expression data using sparse methods in machine learning
  - Probabilistic graphical models of gene co-regulation
- Genetics and epigenetics
  - Genotype-phenotype association analysis by probabilistic modeling and statistical inference
  - Genomic-environmental-clinical data integration methods
- Emerging topics
  - Sequencing technologies
  - Individualized medicine