STAT115
STAT215 BIO512 BIST298
Introduction to Computational
Biology and Bioinformatics
Spring 2014
Xiaole Shirley Liu
Student Introduction
• Name, School, Grade, Major
• What do you hope to learn from
this course?
Bioinformatics and
Computational Biology
• Interdisciplinary
– Statistics, Biology, Computer Science
• Applied
– From freshman to postdocs
– Useful training for many
– The more you practice, the better you get
• Moves with technology development
The Protein Sequence and
Structure Wave
• 1955: Sanger sequenced bovine insulin
• 1970: Needleman-Wunsch algorithm
• 1973: PDB
• 1981: Smith-Waterman algorithm
• 1990: BLAST
• 1994: BLOCKS database
• 1994-: CASP
• 1997-: Proteomics
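For readers who have not seen the Smith-Waterman algorithm named in the timeline above, here is a minimal sketch of local sequence alignment by dynamic programming. The scoring values (match = 2, mismatch = -1, linear gap = -1) and the function name `smith_waterman` are illustrative assumptions, not settings taken from the course.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions; a linear gap penalty is used.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere: this is what makes it local
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences made up for illustration
    print(smith_waterman("TTACGTGG", "CCACGTAA"))
```

The table takes time and memory proportional to the product of the two sequence lengths, which is one reason faster heuristic tools such as BLAST are used for database searches.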
The Microarray Wave
• Microarray contains hundreds to
millions of tiny probes
• Simultaneously detect how much
each gene is expressed
ALL vs AML
• Golub et al, Science 1999.
“Microarrays” Today
• Infer the expression value of all
the genes from 1000 probes
• High throughput drug screen
The DNA Sequencing Wave
• 1953: DNA structure
• 1972: Recombinant DNA
• 1977: Sanger sequencing
• 1985: PCR
• 1988: NCBI
• 1990: BLAST
Sequencing in the 1970s
Human Genome Sequencing
• Clone-by-clone and whole-genome shotgun
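To give a feel for what shotgun assembly involves computationally, here is a toy sketch that repeatedly merges the two reads with the longest suffix-prefix overlap. It assumes error-free reads from a single strand and no repeats. The helper names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example reads are all made up for illustration; real assemblers are far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining reads do not overlap: leave them as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Hypothetical reads drawn from the made-up sequence ATGCGTACGTTAGC
    print(greedy_assemble(["CGTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

Greedy merging can misassemble repeated regions, which is part of why whole-genome shotgun assembly of a repeat-rich genome is a hard informatics problem.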
The Human Genome Race
• Human Genome Project: 1990-2003
– Originally 1990-2005
– Boosted by technology improvement
(automation improved throughput and quality
with reduced cost)
– Competition from Celera
• Informatics essential for both the public and
private sequencing efforts
– Sequence assembly and gene prediction
– Working drafts announced jointly in June 2000
Sequencing in 2001
Sequencing in 2007
Sequencing Today
• 1000 Genomes
• Personal genome
sequencing
• HiSeq2500
Personalized Disease
Susceptibility Test and Treatment
• GWAS Catalog
• Cancer Genome
Sequencing
Big Data Challenges
Is This Course For Me?
• Biologists
• Statisticians
• Computer Scientists
• STAT115/STAT215/BIO512/BIST298
• http://stat115.org/
Class Information
• Professor and Lectures
– Video recording in 2-4 sections
– Tue / Thu 11:30am-1pm, SC221
– Office hours
• Roughly 4 modules
– Gene expression
– Transcriptional and epigenetic gene regulation
– Human genetics, association studies
– Translational cancer bioinformatics
Class Information
• Teaching Fellows
– Yang Li
– Lin Liu
• Labs: Wed 6 – 8pm, Science Center 418D
– Tue 5:30 – 7pm, HSPH Kresge 209, Boston?
– Make sure you come tomorrow!
Class Information
• HW and Grading
– HW 6 * 10 or 6 * 12
– Exams (midterm + final) * 10
– Class participation: 20
– Algorithm videos: 5
– Lecture notes: extra 5 points
– Late days
• Auditing?
All biology is becoming computational,
much the same way it has become
molecular … Otherwise “low input,
high throughput and no output science”
--- Sydney Brenner
2002 Nobel Prize