STAT115 / STAT215 / BIO512 / BIST298
Introduction to Computational Biology and Bioinformatics
Spring 2014
Xiaole Shirley Liu

Student Introduction
• Name, School, Grade, Major
• What do you hope to learn from this course?

Bioinformatics and Computational Biology
• Interdisciplinary
  – Statistics, Biology, Computer Science
• Applied
  – From freshmen to postdocs
  – Useful training for many
  – The more you practice, the better you get
• Moves with technology development

The Protein Sequence and Structure Wave
• 1955: Sanger sequenced bovine insulin
• 1970: Needleman-Wunsch algorithm (Smith-Waterman followed in 1981)
• 1973: PDB
• 1990: BLAST
• 1994: BLOCKS database
• 1994-: CASP
• 1997-: Proteomics

The Microarray Wave
• A microarray contains hundreds to millions of tiny probes
• Simultaneously detects how much each gene is expressed

ALL vs AML
• Golub et al., Science 1999

“Microarrays” Today
• Infer the expression value of all the genes from 1000 probes
• High-throughput drug screen

The DNA Sequencing Wave
• 1953: DNA structure
• 1972: Recombinant DNA
• 1977: Sanger sequencing
• 1985: PCR
• 1988: NCBI
• 1990: BLAST

Sequencing in the 1970s

The Human Genome Race
• Human Genome Project: 1990-2003
  – Originally 1990-2005
  – Boosted by technology improvement (automation improved throughput and quality with reduced cost)
  – Competition from Celera

Human Genome Sequencing
• Clone-by-clone and whole-genome shotgun

The Human Genome Race (continued)
• Informatics essential for both the public and private sequencing efforts
  – Sequence assembly and gene prediction
  – Working draft finished simultaneously, spring 2000

Sequencing in 2001

Sequencing in 2007

Sequencing Today
• 1000 Genomes
• Personal genome sequencing
• HiSeq2500

Personalized Disease Susceptibility Test and Treatment
• GWAS Catalog
• Cancer Genome Sequencing

Big Data Challenges

Is This Course For Me?
• Biologists
• Statisticians
• Computer Scientists
• STAT115/STAT215/BIO512/BIST298
• http://stat115.org/

Class Information
• Professor and Lectures
  – Video recording in 2-4 sections
  – Tue / Thu 11:30am-1pm, SC221
  – Office hours
• Roughly 4 modules
  – Gene expression
  – Transcriptional and epigenetic gene regulation
  – Human genetics, association studies
  – Translational cancer bioinformatics
• Teaching Fellows: Yang Li, Lin Liu
• Labs: Wed 6 – 8pm, Science Center 418D
  – Tue 5:30 – 7pm, HSPH Kresge 209, Boston?
  – Make sure you come tomorrow!
• HW and Grading
  – HW 6 * 10 or 6 * 12
  – Exams (midterm + final) * 10
  – Class participation: 20
  – Algorithm videos: 5
  – Lecture notes: extra 5 points
  – Late days
• Auditing?

All biology is becoming computational, much the same way it has become molecular … Otherwise “low input, high throughput and no output science”
--- Sydney Brenner, 2002 Nobel Prize