Next-Generation Sequencing: Applications for Human Health and Agriculture
Dr. Harsha K Rajasimha
Fall 2013

1 Why this course?
Next-Generation Sequencing (NGS) is revolutionizing the way molecular biology research is conducted. The cost of sequencing a megabase of DNA has dropped from almost $8K in 2001 to about $0.08 in 2013, a reduction of 5 orders of magnitude in 12 years. As a result, various platforms (dominated by Illumina) are generating an unprecedented amount of sequencing data. This data supports basic research, the understanding of disease, human health applications, pharmacogenomics, and agricultural applications such as engineering better seeds and traits. Personal genomics holds great promise for early diagnosis and for pharmacogenomics-based drug repurposing or companion diagnostics. Algorithms for analyzing genome sequence data are maturing quickly and offering better performance and accuracy. Open-source as well as commercial tools and databases are being developed, and these solutions are moving quickly toward commoditization on the cloud. However, computational advances are not keeping pace with the speed of data generation (driven in part by falling costs and the technology's potential), and significant computational challenges remain a hurdle to more widespread adoption of NGS-based applications.

This course aims to cover recent advances in NGS technologies, tools, algorithms, and databases, and to discuss various applications of NGS. We will discuss high-quality as well as speculative research articles on NGS data management, analysis, and interpretation. Students will gain an understanding of the challenges in genomics and its applications, and of the research opportunities they can pursue. This is a graduate-level course in an interdisciplinary area.
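As a quick check of the cost figures quoted above (almost $8K per megabase in 2001 versus about $0.08 in 2013), the short sketch below computes the total fold reduction, the corresponding number of orders of magnitude, and the implied average yearly reduction factor. It uses only the two figures from the text. The yearly factor is a geometric average that assumes a steady rate of decline; this is a simplifying assumption for illustration, not a claim about how costs actually changed from year to year.

```python
import math

# Cost of sequencing one megabase of DNA, as quoted in the text (USD).
cost_2001 = 8000.0   # "almost $8K" in 2001
cost_2013 = 0.08     # "about $0.08" in 2013
years = 2013 - 2001  # 12 years

# Total fold reduction over the whole period.
fold_reduction = cost_2001 / cost_2013

# Orders of magnitude = base-10 logarithm of the fold reduction.
orders_of_magnitude = math.log10(fold_reduction)

# Average yearly fold reduction, assuming (for illustration only)
# a constant geometric rate of decline over the 12 years.
annual_factor = fold_reduction ** (1 / years)

print(f"fold reduction: {fold_reduction:,.0f}")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
print(f"average yearly reduction: {annual_factor:.2f}x")
```

Under this assumption, a 100,000-fold drop over 12 years corresponds to costs falling by a factor of roughly 2.6 every year.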
Students are expected to have varying levels of theoretical understanding of biochemistry, molecular biology, plant biology, human disease, genetics, classical statistics, probability theory, machine learning, and computer science; however, proficiency in these areas is not required. Students are also expected to understand the questions raised by the papers listed below. However, they should not be discouraged if they do not completely understand all of the papers. The selection of papers covers a broad area in the hope that each student will find something of particular interest. The list of papers is flexible: with the instructor's permission, students may substitute other papers for those listed here. Students are encouraged to propose papers related to their research interests. The course encourages active participation and discussion.

2 Format
Course format:
1. Introductory lecture (3 hours).
2. Presentations by students based on the recommended literature list (four 3-hour meetings).
3. Final exam (take home).
I will be available for consultations during the semester. If you are stuck, I am here to help.

3 Grading
The grade is calculated as the sum of:
Presentation: 60%
Participation in the discussion: 30%
Final exam (take home): 50%
The sum is larger than 100%; this is by design. There are many ways to get a good grade, and you can choose the one that suits you best. Bonus points are awarded to any student who publishes or submits a paper on NGS or one of its applications.

4 Presentation Topics
Each seminar day includes three or four presentations on closely related topics. Aim for about 30–40 minutes per presentation (including questions). Students are expected to participate in the discussion and to fill out the presentation evaluation forms. The list of topics below is large, and we are not going to cover all of them. Instead, you may choose the papers and topics that interest you.
Moreover, I may approve the presentation of a paper that is not on the list if you are interested in it and request this beforehand. All papers listed here are available free of charge to GMU students through the GMU databases.

4.1 Next-Generation Sequencing
- Fundamentals of sequencing

4.2 Applications of NGS
- Exome-seq
- Whole Genome-seq
- RNA-seq
- Small RNA-seq
- ChIP-Seq
- Methylome-seq

4.3 NGS applications in Biomedical Research
- Disease-causing mutation detection
- Biomarker discovery
- Gene expression studies
- Integrative genomics studies

4.4 NGS applications in Agriculture
- Challenges in plant genomics
- Genotyping by Sequencing
- Seeds & Traits

References (approximate list of papers to be reviewed)

GENERAL READING TO UNDERSTAND THE FIELD
1) Gonzaludo et al. HGV2012: Leveraging Next-Generation Technology and Large Datasets to Advance Disease Research. Human Mutation (HGV meeting report).
2) Hennekam et al. Next-Generation Sequencing Demands Next-Generation Phenotyping. Human Mutation (HGV meeting report).
3) Wilson Sayres et al. HGV2011: Personalized Genomic Medicine Meets the Incidentalome. Human Mutation (HGV meeting report).
4) Xia et al. NGS Catalog: A Database of Next Generation Sequencing Studies in Humans. Human Mutation (Database in Brief) 33: E2341–E2355 (2012).
5) Oetting WS. Exome and Genome Analysis as a Tool for Disease Identification and Treatment: The 2011 Human Genome Variation Society Scientific Meeting. Human Mutation (HGV meeting report).
6) Lim et al. Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling. Genomics & Informatics 10(1): 1–8 (March 2012).
7) Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology 14: R36 (2013).
8) Trapnell C, Hendrickson D, Sauvageau S, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology.
9) Dobin A et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics (2012).
10) McKenna A et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20: 1297–1303 (2010).
11) Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from next-generation sequencing data. Nucleic Acids Research 38: e164 (2010).
12) Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology 9(9): R137 (2008).
13) Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12: 323 (2011).
14) Ronen R, Gan I, Modai S, Sukacheov A, Dror G, Halperin E, Shomron N. miRNAkey: a software for microRNA deep sequencing analysis. Bioinformatics (2010).
15) Bormann Chung CA, Boyd VL, McKernan KJ, Fu Y, Monighetti C, et al. Whole Methylome Analysis by Ultra-Deep Sequencing Using Two-Base Encoding. PLoS ONE 5(2): e9320 (2010).

NGS FOR DISEASE DIAGNOSTICS
1) Kingsmore and Saunders. Deep Sequencing of Patient Genomes for Disease Diagnosis: When Will It Become Routine? Sci Transl Med 3(87): 87ps23 (15 June 2011).
2) Dixon-Salazar TJ et al. Exome Sequencing Can Improve Diagnosis and Alter Patient Management. Sci Transl Med 4: 138ra78 (2012).
3) Calvo S et al. Molecular Diagnosis of Infantile Mitochondrial Disease with Targeted Next-Generation Sequencing. Sci Transl Med 4: 118ra10 (2012).
4) Saunders CJ et al. Rapid Whole-Genome Sequencing for Genetic Disease Diagnosis in Neonatal Intensive Care Units. Sci Transl Med 4: 154ra135 (2012).
5) Kitzman JO et al. Noninvasive Whole-Genome Sequencing of a Human Fetus. Sci Transl Med 4: 137ra76 (2012).

NGS FOR UNDERSTANDING CANCER
6) Campbell P et al. Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. PNAS 105(35): 13081–13086 (September 2, 2008).
7) Wu D et al. High-Throughput Sequencing Detects Minimal Residual Disease in Acute T Lymphoblastic Leukemia. Sci Transl Med 4: 134ra63 (2012).
8) Forshew T et al. Noninvasive Identification and Monitoring of Cancer Mutations by Targeted Deep Sequencing of Plasma DNA. Sci Transl Med 4: 136ra68 (2012).
9) Kannan K et al. Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing. PNAS, www.pnas.org/cgi/doi/10.1073/pnas.1100489108
10) Rosewick N et al. Deep sequencing reveals abundant noncanonical retroviral microRNAs in B-cell leukemia/lymphoma. PNAS, www.pnas.org/cgi/doi/10.1073/pnas.1213842110

NGS FOR AGRICULTURE
11) Elshire RJ et al. A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE 6(5): e19379 (2011).
12) Bart R et al. High-throughput genomic sequencing of cassava bacterial blight strains identifies conserved effectors to target for durable resistance. PNAS, www.pnas.org/cgi/doi/10.1073/pnas.1208003109
13) Yang H et al. Application of next-generation sequencing for rapid marker development in molecular plant breeding: a case study on anthracnose disease resistance in Lupinus angustifolius L. BMC Genomics 13: 318 (2012).