The Genomics Education Partnership
TA Annotation Workshop 2006
August 21-23
Funded by the Howard Hughes Medical Institute

WU Program Participants
Sarah Elgin, Prof Biology & Genetics
Jeremy Buhler, Asst Prof Computer Science
Chris Shaffer, Biology, Senior Teaching Fellow
Wilson Leung, Biology, Res. Asst, TA & Web Master
Taylor Cordonnier (Teaching Assistant & Lab Participant)
John Russell (Professor, Director of DBBS)
Tricia Wallace (Tour Guide, WU Genome Sequencing Center)
Undergraduate alumni of Bio 4342: Kasia Falkowska, David Desruisseau

Washington University Graduate Students
Michael Brooks (genetics/computational biology)
Deanna Mendez (biophysics/chromosomal proteins)
Sanjida Rangwala (genetics/plant genomes)

Participating Schools
Catherine Coyle-Thompson, California State University - Northridge
Chunguang Du, Montclair State University
Todd Eckdahl, Missouri Western
Anya Goodman, Cal Poly State University – San Luis Obispo
Charles Hauser, St. Edward’s University
Karmella Haynes, WU, Davidson College
Chris Jones, Moravian College
Olga Ruiz Kopp, Utah Valley State College
Gary Kuleck, Loyola Marymount University
Jennifer Myka, Thomas More College
Paul Overvoorde, Macalester College
Debbie Parrilla-Hernandez, Universidad de Puerto Rico en Humacao
Dennis Revie, California Lutheran University
Stephanie Schroeder, Webster University
Mary Shaw, New Mexico Highlands University
Gary Skuse, Rochester Institute of Technology
Colette Witkowski, Southwest Missouri State

Goals
• Better integration of genomics into the undergraduate biology curriculum
• Better integration of research thinking into the academic year curriculum
• Creation of a dynamic student-scientist partnership to engage students in genomics research

• GOAL: To provide students the opportunity to work as a research team through a large-scale sequencing project.
• PROCESS: Students begin with sample preparation, data generation, finishing and quality control at the WU Genome Sequencing Center, and complete annotation and analysis with WU Computer Science faculty.

Challenge: making it work at a distance, with your curriculum
• Virtual Tour of the Genome Sequencing Center available online, as CD, or DVD
• Web site: lecture notes, PowerPoint presentations, references, homework with answer keys, example student presentations
• Key analytical work is computer based
• Major resources for annotation and databases are open access (NIH, UCSC, Ensembl)

Choice of research problems?
• Comparative analysis of Drosophila dot chromosomes
• D. erecta annotation; D. mojavensis sequencing
• Annotation of corn genome? Gut bacteria genomes?
• Requires lead scientist(s) committed to publication

Our ‘04-’06 research goal: To compare finished sequence from the dot chromosomes of D. melanogaster with D. virilis

The sequencing “pipeline”
• Genomes enter the GSC as BAC or fosmid library
• Clones to be sequenced are selected
• The GSC prepares ~2 kb libraries from each clone
• The 2 kb fragments are sequenced from each end (~700 bases each)
• Phred/Phrap assembles the sequenced fragments
• Finishers use Consed, request additional data to generate a single, high-quality contig
• Annotation identifies sequence features of interest
• Future: start from posted unfinished sequence: annotate D. erecta, finish & annotate D. mojavensis

Current status, spring 2006
[Figure: map of the D. virilis dot chromosome (reference strain) showing finished sequence, chosen fosmids, and remaining gaps; size labels: ~12 kb, 12 kb, 15 kb, 8 kb, 13 kb, 3 kb, 9 kb]
• 13 fosmids (~40 kb each) were selected to be made into libraries for sequencing
• Each student sequences and annotates one fosmid
• 8 smaller gaps will be sequenced using a PCR-based method (summer work, Michelle & Taylor)

Shotgun sequencing & assembly
[Figure labels: genome; shotgun (paired ends); assemble sequence reads; scaffold; additional sequence reads needed]

Initial assembly, 2-fold coverage

From 2X reads to 6X coverage…
• Three significant contigs
• All gaps spanned
• Fair coverage, but weak spots

GSC libraries for sequencing
[Figure labels: insert (2-4 kb); primer; read; plasmid]

Sequence reads in a problem area: a run of C’s…

Final Assembly
• 40,809 base pairs
• 438 reads
• Good coverage, no low quality regions

Final check: EcoRI digest, actual vs. in silico

Annotation: analyzing sequence data
• Practice problem: genes and pseudogenes in man and chimpanzee
• Annotating Drosophila fosmid:
  – Finding genes
  – Finding repeats
  – Searching for conserved elements
  – Clustal analysis
  – Evaluating synteny
• Final challenge: putting it all together
Working as a group, with TA assistance, is most effective.
Partnership can be effective. Work on adjacent fosmids?

Annotation: what do students gain by analyzing sequence data?
• What tools are available for finding genes & other features of interest? How do they work? Managing data…
• How do you define a gene? A pseudogene?
• How are genomes organized? Repeats?
• Power of comparative genomics
• Questions of evolution

Initial analysis of D. virilis dot chromosome fosmids
27/28 genes remain on the dot, but rearrangements within the chromosome are common!

Examples of genome organization in Drosophila
[Figure: gene maps for four regions (D.v. Arm, D.m. Arm, D.v. Dot, D.m. Dot) with gene labels Egfr, CG10440, Ephrin, CG1970, Pur-Alpha, Thd1, Zfh2; legend: coding, UTR, DNA transposons, other repetitive; scale bar 5 kb]

Dot chromosome genes have larger introns due to repetitious DNA
[Figure: Intron Size Distribution: Dot Chromosome versus Other Chromosomes. Cumulative plot of % introns this size or smaller (0-100%) against intron size (0-1400); series: D. virilis dot, D. melanogaster dot, D. virilis other, D. melanogaster other]

The dot chromosomes of D. melanogaster and D. virilis both have a high density of repeat sequences, but differ in type of repeats
[Figure: Repeat Density Comparison of D. melanogaster and D. virilis using RepeatMasker/Superlibrary with Classification. Repeat density (%, 0-30) by species and chromosome (DM: Dot, DV: Dot, DM: Arms, DV: Arms); classes: 1360 elements, DINES, other DNA transposons, unknown, simple repeats, retroelements]

Resulting publication:
Slawson, E.E., Shaffer, C.D., Leung, W., Malone, C.D., Kellmann, E., Shevchek, R.B., Craig, C.A., Bloom, S., Bogenpohl, J. II, Dee, J., Morimoto, E.T.A., Myoung, J., Nett, A.S., Ozsolak, F., Tittiger, M.E., Zeug, A., Pardue, M.L., Buhler, J., Mardis, E., and Elgin, S.C.R. (2006) “Comparison of dot chromosome sequences from D. melanogaster and D. virilis reveals an enrichment of DNA transposon sequences in heterochromatic domains,” Genome Biology 7: R15.
• But required ca. 10 months additional full-time work!

Assessment: Likert Scale (5 = Agree, 1 = Disagree)
• Before the course, I understood how the human genome had been sequenced: 2.5
• After the course, I understood…
  – how the human genome had been sequenced: 4.9
  – how eukaryotic genomes are organized: 4.5
  – nature of genes: 4.4
• The course helped me improve my wet lab skills: 2.5
• The course helped me improve my computer skills: 4.5
• Genomics is awesome! I love the power of databases! 4.8

Learning Gains from WU Lab Courses Compared to Summer Program Research Experiences
 1. Understanding of the research process               4.24
 2. Understanding how knowledge is constructed          4.16
 3. Ability to analyze data                             4.08
 4. Skill in interpretation of results                  3.92
 5. Understanding how scientists work on real problems  3.88
 6. Assertions require supporting evidence              3.88
 7. Skill in scientific writing                         3.80
 8. Readiness for more research                         3.64
 9. Tolerance for obstacles                             3.63
10. Ability to integrate theory and practice            3.60
11. Learning lab skills                                 3.56
12. Clarification of a career path                      3.13
13. Learning to work independently                      2.83
14. Understanding primary literature                    2.79
15. Learning ethical conduct                            2.22
Mean values, scale 1-5. Data from Course Work (25), SURE 2003 (1135).

Comparison of Learning Gains from WU Lab Courses with Summer Research Experiences
[Figure: mean values (1-4.5) for learning gains 1-15; series: Course Work, SURE 2003, SURE 2004; annotated items: learning to work independently, understanding knowledge construction, skill in scientific writing]

What Students Say They Learned:
• Oral presentation skills, defending ideas
• Scientific writing
• Why you do things, and how to choose a strategy
• That research doesn’t always work, and goes slowly
• That research is collaborative
• That science is more ambiguous than it appears in lectures

Things Students Said Helped Them Understand the Material Better:
• Writing formal lab reports
• Defending their work against challenges from others (in oral presentations)
• Having lots of opportunities to ask questions
• Doing trouble-shooting

Lessons Learned
• Students need ownership; can come from the computer-based effort, does not require wet lab.
• Generating letter grades - use staged problem sets to teach techniques, record progress; periodic reports with written and oral defense of conclusions.
• Challenging - work always changing, requires time commitment; computer support important
• Quality of the experimental work is very good! Finished sequence, publishable data, conclusions. Good student-scientist partnership.

Goals for workshop…
• Provide background experience in gene annotation; introduce computer-based training materials, problem sets; annotate a Drosophila gene
• Provide a review of genome sequencing, visit the WU Genome Sequencing Center
• Discuss your role as a TA
• Discuss plan to facilitate data in / data out from WU
• Discuss communications plan - Wiki? Help contacts?
• Discuss present and future projects of the GEP
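
Illustration: in silico EcoRI digest and nominal coverage
The Final Assembly slide names two checks that are easy to try on a computer: comparing an actual EcoRI digest with an in silico one, and judging read coverage. The sketch below is a minimal illustration only, not the GSC's software. It assumes the finished sequence is treated as a linear string with no vector sequence, and that every read contributes the full ~700 bases quoted on the pipeline slide. The function names are hypothetical. EcoRI recognizes GAATTC and cuts between G and AATTC.

```python
from typing import List

ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # cut falls after the G (G^AATTC)


def ecori_fragment_lengths(sequence: str) -> List[int]:
    """Top-strand fragment lengths from a complete EcoRI digest.

    Assumption: the sequence is linear (no vector, not circular).
    """
    seq = sequence.upper()
    cut_positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        cut_positions.append(start + ECORI_CUT_OFFSET)
        start = seq.find(ECORI_SITE, start + 1)
    # Fragment boundaries: sequence start, every cut, sequence end.
    boundaries = [0] + cut_positions + [len(seq)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


def nominal_coverage(read_count: int, read_length: int, assembly_length: int) -> float:
    """Average fold coverage if every read were fully usable (an upper bound)."""
    return read_count * read_length / assembly_length


# Toy sequence with two EcoRI sites (hypothetical, not fosmid data).
toy_fragments = ecori_fragment_lengths("AAGAATTCTTTTGAATTCGG")

# Numbers from the slides: 438 reads, ~700 bases each, 40,809 bp assembly.
fosmid_coverage = nominal_coverage(438, 700, 40809)
```

With the slide numbers, the nominal figure comes out near 7.5-fold; usable coverage would be lower once low-quality read ends are trimmed. For the real check, the predicted fragment lengths would be sorted and compared with the band sizes seen on the gel.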