The Genomics Education Partnership TA Annotation Workshop 2006

The Genomics Education
Partnership
TA Annotation Workshop
2006
August 21-23
Funded by the Howard Hughes Medical Institute
WU Program Participants
Sarah Elgin, Prof Biology & Genetics
Jeremy Buhler, Asst Prof Computer Science
Chris Shaffer, Biology, Senior Teaching Fellow
Wilson Leung, Biology, Res. Asst, TA & Web Master
Taylor Cordonnier (Teaching Assistant & Lab Participant)
John Russell (Professor, Director of DBBS)
Tricia Wallace (Tour Guide, WU Genome Sequencing Center)
Undergraduate alumni of Bio 4342:
Kasia Falkowska, David Desruisseau
Washington University Graduate Students
Michael Brooks (genetics/computational biology)
Deanna Mendez (biophysics/chromosomal proteins)
Sanjida Rangwala (genetics/plant genomes)
Participating Schools
Catherine Coyle-Thompson
California State University - Northridge
Chunguang Du
Montclair State University
Todd Eckdahl
Missouri Western
Anya Goodman
Cal Poly State University – San Luis Obispo
Charles Hauser
St. Edward’s University
Karmella Haynes
WU, Davidson College
Chris Jones
Moravian College
Olga Ruiz Kopp
Utah Valley State College
Gary Kuleck
Loyola Marymount University
Jennifer Myka
Thomas More College
Paul Overvoorde
Macalester College
Debbie Parrilla-Hernandez
Universidad de Puerto Rico en Humacao
Dennis Revie
California Lutheran University
Stephanie Schroeder
Webster University
Mary Shaw
New Mexico Highlands University
Gary Skuse
Rochester Institute of Technology
Colette Witkowski
Southwest Missouri State
Goals
• Better integration of genomics into the undergraduate
biology curriculum
• Better integration of research thinking into the academic
year curriculum
• Creation of a dynamic student-scientist partnership to
engage students in genomics research
• GOAL: To provide students
the opportunity to work as a
research team through a large-scale sequencing project.
• PROCESS: Students begin
with sample preparation, data
generation, finishing and
quality control at the WU
Genome Sequencing Center,
and complete annotation and
analysis with WU Computer
Science faculty.
Challenge: making it work at a
distance, with your curriculum
Virtual Tour of the Genome Sequencing Center available online, on CD, or on DVD
• Web site: lecture notes, PowerPoint presentations,
references, homework with answer keys, example
student presentations
• Key analytical work is computer based
• Major annotation resources and databases are open
access (NIH, UCSC, Ensembl)
Choice of research problems?
Comparative analysis of Drosophila dot chromosomes
D. erecta annotation; D. mojavensis sequencing
Annotation of corn genome?
Gut bacteria genomes?
Requires lead scientist(s) committed to publication
Our ‘04-’06 research goal:
To compare finished sequence
from the dot chromosomes of
D. melanogaster with D. virilis
The sequencing “pipeline”
• Genomes enter the GSC as BAC or fosmid libraries
• Clones to be sequenced are selected
• The GSC prepares ~2 kb libraries from each clone
• The 2 kb fragments are sequenced from each end
(~700 bases each)
• Phred/Phrap assembles the sequenced fragments
• Finishers use Consed, request additional data to
generate a single, high-quality contig
• Annotation identifies sequence features of interest
• Future: start from posted unfinished sequence:
annotate D. erecta, finish & annotate D. mojavensis
Current status, spring 2006
Figure: D. virilis dot chromosome, reference strain, showing finished sequence, chosen fosmids, and remaining gaps. Size labels on the figure: ~12 kb, 12 kb, 15 kb, 8 kb, 13 kb, 3 kb, 9 kb.
• 13 fosmids (~40 kb each) were selected to be
made into libraries for sequencing
• Each student sequences and annotates one fosmid
• 8 smaller gaps will be sequenced using a PCR-based method (summer work, Michelle & Taylor)
Shotgun sequencing & assembly
Figure: genome; shotgun (paired ends); assemble sequence reads; scaffold; additional sequence reads needed
Initial assembly, 2-fold coverage
From 2X reads to 6X coverage….
• Three significant contigs
• All gaps spanned
• Fair coverage, but weak spots
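Why going from 2X to 6X matters can be illustrated with the idealized Lander-Waterman (Poisson) model, in which read starts fall uniformly at random and the expected fraction of bases covered by no read is e^(-c) for coverage c. This is a textbook approximation, not the GSC's own calculation, and the function names and the 40 kb fosmid length are illustrative assumptions. Real clones deviate from it (for example at hard-to-sequence regions), which is consistent with the "weak spots" noted above.

```python
import math


def expected_uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by no read under the idealized
    Poisson (Lander-Waterman) model with uniformly random read starts."""
    return math.exp(-coverage)


def expected_uncovered_bases(target_length: int, coverage: float) -> float:
    """Expected number of uncovered bases in a target of the given length."""
    return target_length * expected_uncovered_fraction(coverage)


if __name__ == "__main__":
    fosmid_length = 40_000  # assumed round number for a ~40 kb fosmid
    for c in (2, 6):
        frac = expected_uncovered_fraction(c)
        bases = expected_uncovered_bases(fosmid_length, c)
        print(f"{c}X coverage: {frac:.2%} uncovered, about {bases:.0f} bases")
```

Under this model roughly 13.5% of bases are expected to be missed at 2X and about 0.25% at 6X.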
GSC libraries for sequencing
Figure: plasmid with insert (2-4 kb), showing primer and read
Sequence reads in a problem area: a run of C’s…
Final Assembly
• 40,809 base pairs
• 438 reads
• Good coverage, no low quality regions
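As a rough check on these numbers, average coverage can be estimated as (number of reads × mean read length) / assembly length. The sketch below assumes the nominal ~700 bases per read quoted for the sequencing pipeline; the usable length of individual reads after quality trimming is not given in these slides, so the result is only an approximation.

```python
def average_coverage(n_reads: int, mean_read_length: float, assembly_length: int) -> float:
    """Average fold coverage = total sequenced bases / assembly length."""
    return n_reads * mean_read_length / assembly_length


if __name__ == "__main__":
    # 438 reads and 40,809 bp come from the final assembly;
    # 700 is the nominal read length and is an assumption here.
    cov = average_coverage(n_reads=438, mean_read_length=700, assembly_length=40_809)
    print(f"Approximate average coverage: {cov:.1f}X")
```

With these inputs the estimate is about 7.5X.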
Final check:
EcoRI digest,
actual vs. in silico
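The in silico half of this check can be sketched as follows. EcoRI recognizes GAATTC and cuts after the G; because the site is palindromic, scanning one strand is enough. The function names and the toy sequence are illustrative assumptions, not the tool used in the course. A fosmid clone is circular, so the sketch offers a circular mode as well as a linear one, and it returns fragment sizes from largest to smallest so they can be compared with the bands on a gel.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between G and AATTC


def cut_positions(seq: str, circular: bool = False) -> list[int]:
    """Return sorted 0-based cut positions for EcoRI in seq."""
    seq = seq.upper()
    n = len(seq)
    # For a circular molecule, also catch a site that spans the origin.
    search = seq + seq[: len(ECORI_SITE) - 1] if circular else seq
    cuts = []
    start = search.find(ECORI_SITE)
    while start != -1:
        cuts.append((start + ECORI_CUT_OFFSET) % n)
        start = search.find(ECORI_SITE, start + 1)
    return sorted(cuts)


def fragment_sizes(seq: str, circular: bool = False) -> list[int]:
    """Return predicted fragment sizes, largest first."""
    n = len(seq)
    cuts = cut_positions(seq, circular)
    if not cuts:
        return [n]  # uncut molecule
    if circular:
        # Distance from each cut to the next, wrapping around the origin.
        sizes = [
            (cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n
            for i in range(len(cuts))
        ]
    else:
        bounds = [0] + cuts + [n]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    toy = "AAGAATTCTTTTGAATTCAA"  # made-up sequence with two EcoRI sites
    print("linear:  ", fragment_sizes(toy))
    print("circular:", fragment_sizes(toy, circular=True))
```

If the predicted sizes disagree with the observed bands beyond the resolution of the gel, that suggests a misassembly that needs another look.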
Annotation: analyzing
sequence data
• Practice problem: genes and pseudogenes in man
and chimpanzee
• Annotating Drosophila fosmid:
– Finding genes
– Finding repeats
– Searching for conserved elements
– Clustal analysis
– Evaluating synteny
• Final challenge: putting it all together
Working as a group, with TA assistance, is most effective
Partnership can be effective. Work on adjacent fosmids?
Annotation: what do students
gain by analyzing sequence data?
• What tools are available for finding genes
& other features of interest? How do they
work? Managing data…
• How do you define a gene? a pseudogene?
• How are genomes organized? Repeats?
• Power of comparative genomics
• Questions of evolution
Initial analysis of D. virilis dot
chromosome fosmids
27/28 genes remain on the dot, but rearrangements
within the chromosome are common!
Examples of genome organization in
Drosophila
Figure: gene maps for four regions (D.v. Arm, D.m. Arm, D.v. Dot, D.m. Dot). Genes labeled: Egfr, CG10440, Ephrin, CG1970, Pur-Alpha, Thd1, Zfh2. Legend: Coding, UTR, DNA Transposons, Other Repetitive. Scale bar: 5 kb.
Dot chromosome genes have larger introns due to repetitious DNA
Figure: Intron Size Distribution: Dot Chromosome versus Other Chromosomes. x-axis: Intron Size (0 to 1400); y-axis: % Introns This Size or Smaller (0% to 100%). Series: D. virilis Dot, D. melanogaster Dot, D. virilis Other, D. melanogaster Other.
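A cumulative plot of this kind ("% introns this size or smaller") can be computed from a list of intron lengths as sketched below. The function name and the toy intron sizes are made up for illustration; they are not the course data.

```python
from bisect import bisect_right


def percent_at_or_below(intron_sizes: list[int], thresholds: list[int]) -> list[float]:
    """For each threshold, return the percent of introns of that size or smaller."""
    if not intron_sizes:
        raise ValueError("intron_sizes must not be empty")
    sizes = sorted(intron_sizes)
    n = len(sizes)
    # bisect_right counts how many sorted sizes are <= the threshold.
    return [100.0 * bisect_right(sizes, t) / n for t in thresholds]


if __name__ == "__main__":
    toy_introns = [60, 70, 80, 500, 1200]  # hypothetical intron lengths
    thresholds = list(range(0, 1401, 200))  # same range as the plot's x-axis
    for t, pct in zip(thresholds, percent_at_or_below(toy_introns, thresholds)):
        print(f"<= {t:>4}: {pct:5.1f}%")
```

A curve that rises more slowly, as described here for the dot chromosomes, means a larger share of long introns.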
The dot chromosomes of D. melanogaster and D. virilis both have a
high density of repeat sequences, but differ in type of repeats
Figure: Repeat Density Comparison of D. melanogaster and D. virilis using RepeatMasker/Superlibrary with Classification. y-axis: Repeat Density (%) (0 to 30); x-axis: Species: Chromosome (DM: Dot, DV: Dot, DM: Arms, DV: Arms). Repeat classes: 1360 Elements, DINES, Other DNA Transposons, Unknown, Simple Repeats, Retroelements.
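Repeat density per class is the percent of bases in a region annotated as that class. The sketch below assumes the repeat annotations have already been reduced to (class, start, end) tuples with half-open coordinates and no overlaps between intervals; it does not parse RepeatMasker output, and the names and toy values are hypothetical.

```python
from collections import defaultdict


def repeat_density(
    annotations: list[tuple[str, int, int]], sequence_length: int
) -> dict[str, float]:
    """Return percent of the sequence covered by each repeat class.

    Assumes half-open [start, end) intervals that do not overlap.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    bases = defaultdict(int)
    for repeat_class, start, end in annotations:
        bases[repeat_class] += end - start
    return {cls: 100.0 * n / sequence_length for cls, n in bases.items()}


if __name__ == "__main__":
    toy_annotations = [  # made-up intervals on a 1,000 bp region
        ("DNA transposon", 0, 100),
        ("Retroelement", 200, 250),
        ("DNA transposon", 300, 400),
    ]
    densities = repeat_density(toy_annotations, sequence_length=1000)
    for cls, pct in densities.items():
        print(f"{cls}: {pct:.1f}%")
    print(f"Total: {sum(densities.values()):.1f}%")
```

Stacking the per-class percentages for each species and chromosome region gives a bar chart like the one described above.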
Resulting publication:
Slawson, E.E., Shaffer, C.D., Leung, W., Malone, C.D.,
Kellmann, E., Shevchek, R.B., Craig, C.A., Bloom, S.,
Bogenpohl, J. II, Dee, J., Morimoto, E.T.A., Myoung, J.,
Nett, A.S., Ozsolak, F., Tittiger, M.E., Zeug, A., Pardue,
M.L., Buhler, J., Mardis, E., and Elgin, S.C.R. (2006)
“Comparison of dot chromosome sequences from
D. melanogaster and D. virilis reveals an enrichment of
DNA transposon sequences in heterochromatic domains,”
Genome Biology 7: R15.
• But required ca. 10 months additional full-time work!
Assessment: Likert Scale
(5 = Agree, 1 = Disagree)
• Before the course, I understood how the human genome had
been sequenced: 2.5
• After the course, I understood:
– how the human genome had been sequenced: 4.9
– how eukaryotic genomes are organized: 4.5
– the nature of genes: 4.4
• The course helped me improve my wet lab skills: 2.5
• The course helped me improve my computer skills: 4.5
• Genomics is awesome! I love the power of databases! 4.8
Learning Gains from WU Lab Courses Compared to
Summer Program Research Experiences
Mean values, scale: 1-5. Chart series: Data from Course Work (25), SURE 2003 (1135)
1. Understanding of the research process: 4.24
2. Understanding how knowledge is constructed: 4.16
3. Ability to analyze data: 4.08
4. Skill in interpretation of results: 3.92
5. Understanding how scientists work on real problems: 3.88
6. Assertions require supporting evidence: 3.88
7. Skill in scientific writing: 3.80
8. Readiness for more research: 3.64
9. Tolerance for obstacles: 3.63
10. Ability to integrate theory and practice: 3.60
11. Learning lab skills: 3.56
12. Clarification of a career path: 3.13
13. Learning to work independently: 2.83
14. Understanding primary literature: 2.79
15. Learning ethical conduct: 2.22
Comparison of Learning Gains from WU Lab Courses
with Summer Research Experiences
Figure: mean values (y-axis, 1 to 4.5) for learning gains 1 to 15 (x-axis). Series: Course Work, SURE 2003, SURE 2004. Items called out on the chart: Learning to work independently, Understanding knowledge construction, Skill in scientific writing.
What Students Say They Learned:
• Oral presentation skills, defending ideas
• Scientific writing
• Why you do things, and how to choose a strategy
• That research doesn’t always work, and goes slowly
• That research is collaborative
• That science is more ambiguous than it appears in
lectures
Things Students Said Helped Them
Understand the Material Better:
• Writing formal lab reports
• Defending their work against challenges from
others (in oral presentations)
• Having lots of opportunities to ask questions
• Doing trouble-shooting
Lessons Learned
• Students need ownership; it can come from the computer-based effort and does not require wet lab work.
• Generating letter grades: use staged problem sets to teach
techniques and record progress; require periodic reports with written
and oral defense of conclusions.
• Challenging: the work is always changing and requires a time
commitment; computer support is important.
• Quality of the experimental work is very good! Finished
sequence, publishable data, conclusions. Good student-scientist partnership.
Goals for the workshop
• Provide background experience in gene annotation;
introduce computer-based training materials, problem
sets; annotate a Drosophila gene
• Provide a review of genome sequencing, visit the WU
Genome Sequencing Center
• Discuss your role as a TA
• Discuss plan to facilitate data in / data out from WU
• Discuss communications plan - Wiki? Help contacts?
• Discuss present and future projects of the GEP