Genomics Education Partnership: a flexible approach to implement Genomic teachings and research in the classroom
Matthew W. Wadsworth and Consuelo J. Alvarez
Department of Biological and Environmental Sciences, Longwood University, Farmville, VA 23901

INTRODUCTION
The Genomics Education Partnership (GEP) has afforded students at Longwood University the opportunity to work on sequence finishing and annotation research projects of scientific significance. The project focuses on many closely related Drosophila species (Fig. 1). Its purpose is to assess the differences between the dot chromosome of D. melanogaster, which is largely heterochromatin, and those of other species such as D. mojavensis, D. erecta, and D. virilis, which are composed mainly of euchromatin. The long-term goal is to use comparative genomics to discover the evolutionary cause of the relatively recent transition from a euchromatic to a heterochromatic dot chromosome.

Each student begins by selecting a finishing or annotation project from an online database. Finishing the DNA sequence is the first step. The projects were compiled by the Genome Sequencing Center (GSC) at Washington University, St. Louis, for use by students. Genomes enter the GSC as BAC or fosmid libraries, from which clones to be sequenced are chosen. The GSC then prepares libraries of approximately 2 kb from each clone, which are shotgun sequenced (Fig. 2). When these DNA fragments are pieced together using Phred/Phrap, the resulting sequence can have a wide variety of problems, such as gaps or low-quality areas, that the finisher must correct.

Annotation is the process of locating genes and other relevant sequences within the finished DNA sequence. This process requires the use of various online gene databases and results in specific gene locations with exact exon/intron boundaries, which can then be used for comparative analysis.

ANNOTATION
The process of mapping the location of genes and other relevant sequences within a finished DNA sequence
GOALS: To find genes, their functions, and the proteins they encode; to delimit exon-intron boundaries; and to uncover isoforms and orthologs
PROCEDURE:
* Check basic information from various gene-finding programs such as Genscan and GeneID (Fig. 7a-b)
* Mask the repetitive sequences, which range from simple poly-A repeats to repetitive elements of over 50 base pairs
* Search a database containing matches to known gene sequences; use NCBI's Basic Local Alignment Search Tool (BLAST) vs. D. melanogaster (Fig. 8)
* Map the exon/intron boundaries and check their accuracy with the Gene Model Checker program (provided by GEP)
* Note any sequences that are out of the ordinary
* Submit the project (Fig. 9)

Figure 1. Phylogenetic tree for Drosophila species showing that the D. melanogaster subgroup has evolved more recently than D. virilis, prompting scientists to investigate the reason for a change from a euchromatic to a heterochromatic dot chromosome.

Figure 2. Genome shotgun sequencing pathway indicating that gaps in the sequence of the assembly can form where no fosmids are recorded across a given area.

FINISHING
A multi-step process used to piece together a complete and flawless DNA sequence
GOALS: To eliminate gaps and to correct low-quality regions, high-quality discrepancies between bases, and regions covered by only a single subclone or a single sequencing chemistry
PROCEDURE:
* Select DNA fosmids of approximately 40 kb in length from an online database
* Analyze these sequences using the programs Consed and Phred/Phrap
* Look at the gaps present in the fosmid and the overall quality of the sequence (Fig. 3)
* Correct high-quality discrepancies between base pairs when enough evidence from other reads is present (Fig. 4)
* Call for reads to solve more complex problems such as gaps or low-quality areas that have no relevant data present (Fig. 5)
* Check for bacterial contaminants with BLAST
* Continue to annotation once finishing is complete and the consensus sequence is of high quality (Fig. 6)

Figure 3. A screenshot from the program Consed showing the original assembly view of finishing project 120-D14. The DNA contig labeled 3 is a sequence approximately 29 kb in length, while contig 2 is roughly 12 kb in length. One of the primary duties of a finisher is to bridge the gap between these two contigs in order to create a single continuous contig roughly 41 kb long. The green lines spanning the assembly represent the amount of coverage over the area as well as the relative quality of the data present. The triangular lines represent forward and reverse pairs, and the orange and black boxes show tandem and inverted repeats. Additional reads may be required to raise the quality of certain areas to an acceptable level.

Figure 4. An example of a high-quality discrepancy between base pairs. Each row represents a separate read over the target area. While all reads are of equal quality at that particular base (position 13,533 of the consensus sequence), some call it as a cytosine and others as a thymine. This discrepancy is an example of a polymorphism and cannot be manually corrected by the finisher. Instead, the finisher tags the area so that more detailed research can be conducted later on.

Figure 7a. Visual representation of the genes predicted by the Genscan program for the annotation project containing fosmid13. Genscan has predicted five genes in total within the fosmid, three on the plus strand and two on the minus strand.

Figure 7b. A more detailed read-out of the predicted genes provided by the Genscan program. It shows the predicted exon/intron boundaries as well as relevant sequences such as the promoter region and poly-A tails.

Figure 9. A summary of the actual genes located within fosmid13 is shown.
Notice that five genes were originally predicted (Fig. 7); however, only two genes were actually found to be present within fosmid13.

Figure 8. Blastn results for a section of a suspected gene sequence within fosmid13 after being run against a D. melanogaster gene database. The query sequence represents the D. melanogaster gene location within the chromosome, and the subject sequence is the suspected D. erecta gene. The two sequences are 96% identical; an example of a mismatched base pair is circled in red.

Figure 5. Two separate sets of reads called for project 120-D14. The column entitled Oligo Sequence shows the selected DNA primer sequence from which the read will originate. Primers are selected 70-100 bases from the problem area and are oriented to span the region in question. Through the use of these reads, the two contigs were joined and all low-quality and single-subclone/single-chemistry areas were remedied.

CONCLUSION
The implementation of the Genomics Education Partnership at Longwood University was successful. A total of five annotation and four finishing projects were completed and submitted during my two-semester involvement with this project. The finishing of D. mojavensis and the annotation of D. virilis have since been completed. Although we started with the annotation of D. erecta, this year the GEP added new Drosophila species to its research. The current target organism for finishing is therefore D. grimshawi, while annotation targets D. mojavensis as well as the remaining fosmids of D. erecta. The data obtained through the completion of these projects will help upper-level researchers determine the cause of the evolutionary transition from a euchromatic to a heterochromatic dot chromosome. The GEP provides a unique educational experience, allowing students to be involved in a project that requires collaboration with other students and faculty spread across the country.
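
As a side note on the 96% match reported for the alignment in Figure 8, the following is a minimal sketch of how a percent-identity value can be computed from two already-aligned sequences. It is not the BLAST implementation or any GEP tool: the function name `percent_identity` is hypothetical, the two short sequences are made up for illustration (they are not taken from fosmid13), and the sketch assumes both strings have equal length with `-` marking gaps.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Assumes the two strings are already aligned and of equal length,
    with '-' marking a gap. A column containing a gap counts as a mismatch.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    if not query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(query.upper(), subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(query)


# Made-up 25-column alignment with a single mismatch (illustrative only).
query = "ATGGCGTACCTGAAGTTCGATCCGA"
subject = "ATGGCGTACCTGAAGTTCGATCTGA"
print(f"{percent_identity(query, subject):.0f}% identity")  # prints "96% identity"
```

Here 24 of the 25 columns match, which gives 96%. The denominator is the full alignment length, so any gap columns would lower the value.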
Figure 6. Assembly view of project 120-D14, obtained upon project completion, showing that the initial gap has been bridged, and all other errors corrected. The fosmid will continue on to annotation from this point.

REFERENCES
GEP Homepage: http://gep.wustl.edu/
NCBI BLAST Search Engine: http://blast.ncbi.nlm.nih.gov/Blast.cgi
FlyBase: http://flybase.org/
UCSC Genome Browser: http://genome.ucsc.edu/
RepeatMasker: http://www.repeatmasker.org/
Genscan: http://www.genscan.com/

ACKNOWLEDGMENTS
GEP Program Director: Sarah C.R. Elgin
Technical Director: Chris Shaffer
Chief Technical/Teaching Assistant: Wilson Leung
Sponsored by Washington University at St. Louis and HHMI
GEP members and partner 06-07-08 and their students and institutions
Biology 425 students, spring class 2008 at Longwood University