GCAT-SEEKquence The Genome Consortium for Active Teaching NextGen Sequencing Group NextGen Sequencing Request Form Complete fields below, save file with your last name at the beginning of the filename (e.g. newman-GCAT-SEEK Sequence request form.pdf) and email to Vincent Buonaccorsi <BUONACCORSI@juniata.edu> A. Contact Information 1. Name: Vince Buonaccorsi 2. Department: Biology 3. Institution: Juniata College 4. Phone Number:814-641-3579 5. Email Address: buonaccorsi@juniata.edu B. Project Information 1. Title: Transcriptome scan for positive selection on coding regions between S. carnatus and S. chrysomelas. 2. Category:Transcriptome 3. Total amount of sequence requested: 450Mbp 4. Preferred technology: Illumina 5. Do you have funds for a partial run next Spring? No C. Describe the background, hypotheses and specific aims (500 words max) Detailed examination of adaptation and reproductive isolation at the molecular level is very challenging in most species, yet to achieve a full understanding of these processes we must determine the genomic underpinnings of population divergence. Continued development of new model systems is essential to test generality of early findings from model species. The genus Sebastes represent a rare example of a marine vertebrate species flock and presents a natural evolutionary laboratory within which one may pursue hypotheses with the rare advantage of phylogenetic replication. S. carnatus and S. chrysomelas are a particularly intriguing and important sympatrically distributed species pair because they appear to be divergent only at a small minority of loci, loci that are likely linked to genes that produced and/or maintain species boundaries (Buonaccorsi et al. 2011). In order to begin to characterize the genes and genomic processes that produced and maintain differences between these species, we propose to sequence the transcriptome of S. carnatus and S. chrysomelas from a single individual each, identify genes experiencing high non-synonymous to synonymous substitution rates and thereby identify candidate gene sets involved in their divergence. The process of speciation in the presence of gene flow requires strong coordinated divergent selection on several pre-zygotic traits, where post-zygotic barriers have not yet taken hold (Coyne and Orr 2004; Via 2009). Speciation in allopatry may accumulate gradually (or quickly if a new environment is colonized) and may involve a relatively random selection of pre- or post-zygotic isolating mechanisms. Speciation with gene flow may be more common than previously appreciated since new mechanisms for the process have been proposed (Via 2009). For S. carnatus and S. chrysomelas speciation with geneflow appears to be occurring. The species have nearly completely overlapping species distributions, niche partitioning by depth preference, only slight morphological divergence, and introgression is still occurring to a limited degree (Buonaccorsi et al. 2011). After a candidate set of genes with accelerated amino acid divergence has been identified between S. carnatus and S. chrysomelas, we hypothesize that genomic divergence follows patterns expected under speciation with gene flow, characterized by overrepresentation of genes involved in pre-zygotic isolation mechanisms such as mate recognition or habitat specialization (Via 2009). Follow-up studies can be performed on divergent genes to determine also whether divergence was simultaneous and dated to species origin (Buonaccorsi et al. 2011). A list of the divergent genes, their timing of divergence, and their levels of introgression should paint an interesting picture of the divergence genetics for this incipient species pair. D. Describe the methods [sample prep, calculation of amount of sequence required, analysis plan] The genome size of Sebastes is estimated to be about 1Gbp, and transriptomes are typically 1.5% of the genome size. 30X coverage amounts to 450Mbp. A single lane of Illumina provides many times over that amount of sequencing, so I will be able to easily share space with my colleagues using barcoding. 454FLX+ data from the S. carnatus transcriptome has been obtained from the incubator study. We propose to isolate total RNA from liver, kidney, gonad, and brain tissue from one adult S. chrysomelas to complete the comparison. S. chrysomelas tissue samples have been attained and preserved in RNeasy solution (Qiagen) with the aid of colleagues in the Monterey Bay area. Total RNA will be isolated using RNeasy Plus Mini kit (Qiagen). Quantification will be performed on a Qubit 2.0 Fluorometer (Invitrogen) and mRNA purity checked with an Implen Nanodrop Fluorometer. Isolation of mRNA from total RNA will be performed using magnetic porous glass (MPG) mRNA Purification Kit (PureBiotech). Samples will be sent to the PSU sequencing facility for further quality check on their Agilent Bioanalyzer. Results will be assembled using NextGENe or CLC Bio software if available, or on the cloud using Velvet/Oases. Protocols for protein ID, open reading frames, and calculation of non-synonymous (Ka) to synonymous (Ks) substitution ratios follow Heras et al. (2011). Briefly, gene pairs will be annotated using BLAST2GO (Conesa et al. 2005) to the Swissprot database using BlastX. To identify open reading frames EST sequences will be BlastX’d against Stickleback and Fugu genomes. Resulting files will be processed through ORF-PREDICTOR and TargetIdentifier (Min et al. 2005a,b) to identify ORFs and whether entire coding regions were covered. Protein alignments will be converted into corresponding codon alignments using Pal2Nal 2.2 (Suyama et al. 2006). Nonynonymous to synonymous substitution rates will be calculated using KaKs_Calculator (Zhang et al. 2007), filtered for possible paralogs, and tested for enrichment of GO categories using a Fisher’s Exact Test using GOSSIP (Bluthgen et al. 2005). Ka/Ks ratios of > 1.0 are consistent with positive selection. E. Describe the role and number of undergraduates involved in the project, and how they would benefit. If funded I will plan to invite 5 undergraduates per semester into my research program . This work and the followup work involved will also be used as the foundation for larger projects in my new 4 credit Genetic Research Methods class in the Spring semester (10 students) whose goal is to introduce students to modern molecular research in the context of performing modern research projects. The results of this project will serve as basis for a publication that includes the research students that become involved. I aim to use my developing expertise in nextGen analysis to implement a freshman laboratory module (150 students per year) aiming to introduce students to analysis of publically available human exomes for potential disease causing nonsynonymous SNPs. F. I agree to administer the GCAT-SEEK pre- and post-activity assessment test for students and to complete the faculty post-utilization survey. X yes, ____ no G. Describe any other broader impact or intellectual merit considerations. Rockfish are model systems for the study of aging, conservation, and speciation. Transcriptomes from these species can be compared to several others within the genus as part of broader efforts to understand the molecular basis for adaptation to the marine environment, negligible senescence, and larval dispersal. H. References Bluthgen et al. 2005. Biological profiling of gene groups utilizing gene ontology. Genome Inform. 16, 106-115. Buonaccorsi VP, Narum SR, Karkoska KA, Gregory S, Deptola S, Weimer AB. 2011. Characterization of a genomic divergence island between black-and-yellow and gopher Sebastes rockfishes. Molecular Ecology. doi: 10.1111/j.1365-294X.2011.05119.x Conesa et al 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674 -3676. Coyne JA, Orr HA. 2004. Speciation. Sinaur Associates. Sunderland, MA 545 pp. Heras J, Koop BF, Aguilar A. 2011. A transcriptomic scan for positively selected genes in two closely related marine fishes: Sebastes caurinus and S. rastrelliger. Marine Genomics. 2011 Jun;4(2):93-8. Epub 2011 Mar 9. Min et al. 2005a. OrfPredictor: Predicting protein-coding regions in EST-derived sequences. Nucleic Acids Res. 33, W677-W680 Min et al. 2005b. Targetidentifier: a web for identifying full length cDNAs from EST sequences. Nucleic Acids Res. 33. W699-W672. Suyama M, Torrents D, Bork P. 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609-W612. Via S. 2009. Natural selection in action during speciation. Proceedings of the National Academy of Sciences, 106, 9939–9946 Zhang et al. 2007. KaKs_Calculator:Calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics, 4, 259-263.