Transcriptome scan for positive selection on coding regions

GCAT-SEEKquence
The Genome Consortium for Active Teaching
NextGen Sequencing Group
NextGen Sequencing Request Form
Complete fields below, save file with your last name at the beginning of
the filename (e.g. newman-GCAT-SEEK Sequence request form.pdf) and
email to Vincent Buonaccorsi <BUONACCORSI@juniata.edu>
A. Contact Information
1. Name:
Vince Buonaccorsi
2. Department: Biology
3. Institution: Juniata College
4. Phone Number: 814-641-3579
5. Email Address: buonaccorsi@juniata.edu
B. Project Information
1. Title: Transcriptome scan for positive selection on coding regions between S. carnatus and S.
chrysomelas.
2. Category: Transcriptome
3. Total amount of sequence requested: 450Mbp
4. Preferred technology: Illumina
5. Do you have funds for a partial run next Spring? No
C. Describe the background, hypotheses and specific aims (500 words max)
Detailed examination of adaptation and reproductive isolation at the molecular level is very challenging in most
species, yet to achieve a full understanding of these processes we must determine the genomic underpinnings
of population divergence. Continued development of new model systems is essential to test generality of early
findings from model species. The genus Sebastes represents a rare example of a marine vertebrate species flock
and presents a natural evolutionary laboratory within which one may pursue hypotheses with the rare
advantage of phylogenetic replication. S. carnatus and S. chrysomelas are a particularly intriguing and important
sympatrically distributed species pair because they appear to be divergent only at a small minority of loci, loci
that are likely linked to genes that produced and/or maintain species boundaries (Buonaccorsi et al. 2011). In
order to begin to characterize the genes and genomic processes that produced and maintain differences
between these species, we propose to sequence the transcriptome of S. carnatus and S. chrysomelas from a
single individual each, identify genes experiencing high non-synonymous to synonymous substitution rates and
thereby identify candidate gene sets involved in their divergence.
The process of speciation in the presence of gene flow requires strong coordinated divergent selection
on several pre-zygotic traits, where post-zygotic barriers have not yet taken hold (Coyne and Orr 2004; Via
2009). In contrast, divergence in allopatry may accumulate gradually (or quickly if a new environment is colonized) and may
involve a relatively random selection of pre- or post-zygotic isolating mechanisms. Speciation with gene flow
may be more common than previously appreciated, since new mechanisms for the process have been proposed
(Via 2009). Speciation with gene flow appears to be occurring in S. carnatus and S. chrysomelas: the species
have nearly completely overlapping distributions, partition niches by depth preference, show only slight
morphological divergence, and still experience limited introgression (Buonaccorsi et al. 2011).
Once a candidate set of genes with accelerated amino acid divergence has been identified between S.
carnatus and S. chrysomelas, we will test the hypothesis that genomic divergence follows patterns expected under
speciation with gene flow, characterized by overrepresentation of genes involved in pre-zygotic isolation
mechanisms such as mate recognition or habitat specialization (Via 2009). Follow-up studies on
divergent genes can also determine whether divergence was simultaneous and dates to the species' origin
(Buonaccorsi et al. 2011). A list of the divergent genes, their timing of divergence, and their levels of
introgression should paint an interesting picture of the divergence genetics for this incipient species pair.
D. Describe the methods [sample prep, calculation of amount of sequence required, analysis plan]
The genome size of Sebastes is estimated to be about 1 Gbp, and transcriptomes are typically about 1.5% of genome
size (about 15 Mbp). 30X coverage therefore amounts to 450 Mbp. A single Illumina lane provides many times that amount of
sequence, so I will easily be able to share a lane with my colleagues using barcoding.
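The arithmetic behind the requested amount can be reproduced with a short Python sketch. The function and variable names are illustrative assumptions; only the three input values (1 Gbp genome, 1.5% transcriptome fraction, 30X coverage) come from the text above.

```python
def required_sequence_bp(genome_size_bp, transcriptome_fraction, coverage):
    """Return the sequence (in bp) needed to cover the transcriptome at the given depth."""
    transcriptome_bp = genome_size_bp * transcriptome_fraction
    # round() guards against floating-point noise in the product
    return round(transcriptome_bp * coverage)


# Values stated in the proposal
GENOME_SIZE_BP = 1_000_000_000   # ~1 Gbp Sebastes genome estimate
TRANSCRIPTOME_FRACTION = 0.015   # transcriptome ~1.5% of genome size
COVERAGE = 30                    # target 30X depth

requested_bp = required_sequence_bp(GENOME_SIZE_BP, TRANSCRIPTOME_FRACTION, COVERAGE)
print(f"{requested_bp / 1e6:.0f} Mbp")  # 450 Mbp
```

Changing any one input (for example, a larger genome-size estimate) rescales the request proportionally.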
454FLX+ data from the S. carnatus transcriptome has been obtained from the incubator study. We propose to
isolate total RNA from liver, kidney, gonad, and brain tissue from one adult S. chrysomelas to complete the
comparison. S. chrysomelas tissue samples have been obtained and preserved in RNeasy solution (Qiagen) with
the aid of colleagues in the Monterey Bay area. Total RNA will be isolated using RNeasy Plus Mini kit (Qiagen).
Quantification will be performed on a Qubit 2.0 Fluorometer (Invitrogen) and mRNA purity checked with an
Implen Nanodrop Fluorometer. Isolation of mRNA from total RNA will be performed using magnetic porous
glass (MPG) mRNA Purification Kit (PureBiotech). Samples will be sent to the PSU sequencing facility for further
quality check on their Agilent Bioanalyzer. Results will be assembled using NextGENe or CLC Bio software if
available, or on the cloud using Velvet/Oases. Protocols for protein ID, open reading frames, and calculation of
non-synonymous (Ka) to synonymous (Ks) substitution ratios follow Heras et al. (2011). Briefly, gene pairs will
be annotated with BLAST2GO (Conesa et al. 2005) against the Swiss-Prot database using BlastX. To identify open
reading frames, EST sequences will be searched with BlastX against the stickleback and fugu genomes. Resulting files will be
processed through ORF-PREDICTOR and TargetIdentifier (Min et al. 2005a,b) to identify ORFs and whether entire
coding regions were covered. Protein alignments will be converted into corresponding codon alignments using
Pal2Nal 2.2 (Suyama et al. 2006). Nonsynonymous to synonymous substitution rates will be calculated using
KaKs_Calculator (Zhang et al. 2007). Gene pairs will be filtered for possible paralogs, and the resulting gene set
tested for enrichment of GO categories using Fisher’s exact test in GOSSIP (Bluthgen et al. 2005). Ka/Ks ratios > 1.0
are consistent with positive selection.
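The last two steps (flagging gene pairs with Ka/Ks > 1.0 and testing the flagged set for GO-category enrichment) can be illustrated with a minimal Python sketch. This is a stand-in for the logic only, not the KaKs_Calculator or GOSSIP implementations: the Ka and Ks values are assumed to have been estimated already, the gene names and counts are hypothetical, and the enrichment test is a plain one-sided Fisher's exact test with no multiple-testing correction.

```python
from math import comb


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks <= 0:
        return None
    return ka / ks


def positive_selection_candidates(pairs, threshold=1.0):
    """Return sorted gene names whose Ka/Ks exceeds the threshold.

    pairs maps a gene name to a (Ka, Ks) tuple.
    """
    hits = []
    for gene, (ka, ks) in pairs.items():
        ratio = ka_ks_ratio(ka, ks)
        if ratio is not None and ratio > threshold:
            hits.append(gene)
    return sorted(hits)


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) under the hypergeometric null.

    N = genes in the background set, K = background genes with the GO term,
    n = candidate genes, k = candidate genes with the GO term.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical Ka and Ks estimates for three gene pairs
example_pairs = {
    "geneA": (0.020, 0.010),  # Ka/Ks = 2.0, flagged
    "geneB": (0.005, 0.020),  # Ka/Ks = 0.25, not flagged
    "geneC": (0.010, 0.0),    # Ks = 0, ratio undefined, skipped
}
print(positive_selection_candidates(example_pairs))  # ['geneA']

# Hypothetical counts: 10 background genes, 3 carry the GO term,
# 3 candidates, all 3 of which carry the term
print(f"{fisher_enrichment_p(3, 3, 3, 10):.4f}")  # 0.0083
```

Gene pairs with Ks = 0 are skipped rather than treated as infinite ratios; whether that is appropriate depends on how the real pipeline handles saturated or identical sequences.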
E. Describe the role and number of undergraduates involved in the project, and how they would benefit.
If funded, I plan to invite 5 undergraduates per semester into my research program. This work and the
follow-up work involved will also be used as the foundation for larger projects in my new 4-credit Genetic
Research Methods class in the Spring semester (10 students), whose goal is to introduce students to modern
molecular research in the context of performing research projects. The results of this project will serve
as the basis for a publication that includes the research students who become involved. I aim to use my developing
expertise in NextGen analysis to implement a freshman laboratory module (150 students per year) that
introduces students to analysis of publicly available human exomes for potentially disease-causing nonsynonymous SNPs.
F. I agree to administer the GCAT-SEEK pre- and post-activity assessment test for students and to complete the
faculty post-utilization survey. X yes, ____ no
G. Describe any other broader impact or intellectual merit considerations.
Rockfish are model systems for the study of aging, conservation, and speciation. Transcriptomes from these
species can be compared to several others within the genus as part of broader efforts to understand the
molecular basis for adaptation to the marine environment, negligible senescence, and larval dispersal.
H. References
Bluthgen et al. 2005. Biological profiling of gene groups utilizing gene ontology. Genome Inform. 16, 106-115.
Buonaccorsi VP, Narum SR, Karkoska KA, Gregory S, Deptola S, Weimer AB. 2011. Characterization of a genomic
divergence island between black-and-yellow and gopher Sebastes rockfishes. Molecular Ecology. doi:
10.1111/j.1365-294X.2011.05119.x
Conesa et al. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics
research. Bioinformatics 21, 3674-3676.
Coyne JA, Orr HA. 2004. Speciation. Sinauer Associates. Sunderland, MA 545 pp.
Heras J, Koop BF, Aguilar A. 2011. A transcriptomic scan for positively selected genes in two closely related
marine fishes: Sebastes caurinus and S. rastrelliger. Marine Genomics 4, 93-98.
Min et al. 2005a. OrfPredictor: Predicting protein-coding regions in EST-derived sequences. Nucleic Acids Res.
33, W677-W680.
Min et al. 2005b. TargetIdentifier: a webserver for identifying full-length cDNAs from EST sequences. Nucleic Acids
Res. 33, W669-W672.
Suyama M, Torrents D, Bork P. 2006. PAL2NAL: robust conversion of protein sequence alignments into the
corresponding codon alignments. Nucleic Acids Res. 34, W609-W612.
Via S. 2009. Natural selection in action during speciation. Proceedings of the National Academy of Sciences 106,
9939–9946.
Zhang et al. 2007. KaKs_Calculator: Calculating Ka and Ks through model selection and model averaging.
Genomics Proteomics Bioinformatics 4, 259-263.