SCU Biology

Assessing Genetic Diversity in the Rare Sandhill Endemic Erysimum teretifolium Using Microsatellites and Next-Generation Sequencing

Julie A. Herman, Khaaliq DeJan, Justen B. Whittall
Santa Clara University, CA
Artwork by Edward Rooks

Background

Island biogeography predicts that populations occupying island-like habitats near genetic reservoirs will contain higher levels of diversity than more isolated populations (Vellend 2003). Genetic structure within such islands is then expected to reflect isolation by distance (Wright 1943). Genetic diversity is also predicted to be positively correlated with population size (Leimu et al. 2006).

The Zayante Sandhills of Santa Cruz, California, are island-like xeric habitats separated by mesic redwoods and mixed evergreen forests. These unique habitats are home to many endemic plant and animal species, including the Ben Lomond Wallflower (Erysimum teretifolium; Fig. 1A). This naturally patchy habitat is threatened by the sand quarrying industry (Fig. 1B) and residential development. An unknown number of populations of E. teretifolium remain, several of which contain fewer than 100 individuals.

Using two distinct methods, microsatellite analysis and Next-Generation Sequencing (NGS), this project investigates the distribution of genetic diversity within and among eight extant populations to determine whether E. teretifolium's island-like habitat influences its genetic distribution and to guide future conservation priorities. Such data will help land managers determine appropriate seed sources for establishing new populations of E. teretifolium. In particular, this project addresses the complexity of analyzing microsatellite data from a hexaploid plant species and discusses whether NGS may provide a viable alternative for estimating genetic diversity in such taxa.

Fig. 1. The Ben Lomond Wallflower. Erysimum teretifolium occupies inland sandhills of Santa Cruz Co. (A), which have been largely destroyed by sand quarrying (B).

Research Questions

• Is there discernible population structure in E. teretifolium?
• Is the distribution of genetic diversity within and among populations consistent with this species' insular habitat?
• Do population size or geographic isolation impact genetic diversity within populations?
• Can NGS complement traditional microsatellite approaches for conservation genetics?

Methods

Samples were collected from 186 individuals representing 8 populations of E. teretifolium (11-32 individuals per population). DNA was extracted with a NucleoSpin Plant II kit using lysis buffer 1 (Macherey-Nagel). PCR amplification was carried out on 3 microsatellite loci (18 total alleles) developed for the European E. mediohispanicum, following the methods of Muñoz-Pajares et al. (2011). Alleles were separated on an ABI3730 with a LIZ600 size standard, and fragment lengths were determined using PeakScanner Software v1.0 (Life Technologies).

Because E. teretifolium is hexaploid, we could not confidently determine genotypes, so we analyzed the data with the restriction model in Structure (Pritchard et al. 2000). A range of population clusters (k = 1-10) was tested using location priors and allowing for admixture (ngen = 10^6, 5 replicates per k-value, burnin = 5 × 10^5, lambda = 0.51202, determined empirically). The number of population clusters that best fit the data was calculated using the Δk method of Evanno et al. (2005) in Structure Harvester (Earl & von Holdt 2011). Runs with identical parameters were also conducted including samples from the closely related wallflower E. capitatum ssp. angustatum (ERCAAN) to ensure the model could differentiate these taxa. Average group assignments for E. teretifolium were used for later analyses. Samples were analyzed in Arlequin v3.5 (Excoffier et al. 2005) for AMOVA and Fst using the groupings predicted by Structure.

The total number of differences between each pair of individuals was calculated in PAUP v4.0 (Swofford 2002). The distribution of genetic distances within and among populations was calculated from the resulting distance matrix. Geographic distances were determined in Google Earth based on GPS coordinates. A Mantel nonparametric test was used to compare the geographic and genetic distance matrices (Liedloff 1999). Population size estimates were based on censuses of juveniles, flowering individuals, and fruiting individuals at each site. Remaining analyses were carried out in Excel.

Population Structure

Fig. 3. Average probability of group assignments. Pie diagrams depict the average group assignment probabilities in each population for the two genetic clusters identified by Structure for E. teretifolium.

• Two primary geographic clusters emerge based on Structure assignments: Northwest/South (QH, BD, AZA/Hwy17) and Central (OLY, GEY, SHGW), with MTH acting as a bridge between the Central and South groupings.
• Groupings may be arising from a central versus peripheral division.

Sources of Genetic Variation

Fig. 4. Analysis of Molecular Variance. Variation is partitioned among groups, among populations within groups, and within populations. Populations were assigned to groups based on average group assignment probability from Structure (k = 2 categories, without ERCAAN). 82% of the variation exists within populations.

Fst

• 24 of 28 comparisons between populations had Fst significantly greater than 0 (p < 0.05).
• Hwy17, one of the smallest, most disturbed, and most isolated populations, has the highest pairwise Fst.
• AZA, one of the largest, least disturbed, and most central populations, has the lowest Fst.
• Although AMOVA shows that most of the variation is contained within populations, Fst reveals that most populations are significantly different from one another.

Fig. 5. Isolation by distance. Average genetic distance (total differences) is plotted against geographic distance (m). Genetic distances are averages of all pairwise comparisons of individuals for each pairwise comparison of populations. No correlation (Mantel test: 10^4 iterations, 8x8 half matrix, randomization, r = -0.3098, n.s.). Linear fit: y = -4E-05x + 3.2966, R² = 0.096.

• There is no correlation between geographic distance and genetic distance.

Next-Generation Sequencing Approach

• 25 individuals per population were pooled into a single barcode.
• 4 populations in total were barcoded and sequenced on a single lane of Illumina HiSeq (shared with a total of 8 barcodes/lane).

Workflow: plant tissue (fresh) → DNA extraction → DNA → library prep (Nextera) → Illumina HiSeq (USC Epigenome Center) → microreads (50 bp paired-end) → de novo assembly (Velvet) → contiguous sequences (contigs) → contigs blasted to A. thaliana for identification → identify SNPs.

Barcode   Microreads    k-mer length   Median depth of coverage   N50   Max contig length (bp)
CAGGCG    50,044,686    23             3.375                      274   7969
CAGGCG    50,044,686    27             3.225                      395   9373
CAGGCG    50,044,686    31             3.116                      277   10583
CAGGCG    50,044,686    35             3.170                      151   13277
CAGGCG    50,044,686    39             4.539                      367   18453
CTAGCT    51,868,830    23             3.412                      226   8764
CTAGCT    51,868,830    27             3.259                      381   10764
CTAGCT    51,868,830    31             3.153                      263   10583
CTAGCT    51,868,830    35             3.203                      147   10587
CTAGCT    51,868,830    39             4.575                      363   16179
TAATCG    49,746,950    23             3.279                      240   11043
TAATCG    49,746,950    27             3.124                      342   11317
TAATCG    49,746,950    31             3.033                      191   9369
TAATCG    49,746,950    35             3.140                      133   13990
TAATCG    49,746,950    39             4.764                      384   16179
TGACCA    50,491,668    23             3.383                      234   8196
TGACCA    50,491,668    27             3.227                      377   9578
TGACCA    50,491,668    31             3.128                      247   10583
TGACCA    50,491,668    35             3.192                      147   15095
TGACCA    50,491,668    39             4.602                      368   18453

Fig. 8. De novo assembly of contigs for four populations of E. teretifolium across a range of k-mer lengths. All four of the longest contigs (k-mer length = 39) are similar to known A. thaliana mitochondrial sequences but contain SNPs and indels (megablast, E = 0.0).

Conclusions

• Most of the genetic diversity exists within populations and correlates weakly with population size.
• Continental islands such as the Zayante sandhills may not act the same as oceanic islands, as seen in the case of E. teretifolium, which does not fit an isolation by distance model.
• These results suggest that an island-like model is inappropriate for describing these populations, although they superficially resemble island habitats.

Acknowledgements

Team Wallflower, Summer 2012.

• Cindy Dick, Miranda Melen, & Devin Wakefield at SCU provided invaluable assistance, as did Inés Casimiro-Soriguer from Universidad Pablo de Olavide.
• Charles Nicolet from USC's Epigenome Center provided critical assistance with the NGS library preps & sequencing.
• Jodi McGraw, Ingrid Parker, Val Haley & Terris Kasteen provided essential field assistance.

Funding was provided by an SCU ALZA Scholarship to JH and Section VI funds from the California Department of Fish and Wildlife to JW.

References

Earl D & von Holdt B (2011) Structure harvester: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources:1-3.
Evanno G, Regnaut S, & Goudet J (2005) Detecting the number of clusters of individuals using the software Structure: a simulation study. Molecular Ecology 14(8):2611-2620.
Excoffier L, Laval G, & Schneider S (2005) Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50.
Leimu R, Mutikainen P, Koricheva J, & Fischer M (2006) How general are positive relationships between plant population size, fitness, and genetic variation? Journal of Ecology 94(5):942-952.
Liedloff AC (1999) Mantel Nonparametric Test Calculator. Version 2.0. School of Natural Resource Sciences, Queensland University of Technology, Australia.
Muñoz-Pajares AJ, Herrador MB, Abdelaziz M, Picó FX, Sharbel TF, Gómez JM, & Perfectti F (2011) Characterization of microsatellite loci in Erysimum mediohispanicum (Brassicaceae) and cross-amplification in related species. American Journal of Botany e287-e289.
Pritchard JK, Stephens M, & Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945-959.
Swofford DL (2002) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.
Vellend M (2003) Island Biogeography of Genes and Species. The American Naturalist 162(3):358-365.
Wright S (1943) Isolation by distance. Genetics 28(2):114.
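
Supplementary sketch: Mantel permutation test

The Methods compare the geographic and genetic distance matrices with a Mantel nonparametric test. The sketch below is one minimal way to run such a test. It is not the Liedloff (1999) calculator used for the poster. It assumes a Pearson correlation on the half matrices and a one-sided p-value, and the 4x4 matrices, the function names, and the seed are hypothetical illustrations rather than data or settings from this study.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten entries above the diagonal, optionally permuting rows and columns together."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo, gen, permutations=10_000, seed=0):
    """Return (r, p) for a one-sided Mantel test (alternative: positive correlation)."""
    rng = random.Random(seed)
    geo_flat = upper_triangle(geo)
    r_obs = pearson(geo_flat, upper_triangle(gen))
    n = len(gen)
    hits = 0
    for _ in range(permutations):
        # Shuffle population labels of one matrix; rows and columns move together.
        order = list(range(n))
        rng.shuffle(order)
        if pearson(geo_flat, upper_triangle(gen, order)) >= r_obs:
            hits += 1
    # The +1 terms count the observed arrangement as one possible permutation.
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical toy matrices for four populations (NOT data from this study).
geo = [  # geographic distance in metres
    [0, 1000, 4000, 9000],
    [1000, 0, 3000, 8000],
    [4000, 3000, 0, 5000],
    [9000, 8000, 5000, 0],
]
gen = [  # average genetic distance (total differences)
    [0, 3.1, 3.4, 3.0],
    [3.1, 0, 2.9, 3.3],
    [3.4, 2.9, 0, 3.2],
    [3.0, 3.3, 3.2, 0],
]

r, p = mantel(geo, gen, permutations=999)
print(f"Mantel r = {r:.4f}, one-sided p = {p:.4f}")
```

Permuting rows and columns together keeps each population's distances intact while breaking any link between the two matrices, which is why the test suits the non-independent entries of a distance matrix. For the 8x8 half matrix described in Fig. 5, the same function would take the two 8x8 matrices with `permutations=10_000`.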