ele12460-sup-0001-SuppInfo

Supplemental Data and Experimental Procedures

Summary metrics of genomic divergence

In Table 1 and Supplemental Table S1, a series of metrics was calculated to summarize the allele frequency responses displayed by SNPs in the selection experiment and the genomic divergence between the host races in nature. These included: |∆ freq| = the mean absolute value of the allele frequency response in the selection experiment; r = the regression coefficient between the allele frequency response for SNPs in the selection experiment versus the allele frequency difference for the same SNPs between the host races in the sympatric comparison; ∆ races = the mean allele frequency difference between the host races in the sympatric comparison, taken for the alleles in the indicated categories of SNPs that increased in frequency in the selection experiment; and % same = the percentage of SNPs for which the allele frequency response in the selection experiment changed in the same direction as the difference between the host races (i.e., the proportion of times the sign of the SNP allele frequency difference between the 32-day hawthorn and 7-day hawthorn samples was the same as the sign of the difference between the 7-day apple and 7-day hawthorn samples).

Note that the ∆ races value considers the direction, as well as the magnitude, of the frequency difference between apple and hawthorn flies. Thus, if the allele that rose to higher frequency in the 32-day hawthorn sample in the selection experiment was at lower frequency in the apple than in the hawthorn race in the 7-day samples, then the difference carries a negative sign in the calculation of ∆ races.

The summary metrics were calculated for different categories of SNPs: (1) all 32,455 variable sites and 2,352 mapped markers; (2) SNPs within the all-variable-sites and mapped-marker data sets displaying significant responses to selection; (3) SNPs within the all-variable-sites and mapped-marker data sets displaying significant differences in the sympatric comparison; and (4) SNPs within the all-variable-sites and mapped-marker data sets displaying both significant responses in the selection experiment and significant differences in the sympatric comparison. For each of these categories, we also estimated the metrics using a single SNP drawn at random from each of the sets of SNPs in linkage equilibrium with one another. A total of 10,000 replicates were performed for these estimates, and we report the mean value calculated from these trials.

Sliding window analysis along chromosomes

In Figure 2, a sliding window analysis was used to compare and illustrate the response to selection and host race divergence genome-wide along chromosomes. Our metric for the response to selection was the absolute value of the allele frequency difference between the 32-day and 7-day hawthorn samples, averaged over all mapped SNPs falling within a 2 centimorgan window. The window was then slid one SNP at a time, according to the linkage map, along the length of each chromosome. The metric for host-related divergence was calculated as the absolute value of the allele frequency difference between the 7-day apple and 7-day hawthorn samples times the sign of the allele frequency difference between the 32-day and 7-day hawthorn samples, similarly centered on each SNP and then averaged across the corresponding 2 centimorgan window. This window was likewise shifted one SNP at a time, according to the linkage map, along the length of each chromosome. Also given are the correlation coefficients (r) and associated probability values (P) for correlations between sliding window estimates of the genetic response to selection and divergence between the host races for each chromosome. The patterns across each chromosome for mapped loci in the selection experiment and the sympatric comparison are illustrated in Fig. 2.
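The two sliding window metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function and argument names are hypothetical, the window is taken to be centered on each SNP (1 cM on either side), and the divergence metric is implemented literally as worded above, i.e., the absolute host race difference multiplied by the sign of the response to selection.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def window_means(positions_cm, values, window_cm=2.0):
    """Average `values` over all SNPs within a window centered on each SNP.

    Assumption: a "2 centimorgan window" means +/- 1 cM around the focal SNP.
    """
    half = window_cm / 2.0
    means = []
    for center in positions_cm:
        inside = [v for p, v in zip(positions_cm, values)
                  if abs(p - center) <= half]
        means.append(sum(inside) / len(inside))  # focal SNP is always inside
    return means


def sliding_window_metrics(positions_cm, haw7, haw32, apple7, window_cm=2.0):
    """Return (response, divergence) window averages for one chromosome.

    positions_cm: map positions of SNPs on a single chromosome (cM).
    haw7, haw32, apple7: per-SNP allele frequencies in the three samples.
    """
    # Response to selection: |32-day hawthorn - 7-day hawthorn|.
    response = [abs(s - h) for h, s in zip(haw7, haw32)]
    # Host divergence: |7-day apple - 7-day hawthorn| times the sign of the
    # response (32-day hawthorn - 7-day hawthorn), as described in the text.
    divergence = [abs(a - h) * sign(s - h)
                  for h, s, a in zip(haw7, haw32, apple7)]
    return (window_means(positions_cm, response, window_cm),
            window_means(positions_cm, divergence, window_cm))


if __name__ == "__main__":
    # Made-up frequencies for four SNPs, purely for illustration.
    pos = [0.0, 0.5, 1.5, 4.0]
    resp, div = sliding_window_metrics(
        pos, [0.50, 0.40, 0.60, 0.30], [0.60, 0.35, 0.70, 0.30],
        [0.70, 0.30, 0.75, 0.35])
    for p, r_val, d_val in zip(pos, resp, div):
        print(p, round(r_val, 3), round(d_val, 3))
```

Because one window value is produced per SNP, the two returned lists can be correlated directly to obtain a per-chromosome r of the kind reported with Fig. 2.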
Calculating the polygenic response to selection

In Figure 3, we calculated polygenic genotype scores for individual flies across the genome, defined as the proportion of a fly's genome composed of alleles more common in the hawthorn race. Polygenic scores were calculated by first determining, for each SNP, the allele more common in the 7-day hawthorn than in the 7-day apple fly sample. We then summed the number of such "hawthorn race" alleles an individual possessed across the genome, based on the genotype likelihood values for each SNP, and divided the sum by twice the number of SNPs present in the sample. The result was an estimate of the proportion of an individual fly's genome that was hawthorn-race-like in its composition. We then generated kernel density plots in R (version 2.11.1) of the polygenic genotype scores for individuals in the three sample populations to depict graphically how effective the selection experiment was in shifting the distribution of surviving flies in the 32-day selection treatment from the hawthorn race toward the apple race (Fig. 3). Distributions for all 32,455 variable SNPs, and for the variable SNPs that displayed both significant frequency differences between the host races and a significant response in the selection experiment, are shown in Fig. 3a and 3b, respectively. To quantify the shift in the polygenic genotype distribution toward the apple race in the selection experiment, we took the difference in the mean score between the 7-day hawthorn and the 32-day hawthorn samples and divided it by the difference in the mean score between the 7-day hawthorn and the 7-day apple samples.

Empirical estimates of linkage disequilibrium in R. pomonella

In general, unlinked SNPs mapping to different chromosomes displayed little or no disequilibrium.
Of a total of 6,011 pairwise LD tests between unlinked SNPs on different chromosomes displaying significant responses to selection, only 73 comparisons were significant (1.2%), and none were significant on a table-wide basis. The average r value between unlinked SNPs for Burrows' ∆ was 0.067, the largest r value between unlinked SNPs for Burrows' ∆ was 0.43, and the greatest number of significant associations displayed by any single SNP was 3. In contrast, LD was more pronounced between sets of SNPs residing on the same chromosome. Thus, SNPs in LD that reside on the same chromosome might not represent independent loci responding to selection on prewinter length. Rather, the responses they display could represent the indirect consequences of physical linkage and genetic hitchhiking with a third site that is the direct target of selection.

To account for this and derive an estimate of the minimum number of independent genes/gene regions under selection, we analyzed the pattern of pairwise ∆ values to determine the fewest sets of significantly responding SNPs in the selection experiment whose members display significant LD with other members of the same set but are in linkage equilibrium with all other SNPs. We accomplished this with a custom script that started with a randomly chosen SNP and successively added further randomly chosen SNPs displaying significant LD with other members of the growing set until no more such SNPs remained. The algorithm then continued by randomly choosing another SNP not yet contained in any set, repeating until all SNPs were assigned. At that point, the total number of sets was recorded, and the algorithm was reset to no assignments and rerun; the lowest estimate across 10,000 replicates was retained. The analysis was performed on the significant SNPs in the 2,352 mapped and the entire 32,455 polymorphic SNP data sets, and on SNPs showing significant differences in the sympatric comparison as well as in the selection experiment.
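The set-building procedure described above can be sketched as a randomized greedy partition. The following is an illustrative reconstruction, not the authors' custom script: all names (`one_partition`, `min_ld_sets`, `significant_pairs`) are hypothetical, and the text leaves open whether a SNP must be in significant LD with every current member of a set or with at least one. The sketch assumes "every member" by default (`rule=all`), since that makes the outcome depend on the random order and so motivates taking the minimum over replicates; passing `rule=any` gives the alternative reading, under which the sets are simply connected groups of SNPs.

```python
import random


def one_partition(snps, in_ld, rng, rule=all):
    """Assign every SNP to a set in one randomized greedy pass.

    in_ld(a, b) -> True if SNPs a and b are in significant LD.
    rule: `all` (assumed here) requires LD with every current member of the
          growing set; `any` requires LD with at least one member.
    """
    unassigned = list(snps)
    rng.shuffle(unassigned)               # random order of consideration
    sets = []
    while unassigned:
        current = [unassigned.pop()]      # randomly chosen starting SNP
        grew = True
        while grew:                       # keep adding until nothing qualifies
            grew = False
            for snp in list(unassigned):
                if rule(in_ld(snp, member) for member in current):
                    current.append(snp)
                    unassigned.remove(snp)
                    grew = True
        sets.append(current)
    return sets


def min_ld_sets(snps, significant_pairs, replicates=10_000, seed=0, rule=all):
    """Lowest number of sets found over repeated random restarts."""
    pairs = {frozenset(p) for p in significant_pairs}

    def in_ld(a, b):
        return frozenset((a, b)) in pairs

    rng = random.Random(seed)
    return min(len(one_partition(snps, in_ld, rng, rule))
               for _ in range(replicates))


if __name__ == "__main__":
    # Toy example: A-B and B-C are in significant LD, A-C is not; D is alone.
    snps = ["A", "B", "C", "D"]
    pairs = [("A", "B"), ("B", "C")]
    print(min_ld_sets(snps, pairs, replicates=100, rule=all))
    print(min_ld_sets(snps, pairs, replicates=100, rule=any))
```

The inputs would be the significantly responding SNPs and the SNP pairs whose Burrows' ∆ test was significant; the returned count corresponds to a lower-bound estimate of the number of independent sets.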
Simulation estimates of null expectations when including linkage disequilibrium

Computer simulations were then performed to generate non-parametric distributions for the null expectation of the number of independent sets of SNPs in linkage equilibrium expected by chance in the selection experiment, for both the entire 32,455-site and the 2,352 mapped-site data sets. The simulations were conducted by randomly choosing n = 54 and n = 47 individual whole-genome genotypes, with replacement, from the pool of 7-day hawthorn flies. Statistical significance for each SNP between the two random samples was then assessed as described above for the data from the selection experiment. Pairwise Burrows' ∆ values were then calculated between SNPs displaying significant differences, and these values were used in the LD algorithm to determine the lower-bound number of independent sets of SNPs in linkage equilibrium that they defined. The process was repeated 1,000 times to generate a null distribution for statistical testing.

Polygenic threshold model and number of selected loci

The pronounced genetic response observed within a single generation in the selection experiment might seem impossible because of the unrealistically large number of selective deaths it would appear to entail. However, when selection is imposed along one ecological axis, as we did, and involves a polygenic trait like diapause, in which many loci contribute to the phenotype, there is no limit to the number of loci potentially responding to selection. The problem is statistical detection given a finite sample size. We analyzed a polygenic threshold model of hard selection to show that the limiting issue in selection experiments is the statistical detection of loci under selection at a given sample size, rather than the possible number of loci under selection. To demonstrate this, we first estimated the average allele frequencies for SNPs that subsequently changed significantly in frequency in the selection experiment.
From these SNPs, the average frequency of the common allele was ~0.80. We then considered that half the time the common allele would be favored by rearing under apple-like environmental conditions and half the time the rare allele would be. A baseline pool of 10,000 non-selected experimental individuals was then constructed, each possessing a variable number, x, of unlinked and independently assorting loci sensitive to selection, equally divided between loci at which the common allele and loci at which the rare allele was favored under apple rearing conditions. We then randomly chose n = 54 individuals from the baseline pool, with replacement, to represent the 7-day hawthorn sample. We assumed that each of the x loci contributed equally to the diapause phenotype, with the total number of apple-selected alleles an individual possessed dictating a deeper initial diapause depth. We then selected the individuals in the upper 18% of the distribution, representing flies with diapause phenotypes that could potentially survive the 32-day prewinter treatment and emerge as adults after the 30-week chilling period; the 18% threshold represents the relative proportion of 32-day versus 7-day hawthorn flies that survived the experimental treatments (Fig. 1b). From this selected pool of 1,800 flies, we randomly chose n = 47 individuals, with replacement, to represent the 32-day hawthorn sample. We then determined the average allele frequency shifts for SNPs, and the numbers and proportions of SNPs that changed significantly in frequency. We performed 1,000 replicates for each value of x loci under selection to generate the expected means for the genetic response metrics. The results are summarized in Supplemental Fig. 1a-c.

Supplemental Fig. 1 shows that, for a polygenic model of selection with the sample sizes in our experiment, the 18% relative survivorship we induced in the long versus short prewinter treatment may detect only ~22% of all unlinked SNPs (or SNP sets) that are targets of selection. Thus, our estimate of 110 may greatly underestimate the actual number of gene regions affected by selection. As seen in Supplemental Fig. 1a-c, and as follows from the nature of the polygenic threshold model, there is no limit to the number of potential loci contributing to the deeper diapause phenotype that is under selection. As additional loci are added, however, each contributes proportionately less to the phenotype and, hence, the strength of selection acting on each gene decreases. Given a fixed and finite experimental sample size, the result is that: (1) the average shift in allele frequency of loci decreases as x increases (Supplemental Fig. 1c); and (2) a lower proportion of loci is detected as statistically responding to selection as x increases (Supplemental Fig. 1b). The absolute number of significantly responding loci nevertheless increases (Supplemental Fig. 1a).

For the actual experimental data, we found 162 independently responding genes/gene regions, with a lower bound estimate of 110. Based on Supplemental Fig. 1a, this implies that potentially many more loci are under divergent selection for diapause depth between the host races. Indeed, the results suggest that perhaps each of the 686 sets of SNPs defined by all 32,455 variable sites may contain one or more genes that responded to selection. This does not prove that such is the case. It does demonstrate, however, that numerous loci can simultaneously be under divergent selection and potentially respond in a manipulative experiment conducted within a generation. More often than not, the difficulty will be the statistical detection of many of these selected loci.
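The polygenic threshold simulation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: every name is hypothetical; genotypes in the baseline pool are assumed to be in Hardy-Weinberg proportions; the per-SNP significance test is not specified in this supplement, so a two-proportion z-test on allele counts at alpha = 0.05 is swapped in as a placeholder; and the reported shift is the signed change in frequency of the favored allele.

```python
import math
import random


def two_proportion_p(k1, n1, k2, n2):
    """Two-sided p-value for a difference between two allele frequencies.

    Placeholder test (normal approximation); an assumption of this sketch.
    """
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_once(x_loci, rng, pool_size=10_000, n_before=54, n_after=47,
                  survive_frac=0.18, common_freq=0.80, alpha=0.05):
    """One replicate: return (mean shift of favored allele, number significant)."""
    # The favored allele is the common one at half the loci, the rare one
    # at the other half.
    favored = [common_freq if i % 2 == 0 else 1 - common_freq
               for i in range(x_loci)]
    # Each individual carries 0, 1 or 2 copies of the favored allele per locus.
    pool = [[(rng.random() < p) + (rng.random() < p) for p in favored]
            for _ in range(pool_size)]
    before = rng.choices(pool, k=n_before)          # "7-day" sample
    # Hard truncation selection: keep the top fraction by favored-allele count.
    ranked = sorted(pool, key=sum, reverse=True)
    survivors = ranked[:int(round(survive_frac * pool_size))]
    after = rng.choices(survivors, k=n_after)       # "32-day" sample
    shifts, n_sig = [], 0
    for j in range(x_loci):
        k1 = sum(ind[j] for ind in before)
        k2 = sum(ind[j] for ind in after)
        shifts.append(k2 / (2 * n_after) - k1 / (2 * n_before))
        if two_proportion_p(k1, 2 * n_before, k2, 2 * n_after) < alpha:
            n_sig += 1
    return sum(shifts) / x_loci, n_sig


def expected_response(x_loci, replicates=1_000, seed=0, **kwargs):
    """Mean shift, mean number significant and mean proportion significant."""
    rng = random.Random(seed)
    runs = [simulate_once(x_loci, rng, **kwargs) for _ in range(replicates)]
    mean_shift = sum(r[0] for r in runs) / replicates
    mean_sig = sum(r[1] for r in runs) / replicates
    return mean_shift, mean_sig, mean_sig / x_loci


if __name__ == "__main__":
    # Small settings so the demo runs quickly; the text uses a pool of
    # 10,000 individuals and 1,000 replicates per value of x.
    for x in (10, 40):
        print(x, expected_response(x, replicates=5, pool_size=2_000))
```

Running `expected_response` over a range of x values yields the three quantities plotted against x in Supplemental Fig. 1, although the absolute numbers depend on the placeholder significance test used here.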
We emphasize that the strength of our study and the novelty of the experimental genomic approach lie not in the number of times we replicate the selection experiment as a whole, but in the number of polygenic SNPs and gene regions contributing to diapause adaptation that we effectively sample in a given replicate trial. Given the stochastic nature in which particular SNPs or gene regions may respond in a given trial, we show that we may obtain statistical significance for only ~22% of loci (Supplemental Figure 1). Nevertheless, the large number of potential targets and their general response in the predicted direction, even though not all may be significant individually, result in a substantial genome-wide pattern. We are therefore moving beyond looking at individual SNPs to a genome-wide distribution perspective. Further, we are integrating genome scans, natural history, and selection experiments to give a more powerful design for testing for the footprint of divergent selection and its significance for ecological speciation. The strength of this design is being able to: (1) select on the key environmental conditions causing reproductive isolation (RI) to distinguish significantly responding SNPs and gene regions from the background noise; and then (2) use this information to predict the direction and magnitude of divergence that should be observed in nature, to determine the genomic footprint of ecological selection and its relationship to RI.

Supplemental Table S1. Summary metrics of the genetic response in the selection experiment for the mapped SNPs to the R. pomonella genome, as well as for subcategories of mapped SNPs displaying significant differences in the selection experiment (sig. sel.), between the host races (sig. races), in both the selection experiment and between the host races (sig. both), and for sets of SNPs in linkage equilibrium with one another (link eq.).

Locus category                       n       |∆ freq|   r       ∆ races   % same
Mapped SNPs                          2,352   0.039      0.438   +0.017    62.8
Mapped SNPs sig. sel.                312     0.094      0.758   +0.050    82.4
Mapped SNPs sig. sel. (link eq.)     125     0.090      0.822   +0.059    86.4
Mapped SNPs sig. races               131     0.057      0.752   +0.082    84.0
Mapped SNPs sig. races (link eq.)    50      0.060      0.774   +0.077    85.2
Mapped SNPs sig. both                51      0.100      0.963   +0.121    100
Mapped SNPs sig. both (link eq.)     34      0.093      0.954   +0.114    100

* See Table 1 for results for all 32,455 variable SNPs genotyped in the study. n = number of SNPs per category; |∆ freq| = mean absolute allele frequency response in the selection experiment for the indicated SNP categories; r = correlation coefficient between allele frequency response in the selection experiment versus allele frequency difference between host races (P < 10^-6 for all r values); ∆ races = mean frequency difference between the host races for alleles increasing in frequency in the selection experiment; % same = percentage of SNPs for which the allele frequency response in the selection experiment changed in the same direction as the difference between the host races.

Supplemental Figure 1. Theoretical predictions from the polygenic threshold selection model (y-axis) for: (a) the number of statistically significant SNPs; (b) the proportion of statistically significant SNPs; and (c) the mean allele frequency shift for all SNPs expected for differing numbers of independently assorting SNPs experiencing selection (x-axis). Trend lines were fitted using a cubic spline function in R. Filled circles represent expectations if each of the 686 independent sets of SNPs observed in the study contained at least one gene under divergent selection. Note that in panel (a) we highlight the lower bound estimate of the number of independent regions we detected responding to selection in our experiment (110; y-axis), suggesting that approximately 500 independent regions could be responding to selection (x-axis).

Supplemental Figure 2.
Association between allele frequency shifts generated during the selection experiment on host-plant-associated overwintering conditions and allele frequency differences between sympatric hawthorn and apple host races of R. pomonella in nature.
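The association shown in Supplemental Figure 2 is summarized numerically by the metrics defined under "Summary metrics of genomic divergence" and reported in Supplemental Table S1. The sketch below shows one way those four quantities could be computed from per-SNP allele frequencies. It is an illustration only: the function names and input layout are hypothetical, r is computed as a Pearson correlation (following the table legend), and SNPs with a response of exactly zero are assumed to contribute zero to ∆ races and not to count toward % same.

```python
import math


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def summary_metrics(haw7, haw32, apple7):
    """Return |delta freq|, r, delta races and % same for a set of SNPs.

    Each argument lists, per SNP, the frequency of the same reference
    allele in the 7-day hawthorn, 32-day hawthorn and 7-day apple samples.
    """
    response = [s - h for h, s in zip(haw7, haw32)]     # selection experiment
    race_diff = [a - h for h, a in zip(haw7, apple7)]   # sympatric comparison
    n = len(response)
    abs_freq = sum(abs(d) for d in response) / n
    r = pearson_r(response, race_diff)
    # Orient each race difference toward the allele that increased in the
    # selection experiment, so that opposite directions count as negative.
    delta_races = sum(sign(d) * g for d, g in zip(response, race_diff)) / n
    same = sum(1 for d, g in zip(response, race_diff)
               if sign(d) != 0 and sign(d) == sign(g))
    return {"abs_delta_freq": abs_freq, "r": r,
            "delta_races": delta_races, "pct_same": 100.0 * same / n}


if __name__ == "__main__":
    # Made-up frequencies for four SNPs, purely for illustration.
    print(summary_metrics([0.50, 0.40, 0.60, 0.30],
                          [0.60, 0.35, 0.70, 0.25],
                          [0.70, 0.30, 0.75, 0.35]))
```

Orienting by the sign of the response makes the result independent of which allele is chosen as the reference at each SNP for ∆ races and % same.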
