ele12460-sup-0001-SuppInfo

Supplemental Data and Experimental Procedures
Summary metrics of genomic divergence
In Table 1 and Supplemental Table 1, a series of metrics were calculated to summarize various
aspects of the allele frequency responses displayed by SNPs in the selection experiment and
genomic divergence between the host races in nature. These included: |∆ freq| = mean absolute
value of the allele frequency response in the selection experiment; r = the regression coefficient
between the allele frequency response for SNPs in the selection experiment versus the allele
frequency difference for the same SNPs between the host races in the sympatric comparison; ∆
races = the mean allele frequency difference between the host races in the sympatric comparison
for alleles for indicated categories of SNPs increasing in frequency in the selection experiment.
Note that the ∆ races value considers the direction, as well as the magnitude, of the frequency
difference between apple and hawthorn flies. Thus, if the allele responding to reach higher
frequency in the selection experiment in the 32-day hawthorn sample was in lower frequency in
the apple than the hawthorn race in the 7-day samples, then the difference would have a negative
sign in the calculation of ∆ races; % same = the percentage of SNPs for which the allele
frequency response in the selection experiment changed in the same direction as the difference
between the host races (i.e., the proportion of times the sign of the SNP allele frequency
difference between the 32-day hawthorn and 7-day hawthorn samples was the same as the
difference between the 7-day apple and 7-day hawthorn samples).
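As a minimal sketch of how the four metrics defined above could be computed, assume that the frequency of one fixed reference allele per SNP is available for the 7-day hawthorn, 32-day hawthorn, and 7-day apple samples. The function and argument names are hypothetical, r is computed here as a Pearson correlation (as in the footnote to Supplemental Table S1), and SNPs with a response of exactly zero contribute zero to ∆ races; none of these choices is confirmed as the original implementation.

```python
from math import sqrt

def summary_metrics(haw7, haw32, apple7):
    """Each argument: per-SNP frequency of the same reference allele (hypothetical input layout)."""
    # Response to selection and host-race difference for the reference allele.
    resp = [b - a for a, b in zip(haw7, haw32)]
    race = [c - a for a, c in zip(haw7, apple7)]
    n = len(resp)

    # |delta freq|: mean absolute allele frequency response.
    mean_abs_resp = sum(abs(d) for d in resp) / n

    # r: Pearson correlation between response and host-race difference (assumption).
    mx, my = sum(resp) / n, sum(race) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(resp, race))
    sxx = sum((x - mx) ** 2 for x in resp)
    syy = sum((y - my) ** 2 for y in race)
    r = sxy / sqrt(sxx * syy)

    # delta races: host-race difference polarized to the allele that increased
    # in the selection experiment, so opposite-direction SNPs count as negative.
    sign = lambda v: (v > 0) - (v < 0)
    delta_races = sum(sign(d) * e for d, e in zip(resp, race)) / n

    # % same: share of SNPs whose response and race difference have the same sign.
    pct_same = 100 * sum(1 for d, e in zip(resp, race) if d * e > 0) / n
    return mean_abs_resp, r, delta_races, pct_same
```

Polarizing the host-race difference by the sign of the response is what allows ∆ races to take negative values when the selected allele is rarer in the apple race.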
The summary metrics were calculated for different categories of SNPs including: (1.) all 32,455
variable sites and 2,352 mapped markers; (2.) SNPs within the all-variable-sites and
mapped-marker data sets displaying significant responses to selection; (3.) SNPs within these
data sets displaying significant differences in the sympatric
comparison; and (4.) SNPs within these data sets displaying
significant responses in the selection experiment and significant differences in the sympatric
comparison. For each of these categories, we also estimated the metrics considering a single SNP
drawn at random from each of the sets of SNPs in linkage equilibrium with one another. A total
of 10,000 replicates were performed for these estimates, and we report the mean value
calculated from these trials.
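The resampling over linkage sets described above can be sketched as follows. This is an illustration under stated assumptions rather than the script used in the study: `linkage_sets` is a hypothetical list in which each inner list holds the per-SNP values for one set, and `metric` is any function that reduces one draw to a single number.

```python
import random

def mean_metric_over_draws(linkage_sets, metric, replicates=10_000, seed=None):
    """Average a metric over replicates, drawing one SNP at random from each set."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        # One randomly chosen SNP value per linkage set.
        draw = [rng.choice(snp_values) for snp_values in linkage_sets]
        total += metric(draw)
    return total / replicates
```

For example, passing `metric=lambda d: sum(abs(v) for v in d) / len(d)` with per-SNP frequency responses would give a linkage-corrected estimate of |∆ freq|.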
Sliding window analysis along chromosomes
In Figure 2, a sliding window analysis was used to compare and illustrate the response to
selection and host race divergence genome-wide along chromosomes. Our metric for the
response to selection was the absolute value of the allele frequency difference between the 32-day and 7-day hawthorn samples averaged for all mapped SNPs falling within a 2 centiMorgan
window. The window was then slid one SNP at a time according to the linkage map along the
length of each chromosome. The metric for host-related divergence was calculated as the
absolute value of the allele frequency difference between the 7-day apple and 7-day hawthorn
samples times the sign of the allele frequency difference between the 32-day and 7-day hawthorn
samples similarly centered on each SNP and then averaged across the corresponding 2 centiMorgan window. This window was then shifted one SNP at a time according to the linkage map
along the length of each chromosome. Also given are the correlation coefficients (r) and
association probability value (P) for correlations between sliding window estimates of genetic
response to selection versus divergence between the host races for each chromosome. The
patterns across each chromosome for mapped loci in the selection experiment and the sympatric
comparison are illustrated in Fig. 2.
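The windowing step can be illustrated with a short sketch. It assumes that the 2 centiMorgan window is centered on each focal SNP (1 cM to either side) and that map positions for one chromosome are supplied in centiMorgans; the function name and arguments are hypothetical.

```python
def sliding_window_means(positions_cm, values, width_cm=2.0):
    """Average a per-SNP metric in a window centered on each SNP of one chromosome."""
    half = width_cm / 2
    means = []
    for centre in positions_cm:
        # All SNPs whose map position lies within the window around the focal SNP.
        inside = [v for p, v in zip(positions_cm, values) if abs(p - centre) <= half]
        means.append(sum(inside) / len(inside))
    return means
```

Applying the same function to the per-SNP response metric and to the per-SNP host-race divergence metric yields two series per chromosome that can then be correlated.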
Calculating the polygenic response to selection
In Figure 3, we calculated polygenic genotype scores for individual flies across the genome,
where a score is the mean proportion of a fly’s genome composed of alleles more common in the
hawthorn race. Polygenic scores were calculated by first determining the allele more common in
the 7-day hawthorn than 7-day apple fly sample for each SNP. We then summed the number of
such “hawthorn race” alleles an individual possessed across the genome based on the genotype
likelihood values for each SNP and divided the sum total by twice the number of SNPs present in
the sample. The result was an estimate of the proportion of an individual fly’s genome that was
hawthorn race like in its composition. We then generated kernel density plots in R (version
2.11.1) of the polygenic genotype scores for individuals in the three sample populations to
graphically depict how effective the selection experiment was in shifting the distribution of
surviving flies in the 32-day selection treatment from the hawthorn race toward the apple race
(Fig. 3). Distributions for all 32,455 variable SNPs and for all variable SNPs that displayed
significant frequency differences between the host races and a significant response in the
selection experiment are shown in Fig. 3a and 3b, respectively. To quantify the shift in the
polygenic genotype distribution toward the apple race in the selection experiment, we took the
difference in the mean score for the 7-day hawthorn minus the 32-day hawthorn sample and
divided it by the difference in the mean score for the 7-day hawthorn minus the 7-day apple
sample.
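A minimal sketch of the score and the shift ratio follows. It assumes that the genotype likelihoods have been normalized to probabilities of carrying 0, 1, or 2 copies of the hawthorn-race allele at each SNP, so that the expected allele count is P(1 copy) + 2 × P(2 copies). The function names and input layout are hypothetical.

```python
def polygenic_score(genotype_probs):
    """genotype_probs: per-SNP (p0, p1, p2) for 0, 1, 2 copies of the hawthorn-race allele."""
    # Expected number of hawthorn-race alleles summed over SNPs.
    expected_alleles = sum(p1 + 2 * p2 for _, p1, p2 in genotype_probs)
    # Divide by twice the number of SNPs to obtain a genome-wide proportion.
    return expected_alleles / (2 * len(genotype_probs))

def relative_shift(haw7_scores, haw32_scores, apple7_scores):
    """Shift of the 32-day hawthorn sample toward the apple race, as a fraction."""
    mean = lambda xs: sum(xs) / len(xs)
    return ((mean(haw7_scores) - mean(haw32_scores))
            / (mean(haw7_scores) - mean(apple7_scores)))
```

Under this definition, a ratio of 0 would mean no shift, and a ratio of 1 would mean that the selected hawthorn sample had moved all the way to the apple race mean.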
Empirical estimates of linkage disequilibrium in R. pomonella
In general, unlinked SNPs mapping to different chromosomes displayed little or no
disequilibrium. Of a total of 6,011 pairwise LD tests between unlinked SNPs on different
chromosomes displaying significant responses to selection, only 73 comparisons were significant
(1.2%), and none on a table-wide basis. The average r value between unlinked SNPs for
Burrows’ ∆ was 0.067, the largest r value between unlinked SNPs for Burrows’ ∆ was 0.43, and
the greatest number of significant associations displayed by any single SNP was 3. In contrast,
LD was more pronounced between sets of SNPs residing on the same chromosome. Thus, SNPs
residing on the same chromosome in LD might not represent independent loci responding to
selection on prewinter length. Rather, the responses they display could represent the indirect
consequences of physical linkage and genetic hitchhiking to a third site that is the direct target of
selection. To account for this and derive an estimate for the minimum number of independent
genes/gene regions under selection, we analyzed the pattern of pairwise ∆ values to determine
the fewest number of sets of significantly responding SNPs in the selection experiment
displaying significant LD with other members of the set, but in linkage equilibrium with all other
SNPs. We accomplished this through a custom script that, starting with a randomly chosen SNP,
successively added randomly chosen SNPs displaying significant LD with other
members of the growing set until all such SNPs were exhausted. The algorithm then continued by
randomly choosing another SNP not yet contained in any set until all SNPs were assigned. At
this point, the total number of sets was calculated, and the algorithm was reset to no assignments and
rerun, with the lowest estimate across 10,000 replicates retained. The analysis was performed
considering significant SNPs in the 2,352 mapped and entire 32,455 polymorphic SNP data sets,
and for SNPs showing significant differences in the sympatric comparison, as well as the
selection experiment.
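The set-building procedure can be sketched as below. The custom script itself is not reproduced here, so this is one reading of the description, with hypothetical names throughout. In particular, the text does not state whether a SNP joins a set when it is in significant LD with at least one member or with every member; the `rule` argument covers both readings (`any` or `all`). Under the `any` reading the sets are the connected components of the LD graph and the count does not depend on the random order, whereas under the `all` reading the count can vary between replicates.

```python
import random

def count_ld_sets(snps, ld_pairs, rule=any, rng=random):
    """One randomized pass; returns the number of sets formed."""
    linked = lambda a, b: (a, b) in ld_pairs or (b, a) in ld_pairs
    unassigned = set(snps)
    n_sets = 0
    while unassigned:
        # Start a new set from a randomly chosen unassigned SNP.
        start = rng.choice(sorted(unassigned))
        unassigned.remove(start)
        current = [start]
        while True:
            # SNPs in significant LD with any (or all) members of the growing set.
            candidates = [s for s in sorted(unassigned)
                          if rule(linked(s, m) for m in current)]
            if not candidates:
                break
            pick = rng.choice(candidates)
            unassigned.remove(pick)
            current.append(pick)
        n_sets += 1
    return n_sets

def min_ld_sets(snps, ld_pairs, replicates=10_000, rule=any, seed=None):
    """Lowest number of sets found across independent randomized passes."""
    rng = random.Random(seed)
    return min(count_ld_sets(snps, ld_pairs, rule, rng) for _ in range(replicates))
```

Here `ld_pairs` is assumed to be a set of SNP pairs whose pairwise ∆ was significant; how significance is assessed is outside the scope of the sketch.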
Simulation estimates of null expectations when including linkage disequilibrium
Computer simulations were then performed to generate non-parametric distributions for the null
expectation of the number of independent sets of SNPs in linkage equilibrium expected by
chance in the selection experiment for both the entire 32,455 and 2,352 mapped sites data. The
simulations were conducted by randomly choosing n = 54 and n = 47 individual whole genome
genotypes with replacement from the pool of 7-day hawthorn flies. Statistical significance for
each SNP between the two random samples was then assessed as described above for the data
from the selection experiment. Pairwise Burrows' ∆ values were then calculated between SNPs
displaying significant differences, and these values were used in the LD algorithm to determine the
lower bound number of independent sets of SNPs in linkage equilibrium they defined. The
process was repeated 1,000 times to generate a null distribution for statistical testing.
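The resampling scheme can be sketched as follows, assuming the 7-day hawthorn genotypes are held in a list `pool` and that `statistic` is a hypothetical function that takes the two resampled groups and returns the quantity of interest (here, the number of independent SNP sets after significance testing and LD grouping). The one-sided empirical P value shown is one common way to use such a null distribution and is an assumption, not a description of the test applied in the study.

```python
import random

def null_distribution(pool, statistic, n_a=54, n_b=47, replicates=1_000, seed=None):
    """Null values of a statistic from two samples drawn with replacement from one pool."""
    rng = random.Random(seed)
    null = []
    for _ in range(replicates):
        sample_a = rng.choices(pool, k=n_a)  # stands in for the 7-day sample
        sample_b = rng.choices(pool, k=n_b)  # stands in for the 32-day sample
        null.append(statistic(sample_a, sample_b))
    return null

def empirical_p(null, observed):
    """Proportion of null values at least as large as the observed value."""
    return sum(v >= observed for v in null) / len(null)
```

Because both groups come from the same pool, any differences between them arise from sampling alone, which is what makes the resulting distribution a null expectation.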
Polygenic threshold model and number of selected loci
The pronounced genetic response observed within a single generation in the selection experiment
might not be thought possible due to the unrealistically large number of selective deaths it would
seem to entail. However, when selection is imposed along one ecological axis, as we did, and
involves a polygenic trait like diapause, in which many loci contribute to the phenotype, there is
no limit to the number of loci potentially responding to selection. The problem is statistical
detection given a finite sample size.
We analyzed a polygenic threshold model of hard selection to show that it is the statistical
detection of loci under selection given sample sizes rather than the possible number of loci under
selection that is the limiting issue in selection experiments. To demonstrate this, we first
estimated the average allele frequencies for SNPs that subsequently significantly changed in
frequency in the selection experiment. From these SNPs the average frequency for the common
allele was ~0.80. We then considered that half the time the common allele would be favored by
rearing under apple-like environmental conditions and half the time the rare allele would be. A
baseline pool of 10,000 non-selected experimental individuals was then constructed possessing a
variable number of x unlinked and independently assorting loci that were sensitive to selection
equally divided into loci in which the common versus rare allele was favored under apple rearing
conditions. We then randomly chose n = 54 individuals from the baseline pool with
replacement to represent the 7-day hawthorn sample. We then assumed that each of the x loci
contributed equally to the diapause phenotype with the total number of apple selected alleles an
individual possessed dictating a deeper initial diapause depth. We then selected those individuals
in the upper 18% quantile representing flies possessing diapause phenotypes that could
potentially survive the 32-day prewinter treatment and emerge as adults after the 30-week
chilling period; the 18% threshold represents the relative proportion of 32-day versus 7-day
hawthorn flies that survived the experimental treatments (Fig. 1b). From this selected pool of
1,800 flies, we randomly chose n = 47 individuals with replacement to represent the 32-day
hawthorn sample. We then determined the average allele frequency shifts for SNPs, and the
numbers and proportions of SNPs that significantly changed in frequency. We performed 1,000
replicates for a given value of x loci under selection to generate the expected means for the
genetic response metrics. The results are summarized in Supplemental Fig. 1a-c.
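One replicate of the threshold model can be sketched as below. This is a simplified illustration with hypothetical names: genotypes are generated assuming Hardy-Weinberg proportions at each locus, ties at the survival threshold are broken by pool order, and the per-SNP significance tests and the averaging over 1,000 replicates are omitted.

```python
import random

def simulate_threshold(x_loci, pool_size=10_000, common_freq=0.80,
                       survival=0.18, n_before=54, n_after=47, seed=None):
    """Return per-locus frequency shifts of the favored allele for one replicate."""
    rng = random.Random(seed)
    # The favored allele is the common one at half of the loci and the rare one at the rest.
    fav_freq = [common_freq if i < x_loci // 2 else 1 - common_freq
                for i in range(x_loci)]
    # Each individual: count (0, 1, or 2) of the favored allele at every locus.
    pool = [[(rng.random() < f) + (rng.random() < f) for f in fav_freq]
            for _ in range(pool_size)]
    # 7-day sample drawn from the unselected baseline pool.
    before = rng.choices(pool, k=n_before)
    # Hard selection: keep the top fraction ranked by total favored alleles.
    ranked = sorted(pool, key=sum, reverse=True)
    survivors = ranked[:round(survival * pool_size)]
    # 32-day sample drawn from the survivors.
    after = rng.choices(survivors, k=n_after)
    freq = lambda sample, i: sum(g[i] for g in sample) / (2 * len(sample))
    return [freq(after, i) - freq(before, i) for i in range(x_loci)]
```

Running this for increasing values of `x_loci` illustrates the qualitative expectation that the mean shift per locus shrinks as more loci share the same selective threshold.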
In Supplemental Fig. 1, we demonstrate how, for a polygenic model of selection with the
sample sizes in our experiment, the 18% relative survivorship we induced in the long versus
short prewinter treatment may allow detection of only ~22% of all unlinked SNPs (or SNP sets) that are
targets of selection. Thus, our estimate of 110 may greatly underestimate the actual number of
gene regions affected by selection.
As seen in Supplemental Fig. 1a-c and from the nature of the polygenic threshold model, there is
no limit to the number of potential loci contributing to the deeper diapause phenotype that is
under selection. As additional loci are added, however, they each contribute proportionately less
to the phenotype and, hence, the strength of selection acting upon each gene decreases. Given a
fixed and finite experimental sample size, the result is that: (1) the average shift in allele
frequency of loci will decrease, as x increases (Supplemental Fig. 1c); and (2) a lower proportion
of loci will be detected as statistically responding to selection, as x increases (Supplemental Fig.
1b). However, the absolute number of significantly responding loci will increase (Supplemental
Fig. 1a).
For the actual experimental data, we found 162 independently responding genes/gene regions,
with a lower bound estimate of 110. Based on Supplemental Fig. 1a, this implies that potentially
many more loci are under divergent selection for diapause depth between the host races. Indeed,
the results suggest that perhaps each of the 686 sets of SNPs defined by all 32,455 variable sites
may contain a gene(s) that responded to selection. Of course, this does not prove that this is the
case. However, it does demonstrate that numerous loci can simultaneously be under divergent
selection and potentially respond in a manipulative experiment conducted within a generation.
In practice, the problem more often than not will be the statistical detection of many of these
selected loci.
We emphasize that the strength of our study and novelty of the experimental genomic approach
lies not in the number of times we replicate the selection experiment as a whole, but on the
number of polygenic SNPs and gene regions contributing to diapause adaptation that we
effectively sample in a given replicate trial. Given the stochastic nature in which particular SNPs
or gene regions may respond in a given trial, we show that we may get statistical significance for
only ~22% of loci (Supplemental Figure 1). Nevertheless, the large number of potential targets
and their general response in the predicted direction, even though all may not be significant
individually, result in a substantial genome-wide pattern. We are therefore moving beyond
looking at individual SNPs to a genome-wide distribution perspective. Further, we are
integrating genome scans, natural history, and selection experiments together to give a more
powerful design for testing for the footprint of divergent selection and its significance for
ecological speciation. The strength of this design is being able to: (1.) select on the key
environmental conditions causing reproductive isolation (RI) to identify significantly responding SNPs and gene regions
from the background noise; and then (2.) use this information to predict the direction and
magnitude of divergence that should be observed in nature to determine the genomic footprint of
ecological selection and its relationship to RI.
Supplemental Table S1. Summary metrics of the genetic response in the selection experiment
for the SNPs mapped to the R. pomonella genome, as well as for subcategories of mapped SNPs
displaying significant differences in the selection experiment (sig. sel.), between the host races
(sig. races), in both the selection experiment and between the host races (sig. both), and for sets
of SNPs in linkage equilibrium with one another (link eq.).
Locus category                      n       |∆ freq|   r       ∆ races   % same
Mapped SNPs                         2,352   0.039      0.438   +0.017    62.8
Mapped SNPs sig. sel.               312     0.094      0.758   +0.050    82.4
Mapped SNPs sig. sel. (link eq.)    125     0.090      0.822   +0.059    86.4
Mapped SNPs sig. races              131     0.057      0.752   +0.082    84.0
Mapped SNPs sig. races (link eq.)   50      0.060      0.774   +0.077    85.2
Mapped SNPs sig. both               51      0.100      0.963   +0.121    100
Mapped SNPs sig. both (link eq.)    34      0.093      0.954   +0.114    100
* See Table 1 for results for all 32,455 variable SNPs genotyped in the study. n = number of
SNPs per category; |∆ freq| = mean absolute allele frequency response in selection experiment
for indicated SNP categories; r = correlation coefficient between allele frequency response in
selection experiment versus allele frequency difference between host races (P < 10⁻⁶ for all r
values); ∆ races = mean frequency difference between the host races for alleles increasing in
frequency in the selection experiment; % same = percentage of SNPs for which the allele
frequency response in the selection experiment changed in the same direction as the difference
between the host races.
Supplemental Figure 1. Theoretical predictions from polygenic threshold selection model (y-axis) for: (a) the number of statistically significant SNPs; (b) the proportion of statistically
significant SNPs; and (c) the mean allele frequency shift for all SNPs expected for differing
numbers of independently assorting SNPs experiencing selection (x-axis). Trend lines were fitted
using a cubic spline function in R. Filled circles represent expectations if each of the 686
independent sets of SNPs observed in the study contained at least one gene under divergent
selection. Note that in panel (a), we highlight the lower-bound estimate of the number of
independent regions we detected responding to selection in our experiment (110; y-axis),
suggesting approximately 500 independent regions could be responding to selection (x-axis).
Supplemental Figure 2. Association between allele frequency shifts generated during the
selection experiment on host-plant-associated overwintering conditions and allele frequency
differences between sympatric haw and apple host races of R. pomonella in nature.