Supplemental Material

SUPPLEMENTARY MATERIAL for the Note
On the distribution of temporal variations in allelic frequency: consequences for the
estimation of effective population size and the detection of loci undergoing selection
by Isabelle Goldringer and Thomas Bataillon
Exploring the effect of uncertainty in the estimation of the effective size, Ne, of
the genome on the expected distribution of Fc at individual loci.
In our Note we outline a procedure to test whether individual loci drift faster (or slower) than the
remaining loci of the genome. We do so by simulating the expected distribution of Fc values at each
locus using the remaining loci to estimate a mean effective size, Ne, throughout the genome.
Simulating these distributions requires knowledge of the initial frequencies of the alleles at a given
"focal" locus as well as Ne. We acknowledge that there is some sampling variance around the
estimated initial allelic frequencies of each locus, but given the sample sizes typically used (~100)
this variance is unlikely to be very large. Here, we focus instead on the sampling variance around the estimate
of Ne obtained from the remaining loci of that study. The uncertainty around Ne could potentially
affect the null distribution and in turn the p-values of our test.
As an example, we use the pattern of temporal variation in allelic frequency detected at locus
ba242-C in a wheat population undergoing natural selection. This locus exhibited a relatively high value
of Fc, which could suggest the presence of selection. More information on this dataset can be found
in: Goldringer, I., J. Enjalbert, A.-L. Raquin, P. Brabant, 2001. Strong selection in wheat populations
during ten generations of dynamic management. Genet. Sel. Evol. 33 (Suppl 1): 441-463.
We generate below several Fc distributions expected under the null hypothesis of homogeneous drift
throughout the genome for the locus ba242-C. Each distribution corresponds to a different level of
uncertainty about Ne. We explore the robustness of the p-values to such uncertainty.
Null distributions for Fc were obtained using simulations (see our Note for details) of a
Wright-Fisher population, and each distribution was based on 50,000 independent replicates.
We used the following settings for our simulations:
• Sample sizes at generations 1 and 10 were S1 = 84 and S10 = 107 individuals, respectively.
• Initial allelic frequencies: 0.19 and 0.81.
• T, the number of generations between samples: T = 10.
• Mean effective size of the remaining loci of the genome: Ne = 144.
We first generated the Fc distribution for locus ba242-C assuming that Ne was known without error (as
we do in our Note; see scenario 1 in Suppl Table 1 below). We then explored the effect of incorporating
error around Ne by randomly drawing Ne from a Gaussian distribution with mean 144 and different
variances (see scenarios 2, 3, and 4 in Suppl Table 1 below), thus generating null distributions for Fc at
locus ba242-C that account for increasing uncertainty around the value of Ne estimated from the
remaining loci.
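The procedure described above can be sketched in Python as follows. This is an illustrative sketch only, not the code used for the Note. The function name simulate_fc, the diploid sampling of 2S gene copies per sample, the rounding of each Gaussian draw to an integer Ne (with a floor of 2), the use of the initial frequency 0.19 as the true population frequency, and the Nei and Tajima form of Fc for a biallelic locus are all assumptions made for illustration. The summary statistics it prints therefore need not match those reported in Suppl Table 1.

```python
import numpy as np

def simulate_fc(n_reps, ne_mean, ne_sd, t_gen, s_first, s_last, p_init, rng):
    """Draw n_reps values of Fc at one biallelic locus under pure drift.

    Hypothetical helper: the sampling scheme and Fc formula are assumptions.
    """
    # One Ne per replicate; ne_sd = 0 corresponds to "Ne known without error".
    ne = rng.normal(ne_mean, ne_sd, size=n_reps)
    # Assumption: round to an integer and keep at least 2 individuals.
    ne = np.maximum(np.rint(ne), 2).astype(np.int64)
    copies = 2 * ne  # assumption: diploid, 2*Ne gene copies per generation

    # Allele frequency observed in the first sample (2*s_first gene copies).
    x = rng.binomial(2 * s_first, p_init, size=n_reps) / (2 * s_first)

    # Wright-Fisher drift: binomial resampling for t_gen generations.
    p = np.full(n_reps, float(p_init))
    for _ in range(t_gen):
        p = rng.binomial(copies, p) / copies

    # Allele frequency observed in the last sample (2*s_last gene copies).
    y = rng.binomial(2 * s_last, p, size=n_reps) / (2 * s_last)

    # With two alleles, both alleles contribute the same term to Fc,
    # so the average over alleles reduces to a single ratio.
    num = (x - y) ** 2
    den = (x + y) / 2 - x * y
    # den is zero only when the allele is absent from, or fixed in, both
    # samples; Fc is set to 0 in that case (an assumption).
    safe_den = np.where(den > 0, den, 1.0)
    return np.where(den > 0, num / safe_den, 0.0)


rng = np.random.default_rng(1)
for sd in (0, 10, 20, 30):
    fc = simulate_fc(50_000, 144, sd, 10, 84, 107, 0.19, rng)
    print(f"sd(Ne)={sd:2d}  mean Fc={fc.mean():.3f}  median Fc={np.median(fc):.3f}")
```

Drawing a separate Ne for every replicate, rather than one Ne per scenario, mirrors the statement that each simulated Fc value starts from its own Gaussian deviate.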
Suppl Table 1

Scenario   Figure (1)   Std dev Ne (2)   Mean Fc (3)   Median Fc (3)   p-value (4)
1          S1           0                0.101         0.020           0.028
2          S2           10               0.121         0.020           0.029
3          S3           20               0.282         0.020           0.030
4          S4           30               0.302         0.020           0.031

(1) Corresponding figure.
(2) Standard deviation of the Gaussian distribution describing the uncertainty around Ne. Each
simulation of an Fc value started by drawing a Gaussian deviate from that distribution.
(3) Mean and median of the distribution of Fc values (for each scenario the distribution was obtained
using 50,000 independent replicates).
(4) p-value associated with locus ba242-C. A p-value was computed for each scenario based on each
distribution as p-value = #(simulations with Fc > 0.19)/50000.
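
The p-value defined in footnote 4 is the fraction of simulated Fc values that exceed the threshold. A minimal Python sketch follows; the function name empirical_p_value and the toy input values are hypothetical and serve only as an illustration.

```python
import numpy as np

def empirical_p_value(fc_null, threshold):
    """Fraction of simulated Fc values strictly greater than threshold."""
    fc_null = np.asarray(fc_null, dtype=float)
    return np.count_nonzero(fc_null > threshold) / fc_null.size

# Toy values, not simulation output: two of the four exceed 0.19.
print(empirical_p_value([0.05, 0.10, 0.20, 0.30], 0.19))  # 0.5
```

With 50,000 replicates the denominator is 50000, as in footnote 4.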