MEC_5562_sm_AppendixS1-2_TableS1-S3_FigS1

Appendix 1. ABC validation procedure
To validate the simulation procedure and to decide which of the mean, mode or median is best suited for parameter estimation, we simulated two additional sets of 1,000 datasets with known parameter values, with the time of colonization fixed at either 100 generations (first set of pseudo-observed datasets) or 800 generations (second set of pseudo-observed datasets). The initial and current sizes of the Floreana population were fixed at 15 and 1,000, respectively, and the Peru population size at 10,000. The migration rate was set to 0.0045 and the mutation rate to 0.0015, with the geometric parameter of the generalized stepwise mutation model (GSM p-parameter) set to 0.15 (Tables S1 and S2).
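The three candidate point estimates differ most when a posterior distribution is skewed. The Python sketch below shows one way the mean, median and mode could be extracted from a sample of retained parameter values. It assumes NumPy; `point_estimates` is a hypothetical helper, and taking the mode as the peak of a Gaussian kernel density estimate with Silverman's bandwidth is an assumed choice, not necessarily the procedure used for Tables S1 and S2.

```python
import numpy as np


def point_estimates(sample, grid_size=512):
    """Return (mean, median, mode) of a 1-D posterior sample (hypothetical helper).

    Assumption: the mode is the peak of a Gaussian kernel density estimate
    using Silverman's rule-of-thumb bandwidth, evaluated on a regular grid.
    """
    x = np.asarray(sample, dtype=float)
    mean = x.mean()
    median = np.median(x)

    # Silverman's rule-of-thumb bandwidth
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    h = 0.9 * min(sd, (q75 - q25) / 1.34) * x.size ** (-0.2)
    if h <= 0:  # degenerate sample, e.g. many identical values
        h = sd if sd > 0 else 1.0

    # Unnormalized Gaussian KDE on a grid; normalization does not move the peak
    grid = np.linspace(x.min(), x.max(), grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2).sum(axis=1)
    mode = grid[np.argmax(dens)]
    return mean, median, mode
```

For a right-skewed sample, such as draws from a gamma distribution with shape 2, the function returns mean > median > mode, the same ordering seen for the colonization-time estimates in Tables S1 and S2.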
For the ABC simulations, the six parameters were drawn randomly from uniform prior distributions, and an additional set of one million simulations was generated specifically for the validation procedure. This set of simulations was used with both sets of pseudo-observed datasets described above. The Floreana population was assumed to have grown exponentially after the colonization of the island, from an initial effective size (N0) in [1-20] to a current size (N1) in [500-5,000]. The effective size in Peru was set to 10,000 and kept constant over time. The colonization time (t) was assumed to lie between 1 and 60,000 generations, assuming a maximum generation time of 25 years. The migration rate was drawn from [0.000001-0.01]. For each of the five loci, a mutation rate µ was drawn randomly from a gamma distribution with a mean drawn randomly between 0.0001 and 0.01 and a shape parameter k set to 10. The simulated datasets were compared with the pseudo-observed ones, and the 2,000 closest simulations were retained for parameter estimation. The bias, the relative root mean square error (RMSE) and the Factor 2 were computed as explained in the Materials and Methods section.
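The prior draws and the rejection step described above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the pipeline actually used: `sample_priors` and `abc_reject` are hypothetical helpers, the coalescent simulation that turns parameter values into summary statistics is not shown, the GSM p-parameter is left out because its prior is not given in this appendix, and the Euclidean distance on standardized statistics is an assumed (though common) choice of distance.

```python
import numpy as np

N_LOCI = 5          # five microsatellite loci
GAMMA_SHAPE = 10.0  # shape parameter k of the per-locus mutation-rate gamma


def sample_priors(n, rng):
    """Draw n parameter sets from the priors listed above (hypothetical helper).

    The GSM p-parameter is omitted because its prior is not given here.
    """
    p = {
        "N0": rng.uniform(1, 20, n),            # Floreana initial effective size
        "N1": rng.uniform(500, 5_000, n),       # Floreana current size
        "t": rng.uniform(1, 60_000, n),         # colonization time (generations)
        "m": rng.uniform(1e-6, 0.01, n),        # migration rate
        "mu_mean": rng.uniform(1e-4, 0.01, n),  # mean mutation rate
    }
    # Per-locus rates: gamma with shape k and mean mu_mean, i.e. scale = mu_mean / k
    scale = (p["mu_mean"] / GAMMA_SHAPE)[:, None]
    p["mu_locus"] = rng.gamma(GAMMA_SHAPE, scale, size=(n, N_LOCI))
    # Exponential growth N(tau) = N0 * exp(r * tau), reaching N1 after t generations
    p["r"] = np.log(p["N1"] / p["N0"]) / p["t"]
    return p


def abc_reject(sim_stats, obs_stats, n_keep=2_000):
    """Indices of the n_keep simulations closest to the observed statistics.

    Assumption: Euclidean distance after scaling each summary statistic
    by its standard deviation across simulations.
    """
    sim = np.asarray(sim_stats, dtype=float)
    obs = np.asarray(obs_stats, dtype=float)
    sd = sim.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant statistics
    dist = np.sqrt((((sim - obs) / sd) ** 2).sum(axis=1))
    return np.argsort(dist)[:n_keep]
```

With one million simulations, keeping the 2,000 closest corresponds to an acceptance rate of 0.2%.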
Supplementary Table 1 Accuracy of the parameter estimates obtained with the mean, the median and the mode of the posterior distribution, assessed on 1,000 test datasets simulated with known parameter values, with the time of colonization fixed at 100 generations.
Mean
Parameter               True     Estimated   Bias     RMSE    Factor 2   Coverage 50%   Coverage 90%
Colonization time       100      618.09      5.181    8.756   0.216      0.909          0.999
Floreana initial size   15       9.1749      -0.388   0.403   0.840      0.195          0.829
Floreana current size   1000     2588.6      1.587    1.602   0          0.812          0.919
Migration rate          0.0045   0.0044      -0.024   0.214   0.988      0.996          1
Mutation rate           0.0015   0.0018      0.169    0.298   0.996      0.732          0.970
GSM p-parameter         0.15     0.1489      -0.008   0.038   1          0.978          1

Median
Parameter               True     Estimated   Bias     RMSE    Factor 2   Coverage 50%   Coverage 90%
Colonization time       100      307.14      2.071    3.359   0.383      0.909          0.999
Floreana initial size   15       8.6436      -0.424   0.446   0.723      0.195          0.829
Floreana current size   1000     2499.4      1.499    1.532   0.039      0.812          0.919
Migration rate          0.0045   0.0041      -0.099   0.281   0.940      0.996          1
Mutation rate           0.0015   0.0017      0.129    0.269   0.998      0.732          0.970
GSM p-parameter         0.15     0.1479      -0.014   0.059   1          0.978          1

Mode
Parameter               True     Estimated   Bias     RMSE    Factor 2   Coverage 50%   Coverage 90%
Colonization time       100      122.3       0.223    0.951   0.831      0.909          0.999
Floreana initial size   15       6.3763      -0.575   0.632   0.287      0.195          0.829
Floreana current size   1000     1691.1      0.691    1.430   0.805      0.812          0.919
Migration rate          0.0045   0.0031      -0.311   0.483   0.630      0.996          1
Mutation rate           0.0015   0.0016      0.066    0.237   0.999      0.732          0.970
GSM p-parameter         0.15     0.1403      -0.065   0.506   0.754      0.978          1
The coverage is computed as the proportion of simulations in which the “true value” lies
within the respective 50% and 90% credible intervals.
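The accuracy measures reported in Tables S1 and S2 can be sketched as follows. The formulas are the usual relative definitions and are an assumption here, since the Materials and Methods section is not reproduced in this appendix; they are consistent with Table S1, where the bias of the mean colonization time, 5.181, equals (618.09 - 100)/100. Credible intervals are assumed to be equal-tailed, and `accuracy` and `coverage` are hypothetical helper names.

```python
import numpy as np


def accuracy(estimates, true_value):
    """Relative bias, relative RMSE and Factor 2 of point estimates for one parameter."""
    est = np.asarray(estimates, dtype=float)
    rel_err = (est - true_value) / true_value
    bias = rel_err.mean()
    rmse = np.sqrt((rel_err ** 2).mean())
    # Proportion of estimates lying within a factor of 2 of the true value
    factor2 = np.mean((est >= 0.5 * true_value) & (est <= 2.0 * true_value))
    return bias, rmse, factor2


def coverage(posteriors, true_value, level):
    """Proportion of test datasets whose equal-tailed credible interval contains the true value."""
    alpha = (1.0 - level) / 2.0
    hits = [np.quantile(p, alpha) <= true_value <= np.quantile(p, 1.0 - alpha)
            for p in posteriors]
    return float(np.mean(hits))
```

Because coverage depends only on the credible intervals and not on the point estimator, it takes the same value for the mean, the median and the mode.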
Supplementary Table 2 Accuracy of the parameter estimates obtained with the mean, the median and the mode of the posterior distribution, assessed on 1,000 test datasets simulated with known parameter values, with the time of colonization fixed at 800 generations.
Mean
Parameter               True     Estimated   Bias     RMSE    Factor 2   Coverage 50%   Coverage 90%
Colonization time       800      6180.6      6.726    7.950   0.033      0.927          0.997
Floreana initial size   15       11.239      -0.251   0.256   1          0.906          1
Floreana current size   1000     2413.1      1.413    1.433   0.009      0.919          0.959
Migration rate          0.0045   0.0048      0.077    0.255   0.984      0.965          0.999
Mutation rate           0.0015   0.0018      0.184    0.313   0.998      0.734          0.976
GSM p-parameter         0.15     0.1505      0.004    0.040   1          0.958          1

Median
Parameter               True     Estimated   Bias     RMSE    Factor 2   Coverage 50%   Coverage 90%
Colonization time       800      2375        1.969    2.767   0.358      0.927          0.997
Floreana initial size   15       11.529      -0.231   0.244   0.997      0.906          1
Floreana current size   1000     2225.5      1.226    1.275   0.283      0.919          0.959
Migration rate          0.0045   0.0047      0.038    0.315   0.945      0.965          0.999
Mutation rate           0.0015   0.0017      0.141    0.282   0.998      0.734          0.976
GSM p-parameter         0.15     0.1504      0.003    0.062   1          0.958          1

Mode
Parameter               True     Estimated   Bias     RMSE    Factor 2   Coverage 50%   Coverage 90%
Colonization time       800      718.5       -0.102   0.746   0.557      0.927          0.997
Floreana initial size   15       14.6        -0.025   0.317   0.846      0.906          1
Floreana current size   1000     1283.9      0.284    0.915   0.909      0.919          0.959
Migration rate          0.0045   0.0042      -0.071   0.586   0.628      0.965          0.999
Mutation rate           0.0015   0.0016      0.065    0.250   0.999      0.734          0.976
GSM p-parameter         0.15     0.1535      0.023    0.531   0.786      0.958          1
The coverage is computed as the proportion of simulations in which the “true value” lies
within the respective 50% and 90% credible intervals.
Appendix 2. IMa2 simulations
IMa2 (Hey 2005, 2010) was run with three combinations of parameters that differed in the assumed average mutation rate of the microsatellites (µ = 0.004, 0.0005 and 0.0001). These mutation rates were drawn from the literature (Udupa & Baum 2001; Thuillet et al. 2002; Vigouroux et al. 2002; O'Connell & Ritland 2004) and from the results of the ABC simulations (mode = 0.0004 to 0.0006). We assumed that an ancestral population in Peru split at time t to give rise to the current Peru and Floreana populations, with subsequent migration between the two. For the three IMa2 runs, the maximum sizes of the Peru and Galapagos populations were assumed to be, as in the ABC simulations, NP = 100,000 and NF = 5,000 individuals, respectively. These sizes were converted into 4Nµ using the three assumed values of µ (4NPµ = 1,600, 200 and 40; 4NFµ = 80, 10 and 2, respectively). The geometric mean x of the 4NPµ and 4NFµ values was then used to define the priors following the IMa2 guidelines. The upper bound of the population-size prior was set to 5x, which gave q = 1,800, 224 and 45 for the three assumed mutation rates; the upper bound of the colonization time was set to 2x (t = 720, 89 and 18, respectively), and the upper bound of the migration rate between Peru and Floreana to 2/x (m = 0.006, 0.045 and 0.224, respectively). Metropolis coupling was implemented with ten chains under strong heating, using a geometric increment model with a degree of non-linearity of 0.99 and a lower heating value of 0.25. Each chain started with a burn-in period of 100,000 updates, and the total run length was ten million updates, with a thinning interval of 100 updates.
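The prior bounds above follow from simple arithmetic on the assumed maximum sizes and mutation rates. The short Python sketch below reproduces that arithmetic; `ima2_priors` is a hypothetical helper that only computes the numbers and does not call IMa2.

```python
import math

N_PERU_MAX = 100_000    # assumed maximum Peru size (individuals)
N_FLOREANA_MAX = 5_000  # assumed maximum Floreana size (individuals)


def ima2_priors(mu):
    """Upper prior bounds derived from the rules stated in Appendix 2 (hypothetical helper)."""
    theta_p = 4 * N_PERU_MAX * mu       # 4 * N_P * mu
    theta_f = 4 * N_FLOREANA_MAX * mu   # 4 * N_F * mu
    x = math.sqrt(theta_p * theta_f)    # geometric mean of the two 4N*mu values
    return {"theta_P": theta_p, "theta_F": theta_f, "x": x,
            "q_max": 5 * x,   # population-size bound
            "t_max": 2 * x,   # colonization-time bound
            "m_max": 2 / x}   # migration-rate bound


for mu in (0.004, 0.0005, 0.0001):
    p = ima2_priors(mu)
    print(f"mu={mu}: q={p['q_max']:.0f} t={p['t_max']:.0f} m={p['m_max']:.3f}")
```

For µ = 0.0005 this gives q = 224, t = 89 and m = 0.045 after rounding. For µ = 0.004 the exact values (q of about 1,789 and t of about 716) appear to have been rounded to 1,800 and 720 in the text.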
The best results were obtained for µ = 0.0005: all parameters were uncorrelated and the trend-line plots showed no obvious trends. IMa2 was then run a second time with 20 chains and a different seed, and the results of the two runs were congruent. Mixing was, however, better in the second run, and the parameter estimates from this latter run are presented in Table S3. Parameters were converted into demographic units following Hey (2005), using µ = 0.0005 as the mutation rate per locus and per generation.
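The conversion from IMa2's mutation-scaled parameters to demographic units can be written as three one-line functions, assuming the standard IMa2 scaling (θ = 4Nµ, t = Tµ with T in generations, m = M/µ). The helper names are hypothetical, and because the scaled IMa2 estimates are not reported here, the usage line back-calculates θ from the modal Floreana size in Table S3 purely for illustration.

```python
MU = 0.0005  # mutation rate per locus and per generation (Appendix 2)


def effective_size(theta, mu=MU):
    """theta = 4 * N * mu  ->  N = theta / (4 * mu)."""
    return theta / (4.0 * mu)


def generations(t_scaled, mu=MU):
    """IMa2 time t = T * mu  ->  T = t / mu generations."""
    return t_scaled / mu


def migration_per_generation(m_scaled, mu=MU):
    """IMa2 migration m = M / mu  ->  M = m * mu (per gene copy per generation)."""
    return m_scaled * mu


# Illustration only: theta back-calculated from the modal Floreana size in Table S3
theta_floreana = 4 * 728 * MU
print(round(effective_size(theta_floreana)))  # prints 728
```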
Supplementary Table 3 Estimation of population sizes, colonization time and migration parameters using IMa2, with the mutation rate fixed at µ = 0.0005.
 = 0.0005
Mean
t (in generations)
Mode
HPD95low
HPD95Hi
250
267
0
1'869
9'955
5'095
1'960
20'440
N Floreana
795
728
168
1'960
N Ancestral
15'245
13'945
9'015
21'670
Flo -> Peru (m)
0.00001
0.00000
0.00000
0.00002
Peru -> Flo (m)
0.00001
0.00002
0.00000
0.00002
N Peru
HPD95low and HPD95Hi are the lower and upper bounds, respectively, of the estimated 95% highest posterior density (HPD) interval.
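A 95% highest posterior density interval is the shortest interval that contains 95% of the posterior mass. IMa2 reports these bounds itself; the Python sketch below only illustrates the idea with the common sample-based approach (the narrowest window covering the required fraction of sorted draws), which is not necessarily how IMa2 computes them. `hpd_interval` is a hypothetical helper.

```python
import numpy as np


def hpd_interval(sample, cred=0.95):
    """Shortest interval containing a fraction `cred` of the posterior draws (hypothetical helper)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    k = int(np.ceil(cred * n))           # number of draws inside the interval
    widths = x[k - 1:] - x[: n - k + 1]  # width of every window of k consecutive draws
    i = int(np.argmin(widths))           # start of the narrowest window
    return x[i], x[i + k - 1]
```

For a right-skewed posterior whose mass piles up near zero, the lower HPD bound can sit at or close to zero, as for the colonization time in Table S3.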
Supplementary Figure 1 Posterior density curves for the four parameters estimated with IMa2 for the colonization of the Galapagos by Geoffroea spinosa. The modal value of each estimated parameter is shown in parentheses.
[Figure: four panels of posterior density curves: Colonization time (267); Floreana current size (728); Peru ancestral size (13,945); Peru current size (5,095).]
Literature
Hey J (2005) On the number of New World founders: a population genetic portrait of the peopling of the Americas. PLoS Biology 3(6), e193.
Hey J (2010) Documentation for IMa2. Department of Genetics, Rutgers University, USA. Available at http://genfaculty.rutgers.edu/hey/software.
O'Connell LM, Ritland K (2004) Somatic mutations at microsatellite loci in western redcedar (Thuja plicata: Cupressaceae). Journal of Heredity 95, 172-176.
Thuillet A-C, Bru D, David J, et al. (2002) Direct estimation of mutation rate for 10 microsatellite loci in durum wheat, Triticum turgidum (L.) Thell. ssp durum Desf. Molecular Biology and Evolution 19, 122-125.
Udupa S, Baum M (2001) High mutation rate and mutational bias at (TAA)n microsatellite loci in chickpea (Cicer arietinum L.). Molecular Genetics and Genomics 265, 1097-1103.
Vigouroux Y, Jaqueth JS, Matsuoka Y, et al. (2002) Rate and pattern of mutation at microsatellite loci in maize. Molecular Biology and Evolution 19, 1251-1260.