Supplementary File 1 (doc 24K)

advertisement
APPENDIX 1:
DETAILED
DESCRIPTION
OF
THE
COALESCENT-BASED
PROGRAMS AND CHOICE OF MUTATION MODELS AND SEARCH PARAMETERS.
Batwing This program uses a Markov chain Monte Carlo (MCMC) method to
9
generate approximate random samples from the posterior distributions of parameters such
as mutation rates, effective population sizes and growth rates, and times of population
splitting events. It also generates approximate posterior samples of the entire genealogical
tree underlying the sample, including the tree height, which corresponds to the Time
since the Most Recent Common Ancestor (TMRCA)18, 19.
For the analyses of the microsatellites, the stepwise mutation model was chosen,
assuming equal mutation rates for all loci. The k-alleles model was not functional, due to
non-convergent chains, and was not used. Mitochondrial sequences could not be analysed
since Batwing was not yet ready for the finite site model. In the lack of preliminary data,
the prior distributions were set at: uniform distributions for the population parameter ,
for the time to the first splitting event and for growth rates and the Dirichlet distribution
(1,1,1) for the proportion of the total ancestral population size taken up by each
population after the splitting events. The initial genealogy was constructed by grouping,
most of the time, the most similar haplotypes (the badness, a number used to set up the
initial genealogical tree of the MCMC chain, was 0.01). 1,000 steps of warm up followed
by a Markov chain of 10,000 steps were performed. 100 changes to the tree were
attempted between changes of the mutation and model parameters and 100 changes were
made to mutation and model parameters between sampling occasions, with the approval
of the author (I. J. Wilson).
Outfiles were analysed using the software R [http://cran.r-project.org], which allowed
us to estimate, for each parameter, the median value of the posterior distributions and the
95% confidence interval, and to determine which supertree was the most represented
when the Markov process converged.
Fluctuate This program uses a maximum likelihood method based on Monte Carlo
Markov chains, with transition probabilities determined by a Metropolis Hastings
algorithm. Felsenstein's (1984) mutation model for non-recombining sequences
(described in the dnaml.doc of the Phylip manual) was used.
We assumed four equiprobable categories of sites with relative substitution rates
0.002, 0.090, 0.549 and 2.465 and a transition/transversion ratio equal to 25
(recommended in Excoffier et al.
51
). This corresponds to a discrete gamma distribution
with  parameter equal to 0.3. As suggested by the authors, the initial genealogy was
constructed with the Phylip software (version 3.5) using the Kimura 2-parameters
distance and assuming a transition/transversion ratio of 25. The starting value of the 
parameter was estimated according to Watterson’s method
52
and the exponential growth
rate starting value was set at 1.0. The search strategy, based on the suggestions indicated
in the documentation of Fluctuate, included 10 short chains (10 000 steps) followed by 3
long chains (100 000 steps), with a sampling increment (number of steps between two
sampling occasions of the genealogies) of 100.
Migrate This program uses a similar maximum likelihood estimation method as
Fluctuate. The same mutation model as Fluctuate was used for mitochondrial sequences.
The ladder model (without the Brownian motion approximation)
10
53
was used for the
analysed of microsatellites. It is similar to the stepwise mutation model, except that it
occasionally allows that the new allele is different by more than one repeat from the old
allele. Mutation rates were supposed to be the same across microsatellite loci. The
starting genealogy is a UPGMA tree constructed by Migrate. The initial values of the I’s
and ji are defined by a Fst-based method. The analysis of Y microsatellites involved 10
short chains (500 steps) and three long chains (5,000 steps) with a 500 steps increment.
Each run lasted about three weeks on a powerful UNIX workstation. For chromosome 8q,
chains with the same number of steps but with variable increments were run.
Mitochondrial sequences were analysed in two ways: 10 shorts chains (1,000 steps) and 3
long chains (10,000 steps) with 500 steps of increment and 2 longer runs, where short
chains were 10,000 and long chains were 100,000 steps long. In all cases, the first 10,000
steps of each chain were discarded. The search strategies were elaborated based on the
authors' recommendations, as well as on the results of the preliminary runs. For all
genetic systems, the stability of the estimates was tested by performing several runs. For
each parameter, the mean and variance were estimated over all runs.
APPENDIX 2: ESTIMATION OF EFFECTIVE POPULATION SIZES
The outcomes of the different programs allowed us to compute the effective
population sizes. For the models where a constant population size was assumed, the
effective size Ne of the population was directly estimated from the population parameter
, using the equation: =2Mu where u is the assumed mutation rate and M is equal to 2Ne
for chromosome 8q markers, to Nef (effective female population size) for mitochondrial
sequences and to Neh (effective size of male population) for Y chromosome
microsatellites.
When exponential growth was assumed, the effective size was obtained from the
estimated growth rate and the present effective size (No). In a constant-size population,
the coalescent time of two genes for a uniparentally inherited locus is equal to the
population size 38. Thus, equation (7) in Slatkin and Hudson 28 for the coalescent time of
two genes in an exponentially growing population can be transformed into an equation
for the effective population size:
 1   1
 exp   Ei  
    
Ne 
Equation 1

where Ei(x) is the exponential integral function
54
,  the exponential growth rate and
=No. For fluctuate,  is obtained from g using =gu.
In the analysis of chromosome 8q data with Batwing, we did not compute the effective
size (and the total coalescence time) since it would have been distorted by the fact that
Batwing omits recombination, while recombination increases the observed genetic
diversity, measured with . On the other hand, computation of the effective size was
possible with Migrate, since this program analyses each microsatellite independently and
thus allows for recombination.
Download