Appendix VII
We simulated 100 gene trees from one symmetric species tree of 8 species using the ms
program (Hudson 2002) assuming a constant population size, 3 alleles sampled per
species, and three divergence times: 0.25, 0.5, and 1Ne (Fig. 1). We ranked the simulated
gene trees based on their discordance with the known species tree using the deep
coalescences (DC) statistic in Mesquite 2.74 (Maddison and Maddison 2010), and formed
three sets of gene trees: (1) the 20 least discordant with 10-15 DC, (2) the 20 most
discordant with 22-27 DC, and (3) a mix of the 10 most and 10 least discordant (Fig. 2).
Next, we simulated sequence data (500 base pairs) for these three sets with SeqGen 1.3.2
(Rambaut and Grassly 1997) using the HKY substitution model and θ per site = 0.01. We
estimated species trees for each data set of 20 loci in *BEAST, and calculated K scores
between the true species tree and a sample of 100 trees from the posterior distribution of
species trees (Fig. 3). The least discordant gene trees produced the most accurate species
trees, the mix of gene trees recovered the least accurate species trees, and the most
discordant gene trees had intermediate accuracy. Differences in accuracy among the
three groups are statistically significant based on a non-parametric Kruskal-Wallis test
(2= 209.7, degrees of freedom = 2, p < 0.001). The simulated gene trees were visualized
with a multi-dimensional scaling analysis (MDS) using Tree Set Viz in Mesquite 1.00
based on a distance matrix of Robinson-Foulds distances (Fig. 4). The MDS was
performed with a step size of 0.018245 and resulted in a stress of 0.41763297.
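As an illustration of the Kruskal-Wallis comparison reported above, the following is a minimal Python sketch of the H statistic with average ranks and a tie correction. The function name `kruskal_wallis` and the toy K scores are hypothetical and are not the values from this study. For 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-H/2), which the sketch uses in place of a statistics library.

```python
import math
from collections import Counter


def kruskal_wallis(groups):
    """Return (H, df) for a list of samples, using average ranks for ties.

    Hypothetical helper for illustration; assumes the pooled values are
    not all identical (otherwise the tie correction is zero).
    """
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)

    # Assign each distinct value the mean of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)

    # Standard correction for tied values.
    ties = Counter(pooled)
    correction = 1.0 - sum(t ** 3 - t for t in ties.values()) / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-x / 2.0)


# Toy K scores (made up) for three groups of species-tree samples.
least = [0.0010, 0.0012, 0.0011]
most = [0.0020, 0.0022, 0.0021]
mix = [0.0040, 0.0042, 0.0041]
h, df = kruskal_wallis([least, most, mix])
p = chi2_sf_df2(h)
```

Applying `chi2_sf_df2` to the reported statistic of 209.7 gives a probability far below 0.001, in agreement with the p-value stated above.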
Hudson R.R. 2002. Generating samples under a Wright-Fisher neutral model.
Bioinformatics 18:337-338.
Maddison W.P., Maddison D.R. 2010. Mesquite: a modular system for evolutionary
analysis, version 2.73. http://mesquiteproject.org.
Rambaut A., Grassly N.C. 1997. Seq-Gen: an application for the Monte Carlo simulation
of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci
13:235-238.
Figure 1. Simulated species tree. The scale bar represents 0.2 Ne.
[Figure 2: histogram. x-axis: Deep Coalescences (gene tree) (Species tree sptree), 10.0 to 27.0; y-axis: Number of Trees, 1.0 to 13.0.]
Figure 2. Histogram of the number of gene trees based on their deep coalescences with
the species tree in Fig. 1.
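To make the deep coalescences score concrete, here is a minimal Python sketch that counts extra gene lineages on each branch of a species tree. Trees are written as nested tuples with string leaves. The function names and the allele-to-species mapping `species_of` are hypothetical. The sketch assumes that extra lineages are counted on every non-root branch, including terminal ones; Mesquite's implementation may differ in such details.

```python
def leaf_set(tree):
    """Return the set of leaf labels below a node (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_branches(tree, is_root=True):
    """Yield the species set below every non-root branch of the species tree."""
    if not is_root:
        yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from species_branches(child, False)


def lineages_entering(gene, clade, species_of):
    """Count maximal gene subtrees whose leaves all map into `clade`.

    Returns (count, fully_inside) for the gene-tree node `gene`.
    """
    if isinstance(gene, str):
        inside = species_of[gene] in clade
        return (1 if inside else 0), inside
    results = [lineages_entering(child, clade, species_of) for child in gene]
    if all(inside for _, inside in results):
        return 1, True  # the whole subtree has coalesced within the clade
    return sum(count for count, _ in results), False


def deep_coalescences(gene_tree, species_tree, species_of):
    """Sum, over species-tree branches, the gene lineages in excess of one."""
    total = 0
    for clade in species_branches(species_tree):
        count, _ = lineages_entering(gene_tree, clade, species_of)
        total += max(count - 1, 0)
    return total


# Toy example (hypothetical): three species, one allele each.
species_tree = (("A", "B"), "C")
species_of = {"a1": "A", "b1": "B", "c1": "C"}
concordant = deep_coalescences((("a1", "b1"), "c1"), species_tree, species_of)
discordant = deep_coalescences((("a1", "c1"), "b1"), species_tree, species_of)
```

In the toy example the concordant gene tree needs no extra lineages, while the discordant one forces two lineages through the branch above A and B, giving one deep coalescence.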
[Figure 3: bar chart. y-axis: Accuracy (K scores), 0 to 0.0045.]
Figure 3. Mean accuracy (K scores) of three groups of simulated gene trees (blue = least
discordant, red = most discordant, green = mix of least and most discordant) using the
species tree in Fig. 1. Vertical bars represent standard errors.
Figure 4. Visualizations of tree space derived from multi-dimensional scaling analysis
with Tree Set Viz in Mesquite. (A) Red dots = least discordant, green dots = most
discordant; (B) blue dots = mix.
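The distance matrix underlying such a visualization can be sketched in Python as follows. This is a simplified illustration that compares rooted trees by their sets of non-trivial clades; Tree Set Viz may compute Robinson-Foulds distances on unrooted bipartitions instead. The function names and the toy trees are hypothetical, and the multi-dimensional scaling step itself is not shown.

```python
def leaf_set(tree):
    """Return the set of leaf labels below a node (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Collect the leaf sets of all internal nodes except the root."""
    found = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.add(leaf_set(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))


def rf_matrix(trees):
    """Pairwise distance matrix, suitable as input to an MDS routine."""
    return [[rf_distance(a, b) for b in trees] for a in trees]


# Toy trees (hypothetical) on four taxa, written as nested tuples.
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = ((("A", "B"), "C"), "D")
matrix = rf_matrix([t1, t2, t3])
```

The resulting matrix is symmetric with zeros on the diagonal, so it can be passed directly to any MDS implementation that accepts precomputed dissimilarities.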