2013-SystBiol-Adams-SuppMaterial

Supplementary Material from D.C. Adams, “Quantifying and comparing phylogenetic evolutionary
rates for shape and other high-dimensional phenotypic data”. Systematic Biology.
Type I error and Statistical Power
Here I use computer simulations to show the statistical properties of the proposed method for
comparing evolutionary rates for high-dimensional multivariate traits. Three sets of simulations were
performed. The first set was conducted on four perfectly balanced phylogenetic trees that differed in their
number of taxa: N = 16, 32, 64, 128. The second set was conducted on randomly generated trees that
differed in their number of taxa: N = 16, 32, 64, 128. The final set of simulations was conducted on
randomly generated trees (N = 16, 32, 64, 128) with three groups of taxa on each.
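To make the balanced-tree design concrete, the following is a minimal Python sketch (not the R code used for the analyses) of the matrix of shared branch lengths among tips of a perfectly balanced binary tree. Under Brownian motion the expected covariance between two tips is proportional to the branch length they share from the root. The function name `balanced_tree_vcv` and the unit branch lengths are assumptions made for illustration; the source does not state the branch lengths used.

```python
def balanced_tree_vcv(n_tips, branch_length=1.0):
    """Shared root-to-ancestor path lengths for a perfectly balanced binary tree.

    Assumption: every branch has the same length (hypothetical; not given in the source).
    """
    depth = n_tips.bit_length() - 1
    if n_tips < 2 or 2 ** depth != n_tips:
        raise ValueError("n_tips must be a power of two (>= 2)")
    vcv = []
    for i in range(n_tips):
        row = []
        for j in range(n_tips):
            # Tips are numbered left to right; the highest differing bit of
            # i and j gives the number of levels back to their common ancestor.
            levels_apart = (i ^ j).bit_length()
            row.append((depth - levels_apart) * branch_length)
        vcv.append(row)
    return vcv


# The four tree sizes used in the simulations are all powers of two.
matrices = {n: balanced_tree_vcv(n) for n in (16, 32, 64, 128)}
```

For four tips, the first row is `[2.0, 1.0, 0.0, 0.0]`: a tip shares the full root-to-tip path with itself, one branch with its sister, and nothing with the other clade.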
For each simulation, a phylogeny was specified, taxa were divided into two (or three) groups, and
the number of trait dimensions was specified (p = 2, 3, 4, 5, 7, 10). Next, input error covariance matrices
of dimension p × p were constructed for each group, which were used to generate the phenotypic data.
For simulations assuming isotropic error, the evolutionary rate for the first group was set to $\sigma_1^2 = 1.0$ for
each trait dimension. For the second group, the evolutionary rate for all trait dimensions was set to a
fixed proportional difference relative to that in the first group. These varied across simulations such that
the initial rate difference was known between groups ($\sigma_2^2 = 1.0, 1.5, 2.0, 3.0, 4.0$). For simulations
assuming non-isotropic error, the evolutionary rate for the first group was drawn from a normal
distribution ($\mu = 1.0$; $\sigma = 0.1$) for each trait dimension. The evolutionary rates for all traits for the second
group were then obtained by multiplying these values by a constant ($k = 1.0, 1.5, 2.0, 3.0, 4.0$) to obtain
a known initial rate difference between groups. Simulations with three groups had the evolutionary rate of
the third group set as: $\sigma_1^2 = \sigma_3^2 = 1.0$.
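The rate settings described above can be sketched as follows. This is an illustrative Python sketch, not the code used for the simulations: the names `group_rates` and `diagonal_matrix` are hypothetical, and placing the rates on the diagonal with zero off-diagonal covariances is an assumption, since the text does not state how the off-diagonal elements were set.

```python
import random


def group_rates(p, k, isotropic=True, rng=None):
    """Per-trait rates for group 1 and group 2 (group 2 = k * group 1)."""
    rng = rng or random.Random()
    if isotropic:
        # Isotropic case: the same rate (1.0) for every trait dimension.
        rates_1 = [1.0] * p
    else:
        # Non-isotropic case: one normal draw (mean 1.0, sd 0.1) per dimension.
        rates_1 = [rng.gauss(1.0, 0.1) for _ in range(p)]
    rates_2 = [k * r for r in rates_1]
    return rates_1, rates_2


def diagonal_matrix(rates):
    """p x p matrix with the rates on the diagonal (assumed zero covariances)."""
    p = len(rates)
    return [[rates[i] if i == j else 0.0 for j in range(p)] for i in range(p)]


# Example: p = 3 trait dimensions with a two-fold rate difference.
rates_1, rates_2 = group_rates(3, 2.0, isotropic=False, rng=random.Random(1))
input_1, input_2 = diagonal_matrix(rates_1), diagonal_matrix(rates_2)
```

With `k = 1.0` both groups share the same rates, which corresponds to the Type I error setting; larger values of `k` give the known rate differences used to assess power.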
From the initial covariance matrices, 1000 phenotypic datasets were obtained by evolving multidimensional traits along the phylogeny according to a Brownian motion model of evolution. Data were
simulated using the function ‘transformPhylo.sim’ in the R-package motmot (Thomas and
Freckleton, 2011), which is capable of simulating data under a Brownian motion model with
differing rates for groups of taxa. For the case of randomly generated trees, a new phylogeny was
simulated for each data set. For each simulation, evolutionary rates ($\sigma^2_{mult}$) were then estimated for
each group of taxa on the phylogeny, and the proposed ratio-based test was used to determine whether or
not $\sigma^2_{mult}$ differed among groups. Specifically, the observed ratio ($\sigma^2_{mult.A} / \sigma^2_{mult.B}$) was
compared to ratios obtained from data under the null hypothesis of no rate difference among groups; the
latter were generated by phylogenetic simulation. Here, the global evolutionary rate across the
entire phylogeny ($\sigma^2_{mult.Tot}$) was used to generate simulated data sets using ‘sim.char’ in the R-package
geiger (Harmon et al., 2008), where $\sigma^2_{mult.Tot}$ was used as the input rate for each trait dimension (using
$\sigma^2_{mult.Tot} / N$ provided equivalent results, as N is a constant). The proportion of significant results (of 1000)
was then treated as the significance level of the observed rate ratio for that simulated dataset.
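The logic of the simulation-based ratio test can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the procedure as implemented in R: it assumes a star phylogeny with unit branch lengths and a known root state of zero, so that the rate estimate reduces to the mean squared displacement per taxon and trait dimension. That estimator is a stand-in for the multivariate rate defined in the main text. All function names are hypothetical, and taking the ratio as larger rate over smaller rate, and the significance level as the plain proportion of null ratios at least as large as the observed one, are assumptions of this sketch.

```python
import random


def simulate_star_bm(n_taxa, p, rate, rng):
    """Tip values after unit time of Brownian motion from a root state of 0."""
    sd = rate ** 0.5
    return [[rng.gauss(0.0, sd) for _ in range(p)] for _ in range(n_taxa)]


def rate_estimate(tips):
    """Mean squared displacement per taxon and per trait dimension."""
    n, p = len(tips), len(tips[0])
    return sum(x * x for row in tips for x in row) / (n * p)


def rate_ratio(group_a, group_b):
    """Ratio of the larger group rate to the smaller one."""
    ra, rb = rate_estimate(group_a), rate_estimate(group_b)
    return max(ra, rb) / min(ra, rb)


def ratio_test(group_a, group_b, n_sim=1000, seed=1):
    """Compare the observed ratio with ratios simulated under one common rate."""
    rng = random.Random(seed)
    observed = rate_ratio(group_a, group_b)
    p = len(group_a[0])
    # Global rate across all taxa, used as the input rate for every dimension.
    total_rate = rate_estimate(group_a + group_b)
    exceed = 0
    for _ in range(n_sim):
        null_a = simulate_star_bm(len(group_a), p, total_rate, rng)
        null_b = simulate_star_bm(len(group_b), p, total_rate, rng)
        if rate_ratio(null_a, null_b) >= observed:
            exceed += 1
    return observed, exceed / n_sim


# Example: two groups of 32 taxa, 5 trait dimensions, four-fold rate difference.
data_rng = random.Random(42)
group_a = simulate_star_bm(32, 5, 1.0, data_rng)
group_b = simulate_star_bm(32, 5, 4.0, data_rng)
observed, p_value = ratio_test(group_a, group_b)
```

Repeating this over many simulated datasets and recording how often the significance level falls at or below 0.05 gives the Type I error rate when the true rates are equal, and the power when they differ.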
Across all simulation conditions, simulations where $\sigma_1^2 = \sigma_2^2 = 1.0$ represented Type I error rate
assessments (i.e., no difference in their evolutionary rates), while simulations where $\sigma_2^2 / \sigma_1^2 > 1.0$
assessed statistical power. For Type I error simulations, rate comparisons were also performed with
likelihood methods based on the evolutionary rate matrix R, using the function ‘evol.vcv’ in the
R-package phytools (Revell, 2012).
Results: For all simulations, hypothesis tests based on $\sigma^2_{mult}$ displayed appropriate Type I error
rates near $\alpha = 0.05$, which remained consistently near 0.05 regardless of trait dimensionality (figs. A1-A3).
By contrast, tests based on the evolutionary rate matrix R had unacceptably high Type I error rates which
increased with trait dimensionality (Fig. 2a: main text). With the likelihood approach, Type I error rates
were over 8% for tests on bivariate traits, and exceeded 50% when p = 10. Thus, this method has
unacceptably high Type I error when used on high-dimensional data.
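As a rough guide to what "near 0.05" means with 1000 simulated datasets per condition, the short Python calculation below gives the approximate range expected from Monte Carlo error alone. It is an added illustration using the normal approximation to the binomial, not part of the original analysis, and the function name `monte_carlo_interval` is hypothetical.

```python
import math


def monte_carlo_interval(alpha=0.05, n_sim=1000, z=1.96):
    """Approximate 95% range for an estimated rejection rate when the true rate is alpha."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_sim)  # binomial standard error
    return alpha - z * se, alpha + z * se


low, high = monte_carlo_interval()
print(round(low, 4), round(high, 4))  # 0.0365 0.0635
```

By this yardstick, an observed Type I error rate above 8% lies well outside the range attributable to simulation noise.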
When evolutionary rates differed between groups ($\sigma_2^2 / \sigma_1^2 > 1.0$), the distance-based method
demonstrated acceptable statistical power, and was capable of identifying even small differences in $\sigma^2_{mult}$
between groups. The power of the test also rose rapidly as the difference between evolutionary rates
increased. This pattern became more pronounced as the number of species in the phylogeny increased.
Importantly, these statistical findings were obtained regardless of whether phylogenies were balanced or
random, or whether evolutionary rates for two or more groups of taxa on the phylogeny were compared
(figs. A1-A3).
Overall these simulations reveal that the distance-based method for comparing evolutionary rates
($\sigma^2_{mult}$) has appropriate Type I error and statistical power, and is therefore appropriate for use on
high-dimensional datasets.
Fig. A1. Statistical power curves for tests comparing evolutionary rates for two groups of taxa on
balanced phylogenies. Each point on each power curve is the result of 1000 simulations at the conditions
specified.
Fig. A2. Statistical power curves for tests comparing evolutionary rates for two groups of taxa on random
phylogenies. Each point on each power curve is the result of 1000 simulations at the conditions specified.
Fig. A3. Statistical power curves for tests comparing evolutionary rates for three groups of taxa on
random phylogenies. Each point on each power curve is the result of 1000 simulations at the conditions
specified.
Literature Cited
Harmon, L. J., J. Weir, C. Brock, R. E. Glor, and W. Challenger. 2008. GEIGER: Investigating
evolutionary radiations. Bioinformatics 24:129-131.
Revell, L. J. 2012. Phytools: An R package for phylogenetic comparative biology (and other things).
Methods in Ecology and Evolution 3:217-223.
Thomas, G. H., and R. P. Freckleton. 2011. MOTMOT: models of trait macroevolution on trees. Methods
in Ecology and Evolution 3:145-151.