2014-SystBiol-Adams-SuppMaterial-Kmult

Supplementary Material from D.C. Adams, “A generalized K statistic for estimating phylogenetic signal from shape and other high-dimensional multivariate data”. Systematic Biology.

Type I error and Statistical Power

Here I use computer simulations to show the statistical properties of the proposed method for evaluating phylogenetic signal in multivariate data. Six sets of simulations were performed. The first three sets of simulations were conducted on four perfectly balanced phylogenetic trees that differed in their number of taxa: N = 16, 32, 64, 128. The remaining three sets were conducted on randomly generated trees with the same numbers of taxa: N = 16, 32, 64, 128. For each simulation, a phylogeny and the number of trait dimensions (p = 2, 4, 6, 8, 10) were specified. Next, input covariance matrices of dimension p × p were constructed, from which multivariate data were simulated under Brownian motion.

Three models of input covariance structure were used: 1) isotropic structure, where the input variance was identical for every trait dimension ($\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2$) and there was no input covariation between dimensions; 2) non-isotropic structure, where the input variance was allowed to differ among trait dimensions ($\sigma_1^2 \neq \sigma_2^2 \neq \cdots \neq \sigma_p^2$) and there was no input covariation between dimensions; and 3) non-isotropic structure that included covariation among trait dimensions. For models with isotropic covariance structure, $\sigma^2 = 1.0$ was used for all trait dimensions. For simulations modeling non-isotropic covariance structure, the input $\sigma^2$ for each trait dimension was drawn from a normal distribution, $N(\mu = 1.0,\ \mathrm{sd} = 0.2)$, and the p × p covariance matrix was constructed using these values as its diagonal elements.
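The simulation grid and the first two covariance models can be sketched in plain Python as below. This is a minimal, standard-library-only illustration, not the code used in the study; the function names (`isotropic_cov`, `nonisotropic_cov`) and the nested-list matrix representation are hypothetical choices. A draw from N(1.0, 0.2) could in principle fall below zero (about five standard deviations below the mean); the text does not say how such draws were handled, and this sketch does not handle them either.

```python
import random

# Simulation grid described in the text: tree sizes (N) crossed with trait dimensions (p).
TREE_SIZES = [16, 32, 64, 128]
TRAIT_DIMENSIONS = [2, 4, 6, 8, 10]


def isotropic_cov(p, sigma2=1.0):
    """Isotropic model: the same input variance on every trait dimension, no covariation."""
    return [[sigma2 if r == c else 0.0 for c in range(p)] for r in range(p)]


def nonisotropic_cov(p, mean=1.0, sd=0.2, rng=random):
    """Non-isotropic model: each diagonal variance is drawn independently from N(mean, sd).

    Off-diagonal elements are zero, so there is no input covariation between dimensions.
    """
    variances = [rng.gauss(mean, sd) for _ in range(p)]
    return [[variances[r] if r == c else 0.0 for c in range(p)] for r in range(p)]
```

For example, `nonisotropic_cov(6, rng=random.Random(42))` returns a 6 × 6 diagonal matrix with one sampled variance per trait dimension; passing a seeded `random.Random` keeps replicate draws reproducible.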
For simulations modeling non-isotropic covariance structure with covariation among trait dimensions, a random p × p covariance matrix was generated as follows. First, a lower-triangular matrix $L$ was generated by drawing its values from a normal distribution, $N(\mu = 0,\ \mathrm{sd} = 1)$. Next, the matrix product $LL^{T}$ was calculated, which yields a positive semi-definite covariance matrix (following the Cholesky decomposition $\Sigma = LL^{T}$). Thus, $LL^{T}$ represents a random covariance matrix that includes differing amounts of input variation among trait dimensions as well as covariation among them.

From each input covariance matrix, 1000 phenotypic datasets were obtained by evolving multidimensional traits along a phylogeny under a Brownian motion model of evolution. Data were simulated using the function ‘sim.char’ in the R package Geiger (Harmon et al. 2008). For tests of Type I error, data were simulated on a star phylogeny, obtained by transforming the original phylogeny with the lambda transformation ($\lambda = 0.0$). Phylogenetic signal in these datasets was then evaluated on the original, fully resolved tree. For tests of statistical power, data were simulated and evaluated on the resolved tree (for details of this approach see Blomberg et al. 2003). To obtain a known range of phylogenetic signal, the branch lengths of the phylogeny were transformed by the parameter $\lambda$ prior to simulating phenotypic data, where $\lambda = 0.0$ transforms the tree into a star phylogeny and $\lambda = 1.0$ represents the original, fully resolved tree. The transformation values used in this study were $\lambda$ = 0, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, and 1.0. For simulations using randomly generated trees, a new phylogeny was simulated for each dataset. Across all simulation conditions, Kmult was estimated for each dataset and its significance was evaluated by permutation.
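One simulation replicate on a balanced tree can be sketched in plain Python as below, under assumptions the text does not state: every branch of the balanced tree has length 1 (hypothetical), the root state is 0, and Brownian motion is sampled directly from the multivariate normal distribution with covariance $C \otimes \Sigma$, where $C$ is the phylogenetic covariance matrix, instead of calling Geiger's ‘sim.char’. The lambda transformation is written as Pagel's λ (off-diagonal elements of $C$ multiplied by λ), which matches the description that λ = 0 gives a star phylogeny and λ = 1 the original tree. Random trees, Kmult itself, and its permutation test are not reproduced. The sketch ends with the tally of significant replicates plus a normal-approximation Monte Carlo interval, an addition for judging whether an estimated rate is close to the nominal 0.05. All function names are hypothetical.

```python
import math
import random


def random_lower_factor(p, rng):
    """Lower-triangular L with N(0, 1) entries; Sigma = L L^T is a random covariance matrix."""
    return [[rng.gauss(0.0, 1.0) if c <= r else 0.0 for c in range(p)] for r in range(p)]


def balanced_tree_vcv(n_taxa):
    """Phylogenetic covariance matrix C of a perfectly balanced tree with n_taxa = 2**k tips.

    Hypothetical branch lengths: every branch has length 1, so each tip sits at depth k
    and C[i][j] is the length of the root-to-MRCA path shared by tips i and j.
    """
    depth = n_taxa.bit_length() - 1
    if n_taxa < 2 or 2 ** depth != n_taxa:
        raise ValueError("n_taxa must be a power of two >= 2")
    # Tips i and j share the leading bits of their k-bit labels down to their MRCA.
    return [[float(depth - (i ^ j).bit_length()) for j in range(n_taxa)]
            for i in range(n_taxa)]


def lambda_transform(C, lam):
    """Pagel's lambda: scale shared (off-diagonal) history by lam; lam = 0 gives a star tree."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)] for i in range(n)]


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def simulate_bm(C, sigma_factor, rng):
    """One N x p dataset under multivariate Brownian motion with root state 0.

    X = A Z F^T with A = chol(C), Z iid N(0, 1) and Sigma = F F^T, so that
    Cov(X[i][r], X[j][s]) = C[i][j] * Sigma[r][s].
    """
    A = cholesky(C)
    n, p = len(C), len(sigma_factor)
    Z = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    F_t = [list(col) for col in zip(*sigma_factor)]  # transpose of F
    return matmul(matmul(A, Z), F_t)


def rejection_rate(p_values, alpha=0.05):
    """Share of replicates judged significant (p <= alpha is an assumed convention)."""
    return sum(pv <= alpha for pv in p_values) / len(p_values)


def mc_interval(rate, n_sims, z=1.96):
    """Normal-approximation 95% Monte Carlo interval for an estimated rejection rate."""
    half = z * math.sqrt(rate * (1.0 - rate) / n_sims)
    return rate - half, rate + half
```

For example, with `rng = random.Random(1)`, `simulate_bm(lambda_transform(balanced_tree_vcv(16), 0.0), random_lower_factor(4, rng), rng)` returns one 16 × 4 dataset simulated on a star phylogeny, as in the Type I error tests. With 1000 replicates, `mc_interval(0.05, 1000)` is roughly (0.036, 0.064), so an observed rejection rate inside that band is consistent with a nominal α of 0.05.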
The proportion of significant results (out of 1000) was then treated as the Type I error rate or the statistical power of the test, depending on the simulation conditions.

Results

For all simulations, hypothesis tests of phylogenetic signal displayed Type I error rates near the nominal value of $\alpha = 0.05$. In addition, the statistical power of tests based on Kmult increased rapidly as the degree of phylogenetic signal increased, and this pattern was consistent across the range of trait dimensionality and the range of species numbers examined in this study. For balanced phylogenies, patterns were concordant between simulations in which data were generated using isotropic variation (Fig. A1), non-isotropic variation (Fig. A2), and non-isotropic variation with covariance among trait dimensions (Fig. A3). The same was true for randomly generated phylogenies (Figs. A4-A6). Across all simulations, the rise in power with increasing input phylogenetic signal became more pronounced as the number of species in the phylogeny increased and as the number of trait dimensions increased (Figs. A1-A6). Overall, these simulations show that tests of phylogenetic signal based on Kmult display appropriate Type I error and statistical power. Thus, Kmult is an appropriate method for evaluating phylogenetic signal in high-dimensional datasets.

Fig. A1. Statistical power curves for tests evaluating phylogenetic signal using Kmult when data are simulated on balanced phylogenies using an isotropic model to generate the data. Each point on each power curve is the result of 1000 simulations at the conditions specified.

Fig. A2. Statistical power curves for tests evaluating phylogenetic signal using Kmult when data are simulated on balanced phylogenies using a non-isotropic model to generate the data. Each point on each power curve is the result of 1000 simulations at the conditions specified.

Fig. A3. Statistical power curves for tests evaluating phylogenetic signal using Kmult when data are simulated on balanced phylogenies using a non-isotropic model with covariance among trait dimensions to generate the data. Each point on each power curve is the result of 1000 simulations at the conditions specified.

Fig. A4. Statistical power curves for tests evaluating phylogenetic signal using Kmult when data are simulated on random phylogenies using an isotropic model to generate the data. Each point on each power curve is the result of 1000 simulations at the conditions specified.

Fig. A5. Statistical power curves for tests evaluating phylogenetic signal using Kmult when data are simulated on random phylogenies using a non-isotropic model to generate the data. Each point on each power curve is the result of 1000 simulations at the conditions specified.

Fig. A6. Statistical power curves for tests evaluating phylogenetic signal using Kmult when data are simulated on random phylogenies using a non-isotropic model with covariance among trait dimensions to generate the data. Each point on each power curve is the result of 1000 simulations at the conditions specified.

Literature Cited

Blomberg SP, Garland T, Ives AR. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57:717-745.

Harmon LJ, Weir J, Brock C, Glor RE, Challenger W. 2008. GEIGER: investigating evolutionary radiations. Bioinformatics 24:129-131.
