The Phylogenetic Comparative Method
Advanced Biostatistics
Dean C. Adams
Lecture 12
EEOB 590C

Confounding Factors in Biology
•Often we want to assess the relationship between two sets of variables
  •Is morphology associated with resource use?
  •Is genetic variation correlated with habitat?
  •Does behavior covary with latitude?
•But a third variable often covaries with the other two
  •Morphology and resource use may both covary with geographic location
  •This may generate non-independence among objects
•We require methods that can account for this

The Problem of Phylogeny
•Comparing patterns across species is a common way to address evolutionary hypotheses
•However, phylogenetic relationships among taxa 'inflate' the observed correlation of traits (samples are not independent)
•We must account for phylogenetic non-independence
•The comparative method is a common approach
[Figure: plots of Y vs. X across species]

The Comparative Method
•Comparative method: statistics in which the observations are species sampled across a phylogeny
•Standard statistical methods cannot be used, because they assume that observations are independent
•Phylogenetic relationships generate non-independence, which must be accounted for during the analysis
•Several approaches have been proposed

General Concept of Comparative Methods
1. Data for taxa at the tips of a phylogeny
2. Assess covariation among variables
3. Define a reasonable model of evolution
4. Use that model to account for non-independence due to phylogeny
5. Analyze the data while accounting for phylogenetic non-independence
•If a pattern exists after accounting for phylogeny, some factor OTHER than phylogeny is needed to explain it
•If a pattern exists before but not after, one possibility is that phylogeny 'explains' the pattern (or at least is correlated with the actual underlying factor)

Comparative Methods for Continuous Data
•Phylogenetic Mantel test (critique: Harmon and Glor 2010 Evol. [poor type I error and power])
•Phylogenetic autocorrelation (PA; critique: Rohlf 2001 Evol.)
•Phylogenetic eigenvector regression (PVR; critiques: Adams & Church 2011 Ecography; Freckleton et al. 2011 Am. Nat.)
•Phylogenetic independent contrasts (PIC)
•Phylogenetic regression (PR)
•Phylogenetic generalized least squares (PGLS)

Independent Contrasts (PIC)
•The most commonly used approach to account for phylogeny
•Felsenstein (1985) developed the approach based on a null model of Brownian motion (BM)
•It is statistical, in that it mathematically addresses the concern that phylogenetically related taxa are not independent
[Figure: plots of Y vs. X across species]
Felsenstein (1985). Am. Nat. 125:1-15.

Independent Contrasts (PIC)
•The null model for character evolution is Brownian motion
  •No change in the expected mean, but variance increases with time
•The method uses contrast scores at nodes, which are independent
•Contrast scores between tips are their difference, standardized by branch lengths (so that under BM they have mean μ = 0 and equal variance)
•Internal nodes are estimated and additional contrasts obtained (proceed for all n − 1 nodes of a bifurcating tree)
•Perform analyses on the contrast scores rather than the tip data (e.g., uncentered correlations, regressions through the origin, etc.)
Felsenstein (1985). Am. Nat. 125:1-15.

Independent Contrasts: Computations
•Contrast scores: $c_{ij} = \dfrac{Y_i - Y_j}{\sqrt{v_i + v_j}}$
•Internal nodes: weighted average*, e.g. $Y_{n1} = \dfrac{\frac{1}{v_1}Y_1 + \frac{1}{v_2}Y_2}{\frac{1}{v_1} + \frac{1}{v_2}}$
*NOTE: internal branches are adjusted as $v_{ij}^{*} = v_{ij} + \dfrac{1}{\frac{1}{v_i} + \frac{1}{v_j}}$
[Figure: three-taxon tree with tips (X1, Y1), (X2, Y2), (X3, Y3), branch lengths v1, v2, v3, internal branch v12, and internal nodes n1, n2]
Felsenstein (1985). Am. Nat. 125:1-15.

Phylogenetic Independent Contrasts (PIC)
•Calculate contrasts and internal nodes; analyze the standardized contrasts
[Figure: worked example on an eight-taxon tree with tip values Y = 4, 5, 9, 11, 13, 7, 4, 8 and successive internal-node estimates 4.5, 10, 10, 7.25, 8.625, 7.854, 7.874]
Felsenstein (1985). Am. Nat. 125:1-15.

Example: Independent Contrasts
Question: is there an evolutionary relationship between body size and home range in mammals?
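The contrast and node-averaging steps from the computations slide can be sketched in Python. This is a minimal illustration, not Felsenstein's (1985) original program. The nested-tuple tree encoding and the names `combine` and `pic` are hypothetical. The tree is assumed to be fully bifurcating with positive branch lengths, and the root carries a zero-length branch.

```python
from math import sqrt


def combine(y_i, v_i, y_j, v_j):
    """One node with daughters i and j (values y, branch lengths v).

    Returns the standardized contrast, the weighted-average node value,
    and the length to add to the branch below the node.
    """
    contrast = (y_i - y_j) / sqrt(v_i + v_j)
    w_i, w_j = 1.0 / v_i, 1.0 / v_j  # weights are inverse branch lengths
    node_value = (w_i * y_i + w_j * y_j) / (w_i + w_j)
    extra_length = 1.0 / (w_i + w_j)  # equals v_i * v_j / (v_i + v_j)
    return contrast, node_value, extra_length


def pic(tree):
    """Post-order pass over a hypothetical nested-tuple tree.

    Leaf: (value, branch_length); internal node: ((left, right), branch_length).
    Returns (node value, adjusted branch length, standardized contrasts).
    """
    first, branch_length = tree
    if not isinstance(first, tuple):  # leaf
        return float(first), float(branch_length), []
    left, right = first
    y_i, v_i, contrasts_left = pic(left)
    y_j, v_j, contrasts_right = pic(right)
    contrast, node_value, extra = combine(y_i, v_i, y_j, v_j)
    # Branch below this node is lengthened: v*_ij = v_ij + 1 / (1/v_i + 1/v_j)
    return node_value, branch_length + extra, contrasts_left + contrasts_right + [contrast]


# Toy tree ((A:1, B:1):1, C:2) with trait values A = 4, B = 6, C = 2
leaf_a, leaf_b, leaf_c = (4.0, 1.0), (6.0, 1.0), (2.0, 2.0)
clade_ab = ((leaf_a, leaf_b), 1.0)
tree = ((clade_ab, leaf_c), 0.0)  # root carries a zero-length branch

root_value, _, contrasts = pic(tree)
print(round(root_value, 4), [round(c, 4) for c in contrasts])  # 3.7143 [-1.4142, 1.6036]

# One step of the worked example: 8.625 (v = 0.5) with 4 (v = 2.5)
print(round(combine(8.625, 0.5, 4.0, 2.5)[1], 3))  # 7.854
```

The toy tree yields n − 1 = 2 contrasts for three taxa. The last call reuses one step of the worked example, a node value of 8.625 on a branch of length 0.5 and a tip value of 4 on a branch of length 2.5. It recovers that example's node estimate of 7.854.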
[Figure: home range vs. body size; carnivores in black, ungulates in white]
•ANCOVA: significant body mass (BM)–home range (HR) slope; carnivore HR > ungulate HR
•PIC ANCOVA: significant BM–HR regression; no HR differences between carnivores and ungulates
Garland et al. (1993). Syst. Biol.

Independent Contrasts: Comments
•Independent contrasts is an intuitive approach and follows a reasonable null model
•The method is an algorithm with logical steps
•Phylogenies with multifurcations can be analyzed (add zero-length branches: Felsenstein 1985)
•PIC is a special case of phylogenetic generalized least squares: a more general (and flexible) statistical model

Phylogenetic Generalized Least Squares
•Generalized least squares (GLS) is a general model for the GLM: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
•The standard GLM assumes independent errors with mean 0 and constant variance ($\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$)
•GLS can utilize other error-term structures
•The error term can account for non-independence among objects
•The analysis is analogous to a 'weighted' GLM, where the weights are the inverse of the structured error
•For PGLS, the structured error term accounts for phylogeny

Phylogenetic Generalized Least Squares
•PIC is a special case of a standard statistical model (PGLS): $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
•PGLS is a GLM with a structured error term, $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{C})$
•Diagonals of C are the heights of the OTUs above the root; off-diagonals are the heights of their shared ancestors above the root. For three taxa, where taxa 1 and 2 share an ancestral branch of length $t_1$ and have terminal branches of length $t_2$, and taxon 3 sits on a branch of length $t_3$:
$$\mathbf{C} = \begin{pmatrix} t_1 + t_2 & t_1 & 0 \\ t_1 & t_1 + t_2 & 0 \\ 0 & 0 & t_3 \end{pmatrix}, \qquad \hat{\boldsymbol{\beta}} = \left(\mathbf{X}^t\mathbf{C}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^t\mathbf{C}^{-1}\mathbf{Y}$$
•NOTE: C follows the Brownian motion model of evolution
•PR and independent contrasts are special cases (Rohlf 2001)
•Can accommodate multifurcations (polytomies) in the phylogeny, as well as multivariate data
See Martins and Hansen (1997). Am. Nat.; Rohlf (2001). Evolution.

OLS vs. PGLS: Statistical Perspective
•Compare phylogenetically 'naïve' and 'informed' analyses
•OLS comparative model: $\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^t\mathbf{X}\right)^{-1}\mathbf{X}^t\mathbf{Y}$
•OLS is an unweighted model: $\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^t\mathbf{C}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^t\mathbf{C}^{-1}\mathbf{Y}$ with $\mathbf{C} = \mathbf{I}$ (for three taxa, the 3 × 3 identity matrix)
•PGLS is a weighted model: the same estimator, with C built from the phylogeny (e.g., the three-taxon C above)
•In PGLS, the weights come from the phylogenetic covariances in C (shared branch lengths), which describe the phylogenetic non-independence

What PGLS is (and what it isn't)
•PGLS is a weighted analysis
•It does NOT partition variance by first 'partialing out' evolutionary (phylogenetic) variance and then attributing the leftover variance to ecology (a common misunderstanding in the ecological literature)
•It is a simultaneous analysis of ecology AND phylogeny
  •With the non-independence of the data accounted for
•Analog: accounting for spatial non-independence (pseudoreplication)
[Figure: Y vs. X, where the data come from sites spread across latitude and longitude]

Example: Body Shape Evolution in Cichlids
•Does body shape covary with trophic morphology, given phylogeny?
[Figure: phylogeny of taxa 1–14 coded by tooth group; white: Eretmodus, black: Spathodus, red: Tanganicodus]
Rüber and Adams (2001). J. Evol. Biol. 14:325-332.

Results
•Body shape is correlated with trophic morphology (PLS: r = 0.72, P = 0.001; regression: Wilks' Λ = 0.118, P < 0.00001)
•PGLS: Wilks' Λ = 0.368, P < 0.00001
•A mechanism OTHER than phylogeny is required to explain the pattern (R&A hypothesize ecological specialization to newly exploited habitats, followed by selection, as the driving force of morphological change)
Rüber and Adams (2001). J. Evol. Biol. 14:325-332.
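The GLS estimator, its OLS special case (C = I), and the claim that PIC is a special case of PGLS can be checked numerically. This is a minimal sketch assuming NumPy. The helper names `gls_coefficients` and `toy_contrasts` are hypothetical. The three-taxon tree ((A:1, B:1):1, C:2) and the values X = (1, 2, 4), Y = (4, 6, 2) are invented for illustration.

```python
import numpy as np


def gls_coefficients(X, Y, C):
    """beta = (X^t C^-1 X)^-1 X^t C^-1 Y; C = I gives ordinary least squares."""
    C_inv = np.linalg.inv(C)
    XtCi = X.T @ C_inv
    return np.linalg.solve(XtCi @ X, XtCi @ Y)


# Tree ((A:1, B:1):1, C:2): diagonal = root-to-tip heights,
# off-diagonal = height of the shared ancestor above the root
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
x = np.array([1.0, 2.0, 4.0])
Y = np.array([4.0, 6.0, 2.0])
X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns

beta_pgls = gls_coefficients(X, Y, C)
beta_ols = gls_coefficients(X, Y, np.eye(3))
print(beta_pgls)  # approximately [5.0, -0.5]
print(beta_ols)   # approximately [6.0, -0.857]


def toy_contrasts(z):
    """Contrasts for this tree only: A and B have equal branches (1, 1), so the
    node value is their mean and the branch below it becomes 1 + 0.5 = 1.5."""
    return np.array([(z[0] - z[1]) / np.sqrt(2.0),
                     ((z[0] + z[1]) / 2.0 - z[2]) / np.sqrt(1.5 + 2.0)])


cx, cy = toy_contrasts(x), toy_contrasts(Y)
pic_slope = (cx @ cy) / (cx @ cx)  # regression through the origin
print(pic_slope)  # approximately -0.5, matching the PGLS slope
```

For these made-up values the unweighted fit gives a steeper slope than PGLS. The through-origin regression on contrasts reproduces the PGLS slope exactly.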
Comments on PGLS
•Statistically accounts for phylogenetic non-independence
•Flexible: all GLM methods can be implemented
•PIC (the most common comparative method) is a special case of PGLS

Evolutionary Models: Brownian Motion
•PIC and PGLS are based on Brownian motion (BM)
•BM model: neutral change based on a rate, $dY_i(t) = \sigma\,dB_i(t)$, i.e., character change = evolutionary rate × small random perturbations
•Rate estimate: $\hat{\sigma}^2 = \dfrac{(\mathbf{Y} - E(\mathbf{Y}))^t\mathbf{C}^{-1}(\mathbf{Y} - E(\mathbf{Y}))}{N}$
•BM is a neutral 'drift' model of character change (NO SELECTION)
[Images from Butler and King, 2004]
Felsenstein (1985). Am. Nat.

Evolutionary Models: Ornstein-Uhlenbeck
•The OU model includes drift and selection: $dY_i(t) = \alpha\left[\theta_j - Y_i(t)\right]dt + \sigma\,dB_i(t)$, where $\alpha$ is the strength of selection, $\theta_j - Y_i(t)$ is the distance from the optimum, and $\sigma\,dB_i(t)$ is the Brownian motion portion
•Trait values are 'pulled' toward one or more optima (1 optimum = stabilizing selection; 2+ optima = divergent selection)
Hansen and Martins (1996) Evol.; Martins and Hansen (1997) Am. Nat.; Butler and King (2004) Am. Nat.

Evolutionary Models: ACDC
•The evolutionary rate changes along the phylogeny (accelerates/decelerates)
•Contains σ² and an additional parameter, 'g' (the ACDC parameter)
•Can model early-burst phenotypic evolution (i.e., character change concentrated at speciation events)
•Early bursts are expected in adaptive radiations (Harmon et al. 2010) and under punctuated equilibrium models
Blomberg et al. 2003. Evolution. See also Pagel 1999. Nature.

Comparing Evolutionary Models
•Evolutionary models describe the 'expected' trait variation
  •This is described by C*
•Each model is fit to the data on the tree to obtain its parameters (e.g., σ²)
•All models have a log-likelihood, log(L)
•COMPARE EVOLUTIONARY MODELS using LRT and AIC
*OU and ACDC models are fit by adjusting branch lengths in C (e.g., Blomberg et al. 2003; Harmon et al. 2008)

Example: Comparing Evolutionary Models
•How did Anolis body size groups (small, medium, large) evolve?
•5 models: BM, OU1, OU3, OU4 (3 groups + ancestral), OULP (3 groups + history of colonization)
•OULP (3 groups + colonization history) best explains body size evolution
•AIC: models are NOT ranked from simplest to most complicated (thus, the number of parameters does not tell the entire story)
Butler and King. 2004. Am. Nat.

The Tempo of Evolution: Comparing Rates
•Patterns of diversification vary greatly among taxa and traits
[Figures: Brusatte et al. 2008; Sidlauskas 2008]
•What is the tempo (or pace) of evolutionary change?

Phylogenetic Evolutionary Rates
•Evolutionary rate: $\hat{\sigma}^2 = \dfrac{(\mathbf{Y} - E(\mathbf{Y}))^t\mathbf{C}^{-1}(\mathbf{Y} - E(\mathbf{Y}))}{N}$ (Felsenstein 1973; O'Meara et al. 2006)
•σ² is a phylogenetically 'standardized' variance:
$$s^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N} = \frac{(\mathbf{Y} - \bar{\mathbf{Y}})^t(\mathbf{Y} - \bar{\mathbf{Y}})}{N} \;\rightarrow\; \frac{(\mathbf{Y} - E(\mathbf{Y}))^t(\mathbf{Y} - E(\mathbf{Y}))}{N} \;\rightarrow\; \frac{(\mathbf{Y} - E(\mathbf{Y}))^t\mathbf{C}^{-1}(\mathbf{Y} - E(\mathbf{Y}))}{N}$$
where E(Y) is the estimated value at the root of the tree
[Figure: three-taxon tree and its phylogenetic covariance matrix C]

Comparing Evolutionary Rates Among Clades
•One trait (Felsenstein 1973; O'Meara et al. 2006):
$$\log L(\sigma^2) = \log\left[\frac{\exp\left(-\tfrac{1}{2}(\mathbf{Y} - E(\mathbf{Y}))^t(\sigma^2\mathbf{C})^{-1}(\mathbf{Y} - E(\mathbf{Y}))\right)}{\sqrt{(2\pi)^N\det(\sigma^2\mathbf{C})}}\right]$$
•p traits, with evolutionary rate matrix R and Y − E(Y) stacked as an Np-element vector (Revell and Harmon 2008):
$$\log L(\mathbf{R}) = \log\left[\frac{\exp\left(-\tfrac{1}{2}(\mathbf{Y} - E(\mathbf{Y}))^t(\mathbf{R}\otimes\mathbf{C})^{-1}(\mathbf{Y} - E(\mathbf{Y}))\right)}{\sqrt{(2\pi)^{Np}\det(\mathbf{R}\otimes\mathbf{C})}}\right]$$
•Evaluate alternative evolutionary hypotheses:
  •Different rates among clades (O'Meara et al. 2006)
  •Rate shifts within clades (Revell 2008)

Example: Turtle Chromosome Evolution
•Turtles have different numbers of chromosomes (N: 28–68)
•Turtles have different sex-determining modes (SDM: genotypic, GSD, or temperature-dependent, TSD)
•Question: are rates of chromosome evolution higher when the sex-determining mode changes?
[Figure: pruned, time-dated supertree for turtles; red branches = transitions in SDM]
Valenzuela and Adams (2011). Evolution

Example: Turtle Chromosome Evolution
•The evolutionary rate is 20× higher on branches with an SDM change: $\sigma^2_{\text{no chg}} = 0.054$ vs. $\sigma^2_{\text{chg}} = 1.079$
•The two-rate model fits the data significantly better:
  •$\ln L_{\text{1 rate}} = -173.4004$; $\ln L_{\text{2 rates}} = -150.8875$
  •LRT = 45.025, P < 0.00001
  •AIC(1 rate) = 350.97; AIC(2 rates) = 308.11
•The rate of chromosome evolution is higher when SDM changes
Valenzuela and Adams (2011). Evolution.
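The single-rate BM estimate of σ², its log-likelihood, and the likelihood-ratio test used in the turtle example can be sketched as follows. This is a minimal sketch assuming NumPy, and the function names are hypothetical. E(Y) is taken to be the GLS estimate of the root state. The P-value is computed only for one degree of freedom (one extra rate parameter, as in the one-rate vs. two-rate comparison), using P(χ²₁ > LRT) = erfc(√(LRT/2)).

```python
import math

import numpy as np


def bm_fit(y, C):
    """ML rate and log-likelihood for one trait under single-rate BM."""
    y = np.asarray(y, dtype=float)
    n = y.size
    C_inv = np.linalg.inv(C)
    ones = np.ones(n)
    root = (ones @ C_inv @ y) / (ones @ C_inv @ ones)  # E(Y): GLS root state
    r = y - root
    sigma2 = (r @ C_inv @ r) / n
    # log L = -1/2 r^t (sigma^2 C)^-1 r - 1/2 log[(2 pi)^N det(sigma^2 C)]
    V = sigma2 * C
    _, logdet = np.linalg.slogdet(V)
    quad = r @ np.linalg.solve(V, r)
    loglik = -0.5 * quad - 0.5 * (n * math.log(2.0 * math.pi) + logdet)
    return sigma2, loglik


def lrt_one_df(loglik_null, loglik_alt):
    """LRT statistic and its chi-square P-value for one degree of freedom."""
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    return stat, p_value


# Toy tree ((A:1, B:1):1, C:2) with invented trait values
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
sigma2, loglik = bm_fit([4.0, 6.0, 2.0], C)
print(sigma2)  # approximately 1.5238 (= 32/21)

# Log-likelihoods reported above for the one- vs. two-rate turtle models
stat, p = lrt_one_df(-173.4004, -150.8875)
print(round(stat, 3), p < 1e-5)  # 45.026 True
```

The LRT recomputed from the two reported log-likelihoods agrees with the value of 45.025 given above, up to rounding.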
Comparing Evolutionary Rates Among Traits
•Likelihood methods can be used to compare rates among traits
1: Obtain R and $\log L_{\mathbf{R}}$ (Y is the N × p data matrix; in the likelihood, Y − E(Y) is stacked as an Np-element vector):
$$\mathbf{R} = \frac{(\mathbf{Y} - E(\mathbf{Y}))^t\mathbf{C}^{-1}(\mathbf{Y} - E(\mathbf{Y}))}{N}, \qquad \log L_{\mathbf{R}} = \log\left[\frac{\exp\left(-\tfrac{1}{2}(\mathbf{Y} - E(\mathbf{Y}))^t(\mathbf{R}\otimes\mathbf{C})^{-1}(\mathbf{Y} - E(\mathbf{Y}))\right)}{\sqrt{(2\pi)^{Np}\det(\mathbf{R}\otimes\mathbf{C})}}\right]$$
2: Compare to the constrained $\mathbf{R}_c$ and $\log L_{\mathbf{R}_c}$, where
$$\mathbf{R} = \begin{pmatrix} \sigma^2_1 & \sigma_{21} & \sigma_{31} \\ \sigma_{21} & \sigma^2_2 & \sigma_{32} \\ \sigma_{31} & \sigma_{32} & \sigma^2_3 \end{pmatrix} \quad\text{and, in } \mathbf{R}_c:\ \sigma^2_1 = \sigma^2_2 = \cdots = \sigma^2_p = \sigma^2_C$$
•R_c is found via a constrained optimization algorithm
Adams. (2013). Syst. Biol.

Example: Morphological Evolution in Plethodon
•Compare traits related to competition vs. those that are not
[Figure: phylogeny of 44 Plethodon species (time axis 25–0 MYA) with species values for head length (HL), forelimb length (FL), and body width (BW), plus a bar chart of evolutionary rate (σ²) for HL, BW, and FL]
•Competitive traits have higher rates
Adams. (2013). Syst. Biol.

Phylogenetic Signal
•Does phenotypic variation exhibit phylogenetic 'structure'?
•Phylogenetic signal: the degree to which trait similarity and phylogenetic relationships are associated
$$K = \frac{\left(\mathrm{MSE}_0/\mathrm{MSE}\right)_{\text{obs}}}{\left(\mathrm{MSE}_0/\mathrm{MSE}\right)_{\text{exp}}} = \frac{(\mathbf{Y} - E(\mathbf{Y}))^t(\mathbf{Y} - E(\mathbf{Y})) \,\big/\, (\mathbf{Y} - E(\mathbf{Y}))^t\mathbf{C}^{-1}(\mathbf{Y} - E(\mathbf{Y}))}{\left[\mathrm{tr}(\mathbf{C}) - N\left(\mathbf{1}^t\mathbf{C}^{-1}\mathbf{1}\right)^{-1}\right]/(N-1)}$$
•Statistically assessed via randomization (trait values are permuted across the tips of the phylogeny)
•K ranges from 0 to ∞; E(K) = 1 under Brownian motion
  •K < 1: less phylogenetic signal than expected under BM
  •K > 1: greater phylogenetic signal than expected under BM
•Note: a multivariate generalization of K is needed
Blomberg et al. 2003. Evolution

Example: Phylogenetic Signal
•Preferred body temperature in Australian skinks
•Significant phylogenetic signal
Blomberg et al. 2003. Evolution

Summary: Phylogenetic Comparative Biology
•Testing evolutionary hypotheses requires a phylogenetic perspective
•Analytical methods have been developed for evaluating:
  •Evolutionary correlations (PGLS)
  •Evolutionary models (BM, OU, etc.)
  •Evolutionary rates
  •Phylogenetic signal
•Multivariate analogs to the above methods have recently been developed
  •Multivariate σ²: Adams (2014a). Systematic Biology
  •Multivariate K: Adams (2014b). Systematic Biology
  •Multivariate PGLS for p > N: Adams (2014c). Evolution
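Blomberg's K and its randomization test can be sketched as follows. This is a minimal sketch assuming NumPy. The names `blomberg_k` and `k_permutation_test`, the default of 999 permutations, and the toy data are hypothetical. E(Y) is the GLS estimate of the root state, and the observed K is counted once in the permutation P-value.

```python
import numpy as np


def blomberg_k(y, C):
    """K = observed (MSE0 / MSE) divided by its expectation under BM."""
    y = np.asarray(y, dtype=float)
    n = y.size
    C_inv = np.linalg.inv(C)
    ones = np.ones(n)
    root = (ones @ C_inv @ y) / (ones @ C_inv @ ones)  # E(Y): GLS root state
    r = y - root
    observed = (r @ r) / (r @ C_inv @ r)
    expected = (np.trace(C) - n / (ones @ C_inv @ ones)) / (n - 1)
    return observed / expected


def k_permutation_test(y, C, n_perm=999, seed=0):
    """Permute trait values across tips; P = share of K values >= observed."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    k_obs = blomberg_k(y, C)
    k_perm = np.array([blomberg_k(rng.permutation(y), C) for _ in range(n_perm)])
    p_value = (np.sum(k_perm >= k_obs) + 1) / (n_perm + 1)
    return k_obs, p_value


# Toy tree ((A:1, B:1):1, C:2) with invented trait values
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
k_obs, p_value = k_permutation_test([4.0, 6.0, 2.0], C)
print(round(k_obs, 4))  # 1.0521
```

With a star phylogeny (C = I), the observed and expected ratios are both 1, so K = 1. With only three taxa there are just six distinct orderings of the trait values, so the P-value here only illustrates the mechanics. The toy data are not from the skink example.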