The Phylogenetic Comparative Method Advanced Biostatistics Dean C. Adams Lecture 12

EEOB 590C
Confounding Factors in Biology
•Often we want to assess the relationship between two sets of variables
•Is morphology associated with resource use?
•Is genetic variation correlated with habitat?
•Does behavior covary with latitude?
•But a third variable often covaries with the other two
•Morphology and resource use may both covary with geographic location
•May generate non-independence among objects
•We require methods that can account for this
The Problem of Phylogeny
•Comparing patterns across species is a common way to address evolutionary hypotheses (H_Evol)
•However, phylogenetic relationships among taxa ‘inflate’ the observed correlation of traits (samples are not independent)
•Must account for phylogenetic non-independence
•The comparative method is a common approach
[Figure: scatterplot panels of Y against X]
The Comparative Method
•Comparative Method: Statistics where observations are species sampled across a phylogeny
•Cannot use standard statistical methods because these assume observations are independent
•Phylogenetic relationships generate non-independence, which must be accounted for during the analysis
•Several approaches have been proposed
General Concept of Comparative Methods
1. Data for taxa at tips of phylogeny
2. Assess covariation among variables
3. Define reasonable model of evolution
4. Account for non-independence due to phylogeny with model
5. Analyze data while accounting for phylogenetic non-independence
•If pattern exists after accounting for phylogeny, some factor OTHER than phylogeny is needed to explain it
•If pattern exists before but not after, one possibility is that phylogeny ‘explains’ pattern (or at least is correlated with actual underlying factor)
Comparative Methods for Continuous Data
•Phylogenetic Mantel test (for critique: Harmon and Glor 2010 Evol.; poor type I error and power)
•Phylogenetic autocorrelation (PA; for critique: Rohlf 2001 Evol.)
•Phylogenetic eigenvector regression (PVR; for critique: Adams & Church 2011 Ecography; Freckleton et al. 2011 Am. Nat.)
•Phylogenetic independent contrasts (PIC)
•Phylogenetic regression (PR)
•Phylogenetic generalized least squares (PGLS)
Independent Contrasts (PIC)
•Most commonly used approach to account for phylogeny
•Felsenstein (1985) developed approach based on null model of Brownian Motion (BM)
•Is statistical, in that it mathematically addresses the concern that phylogenetically related taxa are not independent
[Figure: scatterplot panels of Y against X]
Felsenstein (1985). Am. Nat. 125:1-15.
Independent Contrasts (PIC)
•Null model for character evolution follows Brownian Motion
•No change in mean, but var(X) ∝ time
•Method uses contrast scores at nodes, which are independent
•Contrast scores between tips are their difference, standardized by branch lengths (μ = 0; σ = 1)
•Internal nodes are estimated, and additional contrasts obtained (proceed for all n-1 nodes of bifurcating tree)
•Perform analyses on contrast scores rather than tips data (e.g., uncentered correlations, regressions, etc.)
Felsenstein (1985). Am. Nat. 125:1-15.
Independent Contrasts: Computations
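Contrast Scores
$$Y_{ij} = \frac{Y_i - Y_j}{\sqrt{v_i + v_j}}$$
Internal Nodes: Weighted Average*
$$Y_{n1} = \frac{\frac{1}{v_1} Y_1 + \frac{1}{v_2} Y_2}{\frac{1}{v_1} + \frac{1}{v_2}}$$
*NOTE: Internal branches adjusted as:
$$v_{ij}^{*} = v_{ij} + \frac{1}{\frac{1}{v_i} + \frac{1}{v_j}}$$
[Figure: three-taxon tree with tips (X1, Y1), (X2, Y2), (X3, Y3); branch lengths v1, v2, v12, v3; internal nodes n1, n2]
Felsenstein (1985). Am. Nat. 125:1-15.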
Phylogenetic Independent Contrasts (PIC)
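•Calculate contrasts, internal nodes: analyze standardized contrasts
[Figure: eight-taxon tree with tip values Y: 4, 5, 9, 11, 13, 7, 4, 8 and estimated internal node values 4.5, 10, 10, 7.25, 8.625, 7.854, 7.874]
Contrast Scores
$$Y_{ij} = \frac{Y_i - Y_j}{\sqrt{v_i + v_j}}$$
Internal Nodes: Weighted Average
$$Y_{n1} = \frac{\frac{1}{v_1} Y_1 + \frac{1}{v_2} Y_2}{\frac{1}{v_1} + \frac{1}{v_2}}$$
Contrasts (c) computed on the tree:
$$\frac{4 - 5}{\sqrt{1 + 1}}, \quad \frac{9 - 11}{\sqrt{1 + 1}}, \quad \frac{13 - 7}{\sqrt{1.5 + 1.5}}, \quad \frac{4.5 - 10}{\sqrt{0.5 + 0.5}}, \quad \frac{7.25 - 10}{\sqrt{0.5 + 0.5}}, \quad \frac{8.625 - 4}{\sqrt{0.5 + 2.5}}, \quad \frac{7.854 - 8}{\sqrt{0.5 + 3}}$$
Felsenstein (1985). Am. Nat. 125:1-15.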
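The contrast and weighted-average formulas above can be sketched as a short recursive routine. This is a minimal illustration, not Felsenstein's own code: the nested-tuple tree encoding, the function name `pic`, and the `adjust` switch are all hypothetical, and the branch lengths in `slide_tree` are inferred from the denominators of the worked example, so treat them as an assumption. With `adjust=False` the routine uses the branch lengths as given, which appears to be how the worked example's node values were obtained; with `adjust=True` it also applies the internal-branch adjustment $v_{ij}^{*}$.

```python
from math import sqrt

def pic(node, adjust=True):
    """Return (node_value, branch_length, contrasts) for a nested-tuple tree.

    Hypothetical encoding: a tip is (trait_value, branch_length); an internal
    node is ((left_child, right_child), branch_length).
    """
    content, v = node
    if not isinstance(content, tuple):            # tip: nothing to contrast
        return float(content), v, []
    left, right = content
    yl, vl, cl = pic(left, adjust)
    yr, vr, cr = pic(right, adjust)
    contrast = (yl - yr) / sqrt(vl + vr)          # standardized contrast
    value = (yl / vl + yr / vr) / (1 / vl + 1 / vr)   # weighted average
    if adjust:                                    # lengthen branch below node
        v = v + 1 / (1 / vl + 1 / vr)
    return value, v, cl + cr + [contrast]

# Eight-taxon tree; branch lengths inferred from the worked example (assumption)
n45 = (((4, 1.0), (5, 1.0)), 0.5)
n911 = (((9, 1.0), (11, 1.0)), 0.5)
n137 = (((13, 1.5), (7, 1.5)), 0.5)
n_a = ((n45, n911), 0.5)
n_b = ((n_a, n137), 0.5)
n_c = ((n_b, (4, 2.5)), 0.5)
slide_tree = ((n_c, (8, 3.0)), 0.0)

root_value, _, contrasts = pic(slide_tree, adjust=False)
print(round(root_value, 3), len(contrasts))   # 7.875 and 7 contrasts
```

The root value of 7.875 agrees with the 7.874 shown in the example once the intermediate node is rounded to 7.854. Any downstream analysis (for example, a regression through the origin) would then use the returned contrasts rather than the tip values.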
Example: Independent Contrasts
•Is there an evolutionary relationship between body size and home range in mammals?
•Standard ANCOVA: significant BM-HR (body mass vs. home range) slope; carnivore HR > ungulate HR
•PIC ANCOVA: significant BM-HR regression; no HR differences
[Figure: BM-HR scatterplots; carnivores: black, ungulates: white]
Garland et al. (1993). Syst. Biol.
Independent Contrasts: Comments
•Independent contrasts is an intuitive approach, and follows a reasonable null model
•Method is an algorithm with logical steps
•Phylogenies with multifurcations can be analyzed (add zero-length branches: Felsenstein, 1985)
•PIC is a special case of phylogenetic generalized least-squares: a more general (and flexible) statistical model
Phylogenetic Generalized Least Squares
•Generalized least squares (GLS) is a general model for the GLM
Y = Xβ + ε
•Standard GLM has error with μ = 0, σ = 1
•GLS can utilize other error term structures
•Error can account for non-independence among objects
•Analysis is analogous to a ‘weighted’ GLM, where weights are the inverse of the structured error
•For PGLS, structured error term accounts for phylogeny
Phylogenetic Generalized Least Squares
•PIC is a special case of a standard statistical model (PGLS)
Y = Xβ + ε
•PGLS is a GLM with structured error term
$$\varepsilon \sim N(0, \sigma^2 \mathbf{C})$$
$$\mathbf{C} = \begin{bmatrix} v_1 + v_{12} & v_{12} & 0 \\ v_{12} & v_2 + v_{12} & 0 \\ 0 & 0 & v_3 \end{bmatrix}$$
Diagonals are height of OTU above root; off-diagonals are height of shared ancestors above root
[Figure: three-taxon tree with tip branches v1, v2, v3 and shared internal branch v12]
$$\beta = (X^t C^{-1} X)^{-1} X^t C^{-1} Y$$
•NOTE: C follows Brownian Motion model of evolution
•PR and independent contrasts are special cases (Rohlf, 2001)
•Can accommodate multifurcations (polytomies) in phylogeny, and multivariate data
See Martins and Hansen (1997). Am. Nat.; Rohlf (2001). Evolution.
OLS vs. PGLS: Statistical Perspective
•Compare phylogenetically ‘naïve’ and ‘informed’ analyses:
•OLS comparative model: $\beta = (X^t X)^{-1} X^t Y$
•OLS is an unweighted model: $\beta = (X^t C^{-1} X)^{-1} X^t C^{-1} Y$ with
$$\mathbf{C} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
•PGLS is a weighted model: $\beta = (X^t C^{-1} X)^{-1} X^t C^{-1} Y$ with
$$\mathbf{C} = \begin{bmatrix} v_1 + v_{12} & v_{12} & 0 \\ v_{12} & v_2 + v_{12} & 0 \\ 0 & 0 & v_3 \end{bmatrix}$$
•In PGLS, the weights are based on the phylogenetic distances (the covariance matrix C), which describe the phylogenetic non-independence
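As a hedged illustration of the two estimators, the sketch below evaluates the GLS formula once with C set to the identity matrix (which reduces to OLS) and once with a phylogenetic covariance matrix for a three-taxon tree ((1,2),3). The branch lengths, the data, and the helper name `gls_beta` are made up for the example.

```python
import numpy as np

def gls_beta(X, y, C):
    """GLS coefficients: beta = (X' C^-1 X)^-1 X' C^-1 y."""
    Ci = np.linalg.inv(C)
    return np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)

# Hypothetical branch lengths for the tree ((1,2),3)
v1, v2, v12, v3 = 1.0, 1.0, 1.0, 2.0
C_phylo = np.array([[v1 + v12, v12,      0.0],
                    [v12,      v2 + v12, 0.0],
                    [0.0,      0.0,      v3]])

# Hypothetical tip data
x = np.array([1.0, 2.0, 4.0])
y = np.array([2.0, 3.0, 7.0])
X = np.column_stack([np.ones(3), x])      # intercept + slope design matrix

beta_ols = gls_beta(X, y, np.eye(3))      # C = I: phylogenetically naive
beta_pgls = gls_beta(X, y, C_phylo)       # C from the tree: informed
print(beta_ols, beta_pgls)
```

The only difference between the two calls is the matrix C, which is the point of the comparison: the model is the same, and phylogeny enters through the error covariance.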
What PGLS is (and what it isn’t)
•PGLS is a weighted analysis
•It does NOT partition variance by first ‘partialing out’ evolutionary (phylogenetic) variance and then attributing leftover variance to ecology (many misunderstandings of this in ecological literature)
•It is a simultaneous analysis of ecology AND phylogeny
•With non-independence of data accounted for
•Analog: accounting for spatial non-independence (pseudoreplication)
[Figure: scatterplot of Y against X, but the data come from points on a latitude-longitude map]
Example: Body Shape Evolution in Cichlids
•Does body shape covary with trophic morphology given phylogeny?
[Figure: labels 1-14. Tooth groups: white: Eretmodus; black: Spathodus; red: Tanganicodus]
Rüber and Adams (2001). J. Evol. Biol. 14:325-332.
Results
•Body shape correlated with trophic morphology (PLS: r = 0.72; P = 0.001; regression: Wilks’ Λ = 0.118; P < 0.00001)
•PGLS: Wilks’ Λ = 0.368; P < 0.00001
•Mechanism OTHER than phylogeny required to explain pattern (R&A hypothesize ecological specialization to newly exploited habitats, followed by selection, as the driving force of morphological change)
Rüber and Adams (2001). J. Evol. Biol. 14:325-332.
Comments on PGLS
•Statistically account for phylogenetic non-independence
•Flexible: All GLM methods can be implemented
•PIC (most common comparative method) is special case of PGLS
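The statement that PIC is a special case of PGLS can be checked numerically on a small case. The sketch below assumes a three-taxon tree ((1,2),3) with made-up branch lengths and data, computes the two standardized contrasts by hand (including the internal-branch adjustment), and compares the slope of a regression through the origin on contrasts with the PGLS slope. The function and variable names are hypothetical.

```python
import numpy as np

# Hypothetical branch lengths and tip data for the tree ((1,2),3)
v1, v2, v12, v3 = 1.0, 1.0, 1.0, 2.0
x = np.array([1.0, 2.0, 4.0])
y = np.array([2.0, 3.0, 7.0])

def contrasts(t):
    """Two standardized contrasts for the three-taxon tree."""
    c1 = (t[0] - t[1]) / np.sqrt(v1 + v2)
    node = (t[0] / v1 + t[1] / v2) / (1 / v1 + 1 / v2)   # weighted average
    v12_adj = v12 + 1 / (1 / v1 + 1 / v2)                # adjusted branch
    c2 = (node - t[2]) / np.sqrt(v12_adj + v3)
    return np.array([c1, c2])

cx, cy = contrasts(x), contrasts(y)
slope_pic = (cx @ cy) / (cx @ cx)        # regression through the origin

C = np.array([[v1 + v12, v12,      0.0],
              [v12,      v2 + v12, 0.0],
              [0.0,      0.0,      v3]])
X = np.column_stack([np.ones(3), x])
Ci = np.linalg.inv(C)
slope_pgls = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)[1]

print(slope_pic, slope_pgls)             # both 1.625 for these data
```

The agreement holds for the slope; the PGLS fit additionally returns an intercept, which the contrast regression does not estimate.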
Evolutionary Models: Brownian Motion
•PIC and PGLS based on Brownian motion (BM)
•BM model: neutral change based on rate
$$dY_i(t) = \sigma\, dB_i(t)$$
($dY_i(t)$: character change; $\sigma$: evolutionary rate; $dB_i(t)$: small random perturbations)
$$\sigma^2 = \frac{(Y - E(Y))^t C^{-1} (Y - E(Y))}{N}$$
[Images from Butler and King, 2004]
•Is a neutral ‘drift’ model of character change (NO SELECTION)
Felsenstein (1985). Am. Nat.
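A quick simulation can make the BM properties concrete: the mean does not change, while the variance among independent lineages grows in proportion to elapsed time (expected variance σ²t). The parameter values, seed, and variable names below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, dt, n_steps, n_lineages = 0.5, 0.04, 100, 5000

# dY = sigma * dB, with dB ~ Normal(0, dt); every lineage starts at 0
steps = sigma * np.sqrt(dt) * rng.standard_normal((n_steps, n_lineages))
paths = np.cumsum(steps, axis=0)

var_half = paths[n_steps // 2 - 1].var()   # variance among lineages at t = 2
var_full = paths[-1].var()                 # variance among lineages at t = 4
mean_full = paths[-1].mean()               # should stay near the start value

print(var_half, var_full, mean_full)
# Expected under BM: sigma^2 * t = 0.5 at t = 2 and 1.0 at t = 4, mean near 0
```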
Evolutionary Models: Ornstein-Uhlenbeck
•OU model includes drift and selection
$$dY_i(t) = \alpha\,[\theta_{ij} - Y_i(t)]\,dt + \sigma\, dB_i(t)$$
($\alpha$: strength of selection; $[\theta_{ij} - Y_i(t)]$: distance from optima; $\sigma\, dB_i(t)$: Brownian motion portion)
•Trait values ‘pulled’ to one or more optima (1 = stabilizing selection; 2+ = divergent selection)
Hansen and Martins (1996) Evol.; Martins and Hansen (1997) Am. Nat.; Butler and King (2004) Am. Nat.
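The pull toward an optimum can be seen in a small Euler-type simulation of the OU process with a single optimum. This is only a sketch under arbitrary parameter values (the names `alpha`, `theta`, `sigma` and the step size are assumptions made here): lineages that start away from the optimum converge on it, and their variance settles near σ²/(2α) instead of growing without bound as under BM.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, theta, sigma = 2.0, 5.0, 1.0        # selection strength, optimum, rate
dt, n_steps, n_lineages = 0.01, 500, 5000

y = np.zeros(n_lineages)                   # all lineages start at 0, far from theta
for _ in range(n_steps):
    dB = np.sqrt(dt) * rng.standard_normal(n_lineages)
    y += alpha * (theta - y) * dt + sigma * dB   # selection pull + BM portion

stationary_var = sigma**2 / (2 * alpha)    # long-run variance of the OU process
print(y.mean(), y.var(), stationary_var)
```

Setting `alpha = 0` removes the pull and recovers the BM simulation, which is one way to see BM as a limiting case of OU.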
Evolutionary Models: ACDC
•Evolutionary rate changes along phylogeny (accelerate/decelerate)
•Contains σ² and additional parameter ‘g’ (ACDC parameter)
•Can model early-burst phenotypic evolution (i.e., character change concentrated at speciation events)
•Early-burst expected in adaptive radiations (Harmon et al. 2010) and punctuated equilibrium models
Blomberg et al. 2003. Evolution.
See also Pagel 1999. Nature.
Comparing Evolutionary Models
•Evolutionary models describe ‘expected’ trait variation
•This is described by C*
•Each model is fit to the data on the tree to obtain parameters (e.g., σ²)
•Each model has a log-likelihood, log(L)
•COMPARE EVOLUTIONARY MODELS using LRT and AIC
•*OU and ACDC models fit by adjusting branch lengths in C (e.g., Blomberg et al. 2003; Harmon et al. 2008)
Example: Comparing Evolutionary Models
•How did Anolis body size groups (small, medium, large) evolve?
•5 models: BM, OU1, OU3, OU4 (3 group + anc), OULP (3 gp + history of colonization)
•OULP (3 gp + col. hist.) best explains body size evolution
•AIC: models NOT ranked simplest → most complicated (thus, number of parameters does not tell entire story)
Butler and King. 2004. Am. Nat.
The Tempo of Evolution: Comparing Rates
•Patterns of diversification vary greatly among taxa and traits
[Figures from Brusatte et al. 2008 and Sidlauskas 2008]
•What is the tempo (or pace) of evolutionary change?
Phylogenetic Evolutionary Rates
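• Evolutionary rate:
$$\sigma^2 = \frac{(Y - E(Y))^t C^{-1} (Y - E(Y))}{N}$$
Felsenstein 1973; O’Meara et al. 2006
σ² is phylogenetically ‘standardized’ variance:
$$MD = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N} = \frac{(Y - \bar{Y})^t (Y - \bar{Y})}{N} \;\rightarrow\; \frac{(Y - E(Y))^t (Y - E(Y))}{N} \;\rightarrow\; \frac{(Y - E(Y))^t C^{-1} (Y - E(Y))}{N}$$
E(Y): root of tree
$$\mathbf{C} = \begin{bmatrix} v_1 + v_{12} & v_{12} & 0 \\ v_{12} & v_2 + v_{12} & 0 \\ 0 & 0 & v_3 \end{bmatrix}$$
[Figure: three-taxon tree with tip branches v1, v2, v3 and shared internal branch v12]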
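A minimal sketch of the rate calculation follows. It assumes that E(Y), the value at the root, is estimated as the phylogenetically weighted (GLS) mean, which is a common choice but is an assumption here; the function name `bm_rate`, the tree, and the data are hypothetical.

```python
import numpy as np

def bm_rate(y, C):
    """Return (root_estimate, sigma2) under Brownian motion.

    root_estimate is the GLS mean (1' C^-1 y) / (1' C^-1 1), used as E(Y);
    sigma2 = (y - E(Y))' C^-1 (y - E(Y)) / N.
    """
    n = len(y)
    Ci = np.linalg.inv(C)
    ones = np.ones(n)
    root = (ones @ Ci @ y) / (ones @ Ci @ ones)
    resid = y - root
    return root, (resid @ Ci @ resid) / n

# Hypothetical three-taxon example, tree ((1,2),3)
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
y = np.array([2.0, 3.0, 7.0])
root_hat, sigma2_hat = bm_rate(y, C)
print(root_hat, sigma2_hat)
```

With C equal to the identity matrix the function returns the ordinary mean and the ordinary (divide-by-N) variance, which is the sense in which σ² is a phylogenetically standardized variance.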
Comparing Evolutionary Rates Among Clades
$$\log L(\sigma^2) = \log\left( \frac{\exp\left(-\frac{1}{2}\,(Y - E(Y))^t (\sigma^2 C)^{-1} (Y - E(Y))\right)}{\sqrt{(2\pi)^{N} \det(\sigma^2 C)}} \right)$$
Felsenstein 1973; O’Meara et al. 2006
$$\log L(R) = \log\left( \frac{\exp\left(-\frac{1}{2}\,(Y - E(Y))^t (R \otimes C)^{-1} (Y - E(Y))\right)}{\sqrt{(2\pi)^{Np} \det(R \otimes C)}} \right)$$
Revell and Harmon 2008
•Evaluate alternative evolutionary hypotheses
•Different rates among clades (O’Meara et al. 2006)
•Rate shifts within clades (Revell 2008)
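The single-trait log-likelihood above can be evaluated directly. The sketch below is an illustration with hypothetical names and data; it works on the log scale (using the log-determinant) to avoid numerical underflow, and it plugs in the GLS root estimate and the rate estimate from the σ² formula as an assumed way of obtaining the parameters.

```python
import numpy as np

def bm_loglik(y, C, sigma2, root):
    """log L(sigma2) for a multivariate normal with covariance sigma2 * C."""
    n = len(y)
    V = sigma2 * C
    resid = y - root
    _, logdet = np.linalg.slogdet(V)              # log det(sigma2 * C)
    quad = resid @ np.linalg.solve(V, resid)      # resid' V^-1 resid
    return -0.5 * (quad + n * np.log(2 * np.pi) + logdet)

# Hypothetical three-taxon example, tree ((1,2),3)
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
y = np.array([2.0, 3.0, 7.0])

Ci = np.linalg.inv(C)
ones = np.ones(len(y))
root_hat = (ones @ Ci @ y) / (ones @ Ci @ ones)
sigma2_hat = ((y - root_hat) @ Ci @ (y - root_hat)) / len(y)

loglik_hat = bm_loglik(y, C, sigma2_hat, root_hat)
print(loglik_hat)
```

Competing hypotheses (for example, one rate versus separate rates per clade) would each yield such a log-likelihood, and those values are what the LRT and AIC compare.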
Example: Turtle Chromosome Evolution
•Turtles have different numbers of chromosomes (N: 28-68)
•Turtles have different sex-determining modes (GSD/TSD)
•Ho: Are rates of chromosome evolution higher when sex-determining mode (SDM) changes?
[Figure: pruned, time-dated supertree for turtles; red branches = transitions in SDM]
Valenzuela and Adams. (2011). Evolution
Example: Turtle Chromosome Evolution
•Evolutionary rate 20X higher on branches with SDM change!
$$\sigma^2_{nochg} = 0.054 \qquad \sigma^2_{chg} = 1.079$$
•Two-rate model significantly better fit to data
$$\ln L_{1\,rate} = -173.4004 \qquad \ln L_{2\,rates} = -150.8875$$
LRT = 45.025, P < 0.00001
AIC1-rate = 350.97, AIC2-rates = 308.11
•Rate of chromosome evolution higher when SDM changes
Valenzuela and Adams. (2011). Evolution
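The likelihood ratio test reported above can be reproduced from the two log-likelihoods. The sketch assumes the log-likelihoods are negative (as the LRT and AIC values imply) and that the test has one degree of freedom, since the two-rate model adds one rate parameter; both points are assumptions made for this illustration.

```python
from math import erfc, sqrt

lnL_one_rate = -173.4004
lnL_two_rates = -150.8875

# LRT statistic: twice the difference in log-likelihood
lrt = 2 * (lnL_two_rates - lnL_one_rate)

# Upper-tail probability of a chi-square with 1 df (assumed df)
p_value = erfc(sqrt(lrt / 2))

rate_ratio = 1.079 / 0.054      # rate with SDM change vs. without

print(round(lrt, 3), p_value < 0.00001, round(rate_ratio, 1))
# 45.026 True 20.0
```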
Comparing Evolutionary Rates Among Traits
•Can use likelihood methods to compare rates among traits
1: Obtain R and log L(R)
$$R = \frac{(Y - E(Y))^t C^{-1} (Y - E(Y))}{N}$$
$$\log L(R) = \log\left( \frac{\exp\left(-\frac{1}{2}\,(Y - E(Y))^t (R \otimes C)^{-1} (Y - E(Y))\right)}{\sqrt{(2\pi)^{Np} \det(R \otimes C)}} \right)$$
2: Compare to constrained $R_C$ and log L($R_C$), where $\sigma^2_1 = \sigma^2_2 = \cdots = \sigma^2_p$
$$R = \begin{bmatrix} \sigma^2_1 & & \\ \sigma_{21} & \sigma^2_2 & \\ \sigma_{31} & \sigma_{32} & \sigma^2_3 \end{bmatrix}$$
$R_C$ found via constrained optimization algorithm
Adams. (2013). Syst. Biol.
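Step 1 can be sketched as follows: the diagonal of R holds the per-trait rates and the off-diagonals the evolutionary covariances. The function name `rate_matrix`, the four-taxon tree, and the trait values are hypothetical, and the root vector E(Y) is assumed to be the GLS mean of each trait. The constrained fit of step 2 needs an optimizer and is not shown.

```python
import numpy as np

def rate_matrix(Y, C):
    """R = (Y - E(Y))' C^-1 (Y - E(Y)) / N for an N x p trait matrix Y."""
    n = Y.shape[0]
    Ci = np.linalg.inv(C)
    ones = np.ones((n, 1))
    # GLS estimate of the root value for each trait (1 x p)
    root = np.linalg.solve(ones.T @ Ci @ ones, ones.T @ Ci @ Y)
    resid = Y - ones @ root
    return resid.T @ Ci @ resid / n

# Hypothetical four-taxon tree ((1,2),(3,4)) and two traits
C = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])
Y = np.array([[1.0, 10.0],
              [1.2, 14.0],
              [3.0,  9.0],
              [3.1, 15.0]])

R = rate_matrix(Y, C)
print(np.diag(R))      # one evolutionary rate per trait
```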
Example: Morphological Evolution in Plethodon
•Compare traits related to competition vs. those that are not
[Figure: time-calibrated Plethodon phylogeny (axis: 25 to 0 MYA) with tip values for head length (HL), forelimb length (FL), and body width (BW); bar chart of evolutionary rate (σ²) for HL, BW, and FL]
Species             HL      FL      BW
P serratus          6.70    6.73    3.33
P shenandoah        7.80    9.00    3.92
P cinereus          7.07    6.63    3.57
P virginia          7.09    8.61    4.51
P hoffmani          7.58    7.58    3.66
P nettingi          7.41    8.36    3.42
P hubrichti         8.20    9.38    4.16
P richmondi         7.18    6.50    3.72
P electromorphus    6.55    6.63    4.18
P websteri          5.59    7.01    3.48
P wehrlei           9.68    12.19   4.98
P punctatus         11.45   14.69   5.99
P welleri           7.36    7.77    3.78
P angusticlavius    7.03    8.10    3.70
P ventralis         6.51    7.19    3.27
P dorsalis          6.70    7.33    3.32
P yonahlossee       14.58   20.06   8.44
P petraeus          12.38   18.19   7.86
P kentucki          11.70   14.90   5.95
P caddoensis        8.20    9.20    3.60
P ouachitae         11.40   13.30   5.45
P fourchensis       10.74   14.59   6.69
P jordani           10.80   16.16   7.66
P glutinosus        12.86   16.20   7.71
P aureolus          9.20    12.53   5.31
P mississippi       11.59   13.60   6.68
P kiamichi          11.25   14.38   7.20
P sequoyah          11.55   13.66   7.83
P albagula          12.32   16.12   8.16
P kisatchie         9.30    11.48   6.05
P grobmani          11.11   13.06   6.21
P savannah          10.69   13.24   7.13
P ocmulgee          10.28   13.83   7.07
P meridianus        11.68   17.28   8.36
P amplus            11.12   16.44   7.58
P montanus          9.16    13.66   6.02
P metcalfi          12.62   16.06   7.58
P cheoah            9.24    13.26   6.00
P chattahoochee     10.37   13.67   6.40
P variolatus        9.56    11.84   6.41
P chlorobryonis     10.37   12.20   5.93
P shermani          9.40    13.24   7.10
P teyahalee         12.50   16.40   7.03
P cylindraceus      13.15   16.65   8.02
•Competitive traits have higher rates
Adams. (2013). Syst. Biol.
Phylogenetic Signal
•Does phenotypic variation exhibit phylogenetic ‘structure’?
•Phylogenetic signal: the degree to which trait similarity and phylogenetic relationships are associated
$$K = \frac{\left(\dfrac{MSE_0}{MSE}\right)_{obs}}{\left(\dfrac{MSE_0}{MSE}\right)_{exp}}$$
$$K = \frac{(Y - E(Y))^t (Y - E(Y))}{(Y - E(Y))^t C^{-1} (Y - E(Y))} \Bigg/ \frac{tr(C) - N(1^t C^{-1} 1)^{-1}}{N - 1}$$
•Statistically assessed via randomization (randomize Y vs. phylogeny)
•K ranges from 0 → ∞, E(K) = 1 (under Brownian motion)
•K < 1: less phylogenetic signal than expected
•K > 1: greater phylogenetic signal than expected
•Note: multivariate generalization of K needed
Blomberg et al. 2003. Evolution
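The second form of K above translates almost line for line into code. The sketch below is an illustration only: the function names, the four-taxon tree, and the trait values are hypothetical, E(Y) is taken to be the GLS root estimate, and the randomization test simply shuffles trait values across the tips.

```python
import numpy as np

def blomberg_k(y, C):
    """K = observed MSE0/MSE ratio divided by its expectation under BM."""
    n = len(y)
    Ci = np.linalg.inv(C)
    ones = np.ones(n)
    denom = ones @ Ci @ ones                      # 1' C^-1 1
    root = (ones @ Ci @ y) / denom                # E(Y) at the root (GLS mean)
    resid = y - root
    observed = (resid @ resid) / (resid @ Ci @ resid)
    expected = (np.trace(C) - n / denom) / (n - 1)
    return observed / expected

def k_permutation_p(y, C, n_perm=999, seed=0):
    """Randomization P-value: shuffle trait values across the tips."""
    rng = np.random.default_rng(seed)
    k_obs = blomberg_k(y, C)
    k_null = [blomberg_k(rng.permutation(y), C) for _ in range(n_perm)]
    return (1 + sum(k >= k_obs for k in k_null)) / (n_perm + 1)

# Hypothetical four-taxon tree ((1,2),(3,4)); close relatives have similar values
C = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])
y = np.array([1.0, 1.2, 3.0, 3.1])

k_obs = blomberg_k(y, C)
p_val = k_permutation_p(y, C)
print(k_obs, p_val)
```

With only four tips the randomization test has very few distinct permutations, so the P-value here is illustrative rather than meaningful; real analyses would use many more taxa.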
Example: Phylogenetic Signal
•Preferred body temperature in Australian skinks
•Significant phylogenetic signal
Blomberg et al. 2003. Evolution
Summary: Phylogenetic Comparative Biology
•Testing evolutionary hypotheses requires a phylogenetic perspective
•Analytical methods developed for:
•Evaluating evolutionary correlations (PGLS)
•Comparing evolutionary models (BM, OU, etc.)
•Comparing evolutionary rates
•Evaluating phylogenetic signal
•Multivariate analogs to the above methods recently developed
•Multivariate σ²: Adams (2014a). Systematic Biology
•Multivariate K: Adams (2014b). Systematic Biology
•Multivariate PGLS for p > N: Adams (2014c). Evolution