Phylogenetic comparative methods
Comparative studies (nuisance)
Evolutionary studies (objective)
Community ecology (lack of alternatives)
Current growth of phylogenetic
comparative methods
New statistical methods
Availability of phylogenies
Culture
One of many possible types of
problems
$y = b_0 + b_1 x + \varepsilon$
or as a special case
$y = b_0 + \varepsilon$
This model structure can be used for
a variety of types of problems
$y = b_0 + b_1 x + \varepsilon$
Assumptions:
y takes continuous values
x can be a random variable or a set of
known values (continuous or not)
y is linearly related to x
$\varepsilon$ are random variables with expectation 0
and finite (co)variances that are known
$y = b_0 + b_1 x + \varepsilon$
Statistical methods
(P)IC = GLS
Phylogenetic independent contrasts
Generalized Least Squares
(these are methods, not models)
Other methods for other statistical models
ML, REML, EGLS, GLM, GLMM, GEE,
“Bayesian” methods
$y = b_0 + b_1 x + \varepsilon$
$\varepsilon$ are random variables with expectation 0
and finite (co)variances that are known
Phylogeny provides a hypothesis for these
covariances
Close Relatives Tend to Resemble Each Other
[Figure: phylogeny of species A to I beside a scatter plot of Y against X for the same species]
[Figure: phylogeny of species A to I beside a scatter plot of Y against X for the same species]
What does this represent?
How is it constructed?
Is it known for certain?
Assume that this represents time and is known without error
[Figure: phylogeny of species A to I beside a scatter plot of Y against X for the same species]
Translate into the pattern of covariances in $\varepsilon$ among species: $V$
Hypothetical trait for a single species under Brownian motion evolution
[Figure: trait value against time, showing one possible course of evolution, then another, then another]
Brownian motion evolution gives the hypothetical variance of a trait
[Figure: trait value against time under Brownian motion evolution, with the variance marked]
Brownian motion evolution of a hypothetical trait during speciation
Variance between species = Time
Total variance = Total time
Covariance = Shared time
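The three statements above can be checked numerically. The following is a minimal sketch, assuming a Brownian rate of 1 and a hypothetical pair of species that share 0.6 time units of history before diverging for 0.4 more; the replicate count, step size, and seed are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n_lineages = 20000   # independent replicate histories (hypothetical value)
n_steps = 100        # time steps per history
dt = 0.01            # step length, so total time = n_steps * dt = 1.0
rate = 1.0           # assumed Brownian rate: variance accrues as rate * time

# Each step adds an independent normal increment with variance rate * dt.
increments = rng.normal(0.0, np.sqrt(rate * dt), size=(n_lineages, n_steps))
paths = np.cumsum(increments, axis=1)

times = dt * np.arange(1, n_steps + 1)
empirical_var = paths.var(axis=0)  # variance across replicates at each time

# Two species: shared history up to time 0.6, then independent evolution.
shared_steps = 60
species_a = paths[:, -1]  # keeps following the original path to time 1.0
species_b = paths[:, shared_steps - 1] + rng.normal(
    0.0, np.sqrt(rate * dt * (n_steps - shared_steps)), size=n_lineages
)
empirical_cov = np.cov(species_a, species_b)[0, 1]

print("variance at time 0.5:", round(empirical_var[49], 3))
print("variance at time 1.0:", round(empirical_var[-1], 3))
print("covariance (shared time 0.6):", round(empirical_cov, 3))
```

Under these assumptions the variance should track elapsed time and the covariance should track the shared time, up to simulation noise.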

[Figure: the phylogeny of species A to I, under Brownian motion, gives the covariance matrix $V$; scatter plot of Y against X]

$V$
Covariance matrix giving phylogenetic covariances among species
$v_{ii}$: diagonal elements give the total variance for species $i$
$v_{ij}$: off-diagonal elements give covariances between species $i$ and species $j$
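As an illustration of how such a matrix can be assembled, here is a minimal sketch assuming Brownian motion, so that each covariance equals the branch length two species share from the root. The three-species tree ((A:1, B:1):2, C:3), the `tree` dictionary layout, and the helper names are hypothetical and not taken from the slides.

```python
import numpy as np

# Hypothetical tree ((A:1, B:1):2, C:3).
# Each node maps to (parent, length of the branch leading to the node).
tree = {
    "AB": ("root", 2.0),
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "C": ("root", 3.0),
}
tips = ["A", "B", "C"]


def branches_to_root(node):
    """Return {node_name: branch_length} for every branch from node up to the root."""
    path = {}
    while node != "root":
        parent, length = tree[node]
        path[node] = length
        node = parent
    return path


def phylo_covariance(tip_names):
    """Build V: each entry is the summed length of branches shared by two tips."""
    paths = [branches_to_root(t) for t in tip_names]
    n = len(tip_names)
    V = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            shared = paths[i].keys() & paths[j].keys()
            V[i, j] = sum(paths[i][b] for b in shared)
    return V


V = phylo_covariance(tips)
print(V)
```

The diagonal entries are the root-to-tip distances (total time), and the A-B entry is the length of their single shared branch.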
I am confused by the authors' use of "branch
lengths" on page 3023. I'm not sure if "different
types of branch lengths" means different
phylogenetic analyses or something else I'm not
aware of.
Digression - non-Brownian models of evolution
Ornstein-Uhlenbeck evolution
Stabilizing selection with strength given by $d$
[Figure: trait value against time under stabilizing selection]
Variance between species < Time
Total variance << Total time
[Figure: Ornstein-Uhlenbeck evolution, variance against time]
Stabilizing selection means information is “lost”
through time
Phylogenetic correlations between species
decrease
Phylogenetic Signal
(Blomberg, Garland, and Ives 2003)
OU process $\rightarrow V_d$
$V_d =$ [matrix not shown]
$d$ measures the strength of signal
$y = b_0 + b_1 x + \varepsilon$
Assumptions:
y takes continuous values
x can be a random variable or a set of known
numbers
y is linearly related to x
$\varepsilon$ are random variables with expectation 0 and
finite (co)variances that are known
If d must be estimated, cannot be analyzed using PIC
or GLS
If we are dealing with a recent, rapid radiation (a supported
clade but with short branches), will the lack of branch
length data render any PIC biologically uninformative,
because we would expect non-significant results based on
the branch lengths alone? (Page 3022, second paragraph.)
Phylogenetic Signal
(Blomberg, Garland, and Ives 2003)
OU process $\rightarrow V_d$
$V_d =$ [matrix not shown]
$d$ measures the strength of signal
$y = b_0 + b_1 x + \varepsilon$
Statistical methods
(P)IC = GLS
Phylogenetic independent contrasts
Generalized Least Squares
(these are methods, not models)
Other methods for other statistical models
ML, REML, EGLS, GLM, GLMM, GEE,
“Bayesian” methods
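PIC
$y_{ij} = \beta_1 x_{ij} + \sqrt{\nu'_i + \nu'_j}\,\varepsilon_{ij}$
$\nu'_i = \nu_i + \dfrac{\nu'_k \nu'_l}{\nu'_k + \nu'_l}$
[Figure: tree with tips 1, 2, and 3 and internal node 4 (the ancestor of tips 1 and 2), labeled with trait values $y_1$, $y_2$, $y_3$, $y_4$ and branch lengths $\nu_1$, $\nu_2$, $\nu_3$, $\nu_4$]
$y_{12} = y_1 - y_2$
$y_4 = \dfrac{y_1/\nu_1 + y_2/\nu_2}{1/\nu_1 + 1/\nu_2} = \dfrac{\nu_2 y_1 + \nu_1 y_2}{\nu_1 + \nu_2}$
$y_{34} = y_3 - y_4$
$\nu'_4 = \nu_4 + \dfrac{\nu_1 \nu_2}{\nu_1 + \nu_2}$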
PIC
$y_{ij} = \beta_1 x_{ij} + \sqrt{\nu'_i + \nu'_j}\,\varepsilon_{ij}$
$\dfrac{y_{ij}}{\sqrt{\nu'_i + \nu'_j}} = \beta_1 \dfrac{x_{ij}}{\sqrt{\nu'_i + \nu'_j}} + \varepsilon_{ij}$
Regression through the origin

PIC
$\dfrac{y_{ij}}{\sqrt{\nu'_i + \nu'_j}} = \beta_1 \dfrac{x_{ij}}{\sqrt{\nu'_i + \nu'_j}} + \varepsilon_{ij}$
You could also use different branch lengths for x:
$\dfrac{y_{ij}}{\sqrt{\nu'_i + \nu'_j}} = \beta_1 \dfrac{\tilde{x}_{ij}}{\sqrt{u'_i + u'_j}} + \varepsilon_{ij}$
($\nu'$: branch lengths of y; $u'$: branch lengths of x)
When could this be justified?
$y_{ij} = \beta_1 x_{ij} + \sqrt{\nu'_i + \nu'_j}\,\varepsilon_{ij}$
Never (?)

$y = b_0 + b_1 x + \varepsilon$
Statistical methods
(P)IC = GLS
Phylogenetic independent contrasts
Generalized Least Squares
(these are methods, not models)
Other methods for other statistical models
ML, REML, EGLS, GLM, GLMM, GEE,
“Bayesian” methods
$y = b_0 + b_1 x + \varepsilon$
$E\{\varepsilon\varepsilon'\} = \sigma^2 V \neq \sigma^2 I$
Elements of V are given by shared
branch lengths under the
assumption of “Brownian motion”
evolution
Generalized Least Squares, GLS
$y = (y_1, y_2, \ldots, y_n)'$
$X = (1, x)$
$b = (b_0, b_1)'$
$\hat{b} = (X' V^{-1} X)^{-1} X' V^{-1} y$
$\hat{\sigma}^2 = (y - X\hat{b})' V^{-1} (y - X\hat{b}) / (n - 2)$
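The GLS estimator $\hat{b} = (X'V^{-1}X)^{-1}X'V^{-1}y$ and the variance estimate $\hat{\sigma}^2 = (y - X\hat{b})'V^{-1}(y - X\hat{b})/(n - 2)$ can be sketched in a few lines of NumPy. The function name `gls_fit`, the four-species matrix `V`, and the data are hypothetical; linear systems are solved directly rather than forming the inverse of V, which is a numerical choice and not part of the slides.

```python
import numpy as np


def gls_fit(x, y, V):
    """Return (b_hat, sigma2_hat) for y = b0 + b1*x + eps with Cov(eps) = sigma^2 V."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])   # design matrix (1, x)
    Vinv_X = np.linalg.solve(V, X)         # V^{-1} X
    Vinv_y = np.linalg.solve(V, y)         # V^{-1} y
    b_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    resid = y - X @ b_hat
    sigma2_hat = resid @ np.linalg.solve(V, resid) / (n - 2)
    return b_hat, sigma2_hat


# Hypothetical covariance matrix for the tree ((A:1, B:1):2, (C:2, D:2):1).
V = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])
x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical predictor values
y = np.array([1.5, 1.0, 3.5, 3.0])   # hypothetical response values

b_hat, sigma2_hat = gls_fit(x, y, V)
print("b_hat:", b_hat, " sigma2_hat:", sigma2_hat)
```

Passing the identity matrix for `V` reduces the same function to ordinary least squares.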

Ordinary least squares
$\hat{b} = (X'X)^{-1} X'y$
$\hat{\sigma}^2 = (y - X\hat{b})'(y - X\hat{b}) / (n - 2)$
$V = I$
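Related to ordinary least squares
$DVD' = I$
$z = Dy$
$U = DX$
$y = Xb + \varepsilon$
$Dy = DXb + D\varepsilon$
$z = Ub + \alpha$, where $\alpha = D\varepsilon$
$E\{\alpha\alpha'\} = E\{D\varepsilon\varepsilon'D'\} = D\,E\{\varepsilon\varepsilon'\}\,D' = D\sigma^2 V D' = \sigma^2 I$
$z = Ub + \alpha$
$E\{\alpha\alpha'\} = \sigma^2 I$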
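One way to see this equivalence numerically is to build a matrix D with DVD' = I and run ordinary least squares on the transformed data. The sketch below assumes D is the inverse of the Cholesky factor of V, which is only one of many valid choices of D; the matrix `V` and the data are hypothetical.

```python
import numpy as np

# Hypothetical covariance matrix for the tree ((A:1, B:1):2, (C:2, D:2):1).
V = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])
x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical predictor values
y = np.array([1.5, 1.0, 3.5, 3.0])   # hypothetical response values
X = np.column_stack([np.ones(len(x)), x])

# Cholesky gives V = L L', so D = L^{-1} satisfies D V D' = I.
L = np.linalg.cholesky(V)
D = np.linalg.inv(L)

# Transform the data: each z_i is a linear combination of the y values.
z = D @ y
U = D @ X

# Ordinary least squares on the transformed problem z = U b + alpha.
b_transformed, *_ = np.linalg.lstsq(U, z, rcond=None)

# Direct GLS on the original problem, for comparison.
b_gls = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))

print("OLS on transformed data:", b_transformed)
print("GLS on original data:   ", b_gls)
```

The two estimates agree up to floating-point error, because U'U = X'V^{-1}X and U'z = X'V^{-1}y for any D with DVD' = I.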

Values of $z = Dy$ are linear combinations of $y_i$

[Figure: phylogeny of species A to I beside a scatter plot of Y against X for the same species]
Parameter | True value | GLS estimate | GLS 95% confidence interval | LS estimate | LS 95% confidence interval
$b_0$ | 0 | 2.28 | [-0.82, 5.38] | -1.10 | [-3.69, 1.49]
$b_1$ | 0 | -0.43 | [-1.45, 0.60] | 1.45 | [0.28, 2.62]
$\sigma^2$ | 2 | 3.35 | | 1.39 |
$E\{Y_h\}$ | | 2.84 | [-0.35, 6.03] | 3.84 | [0.35, 7.33]
If IC and GLS can yield identical results and the authors
refer to IC as "a special case of GLS models" (p. 3032),
in what situation(s) would GLS be a more appropriate
method? In other words, why not just use IC?
Divergence time for desert and montane ringtail
populations assumed to be 10,000 years
Predicting values
for ancestral and
new species
$y_{ij} = \beta_1 x_{ij} + \sqrt{\nu'_i + \nu'_j}\,\varepsilon_{ij}$
[Figure: phylogeny of species A to I beside a scatter plot of Y against X for the same species]
Is the prediction of the estimate of y for species I more or less precise than what you would expect from a standard regression analysis?
When dealing with multiple, incongruent gene trees, we can
perform a PIC analysis on each tree, and find a
correlation or not. How do we know which is the "right"
answer?
The three main phylogenetically based statistical methods
described in the reading (IC, GLS, and Monte Carlo
simulations) rely on correct information about tree
topology and branch lengths. If we are unsure of the
correctness of these basic assumptions, what is the best
way to analyze our data?
I'm unclear how data can be statistically significant when
transformed, but not significant otherwise. This seems
like cheating/lying.
The paper discussed researchers' decisions about branch
lengths, especially in terms of transformations (OU,
ACDC). Do researchers use ultrametric trees for these
analyses?