Tricks for trees: Having reconstructed trees, what can we do with them?

Tricks for trees: Having reconstructed phylogenies, what can we do with them?
Mike Steel
Allan Wilson Centre for
Molecular Ecology and Evolution
Biomathematics Research Centre
University of Canterbury,
Christchurch, New Zealand
DIMACS, June 2006
1
Where are phylogenetic trees used?




• Evolutionary biology – species relationships, dating divergences, speciation processes, molecular evolution.
• Ecology – classifying new species, biodiversity, co-phylogeny, migration of populations.
• Epidemiology – systematics, processes, dynamics.
• Extras – linguistics, stemmatology, psychology.
2
Phylogenetic trees
[Definition]
A phylogenetic X-tree is a tree T = (V, E) with a set X of labelled leaves, and all other vertices unlabelled and of degree ≥ 3.
If all non-leaf vertices have degree 3, then T is binary.
3
Trees and splits
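[Figure: a tree T on leaves 1, 2, 3, 4, 5, 6 with an edge e marked]
$e \mapsto A_e \mid B_e$
$\Sigma(T) = \{A_e \mid B_e : e \in E\}$
Partial order: $(P_X, \leq)$
$T \leq T' \iff \Sigma(T) \subseteq \Sigma(T')$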
Buneman’s Theorem
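As an illustration of the map from edges to splits, the following is a minimal Python sketch that computes Σ(T) by deleting each edge in turn and compares two trees under the partial order. The edge-list representation, the function name `splits`, and the two example trees are assumptions made for this sketch, not part of the talk.

```python
from collections import defaultdict

def splits(edges, leaves):
    """Return Sigma(T): one split {A_e, B_e} of the leaf set per edge e."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = frozenset(leaves)
    result = set()
    for u, v in edges:
        # Collect the vertices reachable from u without crossing edge {u, v}.
        seen, stack = {u}, [u]
        while stack:
            w = stack.pop()
            for n in adj[w]:
                if n not in seen and not (w == u and n == v):
                    seen.add(n)
                    stack.append(n)
        side_a = frozenset(seen) & leaves
        side_b = leaves - side_a
        result.add(frozenset({side_a, side_b}))
    return result

# Hypothetical example: a binary tree on leaves 1..5 and the tree obtained
# from it by contracting the internal edge q-r.
LEAVES = {1, 2, 3, 4, 5}
fine_edges = [(1, "p"), (2, "p"), ("p", "q"), (3, "q"),
              ("q", "r"), (4, "r"), (5, "r")]
coarse_edges = [(1, "p"), (2, "p"), ("p", "q"), (3, "q"), (4, "q"), (5, "q")]

sigma_fine = splits(fine_edges, LEAVES)
sigma_coarse = splits(coarse_edges, LEAVES)

# T <= T' exactly when Sigma(T) is a subset of Sigma(T').
coarse_below_fine = sigma_coarse <= sigma_fine
```

Under these assumptions the contracted tree sits below the binary tree in the partial order, since contracting an edge only removes a split.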
4
Quartet trees
• A quartet tree is a binary phylogenetic tree on 4
leaves (say, x,y,w,z) written xy|wz.
[Figure: the quartet tree xy|wz]
• A phylogenetic X-tree displays xy|wz if there is an edge in
T whose deletion separates {x,y} from {w,z}
[Figure: a phylogenetic X-tree on leaves x, y, w, r, z, s, u]
5
Corresponding notions for rooted trees

Clusters (in place of splits)

Triples in place of quartets
6
How are trees useful in epidemiology?
Systematics and reconstruction




• How are different types/strains of a virus related?
• When, where, and how did they arise?
• What is their likely future evolution?
• What was the ancestral sequence?
7
How are trees useful in epidemiology?
Processes and dynamics
(“Phylodynamics”)



• How do viruses change with time in a population (population size, etc.)?
• What is their rate of mutation, recombination, selection?
Within-host dynamics
• How do viruses evolve in a single patient?
• How is this related to the progression of the disease?
• How much compartmental variation exists?
8
What do the shapes of these trees tell us about the
processes governing their evolution?
E.g. population dynamics, selection
Coalescent prediction
10
Tree shapes (non-metric)
George Yule
[Figure: a tree on leaves a, b, c, d, e]
11
Why do trees on the same taxa disagree?
1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)
2. Sampling error (and factors that make it worse!)
3. Alignment error
4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss
13
Sampling error that’s hard to deal with
[Figure: T1, T2, T3, T4 drawn against a time axis, with an edge e marked “?”]
14
Example: Deep divergence in the Metazoan phylogeny
[Figure: Metazoan phylogeny. Group labels include Deuterostomes, Arthropods, Nematodes, Platyhelminthes, Choanoflagellates and Fungi; tip labels range from Cnidaria, Ctenophora, Annelida, Mollusca and Tardigrades to individual genera such as Drosophila, Anopheles, Caenorhabditis, Schistosoma, Monosiga, Saccharomyces and Neurospora.]
From Huson and Bryant, 2006
15
Models
[Figure: two trees on leaves 1, 2, 3, 4, each with an edge e marked, shown side by side (“vs”)]
Finite state Markov process
16
Models
[Figure: two trees on leaves 1, 2, 3, 4 shown side by side (“vs”)]
• “site saturation”
• subdividing long edges only offers a partial remedy (trade-off).
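Site saturation can be made concrete with one particular finite state Markov process. The sketch below assumes the Jukes-Cantor model, which the slides do not name; it is used here only because its formulas are simple. Under that model the expected proportion of differing sites between two sequences separated by d expected substitutions per site is p = (3/4)(1 − e^{−4d/3}), which flattens out at 3/4 as d grows.

```python
import math

def jc69_p_distance(d):
    """Expected proportion of differing sites at evolutionary distance d
    (expected substitutions per site), assuming the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_corrected_distance(p):
    """Invert jc69_p_distance: estimate d from an observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); the sites are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Long edges push p towards 3/4, where small changes in p correspond to
# very large changes in the estimated distance d.
short_edge = jc69_p_distance(0.1)
long_edge = jc69_p_distance(10.0)
```

Near saturation the inverse map is very steep, so sampling noise in p translates into large uncertainty in d.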
17
Gene trees vs species trees
[Figure: two trees on taxa a, b, c]
Theorem (J. H. Degnan and N. A. Rosenberg, 2006).
For n ≥ 5, for any tree, there are branch lengths and population sizes for which the most likely gene tree is different from the species tree.
Discordance of species trees with their most likely gene trees. PLoS Genetics 2(5), e68, May 2006.
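For contrast with the theorem, the three-taxon case can be worked out directly. The sketch below uses the standard coalescent result for a rooted three-taxon species tree, which is not stated on the slide: if the internal branch has length t in coalescent units, the gene tree matches the species tree with probability 1 − (2/3)e^{−t}, and each of the two discordant topologies has probability (1/3)e^{−t}. The function name is hypothetical.

```python
import math

def gene_tree_probabilities(t):
    """Gene tree topology probabilities for a rooted three-taxon species tree
    whose internal branch has length t (coalescent units)."""
    discordant = math.exp(-t) / 3.0
    return {"matching": 1.0 - 2.0 * discordant, "each_discordant": discordant}

# A short internal branch makes discordance common, but the matching
# topology is never less probable than either discordant one.
short_branch = gene_tree_probabilities(0.1)
long_branch = gene_tree_probabilities(3.0)
```

With three taxa the most likely gene tree therefore always agrees with the species tree (with a three-way tie only at t = 0), which is why the discordance in the theorem needs more taxa.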
19
Example
[Figure: tree of Orangutan, Gorilla, Chimpanzee and Human, with a “?” marked. Adapted from the Tree of Life website, University of Arizona.]
20
Distinguishing between signals

Lineage sorting vs sampling error vs HGT
[Figure: three trees on taxa A, B, C]
21
Given a tree what questions might we want to answer?




• How reliable is a split?
• Where is the root of the tree? Relative ranking of vertices? Dating?
• How well supported is the resolution of some ‘deep divergence’?
• What model best describes the evolution of the sequences (molecular clock? dS/dN ratio constant? etc.)
Statistical approaches:
• Non-parametric bootstrap
• Parametric bootstrap
• Likelihood ratio tests
• Bayesian posterior probabilities
• Tests (KH, SH, SOWH)
Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49: 652–670.
23
From Steve Thompson, Florida State University
24
Example
25
Non-parametric bootstrap
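The resampling step of the non-parametric bootstrap can be sketched as follows: columns (sites) of the alignment are drawn with replacement to build a pseudo-replicate of the same length, a tree is re-estimated from each replicate, and the support for a split is the fraction of replicates whose tree contains it. The dictionary representation of the alignment and the callback `estimate_splits` are assumptions of this sketch; any tree-building method could stand in for the callback.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name -> sequence string (equal lengths).
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in columns)
            for taxon, seq in alignment.items()}

def split_support(alignment, estimate_splits, split, replicates=100, seed=0):
    """Fraction of bootstrap replicates whose estimated tree contains `split`.

    estimate_splits is a hypothetical user-supplied function that takes an
    alignment and returns the set of splits of the tree it estimates.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        replicate = bootstrap_alignment(alignment, rng)
        if split in estimate_splits(replicate):
            hits += 1
    return hits / replicates
```

Fixing the seed keeps the replicates reproducible; the number of replicates is a tuning choice rather than something the slide specifies.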
26
27
Dealing with incompatibility: Consensus trees



• Strict
• Majority rule
• Semistrict consensus
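In terms of splits, the first two consensus methods are short to state: the strict consensus keeps the splits present in every input tree, and the majority-rule consensus keeps those present in more than half of them. The sketch below assumes each tree is given as a set of splits; the helper `split` and the three example trees are hypothetical, and the semistrict consensus is not covered.

```python
from collections import Counter

def split(a, b):
    """Represent the split a|b as an unordered pair of frozensets."""
    return frozenset({frozenset(a), frozenset(b)})

def strict_consensus(trees):
    """Splits that occur in every tree."""
    return set.intersection(*(set(t) for t in trees))

def majority_rule(trees):
    """Splits that occur in more than half of the trees."""
    counts = Counter(s for t in trees for s in set(t))
    return {s for s, c in counts.items() if c > len(trees) / 2}

# Hypothetical input: non-trivial splits of three trees on leaves a..e.
T1 = {split("ab", "cde"), split("abc", "de")}
T2 = {split("ab", "cde"), split("abd", "ce")}
T3 = {split("ab", "cde"), split("abc", "de")}

strict = strict_consensus([T1, T2, T3])
majority = majority_rule([T1, T2, T3])
```

Splits that each occur in more than half of the trees are pairwise compatible, since any two of them share at least one tree, so the majority-rule output always corresponds to a tree.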
28
Consensus networks

Take the splits that are in at least x% of the trees and represent them by a graph
Splits Graph ($G(\Sigma)$) – Dress and Huson
Each split is represented by a class of ‘parallel’ edges

Simplest example (n=4).
29
[Figure: chloroplast JSA tree; tips annotated (N), (NS), (SS), (A), (C,S), with R. nivicola (N) marked]
30
[Figure: nuclear ITS tree; tips annotated (N), (NS), (SS), (A), with R. nivicola marked]
31
[Figure: consensus network (ITS tree + JSA tree), with labels I, II, III and R. nivicola marked]
32
Maximum agreement subtrees

Concept

Computational complexity
33
Comparing trees

Splits metric (Robinson-Foulds)

Statistical aspects.
Tree rearrangement operations – the graph of trees (rSPR).

Cophylogeny
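The splits metric has a direct set formulation: the Robinson-Foulds distance between two trees on the same leaf set is the number of splits found in exactly one of them. A minimal sketch, assuming each tree is supplied as its set of non-trivial splits (the helper `split` and the example trees are hypothetical):

```python
def split(a, b):
    """Represent the split a|b as an unordered pair of frozensets."""
    return frozenset({frozenset(a), frozenset(b)})

def robinson_foulds(splits_a, splits_b):
    """Size of the symmetric difference of the two split sets."""
    return len(set(splits_a) ^ set(splits_b))

# Hypothetical trees on leaves a..e, given by their non-trivial splits.
T1 = {split("ab", "cde"), split("abc", "de")}
T2 = {split("ab", "cde"), split("abd", "ce")}

distance = robinson_foulds(T1, T2)
```

Trivial splits (a single leaf against the rest) are shared by all trees on the same leaves, so leaving them out does not change the distance.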
34
Co-phylogeny (M. Charleston)
35
Supertrees
Compatibility concept
• Compatibility of rooted trees (BUILD)
• Why do we want to do this?
• Extension – higher order taxa, dates
• Methods for handling incompatible trees (MRP; mincut variants; minflip)
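The BUILD idea for rooted triples can be sketched in a few lines. For the current leaf set, join a and b whenever a triple ab|c lies entirely inside the set; if everything ends up in one connected component the triples are incompatible, otherwise recurse on each component and make the results children of a new root. The triple encoding `((a, b), c)` and the nested-tuple output are assumptions of this sketch, and leaf labels are assumed not to be `None`.

```python
def build(leaves, triples):
    """Return a rooted tree (nested tuples) displaying every triple ab|c,
    or None if the triples are incompatible.

    Each triple is encoded as ((a, b), c).
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Only triples whose three leaves are all present still constrain us.
    active = [t for t in triples if {t[0][0], t[0][1], t[1]} <= leaves]

    # Union-find over the leaves: a and b are joined for every triple ab|c.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), _ in active:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return None  # no valid way to split at the root: incompatible

    subtrees = []
    for component in components.values():
        subtree = build(component, active)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)

# Hypothetical inputs: a compatible pair of triples and an incompatible pair.
tree = build({"a", "b", "c", "d"}, [(("a", "b"), "c"), (("c", "d"), "a")])
clash = build({"a", "b", "c"}, [(("a", "b"), "c"), (("b", "c"), "a")])
```

The output need not be binary: a component structure with three or more parts gives a root with that many children.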

36
Compatibility
A set Q of quartets is compatible if there is a phylogenetic
X-tree T that displays each quartet of Q

Example: Q={12|34, 13|45, 14|26}
[Figure: a tree on leaves 1, 2, 3, 4, 5, 6]
Complexity?
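For a specific set such as the example Q, compatibility can be confirmed by exhibiting one tree that displays every quartet. The sketch below checks the display condition split by split: a tree displays xy|wz when some split has x, y on one side and w, z on the other. The witness tree, given by its three non-trivial splits, is a hypothetical choice made for this sketch and is not necessarily the tree drawn on the slide.

```python
def displays(tree_splits, quartet):
    """True if some split separates {x, y} from {w, z}."""
    (x, y), (w, z) = quartet
    return any(
        ({x, y} <= a and {w, z} <= b) or ({x, y} <= b and {w, z} <= a)
        for a, b in tree_splits
    )

# The example set Q = {12|34, 13|45, 14|26}.
Q = [((1, 2), (3, 4)), ((1, 3), (4, 5)), ((1, 4), (2, 6))]

# Hypothetical witness: the caterpillar tree with cherries {2, 6} and {4, 5},
# described by its non-trivial splits.
WITNESS = [
    ({2, 6}, {1, 3, 4, 5}),
    ({1, 2, 6}, {3, 4, 5}),
    ({1, 2, 3, 6}, {4, 5}),
]

compatible = all(displays(WITNESS, q) for q in Q)
```

Checking a given tree is easy; the hard part in general is finding such a tree, which this sketch does not attempt.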
37
Phylogenetic networks



Consensus setting: consensus networks
Minimizing hybrid/reticulate vertices
Supernetworks – Z closure, filtering
39
[Figure: three diagrams on taxa a, b, c, d]
Networks can represent:
• Reticulate evolution (e.g. hybrid species)
• Phylogenetic uncertainty (i.e. possible alternative trees)
Z-closure
Given $T_1, \ldots, T_k$ on overlapping sets of species, let $\Sigma = \Sigma(T_1) \cup \cdots \cup \Sigma(T_k)$, construct $\mathrm{spcl}_2(\Sigma)$, and construct the ‘splits graph’ of the resulting splits that are ‘full’.
40
Split closure operation (Meacham 1986)
$A_1 \mid B_1,\ A_2 \mid B_2 \;\longmapsto\; A_1 \mid (B_1 \cup B_2),\ (A_1 \cup A_2) \mid B_2$
[Figure: diagram of the sets A1, A2, B1, B2]
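A single application of the split closure rule can be sketched as below. The applicability condition used here is the one usually given for the Z-rule in the supernetwork literature (A1∩A2, A2∩B1 and B1∩B2 all non-empty, A1∩B2 empty); it cannot be read off the slide itself, so treat it as an assumption. The function name and the example splits are hypothetical.

```python
def z_rule(split1, split2):
    """Apply the rule to partial splits (A1, B1) and (A2, B2).

    Returns the two extended splits, or None if the rule does not apply
    in this orientation.
    """
    (a1, b1), (a2, b2) = split1, split2
    if (a1 & a2) and (a2 & b1) and (b1 & b2) and not (a1 & b2):
        return (a1, b1 | b2), (a1 | a2, b2)
    return None

# Hypothetical partial splits on overlapping taxon sets.
s1 = (frozenset("ab"), frozenset("cd"))
s2 = (frozenset("bc"), frozenset("de"))
extended = z_rule(s1, s2)
```

Here taxon e, seen only in the second split, is added to the B side of the first, and taxon a, seen only in the first, is added to the A side of the second. A full closure would repeat this over all pairs and orientations until nothing changes, which the sketch leaves out.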
41
42
43
Reconstructing ancestral sequences

Methods (MP, Likelihood, Bayesian)
Quiz. MP for a balanced tree = majority state?

Information-theoretic considerations
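The quiz can be explored with Fitch's bottom-up pass for maximum parsimony, whose root set contains exactly the states that are optimal at the root. The sketch below builds one hypothetical labelling of a balanced rooted binary tree with 32 leaves and two states; the recursive constructors are an assumption of this sketch, chosen so that state 1 is in the majority while parsimony prefers state 0 at the root.

```python
def fitch(node):
    """Return (root state set, minimum number of changes) for a rooted
    binary tree given as nested pairs with leaf states 0 or 1."""
    if not isinstance(node, tuple):
        return {node}, 0
    (left_set, left_cost), (right_set, right_cost) = fitch(node[0]), fitch(node[1])
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def all_ones(h):
    """Balanced tree of height h with every leaf in state 1."""
    return 1 if h == 0 else (all_ones(h - 1), all_ones(h - 1))

def forced_zero(h):
    """Balanced tree of height h >= 1 whose Fitch root set is {0}."""
    return (0, 0) if h == 1 else (forced_zero(h - 1), ambiguous(h - 1))

def ambiguous(h):
    """Balanced tree of height h >= 1 whose Fitch root set is {0, 1}."""
    return (0, 1) if h == 1 else (forced_zero(h - 1), all_ones(h - 1))

def leaf_states(node):
    if not isinstance(node, tuple):
        return [node]
    return leaf_states(node[0]) + leaf_states(node[1])

tree = forced_zero(5)
states = leaf_states(tree)
root_set, changes = fitch(tree)
ones = sum(states)
```

In this construction 19 of the 32 leaves are in state 1, yet the root set is {0}, which suggests the answer to the quiz is no, at least for balanced trees of this height.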
44
Statistics of parsimony (clustering on a tree)
45