Coalescent Consequences for Consensus Cladograms
J. H. Degnan1, M. Degiorgio2, D. Bryant3, and N. A. Rosenberg1,2
1 Dept. of Human Genetics, U. of Michigan
2 Bioinformatics Program, U. of Michigan
3 Dept. of Mathematics, U. of Auckland
16 September 2007
Outline
Species trees vs. gene trees
Consensus tree background
Asymptotic consensus trees
Finite sample consensus trees
Consistency results
Conclusions
Gene trees vary across the genome
Why? Incomplete lineage sorting, horizontal gene transfer, sampling, etc.
Gene tree discordance
• From one true species tree, we expect different gene trees at different loci as a result of lineage sorting, independently of problems due to estimation or sampling error.
• Gene tree discordance depends especially on the branch lengths of the species tree, measured in coalescent units: the number of generations t scaled by the effective population size N, i.e., t/(2N).
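For the simplest case of three species with species tree ((A,B),C), a standard coalescent result gives the probability that a gene tree matches the species tree as 1 - (2/3)e^{-T}, where T is the internal branch length in coalescent units; each of the two discordant gene trees has probability (1/3)e^{-T}. A minimal Python sketch, assuming one sampled lineage per species; the function name is hypothetical.

```python
import math

def three_taxon_gene_tree_probs(T: float) -> dict[str, float]:
    """Gene tree probabilities for species tree ((A,B),C).

    T is the internal branch length in coalescent units (t / 2N).
    Assumes one sampled lineage per species.
    """
    if T < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages do not coalesce on the internal branch.
    no_coalescence = math.exp(-T)
    # If they fail to coalesce, the three lineages join in random order.
    discordant = no_coalescence / 3.0  # each of (AC)B and (BC)A
    return {
        "(AB)C": 1.0 - 2.0 * discordant,  # = 1 - (2/3) e^{-T}
        "(AC)B": discordant,
        "(BC)A": discordant,
    }

for T in (0.01, 0.1, 0.5, 1.0, 2.0):
    probs = three_taxon_gene_tree_probs(T)
    print(f"T={T:<4}  P(match)={probs['(AB)C']:.3f}  P(each mismatch)={probs['(AC)B']:.3f}")
```

At T = 0 all three gene trees are equally likely (1/3 each); as T grows, the matching gene tree dominates, which is why short branches produce the most discordance.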
[Figure: the 15 possible rooted gene tree topologies for species A, B, C and D, with their probabilities plotted for two settings of the species-tree branch lengths: x = 2, y = 1.2 and x = y = 0.1.]
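Four species have 15 possible rooted binary gene tree topologies; in general, n species have (2n-3)!! = 1·3·5···(2n-3) of them. A minimal Python sketch that enumerates them by attaching each new taxon to every edge of the smaller trees (including a new edge above the root); trees are nested tuples, and the function names are hypothetical.

```python
def attach(tree, leaf):
    """All trees obtained by inserting `leaf` on one edge of `tree`
    (a taxon name or a nested pair), including the edge above the root."""
    results = [(tree, leaf)]
    if isinstance(tree, tuple):
        left, right = tree
        results += [(new_left, right) for new_left in attach(left, leaf)]
        results += [(left, new_right) for new_right in attach(right, leaf)]
    return results

def rooted_topologies(taxa):
    """Enumerate rooted binary topologies on the given taxa as nested tuples."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for t in trees for new in attach(t, leaf)]
    return trees

def count_rooted_topologies(n):
    """(2n - 3)!! rooted binary topologies for n >= 2 taxa."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

print(rooted_topologies("ABC"))
# [(('A', 'B'), 'C'), (('A', 'C'), 'B'), ('A', ('B', 'C'))]
print(len(rooted_topologies("ABCD")), count_rooted_topologies(4))
```

Each new taxon can go on any of the 2n-1 edges of a rooted tree with n leaves, which is where the odd-number product comes from.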
Consensus (majority-rule)
Asymptotic consensus trees
• Consensus trees are usually statistics: functions of the data, like the sample mean x̄.
• We consider replacing observed (estimated) gene trees with their theoretical probabilities under the coalescent and determining the resulting consensus tree.
• Motivation: with a large number of independent loci, observed clade proportions should approximate their theoretical probabilities.
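To see the motivation numerically, a minimal Python sketch draws independent three-taxon gene trees with the standard coalescent probabilities (species tree ((A,B),C), internal branch T in coalescent units, one lineage per species) and reports the observed proportions. The function name and seed are arbitrary choices. For T = 0.5 the theoretical values are about 0.596 for (AB)C and 0.202 for each discordant tree.

```python
import math
import random
from collections import Counter

def simulate_gene_tree_proportions(T: float, n_loci: int, seed: int = 1) -> dict[str, float]:
    """Sample n_loci independent three-taxon gene trees; return observed proportions."""
    discordant = math.exp(-T) / 3.0
    topologies = ["(AB)C", "(AC)B", "(BC)A"]
    weights = [1.0 - 2.0 * discordant, discordant, discordant]
    rng = random.Random(seed)  # seeded for reproducibility
    draws = rng.choices(topologies, weights=weights, k=n_loci)
    counts = Counter(draws)
    return {topo: counts[topo] / n_loci for topo in topologies}

for n in (10, 100, 10_000):
    print(n, simulate_gene_tree_proportions(T=0.5, n_loci=n))
```

With 10 loci the proportions can be far from the theoretical values; with thousands of loci they settle close to them, which is the regime the asymptotic consensus trees describe.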
Types of consensus trees

Strict: only clades present in every observed tree are in the consensus tree. In the coalescent model every clade has probability > 0, so with enough loci conflicting clades appear and the strict consensus tree becomes unresolved.

Democratic vote: use the gene tree that occurs most frequently.

Majority rule: the consensus tree has all clades observed in > 50% of trees.

Greedy: sort clades by their proportions. Accept clades one at a time, most frequent first, provided each is compatible with the clades already accepted. Continue until the tree is fully resolved.

R*: for each set of 3 taxa, find the most commonly occurring rooted triple, e.g., (AB)C, (AC)B or (BC)A. Build the consensus tree from these most common triples.
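The rules above operate on clade or triple frequencies. A minimal Python sketch of the majority-rule and greedy rules, plus the triple-voting step of R*, assuming each gene tree is encoded as a frozenset of its non-trivial clades (each a frozenset of taxon names) mapped to a frequency, with frequencies summing to 1. The frequencies in the example are made up for illustration, all names are hypothetical, and assembling the R* tree from the winning triples is omitted.

```python
from collections import defaultdict
from itertools import combinations

def clade_frequencies(tree_freqs):
    """Total frequency of each clade, summed over the gene trees containing it."""
    freqs = defaultdict(float)
    for clades, f in tree_freqs.items():
        for clade in clades:
            freqs[clade] += f
    return freqs

def majority_rule(tree_freqs):
    """Clades present in more than 50% of gene trees."""
    return {c for c, f in clade_frequencies(tree_freqs).items() if f > 0.5}

def compatible(c1, c2):
    """Two clades can share a tree if they are nested or disjoint."""
    return c1 <= c2 or c2 <= c1 or not (c1 & c2)

def greedy(tree_freqs, n_taxa):
    """Accept clades from most to least frequent if compatible with those
    already accepted; stop at n_taxa - 2 clades (a fully resolved rooted tree).
    Ties are broken alphabetically, an arbitrary choice."""
    ranked = sorted(clade_frequencies(tree_freqs).items(),
                    key=lambda item: (-item[1], sorted(item[0])))
    accepted = set()
    for clade, _ in ranked:
        if len(accepted) == n_taxa - 2:
            break
        if all(compatible(clade, other) for other in accepted):
            accepted.add(clade)
    return accepted

def rooted_triple(clades, x, y, z):
    """Pair grouped apart from the third taxon in this tree, or None."""
    for a, b, c in ((x, y, z), (x, z, y), (y, z, x)):
        if any(a in cl and b in cl and c not in cl for cl in clades):
            return frozenset((a, b))
    return None

def r_star_triples(tree_freqs, taxa):
    """Most frequent rooted triple for every set of three taxa (ties not handled)."""
    winners = {}
    for x, y, z in combinations(sorted(taxa), 3):
        votes = defaultdict(float)
        for clades, f in tree_freqs.items():
            pair = rooted_triple(clades, x, y, z)
            if pair is not None:
                votes[pair] += f
        winners[(x, y, z)] = max(votes, key=votes.get) if votes else None
    return winners

def tree(*clades):
    return frozenset(frozenset(c) for c in clades)

# Hypothetical gene tree frequencies for taxa A-D (illustration only).
freqs = {
    tree("AB", "ABC"): 0.30,  # (((A,B),C),D)
    tree("AB", "CD"): 0.15,   # ((A,B),(C,D))
    tree("AC", "ABC"): 0.25,  # (((A,C),B),D)
    tree("BC", "ABC"): 0.30,  # (((B,C),A),D)
}
print(majority_rule(freqs))           # only {A,B,C}; {A,B} has 0.45
print(greedy(freqs, 4))               # {A,B,C}, then {A,B}
print(r_star_triples(freqs, "ABCD"))  # every winner agrees with (((A,B),C),D)
```

With these made-up frequencies, majority rule leaves A, B and C unresolved inside {A,B,C}, while the greedy clades and the R* triples both point to (((A,B),C),D).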
Unresolved zone for majority-rule and too-greedy zone
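As a three-taxon illustration (the zones named on this slide concern larger trees), take species tree ((A,B),C) with internal branch length T in coalescent units and the standard coalescent probability 1 - (2/3)e^{-T} for the matching gene tree. The asymptotic majority-rule consensus tree contains clade {A,B} only if that probability exceeds 1/2:

```latex
1 - \tfrac{2}{3}e^{-T} > \tfrac{1}{2}
\iff e^{-T} < \tfrac{3}{4}
\iff T > \ln\tfrac{4}{3} \approx 0.288
```

So for T ≤ ln(4/3), the three-taxon majority-rule asymptotic consensus tree is unresolved even though the species tree is fully resolved.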
What about finite samples?
• If you sample 10 loci, you could have:
  - all 10 matching the species tree;
  - 9 matching the species tree and 1 disagreeing;
  - 8 matching and 2 disagreeing; and so on.
• Treating gene trees as categories, the probability of your sample follows a multinomial distribution.
• By enumerating all possible multinomial samples, you can compute the probability of every possible consensus tree.
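For three taxa the enumeration is small enough to do exhaustively. A minimal Python sketch, assuming one lineage per species, species tree ((A,B),C), and gene tree probabilities 1 - (2/3)e^{-T} for (AB)C and (1/3)e^{-T} for each discordant tree; it returns the probability of each majority-rule consensus outcome for k loci. The function names are hypothetical, and the analysis in these slides covers more taxa and other consensus rules.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of the category counts in sum(counts) independent draws."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # exact integer multinomial coefficient
    return coef * math.prod(p ** c for p, c in zip(probs, counts))

def majority_rule_outcomes(probs, k):
    """Enumerate every sample of k three-taxon gene trees (probs ordered as
    (AB)C, (AC)B, (BC)A) and return the probability of each majority-rule
    consensus: a topology seen at more than k/2 loci, else 'unresolved'."""
    labels = ["(AB)C", "(AC)B", "(BC)A"]
    outcome = dict.fromkeys(labels + ["unresolved"], 0.0)
    for n1 in range(k + 1):
        for n2 in range(k + 1 - n1):
            counts = (n1, n2, k - n1 - n2)
            p = multinomial_pmf(counts, probs)
            winners = [lab for lab, c in zip(labels, counts) if c > k / 2]
            outcome[winners[0] if winners else "unresolved"] += p
    return outcome

# Gene tree probabilities for species tree ((A,B),C) with a short branch T.
T = 0.1
d = math.exp(-T) / 3
probs = (1 - 2 * d, d, d)
for k in (1, 5, 10, 50):
    res = majority_rule_outcomes(probs, k)
    print(k, round(res["(AB)C"], 3), round(res["unresolved"], 3))
```

With three taxa each gene tree has a single non-trivial clade, so a topology seen at more than half the loci is exactly a clade exceeding the 50% threshold.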
Are consensus trees inconsistent estimators of species trees?
• Theorem 1. Majority-rule asymptotic consensus trees (MACTs) do not contain any clades that are not on the species tree.
• Theorem 2. Greedy asymptotic consensus trees (GACTs) can be misleading estimators of species trees (for some branch lengths) for the 4-taxon asymmetric species tree and for any species tree with n > 4 species.
• Theorem 3. R* asymptotic consensus trees (RACTs) always match the species tree.
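One ingredient behind Theorem 3 can be seen for any three species. If A and B are the closest pair on the species tree and T > 0 is the species-tree branch length between their common ancestor and the common ancestor of all three, the standard coalescent gene tree probabilities give

```latex
P\big((AB)C\big) - P\big((AC)B\big)
  = \Big(1 - \tfrac{2}{3}e^{-T}\Big) - \tfrac{1}{3}e^{-T}
  = 1 - e^{-T} > 0
```

and likewise for (BC)A. So the species-tree triple is the most probable rooted triple for every set of three taxa, and R* is built from the most frequent triples. This sketches the intuition rather than the full proof.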
Conclusions
• Coalescent gene tree probabilities are useful for understanding the asymptotic behavior of consensus trees constructed from independent gene trees.
• R* consensus trees are consistent and more resolved than majority-rule consensus trees.
• Greedy consensus trees can be misleading, but when branch lengths lie outside the too-greedy zone they approach the species tree with fewer loci than majority-rule or R* consensus trees.