Coalescent Tree

Trees & Topologies
Chapter 3, Part 1
Terminology
• Equivalence Classes – specific separation of a
set of genes into disjoint sets covering the
whole set of genes
• Jump Process – describes which pair of genes
coalesce at each coalescence event
• Waiting Time Process – the waiting time to
the next coalescent event when there are k
genes left
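The three terms can be seen working together in a small simulation. This is only a sketch: it assumes the standard coalescent, where the waiting time with k genes left is exponential with rate k(k-1)/2 in coalescent units and the coalescing pair is chosen uniformly at random. The function name `simulate_coalescent` is made up for illustration.

```python
import random

def simulate_coalescent(n, seed=None):
    """Simulate one coalescent history for n genes (times in coalescent units)."""
    rng = random.Random(seed)
    # Equivalence classes: every gene starts in its own class.
    classes = [frozenset([gene]) for gene in range(1, n + 1)]
    time = 0.0
    events = []
    while len(classes) > 1:
        k = len(classes)
        # Waiting time process: exponential with rate k(k-1)/2 (assumed, standard coalescent).
        time += rng.expovariate(k * (k - 1) / 2)
        # Jump process: a uniformly chosen pair of classes coalesces.
        i, j = rng.sample(range(k), 2)
        merged = classes[i] | classes[j]
        classes = [c for idx, c in enumerate(classes) if idx not in (i, j)]
        classes.append(merged)
        events.append((time, merged))
    return events

for time, merged in simulate_coalescent(5, seed=1):
    print(f"t = {time:.3f}: merged class {sorted(merged)}")
```

Each printed line is one coalescence event. The last event always produces the class containing all n genes, which is the MRCA of the sample.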
2/19/2009
COMP 790-Trees & Topologies
2
Coalescent Tree
Coalescent vs. Phylogenetic Trees
• Phylogenetic tree: branch length = # of mutations
• Coalescent tree: branch length = time to coalescence
(coalescent time × 2N generations × generation time)
• Expected number of mutations = θ/2 × coalescent time
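The two conversions can be written out as a short sketch. The values of N, the generation time and θ below are hypothetical, chosen only to make the arithmetic visible, and θ is taken to be the scaled mutation rate.

```python
def branch_length_in_years(coalescent_time, N, generation_time_years):
    # coalescent time x 2N generations x generation time
    return coalescent_time * 2 * N * generation_time_years

def expected_mutations(coalescent_time, theta):
    # theta/2 x coalescent time
    return theta / 2 * coalescent_time

# Hypothetical values, for illustration only.
N = 10_000                  # population size (assumed)
generation_time_years = 20  # years per generation (assumed)
theta = 5.0                 # scaled mutation rate (assumed)

t = 1.0  # branch length in coalescent units
print(branch_length_in_years(t, N, generation_time_years))  # 400000.0
print(expected_mutations(t, theta))                         # 2.5
```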
Four representations of a coalescent tree
Rooted Phylogenetic Tree
Counting Trees & Topologies
(Ck) # of coalescent topologies with k leaves
(Bk) # of binary unrooted tree topologies with k leaves
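The formulas on this slide did not survive extraction. A sketch using the standard counts, Ck = k!(k-1)!/2^(k-1) and Bk = (2k-5)!! (the product of the odd numbers up to 2k-5), reproduces the values tabulated in these slides. Both are computed here by recursion: a coalescent topology chooses one of k(k-1)/2 pairs at each event, and a new leaf can be attached to any of the 2k-5 edges of an unrooted tree with k-1 leaves. The function names are made up for illustration.

```python
from math import comb, factorial

def coalescent_topologies(k):
    """Ck: C_k = (k choose 2) * C_{k-1}, with C_1 = 1."""
    count = 1
    for j in range(2, k + 1):
        count *= comb(j, 2)
    return count

def unrooted_topologies(k):
    """Bk: B_k = (2k - 5) * B_{k-1}, with B_2 = 1."""
    count = 1
    for j in range(3, k + 1):
        count *= 2 * j - 5
    return count

for k in (2, 3, 4, 5, 6, 7, 8, 10):
    # Closed form for Ck as a cross-check on the recursion.
    assert coalescent_topologies(k) == factorial(k) * factorial(k - 1) // 2 ** (k - 1)
    print(k, unrooted_topologies(k), coalescent_topologies(k))
```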
Recursion Illustrated
Basic recursion for the number of unrooted tree topologies as a function of leaves
Recurrence Intuition
k     Bk            Ck
2     1             1
3     1             3
4     3             18
5     15            180
6     105           2700
7     945           56700
8     10395         1587600
10    2027025       2571912000
15    7.9x10^12     7.0x10^18
20    2.2x10^20     5.6x10^29
[Figure: Bk and Ck plotted against k (k = 1 to 14) on a logarithmic axis from 1 to 1E+18]
Gene Trees
• Graph that shows the ancestral relationship
between genes.
• Assume infinite sites model to build gene
trees. (Ch. 5 discusses what happens without this assumption)
• Not a coalescent tree.
• Clusters genes according to their type and
mutation pattern.
Example Gene Tree
Data set with five sequences and four segregating sites, shown with their relative positions.
The tree is built up starting with the first site, then adding further sites one at a time.
Building Gene Trees
1. Determine whether the data pass the 4-gamete test. If not, there cannot
be a gene tree.
2. Treating each column as a binary number, sort the columns in
decreasing order, with the largest binary number in column one.
3. Add the sequences one at a time, each with all its characters. The
characters of a sequence to be added are the entries of its row, read
right to left. The sequence is placed by tracing from the leaves towards
the root. It has its own edges until the prefix is encountered where it
coincides with the last added character.
4. The root is labeled with an open circle. It can be removed to
form an unrooted tree.
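Step 1 can be sketched in a few lines. The data pass the 4-gamete test when no pair of sites shows all four combinations 00, 01, 10 and 11 across the sequences. The function name and the two small matrices below are made up for illustration. Rows are sequences and columns are sites.

```python
from itertools import combinations

def passes_four_gamete_test(rows):
    """Return True if no pair of columns contains all four gametes."""
    n_sites = len(rows[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(row[i], row[j]) for row in rows}
        if len(gametes) == 4:
            return False
    return True

# Illustrative data only (not from the slides).
compatible = [(0, 0), (0, 1), (1, 0)]
incompatible = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(passes_four_gamete_test(compatible))    # True
print(passes_four_gamete_test(incompatible))  # False
```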
Example
Given the following table, build a gene tree.
1. Determine whether the data pass the 4-gamete test. If not, there cannot be a gene tree.
2. Treating each column as a binary number, sort the columns in decreasing order, with
the largest binary number in column one.
3. Add the sequences one at a time, each with all its characters. The characters of a
sequence to be added are the entries of its row, read right to left. The sequence
is placed by tracing from the leaves towards the root. It has its own edges until
the prefix is encountered where it coincides with the last added character.
4. The root is labeled with an open circle. It can be removed to form an unrooted tree.

      A   B   C   D
1.    0   0   1   0
2.    0   0   0   1
3.    1   0   0   0
4.    0   0   0   1
5.    1   1   0   0
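The example can be worked through with a short sketch. It uses the five sequences and four sites A to D of this example and makes two assumptions that the slides do not state: each column is read top to bottom as a binary number, with sequence 1 as the most significant bit, and 0 is the ancestral state. Instead of tracing from the leaves towards the root as in step 3, the sketch inserts each sequence's mutations from the root outwards in sorted column order. That is a different, simpler bookkeeping, and it gives the same tree when the data pass the 4-gamete test. All names are made up for illustration.

```python
def build_gene_tree(rows, site_names):
    """Return (sorted site order, nested dict tree) for 0/1 data."""
    def column_value(j):
        # Column j as a binary number (assumption: first row = most significant bit).
        return int("".join(str(row[j]) for row in rows), 2)

    order = sorted(range(len(site_names)), key=column_value, reverse=True)
    tree = {"children": {}, "sequences": []}
    for seq_number, row in enumerate(rows, start=1):
        node = tree
        for j in order:
            if row[j] == 1:
                # Each mutation labels one edge; reuse the edge if it already exists.
                node = node["children"].setdefault(
                    site_names[j], {"children": {}, "sequences": []}
                )
        node["sequences"].append(seq_number)
    return [site_names[j] for j in order], tree

rows = [
    (0, 0, 1, 0),  # sequence 1
    (0, 0, 0, 1),  # sequence 2
    (1, 0, 0, 0),  # sequence 3
    (0, 0, 0, 1),  # sequence 4
    (1, 1, 0, 0),  # sequence 5
]
order, tree = build_gene_tree(rows, ["A", "B", "C", "D"])
print(order)  # ['C', 'D', 'A', 'B']

def show(node, depth=0):
    for mutation, child in node["children"].items():
        print("  " * depth + f"mutation {mutation} -> sequences {child['sequences']}")
        show(child, depth + 1)

show(tree)
```

In the resulting tree, mutation C leads to sequence 1, mutation D to sequences 2 and 4, mutation A to sequence 3, and mutation B sits below A and leads to sequence 5.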
Nested Subsamples
• Assume a sample A of size n is taken, and within that
sample a subsample B of size m is taken, m ≤ n.
• The process describing the number of ancestors (of B, of A) starts
out in (m,n) and jumps to either (m,n-1) or (m-1,n-1)
More nested subsamples
• Probability that the MRCA of B is also the
MRCA of A
• Special case: A is the whole population (n → ∞,
or n = 2N, and 2N is large)
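The formulas for these two bullets did not survive extraction. As a sketch, the probability can be computed directly from the jump process on (m,n). It assumes that each coalescence picks a pair uniformly among the n current ancestors of A, so both lineages are ancestors of B with probability m(m-1)/(n(n-1)). The result agrees with the standard closed form (m-1)(n+1)/((m+1)(n-1)), which tends to (m-1)/(m+1) as n grows. The function names are made up for illustration.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def same_mrca_probability(m, n):
    """P(MRCA of B = MRCA of A) starting from state (m, n), 1 <= m <= n."""
    if m == n:
        return Fraction(1)   # every ancestor of A is an ancestor of B
    if m == 1:
        return Fraction(0)   # B found its MRCA while A still has several ancestors
    both_in_b = Fraction(m * (m - 1), n * (n - 1))  # jump to (m-1, n-1)
    return (both_in_b * same_mrca_probability(m - 1, n - 1)
            + (1 - both_in_b) * same_mrca_probability(m, n - 1))

def closed_form(m, n):
    # Standard result, valid for 2 <= m <= n.
    return Fraction((m - 1) * (n + 1), (m + 1) * (n - 1))

for m in (2, 3, 5, 9):
    assert same_mrca_probability(m, 30) == closed_form(m, 30)
    print(m, same_mrca_probability(m, 30), "limit:", Fraction(m - 1, m + 1))
```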
More nested subsamples
m     P(A = B)
1     0/2 (no info)
2     1/3
3     1/2
5     2/3 = 0.67
9     4/5 = 0.80
19    9/10 = 0.90
29    14/15 = 0.9333
Remember: the expected time until the whole population
has found a MRCA is 2 (in coalescent units),
and the expected time until a sample of size two has
found a MRCA is 1.
[Figure: Probability (A = B) plotted against m; vertical axis from 0 to 1]
Hanging Subtrees
Unbalanced Trees
• Probability that the basal split into two
lineages at the root of the tree results in the
labeled, unordered partition (i, n-i), i =
1,2,…,n/2
• In large samples, unbalanced trees are unlikely.
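The slide's formula did not survive extraction. The sketch below computes only the distribution of the split sizes (i, n-i), ignoring which particular genes fall on each side. It relies on the standard forward-in-time view of the coalescent topology: starting from the two lineages below the root, each new split happens in a uniformly chosen existing lineage. The function name is made up for illustration.

```python
from fractions import Fraction

def root_split_size_distribution(n):
    """Return {i: P(root split has sizes (i, n-i))} for i = 1..n//2."""
    # dist[a] = P(one of the two root lineages has a descendants so far)
    dist = {1: Fraction(1)}
    for total in range(2, n):          # grow from `total` to `total + 1` lineages
        new = {}
        for a, p in dist.items():
            # The next split falls in this root lineage's subtree with probability a/total.
            new[a + 1] = new.get(a + 1, 0) + p * Fraction(a, total)
            new[a] = new.get(a, 0) + p * Fraction(total - a, total)
        dist = new
    unordered = {}
    for a, p in dist.items():
        i = min(a, n - a)
        unordered[i] = unordered.get(i, Fraction(0)) + p
    return unordered

for n in (4, 10, 100):
    print(n, root_split_size_distribution(n)[1])  # P(split is (1, n-1))
```

Every split size comes out with probability 2/(n-1), except i = n/2 which gets 1/(n-1), so the chance of the most unbalanced split (1, n-1) shrinks as the sample grows.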
Neanderthal Example
• Nordborg (1998) studied the tree of a combined sample of 986
human mitochondrial sequences and 1 Neanderthal
sequence.
• Assuming random mating, the probability that the Neanderthal
sequence falls outside all the human sequences: 2/(986 × 985) ≈ 2 × 10^-6
• Nordborg pointed out that a large part of the human sample
had found a common ancestor during the time the sequenced
Neanderthal lived (30,000-100,000 years ago)
• For example, if there were 5 ancestors to the present human
sample 30,000 years ago, the probability is 2/(5 × 4) = 10%.
• So the tree does not provide strong evidence against interbreeding
between Neanderthals and humans.
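Both figures quoted on this slide use the same expression, 2/(k(k-1)). The sketch below only reproduces that arithmetic for the two values of k that appear above. The function name is made up for illustration.

```python
from fractions import Fraction

def slide_probability(k):
    # The slide's expression 2 / (k (k - 1)).
    return Fraction(2, k * (k - 1))

print(float(slide_probability(986)))  # about 2.06e-06
print(slide_probability(5))           # 1/10, i.e. 10%
```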
Next Time
• More Trees & Topologies
– A single lineage
– Disjoint subsamples
– A sample partitioned by a mutation
– The probability of going from n ancestors to k
ancestors.