C - UPCH

Cladogram Construction and Phylogenetic Reconstruction
DATA: Gene sequence alignment
How can we place this information in a historical context?
Pulsed-field gel electrophoresis pattern
Spoligotyping of clinical isolates of M. tuberculosis
Strains: 1 to 43
Dendrogram and RFLP patterns of clinical isolates of M. tuberculosis
The polymorphic bands are converted into arrays of 0s and 1s (0 = band absent, 1 = band present)
H37Rv 1100111111111111111111111111111111111111111
CDC1551 1111111111111111111111101111011110101111111
H37Ra 1100111111111111111111111111111111111111111
430 1111111111111111111111111111111110111111111
280 1111111111111111111111111111011110111111111
312 1111111111101001111110111111111110111111111
413 1110111111111111111111111100111110111111111
467 1110111111111111111111110111111111111111111
270 1110111111111011111111111111111110111111111
2604 1110111111111001111111111111111111111111101
300 1110111111111001111111111111111111111111101
2651 1110111111101111111111110111111110111111111
593 1110111111101011111111111111111110111111111
372 1110111111101011111111111111111110111111111
545 1110111111101011111111111111111110111111111
271 1110111111101011111111111111111110111111111
558 1110111111101011111111111111111110111111111
397 1110111111101011111111111111111110111111111
552 1110111111101001111111111111111110111111111
466 1110111110111111111111110111111111111111111
465 1110111110111111111111110111111111111111111
340 1110111110111111111111110111111111111111111
339 1110111110111111111111110111111111111111111
345 1110111110111111111111110111111111111111111
346 1110111110111111111111110111111111111111111
452 1100111111111101111111111111110110111111111
H37Pe 1100111111111011111111111111111111111111111
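One way to turn the 0/1 band patterns into input for a distance-based method is to count the positions at which two patterns differ. The sketch below assumes that simple count (a Hamming distance) as the measure; the slides do not say which distance was actually used, and the names band_distance and distance_matrix are hypothetical.

```python
def band_distance(a: str, b: str) -> int:
    """Count the band positions at which two 0/1 patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of bands")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(patterns: dict) -> dict:
    """Pairwise distances for a {strain: pattern} mapping, keyed by (strain, strain)."""
    names = list(patterns)
    return {(p, q): band_distance(patterns[p], patterns[q])
            for p in names for q in names}


# Two patterns copied from the list above; any subset of strains can be used.
patterns = {
    "H37Rv": "1100111111111111111111111111111111111111111",
    "H37Pe": "1100111111111011111111111111111111111111111",
}
# Usage: distance_matrix(patterns)[("H37Rv", "H37Pe")]
```

The resulting matrix is symmetric with zeros on the diagonal, so only its lower triangle is needed by a clustering method.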
Phylogeny inference
1. Distance-based methods
-Compute a pairwise distance matrix
-Adjust tree branch lengths to fit the distance matrix (e.g. least squares, neighbor joining)
2. Character-based methods
-Parsimony
-Maximum likelihood (model-based)
In 1866, Ernst Haeckel coined the word “phylogeny” and presented
phylogenetic trees for most known groups of living organisms.
The Tree of Life project
Surf the tree of life at:
http://tolweb.org/tree/phylogeny.html
What is a tree?
A tree is a mathematical structure which is used to model
the actual evolutionary history of a group of sequences or organisms,
i.e. an evolutionary hypothesis.
A tree consists of nodes connected by branches.
The ancestor of all the
sequences is the root of
the tree
Internal nodes represent
hypothetical ancestors
Terminal nodes represent sequences or
organisms for which we have data.
Each is typically called an
“Operational Taxonomic Unit”
or OTU.
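The vocabulary above (root, internal nodes, terminal nodes) maps directly onto a small recursive data structure. This is a minimal sketch; the class name Node, its methods, and the taxa A, B, and C are illustrative assumptions, not something taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a rooted tree; a node without children is a terminal node (OTU)."""
    name: str = ""
    children: List["Node"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.children

    def terminals(self) -> List[str]:
        """Names of the OTUs below this node, left to right."""
        if self.is_terminal():
            return [self.name]
        return [n for child in self.children for n in child.terminals()]

    def newick(self) -> str:
        """Parenthesized (Newick-style) description of the subtree."""
        if self.is_terminal():
            return self.name
        return "(" + ",".join(c.newick() for c in self.children) + ")" + self.name


# The root has two branches: one to a hypothetical ancestor of A and B, one to C.
root = Node(children=[Node(children=[Node("A"), Node("B")]), Node("C")])
```

Here root.terminals() lists the OTUs, and the unnamed inner nodes play the role of hypothetical ancestors.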
Types of Trees
Bifurcating
Multifurcating
Polytomy
Polytomies: Soft vs. Hard
• Soft: designate a lack of information about the
order of divergence.
• Hard: the hypothesis that multiple divergences
occurred simultaneously
Types of Trees
Trees: only one path between any pair of nodes
Networks: more than one path between any pair of nodes
Comments on Trees
• Trees give insights into the underlying data.
• Identical trees can appear different depending on the method of display.
• Information may be lost when creating the tree. The tree is not the underlying data.
[Figure: trees on the taxa A, B, and C drawn in several different layouts]
Given a multiple alignment, how do we construct the tree?
A  GCTTGTCCGTTACGAT
B  ACTTGTCTGTTACGAT
C  ACTTGTCCGAAACGAT
D  ACTTGACCGTTTCCTT
E  AGATGACCGTTTCGAT
F  ACTACACCCTTATGAG
?
Construction of a distance tree using clustering with the Unweighted
Pair Group Method with Arithmetic Mean (UPGMA)
First, construct a distance matrix:
A  GCTTGTCCGTTACGAT
B  ACTTGTCTGTTACGAT
C  ACTTGTCCGAAACGAT
D  ACTTGACCGTTTCCTT
E  AGATGACCGTTTCGAT
F  ACTACACCCTTATGAG

    A  B  C  D  E
B   2
C   4  4
D   6  6  6
E   6  6  6  4
F   8  8  8  8  8
From http://www.icp.ucl.ac.be/~opperd/private/upgma.html
UPGMA
First round

    A  B  C  D  E
B   2
C   4  4
D   6  6  6
E   6  6  6  4
F   8  8  8  8  8

Choose the most similar pair, cluster them together and calculate
the new distance matrix.
dist((A,B),C) = (dist(A,C) + dist(B,C)) / 2 = 4
dist((A,B),D) = (dist(A,D) + dist(B,D)) / 2 = 6
dist((A,B),E) = (dist(A,E) + dist(B,E)) / 2 = 6
dist((A,B),F) = (dist(A,F) + dist(B,F)) / 2 = 8

    A,B  C  D  E
C   4
D   6    6
E   6    6  4
F   8    8  8  8
UPGMA
Second round

    A,B  C  D  E
C   4
D   6    6
E   6    6  4
F   8    8  8  8

Third round

      A,B  C  D,E
C     4
D,E   6    6
F     8    8  8

UPGMA
Fourth round

      AB,C  D,E
D,E   6
F     8     8

Fifth round

    ABC,DE
F   8
Note that this method identifies the root of the tree.
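The rounds above can be sketched in a few lines of Python. This is an illustrative implementation, not the one used to prepare the slides: the function name upgma and the nested-tuple tree representation are assumptions, and when two clusters of different sizes are merged the new distances are averaged with weights equal to the cluster sizes, which reduces to the plain average shown in the first round when both clusters are single taxa.

```python
from itertools import combinations

labels = ["A", "B", "C", "D", "E", "F"]
# Lower triangle of the distance matrix from the slide (rows B to F).
lower = [[2], [4, 4], [6, 6, 6], [6, 6, 6, 4], [8, 8, 8, 8, 8]]
dist = {frozenset((labels[i + 1], labels[j])): v
        for i, row in enumerate(lower) for j, v in enumerate(row)}


def upgma(labels, dist):
    """Cluster taxa with UPGMA.

    Returns (tree, merges): tree is a nested tuple and merges lists
    (cluster_1, cluster_2, distance) in the order the clusters were joined.
    """
    sizes = {name: 1 for name in labels}  # active clusters and their leaf counts
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Choose the most similar pair of active clusters.
        x, y = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        dxy = d[frozenset((x, y))]
        nx, ny = sizes.pop(x), sizes.pop(y)
        new = (x, y)
        # Distance from the new cluster to every other cluster:
        # the size-weighted average of the two old distances.
        for z in sizes:
            d[frozenset((new, z))] = (
                nx * d[frozenset((x, z))] + ny * d[frozenset((y, z))]
            ) / (nx + ny)
        sizes[new] = nx + ny
        merges.append((x, y, dxy))
    (tree,) = sizes
    return tree, merges


tree, merges = upgma(labels, dist)
```

The distance matrix contains ties (for example, (A,B) to C and D to E are both 4 after the first round), so the order of the merges can differ from the slides while the resulting grouping is the same. The last merge joins F to everything else, which is where the root is placed.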
UPGMA assumes a molecular clock
• The UPGMA clustering method is very sensitive to unequal evolutionary rates (it assumes that the evolutionary rate is the same for all branches).
• Clustering works only if the data are ultrametric.
• Ultrametric distances are defined by the satisfaction of the 'three-point condition'.
The three-point condition:
For any three taxa (A, B, C), the two greatest distances are equal.
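The three-point condition is easy to test mechanically. The sketch below checks every triple of taxa in the distance matrix used in the UPGMA example; the function name is_ultrametric and the three-taxon counterexample with taxa x, y, and z are hypothetical and added only for illustration.

```python
from itertools import combinations

labels = ["A", "B", "C", "D", "E", "F"]
# Lower triangle of the matrix from the UPGMA example (rows B to F).
lower = [[2], [4, 4], [6, 6, 6], [6, 6, 6, 4], [8, 8, 8, 8, 8]]
dist = {frozenset((labels[i + 1], labels[j])): v
        for i, row in enumerate(lower) for j, v in enumerate(row)}


def is_ultrametric(labels, dist, tol=1e-9):
    """Return True if every triple of taxa satisfies the three-point condition."""
    for a, b, c in combinations(labels, 3):
        _, middle, largest = sorted((dist[frozenset((a, b))],
                                     dist[frozenset((a, c))],
                                     dist[frozenset((b, c))]))
        # The two greatest of the three pairwise distances must be equal.
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical triple in which the two greatest distances (5 and 7) differ.
not_clocklike = {frozenset(("x", "y")): 5,
                 frozenset(("x", "z")): 4,
                 frozenset(("y", "z")): 7}
```

Calling is_ultrametric(labels, dist) on the example matrix returns True, which is why UPGMA handles that data set without trouble.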
UPGMA fails when rates of evolution are not constant
A tree in which the evolutionary rates are not equal
From http://www.icp.ucl.ac.be/~opperd/private/upgma.html
    A  B   C  D  E
B   5
C   4  7
D   7  10  7
E   6  9   6  5
F   8  11  8  9  8
(Neighbor joining will get
the right tree in this case.)
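To make the contrast concrete, here is a minimal neighbor-joining sketch applied to the matrix above. The slides do not give the algorithm, so the formulas are the standard textbook ones (join the pair minimizing Q(i,j) = (n - 2) d(i,j) - r(i) - r(j), where r is the sum of a node's distances to all others); the function name and the tuple representation of internal nodes are assumptions.

```python
from itertools import combinations

labels = ["A", "B", "C", "D", "E", "F"]
# Lower triangle of the unequal-rates matrix from the slide (rows B to F).
lower = [[5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]
dist = {frozenset((labels[i + 1], labels[j])): v
        for i, row in enumerate(lower) for j, v in enumerate(row)}


def neighbor_joining(labels, dist):
    """Return (joins, last_edge) for an unrooted neighbor-joining tree.

    Each join is (node_1, branch_1, node_2, branch_2); the new internal node
    is the tuple (node_1, node_2). last_edge connects the final two nodes.
    """
    nodes = list(labels)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: the sum of distances from each node to all others.
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to the new internal node.
        bi = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        bj = dij - bi
        new = (i, j)
        nodes = [x for x in nodes if x != i and x != j]
        for k in nodes:
            d[frozenset((new, k))] = (
                d[frozenset((i, k))] + d[frozenset((j, k))] - dij
            ) / 2
        nodes.append(new)
        joins.append((i, bi, j, bj))
    return joins, (nodes[0], nodes[1], d[frozenset(nodes)])


joins, last_edge = neighbor_joining(labels, dist)
closest_pair = min(dist, key=dist.get)  # what a clustering method would join first
```

On this matrix the smallest raw distance is between A and C, so a clustering method such as UPGMA starts by grouping those two, whereas neighbor joining first joins A and B, with unequal branch lengths.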
Character state methods
MAXIMUM PARSIMONY
Logic:
Examine each column in the multiple alignment of the sequences.
Examine all possible trees and choose among them according to
some optimality criterion.
Method we’ll talk about:
• Maximum parsimony
Maximum Parsimony
Simpler hypotheses are preferable to more complicated ones, and ad
hoc hypotheses should be avoided whenever possible (Occam’s Razor).
Thus, find the tree that requires the smallest number of evolutionary
changes.
   0123456789012345
W  ACTTGACCCTTACGAT
X  AGCTGGCCCTGATTAC
Y  AGTTGACCATTACGAT
Z  AGCTGGTCCTGATGAC
[Figure: tree relating W, X, Y, and Z]
Maximum Parsimony
Start by classifying the sites:

            123456789012345678901
Mouse       CTTCGTTGGATCAGTTTGATA
Rat         CCTCGTTGGATCATTTTGATA
Dog         CTGCTTTGGATCAGTTTGAAC
Human       CCGCCTTGGATCAGTTTGAAC
            ---------------------
Invariant   *  * ******** *****
Variant      ** *        *     **
            ---------------------
Informative  **                **
Non-inform.     *        *
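The classification can be reproduced programmatically. The sketch below assumes the usual definition for parsimony, which the slides illustrate but do not spell out: a site is informative when at least two character states each occur in at least two sequences. The function name classify_sites is hypothetical.

```python
from collections import Counter

alignment = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def classify_sites(seqs):
    """Group 1-based site numbers as invariant, informative, or non-informative."""
    groups = {"invariant": [], "informative": [], "non-informative": []}
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        if len(counts) == 1:
            groups["invariant"].append(pos)
        # Informative: at least two states, each present in two or more sequences.
        elif sum(1 for c in counts.values() if c >= 2) >= 2:
            groups["informative"].append(pos)
        else:
            groups["non-informative"].append(pos)
    return groups


groups = classify_sites(alignment.values())
```

Variant sites are the union of the informative and non-informative groups; only the informative ones help to choose between trees.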
[Figure: the Mouse, Rat, Dog, and Human alignment with sites 2, 3, and 5 marked; for each of these sites, the three possible unrooted trees of the four taxa are drawn with the character states at the tips and internal nodes.]
Maximum Parsimony
[Figure: the Mouse, Rat, Dog, and Human alignment with its four informative sites (2, 3, 20, and 21) marked, followed by the three possible unrooted trees for the four taxa, labeled 3, 0, and 1 (the number of informative sites that support each tree).]
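The comparison of the three trees can also be done by counting the minimum number of changes each tree requires over the whole alignment. The sketch below uses Fitch's algorithm for that count; the slides do not name an algorithm, so this choice, the function names, and the nested-tuple trees (each unrooted tree rooted on its internal branch, which does not affect the count) are assumptions.

```python
alignment = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def fitch(tree, states):
    """Return (possible_states, changes) for one site.

    tree is a taxon name or a pair (left, right); states maps taxon -> state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # No shared state: one change is needed on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, seqs):
    """Total number of changes the tree requires over all sites."""
    taxa = list(seqs)
    return sum(fitch(tree, dict(zip(taxa, column)))[1]
               for column in zip(*seqs.values()))


trees = {
    "(Mouse,Rat),(Dog,Human)": (("Mouse", "Rat"), ("Dog", "Human")),
    "(Mouse,Dog),(Rat,Human)": (("Mouse", "Dog"), ("Rat", "Human")),
    "(Mouse,Human),(Rat,Dog)": (("Mouse", "Human"), ("Rat", "Dog")),
}
lengths = {name: tree_length(t, alignment) for name, t in trees.items()}
best = min(lengths, key=lengths.get)
```

The tree that groups Mouse with Rat needs the fewest changes, in line with it being supported by three of the four informative sites.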
IN VITRO EVOLUTION BY MEANS OF PCR