Construction of Cladograms and Phylogenetic Reconstruction

DATA: Alignment of gene sequences
How can we transform this information into a historical context?

Pulsed-field gel electrophoresis pattern

Spoligotyping of clinical isolates of M. tuberculosis
[Figure: spoligotyping patterns by strain, columns numbered 1 to 43]

Dendrogram and RFLP patterns of clinical isolates of M. tuberculosis

Polymorphic bands are converted into arrays of 0s and 1s (0 = band absent, 1 = band present):

Strain    Pattern
H37Rv     1100111111111111111111111111111111111111111
CDC1551   1111111111111111111111101111011110101111111
H37Ra     1100111111111111111111111111111111111111111
430       1111111111111111111111111111111110111111111
280       1111111111111111111111111111011110111111111
312       1111111111101001111110111111111110111111111
413       1110111111111111111111111100111110111111111
467       1110111111111111111111110111111111111111111
270       1110111111111011111111111111111110111111111
2604      1110111111111001111111111111111111111111101
300       1110111111111001111111111111111111111111101
2651      1110111111101111111111110111111110111111111
593       1110111111101011111111111111111110111111111
372       1110111111101011111111111111111110111111111
545       1110111111101011111111111111111110111111111
271       1110111111101011111111111111111110111111111
558       1110111111101011111111111111111110111111111
397       1110111111101011111111111111111110111111111
552       1110111111101001111111111111111110111111111
466       1110111110111111111111110111111111111111111
465       1110111110111111111111110111111111111111111
340       1110111110111111111111110111111111111111111
339       1110111110111111111111110111111111111111111
345       1110111110111111111111110111111111111111111
346       1110111110111111111111110111111111111111111
452       1100111111111101111111111111110110111111111
H37Pe     1100111111111011111111111111111111111111111

Phylogeny inference

1.
Distance-based methods
   - Pairwise distance matrix
   - Adjust tree branch lengths to fit the distance matrix (e.g. least squares, neighbor joining)
2. Character-based methods
   - Parsimony
   - Maximum likelihood or model-based evolution

In 1866, Ernst Haeckel coined the word “phylogeny” and presented phylogenetic trees for most known groups of living organisms.

The Tree of Life project
Surf the tree of life at: http://tolweb.org/tree/phylogeny.html

What is a tree?
A tree is a mathematical structure used to model the actual evolutionary history of a group of sequences or organisms, i.e. an evolutionary hypothesis.
• A tree consists of nodes connected by branches.
• The ancestor of all the sequences is the root of the tree.
• Internal nodes represent hypothetical ancestors.
• Terminal nodes represent sequences or organisms for which we have data. Each is typically called an “Operational Taxonomic Unit” or OTU.

Types of Trees
• Bifurcating
• Multifurcating (polytomy)

Polytomies: Soft vs. Hard
• Soft: designates a lack of information about the order of divergence.
• Hard: the hypothesis that multiple divergences occurred simultaneously.

Types of Trees
• Trees: only one path between any pair of nodes
• Networks: more than one path between any pair of nodes

Comments on Trees
• Trees give insights into the underlying data.
• Identical trees can appear different depending upon the method of display.
• Information may be lost when creating the tree. The tree is not the underlying data.
[Figure: trees on taxa A, B and C drawn with their leaves in different orders]

Given a multiple alignment, how do we construct the tree?

A   GCTTGTCCGTTACGAT
B   ACTTGTCTGTTACGAT
C   ACTTGTCCGAAACGAT
D   ACTTGACCGTTTCCTT
E   AGATGACCGTTTCGAT
F   ACTACACCCTTATGAG

Tree: ?
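For distance-based methods, one way to approach this question is agglomerative clustering of a pairwise distance matrix. The sketch below is a minimal, illustrative UPGMA in Python, not a reference implementation. It assumes the pairwise distances for taxa A to F that are tabulated in the UPGMA worked example (A-B = 2; C to A or B = 4; D-E = 4; D or E to A, B or C = 6; F to every other taxon = 8). The names `upgma`, `rows` and `leaf_dist` are hypothetical helpers introduced here. Node heights are reported as half the distance between the merged clusters, which is the usual UPGMA convention and an assumption with respect to these slides.

```python
from itertools import combinations


def upgma(leaf_dist):
    """Cluster leaves by UPGMA.

    leaf_dist maps a 2-tuple of leaf names to the distance between them.
    Returns a list of (leaves_in_new_cluster, node_height) in merge order.
    """
    d = {}         # distance between two clusters, keyed by frozenset of both
    clusters = []  # current clusters, each a frozenset of leaf names
    for (a, b), dist in leaf_dist.items():
        ca, cb = frozenset([a]), frozenset([b])
        d[frozenset([ca, cb])] = dist
        for c in (ca, cb):
            if c not in clusters:
                clusters.append(c)

    merges = []
    while len(clusters) > 1:
        # Pick the closest pair; ties go to the first pair encountered.
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset([x, y])] / 2
        new = x | y
        clusters = [c for c in clusters if c not in (x, y)]
        for c in clusters:
            # Average over all leaf pairs, i.e. weight by cluster size.
            d[frozenset([new, c])] = (
                len(x) * d[frozenset([x, c])] + len(y) * d[frozenset([y, c])]
            ) / len(new)
        clusters.append(new)
        merges.append((new, height))
    return merges


# Lower-triangular distance matrix (assumed values, see lead-in).
names = "ABCDEF"
rows = {
    "B": [2],
    "C": [4, 4],
    "D": [6, 6, 6],
    "E": [6, 6, 6, 4],
    "F": [8, 8, 8, 8, 8],
}
leaf_dist = {
    (names[j], row): value
    for row, values in rows.items()
    for j, value in enumerate(values)
}

for leaves, height in upgma(leaf_dist):
    print("".join(sorted(leaves)), height)
```

Two joins are tied at distance 4 (C with the A,B cluster, and D with E), so the order in which they are reported may differ from a hand calculation; the resulting clusters are the same. The size-weighted average reduces to the simple mean of two distances whenever the merged clusters are single taxa.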
Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA)

First, construct a distance matrix:

A   GCTTGTCCGTTACGAT
B   ACTTGTCTGTTACGAT
C   ACTTGTCCGAAACGAT
D   ACTTGACCGTTTCCTT
E   AGATGACCGTTTCGAT
F   ACTACACCCTTATGAG

      A    B    C    D    E
B     2
C     4    4
D     6    6    6
E     6    6    6    4
F     8    8    8    8    8

From http://www.icp.ucl.ac.be/~opperd/private/upgma.html

UPGMA

First round

      A    B    C    D    E
B     2
C     4    4
D     6    6    6
E     6    6    6    4
F     8    8    8    8    8

Choose the most similar pair, cluster them together and calculate the new distance matrix. Here A and B are the closest pair (distance 2), so they are clustered:

dist((A,B),C) = (dist(A,C) + dist(B,C)) / 2 = 4
dist((A,B),D) = (dist(A,D) + dist(B,D)) / 2 = 6
dist((A,B),E) = (dist(A,E) + dist(B,E)) / 2 = 6
dist((A,B),F) = (dist(A,F) + dist(B,F)) / 2 = 8

      A,B  C    D    E
C     4
D     6    6
E     6    6    4
F     8    8    8    8

Second round

      A,B  C    D    E
C     4
D     6    6
E     6    6    4
F     8    8    8    8

D and E are clustered (distance 4; the distance from A,B to C is tied at 4).

Third round

      A,B  C    D,E
C     4
D,E   6    6
F     8    8    8

C joins the A,B cluster (distance 4).

Fourth round

      AB,C D,E
D,E   6
F     8    8

The AB,C and D,E clusters are joined (distance 6).

Fifth round

      ABC,DE
F     8

F joins last (distance 8). Note that this method identifies the root of the tree.

UPGMA assumes a molecular clock
• The UPGMA clustering method is very sensitive to unequal evolutionary rates (it assumes that the evolutionary rate is the same for all branches).
• Clustering works only if the data are ultrametric.
• Ultrametric distances are defined by the satisfaction of the 'three-point condition'.

The three-point condition:
[Figure: three taxa A, B and C]
For any three taxa, the two greatest distances are equal.

UPGMA fails when rates of evolution are not constant

A tree in which the evolutionary rates are not equal
From http://www.icp.ucl.ac.be/~opperd/private/upgma.html

      A    B    C    D    E
B     5
C     4    7
D     7    10   7
E     6    9    6    5
F     8    11   8    9    8

(Neighbor joining will get the right tree in this case.)

Character state methods

MAXIMUM PARSIMONY

Logic:
• Examine each column in the multiple alignment of the sequences.
• Examine all possible trees and choose among them according to some optimality criterion.

Method we'll talk about
• Maximum parsimony

Maximum Parsimony
Simpler hypotheses are preferable to more complicated ones, and ad hoc hypotheses should be avoided whenever possible (Occam's Razor). Thus, find the tree that requires the smallest number of evolutionary changes.

    0123456789012345
W   ACTTGACCCTTACGAT
X   AGCTGGCCCTGATTAC
Y   AGTTGACCATTACGAT
Z   AGCTGGTCCTGATGAC

[Figure: tree relating W, X, Y and Z]

Maximum Parsimony
Start by classifying the sites:

             123456789012345678901
Mouse        CTTCGTTGGATCAGTTTGATA
Rat          CCTCGTTGGATCATTTTGATA
Dog          CTGCTTTGGATCAGTTTGAAC
Human        CCGCCTTGGATCAGTTTGAAC
             ---------------------
Invariant    *  * ******** *****
Variant       ** *        *     **
             ---------------------
Informative   **                **
Non-inform.      *        *

A site is informative when at least two character states each occur in at least two sequences; here these are sites 2, 3, 20 and 21. Sites 5 and 14 vary but are not informative.

             123456789012345678901
Mouse        CTTCGTTGGATCAGTTTGATA
Rat          CCTCGTTGGATCATTTTGATA
Dog          CTGCTTTGGATCAGTTTGAAC
Human        CCGCCTTGGATCAGTTTGAAC
              ** *

[Figure: the three possible unrooted trees for Mouse, Rat, Dog and Human, with the character states at sites 2, 3 and 5 mapped onto each tree]

Maximum Parsimony

             123456789012345678901
Mouse        CTTCGTTGGATCAGTTTGATA
Rat          CCTCGTTGGATCATTTTGATA
Dog          CTGCTTTGGATCAGTTTGAAC
Human        CCGCCTTGGATCAGTTTGAAC
Informative   **                **

Tree                             Informative sites supporting it
((Mouse, Rat), (Dog, Human))     3 (sites 3, 20 and 21)
((Mouse, Human), (Rat, Dog))     0
((Mouse, Dog), (Rat, Human))     1 (site 2)

IN VITRO EVOLUTION BY MEANS OF PCR