
Introduction To Phylogenetics Lecture

Introduction to Phylogenetic Trees
Alignment Recap
• What is the fundamental principle underpinning multiple sequence
alignment?
• Homology - we want all columns in our alignment to consist of
characters which share common ancestry
• Alignments are made up from conserved and variable blocks
• Alignment programs aim to maximise conserved blocks
• CLUSTAL - very widely used, uses Progressive Sequence Alignment
• MUSCLE - more sophisticated program, uses multiple processes to avoid
the problems of ‘once a gap, always a gap’
• Gap penalties - gaps are rare in nature, so we want to penalize them in our
alignments. Use different gap penalties for datasets of closely and distantly
related sequences
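To make the effect of a gap penalty concrete, here is a minimal sketch that scores two already-aligned sequences column by column. The scoring values (match +1, mismatch -1, one flat penalty per gap column) and the function name `alignment_score` are illustrative assumptions, not the defaults of CLUSTAL, MUSCLE or any other program.

```python
def alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned sequences of equal length, one column at a time.

    All scoring values are illustrative assumptions for this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap        # a gap in either sequence is penalised
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


# The same alignment scored under a mild and a harsh gap penalty.
print(alignment_score("ACG-TA", "ACGTTA", gap=-2))  # 3
print(alignment_score("ACG-TA", "ACGTTA", gap=-5))  # 0
```

The harsher the penalty, the less attractive a gapped alignment becomes compared with an ungapped alternative.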
Introduction to Phylogenetic Trees
Aims: To learn how to read and interpret phylogenetic trees
and get a general overview of phylogenetic analysis
Objectives: at the end of this lecture you should:
• be able to understand what a phylogenetic tree is
• understand the standard terminology used in phylogenetics
• be able to differentiate between monophyletic, paraphyletic &
polyphyletic groups
• be able to differentiate between orthologues & paralogues
• be able to write out phylogenetic trees in the Newick format
Terminology
phylogenetic tree = evolutionary tree = phylogeny = dendrogram
- a graphic display of predicted evolutionary relationships
- may be created for genes, proteins or species
- consists of “branches” and “nodes”
branches, also called “edges”
- internal (nodes to nodes) or terminal (nodes to terminals)
terminals = “leaves” = operational taxonomic units, “OTUs”
- OTUs = organisms (taxa), tree = “species tree”
- OTUs = genes or proteins, tree = “gene tree”
[Figure: example tree with nodes labelled a to i]
Terminology
node = point at which two branches diverge
- corresponds to a hypothetical last common ancestor
- branching represents a divergence event
- gene tree: divergence = gene duplication event
- species tree: divergence = speciation event
root = origin of the tree, or sub-tree
[Figure: example tree with nodes labelled a to i]
Terminology
Polytomy
• Unresolved phylogenetic relationships
• Three or more branches leading from one
node
• Polytomies may result from a lack of
phylogenetic data (soft polytomy)
• Soft polytomies may be resolved by
increasing the phylogenetic signal, i.e.
using more data
• Polytomies may arise if multiple speciation
events take place instantaneously (hard
polytomy) - see the D. simulans complex
• Hard polytomies cannot be resolved by
increasing the volume of data
Drosophila simulans complex
Rooting Phylogenetic Trees:
• root = oldest point in the tree
- if molecular clock -> root would be in the middle
(i.e. common ancestor equidistant from everything)
• without a clock (i.e., in the real world) need external point of reference
= outgroup = anything not in your ingroup (= group of interest)
- for gene trees can use distant relative/gene family
- for species tree use sister group = closest relative to ingroup
• root - doesn’t change distances, but shows chronological order of events
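The "root in the middle" idea can be checked numerically: under a strict molecular clock, every tip lies at the same distance from the root. The sketch below sums branch lengths from root to each tip. The tree layout (a `(content, branch_length)` pair, where content is a leaf name or a list of child nodes) and the function names are assumptions made for this illustration.

```python
def root_to_tip(node, dist=0.0, out=None):
    """Collect the root-to-tip distance of every leaf.

    A node is a (content, branch_length) pair; content is either a leaf
    name (str) or a list of child nodes. This layout is an assumption.
    """
    if out is None:
        out = {}
    content, length = node
    dist += length
    if isinstance(content, str):
        out[content] = dist
    else:
        for child in content:
            root_to_tip(child, dist, out)
    return out


def is_clocklike(node, tol=1e-9):
    """True if all tips are equidistant from the root (within tol)."""
    distances = root_to_tip(node).values()
    return max(distances) - min(distances) <= tol


clock_tree = ([([("A", 1.0), ("B", 1.0)], 1.0), ("C", 2.0)], 0.0)
real_tree = ([([("A", 1.0), ("B", 3.0)], 1.0), ("C", 2.0)], 0.0)

print(is_clocklike(clock_tree))  # True
print(is_clocklike(real_tree))   # False
```

When the check fails, as it usually does with real data, the position of the root has to come from an outgroup instead.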
Phylograms & Cladograms
Phylogram
• Branch lengths informative
• Scale bar (no. of subs/site)
Cladogram
• Branch lengths non-informative
• Easier to read
• No scale bar
[Figure: the same tree drawn as a phylogram (scale bar = 0.2) and as a cladogram]
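In a text tree, the difference between the two drawings comes down to whether branch lengths are kept. The sketch below assumes the usual Newick convention of writing a branch length as `:number` after a name or closing parenthesis, and that labels are unquoted and contain no colons. The function name `to_cladogram` is hypothetical.

```python
import re


def to_cladogram(newick):
    """Drop ':length' annotations so only the branching order remains.

    Assumes unquoted labels that contain no colons.
    """
    return re.sub(r":[0-9.eE+-]+", "", newick)


phylogram = "((A:0.1,B:0.2):0.05,C:0.3);"
print(to_cladogram(phylogram))  # ((A,B),C);
```

The conversion is one-way: once the lengths are gone, the phylogram cannot be recovered from the cladogram.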
Tree Conversion
[Figure: the same tree of Coffee, Chocolate, Truffle, Caviar, Oyster, Lobster and Nori shown in two drawings joined by "="]
Trees Are About Groupings
monophyletic (pure) group (clade)
paraphyletic group (convenience)
polyphyletic group (similarities = parallel/convergent evolution)
nodes define “clades”
clade = monophyletic group = node plus all descendants
- share unique common ancestor (relative to the rest of the tree) and common history
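The definition "node plus all descendants" translates directly into a test: a set of taxa is monophyletic if some node in the rooted tree has exactly that set as its leaves. The following is a minimal sketch in which the tree is written as nested Python tuples of leaf names. That representation and the function names are assumptions for illustration.

```python
def leaves(node):
    """Return the set of leaf names below a node (a str or a tuple of nodes)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its leaves."""
    group = set(group)
    if isinstance(tree, str):
        return group == {tree}
    if leaves(tree) == group:
        return True
    return any(is_monophyletic(child, group) for child in tree)


tree = (((("A", "B"), "C"), ("D", "E")), "F")

print(is_monophyletic(tree, {"A", "B", "C"}))  # True: a node plus all its descendants
print(is_monophyletic(tree, {"C", "D", "E"}))  # False: no single node defines this group
```

A group that fails this test is either paraphyletic or polyphyletic; telling those two apart needs the step counting described on the following slides.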
Paraphyly & Polyphyly
• The two groupings are often difficult to differentiate
• The smaller number of parsimony steps indicates which is the
more likely type of group
• Chytridiomycota paraphyly: 6 steps
• Chytridiomycota polyphyly: 4 steps (fewer steps, so polyphyly is the more likely interpretation)
• Zygomycota paraphyly: 3 steps (fewer steps, so paraphyly is the more likely interpretation)
• Zygomycota polyphyly: 5 steps
[Figure: tree with tips Dikarya, Zygomycota 1, Zygomycota 2, Zygomycota 3, Chytridiomycota 1, Zygomycota 4, Zygomycota 5, Chytridiomycota 2, Chytridiomycota 3, Chytridiomycota 4, Microsporidia]
Adapted from James et al. (2006) Nature 443:818-822
Questions
• What type of group are the mammals?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Figure: tree with tips Mammal 1, Mammal 2, Mammal 3, Reptiles, Amphibians, Fish]
Questions
• What type of group are the reptiles?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Figure: tree with tips Mammals, Reptile 1, Reptile 2, Birds, Reptile 3, Reptile 4, Amphibians]
Answer slide annotation: 3 steps vs. 4 steps
Questions
• What type of group are the slugs?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Figure: tree with tips Snail 1, Slug 1, Snail 2, Snail 3, Snail 4, Slug 2, Snail 6, Slug 3, Snail 7, Snail 8]
Answer slide annotation: 3 steps vs. 5 steps
Questions
• What type of group are the Monosiga?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Figure: tree with tips Salpingoeca 1, Salpingoeca 2, Salpingoeca 3, Salpingoeca 4, Codosiga, Choanoeca, Monosiga 1, Monosiga 2, Desmarella, Salpingoeca 5, Salpingoeca 6, Salpingoeca 7, Salpingoeca 8, Salpingoeca 9]
Answer slide annotation: 2 steps vs. 7 steps
Adapted from Nitsche et al. (2011) JEM 58:452-462
Questions
• What type of group are the Salpingoeca?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Figure: tree with tips Salpingoeca 1, Salpingoeca 2, Salpingoeca 3, Salpingoeca 4, Codosiga, Choanoeca, Monosiga 1, Monosiga 2, Desmarella, Salpingoeca 5, Salpingoeca 6, Salpingoeca 7, Salpingoeca 8, Salpingoeca 9]
Answer slide annotation: Unknown: 4 steps vs. 4 steps
Adapted from Nitsche et al. (2011) JEM 58:452-462
Homologues can be Orthologues or Paralogues
- X and X’ are paralogues, i.e. XX’ is a multigene family.
- All the X genes are orthologues of each other.
- All the X’ genes are orthologues of each other.
[Figure: panels A and B]
• for orthologues, gene trees = species trees
• It is essential not to mix up orthologues and paralogues for species trees
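One way to make the distinction operational: if every internal node of a gene tree is labelled with the event it represents, two genes are orthologues when their last common ancestor is a speciation node and paralogues when it is a duplication node. The sketch below assumes such labels are already available. The tree layout, the gene names (`human_X`, `mouse_X'`, and so on) and the function names are all hypothetical.

```python
def leaf_names(node):
    """Return all gene names below a node (a str leaf or an (event, children) pair)."""
    if isinstance(node, str):
        return {node}
    _event, children = node
    names = set()
    for child in children:
        names |= leaf_names(child)
    return names


def relationship(node, gene_a, gene_b):
    """Classify two genes by the event at their last common ancestor."""
    if isinstance(node, str):
        raise ValueError("need two different genes from the tree")
    event, children = node
    for child in children:
        names = leaf_names(child)
        if gene_a in names and gene_b in names:
            # Both genes sit in the same subtree, so the ancestor lies deeper.
            return relationship(child, gene_a, gene_b)
    if not {gene_a, gene_b} <= leaf_names(node):
        raise ValueError("gene not found in tree")
    return "orthologues" if event == "speciation" else "paralogues"


# One duplication (X vs X') followed by a speciation in each copy.
gene_tree = ("duplication", [
    ("speciation", ["human_X", "mouse_X"]),
    ("speciation", ["human_X'", "mouse_X'"]),
])

print(relationship(gene_tree, "human_X", "mouse_X"))   # orthologues
print(relationship(gene_tree, "human_X", "mouse_X'"))  # paralogues
```

Only the orthologous pairs trace the species split, which is why mixing in a paralogue gives a misleading species tree.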
Xenologues (Xenology)
Results from Lateral Gene Transfer (LGT)
[Figure: two trees with tips Salmonella, E. coli, Mycoplasma, Listeria, Strep and Bacillus, one marked with an X]
* very common in bacteria, e.g. pathogenicity islands, antibiotic resistance genes
important in bacterial evolution -> new metabolic pathways, etc.
(e.g. E. coli K12 vs E. coli O157, ~1.5 Mb difference in genome size)
Two Main Categories of Phylogenetic Methods
• Distance methods
- tree based on a single metric: % difference (distance) between sequences
- also referred to as “clustering” or “algorithmic” methods
- take data (matrix of % D), plug into equation, -> tree, one solution only
- fast, easy, reasonably accurate, good enough for many things
- methods: (UPGMA), neighbour-joining
• Discrete data (tree searching) methods
- each column in alignment = discrete data point
i.e., hypothesis for each column of alignment
- look for tree that best fits this collection of hypotheses
- for >8 OTUs = a lot of possible trees
- much more detail, precision..., much slower
- methods: parsimony, maximum likelihood, Bayesian inference
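Two quantities from this slide are easy to compute. The first is the "matrix of % D" that distance methods start from, sketched here as the simple proportion of differing sites (skipping gapped columns, which is an assumption of this sketch). The second is the number of fully resolved unrooted trees for n OTUs, given by the standard formula (2n-5)!!, which shows why tree searching gets slow. The toy sequences and function names are made up for illustration.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of {name: aligned sequence}."""
    names = list(alignment)
    return {a: {b: p_distance(alignment[a], alignment[b]) for b in names}
            for a in names}


def count_unrooted_trees(n_otus):
    """Number of fully resolved unrooted trees: 3 * 5 * ... * (2n - 5)."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    count = 1
    for k in range(3, 2 * n_otus - 4, 2):
        count *= k
    return count


toy_alignment = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "ACGAACGA"}
print(distance_matrix(toy_alignment)["seq1"])  # {'seq1': 0.0, 'seq2': 0.125, 'seq3': 0.25}

print(count_unrooted_trees(8))   # 10395
print(count_unrooted_trees(10))  # 2027025
```

A distance method turns the matrix into one tree directly, whereas a tree-searching method has to compare candidates drawn from counts like these.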
Newick Format: Written Tree Description
• common interchange format, read by most tree drawing programs
- also called “New Hampshire format”
• describes a tree using set notation
- all parentheses must be balanced
- all taxa and groups are separated by commas
- no spaces
- ended by semi-colon
((((A,B),C),(D,E)),F);
[Figure: the corresponding tree with tips A to F]
• Very useful for producing representative phylogenies from different studies
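The rules above (balanced parentheses, comma separators, closing semi-colon) are enough to write a small reader and writer. The following is a minimal sketch that handles names-only trees like the example on this slide. It does not handle branch lengths, quoted labels or comments, and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a names-only Newick string into nested tuples of leaf names."""
    text = "".join(text.split())  # tolerate stray spaces, as on the slide
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with a semi-colon")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while text[pos] not in "(),;":
            pos += 1
        if pos == start:
            raise ValueError("missing taxon name")
        return text[start:pos]

    tree = parse_node()
    if pos != len(text) - 1:
        raise ValueError("unexpected text before the semi-colon")
    return tree


def to_newick(tree):
    """Write nested tuples of leaf names back out as a Newick string."""
    def fmt(node):
        if isinstance(node, str):
            return node
        return "(" + ",".join(fmt(child) for child in node) + ")"
    return fmt(tree) + ";"


tree = parse_newick("((( ( A , B ) , C) , ( D , E )) , F);")
print(tree)             # (((('A', 'B'), 'C'), ('D', 'E')), 'F')
print(to_newick(tree))  # ((((A,B),C),(D,E)),F);
```

Each pair of parentheses corresponds to one internal node, so the nesting depth of a taxon is the number of nodes between it and the root.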
Questions
(((Chimp,Bonobo),Human),Gorilla);
Questions
(((Human,Chimp),(Lion,Tiger)),(Kangaroo,Wallaby));
Questions
(((((Drosophila,Mosquito),Ladybird),(Dragonfly,Damselfly)),Crab),(Spider,Scorpion));
Questions
((mauritiana,sechellia,simulans),melanogaster);
Questions
((((Stramenopiles,Alveolata,Rhizaria),Archaeplastida),(Opisthokonta,Amoebozoa)),Excavata);
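The last two trees each contain a group with three members inside one pair of parentheses, which is how Newick writes a polytomy. The sketch below spots such nodes by counting commas at each parenthesis level. It assumes a well-formed, names-only string, and the function name `polytomy_sizes` is hypothetical.

```python
def polytomy_sizes(newick):
    """Return the child count of every internal node with three or more children.

    Assumes a well-formed Newick string with unquoted names.
    """
    commas = []   # stack: commas seen so far at each open parenthesis level
    result = []
    for ch in newick:
        if ch == "(":
            commas.append(0)
        elif ch == "," and commas:
            commas[-1] += 1
        elif ch == ")":
            n_children = commas.pop() + 1
            if n_children >= 3:
                result.append(n_children)
    return result


print(polytomy_sizes("((mauritiana,sechellia,simulans),melanogaster);"))  # [3]
print(polytomy_sizes("(((Chimp,Bonobo),Human),Gorilla);"))                # []
```

The string alone cannot say whether a polytomy is soft or hard; that depends on whether more data would resolve it.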
Summary
• A phylogenetic tree provides a graphical representation of
evolutionary relationships between genes, proteins or species
• The root represents the oldest part of the tree from which all descendant
sequences are derived
• An outgroup, of closely related sequences, is required in order to accurately
place the root
• A monophyletic group, or clade, is a pure group which contains the ancestral
node and all descendant sequences
• Orthologues are homologous genes derived by speciation events
• Paralogues are homologous genes derived by gene duplication events
within a single genome
• Molecular phylogenies are created using either distance methods or discrete
data methods
• The Newick format allows phylogenies to be written out in a single line of
text