Introduction to Phylogenetics for immunologists

Introduction to Phylogenies
for immunologists 2013
Dr Laura Emery
Laura.Emery@ebi.ac.uk
www.ebi.ac.uk/training
Objectives
After this tutorial you should be able to…
• Use essential phylogenetic terminology effectively
• Discuss aspects of phylogenies and their implications for
phylogenetic interpretation
• Apply phylogenetic principles to interpret simple trees
This course will not:
• Provide you with an overview of phylogenetic methods
• Enable you to use tools to construct your own phylogenies
• Enable you to evaluate whether a sensible phylogenetic
model or method was selected to construct a phylogeny
Outline
• Introduction
• Aspects of a tree
1. Topology
2. Branch lengths
3. Nodes
4. Confidence
• Simple phylogenetic interpretation
• Including homology, gene duplication, co-evolution
What can I do with phylogenetics?
• Deduce relationships among species or genes or cells
• Deduce the origin of pathogens
• Identify biological processes that affect how your
sequence has evolved e.g. identify genes or residues
undergoing positive selection
• Explore the evolution of traits through history
• Estimate the timing of major historical events
• Explore the impact of geography on species
diversification
What is a phylogenetic tree?
Darwin 1837
A tree is an explanation of how sequences evolved, their
genealogical relationships and thus how they came to be
the way they are today (or at the time of sampling).
Phylogenies explain genealogical
relationships
• Family tree
Aspects of a tree
1. Topology (branching order)
2. Branch lengths (indication of genetic change)
3. Nodes
i. Tips (sampled sequences known as taxa)
ii. Internal nodes (hypothetical ancestors)
iii. Root (oldest point on the tree)
4. Confidence (bootstraps/probabilities)
1. Topology
The topology describes the branching structure of the tree,
which indicates patterns of relatedness.
[Figure: two sets of example trees with tips A, B and C. The trees in one set display the same topology; the trees in the other set display different topologies.]
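A minimal sketch of how two rooted trees can be compared by topology. It assumes trees are written as nested tuples of tip names, which is an illustrative convention rather than anything used in the slides, and the helper names `clades` and `same_topology` are hypothetical. Two trees share a topology when they contain the same set of groupings, regardless of how the branches are rotated on the page.

```python
def clades(tree):
    """Return (tips, clade_set) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        # A tip contributes its own name but defines no grouping.
        return frozenset([tree]), set()
    tips = frozenset()
    found = set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found |= child_clades
    # Every internal node groups together all the tips below it.
    found.add(tips)
    return tips, found


def same_topology(tree_a, tree_b):
    """Two rooted trees have the same topology if their groupings match."""
    return clades(tree_a)[1] == clades(tree_b)[1]


# Hypothetical three-tip examples.
tree1 = (("A", "B"), "C")
tree2 = ("C", ("B", "A"))  # tree1 with branches rotated
tree3 = (("A", "C"), "B")  # a different grouping

print(same_topology(tree1, tree2))  # True
print(same_topology(tree1, tree3))  # False
```

Rotating branches around a node changes how the tree is drawn but not the groupings it contains, which is why the first comparison succeeds.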
Topology Question
Are these topologies the same?
Answer = yes
Topology Question II
Which of these trees has a different topology from the
others?
[Figure: several candidate trees, each with tips A, B, C, D, E and F]
2. Branch lengths indicate genetic change
[Figure: example tree with branch lengths 0.8, 1.2, 0.5, 0.6, 0.5 and 0.5]
• Longer branches indicate greater change
• Change is typically represented in units of number of
substitutions per site (but check the legend)
A scale bar can represent branch lengths
[Figure: a tree with labelled branch lengths (0.8, 1.2, 0.5, 0.6, 0.5, 0.5) shown alongside a version with a scale bar (0.5)]
These are alternative representations of the same
phylogeny
Alternative representations of phylogenies
All of these representations depict the same topology
Branch lengths are indicated in blue
Red lengths are meaningless
Not all trees include branch length data
Cladogram
Phylogram
Distance and substitution rate are confounded
• Branch lengths indicate the genetic change that has occurred
• We often don't know if long branch lengths reflect:
• A rapid evolutionary rate
• An ancient divergence time
• A combination of both
• Genetic change (substitutions/site) = Evolutionary rate (substitutions/site/year) x Divergence time (years)
[Figure: example tree with tips A, B, C, D and E]
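A small worked example of why rate and time are confounded, using the relationship genetic change = evolutionary rate x divergence time. The rates and times below are made-up numbers chosen only for illustration, and `genetic_change` is a hypothetical helper.

```python
def genetic_change(rate_per_site_per_year, years):
    """Expected branch length in substitutions per site."""
    return rate_per_site_per_year * years


# Hypothetical scenario 1: rapid evolutionary rate, recent divergence.
fast_recent = genetic_change(1e-3, 100)

# Hypothetical scenario 2: slow evolutionary rate, ancient divergence.
slow_ancient = genetic_change(1e-6, 100_000)

print(f"{fast_recent:.3f}")   # 0.100
print(f"{slow_ancient:.3f}")  # 0.100
```

Both scenarios give the same branch length, so the branch length alone cannot tell them apart without extra information such as dated samples or a known rate.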
3. Nodes
[Figure: example tree with tips A, B, C, D and E]
• Nodes occur at the ends of branches
• There are three types of nodes:
i. Tips (sampled sequences known as taxa)
ii. Internal nodes (hypothetical ancestors)
iii. Root (oldest point on the tree)
Figures Andrew Rambaut
The root is the oldest point on the tree
[Figure: rooted tree with tips A, B, C, D and E; the root lies in the past and the tips in the present]
• The root indicates the direction of evolution
• It is also the (hypothesised) most recent common
ancestor (MRCA) of all of the samples in the tree
Figures Andrew Rambaut
Trees can be drawn in an unrooted form
[Figure: a tree with tips A, B, C, D and E drawn in rooted and unrooted form]
These are alternative representations of the same topology
There are multiple rooted tree topologies for
any given unrooted tree
• Most tree-building methods produce unrooted trees
• Identifying the correct root is often critical for interpretation!
Figure Aiden Budd
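A short sketch of how many rooted trees correspond to one unrooted tree. An unrooted, fully bifurcating tree with n tips has 2n - 3 branches, and a root can be placed on any one of them. The edge list and the node names `n1` to `n3` are hypothetical.

```python
def root_positions(n_tips):
    """Number of branches (candidate root positions) in an unrooted binary tree."""
    return 2 * n_tips - 3


# Hypothetical unrooted tree with five tips (A-E) and three internal nodes.
edges = [
    ("A", "n1"), ("B", "n1"),
    ("n1", "n2"), ("C", "n2"),
    ("n2", "n3"), ("D", "n3"), ("E", "n3"),
]

# Each branch is one possible place to put the root.
print(len(edges))         # 7
print(root_positions(5))  # 7
```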
How to root a tree
• Midpoint rooting
• Assumes a constant evolutionary rate
• Often not the case!
• Outgroup rooting (recommended)
• The outgroup is one or more taxa that are known to have diverged prior to the group being studied
• The node where the outgroup lineage joins the other taxa is the root
[Figure: an unrooted tree shown midpoint rooted and outgroup rooted]
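A minimal sketch of midpoint rooting, which places the root halfway along the longest tip-to-tip path. It assumes the unrooted tree is given as a list of (node, node, branch length) tuples; the example tree, its branch lengths and the function names are all hypothetical.

```python
from itertools import combinations


def path_between(adj, start, end):
    """Return the nodes on the unique path from start to end in a tree."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None


def midpoint_root(edges):
    """Return (branch, offset): the branch holding the midpoint root and the
    distance of the root from the first node of that branch."""
    adj = {}
    for a, b, length in edges:
        adj.setdefault(a, {})[b] = length
        adj.setdefault(b, {})[a] = length
    tips = [n for n, nbrs in adj.items() if len(nbrs) == 1]

    # Find the pair of tips separated by the greatest total branch length.
    best_len, best_path = -1.0, None
    for t1, t2 in combinations(tips, 2):
        path = path_between(adj, t1, t2)
        total = sum(adj[u][v] for u, v in zip(path, path[1:]))
        if total > best_len:
            best_len, best_path = total, path

    # Walk along that path until half of its length has been covered.
    half = best_len / 2
    walked = 0.0
    for u, v in zip(best_path, best_path[1:]):
        if walked + adj[u][v] >= half:
            return (u, v), half - walked
        walked += adj[u][v]


# Hypothetical unrooted tree: D sits on a long branch.
example_edges = [
    ("A", "n1", 0.2), ("B", "n1", 0.3),
    ("n1", "n2", 0.1),
    ("C", "n2", 0.3), ("D", "n2", 1.0),
]
branch, offset = midpoint_root(example_edges)
print(branch, round(offset, 3))
```

In this example the root lands on the long branch leading to D. That is only correct if all lineages evolve at a similar rate; if D is simply fast-evolving, the midpoint root is misplaced.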
Root Question
This tree shows a cladogram i.e. the branch lengths do not
indicate genetic change.
Indicate any root positions where bird and crocodile are not
sister taxa (each other's closest relatives).
Alternative Representations Question
4. Confidence
How good is a tree?
A tree is a collection of hypotheses, so we assess our confidence in each of its parts or branches independently
[Figure: example tree with support values on its branches: 0.99, 100, 0.81, 63, 0.93, 85]
There are three main approaches:
• Bootstraps
• Bayesian methods (probabilistic)
• Approximate likelihood ratio test (aLRT) methods (probabilistic)
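A minimal sketch of the resampling step behind bootstrap support. Alignment columns are sampled with replacement to build a pseudo-replicate alignment of the same length; a tree is then built from each replicate, and the bootstrap value of a branch is the percentage of replicate trees that contain that grouping. The tree-building step is omitted here, and the toy alignment and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping rows aligned."""
    n_sites = len(next(iter(alignment.values())))
    # The same column indices are used for every sequence.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment of three sequences and six sites.
alignment = {"seq1": "ACGTAC", "seq2": "ACGTTC", "seq3": "ACCTAG"}
replicate = bootstrap_alignment(alignment, random.Random(1))

for name, seq in replicate.items():
    print(name, seq)
```

In practice this is repeated many times (often 100 or 1000 replicates) by the tree-building software itself.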
What is a monophyletic group?
A monophyletic group (also described as a clade) is a group of
taxa that share a more recent common ancestor with each other
than with any other taxa.
[Figure: tree with a monophyletic group highlighted]
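A minimal sketch of a monophyly check on a rooted tree. It assumes the tree is written as nested tuples of tip names (an illustrative convention), and the example tree is hypothetical. A set of taxa is monophyletic when some node in the tree has exactly those taxa, and no others, as its descendants.

```python
def tips_below(tree, collected):
    """Return the tips below a node and record the tip set of every node."""
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*(tips_below(child, collected) for child in tree))
    collected.add(tips)
    return tips


def is_monophyletic(tree, taxa):
    """True if some node has exactly `taxa` as its descendant tips."""
    collected = set()
    tips_below(tree, collected)
    return frozenset(taxa) in collected


# Hypothetical rooted tree with six tips.
tree = ((("A", "B"), ("C", "D")), ("E", "F"))

print(is_monophyletic(tree, {"A", "B", "C", "D"}))  # True
print(is_monophyletic(tree, {"A", "B", "C"}))       # False
print(is_monophyletic(tree, {"D", "E"}))            # False
```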
Confidence Question
Which of the bootstrap values indicates our confidence in
the grouping of A, B, C, and D together as a monophyletic
group? Do you think we can be confident in this grouping?
[Figure: tree with tips A, B, C, D, E and F and bootstrap values 100, 91, 63 and 84]
Note: high bootstrap values do not always mean that we have confidence in a
branch. False confidence can be generated under some phylogenetic methods
Part two: Phylogenetic interpretation
for immunologists 2013
Phylogenetic interpretation skill set
1. Tree-thinking skills
• relatedness, confidence, homology
2. Knowledge of phylogenetic methods and their limitations
3. Knowledge of biological processes affecting sequence
evolution
• gene duplication, recombination, horizontal gene transfer, population genetic processes, and many more!
4. Knowledge of the data you wish to interpret
Simple phylogenetic interpretation question
• Which is true?
• A) Mouse is more closely
related to fish than frog is to
fish
• B) Lizard is more closely
related to fish than mouse is to
fish
• C) Human and frog are equally
related to fish
Homology is similarity due to shared ancestry
Example: limbs and wings
• Limbs are homologous: they share a common ancestor
• Wings are not homologous: they are analogous, as they have evolved similarity independently
Gene duplication
Gene duplication and
subsequent divergence can
result in novel gene
functions (it can also result
in pseudogenes)
• Genes that are
homologous due to
gene duplication are
paralogous
• Genes that are
homologous due to
speciation are
orthologous
Teleost MHC class II phylogeny
• Can you spot any MHC
class II gene duplication
events?
Harstad et al BMC Genomics 2008
Immunology related genes have atypical
patterns of molecular evolution
• Immunology genes
have a high dN/dS
ratio indicative of
positive selection
• Rapid evolutionary
rate
• Difficult to align
• Violate assumptions
of many
phylogenetic models
Park et al 2012. Scientific Reports
Positive selection can lead to ladder-like
phylogenies
Example: influenza haemagglutinin
phylogeny and immunological mapping
Smith et al 2004. Science
Phylogenetics can inform us of host-pathogen interactions and co-evolution
• "Mirror" phylogenies are indicative of host-parasite
vertical inheritance
Jiggins web page: http://www.gen.cam.ac.uk/research/jiggins/research.html
What does this phylogeny tell us about
Human Cytomegalovirus (HCMV)?
[Figure: cytomegalovirus phylogeny with tips Baboon, Simian, Rhesus, Chimp, Human, Rat and Murine]
Nicholson et al 2009. Virol J
T-cell receptors and immunoglobulin chains
are homologous
Richards et al 2000
An extremely brief introduction to
methods, analyses, & pitfalls
There is only one true tree
• The true tree refers to what actually happened in the
evolutionary past
• All methods attempt to reconstruct the true phylogeny
• Even the best method may not give you the true tree
Phylogenetic Methods: The general
approach
• We want to find the tree that best explains our aligned
sequences
• We need to be able to define “best explains”
• we need a model of sequence evolution
• we need a criterion (or set of criteria) to use to choose
between alternative trees
• then evaluate all possible trees
(NB: if N=20, then 2 x 10^20 possible unrooted trees!)
• or take a short cut
Paul Sharp
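The figure of roughly 2 x 10^20 trees can be checked with a short calculation. The number of unrooted, fully bifurcating trees for N tips is (2N - 5)!!, the product of the odd numbers from 1 up to 2N - 5. The function name below is hypothetical.

```python
def unrooted_tree_count(n_tips):
    """Number of unrooted bifurcating trees: 1 * 3 * 5 * ... * (2n - 5)."""
    count = 1
    for k in range(3, 2 * n_tips - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n))

# Scientific notation makes the size for 20 tips easier to read.
print(f"{unrooted_tree_count(20):.1e}")
```

The count grows so quickly that evaluating every tree is impractical for all but the smallest data sets, which is why heuristic searches ("short cuts") are used.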
The problem of multiple substitutions
[Figure: substitutions (*) along two lineages, including hidden mutations]
• Multiple substitutions are more likely to have occurred between distantly related species
• We need an explicit model of evolution to account for these
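As one illustration of an explicit model, the sketch below applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which converts the observed proportion of differing sites p into an estimated number of substitutions per site d. The slides do not name a specific model; Jukes-Cantor is used here only because it is the simplest, and the function name is hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Estimate substitutions per site from the observed proportion of
    differing sites, assuming equal rates between all nucleotides."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - (4 / 3) * p)


for observed in (0.05, 0.1, 0.3, 0.5):
    corrected = jukes_cantor_distance(observed)
    print(f"observed {observed:.2f} -> corrected {corrected:.3f}")
```

The corrected distance is always at least as large as the observed one, and the gap widens as sequences become more divergent, reflecting the growing number of hidden mutations.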
Methodological approaches
1. Distance matrix methods (pre-computed distances)
• UPGMA assumes perfect molecular clock, Sokal & Michener (1958)
• Minimum evolution (e.g. Neighbor-joining, NJ), Saitou & Nei (1987)
2. Maximum parsimony, Fitch (1971)
• Minimises number of mutational steps
3. Maximum likelihood, ML
• Evaluates statistical likelihood of alternative trees, based on
an explicit model of substitution
4. Bayesian methods
• Like ML but can incorporate prior knowledge
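To make "minimises number of mutational steps" concrete, here is a minimal sketch of Fitch's counting procedure for a single alignment site on a rooted binary tree. Trees are written as nested tuples of tip names, and the tip states and the two candidate trees are hypothetical. Maximum parsimony would prefer the tree requiring fewer steps, summed over all sites.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum steps) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_steps + right_steps
    # The children disagree: one substitution is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical nucleotide at one site for four taxa.
site = {"A": "G", "B": "G", "C": "A", "D": "A"}

tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

print(fitch(tree1, site)[1])  # 1
print(fitch(tree2, site)[1])  # 2
```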
Phylogenetic analyses are not
straightforward
[Flowchart with the following steps and decision points]
• Formulate hypotheses
• Data assessment: known biology, additional data (e.g. geography)
• Decide upon and implement method
• Phylogenetic result(s)
• Can you validate this? (Yes/No)
• Answered your question? (Yes/No)
• Investigate unexpected and unresolved aspects further; consider including more data
• Final phylogeny and analysis
Further Reading
• Molecular Evolution: A Phylogenetic Approach (1998), Roderic D M Page & Edward C Holmes, Blackwell Science, Oxford.
• The Phylogenetic Handbook (2003), Marco Salemi and Anne-Mieke Vandamme Eds, Cambridge University Press, Cambridge.
• Inferring Phylogenies (2003), Joseph Felsenstein, Sinauer.
• Molecular Evolution (1997), Wen-Hsiung Li, Sinauer.
Phylogenetics at the EBI
• Clustal phylogeny currently available
• RAxML coming soon…
• www.EBI.ac.uk/tools/phylogeny
Acknowledgements
People
• Andrew Rambaut (University of Edinburgh)
• Paul Sharp (University of Edinburgh)
• Nick Goldman (EMBL-EBI)
• Benjamin Redelings (Duke University)
• Brian Moore (University of California, Davis)
• Olivier Gascuel (University of Montpellier)
• Aiden Budd (EMBL-Heidelberg)
…and the EBI training team
Funding
EMBL member states and…
Thank you!
www.ebi.ac.uk
Twitter: @emblebi
Facebook: EMBLEBI