27 FresH2O Phylogeny Trees

General Phylogenetics
Points that will be covered in this presentation
•Tree Terminology
•General Points About Phylogenetic Trees
•Phylogenetic Analyses
The importance of Alignments
The different analysis methods
Tree confidence measures
1
Tree Terminology
Node: point at which 2 or more branches diverge
Internal node: hypothetical last common ancestor
Terminal node: molecular or morphological data from which the tree is derived. (These will often be used to represent species or individual specimens and may be referred to as OTUs = Operational Taxonomic Units)
Clade: a node (hypothetical ancestor) and all the lineages descending from it
[Figure: example tree with internal nodes, terminal nodes (OTUs) and two clades labeled]
2
Tree Terminology
Monophyletic group: a group in which all members are derived from a unique common ancestor, and which includes all descendants of that ancestor
Polyphyletic group: a group whose members do not share a unique common ancestor of their own. Their most recent common ancestor lies outside the group and has many descendants that are not in the group
Paraphyletic group: a group that includes its common ancestor but excludes some of that ancestor's descendants (like polyphyly, a form of non-monophyly)
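The definitions above can be tried out on a toy tree. The following is a minimal sketch, not taken from the presentation: the tree, its taxon names and the helper functions `leaves`, `clades` and `is_monophyletic` are all hypothetical. It treats a group of OTUs as monophyletic when it matches the full set of terminal nodes of some clade.

```python
# Hypothetical example tree written as nested tuples; strings are OTUs.
# Shape: ((A, B), (C, (D, E)))
TREE = (("A", "B"), ("C", ("D", "E")))


def leaves(node):
    """Return the set of terminal nodes (OTUs) at or below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Yield the leaf set of every clade: each node plus all its descendants."""
    yield frozenset(leaves(node))
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """A group is monophyletic if it equals the leaf set of some clade."""
    return frozenset(group) in set(clades(tree))


print(is_monophyletic(TREE, {"D", "E"}))       # True
print(is_monophyletic(TREE, {"C", "D", "E"}))  # True
print(is_monophyletic(TREE, {"C", "D"}))       # False: E shares their common ancestor
```

Working only from terminal nodes, this check can say that a group is not monophyletic, but it does not by itself distinguish paraphyly from polyphyly.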
3
General Points About Phylogenetic Trees
All branches can rotate freely around a node
(i.e. B is not more closely related to C than A, and C is not more closely related to D than E)
Branch lengths may be drawn as equal between nodes: “cladogram” (see tree above). These are used when one is interested only in the branching pattern.
Branch lengths may be proportional to the hypothesized distance between nodes: “phylogram” (see tree on left).
[Figure: the same tree of taxa A to E drawn as a cladogram and as a phylogram]
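The point about rotating branches can be illustrated with a small sketch. The trees and the helper `canonical` below are hypothetical, not from the presentation. Each tree is written as nested tuples, and converting every node to an unordered set removes the left-to-right drawing order, so two drawings compare equal exactly when they have the same branching pattern.

```python
def canonical(node):
    """Return an order-independent form of a tree given as nested tuples."""
    if isinstance(node, str):
        return node
    # A frozenset ignores the order in which the child branches are listed.
    return frozenset(canonical(child) for child in node)


tree_1 = (("A", "B"), ("C", ("D", "E")))
tree_2 = ((("E", "D"), "C"), ("B", "A"))  # tree_1 with branches rotated
tree_3 = (("A", "C"), ("B", ("D", "E")))  # a different branching pattern

print(canonical(tree_1) == canonical(tree_2))  # True
print(canonical(tree_1) == canonical(tree_3))  # False
```

The sketch assumes every OTU name appears only once in a tree and ignores branch lengths, so it compares cladograms rather than phylograms.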
4
General Points About Phylogenetic Trees
Fully resolved trees are bifurcating (only two descendant lineages from each node).
A node with more than two descendant lineages is a multifurcating node or a polytomy.
[Figure: two example trees with polytomies labeled]
Polytomies may be “soft” or “hard”
“Soft” = product of data or analysis
“Hard” = product of biology
5
General Points About Phylogenetic Trees
Example of a “soft” polytomy: LSU analysis is unable to resolve the relationships of some Ptilophora species.
Using different data (rbcL) the relationships among Ptilophora species are better resolved.
[Figure: LSU tree (polytomy labeled) and rbcL tree, both from Tronchin et al. 2004]
6
Phylogenetic Analyses
The Importance of Alignments
Phylogenetic trees derived from the analysis of DNA or amino acid sequences
are only as good as the data they are based upon.
Garbage In = Garbage Out
Consequently, sequence alignment is the most important step in
phylogenetic analysis.
The aligned sites of a sequence must be homologous (identical by descent = taxa share the same state because their ancestor did).
If two taxa share the same state but not by descent, it is called homoplasy.
7
Phylogenetic Analyses
The Importance of Alignments
DNA sequences are prone to homoplasy because there are only 4 possible character states at each site (A, C, G, T), plus insertion/deletion mutations [indels] for some loci.
[Figure: example alignment annotated to show that the same sites in different sequences need to be homologous, where insertion/deletion mutations (gaps) are inferred, and an area to possibly remove from analyses because of uncertain homology between sites]
8
Phylogenetic Analyses
The Different Analysis Methods
See: evolution.genetics.washington.edu/phylip/software.html#methods for a list of software programs
Distance methods: based on similarity between OTUs
UPGMA – originally used for phenotypic characters in numerical
taxonomy. Generally not applied to sequence data because it is highly
sensitive to mutation rate changes in lineages, i.e. the data must fit a
“molecular clock.”
NJ (Neighbor Joining) – a clustering algorithm that estimates the “minimum evolution” tree without examining all possible topologies.
The accuracy of a distance tree depends on 2 things:
1) How “true” are the distances calculated between taxa (how good is
the model of evolution that your distances are based upon).
2) The standard error of the distance measure estimation
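As an illustration of how a model of evolution changes the distances, here is a minimal sketch comparing the raw proportion of differing sites (the p-distance) with the Jukes-Cantor (JC69) corrected distance. The choice of JC69, the function names and the two short sequences are assumptions made for this example, not something the presentation specifies.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, skipping sites with a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned sequences that differ at 2 of 10 sites.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"

print(p_distance(seq_1, seq_2))               # 0.2
print(round(jc69_distance(seq_1, seq_2), 4))  # 0.2326
```

The corrected distance is larger than the observed one because the model allows for multiple changes at the same site. With sequences this short the estimate also has a large standard error, which is the second point in the list above.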
9
Phylogenetic Analyses
The Different Analysis Methods
Optimization methods
•Parsimony: searching for the tree that requires the fewest mutational steps, i.e. the simplest is the best.
•Maximum Likelihood: searching for the most likely tree given the OTUs (sequences) and a model of evolution, i.e. the tree that maximizes the probability of observing the data is the best tree.
•Bayesian: searching for the best set of trees, i.e. trees are sampled in proportion to their posterior probability, giving a set of trees in which the likelihoods are so similar that changes between them are essentially random.
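To make the parsimony criterion concrete, the sketch below counts the minimum number of mutational steps a tree requires, using Fitch's algorithm for unordered, equally weighted characters. The alignment, the two candidate trees and the function names are hypothetical, and the trees are assumed to be rooted and bifurcating. A real search would score many topologies rather than two.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for one site.

    node   : a leaf name (str) or a pair of subtrees (bifurcating tree)
    states : dict mapping leaf name -> observed state at this site
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two lineages can share a state: no extra step is needed.
        return shared, left_cost + right_cost
    # No state in common: at least one change happened on these branches.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum the Fitch step counts over every aligned site."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical three-site alignment for four OTUs.
ALIGNMENT = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(tree_length(tree_1, ALIGNMENT))  # 3
print(tree_length(tree_2, ALIGNMENT))  # 5
```

Under the parsimony criterion `tree_1` is preferred here because it explains the same data with fewer steps.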
10
Phylogenetic Analyses
Tree Confidence Measures
Decay Analysis or Goodman-Bremer Support Values: a test used in parsimony analyses in which one determines how many steps longer than the most parsimonious tree a tree must be before a particular branch is no longer resolved in the consensus of all possible trees of that length.
[Figure: most parsimonious tree (L = 35), one step less parsimonious (L = 36) and two steps less parsimonious (L = 37), with branches labeled d1 and d2]
How meaningful the values are
may depend on the tree length.
11
Phylogenetic Analyses
Tree Confidence Measures
Bootstrapping: A non-parametric test of how well the data support the nodes
of a given tree.
Determining support is a bit of a statistical problem: Evolution only happened
once so there is no underlying distribution to sample in order to develop
confidence values.
Method: the original analysis is performed multiple times on pseudo-datasets
derived by sampling the original dataset with replacement. The number, or
fraction, of times that a particular clade is present in the resulting trees is its
bootstrap value.
Bootstrapping is not portable, i.e. you cannot compare values across studies because changing any parameters will change the values.
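The resampling step described above can be sketched as follows. To keep the example short, the "original analysis" is replaced by a deliberately simple stand-in: for four OTUs, pick the split supported by the most parsimony-informative sites. That stand-in, the taxon names A to D, the alignment and all function names are assumptions made for illustration only.

```python
import random


def best_split(alignment):
    """Pick the four-taxon split supported by the most informative sites.

    Ties go to the first split listed; a real analysis would handle
    ties explicitly.
    """
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    columns = zip(alignment["A"], alignment["B"], alignment["C"], alignment["D"])
    for a, b, c, d in columns:
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return max(counts, key=counts.get)


def resample_columns(alignment, rng):
    """Build a pseudo-dataset by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def bootstrap_support(alignment, split, replicates=1000, seed=0):
    """Percentage of pseudo-datasets in which the given split is recovered."""
    rng = random.Random(seed)
    hits = sum(
        best_split(resample_columns(alignment, rng)) == split
        for _ in range(replicates)
    )
    return 100 * hits / replicates


# Hypothetical alignment: four sites favor AB|CD, one favors AC|BD,
# and one is uninformative.
ALIGNMENT = {"A": "AAGGCT", "B": "AAGGTT", "C": "CCTGCA", "D": "CCTTTA"}

print(best_split(ALIGNMENT))                  # AB|CD
print(bootstrap_support(ALIGNMENT, "AB|CD"))  # support as a percentage
```

Changing the number of replicates, the random seed or the analysis used on each pseudo-dataset changes the resulting percentage, which is one way to see why bootstrap values are hard to compare across studies.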
12
Phylogenetic Analyses
Tree Confidence Measures
Bootstrapping
By default most programs will show bootstrap values when they are greater than 50, but does a bootstrap value of 50 mean anything?
For a discussion of this see Hillis & Bull (1993) Systematic Biology 42:182-192 (they tested bootstrap values based on a known phylogeny).
Wilson’s General Rule:
•60-80, be cautious: is there other evidence to support the relationship?
•80-90, usually pretty solid;
•90-100, solid and unlikely to be misleading.
13
General Points About Phylogenetic Trees
DNA or protein sequence trees are
hypotheses of how a particular DNA locus or
protein has evolved.
We assume that the way the DNA or protein
has evolved reflects the way the species has
evolved i.e. gene tree = species tree
IMPORTANT: This may or may not reflect
reality.
i.e. You Still Have To Think, as molecules do
not necessarily trump morphology,
development, etc.
14
General Points About Phylogenetic Trees
[Figure: a gene tree drawn within the species tree for taxa A, B and C, shown in two panels captioned “gene tree = species tree”]
15