Phylogenetic Terminology

Terminology of Phylogenetic Trees
Dan Graur
•Evolutionary relationships are usually illustrated by means of a phylogenetic tree (dendrogram).
•The “tree metaphor” cannot always be used.
Ernst Heinrich Haeckel
1834-1919
Jean-Baptiste [Pierre Antoine de
Monet, Chevalier de] Lamarck.
1809
Charles Darwin
July 1837
July 2007
Charles Darwin
November 1859
The terminology of phylogenetics is discombobulated.
Graduate Student Assignments
Instead of a stream of emails that
will yield unsatisfactory results,
kindly set up appointments and
let’s talk.
In mathematics, a graph is an abstract representation of a set of objects called nodes (or vertices), some of which are connected to one another by links called branches (or edges). A path in a graph is a sequence of branches that connects two nodes without passing through any node more than once.
Graphs = Trees + Non-Tree Graphs (or Networks)
In a tree (b), any two nodes are connected by a
single path.
In a network (a), there may be multiple paths connecting two nodes.
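The distinction can be checked mechanically. Below is a minimal Python sketch, assuming a graph stored as an adjacency list; the helper names `build_graph`, `count_paths`, and `is_tree`, and the node labels, are hypothetical. It counts the paths between two nodes and tests whether a graph is a tree, i.e., connected and with exactly one branch fewer than it has nodes.

```python
from collections import defaultdict


def build_graph(edges):
    """Store an undirected graph as an adjacency list: node -> set of neighbours."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def count_paths(graph, start, end, visited=None):
    """Count paths (no node visited twice) from start to end by depth-first search."""
    if visited is None:
        visited = {start}
    if start == end:
        return 1
    total = 0
    for nxt in graph[start]:
        if nxt not in visited:
            total += count_paths(graph, nxt, end, visited | {nxt})
    return total


def is_tree(graph):
    """A graph is a tree if it is connected and has one branch fewer than it has nodes."""
    nodes = list(graph)
    n_branches = sum(len(nbrs) for nbrs in graph.values()) // 2
    seen, stack = set(), [nodes[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node] - seen)
    return len(seen) == len(nodes) and n_branches == len(nodes) - 1


# Hypothetical example: x and y are internal nodes; the network adds one extra branch (A, C).
tree_edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
tree = build_graph(tree_edges)
network = build_graph(tree_edges + [("A", "C")])

print(count_paths(tree, "A", "D"), is_tree(tree))        # 1 True
print(count_paths(network, "A", "D"), is_tree(network))  # 2 False
```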
The evolutionary relationships among a
group of organisms are illustrated by means
of phylogenetic trees (or dendrograms).
Internal branch
External (or peripheral) branch
The branching pattern of a tree
is called its topology.
Three different styles of trees, one topology.
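Because topology concerns only the branching pattern, two drawings of a tree, or two listings of it with the descendants written in a different order, share a topology if they define the same groups of OTUs. A minimal Python sketch, assuming rooted trees written as nested tuples with OTU names as strings (the function names are hypothetical):

```python
def clades(tree):
    """Return the set of clades of a rooted tree, each as a frozenset of OTU names.
    Leaves are strings; internal nodes are tuples of child subtrees."""
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(collect(child) for child in node))
        found.add(members)
        return members

    collect(tree)
    return found


def same_topology(tree1, tree2):
    """Two rooted trees share a topology if they contain exactly the same clades."""
    return clades(tree1) == clades(tree2)


# Hypothetical trees.
t1 = ((("A", "B"), "C"), "D")
t2 = ("D", ("C", ("B", "A")))  # same branching pattern, descendants listed in another order
t3 = ((("A", "C"), "B"), "D")  # a different branching pattern

print(same_topology(t1, t2))  # True
print(same_topology(t1, t3))  # False
```

Branch lengths and drawing style play no role here; only the grouping of OTUs is compared.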
One topology
Terminal node = Operational taxonomic unit (OTU)
Internal node = Hypothetical taxonomic unit (HTU)
Peripheral (or terminal) branch = relationship between an OTU and an HTU
Internal branch = relationship between two HTUs
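These four terms can be read off a rooted tree mechanically. A minimal Python sketch, assuming the tree is written as nested tuples with OTU names as strings; the labels HTU1, HTU2, ... are hypothetical names generated for the unnamed internal nodes:

```python
def classify(tree):
    """Sort the nodes and branches of a rooted tree (nested tuples) into OTUs, HTUs,
    peripheral branches (HTU-OTU), and internal branches (HTU-HTU)."""
    otus, htus, peripheral, internal = [], [], [], []
    counter = 0

    def visit(node):
        nonlocal counter
        if isinstance(node, str):          # terminal node = OTU
            otus.append(node)
            return node
        counter += 1
        name = f"HTU{counter}"             # hypothetical name for an internal node
        htus.append(name)
        for child in node:
            child_name = visit(child)
            if isinstance(child, str):
                peripheral.append((name, child_name))
            else:
                internal.append((name, child_name))
        return name

    visit(tree)
    return otus, htus, peripheral, internal


otus, htus, peripheral, internal = classify(((("A", "B"), "C"), "D"))
print(otus)        # ['A', 'B', 'C', 'D']
print(htus)        # ['HTU1', 'HTU2', 'HTU3']
print(peripheral)  # [('HTU3', 'A'), ('HTU3', 'B'), ('HTU2', 'C'), ('HTU1', 'D')]
print(internal)    # [('HTU2', 'HTU3'), ('HTU1', 'HTU2')]
```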
Bifurcating and multifurcating trees
A node is bifurcating (or binary or dichotomous) if it has exactly two immediate descendant lineages, and multifurcating (or polytomous) if it has three or more immediate descendant lineages. In a strictly bifurcating rooted tree, each internal node other than the root is incident to exactly three branches, two derived and one ancestral; the root is incident to only two.
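A minimal Python sketch that classifies each internal node of a rooted tree by its number of immediate descendant lineages, assuming trees written as nested tuples (the function names and example trees are hypothetical):

```python
def node_types(tree):
    """Return (number of descendants, kind) for each internal node, in preorder."""
    result = []

    def visit(node):
        if isinstance(node, str):          # an OTU has no descendants
            return
        kind = "bifurcating" if len(node) == 2 else "multifurcating"
        result.append((len(node), kind))
        for child in node:
            visit(child)

    visit(tree)
    return result


def is_strictly_bifurcating(tree):
    return all(n == 2 for n, _ in node_types(tree))


resolved = ((("A", "B"), "C"), "D")
polytomy = (("A", "B", "C"), "D")  # one three-way multifurcation
print(node_types(polytomy))        # [(2, 'bifurcating'), (3, 'multifurcating')]
print(is_strictly_bifurcating(resolved), is_strictly_bifurcating(polytomy))  # True False
```

The code cannot tell a hard polytomy from a soft one; that distinction depends on how the data are interpreted, not on the tree's shape.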
A bifurcation is always
interpreted as a speciation
event
Two possible interpretations for a
multifurcation (polytomy) in a tree:
1. The polytomy represents the true
sequence of events (hard
polytomy), whereby an ancestral
taxon gave rise to three or more
descendant taxa simultaneously.
2. The polytomy represents a lack of
resolution. The exact order of two
or more bifurcations cannot be
determined unambiguously with the
available data (soft polytomy).
Rooted and unrooted trees
In a rooted tree there exists a particular node, called the
root, from which a unique path leads to any other node. The
direction of each path corresponds to evolutionary time, and
the root is the common ancestor of all the taxonomic units under study.
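An unrooted tree can be turned into a rooted one by placing the root on one of its branches; an unrooted bifurcating tree with n OTUs has 2n - 3 branches, and each choice yields a different rooted tree. A minimal Python sketch, assuming the unrooted tree is given as a list of undirected branches and using the hypothetical label ROOT for the added node (the function name is also hypothetical):

```python
from collections import defaultdict


def root_on_branch(edges, branch):
    """Root an unrooted tree (list of undirected branches) by inserting a root node on `branch`.
    Returns nested tuples in which every path leads away from the root; degree-1 nodes are OTUs."""
    u, v = branch
    graph = defaultdict(set)
    for a, b in edges:
        if {a, b} != {u, v}:               # the chosen branch is split in two by the root
            graph[a].add(b)
            graph[b].add(a)
    for node in (u, v):
        graph["ROOT"].add(node)            # hypothetical label for the new root node
        graph[node].add("ROOT")

    def orient(node, parent):
        children = sorted(n for n in graph[node] if n != parent)
        if not children:                   # an external node (OTU)
            return node
        return tuple(orient(child, node) for child in children)

    return orient("ROOT", None)


# Hypothetical unrooted tree with four OTUs; x and y are internal nodes.
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
print(root_on_branch(edges, ("x", "y")))  # (('A', 'B'), ('C', 'D'))
print(root_on_branch(edges, ("A", "x")))  # ('A', ('B', ('C', 'D')))
print(len({root_on_branch(edges, e) for e in edges}))  # 5 rooted trees, one per branch
```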
In an unrooted tree with four external
nodes, the internal branch is referred to
as the central branch.
How many unrooted topologies are here?
[Figure: four unrooted trees, numbered 1 to 4, each relating OTUs a, b, c, d, and e.]
• In an unrooted phylogenetic tree, you cannot immediately assess evolutionary relationships.
• In a rooted phylogenetic tree, evolutionary relationships are evident.
Phoronida (horseshoe worms)
Brachiopoda (lampshells)
Arthropoda (arthropods)
Vertebrata (vertebrates)
Which of the following taxa is evolutionarily closest to Rick Perry? (a) Phoronida, (b) Brachiopoda, (c) Arthropoda, (d) all three taxa are equidistant from Perry, or (e) two taxa are closer to Perry than the third taxon.
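In a rooted tree, “evolutionarily closest” means “sharing the most recent common ancestor.” The tree figure for this question did not survive, so the Python sketch below assumes the topology (((Phoronida, Brachiopoda), Arthropoda), Vertebrata), with Vertebrata (which includes Perry) as the sister of the three protostome taxa. It measures how recent the common ancestor of two taxa is by its depth below the root; the function names are hypothetical.

```python
def path_to(tree, taxon):
    """Return the clades (subtrees) on the path from the root down to `taxon`, or None if absent."""
    if tree == taxon:
        return [tree]
    if isinstance(tree, tuple):
        for child in tree:
            sub = path_to(child, taxon)
            if sub is not None:
                return [tree] + sub
    return None


def mrca_depth(tree, a, b):
    """Depth of the most recent common ancestor of a and b (root = 1).
    A larger value means a more recent shared ancestor, i.e., closer relatives."""
    depth = 0
    for x, y in zip(path_to(tree, a), path_to(tree, b)):
        if x is not y:                     # the two paths diverge below the common ancestor
            break
        depth += 1
    return depth


# Assumed topology; the slide's figure is not available.
tree = ((("Phoronida", "Brachiopoda"), "Arthropoda"), "Vertebrata")

for taxon in ("Phoronida", "Brachiopoda", "Arthropoda"):
    print(taxon, mrca_depth(tree, taxon, "Vertebrata"))  # 1 for each: equidistant
print(mrca_depth(tree, "Phoronida", "Brachiopoda"))      # 3
```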
Cladograms & Phylograms
(collectively Dendrograms)
[Figure: the same seven taxa (Bacterium 1 to 3, Eukaryote 1 to 4) drawn as a cladogram and as a phylogram.]
Cladograms show branching order; branch lengths are meaningless.
Phylograms show branching order and branch lengths.
Unscaled phylogram
Scaled phylogram
The branch length is the number of changes (e.g., nucleotide substitutions) that have occurred along a branch. The total number of changes in a particular tree is called the tree length.
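A minimal Python sketch of these two definitions, assuming aligned sequences are available for every node. In practice the sequences at internal nodes are not observed and must be inferred; the sequences and node names below are hypothetical. Each branch length is counted as the number of sites that differ between the two ends of the branch, and the tree length is their sum.

```python
def changes(seq1, seq2):
    """Number of sites at which two aligned sequences differ (changes along one branch)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(a != b for a, b in zip(seq1, seq2))


# Hypothetical sequences for every node, including the root and one internal node (HTU1).
sequences = {
    "root": "ACGTAC",
    "HTU1": "ACGTTC",
    "A": "ACGTTC",
    "B": "ACCTTG",
    "C": "TCGTAC",
}
branches = [("root", "HTU1"), ("HTU1", "A"), ("HTU1", "B"), ("root", "C")]

branch_lengths = {b: changes(sequences[b[0]], sequences[b[1]]) for b in branches}
tree_length = sum(branch_lengths.values())
print(branch_lengths)  # 1, 0, 2, and 1 changes on the four branches
print(tree_length)     # 4
```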
Tree balance
Tree balance is a measure of the degree of symmetry of a
rooted phylogenetic tree. It serves as an indication of the
pattern of speciation events in the group of taxa under
study.
Balanced tree
Unbalanced or Pectinate (comb-like) tree
In an unbalanced tree, only one descendant of a node
continues to speciate after a splitting event. In a balanced
tree, all descendants of a node participate equally in
cladogenesis.
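Balance can be quantified. One common measure, Sackin's index (after Sackin 1972), sums, over all OTUs, the number of internal nodes between each OTU and the root; larger values indicate less balanced trees. A minimal Python sketch, assuming rooted trees written as nested tuples (the helpers `balanced` and `pectinate` are hypothetical):

```python
def sackin(tree, depth=0):
    """Sackin's index: sum over all OTUs of the number of internal nodes on the path to the root."""
    if isinstance(tree, str):
        return depth
    return sum(sackin(child, depth + 1) for child in tree)


def pectinate(taxa):
    """Comb-like tree: each new OTU branches off next to the growing clade."""
    tree = taxa[0]
    for taxon in taxa[1:]:
        tree = (tree, taxon)
    return tree


def balanced(taxa):
    """Balanced tree: split the OTUs in half at every node."""
    if len(taxa) == 1:
        return taxa[0]
    mid = len(taxa) // 2
    return (balanced(taxa[:mid]), balanced(taxa[mid:]))


taxa = list("ABCDEFGH")
print(sackin(balanced(taxa)))   # 24: every OTU is 3 internal nodes from the root
print(sackin(pectinate(taxa)))  # 35 = 7 + 7 + 6 + 5 + 4 + 3 + 2 + 1
```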
Tree balance is an important indicator of the ease of phylogenetic reconstruction. Because, by definition, unbalanced trees contain long branches, they are more difficult to reconstruct phylogenetically than balanced trees. In fact, balanced and unbalanced trees are sometimes referred to as “good” and “bad” trees, respectively (Sackin 1972).
How to describe a
phylogenetic tree
in computerese?
The Newick format
In computer programs, trees are
represented in a linear form by a string of
nested parentheses, enclosing taxon
names (and possibly also branch lengths
and bootstrap values), and separated by
commas. This type of representation is
called the Newick format. The originator
of this format in mathematics was Arthur
Cayley (1821–1895).
The Newick format for phylogenetic trees was adopted on June 26,
1986 at an informal meeting at Newick's Lobster House in Dover,
New Hampshire. The Newick format currently serves as the de facto standard for representing phylogenetic trees and is employed by almost all phylogenetic software tools. Unfortunately, it has never been described in a formal publication; it was first mentioned in a publication in 1992.
In the Newick format, the pattern of the parentheses indicates the topology of the tree by having each pair of parentheses enclose all members of a monophyletic group. A phylogenetic tree in the Newick format always ends in a semicolon (;).
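A minimal recursive-descent reader, in Python, for the subset of the format described here: nested parentheses, comma-separated OTU names, optional labels on internal nodes (often used for bootstrap values), optional :branch lengths, and the closing semicolon. It is a sketch, not a full implementation; quoted labels and comments are not handled, and the (name, length, children) representation and the function names are assumptions.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with a semicolon")
    pos = 0

    def read_label():
        # Read characters up to the next structural symbol; the final ';' stops the scan.
        nonlocal pos
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        return text[start:pos].strip()

    def read_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                       # skip "("
            children.append(read_node())
            while text[pos] == ",":        # siblings are separated by commas
                pos += 1
                children.append(read_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                       # skip ")"
        name = read_label() or None        # OTU name, or internal label such as a bootstrap value
        length = None
        if text[pos] == ":":
            pos += 1
            length = float(read_label())
        return (name, length, children)

    tree = read_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return tree


def otus(node):
    """List the OTU names (terminal nodes) from left to right."""
    name, _, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in otus(child)]


tree = parse_newick("((A:0.1,B:0.2)90:0.3,C:0.4);")
print(tree)        # (None, None, [('90', 0.3, [('A', 0.1, []), ('B', 0.2, [])]), ('C', 0.4, [])])
print(otus(tree))  # ['A', 'B', 'C']
```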
One can use the Newick format to write
down rooted trees, unrooted trees,
multifurcations, branch lengths, and
bootstrap values.
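Going the other way, here is a minimal Python sketch that writes such strings from nested (name, length, children) tuples; the representation and the function names are assumptions. An unrooted tree is conventionally written with a basal trifurcation, so the same string can also be read as a rooted tree with a polytomy; whether it is treated as rooted or unrooted depends on the program reading it.

```python
def to_newick(node):
    """Write a (name, length, children) node, recursively, in Newick notation (no final ';')."""
    name, length, children = node
    text = ""
    if children:
        text = "(" + ",".join(to_newick(child) for child in children) + ")"
    if name is not None:
        text += str(name)                  # OTU name, or internal label such as a bootstrap value
    if length is not None:
        text += f":{length}"               # branch length
    return text


def newick(tree):
    """A complete Newick tree always ends in a semicolon."""
    return to_newick(tree) + ";"


def leaf(name, length=None):
    return (name, length, [])


# Hypothetical example trees.
rooted = (None, None, [(None, None, [leaf("A"), leaf("B")]), leaf("C")])
unrooted = (None, None, [leaf("A"), leaf("B"), (None, None, [leaf("C"), leaf("D")])])
star = (None, None, [leaf("A"), leaf("B"), leaf("C"), leaf("D")])  # a multifurcation
annotated = (None, None, [("95", 0.3, [leaf("A", 0.1), leaf("B", 0.2)]), leaf("C", 0.4)])

for tree in (rooted, unrooted, star, annotated):
    print(newick(tree))
# ((A,B),C);
# (A,B,(C,D));
# (A,B,C,D);
# ((A:0.1,B:0.2)95:0.3,C:0.4);
```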
3 OTUs: 1 unrooted tree = 3 rooted trees
4 OTUs: 3 unrooted trees = 15 rooted trees
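These counts can be checked by brute force. Every rooted tree for n OTUs arises exactly once by attaching the n-th OTU to one of the branches of a rooted tree for the first n - 1 OTUs, including a new branch above the root. A minimal Python sketch of this stepwise addition, assuming trees written as nested tuples (the function names are hypothetical):

```python
def attach(tree, taxon):
    """Yield every rooted tree obtained by attaching `taxon` to one branch of `tree`,
    including a new branch above the current root."""
    yield (tree, taxon)                    # the new OTU joins just above this (sub)tree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach(left, taxon):
            yield (new_left, right)
        for new_right in attach(right, taxon):
            yield (left, new_right)


def rooted_trees(taxa):
    """All bifurcating rooted trees for two or more OTUs, built by stepwise addition."""
    trees = [(taxa[0], taxa[1])]
    for taxon in taxa[2:]:
        trees = [new for tree in trees for new in attach(tree, taxon)]
    return trees


print(rooted_trees(["A", "B", "C"]))
# [(('A', 'B'), 'C'), (('A', 'C'), 'B'), ('A', ('B', 'C'))]
print(len(rooted_trees(list("ABCD"))), len(rooted_trees(list("ABCDE"))))  # 15 105
```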
The number of possible bifurcating rooted trees ($N_R$) for $n \ge 2$ OTUs:
$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
The number of possible bifurcating unrooted trees ($N_U$) for $n \ge 3$ OTUs:
$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
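The two formulas translate directly into Python with exact integer arithmetic (the function names are hypothetical). The values agree with the 3 and 15 rooted trees given above for 3 and 4 OTUs, and they show that $N_U$ for n OTUs equals $N_R$ for n - 1 OTUs.

```python
from math import factorial


def n_rooted(n):
    """Number of bifurcating rooted trees for n >= 2 OTUs: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted(n):
    """Number of bifurcating unrooted trees for n >= 3 OTUs: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 6, 10, 20):
    print(n, n_unrooted(n), n_rooted(n))
# 3 1 3
# 4 3 15
# 5 15 105
# 6 105 945
# 10 2027025 34459425
# 20 221643095476699771875 8200794532637891559375
```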
Number of OTUs | Number of possible rooted trees
2 | 1
3 | 3
4 | 15
5 | 105
6 | 945
7 | 10,395
8 | 135,135
9 | 2,027,025
10 | 34,459,425
15 | 213,458,046,676,875
20 | 8,200,794,532,637,891,559,375
Evolution is an historical process.
Only one historical narrative is true.
For 20 OTUs, of the 8,200,794,532,637,891,559,375 possible rooted trees, 1 is true and 8,200,794,532,637,891,559,374 are false.
Truth is one, falsehoods are many.
How do we know which of the
8,200,794,532,637,891,559,375
trees is true?
We don’t; we infer by using decision criteria.
True and inferred trees
The sequence of speciation events that has led to
the formation of a group of OTUs is historically
unique. A tree representing the true evolutionary
history is called the true tree.
A tree that is obtained by using a certain set of
data and a certain method of tree reconstruction is
called an inferred tree.
An inferred tree may or may not be the true
tree.
Cladogenesis = the splitting of an evolutionary lineage into two genetically independent lineages.
[Figure: an ancestor splitting into descendant 1 and descendant 2.]
Anagenesis = changes occurring along an evolutionary lineage.
[Figure: a single lineage leading to a descendant.]
In molecular
phylogenetics, we
assume that species
are only created by
cladogenesis.
Species Trees & Gene Trees
At every locus, if we trace back the history of any two alleles from
any two populations, we will eventually reach a common ancestral
allele from which both contemporary alleles have been derived.
The routes of inheritance represent the passage of genes from
parents to offspring, and the branching pattern depicts a gene tree.
Different genes, however, may have different evolutionary
histories, i.e., different routes of inheritance, different gene trees.
The routes of inheritance are mostly confined by reproductive barriers; that is, gene flow occurs mainly within the species. A species is therefore like a bundle of genetic connections, in which many entangled parent-offspring lines form the ties that bundle individuals together into a species lineage.
A gene tree may differ from a species tree
S = Divergence
time for species
1 and 2
G1 = Divergence time inferred by using alleles a and f
Alleles d and b are closer to each other than alleles d and f.
Incomplete lineage sorting due to polymorphism
at speciation time
Gene trees and species trees
[Figure: a gene tree of alleles a, b, and c drawn within a species tree.]
It is often assumed that gene trees always equal species trees. This may not be true.
Taxon (singular); Taxa (plural)
A taxon is a species or a group of species that has been given a name, e.g., Homo sapiens (modern humans) or Lepidoptera (butterflies and moths).
There are codes of biological nomenclature that seek to ensure that every taxon has a single and stable name, and that every name is used for only one taxon.
Clades*
• Strictly: A clade is a group of all the taxa that have been
derived from a common ancestor plus the common ancestor
itself.
• In molecular phylogenetics: A clade is a group of taxa under
study that share a common ancestor, which is not shared by
any other species outside the group.
*also: monophyletic groups, natural clades
Paraphyletic Taxa
• A taxon whose common ancestor is also the ancestor of taxa not included in it is called a paraphyletic taxon or an invalid taxon.
Reptiles are paraphyletic.
• A named taxon that lacks phylogenetic validity,
but is nonetheless used, is called a convenience
taxon.
“a convenience fish”
Sister Taxa
• If a clade is composed
of two taxa, these are
referred to as sister
taxa.
Birds and crocodiles are
sister taxa.
Which of the following groups are not monophyletic?
[Figure: tree of E. coli, rat, mouse, baboon, chimp, and human.]
a. human, chimpanzee, baboon
b. mouse, chimpanzee, baboon
c. rat, mouse
d. human, chimpanzee, baboon, rat, mouse
e. E. coli, human, chimpanzee, baboon, rat, mouse
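A group is monophyletic if it contains all the descendants of its common ancestor, i.e., if it coincides with one of the clades of the tree. The tree figure for this question did not survive extraction, so the Python sketch below assumes the topology (E. coli, ((rat, mouse), (baboon, (chimpanzee, human)))), rooted with E. coli as the outgroup; the function names are hypothetical.

```python
def clades(tree):
    """All clades of a rooted tree (nested tuples of names), each as a frozenset of OTU names."""
    found = set()

    def collect(node):
        if isinstance(node, str):
            members = frozenset([node])
        else:
            members = frozenset().union(*(collect(child) for child in node))
        found.add(members)
        return members

    collect(tree)
    return found


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of OTUs descended from one node of `tree`."""
    return frozenset(group) in clades(tree)


# Assumed topology; the original figure is not available.
tree = ("E. coli", (("rat", "mouse"), ("baboon", ("chimpanzee", "human"))))

groups = {
    "a": ["human", "chimpanzee", "baboon"],
    "b": ["mouse", "chimpanzee", "baboon"],
    "c": ["rat", "mouse"],
    "d": ["human", "chimpanzee", "baboon", "rat", "mouse"],
    "e": ["E. coli", "human", "chimpanzee", "baboon", "rat", "mouse"],
}
for label, group in groups.items():
    print(label, is_monophyletic(tree, group))
# a True
# b False
# c True
# d True
# e True
```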
Two or more sequences are said to be homologous if they
are related by descent. Homology is often ascertained on
the basis of sequence similarity. Thus, if two or more
sequences exhibit high degrees of similarity, it is likely
(but not always the case) that they are homologous.
Sequence similarity may also arise without common
ancestry: by chance, or due to convergence driven by
similar selective pressures. Such sequences, which are
similar but not homologous, are said to be analogous.
Homology is a qualitative statement.
Similarity is a quantitative and, hence, quantifiable
statement (e.g., percent similarity, percent identity).
Similarity is a fact.
Homology is a hypothesis.
Of course, as with any other scientific hypothesis,
homology between two sequences may be tested and every
so often rejected.
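Because similarity is quantitative, it can be computed directly. A minimal Python sketch of percent identity between two already aligned sequences, assuming that alignment columns with a gap ('-') in either sequence are skipped; other gap conventions exist, and percent similarity for proteins would additionally require a rule for which amino acid replacements count as similar. The function name is hypothetical.

```python
def percent_identity(seq1, seq2, gap="-"):
    """Percentage of gap-free aligned positions at which the two sequences are identical."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("ACGTACGT", "ACGAACGA"))  # 75.0 (6 of 8 positions identical)
print(percent_identity("ACGT-CGT", "ACGTTCGA"))  # about 85.7 (6 of 7 gap-free positions)
```

The resulting number is a fact about the two sequences; whether they are homologous remains a hypothesis.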
Types of homology
•Orthology: Similarity due to speciation.
•Paralogy: Similarity due to gene
duplication.
•Ohnology: A special case of paralogy in
which similarity is due to genome
duplication.
•Xenology: Similarity due to horizontal
gene transfer.
Orthologs and Paralogs
[Figure: gene tree descending from an ancestral gene, with genes a, b, c and A, B, C, and pairs of genes marked as orthologous or paralogous.]
Duplication yields 2 copies (paralogs) on the same genome.
Only b, C, and A are sampled.
[Figure: the same gene tree, pruned to the sampled genes b, C, and A.]
A mixture of orthologs and paralogs is sampled.
A character provides information about an individual OTU.
A distance represents a quantitative statement concerning the dissimilarity between two OTUs.
A character is a well-defined
feature that in a taxonomic unit
can assume one out of two or
more mutually exclusive
character states.
Mutually exclusive: If David is tall, David cannot be short.
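[Figure: classification of characters. A character is continuous or discrete; discrete characters are multistate or binary; multistate characters are ordered or unordered; ordered and binary characters are polar or unpolar.]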
A character is unordered if a change
from one character state to any other
character state can occur in one step.
A character is ordered if there exists a
symmetrical path of change from one character
state to another.
A character is polar if there exists an
asymmetrical (irreversible) path of change
from one character state to another.
The number of steps between two character
states is specified by a step matrix.
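A minimal Python sketch of step matrices for the three kinds of character described above. It assumes states are numbered 0, 1, 2, ... and, for the ordered and polar cases, that the states form a linear series 0, 1, 2, ...; the polar matrix further assumes that change is possible only toward higher-numbered states, and gives an impossible change infinite cost. Function names and the example history are hypothetical.

```python
import math


def unordered(k):
    """Any state can change into any other state in one step."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]


def ordered(k):
    """States form a series 0-1-2-...; each change passes through the intermediate states,
    and the cost is the same in both directions (symmetrical)."""
    return [[abs(i - j) for j in range(k)] for i in range(k)]


def polar(k):
    """Like `ordered`, but irreversible: only changes toward higher-numbered states are allowed."""
    return [[j - i if j >= i else math.inf for j in range(k)] for i in range(k)]


def steps(matrix, history):
    """Total number of steps for a sequence of character states along a lineage."""
    return sum(matrix[a][b] for a, b in zip(history, history[1:]))


history = [0, 2, 1]  # hypothetical: state 0 changes to 2, then 2 changes to 1
for name, make in (("unordered", unordered), ("ordered", ordered), ("polar", polar)):
    print(name, steps(make(3), history))
# unordered 2
# ordered 3
# polar inf
```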
Assumptions about character evolution
Methods of phylogenetic reconstruction
require that we make explicit assumptions
about:
(1) the number of discrete steps required
for one character state to change into
another.
(2) the probability with which such a
change may occur.
Temporal Polarity of Character States
Character states may be ranked by relative
antiquity into:
(1) primitive or ancestral (plesiomorphy)
(2) derived or novel (apomorphy)
Taxonomic Distribution of Character States
A primitive state that is shared by several taxa is a
symplesiomorphy.
A derived state that is shared by several taxa is a
synapomorphy.
symbiosis
sympathy
synapse
synteny
A derived character state unique to a particular taxon is an autapomorphy.
A character state that is shared by several taxa due to convergence, parallelism, or reversal, rather than due to common descent, is a homoplasy.
[Figure: a tree whose tips and nodes carry character states A, B, C, and D, labeled to illustrate plesiomorphy, symplesiomorphy, apomorphy, synapomorphy, autapomorphy, and homoplasy.]
What is swimming in shark and carp?
[Figure: tree of shark, carp, guppy, chicken, rat, and bat.]
a. symplesiomorphic
b. synapomorphic
c. autapomorphic
d. homoplastic
What are scales in guppy and carp?
a. symplesiomorphic
b. synapomorphic
c. autapomorphic
d. homoplastic
What are feathers in chicken?
a. symplesiomorphic
b. synapomorphic
c. autapomorphic
d. homoplastic
What are wings in chicken and bat?
a. symplesiomorphic
b. synapomorphic
c. autapomorphic
d. homoplastic
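The categories can be decided mechanically once a rooted tree and the ancestral state are known. A simplified Python sketch, under several loudly stated assumptions: the topology (shark, ((carp, guppy), (chicken, (rat, bat)))), the 0/1 codings of the characters (including the extra “hair” example), and the decision rules themselves, which ignore reversals and do not reconstruct ancestral states, are illustrative and not taken from the slides. Function names are hypothetical.

```python
def leaf_set(node):
    """OTU names below a node; leaves are strings, internal nodes are tuples."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(tree):
    """Leaf sets of every node in the tree."""
    found = {leaf_set(tree)}
    if isinstance(tree, tuple):
        for child in tree:
            found |= clades(child)
    return found


def classify(tree, states, ancestral, taxa):
    """Classify the state shared by `taxa` (simplified rules; no reversals considered)."""
    shared = {states[t] for t in taxa}
    if len(shared) != 1:
        raise ValueError("the taxa do not share a single state")
    state = shared.pop()
    holders = frozenset(t for t, s in states.items() if s == state)
    if state == ancestral:
        return "symplesiomorphy" if len(taxa) > 1 else "plesiomorphy"
    if len(holders) == 1:
        return "autapomorphy"
    if holders in clades(tree):            # derived state confined to exactly one clade
        return "synapomorphy"
    return "homoplasy"                     # derived state in taxa that do not form a clade


tree = ("shark", (("carp", "guppy"), ("chicken", ("rat", "bat"))))  # assumed topology
taxa = ["shark", "carp", "guppy", "chicken", "rat", "bat"]


def coding(present):
    """Hypothetical 0/1 coding: 1 for taxa in `present`, 0 otherwise."""
    return {t: int(t in present) for t in taxa}


swimming = coding({"shark", "carp", "guppy"})
feathers = coding({"chicken"})
wings = coding({"chicken", "bat"})
hair = coding({"rat", "bat"})

print(classify(tree, swimming, 1, ["shark", "carp"]))  # symplesiomorphy (1 assumed ancestral)
print(classify(tree, feathers, 0, ["chicken"]))        # autapomorphy
print(classify(tree, wings, 0, ["chicken", "bat"]))    # homoplasy
print(classify(tree, hair, 0, ["rat", "bat"]))         # synapomorphy
```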
Distance Data
Most molecular data yield character
states that are subsequently
converted into distances.
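A minimal Python sketch of converting character states into distances, assuming aligned nucleotide sequences without gaps and using the uncorrected p-distance (the proportion of differing sites), one of the simplest choices. The sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def distance_matrix(alignment):
    """Turn character data (OTU -> aligned sequence) into pairwise distances between OTUs."""
    return {
        (otu1, otu2): p_distance(alignment[otu1], alignment[otu2])
        for otu1, otu2 in combinations(sorted(alignment), 2)
    }


alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTA",
}
for pair, d in distance_matrix(alignment).items():
    print(pair, d)
# ('A', 'B') 0.1
# ('A', 'C') 0.3
# ('B', 'C') 0.2
```

The conversion discards information: a distance records how many sites differ between two OTUs, not which sites or which states.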
Ultrametricity = Strict Molecular Clock
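A distance matrix is ultrametric when it satisfies the three-point condition: for every three OTUs, the two largest of the three pairwise distances are equal. Distances measured along a rooted tree in which all OTUs are equally distant from the root, as expected under a strict molecular clock, have this property. A minimal Python sketch, with hypothetical distances and function names:

```python
from itertools import combinations


def is_ultrametric(dist, otus, tol=1e-9):
    """Three-point condition: in every triplet, the two largest distances are equal."""
    def d(x, y):
        return dist[frozenset((x, y))]

    for i, j, k in combinations(otus, 3):
        _, middle, largest = sorted([d(i, j), d(i, k), d(j, k)])
        if largest - middle > tol:
            return False
    return True


def matrix(pairs):
    """Build a symmetric lookup from {(x, y): distance}."""
    return {frozenset(pair): value for pair, value in pairs.items()}


# Hypothetical distances: `clock` fits the tree (((A,B),C),D) with all tips equidistant from the root.
clock = matrix({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6, ("A", "D"): 8, ("B", "D"): 8, ("C", "D"): 8})
no_clock = matrix({("A", "B"): 2, ("A", "C"): 5, ("B", "C"): 7, ("A", "D"): 8, ("B", "D"): 8, ("C", "D"): 8})
print(is_ultrametric(clock, "ABCD"))     # True
print(is_ultrametric(no_clock, "ABCD"))  # False
```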