Hierarchical structuring of trait data

Bot 940: Evidence for Evolution
Eric Caldera
Scientific Method
1. Observation and description: Organisms seem to have changed over time.
2. Formulation of a hypothesis: Evolution by descent with modification / common ancestry.
3. Predictions: Biological traits should appear in a nested hierarchical structure ("groups within groups").
4. Experimental test of predictions: Do traits suspected to have arisen by descent with
modification show a greater degree of hierarchical structure than expected by chance?
Nested hierarchical structure
“groups within groups”
[Figure: nested groups of organisms, with shared-derived characters marked: vascular tissue, chloroplasts, water-tight egg, four limbs]
Note that we don't see overlap across groups.
Example: there are no fungi with vascular tissue, insects with four limbs, or amphibians with vascular tissue, etc.
[Figure: shared-derived characters (vascular tissue, chloroplasts, water-tight egg, four limbs) mapped onto groups; eukaryote group labeled]
Why do we predict nested
hierarchical structure?
• Only branching evolutionary processes are
capable of generating nested hierarchical
structure.
• For example, human languages, which have
common ancestors and are derived by descent
with modification, generally can be classified
in objective nested hierarchies (Pei 1949;
Ringe 1999).
• Only certain things can be classified
objectively in a consistent, unique nested
hierarchy.
• The difference drawn here is between
"subjective" and "objective".
• Anything can be grouped into hierarchies (for
example, automobiles), but the importance of
characters must be weighted subjectively.
Subjective grouping of automobiles
[Figure: two alternative trees, A and B, grouping automobiles by the characters three wheels / four wheels and red / blue]
Note that neither tree is more parsimonious than the other.
In tree A, the number of wheels is subjectively weighted
over color; in tree B, color is weighted over the number of wheels.
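As a rough illustration of the note above, the sketch below scores every possible unrooted tree for four hypothetical automobiles using Fitch parsimony. The four cars (one per combination of wheel number and color) and all function names are assumptions made for the example, not taken from the slides.

```python
# Four hypothetical cars, one per combination of the two characters.
CARS = {
    "red 3-wheeler":  ("three wheels", "red"),
    "blue 3-wheeler": ("three wheels", "blue"),
    "red 4-wheeler":  ("four wheels", "red"),
    "blue 4-wheeler": ("four wheels", "blue"),
}

def insert_leaf(tree, leaf):
    """Yield every rooted tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)                      # attach above the current root
    if isinstance(tree, tuple):             # internal node: recurse into both sides
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)

def all_rooted_trees(leaves):
    """Build all rooted binary trees on `leaves` by stepwise leaf insertion."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in insert_leaf(tree, leaf)]
    return trees

def fitch(tree, states):
    """Fitch parsimony for one character: return (possible states, steps)."""
    if not isinstance(tree, tuple):         # leaf: its observed state, no steps
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                              # children agree: no extra step
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def score_all_trees(data):
    """Return (tree, total steps) for every unrooted topology."""
    taxa = list(data)
    n_chars = len(data[taxa[0]])
    columns = [{t: data[t][c] for t in taxa} for c in range(n_chars)]
    # Hanging the first taxon off the root of each rooted tree on the other
    # taxa visits every unrooted topology exactly once.
    trees = [(taxa[0], rest) for rest in all_rooted_trees(taxa[1:])]
    return [(tree, sum(fitch(tree, col)[1] for col in columns)) for tree in trees]

scored = score_all_trees(CARS)
shortest = min(length for _, length in scored)
best = [tree for tree, length in scored if length == shortest]
for tree in best:
    print(shortest, tree)
```

Under these assumptions, two of the three possible trees tie for shortest: one groups the cars by wheel number and the other by color, so the data alone cannot choose between them.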
• A cladistic analysis of automobiles will not produce a
unique, consistent, well-supported tree that displays
nested hierarchies.
• A cladistic analysis of automobiles (or any analysis of
randomly assigned characters) will result in a
phylogeny, but there will be a very large number of
other phylogenies, many of them with very different
topologies, that are equally well supported by the same
data.
• Cladistic analysis of an actual genealogical process will
produce one tree, or a small number of trees, that are much
better supported by the data than the other
possible trees.
• The nested hierarchical organization contrasts with other
possible biological patterns, such as the continuum of
"the great chain of being".
• Mere similarity between organisms is
not enough to support evolution.
• The nested classification pattern
produced by a branching evolutionary
process, such as common descent, is
much more specific than simple
similarity.
Is demonstrating that phylogenies show hierarchical
structure enough? How much nested structuring is
necessary to show that the structure is non-random?
www.talkorigins.org
Testing for Hierarchical
Structure
• Phylogenetic signal (the degree to which the data
support a unique, well-supported tree)
can be quantified.
– We will discuss two methods: the randomization
test and the consistency index (CI) (Archie 1989;
Klassen et al. 1991).
• For additional tests see: Faith and Cranston
1991; Farris 1989; Felsenstein 1985; Hillis
1991; Hillis and Huelsenbeck 1992;
Huelsenbeck et al. 2001.
Archie, 1989
• Provides a test to determine whether
the minimum length tree for a given
dataset is significantly different from
that expected from random data.
Archie's randomization test:
1. Randomize the character data and perform the
cladistic analysis.
2. Repeat this process to obtain a distribution
of the minimum tree lengths for the
randomized data.
3. Test whether the minimum-length tree
generated from the real data is significantly
shorter than the trees generated from the
randomized data.
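The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Archie's published procedure. It uses a six-taxon subset of the slide's character table so that every tree can be enumerated exhaustively (real analyses use heuristic searches). It randomizes by shuffling each character's states independently across taxa. It reports a one-sided permutation p-value as (1 + number of randomized lengths at or below the observed length) / (replicates + 1). All function names are hypothetical.

```python
import random

# Six taxa from the slide's real-data table (subset chosen for this sketch).
# Columns: vascular tissue, chloroplasts, water-tight egg, four limbs.
REAL = {
    "amphibians":       (0, 0, 0, 1),
    "reptiles":         (0, 0, 1, 1),
    "birds":            (0, 0, 1, 1),
    "mosses":           (0, 1, 0, 0),
    "ferns":            (1, 1, 0, 0),
    "flowering plants": (1, 1, 0, 0),
}

def insert_leaf(tree, leaf):
    """Yield every rooted tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)

def all_rooted_trees(leaves):
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in insert_leaf(tree, leaf)]
    return trees

def fitch(tree, states):
    """Fitch parsimony for one character: return (possible states, steps)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def min_tree_length(data):
    """Step 1: exhaustive 'cladistic analysis' returning the shortest length."""
    taxa = list(data)
    n_chars = len(data[taxa[0]])
    columns = [{t: data[t][c] for t in taxa} for c in range(n_chars)]
    # First taxon attached at the root covers each unrooted topology once.
    trees = [(taxa[0], rest) for rest in all_rooted_trees(taxa[1:])]
    return min(sum(fitch(tree, col)[1] for col in columns) for tree in trees)

def shuffle_within_characters(data, rng):
    """Permute each character's states across taxa, independently per character."""
    taxa = list(data)
    columns = []
    for c in range(len(data[taxa[0]])):
        column = [data[t][c] for t in taxa]
        rng.shuffle(column)
        columns.append(column)
    return {t: tuple(col[i] for col in columns) for i, t in enumerate(taxa)}

def randomization_test(data, n_reps=99, seed=0):
    rng = random.Random(seed)
    observed = min_tree_length(data)
    # Step 2: distribution of minimum lengths for randomized data.
    null = [min_tree_length(shuffle_within_characters(data, rng))
            for _ in range(n_reps)]
    # Step 3: how often is randomized data at least as short as the real data?
    p_value = (1 + sum(length <= observed for length in null)) / (n_reps + 1)
    return observed, null, p_value

observed, null, p_value = randomization_test(REAL)
print("observed minimum length:", observed)
print("randomized minimum lengths range:", min(null), "to", max(null))
print("p-value:", p_value)
```

With only four characters the test has little power; the point of the sketch is the workflow, not the particular p-value it prints.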
Real data
Taxon              Vascular tissue   Chloroplasts   Water-tight egg   Four limbs
bacteria           0                 0              0                 0
amphibians         0                 0              0                 1
humans             0                 0              1                 1
mammals            0                 0              1                 1
birds              0                 0              1                 1
reptiles           0                 0              1                 1
fishes             0                 0              0                 0
insects            0                 0              0                 0
fungi              0                 0              0                 0
mosses             0                 1              0                 0
ferns              1                 1              0                 0
flowering plants   1                 1              0                 0
Randomized data
Taxon              Vascular tissue   Chloroplasts   Water-tight egg   Four limbs
bacteria           1                 0              0                 1
amphibians         0                 0              0                 0
humans             0                 1              1                 0
mammals            0                 0              0                 0
birds              0                 0              1                 0
reptiles           0                 1              0                 1
fishes             0                 0              1                 0
insects            0                 0              0                 1
fungi              0                 0              0                 1
mosses             1                 0              0                 0
ferns              0                 1              1                 1
flowering plants   0                 0              0                 0
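The "groups within groups" pattern can be checked directly on the real and randomized tables above. The sketch below assumes that state 0 is ancestral for every character, so that the sets of taxa carrying each derived trait should be either nested or disjoint. It lists every pair of characters whose carrier sets partially overlap. The function names are illustrative only.

```python
from itertools import combinations

CHARACTERS = ["vascular tissue", "chloroplasts", "water-tight egg", "four limbs"]

# Rows copied from the slide's two tables, in the column order above.
REAL = {
    "bacteria": (0, 0, 0, 0), "amphibians": (0, 0, 0, 1),
    "humans": (0, 0, 1, 1), "mammals": (0, 0, 1, 1),
    "birds": (0, 0, 1, 1), "reptiles": (0, 0, 1, 1),
    "fishes": (0, 0, 0, 0), "insects": (0, 0, 0, 0),
    "fungi": (0, 0, 0, 0), "mosses": (0, 1, 0, 0),
    "ferns": (1, 1, 0, 0), "flowering plants": (1, 1, 0, 0),
}
RANDOMIZED = {
    "bacteria": (1, 0, 0, 1), "amphibians": (0, 0, 0, 0),
    "humans": (0, 1, 1, 0), "mammals": (0, 0, 0, 0),
    "birds": (0, 0, 1, 0), "reptiles": (0, 1, 0, 1),
    "fishes": (0, 0, 1, 0), "insects": (0, 0, 0, 1),
    "fungi": (0, 0, 0, 1), "mosses": (1, 0, 0, 0),
    "ferns": (0, 1, 1, 1), "flowering plants": (0, 0, 0, 0),
}

def carriers(data, index):
    """Set of taxa that show the derived state (1) for one character."""
    return {taxon for taxon, states in data.items() if states[index] == 1}

def overlapping_pairs(data):
    """Character pairs whose carrier sets are neither nested nor disjoint."""
    conflicts = []
    for i, j in combinations(range(len(CHARACTERS)), 2):
        a, b = carriers(data, i), carriers(data, j)
        nested_or_disjoint = a <= b or b <= a or not (a & b)
        if not nested_or_disjoint:
            conflicts.append((CHARACTERS[i], CHARACTERS[j]))
    return conflicts

print("real data conflicts:", overlapping_pairs(REAL))
print("randomized data conflicts:", overlapping_pairs(RANDOMIZED))
```

The real table yields no conflicting pairs, while the randomized table yields four of the six possible pairs. A parsimony analysis is needed only because real datasets are rarely this clean.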
[Figure: distribution of minimum tree lengths from randomized datasets, with the minimum-length tree from the real data marked (Archie 1989)]
Klassen et al. 1991
• Makes use of the consistency index (CI).
• CI is the minimum possible number of character-state
changes divided by the number of changes (steps) required
on the tree; for binary characters that each require at
least one change, this is the reciprocal of the average
number of steps per character.
– The further CI is from one and the closer to zero,
the greater the homoplasy.
• A problem with CI is that it seems to
decrease as a function of the number of
characters and taxa.
• Klassen et al. generated CI distributions
for random datasets of varying numbers
of taxa and characters.
• The CI values for random data can then
be compared to those for real data.
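The consistency index described above can be computed for any given tree as (minimum possible steps) / (steps required on that tree). The sketch below is a minimal illustration that counts steps with Fitch parsimony. The six-taxon subset of the slide's real data, the two example trees, and the function names are assumptions made for the example. It does not reproduce Klassen et al.'s random-data distributions.

```python
# Six taxa from the slide's real-data table (subset chosen for this sketch).
# Columns: vascular tissue, chloroplasts, water-tight egg, four limbs.
DATA = {
    "amphibians":       (0, 0, 0, 1),
    "reptiles":         (0, 0, 1, 1),
    "birds":            (0, 0, 1, 1),
    "mosses":           (0, 1, 0, 0),
    "ferns":            (1, 1, 0, 0),
    "flowering plants": (1, 1, 0, 0),
}

# Two hypothetical trees written as nested pairs of taxa.
CONGRUENT = (("amphibians", ("reptiles", "birds")),
             ("mosses", ("ferns", "flowering plants")))
SCRAMBLED = (("amphibians", "mosses"),
             (("reptiles", "ferns"), ("birds", "flowering plants")))

def fitch(tree, states):
    """Fitch parsimony for one character: return (possible states, steps)."""
    if not isinstance(tree, tuple):         # leaf: its observed state, no steps
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                              # children agree: no extra step
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def consistency_index(tree, data):
    """CI = (fewest steps any tree could need) / (steps this tree needs).

    Assumes at least one character varies, so the denominator is non-zero.
    """
    taxa = list(data)
    min_steps = 0
    observed_steps = 0
    for c in range(len(data[taxa[0]])):
        column = {t: data[t][c] for t in taxa}
        min_steps += len(set(column.values())) - 1   # one step per extra state
        observed_steps += fitch(tree, column)[1]
    return min_steps / observed_steps

print("CI, congruent tree:", consistency_index(CONGRUENT, DATA))
print("CI, scrambled tree:", consistency_index(SCRAMBLED, DATA))
```

On the congruent tree every character changes exactly once, so CI is 1. On the scrambled tree the same four characters need ten steps, so CI drops to 0.4, reflecting the homoplasy that tree implies.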
Note that most real datasets are well above the 95%
confidence interval for CI values from random data, and in
no case are they below the 95% limit.
Klassen et al. 1991
Did GOD do it?
• How might you respond to the idea that
nested hierarchical structure is seen because
certain traits, by design, work better
together?
• Is testing hierarchical structure against a
"random" alternative sufficient? What about
those who argue that God did not design
things "randomly"?
Archie, J. W. (1989) "A randomization test for phylogenetic information in
systematic data." Systematic Zoology 38: 219-252.
Faith, D. P., and Cranston, P. S. (1991) "Could a cladogram this short have arisen
by chance alone?: on permutation tests for cladistic structure." Cladistics 7: 1-28.
Farris, J. S. (1989) "The retention index and the rescaled consistency index."
Cladistics 5:417-419.
Felsenstein, J. (1985) "Confidence limits on phylogenies: an approach using the
bootstrap." Evolution 39: 783-791.
Hillis, D. M. (1991) "Discriminating between phylogenetic signal and random
noise in DNA sequences." In Phylogenetic analysis of DNA sequences. pp. 278-294
M. M. Miyamoto and J. Cracraft, eds. New York: Oxford University Press.
Hillis, D. M., and Huelsenbeck, J. P. (1992) "Signal, noise, and reliability in
molecular phylogenetic analyses." Journal of Heredity 83: 189-195.
Hillis, D. M., Moritz, C. and Mable, B. K. Eds. (1996) Molecular systematics.
Sunderland, MA: Sinauer Associates.
Klassen, G. J., Mooi, R. D., and Locke, A. (1991) "Consistency indices and random
data." Syst. Zool. 40:446-457.
Pei, M. (1949) The Story of Language. Philadelphia: Lippincott.
Ringe, D. (1999) "Language classification: scientific and unscientific
methods." in The Human Inheritance, ed. B. Sykes. Oxford: Oxford
University Press, pp. 45-74.