Laboratory 10
Seed Plant Phylogeny (week two)
Compare your trees with the sample trees that your teaching fellow has handed out in lab this week.
Make sure that you understand how the character changes marked on the trees yield the character states
listed in the 6-character data table. Also make sure you understand how the tree length (total number
of character-state changes) was computed. Finally, locate the character reversals (from 1 to 0) and
parallelisms (two or more identical changes of the same character, either 0 to 1 or 1 to 0) on the three
trees.
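The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the list of changes is invented (it is not taken from the sample trees), and the function name `summarize_changes` is hypothetical. Each entry records the branch, the character number, and the states before and after the change.

```python
from collections import Counter


def summarize_changes(changes):
    """Tally tree length, reversals, and parallelisms.

    `changes` is a list of (branch, character, old_state, new_state) tuples,
    one tuple per character-state change marked on a tree.
    """
    # Tree length is simply the total number of character-state changes.
    length = len(changes)
    # A reversal is a change from 1 back to 0.
    reversals = [c for c in changes if (c[2], c[3]) == (1, 0)]
    # A parallelism is the same change of the same character on 2+ branches.
    counts = Counter((char, old, new) for _, char, old, new in changes)
    parallelisms = {key: n for key, n in counts.items() if n >= 2}
    return length, reversals, parallelisms


# Hypothetical changes, invented for illustration only.
example_changes = [
    ("branch A", 1, 0, 1),
    ("branch B", 2, 0, 1),
    ("branch C", 2, 0, 1),  # same change as on branch B: a parallelism
    ("branch D", 1, 1, 0),  # 1 -> 0: a reversal
    ("branch E", 3, 0, 1),
]

length, reversals, parallelisms = summarize_changes(example_changes)
print("tree length:", length)
print("reversals:", reversals)
print("parallelisms:", parallelisms)
```

With these made-up changes the tree length is 5, with one reversal (character 1) and one parallelism (character 2 changing 0 to 1 on two branches).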
IV. Building the Full Tree via Computer and Manipulating It to Test Ideas.
Consider the following: the trees you have just finished looking at are for eight groups of organisms
based on six characters. The most recent analysis of the seed plants, done by Kevin Nixon and
coworkers (Nixon et al., 1994) included 103 characters for 14 living groups and 14 extinct groups - a
data table of 103 x 28 cells! No one in her right mind would attempt to infer a phylogeny (build a tree)
from the huge pile of data in this data set by hand. What we do instead is leave it to the computers to
do the job. You still must study the plants and collect the character states, build the data table, and tell
the computer how to do the analysis - but most of the time is in finding the shortest tree, and this
process the computer can do.
In this exercise we will use Nixon’s data set, though only a subset of the characters and only for the
living seed plants, to infer phylogenies (that is, build trees that hypothesize evolutionary history) using
computer programs. Though there is an array of available programs, most do basically the same thing:
they build networks based on similarities, then root the trees using the outgroup criterion (that is, a
character shared by the outgroup and some of the members of the study group is primitive; see p. 8).
For this exercise, we will use two different programs, because they have different capacities:
-- PAUP (stands for Phylogenetic Analysis Using Parsimony) - to find the tree with the fewest
character-state changes (most parsimonious tree).
-- MacClade - to 1) show the tree, 2) learn about the character distributions on the shortest tree, and 3)
consider alternative trees that are longer but assume possibly more appealing evolutionary histories.
A. Finding the Shortest Tree using PAUP
For this part of the exercise, you need to get used to the Macintosh computers in the computer lab in
the cellar of Marsh Life Science Building. All the files you need should be in your folder - but make
sure you have them. There are three:
-- A copy of PAUP 3.1.1
-- A copy of MacClade version 3.01
-- A copy of the seed plant data file, “seedplant.dat”
Okay, let’s begin.
1. Open the file called “seedplant.dat.” You may get a message saying that the file is locked and
asking whether you want to open it anyway. Answer yes. Since it is a PAUP file, the computer
will open PAUP, then open your data file. Now, choose Save as... from the File pull-down
menu and follow the prompts to give the file a name, which makes your own copy of the data file.
Give it any name you like, but we’ll call this file “yourfile.dat” from now on. Save your
file on the desktop.
(If you are not used to Macintoshes, the way to do this is to 1) move the arrow to the word
“File” in the white ribbon at the top of the screen, 2) move the arrow down to the word “Save
as...”, and 3) let go. The computer will guide you from here on.)
2. Now look at the structure of the file. The design of this file is a consequence of the
programmer’s approach to inferring phylogenies using a computer and the particular
programming language he chose. Let’s take a tour of the file.
a. At the top of the file is the file-type label, “#NEXUS”. The third line, “BEGIN
DATA;” is a signal to the computer that the data are about to be fed to it. Then,
under “DIMENSIONS”, the file describes the number of groups we are going to
provide data for (NTAX=11), then the number of characters provided for each of
these groups (NCHAR=103).
b. Then, in the fifth line, comes some language defining characters in the data table.
These are useful to understand.
FORMAT MISSING=? --- If you look at the MATRIX (data table) below, you will
see a number of question marks. These may mean one of three things:
i. truly missing data (for instance if no one has ever studied that part of the
plant)
ii. data missing because a structure has not been invented by a group (for
instance fruit structure for a pine tree makes no sense because pines don’t
have fruits.)
iii. in the case of our data (from Nixon et al., 1994) a character that
- varies within a group
- is confusing as to its homology (for instance it may be unclear exactly
what a leaf is in a group)
GAP=. --- This doesn’t apply to this data set.
SYMBOLS= “0,1,2,3” --- This defines the numbers used to represent character states. In this larger data
set some characters have more than two character states. For instance, consider character 27, vein
orders. This character represents the number of times that veins branch, and the number varies. Nixon
et al. chose to divide the character into three character states - not branched, branched once, and
branched twice or more. In the matrix these character states are represented by 0, 1, and 2.
The vein-order character also introduces the idea of defining primitive versus derived character
states. This particular character is considered “ordered”; that is, unbranched veins are primitive,
singly branched veins are derived from unbranched veins, and the more complex branching (branched
twice or more) is in turn derived from the singly branched veins.
Sometimes, scholars do not want to define character states as primitive or derived, in which case the
character is labeled “unordered”.
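The practical effect of the ordered/unordered distinction on tree length can be sketched as follows. This is a minimal illustration of the usual parsimony convention (an ordered character is charged one step for each state it passes through, an unordered character is charged one step for any change); the function name `step_cost` is hypothetical, and PAUP’s internal implementation is not described in this handout.

```python
def step_cost(state_a, state_b, ordered):
    """Number of steps charged for a change between two character states."""
    if ordered:
        # Ordered: 0 -> 2 must pass through 1, so it costs two steps.
        return abs(state_a - state_b)
    # Unordered: any change between different states costs one step.
    return 0 if state_a == state_b else 1


# Character 27, vein orders: 0 = not branched, 1 = branched once,
# 2 = branched twice or more.
print(step_cost(0, 2, ordered=True))   # 2
print(step_cost(0, 2, ordered=False))  # 1
print(step_cost(1, 1, ordered=True))   # 0
```

So for a two-state (0/1) character the choice makes no difference; it only matters for characters with three or more states.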
c. Next comes the MATRIX command, which tells the computer that the data are
next in the file. It ignores the numbers in brackets, i.e. [10 20...], which are
there to help you tell which character number you are looking at in the matrix below.
The matrix itself includes a number or a question mark for each character for each of
the eleven living groups (evolutionary lineages) we are asking the program to build a
tree for.
d. Down at the bottom of the file is a block of commands under “BEGIN
ASSUMPTIONS”. These tell the computer about our particular choices for
analyzing these data.
i. TYPESET * mixed distinguishes between the unordered and ordered
characters in the data set. These are Nixon et al.’s opinions about which is
which, from the original article.
ii. EXSET * exclude is the list of characters we are excluding in this
exercise. These are removed so that just enough characters (32) are
included to give us the same answer as all 103 characters, and it will make it
simpler for you to deal with the trees when you make them.
The final characters are {13 17 20 22 33-35 38-40 47 57-61 64 65 70 71 73 74
83-92}. Here are the characters and their character states:

CHAR.                        KIND         STATE 0                  STATE 1                     STATE 2
13. Vessels                               absent                   present
17. Lignin Subunits                       vanillan                 syringal groups
20. Resins                                absent                   present
22. Leaf Base                             simple                   sheathing
33. Stomates                              haplocheilic             some or all syndetocheilic
34. Astrosclereids in Leaf                absent                   present
35. Strobili                 additive     unisexual                bisexual                    functionally unisexual
38. Microsporophylls                      spiral                   whorled/opposite
39. Microsporophylls                      free                     basally fused
40. Microsporangia per unit               many                     1-4
47. Leptomate Aperture                    absent                   present
57. Microgametophyte         additive     more than four-nucleate  4-nucleate                  3-nucleate
58. Pollen Tube                           suspended                penetrating
59. Ramiform Pollen Tubes                 absent                   present
60. Stalk Cell                            present                  absent
61. Sperm                                 flagellate               nonflagellate
64. Woody Cones                           absent                   present
65. Compound Cone Units                   many                     few
70. Ovules                                orthotropous             anatropous
71. Micropyle                             normal                   tubular
73. Ovule Growth                          pachychalazal            endochalazal
74. Outer Seed Envelope                   absent                   present
83. Megaspore Tetrad         nonadditive  tetrahedral              linear                      isobilateral
84. Megaspore Wall                        thick                    thin/absent
85. Megagametophyte                       monosporic               tetrasporic
86. Megagametophyte                       alveolar                 nonalveolar
87. Archegonia                            present                  absent
88. Egg                                   cellular                 free-nuclear
89. Early Embryogeny                      free-nuclear             cellular
90. Embryo Maturity                       postshed                 preshed
91. Embryo Feeder                         absent                   present
92. Seed Germination                      cryptocotylar            phanerocotylar

iii. ANCSTATES allzero = 0:ALL. This is how we actually root the tree,
in this case by defining the zero character state of every character in the data
table as the most primitive. Nixon et al. decided which character states were
most primitive by the outgroup-comparison method we discussed last week,
and coded the character states so that 0 is most primitive for each one.
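To check the character counts quoted in this section, the list of final characters can be expanded and counted. This is a small sketch; the helper name `expand_char_set` is hypothetical, and the total of 103 characters comes from NCHAR=103 in the file.

```python
def expand_char_set(spec):
    """Expand a string such as '13 17 33-35' into a sorted list of integers."""
    chars = set()
    for token in spec.split():
        if "-" in token:
            # A range such as 33-35 includes both endpoints.
            first, last = token.split("-")
            chars.update(range(int(first), int(last) + 1))
        else:
            chars.add(int(token))
    return sorted(chars)


# The final characters listed in this handout.
INCLUDED_SPEC = "13 17 20 22 33-35 38-40 47 57-61 64 65 70 71 73 74 83-92"
NCHAR = 103

included = expand_char_set(INCLUDED_SPEC)
excluded = [c for c in range(1, NCHAR + 1) if c not in included]

print(len(included))  # 32 characters used in the analysis
print(len(excluded))  # 71 characters removed by the EXSET command
```

The 32 included and 71 excluded characters together account for all 103 columns of the matrix.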
3. It’s time to run the program. Choose the File pull-down menu from the white bar at the top
of the screen and select Execute yourfile.dat to run PAUP, the program that will build
your phylogenetic tree for you, using your file.
a. PAUP will first read the data for our 11 groups using the 32 selected characters.
PAUP reports on its work. You should see the following messages:
Processing of file “yourfile.dat nexus” begins...
Data matrix has 11 taxa, 103 characters
Valid character-state symbols: 0123
Missing data identified by ‘?’
Gaps identified by ‘-’, treated as “missing”
Processing of “yourfile.dat nexus” completed.
PAUP counts all the characters in the data file to come up with 103, but will only use
32, because of the EXSET * exclude command we included. Taxa are our groups.
b. Next we need to ask PAUP to search for the shortest tree. From the Search pull-down
menu choose Heuristic. This sort of search for trees is the fastest, but it may not find the
shortest tree of all - we choose this kind of search to save you time.
The next menu that appears offers you the options that go with the heuristic search. Simply
tap return, since we’ll use all the default options. PAUP then does the search, and you get two
reports: one about the details of the parameters for the search, and a second that reports on the
results of the search and includes a close button.
Notice that the program tried 416 rearrangements and found one shortest tree that is 37 steps
long.
Now go ahead and click on the close button.
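To get a feeling for why a heuristic search is used, it helps to count how many different trees are possible. The sketch below uses the standard formula for the number of rooted, fully bifurcating trees for n labeled groups, (2n - 3)!! = 1 x 3 x 5 x ... x (2n - 3). The function name is hypothetical, and the calculation says nothing about which trees PAUP actually examined.

```python
def count_rooted_trees(n_taxa):
    """Number of distinct rooted, fully bifurcating trees for n labeled taxa.

    Uses the double-factorial formula (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3).
    """
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


print(count_rooted_trees(8))   # 135135 (the eight groups from last week)
print(count_rooted_trees(11))  # 654729075 (the eleven living groups)
```

Compared with more than 650 million possible rooted trees for eleven groups, the 416 rearrangements reported by PAUP are a tiny sample, which is why a heuristic search is fast but cannot guarantee the shortest tree.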
c. Now choose the Trees menu, select Show Trees..., and tap return to view a primitive
version of the shortest tree in the PAUP window. But to really play with this tree,
the other program, MacClade, is better. So save this tree - choose Trees again and select Save
trees to file... at the very bottom of the pull-down menu. The name of your PAUP tree file
will be yourfile.dat.trees unless you change it.
Now quit PAUP by choosing the File menu and selecting the Quit option at the very bottom.
B. Now work with the MacClade program to understand the structure of your phylogeny (hypothesis of
evolutionary history) for the living seed plants. Find the MacClade program and click on it twice to
start it running.
1. Choose the File menu and select Open File... Click on the Desktop button. Now choose the
data file you named, for example “yourfile.dat” (not the trees file). MacClade and PAUP
recognize each other’s files, so the file will open right up. This time, the data are really easy
to see. The 11 groups are named along the left side in the first column, and the character
states for each character are listed in the column under each character number. Scroll to
the right to view all 103 characters, but remember - we are only using the 32 above.
2. To see how MacClade excludes characters, choose the Display menu and select Character
Status. This command reveals a table of characters with information about each. You can
quickly see which characters are included and excluded. Close this window when you’re
done.
3. Now it’s time to open the tree file using MacClade. Once again choose the Display menu,
but this time select Go To Tree Window. You will get a message that says that there are no
tree files stored for this data file - choose the button that says “open tree file”.
A menu appears that allows you to choose your tree file, for instance “yourfile.dat.trees”.
Click on the file and then tap the button to choose it (You can also click twice on the file
name).
A new menu appears with a list of files, including 1.PAUP.1 - this is the tree you created
using PAUP. Select it and tap return. The tree itself will finally appear. Now you can get
down to playing with the tree.
4. Having gotten your tree ready to manipulate, there are three basic things to do:
First, trace a single character’s history and show the outcome in the character states typical of
each group.
Second, customize the tree to show the character changes.
Third, change the tree around to understand the effects of choosing a different history for the
evolutionary groups.
But first, a simple lesson. Choose the Tools menu. Choose the symbol that looks like this:
This tool allows you to rotate the two branches at a node - try it out. Move the cursor on to
the tree image and click on a line bearing two branches. You will see the branches rotate.
One thing critical to figure out is that either way the tree is telling you the same thing - that the
two branches represent the two evolutionary descendants of one common ancestor.
a.Tracing the History of Single Characters
For this section, here are two characters from the data table for reference.
Character        0         1
13. Vessels      absent    present
20. Resins       absent    present
i. Choose Trace Character from the Trace menu. The changes in
character 13 (vessels) are shown in color, with yellow primitive and blue
derived. (You may want to change the shape of the tree to see it more
easily, now that the character changes are not being displayed.)
ii. Now choose Data Boxes from the Display menu. This box shows the
actual data for each of our eleven living seed plant groups going across the
screen. Scroll up the data in the box until character 13 sits just on top of the
tree - and you should see a combination of the data for character 13 and the
inferred evolutionary history of the character states.
What you see is that Ephedra, Gnetum, Welwitschia, and the angiosperms
all share a character state (vessels present) whereas the rest of the groups
share a character state with our outgroup, the cycads (vessels absent).
Vessels absent is thus the primitive character state (indicated by yellow),
and vessels present is derived (indicated by blue). You can see from the
coloring of the tree that the common ancestor of these four groups is
supposed to have invented vessels (the common ancestor is the line ancestral
to all; it shows the transformation “13:0->1”.)
Understanding this last paragraph is the most important thing you can do in
this lab - it is an illustration of the way in which modern phylogenetic
analysis infers common ancestors based on shared, derived character states.
iii. Look at other characters to see similar histories inferred from character
distributions. Especially look at character 20 -- resins present versus absent
-- since this character helps define a common ancestor for all of the conifers.
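The kind of inference MacClade draws for character 13 can be sketched with the classic Fitch parsimony procedure for unordered characters. Everything about the tree below is an assumption made for illustration: it contains only six groups, “conifers” stands in for the remaining groups without vessels, and the branching order inside the vessel-bearing groups is hypothetical apart from Gnetum and Welwitschia being sister groups. It is not the tree PAUP found, and the function name `fitch` is hypothetical.

```python
def fitch(node, states):
    """Fitch parsimony, bottom-up pass.

    `node` is either a tip name (str) or a tuple of two subtrees.
    Returns (set of possible states at this node, number of changes below it).
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_steps = fitch(node[0], states)
    right_set, right_steps = fitch(node[1], states)
    common = left_set & right_set
    if common:
        # The two descendants can agree: no extra change is needed here.
        return common, left_steps + right_steps
    # The descendants disagree: one change must have occurred.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical, simplified tree (nested pairs), for illustration only.
TREE = ("cycads",
        ("conifers",
         ("angiosperms",
          ("Ephedra", ("Gnetum", "Welwitschia")))))

# Character 13, vessels: 0 = absent, 1 = present (as described in the text).
VESSELS = {
    "cycads": 0, "conifers": 0,
    "angiosperms": 1, "Ephedra": 1, "Gnetum": 1, "Welwitschia": 1,
}

root_states, steps = fitch(TREE, VESSELS)
print(root_states, steps)
```

On this simplified tree a single change (13:0->1) explains the distribution of vessels, and the state inferred at the root is 0, matching the idea that the common ancestor of the four vessel-bearing groups invented vessels once.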
b. Displaying Character Changes on the Trees. Now it’s time to use MacClade to
label the tree with the character changes inferred by PAUP.
i. Go to the Trace menu and choose All Changes Options. Here you will
be presented with a box that customizes the tree.
---Choose “almost all possible changes”.
---Choose “Trace All” down at the bottom right. (This can also be done
from the Trace menu by selecting Trace All Changes.)
ii. Go to the Display menu and choose Trace Labeling.
---Click on the symbol labeled “label by characters changing”.
---Check the box labeled “Show states changing”.
iii. Now, looking at the tree, you will see that the character changes are
impossible to read. Go back to the Tree Size and Shape under the Display
menu and choose four times taller than wide. Also for tree shape choose the
square corner option. This should open up the tree so you can see all the
characters changing.
Notice that most of the character changes make sense. You can see the
character number followed by the change, for instance “91:0->1”. In some
cases, what you see is the following: “71:0/1->1”. These are cases in which
the simplest solution does not include a decision between two character
states for a particular place in the tree (both options yield the same tree
length). 71:0/1->1 translated means: “For character 71 at this internode, it is
just as parsimonious for the ancestor to have state 0 as state 1; take
your choice.” You can prove this yourself by counting up the number of
character-state changes under each assumption. This is best done by
sketching the tree twice, once for parallel
iv. Look at the two characters, 13 and 20. You should see that MacClade
shows them transforming from primitive to derived at the expected places.
Vessels originate (13:0->1) in the common ancestor of Ephedra, Gnetum,
Welwitschia, and the angiosperms. Resins are invented (20:0->1) by the
common ancestor of the conifers.
v. If you can, you may choose to print this tree as it now stands. Printing is
done from the File menu, by selecting Print. You will have to choose
among several options on the Print menu.
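The “0/1” ambiguity described in step iii above can be checked by counting changes under each choice of ancestral state. The sketch below does this for a small invented tree; the node names (X, Y), tips (A, B, C), and character states are hypothetical and are not taken from the seed plant tree.

```python
def count_changes(edges, states):
    """Count branches on which the character state differs between the
    ancestor (parent) and the descendant (child)."""
    return sum(1 for parent, child in edges if states[parent] != states[child])


# Hypothetical tree: root -> outgroup and X; X -> Y and C; Y -> A and B.
EDGES = [
    ("root", "outgroup"), ("root", "X"),
    ("X", "Y"), ("X", "C"),
    ("Y", "A"), ("Y", "B"),
]

# Invented states at the tips; the root is fixed at 0 (primitive).
TIPS = {"outgroup": 0, "A": 1, "B": 0, "C": 1}

# Choice 1: ancestor X has state 1 (one gain, then a reversal in B).
gain_then_loss = dict(TIPS, root=0, X=1, Y=1)
# Choice 2: ancestor X has state 0 (two parallel gains, in A and in C).
parallel_gains = dict(TIPS, root=0, X=0, Y=0)

print(count_changes(EDGES, gain_then_loss))  # 2
print(count_changes(EDGES, parallel_gains))  # 2
```

Both choices cost two changes, so parsimony alone cannot decide between state 0 and state 1 at ancestor X; this is the situation MacClade reports with the “0/1” notation.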
c. Manipulating the Tree to Test Alternative Hypotheses.
i. Remember the woody cone character? Here it is again:
Character          0         1
64. Woody Cones    absent    present
Some specialists have suggested that the podocarps and yews, which look
like conifers but don’t have cones, have lost their cones (a reversal of
character 64) instead of never having invented them. Use the Trace
Character command to show the character-state distribution and inferred
history of this trait (choose character 64 from the box at the bottom right).
How many more changes would be required in the tree for the common
ancestor of yews, podocarps and conifers to have invented woody cones?
(You must figure this out by looking at the character state changes on the
current tree and figuring out how many more changes are needed with this
new constraint.)
ii. Some specialists have felt that the angiosperms were most closely related
to the genus Gnetum among seed plants. How does the length of the tree
change if you move the angiosperms into a position where they and Gnetum
arise from the same common ancestor?
To answer this question:
- Choose the arrow tool from the Tools menu if you don’t have the arrow.
- Move the arrow to the line leading to Gnetum.
- Click and hold down while you move the arrow to the line leading to the
angiosperms.
- Let go.
MacClade will rebuild the tree with Gnetum and the angiosperms as
descendants of a single common ancestor (as sister groups) and report the
total length of the tree down at the bottom right in the box.
Adjust the tree size and shape to 4 times normal size and 1 times as high as
wide. Move the cursor to show just the following three groups: angiosperms,
Gnetum, and Welwitschia. Count the number of parallelisms
and reversals on the modified tree (with Gnetum and the angiosperms as
sister groups) and on the original tree (with Gnetum and Welwitschia as
sister groups). These counts will demonstrate to you how trees get longer
and shorter.
Look at the tree length reported in the PAUP 1 Box: it has gone from 37 to
43 steps.
There is an additional problem with the hypothesis that Gnetum is the sister
group of the angiosperms. Can you figure out what it is?
V. Considering a Molecular Data Set (Albert et al., 1994) for the Same Taxa (Not
Written 1999)
RELEVANT LITERATURE
Albert, V.A., A. Backlund, K. Bremer, M.W. Chase, J.R. Manhart, B.D. Mishler, and K.C. Nixon.
1994. Functional constraints and rbcL evidence for land plant phylogeny. Annals Missouri
Botanical Garden 81:534-567. [shows trees based on integrated molecular and morphological
data for both all and only living taxa]
Doyle, J.A. and M. J. Donoghue. 1986. Seed plant phylogeny and the origin of the angiosperms:
an experimental cladistic approach. Botanical Review 52(4):321-431.
Doyle, J.A., M.J. Donoghue, and E.A. Zimmer. 1994. Integration of morphological and
ribosomal DNA data on the origin of angiosperms. Annals Missouri Botanical Garden 81:
419-450.
Nixon, K.C., W.L. Crepet, D. Stevenson, and E.M. Friis. 1994. A reevaluation of seed plant
phylogeny. Annals Missouri Botanical Garden 81:484-533. [source of the data for this lab]