Introduction to Systematics - BioQUEST Curriculum Consortium

Introduction to Systematics
Fall 2007
The following introduction is slightly modified from Brown et al. 2003. Tree of Life:
Microbial Evolution in Microbes Count! John R. Jungck, Ethel D. Stanley, and Marion
Field Fass, Eds. Published by BioQUEST Curriculum Consortium and American Society
for Microbiology.
For the next few weeks we’ll be looking at all the organisms that are capable of
photosynthesis. Our job is to determine what evolutionary pattern led to their
diversification. There are two types of data we can use: morphological data and
DNA sequence data. Ideally, the two types of data should give us similar results. Today
we’ll be acquainting ourselves with two software packages: Biology Workbench and
PHYLIP.
DNA and phylogenetic trees:
Background
Bioinformatics is a relatively new field that involves the use of computers to mine the
incredible amount of molecular information available to scientists. This information
includes DNA and protein sequences as well as other types of information for a wide
variety of organisms. Phylogeny reconstruction is an important application of
bioinformatics.
Phylogeny refers to a hypothesis for the evolutionary history of a group of species.
Phylogenetic trees are usually constructed from sequence data but may also be based on
morphological characteristics, metabolic characteristics, or other forms of molecular
data (e.g. DNA fingerprint profiles). Phylogenies are typically represented as resolved or
partially resolved bifurcating trees, consisting of nodes and edges (Fig. 1a, 1b). The
relative lengths of different edges represent degrees of relatedness and are often
drawn to scale (weighted tree, as in Fig. 1a). However, if the investigator is only
interested in how the species group together, the edges are sometimes drawn with
an equal, arbitrary length (unweighted tree, as in Fig. 1b).
Fig. 1a. An unrooted weighted tree.
Fig. 1b. An unrooted unweighted tree.
Evolutionary tree building seeks to find the tree that gives the most biologically sensible
explanation of how a mixed group of species evolved. Even for relatively small numbers
of species, there can be many possible bifurcating tree shapes to consider.
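To get a feel for how quickly the number of possible trees grows, here is a small Python sketch. It relies on the standard counting result that n labeled species can be arranged into (2n - 5)!! distinct unrooted, fully bifurcating trees; the function names are our own and are only for illustration.

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_tree_count(n_species: int) -> int:
    """Number of distinct unrooted, fully bifurcating trees for n labeled species."""
    if n_species < 3:
        raise ValueError("need at least 3 species")
    return double_factorial(2 * n_species - 5)


for n in (4, 6, 8, 10):
    print(n, unrooted_tree_count(n))
# 4 3
# 6 105
# 8 10395
# 10 2027025
```

With only the 10 species used later in this lab there are already more than two million candidate unrooted trees, which is why checking every tree shape by hand is not practical.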
Some methods of tree building use secondary criteria to evaluate how well each of
the possible tree shapes fits the data. An alternative approach, which works
better for larger numbers of species, is to first construct a table containing a measure of
difference for each pairwise comparison of species (e.g. the number of sequence
positions that differ between two species). The next step uses
the values in the table to determine the order in which species are clustered into a
bifurcating tree (Fig. 2).
Fig. 2. Phylogenetic tree building by pairwise comparisons.
This procedure produces just one tree, and does not evaluate this tree against other
possible fully resolved bifurcating trees, which may be almost equally probable.
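The table-then-cluster procedure described above can be sketched in Python. This is a minimal illustration only: it assumes the sequences are already aligned to the same length, and it uses average-linkage (UPGMA-style) clustering as one common way to join species, which is not necessarily the method used by the software in this lab. The four sequences and the names A to D are made up.

```python
from itertools import combinations


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_table(seqs: dict[str, str]) -> dict[frozenset, float]:
    """Pairwise difference counts, keyed by the unordered pair of names."""
    return {
        frozenset((x, y)): float(count_differences(seqs[x], seqs[y]))
        for x, y in combinations(seqs, 2)
    }


def cluster(seqs: dict[str, str]) -> str:
    """Repeatedly join the two closest clusters (average linkage).

    Returns the joining order as a nested, Newick-like string.
    """
    table = distance_table(seqs)
    # Each cluster label maps to the original species it contains.
    members = {name: [name] for name in seqs}

    def average_distance(c1: str, c2: str) -> float:
        pairs = [(a, b) for a in members[c1] for b in members[c2]]
        return sum(table[frozenset(p)] for p in pairs) / len(pairs)

    while len(members) > 1:
        c1, c2 = min(combinations(members, 2),
                     key=lambda pair: average_distance(*pair))
        merged = f"({c1},{c2})"
        members[merged] = members.pop(c1) + members.pop(c2)
    return next(iter(members)) + ";"


# Made-up aligned sequences, for illustration only.
toy_sequences = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGGAA",
    "D": "TCGAACGGTA",
}

print(cluster(toy_sequences))  # ((A,B),(C,D));
```

Note that the function returns exactly one tree. As the text says, it never checks whether a different tree would fit the table almost as well.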
There are several factors to consider when constructing a tree:
Did the data evolve on a bifurcating tree?
Can we expect the patterns in the data to indicate only 1 unique tree?
How far back in history can we reliably infer phylogeny from sequence data?
Can we infer a species’ phylogeny from the comparative analysis of only small
sections of the genomes of our organisms?
Will protein or DNA or rRNA sequences be better for inferring phylogeny?
Has the data evolved under a molecular clock?
Is it valid to compare homologous sequences if they are from different genome
compartments (e.g. from chloroplasts, mitochondria, nuclei, nucleomorphs)?
How should we think about sequence evolution - what are the properties of sequence
evolution? Different models of sequence evolution can be used to correct the
difference values in our starting distance table. Different corrections could result in
different values that change the shape of our final tree. What are the assumptions of
these models? Which ones would be best? How would I decide?
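As one concrete example of such a correction, the sketch below applies the Jukes-Cantor model, which assumes that all four bases occur equally often and that every substitution is equally likely. We chose this model only because it is the simplest; the handout does not prescribe a particular model. Here p is the observed proportion of differing sites between two aligned sequences, and the corrected distance is d = -(3/4) ln(1 - (4/3) p).

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("the correction is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Compare observed and corrected distances for a few example values of p.
for p in (0.05, 0.10, 0.30, 0.50):
    print(f"observed {p:.2f} -> corrected {jukes_cantor(p):.3f}")
```

The corrected value is always at least as large as the observed one, and the gap widens as p grows, because the model accounts for repeated changes at the same site that a simple count cannot see. Larger corrections for the more distant pairs can change which species are clustered first.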
Bioinformatics Tools
The purpose of this exercise is to develop a basic understanding of bioinformatics and
learn some of the skills involved in constructing and interpreting phylogenetic trees and
cladograms. The basic tools consist of a bank of data, in our case DNA sequences, and a
program to compare the similarities and differences between sequences. These programs
also display the results in the form of a tree. Simply generating the tree is not the end of
the story. Interpretation is a time-consuming and extensive process and you will need to
spend a fair amount of time interpreting your trees.
Learning the basic skills
Follow your instructor’s directions and complete the “warm-up” activity.
1. Your characteristic grid for your “organisms”:
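Organism    1       2       3       4       5       6
Char 1      _____   _____   _____   _____   _____   _____
Char 2      _____   _____   _____   _____   _____   _____
Char 3      _____   _____   _____   _____   _____   _____
Char 4      _____   _____   _____   _____   _____   _____
Char 5      _____   _____   _____   _____   _____   _____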
Character name 1______________________ Codes :____________________________
Character name 2______________________ Codes :____________________________
Character name 3______________________ Codes :____________________________
Character name 4______________________ Codes :____________________________
Character name 5______________________ Codes :____________________________
Your phylogeny based on morphological characteristics:
Now weight your characteristics as ancestral or derived. What does your phylogeny look
like now? What changes did you make?
For the next part of the lab you will work with the 10 species you selected last week at
Lamberton Conservatory. First complete the following chart using the information you
collected last week.
Your characteristic grid for the Lamberton species:
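Organism    Char 1   Char 2   Char 3   Char 4   Char 5
1           ______   ______   ______   ______   ______
2           ______   ______   ______   ______   ______
3           ______   ______   ______   ______   ______
4           ______   ______   ______   ______   ______
5           ______   ______   ______   ______   ______
6           ______   ______   ______   ______   ______
7           ______   ______   ______   ______   ______
8           ______   ______   ______   ______   ______
9           ______   ______   ______   ______   ______
10          ______   ______   ______   ______   ______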
Character name 1______________________ Codes :____________________________
Character name 2______________________ Codes :____________________________
Character name 3______________________ Codes :____________________________
Character name 4______________________ Codes :____________________________
Character name 5______________________ Codes :____________________________
Using your phylogeny from last week, and information on phylogenies you have learned
in this lab, draw your phylogeny based on morphological characteristics (state whether
you are using pure morphological data, or are designating some characteristics ancestral
or derived).
Step-by-step instructions
1. Open a web browser and go to http://workbench.sdsc.edu/
2. Click on “Set up a free account” and follow the instructions to set up an account. Be
sure to keep a record of your name and password. The next time you visit, click on
“Enter the Biology Workbench 3.2” and simply type your name and
password.
3. Once you are in, click on “Nucleic Tools”.
4. Highlight “Ndjinn – Multiple Database Search”. Click “Run”.
5. There are many databases available. For our project we will use GBPLN (GenBank Plant
Sequences, which includes the fungi and algae). For this week, we will be working
on the 10 plants you chose at the Lamberton Conservatory last week in lab.
6. Click on “GBPLN”. (You have to scroll down for this option.)
7. Scroll up and type in the scientific name for the first species on your list. Click on
“Show 10 hits”.
8. Click on “Search”. You will see a list of choices. Scroll down and make notes on
what types of sequences are available. What genes are sequenced? Are they whole
or partial sequences? Use the grid on the next page to help you organize these data.
Enter the gene name and whether it is a partial sequence or a complete sequence.
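Organism    Gene       Gene       Gene       Gene       Gene
1           ________   ________   ________   ________   ________
2           ________   ________   ________   ________   ________
3           ________   ________   ________   ________   ________
4           ________   ________   ________   ________   ________
5           ________   ________   ________   ________   ________
6           ________   ________   ________   ________   ________
7           ________   ________   ________   ________   ________
8           ________   ________   ________   ________   ________
9           ________   ________   ________   ________   ________
10          ________   ________   ________   ________   ________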
9. Repeat this process for all your species, then determine what gene sequence is
available for most of your species and import those sequences to your account.
10. Import sequences by searching again and highlighting the line for the data you want.
Click on the “Import Sequence(s)” button located at the end of the first line of the
interactive box.
11. Now we are ready to generate a tree. Select all the organisms. Click “Run”. All the
boxes in front of these organism names should be checked.
12. Using the scroll box, scroll down and highlight “CLUSTALW – Multiple
Sequence Alignment”. Click “Run”.
13. A new screen will appear. You can choose to make a rooted or unrooted tree by
clicking on the arrow next to the box labeled “Guide tree display” and choosing
rooted or unrooted. Then, click “Submit”. The screen will go blank and you may
have to wait several minutes. Wait until a screen titled “CLUSTALW” with
“Sequence alignment” appears. Scroll down to examine the DNA sequences and how
they align with each other. Scroll further down and you will see your tree!
14. Open Microsoft Word. Click “Edit”, then “Paste” and your tree will reappear.
Adjust the size of your tree by selecting the tree image and resizing it from the lower right
corner so two trees can fit on a page. Type in a label for each tree using consecutive
figure numbers.
15. Look up each GBPLN number and write the corresponding species beside the number
by hand. You may also label each one by pulling down the “Insert” menu then
selecting and releasing on “TextBox”. Click where you want the textbox to be
located and type the name of the species. Drag the textbox next to the corresponding
number.
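If you have your tree as text rather than as a picture, the labeling in step 15 can also be done with a short script. The sketch below assumes the tree is written in the parenthesized Newick format and that you have typed up your own lookup table of sequence identifiers and species names. The identifiers, species names, and branch lengths shown are placeholders, not real database records.

```python
import re


def relabel_tree(newick: str, names: dict[str, str]) -> str:
    """Append the species name to every leaf label that appears in `names`."""

    def swap(match: re.Match) -> str:
        label = match.group(0)
        return f"{label}_{names[label]}" if label in names else label

    # A label is any run of characters that is not Newick punctuation.
    # Branch lengths also match, but they are not in `names`, so they are kept.
    return re.sub(r"[^(),:;\s]+", swap, newick)


# Placeholder identifiers and names; replace them with your own list.
species_names = {
    "ACC0001": "Species_one",
    "ACC0002": "Species_two",
    "ACC0003": "Species_three",
}
tree_text = "((ACC0001:0.1,ACC0002:0.2):0.05,ACC0003:0.3);"

print(relabel_tree(tree_text, species_names))
# ((ACC0001_Species_one:0.1,ACC0002_Species_two:0.2):0.05,ACC0003_Species_three:0.3);
```

Underscores are used in place of spaces because unquoted Newick labels cannot contain spaces.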
How does this tree compare with the one you drew last week and the phylogeny you drew
earlier in this lab, based on morphological characteristics?
Morphological and molecular data, and phylogenetic trees.
To complete a phylogenetic tree using character states requires a lot of homework. In the
weeks to come, we’ll be building a character matrix with the following headings:
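Taxon       Characters
            1    2    3    4    5    6    7    8    9    10   11
Phylum1     __   __   __   __   __   __   __   __   __   __   __
Phylum2     __   __   __   __   __   __   __   __   __   __   __
Phylum3     __   __   __   __   __   __   __   __   __   __   __
Phylum4     __   __   __   __   __   __   __   __   __   __   __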
For each character you need to name the character and determine how many different
states it will have, then assign a number to each of the character states. Those are the
numbers you will enter in your character matrix. Today you’ve had some practice.
Starting next week, we’ll begin building the phylogeny of all photosynthetic organisms.
Working in teams of four, use your textbook to list all the phyla that your text
discusses. Look at the pictures and draw on your previous knowledge of plants. What
are some characteristics that many, or some, phyla share? For example, we
learned that having vascular tissue was a major evolutionary advantage. So, one of our
characters could be “vascular tissue” with two character states, 0=absent and 1=present.
See what other characters you want to include. We’ll refine these lists as we proceed in
the semester.
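To see how coded character states can be used once they are in a matrix, here is a small Python sketch. Everything in it is a placeholder: the taxon names follow the blank matrix above, only “vascular tissue” comes from our example, and the 0/1 values are invented so the code has something to run on.

```python
# Hypothetical character matrix. Taxon names, the second and third characters,
# and all state values are placeholders, not real observations.
characters = ["vascular tissue", "character 2", "character 3"]
matrix = {
    "Phylum1": [0, 0, 0],
    "Phylum2": [1, 0, 0],
    "Phylum3": [1, 1, 0],
    "Phylum4": [1, 1, 1],
}


def state_differences(taxon_a: str, taxon_b: str) -> int:
    """Count the characters for which two taxa have different states."""
    return sum(a != b for a, b in zip(matrix[taxon_a], matrix[taxon_b]))


def taxa_with_state(character: str, state: int) -> list[str]:
    """List the taxa that show the given state for the named character."""
    column = characters.index(character)
    return [taxon for taxon, states in matrix.items() if states[column] == state]


print(taxa_with_state("vascular tissue", 1))    # ['Phylum2', 'Phylum3', 'Phylum4']
print(state_differences("Phylum1", "Phylum4"))  # 3
```

Counting state differences between every pair of taxa gives the same kind of pairwise table we built from DNA sequences, so the morphological and molecular results can be compared directly.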
Build a character matrix list in Excel with the following columns: character name,
character description, character state 0, character state 1, etc. for all character states for
that character, and why it’s an important character for determining evolutionary
relationships in plants. Fill a row in for each character. Here’s a sample for the example
given above:
Character name | Character description | Character state descriptions | Why is this important?
vascular tissue | vascular tissue includes presence of xylem and phloem | 0 = no vascular tissue; 1 = vascular tissue present | vascular tissue allowed plants to grow larger and become more complex…
Next character listed here | | |
For next week, turn in:
Your Lamberton list with a photograph of each species (properly cited)
This completed lab handout
Your tree from Biology Workbench.
A printed copy of your proposed character matrix for all photosynthetic life with
at least 5 characters described as above.
rev. 6/19/08 bjb