L5-Alignment and analysis of DNA Sequences

Dangerous Ideas, Spring 2005
Name: _________________
Lab 5: Alignment and
Phylogenetic analysis of
DNA Sequences
• To understand how DNA can be used to study evolutionary history
• To become familiar with the process of aligning sequences and constructing
phylogenetic trees
• To explore on-line resources including GenBank and BLAST
MATERIALS: Access to a high-speed internet connection
Our collection of known DNA sequences has increased dramatically in the last
few years due to recent advances in the field of molecular biology. The DNA sequence
of an individual contains information that can be used in a wide variety of applications,
from forensics to the study of evolution.
Evolutionary biologists view DNA as a “document” of evolutionary history.
Comparing the DNA sequences of genes from different organisms can reveal
evolutionary relationships that might not otherwise be inferred from their morphology.
Since genomes acquire mutations gradually, the amount of sequence difference found in
two organisms should tell us something about how recently these two organisms shared a
common ancestor. In other words, two organisms that share a relatively recent common
ancestor should have more similar DNA sequences than two organisms that diverged
long ago.
Molecular phylogenetics is the field
of study that attempts to determine the rates
and/or patterns of change occurring in
DNA (and other macromolecules) and to
reconstruct the evolutionary history of
genes and organisms. The evolutionary
history revealed by the sequence data is
frequently presented in a phylogenetic
tree. Phylogenetic trees are branching
diagrams depicting the evolutionary
relationships of organisms.
It is important to note that our current understanding of most evolutionary
relationships comes from a variety of data, including both traditional morphological
evidence and molecular data.
Researchers attempting to construct phylogenetic trees must go through a series of
steps:
Step 1: Acquire the DNA sequences- DNA sequences may either be determined
directly by sequencing a region of DNA, or indirectly, by acquiring the sequence from a
public database or published source. (DNA sequencing will be discussed in lecture; we
will use public databases in our exploration today.)
Step 2: Align the DNA sequences- Once accurate DNA sequences have been obtained, they
must be properly aligned to reveal their evolutionary relationships. Consider the
following example:
Organism 1- A T G G G C T G T C A A
Organism 2- A T G G G T G T C A A T
At first glance, organisms 1 and 2 appear to have dramatically different DNA sequences.
In fact, they seem to share only 6 of the 12 bases being examined (50% sequence
identity). Now examine these sequences properly aligned:
Organism 1- A T G G G C T G T C A A
Organism 2- A T G G G - T G T C A A
With a gap correctly inserted, it is now apparent that the two organisms share 11 of the 12
positions being examined (92% sequence identity). Correct alignment is difficult and is
usually done with software such as CLUSTAL.
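The arithmetic in this example can be checked with a short Python sketch. It is only an illustration, not how CLUSTAL works internally: the function names and the scoring values (+1 for a match, -1 for a mismatch, -1 for a gap) are assumptions chosen for this example, and the alignment routine is a basic pairwise dynamic-programming method (Needleman-Wunsch).

```python
def percent_identity(a, b):
    """Percent of columns in which two equal-length rows carry the same base."""
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment (scoring values are assumptions)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        diag = None
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if diag is not None and score[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


org1 = "ATGGGCTGTCAA"
org2 = "ATGGGTGTCAAT"

# Unaligned comparison, position by position: 6 of 12 match.
print(f"{percent_identity(org1, org2):.0f}%")            # 50%

# The aligned rows exactly as printed in the handout: 11 of 12 match.
print(f"{percent_identity(org1, 'ATGGG-TGTCAA'):.0f}%")  # 92%

# Letting the algorithm place the gaps itself.
row1, row2, best = global_align(org1, org2)
print(row1)
print(row2)
```

Note that the algorithm keeps the final T of Organism 2 and pairs it with a gap, so its alignment is 13 columns long; the handout's aligned example simply leaves that last base out.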
Step 3: Construct a Phylogenetic Tree- With the sequences correctly aligned, a
phylogenetic tree can now be constructed. Consider the following, aligned, sequences:
Organism 1: A T G G G C T G T C A A
Organism 2: A T G G G - T G T C A A
Organism 3: A T G G G - T G T C A A
Organism 4: A T G G G C T G T C A A
These organisms seem to share some evolutionary history as they all have similar DNA
sequences. Organisms 2 and 3, however, are both “missing” the C at position 6. Their
evolutionary relationships, as predicted by this data set, could be presented as a tree
that groups organisms 2 and 3 together on one branch, separate from organisms 1 and 4.
As the DNA sequence under consideration gets longer and more complicated, so,
too, does the process of constructing an appropriate tree. Again, most of this work is
done by using one of several software packages.
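To give a feel for what such software does, here is a minimal Python sketch that builds a tree from the four aligned sequences above. It assumes a simple distance method (UPGMA, which repeatedly joins the two closest groups) and counts a gap as one difference; both choices are assumptions made for illustration, and real packages offer several more sophisticated methods. The function names are hypothetical.

```python
from itertools import combinations


def count_differences(a, b):
    """Number of aligned positions at which two rows differ (a gap counts)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def upgma(seqs):
    """Join the two closest clusters until one tree remains.

    seqs maps a name to an aligned sequence. The tree is returned as
    nested tuples of names.
    """
    # Each key is a (sub)tree; each value lists the organisms it contains.
    clusters = {name: [name] for name in seqs}

    def distance(c1, c2):
        # Average number of differences over all pairs of members.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(count_differences(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: distance(*p))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members
    return next(iter(clusters))


def to_newick(tree):
    """Write the nested-tuple tree in Newick (parenthesis) notation."""
    def node(t):
        if isinstance(t, str):
            return t.replace(" ", "_")
        return "(" + ",".join(node(child) for child in t) + ")"
    return node(tree) + ";"


aligned = {
    "Organism 1": "ATGGGCTGTCAA",
    "Organism 2": "ATGGG-TGTCAA",
    "Organism 3": "ATGGG-TGTCAA",
    "Organism 4": "ATGGGCTGTCAA",
}

tree = upgma(aligned)
print(to_newick(tree))  # ((Organism_1,Organism_4),(Organism_2,Organism_3));
```

The output pairs organisms 1 and 4 and pairs organisms 2 and 3, because the only difference in this data set is the base at position 6.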
The National Center for Biotechnology Information (NCBI)
Established in 1988 as a national resource for molecular biology information,
NCBI creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information - all
for the better understanding of molecular processes affecting human health and disease.
You can explore NCBI at http://www.ncbi.nlm.nih.gov.
Two especially useful services provided at the NCBI website are PubMed and
BLAST. (Click the links in the upper header.) PubMed is a searchable database of
published scientific papers in the fields of medicine and biotechnology. BLAST is a
software program (a suite of algorithms actually) that allows one to search GenBank for
similar sequences. This allows for the identification of unknown sequences as well as
comparison between similar sequences.
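The idea of searching a database for similar sequences can be illustrated with a toy Python sketch. This is not BLAST itself: it only counts the short "words" (runs of k bases) that a query shares with each database entry, whereas BLAST goes on to extend such word matches and score them statistically. The record names and the database contents below are made up for the example.

```python
def kmers(seq, k):
    """Set of all words of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query, database, k=4):
    """Rank database entries by how many k-base words they share with the query."""
    query_words = kmers(query, k)
    hits = [(name, len(query_words & kmers(seq, k))) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical mini-database; real GenBank records are far longer.
toy_database = {
    "record_A": "ATGGGTGTCAAT",
    "record_B": "CCCCTTTTAAAA",
    "record_C": "ATGGGCTGTCAA",
}

for name, shared in rank_by_shared_kmers("ATGGGCTGTCAA", toy_database):
    print(name, shared)
```

The entry identical to the query ranks first, the entry that differs by one missing base ranks second, and the unrelated entry shares no words at all.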
GenBank
GenBank® is the National Institutes of Health’s (NIH) genetic sequence database,
an annotated collection of all publicly available DNA sequences. There are
approximately 22,617,000,000 bases in 18,197,000 sequence records as of August 2002.
A new release is made every two months. GenBank is part of the International Nucleotide
Sequence Database Collaboration, which comprises the DNA DataBank of Japan
(DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.
These three organizations exchange data on a daily basis. In other words, this is a global,
cooperative effort to share DNA sequence information as it’s acquired. This is the
database against which BLAST will search to identify a sample sequence and/or find
similar sequences in the database.
Today we will access several DNA sequences from a public database, align them,
and construct a phylogenetic tree. The sequences we will analyze today are from human
mitochondrial DNA. (Remember that mitochondria contain their own DNA, and that this
DNA is always maternal in origin.)
Mitochondrial DNA has been extensively studied in
an attempt to understand human evolution and prehistoric
migratory patterns. Some anthropologists have argued that
people evolved at least partly from the Neanderthals. The
opposing theory is that modern humans evolved in Africa,
then spread outward, overwhelming earlier hominids
including Neanderthals. The short, squat Neanderthals
inhabited much of Europe from about 100,000 years ago
until dying out about 28,000 years ago. Analyzing
mitochondrial DNA has provided data with which to
evaluate these two different hypotheses.
Acquiring Sequences:
To access the sequence information for this exercise, you will need to follow these steps:
1. Open an Internet browser and go to http://www.bioservers.org.
2. Go to the butler labeled “Sequence Server” and click the “Enter” button below it.
3. Click the “Manage Groups” button in the top center of your screen.
4. From the pull-down menu under “Sequence Sources”, select the entry that begins
“Prehistoric Human”.
5. Eight different entries will appear in your window. Note that you can view these
sequences by clicking on the red “View” button next to each.
6. Select all eight sequences by clicking in the box on their left. Click on “OK” after
all are selected.
Aligning the Sequences:
We will now ask the server to align all eight of our sequences using a program called
CLUSTAL W.
1. Select all eight of your sequences by clicking in the box to their left. With all the
sequences selected, click on the “Compare” box directly above.
2. You will now be shown an alignment. The yellow color indicates regions where
the sequences do not all align. Scroll through the sequence and note the high
levels of variation!
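What the highlighting picks out can be mimicked with a few lines of Python. This sketch assumes that a highlighted position is simply a column in which the aligned rows are not all identical; the server's exact rule may differ. It uses the four short example sequences from Step 3 rather than the real mtDNA data, and the function name is hypothetical.

```python
def variable_columns(alignment):
    """Return the 1-based positions at which the aligned rows are not all identical."""
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    return [i + 1 for i in range(length) if len({row[i] for row in rows}) > 1]


aligned = {
    "Organism 1": "ATGGGCTGTCAA",
    "Organism 2": "ATGGG-TGTCAA",
    "Organism 3": "ATGGG-TGTCAA",
    "Organism 4": "ATGGGCTGTCAA",
}

print(variable_columns(aligned))  # [6]
```

Only position 6 varies in the toy data; in the mtDNA alignment you will see many more such positions.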
Constructing a Phylogenetic Tree:
1. Return to the previous screen by clicking on “Done”.
2. Be sure all eight of your sequences are highlighted. (Boxes to their left should be
checked.)
3. Click on the toggle menu bar that currently says “CLUSTAL W”. Select
“Phylogenetic Tree” and click on the “Compare” Button.
4. A window will open containing a phylogenetic tree based on the mtDNA
sequences provided.
Using the tree you just created, and the bioserver database, answer the questions
on the following page.
Names of Group Members:
1. What is the hypothesis being tested in this analysis? (Hint: There are two conflicting
hypotheses; you’ll have to pick one!)
2. What do you predict you’ll see in the phylogenetic tree if your hypothesis is correct?
3. In the space below (or on a separate sheet), draw the tree generated from the
mitochondrial sequences analyzed:
4. Does this tree support your hypothesis? Explain.
5. To further clarify your data, return to bioserver. Close the window containing your
tree. Click on “Manage Groups” again to import another set of sequences. This time
select “modern human mtDNA”. Both sets of sequences will now appear in your
window. Select one or two of the modern sequences and generate another phylogenetic
tree. Draw this tree in the space below (or on a separate sheet).
6. Does this tree support your hypothesis? Explain.