Lab 5: Alignment and Phylogenetic analysis of DNA Sequences

Dangerous Ideas, Spring 2008
Name: _________________
OBJECTIVES:
• To understand how DNA can be used to study evolutionary history
• To become familiar with the process of aligning sequences and constructing phylogenetic trees
• To explore on-line resources including GenBank and BLAST
MATERIALS: Access to a high-speed internet connection
INTRODUCTION:
Our collection of known DNA sequences has increased dramatically in the last
few years due to recent advances in the field of molecular biology. The DNA sequence
of an individual contains information that can be used in a wide variety of applications,
from forensics to the study of evolution.
Evolutionary biologists view DNA as a “document” of evolutionary history.
Comparing the DNA sequences of genes from different organisms can reveal
evolutionary relationships that might not otherwise be inferred from their morphology.
Since genomes acquire mutations gradually, the amount of sequence difference found in
two organisms should tell us something about how recently these two organisms shared a
common ancestor. In other words, two organisms that share a relatively recent common
ancestor should have more similar DNA sequences than two organisms that diverged
earlier.
Molecular phylogenetics is the field
of study that attempts to determine the rates
and/or patterns of change occurring in
DNA (and other macromolecules) and to
reconstruct the evolutionary history of
genes and organisms. The evolutionary
history revealed by the sequence data is
frequently presented in a phylogenetic
tree. Phylogenetic trees are branching
diagrams depicting the evolutionary
relationships of organisms.
It is important to note that our current understanding of most evolutionary
relationships comes from a variety of data including both traditional morphological
approaches as well as molecular data.
Researchers attempting to construct phylogenetic trees must go through a series of
steps:
Step 1: Acquire the DNA sequences- DNA sequences may either be determined
directly by sequencing a region of DNA, or indirectly, by acquiring the sequence from a
public database or published source. (DNA sequencing will be discussed in lecture; we
will use public databases in our exploration today.)
Step 2: Align the DNA sequences- Once accurate DNA sequences have been obtained, they
must be properly aligned to reveal their evolutionary relationships. Consider the
following example:
Organism 1- A T G G G C T G T C A A
Organism 2- A T G G G T G T C A A T
At first glance, organisms 1 and 2 appear to have dramatically different DNA sequences.
In fact, they seem to share only 6 of the 12 bases being examined (50% sequence
identity). Now examine these sequences properly aligned:
Organism 1- A T G G G C T G T C A A
Organism 2- A T G G G - T G T C A A
With a gap correctly inserted, it is now apparent that the two organisms share 11 of the 12
bases being examined (92% sequence identity). Aligning sequences correctly by hand is
difficult, so it is usually done with software such as CLUSTAL.
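The counts in this example can be checked with a few lines of code. The following is a minimal Python sketch for illustration only; the function name count_matches is made up for this handout and is not part of CLUSTAL or any other alignment software.

```python
def count_matches(seq_a, seq_b):
    """Count positions where two equal-length sequences carry the same base.

    A gap character ("-") never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")


organism_1 = "ATGGGCTGTCAA"
unaligned_2 = "ATGGGTGTCAAT"   # Organism 2 as first written
aligned_2 = "ATGGG-TGTCAA"     # Organism 2 with a gap at position 6

for label, other in [("unaligned", unaligned_2), ("aligned", aligned_2)]:
    matches = count_matches(organism_1, other)
    percent = round(100 * matches / len(organism_1))
    print(f"{label}: {matches} of {len(organism_1)} bases match ({percent}%)")
```

Note that this sketch only scores an alignment that has already been made. Deciding where the gap belongs is the hard part, and that is the job of the alignment software.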
Step 3: Construct a Phylogenetic Tree- With the sequences correctly aligned, a
phylogenetic tree can now be constructed. Consider the following, aligned, sequences:
Organism 1: A T G G G C T G T C A A
Organism 2: A T G G G - T G T C A A
Organism 3: A T G G G - T G T C A A
Organism 4: A T G G G C T G T C A A
These organisms seem to share some evolutionary history as they all have similar DNA
sequences. Organisms 2 and 3, however, are both “missing” the C at position 6. Their
evolutionary relationships, as predicted by this data set, could be presented as:
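[Figure: phylogenetic tree in which organisms 1 and 4 share one branch and organisms 2 and 3 share another]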
As the DNA sequence under consideration gets longer and more complicated, so,
too, does the process of constructing an appropriate tree. Again, most of this work is
done by using one of several software packages.
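To get a feel for what such software does, here is a minimal Python sketch that groups the four aligned sequences from Step 3. It rests on assumptions made for illustration: it counts differences between each pair of sequences and then repeatedly joins the two most similar groups (simple average-linkage clustering). This is not necessarily the method used by the software in this lab, and the function names are made up.

```python
from itertools import combinations


def differences(seq_a, seq_b):
    """Count aligned positions where two sequences differ (a gap counts as a difference)."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def cluster(aligned):
    """Join the two closest groups until one tree remains; return it as nested tuples."""
    # Each key is a tree (a name or a nested tuple); each value lists its members.
    groups = {name: [name] for name in aligned}
    while len(groups) > 1:
        def average_distance(pair):
            left, right = pair
            total = sum(differences(aligned[a], aligned[b])
                        for a in groups[left] for b in groups[right])
            return total / (len(groups[left]) * len(groups[right]))

        left, right = min(combinations(groups, 2), key=average_distance)
        members = groups.pop(left) + groups.pop(right)
        groups[(left, right)] = members
    return next(iter(groups))


aligned = {
    "Organism 1": "ATGGGCTGTCAA",
    "Organism 2": "ATGGG-TGTCAA",
    "Organism 3": "ATGGG-TGTCAA",
    "Organism 4": "ATGGGCTGTCAA",
}
print(cluster(aligned))
# prints: (('Organism 1', 'Organism 4'), ('Organism 2', 'Organism 3'))
```

The nested pairs in the output correspond to branches: organisms 1 and 4 are joined first, then organisms 2 and 3, and finally the two pairs are joined at the root.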
DNA SEQUENCE RESOURCES:
The National Center for Biotechnology Information (NCBI)
Established in 1988 as a national resource for molecular biology information,
NCBI creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information - all
for the better understanding of molecular processes affecting human health and disease.
You can explore NCBI at http://www.ncbi.nlm.nih.gov.
Two especially useful services provided at the NCBI website are PubMed and
BLAST. (Click the links in the upper header.) PubMed is a searchable database of
published scientific papers in the fields of medicine and biotechnology. BLAST is a
software program (a suite of algorithms actually) that allows one to search GenBank for
similar sequences. This allows for the identification of unknown sequences as well as
comparison between similar sequences.
GenBank
GenBank® is the National Institutes of Health’s (NIH) genetic sequence database,
an annotated collection of all publicly available DNA sequences. There are
approximately 22,617,000,000 bases in 18,197,000 sequence records as of August 2002.
A new release is made every two months. GenBank is part of the International Nucleotide
Sequence Database Collaboration, which comprises the DNA DataBank of Japan
(DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.
These three organizations exchange data on a daily basis. In other words, this is a global,
cooperative effort to share DNA sequence information as it’s acquired. This is the
database against which BLAST will search to identify a sample sequence and/or find
similar sequences in the database.
YOUR EXPLORATION:
Today we will access several DNA sequences from a public database, align them,
and construct a phylogenetic tree. The sequences we will analyze today are from human
mitochondrial DNA. (Remember that mitochondria contain their own DNA, and that this
DNA is always maternal in origin.)
Mitochondrial DNA has been extensively studied in
an attempt to understand human evolution and prehistoric
migratory patterns. Some anthropologists have argued that
people evolved at least partly from the Neanderthals. The
opposing theory is that modern humans evolved in Africa,
then spread outward, overwhelming earlier hominids
including Neanderthals. The short, squat Neanderthals
inhabited much of Europe from about 100,000 years ago
until dying out about 28,000 years ago. Analyzing
mitochondrial DNA has provided data with which to
evaluate these two different hypotheses.
Acquiring Sequences:
To access the sequence information for this exercise, you will need to follow these steps:
1. Open an Internet browser and go to http://www.bioservers.org.
2. Go to the button labeled “Sequence Server” and click the “Enter” button below it.
3. Click the “Manage Groups” button in the top center of your screen.
4. From the pull-down menu under “Sequence Sources”, select “Prehistoric Human
mtDNA”.
5. Eight different entries will appear in your window. Note that you can view these
sequences by clicking on the red “View” button next to each.
6. Select all eight sequences by clicking in the box on their left. Click on “OK” after
all are selected.
Aligning the Sequences:
We will now ask the server to align all eight of our sequences using a program called
Clustal.
1. Select all eight of your sequences by clicking in the box to their left. With all the
sequences selected, click on the “Compare” box directly above.
2. You will now be shown an alignment. Yellow highlighting marks positions where
the sequences do not all match. Scroll through the alignment and note the high
levels of variation!
Constructing a Phylogenetic Tree:
1. Return to the previous screen by clicking on “Done”.
2. Be sure all eight of your sequences are highlighted. (Boxes to their left should be
checked.)
3. Click on the toggle menu bar that currently says “CLUSTAL W”. Select
“Phylogenetic Tree” and click on the “Compare” Button.
4. A window will open containing a phylogenetic tree based on the mtDNA
sequence provided.
TO TURN IN:
Using the tree you just created, and the bioserver database, answer the questions
on the following page.
Lab 5: EXPLORING
PHYLOGENETICS
Names of Group Members:
1. What is the hypothesis being tested in this analysis? (Hint: There are two, conflicting
hypotheses; you’ll have to pick one!)
2. What do you predict you’ll see in the phylogenetic tree if your hypothesis is correct?
3. In the space below (or on a separate sheet), draw the tree generated from the
mitochondrial sequences analyzed:
4. Does this tree support your hypothesis? Explain.
5. To further clarify your data, return to bioserver. Close the window containing your
tree. Click on “Manage Groups” again to import another set of sequences. This time
select “modern human mtDNA”. Both sets of sequences will now appear in your
window. Select one or two of the modern sequences and generate another phylogenetic
tree. Draw this tree in the space below (or on a separate sheet).
6. Does this tree support your hypothesis? Explain.
Lab 2
Part 2: Analysis of mtDNA
Sequences
OBJECTIVES:
• Review the process of DNA replication, electrophoresis, and PCR
• Understand the process of DNA sequencing
• Explore the Bioserver database and GenBank
• Compare and analyze our own mtDNA sequences
General Background:
Recall that earlier in the quarter we collected our cheek cells using a saline rinse, ruptured
those cells to extract their DNA, and then used the polymerase chain reaction (PCR) to make
multiple copies of a small portion of our mitochondrial DNA (mtDNA). (See Lab 2, Part 1 for
details.) When you last saw your sample, it was in the thermocycler, ready to begin that PCR.
In the time since, I have used DNA electrophoresis to visualize your samples. I took a
small portion of your PCR product (5 µl) and ran it on a gel to see if your reaction had worked. If
the PCR did not work, there was too little DNA to see on the gel. If it did work, a strong band
was visible on the gel. In this case, I then sent your PCR reactions to Cold Spring Harbor
Laboratory for sequencing on their DNA sequencers.
Cold Spring Harbor Lab technicians then used your PCR product as a template for DNA
sequencing, and visualized the results on an automated DNA sequencer. (See notes on
sequencing below.) The sequence they obtained has been posted on the Bioserver website. We
will access these sequences together in lab today and compare our mtDNA sequences to each
other, and to modern humans from around the world!
Notes on DNA Sequencing (also see figure at the end of this handout):
DNA sequencing takes advantage of what is known about DNA replication in cells. In
many ways, it is also quite similar to the reaction you performed, PCR, to copy your original
cheek cell mtDNA. As with PCR, heat is used to temporarily separate the two strands of a DNA
molecule. A DNA polymerase (the enzyme that copies DNA) can then use one strand as a
template to make a copy of the original molecule. When this reaction is done for the purposes of
PCR, it is done with a nearly unlimited supply of nucleotides (A, T, C, and G), the building
blocks of DNA.
In standard DNA sequencing however, this reaction is split into four separate tubes. Each
of these reaction tubes receives plenty of DNA polymerase and nucleotides, but also receives a
small amount of a modified nucleotide. (Thus one tube will receive a modified “A”, one tube a
modified “T”, one a modified “G”, and one a modified “C”.) This modified nucleotide (a
dideoxynucleotide) is unique in that it is unable to form a bond with the next nucleotide in the
growing chain. Thus these modified nucleotides are often called chain terminators. As the
polymerase moves along the template molecule, catalyzing the production of a new strand, it will
usually incorporate a “normal” nucleotide, but will occasionally incorporate a chain terminator.
When it does so, DNA replication stops. As many hundreds of thousands of these reactions are
occurring simultaneously in your tube, all possible lengths of DNA molecules will be produced.
And in the tube with the modified “A”, all of these chains will end in “A”. This is true for the
tubes containing the T, G, and C chain terminators as well. Thus each tube contains a mixture of
molecules, all of which end in a particular nucleotide.
This collection of molecules is then sorted using electrophoresis. As with the
electrophoresis we did earlier this quarter, this process will sort the DNA molecules based on
their size. By running all four tubes next to each other, we can then “read” up the gel to
reconstruct the sequence of our original DNA template.
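The logic of the four-tube reaction and the gel can be imitated in a few lines of Python. This is an idealized sketch under stated assumptions: every possible fragment length is assumed to be present, the gel is assumed to separate fragments perfectly by length, and the function names are made up for illustration. The sequence read directly from the gel is that of the newly made strand.

```python
def sanger_fragments(new_strand):
    """Return, for each of the four tubes, the lengths of the terminated fragments.

    In the tube holding the modified "A", copying can stop at every position
    where the new strand has an A; the same holds for T, G, and C.
    """
    tubes = {base: [] for base in "ATGC"}
    for length, base in enumerate(new_strand, start=1):
        if base not in tubes:
            raise ValueError(f"unexpected base: {base!r}")
        tubes[base].append(length)   # a fragment of this length ends in `base`
    return tubes


def read_gel(tubes):
    """Read the bands from the shortest fragment to the longest."""
    bands = sorted((length, base) for base, lengths in tubes.items() for length in lengths)
    return "".join(base for _, base in bands)


new_strand = "ATGGGCTGTCAA"
tubes = sanger_fragments(new_strand)
for base, lengths in tubes.items():
    print(f"tube {base}: fragment lengths {lengths}")
print("read from gel:", read_gel(tubes))
```

Each tube only tells us where one kind of base occurs. Sorting the fragments from all four tubes by length is what puts the bases back in order.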
Common Questions:
1. Why didn’t my PCR work??
PCR is a notoriously finicky reaction. Common sources of failure include
pipetting errors and poor template quality. For example, if you had too few cheek cells in your
preparation, your PCR might not have worked. The presence of too many cheek cells, or other
contaminants, could also keep your reaction from working.
2. What does “N” mean in a DNA sequence?
Often we cannot determine which nucleotide (A, T, C, or G) is at a particular
location in a DNA molecule. When it cannot be determined, we insert an “N” into the sequence
to indicate an unknown nucleotide.
3. What makes some DNA sequences “excellent” and others “poor”?
On your data table, I have scored the results of each sequence on a qualitative scale
ranging from excellent to poor. This primarily reflects the number of N’s in your sequence.
Sources of sequence ambiguity can include poor template quality (PCR product) as well as
several factors out of your control, including the quality of the sequencing reaction and the skill
of the technician performing the sequencing!
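If you are curious how many N’s a sequence contains, the count is easy to compute. The following Python sketch is for illustration only; the example sequences are made up, and the function name is hypothetical. It does not reproduce the ratings on my data table, which were assigned by eye.

```python
def summarize_unknowns(sequence):
    """Return the number of N's in a sequence and the fraction of the sequence they make up."""
    if not sequence:
        raise ValueError("sequence is empty")
    sequence = sequence.upper()
    unknown = sequence.count("N")
    return unknown, unknown / len(sequence)


# Made-up example sequences, not real class data.
samples = {
    "student A": "ATGGGCTGTCAA",
    "student B": "ATGNNCTGTNAA",
}

# List the sequences with the fewest N's first.
for name, seq in sorted(samples.items(), key=lambda item: summarize_unknowns(item[1])[0]):
    count, fraction = summarize_unknowns(seq)
    print(f"{name}: {count} N's ({fraction:.0%} of {len(seq)} bases)")
```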
Procedures:
Your goal today is primarily exploratory. You will work with one other student to access our
DNA sequences, practice some alignments, and generate phylogenetic trees from our sequences.
Students with good sequencing results may wish to identify their number, but note that this is
optional!
Step 1: To begin, open a browser to http://www.bioserver.org/sequences/.
Log in to the Sequence Server as a guest.
Step 2: Click the “Manage Groups” button at the top of the screen. This will open the Manage
Groups Window. In this window, choose “classes” from the popup menu on the upper right. A
new screen will appear and you will see our class listed under Suzanne Schlador, Dangerous
Ideas. To select our class, click the checkbox next to the listing and click “OK”. This will
move our class onto your worksheet.
Step 3: To compare sequences, you will need to have more than one sequence on the worksheet.
To add students from our class, select the desired sequences from the popup menu. Then click
the check box for each sequence you want to include in your comparison, and press the
“Compare” button. Sequence server will open a new window to display the results of your
comparison. Note that sequences with many N’s (those rated “poor” on my table) are difficult for
the server to align!
Step 4: Practice an alignment! Select two or more sequences for alignment as described in Step
3. In the space below, or on an additional sheet, make note of which sequences you choose to
align, and how many differences you observed between them. (Differences will be highlighted in
yellow; ambiguous positions are noted in grey.)
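The tally you make by eye can also be described as a simple rule applied to each column of the alignment. The Python sketch below is an assumption about how such a rule might work, not a description of the Sequence Server’s actual code: a column containing an N is called ambiguous, a column with more than one kind of character (including a gap) is called different, and everything else is called the same. The example sequences are made up.

```python
from collections import Counter


def classify_columns(aligned_sequences):
    """Label each alignment column as 'same', 'different', or 'ambiguous'."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must all be the same length")
    labels = []
    for column in zip(*aligned_sequences):
        bases = {base.upper() for base in column}
        if "N" in bases:
            labels.append("ambiguous")   # assumed to correspond to grey
        elif len(bases) > 1:
            labels.append("different")   # assumed to correspond to yellow
        else:
            labels.append("same")
    return labels


# Made-up aligned sequences for illustration.
alignment = [
    "ATGGGCTGTCAA",
    "ATGGG-TGTCAA",
    "ATGNGCTGTCAA",
]
labels = classify_columns(alignment)
print(Counter(labels))
```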
Step 5: From your main worksheet, you can also compare any of our sequences to those contained in the
international database, GenBank. To do this, select a sequence by clicking the round button to the
right of the sequence and then clicking “Analyze”. Sequence Server will open a new window
showing the results of your analysis. Note that you can follow links (the GenBank accession
numbers) in the results to learn more about the sequences you match with. Try this with at least
one of our sequences.
Step 6: Now for the fun part! As we just did in the previous exercise (Lab #5), try using the
Bioserver software to generate at least one phylogenetic tree. From your main worksheet, return
to “Manage Groups”. Notice that you can add a variety of groups to your worksheet including
modern humans, ancient humans (those Neanderthals!), other students, and other animals. Select
two or more groups of interest to you, and at least two of our students, and draw the phylogenetic
tree you generate in the space below.
Does the tree you’ve generated look like you would expect it to? Why or why not?
DNA Sequencing (the Sanger Method):