Lab Handout

Name_____________________
Investigation: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST
Using bioinformatics as a tool to determine evolutionary relationships.
Between 1990 and 2003, scientists working on an international research project known as the Human
Genome Project were able to identify and map the 20,000–25,000 genes that define a human being. The
project also successfully mapped the genomes of other species, including the fruit fly, mouse, and
Escherichia coli. The location and complete sequence of the genes in each of these species are available
for anyone in the world to access via the Internet.
Why is this information important? Being able to identify the precise location and sequence of human
genes will allow us to better understand genetic diseases. In addition, learning about the sequence of
genes in other species helps us understand evolutionary relationships among organisms. Many of our
genes are identical or similar to those found in other species.
Suppose you identify a single gene that is responsible for a particular disease in fruit flies. Is that same
gene found in humans? Does it cause a similar disease? It would take you nearly 10 years to read
through the entire human genome to try to locate the same sequence of bases as that in fruit flies. This
definitely isn’t practical, so a sophisticated technological method is needed.
Bioinformatics is a field that combines statistics, mathematical modeling, and computer science to
analyze biological data. Using bioinformatics methods, entire genomes can be quickly compared in order
to detect genetic similarities and differences. An extremely powerful bioinformatics tool is BLAST, which
stands for Basic Local Alignment Search Tool. Using BLAST, you can input a gene sequence of interest
and search entire genomic libraries for identical or similar sequences in a matter of seconds.
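To give a rough sense of how BLAST avoids reading a genome letter by letter, here is a minimal Python sketch of one idea it relies on: "seeding." The tool first looks for short words that match exactly between the query and a database sequence, and only then examines those regions more closely. This is a simplified illustration, not BLAST itself. The function name `find_seeds`, the word length of 4, and both sequences are made up for the example, and real BLAST adds extension, scoring, and statistics on top of this step.

```python
def find_seeds(query: str, subject: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (query_position, subject_position) pairs where a k-letter word matches exactly.

    A simplified illustration of BLAST-style seeding, not BLAST itself.
    """
    # Index every k-letter word in the subject sequence by its start positions.
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)

    # Look up each k-letter word of the query in that index.
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


query = "ATGGCTTAC"        # hypothetical query fragment
subject = "CCATGGCTTACGG"  # hypothetical database sequence
print(find_seeds(query, subject))
# [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7)]
```

All six seeds fall on the same diagonal (subject position minus query position is 2 for each). A run like this signals a longer shared stretch of sequence that is worth extending into a full alignment.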
In this laboratory investigation, you will use BLAST to compare several genes, and then use the
information to construct a cladogram. A cladogram (also called a phylogenetic tree) is a visualization of
the evolutionary relatedness of species. A cladogram is treelike, with the endpoints of each branch
representing a specific species. The more closely two species are grouped on a cladogram, the more recently
they shared a common ancestor.
Cladograms can also include additional details, such as the evolution of particular physical structures
called shared derived characters. The placement of the derived characters corresponds to when (in a
general, not a specific, sense) that character evolved; every species above the character label possesses
that structure.
Historically, only physical structures were used to create cladograms; however, modern-day cladistics
relies heavily on genetic evidence as well. For example, chimpanzees and humans share more than 95% of their
DNA, which would place them closely together on a cladogram. Humans and fruit flies share
approximately 60% of their DNA, which would place them farther apart on a cladogram.
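A cladogram's branching pattern can also be written as text. One common convention is Newick notation, in which each pair of parentheses groups species that share a more recent common ancestor. The minimal Python sketch below converts a nested grouping into a Newick string using the human, chimpanzee, and fruit fly example above. The function name `to_newick` is hypothetical, and the output records only branching order, not branch lengths or derived characters.

```python
def to_newick(node) -> str:
    """Convert a nested tuple of species names into a Newick-format string."""
    if isinstance(node, str):
        # A leaf: one species at the tip of a branch.
        # Newick convention writes spaces in unquoted names as underscores.
        return node.replace(" ", "_")
    # An internal node: its children share a common ancestor.
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Humans and chimpanzees are grouped first because they share a more
# recent common ancestor with each other than either does with fruit flies.
tree = (("Human", "Chimpanzee"), "Fruit fly")
print(to_newick(tree) + ";")
# ((Human,Chimpanzee),Fruit_fly);
```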
PRE-LAB:
1) Use the following data to construct a cladogram of the major plant groups (“1” = characteristic is
present within group):
| Organisms        | Vascular Tissue | Flowers | Seeds |
| Mosses           | 0               | 0       | 0     |
| Pine trees       | 1               | 0       | 1     |
| Flowering plants | 1               | 1       | 1     |
| Ferns            | 1               | 0       | 0     |
2) GAPDH (glyceraldehyde 3-phosphate dehydrogenase) is an enzyme that catalyzes the sixth step in
glycolysis, an important metabolic pathway that produces molecules used in cellular respiration. The
following data table shows the percentage similarity of this gene, and of the protein it encodes, between
humans and other species. For example, according to the table, the GAPDH gene in chimpanzees is 99.6%
identical to the gene found in humans, while the protein is 100% identical.
| Species                             | Gene Percentage Similarity | Protein Percentage Similarity |
| Chimpanzee (Pan troglodytes)        | 99.6%                      | 100%                          |
| Dog (Canis lupus familiaris)        | 91.3%                      | 95.2%                         |
| Fruit fly (Drosophila melanogaster) | 72.4%                      | 76.7%                         |
| Roundworm (Caenorhabditis elegans)  | 68.2%                      | 74.3%                         |
a) Why is the percentage similarity of the gene lower than the percentage similarity of the protein for
every species in the table? (Hint: Recall how a gene is expressed to produce a protein.)
b) Draw a cladogram depicting the evolutionary relationships among all five species (including humans)
according to their percentage similarity in the GAPDH gene.
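Percentage similarity figures like those in the GAPDH table come from comparing sequences position by position. The simplest version of this kind of measure is percent identity. As a minimal Python sketch, assuming two sequences have already been aligned to the same length, percent identity is the number of matching positions divided by the number of aligned positions, times 100. The function name and the two 10-nucleotide fragments are hypothetical and are not real GAPDH data. The published figures may have been calculated with a more detailed method.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences are identical.

    Assumes the sequences are already aligned and have the same, nonzero length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("aligned sequences must be non-empty and the same length")
    # Count positions where both sequences carry the same letter.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100 * matches / len(seq_a)


# Hypothetical 10-nucleotide aligned fragments (not real GAPDH data).
print(percent_identity("ATGGTGAAGG", "ATGGTCAAGG"))
# 90.0
```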
PROCEDURE (Part One):
A team of scientists has uncovered the fossil specimen near Liaoning Province, China. Make some general
observations about the morphology (physical structure) of the fossil, and then record your observations in
the space below. Little is known about the fossil; it appears to be a new species. Upon careful examination
of the fossil, small amounts of soft tissue have been discovered. Normally, soft tissue does not survive
fossilization; however, rare cases of such preservation do occur. Scientists were able to extract DNA from
the tissue and use it to sequence several genes. Your task is to use BLAST to analyze these genes and
determine the most likely placement of the fossil species on the cladogram below.
Step 1: Make a hypothesis as to where you believe the fossil specimen should be placed on the
cladogram based on the morphological observations that you made of the fossil. Mark (and label) your
hypothesis on the cladogram above.
OBSERVATIONS:
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Step 2: Download the sequences of the four gene samples taken from the unknown fossil. These can be
found on my website under the ‘Labs & Lab Notebook’ link.
Step 3: Use the following website for your genetic analysis:
BLAST – use this website to compare gene sequences with genomic DNA from representative
organisms in a database. http://blast.ncbi.nlm.nih.gov/Blast.cgi
Step 4: Go to the BLAST website. Under ‘Basic BLAST’, click on ‘nucleotide blast.’ Copy and paste the
gene sequence for FOSSIL GENE 1 into the ‘Enter Query Sequence’ box on the BLAST webpage. Under
‘Choose Search Set,’ make sure that the database “others (nr etc.)” is selected. Under ‘Program Selection,’
make sure that “Highly similar sequences (megablast)” is selected.
Then, click the blue “BLAST” button to search for gene sequences in different species that are
similar to the unknown fossil gene sequence.
Step 5: When the results of the BLAST sequence comparison appear, scroll down to the section entitled
‘sequences producing significant alignments.’ The species in the list that appears below this section are
those with sequences identical to (or most similar to) the gene of interest. The most similar sequences
are listed first.
You’ll need to click on the particular species listed and the ‘accession’ link, where you’ll find
more information, including the common name of the species, the number of nucleotides that match
between the gene of interest and the known organism, etc. Using the information from your
results, complete TABLE 1. REPEAT STEPS 4 & 5 FOR ALL FOUR FOSSIL GENES.
TABLE 1
| Fossil Gene # | Most closely related organism (genus and species name) | Most closely related species (common name) | “Max Score” | Number of matching nucleotides (fossil gene vs. organism gene) | % nucleotide match (“Max Identity”) | Next TWO most closely related organisms (common names only) |
| 1 | | | | / | | |
| 2 | | | | / | | |
| 3 | | | | / | | |
| 4 | | | | / | | |
Step 6: Based on what you’ve learned from the sequence analysis and what you know from the fossil
structure itself, decide where the new fossil species belongs on the cladogram. Mark (and label) your
results on the cladogram so that you may compare your results with your hypothesis.
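When filling in the “Number of matching nucleotides” and “% nucleotide match” columns of TABLE 1, note that BLAST's alignment details show these together as a fraction and a whole-number percentage. In the plain-text report, this appears as a line such as “Identities = 510/530 (96%)”. The minimal Python sketch below pulls the two numbers out of such a line and recomputes the unrounded percentage. The function name and example line are hypothetical, and because the exact layout differs between the web page and text reports, the pattern it searches for is an assumption.

```python
import re


def parse_identities(line: str) -> tuple[int, int, float]:
    """Return (matching nucleotides, aligned length, unrounded % identity)
    from a BLAST line such as 'Identities = 510/530 (96%)'."""
    # Assumed pattern: the word "Identities", then a fraction like 510/530.
    match = re.search(r"Identities\D*?(\d+)/(\d+)", line)
    if match is None:
        raise ValueError("no 'Identities' fraction found in line")
    matching, length = int(match.group(1)), int(match.group(2))
    return matching, length, 100 * matching / length


# Hypothetical example line; copy the real one from your own BLAST alignment.
matching, length, pct = parse_identities(" Identities = 510/530 (96%), Gaps = 2/530 (0%)")
print(f"{matching} / {length} nucleotides match ({pct:.1f}%)")
# 510 / 530 nucleotides match (96.2%)
```

The first two numbers go in the “Number of matching nucleotides” column as written (510 / 530). The recomputed value is simply a more precise form of the percentage BLAST already displays.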
PROCEDURE (Part Two):
Now that you’ve completed Part One of the investigation, you should feel more comfortable using
BLAST. The next step is to learn how to find and BLAST your own genes of interest. To locate a gene,
you will go to the following website:
NCBI Gene – use this website to obtain gene sequences for analysis.
http://www.ncbi.nlm.nih.gov/gene
Step 1: Use the search tool at the top of this website to search for the genes listed in TABLE 2.
Step 2: Click on the first link that appears and scroll down to the “NCBI Reference Sequences.” Under
“mRNA and Proteins,” click on the first file name. It will be named “NM_000257.2” or something
similar.
Step 3: Just below the gene title, click on “FASTA.” FASTA is a plain-text format for displaying
sequences: a single header line beginning with “>”, followed by the sequence itself.
Step 4: Copy the entire gene sequence, and then go to the BLAST website (see Procedure: Part One,
Step 3).
Step 5: Under ‘Basic BLAST’, click on ‘nucleotide blast.’ Paste your gene sequence into the ‘Enter Query
Sequence’ box on the BLAST webpage. Under ‘Choose Search Set,’ make sure that “others (nr etc.)” is
selected. Under ‘Program Selection,’ make sure that “Highly similar sequences (megablast)” is selected.
Then, click the blue “BLAST” button to search for gene sequences in different species that are similar to
the human gene sequence of interest.
Step 6: When the results of the BLAST sequence comparison appear, scroll down to the section entitled
‘sequences producing significant alignments.’ The species in the list that appears below this section are
those with sequences identical to (or most similar to) the human gene of interest. The most similar
sequences are listed first; a higher “Max Score” usually indicates a closer match and, therefore, a closer
genetic relationship.
***For TABLE 2, exclude all Homo sapiens (human) DNA sequence matches. From the remaining organisms in
your BLAST results list, choose the DNA sequence that most closely matches the sequence of your human gene
of interest, and record that organism’s scientific name. For example, Pan troglodytes is the scientific
name of the common chimpanzee, where Pan is the genus name and troglodytes is the species name.***
Remember to click on the particular species listed and the ‘accession’ link, where you’ll find more
information, including the common name of the species, the number of nucleotides that match between the
gene of interest and the known organism, etc. Using the information from your results, complete TABLE 2.
REPEAT STEPS 1-6 FOR ALL FOUR PROVIDED HUMAN GENES OF INTEREST.
Step 7: Think of a human protein NOT listed in the table, search for the gene that encodes this protein,
run a BLAST comparison for this gene, and list all results in the final row of TABLE 2.
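A FASTA record puts a one-line header (beginning with “>”) above the sequence, and the sequence itself is often wrapped across many lines. The minimal Python sketch below shows how the two parts separate by splitting a single FASTA record into its header and its sequence. The function name `read_fasta` and the shortened record are hypothetical, and the sketch handles only one record at a time. When pasting into BLAST, the header line can stay, since the query box accepts FASTA-formatted sequences.

```python
def read_fasta(text: str) -> tuple[str, str]:
    """Split a single FASTA record into its header and its sequence.

    The header is the line starting with '>'; the sequence may be wrapped
    across several lines, so the remaining lines are joined together.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must begin with a '>' header line")
    header = lines[0][1:]                  # drop the leading '>'
    sequence = "".join(lines[1:]).upper()  # rejoin the wrapped sequence lines
    return header, sequence


# Hypothetical, shortened record for illustration only.
record = """>example_gene hypothetical mRNA
ATGGCCACCG
TCAAGG
"""
header, seq = read_fasta(record)
print(header)    # example_gene hypothetical mRNA
print(len(seq))  # 16
```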
TABLE 2
| Human Gene | Most closely related organism (genus and species name) | Most closely related species (common name) | “Max Score” | Number of matching nucleotides (human gene vs. organism gene) | % nucleotide match (“Max Identity”) | Next TWO most closely related organisms (common names only) |
| Human Estrogen Receptor | | | | / | | |
| Human Keratin 18 | | | | / | | |
| Human Catalase | | | | / | | |
| Human Myosin 7 (cardiac) | | | | / | | |
| Human ______________ | | | | / | | |
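The rule for TABLE 2 (set aside every Homo sapiens match, then rank the remaining hits by “Max Score”) can be sketched in a few lines of Python. This is a minimal illustration, assuming each BLAST hit has already been reduced to a scientific name and a Max Score. The `Hit` class, the function name, and all scores below are hypothetical and are not real BLAST results. The first organism returned fills the “most closely related” columns, and the next two fill the “Next TWO” column.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit, reduced to the two fields used here (hypothetical structure)."""
    scientific_name: str
    max_score: float


def best_non_human_hits(hits: list[Hit], n: int = 3) -> list[Hit]:
    """Drop Homo sapiens hits, then return the n remaining hits with the highest Max Score."""
    non_human = [hit for hit in hits if hit.scientific_name != "Homo sapiens"]
    return sorted(non_human, key=lambda hit: hit.max_score, reverse=True)[:n]


# Hypothetical scores for illustration only; use the values from your own results.
hits = [
    Hit("Homo sapiens", 3000),
    Hit("Pan troglodytes", 2900),
    Hit("Homo sapiens", 2880),
    Hit("Gorilla gorilla", 2850),
    Hit("Macaca mulatta", 2500),
    Hit("Mus musculus", 1500),
]
top = best_non_human_hits(hits)
print([hit.scientific_name for hit in top])
# ['Pan troglodytes', 'Gorilla gorilla', 'Macaca mulatta']
```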
Analysis
1) Using your results from TABLE 2, sketch a hypothetical cladogram based upon gene sequence
matches. Your cladogram should include humans and ALL animals listed within TABLE 2.
2) What is the function in humans of each of the proteins produced from the genes in TABLE 2?
| GENE                     | PROTEIN FUNCTION |
| Human Estrogen Receptor  |                  |
| Human Keratin 18         |                  |
| Human Catalase           |                  |
| Human Myosin 7 (cardiac) |                  |
| Human ______________     |                  |
3) Is it possible to find the same gene in two different kinds of organisms but not find the protein that is
produced by that gene in both organisms? Why or why not?
4) If you found the same gene in all organisms you test, what does this suggest about the evolution of
this gene in the history of life on Earth?
5) Does the use of DNA sequences in the study of evolutionary relationships mean that other
characteristics are unimportant in such studies? Explain your answer.