AP Biology Lab: Using BLAST for DNA/Protein Sequencing
Purpose: Using genetic databases and the internet, students will become:
1. Aware of their genetic existence.
2. Experienced at manipulating protein sequences.
3. Conscious of the enormous amount of effort required to store all the genetic information of even a
single complex species in a way that is useful to both researchers and other interested parties.
Procedure: Part One: Search for a Protein Sequence
Scientists frequently obtain just part of a gene sequence without knowing which gene it came from. They can
search databases to see whether a similar sequence already exists. It is somewhat easier to search protein
sequences, so we will assume they have translated their DNA sequence, found an open reading frame, and obtained
an amino acid sequence with which they can now search a database.
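The translation step assumed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method NCBI uses: the function names `translate` and `longest_orf` and the short demo sequence are made up for this example, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2) of a DNA string into amino acids."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons with unexpected letters (for example N) become "X".
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(dna):
    """Longest stretch in the three forward frames that starts at M and runs
    to the next stop codon (or to the end of the fragment)."""
    best = ""
    for frame in range(3):
        for segment in translate(dna, frame).split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best


demo = "CCATGGCTAAATGGTAACC"  # hypothetical fragment, not the lab query
print(longest_orf(demo))
```

Because a fragment may end before its stop codon, this sketch also accepts reading frames that run off the end of the sequence.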
Connect to the National Center for Biotechnology Information (NCBI) site.
You may find it helpful to open the assignment twice so you have two active windows: one for the assignment and
one for the NCBI site.
• Select: BLAST from the menu near the bottom of the homepage under the heading POPULAR.
• Ignore the heading "BLAST Assembled RefSeq Genomes".
• Under the heading "Basic BLAST", Select: protein blast.
• Copy and paste the following amino acid sequence into the box titled "Enter Query Sequence". (Each
amino acid is abbreviated by a single letter; see the appendix at the bottom of this page.)
rragaaddddvgrrrrttrtraasevrfhgihmrsygrwsaeirdssyrghrlwigtyataeaaaraydaearrihg
akantnfppppndvdsgapppppwdleahmrflgevelddgga
• Scroll down a little and Click: BLAST (a large blue horizontal oval button on the left side).
• A page will appear with information on how the search is going. Wait for the next page to appear. It will
have a wide pink bar followed by a chart with colored horizontal bars and, below this, the search results
you need.
If you try this during the peak hours of the day, the server may be busy and the search may be slow.
• Find a protein sequence with a high match. There will be many possibilities; the list starts with the
highest match. Also scroll much further down the page to see how the search sequence (labeled the
query) is aligned with sequences (labeled the subject) selected by the BLAST program.
• Go to one of the identified sequences by clicking on its accession number link (the complex
name/number on the left). You can click on an accession number in the section where the alignments are
presented or scroll up to the original listings just below the bar chart.
• A lot of information will appear about the protein sequence and its gene. Start your first report with the
name of the gene (or the protein it encodes), the organism it came from, and the names of the individuals who
determined the gene sequence. Include the accession number in your report. Note that you may find
several references cited when you click on an accession number and that the protein sequence is often
located at the bottom of the page.
• Copy and paste the complete amino acid sequence into your report (in your word processing program,
convert the sequence to a fixed-width font like Courier or Courier New). Find and mark the region
that is similar to the search query sequence.
• Go back to the alignment section of the initial search results and go near the bottom of the list to see
sequences that are less similar to the search query sequence. What do you think the + signs (between
the query and the subject sequences) mean? (Hint: find a figure in the Campbell text that shows all
the chemical structures of the amino acids organized according to their properties.)
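One rough way to put a number on a "high match" when comparing the query and subject rows of an alignment is percent identity: the share of aligned positions where both rows show the same letter. The sketch below is an illustration only; the function name `percent_identity` and the two aligned rows are made up, and counting gap positions (`-`) as mismatches is an assumption of this example.

```python
def percent_identity(query, subject):
    """Percent of aligned columns where both rows carry the same residue.

    Both rows must already be aligned (equal length); a gap "-" in either
    row counts as a mismatch in this sketch.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    if not query:
        return 0.0
    pairs = zip(query.upper(), subject.upper())
    matches = sum(1 for q, s in pairs if q == s and q != "-")
    return 100.0 * matches / len(query)


query_row = "MKV-LAAGIT"    # made-up aligned rows, not real BLAST output
subject_row = "MKVQLSAGLT"
print(f"{percent_identity(query_row, subject_row):.1f}% identical")
```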
Procedure: Part Two: Search for a DNA Sequence
Repeat the assignment for the second report using the provided nucleotide sequence and use a program (blastx,
see below) that will translate it into all six possible reading frames (including the three on the complementary
strand).
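To make "all six possible reading frames" concrete, here is a minimal sketch of a six-frame translation. It is not the blastx implementation: the names `reverse_complement` and `six_frame_translation`, the frame labels, and the toy input are assumptions chosen for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read in its own 5' to 3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translation(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated proteins."""
    frames = {}
    for sign, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
            frames[f"{sign}{offset + 1}"] = protein
    return frames


for label, protein in six_frame_translation("ATGGCCTAA").items():  # toy input
    print(label, protein)
```

Three frames come from shifting the start by zero, one, or two bases on the given strand, and the other three come from doing the same on the reverse complement.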
• Select: BLAST.
• Under the heading "Basic BLAST", Select: blastx.
• Copy and paste the following nucleotide sequence into the sequence entry box.
GAAGATAATA CATAATGTCG ATTGTTGGAA GGAATGCTAT TCTGAATCTA
AGAATTTCAC TATGTCCTCT GTTTATGGGC AAAAGATCGT TTGTATCCTC
TCCGGTTAGC AATAGTGCAA AAGCTGTGAA ATTCTTAAAG GCTCAAAGAC
GAAAACAGAA AAATGAAGCC AAACAAGCCA CTTTGAAAGC GTCAACCGAT
AAGGTTGATC CAGT
• Click: BLAST
This search could take longer because the program has to search all six reading frames.
• Follow the instructions provided for Part One to complete your second report. Make sure you pick a high
match. Once you are at the site describing the gene, you will have to find the actual protein sequence,
often located near the bottom of the page.
• Mark on the protein sequence you copy into your report the region encoded by the nucleotides used for
the search. To help do this, copy and paste the query nucleotide sequence into your report and indicate
the region encoding the methionine start codon (Hint: it is near the start of the DNA sequence).
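When marking up a nucleotide sequence printed in spaced groups of ten, it can help to remove the spacing first and then list where ATG occurs. The sketch below shows one way to do that. The helper names `clean_sequence` and `start_codon_positions` are hypothetical, and the toy block is made up so the lab question stays open for your own sequence.

```python
def clean_sequence(block):
    """Remove spaces and line breaks from a pasted block and uppercase it."""
    seq = "".join(block.split()).upper()
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


def start_codon_positions(seq):
    """Return (1-based position, reading frame 1-3) for every ATG found
    on the given strand."""
    hits = []
    pos = seq.find("ATG")
    while pos != -1:
        hits.append((pos + 1, pos % 3 + 1))
        pos = seq.find("ATG", pos + 1)
    return hits


toy_block = "GGCATGAAAC TATGCC"  # made-up example in the handout's spaced layout
seq = clean_sequence(toy_block)
print(seq)
print(start_codon_positions(seq))
```

Not every ATG is a true start codon; the reading frame reported for each hit helps when comparing it with the frame of the matching protein.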
Appendix: Amino Acid Abbreviations
A Alanine, C Cysteine, D Aspartic acid, E Glutamic acid,
F Phenylalanine, G Glycine, H Histidine, I Isoleucine,
K Lysine, L Leucine, M Methionine, N Asparagine,
P Proline, Q Glutamine, R Arginine, S Serine,
T Threonine, V Valine, W Tryptophan, Y Tyrosine
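The abbreviations in this appendix can also be written as a small lookup table, which makes it easy to spell out a stretch of one-letter code or to check a pasted sequence for stray characters. This is a minimal sketch; the names `AMINO_ACIDS`, `spell_out`, and `invalid_letters` are made up for illustration.

```python
# One-letter codes for the 20 amino acids listed in the appendix.
AMINO_ACIDS = {
    "A": "Alanine", "C": "Cysteine", "D": "Aspartic acid", "E": "Glutamic acid",
    "F": "Phenylalanine", "G": "Glycine", "H": "Histidine", "I": "Isoleucine",
    "K": "Lysine", "L": "Leucine", "M": "Methionine", "N": "Asparagine",
    "P": "Proline", "Q": "Glutamine", "R": "Arginine", "S": "Serine",
    "T": "Threonine", "V": "Valine", "W": "Tryptophan", "Y": "Tyrosine",
}


def spell_out(sequence):
    """Expand one-letter codes into full amino acid names."""
    return [AMINO_ACIDS[letter] for letter in sequence.upper()]


def invalid_letters(sequence):
    """Return any letters that are not one of the 20 codes above."""
    return sorted(set(sequence.upper()) - set(AMINO_ACIDS))


print(spell_out("rrag"))         # first four letters of the Part One query
print(invalid_letters("rragx"))  # "x" is not in the appendix table
```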
AP Biology Lab: Using BLAST for DNA/Protein Sequencing Lab Report
**Please record all information in your lab notebook. Follow the standard lab write-up format
and the rubric.**
Data Part One: Search for a Protein Sequence
1. Find a protein sequence with a high match.
2. Name of the gene (or the protein it encodes): _____________________________________________________
3. Organism it came from: _______________________________________________________________________
4. Names of the individuals who determined the gene sequence: ____________________________________________
5. Accession number in your report: _______________________________________________________________
6. Copy and paste the complete amino acid sequence into your lab report. (Remember, convert the sequence to a
fixed-width font like Courier or Courier New.) Find and mark the region (with a highlighter) that is similar
to the search query sequence.
Question 1: What do you think the + signs (between the query and the subject sequences) mean? (Hint: find a
figure in the Campbell text that shows all the chemical structures of the amino acids organized according to their
properties.)
Data Part Two: Search for a DNA Sequence
1. Find a protein sequence with a high match to the translated DNA query.
2. Name of the gene (or the protein it encodes): _____________________________________________________
3. Organism it came from: _______________________________________________________________________
4. Names of the individuals who determined the gene sequence: ____________________________________________
5. Accession number in your report: _______________________________________________________________
6. Find the actual protein sequence, often located near the bottom of the page. Copy and paste the protein
sequence into your lab notebook. Mark on the protein sequence the region encoded by the nucleotides used for the
search. To help do this, copy and paste the query nucleotide sequence here and indicate the region encoding the
methionine (AUG on mRNA) start codon (Hint: it is near the start of the DNA sequence).
Conclusion: Write a conclusion using the 3-paragraph prompt in your syllabus.
Analysis and Extension (To replace Hypothesis Points): Please answer the questions below using
complete sentences in your lab report.
1) What are the scientific purposes of using the BLAST website database?
2) How could a scientist use the BLAST website to support or justify evolutionary relationships?
3) Develop an experiment using BLAST to determine whether humans and orangutans share a common ancestor
(Hint: think of our amino acid activity in class).