AP Biology Lab: Using BLAST for DNA/Protein Sequencing

Purpose: Using genetic databases and the internet, students will become:
1. Aware of their genetic existence.
2. Experienced at manipulating protein sequences.
3. Conscious of the enormous effort required to store all the genetic information of even a single complex species in a way that is useful to both researchers and other interested parties.

Procedure: Part One: Search for a Protein Sequence

Frequently, scientists obtain just part of a gene sequence but do not know which gene it came from. They can search databases to see whether a similar sequence already exists. It is somewhat easier to search protein sequences, so we will assume they have translated their DNA sequence, found an open reading frame, and obtained an amino acid sequence with which they can now search a database.

Connect to the National Center for Biotechnology Information (NCBI) site. You may find it helpful to open the assignment twice so you have two active windows: one for the assignment and one for the NCBI site.

Select: BLAST from the menu near the bottom of the homepage under the heading POPULAR.

Ignore the heading "BLAST Assembled RefSeq Genomes". Under the heading "Basic BLAST", Select: protein blast.

Copy and paste the following amino acid sequence into the box titled "Enter Query Sequence". (Each amino acid is abbreviated by a single letter; see the appendix below.)

rragaaddddvgrrrrttrtraasevrfhgihmrsygrwsaeirdssyrghrlwigtyataeaaaraydaearrihg
akantnfppppndvdsgapppppwdleahmrflgevelddgga

Scroll down a little and Click: BLAST (a large blue horizontal oval button on the left side). A page will appear with information on how the search is going. Wait for the next page to appear. This will have a wide pink bar followed by a chart with colored horizontal bars and, below this, the search results you need. If you try this during the peak hours of the day, the server may be busy and you may have to wait or try again.
Find a protein sequence with a high match. There will be many possibilities; the list starts with the highest match. Also scroll much further down the page to see how the search sequence (labeled the query) is aligned with the sequences (labeled the subject) selected by the BLAST program.

Go to one of the identified sequences by clicking on the accession number link (the complex name/number on the left). You can click on an accession number in the section where the alignments are presented, or scroll up to the original listings just below the bar chart. A lot of information will appear about the protein sequence and its gene.

Start your first report with the name of the gene (or the protein it encodes), the organism it came from, and the names of the individuals who determined the gene sequence. Include the accession number in your report. Note that you may find several references cited when you click on an accession number, and that the protein sequence is often located at the bottom of the page.

Copy and paste the complete amino acid sequence into your report (in your word processing program, convert the sequence to a fixed-width font such as Courier or Courier New so that the letters line up in columns). Find and mark the region that is similar to the search query sequence.

Go back to the alignment section of the initial search results and go near the bottom of the list to see sequences that are less similar to the search query sequence. What do you think the + signs (between the query and the subject sequences) mean? (Hint: find a figure in the Campbell text that shows all the chemical structures of the amino acids organized according to their properties.)

Procedure: Part Two: Search for a DNA Sequence

Repeat the assignment for the second report using the provided nucleotide sequence and a program (blastx, see below) that will translate it into all six possible reading frames (including the three on the complementary strand).

Select: BLAST.

Under the heading "Basic BLAST", Select: blastx.
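For reference, the idea of translating a nucleotide sequence in all six reading frames can be sketched in a few lines of Python. This is only a minimal illustration of the concept, not the code the BLAST program actually runs. The function names and the short demo sequence are hypothetical (the demo is not the lab sequence), and the sketch assumes the standard genetic code.

```python
# Standard genetic code, built from the usual T, C, A, G ordering.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))


def reverse_complement(dna):
    """Return the complementary strand, read in its own 5'-to-3' direction."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the first base; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate a DNA string in three frames on each strand."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


# Hypothetical 12-base demo sequence, chosen only for illustration.
for name, protein in six_frames("ATGGCCAAATAG").items():
    print(name, protein)
```

Each frame starts one base later than the previous one, and the three "-" frames are read from the complementary strand. A search program can then compare every one of the six translations against a protein database.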
Copy and paste the following nucleotide sequence into the sequence entry box.

GAAGATAATA CATAATGTCG ATTGTTGGAA GGAATGCTAT TCTGAATCTA AGAATTTCAC
TATGTCCTCT GTTTATGGGC AAAAGATCGT TTGTATCCTC TCCGGTTAGC AATAGTGCAA
AAGCTGTGAA ATTCTTAAAG GCTCAAAGAC GAAAACAGAA AAATGAAGCC AAACAAGCCA
CTTTGAAAGC GTCAACCGAT AAGGTTGATC CAGT

Click: BLAST. This search could take longer because the program has to search all six reading frames.

Follow the instructions provided for Part One to complete your second report. Make sure you pick a high match. Once you are at the site describing the gene, you will have to find the actual protein sequence, often located near the bottom of the page. On the protein sequence you copy into your report, mark the region encoded by the nucleotides used for the search. To help do this, copy and paste the query nucleotide sequence into your report and indicate the region encoding the methionine start codon (Hint: it is near the start of the DNA sequence).

Appendix: Amino Acid Abbreviations

A Alanine, C Cysteine, D Aspartic acid, E Glutamic acid, F Phenylalanine,
G Glycine, H Histidine, I Isoleucine, K Lysine, L Leucine,
M Methionine, N Asparagine, P Proline, Q Glutamine, R Arginine,
S Serine, T Threonine, V Valine, W Tryptophan, Y Tyrosine

AP Biology Lab: Using BLAST for DNA/Protein Sequencing Lab Report

**Please record all information in your lab notebook. Make sure to follow a common lab write-up and follow the rubric.**

Data Part One: Search for a Protein Sequence
1. Find a protein sequence with a high match.
2. Name of the gene (or the protein it encodes): ______________________________
3. Organism it came from: ______________________________
4. Names of the individuals who determined the gene sequence: ______________________________
5. Accession number: ______________________________
6. Copy and paste the complete amino acid sequence into your lab report. (Remember to convert the sequence to a fixed-width font such as Courier or Courier New.) Find and mark (with a highlighter) the region that is similar to the search query sequence.

Question 1: What do you think the + signs (between the query and the subject sequences) mean? (Hint: find a figure in the Campbell text that shows all the chemical structures of the amino acids organized according to their properties.)

Data Part Two: Search for a DNA Sequence
1. Find a high match for the DNA query sequence.
2. Name of the gene (or the protein it encodes): ______________________________
3. Organism it came from: ______________________________
4. Names of the individuals who determined the gene sequence: ______________________________
5. Accession number: ______________________________
6. Find the actual protein sequence, often located near the bottom of the page. Copy and paste the protein sequence into your lab notebook. On the protein sequence, mark the region encoded by the nucleotides used for the search. To help do this, copy and paste the query nucleotide sequence here and indicate the region encoding the methionine (AUG on mRNA) start codon (Hint: it is near the start of the DNA sequence).

Conclusion: Write a conclusion using the 3-paragraph prompt in your syllabus.

Analysis and Extension (To replace Hypothesis Points): Please answer the questions below using complete sentences in your lab report.
1) What is the scientific purpose (or purposes) of using the BLAST website database?
2) How could a scientist use the BLAST website to support or justify evolutionary relationships?
3) Develop an experiment using BLAST to determine whether humans and orangutans share a common ancestor (Hint: think of our amino acid activity in class).
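As an optional aid for the extension questions, here is a small Python sketch of one simple way to put a number on how similar two already-aligned amino acid sequences are. The function name and the two short sequences are hypothetical and made up for illustration; they are not real human or orangutan proteins. The sketch counts gap positions ("-") as mismatches, which is an assumption, so its result may not match the identity figure a BLAST report shows.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same residue.

    Both inputs must already be aligned (same length, "-" for gaps).
    Columns containing a gap are counted as mismatches (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100 * matches / len(aligned_a)


# Made-up example sequences, for illustration only.
species_1 = "MKTAYIAKQR"
species_2 = "MKSAYIAKQR"
print(percent_identity(species_1, species_2))
```

Comparing the same protein across several species in this way gives one number per pair of species, which can then be interpreted alongside the alignments themselves.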