AP Biology Lab 3: Using BLAST for DNA/Protein Sequencing

Background:
Between 1990 and 2003, scientists working on an international research project known as the Human Genome Project identified and mapped the 20,000–25,000 genes that define a human being. The project also mapped the genomes of other species, including the fruit fly, the mouse, and Escherichia coli. The location and complete sequence of the genes in each of these species are available for anyone in the world to access via the Internet.

Suppose you identify a single gene that is responsible for a particular disease in fruit flies. Is that same gene found in humans? Does it cause a similar disease? It would take nearly 10 years to read through the entire human genome to try to locate the same sequence of bases as that in fruit flies. That is not practical, so a more sophisticated computational method is required.

Bioinformatics is a field that combines statistics, mathematical modeling, and computer science to analyze biological data. Using bioinformatics methods, entire genomes can be quickly compared to detect genetic similarities and differences. An extremely powerful bioinformatics tool is BLAST, which stands for Basic Local Alignment Search Tool. Using BLAST, you can input a gene sequence of interest and search entire genomic libraries for identical or similar sequences in a matter of seconds.

Purpose:
Using genetic databases and the Internet, students will be:
1. Aware of their genetic existence.
2. Experienced at manipulating protein sequences.
3. Conscious of the enormous amount of effort required to store all the genetic information of even a single complex species in a way that is useful to both researchers and other interested parties.

Procedure: Part One: Search for a Protein Sequence
Frequently scientists will obtain just part of a gene sequence but will not know which gene it came from.
They can search databases to see if a similar sequence already exists. It is somewhat easier to search protein sequences; we will assume they have translated their DNA sequence, found an open reading frame, and obtained an amino acid sequence with which they can now search a database.

Connect to the National Center for Biotechnology Information (NCBI) site. You may find it helpful to open the assignment twice so you have two active windows: one for the assignment and one for the NCBI site.

Select: BLAST from the menu near the bottom of the homepage under the heading POPULAR. Ignore the heading "BLAST Assembled RefSeq Genomes".

Under the heading "Basic BLAST", Select: protein blast.

Copy and paste the following amino acid sequence into the box titled "Enter Query Sequence". (Each amino acid is abbreviated by a single letter; see the Appendix below.)

rragaaddddvgrrrrttrtraasevrfhgihmrsygrwsaeirdssyrghrlwigtyataeaaaraydaearrihgakantn fppppndvdsgapppppwdleahmrflgevelddgga

Scroll down a little and Click: BLAST (a large blue horizontal oval button on the left side). A page will appear with information on how the search is going. Wait for the next page to appear. This will have a wide pink bar followed by a chart with colored horizontal bars and, below this, the search results you need. If you try this during the peak hours of the day, the server may be busy and the search may take longer.

Find a protein sequence with a high match. There will be many possibilities; the list starts with the highest match. Also scroll much further down the page to see how the search sequence (labeled the query) is aligned with the sequences (labeled the subject) selected by the BLAST program.

Go to one of the identified sequences by clicking on the accession number link (the complex name/number on the left). You can click on an accession number in the section where the alignments are presented, or scroll up to the original listings just below the bar chart.
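When comparing a query row with a subject row in the alignment section, one simple measure of a "high match" is the fraction of aligned positions that hold the same letter. The following is a minimal sketch of that count, not BLAST's own scoring method. The function names and the two short aligned strings are hypothetical, and the sketch assumes gaps are written as "-" and that both rows have already been aligned to the same length.

```python
def count_identities(query, subject):
    """Return (identical positions, alignment length) for two aligned rows.

    Assumption: both rows come from an existing alignment, so they have the
    same length and use "-" for gaps. A gap never counts as a match.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1
        for q, s in zip(query.upper(), subject.upper())
        if q == s and q != "-"
    )
    return identical, len(query)


def percent_identity(query, subject):
    """Percentage of aligned positions where the two rows are identical."""
    identical, length = count_identities(query, subject)
    return 100.0 * identical / length


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from a real BLAST result.
    toy_query = "MKT-AYIA"
    toy_subject = "MKSLAYLA"
    print(count_identities(toy_query, toy_subject))
    print(percent_identity(toy_query, toy_subject))
```

This count only looks at exact matches; it says nothing about how similar two different amino acids are to each other.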
A lot of information will appear about the protein sequence and its gene. Start your first report with the name of the gene (or the protein it encodes), the organism it came from (scientific and common name; use Google for help), and the names of the individuals who determined the gene sequence. Include the accession number in your report. Note that you may find several references cited when you click on an accession number and that the protein sequence is often located at the bottom of the page.

Copy and paste the complete amino acid sequence (at the bottom) into your report (in your word processing program, convert the sequence to a fixed-width font like Courier or Courier New). Find and mark the region that is similar to the search query sequence.

Go back to the alignment section of the initial search results and go near the bottom of the list to see sequences that are less similar to the search query sequence. What do you think the + signs (between the query and the subject sequences) mean? (Hint: find a figure in the Campbell text that shows all the chemical structures of the amino acids organized according to their properties.)

Procedure: Part Two: Search for a DNA Sequence
Repeat the assignment for the second report using the provided nucleotide sequence, and use a program (blastx, see below) that will translate it into all six possible reading frames (including the three on the complementary strand).

Select: BLAST. Under the heading "Basic BLAST", Select: blastx.

Copy and paste the following nucleotide sequence into the sequence entry box.

GAAGATAATA CATAATGTCG ATTGTTGGAA GGAATGCTAT TCTGAATCTA AGAATTTCAC TATGTCCTCT GTTTATGGGC AAAAGATCGT TTGTATCCTC TCCGGTTAGC AATAGTGCAA AAGCTGTGAA ATTCTTAAAG GCTCAAAGAC GAAAACAGAA AAATGAAGCC AAACAAGCCA CTTTGAAAGC GTCAACCGAT AAGGTTGATC CAGT

Click: BLAST. This search could take longer because the program has to search all six reading frames. Follow the instructions provided for Part One to complete your second report.
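To make "all six possible reading frames" concrete, here is a minimal sketch of the translation step only: three frames on the given strand and three on its reverse complement. It is an illustration under stated assumptions, not blastx's actual implementation. It assumes the standard genetic code, writes stop codons as "*", and writes any codon containing an unexpected letter as "X". The function names and the short test sequence are hypothetical.

```python
# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the first base; "*" marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for codons with unknown letters
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(raw_sequence):
    """Translate a DNA string in frames +1, +2, +3 and -1, -2, -3.

    Spaces and line breaks (as in the grouped sequence in the handout) are
    removed before translating.
    """
    dna = "".join(raw_sequence.split()).upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence; the Part Two sequence can be pasted in instead.
    for name, protein in six_frame_translations("ATGGCC TAA").items():
        print(name, protein)
```

Only one of the six frames usually corresponds to the real protein, which is why a translated search has more work to do than a direct protein search.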
Make sure you pick a high match. Once you are at the page describing the gene, you will have to find the actual protein sequence, often located near the bottom of the page. On the protein sequence you copy into your report, mark the region encoded by the nucleotides used for the search. To help do this, copy and paste the query nucleotide sequence into your report and indicate the region encoding the methionine start codon (Hint: it is near the start of the DNA sequence).

Appendix: Amino Acid Abbreviations
A Alanine, C Cysteine, D Aspartic acid, E Glutamic acid, F Phenylalanine, G Glycine, H Histidine, I Isoleucine, K Lysine, L Leucine, M Methionine, N Asparagine, P Proline, Q Glutamine, R Arginine, S Serine, T Threonine, V Valine, W Tryptophan, Y Tyrosine

AP Biology Lab 3: Using BLAST for DNA/Protein Sequencing
Lab Report

**Please record all information in your lab notebook. Make sure to follow a common lab write-up and follow the rubric.**

Data Part One: Search for a Protein Sequence
1. Find a protein sequence with a high match.
2. Name of the gene (or the protein it encodes): ________________________________________
3. Organism (scientific name and common name) it came from: ________________________________________
4. Name the individuals who determined the gene sequence: ________________________________________
5. Accession number in your report: ________________________________________
6. Copy and paste the complete amino acid sequence into your lab report. (Remember to convert the sequence to a fixed-width font like Courier or Courier New.) Find and mark (with a highlighter) the region that is similar to the search query sequence.
Question 1: What do you think the + signs (between the query and the subject sequences) mean? (Hint: find a figure in the Campbell text that shows all the chemical structures of the amino acids organized according to their properties.)

Data Part Two: Search for a DNA Sequence
1. Find a DNA sequence with a high match.
2. Name of the gene (or the protein it encodes): ________________________________________
3. Organism it came from: ________________________________________
4. Name the individuals who determined the gene sequence: ________________________________________
5. Accession number in your report: ________________________________________
6. Find the actual protein sequence, often located near the bottom of the page. Copy and paste the protein sequence into your lab notebook. On the protein sequence, mark the region encoded by the nucleotides used for the search. To help do this, copy and paste the query nucleotide sequence here and indicate the region encoding the methionine (AUG on mRNA) start codon (Hint: it is near the start of the DNA sequence).

Conclusion:
Write a conclusion using the 3-paragraph prompt in your syllabus.

Analysis and Extension (To replace Hypothesis Points):
Please answer the questions below using complete sentences in your lab report.
1) What is the scientific purpose (or purposes) of using the BLAST website database?
2) How could a scientist use the BLAST website to prove/justify evolutionary relationships?
3) Develop an experiment using BLAST to determine whether a human and an orangutan shared a common ancestor (Hint: think of our amino acid activity in class).