Sciences of Life Academy @ Mizzou
Bacterial Genomics & Bioinformatics
Dr. Shari Freyermuth, Biochemistry Dept.
Session I: Tues, July 10; 1:30 – 4:30
Session II: Thurs, July 12; 1:30 – 4:30

Bioinformatics exercise

This is the raw DNA sequence output from one Ammonifex clone.

31-0304A.Seq  LENGTH: 885  Thu, Mar 4, 2007 4:04 PM  CHECK: 2200  ..

  1  ANACTATCAC ATGATTACGC CAAGCTATTT AGGTGACACT ATAGAATACT
 51  CAAGCTATGC ATCAAGCTTG GTACCGAGCT CGGATCCACT AGTAACGGCC
101  GCCAGTGTGC TGGAATTCGC CCTTTAGAGG TGGCTGTGGC CGCTTTGGCG
151  GTTGCGGACG ATCAGGTTGG GGGCAGGAAG CCCAGCGTCC TCCCAGATCA
201  TCGCCTTGGC ATGGTCAAGG TCGAAGATCA GCCAGCTCAC GAAGCCGGGC
251  CGATTGACCT GCATGTAGGG GTAGCGAATG GCGTATTCGC GCGGCCGGAC
301  TCGGGTGGCC GTCTTGTCGT CAGAGCAACG CGGCAGATAG GGCGCTTCCG
351  TGAGCACGCG ATTGAGCGCG GTGCCCGACT GGAAAAAGCG GTCTGCTGTC
401  TGGTTGTCTA GTTTCTGTGC CGTTGTACCT GCCATAACCC CTCATCACAC
451  GTCCCGTAGG GGCGTGCGCA CGCTTGCGCG CAGGGCTTAA GACACGTACA
501  ATCGAAGCAT AGTCAGTCGC TTAATTGCAC TGGTACAGAG TTCACCCGCC
551  AAAGCGAACG CTGTATTTGT CTGGGCACCT GCCCGGTTTC TCTGTCAAGG
601  CCCGCTCGCC AAACGGGCTT TGTCGTTTCT GACTCTCCAG TTTTCTGCTG
651  TCTGGTCAGC TCTCTNCTTC TTCATCACGG TCACTCTGCG TCCATCGAGG
701  NCGGTGCTTG CTTGTAANCC GCGCACCATT GTCTTGGGGC ACCCAGTGCT
751  CCGTACTCCC ATCAAACTTC ATGATTCATA NCTTTTGGAG CACGCTTGCA
801  GCGNCNTCCA AGCTCTGGGA ANTACCGAAG GCCGCCNCGG GGAGCAACCC
851  AAANTTNGCT CATTTATTCC GGNCCGCACG AAGAA

Our goal is to determine whether this DNA sequence is similar to any known DNA sequences. DNA from many organisms has already been sequenced, and some of it has been linked to specific functions. Thus, one of the first steps in learning about any new DNA sequence is to find out whether it resembles any known sequence. The resource most researchers use for this is the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine at the National Institutes of Health. The website is: http://www.ncbi.nlm.nih.gov/.
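The cleanup described in Step 1 can also be checked with a short script. The Python sketch below is an optional aid, not part of the NCBI workflow: it joins the numbered rows of the listing above into one continuous string (skipping the header line), confirms the length against the header's LENGTH: 885, and removes CGCCCTT and everything 5' of it, as Step 1 instructs. It assumes the motif occurs only once before the insert, which holds for this clone (positions 118 to 124). The function names are illustrative, not part of any NCBI tool.

```python
# Optional sketch: clean the raw clone listing and trim the vector at the
# CGCCCTT junction described in Step 1. Function names are illustrative.

# The listing from the handout: a header line, then numbered rows of bases.
LISTING = """\
31-0304A.Seq  LENGTH: 885  Thu, Mar 4, 2007 4:04 PM  CHECK: 2200  ..
  1 ANACTATCAC ATGATTACGC CAAGCTATTT AGGTGACACT ATAGAATACT
 51 CAAGCTATGC ATCAAGCTTG GTACCGAGCT CGGATCCACT AGTAACGGCC
101 GCCAGTGTGC TGGAATTCGC CCTTTAGAGG TGGCTGTGGC CGCTTTGGCG
151 GTTGCGGACG ATCAGGTTGG GGGCAGGAAG CCCAGCGTCC TCCCAGATCA
201 TCGCCTTGGC ATGGTCAAGG TCGAAGATCA GCCAGCTCAC GAAGCCGGGC
251 CGATTGACCT GCATGTAGGG GTAGCGAATG GCGTATTCGC GCGGCCGGAC
301 TCGGGTGGCC GTCTTGTCGT CAGAGCAACG CGGCAGATAG GGCGCTTCCG
351 TGAGCACGCG ATTGAGCGCG GTGCCCGACT GGAAAAAGCG GTCTGCTGTC
401 TGGTTGTCTA GTTTCTGTGC CGTTGTACCT GCCATAACCC CTCATCACAC
451 GTCCCGTAGG GGCGTGCGCA CGCTTGCGCG CAGGGCTTAA GACACGTACA
501 ATCGAAGCAT AGTCAGTCGC TTAATTGCAC TGGTACAGAG TTCACCCGCC
551 AAAGCGAACG CTGTATTTGT CTGGGCACCT GCCCGGTTTC TCTGTCAAGG
601 CCCGCTCGCC AAACGGGCTT TGTCGTTTCT GACTCTCCAG TTTTCTGCTG
651 TCTGGTCAGC TCTCTNCTTC TTCATCACGG TCACTCTGCG TCCATCGAGG
701 NCGGTGCTTG CTTGTAANCC GCGCACCATT GTCTTGGGGC ACCCAGTGCT
751 CCGTACTCCC ATCAAACTTC ATGATTCATA NCTTTTGGAG CACGCTTGCA
801 GCGNCNTCCA AGCTCTGGGA ANTACCGAAG GCCGCCNCGG GGAGCAACCC
851 AAANTTNGCT CATTTATTCC GGNCCGCACG AAGAA
"""

JUNCTION_MOTIF = "CGCCCTT"  # vector bases just 5' of the insert (from Step 1)


def listing_to_sequence(listing: str) -> str:
    """Join the bases from every row that starts with a position number."""
    bases = []
    for line in listing.splitlines():
        fields = line.split()
        # The header line does not start with a bare number, so it is skipped.
        if fields and fields[0].isdigit():
            bases.extend(fields[1:])
    return "".join(bases).upper()


def trim_vector(seq: str, motif: str = JUNCTION_MOTIF) -> str:
    """Drop the first occurrence of motif and everything 5' of it."""
    pos = seq.find(motif)
    if pos == -1:
        raise ValueError(f"junction motif {motif} not found")
    return seq[pos + len(motif):]


raw = listing_to_sequence(LISTING)
insert = trim_vector(raw)
print(len(raw))      # 885, matching LENGTH: 885 in the header
print(len(insert))   # 761 bases remain after trimming
print(insert[:10])   # TAGAGGTGGC
```

The trimmed string printed by the script is what Step 1 asks you to produce by hand; the letter N marks a position where the sequencer could not call a base.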
Step 1: Because the sequencing primer hybridizes to the vector, the first portion of our sequence will be vector sequence. Vector sequence will interfere with the database search, so we need to determine where the actual Ammonifex DNA sequence begins. To find which part of this DNA sequence is vector, you will run a program called VecScreen on the NCBI site.

1. Go to the NCBI site at http://www.ncbi.nlm.nih.gov/. In the right-hand column labeled Hot Spots, choose VecScreen at the bottom.
2. Copy the Ammonifex DNA sequence shown above and paste it into the window on the VecScreen page. Make sure that you do not include the header information.
3. Click the Run VecScreen button.
4. In the popup window that appears next, click the Format! button. The results take about a minute to come back.
5. Note the positions listed under Segments matching vector. The higher number is the nucleotide at the junction between the vector sequence and the Ammonifex sequence. We want to remove all of the vector sequence before we compare the Ammonifex sequence to the database. Near this junction, look for the vector sequence CGCCCTT, which occurs just before the insert.
6. Edit your Ammonifex clone sequence to remove CGCCCTT and all nucleotides 5' of it (everything before it).

Step 2: Next we will BLAST the edited sequence to see whether any similar sequences are already present in the GenBank database. BLAST (the Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. BLAST can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families.

1. Go to the NCBI site (http://www.ncbi.nlm.nih.gov/) and click on “BLAST” in the banner at the top of the page.
2. We will be comparing nucleotides to a nucleotide database.
   Under the “Basic BLAST” heading, choose “nucleotide blast.”
3. On the page that comes up, paste your edited sequence into the search box.
4. In the second box, labeled “Choose Search Set,” choose the database “other” and from the pulldown menu select “Nucleotide collection (nr/nt).”
5. Click the BLAST button at the bottom of the page.
6. It will take a few seconds for the results to come back.
7. Look at your results to determine whether this Ammonifex DNA sequence is present in any other organisms.

Some notes about the output from the BLAST search: A higher score means that the two sequences are more similar. As a rough rule of thumb, a score over 50 is probably a significant match. The Expect value (E) is a parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. Thus, the lower the E value, the more significant the match. An E value close to or greater than 1 means that a match of that quality would be expected by chance alone, so it does not indicate that the DNA sequences are related.

Step 3: You can also BLAST against a protein database, which often gives you additional information. In BlastX, your nucleotide sequence is translated and then compared to a protein database.

1. Go to the NCBI site (http://www.ncbi.nlm.nih.gov/) and click on “BLAST” in the banner at the top of the page.
2. We will be comparing translated nucleotides to a protein database. To do this, under the “Basic BLAST” heading choose “blastx” (Search protein database using a translated nucleotide query).
3. On the page that comes up, paste your edited sequence into the search box.
4. Under “Database,” make sure that “Non-redundant protein sequences (nr)” is selected.
5. Click the BLAST button at the bottom of the page.
6. It will take about a minute for the results to come back.
7. Look at your results to determine whether a protein similar to the one this Ammonifex sequence encodes is present in any other organisms.
8. Using BlastX gives you an idea of what protein this DNA sequence may encode.
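blastx searches the protein database with conceptual translations of the query in all six reading frames (three on each strand), because the strand and reading frame of a new clone are not known in advance. The Python sketch below reproduces only that translation step, not the search itself. It assumes the standard genetic code (NCBI translation table 1); for bacterial DNA such as Ammonifex, table 11 assigns the same amino acid to every codon and differs only in which codons can act as start codons. The function names and the +1/-1 frame labels are illustrative choices. Codons containing N are translated as X.

```python
# Sketch of the six-frame translation that blastx applies to a nucleotide query.
# Assumption: standard genetic code (NCBI translation table 1).

BASES = "TCAG"
# Amino acids for all 64 codons, ordered by first, second, third base in T, C, A, G order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the other strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons from the start of seq; '*' = stop, 'X' = codon with N."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)  # e.g. "+1 MA*" for ATG GCC TAA
```

Running six_frame_translation on your trimmed Ammonifex sequence shows the six candidate proteins that blastx compares to the database; in the blastx output, each alignment is labeled with the query frame (for example, +2) that produced it.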
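The notes on BLAST output in Step 2 describe the score and the E value separately, but the two are linked. Under the Karlin-Altschul statistics that BLAST uses, a bit score S' corresponds approximately to E = m × n × 2^(-S'), where m is the query length and n is the total length of the database. This explains why a higher score gives a lower E value, and why the same score is less significant when the database is larger. The Python sketch below simply evaluates that formula; the query and database sizes are hypothetical round numbers, and BLAST itself uses slightly smaller "effective" lengths, so its reported E values will not match exactly.

```python
def expect_value(bit_score: float, query_len: int, db_len: float) -> float:
    """Approximate BLAST E value: E = m * n * 2**(-S'), where S' is the bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes: a 761-base query searched against a 1e10-base database.
for score in (30, 50, 70):
    print(score, f"{expect_value(score, 761, 1e10):.2g}")
```

Each additional bit of score halves E, while doubling the size of the database doubles it.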