This is the raw DNA sequence output from one Ammonifex clone

Sciences of Life Academy @ Mizzou
Bacterial Genomics & Bioinformatics
Dr. Shari Freyermuth, Biochemistry Dept.
Session I: Tues, July 10; 1:30 – 4:30
Session II: Thurs, July 12; 1:30 – 4:30
Bioinformatics exercise
This is the raw DNA sequence output from one Ammonifex clone.
31-0304A.Seq  LENGTH: 885  Thu, Mar 4, 2007 4:04 PM  CHECK: 2200 ..

  1  ANACTATCAC ATGATTACGC CAAGCTATTT AGGTGACACT ATAGAATACT
 51  CAAGCTATGC ATCAAGCTTG GTACCGAGCT CGGATCCACT AGTAACGGCC
101  GCCAGTGTGC TGGAATTCGC CCTTTAGAGG TGGCTGTGGC CGCTTTGGCG
151  GTTGCGGACG ATCAGGTTGG GGGCAGGAAG CCCAGCGTCC TCCCAGATCA
201  TCGCCTTGGC ATGGTCAAGG TCGAAGATCA GCCAGCTCAC GAAGCCGGGC
251  CGATTGACCT GCATGTAGGG GTAGCGAATG GCGTATTCGC GCGGCCGGAC
301  TCGGGTGGCC GTCTTGTCGT CAGAGCAACG CGGCAGATAG GGCGCTTCCG
351  TGAGCACGCG ATTGAGCGCG GTGCCCGACT GGAAAAAGCG GTCTGCTGTC
401  TGGTTGTCTA GTTTCTGTGC CGTTGTACCT GCCATAACCC CTCATCACAC
451  GTCCCGTAGG GGCGTGCGCA CGCTTGCGCG CAGGGCTTAA GACACGTACA
501  ATCGAAGCAT AGTCAGTCGC TTAATTGCAC TGGTACAGAG TTCACCCGCC
551  AAAGCGAACG CTGTATTTGT CTGGGCACCT GCCCGGTTTC TCTGTCAAGG
601  CCCGCTCGCC AAACGGGCTT TGTCGTTTCT GACTCTCCAG TTTTCTGCTG
651  TCTGGTCAGC TCTCTNCTTC TTCATCACGG TCACTCTGCG TCCATCGAGG
701  NCGGTGCTTG CTTGTAANCC GCGCACCATT GTCTTGGGGC ACCCAGTGCT
751  CCGTACTCCC ATCAAACTTC ATGATTCATA NCTTTTGGAG CACGCTTGCA
801  GCGNCNTCCA AGCTCTGGGA ANTACCGAAG GCCGCCNCGG GGAGCAACCC
851  AAANTTNGCT CATTTATTCC GGNCCGCACG AAGAA
Our goal is to determine if this DNA sequence is similar to any other known DNA
sequences. DNA from many organisms has already been sequenced and some of it is
associated with specific functions. Thus, one of the first steps in learning about any new
DNA sequence is to find out if it is comparable to any known sequences. The resource
most researchers use for this is the National Center for Biotechnology Information
(NCBI), which is a division of the National Library of Medicine at the National Institutes
of Health. The website is: http://www.ncbi.nlm.nih.gov/.
Step 1: Because the sequencing primer hybridizes to the vector, the first portion of our
sequence will be vector sequence. Vector sequence will interfere with the database
search, so we need to determine where the actual Ammonifex DNA sequence begins.
To determine what part of this DNA sequence is the vector sequence, you will run
a program called VecScreen, found on the NCBI site.
1. Go to the NCBI site at http://www.ncbi.nlm.nih.gov/. In the right-hand
column labeled Hot Spots, choose VecScreen at the bottom.
2. Copy the Ammonifex DNA sequence on page 1 and paste it into the
window on the VecScreen page. Make sure that you do not include the
header information.
3. Click the Run VecScreen button.
4. On the next popup window, click on the Format! button. It will take about
a minute for the results to come back.
5. Make a note of the positions listed under “Segments matching vector.” The
highest number is the nucleotide at the junction between the vector sequence
and the Ammonifex sequence. We want to remove all the vector sequence before
we compare the Ammonifex sequence to the database. Look for the sequence
CGCCCTT, which occurs in the vector just before the insert.
6. Edit your Ammonifex clone sequence to remove all nucleotides 5' of
(everything before) and including the sequence CGCCCTT.
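The trimming in steps 5 and 6 can also be sketched in a few lines of Python. This is only an illustration, not part of the exercise: the function names are made up for this sketch, and it assumes that the first occurrence of CGCCCTT is the vector/insert junction, which you should confirm against the segment VecScreen reports. Only the first three rows of the clone are pasted in to keep the example short.

```python
import re

# Motif from step 5: the vector sequence that sits just before the insert.
VECTOR_END_MOTIF = "CGCCCTT"


def clean_sequence(pasted_text):
    """Drop position numbers, spaces and line breaks; keep only base letters."""
    return re.sub(r"[^ACGTN]", "", pasted_text.upper())


def trim_vector(sequence, motif=VECTOR_END_MOTIF):
    """Remove everything 5' of the motif, and the motif itself."""
    junction = sequence.find(motif)  # assumption: first occurrence is the junction
    if junction == -1:
        raise ValueError("motif not found; check the VecScreen result by hand")
    return sequence[junction + len(motif):]


# First three rows of the clone, pasted without the header line.
first_rows = """
  1  ANACTATCAC ATGATTACGC CAAGCTATTT AGGTGACACT ATAGAATACT
 51  CAAGCTATGC ATCAAGCTTG GTACCGAGCT CGGATCCACT AGTAACGGCC
101  GCCAGTGTGC TGGAATTCGC CCTTTAGAGG TGGCTGTGGC CGCTTTGGCG
"""

clone_start = clean_sequence(first_rows)
insert_start = trim_vector(clone_start)
print(insert_start)
```

With the full 885-nucleotide sequence pasted in place of `first_rows`, the same two calls give the edited sequence to use in Step 2.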
Step 2: Next we will BLAST the edited sequence to see if there are any similar
sequences already present in the GenBank database. BLAST, the Basic Local
Alignment Search Tool, finds regions of local similarity between sequences. The
program compares nucleotide or protein sequences to sequence databases and calculates
the statistical significance of matches. BLAST can be used to infer functional and
evolutionary relationships between sequences as well as help identify members of gene
families.
1. Go to the NCBI site (http://www.ncbi.nlm.nih.gov/) and click on “BLAST” in the
banner at the top of the page.
2. We will be comparing nucleotides to a nucleotide database. Under the “Basic
BLAST” heading choose “nucleotide blast.”
3. On the page that comes up, you will need to copy and paste your edited sequence
into the search box.
4. In the second box, labeled “Choose Search Set,” choose the database “other”
and from the pulldown menu select “Nucleotide collection (nr/nt).”
5. Click the BLAST button at the bottom of the page.
6. It will take a few seconds for the results to come back.
7. Look at your results to determine if this Ammonifex DNA sequence is present in
any other organisms.
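To make "local similarity" concrete, here is a small Python sketch that scores the best-matching region shared by two sequences. It uses full dynamic programming (the Smith-Waterman method), not the faster seed-and-extend heuristic that BLAST itself uses, and the scoring values (match +2, mismatch -3, gap -5) are illustrative assumptions rather than the settings of the NCBI server.

```python
def local_alignment_score(a, b, match=2, mismatch=-3, gap=-5):
    """Best local alignment score between a and b (Smith-Waterman, linear gaps)."""
    best = 0
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below 0: a poor stretch simply ends the local match.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two otherwise unrelated sequences that share the stretch GAATTC.
print(local_alignment_score("TTTTGAATTCGGGG", "CCCGAATTCAAA"))
```

The shared six-base stretch is what earns the score; the unrelated flanking bases do not count against it, which is the sense in which the alignment is local.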
Some notes about the output from the BLAST search:
A higher score means that the two sequences are more similar. A score over 50 is
probably a significant match.
The Expect value (E) is a parameter that describes the number of hits one can "expect" to
see just by chance when searching a database of a particular size. Thus, a lower E value
means the match is more significant. High E values mean that the match is most likely
random and does not indicate that the DNA sequences are related.
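The link between the score and the E value can be sketched with the standard relation E = m × n × 2^(-S), where S is the bit score, m is the query length and n is the total length of the database. The Python below is a simplified illustration: the database length is a hypothetical round number, and the real BLAST server applies length corrections that are ignored here.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """Simplified E value: E = m * n * 2**(-S), with S the bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


QUERY_LENGTH = 761          # the 885-nt clone minus the first 124 vector nucleotides
DATABASE_LENGTH = 10 ** 10  # hypothetical database size, for illustration only

for score in (30, 50, 100):
    e_value = expected_chance_hits(score, QUERY_LENGTH, DATABASE_LENGTH)
    print(f"bit score {score}: E = {e_value:.3g}")
```

Under these assumptions a bit score of 30 would be expected thousands of times by chance, while a bit score of 50 gives an E value below 0.01, which is consistent with the rule of thumb above.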
Step 3: You can also BLAST against a protein database, which often gives you additional
information. In BlastX, your nucleotide sequence is translated in all six reading frames
and then compared to a protein database.
1. Go to the NCBI site (http://www.ncbi.nlm.nih.gov/) and click on “BLAST” in the
banner at the top of the page.
2. We will be comparing translated nucleotides to a protein database. To do this,
under the “Basic BLAST” heading choose “blastx” (Search protein database
using a translated nucleotide query).
3. On the page that comes up, copy and paste your edited sequence into the
search box.
4. Under “Database” make sure that “Non-redundant protein sequences (nr)” is
selected.
5. Click the BLAST button at the bottom of the page.
6. It will take about a minute for the results to come back.
7. Look at your results to determine whether the protein encoded by this Ammonifex
sequence is similar to proteins in any other organisms.
8. Using BlastX will give you an idea of what protein this DNA sequence may
encode.
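The translation that BlastX performs before searching can be illustrated with a short Python sketch. A DNA sequence can be read in three frames on each strand, so there are six possible translations. The sketch assumes the standard genetic code and writes any codon containing N as X; the function names are made up for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; '*' is a stop, 'X' an ambiguous codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return translations for frames +1, +2, +3, -1, -2 and -3."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```

Usually only one of the six frames gives a long stretch without stop codons, and that is the frame most likely to match a real protein in the database.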