Using Bioinformatics

E. Stanley and M. Waterman
NIE 2011
Using Bioinformatics: Molecular Sequence Data
Overview:
- Distinguish between protein and nucleic acid sequences.
- Search an online database to identify an unknown protein or an unknown DNA
sequence.
- Find other organisms with similar sequences using BLAST.
View your unknown sequence:
1. Download your unknown file from the website.
2. Open the text file to view the sequence information. The file is in a
FASTA format that makes it possible to use online bioinformatics
programs to compare it with other sequences.
3. Indicate if your unknown is a protein or nucleic acid sequence by
checking the appropriate description below.
Protein sequence ____
DNA sequence ____
How could you tell?
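
If you would like to check your answer with a short program, the idea can be sketched in Python as below. This is only a rough sketch under stated assumptions: the function names `read_fasta` and `guess_sequence_type`, the 90% threshold, and the example sequences are made up for illustration and are not part of the worksheet or of any NCBI tool. It reads the first record of a FASTA file (a header line starting with `>`, followed by sequence lines) and guesses the type from which letters appear.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    seq_parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second header means the first record has ended
            header = line[1:]
        elif header is not None:
            seq_parts.append(line)
    return header, "".join(seq_parts).upper()


def guess_sequence_type(sequence, threshold=0.9):
    """Guess the sequence type from its letters.

    Assumption: a sequence made almost entirely of A, C, G, T, U, or N is a
    nucleic acid; anything with many other letters is treated as protein.
    The 0.9 threshold is an arbitrary illustrative choice.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("no sequence letters found")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleic acid"
    return "protein"


# Made-up example records, for illustration only.
dna_text = ">example_dna made-up record\nATGGCCATTGTAATGGGCCGC\nTGAAAGGGTGCCCGATAG\n"
protein_text = ">example_protein made-up record\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ\n"

for text in (dna_text, protein_text):
    header, sequence = read_fasta(text)
    print(header, "->", guess_sequence_type(sequence))
```

A letter-counting rule like this can be fooled by very short sequences or by proteins that happen to be rich in A, C, G, and T, so treat the result as a hint and look at the sequence yourself.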
Using NCBI to Identify a Sequence:
The National Center for Biotechnology Information (NCBI) advances science and health
by providing access to biomedical and genomic information. Like researchers, teachers
and students can find, upload/download, and compare sequence information for
proteins and nucleic acids.
You will be accessing a collection of publicly available sequences called GenBank, the
NIH genetic sequence database. The sequences are annotated so you know what
organism or lab the sequence came from, who submitted it, and usually its function.
GenBank is part of the International Nucleotide Sequence Database Collaboration,
which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology
Laboratory (EMBL), and GenBank at NCBI.
Go to the NCBI Blast site at http://blast.ncbi.nlm.nih.gov/Blast.cgi to use the
similarity between biological sequences to find out about the structure and function of
your unknown sequence.
4. To start, choose the appropriate BLAST search:
A NUCLEOTIDE SEQUENCE
Go to the BLAST home page and click "nucleotide blast" under Basic BLAST.
A PROTEIN SEQUENCE
Go to the BLAST home page and click "protein blast" under Basic BLAST.
5. Open your unknown file, choose Select All, and then copy the file content.
6. Under Enter Query Sequence, paste the file content into the query box.
7. Under Choose Search Set:
If your unknown is a protein, choose Non-redundant protein sequences (nr)
If your unknown is a nucleic acid, choose Nucleotide collection (nr/nt)
8. Under Program Selection:
If your unknown is a protein, choose blastp (protein-protein BLAST)
If your unknown is a nucleic acid, choose highly similar sequences (megablast)
9. Now click the blue BLAST button at the bottom left of the screen.
Note: Results may take a while when everyone is submitting searches at the same time.
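
The choices in steps 4 through 9 can also be written down as a small lookup, which may help you remember which settings go with which kind of sequence. The Python sketch below is an assumption-laden illustration, not an official NCBI client: the function names are hypothetical, and the parameter names (`CMD`, `QUERY`, `PROGRAM`, `DATABASE`, `MEGABLAST`) reflect our understanding of NCBI's BLAST URL interface and should be checked against NCBI's documentation before use. The code only builds and prints a request; it does not send anything to NCBI.

```python
from urllib.parse import urlencode

# The worksheet's address is written here with https.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_choices(sequence_type):
    """Map the kind of unknown to the program and search set used in steps 7 and 8."""
    if sequence_type == "protein":
        # blastp against non-redundant protein sequences (nr)
        return {"PROGRAM": "blastp", "DATABASE": "nr"}
    if sequence_type == "nucleic acid":
        # megablast (highly similar sequences) against the nucleotide collection (nr/nt)
        return {"PROGRAM": "blastn", "DATABASE": "nt", "MEGABLAST": "on"}
    raise ValueError("sequence_type must be 'protein' or 'nucleic acid'")


def build_submission(sequence, sequence_type):
    """Build the parameters for a search request (assumed parameter names)."""
    params = {"CMD": "Put", "QUERY": sequence}
    params.update(blast_choices(sequence_type))
    return params


# Made-up query sequence, for illustration only.
params = build_submission("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", "nucleic acid")
print(BLAST_URL + "?" + urlencode(params))
```

For this worksheet, the web form is still the way to run your search; the sketch only shows how the same choices line up for each kind of unknown.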
10. Consider your BLAST Results:

Scroll down to the table that lists the sequences producing significant
alignments (having the greatest sequence similarity).
11. Choose the top record. This sequence should have the maximum score for
alignment with your unknown sequence. Enter the following information:
Accession Number: ___________________________________
Description:
12. Record information about the top record.

Review the accession record by clicking on the accession number link to
learn more.

Fill in the information below
Kind of organism: __________________________________________________
Genus and species: _________________________________________________
Name of researcher(s) who produced this record:
Where the research was done:
13. List three other organisms with closely related sequences.
Are you surprised by these results?
Explain.