Introduction to BLAST and Sequence Similarity Searching

A nucleotide sequence such as ATGCATTCATGCC is not informative by itself. It must be
compared against sequences in existing databases to reveal its identity and likely function.
The general approach is to use a set of algorithms, such as the BLAST programs, to
compare an unknown sequence to all the sequences in a specific database
(in our case, all DNA sequences in the nucleotide database).
Each comparison is given a score reflecting the degree of similarity between the
unknown sequence and the sequences in the database. The higher the score, the
greater the degree of similarity.
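As a rough illustration of "higher score, greater similarity", the sketch below scores the unknown sequence against a tiny made-up database and ranks the entries. The scoring scheme (+1 for a match, -1 for a mismatch, no gaps), the function name similarity_score, and the three database entries are assumptions chosen for illustration; they are not the scoring system BLAST itself uses.

```python
def similarity_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score two equal-length sequences position by position (no gaps).

    Assumed toy scheme: +1 per matching position, -1 per mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


unknown = "ATGCATTCATGCC"

# Hypothetical database entries, invented for this example.
database = {
    "seq_identical": "ATGCATTCATGCC",
    "seq_one_change": "ATGCATTGATGCC",
    "seq_unrelated": "CCGTAGGACTAAT",
}

# Rank database entries from most to least similar to the unknown.
ranked = sorted(
    database,
    key=lambda name: similarity_score(unknown, database[name]),
    reverse=True,
)

for name in ranked:
    print(name, similarity_score(unknown, database[name]))
```

The identical entry receives the highest score and the unrelated entry the lowest, which is the ordering a similarity search reports.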
BLAST (Basic Local Alignment Search Tool) is a set of sequence comparison
programs, introduced in 1990, that are used to search sequence databases.
The BLAST programs improved the overall speed of searches while retaining good
sensitivity (important as databases continue to grow) by breaking the query and
database sequences into short fragments ("words") and first looking for matches
between those fragments.
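The word-matching idea can be sketched in a few lines. The example below is a simplified illustration, not the actual BLAST implementation: it breaks both sequences into overlapping words of length k and reports the positions where a word in the query exactly matches a word in a database sequence. The function names words and find_seeds, the word length k=4, and the subject sequence are assumptions made for this example; real BLAST programs go on to extend such initial matches into longer alignments.

```python
from collections import defaultdict


def words(seq: str, k: int):
    """Yield (position, word) for every overlapping word of length k."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def find_seeds(query: str, subject: str, k: int = 4):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    # Index every word of the database (subject) sequence by its positions.
    index = defaultdict(list)
    for pos, word in words(subject, k):
        index[word].append(pos)
    # Look up each query word in the index.
    return [(q, s) for q, word in words(query, k) for s in index.get(word, [])]


query = "ATGCATTCATGCC"
subject = "GGATTCATGA"  # hypothetical database sequence

seeds = find_seeds(query, subject, k=4)
print(seeds)  # [(4, 2), (5, 3), (6, 4), (7, 5)]
```

All four matches share the same offset between query and subject positions, which suggests one continuous region of similarity (ATTCATG) rather than four unrelated hits. Looking up words in an index is much cheaper than comparing the full sequences letter by letter, which is where the speed gain comes from.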
A variety of DNA and protein databases are available. A protein database is appropriate
for searches in which the unknown is an amino acid sequence. A nucleic acid database is
generally appropriate for searches in which the unknown is a DNA sequence.
Questions for Review
What is Sequence Similarity Searching?
Why use Sequence Similarity Searching?
Is a high score indicative of high or low similarity between two sequences?
What are the two primary databases available through BLAST?