Introduction to BLAST and Sequence Similarity Searching

A nucleotide sequence such as ATGCATTCATGCC is not informative by itself. It must be compared against existing databases to reveal its identity and likely function. The general approach is to use a set of algorithms, such as the BLAST programs, to compare an unknown sequence to all the sequences in a specific database (in our case, all DNA sequences in the nucleotide database). Each comparison is given a score reflecting the degree of similarity between the unknown sequence and a sequence in the database. The higher the score, the greater the degree of similarity.

The BLAST programs (BLAST stands for Basic Local Alignment Search Tool) are a set of sequence comparison algorithms, introduced in 1990, that are used to search sequence databases. They improved the overall speed of searches while retaining good sensitivity, which matters as databases continue to grow. They do this by breaking the query and database sequences into fragments ("words") and initially seeking matches between those fragments.

A variety of DNA and protein databases are available. A protein database is appropriate for searches with an amino acid sequence as the unknown. A nucleic acid database is generally appropriate for searches with a DNA sequence as the unknown.

Questions for Review

What is Sequence Similarity Searching?
Why use Sequence Similarity Searching?
Is a high score indicative of high or low similarity between two sequences?
What are the two primary types of database available through BLAST?
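The idea of breaking sequences into "words" and first looking for matches between them can be illustrated with a short Python sketch. This is not the BLAST implementation: it shows only the initial exact word-matching step, and it leaves out the extension and scoring stages that BLAST performs afterwards. The function names, the two example database sequences, and the word size of 4 are assumptions chosen for illustration; real searches typically use larger words.

```python
from collections import defaultdict


def index_words(sequence, word_size):
    """Map each word (substring of length word_size) to its start positions."""
    index = defaultdict(list)
    for start in range(len(sequence) - word_size + 1):
        index[sequence[start:start + word_size]].append(start)
    return index


def find_word_hits(query, subject, word_size=4):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    query_index = index_words(query, word_size)
    hits = []
    for subject_pos in range(len(subject) - word_size + 1):
        word = subject[subject_pos:subject_pos + word_size]
        # .get avoids adding empty entries to the index for unseen words
        for query_pos in query_index.get(word, []):
            hits.append((query_pos, subject_pos))
    return hits


# The unknown sequence from the text, and a hypothetical two-entry database.
query = "ATGCATTCATGCC"
database = {
    "seq_a": "GGATGCATTCATGCCTT",  # contains the query, so it shares many words
    "seq_b": "TTTTTTTTTTTTTTTTT",  # shares no 4-letter word with the query
}

# Count word hits per database sequence as a crude stand-in for a score.
hit_counts = {name: len(find_word_hits(query, seq)) for name, seq in database.items()}

for name, count in hit_counts.items():
    print(name, count)
```

In this sketch seq_a produces word hits and seq_b produces none, so only seq_a would be worth examining further. The count of hits is a crude stand-in for a similarity score; an actual BLAST score comes from extending and scoring alignments around such hits.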