Uploaded by stephencharles0001

20230106 BLAST Exercise

BN 231
BLAST Exercise
Exercise 1: Protein-Protein BLAST
Step 1: Copy the sequence below:
MSKRKAPQETLNGGITDMLTELANFEKNVSQAIHKYNAYRKAASVIAKYPHKIKSG
AEAKKLPGVGTKIAEKIDEFLATG
KLRKLEKIRQDDTSSSINFLTRVSGIGPSAARKFVDEGIKTLEDLRKNEDKLNHHQRIG
LKYFGDFEKRIPREEMLQMQD
IVLNEVKKVDSEYIATVCGSFRRGAESSGDMDVLLTHPSFTSESTKQPKLLHQVVEQ
LQKVHFITDTLSKGETKFMGVCQ
LPSKNDEKEYPHRRIDIRLIPKDQYYCGVLYFTGSDIFNKNMRAHALEKGFTINEYTIR
PLGVTGVAGEPLPVDSEKDIF
DYIQWKYREPKDRSE
Step 2: Open BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi
Step 3: Select Protein BLAST
Step 4: Paste the sequence into the query box, add a job title, choose blastp (protein-protein BLAST) as the program, and click BLAST
Step 5: Interpreting results
Explore the results page
a. Top section with input sequence details
b. Explore these tabs: Description, Graphic summary, Alignments, Taxonomy
c. Interpreting results
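Scientific name: the organism that the matching (subject) sequence comes from
Max Score (Maximum score): the highest alignment score among the aligned segments of a
subject sequence, calculated from the sum of the rewards for matched residues (nucleotides or
amino acids) and the penalties for mismatches and gaps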
Total Score: the sum of alignment scores of all segments from the same subject sequence
Query Cover (Query coverage): the percent of the query length that is included in the aligned
segments
E-value (Expect Value): the number of alignments expected by chance with the calculated
score or better. The E-value is the default sorting metric; for significant alignments the
E-value should be very close to zero.
Percentage identity: the highest percent identity for a set of aligned segments to the same
subject sequence.
Acc. Len. (Accession Length): the number of nucleotides or amino acids in the result
sequence identified by the accession number
Accession: a unique identifier assigned to records in the NCBI databases
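To make the definitions above concrete, here is a minimal Python sketch that computes a raw score, percent identity and query coverage for a single aligned segment. The function name `alignment_metrics` and the scoring values (match +1, mismatch -2, gap -3 per gap position) are assumptions chosen for illustration. BLAST itself uses separate gap-opening and gap-extension penalties and, for proteins, a substitution matrix, so the numbers will not match a real BLAST report.

```python
def alignment_metrics(query_aln, subject_aln, query_length,
                      match=1, mismatch=-2, gap=-3):
    """Return (raw_score, percent_identity, query_cover) for one aligned segment.

    query_aln and subject_aln are equal-length strings, with '-' marking gaps.
    query_length is the length of the full, unaligned query.
    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")

    score = 0
    identities = 0
    query_residues = 0  # query positions that take part in the alignment

    for q, s in zip(query_aln, subject_aln):
        if q != "-":
            query_residues += 1
        if q == "-" or s == "-":
            score += gap          # simplified: same penalty for every gap column
        elif q == s:
            score += match
            identities += 1
        else:
            score += mismatch

    # Percent identity is taken over the alignment length, gaps included.
    percent_identity = 100.0 * identities / len(query_aln)
    # Query cover is the share of the whole query inside the aligned segment.
    query_cover = 100.0 * query_residues / query_length
    return score, percent_identity, query_cover


score, pid, cover = alignment_metrics("ACGT-ACGT", "ACGTTACCT", query_length=16)
print(score, round(pid, 1), cover)
```

In this toy alignment there are 7 identities, 1 mismatch and 1 gap column over 9 alignment columns, and 8 of the 16 query residues are aligned.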
About score matrices
BLAST uses scoring matrices to compare sequences by aligning them and assigning a
value to each alignment. There are several kinds of scoring matrices for different
comparisons, such as the BLOSUM and PAM matrices.
The BLOSUM62 matrix is among the best for detecting most weak protein similarities.
BLOSUM45 suits long, weak alignments between distantly related sequences, and BLOSUM80
suits closely related sequences. Other matrices include PAM1, PAM250, etc.
The higher the score value, the better the alignment/comparison.
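As a rough illustration of how a scoring matrix turns an alignment into a value, the sketch below sums substitution scores over an ungapped pair of protein fragments. The dictionary `BLOSUM62_SUBSET` holds only a handful of BLOSUM62 entries copied by hand for this example, and the helper names `pair_score` and `ungapped_score` are hypothetical. Check the values against the full published matrix before relying on them.

```python
# A few BLOSUM62 entries, copied by hand for illustration only.
# The real matrix covers every pair of the 20 amino acids.
BLOSUM62_SUBSET = {
    ("L", "L"): 4,
    ("K", "K"): 5,
    ("W", "W"): 11,
    ("C", "C"): 9,
    ("L", "I"): 2,   # similar hydrophobic residues score positively
    ("K", "R"): 2,   # both positively charged
    ("D", "E"): 2,   # both negatively charged
    ("F", "Y"): 3,   # both aromatic
    ("W", "G"): -2,  # dissimilar residues score negatively
}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up the score for residues a and b; the matrix is symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:
        return matrix[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this illustrative subset")


def ungapped_score(seq1, seq2, matrix=BLOSUM62_SUBSET):
    """Sum the substitution scores column by column (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


print(ungapped_score("LKWD", "IRWE"))  # L/I + K/R + W/W + D/E
```

Conservative substitutions such as L/I still add to the score, which is why a matrix-based score can be high even when percent identity is modest.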
Examples (figures not reproduced):
Image rights: https://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/BLAST/scoring_nucleotides.html
Image rights: https://studylib.net/doc/18217730/score--bit-score--p-value--e
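The raw score, bit score and E-value are linked by the Karlin-Altschul statistics: the bit score is S' = (lambda * S - ln K) / ln 2, and the E-value is E = K * m * n * exp(-lambda * S), which equals m * n * 2^(-S'). The Python sketch below applies these two formulas. The function names are hypothetical, and the values lambda = 0.267 and K = 0.041 are assumed example parameters of the kind reported for gapped BLOSUM62 searches. BLAST also uses effective (edge-corrected) lengths for m and n, which this sketch ignores.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw score S into a bit score S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam, k):
    """E = K * m * n * exp(-lam * S): chance alignments scoring S or better.

    query_len (m) and db_len (n) are plain lengths here; BLAST itself uses
    effective lengths, so real reports will differ.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Assumed example parameters, for illustration only.
LAMBDA, K = 0.267, 0.041

for raw in (40, 80, 120):
    bits = bit_score(raw, LAMBDA, K)
    e_val = expect_value(raw, query_len=250, db_len=1_000_000, lam=LAMBDA, k=K)
    print(f"raw={raw:4d}  bits={bits:7.1f}  E={e_val:.2e}")
```

Because E falls exponentially as the score rises, a modest gain in score moves an alignment from "expected by chance" to "very close to zero" quickly.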
Learn more on BLAST scoring matrices:
1. Use of scoring matrices:
https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/650/Use_scoring_matrices.html
2. BLAST substitution matrices:
https://www.ncbi.nlm.nih.gov/blast/html/sub_matrix.html
Exercise 2: DNA BLAST
1. Copy the sequence
TCGAAATAACGCGTGTTCTCAACGCGGTCGCGCAGATGCCTTTGCTCATCAGATGCGACCGCAACCACGTCCGC
CGCCTTGTTCGCCGTCCCCGTGCCTCAACCACCACCACGGTGTCGTCTTCCCCGAACGCGTCCCGGTCAGCCAG
CCTCCACGCGCCGCGCGCGCGGAGTGCCCATTCGGGCCGCAGCTGCGACGGTGCCGCTCAGATTCTGTGTGGCA
GGCGCGTGTTGGAGTCTAAA
2. Open BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi
3. Choose Nucleotide BLAST and paste the sequence
4. Click BLAST
5. Explore and interpret the results as in Exercise 1
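Sequences copied from a handout often pick up stray spaces and line breaks. The query box generally tolerates these, but it can help to tidy the text first. The sketch below is one possible way to do that in Python: it strips whitespace, checks the letters against an assumed DNA alphabet (A, C, G, T, N) and prints the result in FASTA format. The function name `to_fasta` and the header text are hypothetical.

```python
def to_fasta(raw_text, header="query", width=60, alphabet="ACGTN"):
    """Strip whitespace from a pasted sequence and return it as FASTA text.

    alphabet is the set of letters accepted; the default is a simple DNA
    alphabet chosen for this example (pass a protein alphabet for Exercise 1).
    """
    sequence = "".join(raw_text.split()).upper()
    if not sequence:
        raise ValueError("no sequence found")

    unexpected = sorted(set(sequence) - set(alphabet))
    if unexpected:
        raise ValueError(f"unexpected characters: {unexpected}")

    # Wrap the sequence at a fixed width under a '>' header line.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines)


pasted = """TCGAAATAACGCGTGTTCTCAACGCGG TCGCGCAGATGCC
TTTGCTCATCAGATGCGACCGCAAC CACGTCCGC"""
print(to_fasta(pasted, header="exercise2_fragment"))
```

The example input is only the first part of the Exercise 2 sequence, with spaces added to mimic a messy paste.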
Other exercises:
1. Interactive BLAST tutorial: https://digitalworldbiology.com/blast
2. https://digitalworldbiology.com/tutorial/blast-for-beginners
3. Detailed tutorial: https://www.ncbi.nlm.nih.gov/books/NBK1734/