NCBI Blast by Jundi

Basic Introduction of BLAST
Jundi Wang
School of Computing
CSC691
09/08/2013
Overview
1. Introduction of BLAST
• Background of BLAST
• Programs in BLAST
• Function of BLAST
2. Application of BLAST
• BLAST web version
• Stand-alone BLAST
Background of BLAST
• BLAST (Basic Local Alignment Search Tool):
1. BLAST is one of the most widely used sequence similarity search tools.
2. BLAST is a family of programs that:
a) Compare protein queries to protein databases
b) Compare nucleotide queries to nucleotide databases
Background of BLAST
• How BLAST finds similar sequences:
BLAST finds similar sequences by locating
short matches (words) between the query and a database sequence.
Starting from each short match, BLAST extends the match in
both directions to build a local alignment.
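The seed-and-extend idea described above can be sketched in Python. This is a simplified illustration, not NCBI's implementation: the function names, the word size of 4, the match/mismatch scores, and the drop-off threshold are all assumptions chosen for the example, and the extension is ungapped.

```python
def find_seeds(query, subject, word_size=4):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    # Look up every word of the query in that index.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size=4,
                match=1, mismatch=-2, drop_off=3):
    """Extend a seed left and right without gaps.

    Extension in each direction stops when the running score falls
    drop_off or more below the best score seen so far.
    Returns (best_score, (query_start, query_end), (subject_start, subject_end)),
    with end positions exclusive.
    """
    score = best = word_size * match  # score of the seed itself
    right = 0
    k = 0
    while i + word_size + k < len(query) and j + word_size + k < len(subject):
        same = query[i + word_size + k] == subject[j + word_size + k]
        score += match if same else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score >= drop_off:
            break
    score = best  # continue leftwards from the best right-hand extension
    left = 0
    k = 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        same = query[i - k - 1] == subject[j - k - 1]
        score += match if same else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score >= drop_off:
            break
    return (best,
            (i - left, i + word_size + right),
            (j - left, j + word_size + right))


query = "TTGACCGTAAGC"
subject = "CCGACCGTAAGG"
seeds = find_seeds(query, subject)
print(extend_seed(query, subject, *seeds[0]))
```

The production BLAST programs add further steps that this sketch leaves out, such as gapped extension and statistical evaluation of each hit.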
Programs in BLAST
Several BLAST programs are available for
different analytic purposes.
• Nucleotide-nucleotide BLAST (blastn)
Given a DNA query, this program returns the most similar
DNA sequences from the DNA database that the user specifies.
• Protein-protein BLAST (blastp)
Given a protein query, this program returns the most similar
protein sequences from the protein database that the user
specifies.
Programs in BLAST
• Nucleotide 6-frame translation-protein (blastx)
This program compares the six-frame conceptual translation
products of a nucleotide query sequence against a protein
sequence database.
• Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
This program translates the query nucleotide sequence in all six
possible frames and compares the translations against the six-frame
translations of a nucleotide sequence database.
Programs in BLAST
• Protein-nucleotide 6-frame translation (tblastn)
This program compares a protein query against a nucleotide
sequence database translated in all six reading frames.
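The five programs listed above differ only in the type of the query, the type of the database, and whether the comparison is made on translated sequences. A small lookup makes that relationship explicit; the function name `choose_program` and its arguments are hypothetical helpers for illustration and are not part of the BLAST package.

```python
# (query type, database type, compared as translated protein?) -> program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", True): "blastx",
    ("protein", "nucleotide", True): "tblastn",
    ("nucleotide", "nucleotide", True): "tblastx",
}


def choose_program(query_type, db_type, translated=False):
    """Return the BLAST program name for a query/database combination."""
    if query_type != db_type:
        # Mixed comparisons always go through six-frame translation.
        translated = True
    try:
        return BLAST_PROGRAMS[(query_type, db_type, translated)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for {query_type} query against {db_type} database"
        )


print(choose_program("nucleotide", "protein"))
```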
Six-Frame Translation
Once a gene has been sequenced, it is important to
determine the correct open reading frame (ORF).
Every region of DNA has six possible reading
frames, three on each strand. The reading frame that is used
determines which amino acids will be encoded by a
gene. Typically only one reading frame is used in
translating a gene (in eukaryotes). An ORF starts
with a start codon (ATG) and ends with a stop
codon (TAA, TAG, or TGA).
Six-Frame Translation
Example:
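A minimal Python sketch of six-frame translation, assuming the standard genetic code and an input that contains only A, C, G, and T. The function names and the short test sequence are made up for illustration; stop codons are shown as `*`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; leftover bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the translations of frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

In this toy sequence only frame +1 starts with ATG (M) and ends with a stop codon, which is how the correct reading frame is usually recognized.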
Function of BLAST
• BLAST can be used to infer functional and
evolutionary relationships between
sequences, and to help identify members
of gene families.
Application of BLAST
• BLAST web version:
Advantages:
1. It is convenient to operate.
2. The databases are kept in sync with NCBI's latest updates.
Weaknesses:
1. It is not well suited to analyzing large-scale data.
2. Users cannot customize the database.
http://www.ncbi.nlm.nih.gov/BLAST/
Application of BLAST
• Stand-alone BLAST:
Advantages:
1. It can be used to analyze large-scale data.
2. Users can customize the database.
3. Users can download versions for different
operating systems.
Weakness:
1. It is difficult for users who do not have a computer science
background.
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
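Application of BLAST
• Statistics in BLAST
1. Score:
A value calculated from the matches, substitutions, and gaps
in each alignment, using the scoring matrix and gap penalties.
A higher score indicates a better alignment.
2. E value:
The number of alignments with an equal or better score that
are expected to occur in a database of this size by chance.
The lower the E value, the more significant the match.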
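As a rough illustration of how the two statistics relate, the sketch below scores a gapped alignment and converts the raw score S into an E value with the Karlin-Altschul formula E = K · m · n · e^(−λS), where m is the query length and n is the database length. The scoring values (match +1, mismatch −2, a flat −3 per gap position) and the values of λ and K are assumptions chosen for the example; real BLAST runs use parameters that depend on the scoring system and usually apply affine gap penalties.

```python
import math


def raw_score(aligned_query, aligned_subject, match=1, mismatch=-2, gap=-3):
    """Score two aligned strings of equal length; '-' marks a gap position."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            score += gap        # simplified: flat penalty per gap position
        elif q == s:
            score += match
        else:
            score += mismatch
    return score


def bit_score(score, lam, k):
    """Normalize a raw score so that it no longer depends on the scoring system."""
    return (lam * score - math.log(k)) / math.log(2)


def e_value(score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least `score`."""
    return k * query_len * db_len * math.exp(-lam * score)


# Illustrative parameters only; they are not taken from the slides.
LAMBDA, K = 0.3176, 0.134
s = raw_score("ACGTTA", "ACG-TA")
print(s, bit_score(s, LAMBDA, K), e_value(s, 6, 1_000_000, LAMBDA, K))
```

Because the E value scales with m · n, the same alignment becomes less significant as the database grows.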
Application of BLAST
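3. Identities:
The number (and percentage) of aligned positions at which the
query sequence and the database sequence have the same residue.
4. Positives:
The number (and percentage) of aligned positions at which the
substitution has a positive score in the scoring matrix, that is,
identical or similar residues. It is reported for protein alignments.
5. Gaps:
The number (and percentage) of aligned positions at which a gap was
inserted into either the query sequence or the database sequence.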
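The identity and gap counts can be computed directly from a pair of aligned strings, as in this small sketch. The function name and the example alignment are hypothetical; positives are left out because they require a substitution matrix such as BLOSUM62.

```python
def alignment_stats(aligned_query, aligned_subject):
    """Count identities and gaps in two aligned strings ('-' marks a gap)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    length = len(aligned_query)
    pairs = list(zip(aligned_query, aligned_subject))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length if length else 0.0,
    }


stats = alignment_stats("ACGTTA", "ACG-TA")
print(f"Identities = {stats['identities']}/{stats['length']}, "
      f"Gaps = {stats['gaps']}/{stats['length']}")
```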
Application of BLAST (web version)
Screenshot of the NCBI BLAST web page, showing the nucleotide alignment and protein alignment options.
Application of BLAST (web version)
Screenshot of the query form, showing the query sequence box, file upload, query subrange, and database selection.
Application of BLAST (web version)
Screenshot of the search options, showing algorithm selection and the E value limit.
Application of BLAST (web version)
Click a result with the mouse to view its details.
Application of BLAST (web version)
Screenshot of an alignment result. The score value is derived from the scoring matrix; this hit shows 100% identity and no gaps.
Application of BLAST (web version)
Screenshot of the result list, showing the NCBI accession ID of each hit and all compared sequences.
Application of BLAST
(Stand-alone Version)
• Download and install stand-alone BLAST
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
• Download the database from NCBI
ftp://ftp.ncbi.nlm.nih.gov/blast/db/
• Download and install ActivePerl from ActiveState
http://www.activestate.com/activeperl
Application of BLAST
(Stand-alone Version)
• Build the local database
1. Enter the BLAST folder and create a database (db) folder.
2. Extract the downloaded database into the db folder.
• Link the database to BLAST
1. Run cmd.exe and use Perl to link the database to BLAST.
• Modify the environment variables
1. Update the PATH variable so that the BLAST programs are
recognized on the command line.
Application of BLAST
(Stand-alone Version)
• Create a query sequence in FASTA format.
The first line starts with “>”, followed by the name or
description of the query sequence. The sequence itself
follows on the next lines.
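A short Python sketch of writing and reading a FASTA record. The record name, the sequence, and the 60-character line width are made-up values for illustration; FASTA files in practice may wrap lines at other widths.

```python
def format_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a dict that maps each header to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()   # drop the '>' marker
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


record = format_fasta("query1 hypothetical test sequence", "ACGT" * 20)
print(record)
print(parse_fasta(record))
```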
Application of BLAST
(Stand-alone Version)
Example: Compare the query sequence with the sequences
from the “refseq_rna.00” database.
Screenshot of the command window, showing “refseq_rna.00” being linked to BLAST, the different programs in the BLAST package, and the name of the database.
Application of BLAST
(Stand-alone Version)
Screenshot showing the basic information about the current database.
Application of BLAST
(Stand-alone Version)
Screenshot of the command, annotated as follows:
1. Execute the “blastn” program.
2. Specify the target database.
3. Specify the query sequence.
4. Write the result to a new file.
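The four steps above map onto a single command. The sketch below builds it in Python, assuming the BLAST+ command-line tools are installed and on the PATH; `-db`, `-query`, `-out`, and `-evalue` are BLAST+ options, while the file names `query.fasta` and `result.txt` are hypothetical. The command shown on the original slide may have differed.

```python
import subprocess


def build_blastn_command(query_file, database, output_file, evalue=None):
    """Return the argument list for a blastn search."""
    cmd = [
        "blastn",              # execute the blastn program
        "-db", database,       # target database
        "-query", query_file,  # query sequence in FASTA format
        "-out", output_file,   # report the result in a new file
    ]
    if evalue is not None:
        cmd += ["-evalue", str(evalue)]  # optional E value threshold
    return cmd


def run_blastn(query_file, database, output_file, evalue=None):
    """Run blastn and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_blastn_command(query_file, database, output_file, evalue),
        check=True,
    )


# Hypothetical file names; the database name is the one used in the example.
print(" ".join(build_blastn_command("query.fasta", "refseq_rna.00", "result.txt")))
```

Building the argument list separately from running it makes the command easy to inspect before a long search is started.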
Application of BLAST
(Stand-alone Version)
Screenshot of the result file, showing the length of the compared sequence, all compared sequences, the statistical evaluation, and the NCBI accession ID of each hit.
Summary
Workflow diagram: DNA sequencing in a new species provides the query; the query and the database are imported into NCBI BLAST, which produces the output.
Thank You