
ESSENTIALS OF NEXT GENERATION SEQUENCING WORKSHOP 2014
UNIVERSITY OF KENTUCKY AGTC
Class 6
Sequence Alignment: BLAST
Goal:
Be able to install and use the Basic Local Alignment Search Tool (BLAST) to align and
compare sequences. Search the NCBI non-redundant BLAST database with a query file.
Input:
BLAST/MoTeR_retrotransposons.fasta
BLAST/MoRepeats.fasta
BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta
Output:
BLAST/MoTeRs.nrBLASTn
BLAST/MoRepeats.Moryzae_genomeBLASTn1
ncbi-blast-2.2.29+/db/Moryzae_genome.fasta
6.1 Installing BLAST
First, we will download the BLAST binaries directly from the NCBI website.
• Go to the NCBI homepage at http://www.ncbi.nlm.nih.gov/
• Click the “Data and Software” link (left-hand panel)
• Click the “Downloads” tab
• Click the “BLAST (Stand-alone)” link
• Click the ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ link under
“BLAST+ executables”.
• Note that the latest executable for Linux is: ncbi-blast-2.2.29+-x64-linux.tar.gz
If we were to click on this link, it would download the file to the machine that we are working on – not
the Linux server where the program needs to be installed.
Instead, we will copy the link to the file to make downloading it via the command line easier. We have a
64-bit (“x64”) Linux system, so right-click the ncbi-blast-2.2.29+-x64-linux.tar.gz link and
select “Copy Link Address” or the equivalent in your browser.
• Now use PuTTY to connect to your server via SSH.
• Download the latest BLAST executables to your home directory from the NCBI FTP server
using the link you copied. Right-clicking pastes it into PuTTY. The command should look like:
  o wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.29+-x64-linux.tar.gz
• After downloading, unpack the executables:
  o tar zxvpf ncbi-blast-2.2.29+-x64-linux.tar.gz
• Add the directory with the executables to your system PATH:
  o PATH=/home/yourusername/ncbi-blast-2.2.29+/bin:$PATH
  o export PATH
Make sure to replace ‘yourusername’ with your actual user name (e.g. ngs13u10)!
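The PATH change above lasts only for the current login session. As a minimal sketch, assuming a bash shell and that the archive was unpacked in your home directory, the lines below prepend the BLAST bin directory only if it is not already on PATH, then report which blastn the shell will run. BLAST_BIN is a variable name chosen here for illustration, and using $HOME avoids typing your user name.

```shell
# Assumption: the archive was unpacked in your home directory (see the tar step above).
# BLAST_BIN is an illustrative variable name, not something BLAST itself reads.
BLAST_BIN="$HOME/ncbi-blast-2.2.29+/bin"

# Prepend the directory only if it is not already on PATH
case ":$PATH:" in
    *":$BLAST_BIN:"*) ;;                 # already present, nothing to do
    *) PATH="$BLAST_BIN:$PATH" ;;
esac
export PATH

# Show which blastn will be run, or print a hint if none is found
command -v blastn || echo "blastn not found on PATH; check $BLAST_BIN"
```

If your account uses bash, appending these lines to ~/.bashrc should reapply the setting at each login. Once blastn is found, running blastn -version prints the installed version.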
6.2 Run a Local-to-Remote BLAST Search
Goal:
Search the NCBI non-redundant BLAST database with a query file
Input:
BLAST/MoTeR_retrotransposons.fasta
Output:
BLAST/MoTeRs.nrBLASTn
• Now, use your locally installed blastn program to search the NCBI database using the query file
MoTeR_retrotransposons.fasta:
  o blastn -db nr -query BLAST/MoTeR_retrotransposons.fasta \
      -out BLAST/MoTeRs.nrBLASTn -evalue 1e-20 -outfmt 1 -remote
BLAST takes several parameters here:
-db:      specifies the database to be searched (we will use the NCBI “nr” database)
-query:   specifies the local query sequence file (include the path to the file)
-out:     name of the output file
-evalue:  tells the program to report only matches with an E-value ≤ the specified value
-outfmt:  specifies the format of the output (values can range from 0 to 11). Output formats
          are listed below.
-remote:  tells the program to search a remote (NCBI) database
Here are the possible values for -outfmt:
• 0 = pairwise
• 1 = query-anchored showing identities
• 2 = query-anchored no identities
• 3 = flat query-anchored, show identities
• 4 = flat query-anchored, no identities
• 5 = XML Blast output
• 6 = tabular
• 7 = tabular with comment lines
• 8 = Text ASN.1
• 9 = Binary ASN.1
• 10 = Comma-separated values
• 11 = BLAST archive format (ASN.1)
You may want to experiment with this parameter to see which format suits your needs.
• Examine the BLAST output with less.
6.3 Create and Search a Custom BLAST Database
Goal:
Create a BLAST nucleotide database from the genome assembly and perform a query
against it
Input:
BLAST/MoRepeats.fasta
BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta
Output:
BLAST/MoRepeats.Moryzae_genomeBLASTn1
ncbi-blast-2.2.29+/db/Moryzae_genome.fasta
Next, we will create a BLAST database from our existing genome data that we can search against. First,
we will need to tell BLAST where to look for your custom database with a .ncbirc file.
• Use the vim text editor to create a file named .ncbirc (yes, the leading period should be included)
inside your home directory. This file should contain the text:
[BLAST]
BLASTDB=/home/yourusername/ncbi-blast-2.2.29+/db
Again, be sure to replace ‘yourusername’ with your actual user name (e.g. ngs13u10).
• Create a subdirectory named db within the ncbi-blast-2.2.29+ directory.
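If you would rather not use an editor, the two steps above can also be sketched from the command line. This assumes BLAST was unpacked in your home directory as in section 6.1. BLAST_HOME is a variable name chosen here for illustration. Note that the redirect overwrites any existing ~/.ncbirc.

```shell
# Assumption: BLAST was unpacked in your home directory (section 6.1).
# BLAST_HOME is an illustrative variable name.
BLAST_HOME="$HOME/ncbi-blast-2.2.29+"

# Create the db subdirectory (no error if it already exists)
mkdir -p "$BLAST_HOME/db"

# Write ~/.ncbirc; $BLAST_HOME is expanded to the full path as the file is written.
# This replaces any existing ~/.ncbirc.
cat > "$HOME/.ncbirc" <<EOF
[BLAST]
BLASTDB=$BLAST_HOME/db
EOF

# Display the result to check the path
cat "$HOME/.ncbirc"
```

Because the path is built from $HOME, there is no ‘yourusername’ placeholder to replace in this version.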
• Copy the BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta genome file into
your newly created db directory.
• Change to the db directory, and use makeblastdb to create a new BLAST database:
  o makeblastdb -in magnaporthe_oryzae_70-15_8_supercontigs.fasta \
      -dbtype nucl -out Moryzae_genome.fasta
• Now change back to your home directory and run a blastn search using the sequences in
MoRepeats.fasta as the query and your new genome as the database:
  o blastn -db Moryzae_genome.fasta -query BLAST/MoRepeats.fasta \
      -out BLAST/MoRepeats.Moryzae_genomeBLASTn1 -evalue 1e-20 -outfmt 1
• Examine your output file with less.
• Now is a good time to try running the search using a few different output format options (0
through 11). Try -outfmt 6, for example!
  o blastn -db Moryzae_genome.fasta -query BLAST/MoRepeats.fasta \
      -out BLAST/MoRepeats.Moryzae_genomeBLASTn6 -evalue 1e-20 -outfmt 6
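The tabular format (-outfmt 6) is convenient because standard command-line tools can filter it. By default it has 12 tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The sketch below shows one possible way to filter and summarize such a file. The three rows are made up for illustration and are not real search results, and the 90% identity cutoff is an arbitrary choice.

```shell
# Made-up rows in the default -outfmt 6 layout (illustration only, NOT real results):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
sample=$(mktemp)
printf 'repeatA\tcontig1\t98.50\t400\t6\t0\t1\t400\t1000\t1399\t1e-180\t700\n'  > "$sample"
printf 'repeatA\tcontig2\t85.00\t120\t18\t0\t1\t120\t50\t169\t1e-25\t110\n'    >> "$sample"
printf 'repeatB\tcontig1\t99.00\t300\t3\t0\t1\t300\t5000\t5299\t1e-150\t540\n' >> "$sample"

# Keep hits with at least 90% identity (column 3);
# print query, subject, percent identity, and alignment length
strong_hits=$(awk -F'\t' '$3 >= 90 { print $1, $2, $3, $4 }' "$sample")
echo "$strong_hits"

# Count the number of hits for each query sequence (column 1)
hit_counts=$(cut -f1 "$sample" | sort | uniq -c | awk '{ print $2, $1 }')
echo "$hit_counts"

rm -f "$sample"
```

To try this on your own results, point the awk and cut commands at BLAST/MoRepeats.Moryzae_genomeBLASTn6 instead of the sample file.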
• BLAST comes in many flavors, not just blastn.
  o blastn: nucleotide-nucleotide alignment
  o blastp: protein-protein alignment
  o blastx: does six-frame translation of query nucleotide sequence and aligns against a
    protein database
  o and many more!