Essentials of Next Generation Sequencing Workshop 2014
University of Kentucky AGTC

Class 6. Sequence Alignment: BLAST

Goal:
- Be able to install and use the Basic Local Alignment Search Tool (BLAST) to align and compare sequences
- Search the NCBI non-redundant BLAST database with a query file

Input:
    BLAST/MoTeR_retrotransposons.fasta
    BLAST/MoRepeats.fasta
    BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta

Output:
    BLAST/MoTeRs.nrBLASTn
    BLAST/MoRepeats.Moryzae_genomeBLASTn1
    ncbi-blast-2.2.29+/db/Moryzae_genome.fasta

6.1 Installing BLAST

First, we will download the BLAST binaries directly from the NCBI website.

- Go to the NCBI homepage at http://www.ncbi.nlm.nih.gov/
- Click the “Data and Software” link (left-hand panel).
- Click the “Downloads” tab.
- Click the “BLAST (Stand-alone)” link.
- Click the ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ link under “BLAST+ executables”.

Note that the latest 64-bit executable for Linux is:
    ncbi-blast-2.2.29+-x64-linux.tar.gz

If we were to click on this link, the file would be downloaded to the machine that we are working on, not to the Linux server where the program needs to be installed. Instead, we will copy the link to the file so that we can download it from the command line.

- We have a 64-bit (“x64”) Linux system, so right-click the ncbi-blast-2.2.29+-x64-linux.tar.gz link and select “Copy Link Address” or the equivalent in your browser.
- Now use PuTTY to connect to your server via SSH.
- Download the latest BLAST executables to your home directory from the NCBI FTP server using the link you copied. Right-clicking pastes it into PuTTY.
  The command should look like:
    o wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.29+-x64-linux.tar.gz
- After downloading, unpack the executables:
    o tar zxvpf ncbi-blast-2.2.29+-x64-linux.tar.gz
- Add the directory with the executables to your system PATH:
    o PATH=/home/yourusername/ncbi-blast-2.2.29+/bin:$PATH
    o export PATH
  Make sure to replace ‘yourusername’ with your actual user name (e.g. ngs13u10)!

6.2 Run a Remote BLAST Search

Goal: Search the NCBI non-redundant BLAST database with a query file

Input:
    BLAST/MoTeR_retrotransposons.fasta

Output:
    BLAST/MoTeRs.nrBLASTn

Now, use your locally installed blastn program to search the NCBI database using the query file MoTeR_retrotransposons.fasta:
    o blastn -db nr -query BLAST/MoTeR_retrotransposons.fasta -out BLAST/MoTeRs.nrBLASTn -evalue 1e-20 -outfmt 1 -remote

BLAST takes several parameters here:
    -db:      specifies the database to be searched (we will use the NCBI “nr” database)
    -query:   specifies the local query sequence file, including its directory path
    -out:     name of the output file
    -evalue:  tells the program to report only matches with an E-value ≤ the specified value
    -outfmt:  specifies the format of the output (values can range from 0 to 11); the formats are listed below
    -remote:  tells the program to search a remote (NCBI) database

Here are the possible values of -outfmt:
     0 = pairwise
     1 = query-anchored showing identities
     2 = query-anchored no identities
     3 = flat query-anchored, show identities
     4 = flat query-anchored, no identities
     5 = XML Blast output
     6 = tabular
     7 = tabular with comment lines
     8 = Text ASN.1
     9 = Binary ASN.1
    10 = Comma-separated values
    11 = BLAST archive format (ASN.1)

You may want to experiment with this parameter to see which format suits your needs. Examine the BLAST output with less.
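One way to experiment with -outfmt is a small loop that repeats the search from this section with several format values. The sketch below is only an illustration: the DRY_RUN switch, the search_with_format helper, and the numbered output names (MoTeRs.nrBLASTn0, MoTeRs.nrBLASTn6, ...) are assumptions of this example, not part of BLAST. By default it only prints the commands, so you can check them before sending anything to NCBI.

```shell
#!/bin/sh
# Sketch: repeat the remote search from section 6.2 with several -outfmt values.
# DRY_RUN and the numbered output file names are assumptions of this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them
QUERY="BLAST/MoTeR_retrotransposons.fasta"

# Build the blastn command for one output format, then print or run it.
search_with_format() {
    fmt="$1"
    out="BLAST/MoTeRs.nrBLASTn${fmt}"
    # Store the command as positional parameters so it can be printed or executed.
    set -- blastn -db nr -query "$QUERY" -out "$out" \
        -evalue 1e-20 -outfmt "$fmt" -remote
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Try pairwise (0), tabular (6), and tabular with comment lines (7).
for fmt in 0 6 7; do
    search_with_format "$fmt"
done
```

Setting DRY_RUN=0 before running the script executes the searches instead of printing them. Each remote search can take a while, so it is probably best to try only a few formats at a time.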
6.3 Create and Search a Custom BLAST Database

Goal: Create a BLAST nucleotide database from the genome assembly and perform a query against it

Input:
    BLAST/MoRepeats.fasta
    BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta

Output:
    BLAST/MoRepeats.Moryzae_genomeBLASTn1
    ncbi-blast-2.2.29+/db/Moryzae_genome.fasta

Next, we will create a BLAST database from our existing genome data that we can search against.

- First, we need to tell BLAST where to look for the custom database by means of a .ncbirc file. Use the vim text editor to create a file named .ncbirc (yes, the leading period should be included) inside your home directory. This file should contain the text:
    [BLAST]
    BLASTDB=/home/yourusername/ncbi-blast-2.2.29+/db
  Again, be sure to replace ‘yourusername’ with your actual user name (e.g. ngs13u10).
- Create a subdirectory named db within the ncbi-blast-2.2.29+ directory.
- Copy the BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta genome file into your newly created db directory.
- Change to the db directory, and use makeblastdb to create a new BLAST database:
    o makeblastdb -in magnaporthe_oryzae_70-15_8_supercontigs.fasta -dbtype nucl -out Moryzae_genome.fasta
- Now change back to your home directory and run a blastn search using the sequences in MoRepeats.fasta as the query and your new genome database as the database:
    o blastn -db Moryzae_genome.fasta -query BLAST/MoRepeats.fasta -out BLAST/MoRepeats.Moryzae_genomeBLASTn1 -evalue 1e-20 -outfmt 1
- Examine your output file with less.
- Now is a good time to try running the search with a few different output format options (0 through 11). Try -outfmt 6, for example!
    o blastn -db Moryzae_genome.fasta -query BLAST/MoRepeats.fasta -out BLAST/MoRepeats.Moryzae_genomeBLASTn6 -evalue 1e-20 -outfmt 6

BLAST comes in many flavors, not just blastn:
    o blastn: nucleotide-nucleotide alignment
    o blastp: protein-protein alignment
    o blastx: does six-frame translation of query nucleotide sequence and aligns against a protein database
    o and many more!
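Returning to the tabular report from section 6.3 (-outfmt 6): because each hit is one tab-separated line, the file is easy to summarize with standard tools. By default the twelve columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. The sketch below counts the hits per query sequence and reports the best bit score for each. The summarize_hits function name is hypothetical, and the three sample rows are made up for illustration; they are not real BLAST results.

```shell
#!/bin/sh
# Sketch: summarize a tabular (-outfmt 6) BLAST report.
# Assumes the default 12 tab-separated columns, with the query ID in
# column 1 and the bit score in column 12.

# For each query, print: query ID, number of hits, best bit score.
summarize_hits() {
    awk -F '\t' '
        {
            hits[$1]++
            # Keep the highest bit score seen so far for this query.
            if (!($1 in best) || $12 + 0 > best[$1]) best[$1] = $12 + 0
        }
        END {
            for (q in hits) printf "%s\t%d\t%s\n", q, hits[q], best[q]
        }
    ' "$1" | sort
}

# Made-up rows for illustration only (NOT real BLAST output).
sample="$(mktemp)"
printf 'repA\tcontig1\t99.0\t200\t2\t0\t1\t200\t501\t700\t1e-100\t360\n' > "$sample"
printf 'repA\tcontig2\t95.0\t180\t9\t0\t1\t180\t11\t190\t1e-80\t290\n' >> "$sample"
printf 'repB\tcontig1\t100.0\t50\t0\t0\t1\t50\t901\t950\t1e-21\t93.5\n' >> "$sample"

summarize_hits "$sample"
rm -f "$sample"
```

To apply the same summary to your own results, you could call summarize_hits BLAST/MoRepeats.Moryzae_genomeBLASTn6 after running the -outfmt 6 search from section 6.3.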