Sheffield Molecular Genetics Facility
University of Sheffield
http://www.shef.ac.uk/misc/groups/molecol/smgf.html

PROTOCOLS

Using Stand-alone nBLAST to Check for Duplicate Sequences (e.g. Within a Microsatellite Library)

by Andrew Leviston (02.03.2004) and Deborah Dawson. Last updated 19.08.2004
(contact D.A.Dawson@shef.ac.uk for assistance)

This program allows you to check in-house for duplicates within a database of sequences. You should add the Sau-LA, Sau-LB and PUC18 vector sequences, in both orientations, so that these are also searched for.

Your sequences should also be BLASTed against the public nBLAST database to check for sequence matches to other species and to re-check for vector and linker contamination. Go to: http://www.ncbi.nlm.nih.gov/BLAST/

Also see A. Checking the loci you are using (or have developed) are unique.

Method

This example is based on some moth sequences, so replace the word “moth” (as in “mothoutput”) with the name of your species. Do not include spaces or any other symbols.

Input file

All of the following needs to be done on a PC. If your files are on a Mac, email them to yourself and open them on a PC.

Put your sequences into FASTA format in a Word document. The name of each sequence must contain no spaces and no symbols, and is preceded by “>”. The actual sequence begins on the next line; do not leave an empty line between the name of the sequence and the sequence data.

Open Notepad (“ALL PROGRAMS” > “ACCESSORIES” > “NOTEPAD”), or use WordPad if you get an error message (the file formatting is probably causing the problem). Paste your sequences into Notepad and use this to edit them.

>“name of your 1st sequence”    e.g. “moth03A12”
>“name of your 2nd sequence”    e.g. “moth04B10”
etc

i.e. for some moth sequences:

>moth03A12
GATCGAGCGTTGC…..
>moth04B10
GATCTTGTACCTGCACA….
etc

Save the file in the c:\BLAST directory as “name of your sequence”.nt, e.g. “moths.nt”. .nt is the file type Blast uses as a database. When saving, select “all files” in the “save as type” box and type the name of your sequence file followed by “.nt” into the “file name” box. Leave encoding as “ANSI”.
i.e. for a file of moth sequences, save as moths.nt with the “all files” option and “ANSI” encoding selected.

You also need to save the same file as “name of your sequence”.txt, i.e. “moths.txt”. .txt is a plain text file which Blast uses as a query. When saving, select “all files” in the “save as type” box and type the name of your sequence file followed by “.txt” into the “file name” box. Leave encoding as “ANSI”.
i.e. for a file of moth sequences, save as moths.txt with the “all files” option selected.

Close the Notepad file after it has been saved as both “moths.nt” and “moths.txt”.

Using Blast on a PC

Blast operates from the DOS “Command Prompt” and requires commands to be typed in (don’t worry though!). The commands to be typed are shown between quotation marks, like “type me”; do not type the quotation marks themselves. After each command you need to press return.

Open up the command prompt (Start > Programs > Accessories > Command Prompt).
At the prompt “D:\>” type in “C:”
At the prompt “C:\>” (which should have now appeared) type in “cd blast”
The following prompt should appear: “C:\BLAST>”
The above commands move the command prompt to the c:\blast directory, where the program is run.

Type in “formatdb -i moths.nt -p F”
The above command tells Blast to format your database into a usable form (-p F indicates that the sequences are nucleotide rather than protein). You need to replace “moths.nt” with the name of the FASTA database which you produced earlier, e.g.
yourspecies.nt

Type in “blastall -p blastn -d moths.nt -i moths.txt -o mothoutput.htm -T T -m 1”
(The “o” is the letter “o” and not a zero, and the final “1” is the number one, not the letter “L”.)

This produces an output file, mothoutput.htm, in the blast directory. You can open it in Internet Explorer to view the matches.

You will need to replace “moths.nt” with the name of your FASTA database, and “moths.txt” with the name of the query file you created earlier. “mothoutput.htm” is the output file; you can replace it with a name of your choice, and the “.htm” extension allows it to be opened in a web browser, e.g. yourspeciesoutput.htm. Just include “output” so that the filename doesn’t match any others already used.

At the prompt “C:\BLAST>” type “exit” to leave the Command Prompt.

Open the C:\BLAST directory and the “mothoutput.htm” file:
“START” > “MY COMPUTER” > “C” > “BLAST” > “mothoutput.htm”
Double click on “mothoutput.htm”, which will open the results in a web page format. Unfortunately the normal nBLAST coloured pictogram of matching sequences is not shown, but sequences are aligned and matches are shown by underscored dashes.

What the command does

Blastall is the execution command
-p blastn selects the nucleotide blast
-d tells Blast to use “moths.nt” as a database
-i tells Blast to use “moths.txt” as the query (you can use other files)
-o produces the output file mothoutput.htm (you can call it anything)
-T T tells Blast to make the output file html (you can leave it out)
-m 1 produces a query-anchored output
-m 0 produces a pairwise matched output

Example using sequences from birds

This creates the file “birds.htm” in the c:\blast directory.
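
The input-file rules and the two commands above can also be checked with a short script before you open the Command Prompt. The following is only a minimal sketch in Python, which is not part of the original protocol. The names check_fasta, blast_commands and NAME_PATTERN are hypothetical, and the rule that sequence names may contain only letters and digits is an assumed, strict reading of “no spaces and no symbols”. The script does not run Blast; it only reports problems in the FASTA text and prints the command lines to type.

```python
import re

# Assumption: "no spaces and no symbols" is read strictly as letters and digits only.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+$")


def check_fasta(text):
    """Return a list of problems found in FASTA-formatted text (empty if none)."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue
        name = line[1:]
        if not NAME_PATTERN.match(name):
            problems.append(f"line {number}: bad sequence name {name!r}")
        # The sequence data must begin on the very next line (no empty line).
        next_line = lines[number] if number < len(lines) else ""
        if not next_line.strip() or next_line.startswith(">"):
            problems.append(f"line {number}: no sequence directly after {name!r}")
    return problems


def blast_commands(prefix, output):
    """Build the formatdb and blastall command lines used in this protocol.

    prefix is the shared file name (hypothetical parameter), e.g. "moths"
    for moths.nt and moths.txt; output is the name of the .htm result file.
    """
    database = f"{prefix}.nt"
    query = f"{prefix}.txt"
    return [
        f"formatdb -i {database} -p F",
        f"blastall -p blastn -d {database} -i {query} -o {output} -T T -m 1",
    ]


if __name__ == "__main__":
    example = ">moth03A12\nGATCGAGCGTTGC\n>moth04B10\nGATCTTGTACCTGCACA\n"
    for problem in check_fasta(example):
        print(problem)
    for command in blast_commands("moths", "mothoutput.htm"):
        print(command)
```

If check_fasta returns an empty list, the names and layout follow the rules given under “Input file”. The printed commands can then be typed at the “C:\BLAST>” prompt as described above.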