Sheffield Molecular Genetics Facility University of Sheffield

PROTOCOLS
Using Stand-alone nBLAST to Check for Duplicate Sequences
(eg Within a Microsatellite Library)
by Andrew Leviston (02.03.2004) and Deborah Dawson. Last updated 19.08.2004
(contact D.A.Dawson@shef.ac.uk for assistance)
This program allows you to check in-house for duplicates within a database of sequences.
You should add Sau-LA, Sau-LB and PUC18 vector sequences in both orientations to
ensure these are also searched for.
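One way to prepare a linker or vector sequence in both orientations is sketched below in Python. It assumes that "both orientations" means the sequence as given plus its reverse complement. The function names, the "rc" name suffix and the example sequence are illustrative placeholders, not part of the facility's protocol.

```python
# Complement table for the four DNA bases plus N (unknown base), both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_orientations(name, seq):
    """Return two FASTA records: the sequence as given and its reverse complement.

    The second record is named <name>rc (a hypothetical naming choice that
    keeps the name free of spaces and symbols).
    """
    return ">{0}\n{1}\n>{0}rc\n{2}\n".format(name, seq, reverse_complement(seq))


if __name__ == "__main__":
    # Placeholder sequence only; paste the real linker or vector sequence here.
    print(both_orientations("linkerexample", "GATCGAGC"), end="")
```

The two records printed can be pasted into the sequence file alongside your own sequences.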
Your sequences should also be BLASTed against the public nBLAST database to check
for sequence matches to other species and to re-check for vector and linker
contamination.
Go to…
http://www.ncbi.nlm.nih.gov/BLAST/
Also see
A. Checking the loci you are using (or have developed) are unique.
Method
This example is based on some moth sequences, so replace the word “moth” (as in “mothoutput”) with the name of your species. Do not include spaces or any other symbols in the name.
Input file
All of the following needs to be done on a PC.
If your files are on a Mac, email them to yourself and open them on a PC.
Put your sequences into FASTA format in a WORD document. The name of each
sequence must contain no spaces and no symbols and is preceded by >. The actual
sequence begins on the next line; do not leave an empty line between the name of the
sequence and the sequence data.
Open notepad = “ALL PROGRAMS” – “ACCESSORIES” – “NOTEPAD”
Or use WORDPAD if you get an error message (the file formatting is probably
causing the problem).
Paste your sequences into notepad and use this to edit them.
>”name of your 1st sequence”
“moth03A12”
>”name of your 2nd sequence”
“moth04B10”
etc
i.e. for some moth sequences
>moth03A12
GATCGAGCGTTGC…..
>moth04B10
GATCTTGTACCTGCACA….
etc
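The formatting rules above can be checked automatically before the file is saved. The following Python sketch is one possible check, not part of the original protocol. The function name check_fasta is hypothetical, and it assumes that sequence lines contain only the bases A, C, G, T and N.

```python
import re

# Names made only of letters and digits, per the rule of no spaces and no symbols.
_NAME = re.compile(r"[A-Za-z0-9]+")
# Sequence lines limited to the four bases plus N (an assumption of this sketch).
_BASES = re.compile(r"[ACGTNacgtn]+")


def check_fasta(text):
    """Return a list of problems found in FASTA text; the list is empty if none."""
    problems = []
    previous_was_header = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(">"):
            if previous_was_header:
                problems.append("line %d: name follows a name with no sequence" % number)
            if not _NAME.fullmatch(line[1:]):
                problems.append("line %d: name contains spaces or symbols" % number)
            previous_was_header = True
        elif line.strip() == "":
            if previous_was_header:
                problems.append("line %d: empty line between name and sequence" % number)
            previous_was_header = False
        else:
            if not _BASES.fullmatch(line):
                problems.append("line %d: unexpected character in sequence" % number)
            previous_was_header = False
    return problems


if __name__ == "__main__":
    example = ">moth03A12\nGATCGAGCGTTGC\n>moth04B10\nGATCTTGTACCTGCACA\n"
    print(check_fasta(example))
```

An empty list means none of the checked rules were broken; each message otherwise names the offending line.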
Save the file in the c:\BLAST directory as:
“name of your sequence”.nt
eg “moths.nt”
.nt is the file type Blast uses as a database.
When saving, select “all files” in the “save as type” box and type the name of your
sequence file followed by “.nt” into the “file name” box.
Leave encoding as “ANSI”.
i.e.
for a file of moth sequences, save as moths.nt with the “all files” option and “ANSI”
encoding selected
You also need to save the same file as
“name of your sequence”.txt
ie
“moths.txt”
.txt is a plain text file which Blast uses as a query. When saving, select “all files” in the
“save as type” box and type the name of your sequence file followed by “.txt” into the
“file name” box. Leave encoding as “ANSI”.
i.e.
for a file of moth sequences, save as moths.txt with the “all files” option and “ANSI”
encoding selected
Close the notepad file after it has been saved as a “moths.nt” and “moths.txt” file
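If you prefer not to save the file twice by hand, the two copies can be written by a short script. This Python sketch is an illustration only. The function name save_database_and_query is hypothetical, and plain ASCII is used on the assumption that FASTA names and sequences need nothing beyond it (ASCII text is also valid “ANSI”).

```python
import tempfile
from pathlib import Path


def save_database_and_query(fasta_text, basename, folder=r"c:\BLAST"):
    """Write identical FASTA text to <basename>.nt and <basename>.txt in folder."""
    written = []
    for extension in (".nt", ".txt"):
        path = Path(folder) / (basename + extension)
        # Encoding as ASCII raises an error on characters such as curly quotes
        # pasted in from WORD, instead of silently writing them to the file.
        with open(path, "w", encoding="ascii") as handle:
            handle.write(fasta_text)
        written.append(path)
    return written


if __name__ == "__main__":
    # A temporary folder is used here so the demonstration does not touch c:\BLAST.
    with tempfile.TemporaryDirectory() as folder:
        for path in save_database_and_query(">moth03A12\nGATCGAGCGTTGC\n", "moths", folder):
            print(path.name)
```

Calling the function without the folder argument writes to c:\BLAST, the directory used in this protocol.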
Using Blast on a PC
Blast operates from the DOS “Command Prompt” and requires commands to be typed in
(don’t worry though!). The commands to be typed are shown between quotation marks, like
“type me”; do not type the quotation marks themselves. After each command you need to press return.
Open up the command prompt (Start > Programs > Accessories > Command Prompt)
At the prompt
“D:\>”
type in – “C:”
At the prompt (which should have now appeared)
“C:\>”
Type in “cd blast”
The following prompt should appear
“C:\BLAST>”
The above commands move the command prompt to the c:\blast directory to use the
program.
Type in -
“formatdb -i moths.nt -p F”
The above command tells Blast to format your database into a usable form (-i names the
input file and -p F states that the sequences are nucleotide, not protein). Replace
“moths.nt” with the name of the FASTA database you produced earlier, e.g. yourspecies.nt
Type in “blastall -p blastn -d moths.nt -i moths.txt -o mothoutput.htm -T T -m 1”
(The “o” is the letter “o” and not a zero, and the final “1” is the number one, not the letter “L”. Type every dash as a plain hyphen.)
This produces the output file mothoutput.htm in the blast directory. You can open it in
Internet Explorer to view the matches. You will need to replace “moths.nt” with the
name of your FASTA database. You will also need to replace “moths.txt” with the
name of your query file created earlier, e.g. yourspecies.txt
“mothoutput.htm” is the output file. You can call it anything you want; simply replace it
with a name of your choice. The “.htm” extension allows it to be opened in a web browser. E.g.
yourspeciesoutput.htm. Include “output” in the name so that the filename doesn’t match any others
already used.
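Because every file name in the two commands above has to be replaced consistently, it can help to generate the command lines from a single species name. The Python sketch below only builds and prints the commands; it does not run Blast. The function name build_blast_commands is hypothetical, and the flags are those given in this protocol.

```python
def build_blast_commands(basename, output_name, alignment_view=1):
    """Return the formatdb and blastall commands as lists of arguments.

    basename is the shared file name without extension, e.g. "moths".
    alignment_view is the value given to -m (1 in this protocol).
    """
    database = basename + ".nt"
    query = basename + ".txt"
    format_command = ["formatdb", "-i", database, "-p", "F"]
    search_command = [
        "blastall", "-p", "blastn",
        "-d", database,
        "-i", query,
        "-o", output_name,
        "-T", "T",
        "-m", str(alignment_view),
    ]
    return format_command, search_command


if __name__ == "__main__":
    # Print each command on one line, ready to type at the C:\BLAST> prompt.
    for command in build_blast_commands("moths", "mothoutput.htm"):
        print(" ".join(command))
```

Building the commands in code also guarantees that every dash is a plain hyphen, which is easy to get wrong when copying from a word-processed document.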
At the prompt
“C:\BLAST>”
Type
“exit”
This closes the “Command Prompt”.
Open the C:\BLAST directory and the “mothoutput.htm” file
“START” > “MY COMPUTER” > “C” > “BLAST” > “mothoutput.htm”
Double click on “mothoutput.htm”, which will open the results in a web page format.
Unfortunately the normal nBLAST coloured pictogram of matching sequences is not
shown, but sequences are aligned and matches are shown by underscored dashes.
What the command does
blastall is the execution command
-p blastn selects the nucleotide blast
-d tells blast to use “moths.nt” as a database
-i tells blast to use “moths.txt” as the query (you can use other files)
-o produces the output file mothoutput.htm (you can call it anything)
-T T tells Blast to make the output file html (you can leave it out)
-m 1 produces a query anchored output
-m 0 produces a pairwise matched output
Example using sequences from birds
This creates the file “birds.htm” in the c:\blast directory.
http://www.shef.ac.uk/misc/groups/molecol/smgf.html