Using BLAST to find a genius gene:
+What is BLAST: The BLAST program is designed to compare primary biological
sequence information, such as the amino acid sequences of different proteins or the
nucleotide sequences of DNA. To use it, a researcher submits a sequence of interest
to the algorithm. The sequence can be DNA, RNA, or an amino acid chain. The
algorithm then compares the submitted sequence with the sequences in its
database and reports which database sequences most closely match it. You may have a sequence of interest because you have just sequenced a
DNA, RNA, or amino acid chain and want to know which organism it comes from or which
protein it encodes, or you may have a known DNA, RNA, or amino acid sequence from a
particular organism and want to know how similar it is to sequences from another organism, to help
determine possible evolutionary relationships. If necessary, the programs translate
nucleic acid sequences in all six possible reading frames to compare them with protein
sequences. (http://en.wikipedia.org/wiki/BLAST)
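To make the "six possible reading frames" idea concrete, here is a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and numbers the reverse-strand frames from the start of the reverse complement, which is one common convention; the function names are hypothetical and this is not BLAST's own code (BLAST can also use other genetic codes).

```python
# Minimal six-frame translation sketch, assuming the standard genetic code
# (NCBI translation table 1). Not BLAST's own implementation.
BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...);
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Upper-case the sequence and write RNA 'U' as DNA 'T'."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA/RNA sequence (as DNA)."""
    return to_dna(seq).translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons starting at position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Translate the three forward and the three reverse-strand reading frames."""
    dna = to_dna(seq)
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frames("ACCATTGGGGGAGGAGATGA").items():
        print(frame, peptide)
```

Running it on the primer sequence used later in this report prints one short peptide per frame; frame +3 ends in a stop codon (*), which is the kind of frame a protein comparison would largely discount.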
Choosing a BLAST program (columns: what you have; rows: what you want):
What you want \ What you have | A DNA or RNA sequence | An amino acid sequence of a protein
Similarities with nucleotide sequences in the database | Nucleotide BLAST (blastn) | N/A
Similarities with proteins in the database | blastx | Protein BLAST (blastp)
Similarities with translated nucleotide sequences in the database | tblastx | tblastn
Similarities with each other | blastn | N/A
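The same choice can be written down as a small lookup table. This is only a sketch that mirrors the table above; the keys such as "nucleotide" and "translated nucleotide" and the function name are hypothetical labels, and the "each other" row is left out.

```python
# Hypothetical lookup table mirroring the BLAST program table above.
# Key: (what you have, what you want to search against); value: program name.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("nucleotide", "translated nucleotide"): "tblastx",
    ("protein", "protein"): "blastp",
    ("protein", "translated nucleotide"): "tblastn",
    # ("protein", "nucleotide") is N/A in the table, so it is left out.
}


def choose_program(query_type: str, target: str) -> str:
    """Return the BLAST program for a query type and search target."""
    try:
        return BLAST_PROGRAMS[(query_type, target)]
    except KeyError:
        raise ValueError(
            f"no BLAST program listed for a {query_type} query "
            f"against {target} sequences"
        ) from None


if __name__ == "__main__":
    print(choose_program("nucleotide", "protein"))
```

Raising an error for the N/A cell, rather than returning a default, keeps an impossible combination from silently picking the wrong program.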
+In our project, we can use this program for two purposes: to find out the evolutionary
relationship between mouse and human (the main purpose) and to compare the mutant
genes with the original genes in mouse.
a) Find out the evolutionary relationship between two or more genetic sequences
from mouse and human. In detail, when we discover a previously unknown
gene in the mouse, which we believe is “myg”, we would typically perform a BLAST
search of the human genome to see whether humans carry a similar gene; BLAST will
identify sequences in the human genome that resemble the mouse gene based on
sequence similarity.
+I tried the procedure Prof. Han showed in the “week 8-9” file, and below is what I
found:
Suppose I have a nucleotide sequence that I believe is “myg”. In this case I will
assume that the “myg” sequence is the sequence in the reference that
Dr. Han gave us previously, ACCATTGGGGGAGGAGATGA (of course
this is not really the gene; it is just an assumption for the exercise), and I will compare it with other nucleotide
sequences in the database. Clicking the link http://blast.ncbi.nlm.nih.gov/ took me
to the BLAST home page.
In the "Enter Query Sequence" box I entered the nucleotide sequence
ACCATTGGGGGAGGAGATGA. It contains no numbers, but I believe the box also accepts
a nucleotide sequence with numbers in it or a FASTA sequence (a sequence
whose first line starts with a ">" sign). You can also upload a text file containing your
sequence.
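The web form handles this cleanup itself; the sketch below only illustrates what "a sequence with numbers in it" and "FASTA format" mean, and how you might prepare a text file for upload. The function names and the header name "myg_assumed" are hypothetical.

```python
import re


def clean_query(text: str) -> str:
    """Reduce pasted query text to a bare, upper-case sequence.

    FASTA header lines (starting with '>') are dropped; digits, spaces and
    any other non-letter characters are removed.
    """
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    return re.sub(r"[^A-Za-z]", "", "".join(lines)).upper()


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record, wrapped at `width` characters."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"


if __name__ == "__main__":
    # A numbered, lower-case paste, as it might be copied from a sequence viewer.
    seq = clean_query("1 accattgggg gaggagatga 20")
    # "myg_assumed" is a made-up header name for illustration only.
    print(to_fasta("myg_assumed", seq), end="")
```

Saving the printed record to a plain .txt file gives a file you could upload instead of pasting.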
In the "Choose Search Set" box, select the database you want to search with
your query sequence. If you know it came from the human or mouse
genome, those options are readily available. If you're not sure where it
came from, choose the "Nucleotide collection (nr/nt)" option, which
searches all non-redundant nucleotide databases. If you
know it came from a species other than mouse or human, choose the "Nucleotide
collection" option, then type the name of the species on the line underneath
the selection box. There are many other options on the pull-down
menu to explore, but the databases most likely to be useful to you are the ones
discussed above. In my case, I chose the “human” genome, because I want to find the
similarity and evolutionary relationship between this sequence and human sequences and genes.
In the "Program Selection" box, you can indicate how closely your
query sequence must match a nucleotide sequence in the
selected database. If you have no idea, you may want to start with "Highly similar
sequences (megablast)", run the BLAST search, and look at the results. If you don't get
any good matches, try a less stringent option such as "Somewhat similar sequences (blastn)". If
you want to know more about the different matching algorithms, there is a note on
them at the bottom of that page. In my case, I chose megablast. After you have
chosen your matching algorithm, click BLAST.
After you click BLAST, you will probably have to wait a few minutes for
the results, as I did.
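Part of what separates megablast from blastn is the word size used in BLAST's first step: the search starts from short exact word matches ("seeds") shared by query and database sequence and only then extends them into alignments. Megablast's default word size (28) is longer than blastn's (11), so it is fast for highly similar sequences but can miss weaker matches. The sketch below shows only that seeding step, under that simplification; extension and scoring are omitted, so it is not BLAST itself, and the function name and test sequences are hypothetical.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, word_size: int) -> list[tuple[int, int]]:
    """Return 0-based (query_pos, subject_pos) pairs where an exact word of
    length `word_size` occurs in both sequences (BLAST-style seeding only)."""
    # Index every word of the query by its starting position.
    index = defaultdict(list)
    for q in range(len(query) - word_size + 1):
        index[query[q:q + word_size]].append(q)
    # Scan the subject and report every word that also occurs in the query.
    seeds = []
    for s in range(len(subject) - word_size + 1):
        for q in index.get(subject[s:s + word_size], []):
            seeds.append((q, s))
    return seeds


if __name__ == "__main__":
    print(find_seeds("ACGTAC", "GGACGTACGG", 4))
```

Seeds that lie on the same diagonal (the same subject_pos minus query_pos) hint at one longer ungapped match; a word longer than the query can never produce a seed at all.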
+Result: The results page is packed with data you can use to judge how
good your BLAST match was. At the top of the page I saw a color-coded graphic of the
alignments of my submitted sequence with the sequences found in the database, where the
color shows the alignment score. The higher the score, the more similar the two sequences.
In my case, the bars were blue, which corresponds to a fairly low score, because the
sequence Prof. Han gave us is only a very short primer: even a perfect match over 20 bases
cannot earn a high score.
When I scrolled down, I saw my top matches. Each hit told me the name of
the species it aligned to, the name of the gene it aligned to (if applicable), the
length of the database sequence, whether the sequence came from mRNA, the score of the match,
the expect value of the match, the nucleotide identities, the number
of gaps, which strand the match occurred on, and then the actual alignment. In the list of
hits, each sequence also has a bit score, which links to the corresponding
pairwise alignment between the query sequence and the hit sequence (also called the
subject sequence). The other key statistic is the Expect value (E value): the number of
alignments with a score at least this good that would be expected to occur in the database
purely by chance. The smaller the E value, the more significant the alignment.
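The bit score and E value are linked by the Karlin-Altschul relation E = m · n · 2^(-S'), where m is the query length, n the total database length and S' the bit score. The sketch below evaluates that formula directly; it is an approximation under stated assumptions, since BLAST uses effective lengths corrected for edge effects, so plugging in raw lengths will not reproduce the exact E values on the results page. The function name is hypothetical.

```python
def expect_value(bit_score: float, m: int, n: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-bit_score).

    m: query length, n: total database length (bases or residues).
    BLAST itself uses effective lengths, so its reported E differs somewhat.
    """
    return m * n * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Each extra bit halves E; a bigger database raises E for the same score.
    for bits in (40.1, 41.1, 50.0):
        print(bits, expect_value(bits, 20, 10**9))
```

This also explains why a perfect 20/20 match of a short primer still has a fairly large E value: 20 matching bases can only earn about 40 bits, and against a large database that leaves E well above the tiny values seen for long alignments.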
>gb|JN946174.1| Mus musculus targeted KO-first, conditional ready, lacZ-tagged mutant allele Pfn4:tm2a(KOMP)Wtsi; transgenic
Length=38890
Score = 40.1 bits (20), Expect = 0.066
Identities = 20/20 (100%), Gaps = 0/20 (0%)
Strand=Plus/Plus
Query  1      ACCATTGGGGGAGGAGATGA  20
              ||||||||||||||||||||
Sbjct  30454  ACCATTGGGGGAGGAGATGA  30473

>gb|JN945187.1| Mus musculus targeted non-conditional, lacZ-tagged mutant allele Tuba1c:tm1e(EUCOMM)Hmgu; transgenic
Length=38887
Score = 40.1 bits (20), Expect = 0.066
Identities = 20/20 (100%), Gaps = 0/20 (0%)
Strand=Plus/Plus
Query  1      ACCATTGGGGGAGGAGATGA  20
              ||||||||||||||||||||
Sbjct  12682  ACCATTGGGGGAGGAGATGA  12701
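The "Identities", "Gaps" and the line of "|" characters in these hit records can be recomputed from the two aligned strings. A minimal sketch, assuming the query and subject lines have already been copied out with gaps written as "-"; the function names are hypothetical, and the percentages are simply rounded, which may not match BLAST's rounding in every edge case.

```python
def summarize_alignment(query: str, subject: str) -> tuple[int, int, int, str]:
    """For two equal-length aligned strings ('-' = gap), return
    (identities, gaps, alignment length, midline of '|' and spaces)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query.upper(), subject.upper()))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    midline = "".join("|" if q == s and q != "-" else " " for q, s in pairs)
    return identities, gaps, len(pairs), midline


def format_summary(query: str, subject: str) -> str:
    """Render the counts in the style of a BLAST hit record."""
    ident, gaps, length, _ = summarize_alignment(query, subject)
    return (f"Identities = {ident}/{length} ({round(100 * ident / length)}%), "
            f"Gaps = {gaps}/{length} ({round(100 * gaps / length)}%)")


if __name__ == "__main__":
    primer = "ACCATTGGGGGAGGAGATGA"
    print(format_summary(primer, primer))
```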
After getting the results, we can focus on analyzing the human genes that have the highest
scores and are most similar to the sequence we believe is the “myg”
gene in mouse.
b) In addition, we can use blastn, which performs nucleotide-nucleotide comparisons, to
determine the differences between the mutant genes of the mutant mice and the
original genes of the wild-type mice. Those differences can then be used as genetic markers.
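Once blastn has aligned a wild-type and a mutant sequence, the candidate marker positions are the columns where they differ. A minimal sketch, assuming you have already copied two aligned sequences of equal length (for example the Query and Sbjct lines of a pairwise alignment); the function name and example sequences are hypothetical, and positions are 1-based alignment columns, not genome coordinates.

```python
def find_differences(wild_type: str, mutant: str) -> list[tuple[int, str, str]]:
    """List (position, wild-type base, mutant base) for every column where two
    aligned, equal-length sequences differ. Positions are 1-based alignment
    columns; a '-' marks a gap (insertion or deletion)."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, w, m)
        for pos, (w, m) in enumerate(zip(wild_type.upper(), mutant.upper()), start=1)
        if w != m
    ]


if __name__ == "__main__":
    # Hypothetical wild-type vs mutant fragments with one substitution.
    print(find_differences("ACGTAC", "ACTTAC"))
```

A substitution shows up as two different bases and an indel as a base paired with "-", so both kinds of difference can be listed as potential markers.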
REFERENCES:
Bioinformatics: Analyzing DNA Sequence using BLAST
http://www.ncbi.nlm.nih.gov/blast/html/BLASThomehelp.html
http://www.doelz.com/biocompanion_96/compari9.html