Using BLAST to find a genius gene:

+What is BLAST:
The BLAST program is designed to compare primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. To use it, a researcher submits a sequence of interest to the algorithm. The sequence can be DNA, RNA, or an amino acid chain. The algorithm then compares the submitted sequence with the sequences in its database and reports which database sequences most closely match it.

You may have a sequence of interest because you just sequenced a DNA, RNA, or amino acid chain and want to know what organism it comes from or what protein it makes. Or you may have a known DNA, RNA, or amino acid sequence from a particular organism and want to know how similar it is to a sequence in another organism, to help determine possible evolutionary relationships. If necessary, the programs translate nucleic acid sequences in all six possible reading frames to compare them to protein sequences. (http://en.wikipedia.org/wiki/BLAST)

Which program to use, depending on what you have and what you want:

What you want                                                     | You have a DNA or RNA sequence | You have an amino acid sequence of a protein
Similarities with nucleotide sequences in the database            | Nucleotide Blast (blastn)      | N/A
Similarities with proteins in the database                        | blastx                         | Protein Blast (blastp)
Similarities with translated nucleotide sequences in the database | tblastx                        | tblastn
Similarities with each other                                      | blastn                         | N/A

+In our project, we can use this program for two purposes: finding the evolutionary relationship between mouse and human genes (the main purpose), and comparing the mutant genes with the original genes in mouse.

a) Finding the evolutionary relationship between two or more genetic sequences from mouse and human.
In detail, after discovering a previously unknown gene in the mouse, which we believe is "myg", we would typically perform a BLAST search of the human genome to see whether humans carry a similar gene. BLAST identifies sequences in the human genome that resemble the mouse gene based on sequence similarity.

+I tried what Prof. Han had shown in the "week 8-9" file, and below is what I found:
I start with a nucleotide sequence that I treat as "myg" and compare it to other nucleotide sequences in the database. (In this case I suppose that the "myg" sequence is the sequence from the reference that Dr. Han gave us previously: ACCATTGGGGGAGGAGATGA. Of course this is not true; it is just an assumption.)

When I clicked the link http://blast.ncbi.nlm.nih.gov/, I arrived at the BLAST page.

In the box "Enter Query Sequence" I submitted the nucleotide sequence ACCATTGGGGGAGGAGATGA. It contains no position numbers, but I think we can also enter a nucleotide sequence with numbers in it, or a FASTA sequence (a sequence where the first line starts with a ">" sign). We can also upload a text file with the sequence in it.

In the box "Choose Search Set" you select the database to search for the entered nucleotide sequence. If you know the sequence came from the human or mouse genome, those options are readily available. If you are not sure where it came from, choose the "Nucleotide Collection (nr/nt)" option, which will search for your sequence in all non-redundant nucleotide databases. If you know it came from a species other than mouse or human, choose the "Nucleotide collection" option, then on the line underneath the selection box, type the name of the species you want to search. There are many other options on the pull-down menu to explore, but the databases most likely to be of use are the ones discussed above.
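As a minimal sketch of the input forms mentioned above (a bare sequence, a sequence with numbers in it, or a FASTA record), the helper below splits off an optional ">" header line, drops digits and spaces, and checks that only nucleotide letters remain. The name `parse_query` and the allowed alphabet are assumptions made for illustration; this is not how the BLAST web form itself is implemented.

```python
def parse_query(text):
    """Return (header, sequence) from a bare or FASTA-formatted nucleotide query.

    Hypothetical helper for illustration only; not part of BLAST.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = None
    if lines and lines[0].startswith(">"):
        # FASTA: the first line is a description, the rest is sequence.
        header = lines[0][1:].strip()
        lines = lines[1:]
    # Keep letters only, which drops position numbers and spaces.
    sequence = "".join(ch for line in lines for ch in line if ch.isalpha()).upper()
    allowed = set("ACGTUN")  # assumed alphabet: DNA/RNA bases plus N for unknown
    bad = sorted(set(sequence) - allowed)
    if not sequence or bad:
        raise ValueError(f"not a plain nucleotide sequence: {bad}")
    return header, sequence


if __name__ == "__main__":
    print(parse_query("ACCATTGGGGGAGGAGATGA"))
    print(parse_query(">myg_candidate\n1 ACCATTGGGG GAGGAGATGA 20"))
```

Both calls return the same 20-letter sequence; the second also returns the made-up header "myg_candidate".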
In my case, I chose "Human" as the genome to search, because I want to find the similarity and evolutionary relationship between that sequence and human sequences and genes.

In the "Program Selection" box, you indicate how closely your submitted nucleotide sequence must match a nucleotide sequence in the selected database. If you have no idea, you may want to start with "Highly similar sequences", perform the BLAST, and see what your results are. If you don't get any good matches, you may want to make the matching algorithm less stringent. If you want to know more about the different matching algorithms, there is a note on them at the bottom of that page. In my case, I chose "megablast", which is the "Highly similar sequences" option.

After you have chosen your matching algorithm, click BLAST. You will probably have to wait a few minutes to get your results, as I did.

+Result:
The results page is packed with data you can use to quantify how good your BLAST match was. At the top of the page I saw a color-coded display of the alignment scores between my submitted sequence and all the sequences found in the database. I think the higher the score, the better the match. In my case the hits were colored blue, which means a fairly low score, because the sequence Prof. Han gave us is just a very short primer.

When I scrolled down, I saw my top match. The hit told me the name of the species it aligned to, the name of the gene it aligned to (if applicable), the length of the gene, whether the sequence was derived from mRNA, the score of the match, the expect value of the match, the identities of the nucleotide match, the number of gaps, which strand the match occurred on, and then the actual alignment. Each hit also has a bit score, which links to the corresponding pairwise alignment between the query sequence and the hit sequence (also referred to as the subject sequence).
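The identity and gap figures listed for each hit can be reproduced from the two aligned strings. The sketch below is a minimal illustration, assuming the alignment is given as two equal-length strings with "-" marking a gap; the name `alignment_stats` is hypothetical and not part of BLAST, and the second example alignment is made up to show a gap.

```python
def alignment_stats(query_aln, subject_aln):
    """Count identities and gaps in a pairwise alignment (gaps written as '-')."""
    if len(query_aln) != len(subject_aln) or not query_aln:
        raise ValueError("aligned strings must be non-empty and of equal length")
    length = len(query_aln)
    pairs = list(zip(query_aln.upper(), subject_aln.upper()))
    # An identity is a column where both letters agree and neither is a gap.
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    # A gap column has a '-' in either the query or the subject.
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return {
        "identities": identities,
        "gaps": gaps,
        "length": length,
        "percent_identity": 100.0 * identities / length,
    }


if __name__ == "__main__":
    # The 20-base query aligned to an identical subject stretch.
    print(alignment_stats("ACCATTGGGGGAGGAGATGA", "ACCATTGGGGGAGGAGATGA"))
    # Made-up alignment with one gap in the query.
    print(alignment_stats("ACGT-A", "ACGTTA"))
```

For the 20-base query aligned to an identical stretch, this gives 20 identities over a length of 20 and no gaps, the same form as the "Identities" and "Gaps" fields in a BLAST hit.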
Another kind of score is the Expect value (E value): it describes how many hits with a score at least this good are expected to occur in the database by chance. The smaller the E value, the more significant the alignment.

> gb|JN946174.1| Mus musculus targeted KO-first, conditional ready, lacZ-tagged mutant allele Pfn4:tm2a(KOMP)Wtsi; transgenic
Length=38890

 Score = 40.1 bits (20), Expect = 0.066
 Identities = 20/20 (100%), Gaps = 0/20 (0%)
 Strand=Plus/Plus

Query  1      ACCATTGGGGGAGGAGATGA  20
              ||||||||||||||||||||
Sbjct  30454  ACCATTGGGGGAGGAGATGA  30473

> gb|JN945187.1| Mus musculus targeted non-conditional, lacZ-tagged mutant allele Tuba1c:tm1e(EUCOMM)Hmgu; transgenic
Length=38887

 Score = 40.1 bits (20), Expect = 0.066
 Identities = 20/20 (100%), Gaps = 0/20 (0%)
 Strand=Plus/Plus

Query  1      ACCATTGGGGGAGGAGATGA  20
              ||||||||||||||||||||
Sbjct  12682  ACCATTGGGGGAGGAGATGA  12701

After getting the results, we can focus our analysis on the human genes that have the highest scores and are most homologous to the sequence we believe is the "myg" gene in mouse.

b) In addition, we can use blastn, for nucleotide-nucleotide comparisons, to determine the differences between the mutant genes of the mutant mice and the original genes of the wild-type mice. Those differences can be used as gene markers.

REFERENCES:
Bioinformatics: Analyzing DNA Sequence using BLAST (PDF)
http://www.ncbi.nlm.nih.gov/blast/html/BLASThomehelp.html
http://www.doelz.com/biocompanion_96/compari9.html