RETRIEVING INFORMATION FROM THE BLAST SEARCH RESULTS

NAME ____________________________________________
GENETICS (BIO 306)
Supply the following information about your fly genes. You may submit a separate document for
each gene, or include both in one document.
1. Write the name of your fly gene.
2. In your own words, describe the putative normal function of the gene (i.e., of the wild-type
allele). Include a properly formatted reference for the journal article that describes this
function (see links on the assignment website for proper formatting guidelines).
3. Conduct a nucleotide BLAST search using your wild-type gene sequence. Exclude Drosophila
(taxid:7215). [Note: If your initial search returns the error message “No significant similarity
found,” repeat the search using discontiguous megablast rather than megablast as your search
algorithm. Some genes (e.g., white) require that you increase the number of results (try 500) in
order to get a useful result.] Write the scientific and common names of the top non-Drosophila
hit obtained from the BLAST search. [Note: Sometimes the common name will not be
readily available. If not, click on the link from the list of hits, and once the locus information
comes up, click on the link next to the word "ORGANISM". This will take you to the taxonomy of
the critter, where the common name will be listed.]
4. Write the query cover %, E value, and % identity for the top non-Drosophila hit.
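[Optional: If it helps to see what these three numbers mean, the sketch below computes them from textbook definitions. It is an illustration only, not NCBI's implementation: the function names are made up for this example, and BLAST itself uses corrected "effective" sequence lengths, so the values on your results page may differ somewhat.]

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns (gap columns included) where both residues match."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_cover(query_length: int, aligned_ranges: list[tuple[int, int]]) -> float:
    """Percent of query positions covered by at least one aligned segment.

    Ranges are 1-based and inclusive; overlapping segments are counted once.
    """
    covered = set()
    for start, end in aligned_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / query_length


def expected_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E value from a bit score S': E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)
```

For example, two aligned segments covering query positions 1-40 and 31-60 of a 100-nucleotide query give a query cover of 60%, because the overlapping positions are counted only once.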
5. Using the Graphic Summary, briefly describe the pattern of overlap between the query
sequence and the top non-Drosophila hit. Does the entire hit show strong similarity to the query
sequence? If not, which parts of the hit do (provide approximate ranges of nucleotides)?
6. Run a second BLAST search. In the “Choose Search Set” window, under “Organism,” include
the following genetic model organisms (you can just copy-paste the information in the left
column below to the Organism window, using the “+” button to add lines, one per organism):
Escherichia coli (taxid:562)             [bacterium]
Saccharomyces cerevisiae (taxid:4932)    [yeast]
Caenorhabditis elegans (taxid:6239)      [nematode]
Arabidopsis thaliana (taxid:3702)        [rock cress (plant)]
Danio rerio (taxid:7955)                 [zebrafish]
Mus musculus (taxid:10090)               [mouse]
Homo sapiens (taxid:9606)                [human]
[Note: You may need to use the blastn algorithm rather than megablast or discontiguous
megablast in order to obtain any hits.]
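[Optional: The Organism boxes on the web form are all you need for this assignment. If you ever script a search instead, the same restriction is usually written as a single Entrez-style query string. The sketch below builds that string from the taxids listed above; the function name is made up for this example, and where you paste the result (for instance, BLAST's optional "Entrez Query" field) is an assumption you should check against the current NCBI help pages.]

```python
# Taxonomy IDs copied from the organism list in this worksheet.
MODEL_ORGANISMS = {
    "Escherichia coli": 562,
    "Saccharomyces cerevisiae": 4932,
    "Caenorhabditis elegans": 6239,
    "Arabidopsis thaliana": 3702,
    "Danio rerio": 7955,
    "Mus musculus": 10090,
    "Homo sapiens": 9606,
}


def organism_query(taxids) -> str:
    """Join taxonomy IDs into one 'txidN[Organism] OR ...' query string."""
    return " OR ".join(f"txid{taxid}[Organism]" for taxid in taxids)


if __name__ == "__main__":
    print(organism_query(MODEL_ORGANISMS.values()))
```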
In the table below, provide the percent similarity, percent coverage, and E-value for each model
organism. What organism yields the top hit to your Drosophila gene? Is this result what you
expected? Why or why not?
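Organism                    Percent coverage    Percent similarity    E-value
Escherichia coli            ________________    __________________    ________
Saccharomyces cerevisiae    ________________    __________________    ________
Caenorhabditis elegans      ________________    __________________    ________
Arabidopsis thaliana        ________________    __________________    ________
Danio rerio                 ________________    __________________    ________
Mus musculus                ________________    __________________    ________
Homo sapiens                ________________    __________________    ________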
What is the identity (title/description) of the top-hit gene? Does the top hit appear to have a
biologically similar function to your Drosophila gene? Why or why not?
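[Optional: If you download your hits as a tab-separated table, a short script can pick out the top hit for you. The sketch below assumes the standard 12-column tabular BLAST layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The sample rows and the IDs subjA, subjB, and subjC are invented placeholders, not real results, and the function names are made up for this example.]

```python
import csv
import io

# Column order of the standard 12-column tabular BLAST output.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")

# Invented placeholder rows, for illustration only.
SAMPLE = (
    "query1\tsubjA\t78.50\t400\t80\t6\t1\t400\t10\t409\t2e-60\t230\n"
    "query1\tsubjB\t85.00\t600\t85\t5\t1\t600\t20\t619\t1e-120\t440\n"
    "query1\tsubjC\t70.00\t150\t40\t5\t200\t349\t5\t154\t3e-10\t60\n"
)


def parse_hits(text: str) -> list[dict]:
    """Turn tab-separated hit lines into dictionaries with numeric fields."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for field in INT_FIELDS:
            hit[field] = int(hit[field])
        for field in FLOAT_FIELDS:
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


def top_hit(hits: list[dict]) -> dict:
    """Return the hit with the lowest E-value (ties broken by higher bit score)."""
    return min(hits, key=lambda h: (h["evalue"], -h["bitscore"]))


if __name__ == "__main__":
    best = top_hit(parse_hits(SAMPLE))
    print(best["sseqid"], best["pident"], best["evalue"])
```

Note that this table does not include the organism name or the gene description; you still need to look those up from the hit's record, as described in step 3.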
7. Now, obtain the amino acid translation of your gene sequence from the GenBank record for
your sequence, and rerun the BLAST search with the same model organisms as above, but this
time use a protein BLAST (blastp) search. Do the results (e.g., top hit organism, % coverage, %
identity, and E-value) differ significantly from the results of the nucleotide BLAST search? Does
your conclusion change about whether the top hit appears to have a biologically similar function
to your Drosophila gene? Why do you think this might be the case?
Submit your answers to the appropriate dropbox as a Word file (.doc or .docx).