Genomics_Tutorial

advertisement
Tutorial: Finding the gene structure of orthologs
When blasting an incomplete or in-progress genome sequence, it can be difficult to
find hits that are the length of the full gene ortholog you are looking for. Often, you
will find just one small hit in an entire database.
In order to analyze the hits you get, it’s important to understand what part of the
gene’s structure they come from. Once you know which part of the primary
structure your hit corresponds to, you can begin to study the secondary structure,
function and regulation of this gene.
In this tutorial, I will explain how you can determine which exon of a gene your hit
corresponds to, as well as how to get the most gene structure information out of the
sequence data you have.
The basic steps to determining which exons you have are as follows:
- Download an ortholog sequence from a genome browser
- Blast this against your database
- Copy the QUERY sequence you used to get this hit
- Find the detailed, full sequence of the otholog gene you used as your query
within the genome browser. Make sure this page indicates which sequences
refer to which exon
- Search for a portion of your query sequence on the page which contains the
full ortholog sequence
- Because you took this query sequence straight from the genome browser,
you should have an exact match. Determine which exon this match falls
under. You now know which exon you have in your sequence data!
I will begin with an example of a hit I found when blasting grape’s Nhx1 gene against
the blueberry EST data.
I first blasted a file from NCB1 listed as the complete cdna of the grape Nhx1. This
gave me one large (~1200 bp) hit against the blueberry EST database.
In order to figure out which part of the Nhx1 gene I had found, I went into the grape
genome browser. I found six candidate genes for Nhx1 in the browser and blasted
all six of these against the 1200 bp hit mentioned above. It’s important to cover all
your bases and check every candidate gene, because you don’t know which copy of
the gene you might have in your sequence.
The first candidate gene listed gave me one hit when blasted against the 1200 bp’s
from the EST database:
I copied the first segment of the query sequence:
CAGGCATGCTTTT
Then, I went into the grape genome browser and looked at the detailed description
of this first candidate gene I had obtained my query sequence from.
The page gives you the transcript structure
As well as the full sequence, formatted to distinguish different structural elements of
the gene:
Because my query sequence (CAGGCATGCTTTT) came directly from this genome
browser, it should have an exact match on this page. Thus, I can utilize the find
function of my web browser, and simply search the page for my query sequence.
When I did this I found one match:
This segment corresponds to exon 10. Thus, we know that we have the equivalent
of grape’s exon 10 within our 1200 bp EST hit.
Expanding on your data:
I found exon 10 by investigating just one hit, received by blasting the Nhx1 grape
gene against the EST database. It is possible to expand your hits, potentially
uncovering more exons within your sequence data. One way to do that is to simply
blast against a different database. For example, when I blast the same Nhx1 gene
against the 454 data, I get a different hit than the one I get against the EST data.
Another option is to use a different query in your blast. You can change sequences
or perhaps blast an amino acid sequence using a tBLASTn query.
Each hit you get gives you another opportunity to go into a genome browser and
determine which exon your hit aligns with. Slowly, you can begin to piece together
exons and gain a sense of the overall structure of your ortholog.
Download