MicroRNA target prediction tool
Introduction:
MicroRNAs (miRNAs) are short non-coding RNAs that serve as post-transcriptional regulators of gene
expression in plants and animals. They act by binding to complementary sites on target
mRNAs to induce cleavage or repression of productive translation.
Although their biological importance has become clear, how they recognize and regulate
target genes remains less well understood.
Goal:
Develop an accurate, fast and efficient tool for predicting microRNA binding sites in
different genomes, with a user-friendly and intuitive graphical interface.
The algorithm:
Our algorithm is based on a variation of the Smith-Waterman algorithm followed by
additional free-energy level filtering, provided by external folding tools.
Our program can handle multiple microRNA and mRNA sequences, allowing many-to-many queries; here we describe the processing of a single microRNA-mRNA query.
Step 1
The microRNA sequence contains a "seed": a short sequence at the 5' end of the microRNA
(typically nucleotides 2-8) that is considered the key part of the binding. A good pairing
between the seed and a sequence in the mRNA can indicate the existence of a binding site.
Therefore, a good strategy for predicting targets is to search for mRNA segments that
match the seed well.
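To make the seed idea concrete, here is a minimal Python sketch. It assumes the common convention that the seed spans nucleotides 2-8 from the microRNA's 5' end (the tool lets the user change these positions, see section 1.3.1) and it only looks for perfect Watson-Crick complementarity. The function names are hypothetical and are not part of the tool.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def extract_seed(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the seed: positions start..end (1-based, inclusive) from the 5' end."""
    if not 1 <= start <= end <= len(mirna):
        raise ValueError("seed positions out of range")
    return mirna[start - 1:end]


def seed_site(seed: str) -> str:
    """Reverse complement of the seed: the mRNA sequence (5'->3') that pairs with it perfectly."""
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def perfect_seed_matches(mrna: str, seed: str) -> list[int]:
    """0-based start positions of perfectly complementary seed sites in the mRNA."""
    site = seed_site(seed)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]


seed = extract_seed("UGAGGUAGUAGGUUGUAUAGUU")  # let-7a sequence
print(seed, seed_site(seed), perfect_seed_matches("AAACUACCUCAAA", seed))
# GAGGUAG CUACCUC [3]
```

Exact matching like this misses sites with a G:U wobble or a single mismatch, which is why the tool scores alignments instead of requiring perfect complementarity.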
Smith-Waterman (SW) is a dynamic-programming algorithm for sequence alignment. Here the
score is based on Watson-Crick complementarity between the seed and the mRNA, with
penalties for gaps and mismatches.
The standard SW algorithm performs a local alignment between the two sequences and
returns a single optimal alignment.
The SW variation we used requires the complete seed sequence to be aligned against
subsequences of the mRNA. Instead of a single optimum, it yields a number of plausible
target sites, which can be sorted by their score (a low score indicates that the sequences
are poorly aligned and therefore less likely to form a microRNA-mRNA binding site).
An adjustable cutoff on the alignment score lets the user trade off the number of results
against their stringency.
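The following Python sketch shows one way to implement this kind of seed-versus-subsequence alignment: a "fitting" (semi-global) dynamic-programming alignment in which the whole seed must be aligned, but it may start and end anywhere in the mRNA. It illustrates the idea under assumptions and is not the tool's actual code: the scoring values (+2 for a Watson-Crick pair, +1 for a G:U wobble, -1 for a mismatch, -2 per gap) and the function names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(mirna_base: str, mrna_base: str) -> int:
    """Hypothetical scores: +2 Watson-Crick, +1 G:U wobble, -1 mismatch."""
    if (mirna_base, mrna_base) in WATSON_CRICK:
        return 2
    if (mirna_base, mrna_base) in WOBBLE:
        return 1
    return -1


def seed_alignment_hits(seed: str, mrna: str, cutoff: int, gap: int = -2) -> list[tuple[int, int]]:
    """Align the whole seed against every mRNA subsequence.

    Returns (end, score) pairs with score >= cutoff, best first; `end` is the
    0-based, exclusive end position of the aligned mRNA subsequence.
    """
    q = seed[::-1]  # seed read 3'->5', so it runs antiparallel to the mRNA (5'->3')
    n = len(mrna)
    prev = [0] * (n + 1)  # row 0: the alignment may start anywhere in the mRNA for free
    for i in range(1, len(q) + 1):
        cur = [i * gap] + [0] * n  # column 0: leaving seed bases unpaired costs a gap each
        for j in range(1, n + 1):
            cur[j] = max(
                prev[j - 1] + pair_score(q[i - 1], mrna[j - 1]),  # pair the two bases
                prev[j] + gap,  # seed base unpaired (bulge in the miRNA)
                cur[j - 1] + gap,  # mRNA base unpaired (bulge in the mRNA)
            )
        prev = cur
    # prev is now the last row: every path here has aligned the complete seed.
    hits = [(j, prev[j]) for j in range(1, n + 1) if prev[j] >= cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(seed_alignment_hits("GAGGUAG", "AAACUACCUCAAA", cutoff=14))  # [(10, 14)]
```

Unlike textbook Smith-Waterman, there is no zero floor on the scores, which is what forces the complete seed to be aligned. Because every mRNA end position is scored, positions next to a strong site can also pass a low cutoff; a fuller implementation would usually merge such overlapping hits.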
Step 2
Recent studies have shown that alignment alone cannot accurately predict binding sites.
To eliminate false positives, we apply a set of filters to the results.
The first filter is based on the free energy of the secondary structure of the microRNA-mRNA
duplex. Lower energy values mean more stable duplexes and therefore more plausible results.
This task is performed by a modular external folding tool (such as Vienna RNA or RNAhybrid).
The second filter is user-defined: users may filter the results by parameters such as
pairing percent, number of G:U wobble pairs and bulge lengths.
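A minimal Python sketch of the Step 2 filtering, assuming the free energy of each duplex has already been computed by the external folding tool and stored on the hit. The `Hit` and `FilterSettings` classes, their field names and the default thresholds are hypothetical illustrations, not the tool's own settings.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str
    position: int
    energy: float  # duplex free energy (kcal/mol), assumed to come from the external folding tool
    pairing_percent: float
    gu_pairs: int
    max_bulge_mirna: int
    max_bulge_mrna: int


@dataclass
class FilterSettings:
    # Hypothetical default thresholds, for illustration only.
    energy_cutoff: float = -20.0
    min_pairing_percent: float = 70.0
    max_gu_pairs: int = 2
    max_bulge_mirna: int = 3
    max_bulge_mrna: int = 3


def passes_filters(hit: Hit, s: FilterSettings) -> bool:
    """Keep a hit only if it is stable enough and satisfies every user-defined limit."""
    return (
        hit.energy <= s.energy_cutoff  # lower (more negative) energy = more stable duplex
        and hit.pairing_percent >= s.min_pairing_percent
        and hit.gu_pairs <= s.max_gu_pairs
        and hit.max_bulge_mirna <= s.max_bulge_mirna
        and hit.max_bulge_mrna <= s.max_bulge_mrna
    )


def filter_hits(hits: list[Hit], settings: FilterSettings) -> list[Hit]:
    return [hit for hit in hits if passes_filters(hit, settings)]


hits = [
    Hit("geneA", 120, -24.3, 82.0, 1, 0, 2),
    Hit("geneB", 45, -12.1, 90.0, 0, 0, 0),  # duplex not stable enough
]
print([hit.gene for hit in filter_hits(hits, FilterSettings())])  # ['geneA']
```

Applying the energy filter first is a natural choice when folding is the expensive step, but in this sketch the order of the checks does not change which hits survive.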
User Guide:
1. The Input tab
1 - Gene Selection tool. Used to input mRNA sequences
2 - microRNA Selection tool. Used to input microRNA sequences
3 - "Begin" button. Used to start the target prediction process
4 - "Clear Input" button. Clears all query fields
5 - "Options" button. Used to configure prediction parameters
1.1. Input of mRNA sequences
There are three ways to insert mRNA sequences for the query:
1.1.1. Database sequences
Choose an organism from the list.
A list of genomes will appear. Choose a genome to view the genes it contains, and check
the genes you want to include in the query.
Note: You can check a genome to query its complete gene list, or check the "All" box to
query every genome in the list.
Choose a region in the gene to query: CDS (coding region), 5'UTR or 3'UTR. This applies to
all selected genes.
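As a sketch of how a region can be cut out of a transcript: assuming the CDS boundaries are known as 1-based inclusive coordinates (as in GenBank feature tables), the three regions follow directly from the CDS start and end. The function name and region labels are hypothetical.

```python
def gene_region(mrna: str, cds_start: int, cds_end: int, region: str) -> str:
    """Return the 5'UTR, CDS or 3'UTR of a transcript.

    cds_start and cds_end are 1-based and inclusive, as in GenBank feature tables.
    """
    if not 1 <= cds_start <= cds_end <= len(mrna):
        raise ValueError("CDS coordinates out of range")
    if region == "5'UTR":
        return mrna[:cds_start - 1]  # everything before the start codon
    if region == "CDS":
        return mrna[cds_start - 1:cds_end]
    if region == "3'UTR":
        return mrna[cds_end:]  # everything after the stop codon
    raise ValueError(f"unknown region: {region!r}")


transcript = "GGGAUGAAAUAACCC"  # toy transcript, CDS = AUG AAA UAA at positions 4..12
for region in ("5'UTR", "CDS", "3'UTR"):
    print(region, gene_region(transcript, 4, 12, region))
```

Because the chosen region applies to all selected genes, the same call would simply be repeated for each gene with its own CDS coordinates.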
1.1.2. Manual Input
Choose "manual input" and check the "enter a sequence in FASTA format" option.
Enter the desired sequence(s) in FASTA format, as seen below.
1.1.3. FASTA File
Choose "manual input", check the "choose a file to upload" option, and browse for the
desired FASTA file.
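For reference, a FASTA record is a header line starting with ">" followed by one or more sequence lines. The minimal Python parser below reads such text and illustrates the format accepted by both manual input and file upload; converting T to U (so DNA-style input becomes RNA) is an assumption for illustration, and the function is not part of the tool.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, in input order."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Assumption: upper-case the bases and write T as U (RNA alphabet).
            chunks.append(line.upper().replace("T", "U"))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>hsa-let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>toy_gene
acgtacgt
ACGT
"""
print(parse_fasta(example))
```

To read a FASTA file instead, pass the file's contents: `with open(path) as f: records = parse_fasta(f.read())`, where `path` is the file's location.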
1.2. Input of microRNA sequences
Again, there are three ways to insert miRNA sequences for the query, as explained in the
gene section.
1.3. Setting the query parameters
Click the "Options" button to open the options dialog box, where you can adjust the
query parameters.
1.3.1. Seed from & to
Defines the start and end positions of the seed.
1.3.2. Max number of GU pairs
1.3.3. mRNA folding size
Defines the length of the mRNA sequence that is folded with the microRNA after a seed
match is found.
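1.3.4. Energy cutoff
The free-energy threshold for the folded duplex. Since lower energy means a more stable
duplex, hits with a higher energy than the cutoff are discarded.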
1.3.5. Max bulge in miRNA
1.3.6. Max bulge in mRNA
1.3.7. Pairing percent
1.3.8. Find similar results
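The seed positions and the mRNA folding size can be pictured with the short Python sketch below. It assumes the folding window ends at the 3' end of the seed site and extends upstream (toward the mRNA's 5' end), because the microRNA's 3' portion pairs upstream of the seed site; where the real tool places the window is not stated in this guide. The `QueryOptions` class, its defaults and `folding_window` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryOptions:
    # Hypothetical defaults; the tool's own defaults are not given in this guide.
    seed_from: int = 2
    seed_to: int = 8
    folding_size: int = 50

    def __post_init__(self) -> None:
        if not 1 <= self.seed_from <= self.seed_to:
            raise ValueError("seed_from must be >= 1 and <= seed_to")
        if self.folding_size < self.seed_to - self.seed_from + 1:
            raise ValueError("folding_size must be at least the seed length")


def folding_window(mrna: str, site_end: int, folding_size: int) -> tuple[int, str]:
    """Return (start, subsequence) of up to folding_size bases ending at site_end.

    site_end is the 0-based, exclusive end of the seed site; the window is clipped
    at the start of the mRNA.
    """
    start = max(0, site_end - folding_size)
    return start, mrna[start:site_end]


opts = QueryOptions(folding_size=12)
print(folding_window("A" * 10 + "CUACCUC" + "GGGGG", site_end=17, folding_size=opts.folding_size))
# (5, 'AAAAACUACCUC')
```

The returned subsequence is what would be handed, together with the microRNA, to the external folding tool; a larger folding size gives the microRNA's 3' end more room to pair at the cost of slower folding.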
1.4. Starting the search
Once both the gene and microRNA inputs are set, the "Begin Search" button becomes active.
Clicking it starts the search and opens the processing tab.
Note: The "Clear Input" button can be pressed at any time to clear all input.
1.5. Adding miRNAs/genes to the database
New data can be added to the database by choosing the "import to DB" option.
Note: Genes can be imported only in the .gbwithparts file format, which can be
downloaded from the NCBI database.
miRNAs can be imported only in FASTA format.
2. The processing tab
This tab shows a progress bar indicating the progress of the search.
The "Stop" button stops the search process and reopens the input tab.
When the search process is complete, the two results tabs open.
3. The Results by microRNA tab
This tab displays the results of the query, listed by miRNA.
In order to view the results, choose one of the queried miRNAs from the list on the left.
A table will appear on the main screen, displaying the binding sites (or "hits") found for
the selected miRNA.
3.1. The results table
The table columns provide the following information about each hit:
the name of the gene containing the binding site, the alignment score of the seed,
the position of the site on the gene, the duplex free energy, and a scheme
illustrating the binding itself.
The table can be sorted by any of these columns by clicking the column name.
Double-clicking the arrow on the left of a row opens the chosen gene in the
"Results by Genes" tab.
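The binding scheme and the column sorting can be sketched in Python as follows, assuming an ungapped duplex in which the mRNA site (written 5'->3') pairs antiparallel with the microRNA. Here "|" marks a Watson-Crick pair, ":" a G:U wobble and a space a mismatch; this notation and all names are hypothetical and not necessarily the tool's exact display.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def binding_scheme(mirna_site: str, mrna_site: str) -> str:
    """Draw an ungapped duplex: mRNA 5'->3' on top, miRNA 3'->5' underneath."""
    if len(mirna_site) != len(mrna_site):
        raise ValueError("this sketch only handles ungapped duplexes of equal length")
    mirna_3to5 = mirna_site[::-1]
    marks = "".join(
        "|" if (t, m) in WATSON_CRICK else ":" if (t, m) in WOBBLE else " "
        for t, m in zip(mrna_site, mirna_3to5)
    )
    top = f"mRNA  5' {mrna_site} 3'"
    pad = " " * len("mRNA  5' ")  # aligns the pairing marks under the sequences
    return f"{top}\n{pad}{marks}\nmiRNA 3' {mirna_3to5} 5'"


def sort_rows(rows: list[dict], column: str, descending: bool = False) -> list[dict]:
    """Sort result rows by one column, as when clicking a column name."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


print(binding_scheme("GAGGUAG", "CUACCUU"))
rows = [{"gene": "geneA", "energy": -18.0}, {"gene": "geneB", "energy": -25.5}]
print([row["gene"] for row in sort_rows(rows, "energy")])  # ['geneB', 'geneA']
```

Sorting by energy in ascending order lists the most stable duplexes first, while sorting by alignment score would normally use `descending=True`.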
3.2. The microRNA options dialog box
Open the options dialog box from the menu bar (Tools -> Options).
A new window will pop up:
3.2.1. Sort miRNA list by
Choose an ordering parameter for the miRNA list.
3.2.2. Hide Micros that have less than X total hits
miRNAs with fewer total hits across all genes than the specified value are removed
from the list.
3.2.3. Hide Micros that hit less than X genes
miRNAs that target fewer genes than the specified value are removed from the list.
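Both "Hide Micros" options amount to counting, for each miRNA, its total number of hits and the number of distinct genes it hits. A minimal Python sketch, assuming each hit is represented as a (miRNA, gene) pair; the function name is hypothetical.

```python
from collections import defaultdict


def visible_mirnas(hits, min_total_hits: int, min_genes: int) -> list[str]:
    """hits: iterable of (mirna, gene) pairs, one pair per predicted binding site.

    A miRNA stays visible only if it has at least min_total_hits hits in total
    and hits at least min_genes distinct genes.
    """
    total_hits: dict[str, int] = defaultdict(int)
    genes_hit: dict[str, set[str]] = defaultdict(set)
    for mirna, gene in hits:
        total_hits[mirna] += 1
        genes_hit[mirna].add(gene)
    return sorted(
        m for m in total_hits
        if total_hits[m] >= min_total_hits and len(genes_hit[m]) >= min_genes
    )


hits = [("let-7a", "geneA"), ("let-7a", "geneA"), ("let-7a", "geneB"), ("miR-21", "geneA")]
print(visible_mirnas(hits, min_total_hits=2, min_genes=1))  # ['let-7a']
```

Two hits on the same gene count twice toward the total but only once toward the number of genes, which is why the two options can hide different miRNAs.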
4. The Results by Genes tab
This tab displays the results of the query, listed by the queried genes.
In order to view the results, choose one of the queried genes from the list on the left.
A scheme will appear on the main screen, displaying a representation of the selected
gene with the predicted hits found.
Additional information about the gene is displayed at the bottom of the page.
4.1. The gene scheme
The bar represents the entire selected gene; the numbers along it mark base positions.
miRNA hits are represented by short lines beneath the gene bar, and hits of the same
miRNA share the same line color.
Hovering over a miRNA hit line will display information about the hit (as shown in
the picture).
Right-clicking a hit marks the other hits of the same miRNA on the gene.
Double-clicking a hit jumps to that miRNA in the "Results by miRNA"
table.
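A text-only Python sketch of the gene scheme: hit positions are scaled onto a bar of fixed width, and each miRNA gets its own marker row, standing in for the per-miRNA line color of the real display. The rendering, the use of 1-based base positions and the function names are assumptions for illustration.

```python
def to_bar_column(position: int, gene_length: int, bar_width: int) -> int:
    """Map a 1-based base position to a 0-based column on a bar bar_width characters wide."""
    return min(bar_width - 1, (position - 1) * bar_width // gene_length)


def render_gene_scheme(gene_length: int, hits: list[tuple[str, int]], bar_width: int = 50) -> str:
    """Text version of the gene scheme: the gene bar, then one marker row per miRNA."""
    rows: dict[str, list[str]] = {}  # insertion-ordered: the first-seen miRNA is drawn first
    for mirna, position in hits:
        row = rows.setdefault(mirna, [" "] * bar_width)
        row[to_bar_column(position, gene_length, bar_width)] = "^"
    lines = ["=" * bar_width]
    lines += ["".join(row) + "  " + mirna for mirna, row in rows.items()]
    return "\n".join(lines)


print(render_gene_scheme(100, [("let-7a", 1), ("miR-21", 100), ("let-7a", 51)], bar_width=10))
```

Keeping one row per miRNA also makes the right-click behavior easy to picture: marking "other hits of the same miRNA" amounts to highlighting a single row.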
4.2. The gene options dialog box
Open the options dialog box from the menu bar (Tools -> Options).
A new window will pop up:
4.2.1. Sort genes list by
Choose an ordering parameter for the genes list.
4.2.2. Hide genes that have less than X total hits
Genes with fewer total hits than the specified value are removed from the list.
4.2.3. Hide Micros that hit less than X genes
Genes targeted by fewer miRNAs than the specified value are removed from the list.
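On the gene side, the same kind of filtering counts hits per gene and distinct miRNAs per gene. A minimal Python sketch, assuming hits are (miRNA, gene) pairs; ordering the result by total hits is one hypothetical choice of sort parameter, and the function name is also hypothetical.

```python
from collections import Counter


def visible_genes(hits, min_total_hits: int, min_mirnas: int) -> list[str]:
    """hits: iterable of (mirna, gene) pairs, one pair per predicted binding site.

    Returns the genes that pass both limits, ordered by total hits (most first),
    with ties broken by gene name.
    """
    hits = list(hits)
    total_hits = Counter(gene for _, gene in hits)
    # Each distinct (mirna, gene) pair counts once, giving distinct miRNAs per gene.
    distinct_mirnas = Counter(gene for _, gene in set(hits))
    keep = [
        g for g in total_hits
        if total_hits[g] >= min_total_hits and distinct_mirnas[g] >= min_mirnas
    ]
    return sorted(keep, key=lambda g: (-total_hits[g], g))


hits = [("let-7a", "geneA"), ("let-7a", "geneA"), ("miR-21", "geneA"), ("let-7a", "geneB")]
print(visible_genes(hits, min_total_hits=1, min_mirnas=2))  # ['geneA']
```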