MicroRNA target prediction tool

Introduction:
MicroRNAs (miRNAs) are short non-coding RNAs that serve as post-transcriptional regulators of gene expression in plants and animals. They act by binding to complementary sites on target mRNAs, inducing cleavage or repressing productive translation. Although their biological importance has become clear, how they recognize and regulate target genes remains less well understood.

Target:
Develop an accurate, fast and efficient tool for predicting microRNA binding sites on different genomes, with a user-friendly and intuitive graphical interface.

The algorithm:
Our algorithm is based on a variation of the Smith-Waterman (SW) algorithm, followed by additional free-energy filtering provided by external folding tools. The program can handle multiple microRNA and mRNA sequences, allowing many-to-many queries; here we examine the processing of a single microRNA-mRNA query.

Step 1
The microRNA sequence contains a "seed": a sequence located at the 5' end of the microRNA that is considered the key part of the binding. A good pairing between the seed and a sequence in the mRNA can indicate the existence of a binding site. A good tactic for predicting targets is therefore to search for mRNA segments that match the seed well.
Smith-Waterman is a dynamic-programming algorithm used for scoring sequence alignments. Here the score is based on Watson-Crick pairing, with penalties for gaps and mismatches. The original SW finds the best-scoring local alignment between any subsequence of one input and any subsequence of the other, and returns that single optimal alignment. The SW variation we use constrains the alignment so that the complete seed sequence is aligned against subsequences of the mRNA.
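The constrained alignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name `seed_sites`, the scoring values and the linear gap penalty are all assumptions made for the example. The whole seed must be consumed, while the alignment may start and end anywhere on the mRNA.

```python
# Assumed scoring values; the tool's real parameters are not given in this guide.
MATCH, MISMATCH, GAP = 2, -1, -2
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def seed_sites(seed, mrna, cutoff):
    """Align the complete seed against every mRNA subsequence.

    Returns (score, end) pairs, best first, where `end` is the 0-based
    index just past the last mRNA base of the candidate site.
    """
    # The duplex is antiparallel: read the seed 3'->5' while walking the mRNA 5'->3'.
    probe = seed[::-1]
    m = len(mrna)
    # Row 0 is all zeros, so an alignment may start at any mRNA position for free.
    prev = [0] * (m + 1)
    for i in range(1, len(probe) + 1):
        # Column 0: the first i seed bases are aligned against nothing (all gaps).
        curr = [i * GAP] + [0] * m
        for j in range(1, m + 1):
            paired = (probe[i - 1], mrna[j - 1]) in WATSON_CRICK
            curr[j] = max(
                prev[j - 1] + (MATCH if paired else MISMATCH),  # bases face each other
                prev[j] + GAP,                                  # seed base left unpaired
                curr[j - 1] + GAP,                              # mRNA base left unpaired
            )
        prev = curr
    # The last row holds the score of the full seed ending at each mRNA position.
    hits = [(score, end) for end, score in enumerate(prev) if score >= cutoff]
    return sorted(hits, reverse=True)

print(seed_sites("GAGGUAG", "AAAACUACCUCAAAA", cutoff=14))  # [(14, 11)]
```

In this example the seed pairs perfectly with the segment CUACCUC, so the site ending at index 11 receives the maximum score of 14. Lowering `cutoff` returns more, weaker candidates.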
Aligning the seed this way yields a number of plausible local targets, which can be sorted by their score (a low score indicates that the sequences are poorly aligned and are therefore less likely to form a microRNA-mRNA binding site). Setting an adjustable cutoff value for the alignment score lets us trade off between the quantity and the stringency of the results.

Step 2
Recent studies have shown that alignment alone cannot accurately predict binding sites. To eliminate false positives, we apply a set of filters to the results.
The first filter is based on the free energy of the secondary structure of the microRNA-mRNA duplex. Lower energy values mean more stable duplexes and therefore more plausible results. This task is performed by a modular external folding tool (such as the Vienna RNA package or RNAhybrid).
The second filter is based on user input. Users may choose to filter results by a number of parameters, such as pairing percent, number of GU wobble pairs and bulge lengths.

User Guide:

1. The Input tab
1 - Gene Selection tool. Used to input mRNA sequences.
2 - microRNA Selection tool. Used to input microRNA sequences.
3 - "Begin" button. Starts the target prediction process.
4 - "Clear Input" button. Clears all query fields.
5 - "Options" button. Used to configure the prediction parameters.

1.1. Input of mRNA sequences
There are three ways to insert mRNA sequences for the query:

1.1.1. Database sequences
Choose an organism from the list; a list of genomes will appear. Choose a genome to view the genes it contains, and mark the desired genes for the query.
Note: You can check a genome to query its complete gene list, or check the "All" box to query the complete genome list.
Choose a region of the gene to query: CDS (coding region), 5'UTR or 3'UTR. This applies to all selected genes.

1.1.2. Manual Input
Choose "manual input" and check "enter a sequence in FASTA format".
Enter the desired sequence(s) in FASTA format, as seen below.

1.1.3. FASTA File
Choose "manual input", check "choose a file to upload", and browse for the desired FASTA file.

1.2. Input of microRNA sequences
Again, there are three ways to insert miRNA sequences for the query, as explained in the gene section.

1.3. Setting the query parameters
Click the "Options" button; the options dialog box will appear. Here you can adjust the query parameters.
1.3.1. Seed from & to: defines the start/end points of the seed.
1.3.2. Max number of GU pairs
1.3.3. mRNA folding size: defines the length of the gene sequence to be folded after a seed match is found.
1.3.4. Energy cutoff: refers to the free energy value of the folded duplex.
1.3.5. Max bulge in miRNA
1.3.6. Max bulge in mRNA
1.3.7. Pairing percent
1.3.8. Find similar results

1.4. Starting the search
Setting the gene and microRNA inputs activates the "Begin Search" button. Clicking it initiates the search and opens the processing tab.
Note: The "Clear Input" button can be pressed at any time to clear all input.

1.5. Adding miRNA/Genes to the Database
New data can be added to the database by choosing the "import to DB" option.
Note: Genes can be imported only in the .gbwithparts file format, which can be downloaded from the NCBI database. miRNAs can be imported only in FASTA format.

2. The processing tab
This tab shows a progress bar indicating the progress of the search. The "Stop" button stops the search process and re-opens the input tab. After the search process is completed, the two results tabs open.

3. The Result by microRNA tab
This tab displays the results of the query, listed by miRNA. To view the results, choose one of the queried miRNAs from the list on the left. A table will appear on the main screen, displaying the binding sites (or "hits") found for the selected miRNA.

3.1. The results table
The table columns provide information about each hit: the name of the gene containing the binding site, the alignment score of the seed, the location of the site on the gene (by index), the duplex free energy value, and a scheme illustrating the binding itself. The table can be sorted by any of these columns by clicking on the column name. Double clicking on the arrow on the left opens the view of the chosen gene on the "Results by Genes" tab.

3.2. The microRNA options dialog box
Open the options dialog box from the upper bar (tools->options). A new window will pop up:
3.2.1. Sort mRNA's list by: choose an ordering parameter for the mRNA list.
3.2.2. Hide Micros that have less than X total hits: miRNAs with fewer total hits across all genes than the specified value are removed from the list.
3.2.3. Hide Micros that hit less than X genes: miRNAs that target fewer genes than the specified value are removed from the list.

4. The Result by Genes tab
This tab displays the results of the query, listed by the queried genes. To view the results, choose one of the queried genes from the list on the left. A scheme will appear on the main screen, displaying a representation of the selected gene with the predicted hits found. Additional information about the gene is displayed at the bottom of the page.

4.1. The gene scheme
The bar represents the entire selected gene, and the numbers on it indicate base positions. miRNA hits are represented by short lines underneath the gene bar; hits of the same miRNA share the same line color. Hovering over a miRNA hit line displays information about the hit (as shown in the picture). Right clicking a hit marks the other hits of the same miRNA on the gene. Double clicking a hit redirects to the selected miRNA in the "Results by miRNA" table.

4.2. The microRNA options dialog box
Open the options dialog box from the upper bar (tools->options). A new window will pop up:
4.2.1. Sort genes list by: choose an ordering parameter for the genes list.
4.2.2. Hide genes that have less than X total hits: genes with fewer total hits than the specified value are removed from the list.
4.2.3. Hide Micros that hit less than X genes: genes targeted by fewer miRNAs than the specified value are removed from the list.
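
The two hide options for the genes list amount to counting hits per gene and comparing the counts with the chosen thresholds. The sketch below illustrates that logic only. It assumes each hit is stored as a (gene, miRNA) pair, and the names `visible_genes`, `min_total_hits` and `min_mirnas` are hypothetical, not taken from the tool.

```python
from collections import defaultdict

def visible_genes(hits, min_total_hits=0, min_mirnas=0):
    """Return the genes that stay in the list after both hide options.

    hits: iterable of (gene, mirna) pairs, one pair per predicted binding site.
    """
    total_hits = defaultdict(int)   # gene -> number of binding sites
    mirnas = defaultdict(set)       # gene -> distinct miRNAs that hit it
    for gene, mirna in hits:
        total_hits[gene] += 1
        mirnas[gene].add(mirna)
    return sorted(
        gene for gene in total_hits
        if total_hits[gene] >= min_total_hits and len(mirnas[gene]) >= min_mirnas
    )

# Made-up example data: geneA has three hits from two miRNAs, geneB has one hit.
hits = [("geneA", "miR-1"), ("geneA", "miR-1"), ("geneA", "miR-2"), ("geneB", "miR-1")]
print(visible_genes(hits, min_total_hits=2))  # ['geneA']
```

The same counting, keyed by miRNA instead of by gene, would cover the corresponding options on the results-by-microRNA tab.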