BCB 444/544X Lab 8
RNA Secondary Structure Prediction
Due: 10/22/2007 by 5pm
Email to: terrible@iastate.edu

Objectives
1. Learn about the resources available for RNA secondary structure prediction
2. Practice using RNA secondary structure prediction software
3. Be able to compare the results of RNA secondary structure predictions

Introduction
Most models of molecular function, and most experimental observations, make more sense if we know the structures of the molecules involved. For RNA, it is often important to know whether the bases we have determined to be crucial for function lie in a helical region, a loop, or a bulge. An accurate secondary structure prediction for an RNA can aid in designing and interpreting experiments and in developing functional models.

Exercises
Required questions are in red. The first exercise is taken from Baxevanis and Ouellette's Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins.

Part I.
To demonstrate the utility of color annotation on the mfold server, predict the secondary structure of the Drosophila sucinea R2 3' UTR, as shown here:

Figure 6.2 from Baxevanis and Ouellette

R2 elements are a class of retrotransposons found in most arthropods (Eickbush, 2002). During retrotransposition, the 3' UTR of the messenger RNA is specifically recognized by the reverse transcriptase during target-primed reverse transcription (Luan & Eickbush, 1995; Luan et al., 1993). The secondary structure of the 3' UTR was predicted for Drosophila by comparative sequence analysis of 10 sequences (Mathews et al., 1997). The sequence of the R2 element from D. sucinea, which can adopt the comparative analysis structure, was determined later (Lathe & Eickbush, 1997).
This sequence was chosen for this example because its secondary structure is known and because free energy minimization predicts that structure less accurately than average, which makes it a good demonstration of the usefulness of color annotation (Zuker & Jacobson, 1995; Zuker & Jacobson, 1998).

Here is the R2 3' UTR sequence:

UGAUCUCUGUAUUUGUUUCUAUUUUGAACAUUUGCCUGCUACCUUGGCAUA
ACAUCAAUAAGGUACAAACAUCGCAAAAAGUCAUCAUAAGGUGGGUUUUAG
UACGUAGGCGCUGUAGAACUUAAUUGUUCUGAUAGAGCAGCGAGUCGUGCA
UGCUAGUCUAGCAUUUCUUGCUACCUAGUAUCUUUAGAAGAUUUCCCUCCCU
UAGCGGUCAAA

Access the mfold Web server and paste the sucinea R2 element sequence into the large input sequence field. Scroll to the bottom of the Web page, to the section marked "Choose structure annotation." Select the button after "p-num" to choose a color annotation that reflects how well determined the base pairs are. Keep the default settings for all other fields. Note, however, that each user-definable setting links to a help page that explains it. Click the "Fold RNA" button at the bottom of the form.

This sequence is short enough to run as the default immediate job, so the Web browser will move quickly to the results page. The results remain available on the server for 24 hours. Note that the energy dot plot can be viewed by following a hyperlink at the top of the page, and that a zip or tar file containing all the predicted structures can be downloaded. On the results page, view the first individual structure by clicking jpg under Structure 1.

1. In the color coding scheme, which color means that a base pair has the highest probability? Which color corresponds to the lowest probability?

Go to the RNAfold server and paste the sucinea R2 element sequence into the input box. Scroll to the bottom and click "Fold it" to generate the prediction.

2. Are there similarities between the structures predicted by mfold and RNAfold?
3. How do the predicted structures compare to the structure shown above?

References cited in this section:
Eickbush, TH (2002). In Mobile DNA II (Craig, NL, Craigie, R, Gellert, M, and Lambowitz, AM, eds).
Lathe, WC and Eickbush, TH (1997). Mol. Biol. Evol. 14, 1232-1241.
Luan, DD and Eickbush, TH (1995). Mol. Cell. Biol. 15, 3882-3891.
Luan, DD et al. (1993). Cell 72, 595-605.
Mathews, DH et al. (1997). RNA 3, 1-16.
Zuker, M and Jacobson, AB (1995). Nucl. Acids Res. 23, 2791-2798.
Zuker, M and Jacobson, AB (1998). RNA 4, 669-679.

Part II.
Go through the exercise at: http://cnx.rice.edu/content/m11065/latest/

4. There are 12 questions in the exercise. Submit answers to all 12.

Part III.
For a more real-world problem, use mfold to predict the secondary structures of the sequences here. These sequences are from an important regulatory element in the lentiviruses HIV and EIAV called the Rev response element, or RRE. The sequences have the same function in the two viruses, and we hypothesize that they may have similar structures.

5. Are there any similarities between the HIV and EIAV RREs?

To help determine whether the sequences share a common structure, it may help to identify regions of high similarity and predict the structure of just those regions. Go to ClustalW and enter the two sequences. Run the program with default parameters to identify any regions of similarity. Save the alignment in a file for later use. Use mfold to predict the secondary structures of the regions of the sequences that ClustalW aligns.

6. Are there any similarities between the HIV and EIAV RRE structures from mfold?

Go to RNAalifold and submit the aligned HIV and EIAV RRE sequences. Save the PostScript drawing of the predicted structure.

7. How does the structure predicted by RNAalifold compare to the mfold structures?
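
Optional: comparing structures with a script
Several questions above ask you to compare predicted structures by eye. If you have two structures for the same sequence written in dot-bracket notation (the text format RNAfold reports, where matching parentheses mark a base pair and a dot marks an unpaired base), a short script can count the base pairs they share. The following Python code is a minimal sketch and is not part of the required exercise. The function names and the two toy structures are hypothetical illustrations, not output from any server, and the code assumes the structures contain no pseudoknots.

```python
def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j), 1-based with i < j, in a dot-bracket string."""
    open_positions = []  # positions of '(' still waiting for a partner
    pairs = set()
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairs


def compare_structures(predicted, reference):
    """Compare two dot-bracket structures of equal length.

    sensitivity = shared pairs / pairs in the reference
    ppv         = shared pairs / pairs in the prediction
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must describe the same sequence length")
    predicted_pairs = parse_dot_bracket(predicted)
    reference_pairs = parse_dot_bracket(reference)
    shared = predicted_pairs & reference_pairs
    return {
        "shared": sorted(shared),
        "sensitivity": len(shared) / len(reference_pairs) if reference_pairs else 0.0,
        "ppv": len(shared) / len(predicted_pairs) if predicted_pairs else 0.0,
    }


if __name__ == "__main__":
    # Toy 12-nucleotide structures, made up for illustration only.
    toy_prediction = "((((...))))."
    toy_reference = "(((.....)))."
    result = compare_structures(toy_prediction, toy_reference)
    print("shared pairs:", result["shared"])
    print("sensitivity:", result["sensitivity"])
    print("ppv:", result["ppv"])
```

Here the toy prediction contains every pair in the toy reference plus one extra pair, so the sensitivity is 1.0 and the PPV is 0.75. Counting exact shared pairs is a deliberately strict measure; two structures can look similar by eye and still share few identical pairs if a helix is shifted by one position.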