Amer Abdulla
10/15/2010
RNA Folding HW

1 - Hammerhead ribozyme YES Gate

The following is the ribozyme sequence used in this assignment:

    GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUCAGGCGAAACGGUGAAAGCCGUAGGUUGCCC

Source: Penchovsky R, Breaker RR. Computational design and experimental validation of oligonucleotide-sensing allosteric ribozymes. Nat Biotechnol. 2005 Nov;23(11):1424-33. Epub 2005 Oct 23.

OFF Position

1) To predict the MFE structure, I used RNAfold, a program that reads an RNA sequence from standard input, calculates its minimum free energy (MFE) structure, and prints that structure in bracket notation, together with its free energy, to standard output. In the terminal, I entered “RNAfold,” which prompted the program to output “Input string (upper or lower case);” I then entered the hammerhead ribozyme sequence shown above. These are my results:

    GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUCAGGCGAAACGGUGAAAGCCGUAGGUUGCCC
    ((((((((((((((((((((.(..(((.......))).)))))))).))))).....(((((....))))).))))))))
    minimum free energy = -35.10 kcal/mol

To visualize the secondary structure determined by RNAfold, I used the program RNAplot, which renders 2D plots of RNA secondary structures in PostScript. In the command line, I entered “RNAplot,” and the program asked me to “Input the sequence and structure.” I pasted in the sequence and the bracket-notation structure produced by RNAfold and pressed enter, which produced a file named rna.ps. To view the secondary structure, I had to convert the PostScript file to a PDF file, so I entered “convert rna.ps rna.pdf” into the command line. Then, I used Adobe Reader to open the file. This is what I got:

2) To get the base-pairing probability plot, I used the program RNAfold with the -p option.
In the command line, I entered “RNAfold -p” and, when it asked for the sequence, I pasted the sequence in. The following was the output:

    length = 80
    GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUCAGGCGAAACGGUGAAAGCCGUAGGUUGCCC
    ((((((((((((((((((((.(..(((.......))).)))))))).))))).....(((((....))))).))))))))
    minimum free energy = -35.10 kcal/mol
    (((((((((((((((((({(.,..,||,,,,...}}}.|})))))),,)))).....(((((....))))).))))))))
    free energy of ensemble = -37.93 kcal/mol
    ((((((((((((.(((((((...................)))))))..)))).....(((((....))))).)))))))) {-29.41 d=10.92}
    frequency of mfe structure in ensemble 0.0836646; ensemble diversity 15.95

A file called dot.ps was created, which holds the base-pairing probability plot, seen below:

In the lower left corner of the plot, there is one square for each base pair in the MFE structure. In the upper right corner, there are squares with areas proportional to the pairing probability. As one can see, the most likely pairings in the upper right correspond to the pairings shown in the lower left, which represent a kind of “global minimum” in base pairings.

3) Statuses of different parts of the sequence:

Stem I: exists
Stem II: does not exist, because the effector-binding site binds to the rest of the hammerhead ribozyme and therefore prevents stem II from forming
Stem III: exists
Cleavage site: relatively inaccessible, because it is in a tight corner between stems I and III.

ON Position

Exactly the same procedure was followed as above, except that constraints were imposed to form the ON position.

1) Predicted MFE structure – To produce the ON position, it was necessary to make the Oligonucleotide Binding Site (OBS) single stranded so that we could simulate its binding to DNA-1. To do this, I used the constraints feature of RNAfold. In the command line, I typed in “RNAfold -C.” Next, it asked me to input the sequence, which I did, and I pressed enter.
Next, it gave me a list of notations to use in assigning my constraints. I placed a “.” for every base on which I did not want any constraint, and I placed an “x” for every base that I did not want to pair with any other base. These bases, which were bases 26 – 47, correspond to the OBS. The following is my output:

    GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUCAGGCGAAACGGUGAAAGCCGUAGGUUGCCC
    ((((((((.......((((((...........................))))))...(((((....))))).))))))))
    minimum free energy = -28.53 kcal/mol

The sequence and structure were input into RNAplot, and the following was the output:

2) Base-pairing probability plot – this was the input and output in the terminal:

    be231-08@kepler ~/RNA_Folding [12:35am]> RNAfold -p -C
    Input constraints using the following notation:
    | : paired with another base
    . : no constraint at all
    x : base must not pair
    < : base i is paired with a base j<i
    > : base i is paired with a base j>i
    matching brackets ( ): base i pairs base j
    Input string (upper or lower case); @ to quit
    ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
    GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUCAGGCGAAACGGUGAAAGCCGUAGGUUGCCC
    .........................xxxxxxxxxxxxxxxxxxxxxx.................................
    length = 80
    GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUCAGGCGAAACGGUGAAAGCCGUAGGUUGCCC
    ((((((((.......((((((...........................))))))...(((((....))))).))))))))
    minimum free energy = -28.53 kcal/mol
    ((((((((.......((((((...........................))))))...(((((....))))).))))))))
    free energy of ensemble = -29.02 kcal/mol
    ((((((((.......((((((...........................))))))...(((((....))))).)))))))) {-28.53 d=0.89}
    frequency of mfe structure in ensemble 0.621842; ensemble diversity 1.67

And this was the plot:

3) Statuses of different parts of the sequence:

Stem I: exists
Stem II: exists, because the OBS is now single stranded, meaning it will not interfere with the formation of stem II.
Stem III: exists
Cleavage site: more accessible than in the OFF position, because the angle between stems I and III is larger than in the OFF position.

2 – Software for verifying YES Gate

Inputs to program:
- $ARGV[0]: the RNA sequence which the program will verify as being a YES gate
- $ARGV[1]: the nucleotide number that indicates the first nucleotide in the OBS subsequence
- $ARGV[2]: the nucleotide number that indicates the last nucleotide in the OBS subsequence

* When the program is called at the command line, do error checking for the following:
  ◦ Ensure all 3 arguments are provided and that they are each defined
  ◦ Ensure that the RNA sequence is composed only of A's, U's, C's, and G's
  ◦ Ensure that the OBS is of sufficient length
  ◦ Ensure that $ARGV[2] is greater than $ARGV[1]

If these conditions are met, proceed with the program.

Truth table of the YES logic gate. A is input, and F is output (1 if it is indeed a correct YES gate, 0 if it is not).

    A   F
    0   0
    1   1

Pseudo-code for program

1. Compute the MFE secondary structure of the RNA sequence in the OFF state. Do this by calling RNAfold with the RNA sequence as input, and enable the -T option set at 37 °C in order to simulate physiological conditions.

2. Using the produced output, check that the catalytic core is base-paired in the correct way to form the OFF state. Specifically, use the number and position of “.”, “(”, and “)” to ensure that stems I, III, and IV are correctly formed. If the catalytic core is not correctly formed, return 0 to the truth table and end the program. If the catalytic core is correct, proceed to Step 3.

3. Now compute the MFE secondary structure of the RNA sequence in the ON state. Do this by calling RNAfold with the RNA sequence as input, and enable the -T option set at 37 °C and also the -C option, in order to prevent the OBS sequence from base-pairing. Make the program pass the RNA sequence as the first line of input to RNAfold. For the second line of input, make the program use the OBS coordinates specified in $ARGV[1] and $ARGV[2] to place an “x” at every nucleotide position corresponding to the OBS. Place a “.” at every other nucleotide position, which allows all non-OBS nucleotides to base-pair freely.

4. Using the produced output, check that the catalytic core is base-paired in the correct way to form the ON state. Specifically, use the number and position of “.”, “(”, and “)” to ensure that stems I, II, and III are correctly formed. If the catalytic core is not correctly formed, return 0 to the truth table and end the program. If the catalytic core is correct, proceed to Step 5.

5. Using the RNAfold output, check that 30-70% of the OBS is base-paired in the OFF state. To do this, first use the OBS coordinates to check each OBS nucleotide and count how many have a “(” or “)”. Then, divide this count by the total length of the OBS. If this number is outside the range of 0.3 to 0.7, return 0 to the truth table and end the program. If it is within this range, proceed to Step 6.

6. Using the MFE values from the RNAfold calls for both the ON and OFF positions, check whether the energy gap between the OFF and ON positions is at least 6-10 kcal/mol. If it is not, return a value of 0 to the truth table and end the program. If it is, proceed to Step 7.

7. In order to impose more stringent requirements that our YES gate is physiologically relevant, check that both the OFF and ON positions are stable from 20 to 40 degrees Celsius. To do this, use a for loop to run RNAfold at each 5-degree increment from 20 to 40 degrees. For each pass of the loop, repeat Step 2 for the OFF position and Step 4 for the ON position. If either structure is not stable at any temperature, return a value of 0 to the truth table and end the program. If the structures are stable at all temperatures, proceed to Step 8.

8. Check that the structure ensemble diversity for both the ON and OFF positions is less than 9. If either or both of them are not, return a value of 0 to the truth table. If they are, return a value of 1 to the truth table.

3 – Hammerhead ribozyme structure

Objective: Predict the MFE structure of the AF404053.1 hammerhead ribozyme sequence and compare it to the structure given in Rfam.

The following is the hammerhead ribozyme sequence, nucleotides 70 – 186 of the complete genome of Avocado sunblotch viroid isolate CF39, taken from GenBank.
    tttccctgaagagacgaagtgatcaagagatcgaagacgagtgaactaattttttttaataaaaagttcaccacgactcctccttctctcacaagtcgaaactcagagtcggcaag

First, RNAfold was used to produce the MFE structure of this sequence:

    be231-08@kepler ~ [1:20am]> RNAfold
    Input string (upper or lower case); @ to quit
    ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
    tttccctgaagagacgaagtgatcaagagatcgaagacgagtgaactaattttttttaataaaaagttcaccacgactcctccttctctc
    acaagtcgaaactcagagtcggcaag
    length = 90
    UUUCCCUGAAGAGACGAAGUGAUCAAGAGAUCGAAGACGAGUGAACUAAUUUUUUUUAAUAAAAAGUUCACCACGACUCCUCCUUCUCUC
    ..........((((.((((.((((....)))).....((.(((((((...((((......)))))))))))..)).......))))))))
    minimum free energy = -18.30 kcal/mol

Then, RNAplot was used to draw the secondary structure from this sequence and structure information:

The structures look substantially different. This may be because the Rfam structure was predicted at a different temperature. It may also have been generated with specific constraints on some bases, such as forbidding pairing, as we did in part one of this assignment. In addition, the RNAfold output above reports length = 90, which suggests that only the first line of the pasted sequence was folded; the nucleotides left out at the 3' end could also account for part of the difference.
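
Returning to the pseudo-code in part 2, the string-handling parts of Steps 3, 5, and 6 could be sketched roughly as follows. This is only a minimal illustration in Python, not the program itself: the function names are hypothetical, the calls to RNAfold are left out entirely, and the default gap of 6.0 kcal/mol is an assumed reading of the “6-10 kcal/mol” threshold.

```python
def constraint_line(seq_length, obs_start, obs_end):
    """Step 3: build the second input line for RNAfold -C.

    Places 'x' (must not pair) over the OBS and '.' (no constraint)
    everywhere else. Coordinates are 1-based and inclusive, matching
    $ARGV[1] and $ARGV[2] in the pseudo-code.
    """
    if not (1 <= obs_start < obs_end <= seq_length):
        raise ValueError("need 1 <= obs_start < obs_end <= seq_length")
    obs_length = obs_end - obs_start + 1
    return "." * (obs_start - 1) + "x" * obs_length + "." * (seq_length - obs_end)


def obs_paired_fraction(structure, obs_start, obs_end):
    """Step 5: fraction of OBS positions shown as '(' or ')' in bracket notation."""
    obs = structure[obs_start - 1:obs_end]
    paired = sum(1 for symbol in obs if symbol in "()")
    return paired / len(obs)


def energy_gap_ok(mfe_off, mfe_on, min_gap=6.0):
    """Step 6: True if the ON state is at least min_gap kcal/mol above the OFF state.

    min_gap=6.0 is an assumption, not a value fixed by the assignment.
    """
    return (mfe_on - mfe_off) >= min_gap


# Usage with values from part 1: an 80-nt sequence, OBS at bases 26-47,
# and the MFE values reported for the OFF and ON positions.
print(constraint_line(80, 26, 47))
print(energy_gap_ok(-35.10, -28.53))
# Toy structure, only to show the call; positions 2-5 are "(..)".
print(obs_paired_fraction("((..))", 2, 5))
```

Keeping these pieces as small pure functions means they can be checked without RNAfold installed; the stem checks of Steps 2 and 4 would still need the expected stem positions for the particular ribozyme.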