RNA_Folding_HW

Amer Abdulla
10/15/2010
RNA Folding HW
1 - Hammerhead ribozyme YES Gate
The following is the ribozyme sequence used in this assignment:
GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUC
AGGCGAAACGGUGAAAGCCGUAGGUUGCCC
 Source: Penchovsky R, Breaker RR. Computational design and experimental
validation of oligonucleotide-sensing allosteric ribozymes. Nat Biotechnol. 2005
Nov;23(11):1424-33. Epub 2005 Oct 23.
OFF Position
1) To predict the MFE structure, I used RNAfold, a program that reads an RNA
sequence from standard input, calculates its minimum free energy (MFE) structure, and
prints the MFE structure in bracket notation, together with its free energy, to standard
output. In the terminal, I entered “RNAfold,” which prompted the program to output
“Input string (upper or lower case); @ to quit”.
I then entered the hammerhead ribozyme sequence seen above. These are my results:
GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAU
CAGGCGAAACGGUGAAAGCCGUAGGUUGCCC
((((((((((((((((((((.(..(((.......))).)))))))).))))).....(((((....))))).))))))))
minimum free energy = -35.10 kcal/mol
In order to visualize the secondary structure determined by RNAfold, I used the program
RNAplot, which renders 2D plots of RNA secondary structures in PostScript. When I
entered RNAplot in the command line, the program asked me to “Input the sequence and
structure.” After I pasted in the sequence and the bracket-notation structure produced by
RNAfold and pressed enter, a file named rna.ps was produced. To view the structure, I
converted the PostScript file to a PDF by entering “convert rna.ps rna.pdf” into the
command line, and then opened the file in Adobe Reader. This is what I got:
2) To get the base-pairing probability plot, I used RNAfold with the -p option.
In the command line, I entered “RNAfold -p” and, when it asked for the sequence,
I pasted the sequence in. The following was the output:
length = 80
GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUC
AGGCGAAACGGUGAAAGCCGUAGGUUGCCC
((((((((((((((((((((.(..(((.......))).)))))))).))))).....(((((....))))).))))))))
minimum free energy = -35.10 kcal/mol
(((((((((((((((((({(.,..,||,,,,...}}}.|})))))),,)))).....(((((....))))).))))))))
free energy of ensemble = -37.93 kcal/mol
((((((((((((.(((((((...................)))))))..)))).....(((((....))))).)))))))) {-29.41 d=10.92}
frequency of mfe structure in ensemble 0.0836646; ensemble diversity 15.95
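When the gate is checked automatically (Part 2), these numbers have to be read from RNAfold's text output. A minimal Python sketch, assuming the ViennaRNA 1.8-style summary lines shown above; other RNAfold releases print some of these values differently, so the patterns may need adjusting. The function name parse_rnafold_summary is hypothetical.

```python
import re
from typing import Dict

# Patterns for the summary lines printed by "RNAfold -p" in the session above.
_PATTERNS = {
    "mfe": r"minimum free energy\s*=\s*(-?\d+(?:\.\d+)?)",
    "ensemble_energy": r"free energy of ensemble\s*=\s*(-?\d+(?:\.\d+)?)",
    "mfe_frequency": r"frequency of mfe structure in ensemble\s+([0-9.eE+-]+)",
    "diversity": r"ensemble diversity\s+([0-9.]+)",
}


def parse_rnafold_summary(output: str) -> Dict[str, float]:
    """Extract the MFE and ensemble energy (kcal/mol), MFE frequency and ensemble diversity."""
    values: Dict[str, float] = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, output)
        if match is None:
            raise ValueError(f"no '{key}' value found in RNAfold output")
        values[key] = float(match.group(1))
    return values
```

Applied to the OFF-state output above, it yields mfe = -35.10, ensemble_energy = -37.93, mfe_frequency = 0.0836646 and diversity = 15.95.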
RNAfold wrote the base-pairing probability plot to a file called dot.ps, seen
below:
In the lower left corner of the plot, there is one square for each base pair in the MFE
structure. In the upper right corner, there are squares with areas proportional to the
pairing probability. As one can see, the most likely pairings in the upper right correspond
to the pairings shown in the lower left, i.e. to the MFE structure, which represents the
global minimum of free energy.
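The same comparison can be made programmatically from dot.ps. This sketch assumes the ViennaRNA dot-plot convention in which each upper-right square is written as a line “i j s ubox”, where the side length s is the square root of the pairing probability (so the box area equals the probability), and each lower-left MFE square as “i j s lbox”. The function names and the 0.5 cut-off are hypothetical choices for illustration.

```python
import re
from typing import Dict, Set, Tuple

# Data lines look like "12 45 0.9487 ubox" (probability) or "12 45 0.95 lbox" (MFE).
_BOX = re.compile(r"^\s*(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+(ubox|lbox)\s*$")


def read_dot_plot(ps_text: str) -> Tuple[Dict[Tuple[int, int], float], Set[Tuple[int, int]]]:
    """Return (pair probabilities from ubox lines, MFE pairs from lbox lines)."""
    probabilities: Dict[Tuple[int, int], float] = {}
    mfe_pairs: Set[Tuple[int, int]] = set()
    for line in ps_text.splitlines():
        match = _BOX.match(line)
        if match is None:
            continue  # PostScript procedures, comments, sequence data, etc.
        i, j, side, kind = int(match[1]), int(match[2]), float(match[3]), match[4]
        if kind == "ubox":
            probabilities[(i, j)] = side * side  # side is sqrt(p), so area is p
        else:
            mfe_pairs.add((i, j))
    return probabilities, mfe_pairs


def likely_pairs(probabilities: Dict[Tuple[int, int], float],
                 threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Pairs whose probability is at least `threshold`."""
    return {pair for pair, p in probabilities.items() if p >= threshold}
```

Checking that likely_pairs(...) is a subset of the MFE pairs is one way to make the visual observation above quantitative.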
3) Statuses of different parts of the sequence:
Stem I: exists
Stem II: does not exist because the effector-binding site binds to the rest of the
hammerhead ribozyme and therefore prevents stem II from forming
Stem III: exists
Cleavage site: relatively inaccessible because it's in a tight corner between stems I and
III.
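Statuses such as “exists” and “does not exist” can be checked directly on the bracket string. The sketch below tests whether a stacked stem is formed, i.e. whether positions i, i+1, ... pair with j, j-1, ... for a given number of base pairs. The function names are hypothetical, and so are any stem coordinates passed in, since this write-up does not list the positions of stems I-III.

```python
from typing import Dict, List


def partner_map(structure: str) -> Dict[int, int]:
    """Map each paired 1-based position to its partner in a balanced dot-bracket string."""
    stack: List[int] = []
    partners: Dict[int, int] = {}
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            i = stack.pop()
            partners[i] = pos
            partners[pos] = i
    return partners


def stem_formed(structure: str, i: int, j: int, length: int) -> bool:
    """True if i..i+length-1 pair with j..j-length+1, a stem closed by the pair (i, j)."""
    partners = partner_map(structure)
    return all(partners.get(i + k) == j - k for k in range(length))
```

For instance, stem_formed("((((....))))", 1, 12, 4) is True, while shifting the 3' end to 11 makes it False.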
ON Position

The same exact procedure was followed as above, except for the imposition of
constraints to form the ON position.
1) Predicted MFE structure – In order to make the ON position, it was necessary to make
the Oligonucleotide Binding Site (OBS) single-stranded so that we could simulate its
binding to DNA-1. To do this, I used the constraints feature of RNAfold. In
the command line, I typed in RNAfold -C. Next, it asked me to input the sequence,
and I did and pressed enter. It then gave me a list of notations to use in assigning my
constraints. I placed a “.” for all bases for which I did not want any constraints, and I
placed an “x” for all bases that I did not want to pair with any other base. These
bases, which were bases 26 – 47, corresponded to the OBS. The following is my
output:
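The constraint line can also be generated from the OBS coordinates rather than typed by hand. A minimal Python sketch with hypothetical function names; it assumes, as in the session shown here, that RNAfold -C reads the constraint on the line after the sequence, and that an RNAfold executable is on the PATH when fold_with_obs_blocked is called.

```python
import subprocess
from typing import Sequence


def obs_constraint(length: int, obs_start: int, obs_end: int) -> str:
    """RNAfold -C constraint line: 'x' (must not pair) over the OBS, '.' elsewhere (1-based, inclusive)."""
    if not 1 <= obs_start <= obs_end <= length:
        raise ValueError("OBS coordinates must satisfy 1 <= start <= end <= length")
    return ("." * (obs_start - 1)
            + "x" * (obs_end - obs_start + 1)
            + "." * (length - obs_end))


def fold_with_obs_blocked(sequence: str, obs_start: int, obs_end: int,
                          extra_args: Sequence[str] = ()) -> str:
    """Run RNAfold -C with the OBS forced single-stranded and return its standard output."""
    stdin_text = sequence + "\n" + obs_constraint(len(sequence), obs_start, obs_end) + "\n"
    result = subprocess.run(["RNAfold", "-C", *extra_args], input=stdin_text,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

obs_constraint(80, 26, 47) produces 25 dots, 22 x's and 33 dots, covering OBS positions 26-47 as used here; options such as ["-p"] or ["-T", "37"] can be passed through extra_args.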
GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUC
AGGCGAAACGGUGAAAGCCGUAGGUUGCCC
((((((((.......((((((...........................))))))...(((((....))))).))))))))
minimum free energy = -28.53 kcal/mol
The sequence and structure were entered into RNAplot, and the following was the
output:
2) Base-pairing probability plot – this was the input and output in the terminal:
be231-08@kepler ~/RNA_Folding [12:35am]> RNAfold -p -C
Input constraints using the following notation:
| : paired with another base
. : no constraint at all
x : base must not pair
< : base i is paired with a base j<i
> : base i is paired with a base j>i
matching brackets ( ): base i pairs base j
Input string (upper or lower case); @ to quit
....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUC
AGGCGAAACGGUGAAAGCCGUAGGUUGCCC
.........................xxxxxxxxxxxxxxxxxxxxxx.................................
length = 80
GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUC
AGGCGAAACGGUGAAAGCCGUAGGUUGCCC
((((((((.......((((((...........................))))))...(((((....))))).))))))))
minimum free energy = -28.53 kcal/mol
((((((((.......((((((...........................))))))...(((((....))))).))))))))
free energy of ensemble = -29.02 kcal/mol
((((((((.......((((((...........................))))))...(((((....))))).)))))))) {-28.53 d=0.89}
frequency of mfe structure in ensemble 0.621842; ensemble diversity 1.67
And this was the plot:
3) Statuses of different parts of the sequence:
Stem I: exists
Stem II: exists, because the OBS is now single-stranded, meaning it will not interfere
with the formation of stem II.
Stem III: exists
Cleavage site: more accessible than in the OFF position, because the angle between
stems I and III is larger than in the OFF position.
2 – Software for verifying YES Gate
Inputs to program:
- $ARGV[0]: the RNA sequence which the program will verify as being a YES gate
- $ARGV[1]: the nucleotide number that indicates the first nucleotide in the OBS
subsequence
- $ARGV[2]: the nucleotide number that indicates the last nucleotide in the OBS
subsequence
* When the program is called at the command line, do error checking for the following:
◦ Ensure all 3 arguments are provided and that they are each defined
◦ Ensure that RNA sequence is composed of all A's, U's, C's, and G's.
◦ Ensure that the OBS is of sufficient length
◦ Ensure that $ARGV[2] is greater than $ARGV[1]
If these conditions are met, proceed with the program.
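The error checks above can be sketched as follows. This is Python rather than the Perl implied by $ARGV; the function name and the MIN_OBS_LENGTH value are hypothetical (the plan does not say what “sufficient length” is), lower-case input is accepted by upper-casing it, and an extra check that the OBS lies inside the sequence is added.

```python
from typing import List, Tuple

# Hypothetical minimum; the plan only asks for a "sufficient" OBS length.
MIN_OBS_LENGTH = 10


def parse_args(argv: List[str]) -> Tuple[str, int, int]:
    """Validate [sequence, obs_start, obs_end], mirroring $ARGV[0], $ARGV[1], $ARGV[2]."""
    # All three arguments must be present and non-empty.
    if len(argv) != 3 or any(not arg for arg in argv):
        raise ValueError("usage: SEQUENCE OBS_START OBS_END")
    sequence = argv[0].upper()
    # The sequence may contain only A, U, C and G.
    if set(sequence) - set("ACGU"):
        raise ValueError("sequence may contain only A, C, G and U")
    try:
        obs_start, obs_end = int(argv[1]), int(argv[2])
    except ValueError:
        raise ValueError("OBS coordinates must be integers") from None
    # $ARGV[2] must be greater than $ARGV[1].
    if obs_end <= obs_start:
        raise ValueError("last OBS nucleotide must come after the first")
    if obs_end - obs_start + 1 < MIN_OBS_LENGTH:
        raise ValueError("OBS is too short")
    # Added bounds check: the OBS must lie within the sequence.
    if obs_start < 1 or obs_end > len(sequence):
        raise ValueError("OBS lies outside the sequence")
    return sequence, obs_start, obs_end
```

In a script it would be called as parse_args(sys.argv[1:]), where sys.argv[1:] plays the role of @ARGV.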
Truth table of the YES logic gate. A is input, and F is output (1 if it is indeed a correct
YES gate, 0 if it is not).
A | F
--+--
0 | 0
1 | 1
Pseudo-code for program
1. Compute the MFE secondary structure of the RNA sequence in the OFF state. Do
this by calling RNAfold with the RNA sequence as input, using the -T option set to
37 °C in order to simulate physiological conditions.
2. Using the produced output, check that the catalytic core is base-paired in the
correct way to form the OFF state. Specifically, use the number and position of
“.”, “(”, and “)” to ensure that stems I, IV, and II are correctly formed. If the
catalytic core is not correctly formed, return 0 to the truth table, and end the
program. If the catalytic core is correct, proceed to Step 3.
3. Now compute the MFE secondary structure of the RNA sequence in the ON state.
Do this by calling RNAfold with the RNA sequence as input, using the -T option set
to 37 °C together with the -C option, in order to prevent the OBS sequence from
base-pairing. Make the program pass the RNA sequence as the first line of input to
RNAfold. For the second line of input, make the program use the OBS
coordinates specified in $ARGV[1] and $ARGV[2] to place x's at every
nucleotide position corresponding to the OBS. Place a “.” at every other
nucleotide position, which allows all non-OBS nucleotides to base-pair freely.
4. Using the produced output, check that the catalytic core is base-paired in the
correct way to form the ON state. Specifically, use the number and position of “.”,
“(”, and “)” to ensure that stems I, II, and III are correctly formed. If the catalytic
core is not correctly formed, return 0 to the truth table and end the program. If the
catalytic core is correct, proceed to Step 5.
5. Using the RNAfold output for the OFF state, check that 30-70% of the OBS is
base-paired. To do this, first use the OBS coordinates to check each OBS nucleotide
and count how many have a “(” or “)”. Then, divide this count by the total length
of the OBS. If this fraction is outside the range 0.3 to 0.7, return 0 to the truth
table and end the program. If it is within this range, proceed to Step 6.
6. Using the MFE values from the RNAfold calls for both the ON and OFF
positions, check whether the energy gap between them (the ON-state MFE minus the
OFF-state MFE) is at least the chosen threshold, somewhere in the range of
6-10 kcal/mol. If it is not, return a value of 0 to the truth table and end the
program. If it is, proceed to Step 7.
7. In order to impose more stringent requirements that our YES gate is
physiologically relevant, check that both the OFF and ON positions are stable
from 20 to 40 °C. To do this, use a for loop to run RNAfold at 5-degree increments
from 20 to 40 °C. For each pass of the loop, repeat step 2 for the OFF position and
step 4 for the ON position. If either structure fails its check at any temperature,
return a value of 0 to the truth table and end the program. If the structures are
stable at all temperatures, proceed to Step 8.
8. Check that the structure ensemble diversity for both the ON and OFF positions is
less than 9. If either or both of them are not, return a value of zero to the truth
table. If they are, return a value of 1 to the truth table.
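Steps 5, 6 and 8 reduce to simple arithmetic on values already available from the RNAfold calls, and step 7 is a loop over temperatures. The sketch below assumes the catalytic-core checks of steps 2 and 4 are supplied as a hypothetical callable cores_ok(temperature), for example one that runs RNAfold with -T at that temperature and inspects the stems. The default min_gap = 6.0 takes the lower end of the 6-10 kcal/mol range as an assumption; the 0.3-0.7 window and the diversity limit of 9 come from steps 5 and 8.

```python
from typing import Callable, List

# Step 7 temperatures: 20, 25, 30, 35 and 40 degrees C.
TEMPERATURES: List[int] = list(range(20, 41, 5))


def paired_fraction(structure: str, obs_start: int, obs_end: int) -> float:
    """Step 5: fraction of OBS positions (1-based, inclusive) marked '(' or ')'."""
    region = structure[obs_start - 1:obs_end]
    return sum(symbol in "()" for symbol in region) / len(region)


def energy_gap(mfe_off: float, mfe_on: float) -> float:
    """Step 6: how much higher (less stable) the constrained ON MFE is than the OFF MFE."""
    return mfe_on - mfe_off


def yes_gate_output(cores_ok: Callable[[int], bool], off_structure: str,
                    obs_start: int, obs_end: int,
                    mfe_off: float, mfe_on: float,
                    diversity_off: float, diversity_on: float,
                    min_gap: float = 6.0, max_diversity: float = 9.0) -> int:
    """Return F for the YES-gate truth table: 1 if every check passes, else 0."""
    # Steps 2 and 4 at 37 degrees C, then step 7 across 20-40 degrees C;
    # cores_ok is a hypothetical callable supplied by the caller.
    if not all(cores_ok(t) for t in [37] + TEMPERATURES):
        return 0
    # Step 5: 30-70% of the OBS must be paired in the OFF state.
    if not 0.3 <= paired_fraction(off_structure, obs_start, obs_end) <= 0.7:
        return 0
    # Step 6: the OFF state must be sufficiently more stable than the ON state.
    if energy_gap(mfe_off, mfe_on) < min_gap:
        return 0
    # Step 8: both ensemble diversities must be below the limit.
    if diversity_off >= max_diversity or diversity_on >= max_diversity:
        return 0
    return 1
```

With the MFE values from Part 1 (-35.10 kcal/mol OFF, -28.53 kcal/mol ON), energy_gap gives 6.57 kcal/mol; the OFF-state ensemble diversity of 15.95 reported there would, however, not pass the step 8 limit of 9.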
3 – Hammerhead ribozyme structure
Objective: Predict the MFE structure of the AF404053.1 hammerhead ribozyme sequence
and compare to the structure given in Rfam.
The following represents the hammerhead ribozyme sequence, nucleotides 70 – 186 of
the complete genome for Avocado sunblotch viroid isolate CF39, taken from GenBank.
tttccctgaagagacgaagtgatcaagagatcgaagacgagtgaactaattttttttaataaaaagttcaccacgactcctccttctctcac
aagtcgaaactcagagtcggcaag
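It is safest to pass RNAfold the whole sequence as a single line. A small Python sketch (the function name is hypothetical) that joins a wrapped sequence copied from GenBank, drops whitespace and any position numbers, upper-cases it and converts T to U:

```python
import re


def clean_sequence(raw: str) -> str:
    """Join a wrapped DNA/RNA sequence into one upper-case RNA line.

    Removes whitespace and digits (e.g. GenBank-style position numbers) and
    converts T to U; raises ValueError on any other non-nucleotide character.
    """
    sequence = re.sub(r"[\s\d]+", "", raw).upper().replace("T", "U")
    bad = set(sequence) - set("ACGU")
    if bad:
        raise ValueError(f"unexpected characters: {''.join(sorted(bad))}")
    return sequence
```

The result can then be piped to RNAfold as one line of input.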
First, RNAfold was used to produce the MFE structure of this sequence:
be231-08@kepler ~ [1:20am]> RNAfold
Input string (upper or lower case); @ to quit
....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
tttccctgaagagacgaagtgatcaagagatcgaagacgagtgaactaattttttttaataaaaagttcaccacgactcctccttctctc
acaagtcgaaactcagagtcggcaag
length = 90
UUUCCCUGAAGAGACGAAGUGAUCAAGAGAUCGAAGACGAGUGAACUAAUUUUUUUUAAU
AAAAAGUUCACCACGACUCCUCCUUCUCUC
..........((((.((((.((((....)))).....((.(((((((...((((......)))))))))))..)).......))))))))
minimum free energy = -18.30 kcal/mol
Then, RNAplot was used to generate the secondary structure from this
sequence information:
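The structures look substantially different. This may be because the Rfam structure was
determined at a different temperature, or because it was generated with specific
constraints on some bases (such as forbidding pairing), as we did in part one of this
assignment. In addition, RNAfold reported length = 90 although the sequence listed above
is longer, so only its first 90 nucleotides were folded; this truncation could also contribute
to the difference.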