talk - TerpConnect - University of Maryland

Hello, my name is Trey Belew, and I would like to share with you some of our work in the
Dinman laboratory. I’d like to tell the story of mRNA destabilization mediated by programmed -1
ribosomal frameshifting (-1 PRF) in 4 parts: some background information, an introduction to
the computational pipeline we developed, and complementary studies of the effects of -1 PRF
in yeast and in mammalian cell culture.
Dr. Dinman’s lab is first and foremost one that studies translation, and my work concerns a
cotranslational process, so I will highlight the most relevant aspects of protein translation
before looking at the elements and mechanisms of a frameshift signal, some examples, and
how history informed the general hypothesis of this work.
In order to correctly translate an mRNA into a peptide, the ribosome must correctly recognize
the start codon, maintain the proper reading frame, and correctly recognize the end of the
message. My work focuses on what happens when this process is derailed and the translating
ribosome shifts into an alternate reading frame.
A -1 PRF signal contains 3 elements: a slippery heptamer, a spacer region of appropriate length,
and a strong downstream RNA secondary structure. These are arranged so that an actively
translating ribosome is forced to pause by the RNA secondary structure while it is positioned
over the slippery heptamer. In some cases the tRNA-mRNA base pairs break and re-pair in the
-1 frame, while in others the ribosome makes an incomplete translocation because its forward
motion is frustrated by the downstream secondary structure. In each case, however, the
ribosome eventually unwinds the downstream structure and continues translation in the new
reading frame.
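As a rough sketch of what "slippery" means in sequence terms, the heptamer is commonly written X XXY YYZ, where the spaces separate 0-frame codons, X is any nucleotide, Y is A or U, and Z is A, C, or U. The Python below assumes that consensus and only locates candidate heptamers sitting in the 0 frame of an ORF; it does not check the spacer or the downstream structure, and the function name is illustrative.

```python
import re

# Commonly cited -1 PRF slippery-site consensus X XXY YYZ:
# X = any nucleotide, Y = A or U, Z = A, C, or U.
# Group 2 captures X and group 3 captures Y; the zero-width lookahead
# lets finditer report overlapping candidates at every position.
SLIPPERY_HEPTAMER = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACU]))")


def find_slippery_sites(orf: str) -> list[int]:
    """Return 0-based starts of slippery heptamers positioned in the 0 frame.

    Assumes `orf` begins at the first nucleotide of the start codon.
    In X XXY YYZ the codons XXY and YYZ are 0-frame codons, so the
    heptamer starts one nucleotide before a codon boundary (start % 3 == 2).
    """
    rna = orf.upper().replace("T", "U")
    return [m.start() for m in SLIPPERY_HEPTAMER.finditer(rna)
            if m.start() % 3 == 2]
```

For example, find_slippery_sites("AUGGCUUUAAACUAA") returns [5]: the U UUA AAC heptamer whose XXY and YYZ codons lie in the 0 frame. The same heptamer shifted by one nucleotide is ignored because its codons would not be read by a 0-frame ribosome.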
Though -1 PRF signals have been characterized in many viruses since their discovery by
Jacks and Varmus, I am showing just three here, ranging from the very simple yeast L-A totivirus
to the significantly more complex HIV. In each case, most ribosomes stop at the 0-frame stop
codon and thus translate the structural protein. However, a fraction of ribosomes stochastically
change reading frame at the programmed frameshift signal and go on to translate the
enzymatically active C-terminal extension. Thus the virus is able to manipulate the host’s
translational machinery to obtain two products from a single mRNA.
This is far from the first molecular mechanism to be described first in viruses. From the
Hershey-Chase experiments showing that DNA is the genetic material to the discovery of the
IRES, viruses have been instrumental in revealing new mechanisms that were eventually
characterized in all kingdoms of life.
The process of characterizing frameshifting in genomic sequences is already underway. Here
are just a few examples of frameshifting signals found in cellular genes in all kingdoms of life,
suggesting that frameshifting has implications which stretch beyond viruses.
Therefore we hypothesized that cellular gene expression is post-transcriptionally regulated by -1
PRF. In order to test this hypothesis, we implemented a computational pipeline to search for
candidate -1 PRF signals in genomic sequences and assayed these candidates in the
laboratory.
With that in mind, I would first like to describe the design and implementation of this
computational pipeline and show some examples of what it has found so far.
The PRFdb first imports a population of sequences from other sequence databases such as the
SGD or NCBI; usually this population comprises the genome of a single organism. The PRFdb
applies some simple pattern matches to this population to quickly exclude inappropriate
sequences. The remaining candidates are passed to RNAMotif, which excludes candidates that
have no potential to form even the most permissive pseudoknot. Positive candidates are then
passed to a series of RNA structure prediction programs, which provide a predicted structure
and minimum free energy (MFE). This is followed by a series of randomizations and refoldings
that generate a population of sequences against which to compare the prediction. All the data
from this process, including the predictions and randomizations, are stored in a database and
used for later tests, including cross-references against other datasets such as microarray or
(now) next-generation sequencing projects. Finally, we choose candidate sequences to bring
into the laboratory, where we assay them for the ability to promote frameshifting in vivo and for
effects on mRNA abundance and stability.
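The randomization step can be sketched as a composition-preserving shuffle followed by refolding. In this sketch, fold is a hypothetical placeholder for a call into whichever structure predictor is used (the talk mentions programs such as mfold and pknots); it is any callable that returns an MFE. The shuffle here keeps single-nucleotide counts, matching the "same nucleotide complement" described later in the talk, and is not necessarily the exact shuffle the PRFdb performs.

```python
import random
from typing import Callable, Optional


def shuffled_mfes(window: str, fold: Callable[[str], float],
                  n: int = 100, seed: Optional[int] = None) -> list[float]:
    """Fold `n` composition-preserving shuffles of `window`.

    `fold` is a hypothetical stand-in for a structure-prediction call
    that returns a minimum free energy in kcal/mol.
    """
    rng = random.Random(seed)
    energies = []
    for _ in range(n):
        # Sampling every position without replacement is a permutation,
        # so the counts of A, C, G and U are unchanged.
        shuffled = "".join(rng.sample(window, len(window)))
        energies.append(fold(shuffled))
    return energies
```

A mononucleotide shuffle is the simplest null model; preserving dinucleotide frequencies as well would give a stricter comparison for stacking-energy based folding, at the cost of a more involved shuffling algorithm.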
The PRFdb currently has access to up to 2,000 CPUs provided by the NCI and the University of
Maryland High Performance Computing Center. However, I limit the code to fewer than 100
CPUs to avoid being a pain. The compute nodes are responsible for downloading the sequences,
running the pipeline, and uploading the results to a synchronously replicating database. This
database in turn may be queried by any number of web servers to provide a (hopefully friendly)
interface to the resulting data.
Currently, the PRFdb contains approximately ½ million sequences from about 50 completed
genomes in all kingdoms of life. Given sufficient space to grow, its current implementation could
easily scale to the entirety of GenBank.
When this pipeline has finished examining a genome (in this case Saccharomyces cerevisiae),
we are left with a huge distribution of candidate sequence windows. Here I have plotted the
predicted difference from random (z-score) against minimum free energy. The black axes
denote the means of the yeast population; therefore, as we move from top right to bottom left,
we look at candidates that are increasingly significant with respect to both minimum free energy
and statistical significance. To help with this, the graphs include thin gray lines marking 1 and 2
standard deviations from the mean on each axis. In addition, the PRFdb provides histograms of
the candidate population with respect to MFE or z-score; the one shown here is for minimum
free energy.
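The gray lines amount to standardizing each axis against the genome-wide population. The sketch below assumes the candidates are available as (MFE, z-score) pairs and flags those lying at least k standard deviations below the mean on both axes, which is the bottom-left corner of the plot. The names are hypothetical, and the default k = 2 mirrors the outer gray line rather than any documented PRFdb cutoff.

```python
from statistics import mean, stdev


def sds_below_mean(values: list[float]) -> list[float]:
    """Number of sample standard deviations each value lies below the mean."""
    mu, sigma = mean(values), stdev(values)
    return [(mu - v) / sigma for v in values]


def flag_candidates(points: list[tuple[float, float]], k: float = 2.0) -> list[int]:
    """Indices of (mfe, z) points at least `k` SDs below the mean on both axes.

    A lower MFE (more stable) and a lower z-score (less like random
    sequence) both move a candidate toward the bottom-left of the plot.
    """
    mfe_sd = sds_below_mean([mfe for mfe, _ in points])
    z_sd = sds_below_mean([z for _, z in points])
    return [i for i, (a, b) in enumerate(zip(mfe_sd, z_sd)) if a >= k and b >= k]
```

Requiring both conditions is deliberately conservative: a window can have a very low MFE simply because it is GC-rich, and the z-score axis guards against that.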
Looking still more broadly, a similar distribution is observed across each genome examined.
Furthermore, between 1 and 5 potential -1 PRF signal candidates are found per open reading
frame, while the percentage classed as ‘significant’ by these metrics remains near 10% across
all species.
Looking at the distribution of -1 frame extensions relative to the open reading frame yields an
observation that informs the central hypothesis of this work: unlike the known viral -1 PRF
signals, very few of these candidates extend beyond a few codons. In fact, if we count the
number of codons explicitly, very few of the -1 frame extensions are longer than 30 codons.
Using this same plot, though, it is possible to identify a previously characterized -1 frameshift
signal from mouse that has similarity to known retroelements.
Thus if these candidate -1 frameshift elements are in fact active, the great majority of them
would direct translating ribosomes to a proximal -1 frame termination codon.
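Counting the -1 frame extension is simple bookkeeping once the slip site is known. The sketch below assumes simultaneous slippage on an X XXY YYZ heptamer: after the shift, the tRNAs pair with XXX and YYY, so the next -1 frame codon begins at Z. The function name and the convention of counting sense codons before the stop are illustrative assumptions, not the PRFdb’s definition.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def minus1_extension_codons(mrna: str, site: int) -> Optional[int]:
    """Count sense codons read in the -1 frame before the first -1 frame stop.

    `site` is the 0-based start of an in-frame X XXY YYZ heptamer. After
    slipping, the ribosome decodes XXX and YYY, so the next -1 frame codon
    begins at site + 6. Returns None if the sequence ends before a stop.
    """
    rna = mrna.upper().replace("T", "U")
    for n, pos in enumerate(range(site + 6, len(rna) - 2, 3)):
        if rna[pos:pos + 3] in STOP_CODONS:
            return n
    return None
```

A candidate whose count is small (for example, under 30 codons) is the genomic, proximal-stop case described above; a long or open-ended count resembles the viral pattern.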
Another way to visualize the contrast between normal translation and -1 frameshifting in what
we term the genomic and viral contexts is again to imagine a translating ribosome paused at a
strong secondary structure. Most of the time, the ribosome stays in the correct reading frame
and continues on to translate the 0-frame product; this is also the most common fate on a viral
-1 frameshift signal. When a ribosome does shift in the genomic -1 PRF context, it usually
terminates shortly afterward at a proximal -1 frame stop codon. In the viral context, by contrast,
as mentioned, the shifted ribosome goes on to translate a C-terminal extension.
Another way to examine the differences between viral and genomic frameshifting is to plot the
number of potential frameshift signals observed along the length of the open reading frame,
shown in these plots in red. If all -1 PRF signals followed the pattern established by the viral
examples, we would expect most of them to occur at the 3’ end of the ORF; instead we observed
the opposite, with far fewer instances at the end. In green, against the right axis, is the
percentage of this population whose -1 frame extension exceeds 30 amino acids. Interestingly,
in every species with a moderately large sample size, a sharp increase is evident in the final
10% of the ORF. This suggests, to me at least, that there is selective pressure against viral
retroelements in genomic sequences in general, but that some have nonetheless become
established in genomic sequences.
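The red and green curves can be approximated by binning candidates by relative position in their ORF. This sketch assumes each candidate is already annotated as (site, ORF length, -1 frame extension in codons); the 10 bins and the 30-codon threshold follow the talk, while the function and tuple layout are hypothetical.

```python
from typing import Optional


def positional_profile(candidates: list[tuple[int, int, int]], n_bins: int = 10,
                       min_codons: int = 30) -> tuple[list[int], list[Optional[float]]]:
    """Bin candidates by relative position within their ORF.

    Each candidate is (site, orf_length, extension_codons). Returns the number
    of candidates per bin and, per bin, the percentage whose -1 frame
    extension exceeds `min_codons` (None for empty bins).
    """
    counts = [0] * n_bins
    long_ext = [0] * n_bins
    for site, orf_length, ext in candidates:
        # Integer arithmetic maps the site into bins 0..n_bins-1;
        # min() keeps a site at the very end in the final bin.
        b = min(n_bins * site // orf_length, n_bins - 1)
        counts[b] += 1
        if ext > min_codons:
            long_ext[b] += 1
    percents = [100.0 * n_long / c if c else None
                for n_long, c in zip(long_ext, counts)]
    return counts, percents
```

Normalizing by relative rather than absolute position lets ORFs of very different lengths share one plot, which is what makes a feature like the rise in the final 10% visible across species.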
With all that in mind, I would like to give a short tour of the predicted ribosomal frameshift
signal database. This is the front page: it provides links to some metrics for a few completed
genomes on the left, some background, and a series of tables that allow one to search for
specific sequences, look over distributions of PRF signals, filter the database, download data,
import new sequences, or fold sequences de novo using programs like mfold, pknots, etc. The
two most commonly used buttons are search and distribution. If one clicks search, the interface
provides a keyword search (into which I typed ‘EST2’). It is also possible to search for a
specific GenBank or SGD accession or an HGNC gene name. Finally, one can provide a
nucleotide or protein sequence to perform a BLAST search against the local database or
against NCBI’s GenBank.
When I hit the search button, I am presented with the EST2 hits from various yeast species.
From here I can link out to the SGD or NCBI to learn more about an accession, or view the
detailed information for this gene. Doing so brings up a summary for this ORF as well as a
formatted sequence window showing the positions of the candidate -1 PRF signals in this ORF.
In this case I chose the candidate at position 1653, which in turn loads a series of structure
predictions, one of which is shown here as a linear Feynman diagram. To the right is the
distribution of MFEs for 100 randomized sequences with the same nucleotide composition as
the original sequence window. In this distribution the mean is shown as a black bar, the
idealized normal distribution in red, and the actual predicted MFE in green; the difference
between the black and green bars is the basis of the z-score, which is given explicitly at the
left along with the predicted MFE and GC content. A few other links are also provided,
including download links for the sequence and the secondary structure predictions.
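Presumably the z-score is the usual standard score of the native MFE against the randomized MFEs, scaled by their standard deviation (which is why the idealized normal curve is drawn). A minimal sketch under that assumption, with a hypothetical function name:

```python
from statistics import mean, stdev


def mfe_z_score(native_mfe: float, randomized_mfes: list[float]) -> float:
    """z = (native MFE - mean randomized MFE) / SD of randomized MFEs.

    A negative z means the native window folds more stably than shuffled
    sequences with the same nucleotide composition.
    """
    return (native_mfe - mean(randomized_mfes)) / stdev(randomized_mfes)
```

For instance, a native MFE of -16.0 kcal/mol against randomized MFEs of -8.0, -10.0 and -12.0 (mean -10.0, sample SD 2.0) gives z = -3.0.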
The data generated by this pipeline are used to inform the later analyses. It is worth noting,
however, that RNA structure prediction that allows general pseudoknots is NP-hard, so it is not
feasible to accurately predict the structure of an entire ORF. Instead, we can break this problem
down into small, parallelizable pieces. Doing so results in a tremendous population of potential
candidates, which we narrow down using just a couple of simple metrics. When we looked at
the -1 frame extensions predicted for these candidates, we found that the great majority of them
end at proximal termination codons. Furthermore, the distribution of these sequences along the
ORF is informative, though exactly what it is telling us I am not sure. Finally, this dataset is
relatively easy to cross-correlate with other data types, such as microarray data or the torrent of
next-generation sequencing data, in order to find promising candidates.
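Breaking the ORF into independent pieces can be sketched as cutting one fixed-length window per candidate site and folding each window separately. The 100-nucleotide window length is a hypothetical default, and fold is again a placeholder for a structure-prediction call. A thread pool is adequate if fold shells out to an external program, since the heavy work then happens outside the Python interpreter; for pure-Python folding a process pool would be the better choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def windows_at_sites(mrna: str, sites: list[int], length: int = 100) -> dict[int, str]:
    """One window per candidate site; windows near the 3' end may be shorter."""
    return {site: mrna[site:site + length] for site in sites}


def fold_windows(windows: dict[int, str], fold: Callable[[str], float],
                 max_workers: int = 4) -> dict[int, float]:
    """Fold every window independently and key the results by site."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so they line up with the keys.
        energies = list(pool.map(fold, windows.values()))
    return dict(zip(windows.keys(), energies))
```

Because no window depends on another, the same decomposition extends naturally from threads on one machine to many compute nodes pulling work from a shared database.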
At this point, let’s shift gears and look at the fate of some candidates provided by the
computational pipeline in yeast. To do so, we evaluated their ability to promote apparent
frameshifting in vivo with the dual luciferase assay system. We then used the PGK1 reporter
system to monitor steady-state abundance and mRNA decay rates. We used some tools
provided by yeast genetics to test the contributions of specific decay pathways. Finally, we
focused on the PRF signals in the EST2 gene to look for effects on telomere homeostasis.
This illustrates the yeast dual luciferase reporter system used in the Dinman lab and some
candidate -1 PRF signals that were cloned into it, including the viral -1 PRF signal from L-A.
We used these constructs to ask whether the candidate sequences promote significant levels of
frameshifting. We set the readthrough control, in which firefly is in frame with Renilla, to 100%.
When firefly is out of frame with respect to Renilla, we observe approximately 1 event in 1,000
in this reporter, while the L-A reporter promotes approximately 10% frameshifting. All of the
candidate sequences promote frameshifting at least an order of magnitude more than the
out-of-frame control, even FKS1, which is predicted to form only a stem-loop. As a conservative
estimate, we define significant as greater than 1%, and by this metric 4 of these candidates
promote significant frameshifting in vivo.
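The percentages above follow the usual dual-luciferase normalization: the firefly/Renilla ratio of a test construct divided by the same ratio for the readthrough control. A minimal sketch, assuming background-subtracted luminescence readings (replicate averaging and the function names are left as assumptions):

```python
def percent_frameshift(test_fluc: float, test_rluc: float,
                       readthrough_fluc: float, readthrough_rluc: float) -> float:
    """Firefly/Renilla ratio of a test construct as a percentage of the
    readthrough control's ratio (the control is defined as 100%)."""
    test_ratio = test_fluc / test_rluc
    control_ratio = readthrough_fluc / readthrough_rluc
    return 100.0 * test_ratio / control_ratio


def is_significant(percent: float, threshold: float = 1.0) -> bool:
    """Apply the conservative cutoff of more than 1% frameshifting."""
    return percent > threshold
```

Dividing by Renilla first cancels differences in transcription, mRNA stability and translation initiation between constructs, so the remaining ratio reflects how often ribosomes reach firefly in the shifted frame.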
I’d like to remind everyone of one of the previous observations: nearly all genomic -1 PRF
events lead to proximal stop codons, and this is also true for these candidates. Therefore, if we
once again imagine translating ribosomes shifting reading frame on these messages, the
ribosomes will quickly terminate at the proximal stop codons. Eukaryotes have a well-established
pathway for rapidly degrading messages on which ribosomes terminate prematurely, called
nonsense-mediated decay (NMD). Thus we hypothesize that these messages are substrates for
NMD. Yeast also have a pathway termed ‘no-go’ decay (NGD), which frees stalled ribosomes by
cleaving the mRNA, leaving it a substrate for rapid decay. Because -1 PRF involves a significant
ribosomal pause, we hypothesize that some of these mRNAs are also substrates for NGD.
In order to test these hypotheses, we used the PGK1 reporter system. This is an excellent
mRNA stability reporter, partly because the phosphoglycerate kinase mRNA is so abundant
and stable, but also because it has been used in many previous studies of mRNA stability.
We cloned some exogenous sequence into this mRNA, à la the dual luciferase reporter, to
distinguish our copy from the endogenous copy, and then inserted the candidate sequences
along with a premature termination codon (PTC) control.
We used these reporters to query the steady-state abundance of these candidates, with the U3
snoRNA as a loading control. On the left is the extremely abundant readthrough control,
followed by the much less abundant premature termination codon control. As you can see, all
of the candidates we tested in wild-type cells were much less abundant than the readthrough
control. We were further able to use a temperature-sensitive version of RNA polymerase II to
perform time-course analyses and measure the decay rates of some of these. In green and red
I plotted the decay of the readthrough and PTC-containing constructs. As you can see in the
blots, the readthrough abundance remained extremely high over time, while the PTC control’s
did not. Similarly, the abundance of the EST2 mRNA decreased significantly over time. When I
repeated the experiment in cells deficient in nonsense-mediated decay, all of these messages
were stabilized over time, establishing that they are in fact substrates for NMD. Another way to
ask the same question is to perform steady-state abundance assays for all the candidates in
NMD-deficient cells; when I did so, three increased in abundance by quite a lot, especially EST2.
Similarly, we asked whether these candidates are substrates for no-go decay. Comparing that
result against the NMD blot shows that once again EST2 is strongly increased in abundance, as
is BUB3. Thus, of these candidates, I think either the EST2 or the BUB3 signal is the strongest
substrate for no-go decay.
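Decay rates from a transcriptional shut-off time course are commonly summarized as a half-life by assuming first-order decay. The sketch below assumes band intensities have already been normalized to the loading control; it fits ln(abundance) against time by ordinary least squares and converts the rate constant k to t1/2 = ln 2 / k. The function name is hypothetical.

```python
import math


def half_life(times: list[float], abundances: list[float]) -> float:
    """Estimate mRNA half-life from a first-order decay time course.

    Fits ln(abundance) = a - k * t by ordinary least squares and returns
    ln(2) / k, in the same time units as `times`.
    """
    logs = [math.log(a) for a in abundances]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    k = -sxy / sxx  # decay constant; the fitted slope is -k
    return math.log(2) / k
```

Fitting on the log scale weights every time point equally in relative terms, which suits blots where late, faint bands carry as much information about the rate as early, strong ones.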
We chose to look further at EST2, not only because of the results in the previous experiments,
but also because of its importance to the cell. EST2 encodes the reverse transcriptase subunit
of the telomerase holoenzyme. Est2, along with Est1, Est3, and a guide RNA, is responsible for
maintaining chromosome ends in the cell. These components are recruited to chromosome
ends by Cdc13 and Stn1, which is notable because STN1 and EST1 also harbor functional -1
frameshift signals, while EST3 has a known +1 frameshift signal.
Diagrammed here are 5 of the most likely candidate frameshift signals from the full-length
EST2 gene. We cloned these into the dual luciferase reporter system and found that, in addition
to the previously characterized signal at position 1653, the signal at position 1215 is also
functional. Rather than test these with the PGK1 reporter system, I made silent mutations in a
full-length clone of the EST2 reading frame that we hypothesized would disrupt the ability of
translating ribosomes to shift reading frame. I then queried the steady-state abundance of the
full-length EST2 ORF by qPCR in wild-type and NMD-deficient cells and found really strong
increases in abundance.
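The talk does not say how the qPCR data were normalized; one common choice is the 2^-ΔΔCt method, sketched below under the assumption of roughly 100% amplification efficiency. The reference gene and the function name are hypothetical.

```python
def fold_change_ddct(ct_target: float, ct_reference: float,
                     ct_target_control: float, ct_reference_control: float) -> float:
    """Relative expression by the 2^-ΔΔCt method (assumes ~100% PCR efficiency).

    ΔCt = Ct(target) - Ct(reference gene), computed for the sample and the
    control; ΔΔCt = ΔCt(sample) - ΔCt(control).
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)
```

For example, a mutant sample whose target crosses threshold 2 cycles earlier than wild type, relative to an unchanged reference gene, gives a 4-fold increase.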
If inactivating NMD increases the steady-state abundance of EST2, what happens if the overall
rate of frameshifting is decreased? To test that, we measured the amount of EST2 mRNA in a
wild-type strain and in two isogenic strains carrying mutant forms of ribosomal protein L3, one
that frameshifts more often than wild type and one that frameshifts less. In this case at least,
when the rate of frameshifting decreased, EST2 mRNA abundance increased, and vice versa.
In an attempt to understand the significance of these changes in EST2 abundance, we
performed a follow-up to an experiment previously done by the Bermon lab. On the left, at the
bottom of lanes 1 and 3, are the relative sizes of telomeres in wild-type and NMD-deficient cells:
telomeres are shorter without active NMD. I repeated that experiment on the right in lanes 2
and 4, and then in lane 3 I tried again with cells harboring the silent EST2 mutations and
observed an intermediate phenotype. Thus I would like to suggest that -1 PRF affects telomere
maintenance in yeast.
Taken together, the work we performed in yeast yielded some pretty interesting observations.
For one thing, the computational pipeline seems to work, at least somewhat: we were able to
find functional -1 frameshift signals with it. Looking more closely at these candidates, we found
that they act as mRNA destabilization elements via NMD and NGD. Looking more closely at
EST2, we found 2 strong -1 PRF signals and were able to establish a correlation between
frameshifting efficiency and mRNA abundance thanks to Arturas’ L3 strains. And finally, silent
mutations in the EST2 mRNA lead to changes in telomere length that are mediated by NMD.