Hello, my name is Trey Belew, and I would like to share with you some of our work in the Dinman laboratory. I’d like to tell the story of mRNA destabilization mediated by programmed -1 ribosomal frameshifting in four parts: some background information, an introduction to the computational pipeline we developed, and complementary studies of the effects of -1 programmed ribosomal frameshifting (-1 PRF) in yeast and in mammalian cell culture.

Dr. Dinman’s lab is first and foremost one which studies translation, and my work concerns a cotranslational process. I will therefore highlight the most relevant aspects of protein translation before looking at the elements and mechanisms of a frameshift signal, some examples, and how history informed the general hypothesis of this work. In order to correctly translate an mRNA into a peptide, the ribosome must correctly recognize the start codon, maintain the proper reading frame, and correctly recognize the end of the message. My work focuses on what happens when this process is derailed and the translating ribosome shifts into an alternate reading frame.

A -1 PRF signal contains three elements: a slippery heptamer, followed by a spacer region of appropriate length, followed by a strong RNA structure. These are arranged so that an actively translating ribosome is forced to pause by the RNA secondary structure while it is positioned over the slippery heptamer. In some cases the tRNAs unpair from the mRNA and re-pair in the -1 frame, while in others the ribosome makes an incomplete translocation because forward motion is frustrated by the downstream secondary structure. In each case, however, the ribosome eventually denatures the downstream structure and continues translation in the new reading frame. Though -1 PRF signals have been characterized in many viruses since being discovered by Jacks and Varmus, I am showing just three here, ranging from the very simple yeast L-A totivirus to the significantly more complex HIV.
In each case, ribosomes usually stop at the 0-frame stop codon, thus translating the structural protein. However, they stochastically change reading frame at the programmed frameshifting signal and continue on to translate the enzymatically active carboxy-terminal extension protein. Thus the virus is able to manipulate the host’s translational machinery to provide two products for the cost of a single mRNA.

This is far from the only molecular mechanism first described in viruses. From the Hershey-Chase experiments showing that nucleic acids are the heritable material to the discovery of the IRES, viruses have been instrumental in discovering new mechanisms which were eventually characterized in all kingdoms of life. The process of characterizing frameshifting in genomic sequences is already underway. Here are just a few examples of frameshifting signals found in cellular genes in all kingdoms of life, suggesting that frameshifting has implications which stretch beyond viruses. Therefore we hypothesized that cellular gene expression is post-transcriptionally regulated by -1 PRF. In order to test this hypothesis, we implemented a computational pipeline to search for candidate -1 PRF signals in genomic sequences and assayed these candidates in the laboratory.

I would first like to describe the design and implementation of this computational pipeline and show some examples of what it has found so far. The PRFdb first imports a population of sequences from other sequence databases like the SGD or NCBI; usually this population covers the genome of a single organism. The PRFdb applies some simple pattern matches against this population to quickly exclude inappropriate sequences. The remaining candidates are passed to rnamotif to exclude those which have no potential to form even the most permissive pseudoknot.
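A minimal sketch of what such a simple pattern match could look like is given here. The exact patterns the PRFdb applies are not stated in this talk, so this assumes the commonly cited slippery heptamer consensus X XXY YYZ (XXX any three identical bases, YYY either AAA or UUU, Z anything but G) and requires that the heptamer sit in the right register relative to the 0 frame. The function name and the example sequences are hypothetical.

```python
import re

# Assumed consensus X XXY YYZ: XXX = three identical bases, YYY = AAA or TTT,
# Z = any base except G. A lookahead is used so overlapping sites are reported.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2(AAA|TTT)[ACT]))")


def find_slippery_sites(orf: str) -> list[tuple[int, str]]:
    """Return (0-based start, heptamer) for candidate sites in an ORF.

    The ORF is assumed to begin at its start codon, so 0-frame codons start
    at indices 0, 3, 6, ... The 0-frame codons inside the heptamer are XXY
    and YYZ, which means the first X is the last base of the preceding codon.
    """
    seq = orf.upper().replace("U", "T")
    hits = []
    for match in SLIPPERY.finditer(seq):
        start = match.start()
        if start % 3 == 2:  # first X must be the third base of a 0-frame codon
            hits.append((start, match.group(1)))
    return hits


if __name__ == "__main__":
    # ATG GCG GGT TTA GGC: the heptamer G GGT TTA starts at index 5.
    print(find_slippery_sites("ATGGCGGGTTTAGGC"))
```

Sites that match the heptamer but sit in the wrong register are dropped, which is one cheap way to discard windows before any structure prediction is attempted.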
Positive candidates are passed to a series of RNA structure prediction programs which provide a predicted structure and minimum free energy (MFE). This is followed by a series of randomizations and refoldings in order to generate a population of sequences against which to compare the prediction. All the data from this process, including the sequence predictions and randomizations, are then stored in a database and used for future tests, including cross references against other datasets such as microarray or (now) next-generation sequencing projects. Finally, we choose candidate sequences to bring into the laboratory, where we assay them for the ability to promote frameshifting in vivo and for effects on mRNA abundance and stability.

The PRFdb currently runs on up to 2000 CPUs provided by the NCI and the University of Maryland high performance computing center. I limit the code to using fewer than 100 CPUs, however, to avoid being a pain. It is the responsibility of the compute nodes to download the sequences, run the pipeline, and upload the results to a synchronously replicating database. This database in turn may be queried by any number of webservers to provide a (hopefully friendly) interface to the resulting data. Currently, the PRFdb contains approximately half a million sequences from about 50 completed genomes in all kingdoms of life. Its current implementation, given sufficient space to grow, can easily scale to the entirety of GenBank.

When this pipeline has finished examining a genome (in this case Saccharomyces cerevisiae), we are left with a huge distribution of candidate sequence windows. Here I have plotted the predicted difference from random (z-score) with respect to minimum free energy. The axes in black denote the means of the yeast population; therefore as we move from top right to bottom left we look at more and more significant candidates with respect to both minimum free energy and statistical significance.
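The z-score mentioned above can be sketched as follows: shuffle the window so that its nucleotide composition is preserved, refold each shuffled copy, and express the native MFE in standard deviations from the shuffled mean. This is only an illustration under stated assumptions. The real pipeline calls external folding programs, whereas `toy_fold` below is a hypothetical stand-in that merely counts mirrored Watson-Crick pairs and is not a real energy model. The function names, the seed, and the default of 100 shuffles are likewise illustrative choices.

```python
import random
import statistics
from typing import Callable

WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "T"), ("T", "A")}


def toy_fold(seq: str) -> float:
    """Placeholder 'energy' (NOT a real folding algorithm): -1 for every
    position that can pair with its mirror image, as in a perfect hairpin."""
    half = len(seq) // 2
    return -float(sum((seq[i], seq[-1 - i]) in WATSON_CRICK for i in range(half)))


def mfe_z_score(window: str,
                fold: Callable[[str], float] = toy_fold,
                n_random: int = 100,
                seed: int = 0) -> float:
    """z = (native MFE - mean of shuffled MFEs) / sd of shuffled MFEs."""
    rng = random.Random(seed)
    native = fold(window)
    shuffled_energies = []
    for _ in range(n_random):
        bases = list(window)
        rng.shuffle(bases)  # same composition, scrambled order
        shuffled_energies.append(fold("".join(bases)))
    sd = statistics.stdev(shuffled_energies)
    if sd == 0:
        raise ValueError("shuffled energies have no spread; z-score undefined")
    return (native - statistics.mean(shuffled_energies)) / sd


if __name__ == "__main__":
    # A perfect hairpin scores far below its shuffled copies, so z is negative.
    print(mfe_z_score("GGGGGGAAAACCCCCC"))
```

A strongly negative z-score means the native window is predicted to fold more stably than sequences of the same composition, which corresponds to the bottom-left region of the plot. Swapping `toy_fold` for a wrapper around a real folding program would be the obvious next step.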
The graphs also come with little gray lines denoting 1 and 2 standard deviations from the mean on each axis. In addition, the PRFdb provides histograms showing the population of candidates with respect to MFE and z-score; this one shows minimum free energy. Looking still more broadly, a similar distribution is observed across each genome examined. Furthermore, between 1 and 5 potential -1 PRF signal candidates are found per open reading frame, but the percentage scored as ‘significant’ using these metrics remains near 10% across all species.

Looking at the distribution of -1 frame extensions with respect to open reading frame provides an observation which informs the central hypothesis of this work: unlike the observed viral -1 PRF signals, very few of these candidates extend beyond a few codons. In fact, if we count the number of codons explicitly, very few of the -1 frame extensions are longer than 30 codons. Using this same plot, though, it is possible to identify a previously characterized -1 frameshift signal from mice which has similarity to known retroelements. Thus if these candidate -1 frameshift elements are in fact active, then a great majority of them lead translating ribosomes to a proximal -1 frame termination codon.

Another way to visualize the contrast between normal translation and -1 frameshifting in what we term the genomic and viral contexts is to again imagine a translating ribosome paused at a strong secondary structure. In the genomic -1 PRF context, a ribosome which shifts usually terminates shortly afterwards at a proximal -1 frame stop codon. Most of the time, however, the ribosome stays in the correct reading frame and continues on to translate the 0-frame product; this is also the most common fate when translating a viral -1 frameshifting signal. In the viral context, as mentioned, the ribosome sometimes shifts reading frame and translates a C-terminal extension.
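Counting the length of a -1 frame extension, as described above, can be sketched in a few lines. This is an illustrative reconstruction rather than the PRFdb's own code. It assumes the slip happens on the heptamer X XXY YYZ, so that the first new -1 frame codon begins at Z, six bases past the heptamer start. The function name and the example sequences are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def minus_one_extension(mrna: str, slip_start: int) -> Optional[int]:
    """Count sense codons read in the -1 frame before the first stop codon.

    slip_start is the 0-based index of the first base of the slippery
    heptamer. After the shift the tRNAs sit on XXX and YYY, so the next
    codon read starts at the Z position (slip_start + 6).
    Returns None if no -1 frame stop codon occurs before the sequence ends.
    """
    seq = mrna.upper().replace("U", "T")
    count = 0
    for i in range(slip_start + 6, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return None


if __name__ == "__main__":
    # Heptamer G GGT TTA at index 5; -1 frame then reads AGC, TGA (stop).
    print(minus_one_extension("ATGGCGGGTTTAGCTGACTAA", 5))  # one sense codon
```

With a helper like this, the 30-codon cutoff used in the plots becomes a simple comparison on the returned count, with `None` treated as an extension that runs off the end of the available sequence.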
Another way to examine the differences between viral and genomic frameshifting is to plot the number of observed potential frameshift signals with respect to position in the open reading frame, shown in these plots in red. If indeed all -1 PRF signals followed the pattern established by the viral signals, then we would expect most of these to occur at the 3’ end of the ORF, but instead the opposite was observed: far fewer instances are found at the end. In green, against the right axis, is plotted the percentage of this population which extends by more than 30 amino acids. Interestingly, in every species with a moderately large sample size, a sharp increase is evident in the final 10% of ORFs. This suggests, to me at least, that there is a selective pressure against viral retroelements in genomic sequences in general, but that some have been established in genomic sequences.

With all that in mind, I would like to give a short tour of the predicted ribosomal frameshift signal database. This is the front page. It provides links showing some metrics for a few completed genomes to the left, some background, and a series of tables which allow one to search for specific sequences, look over distributions of PRF signals, filter the database, download data, import new sequences, or fold sequences de novo using programs like mfold, pknots, etc. The two most commonly used buttons are search and distribution. If one clicks search, the interface provides a keyword search (into which I typed ‘EST2’). It is also possible to search for a specific GenBank or SGD accession or HGNC gene name. Finally, it is possible to provide a nucleotide or protein sequence to perform a BLAST search against the local database or against NCBI’s GenBank. When I hit the search button, I am presented with the EST2 hits from various yeast species. From here I can link out to the SGD or NCBI to learn more about this accession, or view the detailed information for this gene.
Doing so brings up a summary for this ORF as well as a formatted sequence window showing the positions of the candidate -1 PRF signals in this ORF. In this case I chose the candidate at position 1653, which in turn loads a series of structure predictions, one of which is shown here as a linear Feynman diagram. To the right is the distribution of MFEs for 100 randomized sequences with the same nucleotide composition as the original sequence window. In this distribution the mean is shown as a black bar, the idealized normal distribution is shown in red, and the actual predicted MFE is in green; thus the difference between the black and green bars informs the z-score, which is explicitly provided at the left, along with the predicted MFE and GC content. A few other links are also provided, including sequence and secondary structure prediction download links.

The data generated by this pipeline are used to inform the later analyses. It is worth noting, however, that mRNA structure prediction is NP-hard (at least once pseudoknots are allowed), and so it is not feasible to attempt an accurate prediction for an entire ORF. However, we can break this problem down into small parallelizable pieces. Doing so results in a tremendous population of potential candidates, which we narrow down using just a couple of simple metrics. When we looked at the -1 frame extensions predicted for these candidates, we found that a great majority of them result in proximal termination codons. Furthermore, the distribution of these sequences over the ORF is informative, though exactly what it is saying I am not sure. Finally, this dataset is relatively easy to cross-correlate with other data types like microarray data or the torrent of next-generation sequencing data in order to find candidates.

At this point, let’s shift gears and look at the fate in yeast of some candidates provided by the computational pipeline. To do so, we evaluated their ability to promote apparent frameshifting in vivo with the dual luciferase assay system.
We then used the PGK1 reporter system to monitor steady state abundance and mRNA decay rate. We used some tools provided by yeast genetics to test the effects of specific decay pathways. Finally we focused upon the PRF signals in the EST2 gene to look for effects on telomere homeostasis.

This illustrates the yeast dual luciferase reporter system used in the Dinman lab and some candidate -1 PRF signals which were cloned into it, including the viral -1 PRF signal from L-A. We used these constructs to ask whether the candidate sequences promote significant levels of frameshifting. We set a readthrough control to 100%. When firefly is out of frame with respect to Renilla, we observe approximately 1 event in 1,000 in this reporter, while the L-A reporter promotes approximately 10% frameshifting. All of the candidate sequences promote frameshifting at least an order of magnitude more than the out-of-frame control, even FKS1, which is predicted to form only a stem-loop. As a conservative estimate, we define significant as greater than 1%, and by this metric 4 of these candidates promote significant frameshifting in vivo.

I’d like to remind everyone of one of the previous observations: nearly all genomic -1 PRF events lead to proximal stop codons, which is also true for these candidates. Therefore, if we once again imagine translating ribosomes shifting reading frames on these messages, the ribosomes will quickly terminate on the proximal stop codons. Eukaryotes have a well-established pathway, called nonsense-mediated decay (NMD), ready to rapidly degrade messages on which translation terminates prematurely. Thus we hypothesize that these messages are substrates for decay via NMD. Yeast also have a pathway termed ‘no-go’ decay (NGD), which serves to free trapped ribosomes by cleaving the mRNA and leaving it a substrate for rapid decay. Because -1 PRF involves a significant ribosomal pause, we hypothesize that some of these mRNAs are also substrates for NGD.
In order to test these hypotheses, we used the PGK1 reporter system. This is an excellent mRNA stability reporter, partly because the phosphoglycerate kinase mRNA is so abundant and stable, but also because it has been used in many previous studies of mRNA stability. We cloned into this mRNA some exogenous sequence, à la the dual luciferase reporter, to differentiate our copy from the endogenous copy, and then inserted the candidate sequences along with a premature termination codon (PTC) control.

We used these reporters to query the steady state abundance of these candidates. The U3 snoRNA was used as a loading control. To the left is the extremely abundant readthrough control followed by the much less abundant PTC control. As you can see, all of the candidates we tested in wild type cells were much less abundant than the readthrough. We were further able to use a temperature-sensitive version of RNA polymerase II in order to perform time course analyses and measure the decay rate of some of these. In green and red I plotted the decay of the readthrough and PTC-containing constructs. As you can see in the blots, the readthrough abundance remained extremely high over time while the PTC control did not. Similarly the abundance of the EST2 mRNA decreased significantly over time.

When I repeated the experiment in cells deficient in nonsense-mediated decay, all of these messages were stabilized over time, thus establishing that these are in fact substrates for NMD. Another way to ask the same question is to perform steady state abundance assays for all the candidates in NMD-deficient cells. When I did so, three increased substantially in abundance, especially EST2. Similarly, we asked whether these candidates are substrates for no-go decay. Comparing that result against the NMD blot shows that once again EST2 is strongly increased in abundance, as is BUB3.
Thus of these candidates, I think either the EST2 or the BUB3 signal is the strongest substrate for no-go decay. We chose to look further at EST2, not only because of the results in the previous experiments, but also because of its importance to the cell. EST2 encodes the reverse transcriptase portion of the telomerase holoenzyme. It, along with Est1, Est3, and a guide RNA, is responsible for maintaining chromosome ends in the cell. They are recruited to the chromosome ends by Cdc13 and Stn1; this is notable because STN1 and EST1 also harbor functional -1 frameshift signals, while EST3 has a known +1 frameshifting signal.

Here are diagrammed 5 of the most likely candidate frameshifting signals from the full-length EST2 gene. We cloned these into the dual luciferase reporter system and found that, in addition to the previously characterized signal at position 1653, the PRF signal at position 1215 is also functional. Rather than test these with the PGK1 reporter system, I made silent mutations in a full-length clone of the EST2 reading frame which we hypothesized would disrupt the ability of translating ribosomes to shift reading frame. I then queried the steady state abundance of the full-length EST2 ORF with qPCR in wild type and NMD-deficient cells and found really strong increases in abundance.

If inactivating NMD increases the steady state abundance of EST2, what happens if the overall rate of frameshifting is decreased? To test that, we queried the amount of EST2 in a wild type strain and two isogenic mutant strains of L3, one which frameshifts more often than wild type and one less. In this case at least, when the rate of frameshifting decreased, the mRNA abundance of EST2 increased, and vice versa. In an attempt to understand the significance of these changes in EST2 abundance, we attempted a follow-up to an experiment previously performed by the Bermon lab.
On the left, at the bottom of lanes 1 and 3, are the relative sizes of telomeres in wild type cells and in cells which are NMD deficient. Thus telomeres are shorter without active NMD in the cell. I repeated that experiment on the right in lanes 2 and 4, then in lane 3 I tried again with cells harboring the silent EST2 mutations and observed an intermediate phenotype. Thus I would like to suggest that -1 PRF affects telomere maintenance in yeast.

Taken together, the work we performed in yeast netted some pretty interesting observations. For one thing, the computational pipeline seems to work, at least somewhat: we were able to find functional -1 frameshift signals with it. Looking more closely at these candidates, we found that they act as mRNA destabilization elements via NMD and NGD. Looking more closely at EST2, we found 2 strong -1 PRF signals and were able to establish a correlation between frameshifting efficiency and mRNA abundance thanks to Arturas’ L3 strains. And finally, making silent mutations to the EST2 mRNA leads to changes in telomere length which are mediated by NMD.