Does a folded RNA have an inside and an outside? Latham and Cech (1994) used hydroxyl radical probing to answer this question for a tRNA molecule. Riboses buried in tertiary structure will not be accessible to the hydroxyl radical. The tRNA was enzymatically synthesized, so it contained no modified bases. Its end was labeled with 32P; the RNA was folded by heating and cooling and then incubated with Fe(EDTA) and peroxide.

[Fe(EDTA)]2- + H2O2 → [Fe(EDTA)]- + •OH + OH-

Fenton chemistry is the generation of a hydroxyl radical by Fe(EDTA), which is oxidized by peroxide. The neutral hydroxyl radical abstracts a hydrogen from deoxyribose or ribose, resulting in strand scission. Preferred sites are H5′ > H4′ > H3′ ≈ H2′ (deoxyribose).

[Figure legend: blue, H4′; red, H5′,5″. SASA: solvent-accessible surface area.]

Cleavage is determined by solvent accessibility. For the famous Drew-Dickerson DNA dodecamer, cleavage efficiency reveals how sequence dependence in B-form structure exposes or protects the deoxyribose. (Bishop et al., Chemical Biol in press; Tullius lab.)

The pattern of protection is mapped onto the secondary structure. Riboses of nucleotides whose bases are involved in tertiary interactions are not cleaved.

The 414-nucleotide Tetrahymena Group I intron is one of the most extensively studied RNAs. It has been the model RNA system for folding studies for the past 20 years. Ask first: does it have an inside and an outside? Proteins have solvent-accessible surfaces and internal cores; is this true for a large folded RNA?

Without Mg2+ there is uniform cleavage over the entire RNA. With Mg2+ some areas are preferentially cleaved and some are completely protected. The differences between the traces were mapped onto the secondary structure of the intron. Protected sites are shaded. Cleaved sites are outlined.

The authors made several points: Proteins have tightly packed interiors as a natural consequence of their nonpolar and hydrophobic amino acid side chains, which avoid solvation.
In contrast, the planar bases are in the middle of an RNA duplex, and the anionic phosphates and polar sugars are on the outside. How does this structure lend itself to compaction? The authors suggest that tertiary hydrogen-bonding interactions among bases, sugars, and phosphates, like those in tRNA, will be present in the structure of the Group I intron. Stacking interactions also contribute to tertiary interactions. “Finally, magnesium ions, neutralizing the anionic phosphates and perhaps bridging helices, could allow the backbones of different helices to be packed close together”.

In 1998 the probing experiment was repeated with two differences: folding was followed in time, and the hydroxyl radicals were generated by an X-ray beam.

Experiment: prepare end-labeled RNA. Mix it in the stopped-flow apparatus with Mg2+ to a final concentration of 10 mM. After mixing, start sampling by irradiating with the beam and collecting samples. Time resolution is 10 ms. Run the samples out on a denaturing polyacrylamide gel and quantify the cleavage (protection) with time. (Sclavi et al., 1998. Science 279: 1940-1945.)

Here are the data describing the time dependence of protection of sites. Ybar is the fractional saturation of single protected sites, determined from the power dependence of the beam:

p = p_lower + (p_upper - p_lower) Ybar
Ybar = 1 - e^(-kt)

Here p is the saturation, p_upper and p_lower are the upper and lower limits of the transition curve, k is the first-order rate constant, and t is time in seconds.

Curve A is protection of nt 174-176: k = 2.7 (-1.3, +1.8) s-1
Curve B is nt 183-189: k = 0.9 (±0.3) s-1
Curve C is nt 57-59: k = 0.20 (±0.05) s-1
Open symbols are controls of RNA pre-equilibrated with Mg2+.

Map these data onto the secondary structure. P indicates a duplex. Regions with similar folding rates are color-coded.
Green is fast: 2 s-1.
Orange has a tetraloop/receptor and a fast folding rate of ~1 s-1.
Pink folds slower: 0.2 s-1.
Yellow folds slowest: 0.06 s-1.

MODEL: Russell et al. ((2000) Nat Str Biol 7:367-370) characterized this form and others using SAXS (small-angle X-ray scattering). They found that when the intron folds in the presence of Mg2+, it is compact. Moreover, the compaction happened fast, but the native structure wasn’t completely formed until later. This led them to propose “electrostatic collapse” for the RNA.

General principles:
RNA folding often requires Mg2+ ions.
RNA folding is hierarchical.
RNA molecules can misfold.

Concept: an RNA folding funnel. M is the misfolded state, and N is the correct native fold. Intermediates are also shown. Paths depend on ions, temperature, mutations, and starting structures. How do you think an RNA folds during transcription? Proteins also have folding funnels; how might they differ from the RNA funnels?

More ligands and nucleic acids and folding. DNA also has close associations with ions. This is a very high-resolution crystal structure of the Dickerson dodecamer. [Figure labels: spermidine, Na+, hexahydrated Mg2+.] [Shui et al. (Williams lab) 1998. Biochemistry 37:8341]

The “Spine of Hydration”
Nucleic acids are extensively hydrated, and the concept of a network of ordered water molecules held in place through hydrogen bonding to bases and phosphates is generally accepted. The “spine of hydration” was thought to occur in the minor groove of B-form DNA, based on early crystal structures of the Dickerson dodecamer. Shui et al. revisited that study and concluded that many of those waters were in fact ions (Na+) that constituted the first layer of the spine. The waters were in a second ordered layer.

More ligands and nucleic acids. Ligand = protein.
The arrangement of hydrogen bond donors and acceptors allows a protein to distinguish among AT, TA, CG, and GC in the major groove. In the minor groove, only AT and GC pairs can be discriminated.

Protein:NA recognition mechanisms.
1) Coulombic interactions (with consequential ion release)
2) van der Waals (dipole-dipole and induced dipole)
3) Solvent driven (hydrophobic effect)
4) Hydrogen bonding

These interactions will be highly dependent on the solution conditions: temperature, salt concentration, and pH. These conditions must be explicitly stated in any description of protein binding to RNA or DNA.

1) Coulombic interactions (with consequential ion release)

Protein + DNA <=> Protein:DNA
Kobs = [PD]/([P][D])

This is too simple, since DNA (and RNA) are polyanions and bind counterions. Logically, since the protein binds a nucleic acid, it must also ‘bind’ anions. When the nucleic acid binds protein, it must release its counterions and waters from sites that will interact with protein (and vice versa for the protein).

P(aM+, bX-, cH+, dH2O) + D(eM+, fH2O) <=> PD(gM+, hX-, jH2O)

So a more accurate equilibrium reaction is

P + D <=> PD + xM+, where x = (a + e) - g is the number of cations released,

so increasing the concentration of M+ will shift the equilibrium to the left (free P and D).

2) van der Waals (dipole-dipole and induced dipole)

London dispersion forces are weak interactions that are typically induced-dipole.

4) Hydrogen bonding

Recognition of a specific site is often described in terms of ‘direct readout’: amino acids of the protein ‘recognize’ the 3D arrangement of hydrogen bond donors and acceptors on the nucleic acid. In ‘indirect readout’, the protein recognizes conformational features of the nucleic acid.

Hydrogen bonding is the most common device for obtaining specificity of interactions, since hydrogen bonds have preferred lengths and bond angles. However, it is not sufficient to consider direct interactions between protein and nucleic acid, since many specific interactions are mediated by water molecules (not necessarily visible in crystal structures).
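Returning to the ion-release equilibrium under Coulombic interactions, P + D <=> PD + xM+: the salt dependence it implies can be sketched numerically. The short Python example below is only an illustration under simplifying assumptions that are not in the notes. It treats concentrations as activities, ignores the anion, proton, and water terms, and assumes x is constant over the salt range, so that Kobs = K [M+]^(-x) and a plot of log Kobs against log [M+] is a line of slope -x. The function names and all numbers are hypothetical.

```python
import math


def kobs_at_salt(kobs_ref, salt_ref, salt, x):
    """Scale an observed association constant from one [M+] to another.

    Assumes P + D <=> PD + x M+, so Kobs is proportional to [M+]**(-x).
    Anion, proton, and water release are ignored (an assumption).
    """
    return kobs_ref * (salt / salt_ref) ** (-x)


def ions_released(salts, kobs_values):
    """Estimate x as minus the least-squares slope of log Kobs vs log [M+]."""
    log_salt = [math.log10(s) for s in salts]
    log_k = [math.log10(k) for k in kobs_values]
    n = len(log_salt)
    mean_s = sum(log_salt) / n
    mean_k = sum(log_k) / n
    sxy = sum((s - mean_s) * (k - mean_k) for s, k in zip(log_salt, log_k))
    sxx = sum((s - mean_s) ** 2 for s in log_salt)
    return -sxy / sxx


# Hypothetical example: Kobs = 1e9 M^-1 at 0.1 M salt, 4 cations released.
salts = [0.05, 0.1, 0.2, 0.4]  # molar
kobs = [kobs_at_salt(1e9, 0.1, s, x=4) for s in salts]

for s, k in zip(salts, kobs):
    print(f"[M+] = {s:.2f} M  Kobs = {k:.3g} M^-1")
print("estimated x =", round(ions_released(salts, kobs), 3))
```

With these made-up numbers, doubling the salt concentration lowers Kobs 16-fold (2 to the power 4), which is why the salt concentration must be stated alongside any reported binding constant.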
Energetically, residues that are involved in intermolecular hydrogen bonding are often hydrogen bonded to water in the free state, so there is not a large energy gain in formation of the protein:nucleic acid hydrogen bond (about -1.1 to -1.7 kcal/mol per H-bond). But if a hydrogen bond to water is not replaced by an equivalent hydrogen bond, then there is an energy loss associated with complex formation. Specificity due to hydrogen bonding is more related to losing a hydrogen bond than to forming one, although the opposing effects are often impossible to separate.

Essential features for modulating the binding of a protein to a nucleic acid are:
1) Reversible binding.
2) Competitive binding: the same protein for different sites, or many proteins for the same site.
3) Modulation of binding affinity and specificity by small effector ligands.
4) Competition between different protein subunits.

Binding can be modulated in two ways:
1) Thermodynamic, or equilibrium, control. In this case, regulation is achieved by the equilibrium binding affinities of various proteins for their DNA/RNA sites, so the percent site occupancy by a given protein is the key.
2) Kinetic control. The rates of complex formation or dissociation are most important.

To describe complex formation, it is necessary to know the binding affinity and the rates of binding and dissociation. In practical terms, in order to understand regulation by a protein:nucleic acid interaction, it is necessary to know the binding mechanism.

What proteins bind to DNA and RNA?

The helix-turn-helix motif binds to DNA duplexes. These motifs aren’t stable out of the context of the whole protein.

Zinc finger specificity can be modulated. Three tandem fingers bind to DNA. What’s the advantage of having more than one finger?

Wolfe et al. (Pabo lab) used a leucine zipper linked to two Zn-fingers to create a new DNA binding protein.
Leucine zippers themselves can bind DNA (fos/jun, GCN4, bZIP).

Complex DNA binding proteins:
E. coli SSB protein (Lohman lab, WUMS)
EcoRI restriction enzyme + DNA (Rosenberg lab, U Pitt)

Proteins that bind RNA.

1. The Arginine-rich motif (ARM)

BIV Tat and HIV-1 Rev peptides can fit into the major groove because there is a dramatic deformation of the A-form duplex. The groove is wider due to the bulged nucleotides, so the peptide can make contact with bases, sugars, and phosphates. Due to the number of interactions (hydrogen bonds, electrostatics, and stacking of hydrophobic amino acids), the dissociation constants of these small complexes can be nanomolar. [Structures: 1BIV, 1MNB; 1ETF; 1A4T.] Weiss & Narayana (1998) Biopolymers 48:167

P22: NAKTR RHER9R RKLAI ERDTI

The P22 peptides (WT, Pro9, and Ala9 mutants) do not show evidence of a stable helix. Binding of the Ala9 peptide is at least as tight as that of the wt peptide, but the Pro mutant doesn’t bind. [A construct with four Ala added to the terminus does have the CD signature of an α-helix.] [Figure panels: wt, Pro9, Ala9.]

The point is: there is not one unique way for an ARM peptide to make specific contact with an RNA. It is difficult to predict and almost impossible to model.

A is P22/BoxB. The α-helix fits into the groove of the RNA and bends over the GAAAA loop, where its Trp stacks with an adenosine. The peptide bends at R11 to allow the helical side chains to stack with the nucleobases.
B is HIV Rev/RRE. The Rev peptide forms an α-helix, but positions itself in the RNA bulge. The peptide contacts both RNA strands.
C is BIV Tat/TAR. The Tat peptide forms a β hairpin as it positions itself in the RNA bulge. The peptide contacts both RNA strands.

2. dsRBD

Double-stranded RNA binding domains (dsRBD; also called dsRBM) are nonspecific, but sensitive to A-form structure. This is the domain from PKR (Protein Kinase R). The affinity comes from many contacts between the protein and 2′ OH groups in the minor groove.
If a DNA were A-form, these contacts would be missing, but the protein could still make electrostatic contacts with the phosphates. dsRBD binding is structure-selective, but not sequence-specific.

Other very important proteins have this motif:
ADAR1, the RNA-specific adenosine deaminase that converts adenosine to inosine in duplexes, contains three dsRBDs.
DCR (Dicer) is the enzyme that cleaves double-stranded RNAs into 21 base-pair pieces. These small duplex RNAs go on to become incorporated into the RISC, where they are bound by Ago and become the templates for RNAi cleavage of mRNAs. DCR has one dsRBD.
Argonaute (Ago) proteins have two dsRBDs. They bind to miRNA and siRNA as part of the process of gene regulation by translation repression (the current model for miRNA activity) or mRNA degradation (RNAi).

3. RNA Recognition Motif (RRM)

It is the most common eukaryotic RNA binding domain. RRMs are identified by their conserved sequences.

Consensus:
RNP-2: LFVGNL (second row of alternative residues: IY I KL)
RNP-1: KGFGFVXF (second row of alternative residues: R YA Y)

Two or three aromatic residues are solvent-exposed on the surface of the β sheet. RNP1 begins in Loop3 and extends through β3. RNP2 extends through β1.

Birney, E., Kumar, S., Krainer, A.R. 1993. Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors. Nucleic Acids Res. 21(25): 5803–5816.

In 2011, there are 64620 RRM sequences in the NCBI Protein Database and 400-500 structures.

The human U1A protein is the best studied of all the RRMs. It binds Stemloop II of U1 snRNA. A cocrystal shows how the RNA sits on the surface of the β-sheet and how a protein loop pokes through the 10-nucleotide RNA loop.

What we have learned:
Prediction of an RNA:protein interaction will have unique difficulties.
You can’t simply dock the molecules, since both of them change their conformation.
You can’t design a new RNA binding protein the same way you can design a new zinc finger DNA binder, or a helix-turn-helix.