Organization of RNA structural motifs: Lessons from SCOR

Donna K. Hendrix
Department of Plant and Microbial Biology, University of California, Berkeley
and Physical Biosciences Division, Lawrence Berkeley National Laboratory
dkhendrix@lbl.gov

Structural classification of RNA
http://scor.lbl.gov
Search by
• PDB or NDB id
• primary sequence
• key word
Directed Acyclic Graph Architecture
Classification principles
• Base pairing
  – Watson-Crick
  – non-canonical
• Base stacking
• Sequence
• Backbone conformation
• Backbone interactions
  – backbone-backbone
  – backbone-base

SCOR 2.0 classification
Structural classification
• Hairpin loops
• Internal loops
Tertiary interactions
• Ribose zippers
• Coaxial helices, Tetraloop-receptor, A-minor motif, Kissing hairpin, Pseudoknots
Functional classification
• Molecular function
• Motif function
• Structural models

RNA structural classification
• Conserved patterns and relationships
  – sequence
  – structure
• Organize data for non-specialist
• Classification for RNA model-building, engineering

How to give yourself eye strain

SCOR 2.0.3 update:
• 102 new structures
• 20 structures removed from SCOR 2.0.2
• 85 structures previously in SCOR but not functionally annotated
Moved server from LBL; cleaned up the code a little bit; upgraded OS/Tomcat; Eric added Apache services.

What defines an RNA structural motif?
Conserved, repeated structural features
– sequence
– fold (backbone, stacking)
– interactions (hydrogen bonds, stacking)

Primary structure
• Identify by
  – conservation of sequence
  – binding or stability
• Specify by sequence:
  – GUAUGA (Box C of C/D Box snoRNA)
  – CUCAGUACGAGAGG AAC (sarcin-ricin loop)
M. Tamura and S.R. Holbrook, JMB 320:455 (2002)

Secondary structure motifs
• Specify by Watson-Crick base pairing
  – internal loops
  – hairpin loops
  – junction loops
  – some tertiary interactions (pseudoknots)
1euy, Sherlin et al.,
JMB 299:431 (2000)

Structural, or 3-D, motifs
• Distinguished from secondary structural motifs by three-dimensional features and interactions
  – bases: pairing, stacking, base-backbone
  – backbone: backbone-backbone, torsion angles (including chi), pseudotorsion
• Described by sequence and secondary structure features as well
1euy, Sherlin et al., JMB 299:431 (2000)

Organization of structural motifs: hierarchical classification from SCOR 1.1 and 1.2
[Figure: classification tree rooted at Internal Loops. Node labels:]
• Non-Watson-Crick paired stacked duplexes
• Loops with unpaired stacked bases, no triples or dinucleotide platforms
• One looped-out base
• One looped-out base with stacked non-Watson-Crick base pairs
• Loops with dinucleotide platform
• Several looped-out bases
• Base triple, no dinucleotide platform
• Unpaired, unstacked looped-in bases
• Transglycosidic bond(s)

Limitations of the hierarchical classification (SCOR 1.1, 1.2)
[Figure: the same classification tree, shown with two example structures:]
• 1i6u: c:10-11, c:28 (A-U)A; Tishchenko et al., JMB 311:311 (2001)
• 1exy: a:9,20,22 (G,C,A); Jiang et al.,
Structure 7:1461 (1999)

Organization of structural motifs: SCOR 2.0 and the DAG classification
• Use a directed acyclic graph (DAG) to represent the relationships among motifs
• Increase searching options: by sequence, strand, PDB or NDB identifier, residue number and key words

SCOR 2.0 DAG: internal loop base triples
[Figure: DAG fragment. Node labels:]
• Internal Loops
• Loops with dinucleotide platforms
• Loops with simple dinucleotide platform
• Loops with base triples
• Loops with a dinucleotide platform in a triple
• Loops with base triples, no dinucleotide platform

Limitations of the DAG
• No clean way to present orthogonal attributes
  – “hairball”
  – Multiple DAGs
• Not easily searchable
  – Inherent awkwardness to browsing

Organization of structural motifs: hierarchically organized queryable attributes
• PDB ID: 1dul
• Location: chain b, res 146-150; chain b, res 161-165
• Sequence
  146-UCAGG-150
  165-GACGA-161
• Base pairings
  146-165; U∙G; cis WC-WC
  147-164; C∙A; trans WC/Hoogsteen
  148-163; A∙C; trans WC/sugar edge
  149-162; G∙G; trans bifurcated/Hoogsteen
  150-161; G∙A; cis WC-WC
• Base stacking
  Adjacent: 145-146, 146-147, 148-149, 149-150…
  Non-adjacent: 147-162, 148-164 (stack swap)
• Pseudotorsions
  Residue  146.B  147.B  148.B
  η        169.3  160.9  110.7
  θ        195.0  144.3  155.2
  χ        203.9  217.6  228.2
• RNA “Rotamers” …
• Identify motifs that consist of these more atomic attributes.
1dul:146-150.b, 161-165.b E.
coli SRP/RNA; Batey et al., Science 287:1232 (2000)

Feature-based structural classification
• *Sequence
• *Loop length
• Base pairings
• Pseudotorsion angles
• Hydrogen bonds
• Stacking – adjacent and non-adjacent

Classification of structural elements by features

Feature-based searching and characterization of motifs

RNA Structural Elements
Characteristic | Element | Loop Motifs | Tertiary Interaction Motifs
Size | Small, local | May span entire loop | Multiple loops, stems involved
Sequence conservation | Little or none | Often have sequence preferences/isosteric | Interaction sites
Structural conservation | By definition | Often conserved | Evolutionarily conserved
Features (pairing, stacking, etc.) | Usually single feature | Multiple features/elements | Multiple in each interacting motif
Occurrence | Found within various motifs | Not nested; may occur in tertiary interaction motifs | May include multiple elements and motifs

Structural elements: Element Name(s), Description, Found In, Reference

U turn / Uridine turn / Pi turn
  Description: A sharp bend in the phosphate-sugar backbone between the first and second nucleotides, followed by characteristic stacking of the second and third nucleotides. Original descriptions include a stabilizing hydrogen bond between the first and third residues.
  Found in: Hairpin loops (e.g., GNRA, TΨC loop) and internal loops
  Reference: (Holbrook et al., 1978; Kim and Sussman, 1976; Klosterman et al., 2004b; Quigley and Rich, 1976)

A-minor interaction
  Description: The insertion of the minor groove edges of an adenine into the minor groove of neighboring helices. Four types have been identified.
  Found in: Ribose zipper, kink-turn
  Reference: (Nissen et al., 2001)

S-turn
  Description: Two consecutive bends in the phosphate-sugar backbone characterized by backbone distortions and inverted sugar puckers, resulting in an "S" shape.
  Found in: Loop E motif, sarcin-ricin loop
  Reference: (Correll et al., 1999; Szewczak et al., 1993; Wimberly et al., 1993)

Dinucleotide platform
  Description: Two adjacent, covalently linked, coplanar residues that form a non-Watson-Crick pairing.
  Found in: Internal loops, often involved in a base triple
  Reference: (Klosterman et al., 2004b)

Base triples
  Description: Three hydrogen-bonded, coplanar bases with two of the bases sometimes forming a Watson-Crick pair or dinucleotide platform.
  Found in: Loop E motif, sarcin-ricin loop
  Reference: (Klosterman et al., 2004b)

Cross-strand stack
  Description: A base on one strand stacks with a base on the opposing strand, rather than stacking with the adjacent bases on its own strand.
  Found in: Internal loops, e.g., bacterial Loop E motif
  Reference: (Correll et al., 1997)

Noncanonical base pairs
  Description: Two bases of any type interacting in a generally planar arrangement can form hydrogen bonds in characteristic patterns.
  Found in: Double helices
  Reference: (Leontis and Westhof, 2001; Nagaswamy et al., 2002)

Extruded helical single strand
  Description: Two or three unpaired bases extruded from the main double helical stack, forming an independent stack.
  Found in: Internal and hairpin loops
  Reference: (Klosterman et al., 2004b)

Annotation issues: What is a motif?
• Recurrent structure
• Conserved structure
• Conserved function?
• I know it when I see it.
• Definition (glossary)

Annotation issues: Assessment
• Canonical
• Variations (-like, pseudo-, reverse-, inverse-)
• eVal

Annotation issues: Who is it for?
• Student. Naïve in knowledge of structural motifs, but expert in biology.
• Expert.
• Computer-readable, human-interpretable?
But what about my favorite structure (sequence, motif)? BLAST?
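
Worked sketch: a motif class with more than one parent

The move from a strict hierarchy (SCOR 1.1, 1.2) to a DAG (SCOR 2.0) lets one class sit under several parents. The sketch below illustrates this with the node labels from the "SCOR 2.0 DAG: internal loop base triples" slide. The edge set is an assumption inferred from the node names, not taken from the SCOR database, and the function names `parents_of` and `ancestors_of` are hypothetical.

```python
from collections import deque

# Parent -> children edges. ASSUMPTION: edges are inferred from the node
# names on the slide; the real SCOR 2.0 graph may differ.
CHILDREN = {
    "Internal loops": [
        "Loops with dinucleotide platforms",
        "Loops with base triples",
    ],
    "Loops with dinucleotide platforms": [
        "Loops with simple dinucleotide platform",
        "Loops with a dinucleotide platform in a triple",
    ],
    "Loops with base triples": [
        "Loops with a dinucleotide platform in a triple",
        "Loops with base triples, no dinucleotide platform",
    ],
}


def parents_of(node):
    """Return the direct parents of a class, sorted by name."""
    return sorted(p for p, kids in CHILDREN.items() if node in kids)


def ancestors_of(node):
    """Return every class reachable by walking parent links upward."""
    seen = set()
    queue = deque(parents_of(node))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents_of(current))
    return seen


if __name__ == "__main__":
    shared = "Loops with a dinucleotide platform in a triple"
    print(parents_of(shared))      # two parents: not expressible in a tree
    print(sorted(ancestors_of(shared)))
```

In a tree, "Loops with a dinucleotide platform in a triple" would have to be filed under either the platform branch or the triple branch; here a browse from either parent reaches it.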
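
Worked sketch: querying a motif by its atomic attributes

The "hierarchically organized queryable attributes" slide lists the base pairings of the 1dul internal loop (chain b, residues 146-150 and 161-165). A minimal sketch of feature-based searching over such records follows. The base-pair data are copied from that slide; the `BasePair` record layout and the `find_pairs` helper are hypothetical and are not the SCOR schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePair:
    """One annotated base pair (hypothetical record layout)."""
    res_i: int
    res_j: int
    bases: str      # the two bases, 5' strand first
    geometry: str   # pairing geometry as written on the slide


# Base pairings listed for PDB 1dul, chain b, on the slide.
PAIRS_1DUL = [
    BasePair(146, 165, "U-G", "cis WC-WC"),
    BasePair(147, 164, "C-A", "trans WC/Hoogsteen"),
    BasePair(148, 163, "A-C", "trans WC/sugar edge"),
    BasePair(149, 162, "G-G", "trans bifurcated/Hoogsteen"),
    BasePair(150, 161, "G-A", "cis WC-WC"),
]


def find_pairs(pairs, geometry_contains):
    """Return the pairs whose geometry label contains the given text."""
    return [p for p in pairs if geometry_contains in p.geometry]


if __name__ == "__main__":
    for pair in find_pairs(PAIRS_1DUL, "Hoogsteen"):
        print(pair.res_i, pair.res_j, pair.bases, pair.geometry)
```

A substring match on the geometry label is only the simplest possible query; stacking, pseudotorsion ranges, or loop length could be filtered the same way once they are stored as fields.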