Article No. mb981797 J. Mol. Biol. (1998) 279, 773±793 Derivation of the Three-dimensional Architecture of Bacterial Ribonuclease P RNAs from Comparative Sequence Analysis Christian Massire, Luc Jaeger and Eric Westhof* Institut de Biologie MoleÂculaire et Cellulaire du CNRS UPR 9002, 15 rue Descartes 67084 Strasbourg, France The secondary structure of bacterial RNase P RNA, a ribozyme responsible for the maturation of the 50 end of tRNAs, is well established on the basis of sequence comparison analysis. RNase P RNA secondary structures fall into two types, A and B, which share a common core formed by the assembly of two main folding domains, but differ in their peripheral elements. A revised alignment of 137 available sequences reveals new covariations allowing for the re®nement of both types of secondary structures. Phylogenetic evidence is thus provided for the extension of stems P11, P14, P19, P10.1 and P15.1 through further canonical base-pairs or GA . . . GA mismatches. These re®nements led in turn to a new organization of the catalytic core, with coaxial stackings of helices P2 and P19 as well as P1 and P4. New inter-domain tertiary interactions involve loop L9 and helix P1 and loop L8 with helix P4. These features were incorporated into atomic-scale 3D models of RNase P RNA for representatives of each structural type, namely Escherichia coli and Bacillus subtilis. In each model, the juxtaposition of the core helices creates a cradle onto which the pretRNA substrate binds with most evolutionarily conserved residues converging towards the cleavage site. The inner cores of both types are stabilized similarly, albeit by different peripheral elements, emphasizing the modular and hierarchical organisation of the architecture of RNase P RNAs. Similarities are thus apparent between the type A modules, P16/ P17/P6 and P13/P14, and their type B analogs, P5.1/P15.1 and P10.1/ P10.1a, respectively. Other noteworthy features of these models include compactness and good agreement with published crosslinking data. # 1998 Academic Press Limited *Corresponding author Keywords: ribonuclease P RNA; ribozyme; RNA folding; RNA-RNA interactions; RNA modelling Introduction Ribonuclease P (RNase P) is the endoribonuclease responsible for removing the leader sequence from the tRNA precursors during the maturation of the 50 end of tRNAs (for reviews, see Altman et al., 1993; Pace & Brown, 1995). RNase P is a ribonucleoprotein whose catalytic function, at least in bacteria, is carried by its RNA component (RNase P RNA) rather than by the protein (Guerrier-Takada et al., 1983). Another remarkable feature of RNase P RNA is its ability to recognize the tertiary structure of its pre-tRNA substrates (Kahle et al., 1990). The secondary structure of bacAbbreviations used: Eco, Escherichia coli; Bsu, Bacillus subtilis; pre-tRNA, precursor tRNA. 0022±2836/98/240773±21 $25.00/0 terial RNase P RNA was inferred from a ®rst comparative analysis of sequences (James et al., 1988) and re®ned as further sequences became available (Haas et al., 1991; Brown et al., 1993; Haas et al., 1994). An unexpected result was the recognition of two main structural types, namely type A and type B, which nevertheless share a common core (Haas et al., 1996b). Type A is the ancestral form (see the Escherichia coli sequence in Figure 1(a)), whereas type B has later emerged within the Gram-positive lineage (see the Bacillus subtilis sequence in Figure 1(b)). Previous work on group I introns has shown that large RNAs are likely to be organized in self-folded structural domains (Murphy & Cech, 1993; Doudna & Cech, 1995). Such domains are also visible in RNase P RNA, which is primarily organized in two distinct domains (Pan, 1995; # 1998 Academic Press Limited 774 Modelling of the 3D Structure of RNase P RNA Figure 1 (legend opposite) 775 Modelling of the 3D Structure of RNase P RNA Loria & Pan, 1996). Domain I, involved in the recognition of the T-loop of pre-tRNAs (Pan et al., 1995; Loria & Pan, 1997), encompasses the ``upper part'' of RNase P RNA, i.e. stems P7 to P12, the large internal loop L11/12, as well as stems P13/ P14 (in type A) or P10.1 (in type B; see Figure 1). Domain II includes the ``bottom'' half of the molecule, and comprises most nucleotides thought to be involved in the formation of the catalytic site, i.e. stems P1 to P5 and their ¯anking strands (Frank et al., 1996; Pan & Jakacka, 1996; see Figure 1). Domain II includes as well stems P15, P19, and extensions P16/P17/P6 and P18 (in type A) or P5.1, P15.1/P15.2 (in type B). In spite of the early recognition of both structural types, most attention has been focused on the type A sequence from E. coli, for which two structural models for the RNase P RNA-pre-tRNA complex were independently proposed (Harris et al., 1994; Westhof & Altman, 1994). The Harris & Pace model was primarily designed as a frame for inter- and intramolecular crosslinking data (Burgin & Pace, 1990; Nolan et al., 1993; Oh & Pace, 1994), whereas the Westhof & Altman model was based on alternative cross-linking data (Guerrier-Takada et al., 1989) and chemical modi®cation data (Lumelsky & Altman, 1988; Shiraishi & Shimura, 1988; Kirsebom & Altman, 1989; Knap et al., 1990; Talbot & Altman, 1994b; Westhof et al., 1996b). They share nonetheless some common details, such as the stacking of P1, P2 and P3, as well as an interaction between the L15/16 internal loop and the RCCA sequence at the 30 end of pre-tRNA (Kirsebom & SvaÈrd, 1994; LaGrandeur et al., 1994). However, these working models remain organised in a radically different way, re¯ecting incompatibilities between some of the experimental data. The number of available sequences has dramatically increased within the last two years, bringing phylogenetic evidence for long-range interaction between L14 and P8, L18 and P8 (Brown et al., 1996), as well as between L9 and P1 (Massire et al., 1997). None of these contacts was actually predicted by the 3D working models, but the Harris & Pace model has been recently re®ned to accommodate them (Harris et al., 1997). Our previous model, however, could not incorporate these interactions without breaking the proposed catalytic site. This fact led us to reconsider the entire modelling process on the primary basis of an exhaustive phylogenetic analysis of the 137 bacterial sequences available from the ribonuclease P database (Brown, 1997) and from genome sequencing projects (see Methodology). In the present study, we propose 3D models for the two main structural types of RNase P RNA. Although they display different peripheral elements, they share a similar core centred around two inter-domain tertiary interactions involving loops L8 and L9. The proposed models should be viewed as ``consensus'' 3D structures that aim at emphasizing the folding architecture and relationships between the folding elements rather than precise atomic details. Results Type A and type B bacterial RNase P RNAs have both been shown to be formed of two independent folding domains that can self-assemble into a ®nal active structure (Loria & Pan, 1996). The assembly of RNase P RNAs structural elements is thus fully compatible with a hierarchical folding process where prefolded RNA modules are used to create larger domains capable, once assembled, to associate into the ®nal structure. The strategy we have used for modelling the threedimensional architectures of type A and type B RNase P ribozymes re¯ects this modular view of RNA structures. In a ®rst step, we began by modelling domain I and domain II separately. Once built, it became possible to orient and adjust the structure of the two domains with respect to their interdomain long-range interactions and tertiary contacts with the pre-tRNA substrate. The resulting three-dimensional models for the architecture of type A and type B RNase P ribozymes were ®nally re®ned with respect to the available published cross-linking data. Modelling of domain I The P10/P11 stack All bacterial RNase P RNA sequences present a wide internal loop L11/12 linking stems P11, P12, P13 and P14 (James et al., 1988; and see Figure 1). This loop contains two conserved regions, 130ACAGNRA-136 and 177-GUGNAA-182 (Chen & Pace, 1997) that are found also in RNase P RNA sequences from eukarya (PagaÂn-Ramos et al., 1996) Figure 1. Classical representation of the 2D structure of (a) the E. coli RNase P RNA and of (b) the B. subtilis RNase P RNA. (Haas et al., 1994). Strands involved in the formation of the P4 and P6 pseudoknots are connected by square brackets. Extensions of stems P11, P10.1a, P15.1 and P19 are highlighted by continuous lines (see the text). Alternative representations for the 2D structures of (c) the E. coli RNase P RNA and of (d) the B. subtilis RNase P RNA. The alternative representations constitute an attempt at rendering the respective spatial arrangement of the helical stems and stacks in the 3D models. Thick lines link nucleotides adjacent in the sequence with arrows indicating the 50 to 30 polarity. Nucleotides invariant among bacteria are bold. Nucleotides involved in tertiary interactions are enclosed in boxes and linked by thin, dotted lines. Known non-canonical pairings are indicated by open circles. For each 2D diagram, the thick dotted line between P5 and P7 indicates the separation between the two main structural domains (Loria & Pan, 1996). Stems are colour-coded according to the helical stacks proposed in the present 3D models (see Figure 2). 776 Modelling of the 3D Structure of RNase P RNA Table 1. Covariations between bases 128 and 230 (136 sequences) Table 3. Covariations between bases 203 and 226 (110 sequences possessing P13/P14) Base 128 Base 230 A A C G U C Base 203 G 5 126 U Base 226 5 A C G U A 2 C 3 G 101 U 4 Data are collected with the program COSEQ (unpublished). and archaea (Haas et al., 1996a). In spite of this homology, the pairing that closes loop L11/12 varies between kingdoms. In bacteria, L11/12 is closed by a composite helical structure formed of the 2-bp stems P10 and P11 (TallsjoÈ et al., 1993; Haas et al., 1994) and the lone base-pair 128:230 (see Table 1 and Mattson et al., 1994). In eukarya, stems P10 and P11 are merged within a single stem of at least 6 bp (Tranguch & Engelke, 1993), whereas a continuous 8 or 9 bp stem P10/P11 may form in the archaea Methanococcus jannaschii and Archaeglobus fulgidus. This suggests that, in bacterial RNase P RNA, P10, P11, and the 128:230 interaction could also be part of a long irregular helix. Nucleotides adjacent to the 128:230 interaction have been proposed to form additional base-pairs (Mattson et al., 1994; Westhof & Altman, 1994). Indeed, the base-pair 129:229 shows transversions from C:G to A:G and A:U (but no C:U is observed, see Table 2). The putative base-pair 127:231 is much more conserved, however. When the two adjacent bulging nucleotides A232 and A233 are present (in all bacteria but bacteriodes), this base-pair exhibits a strong bias towards G:U and G:C pairs (then mostly in thermophilic species). Interestingly, in the 113 RNase P RNA sequences displaying the bulging adenine bases A173 and A174 at the base of stem P12, a similar pattern of variation is observed for the adjacent 144:172 base-pair. We suggest, therefore, that the 127-G . . . 231-YAA motif should be viewed as a part of the 7 bp stem P10/P11, in the same way the 144-G . . . 172-YAA motif is incorporated into stem P12 (see Figure 1(c)). A233 has been shown to be involved in the formation of a contact with the hydroxyl O20 of residue 62 in the pre-tRNA, which suggests that this internal loop motif, G . . . YAA, is particularly well suited for tertiary interactions. In fact, in our A type model, 144-G . . . 172-YAA is proposed to interact with loop L13 (see below). Table 2. Covariations between bases 129 and 229 (135 sequences, both bases are deleted in the Hbc. mobilis sequence) The P13/P14 stack Type B and most b-purple bacteria sequences do not possess hairpins P13 and P14, which, otherwise, always appear simultaneously in loop L11/12 (see Figure 1(a) and (b)). This suggests that, in type A RNase P RNA, these stems are coaxially stacked, so that their 50 and 30 dangling strands are brought close to each other (see Figure 1(c)). Interestingly, the secondary structure of P14 can be re®ned in order to leave an identical number of unpaired nucleotides in the L11/12 loop of type A and B RNase P RNAs (see Figure 1(c) and (d)), in agreement with the high level of sequence conservation of L11/12. Indeed, in all type A RNase P RNAs, stem P14 can be extended by joining nucleotide(s) in J13/14 to adjacent nucleotide(s) in J14/11, leaving in most sequences the conserved G225 bulging (see Tables 3 and 4). Phylogenetic evidence for this extension is mainly provided by ¯avobacteria sequences, where the bulge in P14 is absent and where all WatsonCrick combinations are encountered for the basepairs extending P14. Thus, the length of P14 is more constant than it was previously proposed (9 or 10 bp long, instead of 7 to 9 bp) in agreement with the tertiary interaction involving L14 and P8 (Brown et al., 1996). The stem P10.1 Among the 18 type B sequences known to date, 15 possess a P10.1 extension between P10 and P11 (see Figure 1 and 3). Phylogenetic evidence for a tertiary interaction between P10.1a, the distal part of P10.1, and L12, the tetraloop capping stem P12, was previously reported (Tanner & Cech, 1995). In most type B sequences, this interaction involves the recognition of a conserved GAAA tetraloop in L12 by an 11 nt GAAA receptor of consensus Bsu 145-CCUAAG . . . 159-UAUGG in P10.1a (Costa & Michel, 1995). This interaction is expected to be Table 4. Covariations between bases 203.0 (5' to base 203) and 227 in the 51 sequences having base 203.0 Base 129 Base 229 A C G U A 28 13 C 94 Base 203.0 G U Base 227 A C G U A 44 C 1 G 5 U 1 777 Modelling of the 3D Structure of RNase P RNA similar to the P5b-P6a interaction seen in the crystallographic structure of the P4-P6 domain of the Tetrahymena group I intron (Cate et al., 1996a). Although the Clostridium innocuum sequence contains a GAAA loop L12, a particular loop receptor rich in G:U pairs has been suggested for this sequence in lieu of the expected 11 nt receptor motif (Haas et al., 1996b). This receptor can be rearranged as a variant of the 11 nt motif (see Figure 3). Indeed, the resulting motif CCUGUG . . . UAUGG, with a GU platform instead of the canonical AA platform, has actually been found by in vitro selection to interact speci®cally with GAAA tetraloops (see sequence C7.10 in Figure 5B of Costa & Michel, 1997). The L12-P10.1a interaction is also feasible in the Brevibacillus brevis sequence, this time through a GYRA loop-helix interaction (Michel & Westhof, 1990; Jaeger et al., 1994). Most non-mycoplasma sequences possess an internal loop L10.1 of consensus Bsu 139RAAA . . . 166-RAGUA and can, therefore, be aligned with con®dence (see Figure 3). Strikingly, this internal loop motif is similar to the loop E of eukaryotic 5 S RNA (Shen et al., 1995), which is characterized by a well-de®ned structure involving the stacking of non-canonical base-pairs. By analogy, we propose the formation of the following non-canonical pairs within L10.1 (see also Figure 3): R139:A170 (sheared), A140:U169 (Hoogsteen), A141:A167 (Hoogsteen) and A142:R166 (sheared). As a result, the P10.1 extension should appear as a straight, irregular helix that keeps constant the distance between the base of P10.1 and the 11 nt GAAA receptor (see Figures 1(d), 2(b) and (d)). This is emphasized by the observation than the loop E motif L10.1 can be slide one base-pair, since it does not affect the overall length of P10.1/ P10.1a. Even the Bacillus megaterium sequence which possesses a longer AAAAACC . . . CGAGUA internal loop (extra nucleotides are underlined) but lacks two base-pairs in P10.1a can potentially have a similar overall length for its P10.1/P10.1a extension. The formation of an AC platform, followed by a C:C pair can indeed substitute for the two missing base-pairs (Figure 3; and see Cate et al., 1996b). This latter motif has actually been recognized as an alternative for the AA platform and the G:U pair within the GAAA receptor (see sequence C7.16 in Figure 5B of Costa & Michel, 1997). The overall stacking of the P10.1 extension is moreover con®rmed by the remaining type B sequences: the Acholeplasma laidlawii sequence lacks the L10.1 internal loop and possesses instead a continuous P10.1 stem and, in the C. innocuum sequence, the three purine:purine pairs that constitute L10.1 are likely to stack on each other. Interestingly, when the L12-P10.1a interaction is present, the overall length of P10.1/P10.1a seems to correlate with the length of P12: indeed, the three mycoplasma sequences M. capricolum, M. pneumoniae and M. genitalium display a simultaneous shortening of stem P12 and of the L10.1 internal loop, while retaining the L12-P10.1a interaction. Analogy between P13/P14 and P10.1a/P10.1 It was previously suggested that P10.1a/P10.1 could be a structural equivalent for the P13/P14 stems (Loria & Pan, 1996). The fact that the RNase P protein component from E. coli can bind either to a type A or to a type B RNase P RNA could plead for a high level of similarity between tertiary structures of both types of RNase P RNA (GuerrierTakada et al., 1983). In type A sequences, the loophelix interaction L14-P8 brings P14 close to the 3 nt bulge between P10 and P11, i.e. close to the insertion site of P10.1 in type B sequences (see Figures 1 and 2(c) and (d)). The type B stem P10.1a/P10.1 could therefore take the place of the type A stems P13/P14 if one assumes a tetraloop conformation for the GAAA fragment that links P10.1 and P11 in most type B sequences (see Figure 1(c) and (d)). By considering the analogy between P13/P14 and P10.1a/P10.1, one could expect that a tertiary interaction, speci®c to type A sequences, replaces the type B interaction L12-P10.1a. Actually, P13, which is always 6 bp long and is capped by a loop of consensus GUAAGAG, could be involved in a tertiary contact with the 144-G . . . 172-YAA motif at the base of stem P12. Indeed, Brown et al. (1996) have already noticed that the stem ± loop P13/L13 and the 144-G . . . 172-YAA motif always appear simultaneously, both features having been deleted in type B and b-purple bacteria sequences. Interestingly, the mitochondrial RNase P RNA from Reclinomonas americana lacks both the L13 loop and the P12 bulge (Lang et al., 1997). Since mitochondrial sequences have been derived from the a lineage of purple bacteria, the Recl. americana sequence provides another instance of the simultaneous deletion of the P12 bulge and the L13 loop, providing further evidence for their implication in a common interaction. However, due to the lack of experimental evidence for speci®c contacts between L13 and P12, we did not attempt to model this hypothetical interaction in detail. Conformation of the P7/P8/P9/P10 cruciform junction The juxtaposition of stems P7 to P10 creates a cruciform in the RNase P RNA structure (Haas et al., 1994). In such a four-way junction, stems are expected to be stacked by pairs, leaving only two theoretically feasible conformations: a P7/P8 and a P9/P10 stack or, alternatively, a P7/P10 and a P8/ P9 stack. We constructed the cruciform junction with both conformations. Only the latter was found compatible with the geometry of the L14-P8 interaction and with the simultaneous extension of stems P10/P11 and P14, which leaves at most two nucleotides between P14 and P11 (see Figure 1(a) and (c)). This conformation is moreover the only one that is in agreement with the cross-link of pre- Figure 2. Ribbon representations of the complete 3-D models of RNase P RNA. Front view of (a) the E. coli model and of (b) the B. subtilis model. The colour code is the same as in Figure 1. Stems of domain I are red, green and deep blue; stems of domain II are orange, purple, light blue and yellow. The invariant nucleotides of the catalytic site are represented with large white spheres, while the conserved nucleotides from the T-loop recognition site are represented with smaller spheres. Back view of (c) the E. coli model and of (d) the B. subtilis model. A rotation of 180 along the vertical axis has been applied to each model between front and back views. The loop-helix interactions L9-P1, L8-P4, L14-P8, L18-P18 in the E. coli model, and L12-P10.1a and L15.1-P5.1 in the B. subtilis model are indicated by wide arrowheads. The ribbon drawings were made with the DRAWNA software (Massire et al., 1994). 779 Modelling of the 3D Structure of RNase P RNA Figure 3. Re®ned alignment of the Bsu 132 . . . 216 region (equivalent to Eco 120 . . . 176) for the 18 type B bacterial sequences. Canonical Watson-Crick pairs are green and wobble pairs are dark blue. Hoogsteen pairs are light blue and sheared pairs are red. AA platforms and GAAA loops are purple. Unpaired nucleotides are black. Nucleotides invariant within these sequences are bold. For clarity, the terminal loop L10.1a and the most conserved part of J11/12 are not displayed. Instead, the number of corresponding nucleotides is indicated between square brackets for each sequence. tRNA G64 with both A118 in L9 and G100 in L8 (Nolan et al., 1993). In the present models, domain I is therefore made of the tight assembly of the nearly coaxial stacks P9/P8, P11/P10/P7, and alternatively P13/ P14 or P10.1a/P10.1 (see Figure 2). The position of stem P12, capping the P11/P10/P7 stack, is uncertain due to the lack of evidence for speci®c contacts in the large loop L11/12, but is nevertheless partially constrained by the known geometry of the L12-P10.1a interaction in the B. subtilis model. Modelling of domain II The stem P19 Stem P19 appears sporadically between P2 and the well-conserved fragment J19/4 of consensus ACARAA. Fragment J19/4 constitutes with the 30 strand of P4 the last universally conserved region (Chen & Pace, 1997) and is unchanged regardless of the occurrence of P19. Stem P19 is present in most branches of the bacterial tree, except in g-purple bacteria (other taxa provide evidence for a late deletion of P19). We noticed that in nearly all proposed secondary structures of RNase P RNA with P19, the distance between P2 and P19 is the same as that between P19 and the ACARAA consensus fragment (from zero to three nucleotides). However, the secondary structure of stem P19 can be re®ned in order to leave no unpaired nucleotide between P2 and P19 as well as between P19 and the conserved part of J19/4. Table 5 summarizes the different types of pairing that are found within the six proximal base-pairs of P19 (see also Figure 1(b) and (d)). Although pairs Bsu 329:368 Table 5. Comparative analysis of the pairing types within the six proximal base-pairs of P19 (B. subtilis numbering) Base-pair 329:368 330:367 331:366 332:365 333:364 334:363 No. of sequences Regular R:R 88 86 86 83 83 79 57 65 75 75 72 69 21 17 4 3 6 4 Pairing type A:C Mismatch 7 3 4 7 5 5 6 Pairs are grouped in the following way. Regular, C:G, G:C, A:U, U:A, G:U and U:G; R:R: G:A, A:G and A:A; mismatch, C:C, U:U, C:U, U:C, C:A and G:G. 780 and 330:367, located at the base of P19, tend to form more non-classical base-pairs than 331:366 to 334:363, they nevertheless favour regular Watson-Crick base-pairs, or alternative A:A, G:A, A:G or A:C pairs rather than other mismatches. Purine:purine pairs have been observed in crystallographic structures extending regular helical stems, e.g. base-pair G26:A44 in tRNAAsp (Westhof et al., 1985) and in the P4-P6 domain of group I introns (Cate et al., 1996a). Similarly, A:C pairs are observed at the edge of RNA helices; for example, an A:C pair is found at the edge of the anticodon stem of tRNAAsp (Westhof et al., 1985). Therefore, the presence of purine:purine or A:C pairs at the base of P19 is fully compatible with the extension of the helical stack. As a consequence, stem P19 appears to be adjacent to P2 in bacteria, as already noticed in eukarya (Tranguch & Engelke, 1993). Main helical stacks in domain II Since the major driving force for RNA folding is base stacking, a single stem is likely to be inserted in a given structure at the top of an already existing helical stack. Thus, the fact that the single stem P19 is adjacent to P2 suggests a P19/P2 stack. The sequences that lack P19 possess instead a 4 nt linker (R19) between P2 and the ACARAA fragment J19/4. Fragment J19/4 is highly conserved and should therefore adopt the same conformation in all bacterial RNase P RNA sequences. Consequently, the 30 ends of P19 and R19 should also adopt a similar conformation in both type A and type B models. This is achieved if R19 adopts a loop conformation, mimicking the right-handed stack between P19 and P2 (see Figure 4). Thus, in both 3D models, the conserved J19/4 fragment lies Figure 4. Close-up stereoview of (a) the junction between R19 and P4 in the E. coli model and (a) between P19 and P4 in the B. subtilis model. The four nucleotides forming R19, C343 to G346, are drawn as sticks in the E. coli model. Modelling of the 3D Structure of RNase P RNA on the edge of the deep groove of P2 and along the shallow groove of P4. At the other end of P2, a single, invariant nucleotide (G19) is inserted between P2 and P3 (see Figure 1). In the absence of any other feasible alternative stacking, we conserved in our models the P2/P3 stack already present in previous models (Harris et al., 1994; Westhof & Altman, 1994). Therefore, the assembly of P19, P2 and P3 forms the ®rst continuous helical stack in domain II (see Figure 1 and 2).The choice of the P19/P2 stack in our models prevents the once proposed stacking of P1 on P2. Instead, we propose to stack P1 on P4, since a single conserved nucleotide (A361) is always inserted between the 30 strands of P4 and P1 (Haas et al., 1996b). P4 is in turn likely to stack on P5, because their 50 strands are always contiguous (Harris et al., 1994). Finally, P5 is adjacent to both P7 and P5.1 in type B sequences. However, P5 and P7 belong to different folding domains and are not adjacent in some cyanobacteria sequences, whereas P5 and P5.1 are always contiguous. Moreover, the fact that the single stem P5.1 is present in type B sequences argues for its stacking on a conserved stem, i.e. P5. Thus, P1, P4, P5 and P5.1 constitute the second continuous helical stack in domain II (see Figures 1(d) and 5(b)). The P15/P16/P17/P6 extension in type A RNase P RNA In our models, stem P15 is the only conserved stem from the domain II that is not part of the main helical stacks. P15 is poorly constrained in type B sequences because of the lack of evidence of speci®c contacts within the conserved dangling strands J5/15 and J15/18. In type A sequences, however, P15 constitutes the base of the P15/P16/ P17/P6 extension, which is strongly constrained by the formation of pseudoknot P6. The orientation of P15 may therefore be inferred from the modelling of the whole P15/P16/P17/P6 extension from type A RNase P RNA. P6 is usually 4 bp long, although its size can reach 7 bp in a-purple bacteria and cyanobacteria (Vioque, 1997). One to four nucleotides link P5 and P6, whereas a single adenosine base joins P6 to P7 in most cases. Therefore, P6 was stacked below P7 in the E. coli model (see Figures 1(c) and 5(a)). The other end of P6 is always adjacent to the 30 strand of P17, justifying the stacking of P6 and P17 within a type III pseudoknot (see Figure 5(a) and Westhof & Jaeger, 1992). Since the P15/P16/P17/P6 extension is constrained at both ends, it should contain a sharp turn either between P16 and P17, or between P15 and P16 (see Figures 1 and 2). The internal loop between P16 and P17 is extremely variable in composition or in length, and is often interrupted by the insertion of an extra stem, either between P16 and P17 (e.g. in planctomycete or a-purple bacteria sequences, see Brown, 1997) or between P17 and P16 (e.g. in Chlamydiae, see Herrmann et al., 1996). The fact that these extra stems are likely to stack 781 Modelling of the 3D Structure of RNase P RNA 1994). An NMR structure of a 31-mer RNA comprising P15, P16 and the L15/16 internal loop (Glemarec et al., 1996) shows that the bases within the internal loop L15/16 stack upon each other, leaving the Watson-Crick side of G292 and G293 available for additional pairings. More recently, the L15/16 loop and its homologue in B. subtilis, L15, were modelled in detail on the basis of a phylogenetic analysis (Easterwood & Harvey, 1997). Our model of this region, which does not differ signi®cantly from the latter work, presents a nearly coaxial stacking of P15 and P16, allowing the pre-tRNA 30 -RCCA sequence to interact in the minor groove of L15/16. Putative interaction between L5.1 and L15.1 in type B RNase P RNA Figure 5. Close-up stereo view of the junction (a) between P5, P6/P17 and P7 in the E. coli model and (b) between P5, P5.1 and P7 in the B. subtilis model. Unpaired nucleotides A79, U80, A81, A86 and G279 in the E. coli model and A81 to U85 in the B. subtilis model are drawn as sticks. on either P16 or P17 strongly suggests the presence of a kink between P16 and P17 (see Figures 1 and 2). The L15/16 internal loop, or its homologue in type B sequences, L15, is involved in intermolecular interactions with the pre-tRNA 30 -RCCA sequence (LaGrandeur et al., 1994) through at least two canonical base-pairs (Kirsebom & SvaÈrd, Type B RNase P RNA sequences lack the P16/ P7/P6 extension that contributes to the stabilisation of type A RNase P RNA structrures via the formation of the long-range interaction P6 (Haas et al., 1991). Nevertheless, they are characterized by the additional stems P5.1 and P15.1 that could potentially form an interaction in order to stabilize domain II (Haas et al., 1996b). The recent availibility of further type B sequences allowed us to reconsider the previous alignment of P5.1 and P15.1 (see Figure 6). Stem P15.1, previously considered to have 6 or 7 bp, can be extended up to 11 canonical base-pairs in each type B sequence (see Figure 6). This extension is made possible in some species by replacing two contiguous base-pairs by a GA . . . GA motif or one of its variants, or by bulging a nucleotide on the 30 strand. The L15.1 loop Figure 6. Re®ned alignment of stems P5.1 and P15.1 for the 18 type B bacterial sequences. Canonical Watson-Crick pairs are green, wobble pairs are blue, sheared pairs are red, nucleotides characteristic to the lone Acholeplasma laidlawii sequence are purple. Unpaired nucleotides are black. Nucleotides that are invariant within type B sequences (but in Acp. laidlawii) are bold. Square brackets stand for fragments of sequence not shown. 782 is then reduced to the consensus RAAN(3-5)AA. It is striking that the presence of such a loop in L15.1 appears to correlate with the presence of a highly conserved stem P5.1 of 6 bp long and capped by an UGNRAU loop (see Figure 6). Indeed, in the Acp. laidlawii sequence, where L15.1 is reduced to the well-known GCAA tetraloop (Heus & Pardi, 1991), stem P5.1 contains an extra C:G base-pair and is not capped by a UGNRAU loop. Furthermore, at a ®ner scale, the sequences of M. genitalium and M. pneumoniae may also be distinguished from the other in both the L5.1 and L15.1 loops (see Figure 6). In these sequences, the L5.1 loop follows the consensus UGGAAY instead of UGHGAU (signi®cant differences are underlined, see also Figure 6). This variation correlates with the presence in L15.1 of a loop following the consensus RAAN(2-3)AUAA observed in all L8 decaloops (see below), instead of consensus RAAN(3-5)AA (see Figure 6). These correlated changes suggest that L5.1 and L15.1 are indeed interacting. The structure of the putative L5.1-L15.1 interaction was not modelled in detail due to the lack of evidence for any speci®c contact. Nevertheless, the fact that a GCAA tetraloop in L15.1 correlates with the presence of two G:C base-pairs in P5.1 pleads for a classical GCAA loop-helix inteaction between L15.1 and P5.1 in the RNase P RNA from Acp. laidlawii, in place of the more complex interaction in the other type B sequences. Since P5.1 and P15.1 are expected to be located at the same place within all type B structures regardless of the actual nature of the P5.1-P15.1 interaction, the permutation of P5.1-P15.1 modules between different type B structures should preserve the active conformation. Therefore, in the B. subtilis model we have replaced the native P5.1-P15.1 module by its counterpart from the Acp. laidlawii sequence, where the relative orientation of P5.1 and P15.1 is constrained by the known geometry of a GNRA-loop interaction (Jaeger et al., 1994; Pley et al., 1994). The imposed stacking of P5.1 below P5 (see Figure 2(d)) thus constrains the position of P15.1. Homology between P18 and P15.1/P15.2 Stems P15.1 and P15.2 are always contiguous in type B sequences (Haas et al., 1996b). Their simultaneous insertion 50 to the fragment Bsu 314AGANNNAU-321 (nucleotides 328 to 335 in E. coli, see Figure 1) occurs in replacement of one or two nucleotides (Haas et al., 1994, 1996a). Therefore, P15.1 and P15.2 are likely to stack on each other within an independently folded subdomain P15.1/ P15.2. Most type A sequences possess instead a single stem P18 (see Figure 1(c) and (d)). In our models, P18 and P15.2 are placed similarly and can even be considered as two variants of the same helical stem (see Figure 2(c) and (d)). We keep, however, the different names to emphasize their structural differences: P18 is always 8 bp long and capped by a GNRA loop, whereas P15.2 does not exhibit signi®cant conservation in either sequence Modelling of the 3D Structure of RNase P RNA or length. P18 is constrained in the E. coli model by the L18-P8 interaction (Brown et al., 1996), whereas P15.1/P15.2 in the B. subtilis model is instead constrained by the L15.1-L5.1 interaction. It is noteworthy that these interactions place similarly the P18 and P15.1-P15.2 extensions behind P4, so that the neighbouring Eco 328-AGA-330 fragment can adopt the same conformation in both models, which is expected, since this fragment is conserved in all bacterial sequences. Interdomain association The fact that the two independent folding domains, when synthesized separately, can selfassemble into a catalytically active complex (Loria & Pan, 1996) suggests the formation of multiple tertiary contacts between the two pre-folded domains. The ®rst identi®ed interdomain interaction occurs between L18 and P8 (Brown et al., 1996). However, RNase P RNA mutants with a deletion of P18 retain a weak catalytic activity, indicating that the interdomain association may occur in the absence of the L18-P8 interaction (Haas et al., 1994). We recently reported phylogenetic evidence for another interdomain tertiary interaction between L9 and P1 (Massire et al., 1997). The conservation of a GNRA loop L9 on top of the 5 bp stem P9 is correlated with the conservation of a G:C pair in the distal end of P1. Alternatively, L9 and the 30 end of P1 are complementary in at least two Mycoplasma sequences, where they have therefore the ability to be involved in the formation of a pseudoknot in lieu of the loop-helix interaction. Nonetheless, the L9-P1 interaction is absent in onethird of the sequences, indicating that additional interdomain contacts may still remain to be identi®ed. Putative interaction between L8 and P4 L8 is a good candidate for such an additional interaction. In type A RNase P RNA, P8 is indeed the anchoring site for both L14 and L18, suggesting that P8 itself should be tightly bound to the core. Moreover, in all bacterial RNase P sequences, there is a striking conservation of two adenine bases within L8 (Eco A98 and A99 or Bsu A105 and A106, see Figures 1 and 7(a)), pointing to their potential role in folding or/and catalytic function of bacterial RNase P RNA. Interestingly, we have noticed that 11 to 12 bp separate these conserved adenine bases from loop L9, which was suggested to interact with P1 in the majority of RNase P RNA sequences (Figure 1(c); and see Massire et al., 1997). Since P8 and P9 are stacked, loops L8 and L9 are separated by a complete helical turn and point in the same direction. The L1-L9 interaction could thus act as a ruler for positioning the conserved adenine bases of L8 in front of a speci®c site of interaction. Considering that P1 and P4 are stacked in our current RNase P RNA models, the best candidate for this kind of interaction is P4. Two con- Modelling of the 3D Structure of RNase P RNA 783 Figure 7. Alignment of stems P4, P5, P7 and P8 for eight bacterial sequences representative of the structural diversity within these stems (only the ®ve proximal base-pairs of P4 are shown). Paired nucleotides are colour-coded as in Figure 1. Unpaired nucleotides are black. Square brackets stand for fragments of sequence not shown. Invariant bases are bold. The ®rst group of sequences contains type A sequences with a 5 bp stem P8 capped by a pentaloop L8 conserving Eco A98 and A99. The second group gathers together the type A sequence from Chlorobium limicola (which possesses a long P8 interrupted after 6 bp by two G:A pairs) and type B sequences (with a 6 bp stem P8 and a variable loop conserving Bsu A105 and A106). Within each group, the insertion of one base-pair in P5 compensates for the insertion of one nucleotide in P7 (blue positions), in agreement with an antiparallel orientation of P5 and P7. When the second group is compared to the ®rst, the insertion of one base-pair in P5 compensates for the deletion of one nucleotide in P8 (positions marked with an asterisk), in agreement with a parallel orientation of P5 and P8. (b) A drawing of the proposed orientation of stems P5, P7 and P8. Stems are represented by vertical arrows of proportional length for the same sequences as in (a). The distance between the conserved bases Eco G356-G357 in P4 and A98-A99 in L8 remains the same when P5 and P7, as well as P7 and P8, are in antiparallel orientations. served G:C pairs within core element P4 are, indeed, located at a distance of one helical turn from the two base-pairs supposed to interact with L9 (see Figure 1(c)). Therefore, whereas L9 interacts with the shallow groove side of 3:371 and 4:370 base-pairs of P1 in our type-A model, the two adenine bases of L8 interact with the shallow groove side of the conserved Eco C71:G356 and C70:G357 base-pairs of P4 (see Figure 1(c) and (d)). Although this putative tertiary interaction is not corroborated by a direct covariation, its formation is fully compatible with both type A and type B models, and can help in understanding the subtle variations observed for the length of the stems P5, P7 and P8, which are the structural elements separating P4 from L8. Most type A and type B sequences are characterized by a different number of nucleotides separating the base of stem P8 and the two conserved adenine bases in L8 (see Figure 7(a)). Interestingly, this length variation within stem ± loop P8/L8 correlates with the length variations of stems P5 and P7 (see Figure 7(a)). If one assumes an antiparallel orientation between P5 and P7 as well as between P7 and P8, the anchoring point of P5 to P4 is then located at a conserved distance from the two conserved adenine bases of L8, in both type A and type B RNase P structures (see Figure 7(b)). Therefore, the positioning of the conserved adenine bases of L8 in front of the fourth and ®fth C:G base-pairs of P4 is possible in both 784 Modelling of the 3D Structure of RNase P RNA type A and type B three-dimensional models (see Figure 1(c) and (d)) and can justify the concerted changes of length observed among RNase P sequences within P5, P7 and P8. Several experimental data support the existence of a tertiary contact between L8 and P4. Indeed, protections towards the Fe(II)-EDTA hydroxyl radical of both L8 and P4 are lost when domains I and II are constructed and folded separately (Loria & Pan, 1996). Moreover, recent cross-linking data have shown the proximity of L8 and P4 in the catalytically active RNase P RNA (Harris et al., 1997). With this interaction, the P8/P9 stack is anchored to the catalytic domain at both ends through loop-helix interactions (see the green arrows in Figure 2(c)), providing a rigid frame for the L14-P8 and L18-P8 interactions (see the blue arrows in Figure 2(c)). The overall three-dimensional architecture According to our three-dimensional models, the architecture of bacterial RNase P RNAs can be seen as the assembly of helical stacks packed together through the formation of long-range tertiary interactions (see Figures 1 and 2). Several intradomain long-range interactions participate in the independent folding and stabilization of domains I and II. So far, no less than three longrange interactions have been found to constrain the folding of domain I: interactions L14-P8 and L13P12 in type A RNase P RNA, and L12-P10.1a in type B RNase P RNA. Domain II from type A RNase P RNA is stabilized by the long-range interaction P6, whereas loop-loop interaction L5.1-L15.1 represents the P6 natural counterpart in type B RNase P RNA. Interdomain tertiary interactions like L18-P8, L9-P1 and L8-P4 are then favouring the association of domains I and II into the ®nal active structure. The core from both models presents a side-by-side assembly of the four main helical stacks: P11/P10/P7, P9/P8, P1/P4/P5 and (P19)/P2/P3 (see Figures 1 and 2). This assembly is smoothly bent, with an internal and an external side (see Figure 8(a)). The internal side is concave and acts as a cradle for the pre-tRNA substrate. It contains also most of the universally conserved nucleotides from P4, J5/15, J18/2, J19/4 that are thought to form the catalytic site (see Figure 2). Contacts with the pre-tRNA substrate are clustered in three regions. (i) The 30 -terminal RCCA sequence binds to the internal loop L15/16 in the E. coli model or to the L15 loop in the B. subtilis model (see Figure 8(b)). (ii) The catalytic site, in the neighbourhood of the 50 termini of the tRNA, comprises J5/15, the 50 strand of P4, J18/2 as well as the conserved A351 and 352 from J19/4. (iii) The P11 stem and the A118 from P9 are part of the recognition site for the T-loop of the substrate (Pan et al., 1995; Loria & Pan, 1997). The external side of the threedimensional architecture possesses, in contrast, the most variable extensions that maintain the active conformation of the catalytic site. On this external side can be found the anchoring sites for the L14 Figure 8. (a) Top view of the CPK model of the B. subtilis RNase P RNA-tRNA complex, viewed down the axis of the T-stem axis of the tRNA. The T arm of the tRNA lies in the cradle made by the assembly of the red, green, purple and orange stacks (respectively P11/P10/ P7, P9/P8, P1/P4/P5 and P2/P3). (b) Side view of the B. subtilis RNase P RNA-tRNA complex. A rotation of 90 along the horizontal axis is applied to the model between both views. The upstream sequence of the tRNA is not shown for clarity. The base-pairing between L15 and the tRNA RCCA sequence is visible at the bottom of the Figure. and L18 loops (see Figure 2(b) and (d)) and most nucleotides involved in the binding of the RNase P protein component (Talbot & Altman, 1994a). Validation of the models The structure of the E. coli RNase P RNA has been investigated through crosslinking experiments (Burgin & Pace, 1990; Nolan et al., 1993; Ê long crosslinking Harris et al., 1994) using a 9 A agent (azidophenacyl bromide) speci®cally attached to the 50 end of either RNase P RNA or mature tRNA. The main crosslinks obtained in that way are summarized in Table 6. For each attachment site, the crosslinked nucleotides are speci®ed, 785 Modelling of the 3D Structure of RNase P RNA Table 6. Validation data for the E. coli model, according to available crosslinking data (Nolan et al., 1993; Harris et al., 1994; Oh & Pace, 1994) Ê) Crosslinked nucleotides (inter-phosphate distance in A Attachment G G G G G G G G 63 135 179 229 292 304 332 350 G1 G 53 G 64 335 112 112 127 247 247 247 247 (17.8) (15.0) (29.0) (17.9) (24.9) (28.0) (15.4) (20.0) 248 (10.1) 111 (21.6) 67 (25.5) 351 118 118 129 248 248 248 248 (25.5) (18.0) (30.4) (18.4) (23.9) (24.6) (13.7) (19.4) 331 (20.9) 114 (24.4) 68 (21.2) 352 (26.4) 123 137 253 295 295 300 (36.0) (22.0) (11.7) (30.4) (15.3) (22.4) 332 (16.4) 348 (24.6) 100 (23.2) 128 138 260 296 (18.4) (20.2) (17.1) (32.6) 132 (18.2) 183 (19.7) ‡2 (30.6) 185 (17.6) ‡3 (34.7) 329 (19.4) -4 (19.9) -5 (25.1) 333 (21.2) 349 (20.7) 118 (22.0) 247 (17.8) 248 (21.1) 0 The attachment site of 5 -azidophenacyl crosslinking agents is indicated in the ®rst column. For each attachment site, the crosslinked nucleotides are indicated, with the mean internucleotide distance measured between P atoms. Nucleotides within the pre-tRNA subÊ ) are bold. strate are indicated with italicized numbers. Distances greater than the chosen threshold value (30 A with the mean inter-nucleotide distance (measured between phosphorus atoms) indicated in parentheses. We consider that a given crosslink is compatible with our model if this distance is less than Ê , since the localization of these crosslinks is 30 A not known at an atomic scale. Moreover, since most attachment sites are obtained by circular permutations of the RNA, crosslinking agents are attached to new 50 ends, for which the backbone itself is expected to add some extra ¯exibility. The majority of intra- and inter-molecular crosslinks are compatible with our models (see Table 6). Two distinct sets of incompatible crosslinks are visible. The ®rst set involves the G179 crosslinks, i.e. within the region of L11/12, which is poorly constrained in our models. The high level of conservation of this internal loop suggests speci®c contacts between its bases, which we did not attempt to model. The second set of unsatis®ed constraints occurs in the region of P15. Once again, the actual 3D structure of this region is likely to be more compact than in our models and remains undetermined. Discussion General organization of the core of RNase P RNAs Type A and type B RNase P secondary structures share a common conserved core that encompasses parts of structural domains I and II: roughly, stems P1 to P5 with P15 from domain II and stems P7 to P11 from domain I (see Figure 9). These core elements may be conserved because: (i) they are part of the catalytic site (P4, J5/15, J19/4 and J18/2); (ii) they are involved in the correct recognition of the substrate (P11, L15); and (iii) they form part of the frame that stabilizes the catalytic core or constitutes the interdomain interface (P8, P9). Therefore, deletion of some of the normally conserved core elements may affect or abolish the speci®city for pre-tRNA substrates, while still allowing cleavage of alternative substrates (Guerrier-Takada & Altman, 1992; Pan & Jakacka, 1996). Figure 9(c) illustrates the decomposition of the core into two main domains, each formed by the coaxial stacking of several helices. We do not wish to convey the impression that Figure 9(c) represents the 2D structure of an Ur-RNase P, precursor of the two bacterial types, although we cannot exclude the fact that it constitutes a possibility. The common interdomain contacts involve P1 and L9 as well as P4 and L8. However, type A and type B RNase P RNAs present their own set of peripheral, tertiary elements that contribute to the stabilization of the catalytic core through the formation of speci®c tertiary interactions (see The overall threedimensional architecture in Results and Figure 9(a) and (b)). Insertion sites of less conserved, peripheral elements occur on the core periphery. Type A and type B RNase P RNAs are best distinguished by the elements inserted in the single-stranded regions between P5 and P7 and L15 (see Figure 9). Several long-range interactions characteristic of each type of RNase P occur between peripheral elements (e.g. L5.1-L15.1 in type B or P6 in type A) as well as between the periphery and the core (e.g. L14-P8 and L18-P8 in type A). Despite the fact that these peripheral elements are non-homologous, they nevertheless have similar, functional purposes by contributing to the folding and stabilization of a common catalytic core. Comparison with other models The present models are somewhat similar, in their general domain organisation, to the last model described by Harris and Pace (later referenced as the HP model, see Harris et al., 1997). Common features include a roughly similar orientation of the pre-tRNA substrate, an identical ¯attened conformation of the four-way junction, a network of tertiary interactions L14-P8, L18-P8 and L9-P1, and a C-shaped extension P15/P16/P17/P6. Signi®cant divergences are, however, clearly visible, in particular in domain II. Whereas the P4/P5 and P1/P2/P3 stacks form a T in the HP model, the stacks P1/P4/P5 and (P19)/P2/P3 are nearly parallel in our models. 786 Modelling of the 3D Structure of RNase P RNA Figure 9. Simpli®ed 2D representations of (a) a typical type A sequence and of (b) a type B sequence with their characteristic tertiary interactions. Stems are colour-coded as in Figures 1 and 2 with conserved regions represented with thick lines. Main insertion sites are indicated by arrowheads. (c) A 2D representation of the minimal core. Insertion sites common to both types (P12, P18 and P19) are indicated by small arrowheads. The insertion sites of peripheral elements that are characteristic of a given type are indicated by large arrowheads. 787 Modelling of the 3D Structure of RNase P RNA More importantly, the incorporation of P19 in our models leads to a different topology of the central pseudoknot with an alternative crossing of the junctions J19/4 and J3/4 (see Figure 4). In Results, we proposed that the conserved J19/4 lies on the edge of the deep groove of P2 and along the shallow groove of P4, if one assumes a right-handed stack between P19 and P2. The crossing between J19/4 and the 50 strand of the P1 and P2 helices is thus reversed compared to that of the HP model. In order to avoid the passage of J19/4 between the 50 strand of P2 and J3/4, the crossing between the J19/4 and J3/4 strands also must be reversed. As a consequence, J19/4 lies on the substrate side in our models, instead of being on the protein side as in the HP model (compare the boxed 2D sketches in Figure 10(a) and (b)). This different placement of J19/4 has important consequences for the sequence of folding of the central pseudoknot. P2 was among the ®rst helices to be ®rmly identi®ed by phylogenetic sequence comparison (James et al., 1988) and has, therefore, been naturally thought to be part of the secondary structure. Evidence for four out of the eight basepairs of P4 was then also provided. Because of its assumed shorter length, P4 has been implicitly considered to fold after the formation of P2. This hierarchical folding is present in the HP model: the unfolding of its pseudoknot starts with the disruption of P4 (see Figure 10(a), right) and not by the opening of P2, which would lead to an entanglement (see Figure 10(a), left). As new sequences provided new instances of covariations in the distal base-pairs, P4 has been extended to its present length of 8 bp plus a bulge (Haas et al., 1991). However, the folding of the central pseudoknot has not been re-analysed. The following facts argue for an alternative folding of the pseudoknot: (i) P4 is indeed longer than P2 (8 bp and a conserved bulge versus. 7 bp); (ii) P2 displays greater variation in length than P4, and is even absent in some mitochondrial sequences where P4 lacks only one base-pair; (iii) protections towards the Fe(II)EDTA hydroxyl radical show that P2 is almost completely exposed to solvent, whereas P4 is deeply buried in the structure (Loria & Pan, 1996); and (iv) experimental evidence for a slower formation of P2 has been reported (Zarrinkar et al., 1996). These arguments would indicate that P2 is indeed a tertiary interaction with P4 forming the 2D structure. This is noticeable in our models (see Figure 10(b), right), where the 30 strand of P2 and its ¯anking junctions, J18/2 and J19/4, make a broad loop that, upon folding, creates the pairing of P2. One can therefore expect that the unfolding of the pseudoknot starts with the disruption of P2 and not of P4 (see Figure 10(b), left). In domain I, the main difference between the HP model and our models concerns the positioning of P13 and P14. In our models, the fact that P13 and P14 are stacked implies a rather different orientation of P13 and P12. The latter stem is partially constrained by the maintenance of the L12-P10.1 interaction. But, since P10.1 is itself constrained only in its lower part, small variations in the structure of the P10/P10.1/P11 junction would result in greater changes in the position of the distal end of P10.1. This would dramatically alter the orientation of P12. This is the reason why the modelling of P12 and L11/12 will probably need re®nement. Long-range interactions within RNase P RNAs RNA-RNA long-range interactions The long-range interactions within RNase P RNAs belong essentially to two broad categories: base-pairing between two loops and the family of GNRA loop-receptor interactions (Westhof et al., 1996a; and see Figure 1). Interestingly, both kinds of contacts are alternatively found between L9 and P1, since this interaction may be realized either through a GNRA-helix interaction (mostly in type A sequences) or through the formation of a pseudoknot P21 in some mycoplasmas (Massire et al., 1997). Nevertheless, GNRA loop-receptor interactions are found at four distinct locations and are thus the most widespread long-range interactions among RNase P RNAs. In addition, several tertiary interactions proposed in our models involve adenine-rich terminal loops (L8, L13, L5.1 and L15.1) distinct from the classical GNRA tetraloop, as well as adenine-rich internal loops, such as those found within P10/P11 and P12. Recently, Abramowitz & Pyle (1997) reported that GNRA tetraloops may be considered as members of a larger family of GNn . . . RA motifs that all share the ability to form tertiary interactions through the recognition of the minor groove of a helix. GNn . . . RA motifs may thus be enlarged to hexa- or heptaloops, or even internal loops if the insertion sequence has the ability to fold itself into a regular hairpin stem ±loop structure. Such a motif is indeed found in the Deinococcus radiodurans sequence, which displays an unusually long P9 stem interrupted by a GA . . . GAA internal loop after 5 bp, i.e. at the exact place of the L9 tetraloop found in most sequences. Accordingly, the G:C pairs that constitute the putative loop receptor are seen in P1 as expected. Other probable members of the GNn . . . RA motif family are the L8 decaloop and the L15.1 loop from type B RNase P RNAs, of respective consensus GAAN3AUAA and GAAN35AA (see Figure 7). The conservation of both ends suggests that these loops are closed by at least one sheared G:A or A:A base-pair, thus extending the helical stacks P8 and P15.1. The Watson-Crick side of at least the 30 adenine base is therefore available for further long-range interaction, as is the case with a canonical GNRA tetraloop. Thus, whereas adenine bases of the L8 decaloop are proposed to interact with the shallow groove of the G:C basepairs within P4, adenine bases of L15.1 could well be involved in a new type of loop-loop interaction with the hexaloop UGNRAU, L5.1. The higher 788 Modelling of the 3D Structure of RNase P RNA Figure 10. Schematic views of the unfolding (upper arrows) and folding processes (lower arrows) of domain II as a function of the conformation of the P2-P4 pseudoknot common to RNase P RNA. (a) Topology found in previous models (HP). (b) Topology in present models (MJW). The secondary structure schemes in the rectangles represent the folding topology of the central P2-P4 pseudoknot and its connections to the surrounding P1, P3 and R19/P19. The insertion of stems P5 to P18 is indicated by two dots. In (a), J19/4 is behind J3/4 and the 50 strand of P1 and P2, while in (b) it is the reverse. Paths to the right illustrate possible unfolding pathways. In (a), ®rst P4 must open and then P2, while the opposite occurs in (b). Paths to the left illustrate unfolding pathways leading to strand entanglements. The process of unfolding the central pseudoknot P2-P4 is thus non-commutative, in HP models ®rst P4 opens and then P2, while in MJW models ®rst P2 opens and then P4. The non-commutativity of the two operations comes formally from the topology of the junctions between the central pseudoknot and the connecting helices. Thus, in the HP model, if the relative positions of P3 and domain I were interchanged, the path to the left would be possible (a). Similarly, in MJW models, if the junctions J1/4 and J3/4 were interchanged, the path to the left (b) would be possible. However, the assumed right-handed stacking between P2 and P3, as well as between P1 and P4, leads to the stereochemically favoured foldings shown in the boxed schemes. Further, the choice for the MJW model compared to the HP model derives from the right-handed stack between P19/R19 and P2. degree of conservation within the decaloop L8 is moreover striking, and suggests speci®c pairings between 50 and 30 ends of the loop. Internal loops of similar sequence GAAN . . . AUAA are found also within stem P9 of purple bacteria sequences (Brown et al., 1996) in place of the GNRA loop L9, providing additional evidence for the implication of such motifs in tertiary interactions. In type A RNase P RNAs, the natural counterpart of the L8 decaloop is a pentaloop of consensus UAAYR, which is also proposed to interact with the G:C base-pairs in P4. This loop has no obvious similarity with the classical GNRA loop and seems to be much less widespread. The G . . . YAA internal loop motif is another interesting motif found at two different locations among RNase P RNAs (see Figure 1(c)). This motif within P10/P11 was shown to recognise speci®cally the pre-tRNA backbone (Pan et al., 1995; Loria & Pan, 1997). As proposed in our 3D models, the same motif within P12 is potentially involved in the recognition of the conserved heptaloop L13 of consensus GUAAGAG. It remains to be seen whether the motif G . . . YAA is found in other large RNA structures. Possible compensation of a missing RNA-RNA interaction by the protein component The conservation of the L8-P4 interaction is of key importance in our models, since it maintains the association of the two main folding domains in a cradle-like conformation. This interaction is also expected to occur in archaea, since the archaeal 789 Modelling of the 3D Structure of RNase P RNA consensus core is almost identical with the bacterial core (Haas et al., 1996a). However, the whole stem ±loop P8/L8 is absent in both archaea Methanococcus jannaschii and Archaeoglobus fulgidus. These sequences are, moreover, the only archaeal ones where the whole P16/P17/P6 extension has also been deleted, raising the question of a possible link between the deletion of these structural elements. One can nevertheless speculate that, in both these organisms, the protein component of RNase P has co-evolved in order to compensate for the lack of P8 and/or P6 so as to maintain the active conformation of the RNA. Unfortunately, no RNase P protein from archaea has been fully identi®ed to date (Nieuwlandt et al., 1991; Darr et al., 1992), and no analogue of bacterial (Brown & Pace, 1992) or eukaryal (Kim et al., 1997) proteins is found in complete archaeal genomes (Bult et al., 1996; Klenk et al., 1997; Smith et al., 1997). The absence of a bacterial-like protein in archaeal RNase P is particularly striking when considering the very strong structural homology between archaeal and bacterial RNase P RNA. In eukarya, the secondary structure of RNase P RNA is not yet ®rmly established (Tranguch & Engelke, 1993), but resembles the bacterial or archaeal consensus in many points (Chen & Pace, 1997). Animal sequences nevertheless seem to lack stems P5 and P15, which may be linked with the emergence of a stem-loop-stem structure within P3. Yeast sequences may be distinguished by their unique structure of the cruciform junction, illustrating once again the modular use of alternative RNA structural motifs and their possible stabilization through either RNA-RNA or protein-RNA interactions. Conclusion The modelling of the three-dimensional architecture of group I introns emphasized the hierarchical folding of large RNA molecules (Michel & Westhof, 1990; Jaeger et al., 1996; Westhof et al., 1996a). First, the pairings of the secondary structure join proximate regions in sequence, followed by end-to-end stacking of contiguous helices. Those preformed domains then associate to constitute the compact tertiary structure via interactions between tertiary architectural motifs (for a review, see Brion & Westhof, 1997). The ®rst recurrent 3D motif discovered (Michel & Westhof, 1990) consists of a GNRA tetraloop interacting with two consecutive base-pairs in the shallow groove of an RNA helix. The primordial role played by this motif and its variants in the hierarchical self-assembly process of large RNAs has since been established (Jaeger et al., 1994; Murphy & Cech, 1994; Pley et al., 1994; Costa & Michel, 1995, 1997; Cate et al., 1996a; Costa et al., 1997). The modelling of four complete self-splicing group I introns belonging to four subgroups illustrated the widespread use of two types of long-range RNA-RNA anchors between peripheral domains: GNRA-helix receptor and loop-loop interactions (e.g. see Lehnert et al., 1996). The picture that emerged from that study was that non-homologous peripheral domains, characteristic of each subgroup, engage in either one of those two RNA-RNA long-range anchoring interactions and, thereby, contribute to either the stabilization of the helical stems of the catalytic core or the correct positioning of the helical substrate. The present work extends the view of a modular and hierarchical RNA self-assembly, as seen in the architecture of group I introns, to the bacterial RNase P ribozymes. Again, coaxial stacking of helical stems (one of which contains a pseudoknot) is largely prevalent in the formation of subdomains that create a cradle into which the substrate, the amino acid arm of pre-tRNA, is bound. And, again, the maintenance and assembly of the subdomains is performed by a recurrent and systematic use of GNRA-helix and loop-loop anchoring contacts. Finally, again, different and non-homologous peripheral elements promote different and mutually exclusive long-range anchoring contacts with identical functional purposes. Only a fraction of group I introns self-assemble and most of them require the help of protein cofactors for proper folding. A well-described system is the Neurospora crassa mitochondrial tyrosyl tRNA synthetase (CYT-18 protein), which binds to the P4-P6 domain of introns belonging to subgroup IA2, like the N. crassa NcLSU intron (Caprara et al., 1996b). Those introns lack the P5abc extension, which forms a self-folding domain (Murphy & Cech, 1993; Cate et al., 1996a) and is situated at the back of the substrate binding site. It has been shown that CYT-18 binds at the same place as P5abc (Caprara et al., 1996a). Similarly, in the present models, the P RNA forms a layer of helices sandwiched between the protein cofactor and the tRNA substrate. One can speculate that a fully autonomous RNase P functioning without a protein cofactor would require the presence of additional peripheral domains for adequate stabilization of the helical core at the back side of the substrate recognition surface. Methodology Phylogenetic analysis The initial alignment of 87 bacterial sequences from the sixth edition of the book of P was ®rst downloaded from the ribonuclease P database (Brown, 1997). New sequences were added and manually aligned using SeqPup (Don Gilbert, Indiana University) as they were directly brought to publication (J. W. Brown, personal communication; and Haas et al., 1996b; Herrmann et al., 1996; Vioque, 1997) or derived from genome sequencing projects (Fleischmann et al., 1995; Fraser et al., 1995; Himmelreich et al., 1996; Tomb et al., 1997). Although the sequencing of the complete gen- 790 Modelling of the 3D Structure of RNase P RNA Table 7. References and access numbers of the sequences cited in this work Organism Reference Acholeplasma laidlawii Archaeglobus fulgidus Bacillus megaterium Bacillus subtilis Brevibacillus brevis Chlamydia trachomatis Chlorobium limicola Clostridium innocuum Deinococcus radiodurans Escherichia coli Methanococcus jannaschii Mycoplasma capricolum Mycoplasma fermantans Mycoplasma genitalium Mycoplasma pneumoniae Reclinomonas americana Synechococcus PCC6301 Haas et al. (1996b) Klenk et al. (1997) James et al. (1988) Reich et al. (1986) James et al. (1988) Herrmann et al. (1996) Haas et al. (1994) Haas et al. (1996b) Haas et al. (1991) Reed et al. (1982) Bult et al. (1996) Ushida et al. (1995) Siegel et al. (1996) Fraser et al. (1995) Himmelreich et al. (1996) Lang et al. (1997) Banta et al. (1992) ome of Neisseria gonorrhoeae, Streptococcus pyogenes, (Roe et al., 1997, http://dna1.chem.uoknor.edu, Neisseria meningditis, Vibrio cholerae (TIGR, 1997, http://www. tigr.org/tdb/mdb/mdb.html) and Treponema pallidum (Weinstock & Norris, 1997, http://utmmg.med.uth. tmc.edu/treponema/tpall.html) are still under progress, their unique RNase P RNA gene (rnpB) has been identi®ed by BLAST searches (Altschul et al., 1997). These sequences contain the whole set of conserved nucleotides that are characteristic of bacterial RNase P RNA. They display therefore a canonical type A secondary structure and have been added to the alignment. Sequences were then classi®ed by phylogeny before being sought for ®ner alignment within the most variable stems (P3, P6, P9, P12, P16, P17, P19, P10.1) and within single-stranded regions (L11/12, L15/16, L8, L15.1). For clarity, references and access numbers of sequences of particular interest are grouped in Table 7. Throughout the text, any base-pair is denoted by a colon. Residues involved in internal loops are separated by three dots. Other standard abbreviations are: N, any nucleotide; R, A or G; Y, U or C; H is A, C or U. Nucleotides in bold letters represent invariant nucleotides. Three-dimensional modelling Molecular modelling was performed as described (Michel & Westhof, 1990; Westhof, 1993) using the program MANIP (unpublished) implemented on an SGI workstation. The 3D structures were geometrically re®ned with the restrained least-squares program NUCLIN-NUCLSQ (Westhof et al., 1985). Drawings were produced by DRAWNA (Massire et al., 1994) on an SGI workstation. Coordinates are available by anonymous ftp (130.79.17.244). Acknowledgements We are grateful to Dr J. W. Brown for communicating RNase P RNA sequences from green non-sulphur bacteria prior to publication, and more generally for his involvement in the creation and the updating of the RNase P database (Brown, 1997). We thank Dr S. Norris for the identi®cation of the rnpB gene in the Treponema Access number U64877 AE000782 M19022 M13175 M19023 U44031 L25703 U64878 M64708 M17569 L77117 D13066 U41756 L43967 U00089 AF007261 X63566 pallidum genome. C.M. ackowledges support from the MinisteÁre de l'eÂducation nationale, de l'enseignement supeÂrieur, de la recherche et de l'insertion professionnelle and the Human Frontier Science Program (CRG0291/1997-M to E.W.). E.W. thanks the ``Institut Universitaire de France'' for support. References Abramowitz, D. L. & Pyle, A. M. (1997). Remarkable morphological variability of a common RNA folding motif: the GNRA tetraloop-receptor interaction. J. Mol. Biol. 266, 493± 506. Altman, S., Kirsebom, L. & Talbot, S. (1993). Recent studies of ribonuclease P. FASEB J. 7, 7 ± 14. Altschul, S. F., Madden, T. L., SchaÈffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389± 3402. Banta, A. B., Haas, E. S., Brown, J. W. & Pace, N. R. (1992). Sequence of the ribonuclease P RNA gene from the cyanobacterium Anacystis nidulans. Nucl. Acids Res. 20, 911. Brion, P. & Westhof, E. (1997). Hierarchy and dynamics of RNA folding. Annu. Rev. Biophys. Biomol. Struct. 26, 113± 137. Brown, J. W. (1997). The ribonuclease P database. Nucl. Acids Res. 25, 263± 264. Brown, J. W. & Pace, N. R. (1992). Ribonuclease P RNA and protein subunits from bacteria. Nucl. Acids Res. 20, 1451± 1456. Brown, J. W., Haas, E. S. & Pace, N. R. (1993). Characterization of ribonuclease P RNAs from thermophilic bacteria. Nucl. Acids Res. 21, 671± 679. Brown, J. W., Nolan, J. M., Haas, E. S., Rubio, M. A. T., Major, F. & Pace, N. R. (1996). Comparative analysis of ribonuclease P RNA using gene sequences from natural microbial population reveals tertiary structural elements. Proc. Natl Acad. Sci. USA, 93, 3001± 3006. Bult, C. J., White, O., Olsen, G. J., Zhou, L., Fleischmann, R. D., Sutton, G. G., Blake, J. A., FitzGerald, L. M., Clayton, R. A., Gocayne, J. D., Kerlavage, A. R., Dougherty, B. A., Tomb, J.-F., Adams, M. D., Reich, C. I., Overbeek, R., Kirkness, E. F., Weinstock, K. G., Merrick, J. M., Glodek, A., Scott, J. L., Geoghagen, N. S. M., Weidman, J. F., Modelling of the 3D Structure of RNase P RNA Fuhrmann, J. L., Nguyen, D., Utterbach, T. R., Kelley, J. M., Peterson, J. D., Sadow, P. W., Hanna, M. C., Cotton, M. D., Roberts, K. M., Hurst, M. A., Kaine, B. P., Borodovsky, M., Klenk, H.-P., Fraser, C. M., Smith, H. O., Woese, C. R. & Venter, J. C. (1996). Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058± 1073. Burgin, A. B. & Pace, N. R. (1990). Mapping the active site of ribonuclease P RNA using a substrate containing a photoaf®nity agent. EMBO J. 9, 4111± 4118. Caprara, M. G., Lehnert, V., Lambowitz, A. M. & Westhof, E. (1996a). A tyrosyl-tRNA synthetase recognizes a conserved tRNA-like structural motif in the group I intron catalytic core. Cell, 87, 1135± 1145. Caprara, M. G., Mohr, G. & Lambowitz, A. M. (1996b). A tyrosyl-tRNA synthetase protein induces tertiary folding of the group I intron catalytic core. J. Mol. Biol. 257, 512± 531. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E., Cech, T. R. & Doudna, J. A. (1996a). Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678± 1685. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E., Cech, T. R. & Doudna, J. A. (1996b). RNA tertiary structure mediation by adenosine platforms. Science, 273, 1696± 1699. Chen, J.-L. & Pace, N. R. (1997). Identi®cation of the universally conserved core of RNase P RNA. RNA, 3, 557± 560. Costa, M. & Michel, F. (1995). Frequent use of the same tertiary motif by self-folding RNAs. EMBO J. 14, 1276± 1285. Costa, M. & Michel, F. (1997). Rules for RNA recognition of GNRA tetraloops deduced by in vitro selection: Comparison with in vivo evolution. EMBO J. 16, 3289± 3302. Costa, M., DeÁme, E., Jacquier, A. & Michel, F. (1997). Multiple tertiary interactions involving domain II of group II self-splicing introns. J. Mol. Biol. 267, 520± 536. Darr, S. C., Brown, J. W. & Pace, N. R. (1992). The varieties of ribonuclease P. Trends Biochem. Sci. 17, 178± 182. Doudna, J. A. & Cech, T. R. (1995). Self-assembly of a group I intron active site from its component tertiary structural domains. RNA, 1, 36 ±45. Easterwood, T. R. & Harvey, S. C. (1997). Ribonuclease P RNA: Models of the 15/16 bulge from Escherichia coli and the P15 loop of Bacillus subtilis. RNA, 3, 577± 585. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J.-F., Dougherty, B. A. & Merrick, J. M. (1995). Whole-genome randon sequencing and assembly of Haemophilus in¯uenzae Rd. Science, 269, 496± 512. Frank, D. N., Ellington, A. E. & Pace, N. R. (1996). In vitro selection of RNase P RNA reveals optimized catalytic activity in a highly conserved structural domain. RNA, 2, 1179± 1188. Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., Fleischmann, R. D., Bult, C. J., Kerlavage, A. R., Sutton, G. G., Kelley, J. M., Fritchman, J. L., Weidman, J. F., Small, K. V., Sandusky, M., Fuhrmann, J., Nguyen, D., 791 Utterbach, T. R., Saudek, D., Philips, C. A., Merrick, J. M., Tomb, J.-F., Dougherty, B. A., Bott, K. F., Hu, P.-C., Lucier, T. S., Peterson, S. N., Smith, H. O., Hutchinson, C. A. & Venter, J. C. (1995). The minimal gene complement for Mycoplasma genitalium. Science, 270, 397± 403. Glemarec, C., Kufel, J., FoÈldesi, A., Maltseva, T., SandstroÈm, A., Kirsebom, L. A. & Chattopadhyaya, J. (1996). The NMR structure of 31mer RNA domain of Escherichia coli RNase P RNA using its non-uniformly deuterium labelled counterpart [the `NMRwindow' concept]. Nucl. Acids Res. 24, 2022± 2035. Guerrier-Takada, C. & Altman, S. (1992). Reconstitution of enzymatic activity from fragments of M1 RNA. Proc. Natl Acad. Sci. USA, 89, 1266± 1270. Guerrier-Takada, C., Gardiner, K., Marsh, T. L., Pace, N. R. & Altman, S. (1983). The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell, 35, 849± 857. Guerrier-Takada, C., Lumelsky, N. & Altman, S. (1989). Speci®c interactions in RNA enzyme-substrate complexes. Science, 246, 1578± 1584. Haas, E. S., Morse, D. P., Brown, J. W., Schmidt, F. J. & Pace, N. R. (1991). Long-range structure in ribonuclease P RNA. Science, 254, 853± 856. Haas, E. S., Brown, J. W., Pitulle, C. & Pace, N. R. (1994). Further perspective on the catalytic core and secondary structure of ribonuclease P RNA. Proc. Natl Acad. Sci. USA, 91, 2527± 2531. Haas, E. S., Armbruster, D. W., Vucson, B. M., Daniels, C. J. & Brown, J. W. (1996a). Comparative analysis of ribonuclease P RNA in Archaea. Nucl. Acids Res. 24, 1252± 1259. Haas, E. S., Banta, A. B., Harris, J. K., Pace, N. R. & Brown, J. W. (1996b). Structure and evolution of ribonuclease P RNA in Gram-positive bacteria. Nucl. Acids Res. 24, 4775 ±4782. Harris, M. E., Nolan, J. M., Malothra, A., Brown, J. W., Harvey, S. C. & Pace, N. R. (1994). Use of photoaf®nity crosslinking and molecular modeling to analyze the global architecture of ribonuclease P RNA. EMBO J. 13, 3953± 3963. Harris, M. E., Kazantchev, A. V., Chen, J. L. & Pace, N. R. (1997). Analysis of the tertiary structure of the ribonuclease P ribozyme-substrate complex by photoaf®nity crossliking. RNA, 3, 561± 574. Herrmann, B., Winqvist, O., Mattson, J. G. & Kirsebom, L. A. (1996). Differentiation of Chlamydia sby sequence determination and restriction endonuclease cleavage of RNase P RNA genes. J. Clin. Microbiol. 34, 1897± 1902. Heus, H. A. & Pardi, A. (1991). Structural features that give rise to the unusual stability of RNA hairpins containing GNRA loops. Science, 253, 191± 194. Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.-C. & Herrmann, R. (1996). Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucl. Acids Res. 24, 4420± 4449. Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J. Mol. Biol. 236, 1271± 1276. Jaeger, L., Michel, F. & Westhof, E. (1996). The structure of group I ribozymes. In Catalytic RNA (Eckstein, F. & Lilley, D. M. J., eds), vol. 10, pp. 33 ± 51, Springer Verlag, Berlin. James, B. D., Olsen, G. J., Liu, J. & Pace, N. R. (1988). The secondary structure of ribonuclease P RNA, the catalytic element of a ribonucleoprotein enzyme. Cell, 52, 19 ± 26. 792 Kahle, D., Wehmeyer, U. & Krupp, G. (1990). Substrate recognition by RNase P and by the catalytic M1 RNA: identi®cation of possible contact points in pre-tRNAs. EMBO J. 9, 1929± 1937. Kim, J. J., Kilani, A. F., Zhan, X., Altman, S. & Liu, F. (1997). The protein cofactor allows the sequence of an RNase P ribozyme to diversify by maintaining the catalytically active structure of the ribozyme. RNA, 3, 613± 623. Kirsebom, L. A. & Altman, S. (1989). Reaction in vitro of some mutants of RNase P with wild-type and temperature-sensitive substrates. J. Mol. Biol. 207, 837± 840. Kirsebom, L. A. & SvaÈrd, S. G. (1994). Base-pairing between Escherichia coli RNase P RNA and its substrate. EMBO J. 13, 4870± 4876. Klenk, H.-P., Clayton, R. A., Tomb, J.-F., White, O., Nelson, K. E., Ketchum, K. A., Dodson, R. J., Gwimm, M., Hickey, E. K., Peterson, J. D., Richardson, D. L., Kerlavage, A. R., Graham, D. E., Kyrpides, N. C., Fleischmann, R. D., Quackenbush, J., Lee, N. H., Sutton, G. G., Gill, S., Kirkness, E. F., Dougherty, B. A., McKenney, K., Adams, M. D., Loftus, B., Peterson, S., Reich, C. I., McNeil, L. K., Badger, J. H., Glodek, A., Zhou, L., Overbeek, R., Gocayne, J. D., Weidman, J. F., McDonald, L., Utterback, T., Cotton, M. D., Sproggs, T., Artiach, P., Kaine, B. P., Sykes, S. M., Sadow, P. W., D'Andrea, K. P., Bowman, C., Fujii, C., Garland, S. A., Mason, T. M., Olsen, G. J., Fraser, C. M., Smith, H. O., Woese, C. R. & Venter, J. C. (1997). The complete genome sequence of the hyperthermophilic, suphate-reducing archaeon Archaeoglobus fulgidus. Nature, 390, 364± 370. Knap, A. K., Wesolowski, D. & Altman, S. (1990). protection from chemical modi®cation of nucleotides in complexes of M1 RNA, the catalytic subunit of RNase P from E. coli, and tRNA precursors. Biochimie, 72, 779± 790. LaGrandeur, T. E., HuÈttenhofer, A., Noller, H. F. & Pace, N. R. (1994). Phylogenetic comparative chemical footprint analysis of the interaction between ribonuclease P RNA and tRNA. EMBO J. 13, 3945± 3952. Lang, B. F., Burger, G., O'Kelly, C. J., Cedergren, R., Golding, G. B., Lemieux, C., Sankoff, D., Turmel, M. & Gray, M. W. (1997). An ancestral mitochondrial DNA ressembling a eubacterial genome in miniature. Nature, 387, 493± 497. Lehnert, V., Jaeger, L., Michel, F. & Westhof, E. (1996). New loop-loop tertiary interactions in self-splicing introns of subgroups IC and ID: a complete 3D model of the Tetrahymena thermophila ribozyme. Chem. Biol. 3, 993± 1009. Loria, A. & Pan, T. (1996). Domain structure of the ribozyme from eubacterial ribonuclease P. RNA, 2, 551± 563. Loria, A. & Pan, T. (1997). Recognition of the T stemloop of a pre-tRNA substrate by the ribozyme from Bacillus subtilis ribonuclease P. Biochemistry, 36, 6317± 6325. Lumelsky, N. & Altman, S. (1988). Selection and characterization of randomly produced mutants in the gene coding for M1 RNA. J. Mol. Biol. 202, 443± 454. Massire, C., Gaspin, C. & Westhof, E. (1994). DRAWNA: a program for drawing schematic views of nucleic acids. J. Mol. Graph. 12, 201± 206. Modelling of the 3D Structure of RNase P RNA Massire, C., Jaeger, L. & Westhof, E. (1997). Phylogenetic evidence for a new tertiary interaction in RNase P RNA. RNA, 3, 553± 556. Mattson, J. G., SvaÈrd, S. G. & Kirsebom, L. A. (1994). Characterization of the Borrelia burgdorferi RNase P RNA gene reveals a novel tertiary interaction. J. Mol. Biol. 241, 1 ±6. Michel, F. & Westhof, E. (1990). Modelling of the threedimensional architecture of group I catalytic introns based on comparative sequence analysis. J. Mol. Biol. 216, 585± 610. Murphy, F. L. & Cech, T. R. (1993). An independently folding domain of RNA tertiary structure within the Tetrahymena ribozyme. Biochemistry, 32, 5291± 5300. Murphy, F. L. & Cech, T. R. (1994). GAAA tetraloop and conserved bulge stabilize tertiary structure of a group I intron domain. J. Mol. Biol. 236, 49 ± 63. Nieuwlandt, D. T., Haas, E. S. & Daniels, C. J. (1991). The RNA component of RNase P from the archaebacterium Haloferax volcanii. J. Biol. Chem. 266, 5689± 5695. Nolan, J. N., Burke, D. H. & Pace, N. R. (1993). Circularly permuted tRNAs as speci®c photoaf®nity probes of ribonuclease P RNA structure. Science 261, 762± 765. Oh, B.-K. & Pace, N. R. (1994). Interaction of the 30 -end of tRNA with ribonuclease P RNA. Nucl. Acids Res. 22, 4087± 4094. Pace, N. R. & Brown, J. W. (1995). Evolutionary perspective on ribonuclease P structure and function. J. Bacteriol. 177, 1919± 1928. PagaÂn-Ramos, E., Lee, Y. & Engelke, D. R. (1996). A conserved RNA motif involved in divalent cation utilization by nuclear RNase P. RNA, 2, 1100± 1109. Pan, T. (1995). Higher order folding and domain analysis of the ribozyme from Bacillus subtilis ribonuclease P. Biochemistry, 34, 902± 909. Pan, T. & Jakacka, M. (1996). Mutiple substrate binding sites in the ribozyme from Bacillus subtilis RNase P. EMBO J. 15, 2249± 2255. Pan, T., Loria, A. & Zhong, K. (1995). Probing of tertiary interactions in RNA: 2'-hydroxyl-base contacts between the RNase P RNA and pre-tRNA. Proc. Natl Acad. Sci. USA, 92, 12510± 12514. Pley, H. W., Flaherty, K. M. & McKay, D. B. (1994). Model for an RNA tertiary interaction from the structure of an intermolecular complex between a GAAA tetraloop and an RNA helix. Nature, 372, 111± 114. Reed, R. E., Baer, M. F., Guerrier-Takeda, C., DonnisKeller, H. & Altman, S. (1982). Nucleotide sequence of the gene encoding the RNA subunit (M1 RNA) of ribonuclease P from Escherichia coli. Cell, 30, 627± 636. Reich, C., Gardiner, K. J., Olsen, G. J., Pace, B., Marsh, T. L. & Pace, N. R. (1986). The RNA component of the Bacillus subtilis RNase P: sequence, activity and partial secondary structure. J. Biol. Chem. 261, 7888± 7893. Shen, L. X., Cai, Z. & Tinoco, I. (1995). RNA structure at high resolution. FASEB J. 9, 1023± 1033. Shiraishi, H. & Shimura, Y. (1988). Functional domains of the RNA component of ribonuclease P revealed by chemical probing of mutant RNAs. EMBO J. 7, 3817± 3821. Siegel, R. W., Banta, A. B., Haas, E. S., Brown, J. W. & Pace, N. R. (1996). Mycoplasma fermantans simpli®es Modelling of the 3D Structure of RNase P RNA our view of the catalytic core of ribonuclease P RNA. RNA, 2, 452± 462. Smith, D. R., Doucette-Stamm, L. A., Deloughery, C., Lee, H., Dubois, J., Aldredge, T., Bashirzadeh, R., Blakely, D., Robin, Cook R., Gilbert, K., Harrison, D., Hoang, L., Keagle, P., Lumm, W., Pothier, B., Qiu, D., Spadafora, R., Vicaire, R., Wang, Y., Wierzbowski, J., Gibson, R., Jiwani, N., Caruso, A., Bush, D., Safer, H., Patwell, D., Prabhakar, S., McDougall, S., Shimer, G., Goyal, A., Pietrokovski, S., Church, G. M., Daniels, C. J., Mao, J.-I., Rice, P., NoÈlling, J. & Reeve, J. N. (1997). Complete genome sequence of Methanobacterium thermoautotrophicum H: functional analysis and comparative genomics. J. Bacteriol. 179, 7135± 7155. Talbot, S. J. & Altman, S. (1994a). Gel retardation analysis of the interaction between C5 protein and M1 RNA in the formation of the ribonuclease P holoenzyme from Escherichia coli. Biochemistry, 33, 1399± 1405. Talbot, S. J. & Altman, S. (1994b). Kinetic and thermodynamic analysis of RNA-protein interactions in the RNase P holoenzyme from Escherichia coli. Biochemistry, 33, 1406± 1411. TallsjoÈ, A., SvaÈrd, S. G., Kufel, J. & Kirsebom, L. A. (1993). A novel tertiary interaction in M1 RNA, the catalytic subunit of Escherichia coli RNase P. Nucl. Acids Res. 21, 3927± 3933. Tanner, M. A. & Cech, T. R. (1995). An important RNA tertiary interaction of group I and group II introns is implicated in Gram-positive RNase P RNAs. RNA, 1, 349± 350. Tomb, J.-F., White, O., Kerlavage, A. R., Clayton, R. A., Sutton, G. G., Fleischmann, R. D., Ketchum, K. A., Klenk, H. P., Gill, S., Dougherty, B. A., Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E. F., Peterson, S., Loftus, B., Richardson, D., Dodson, R., Khalak, H. G., Glodek, A., McKenney, K., Fitzegerald, L. M., Lee, N., Adams, M. D., Hickley, H. K., Berg, D. E., Gocayne, J. D., Utterbach, T. R., Peterson, J. D., 793 Kelley, J. M., Cotton, M. D., Weidman, J. M., Fujii, C., Bowman, C., Watthey, L., Wallin, E., Hayes, W. S., Borodovsky, M., Karp, P. D., Smith, H. O., Fraser, C. M. & Venter, J. C. (1997). The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature, 388, 539± 547. Tranguch, A. J. & Engelke, D. R. (1993). Comparative structural analysis of nuclear RNase P RNAs from yeast. J. Biol. Chem. 268, 14045± 14053. Ushida, C., Izawa, D. & Muto, A. (1995). RNase P RNA of Mycoplasma capricolum. Mol. Biol. Rep. 22, 125± 129. Vioque, A. (1997). The RNase P RNA from cyanobacteria: short tandemly repeated repetitive (STRR) sequences are present within the RNase P RNA gene in heterocyst-forming cyanobacteria. Nucl. Acids Res. 25, 3471± 3477. Westhof, E. (1993). Modeling the three-dimensional structure of ribonucleic acids. J. Mol. Struct. 286, 203± 210. Westhof, E. & Altman, S. (1994). Three-dimensional working model of M1 RNA, the catalytic RNA subunit of ribonuclease P from Escherichia coli. Proc. Natl Acad. Sci. USA, 91, 5133± 5137. Westhof, E. & Jaeger, L. (1992). RNA pseudoknots. Curr. Opin. Struct. Biol. 2, 327± 333. Westhof, E., Dumas, P. & Moras, D. (1985). Crystallographic re®nement of yeast aspartic acid transfert RNA. J. Mol. Biol. 184, 119± 145. Westhof, E., Masquida, B. & Jaeger, L. (1996a). RNA tectonics: towards RNA design. Folding Des. 1, R78± R88. Westhof, E., Wesolowski, D. & Altman, S. (1996b). Mapping in three dimensions of regions in a catalytic RNA protected from attack by an Fe(II)-EDTA reagent. J. Mol. Biol. 258, 600± 613. Zarrinkar, P. P., Wang, J. & Williamson, J. R. (1996). Slow folding kinetics of RNase P RNA. RNA, 2, 564± 573. Edited by J. Karn (Received 26 November 1997; received in revised form 23 March 1998; accepted 26 March 1998)