X-ray structure of the C-terminal domain of a coronavirus nucleocapsid protein: structural basis of helical nucleocapsid formation
Hariharan Jayaram, Jyothi Jayaram, Brian R. Bowman, Ellen W. Collison, B. V. Venkataram Prasad
Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, U.S.A.; Department of Veterinary Pathobiology, Texas A&M University, College Station, Texas 77843, U.S.A.
Coronaviridae cause a variety of respiratory and enteric diseases in animals and humans, including SARS, a disease with emerging global impact. The enveloped capsid of the virus encloses the single-stranded RNA genome associated with the nucleocapsid protein (N protein). Using limited proteolysis, we identified two stable globular domains of the nucleocapsid protein from infectious bronchitis virus (IBV). We present here the crystal structure of the C-terminal domain (CTD) of IBV N protein. The CTD exists as an intimate domain-swapped dimer, and these dimers tend to organize into helical arrays. From the interactions observed in crystals grown at different pHs, we hypothesize that the CTD is the key determinant of helical nucleocapsid formation in the virus. The similarity between the CTD and the capsid-forming domain of a related virus family suggests that this fold constitutes a new class of viral capsid fold employed by viruses with helical nucleocapsids.
Coronaviridae, a family in the order Nidovirales, comprises viruses with ssRNA genomes that are a significant cause of the common cold and of more severe respiratory illnesses such as SARS. Coronaviruses have enveloped, non-icosahedral, pleomorphic capsids with diameters ranging from 80 to 160 nm. The capsid encloses the viral genome, a single 30 kb segment of positive-sense ssRNA. Upon infection, the genomic RNA gives rise to a 3′ co-terminal nested set of four or more subgenomic mRNAs that code for both structural and non-structural proteins.
The enveloped capsid of the virus is made up predominantly of the membrane glycoprotein (M), a small transmembrane protein (E) and an array of spikes composed of the spike glycoprotein (S). A significant protein component of the capsid is the nucleocapsid protein (N), which interacts with the genomic ssRNA to form a helical nucleocapsid that constitutes the central core of the virion. Electron microscopy of TGEV, a prototype coronavirus, indicated that the internal nucleocapsid is possibly helical and is composed of the ssRNA genome tightly associated with the N protein (Risco, Anton et al. 1996). The coronavirus N protein typically has a molecular weight of 50 to 60 kDa and is synthesized in large amounts in infected cells. It binds the genomic RNA as well as the subgenomic RNAs synthesized during infection. Interactions with conserved sequences in the genomic RNA are hypothesized to mediate incorporation of the RNA into nucleocapsid cores. Proper assembly of capsids in reverse genetic systems also requires complementary interactions between the N protein and the major membrane protein M. SARS N has been shown to interact with cyclophilin, an immunomodulator, and with RNase H, and to activate the AP1 pathway, which plays a role in cell cycle control. In MHV, the N protein was shown to enter the nucleus, and a similar localization was observed for a fragment of SARS N. These observations suggest a possible role for the N protein in modulating and controlling host processes during coronavirus infection. The N protein is also a major antigen and one of the diagnostic markers used for coronavirus infection. The N protein also enhances the protection conferred by the vain vaccine in birds.
Coronavirus N protein displays a non-specific affinity for ssRNA together with increased affinity for specific sequences: MHV N preferentially recognizes the consensus packaging signal, and the N-terminal region of SARS N interacts with the consensus leader sequence in the RNA. The N protein also has a role in modulating viral subgenomic RNA transcription and mRNA translation, as well as in controlling the packaging of genomic RNA. These activities have led to the suggestion that the N protein coordinates the involvement of subgenomic and genomic RNA in the various stages of the virus life cycle and ensures packaging of the genome into a nucleocapsid. Considerable biochemical information is available on the in vitro behavior of the N protein, especially its oligomerization and its interaction with RNA. Full-length N protein is prone to disorder and aggregation in solution, and this instability has been suggested to be important for its role in capsid formation (Wang, Wu et al. 2004). Several studies have localized the dimerization domain of the N protein to the C-terminal 200 residues, identifying N protein dimers both for this domain alone and for the full-length protein (Surjit, Liu et al. 2004; Tang, Wu et al. 2005; Yu, Gustafson et al. 2005). The N-terminal domain, in contrast, is predominantly monomeric and has an affinity for ssRNA. An NMR structure of the N-terminal domain of SARS N shows that this domain is largely composed of coil and interacts with RNA in solution. The N protein therefore comprises two functional domains: an RNA-binding N-terminal domain (Tang, Wu et al. 2005; Yu, Gustafson et al. 2005) and a C-terminal dimerization domain.
Biochemical characterization of IBV N protein domains:
The full-length N protein from infectious bronchitis virus has been purified and characterized previously.
The N protein interacts strongly with the 5′ and 3′ conserved sequences of IBV RNA and is phosphorylated during infection to generate multiple isoforms. Our structural characterization of full-length N protein was impeded by its aggregation and degradation on storage under a variety of conditions (Figure 0b, lane 0). Purified full-length N protein was also extremely polydisperse in solution and not amenable to detailed structural characterization. We therefore took a divide-and-conquer approach. Using limited proteolysis, we sought to identify stable domains that resisted digestion by limiting amounts of trypsin (which cleaves after the basic residues Arg and Lys) and V8 protease (which cleaves after the acidic residues Glu and Asp). Digestion with V8 protease did not give a distinct pattern and yielded several diffuse bands (data not shown). Trypsin reduced the full-length protein to a single ~17 kDa band on a 17% denaturing SDS-PAGE gel within 15 minutes (Figure 0b). This "single" band resisted further degradation even after several hours of trypsinization and therefore represented a stable region of the protein. N-terminal sequencing of the cleavage products identified four tryptic fragments, arising from two major cleavage sites at residues 19 and 219 and two secondary cleavage sites at residues 27 and 226. The optimized domain constructs, termed NTD (N-terminal domain) and CTD (C-terminal domain), were then cloned, expressed and purified to homogeneity. The NTD was monomeric at moderate concentrations, whereas the CTD was a dimer even at very low concentrations (Figure 0c). Because the CTD tended to aggregate during purification, it was purified at very low concentrations and concentrated only immediately before crystallization screening.
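The protease specificities described above can be illustrated with a short in-silico digest. This is a minimal Python sketch, not the procedure used in this work (the cleavage sites here were determined experimentally by N-terminal sequencing). The rule table, the proline exception for trypsin (a common textbook convention), the 110 Da average residue mass and the toy sequence are all assumptions for illustration.

```python
"""Minimal in-silico protease digest (illustrative sketch only)."""

# Simplified specificity rules: trypsin cleaves after Lys/Arg, V8 (Glu-C)
# after Glu/Asp. The "not before Pro" exception for trypsin is assumed.
PROTEASE_RULES = {
    "trypsin": {"after": set("KR"), "not_before": set("P")},
    "v8": {"after": set("ED"), "not_before": set()},
}

# Rough average residue mass in daltons (an approximation, not measured).
AVG_RESIDUE_MASS_DA = 110.0


def cleavage_sites(seq: str, protease: str) -> list[int]:
    """Return 1-based numbers of the residues after which the protease cuts."""
    rule = PROTEASE_RULES[protease]
    sites = []
    # The last residue has no following peptide bond, so it is skipped.
    for i in range(len(seq) - 1):
        if seq[i] in rule["after"] and seq[i + 1] not in rule["not_before"]:
            sites.append(i + 1)
    return sites


def fragments(seq: str, sites: list[int]) -> list[str]:
    """Split seq after each 1-based cleavage site (complete digest)."""
    pieces, start = [], 0
    for site in sorted(sites):
        pieces.append(seq[start:site])
        start = site
    pieces.append(seq[start:])
    return pieces


def approx_mass_kda(n_residues: int) -> float:
    """Crude mass estimate from chain length alone."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    toy = "MAKRPGDEWK"  # hypothetical sequence, not IBV N
    sites = cleavage_sites(toy, "trypsin")
    print(sites, fragments(toy, sites))
```

Under the 110 Da per residue approximation, a ~17 kDa band corresponds to roughly 155 residues; real fragment masses depend on composition, which is why the construct boundaries were refined by mass spectrometry.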
The NTD and CTD proteins did not interact at any of a variety of salt and protein concentrations, as assayed by gel-filtration co-fractionation and pull-down experiments (Figure 0c and data not shown). The NTD and CTD therefore represent independent domains of the full-length protein and were suitable for separate structure determination. Crystals of both domains were obtained in a variety of conditions. Although diffraction data were obtained for both domains, we succeeded in phasing only the CTD data. We present here the crystal structure of the C-terminal domain of IBV N protein. Of the three space groups in which diffraction data were obtained, we solved the CTD structure in two, grown under different conditions (Table 1). Crystal I grew at an extremely low pH of 4.5; these crystals occasionally have a distinct rod-like appearance but in most cases form large needles or flat sheets. The other condition (Table 1, crystal II) yielded flat-sheet crystals after several weeks. We collected two-wavelength anomalous dispersion data from selenomethionine-substituted protein for crystal I and native data for crystal II. Crystals I and II were grown at different pHs and ionic strengths and had widely differing unit-cell dimensions (Table 1). The morphology of both crystal forms, rods or needles at acidic pH and flat sheets at basic pH, indicated a tendency of the protein to pack very well in two dimensions. In addition, a third, three-dimensional hexagonal-bipyramidal crystal form was optimized; it grew under conditions similar to those of crystal I but at slightly elevated pH (pH 5.2) and in the absence of citrate or acetate. Despite its apparently three-dimensional shape, this crystal form gave an extremely anisotropic diffraction pattern, with almost no diffraction perpendicular to the principal long axis of the pyramid.
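Whether a selenomethionine anomalous experiment will succeed is often judged beforehand from the expected anomalous (Bijvoet) signal. The Python sketch below uses the widely quoted estimate ⟨|ΔF|⟩/⟨|F|⟩ ≈ sqrt(2N_A/N_T) · f″/Z_eff, with Z_eff ≈ 6.7 electrons per non-hydrogen protein atom. The residue count, the ~7.8 non-hydrogen atoms per residue and the f″ value are illustrative assumptions, not values measured for these crystals; the four anomalous sites mirror the four methionine positions located for the dimer during phasing.

```python
import math

# Effective normal scattering per non-hydrogen protein atom at zero angle
# (about 6.7 electrons), as used in the common Bijvoet-ratio estimate.
Z_EFF = 6.7
# Rough average number of non-hydrogen atoms per residue (assumption).
ATOMS_PER_RESIDUE = 7.8


def bijvoet_ratio(n_anomalous: int, n_total_atoms: float,
                  f_double_prime: float) -> float:
    """Expected <|dF_anom|>/<|F|> for n_anomalous sites in n_total_atoms."""
    return math.sqrt(2.0 * n_anomalous / n_total_atoms) * f_double_prime / Z_EFF


if __name__ == "__main__":
    n_residues = 220  # hypothetical residue count for the dimer
    n_atoms = n_residues * ATOMS_PER_RESIDUE
    # f'' of 4 electrons is an assumed value near the Se K edge.
    ratio = bijvoet_ratio(4, n_atoms, f_double_prime=4.0)
    print(f"expected Bijvoet ratio ~ {100 * ratio:.1f}%")
```

The estimate scales linearly with f″ and with the square root of the site density, which is why measuring at the absorption peak and having several methionines per chain both help.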
This anisotropy, which is also characteristic of order in only two dimensions, prevented us from solving the CTD structure in this crystal form. We report the pH 4.5 structure of the CTD, with one dimer in the asymmetric unit, and a pH 8.5 structure with four dimers (eight molecules) in the asymmetric unit. The observation of dimers as the building block of both crystal forms at these widely different pHs, together with the dimer observed by gel filtration under extremely dilute conditions, indicates that the dimer is the physiologically relevant form of this domain.
Structure of the CTD dimer:
In both crystal forms the CTD exists as an intimate domain-swapped dimer. Domain swapping occurs through interactions between the β-strands of one monomer and the surrounding helices and loops of the other, forming a reciprocal, closed domain-swapped dimer similar to those seen in the crystal structures of cystatin C and RNase A (Janowski, Kozak et al. 2001; Newcomer 2001). A 12-residue β-strand, β2 (residues 295 to 307), forms the interface between the two monomers (Figure 2, bottom). The overall topology of the IBV N CTD dimer can be described as a concave β-stranded floor of ~400 Å² with strand order β1B-β2B-β2A-β1A, surrounded by helices and loops. Helices 3 and 4, connected by a loop region, arch over this floor and form the roof of the dimer. The 12-residue α-helix α5 at the extreme C-terminus of the CTD forms an angled wall that flanks each side of the dimer and is held in place by a tight turn made up of residues 307 to 310 (Figures 1 and 2). The dimerization interactions are very tight and bury a surface area of 5780 Å². As expected from previous biochemical data, neither the serine-rich domain (residues 161 to 191, Figure 0) nor disulfide bonding is important for oligomerization. The two cysteine residues, C228 and C281, lie in close proximity in the interior of the dimer but are not disulfide-bonded to each other in this structure.
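Buried surface area is conventionally computed as the solvent-accessible surface area (SASA) of the isolated subunits minus that of the complex. The Python sketch below implements the Shrake-Rupley point-counting method with NumPy; the 1.4 Å probe, the number of sphere points and any atomic radii passed in are assumptions, and this is not necessarily the program used for the values reported here. Some papers report the total buried area and others half of it (per subunit), so the convention matters when comparing numbers.

```python
import numpy as np


def sphere_points(n: int) -> np.ndarray:
    """Roughly uniform points on the unit sphere (golden-angle spiral)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azimuth = np.pi * (3.0 - 5.0 ** 0.5) * i  # golden angle increments
    return np.column_stack((np.cos(azimuth) * np.sin(polar),
                            np.sin(azimuth) * np.sin(polar),
                            np.cos(polar)))


def sasa(coords, radii, probe=1.4, n_points=960) -> float:
    """Total solvent-accessible surface area by Shrake-Rupley."""
    coords = np.asarray(coords, float)
    expanded = np.asarray(radii, float) + probe
    unit = sphere_points(n_points)
    total = 0.0
    for i, (centre, r) in enumerate(zip(coords, expanded)):
        pts = centre + r * unit  # test points on atom i's expanded sphere
        # Distance from every test point to every atom centre.
        d = np.linalg.norm(pts[:, None, :] - coords[None, :, :], axis=2)
        inside = d < expanded[None, :]
        inside[:, i] = False  # ignore the atom's own sphere
        accessible_fraction = (~inside.any(axis=1)).mean()
        total += 4.0 * np.pi * r * r * accessible_fraction
    return total


def buried_surface_area(coords_a, radii_a, coords_b, radii_b, **kw) -> float:
    """SASA(A) + SASA(B) - SASA(A+B); halve it for a per-subunit value."""
    both_coords = np.vstack((coords_a, coords_b))
    both_radii = np.concatenate((radii_a, radii_b))
    return (sasa(coords_a, radii_a, **kw) + sasa(coords_b, radii_b, **kw)
            - sasa(both_coords, both_radii, **kw))
```

The all-pairs distance step is quadratic in the number of atoms, which is fine for small examples; production tools use neighbour lists to handle full protein dimers efficiently.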
Because protein purification and crystallization were carried out in the absence of reducing agent, the disulfide-independent interaction seen here probably reflects that in the viral nucleocapsid. The integrity of the dimer observed in solution is consistent with the ~5000 Å² of surface area buried on dimerization. The dimer observed at pH 4.5 is almost identical to all four dimers observed at pH 8.5, with an rmsd of ~0.3 Å for Cα atoms in the core region (residues 233 to 328). The N and C termini differed among the five dimers, depending on their stabilizing interactions with neighboring dimers in the crystal (pH 4.5) or within the asymmetric unit (pH 8.5). Further insight into the arrangement of the CTD in the capsid, in the context of the virus, can be gained from the crystal packing interactions in both space groups. Having one dimer in the asymmetric unit in one crystal form and four in the other allowed us to analyze dimer-dimer interactions not only at different pHs but also in the presence and absence of constraints imposed by crystal packing.
Crystal packing interactions in the CTD: insights into the stability of helical packing:
The two structures presented here reveal five dimer-dimer interactions. In crystal I, crystal packing is mediated by dimer-dimer interactions in which the nth dimer contacts the (n-1)th and (n+1)th dimers from neighboring asymmetric units (Figure 4b). In crystal II, with four dimers in the asymmetric unit, inter-dimer interactions both hold the four dimers of the asymmetric unit together and mediate crystal packing (Figure 4a), giving rise to four dimer-dimer interactions. Three of these (AB-CD, CD-EF and the crystal packing interaction of GH with ABn+1) belong to one class; a new class of dimer-dimer interaction involves the GH dimer contacting a different interface formed by the CD-EF dimers (Figure 4a).
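Cα rmsd values such as those quoted above are obtained after optimal rigid-body superposition of matched atoms. A minimal NumPy sketch of the Kabsch algorithm follows; it assumes the two coordinate arrays are already matched atom by atom (for example, Cα atoms of residues 233 to 328 from two dimers). Established implementations such as Biopython's `Bio.PDB.Superimposer` perform the same calculation on structure objects.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                          # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would mean a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation p_c -> q_c
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

The determinant check restricts the fit to proper rotations, so a mirror-image coordinate set cannot be superposed spuriously; this matters because protein structures are chiral.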
The uniformity of all but the last kind of dimer-dimer interaction in the two crystals is apparent from a superposition of all four types observed between the two crystals, in which the dimers superpose with rmsds ranging from 0.3 XXX Å to 0.8 XXX Å (Figures 4c and 4d). When three dimers from crystal I are superposed onto three dimers from crystal II, the rmsd between them is ~1.0 Å (Figure 4c). This indicates that the dimers swivel only slightly with respect to each other and constitute a module that is highly prone to self-association. These interactions primarily involve residues 308 to 328, which comprise the XXX-type turn (TT in Figure 1), α5 and the C-terminal loop of the CTD (Figure 1, lilac boxes). In all cases the interacting dimers bury a surface area of ~1200 Å² between them, except for the packing interaction in crystal II in which dimer GH contacts dimer ABn+1; there the buried surface area is only 600 Å², owing to a swiveling away of the GH dimer, possibly prompted by its strong interaction with the CD-EF dimers within the asymmetric unit. Although surface complementarity between the two dimers is limited, the predominant inter-dimer interaction is a salt bridge between Arg-308 of one dimer and Asp-314 of the neighboring dimer (Figure 5). The salt bridge and the relative orientation of the dimers are almost identical in the pH 4.5 and pH 8.5 structures. Apart from the salt bridge, the inter-dimer interactions are exclusively van der Waals contacts. Both the multimerization and the dimerization interactions of the CTD are therefore well maintained over this wide pH range. The ionic strengths of the two crystallization conditions also differ, providing further evidence for the stability of the dimer-dimer packing interactions.
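The Arg-Asp salt bridge described above can be checked geometrically. The Python sketch below applies a commonly used criterion, a side-chain nitrogen of Arg or Lys within 4.0 Å of a carboxylate oxygen of Asp or Glu; the cutoff, the simple (residue name, atom dictionary) representation and the coordinates in the test are assumptions for illustration, not data from these structures.

```python
import math

# Charged side-chain atoms used by the distance criterion.
CATION_ATOMS = {"ARG": ("NE", "NH1", "NH2"), "LYS": ("NZ",)}
ANION_ATOMS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def salt_bridge_contacts(res_a, res_b, cutoff=4.0):
    """Return (N atom, O atom, distance) pairs within cutoff.

    Each residue is a hypothetical (resname, {atom_name: (x, y, z)}) tuple.
    """
    name_a, _ = res_a
    name_b, _ = res_b
    # Orient the pair so that the first residue is the basic one.
    if name_a in CATION_ATOMS and name_b in ANION_ATOMS:
        (cat_name, cat), (an_name, an) = res_a, res_b
    elif name_b in CATION_ATOMS and name_a in ANION_ATOMS:
        (cat_name, cat), (an_name, an) = res_b, res_a
    else:
        return []  # not an oppositely charged pair
    pairs = []
    for n_atom in CATION_ATOMS[cat_name]:
        for o_atom in ANION_ATOMS[an_name]:
            if n_atom in cat and o_atom in an:
                d = math.dist(cat[n_atom], an[o_atom])
                if d <= cutoff:
                    pairs.append((n_atom, o_atom, round(d, 2)))
    return pairs
```

Any non-empty result flags a candidate salt bridge; comparing the listed pairs between the pH 4.5 and pH 8.5 models is one way to test whether the same contacts persist.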
The additional dimer (GH) is clearly auxiliary (it is not part of the primary fibre; see below) and reveals a higher-order mode of interaction among CTD dimers. In crystal II the GH dimer contacts residues from two neighboring dimers, burying a total surface area of ~1200 Å². The interacting surface comprises residues from throughout the dimers (underlined residues in Figure 1). Because this interaction involves three different molecules yet buries a surface area (~1200 Å²) similar to that of the primary crystal-packing (fibre-forming) interaction, we hypothesize that it is less likely to form than, and therefore secondary to, the primary interaction seen for the other dimers. Because the GH dimer mediates crystal packing in this space group through the same region on its other face, the tight salt bridge between R308 and D315 is preserved in only one of the two cases and is disrupted in the two-fold-related one. Despite this skewing, the overall rmsd is only 0.8 XXX Å, indicating the considerable adaptability of the dimer, with α5 and the preceding loop mediating these interactions. This additional interaction also raises the possibility that the fibre hexamer formed by ABCDEF, with the GH appendage, could circularize or form planar triangles under certain conditions, with the GH dimer serving as a bridge that brings otherwise rigid ABCDEF fibres together. Such bridging interactions may be necessary for spherical particle formation, driven by the assembly of three hexamers into a triangle with the fourth dimer serving as the linker. In addition, the greater flexibility of various regions of the protein at alkaline pH (Figure 6a), together with the swiveling seen in the GH-AB interaction, could represent a snapshot of the disassembly of dimer-dimer interactions, which may be important for nucleocapsid disassembly and genome release (NEED TO ELABORATE).
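If successive dimers in a fibre are related by the same rigid-body operation, that operation is a screw, and its rotation and translation define the helical parameters of the array. The Python sketch below extracts the twist, the rise along the screw axis and the number of units per turn from a rotation matrix R and translation t (x → Rx + t), which could in principle come from superposing dimer n onto dimer n+1. It illustrates the geometry only; the function name and return format are hypothetical, and no helical parameters for the CTD fibre are implied.

```python
import numpy as np


def screw_parameters(rot, trans):
    """Helical parameters of the screw operation x -> rot @ x + trans."""
    rot = np.asarray(rot, float)
    trans = np.asarray(trans, float)
    cos_t = np.clip((np.trace(rot) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_t)  # twist per step, in radians
    if np.isclose(np.sin(theta), 0.0):
        raise ValueError("twist of 0 or 180 degrees: axis is ill-defined here")
    # Unit rotation axis from the antisymmetric part of the matrix.
    axis = np.array([rot[2, 1] - rot[1, 2],
                     rot[0, 2] - rot[2, 0],
                     rot[1, 0] - rot[0, 1]]) / (2.0 * np.sin(theta))
    rise = float(trans @ axis)  # translation component along the axis
    twist_deg = float(np.degrees(theta))
    units_per_turn = 360.0 / twist_deg
    return {"twist_deg": twist_deg, "rise": rise,
            "units_per_turn": units_per_turn,
            "pitch": rise * units_per_turn, "axis": axis}
```

For a 60° rotation about z with t = (1, 0, 3.5), the function reports a twist of 60°, a rise of 3.5 and six units per turn; the translation component perpendicular to the axis does not contribute to the rise.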
NEED TO ADD A BIG SECTION ON THE FIBRE AND HOW THIS SUGGESTS HELICAL NUCLEOCAPSID FORMATION, AND IMPLICATIONS FOR M INTERACTION ETC.
Electrostatic surface, conservation of surface residues and interaction with other capsid components:
Analysis of the GRASP surface of the octamer further reveals that the surface is primarily acidic, with a swath of basic residues running, as expected, in a helical fashion through the fibre (Figure 6b). Although the primary interactions with RNA are conferred by the N-terminal domain, secondary interactions may be facilitated by this clearly solvent-exposed basic stretch.
Fibre formation:
The strong tendency of the dimer-dimer interaction to promote fibre formation is evident from the superposition of three dimers from both space groups (Figure 4c). This interaction is all the more relevant because, as discussed above, it occurs at both pHs and, at alkaline pH, occurs free of crystal-packing forces. Dimer-mediated fibre formation is even more striking when considered alongside the relatedness of the CTD to the capsid-forming domain of the N protein from a related virus.
Similarity to other nucleocapsid proteins and evolutionary implications for viral architecture:
A DALI search of the PDB revealed a very striking similarity to the 12X-amino-acid capsid-forming domain of PRRSV, a corona-like virus that is also a member of the order Nidovirales. This match had a Z-score of XXX and a corresponding rms deviation of 2.8 Å. PRRSV is also a positive-sense single-stranded RNA virus with a large genome. PRRSV also forms a helical nucleocapsid, and its full-length N protein was shown to form fibres in solution. The PRRSV capsid-forming domain also packed into helical arrays through crystal contacts in the crystal studied.
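The contrast between the acidic surface and the basic stretch, and the use of crystals at pH 4.5 and 8.5, can be related to a simple Henderson-Hasselbalch estimate of net charge. This Python sketch is crude: it uses sequence only and treats every ionizable group as independent and fully exposed, so it cannot reproduce a surface potential map such as the GRASP surface, which requires the folded structure. The pKa values and example sequences are textbook-style assumptions.

```python
# Approximate pKa values (textbook-style assumptions, not fitted values).
PKA_POS = {"K": 10.5, "R": 12.5, "H": 6.0, "Nterm": 8.0}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "Cterm": 3.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a chain with independent groups."""
    # Free amino terminus (positive) and carboxy terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POS["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEG["Cterm"] - ph))
    for aa in seq:
        if aa in PKA_POS:
            # Fraction of the basic group that is protonated (charged).
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            # Fraction of the acidic group that is deprotonated (charged).
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    return charge
```

Evaluating the same sequence at pH 4.5 and 8.5 shows how much the global charge balance shifts between the two crystallization conditions, even though local surface patches can behave quite differently.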
The CTD, the PRRSV capsid-forming domain and the MS2 coat protein all share an antiparallel β-strand floor flanked by helices and loops. The similarity between the PRRSV domain and the CTD indicates that these viruses are more similar than previously thought and hints that this architecture is a characteristic fold adopted by viruses with helical nucleocapsids. Taken together with the PRRSV crystal packing interaction, which is similarly mediated by helix-helix van der Waals stacking and a similar salt bridge between ArgX and AspX, this suggests a common theme in helical fibre formation across the Nidovirales, the order to which both PRRSV and IBV belong. This strengthens the suggestion that this fold is commonly employed by viruses with helical nucleocapsids.
Materials and methods:
Purification of full-length nucleocapsid protein and identification of tryptically stable fragments:
Full-length nucleocapsid protein was expressed as before. The expressed protein was purified by Ni-NTA agarose affinity chromatography followed by heparin affinity chromatography to ~95% purity (as assessed by denaturing SDS-PAGE and Coomassie staining). Monodispersity was checked by dynamic light scattering (DynaPro) and negative-stain electron microscopy. To identify tryptically stable fragments, full-length N protein at 1-2 mg/ml was cleaved with 2% (w/w, trypsin/protein) sequencing-grade trypsin (Roche). After trypsinization, the sample was run on a denaturing SDS-PAGE gel, and the resulting protein band was blotted onto a PVDF (polyvinylidene fluoride) membrane and subjected to N-terminal amino acid sequencing. For construct optimization, the C-termini were estimated from the predicted secondary structure of the terminal region and from mass spectrometric characterization of the purified protein.
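The 2% (w/w) trypsin-to-protein ratio translates into a simple amount calculation. In this Python sketch the reaction volume and the trypsin stock concentration are hypothetical; only the 1-2 mg/ml protein concentration and the 2% ratio come from the text.

```python
def trypsin_needed_ug(protein_mg_per_ml: float, volume_ul: float,
                      ratio_w_w: float = 0.02) -> float:
    """Micrograms of trypsin for a given protein amount and w/w ratio."""
    # 1 mg/ml equals 1 ug/ul, so mg/ml multiplied by ul gives ug of protein.
    protein_ug = protein_mg_per_ml * volume_ul
    return protein_ug * ratio_w_w


def stock_volume_ul(trypsin_ug: float, stock_ug_per_ul: float) -> float:
    """Volume of a (hypothetical) trypsin stock that delivers trypsin_ug."""
    return trypsin_ug / stock_ug_per_ul


if __name__ == "__main__":
    # Hypothetical 100 ul reaction at 1.5 mg/ml with a 0.5 ug/ul stock.
    needed = trypsin_needed_ug(1.5, 100.0)
    print(needed, "ug trypsin;", stock_volume_ul(needed, 0.5), "ul of stock")
```

For a 100 µl reaction at 1.5 mg/ml, the protein amount is 150 µg and a 2% ratio calls for 3 µg of trypsin.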
Cloning, expression, purification and crystallization of the tryptic fragments of nucleocapsid protein:
The fragments corresponding to the two major and the minor cleavage products identified were expressed as GST fusion proteins from the LIC site of the pET-41 Ek/LIC vector (Novagen). The expressed protein was purified by affinity chromatography on glutathione Sepharose (Pharmacia), followed by on-bead cleavage with enterokinase (EKMax, Invitrogen). Cleavage was performed by suspending 1 ml of beads in 40 ml of cutting buffer (250 mM NaCl, 50 mM Tris-HCl pH 8.0) with 10 units of protease per 1 ml of beads. After proteolysis, the dilute supernatant was further purified by gel filtration chromatography on a Superdex 75 16/60 column (Pharmacia). The protein migrated as a dimer and was concentrated to 5-8 mg/ml for crystallization trials. Initial crystallization trials were carried out using Crystal Screen I (Hampton Research). After several leads were obtained in conditions containing PEG 4000, Index screens 2 and 3 (Jena Biosciences) were used to design an optimization strategy. Crystals of the C-terminal dimer grew in three to ten days; they were mostly needles, thin plates or three-dimensional hexagonal bipyramids and grew around two base conditions: one containing 100 mM trisodium citrate pH 4.5-5.2, 0.1 M MgCl2 and 25-30% PEG 4000, and the other containing 32% PEG 4000, 0.8 M Li2SO4 and 0.1 M Tris-HCl pH 8.5.
Data collection and phasing:
Data were collected at the beamlines indicated in Table 1. A total of 180 or 360 oscillation images with a 1° oscillation angle were collected using the inverse-beam approach with a wedge size of 30°. The entire dataset was integrated and scaled using the HKL2000 suite and SCALEPACK. Four selenomethionine sites were located using Shake-and-Bake. The sites were then refined, and phases were calculated and density-modified using SHARP. The final figure of merit after structure solution and phasing in SHARP was 0.65, which yielded maps of excellent quality to 2.2 Å.
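The inverse-beam strategy alternates wedges of data recorded at φ and at φ + 180°, so that Bijvoet mates are measured close together in time. The Python sketch below generates a frame schedule from the parameters given in the text (1° oscillations, 30° wedges, 180 or 360 images). It assumes that the image count includes both direct and inverse frames and that collection starts at φ = 0°; the text does not specify either point.

```python
def inverse_beam_schedule(total_images: int, osc_deg: float = 1.0,
                          wedge_deg: float = 30.0, start_deg: float = 0.0):
    """Return (frame_start_angle, is_inverse) tuples in collection order.

    Assumes total_images counts both direct and inverse frames and splits
    evenly into direct/inverse wedge pairs.
    """
    frames_per_wedge = int(round(wedge_deg / osc_deg))
    pair = 2 * frames_per_wedge
    if total_images % pair:
        raise ValueError("total_images must be a multiple of 2 * frames per wedge")
    schedule = []
    for w in range(total_images // pair):
        base = start_deg + w * wedge_deg
        # Direct wedge first, then the same wedge rotated by 180 degrees.
        for offset, inverse in ((0.0, False), (180.0, True)):
            for k in range(frames_per_wedge):
                angle = (base + offset + k * osc_deg) % 360.0
                schedule.append((angle, inverse))
    return schedule
```

Under these assumptions, 180 images cover 90° of direct data plus its inverse, while 360 images cover the full circle once, with every Bijvoet pair recorded within one wedge of each other.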
Although almost 80% of the model could be traced automatically using ARP/wARP, the dimer in the asymmetric unit was built manually using COOT. Refinement was carried out in CNS or REFMAC5. The refined coordinates of the dimer were used to phase the data obtained in space group P21212 by molecular replacement with Phaser, which correctly identified the positions of all four dimers. Model bias was avoided during refinement using the prime-and-switch method implemented in SOLVE/RESOLVE. All figures were generated using ESPript in combination with Adobe Illustrator, or with PyMOL.
References:
Janowski, R., M. Kozak, et al. (2001). "Human cystatin C, an amyloidogenic protein, dimerizes through three-dimensional domain swapping." Nat Struct Biol 8(4): 316-20.
Newcomer, M. E. (2001). "Trading places." Nat Struct Biol 8(4): 282-4.
Risco, C., I. M. Anton, et al. (1996). "The transmissible gastroenteritis coronavirus contains a spherical core shell consisting of M and N proteins." J Virol 70(7): 4773-7.
Surjit, M., B. Liu, et al. (2004). "The nucleocapsid protein of the SARS coronavirus is capable of self-association through a C-terminal 209 amino acid interaction domain." Biochem Biophys Res Commun 317(4): 1030-6.
Tang, T. K., M. P. Wu, et al. (2005). "Biochemical and immunological studies of nucleocapsid proteins of severe acute respiratory syndrome and 229E human coronaviruses." Proteomics 5(4): 925-37.
Wang, Y., X. Wu, et al. (2004). "Low stability of nucleocapsid protein in SARS virus." Biochemistry 43(34): 11103-8.
Yu, I. M., C. L. Gustafson, et al. (2005). "Recombinant severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein forms a dimer through its C-terminal domain." J Biol Chem 280(24): 23280-6.