X-ray structure of the C-terminal domain of a coronavirus nucleocapsid protein; structural basis of helical nucleocapsid formation

Hariharan Jayaram, Jyothi Jayaram, Brian R. Bowman, Ellen W. Collison, B.V. Venkataram Prasad

Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, U.S.A.; Department of Veterinary Pathobiology, Texas A&M University, College Station, Texas 77843, U.S.A.

Coronaviridae cause a variety of respiratory and enteric diseases in animals and man, including SARS, a disease with emerging global impact. The enveloped capsids of the virus enclose the single-stranded genome associated with the nucleocapsid protein (N protein). Using limited proteolysis we identified two stable globular domains of the nucleocapsid protein from infectious bronchitis virus (IBV). We present here the crystal structure of the C-terminal domain (CTD) of the IBV N protein. The CTD exists as intimate domain-swapped dimers that tend to organize into helical arrays. Inferring from interactions observed in crystals at different pHs, we hypothesize that the CTD is the key determinant of helical nucleocapsid formation in the virus. Similarity between the CTD and the capsid-forming domain of a related virus family reveals that this fold constitutes a new class of viral capsid folds employed in viruses with helical nucleocapsids.

Coronaviridae, a member of the order Nidovirales, is a family of viruses with ssRNA genomes that are a significant causative agent of common colds and of severe respiratory illnesses such as SARS. The coronaviruses have enveloped, non-icosahedral, pleomorphic capsids with diameters ranging from 80 to 160 nm. The capsid encloses the viral genome, which consists of a single 30 kb long segment of positive-sense ssRNA. Upon infection the genomic RNA encodes a 3' co-terminal set of four or more subgenomic mRNAs that code for both structural and non-structural proteins.
The enveloped capsid of the virus is predominantly made up of the membrane glycoprotein (M), another small transmembrane protein (E), and an array of spikes composed of the spike glycoprotein (S). A significant protein component of the capsid is the nucleocapsid protein (N), which interacts with the genomic ssRNA to form a helical nucleocapsid that comprises the central core of the virion. Electron microscopic studies of TGEV, a prototype coronavirus, revealed that the internal nucleocapsid is possibly helical and is composed of the ssRNA genome tightly associated with the N protein (Risco, Anton et al. 1996). The coronavirus N protein typically has a molecular weight of 50 kDa to 60 kDa and is synthesized in large amounts in an infected cell. The protein binds the genomic RNA as well as the subgenomic RNAs that are synthesized during a virus infection. Interactions with conserved sequences in genomic RNA are hypothesized to mediate incorporation of RNA into nucleocapsid cores. Proper assembly of capsids in reverse genetic systems also requires complementary interactions between the N protein and the major membrane protein M. The N protein displays a non-specific affinity for ssRNA in coronaviruses, together with the ability to recognize the consensus packaging signal of MHV with increased affinity, and the N-terminal region of the SARS N protein also interacts with the consensus leader sequence in RNA. The N protein also has a role in modulating viral subgenomic RNA transcription and mRNA translation, along with controlling the packaging of genomic RNA. These activities have led to the suggestion that the N protein functions to coordinate the involvement of subgenomic and genomic RNA in various stages of the virus life cycle and to ensure packaging of the RNA into a nucleocapsid. The full-length N protein is prone to disorder, aggregation and degradation in solution, and its instability is suggested to be important for its role in virus capsid formation (Wang, Wu et al.
2004).

Limited proteolysis conducted on the full-length N protein from infectious bronchitis virus demonstrated that it is predominantly composed of two domains: an N-terminal domain that consists of monomers and a C-terminal domain that exists as dimers at very low concentration in solution. Similarly, dimers were observed for the homologous region of the SARS N protein, both for the isolated domain and in the context of the full-length protein (Surjit, Liu et al. 2004; Tang, Wu et al. 2005; Yu, Gustafson et al. 2005). It is our observation that the N-terminal and C-terminal domains of the SARS N protein do not interact with each other at moderate salt concentrations, as assayed by co-fractionation during gel chromatography and by affinity pull-down experiments. The N protein therefore comprises two functional domains: an RNA-binding N-terminal domain (Tang, Wu et al. 2005; Yu, Gustafson et al. 2005) and a C-terminal dimerization domain.

Structure of the CTD dimer:

The CTD exists in both crystal forms as an intimate domain-swapped dimer. The domain swapping is brought about by interaction between the beta strands of one monomer and the surrounding helices and loops of the other monomer, forming a reciprocated, closed domain-swapped dimer akin to that seen in crystal structures of cystatin C (Janowski, Kozak et al. 2001; Newcomer 2001). The major interface between the two monomers is formed by a beta sheet constituted by strand 2 (295 and 307). The base of the dimer is thus made up of a concave surface of ~400 Å2 with the topology β1B-β2B-β2A-β1A. The basic structure can be described as a floor of anti-parallel β strands surrounded by helices and loops. Helices 3 and 4, connected by a loop region, surround the beta-strand floor, flanked by the longest helix, the C-terminal helix 6. The dimerization interactions are very tight and bury a very large surface area.
Neither the serine-rich domain nor disulfide bonding was important for either protein oligomerization or crystal packing. The two cysteine residues do not mediate dimerization and are not disulfide bonded to each other. The protein preparation and crystallization were performed in the absence of reducing agent, so the non-disulfide-mediated interaction seen here is probably identical to that in the nucleocapsid.

Interestingly, a DALI search of the PDB revealed a very striking similarity to the 12X amino acid capsid-forming domain of PRRSV, a corona-like virus that is a member of the order Nidovirales. This match had a similarity Z-score of XXX with a corresponding RMS deviation of 2.8 Å. PRRSV is also a positive-sense single-stranded RNA virus with a similarly large genome. PRRSV also forms a helical nucleocapsid, and its full-length N protein was shown to form fibers in solution. The capsid-forming domain also packed into helical arrays through crystal contacts in the crystal studied. The arrangements of CTD, PRRSV and MS2 coat protein all show a similar feature: an anti-parallel beta-strand floor with flanking helices and loops. The similarity between PRRSV and the CTD indicates that these viruses are more similar than previously thought and hints at this architecture being a characteristic fold adopted by helical nucleocapsid viruses.

Crystal packing interactions in CTD and insights into stability of helical packing interactions:

The buried surface area of ~5000 Å2 in the dimer clearly indicates that the dimer is very intimate and is most likely the dimer found in the capsid. Further insight into the nature of the CTD in the capsid, or in the context of the virus, can be gained from the crystal packing interactions in both space groups. We were fortunate to crystallize the CTD in three crystal forms, two of which yielded the structures presented here.
The third crystal form yields highly anisotropic data, with most of the diffraction resembling that of helically packed arrays, showing layer lines and smeared spots. This clearly indicates the tendency of the CTD to organize into helical arrays. The majority of the crystal packing interactions in the two crystal forms involve residues between 308 and 328, which constitute the terminal loop and helix 6. A salt bridge between Arg90 of one dimer and Asp96 of the other stabilizes the ~1200 Å2 interaction area between the two dimers. Interestingly, the mode of interaction between asymmetric units in the crystal at pH 4.5 is conserved within the asymmetric unit at pH 8.5. The salt bridge and the orientation of the dimers remain almost identical, such that the rmsd between three adjacent dimers from one space group and the three molecules in the asymmetric unit of the other space group is ~1.0 Å. Although there is no surface complementarity between the two molecules, and the interactions other than the salt bridge are strictly van der Waals interactions, the multimerization interactions, in addition to the dimerization interactions seen in the CTD, are very well maintained over this wide range of pH. The ionic strengths of the two crystallization conditions also differ, providing further evidence of the stability of the dimer-dimer packing interactions.

Auxiliary interactions with the fourth dimer in the second space group:

The additional dimer is clearly not part of the primary fibre and instead forms a weak secondary transverse fibre. The primary residues involved in contacts of this second molecule with the three other dimers are scattered over both molecules of the dimer. This interaction mode might represent interactions necessary to yield a spherical structure in the context of fibre formation.
Since this interaction involves three different molecules and yet buries a surface area similar to that of the primary crystal-packing or fibre-forming interaction, we hypothesize that it is energetically less likely yet equally stabilizing. Analysis of the GRASP surface of the octamer further reveals that the surface is primarily acidic, with a swath of basic residues running in an expectedly helical fashion throughout the fibre. Although the primary interactions with RNA are conferred by the N-terminus, secondary interactions may be facilitated by this basic stretch, which is clearly solvent exposed.

Interactions with M protein:

This fact, taken together with the interaction seen in the PRRSV crystal packing, which is similarly mediated by helix-helix van der Waals stacking and a similar salt bridge between ArgX and AspX in PRRSV, suggests a common theme in helical fibre formation across the viruses of the order Nidovirales, to which PRRSV and IBV both belong. This strengthens the suggestion that this fold is commonly employed in viruses with helical nucleocapsids.

Materials and methods:

Purification of full length nucleocapsid protein and identification of tryptically stable fragments:

Full-length nucleocapsid protein was expressed as before. The expressed protein was purified by Ni-NTA agarose affinity chromatography followed by heparin affinity chromatography to almost 95% purity (as assessed by denaturing SDS-PAGE followed by Coomassie staining). The protein was checked for monodispersity by dynamic light scattering (DynaPro) and negative-stain electron microscopy. Cleavage of full-length N protein was carried out at a concentration of 1-2 mg/ml with 2% (wt trypsin/wt protein) sequencing-grade trypsin (Roche) to identify tryptically stable fragments. Following trypsinization the protein was run on a denaturing SDS-PAGE gel, and the resulting protein band was blotted onto a PVDF (polyvinylidene fluoride) membrane and subjected to N-terminal amino acid sequencing.
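As a quick check of the digest stoichiometry described above, the following minimal sketch computes the trypsin mass implied by a 2% (wt trypsin/wt protein) ratio. The function name and the 1 ml reaction volume used in the example are illustrative assumptions and are not taken from the protocol.

```python
def trypsin_mass_ug(protein_mg_per_ml, volume_ml, ratio_wt_wt=0.02):
    """Mass of trypsin (micrograms) for a given weight/weight ratio to substrate.

    ratio_wt_wt=0.02 corresponds to the 2% (wt trypsin/wt protein) in the text.
    The reaction volume is a free parameter; the text does not state one.
    """
    if protein_mg_per_ml < 0 or volume_ml < 0 or ratio_wt_wt < 0:
        raise ValueError("concentration, volume and ratio must be non-negative")
    protein_mg = protein_mg_per_ml * volume_ml   # total substrate in mg
    return protein_mg * ratio_wt_wt * 1000.0     # convert mg to micrograms


# Hypothetical 1 ml digest at the two ends of the 1-2 mg/ml range.
low = trypsin_mass_ug(1.0, 1.0)
high = trypsin_mass_ug(2.0, 1.0)
print(f"trypsin needed: {low:.0f} to {high:.0f} micrograms per ml of reaction")
```

Under these assumptions the stated range of 1-2 mg/ml corresponds to roughly 20 to 40 micrograms of trypsin per ml of reaction.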
For construct optimization the carboxy termini were estimated based on predicted secondary structure in the terminal region and mass spectrometric characterization of the purified protein.

Cloning, expression, purification and crystallization of the tryptic fragments of nucleocapsid protein:

The two major and minor bands identified were cloned into the LIC site of the pET-41 Ek/LIC vector (Novagen) and expressed as GST fusion proteins. The expressed protein was purified by affinity chromatography on glutathione Sepharose (Pharmacia) followed by on-bead cleavage with enterokinase (EK-Max, Invitrogen). The cleavage reaction was performed by suspending 1 ml of beads in 40 ml of cutting buffer (250 mM NaCl, 50 mM Tris-HCl pH 8.0) with 10 units of protease per 1 ml of beads. Following proteolysis the dilute supernatant was purified further by gel filtration chromatography on a Superdex 75 16/60 column (Pharmacia). The protein migrated as a dimer and was concentrated to 5-8 mg/ml and used for crystallization trials. Initial crystallization trials were carried out using Crystal Screen I (Hampton Research). Following several leads in conditions with PEG 4000, the Index screens 2 and 3 (Jena Biosciences) were used to design an optimization strategy. Crystals of the C-terminal dimer grew in three to ten days and were mostly needle-shaped, thin plates or hexagonal three-dimensional bipyramidal crystals that grew around two base conditions: one with citrate, i.e. 100 mM trisodium citrate pH 4.5-5.2, 0.1 M MgCl2, 25-30% PEG 4000; and the other with 32% PEG 4000, 0.8 M Li2SO4, 0.1 M Tris-HCl pH 8.5.

Data collection and phasing:

Data were collected at the beamlines indicated in Table I. Either 180 or 360 oscillation images with a 1° oscillation angle were collected using the inverse beam approach with a wedge size of 30°. The entire dataset was integrated and scaled using the HKL2000 suite and SCALEPACK. Four methionine positions were located using Shake-and-Bake.
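The inverse-beam collection order mentioned above can be illustrated with a minimal sketch: each wedge is collected at its starting angle and then again 180 degrees away, so that Friedel mates are recorded close together in time. The function name is hypothetical, the oscillation angle and wedge size are assumed to be in degrees, and it is an assumption (not stated in the text) that the quoted image counts include both the direct and the inverse pass.

```python
def inverse_beam_schedule(total_range_deg, wedge_deg, osc_deg=1.0, start_deg=0.0):
    """Ordered image start angles (degrees) for an inverse-beam sweep.

    total_range_deg is the rotation range covered by the direct pass only;
    every wedge is immediately repeated at +180 degrees (the inverse pass).
    """
    if wedge_deg <= 0 or osc_deg <= 0 or total_range_deg <= 0:
        raise ValueError("ranges and angles must be positive")
    n_wedges = round(total_range_deg / wedge_deg)
    per_wedge = round(wedge_deg / osc_deg)
    if abs(n_wedges * wedge_deg - total_range_deg) > 1e-9:
        raise ValueError("total range must be a whole number of wedges")
    if abs(per_wedge * osc_deg - wedge_deg) > 1e-9:
        raise ValueError("wedge must be a whole number of oscillations")

    angles = []
    for w in range(n_wedges):
        wedge_start = start_deg + w * wedge_deg
        # Direct wedge first, then the same wedge rotated by 180 degrees.
        for offset in (0.0, 180.0):
            for i in range(per_wedge):
                angles.append((wedge_start + offset + i * osc_deg) % 360.0)
    return angles


# Assumed example: 90 degrees of direct data, 30 degree wedges, 1 degree images.
schedule = inverse_beam_schedule(total_range_deg=90.0, wedge_deg=30.0, osc_deg=1.0)
print(len(schedule), schedule[28:32])
```

With these assumed parameters a 90 degree direct range gives 180 images in total and a 180 degree direct range gives 360, which is one way to read the image counts quoted above.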
The Shake-and-Bake solutions were then refined, phases calculated and density modified using SHARP. The final FOM after structure solution and phasing in SHARP was 0.65, which yielded maps of excellent quality to 2.2 Å. Although almost 80% of the model could be traced using automated tracing in ARP/wARP, manual building of the dimer in the asymmetric unit was performed using the program Coot. Refinement was carried out in CNS or REFMAC5. Refined coordinates for the dimer were used to phase data obtained in the P21212 space group by molecular replacement in the program Phaser. Phaser was able to correctly identify the positions of all 4 dimers. Model bias was avoided during refinement by using the prime-and-switch methodology implemented in SOLVE/RESOLVE. All figures were generated using ESPript in combination with Adobe Illustrator or PyMOL.

References:

Janowski, R., M. Kozak, et al. (2001). "Human cystatin C, an amyloidogenic protein, dimerizes through three-dimensional domain swapping." Nat Struct Biol 8(4): 316-20.
Newcomer, M. E. (2001). "Trading places." Nat Struct Biol 8(4): 282-4.
Risco, C., I. M. Anton, et al. (1996). "The transmissible gastroenteritis coronavirus contains a spherical core shell consisting of M and N proteins." J Virol 70(7): 4773-7.
Surjit, M., B. Liu, et al. (2004). "The nucleocapsid protein of the SARS coronavirus is capable of self-association through a C-terminal 209 amino acid interaction domain." Biochem Biophys Res Commun 317(4): 1030-6.
Tang, T. K., M. P. Wu, et al. (2005). "Biochemical and immunological studies of nucleocapsid proteins of severe acute respiratory syndrome and 229E human coronaviruses." Proteomics 5(4): 925-37.
Wang, Y., X. Wu, et al. (2004). "Low stability of nucleocapsid protein in SARS virus." Biochemistry 43(34): 11103-8.
Yu, I. M., C. L. Gustafson, et al. (2005). "Recombinant severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein forms a dimer through its C-terminal domain." J Biol Chem 280(24): 23280-6.