Series of Selected Papers from Chun-Tsung Scholars, Peking University (2003)

Fragmental Cloning of Complete Genome Length cDNA and Important Viral Proteins of SARS-CoV and Sequence-based Bioinformatic Analysis (footnote 1)

Sidi Chen and Jianguo Chen (footnote 2)
Department of Cell Biology and Genetics, College of Life Science, Peking University

Abstract: SARS-CoV, a positive-sense single-stranded RNA virus, is a recently emerged, highly infectious coronavirus of the human respiratory system. We have cloned the full-length genomic cDNA of this coronavirus in fragments, together with the coding sequences of several important viral proteins, by standard molecular cloning techniques, and the cloned cDNA sequences have been confirmed by DNA sequencing. Bioinformatic analysis based on the cDNA and protein sequences reveals putative characteristics of the viral proteins and the evolutionary relationships between SARS-CoV and other coronaviruses. Multiple alignment of the amino acid sequences of the putative NSP13 from different species of the Coronaviridae reveals conserved motifs in this nonstructural protein; four residues within these motifs form the conserved K-D-K-E tetrad predicted to be involved in viral mRNA capping. Evolutionary trees based on the NSP13 alignment show that SARS-CoV is not closely related to any known group of coronaviruses, which is consistent with analyses based on alignments of the structural proteins and the polymerase. The evolutionary distances based on the alignments of NSP13 and NSP10 suggest that SARS-CoV shares more similarity with BCoV and MHV than with other coronaviruses, which may provide a clue to the origin of SARS-CoV.

KEYWORDS: SARS coronavirus (SARS-CoV), cDNA cloning, NSP13, bioinformatic analysis, evolutionary relationship

Footnote 1: This project is partly funded by the Jun-Zheng Foundation at PKU. The project is led by Prof. Jianguo Chen, Department of Cell Biology and Genetics, College of Life Science, Peking University.
Footnote 2: Corresponding author: chenjg@pku.edu.cn

Introduction:

Severe Acute Respiratory Syndrome (SARS), which broke out recently in China, has been confirmed to be caused by the SARS coronavirus (SARS-CoV). The first complete genome sequence was obtained by Marra et al. in Canada [1]. SARS-CoV is a member of the Coronaviridae, a family in which all viruses are enveloped and carry prominent spike proteins around their surface. Inside each virion is the genomic RNA, associated with nucleocapsid proteins [2]. A model of a typical coronavirus (e.g. HCoV229E) is shown in Fig. 1 (footnote 3).

The envelope carries three glycoproteins: the Spike protein (S), which is involved in receptor binding and cell fusion and acts as the major antigen in immunity; the Envelope protein (E), a small envelope-associated protein; and the Membrane protein (M), a transmembrane protein involved in budding and envelope formation. In a few types there is an additional glycoprotein, Haemagglutinin-esterase (HE). The genome is associated with a basic phosphoprotein, N. The coronaviruses, which replicate in the cytoplasm of host cells, are distinguished by a single-stranded, plus-sense RNA genome of approximately 30 kb that has a 5' cap structure and a 3' polyA tract [3]. Sequence-based mRNA mapping reveals the genome organization of SARS-CoV, indicating coding sequences (CDS) for a leader peptide, the structural proteins S, M, E and N [4], and nonstructural proteins such as the MHV p65 counterpart, NSP1-NSP13 and X1-X5 [5].

We aimed to clone the full-length cDNA, and the cDNAs coding for different structural and nonstructural proteins of the SARS coronavirus, so that experiments on virus-cell interaction and on viral protein expression and characterization can be carried out in the future.
Moreover, bioinformatic analysis can be used to predict the characteristics of viral proteins such as the putative nonstructural protein NSP13, which has been suggested to function in viral mRNA capping, and to explore the evolutionary relationship between SARS-CoV and other coronaviruses.

Fig. 1. Model of a coronavirus virion. S - Spike protein; E - Envelope protein: small, envelope-associated protein; M - Membrane protein; HE - Haemagglutinin-esterase (in a few types); N - Nucleocapsid.

Footnote 3: This figure is from Alan J. Cann, Principles of Molecular Virology.

Materials and Methods:

1. Template preparation: The reverse-transcription mixture of SARS-CoV RNA, prepared with random or specific primers from the supernatant of virus-infected Vero cells, was generously supplied by Dr. Y Lu of the Zhejiang Provincial Center for Disease Prevention and Control.

2. Primer design (with Primer Premier): We designed primers both for cloning the full-length genomic cDNA and for cloning the coding sequences of selected viral proteins, based on the positive-strand sequence of SARS-CoV submitted by Marra et al. (GenBank accession number NC_004718). Primers for genomic cDNA cloning carry no additional bases, whereas primers for protein cloning carry additional bases such as restriction sites, the start codon ATG, or the stop codon TAA. Primers for genomic cDNA cloning are spaced at intervals of about 1-2 kb and usually span a restriction site present in the native sequence; primers for protein CDS cloning are located at the beginning and the end of the target coding sequence. The primers are listed in Table 1a, Table 1b and Table 2. The primers were synthesized and dissolved in distilled water at a concentration of 10 µmol/L.

Table 1a.
PCR primers for fragmental cDNA cloning of the SARS-CoV genome

Forward primers:

Primer name   Sequence (5'→3')               Position in genome
F0            CTACCCAGGAAAAGCCAACCAACC       (1-24)
F10           CCAGACACCCTTCGAAATTAAGAG       (1002-1025)
F25           GAACTCGAAGCACTCGAGACGCCCG      (2533-2557)
F43           GGATGTTAGAGCCATAATGGCAACC      (4356-4380)
F5            CATCTTCTACAGCATGCTAATTTGG      (5380-5404)
F65           TTTCACTAGCCTTAGGTTTAAAAAC      (6518-6542)
F79           GCTTATGTCGACACCTTTTCAGCA       (7957-7980)
F91           TGCGAAAGGTCAGAAGTAGGTATT       (9145-9168)
F99           AGTGGTTTTAGGAAAATGGCATTCCC     (9970-9994)
F120          GCCACTGCCCAGGAGGCCTATGAGC      (12052-12076)
F138          CCTGACATCTTACGCGTATATGC        (13861-13883)
F150          GCAAAGAATAGAGCTCGCACCGTAGCT    (15004-15030)
F165          CTGAGAGACTCAAGCTTTTCGC         (16532-16553)
F179          CTGCAATTTACAAGTCTAGAAATACC     (17905-17930)
F192          GTATGTGAATAAGCATGCATTCCAC      (19212-19236)
F202          GGCTATGCCTTCGAACACATCG         (20221-20242)
F251          GCATGACTAGTTGTTGCAGTTGCC       (25129-25153)
F279          TACTATCAACTGTCAAGATCCAGC       (27980-28004)
F286          TCTTCTCGCTCCTCATCACGT          (28654-28634)

Table 1b.
PCR primers for fragmental cDNA cloning of the SARS-CoV genome (continued)

Reverse primers:

Primer name   Sequence (5'→3')               Position in genome
R10           CTCTTAATTTCGAAGGGTGTCTGG       (1025-1002)
R27           AATTGGTGCACCCCCTTTTAAGCG       (2712-2688)
R43           GGTTGCCATTATGGCTCTAACATCC      (4380-4356)
R53           CCAAATTAGCATGCTGTAGAAGATG      (5404-5380)
R65           GTTTTTAAACCTAAGGCTAGTGAAA      (6542-6518)
R79           TGCTGAAAAGGTGTCGACATAAGC       (7980-7957)
R93           ACCAGCCACTACTGAAGCAGACAC       (9330-9307)
R102          TTGCATAGAATGGCCAATAACACG       (10218-10193)
R120          GCTCATAGGCCTCCTGGGCAGTGGC      (12076-12052)
R138          GCATATACGCGTAAGATGTCAGG        (13883-13861)
R150          AGCTACGGTGCGAGCTCTATTCTTTGC    (15030-15004)
R165          GCGAAAAGCTTGAGTCTCTCAG         (16553-16532)
R179          TTGTAATGTAGCCACATTGCGACG       (17955-17932)
R192          GTGGAATGCATGCTTATTCACATAC      (19236-19212)
R202          CGATGTGTTCGAAGGCATAGCC         (20242-20221)
R219          GTACCCATGGGTTTAGAAACAGC        (21914-21891)
R265          CAGCAAGCACAAAACAAGCAA          (26584-26564)
R283          CCCTGGCCTCGAGGGAATCTAAGTT      (28320-28296)
R297          GTCATTCTCCTAAGAAGCTATTAAAA     (29713-29687)

Table 2. PCR primers for cDNA cloning of SARS-CoV protein CDS

Primer name   Sequence (5'→3')                                            Additional restriction site
F3C           CCCGGATCCATGAGTGGTTTTAGGAAAATGGCATTCCC                      BamHI
R3C           GGGGCGGCCGCTTATTGGAAGGTAACACCAGAGC                          NotI
Fn            CCCGGATCCATGTCTGATAATGGACCCCAATCAAA                         BamHI
Rn            GGGGCGGCCGCTTATGCCTGAGTTGAATCAGCAGAAGC                      NotI
Fm            CCCGGATCCATGGCAGACAACGGTACTATTACCG                          BamHI
Rm            GGGGCGGCCGCTTACTGTACTAGCAAAGCAATATTG                        NotI
Fs1           CCCGGATCCATGTTTATTTTCTTATTATTTCTTACTC                       BamHI
Rs2           GGGGCGGCCGCTTATGTGTAATGTAATTTGACACCC                        NotI
Fp1           CCCGGATCCATGTCTGCGGATGCATCAACGTTTTTAAACCGGGTTTGCGGTGTAAG    BamHI
Rp5           GGGCTCGAGTTACTGCAAGACTGTATGTGGTGTG                          XhoI

*Nucleotides added artificially for cloning (underlined in the original table) are not native to the SARS genome.
**The primers for specific protein CDS cloning are named by direction and protein: F3C and R3C are for cloning the 3CL protein, Fn and Rn for the N protein, Fm and Rm for the M protein, Fs1 and Rs2 for the S protein, and Fp1 and Rp5 for RdRp.
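Several forward and reverse primers in Table 1a and Table 1b anneal to the same genomic site on opposite strands (for example F10 and R10, both at positions 1002-1025), so each should be the reverse complement of the other. The following Python sketch checks this for three pairs copied from the tables. The helper names `reverse_complement` and `check_pairs` are illustrative and are not part of the original work.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (spaces are ignored)."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Primer pairs copied from Table 1a / Table 1b that share a genomic position.
PAIRS = {
    "F10/R10": ("CCAGACACCCTTCGAAATTAAGAG", "CTCTTAATTTCGAAGGGTGTCTGG"),
    "F43/R43": ("GGATGTTAGAGCCATAATGGCAACC", "GGTTGCCATTATGGCTCTAACATCC"),
    "F79/R79": ("GCTTATGTCGACACCTTTTCAGCA", "TGCTGAAAAGGTGTCGACATAAGC"),
}


def check_pairs(pairs):
    """Map each pair name to True when the reverse primer is the
    reverse complement of the forward primer."""
    return {
        name: reverse_complement(fwd) == rev
        for name, (fwd, rev) in pairs.items()
    }


if __name__ == "__main__":
    for name, ok in check_pairs(PAIRS).items():
        print(name, "consistent" if ok else "MISMATCH")
```

A check of this kind is a cheap way to catch transcription errors in a primer table before ordering oligonucleotides.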
3. cDNA synthesis by polymerase chain reaction: To clone the cDNA of SARS-CoV, the first-strand cDNA mixture was subjected to PCR amplification using the specific primers for the cDNA fragments of the virus (Table 1a, Table 1b, Table 2). In brief, we set up the polymerase chain reaction (PCR) as described in Table 3. The PCR products were purified by agarose gel electrophoresis and then cloned directly into the pGEM-T Easy vector (Promega, Madison, USA). Subsequent sequencing confirmed that the amplified fragments were the expected cDNA fragments of SARS-CoV.

Table 3. PCR reaction for SARS-CoV cDNA synthesis

Content                                   Volume (µL)   Final concentration
Template: 1st-strand cDNA solution**      0.1           -
Forward primer                            0.5           0.1 µmol/L
Reverse primer                            0.5           0.1 µmol/L
dNTP mixture (dATP, dCTP, dGTP, dTTP)     5             0.25 mmol/L each
10-fold ExTaq buffer                      5             10 mmol/L Tris-HCl (pH 8.3), 50 mmol/L KCl, 1.5 mmol/L MgCl2
ExTaq polymerase*                         0.5           -
dH2O                                      38.4          -
Total                                     50

*The ExTaq polymerase and its buffer were bought from Takara Ltd.
**The negative control was set up without adding template.

4. Prediction of viral protein characteristics with TMHMM and DNAMAN: Using the full-length cDNA sequence of SARS-CoV and the putative CDS (coding sequences), we collected all the SARS-CoV protein sequences from GenBank; the few protein sequences not available there were translated from the DNA sequences with the DNAMAN software. The protein sequences were then submitted to the TMHMM server, which analyzes the transmembrane and hydrophobic characteristics of the SARS proteins.

5. Analysis of evolutionary relationships with Clustal and Mega: After collecting the cDNA and protein sequences of selected structural and nonstructural proteins of different coronaviruses, we performed multiple alignments of these sequences with the ClustalX software and drew evolutionary trees with the Mega software based on the alignments.
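The final concentrations listed in Table 3 follow from the dilution relation C_final = C_stock x V_added / V_total. The Python sketch below reproduces that arithmetic. The 10 µmol/L primer stock is stated in the primer design section; the 2.5 mmol/L (each) dNTP stock is an assumption inferred from the table values and is not given in the text. The function name `final_concentration` is illustrative.

```python
def final_concentration(stock_conc: float, volume_added: float,
                        total_volume: float) -> float:
    """Dilution relation: C_final = C_stock * V_added / V_total."""
    return stock_conc * volume_added / total_volume


# Reaction volumes from Table 3, in microlitres.
VOLUMES_UL = {
    "template": 0.1,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "dNTP mixture": 5.0,
    "10x ExTaq buffer": 5.0,
    "ExTaq polymerase": 0.5,
    "dH2O": 38.4,
}
TOTAL_UL = sum(VOLUMES_UL.values())

# Primer stock of 10 umol/L is stated in the Methods.
PRIMER_FINAL_UMOL_L = final_concentration(10.0, VOLUMES_UL["forward primer"], 50.0)

# ASSUMPTION: dNTP stock of 2.5 mmol/L each (inferred, not stated in the text).
DNTP_FINAL_MMOL_L = final_concentration(2.5, VOLUMES_UL["dNTP mixture"], 50.0)

if __name__ == "__main__":
    print(f"total volume: {TOTAL_UL:.1f} uL")
    print(f"primer final concentration: {PRIMER_FINAL_UMOL_L:.2f} umol/L")
    print(f"dNTP final concentration: {DNTP_FINAL_MMOL_L:.2f} mmol/L each")
```

The component volumes also add up to the stated 50 µL total, which is a quick consistency check on the recipe.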
Results and discussion:

cDNA synthesis and cloning

Twenty-two cDNA fragments of SARS-CoV were synthesized by polymerase chain reaction (some PCR product bands are shown in Fig. 2). By cloning the cDNA fragments into the pGEM-T Easy vector, we have constructed 21 plasmids, each containing one cDNA fragment. These 21 cloned cDNA fragments cover the full length of the SARS-CoV genome except for a short piece of about 2 kb, still being cloned, at positions 26383 to 28320 in the genome of SARS-CoV CUHK-W1 (AY278554) (footnote 4). Some of the clones, such as 3C-TE, RdRp-TE, S-TE, M-TE and N-TE, each contain a cDNA fragment coding for an individual viral protein (see Fig. 3 and Table 4). Some of the fragments (black arrows in Fig. 3) were sequenced to verify their correctness and to determine their cloning direction in the pGEM-T Easy vector (T7→SP6 or SP6→T7, as shown in Table 4). The clones coding for viral proteins (protein clones for short, red arrows in Fig. 3) were verified by restriction mapping (data not shown); the proteins they encode are to be expressed from expression plasmids constructed from these protein clones. Cloning of the remaining fragment (blue bar in Fig. 3) is in progress.

Fig. 2. Electrophoresis of PCR products of some cDNA fragments of the SARS-CoV genome. (a) Lanes a1, a2: PCR products of the cDNA coding for the Nucleocapsid of SARS-CoV (1.4 kb); (b) lanes b1, b2: PCR products of the cDNA coding for the 3CL protein of SARS-CoV (0.9 kb); (c) lanes c1, c2: PCR products of the cDNA coding for the Membrane protein of SARS-CoV (0.6 kb). The lengths (in kb) of the cDNAs are indicated by arrows. Lanes a1, b1 and c2 are the DNA molecular weight marker (λ DNA digested with EcoRI and HindIII). PCR products of the other fragments were all separated and purified by agarose gel electrophoresis (pictures not shown).
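The coverage statement above can be checked directly from the insert positions listed in Table 4. The Python sketch below merges the cloned intervals and reports any uncovered stretch between the first and last cloned base. The positions are copied from Table 4, with one exception that should be read as an assumption: the span of 279-297-TE is taken from the positions of its primers F279 and R297 (27980-29713). The clone 263-283-TE is left out because it is still under cloning. The function names are illustrative.

```python
# Insert positions (start, end) of the 21 cloned fragments, from Table 4.
# ASSUMPTION: 279-297-TE uses the positions of primers F279/R297.
CLONES = {
    "000-010-TE": (1, 1025),
    "010-027-TE": (1002, 2712),
    "025-043-TE": (2533, 4380),
    "043-053-TE": (4356, 5404),
    "053-065-TE": (5380, 6542),
    "065-079-TE": (6518, 7980),
    "079-093-TE": (7957, 9330),
    "091-102-TE": (9145, 10218),
    "3C-TE": (9970, 10887),
    "099-120-TE": (9970, 12076),
    "120-138-TE": (12052, 13883),
    "RdRp-TE": (13357, 16151),
    "150-192-TE": (15004, 19236),
    "150-179-TE": (15004, 17955),
    "192-202-TE": (19212, 20242),
    "202-222-TE": (20221, 22223),
    "S-TE": (21477, 25244),
    "251-265-TE": (25129, 26584),
    "M-TE": (26383, 27048),
    "N-TE": (28105, 29373),
    "279-297-TE": (27980, 29713),
}


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def find_gaps(merged):
    """Return the uncovered stretches between consecutive merged blocks."""
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(merged, merged[1:])
    ]


COVERED = merge_intervals(CLONES.values())
GAPS = find_gaps(COVERED)

if __name__ == "__main__":
    print("covered blocks:", COVERED)
    for start, end in GAPS:
        print(f"gap {start}-{end} ({end - start + 1} bp)")
```

Under these assumptions the only uncovered stretch lies between the end of M-TE and the start of 279-297-TE, inside the 26383-28320 region that the fragment under cloning is meant to fill.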
Footnote 4: Since the differences between the sequences of different SARS-CoV isolates (NC_004718 and AY278554) are minor, in our sequence analysis we use the sequence of SARS-CoV CUHK-W1 (AY278554) to determine the fragment positions in the genome for restriction mapping; this does not affect any of the results, either experimentally or theoretically.

[Fig. 3: map of the SARS-CoV genome (positions 1 to 29776, scale bar 2000 bp) showing the fragment clones, the protein clones (3CL protein, RdRp (Nsp9), Spike glycoprotein, M protein, Nucleocapsid) and the fragment under cloning; mapped by Chen SD, PKU.]

Fig. 3. cDNA fragment clones covering the SARS-CoV genome. All these fragments have been cloned into the pGEM-T Easy vector, and some were later transferred into expression vectors. The black arrows show the fragment clones for construction of the complete-genome-length cDNA. Detailed information on all the clones is provided in Table 4.

Table 4. cDNA fragment clones covering the SARS-CoV genome

No.   Plasmid name   Inserted cDNA fragment position   Fragment size (kb)   Cloning direction in pGEM-T
                     in SARS-CoV genome                                     Easy vector (5'→3')
1     000-010-TE     1-1025                            1.0                  ND*
2     010-027-TE     1002-2712                         1.7                  SP6→T7**
3     025-043-TE     2533-4380                         1.8                  T7→SP6
4     043-053-TE     4356-5404                         1.0                  T7→SP6
5     053-065-TE     5380-6542                         1.1                  T7→SP6
6     065-079-TE     6518-7980                         1.5                  SP6→T7
7     079-093-TE     7957-9330                         1.4                  ND
8     091-102-TE     9145-10218                        1.1                  SP6→T7
9     3C-TE          9970-10887                        0.9                  ND
10    099-120-TE     9970-12076                        2.0                  SP6→T7
11    120-138-TE     12052-13883                       1.8                  T7→SP6
12    RdRp-TE        13357-16151                       2.8                  ND
13    150-192-TE*    15004-19236                       4.2                  SP6→T7
14    150-179-TE     15004-17955                       2.9                  T7→SP6
15    192-202-TE     19212-20242                       1.0                  SP6→T7
16    202-222-TE     20221-22223                       2.0                  ND
17    S-TE           21477-25244                       3.8                  ND
18    251-265-TE     25129-26584                       1.4                  ND
19    263-283-TE     -                                 -                    Under cloning***
20    M-TE           26383-27048                       0.7                  ND
21    N-TE           28105-29373                       1.4                  ND
22    279-297-TE     27980-29713                       1.8                  T7→SP6

*ND = Not Determined; such clones are still being sequenced or have already been verified by restriction mapping.
**SP6→T7 means that the inserted fragment, read from 5' to 3', runs from the SP6 promoter to the T7 promoter in the pGEM-T Easy vector (for details of this vector, consult the information from Promega, http://www.promega.com).
***This fragment is under cloning; this 2 kb fragment contains the final missing 1 kb of cDNA needed to cover the complete SARS-CoV genome.

Bioinformatics analysis

We have predicted the hydrophobicity/hydrophilicity and the transmembrane regions of all major viral proteins of SARS-CoV; the results are shown in Table 5 and Fig. 4. They suggest that most of the nonstructural proteins are hydrophilic, except NSP1, NSP3, X1 and X4, which are predicted to have transmembrane domains, and that most of the structural proteins are hydrophobic, except the N protein, which is predicted to be highly hydrophilic.
The reason for this result may lie in the function and distribution of the viral proteins. All the nonstructural proteins are synthesized in the cytosol of host cells and then carry out the functions of virus replication, including (putatively, for SARS-CoV) viral RNA replication, viral mRNA transcription and editing, virion assembly, and inhibition of host cell protein synthesis. Such functional requirements may drive the nonstructural proteins to adopt a soluble, i.e. hydrophilic, form. The exceptional hydrophobicity of NSP1, NSP3, X1 and X4 may likewise be explained by their specific functions. On the other hand, structural proteins such as E (envelope) and M (membrane) lie on the surface of the virion, which requires them to be hydrophobic in order to form a non-polar barrier between the nucleocapsid inside the virion and the outer environment. The S (spike) glycoprotein is predicted to have a short transmembrane region near its C-terminus, which may separate the outer part, postulated to interact with hAPN (CD13) [6], from the inner part. As for the exception among the structural proteins, the N (nucleocapsid) protein is postulated to associate with the viral genomic RNA, and a hydrophilic form might facilitate this interaction, as indicated for other coronaviruses [7, 8, 9].

Table 5. Characteristics of SARS-CoV proteins predicted by TMHMM

Protein name                          Length   Number of predicted   Exp. number of AAs    Putative signal peptide
                                      (AAs)    transmembrane or      in transmembrane or
                                               hydrophobic helices   hydrophobic helices
Leader                                179      0                     0*                    N/A**
NSP1                                  2422     7                     209                   N/A
NSP2 3C-like                          306      0                     0                     N/A
NSP3 HD2                              290      7                     145                   N/A
NSP4                                  83       0                     0                     N/A
NSP5                                  198      0                     0                     N/A
NSP6                                  113      0                     0                     N/A
NSP7 growth-factor-like               139      0                     0                     N/A
NSP9_RdRp                             932      0#                    0                     N/A
NSP10 MB_NTPase_HEL                   601      0                     0                     N/A
NSP11                                 527      0                     0                     N/A
NSP12 ?                               346      0                     0                     N/A
NSP13 ribose-2'-O-methyltransferase   298      0#                    0                     N/A
E                                     76       1#                    24                    POSSIBLE N-term signal sequence
M                                     221      3#                    67                    POSSIBLE N-term signal sequence
N                                     422      0#                    0                     N/A
S                                     1255     1#                    25                    N/A
X1                                    274      3                     65                    POSSIBLE N-term signal sequence
X2                                    154      0                     6                     N/A
X3                                    63       0                     13                    POSSIBLE N-term signal sequence
X4                                    122      1                     22                    N/A
X5                                    84       0                     0                     N/A

Ref: SARS coronavirus Urbani, complete genome. ACCESSION AY278741
*A 0 here means that the expected number of AAs in transmembrane or hydrophobic helices predicted by TMHMM is less than 1 for the protein sequence.
**N/A: no signal sequence is predicted by TMHMM.
#The hydrophilic probability curves of these proteins are shown in Fig. 4.

Fig. 4. Hydrophobic and hydrophilic regions of SARS-CoV proteins predicted by TMHMM. Abscissa: amino acid position (N to C terminus); ordinate: probability of hydrophilicity. Nonstructural proteins: a) NSP13 (ribose-2'-O-methyltransferase); b) NSP9 (RdRp). Structural proteins: c) S (Spike glycoprotein); d) M (Membrane protein); e) E (Envelope protein); f) N (Nucleocapsid).

Our recent work has concentrated on NSP13, a putative nonstructural protein first predicted by Marra et al. after they obtained the genome sequence of SARS-CoV (accession NC_004718). Located in the C-terminal part of the pp1ab viral polyprotein, this protein has recently been predicted to be an AdoMet-dependent ribose-2'-O-methyltransferase. Bioinformatic analysis has been performed to characterize this protein. The TMHMM result shows that SARS NSP13 is highly hydrophilic (Fig. 4a).
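Hydropathy profiles of the kind discussed here are commonly computed as a sliding-window average of per-residue scores. The Python sketch below uses the Kyte-Doolittle scale with a 9-residue window. This is an illustration under stated assumptions only: the text does not say which scale or window DNAMAN applies, TMHMM uses a hidden Markov model rather than a hydropathy average, and the test sequence is a made-up toy peptide, not NSP13.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9):
    """Return the mean Kyte-Doolittle score of every full window
    along the sequence (one value per window start position)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy peptide: a hydrophobic stretch followed by a basic one.
    toy = "LLLLLLLLLKKKKKKKKK"
    for position, value in enumerate(hydropathy_profile(toy), start=1):
        print(position, round(value, 2))
```

Windows that average well above zero are candidate hydrophobic segments, whereas a profile that stays below zero over the whole protein is what one would expect for a soluble, hydrophilic protein.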
The indices of hydrophilicity and hydrophobicity were calculated with DNAMAN (4.0), as shown in Fig. 5a and Fig. 5b, respectively. Rich in leucine, asparagine and serine, this protein is predicted to have a pI of 7.81, which is consistent with its putative soluble form.

[Fig. 5: panels (a) and (b), both titled SARS-NSP13; x-axis: amino acid number (1 to 298); y-axis: hydrophilicity (a) and hydrophobicity (b), from -5 to 5.]

Fig. 5. Hydrophilicity and hydrophobicity indices of SARS-CoV NSP13 calculated with DNAMAN (4.0). Abscissa: amino acid position (N to C terminus); ordinate: (a) relative index of hydrophilicity, (b) relative index of hydrophobicity.

A BLAST search shows that every sequenced coronavirus has a putative NSP13 CDS in its genome. These proteins from different coronavirus species may carry important evolutionary information about the newly emerged SARS-CoV and the Coronaviridae (coronavirus family). Since putative coronavirus protein sequences are scarce, we obtained all 7 sequences available in SRS, listed in Table 6 below, for putative functional and evolutionary analysis.

Table 6. Sequence information from GenBank

SeqVersion    SeqLength (aa)   Description
NP_828873.2   298              putative coronavirus nsp13 [SARS coronavirus]
NP_840013.1   300              putative coronavirus nsp13 [Transmissible gastroenteritis virus]
NP_839969.1   301              putative coronavirus nsp13 [Porcine epidemic diarrhea virus]
NP_742142.1   299              coronavirus nsp13 [Bovine coronavirus]
NP_835356.1   300              putative coronavirus nsp13 [Human coronavirus 229E]
NP_740620.1   299              coronavirus nsp13 [Murine hepatitis virus]
NP_740633.1   302              coronavirus nsp13 [Avian infectious bronchitis virus]

Multiple alignment of all 7 sequences was performed with ClustalX (1.8).
Six conserved motifs were found in all 7 sequences (Fig. 6). Four of them, KYTQLCQY, DLXXSD, AXKXTE and SSSE, contain the key residues K46, D130, K170 and E203, respectively. These four residues make up the conserved K-D-K-E tetrad, which is essential for mRNA cap-1 (mGpppNm) formation [10].

Fig. 6. Multiple alignment of the 7 amino acid sequences of the coronavirus nonstructural protein NSP13 by ClustalX (1.8). The species symbols are explained in Table 6. (*) Residues conserved across all compared sequences. Some of these identical residues form motifs: the motif KYTQLCQY in the first 60 AAs (lines 7-13 in this figure), GVAPG in the second 60 AAs (lines 16-22), DLXXSD and AXKXTE in the third 60 AAs (lines 25-31), and the triple-serine motif SSSE and a HANYXFWRN motif in the fourth 60 AAs (lines 34-40). These motifs contain the four residues K46, D130, K170 and E203, which form a spatial tetrad conserved for mRNA cap-1 (mGpppNm) formation.

Based on the multiple alignment of these 7 AA sequences, a genetic distance matrix was constructed (Table 7). Among the sequences analyzed, SARS-CoV has its maximum genetic distance from IBV (0.634 substitutions per AA) and its minimum from BCoV (0.433 substitutions per AA). An evolutionary tree was constructed to examine the phylogenetic relationship between the different coronavirus species (Fig. 7). The 7 isolates could be segregated into 4 clusters (designated A, B, C and D) based on the predicted degree of relatedness. SARS-CoV forms cluster A on its own, apart from the other groups, with its closest relationship to group B (BCoV and MHV). This analysis further confirms that SARS-CoV has novel genetic characteristics and does not belong to any other group of coronaviruses, suggesting that it is unlikely to have been generated by homologous recombination.
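The distances quoted above are expressed in substitutions per amino acid site. As a sketch of how such a value can be derived from one pair of aligned sequences, the Python code below computes the proportion of differing sites (p-distance) and the Poisson-corrected distance d = -ln(1 - p). The text does not state which distance model was selected in Mega 2.1, so the Poisson correction is an assumption, and the two short aligned sequences are hypothetical toy data rather than NSP13.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing residues over the aligned columns
    in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    differences = sum(1 for a, b in columns if a != b)
    return differences / len(columns)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p), in substitutions per site."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Hypothetical aligned toy sequences (10 columns, 2 differences).
    toy_a = "KYTQLCQYLN"
    toy_b = "KYTQLCQYIS"
    p = p_distance(toy_a, toy_b)
    print("p-distance:", round(p, 3))
    print("Poisson-corrected distance:", round(poisson_distance(p), 3))
```

The correction matters more as sequences diverge: multiple substitutions at the same site are invisible to the raw p-distance, so the corrected value is always at least as large as p.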
The finding that SARS-CoV does not fall into any known group of coronaviruses is consistent with previous evolutionary analyses of the structural proteins and RdRp by other scientists. Furthermore, the closer relatedness of SARS-CoV to BCoV and MHV than to HCoV229E (Table 7 and Fig. 7) suggests that SARS-CoV may have originated from an animal rather than a human virus. The genetic distances of SARS-CoV to BCoV and MHV are 0.433 and 0.465 substitutions per AA residue, respectively, while that of SARS-CoV to HCoV229E is 0.590. This suggests that SARS-CoV might have evolved from a mammalian coronavirus into a novel, independent branch of the Coronaviridae. In addition, a similar bioinformatic study of another viral protein, NSP10 (putative NTPase-helicase), is consistent with that of NSP13, which further strengthens the current conclusion. Taken together, the genomic differences between SARS-CoV and other coronaviruses strongly imply that SARS-CoV is a novel branch of the Coronaviridae.

Table 7. Distance matrix for the NSP13 amino acid sequences of 7 coronaviruses, computed with Mega (2.1). Unit: substitutions per AA residue.

[Fig. 7: tree with cluster A (SARS), cluster B (MHV, BCoV), cluster C (TGEV, PEDV, HCoV229E) and cluster D (IBV); scale from 0.00 to 0.30.]

Fig. 7. Neighbor-joining tree based on the multiple alignment and genetic distance calculation of the 7 AA sequences of different coronaviruses (by Mega 2.1). The 7 isolates could be segregated into 4 clusters (designated A, B, C and D) based on the predicted degree of relatedness. The scale below the tree represents the relative genetic distances.

Concluding remark:

We have cloned 21 cDNA fragments covering nearly the complete genome of the SARS coronavirus, including cDNAs coding for essential viral proteins such as 3CL, RdRp, Spike, Membrane and Nucleocapsid. The clones were verified by DNA sequencing or restriction mapping. Based on the cDNA sequence, we have conducted bioinformatics analysis.
The prediction of protein hydrophilicity/hydrophobicity and transmembrane regions reveals some of the characteristics of the viral proteins; the motif search in the important viral protein NSP13 strengthens the model of a residue tetrad contributing to the mRNA capping function; and the evolutionary analysis based on the AA sequence of NSP13 is consistent with that of NSP10 and with previous studies of other viral proteins, showing that SARS-CoV is not closely related to any group of coronaviruses but is relatively closer to BCoV and MHV.

Acknowledgements:

We give special thanks to Prof. Jianguo Chen, who guided our research both theoretically and technically and gave us additional help in our everyday work; to the Jun-Zheng Foundation, which funded part of this project; to Dr. Y Lu of the Zhejiang Provincial Center for Disease Prevention and Control, who performed the virus isolation, viral RNA extraction and reverse transcription and provided the first-strand cDNA of SARS-CoV free of charge; to Qian Feng and Zhiyin Wu, who assisted in operating some of the bioinformatics software; and finally to all the co-workers in the Lab of Molecular and Cellular Biology, Life Science Center, Peking University.

Reference:

1. Marco A. Marra et al., The Genome Sequence of the SARS-Associated Coronavirus, Science Express, published online 1 May 2003, Page 3, 10.1126/
2. B. N. Fields, D. M. Knipe, P. M. Howley, D. E. Griffin, Fields Virology (Lippincott Williams & Wilkins, Philadelphia, ed. 4, 2001).
3. M. M. C. Lai, D. Cavanagh, Adv. Virus Res. 48, 1 (1997).
4. P. A. Rota et al., Characterization of a Novel Coronavirus Associated with Severe Acute Respiratory Syndrome, Science Express, published online 1 May 2003 (www.scienceexpress.org), Page 1, 10.1126/
5. YiJun Ruan et al., Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection, The Lancet, published online May 9, 2003, http://image.thelancet.com/extras/03art4454web.pdf
6. Yu XJ et al., Putative hAPN receptor binding sites in SARS-CoV spike protein, Acta Pharmacol Sin. 2003 Jun;24(6):481-8.
7. Masters, P. S. 1992. Localization of an RNA-binding domain in the nucleocapsid protein of the coronavirus mouse hepatitis virus. Arch. Virol. 125:141-160.
8. Nelson, G. W., and S. A. Stohlman. 1993. Localization of the RNA-binding domain of mouse hepatitis virus nucleocapsid protein. J. Gen. Virol. 74:1975-1979.
9. Lili Kuo et al., Genetic Evidence for a Structural Interaction between the Carboxy Termini of the Membrane and Nucleocapsid Proteins of Mouse Hepatitis Virus, Journal of Virology, May 2002, p. 4987-4999.
10. Marcin von Grotthuss et al., mRNA Cap-1 Methyltransferase in the SARS Genome, Cell, Vol. 113, 701-701, June 13, 2003.

About the author:

Sidi Chen (陈斯迪), male, was born in December 1984 in Zhanjiang, Guangdong. In July 2000 he was admitted to the College of Life Science, Peking University, from the Affiliated High School of South China Normal University in Guangdong. During his studies he has been positive in outlook, eager to improve and diligent; his grades are excellent, ranking among the top ten of his year. He takes an active part in physical exercise and sports meets, and has earned the recognition of his advisor, class tutor, course teachers and the university, winning a first-class scholarship for two consecutive years.

Reflections:

Through more than a year of research work, I have had a first taste of how arduous scientific research is. The most common problem in the experiments was failure, followed by continual exploration and repetition. I moved forward step by step along a winding road. Although I have not produced very satisfying results, I have learned much new knowledge and many new techniques at the frontier of the life sciences. Under the careful guidance and care of Prof. Jianguo Chen, I have gradually gained both an intuitive and a rational understanding of scientific research and have taken my first steps on the road of research. I am very grateful to Prof. Chen and to the senior students in the laboratory for their help and guidance, and I hope to make greater progress in my future study and research.

About the advisor:

Jianguo Chen (陈建国), male, professor, was born in 1960 in Zhejiang. He works on the molecular biology of nerve cells and has obtained meaningful results on the molecular mechanisms by which neurofilaments (NF) and the microtubule system are assembled in nerve cells. The related papers have been published in journals such as Nature, EMBO J, Mol. Biol. Cell, J. Cell Biology and J. Cell Science, and have been widely cited in international journals and cell biology textbooks. He has published more than 40 research papers, which have been cited more than 500 times in SCI-indexed journals.