Chun-Tsung and I

Series of Selected Papers from Chun-Tsung Scholars, Peking University (2003)
Fragmental Cloning of Complete Genome-Length cDNA and Important Viral Proteins of SARS-CoV and Sequence-based Bioinformatic Analysis¹
Sidi Chen and Jianguo Chen²
Department of Cell Biology and Genetics, College of Life Science, Peking University
Abstract: SARS-CoV, a positive-sense single-stranded RNA virus, is a recently emerged, highly infectious coronavirus of the human respiratory system. Using standard molecular cloning techniques, we have cloned in fragments the complete genome-length cDNA of this coronavirus, together with cDNAs encoding several important viral proteins. The correctness of the cloned cDNA sequences has been confirmed by DNA sequencing. Bioinformatic analysis based on the cDNA and protein sequences reveals the putative characteristics of the viral proteins and the evolutionary relationships between SARS-CoV and other coronaviruses. A multiple alignment of the amino acid sequences of the putative NSP13 from different species of the Coronaviridae reveals conserved motifs in this nonstructural protein. Four residues within these motifs form the conserved K-D-K-E tetrad, which is predicted to be involved in viral mRNA capping. Evolutionary trees based on the NSP13 alignment show that SARS-CoV is not closely related to any known group of coronaviruses, consistent with analyses based on alignments of the structural proteins and the polymerase. The evolutionary distances based on the NSP13 and NSP10 alignments suggest that SARS-CoV is more similar to BCoV and MHV than to other coronaviruses, which may provide a clue to the origin of SARS-CoV.
KEYWORDS: SARS coronavirus (SARS-CoV), cDNA cloning, NSP13, bioinformatic analysis, evolutionary relationship
¹ This project was partly funded by the Jun-Zheng Foundation at PKU. The project was led by Prof. Jianguo Chen, Department of Cell Biology and Genetics, College of Life Science, Peking University.
² Corresponding author: chenjg@pku.edu.cn
Introduction:
Severe Acute Respiratory Syndrome (SARS), which broke out recently in China, has been confirmed to be caused by the SARS coronavirus. The first complete genome-length sequence was obtained by Marra et al. in Canada [1].
SARS-CoV is a member of the Coronaviridae. All viruses in this family are enveloped and carry prominent spike proteins around their surface. Inside each virion, the genomic RNA is associated with nucleocapsid proteins [2]. Fig. 1³ shows the model of a typical coronavirus (e.g., HCoV229E). The envelope carries three proteins:
- the Spike protein (S), which is involved in receptor binding and cell fusion and acts as the major antigen in immunity;
- the Envelope protein (E), a small envelope-associated protein;
- the Membrane protein (M), which is involved in transmembrane budding and envelope formation.
A few types carry an additional glycoprotein, haemagglutinin-esterase (HE). The genome is associated with a basic phosphoprotein, N. Coronaviruses replicate in the cytoplasm of host cells. They are distinguished by a single-stranded, positive-sense RNA genome of approximately 30 kb with a 5′ cap structure and a 3′ poly(A) tract [3]. Sequence-based mRNA mapping reveals the genome organization of SARS-CoV. It indicates CDSs for a leader peptide, for structural proteins such as S, M, E and N [4], and for nonstructural proteins such as the MHV p65 counterpart, NSP1-NSP13 and X1-X5 [5].
We aimed to clone the complete-length cDNA of the SARS coronavirus, together with cDNAs encoding various structural and nonstructural proteins. These clones would allow future experiments on virus-cell interaction and on viral protein expression and characterization. Bioinformatic analysis can also predict the characteristics of some viral proteins. One example is the putative nonstructural protein NSP13, which has been suggested to function in viral mRNA capping. The same analysis can explore the evolutionary relationship between SARS-CoV and other coronaviruses.
Fig. 1. Model of a coronavirus virion
S, spike protein; E, envelope protein (small, envelope-associated protein); M, membrane protein; HE, haemagglutinin-esterase (in a few types); N, nucleocapsid.
³ This figure is from Alan J. Cann, Principles of Molecular Virology.
Materials and Methods:
1. Template preparation:
Dr. Y Lu of the Zhejiang Provincial Center for Disease Prevention and Control generously supplied the reverse-transcription mixture of SARS-CoV RNA. It was prepared from the culture supernatant of virus-infected Vero cells using random or specific primers.
2. Primer design (by Primer-Premier):
We designed primers for complete-length genomic cDNA cloning and for cloning several viral proteins. The design was based on the positive-strand sequence of SARS-CoV submitted by Marra et al. (GenBank accession number NC_004718). Primers for complete-length genomic cDNA cloning contain no additional bases. Primers for protein cloning carry additional bases such as restriction sites, the start codon ATG or the stop codon TAA. Primers for complete-length genomic cDNA cloning are spaced at intervals of about 1-2 kb and usually contain a restriction site present in the native sequence. Primers for protein CDS cloning are located at the beginning and end of the target protein's cDNA sequence. These primers are listed in Tables 1a, 1b and 2.
The primers were synthesized and dissolved in distilled water at a concentration of 10 µmol/L.
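Each reverse primer in Table 1b sits on the same junction as a forward primer, for example F10 at 1002-1025 and R10 at 1025-1002. Each such pair should therefore be an exact pair of reverse complements. The in-sequence restriction sites mentioned above can be found with a plain string search. The sketch below checks a few primer pairs copied from Tables 1a and 1b. The helper names `reverse_complement` and `find_sites` are hypothetical. The dictionary of recognition sites is a small, hand-picked set of standard 6-bp sites chosen for illustration; it is not a list taken from the paper.

```python
# Sketch: sanity-check primer pairs from Table 1 (sequences copied from the
# tables; spaces are ignored). The site dictionary is an illustrative subset.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (spaces ignored)."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]

# Common palindromic 6-bp recognition sites (5'->3').
SITES = {
    "BamHI": "GGATCC", "BstBI": "TTCGAA", "HindIII": "AAGCTT",
    "MluI": "ACGCGT", "SacI": "GAGCTC", "SalI": "GTCGAC",
    "SpeI": "ACTAGT", "SphI": "GCATGC", "StuI": "AGGCCT",
    "XbaI": "TCTAGA", "XhoI": "CTCGAG",
}

def find_sites(seq: str) -> list[str]:
    """Sorted names of enzymes whose recognition site occurs in seq."""
    s = seq.replace(" ", "").upper()
    return sorted(name for name, site in SITES.items() if site in s)

PAIRS = {  # suffix: (forward primer, reverse primer) from Tables 1a/1b
    "10": ("CCAGACACCC TTCGAAATTA AGAG", "CTCTTAATTT CGAAGGGTGT CTGG"),
    "43": ("GGATG TTAGAGCCAT AATGGCAACC", "GGTTGCCATT ATGGCTCTAA CATCC"),
    "165": ("CTGAGAGACT CAAGCTTTTC GC", "GCGAAAAGCT TGAGTCTCTC AG"),
}

if __name__ == "__main__":
    for name, (fwd, rev) in PAIRS.items():
        ok = reverse_complement(fwd) == rev.replace(" ", "")
        print(f"F{name}/R{name}: reverse complements = {ok}; sites = {find_sites(fwd)}")
```

A substring search is enough here because all the listed sites are palindromic. A site found in the forward primer is therefore also present in its reverse partner.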
Table 1a. PCR primers for fragmental cDNA cloning of the SARS-CoV genome
Forward primers:
Primer name | Sequence (5′→3′) | Position in genome
F0 | CTACCCAGGA AAAGCCAACC AACC | 1-24
F10 | CCAGACACCC TTCGAAATTA AGAG | 1002-1025
F25 | GAACTCGA AGCACTCGAG ACGCCCG | 2533-2557
F43 | GGATG TTAGAGCCAT AATGGCAACC | 4356-4380
F53 | CATCTTCTACA GCATGCTAAT TTGG | 5380-5404
F65 | TTT CACTAGCCTT AGGTTTAAAAAC | 6518-6542
F79 | GCTT ATGTCGACAC CTTTTCAGCA | 7957-7980
F91 | TGCGAAAGGTCAGAAGTAGGTATT | 9145-9168
F99 | AGTGGTTTTAG GAAAATGGCA TTCCC | 9970-9994
F120 | GCCACTGCC CAGGAGGCCT ATGAGC | 12052-12076
F138 | CCTGACATCTT ACGCGTATAT GC | 13861-13883
F150 | GCAAAGAA TAGAGCTCGC ACCGTAGCT | 15004-15030
F165 | CTGAGAGACT CAAGCTTTTC GC | 16532-16553
F179 | CTGCAAT TTACAAGTCT AGAAATACC | 17905-17930
F192 | GTATGTGAAT AAGCATGCAT TCCAC | 19212-19236
F202 | GGCTATGCCTT CGAACACATC G | 20221-20242
F251 | GC ATGACTAGTT GTTGCAGTTG CC | 25129-25153
F279 | TACTA TCAAC TGTCA AGATC CAGC | 27980-28004
F286 | TCTTCTCGCTCCTCATCACGT | 28654-28634
Table 1b. PCR primers for fragmental cDNA cloning of the SARS-CoV genome (continued)
Reverse primers:
Primer name | Sequence (5′→3′) | Position in genome
R10 | CTCTTAATTT CGAAGGGTGT CTGG | 1025-1002
R27 | AATTGGTGCA CCCCCTTTTA AGCG | 2712-2688
R43 | GGTTGCCATT ATGGCTCTAA CATCC | 4380-4356
R53 | CCAAATTAGC ATGCTGTAGA AGATG | 5404-5380
R65 | GTTTTTAAAC CTAAGGCTAG TGAAA | 6542-6518
R79 | TGCTGAAAAG GTGTCGACAT AAGC | 7980-7957
R93 | ACCAGCCACT ACTGAAGCAG ACAC | 9330-9307
R102 | TTGCATAGAA TGGCCAATAA CACG | 10218-10193
R120 | GCTCATAGGC CTCCTGGGCA GTGGC | 12076-12052
R138 | GCATATACGC GTAAGATGTC AGG | 13883-13861
R150 | AGCTACGGTG CGAGCTCTAT TCTTTGC | 15030-15004
R165 | GCGAAAAGCT TGAGTCTCTC AG | 16553-16532
R179 | TTGTAATGTA GCCACATTGC GACG | 17955-17932
R192 | GTGGAATGCA TGCTTATTCA CATAC | 19236-19212
R202 | CGATGTGTTC GAAGGCATAG CC | 20242-20221
R219 | GTACCCATGG GTTTAGAAAC AGC | 21914-21891
R265 | CAGCAAGCAC AAAACAAGCA A | 26584-26564
R283 | CCCTGGCCTC GAGGGAATCT AAGTT | 28320-28296
R297 | GTCATTCTCC TAAGAAGCTA TTAAAA | 29713-29687
Table 2. PCR primers for cDNA cloning of SARS-CoV protein CDSs
Primer name | Sequence (5′→3′) | Additional restriction site
F3C | CCC GGA TCC ATG AGTGGTTTTAG GAAAATGGCA TTCCC | BamHI
R3C | GGG GCGG CCGC TTA TTGGAAGGTA ACACCAGAGC | NotI
Fn | CCC GGA TCC ATGTCT GATAATGGAC CCCAATCAAA | BamHI
Rn | GGGGCGG CCGCTTATGCCTGA GTTGAATCAG CAGAAGC | NotI
Fm | CCCGGA TCC ATGGCAGACAACGGTACTATTACCG | BamHI
Rm | GGGGCGG CCGCTTACTGTACT AGCAAAGCAA TATTG | NotI
Fs1 | CCCGGA TCCATGTTTATTTTCTTATTATTTCTTACTC | BamHI
Rs2 | GGGGCGG CCGCTTATGTGTAA TGTAATTTGA CACCC | NotI
Fp1 | CCCGGATCCATGTCTGCGGATGCATCAACGTTTTTAAACCGGGTTTGCGGTGTAAG | BamHI
Rp5 | GGGCTCGAGTTACTGCAAGACTGTATGTGGTGTG | XhoI
*The nucleotides that were underlined in the original table were added artificially for cloning and are not native to the SARS genome.
**The primers for specific protein CDS cloning are named by direction and protein: F3C and R3C are for cloning the 3CL protein, Fn and Rn for the N protein, Fm and Rm for the M protein, Fs1 and Rs2 for the S protein, and Fp1 and Rp5 for RdRp.
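The protein-CDS primers in Table 2 follow a visible pattern, which is our reading of the table and not a design rule stated by the authors. Each has a 3-nt clamp (CCC or GGG) and then a restriction site (BamHI GGATCC, NotI GCGGCCGC or XhoI CTCGAG). Next comes ATG on forward primers, or TTA (the reverse complement of the TAA stop codon) on reverse primers, followed by gene-specific sequence. For example, F3C is CCC + GGATCC + ATG + the F99 sequence of Table 1a. The sketch rebuilds F3C and R3C from this pattern. The function names `forward_cds_primer` and `reverse_cds_primer` are hypothetical.

```python
# Sketch: assemble tailed primers for protein CDS cloning, following the
# pattern visible in Table 2: clamp + restriction site + ATG/TTA + gene-specific.
SITES = {"BamHI": "GGATCC", "NotI": "GCGGCCGC", "XhoI": "CTCGAG"}

def forward_cds_primer(gene_specific: str, enzyme: str = "BamHI", clamp: str = "CCC") -> str:
    """Clamp + site + ATG start codon + gene-specific 5'->3' sequence."""
    return clamp + SITES[enzyme] + "ATG" + gene_specific.replace(" ", "")

def reverse_cds_primer(gene_specific: str, enzyme: str = "NotI", clamp: str = "GGG") -> str:
    """Clamp + site + TTA (reverse complement of the TAA stop) + gene-specific sequence."""
    return clamp + SITES[enzyme] + "TTA" + gene_specific.replace(" ", "")

F99 = "AGTGGTTTTAG GAAAATGGCA TTCCC"                  # Table 1a, 9970-9994
F3C = "CCC GGA TCC ATG AGTGGTTTTAG GAAAATGGCA TTCCC"  # Table 2
R3C = "GGG GCGG CCGC TTA TTGGAAGGTA ACACCAGAGC"       # Table 2

if __name__ == "__main__":
    print(forward_cds_primer(F99) == F3C.replace(" ", ""))
    print(reverse_cds_primer("TTGGAAGGTA ACACCAGAGC") == R3C.replace(" ", ""))
```

The same functions reproduce Rp5 by passing `enzyme="XhoI"`, which matches the XhoI site listed for that primer.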
3. cDNA synthesis by polymerase chain reaction:
To clone the cDNA of SARS-CoV, we amplified cDNA fragments of the virus from the first-strand cDNA mixture by PCR, using the specific primers in Tables 1 and 2. The reaction was set up as described in Table 3. The PCR products were purified by agarose gel electrophoresis and then cloned directly into the pGEM-T Easy vector (Promega, Madison, USA). Subsequent sequencing confirmed that the amplified fragments were indeed cDNA fragments of SARS-CoV.
Table 3. PCR reaction for SARS-CoV cDNA synthesis
Content | Volume (µL) | Final concentration
Template: 1st-strand cDNA solution** | 0.1 | -
Forward primer | 0.5 | 0.1 µmol/L
Reverse primer | 0.5 | 0.1 µmol/L
dNTP mixture (dATP, dCTP, dGTP, dTTP) | 5 | 0.25 mmol/L each
10× ExTaq buffer | 5 | 10 mmol/L Tris-HCl (pH 8.3), 50 mmol/L KCl, 1.5 mmol/L MgCl2
ExTaq polymerase* | 0.5 | -
dH2O | 38.4 | -
Total | 50 | -
*ExTaq polymerase and its buffer were purchased from Takara Ltd.
**A negative control was set up without template.
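The final concentrations in Table 3 follow from the dilution relation C_final = C_stock × V_added / V_total. With the 10 µmol/L primer stocks from Section 2, 0.5 µL in a 50 µL reaction gives 0.1 µmol/L. The paper does not state the dNTP stock concentration. The 2.5 mmol/L-each stock in the sketch is back-calculated from the stated 0.25 mmol/L final value and is an assumption. The sketch checks these numbers and confirms that the volumes sum to 50 µL.

```python
# Sketch: check Table 3 with C_final = C_stock * V_added / V_total.
# Volumes and the 10 µmol/L primer stock are from the paper; the dNTP stock
# (2.5 mmol/L each) is an assumption back-calculated from 0.25 mmol/L final.
from math import isclose

VOLUMES_UL = {
    "template": 0.1, "forward primer": 0.5, "reverse primer": 0.5,
    "dNTP mixture": 5.0, "10x ExTaq buffer": 5.0, "ExTaq polymerase": 0.5,
    "dH2O": 38.4,
}

def final_conc(stock: float, v_added_ul: float, v_total_ul: float) -> float:
    """Final concentration (same unit as stock) after dilution into v_total_ul."""
    return stock * v_added_ul / v_total_ul

total_ul = sum(VOLUMES_UL.values())
assert isclose(total_ul, 50.0), "component volumes should sum to 50 µL"

primer_um = final_conc(10.0, VOLUMES_UL["forward primer"], total_ul)   # µmol/L
dntp_mm = final_conc(2.5, VOLUMES_UL["dNTP mixture"], total_ul)        # mmol/L each
buffer_x = final_conc(10.0, VOLUMES_UL["10x ExTaq buffer"], total_ul)  # fold (1x expected)

if __name__ == "__main__":
    print(f"total = {total_ul:.1f} µL, primer = {primer_um:.2f} µM, "
          f"dNTP = {dntp_mm:.3f} mM each, buffer = {buffer_x:.1f}x")
```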
4. Prediction of viral protein characteristics by TMHMM and DNAMAN:
Using the complete-length cDNA sequence of SARS-CoV and the putative CDSs (coding sequences), we collected all SARS-CoV protein sequences from GenBank. A few protein sequences were not available there and were translated from the DNA sequences with the DNAMAN software.
The protein sequences were then submitted to the TMHMM server, which predicts the transmembrane helices and hydrophobic regions of the SARS proteins.
5. Evolutionary relationship analysis by Clustal and MEGA:
We collected the cDNA and protein sequences of several structural and nonstructural proteins from different coronaviruses. We then performed multiple alignments of these sequences with ClustalX and constructed evolutionary trees from the alignments with MEGA.
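MEGA derives pairwise distances from the aligned sequences, in substitutions per amino-acid site (the unit of Table 7). The paper does not state which distance model was used. The sketch therefore shows two common choices as an assumption: the uncorrected p-distance and the Poisson correction d = -ln(1 - p), with gap-containing columns skipped pairwise. The toy sequences are hypothetical and are not NSP13 data.

```python
# Sketch: pairwise protein distances from an alignment ('-' = gap, skipped pairwise).
# Which model MEGA 2.1 was run with is not stated in the paper; both are shown.
import math

def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues among columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), in substitutions per site."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf
    return -math.log(1.0 - p)

def distance_matrix(seqs: dict[str, str], dist=p_distance) -> dict[tuple[str, str], float]:
    """Distances for every unordered pair of named, aligned sequences."""
    names = list(seqs)
    return {(m, n): dist(seqs[m], seqs[n]) for i, m in enumerate(names) for n in names[i + 1:]}

if __name__ == "__main__":
    toy = {"s1": "MKV-LSTQ", "s2": "MKVALSAQ", "s3": "MRVALNAE"}  # hypothetical alignment
    for pair, d in distance_matrix(toy, poisson_distance).items():
        print(pair, round(d, 3))
```

The Poisson correction always gives a value at least as large as the p-distance. It accounts for multiple substitutions at the same site, which matters more for distant pairs such as SARS-CoV and IBV.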
Results and discussion:
cDNA synthesis and cloning
Twenty-two cDNA fragments of SARS-CoV were synthesized by PCR; some of the product bands are shown in Fig. 2. By cloning these fragments into the pGEM-T Easy vector, we constructed 21 plasmids, each containing one cDNA fragment. The 21 cloned fragments cover the complete SARS-CoV genome except for one short piece of about 2 kb. This piece, still being cloned, lies at positions 26383 to 28320 in the genome of SARS-CoV CUHK-W1 (AY278554)⁴. Some of the clones, such as 3C-TE, RdRp-TE, S-TE, M-TE and N-TE, each contain a cDNA fragment encoding an independent viral protein (see Fig. 3 and Table 4). Some of the fragments (black arrows in Fig. 3) were sequenced to confirm their correctness and to determine their cloning orientation in the pGEM-T Easy vector (T7→SP6 or SP6→T7, as shown in Table 4). The clones encoding viral proteins ("protein clones" for short; red arrows in Fig. 3) were verified by restriction mapping (data not shown). Expression plasmids for the encoded proteins were designed based on these protein clones. Cloning of the remaining fragment (blue bar in Fig. 3) is in progress.
[Fig. 2 gel panels (a), (b) and (c), lanes a1-a2, b1-b2 and c1-c2; marked product sizes 1.4 kb, 0.9 kb and 0.6 kb]
Fig. 2. Electrophoresis of PCR products of some cDNA fragments of the SARS-CoV genome.
(a) PCR products of the cDNA encoding the nucleocapsid protein of SARS-CoV; (b) PCR products of the cDNA encoding the 3CL protein of SARS-CoV; (c) PCR products of the cDNA encoding the membrane protein of SARS-CoV. Arrows indicate the lengths of the cDNAs (in kb). Lanes a1, b1 and c2 contain the DNA molecular weight marker (λ DNA digested with EcoRI and HindIII). PCR products of the other fragments were also separated and purified by agarose gel electrophoresis (pictures not shown).
⁴ The sequences of different SARS-CoV isolates (NC_004718 and AY278554) differ only slightly. In our sequence analysis we therefore used the sequence of SARS-CoV CUHK-W1 (AY278554) to determine fragment positions in the genome for restriction mapping. This choice does not affect any of the experimental or theoretical results.
[Fig. 3 map "SARS FRAGMENT CLONES": genome positions 1 to 29776, scale bar 2000 bp. Shows fragment clones 000-010-TE through 279-297-TE, protein clones (3CL protein, RdRp Nsp9, Spike glycoprotein, M protein, Nucleocapsid) and the fragment under cloning (263-283-TE). Mapped by Chen SD, PKU]
Fig. 3. cDNA fragment clones covering the SARS-CoV genome
All these fragments have been cloned into the pGEM-T Easy vector, and some were subsequently subcloned into expression vectors. The black arrows show the fragment clones used to construct the complete genome-length cDNA. Detailed information on all clones is provided in Table 4.
Table 4. cDNA fragment clones covering the SARS-CoV genome
No. | Plasmid name | Inserted cDNA fragment (position in SARS-CoV genome) | Fragment size (kb) | Cloning direction in pGEM-T Easy vector (5′→3′)
1 | 000-010-TE | 1-1025 | 1.0 | ND*
2 | 010-027-TE | 1002-2712 | 1.7 | SP6→T7**
3 | 025-043-TE | 2533-4380 | 1.8 | T7→SP6
4 | 043-053-TE | 4356-5404 | 1.0 | T7→SP6
5 | 053-065-TE | 5380-6542 | 1.1 | T7→SP6
6 | 065-079-TE | 6518-7980 | 1.5 | SP6→T7
7 | 079-093-TE | 7957-9330 | 1.4 | ND
8 | 091-102-TE | 9145-10218 | 1.1 | SP6→T7
9 | 3C-TE | 9970-10887 | 0.9 | ND
10 | 099-120-TE | 9970-12076 | 2.0 | SP6→T7
11 | 120-138-TE | 12052-13883 | 1.8 | T7→SP6
12 | RdRp-TE | 13357-16151 | 2.8 | ND
13 | 150-192-TE | 15004-19236 | 4.2 | SP6→T7
14 | 150-179-TE | 15004-17955 | 2.9 | T7→SP6
15 | 192-202-TE | 19212-20242 | 1.0 | SP6→T7
16 | 202-222-TE | 20221-22223 | 2.0 | ND
17 | S-TE | 21477-25244 | 3.8 | ND
18 | 251-265-TE | 25129-26584 | 1.4 | ND
19 | 263-283-TE | - | - | Under cloning***
20 | M-TE | 26383-27048 | 0.7 | ND
21 | N-TE | 28105-29373 | 1.4 | ND
22 | 279-297-TE | 27980-29713 | 1.8 | T7→SP6
*ND = not determined; such clones are still being sequenced or have already been verified by restriction mapping.
**SP6→T7 means that in this clone the inserted fragment runs 5′→3′ from the SP6 promoter toward the T7 promoter of the pGEM-T Easy vector. For details of this vector, consult Promega (http://www.promega.com).
***This fragment is still being cloned. This 2 kb fragment contains the final missing ~1 kb of cDNA needed to cover the complete SARS-CoV genome.
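One way to check whether the cloned fragments tile the genome is to merge the intervals of Table 4 and list whatever remains uncovered. In this sketch the 279-297-TE interval is taken as 27980-29713, from the positions of primers F279 and R297 in Table 1. The fragment still under cloning (263-283-TE) is left out. The helper names `merge` and `gaps` are hypothetical. With these assumptions the only gap is a stretch of roughly 0.9 kb inside the 26383-28320 region quoted in the text.

```python
# Sketch: merge the fragment intervals of Table 4 and report uncovered gaps.
# 263-283-TE (still under cloning) is omitted; 279-297-TE uses primer positions F279/R297.
CLONES = {
    "000-010-TE": (1, 1025), "010-027-TE": (1002, 2712), "025-043-TE": (2533, 4380),
    "043-053-TE": (4356, 5404), "053-065-TE": (5380, 6542), "065-079-TE": (6518, 7980),
    "079-093-TE": (7957, 9330), "091-102-TE": (9145, 10218), "3C-TE": (9970, 10887),
    "099-120-TE": (9970, 12076), "120-138-TE": (12052, 13883), "RdRp-TE": (13357, 16151),
    "150-192-TE": (15004, 19236), "150-179-TE": (15004, 17955), "192-202-TE": (19212, 20242),
    "202-222-TE": (20221, 22223), "S-TE": (21477, 25244), "251-265-TE": (25129, 26584),
    "M-TE": (26383, 27048), "N-TE": (28105, 29373), "279-297-TE": (27980, 29713),
}

def merge(intervals):
    """Merge overlapping or directly adjacent closed intervals [start, end]."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def gaps(merged):
    """Uncovered stretches between consecutive merged intervals."""
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(merged, merged[1:])]

if __name__ == "__main__":
    covered = merge(CLONES.values())
    print("covered:", covered)
    print("gaps:", gaps(covered))
```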
Bioinformatics analysis
We predicted the hydrophobicity/hydrophilicity and transmembrane regions of all major SARS-CoV viral proteins; the results are shown in Table 5 and Fig. 4. Most of the nonstructural proteins are predicted to be hydrophilic. The exceptions are NSP1, NSP3, X1 and X4, which are predicted to have some transmembrane domains. Most of the structural proteins are predicted to be hydrophobic, except the N protein, which is predicted to be highly hydrophilic. This pattern may reflect the functions and locations of the viral proteins. All nonstructural proteins are synthesized in the cytosol of host cells and then act in virus replication. Their putative roles in SARS-CoV include viral RNA replication, viral mRNA transcription and editing, virion assembly and inhibition of host-cell protein synthesis. Such functional requirements may favour a soluble, i.e. hydrophilic, form. The hydrophobicity of the exceptions (NSP1, NSP3, X1 and X4) may likewise be explained by their specific functions. In contrast, structural proteins such as E (envelope) and M (membrane) lie on the surface of the virion. They must therefore be hydrophobic to form a nonpolar barrier between the nucleocapsid inside the virion and the outer environment. The S (spike) glycoprotein is predicted to have a short transmembrane region near its C-terminus. This region may separate the inner part from the outer part, which is postulated to interact with hAPN (CD13) [6]. The N (nucleocapsid) protein is the exception among the structural proteins. It is postulated to bind the viral genomic RNA, and a hydrophilic form might facilitate this interaction, as indicated in other coronaviruses [7-9].
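TMHMM is a hidden Markov model and is not reproduced here. A simpler illustration of how a hydropathy profile (like the hydrophobicity index of Fig. 5) is built is a Kyte-Doolittle sliding-window average, a standard technique substituted here. The window of 19 residues and the 1.6 threshold for candidate transmembrane segments are the usual Kyte-Doolittle conventions. They are assumptions for illustration, not parameters from this paper, TMHMM or DNAMAN. The test peptide is hypothetical.

```python
# Sketch: Kyte-Doolittle hydropathy profile (sliding-window mean; positive = hydrophobic).
# Window 19 and threshold 1.6 are the usual Kyte-Doolittle choices for
# transmembrane candidates, not values taken from the paper.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean KD value of each full window; element i covers residues i..i+window-1 (0-based)."""
    values = [KD[aa] for aa in seq.upper()]
    if window > len(values):
        return []
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def tm_candidates(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based window start positions whose mean hydropathy exceeds the threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]

if __name__ == "__main__":
    peptide = "D" * 20 + "L" * 20 + "D" * 20  # hypothetical: hydrophobic core, charged flanks
    print(tm_candidates(peptide))
```

Only windows that are mostly inside the leucine core (window starts 15 to 26) exceed the threshold. A real hydrophobic helix shows up in the same way as a run of consecutive high windows.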
Table 5. Characteristics of SARS-CoV proteins predicted by TMHMM
Protein name | Length (AAs) | No. of predicted transmembrane or hydrophobic helices | Exp. no. of AAs in transmembrane or hydrophobic helices | Putative signal peptide
Leader | 179 | 0 | 0* | N/A**
NSP1 | 2422 | 7 | 209 | N/A
NSP2 (3C-like) | 306 | 0 | 0 | N/A
NSP3 (HD2) | 290 | 7 | 145 | N/A
NSP4 | 83 | 0 | 0 | N/A
NSP5 | 198 | 0 | 0 | N/A
NSP6 | 113 | 0 | 0 | N/A
NSP7 (growth-factor-like) | 139 | 0 | 0 | N/A
NSP9 (RdRp) | 932 | 0# | 0 | N/A
NSP10 (MB_NTPase_HEL) | 601 | 0 | 0 | N/A
NSP11 | 527 | 0 | 0 | N/A
NSP12 (?) | 346 | 0 | 0 | N/A
NSP13 (ribose-2-O-methyltransferase) | 298 | 0# | 0 | N/A
E | 76 | 1# | 24 | POSSIBLE N-term signal sequence
M | 221 | 3# | 67 | POSSIBLE N-term signal sequence
N | 422 | 0# | 0 | N/A
S | 1255 | 1# | 25 | N/A
X1 | 274 | 3 | 65 | POSSIBLE N-term signal sequence
X2 | 154 | 0 | 6 | N/A
X3 | 63 | 0 | 13 | POSSIBLE N-term signal sequence
X4 | 122 | 1 | 22 | N/A
X5 | 84 | 0 | 0 | N/A
Ref: SARS coronavirus Urbani, complete genome, ACCESSION AY278741
*0 here means that the expected number of AAs in transmembrane or hydrophobic helices predicted by TMHMM is less than 1 for the protein sequence.
**N/A: no signal sequence predicted by TMHMM.
#The hydrophilicity probability curves of these proteins are shown in Fig. 4.
Fig. 4. Hydrophobic and hydrophilic regions of SARS-CoV proteins predicted by TMHMM
Abscissa: amino acids (N→C); ordinate: probability of hydrophilicity. Nonstructural proteins: a) NSP13 (ribose-2-O-methyltransferase); b) NSP9 (RdRp). Structural proteins: c) S (spike glycoprotein); d) M (membrane protein); e) E (envelope protein); f) N (nucleocapsid).
Our recent work has concentrated on NSP13, a putative nonstructural protein first predicted by Marra et al. after they obtained the genome sequence of SARS-CoV (accession NC_004718). It is located in the C-terminal part of the pp1ab viral polyprotein and has recently been predicted to function as an AdoMet-dependent ribose-2-O-methyltransferase. We performed bioinformatic analysis to characterize this protein. TMHMM predicts SARS NSP13 to be highly hydrophilic (Fig. 4a). Hydrophilicity and hydrophobicity indices were calculated with DNAMAN (4.0) and are shown in Fig. 5a and 5b, respectively. The protein is rich in leucine, asparagine and serine and is predicted to have a pI of 7.81, which is consistent with its putative soluble form.
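The pI of 7.81 was computed by DNAMAN, whose pKa table is not given in the paper. The sketch illustrates the usual approach. It sums the Henderson-Hasselbalch charges of the termini and ionizable side chains, then bisects for the pH at which the net charge is zero. The pKa values below are one commonly used set and are an assumption here. Other programs, including DNAMAN, can therefore give values that differ by a few tenths of a pH unit. The test peptides are hypothetical.

```python
# Sketch: isoelectric point by bisection on the Henderson-Hasselbalch net charge.
# ASSUMED pKa set (one common choice); DNAMAN's own values are not given in the paper.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg

def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection works because net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid   # still positive: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for pep in ("DDDDEE", "KKKRRR", "GGGG"):  # hypothetical peptides
        print(pep, round(isoelectric_point(pep), 2))
```

For a peptide with no ionizable side chains, the pI is simply the mean of the two terminal pKa values (6.1 with this set). This is a quick way to check the implementation.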
Fig. 5. Hydrophilicity and hydrophobicity indices of SARS-CoV NSP13 calculated by DNAMAN (4.0). Abscissa: amino acid number (N→C, 1-298); ordinate: (a) relative index of hydrophilicity, (b) relative index of hydrophobicity.
A BLAST search shows that every sequenced coronavirus has a putative NSP13 CDS in its genome. The NSP13 proteins of different coronavirus species may therefore have important evolutionary implications for the newly emerged SARS-CoV and for the Coronaviridae (coronavirus family). Putative coronavirus protein sequences are still scarce. We retrieved all 7 available sequences via SRS (Table 6) for functional and evolutionary analysis.
Table 6. Sequence information from GenBank
SeqVersion | SeqLength (aa) | Description
NP_828873.2 | 298 | putative coronavirus nsp13 [SARS coronavirus]
NP_840013.1 | 300 | putative coronavirus nsp13 [Transmissible gastroenteritis virus]
NP_839969.1 | 301 | putative coronavirus nsp13 [Porcine epidemic diarrhea virus]
NP_742142.1 | 299 | coronavirus nsp13 [Bovine coronavirus]
NP_835356.1 | 300 | putative coronavirus nsp13 [Human coronavirus 229E]
NP_740620.1 | 299 | coronavirus nsp13 [Murine hepatitis virus]
NP_740633.1 | 302 | coronavirus nsp13 [Avian infectious bronchitis virus]
We aligned all 7 sequences with ClustalX (1.8) and found six motifs conserved in all of them (Fig. 6). Four of these motifs, KYTQLCQY, DLXXSD, AXKXTE and SSSE, contain the key residues K46, D130, K170 and E203, respectively. These four residues form the conserved K-D-K-E tetrad, which is essential for the formation of the mRNA cap-1 structure (m7GpppNm) [10].
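In the motifs above, X stands for any residue. A minimal way to locate such motifs in an unaligned protein sequence is to translate X into the regular-expression wildcard and scan every start position. The sketch does this for the four motifs named in the text. The example sequence is a hypothetical string built to contain the motifs, not the real NSP13 sequence, so the positions it returns have no biological meaning. `find_motifs` is a hypothetical helper name.

```python
# Sketch: locate conserved motifs written with X = any residue.
import re

MOTIFS = ["KYTQLCQY", "DLXXSD", "AXKXTE", "SSSE"]  # motifs named in the text

def motif_regex(motif: str) -> re.Pattern:
    """Turn a motif like 'DLXXSD' into a compiled regex (X -> any residue)."""
    return re.compile("".join("." if c == "X" else re.escape(c) for c in motif))

def find_motifs(seq: str, motifs=MOTIFS) -> dict[str, list[int]]:
    """1-based start positions of each motif, including overlapping matches."""
    hits = {}
    for m in motifs:
        pattern = motif_regex(m)
        hits[m] = [i + 1 for i in range(len(seq)) if pattern.match(seq, i)]
    return hits

if __name__ == "__main__":
    toy = "MKYTQLCQYGGDLAASDGGAQKTTEGGSSSE"  # hypothetical sequence
    print(find_motifs(toy))
```

Calling `pattern.match(seq, i)` at every position, instead of `re.finditer`, keeps overlapping hits. This matters for short degenerate motifs such as AXKXTE.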
Fig. 6. Multiple alignment of 7 amino acid sequences of the coronavirus nonstructural protein NSP13 by ClustalX (1.8). The species are identified in Table 6. (*) marks residues conserved across all compared sequences. Some of these identical residues form motifs:
- KYTQLCQY in the first 60 AAs (lines 7-13 of the figure);
- GVAPG in the second 60 AAs (lines 16-22);
- DLXXSD and AXKXTE in the third 60 AAs (lines 25-31);
- the triple-serine motif SSSE and an HANYXFWRN motif in the fourth 60 AAs (lines 34-40).
These motifs contain the four residues K46, D130, K170 and E203, which form a spatial tetrad conserved for mRNA cap-1 (m7GpppNm) formation.
A genetic distance matrix was constructed from the multiple alignment of these 7 AA sequences (Table 7). Among the sequences analyzed, SARS-CoV is most distant from IBV (0.634 substitutions per AA) and closest to BCoV (0.433 substitutions per AA). An evolutionary tree was constructed to examine the phylogenetic relationships among the coronavirus species (Fig. 7). Based on the predicted degree of relatedness, the 7 isolates fall into 4 clusters, designated A, B, C and D. SARS-CoV forms its own cluster, A, apart from the other groups, and is most closely related to cluster B (BCoV and MHV). This analysis further supports the view that SARS-CoV has novel genetic characteristics and does not belong to any known group of coronaviruses. It also suggests that SARS-CoV is unlikely to have arisen by homologous recombination. This agrees with previous evolutionary analyses of the structural proteins and RdRp by other researchers. Furthermore, SARS-CoV is more closely related to BCoV and MHV than to HCoV229E (Table 7 and Fig. 7). Its genetic distances to BCoV and MHV are 0.433 and 0.465 substitutions per AA residue, respectively, whereas its distance to HCoV229E is 0.590. This suggests that SARS-CoV may have originated from animals rather than humans, possibly evolving from a mammalian coronavirus into a novel, independent branch of the Coronaviridae. A similar bioinformatic study of another viral protein, NSP10 (putative NTP-helicase), gave results consistent with those for NSP13, further strengthening this conclusion. Taken together, the genomic differences between SARS-CoV and other coronaviruses strongly indicate that SARS-CoV is a novel branch of the Coronaviridae.
Table 7. Distance matrix for NSP13 amino acid sequences of 7 coronaviruses, computed by MEGA (2.1)
Unit: substitutions per AA residue.
[Fig. 7 tree: cluster A (SARS), cluster B (MHV, BCoV), cluster C (TGEV, PEDV, HCoV229E), cluster D (IBV); distance scale 0.00-0.30]
Fig. 7. Neighbor-joining tree based on the multiple alignment and genetic distance calculation of 7 AA sequences from different coronaviruses (by MEGA 2.1). Based on the predicted degree of relatedness, the 7 isolates fall into 4 clusters, designated A, B, C and D. The scale bar below the tree represents relative genetic distance.
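Fig. 7 was produced with MEGA's neighbor-joining (NJ) method. The sketch below implements the standard Saitou-Nei algorithm to show what NJ does with a distance matrix like Table 7. At each step it joins the pair minimizing Q(i, j) = (n - 2)·d(i, j) - Σk d(i, k) - Σk d(j, k), assigns the two branch lengths, and recomputes distances to the new internal node. The five-taxon matrix is a small textbook-style example, not the coronavirus data, because the full Table 7 matrix is not reproduced in the text. `neighbor_joining` is a hypothetical helper name.

```python
# Sketch: Saitou-Nei neighbor joining on a symmetric distance matrix.
from itertools import combinations

def neighbor_joining(names, d):
    """names: taxon labels; d: {(a, b): distance} for every unordered pair.
    Returns (newick, joins), where joins lists (a, b, len_a, len_b) in join order."""
    dist = {}
    for (a, b), v in d.items():
        dist[(a, b)] = dist[(b, a)] = float(v)
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(dist[(i, k)] for k in nodes if k != i) for i in nodes}
        # Join the pair with the smallest Q value (first one wins on ties).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        la = 0.5 * dist[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[(a, b)] - la
        new = f"({a}:{la:g},{b}:{lb:g})"
        for k in nodes:
            if k not in (a, b):
                dk = 0.5 * (dist[(a, k)] + dist[(b, k)] - dist[(a, b)])
                dist[(new, k)] = dist[(k, new)] = dk
        nodes = [k for k in nodes if k not in (a, b)] + [new]
        joins.append((a, b, la, lb))
    x, y = nodes
    half = dist[(x, y)] / 2  # split the last edge only to write a Newick string
    return f"({x}:{half:g},{y}:{half:g});", joins

if __name__ == "__main__":
    toy = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
           ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
           ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    tree, joins = neighbor_joining(["a", "b", "c", "d", "e"], toy)
    print(tree)
```

NJ trees are unrooted. Splitting the final edge in half is only a convenience for writing the result as Newick and is not a rooting claim. A real analysis would feed in the NSP13 distance matrix in the same pairwise-dictionary form.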
Concluding remarks:
We have cloned 21 cDNA fragments covering nearly the complete genome of the SARS coronavirus. They include cDNAs encoding essential viral proteins such as 3CL, RdRp, spike, membrane and nucleocapsid. These clones were verified by DNA sequencing or restriction mapping. Based on the cDNA sequences, we performed several bioinformatic analyses:
- prediction of protein hydrophilicity/hydrophobicity and transmembrane regions, which reveals some characteristics of the viral proteins;
- a motif search in the important viral protein NSP13, which strengthens the model in which a residue tetrad contributes to mRNA capping;
- evolutionary analysis based on the NSP13 AA sequence, which is consistent with that of NSP10 and with previous studies of other viral proteins. It shows that SARS-CoV is not closely related to any group of coronaviruses, though it is relatively closer to BCoV and MHV.
Acknowledgements:
We especially thank:
- Prof. Chen Jianguo, who guided our research both theoretically and technically and gave us additional help in everyday work;
- the Jun-Zheng Foundation, which funded part of this project;
- Dr. Y Lu of the Zhejiang Provincial Center for Disease Prevention and Control, who performed virus isolation, viral RNA extraction and reverse transcription of the viral RNA, and provided the first-strand cDNA of SARS-CoV free of charge;
- Qian Feng and Zhiyin Wu, who helped operate some of the bioinformatic software;
- all our co-workers in the Lab of Molecular and Cellular Biology, Life Science Center, Peking University.
References:
[1] Marco A. Marra et al., The Genome Sequence of the SARS-Associated Coronavirus, Science Express, published online 1 May 2003 / Page 3 / 10.1126/
[2] B. N. Fields, D. M. Knipe, P. M. Howley, D. E. Griffin, Fields Virology (Lippincott Williams & Wilkins, Philadelphia, ed. 4, 2001).
[3] M. M. C. Lai, D. Cavanagh, Adv. Virus Res. 48, 1 (1997).
[4] P. A. Rota et al., Characterization of a Novel Coronavirus Associated with Severe Acute Respiratory Syndrome, Science Express (www.scienceexpress.org), published online 1 May 2003; Science, 1 May 2003 / Page 1 / 10.1126/
[5] YiJun Ruan et al., Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection, The Lancet, published online 9 May 2003, http://image.thelancet.com/extras/03art4454web.pdf
[6] Yu XJ et al., Putative hAPN receptor binding sites in SARS-CoV spike protein, Acta Pharmacol Sin. 2003 Jun;24(6):481-8.
[7] P. S. Masters, Localization of an RNA-binding domain in the nucleocapsid protein of the coronavirus mouse hepatitis virus, Arch. Virol. 125, 141-160 (1992).
[8] G. W. Nelson, S. A. Stohlman, Localization of the RNA-binding domain of mouse hepatitis virus nucleocapsid protein, J. Gen. Virol. 74, 1975-1979 (1993).
[9] Lili Kuo et al., Genetic Evidence for a Structural Interaction between the Carboxy Termini of the Membrane and Nucleocapsid Proteins of Mouse Hepatitis Virus, Journal of Virology, May 2002, p. 4987-4999.
[10] Marcin von Grotthuss et al., mRNA Cap-1 Methyltransferase in the SARS Genome, Cell 113, 701 (13 June 2003).
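About the author:
Chen Sidi, male, was born in December 1984 in Zhanjiang, Guangdong. In July 2000 he entered the College of Life Sciences of Peking University from the High School Affiliated to South China Normal University, Guangdong. At university he has been positive and motivated, studied diligently and achieved excellent results, ranking in the top ten of his year. He has actively taken part in sports and athletic meets. His supervisor, class adviser, teachers and the university have recognized his work, and he has won a first-class scholarship for two consecutive years.
Reflections and hopes:
Through more than a year of research, I have begun to experience the hardships of scientific work. The most common problems in experiments are failure and the constant groping and repetition that follow. I have moved forward step by step along this winding road. Although I have not yet obtained fully satisfactory results, I have learned much new knowledge and many new techniques at the frontier of the life sciences. Under the careful guidance and care of Prof. Chen Jianguo, I have gradually gained both an intuitive and a rational understanding of scientific research and taken my first steps on the path of science. I am very grateful to Prof. Chen and to the senior students in the laboratory for their help and guidance. I hope to make greater progress in my future studies and research.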
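About the supervisor:
Chen Jianguo, male, professor, was born in Zhejiang in 1960 and works on the molecular biology of nerve cells. He has obtained meaningful results on the molecular mechanisms by which the neurofilament (NF) and microtubule systems are assembled in nerve cells. Related papers have been published in journals such as Nature, EMBO J, Mol. Biol. Cell, J. Cell Biology and J. Cell Science, and have been widely cited in international journals and cell biology textbooks. He has published more than 40 research papers, which have been cited more than 500 times in SCI-indexed journals.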