Journal of General Virology (1995), 76, 593-602. Printed in Great Britain

Sequencing and analysis of the nucleocapsid (N) and polymerase (L) genes and the terminal extragenic domains of the vaccine strain of rinderpest virus

Michael D. Baron* and Tom Barrett

AFRC Institute for Animal Health, Pirbright Laboratory, Ash Road, Woking, Surrey GU24 0NF, UK

The nucleocapsid (N) and polymerase (L) genes of the vaccine strain of rinderpest, and the 5' and 3' terminal domains of the genome, have been sequenced. Together with previously published data, this completes the sequence of the entire genome of rinderpest virus. The viral genome is 15881 bases in length, similar to that of measles virus and slightly longer than that of canine distemper virus. The L gene is identical in length to that of measles virus, encoding a 2183 amino acid protein with a calculated Mr of 248100. The L protein sequence of morbilliviruses is highly conserved, more than 75% of residues being identical or conserved in all three sequences currently available. The N protein was, as for the other sequenced genes where comparison is possible, essentially identical to that of the virulent parent. In addition, we have determined the terminal sequences of two virulent strains of rinderpest and compared the sequences of virulent and non-virulent strains.

Introduction

Rinderpest virus (RPV) belongs to the morbilliviruses, a genus of the family Paramyxoviridae, and is thus related to measles virus (MV), canine and phocid distemper viruses (CDV and PDV) and peste-des-petits ruminants virus (PPRV). Rinderpest, the disease, is economically highly important, affecting domestic cattle and wild bovids. Widespread in sub-Saharan Africa in the mid-1980s, it has now been restricted to parts of East Africa by the efforts of the Pan African Rinderpest Campaign (PARC). The disease remains enzootic in most of the Indian subcontinent, as well as in several countries in the Near and Middle East, with recent outbreaks in Oman, the United Arab Emirates, Saudi Arabia and Turkey. All the morbilliviruses are related serologically, and available sequence data show a high degree of homology at the sequence level (for review see Barrett et al., 1991).

The Paramyxoviridae possess an ssRNA genome of negative polarity; in the morbilliviruses this genome contains six genes, encoding the surface glycoproteins H and F, responsible for viral attachment to and fusion with the host cell, the nucleocapsid (N) protein, the envelope matrix (M) protein, the polymerase or large (L) protein and the polymerase-associated (P) protein. The gene order (3' to 5' on the genome) is N-P-M-F-H-L, as determined by transcriptional mapping for MV, CDV and RPV (Barrett et al., 1991; Dowling et al., 1986; Rima et al., 1986). The P gene also encodes the nonstructural proteins C and V (Baron et al., 1993). Short extragenic sequences are found at the 5' and 3' ends of the genome.

As part of our investigations into virulence factors in RPV, we have cloned the entire genome of the vaccine strain (RPV-R), sometimes known as the Plowright vaccine. This strain was derived by repeated passage in primary bovine kidney cells (Plowright & Ferris, 1962) from the virulent Kabete 'O' strain (RPV-K), first isolated in 1911. In previous work we determined the sequences of the F (Evans et al., 1994) and H (Chamberlain, 1992) genes, the P gene (Baron et al., 1993) and the M gene (Baron et al., 1994). We report here the sequences of the N and L genes of RPV-R, together with the 5' and 3' terminal sequences. These data complete the first full sequence of a strain of rinderpest virus.

* Author for correspondence. Fax +44 1483 232448. e-mail BARON@BBSRC.AC.UK

All the sequences presented in this paper have been submitted to the EMBL database.
Accession numbers are: RPV-R 3' extragenic region, Z30701; RPV-R N gene, X68311; RPV-R L gene and 5' extragenic region, Z30698; RPV-K 3' extragenic region, Z33634; RPV-K 5' extragenic region, Z33635; RPV-Kw 3' extragenic region and N gene, Z34262; RPV-Kw 5' extragenic region, Z33636; entire RPV-R genome, Z30697.

0001-2775 © 1995 SGM

Methods

RNA purification and cDNA library construction. Poly(A)+ RNA was isolated from RPV-R-infected Vero cells and oligo(dT)-primed cDNA synthesized as previously described (Baron et al., 1993); EcoRI-NotI adaptors (Pharmacia) were added, and the cDNA was ligated to λgt11 arms (Boehringer Mannheim) and packaged using GigaPack (Stratagene). For the genomic library, virus was partially purified from the medium of infected Vero cells by sucrose density gradient centrifugation (Barrett et al., 1989). RNA was purified by two sequential extractions with phenol, one extraction with phenol-chloroform-isoamyl alcohol (25:24:1, by vol.) and one with chloroform-isoamyl alcohol (24:1, v/v), precipitated with ethanol and dissolved in diethyl pyrocarbonate-treated water. Random hexanucleotides (Boehringer Mannheim) were used to prime cDNA synthesis, and the cDNA was ligated into λgt11 as described above. Analysis of the fraction of insert-containing phage that were positive for RPV-R sequence suggested that 40% of the original RNA preparation was viral RNA. Viral RNA from another, more recent, virulent Middle Eastern strain [Kuwait/82/1 (Taylor, 1986); RPV-Kw] that had been passaged five times in Vero cells was isolated in the same way. Total cytoplasmic RNA isolated from tissues of a cow infected with RPV-K has been described previously (Baron et al., 1994). Full-length N clones were isolated from the first library by screening with D-74, a previously identified N-gene-specific cDNA (Diallo et al., 1989).
L gene-specific clones were isolated from the genomic library by screening with the 4962 bp XmnI-XmnI fragment of the measles L gene isolated from plasmid peMV(-)2ip (the kind gift of M. Billeter and R. Cattaneo). Probes were either labelled with biotin-dUTP and positive clones detected as previously described (Baron et al., 1993), or labelled and detected using the ECL direct-labelling system (Amersham). The size of inserts in positive clones was determined by PCR (Dorfman et al., 1989), and large inserts were cut from purified λ DNA (Windle, 1988) with NotI and ligated into NotI-cut pBluescript KS(+) (N clones) or pGEM-5Zf(+) (L clones).

Sequencing. A single cDNA clone (N1) spanning almost the entire length of the N gene was identified and isolated, and restriction fragments were subcloned into M13tg130 for sequencing. The complete sequence was determined on both strands. The ends of L-positive clones were sequenced to map their positions in the gene, and a set of overlapping clones spanning almost the entire length of the gene was identified. Three clones (L17, L18 and L20) and part of a fourth (L9), with a combined length of 8 kb, were sequenced on both strands from nested deletion sets created using the Erase-a-Base system (Promega). A full-length L gene was assembled in pGEM-5Zf(+) from components of all four L clones plus the terminal sequence isolated as described below.

Determining the sequence of the 5' and 3' ends of the genome. The leader and trailer sequences were determined using a modification of the 5' RACE method (Frohman et al., 1988; Loh et al., 1989) as developed by Shuster et al. (1992). Briefly, an mRNA-sense primer corresponding to a sequence near the end of the L mRNA (RPVL1: 5' GTAGGCTGGTGAGTAATCT 3') was annealed to RNA isolated from partly purified virus (prepared as for library construction) and extended using MMLV (Moloney murine leukaemia virus) reverse transcriptase (Life Technologies).
RNA was hydrolysed by adding NaOH to 0.3 M and heating for 30 min at 50 °C. After neutralization by adding HCl to 0.3 M, the primer extension product was purified using a GlassMAX spin column (Life Technologies), eluting the ssDNA in 50 µl of water; 10 µl of this ssDNA was then tailed with poly(dA) or poly(dC) in a 20 µl reaction mixture containing 0.5× reverse transcriptase buffer, 200 µM dATP or dCTP and 10 U terminal deoxynucleotidyl transferase (Pharmacia) for 5 min at 37 °C (Shuster et al., 1992). PCR was used to amplify 5 µl of the tailed DNA in a 50 µl reaction mixture containing 100 pmol of a second L-specific primer (RPVL6: 5' GTAATCTCAAGTCTGGATACC 3') and 100 pmol of either a NotI-(dT)18 primer/adaptor (Pharmacia) or a special mixed dG-dI primer (Y-RACE primer; Life Technologies). The amplification conditions were: 94 °C, 5 min; 35 cycles of (94 °C, 45 s; 50 °C, 1 min; 72 °C, 2 min); 72 °C, 15 min. For the dA-tailed cDNA, the annealing temperature for the first two cycles was reduced to 30 °C. Since other primer extension experiments suggested that the purified virus contained significant quantities of antigenomes (positive-sense RNA), we repeated the above procedure using primers corresponding to the 3' end of the N gene (5' end of the N mRNA), amplifying DNA fragments of the expected size in each case; the N primers used were RPVN4 (5' CAAGCCATCCTTTGTCA 3') for primer extension and RPVN6 (5' TGATTCCCCGGATAGCC 3') for PCR. PCR-amplified DNA was gel-purified, cloned into pGEM-T (Promega) and sequenced on both strands. Sequences were determined from at least three independently amplified clones. The method was also successfully applied to total cytoplasmic RNA from RPV-K-infected cattle; in this case the PCR for the 3' end of the genome also produced an amplified product from the N mRNA, sequencing of which confirmed the start point of the N gene (see Results).
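All primer sequences above are written 5' to 3', and whether a primer can prime on a given RNA (genome or antigenome) depends on whether its reverse complement occurs in that template. The following minimal Python sketch illustrates that check. It assumes exact matching (real priming tolerates some mismatches), the function names are hypothetical, and the template in the example is an invented toy RNA, not RPV sequence.

```python
# Complement table covering DNA and RNA letters; U (RNA) pairs with A.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' sequence, returned 5'->3' in DNA letters."""
    return seq.translate(_COMPLEMENT)[::-1]


def priming_site(primer: str, template: str) -> int:
    """Hypothetical helper: 0-based index in `template` (written 5'->3', DNA or
    RNA letters) where `primer` would anneal antiparallel with an exact match,
    or -1 if there is no such site."""
    target = reverse_complement(primer).upper()
    return template.upper().replace("U", "T").find(target)


if __name__ == "__main__":
    rpvl1 = "GTAGGCTGGTGAGTAATCT"  # RPVL1 as given in Methods
    print(reverse_complement(rpvl1))  # AGATTACTCACCAGCCTAC
    toy_template = "GGG" + "AGAUUACUCACCAGCCUAC" + "CCC"  # invented RNA, not RPV
    print(priming_site(rpvl1, toy_template))  # 3
```

Writing every sequence 5' to 3' and converting orientation in one place keeps the negative-sense genome and positive-sense antigenome from being confused.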
To determine the exact 3' end of the L gene, we used the NotI-(dT)18 primer/adaptor to prime reverse transcription on poly(A)+ RNA from RPV-R-infected cells, and then PCR to amplify the terminal region using the primer/adaptor and RPVL1 as primers. The reaction conditions were essentially as previously described (Baron et al., 1994). The intergenic sequence between the F and H genes was determined from clones of PCR-amplified vRNA. The amplification used primers corresponding to F mRNA (F13: 5' CGGGTCTTAAACCAGACCTC 3') or complementary to H mRNA (H5: 5' GATACCTGCGATAGCTAATAGCCCG 3'). Sequence was determined from two independently amplified clones.

Computer analysis of sequence. Sequence data for the different genes were assembled using the Staden package (Staden, 1980, 1982); further analysis used programs of the University of Wisconsin (GCG) package (Genetics Computer Group, 1991). Protein alignment figures were produced with the help of the program ALSCRIPT (Barton, 1993).

Results and Discussion

Of several isolated N gene-specific clones, one (N1) was chosen for complete sequencing. The clone began at genome position 80 and extended to the poly(A) tail. The 3' end of the gene was determined from clone P14 (Baron et al., 1993), which was derived from a bicistronic N-P RNA and therefore contained the intergenic junction. [It should be noted that although this is technically the 5' end of the gene (the genome being of negative polarity), it is normal to consider the genes in positive sense.] The 5' end of the gene, and the upstream leader sequence, were determined by 5' RACE as described in Methods. The complete gene sequence, and all other gene sequences analysed in this paper, have been deposited in the EMBL database. The N gene starts at the AGGA sequence at position 56, following the intergenic CTT trinucleotide, as shown by 5' RACE (data not shown).
The single large open reading frame (ORF) in the N gene encodes a protein of 525 amino acids with a calculated Mr of 58053, slightly smaller than that determined by SDS-PAGE for the native N protein in infected cells (Diallo et al., 1987), but essentially the same as that calculated for other morbillivirus N proteins. Sequences from two other RPV N genes have been published, those of the parental RPV-K (Ismail et al., 1994) and of the lapinized RPV vaccine strain (RPV-L) (Kamata et al., 1991). In neither case was sequence data obtained for the 3' extragenic region.

Fig. 1. Comparison of the N protein sequences of four strains of RPV (RPV-R, RPV-K, RPV-Kw and RPV-L). Positions where the sequence is completely conserved are shown as white characters on a black background, and positions with similar residues as black-on-grey. The sequences of the RPV-L and RPV-K N proteins were taken from Kamata et al. (1991) and Ismail et al. (1994) respectively.
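The calculated Mr values quoted in this paper (58053 for N, 248100 for L) are obtained by summing the masses of the residues in the predicted protein. The minimal Python sketch below shows that calculation, assuming standard average residue masses; these are textbook reference values, not a table taken from the paper or from the GCG package, so the sketch need not reproduce the published figures to the last dalton. The function name is hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
# Standard reference values, assumed here; not the table used by the authors.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def calculated_mr(protein: str) -> float:
    """Approximate Mr of a polypeptide in one-letter code.

    A trailing stop symbol '*' is ignored; unknown letters raise KeyError.
    """
    seq = protein.strip().upper().rstrip("*")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    print(round(calculated_mr("GG"), 2))  # 132.12, glycylglycine
```

Applied to a full deduced ORF translation, the same sum gives a figure directly comparable with the calculated Mr values quoted above; it says nothing about the apparent size of the protein on SDS-PAGE.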
In addition, we have determined the 3' end and the N gene of another, more recent, virulent Middle Eastern strain [Kuwait/82/1 (Taylor, 1986); RPV-Kw]. The deduced amino acid sequences of all four N proteins were aligned (Fig. 1). The N proteins of RPV-R and RPV-K are 99.2% identical, in accord with comparisons of other proteins from these two strains (Baron et al., 1994); there are only three differences between the amino acid sequences of these two proteins, and two of those are conservative changes (E/D at position 190 and I/L at position 523). At position 380, RPV-R has L and RPV-K has P, a non-conservative difference. However, both the other N protein sequences have L at this position. It therefore seems unlikely that changes in the N protein are involved in the phenotypic change from the extremely virulent RPV-K to the avirulent RPV-R. The lapinized strain, although also avirulent in cattle, is quite distantly related to both strains, its N protein being only 90.7% identical to that of RPV-R. This probably reflects the fact that this strain is of geographically and temporally distant (Asian) origin and has been adapted to growth in rabbits, and subsequently in Vero cells. The sequence of the N gene of RPV-Kw is intermediate between the RPV-R and RPV-L sequences, being 93.1% identical to the former and 93.7% identical to the latter. It has previously been observed (Diallo et al., 1994; Kamata et al., 1991; Rozenblatt et al., 1985) that the N proteins of morbilliviruses are much more conserved over the first 400 amino acids. This also applies to the N proteins of the four RPV strains: RPV-R/K N is ~99% identical to the others over the first 400 amino acids, while over the remaining 125 amino acids the level of identity is reduced to 73.6% (RPV-L) or 79.2% (RPV-Kw) (Fig. 1).
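Identity figures such as those above are counts over the positions of a pairwise alignment, optionally restricted to a region (here, the first 400 residues versus the remaining 125). A minimal Python sketch follows, assuming the two sequences are already aligned to equal length and that gap columns ('-') are simply excluded. A second function counts positions that are neither identical nor "conservative" under a hypothetical residue grouping, which only approximates the kind of similarity measure quoted for the L proteins; the GCG programs the authors used may define similarity differently. All names are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical groups of residues treated as conservative substitutions of one
# another; an illustrative assumption, not the scheme used in the paper.
CONSERVATIVE_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
_GROUP = {aa: i for i, group in enumerate(CONSERVATIVE_GROUPS) for aa in group}


def _aligned_pairs(a: str, b: str, start: int, end: Optional[int]) -> List[Tuple[str, str]]:
    """Residue pairs in the 0-based window [start, end), skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a[start:end].upper(), b[start:end].upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned residues in the requested window")
    return pairs


def percent_identity(a: str, b: str, start: int = 0, end: Optional[int] = None) -> float:
    """Percent of aligned positions carrying the same residue."""
    pairs = _aligned_pairs(a, b, start, end)
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_differing(a: str, b: str, start: int = 0, end: Optional[int] = None) -> float:
    """Percent of aligned positions neither identical nor in the same group."""
    pairs = _aligned_pairs(a, b, start, end)
    conserved = sum(x == y or (x in _GROUP and _GROUP.get(x) == _GROUP.get(y))
                    for x, y in pairs)
    return 100.0 * (len(pairs) - conserved) / len(pairs)


if __name__ == "__main__":
    print(percent_identity("ACDE", "ACDF"))   # 75.0
    print(percent_differing("IDKA", "LEKW"))  # 25.0
```

For an ungapped 525-residue alignment, `percent_identity(a, b, 0, 400)` and `percent_identity(a, b, 400)` would correspond to the "first 400" and "remaining 125" comparisons described above.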
Studies on Sendai virus (SeV), a related paramyxovirus, have shown that the highly conserved region contains all the structural information necessary for self-assembly into nucleocapsids (Buchholz et al., 1993; Curran et al., 1993), whereas the carboxy-terminal 'tail' is in some way involved in replication (Curran et al., 1993), perhaps through binding to the P protein (Homann et al., 1991). Since there are only limited homologies between the NP protein of SeV and the morbillivirus N proteins (Morgan, 1991), it will be important to investigate whether a similar functional assignment can be made for the morbillivirus N proteins.

[Fig. 2. Alignment of the L protein sequences of the morbilliviruses with those of other paramyxoviruses, including SeV, SV5 and human respiratory syncytial virus; the conserved polymerase Motifs A and B are marked.]

The trinucleotide at the start and end of all genes in RPV is CTT, except at the junction of the H and L genes. Analysis of the rinderpest L gene clones showed that, at this position, the trinucleotide is CGT (MV also has CGT, whilst CDV has CTA). The 5' UTR (untranslated region) of the L gene is 22 bases long and is followed by a 2183 codon ORF encoding the L protein, a 72 base 3' UTR and a 37 base extragenic domain.
The L protein predicted by translation of the ORF has a calculated Mr of 248100, although the observed Mr of the L protein synthesized in infected cells is smaller (about 190000) (data not shown). The end of the L gene (the polyadenylation site) and the start of the extragenic region were determined by RT-PCR from poly(A)+ RNA as described in Methods. The length of each of these regions is identical to that of the corresponding region of the MV genome (Blumberg et al., 1988; Crowley et al., 1988); the CDV genome shows certain differences, notably that its L protein is 22 amino acids shorter. Overall, the sequences of the three morbillivirus L proteins are highly conserved; allowing for conservative substitutions, only 16% of residues differ between the L proteins of RPV and MV, and 24% between those of RPV and CDV or of MV and CDV. Alignment of these three L proteins with the L proteins of other paramyxoviruses, including human respiratory syncytial virus (Fig. 2), shows that the L protein of RPV contains all the major motifs identified in the polymerases of negative-stranded RNA viruses (Poch et al., 1989, 1990; Tordo et al., 1988), including the RNA-binding region (Motif A), the GDNQ-containing region proposed to be the active site of the polymerase (Motif B) and the proposed purine nucleotide-binding site (Motif C) (Poch et al., 1990). No major changes were seen in the RPV L protein in any of the highly conserved domains distributed along the polymerase; the alanine residue at alignment position 757, where all the other paramyxoviruses have threonine, was confirmed in both RPV-R and the virulent parental strain RPV-K by PCR on genomic RNA. The CDV L protein, on the other hand, differs from the other polymerases in two regions of otherwise strong conservation, at alignment positions 1144-1155 and 1320-1349 (Fig. 2). In each case a sequence closely matching the consensus can be found in the other reading frames (B. Rima, personal communication).
These differences are not commented on in the original paper (Sidhu et al., 1993b). They appear to arise from single-base deletions and insertions, and the gene sequenced is that of the attenuated Onderstepoort strain. The changes may therefore modify the activity of the polymerase in such a way as to account for the lack of virulence.

In the Paramyxoviridae, as in all the non-segmented, negative-strand RNA viruses, the 3' terminal sequence of the genome is believed to contain one or more recognition sites for the RNA polymerase. Transcription both of the antigenome and of all the mRNAs is initiated at this point (Blumberg et al., 1991; Kingsbury, 1990). Similarly, the 3' end of the antigenome (corresponding to the 5' end of the genome) is the site for initiation of transcription of the genome. In MV, the 5' end of the antigenome has been shown to contain an encapsidation signal (Castaneda & Wong, 1990), and a similar signal may be assumed to exist in the 5' end of the genome. The terminal sequences therefore play a crucial role in the replication of paramyxoviruses. We were interested to see whether there were any differences between the ends of the non-virulent (RPV-R) and virulent (RPV-K and RPV-Kw) viruses. We therefore used 5' RACE to determine the ends of the two virulent strains in addition to the vaccine strain. Fig. 3 compares the sequences from the three RPV strains with the same regions of MV [strains Edmonston (Crowley et al., 1988) and AIK-C (Mori et al., 1993)] and CDV [Onderstepoort strain (Sidhu et al., 1993a)]. The 3' extragenic region (5' leader of the antigenome) of RPV is 52 bases long. This is exactly the same length as those of MV (Crowley et al., 1988) and CDV (Sidhu et al., 1993a). The region is followed by the consensus morbillivirus intergenic trinucleotide CTT. The 5' extragenic region is 37 bases long, the same as that of MV and one base shorter than that of CDV.
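The conservation measures used in the comparisons that follow are of two kinds: the proportion of aligned positions at which every sequence carries the same base, and pairwise identity judged against the level expected by chance. Both can be computed from an alignment such as that in Fig. 3. The Python sketch below is illustrative only. The function names are hypothetical and are not the programs used for the analyses reported here. The sketch makes three assumptions. It expects equal-length aligned sequences with '-' marking gaps, and it treats any gapped column as not conserved. It also estimates chance identity as the sum, over bases, of the products of the two sequences' base frequencies.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in the aligned sequences (hypothetical convention)


def _check_same_length(seqs):
    """Raise ValueError unless all aligned sequences have the same length."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")


def fully_conserved_fraction(seqs):
    """Fraction of alignment columns in which every sequence has the same base.

    Columns that contain a gap in any sequence count as not conserved.
    """
    _check_same_length(seqs)
    columns = list(zip(*seqs))
    conserved = sum(1 for col in columns if GAP not in col and len(set(col)) == 1)
    return conserved / len(columns)


def pairwise_identity(a, b):
    """Identity of two aligned sequences over columns where neither has a gap."""
    _check_same_length([a, b])
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(1 for x, y in pairs if x == y) / len(pairs)


def chance_identity(a, b):
    """Identity expected between unrelated sequences with the base compositions
    of a and b: the sum over bases of f_a(base) * f_b(base)."""
    fa = Counter(c for c in a if c != GAP)
    fb = Counter(c for c in b if c != GAP)
    na, nb = sum(fa.values()), sum(fb.values())
    if na == 0 or nb == 0:
        raise ValueError("sequences must contain at least one base")
    return sum((fa[base] / na) * (fb[base] / nb) for base in fa)


# Toy sequences for illustration only (not data from this study).
toy = ["ACGTA", "ACGTT", "ACCTA"]
print(fully_conserved_fraction(toy))      # 3 of 5 columns conserved: 0.6
print(pairwise_identity(toy[0], toy[1]))  # 4 of 5 positions identical: 0.8
print(chance_identity("ACGT", "TGCA"))    # uniform base composition: 0.25
```

With a uniform base composition the chance expectation is 25%. Identity of about 30% is therefore only slightly above what unrelated sequences would share.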
The RPV sequences show extensive homology with each other and with the corresponding regions of MV and CDV. This is expected, given the similarities among all the morbilliviruses (Barrett et al., 1991). Both terminal regions are clearly highly conserved, although different positions appear to be variable at each end. For example, the first 16 bases of the genome are conserved (Fig. 3a), whereas the antigenome has three variable positions within its first 16 bases (Fig. 3b). In addition, the extragenic regions are much more conserved in sequence than the ordinary non-coding sequences within the genes. Of the terminal sequences, 58-59% are completely conserved between the three viruses. The rest of the non-coding sequences show only about 30% sequence identity, little more than expected by chance. Among the three RPV strains there are only four variant residues in the 5' end of the genome outside the L gene. Three of them lie in the poorly conserved region between positions 29 and 37. The only difference between RPV-R and RPV-K is found in this region. The 5' end of the antigenome shows nine variant positions among the three RPV sequences. At five of these (24, 28, 31, 36 and 49) RPV-R and RPV-K match, and at two more (12 and 40) RPV-R matches RPV-Kw. Only at positions 5 and 26 does the non-virulent strain differ from both virulent strains. Of these, position 26 may be the more significant, since here the four avirulent viruses have the same base.

[Fig. 3: alignment of the terminal sequences of RPV-Kw, RPV-K, RPV-R, MV, MV-AIK and CDV; (a) 5' end of the genome, (b) 5' end of the anti-genome; positions 1-150.]

Fig. 3. Terminal sequences of RPV. The terminal sequences of three RPV strains were determined as described in Methods. The 5' ends of the genome (a) and the anti-genome (b) were aligned with the same regions of two MV strains and one CDV strain. Residues conserved in all six sequences are boxed. For orientation, the trinucleotides at the end of the L gene (a) or the beginning of the N gene (b) are marked in bold. The end of the L coding sequences (a) and the beginning of the N coding sequences (b) are shaded.

[Fig. 4: the first 110 bases of the anti-genome and genome ends of RPV, MV and CDV, aligned against each other; positions 1-110.]

Fig. 4. Repeated sequences at the ends of morbilliviruses. The first 110 bases of the anti-genome and genome are shown for rinderpest virus (RPV), measles virus (MV) and canine distemper virus (CDV). Matching bases between the two strands are indicated (|), and insertions made to optimize the MV alignment are shown by (-). The internal repeated sequences in RPV, MV and CDV are highlighted.

It is unfortunate that the most studied strains of MV and CDV are currently vaccine strains that have been adapted to tissue culture for a long time. It will be important to determine the terminal sequences of wild-type isolates of these and other morbilliviruses.
The techniques we have used here can be used to determine these sequences from RNA isolated directly from infected tissue. This avoids the need to passage the viruses in cell culture, which may lead to significant changes in the virus as it adapts to growth in cells that are not its normal host. Such changes may be very subtle and hidden among normal inter-strain variation. In the example presented here, there are more differences between the sequences of the two virulent strains of RPV than between the vaccine and its virulent parent.

There is only limited homology between the 5' or 3' ends of different paramyxoviruses. However, it has previously been noted (Blumberg et al., 1991) that the ends of the genome and anti-genome of any individual virus are very similar, especially over the first 18 bases. This region presumably contains the promoter/landing site for the viral polymerase, at least when it is transcribing full-length anti-genomes or genomes. The mechanism of mRNA transcription in morbilliviruses is currently unclear. Free leader RNAs (i.e. transcripts from the 3' extragenic region of the genome) have been detected in VSV-, SeV- and Newcastle disease virus-infected cells (Colonno & Banerjee, 1977; Kurilla et al., 1982, 1985; Leppert et al., 1979). However, intensive efforts by a number of groups (Billeter et al., 1984; Castaneda & Wong, 1989; Crowley et al., 1988) have failed to detect such RNAs in MV-infected cells. It has therefore been argued (Blumberg et al., 1991; Castaneda & Wong, 1989) that morbilliviruses must have a second promoter region for direct initiation of transcription of the N gene. As yet there are no data to indicate the location of this promoter. B. M. Blumberg and co-workers (Blumberg et al., 1991) have called attention to short sequences about 90 bases from the 3' ends of the genome and anti-genome of MV which are very similar (14/15 identical nucleotides, Fig. 4). A similarly repeated sequence at the same position relative to the ends of the genome was identified in CDV (Sidhu et al., 1993a), and one can also be seen in RPV (Fig. 4). The repeated sequences differ between morbilliviruses. Interestingly, such strongly conserved repeats are not found in other paramyxoviruses. In addition, these partial repeats map to the region 70-100 bases from each end of the genome, which is the least conserved between viruses (Fig. 3). These sequences are unlikely to be promoters for internal transcription initiation, as there is no evidence for such transcription from the anti-genome. The presence or absence of the transcript from the extragenic sequence appears to determine whether or not viral RNAs are encapsidated (Castaneda & Wong, 1990). These repeated elements lie in the untranslated regions (UTRs) of the N and L genes, so they do not appear to be candidate encapsidation signals. An alternative explanation is that these regions are part of the polymerase landing site for replicative transcription (Blumberg et al., 1991). This is more likely, since this function is required at both ends of the genome. However, it remains to be explained why these repeats are found only in morbilliviruses, and why the sequences involved are less well conserved than those at the ends of the genome.

Work on copy-back defective interfering (DI) forms of SeV has suggested that efficient replication of the genome requires the number of nucleotides to be a multiple of six. This requirement has been attributed to each N protein monomer binding six nucleotides. Consistent with this, the recorded length of the full SeV genome (15 384 bases) is a multiple of six. So are the recorded lengths of one MV strain (Mori et al., 1993) and one human parainfluenza virus type 3 (PIV3) strain (Stokes et al., 1992).
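The rule of six amounts to a simple divisibility test. A genome fits the rule if its length leaves no remainder when divided by six, that is, if N protein monomers each binding six nucleotides can cover it exactly. The Python sketch below applies this test to the genome lengths quoted in this discussion. The helper names and dictionary labels are illustrative assumptions. The code only performs the arithmetic and says nothing about whether any particular published length is correct.

```python
HEXAMER = 6  # nucleotides bound per N protein monomer under the rule of six


def rule_of_six(length):
    """Return (complete hexamers, leftover nucleotides) for a genome length."""
    return divmod(length, HEXAMER)


def fits_rule_of_six(length):
    """True if the genome length is an exact multiple of six."""
    return length % HEXAMER == 0


# Genome lengths (bases) as quoted in the text; the labels are illustrative.
GENOME_LENGTHS = {
    "SeV": 15384,
    "MV Edmonston, published (Crowley et al., 1988)": 15892,
    "CDV (Sidhu et al., 1993a)": 15616,
    "RPV (this paper)": 15881,
    "MV database entries K01711 and S58435": 15894,
}

for name, length in GENOME_LENGTHS.items():
    hexamers, remainder = rule_of_six(length)
    verdict = "fits" if fits_rule_of_six(length) else "does not fit"
    print(f"{name}: {length} = 6 x {hexamers} + {remainder} ({verdict})")
```

A single-base insertion or deletion shifts the remainder by one. An error of one base can therefore move a genome into or out of compliance with the rule. For this reason, sequence accuracy matters when the rule is tested against full-length genomes.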
However, the published lengths of the MV Edmonston strain [15 892 (Crowley et al., 1988)], CDV [15 616 (Sidhu et al., 1993a)] and RPV (15 881, this paper) do not fit this rule. Notably, the database contains two full-length MV genome sequences, one for the Edmonston strain [Billeter et al., 1984; Cattaneo et al., 1989 (accession no. K01711)] and one for the AIK-C strain [Mori et al., 1993 (accession no. S58435)]. Both have lengths of 15 894, a multiple of six. However, the two sequences have several insertions/deletions relative to one another (as well as to other MV sequences in the database). It is not clear whether these differences reflect genuine differences between the strains or the difficulty of sequencing nearly 16 000 bases to an accuracy of plus or minus one base. More reliable information on whether the rule of six applies to all paramyxoviruses will probably come from introducing discrete changes into natural genomes or genome-like constructs.

We thank Lynnette Goatley for invaluable technical assistance, Dr P. Thomas and Wendy Blakemore for oligonucleotide synthesis, and Drs R. Cattaneo and M. Billeter for the gift of plasmid peMV(-)2ip. This work was funded by the Wellcome Trust.

References

BARON, M. D., SHAILA, M. S. & BARRETT, T. (1993). Cloning and sequence analysis of the phosphoprotein gene of rinderpest virus. Journal of General Virology 74, 299-304.
BARON, M. D., GOATLEY, L. & BARRETT, T. (1994). Cloning and sequence analysis of the matrix (M) protein gene of rinderpest virus and evidence for another bovine morbillivirus. Virology 200, 121-129.
BARRETT, T., BELSHAM, G. J., SUBBARAO, S. M. & EVANS, S. A. (1989). Immunization with a vaccinia recombinant expressing the F protein protects rabbits from challenge with a lethal dose of rinderpest virus. Virology 170, 11-18.
BARRETT, T., SUBBARAO, S. M., BELSHAM, G. J. & MAHY, B. W. (1991). The molecular biology of the morbilliviruses. In The Paramyxoviruses, pp. 83-102. Edited by D. W. Kingsbury. New York: Plenum Press.
BARTON, G. J. (1993). ALSCRIPT, a tool to format multiple sequence alignments. Protein Engineering 6, 37-40.
BILLETER, M. A., BACZKO, K., SCHMID, A. & TER MEULEN, V. (1984). Cloning of DNA corresponding to four different measles virus genomic regions. Virology 132, 147-159.
BLUMBERG, B. M., CROWLEY, J. C., SILVERMAN, J. I., MENONNA, J., COOK, S. D. & DOWLING, P. C. (1988). Measles virus L protein evidences elements of ancestral RNA polymerase. Virology 164, 487-497.
BLUMBERG, B. M., CHAN, J. & UDEM, S. (1991). Function of paramyxovirus 3' and 5' end sequences in theory and practice. In The Paramyxoviruses, pp. 235-247. Edited by D. W. Kingsbury. New York: Plenum Press.
BUCHHOLZ, C. J., SPEHNER, D., DRILLIEN, R., NEUBERT, W. J. & HOMANN, H. E. (1993). The conserved N-terminal region of Sendai virus nucleocapsid protein NP is required for nucleocapsid assembly. Journal of Virology 67, 5803-5812.
CASTANEDA, S. J. & WONG, T. C. (1989). Measles virus synthesizes both leaderless and leader-containing polyadenylated RNAs in vivo. Journal of Virology 63, 2977-2986.
CASTANEDA, S. J. & WONG, T. C. (1990). Leader sequence distinguishes between translatable and encapsidated measles virus RNAs. Journal of Virology 64, 222-230.
CATTANEO, R., SCHMID, A., SPIELHOFER, P., KAELIN, K., BACZKO, K., TER MEULEN, V., PARDOWITZ, J., FLANAGAN, S., RIMA, B. K., UDEM, S. A. & BILLETER, M. A. (1989). Mutated and hypermutated genes of persistent measles viruses which caused lethal human brain disease. Virology 173, 415-425.
CHAMBERLAIN, R. W. (1992). Studies on the surface glycoprotein genes from different strains of rinderpest virus. PhD thesis, University of Reading, Reading, UK.
COLONNO, R. J. & BANERJEE, A. K. (1977). Mapping and initiation studies on the leader RNA of vesicular stomatitis virus. Virology 77, 260-268.
CROWLEY, J. C., DOWLING, P. C., MENONNA, J., SILVERMAN, J. I., SCHUBACK, D., COOK, S. D. & BLUMBERG, B. M. (1988). Sequence variability and function of measles virus 3' and 5' ends and intercistronic regions. Virology 164, 498-506.
CURRAN, J., HOMANN, H., BUCHHOLZ, C., ROCHAT, S., NEUBERT, W. & KOLAKOFSKY, D. (1993). The hypervariable C-terminal tail of Sendai paramyxovirus nucleocapsid protein is required for template function but not for RNA encapsidation. Journal of Virology 67, 4358-4364.
DIALLO, A., BARRETT, T., LEFEVRE, P.-C. & TAYLOR, W. P. (1987). Comparison of proteins induced in cells infected with rinderpest and peste-des-petits-ruminants viruses. Journal of General Virology 68, 2033-2038.
DIALLO, A., BARRETT, T., SUBBARAO, S. M. & TAYLOR, W. P. (1989). Differentiation of rinderpest and peste-des-petits-ruminants viruses using specific cDNA clones. Journal of Virological Methods 23, 127-137.
DIALLO, A., BARRETT, T., BARBRON, M., MEYER, G. & LEFEVRE, P.-C. (1994). Cloning of the nucleocapsid protein gene of peste-des-petits-ruminants virus: relationship to other viruses. Journal of General Virology 75, 233-237.
DORFMAN, D. M., ZON, L. I. & ORKIN, S. H. (1989). Rapid amplification of λgt11 bacteriophage library inserts from plaques using the polymerase chain reaction (PCR). BioTechniques 7, 568-570.
DOWLING, P. C., BLUMBERG, B. M., MENONNA, J., ADAMUS, J. E., COOK, P., CROWLEY, J. C., KOLAKOFSKY, D. & COOK, S. D. (1986). Transcriptional map of the measles virus genome. Journal of General Virology 67, 1987-1992.
EVANS, S. A., BARON, M. D., CHAMBERLAIN, R. W., GOATLEY, L. & BARRETT, T. (1994). Nucleotide sequence comparisons of the fusion protein gene from virulent and attenuated strains of rinderpest virus. Journal of General Virology 75, 3611-3617.
FROHMAN, M. A., DUSH, M. K. & MARTIN, G. R. (1988). Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proceedings of the National Academy of Sciences, USA 85, 8998-9002.
GALINSKI, M. S., MINK, M. A. & PONS, M. W. (1988). Molecular cloning and sequence analysis of the human parainfluenza 3 virus gene encoding the L protein. Virology 165, 499-510.
GENETICS COMPUTER GROUP (1991). Program Manual for the GCG Package. Madison, Wisconsin, USA.
HOMANN, H. E., WILLENBRINK, W., BUCHHOLZ, C. J. & NEUBERT, W. J. (1991). Sendai virus protein-protein interactions studied by a protein-blotting protein-overlay technique: mapping of domains on NP protein required for binding to P protein. Journal of Virology 65, 1304-1309.
ISMAIL, T., AHMAD, S., D'SOUZA-AULT, M., BASSIRI, M., SALIKI, J., MEBUS, C. & YILMA, T. (1994). Cloning and expression of the nucleocapsid gene of virulent Kabete O strain of rinderpest virus in baculovirus: use in differential diagnosis between vaccinated and infected animals. Virology 198, 138-147.
KAMATA, H., TSUKIYAMA, K., SUGIYAMA, M., KAMATA, Y., YOSHIKAWA, Y. & YAMANOUCHI, K. (1991). Nucleotide sequence of cDNA to the rinderpest virus mRNA encoding the nucleocapsid protein. Virus Genes 5, 5-15.
KAWANO, M., OKAMOTO, K., BANDO, H., KONDO, K., TSURUDOME, M., KOMADA, H., NISHIO, M. & ITO, Y. (1991). Characterization of the human parainfluenza type 2 virus gene encoding the L protein and the intergenic sequences. Nucleic Acids Research 19, 2739-2746.
KINGSBURY, D. W. (1990). Paramyxoviridae and their replication. In Virology, pp. 945-962. Edited by B. N. Fields. New York: Raven Press.
KURILLA, M. G., PIWNICA-WORMS, H. & KEENE, J. D. (1982). Rapid and transient localization of the leader RNA of vesicular stomatitis virus in the nuclei of infected cells. Proceedings of the National Academy of Sciences, USA 79, 5240-5244.
KURILLA, M. G., STONE, H. O. & KEENE, J. D. (1985). RNA sequence and transcriptional properties of the 3' end of the Newcastle disease virus genome. Virology 145, 203-212.
LEPPERT, M., RITTENHOUSE, L., PERRAULT, J., SUMMERS, D. F. & KOLAKOFSKY, D. (1979). Plus and minus strand leader RNAs in negative strand virus-infected cells. Cell 16, 735-747.
LOH, E. Y., ELLIOTT, J. F., CWIRLA, S., LANIER, L. L. & DAVIS, M. M. (1989). Polymerase chain reaction with single-sided specificity: analysis of T cell receptor δ chain. Science 243, 217-220.
MIDDLETON, Y., TASHIRO, M., THAI, T., OH, J., SEYMOUR, J., PRITZER, E., KLENK, H.-D., ROTT, R. & SETO, J. T. (1990). Nucleotide sequence analysis of the genes encoding the HN, M, NP, P and L proteins of two host range mutants of Sendai virus. Virology 176, 656-657.
MORGAN, E. M. (1991). Evolutionary relationships of paramyxovirus nucleocapsid-associated proteins. In The Paramyxoviruses, pp. 163-180. Edited by D. W. Kingsbury. New York: Plenum Press.
MORI, T., SASAKI, K., HASHIMOTO, H. & MAKINO, S. (1993). Molecular cloning and complete nucleotide sequence of genomic RNA of the AIK-C strain of attenuated measles virus. Virus Genes 7, 67-81.
PARKS, G. D., WARD, C. D. & LAMB, R. A. (1992). Molecular cloning of the NP and L genes of simian virus 5: identification of highly conserved domains in paramyxovirus NP and L proteins. Virus Research 22, 259-279.
PLOWRIGHT, W. & FERRIS, R. D. (1962). Studies with rinderpest virus in tissue culture. The use of attenuated culture virus as a vaccine for cattle. Research in Veterinary Science 3, 172-182.
POCH, O., SAUVAGET, I., DELARUE, M. & TORDO, N. (1989). Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO Journal 8, 3867-3874.
POCH, O., BLUMBERG, B. M., BOUGUELERET, L. & TORDO, N. (1990). Sequence comparison of five polymerases (L proteins) of unsegmented negative-strand RNA viruses: theoretical assignment of functional domains. Journal of General Virology 71, 1153-1162.
RIMA, B. K., BACZKO, K., CLARKE, D. K., CURRAN, M. D., MARTIN, S. J., BILLETER, M. A. & TER MEULEN, V. (1986). Characterisation of clones for the sixth (L) gene and a transcriptional map for morbilliviruses. Journal of General Virology 67, 1971-1978.
ROZENBLATT, S., EIZENBERG, O., BEN-LEVY, R., LAVIE, V. & BELLINI, W. J. (1985). Sequence homology within morbilliviruses. Journal of Virology 53, 684-690.
SHUSTER, D. M., BUCHMAN, G. W. & RASHTCHIAN, A. (1992). A simple and efficient method for amplification of cDNA ends using 5' RACE. Focus 14, 46-52.
SIDHU, M. S., HUSAR, W., COOK, S. D., DOWLING, P. C. & UDEM, S. A. (1993a). Canine distemper terminal and intergenic non-protein coding nucleotide sequences: completion of the entire CDV genome sequence. Virology 193, 66-72.
SIDHU, M. S., MENONNA, J. P., COOK, S. D., DOWLING, P. C. & UDEM, S. A. (1993b). Canine distemper virus L gene: sequence and comparison with related viruses. Virology 193, 50-65.
STADEN, R. (1980). A new computer method for the storage and manipulation of DNA gel reading data. Nucleic Acids Research 8, 3673-3694.
STADEN, R. (1982). Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Research 10, 4731-4751.
STEC, D. S., HILL, M. G. & COLLINS, P. L. (1991). Sequence analysis of the polymerase L gene of human respiratory syncytial virus and predicted phylogeny of nonsegmented negative-strand viruses. Virology 183, 273-287.
STOKES, A., TIERNEY, E. L., MURPHY, B. R. & HALL, S. L. (1992). The complete nucleotide sequence of the JS strain of human parainfluenza virus type 3: comparison with the Wash/47885/57 prototype strain. Virus Research 25, 91-104.
TAYLOR, W. P. (1986). Epidemiology and control of rinderpest. Revue Scientifique et Technique (Office International des Epizooties) 5, 407-410.
TORDO, N., POCH, O., ERMINE, A., KEITH, G. & ROUGEON, F. (1988). Completion of the rabies virus genome sequence determination: highly conserved domains among the L (polymerase) proteins of unsegmented negative-strand RNA viruses. Virology 165, 565-576.
WINDLE, B. E. (1988). A simple technique for preparing pure λ DNA. BioTechniques 6, 403-408.
(Received 13 July 1994; Accepted 28 October 1994)