THE JOURNAL OF BIOLOGICAL CHEMISTRY
Vol. 256, No. 6. Issue of March 25, pp. 2808-2814, 1981
Printed in U.S.A.
Nucleotide Sequence of the Gene Coding for the Nitrogenase Iron Protein from Klebsiella pneumoniae*
(Received for publication, October 6, 1980)
Venkatesan Sundaresan and Frederick M. Ausubel
From the Cellular and Developmental Biology Group, Department of Biology, Harvard University, Cambridge, Massachusetts 02138
We report the complete DNA sequence of the Klebsiella pneumoniae nifH gene, the gene which codes for component 2 (Fe protein or nitrogenase reductase) of the nitrogenase enzyme complex. The amino acid sequence of the K. pneumoniae nitrogenase Fe protein is deduced from the DNA sequence. The K. pneumoniae Fe protein contains 292 amino acids, has Mr = 31,753, and contains 9 cysteine residues. We compare the amino acid sequence of the K. pneumoniae protein with available amino acid sequence data on nitrogenase Fe proteins from two other species, Clostridium pasteurianum and Azotobacter vinelandii. The C. pasteurianum Fe protein, for which the complete sequence is known, shows 67% homology with the K. pneumoniae Fe protein. Extensive regions of strong conservation (90-95%) are found, while other regions show relatively poor conservation (30-35%). It is suggested that these strongly conserved regions are of special importance to the function of this enzyme, and the findings are discussed in the light of evolutionary theories on the origin of nif genes.
Nitrogenase, a highly conserved enzyme complex, is probably responsible for all biological nitrogen assimilation from the atmosphere. Nitrogenase catalyzes the overall reaction: N2 + 3H2 → 2NH3; the details of its mechanism are not known (1). The enzyme system has been isolated from at least 10 different prokaryotic species (2) and in all cases consists of two components, both of which are required for activity, and both are irreversibly inactivated by oxygen. When both purified components are present, N2 (or C2H2) can be reduced in the presence of ATP and an electron donor such as flavodoxin, ferredoxin, or dithionite (3). Component 1 or the Mo-Fe protein contains a cofactor (Fe-Mo cofactor), containing Mo, Fe, and S, which is the presumed site of N2 reduction (4). Component 2 or the Fe protein is necessary for electron transfer to component 1, and also contains the binding sites for ATP. The electron transfer abilities of component 2 are due to a 4Fe-4S unit in the dimer (which is the active form) which is liganded to Cys residues (1). The two components have been shown to associate and dissociate during each catalytic cycle by Hageman and Burris (5), who have proposed the nomenclature "nitrogenase reductase" for the Fe protein and "dinitrogenase" for the Mo-Fe protein.
There is compelling evidence that the nitrogenase proteins from different species are closely related. For example, purified nitrogenase components 1 and 2 from different species form interspecific hybrid complexes, many of which have enzymatic activity (6). In corroboration, DNA hybridization experiments carried out in our laboratory have shown evolutionary conservation of DNA sequences among the genes for nitrogenase in 13 different species (7). On the other hand, there are also significant differences in the nitrogenase proteins from different species. For example, the Fe proteins purified from different species exhibit markedly different degrees of cold lability and degrees of sensitivity to O2, and contain different numbers of cysteine residues (2). Although purified nitrogenase components 1 and 2 from a wide variety of organisms can interact with one another to yield enzymatically active complexes, the degree of activity depends on the particular interspecies hybrid and some heterologous nitrogenases show no enzymatic activity (8).
The above facts raise several interesting questions regarding the nitrogenase proteins: 1) Are the amino acid sequences of nitrogenase proteins conserved to the same extent throughout the polypeptide chains, or is the conservation only limited to specific regions of the proteins? 2) If specific regions are strongly conserved, are these conserved regions associated with specific functions, such as ATP binding, formation of the Fe-S cluster, Fe-Mo cofactor binding, etc? 3) Are there significant domains or regions in the nitrogenase proteins which are not conserved, and are these regions responsible for the observed differences in the nitrogenase proteins? If this is true, may it be possible, by DNA recombination in vitro, to construct an Fe protein, for example, that combines particular aspects of Fe proteins from different species (e.g. lowered sensitivity to O2, cold stability, etc.)?
Answers to these questions depend on obtaining the DNA and/or amino acid sequences of several nitrogenase genes and proteins, respectively. The complete amino acid sequence for Cp2¹ has been determined (9) and partial amino acid sequence data on the Av2 and Kp2 proteins¹ is now available (10). We have determined, and report here, the complete nucleotide sequence of the K. pneumoniae nifH gene and the corresponding complete amino acid sequence of the K. pneumoniae Fe protein (Kp2). The DNA sequence of nifH can now be manipulated, using in vitro recombinant DNA techniques, to study structure-function relationships of Fe protein.
In addition to obtaining useful information about the primary structure of Kp2, we hoped that sequencing the nifH genes would help determine whether Kp2 undergoes extensive post-translational processing. This possibility is of interest because the NH2-terminal amino acid of purified Kp2 protein is threonine (10) and because preliminary data have been published which indicate posttranslational modification of nitrogenase proteins in K. pneumoniae (11).
A final rationale for determining the sequence of nifH is to compare the DNA sequence of the Kp2 gene with that of other Fe protein genes as they become available, in order to answer questions about the evolution of the nitrogenase genes. For example, it has been suggested by Postgate (12) that nitrogenase genes evolved much later than the species that carry them, and perhaps were initially carried on a conjugative plasmid or a transposable element that later distributed itself among various species.
* This work was supported by National Science Foundation Grant PCM 78-15450 and by United States Department of Agriculture Grant 5901-0410-9-0237. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¹ The nomenclature used here is according to Postgate (12), i.e. Fe protein is protein 2 and MoFe protein is protein 1. Thus, Cp2, Kp2, and Av2 are the Fe proteins from C. pasteurianum, K. pneumoniae, and A. vinelandii, respectively.
FIG. 1. Genetic and physical map of Klebsiella nif genes. The structural genes for nitrogenase are nif K, D, and H. Direction of
EXPERIMENTAL PROCEDURES
Preparation of nifH DNA-The plasmid pSA30 contains a 6.0-kilobase pair Eco RI fragment carrying K. pneumoniae nif genes K, D, H, and part of E (13, 14). A 3.15-kilobase pair Eco RI-HindIII fragment from this insert, containing nifH and D (Fig. 1), was cloned into the plasmid vector pBR322 (15) following excision of the small Eco RI-HindIII fragment of pBR322.² This plasmid, pSB1, was used as the source of DNA for sequencing. Plasmid DNA was prepared as described by Clewell and Helinski (16).
DNA Sequencing-This was carried out as described by Maxam and Gilbert (17). The sequencing strategy is outlined in Fig. 2. Sequencing was performed across all restriction sites and on both strands wherever possible (for over 95% of the region).
RESULTS AND DISCUSSION
A recombinant plasmid (pSA30) which carries the structural genes for K. pneumoniae Fe protein (nifH) and Mo-Fe protein (nifK and nifD) has been constructed by Cannon et al. (13). The physical locations of nifK, D, and H on pSA30 were determined by Riedel et al. (14), which enabled us to construct a derivative of pSA30 (pSB1) (see Fig. 1) which contains nifH and at least part of nifD and which was used as a source of DNA for the sequencing experiments reported here.
A restriction map of the region of the K. pneumoniae chromosome containing nifH is shown in Fig. 2. The DNA sequence of the nifH gene is presented in Fig. 3. The coding region was identified as follows: a start codon (ATG) at 288 base pairs from the Eco RI site is followed by an open reading frame of 292 amino acids. The protein sequence derived from this DNA sequence was compared with the NH2-terminal sequence of Kp2 protein, which had been determined previously by Hausinger and Howard (10). The two sequences matched exactly, except that the first methionine residue corresponding to the start codon is not found in purified Kp2 and the threonine which is found at the NH2-terminal of Kp2 corresponds to the second codon. Therefore, it is likely that the ATG codon at 288 base pairs is the translational start signal of the H gene, and this conclusion is supported by the presence of a Shine-Dalgarno (18) sequence (AGGAG), 8 base pairs upstream of this ATG.
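The identification of the coding region described above (a Shine-Dalgarno motif, a nearby ATG, then an open reading frame read to the first stop codon) can be sketched in Python. This is only an illustration under stated assumptions: `TOY_SEQ`, `find_start`, `translate_orf`, and the `max_spacer` window are hypothetical names and values chosen here, the toy sequence is not the nifH sequence of Fig. 3, and the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical toy sequence: AGGAG, an 8-base spacer, then a short ORF.
TOY_SEQ = "TTAGGAGAAGTCACCATGACCATGCGTCAATGCTGAGCA"


def find_start(seq, sd="AGGAG", max_spacer=12):
    """Return (sd_index, atg_index) for the first Shine-Dalgarno motif that
    has an ATG beginning within max_spacer bases downstream, else None."""
    sd_pos = seq.find(sd)
    while sd_pos != -1:
        window_start = sd_pos + len(sd)
        atg = seq.find("ATG", window_start, window_start + max_spacer + 3)
        if atg != -1:
            return sd_pos, atg
        sd_pos = seq.find(sd, sd_pos + 1)
    return None


def translate_orf(seq, start):
    """Translate codon by codon from start until the first stop codon."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


sd_index, atg_index = find_start(TOY_SEQ)
spacer = atg_index - (sd_index + len("AGGAG"))  # bases between motif and ATG
peptide = translate_orf(TOY_SEQ, atg_index)
```

In the toy sequence the spacer is 8 bases, matching the spacing reported in the text, and the translated peptide begins Met-Thr, so the second codon corresponds to the threonine found at the NH2-terminal of purified Kp2.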
The amino acid sequence of Kp2 is compared with the available amino acid sequence data from Cp2 and Av2 in Fig. 4. We make the following observations from this comparison.
Length-Kp2 is 292 amino acids long, as compared to 273 amino acids for Cp2 and 289 amino acids for Av2 (10). When compared with Cp2, Kp2 is extended by one amino acid at the NH2-terminal and 15 amino acids at the COOH-terminal. There is also an insertion of two additional amino acids between positions 64 and 65 of Cp2. Such an insertion of 2 additional amino acids has also been found in Av2.³ The molecular weight of Kp2 derived from the sequence is 31,753, which is close to previous estimates of 33,400 (2) and 32,600 (10).
² S. Brown and V. Sundaresan, unpublished results.
³ R. P. Hausinger and J. B. Howard, personal communication.
transcription is from right to left. The regions cloned in plasmids pSA30 (13) and pSB1 are shown. Restriction sites: R = Eco RI, H = HindIII, B = Bam HI, Bg = Bgl II.
FIG. 2. Restriction map of the nifH region showing the sequencing strategy. This map is drawn in the opposite orientation from the map shown in Fig. 1, i.e. direction of transcription is from left to right. The distances from the Eco RI site are indicated in base pairs. The arrows indicate the extent of sequencing from each restriction site and the strand on which the sequencing was performed.
Cysteine Residues-Kp2 has 9 cysteine residues, as compared to 6 in Cp2 and 7 in Av2. The cysteines at positions 5, 151, and 259 of Kp2 are not found in Cp2 and those at positions 234 and 259 of Kp2 are not found in Av2. This is in agreement with the predictions of Hausinger and Howard (10) based on protein sequence data. Thus, the five cysteines at positions 38, 85, 97, 132, and 184 of Kp2 are conserved in all three proteins and they are candidates for possible ligands of the 4Fe-4S cluster.
Degree of Homology-When the sequences of Kp2 and Cp2 are aligned for maximum homology, there are 89 amino acid substitutions in Kp2 compared to Cp2. Thus, 184/273 amino acids of Cp2 are conserved in Kp2, which gives ~67% homology. If the partial sequence data from Av2 is included in the comparison, we find that 104/114 amino acids sequenced in Av2 are conserved in Kp2, giving ~91% homology. The Av2 and Kp2 proteins are extended at the NH2- and COOH-terminals, compared to the Cp2 protein (10). If we neglect these extensions and make a comparison across all three proteins for the 90 amino acids sequenced in Av2 that lie within these limits, we find that there is ~80% homology of Kp2 with Cp2 and ~93% homology between Av2 and Kp2. Since the overall homology between Kp2 and Cp2 is 67%, the regions of Av2 that have been sequenced must represent more highly conserved regions. Further, within these regions, Kp2 is more homologous to Av2 than to Cp2.
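The homology percentages quoted above follow directly from the residue counts. A minimal Python sketch, assuming homology is scored as the fraction of identical residues among the aligned positions compared; the function names and the two short aligned fragments are hypothetical and serve only to illustrate the counting.

```python
def percent_identity(conserved, compared):
    """Percentage of compared positions that carry the same residue."""
    return 100.0 * conserved / compared


def count_identities(aligned_a, aligned_b, gap="-"):
    """Return (identical, compared) for two equal-length aligned strings.
    Positions where either sequence has a gap are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return identical, compared


# Counts reported in the text.
kp2_vs_cp2 = percent_identity(273 - 89, 273)  # 89 substitutions in 273 residues
kp2_vs_av2 = percent_identity(104, 114)       # 104 of 114 sequenced Av2 residues

# Hypothetical aligned fragments (one gap, one substitution).
toy_counts = count_identities("MTMRQCAIYGK", "M-MRQVAIYGK")
toy_percent = percent_identity(*toy_counts)
```

Rounded to one decimal place, the two reported comparisons come out at 67.4% and 91.2%, consistent with the approximate values given in the text.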
Conserved Regions-The comparison of amino acid sequences shows that some regions of the protein are especially conserved. Substitutions in these regions are generally conservative or neutral and are often near the boundaries of the regions. Such regions are outlined in Table I. Conservation in these regions is over 90% between Kp2 and Cp2. They represent ~146 amino acids.
Poorly Conserved Regions-These are outlined in Table II.
FIG. 3. DNA sequence of nifH and the amino acid sequence deduced from it. Only the coding strand is shown.
Conservation in these regions between Kp2 and Cp2 is only 30%; they represent ~82 amino acids.
The NH2-terminal regions and regions of the protein around the cysteine residues are strongly conserved. It is interesting that the cysteine residues at positions 151 and 234 of Kp2 lie in conserved regions, but the cysteines themselves are not conserved (in Cp2 and Av2, respectively). Note also that there is good conservation of the COOH-terminal region (Kp2 265-289) between Kp2 and Av2, but not between Kp2 and Cp2. This might be relevant to the observation that the purified nitrogenase components of A. vinelandii and K. pneumoniae form heterologous hybrids which have 93-100% of the activity of the homologous nitrogenases, whereas the cross-reaction of the Kp and Av nitrogenase components with those from Cp is either poor (17% for Cp2 + Kp1 and 8% or less for Cp1 + Kp2), or none (Av and Cp) (8). This COOH-terminal region might be involved in the formation of an active complex with the K. pneumoniae and A. vinelandii Mo-Fe proteins. In this context, we also observe that the cysteine residues at positions 5 and 151 of Kp2 are conserved in Av2, but are substituted in Cp2. These cysteine residues might also be necessary for the formation of active complexes with the Mo-Fe proteins from these species.
TABLE I
Strongly conserved regions
Amino acid residues in Kp2 | Substitutions in Cp2 | Substitutions in Av2
2-22 | Cys 5 → Val | None
34-49 | Ile 35 → Val; Ile 49 → Leu | None in 33-41 (42-49 not available)
84-104 | Ala 86 → Val; Val 103 → Ile | Ala 86 → Val (101-104 not available)
122-187 | Asn 142 → Gln; Cys 151 → Ala; Met 158 → Leu; Val 169 → Gln; Lys 176 → Gly; Leu 182 → Ile | In the regions where the sequence is known (129-140, 147-154, 179-187), no substitutions
208-213 | No substitutions | Sequence not available
226-241 | Ala 233 → Thr; Lys 235 → Gln; Asn 238 → Gln | Only 230-239 available: Ala 233 → Lys; Cys 234 → Ala; Asn 238 → Asp
See Ref. 10.
See Ref. 6.
FIG. 4. A comparison of amino acid sequences of iron proteins from K. pneumoniae (this work), C. pasteurianum (9), and A. vinelandii (10). The numbering refers to K. pneumoniae residues. Regions of homology are enclosed in boxes. Cysteine residues are underlined.
Since the region extending beyond the COOH-terminal of Cp2 shows conservation in Kp2 and Av2, it is possible that Cp2 evolved from a single base pair change from the Glu codon in the Kp2 gene to a stop codon in Cp2 (GAA to TAA).
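The GAA to TAA suggestion can be checked by enumerating every single-base change of a codon and keeping those that produce a stop codon. The sketch below assumes the standard genetic code (stop codons TAA, TAG, TGA); the function name `single_base_stops` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def single_base_stops(codon):
    """List (position, new_base, mutant_codon) for every single-base
    substitution of codon that yields a stop codon. Positions are 1-based."""
    hits = []
    for pos in range(3):
        for base in "ACGT":
            if base == codon[pos]:
                continue  # not a change
            mutant = codon[:pos] + base + codon[pos + 1:]
            if mutant in STOP_CODONS:
                hits.append((pos + 1, base, mutant))
    return hits


glu_gaa = single_base_stops("GAA")
glu_gag = single_base_stops("GAG")
```

For the Glu codon GAA the only single-base route to a stop codon is a G to T change at the first position, giving TAA, which is the change proposed in the text.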
Protein Conformation-The protein conformations of Kp2 and Cp2 were analyzed by the Chou-Fasman procedure (19). The predicted conformations were as follows. Kp2: 40% α-helix, 26% β-sheet, 26% β-turns; Cp2: 37% α-helix, 32% β-sheet, 25% β-turns.
The distribution of helices and sheets was found to be similar in the two proteins. Further, most of the β-turns were conserved. The exceptions are the β-turns at positions 22-25 and 113-116 of Cp2 which are not conserved in Kp2, and Kp2 has β-turns at positions 53-56, 188-191, and 255-260 that are not found in Cp2.
Tanaka et al. (9) have noted the unusually high number of Gly-Gly sequences in Cp2. Of the 7 Gly-Gly sequences in Cp2, only 4 are conserved in Kp2. Three of these conserved Gly-Gly sequences are involved in β-turns.
Amino Acid Compositions-Finally, the amino acid compositions of the Fe proteins are compared in Table III. It may be of interest to examine the conservation of the rarer amino acid residues (Phe, Tyr, His, and Met) between Kp2 and Cp2 (Table IV). All the Tyr residues of Kp2 are conserved in Cp2; the other amino acids show only ~50% conservation. Methionine occurs infrequently in the other Fe-S proteins; Kp2 has 15 Met residues of which 8 are conserved in Cp2.
TABLE II
Poorly conserved regions
Amino acid residues in Kp2 | Number of residues conserved in Cp2 | Number of substitutions in Cp2
23-33 | 4 | 7
50-67 | 4 | 14 (also a deletion of 2 amino acids)
73-78 | 1 | 5
188-202 | 5 | 10
245-276 | 12 | 20
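The region sizes and conservation levels quoted for Tables I and II can be re-derived from the residue ranges. The Python sketch below is an illustration only: the tuples are the values as read from the two tables (range boundaries, number of Cp2 substitutions for Table I, number of Cp2-conserved residues for Table II), and the function names are hypothetical.

```python
def region_length(first, last):
    """Number of residues in an inclusive range of Kp2 positions."""
    return last - first + 1


# Table I: (first, last, substitutions in Cp2)
STRONG = [(2, 22, 1), (34, 49, 2), (84, 104, 2),
          (122, 187, 6), (208, 213, 0), (226, 241, 3)]

# Table II: (first, last, residues conserved in Cp2)
POOR = [(23, 33, 4), (50, 67, 4), (73, 78, 1),
        (188, 202, 5), (245, 276, 12)]


def summarize_strong(regions):
    """Total residues and percent conserved, given substitution counts."""
    total = sum(region_length(first, last) for first, last, _ in regions)
    substituted = sum(subs for _, _, subs in regions)
    return total, 100.0 * (total - substituted) / total


def summarize_poor(regions):
    """Total residues and percent conserved, given conserved counts."""
    total = sum(region_length(first, last) for first, last, _ in regions)
    conserved = sum(kept for _, _, kept in regions)
    return total, 100.0 * conserved / total


strong_total, strong_percent = summarize_strong(STRONG)
poor_total, poor_percent = summarize_poor(POOR)
```

With these inputs the strongly conserved regions cover 146 residues and the poorly conserved regions 82 residues, in line with the totals stated in the text; the computed conservation is just over 90% and roughly 30%, respectively.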
TABLE III
Amino acid compositions of Fe proteins
Amino acid | Residues/molecule: Kp2 | Cp2 | Av2
Aspartic acid | 16 | 14 | Asp + Asn = 30
Threonine | 16 | 13 | 12
Serine | 10 | 13 | 10
Glutamic acid | 29 | 24 | Glu + Gln = 35
Proline | 8 | 9 | 9
Glycine | 27 | 32 | 28
Alanine | 29 | 20 | 28
Cysteine | 9 | 6 | 7
Valine | 22 | 19 | 25
Methionine | 15 | 11 | 13
Isoleucine | 24 | 20 | 20
Leucine | 19 | 26 | 23
Tyrosine | 9 | 12 | 9
Phenylalanine | 6 | 5 | 7
Tryptophan | 0 | 0 | 0
Lysine | 16 | 16 | 16
Histidine | 2 | 2 | 3
Arginine | 13 | 12 | 14
Asparagine | 12 | 8 |
Glutamine | 10 | 11 |
Total | 292 | 273 | 289
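The compositions in Table III can be checked against the stated chain lengths, and a molecular weight can be estimated from them. The Python sketch below makes two assumptions that are not from the paper: the average residue masses are standard reference values supplied here for illustration, and the estimate is a simple sum of residue masses, with the terminal water reported separately. The dictionary and function names are hypothetical.

```python
# Average masses (Da) of amino acid residues within a polypeptide chain.
# Standard reference values, supplied as an assumption (not from the paper).
RESIDUE_MASS = {
    "Asp": 115.0886, "Thr": 101.1051, "Ser": 87.0782, "Glu": 129.1155,
    "Pro": 97.1167, "Gly": 57.0519, "Ala": 71.0788, "Cys": 103.1388,
    "Val": 99.1326, "Met": 131.1926, "Ile": 113.1594, "Leu": 113.1594,
    "Tyr": 163.1760, "Phe": 147.1766, "Trp": 186.2132, "Lys": 128.1741,
    "His": 137.1411, "Arg": 156.1875, "Asn": 114.1038, "Gln": 128.1307,
}
WATER = 18.0153  # added once per chain for the free termini

# Residues per molecule, as read from Table III.
KP2 = {"Asp": 16, "Thr": 16, "Ser": 10, "Glu": 29, "Pro": 8, "Gly": 27,
       "Ala": 29, "Cys": 9, "Val": 22, "Met": 15, "Ile": 24, "Leu": 19,
       "Tyr": 9, "Phe": 6, "Trp": 0, "Lys": 16, "His": 2, "Arg": 13,
       "Asn": 12, "Gln": 10}
CP2 = {"Asp": 14, "Thr": 13, "Ser": 13, "Glu": 24, "Pro": 9, "Gly": 32,
       "Ala": 20, "Cys": 6, "Val": 19, "Met": 11, "Ile": 20, "Leu": 26,
       "Tyr": 12, "Phe": 5, "Trp": 0, "Lys": 16, "His": 2, "Arg": 12,
       "Asn": 8, "Gln": 11}


def residue_count(composition):
    """Total number of residues in a composition."""
    return sum(composition.values())


def residue_mass_sum(composition):
    """Sum of average residue masses, without the terminal water."""
    return sum(RESIDUE_MASS[aa] * n for aa, n in composition.items())


kp2_length = residue_count(KP2)
cp2_length = residue_count(CP2)
kp2_mass = residue_mass_sum(KP2)
kp2_mass_with_termini = kp2_mass + WATER
```

The Kp2 and Cp2 compositions sum to the chain lengths given in the Total row, and with these residue masses the Kp2 estimate falls close to the reported molecular weight of 31,753. Whether the terminal water or the initiator methionine is counted shifts the estimate slightly.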
TABLE IV
Conservation of rare amino acids
Amino acids | Positions in Kp2 | Residues conserved in Cp2
Phenylalanine | 108, 121, 123, 135, 210, 271 | 123, 135, 210
Tyrosine | 8, 80, 115, 124, 148, 159, 171, 230, 240 | 8, 80, 115, 124, 148, 159, 171, 230, 240
Histidine | 50, 209 | 209
Methionine | 2, 29, 34, 58, 60, 137, 155, 156, 158, 207, 225, 252, 261, 269, 274 | 2, 29, 34, 137, 155, 156, 269, 274
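The conservation of the rarer residues summarized in Table IV reduces to comparing two sets of positions per amino acid. A short Python sketch, with the positions as read from Table IV and hypothetical names for the dictionaries and the helper function:

```python
# Positions in Kp2, as read from Table IV.
KP2_POSITIONS = {
    "Phe": {108, 121, 123, 135, 210, 271},
    "Tyr": {8, 80, 115, 124, 148, 159, 171, 230, 240},
    "His": {50, 209},
    "Met": {2, 29, 34, 58, 60, 137, 155, 156, 158, 207, 225,
            252, 261, 269, 274},
}

# Kp2 positions whose residue is conserved in Cp2, as read from Table IV.
CONSERVED_IN_CP2 = {
    "Phe": {123, 135, 210},
    "Tyr": {8, 80, 115, 124, 148, 159, 171, 230, 240},
    "His": {209},
    "Met": {2, 29, 34, 137, 155, 156, 269, 274},
}


def conservation_summary(kp2, conserved):
    """Map each amino acid to (number conserved, number present in Kp2)."""
    summary = {}
    for aa, positions in kp2.items():
        kept = conserved[aa] & positions  # ignore positions absent from Kp2
        summary[aa] = (len(kept), len(positions))
    return summary


SUMMARY = conservation_summary(KP2_POSITIONS, CONSERVED_IN_CP2)
```

The summary gives 9 of 9 Tyr residues conserved and about half of the Phe, His, and Met residues (3 of 6, 1 of 2, and 8 of 15), which is the pattern described in the text.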
In conclusion, we have shown that there is extensive sequence homology (67%) between the Fe proteins of two distantly related prokaryotes, the Gram-positive C. pasteurianum and the Gram-negative K. pneumoniae. The conservation of sequences was shown to be selective for regions of the protein (see Tables I and II). Thus, the homology observed is most likely related to function, and is probably not merely the result of the late evolution and dispersal of the nif genes. This sort of strong conservation, based on function, is not unique. The protein sequence of cytochrome c is highly conserved in eukaryotes, e.g. there is 55% homology between the cytochrome c proteins of the two widely separated species of horse and yeast (20).
We can tentatively delineate functionally important regions of the Fe protein, although we cannot assign specific functions to them. The areas that have diverged are presumably not essential to activity. This sort of divergence can occur even if the nif genes were originally spread by lateral transmission as suggested (12). In that case, however, one might expect the divergence not to show any particular pattern based on evolutionary trees, i.e. evolutionary trees derived from nitrogenase sequences would not correlate with trees derived, for example, from rRNA sequences (21). The limited sequence information from the Fe protein of the Gram-negative A. vinelandii (10) suggests that it is in fact more homologous to Kp2 than to the Fe protein of Gram-positive C. pasteurianum. In this context, it is worth noting that Ruvkun and Ausubel (7) found DNA hybridization of the K. pneumoniae Fe protein gene to A. vinelandii DNA, but not to C. pasteurianum DNA. These observations are suggestive of models based on divergence of preexisting nif genes; however, one could also account for them by assuming that the evolution of the nif genes subsequent to lateral transmission followed different paths in Gram-positives and Gram-negatives. Such questions can be answered only when considerably more data are available from other species. Furthermore, we need additional DNA sequence data from the nif genes of several species to locate silent substitutions that do not change the amino acid residues, in order to study evolutionary divergence independent of the enzyme function.
The understanding of the mechanism of an enzyme can be made possible by studying the properties of mutant enzymes with substituted amino acids. Utilizing the sequence data presented here, it should be possible to use in vitro mutagenesis techniques to alter specific regions or even specific amino acid residues of the Fe protein in order to obtain altered nitrogenases which are deficient in specific functions such as ATP binding, electron transfer, and so on, or have altered EPR signals, etc.; one can then determine the nature of the substitution that caused the altered property. Together with the extensive genetic system already developed in K. pneumoniae and the detailed physical characterization of its nitrogenase components (11, 2), such studies should lead to an elucidation of the mechanism of this protein.
Acknowledgments-We would like to thank S. Brown for constructing pSB1; W. Orme-Johnson, J. Howard, and H. Evans for comments on the manuscript; J. Howard for providing sequence data on the Azotobacter vinelandii Fe protein prior to publication; R. Tizard for advice on DNA sequencing; R. Hyde for preparation of the manuscript; F. Fuller for use of computer programs; F. Lang for suggestions on lowering background in sequencing gels.
REFERENCES
1. Orme-Johnson, W. H., and Davis, L. C. (1977) in Iron-Sulfur Proteins (Lovenberg, W., ed) Vol. 3, pp. 16-58, Academic Press, New York
2. Eady, R. R., and Smith, B. E. (1979) in A Treatise on Dinitrogen Fixation (Hardy, R. W. F., ed) pp. 299-491, John Wiley and Sons, New York
3. Winter, H. C., and Burris, R. H. (1976) Annu. Rev. Biochem. 45, 409-426
4. Shah, V. K., and Brill, W. J. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 3249-3253
5. Hageman, R. V., and Burris, R. H. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 2699-2702
6. Eady, R. R., and Postgate, J. R. (1974) Nature 249, 805-810
7. Ruvkun, G. B., and Ausubel, F. M. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 191-195
8. Emerich, D. W., and Burris, R. H. (1978) J. Bacteriol. 134, 936-943
9. Tanaka, M., Haniu, M., Yasunobu, K. T., and Mortenson, L. E. (1977) J. Biol. Chem. 252, 7093-7100
10. Hausinger, R. P., and Howard, J. B. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 3826-3830
11. Roberts, G. P., MacNeil, T., MacNeil, D., and Brill, W. J. (1978) J. Bacteriol. 136, 267-279
12. Postgate, J. R. (1974) Symp. Soc. Gen. Microbiol. 24, 263-292
13. Cannon, F. C., Riedel, G. E., and Ausubel, F. M. (1979) Mol. Gen. Genet. 174, 59-66
14. Riedel, G. E., Ausubel, F. M., and Cannon, F. C. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 2866-2870
15. Rodriguez, R. L., Bolivar, F., Goodman, H. M., Boyer, H. W., and Betlach, M. (1976) in Molecular Mechanisms in the Control of Gene Expression (Nierlich, D. P., Rutter, W. J., and Fox, C., eds), pp. 471-499, Academic Press, New York
16. Clewell, D. B., and Helinski, D. R. (1969) Proc. Natl. Acad. Sci. U. S. A. 62, 1159-1166
17. Maxam, A., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
18. Shine, J., and Dalgarno, L. (1975) Nature 254, 34-38
19. Chou, P. Y., and Fasman, G. D. (1978) Annu. Rev. Biochem. 47, 251-276
20. Margoliash, E. (1972) Harvey Lect. 66, 177-248
21. Fox, G., Stackebrandt, E., Hespell, R., Gibson, J., Maniloff, J., Dyer, T., Wolfe, R., Balch, W., Tanner, R., Magrum, L., Zablen, L., Blakemore, R., Gupta, R., Bonen, L., Lewis, B., Stahl, D., Luehrsen, K., Chen, K., and Woese, C. (1980) Science 209, 457-463