THE JOURNAL OF BIOLOGICAL CHEMISTRY, Vol. 256, No. 6, Issue of March 25, pp. 2808-2814, 1981. Printed in U.S.A.

Nucleotide Sequence of the Gene Coding for the Nitrogenase Iron Protein from Klebsiella pneumoniae*

(Received for publication, October 6, 1980)

Venkatesan Sundaresan and Frederick M. Ausubel
From the Cellular and Developmental Biology Group, Department of Biology, Harvard University, Cambridge, Massachusetts 02138

We report the complete DNA sequence of the Klebsiella pneumoniae nifH gene, the gene which codes for component 2 (Fe protein or nitrogenase reductase) of the nitrogenase enzyme complex. The amino acid sequence of the K. pneumoniae nitrogenase Fe protein is deduced from the DNA sequence. The K. pneumoniae Fe protein contains 292 amino acids, has a Mr = 31,753, and contains 9 cysteine residues. We compare the amino acid sequence of the K. pneumoniae protein with available amino acid sequence data on nitrogenase Fe proteins from two other species, Clostridium pasteurianum and Azotobacter vinelandii. The C. pasteurianum Fe protein, for which the complete sequence is known, shows 67% homology with the K. pneumoniae Fe protein. Extensive regions of strong conservation (90-95%) are found, while other regions show relatively poor conservation (30-35%). It is suggested that these strongly conserved regions are of special importance to the function of this enzyme, and the findings are discussed in the light of evolutionary theories on the origin of nif genes.

Nitrogenase, a highly conserved enzyme complex, is probably responsible for all biological nitrogen assimilation from the atmosphere. Nitrogenase catalyzes the overall reaction: N2 + 3H2 → 2NH3; the details of its mechanism are not known (1). The enzyme system has been isolated from at least 10 different prokaryotic species (2) and in all cases consists of two components, both of which are required for activity, and both are irreversibly inactivated by oxygen. When both purified components are present, N2 (or C2H2) can be reduced in the presence of ATP and an electron donor such as ferredoxin, flavodoxin, or dithionite (3). Component 1 or the Mo-Fe protein contains a cofactor (Fe-Mo cofactor), containing Mo, Fe, and S, which is the presumed site of N2 reduction (4). Component 2 or the Fe protein is necessary for electron transfer to component 1, and also contains the binding sites for ATP. The electron transfer abilities of component 2 are due to a 4Fe-4S unit in the dimer (which is the active form) which is liganded to Cys residues (1). The two components have been shown to associate and dissociate during each catalytic cycle by Hageman and Burris (5), who have proposed the nomenclature "nitrogenase reductase" for the Fe protein and "dinitrogenase" for the Mo-Fe protein.

There is compelling evidence that the nitrogenase proteins from different species are closely related. For example, purified nitrogenase components 1 and 2 from different species form interspecific hybrid complexes, many of which have enzymatic activity (6). In corroboration, DNA hybridization experiments carried out in our laboratory have shown evolutionary conservation of DNA sequences among the genes for nitrogenase in 13 different species (7). On the other hand, there are also significant differences in the nitrogenase proteins from different species. For example, the Fe proteins purified from different species exhibit markedly different degrees of cold lability and degrees of sensitivity to O2, and contain different numbers of cysteine residues (2). Although purified nitrogenase components 1 and 2 from a wide variety of organisms can interact with one another to yield enzymatically active complexes, the degree of activity depends on the particular interspecies hybrid and some heterologous nitrogenases show no enzymatic activity (8).

The above facts raise several interesting questions regarding the nitrogenase proteins: 1) Are the amino acid sequences of nitrogenase proteins conserved to the same extent throughout the polypeptide chains, or is the conservation only limited to specific regions of the proteins? 2) If specific regions are strongly conserved, are these conserved regions associated with specific functions, such as ATP binding, formation of the Fe-S cluster, Fe-Mo cofactor binding, etc? 3) Are there significant domains or regions in the nitrogenase proteins which are not conserved, and are these regions responsible for the observed differences in the nitrogenase proteins? If this is true, may it be possible, by DNA recombination in vitro, to construct an Fe protein, for example, that combines particular aspects of Fe proteins from different species (e.g. lowered sensitivity to O2, cold stability, etc.)?

Answers to these questions depend on obtaining the DNA and/or amino acid sequences of several nitrogenase genes and proteins, respectively. The complete amino acid sequence for Cp2¹ has been determined (9) and partial amino acid sequence data on the Av2 and Kp2 proteins is now available (10). We have determined, and report here, the complete nucleotide sequence of the K. pneumoniae nifH gene and the corresponding complete amino acid sequence of the K. pneumoniae Fe protein (Kp2). The DNA sequence of nifH can now be manipulated, using in vitro recombinant DNA techniques, to study structure-function relationships of Fe protein.

In addition to obtaining useful information about the primary structure of Kp2, we hoped that sequencing the nifH genes would help determine whether Kp2 undergoes extensive post-translational processing. This possibility is of interest because the NH2-terminal amino acid of purified Kp2 protein is threonine (10) and because preliminary data have been published which indicate post-translational modification of nitrogenase proteins in K. pneumoniae (11).

A final rationale for determining the sequence of nifH is to compare the DNA sequence of the Kp2 gene with that of other Fe protein genes as they become available, in order to answer questions about the evolution of the nitrogenase genes. For example, it has been suggested by Postgate (12) that nitrogenase genes evolved much later than the species that carry them, and perhaps were initially carried on a conjugative plasmid or a transposable element that later distributed itself among various species.

* This work was supported by National Science Foundation Grant PCM 78-15450 and by United States Department of Agriculture Grant 5901-0410-9-0237. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¹ The nomenclature used here is according to Postgate (12), i.e. Fe protein is protein 2 and MoFe protein is protein 1. Thus, Cp2, Kp2, and Av2 are the Fe proteins from C. pasteurianum, K. pneumoniae, and A. vinelandii, respectively.

FIG. 1. Genetic and physical map of Klebsiella nif genes. The structural genes for nitrogenase are nif K, D, and H. Direction of transcription is from right to left. The regions cloned in plasmids pSA30 (13) and pSB1 are shown. Restriction sites: R = Eco RI, H = HindIII, B = Bam HI, Bg = Bgl II.

EXPERIMENTAL PROCEDURES

Preparation of nifH DNA-The plasmid pSA30 contains a 6.0-kilobase pair Eco RI fragment carrying K. pneumoniae nif genes K, D, H, and part of E (13, 14). A 3.15-kilobase pair Eco RI-HindIII fragment from this insert, containing nifH and D (Fig. 1), was cloned into the plasmid vector pBR322 (15) following excision of the small Eco RI-HindIII fragment of pBR322.² This plasmid, pSB1, was used as the source of DNA for sequencing. Plasmid DNA was prepared as described by Clewell and Helinski (16).

DNA Sequencing-This was carried out as described by Maxam and Gilbert (17). The sequencing strategy is outlined in Fig. 2. Sequencing was performed across all restriction sites and on both strands wherever possible (for over 95% of the region).

² S. Brown and V. Sundaresan, unpublished results.

FIG. 2. Restriction map of the nifH region showing the sequencing strategy. This map is drawn in the opposite orientation from the map shown in Fig. 1, i.e. direction of transcription is from left to right. The distances from the Eco RI site are indicated in base pairs. The arrows indicate the extent of sequencing from each restriction site and the strand on which the sequencing was performed.

RESULTS AND DISCUSSION

A recombinant plasmid (pSA30) which carries the structural genes for K. pneumoniae Fe protein (nifH) and Mo-Fe protein (nifK and nifD) has been constructed by Cannon et al. (13). The physical locations of nifK, D, and H on pSA30 were determined by Riedel et al. (14), which enabled us to construct a derivative of pSA30 (pSB1) (see Fig. 1) which contains nifH and at least part of nifD and which was used as a source of DNA for the sequencing experiments reported here. A restriction map of the region of the K. pneumoniae chromosome containing nifH is shown in Fig. 2.

The DNA sequence of the nifH gene is presented in Fig. 3. The coding region was identified as follows: a start codon (ATG) at 288 base pairs from the Eco RI site is followed by an open reading frame of 292 amino acids. The protein sequence derived from this DNA sequence was compared with the NH2-terminal sequence of Kp2 protein, which had been determined previously by Hausinger and Howard (10). The two sequences matched exactly, except that the first methionine residue corresponding to the start codon is not found in purified Kp2 and the threonine which is found at the NH2-terminal of Kp2 corresponds to the second codon. Therefore, it is likely that the ATG codon at 288 base pairs is the translational start signal of the H gene and this conclusion is supported by the presence of a Shine-Dalgarno (18) sequence (AGGAG), 8 base pairs upstream of this ATG.

The amino acid sequence of Kp2 is compared with the available amino acid sequence data from Cp2 and Av2 in Fig. 4. We make the following observations from this comparison.

Length-Kp2 is 292 amino acids long, as compared to 273 amino acids for Cp2 and 289 amino acids for Av2 (10). When compared with Cp2, Kp2 is extended by one amino acid at the NH2-terminal and 15 amino acids at the COOH-terminal. There is also an insertion of two additional amino acids between positions 64 and 65 of Cp2. Such an insertion of 2 additional amino acids has also been found in Av2.³ The molecular weight of Kp2 derived from the sequence is 31,753, which is close to previous estimates of 33,400 (2) and 32,600 (10).

³ R. P. Hausinger and J. B. Howard, personal communication.

Cysteine Residues-Kp2 has 9 cysteine residues, as compared to 6 in Cp2 and 7 in Av2. The cysteines at positions 5, 151, and 259 of Kp2 are not found in Cp2 and those at positions 234 and 259 of Kp2 are not found in Av2. This is in agreement with the predictions of Hausinger and Howard (10) based on protein sequence data. Thus, the five cysteines at positions 38, 85, 97, 132, and 184 of Kp2 are conserved in all three proteins and they are candidates for possible ligands of the 4Fe-4S cluster.
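The cysteine bookkeeping in the preceding paragraph can be checked with a few set operations. The sketch below is illustrative only: the nine Kp2 positions are simply the positions named in the text, and the variable names are assumptions introduced here, not part of the original analysis.

```python
# Cysteine positions in Kp2 named in the text (Kp2 numbering).
kp2_cys = {5, 38, 85, 97, 132, 151, 184, 234, 259}

# Kp2 cysteines reported as not found in each of the other Fe proteins.
absent_in_cp2 = {5, 151, 259}
absent_in_av2 = {234, 259}

# Cysteines Kp2 shares with each protein, and with both at once.
shared_with_cp2 = kp2_cys - absent_in_cp2
shared_with_av2 = kp2_cys - absent_in_av2
conserved_in_all = shared_with_cp2 & shared_with_av2

print(len(kp2_cys), len(shared_with_cp2), len(shared_with_av2))  # 9 6 7
print(sorted(conserved_in_all))  # [38, 85, 97, 132, 184]
```

The shared counts (6 and 7) agree with the totals quoted for Cp2 and Av2, provided those proteins carry no cysteines of their own that are absent from Kp2; the text does not state this explicitly, so it is treated here as an assumption.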
Degree of Homology-When the sequences of Kp2 and Cp2 are aligned for maximum homology, there are 89 amino acid substitutions in Kp2 compared to Cp2. Thus, 184/273 amino acids of Cp2 are conserved in Kp2, which gives ~67% homology. If the partial sequence data from Av2 is included in the comparison, we find that 104/114 amino acids sequenced in Av2 are conserved in Kp2, giving ~91% homology. The Av2 and Kp2 proteins are extended at the NH2- and COOH-terminals, compared to the Cp2 protein (10). If we neglect these extensions and make a comparison across all three proteins for the 90 amino acids sequenced in Av2 that lie within these limits, we find that there is ~80% homology of Kp2 with Cp2 and ~93% homology between Av2 and Kp2. Since the overall homology between Kp2 and Cp2 is 67%, the regions of Av2 that have been sequenced must represent more highly conserved regions. Further, within these regions, Kp2 is more homologous to Av2 than to Cp2.

FIG. 3. DNA sequence of nifH and the amino acid sequence deduced from it. Only the coding strand is shown.

FIG. 4. A comparison of amino acid sequences of iron proteins from K. pneumoniae (this work), C. pasteurianum (9), and A. vinelandii (10). The numbering refers to K. pneumoniae residues. Regions of homology are enclosed in boxes. Cysteine residues are underlined.

Conserved Regions-The comparison of amino acid sequences shows that some regions of the protein are especially conserved. Substitutions in these regions are generally conservative or neutral and are often near the boundaries of the regions. Such regions are outlined in Table I. Conservation in these regions is over 90% between Kp2 and Cp2. They represent ~146 amino acids.

TABLE I
Strongly conserved regions

Residues in Kp2   Substitutions in Cp2                         Substitutions in Av2
2-22              Cys 5 → Val                                  None
34-49             Ile 35 → Val; Ile 49 → Leu                   None in 33-41 (42-49 not available)
84-104            Ala 86 → Val; Val 103 → Ile                  Ala 86 → Val (101-104 not available)
122-187           Asn 142 → Gln; Cys 151 → Ala;                In the regions where the sequence is known
                  Met 158 → Leu; Val 169 → Gln;                (129-140, 147-154, 179-187), no substitutions
                  Lys 176 → Gly; Leu 182 → Ile
208-213           No substitutions                             Sequence not available
226-241           Ala 233 → Thr; Lys 235 → Gln;                Only 230-239 available: Ala 233 → Lys;
                  Asn 238 → Gln                                Cys 234 → Ala; Asn 238 → Asp

Footnotes: See Ref. 10. See Ref. 6.

Poorly Conserved Regions-These are outlined in Table II. Conservation in these regions between Kp2 and Cp2 is only 30%; they represent ~82 amino acids.

The NH2-terminal regions and regions of the protein around the cysteine residues are strongly conserved. It is interesting that the cysteine residues at positions 151 and 234 of Kp2 lie in conserved regions, but the cysteines themselves are not conserved (in Cp2 and Av2, respectively). Note also that there is good conservation of the COOH-terminal region (Kp2 265-289) between Kp2 and Av2, but not between Kp2 and Cp2. This might be relevant to the observation that the purified nitrogenase components of A. vinelandii and K. pneumoniae form heterologous hybrids which have 93-100% of the activity of the homologous nitrogenases, whereas the cross-reaction of the Kp and Av nitrogenase components with those from Cp is either poor (17% for Cp2 + Kp1 and 8% or less for Cp1 + Kp2), or none (Av and Cp) (8). This COOH-terminal region might be involved in the formation of an active complex with the K. pneumoniae and A. vinelandii Mo-Fe proteins. In this context, we also observe that the cysteine residues at positions 5 and 151 of Kp2 are conserved in Av2, but are substituted in Cp2. These cysteine residues might also be necessary for the formation of active complexes with the Mo-Fe proteins from these species. Since the region extending beyond the COOH-terminal of Cp2 shows conservation in Kp2 and Av2, it is possible that Cp2 evolved from a single base pair change from the Glu codon in the Kp2 gene to a stop codon in Cp2 (GAA to TAA).

Protein Conformation-The protein conformations of Kp2 and Cp2 were analyzed by the Chou-Fasman procedure (19). The predicted conformations were as follows. Kp2: 40% α-helix, 26% β-sheet, 26% β-turns; Cp2: 37% α-helix, 32% β-sheet, 25% β-turns. The distribution of helices and sheets was found to be similar in the two proteins. Further, most of the β-turns were conserved. The exceptions are the β-turns at positions 22-25 and 113-116 of Cp2 which are not conserved in Kp2, and Kp2 has β-turns at positions 53-56, 188-191, and 255-260 that are not found in Cp2. Tanaka et al. (9) have noted the unusually high number of Gly-Gly sequences in Cp2. Of the 7 Gly-Gly sequences in Cp2, only 4 are conserved in Kp2. Three of these conserved Gly-Gly sequences are involved in β-turns.

Amino Acid Compositions-Finally, the amino acid compositions of the Fe proteins are compared in Table III. It may be of interest to examine the conservation of the rarer amino acid residues (Phe, Tyr, His, and Met) between Kp2 and Cp2 (Table IV). All the Tyr residues of Kp2 are conserved in Cp2; the other amino acids show only ~50% conservation. Methionine occurs infrequently in the other Fe-S proteins; Kp2 has 15 Met residues of which 8 are conserved in Cp2.
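The percentages quoted in this section are simple ratios of conserved to total residues, and the proposed Glu-to-stop change is a one-base difference between two codons. The following sketch reproduces those figures under stated assumptions: the counts are those given in the text, the position lists are one reading of Table IV, and the helper names (`percent_conserved`, `hamming`) are hypothetical and introduced only for illustration.

```python
def percent_conserved(conserved: int, total: int) -> float:
    """Return the share of conserved residues as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * conserved / total


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Counts quoted under "Degree of Homology".
cp2_length = 273
substitutions = 89
kp2_vs_cp2 = percent_conserved(cp2_length - substitutions, cp2_length)
kp2_vs_av2 = percent_conserved(104, 114)

# Rare residues: (Kp2 positions, positions conserved in Cp2), read from Table IV.
tyr = [8, 80, 115, 124, 148, 159, 171, 230, 240]
rare_residues = {
    "Phe": ([108, 121, 123, 135, 210, 271], [123, 135, 210]),
    "Tyr": (tyr, tyr),
    "His": ([50, 209], [209]),
    "Met": ([2, 29, 34, 58, 60, 137, 155, 156, 158, 207, 225, 252, 261, 269, 274],
            [2, 29, 34, 137, 155, 156, 269, 274]),
}
rare_conservation = {
    name: percent_conserved(len(conserved), len(positions))
    for name, (positions, conserved) in rare_residues.items()
}

print(f"Kp2 vs Cp2: {kp2_vs_cp2:.1f}%")  # 67.4%
print(f"Kp2 vs Av2: {kp2_vs_av2:.1f}%")  # 91.2%
for name, value in rare_conservation.items():
    print(f"{name}: {value:.1f}% conserved in Cp2")

# GAA (Glu) and TAA (stop) differ at exactly one base.
print(hamming("GAA", "TAA"))  # 1
```

With these inputs, Phe and His come out at 50%, Met at about 53%, and Tyr at 100%, in line with the "~50%" and "all Tyr residues" statements above.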
TABLE II
Poorly conserved regions

Residues in Kp2   Residues conserved in Cp2   Substitutions in Cp2
23-33             4                           7
50-67             4                           14 (also a deletion of 2 amino acids)
73-78             1                           5
188-202           5                           10
245-276           12                          20

TABLE III
Amino acid compositions of Fe proteins (residues/molecule)

Amino acid       Kp2   Cp2   Av2
Aspartic acid    16    14    Asp + Asn = 30
Threonine        16    13    12
Serine           10    13    10
Glutamic acid    29    24    Glu + Gln = 35
Proline          8     9     9
Glycine          27    32    28
Alanine          29    20    28
Cysteine         9     6     7
Valine           22    19    25
Methionine       15    11    13
Isoleucine       24    20    20
Leucine          19    26    23
Tyrosine         9     12    9
Phenylalanine    6     5     7
Tryptophan       0     0     0
Lysine           16    16    16
Histidine        2     2     3
Arginine         13    12    14
Asparagine       12    8
Glutamine        10    11
Total            292   273   289

TABLE IV
Conservation of rare amino acids

Amino acid       Positions in Kp2                                   Residues conserved in Cp2
Phenylalanine    108, 121, 123, 135, 210, 271                       123, 135, 210
Tyrosine         8, 80, 115, 124, 148, 159, 171, 230, 240           8, 80, 115, 124, 148, 159, 171, 230, 240
Histidine        50, 209                                            209
Methionine       2, 29, 34, 58, 60, 137, 155, 156, 158, 207,        2, 29, 34, 137, 155, 156, 269, 274
                 225, 252, 261, 269, 274

In conclusion, we have shown that there is extensive sequence homology (67%) between the Fe proteins of two distantly related prokaryotes, the Gram-positive C. pasteurianum and the Gram-negative K. pneumoniae. The conservation of sequences was shown to be selective for regions of the protein (see Tables I and II). Thus, the homology observed is most likely related to function, and is probably not merely the result of the late evolution and dispersal of the nif genes. This sort of strong conservation, based on function, is not unique. The protein sequence of cytochrome c is highly conserved in eukaryotes, e.g. there is 55% homology between the cytochrome c proteins of the two widely separated species of horse and yeast (20). We can tentatively delineate functionally important regions of the Fe protein, although we cannot assign specific functions to them.
The areas that have diverged are presumably not essential to activity. This sort of divergence can occur even if the nif genes were originally spread by lateral transmission as suggested (12). In that case, however, one might expect the divergence not to show any particular pattern based on evolutionary trees, i.e. evolutionary trees derived from nitrogenase sequences would not correlate with trees derived, for example, from rRNA sequences (21). The limited sequence information from the Fe protein of the Gram-negative A. vinelandii (10) suggests that it is in fact more homologous to Kp2 than to the Fe protein of Gram-positive C. pasteurianum. In this context, it is worth noting that Ruvkun and Ausubel (7) found DNA hybridization of the K. pneumoniae Fe protein gene to A. vinelandii DNA, but not to C. pasteurianum DNA. These observations are suggestive of models based on divergence of preexisting nif genes; however, one could also account for them by assuming that the evolution of the nif genes subsequent to lateral transmission followed different paths in Gram-positives and Gram-negatives. Such questions can be answered only when considerably more data are available from other species. Furthermore, we need additional DNA sequence data from the nif genes of several species to locate silent substitutions that do not change the amino acid residues, in order to study evolutionary divergence independent of the enzyme function.

The understanding of the mechanism of an enzyme can be made possible by studying the properties of mutant enzymes with substituted amino acids.
Utilizing the sequence data presented here, it should be possible to use in vitro mutagenesis techniques to alter specific regions or even specific amino acid residues of the Fe protein in order to obtain altered nitrogenases which are deficient in specific functions such as ATP binding, electron transfer, and so on, or have altered EPR signals, etc.; one can then determine the nature of the substitution that caused the altered property. Together with the extensive genetic system already developed in K. pneumoniae and the detailed physical characterization of its nitrogenase components (11, 2), such studies should lead to an elucidation of the mechanism of this protein.

Acknowledgments-We would like to thank S. Brown for constructing pSB1; W. Orme-Johnson, J. Howard, and H. Evans for comments on the manuscript; J. Howard for providing sequence data on the Azotobacter vinelandii Fe protein prior to publication; R. Tizard for advice on DNA sequencing; R. Hyde for preparation of the manuscript; F. Fuller for use of computer programs; F. Lang for suggestions on lowering background in sequencing gels.

REFERENCES

1. Orme-Johnson, W. H., and Davis, L. C. (1977) in Iron-Sulfur Proteins (Lovenberg, W., ed) Vol. 3, pp. 16-58, Academic Press, New York
2. Eady, R. R., and Smith, B. E. (1979) in A Treatise on Dinitrogen Fixation (Hardy, R. W. F., ed) pp. 299-491, John Wiley and Sons, New York
3. Winter, H. C., and Burris, R. H. (1976) Annu. Rev. Biochem. 45, 409-426
4. Shah, V. K., and Brill, W. J. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 3249-3253
5. Hageman, R. V., and Burris, R. H. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 2699-2702
6. Eady, R. R., and Postgate, J. R. (1974) Nature 249, 805-810
7. Ruvkun, G. B., and Ausubel, F. M. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 191-195
8. Emerich, D. W., and Burris, R. H. (1978) J. Bacteriol. 134, 936-943
9. Tanaka, M., Haniu, M., Yasunobu, K. T., and Mortenson, L. E. (1977) J. Biol. Chem. 252, 7093-7100
10. Hausinger, R. P., and Howard, J. B. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 3826-3830
11. Roberts, G. P., MacNeil, T., MacNeil, D., and Brill, W. J. (1978) J. Bacteriol. 136, 267-279
12. Postgate, J. R. (1974) Symp. Soc. Gen. Microbiol. 24, 263-292
13. Cannon, F. C., Riedel, G. E., and Ausubel, F. M. (1979) Mol. Gen. Genet. 174, 59-66
14. Riedel, G. E., Ausubel, F. M., and Cannon, F. C. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 2866-2870
15. Rodriguez, R. L., Bolivar, F., Goodman, H. M., Boyer, H. W., and Betlach, M. (1976) in Molecular Mechanisms in the Control of Gene Expression (Nierlich, D. P., Rutter, W. J., and Fox, C., eds) pp. 471-499, Academic Press, New York
16. Clewell, D. B., and Helinski, D. R. (1969) Proc. Natl. Acad. Sci. U. S. A. 62, 1159-1166
17. Maxam, A., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
18. Shine, J., and Dalgarno, L. (1975) Nature 254, 34-38
19. Chou, P. Y., and Fasman, G. D. (1978) Annu. Rev. Biochem. 47, 251-276
20. Margoliash, E. (1972) Harvey Lect. 66, 177-248
21. Fox, G., Stackebrandt, E., Hespell, R., Gibson, J., Maniloff, J., Dyer, T., Wolfe, R., Balch, W., Tanner, R., Magrum, L., Zablen, L., Blakemore, R., Gupta, R., Bonen, L., Lewis, B., Stahl, D., Luehrsen, K., Chen, K., and Woese, C. (1980) Science 209, 457-463