Electrophoresis 1997, 18, 432-442

Probing protein function using a combination of gene knockout and proteome analysis by mass spectrometry

Paola Dainese¹, Werner Staudenmann¹, Manfredo Quadroni¹, Chantal Korostensky², Gaston Gonnet², Michael Kertesz³, Peter James¹

¹Protein Chemistry Laboratory, ²Computational Biology Research Group, ³Microbiology Institute, Swiss Federal Institute of Technology, ETH-Zentrum, Zurich, Switzerland

Recently the determination of the genome sequences of three procaryotes (Haemophilus influenzae, Methanococcus jannaschii and Mycoplasma genitalium) as well as the first eucaryotic genome (Saccharomyces cerevisiae) was completed. Between 40 and 60% of the genes were found to code for proteins to which no function could be assigned. We describe an approach which combines proteome analysis (mapping of expressed proteins isolated by two-dimensional polyacrylamide gel electrophoresis to the genome) with genetic manipulations to study the complex pattern of protein regulation occurring in Escherichia coli in response to sulfate starvation. We have previously described the upregulation of eight spots on two-dimensional (2-D) gels in response to sulfate starvation and the assignment of six of these to entries in the E. coli genome sequence (Quadroni et al., Eur. J. Biochem. 1996, 239, 773-781). Here we describe the identification of the remaining two proteins, which are encoded in a sulfate-controlled operon in the 21.5' region of the E. coli genome. Upregulated protein spots were cut from multiple 2-D gels, collected, and run on a modified funnel gel to concentrate the proteins and remove the sodium dodecyl sulfate before digestion. The peptide masses obtained from the digests were used to search the SwissProt database or a six-frame translation of the EMBL DNA database using a peptide mass fingerprinting algorithm.
A digest can be reanalyzed after deuterium exchange to obtain a second, orthogonal data set to increase the confidence level of protein identification. The digests of the remaining unidentified proteins were used for peptide fragment generation using either post-source decay in a matrix-assisted laser desorption ionization (MALDI) time-of-flight mass spectrometer or collision-induced dissociation (CID) coupled mass spectrometry (MS/MS) with triple stage quadrupole or ion trap mass spectrometers. The spectra were used as peptide fragment fingerprints to search the SwissProt and EMBL databases.

Correspondence: Dr. Peter James, Protein Chemistry Laboratory, Universitaetstrasse 16, ETH-Zentrum, CH-8092 Zurich, Switzerland (Tel: +41-1632-2919; Fax: +41-1632-1213; E-mail: bcmass@bc.biol.ethz.ch)

Nonstandard abbreviations: AMU, atomic mass units; CID, collision-induced dissociation; MS/MS, fragmentation analysis using coupled mass scanning devices; ORF, open reading frame; PSD, post-source decay analysis; SSI, sulfate starvation-induced; TOF, time of flight; WWW, World Wide Web

Keywords: Proteome / Gene knockout / Fingerprinting / Sulfate starvation / Mass spectrometry

© VCH Verlagsgesellschaft mbH, 69451 Weinheim, 1997

1 Introduction

In the past year and a half, three complete bacterial genome sequences (Haemophilus influenzae [1], Mycoplasma genitalium [2], and Methanococcus jannaschii [3]) as well as a eucaryotic genome (Saccharomyces cerevisiae, available on the internet at http://genome-www.stanford.edu/Saccharomyces/) were completed. About 45% of the open reading frames (ORFs) identified in the procaryote genome sequences of M. genitalium (470 ORFs), H. influenzae (1743 ORFs) and Escherichia coli (an estimated 4100 ORFs according to the Japanese genome project, http://bsw3.aistnara.ac.jp) code for proteins of unknown function [1, 2]. The main challenge arising out of the genome projects will be to develop methods to assign functions to the various ORFs. In this article we describe our attempts to combine genetic approaches (random gene knockout or activation) with proteome analysis (mapping changes in gene expression by 2-D gel electrophoresis and mass spectrometry) in order to assign functions to proteins involved in the sulfate starvation response in E. coli.

Bacteria must be able to respond rapidly to changes in their environment since they are relatively immobile. They can modulate the expression of individual genes or of large groups of genes (regulons, which are sets of operons with a common regulator) in response to the needs of the organism. The entire set of genes responding to an environmental stimulus is termed a "stimulon" [4]. Such global regulation systems are well established in E. coli for the assimilation of carbon, nitrogen and phosphorus [5, 6] and recently for sulfur [7]. Sulfur comprises ca. 1% of the dry weight of a cell [8] and is assimilated primarily from sulfate in the cysteine biosynthetic pathway. Sulfate uptake and assimilation are under the control of the "cys regulon" [9]. Sulfate is bound by a periplasmic sulfate-binding protein and moved across the cytoplasmic membrane by two channel-forming membrane proteins which are bound to a cytoplasmic nucleotide-binding subunit. The sulfate is reduced to sulfite and then to sulfide before reacting with O-acetylserine to form cysteine. If the sulfide concentration in a cell drops, the amount of O-acetylserine rises, as does that of N-acetylserine, which is formed by an irreversible N-O migration of the acetyl group.
Full expression of the cys regulon requires the presence of N-acetylserine and the transcriptional activator protein CysB, which positively regulates the genes of the cys regulon [10]. In soil, however, a high percentage of sulfur is present in an organic form such as sulfate esters, sulfamates, amino acids and sulfonic acids [11]. In order to study how E. coli survives in soil under sulfate starvation conditions, we grew the bacteria in vitro with ethanesulfonate as the sole sulfur source. We have previously shown that eight proteins are induced by a factor of at least 2 and have identified six of them [12]. We describe here the identification of the other two proteins and outline the evidence that a sulfate starvation regulon exists in E. coli.

0173-0835/97/0304-0432 $17.50+.50/0

2 Materials and methods

2.1 Materials

Acrylamide, N,N'-methylenebisacrylamide and carrier ampholytes for 2-D electrophoresis were purchased from BDH (Poole, England); CHAPS and NP-40 were from Sigma (Buchs, Switzerland); Coomassie Brilliant Blue (Serva Blue G) was from Serva (Heidelberg, Germany); SyPro Orange and Red were from Molecular Probes (Eugene, OR, USA); Immobiline™ strips were from Pharmacia (Uppsala, Sweden). All other reagents for 2-D electrophoresis were the highest purity grade available from Fluka (Buchs, Switzerland). Fluorotrans PVDF membrane was obtained from Pall (Muttenz, Switzerland) and β-octylglucopyranoside was from Pierce (Rockford, IL, USA). Sequencing grade modified trypsin was purchased from Promega (Zurich, Switzerland) and DNase and RNase were from Boehringer (Mannheim, Germany). All HPLC solvents used were from Riedel-de Haën (Seelze, Germany).

2.2 Bacterial culture and cell extraction

Escherichia coli MC4100 (F- araD139 Δ(argF-lac)U169 rpsL150 relA1 deoC1 ptsF25 rbsR flbB5301) [13] was obtained from the laboratory collection.
Bacteria were cultivated in a sulfur-free, synthetic glucose-salts medium as previously described [7], with the addition of either inorganic sulfate (500 µM) or ethanesulfonate (500 µM) as the sole sulfur source. The cultures were grown aerobically on a rotary shaker (180 rpm) at 37°C, and growth was monitored spectrophotometrically at 650 nm. Cells were harvested in the mid-exponential phase (A650 = 0.5) by centrifugation (7000 × g, 10 min, 4°C) and washed with 50 mM Tris/HCl, pH 7.0. They were then resuspended in the same buffer (0.8 g wet mass per mL) and ruptured by three passages through a chilled French pressure cell at 135 MPa. Ten µL of 10 mM Tris/HCl, pH 7.5, was added per 200 µL of pellet, followed by 20 µL of 1% w/v SDS, 150 mM DTT. The solution was kept at 95°C for 5 min and then DNase I (50 µg/mL) and RNase A (10 µg/mL) were added and incubated for 30 min at 37°C. Cell debris was removed by centrifugation (12 000 × g, 30 min, 4°C).

2.3 2-D gel electrophoresis

The first-dimensional Immobiline strips were run in batches of 20 on a Multiphor II system. The Immobiline strips were reswollen overnight in 8 M urea, 2% CHAPS, 10 mM DTT, 0.8% carrier ampholytes pH 4-8. Typically, 400 µg of protein was loaded on each Immobiline strip when using the Pharmacia cup system. The strips were focused at 300 V for 3 h to allow the samples to enter the gel, then the voltage was ramped up to 3500 V over 6 h and run at 3500 V for 24 h. Large loadings were carried out using a gel rehydration cassette as described by Rabilloud [14]. After the focusing was complete, the strips were equilibrated for 20 min in 50 mM Tris/HCl, pH 6.8, 6 M urea, 25% glycerol, 0.2% SDS, 30 mM DTT, before changing the buffer and incubating for 5 min in 50 mM Tris/HCl, pH 6.8, 6 M urea, 25% glycerol, 0.2% SDS, 65 mM iodoacetamide. The strips were transferred onto 12% polyacrylamide gels (Laemmli system).
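As a rough cross-check of the focusing programme given above, the total volt-hours applied to a strip can be estimated from the three steps. The short sketch below is illustrative only: the function name `volt_hours` is hypothetical, and the ramp from 300 V to 3500 V is assumed to be linear, which the text does not state.

```python
def volt_hours(steps):
    """Sum volt-hours over (start_V, end_V, hours) steps.

    Each step is treated as a linear change from start_V to end_V,
    so its contribution is the mean voltage times the duration.
    """
    return sum((v0 + v1) / 2.0 * hours for v0, v1, hours in steps)


program = [
    (300, 300, 3),      # sample entry: 300 V for 3 h
    (300, 3500, 6),     # ramp to 3500 V over 6 h (assumed linear)
    (3500, 3500, 24),   # focusing: 3500 V for 24 h
]

print(volt_hours(program))  # 96300.0
```

Under these assumptions the run corresponds to about 96 kVh, almost all of it contributed by the 24 h plateau.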
The second-dimensional gels were run in a 40 L tank at 12-15°C overnight at a constant current of 400 mA (using a running buffer of 25 mM Tris, 200 mM glycine, 0.1% w/v SDS) using an Iso-Dalt apparatus (Hoefer, San Francisco, CA, USA). Six batches of gels were run, each from a different preparation of E. coli, to ensure the reproducibility of the sulfate starvation induction effect. Gels were stained with Coomassie Brilliant Blue according to the protocol of Schägger and von Jagow [15], since it produces a very clear background, suitable for scanning densitometry. Gels were fixed overnight in methanol/acetic acid/water (50/10/40 v/v), stained in 0.025% w/v Serva Blue G in 10% v/v acetic acid for 3 h and then destained in 10% v/v acetic acid. Silver staining was carried out according to Doucet and Trifaro [16]. The gels were fixed in 40% ethanol, 10% acetic acid overnight and then washed with Millipore UltraP water (3 × 20 min) before sensitization for 30 min with a solution of 5 mg/L DTT. Silver impregnation was carried out by soaking in 0.1% AgNO3 for 30 min before washing twice with water for 30 s. The gels were developed with a solution of Na2CO3 (30 g/L) with 300 µL/L formaldehyde (37% v/v in water). Gels were then placed in a stop solution of 1% acetic acid for storage. Wet gels were scanned in a Personal Densitometer (Molecular Dynamics, Sunnyvale, CA, USA) and image analysis and spot matching were performed using the Investigator software package (Millipore, Bedford, MA, USA) on a Sun workstation. The pI and Mr axes were calibrated using the known values from two proteins, alkyl hydroperoxide reductase C-22 and the sulfate-binding protein, as described in [12].

2.4 Protein elution, concentration and electrotransfer onto PVDF membrane

Individual proteins were excised from multiple 2-D gels of E. coli grown under sulfate starvation conditions and concentrated to single sharp bands using a funnel-shaped gel electrophoresis device (Dainese, P. et al., unpublished). Proteins were then electroblotted onto PVDF membranes in a semidry apparatus (Hoefer) in a buffer containing 50 mM Tris/HCl, 192 mM glycine, 0.02% w/v SDS, 10% v/v methanol, 2 mM DTT, for 1 h at approximately 1.2 mA/cm². Proteins were visualized on the membranes by a 5 min incubation in 0.1% w/v Serva Blue R in 50% v/v methanol, followed by destaining in 70% v/v methanol for 5-10 min.

2.5 Protein digestion

Dried PVDF membrane slices were cut into small pieces (1 mm²) and equilibrated for 1 h at room temperature in 10 µL of 1% w/v β-octylglucopyranoside, 100 mM ammonium bicarbonate (pH 7.8). Digestion was initiated by the addition of 1 µg of Promega trypsin (1 µg/µL in the same buffer) and carried out for 15 h at room temperature. Digestion was stopped by adding 1 µL of 2% v/v TFA to the sample. The supernatant was collected, the membrane pieces were washed twice with 10 µL of 0.1% v/v TFA, and the washes were pooled with the supernatant.

2.6 MALDI-MS and post-source decay (PSD) analysis

MALDI time of flight (TOF) mass spectra were accumulated using a Voyager Elite (PerSeptive Biosystems, Framingham, MA, USA). Samples were acidified with 10% TFA to lower the pH to below 3 for cocrystallization with α-cyano-4-hydroxycinnamic acid (5 mg/mL in 50% acetonitrile, 50% 0.1% TFA in water) on a 100 position sample tray. The crystals were washed (three times) by covering the spot with a drop of ice-cold water (for 5 s), which was then removed by suction using a fine pipette. Samples were analyzed in pulsed extraction reflector mode using an accelerating voltage of 20 kV, a pulse delay time of 75 ns, a grid voltage of 55% and a guide wire voltage of 0.05%. After the 100 positions had been analyzed, each sample was wetted with 1 µL of deuterium oxide and allowed to dry.
This was repeated four times in an airtight glove box in which the atmosphere was saturated with deuterium oxide and kept under a slight overpressure of nitrogen. The glove box was located immediately above the sample inlet to the mass spectrometer, from where the target could be moved into the instrument without exposure to the air. PSD spectra were accumulated from the same spots using a 50% higher incident laser power, setting the timed ion selector to the mass of interest and varying the mirror ratio from 1 to 0.02. The guide wire voltage was lowered to 0.02% for mirror ratios below 0.3 and spectra were accumulated for 64 to 256 shots per mirror ratio setting.

2.7 Automated peptide fragmentation by collision-induced dissociation (CID)

Digestion mixtures were separated by reversed-phase HPLC on a capillary column (C18, 5 µm, 300 Å, 280 × 0.05 mm) from LC Packings International (Zurich, Switzerland) directly connected to a Finnigan MAT (San Jose, CA, USA) TSQ 700 triple quadrupole mass spectrometer equipped with an electrospray ionization source using a coaxial flow of 1.5 µL/min methoxyethanol as a sheath liquid. The total flow was 3 µL/min and the column was washed extensively with solvent A (0.1% v/v TFA in H2O) before running a 60 min linear gradient from 0 to 70% solvent B (80% v/v acetonitrile, 0.08% v/v TFA). A program, Autofrag, which automates the collection of CID fragmentation spectra from unknown samples, was written in Finnigan MAT instrument control language [17]. The program was subsequently modified to incorporate features of a similar program presented by Dr. Terry Lee at the 1993 American Society for Mass Spectrometry meeting [18]. The program monitors the masses of the peptides eluting from the column, using the first quadrupole Q1 to scan the mass range from 500 to 2000 atomic mass units (amu) every 4 s (with 2 mTorr argon in Q2 but with the collision offset voltage set to zero).
The program switches to MS/MS mode every time a signal lasting more than two scans is detected with a signal/noise ratio > 5. Quadrupole Q1 isolates the selected ion, which undergoes fragmentation in the second quadrupole, filled with a collision gas (argon) to a pressure of 2 mTorr. The program returns to scanning Q1 after 5 MS/MS scans so that less intense, coeluting peptides can also be analyzed. The collision offset voltage (the voltage for accelerating the ions in the collision chamber, Q2) is automatically adjusted to a value determined by the mass of the ion selected. The resulting fragments are analyzed with the third quadrupole Q3 [19], scanning the mass range from 50 to 2000 amu in 3.0 s. This procedure allows both parent ion mass measurement (for peptide mass fingerprinting) and sequence analysis by fragmentation (for peptide fragment fingerprinting) to be carried out during the same HPLC run. The program has an optional lookup table which contains masses of common contaminants such as trypsin and keratin fragments and nonpeptidic ions arising from the gel. These masses are not used for carrying out MS/MS analysis.

3 Results

In order to search for proteins expressed at low levels which are induced by sulfate starvation, we changed from using carrier ampholyte-based tube gels in the first dimension to a flat-bed Immobiline system. Using the gel rehydration method of sample loading described by Rabilloud [14], it was possible to increase the loading from 100 µg to 10 mg per strip. The 2-D images obtained using a first dimension of carrier ampholytes in tube gels were very different from the ones obtained using Immobiline strips in the first dimension; therefore, we repeated the sulfate starvation study with the new system. Figure 1 shows a 2-D gel of wild-type E. coli grown in the presence of 500 µM inorganic sulfate (left panel) and wild-type E. coli grown with 500 µM ethanesulfonate (right panel).
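The precursor selection rule of the Autofrag program described in Section 2.7 (a signal lasting more than two survey scans with a signal/noise ratio above 5, skipping masses in the contaminant lookup table) can be sketched as follows. This is not the Finnigan instrument control language program itself, only a minimal Python illustration under stated assumptions: the function name, the integer mass binning, the 0.5 amu exclusion tolerance and the rule that a mass is fragmented only once are all hypothetical, and the five MS/MS scans per precursor and the mass-dependent collision offset are left out.

```python
def select_precursors(survey_scans, exclude=(), min_sn=5.0, min_scans=3, tol=0.5):
    """Return (scan_index, mass) pairs at which MS/MS would be triggered.

    survey_scans: list of dicts mapping peptide mass -> signal/noise ratio,
                  one dict per Q1 survey scan.
    exclude:      contaminant masses (lookup table) that are never selected.
    min_scans:    3 consecutive scans, i.e. "lasting more than two scans".
    """
    run_length = {}     # integer mass bin -> consecutive scans above threshold
    fragmented = set()  # bins already sent to MS/MS (assumption: only once)
    triggers = []
    for index, scan in enumerate(survey_scans):
        present = set()
        for mass, sn in scan.items():
            if sn <= min_sn:
                continue
            if any(abs(mass - m) <= tol for m in exclude):
                continue
            key = int(mass)  # crude binning, for illustration only
            present.add(key)
            run_length[key] = run_length.get(key, 0) + 1
            if run_length[key] >= min_scans and key not in fragmented:
                fragmented.add(key)
                triggers.append((index, mass))
        # A signal that disappears for a scan has to start over.
        for key in list(run_length):
            if key not in present:
                del run_length[key]
    return triggers


# Hypothetical survey scans; 842.5 stands in for a lookup-table contaminant.
scans = [
    {724.5: 8.0, 600.2: 3.0},
    {724.5: 9.0, 600.2: 6.0},
    {724.5: 7.5, 842.5: 20.0},
    {724.5: 6.0, 842.5: 25.0},
    {842.5: 22.0},
]
print(select_precursors(scans, exclude=[842.5]))  # [(2, 724.5)]
```

With the exclusion list empty, the same input would also trigger on the 842.5 signal once it has persisted for three scans, which is the behavior the lookup table is meant to suppress.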
Eight spots were found which are upregulated by a factor of at least 2, corresponding to the ones previously observed [12]. Interestingly, several of the spots had changed their relative positions and spots 2 and 3 showed a reversal of apparent pI values. The spots were excised from sixteen gels, concentrated using the funnel gel system and then digested. The amount of each protein used for identification (estimated from the integrated intensity of Coomassie staining of the proteins in comparison with that of a dilution series of bovine serum albumin) was between 1 and 50 pmol. The resulting peptides were extracted into a final volume of ca. 20 µL. The digests were analyzed according to a three-tier mass spectrometry approach to data collection for protein identification and characterization (represented schematically in Fig. 2).

Figure 1. 2-D PAGE mapping of Escherichia coli grown in the presence or absence of inorganic sulfate. E. coli was grown with 500 µM sulfate (left panel) or 500 µM ethanesulfonate (right panel) as the sole sulfur source. All the proteins which were induced during growth with ethanesulfonate by more than a factor of 2 are labeled. Two proteins which were expressed at the same level under both growth conditions are labeled as EC9A and EC19A.

3.1 First-level mass spectrometric analysis: peptide mass fingerprinting

One µL of the protein digest was used for analysis by MALDI-TOF-MS. Reproducibly good spectra were obtained at the 100 femtomole level for protein digests containing octylglucoside. The sample was pipetted, together with an equal volume of matrix, onto the surface of a 100 position target and allowed to dry. The spectra were accumulated in an automated overnight run and the digests giving poor or no spectra were reanalyzed manually the next day.
The stability of the mass calibration (± 0.3 amu) was high enough that no internal calibration was required. The spectral accumulation used less than 5% of the material deposited on the target (which could be kept for weeks and reexamined at will). The samples were then remeasured after deuterium exchange and the two sets of peptide molecular masses obtained (the native and deuterated peptide mass fingerprints) were used to search protein (SwissProt) and nucleic acid sequence (EMBL) databases using the program MassSearch [20, 21]. Figure 3 shows the MALDI-TOF mass spectrum of the trypsin digest of spot EC9a, which was used as a control, since the level of its expression was the same in the presence or absence of inorganic sulfate. Table 1 shows the MassSearch output for the single digest. The confidence level of the score, i.e., the difference between the correct match and the next nonrelated protein, was 15.9. In order to confirm the identification as elongation factor Tu, a second data set, usually obtained by deuteration of the first, was used as an orthogonal fingerprint. Table 2 shows the increase in confidence (from 15.9 to 140.3) achieved by using two data sets (here with two digestions). If the protein was positively identified, the relevant database entry was downloaded and examined in detail. All the peptide masses observed were checked against the sequence, allowing for partial digestions and modifications such as oxidation, deamidation, carbamylation, etc. The masses which did not match were used to search the database again to check if two (or more) proteins were present in the same spot.
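The matching step that underlies peptide mass fingerprinting can be illustrated with a short sketch: a candidate sequence is cleaved in silico with trypsin rules, average peptide masses are computed, and observed masses are counted as matches if they fall within the calibration window of ± 0.3 amu quoted above. This is a simplified illustration, not the MassSearch algorithm: it only counts matches and does not compute MassSearch's probability-based score, it ignores partial digestion and modifications, and all function names and the toy sequence are hypothetical. The residue masses are standard average values rounded to two decimals.

```python
import re

# Standard average residue masses (Da), rounded to two decimals.
AVG_RESIDUE = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02


def tryptic_peptides(protein):
    """Cleave after K or R, except before P (needs Python 3.7+ for
    splitting on zero-width matches)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def average_mass(peptide):
    """Average mass of the neutral peptide (residues plus one water)."""
    return sum(AVG_RESIDUE[aa] for aa in peptide) + WATER


def count_matches(observed, protein, tol=0.3):
    """Number of observed masses within tol of any predicted tryptic mass."""
    predicted = [average_mass(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(m - t) <= tol for t in predicted) for m in observed)


# Toy sequence built from two short peptides, for illustration only.
toy = "THPVSGKAQAAFAR"
print(tryptic_peptides(toy))                      # ['THPVSGK', 'AQAAFAR']
print(count_matches([724.8, 733.9, 1500.0], toy))  # 2
```

A database search repeats this comparison for every entry (or for every frame of a translated DNA entry) and ranks the entries; a second, orthogonal mass list such as the deuterated fingerprint would simply be matched in the same way against its own set of predicted values.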
The first stage of analysis, peptide mass fingerprinting, conclusively identified three of the proteins as sulfate binding protein (spot 2), cysteine synthase A (spot 5 ) and alkylhydroperoxide reductase (spot 8) in the SwissProt database, whilst the fourth protein, the j7iY gene product, a cystine binding protein (spot 7), could only be identified as an entry in the DNA database using a dual data set. As a general rule, if a protein was present in a protein sequence database, a set of six masses from a single digest was sufficient to identify the protein correctly with the levels of mass accuracy achieved using automated data collection. For proteins in the six-frame translation of the DNA database, a dual digestion with four masses per digest was found to be the minimum required under the same data accumulation conditions. Dual data sets do not require more material, since a deuterated data set can be obtained by reusing the samples on the MALDI target. After single, dual and high mass accuracy searches, four proteins remained unidentified. These were used for the second-level analysis. 3.2 Second-level mass spectrometric analysis: peptide fragment fingerprinting After peptide mass fingerprinting, 19/20 of the protein digest still remained and 1/20 was on the MALDI-TOF 436 P. Dainese et Electrophoresis 1997. 18, 432-442 a/. digests showed peptides that were sufficiently mass-separated that one could attempt sequencing by daughter ion analysis in a reflectron TOF-MS using MALDI. A second aliquot of 1 pL was used for PSD [22]. The parent ion was selected by a timed ion gate which can resolve ions separated by a least 40 amu from the next ion (up to a mass of ca. 2000). PSD analysis is useful for poor digestions when a few peptides, well separated by mass, are obtained, though at least 100-200 femtomoles of material is required. The high energy fragmentation patterns differ from those obtained using a low energy regime such as in an ion trap or triple quadrupole MS. 
The spectra could be used directly with the high energy version of the Sequest program [23] for database searching, or interpreted manually, and a peptide tag used as input to the Peptidesearch program [24]. Figure 4 shows the PSD spectrum of peptide mass 1275.6 from the control protein, EC19a. Database searches using the uninterpreted spectrum with high energy Sequest and using a partial TAG sequence with the Peptidesearch program identified the protein as E. coli B-lactamase. target after fingerprint analysis (since only a few percent of the material loaded was actually consumed during data accumulation). On average, about one in four + ~ FINGERPRINT -l Q POSlTlV I 1 1 I DEUTERATE DIGEST I Since none of the remaining four protein digests gave a peptide mass distribution suitable for PSD analysis, the digests were used for auto-HPLC-MUMS peptide fragment fingerprinting. Half of the remaining 18/20 was used for peptide fragment fingerprinting by HPLC-MSI MS. The samples were loaded into 50 pm internal diameter capillaries packed with C-18, 300 A, 5 pvn reverse-phase material and eluted by a gradient of acetonitrile in water into the triple stage quadrupole or ion trap mass spectrometers for automated on-line HPLC MSIMS data collection. This procedure generated (on average) partial to complete sequence spectra for >70% of the peptides in the digest extract. Figure 5 shows the auto MS-MS run for protein 3. The masses of the eluting peptides are listed above the respective total ion current (TIC) peaks. The spectra for each of the ca. 40 masses chosen for MSIMS were averaged and written to separate files for subsequent database searching with the Sequest program [25]. Table 3 shows the result of a single peptide fragment fingerprint search from protein spot 3. Four other MS/MS spectra matched with this A SEARCH SEARCH MANUALLY SEARCH Figure 2. Logical flow diagram of the strategy used for protein identification. 
1962.1 20,000_) I 1 1233.4 21 17.5 1781.2 2240.1 2729.5 I 2240.1 I I 1,000 1,500 I 2,000 Mass (m/z) Figure 3. MALDI-TOF spectrum of the tryptic digestion of protein EC9a. I I 2,500 I 3,000 Probing protein function by gene knockout and proteome analysis Elecrrophoresis 1997, 18, 432-442 Table 1. Protein identification using single digest data") Number Scoreb' n k AC DE/OS 1 79.5 14 3 2 79.5 14 3 3 63.6 6 2 4 63.1 7 2 5 62.4 5 2 PO2990 Unmatched P21694 Unmatched PO973 1 Unmatched P32800 Unmatched 401360 Unmatched Elongation factor Tu, E. coli weights: 1233.4; 2240.8 Elongation factor Tu, S. typhimurium weights: 1233.4; 2240.8 Protein HXLF5.Cytomegalovirus weights: 1962.1; 2117.5: 2240.8 CRTl protein. S. cerevisiae weights: 1233.4; 1781.2; 1962.1 Aliphatic amidase. R . eythropolis weights: 1781.2; 2117.5; 2240.8 a) The table shows the Masssearch output using the tryptic peptide masses of protein EC9a measured by MALDITOF, shown in Fig. 3. Searching SwissProt release 32 using average masses. Scores lower than 60 are probably not significant. The average tryptic fragment masses used were: 1233.4. 1781.2, 1962.1, 2117.5, 2240.8 b) Score is the inverse logarithm of the probability of the matches occurring at random; n is the number of peptides in the matching protein which have masses between the lowest and highest match used in the search; k is the number of peptides showing matches. AC is the accession number of the matching protein in the SwissProt database. DE is the SwissProl annotated description of the protein. Table 2. Protein identification using orthogonal data sets: Masssearch output for EC9a"' Number Score n k n k AC DElOS 1 236.9 14 3 5 5 PO2990 2 236.9 14 3 5 5 P21694 3 162.0 12 3 6 4 P43926 4 104.1 12 4 7 3 P29542 7 96.6 7 2 4 3 P35647 Elongation factor Tu, E. coli Unmatched tryptic: 1233.4; 2240.8 All AspN weights matched Elongation factor Tu, S. typhimurium Unmatched tryptic: 1233.4; 2240.8 All AspN weights matched Elongation factor Tu, H. 
influenzae Unmatched tryptic: 1233.4; 2240.8 Unmatched AspN: 1289.4. Elongation factor Tu, S. ramocissimus Unmatched tryptic: 2240.8 Unmatched AspN: 1196.3; 1289.4 Hemagglutinin 1, E. corrodens Unmatched tryptic: 1962.1, 2117.5, 2240.8 Unmatched AspN: 598.6; 1196.3 a) Searching SwissProt release 32 using average masses. Scores lower than 90 are probably not significant. The tryptic fragment masses used were: 1233.4, 1781.2, 1962.1, 2117.5, 2240.8, and for AspN the fragment masses were: 598.6, 929.1, 1071.3, 1196.3, 1289.4. Abbreviations as in Table 1. ARG N 0 r r 300 0 r X v) 20- CI S 3 s 10- I 200 400 600 800 1000 1200 Mass m/z Figure 4. PSD spectrum of peptide mass 1275.6 from the control protein, EC19a. The sequence tag used for searching is indicated. The ion series can be easily identified since all occur in groups showing a characteristic mass separation (a+28=b, b+17=c). The protein was identified by Peptidesearch and high energy Sequest as p-lactamase. 437 43 8 P. Dainese 1001 PI al. Electrophoresis 724.5 ol MS Scan 80- ‘x50 10+200 I 8 530.2 60. 573.2 Select only 725 -64 1997, 18. 402-442 I 1724 MS/MS Scan I 486 436 I 4 1 I I 40- 20- I, 600 1000 I 1400 100 200 300 400 500 600 700 8 1800 masskharge ION CURRENT TRACE Figure 5. On-line HPLC auto-MS and MS/MS peptide fragment fingerprint accumulation for the tryptic digest of SSI protein #3. The bottom panel shows the intensity of ions entering the mass spectrometer from the HPLC against time, expressed as scan numbers (scanning from mass 500 to 2000 in 4 s). The masses of the eluting peptides are given above the peaks. In this example, a peptide of mass 724.5 in scan 56 (top left) was identified by the Autofrag program as a candidate for sequence analysis and was then isolated and collisionally activated to give an MS/M!I or fragmentation spectrum (top right). Manual interpretation of the spectrum gave the sequence Thr-His-Pro-Val-Ser-Gly-Lys. Table 3. 
Sulfate starvation-induced protein #3 is found in the databases using peptide fragment fingerprinting a)
Sequest search result Number (M+H)+ 1 2 3 ORFl 4 5 deltCn Ions Access # 775 715 715 0.0000 8/11 0.1561 0.2272 15/25 91 18 tauD 4-Coumarate-CoA ligase 1 Transposon TXI, hypothetical protein 715 715 0.2655 0.3899 8/19 1113 1 G2-specific protein kinase Phosphoribosylamine-glycine ligase
a) Sequest output using the MS/MS spectrum of tryptic mass 775 from sulfate starvation-induced protein #3 to search the database.

region of DNA in the 8.5' region of the E. coli chromosome. Eight MS/MS spectra from spot 1 matched an adjoining region around 8.5'. Figure 6 shows these peptides aligned against a translation of the genomic sequence. Peptides from the automated HPLC-MS/MS runs of spots 4 and 6 were subsequently found grouped together in the 21.3' region of the E. coli chromosome (see Table 4).

[Figure 6: translation of the DNA sequence in the three forward reading frames, with matched peptides shaded]
Figure 6. Alignment of the peptide matches found by Sequest for the tryptic digest of SSI protein #1 with the three forward reading frames of a preliminary DNA sequence from the E. coli 8.5' region. The fragmentation spectra from the automated HPLC-MS/MS of the tryptic digest of SSI protein #1 were matched against our E. coli database using the program Sequest. The peptides produced fragmentation spectra which matched amino acid sequences (indicated by shaded boxes) in all three forward reading frames of an unpublished stretch of DNA sequence from E. coli.

3.3 Third-level mass spectrometric analysis: subtractive MS/MS

If the protein was successfully identified, the 9/20 of the digest remaining after peptide mass and peptide fragment fingerprint analysis was analyzed by HPLC-MS/MS using a modified version of AutoFrag called ModFrag. The program ignores all the masses in a user-defined list (e.g., the tryptic masses predicted from the genome sequence) and sequences only the unknowns. The resulting MS/MS spectra, which come from ions not predicted from the known sequence, are used to search the databases again in order to detect peptides from comigrating proteins, or possible DNA sequencing errors leading to amino acid substitutions and reading frame shifts. The nonmatching spectra remaining after both searches are then manually interpreted to give a complete sequence [19] or used for a peptide tag search using the method of Mann [24]. This helps to pick up modified peptide masses such as those due to post-translational modifications. None of the eight proteins analyzed showed any unexpected masses.

3.4 Loss of specific function by random gene knockout

In an attempt to further define the functions of these last four proteins, an independent approach was taken. Random mutants were generated using a phage carrying a promoterless β-galactosidase gene which inserts randomly into the E. coli chromosome. The resultant fusion strains were screened for mutants showing increased β-galactosidase activity under sulfate starvation conditions. If the phage inserts into a gene, the corresponding protein spot on a 2-D gel will disappear. Figure 7 shows a 2-D gel of wild-type sulfate-starved E. coli (left panel) and of the mutant strain 108 (right panel). The insertion site of the phage was found to be in the 8.5' region of the chromosome, in the middle of the ORF identified by peptide fragment fingerprinting of spot 3. The insertion mutants which showed genes under sulfate control were then screened using a series of sulfur sources in the media to find which substrates they could grow on. In seven of the nine mutants analyzed, the phage had integrated in the 8.5' region, three of them into the region coding for the spot 3 protein. Interestingly, mutant 108 could grow using ethanesulfonate but not taurine as the sulfur source.

4 Discussion

All bacteria require sulfur and phosphorus for growth, and these are usually supplied in laboratory growth media in the form of inorganic phosphate and sulfate. The growth of E.
coli under phosphate-limited conditions leads to increased expression of the pho regulon, an ensemble of 81 or more proteins [6], many of which are known to be directly involved in phosphate metabolism [27]. A similar negatively regulated system was proposed to exist for sulfur metabolism, since a set of proteins (SSI proteins) can be seen by 2-D electrophoresis to be upregulated when E. coli is grown using compounds other than cysteine or sulfate as the sole source of sulfur [7]. We have recently described the identification of six SSI proteins seen to be upregulated on Coomassie blue-stained 2-D gels [12]. A similar sulfate response is shown by Pseudomonas putida and Staphylococcus aureus, which exhibit 14 and 10 SSI proteins, respectively [7]. The similarities suggest that mechanisms for the compensation of sulfate starvation might be widespread among bacteria. By analogy with the phosphate starvation response system, we believe that these proteins are involved in the assimilation of organic sulfur sources when inorganic sulfate is scarce.

Eight proteins were seen to be upregulated when comparing Coomassie-stained 2-D gels of E. coli grown in the presence of ethane sulfonate instead of sulfate as a sulfur source. We have described the identification of six of these from 2-D gels using tube gels with carrier ampholytes in the first dimension. In order to identify the remaining two proteins we switched to Immobiline strips as the first dimension to increase the amount of material which could be loaded. Since the 2-D patterns obtained were different, we had to repeat the identification of all the SSI spots as well as of the triangulation markers used for gel comparisons.

Table 4. Peptide fragment fingerprints from sulfate starvation-induced proteins 4 and 6 determined by MS/MS which match in the E. coli genome 21.6' region a)
Protein spots: SSI 4, SSI 6
Peptide mass (MH+)    Matching sequence in genome
716.9                 FDSPAXK
119.9                 AAYSGAXK
1069.5                TXXDXXPER
1254.6                DVQVPDXXSLR
133.9                 AQAAFAR
890.1                 TDSVGQQR
909.1                 ETVDFNGK
a) X indicates Ile or Leu since they are isobaric.

Figure 7. 2-D PAGE mapping of Escherichia coli wild type and 108 mutant grown in the absence of inorganic sulfate. E. coli wild type (left panel) and the 108 mutant (right panel) were grown with 500 µM ethane sulfonate as the sole sulfur source. The magnified insets show the presence of SSI protein 3 in the wild type and its absence in the mutant.

Four of the SSI spots could be identified by peptide mass fingerprinting using MALDI spectra: SSI spot 8 was found to be alkylhydroperoxide reductase; SSI spot 2, sulfate binding protein; SSI spot 5, cysteine synthase A; and SSI spot 7, cystine binding protein. This was in agreement with our previous results obtained by HPLC-MS electrospray on a TSQ MS. The MALDI data accumulation, however, took only 15 minutes whereas the HPLC-MS required 7 h. The four remaining spots, SSI 1, 3, 4, and 6, could not be identified, even using dual data set searching.

The final four spots were analyzed by HPLC-auto-MS/MS to obtain peptide fragment fingerprints. Matches for peptide MS/MS spectra were found for all four proteins in this way. However, if peptide fragment fingerprinting could find these proteins, why had the peptide mass fingerprinting failed? The answer was immediately obvious when one considered Fig. 6. The reading frame was shifting so often that a high score could never be obtained in a single reading frame stretch. Spots 1 and 3 were found in the 8.5' region (between 384.600-385.562 and 386.339-387.166 kbp, respectively) of the E. coli chromosome and spots 4 and 6 in the 21.5' region (between 993.944-994.517 and 991.500-992.643 kbp, respectively). The E. coli DNA sequence used for the searches was downloaded from the Japanese E. coli genome project group's WWW site at http://bsw3.aistnara.ac.jp.
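The effect of frame shifts on database matching can be illustrated with a minimal Python sketch. It translates a DNA string in the three forward reading frames and reports the frame in which each peptide is found, treating X as Ile or Leu as in Table 4. The names `translate`, `find_peptide` and `TOY_DNA` are hypothetical, and the toy sequence is invented for illustration (two Table 4 peptides back-translated with arbitrary codons and separated by one extra base). It is not the E. coli sequence, and the matching shown is a plain string search, not the Sequest algorithm.

```python
import re

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame):
    """Translate one forward reading frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "?")
        for i in range(frame, len(dna) - 2, 3)
    )


def find_peptide(peptide, dna):
    """Return (frame number 1-3, residue offset) for every match of the peptide.

    'X' in the peptide matches Ile or Leu, which are isobaric.
    """
    pattern = re.compile("".join("[IL]" if aa == "X" else aa for aa in peptide))
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        hits.extend((frame + 1, m.start()) for m in pattern.finditer(protein))
    return hits


# Invented toy sequence (assumption): not taken from the E. coli genome.
TOY_DNA = (
    "ATG"
    "TTTGATAGCCCGGCGCTGAAA"      # F D S P A L K
    "G"                          # artificial extra base that shifts the frame
    "GAAACCGTGGATTTTAACGGCAAA"   # E T V D F N G K
    "TAA"
)

print(find_peptide("FDSPAXK", TOY_DNA))   # [(1, 1)]
print(find_peptide("ETVDFNGK", TOY_DNA))  # [(2, 8)]
```

With the extra base present, the two peptides are recovered from frames 1 and 2, so no single-frame translation contains both. A fingerprint scored against one reading frame at a time therefore sees only part of the protein, whereas each MS/MS spectrum can still be matched on its own.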
None of these four SSI protein-encoding ORFs had been found (they were not in the ORF annotation list maintained at the site).

The eight proteins seen to be upregulated by comparing Coomassie-stained 2-D gels can be divided into four functional groups:
(i) SSI spot 8, alkylhydroperoxide reductase, is a general stress-related protein.
(ii) SSI spot 2, sulfate binding protein, spot 5, cysteine synthase A, and spot 7, cystine binding protein, form part of the cys regulon under the control of CysB.
(iii) SSI spots 1 and 3 are proteins encoded by a region close to the hemB (porphobilinogen synthase) gene. This region has been shown to contain four open reading frames, tauA, B, C and D [27], where TauA and TauD are SSI spots 1 and 3, respectively.
(iv) SSI spots 4 and 6 are proteins encoded by a region between the pepN (aminopeptidase N) and pyrD (dihydroorotate oxidase) genes and are separated by the ycbE gene, an ORF encoding a hypothetical ABC transporter protein.

In order to further define the functions of the SSI proteins, random mutants were generated using the λplacMu9 system. Nine mutants were isolated which showed sulfur-regulated β-galactosidase activity [27]. 2-D gel analysis of mutant strain MS-108 showed that SSI spot 3 disappears, indicating that the phage had integrated into the tauD gene. This was confirmed by DNA sequencing. The phage insertion sites in the other mutants all lay in the 8.5' region (mutant 61 was in the tauB gene; 11, 15, and 43 in the tauC gene; and 74, 82, 104, 108 and 115 in tauD). Inspection of the sequence of the tauABCD region showed that there was a single promoter before tauA and that the region was probably transcribed as an operon. TauABCD showed many features characteristic of ABC-type transporter systems, such as a periplasmic substrate binding protein, a channel-forming membrane protein and a cytoplasmic nucleotide binding protein [28].
A comparison of the tauA-encoded sequence with the N-terminal sequence of TauA (SSI spot 1) showed that a signal peptide had been cleaved off, indicating a periplasmic location. This was consistent with TauA being a periplasmic binding protein. TauB was found to be homologous to known ATP-binding proteins of ABC transporters, and TauC showed a hydropathy plot characteristic of a membrane protein. TauD, SSI spot 3, shows homology to the tfdA dioxygenase of Alcaligenes eutrophus, which catalyzes the conversion of 2,4-dichlorophenoxyacetate to 2,4-dichlorophenol and glyoxylate and, concomitantly, of α-ketoglutarate to succinate and carbon dioxide. Studies on taurine degradation in other microorganisms show that it is usually oxidatively deaminated to sulfoacetaldehyde and then cleaved to sulfate and acetate. Thus TauD may be responsible for oxidatively desulfonating taurine to release sulfite, which enters the cysteine biosynthesis pathway. Disruption of tauD and the other tau genes only affected the ability of the bacterium to grow using taurine as a sulfur source, and did not affect the utilization of alkylsulfonates; tau is therefore thought to represent a sulfur-regulated operon involved in the metabolism of taurine. The genes encoding SSI spots 4 and 6 were found in the same region of the chromosome, on either side of an ORF coding for a hypothetical ABC transporter. This region also showed great similarity to an ABC transporter operon. This second sulfur-controlled operon containing SSI 4 and 6 may be responsible for the uptake of alkylsulfonates, since disruptions in the tau genes led to an inability to grow with taurine as the sulfur source but did not affect the uptake of ethanesulfonate. Gene inactivation analysis should be able to provide the data to help answer this question.
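The hydropathy analysis mentioned for TauC can be sketched as a sliding-window average over the protein sequence. The text does not say which scale or window was used, so the sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, a common convention for flagging candidate membrane-spanning segments. The function names and the toy sequence are hypothetical; the sequence is invented and is not TauC.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """0-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, mean in enumerate(profile) if mean > threshold]


# Invented toy sequence (assumption): a hydrophobic stretch flanked by Asp.
TOY_PROTEIN = "DDDDD" + "LIVALIVALIVALIVALIV" + "DDDDD"
print(hydrophobic_windows(TOY_PROTEIN))
```

A protein with several such stretches spaced along its length gives the kind of profile usually read as characteristic of an integral membrane protein, whereas a soluble protein rarely exceeds the cutoff over a full window.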
The results that we present here show that proteome analysis of gene activation in response to environmental factors, in combination with gene knockout, is a powerful technique for obtaining information about the function of unknown ORFs in genome sequences. Screening for the loss of a specific function by random gene knockout ensures that changes visible by 2-D PAGE are not just due to random upregulation of proteins or to a generic stress response. Proteome analysis will not only help to find functions for the ca. 40% of the genome ORFs which have no homology to known genes but will also allow a broader understanding of genome organization, especially the coordination of global responses. We are extending our proteome analysis studies to include gene knockout or permanent activation of specific genes in order to find which proteins are coregulated as part of a global response. The sensitivity of protein identification is increasing all the time [29], and the use of higher loading techniques [14] together with narrow pH range Immobiline strips in the first dimension to increase resolution should make virtually all proteins accessible to mapping. However, there is one major caveat: the proteome coverage that can be achieved using 2-D gel analysis must be defined. A detailed comparison of the mRNAs being expressed with the level of their translation products will be necessary in order to answer the critical question of how much of a genome is being expressed at any given time.

This work was supported by a grant from the Swiss Federal Institute of Technology to PJ. The authors would like to thank John Yates and Jimmy Eng of the University of Washington (Seattle, Washington, USA) for their Sequest program, Matthias Mann for his Peptidesearch program, and H. Nashimoto (Teikyo University, Utsunomiya, Japan) for making the preliminary E.
coli sequences available (the final sequence is available on the internet at http://bsw3.aist-nara.ac.jp).

Received September 18, 1996

5 References

[1] Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., McKenny, K., Sutton, G., Fitzhugh, W., Fields, C., Gocayne, J. D., Scott, J., Shirley, R., Liu, L. I., Glodek, A., Kelley, J. M., Weidman, J. F., Phillips, C. A., Spriggs, T., Hedblom, E., Cotton, M. D., Utterback, T. R., Hanna, M. C., Nguyen, D. T., Saudek, D. M., Brandon, R. C., Fine, L. D., Fritchman, J. L., Fuhrmann, N. S., Geoghagen, N. S. M., Gnehm, C. L., McDonald, L. A., Small, K. V., Fraser, C. M., Smith, H. O., Venter, J. C., Science 1995, 269, 496-521.
[2] Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., Fleischmann, R. D., Bult, C. J., Kerlavage, A. R., Sutton, G., Kelley, J. M., Fritchman, J. L., Weidman, J. F., Small, K. V., Sandusky, M., Fuhrmann, J., Nguyen, D., Utterback, T. R., Saudek, D. M., Phillips, C. A., Merrick, J. M., Tomb, J. F., Dougherty, B. A., Bott, K. F., Hu, P. C., Lucier, T. S., Peterson, S. N., Smith, H. O., Hutchinson, C. A., Venter, J. C., Science 1995, 270, 397-403.
[3] Bult, C. J., White, O., Olsen, G. J., Zhou, L., Fleischmann, R. D., Sutton, G. G., Blake, J. A., FitzGerald, L. M., Clayton, R. A., Gocayne, J. D., Kerlavage, A. R., Dougherty, B. A., Tomb, J. F., Adams, M. D., Reich, C. I., Overbeek, R., Kirkness, E. F., Weinstock, K. G., Merrick, J. M., Glodek, A., Scott, J. L., Geoghagen, N. S. M., Weidman, J. F., Fuhrmann, J. L., Nguyen, D., Utterback, T. R., Kelley, J. M., Peterson, J. D., Sadow, P. W., Hanna, M. C., Cotton, M. D., Roberts, K. M., Hurst, M. A., Kaine, B. P., Borodovsky, M., Klenk, H. P., Fraser, C. M., Smith, H. O., Woese, C. R., Venter, J. C., Science 1996, 273, 1058-1073.
[4] Neidhardt, F. C., in: Neidhardt, F. C., Ingraham, J. L., Magasanik, B., Low, K. B., Schachter, M., Umbarger, H. E. (Eds.), Escherichia coli and Salmonella typhimurium, ASM Press, Washington 1987, pp. 1313-1317.
[5] Neidhardt, F. C., Ingraham, J. L., Schaechter, M., Physiology of the Bacterial Cell: A molecular approach, Sinauer Associates, Sunderland, MA 1990.
[6] Van Bogelen, R. A., Sankar, P., Clark, R. L., Bogan, J. A., Neidhardt, F. C., Electrophoresis 1992, 13, 1014-1054.
[7] Kertesz, M. A., Leisinger, T., Cook, A. M., J. Bacteriol. 1993, 175, 1187-1190.
[8] Roberts, R. B., Abelson, P. H., Cowie, D. B., Bolton, E. T., Britten, R. J., Studies of Biosynthesis in Escherichia coli, Carnegie Institution of Washington, Washington, DC 1957.
[9] Kredich, N. M., Mol. Microbiol. 1992, 6, 2746-2753.
[10] Ostrowski, J., Kredich, N. M., J. Bacteriol. 1991, 173, 2212-2218.
[11] Germida, J. J., Wainwright, M., Gupta, V. V. S. R., in: Stotzky, G., Bollag, J.-M. (Eds.), Soil Biochemistry, Marcel Dekker, New York 1992, pp. 1-53.
[12] Quadroni, M., Staudenmann, W., Kertesz, M., James, P., Eur. J. Biochem. 1996, 239, 773-781.
[13] Silhavy, T. J., Berman, M. L., Enquist, L. W., Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 1984.
[14] Rabilloud, T., Valette, C., Lawrence, J. J., Electrophoresis 1994, 15, 1552-1558.
[15] Schagger, H., von Jagow, G., Anal. Biochem. 1987, 166, 368-379.
[16] Doucet, J. P., Trifaro, J. M., Anal. Biochem. 1988, 168, 265-271.
[17] Piccinni, E., Staudenmann, W., Albergoni, V., De Gabrieli, R., James, P., Eur. J. Biochem. 1994, 226, 853-859.
[18] Stahl, D. C., Swiderek, K. M., Davis, M. T., Lee, T. D., J. Am. Soc. Mass Spectrom. 1996, 7, 532-540.
[19] Hunt, D. F., Yates, J. R., Shabanowitz, J., Winston, S., Hauer, C. R., Proc. Natl. Acad. Sci. USA 1986, 83, 6233-6237.
[20] James, P., Quadroni, M., Carafoli, E., Gonnet, G., Biochem. Biophys. Res. Commun. 1993, 195, 58-64.
[21] James, P., Quadroni, M., Carafoli, E., Gonnet, G., Protein Sci. 1994, 3, 1347-1350.
[22] Kaufmann, R., Spengler, B., Lutzenkirchen, F., Rapid Commun. Mass Spectrom. 1993, 7, 902-910.
[23] Griffin, P. R., MacCoss, M. J., Eng, J. K., Blevins, R. A., Aaronson, J. S., Yates, J. R., Rapid Commun. Mass Spectrom. 1995, 9, 1546-1551.
[24] Mann, M., Wilm, M., Anal. Chem. 1994, 66, 4390-4399.
[25] Eng, J. K., McCormack, A. L., Yates, J. R., J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
[26] Torriani-Gorini, A., in: Torriani-Gorini, A., Yagil, E., Silver, S. (Eds.), Phosphate in Microorganisms, ASM Press, Washington, DC 1994, pp. 1-4.
[27] Van der Ploeg, J., Weiss, M., Saller, E., Nashimoto, H., Saito, N., Kertesz, M., Leisinger, T., J. Bacteriol. 1996, 178, 5438-5446.
[28] Sirko, A., Hryniewicz, M., Hulanicka, D., Roeck, A., J. Bacteriol. 1990, 172, 3351-3357.
[29] Wilm, M., Shevchenko, A., Houthaeve, T., Breit, S., Schweigerer, L., Fotsis, T., Mann, M., Nature 1996, 379, 466-469.