Supplementary Results

Monte-Carlo-Simulations

To assess whether additional experiments could significantly increase the number of identified proteins, we plotted the total number of distinct protein identifications against the overall number of high-quality peptide identifications for the twelve S2 cell line experiments (red dots, Fig. 2). To rule out a large effect of experiment size, the figure also shows the increase in protein identifications for one hundred random permutations of the experiment order (gray dots). As expected, the number of distinct protein identifications saturated quickly, independent of the experiment order. To forecast the increase in distinct protein identifications beyond the experimentally identified proteins, successive Monte-Carlo-Simulations were carried out under different assumptions about protein identification probabilities. The resulting saturation curves are shown in Figure 2 and include two extreme scenarios: (i) all proteins have the same probability of identification (blue curve), and (ii) all proteins not identified in the 12 experiments have an identification probability of 0 (purple curve). Assuming instead that each protein not yet identified has a small probability of being identified results in the green curve. The more genes are not expressed in a given tissue or cell line (i.e. the more encoded proteins have a probability of 0 of being detected), the closer the green curve will be to the purple curve. Since the red dots lie above the purple curve, the assumption that every protein not yet identified has a probability of 0 is wrong. We therefore expect that more experiments would identify more proteins. However, since all simulated curves are saturation curves, each additional identification will require increasing effort.
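The permutation control (gray dots) can be sketched as below. This is a minimal illustration, not the code used for Fig. 2. It assumes each experiment is represented as the set of protein identifiers it produced, and the function names and toy data are hypothetical. For simplicity it records the cumulative count after each experiment, whereas the x-axis of Fig. 2 is the cumulative number of peptide identifications.

```python
import math
import random
from typing import List, Sequence, Set


def cumulative_distinct(experiments: Sequence[Set[str]]) -> List[int]:
    """Number of distinct proteins seen after each experiment, in the given order."""
    seen: Set[str] = set()
    counts: List[int] = []
    for proteins in experiments:
        seen |= proteins  # add this experiment's identifications
        counts.append(len(seen))
    return counts


def permutation_control(experiments: Sequence[Set[str]], n_orders: int = 100,
                        seed: int = 0) -> List[List[int]]:
    """Saturation curves for n_orders randomly permuted experiment orders."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_orders):
        order = list(experiments)
        rng.shuffle(order)  # one random experiment order
        curves.append(cumulative_distinct(order))
    return curves


if __name__ == "__main__":
    toy = [{"P1", "P2", "P3"}, {"P2", "P4"}, {"P3", "P5", "P6"}]  # hypothetical data
    print(cumulative_distinct(toy))  # [3, 4, 6]
    print(math.factorial(12))        # 479001600 possible orders of 12 experiments
    for curve in permutation_control(toy, n_orders=3):
        print(curve)                 # every order ends at 6 distinct proteins
```

Whatever the order, the final value is the same. Only the shape of the curve changes, which is why the permuted curves converge at their right-hand end.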
Analysis-driven experimentation (ADE) to increase proteome coverage

We developed an iterative feedback strategy that cycles between experimentation, thorough bioinformatic and statistical analysis, and redirected experimentation. We term this strategy ADE, for analysis-driven experimentation. It enables us to optimize experimental conditions and to specifically target those parts of a proteome that are underrepresented in prior datasets. To this end, a number of physico-chemical and functional protein parameters (length, isoelectric point, favored codon frequency (FCF) as a measure of protein abundance, transmembrane domains, signal peptides) were computed for all distinct Drosophila proteins (Supplementary Table 1 online). In a first analysis, we compared the length distribution of all proteins with that of the 4,455 proteins experimentally identified in five large Kc167 cell line experiments. We assessed statistical significance by calculating areas of under- or overrepresentation in a combined histogram (Supplementary Fig. 2S online). We observed a highly significant underrepresentation of short proteins (below 500 amino acids), which was most pronounced in the length classes up to 300 amino acids (Fig. 3a). The same length bias was observed for the two other datasets (data not shown), indicating that the experimental setup had to be changed to cover this highly underrepresented protein class more extensively. We therefore performed gel-filtration experiments on Kc167 cell protein extracts to enrich for small proteins. The length distribution of the 1,194 proteins identified in these gel-filtration experiments differed significantly from that of all proteins (data not shown) and from that of the 4,455 previously identified Kc167 cell proteins (Fig. 3b). It revealed an over- rather than an underrepresentation of the small protein class.
Moreover, 15% of the 1,194 proteins were new identifications, and 70% of these fell into the length classes below 450 amino acids (about 50 kDa) (Fig. 3c), demonstrating the value of the ADE approach. Similar statistical analyses of the 3rd-instar larvae dataset (not shown) and of a larger dataset of almost 8,000 proteins from a range of experiments showed that basic proteins were detected less frequently than expected, whereas acidic proteins showed the reciprocal bias (Fig. 3d). Although our experimental setup thus preferentially identified acidic proteins, many more acidic proteins remain to be identified. For the ADE approach to be generally applicable and to lead towards more complete proteome coverage, it must also be able to add new protein identifications in areas where the experimental setup already works well. We therefore targeted acidic proteins by separating larval proteins by free-flow electrophoresis (FFE) in the pH range 4-7, and identified 1,960 proteins. Statistical analysis of the protein pI distributions revealed that we could indeed achieve an even more pronounced overrepresentation in the acidic range (Fig. 3e,f). A comparison of unique larval protein identifications before and after this round of ADE showed that, in addition to around 330 newly identified acidic proteins, we also identified a substantial number of previously undetected basic proteins (about 45% of the 600 newly identified proteins). This can be attributed to the fact that we also analysed the fractions at the boundary of the pH 4-7 gradient, where the basic proteins were concentrated.
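The histogram comparison used throughout the ADE analyses (protein length, pI) can be sketched as follows. The text does not specify the statistic behind the "areas of under- or overrepresentation" (see Supplementary Fig. 2S), so this sketch assumes a simple per-bin binomial z-score. It compares the observed count against the count expected from the background distribution of all proteins. The function names and toy data are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple


def bin_counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values in bins [edges[k], edges[k+1]); values outside the range are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bisect_right(edges, v) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


def representation(background: Sequence[float], observed: Sequence[float],
                   edges: Sequence[float]) -> List[Tuple[int, float, float]]:
    """Per bin: (observed count, expected count, binomial z-score).

    A negative z means the bin is underrepresented among the identified proteins.
    """
    bg = bin_counts(background, edges)
    obs = bin_counts(observed, edges)
    n_bg, n_obs = sum(bg), sum(obs)
    rows = []
    for o, b in zip(obs, bg):
        p = b / n_bg                        # share of this bin among all proteins
        expected = n_obs * p
        sd = math.sqrt(n_obs * p * (1 - p))
        z = (o - expected) / sd if sd > 0 else 0.0
        rows.append((o, expected, z))
    return rows


if __name__ == "__main__":
    # Hypothetical protein lengths (amino acids); bins: below 500 and 500-1100.
    all_lengths = [100, 200, 300, 600, 700, 800, 900, 1000]
    identified = [600, 700, 800, 900]
    for row in representation(all_lengths, identified, [0, 500, 1100]):
        print(row)  # the short-protein bin gets a negative z (underrepresented)
```

With many bins, a correction for multiple testing would be advisable before calling individual bins significant.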
Cross-comparative searches for proteomics-based genome annotation

In a pilot study, a representative set of seven large experiments (Supplementary Table 2 online) was searched against both our standard protein database and a six-frame-translated genomic database, generating two search outputs for every spectrum. The output was then filtered for spectra whose peptide assignments had a high PeptideProphet score (p > 0.9) in the genomic database search but a low score (p < 0.5) in the protein database search. This pattern indicates that the corresponding protein sequence entry might be missing from the protein database. Using this approach, we identified 1,138 MS/MS scans that defined 901 distinct peptides (Supplementary Table 5 online). These peptides were then searched with BLAST against the NCBI non-redundant protein database. 349 peptides had no hit, 343 sequences showed some homology, 132 were 100% identical to proteins from other species, and 77 sequences identified various Drosophila transposable elements that were not contained in our protein search database. Next, the peptides were searched against the genomic sequence to determine their exact coordinates. For a subset of repeatedly identified, high-scoring peptides, the genomic regions flanking these loci were inspected manually for open reading frames (ORFs) using the ExPASy Translate tool (ref. 1) or the GENSCAN gene-finding tool (ref. 2).
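The cross-comparative filter can be sketched as follows. The sketch assumes each search result is available as a mapping from scan identifier to its best peptide assignment and PeptideProphet probability. This data structure and the function name are hypothetical. Treating a scan that has no protein-database assignment as probability 0 is also an assumption.

```python
from typing import Dict, Set, Tuple

# (peptide sequence, PeptideProphet probability) of the best assignment for one scan
Assignment = Tuple[str, float]


def candidate_novel_peptides(genomic: Dict[str, Assignment],
                             protein: Dict[str, Assignment],
                             p_genomic_min: float = 0.9,
                             p_protein_max: float = 0.5) -> Tuple[Set[str], Set[str]]:
    """Scans (and their distinct peptides) scoring high only in the genomic search."""
    scans: Set[str] = set()
    peptides: Set[str] = set()
    for scan, (peptide, p_gen) in genomic.items():
        # Assumption: a scan without a protein-database assignment counts as p = 0.
        p_prot = protein.get(scan, ("", 0.0))[1]
        if p_gen > p_genomic_min and p_prot < p_protein_max:
            scans.add(scan)
            peptides.add(peptide)
    return scans, peptides


if __name__ == "__main__":
    # Hypothetical search results keyed by scan ID.
    genomic = {"s1": ("PEPTIDEK", 0.95), "s2": ("PEPTIDEK", 0.97),
               "s3": ("ANOTHERR", 0.99), "s4": ("LOWSCORK", 0.60)}
    protein = {"s1": ("SOMEK", 0.20), "s2": ("SOMEK", 0.10),
               "s3": ("ANOTHERR", 0.98)}
    scans, peptides = candidate_novel_peptides(genomic, protein)
    print(sorted(scans), sorted(peptides))  # ['s1', 's2'] ['PEPTIDEK']
```

Returning both sets mirrors the distinction in the text between qualifying MS/MS scans and the distinct peptides they define.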
Supplementary Methods

Data Processing and Statistical Validation

MS/MS spectra were searched using the TurboSEQUEST algorithm (ref. 3) against our standard Drosophila protein database (BDGP release 3.2) with the following initial criteria: tryptic digestion required; mass tolerance 3 Da; variable modification of methionine (+15.9994 Da) and cysteine. The cysteine modification was +227.26 Da with a differential modification of +8.9339 Da for the heavy isoform of the cleavable ICAT (cICAT) reagent, or +422.22 Da with a differential modification of +8 Da for d8-ICAT-labeled cysteine (Supplementary Table 2 online). The SEQUEST output was submitted to a suite of software tools (the Trans-Proteomic Pipeline; see www.proteomecenter.org/software.php), including the statistical model PeptideProphet, which estimates the accuracy of peptide assignments to MS/MS spectra (ref. 4). This combination showed excellent sensitivity and selectivity in a recent comparison of various database search engines (ref. 5). It extracted high-quality information from our dataset, yielding an average false discovery rate (FDR) of 1.37% in spectrum assignments (average PeptideProphet P-value of 0.985 per peptide assignment). Despite the high redundancy of the dataset, some peptides were identified only once. This set of 29,314 peptides (~5.8%) has to be distinguished from single-hit identifications in smaller datasets. The extensive analysis of samples from the six main developmental stages is expected to generate redundant identifications even for peptides derived from rare proteins. (A subset of these peptides is expected because very specific protein samples were analyzed in this context.) We therefore term these peptides “one-hit” identifications, and a higher false-positive rate is very likely within this class. Indeed, the average PeptideProphet P-value for this class is 0.974, slightly lower than the average for all peptides.
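A common way to estimate the FDR of a set of accepted assignments from PeptideProphet probabilities is to average (1 - p) over them, since each probability estimates the chance that the assignment is correct. The sketch below shows this estimate together with the selection of "one-hit" peptides. It is an illustration under that assumption. The exact calculation behind the reported 1.37% is not specified here, and the function names and data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def estimated_fdr(probabilities: Sequence[float]) -> float:
    """Expected fraction of incorrect assignments: mean of (1 - p) over accepted spectra."""
    return sum(1.0 - p for p in probabilities) / len(probabilities)


def one_hit_peptides(assignments: List[Tuple[str, float]]) -> List[str]:
    """Peptides assigned to exactly one spectrum in the whole dataset."""
    counts = Counter(peptide for peptide, _ in assignments)
    return sorted(peptide for peptide, n in counts.items() if n == 1)


if __name__ == "__main__":
    # Hypothetical accepted (peptide, PeptideProphet probability) pairs, one per spectrum.
    accepted = [("AAK", 0.99), ("AAK", 0.98), ("CCR", 0.97), ("DDK", 0.95)]
    print(estimated_fdr([p for _, p in accepted]))  # ~0.0275
    singles = one_hit_peptides(accepted)
    print(singles)  # ['CCR', 'DDK']
    # Mean probability within the one-hit class, to compare with the overall mean.
    print(sum(p for pep, p in accepted if pep in singles) / len(singles))
```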
By contrast, if we applied other conventionally used but less stringent filtering criteria based on cross-correlation scores (Xcorr > 1.6 for charge state +1, Xcorr > 2.4 for +2, Xcorr > 3.2 for +3), we would reach 75% proteome coverage (11,565 proteins; 102,630 unique peptides) and hit more than 85% of all Drosophila gene models. However, since one of our key aims was to critically assess and improve the genome annotation, we deliberately chose very stringent criteria.

Monte-Carlo-Simulations

Two different types of simulations were carried out (Fig. 2).

1. Experiments of different size (i.e. number of identification trials) could affect the increase of distinct protein identifications, depending on the order in which the experiments are considered. The average number of distinct protein identifications per experiment was 1,640, ranging from approximately one hundred to three thousand. To rule out a large effect of experiment size, we carried out a first simulation as a control. Of the 12! theoretically possible experiment orders (close to 480 million), 100 were randomly selected, and the cumulative number of distinct protein identifications for these orders is shown by the gray dots in Figure 2. If, by chance, several large experiments come first in a permuted order, the cumulative number of distinct protein identifications at that stage can be higher than for the actual experiment order (red dots). Importantly, each of the twelve individual experiments added some unique identifications.

2. To estimate the increase of distinct protein identifications beyond the 5,795 identified proteins towards the entire proteome, several additional Monte Carlo simulations were carried out with the following model: a single protein identification trial in a single experiment can be regarded as a random trial with 16,743 possible outcomes.
Since identification trials can be regarded as independent of each other, and since the experimental conditions within one experiment can be regarded as constant, each experiment can be described by a multinomial model. The results of a single experiment $i$ consisting of $n_i$ identification trials, i.e. the frequencies with which each protein is identified, can be modelled as a 16,743-dimensional multinomially distributed random vector

$$(X_{i,1}, X_{i,2}, \ldots, X_{i,16743}) \sim \mathrm{Mult}(n_i;\ p_{i,1}, p_{i,2}, \ldots, p_{i,16743}),$$

with parameters $n_i$ and $p_{i,1}, \ldots, p_{i,16743}$, where $p_{i,j}$ is the probability of identifying protein $j$ in experiment $i$. For a series of experiments aimed at identifying the whole proteome of an organism, the quantity of most interest is the random variable

$$Y_i(n) = \operatorname{sgn}(X_{i,1}) + \operatorname{sgn}(X_{i,2}) + \cdots + \operatorname{sgn}(X_{i,16743}),$$

the number of distinct proteins identified in a series of $n$ identification trials in experiment $i$. The maximum possible value of $Y_i(n)$, i.e. the number of identifiable proteins, equals the number of proteins with $p_{i,j} > 0$. Importantly, to carry out the simulations one has to assume a certain probability of identifying the proteins that were not yet identified. Several simulations were carried out under the following assumptions. An extreme and very unrealistic case, which assumes the same high probability of identifying each of the 16,743 proteins, results in the blue curve (Fig. 2). More realistically, for the $k = 5{,}795$ experimentally identified proteins one can use their observed relative frequencies as maximum likelihood estimators of $p_{i,j}$:

$$\hat{p}_{i,j} = \frac{h_{i,j}}{n_i},$$

where $h_{i,j}$ denotes the number of identifications of protein $j$ in experiment $i$ of size $n_i$. Correspondingly, $h_j$ denotes the number of identifications of protein $j$ in all S2 experiments, with $h_1, \ldots, h_k > 0$ for the $k = 5{,}795$ proteins seen. Finally, the probabilities for the $16{,}743 - k = 10{,}948$ proteins not identified in the 12 experiments were calculated by assuming an overall number of identifications of either 1 for each of these proteins (green curve, $h_{k+1} = \cdots = h_{16743} = 1$) or 0 (purple curve, $h_{k+1} = \cdots = h_{16743} = 0$).
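The simulation of $Y(n)$ under the green and purple assumptions can be sketched as below. This is a minimal illustration, not the code used for Fig. 2. It pools all identifications into a single probability vector with $p_j \propto h_j$, whereas the model above is defined per experiment. The function names and toy counts are hypothetical. Drawing $n$ protein indices with these weights samples the multinomial trial by trial, which also yields the whole path $Y(1), \ldots, Y(n)$.

```python
import random
from typing import List, Sequence


def identification_probabilities(h: Sequence[int], n_proteins: int,
                                 unseen_count: float) -> List[float]:
    """p_j proportional to h_j for seen proteins and to unseen_count for the rest.

    unseen_count = 1 mirrors the green-curve assumption, 0 the purple curve.
    """
    weights = [float(x) for x in h] + [unseen_count] * (n_proteins - len(h))
    total = sum(weights)
    return [w / total for w in weights]


def simulate_distinct_curve(p: Sequence[float], n_trials: int, seed: int = 0) -> List[int]:
    """One realisation of Y(1), ..., Y(n): distinct proteins after each trial."""
    rng = random.Random(seed)
    draws = rng.choices(range(len(p)), weights=p, k=n_trials)  # n multinomial trials
    seen = set()
    curve = []
    for protein in draws:
        seen.add(protein)
        curve.append(len(seen))
    return curve


def fraction_at_least(p: Sequence[float], n_trials: int, observed: int,
                      n_sim: int = 200, seed: int = 0) -> float:
    """Share of simulated runs whose final Y(n) reaches the observed distinct count."""
    hits = sum(
        simulate_distinct_curve(p, n_trials, seed=seed + s)[-1] >= observed
        for s in range(n_sim)
    )
    return hits / n_sim


if __name__ == "__main__":
    # Hypothetical pooled counts h_j for 4 seen proteins out of a 6-protein "proteome".
    h = [5, 3, 1, 1]
    purple = identification_probabilities(h, n_proteins=6, unseen_count=0)
    green = identification_probabilities(h, n_proteins=6, unseen_count=1)
    print(simulate_distinct_curve(purple, n_trials=10))
    print(fraction_at_least(purple, n_trials=10, observed=4))
    print(green[-1])  # 1/12: each unseen protein keeps a small probability
```

Under the purple assumption, the simulated count can never exceed the number of proteins with $h_j > 0$. It typically falls short of that number, because proteins seen only once are often missed in a re-run. An observed count above nearly all simulated values is the kind of evidence against the purple model described in the text.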
For the green curve, all proteins can eventually be identified, but this requires many more identification trials than for the blue curve. The purple curve is of most interest. By simulating the stochastic process $(Y_i(1), Y_i(2), \ldots, Y_i(n_i))$ for each experiment, we obtained evidence that the observed values of $Y_i(n_i)$ (red dots) were significantly higher than the simulated values (purple curve), a result that contradicts the model assumptions. This indicates that several of the proteins not yet identified have a small probability of being identified, and that continued experimentation can identify more proteins.

Data storage (SBEAMS)

SBEAMS is being designed as a framework for collecting, exploring, and exporting data produced by a variety of biological experiments, and is intended to be flexible enough for use with many different experiment types. At its core, SBEAMS is not a single program but rather a set of software tools designed to get data into and out of many evolving relational database schemas. The code is built around several Perl modules (web-server Common Gateway Interface (CGI) scripts), which handle most of the communication with the relational database engine and also provide a consistent web front end. An HTML front end has the major advantage of being accessible from any computer without software installation. The implemented SBEAMS proteomics module for Drosophila melanogaster (named FLYCAT) stores information about the protein source as well as the experimental conditions. To analyze and compare datasets from different projects, the SBEAMS database offers a variety of tools. These range from simple search options with links to further information on specific entries to specialized implementations that allow multiple experiments to be rapidly compared or summarized.
SBEAMS allows users to combine, summarize or compare multiple projects and experiments from different origins, enabling researchers to validate or share data in a simple, quick and reproducible manner. A detailed description of SBEAMS and its implementation is given in Desiere et al., 2005 (ref. 6).

Peptide Atlas

PeptideAtlas is an expandable resource for integrating data from many diverse proteomics experiments. Its main goal is the validation of gene predictions by protein expression data (ref. 6). This is achieved by mapping peptides derived from accurate interpretations of tandem mass spectrometry (MS/MS) data back to the corresponding genomic sequence (Ensembl 35.4c). Identifications in PeptideAtlas have been statistically filtered to retain only those of the highest quality. The Drosophila build of PeptideAtlas is available at www.mop.unizh.ch/peptideatlas. The PeptideAtlas web site offers interfaces for the public to contribute data, explore the database, and download data.

Fractionation of cells/tissues by hypotonic lysis

Schneider S2 cells were grown to high density in 150 cm² flasks in Schneider's Drosophila cell culture medium (Invitrogen) supplemented with 10% FCS. The cells were harvested by gently tapping the flasks to dislodge them, transferred to 50 ml Falcon tubes and centrifuged at 1,000 × g for 7 minutes at 4 °C. The pelleted cells were washed three times with 50 ml of chilled PBS using the same centrifugation conditions. To begin the fractionation, the final washed cell pellet was resuspended in 5 volumes of fresh, ice-cold hypotonic cell lysis buffer [10 mM Hepes (pH 7.9), 1.5 mM MgCl2, 10 mM KCl] supplemented just before use with 0.5 mM DTT and Complete™ protease inhibitor cocktail (Roche). The cells were incubated in this solution for 10 minutes on ice. The swollen cells were dounced 20 times or until all cells were visibly lysed.
(The efficiency of cell lysis was assessed by phase-contrast microscopy, monitoring the release of intact nuclei.) The resulting lysate was transferred to Corex tubes and centrifuged at 1,000 × g for 7 min at 4 °C, which generated a nuclear pellet. The supernatant (combined cytoplasmic and membrane fractions) was removed and processed as described below. The pelleted nuclei were recentrifuged at 20,000 × g for 20 minutes at 4 °C in an SS-34 rotor (Sorvall) to further remove contaminating cytoplasmic and membrane proteins. The resulting supernatant was removed and discarded. To extract nuclear proteins, the nuclear pellet was resuspended in ice-cold high-salt protein extraction buffer [20 mM Hepes (pH 7.9), 25% glycerol, 1.5 mM MgCl2, 0.42 M NaCl] supplemented just before use with 0.5 mM DTT, 0.5 mM PMSF and Complete™ protease inhibitor cocktail (Roche). Nuclear proteins were extracted over the next 30 minutes by stirring this suspension at 4 °C. The suspension was then centrifuged at 20,000 × g in an SS-34 rotor (Sorvall) at 4 °C for 20 minutes. The resulting supernatant (the nuclear protein fraction) was carefully removed and placed in Eppendorf tubes. The pellet was discarded. It contained abundant histone proteins, which were selectively retained in the pellet by the choice of salt concentration used in the extraction (refs. 7, 8). The initial hypotonic supernatant (cytoplasmic and membrane fraction) was ultracentrifuged at 100,000 × g (SW 60 rotor, Beckman) at 4 °C for 90 minutes to separate the membrane and cytoplasmic fractions. The supernatant (cytoplasmic fraction) was removed. The pellet (membrane fraction) was dissolved directly, without further washing, in ICAT Labeling Buffer (see below).
The proteins of the cytoplasmic and nuclear fractions were each precipitated separately by adding 7 volumes of ice-cold acetone. The samples were incubated at -20 °C for 90 minutes and centrifuged in an Eppendorf tube at full speed for 10 min at 4 °C. The precipitated protein was redissolved in ICAT Labeling Buffer at a final concentration of 2 mg/ml and stored at -80 °C. The membrane-fraction proteins, however, were dissolved directly in a slightly modified Membrane Labeling Buffer containing a higher percentage of SDS and urea (8 M urea, 10 mM Tris-Cl (pH 8.4) and 0.125% SDS). The protein concentration of each fraction was measured using the BCA assay (Pierce) or the RC DC protein assay kit (Bio-Rad), with bovine serum albumin (BSA) as the standard.

Total Extract from Adult flies

yw flies were reared in population cages at 25 °C and fed on normal yeast agar plates. No distinction was made between male and female flies for protein extraction. To extract total proteins, flies were first ground and homogenized with mortar and pestle in fresh, ice-cold RIPA buffer (150 mM NaCl, 50 mM Tris-HCl pH 7.5, 500 µM EDTA, 100 µM EGTA, 1.0% Triton X-100 and 1% sodium deoxycholate) supplemented just before use with Complete™ protease inhibitor cocktail (Roche). The homogenate was centrifuged for 5 minutes at 1,000 × g at 4 °C to remove insoluble material. The supernatant was homogenized further with 20-30 strokes in a glass Dounce homogenizer. The resulting homogenate was spun at 1,000 × g for 10 minutes at 4 °C, and proteins were precipitated with 7 volumes of ice-cold acetone. The precipitated protein was redissolved in ICAT Labeling Buffer.

Protein extract from Adult fly heads

Around 3,000 adult yw flies (from a single population cage) were first immobilized at 4 °C, transferred to a 50 ml Falcon tube and immersed in liquid nitrogen for a few seconds.
After vigorous shaking, the heads were separated from the bodies by passing them through 0.8 mm, 0.4 mm and 0.1 mm sieves, in that order. They were then separated manually from other debris under a microscope. The heads were pooled in a 50 ml Falcon tube and frozen at -80 °C. The frozen heads were ground with a cold mortar and pestle and homogenized in 1 ml of fresh, ice-cold hypotonic cell lysis buffer [10 mM Hepes (pH 7.9), 1.5 mM MgCl2, 10 mM KCl] supplemented just before use with 0.5 mM DTT and Complete™ protease inhibitor cocktail (Roche). The homogenate was centrifuged in a pre-cooled Eppendorf centrifuge at 1,000 × g for 10 minutes at 4 °C. The resulting supernatant was separated into cytoplasmic, membrane and nuclear (insoluble) fractions as described above for cells/tissues.

Membrane protein extract from Adult flies

yw flies were reared in population cages at 25 °C and fed on normal yeast agar plates. No distinction was made between male and female flies for protein extraction. To extract total proteins, flies were first ground and homogenized with mortar and pestle in 10% sucrose supplemented just before use with Complete™ protease inhibitor cocktail (Roche). The homogenate was centrifuged for 5 minutes at 600 × g at 4 °C to remove insoluble material. The supernatant was spun at 8,000 × g for 30 minutes at 4 °C. The pellet was washed once with 50 mM Tris-HCl (pH 8.4) and incubated for 60 minutes on ice to lyse the cells. The cells were homogenized with 20-30 strokes in a glass Dounce homogenizer. The resulting homogenate was transferred to ultracentrifugation tubes (1 ml/tube), overlaid with 7 ml of 48% sucrose and 15 ml of 28.5% sucrose, and topped up with 10% sucrose solution. The samples were spun at 100,000 × g for 2 hours at 4 °C. The membranes were recovered from the interface between the 48% and 28.5% sucrose layers, and proteins were precipitated with 7 volumes of ice-cold acetone. The precipitated protein was redissolved in ICAT Labeling Buffer at a final concentration of 2 mg/ml.
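Centrifugation steps in these protocols are given as relative centrifugal force (× g). Converting RCF to rotor speed requires the rotor radius, which the text does not state. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ · r · rpm² (r in cm). The 10.7 cm radius in the example is an assumed value for illustration and should be checked against the manual of the rotor actually used. The function names are hypothetical.

```python
import math

# Standard conversion: RCF = 1.118e-5 * r * rpm^2, with r in cm.
RCF_PER_CM_RPM2 = 1.118e-5


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that gives the requested RCF (x g) at radius r."""
    return math.sqrt(rcf / (RCF_PER_CM_RPM2 * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """RCF (x g) reached at radius r for a given rotor speed."""
    return RCF_PER_CM_RPM2 * radius_cm * rpm ** 2


if __name__ == "__main__":
    r_max = 10.7  # assumed example radius in cm; check the manual of the rotor in use
    for g in (600, 8_000, 20_000):
        print(g, round(rpm_for_rcf(g, r_max)))
```

Because RCF grows with radius, the force differs between the top and bottom of a tube. Stating whether an RCF refers to the maximum or average radius makes a protocol easier to transfer between rotors.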
Protein extract from Adult and Larval Hemolymph

Adult flies were punctured through the thorax with a sharp, pre-cooled needle. They were quickly transferred into a 0.5 ml Eppendorf tube with its bottom clipped off and containing siliconized glass wool, placed inside a 1.5 ml Eppendorf tube. When enough flies had been pooled, the tubes were centrifuged at 1,500 × g for 5 minutes at 4 °C, and the hemolymph was collected from the 1.5 ml tube. Larval hemolymph was extracted in a similar manner, except that the larvae were first grasped with a pair of forceps and a small piece of cuticle was torn off, taking care not to damage the internal organs. When enough larvae had been pooled, they were spun at 800 × g for 5 minutes at 4 °C. The proteins in the resulting larval and adult hemolymph were precipitated with 7 volumes of ice-cold acetone at -20 °C for 90 minutes and spun at full speed in an Eppendorf centrifuge for 10 minutes at 4 °C. The precipitated protein was redissolved in ICAT Labeling Buffer at a concentration of 10 µg/ml.

Protein extract from larval fatbodies

3rd-instar yw larvae were first washed with PBS and homogenized in Balanced Salt Solution (5 M NaCl, 1 M KCl, 1 M MgSO4, 1 M CaCl2, 1 M Tricine, 1 M glucose, 1 M sucrose, 1 g BSA, supplemented just before use with Complete™ protease inhibitor cocktail (Roche)). The homogenate was collected and centrifuged for 20 minutes at 3,000 × g (4 °C) in 250 ml polycarbonate bottles in an HS-4 rotor (Sorvall). The fat layer floating on the surface was collected, and proteins were precipitated with ice-cold acetone at -20 °C for 90 minutes.

Protein extract from yw pupae

Pupae were collected from population cages and homogenized in ice-cold RIPA buffer supplemented just before use with Complete™ protease inhibitor cocktail (Roche). The homogenate was first centrifuged at 800 × g to remove insoluble material and then processed as described above (hypotonic lysis) to obtain a membrane fraction.
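Several extracts above are redissolved at a defined concentration (for example 2 mg/ml) after quantification by the BCA assay against BSA standards. The sketch below shows the underlying arithmetic. It fits a least-squares line through the standards, converts a sample absorbance to concentration, and computes the buffer volume needed for a target concentration. It assumes the readings fall in the linear range of the assay. In practice BCA standard curves are sometimes fitted with a quadratic instead. The readings and function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_standard_curve(conc: Sequence[float],
                       absorbance: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line A = slope * c + intercept through the BSA standards."""
    n = len(conc)
    mean_c = sum(conc) / n
    mean_a = sum(absorbance) / n
    s_cc = sum((c - mean_c) ** 2 for c in conc)
    s_ca = sum((c - mean_c) * (a - mean_a) for c, a in zip(conc, absorbance))
    slope = s_ca / s_cc
    return slope, mean_a - slope * mean_c


def sample_concentration(a: float, slope: float, intercept: float,
                         dilution: float = 1.0) -> float:
    """Protein concentration of the original sample from its (diluted) absorbance."""
    return (a - intercept) / slope * dilution


def resuspension_volume_ml(protein_mg: float, target_mg_per_ml: float) -> float:
    """Buffer volume that brings a protein amount to the target concentration."""
    return protein_mg / target_mg_per_ml


if __name__ == "__main__":
    # Hypothetical BSA standards (mg/ml) and A562 readings.
    slope, intercept = fit_standard_curve([0.0, 0.5, 1.0, 2.0], [0.10, 0.35, 0.60, 1.10])
    print(sample_concentration(0.85, slope, intercept, dilution=10))  # ~15 mg/ml
    print(resuspension_volume_ml(4.0, 2.0))  # 2.0 ml for 4 mg at 2 mg/ml
```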
Total protein extract from yw Embryos

Embryos (2 h AEL) were collected from 100 apple agar plates and dechorionated. They were homogenized in 8 M urea, 10 mM Tris-Cl (pH 8.4) and 0.125% SDS. The homogenate was centrifuged at 800 × g, and the proteins in the supernatant were precipitated with ice-cold acetone at -20 °C for 90 minutes. The precipitated protein was redissolved in ICAT Labeling Buffer.

Protein extract from Golgi Membranes

Kc cells were first homogenized by hypotonic lysis as described above. Golgi membranes were isolated by buoyant-density fractionation on pre-formed discontinuous iodixanol gradients using OptiPrep™ (Axis-Shield) according to the manufacturer's instructions.

Protein extract from Autophagosomes

Kc cells were first homogenized as described above for the generation of a membrane fraction. Autophagosomes were isolated by buoyant-density fractionation on pre-formed discontinuous iodixanol gradients using OptiPrep™ (Axis-Shield, Norway) according to the manufacturer's instructions.

Preparation of Chromatin fraction from Kc cells

After generation of the nuclear fraction as described above, the chromatin fraction from Kc cells was extracted using procedures described elsewhere (ref. 9).

Free Flow Electrophoresis (Protein and Peptide)

Total protein extracts from 3rd-instar larvae were prepared in RIPA buffer as described above. Free-flow electrophoresis (FFE) was performed using an FFE-Weber FFE™ free-flow electrophoresis apparatus (FFE-Weber, Munich, Germany). All media contained 8 M urea and 250 mM mannitol. The anodic stabilization medium contained 100 mM α-hydroxyisobutyric acid (HIBA), 150 mM DL-2-aminobutyric acid, 100 mM nicotinamide and 15 mM glycyl-glycine. Separation medium 1 contained 23% Prolyte mixture (pH 4-7). The cathodic stabilization medium contained 75 mM ethanolamine, 75 mM AMPSO, 150 mM TAPS and 30 mM HEPES.
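Preparing these media involves routine molarity arithmetic. The sketch below computes the mass of solid needed for a target molarity and the stock volume needed for a dilution (C1·V1 = C2·V2). The molar masses are standard values for urea (60.06 g/mol) and mannitol (182.17 g/mol). The 1 M HEPES stock in the example is a hypothetical illustration, and the function names are hypothetical.

```python
# Standard molar masses (g/mol) of the solids used in the FFE media.
MOLAR_MASS = {"urea": 60.06, "mannitol": 182.17}


def grams_for(compound: str, molar: float, volume_l: float) -> float:
    """Mass of solid needed for `volume_l` litres at `molar` mol/l."""
    return MOLAR_MASS[compound] * molar * volume_l


def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock to dilute (C1*V1 = C2*V2); units follow the inputs."""
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    print(grams_for("urea", 8.0, 1.0))        # ~480.5 g urea per litre of medium
    print(grams_for("mannitol", 0.25, 1.0))   # ~45.5 g mannitol per litre
    print(stock_volume(1000.0, 30.0, 500.0))  # 15.0 ml of 1 M (1000 mM) HEPES for 500 ml at 30 mM
```

Because 8 M urea contributes a large share of the final volume, the solids are usually dissolved first and the solution is then brought to the final volume, rather than being added to a fixed volume of water.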
For the separation of Drosophila melanogaster proteins and the evaluation of FFE, fractions were collected into a deep-well 96-well collection plate (maximum volume 2 ml per well). Corresponding samples from the same wells of the 96-well plates were pooled, dissolved in 1% (w/v) RapiGest™ (Waters) in 50 mM Tris-HCl and digested overnight at 37 °C [1:200 (w/w) trypsin:protein]. For peptide FFE, samples were prepared as described above, except that they were digested before FFE.

Gel Filtration

Gel filtration was performed on a Superose 6 column (HR 10/30, Amersham Pharmacia, Switzerland) controlled by an automated fast protein liquid chromatography (FPLC) station (Amersham Pharmacia) at 20 °C. The column was equilibrated with a native buffer containing 150 mM NaCl, 1 mM DTT, 0.2 mM sodium vanadate, 5 mM EDTA, 1 µg/ml aprotinin (Sigma-Aldrich, Germany) and 50 mM Tris pH 7.2. The column was calibrated with catalase (232 kDa), albumin (67 kDa), ovalbumin (43 kDa), chymotrypsinogen (25 kDa), ribonuclease A (15.6 kDa; all Amersham Pharmacia) and cytidine (Merck, Switzerland). Before loading, cytosolic fractions were cleared of nucleic acids by precipitation with polyethyleneimine (Sigma-Aldrich, 0.1% w/v). 200 µl of the cleared cytosolic fraction (1 mg/ml protein) was loaded onto the column and eluted at a flow rate of 0.3 ml/min. Fractions of 0.6 ml corresponding to a calculated molecular weight below 25 kDa were collected. The eluate was precipitated with trichloroacetic acid (50%). The pellet was washed twice with 70% acetone and redissolved in digestion buffer containing 1 M urea.

Iso Electric Focussing (IEF) of peptides

The digested proteins (nuclear extract, embryo; ref. 10) were brought to 8 M urea in the presence of IPG buffer 3-10 NL (Amersham) and applied to a 13 cm IPG dry strip, 3-10 NL (Amersham). Using an IPGphor (Amersham), the following focusing protocol was applied: 16 h at 30 V, 3 h at 500 V, then 8,000 V up to 30,000 Vh.
Alternative settings were used for the S2 cell extract (1 h at 500 V, 1 h at 1,000 V, 24 h at 8,000 V). Excess cover oil was removed, and the gel was scraped off the plastic backing in 20 (17 for S2) equally sized pieces within 3 min to prevent diffusion. Peptides were eluted with three sequential solutions containing 0%, 50% and 100% acetonitrile in water with 0.1% TFA (for the S2 cell experiments, twice with 40% acetonitrile, 0.5% TFA). Supernatants were combined, dried and redissolved in 5% acetic acid. Samples were desalted and freed of residual oil using STAGE tips (ref. 26), dried, and stored at -80 °C before analysis.

Chloroform extraction of proteins (isolation of CHCl3-soluble proteins)

3rd-instar larvae were first washed with PBS and homogenized in 50 mM Tris-HCl, pH 8.4 (supplemented just before use with Complete™ protease inhibitor cocktail (Roche)) by sonicating with 5 short pulses of 10 seconds each. The homogenate was extracted with chloroform (1:1) by vortexing for 2 minutes, followed by incubation for 20 minutes on ice (in a chemical hood). The phases were separated by centrifugation for 7 minutes at 20 °C in a table-top centrifuge at full speed. The interphase was recovered and stored at -80 °C. The chloroform phase was recovered and recentrifuged as above. The residual aqueous phase was then completely removed, and the sample was dried in a vacuum dryer, generating a yellowish, glassy pellet. The pellet was resuspended in 50 mM Tris-HCl, pH 8.4, 2% (w/v) RapiGest™ (Waters) and digested with trypsin (1:50).

Size exclusion Chromatography of proteins

S2 cells were washed five times in ice-cold PBS and lysed in fresh, ice-cold RIPA buffer (150 mM NaCl, 50 mM Tris-HCl pH 7.5, 500 µM EDTA, 100 µM EGTA, 1.0% Triton X-100 and 1% sodium deoxycholate) supplemented just before use with Complete™ protease inhibitor cocktail (Roche). Proteins were reduced with 5 mM tributylphosphine (Sigma) for 30 minutes at 37 °C.
Cysteine residues were carbamidomethylated with 10 mM iodoacetamide in 10 mM ammonium bicarbonate for 2 hours at 20 °C. The protein mixture was centrifuged at 2,300 × g at 4 °C through an Ultrafree-CL high-flow Biomax-PB filter unit (Millipore, 10 kDa cutoff). The flow-through was recovered, and proteins were precipitated with ice-cold acetone at -20 °C for 90 minutes.

ICAT-labeling, processing and analyzing protein samples

The general procedures for ICAT labeling, processing and analyzing protein samples are described elsewhere (refs. 11-13). The specific differences from these general procedures used in our study are detailed below. For the cytoplasmic and nuclear fractions, 4 mg of protein dissolved in ICAT Labeling Buffer was reduced with 5 mM tributylphosphine for 30 minutes at 37 °C. The sample was then split in half (2 mg/ml), and the halves were differentially labeled (1:1) with 1.2 mM isotopically light or heavy ICAT reagent (Applied Biosystems) for 90 min in the dark at 20 °C with gentle mixing. After the heavy- and light-labeled fractions were combined, the mixture was diluted 6-fold with H2O [final concentrations: urea 1 M, SDS 0.008%]. It was then digested with trypsin [1:50 (w/w) trypsin:protein] (Promega, Madison, WI) at 37 °C for 18 hours. Samples of each labeled fraction and of the trypsinized mixture were then run on polyacrylamide gels and silver-stained to assess whether labeling and trypsinization had been successful. For the membrane fraction, 4 mg of protein was completely dissolved in Membrane Labeling Buffer at a concentration of 2 mg/ml by mixing overnight. At this ratio of SDS to membrane lipid (15:1), all membrane lipid should have been removed from the integral membrane proteins, with lipid and protein present in separate micelles. This membrane protein sample was then reduced and differentially labeled with the light and heavy ICAT reagents in the same way as described above for the cytoplasmic and nuclear fractions.
(Separate control experiments using cysteine-rich non-membrane proteins demonstrated complete ICAT labeling of the protein in the presence of 8 M urea together with 1.2% SDS; unpublished observations, Kregenow and Brunner.) After the light- and heavy-labeled fractions were combined, the mixture was diluted 40-fold with H2O [final concentrations: urea 0.2 M, SDS 0.03%] and digested with trypsin [1:10 (w/w) trypsin:protein] at 37 °C for 24 h. Samples of each labeled fraction and of the trypsinized mixture were analyzed, as before, on silver-stained polyacrylamide gels to confirm that both labeling and trypsinization had been successful.

Upstream fractionation of peptides

The peptide mixture was routinely fractionated on a cation-exchange chromatography column as the first step of a three-step chromatographic separation (Supplementary Table 2 online). Fifty to sixty one-minute fractions were collected from a 2.1 mm × 20 cm PolySULFOETHYL A column (PolyLC Inc., Columbia, MD; 5 µm particles, 300 Å pore size) at a flow rate of 200 µl/min. Buffer A was [20 mM KH2PO4, 25% CH3CN (pH 3.0)] and buffer B was [20 mM KH2PO4, 25% CH3CN, 350 mM KCl (pH 3.0)]. The gradient ran from 0 to 25% buffer B over 30 min, followed by 25 to 100% buffer B over 20 min. For the second step, adjacent cation-exchange fractions with low peptide content (determined by absorbance measurements at 214 nm) were pooled, so that only approximately 30 cation-exchange fractions were purified further by affinity chromatography on an ABI monomeric avidin column (see ABI literature for the protocol). The final eluate from these samples, containing the ICAT-labeled peptides, was dried, and the peptides were resuspended in 15 µl of buffer A-1 (see below).
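The cation-exchange gradient described above can be expressed as a piecewise-linear program. The sketch below returns the nominal %B and KCl concentration at any time, for example to estimate the salt level of a given one-minute fraction. It ignores the dwell volume and delay of the system. Holding 100% B after 50 min is an assumption, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (time in min, % buffer B) breakpoints of the SCX gradient; linear in between.
SCX_PROGRAM: List[Tuple[float, float]] = [(0.0, 0.0), (30.0, 25.0), (50.0, 100.0)]
KCL_IN_BUFFER_B_MM = 350.0  # buffer B: 350 mM KCl; buffer A contains no KCl


def percent_b(t: float, program: Sequence[Tuple[float, float]] = SCX_PROGRAM) -> float:
    """%B at time t by linear interpolation between the program breakpoints."""
    if t <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return program[-1][1]  # assumption: final composition held after the last step


def kcl_mm(t: float) -> float:
    """Nominal KCl concentration (mM) delivered at time t."""
    return KCL_IN_BUFFER_B_MM * percent_b(t) / 100.0


if __name__ == "__main__":
    # Nominal salt at the midpoint of every fifth one-minute fraction.
    for minute in range(0, 50, 5):
        print(minute + 1, round(kcl_mm(minute + 0.5), 1))
```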
The third step of the separation was online reversed-phase (RP) capillary chromatography on a column positioned directly upstream of the mass spectrometer [75 µm × 10 cm, self-packed with Magic C18AQ resin (5 µm, 200 Å) from Michrom BioResources]. Peptides were eluted from the RP column at a flow rate of approximately 250 nl/min with a 100-minute gradient of 10% to 35% buffer B-1 [buffer A-1: 0.2% formic acid, 5% CH3CN in H2O; buffer B-1: 0.2% formic acid in 80% CH3CN and 20% ACN]. The amount of each sample loaded onto the column (and therefore analyzed in the mass spectrometer) was determined by its peptide content, which had been estimated from the 214 nm absorbance measured during cation-exchange chromatography. Between 25% and 50% of the cytoplasmic and nuclear fraction samples were loaded, whereas entire membrane-fraction samples were generally loaded.
References
1. Gasteiger, E. et al. ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31, 3784-3788 (2003).
2. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268, 78-94 (1997).
3. Eng, J., McCormack, A. & Yates III, J. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5, 976-989 (1994).
4. Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74, 5383-5392 (2002).
5. Kapp, A. et al. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 5, 3475-3490 (2005).
6. Desiere, F. et al. Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol 6, R9 (2005).
7. Dignam, J.D. Preparation of extracts from higher eukaryotes. Methods Enzymol 182, 194-203 (1990).
8. Dignam, J.D., Lebovitz, R.M. & Roeder, R.G. Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res 11, 1475-1489 (1983).
9. Mendez, J. & Stillman, B. Chromatin association of human origin recognition complex, cdc6, and minichromosome maintenance proteins during the cell cycle: assembly of prereplication complexes in late mitosis. Mol Cell Biol 20, 8602-8612 (2000).
10. Krijgsveld, J., Gauci, S., Dormeyer, W. & Heck, A.J. In-gel isoelectric focusing of peptides as a tool for improved protein identification. J Proteome Res 5, 1721-1730 (2006).
11. Han, D.K., Eng, J., Zhou, H. & Aebersold, R. Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat Biotechnol 19, 946-951 (2001).
12. Smolka, M.B., Zhou, H., Purkayastha, S. & Aebersold, R. Optimization of the isotope-coded affinity tag-labeling procedure for quantitative proteome analysis. Anal Biochem 297, 25-31 (2001).
13. Von Haller, P.D. et al. The application of new software tools to quantitative protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry: I. Statistically annotated datasets for peptide sequences and proteins identified via the application of ICAT and tandem mass spectrometry to proteins copurifying with T cell lipid rafts. Mol Cell Proteomics 2, 426-427 (2003).