Improved protein separation and identification by use of 2D liquid protein fractionation and ion mobility mass spectrometry
Susan E. Slade¹; Konstantinos Thalassinos¹; Sarah J. Nicholson¹; Jonathan P. Williams¹; James H. Scrivens¹; Kevin Giles² and Robert H. Bateman²
¹ Biological Mass Spectrometry and Proteomics, University of Warwick, Coventry, United Kingdom
² Waters Micromass MS Technologies, Floats Road, Wythenshawe, Manchester, United Kingdom
OVERVIEW
Purpose
To utilise and evaluate the combination of a commercial two-dimensional liquid protein separation system with mass spectrometry-based protein identification for proteomics applications.
Methods
E. coli cell lysate proteins were resolved by chromatofocusing followed by hydrophobicity-based (reverse phase) chromatographic separation. A number of protein fractions were concentrated and tryptically digested prior to analysis by LC-ESI-MS/MS and protein identification. Bioinformatic approaches were developed to extract biologically relevant information from the generated dataset.
Results
Preliminary results indicate a significant improvement in protein identification confidence, owing to increased sequence coverage, for proteins spanning a wide range of molecular weights and isoelectric points, compared with samples from gel-based sources. Among other observations, one protein eluted at a number of pI intervals, with differing peptide sequences observed in each fraction. In addition, the solution-based protein separation system allows both tryptically digested proteins and the intact species to be analysed, enabling more complete protein characterisation.
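Chromatofocusing separates proteins by pI. Comparing where a protein elutes with its sequence-predicted pI, as for the protein above that eluted over several pI intervals, therefore needs a pI estimate. The following minimal Python sketch is not the method used in this work. It sums Henderson-Hasselbalch charge fractions and bisects for the pH of zero net charge. The pKa values are an illustrative assumption: published calculators use different sets, so absolute pI values vary by tool. The 0.5 pH-unit window in the hypothetical `off_predicted_pi` helper is also an assumption.

```python
# Minimal sketch: sequence-based pI estimate for comparison with CF elution pH.
# ASSUMPTION: illustrative pKa values; other calculators use different sets.
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}            # protonated form = +1
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}  # deprotonated form = -1


def net_charge(sequence: str, ph: float) -> float:
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    seq = sequence.upper()
    # Fraction of each basic group that is protonated (positive).
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for residue, pka in PKA_BASIC.items():
        positive += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    # Fraction of each acidic group that is deprotonated (negative).
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue, pka in PKA_ACIDIC.items():
        negative += seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect on pH 0-14 for the point where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge falls monotonically as pH rises, so keep the sign bracket.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def off_predicted_pi(sequence: str, elution_ph: float, window: float = 0.5) -> bool:
    """Flag a protein eluting more than `window` pH units from its predicted pI.

    HYPOTHETICAL helper: the 0.5 pH-unit threshold is an illustrative choice.
    """
    return abs(elution_ph - isoelectric_point(sequence)) > window


if __name__ == "__main__":
    for peptide in ("GRGDS", "SDGRG", "AFTSEEFTHFLEELTK"):
        print(f"{peptide}: predicted pI {isoelectric_point(peptide):.2f}")
```

Only residue counts enter this model. Sequence isomers such as GRGDS and SDGRG therefore receive identical predicted pI values, and effects such as post-translational modifications are ignored.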
INTRODUCTION
Gel-based proteomics experiments have proved highly successful in analysing relatively complex biological systems, but they have a number of limitations. These include limited sample loading capacity, which reduces the quantity of protein available for downstream mass spectrometry-based identification; when the capacity is exceeded, protein species are poorly resolved. Other limitations include gel-to-gel variation, narrow dynamic range, difficulty in generating a homogeneous sample because of protein solubility problems, and low efficiency of peptide extraction after tryptic digestion, which compounds the problems encountered in protein identification. Prefractionation of complex proteomes may be required prior to analysis, particularly when the sample contains a number of highly abundant proteins. The characterisation of low abundance species continues to be a challenge in proteomics.
Figure 1. Schematic representation of the separation mechanism employed by the PF2D protein separation system (Beckman Coulter) and subsequent potential MS analysis strategies (schematic elements: sample; chromatofocusing column (CF); UV detector, 280 nm; reverse phase column (RP); UV detector, 214 nm; intact mass; digest; MALDI-PMF; LC-ESI-MS/MS; database search).
Typically, a single 2D-LC separation, combining the protein fractions collected from the CF and RP columns, could generate approximately 800 samples for digestion and MS analysis. A more focused approach to the analysis of biological systems is therefore required, together with the means to derive biologically relevant information from the vast quantity of mass spectrometry data generated.
Each CF fraction was then applied to a non-porous silica RP column (Beckman Coulter) and further resolved by an organic solvent gradient of acetonitrile, with trifluoroacetic acid as the ion-pairing agent.
Fixed-volume fractions were collected during each separation and aliquoted into smaller volumes prior to storage at -70 °C. A number of fractions were selected for MS-based protein identification across a range of pH intervals and apparent protein concentrations. A 175 µL aliquot of each fraction was dried, reconstituted in 10 µL of ammonium bicarbonate buffer and processed on a MassPrep robotic protein handling system (Waters Micromass MS Technologies) using the manufacturer's in-solution digest protocol. In brief, protein samples were reduced, alkylated with iodoacetamide and digested with trypsin, and the resultant peptides were acidified.
Figure 3. ProteoView visualisation of E. coli proteins fractionated using a 2D-LC approach. Vertical columns represent CF fractions and horizontal bands represent protein absorbance. Shown on the left is the absorbance measured at 214 nm during the reverse phase elution of one chromatofocusing fraction.
• We have identified 183 proteins with a minimum of one observed peptide, of which 92 were identified with two peptides and 51 with a minimum of three peptides.
• In addition, many CF fractions contain multiple proteins, often each identified by several peptides; see Figure 4.
We have developed extensive in-house bioinformatics resources that have facilitated interpretation of the 2D-LC data; elements of these are presented on poster TP28 (Thalassinos, Slade et al.).
The tryptic extracts were analysed by nano-LC-ESI-MS/MS on a Q-Tof Ultima Global with an in-line CapLC system (Waters Micromass MS Technologies). Each tryptic extract was desalted on an in-line C18 precolumn cartridge (Dionex, U.S.A.), and the peptides were further resolved on a 75 µm C18 PepMap column (Dionex, U.S.A.) using an increasing acetonitrile gradient.
Owing to the highly complex nature of many biological systems, it may not be possible to resolve every protein in a sample into an individual fraction using the 2D-LC separation.
Therefore we have explored the potential of ion mobility to resolve peptides of identical m/z, such as may result from tryptic digestion of a 2D-LC fraction containing several proteins; see Figure 2.
Ion mobility study of isomeric peptides
Doubly charged precursor ions at m/z 246 from two isomeric peptides with the sequences GRGDS and SDGRG were selected for ion mobility study using the Synapt HDMS (Waters Micromass MS Technologies). This is a hybrid MS-IMS-MS instrument with a quadrupole/IMS/orthogonal TOF configuration. The ion mobility separation (IMS) device comprises three consecutive travelling wave RF ion guides (Triwave), which use a repeating sequence of transient DC pulses to propel ions through the guide in the presence of a background gas. Ions are accumulated in the Trap T-Wave and periodically released into the IMS T-Wave, where they separate according to their mobility.
RESULTS
Protein resolution is achieved in the first dimension by chromatofocusing (CF), which generates a pH gradient on an ion exchange resin. The column is equilibrated at basic pH and the solubilised proteins are applied. Proteins whose isoelectric point (pI) equals the column pH carry zero net charge, so they do not bind, elute immediately and are collected. Over a period of a few hours the column pH is reduced, and the proteins elute sequentially according to their pI and are collected for further analysis. Any proteins still bound to the column at acidic pH are eluted with a high ionic strength buffer and the fractions collected; see Figure 1.
Protein identification
To date, we have obtained substantial numbers of protein identifications from seven of the 32 possible reverse phase-separated CF fractions, spanning the pH region 6.4 to 4.0.
Figure 2.
Overlay of the ion mobility arrival time distributions for the doubly charged ions of the two isomeric peptides, recorded on the Synapt HDMS (Waters Micromass MS Technologies, U.K.). Collision cross-section measurements indicate a 5 % difference in physical size and shape.
• Each of the CF fractions contains multiple proteins, as shown by the one-dimensional representation generated by the ProteoView software (Beckman Coulter); see Figure 3. Each vertical "track" depicts a single CF fraction, and the horizontal bands indicate protein detected by absorbance at 214 nm.
CONCLUSIONS AND FUTURE WORK
The combination of 2D-LC chromatofocusing and hydrophobicity fractionation prior to MS-based protein identification has proved highly successful in the analysis of our model system. We have demonstrated that proteins identified with one peptide in one RP fraction may subsequently be observed in other fractions from the 2D-LC separation. The biological implications of this project will only become evident when the complete dataset of RP fractions has been analysed and the proteins identified.
ProteinLynx Global Server 2.1 (Waters Micromass MS Technologies) was used to search the LC-ESI-MS/MS data against an in-house database containing sequences from E. coli W3110 together with trypsin and keratin contaminants. MySQL 5.0 (http://www.mysql.com/) was chosen to store the experimental and protein identification results. A program written in the Java programming language (v 1.5.0_04) allows the user to enter a variety of 2D-LC and MS experimental parameters. These include (but are not restricted to) methods, locations of CF and RP fractions (tray and well numbers), processing parameters, data file locations and processed spectra. The program also parses the Global Server protein identification results and links each identification to its fraction.
The database can then be queried, for example to list all fractions from the CF and RP separations in which a protein, specified by accession number, was identified.
Figure 5. Interpreted product ion spectrum of a doubly charged tryptic peptide with amino acid residue assignments.
The ability to load milligram quantities of protein results in identifications with substantially improved sequence coverage, and thus greater confidence in protein assignments. Where protein sequences are not available for database interrogation, the high quality of the tandem MS data, in combination with the higher m/z peptides observed, allows longer stretches of peptide sequence to be determined de novo.
Relational database model
We have combined a two-dimensional liquid chromatography (2D-LC) protein separation system with mass spectrometry-based identification using a well-characterised, commercially available cell lysate. Further resolution of the proteins is achieved by a second dimension of separation according to hydrophobicity. Each fraction from the chromatofocusing column is applied in turn to a reverse phase (RP) column under aqueous conditions. An organic solvent gradient elutes the proteins from the column, with hydrophilic proteins emerging first, followed by progressively more hydrophobic ones.
A commercially available Escherichia coli (strain W3110) cell lysate containing 3 mg of protein was applied to an HPCF chromatofocusing column equilibrated in CF start buffer, using the PF2D protein fractionation system (Beckman Coulter, U.S.A.). Proteins were resolved over the pH range 8.5 to 4.0 using the proprietary elution buffer, followed by elution with 1 M sodium chloride. Fractions were collected either by pH interval or by volume.
The vast quantity of data generated for each CF and RP fraction makes bioinformatic handling of the biological information essential.
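The fraction-to-identification links described above lend themselves to a simple relational model. The following is a minimal, hypothetical Python illustration. It uses the built-in sqlite3 module as a stand-in for MySQL 5.0. The two-table schema (`fraction`, `identification`), its column names, the placeholder accessions `PROT_A` and `PROT_B`, and all rows are invented for illustration. They do not reproduce the in-house Java program or its schema.

```python
import sqlite3

# HYPOTHETICAL schema: one row per CF/RP fraction, one row per identification.
SCHEMA = """
CREATE TABLE fraction (
    fraction_id   INTEGER PRIMARY KEY,
    cf_fraction   INTEGER NOT NULL,  -- chromatofocusing fraction number
    rp_fraction   INTEGER NOT NULL,  -- reverse phase fraction number
    tray          TEXT,
    well          TEXT
);
CREATE TABLE identification (
    fraction_id   INTEGER NOT NULL REFERENCES fraction(fraction_id),
    accession     TEXT NOT NULL,     -- protein accession number
    peptide_count INTEGER NOT NULL
);
"""

# List every CF/RP fraction in which a given accession was identified.
QUERY = """
SELECT f.cf_fraction, f.rp_fraction, f.tray, f.well, i.peptide_count
FROM identification AS i
JOIN fraction AS f ON f.fraction_id = i.fraction_id
WHERE i.accession = ?
ORDER BY f.cf_fraction, f.rp_fraction;
"""


def fractions_for_accession(conn, accession):
    """Return (cf, rp, tray, well, peptide_count) rows for one protein."""
    return conn.execute(QUERY, (accession,)).fetchall()


def build_demo_db():
    """In-memory database populated with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO fraction VALUES (?, ?, ?, ?, ?)",
        [(1, 3, 12, "T1", "A1"), (2, 3, 13, "T1", "A2"), (3, 4, 12, "T1", "B1")],
    )
    conn.executemany(
        "INSERT INTO identification VALUES (?, ?, ?)",
        [(1, "PROT_A", 1), (2, "PROT_A", 6), (3, "PROT_B", 2)],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    # prints (3, 12, 'T1', 'A1', 1) then (3, 13, 'T1', 'A2', 6)
    for row in fractions_for_accession(db, "PROT_A"):
        print(row)
```

The demo rows mimic the pattern noted in the results: a protein seen with one peptide in one RP fraction is identified with more peptides in the adjacent fraction. Against MySQL, the same SELECT would apply, but most Python MySQL drivers use `%s` rather than `?` as the parameter placeholder.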
Proteins that elute from the CF column at pH values differing from their predicted pI have been selected for further characterisation.
Figure 4. Global Server protein identifications from a single reverse phase fraction of the 2D-LC separation. The upper left pane indicates that multiple protein species are present.
The ability to recover intact proteins after the RP separation means that post-translational modification mapping can be undertaken on both the digested and the full-length protein. The latter approach is currently being explored in a high-throughput manner.
• We propose to introduce real-time database searching and on-the-fly exclusion of identified proteins to promote observation of lower abundance protein species.
• The RMMs of proteins identified to date range from 8 kDa to 100 kDa.
• Abundant peptides carrying more than three charges have been observed, several with theoretical masses greater than 3.5 kDa.
• The product ion (MS/MS) spectra of higher m/z peptides can be interpreted to yield good candidate amino acid sequences de novo; see Figure 5.
• The majority of proteins elute near their expected pI with reproducible RP retention times, but some proteins behave differently from expectation (see poster TP28 for further details).
• Abundant proteins may elute across adjacent fractions from both the CF and RP columns. Frequently a protein is identified in one fraction by only one peptide and subsequently in the next fraction by significantly more peptides.
• We will incorporate more directed mass spectrometry approaches to peptide identification, e.g. a neutral loss trigger for data-directed acquisition of phosphorylated peptides.
• We plan to extend our studies to biological systems that are incompatible with gel-based separation methods.
• Due to the complexity of the fractions analysed, protein quantitation may be an issue.
We plan to assess the suitability of iTRAQ technology (Applied BioSystems, U.S.A.) for quantitation of protein expression levels in proteomic studies in combination with 2D-LC protein fractionation.
• Sequence coverage can approach 50 % for proteins with RMMs from 9 to 35 kDa.
• Using the database, we have identified two doubly charged, nominally isobaric peptides at m/z 965 (AFTSEEFTHFLEELTK and LVDKVIGITNEEAISTAR). They have identical protein elution profiles from the RP column but differing sequences, making them candidates for ion mobility separation and MS/MS identification. Fragmentation would be performed in the first (Trap) and/or third (Transfer) T-Wave ion guides of the Synapt HDMS system, generating first- and second-generation fragment ion spectra.
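The precursor m/z values quoted in this poster can be checked with a short calculation. The following minimal Python sketch computes monoisotopic [M+zH]z+ values from standard residue masses. It also performs an in-silico tryptic digest using the common rule (cleave after K or R unless followed by P). Finally, it gives the product-ion m/z that a phosphate (H3PO4, 97.9769 Da) neutral-loss trigger would look for. These are illustrative assumptions, not the scoring used by ProteinLynx Global Server, and all function names are hypothetical. Cysteines are assumed fully carbamidomethylated, to reflect the iodoacetamide alkylation step.

```python
"""Precursor m/z, in-silico tryptic digestion and neutral-loss targets (sketch)."""
import re
from typing import List

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # H2O added once per peptide (termini)
PROTON = 1.00728            # proton mass
CARBAMIDOMETHYL = 57.02146  # ASSUMPTION: every cysteine alkylated by iodoacetamide
H3PO4 = 97.97690            # phosphoric acid neutral loss from phosphopeptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of an unmodified (apart from Cys) peptide."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in peptide)
    return mass + CARBAMIDOMETHYL * peptide.count("C")


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M+zH]z+ ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> List[str]:
    """Cleave after K/R unless followed by P; optionally allow missed cleavages."""
    # Zero-width split after K or R when the next residue is not P (Python 3.7+).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for start in range(len(pieces)):
        # Join up to `missed_cleavages` extra consecutive pieces.
        for end in range(start, min(start + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[start:end + 1]))
    return peptides


def neutral_loss_mz(precursor: float, charge: int, loss: float = H3PO4) -> float:
    """Product m/z a neutral-loss trigger would watch for: precursor minus loss/z."""
    return precursor - loss / charge


if __name__ == "__main__":
    for pep in ("GRGDS", "SDGRG", "AFTSEEFTHFLEELTK", "LVDKVIGITNEEAISTAR"):
        print(f"{pep}: [M+2H]2+ m/z {precursor_mz(pep, 2):.3f}")
    print(tryptic_digest("LVDKVIGITNEEAISTAR", missed_cleavages=1))
```

Under these assumptions, GRGDS and SDGRG give the same [M+2H]2+ value (about m/z 246.11), so only their mobility can distinguish them. AFTSEEFTHFLEELTK and LVDKVIGITNEEAISTAR give about m/z 964.97 and 965.04, a difference much smaller than a typical quadrupole isolation window. The digest also shows that LVDKVIGITNEEAISTAR contains one missed cleavage, after the internal K.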