Susan E. Slade¹, Eamonn Breslin², Steve Thornton², Ranjit Akolekar³, Kypros Nicolaides³, Hayley Crowe⁴, Steve McDonald⁴, John P. Shockcor⁴ and James H. Scrivens¹
¹Warwick University, Coventry, U.K.; ²Warwick Medical School, Coventry, U.K.; ³King’s College Hospital, London, U.K.; ⁴Waters Corporation, Milford, U.S.A.

OVERVIEW
Purpose
Confidently identify and quantify the proteins present in individual plasma samples after a depletion strategy has been employed to remove highly abundant proteins. Assess the natural variation in the plasma proteome, at 11-13 weeks gestation, of women carrying a normal or trisomy 21 fetus and of women who go on to develop early onset pre-eclampsia. Compare the proteomes obtained from individual patients with those of pooled samples from the same obstetric condition. Compare the lipidomes obtained from a number of the plasma samples used in the proteomics study.

Methods
Extraction of the lipidome from individual samples of non-depleted plasma. Depletion of highly abundant proteins followed by tryptic digest of individual maternal plasma samples. Reversed phase (RP) liquid chromatography (LC) and two-dimensional RP/RP-LC of the depleted proteome. MS data acquisition incorporating alternating low and elevated collision energy (MSE) using a non-labelled approach. Protein identification with relative and absolute quantitation, and multivariate statistical analysis of lipidomics data.

Results
Protein identification with relative and absolute quantitation of the depleted plasma proteome. Variation in the natural abundance of some proteins within maternal plasma from 29 women and three obstetric conditions. A small number of “biomarker” proteins show good correlation between the results obtained from pooled and individual samples. Three lipids observed at slightly elevated levels in pre-eclampsia.
INTRODUCTION
The identification of protein biomarkers in plasma is particularly challenging: the dynamic range of protein concentrations is calculated to approach 10^12, and the 12 most abundant proteins represent over 95% of the protein complement [1]. The complexity caused by demographic variation between patients requires that any methodology used to quantify the proteomes be robust, reproducible and capable of analysing large datasets. A number of analytical approaches have been used to study the plasma proteome, including quantitative measurements based on 2D gels or iTRAQ™-labelled peptides, which provide information on the protein levels in the samples. These are frequently limited to relative quantitation measurements. We have been using an LC-MS-based methodology which relies on changes in signal response between peptide accurate mass measurement/retention time pairs to directly reflect concentrations in one sample relative to another [2]. This has since been developed into a label-free system capable of relative and absolute quantification [3-4]. All detectable, eluting peptides and their corresponding fragments are observed via rapid switching between low and elevated collision energy (MSE) during the LC-MS experiment, giving a comprehensive list of all ions that can subsequently be searched, resulting in protein identification and quantitative measurements [5] (Figure 1), with significantly improved results compared with iTRAQ™-labelled samples [6].

METHODS
Double-centrifuged maternal plasma samples were supplied by King’s College Hospital (KCH), London, with ethical approval and stored at -70 °C until use. For the samples analysed individually, 10 normal, 10 PE and 9 T21 samples were chosen at random from a selection supplied by KCH. The pooled samples were generated from a different set of plasma samples, drawn from 20 normal and 20 T21 pregnancies. The plasma used in this study spanned ethnicity, age and BMI within the gestational range 81-102 days.

RESULTS
RP analysis of individual plasma samples depleted of 14 highly abundant proteins
A total of 29 individual plasma samples were depleted, digested and analysed by quantitative LC-MSE analysis in triplicate. Ten normal, ten PE and nine T21 samples were completed in this data set. Protein loading was calculated using the IdentityE algorithm in PLGS and optimised around 615 ng on column; typically 113 proteins were identified from each replicate injection (Figure 2).

The incidence of chromosomal abnormalities in the absence of prenatal screening is estimated to be 6 in 1000 births. Abnormalities may include deletions, translocations or duplications (trisomies), of which the most prevalent is Down syndrome (T21). Screening for T21 has become common practice in developed countries. It combines ultrasound, used to identify a build-up of fluid at the back of the neck termed nuchal translucency [8], with biochemical tests to identify higher risk patients. Diagnosis via amniocentesis or chorionic villus sampling carries a miscarriage rate of 1-2%.

Using a combination of depletion of 14 highly abundant proteins, tryptic digest and LC-MSE, we have analysed a total of 29 individual plasma samples, taken at 11-13 weeks gestation, from women carrying a normal or trisomy 21 (T21) fetus and women who develop pre-eclampsia (PE), and compared the results against data obtained from pooled samples.

The data acquired on the extracted lipid samples were analyzed using multivariate statistical methods with MarkerLynx XS software. The partial least squares discriminant analysis (PLS-DA) model has 3 principal components (Figure 8, left), each a weighted average of the original variables. The scores PC(1) and PC(2), shown in Figure 8, left, are the two most important new variables in summarizing and separating the data. The three groups are shown in different colors to aid in visualization.
Some slight variance was observed between the PE and normal groups, although less than half the variance is explained. An orthogonal projections to latent structures discriminant analysis (OPLS-DA) was performed on the data from the normal and PE groups, with variance displayed along the t(1) axis (Figure 8, middle), allowing analysis of the loadings using the S-Plot. The S-Plot displays covariance p[1] versus correlation p(corr)[1] loadings from the two-class OPLS-DA model (PET vs. N) (Figure 8, right). The points shown in the plot are Exact Mass/Retention Time pairs (EMRTs). From these data we observed that three phospholipids are slightly elevated (factor of change below 1.5) in the PET group (Table 3).

Table 1. Highly abundant protein species depleted using Seppro® IgY14
Albumin
Transferrin
α1-Acid Glycoprotein (Orosomucoid)
IgG
α1-Antitrypsin
α2-Macroglobulin
HDL (Apolipoproteins A-I and A-II)
IgA and IgM
Fibrinogen
Haptoglobin
LDL (mainly Apolipoprotein B)
Complement C3

Figure 2. Sample loading (left) and proteins identified (right) from technical replicates of LC-MSE analyses of depleted individual patient plasma samples.

Sample Preparation
The depleted plasma samples were solubilised in 0.1% Rapigest™ solution, concentrated and then heated. The samples were reduced, alkylated and digested with trypsin overnight at 37 °C. The samples were acid treated, filtered through a 0.22 µm membrane and stored at -70 °C. An aliquot of the sample was transferred to a new vial and internal standard was added at a fixed concentration.

In the RP LC-MSE data set of 87 injections, 1,265 proteins were identified, of which 894 were observed in only one injection and 371 were identified in two or more injections (Figure 3).
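For the lipid comparison, the factor of change listed in Table 3 appears consistent with the ratio of the PET group average to the normal group average. The short check below uses the group averages as printed in Table 3. Treating the factor as a simple ratio of means, and rounding to one decimal place, are assumptions made for illustration rather than a documented MarkerLynx calculation.

```python
# Group averages transcribed from Table 3 as (Average (N), Average (PET)).
# The intensity units are not stated in the source.
TABLE_3_AVERAGES = {
    "PC 16.0/20.4": (120.944, 168.405),
    "PC 16.0/22.5": (31.3426, 41.8732),
    "PC 16.0/22.6": (16.5139, 23.8588),
}


def factor_of_change(avg_normal: float, avg_pet: float) -> float:
    """Assumed definition: PET group mean divided by normal group mean."""
    return avg_pet / avg_normal


# Round to one decimal place, matching the precision shown in Table 3.
factors = {
    lipid: round(factor_of_change(avg_n, avg_pet), 1)
    for lipid, (avg_n, avg_pet) in TABLE_3_AVERAGES.items()
}

for lipid, factor in factors.items():
    print(f"{lipid}: factor of change = {factor}")
```

Under this assumption the three ratios round to 1.4, 1.3 and 1.4, matching the tabulated values and staying below the 1.5 threshold quoted in the text.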
2D High/Low pH RP/RP-Liquid Chromatography-MSE acquisition
Approximately 2.5 µg of pooled plasma digest containing alcohol dehydrogenase as internal standard was loaded onto the first dimension column, Xbridge™ C18 (300 µm x 5 cm, 5 µm), using a 2D NanoAcquity UPLC® system equilibrated in 20 mM ammonium formate at pH 10 at 2 µL/min. A discontinuous 6-step gradient of acetonitrile was used (11.1, 14.5, 17.4, 20.8, 45 and 65%) to elute peptides onto a trapping column, described above. The fractions containing organic solvent were diluted ten-fold using aqueous flow from the 2nd dimension pump prior to trapping. For the 2nd dimension a 20 cm BEH™ C18 column was used, as described above, using a 300 nL/min flow rate, and data were acquired on a QToF Synapt® HDMS in triplicate.

Figure 5. Average natural abundance of proteins that show similar levels between obstetric conditions (upper) from individual depleted patient plasma, based on three replicate analyses. Ceruloplasmin (lower left) and Complement C3 (lower right) levels after depletion with standard deviation indicated.

2D RP/RP analysis of pooled plasma samples depleted of 14 highly abundant proteins
The tryptically digested, pooled IgY14-depleted plasma samples from 20 women carrying either a normal or a trisomy 21 foetus were analysed by 2D RP/RP-LC-MSE. In total 173 plasma proteins were confidently identified (observed in at least 2 technical replicates) in the pooled, depleted normal and trisomy 21 samples from the 2D analyses. The protein levels obtained from the 2D analyses were compared for the normal and T21 outcomes using the ExpressionE algorithm.

Figure 3. Frequency of protein observations from technical replicates of LC-MSE analyses of depleted individual patient plasma samples.

Sample Preparation for lipidomic analysis
Lipids were extracted from human plasma by adding 30 µL of plasma followed by 180 µL of methanol, and then 360 µL of dichloromethane.
The sample was centrifuged at 13,000 rpm and the organic layer was extracted. This was diluted 5-fold with buffer A and 5 µL was injected into the system.

Liquid Chromatography-MSE acquisition for lipid extracts
Lipid extracts were loaded onto an HSS T3 column (2.1 x 100 mm, 1.8 µm) fitted to an ACQUITY UPLC® (Waters) held at 65 °C with a flow rate of 500 µL/min acetonitrile/water (40/60) with 10 mM AmAc (buffer A). A linear gradient of 40-100% buffer B (acetonitrile/isopropanol (10:90) with 10 mM AmAc) was performed over 10 minutes. Data were acquired in MSE mode on a Q-Tof Synapt® G2 with a cone voltage of 35 V, and a desolvation temperature and gas flow of 400 °C and 800 L/hr respectively. The mobility gas used was nitrogen at 32 mL/min.

A small number of proteins that were not identified by the RP LC-MSE approach were identified as unique to one or other clinical outcome. A total of 66 proteins were identified by the ExpressionE algorithm as present at differing levels between the normal and T21 pooled samples. A plot of the ratios, including the standard deviation values, is shown in Figure 6.

The average sequence coverage from the data set was 30.8% and the average number of peptides identified per protein was 16.4. For each patient, the three technical replicates were compared and protein abundances were calculated from the IdentityE tables (as a % of the total loading) for each protein that was observed in two or more replicates. The false positive protein identification rate was <0.3% after the removal of proteins observed in only one of the three replicates for each patient. The abundances of the proteins depleted from the plasma prior to digestion (Table 1) were calculated (Figure 4).
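The replicate filtering and abundance calculation described above could be sketched as follows. This is a minimal illustration, not the PLGS or Excel workflow itself. The function names, the input layout (one dict of protein to ng on column per injection), the use of the summed amounts in an injection as its total loading, the averaging over only those replicates in which a protein was seen, and the `RANDOM` prefix marking decoy entries are all assumptions made for the example.

```python
from collections import defaultdict


def confident_abundances(replicates, min_replicates=2):
    """Return {protein: mean % of total loading} for one patient.

    replicates: list of dicts, one per technical replicate injection,
    mapping a protein identifier to its amount on column (ng).
    Only proteins seen in at least `min_replicates` injections are kept.
    """
    percents = defaultdict(list)
    for injection in replicates:
        total_ng = sum(injection.values())  # assumed definition of total loading
        for protein, ng in injection.items():
            percents[protein].append(100.0 * ng / total_ng)
    return {
        protein: sum(values) / len(values)
        for protein, values in percents.items()
        if len(values) >= min_replicates
    }


def false_positive_rate(confident, decoy_prefix="RANDOM"):
    """Percentage of confident entries that are random (decoy) sequences."""
    if not confident:
        return 0.0
    decoys = sum(1 for protein in confident if protein.startswith(decoy_prefix))
    return 100.0 * decoys / len(confident)


# Hypothetical triplicate for a single patient (invented values, ng on column).
patient = [
    {"ALBU": 10.0, "CERU": 30.0, "CO3": 60.0},
    {"CERU": 25.0, "CO3": 75.0},
    {"CERU": 20.0, "CO3": 70.0, "RANDOM_1": 10.0},
]
confident = confident_abundances(patient)
print(sorted(confident))               # proteins seen in two or more replicates
print(false_positive_rate(confident))  # decoy share of the confident table, in %
```

In this toy triplicate, ALBU and the decoy entry are each seen once and are therefore dropped, which mirrors how requiring two or more observations removes most random matches.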
Confident identifications, quantitation and comparative protein expression
The protein tables from PLGS IdentityE were compiled in Excel, and pivot tables were used to identify proteins observed in a minimum of 2 replicate analyses from each individual patient; these are termed confident protein identifications. The protein abundance (as a % of the total loading) was then calculated for the confident identifications. The number of random entries in the confident protein table was used to determine the false positive rate for the analyses. For the 2D RP/RP analyses, ExpressionE was used to determine differences in protein levels between the pooled trisomy 21 and normal plasma samples.

Figure 8. Statistical analysis plots generated by MarkerLynx from the extracted lipid samples.

Sixteen proteins were identified in all replicate injections from all patients, regardless of clinical condition.

Reversed Phase Liquid Chromatography-MSE acquisition
Individual patient samples containing 0.6 µg of tryptic digest were loaded onto a Symmetry® C18 trapping column (180 µm x 20 mm, 5 µm) using a NanoAcquity UPLC® system (Waters). The trapping column was flushed for 1 min prior to elution of the peptides onto a BEH™ C18 column (75 µm x 250 mm, 1.7 µm) at 250 nL/min using a linear gradient of 3-40% buffer B (acetonitrile containing 0.1% HCOOH) over 90 minutes. Data were acquired on a Q-ToF Synapt® HDMS (Waters) operated in MSE mode, alternating the trap collision energy from 3 V to a ramped 15-30 V in elevated mode over 2 hours with a 0.9 sec scan rate. Human [Glu1]-Fibrinopeptide B (doubly charged, m/z 785.8426) was used for mass correction. All data were acquired in triplicate.

Figure 1. Flow diagram of the workflow for biomarker discovery in the maternal plasma proteome.

Pre-eclampsia (PE) is a pregnancy-specific condition characterised by hypertension and proteinuria occurring from the second trimester.
It encompasses a variety of hypertensive proteinuric conditions of pregnancy and ranges from a mild disease to a catastrophic, life-threatening condition. For this study we focus on severe early onset PE occurring before 32 weeks gestation, which has the greatest morbidity.

Some proteins show little variation in abundance across the obstetric conditions studied (Figure 5, upper panels), whereas others show greater differences between patients (Figure 5, lower left). Complement C3 was present at significantly elevated levels in some patients despite being depleted during sample processing (Figure 5, lower right).

Depletion of 14 highly abundant proteins
A 50 µL aliquot of each individual or pooled plasma sample was depleted of 14 abundant protein species (see Table 1) using the Seppro® IgY14 LC2 column (Sigma Aldrich). The depletion was repeated 2-4 times, and the depleted fractions were combined and concentrated.

Data processing and database interrogation
The raw data files were processed using ProteinLynx Global Server™ (PLGS) v2.4 with IdentityE and ExpressionE informatics (Waters) using default parameters for MSE data. The database search parameters included the following variable modifications: N-terminal acetylation, deamidation of N/Q and oxidation of M residues. The IPI human database rel. 3.69 was appended to include the sequences for the internal standards. A database was then generated which included one random entry for each original sequence in the file and was used for all subsequent interrogations.

The labour and financial cost of performing quantitative studies on depleted plasma from individual patients has frequently resulted in the use of pooled samples. Information on the natural variation in the abundance of the proteins present between women of different age, ethnicity, body mass index, gestational age etc. is lost.
In addition, a comparison is made between two pooled samples (normal:disease x), yielding protein biomarkers associated with a general inflammatory response that may not be clinically specific to the condition under study.

Variation in natural abundance of proteins in depleted plasma between individual patients
We have determined the abundance of 371 proteins observed in maternal plasma from normal, PE and T21 obstetric conditions analysed in triplicate from a total of 29 patients. No correlation was observed between depleted protein abundance and the number of depletions performed on the IgY14 column.

Figure 6. Ratio of protein levels between normal and T21 outcomes analysed by 2D RP/RP LC-MSE. The plot on a natural log scale includes standard deviation values. Y-axis values are % of total protein loading calculated from IdentityE.

ID            Retention Time  m/z       Factor of Change  Average (N)  Average (PET)  Std.Dev (N)  Std.Dev (PET)
PC 16.0/20.4  5.55            782.5611  1.4               120.944      168.405        35.6807      38.7281
PC 16.0/22.5  5.68            808.5789  1.3               31.3426      41.8732        10.1291      12.3813
PC 16.0/22.6  5.18            806.5638  1.4               16.5139      23.8588        8.33024      9.70142
Table 3. Lipids identified at slightly elevated levels in PE compared to normal plasma

CONCLUSIONS
We have successfully established a robust, statistically valid methodology incorporating the depletion of abundant proteins, tryptic digestion and subsequent analysis of maternal plasma by means of RP or 2D RP/RP LC-MSE. Our 2D RP/RP LC-MSE analyses of pooled samples allowed the identification of proteins present at lower concentrations than could be observed in the RP LC-MSE study, but required more sample and experimental time. Plasma was obtained from women with a range of age, BMI and ethnicity, with an average gestational age of 91 days. Two obstetric conditions, T21 and PE, were analysed and the results compared with women carrying a normal fetus.
Maternal plasma has been analysed both from individual patients and as pooled samples. Plasma samples used for the individual analyses were not included in the pooled samples. Protein abundances were determined as a % of the total loading, based on observation in at least 2 technical replicates for each individual. The abundance of some proteins appears to be relatively constant between patients and clinical conditions, whilst others show greater variation. The highly abundant protein levels were monitored to assess efficiency of depletion but no change was observed during this study.

The log(e) ratios from 53 proteins present at differing levels between T21 and normal plasma obtained from pooled samples were compared with those calculated from the average of the individual samples. The majority of proteins show little correlation between the levels observed in the pooled and the averaged individual samples, with 12 proteins showing relatively good agreement (Figure 7).

Figure 4. Abundances of proteins depleted from individual patient plasma samples and analysed by LC-MSE.

In a comparison of the protein levels observed from the pooled and averaged data sets, for those proteins differing between clinical outcomes, there was agreement for a small number of proteins, although the majority showed little or no correlation. The lipid profiles from the non-depleted plasma were analysed by LC-MSE and three lipids were identified as present at slightly elevated levels in the PE condition.

Figure 7. Comparison of the log(e) ratio of protein variation between normal and T21 outcomes determined from pooled and averaged individual plasma.

REFERENCES
[1] Anderson, N.L. and N.G. Anderson, The Human Plasma Proteome: History, Character, and Diagnostic Prospects. Molecular and Cellular Proteomics, 2002. 1(11): p. 845-867.
[2] Silva, J.C., et al., Quantitative Proteomic Analysis by Accurate Mass Retention Time Pairs. Analytical Chemistry, 2005. 77(7): p. 2187-2200.
[3] Silva, J.C., et al., Absolute Quantification of Proteins by LCMSE: A Virtue of Parallel MS Acquisition. Molecular and Cellular Proteomics, 2006. 5(1): p. 144-156.
[4] Silva, J.C., et al., Simultaneous Qualitative and Quantitative Analysis of the Escherichia coli Proteome: A Sweet Tale. Molecular and Cellular Proteomics, 2006. 5(4): p. 589-607.
[5] Cheng, F.-Y., et al., Absolute Protein Quantification by LC/MSE for Global Analysis of Salicylic Acid-Induced Plant Protein Secretion Responses. Journal of Proteome Research, 2009. 8(1): p. 82-93.
[6] Patel, V.J., et al., A Comparison of Labeling and Label-Free Mass Spectrometry-Based Proteomics Approaches. Journal of Proteome Research, 2009. 8(7): p. 3752-3759.
[7] Spencer, K., Aneuploidy screening in the first trimester. Am J Med Genet C Semin Med Genet, 2007. 145C(1): p. 18-32.
[8] Nicolaides, K.H., Nuchal translucency and other first-trimester sonographic markers of chromosomal abnormalities. Am J Obstet Gynecol, 2004. 191(1): p. 45-67.