Susan E. Slade , Eamonn Breslin , Steve Thornton , Ranjit Akolekar

Susan E. Slade¹, Eamonn Breslin², Steve Thornton², Ranjit Akolekar³, Kypros Nicolaides³, Hayley Crowe⁴, Steve McDonald⁴, John P. Shockcor⁴ and James H. Scrivens¹
¹Warwick University, Coventry, U.K., ²Warwick Medical School, Coventry, U.K., ³King’s College Hospital, London, U.K. and ⁴Waters Corporation, Milford, U.S.A.
OVERVIEW
Purpose
 Confidently identify and quantify the proteins present in individual plasma samples after a depletion strategy has been
employed to remove highly abundant proteins.
 Assess the natural variation in the proteome of women carrying a normal or trisomy 21 fetus and women who develop
early onset pre-eclampsia, in plasma taken at 11-13 weeks gestation.
 Compare the proteomes obtained from individual patients with pooled samples obtained from the same obstetric
condition.
 Compare the lipidomes obtained from a number of plasma samples used in the proteomics study.
Methods
 Extraction of the lipidome from individual samples of non-depleted plasma.
 Depletion of highly abundant proteins followed by tryptic digest of individual maternal plasma samples.
 Reversed phase (RP) liquid chromatography (LC) and two-dimensional RP/RP-LC of the depleted proteome.
 MS data acquisition incorporating alternating low and elevated collision energy (MSE) using a non-labelled approach.
 Protein identification with relative and absolute quantitation and multivariate statistical analysis of lipidomics data.
Results
 Protein identification with relative and absolute quantitation of the depleted plasma proteome.
 Variation in the natural abundance of some proteins within maternal plasma from 29 women across three obstetric
conditions.
 A small number of “biomarker” proteins show good correlation obtained from pooled and individual samples.
 Three lipids observed at slightly elevated levels in pre-eclampsia.
INTRODUCTION
The identification of protein biomarkers in plasma is particularly challenging: the dynamic range of protein
concentrations is calculated to approach 10^12, and the 12 most abundant proteins represent over 95% of the
protein complement [1]. The complexity caused by demographic variation between patients requires that any
methodology employed to quantify the proteomes must be robust, reproducible and capable of analysing large
datasets.
A number of analytical approaches have been utilised to study the plasma proteome, including quantitative
measurements based on 2D gels or iTRAQ™ labelled peptides providing information on the protein levels in the
samples. These are frequently limited to relative quantitation measurements. We have been utilising an LC-MS-based
methodology which relies on changes in the signal response of peptide accurate mass measurement/retention time
pairs to directly reflect concentrations in one sample relative to another [2]. This has since been developed into a label-free system capable of relative and absolute quantification [3-4]. All detectable, eluting peptides and their
corresponding fragments are observed via rapid switching between low and elevated collision energy (MSE) during the
LC-MS experiment, giving a comprehensive list of all ions that can subsequently be searched resulting in protein
identification and quantitative measurements [5] (Figure 1) with significantly improved results compared with iTRAQ™
labelled samples [6].
METHODS
Double centrifuged maternal plasma samples were supplied by King’s College Hospital (KCH), London with ethical approval and
stored at -70 °C until use. For the samples analysed individually, 10 normal, 10 PE and 9 T21 samples were chosen at
random from a selection supplied by KCH. The pooled samples were generated from a different set of plasma samples,
drawn from 20 normal and 20 T21 pregnancies. The plasma used in this study spanned a range of ethnicity, age and BMI within
the gestational range 81-102 days.
RESULTS
RP analysis of individual plasma samples
A total of 29 individual plasma samples (ten normal, ten PE and nine T21) were depleted, digested and analysed by
quantitative LC-MSE analysis in triplicate. Protein loading was calculated using the IdentityE algorithm in PLGS and
optimised at around 615 ng on column; typically 113 proteins were identified from each replicate injection (Figure 2).
The incidence of chromosomal abnormalities in the absence of prenatal screening is estimated to be 6 in 1000 births.
Abnormalities may include deletions, translocations or duplications (trisomies), of which the most prevalent is Down
syndrome (T21). Screening for T21 has become common practice in developed countries, combining ultrasound to identify a
build-up of fluid at the back of the neck, termed nuchal translucency [8], with biochemical tests to identify higher risk
patients. Diagnosis via amniocentesis or chorionic villus sampling carries a miscarriage rate of 1-2%.
Using a combination of depletion of 14 highly abundant proteins, tryptic digest and LC-MSE we have analysed a total
of 29 individual plasma samples, taken at 11-13 weeks gestation, from women carrying a normal or trisomy 21 (T21) fetus and women who develop pre-eclampsia (PE), and compared the results against data obtained from pooled samples.
The data acquired on the extracted lipid samples was analyzed using multivariate statistical methods with MarkerLynx
XS software.
 The partial least squares discriminant analysis (PLS-DA) model has 3 principal components (Figure 8, left), each a
weighted average of the original variables. The scores PC(1) and PC(2), shown in Figure 8, left, are the two most
important new variables in summarizing and separating the data. The three groups are shown in different colors to aid
in visualization.
 Some slight variance was observed between the PE and normal groups, although less than half the variance is
explained.
An orthogonal projections to latent structures discriminant analysis (OPLS-DA) was performed on the data from the normal
and PE groups, with variance displayed along the t(1) axis (Figure 8, middle), allowing analysis of the loadings using
the S-Plot.
 The S-Plot displays covariance p[1] versus correlation p(corr)[1] loadings from the two-class OPLS-DA model (PE,
labelled PET, vs. N) (Figure 8, right). The points shown in the plot are Exact Mass/Retention Time pairs (EMRTs).
 From these data we observed that three phospholipids are slightly elevated (less than 1.5-fold) in the PET group (Table 3).
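The two S-Plot axes described above can be illustrated with a minimal sketch. It assumes that p[1] is the sample covariance and p(corr)[1] the Pearson correlation between one EMRT intensity column and the predictive score vector t(1); MarkerLynx may scale or centre the data differently, and the function name and example values are hypothetical.

```python
from math import sqrt

def s_plot_loadings(scores, intensities):
    """Return (covariance, correlation) of one variable against the scores t(1)."""
    n = len(scores)
    if n != len(intensities) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_t = sum(scores) / n
    mean_x = sum(intensities) / n
    # Sample covariance between the score vector and the variable (x axis of the S-Plot).
    cov = sum((t - mean_t) * (x - mean_x) for t, x in zip(scores, intensities)) / (n - 1)
    var_t = sum((t - mean_t) ** 2 for t in scores) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in intensities) / (n - 1)
    if var_t == 0 or var_x == 0:
        return cov, 0.0  # correlation is undefined for a constant column; report 0
    # Correlation is the covariance scaled by both standard deviations (y axis of the S-Plot).
    return cov, cov / sqrt(var_t * var_x)

# Hypothetical scores for four samples and one EMRT that rises with the score.
example_cov, example_corr = s_plot_loadings([-1.0, -0.5, 0.5, 1.0], [10.0, 12.0, 16.0, 18.0])
```

Variables of interest are usually those far from the origin on both axes, that is, with both a large covariance and a correlation close to 1 or -1.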
Table 1. Highly abundant protein species depleted using Seppro® IgY14
Albumin
Transferrin
α1-Acid Glycoprotein (Orosomucoid)
IgG
α1-Antitrypsin
α2-Macroglobulin
HDL (Apolipoproteins A-I and A-II)
IgA and IgM
Fibrinogen
Haptoglobin
LDL (mainly Apolipoprotein B)
Complement C3
Figure 2. Sample loading (left) and proteins identified (right) from technical replicates of LC-MSE analyses of depleted
individual patient plasma samples.
Sample Preparation
The depleted plasma samples were solubilised in 0.1% Rapigest™ solution, concentrated and then heated. The
samples were reduced, alkylated and digested with trypsin overnight at 37 °C. The samples were acid treated, filtered
through a 0.22 µm membrane and stored at -70 °C. An aliquot of the sample was transferred to a new vial and internal
standard added at fixed concentration.
For the RP LC-MSE 87 injection data set:
 1,265 proteins were identified of which 894 were only observed in one injection with 371 identified in two or more
injections, see Figure 3.
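The observation counts above can be reproduced from per-injection protein lists with a short sketch. The input layout (one set of accessions per injection) and the function name are assumptions made for illustration; the figures checked at the end are those quoted in the text.

```python
from collections import Counter

def observation_frequency(injections):
    """Map 'number of injections a protein was seen in' to 'number of such proteins'.

    injections: list of collections of protein accessions, one per injection.
    """
    # Count, for every protein, how many injections it appears in.
    per_protein = Counter(protein for injection in injections for protein in set(injection))
    # Then count how many proteins share each observation count.
    return Counter(per_protein.values())

# Hypothetical toy data: protein A is seen three times, B and C once each.
toy = observation_frequency([{"A", "B"}, {"A"}, {"A", "C"}])

# Consistency checks on the numbers reported in the text.
total_injections = 29 * 3            # 29 patients analysed in triplicate
single_plus_repeat = 894 + 371       # seen once + seen in two or more injections
```

The two totals agree with the 87 injections and 1,265 proteins reported for this data set.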
2D High/Low pH RP/RP-Liquid Chromatography-MSE acquisition
Approximately 2.5 µg of pooled plasma digest containing alcohol dehydrogenase as internal standard was loaded onto
the first dimension column, XBridge™ C18 (300 µm x 5 cm, 5 µm), using a 2D NanoAcquity UPLC® system equilibrated
in 20 mM ammonium formate at pH 10 at 2 µL/min. A discontinuous 6-step gradient of acetonitrile was used (11.1, 14.5,
17.4, 20.8, 45 and 65%) to elute peptides onto the trapping column described under Reversed Phase Liquid Chromatography-MSE acquisition. The fractions containing organic
solvent were diluted ten-fold using aqueous flow from the 2nd dimension pump prior to trapping. For the 2nd dimension the
BEH™ C18 column described in the same section was used at a 300 nL/min flow rate, and data were acquired on a Q-ToF Synapt® HDMS in triplicate.
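The effect of the ten-fold dilution on each first-dimension fraction can be sketched as below. This assumes the aqueous diluent contributes no acetonitrile, which the text does not state explicitly; the names are illustrative only.

```python
# Acetonitrile content (%) of the six first-dimension elution steps, from the text.
GRADIENT_STEPS_PERCENT_ACN = [11.1, 14.5, 17.4, 20.8, 45.0, 65.0]

def diluted_percent(organic_percent, dilution_factor=10.0):
    """Organic content after dilution with a purely aqueous flow (assumption)."""
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    return organic_percent / dilution_factor

# Approximate acetonitrile content of each fraction as it reaches the trapping column.
diluted_steps = [round(diluted_percent(p), 2) for p in GRADIENT_STEPS_PERCENT_ACN]
```

Under this assumption even the 65% step reaches the trap at 6.5% acetonitrile, which is why the dilution allows peptides to be retained before the second-dimension separation.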
Figure 5. Average natural abundance of proteins that show similar levels between obstetric conditions (upper) from
individual depleted patient plasma, based on three replicate analyses.
Ceruloplasmin (lower left) and Complement C3 (lower right) levels after depletion with standard deviation indicated.
2D RP/RP analysis of pooled plasma samples depleted of 14 highly abundant proteins
The tryptically digested, pooled IgY14-depleted plasma samples from 20 women carrying either a normal or a trisomy
21 foetus were analysed by 2D RP/RP-LC-MSE.
 In total 173 plasma proteins were confidently identified (observed in at least 2 technical replicates) in the pooled,
depleted normal and trisomy 21 samples from the 2D analyses.
 The protein levels obtained from the 2D analyses were compared for the normal and T21 outcomes using the
ExpressionE algorithm.
Figure 3. Frequency of protein observations from technical replicates of LC-MSE analyses of depleted individual patient
plasma samples.
Sample Preparation for lipidomic analysis
Lipids were extracted from human plasma by adding 30 µL of plasma followed by 180 µL of methanol, and then 360 µL
of dichloromethane. The sample was centrifuged at 13,000 rpm and the organic layer was extracted. This was diluted
5-fold with buffer A and 5 µL was injected into the system.
Liquid Chromatography-MSE acquisition for lipid extracts
Lipid extracts were loaded onto an HSS T3 column (2.1 x 100 mm, 1.8 µm) fitted to an ACQUITY UPLC® (Waters) held at
65 °C with a flow rate of 500 µL/min of acetonitrile/water (40/60) with 10 mM ammonium acetate (AmAc) (buffer A). A linear gradient of 40-100% buffer B (acetonitrile/isopropanol (10:90) with 10 mM AmAc) was performed over 10 minutes. Data were
acquired in MSE mode on a Q-Tof Synapt® G2 with a cone voltage of 35 V, and a desolvation temperature and gas flow of
400 °C and 800 L/hr respectively. The mobility gas used was nitrogen at 32 mL/min.
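As a rough illustration of the linear gradient, the sketch below interpolates the buffer B percentage at a given time. It assumes the gradient runs from 40 to 100% buffer B over 10 minutes with no initial hold or re-equilibration step, since none is described; the function name is hypothetical.

```python
def percent_b(time_min, start_b=40.0, end_b=100.0, duration_min=10.0):
    """Buffer B percentage at time_min for a single linear gradient segment."""
    if duration_min <= 0:
        raise ValueError("gradient duration must be positive")
    if time_min <= 0:
        return start_b          # before the gradient starts
    if time_min >= duration_min:
        return end_b            # after the gradient has finished
    # Linear interpolation between the start and end compositions.
    return start_b + (end_b - start_b) * time_min / duration_min

# Composition halfway through the lipid gradient.
midpoint_lipid = percent_b(5.0)
```

The same function can describe other single-segment gradients by changing the defaults, for example `percent_b(45.0, start_b=3.0, end_b=40.0, duration_min=90.0)`.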
 A small number of proteins that were not identified by the RP LC-MSE approach were identified as unique to one or
other clinical outcome.
 A total of 66 proteins were identified by the ExpressionE algorithm to be present at differing levels between the normal
and T21 pooled samples. A plot of the ratios is shown in Figure 6 including the standard deviation values.
 The average sequence coverage from the data set was 30.8% and the average number of peptides identified per
protein was 16.4.
 For each patient, the three technical replicates were compared and protein abundances were calculated from the
IdentityE tables (as a % of the total loading) for each protein that was observed in two or more replicates.
 The false positive protein identification rate was <0.3% after the removal of proteins only observed in one of the three
replicates for each patient.
 The abundances of the proteins depleted from the plasma prior to digestion (Table 1) were calculated, Figure 4.
Confident identifications, quantitation and comparative protein expression
The protein tables from PLGS IdentityE were compiled in Excel and pivot tables were used to identify proteins observed
in a minimum of 2 replicate analyses from each individual patient, thus termed confident protein identifications. The
protein abundance (as a % of the total loading) was then calculated for the confident identifications. The number of
random entries in the confident protein table was used to determine the false positive rate for the analyses.
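The replicate filter and false positive estimate described above were performed in Excel; the sketch below shows one way the same logic could be expressed. The record layout, the averaging of abundance across replicates, and the `RANDOM` prefix used to recognise random entries are all assumptions made for illustration.

```python
from collections import defaultdict

def confident_proteins(observations, min_replicates=2):
    """Keep proteins seen in at least min_replicates replicates of the same patient.

    observations: iterable of (patient, replicate, protein, percent_of_loading).
    Returns {(patient, protein): mean percent of total loading}.
    """
    by_key = defaultdict(dict)
    for patient, replicate, protein, percent in observations:
        by_key[(patient, protein)][replicate] = percent
    return {
        key: sum(by_rep.values()) / len(by_rep)
        for key, by_rep in by_key.items()
        if len(by_rep) >= min_replicates
    }

def false_positive_rate(confident, is_random):
    """Fraction of confident identifications that are random database entries."""
    if not confident:
        return 0.0
    n_random = sum(1 for (_patient, protein) in confident if is_random(protein))
    return n_random / len(confident)

# Hypothetical observations for one patient; accession names are invented.
observations = [
    ("P01", 1, "CERU", 2.0), ("P01", 2, "CERU", 2.4), ("P01", 3, "CERU", 2.2),
    ("P01", 2, "APOH", 1.0), ("P01", 3, "APOH", 1.2),
    ("P01", 1, "RANDOM123", 0.1),  # seen once only, so it is filtered out
]
confident = confident_proteins(observations)
fpr = false_positive_rate(confident, lambda name: name.startswith("RANDOM"))
```

In this toy case the single random hit is removed by the two-replicate rule, which mirrors how the filter lowers the false positive rate in the study.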
For the 2D RP/RP analyses, ExpressionE was used to determine differences in protein levels between the pooled
trisomy 21 and normal plasma samples.
Figure 8. Statistical analysis plots generated by MarkerLynx from the extracted lipid samples
 16 proteins were identified in all replicate injections from all patients, regardless of clinical condition.
Reversed Phase Liquid Chromatography-MSE acquisition
Individual patient samples containing 0.6 µg of tryptic digest were loaded onto a Symmetry® C18 trapping column (180
µm x 20 mm, 5 µm) using a NanoAcquity UPLC® system (Waters). The trapping column was flushed for 1 min prior to
elution of the peptides onto a BEH™ C18 column (75 µm x 250 mm, 1.7 µm) at 250 nL/min using a linear gradient of 3-40% buffer B (acetonitrile containing 0.1% HCOOH) over 90 minutes. Data were acquired on a Q-ToF Synapt® HDMS
(Waters) operated in MSE mode, alternating the trap collision energy between 3 V (low energy) and a ramped 15-30 V (elevated energy)
over a 2 hour acquisition with a 0.9 sec scan time. Human [Glu1]-Fibrinopeptide B (doubly charged m/z 785.8426) was used for
mass correction. All data were acquired in triplicate.
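To illustrate what mass correction against the [Glu1]-Fibrinopeptide B reference involves, the sketch below computes the error in ppm and applies a single-point proportional correction. The proportional model is an assumption chosen for simplicity; the correction actually applied by the instrument software is not described here.

```python
GLU_FIB_MZ = 785.8426  # doubly charged [Glu1]-Fibrinopeptide B, value from the text

def ppm_error(observed_mz, reference_mz=GLU_FIB_MZ):
    """Mass measurement error in parts per million relative to the reference."""
    return (observed_mz - reference_mz) / reference_mz * 1e6

def correct_mz(measured_mz, observed_lock_mz, reference_mz=GLU_FIB_MZ):
    """Rescale a measured m/z by the ratio of reference to observed lock mass.

    This single-point proportional correction is an illustrative assumption.
    """
    if observed_lock_mz <= 0:
        raise ValueError("observed lock mass must be positive")
    return measured_mz * reference_mz / observed_lock_mz

# Hypothetical drifted reading of the lock mass.
drift_ppm = ppm_error(785.8465)
```

A drifted lock-mass reading of 785.8465 corresponds to an error of roughly 5 ppm, and applying `correct_mz` to that same reading returns the reference value.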
Figure 1. Flow diagram of the workflow for biomarker discovery in the maternal plasma proteome
Pre-eclampsia (PE) is a pregnancy-specific condition characterised by hypertension and proteinuria occurring from the
second trimester. It encompasses a variety of hypertensive proteinuric conditions of pregnancy and ranges from a
mild disease to a catastrophic life-threatening condition. For this study we focus on severe early onset PE
occurring before 32 weeks gestation, which has the greatest morbidity.
 Some proteins show little variation in abundance across the obstetric conditions studied, see Figure 5 upper panels,
whereas others show greater differences between patients, Figure 5 (lower left). Complement C3 was present at
significantly elevated levels in some patients despite being depleted during sample processing, Figure 5 (lower right).
Depletion of 14 highly abundant proteins
A 50 µL aliquot of each individual or pooled plasma sample was depleted of 14 abundant protein species (see Table 1)
using the Seppro® IgY14 LC2 column (Sigma Aldrich). The depletion was repeated 2-4 times and the depleted fractions were
combined and concentrated.
Data processing and database interrogation
The raw data files were processed using ProteinLynx Global Server™ (PLGS) v2.4 with IdentityE and ExpressionE
informatics (Waters) using default parameters for MSE data. The database search parameters included the following
variable modifications: N-terminal acetylation, deamidation of N/Q and oxidation of M residues. The IPI human
database rel. 3.69 was appended to include the sequences for the internal standards. A database was then generated
which included one random entry for each original sequence in the file and was used for all subsequent interrogations.
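A database with one random entry per original sequence could be built along the following lines. This is only a sketch: it assumes each random entry is a shuffled copy of the original sequence, and the `RANDOM_` accession prefix and function name are hypothetical; the randomisation method actually used by PLGS is not stated in the text.

```python
import random

def add_random_entries(proteins, seed=0):
    """Return a new database containing each original entry plus one random entry.

    proteins: {accession: amino acid sequence}.
    Each random entry is a shuffled copy of the original (an assumption), so it
    keeps the same length and amino acid composition.
    """
    rng = random.Random(seed)  # fixed seed so the database is reproducible
    combined = dict(proteins)
    for accession, sequence in proteins.items():
        residues = list(sequence)
        rng.shuffle(residues)
        combined["RANDOM_" + accession] = "".join(residues)
    return combined

# Hypothetical two-entry database with invented accessions and sequences.
database = add_random_entries({"PROT_A": "MKWVTFISLLFLFSSAYSR", "PROT_B": "MALWMRLLPLLALLALWGPD"})
```

Because the random entries cannot correspond to real proteins, any hit against them gives an estimate of how often a false identification passes the search criteria.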
The labour and financial cost of performing quantitative studies on depleted plasma from individual patients has
frequently resulted in the use of pooled samples. Information on the natural variation in the abundance of the proteins
present between women of different age, ethnicity, body mass index, gestational age etc. is lost. In addition, a
comparison is made between two pooled samples (normal:disease x) yielding protein biomarkers associated with a
general inflammatory response that may not be clinically specific to the condition under study.
Variation in natural abundance of proteins in depleted plasma between individual patients
We have determined the abundance of 371 proteins observed in maternal plasma from normal, PE and T21 obstetric
conditions analysed in triplicate from a total of 29 patients.
 No correlation was observed between depleted protein abundance and the number of depletions performed on the IgY14
column.
Figure 6. Ratio of protein levels between normal and T21 outcomes analysed by 2D RP/RP LC-MSE. The plot on a natural log scale includes standard deviation values.
Y-axis values are % of total protein loading calculated from IdentityE.
Table 3. Lipids identified at slightly elevated levels in PE compared to normal plasma
ID            Retention Time  m/z       Factor of Change  Average (N)  Average (PET)  Std.Dev (N)  Std.Dev (PET)
PC 16.0/20.4  5.55            782.5611  1.4               120.944      168.405        35.6807      38.7281
PC 16.0/22.5  5.68            808.5789  1.3               31.3426      41.8732        10.1291      12.3813
PC 16.0/22.6  5.18            806.5638  1.4               16.5139      23.8588        8.33024      9.70142
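The factor of change values in Table 3 appear consistent with the ratio of the PET average to the normal average, rounded to one decimal place. The sketch below recomputes them under that assumption; the dictionary layout and names are illustrative.

```python
# (Average (N), Average (PET)) for each lipid, copied from Table 3.
TABLE_3_AVERAGES = {
    "PC 16.0/20.4": (120.944, 168.405),
    "PC 16.0/22.5": (31.3426, 41.8732),
    "PC 16.0/22.6": (16.5139, 23.8588),
}

def factor_of_change(average_normal, average_pet):
    """Ratio of the PET group average to the normal group average (assumed definition)."""
    if average_normal <= 0:
        raise ValueError("normal average must be positive")
    return average_pet / average_normal

factors = {
    lipid: round(factor_of_change(normal, pet), 1)
    for lipid, (normal, pet) in TABLE_3_AVERAGES.items()
}
```

All three ratios fall below 1.5, and for each lipid the difference between group averages is of a similar size to the reported standard deviations, which fits the description of these lipids as only slightly elevated.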
CONCLUSIONS
 We have successfully established a robust, statistically valid methodology incorporating the depletion of abundant
proteins, tryptic digestion and subsequent analysis by means of RP or 2D RP/RP LC-MSE from maternal plasma.
 Our 2D RP/RP LC-MSE analyses of pooled samples allowed the identification of proteins present at lower
concentrations than could be observed in the RP LC-MSE study, but required more sample and experimental time.
 Plasma was obtained from women with a range of age, BMI and ethnicity with an average gestational age of 91 days.
Two obstetric conditions were analysed, T21 and PE and the results compared with women carrying a normal fetus.
 Maternal plasma has been analysed both from individual patients and as pooled samples. Plasma samples used for the
individual analyses were not included in the pooled samples.
 Protein abundances were determined as a % of the total loading based on the observation in at least 2 technical
replicates for each individual. The abundance of some proteins appears to be relatively constant between patients and
clinical conditions, whilst others show greater variation.
 The highly abundant protein levels were monitored to assess efficiency of depletion but no change was observed
during this study.
The log(e) ratios of 53 proteins present at differing levels between T21 and normal plasma obtained from pooled
samples were compared with those calculated from the average of the individual samples. The majority of proteins show
little correlation between the levels observed in the pooled and the averaged individual samples, with 12 proteins
showing relatively good agreement, Figure 7.
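The comparison of pooled and averaged individual ratios can be sketched as follows. The direction of the ratio (T21 over normal) and the agreement rule (same direction of change and a difference within a tolerance of 0.25 on the natural log scale) are hypothetical choices for illustration; the criterion used to judge agreement in Figure 7 is not stated.

```python
from math import log

def ln_ratio(t21_level, normal_level):
    """Natural log of the T21/normal abundance ratio (direction is an assumption)."""
    if t21_level <= 0 or normal_level <= 0:
        raise ValueError("abundances must be positive")
    return log(t21_level / normal_level)

def agrees(pooled_ln, individual_ln, tolerance=0.25):
    """True when both ratios change in the same direction and differ by at most tolerance."""
    same_direction = pooled_ln * individual_ln > 0
    return same_direction and abs(pooled_ln - individual_ln) <= tolerance

# Hypothetical protein: 1.5-fold up in the pool, 1.4-fold up on average in individuals.
pooled = ln_ratio(1.5, 1.0)
individual = ln_ratio(1.4, 1.0)
in_agreement = agrees(pooled, individual)
```

On a log scale a two-fold increase and a two-fold decrease are symmetric about zero, which makes the direction check a simple sign comparison.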
Figure 4. Abundances of proteins depleted from individual patient plasma samples and analysed by LC-MSE.
 When the protein levels observed in the pooled and averaged data sets were compared for those proteins differing
between clinical outcomes, there was agreement for a small number of proteins, although the majority showed little or
no correlation.
 The lipid profiles from the non-depleted plasma were analysed by LC-MSE and three lipids were identified as present
at slightly elevated levels in the PE condition.
Figure 7. Comparison of the log(e) ratio of protein variation between normal and T21 outcomes determined from pooled and averaged individual plasma.
REFERENCES
[1] Anderson, N.L. and N.G. Anderson, The Human Plasma Proteome: History, Character, and Diagnostic Prospects. Molecular and Cellular Proteomics, 2002. 1(11): p. 845-867.
[2] Silva, J.C., et al., Quantitative Proteomic Analysis by Accurate Mass Retention Time Pairs. Analytical Chemistry, 2005. 77(7): p. 2187-2200.
[3] Silva, J.C., et al., Absolute Quantification of Proteins by LCMSE: A Virtue of Parallel MS Acquisition. Molecular and Cellular Proteomics, 2006. 5(1): p. 144-156.
[4] Silva, J.C., et al., Simultaneous Qualitative and Quantitative Analysis of the Escherichia coli Proteome: A Sweet Tale. Molecular and Cellular Proteomics, 2006. 5(4): p. 589-607.
[5] Cheng, F.-Y., et al., Absolute Protein Quantification by LC/MSE for Global Analysis of Salicylic Acid-Induced Plant Protein Secretion Responses. Journal of Proteome Research, 2009. 8(1): p. 82-93.
[6] Patel, V.J., et al., A Comparison of Labeling and Label-Free Mass Spectrometry-Based Proteomics Approaches. Journal of Proteome Research, 2009. 8(7): p. 3752-3759.
[7] Spencer, K., Aneuploidy screening in the first trimester. Am J Med Genet C Semin Med Genet, 2007. 145C(1): p. 18-32.
[8] Nicolaides, K.H., Nuchal translucency and other first-trimester sonographic markers of chromosomal abnormalities. Am J Obstet Gynecol, 2004. 191(1): p. 45-67.