
Diagnosis of Prenatal Disorders – the search for biomarkers in maternal human plasma using a robust analytical approach
Susan E. Slade¹, Konstantinos Thalassinos¹, Nisha Patel¹, Steve Thornton², Kypros Nicolaides³, Chris Hughes⁴, Joanne B. Connolly⁴, James Langridge⁴ and James H. Scrivens¹
¹Warwick University, Coventry, U.K.; ²Warwick Medical School, Coventry, U.K.; ³King’s College Hospital, London, U.K.; ⁴Waters MS Technologies, Manchester, U.K.
OVERVIEW
METHODS
Purpose
• Establish a robust, statistically valid analytical approach to the analysis of maternal plasma with the use of technical replicates.
• Confidently identify and quantify the proteins present after a depletion strategy has been employed to remove highly abundant proteins.
• Assess the natural variation in the proteome of women carrying a normal foetus.
• Analyse the plasma proteome of women carrying a trisomy 21 foetus to identify unique proteins and changes in abundance levels suitable for use as clinical biomarkers.
Methods
• Depletion of highly abundant proteins.
• Reversed-phase (RP) liquid chromatography (LC) of tryptic digests of the plasma proteome, and two-dimensional RP/RP-LC.
• MS data acquisition using alternating low and elevated collision energy (MSE).
• Protein identification and quantitation using a non-labelled approach.
Results
• Over 400 plasma proteins have been identified and quantified using depletion followed by 1D RP-LC-MSE or 2D RP/RP-LC-MSE, on individual and pooled samples analysed in triplicate.
• Proteins unique to normal and to trisomy 21 outcomes have been identified in pooled samples.
• Changes in protein abundance between the two outcomes have been identified in pooled samples using RP-LC-MSE and RP/RP-LC-MSE.
INTRODUCTION
Double-centrifuged maternal plasma samples were supplied by King’s College Hospital, London, with ethical approval and
stored at −70 °C until use.
Depletion Strategies
Depletion of 12 highly abundant proteins
A 20 µL aliquot of each individual plasma sample was depleted of 12 abundant proteins using the IgY-12 spin column kit
(Beckman Coulter). All samples were from Caucasian women, aged 25-37, BMI 19-37 and gestational age 83-94 days.
Depletion of 14 highly abundant proteins
A 40 µL aliquot of each pooled normal or trisomy 21 plasma sample was depleted of 14 abundant proteins (see Table 1)
using the Seppro® IgY14 LC2 column (Sigma Aldrich). Further details of the pooled plasma samples are shown in Table
2. The depletion was repeated four times, and the depleted fractions were combined and concentrated.
Albumin
IgG
α1-Antitrypsin
IgA
IgM
Transferrin
Haptoglobin
α2-Macroglobulin
Fibrinogen
Complement C3
α1-Acid Glycoprotein (Orosomucoid)
HDL (Apolipoproteins A-I and A-II)
LDL (mainly Apolipoprotein B)
A number of analytical approaches have been used to study the plasma proteome. Quantitative measurements based on
2D gels and iTRAQ™-labelled peptides have provided information on protein levels in the samples, but these are
frequently limited to relative quantitation. The number of proteins reported in depleted plasma ranges from the low to
mid hundreds unless a combination of methodologies is employed [2]. In quantitative studies, however, the number of
proteins is typically in the low hundreds [3], and the cost of the chemical labels and the instrumental requirements make
such approaches prohibitive for a large-scale study of biomarkers for prenatal disorders.
The incidence of chromosomal abnormalities in the absence of prenatal screening is estimated to be 6 in 1000 births
[4]. Abnormalities may include deletions, translocations or duplications (trisomies), of which the most prevalent is
Down syndrome. An estimated 95% of Down syndrome cases are due to the presence of an additional chromosome 21,
caused prior to or at conception by faulty division of the chromosomes in either the sperm or the egg. The risk of trisomy
21 (T21) increases significantly with maternal age. Screening for T21 has become common practice in developed
countries, combining ultrasound to identify a build-up of fluid at the back of the neck [5], termed nuchal translucency
(Figure 1), with biochemical tests to identify higher-risk patients. A number of biomolecules have been identified as
markers for trisomy 21: levels of α-fetoprotein, unconjugated estriol and pregnancy-associated plasma protein-A are
reduced, whereas free β-human chorionic gonadotropin and inhibin A levels can be elevated.
It has been observed that ESI provides signal responses that correlate linearly with increasing concentration [6].
Recently, a simple LC-MS-based methodology was published in which changes in this response for peptide accurate
mass/retention time pairs directly reflect concentrations in one sample relative to another [7]; it has since been
developed into a label-free system capable of relative and absolute quantification [8,9]. All detectable eluting peptides
and their corresponding fragments are observed via rapid switching between low and elevated collision energy (MSE)
during the LC-MS experiment, giving a comprehensive list of all ions that can subsequently be searched [10], resulting in
protein identification and quantitative measurements (Figure 2).
Our study uses a depletion strategy to remove the highly abundant proteins, followed by a single RP or 2D RP/RP-LC-MSE
experiment performed in triplicate, to quantitatively probe the plasma proteome of women carrying a normal or trisomy
21 foetus in a statistically valid approach.
Confident identifications, quantitation and comparative protein expression
The protein tables from PLGS were compiled in Excel, and pivot tables were used to identify proteins observed in a
minimum of 2 replicate analyses; these are termed “confident” protein identifications. The number of random (decoy)
entries in the confident protein table was used to determine the false positive rate for the analyses.
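The replicate filter and the decoy-based false positive rate described above can be sketched outside Excel as follows. The `RANDOM_` accession prefix and the rate definition (decoy entries as a percentage of all confident entries) are assumptions for illustration; the poster does not state the exact formula used.

```python
from collections import Counter


def confident_proteins(replicates, min_replicates=2):
    """replicates: list of sets of protein accessions, one set per technical
    replicate. Returns accessions observed in at least min_replicates of them."""
    counts = Counter(acc for rep in replicates for acc in set(rep))
    return {acc for acc, n in counts.items() if n >= min_replicates}


def false_positive_rate(accessions, decoy_prefix="RANDOM_"):
    """Percentage of confident identifications that are random (decoy) entries.
    The prefix is a hypothetical naming convention for the random entries."""
    if not accessions:
        return 0.0
    decoys = sum(1 for acc in accessions if acc.startswith(decoy_prefix))
    return 100.0 * decoys / len(accessions)
```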
For the RP analyses, ExpressionE was used to determine differences in protein levels between the trisomy 21 and normal
plasma samples. For the 2D RP/RP analyses, IdentityE was used to determine the absolute quantity of each protein in
each plasma sample and its abundance as a percentage of the sample loaded.
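The absolute quantification referred to here builds on the approach of Silva et al. [8], in which the average signal of a protein's three most intense tryptic peptides is taken as proportional to its molar amount and is calibrated against an internal standard loaded at a known amount (in this work, the glycogen phosphorylase digest). The sketch below shows only that core arithmetic; function names are hypothetical, falling back to fewer than three peptides is our assumption, and the software's additional filtering and normalisation are not reproduced.

```python
def top3_signal(peptide_intensities):
    """Mean intensity of the three most intense peptides (fewer if fewer exist)."""
    top = sorted(peptide_intensities, reverse=True)[:3]
    if not top:
        raise ValueError("protein has no quantified peptides")
    return sum(top) / len(top)


def absolute_amounts(proteins, standard, standard_fmol):
    """proteins: dict accession -> list of peptide intensities.
    standard: peptide intensities of the spiked internal standard.
    standard_fmol: amount of internal standard loaded (fmol).
    Returns dict accession -> estimated amount on column (fmol)."""
    response = top3_signal(standard) / standard_fmol  # signal per fmol
    return {acc: top3_signal(ints) / response for acc, ints in proteins.items()}


def percent_abundance(amounts):
    """Each protein's share (%) of the total quantified molar amount."""
    total = sum(amounts.values())
    return {acc: 100.0 * amt / total for acc, amt in amounts.items()}
```

Weighting each amount by molecular mass before summing would give a mass-based rather than molar percentage; which of the two the software reports is not stated on the poster.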
• Proteins common to both clinical outcomes were identified in both analyses: 229 in the RP and 321 in the 2D
experiments (Figure 6).
• Proteins unique to one clinical outcome were observed in both analyses.
• On average, each protein was identified with 11 peptides and 24% sequence coverage in the RP/RP-LC-MSE
analyses, and the false positive rate was 0.28%.
[Chart: unique protein number, RP-LC-MSE vs 2D RP/RP-LC-MSE]
2 Afro-Caribbean
1 Asian
RP analysis of individual plasma samples depleted of 12 highly abundant proteins
Ten individual plasma samples from women carrying a normal foetus were depleted using the IgY-12 spin columns,
tryptically digested and analysed by RP-LC-MSE using an Ultima Global instrument. A typical chromatogram is shown in
Figure 3.
Age range 21-40
BMI 18-37
Gestational age 83-96 days
Trisomy 21 Foetus
19 Caucasian
1 Mixed race
Age range 23-44
[Figure 6, RP panel: 66 proteins unique to normal, 229 common, 23 unique to trisomy 21]
• We have successfully established a robust, statistically valid methodology for maternal plasma samples, incorporating
depletion of abundant proteins, tryptic digestion and separation by RP or 2D RP/RP-LC with online MSE data acquisition.
• The number of plasma proteins identified from each patient varied from 101 to 219 and was, to some extent,
concentration dependent.
• Data were collected in data-independent MSE mode, alternating between low and elevated collision energy, which gives
significant improvements in the number of proteins identified, average sequence coverage and number of peptides
observed compared with data-dependent acquisition [11].
• A common set of proteins was identified across all the depleted plasma samples; 112 proteins were identified in 5 or
more samples.
• Some samples contained confidently identified unique proteins whose presence was not concentration dependent
(Figure 4).
The false positive protein identification rate was <0.2%.
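Counting how many of the ten patients each protein appears in gives both the widespread set (here, 5 or more samples) and the proteins unique to a single patient shown in the Venn diagram. A minimal sketch, assuming each sample's confident identifications are already available as a set of accessions; the function name is hypothetical.

```python
from collections import Counter


def occurrence_summary(samples, min_samples=5):
    """samples: dict sample_id -> set of confidently identified accessions.
    Returns (accessions seen in >= min_samples samples,
             dict sample_id -> accessions seen in that sample only)."""
    counts = Counter(acc for ids in samples.values() for acc in ids)
    widespread = {acc for acc, n in counts.items() if n >= min_samples}
    unique = {sid: {acc for acc in ids if counts[acc] == 1}
              for sid, ids in samples.items()}
    return widespread, unique
```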
Reversed Phase Liquid Chromatography-MSE acquisition
A sample containing 0.5–2 µg of plasma tryptic digest was loaded onto a Symmetry® C18 trapping column (180 µm ×
20 mm, 5 µm) using a NanoAcquity UPLC® system (Waters). The trapping column was flushed at 15 µL/min (99.9% A)
for 1 min prior to elution of the peptides onto a BEH™ C18 column (75 µm × 250 mm, 1.7 µm) at 250 nL/min using a
linear gradient of 3–40% buffer B (acetonitrile containing 0.1% HCOOH) over 90 minutes.
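The analytical gradient is a simple linear ramp, so the mobile-phase composition at any point can be computed directly. This sketch assumes t = 0 is the start of the 3–40% ramp and ignores system dwell volume and the wash and re-equilibration steps, which are not described here.

```python
def percent_b(t_min, start=3.0, end=40.0, duration=90.0):
    """Buffer B (%) at t_min minutes after the gradient starts, for a linear
    start-to-end ramp over `duration` minutes; held at `end` afterwards.
    Assumption: no dwell-volume delay is modelled."""
    if t_min <= 0:
        return start
    if t_min >= duration:
        return end
    return start + (end - start) * t_min / duration
```

For example, halfway through the ramp (45 min) the eluent is 3 + 37 × 0.5 = 21.5% B.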
Data were acquired either on a Q-ToF Ultima Global™ operated in pseudo-MSE mode, optimised for minimal in-source
fragmentation, between 10 and 110 minutes, or on a Synapt™ HDMS (Waters). Data were acquired in low energy mode
using a collision energy (CE) of 6 V and in elevated energy mode with a CE ramp of 6–35 V, with a scan time of 0.9 s.
Human [Glu1]-fibrinopeptide B (doubly charged ion, m/z 785.8426) was used for mass correction. All data were
acquired in at least triplicate.
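Mass correction against [Glu1]-fibrinopeptide B can be pictured as a single-point lock-mass correction: measured m/z values are scaled by the ratio of the theoretical to the observed lock-mass m/z. The multiplicative form and the function names below are our assumption for illustration; the vendor software may apply a different correction model.

```python
# [Glu1]-fibrinopeptide B, doubly charged ion (value from the method text)
GFP_2PLUS_MZ = 785.8426


def ppm_error(observed, theoretical):
    """Mass measurement error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def lock_mass_correct(mz_values, observed_lock, theoretical_lock=GFP_2PLUS_MZ):
    """Scale every m/z by theoretical/observed lock mass (hypothetical
    single-point multiplicative correction)."""
    factor = theoretical_lock / observed_lock
    return [mz * factor for mz in mz_values]
```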
Chromatographic peak width (FWHM): Automatic
MS TOF resolution: Automatic
Low energy threshold: 250 counts
Elevated energy threshold: 100 counts
Retention time window: Automatic
Minimum integrated energy threshold: 1500
Peptide mass tolerance: Automatic
Fragment mass tolerance: Automatic
Min fragment ion matches per peptide: 1
Min fragment ion matches per protein: 5
Min peptide matches per protein: (value not recovered)
Maximum protein mass: 250000
Missed cleavages: 1
• Proteins were identified with high confidence (false positive rates below 0.3%) and quantified by a non-labelled
approach using an internal standard.
Proteins unique to a trisomy 21 clinical outcome
• Maternal plasma samples taken between 11 and 15 weeks’ gestational age were depleted of 12 or 14 highly
abundant proteins using commercially available spin or LC columns. The depleted samples were tryptically digested
and analysed in at least triplicate. Proteins were confidently identified if they were observed in two or more technical
replicates.
Figure 6. Venn diagrams indicating proteins confidently identified from the RP-LC-MSE (left) and 2D RP/RP-LC-MSE
(right) analyses of IgY14-depleted pooled normal and trisomy 21 samples. The overlapping regions give the proteins
common to both clinical outcomes identified by the RP and the 2D RP/RP analysis, respectively.
• Plasma samples from 10 women carrying normal foetuses were analysed individually by RP-LC-MSE after IgY-12
depletion, identifying 349 proteins with an average of 9 peptides per protein and 24% sequence coverage.
Comparison of RP and 2D RP/RP protein identifications
• Common proteins were identified across the dataset, whilst some were unique to individual patients, exemplifying the
need to analyse individual samples and assess normal patient-to-patient variation in the search for clinical biomarkers
of disease.
• The RP analysis identified 45 proteins not observed in the 2D RP/RP dataset, whereas the 2D RP/RP dataset
contained 111 proteins not observed by RP.
• Pooled plasma samples from 20 women carrying either normal or trisomy 21 foetuses were depleted of 14 highly
abundant proteins to allow a deeper probe of the proteome. The pooled samples were analysed by both RP and
RP/RP-LC-MSE, and 229 and 321 proteins common to both clinical outcomes were identified in the two datasets,
respectively.
• The RP/RP-LC-MSE analyses identified more proteins in the pooled samples (384 compared with 318), providing better
proteome coverage and greater potential for biomarker discovery (111 unique proteins), but were more demanding in
terms of instrument time and processing requirements.
• A total of 66 (RP) and 23 (RP/RP) proteins unique to the normal maternal plasma were confidently identified, and
23 (RP) and 25 (RP/RP) proteins were unique to the trisomy 21 samples, providing potential biomarkers that require
further study.
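The RP versus 2D RP/RP comparison is a two-set Venn calculation on the confident accession lists. In the sketch below, the integer sets are synthetic placeholders sized only so that their totals match the reported counts (318 by RP, 384 by RP/RP); they are not the real protein lists, and `venn_counts` is a hypothetical helper.

```python
def venn_counts(a, b):
    """Region sizes of a two-set Venn diagram plus the union size."""
    return {
        "only_a": len(a - b),
        "both": len(a & b),
        "only_b": len(b - a),
        "union": len(a | b),
    }


# Synthetic placeholder sets: 318 "RP" and 384 "RP/RP" identifiers.
rp_ids = set(range(318))
rprp_ids = set(range(45, 429))
print(venn_counts(rp_ids, rprp_ids))
```

The same function applies to the normal versus trisomy 21 comparison within each analysis.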
[Figure 7 regions: 45 proteins unique to RP analysis; 273 proteins identified in both analyses; 111 proteins unique to 2D RP/RP analysis]
Figure 3. Base peak intensity chromatogram obtained from an RP-LC-MSE acquisition of a plasma sample depleted of 12
highly abundant proteins. The chromatogram obtained at low collision energy is shown in red and that at elevated
collision energy in green.
Figure 4. Venn diagram indicating common and unique proteins
identified confidently from ten individual IgY-12 depleted plasma
samples.
Depletion of 14 highly abundant proteins
Pooled samples, prepared from 20 women carrying normal or trisomy 21 foetuses, were depleted using the IgY14 LC
column. The chromatogram obtained at 280 nm is shown in Figure 5.
1
Table 3. MSE data processing parameters
Table 4. Database search parameters used for RP-LC-MSE data
2D High/Low pH RP/RP-Liquid Chromatography-MSE acquisition
The pooled IgY14-depleted normal and trisomy 21 samples used for the 2D RP/RP analysis were identical to those used
for the RP-LC-MSE analyses. A known amount of an alcohol dehydrogenase (ADH) tryptic digest was added to each sample.
Approximately 2.5 µg of plasma digest was loaded onto the first-dimension column, an XBridge™ C18 (300 µm × 5 cm,
5 µm), using a 2D NanoAcquity UPLC® system equilibrated in 20 mM ammonium formate at pH 10 at 2 µL/min. A
discontinuous 6-step acetonitrile gradient (11.1, 14.5, 17.4, 20.8, 45 and 65%) was used to elute peptides onto the
trapping column described above. Fractions containing organic solvent were diluted ten-fold with aqueous flow from the
second-dimension pump prior to trapping. For the second dimension, a 20 cm BEH™ C18 column, as described above,
was used at a flow rate of 300 nL/min. Data were acquired in MSE mode using a Synapt™ MS (Waters) over the m/z
range 50–1990, with [Glu1]-fibrinopeptide B (GFP) as the reference compound.
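The ten-fold online dilution matters because peptides must be retained on the trapping column: even the 65% acetonitrile step reaches the trap at only 6.5% organic. The sketch assumes the diluting second-dimension flow is purely aqueous and that the dilution is set by the flow ratio; the 18 µL/min make-up flow it prints is derived from the stated 2 µL/min first-dimension flow under that assumption and is not a value reported in the method.

```python
# First-dimension (pH 10) step gradient, % acetonitrile (from the method text)
STEP_ACN = [11.1, 14.5, 17.4, 20.8, 45.0, 65.0]


def diluted_percent(percent_acn, dilution_factor=10.0):
    """Organic content reaching the trap after online dilution with an
    (assumed) purely aqueous make-up flow."""
    return percent_acn / dilution_factor


def aqueous_flow_needed(first_dim_flow_ul_min=2.0, dilution_factor=10.0):
    """Make-up flow (µL/min) giving the requested dilution, assuming dilution
    is set purely by the ratio of total flow to first-dimension flow."""
    return first_dim_flow_ul_min * (dilution_factor - 1.0)


if __name__ == "__main__":
    for step in STEP_ACN:
        print(f"{step:5.1f}% ACN eluent -> {diluted_percent(step):4.2f}% ACN at the trap")
    print(f"make-up flow: {aqueous_flow_needed():.1f} µL/min")
```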
Proteins unique to a normal clinical outcome
• The confident identifications from the RP and 2D RP/RP analyses of the pooled, IgY14-depleted samples were
compared, and 273 plasma proteins were found to be common to both datasets (Figure 7).
Data processing and database interrogation
The raw data files were processed using ProteinLynx Global Server™ (PLGS) v2.3 with IdentityE and ExpressionE
informatics (Waters), using the default parameters for MSE data shown in Table 3. The database search parameters used
are shown in Table 4, with the following variable modifications selected: N-terminal acetylation, deamidation of N/Q and
oxidation of M residues.
The IPI human database rel. 3.49 was appended to include the sequences for glycogen phosphorylase and alcohol
dehydrogenase. A database was then generated which included one random entry for each original sequence in the file
and was used for all subsequent interrogations.
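PLGS generates the randomised database itself. The sketch below only illustrates the idea of appending one random entry per original sequence, using a seeded shuffle of each sequence's residues as a stand-in for whatever randomisation PLGS actually applies. The `RANDOM_` accession prefix and the placeholder sequences are hypothetical.

```python
import random


def with_random_decoys(entries, seed=0, prefix="RANDOM_"):
    """entries: dict accession -> protein sequence.
    Returns a new dict with every original entry plus one randomised entry per
    sequence (residues shuffled, so length and composition are preserved)."""
    rng = random.Random(seed)  # seeded for a reproducible database
    combined = dict(entries)
    for acc, seq in entries.items():
        residues = list(seq)
        rng.shuffle(residues)
        combined[prefix + acc] = "".join(residues)
    return combined
```

Preserving composition keeps the random entries similar in size and tryptic-cleavage frequency to the real ones, which is what makes their hit rate a usable estimate of false positives.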
Figure 9. Changes in protein abundance (plotted as logₑ protein ratio) from RP (blue) and RP/RP-LC-MSE (red)
analyses of pooled plasma samples.
• In total, 349 plasma proteins were confidently identified (observed in at least 2 technical replicates) across the IgY-12
depleted dataset encompassing all 10 patients.
• On average, each protein was identified with 8 peptides and 24% sequence coverage.
Sample Preparation
The depleted plasma samples were solubilised in 0.1% RapiGest™ solution and concentrated using a 5 kDa NMWCO
spin column (Biomax™, Millipore) prior to heating at 80 °C for 15 minutes. The samples were reduced, alkylated and
digested with trypsin overnight at 37 °C. Each sample was then incubated at 37 °C for 20 minutes with 2 µL of
concentrated TFA, filtered through a 0.22 µm membrane and stored at −70 °C. An aliquot of the sample was transferred
to a new vial, and a tryptic digest of glycogen phosphorylase was added at a fixed concentration as an internal standard.
Figure 8. Reproducibility of protein abundance measurements determined by
IdentityE from the 2D RP/RP analyses of 3 technical replicates of the pooled
IgY14 depleted normal and trisomy 21 samples. Proteins were selected based
on their increased abundance in the trisomy 21 outcome.
CONCLUSIONS
Gestational age 81-102 days
Table 2. Plasma samples demographic information
Figure 2. Flow diagram of the workflow for biomarker discovery in the maternal plasma proteome
• In total, 318 and 384 plasma proteins were confidently identified (observed in at least 2 technical replicates) in the
pooled, depleted normal and trisomy 21 samples from the RP and 2D analyses, respectively.
17 Caucasian
BMI 17-37
Figure 1. Nuchal translucency identified in the
sonographic image between the crosshairs
Comparison of RP and 2D RP/RP analysis of pooled plasma samples depleted of 14 highly abundant proteins
The tryptically digested, pooled IgY14-depleted plasma samples from 20 women carrying either a normal or a trisomy
21 foetus were analysed by RP and 2D RP/RP-LC-MSE using Synapt instruments.
RESULTS
Normal Foetus
Table 1. Highly abundant proteins depleted using Seppro® IgY14
The identification of protein biomarkers in plasma is particularly challenging. The dynamic range of protein
concentrations in plasma is estimated to approach 10¹², and the 12 most abundant proteins represent over 95% of the
protein complement [1]. Demographic variation between patients adds further complexity, so any methodology used to
quantify the proteome must be robust, reproducible and capable of analysing large datasets.
Data processing and database interrogation for 2D RP/RP-LC-MSE data
The processing parameters used for the 2D analyses were identical to those for the RP analyses, except that the low and
elevated energy thresholds were reduced to 200 and 75 counts, respectively. A minimum of 3 fragment ions per peptide
and 7 fragment ions per protein was specified in the database search. Individual chromatograms were processed
separately and merged into one file for database interrogation.
Figure 5. Chromatogram obtained from the depletion of a pooled plasma sample using an IgY14 LC column. The fraction
eluting between 5 and 17 minutes was then tryptically digested and analysed by LC-MSE.
Figure 7. Venn diagram indicating proteins confidently identified from the RP and 2D
RP/RP-LC-MSE analysis of IgY14 depleted pooled normal and trisomy 21 samples.
Reproducibility of quantitative measurements
All of the analyses were performed in technical triplicate. Absolute concentrations were determined for the confident
proteins using IdentityE, and the abundance of each as a percentage of the total was calculated. Figure 8 shows a
selection of proteins from the 2D RP/RP analysis that showed an increase in abundance in the pooled trisomy 21 sample;
each of the three technical replicates from each sample is shown.
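The poster presents replicate reproducibility graphically. A common way to summarise it numerically is the coefficient of variation (CV) of each protein's abundance across the technical replicates; the sketch below computes it with the sample (n − 1) standard deviation. Using CV here is our suggestion, not a statistic reported in this work, and the function names are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation (%) across technical replicates, using the
    sample (n - 1) standard deviation."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return 100.0 * stdev(values) / mean(values)


def cv_table(abundances):
    """abundances: dict accession -> list of replicate abundances."""
    return {acc: percent_cv(vals) for acc, vals in abundances.items()}
```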
The RP-LC-MSE analysis of pooled trisomy 21 and normal samples was processed using ExpressionE in PLGS and
proteins that were common to both outcomes were expressed as a ratio (T21:Normal). The protein abundance
calculated from IdentityE for the RP/RP analysis was also converted to a ratio for comparison with the RP analysis.
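Converting both datasets to natural-log T21:normal ratios puts them on a common scale, so the direction of change can be compared protein by protein, as in the counts of proteins lower or higher in trisomy 21 that follow. In this sketch a ratio is called "lower" or "higher" purely by its sign; any fold-change or significance cut-off actually applied is not stated on the poster, and the names are hypothetical.

```python
import math


def ln_ratio(t21, normal):
    """Natural-log T21:normal ratio; both abundances must be positive."""
    return math.log(t21 / normal)


def direction_agreement(rp, rprp):
    """rp, rprp: dict accession -> ln(T21:normal) from each analysis.
    For proteins quantified in both, return
    {"lower": (n lower by RP, n of those also lower by RP/RP),
     "higher": (n higher by RP, n of those also higher by RP/RP)}."""
    shared = rp.keys() & rprp.keys()
    summary = {}
    for label, side in (("lower", -1), ("higher", 1)):
        hits = [acc for acc in shared if side * rp[acc] > 0]
        agree = [acc for acc in hits if side * rprp[acc] > 0]
        summary[label] = (len(hits), len(agree))
    return summary
```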
• The RP-LC-MSE analysis indicated that 32 proteins were present at a lower level in the trisomy 21 pooled sample, of
which 29 were confirmed at a lower level by the RP/RP data.
• Conversely, 17 proteins were present at higher levels in the pooled trisomy 21 sample, of which 10 followed the same
trend in the RP/RP analysis.
• The logₑ trisomy 21:normal protein ratios were calculated from both the RP (blue) and RP/RP (red) analyses; Figure
9 compares the two datasets for a selected number of proteins.
• The quantitative data from the RP-LC-MSE analysis of the pooled, IgY14-depleted plasma were generated using the
ExpressionE algorithm and compared with the absolute quantification data (converted to protein abundance) from the
RP/RP-LC-MSE analysis using the IdentityE algorithm. Although the two analyses were processed with different
algorithms, the results were generally in very good agreement, increasing confidence in the quality of the data. Some of
the discrepancies in the quantitative results may be explained by the low protein scores reported, suggesting low-quality
peptide data.
• Manual validation of the proteins identified as unique to each clinical outcome is underway. Owing to redundancy in the
protein database, some proteins of near-identical sequence may not have been collapsed to a single entry in the
protein tables. Some of the unique proteins identified may therefore be isoforms of the same protein that are
indistinguishable in the dataset and need to be removed as potential biomarkers.
• Further work is underway to analyse IgY14-depleted individual normal and trisomy 21 samples to validate our initial
observations. A reference library of protein information (identity and quantitation) will be compiled, against which
samples from other clinical outcomes, e.g. pre-eclampsia, will be compared.
REFERENCES
[1] Anderson, N.L. and N.G. Anderson, The Human Plasma Proteome: History, Character, and Diagnostic Prospects. Molecular and Cellular Proteomics, 2002. 1(11): p. 845-867.
[2] Schenk, S., et al., A high confidence, manually validated human blood plasma protein reference set. BMC Med Genomics, 2008. 1: p. 41.
[3] Song, X., et al., iTRAQ experimental design for plasma biomarker discovery. J Proteome Res, 2008. 7(7): p. 2952-8.
[4] Spencer, K., Aneuploidy screening in the first trimester. Am J Med Genet C Semin Med Genet, 2007. 145C(1): p. 18-32.
[5] Nicolaides, K.H., Nuchal translucency and other first-trimester sonographic markers of chromosomal abnormalities. Am J Obstet Gynecol, 2004. 191(1): p. 45-67.
[6] Chelius, D. and P.V. Bondarenko, Quantitative Profiling of Proteins in Complex Mixtures Using Liquid Chromatography and Mass Spectrometry. Journal of Proteome Research, 2002. 1(4): p. 317-323.
[7] Silva, J.C., et al., Quantitative Proteomic Analysis by Accurate Mass Retention Time Pairs. Analytical Chemistry, 2005. 77(7): p. 2187-2200.
[8] Silva, J.C., et al., Absolute Quantification of Proteins by LCMSE: A Virtue of Parallel MS Acquisition. Molecular and Cellular Proteomics, 2006. 5(1): p. 144-156.
[9] Silva, J.C., et al., Simultaneous Qualitative and Quantitative Analysis of the Escherichia coli Proteome: A Sweet Tale. Molecular and Cellular Proteomics, 2006. 5(4): p. 589-607.
[10] Cheng, F.-Y., et al., Absolute Protein Quantification by LC/MSE for Global Analysis of Salicylic Acid-Induced Plant Protein Secretion Responses. Journal of Proteome Research, 2009. 8(1): p. 82-93.
[11] Patel, V.J., et al., J. Proteome Res., in press (published online 12 May 2009).
The authors would like to acknowledge the significant technical assistance of Matthew Edgeworth in this work.