Analysis of the Human Serum Proteome
Dr. Timothy D. Veenstra
Director, Laboratory of Proteomics and Analytical
Technologies and
NCI-Frederick Biomedical Proteomics Program
Challenge Goal: Eliminate Suffering and Death
Due to Cancer by 2015.
Dr. von Eschenbach, Director National Cancer Institute
TRANSLATIONAL RESEARCH
• Take research from the bench to bedside.
• Obligation to public health.
• Allow physicians to make better decisions in cancer management.
Three Keys to Translational Cancer Research
• Early Detection
If detected at an early stage, the five-year survival rates for most cancers are high even using currently available treatments.
Development of improved proteomics and bioinformatic tools for diagnostic medicine.
• Molecular Diagnostics
There is a need for biomarkers and novel diagnostic technologies that are more accurate and can detect early stage cancers.
New Target Discovery (Global Proteomics)
Signal Transduction Pathway Profiling (Targeted Proteomics)
• Molecular Targeted Therapeutics
Implementation of new technologies to ongoing NCI-based clinical trials.
The Importance of Early Detection of Ovarian Cancer
[Figure: paired bar charts, ACTUAL vs. WITH EARLY DETECTION, showing % stage distribution and % 5-year survival (0-100) for stages I-IV]
A SHIFT IN THE NUMBER OF PATIENTS DIAGNOSED AT AN EARLY
STAGE WILL DRAMATICALLY AFFECT PATIENT SURVIVAL!
Current Status of Ovarian Cancer Screening
CA 125: a high-molecular-weight glycoprotein.
CA 125 is elevated in 83% of patients with ovarian
cancer.
False Negative rates of 40-50% for stage I
disease.
CA-125 cannot be detected in tissue sections
from 20% of ovarian cancers.
Hence, the false negative rates using CA-125 will never
be lower than 20%.
Patterns of Proteomic Information in Serum
Hypothesis:
Tissues are continuously perfused by serum -- their
histopathology may be reflected in serum proteomic “patterns.”
1. Signature proteins are products of the tumor-host microenvironment, and thereby unique to the tissue site and pathophysiological state.
2. These biomarkers are likely to be modified or cleaved “reporter” proteins/peptides that are produced/amplified at the tumor/host interface, are released, and partition to circulating carrier proteins.
[Figure label: Perfused Tissue]
Patterns of Proteomic Information in Serum
Pathologic Signature?
“Proteomic” Mass Spectrum
CAN PROTEIN PROFILING IDENTIFY PROTEIN EXPRESSION PATTERNS
DIAGNOSTIC OF INVASIVE EPITHELIAL OVARIAN CANCER?
Serum Proteomic Pattern Diagnostic Workflow
Serum
Protein Chip surface chemistries: Hydrophobic, PS-10 or PS-20, Cation Exchange, Anion Exchange, Metal Affinity, Normal Phase, Antibody - Antigen, Receptor - Ligand, DNA - Protein
SELDI-TOF MS
Proteomic Pattern (m/z)
Pattern Recognition
Diagnosis
Application and Implementation of SELDI-QqTOF for Diagnostic Proteomics
WCX2 ProteinChip Array
Ciphergen SELDI-TOF MS
Widely accessible
Extensive m/z range (5-300,000)
Low Resolution (~ 100-200)
Low Mass Accuracy (~1000 ppm)
ABI QSTAR Pulsar QqTOF MS
More specialized knowledge required…?
Limited m/z range? (5-12,000)
Higher resolution (>9000 at m/z 1500)
High mass accuracy (better than 50 ppm - external cal)
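To make the accuracy and resolution figures on this slide concrete, the sketch below converts a ppm mass accuracy into an absolute m/z window and a resolving power into a peak width. The function names are illustrative only, and m/z 7060.121 is chosen simply because it is one of the diagnostic features reported in this talk.

```python
def ppm_window(mz: float, ppm: float) -> float:
    """Absolute m/z uncertainty (half-width) implied by a ppm mass accuracy."""
    return mz * ppm / 1e6


def peak_width(mz: float, resolution: float) -> float:
    """Peak width (delta m) implied by a resolving power R = m / delta_m."""
    return mz / resolution


mz = 7060.121  # example feature; any m/z value could be used here
for label, ppm in [("SELDI-TOF, ~1000 ppm", 1000.0), ("QqTOF, 50 ppm", 50.0)]:
    print(f"{label}: +/- {ppm_window(mz, ppm):.3f} m/z")

# Peak width at m/z 1500 for a resolving power of 9000
print(f"delta m at m/z 1500, R = 9000: {peak_width(1500.0, 9000.0):.3f}")
```

At this m/z, a 1000 ppm instrument cannot distinguish ions within about 7 m/z units of each other by mass alone, while a 50 ppm measurement narrows that window to roughly 0.35.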
Bioinformatic Analysis for the Discovery of Diagnostic Patterns
Phase I: Pattern Discovery
a. Unaffected samples
b. Cancer samples
Genetic algorithm + self-organizing cluster analysis
“Survival of the fittest” discriminatory patterns that discriminate “a” from “b” in the training set
Lead diagnostic fingerprint (from training set)
Phase 2: Pattern Matching
Test/validation sample for diagnosis
Classified as Normal, Cancer, or New
[Figure: example spectra for unaffected, cancer, and test samples; m/z axis 1000-6000]
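The Phase 2 pattern-matching step on this slide can be illustrated with a small nearest-centroid sketch. This is only a stand-in under stated assumptions: Euclidean distance, the distance threshold, and the toy intensity values are all hypothetical, and the slide does not specify how the actual cluster analysis assigns a sample to Normal, Cancer, or New.

```python
import math


def centroid(rows):
    """Mean intensity vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def classify(sample, centroids, max_distance):
    """Return the label of the nearest centroid, or "New" if none is close enough."""
    best_label, best_dist = None, math.inf
    for label, center in centroids.items():
        dist = math.dist(sample, center)  # Euclidean distance (assumption)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else "New"


# Toy normalized intensities at three hypothetical m/z features.
training = {
    "Normal": [[0.10, 0.80, 0.30], [0.20, 0.70, 0.40]],
    "Cancer": [[0.90, 0.20, 0.60], [0.80, 0.30, 0.70]],
}
centroids = {label: centroid(rows) for label, rows in training.items()}

for sample in ([0.15, 0.75, 0.35], [0.85, 0.25, 0.65], [0.50, 0.50, 0.50]):
    print(sample, classify(sample, centroids, max_distance=0.3))
```

The third sample sits far from both training clusters, so it falls into the "New" category rather than being forced into Normal or Cancer.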
Sample and Modeling Breakdown
Samples obtained from National Ovarian Cancer Early Detection Program,
Northwestern University (Director: Dr. David Fishman)
A.
84 training samples (28 Unaffected and 56 Ovarian Cancer)
B.
87 blind testing samples (30 Unaffected and 57 Cancer)
C.
77 blind validation samples (37 Unaffected and 40 Cancer)
Total: 153 Ovarian Cancer; 95 Unaffected
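As a quick consistency check on the breakdown above, the per-set counts can be tallied as follows. The dictionary layout is just an illustrative choice; the numbers are those listed on the slide.

```python
# Sample counts per set, as listed on the slide.
sets = {
    "training": {"unaffected": 28, "cancer": 56},
    "blind testing": {"unaffected": 30, "cancer": 57},
    "blind validation": {"unaffected": 37, "cancer": 40},
}

set_sizes = {name: sum(counts.values()) for name, counts in sets.items()}
total_unaffected = sum(counts["unaffected"] for counts in sets.values())
total_cancer = sum(counts["cancer"] for counts in sets.values())

print(set_sizes)
print("Unaffected:", total_unaffected, "Ovarian Cancer:", total_cancer)
```

The tallies reproduce the stated set sizes (84, 87, 77) and the stated totals of 153 ovarian cancer and 95 unaffected samples.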
Metrics of “High Fitness” Models from QqTOF Data
A. Results Breakdown of Four Models with Highest Diagnostic Accuracy
Results: 100% sensitivity; 100% specificity
State                  Testing    Validation
Unaffected (Normal)    31/31      37/37
Ovarian Cancer         63/63      40/40
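For reference, sensitivity and specificity follow directly from counts like those reported on this slide. The sketch below uses the standard definitions; the helper names and the dictionary layout are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of cancer samples correctly called cancer."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unaffected samples correctly called unaffected."""
    return true_neg / (true_neg + false_pos)


# Counts reported on the slide, as (correct calls, total samples).
reported = {
    "testing": {"unaffected": (31, 31), "cancer": (63, 63)},
    "validation": {"unaffected": (37, 37), "cancer": (40, 40)},
}

for name, counts in reported.items():
    tn, n_unaffected = counts["unaffected"]
    tp, n_cancer = counts["cancer"]
    sens = sensitivity(tp, n_cancer - tp)
    spec = specificity(tn, n_unaffected - tn)
    print(f"{name}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```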
B.
Key diagnostic features recognized within each model
[Figure: mass spectra of unaffected and ovarian cancer sera for Models 1-4, zoomed on m/z 6900-7200 and 8500-8700, with selected features such as m/z 7060.121 and 8605.678 marked]
Conrads, T. P., Zhou, M., Petricoin, E., Liotta, L., and
Veenstra, T. D., Expert Rev. Mol. Diagn., 3, 411-420.
Model 1
Model 2
Model 3
Model 4
1276.861
818.480
1144.796
1001.654
2374.244
6352.723
4260.403
1255.593
4292.900
6548.771
7046.018
4377.854
7060.121
7060.121
8602.237
6004.417
8605.678
7096.922
8664.385
7060.121
8706.065
8540.536
7202.716
9870.938
8605.678
8605.678
8706.065
8709.548
9367.113
BLINDED TEST RESULTS:
Collaborators: Denise Ching, Kim Lyerly, Sam Wells, David Harpole; Duke U.
Benign vs. Malignant
(Spiral CT +)
Pattern Recognition
Method #1
Pattern Recognition
Method #2
Pattern Recognition
Method #3
Key ion features selected
(m/z)
6851.505
2378.046 2371.398
6675.697
10070.302
2210.224
4914.232
6854.245
2620.747 4471.636
5086.187 6649.053
6854.456
1028, 1035, 1050, 1289, 1980, 2080,
2210, 2212, 2365, 2366, 2485, 2589
2897, 3158, 3435, 3538, 3763, 4062,
4071, 4307, 4315, 4482, 4491, 4559
4643, 5138, 5139, 5800, 5861, 5879
6414, 6432, 6629, 6646, 6660, 6852
6978, 7834, 7835, 7908, 7922, 7923
7935, 7953, 8329, 8330, 8601, 8617
8619, 8634, 8913, 8931, 9120
Specificity
Sensitivity
69%
71%
85%
98%
89%
95%
[Figure: representative mass spectra for Benign, Adenocarcinoma, and Squamous samples; relative intensity (%) vs. m/z (1000-12000), with insets at m/z 2365-2375 and 6820-6880]
Do we detect clinical biomarkers such as CA125 or PSA in proteomic patterns using SELDI?
Short Answer: No.
Is this due to the sensitivity of the instrument?
Short Answer: No, it is a dynamic range issue. A SELDI-TOF can detect below 10^-12 mol/L.
Will a straight MALDI approach and high resolution MS without specifically targeting PSA,
for example, allow detection of these low abundant biomarkers?
Short Answer: No (see above)
Are we trying to detect PSA and CA125?
Short Answer: No
Do we need better ways of diagnosing early stage cancer beyond CA125 and PSA?
Short answer: Absolutely.
Are all of the steps necessary to make proteomic pattern diagnostics clinically useful being
evaluated?
Short answer: Absolutely.
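The dynamic range point can be made concrete with a rough calculation. The two concentrations below are not from these slides; they are commonly quoted approximate values (serum albumin near 40 mg/mL, and the 4 ng/mL level often used as a clinical PSA cutoff) and are assumptions for illustration only.

```python
import math

# Illustrative, approximate concentrations (assumptions, not data from this talk).
ALBUMIN_G_PER_ML = 40e-3  # roughly 40 mg/mL
PSA_G_PER_ML = 4e-9       # roughly 4 ng/mL


def orders_of_magnitude(high: float, low: float) -> float:
    """Concentration ratio expressed as a power of ten."""
    return math.log10(high / low)


span = orders_of_magnitude(ALBUMIN_G_PER_ML, PSA_G_PER_ML)
print(f"Albumin exceeds PSA by about {span:.0f} orders of magnitude")
```

Under these assumptions the instrument would need to register a signal roughly ten million times weaker than the dominant protein in the same spectrum, which is a dynamic range problem rather than an absolute sensitivity problem.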
Characterization of the Human Serum Proteome
22 PROTEINS COMPRISE 99% OF THE PROTEIN MASS IN SERUM!
90%
10%
Human Serum Proteomic Investigation
Three tracks:
Global serum proteome survey
Can we account for the presence of histopathologically-related proteins/peptides in serum?
Low molecular weight protein/peptide proteome
Can we deplete the high molecular weight fraction
for more effective interrogation of the source of the
diagnostic information?
Investigation of bound peptides to high abundant serum proteins
Is there histopathological content bound to the
highly abundant carrier proteins, such as albumin?
Global Serum Proteome Survey
Tryptic Digest
Ampholyte Free Serum Peptide IEF (20 Fractions)
Strong Cation Exchange (140 Fractions)
Analyze by LC/MS/MS
[Figure: SCX chromatogram; fluorescence and % salt gradient vs. time (0-96 min)]
GLOBAL ANALYSIS OF THE SERUM PROTEOME
bpp.nci.nih.gov
IEF
473 Proteins
957 Unique Peptides
IEF/SCX
1143 Proteins
2071 Unique Peptides
Total Proteins and Peptides Identified
1446 Unique Proteins
2649 Unique Peptides
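If the combined totals are read as the union of the IEF and IEF/SCX data sets (an assumption; the slide does not say how the totals were compiled), the number of identifications shared by both approaches follows from inclusion-exclusion:

```python
def shared(count_a: int, count_b: int, count_union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return count_a + count_b - count_union


# Counts from the slide; treating the totals as set unions is an assumption.
shared_proteins = shared(473, 1143, 1446)
shared_peptides = shared(957, 2071, 2649)

print("Proteins found by both approaches:", shared_proteins)
print("Peptides found by both approaches:", shared_peptides)
```

On that reading, most identifications are unique to one approach, which is the usual argument for combining orthogonal fractionation schemes.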
Analysis of the Human Serum Proteome
King C. Chan, David A. Lucas, Denise Hise, Carl F. Schaefer, Zhen Xiao, George M. Janini, Kenneth H. Buetow,
Haleem J. Issaq, Timothy D. Veenstra and Thomas P. Conrads
Clinical Proteomics (2004) In Press
Analysis of Identified Human Serum Proteins
Molecular Function
binding activity: 34.07%
enzyme activity: 18.79%
signal transducer activity: 10.37%
transporter activity: 9.34%
transcription regulator activity: 5.94%
defense/immunity protein activity: 4.86%
structural molecule activity: 4.75%
molecular function unknown: 3.46%
enzyme regulator activity: 2.86%
cell adhesion molecule activity: 2.86%
motor activity: 1.13%
chaperone activity: 0.70%
apoptosis regulator activity: 0.54%
antioxidant activity: 0.05%
cytoskeletal regulator activity: 0.05%
protein stabilization activity: 0.05%
protein tagging activity: 0.05%
surfactant activity: 0.05%
toxin activity: 0.05%
Biological Processes
cellular process: 20.67%
metabolism: 19.43%
cell growth and/or maintenance: 13.31%
cell communication: 9.93%
response to external stimulus: 7.76%
transport: 6.93%
development: 6.85%
response to stress: 2.94%
biological_process unknown: 2.43%
cell cycle: 2.36%
cell motility: 1.99%
death: 1.81%
circulation: 0.91%
homeostasis: 0.65%
behavior: 0.54%
pathogenesis: 0.47%
excretion: 0.29%
pregnancy: 0.29%
digestion: 0.22%
extracellular matrix organization: 0.15%
diuresis: 0.04%
viral life cycle: 0.04%
Cellular Component of Human Serum Proteins
GO of Human Serum Proteome
Extracellular: 15%
Membrane: 39%
Intracellular: 47%
GO of Human Proteome
Extracellular: 8%
Membrane: 30%
Nuclear: 30%
Intracellular: 8%
Mitochondrial: 4%
Cytoplasmic: 3%
Cytoskeletal: 3%
Endoplasmic Reticulum: 3%
Golgi: 2%
Lysosomal: 1%
virion: <1%
cellular component unknown: 7%
Human Serum Proteomic Investigation
Three tracks:
Global serum proteome survey
Can we account for the presence of histopathologically-related proteins in serum?
Low molecular weight protein/peptide proteome
Can we deplete the high molecular weight fraction
for more effective interrogation of the source of the
diagnostic information?
Investigation of bound peptides to high
abundant serum proteins
Is there histopathological content bound to the
highly abundant carrier proteins, such as albumin?
High Molecular Weight Protein Depletion by Ultrafiltration
Dilute raw serum 1:5 in
25 mM NH4HCO3, pH 8.2/20% acetonitrile
30 kDa MWCO Filter
Centrifuge
Tirumalai, R.S., Chan, K.C., Prieto, D.A, Issaq, H.J, Conrads,
TP. and Veenstra, T.D. Mol. Cell Proteomics., (2003).
Depletion of High MW Serum Proteins by Ultrafiltration
[Figure: gel with lanes M (markers at 21.5, 14.4, 6.0, and 3.5 kDa), Raw Serum (HSA band labeled), and LMW Ultrafiltrate]
Tirumalai, R.S., Chan, K.C., Prieto, D.A, Issaq, H.J, Conrads,
TP. and Veenstra, T.D. Mol. Cell Proteomics., (2003).
MALDI-TOF MS of Ultrafiltered Serum
[Figure: MALDI-TOF mass spectra (m/z 2500-15000) of Raw Serum and of ultrafiltrates prepared with No Acetonitrile and with 20% Acetonitrile]
Tirumalai, R.S., Chan, K.C., Prieto, D.A, Issaq, H.J, Conrads,
TP. and Veenstra, T.D. Mol. Cell Proteomics., (2003).
High Molecular Weight Protein Depletion by Ultrafiltration
Dilute raw serum 1:5 in
25 mM NH4HCO3, pH 8.2/20% acetonitrile
Centrifuge
30 kDa MWCO Filter
Trypsin Digest
SCX Fractionation
µLC-MS/MS
Tirumalai, R.S., Chan, K.C., Prieto, D.A, Issaq, H.J, Conrads,
TP. and Veenstra, T.D. Mol. Cell Proteomics., (2003).
880 Unique Peptides (341 Proteins) Identified from Human Serum
LOW MOLECULAR WEIGHT Fraction
Hypothetical proteins
Enzymes
Circulating proteins
Coagulation &
complement factors
Structural proteins,
nuclear proteins,
transcription factors,
oncogene products, etc.
Transport and
binding proteins
Protease Inhibitors
Proteases
Channels, Receptors,
Binding Proteins
Cytokines, Growth Factors,
Hormones
Tirumalai, R.S., Chan, K.C., Prieto, D.A, Issaq, H.J, Conrads,
TP. and Veenstra, T.D. Mol. Cell Proteomics., (2003).
GLOBAL ANALYSIS OF THE SERUM PROTEOME
www.bpp.nci.nih.gov
Example database entry:
Protein: Interferon γ
Peptide: LKKYFNAG
Xcorr: 2.48
Charge: 2
Compartment: extracellular
Function: defense/immunity
[Figure: MS/MS spectrum; relative abundance (%) vs. m/z (250-1000)]
1674 Unique Proteins
3441 Unique Peptides
IEF: 473 Proteins, 957 Unique Peptides
IEF/SCX: 1143 Proteins, 2071 Unique Peptides
LMW: 308 Proteins, 884 Unique Peptides
Materials and Methods
Data Analysis
bpp-dev.nci.nih.gov
Systems Analysis of Human Serum
Serum Proteomic Analysis
Three tracks:
Global serum proteome survey
Can we account for the presence of disease and
cellular process-related proteins in serum?
Low molecular weight protein/peptide proteome
Can we deplete the high molecular weight fraction
for more effective interrogation of the source of the
diagnostic information?
Investigation of bound peptides to high abundant serum proteins
Is there histopathological content bound to the
highly abundant carrier proteins, such as albumin?
Targeted Serum Proteomics

Utilize the character of serum – is the presence of albumin
such a detriment, or is it something exploitable?

Can we target the proteomic study of serum for disease
diagnosis as we would signal transduction pathways?

Diagnostic molecular sponges?

Preliminary diagnostic studies are demonstrating that highly
abundant HMW proteins actually contain bound diagnostic
information.
Protein G
Bind
Xlink: Dimethyl pimelimidate (DMP)
Incubate: Serum
Wash
Elute
Centrifuge (30 kDa MWCO)
MALDI-TOF MS (Diagnostic)
or
Trypsin Digest/µLC-MS/MS (Discovery)
[Figure: cross-linking scheme showing Protein G, Lys NH2 groups, and the DMP structure]
Serum interactionomics studies have been completed for:
HSA via Antibody Capture
HSA via Dye-binding
Apolipoprotein
Transferrin
IgG
IgA
IgM
Prostate specific antigen (PSA) was detected bound to IgG
and albumin but not in the global serum analysis.
Using high abundance proteins as sponges may increase the
likelihood of detecting low abundance proteins in serum or
plasma.
Global Analysis of the Mouse Serum Proteome
Intact Proteins
Cation Exchange
Anion Exchange
[Figure: cation and anion exchange chromatograms of intact proteins; AU vs. time (0-90 min)]
Digest into Peptides
Fractionate Using Strong Cation Exchange
[Figure: two SCX chromatograms; fluorescence and % salt gradient vs. time (0-96 min)]
Analyze by LC/MS/MS
Compile Results
Gene Ontology of Mouse Serum Proteome
Extracellular
6.2%
Membrane
43.8%
Intracellular
47.5%
Global Analysis of the Mouse Serum Proteome
5053 Unique Proteins
11113 Unique Peptides
Analysis and Bioinformatic Annotation is Continuing
i.e. Comparison of mouse and human serum proteome
Is mouse a reasonable model for studying human cancers?
Cross Comparison of Mouse and Human Serum Proteome
                                          Human    Mouse
Total Number of Proteins Identified       1674     5059
Proteins Mapped to Locus Link             1317     4637
Human/Mouse Pairs with >90% Similarity    165      166
Human/Mouse Pairs with >80% Similarity    240      244
Human/Mouse Pairs with >70% Similarity    385      401
Almost 30% of the human serum proteins identified had a homolog with >70%
sequence similarity that was identified within the mouse serum proteome.
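The "almost 30%" figure can be checked against the table values. The sketch below computes the >70% similarity fraction against two possible denominators; which denominator the slide intends is not stated, so treating the Locus Link-mapped count as the base is an assumption.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


human_total = 1674      # total human serum proteins identified
human_mapped = 1317     # human proteins mapped to Locus Link
human_pairs_70 = 385    # human/mouse pairs with >70% similarity (human column)

print("Of mapped proteins:", percent(human_pairs_70, human_mapped), "%")
print("Of all identified proteins:", percent(human_pairs_70, human_total), "%")
```

Only the mapped-protein denominator gives a value close to 30%, which suggests that is the base behind the statement above.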
What About “One Hit Wonders” In Biomarker Discovery
Distribution of Unique Peptide Identifiers per Protein within Mouse
Cortical Neuron Proteome
Unique peptides per protein (% of proteins):
1 (45.6%)
2 (18.1%)
3 (10.3%)
4 (6.3%)
5 (4.4%)
6 (3.3%)
7 (2.6%)
8 (1.9%)
9 (1.3%)
10 (1.2%)
>10 (5.0%)
In most global proteomic surveys and quantitative proteomic studies
using ICAT, a large fraction of the proteins are identified by a single
unique peptide.
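The cost of discarding single-peptide identifications can be read straight off the distribution on this slide. The sketch below applies a hypothetical "at least two unique peptides" filter to those percentages; the filter threshold is an illustrative assumption, not a rule stated in the talk.

```python
# Percentage of proteins by number of unique peptide identifiers (from the slide).
distribution = {
    "1": 45.6, "2": 18.1, "3": 10.3, "4": 6.3, "5": 4.4, "6": 3.3,
    "7": 2.6, "8": 1.9, "9": 1.3, "10": 1.2, ">10": 5.0,
}

total = sum(distribution.values())
# Proteins kept by a hypothetical "two or more unique peptides" rule.
retained = sum(pct for count, pct in distribution.items() if count != "1")
discarded = distribution["1"]

print(f"Total: {total:.1f}%")
print(f"Retained with >=2 peptides: {retained:.1f}%")
print(f"Discarded as one-hit wonders: {discarded:.1f}%")
```

Under such a rule, nearly half of the identified proteins in this data set would be dropped.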
Validation is a Key Component for Discovery-Driven Research
[Figure: ICAT 12C/13C9 analysis of the Cyclin D1 peptide ACQEQIEALLESSLR; MS/MS spectrum annotated with y2-y13 ions, and extracted ion chromatograms (relative abundance vs. retention time, 81.0-85.0 min); 13C9/13C0 ratio = 1.76]
[Figure: band image of Cyclin D1 and Actin (lanes S and P); densitometric ratio = 2.41]
Interstitial Cystitis and Antiproliferative Factor
Interstitial cystitis (IC) is a debilitating chronic painful bladder
disorder, of unknown etiology, from which approximately one
million Americans suffer.
Bladder epithelial cells from IC patients produce an
antiproliferative factor (APF) that inhibits the proliferation of
normal bladder epithelial cells in vitro and alters the production of
specific growth factors.
APF is a potential anti-bladder cancer agent; however, its identity is
unknown.
Identification of Fraction with APF-activity
[Figure: chromatogram, relative abundance (%) vs. retention time (0-40 min), with fractions 1-6 marked; bar chart of % 3H-thymidine incorporation for fractions 1-6]
MS of Fraction with APF-activity
[Figure: mass spectrum (m/z 600-2000) with peaks X and Y; extracted ion chromatograms (XIC) for m/z X and m/z Y, relative abundance (%) vs. retention time (0-40 min)]
Identification of APF by de novo Sequencing
[Figure: MS/MS spectra for de novo sequencing; relative abundance (%) vs. m/z (500-1500 and 250-850), with b3-b8 and y6 fragment ions annotated; sequence displayed as ABC X X X X X X DEF]
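The idea behind de novo sequencing from a b-ion series can be sketched in a few lines: the mass difference between consecutive b ions equals the mass of one amino acid residue. The example below is a generic illustration, not the APF analysis itself. The peptide "SAMPLE" is hypothetical, the ladder is idealized (every b ion present, singly charged, no modifications), and the 0.01 tolerance is an arbitrary choice.

```python
PROTON = 1.007276  # mass of a proton, added to give singly charged b ions

# Monoisotopic residue masses of the 20 standard amino acids.
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}


def b_ions(peptide):
    """Theoretical singly charged b-ion m/z values for a peptide."""
    masses, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def read_ladder(ions, tol=0.01):
    """Assign a residue to each step of a complete b-ion ladder."""
    sequence, previous = [], PROTON
    for mz in ions:
        delta = mz - previous
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        # Isobaric residues (I and L) cannot be told apart by mass alone.
        sequence.append("/".join(matches) if matches else "?")
        previous = mz
    return sequence


ladder = b_ions("SAMPLE")  # hypothetical peptide
print(read_ladder(ladder))
```

Leucine and isoleucine share the same mass, so the ladder reports them together; real spectra also have missing ions and noise, which this sketch ignores.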
Antiproliferative Activity of APF Peptide and Glycosylated
Derivatives
[Figure: bar chart of percentage decrease in live cell count (20-120) for Native APF, Native Mock APF, Synthetic APF, and Synthetic Peptide Alone]
The APF peptide has 100% homology to a peptide within a known ligand receptor
VALIDATION!!!
Presence of APF peptide in interstitial cystitis patients confirmed by Northern blotting
VALIDATION!!!
APF as a Biomarker for Interstitial Cystitis
APF has 100% homology to a
peptide within a known ligand
receptor
APF is a single peptide biomarker/effector
for patients with IC.
If the “one-hit wonder” rule had been followed, APF
would have been disregarded.
CONCLUSIONS
We have used high resolution MS and obtained 100% sensitivity and
specificity for ovarian cancer diagnosis.
Characterization of human (1447 proteins identified) and mouse serum
proteome (~5000 proteins) demonstrates that proteins across all functional
classes and cellular locations are present within serum.
Unlikely that Ab-based detection will provide reliable specificity in disease
marker detection.
A potential archive of histopathological information is bound to highly
abundant serum carrier proteins.
Just because a protein is only identified by a single peptide does not mean it
should be ignored. After all, these studies are directly identifying peptides, not
proteins.
NCI BIOMEDICAL PROTEOMICS PROGRAM
SCIENTIFIC TEAM AND COLLABORATORS
Laboratory of Proteomics
And Analytical Chemistry
SAIC-Frederick
Mass Spectrometry Center
Thomas P. Conrads
Radha Tirumalai
Li-Rong Yu
Ming Zhou
Josip Blonder
Zhen Xiao
John Roman
Separations Technology Lab
Haleem Issaq, Head
Proteomic Patterns
Lance Liotta
Emmanuel Petricoin
Ben Hitt
Vincent Fusaro
Interstitial Cystitis
Chris Michedja
George Janini
King Chan
Stephen Fox
NMR Lab
Gwen Chmurny, Head
Que Van
John Klose
Aaron Lucas
Joseph Kates
Director RTP, SAIC-Frederick
David Goldstein
CCR, NCI
J. Carl Barrett
Director CCR, NCI