High Throughput and Large Scale Proteomics Analysis
Austin Yang, Ph.D.
Department of Pharmaceutical Sciences, University of Southern California
Overview
1. Shotgun proteomics and ESI mass spectrometry
2. Proteomic data mining and data visualization
Are We Ready for Mammalian Proteomics?
[Figure: 12,000 proteins; Shotgun Proteomics vs. 2-D Gel]
Protein class            Concentration   Copies/cell
Cytoskeletal proteins    mM              1 x 10^9
Metabolism               0.1 mM          1 x 10^8
Ribosomes                10 µM           1 x 10^7
Kinases                  1 µM            1 x 10^6
Cyclins                  0.1 µM          1 x 10^5
Transcription factors    10 nM           1 x 10^4
Synaptic markers         0.1 nM          1 x 10^3
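Concentration and copies per cell are linked through Avogadro's number and the cell volume. The sketch below shows that arithmetic. The cell volume of about 1.66 pL is an assumption chosen for illustration (it is not stated on the slide), and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

# Assumption for illustration only: a cell volume of ~1.66 picolitres.
ASSUMED_CELL_VOLUME_L = 1.66e-12


def copies_to_molar(copies_per_cell, cell_volume_litres):
    """Convert a copy number per cell into a molar concentration."""
    return copies_per_cell / (AVOGADRO * cell_volume_litres)


# Walk down the abundance scale from 10^9 to 10^3 copies per cell.
for exponent in range(9, 2, -1):
    molar = copies_to_molar(10 ** exponent, ASSUMED_CELL_VOLUME_L)
    print(f"1e{exponent} copies/cell -> {molar:.2e} M")
```

Under this assumed volume, each tenfold drop in copy number is a tenfold drop in concentration, so the scale spans six orders of magnitude.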
Advantages of Proteomics Using LC-MS/MS
• No pre-selection of targets, so no selection bias (hypothesis-free, open approach)
• Protein variants are detected simultaneously
• Protein isolation and detection work on a small scale (~10 fmol from complex mixtures: subcellular fractions, whole cells, or tissue)
• Yields sequence information for peptides (not just masses) and can sequence ~4,000 proteins in a single experiment
Liquid Chromatography Quadrupole Ion Trap Tandem Mass Spectrometer
Electrospray vs Nanospray
Splitless Nano-Liquid Chromatography
Five Independent Loop Injections
10-cycle MudPIT Analysis
[Figure: SCX salt steps (NH4OAc) from 100 mM to 1000 mM in 100 mM increments, delivered to RP #1 and RP #2, with alternating MS and wash cycles]
Multidimensional Protein Identification Technology (MudPIT)
[Figure: digested protein complexes are loaded onto an SCX column and eluted with 0-500 mM NH4OAc in steps (100, 200, 300, 400, 500 mM) onto RP #1 and RP #2]
1,000-2,000 Sequencing Attempts in 60 Minutes
20,000 MS/MS spectra/day
Isotope-Coded Affinity Tags (ICAT)
Electrospray Ionization (ESI)
[Figure: ions in solution leave the LC spray tip and enter the ion source opening of the MS as ions in the gaseous phase]
Theoretical CID of a Tryptic Peptide
[Figure: CID of the singly charged parent ion F L G K (m/z 464.29). Daughter ions: b1 (F), b2 (F L), b3 (F L G), y1 (K), y2 (G K), y3 (L G K); some parent ions remain non-dissociated. The MS/MS spectrum plots the b and y ions against m/z, and the spacing between neighboring peaks reads out the residues F, L, G, K.]
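The F L G K example can be reproduced numerically. The sketch below computes singly charged b and y ions from standard monoisotopic residue masses. The function names are hypothetical, and the mass table covers only the residues used in this deck.

```python
# Monoisotopic residue masses (Da) for the residues used in this deck.
MONO = {"G": 57.02146, "A": 71.03711, "V": 99.06841, "L": 113.08406,
        "I": 113.08406, "K": 128.09496, "M": 131.04049, "F": 147.06841}
WATER = 18.01056   # H2O, added to y ions and to the intact peptide
PROTON = 1.00728   # mass of the charge-carrying proton


def precursor_mz(peptide, charge=1):
    """m/z of the intact (parent) peptide ion at the given charge."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_ions(peptide):
    """Singly charged b ions (N-terminal) and y ions (C-terminal)."""
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(MONO[aa] for aa in peptide[:i]) + PROTON
        ions[f"y{i}"] = sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON
    return ions


print(f"[M+H]+ = {precursor_mz('FLGK'):.2f}")
for name, mz in sorted(fragment_ions("FLGK").items()):
    print(f"{name}: {mz:.2f}")
```

The computed [M+H]+ rounds to 464.29, the m/z value labeled on the slide.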
SequestQueue (6,000 dta x 50 = 300,000 MS/MS scans)
Data Mining through SEQUEST and PAULA
Database                                    Search Time
• Yeast ORFs (6,351 entries)                52 sec: 0.104 sec/s
• Non-redundant protein (100K entries)      3,500 min
• EST (100K entries, 3 frames)              5-10,000 min
SEQUEST Algorithm
Step 1. Determine the parent ion molecular mass (charge assignment by ZSA).
Step 2. The 500 peptides with masses closest to that of the parent ion are retrieved from a protein database. The computer generates a theoretical MS/MS spectrum for each peptide sequence (SEQ 1, 2, 3, 4, …).
Step 3. The experimental MS/MS spectrum is compared with each theoretical spectrum, and correlation scores are assigned.
Step 4. Scores are ranked, and protein identifications are made based on these cross-correlation scores.
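The four steps can be sketched in a few dozen lines. This is a simplified illustration, not the SEQUEST implementation: it assumes 1-Da bins, unit-intensity theoretical b and y ions, and a cross-correlation score computed as the zero-offset dot product minus the mean over offsets from -75 to +75. The tiny database and all function names are hypothetical.

```python
MONO = {"G": 57.02146, "A": 71.03711, "L": 113.08406,
        "K": 128.09496, "F": 147.06841}
WATER, PROTON = 18.01056, 1.00728


def parent_mass(mz, charge):
    """Step 1: neutral parent mass from the observed m/z and charge."""
    return mz * charge - charge * PROTON


def peptide_mass(seq):
    return sum(MONO[aa] for aa in seq) + WATER


def theoretical_peaks(seq):
    """Step 2: unit-intensity b and y ions for a candidate sequence."""
    peaks = []
    for i in range(1, len(seq)):
        b = sum(MONO[aa] for aa in seq[:i]) + PROTON
        y = sum(MONO[aa] for aa in seq[i:]) + WATER + PROTON
        peaks += [(b, 1.0), (y, 1.0)]
    return peaks


def bin_spectrum(peaks, max_mz=2000):
    """Place (m/z, intensity) peaks into 1-Da bins."""
    bins = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx <= max_mz:
            bins[idx] = max(bins[idx], intensity)
    return bins


def shifted_dot(exp, theo, offset):
    """Dot product of the two binned spectra with theo shifted by offset."""
    return sum(t * exp[i + offset] for i, t in enumerate(theo)
               if t and 0 <= i + offset < len(exp))


def xcorr(exp, theo, window=75):
    """Step 3: zero-offset correlation minus the mean off-peak correlation."""
    background = sum(shifted_dot(exp, theo, off)
                     for off in range(-window, window + 1) if off != 0)
    return shifted_dot(exp, theo, 0) - background / (2 * window)


def search(exp_peaks, mass, database, n_candidates=500):
    """Steps 2-4: pick the closest-mass candidates, score them, rank them."""
    candidates = sorted(database,
                        key=lambda s: abs(peptide_mass(s) - mass))[:n_candidates]
    exp = bin_spectrum(exp_peaks)
    scored = [(s, xcorr(exp, bin_spectrum(theoretical_peaks(s))))
              for s in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


database = ["FLGK", "GLFK", "KGLF", "AAGK"]   # hypothetical mini-database
observed = theoretical_peaks("FLGK")          # stand-in for a real spectrum
ranking = search(observed, parent_mass(464.29, 1), database)
for seq, score in ranking:
    print(f"{seq}: {score:.2f}")
```

Note that GLFK and KGLF have the same mass as FLGK, so the parent mass alone cannot separate them; only the fragment-level comparison in step 3 does.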
Unified Scoring Function
[Figure: experimental MS/MS spectrum]
One spectrum, TWO protein identifications
Spectrum A was searched against the NCBI human database: macrophage inhibitory factor was identified.
The same spectrum was searched against the non-redundant database: bovine G-protein gamma was identified. Since the primary amino acid sequence of human G-protein gamma is almost identical to the bovine sequence, this protein was later identified as human G-protein gamma. The initial false ID was due to a missing entry for human G-protein gamma in the human database. The sequence was later re-entered into the human database, and a third search yielded the correct ID.
Mol Cell Proteomics. 2003 Jul;2(7):428-42.
Fragment ions that match both sequences are indicated by *
Spectrum B has two additional ions matched to G-protein gamma
Distribution of Xcorr from correctly and
incorrectly identified peptides
X-correlation vs Peptide length
Distribution of Xcorr vs Charge State
F-score and probability-based peptide assignment
Identification of modified LRP in APP/PS1 Transgenic Mice
Neurotransmitter Receptors
Tg
Peptide
A) 1. (Q9WV18) Gamma-aminobutyric acid type B receptor, subunit 1 precursor (GABA-B-R1)
   2. (NP_032102.1) gamma-aminobutyric acid (GABA-A) receptor, subunit rho 2
   3. (NP_034382.1) gamma-aminobutyric acid A receptor, gamma 1
   4. (NP_033733.1) cholinergic receptor, nicotinic, epsilon polypeptide; acetylcholine receptor
   5. (NP_150372.1) cholinergic receptor, muscarinic 3, cardiac; AChR M3
   6. (S28058) serotonin receptor 5
   7. (NP_031903.1) dopamine receptor 3; D3 receptor
   8. (Q60934) Glutamate receptor, ionotropic kainate 1 precursor (Glutamate receptor 5)
   9. (I49696) glutamate receptor chain B (version flip)
B) 1. (NP_038589.1) 5-hydroxytryptamine (serotonin) receptor 3A
   2. (P30545) Alpha-2B adrenergic receptor (Alpha-2B adrenoceptor)
   3. (NP_032195.1) glutamate receptor, ionotropic, NMDA1 (zeta 1)
   4. (NP_032198.1) glutamate receptor, ionotropic, NMDA2D (epsilon 4); GluRepsilon4
   5. (I49696) glutamate receptor chain B (version flip)
C) 1. (NP_034428.1) glycine receptor, beta subunit
   2. (JC4262) glutamate transporter 2
Proteomic Data Visualization and Future Directions
• information overload
• data integration
• ease of visualization
Network for NMDA and glutamate receptors
Network for NMDA and glutamate receptors (zoom-in)
Scoring Algorithm for Spectral Analysis
[Figure: raw unidentified spectra (~10,000-100,000) pass through SEQUEST and SALSA to yield identified sequences]
SALSA Overview
[Figure: spectral characteristics scored by SALSA: product ion, charged loss, neutral loss, mass difference, and ion series (residues A, G, D, W, T)]
• SALSA is a tool for identifying MS-MS spectra in Xcalibur analysis files that display specific user-defined characteristics. Because these characteristics correspond to structural features of a peptide, SALSA allows the user to selectively locate MS-MS spectra of specific peptides or their variant or modified forms.
Construction of SALSA ruler
[Figure: ion-series ruler for GAIIGLMGGVV, built from the successive fragments GA, GAI, GAII, GAIIG, GAIIGL, GAIIGLM, GAIIGLMG, GAIIGLMGG, GAIIGLMGGV, GAIIGLMGGVV]
Methionine Oxidation
16 amu (one oxygen atom)
[Figure: the same ruler, GA through GAIIGLMGGVV, laid along the m/z axis]
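One way to picture the ruler is as a list of mass offsets, one per added residue, that is slid along the peaks of a spectrum. The sketch below is an illustration of that idea only and makes no claim to match SALSA's actual scoring. It assumes singly charged b-type fragments, a 0.5 Da tolerance, and a synthetic spectrum; the function names are hypothetical.

```python
MONO = {"G": 57.02146, "A": 71.03711, "V": 99.06841,
        "I": 113.08406, "L": 113.08406, "M": 131.04049}
OXYGEN = 15.99491   # the "16 amu" shift from one oxygen atom
PROTON = 1.00728


def build_ruler(residues, oxidized_met=False):
    """Cumulative mass offsets for each residue added after the first fragment."""
    offsets = [0.0]
    for aa in residues:
        step = MONO[aa] + (OXYGEN if oxidized_met and aa == "M" else 0.0)
        offsets.append(offsets[-1] + step)
    return offsets


def score_ruler(peaks, ruler, tol=0.5):
    """Try every peak as the ruler origin; return (best hit count, origin m/z)."""
    best = (0, None)
    for origin, _ in peaks:
        hits = sum(1 for off in ruler
                   if any(abs(mz - (origin + off)) <= tol for mz, _ in peaks))
        if hits > best[0]:
            best = (hits, origin)
    return best


# Ruler for GAIIGLMGGVV starting at the GA fragment, as drawn on the slide.
ga_fragment = MONO["G"] + MONO["A"] + PROTON
plain_ruler = build_ruler("IIGLMGGVV")
oxidized_ruler = build_ruler("IIGLMGGVV", oxidized_met=True)

# Synthetic spectrum of the Met-oxidized peptide (assumed intensities).
peaks = [(ga_fragment + off, 100.0) for off in oxidized_ruler]

print("oxidized ruler:", score_ruler(peaks, oxidized_ruler))
print("plain ruler:   ", score_ruler(peaks, plain_ruler))
```

Against the oxidized spectrum, the oxidized ruler lines up with every mark, while the plain ruler loses its matches once the ladder passes the methionine.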
Absolute Quantification Analysis
Quantification of Methionine Oxidation
[Figure: MS/MS spectra (panels A and B) of Aβ29-40, G A I I G L M V G G V V, showing the [Aβ29-40]+1, [Aβ29-40+1O]+1, and [Aβ29-40+2O]+1 species. Labeled ions: b4, b6, b7*, b9*, b11*, [b11*]+2, b12*, y3, y5, y6*, y7*, y8*, y9*.]
GAIIGLMVGGVV
GAIIGLMVGGVV: +7 amu
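Which fragment ions move when methionine is oxidized follows directly from the position of M in the sequence. The sketch below works this out for GAIIGLMVGGVV and computes the singly charged parent m/z with zero, one, and two added oxygens. It assumes monoisotopic masses and hypothetical function names, and it does not model the +7 amu species.

```python
MONO = {"G": 57.02146, "A": 71.03711, "V": 99.06841,
        "I": 113.08406, "L": 113.08406, "M": 131.04049}
WATER, PROTON, OXYGEN = 18.01056, 1.00728, 15.99491


def ions_containing_met(peptide):
    """Names of the b and y ions that include the methionine residue."""
    n = len(peptide)
    met_pos = peptide.index("M") + 1            # 1-based, from the N-terminus
    b_ions = [f"b{i}" for i in range(met_pos, n + 1)]
    y_ions = [f"y{i}" for i in range(n - met_pos + 1, n + 1)]
    return b_ions, y_ions


def parent_mz(peptide, n_oxygen=0, charge=1):
    """m/z of the intact peptide ion carrying n_oxygen added oxygen atoms."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER + n_oxygen * OXYGEN
    return (neutral + charge * PROTON) / charge


peptide = "GAIIGLMVGGVV"
b_shifted, y_shifted = ions_containing_met(peptide)
print("shifted b ions:", b_shifted)
print("shifted y ions:", y_shifted)
for n in range(3):
    print(f"+{n}O parent [M+H]+ = {parent_mz(peptide, n):.2f}")
```

The result agrees with the labels on the slide: b7 and above and y6 and above are starred, while b4, b6, y3, and y5 are not.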