Proteomics_12-8

Multiple flavors of mass analyzers
Single MS (peptide fingerprinting):
Identifies m/z of peptide only
Peptide ID’d by comparison to a database of predicted m/z values for trypsinized proteins
Tandem MS/MS (peptide sequencing):
Pulls each peptide from the first MS
Breaks peptide bonds
Identifies each fragment based on m/z
Collision cell
Now multiple types of collision cells:
CID: collision induced dissociation
ETD: electron transfer dissociation
HCD: higher-energy collisional dissociation
Intro to Mass Spec (MS)
Separate and identify peptide fragments by their Mass and Charge (m/z ratio)
[Figure: mass spectrometer schematic: ion source, mass analyzer, detector, MS spectrum]
Basic principles:
1. Ionize (i.e. charge) peptide fragments
2. Separate ions by mass/charge (m/z) ratio
3. Detect ions of different m/z ratio
4. Compare to database of predicted m/z fragments for each genome
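As a rough illustration of step 4, the predicted m/z values can be generated by digesting each protein in silico and summing residue masses. The sketch below assumes standard monoisotopic residue masses, the common trypsin rule (cleave after K or R, but not before P), and no missed cleavages or modifications. The function names are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added once per positive charge


def tryptic_digest(protein):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mz(peptide, charge=1):
    """Predicted m/z of an unmodified peptide at the given charge state."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    for pep in tryptic_digest("AKRPGKT"):
        print(pep, round(peptide_mz(pep, charge=2), 4))
```

A fingerprinting search would then compare each observed m/z against this predicted list within a mass tolerance.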
How does each spectrum translate to amino acid sequence?
Mann, Nat. Rev. Mol. Cell Biol. 5:699-711
How does each spectrum translate to amino acid sequence?
1. De novo sequencing: very difficult and not widely used for large-scale datasets (but being developed)
2. Matching observed spectra to a database of theoretical spectra
3. Matching observed spectra to a spectral database of previously seen spectra
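To make option 2 concrete, here is a minimal sketch of how a theoretical spectrum might be built and compared to an observed one. It assumes singly charged b and y ions only, standard monoisotopic residue masses, and a plain shared-peak count as the score. Real search engines use more elaborate scoring, and the function names here are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def by_ions(peptide):
    """Singly charged b and y fragment m/z values for each backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def shared_peak_count(observed, theoretical, tol=0.02):
    """Number of theoretical peaks with an observed peak within tol (in m/z)."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


if __name__ == "__main__":
    b, y = by_ions("GAK")
    print(shared_peak_count([58.03, 147.11, 300.0], b + y))
```

Each candidate peptide from the database would be scored this way, and the best-scoring candidate becomes the match for that spectrum.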
Nesvizhskii (2010) J. Proteomics, 73:2092-2123.
- spectral matching is supposedly more accurate but … limited to the number of peptides whose spectra have been observed before
With either approach, observed spectra are processed to:
group redundant spectra, remove bad spectra, recognize co-fragmentation, improve z estimates
Many good spectra will not match a known sequence due to:
absence of a target in DB, PTM modifies spectrum, constrained DB search, incorrect m or z estimate.
Result: peptide-spectrum match (PSM)
A major problem in proteomics is bad PSM calls
… therefore statistical measures are critical
Methods of estimating significance of PSMs:
p- (or E-) value: compare score S of best PSM against distribution of
all S for all spectra to all theoretical peptides
FDR correction methods:
1. Benjamini-Hochberg (B&H) FDR
2. Estimate the null distribution of RANDOM PSMs:
- match all spectra to real (‘target’) DB and to fake (‘decoy’) DB
- often the decoy DB is the same peptides as in the library but with reversed sequences
- one measure of FDR: 2*(# decoy hits) / (# decoy hits + # target hits)
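The decoy-based FDR measure above is easy to compute once each PSM is flagged as a target or decoy hit. The sketch below is a minimal version, assuming the spectra were searched against a combined target plus decoy database and that higher scores are better. The function name and the input layout are hypothetical.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR among PSMs scoring at or above threshold.

    psms: list of (score, is_decoy) tuples.
    Uses FDR = 2 * decoy hits / (decoy hits + target hits).
    """
    accepted = [(score, is_decoy) for score, is_decoy in psms if score >= threshold]
    decoys = sum(1 for _, is_decoy in accepted if is_decoy)
    targets = len(accepted) - decoys
    total = decoys + targets
    return 2 * decoys / total if total else 0.0


if __name__ == "__main__":
    psms = [(10, False), (9, False), (8, True), (7, False), (3, True)]
    for cutoff in (9, 7):
        print(cutoff, target_decoy_fdr(psms, cutoff))
```

In practice the score cutoff is lowered until the estimated FDR reaches the level one is willing to accept.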
3. Use #2 above to calculate posterior probabilities for EACH PSM
- mixture model approach: take the distribution of ALL scores S
- this is a mixture of ‘correct’ PSMs and ‘incorrect’ PSMs
- but we don’t know which are correct or incorrect
- scores from decoy comparison are included, which can provide
some idea of the distribution of ‘incorrect’ scores
- EM or Bayesian approaches can then estimate the proportion of correct vs. incorrect PSMs … a posterior probability is then calculated for each PSM from its score
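The mixture-model idea can be sketched with a small EM loop. This is only an illustration under strong assumptions: both score distributions are treated as Gaussian, the ‘incorrect’ component is fixed from the decoy scores, and the starting guess for the ‘correct’ component is crude. Published tools make different distributional choices, and the function names are hypothetical.

```python
import math
import statistics


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def psm_posteriors(target_scores, decoy_scores, n_iter=50):
    """Posterior probability that each target PSM is correct (two-Gaussian mixture)."""
    # 'Incorrect' component: estimated once from decoy scores and held fixed.
    mu0 = statistics.mean(decoy_scores)
    sd0 = statistics.stdev(decoy_scores)
    # 'Correct' component: crude starting guess (an assumption of this sketch).
    mu1, sd1, pi1 = max(target_scores), sd0, 0.5
    post = []
    for _ in range(n_iter):
        # E-step: posterior that each score came from the 'correct' component.
        post = []
        for s in target_scores:
            correct = pi1 * normal_pdf(s, mu1, sd1)
            incorrect = (1 - pi1) * normal_pdf(s, mu0, sd0)
            post.append(correct / (correct + incorrect))
        # M-step: update the mixing proportion and the 'correct' component.
        weight = sum(post)
        pi1 = weight / len(post)
        mu1 = sum(p * s for p, s in zip(post, target_scores)) / weight
        var1 = sum(p * (s - mu1) ** 2 for p, s in zip(post, target_scores)) / weight
        sd1 = max(math.sqrt(var1), 1e-3)  # floor avoids a collapsing variance
    return post  # posteriors from the final E-step


if __name__ == "__main__":
    targets = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
    decoys = [0.9, 1.1, 1.0, 1.2, 0.8]
    print([round(p, 3) for p in psm_posteriors(targets, decoys)])
```

Low-scoring PSMs that resemble the decoy distribution receive posteriors near 0, while clearly separated high scores receive posteriors near 1.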
FDR can be done at the level of PSM identification … but often done
at the level of Protein identification
Error in PSM identification can amplify FDR in Protein identification
Some methods combine PSM FDR to get a protein FDR
Nesvizhskii (2010) J. Proteomics, 73:2092-2123.
Often focus on proteins identified by at least 2 different PSMs
(or proteins with single PSMs of very high posterior probability)
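One simple way to picture how PSM-level evidence rolls up to a protein is sketched below. It assumes the PSMs are independent, which is a simplification not stated in the slides, and the acceptance thresholds (2 distinct peptides, or a single hit at 0.99) are hypothetical placeholders rather than recommended values.

```python
def protein_probability(psm_probs):
    """P(at least one PSM is correct), assuming independent PSMs."""
    p_all_wrong = 1.0
    for p in psm_probs:
        p_all_wrong *= 1.0 - p
    return 1.0 - p_all_wrong


def accept_protein(peptide_probs, min_peptides=2, single_hit_prob=0.99):
    """Accept a protein with >= min_peptides distinct peptides,
    or with a single peptide of very high posterior probability.

    peptide_probs: dict mapping peptide sequence -> best PSM posterior.
    """
    if len(peptide_probs) >= min_peptides:
        return True
    return any(p >= single_hit_prob for p in peptide_probs.values())


if __name__ == "__main__":
    print(protein_probability([0.9, 0.8]))
    print(accept_protein({"PEPTIDEK": 0.6}))
    print(accept_protein({"PEPTIDEK": 0.995}))
```

The same independence assumption also shows why errors amplify: a long protein with many candidate peptides has more chances to collect a false PSM.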
Some practical guidelines for analyzing proteomics results
1. Know that abundant proteins are much easier to identify
2. # of peptides per protein is an important consideration
- proteins ID’d with >1 peptide are more reliable
- proteins ID’d with 1 peptide observed repeatedly are more reliable
- note that longer proteins are more likely to have false PSMs
3. Think carefully about the p-value/FDR and know how it was calculated
4. Know that proteomics is nowhere near saturating … many proteins will be missed
Quantitative proteomics
Either absolute measurements or relative comparisons
1. Spectral counting
2. Isotope labeling (SILAC)
3. Isobaric tagging (iTRAQ & TMT)
4. Selected reaction monitoring (SRM)
Spectral counting
Counting the number of peptides, and the number of spectra, observed for each protein
Challenges:
- different peptides are more (or less) likely to be assayed
- analysis of complex mixtures often not saturating – may miss some peptides in some runs
(newer high-mass-accuracy machines alleviate these challenges)
- quantitation comes in comparing separate mass-spec runs … therefore
normalization is critical and can be confounded by error
- requires careful statistics to account for differences in: quality of run, likelihood of observing each peptide, likelihood of observing each protein (e.g. based on length, solubility, etc.)
Advantages / Challenges
+ label-free quantitation; cells can be grown in any medium
- requires careful statistics to quantify
- subject to run-to-run variation / error
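As one example of the kind of normalization spectral counting needs, the sketch below computes a normalized spectral abundance factor (NSAF): each protein's count is divided by its length, then by the sum over all proteins in the run. NSAF is a commonly used approach but is not named in the slides, so treat it as an illustrative choice. The function name is hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one run.

    spectral_counts: dict protein -> number of matched spectra
    lengths: dict protein -> protein length in residues
    """
    # Length correction: longer proteins yield more peptides, hence more spectra.
    saf = {prot: spectral_counts[prot] / lengths[prot] for prot in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        return {prot: 0.0 for prot in saf}
    # Run-level correction: values sum to 1 within each run.
    return {prot: value / total for prot, value in saf.items()}


if __name__ == "__main__":
    print(nsaf({"A": 10, "B": 10}, {"A": 100, "B": 300}))
```

Because the values sum to 1 within a run, they can be compared across runs of different overall depth, although run-to-run variation still needs statistical treatment.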
SILAC
(Stable Isotope Labeling with Amino acids in Cell culture)
Cells are grown separately in media with heavy (13C) or light (12C) amino acids (often K or R); lysates are mixed, then analyzed in the same mass-spec run
The mass shift from the extra neutrons (one per 13C atom) allows deconvolution, and quantification, of heavy and light peaks in the same run.
Advantages / Challenges:
+ not affected by run-to-run variation
- need special media to incorporate heavy aa’s
- can only compare (and quantify) 2 samples directly
- incomplete label incorporation can confound MS/MS identification
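The heavy/light pairing can be illustrated with a little arithmetic. The sketch below assumes a fully 13C-labeled K or R (6 carbons each, so 6 extra neutrons per labeled residue) and uses the standard 13C minus 12C mass difference. The function names and the example intensities are hypothetical.

```python
import math

C13_MINUS_C12 = 1.003355  # mass difference between 13C and 12C, in Da


def heavy_mz(light_mz, charge, labeled_carbons=6, labeled_residues=1):
    """Expected m/z of the heavy partner of a light peptide peak."""
    shift = labeled_carbons * labeled_residues * C13_MINUS_C12
    return light_mz + shift / charge


def log2_ratio(heavy_intensity, light_intensity):
    """log2(heavy / light) abundance ratio for one peptide pair."""
    return math.log2(heavy_intensity / light_intensity)


if __name__ == "__main__":
    # A doubly charged tryptic peptide ending in one labeled K or R.
    print(round(heavy_mz(500.0, charge=2), 4))
    print(log2_ratio(4000.0, 1000.0))
```

Because the shift is divided by the charge, the spacing between the light and heavy peaks also helps confirm the charge state.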
Isobaric Tagging
iTRAQ or Tandem Mass Tags (TMTs)
Each peptide mix is covalently tagged with one of 4, 6, or 8 chemical tags of identical mass
Samples are then pooled and analyzed in the same MS run
Collision before MS2 breaks the tags; the tag fragments can be distinguished in the small-mass range and quantified to give relative abundance across up to 8 samples.
[Figure label: LTQ Velos Orbitrap]
Advantages / Challenges:
+ can analyze up to 8 samples in the same run
- still need to deal with normalization
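The normalization step can be sketched as follows. This minimal version assumes equal total peptide loading in every channel, so each channel is rescaled to the same summed reporter intensity before per-peptide relative abundances are taken. That assumption and the function names are illustrative, not taken from the slides.

```python
def normalize_channels(reporter):
    """Rescale each channel (column) to the same total reporter intensity.

    reporter: list of rows; each row holds one peptide's reporter-ion
    intensities across the tagged samples (channels).
    """
    n_channels = len(reporter[0])
    totals = [sum(row[c] for row in reporter) for c in range(n_channels)]
    mean_total = sum(totals) / n_channels
    factors = [mean_total / t for t in totals]
    return [[v * f for v, f in zip(row, factors)] for row in reporter]


def relative_abundance(row):
    """Fraction of one peptide's reporter signal found in each channel."""
    total = sum(row)
    return [v / total for v in row]


if __name__ == "__main__":
    raw = [[100.0, 200.0], [300.0, 600.0]]  # channel 2 loaded at twice the amount
    for row in normalize_channels(raw):
        print(relative_abundance(row))
```

After rescaling, a peptide that is unchanged between samples shows equal fractions in every channel.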
Selected Reaction Monitoring (SRM)
Targeted proteomics to quantify specific peptides with great accuracy
- Specialized instrument capable of very sensitively measuring the transition from a precursor peptide to one peptide fragment
- Typically dope in heavy-labeled synthetic peptides of precisely known abundance to quantify
Advantages:
- best precision measurements
Disadvantages:
- need to identify ‘proteotypic’ peptides for doping controls
- expensive to make many heavy peptides of precise abundance
- limited number of proteins that can be analyzed
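Quantification against a doped-in heavy standard reduces to a ratio. The sketch below assumes several transitions are measured per peptide and summarizes them with the median light/heavy peak-area ratio. The choice of the median, the femtomole unit, and the function name are assumptions made for illustration.

```python
import statistics


def quantify_with_heavy_standard(transitions, heavy_amount_fmol):
    """Estimate the endogenous (light) peptide amount in fmol.

    transitions: list of (light_area, heavy_area) peak areas, one per transition
    heavy_amount_fmol: known amount of heavy synthetic peptide spiked in
    """
    ratios = [light / heavy for light, heavy in transitions if heavy > 0]
    if not ratios:
        raise ValueError("no usable heavy-standard signal")
    # The median damps the effect of a single interfered transition.
    return statistics.median(ratios) * heavy_amount_fmol


if __name__ == "__main__":
    areas = [(200.0, 100.0), (410.0, 200.0), (90.0, 50.0)]
    print(quantify_with_heavy_standard(areas, heavy_amount_fmol=50.0))
```

The precision of the result depends directly on how precisely the heavy peptide amount is known, which is why making many such standards is expensive.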
Phospho-proteomics and Post-translational modifications (PTMs)
phosphorylated (P’d) peptides are enriched, typically through chromatography
- P’d peptides do not ionize as well as unP’d peptides
- enrichment of P’d peptides ensures ionization and aids in mapping
IMAC: immobilized metal ion affinity chromatography
- negatively charged phospho groups bind positively charged metal ions
- contamination by negatively-charged peptides
Titanium dioxide (TiO2) column:
- binds phospho groups (mono-P’d better than multi-P’d)
SIMAC: Sequential Elution from IMAC:
- IMAC followed by TiO2 column
Goal: identify which residues are phosphorylated (Ser, Thr, Tyr),
mapped based on known m/z of phospho group
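The mapping step relies on the fixed mass that each phospho group adds (HPO3, about 79.966 Da). The sketch below lists candidate Ser/Thr/Tyr positions and infers how many phospho groups a peptide carries from the difference between its observed and unmodified masses. The tolerance and the function names are hypothetical.

```python
PHOSPHO_MASS = 79.96633  # monoisotopic mass added by one phospho group (HPO3), in Da


def candidate_sites(peptide):
    """1-based positions of residues that can be phosphorylated (S, T, Y)."""
    return [i + 1 for i, aa in enumerate(peptide) if aa in "STY"]


def phospho_count(observed_mass, unmodified_mass, tol=0.01):
    """Number of phospho groups implied by the mass difference, or None
    if the difference is not a whole multiple of the phospho mass."""
    delta = observed_mass - unmodified_mass
    n = round(delta / PHOSPHO_MASS)
    if n >= 0 and abs(delta - n * PHOSPHO_MASS) <= tol:
        return n
    return None


if __name__ == "__main__":
    print(candidate_sites("ASPTKY"))
    print(phospho_count(1000.0 + 2 * PHOSPHO_MASS, 1000.0))
```

Pinpointing which candidate site carries the group then depends on the MS/MS fragments: fragment ions containing the modified residue are shifted by the same mass.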