Proteomics_12-6

Goals in Proteomics
1. Identify and quantify proteins in complex mixtures/complexes
   MS and MS/MS
2. Identify global protein-protein interactions
   MS and MS/MS, yeast two-hybrid (Y2H)
3. Define protein localizations within cells
   High-throughput microscopy, organelle pull-downs
4. Measure and characterize post-translational modifications
   MS techniques
5. Measure and characterize activity (e.g., substrate specificity)
   Protein arrays
Basic overview of tandem mass spectrometry (MS/MS)
Coon et al. 2005
Intro to Mass Spec (MS)
Separate and identify peptide fragments by their mass-to-charge (m/z) ratio
[Figure: mass spectrometer schematic: ion source → mass analyzer → detector → MS spectrum]
Basic principles:
1. Ionize (i.e., charge) peptide fragments
2. Separate ions by mass-to-charge (m/z) ratio
3. Detect ions of different m/z ratios
4. Compare to a database of predicted peptide m/z values for each genome
1. Ionization
Goal: ionize (i.e., charge) peptide fragments without destroying the molecule
Positive ionization (protonation of amine groups)
  especially useful for trypsinized proteins (cleaved after R and K, so most peptides end in a basic residue that readily takes up a proton)
vs.
Negative ionization (deprotonation of carboxylic acids and alcohols)
http://www.colorado.edu/chemistry/chem5181/MS_ESI_Gilman_Mashburn.pdf
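As a minimal Python sketch of the m/z idea (assuming unmodified peptides and standard monoisotopic residue masses; the function names are hypothetical), the neutral mass M of a peptide and the m/z of its protonated [M+zH]z+ (positive mode) or deprotonated [M-zH]z- (negative mode) ions can be computed as:

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS = dict(zip(
    "GASPVTCLINDQKEMHFRYW",
    [57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768, 103.00919,
     113.08406, 113.08406, 114.04293, 115.02694, 128.05858, 128.09496,
     129.04259, 131.04049, 137.05891, 147.06841, 156.10111, 163.06333,
     186.07931],
))
WATER = 18.010565   # H2O, added once for the free N- and C-termini
PROTON = 1.007276   # mass of H+ (Da)


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass M of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(seq: str, z: int) -> float:
    """m/z of [M + zH]^z+ for z > 0, or of [M - |z|H]^|z|- for z < 0."""
    if z == 0:
        raise ValueError("a neutral molecule has no m/z")
    # Adding (z > 0) or removing (z < 0) |z| protons, then dividing by |z|.
    return (peptide_mass(seq) + z * PROTON) / abs(z)
```

Multiply protonated ions (z = 2, 3, …) are common in electrospray, which is why the same peptide can appear at several m/z values.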
Liquid chromatography + electrospray ionization (ESI)
[Figure: ESI source, with the applied electric field labeled]
* Commonly used with liquid solutions; more sensitive to contaminants; used for complex mixtures
MALDI (matrix-assisted laser desorption/ionization)
* Less sensitive to contaminants; more common for less complex mixtures
http://www.astbury.leeds.ac.uk/facil/MStut/mstutorial.htm
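2. Separation of ions based on m/z ratio (mass m versus charge z)
Multiple flavors of mass analyzers use different technologies
* TOF (‘time of flight’): separates based on velocity (at equal energy per charge, lower-m/z ions fly faster)
* Triple quadrupole: separation based on oscillating (RF/DC) electric fields; only ions of a chosen m/z have stable trajectories through each quadrupole mass filter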
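The TOF principle can be made concrete under an idealized model of a linear analyzer: an ion starting at rest is accelerated through a potential U, so zeU = ½mv², and then drifts field-free over a length L, giving t = L·sqrt(m / (2zeU)). Flight time therefore scales with sqrt(m/z). The sketch below assumes U = 20 kV and L = 1 m purely for illustration; real instruments add reflectrons and calibration.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge e (C), exact SI value
DALTON = 1.66053906660e-27   # 1 Da (unified atomic mass unit) in kg


def flight_time(mass_da: float, z: int, accel_voltage: float = 20_000.0,
                drift_length: float = 1.0) -> float:
    """Idealized flight time (s) of an ion in a linear TOF analyzer.

    Assumes the ion starts at rest and gains kinetic energy z*e*U, so
    z*e*U = 0.5*m*v**2 and t = L / v = L * sqrt(m / (2*z*e*U)).
    The default U (volts) and L (metres) are illustrative assumptions.
    """
    m = mass_da * DALTON
    velocity = math.sqrt(2 * z * E_CHARGE * accel_voltage / m)
    return drift_length / velocity
```

Quadrupling m at fixed z doubles t, and a 2000 Da ion with z = 2 arrives at the same time as a 1000 Da ion with z = 1: the analyzer only sees m/z.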
Multiple flavors of mass analyzers … can be hooked together in multiple configurations.
[Figure: example analyzer configurations; panel (g): Orbitrap]
Multiple flavors of mass analyzers
Single MS (peptide mass fingerprinting):
  Identifies the m/z of each intact peptide only
  Peptides are identified by comparison to a database of predicted m/z values of trypsinized proteins
Tandem MS/MS (peptide sequencing):
  Selects each peptide ion from the first MS
  Fragments it along peptide bonds in a collision cell
  Identifies each fragment based on m/z
Now multiple fragmentation methods:
  CID: collision-induced dissociation
  ETD: electron-transfer dissociation
  HCD: higher-energy collisional dissociation
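Peptide mass fingerprinting can be sketched in Python as an in silico tryptic digest followed by counting how many observed [M+H]+ masses fall within a ppm tolerance of a predicted peptide. The cleavage rule (after K/R, not before P), the 10 ppm tolerance, and the function names are illustrative assumptions; real search tools also score matches statistically and handle modifications.

```python
# Monoisotopic residue masses (Da), unmodified.
RESIDUE_MASS = dict(zip(
    "GASPVTCLINDQKEMHFRYW",
    [57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768, 103.00919,
     113.08406, 113.08406, 114.04293, 115.02694, 128.05858, 128.09496,
     129.04259, 131.04049, 137.05891, 147.06841, 156.10111, 163.06333,
     186.07931],
))
WATER = 18.010565
PROTON = 1.007276


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_peptides(protein, missed_cleavages=0):
    """In silico trypsin digest: cut after K or R unless the next residue is P."""
    cuts = [i + 1 for i, aa in enumerate(protein[:-1])
            if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + cuts + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        # extra > 0 joins neighbouring peptides (a skipped cleavage site)
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(bounds):
                peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


def pmf_hits(observed_mh, protein, tol_ppm=10.0, missed_cleavages=1):
    """Count observed [M+H]+ values within tol_ppm of a predicted tryptic peptide."""
    predicted = [peptide_mass(p) + PROTON
                 for p in tryptic_peptides(protein, missed_cleavages)]
    return sum(
        1 for obs in observed_mh
        if any(abs(obs - pred) / pred * 1e6 <= tol_ppm for pred in predicted)
    )
```

Ranking every database protein by `pmf_hits` is the core of fingerprinting; in a complex mixture many proteins can match by chance, which is one reason fingerprinting works best on simple samples.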
Fragmentation happens in a fairly defined way along the peptide backbone
A peptide can fragment at 3 possible backbone bonds …
the charge stays on either the ‘left’ (N-terminal: a, b, or c) or the ‘right’ (C-terminal: x, y, or z) side of the cleavage
Cleavage along the CO-NH (amide) bond is most common (e.g., with CID), generating ‘b’ and ‘y’ ions
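The b/y nomenclature translates directly into arithmetic. For singly charged ions, b_i is the sum of the first i residue masses plus a proton, and y_i is the sum of the last i residue masses plus water and a proton. A minimal sketch, assuming unmodified residues (the function name is hypothetical):

```python
# Monoisotopic residue masses (Da), unmodified.
RESIDUE_MASS = dict(zip(
    "GASPVTCLINDQKEMHFRYW",
    [57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768, 103.00919,
     113.08406, 113.08406, 114.04293, 115.02694, 128.05858, 128.09496,
     129.04259, 131.04049, 137.05891, 147.06841, 156.10111, 163.06333,
     186.07931],
))
WATER = 18.010565
PROTON = 1.007276


def b_y_ions(seq):
    """Singly charged b and y ion m/z values for i = 1 .. len(seq) - 1.

    b_i = (sum of first i residues) + H+
    y_i = (sum of last i residues) + H2O + H+
    """
    residues = [RESIDUE_MASS[aa] for aa in seq]
    b_ions, y_ions = [], []
    for i in range(1, len(seq)):
        b_ions.append(sum(residues[:i]) + PROTON)
        y_ions.append(sum(residues[-i:]) + WATER + PROTON)
    return b_ions, y_ions
```

Each complementary pair b_i and y_(n-i) adds up to the precursor mass M plus two protons, a handy sanity check when annotating spectra.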
MS spectrum (i.e., peptide ions)
Each peak is often surrounded by smaller peaks of similar m/z (isotope peaks, mostly from 13C)
The instrument's resolution determines whether these neighboring peaks are separated
Each peak (isotope cluster) is a different peptide, separated based on m/z
A single peptide is selected by the instrument for the second MS
Mann, Nat. Rev. Mol. Cell Biol. 5:699–711
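Neighboring isotope peaks differ by about one 13C-12C mass difference (1.00336 Da), which appears as a spacing of 1.00336/z on the m/z axis. That spacing is one way to estimate the charge state z, and hence the neutral mass. A minimal sketch, assuming clean, already-picked peaks from a single isotope cluster (function names are hypothetical):

```python
C13_DELTA = 1.003355   # mass difference between 13C and 12C (Da)
PROTON = 1.007276      # mass of H+ (Da)


def charge_from_isotopes(peak_mzs, max_z=6):
    """Pick the z whose expected isotope spacing (C13_DELTA / z) best fits the data.

    peak_mzs: ascending m/z values of adjacent peaks from ONE isotope cluster.
    """
    gaps = [b - a for a, b in zip(peak_mzs, peak_mzs[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return min(range(1, max_z + 1), key=lambda z: abs(C13_DELTA / z - mean_gap))


def neutral_mass(mono_mz, z):
    """Neutral monoisotopic mass from the monoisotopic peak of an [M + zH]^z+ ion."""
    return z * (mono_mz - PROTON)
```

This only works when the instrument resolves the isotope peaks; at low resolution they merge and z has to be guessed.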
The second MS identifies y (or b) ions to read out the amino acid sequence
Mann, Nat. Rev. Mol. Cell Biol. 5:699–711
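Reading a sequence from an ion series amounts to looking up the mass gap between consecutive fragment ions. A minimal Python sketch, assuming a complete, singly charged ladder of one ion type and unmodified residues (the function name and the 0.02 tolerance are illustrative):

```python
# Monoisotopic residue masses (Da), unmodified.
RESIDUE_MASS = dict(zip(
    "GASPVTCLINDQKEMHFRYW",
    [57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768, 103.00919,
     113.08406, 113.08406, 114.04293, 115.02694, 128.05858, 128.09496,
     129.04259, 131.04049, 137.05891, 147.06841, 156.10111, 163.06333,
     186.07931],
))


def read_ladder(ion_mzs, ion_type="y", tol=0.02):
    """Infer residues from m/z gaps between consecutive singly charged fragment ions.

    ion_mzs: ascending m/z values of one ion series with no missing peaks.
    y ions grow from the C-terminus, so their residues are reversed into N->C order.
    """
    tokens = []
    for lo, hi in zip(ion_mzs, ion_mzs[1:]):
        gap = hi - lo
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)
        if not hits:
            tokens.append("?")                          # gap matches no single residue
        elif len(hits) == 1:
            tokens.append(hits[0])
        else:
            tokens.append("[" + "".join(hits) + "]")    # e.g. I and L are isobaric
    if ion_type == "y":
        tokens.reverse()
    return "".join(tokens)
```

Only interior residues come out of the gaps; the terminal residues have to be inferred from the smallest ion and the precursor mass. A missing peak shows up as "?" or, as with GG versus N (equal masses), can mimic a single residue.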
Trypsin is often used to digest proteins (cleaves after Arg and Lys)
WHY?
Because of challenges in distinguishing spectra, simplified mixtures are typically injected into the MS:
- excised proteins
- purified complexes
- fractionated pools of complex mixtures
2-dimensional gel separation
(largely outdated)
The first dimension (separation by isoelectric focusing)
- gel with an immobilised pH gradient
- the electric field causes each charged protein to migrate until it reaches its isoelectric point (pI), where the pH gradient makes its net charge 0
The second dimension (separation by mass)
- the pH gel strip is loaded onto an SDS gel
- SDS denatures the protein (to make movement depend solely on mass, not shape) and masks its intrinsic charge by coating it with negative charge roughly in proportion to its mass
Ahna Skop
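Isoelectric focusing separates proteins by pI, which can be estimated from sequence: sum the Henderson-Hasselbalch fractional charges of the ionizable groups and find the pH where the net charge crosses zero. Because net charge falls monotonically with pH, bisection works. The pKa values below are approximate textbook values chosen for illustration; published tools use different pKa scales and give somewhat different pIs. Function names are hypothetical.

```python
# Approximate pKa values (assumption; other scales give somewhat different pIs).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(seq, pH):
    """Net charge of a peptide/protein at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (pH - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - pH)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid   # still net positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```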
[Figure: 2D SDS-PAGE gel]
TAP-tag: Tandem Affinity Purification
(for purifying individual proteins and the proteins bound to them)
Ion exchange chromatography
Anion exchange:
Column is positively charged (binds negatively charged proteins).
Cation exchange:
Column is negatively charged (binds positively charged proteins).
Exploit the isoelectric point of a protein to separate it from other macromolecules.
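The link between pI and resin choice can be written as a small decision rule: above its pI a protein is net negative and binds an anion exchanger; below its pI it is net positive and binds a cation exchanger. A minimal sketch; the function name is hypothetical and the 1 pH-unit `margin` is a rule-of-thumb assumption, not a fixed requirement.

```python
def choose_ion_exchanger(pI, buffer_pH, margin=1.0):
    """Suggest which resin should bind a protein of isoelectric point pI at buffer_pH.

    `margin` (pH units) is an assumed safety distance from the pI: close to
    the pI the net charge is small and binding may be weak.
    """
    if buffer_pH >= pI + margin:
        return "anion exchange"     # protein net negative -> positively charged resin
    if buffer_pH <= pI - margin:
        return "cation exchange"    # protein net positive -> negatively charged resin
    return "too close to pI: binding may be weak"
```

At a given buffer pH, two proteins whose pIs lie on opposite sides of it carry opposite net charges, so only one of them binds a given resin.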
Size exclusion chromatography
Porous beads with a controlled range of pore sizes.
Small proteins diffuse in and out of the pores and are retained longest in the resin.
Larger proteins can enter only the larger pores and are retained less.
Very large proteins cannot enter any pores (exclusion limit) and elute first.
Can be used as a preparative method or to determine the molecular weight of a protein in solution.
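Molecular weight is usually estimated from a calibration with standards of known MW: the partition coefficient Kav = (Ve - V0)/(Vt - V0) (elution, void, and total column volumes) is roughly linear in log10(MW) over the column's fractionation range. A minimal sketch with hypothetical function names; it assumes the unknown protein has a shape similar to the (typically globular) standards, since the column really separates by hydrodynamic size.

```python
import math


def kav(ve, v0, vt):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_calibration(standards, v0, vt):
    """Least-squares line log10(MW) = slope * Kav + intercept from (MW, Ve) standards."""
    xs = [kav(ve, v0, vt) for _, ve in standards]
    ys = [math.log10(mw) for mw, _ in standards]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx            # negative: bigger proteins elute earlier
    return slope, mean_y - slope * mean_x


def estimate_mw(ve, v0, vt, slope, intercept):
    """Molecular weight predicted from an unknown protein's elution volume."""
    return 10 ** (slope * kav(ve, v0, vt) + intercept)
```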
Affinity chromatography
A ligand with high affinity for the protein is attached to a matrix.
The protein of interest binds to the ligand and is retained by the resin. Everything else flows through.
An excess of soluble ligand can be used to elute the protein.
How does each spectrum translate to amino acid sequence?
Mann, Nat. Rev. Mol. Cell Biol. 5:699–711
Theoretical spectra:
- in silico digestion of a known protein database
- a limited set of theoretical spectra, based on the enzyme, instrument sensitivity, and other parameters
- this reduces the search space
- but can miss some peptides
- comparisons are based on several different scores (e.g., correlation between observed and theoretical profiles)
Mann, Nat. Rev. Mol. Cell Biol. 5:699–711
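As a simplified stand-in for such scores, a normalized dot product (cosine similarity) of binned spectra measures how well an observed spectrum lines up with a theoretical one (e.g., b/y ion m/z values with unit intensity). It is a close relative of the correlation-type scores used by search engines, not the exact score of any particular tool; the 1 m/z bin width and the function names are assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins: {bin index: intensity}."""
    binned = {}
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        binned[idx] = binned.get(idx, 0.0) + intensity
    return binned


def cosine_score(observed, theoretical, bin_width=1.0):
    """Normalized dot product: 1 = identical peak pattern, 0 = no shared bins."""
    a = bin_spectrum(observed, bin_width)
    b = bin_spectrum(theoretical, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Because the score is normalized, scaling all intensities does not change it; only the pattern of shared peaks matters.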
How does each spectrum translate to amino acid sequence?
1. De novo sequencing: very difficult and not widely used for large-scale datasets (but being developed)
2. Matching observed spectra to a database of theoretical spectra
3. Matching observed spectra to a spectral database (library) of previously observed spectra
Nesvizhskii (2010) J. Proteomics, 73:2092–2123.
- Spectral matching is supposedly more accurate, but …
  limited to peptides whose spectra have been observed before
With either approach, observed spectra are processed to:
  group redundant spectra, remove bad spectra, recognize co-fragmentation, improve z estimates
Many good spectra will not match a known sequence due to:
  absence of the target in the DB, a PTM that modifies the spectrum, a constrained DB search, or an incorrect m or z estimate.
Result: peptide-spectrum match (PSM)
A major problem in proteomics is incorrect PSM calls
… therefore statistical measures are critical
Methods of estimating significance of PSMs:
p- (or E-) value: compare the score S of the best PSM against the distribution of
all scores S for all spectra against all theoretical peptides
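A minimal Python sketch of the p-/E-value idea, together with the Benjamini-Hochberg (B&H) adjustment commonly applied to such p-values. The +1 pseudocount, the convention E = p × (number of candidates compared), and all function names are illustrative assumptions rather than any specific tool's definitions.

```python
def empirical_p(score, null_scores):
    """Fraction of null scores >= score, with a +1 pseudocount so p is never 0."""
    exceed = sum(1 for s in null_scores if s >= score)
    return (exceed + 1) / (len(null_scores) + 1)


def e_value(p, n_candidates):
    """Expected number of candidates scoring at least this well by chance."""
    return p * n_candidates


def benjamini_hochberg(pvals):
    """B&H-adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0                      # also caps adjusted values at 1
    for rank in range(m, 0, -1):           # from the largest p down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Accepting all PSMs with adjusted p ≤ 0.01 then controls the expected FDR at about 1%, under the procedure's independence assumptions.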
FDR correction methods:
1. B&H (Benjamini-Hochberg) FDR
2. Estimate the null distribution of RANDOM PSMs:
   - match all spectra to the real (‘target’) DB and to a fake (‘decoy’) DB
   - often the decoy DB contains the same peptides as the library, but with reversed sequences
   - one measure of FDR: 2 × (# decoy hits) / (# decoy hits + # target hits)
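The target-decoy estimate above can be turned into a q-value per PSM (the lowest FDR at which that PSM would still be accepted). A minimal sketch, assuming one best-scoring PSM per spectrum from a concatenated target+decoy search, higher scores being better, and ignoring score ties; the function name is hypothetical.

```python
def target_decoy_qvalues(psms):
    """psms: list of (score, is_decoy). Returns q-values aligned to the input order.

    FDR at a threshold = 2*D / (D + T), where D and T count decoy and target
    PSMs scoring at or above that threshold.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs = []
    decoys = targets = 0
    for i in order:                       # walk down the ranked list
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, 2 * decoys / (decoys + targets)))
    # q-value: minimum FDR over all thresholds that still include this PSM
    qvals = [0.0] * len(psms)
    running_min = 1.0
    for pos in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[pos])
        qvals[order[pos]] = running_min
    return qvals
```

Keeping target PSMs with q ≤ 0.01 is a typical 1% FDR cutoff; decoy hits are only used for the estimate and are discarded afterwards.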
3. Use #2 above to calculate posterior probabilities for EACH PSM
- mixture model approach: take the distribution of ALL scores S
- this is a mixture of ‘correct’ PSMs and ‘incorrect’ PSMs
- but we don’t know which are correct or incorrect
- scores from the decoy comparison are included, which gives some idea of the distribution of ‘incorrect’ scores
- EM or Bayesian approaches can then estimate the proportion of correct vs. incorrect PSMs … based on each PSM's score, a posterior probability is calculated
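The mixture-model step can be sketched with EM on a two-component Gaussian mixture, with decoy PSMs pinned to the ‘incorrect’ component. This is a deliberately simplified illustration: both score distributions are assumed Gaussian, the higher-mean component is assumed to be ‘correct’, and real tools use more carefully chosen distribution shapes and extra information. Function names are hypothetical.

```python
import math
import statistics


def _normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_posteriors(scores, is_decoy, n_iter=100, min_sd=1e-3):
    """Posterior P(correct | score) for each PSM from a two-Gaussian mixture fit by EM."""
    n = len(scores)
    ranked = sorted(scores)
    # crude start: lower half of scores = incorrect, upper half = correct
    mu0, mu1 = statistics.fmean(ranked[: n // 2]), statistics.fmean(ranked[n // 2:])
    sd0 = sd1 = max(statistics.pstdev(scores), min_sd)
    pi1 = 0.5                                   # prior fraction of correct PSMs

    def e_step():
        resp = []
        for x, decoy in zip(scores, is_decoy):
            if decoy:                           # decoys are known to be incorrect
                resp.append(0.0)
                continue
            a = pi1 * _normal_pdf(x, mu1, sd1)
            b = (1 - pi1) * _normal_pdf(x, mu0, sd0)
            # if both densities underflow, assign to the nearer component
            resp.append(a / (a + b) if a + b > 0 else float(abs(x - mu1) < abs(x - mu0)))
        return resp

    for _ in range(n_iter):
        resp = e_step()
        w1 = sum(resp)
        w0 = n - w1
        if w1 < 1e-9 or w0 < 1e-9:
            break                               # one component collapsed
        pi1 = w1 / n
        mu1 = sum(r * x for r, x in zip(resp, scores)) / w1
        mu0 = sum((1 - r) * x for r, x in zip(resp, scores)) / w0
        sd1 = max(math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, scores)) / w1), min_sd)
        sd0 = max(math.sqrt(sum((1 - r) * (x - mu0) ** 2 for r, x in zip(resp, scores)) / w0), min_sd)
    return e_step()
```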
FDR can be estimated at the level of PSM identification … but is often done
at the level of protein identification
Errors in PSM identification can amplify the FDR in protein identification
Some methods combine PSM FDRs to get a protein FDR
Nesvizhskii (2010) J. Proteomics, 73:2092–2123.
Often focus on proteins identified by at least 2 different PSMs
(or proteins with single PSMs of very high posterior probability)
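A minimal sketch of these protein-level rules: keep proteins supported by at least two distinct peptides, or by a single peptide with a very high posterior, and combine the best posterior per peptide as P(protein) = 1 - Π(1 - p_i). That combination assumes independent PSMs, which tends to overstate confidence; real protein-inference tools also handle peptides shared between proteins. The 0.99 threshold and the function names are assumptions.

```python
def protein_probability(peptide_probs):
    """Naive combination assuming independence: P = 1 - prod(1 - p_i)."""
    prod = 1.0
    for p in peptide_probs:
        prod *= 1.0 - p
    return 1.0 - prod


def confident_proteins(psms, single_hit_min=0.99):
    """psms: list of (protein, peptide, posterior). Returns {protein: probability}.

    Keeps proteins with >= 2 distinct peptides, or a single peptide whose best
    PSM posterior is at least single_hit_min.
    """
    by_protein = {}
    for prot, pep, prob in psms:
        peps = by_protein.setdefault(prot, {})
        peps[pep] = max(prob, peps.get(pep, 0.0))   # best PSM per peptide
    kept = {}
    for prot, peps in by_protein.items():
        if len(peps) >= 2 or max(peps.values()) >= single_hit_min:
            kept[prot] = protein_probability(peps.values())
    return kept
```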
Some practical guidelines for analyzing proteomics results
1. Know that abundant proteins are much easier to identify
2. The # of peptides per protein is an important consideration
   - proteins ID’d with >1 peptide are more reliable
   - proteins ID’d with 1 peptide observed repeatedly are more reliable
   - note that longer proteins are more likely to have false PSMs
3. Think carefully about the p-value/FDR and know how it was calculated
4. Know that proteomics is nowhere near saturating
   … many proteins will be missed