Goals in Proteomics
1. Identify and quantify proteins in complex mixtures/complexes: MS and MS/MS
2. Identify global protein-protein interactions: MS and MS/MS, Y2H
3. Define protein localizations within cells: high-throughput microscopy, organelle pull-down
4. Measure and characterize post-translational modifications: MS techniques
5. Measure and characterize activity (e.g. substrate specificity): protein arrays

Basic overview of tandem mass spectrometry (MS/MS)
Coon et al. 2005

Intro to Mass Spec (MS)
Separate and identify peptide fragments by their mass and charge (m/z ratio)
Mass spec: ion source, mass analyzer, detector, MS spectrum
Basic principles:
1. Ionize (i.e. charge) peptide fragments
2. Separate ions by mass/charge (m/z) ratio
3. Detect ions of different m/z ratio
4. Compare to database of predicted m/z fragments for each genome

1. Ionization
Goal: ionize (i.e. charge) peptide fragments without destroying the molecule
Positive ionization (protonates amine groups), especially useful for trypsinized proteins (cleaved after R and K), vs.
Negative ionization (deprotonates carboxylic acids and alcohols)
http://www.colorado.edu/chemistry/chem5181/MS_ESI_Gilman_Mashburn.pdf

Liquid chromatography + Electrospray ionization (electric field)
* Commonly used with liquid solutions, more sensitive to contaminants, used for complex mixtures

MALDI
* Less sensitive to contaminants, more common for less complex mixtures
http://www.astbury.leeds.ac.uk/facil/MStut/mstutorial.htm

2. Separation of ions based on m/z ratio (mass m versus charge z)
Multiple flavors of mass analyzers use different technology
* TOF (‘time of flight’): separates based on velocity
* Triple quadrupole: separates ions using oscillating electric fields

Multiple flavors of mass analyzers
Single MS (peptide fingerprinting):
- Identifies m/z of peptide only
- Peptide ID’d by comparison to a database of predicted m/z of trypsinized proteins
Tandem MS/MS (peptide sequencing):
- Pulls each peptide from the first MS
- Breaks up peptide bonds (collision cell)
- Identifies each fragment based on m/z

Multiple flavors of mass analyzers can be hooked together in multiple configurations, e.g.
Orbitrap

Now multiple types of collision cells:
CID: collision-induced dissociation
ETD: electron transfer dissociation
HCD: higher-energy collisional dissociation

Fragmentation happens in a fairly defined way along the peptide backbone
Peptide can fragment along 3 possible bonds; the charge stays on either the ‘left’ (a, b, or c) or ‘right’ (x, y, or z) side of the cleavage
Cleavage along the CO-NH bond is most common, generating ‘b’ and ‘y’ ions

MS spectrum (i.e. peptide ions)
Each peak is a different peptide, separated based on m/z
Each peak is often surrounded by smaller peaks of similar m/z
Sensitivity of instrument determines resolution
A single peptide is selected by the instrument for the second MS
Mann Nat Reviews MBC. 5:699-711

Second MS identifies y (or b) ions to read out amino-acid sequence

Trypsin often used to digest proteins (cleaves after Arg and Lys). WHY?
Because of challenges distinguishing spectra, simplified mixtures are typically injected into the MS, either:
- excised proteins
- purified complexes
- fractionated pools of complex mixtures

2 dimensional gel separation (largely outdated)
The first dimension (separation by isoelectric focusing)
- gel with an immobilised pH gradient
- electric current causes charged proteins to move until they reach their isoelectric point (the pH at which the net charge is 0)
The second dimension (separation by mass)
- pH gel strip is loaded onto an SDS gel
- SDS denatures the protein (to make movement solely dependent on mass, not shape) and masks its native charge with a uniform negative charge.
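The trypsin digestion and b/y fragmentation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (trypsin_digest, peptide_mz, b_y_ions) are hypothetical, the cleavage rule is the simplified "after every K or R" rule from the slides (no missed cleavages or other exceptions), fragment ions are assumed singly charged, and the masses are standard monoisotopic residue masses rounded to 5 decimals.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residues of an intact peptide
PROTON = 1.00728   # mass of each charge-carrying proton


def trypsin_digest(protein):
    """Split a protein after every K or R (simplified rule, an assumption)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide that does not end in K/R
        peptides.append(protein[start:])
    return peptides


def peptide_mz(peptide, charge=1):
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def b_y_ions(peptide):
    """Singly charged b and y ion m/z values for each CO-NH cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)            # 'left' piece
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)   # 'right' piece
    return b_ions, y_ions


if __name__ == "__main__":
    # Toy sequence, chosen only for illustration.
    for pep in trypsin_digest("AAKGGRPEPTIDE"):
        b, y = b_y_ions(pep)
        print(pep, round(peptide_mz(pep, 2), 4), [round(v, 3) for v in y])
```

Differences between consecutive y (or b) ion values equal residue masses, which is how the second MS reads out sequence. Note that I and L have identical masses and cannot be told apart this way.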
2D-SDS PAGE gel (Ahna Skop)

TAP-tag: Tandem Affinity Purification (for IP’ing individual proteins and proteins bound to them)

Ion exchange chromatography
Anion exchange: column is positively charged (can bind negatively charged proteins).
Cation exchange: column is negatively charged (can bind positively charged proteins).
Exploit the isoelectric point of a protein to separate it from other macromolecules.

Size exclusion chromatography
Porous beads with pores of different but controlled sizes.
Smaller proteins go in and out of beads and will be retained in the resin.
Large proteins will only go into large pores and will be retained less.
Very large proteins will not go into any of the beads (exclusion limit).
Can be used as a preparative method or to determine the molecular weight of a protein in solution.

Affinity chromatography
A ligand with high affinity to the protein is attached to a matrix.
Protein of interest binds to the ligand and is retained by the resin.
Everything else flows through.
Can use excess of the soluble ligand to elute the protein.

How does each spectrum translate to amino acid sequence?
Mann Nat Reviews MBC. 5:699-711
1. De novo sequencing: very difficult and not widely used (but being developed) for large-scale datasets
2. Matching observed spectra to a database of theoretical spectra

Theoretical spectra:
- in silico digestion of a known protein database
- limited set of theoretical spectra based on enzyme, instrument sensitivity, others
- this reduces search space
- can miss some peptides
- comparisons based on several different scores (e.g. correlation between observed and theoretical profiles)

How does each spectrum translate to amino acid sequence?
1. De novo sequencing: very difficult and not widely used (but being developed) for large-scale datasets
2.
Matching observed spectra to a database of theoretical spectra
3. Matching observed spectra to a spectral database of previously seen spectra
Nesvizhskii (2010) J. Proteomics, 73:2092-2123.
- spectral matching is supposedly more accurate, but is limited to the peptides whose spectra have been observed before

With either approach, observed spectra are processed to: group redundant spectra, remove bad spectra, recognize co-fragmentation, improve z estimates

Many good spectra will not match a known sequence due to: absence of a target in the DB, a PTM that modifies the spectrum, a constrained DB search, an incorrect m or z estimate.

Result: peptide-to-spectrum match (PSM)
A major problem in proteomics is bad PSM calls; therefore statistical measures are critical

Methods of estimating significance of PSMs:
p- (or E-) value: compare score S of the best PSM against the distribution of all S for all spectra to all theoretical peptides
FDR correction methods:
1. B&H (Benjamini-Hochberg) FDR
2. Estimate the null distribution of RANDOM PSMs:
- match all spectra to the real (‘target’) DB and to a fake (‘decoy’) DB
- often the decoy DB is the same peptides in the library but with reversed sequence
- one measure of FDR: 2*(# decoy hits) / (# decoy hits + # target hits)
3. Use #2 above to calculate posterior probabilities for EACH PSM
- mixture model approach: take the distribution of ALL scores S
- this is a mixture of ‘correct’ PSMs and ‘incorrect’ PSMs
- but we don’t know which are correct or incorrect
- scores from the decoy comparison are included, which can provide some idea of the distribution of ‘incorrect’ scores
- EM or Bayesian approaches can then estimate the proportion of correct vs.
incorrect PSMs; based on each PSM score, a posterior probability is calculated

FDR can be done at the level of PSM identification, but is often done at the level of protein identification

Error in PSM identification can amplify FDR in protein identification
Some methods combine PSM FDR to get a protein FDR
Nesvizhskii (2010) J. Proteomics, 73:2092-2123.
Often focus on proteins identified by at least 2 different PSMs (or proteins with single PSMs of very high posterior probability)

Some practical guidelines for analyzing proteomics results
1. Know that abundant proteins are much easier to identify
2. # of peptides per protein is an important consideration
- proteins ID’d with >1 peptide are more reliable
- proteins ID’d with 1 peptide observed repeatedly are more reliable
- note that longer proteins are more likely to have false PSMs
3. Think carefully about the p-value/FDR and know how it was calculated
4. Know that proteomics is nowhere near saturating; many proteins will be missed
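As an illustration of guideline 3, the target-decoy FDR measure given earlier, 2*(# decoy hits) / (# decoy hits + # target hits), can be sketched as follows. This is a minimal sketch, not any particular search engine's method: the PSM record, the function names, and the toy scores are all hypothetical, and it assumes that a higher score means a better match and that decoys are made by reversing each peptide sequence.

```python
from typing import List, NamedTuple, Optional


class PSM(NamedTuple):
    """One peptide-to-spectrum match (hypothetical record for illustration)."""
    peptide: str
    score: float      # assumption: higher score = better match
    is_decoy: bool    # True if the best hit came from the decoy DB


def make_decoy(peptide: str) -> str:
    """Build a decoy peptide by reversing the target sequence."""
    return peptide[::-1]


def estimate_fdr(psms: List[PSM], threshold: float) -> float:
    """FDR among PSMs scoring >= threshold: 2*decoys / (decoys + targets)."""
    accepted = [p for p in psms if p.score >= threshold]
    if not accepted:
        return 0.0
    n_decoy = sum(1 for p in accepted if p.is_decoy)
    n_target = len(accepted) - n_decoy
    return min(1.0, 2 * n_decoy / (n_decoy + n_target))


def loosest_threshold(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Lowest observed score whose estimated FDR stays within max_fdr."""
    best = None
    for score in sorted({p.score for p in psms}, reverse=True):
        if estimate_fdr(psms, score) <= max_fdr:
            best = score  # keep scanning: a lower score may still pass
    return best


if __name__ == "__main__":
    # Toy PSM list, invented purely for illustration.
    toy = [
        PSM("PEPTIDEK", 9.0, False), PSM("AAGGK", 8.0, False),
        PSM("LLSR", 7.0, False), PSM("TTVK", 6.0, False),
        PSM(make_decoy("MNQK"), 5.5, True), PSM("GHHR", 5.0, False),
        PSM("WYK", 4.0, False), PSM(make_decoy("CCDR"), 3.0, True),
    ]
    cutoff = loosest_threshold(toy, 0.1)
    print("score cutoff:", cutoff, "FDR:", estimate_fdr(toy, cutoff))
```

Because the estimate is not guaranteed to fall steadily as the threshold rises, the sketch scans every observed score instead of stopping at the first failure. This is one possible design choice among several.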