Biochem 523b: Advanced Physical Methods: Mass Spectrometry, X-ray Crystallography and NMR

1. Biological Mass Spectrometry (Lajoie) (3 + 1 lectures)
2. X-ray Crystallography (Ling) (3 + 1 lectures)
3. NMR (Shaw) (3 + 1 lectures)

First lecture: January 11 (room DSB 3008), 2:30-5:30 pm
Last lecture: March 29
Final exam: TBA, mid-April
Reference material: course notes and journal articles.

Evaluation:
Presentations (2)   30 marks
Assignments (3)     30 marks
Final exam          40 marks

Students will give a 20 min presentation in two of the three topics discussed in the course. There will be an assignment for each topic. The final exam will be a 3 hr exam with questions in each section.
Note: Course notes from Biochem 440 and 465a will be available for review.

A. Mass Spectrometry

Lecture 1
Introduction
Definitions
Basic concepts
Mass spectrometer
Ionization
MALDI
ESI
Multiply charged ions and deconvolution
MS/MS sequencing

Lecture 2
Mass analyzers

Lecture 3
Quantitation
General principles
ICAT, iTRAQ, SILAC, etc.
PTMs
Phosphorylation
Glycosylation

Lecture 4
Short presentations by students
Non-covalent studies
Protein folding
HTP proteomics
Metabolomics
Other PTMs
Etc.

Mass Spectrometry
• An instrument that measures the masses of individual molecules that have been converted into ions, i.e., molecules that have been electrically charged. It measures the mass of ions, not of neutral molecules.
• Since molecules are so small, it is not convenient to measure their masses in kilograms, grams, or pounds. In fact, the mass of a single hydrogen atom is approximately 1.6726 × 10^-24 grams.
The convenient unit of mass is often referred to by chemists and biochemists as the dalton (Da) and is defined as follows: 1 Da = 1/12 of the mass of a single atom of the carbon-12 isotope (12C). This follows the accepted convention of defining the 12C isotope as having exactly 12 mass units: 12C = 12.00000.
The dalton (Da) is also known as the unified atomic mass unit (u or amu): 1 Da = 1 u = 1.66055 × 10^-27 kg

Why Mass Spectrometry?
• Highly selective: can monitor a certain analyte with minimum interference from other species in the sample.
• The high selectivity also makes it possible to measure several species at the same time, with high resolution for each, as opposed to the averaged signal given by many spectroscopic techniques such as UV absorption, fluorescence, etc.
• Selectivity can be increased by coupling with a separation technique such as LC or GC.
• MS is highly sensitive, i.e. it has a very low limit of detection: picomole (10^-12 mol) to zeptomole (10^-21 mol).
• Much more sensitive than NMR: concentration in NMR is typically mM; MS can handle nM or better.

Use of Mass Spectrometry
• Identify structures of biomolecules such as proteins, carbohydrates, nucleic acids and steroids
• Sequence biopolymers: proteins and oligosaccharides
• Determine how drugs are used by the body (metabolites)
• Perform forensic analyses: confirmation and quantitation of drugs of abuse
• Analyze for environmental pollutants
• Determine the age and origins of specimens in geochemistry and archaeology
• Identify and quantitate compounds of complex organic mixtures

Applications in Biochemistry
• Characterization of biomolecules including proteins and their modifications (phospho, glyco)
• Sequence determination of proteins (peptides), polysaccharides, lipids
• Protein–ligand interactions
• Protein abundance
• Protein-protein interactions (network)
• Stoichiometry of complexes (quaternary structure, metal, etc.)
• Cellular localization (organellar proteomics)
• Protein dynamics and folding
• Proteomics, Glycomics, Lipidomics, Metabolomics (HTP of above)

The Mass Spectrometer
Sample -> Ion Source -> Mass Analyzer -> Detector -> Data System
High Vacuum
Ion source: creates ions in the gas phase. Electron Impact (EI), Chemical Ionization (CI), Fast atom bombardment (FAB), MALDI, Electrospray
Mass analyzer: separates ions in space or time according to mass-to-charge ratio (m/z). Magnetic sector, Time-of-flight (TOF), Quadrupole (Q), Hybrid: Q-TOF, Linear Ion Trap, FT-ICR, Orbitrap
Detector: collects ions and amplifies signals
Data system: stores and analyzes data; controls the mass spectrometer

Why high vacuum in mass analyzer?
High vacuum = low pressure
High vacuum is necessary to minimize collisions with other gaseous molecules. Collisions would deflect the trajectory of the ions, and the ions would lose their charge on the walls of the instrument. Ion-molecule collisions could also produce unwanted reactions and increase the complexity of the spectrum (controlled collisions can be useful and will be discussed later).
The average distance a particle can travel before colliding is called the mean free path L:
$L = \frac{kT}{\sqrt{2}\, p\, \sigma}$
k = Boltzmann constant, T = temperature in K, p = pressure in Pa and σ = collision cross-section (m^2)
$\sigma = \pi d^2$, where d is the sum of the radii of the stationary molecule and the colliding molecule, d = r1 + r2
With k = 1.38 × 10^-23 J K^-1, T ~ 300 K, σ ~ 45 × 10^-20 m^2:
L (cm) = 0.66 / p (Pa) = 4.95 / p (milliTorr)

Why high vacuum in mass analyzer?...
The mean free path L must be larger than the dimension of the mass analyzer.
In a typical mass spectrometer the mean free path should be at least 1 meter, and hence the pressure should be no more than 66 nbar.
However, L should be 10 to 100 times the ion flight path in order to reduce the probability of an ion/neutral collision to 10% or, better, to 1% or less.
Typical vacuum in MS systems is 10^-5 to 10^-10 Torr.
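As a quick numerical check of the mean free path relation above, the sketch below evaluates L = kT/(√2 p σ) with the values quoted in the notes (T = 300 K, σ = 45 × 10^-20 m^2). The function and variable names are illustrative only, and the Boltzmann constant is the standard SI value, so the results differ slightly from the rounded 0.66 and 4.95 prefactors.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K, standard SI value
PA_PER_MTORR = 0.13332       # 1 mTorr in Pa (1 Torr = 133.3 Pa)
PA_PER_NBAR = 1e-4           # 1 nbar = 1e-9 bar = 1e-4 Pa


def mean_free_path_cm(pressure_pa, temperature_k=300.0, cross_section_m2=45e-20):
    """Mean free path L = kT / (sqrt(2) * p * sigma), returned in cm."""
    path_m = K_BOLTZMANN * temperature_k / (
        math.sqrt(2) * pressure_pa * cross_section_m2
    )
    return path_m * 100.0


def max_pressure_pa(required_path_cm):
    """Highest pressure (Pa) that still gives the required mean free path.

    L is inversely proportional to p, so p_max = L(1 Pa) / L_required.
    """
    return mean_free_path_cm(1.0) / required_path_cm


path_at_1_pa = mean_free_path_cm(1.0)               # prefactor of L(cm) = x / p(Pa)
path_at_1_mtorr = mean_free_path_cm(PA_PER_MTORR)   # prefactor for p in mTorr
p_max_nbar = max_pressure_pa(100.0) / PA_PER_NBAR   # pressure limit for L = 1 m

print(f"L at 1 Pa:    {path_at_1_pa:.2f} cm")
print(f"L at 1 mTorr: {path_at_1_mtorr:.2f} cm")
print(f"p for L = 1 m: {p_max_nbar:.0f} nbar")
```

Because L scales as 1/p, every tenfold drop in pressure buys a tenfold longer collision-free path, which is why analyzers with long ion paths need the lowest operating pressures.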
For an air molecule at 10^-7 Torr, L is ~1 km.
Note: Measurement of the cross-section can yield information on the conformation of molecules in the gas phase.

Useful Definitions and Units

Prefixes for SI units
10^-1    deci
10^-3    milli
10^-6    micro
10^-9    nano
10^-12   pico
10^-15   femto
10^-18   atto
10^-21   zepto

Quantities
Charge of the electron      e     1.60219 × 10^-19 C
Mass of the electron        me    9.10953 × 10^-31 kg
Mass of the proton          mp    1.67265 × 10^-27 kg
Mass of the neutron         mn    1.67495 × 10^-27 kg
Unified atomic mass unit    u     1.66055 × 10^-27 kg
Avogadro constant           NA    6.02205 × 10^23 mol^-1

Pressure
1 pascal (Pa) = 1 newton (N) m^-2
1 bar = 10^6 dyn cm^-2 = 10^5 Pa
1 millibar (mbar) = 10^-3 bar = 10^2 Pa
1 atmosphere (atm) = 1.01325 bar = 101 325 Pa
1 Torr = 1 mmHg = 1.33 mbar = 133.3 Pa
1 psi = 1 pound per square inch = 0.07 atm

Energy
1 cal = 4.184 J
1 eV = 1.602 × 10^-19 J

Mass Spectrum
• Two-dimensional representation of signal intensity (y axis) vs m/z (x axis).
• The intensity reflects the abundance of the ionic species.
• The most intense peak is called the base peak and is most often normalized to 100% relative intensity.
• m/z is dimensionless; z is an integer, 1 or more.
[Figure: centroid peak plot, relative intensity (0-100%) vs m/z (50-200)]

Back to Basics… Chemical Composition of Living Matter
27 of 92 natural elements are essential.
Elements in biomolecules (organic matter): H, C, N, O, P, S
These elements represent approximately 92% of dry weight.
Organic Matter
Organized in "building blocks":
amino acids -> polypeptides (proteins)
monosaccharides -> starch, glycogen
nucleic acids: DNA, RNA

Mass (Weights) of Atoms and Molecules
Element   Nominal mass   Exact mass   Percent abundance   Average mass
C         12             12.00000     98.90%              12.0115
          13             13.00335     1.10%
H         1              1.00783      99.985%             1.00797
          2              2.01410      0.015%
O         16             15.99491     99.762%             15.9994
          17             16.9991      0.038%
          18             17.9992      0.2%
N         14             14.00307     99.63%              14.0067
          15             15.00011     0.37%
S         32             31.9721      95.02%              32.066
          33             32.9714      0.75%
          34             33.9678      4.21%
          36             35.9671      0.02%
P         31             30.9737      100%                30.9737

Calculation of Atomic and Molecular Mass
Nominal mass: the approximate mass of a molecule, calculated from the integer mass of the most abundant isotope of each element present, e.g. CO2: 12 + (2 × 16) = 44. Not precise, but sufficient in many applications.
Isotopic mass: calculated from the exact mass of the isotopes. It is close but not equal to the nominal mass. The monoisotopic mass of a molecule is the sum of the exact masses of the most abundant isotope of each atom present. For CO2: 12.00000 u + (2 × 15.994915) u = 43.989830 (u or Da)
Exact ionic mass: depends on how the ions are formed. For CO2+.: 12.000000 u + (2 × 15.994915) u – 0.000548 u (mass of e) = 43.989282 u. For ESI or MALDI in positive ion mode, we add the mass of one or more protons.
Relative atomic mass (average mass): calculated from the weighted average of the naturally occurring isotopes of an element. The relative molecular mass Mr is calculated from the relative atomic masses of the elements in the empirical formula, e.g. CO2: 12.0108 + (2 × 15.9994) = 44.0096

Mass spectrum
A mass spectrum is a graph of ion intensity as a function of mass-to-charge ratio.
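The three kinds of mass defined above (nominal, monoisotopic, average) differ only in which per-element value is summed. A minimal sketch for CO2 follows, assuming the element values quoted in the notes; the dictionary and function names are hypothetical and only four elements are included.

```python
# Per-element masses taken from the values quoted in the notes.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16}
MONOISOTOPIC = {"C": 12.00000, "H": 1.00783, "N": 14.00307, "O": 15.994915}
AVERAGE = {"C": 12.0108, "H": 1.00797, "N": 14.0067, "O": 15.9994}
ELECTRON_MASS = 0.000548  # u


def formula_mass(composition, table):
    """Sum count * per-element mass over a composition such as {"C": 1, "O": 2}."""
    return sum(table[element] * count for element, count in composition.items())


co2 = {"C": 1, "O": 2}

nominal_mass = formula_mass(co2, NOMINAL)            # integer masses
monoisotopic_mass = formula_mass(co2, MONOISOTOPIC)  # most abundant isotopes only
average_mass = formula_mass(co2, AVERAGE)            # abundance-weighted masses

# Radical cation formed by removing one electron (electron ionization).
radical_cation_mass = monoisotopic_mass - ELECTRON_MASS

print(nominal_mass, round(monoisotopic_mass, 6), round(average_mass, 4))
print(round(radical_cation_mass, 6))
```

Keeping the three tables separate makes it hard to mix isotope-specific and abundance-weighted values in one sum, which is the usual source of small discrepancies in hand calculations.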
Mass spectra are often depicted as simple histograms, as shown. Most abundant ion = 100% relative intensity (number of ions counted).

Mass Spectrometry
(CH3)3N+-CH2-CH2-OH, choline ion, m/z 104
5 C (12) = 60
14 H (1) = 14
1 O (16) = 16
1 N (14) = 14
Total = 104
[Figure: low-resolution mass spectrum of the choline ion, m/z axis 50-150]

Formation of Ions by Electron Ionization
Removal of 1 electron

Mass or Molecular Weight of Molecules
Ethyl acetate, C4H8O2
          Nominal mass   Monoisotopic mass         Average mass
4 C       48             4 × 12.0000 = 48.0000     48.046
8 H       8              8 × 1.00783 = 8.06264     8.06376
2 O       32             2 × 15.9949 = 31.9898     31.9988
Total     88             88.0524                   88.10856

Mass Spectrum of Ethyl Acetate by Electron Impact (EI)
Harsh ionization causes fragmentation.
[Figure: EI spectrum of ethyl acetate, % relative intensity vs m/z. Monoisotopic peaks: molecular ion at m/z 88.05 (10%), H3C-CO+ fragment at m/z 43.02 (100%, base peak), CH3+ fragment at m/z 15 (1%); isotope peaks at m/z 44.02 (2.2%) and m/z 89.05 (0.44%)]

Approximation of Isotopic Distribution
Ethyl acetate, C4H8O2
First peak (100% intensity): 4 12C, 8 1H, 2 16O
4 × 12.0000 = 48.0000
8 × 1.0078 = 8.0624
2 × 15.9949 = 31.9898
Total = 88.0522
Second peak (4.56% intensity), two contributions:
One 13C (1.1% × 4 = 4.4%): 3 12C, 1 13C, 8 1H, 2 16O
3 × 12.0000 = 36.0000
1 × 13.00335 = 13.0034
8 × 1.0078 = 8.0624
2 × 15.9949 = 31.9898
Total = 89.0556
One 2H (0.020% × 8 = 0.16%): 4 12C, 7 1H, 1 2H, 2 16O
4 × 12.0000 = 48.0000
7 × 1.0078 = 7.0546
1 × 2.0140 = 2.0140
2 × 15.9949 = 31.9898
Total = 89.0584

Amino Acids (20): Intact Nominal Mass
General structure: H2N-CH(R)-CO2H
R = H: Glycine (Gly, G), C2H5NO2, MW 75
R = CH3: Alanine (Ala, A), C3H7NO2, MW 89
R = CH2CO2H: Aspartic acid (Asp, D), C4H7NO4, MW 133
R = (CH2)4-NH2: Lysine (Lys, K), C6H14N2O2, MW 146

Exact Mass of Amino Acid Residues in Proteins
Gly   G   57.02150
Ala   A   71.03720
Gln   Q   128.05860
Lys   K   128.09500
Glu   E   129.04270
Note: Leu (L) = Ile (I) = 113.08410

Amino Acids and Proteins Have Mass (or Weight)
[Figure: Ala + Ser + Phe condense to the tripeptide Ala-Ser-Phe with loss of 2 H2O]
Ala-Ser-Phe (ASF)
Nominal mass: 89 + 105 + 165 - (2 × 18) = 323, or C15H21N3O5
Monoisotopic mass: 71.03711 + 87.03203 + 147.0684 + 18.0105 (H2O) = 323.1481
Mr (average mass): 323.3490

Mass accuracy and resolution
Mass accuracy: the difference between the measured accurate mass and the calculated exact mass. Mass accuracy can be stated in absolute units of u (or mmu) or as relative mass accuracy in ppm (most common):
$\text{ppm} = \frac{\text{Experimental} - \text{Calculated}}{\text{Calculated}} \times 10^{6}$
Example: $\frac{3708.99 - 3708.9494}{3708.9494} \times 10^{6} = \frac{0.0406}{3708.9494} \times 10^{6} = 11$ ppm
Resolution: good mass accuracy can only be obtained from sharp, evenly shaped signals that are well separated from each other.

Resolution and mass accuracy
Resolution (R) is a measure of the separation between two adjacent peaks (masses). Δm is the smallest mass difference at which two masses can be resolved:
R = m/Δm
Resolving power (R) is also a performance characteristic of MS instruments: the ability to distinguish between two ions that differ only slightly in their m/z ratio.
There are a number of ways to describe resolution (R):
• Peak width at 10% valley for two overlapping peaks (2 × 5%)
• Peak width at 5% of maximum for a single peak
• Peak full width at half maximum (FWHM) (most common), i.e. the width in Da at 50% of the intensity
Since resolution is related to peak width, resolution also affects mass accuracy.
On most instruments higher resolution means lower sensitivity.

Resolution and mass accuracy
[Figure: two overlapping peaks resolved at 10% valley; single peak with full width at half maximum (FWHM)]

Consequences of resolution on mass accuracy
[Figure: pairs of peaks 1 u apart at m/z 50/51, 500/501 and 1000/1001; the peak width at m/z 50 is 0.1 u]
Signals at m/z 50, 500 and 1000 at R = 500. At m/z 1000 the peak maxima are shifted towards each other due to superimposing of the peaks.
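The ppm and peak-width arithmetic above is easy to script. The sketch below reproduces the 3708.99 vs 3708.9494 example and the peak widths implied by R = 500 at m/z 50, 500 and 1000; the function names are illustrative and not part of the course material.

```python
def ppm_error(measured, calculated):
    """Relative mass error in parts per million."""
    return (measured - calculated) / calculated * 1e6


def peak_width(mz, resolution):
    """Peak width (delta m) implied by R = m / delta m."""
    return mz / resolution


example_ppm = ppm_error(3708.99, 3708.9494)

# At a fixed resolution the peaks broaden in proportion to m/z, so peaks
# 1 u apart are baseline separated at m/z 50 but overlap at m/z 1000.
widths_at_r500 = {mz: peak_width(mz, 500) for mz in (50, 500, 1000)}

print(f"{example_ppm:.1f} ppm")
for mz, width in widths_at_r500.items():
    print(f"m/z {mz}: peak width {width:.1f} u")
```

At m/z 1000 the width (2 u) exceeds the 1 u spacing, which is why the two maxima appear pulled towards each other in the figure.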
Importance of Resolution
Glucagon: Monoisotopic and Average Mass
As the mass increases, the monoisotopic peak is less and less evident.
Glucagon, C153H225N42O50S
Monoisotopic mass: 3,482.61
Average mass: 3,484.75
First peak (monoisotopic): 100%
Second peak, contributions:
12C -> 13C: 153 × 1.1% ≈ 170%
1H -> 2H (D): 225 × 0.02% = 4.5%
14N -> 15N: 42 × 0.37% = 15.5%
Total ≈ 190%
* Note: the peak of highest intensity is 1 Da higher than the monoisotopic peak for each ~1500 Da (i.e. for mass ~3000 the highest peak is 2 Da higher than the monoisotopic peak).

Resolution and mass accuracy…
Mass accuracy: ppm = 10^6 / R = 10^6 Δm / M
Example: measure a mass at 1,000 +/- 0.5 Da
Mass accuracy = 10^6 × 0.5 / 1,000 = 500 ppm
Resolution R = M/Δm = 1,000 / 0.5 = 2,000
• Higher resolution gives higher mass accuracy
• For a given resolution, mass accuracy decreases with higher m/z

m/z      Δm (+/-)   Resolution   ppm (+/-)   Mass range
1,000    0.05       20,000       50          999.95-1000.05
2,000    0.05       40,000       25          1999.95-2000.05
10,000   0.5        20,000       50          9999.5-10,000.5
10,000   0.05       200,000      5           9999.95-10,000.05
10,000   0.005      2,000,000    0.5         9999.995-10,000.005

Characteristics of Mass Spectrometers
- Sensitivity: expressed as the lowest detection limit, e.g. picomole (10^-12 mole), now sub-femtomole (< 10^-15 mole)
- Mass range, e.g. 50-4000
- Mass accuracy: expressed in u or ppm (best 1-5 ppm)
- Resolving power: ability to separate two peaks (masses). For R = 20,000 one can distinguish two masses at 100.000 and 100.005
- Dynamic range: ability to observe two peaks of very different intensities, e.g. 1000:1 (10^3); best 10^4 (LTQ-FTMS)
- Others: cost, ease of operation, etc.
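The second-peak estimate used for glucagon earlier in this section (number of atoms × heavy-isotope abundance, summed over C, H and N) can be written as a short function. This is a sketch under the same simplification as the slides: only 13C, 2H and 15N are counted, with the rounded abundances quoted there, and contributions from 17O and 33S are ignored. Names are hypothetical.

```python
# Heavy-isotope abundances (%) as used in the slide approximation.
HEAVY_ISOTOPE_PERCENT = {"C": 1.1, "H": 0.02, "N": 0.37}


def second_peak_percent(composition):
    """Approximate intensity of the M+1 peak relative to the monoisotopic peak (%)."""
    return sum(
        HEAVY_ISOTOPE_PERCENT[element] * count
        for element, count in composition.items()
        if element in HEAVY_ISOTOPE_PERCENT
    )


glucagon = {"C": 153, "H": 225, "N": 42, "O": 50, "S": 1}
ethyl_acetate = {"C": 4, "H": 8, "O": 2}

glucagon_second = second_peak_percent(glucagon)        # about 188%, i.e. ~190%
ethyl_acetate_second = second_peak_percent(ethyl_acetate)

print(f"glucagon M+1:      {glucagon_second:.1f}% of monoisotopic")
print(f"ethyl acetate M+1: {ethyl_acetate_second:.2f}% of monoisotopic")
```

A value above 100% means the M+1 peak is taller than the monoisotopic peak, which is the point of the glucagon slide: for larger molecules the monoisotopic peak is no longer the most intense one.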
Characteristics of Some Mass Spectrometers
- Sensitivity for tryptic peptides
MALDI-TOF/R: 10-100 × 10^-15 mole
Q-TOF2: 50-200 × 10^-15 mole
- Resolution
MALDI-TOF/R: 10,000 at mass 2,000
Q-TOF2: 10,000-15,000
FTMS: 100,000-3,000,000
- Mass accuracy
MALDI-TOF/R: external calibration +/- 50 ppm; internal calibration +/- 20 ppm
Q-TOF2: external calibration +/- 50 ppm
FTMS: +/- 1-10 ppm

                     MALDI-TOF      Q-TOF   Ion Trap   FTICR (9.4T)
Sensitivity          Highest        High    Medium     High
Mass accuracy        Narrow range   High    Poor       Highest
Sequencing (MS/MS)   Difficult      Yes     Yes        Yes
Throughput           High           Med     Med        Med
Ease of operation    Easiest        Med     Med        Hardest
Cost                 300K           650K    300K       1.0M

Newest: MALDI-TOF/TOF; MALDI-Qq-TOF; FTICR MS 12 Tesla; MALDI-ion trap/quadrupole; ESI Quad/Trap/TOF; Orbitrap

Matrix Assisted Laser Desorption/Ionization (MALDI)
1. Matrix containing analytes (e.g. proteins) absorbs UV (or IR) energy from a pulsed laser (10 nanoseconds)
2. The matrix ionizes and dissociates; it undergoes a phase change to a supercompressed gas; it then transfers its charge to the analyte molecules
3. Matrix expands at supersonic velocity; additional analyte ions are formed in the gas phase; the resulting ions are entrained in the expanding plume
4. The analyte ions are accelerated by a voltage pulse and analyzed in the mass spectrometer

Matrix Assisted Laser Desorption/Ionization
Sample is co-crystallized with matrix (solid)
Formation of singly charged ions
Koichi Tanaka, Nobel Prize 2002

MALDI-TOF/R MS of Peptides from a Tryptic Digest
[Figure: MALDI-TOF/R spectrum, % intensity vs m/z 1000-3000: mass "fingerprint" of a pure protein. Base peak at m/z 2790.22; other major peaks include m/z 1324.60, 1759.93, 1974.94 and 2466.18. Peptides from trypsin self-digestion are annotated as internal calibrants.]

Protein Identification with MALDI-TOF/R
1.
Cut spots from the 2D gel, destain, reduce disulfide bonds, alkylate with iodoacetamide and digest each spot with trypsin (medium to high intensity silver-stained spots)
2. Extract peptides and purify by ZipTip (containing reverse-phase material) or by capillary HPLC.
3. Mix with matrix and analyze by MALDI-TOF/R
4. Compare observed masses with masses in databases obtained from a virtual tryptic digest of all proteins (mass fingerprinting)
5. Confidence for hits depends on coverage: minimum 5 masses (should get >30% coverage)

Proteomic Analysis with MALDI: Mass Fingerprinting
"Bottom-up Approach": peptides to proteins
Pick spots on a gel, or protein(s) in solution -> Digest with a site-specific protease -> Extract peptides; mass analyze -> Database search or sequence
[Figure: peptide mass spectrum, mass (m/z) 1000-2000]

Difficulties With Mass Fingerprints
Many tryptic peptides have similar masses, resulting in numerous false positives.
Mass accuracy is critical!!
Mass from 1000.30-1000.70

Typical Problems
1. No MS signals!!
Insufficient sample (poor digestion, poor extraction)
Contaminants that affect ionization: SDS, acrylamide, salts, detergents, PEG
2. Protein contamination
Keratins, peptides from trypsin self-digestion, bacterial proteins, etc.
3. Detect the most abundant proteins only
4.
Masses affected by PTMs, adducts, etc.; wrong assignment

Electrospray Ionization MS (ESI MS)
Formation of Charged Droplets and Multiply Charged Ions
Formation of multiply charged ions

Mass Spectrum of a Multiply Charged Protein
Raw Data Spectrum for Myoglobin (Denaturing conditions)
[Figure: TOF MS ES+ spectrum of myoglobin, m/z 700-1900, showing charge-state series A from +25 (m/z 679.06) down to +9 (m/z 1884.66), with base peak A20 at m/z 848.55. Other labelled peaks include A18 942.73, A17 998.12, A16 1060.51, A15 1131.08, A14 1211.77, A13 1305.02 and A11 1542.01. A: 16951.50 ± 0.02]
The maximum number of charges depends on the number of basic residues (Lys, Arg, His).

Deconvoluted Spectrum for Myoglobin
[Figure: deconvoluted spectrum, mass axis 14000-19500, main peak A at 16951.85, minor peaks at 17049.71 and 17567.41]
A: 16951.50 ± 0.02
Expected MW = 16951.49 Da
R = 8000

MS of Glu-fibrinopeptide
[Figure: TOF MS survey ES+ spectrum (25-MAR-2002), m/z 400-1500: +2 ion (A2) at m/z 785.85, +3 ion (A3) at m/z 524.24. A: 1569.70 ± 0.00]
M = (785.85 × 2) - 2H = 1569.70

MS of Glu-fibrinopeptide: doubly charged ion 785.85
[Figure: expanded view, m/z 785-789: isotope peaks at m/z 785.85 (monoisotopic), 786.36, 786.86, 787.38 and 787.88, spaced 0.5 Da apart]
M = (785.85 × 2) - 2H = 1569.70

Transform of Different Charged States to MW
Each peak is related to the mass (MW) and the charge state (n):
$(m/z)_1 = (MW + n_1 m_H) / n_1$
$(m/z)_2 = (MW + n_2 m_H) / n_2$
Adjacent peaks for a pure molecule are related, i.e. n2 = n1 + 1: one more proton and one more charge.

Calculation of MW (M) of Proteins from ESI Data
For two adjacent peaks, (m/z)_1 (higher value) and (m/z)_2 (lower value), the number of charges (n) will differ by one; m_H = mass of H+ (1.0079):
$n_2 = n_1 + 1$
$(m/z)_1 = (M + n_1 m_H) / n_1$
$(m/z)_2 = (M + n_2 m_H) / n_2 = (M + (n_1 + 1) m_H) / (n_1 + 1)$
$M = (m/z)_1 n_1 - n_1 m_H = (m/z)_2 (n_1 + 1) - (n_1 + 1) m_H$
$n_1 = \frac{(m/z)_2 - m_H}{(m/z)_1 - (m/z)_2}$
$M = n_1 ((m/z)_1 - m_H)$
Once we know the charge state n1 we can calculate M.

ESI MS of Bovine Insulin
Bovine insulin peak at 2867.2 and adjacent peaks at 1912.3, 1434.3, 1147.4
n1 = (m/z2 - mH) / (m/z1 - m/z2)
n1 = (1912.3 – 1) / (2867.2 – 1912.3) = 1911.3 / 954.9 = 2.00
M = n1 (m/z1 - mH)
M = 2.00 × (2867.2 - 1.0079) = 5732.4
Repeat for 1434.3, M = 5733.9; 1147.4, M = 5733.2; and average: Mexp = 5732.9 +/- 0.7
Mr = calculated average mass = 5733.58

Verification of Mutant Proteins
Yeast iso-1 cytochrome c Y67F, N53I (+heme)
Calcd: 12688
Obsd: 12687

Confirmation of Sequence: Glyoxalase 1
Predicted from cDNA: 14906
Observed: 14919 +/- 1
Δ: 13 Da?!
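Returning to the charge-state calculation for bovine insulin above: the same two relations, n1 = (m/z2 - mH)/(m/z1 - m/z2) and M = n (m/z - mH), can be applied to every adjacent pair of peaks. The sketch below is illustrative only; it uses the m_H = 1.0079 value from the notes, hypothetical function names, and assumes the peaks are sorted from high to low m/z and belong to one species.

```python
M_H = 1.0079  # mass of H+ as used in the notes


def charge_of_higher_mz_peak(mz1, mz2):
    """Charge n1 of the peak at mz1, given the adjacent peak at lower m/z (mz2)."""
    return round((mz2 - M_H) / (mz1 - mz2))


def neutral_mass(mz, charge):
    """Neutral molecular mass M = n * (m/z - m_H)."""
    return charge * (mz - M_H)


def deconvolute(peaks):
    """Assign charges to a descending m/z series and return (charges, masses)."""
    charges = [charge_of_higher_mz_peak(a, b) for a, b in zip(peaks, peaks[1:])]
    charges.append(charges[-1] + 1)  # last peak carries one more charge
    masses = [neutral_mass(mz, n) for mz, n in zip(peaks, charges)]
    return charges, masses


insulin_peaks = [2867.2, 1912.3, 1434.3, 1147.4]
charges, masses = deconvolute(insulin_peaks)
mean_mass = sum(masses) / len(masses)

for mz, n, m in zip(insulin_peaks, charges, masses):
    print(f"m/z {mz:7.1f}  n = {n}  M = {m:.1f}")
print(f"mean M = {mean_mass:.1f}")
```

Averaging over all charge states is what reduces the error on M: each peak gives an independent estimate, and the spread between them indicates the uncertainty.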
ACC to AAC: Thr to Asn

Effects of Denaturation on Charge Distribution
Denaturation by heat, pH, organic solvents
2 distributions of charge states

Non-Covalent Complex: Calmodulin + 4 Ca++ + MLCK
pH = 6.7, 40 C, 5% MeOH
MLCK peptide KRRWKKNFIAVSAANRFKKISS: 2634
Calmodulin: 16700
4 Ca++: 160
Calcd: 19486
Obsd: 19484

Effect of pH on Hemoglobin Tetramer
D = Dimer, Q = Tetramer

ESMS and Tandem MS (MS/MS) of Peptides or Multiply Charged
MS/MS spectrum

Protein Identification by MS/MS of Peptides
[Figure: peptide backbone showing the fragment ion nomenclature: a, b and c ions (N-terminal fragments) and x, y and z ions (C-terminal fragments)]

"Mobile" proton causes cleavage along peptide chain
[Figure: structures of doubly protonated ATSFYK showing migration of the mobile proton along the backbone]
Each protonation site can induce different fragmentation.

Formation of y ions:
[Figure: mechanism of y ion formation from doubly protonated ATSFYK; y3 ion (doubly charged)]

Formation of b ions:
[Figure: mechanism of b ion formation, giving the b2 ion and a neutral C-terminal fragment; the b2 ion loses CO (-28) to give the a2 ion]

b2 ions are often observed with a diagnostic -28 a2 ion; b1 ions are rare.
The b2 ion allows you to determine the yn-2 ion, since M + 2 = b2 + yn-2

y and b Ions from Peptide DAEFR
Peptide: H (1) - Asp (115.1) - Ala (71.1) - Glu (129.1) - Phe (147.2) - Arg (156.2) - OH (17), + H+
y ions:
y5: 115.1 + 1 + 71.1 + 129.1 + 147.2 + 156.2 + 17 + 1, m/z = 637.7
y4: 71.1 + 1 + 129.1 + 147.2 + 156.2 + 17 + 1, m/z = 522.6
y3: 129.1 + 1 + 147.2 + 156.2 + 17 + 1, m/z = 451.5
y2: 147.2 + 1 + 156.2 + 17 + 1, m/z = 322.4
y1: 156.2 + 1 + 17 + 1, m/z = 175.2
b ions:
b5: 1 + 115.1 + 71.1 + 129.1 + 147.2 + 156.2, m/z = 619.7
b4: 1 + 115.1 + 71.1 + 129.1 + 147.2, m/z = 463.5
b3: 1 + 115.1 + 71.1 + 129.1, m/z = 316.3
b2: 1 + 115.1 + 71.1, m/z = 187.2
b1: 1 + 115.1, m/z = 116.1

Solving MS/MS Spectra
The mass difference between b1 and b2, b2 and b3, or between y1 and y2, etc., gives the mass of an amino acid residue.
For tryptic digests, the amino acid at the C-terminus is known, i.e. K or R (note: R gives stronger signals than K by either ESI or MALDI).
Immonium ions are often observed and give information on the types of amino acids present in the sequence.
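The b and y ion arithmetic for DAEFR above can be reproduced with two running sums. This is a minimal sketch that follows the slide's convention (average residue masses, H = 1, OH = 17, H+ = 1) rather than exact monoisotopic values; the names are hypothetical.

```python
# Average residue masses and nominal end-group masses as used on the slide.
RESIDUE_MASS = {"D": 115.1, "A": 71.1, "E": 129.1, "F": 147.2, "R": 156.2}
H, OH, PROTON = 1.0, 17.0, 1.0


def b_ions(peptide):
    """b1..bn: N-terminal fragments, residues summed from the N-terminus plus 1."""
    ions, running = [], PROTON
    for residue in peptide:
        running += RESIDUE_MASS[residue]
        ions.append(round(running, 1))
    return ions


def y_ions(peptide):
    """y1..yn: C-terminal fragments, residues summed from the C-terminus + H + OH + H+."""
    ions, running = [], H + OH + PROTON
    for residue in reversed(peptide):
        running += RESIDUE_MASS[residue]
        ions.append(round(running, 1))
    return ions


peptide = "DAEFR"
b_series = b_ions(peptide)
y_series = y_ions(peptide)

# Neutral peptide mass M = residues + H2O; check the rule M + 2 = b2 + y(n-2).
neutral_m = sum(RESIDUE_MASS[r] for r in peptide) + H + OH
b2_plus_y3 = b_series[1] + y_series[len(peptide) - 3]

print("b:", b_series)
print("y:", y_series)
print("M + 2 =", round(neutral_m + 2, 1), " b2 + y3 =", round(b2_plus_y3, 1))
```

The complementarity check is a handy sanity test when assigning real spectra: any b_i and y_(n-i) pair from the same peptide must add up to M + 2.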
(not observed with ion trap MS)
Immonium ion: H2N+=CH-R
Immonium ion masses (m/z): Lys 101, Arg 129, Ser 60, Glu 102, Trp 159, Pro 70, Val 72, Gln 101, Thr 74, His 110, Met 104, Phe 120, Tyr 136, Cys 76, Asp 88, Leu 86, Asn 87, Gly 30, Ala 44, Ile 86

Idealized Product Ion Spectrum of Tryptic Peptides
[Figure: idealized spectrum, m/z 100-500: immonium ions at low m/z, then a2 (28 below b2), y1, b2, b3, y2, b4, y3, y4, b5, y5]
M + 2 = b2 + yn-2
y5 = M + 1

MS of Glu-Fibrinopeptide
Select doubly charged ion in MS
[Figure: TOF MS survey ES+ spectrum (25-MAR-2002) of Glu-fibrinopeptide, EGVNDNEEGFFSAR: +2 ion at m/z 785.85 and +3 ion at m/z 524.24, with expanded view of the +2 isotope peaks at m/z 785.85, 786.36, 786.86, 787.38 and 787.88. A: 1569.70 ± 0.00]

Sequencing Glu-fibrinopeptide (Q-TOF)
[Figure: MS/MS spectrum, m/z 100-1500, with y1 at m/z 175.12 (R) and (M+H)+ at m/z 1570.72]
• m/z 175 = C-terminal Arg, m/z 147 = C-terminal Lys (y ion series)
• Can start sequencing from anywhere

MS/MS of Glu Fibrinopeptide
246 – 175 = 71, residue molecular weight Ala = 71!
[Figure: same MS/MS spectrum with y1 (175.12) and y2 (246.16) labelled; sequence so far: R, A]

MS/MS of Glu Fibrinopeptide
333 – 246 = 87, residue molecular weight Ser = 87!
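The ladder-reading step used in these Glu-fibrinopeptide slides (246 - 175 = 71 gives Ala, 333 - 246 = 87 gives Ser) can be sketched as a lookup of successive y-ion differences against residue masses. The y-ion m/z values below are those annotated on the Glu-fibrinopeptide spectrum in these notes; the residue table holds standard monoisotopic residue masses, and the function names and the 0.05 Da tolerance are assumptions made for illustration.

```python
# Standard monoisotopic residue masses (Da); Leu and Ile are isobaric.
RESIDUES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I/L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
# y1 of a tryptic peptide: residue + H2O + H+.
Y1_C_TERMINAL = {"R": 175.119, "K": 147.113}


def residue_for(delta, tol=0.05):
    """Residue name(s) whose mass matches a y-ion spacing, or '?' if none."""
    hits = [name for name, mass in RESIDUES.items() if abs(mass - delta) <= tol]
    return "|".join(hits) if hits else "?"


def read_y_ladder(y_mz, tol=0.05):
    """Read a sequence tag (N- to C-terminal order) from ascending y-ion m/z values."""
    first = next(
        (aa for aa, mz in Y1_C_TERMINAL.items() if abs(y_mz[0] - mz) <= tol), "?"
    )
    residues = [first] + [residue_for(b - a, tol) for a, b in zip(y_mz, y_mz[1:])]
    # y ions grow from the C-terminus, so reverse to get the usual N-to-C order.
    return "".join(reversed(residues))


glu_fib_y = [175.12, 246.16, 333.20, 480.27, 627.33, 684.36,
             813.40, 942.44, 1056.47, 1171.50, 1285.55, 1384.63]
tag = read_y_ladder(glu_fib_y)
print(tag)
```

Ambiguous spacings are reported rather than resolved: a gap near 113.08 comes back as "I/L", and a gap that matches nothing comes back as "?", which usually signals a missing ion or a modified residue.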
[Figure: same MS/MS spectrum with y1 (175.12), y2 (246.16) and y3 (333.20) labelled; sequence so far: R, A, S]

MS/MS of Glu Fib: Complete Sequence
EGVNDNEEGFFSAR
[Figure: fully annotated MS/MS spectrum, m/z 100-1500: b2 187.08; y1 175.12, y2 246.16, y3 333.20, y4 480.27, y5 627.33, y6 684.36, y7 813.40, y8 942.44, y9 1056.47, y10 1171.50, y11 1285.55, y12 1384.63; (M+H)+ 1570.72]

Sequence Tags from Asp-N Treatment (cleaves before D)
[Figure: MS/MS spectrum, m/z 100-1100, annotated with sequence tag DLVKP: a2 201.00, b2 229.03, b5 553.37, a6 638.44, b6 666.46, b7 781.46, b8 894.54]
• Observe mainly b ions; b1 ions rarely observed

MS/MS of EA(I/L)DFFAR
[Figure: MS/MS spectrum of the doubly charged ion at m/z 484.74, mass axis 100-1000; labelled ions include 175.12 (R), 246.15 (A), 393.22 (F), 540.28 (F), 655.30 (D), 768.40 (I/L) and 839.44 (A)]
Expected MH+ = 484.74 × 2 - 1.0078 = 968.47
968.47 - 839.44 = 129.03 (i.e. glutamic acid)
Full sequence: EA(I/L)DFFAR

Expected Chemical Modifications
1. Carbamidomethyl (CAM) (-CH2-C(O)-NH2): on cysteine after alkylation with iodoacetamide, I-CH2C(O)NH2. 103 (Cys) + 58 (CAM) = 161, - 1 (H of SH) = 160
2. Oxidation of methionine residues (air oxidation): -CH2-CH2-S-CH3 (Met) -> -CH2-CH2-S(O)-CH3 (Met(O)), +16
3. Deamidation: -CH2C(O)NH2 (Asn, Gln) -> -CH2C(O)OH (Asp, Glu), +1
4. Carbamylation (Lys and N-terminal NH2): -CH2-NH2 -> -CH2-NH-C(O)-NH2, +43

Manual or De novo Interpretation of MS/MS Data
1.
Most proteins are analyzed by MS/MS after trypsin digestion (unless otherwise specified, e.g. Lys-C and Asp-N)
2. A parent ion (usually doubly charged) is selected by the first quadrupole. A neutral gas is introduced in the collision cell and causes fragmentation along the backbone, producing b and y ions. Each b ion differs from the next by the mass of one amino acid residue, and likewise each y ion. (See the MS/MS tutorial movie at http://www.mshri.on.ca/pawson/ms/movie.html)
3. Since trypsin is used, the C-terminal amino acid must be Arg or Lys: y1 ion at 175 or 147
[Figure: structure of the y1 ion, C-terminal Arg (Lys)]

Interpretation of MS/MS Data …
4. The mass of the peptide can be calculated from the doubly charged ion: M = (m/z × 2) - 2H
5. The b and y series may not be complete, creating gaps in the sequence. The gaps can often be identified or partially identified from the sum of the masses of two amino acids. b and especially y ions can lose H2O, so the mass of the amino acid - 18 is detected in the MS.
6. b2 ions are often observed with a diagnostic -28 a2 ion; b1 ions are rare. The b2 ion allows us to determine the yn-2 ion, since M + 2 = b2 + yn-2, where n is the maximum number of possible y ions. Smallest b2 = 115 (Gly, Gly); largest b2 = 373 (Trp, Trp)
7. There are many mass equivalences. The two most common are oxidized methionine, Met(O) = 147.04, vs Phe = 147.07; and Cys(CAM) vs Cys-Gly (CG) = 160.03

Bioinformatics
Databases, several types:
- DNA sequences, protein sequences
- EST (expressed sequence tags) (more prone to errors)
- 2D gels, 3-D structure, post-translational modifications
- Annotations: forms, function, etc.
Protein Databases (use more than one to increase confidence):
SwissProt (best)
NCBInr
OWL

Search Engines
Mascot: masses, sequence tags, MS/MS data
Profound: masses, sequence tags, MS/MS data
MS-Fit: masses, sequence tags, MS/MS data, homology
Protein Links Global Server (PLGS, Micromass)

Strategies to ID proteins with MS/MS data
Need to determine the sequence of tryptic peptides:
- for de novo sequencing of unknown organisms
- to obtain partial sequences for database searches ("sequence tags"): get much better results
Algorithms to determine sequence are poor, and determination can be slow when done manually.
Solution: search databases with uninterpreted MS/MS data against virtual (in silico) MS/MS of peptides in the database (MASCOT from www.matrixscience.com, or SEQUEST, or X!Tandem)

Web Tools
• Peptide Mass Fingerprinting
• Mascot (Peptide Mass Fingerprint): http://www.matrixscience.com
• MassSearch: http://cbrg.inf.ethz.ch/Server/MassSearch.html
• MOWSE: http://www.hgmp.mrc.ac.uk/Bioinformatics/Webapp/mowse
• MS-Fit: http://prospector.ucsf.edu
• PeptIdent: http://us.expasy.org/tools/peptident.html
• Peptide Search: http://www.mann.embl-heidelberg.de/GroupPages/Homepage.html
• Profound: http://prowl.rockefeller.edu
• PepMapper: http://wolf.bms.umist.ac.uk/mapper/

Approaches to Identify Proteins from MS Data
1. Masses of digested peptides compared with in silico digests of protein databases (mass fingerprinting)
Unreliable
2. Compare uninterpreted MS/MS data with in silico MS/MS of digested proteins in databases (e.g. MASCOT)
Problems:
- Too many false hits
- Need known genomes
3. Search databases with partial sequences (sequence tags)
Much better for known and unknown genomes
Problems:
- Long and tedious to determine sequence manually
- Inaccurate software available until PEAKS
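Approach 1 above (mass fingerprinting) amounts to an in silico tryptic digest followed by a mass comparison within a ppm tolerance. The sketch below is a toy illustration, not how Mascot or any other search engine is implemented: the "protein" is a made-up concatenation of three peptides that appear in these notes, the cleavage rule is the common simplification (cut after K or R unless followed by P, no missed cleavages), and all names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ of a peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def match_masses(observed, peptides, tol_ppm=50.0):
    """Return (observed mass, peptide) pairs that agree within tol_ppm."""
    hits = []
    for mass in observed:
        for peptide in peptides:
            theoretical = peptide_mh(peptide)
            if abs(mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((mass, peptide))
    return hits


# Hypothetical toy protein built from peptides mentioned in the notes.
toy_protein = "ATSFYKEGVNDNEEGFFSARDAEFR"
peptides = tryptic_digest(toy_protein)
hits = match_masses([1570.72], peptides)

print(peptides)
print(hits)
```

Tightening `tol_ppm` is the programmatic counterpart of "mass accuracy is critical": with a real database of many proteins, a wider window lets more unrelated peptides match each observed mass and inflates the false-positive rate.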