Data Acquisition and Analysis in Mass Spectrometry Based Metabolomics
Pavel Aronov
BioCyc workshop, October 27, 2010

Outline
- Fundamentals of Mass Spectrometry
- Data Acquisition and Analysis in GC-MS based Metabolomics
- Data Acquisition and Analysis in LC-MS based Metabolomics

How to analyze tryptophan or any other metabolite?
The two most common techniques in analytical chemistry for determining or confirming chemical structure:
- Nuclear Magnetic Resonance (NMR; 1940s, Felix Bloch at Stanford University): excellent structural information
- Mass Spectrometry (1900s, J. J. Thomson at Cambridge University): excellent sensitivity

What is a mass spectrometer?
[Diagram: neutral molecules (M) are converted to ions (M+) in the ion source, then pass through the mass analyzer to the detector; the diagram marks an atmosphere region and a vacuum region]
Measured value: mass-to-charge ratio (m/z)

Mass Units
- Unit of mass: 1/12 of the mass of a carbon-12 atom = 1 u = 1 Da
- Unit of mass-to-charge: 1 Da / z = 1 Th (thomson)
- Example: m/z 205
- For metabolites usually z = 1, hence 1 Da is equivalent to 1 Th

Monoisotopic vs Average Mass
Tryptophan, C11H12N2O2, statistically can contain:
- no carbon-13 (M): 204.09 Da (100 %)
- one carbon-13 (M+1): 205.09 Da (11.9 %)
- two carbon-13 (M+2): 206.09 Da (1.4 %)
These are masses of individual isotopic species; 204.09 Da (M) is the monoisotopic mass.

Two pairs of stable isotopes important in biochemistry (relative abundance):
- Carbon-12 (100 %) and Carbon-13 (~1.1 %)
- Sulfur-32 (100 %) and Sulfur-34 (4.4 %)

[Figure: calculated isotope pattern of C11H12N2O2 with peaks at 204.09, 205.09 and 206.10]

Average mass = (204.09 * 100 + 205.09 * 11.9 + 206.09 * 1.4) / 113.3 = 204.22 (molecular weight, g/mol)

Mass defect
  1H (p + e-)   1.0078 u
  12C           12.0000 u
  14N           14.0031 u
  16O           15.9949 u
  n (neutron)   1.0087 u
Carbon-12: 6 protons, 6 neutrons and 6 electrons
6 x 1.0078 u + 6 x 1.0087 u = 12.0990 u
Mass defect = 12.0990 u - 12.0000 u = 0.0990 u
E = mc^2: 0.1 u is about 93 MeV

Elemental composition from accurate mass
Atomic masses: 1H 1.0078 u, 12C 12.0000 u, 14N 14.0031 u, 16O 15.9949 u
What is 28 u? N2 (2 x 14 u), CO (12 u + 16 u) or C2H4 (2 x 12 u + 4 x 1 u)?
What is 28.0313 u?
[high accuracy] C2H4 (2 x 12.0000 u + 4 x 1.0078 u); N2 would be 28.0062 u and CO 27.9949 u

High resolution mass spectrometry
[Figure: the same isotope cluster (m/z 560 to 564) acquired at high resolution and at nominal resolution]
- High resolution: peak width 0.06 amu FWHM, R = 561/0.06 ~ 9,000
  TOF: 7,000-50,000; Orbitrap: 10^4-10^5; FT-ICR: 10^5-10^6
- Nominal mass resolution (R below 1000): peak width 0.8 amu FWHM, R = 561/0.8 ~ 700
  Quadrupoles and ion traps, some TOFs

Mass of an electron becomes important at high accuracies
Two types of ions in mass spectrometry:
Odd Electron (OE) ions, typically generated by electron ionization (GC/MS):
- C11H12N2O2 after loss of one electron: 204.08933 Da (true mass)
- C11H12N2O2 with the electron mass (0.00055 Da) ignored: 204.08988 Da (about 2.7 ppm error)
Even Electron (EE) ions, typically generated by chemical ionization techniques and electrospray:
- C11H12N2O2 protonated to C11H13N2O2 (+1): 205.09715 Da (true mass)
- C11H13N2O2 with the electron mass ignored: 205.09770 Da (about 2.7 ppm error)
Modern instruments can achieve accuracy better than 1 ppm

Identification based on accurate mass
Matching accurate mass and isotopic peak ratio
[Figure: acquired spectrum (FTMS, negative ESI, full MS 70-800, RT 6.29 min) with a peak at m/z 212.00217, shown above the theoretical spectrum of C8H6O4NS (charge -1) at m/z 212.00230]
Error = -0.00013 Da / 212.0023 Da * 1,000,000 = -0.6 parts per million (ppm)

Confirmation of structure from isotopes (M+2)
Matching accurate mass and isotopic peak ratio
[Figure: M+2 region of the same spectrum; acquired peak at m/z 213.99796, theoretical peak for C8H6O4NS at m/z 213.99810]

Tandem Mass Spectrometry
[Diagram: HPLC, ion source, mass analyzer 1 (precursor ion M+), collision cell (M+ fragments to F+), mass analyzer 2 (fragment ions F+), detector; the diagram marks an atmosphere region and a vacuum region]

MS/MS of isomers
- Prostaglandin A1: 336.2301 amu
- Prostaglandin B1: 336.2301 amu

Chromatography
Separation by volatility and polarity (gas chromatography, GC) or by polarity (liquid chromatography, LC)
[Figure: gas chromatogram of hydrocarbons, peaks labeled C8 to C30, time axis 6 to 36 min]

Two-dimensionality of metabolomics data in LC-MS and GC-MS (retention time x m/z)

GC-MS and LC-MS
GC-MS:
- Derivatization usually required (except VOC)
- Upper mass limit at ~400-500 amu
- Preferred for small polar metabolites (primary metabolism)
- Relatively high peak capacity
- EI ion source (extensive fragmentation, reproducible, libraries available)
- CI ion source (little fragmentation, advantage for accurate mass measurement)
LC-MS:
- No derivatization usually required
- Upper mass is limited by column permeability
- Preferred for bigger molecules (e.g. some lipids, secondary metabolites)
- Relatively low peak capacity
- ESI ion source (ionic compounds, ion suppression)
- APCI ion source (less ion suppression and more amenable to non-polar compounds than ESI, but usually lower sensitivity)

Types of Experiments in Metabolomics
Targeted:
- Number of analyzed metabolites is limited by the number of available standards
- Absolute quantitation of metabolites (nM, mg/mL)
- Selective MS detectors (quadrupoles, triple quadrupoles)
Non-targeted:
- Number of analyzed metabolites is limited by the capacity of the analytical instrumentation
- Relative quantitation of metabolites (fold)
- Scanning MS detectors (ion trap, TOF, FT)

Bottlenecks in Metabolomics
ASMS09 survey: metabolomics bottlenecks
1 - Identification of metabolites: 35 %
2 - Assigning biological significance: 22 %
3 - Data processing/reduction: 14 %
4 - Sample preparation: 8 %
5 - No opinion: 6 %
6 - Statistical analysis: 5 %
7 - Validation/Utility Studies: 5 %
8 - Data acquisition/throughput: 3 %
9 - Other: 2 %
Throughput (3 %) vs. post-acquisition bottlenecks (5 + 35 + 22 + 14 = 76 %)

GC-MS based metabolomics: overview
- 50 - 600 (400) amu mass range
- Mono- and disaccharides, amino acids, fatty acids (mostly primary metabolites)
- Derivatization usually required

GC-MS: derivatization
- 40 mg/mL in pyridine at 37 °C for 90 min
- Prevents thermal decarboxylation of α-ketoacids
- Keeps sugars in the open-chain form to minimize the number of conformations and relieve steric hindrance for the next step
[Scheme: a sugar present as α/β epimers reacts with the reagent drawn on the slide (H3C-O-NH2, methoxyamine) to give open-chain N-OCH3 products as syn/anti isomers]

GC-MS: derivatization
- MSTFA, 1% TMCS at 50 °C for 30 min
- Substitution of active hydrogens
- Incomplete derivatization possible
[Scheme: MSTFA replaces active hydrogens (OH, NH2) with trimethylsilyl groups]

GC-MS data analysis
[Figure: GC-MS total ion chromatogram (Scan EI+, TIC, 1.34e6), time axis 10 to 50 min]

Electron Ionization in GC-MS
- 70 eV, far above the energy of a chemical bond
- Highly reproducible
- Extensive fragmentation
- Often no molecular ion observed
- EI: alpha-cleavage [a] more common
- CID MS/MS: inductive cleavage [i] common
[Scheme: cleavage types i and a illustrated on hydroxyl-containing ions]

GC-MS: present and future
Current GC-MS metabolomics platforms use:
1) nominal resolution mass analyzers (no accurate mass and elemental composition)
2) electron ionization ion source: OE molecular ions, extensive fragmentation, often
the molecular ion is not observed
Advantages:
1) Low cost
2) Good chromatographic separation for many small polar metabolites after derivatization
3) Extensive libraries of fragmentation spectra help identification
4) Retention time is to some extent predictable (retention indices)
Trends:
1) Development of high resolution instruments for GC/MS
2) Development of soft ionization sources similar to LC/MS (EE ions, no fragments)

GC-MS data analysis
- Deconvolution of mass spectra based on chromatographic profiles (e.g. freeware AMDIS)
- Identification of metabolites based on matching to spectral libraries and retention indices
- Automated processing routines exist for some GC-MS instruments (SetupX and BinBase)

Application Examples - listeria
[Figure: GC-MS total ion chromatograms (Scan EI+, 6 to 12.5 min) of a sample without cells (- cells, SAMPLE004_STER) and a sample with cells (+ cells, SAMPLE005_LIST); a peak at 6.16 min is labeled in both traces and a peak at 10.67 min in the + cells trace; Glycine-2TMS is labeled on the slide]

Application Examples: AMDIS
[Figure: AMDIS view showing the peak of interest, the acquired mass spectrum and the library mass spectrum (glycine-2TMS)]

LC-MS based metabolomics
- Combination of ionization modes is preferred (ESI, APCI, +, -)
- Reversed phase LC for non-polar metabolites and hydrophilic interaction chromatography (HILIC) for polar metabolites
- Detection of spectral "features" (ions) using metabolomics software
- Identification based on accurate mass and fragmentation (MS/MS libraries)

Electrospray Ionization (ESI)
- Positive ESI: R + H+ gives [R + H]+
- Negative ESI: R - H+ gives [R - H]-

APCI and ESI
Both:
- Soft ionization, pseudomolecular ions [M + H]+, [M - H]-, [M + Na]+, [M + Cl]-
- Volatile mobile phase, no inorganic salts (phosphate buffer)
APCI:
- Ionization in gas phase
- High ionization efficiency for compounds with high proton affinity in the gas phase
- Usually singly charged ions
- Compatible with reverse and normal phase
ESI:
- Ionization in liquid phase
- High ionization efficiency for compounds that are ionic in solution
- Multiply charged ions common for large biomolecules (proteins, nucleic acids)
- Reverse phase; mobile phase must be conductive
- Ion suppression common

Combination of Acquisition Modes
- Separation modes: reversed phase and HILIC
- Ionization modes: ESI and APCI, or combined ESI/APCI (MM)
- Ionization polarities: + and -
Nordstrom A. et al, Anal Chem, 2008.

RP and HILIC liquid chromatography
[Figure: total ion chromatograms (0 to 10 min) on a reversed phase C18 column and on an aminopropyl HILIC column; the peak labeled creatinine (m/z 114.07) appears at 1.12 min on C18 and at about 2.55 min on HILIC]
HILIC: better retention for polar molecules

LC-MS: Data Analysis
- Alignment of chromatograms (optional)
- Detection of 'features' in mass chromatograms
- Removal of isotopic peaks, adducts, fragments etc. to improve statistics
- Statistical analysis
- Identification based on accurate mass, MS/MS spectra and comparison with standards

Example: search for bacterial metabolites in humans by comparing two groups, controls and people who underwent colectomy (no colon bacteria)
- Initially the software detected 900 features in positive ESI mode
- After features without a chromatographic peak profile were removed (visual inspection), 769 features were left
- After isotopes were removed, 554 features were left.
Only at this point are the remaining features likely to be molecular ions of individual metabolites

Adducts
[Figure: sample0088, C18, positive ESI. Extracted ion chromatograms of M+H (m/z 398.95-399.26), M+NH4 (m/z 416.11-416.28) and M+Na (m/z 421.05-421.26) co-elute at 15.59-15.61 min. The averaged mass spectrum (RT 15.51-15.69 min) shows M+H at 399.1856, M+NH4 at 416.2120 and M+Na at 421.1673]

Fragments in LC-MS
[Figure: sample0088, C18, positive ESI. Extracted ion chromatograms of hippuric acid and of m/z 118.0651 co-elute with the apex at 7.31 min. The averaged mass spectrum (RT 7.16-7.52 min) shows hippuric acid at m/z 180.0651 and a smaller peak at m/z 118.0651]
C8H8N: indole?
No, it is a fragment of hippuric acid
Not confirmed by GC-MS either

Identification tools
- Accurate mass search (BioCyc, HMDB, Metlin)
- MS/MS search (Metlin, MassBank)
- In addition, many MS manufacturers offer proprietary tools for structure elucidation

MassBank MS/MS
[Figure: MassBank MS/MS result with the annotations "sulfate" and "m/z 132, C8H6NO"]

LC-MS Data Analysis Summary
- Not every peak detected by a mass spectrometer represents an individual metabolite
- Automated data processing reduces the amount of routine work, but human intervention is still required
- Accurate mass measurements and MS/MS make it possible to determine the elemental composition of unknowns and their structural components. Confirmation with chemical standards is still required
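
Worked example: accurate-mass arithmetic
The accurate-mass calculations used in the slides (monoisotopic mass from an elemental composition, the electron-mass correction for ions, and the ppm error) can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the data shown. The function names are hypothetical, the atomic masses are standard values given to six decimals (the slides quote four), and sulfur is added only to cover the C8H6O4NS example.

```python
import re

# Monoisotopic atomic masses in u. Standard values; the slides quote them
# to four decimals. Sulfur is included for the C8H6NO4S example (assumption:
# only these five elements are needed).
MONOISOTOPIC = {
    "H": 1.007825,
    "C": 12.000000,
    "N": 14.003074,
    "O": 15.994915,
    "S": 31.972071,
}
ELECTRON_MASS = 0.000549  # u


def neutral_mass(formula):
    """Monoisotopic mass of a neutral formula such as 'C11H12N2O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ion_mz(formula, charge):
    """m/z of an ion; `formula` must already include any added or lost atoms.

    A positive ion has lost `charge` electrons, a negative ion has gained them.
    """
    return (neutral_mass(formula) - charge * ELECTRON_MASS) / abs(charge)


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    print(round(ion_mz("C11H12N2O2", +1), 5))  # tryptophan OE ion
    print(round(ion_mz("C11H13N2O2", +1), 5))  # tryptophan EE ion, [M+H]+
    print(round(ion_mz("C8H6NO4S", -1), 5))    # anion from the identification slide
    print(round(ppm_error(212.00217, 212.00230), 1))
```

Passing the ion formula (for example C11H13N2O2 for protonated tryptophan) rather than the neutral formula keeps the electron correction explicit, which is the point of the odd-electron and even-electron slide.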