Metabolomics
PCB 5530
Tom Niehaus
Fall 2014

Learning Outcomes
Day 1
• Lecture
  - Learn the basics of metabolomics
  - Understand the limitations of metabolomics
  - Things to consider when using metabolomics for your own research
Day 2
• Finish lecture
• Activity 1: Identifying an unknown peak
• Activity 2: Analyzing a metabolomics dataset

Definitions and Background
Metabolome = the total metabolite pool
• All low molecular weight (< 2000 Da) organic molecules in a sample such as a leaf, fruit, seedling, etc.
• Examples: sugars, nucleosides, organic acids, ketones, aldehydes, amines, amino acids, small peptides, lipids, steroids, terpenes, alkaloids, drugs (xenobiotics)

Definitions and Background
Metabolomics = high-throughput analysis of metabolites
• Metabolomics is the simultaneous measurement of the levels of a large number of cellular metabolites (typically several hundred).
• Many of these are not identified (i.e. they are just peaks in a profile).
• Not hypothesis driven
• Gives a snapshot of the metabolome

Definitions and Background
• Metabolomics - measure many compounds
• Metabolic profiling - measure a set of related compounds
• Targeted analysis - measure a specific compound
[Figure: the three approaches arranged along ‘Scope’ and ‘Accuracy’ axes]

Definitions and Background: History and Development
• Metabolic profiling is not new. Profiling for clinical detection of human disease using blood and urine samples has been carried out for centuries.
[Figure: urine wheel] This urine wheel was published in 1506 by Ullrich Pinder, in his book Epiphanie Medicorum. The wheel describes the possible colors, smells and tastes of urine, and uses them to diagnose disease. Nicholson, J. K. & Lindon, J. C. Nature 455, 1054–1056 (2008).
• Advanced chromatographic separation techniques were developed in the late 1960s.
• Linus Pauling published “Quantitative Analysis of Urine Vapor and Breath by Gas-Liquid Partition Chromatography” in 1971.
• Chuck Sweeley at MSU helped pioneer metabolic profiling using gas chromatography/mass spectrometry (GC-MS).
• Plant metabolic biochemists (e.g. Lothar Willmitzer) were among the other early leaders in the field.
• Metabolomics is expanding to catch up with other multiparallel analytical techniques (transcriptomics, proteomics) but remains far less developed and less accessible.

Definitions and Background: Plant Metabolome Size
• It is estimated that all plant species together contain 90,000–200,000 compounds.
• Each individual plant species contains about 5,000–30,000 compounds, e.g. ~5,000 in Arabidopsis.
The plant metabolome is much larger than that of yeast, where there are far fewer metabolites than genes or proteins (<600 metabolites vs. 6000 genes). The size of the plant metabolome reflects the vast array of plant secondary compounds. This makes metabolic profiling in plants much harder than in other organisms.

Definitions and Background: The Power of Metabolomics
Silent Knockout Mutations
• ~90% of Arabidopsis knockout mutations are silent, i.e. they have no visible phenotype and so provide no clues to gene function. (The search for some sort of visible phenotype therefore often becomes desperate.)
• The situation in yeast is similar: up to 85% of yeast genes are not needed for survival.
• When there is little or no change in the growth rate (visible phenotype) of a knockout mutant, the pool sizes of metabolites have altered so as to compensate for the effect of the mutation, leaving metabolic fluxes unchanged.
• Thus, intuitively, mutations that are silent when scored for metabolic fluxes or growth rate (growth rate is the sum of all metabolic fluxes) should have obvious effects on metabolite levels. There is a firm theoretical basis for this in metabolic control analysis (MCA).

Definitions and Background: The Power of Metabolomics
Example.
• In the Chloroplast 2010 project (phenotype analysis of knockouts of Arabidopsis genes encoding predicted chloroplast proteins), various knockouts showed essentially normal growth and color but highly abnormal free amino acid profiles, e.g. At1g50770 (‘Aminotransferase-like’).

Definitions and Background: Limitations of metabolomics
• High biological variance in metabolite levels (i.e., high variation between genetically identical plants grown in the same conditions)
• Unlike nucleic acids and proteins, metabolites have a vast range of chemical structures and properties. Their molecular weights span two orders of magnitude (20–2000 Da). Therefore no single extraction or analysis method works for all metabolites. (By contrast, DNA sequencing, microarrays, and MS analysis of proteins are all general methods.)
• Metabolite concentrations vary dramatically, from mM to pM.
• Some metabolites are labile and won’t survive extraction and analysis.
• Issues with chromatography, detection, and data analysis

Metabolomics: Steps in metabolomics
1) sample preparation
2) sample extraction
3) chromatography
4) detection
5) data analysis

Sample Preparation: Growth/Sample Size
• Grow organisms (e.g. plants or bacteria) under identical conditions
• Randomize the treatment groups (make sure the effects you measure are due to the variable being tested)
• The number of replicates depends on what you want to find
  - Large differences = small replication needed
  - Small differences = large replication needed
• In general, six replicates for each treatment are needed (due to high biological variability)

Sample Preparation: Sample collection
• Uniform sample sizes (e.g.
hole punches in leaves)
• Be consistent
  - similar tissue
  - time of day
• Quickly freeze sample in liquid nitrogen, store samples at -80°C
• Fast-harvesting method for bacteria (~30 sec)

Sample Extraction: Choosing an extraction method
• No universal extraction method exists
• Some solvents may degrade certain compounds
• It’s good to have some idea of what metabolites you want to extract

Sample Extraction
• The method should be consistent and reproducible (pictured: SPEX SamplePrep grinder)
• Further workup may be required (e.g. solid phase extraction)

Chromatography: Introduction
• Invented in 1900 by Mikhail Tsvet (used to separate plant pigments)
• There are several types of chromatography, but all consist of a stationary phase and a mobile phase. Compounds are separated based on differential partitioning between the two phases.
• Types include:
  - TLC (thin-layer chromatography)
  - GC (gas chromatography)
  - LC (liquid chromatography)
GC and LC are routinely used in metabolomics.

Chromatography: Gas Chromatography
• GC = ‘good chromatography’
• optimized over several decades
• ~5 columns routinely used (5% diphenyl/95% methyl siloxane)
• high reproducibility
• identification based on retention time (RT)
Limitations:
  - high temperatures can destroy labile compounds
  - polar compounds cannot ‘fly’ on GC columns and must first be derivatized

Chromatography: Sample derivatization
[Figure: reaction scheme (Step 1: methoximation, Step 2: silylation) and mass spectra (abundance vs. m/z) of the two derivatized estrone peaks, labeled ‘estrone minor_RI 950990’ and ‘estrone major_RI 948753’]
• Z/E isomers have the same mass spectrum but differ by 2 seconds in retention time
• Gas chromatography requires volatile compounds (two-step derivatization in vial):
1) Methoximation of aldehyde and keto groups (primarily for opening reducing ring
sugars)
2) Silylation of polar hydroxy, thiol, carboxy and amino groups with the silylation agent MSTFA
• A single compound with multiple active groups will result in multiple peaks (1TMS, 2TMS, 3TMS)
• GC-MS can distinguish some stereoisomers, such as Z/E isomers, by retention time even when their mass spectra are the same
Kind T, Wohlgemuth G, Lee do Y, Lu Y, Palazoglu M, Shahbaz S, Fiehn O. FiehnLib: mass spectral and retention index libraries for metabolomics based on quadrupole and time-of-flight gas chromatography/mass spectrometry. Anal Chem. 2009 Dec 15;81(24):10038-48. doi: 10.1021/ac9019522.

Chromatography: Liquid Chromatography
• LC = ‘Lousy chromatography’
• fairly new, recent advances
• thousands of columns available
  - normal phase
  - ion exchange
  - reverse phase
  - HILIC
• infinite solvent systems possible
• low reproducibility
Advantages:
  - compound can be collected after separation
  - derivatization not necessary
  - a separation protocol can be optimized for nearly any compound

Detection: Mass Spectrometry
• Mass spectrometry is a technique to measure the mass-to-charge ratio (m/z) of ions
• All mass spectrometers perform three main tasks:
1) Ionize molecules
2) Use electric and magnetic fields to accelerate ions and manipulate their flight
3) Detect ions (convert to electronic signal)
[Figure: example mass spectrum, relative abundance vs. m/z]
[Figure: GC-MS chromatogram (normalized intensity vs. time in min) with a peak selector, and the EI mass spectrum (normalized intensity vs. m/z) of the selected peak]

Mass Spectrometry: Ionization, chemical vs. electron
[Figure: chemical ionization (+) spectrum (TOF MS CI+) with base peak at m/z 397.1690 and peaks labeled [M+H]+, [M+28]+ and [M+40]+, shown with an electron ionization (+) panel. Settings listed on the slide: 70eV; 500uA emission; 40% CI gas; mass range 65-800; ScanRate 0.2-0.03; source temp 200C; PushInter 40.]
Accurate mass [u]: 397.1690; mass accuracy [ppm]: 5; isotopic abundance error [%]: 5; A+1 [%]: 37.90; A+2 [%]: 17.84; A+3 [%]: 5.03
• [M+H]+ is very abundant in chemical ionization (CI)
• Different ionization gases can be used, such as NH3, methane, butane
• Example picture: adduct ions at M+28.02 = [M+C2H5]+ and M+40.04 = [M+C3H5]+ are used for verification of [M+H]+

Adduct formation: expect the unexpected
The list is ranked by percentage and reads down each column pair.
Adduct ion Percent [%] | Adduct ion Percent [%] | Adduct ion Percent [%] | Adduct ion Percent [%] | Adduct ion Percent [%]
[M+H]+ 62.55381 | [M+H-C3H8O]+ 0.02667 | [M-CCl3]+ 0.00381 | [M(37Cl)]+. 0.00190 | [M-2H+Na]- 0.00127
[M+2H]2+ 11.44459 | [M-H-H2O-CO2]- 0.02667 | [M-H-CO2]- 0.00381 | [M-CH3]+ 0.00190 | [M-H+Co]+ 0.00127
[M+H-H2O]+ 8.77598 | [M-H-H2O-HCO2H]- 0.02667 | [M+H-C5H7PO6]+ 0.00381 | [M+H-C4H11N]+ 0.00190 | [M+H-(CH3)2NH-C3H6]+ 0.00127
[M-H]- 6.25214 | [M+H-3H2O]+ 0.02540 | [M+H-HCl]+ 0.00381 | [M+H-NO2-CHO]+ 0.00190 | [M+H-C10H6(OH)N]+ 0.00127
[M+Na]+ 5.51055 | [M+H-CHN]+ 0.02540 | [M+H-C12H12N2O3]+ 0.00381 | [M-H-HF]- 0.00190 | [M-H+Ni]+ 0.00127
[M+H-NH3]+ 1.19494 | [M+K-3H]2- 0.01905 | [M+H-CH3CO2H]+ 0.00381 | [M(37Cl)+H]+ 0.00190 | [M-H-H2O-C4H7CO2H]- 0.00127
[M+NH4]+ 0.73715 | [M+H-(CH3)2NH]+ 0.01524 | [M+H-CH3]+. 0.00381 | [M-H-C6H10O5]- 0.00190 | [M+H-OH]+ 0.00127
[M-H-H2O]- 0.34604 | [M+H-CHNO]+ 0.01333 | [M+H-H2]+ 0.00381 | [M+H-H2O-C6H13N]+ 0.00190 | [M(81Br)+H]+... 0.00127
[M-H+2Na]+ 0.32953 | [M+H-C2H6O]+ 0.01333 | [M+H-C3H8NO6P]+ 0.00317 | [M+H-H2O-H3PO4]+ 0.00190 | [M-H-CH2O-CH2NH]- 0.00127
[M-H+H2O]- 0.24508 | [M+H-CH4O]+ 0.01270 | [M+H-C5H14NO4P]+ 0.00317 | [M+H-C5H7PO6-NH3]+ 0.00190 | [M+H-CO-CONH]+ 0.00127
[M+NH4-H2O]+ 0.22984 | [M+H-C7H13NO3]+ 0.01143 | [M+Li-(CH3)3N]+ 0.00317 | [M-H-C5H7PO6]- 0.00190 | [M-H-CONH]- 0.00127
[M+H+H2O]+ 0.19429 | [M+Na-2H]- 0.00952 | [M+Li-C5H14NO4P]+ 0.00317 | [M+H-H2S]+ 0.00190 | [M+H-C3H4O2]+ 0.00127
[M+H+Na]2+ 0.18286 | [M-H-CH2O]- 0.00952 | [M+Cl]- 0.00317 | [M+H-H2O-C8H8]+ 0.00190 | [M+H-C3H6O4]+ 0.00127
[M+H+K]2+ 0.17524 | [M+H-C11H12N2O3]+ 0.00952 | [M(35Cl)-H]- 0.00317 | [M+H-H2O-NH3-C8H8]+ 0.00190 | [M+Na-H2S]+ 0.00127
[M-2H]2- 0.13968 | [M+H-C13H16N3O4]+ 0.00952 | [M(37Cl)-H]- 0.00317 | [M+H-H2O-NH3-C8H8-CO]+ 0.00190 | [M-H+2Na-H2S]+ 0.00127
[M+2Na]2+ 0.13778 | [M+H-C17H25N3O4]+ 0.00952 | [M-H-C5H7O6P]- 0.00317 | [M+H-H2O-NH3]+ 0.00190 | [M-C5H5Cl]+ 0.00127
[M+2H-NH3]2+ 0.13714 | [M+CH3CO2]- 0.00889 | [M+H-C3H7O5P]+ 0.00317 | [M+H-C3H6]+ 0.00190 | [M+H-N2]+ 0.00127
[M+K]+ 0.13651 | [M-H2O+Na]+ 0.00825 | [M-H-C6H6N8O]- 0.00317 | [M+HCO2-320]- 0.00190 | [M+H-H2O-CO]+ 0.00127
[M+H-2H2O]+ 0.11810 | [M-H+NH3]- 0.00762 | [M(81Br)+H]+ 0.00317 | [M+H-C3H7N]+ 0.00190 | [M-H-H3PO4]- 0.00127
[M+3H]3+ 0.06667 | [M+H-C9H9NO]+ 0.00762 | [M-C4H9]+ 0.00317 | [M-H-H2]- 0.00190 | [M+H+CH3CN]+ 0.00127
[M+2H-H2O]2+ 0.06476 | [M+H-C15H21N2O3]+ 0.00762 | [M-2H+3Li]+ 0.00254 | [M-H-C16H30O-H2O]- 0.00190 | [M+H-C4H6]+ 0.00127
[M]+. 0.05905 | [M-2H+3Na]+ 0.00698 | [M-H-HCl]- 0.00254 | [M-H-CH4O]- 0.00190 | [M+H-CH3OH]+ 0.00127
[M+2Na-H]+ 0.05143 | [M+HCO2]- 0.00635 | [M+2Li-H]+ 0.00254 | [M+H-C10H8FN3]+ 0.00127 | [M+H-HCCl3]+ 0.00127
[M-H+2K]+ 0.05079 | [M+H-NO2]+ 0.00571 | [M+H-C8H10O2]+ 0.00254 | [M+Li-C3H5NO2]+ 0.00127 | [M+H-C2H3N3]+ 0.00127
[M+H-CO]+ 0.04635 | [M+H-C6H13NO2]+ 0.00571 | [M+H-C2Cl4]+ 0.00254 | [M+Li-H3PO4]+ 0.00127 | [M+H-C3H6O2]+ 0.00127
[M+H-CO2]+ 0.04318 | [M-H-C3H5NO2]- 0.00508 | [M-H-C7H5NO]- 0.00254 | [M-2H+3Li-C15H31CO2H]+ 0.00127 | [M+H-CH2Cl2O]+ 0.00127
[M+H-CH2O2]+ 0.03810 | [M(81Br)-H]- 0.00508 | [M+H-C5H11N]+ 0.00254 | [M-2H+3Na-C3H5NO2]+ 0.00127 | [M(356)+H-HCl]+ 0.00127
[M-H-NH3]- 0.03746 | [M+H-HCO2H]+ 0.00508 | [M+Ba-H]+ 0.00254 | [M-2H+Na+Co]+ 0.00127 | [M-C4H4O4S]+ 0.00127
[M.Cl]- 0.03556 | [M-2H+Li]- 0.00444 | [M+H-C14H25NO3]+ 0.00254 | [M-2H+Li-C3H5NO2]- 0.00127 | [M+H-C8H14O3]+ 0.00127
[M+Li]+ 0.03111 | [M+H-CH4]+ 0.00444 | [M+H-C6H5NO2S]+ 0.00254 | [M-2H+Li-C16H30O]- 0.00127 | [M+H-C2H4]+ 0.00127
• …around 290 different adducts
• Statistics: adducts in NIST12 MS/MS DB (80,000 spectra)
• Most common adducts for LC-MS: [M+H]+, [M+Na]+, [M+NH4]+, [M+acetate]-

Mass Spectrometry: Mass Spectrometers
• There are several types of mass spectrometers:
  - TOF (time of flight)
  - Q, QQQ (quadrupole)
  - Ion Trap
  - Orbitrap
  - FTICR (Fourier transform ion cyclotron resonance)
[Figures: Quad, TOF]

Mass Spectrometry: Definitions and concepts
• isomers - compounds with the same chemical formula but different structures, e.g. propanol and isopropanol (C3H8O). C8H10N2O has 100,082,479 isomers
• isobars - compounds with the same nominal mass but different exact masses, e.g. CO (27.9949) and C2H4 (28.0313)
• isotopes - atoms of the same element with different numbers of neutrons in their nuclei, e.g. 12C vs 13C

Mass Spectrometry: Definitions and concepts
• Resolution (resolving power): RP(FWHM) = measured mass / peak width at 50% peak intensity
• Accuracy: difference between the true mass and the measured mass
• Mass range: range of ions that can be detected (typically 50-1000 m/z)

Mass Spectrometry: Why is resolution important?
• High resolution is needed to determine the accurate mass
• High resolution is also needed to determine accurate isotopic patterns
• Note:
  - monoisotopic vs. average mass
  - accurate mass can distinguish isobars, not isomers

Mass Spectrometry: Definitions and concepts
• Dynamic range - the concentration range over which a linear response is obtained. Determines the capability of an instrument to do quantitative analysis
• Sensitivity - the lowest amount an instrument can detect
• Matrix effects - signal is suppressed due to the complex sample or other unknown processes
• Speed - the number of spectra or scans that can be acquired in one second
  - 1 scan/sec = very slow
  - 500 scans/sec = very fast

Mass Spectrometry: Why is high speed important?
In order to deconvolute (separate/clean) overlapping peaks, enough mass spectra have to be acquired to perform the mathematical calculations. With only one spectrum per second this is impossible.
That requires:
a) fast scanning detectors like time-of-flight (TOF)
b) fast data acquisition hardware/software (DAC/ADC)
The LECO TOF can acquire up to 500 mass spectra per second. For GC-MS, 20 spectra/sec is sufficient; for comprehensive GC (GCxGC), up to 200 spectra/sec are needed.
Source: LECO ChromaTOF Helpfile

Mass Spectrometry: Properties of various mass spectrometers
                | TOF       | Quad      | Ion Trap  | Orbitrap  | FT-ICR
Resolving Power | very good | fair      | fair      | very good | excellent
Dynamic Range   | very good | excellent | fair      | fair      | fair
Sensitivity     | excellent | excellent | excellent | excellent | excellent
Speed           | excellent | good      | excellent | good      | fair
Cost            | 150-300K  | 100K      | 100K      | 500K      | 1M
Maintenance     | ave       | ave       | ave       | ave       | very high

Data Analysis: Goals
• Huge data files
• Identify all peaks
  - In practice this is very difficult if not impossible
• Quantification or semi-quantification of compounds
  - Often involves comparing fold changes in samples or groups of samples, e.g. wild-type vs knockout plant
  - Various statistical tests to look for differences in the treatment groups, e.g.
PCA, MCA, ANOVA

Data Analysis: Identifying peaks
• MS libraries (e.g. the NIST library) can identify peaks (mostly GC/MS), especially when combined with RT information (GC/MS only)

Data Analysis, Activity 1: Identifying peaks
• Can you find sucrose in an MS dataset? Example: sucrose (C12H22O11)
• Accurate mass can help determine the chemical formula:
  - Determine the monoisotopic mass at http://www.chemspider.com/ (342.116211 Da)
  - Determine M+H from the MS adduct Excel sheet (class website) (343.123487 Da)
• Let’s say you find that mass in the dataset, but is it really sucrose?
  - Download Molecular Weight Calculator at http://www.alchemistmatt.com/mwtwin.html
  - Open the formula finder under Tools
  - Enter molecular weight target: 342.116211
  - How many isobars are there at 2 ppm? At 0.1 ppm?
  - Enter 342.116211 at ChemSpider. How many isomers?

Data Analysis: Example output of a metabolomics experiment
• Open GC-TOF-MS dataset from class website:
  - How many compounds were identified? How many significant fold changes?
  - Pathway analysis at http://www.metaboanalyst.ca/MetaboAnalyst/
  - Enter compound names or KEGG IDs for significant fold changes
  - Choose organism ‘E. coli’ and submit
  - Which pathways are affected in this dataset?
• Open HILIC-TOF-MS dataset from class website:
  - How many compounds were identified? How many significant fold changes?
  - How many unidentified peaks?
  - Can you identify an unknown peak with a significant fold change?
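
The mass arithmetic in Activity 1 can be checked with a short script. The Python sketch below is only an illustration and is not part of the class materials. It sums standard monoisotopic masses for C12H22O11, adds 1.007276 Da (the difference between the two sucrose values quoted in the activity) to get [M+H]+, and reports the m/z window for a tolerance given in ppm. The function names and the small element table are assumptions made for this example.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
# Standard reference values, rounded; only the elements needed here are listed.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.007825032,
    "N": 14.003074005,
    "O": 15.994914620,
}

# Mass added on protonation (Da). Taken as the difference between the two
# sucrose values quoted in Activity 1: 343.123487 - 342.116211.
PROTON_MASS = 1.007276


def monoisotopic_mass(formula):
    """Return the monoisotopic mass of a simple formula such as 'C12H22O11'.

    Hypothetical helper: brackets, charges, and isotope labels are not handled.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula):
    """Return m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON_MASS


def ppm_window(mz, ppm):
    """Return the (low, high) m/z limits for a tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


sucrose = "C12H22O11"
print(f"M      = {monoisotopic_mass(sucrose):.6f} Da")
print(f"[M+H]+ = {mz_protonated(sucrose):.6f}")
for ppm in (2.0, 0.1):
    low, high = ppm_window(mz_protonated(sucrose), ppm)
    print(f"{ppm:>4} ppm window: {low:.6f} to {high:.6f}")

# The isobar example from the definitions slide: same nominal mass (28),
# different exact masses.
print(f"CO   = {monoisotopic_mass('CO'):.4f}")
print(f"C2H4 = {monoisotopic_mass('C2H4'):.4f}")
```

At 2 ppm the window around the sucrose [M+H]+ ion is only about ±0.0007 m/z. That is why the number of candidate formulas falls as the tolerance tightens. The calculation says nothing about isomers, which share the same formula and therefore the same mass.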