Dealing With the Unknown
Metabolomics & Metabolite Atlases
Ben Bowen
Pathway Tools Workshop 2010

Acknowledgements
Trent Northen, Richard Baran, Wolfgang Reindl, Do Yup Lee, Jane Tanamachi, Jill Banfield, Curt Fisher, Paul Wilmes
US Department of Energy BER Genome Sciences Program

LC-MS/MS Workflow
• Metabolite solvent extraction
• Sample independent: suitable for unsequenced organisms and communities
• HPLC (C18; HILIC)
• Agilent 6520 QTOF MS/MS
• Metabolite ‘features’ & quantification
Example features:
C18NEG/255.22807/3.39329/Hexadecanoic acid;
C18NEG/255.22862/4.89002/Hexadecanoic acid;
C18NEG/248.8424/1.47135/24-Dibromophenol;
C18NEG/112.98576/27.34079/Acetylenedicarboxylate;
C18NEG/270.82471/1.34821/
C18NEG/168.88735/1.29241/

How a data point becomes a compound
Photo: John Waterbury, Woods Hole Oceanographic Institution (DOE)
From Feature to Formula
• Selection of features
• Pure spectra
• Isotopic pattern fitting
• Stable isotope labeling
From Formula to Compound
• Exact match to MS/MS spectra
• Partial match to MS/MS spectra
• Exchangeable hydrogen
• Retention time
• Authentic standards
• Other (NMR & synthesis)
Annotation of Metabolite Atlases
• Define feature in database
• Sample metadata
• Extraction methods
• LC/MS methods
• mz@rt annotations

Systems biology depends on accurate models
Analysis of MetaCyc shows that many unique formulas appear in only a few reactions or pathways.
Pathway-specific markers or sparsity of knowledge?
• Models provide a framework to prove or disprove observations.
• Models highlight gaps in annotations when new compounds are discovered.

Using inexact mass for formula ID
• C & N isotopic labels
• Isotopic pattern fitting
• Reduce degeneracy about an m/z value
• Mass and degeneracy are correlated
• Heuristically filtered brute-force method

Large-scale formula determination using stable isotopic labeling
Problem: Many metabolites are difficult to identify given the low coverage of authentic standards.
Approach: Stable isotope labeling (SIL) for direct empirical formula determination.
Control; Na15NO3; NaH13CO3
Baran et al.,
Untargeted metabolite profiling of Synechococcus sp. PCC 7002 reveals a large fraction of unexpected metabolites (Analytical Chemistry 2010)

Less Degeneracy Isn’t Better
We prefer to work with unique chemical formulae.
[Figure panels: Heuristically Filtered Only; Unfiltered + SIL; Heuristically Filtered + SIL; Noise & Isotopic Patterns]

Initial focus is on Synechococcus sp., a simple yet important model system
Simple system for method development. Widely distributed and globally important in carbon cycling.
1. Photosynthetic bacteria
2. Small genome (3299 ORFs)
3. Fairly fast growing and easy to grow
4. No metabolite background (salt media)
5. Adaptable: 0-2 M salt, T up to 45 °C

Benefits of Using SIL
• Are the signals being measured biological?
• What type of ion is the signal?
• Has this signal been seen before?
• What compound(s) is it?
• What else in the sample behaves like that compound?

Global Profiling
SIL
Standards

Stable isotope labeling
• Control
• [15N]NaNO3 (15N)
• [13C]NaHCO3 (13C)

Stable isotope labeling
Non-biological features dominate
• Manually curated
• Computationally identified
• Sets are constructed by grouping features by retention time

Results
• ~100 distinct metabolites detected
• 82 assigned chemical formulas
• 74 unique
• 45 outside of Syn7002Cyc
• 24 outside of MetaCyc or KEGG
• 54 identified or putatively identified metabolites, using authentic standards or MS/MS

Most dominant biological features
Peak height columns in the source: cell extract (+), (-); media extract (+), (-). Values are listed in source order. Formula matches are counted in 7002, MetaCyc, and KEGG.

Formula       Metabolite                      Peak heights     7002  MetaCyc  KEGG
C9H18O8       (Glucosylglycerol)              452242, 658300   1     2        2
C5H9NO4       Glutamate                       228714, 44229    3     9        10
C25H40N2O18   (Hexos(amine)-based oligomer)   184691, 90745    0     0        0
C25H40N2O18   (Hexos(amine)-based oligomer)   174581, 152126   0     0        0
C9H16O9       (Glucosylglycerate)             39066, 163000    0     2        1
C12H22O11     (2Hexoses-H2O)                  19819, 83700     2     26       29
C9H15N3O2     (NNN-trimethylhistidine)        69974, 2444      0     1        1

Putative hexose(amine)-based trisaccharide:

Excreted metabolites
Peak height columns in the source: cell extract (+), (-); media extract (+), (-). Values are listed in source order. Formula matches are counted in 7002, MetaCyc, and KEGG.

Formula       Metabolite        Peak heights               7002  MetaCyc  KEGG
C9H11NO2      Phenylalanine     12860, 8878, 24417, 8259   1     4        4
C3H7NO2       (Alanine)         3987, 7325, 2479, 1500     4     7        8
C6H13NO2      Isoleucine        1200, 1301, 4427, 1532     2     8        11
C6H13NO2      Leucine           2089, 1992, 4093, 1707     2     8        11
C11H12N2O2    Tryptophan        1778, 2264, 929            1     2        7
C5H11NO2S     Methionine        950                        1     5        4
C5H11NO2      Valine            600                        1     8        10
C10H14N2O6    Methyluridine     570, 220                   0     0        2
C11H15N5O5    Methylguanosine   350, 140                   0     3        1
C11H15N5O4    Methyladenosine   310                        0     1        2

Histidine-betaine derivatives
[Figure: chemical structures]
• Previously attributed only to non-yeast fungi and Actinomycetales bacteria
• Culture purity validated by PCR of ribosomal RNA markers and sequencing

N2-acetyllysine
• Lysine biosynthesis VI (Syn7002Cyc)
• Lysine biosynthesis V (Syn7002Cyc)

Analyze selected features by MS/MS
Target features at specific m/z & r.t.

MS/MS structural confirmation
• Commercial standards
• Metlin
• Massbank
• Collaborating to expand the number of authentic standards (Siuzdak, Mukhopadhyay) and make these publicly available.

De novo MS/MS analysis
5-methyluridine

Proton Painting
CiHjOkNxPySz becomes Ci(HN_j1 HEX_j2)OkNxPySz, where j = j1 + j2 (HEX: exchangeable hydrogens; HN: non-exchangeable hydrogens)
Chemical properties in addition to m/z
Examples: decyldimethylammoniopropane sulfonate; glycylglycine

Lipids from microbial communities
• Unlabeled
• 15N labeled
• 2H labeled (exchangeable)
• Sample independent

Resolve Isomers of lysolipids

Pure-Spectra Includes Ca2+ & Fe2+ Adducts

Absolute abundance of L-PE features is much higher in a “friable” sample.
Samples: AB Muck DS2; AB Muck Friable

Relative abundance of various PEs changes with development stage.
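
The SIL idea used throughout this deck can be illustrated with a little arithmetic: when cells are grown on 13C bicarbonate or 15N nitrate, a feature's m/z shifts by roughly one isotope mass difference per carbon or nitrogen atom, so the shift gives the atom count. The sketch below is only an illustration under simplifying assumptions (complete label incorporation, singly charged ions); the function name `count_atoms` and the example m/z values are hypothetical, and the labeled values are computed from the formula rather than measured.

```python
# Mass differences between heavy and light isotopes, in Da.
DELTA_13C = 1.0033548  # 13C minus 12C
DELTA_15N = 0.9970349  # 15N minus 14N


def count_atoms(mz_unlabeled, mz_labeled, delta, charge=1):
    """Estimate how many atoms of one element a feature contains.

    Assumes complete label incorporation, which is an idealization.
    Returns the rounded atom count and the leftover shift error in Da.
    """
    shift = (mz_labeled - mz_unlabeled) * charge
    n = round(shift / delta)
    return n, shift - n * delta


# Illustrative values only: an [M+H]+ ion of a C11H12N2O2 compound.
mz_control = 205.0972
mz_13c = mz_control + 11 * DELTA_13C  # hypothetical fully 13C-labeled peak
mz_15n = mz_control + 2 * DELTA_15N   # hypothetical fully 15N-labeled peak

n_c, err_c = count_atoms(mz_control, mz_13c, DELTA_13C)
n_n, err_n = count_atoms(mz_control, mz_15n, DELTA_15N)
print(f"carbons: {n_c}, nitrogens: {n_n}")
```

Fixing the carbon and nitrogen counts this way removes most candidate formulas for a given m/z. Real data would also need label purity and percent incorporation as parameters, which this sketch ignores.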
Moving from features to formulas to metabolites is challenging
Feature at m/z 205.097 → chemical formula determination → C11H12N2O2 → structural analysis
[Figure: feature intensity versus time (sec)]

Retention Time Correlation
After 12 observations
Store retention time correlations

SIL Automatic Annotation
• Test the fit for all possible formulas for common ionization mechanisms
• Label purity and percent incorporation are parameters

Correlation and mass defect analysis
[Figure: correlation G() versus m/z lag near the C2H4 mass difference (about 28.00 to 28.06); Kendrick mass defect versus nominal mass]

Modular Metabolome
Autocorrelation spectra of unprocessed data
• Find the dominant mass differences in data
• Estimate the likelihood of all possible chemical differences
How can you know that this is CH2?
[Figure: correlation G() versus m/z lag, 13.99 to 14.06; H2O peak labeled]

What can be resolved
Mass of an electron shown for scale
[Figure: correlation G() versus m/z lag, 0.98 to 1.05, with an inset on a 10^-3 scale]

Time and Mass Correlation
• C2H4: positive time correlation
• Neutron: zero time correlation
• H2O: mixture of zero and negative time correlation
Relate back to features
[Figure: correlation G() versus m/z lag, 16.94 to 17.12]

Microbial Metabolite Atlases

From Features to Pure Spectra
Within one experiment: 1000s of features from 100s of metabolites
[Figure: intensity versus retention time (sec); retention time (sec) versus m/z]

The End
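
As a supplement to the mass defect slides, the following sketch shows how a Kendrick mass defect can be computed. It assumes the conventional CH2 base (the slides also label C2H4, so the base used there may differ) and the sign convention "nominal Kendrick mass minus Kendrick mass"; both are assumptions, as are the function name and the example masses, which are computed from formulas rather than taken from the data shown.

```python
CH2_EXACT = 14.01565  # exact mass of CH2 (12C + 2 x 1H), in Da
CH2_NOMINAL = 14.0    # nominal mass of CH2


def kendrick_mass_defect(mass, base_exact=CH2_EXACT, base_nominal=CH2_NOMINAL):
    """Return the Kendrick mass defect of a mass for a given repeat unit.

    The mass is rescaled so that the repeat unit weighs exactly its nominal
    mass; the defect is the distance to the nearest integer (assumed sign:
    nominal minus Kendrick mass).
    """
    kendrick_mass = mass * base_nominal / base_exact
    return round(kendrick_mass) - kendrick_mass


# Neutral monoisotopic masses computed from the formulas (illustrative).
m_c16 = 256.24023  # C16H32O2, hexadecanoic acid
m_c18 = 284.27153  # C18H36O2, differs from C16H32O2 by C2H4

kmd_c16 = kendrick_mass_defect(m_c16)
kmd_c18 = kendrick_mass_defect(m_c18)
print(f"KMD C16: {kmd_c16:.4f}, KMD C18: {kmd_c18:.4f}")
```

Compounds that differ only by CH2 units (and therefore also by C2H4) land on nearly the same defect, which is why homologous series line up horizontally in a plot of Kendrick mass defect against nominal mass.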