EMBO Practical Course on Metabolomics Bioinformatics for Life Scientists
“Dissecting an untargeted metabolomic workflow”
Oscar Yanes, PhD

Untargeted metabolomics workflow
• Sample preparation
• Experimental design
• Sample analysis by MS and NMR
• Pre-processing / data analysis (EMBO Course)
• Metabolite identification
• Experimental validation
• Hypothesis

Ultimate goal of metabolomics
• List of metabolites differentially regulated
• Biomarker discovery
• Pathway analysis
• Model construction
• Scientific literature
• Disease vs. control
• Mechanism
• Validation
• Hypothesis

THE IMPORTANCE OF EXPERIMENTAL DESIGN
COLLABORATOR: I want to do metabolomics
ME: …
COLLABORATOR: I have many samples at -80°C. Could you do metabolomics and find out something?
ME: !!

BASIC DIAGRAM OF A MASS SPECTROMETER
• Gas-phase: Gas chromatography
• Liquid-phase: Liquid chromatography, Capillary electrophoresis
• Solid-phase: Surface-based

BASIC DIAGRAM OF A MASS SPECTROMETER
• Electron ionization (EI)
• Chemical ionization (CI)
• Atmospheric pressure chemical ionization (APCI)
• Electrospray ionization (ESI)
• Laser desorption ionization (LDI)

Watch out for serum/plasma samples from biobanks!
[Figure: four panels (Lactate, Glucose, Pyruvic Acid, Choline); y-axis: Area/Area (IS); x-axis: Time (h) at 0, 4, 12, 24]

Requisite for untargeted metabolomics
Maximize ionization efficiency over the whole mass range (e.g., m/z 80-1500)
• Number of features: coverage of the metabolome
• Intensity of the features: accurate quantification and identification of metabolites

How do we increase the number of features and their intensity??
[Figure: 3D plot with axes intensity, mass, time]
Feature: molecular entity with a unique m/z and retention time value
• Sample preparation:
  - Extraction method
• Chromatography:
  - Stationary-phase
  - Mobile-phase
• Ion Funnel Technology
• etc.

Extraction method
• Hot EtOH/Amm. Acetate
• Cold Acetone/MeOH
• Only 45% of the metabolites are detected with Acetone/MeOH
• MS/MS threshold
Yanes O., et al. Anal. Chem. 2011; 83(6):2152-61

Liquid Chromatography: mobile-phase
• Ammonium fluoride
• Ammonium acetate
• Formic acid
Yanes O., et al. Anal. Chem. 2011; 83(6):2152-61

Chromatography: stationary phase
• HILIC
• RP C18/C8
• Effect of pH; ammonium salts; ion pairs (e.g. TBA)

LC flow rate and pressure: UPLC vs. HPLC vs. nanoLC (vs. GC!)
[Figure: HPLC vs. UPLC chromatograms; x-axis: Minutes]

PRACTICAL ASPECTS
1. Number of scans/second
   Implications in LC/MS and GC/MS:
   - Quantification
   - Maximum intensity or integrated area
2. Instrument resolution
   Implications:
   - Detector saturation
   - Quantification
3. Sample amount injected
   Implications:
   - Detector saturation

FROM RAW DATA TO METABOLITE IDs
• RAW METABOLOMICS DATA (LC/MS, GC/MS)
• RAW DATA CONVERSION
• PRE-PROCESSING
• STATISTICAL ANALYSIS
• METABOLITE IDENTIFICATIONS
• PATHWAY ANALYSIS

LC-MS WORKFLOW
• LC-MS RAW DATA
• PROTEOWIZARD
• mzData
• PREPROCESSING
• mZRT Features Table
• STATISTICAL ANALYSIS
• IDENTIFICATION
[Table residue: mZRT features table with rows mZRT1, mZRT2, mZRT3, ..., columns M1, M2, ..., and cell values I]
Feature: individual ions with a unique mass-to-charge ratio and a unique retention time

LC-MS WORKFLOW
RAW LC-MS DATA TO mzXML: PROTEOWIZARD [Nature Biotechnology, 30 (918–920) (2012)]

VENDOR          FORMATS                      CONVERTER
Agilent         MassHunter.d                 ProteoWizard
Bruker          Compass.d, YEP, BAF, FID     ProteoWizard
Thermo Fisher   RAW                          ProteoWizard
Waters          MassLynx.raw                 ProteoWizard
AB Sciex        WIFF                         ProteoWizard

LC-MS WORKFLOW
XCMS PRE-PROCESSING
• http://metlin.scripps.edu/download/
• Free & Open Source
• Based on R
• On-line version
• Suitable for:
  - GC-MS
  - LC-MS
Analytical Chemistry, 78(3), 779–787, 2006
Analytical Chemistry, 84(11), 5035-5039, 2012

LC-MS WORKFLOW
XCMS PRE-PROCESSING
1. FEATURE DETECTION [BMC Bioinformatics, 2008 9:504]
   1. Dense regions in m/z space
   2. Gaussian peak shape in chromatogram
2. RETENTION TIME CORRECTION

LC-MS WORKFLOW
• 10^3-10^4 mZRT features: IDENTIFICATION NOT FEASIBLE!
• Feature redundancy:
  - adducts: [M+H+], [M+Na+], [M+NH4+], [M+H+-H2O]…
  - isotopes: [M+1], [M+2], [M+3]
• Many mZRT features are noisy in nature and irrelevant to our phenomena

STATISTICAL ANALYSIS: FEATURES RANKING
Those features varying according to our phenomena are retained for further identification experiments

LC-MS WORKFLOW
FEATURES RANKING CRITERIA (I): ANALYTICAL VARIABILITY
- RANDOMIZE (WORKLIST)
- USE QCs TO CHECK ANALYTICAL VARIATION

Coefficient of variation of feature j, where S is the standard deviation and \bar{X} the mean intensity of the feature; the superscripts T and QC mark the set of injections over which they are computed (QC: quality-control samples):

\[ CV^{T}_{mZRT(j)} = \frac{S^{T}_{mZRT(j)}}{\bar{X}^{T}_{mZRT(j)}} \times 100 \]

\[ CV^{QC}_{mZRT(j)} = \frac{S^{QC}_{mZRT(j)}}{\bar{X}^{QC}_{mZRT(j)}} \times 100 \]

USEFUL PLOTS IN EXPLORATORY DATA ANALYSIS
• RETINAS: Hypoxia (N=12) vs Normoxia (N=13), #mZRT=7654
• NEURONAL CELL CULTURES: KO (N=15) vs WT (N=11), #mZRT=6831

LC-MS WORKFLOW
FEATURES RANKING CRITERIA (IV): HYPOTHESIS TESTING + FDR
#features=4704
• α=0.05 (235 features significantly varied by chance, 26% out of 900)
• FDR=0.0074 (20 features varied by chance, 5% out of 404)

LC-MS WORKFLOW
• 10M data points
• # mZRT=51908
• (i) analytical variability: # mZRT=38377
• (ii) features intensity: # mZRT=4704
• (iii) hypothesis testing + fold change: # mZRT=250
• Annotation, Data Base look-up, Identification experiments
• 10-50 differential metabolites

Workflow for Metabolite Identification
Step 1: Select interesting features
Step 2: Search databases for accurate mass
Step 3: Filter “putative” identification list
Step 4: Compare RT and MS/MS of standards

Step 2: Search databases for accurate mass
• Each feature returns many hits.
• Databases: HMDB, Metlin
• Common adducts: Na+, NH4+, K+, Cl-, and H2O loss
• Adducts increase number of hits returned!

Step 3: Filter “putative” identification list
Eliminate:
• drugs?
• intensity in the mass spectrum
• adducts?
• matches with obviously inconsistent retention times
Example: a feature with m/z 733.56 is unlikely to be a phospholipid if it has a 1-min RT with reverse-phase chromatography.
Look for hits that implicate the same pathway and give those features priority.
Standards can be expensive; your intuition will save you money and time!

What experimental data should be required to constitute a metabolite identification?
• Accurate mass?
• Retention time?
• MS/MS data?
Unlike proteomics, no journals have requirements or guidelines for publication of metabolite identifications.
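The accurate-mass search of Step 2 can be sketched in a few lines. This is a minimal illustration, not the HMDB or Metlin search itself: the function name `search_accurate_mass`, the five-compound `TOY_DB` and the 10 ppm default tolerance are assumptions made for the example. Adduct mass shifts are standard monoisotopic values corrected for the electron mass.

```python
from typing import Dict, List, Tuple

# Mass shift (Da) between the neutral monoisotopic mass M and the observed ion.
# Only a handful of common adducts are listed; real searches use longer tables.
ADDUCT_SHIFTS: Dict[str, Dict[str, float]] = {
    "positive": {
        "[M+H]+": 1.007276,
        "[M+Na]+": 22.989218,
        "[M+NH4]+": 18.033823,
        "[M+K]+": 38.963158,
        "[M+H-H2O]+": -17.003289,
    },
    "negative": {
        "[M-H]-": -1.007276,
        "[M+Cl]-": 34.969402,
    },
}

# Hypothetical toy database (name -> neutral monoisotopic mass in Da),
# standing in for a real metabolite database.
TOY_DB: Dict[str, float] = {
    "glutamine": 146.069142,
    "lysine": 146.105528,
    "glucose": 180.063388,
    "citric acid": 192.027003,
    "isocitric acid": 192.027003,
}


def search_accurate_mass(
    mz: float,
    polarity: str,
    db: Dict[str, float] = TOY_DB,
    tol_ppm: float = 10.0,
) -> List[Tuple[str, str, float]]:
    """Return (compound, adduct, ppm error) for every database entry whose
    neutral mass matches the observed m/z under any adduct hypothesis."""
    hits = []
    for adduct, shift in ADDUCT_SHIFTS[polarity].items():
        neutral = mz - shift  # neutral mass implied by this adduct
        for name, mass in db.items():
            ppm = (neutral - mass) / mass * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((name, adduct, round(ppm, 2)))
    # Smallest absolute mass error first
    return sorted(hits, key=lambda h: abs(h[2]))


if __name__ == "__main__":
    # One negative-mode feature already returns two isomeric candidates.
    for hit in search_accurate_mass(191.0197, "negative"):
        print(hit)
```

Every adduct hypothesis is a separate pass over the database, which is why adding adducts (or widening the ppm window) increases the number of putative hits that Step 3 then has to filter.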
accurate mass
“The identification of certain metabolites as their exact masses in their given biological context was strategic in the context of searching for biomarkers for CD.”

accurate mass and retention time
“…this method enables untargeted profiling of metabolites using accurate mass-retention time (AMRT) identifiers.”

accurate mass, retention time, and MS/MS
“Metabolites were putatively identified on the basis of accurate mass and retention time, and confirmed by comparing MS/MS data of unknowns to model compounds.”

accurate mass
Accurate mass identifications are putative
• All structures have a neutral mass of 146.0691
• Mass error (even if small) and adducts add more possibilities!

accurate mass and retention time
Many structural isomers have the same retention time
• Citrate and isocitrate have the same retention time but different MS/MS patterns.
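The observation above that isomers can share a retention time yet differ in MS/MS pattern suggests scoring spectra numerically. The sketch below uses a cosine similarity between two centroided spectra; this is one common choice and is not claimed to be the scoring used in the work cited on these slides. The function name, the 0.01 m/z tolerance and the example peak lists are made up for illustration.

```python
import math
from typing import List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def cosine_similarity(spec_a: Spectrum, spec_b: Spectrum, tol: float = 0.01) -> float:
    """Cosine score between two MS/MS spectra.

    Peaks are paired one-to-one when their m/z values differ by at most `tol`.
    Pairing is greedy (closest free peak), which is adequate for well-separated
    peaks but is not an optimal assignment.
    """
    used = set()
    dot = 0.0
    for mz_a, int_a in spec_a:
        best_idx, best_diff = None, tol
        for idx, (mz_b, _) in enumerate(spec_b):
            diff = abs(mz_a - mz_b)
            if idx not in used and diff <= best_diff:
                best_idx, best_diff = idx, diff
        if best_idx is not None:
            used.add(best_idx)
            dot += int_a * spec_b[best_idx][1]
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Invented peak lists: same fragment m/z values, different intensity pattern.
    standard = [(60.0, 10.0), (100.0, 100.0), (140.0, 30.0)]
    unknown = [(60.0, 100.0), (100.0, 10.0), (140.0, 30.0)]
    print(cosine_similarity(standard, standard))
    print(cosine_similarity(standard, unknown))
```

A score close to 1 means the fragment intensities follow the same pattern; two spectra with identical fragment masses but different relative intensities score much lower, which is the situation that distinguishes isomers such as citrate and isocitrate.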
accurate mass, retention time, and MS/MS
“Metabolites were putatively identified on the basis of accurate mass and retention time, and confirmed by comparing MS/MS data of unknowns to model compounds.”

Step 4: Compare RT and MS/MS of standards
[Figure: Q-TOF spectra of the 7α-hydroxy-cholesterol standard and of the biological sample, both labelled at m/z 367.33; x-axis: Mass-to-Charge (m/z), 60-420]

Step 4: Compare RT and MS/MS of standards
Retention time is available from the profiling experiment; obtaining MS/MS data for the feature of interest in the research sample, however, typically requires another experiment.
Note: MS/MS only needs to be performed on one research sample. Pick a sample from the group for which the feature is upregulated!
[Figure annotation: “Do not pick this group”]

What if feature of interest is not in the database?
(or model compound is not commercially available)
• FT-ICR MS can be used to limit chemical formulas
• MS/MS can reveal structural insight (MS/MS library, bioinformatic approaches)
• NMR can provide structural details
When a chemist is your best friend…

• Thermophile organism adapted to live at high temperatures.
• Organisms challenged with cold temperature (72 ºC) and compared to high-temperature (95 ºC) controls.

Feature up-regulated at cold temperature: Identification???
• Natural product vs. N1-Acetylthermospermine
• * Intensity of m/z 112 fragment is significantly different. NOT A MATCH!

Chemical synthesis of hypothesized structure is required
Synthesized metabolite produces MS/MS data comparable to the natural product from Pyrococcus furiosus.
[Figure panels: Natural product, N4-(N-Acetylaminopropyl)spermidine, N1-Acetylthermospermine]

Validate your metabolites!!
• Targeted metabolomics
  - LC and GC-Triple quadrupole MS
• Molecular biology techniques
  - Immunohistochemistry
  - Reverse Transcription-PCR
  - Gene expression array
  - Cell cultures
  - Animal experimentation
  - …..

Thank you
email: oscar.yanes@urv.cat
web: www.yaneslab.com
Twitter: @yaneslab