Bioinformatics for Targeted Metabolomics: Met and Unmet Needs
Klaus M. Weinberger
Biocrates Life Sciences AG, Innsbruck, Austria
3rd Annual Forum for SMEs
Information Workshop on European Bioinformatics Resources
Vienna, September 3 – 4, 2009

Agenda
• Why (targeted) metabolomics?
• Proof-of-concept in routine clinical diagnostics
• Technology platform
• Workflow integration & data analysis
• Issues
• Acknowledgements

BIOCRATES
Socrates (470-399 BC): intelligence, wisdom
Hippocrates (460-377 BC): medicine, health
“Creating Knowledge for Health”

Metabolomics is...
... the systematic identification and quantitation of all / biologically relevant small molecules* in a given compartment, cell, tissue or body fluid.
It represents the functional end-point of physiological and pathophysiological processes, depicting both genetic predisposition and environmental influences like nutrition, exercise or medication.
* no biopolymers (nucleic acids, polypeptides)

Why (targeted) metabolomics?
[Figure: Six systems biologists examining an elephant]

Why metabolomics?
[Figure: scale of the molecular levels. DNA 2.5·10^4; transcription; RNA ~10^5; translation; polypeptides ~10^6; PTM; proteins ~10^7; enzymatic activity, transport etc.; metabolites ~10^4]
Metabolites
• Functional end-point of physiology and pathophysiology
• Reasonable scale of the analytical challenge
• Direct mirror of environmental influences
  - (Mal-)nutrition
  - Exercise
  - Medication

Metabolomics approaches
Sample cohorts
Metabolic profiling (e.g. full scan LC-MS)
Differential pattern information

HPLC-ToF-MS of urine samples
[Figure: +TOF MS spectrum, 4.995 to 9.994 min, from PR01-40-1_040092_56_1_0204029486.wiff; saturation correction applied; background subtracted (12.994 to 13.994 min); max. 32.0 counts]
Axes: m/z (amu) vs. intensity (counts); individual peak labels omitted
Sample: mouse urine ID 0204029486 (3/8)
HPLC: Waters Atlantis dC18
Injection volume: 10 µl
Detection: pos. ToF-MS, m/z 100-1500
Mass accuracy: ~ 2 ppm
Data content: c. 2500 features per spectrum for statistical assessment

PCA of LC/MS profiling data
[Figure panels: Candidate drug vs. Untreated; Untreated vs. Rosiglitazone]

Metabolomics approaches
Sample cohorts
Metabolic profiling (e.g. full scan LC-MS)
Differential pattern information
Identification of relevant metabolites
Targeted metabolomics (ID / quantitation by SID on MS/MS)
Metabolite concentration shifts
Functional annotation

Pathway mapping of quantitative Mx data
[Figure: urea cycle and NO synthesis. Metabolites: Asp, Cit, Argsucc, Fum, Arg, Orn, Urea, Carb-P, NO. Enzymes: ASS, ASL, ARG, OCT, NOS]

Areas of application
Basic research
- Functional genomics in biochemistry, physiology, cell biology, microbiology, ecology, …
Agricultural & nutrition industry
- Plant intermediary metabolism
- Health effects of functional food products
Biotechnology
- Optimization and monitoring of fermentation processes
Pharmaceutical R&D
- Pathobiochemistry / characterization of disease models
- Safety / toxicology
- Efficacy / pharmacodynamics and mode-of-action
Clinical diagnostics & theranostics
- Early diagnosis and accurate staging
- Specific monitoring of therapeutic effects

History and proof-of-concept in clinical diagnostics

Sir Archibald Edward Garrod
• 1857, London – 1936, Cambridge
• Educated in Marlborough, Oxford, and London
• Postgraduate studies at the AKH in Vienna in 1884/85
• Publications on chemical pathology (e.g.
of alkaptonuria, cystinuria, pentosuria)
• One gene – one enzyme hypothesis
• Concept of inborn errors of metabolism (Croonian lectures to the Royal College of Physicians, 1908)

Proof-of-concept in neonatology
• Newborn screening for inborn metabolic disorders
  - replaced expensive monoparametric assays
  - simultaneous detection of 40 - 60 metabolites (amino acids, acylcarnitines)
  - simultaneous diagnosis of 20 - 30 monogenic diseases (AA metabolism, FATMO) with immediate treatment options
  - total incidence > 1:2000
  - unprecedented sensitivity, specificity, PPV
  - co-pioneered in the mid-90s by BIOCRATES founder Bert Roscher
  - > 1,300,000 newborns screened in Munich
  - similar labs worldwide

Lessons from newborn screening
1) Quantitative tandem mass spectrometry (stable isotope dilution) is able to meet the most stringent quality criteria (precision, accuracy) for routine diagnostics
2) The concept of multiparametric biomarkers improving assay sensitivity and, particularly, specificity is valid for many monogenic (and multifactorial) diseases
3) MS-based diagnostics can save costs despite a wider analytical panel and improved diagnostic quality
Also true for therapeutic drug monitoring of immunosuppressants, antidepressants, antiretrovirals...
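A small calculation can illustrate why specificity matters so much at screening scale. The sketch below applies Bayes' rule to the total incidence of 1:2000 quoted for newborn screening; the sensitivity and specificity values are hypothetical placeholders chosen for illustration, not figures from the Munich program, and the function name is an assumption.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(disease | positive test), computed with Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Total incidence quoted on the slide (taken here as the prevalence).
PREVALENCE = 1 / 2000

# Hypothetical assay characteristics, for illustration only.
for spec in (0.999, 0.9999):
    ppv = positive_predictive_value(0.99, spec, PREVALENCE)
    print(f"specificity={spec}: PPV={ppv:.3f}")
```

At a prevalence this low, a false-positive rate of only 0.1% already yields more false than true positives, which is one reason multiparametric markers that raise specificity are attractive.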
Goals in clinical diagnostics
[Figure labels: conventional diagnostics; multiparametric diagnostics; states: genetic predisposition, healthy, latent, ill]
• Early diagnosis
• Prophylaxis instead of therapy
• Subtyping / Staging
• Therapeutic drug monitoring
• Phenotypic pharmacogenomics
• Individualized (and more cost-efficient) medicine

Technology, workflow integration & data analysis

Integrated technology platform
Sample preparation
• Automated extraction and derivatization
• SPE
Analytics
• Separation (LC, GC)
• Quantitation (MRM, SID)
• QA/QC
BioInformatics
• Technical validation
• Statistical analysis
• Data visualization
• Biochemical interpretation
LIMS/Database
BioBank: clinical & experimental samples; diagnoses & lab data
[Plot labels: disease 1, disease 2]

Workflow overview

Staging of diabetic and non-diabetic nephropathy by PCA-DA
[Figure: MarkerView™]

Identifying marker candidates: stage 3 vs. stage 5 kidney disease (loadings)

Increasing oxidative stress in progressing CKD
• Oxidation of methionine is highly indicative of oxidative stress
• The ratio of Met-SO to Met is a quantitative measure for this biomarker
[Figure: Met-SO/Met ratio (axis 0.000 to 0.030) for stage 3, stage 4 and stage 5]

Decreasing ADMA secretion in progressing CKD
[Figure: Metabolite vs. eGFR, non-diabetic, w/o stage 5; series ADMA (U) with linear fit y = 0.4995x - 5.8957, R² = 0.7523; x-axis eGFR (100 to 0), y-axis metabolite (0 to 60)]
• Regression analysis to identify correlation of marker candidates with continuous (clinical) variables instead of discrete (= artificial) stages

Orchestration of fatty acid oxidation
[Figure labels: membrane phospholipids (GPC, GPE, GPS, ...); SPL2; lysophospholipids; free fatty acids; PUFAs: LA 18:2ω6, AA 20:4ω6, EPA 20:5ω3, DHA 22:6ω3; LOX, COX, ROS; products: 9-HODE, 13-HODE, 12-HETE, 15-HETE, LTB4, TXB2, PGD2, PGE2]

Pathway visualization in KEGG (reference pathway)

Pathway visualization in KEGG (human)

Dynamic pathway visualization in MarkerIDQ

Exploring ‘metabolic shells’ around metabolites

Route finding between metabolites across pathways
Reactions vs. reactant pairs!
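Route finding between metabolites can be sketched as a breadth-first search over substrate-product pairs. The example below is a minimal illustration under stated assumptions, not the MarkerIDQ implementation: the graph is hand-coded from the standard urea cycle and NO synthase reactions named in the pathway mapping slide, and `REACTANT_PAIRS` and `find_route` are hypothetical names.

```python
from collections import deque

# Substrate -> {product: enzyme}. Hand-coded from textbook urea cycle
# and NO synthase reactions; an illustrative toy graph, not a database export.
REACTANT_PAIRS = {
    "Orn": {"Cit": "OCT"},
    "Carb-P": {"Cit": "OCT"},
    "Cit": {"Argsucc": "ASS"},
    "Asp": {"Argsucc": "ASS"},
    "Argsucc": {"Arg": "ASL", "Fum": "ASL"},
    "Arg": {"Orn": "ARG", "Urea": "ARG", "Cit": "NOS", "NO": "NOS"},
}


def find_route(pairs, start, goal):
    """Return the shortest metabolite path from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in pairs.get(node, {}):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


route = find_route(REACTANT_PAIRS, "Asp", "Urea")
print(" -> ".join(route))
```

One common motivation for searching reactant pairs rather than whole reactions is that ubiquitous cofactors such as ATP or water would otherwise link almost any two metabolites in very few steps.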
Issues I: Databases
- Parallel / competing initiatives with incompatible / proprietary data formats: KEGG; MetaCyc, HumanCyc, etc.; Reactome; HMDB; OMIM; lipidomics consortia; ...
- Compartmentalization not well depicted
- Incompleteness / generic entries (phospholipids, acylcarnitines, etc.)
- Lack of curation
- Lack of publication

Issues II: Standardization and normalization
Standardization
- Instrument vendors oppose common data formats
- What meta-data to record?
- No valid guidelines for quantitation of endogenous metabolites (FDA guidance was developed for xenobiotics)
- Nomenclature vs. analytical reality (sum signals, isomers, etc.)
Normalization
- Absolute quantitation overcomes the need for analytical normalization
- Role of sample types (plasma, CSF, urine, tissue homogenates, cell extracts, ...)
- How can biological normalization work? Are there ‘housekeeping metabolites’?

Issues III: Biostatistics
- Overfitting & correction
- Suitable clustering algorithms for multivariate data sets?
- Metabolites are not equivalent, independent variables
- Analytical validity/variability are usually not considered
- Often, groups of metabolites are synthesized or degraded by the same enzyme(s)
- Consecutive reactions within a pathway/network depend on each other (flux analysis!)
- How to incorporate this in biostatistics? Weighting? Derived parameters, ratios, etc.?
- How to exploit this in (automated) plausibility checks?
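One possible reading of "derived parameters" and "automated plausibility checks" is sketched below: a product/substrate ratio (here Met-SO/Met) is computed per sample, and samples whose ratio lies far from the cohort median are flagged for review. This is an assumption-laden illustration, not an established method from the talk; the concentrations, the sample IDs, the threshold `k` and the function names are all hypothetical.

```python
from statistics import median


def ratio(sample, product, substrate):
    """Product/substrate ratio; None if a value is missing or the substrate is zero."""
    p, s = sample.get(product), sample.get(substrate)
    if p is None or s is None or s == 0:
        return None
    return p / s


def flag_implausible(samples, product, substrate, k=5.0):
    """Flag samples whose ratio is missing or more than k MADs from the median."""
    ratios = {sid: ratio(conc, product, substrate) for sid, conc in samples.items()}
    valid = [r for r in ratios.values() if r is not None]
    med = median(valid)
    mad = median([abs(r - med) for r in valid])  # median absolute deviation
    flagged = [
        sid for sid, r in ratios.items()
        if r is None or (mad > 0 and abs(r - med) > k * mad)
    ]
    return ratios, flagged


# Hypothetical concentrations (µM), for illustration only.
cohort = {
    "S1": {"Met": 25.0, "Met-SO": 0.25},
    "S2": {"Met": 20.0, "Met-SO": 0.30},
    "S3": {"Met": 30.0, "Met-SO": 0.36},
    "S4": {"Met": 22.0, "Met-SO": 0.33},
    "S5": {"Met": 24.0, "Met-SO": 4.80},  # e.g. oxidation during sample handling
}

ratios, flagged = flag_implausible(cohort, "Met-SO", "Met")
print(flagged)
```

A flag here means only "check this sample"; whether an extreme ratio reflects biology or a pre-analytical artifact still has to be decided with biochemical knowledge.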
Summary I
• Metabolomics depicts the functional end-point of genetics and environment
• Targeted metabolomics data are analytically reproducible and allow immediate biochemical interpretation
• Proof-of-concept has been achieved in routine diagnostics of inborn errors of metabolism
• Many metabolic biomarkers are valid across species and enable translational research
• Comprehensive targeted metabolomics bridges the gap to open profiling approaches

Summary II: Success factors for biomarker development
[Figure labels]
• Validated biomarkers
• Patent strategy and experience
• Biomarker candidates
• Well-documented biobanking
• Diligent study design
• Clinical & scientific experts
• Solid multivariate biostatistics
• Validated quantitative assays
• Biochemical plausibility & understanding
• Selected partners

Acknowledgements
Analytics: Stefanie Gstrein, Sascha Dammeier, Hai Pham Tuan, Cornelia Röhring, Therese Koal, Ali Alchalabi, Verena Forcher, Ines Unterwurzacher, Stefan Urban, Doreen Kirchberg, Ralf Bogumil, Patrizia Hofer, Lisa Körner, Peter Enoh, Brad Morie, Doris Gigele, Elgar Schnegg
Admin, IT & BizDev: Anton Grones, Ingrid Sandner, Georg Debus, Wolfgang Samsinger, Patricia Aschacher
Bioinformatics: Daniel Andres, Olivier Lefèvre, Paolo Zaccaria, Florian Bichteler, Marc Breit, Manuel Gogl, Bernd Haas, Mattias Bair, Robert Eller, Hamza Ovacin, Gerd Lorünser, Yi Zao
Statistics & Biochemistry: Ingrid Osprian, Marion Beier, Vera Neubauer, Oliver Lutz, Matthias Keller, Denise Sonntag, Hans-Peter Deigner, Ulrika Lundin