NEDMDG SYMPOSIUM
Worcester, Mass., June 8th, 2006

STATISTICAL SPECTROSCOPY AND GLOBAL SYSTEMS BIOLOGY APPROACHES IN DISEASE MODELING
Jeremy K. Nicholson, PhD
Professor and Head of Biological Chemistry
Imperial College, University of London

Summary
• The quest for new drugs: ‘Top-down’ Systems Biology, gene-environment interactions and the Personalized Healthcare Paradigm.
• Generating and modeling system-level metabolic data in experimental disease states.
  – Pharmaco-metabonomics and predictive models
  – Biomarker recovery via statistical spectroscopy
    • NMR, UPLC-MS, UPLC-MS/NMR, Proteo-metabonomics
• Characterizing genetic, dietary and microbial contributions to Human Metabolic Phenotypes: Molecular epidemiology and “The Health of Nations”.

Can Systems Biology save the Pharmaceutical Industry?
High technology has generally not helped the rate of practical drug discovery!
ONLY 11% OF DRUGS IN CLINICAL TRIALS MAKE IT TO MARKET!
PHARMACEUTICAL PRODUCTIVITY DECLINE: REAL COST PER DRUG HAS INCREASED 10-FOLD IN 30 YEARS!
THIS IS AN UNSUSTAINABLE BUSINESS MODEL!

[Figure labels: HUMAN GENOME 23,710; DRUGGABLE GENOME 3,000; DRUG TARGETS 1000; DISEASE MODIFYING GENES 3,000. The values 23,710 and 3,051 also appear as unattached labels.]
Lipinski Rules: Compound Bioavailability is poor if…
• > 5 H-bond donors
• CLog P > 5
• Sum {N + O} > 10
• Mass > 500 DALTONS

PHARMACEUTICAL PIPELINE ATTRITION
[Pipeline attrition figure labels. Stages: DISCOVERY LIBRARY (NME); DEVELOPMENT; INVESTIGATIONAL NEW DRUG; ADME/ANIMALS; PHARMACOLOGY/TOXICOLOGY; CLINICAL TRIALS (Ph1, Ph2, Ph3); MARKET. Systems Biology Application: Chemistry and in vitro screens; Pre-Lead Prioritization; Screening Studies: Proteomic and Transcriptomic, Metabolomic/Metabonomic; NEW DRUG TARGETS; DIAGNOSTIC MARKERS; Response Analysis: Genotyping, Proteomics, Pharmaco-genomics, Pharmaco-metabonomics. Attrition labels: EARLY; CRITICAL. $$$ failure! Product Rescue?]
[Pipeline attrition figure labels: Preclinical Safety Biomarkers and Mechanisms; Clinical Safety Biomarkers; Quasi-Darwinian Lead Selection and Optimization; Preclinical Efficacy Models & Biomarkers; Clinical Efficacy Biomarkers; Withdrawal from market. Attrition labels: OBLIGATE; FATAL.]
© Imperial College, 2006

Personalized Healthcare
• Optimized drug efficacy & minimized toxicity
• Theranostics & patient stratification
• Optimized Nutrition

Global Systems Biology
[Figure labels: COMPREHENSIVE PHENOTYPE; Metabonomics; Genomics; Microbiome; Proteomics; Xeno-metabolome]

Multivariate Descriptions of Metabolism
• MetaboLome Definition: The quantitative analysis or description of all low molecular weight metabolites in specified cellular, tissue or biofluid compartments. (Metabolomics: Numbers, chemical classes, structures, concentrations: < 1KDa)
• MetaboNome Definition: The sums, products & interactions of all the individual compartments/metabolomes (including extra-genomic sources) dispersed in a complex organism…The ‘Global’ System.

METABONOMICS
“Quantitative measurement of multivariate metabolic responses of multicellular systems to pathophysiological stimuli or genetic modification”
(AIMS TO MODEL GLOBAL METABOLIC REGULATION OF COMPLEX SYSTEMS INCLUDING DYNAMIC INTERACTIONS & COMPARTMENTALIZATION OF COMPONENTS)

METABOLOMICS (various definitions)
e.g. “measurement of metabolite concentrations & fluxes in cell systems” OR “measurement & modelling of all metabolites & pathways in a system”

METABONOMICS OF COMPLEX SYSTEMS
[Figure labels: Multiple cell lines C1, C2…C8 etc.; Cellular transcriptome]
[Figure labels: Cellular proteome; Intracellular metabolome; Intervention with specific target; Extra-cellular metabolite pool (biofluids); Extra-cellular metabolite pool; Tissue profiles; Excretion signatures; Molecular compartments; Reaction profile measurement & modeling; External secretions/excretion; cell lines C3, C4, C5, C6, C7, C8]

BIOLOGICAL FLUID TYPES (primary secretory and connective roles)
• Key Diagnostic Fluids: Plasma, Urine.
• Specialized Functions: Cerebrospinal, thyroid.
• Saliva (sub-lingual, parotid, sub-maxillary), Gastric, Bile, Pancreatic.
• Amniotic, Follicular, Milk, Seminal Vesicle, Prostatic, Epididymal, Seminal.
• Pathological Fluids: Ascites, Cystic, Blister.
• Artificial Fluids: Bronchiolar lavage fluid, peritoneal dialysates, hemodialysates, fecal water, rectal dialysates, cell extracts and cell supernatants.

PHYSICAL-BIOCHEMICAL FEATURES OF URINE AND PLASMA
URINE (time averaged):
• Variable pH, ionic strength, osmolarity.
• High dielectric constant.
• Extreme dynamic concentration range (> 10^11).
• Thousands of molecules < 1KDa (polarity?).
• Metal complexes and supramolecular aggregates.
• Many small proteins, high enzyme activities in pathological states: a dynamically reactive matrix.
PLASMA (snap shot):
• Relatively constant pH, ionic strength, osmolarity.
• Lower bulk dielectric constant.
• High dynamic concentration range (> 10^5).
• Hundreds of molecules of < 1KDa and > 1KDa.
• Metal complexes and supramolecular complexes.
• Multi-compartment, multi-diffusional matrix.
• Many large proteins and protein complexes.
Analytical Approaches in Metabonomics and Metabolomics

NMR Spectroscopy (Biofluids, extracts, cells/tissues)
• Single pulse 1H, 13C, 31P
• 2+ D methods: COSY, TOCSY, HMQC, HMBC etc
• LC-NMR, CE-NMR, CEC-NMR
• PFG Diffusion analysis, DOSY etc…
• HR-MAS (cells + tissues)
• PFG-MAS
• (cryoprobes/robotic FI etc)

Mass Spectrometry (Biofluids and extracts)
• Many Ionization Methods
• Single quad, Triple quad, TOF-MS, QTOF-MS, Ion trap, Linear ion trap, FTMS

Linked chromatography/MS
• TLC/MS, CE-MS, GC-MS, HPLC-MS, UPLC-MSn, LC-ICPMS-MS, LC-NMR-MSn

CHEMOMETRIC MODELLING (pattern recognition for classification, diagnostics & biomarker analysis)
STATISTICAL SPECTROSCOPY (Linking multiple spectra & spectral types for structure elucidation/pathway analysis)

Standard Analytical Information: Identity, Structure, Quantity (BOTH)
Physical Biochemical Information: Interactions, Compartments (NMR)

CHEMOMETRIC TOOLS FOR INFORMATION RECOVERY FROM MULTIVARIATE DATA
UNSUPERVISED
• Principal Components Analysis (PCA)
• Hierarchical Cluster Analysis (HCA)
• Logical blocking-PCA
• Non-linear Mapping (NLM)
• Supergravity Association Mapping (SAM)….etc.
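As an illustration of the first unsupervised tool listed, PCA can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the software used for the work in this talk: the data are random numbers standing in for 20 spectra of 50 variables, the second group of 10 has an artificially raised band, and the function name `pca` is hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the mean-centred data matrix (rows = samples)."""
    Xc = X - X.mean(axis=0)                      # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                     # variable weights
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Hypothetical data: 20 "spectra" x 50 variables, pure noise ...
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
# ... except that variables 5-9 are raised in the last 10 samples.
X[10:, 5:10] += 4.0

scores, loadings, explained = pca(X)
# Distance between the two group means along the first component.
gap = abs(scores[:10, 0].mean() - scores[10:, 0].mean())
```

In this toy case the first component mainly captures the group difference, and the loadings for variables 5 to 9 indicate which spectral region drives it.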
SUPERVISED
• Partial Least Squares (PLS) & PLS-DA
• O-PLS & O2-PLS
• Soft Independent Modeling of Class Analogy (SIMCA)
• Rule Induction
• Bayes Nets/Machine Learning
• Genetic Algorithms
• Neural Networks
• CLOUDS…etc…

900 MHz 1H NMR Spectrum of Untreated Human Urine
Contains Latent Biomarker information on:
• Genotype
• Physiological state
• Nutritional state
• Gut microbes
• ‘Biological’ Age
• Presence of Disease
Translatable Biomarkers: Diagnostic, Prognostic, Toxicity, Efficacy

Primary and co-metabolome interactions in mammalian systems
(Nicholson et al Nature, Microbiology, 2005, 3, 2-8)
[Figure labels: GUT ‘MICROBIOME’ (species 1 to 6): Species transcriptomes, Species proteomes, Species metabolomes. HOST GENOME (cell types A to F): Cellular transcriptomes, Cellular proteomes, 1o Intracellular metabolomes. Enteron microbial and dietary 2o metabolites; Extracellular metabolite pool; Secretory Metabolomes; ENTEROHEPATIC CIRCULATION.]
• Humans: > 500 functionally distinct NORMAL cell types/ca. 10 trillion parenchymal cells
• Gut microbiome in humans: > 1000 Species, > 100 trillion cells
• Co-metabolome enters via hepatic portal + mesenteric veins
• Biliary secretions enter duodenum from common bile duct

MICROBIAL-MAMMALIAN CO-METABOLISM OF CHOLIC ACID
[Figure: reaction scheme with chemical structures, split between MAMMALIAN LIVER and MAMMALIAN GUT and linked by the enterohepatic circulation. Other labels: amino acid deconjugation by gut microbiota; deoxycholic acid.]
1. Biosynthesis: cholic acid
2. Phase II glycine Conjugation (glycocholic acid); Phase II taurine Conjugation (taurocholic acid)
3. Secretion into bile
4. Regeneration of cholic acid
6. Reabsorption into blood via hepatic portal system
7. Phase II glycine Conjugation (glycodeoxycholic acid); Phase II taurine Conjugation (taurodeoxycholic acid)
8. Secretion into bile
9. Deconjugation, further reactions
5.
7α-dehydroxylation by gut microbiota
De-conjugated bile acids are less efficient at emulsifying fats.

PHARMACO-METABONOMICS
A new paradigm for personalized predictive drug metabolism and toxicology.
Definition: The prediction of the quantitative outcome of an intervention based on a pretreatment metabolic model. Applications in drug metabolism, xenobiotic toxicity, drug efficacy…etc.

Understanding drug interaction responses in relation to individual metabolic variation: Gene-environment interactions determine the pre-dose starting phenotype.
[Figure labels (influence vectors): Genetic Factors; P450 Polymorphisms; SNP variations; Nutrition; Gut Microbiome; Age; Hormonal status]
• Are there locations that are more risk averse for particular interventions? Prognostic biomarker clusters?
• Individual (dot) location is the resultant of the influence vectors in m-space.
(Clayton et al, Nature 440 (20) 1073-1077, 2006)

Global System Interactions Affecting Drug Metabolism & Toxicity
[Figure labels: Conditional Metabolic Phenotype; Host Genetic Constitution: Interspecies variations & individual SNP variations; Individual Gut Microbiome: microbial species variation & ACTIVITY; Specific drug metabolizing enzyme complement (CYP450) polymorphisms; Tissue-specific CYP450 induction state (e.g., in liver & gut); Nutritional status & dietary composition; Metabolic Fate and Toxicity of Drug]
Nicholson, JK et al Nature, Biotechnology 22 (10) 1268-1274. (2004)

STATISTICAL SPECTROSCOPY
“The application of multivariate statistical methods to extract latent structural or connectivity information in multiple spectral data sets from samples or experiments collected serially or in parallel.”

1. STATISTICAL SEARCH SPACE REDUCTION FOR BIOMARKER IDENTIFICATION IN SERIAL UPLC-MS DATA SETS
Crockford et al Analytical Chemistry (2006), in press
PARTIAL LEAST SQUARES DISCRIMINANT ANALYSIS (PLS-DA) SORTS FEATURES ACCORDING TO IMPORTANCE FOR CLASS SEPARATION.
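The idea of sorting features by their importance for class separation, then keeping only those above a correlation cut-off such as r > 0.6 or r > 0.8, can be illustrated with a simplified stand-in. The sketch below is not O-PLS-DA: it ranks each feature by its Pearson correlation with the class label, which for autoscaled data and a single class vector is proportional to the first PLS weight vector (Xᵀy). The data, feature counts, and function names are all hypothetical; only the group sizes (10 dosed vs 10 control) and the two thresholds are taken from the slides.

```python
import numpy as np

def class_correlations(X, y):
    """Pearson correlation of every column of X with the class vector y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.zeros(X.shape[1])
    ok = denom > 0                  # constant features get r = 0
    r[ok] = num[ok] / denom[ok]
    return r

def select_features(r, r_min):
    """Indices of features with |r| > r_min, strongest first."""
    keep = np.flatnonzero(np.abs(r) > r_min)
    return keep[np.argsort(-np.abs(r[keep]))]

# Hypothetical data: 10 control (0) and 10 dosed (1) samples, 200 features.
rng = np.random.default_rng(1)
y = np.repeat([0.0, 1.0], 10)
X = rng.normal(size=(20, 200))
X[y == 1, :5] += 8.0                # five features respond to dosing

r = class_correlations(X, y)
candidates = select_features(r, 0.6)   # looser cut-off, larger search space
strong = select_features(r, 0.8)       # stricter cut-off, fewer features
```

Raising the threshold can only shrink the list, which is the sense in which the search space is reduced; the surviving feature indices would then be mapped back to retention time and m/z.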
SPECTRAL LOADINGS BACK-PROJECTED DIRECTLY TO LCMS CHROMATOGRAM TO IDENTIFY RETENTION TIMES & MASSES OF CANDIDATE BIOMARKERS.

O-PLS-DA STATISTICAL SEARCH SPACE REDUCTION
Hydrazine dosed (10) vs control (10): statistically significant peaks (r > 0.6) back-projected into UPLC-MS time domain: 356 features

O-PLS-DA STATISTICAL SEARCH SPACE REDUCTION
r > 0.8: STRONG CANDIDATE BIOMARKERS, 51 features

STOCSY
• RECONSTRUCTION OF LATENT BIOMARKER INFORMATION FROM LARGE SPECTROSCOPIC SETS BY STATISTICAL TOTAL CORRELATION SPECTROSCOPY (STOCSY)
Cloarec et al Analytical Chemistry, 77 (5) 1282-1289, 2005.
Calculate correlation matrix (C) between all computer points (δ/δ) for all 1D spectra in all datasets to be compared.
• X1 and X2 are the auto-scaled experimental matrices of n × v1 and n × v2
• n = number of spectra in each class
• v1 and v2 = number of variables in each matrix (32K)
2D STOCSY: Plot δ/δ correlation matrix for all samples, color code by r². Gives both self-molecular correlations (assignment) and also pathway and compartment correlations.

R-Selected 2D STOCSY (30 × 1D mouse urine spectra)
Only self-molecular correlations r² > 0.9 plotted

SHY
STATISTICAL HETEROSPECTROSCOPY
Analytical Chemistry (2006) 78 363-371.

SHY Parallel NMR & MS data Collection
Sequential NMR & UPLC-MS spectra can be obtained on each sample for statistical integration
[Figure panels: NMR: Control Rat urine, Hydrazine treated Rat. UPLC-MS: Control Rat urine, Hydrazine treated Rat.]

SHY CONNECTIONS IN PARALLEL NMR-MS SPECTROSCOPIC SETS
• Direct Structure Assignment: Co-variance δX – Ym/z (parent) – Zm/z (fragment)
• Direct Pathway Connection: Co-variance (Am/z – δB)
[Figure labels: MS data: Zm/z, Am/z, Ym/z. NMR data: δX, δB.]

Statistical HeterospectroscopY (SHY): Expansion shows NMR to parent ion, fragment pattern & pathway correlates.
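The correlation matrix behind both STOCSY and SHY can be sketched as follows. For auto-scaled matrices X1 (n × v1) and X2 (n × v2), the Pearson correlations between every pair of variables are C = X1ᵀX2 / (n − 1); this is assumed here to be the form intended on the STOCSY slide. With X1 = X2 (one NMR data set) the result is a STOCSY-style δ/δ matrix; with X1 from NMR and X2 from MS it is a SHY-style δ vs m/z matrix. The synthetic data, peak positions, and function names are hypothetical, and the matrices are far smaller than the 32K-point spectra mentioned above.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each column and divide by its sample standard deviation."""
    Xc = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0               # constant columns are left at zero
    return Xc / sd

def correlation_matrix(X1, X2):
    """C = X1^T X2 / (n - 1) for autoscaled X1 (n x v1) and X2 (n x v2)."""
    n = X1.shape[0]
    return autoscale(X1).T @ autoscale(X2) / (n - 1)

# Hypothetical data: one metabolite whose concentration varies over 30 samples.
rng = np.random.default_rng(2)
n = 30
conc = rng.uniform(1.0, 2.0, size=n)

X_nmr = 0.05 * rng.normal(size=(n, 40))   # 40 NMR points of baseline noise
X_nmr[:, 5] += conc                       # first peak of the metabolite
X_nmr[:, 20] += 2.0 * conc                # second peak of the same molecule

X_ms = 0.05 * rng.normal(size=(n, 25))    # 25 MS features of baseline noise
X_ms[:, 3] += conc                        # ion from the same metabolite

stocsy = correlation_matrix(X_nmr, X_nmr) # NMR vs NMR (40 x 40)
shy = correlation_matrix(X_nmr, X_ms)     # NMR vs MS  (40 x 25)
```

In this toy example `stocsy[5, 20]` is close to 1 because both points belong to the same molecule, and `shy[5, 3]` links that NMR signal to its MS ion. Thresholding such a matrix on r or r² gives the kind of selective plot described on the slides.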
[Figure labels: m/z; N-acetyl-lysine; NMR domain; MS domain; Correlation/anticorrelation coefficients]

MOLECULAR EPIDEMIOLOGY
DATA DRIVEN TOP-DOWN SYSTEM METABOLIC MODELING
Can genetic, dietary, microbial, and environmental influences in large scale population studies be deconvolved?
Examples from the INTERMAP and INTERSALT studies.
J. Stamler (PI), P. Elliot (PI), M. Daviglus, H. Kesteloot, H. Ueshima, B. Zhou, Q. Chan, M. D Iorio, E. Maibaum, S. Bruce, C. Teague, R.L. Loo, L. Smith

Acknowledgements: The INTERMAP study has been supported by Grants 5RO1-HL50490-09, 5-RO1 HL65461-04 and 5 RO1 HL71950-02 from the US National Heart, Lung, and Blood Institutes, National Institutes of Health, Bethesda, MD, USA; by the Chicago Health Research Foundation; and by national agencies in Japan, People’s Republic of China and the United Kingdom. The INTERSALT Study was supported by the Council on Epidemiology and Prevention of the International Society and Federation of Cardiology; World Health Organisation; International Society of Hypertension; Wellcome Trust; National Heart, Lung, and Blood Institute, US; Heart Foundations of Canada, Great Britain, Japan and the Netherlands; Chicago Health Research Foundation; Parastatal Insurance Company, Belgium; and by many national agencies supporting local studies.

FUNDAMENTAL METABOTYPE DIFFERENCES
PCA-DA of population data (disease outliers removed)
[Figure: scores plot with groups labelled JAPANESE, AMERICAN, CHINESE]
COMBINATION OF GENETIC, ENVIRONMENTAL & NUTRITIONAL FACTORS

Concluding Remarks
Metabonomics is a powerful top-down systems biology tool for investigating drug toxicity, disease processes, phenotypic variation & differential gene function in vivo.
NOVEL OUTPUTS:
• Metabolic biomarker information on system regulation & failure.
• Deeper understanding of DISEASE MECHANISMS.
• Models that incorporate genetic & environmental interactions.
Omics data must be considered in an extensive biological framework, with robust statistical interrogation and integration, to visualize system activity.
Analyzing modulations of the MICROBE-MAMMAL-METABOLIC AXIS will be crucial for understanding genotype-phenotype interactions and variation in toxicity and efficacy of drugs in man.
Top-down metabolic modeling is likely to prove to be a powerful tool in the pursuit of Personalized Healthcare Solutions and understanding the Health of Nations.

The Metabonomics Engine @ IC & Collaborators
Academic: Dr Elaine Holmes, Prof John C. Lindon, Dr H. Keun, Dr T. Ebbels, Dr J. Bundy, Prof James Scott, Prof Tim Aitman, Prof Paul Elliot, Dr H. Tang, Dr G. Tranter, Dr S. Mitchell. (Imperial)
Post Doctoral Group: Drs O. Cloarec, M. Dumas, A. Craig, A. Maher, B. Beckwith-Hall, E. A. Clayton, R. Barton, J., Y. Wang, E. Meibaum, I. Douarte, S. Bruce, T. Tseng, C. Stella, M. Coen, J. Sidhu, E. Skiordi, M. Bollard, …etc
Graduate Students: T. Athersuch, I. Yap, R. C. Bailey, C. Teague, D. Parker, A. Tregay, J. Pearce, J. Bowen, S. Lowdell, L. Smith, A. Cooray, N. Jones, G. McLaughlin, D. O’Connor, R. Liu, M. Ratalainen, K. Veselkov, F.P. Martin, …etc
Collaborators: Dr Rob Plum, John Shockcor (WATERS), Prof Ian D. Wilson and Dr T. Orton (AZ), Prof J. Everett, Drs M. Reily and D. Robertson (Pfizer), Prof Jose Ordovas (Tufts University), Prof Burt Singer (Princeton University), Dr M. Spraul (Bruker), Dr Sunil Kochhar (Nestle), Frans van D’Ouderra, J. Powell, M. Faughan et al (Unilever), Dr D. Gaugier (Oxford University), Prof D. Withers (UCL).
FUNDING: NIH, The Wellcome Trust, BBSRC, MRC, EPSRC, NERC, The Royal Society, Roche Foundation, Servier, Lilly, P&G, Pfizer, AstraZeneca, Nestle, Unilever, Novo Nordisk, BMS, Hi-Q, Metabometrix, METAGRAD, WATERS CORPORATION.