“Using Data Analytics to Discover the 100 Trillion Bacteria Living Within Each of Us” Invited Talk New Applications of Computer Analysis to Biomedical Data Sets QB3 Seminar San Francisco, CA May 28, 2015 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net 1 I Have Turned My Body into a Genomic and Biomarker Observatory Over 100 Blood and Stool Biomarker Time Series Calit2 64 Megapixel VROOM One Blood Draw For Me Only One of My Blood Measurements Was Far Out of Range--Indicating Chronic Inflammation 27x Upper Limit Episodic Peaks in Inflammation Followed by Spontaneous Drops Normal Range <1 mg/L Complex Reactive Protein (CRP) is a Blood Biomarker for Detecting Presence of Inflammation Adding Stool Tests Revealed Oscillatory Behavior in an Immune Variable Which is Antibacterial Typical Lactoferrin Value for Active Inflammatory Bowel Disease (IBD) 124x Upper Limit Normal Range <7.3 µg/mL Lactoferrin is a Protein Shed from Neutrophils An Antibacterial that Sequesters Iron Dynamical Innate and Adaptive Immune Oscillations From Stool Samples Adaptive Immune System Normal 50 to 200 Innate Immune System Normal <600 Methods Needed for Biomarker Time Series Correlations Adding 40 Microbial Ecology Time Series Immune & Inflammation Variables Weekly Symptoms Pharma Therapies Stool Samples How Will Detailed Knowledge of Microbiome Ecology Radically Change Medicine and Wellness? Your Body Has 10 Times As Many Microbe Cells As Human Cells 99% of Your DNA Genes Are in Microbe Cells Not Human Cells Challenge: Map Out Microbial Ecology and Function in Health and Disease States For Deep Analysis of Changes in the Gut Microbiome Ecology Our Team Compared a Healthy Population to Patients with Disease Example: Inflammatory Bowel Disease (IBD) “Healthy” Individuals Inflammatory Bowel Disease Patients 250 Subjects 1 Point in Time 2 Ulcerative Colitis Patients, 6 Points in Time Larry Smarr (Colonic Crohn’s) 7 Points in Time 5 Ileal Crohn’s Patients, 3 Points in Time Total of 2.7 Trillion DNA Bases Source: Jerry Sheehan, Calit2 Weizhong Li, Sitao Wu, CRBS, UCSD To Map Out the Dynamics of Autoimmune Microbiome Ecology Couples Next Generation Genome Sequencers to Big Data Supercomputers Source: Weizhong Li, UCSD Our team used 25 CPU-years to compute comparative gut microbiomes starting from 2.7 trillion DNA bases of my samples and healthy and IBD controls Illumina HiSeq 2000 at JCVI SDSC Gordon Data Supercomputer We Found Major State Shifts in Microbial Ecology Phyla Between Healthy and Two Forms of IBD Average Healthy Most Common Microbial Phyla Based on ~10,000 Bacteria, Archaea, Viruses Whose Genomes are Known Average Ulcerative Colitis Explosion of Proteobacteria Average Colonic Crohn’s Disease Hybrid of UC and CD High Level of Archaea Average Ileal Crohn’s Disease Collapse of Bacteroidetes Explosion of Actinobacteria Dell Analytics Separates The 4 Patient Types in Our Data Using Our Microbiome Species Data Ulcerative Colitis Colonic Crohn’s Healthy Ileal Crohn’s Source: Thomas Hill, Ph.D. Executive Director Analytics Dell | Information Management Group, Dell Software I Built on Dell Analytics to Show Dynamic Evolution of My Microbiome Toward and Away from Healthy State – Colonic Crohn’s Source: Thomas Hill, Ph.D. Executive Director Analytics Dell | Information Management Group, Dell Software I Built on Dell Analytics to Show Dynamic Evolution of My Microbiome Toward and Away from Healthy State – Colonic Crohn’s Seven Time Samples Over 1.5 Years Healthy Colonic Crohn’s Ileal Crohn’s Now Extending 7 to 40 Time Samples Large Changes in Genus Relative Abundances Observed Over the Seven Smarr Samples (1.5 Years)-Clearly Not Healthy! Can We Extend These Approaches from Hundreds to Thousands by Using Crowdsourcing Gut Microbiome Sequencing? Three Years of Larry Smarr’s Gut Microbiome (38 Samples) Phyla Distribution Analyzed Using Ubiome Source: Ubiome provided kits and taxonomic analysis Graph by Larry Smarr Applying Ayasdi to Ubiome Gut Microbiome Genus Relative Abundance Time Series Metric: Variance Normalized Euclidean with PCA Lens of Ecology Change Over All Time Pairs Analysis by Mehrdad Yazdani, Calit2; Devi Ramanan, Ubiome Using Ayasdi to Discover Abrupt Changes in Time in Gut Microbiome Ecology Large Ecology Change In Short Time Interval Apply Ayasdi Statistical Tests Sorted by KS stat Analysis by Mehrdad Yazdani, Calit2; Devi Ramanan, Ubiome Abrupt Changes Discovered in Genus Faecalibacterium Relative Abundance Time Series 35x Source: Ubiome provided kits and taxonomic analysis Graph by Larry Smarr Genetic Sequencing of Humans and Their Microbes Is a Huge Growth Area and the Future Foundation of Medicine Source: @EricTopol Twitter 9/27/2014 Thanks to Our Great Team! UCSD Metagenomics Team JCVI Team Weizhong Li Sitao Wu Karen Nelson Shibu Yooseph Manolito Torralba Calit2@UCSD Future Patient Team SDSC Team Michael Norman Ilkay Altintas Shweta Purawat Mahidhar Tatineni Robert Sinkovits Jerry Sheehan Tom DeFanti Kevin Patrick Jurgen Schulze Andrew Prudhomme Philip Weber Fred Raab Joe Keefe Ernesto Ramirez Dell/R Systems and Dell Analytics Brian Kucic John Thompson Tom Hill UCSD Health Sciences Team William J. Sandborn Elisabeth Evans John Chang Brigid Boland David Brenner