“Inspired by Carl: Exploring the Microbial Dynamics Within”

Invited Talk
Looking in the Right Direction: Carl Woese and the New Biology
University of Illinois, Urbana-Champaign
September 20, 2015

Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net

Carl Woese Was My Mentor for Microbial Genomics

To Larry Smarr: “I want to talk to you about setting up a megabase sequencing unit at the U of I. I take this as necessary to the survival of good biology on this campus, for it is clear that megabase sequencing will be a major biological activity in the future.”
– Carl Woese, July 6, 1995

To Carl Woese: “What I have always understood is that you were responsible for ‘turning Larry Smarr on’ to biology, to evolution, to the adventures in living systems.”
– John Wooley, July 26, 2006

Last visit to Carl and Gay at their house, Sept 20, 2009

There are 100 billion stars in the Andromeda galaxy…
…and 100 billion galaxies in the known universe.

It’s a microbial world…
…there are 100 million times as many bacteria on Earth as stars in the universe.
Microbiology is the ultimate Big Data science!

Carl’s Late Thoughts on the Critical Need for Research in Microbial Ecologies

“The second major direction involves the nature of the global ecosystem. . . . Bacteria are the major organisms on this planet – in numbers, in total mass, in importance to the global balances. Thus, it is microbial ecology that . . . is most in need of development, both in terms of facts needed to understand it, and in terms of the framework in which to interpret them.”
– Carl Woese, Current Biology 15: R111–R112 (2005).
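The star and bacteria counts quoted above are order-of-magnitude arithmetic. A minimal sketch of that calculation follows, using only the round figures from the slides; the constant and function names are illustrative and not part of the talk.

```python
# Round figures quoted on the slides (order-of-magnitude estimates only).
STARS_PER_GALAXY = 100 * 10**9      # "100 billion stars in the Andromeda galaxy"
GALAXIES_IN_UNIVERSE = 100 * 10**9  # "100 billion galaxies in the known universe"
BACTERIA_PER_STAR = 100 * 10**6     # "100 million times as many bacteria ... as stars"


def estimate_counts():
    """Return (stars in the universe, bacteria on Earth) from the slide figures."""
    stars = STARS_PER_GALAXY * GALAXIES_IN_UNIVERSE
    bacteria = stars * BACTERIA_PER_STAR
    return stars, bacteria


stars, bacteria = estimate_counts()
print(f"stars in the universe ~ {stars:.0e}")
print(f"bacteria on Earth     ~ {bacteria:.0e}")
```

With these inputs the star count comes to 10^22 and the bacterial count to 10^30, which is the scale behind the "ultimate Big Data science" remark.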
I started intensively working on microbial ecologies in 2005

PI Larry Smarr
Grant Announced January 17, 2006

Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server
Source: Phil Papadopoulos, SDSC, Calit2
• 512 Processors, ~5 Teraflops
• ~200 Terabytes Storage
• 1GbE and 10GbE Switched / Routed Core
• ~200TB Sun X4500 Storage, 10GbE

Calit2 CAMERA: Over 4,000 Registered Users From Over 90 Countries

The Human Gut Started Showing Up as Another Microbial Environment Being Metagenomically Sampled

The Human Gut as a Super-Evolutionary Microbial Cauldron
• Enormous Density
  – 1000x Ocean Water
• Highly Dynamic Microbial Ecology
  – Hundreds to Thousands of Species
• Horizontal Gene Transfer
• Phages
• Adaptive Selection Pressures (Immune System)
  – Innate Immune System
  – Adaptive Immune System
  – Macrophages and Antimicrobial Proteins
• Constantly Changing Environmental Pressures
  – Diet
  – Antibiotics
  – Pharmaceuticals

To Better Understand the Human Gut Dynamics I Have Turned My Body into a Genomic and Biomarker Observatory
Calit2 64 Megapixel VROOM
One Blood Draw For Me

Only One of My Blood Measurements Was Far Out of Range, Indicating Chronic Inflammation
• 27x Upper Limit
• Episodic Peaks in Inflammation Followed by Spontaneous Drops
• Normal Range <1 mg/L
C-Reactive Protein (CRP) is a Blood Biomarker for Detecting Presence of Inflammation

Adding Stool Tests Revealed Oscillatory Behavior in an Immune Variable Which is Antibacterial
• Typical Lactoferrin Value for Active Inflammatory Bowel Disease (IBD)
• 124x Upper Limit for Healthy
• Normal Range <7.3 µg/mL
Lactoferrin is a Protein Shed from Neutrophils: An Antibacterial that Sequesters Iron

Evolving Microbiome Environmental Pressures: Dynamical Innate and Adaptive Immune Oscillations in Colon
• Adaptive Immune System: Normal 50 to 200
• Innate Immune System: Normal <600
These Must Be Coupled to A Dynamic Microbiome Ecology

For Deep Analysis of Changes in the Gut Microbiome Ecology Our Team Compared a Healthy
Population with 3 Types of IBD
Each Sample Has 100-200 Million Illumina Short Reads (100 bases)
“Healthy” Individuals
  – 250 Subjects, 1 Point in Time
Inflammatory Bowel Disease (IBD) Patients
  – 2 Ulcerative Colitis Patients, 6 Points in Time
  – Larry Smarr (Colonic Crohn’s), 7 Points in Time
  – 5 Ileal Crohn’s Patients, 3 Points in Time
Total of 27 Billion Reads Or 2.7 Trillion Bases
Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD

Mapping Out the Dynamics of Autoimmune Microbiome Ecology Couples Next Generation Genome Sequencers to Big Data Supercomputers
Example: Inflammatory Bowel Disease (IBD)
We Used 25 CPU-Years to Compute Comparative Gut Microbiomes of my 7 Time Samples, 255 Healthy, and 20 IBD Patients
• Illumina HiSeq 2000 at JCVI
• SDSC Gordon Data Supercomputer

UCSD’s Integrated Digital Infrastructure (IDI) Initiative: Enhanced Cyberinfrastructure to Support Knight Lab for Microbial Genomics
[Diagram labels: Knight 1024 Cluster In SDSC Co-Lo; Gordon; Knight Lab; Data Oasis 7.5PB, 100GB/s; CHERuB 100Gbps; 120Gbps; 10Gbps; FIONA (12 Cores/GPU, 128 GB RAM, 3.5 TB SSD, 48TB Disk, 10Gbps NIC); Emperor & Other Vis Tools; 40Gbps; Prism@UCSD; 64Mpixel Data Analysis Wall]

Resulting Microbiome Profiles Allow Us to Quickly Find 1 Unhealthy Person Out of 155 HMP “Healthy” Subjects
75 Most Abundant Species

Dell Analytics Separates The 4 Patient Types in Our Data Using Our Microbiome Species Data
[Plot labels: Ulcerative Colitis; Colonic Crohn’s; Healthy; Ileal Crohn’s]
Source: Thomas Hill, Ph.D.
Executive Director Analytics, Dell | Information Management Group, Dell Software

I Built on Dell Analytics to Show Dynamic Evolution of My Microbiome Toward and Away from Healthy State – Colonic Crohn’s
Seven Time Samples Over 1.5 Years
[Plot labels: Healthy; Colonic Crohn’s; Ileal Crohn’s]

We Found Major State Shifts in Microbial Ecology Phyla Between Healthy and Three Forms of IBD
Most Common Microbial Phyla
[Panels: Average HE; Average Ulcerative Colitis; Average LS Colonic Crohn’s Disease; Average Ileal Crohn’s Disease]
[Annotations: Explosion of Proteobacteria; Hybrid of UC and CD; High Level of Archaea; Collapse of Bacteroidetes; Explosion of Actinobacteria]

We Find Large Changes in Gut Microbial Abundance: Ileal CD Average Compared to Healthy Average by Family
• Largest Increase: 235x
• Largest Decrease: 1/320x
• 30 Families >10x or <1/10x (Out of 76 Families with > 0.1% Abundance)

Our Research Shows Even Larger Changes in Protein Family Abundance Between Health and Disease – Ileal Crohn’s
Ratio of Ileal CD Average to Healthy Average for Each Nonzero KEGG
• KEGGs Greatly Increased In the Disease State
• KEGGs Greatly Decreased In the Disease State
• Most KEGGs Are Within 10x In Healthy and Ileal Crohn’s Disease
• Note Hi/Low Symmetry
• Over 7000 KEGGs Which Are Nonzero in Health and Disease States

Our Relative Abundance Results Across ~300 People Reveal Potential Diagnostic Species
• UC 100x Healthy
• Healthy 100x CD
• UC 100x CD
We Produced Similar Results for ~2500 Microbial Species

The Woese Effect: I Seem to Have a Large Amount of Archaea in my Gut
• 18% LS Average
• 175x Healthy Average

Next Step: Discover How the Time Varying Immune System & Pharma Drive Adaptive Changes in the Microbiome Ecology
[Timeline 2009 to 2015: Immune & Inflammation Variables; Weekly Symptoms; Pharma Therapies; First 7 Stool Samples]

To Expand IBD Project the Knight/Smarr Labs Were Just Awarded ~1 CPU-Century of Supercomputing Time
• Smarr Gut Microbiome Time Series
  – From 7 Samples Over 1.5 Years
  – To 50 Samples Over 4 Years
• IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative
Colitis Patients to ~100 Patients
  – 50 Carefully Phenotyped Patients Drawn from Sandborn BioBank
  – 43 Metagenomes from the RISK Cohort of Newly Diagnosed IBD patients
• New Software Suite from Knight Lab
  – Re-annotation of Reference Genomes, Functional / Taxonomic Variations
  – Novel Compute-Intensive Assembly Algorithms from Pavel Pevzner
8x Compute Resources Over Prior Study

Bringing the Lessons of Microbial Ecology to Healthcare

We Must Move From Combating Single Microbe Diseases to Developing the Human/Microbiome System Approach to Public Health
Bach (2002) N Engl J Med, Vol. 347, 911-920
2014
For Public Health It is Still About Microbes, But from Single Species to Entire Ecologies

The Coupled Neural, Immune, and Microbiome Systems Provide a Model Explaining How Nutrition Can Alter Neurodevelopment

Thanks to Our Great Team!
UCSD Metagenomics Team: Weizhong Li, Sitao Wu
JCVI Team: Karen Nelson, Shibu Yooseph, Manolito Torralba
Calit2@UCSD Future Patient Team: Jerry Sheehan, Tom DeFanti, Kevin Patrick, Jurgen Schulze, Andrew Prudhomme, Philip Weber, Fred Raab, Joe Keefe, Ernesto Ramirez
SDSC Team: Michael Norman, Mahidhar Tatineni, Robert Sinkovits
Dell/R Systems: Brian Kucic, John Thompson
Ayasdi: Devi Ramanan, Pek Lum
UCSD Health Sciences Team: Rob Knight Lab, William J. Sandborn, Elisabeth Evans, John Chang, Brigid Boland, David Brenner
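The family-level comparison earlier in the talk (30 of 76 families shifted by more than 10x or less than 1/10x between the ileal CD average and the healthy average) amounts to a fold-change filter. The sketch below shows one plausible way to compute it and is not the team's actual pipeline. The function names, the sample abundances, and the choice to keep a family when either group average exceeds 0.1% are all assumptions made for illustration.

```python
def fold_changes(disease_avg, healthy_avg, min_abundance=0.001):
    """Return {family: disease/healthy ratio} for sufficiently abundant families.

    Assumption: a family is kept when its average relative abundance exceeds
    min_abundance (0.1%) in at least one group. Families with a zero average
    in either group are skipped because the ratio is undefined.
    """
    ratios = {}
    for family in disease_avg.keys() & healthy_avg.keys():
        d, h = disease_avg[family], healthy_avg[family]
        if max(d, h) <= min_abundance:
            continue
        if d == 0 or h == 0:
            continue
        ratios[family] = d / h
    return ratios


def large_shifts(ratios, factor=10.0):
    """Keep families whose ratio is above factor or below 1/factor."""
    return {f: r for f, r in ratios.items() if r > factor or r < 1.0 / factor}


# Hypothetical average relative abundances (fractions of total reads).
healthy = {"family_A": 0.200, "family_B": 0.0020, "family_C": 0.050, "family_D": 0.0001}
ileal_cd = {"family_A": 0.010, "family_B": 0.0500, "family_C": 0.040, "family_D": 0.0002}

ratios = fold_changes(ileal_cd, healthy)
shifted = large_shifts(ratios)
for family in sorted(shifted):
    print(f"{family}: {shifted[family]:.2f}x healthy average")
```

In this made-up data, family_D falls below the abundance floor and family_C stays within 10x, so only family_A and family_B are reported as large shifts.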