Scalable metabolic reconstruction for metagenomic data and the human microbiome

Sahar Abubucker, Nicola Segata, Johannes Goll, Alyxandria M. Schubert, Jacques Izard, Brandi L. Cantarel, Beltran Rodriguez-Mueller, Jeremy Zucker, Mathangi Thiagarajan, Bernard Henrissat, Owen White, Scott T. Kelley, Barbara Methé, Patrick D. Schloss, Dirk Gevers, Makedonka Mitreva, Curtis Huttenhower
Harvard School of Public Health, Department of Biostatistics
07-16-11

What’s metagenomics?
• Total collection of microorganisms within a community (also microbial community or microbiota)
• Total genomic potential of a microbial community
• Study of uncultured microorganisms from the environment, which can include humans or other living hosts
• Total biomolecular repertoire of a microbial community
[Image: Valm et al., PNAS 2011]

What to do with your metagenome?
• Reservoir of gene and protein functional information
• Who’s there? What are they doing?
• Comprehensive snapshot of microbial ecology and evolution
• Who’s there varies: your microbiota is plastic and personalized.
• What they’re doing is adapting to their environment: you, your body, and your environment.
• Public health tool monitoring population health and interactions
• Diagnostic or prognostic biomarker for host disease

The Human Microbiome Project for a normal population
• Human population: 300 people, 15 (18) body sites
• Multifaceted data: >6,000 samples, >50M 16S seqs., 4Tbp unique metagenomic sequence, >1,900 reference genomes, full clinical metadata
• Multifaceted analyses of the microbial population: novel organisms, biotypes, viruses, metabolism
• 2 clin. centers, 4 seq.
centers, data generation, technology development, computational tools, ethics…

Metabolic/Functional Reconstruction: The Goal
• Host side: Healthy/IBD, BMI, diet, gene expression, SNP genotypes
• Microbiome side: taxon abundances, enzyme family abundances, pathway abundances
• LEfSe (LDA Effect Size): metagenomic biomarker discovery
• Nicola Segata, http://huttenhower.sph.harvard.edu/lefse

HMP: Metabolic reconstruction
• 100 subjects, 1-3 visits/subject, ~7 body sites/visit, 10-200M reads/sample, 100bp reads
• HUMAnN: HMP Unified Metabolic Analysis Network, http://huttenhower.sph.harvard.edu/humann
• WGS reads → BLAST against functional seq. (KEGG + MetaCyc; CAZy, TCDB, VFDB, MEROPS…)
• BLAST → Genes (KOs):
  c(g) = \frac{1}{|g|} \sum_{r} \frac{\sum_{a \in a(r)} (1 - p_a)\,\mathbf{1}(a \in g)}{\sum_{a \in a(r)} (1 - p_a)}
• Genes → Pathways (KEGGs): MinPath (Ye 2009)
• Taxonomic limitation: remove pathways in taxa < ave.
• Xipe: distinguish zero/low (Rodriguez-Mueller in review)
• Gap filling: c(g) = max(c(g), median)
• Smoothing (Witten-Bell):
  c(g) = \begin{cases} \frac{T N}{(V - T)(N + T)} & \text{if } c(g) = 0 \\ \frac{c(g)\, N}{N + T} & \text{otherwise} \end{cases}
• Output: pathways/modules

HUMAnN: Metabolic reconstruction
[Figure: heatmaps of pathway coverage and pathway abundance; axes ← Pathways → and ← Samples →; body sites: Oral (BM), Oral (TD), Oral (SupP), Gut, Vaginal, Skin, Nares]

HUMAnN: Validating gene and pathway abundances on synthetic data
[Figure: individual gene families, ρ=0.91]
• Validated on individual gene families, module coverage, and abundance
• 4 synthetic communities
  – Low (20 org.) and high (100 org.) complexity
  – Even and lognormal abundances
• Best-BLAST-hit overshoots on false positives and undershoots real pathways as a result
• HUMAnN FNs: short genes (<100bp), taxonomically rare pathways
• HUMAnN FPs: large and multicopy (not many in bacteria)

A portrait of the healthy human microbiome: Who’s there vs.
what they’re doing
[Figure: relative abundance by body site (Nares, Oral (BM), Oral (SupP), Oral (TD), Vaginal, Skin, Gut); panels ← Phylotypes → and ← Pathways →, each with axis ← Relative abundance →]

Niche specialization in human microbiome function
• Metabolic modules in the KEGG functional catalog enriched at one or more body habitats
• Nicola Segata, http://huttenhower.sph.harvard.edu/lefse

Proteoglycan degradation by the gut microbiota
[Figure labels: glycosaminoglycans (polysaccharide chains); AA core]

Proteoglycan degradation: From pathways to enzymes
[Figure: enzyme relative abundance, scale 10^-8 to 10^-3]
• Heparan sulfate degradation missing due to the absence of heparanase, a eukaryotic enzyme
• Other pathways not bottlenecked by individual genes
• HUMAnN links microbiome-wide pathway reconstructions → site-specific pathways → individual gene families

Patterns of variation in human microbiome function by niche
• Three main axes of variation
  – Eukaryotic exterior
  – Low-diversity vaginal
  – Gut metabolism
  – Oral vs. tooth hard surface
• Only broad patterns: every human-associated habitat is functionally distinct!

How do microbes and function vary within each body site across the population?

How do body sites compare between individuals across the population?

HMP: Prevalence of species (OTUs) across the population
[Figure: cumulative prevalence]

HMP: Prevalence of pathways across the population
[Figure: cumulative prevalence]
• 16 (of 251) modules strongly “core” at 90%+ coverage in 90%+ individuals at 7 body sites
• 24 modules at 33%+ coverage
• 71 modules (28%) weakly “core” at 33%+ coverage in 66%+ individuals at 6+ body sites
• Contrast zero phylotypes or OTUs meeting this threshold!
• Only 24 modules (<10%) differentially covered by body site
• Compare with 168 modules (>66%) differentially abundant by body site

Linking function to community composition
[Figure: ← Taxa and correlated metabolic pathways → across ← 52 posterior fornix microbiomes →]
• Lactobacillus crispatus: phosphate and peptide transport
• Lactobacillus jensenii: sugar transport
• Lactobacillus gasseri: Embden-Meyerhof glycolysis, phosphotransferases
• Lactobacillus iners: F-type ATPase, THF
• Gardnerella/Atopobium: AA and small molecule biosynthesis
• Candida/Bifidobacterium: eukaryotic pathways
• Plus ubiquitous pathways: transcription, translation, cell wall, portions of central carbon metabolism…

Linking communities to host phenotype
[Figure: normalized relative abundance of top correlates with BMI in stool vs. Body Mass Index; panels vs. vaginal pH (posterior fornix)]
• Vaginal pH, community metabolism, and community composition represent a strong, direct link between phenotype and function in these data.

Microbial biomolecular function and metabolism in the human microbiome: the story so far?
• HUMAnN
  – Accurate metagenomic metabolic reconstruction
  – Sequences → genes → pathways → phenotypes
  – Validated on 4x synthetic communities
• Who’s there varies even in health
  – What they’re doing doesn’t (as much)
• There are patterns in this variation
  – Communities in related environments adapt using related functions
  – Function correlates with membership and phenotype
• ~1/3 to 2/3 of human metagenome characterized
  – Job security!

Ask both what you can do for your microbiome and what your microbiome can do for you

Thanks!
Human Microbiome Project / HMP Metabolic Reconstruction
Sahar Abubucker, Nicola Segata, Dirk Gevers, Levi Waldron, George Weinstock, Owen White, Rob Knight, Johannes Goll, Makedonka Mitreva, Yuzhen Ye, Erica Sodergren, Beltran Rodriguez-Mueller, Mihai Pop, Jeremy Zucker, Vivien Bonazzi, Mathangi Thiagarajan, Jane Peterson, Brandi Cantarel, Lita Proctor, Qiandong Zeng, Maria Rivera, Barbara Methé, Bill Klimke, Daniel Haft, Ben Ganzfried, Fah Sathira, Alyx Schubert, Pat Schloss, Jacques Izard, Bruce Birren, Ramnik Xavier, Doyle Ward, Eric Alm, Ashlee Earl, Lisa Cosimi, Vagheesh Narasimhan, Larisa Miropolsky

Interested? We’re recruiting students and postdocs!
http://huttenhower.sph.harvard.edu
http://huttenhower.sph.harvard.edu/humann
http://huttenhower.sph.harvard.edu/lefse

HMP: Prevalence of genera (phylotypes) across the population
[Figure: cumulative prevalence]
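
Note on the HUMAnN pipeline slide: the gap-filling rule c(g) = max(c(g), median) and the Witten-Bell smoothing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HUMAnN's own code. The function names `gap_fill` and `witten_bell` and the gene labels are hypothetical. The sketch assumes the usual Witten-Bell reading of the symbols (N = total abundance, T = number of gene families observed with nonzero abundance, V = number of gene families considered) and assumes the median is taken over the genes of a single pathway.

```python
from statistics import median


def gap_fill(counts):
    """Raise each gene abundance in one pathway to at least the pathway median.

    Implements c(g) = max(c(g), median). Taking the median over the genes of
    a single pathway is an assumption of this sketch.
    """
    m = median(counts.values())
    return {gene: max(c, m) for gene, c in counts.items()}


def witten_bell(counts):
    """Witten-Bell smoothing of abundances, keeping the total N unchanged.

    Assumed symbol meanings (standard Witten-Bell, not stated on the slide):
      N = total abundance, T = number of genes with c(g) > 0,
      V = number of genes considered, so V - T genes are unobserved.
    """
    n = sum(counts.values())
    t = sum(1 for c in counts.values() if c > 0)
    v = len(counts)
    if n == 0:
        # Nothing observed: there is no mass to redistribute.
        return dict(counts)
    seen_scale = n / (n + t)
    # Each unobserved gene gets an equal share T*N / ((V - T)(N + T)).
    unseen = t * n / ((v - t) * (n + t)) if v > t else 0.0
    return {gene: (c * seen_scale if c > 0 else unseen)
            for gene, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical gene labels and abundances, for illustration only.
    pathway = {"geneA": 6.0, "geneB": 3.0, "geneC": 0.0, "geneD": 0.0}
    print(witten_bell(pathway))
    print(gap_fill({"geneA": 0, "geneB": 4, "geneC": 10}))
```

Smoothing moves a small amount of abundance from observed to unobserved gene families while preserving the total, so zero counts that may reflect undersampling are not treated as true absences. Gap filling plays a similar role within a pathway by lifting unusually low genes to the pathway median.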