Charting the function of microbes and microbial communities
Curtis Huttenhower
Harvard School of Public Health, Department of Biostatistics
11-17-11
Valm et al., PNAS 2011

What to do with your metagenome?
• Who’s there? What are they doing?
• Reservoir of gene and protein functional information
• Comprehensive snapshot of microbial ecology and evolution
• Public health tool monitoring population health and interactions
• Diagnostic or prognostic biomarker for host disease
• Who’s there varies: your microbiota is plastic and personalized. This personalization is true at the level of phyla, genera, species, strains, and sequence variants.
• What they’re doing is adapting to their environment: you, your body, and your environment.

Slides by Dirk Gevers

The NIH Human Microbiome Project (HMP): A comprehensive microbial survey
• What is a “normal” human microbiome?
• 300 healthy human subjects
• Multiple body sites
  – 15 male, 18 female
• Multiple visits
• Clinical metadata
www.hmpdacc.org

A three-tier study design… for mining metagenomic data
[Figure: three data tiers (16S, WGS, reference genomes) and their processing pipelines.
16S (>3k reads per sample): filtering/trimming, chimera removal, taxonomic classification (RDP), clustering into OTUs, organismal census at different taxonomic levels.
WGS (~100M reads per sample): filtering/trimming, assembly into contigs, annotation of genes (~90M proteins), mapping on reference genomes, BLAST against functional DBs, pathways census.
Percentages shown in the figure: ~50%, ~57%, ~36%.]

“Pathogen” carriage varies a lot
22 uniquely identifiable nonzero abundance “pathogens” from NIAID’s list of 135
[Figure: relative abundance and average relative abundance of these organisms across body sites (buccal mucosa, tongue dorsum, supragingival plaque, stool, posterior fornix, anterior nares, retroauricular crease), with detail panels for supragingival plaque, posterior fornix, and stool. Sample counts shown on the panels: 124, 60, and 146 samples.
Organisms shown: Gardnerella vaginalis, Alistipes putredinis, Gemella haemolysans, Actinomyces odontolyticus, Capnocytophaga sputigena, Capnocytophaga gingivalis, Capnocytophaga ochracea, Eikenella corrodens, Burkholderiales bacterium, Propionibacterium acnes, Parvimonas micra, Porphyromonas gingivalis, Proteus mirabilis, Streptobacillus moniliformis, Atopobium rimae, Ureaplasma urealyticum, Eggerthella lenta, Proteus penneri, Arcobacter butzleri, Salmonella enterica, Nocardia farcinica, Cryptobacterium curtum.]

Phenotypes that explain variation (or not) can be surprising
[Figures: normalized relative abundance]

A functional perspective on the human microbiome
• Phenotypes: Healthy/IBD, BMI, Diet
• 100 subjects, 1-3 visits/subject, ~7 body sites/visit, 10-200M reads/sample, 100bp reads
• Metagenomic reads → BLAST → functional seq. (KEGG + MetaCYC; CAZy, TCDB, VFDB, MEROPS…)
• Taxon abundances, gene expression, SNP genotypes, enzyme family abundances, pathway abundances ?
Enzymes and pathways
HUMAnN: HMP Unified Metabolic Analysis Network
http://huttenhower.sph.harvard.edu/humann

HUMAnN: Metabolic reconstruction
[Figure: two heatmaps, pathway coverage and pathway abundance (pathways by samples), with samples grouped by body site: Oral (BM), Oral (SupP), Oral (TD), Gut, Vaginal, Skin, Nares]

A portrait of the healthy human microbiome: Who’s there vs. what they’re doing
[Figure: phylotype abundance and pathway abundance per subject, by body site: Nares, Oral (BM), Oral (SupP), Oral (TD), Vaginal, Skin, Gut]

Niche specialization in human microbiome function
[Figure: pathway abundance of metabolic modules in the KEGG functional catalog enriched at one or more body habitats, across ~700 HMP communities]
• 16 (of 251) modules strongly “core” at 90%+ coverage in 90%+ individuals at 7 body sites
  – 24 modules at 33%+ coverage
• 71 modules (28%) weakly “core” at 33%+ coverage in 66%+ individuals at 6+ body sites
• Contrast zero phylotypes or OTUs meeting this threshold!
• Only 24 modules (<10%) differentially covered by body site
• Compare with 168 modules (>66%) differentially abundant by body site

Proteoglycan degradation by the gut microbiota
[Figure labels: glycosaminoglycans (polysaccharide chains), AA core]

Proteoglycan degradation: From pathways to enzymes
[Figure: enzyme relative abundance, scale 10^-8 to 10^-3]
• Heparan sulfate degradation missing due to the absence of heparanase, a eukaryotic enzyme
• Other pathways not bottlenecked by individual genes
• HUMAnN links microbiome-wide pathway reconstructions → site-specific pathways → individual gene families

Patterns of variation in human microbiome function by niche
• Three main axes of variation
  – Eukaryotic exterior
  – Low-diversity vaginal
  – Gut metabolism
• Oral vs. tooth hard surface
• Only broad patterns: every human-associated habitat is functionally distinct!
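
The “core” criteria on the niche specialization slide (X%+ coverage in Y%+ of individuals at N+ body sites) amount to a nested threshold test. The following is a minimal sketch of that test, not HUMAnN code: the function names, the input layout (one list of per-individual coverage values per body site), and the toy numbers are all assumptions made for illustration.

```python
def fraction_covered(values, min_coverage):
    """Fraction of individuals whose coverage of the module is >= min_coverage."""
    if not values:
        return 0.0
    return sum(v >= min_coverage for v in values) / len(values)


def is_core(coverage_by_site, min_coverage, min_fraction, min_sites):
    """True if the module reaches min_coverage in at least min_fraction of
    individuals at min_sites or more body sites."""
    passing_sites = sum(
        fraction_covered(values, min_coverage) >= min_fraction
        for values in coverage_by_site.values()
    )
    return passing_sites >= min_sites


# Hypothetical coverage values (0..1) for one module, per individual, per site.
toy_module = {
    "gut": [0.95, 0.92, 0.97, 0.40],
    "nares": [0.50, 0.45, 0.35, 0.20],
}

# "Strongly core"-style test: 90%+ coverage in 90%+ of individuals.
strong = is_core(toy_module, min_coverage=0.90, min_fraction=0.90, min_sites=2)
# "Weakly core"-style test: 33%+ coverage in 66%+ of individuals.
weak = is_core(toy_module, min_coverage=0.33, min_fraction=0.66, min_sites=2)
print(strong, weak)
```

With only two toy sites, min_sites is set to 2 here; the slide's thresholds of 7 and 6+ body sites would apply to the full set of sampled habitats.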

Normal varies a lot at the genus level (16S)
[Figure: relative frequency of genera within stool; 343 genera, 200 subjects; labeled genera: Parabacteroides, Faecalibacterium, Alistipes, Bacteroides]
Dirk Gevers

Normal varies a lot at the species level (WGS)
[Figure: relative frequency of Bacteroides species within stool; 123 samples; labeled species: Bacteroides caccae, Bacteroides stercoris, Bacteroides sp., Bacteroides uniformis, Bacteroides sp., Bacteroides vulgatus]
Dirk Gevers

What’s wrong with this picture?
Species and strains matter, but so does your method for identifying them in a community!
[Figure: strain-level profile across 52 posterior fornix microbiomes. Rows: Lactobacillus crispatus MV-1A-US, Lactobacillus crispatus JV-V01, Lactobacillus crispatus 125-2-CHN, Lactobacillus crispatus 214-1, Lactobacillus crispatus MV-3A-US, Lactobacillus crispatus ST1, Lactobacillus gasseri JV-V03, Lactobacillus gasseri 202-4, Lactobacillus gasseri 224-1, Lactobacillus gasseri MV-22, Bifidobacterium breve DSM 20213, Bifidobacterium dentium ATCC 27679, Mycoplasma hominis, Clostridiales genomosp BVAB3 str UPII9-5, Clostridiales genomosp BVAB3 UPII9-5, Gardnerella vaginalis AMD, Prevotella timonensis CRIS 5C-B1, Megasphaera genomosp type 1 str 28L, Porphyromonas uenonis 60-3, Gardnerella vaginalis 409-05, Gardnerella vaginalis 5-1, Atopobium vaginae DSM 15829, Gardnerella vaginalis ATCC 14019, Lactobacillus jensenii 1153, Lactobacillus jensenii 269-3, Lactobacillus jensenii SJ-7A-US, Lactobacillus jensenii 208-1, Lactobacillus jensenii JV-V16, Lactobacillus jensenii 27-2-CHN, Lactobacillus jensenii 115-3-CHN, Lactobacillus iners AB-1, Lactobacillus iners DSM 13335]

Core gene families
• A core gene is a gene strongly conserved within a clade
• Gene X is a core gene for Clade Y
  – All subclades of Clade Y must have Gene X as a core gene (strict definition)
  – Gene X may be a core gene of several (unrelated) clades
• The definition has to be relaxed to account for:
  – Low-level gene losses
  – Sequencing errors
  – Gene-calling errors

Examples of core genes
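
The relaxed core-gene definition can be illustrated with a presence-fraction threshold: a gene counts as core for a clade if it is found in at least some fraction of the clade's genomes, which tolerates occasional gene loss, sequencing errors, and missed gene calls. This is a rough sketch under stated assumptions, not the method used in the talk: it flattens the subclade hierarchy into a plain set of genomes, and the function name, the genome and gene labels, and the 0.75 threshold are all hypothetical.

```python
def core_genes(genomes_in_clade, min_fraction=1.0):
    """Return genes present in at least min_fraction of the clade's genomes.

    genomes_in_clade maps a genome name to the set of gene families called in it.
    min_fraction=1.0 is the strict case; lower values relax the definition.
    """
    n_genomes = len(genomes_in_clade)
    if n_genomes == 0:
        return set()
    counts = {}
    for genes in genomes_in_clade.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, c in counts.items() if c / n_genomes >= min_fraction}


# Hypothetical clade Y: geneX is missing from one genome (loss or a missed call).
clade_y = {
    "genome_1": {"geneX", "geneA", "geneB"},
    "genome_2": {"geneX", "geneA"},
    "genome_3": {"geneX", "geneA"},
    "genome_4": {"geneA"},
}

strict = core_genes(clade_y, min_fraction=1.0)    # only genes in every genome
relaxed = core_genes(clade_y, min_fraction=0.75)  # tolerates one missing genome
print(sorted(strict), sorted(relaxed))
```

Under the strict setting geneX is rejected because of a single absent call; the relaxed threshold keeps it, which is the behavior the slide's caveats argue for.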

Clade-specific marker genes
• Gene X is a marker gene (for Clade Y) if X is a core gene for Y and X never appears outside Clade Y

Examples of marker genes

The BactoChip: high-throughput microbial species identification
With Olivier Jousson, Annalisa Ballarini

BactoChip: detecting single species
With Olivier Jousson, Annalisa Ballarini

MetaPhlAn: inferring microbial abundances from metagenomic data using marker genes
• Map metagenomic reads to marker genes to infer microbial abundances
  – Normalizing for copy number, gene length, etc.
• Much faster than existing approaches, as the marker gene database is ~50 times smaller than the whole microbial sequence DB
• A few hours instead of weeks for Illumina samples with 100Gb of sequence data
MetaPhlAn: Metagenomic Phylogenetic Analysis
http://huttenhower.sph.harvard.edu/metaphlan

MetaPhlAn: synthetic validation on lognormal abundances
Summary of 8 synthetic communities composed of 2M reads coming from 200 organisms with log-normally distributed abundances
[Figure panels: species level, class level]

Matching 16S and more

The human microbiome at species-level resolution

Whence enterotypes?
[Figure panels: species, genera]

Microbial community function and structure in the human microbiome: the story so far?
• Who’s there varies even in health
  – What they’re doing doesn’t (as much)
  – Both correlate with niche
  – By the way: both change during disease and treatment
• There are patterns in this variation
  – Function correlates with membership and phenotype
  – “Pathogenicity” correlates with lower prevalence
  – Membership means species, strains, or variants
  – Patterns aren’t always as simple as enterotypes
• ~1/3 to 2/3 of human metagenome characterized
  – Job security!

Ask both what you can do for your microbiome and what your microbiome can do for you

Thanks!
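
As a follow-up to the MetaPhlAn slide, the idea of mapping reads to marker genes and normalizing for gene length can be sketched in a few lines. This is an illustrative simplification and not MetaPhlAn's actual implementation: it assumes reads have already been assigned to markers, ignores copy number, averages per-marker coverage within each clade, and uses hypothetical marker names, lengths, and counts.

```python
def relative_abundances(read_counts, marker_lengths, marker_clade):
    """Estimate clade relative abundances from reads mapped to marker genes.

    read_counts:    marker -> number of reads mapped (include markers with 0 reads)
    marker_lengths: marker -> marker gene length in bp
    marker_clade:   marker -> clade the marker is specific to
    """
    coverage_by_clade = {}
    for marker, reads in read_counts.items():
        # Length normalization: reads per base of marker sequence.
        coverage = reads / marker_lengths[marker]
        coverage_by_clade.setdefault(marker_clade[marker], []).append(coverage)

    # Average the normalized coverage over each clade's markers.
    clade_coverage = {
        clade: sum(values) / len(values)
        for clade, values in coverage_by_clade.items()
    }

    total = sum(clade_coverage.values())
    if total == 0:
        return {clade: 0.0 for clade in clade_coverage}
    return {clade: cov / total for clade, cov in clade_coverage.items()}


# Hypothetical toy input: two markers for clade_A, one for clade_B.
reads = {"m1": 30, "m2": 60, "m3": 5}
lengths = {"m1": 1000, "m2": 2000, "m3": 500}
clades = {"m1": "clade_A", "m2": "clade_A", "m3": "clade_B"}

abundances = relative_abundances(reads, lengths, clades)
print(abundances)
```

Without the length normalization, the longer marker m2 would look twice as abundant as m1 even though both belong to the same clade; dividing by length puts them on the same scale before averaging.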

Human Microbiome Project
Nicola Segata, Levi Waldron, Xochi Morgan, Dirk Gevers, Owen White, George Weinstock, Karen Nelson, Sahar Abubucker, Joe Petrosino, Yuzhen Ye, Mihai Pop, Beltran Rodriguez-Mueller, Pat Schloss, Jeremy Zucker, Makedonka Mitreva, Qiandong Zeng, Erica Sodergren, Mathangi Thiagarajan, Vivien Bonazzi, Brandi Cantarel, Jane Peterson, Maria Rivera, Lita Proctor, Barbara Methe, Bill Klimke, Daniel Haft
HMP Metabolic Reconstruction
Joseph Moon, Fah Sathira, Tim Tickle, Ramnik Xavier, Harry Sokol, Bruce Birren, Mark Daly, Doyle Ward, Eric Alm, Ashlee Earl, Lisa Cosimi, Jacques Izard, Jeroen Raes, Karoline Faust, Vagheesh Narasimhan, Josh Reyes, Olivier Jousson, Annalisa Ballarini, Wendy Garrett, Michelle Rooks
http://huttenhower.sph.harvard.edu

Linking function to community composition
Taxa and correlated metabolic pathways across 52 posterior fornix microbiomes:
• Lactobacillus crispatus: phosphate and peptide transport
• Lactobacillus jensenii: sugar transport
• Lactobacillus gasseri: Embden-Meyerhof glycolysis, phosphotransferases
• Lactobacillus iners: F-type ATPase, THF
• Gardnerella/Atopobium: AA and small molecule biosynthesis
• Candida/Bifidobacterium: eukaryotic pathways
Plus ubiquitous pathways: transcription, translation, cell wall, portions of central carbon metabolism…

Linking communities to host phenotype
[Figure: normalized relative abundance of the top correlates with BMI in stool (x-axis: Body Mass Index) and of correlates with vaginal pH in the posterior fornix (x-axis: vaginal pH)]
Vaginal pH, community metabolism, and community composition represent a strong, direct link between phenotype and function in these data.