Metagenomics at Second Genome
Tanya Yatsunenko
tanya@secondgenome.com

• San Francisco based company leveraging microbiome science to enable the discovery and development of human health products through services, collaborations and internal R&D
• Taking a mechanistic approach to discovery
  – First-of-kind microbiome drug discovery platform with pharma partner validation
  – Not Dx, not nutrition, not fecal transplant, not strains as drugs
• Curator of Greengenes™ database (Todd DeSantis)
• QIIME developer (Justin Kuczynski from Rob Knight Lab)
• Over 200 microbiome studies completed to date across industry, government, academic researchers, nutrition companies, and pharma

Metagenomic (and RNA-seq) Pipeline at SG

Input: Sample1_Right.fastq, Sample1_Left.fastq
• Remove adapters (fastq-mcf)
• Remove poor quality bases and short reads (prinseq-lite)
• Remove Host DNA (Bowtie2)
• Remove rRNA (SortMeRNA)
Filtered sequences
• MetaPhlAn -> Taxonomic Table
• Functional Annotation (RapSearch, BioCyc Database) -> Genes, Genomes, Pathway abundance and coverage
Samples comparison: PCoA, Hierarchical Clustering; Discriminatory Organisms and Pathways
Open source software
Cloud = Amazon AWS spot

Functional annotation
Genes -> Enzymes -> Pathways and Strains

1 Query Sequence from Sample1: KDYDTAQRVLGNVLVLNIIIGLAFTVLTLIFLD

Functional assignments

Genes                                   1    2
GJXV-1205, GTP cyclohydrolase           1    0
GJXV-2161, Na+-driven multidrug pump    0   10

Enzymes                                 1    2
ENZRXNJXV-1763                          1    0
ENZRXNJXV-1765                          0   10

Pathways                                1    2
NAGLIPASYN-PWY                          1    0
PWY-5687                                0   10

Bacterial strain assignments

Strains                                 1    2
Faecalibacterium prausnitzii M-65       1  100
Acidovorax sp. JS42                     0    1

Connecting genes/enzymes to bacterial genomes

Challenges

• ~1% filtered sequences with a significant hit to BioCyc database
• Assembly with complex microbiota?
• Paired-end sequences are treated independently (for HiSeq)
• Confidence in identification of strain hits from metagenomic and transcriptomic datasets
• Database: KEGG vs BioCyc vs others
• For some samples, forward and reverse reads result in different microbiome profiles

Correlating human with microbial transcriptome

[Diagram: Microbial gene vs Human gene, positive (+Rho) or negative (-Rho) correlation]
• Get correlation coefficient (Rho) and p value
• 23 million correlations, 400 after Bonferroni correction

Best correlation: Peptidoglycan glycosyltransferase vs Human gene (inflammasome related)

[Figure: Human gene expression and Microbial enzyme expression, by Sample ID]

Best correlation: microbial enzyme vs 5 human genes

[Figure: Relative Abundance of HUMAN genes vs Relative Abundance of MICROBIAL ENZYME RXN-11348, Peptidoglycan glycosyltransferase]

Summary

• Will be happy to discuss our methods and some of the findings
• Currently working on relating human and microbiome functions in disease states

tanya@secondgenome.com
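
Sketch: correlation screen with Bonferroni cutoff

The screen described under "Correlating human with microbial transcriptome" (a Rho and a p value for each microbial gene / human gene pair, then Bonferroni correction) might look roughly like the sketch below. It is an illustration under assumptions, not the pipeline used at SG: the slides say only "Rho", so a Spearman rank correlation is assumed; the 0.05 significance level and the example abundances are hypothetical; and the p value itself is left to a statistics library. With 23 million tests and an assumed alpha of 0.05, the per-test cutoff would be about 2.2e-9.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's Rho: Pearson correlation computed on the ranks of x and y.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bonferroni_threshold(alpha, n_tests):
    """Per-test p value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical relative abundances across five samples (not data from the talk).
microbial_enzyme = [0, 2, 4, 6, 8]
human_gene = [5, 20, 60, 90, 150]

rho = spearman_rho(microbial_enzyme, human_gene)
cutoff = bonferroni_threshold(0.05, 23_000_000)  # alpha = 0.05 is an assumption
print(f"Rho = {rho:.2f}; keep pairs with p < {cutoff:.2e}")
```

A pair would count as significant only if its p value falls below the cutoff; the strictness of that cutoff is consistent with only about 400 of 23 million pairs surviving.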