BioIT World conference
Scalable metabolic reconstruction
for metagenomic data
and the human microbiome
Sahar Abubucker, Nicola Segata, Johannes Goll,
Alyxandria Schubert, Beltran Rodriguez-Mueller, Jeremy Zucker,
the Human Microbiome Project Metabolic Reconstruction team,
the Human Microbiome Consortium,
Patrick D. Schloss, Dirk Gevers, Makedonka Mitreva,
Curtis Huttenhower
Harvard School of Public Health
Department of Biostatistics
04-14-11
What’s metagenomics?
Total collection of microorganisms
within a community
Also microbial community or microbiota
Total genomic potential of
a microbial community
Study of uncultured microorganisms
from the environment, which can include
humans or other living hosts
Total biomolecular repertoire
of a microbial community
The Human Microbiome Project for a normal population
300 People / 15(18) Body Sites
Multifaceted data
• >12,000 samples
• >50M 16S seqs.
• 4.6Tbp unique metagenomic sequence
• >1,900 reference genomes
• Full clinical metadata
Multifaceted analyses
• Human population
• Microbial population
• Novel organisms
• Biotypes
• Viruses
• Metabolism
2 clin. centers, 4 seq. centers, data generation,
technology development, computational tools, ethics…
15+ Demonstration Projects for microbial communities in disease
Gastrointestinal, Skin, Urogenital
• Obesity
• Psoriasis
• Crohn’s disease
• Acne
• Ulcerative colitis
• Atopic dermatitis
• STDs
• Reproductive health
• Autoimmunity
• Cancer
• Bacterial vaginosis
• Necrotizing enterocolitis
All include additional subjects and technology development
What to do with your metagenome?
Reservoir of
gene and protein
functional
information
Who’s there?
What are they doing?
Comprehensive
snapshot of
microbial ecology
and evolution
What do functional genomic
data tell us about microbiomes?
(×10^10)
What can our microbiomes tell
us about us?
Public health tool
monitoring
population health
and interactions
Diagnostic or
prognostic
biomarker for
host disease
Metabolic/Functional Reconstruction:
The Goal
Intervention/
perturbation
Healthy/IBD
BMI
Diet
Batch effects?
Population
structure?
Biological
story?
Taxon abundances
Gene expression
SNP genotypes
Enzyme family abundances
Pathway abundances
Niches &
Phylogeny
Confounds/
stratification/
environment
Independent
sample
Cross-validate
Test for
correlates
Multiple
hypothesis
correction
Feature
selection
p >> n
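The "multiple hypothesis correction" step matters most when p >> n, since thousands of features are tested on few samples. The slide does not name a correction method; the sketch below assumes the Benjamini-Hochberg false discovery rate procedure, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as an illustration; the slide does
    not say which correction the analysis applies.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # One p-value per tested feature (for example, per pathway).
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Features whose adjusted value falls below the chosen FDR level (commonly 0.05) would be carried forward to validation on an independent sample.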
HMP: Metabolic reconstruction
300 subjects
1-3 visits/subject
~6 body sites/visit
10-200M reads/sample
100bp reads
HUMAnN:
HMP Unified
Metabolic Analysis
Network
Functional seq.
KEGG + MetaCYC
BLAST
CAZy, TCDB,
VFDB, MEROPS…
BLAST → Genes
$$c(g) = \frac{1}{|g|} \sum_{r} \frac{\sum_{a \in a(r)} (1 - p_a)\,(a = g)}{\sum_{a \in a(r)} (1 - p_a)}$$
http://huttenhower.sph.harvard.edu/humann
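A minimal sketch of the BLAST → Genes step, under one reading of the weighting formula on this slide: each read r spreads one unit of weight over its alignments a(r) in proportion to 1 - p, and each gene's total is divided by its length |g|. The input layout (tuples of read, gene, p-value) and the function name are assumptions for illustration, not HUMAnN's actual interface.

```python
from collections import defaultdict


def gene_abundances(hits, gene_lengths):
    """Estimate c(g) for each gene family from weighted read alignments.

    hits: iterable of (read_id, gene_id, p_value) tuples (hypothetical layout).
    gene_lengths: dict mapping gene_id to its length |g|.
    """
    # Group alignment weights (1 - p) by read.
    per_read = defaultdict(list)
    for read_id, gene_id, p_value in hits:
        per_read[read_id].append((gene_id, 1.0 - p_value))

    counts = defaultdict(float)
    for alignments in per_read.values():
        total = sum(weight for _, weight in alignments)
        if total <= 0.0:
            continue  # every hit for this read had p = 1; nothing to assign
        # Each read contributes a total weight of 1 across the genes it hits.
        for gene_id, weight in alignments:
            counts[gene_id] += weight / total

    # Normalize by gene length so long genes are not over-counted.
    return {g: c / gene_lengths[g] for g, c in counts.items()}


if __name__ == "__main__":
    hits = [("r1", "K1", 0.0), ("r1", "K2", 0.5), ("r2", "K1", 0.0)]
    print(gene_abundances(hits, {"K1": 2, "K2": 1}))
```

Normalizing per read keeps a read with many ambiguous hits from counting more than a read with a single confident hit.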
Genes
(KOs)
Genes → Pathways
MinPath (Ye 2009)
WGS
reads
Taxonomic limitation
Pathways
(KEGGs)
Xipe
?
Rem. paths in taxa < ave.
Pathways/
modules
Distinguish zero/low
Gap filling
(Rodriguez-Mueller in review)
c(g) = max( c(g), median )
Smoothing
Witten-Bell
$$c(g) = \begin{cases} \dfrac{T N}{(V - T)(N + T)} & \text{if } c(g) = 0 \\[4pt] \dfrac{c(g)\, N}{N + T} & \text{otherwise} \end{cases}$$
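The gap-filling and smoothing rules on this slide can be sketched as below. Two things are assumed because the slide does not define them: the median in the gap-filling rule is taken over the abundances passed in (for example, the genes of one pathway), and N, T, V follow standard Witten-Bell usage (N is the total count, T the number of genes with a nonzero count, V the number of genes considered). Function names are hypothetical.

```python
from statistics import median


def gap_fill(counts):
    """Apply c(g) = max(c(g), median) to a dict of gene -> abundance.

    Assumption: the median is computed over the values passed in.
    """
    m = median(counts.values())
    return {g: max(c, m) for g, c in counts.items()}


def witten_bell(counts):
    """Witten-Bell smoothing, so zero counts become small positive values.

    Assumption: N = total count, T = number of nonzero genes, V = all genes.
    """
    N = sum(counts.values())
    V = len(counts)
    T = sum(1 for c in counts.values() if c > 0)
    if T == 0:
        return dict(counts)  # nothing observed, so nothing to redistribute
    smoothed = {}
    for g, c in counts.items():
        if c == 0:
            # Mass reserved for unseen genes, shared equally among them.
            smoothed[g] = T * N / ((V - T) * (N + T))
        else:
            # Observed counts are discounted to make room for unseen genes.
            smoothed[g] = c * N / (N + T)
    return smoothed


if __name__ == "__main__":
    print(gap_fill({"a": 4, "b": 0, "c": 2}))
    print(witten_bell({"a": 3, "b": 0, "c": 1, "d": 0}))
```

When at least one gene has a zero count, the smoothed values sum to N, so the total abundance is preserved while zero and low counts become distinguishable.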
HMP: Metabolic reconstruction
Pathway coverage
Pathway abundance
HUMAnN: Validating gene and pathway
abundances on synthetic data
Individual gene families
ρ=0.86
Validated on individual gene families,
module coverage, and abundance
• 4 synthetic communities:
Low (20 org.) and high (100 org.) complexity
Even and lognormal abundances
• Few false negatives: short genes (<100bp),
taxonomically rare pathways
• Few false positives: large and multicopy
(not many in bacteria)
Functional modules in 741 HMP samples
[Figure: two panels, Coverage and Abundance; axes are Samples (grouped by body site: PF, O(BM), S, O(SP), O(TD), RC, AN) and Pathways]
• Zero microbes (of ~1,000) are core among body sites
• Zero microbes are core among individuals
• 19 (of ~220) pathways are present in every sample
• 53 pathways are present in 90%+ samples
• Only 31 (of 1,110) pathways are present/absent from exactly one body site
• 263 pathways are differentially abundant in exactly one body site
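Counts such as "present in every sample" or "present in 90%+ samples" come down to computing, for each pathway, the fraction of samples in which it is detected. A minimal sketch, assuming presence calls have already been made per sample; the data layout and function names are hypothetical, and the sample sets below are invented for illustration.

```python
def pathway_prevalence(presence):
    """Return the fraction of samples in which each pathway is present.

    presence: dict mapping sample_id to the set of pathway ids detected
    in that sample (hypothetical layout).
    """
    n_samples = len(presence)
    counts = {}
    for pathways in presence.values():
        for pathway in pathways:
            counts[pathway] = counts.get(pathway, 0) + 1
    return {p: c / n_samples for p, c in counts.items()}


def core_pathways(presence, min_fraction=1.0):
    """List pathways present in at least min_fraction of the samples."""
    prevalence = pathway_prevalence(presence)
    return sorted(p for p, f in prevalence.items() if f >= min_fraction)


if __name__ == "__main__":
    # Invented presence calls for three samples.
    presence = {
        "s1": {"M00001", "M00002", "M00008"},
        "s2": {"M00001", "M00002"},
        "s3": {"M00001"},
    }
    print(core_pathways(presence))        # present in every sample
    print(core_pathways(presence, 0.6))   # present in at least 60% of samples
```

Calling `core_pathways(presence, 0.9)` on real presence calls would give the "90%+ samples" style of count quoted on the slide.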
A portrait of the human microbiome:
Who’s there?
With Jacques Izard, Susan Haake, Katherine Lemon
Pathway coverage
A portrait of the human microbiome:
What are they doing?
HMP: How do microbes vary within each body
site across the population?
HMP: How do body sites compare between
individuals across the population?
HMP: Penetrance of species (OTUs)
across the population
Data from Pat Schloss
HMP: Penetrance of genera (phylotypes)
across the population
Data from Pat Schloss
HMP: Penetrance of pathways
across the population
KEGG Metabolic modules
M00001: Glycolysis (Embden-Meyerhof)
M00002: Glycolysis, core module
M00003: Gluconeogenesis
M00004: Pentose phosphate cycle
M00007: Pentose phosphate (non-oxidative)
M00049: Adenine biosynthesis
M00050: Guanine biosynthesis
M00052: Pyrimidine biosynthesis
M00053: Pyrimidine deoxyribonucleotide biosynthesis
M00120: Coenzyme A biosynthesis
M00126: Tetrahydrofolate biosynthesis
M00157: F-type ATPase, bacteria
M00164: ATP synthase
M00178: Ribosome, bacteria
M00183: RNA polymerase, bacteria
M00260: DNA polymerase III complex, bacteria
M00335: Sec (secretion) system
M00359: Aminoacyl-tRNA biosynthesis
M00360: Aminoacyl-tRNA biosynthesis, prokaryotes
M00362: Nucleotide sugar biosynthesis, prokaryotes
M00006: Pentose phosphate (oxidative)
M00051: Uridine monophosphate biosynthesis
M00125: Riboflavin biosynthesis
M00008: Entner-Doudoroff pathway
M00239: Peptides/nickel transport system
M00018: Threonine biosynthesis
M00168: CAM (Crassulacean acid metabolism), dark
M00167: Reductive pentose phosphate cycle
• Human microbiome functional structure dictated
primarily by microbial niche, not host (in health)
• Huge variation among hosts in who’s there;
small variation in what they’re doing
• Note: definitely variation in how these
functions are implemented
• Does not yet speak in detail to host
environment (diet!), genetics, or disease
Population summary statistics
→ population biology
← Individuals →
← Species →
Posterior fornix, ref. genomes
Lactobacillus iners
Lactobacillus crispatus
Gardnerella vaginalis
Lactobacillus jensenii
Lactobacillus gasseri
Posterior fornix, functional modules
← Pathways →
Essential amino acids
Basic biology, sugar transport
Urea cycle, amines, aromatic AAs
LEfSe: Metagenomic class
comparison and explanation
LEfSe
LDA +
Effect Size
Nicola
Segata
http://huttenhower.sph.harvard.edu/lefse
LEfSe: Evaluation on synthetic data
Their FP rate
Our FP rate
Microbes characteristic of the
oral and gut microbiota
Aerobic, microaerobic and
anaerobic communities
• High oxygen: skin, nasal
• Mid oxygen: vaginal, oral
• Low oxygen: gut
LEfSe: The TRUC murine colitis microbiota
With Wendy Garrett
Microbial biomolecular function and biomarkers
in the human microbiome: the story so far?
• Who’s there changes
– What they’re doing doesn’t (as much)
– How they’re doing it does
• The data so far only scratch the surface
– Only 1/3 to 2/3 of the reads/sample map to cataloged gene families
– Only 1/3 to 2/3 of these gene families have cataloged functions
– Very much in line with MetaHIT study of gut microbiota
– Job security!
• Looking forward to functional reconstruction…
– In environmental communities
– With respect to host environment + genetics
– With respect to host disease
Thanks!
Human Microbiome Project
Ramnik Xavier
Nicola Segata
Dirk Gevers
Pinaki Sarder
George Weinstock
Jennifer Wortman
Owen White
Sahar Abubucker
Makedonka Mitreva
Yuzhen Ye
Erica Sodergren
Beltran Rodriguez-Mueller
Mihai Pop
Jeremy Zucker
Vivien Bonazzi
Qiandong Zeng
Jane Peterson
Mathangi Thiagarajan
Lita Proctor
Brandi Cantarel
Maria Rivera
Barbara Methe
Bill Klimke
Daniel Haft
HMP Metabolic Reconstruction
Levi Waldron
Larisa Miropolsky
Wendy Garrett
Jacques Izard
Bruce Birren
Mark Daly
Doyle Ward
Eric Alm
Ashlee Earl
Lisa Cosimi
Interested? We’re recruiting
graduate students and postdocs!
http://huttenhower.sph.harvard.edu
http://huttenhower.sph.harvard.edu/humann
http://huttenhower.sph.harvard.edu/lefse