Presentation - Turing Gateway to Mathematics

Metagenomics at Second Genome
Tanya Yatsunenko
tanya@secondgenome.com
• San Francisco-based company leveraging microbiome science to enable the discovery and development of human health products through services, collaborations, and internal R&D
• Taking a mechanistic approach to discovery
– First-of-its-kind microbiome drug discovery platform with pharma partner validation
– Not Dx, not nutrition, not fecal transplant, not strains as drugs
• Curator of the Greengenes™ database (Todd DeSantis)
• QIIME developer (Justin Kuczynski, from the Rob Knight Lab)
• Over 200 microbiome studies completed to date across industry, government, academic researchers, nutrition companies, and pharma
Metagenomic (and RNA-seq) Pipeline at SG
• Input: Sample1_Left.fastq, Sample1_Right.fastq
• Read filtering
– Remove adapters: fastq-mcf
– Remove poor quality bases and short reads: prinseq-lite
– Remove host DNA: Bowtie2
– Remove rRNA: SortMeRNA
• Filtered sequences
– MetaPhlAn: taxonomic table
– RapSearch: functional annotation against the BioCyc database (genes, genomes, pathway abundance and coverage)
• Samples comparison: PCoA, hierarchical clustering; discriminatory organisms and pathways
• Open source software
• Cloud = Amazon AWS spot
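As an illustration of the "remove poor quality bases and short reads" step, here is a minimal Python sketch. It is not prinseq-lite and does not reproduce its options. The function names, the Phred+33 encoding, and the thresholds (`min_q=20`, `min_len=50`) are assumptions chosen for the example, not values from the slides.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def trim_read(seq, quals, min_q=20, min_len=50):
    """Trim low-quality bases from the 3' end.

    Returns the trimmed sequence, or None if the read ends up shorter
    than min_len. Thresholds are illustrative assumptions.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end]


def filter_records(records, min_q=20, min_len=50):
    """Yield (name, seq, qual) for reads that survive trimming.

    `records` is an iterable of (name, seq, qual_string) tuples.
    """
    for name, seq, qual in records:
        trimmed = trim_read(seq, phred33(qual), min_q, min_len)
        if trimmed is not None:
            yield name, trimmed, qual[:len(trimmed)]


# Tiny usage example with made-up reads ('I' = Q40, '#' = Q2).
reads = [("r1", "ACGTACGT", "IIIIII##"), ("r2", "ACGT", "####")]
kept = list(filter_records(reads, min_q=20, min_len=5))
```

In this toy input, `r1` loses its two low-quality trailing bases and `r2` is dropped entirely because nothing remains after trimming.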
Functional annotation
Genes -> Enzymes -> Pathways and Strains
Connecting genes/enzymes to bacterial genomes
1 Query Sequence from Sample1: KDYDTAQRVLGNVLVLNIIIGLAFTVLTLIFLD

Functional assignments
Genes                                     1     2
GJXV-1205, GTP cyclohydrolase             1     0
GJXV-2161, Na+-driven multidrug pump      0    10

Enzymes                                   1     2
ENZRXNJXV-1763                            1     0
ENZRXNJXV-1765                            0    10

Pathways                                  1     2
NAGLIPASYN-PWY                            1     0
PWY-5687                                  0    10

Bacterial strain assignments
Strains                                   1     2
Faecalibacterium prausnitzii M-65         1   100
Acidovorax sp. JS42                       0     1
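The Genes -> Enzymes -> Pathways roll-up can be sketched as a simple count aggregation. The following Python sketch is an assumption-laden illustration only: the gene-to-enzyme and enzyme-to-pathway links below are hypothetical pairings chosen because their counts match on the slide, and `roll_up` is a made-up helper, not part of BioCyc or any tool named in the talk.

```python
from collections import defaultdict

# Hypothetical links, assumed for illustration (the slide does not state them).
GENE_TO_ENZYME = {
    "GJXV-1205": "ENZRXNJXV-1763",
    "GJXV-2161": "ENZRXNJXV-1765",
}
ENZYME_TO_PATHWAY = {
    "ENZRXNJXV-1763": "NAGLIPASYN-PWY",
    "ENZRXNJXV-1765": "PWY-5687",
}


def roll_up(counts, mapping):
    """Sum per-column counts from one annotation level up to the next.

    `counts` maps feature -> {column: count}; features without an entry
    in `mapping` are skipped.
    """
    out = defaultdict(lambda: defaultdict(int))
    for feature, per_column in counts.items():
        parent = mapping.get(feature)
        if parent is None:
            continue
        for column, n in per_column.items():
            out[parent][column] += n
    return {k: dict(v) for k, v in out.items()}


# Gene counts as shown on the slide (columns "1" and "2").
gene_counts = {
    "GJXV-1205": {"1": 1, "2": 0},
    "GJXV-2161": {"1": 0, "2": 10},
}
enzyme_counts = roll_up(gene_counts, GENE_TO_ENZYME)
pathway_counts = roll_up(enzyme_counts, ENZYME_TO_PATHWAY)
```

With several genes mapping to the same enzyme, or several enzymes to the same pathway, the same function would sum their counts, which is the main reason to aggregate level by level.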
Challenges
• Only ~1% of filtered sequences have a significant hit to the BioCyc database
• Assembly with complex microbiota?
• Paired-end sequences are treated independently (for HiSeq)
• Confidence in identifying strain hits from metagenomic and transcriptomic datasets
• Database: KEGG vs BioCyc vs others
• In some samples, forward and reverse reads result in different microbiome profiles
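One way to put a number on the last challenge is to compute a dissimilarity between the profile built from forward reads and the one built from reverse reads. The sketch below uses Bray-Curtis dissimilarity as one common choice. The slides do not say which measure, if any, was used, and the taxon names and counts are invented for the example.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Profiles are dicts mapping taxon -> abundance. Returns a value in
    [0, 1]: 0 for identical profiles, 1 for no shared abundance.
    """
    taxa = set(profile_a) | set(profile_b)
    diff = sum(abs(profile_a.get(t, 0) - profile_b.get(t, 0)) for t in taxa)
    total = sum(profile_a.get(t, 0) + profile_b.get(t, 0) for t in taxa)
    return diff / total if total else 0.0


# Made-up profiles for the same sample from forward and reverse reads.
forward = {"taxon_A": 10, "taxon_B": 0}
reverse = {"taxon_A": 5, "taxon_B": 5}
d = bray_curtis(forward, reverse)
```

A value near 0 would suggest the two read directions agree; larger values flag samples worth inspecting before the reads are pooled.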
Correlating human with microbial transcriptome
• Each microbial gene is paired with each human gene; the correlation can be positive (+Rho) or negative (-Rho)
• Get correlation coefficient (Rho) and p value
• 23 million correlations, 400 after Bonferroni correction
• Best correlation: peptidoglycan glycosyltransferase vs a human gene (inflammasome related)
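The correlation and multiple-testing steps can be sketched as follows. This is a minimal pure-Python illustration, assuming Rho refers to Spearman's rank correlation. The significance level `alpha = 0.05` is also an assumption, since the slides do not state one. The p value itself is not computed here; in practice it would come from a statistics library.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's Rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under Bonferroni correction."""
    return alpha / n_tests


# Made-up expression values across four samples.
rho = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])
# 23 million tests as on the slide; alpha = 0.05 is an assumed level.
cutoff = bonferroni_threshold(0.05, 23_000_000)
```

With 23 million tests, the per-test cutoff is on the order of 2e-9, which is consistent with only a few hundred pairs surviving correction.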
[Figure: human gene expression and microbial enzyme expression by sample ID]
Best correlation: microbial enzyme vs 5 human genes
[Figure: relative abundance of human genes (y-axis) vs relative abundance of microbial enzyme RXN-11348, peptidoglycan glycosyltransferase (x-axis)]
Summary
• Will be happy to discuss our methods and
some of the findings
• Currently working on relating human and
microbiome functions in disease states
tanya@secondgenome.com