Uploaded by Cypher

Class 2 - Microbial systems biology

Microbiology systems biology
Systems biology is ...
$ (1) Identifying the parts, (2) mapping the interactions, (3) profiling responses to perturbations, (4) modeling the system and predicting its responses
Microbial systems biology
$ Microorganisms rarely occur as single-species populations
$ Microorganisms are encountered in many hosts/environments
$ Technological advances in sequencing are revolutionizing (microbial) biology: next-generation sequencing (NGS) allows massive data generation
Uncovering unknowns: -omics
$ Example 1: bacterial analysis of 3 stool samples
-> 340 cultured bacterial species
-> 698 phylotypes identified by pyrosequencing
-> majority (42%) uncultured bacteria
-> Abundance unknowns not taken into account (technical limits)
Uncovering unknowns: our other genome
$ Example 2: MetaHIT ultra-deep Illumina sequencing of 124 individuals
Key questions in systems biology
$ What is a system?
$ What biological units map on to systems?
$ How do systems constrain individual components?
Microbial systems biology- bacterial cell
$ Integration of patterns within a microorganism
-> static patterns ex. protein-protein interactions, metabolic pathways
-> dynamic patterns ex. metabolite flow through network of enzymes
$ Comparison across different species: based on evolution
= comparisons of gene and protein interactions, occurrences and activities
-> powerful because systems arose through processes of evolution
Microbial systems biology - complex microbial community
$ Integration of patterns within a microbial community
-> static patterns ex. network analyses based on presence-absence or abundance
-> dynamic patterns ex. time-series analyses, perturbation experiments
$ Comparison across different communities
= comparisons of interactions between microbial communities
-> co-occurrence and co-exclusion patterns across communities
-> production of secondary metabolites, cross feeding patterns
Microbial Systems biology - complex microbial community in interaction with environment
$ Integration of patterns in complex microbial community and environment
-> ex. impact of host genotype on human-associated microbial composition
-> dynamic patterns ex. metabolite flow through network of enzymes
$ Comparison of the effects of different environmental changes on complex communities
-> ex. impact of host medication on intestinal microbial community
Complex microbial ecosystems: from systems biology to ecosystems biology
$ Systems biology in microbiology: same definitions, different context
$ systems biology = integration
$ The definition of the system determines the research: in this part, system = complex microbial community
$ biological units: microorganisms
$ similar evolution from data collection to explanatory science
$ feedback loop: data driven discovery approach <=> hypothesis driven approach
$ Disclaimer: interpretation of transcriptome, proteome and metabolome research is hampered by unknowns ==> focus on marker-gene-based and metagenomic research; microbial systems biology is not limited to genome-based research
From data-collection to explanatory science
$ Clustering analyses, network analyses, working around unknowns
Searching for patterns in complex data
$ clustering of samples from different body parts: general consensus
What data is being used?
$ clustering based on distance metrics
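The notes only say "distance metrics", so the choice of metric here is an assumption. A minimal sketch of one metric often used in enterotyping work is the Jensen-Shannon distance between relative-abundance profiles. The pure-Python helpers below turn per-sample taxon counts into a pairwise distance matrix that a clustering method can consume. All function names are hypothetical illustrations, not taken from any specific tool.

```python
import math

def relative_abundance(counts):
    """Convert raw taxon counts of one sample into proportions."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two profiles."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture profile; m_i > 0 wherever p_i > 0
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(jsd)

def distance_matrix(samples):
    """Pairwise Jensen-Shannon distances between samples given as lists of counts."""
    profiles = [relative_abundance(s) for s in samples]
    n = len(profiles)
    return [[jensen_shannon_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

# Usage with three toy samples over three taxa (made-up counts):
# distance_matrix([[10, 0, 5], [0, 10, 5], [10, 0, 5]])
```

With the natural logarithm, the distance ranges from 0 (identical profiles) to sqrt(ln 2) (no shared taxa).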
How to cluster?
$ visually: arbitrary method based on visually apparent groups (non preferred)
$ using clustering quality measures
-> separation: low similarity between clusters
-> compactness: high similarity within clusters
$ "Best" clustering by compactness alone is trivial: number of clusters = number of samples
==> statistics for estimating the number of clusters (e.g. the gap statistic): compare the change in within-cluster dispersion with that expected under a reference null distribution
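The comparison against a null distribution can be made concrete with the gap statistic of Tibshirani et al. The following is a minimal sketch built on several assumptions. Samples are treated as Euclidean points, for example coordinates from an ordination of the distance matrix. Clustering uses plain k-means with a deterministic farthest-first initialisation. The reference data are drawn uniformly over the bounding box of the observed points. All names are hypothetical.

```python
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic initial centres: each new centre is the point farthest from existing ones."""
    centres = [list(points[0])]
    while len(centres) < k:
        centres.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centres))))
    return centres

def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns the non-empty clusters as lists of points."""
    centres = farthest_first(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: sq_dist(p, centres[c]))].append(p)
        new_centres = [[sum(x) / len(cl) for x in zip(*cl)] if cl else centres[i]
                       for i, cl in enumerate(clusters)]
        if new_centres == centres:  # assignments no longer change the centres
            break
        centres = new_centres
    return [cl for cl in clusters if cl]

def within_dispersion(clusters):
    """W_k: pooled within-cluster sum of squared distances to the cluster means."""
    total = 0.0
    for cl in clusters:
        mean = [sum(x) / len(cl) for x in zip(*cl)]
        total += sum(sq_dist(p, mean) for p in cl)
    return total

def gap_statistic(points, k_max, n_ref=10, seed=0):
    """Return [(k, Gap(k), s_k)] comparing log W_k with its expectation under a uniform null."""
    rng = random.Random(seed)
    lows = [min(d) for d in zip(*points)]
    highs = [max(d) for d in zip(*points)]
    results = []
    for k in range(1, k_max + 1):
        log_w = math.log(within_dispersion(kmeans(points, k)))
        ref_logs = []
        for _ in range(n_ref):
            # Reference data set: same size, uniform over the observed bounding box.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in range(len(points))]
            ref_logs.append(math.log(within_dispersion(kmeans(ref, k))))
        mean_ref = sum(ref_logs) / n_ref
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_ref)
        results.append((k, mean_ref - log_w, sd * math.sqrt(1 + 1 / n_ref)))
    return results

def choose_k(results):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1)."""
    for (k, gap, _), (_, next_gap, next_s) in zip(results, results[1:]):
        if gap >= next_gap - next_s:
            return k
    return results[-1][0]
```

A large jump in Gap(k) means the data are much more compact at k clusters than structureless data would be.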
Searching for patterns in complex data: The enterotypes
$ clustering of sample types within body parts: highly contested
Enterotypes
$ Quantitative species composition comparison based on 40 universal single-copy genes (phylomarkers)
Searching for patterns in complex data: The enterotypes: why contested?
$ Lack of consensus on analytical basis for enterotypes
$ enterotype detection is confounded by: distance metric, cluster scoring method, taxonomic level, 16S rRNA region, OTU-picking approach, WGS (whole-genome shotgun) sequencing
$ Identifying enterotypes in datasets depends not only on the structure of the data but also on the methods applied to assess clustering strength
Dirichlet multinomial mixture models (DMMs)
$ Probabilistic modelling of microbial metagenomics data
$ frequency matrix giving the number of times each taxon is observed in each sample
$ communities are diverse and skewed towards rare taxa: sparse matrix
$ samples have different sizes (sequencing depths)
$ model fit for an increasing number of Dirichlet mixture components (using the Laplace approximation to the negative log model evidence)
==> groups of communities with similar composition
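Fitting a DMM needs expectation-maximisation plus the Laplace approximation, and that procedure is not reproduced here. As a smaller sketch, the code shows two things for an already fitted model. First, the Dirichlet-multinomial likelihood, which handles samples of different sizes directly. Second, how samples are assigned to components from the mixture weights (pi) and per-component Dirichlet parameters (alpha). The parameter values and function names are hypothetical.

```python
import math

def dm_log_likelihood(counts, alpha):
    """Log-probability of one sample's taxon counts under a Dirichlet-multinomial(alpha)."""
    n = sum(counts)
    a = sum(alpha)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)  # multinomial coefficient
    log_p += math.lgamma(a) - math.lgamma(n + a)
    log_p += sum(math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha))
    return log_p

def responsibilities(counts, weights, alphas):
    """Posterior probability that a sample belongs to each mixture component (E-step)."""
    logs = [math.log(w) + dm_log_likelihood(counts, al) for w, al in zip(weights, alphas)]
    top = max(logs)
    unnorm = [math.exp(l - top) for l in logs]  # log-sum-exp trick for numerical stability
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Usage: two hypothetical community types, one dominated by taxon 0 and one by taxon 1.
# r = responsibilities([20, 0], [0.5, 0.5], [[50.0, 1.0], [1.0, 50.0]])
# assigned_type = max(range(len(r)), key=r.__getitem__)
```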
Enterotype classification schemes: based on different methods
$ 1106 samples: Flemish Gut Flora Project
$ according to approaches with 3 and 32 clusters, and the DMM approach
$ Prevotella remains separated
$ methods mostly differ in how they divide the area between ETB and ETF
$ distance within a cluster compared to the median distance between the clusters
$ in all cases, distances within clusters < distances between clusters
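The check above can be done per cluster. This is a small sketch that assumes a precomputed sample-by-sample distance matrix and one cluster label per sample. For every cluster it reports the median distance between its own members and the median distance from its members to samples in other clusters. The function name is hypothetical.

```python
from statistics import median

def cluster_separation(dist, labels):
    """For each cluster: (median within-cluster distance, median distance to other clusters)."""
    out = {}
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        others = [i for i, lab in enumerate(labels) if lab != c]
        within = [dist[i][j] for a, i in enumerate(members) for j in members[a + 1:]]
        between = [dist[i][j] for i in members for j in others]
        # A single-member cluster has no within-cluster distances: report NaN.
        out[c] = (median(within) if within else float("nan"),
                  median(between) if between else float("nan"))
    return out
```

A clustering passes this check when, for every cluster, the first value is smaller than the second.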
Hierarchical structure of different clusterings
$ different clusterings: highly associated, forming hierarchical structure
The enterotypes: biological relevance
$ density peaks and relative stability suggest ecosystem optima
$ functional richness differs substantially between enterotypes
$ Discovery of a "bad" enterotype: the Bacteroides 2 (Bact2) enterotype
Enterotyping: DIY
$ Dirichlet multinomial mixtures (DMM) is a probabilistic method for community typing (or clustering) of microbial community profiling data. It is an infinite mixture model, which means that the method can infer the optimal number of community types. Note that the number of community types is likely to grow with data size
$ Fit the DMM model; the maximum allowed number of community types is set to 3 to speed up the example
-> check model fit with different numbers of mixture components using standard information criteria
-> pick the optimal model
-> inspect the mixture parameters pi and theta
-> inspect the sample-component assignments
-> inspect the contribution of each taxonomic group to each component
$ output: 3 community types with their top drivers (the maximum number of community types was set to 3)
Enterotypes of human gut
$ Statistical support for 2-4 enterotypes, biological support for more enterotypes
==> illustrates the feedback loop between computational models and simulations versus knowledge discovery and data mining in systems biology
Enterotypes: The truth is still out
$ consistent clusters of complex microbial samples from human gut
$ hypothesis: due to co-evolution of humans and human-associated microbial communities, the overall function of the gut microbial ecosystem is non-random but its composition can vary
==> distinct clusters are ecosystem optima
$ trade off between pragmatic systems biology (large scale molecular interactions) and systems theoretic biology (systems principles)
==> enterotypes illustrate the interplay of different streams within systems biology: both are committed to understanding complex systems, both are committed to mathematical modelling, and both lack a clear account of what biological microbial systems are
Intermediate conclusion: clustering analyses
$ clustering analysis of complex microbial communities: allows data stratification
$ structuring of complex data
$ enterotypes illustrate the limits as well as the potential of clustering analyses in complex gut microbial communities
$ confounding factors for cluster detection need to be taken into account
From data-collection to explanatory science
$ network analyses
Searching for patterns in complex data: the enterotypes
$ microbial interaction patterns: co-occurrence networks of the 3 enterotypes from the Sanger metagenomics data
$ how do systems constrain individual components? How are individual biological units and their behavior altered, controlled or constrained by becoming components of the system?
Microbial networks
$ interactions in microbial communities
$ microbial network analyses methods vary in sensitivity and precision
$ most popular method for correlation-based networks: Spearman's rank correlation coefficients between taxa
$ microbial communities are subdivided by abundance, activity, function and occupancy
=> use different network analyses for different subcommunities
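The following is a minimal sketch of a Spearman correlation network. Each taxon is given as its abundance profile across the same samples. Spearman's rho is computed for every pair as the Pearson correlation of the ranks, with tied values receiving their average rank. An edge is kept when |rho| reaches a cut-off. The cut-off of 0.6 and all names are hypothetical. Positive edges read as co-occurrence and negative edges as co-exclusion. Real tools add significance testing and corrections for compositional data, and this sketch does neither.

```python
import math

def rank(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of ranks (0.0 if a profile is constant)."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def cooccurrence_network(abundances, threshold=0.6):
    """abundances: {taxon: profile across samples}. Returns edges (a, b, rho) with |rho| >= threshold."""
    taxa = sorted(abundances)
    edges = []
    for i, a in enumerate(taxa):
        for b in taxa[i + 1:]:
            rho = spearman(abundances[a], abundances[b])
            if abs(rho) >= threshold:
                edges.append((a, b, rho))
    return edges
```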
Network analyses
$ characterizing microbial interactions using similarity measures
$ microbial co-occurence relationships in the human microbiome
Network analyses: co-occurence of microbial clades within and among body areas
$ nodes: microbial classes
$ edges: summary of interactions between classes over all body sites, shown if the number of edges is significantly larger than expected
$ solid edges: % contribution of interactions in the oral cavity
$ dashed edges: % contribution of interactions from skin sites
$ most interclass interactions occur in the mouth
Network analyses: summary statistics of microbial associations in normal human microbiota
Network analyses: biological relevance: microbial dysbiosis index characterizes Crohn's disease severity
$ correlation network inferred for microbiota compositions using CCREPE with the checkerboard score
$ strong co-occurrence between taxa with the same disease-associated behavior
$ co-exclusion between taxa with different behavior
$ co-occurring taxa with disease-associated behavior: positively correlated with disease activity
$ co-occurring taxa with non-CD-associated behavior: negatively correlated with disease activity
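To show what a checkerboard score measures, here is a minimal sketch of the classic Stone and Roberts checkerboard units computed on presence/absence data. This is an illustration of the idea only, not necessarily the exact score that CCREPE implements. For two taxa, r is the number of samples in which a taxon occurs and S is the number of samples where both occur. The pair then scores (r_x - S)(r_y - S), so taxa that never appear together score high (co-exclusion). The detection threshold and function names are hypothetical.

```python
from itertools import combinations

def presence_absence(abundances, min_count=1):
    """Convert an abundance profile into presence (1) / absence (0) with a detection threshold."""
    return [1 if a >= min_count else 0 for a in abundances]

def checkerboard_units(x, y):
    """(r_x - S) * (r_y - S): r = samples where a taxon occurs, S = samples where both occur."""
    r_x, r_y = sum(x), sum(y)
    shared = sum(1 for a, b in zip(x, y) if a and b)
    return (r_x - shared) * (r_y - shared)

def c_score(profiles):
    """Mean checkerboard units over all taxon pairs; higher values mean more co-exclusion."""
    pairs = list(combinations(profiles, 2))
    return sum(checkerboard_units(x, y) for x, y in pairs) / len(pairs)
```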
Network analyses: biological relevance: functional and phylogenetic similarities between co-occurring taxa
$ evolutionary distances among microbial clades compared to functional potential
$ Jaccard index of orthologous gene (COG) families shared between genomes
$ phylogenetic distances inferred by FastTree using species-level 16S sequences
$ baseline correlation between functional and evolutionary distances
$ lower left: highly related clades co-occurring among related habitats
$ off-diagonals: potential competition or functional complementarity
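The functional side of this comparison reduces to a Jaccard index over COG family sets. This is a minimal sketch that assumes each genome is given as a set of COG identifiers; the genome names and COG sets in the usage line are made up. It computes pairwise functional distances as 1 - Jaccard and a plain Pearson correlation. That correlation could relate those distances to phylogenetic distances for the same genome pairs, for example the baseline correlation mentioned above.

```python
import math

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets of gene-family identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0  # convention: two empty sets are identical

def functional_distances(genomes):
    """Pairwise functional distance 1 - Jaccard for genomes given as {name: set of COG IDs}."""
    names = sorted(genomes)
    return {(x, y): 1.0 - jaccard(genomes[x], genomes[y])
            for i, x in enumerate(names) for y in names[i + 1:]}

def pearson(x, y):
    """Pearson correlation between two equally long lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

# Hypothetical usage: functional_distances({"g1": {"COG0001", "COG0002"}, "g2": {"COG0002"}})
```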
Agreement network analyses: association networks produced by individual similarity measures
$ heat map depicting edge overlap measured by the Jaccard index
$ ensemble of scoring measures to capture different types of microbial co-occurences
- correlations (Pearson, Spearman)
- GBLMs (generalized boosted linear models)
- dissimilarities (Kullback-Leibler divergence (KLD), Bray-Curtis distance)
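The two dissimilarity measures can be sketched in a few lines. In network inference they are applied to pairs of taxon profiles across samples. Bray-Curtis is computed directly on abundances. KLD needs strictly positive proportions, so this sketch adds a small pseudocount before normalising; the pseudocount value is an assumption. Because KLD is asymmetric, a symmetrised average is also shown as one option. This is not necessarily how any particular tool combines them, and all names are hypothetical.

```python
import math

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum |x_i - y_i| / sum (x_i + y_i); 0 = identical, 1 = no overlap."""
    denom = sum(a + b for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / denom if denom else 0.0

def _normalise(v):
    """Scale a non-negative vector to sum to 1."""
    s = sum(v)
    return [a / s for a in v]

def kl_divergence(x, y, pseudocount=1e-6):
    """D(p || q) in nats after adding an assumed pseudocount to avoid log(0) and division by 0."""
    p = _normalise([a + pseudocount for a in x])
    q = _normalise([b + pseudocount for b in y])
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def symmetric_kld(x, y):
    """Average of both KLD directions, giving a symmetric dissimilarity."""
    return 0.5 * (kl_divergence(x, y) + kl_divergence(y, x))
```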
Different types of communities: impact on stability
$ community with interactions not correlated with species abundances ==> only removal of many species/OTUs will result in functional change
$ community with abundant keystone species involved in most interactions ==> strength of ecological functions varies greatly: drivers and passengers in community
Microbial network analysis: DIY: CoNet
$ CoNet = association network inference tool
$ developed with microbial community data from sequencing in mind
$ designed to be generic: detects associations in any data set with repeated observations of biological entities (genes, metabolites, species)
$ Variety of network inference approaches which can be combined
$ Workflow: input matrix -> preprocessing (filtering, normalization) -> method selection (similarities, dissimilarities, correlations) -> threshold setting (initial network) -> computation of permutation and bootstrap distributions (final network with p-values)
$ can parse BIOM files
$ support for lagged similarity computation in time series
$ automatic assignment of higher level taxa from lineages
$ large choice of correlation, distance and similarity measures
$ measures can be combined in multiple ways
$ implements the ReBoot procedure
$ significance can be tested with various randomization routines and multiple testing corrections
$ supports row groups and combination of 2 input matrices
$ several data preprocessing and filtering options
$ repeatable data analysis by loading and saving settings
$ command line tool for the analysis of large data sets
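The significance step of the workflow can be illustrated with a generic edge-level permutation test followed by Benjamini-Hochberg correction. This is a minimal sketch and not CoNet's own procedure; in particular it does not implement ReBoot, which additionally renormalises to handle compositional data. One taxon's profile is shuffled, Spearman's rho is recomputed, and the p-value is the fraction of permutations at least as extreme as the observed rho. The number of permutations and all names are hypothetical.

```python
import math
import random

def _ranks(values):
    """1-based ranks; ties receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (0.0 if a profile is constant)."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for one edge: shuffle y, count |rho_perm| >= |rho_obs|."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for pos in range(m - 1, -1, -1):  # walk from the largest p-value down
        i = order[pos]
        running_min = min(running_min, pvalues[i] * m / (pos + 1))
        adjusted[i] = running_min
    return adjusted
```

Adding 1 to both numerator and denominator keeps the p-value strictly positive, which is the usual convention for permutation tests.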
Intermediate conclusion: network analyses
$ allows ecological insights
$ structural composition of complex communities
$ correlation analyses allow discovery of relevant biological pattern
$ sensitivity and precision vary between network analyses methods
From data-collection to explanatory science
$ Working around unknowns
Working around unknowns
$ vast amount of sequencing data: uncultured bacteria
$ uncultured = unknown
$ omitting large parts of data
$ metagenomic sequencing: underexploited potential
From data-collection to explanatory science: metagenomic species (MGS) and co-abundance gene groups (CAGs)
$ Why? microbial diversity of many environments extends far beyond what is covered by reference databases
$ What? annotate unknown data: MGS and CAGs = identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes
$ how?
-> segregating a metagenome into groups of genes with similar abundance (CAGs) => identification of biological entities such as species and phages, as well as small genetic entities representing co-inherited clonal heterogeneity
-> binning co-abundant genes across a series of metagenomic samples => comprehensive discovery of new microbial genomes without the need for reference sequences
$ DNA from independent microbial community samples is extracted and shotgun sequenced
$ genes assembled and identified in individual samples are integrated => cross-sample, non-redundant gene catalog
Back to clustering: canopy clustering
$ uses both an approximate similarity measure and an accurate similarity measure to cluster
$ canopies: approximate distance measure efficiently dividing data in overlapping subsets
$ distance measurements only between pairs of points in the same canopy ==> far fewer than all possible pairs in the data set
$ outer threshold = canopy
$ inner threshold = points within it are excluded from becoming new canopy centers
$ canopy = centered on a co-abundance gene group (CAG) = genes with a Pearson correlation coefficient > 0.9 to the seed gene profile
$ the gene content of a settled canopy is named a metagenomic species (MGS) if it contains >= 700 genes
$ smaller groups are still referred to as CAGs
$ individual samples with sequence reads that map to MGS genes and their contigs => reads extracted and used to assemble a draft genome sequence for an MGS
$ sample-specific sequence reads used in the assemblies help discriminate between closely related strains => high-quality genomes
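The canopy steps above can be sketched on gene abundance profiles, with Pearson correlation as the similarity. This is a minimal sketch only. It takes the outer threshold of 0.9 and the 700-gene cut-off from the notes. The inner threshold of 0.97 is a hypothetical value. Seeds are taken in input order rather than the order a real MGS pipeline would use, and the refinement steps of published pipelines are omitted. Canopies may overlap, as in the classic algorithm. Names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two abundance profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def canopy_clustering(profiles, outer=0.9, inner=0.97):
    """profiles: {gene: abundance profile across samples}.
    outer: correlation to the seed needed to join its canopy (loose threshold).
    inner: genes correlated above this with the seed can no longer seed a canopy (tight threshold).
    Returns [(seed, members)]; member lists of different canopies may overlap."""
    candidates = list(profiles)  # seed order: input order (assumption)
    canopies = []
    while candidates:
        seed = candidates[0]
        members = [g for g in profiles if pearson(profiles[seed], profiles[g]) > outer]
        canopies.append((seed, members))
        candidates = [g for g in candidates
                      if g != seed and pearson(profiles[seed], profiles[g]) <= inner]
    return canopies

def label_groups(canopies, min_genes=700):
    """Call a canopy an MGS if it holds >= min_genes genes, otherwise a CAG."""
    return [("MGS" if len(m) >= min_genes else "CAG", seed, m) for seed, m in canopies]
```

Toy data sets are far smaller than real gene catalogues, so `min_genes` has to be lowered to see any MGS label.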
Metagenomic species (MGS): MGS augmented assembly in practice
$ BLAST dot plot shows the relative chromosomal positions of matching sequences on the MGS-augmented assembly
MGS and CAGs analysis: DIY
$ https://services.healthtech.dtu.dk/ > Datasets > MetaGenomic Species
Intermediate conclusion: metagenomic species
$ Working around unknowns in complex microbial communities: metagenomic species and co-abundance gene groups
$ using a specific clustering algorithm: pattern discovery in complex metagenomic data
$ MGS = bioinformatically defined hypothetical species allow assembly of bacterial genomes from metagenomic data
$ dependency patterns between unknowns can be detected in complex microbial communities
Relationships among microbial phylogeny, diversity and ecosystem functioning
$ Relationship between phylogeny and function is complex
$ many functional traits are phylogenetically conserved
$ reasons for phylogeny-function decoupling: horizontal gene transfer (HGT), gene gain and loss; functional type vs. functional rate
Network analysis influenced by environment
$ environmental shifts change the relative abundance of dormant and active fractions within the microbial community
$ such changes cause shifts in microbial co-occurrence patterns (cross-feeding, secondary metabolites)
$ resulting in altered ecosystem functioning
Final conclusion
$ microbial systems biology: same definitions, different context: complex microbial communities
$ clustering and network analyses allow discovery of patterns with biological repercussions
$ clustering and network analyses are tools to mine and unravel massive amounts of data
$ prokaryotic properties need to be taken into account
$ MGS and CAGs demonstrate how models can shortcut lack of knowledge on bacteria