About the cellulases distribution
Renaud Berlemont, UCI, Adam Martiny Lab

About the GHx classification
• CAZYdb
• Glycoside Hydrolases, …
• Structure – sequence alignments: families (>100) / clans (14)
« Convergence – Divergence »

Some statements
• Biochemically confirmed « cellulases » = CMCases
• Many cellulases are active on other substrates (e.g. xylan)
• Many « cellulases » are non-cellulolytic!?
• CMCases ≠ cellulases
• Cellulose production:
 – GH8 (Römling, 2002): biofilm / interaction with plants
 – GH5 (Berlemont, 2009): biofilm
 – GH6 (Delbrassine, in prep.): cell differentiation
 – GH6 (tunicates, animals)
 – GH9 (KORRIGAN, plants)
• Many of the best-studied cellulose degraders belong to the Firmicutes (e.g. Clostridium)
• ~20 genomes of cellulose degraders have been completely sequenced

Question 2
How are extracellular enzyme genes distributed among microbial taxa?
Hypothesis 2a
Some extracellular enzymes are broadly distributed across taxa, while others are constrained to a small number of taxa.
Hypothesis 2b
The occurrence of different extracellular enzyme genes among taxa will be correlated: some genes will show patterns of over-dispersion, while others will show co-occurrence.

pSEED – FigFams
• Sequenced genomes (PATRIC BRC db: 4,089 genomes), to analyze as many sequenced genomes as possible
« FIGfams are sets of protein sequences that are similar along the full length of the proteins. Proteins are thought of as implementing one or more abstract functional roles, and all of the members of a single FIGfam are believed to implement precisely the same set of functional roles ».
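Hypotheses 2a and 2b can be made operational with presence/absence data per genome. The Python sketch below is one possible approach, not the method used in this study: breadth is taken as the fraction of bacterial groups in which at least one genome carries a GH family, and co-occurrence versus over-dispersion of two families is scored with one-sided hypergeometric tests (equivalent to one-sided Fisher's exact tests) on the number of genomes carrying both. Genome IDs, group labels, and the 0.05 threshold are hypothetical. This null model treats genomes as independent, so it ignores phylogeny and the strong sampling bias toward a few phyla.

```python
from math import comb


def taxonomic_breadth(genomes_with_family, genome_to_group):
    """Hypothesis 2a: fraction of bacterial groups in which at least one
    genome carries the family."""
    all_groups = set(genome_to_group.values())
    hit_groups = {genome_to_group[g] for g in genomes_with_family if g in genome_to_group}
    return len(hit_groups) / len(all_groups)


def overlap_pvalues(n_genomes, n_a, n_b, n_ab):
    """One-sided hypergeometric p-values for the overlap of two families.

    Returns (P(overlap >= n_ab), P(overlap <= n_ab)) when the n_b genomes
    carrying family B are drawn at random from all n_genomes.
    """
    total = comb(n_genomes, n_b)
    lo, hi = max(0, n_a + n_b - n_genomes), min(n_a, n_b)

    def pmf(k):
        return comb(n_a, k) * comb(n_genomes - n_a, n_b - k) / total

    p_cooccur = sum(pmf(k) for k in range(n_ab, hi + 1))
    p_segregate = sum(pmf(k) for k in range(lo, n_ab + 1))
    return p_cooccur, p_segregate


def classify_pair(genomes_a, genomes_b, all_genomes, alpha=0.05):
    """Hypothesis 2b: label a pair of families as co-occurring,
    over-dispersed, or neither (alpha is a hypothetical threshold)."""
    universe = set(all_genomes)
    a, b = set(genomes_a) & universe, set(genomes_b) & universe
    p_co, p_seg = overlap_pvalues(len(universe), len(a), len(b), len(a & b))
    if p_co < alpha:
        return "co-occurrence", p_co
    if p_seg < alpha:
        return "over-dispersion", p_seg
    return "no pattern", min(p_co, p_seg)


# Usage with ten hypothetical genomes g1..g10 in two hypothetical groups
genomes = [f"g{i}" for i in range(1, 11)]
groups = {g: ("A" if i < 5 else "M") for i, g in enumerate(genomes)}
print(taxonomic_breadth(["g1", "g2"], groups))           # 0.5
print(classify_pair(genomes[:4], genomes[:4], genomes))  # label 'co-occurrence', p = 1/210
print(classify_pair(genomes[:5], genomes[5:], genomes))  # label 'over-dispersion', p = 1/252
```

The one-sided tests are kept separate on purpose: a pair of families can only support one of the two patterns named in Hypothesis 2b, and reporting the tail that was tested makes that explicit.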
« Unambiguous coherent annotation system » …
EC 3.2.1.4: 1,4-beta-D-endoglucanase, 1,4-beta-D-glucan-4-glucanohydrolase, beta-1,4-endoglucan hydrolase, beta-1,4-endoglucanase, endoglucanase, …

Methodology
CAZYdb GH families (e.g. E.C. 3.2.1.4) → GHx Pfam (prokaryotes + eukaryotes) and InterPro (prokaryotes) → PfGHx.FASTA, IprGHx.FASTA → home-made script: SEQ → pSEED PEG IDs → FigFam IDs
Several FigFam IDs correspond to one GHx family, because signal peptides and accessory domains are not conserved …

Methodology (continued)
GHx pSEED FigFam IDs + genome annotations (FigFam IDs) → GHx occurrence in sequenced genomes → bacterial occurrence / groups list (incl. bacterial CBM2 groups) … → statistics; genome annotations (pSEED); alignment

GHx distribution: a huge dataset
Bacterial groups: A, Actinobacteria; B, Aquificae; C, Bacteroidetes/Chlorobi; D, Chlamydiae/Verrucomicrobia; E, Chloroflexi; F, Chrysiogenetes; G, Cyanobacteria; H, Deferribacteres; I, Deinococcus/Thermus; J, Dictyoglomi; K, Elusimicrobia; L, Fibrobacteres/Acidobacteria; M, Firmicutes; N, Fusobacteria; O, Nitrospirae; P, Gemmatimonadetes; Q, Planctomycetes; R, Proteobacteria; S, Spirochaetes; T, Synergistetes; U, Tenericutes; V, Thermodesulfobacteria; W, Thermotogae
Huge bias: A + C + M + R = 88% of the sequenced genomes…
Possible explanatory factors: average gene content (AGC); life style (autotrophic vs. heterotrophic); host association; … « HKG » (house-keeping genes); multi-function …

GHx distribution in genomes: life style
• Autotrophic: Aquificae, Cyanobacteria, Chrysiogenetes, Nitrospirae
• Host-associated: Chlamydiae/Verrucomicrobia, Elusimicrobia, Fibrobacteres/Acidobacteria*, Fusobacteria, Spirochaetes, Tenericutes

GHx distribution in genomes: GHx functions
• « House keeping » GHx families, e.g. GH18 (… endo-β-N-acetylglucosaminidase …); highlighted groups: Q, Planctomycetes; U, Tenericutes (Mycoplasma)
• « Specialized » GHx families, e.g. GH6 (endoglucanase; cellobiohydrolase)
• Multi-functional GHx families, e.g. GH5
How is it possible to know whether an enzyme from GH5 is a cellulase?

Complex architectures
GH5: chitosanase (EC 3.2.1.132); β-mannosidase (EC 3.2.1.25); cellulase (EC 3.2.1.4); glucan β-1,3-glucosidase (EC 3.2.1.58); licheninase (EC 3.2.1.73); glucan endo-1,6-β-glucosidase (EC 3.2.1.75); mannan endo-β-1,4-mannosidase (EC 3.2.1.78); endo-β-1,4-xylanase (EC 3.2.1.8); cellulose β-1,4-cellobiosidase (EC 3.2.1.91); β-1,3-mannanase (EC 3.2.1.-); xyloglucan-specific endo-β-1,4-glucanase (EC 3.2.1.151); mannan transglycosylase (EC 2.4.1.-); endo-β-1,6-galactanase (EC 3.2.1.164); endoglycoceramidase (EC 3.2.1.123)
GH6: endoglucanase (EC 3.2.1.4); cellobiohydrolase (EC 3.2.1.91)
[Figure: GH5*/GH5-CBM2 and GH6*/GH6-CBM2 distribution in sequenced genomes; y-axis, number of PEGs; x-axis, bacterial groups A–W. Annotations: « Associated with cellulose production in actinomycetes! »; « Free cellulases from the GH6 are … ? »]

Is there an efficient combination of enzymes?
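The per-group GHx counts behind the distribution figures (bacterial groups A–W), and the sampling bias toward a few groups, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the home-made script used in this study: PEG annotations are assumed to be available as (genome, PEG, FigFam) tuples, several FigFam IDs may map to the same GH family, and the per-group value is the mean number of PEGs per genome (one possible reading of « average gene content »). All FigFam IDs, genome IDs, and group assignments in the usage lines are hypothetical.

```python
from collections import Counter, defaultdict


def gh_counts_per_genome(peg_annotations, figfam_to_gh):
    """Count GHx-annotated PEGs per genome.

    peg_annotations: iterable of (genome_id, peg_id, figfam_id) tuples.
    figfam_to_gh: maps a FigFam ID to a GH family; several FigFam IDs may
    map to the same family. FigFams absent from the map are ignored.
    """
    counts = defaultdict(Counter)
    for genome, _peg, figfam in peg_annotations:
        family = figfam_to_gh.get(figfam)
        if family is not None:
            counts[genome][family] += 1
    return counts


def average_gene_content(counts, genome_to_group, family):
    """Mean number of PEGs of one GH family per genome, for each group.

    Dividing by the number of genomes per group keeps heavily sequenced
    groups from dominating raw totals.
    """
    totals, n_genomes = Counter(), Counter()
    for genome, group in genome_to_group.items():
        n_genomes[group] += 1
        totals[group] += counts.get(genome, Counter())[family]
    return {group: totals[group] / n_genomes[group] for group in n_genomes}


def group_share(genome_to_group):
    """Fraction of all genomes that falls in each bacterial group."""
    n_genomes = Counter(genome_to_group.values())
    total = sum(n_genomes.values())
    return {group: n / total for group, n in n_genomes.items()}


# Toy usage: FF1..FF3 are hypothetical FigFam IDs, g1..g3 hypothetical genomes
figfam_to_gh = {"FF1": "GH5", "FF2": "GH5", "FF3": "GH6"}
annotations = [
    ("g1", "peg.1", "FF1"),
    ("g1", "peg.2", "FF2"),
    ("g1", "peg.3", "FF3"),
    ("g2", "peg.1", "FF1"),
    ("g3", "peg.1", "FF9"),  # FF9 is not mapped to a GH family, so it is skipped
]
genome_to_group = {"g1": "A", "g2": "A", "g3": "M"}
counts = gh_counts_per_genome(annotations, figfam_to_gh)
print(average_gene_content(counts, genome_to_group, "GH5"))  # {'A': 1.5, 'M': 0.0}
print(group_share(genome_to_group))
```

Under the same assumptions, mapping FigFam IDs to architecture labels such as « GH5* » and « GH5-CBM2 » instead of bare families would give separate per-group counts for each architecture.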
Some genes are abundant (GH5, 10, 16, 18, 19). Are these genes really involved in plant cell wall (PCW) breakdown?
Multi-domain
Why are Fibrobacteria so efficient? Is there an efficient combination of enzymes?
The keys of the success in Fibrobacteria

Things to remember…
• Huge dataset
• Distribution of GHx amongst taxa
• Not all GHx are equivalent – multi-function, house keeping and specialized GHx families
• Not all taxa are equivalent – life style, metabolism
• Future: « multi-domain »

What's next
• Looking at the GHx distribution in subgroups (e.g. Proteobacteria, Firmicutes, …)
• Detailed table of the GHx distribution amongst (sub)taxa

Potential publication?
• What is the phylogenetic distribution of GHx's and CBM-GHx's?
• Catabolism regulation analysis in Actinobacteria, CebR (GHx vs. CBM-GHx): presence/absence of regulating sequences upstream of the GHx-coding sequences
• Environmental factors: « life style », « metabolism », …
• Gene gain/loss: 16S rRNA vs. presence/absence of GHx's

Does the cellulose degradation potential vary across environments?
Some case studies… GHx distribution in metagenomes (% of CBM-linked GHx)
• Warnecke 2007: Spirochaetes, Fibrobacter, Bacteroidetes, …
• Hess 2011: Bacteroidetes, Fibrobacteria, Clostridia, …
…Vs. our study: Reno 2012 (probably): Actinobacteria, Alphaproteobacteria, Bacteroidetes, …
[Figure: percent of hits to bacterial SSU rRNA sequences (using the SSU…) in samples L1–L6 and PL; groups: Fibrobacter/Acidobacter, Bacteroidetes, Cyanobacteria, Firmicutes, Gammaproteobacteria, Betaproteobacteria, Alphaproteobacteria, Actinobacteria, others]

Metagenomes clustering
[Figure: metagenomes clustered by 16S rRNA and by GHx content: GOS, leaf litter, leaf litter (tr. 1), leaf litter (tr. 2), cow rumen, termites, wood-feeding insects, human metagenome]
Environment selects for different populations (with different GHx), i.e. for a different GHx potential.

Things to remember…
• Different recipes for efficient PCW breakdown, depending on the ecosystem
• Leaf litter ≠ cow rumen – bacterial content – GH content
• Depending on the ecosystem, bacteria display different strategies to access plant polymers (LL, leaf litter; CR, cow rumen):
 – [GH6, GH8, GH9]LL > [GH6, GH8, GH9]CR
 – [CBM-GHx]LL > [CBM-GHx]CR

What's next
• Leaf litter metagenome
 – 22 samples ~ready to be sequenced (TruSeq™ DNA, Illumina) (first year)
 – Samples to be prepared (second year)
 – Compare: [GHx/16S rRNA in sequenced genomes] vs. [GHx/16S rRNA in leaf litter]
 – Compare different treatments, metagenomes
Nitrogen fertilization
Nemergut, 2008, The effects of chronic nitrogen fertilization on alpine tundra soil microbial communities: implications for carbon and nitrogen cycling.
[Schematic: GHx, GHy, GHz relative to 16S rRNA, control vs. treated samples]
• 24 samples; TruSeq™ DNA (Illumina); 22 samples ready to be sequenced

Complex architectures
• Examples of 2-domain proteins: CBM2 + Cel5, Cel5 + CBM2, Xyl8 + Cel5
• Number of FigFam IDs corresponding to a 2-domain protein
• Number of FigFam IDs ≠ number of genes
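The planned comparison of [GHx/16S rRNA] between samples, and the clustering of metagenomes by GHx content, can be sketched in Python as below. This is a hedged sketch, not the pipeline used here: GHx hits are divided by SSU (16S) rRNA hits from the same metagenome, a set of families such as GH6, GH8, GH9 is summed, and profiles are compared with Bray-Curtis dissimilarity, an assumed choice since the slides do not name a distance. The counts in the usage lines are toy numbers, not measurements.

```python
from itertools import combinations


def per_16s(gh_hits, ssu_hits):
    """Divide GHx hit counts by the number of SSU (16S) rRNA hits."""
    return {family: n / ssu_hits for family, n in gh_hits.items()}


def summed(profile, families):
    """Total normalized abundance of a set of GH families, e.g. GH6, GH8, GH9."""
    return sum(profile.get(f, 0.0) for f in families)


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    keys = set(p) | set(q)
    num = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    den = sum(p.get(k, 0.0) + q.get(k, 0.0) for k in keys)
    return num / den if den else 0.0


def percent_cbm_linked(n_cbm_ghx, n_ghx):
    """Share (%) of GHx hits found on a CBM-containing protein."""
    return 100.0 * n_cbm_ghx / n_ghx


def pairwise(profiles):
    """Bray-Curtis dissimilarity for every pair of metagenomes."""
    return {(a, b): bray_curtis(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}


# Toy counts (not measurements): GHx hits and SSU rRNA hits per metagenome
profiles = {
    "leaf_litter": per_16s({"GH6": 6, "GH8": 2, "GH9": 4}, ssu_hits=8),
    "cow_rumen": per_16s({"GH6": 1, "GH8": 1, "GH9": 2}, ssu_hits=8),
}
gh_set = ["GH6", "GH8", "GH9"]
print(summed(profiles["leaf_litter"], gh_set))  # 1.5
print(summed(profiles["cow_rumen"], gh_set))    # 0.5
print(pairwise(profiles))                       # {('cow_rumen', 'leaf_litter'): 0.5}
print(percent_cbm_linked(3, 12))                # 25.0
```

With more metagenomes, the pairwise dissimilarities could be passed, as a condensed distance matrix, to a standard hierarchical clustering routine to draw dendrograms like the 16S rRNA and GHx trees above.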