Carbohydrate-Active Enzymes in Melampsora laricis-populina
Brandi Cantarel, Bernard Henrissat, Pedro M. Coutinho
Architecture et Fonction des Macromolécules Biologiques (UMR 6098), CNRS / Aix-Marseille Université, France
1st Melampsora Genome Consortium Workshop, Nancy (Aug/08)

Outline
• CAZy Database and Website
• Genome Annotation and Comparative Genomics
• Annotation Highlights from Melampsora laricis-populina
• Interpretation and Speculation

Carbohydrate Active enZymes (CAZymes)
• Adhesion
• Recognition
• Selectivity
Classes covered, with the year each was added:
• 1991 Glycoside Hydrolases (112): Glycosidases cleave; Transglycosidases form
• 1997 Glycosyltransferases (91) (NDP-, NMP-, lipid-phosphorylases): form
• 1998 Polysaccharide Lyases (19): cleave
• 1999 Carbohydrate Esterases (15): modify
• 2000 Carbohydrate-Binding Modules (52)
CAZy @ AFMB since September 1998 (www.cazy.org): name of protein, EC number, organism, subfamily, GenBank accessions, UniProt accessions, PDB accessions
© Coutinho & Henrissat, 2008

CAZy: Carbohydrate-Active EnZymes Database (www.cazy.org)
• Inputs: Sequences/Structures (GenBank; UniProt; PDB); Genome Sequence
• BLAST against CAZy Sequences
• HMMER against the Specialized Library of Modules
• Modular Annotation
• Family Annotation: Mechanism; Structure; Function
• Biochemical Data: Literature; PubMed; EMP; PMD; Other
• Individual CAZyme Annotation

Annotating CAZymes
Function Prediction is a major bottleneck
• Common Genome Annotation Practices
    • Sequence Similarity ~ Specific Functional Prediction (≠)
    • Erroneous annotations are propagated
    • Original error(s) are difficult to track
• Conservative Practices
    • Sequence Similarity = Family inclusion
    • Catalytic machinery checked for borderline cases
    • Functional assignment based on literature
    • Prediction based on subfamily analysis

Annotation and Comparisons
• CAZy - Biochemical Bioinformatics
    • Correlation of data w/ biochemical databases
    • Manual Literature Curation
    • Text correlation / mining
• CAZy - Phylo-Genetics / -Genomics
    • Identify Orthologs and Paralogs
    • Identify Analogs (Convergent Evolution)
    • Distinguish close / remote relationships
• Enzyme discovery in a Single Genome
    • Search and list all the CAZymes
    • Infer Properties (Mechanism / Fold) from Families
    • Infer Function from SubFamilies and Known Biochemically Characterized Cases
• Compare CAZyme content of Multiple Genomes
    • Correlate CAZyme content with Lifestyle
    • Discover singularities in Genomes
    • Understand Genome Evolution
© Coutinho, Danchin & Henrissat, 2007

CAZy: On the Genomic Scale
Annotations of CAZymes in Genomes
• Modular Annotation
    • Identify modules
    • Identify gene models with major problems (large truncations, insertions, frameshifts, etc.)
    • Identify Signal peptides, Linkers, GPI-anchors, TMs
• Functional Annotation
    • Sequence similarity to characterized enzymes
    • Make use of Subfamilies with characterized enzymes for reliable annotation
    • Characterized in the literature
    • Provide annotations that will "age well"
    • Several Levels / Categories:
        Known Cases (++): EC activity assignment
        High Similarity (+): "candidate" activity
        Medium Similarity (-): "related to" activity
        Low Similarity (--): "distantly related to" (taxon) activity
• Interpretation
    • Analogies with better characterized genomes
    • Singularities in enzyme distribution
    • Interaction with Consortia Biologists

Sequence Similarity based Modular Analysis of CAZymes
Genome Sequences -> Filter against CAZy Sequences using BLASTP -> CAZymes -> Identify Modular Structure using HMMs of Modular Families -> Modular Annotation -> CAZyModO: Genomic entry (1. ModO; 2. Function)

Modularity in a Genome: Melampsora laricis-populina
[Figure]

SS-based Functional Analysis of CAZymes
[Figure]
© Coutinho & Henrissat, 2007

Activities in a Genome: Melampsora laricis-populina
[Figure]

Fungal CAZymes: M_lari vs Global Trends

Genome    GH    GT    PL   CBM  Lifestyle
S_cere    45    67     0    12  Saprophyte
A_nige   239   109     8    40  Saprophyte
A_oryz   283   114    21    33  Saprophyte
B_fuck   223    92     9    64  PhytoPath.
T_mela    91    96     3    25  Symbiont
M_gris   231    92     4    63  PhytoPath.
H_jeco   192    93     3    41  Saprophyte
G_zeae   242   102    20    62  PhytoPath.
N_cras   171    74     3    41  Saprophyte
P_anse   229    88     7    75  Saprophyte
S_pomb    46    61     0     8  Saprophyte
C_neof    81    66     3    10  Pathogen
P_chry   179    66     4    47  Saprophyte
L_bico   162    88     7    26  Symbiont
C_cine   210    72    13    90  Saprophyte
M_lari   176    93     6    10  PhytoPath.
P_gram   157    88     4    11  PhytoPath.
U_mayd    97    64     1     9  PhytoPath.

M_lari: Normal GT set; Medium GH set; Low PL / CBM set

CAZyme Family & Functional Annotation
Objectives
• Attribution of CAZymes to Families
• Annotation based on Biochemically Characterized cases
• Understand Evolution
Genomes compared, by taxonomic group:
• Ascomycota
    • Eurotiomycetes: A. nidulans, A. fumigatus, A. niger, A. oryzae
    • S. sclerotiorum
    • Sordariomycetes: M. grisea, P. anserina, N. crassa, H. jecorina, G. zeae
    • Saccharomycotina: C. albicans, S. cerevisiae, C. glabrata
    • Archaeascomycetes: S. pombe
• Basidiomycota
    • Hymenomycetes: C. neoformans, L. bicolor, P. chrysosporium
    • U. maydis
• Zygomycota: R. oryzae
© Coutinho, Danchin & Henrissat, 2006

Fungal Genomes
Fungal Genome Crunching: >35 Private (Consortia + Extra) and/or 15 Public @ www.cazy.org
    Kluyveromyces lactis NRRL Y-1140
    Pichia stipitis CBS 6054
    Saccharomyces cerevisiae S288C
    Debaryomyces hansenii CBS767
    Eremothecium gossypii ATCC 10895
    Yarrowia lipolytica CLIB99
    Candida albicans - Private
    Candida glabrata CBS138
    Phaeosphaeria nodorum SN15 - Private
    Aspergillus nidulans FGSC A4 v.2
    Aspergillus nidulans FGSC A4 v.3 - Private
    Aspergillus clavatus NRRL 1 - Private
    Aspergillus flavus NRRL3357 - Private
    Aspergillus niger CBS 513.88 - Private (2007)
    Aspergillus niger ATCC 1015 - Private
    Aspergillus oryzae RIB 40
    Aspergillus fumigatus Af293 - Private
    Aspergillus terreus NIH2624 - Private
    Coccidioides immitis RS - Private
    Sclerotinia sclerotiorum 1980 - Private
    Botryotinia fuckeliana T4 - Private
    Tuber melanosporum - Private
    Magnaporthe grisea 70-15
    Hypocrea jecorina - Private (2008)
    Gibberella zeae - Private
    Fusarium verticillioides 7600 - Private
    Nectria haematococca mpVI - Private
    Fusarium oxysporum lycopersici - Private
    Cryphonectria parasitica EP155 v1 - Private
    Neurospora crassa OR74A
    Chaetomium globosum CBS 148.51 - Private
    Podospora anserina - Private (2008)
    Schizosaccharomyces pombe 972h
    Schizosaccharomyces japonicus yFS275 - Private
    Cryptococcus neoformans H99 - Private
    Cryptococcus neoformans var. neoformans JEC21
    Postia placenta Mad-698-R - Private
    Phanerochaete chrysosporium - Private (2004)
    Laccaria bicolor - Private (2008)
    Coprinopsis cinerea - Private
    Melampsora laricis-populina - Private
    Puccinia graminis f. tritici - Private
    Ustilago maydis - Private
    Malassezia globosa CBS 7966 - Private
    Rhizopus oryzae RA 99-880 - Private
    Batrachochytrium dendrobatidis JAM81 - Private
    Encephalitozoon cuniculi GB-M1

Orthologous Distance: Fungal CAZymes (Preliminary Results)
[Figure; labelled group: "Rusts"]

Host-Rust Parasite Interaction
• Interaction between rust and host is initiated on the external surface.
• The haustorial mother cell produces a narrow peg that penetrates the host cell wall.
• Pathogen-secreted molecules inside the host cell suppress host defence and enhance susceptibility.
Maheshwari R. The scourge of mankind: From ancient time into the genomic era. Current Science. 2007 (9) 1249-1256.

Infection
• Upon penetration of the plant cell wall by enzymatic dissolution, a haustorium is formed in the periplasmic space of the host cell.
• The interface between the plant and fungal cytoplasm consists of:
    • A gel-like layer consisting of carbohydrates (extrahaustorial matrix)
    • Extrahaustorial membrane, derived from the plant cell wall
• The haustorium is directly connected to the mother cell so that nutrients can be transported from the plant cell to the developing fungal hyphae.
Leonard KJ and Szabo LJ. Molecular Plant Pathology (2005). 6 (2), 99-111

M_lari vs Fungal GHs: Highlights

GH   S_cere A_nige A_oryz B_fuck T_mela M_gris H_jeco G_zeae P_anse S_pomb C_neof P_chry L_bico C_cine M_lari P_gram U_mayd M_glob  Target
1         0      3      3      3      2      2      2      3      1      0      0      2      0      2      0      0      0      0  PCW
2         0      6      7      2      2      6      7     10      7      0      0      2      2      2      4     10      1      0  PCW
3         0     17     23     16      6     19     13     22     11      1      7     11      2      7      3      2      3      1  PCW
5         5     10     13     15      6     13     11     15     15      3     10     20     22     27     30     27     12      6  CW
7         0      2      3      2      0      6      2      2      6      0      0      9      0      7      8      8      0      0  PCW
10        0      1      4      2      1      5      1      5      8      0      0      6      0      5      6      5      2      0  PCW
11        0      4      4      3      0      5      4      3      6      0      0      1      0      6      0      0      1      0  PCW
12        0      4      4      4      1      3      2      4      2      0      0      2      3      1     10      3      0      0  PCW
13        8     18     17     10      8     10      5      8      9     12     10      9      8      9      8      5      6      0  Gly
15        1      2      3      4      1      2      2      3      3      2      2      2      2      4      4      3      1      0  Gly
16        5     13     13     21      7     16     16     21     12      3     12     23     31     32     11      9     21      7  FCW
17        4      5      5      6      4      7      4      6      4      1      1      1      3      3      1      1      2      0  FCW
18        2     14     18     10      5     14     20     19     20      1      4     11     10      9     15     17      3      1  FCW
20        0      3      3      1      2      2      3      2      1      0      1      3      2      2      3      2      2      0  FCW
26        0      1      1      2      0      0      0      0      1      0      0      0      0      0      5      5      0      0  ?
27        0      4      3      4      0      4      8      2      2      1      0      3      1      0      7     12      1      0  ?
28        1     21     20     18      2      3      4      6      0      0      1      4      6      3      3      1      1      0  PCW
32        1      6      4      1      1      5      0      5      0      2      1      0      0      0      2      2      2      0  Suc
43        0     10     20      4      1     19      2     17     10      0      0      4      0      4      8      2      4      1  PCW
47        3      5      5      8      5      6      8     10      9      2      3      6      9      8     14     14      3      2  FCW
51        0      4      3      3      0      3      0      2      1      0      1      2      0      1      3      0      2      0  PCW
61        0      7      8      9      4     17      3     15     33      0      1     15      8     33      2      3      0      0  CW
78        0      8      8      8      2      1      1      7      1      0      3      1      0      0      0      0      0      0  PCW

GH88, GH105 (Target: PCW, PCW; values as extracted, rows not separable): 0 0 1 2 3 2 1 1 0 0 1 3 0 1 1 3 0 0 0 0 2 1 1 0 2 0 1 1 0 1 0 0 0 1 0 0
Legend: PCW = plant cell wall; FCW = fungal cell wall; CW = cell wall; Suc = sucrose; ? = unknown specificity; S = saccharification (marked on 8 families on the slide).

• Low Plant Cell-Wall (PCW) saccharification (S) capacity (GH1, 3, 43, 78…)
• Original combination of high GH7, 10, 12 but absent GH11
• Large number of GH26, 27 but unknown specificity (extrahaustorial matrix?)
• Capacity to saccharify sucrose (GH32) that is absent from PCW-saccharifying fungi
• Normal FCW-aiming enzymes but probably a large set in the CW-targeting family GH5
• Differences w/ P_gram may reflect host specificity (Dicot/Monocot?)
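The kind of comparison behind these GH highlights can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CAZy pipeline: the counts are copied from the GH table above for a handful of families, and the rule used here (a family is "absent" if the focus genome has zero members, otherwise compared with the median of the other 17 fungi) is an assumption chosen for the example. The names `SPECIES`, `GH_COUNTS`, `profile` and `classify` are hypothetical.

```python
from statistics import median

# Column order of the GH highlights table.
SPECIES = ["S_cere", "A_nige", "A_oryz", "B_fuck", "T_mela", "M_gris",
           "H_jeco", "G_zeae", "P_anse", "S_pomb", "C_neof", "P_chry",
           "L_bico", "C_cine", "M_lari", "P_gram", "U_mayd", "M_glob"]

# Counts copied from the table, one value per genome in SPECIES order.
GH_COUNTS = {
    "GH7":  [0, 2, 3, 2, 0, 6, 2, 2, 6, 0, 0, 9, 0, 7, 8, 8, 0, 0],
    "GH10": [0, 1, 4, 2, 1, 5, 1, 5, 8, 0, 0, 6, 0, 5, 6, 5, 2, 0],
    "GH11": [0, 4, 4, 3, 0, 5, 4, 3, 6, 0, 0, 1, 0, 6, 0, 0, 1, 0],
    "GH12": [0, 4, 4, 4, 1, 3, 2, 4, 2, 0, 0, 2, 3, 1, 10, 3, 0, 0],
    "GH26": [0, 1, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 5, 5, 0, 0],
    "GH27": [0, 4, 3, 4, 0, 4, 8, 2, 2, 1, 0, 3, 1, 0, 7, 12, 1, 0],
    "GH32": [1, 6, 4, 1, 1, 5, 0, 5, 0, 2, 1, 0, 0, 0, 2, 2, 2, 0],
}


def profile(family, focus="M_lari"):
    """Return (count in the focus genome, median count in all other genomes)."""
    counts = dict(zip(SPECIES, GH_COUNTS[family]))
    others = [n for genome, n in counts.items() if genome != focus]
    return counts[focus], median(others)


def classify(family, focus="M_lari"):
    """Label a family for the focus genome (illustrative rule, see lead-in)."""
    own, background = profile(family, focus)
    if own == 0:
        return "absent"
    if own > background:
        return "above median"
    return "at or below median"


if __name__ == "__main__":
    for family in GH_COUNTS:
        own, background = profile(family)
        print(f"{family}: M_lari={own}, median(others)={background}, {classify(family)}")
```

Passing `focus="P_gram"` runs the same comparison for the other rust, which is one simple way to look at the M_lari / P_gram differences raised in the last bullet.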
M_lari vs Fungal CBMs: Highlights

CBM  S_cere A_nige A_oryz B_fuck T_mela M_gris H_jeco G_zeae P_anse S_pomb C_neof P_chry L_bico C_cine M_lari P_gram U_mayd M_glob  Target
1         0      8      3     18      1     22     15     12     30      2      0     31      1     46      0      0      0      0  PCW
12        0      0      0      0      0      0      0      0      0      0      0      0      1      1      1      1      0      0  FCW
13        0      1      2      1      0      0      3      2      0      0      5      5     10     24      0      0      0      0  PCW
18        2     13      5     16     16     33      8     34     30      0      1      1      1      1      0      0      2      0  FCW
19        1      0      1      0      1      0      0      0      0      0      0      0      1      2      5      3      0      0  FCW

• No CBMs aiming at the Plant Cell-Wall (PCW)
• Few CBMs aiming at the Fungal Cell-Wall (FCW)

M_lari: Main CAZy Conclusions
• An original distribution of CAZymes, mostly shared with P_gram (where differences may relate to the host)
• Sufficient degrading GH + PL (not shown) enzymes to perforate the Plant Cell Wall and form the Haustorium, but not for its saccharification
• GH32 invertases present to saccharify Sucrose (like P_gram and U_mayd)
• Open Question: Are some enzymes present to destroy oligosaccharide elicitors (resulting from FCW-degradation by plant enzymes) and diminish the plant response?

CAZy - Team & Funding
• Bernard Henrissat (DR1)
• Pedro Coutinho (PR2)
• Brandi Cantarel (Post-Doc)
• Corinne Rancurel (IE - Bioinformatics)
• Vincent Lombard (IE - DB Expert)
• Thomas Bernard (PhD Student) (2008)
Funding: Centre National de la Recherche Scientifique; Aix-Marseille Universités; ANR-PNRB: E-TriCel