Identification of protein-coding genes putatively involved in infection by combining metagenomics analysis and protein orthologue clustering

Contributors: Christine Sambles and David Studholme, University of Exeter, Devon.

Introduction

In order to identify fungal protein-coding genes associated with Fraxinus:Hymenoscyphus in planta interactions, we took an orthologue clustering approach. Identifying fungal transcripts that are present in four samples taken from infected ash, and removing those that are also present in the KW1 isolate, could reveal infection-related transcripts from H. pseudoalbidus. Additionally, F. excelsior transcripts that are present in the infected material but absent from F. excelsior with no signs of infection could identify transcripts involved in the plant's response to infection by H. pseudoalbidus.

Material

Transcriptome assemblies:
F. excelsior: ATU1
C. fraxinea: KW1
Mixed material: AT1, AT2, Upton, Holt

Output from BLASTX searches against GenBank:
F. excelsior: ATU1
C. fraxinea: KW1
Mixed material: AT1, AT2, Upton, Holt

Methods & Results

We used MEGAN, as previously described (http://oadb.tsl.ac.uk/?p=704), to assign transcripts to taxonomic bins. These transcripts came from five transcript assemblies: one H. pseudoalbidus isolate (KW1) and four mixed-material samples (AT1, AT2, Holt & Upton). This resulted in 36,945 transcripts being allocated to the bin for the order Helotiales. The longest open reading frame of each Helotiales-binned transcript (Table 1) was translated into a predicted protein sequence. These protein sequences were clustered using OrthoMCL.

Table 1: Numbers of transcripts and percentages of all transcripts for each sample or isolate that were binned to the order Helotiales using MEGAN.
Sample    Helotiales    % all transcripts
AT1       8,214         15.61%
AT2       7,403         8.80%
Holt      6,930         6.44%
Upton     7,410         12.25%
KW1       6,561         31.75%
ATU1      0             0.00%

OrthoMCL analysis

Between 4,548 and 5,551 proteins were clustered from each sample; the number of protein clusters was 6,505 in total. A Venn diagram of the clustered proteins can be seen in Figure 1.

Fig 1: Venn diagram of Helotiales-binned proteins clustered with OrthoMCL for one H. pseudoalbidus isolate (KW1) and four mixed material samples from H. pseudoalbidus-infected F. excelsior (AT1, AT2, Holt and Upton).

There was a core set of 3,118 protein clusters from detectable transcripts. A set of 113 protein clusters was identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton), and 33 were identified only in KW1, an H. pseudoalbidus isolate. These will be referred to as the ‘in planta’ and ‘ex planta’ groups respectively.

The 113 protein clusters found only in H. pseudoalbidus-infected F. excelsior (in planta) contained a total of 565 transcripts (459 excluding isoforms). We annotated the transcript sequences based on the results of BLASTX searches. Additionally, the GO, EC, KEGG, PFAM and CAZy (Carbohydrate-Active enzymes) databases were used to annotate the full set of 565 transcripts. GO, EC and KEGG annotations were inferred using annot8r (Schmid and Blaxter 2008), PFAM domains were identified with Pfam scan (a wrapper script around hmmpfam) and CAZy-family members were annotated using the CAZymes Analysis Toolkit (CAT) (Park, Karpinets et al. 2010).

GO analysis indicated that growth-related proteins were under-represented, and cell differentiation and cell proliferation proteins over-represented, in the in planta group relative to the pan-proteome (Fig 2).
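The partition of OrthoMCL clusters into core, ‘in planta’ and ‘ex planta’ groups described above can be sketched as simple set logic. This is only an illustration, not the script used for the analysis. It assumes the clusters are available as lines of the form `ID: sample|protein sample|protein ...`, and it assumes that ‘in planta’ means present in all four infected samples and absent from KW1, which the text does not state explicitly. The cluster and protein identifiers in the example are hypothetical.

```python
from collections import Counter

# Sample labels as used in this report; KW1 is the cultured isolate.
INFECTED = frozenset({"AT1", "AT2", "Holt", "Upton"})
ISOLATE = "KW1"


def parse_groups(lines):
    """Map each cluster ID to the set of sample labels it contains.

    Assumes lines shaped like 'ID: sample|protein sample|protein ...'.
    """
    clusters = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cluster_id, members = line.split(":", 1)
        clusters[cluster_id.strip()] = frozenset(
            member.split("|", 1)[0] for member in members.split()
        )
    return clusters


def classify(samples):
    """Label a cluster by the samples that contribute proteins to it."""
    if samples == INFECTED | {ISOLATE}:
        return "core"
    if samples == INFECTED:  # assumption: all four infected samples, no KW1
        return "in planta"
    if samples == {ISOLATE}:
        return "ex planta"
    return "other"


# Hypothetical clusters and protein IDs, for illustration only.
example = [
    "DEMO1: AT1|p1 AT2|p2 Holt|p3 Upton|p4 KW1|p5",
    "DEMO2: AT1|p6 AT1|p7 AT2|p8 Holt|p9 Upton|p10",
    "DEMO3: KW1|p11 KW1|p12",
    "DEMO4: AT1|p13 KW1|p14",
]

clusters = parse_groups(example)
labels = {cid: classify(samples) for cid, samples in clusters.items()}
counts = Counter(labels.values())

for cid, label in labels.items():
    print(cid, label)
```

If ‘in planta’ were instead meant as "any infected sample, but not KW1", the second test in `classify` would become `ISOLATE not in samples`; the choice changes which Venn regions are counted.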
[Figure 2: bar chart showing, for the in planta group and the pan-proteome, the percentage of proteins (axis from 0% to 25%) assigned to each GO category, from growth and cell growth through to cell differentiation and cell proliferation. Categories are marked as enriched in the pan-proteome or enriched in planta.]

Figure 2: Gene Ontology (GO) analysis of the pan-proteome (KW1, AT1, AT2, Upton, Holt) compared to in planta proteins. The in planta proteins were translated from Helotiales-binned transcripts (MEGAN) and were identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton). The pan-proteome proteins were also translated from Helotiales-binned transcripts (MEGAN) and include the isolate, KW1.

PFAM and CAZy analysis of the 565 in planta transcripts resulted in 88 PFAM domains/families and the following CAZy families:

Glycosyl hydrolases family 18 (Pfam: Glyco_hydro_18, PF00704)
Alcohol dehydrogenase GroES-like domain (Pfam: ADH_N, PF08240) & Zinc-binding dehydrogenase (Pfam: ADH_zinc_N, PF00107)
alpha/beta hydrolase fold (Pfam: Abhydrolase_3, PF07859)
Protein of unknown function, a putative transmembrane protein from bacteria that is likely to be conserved between Mycobacterium species (Pfam: DUF2029, PF09594) & PAP2 superfamily (Pfam: PAP2_3, PF14378)
Regulator of chromosome condensation (RCC1) repeat (Pfam: RCC1, PF00415)
Chalcone-flavanone isomerase (Pfam: Chalcone, PF02431)
Myosin head (motor domain) (Pfam: Myosin_head, PF00063) & Chitin synthase (Pfam: Chitin_synth_2, PF03142)
RhgB_N|fn3_3|CBM-like

BLASTX hits from the in planta transcripts included a putative CFEM domain-containing protein (Marssonina brunnea) and a Galactose mutarotase-like protein (Glarea lozoyensis). The Galactose mutarotase-like protein is of interest because it is also similar to the rhamnogalacturonate lyase found in Aspergillus spp., which is known to degrade plant cell walls by cleaving the pectin backbone (de Vries and Visser 2001). Some CFEM-containing proteins are proposed to have important roles in fungal pathogenesis (Kulkarni, Kelkar et al. 2003).

Comparisons of Pfam domain content among samples

PFAM domains and families in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were identified using the hmmpfam wrapper script, Pfam scan. These were compared to the PFAM annotation of the ‘in planta’ group to identify over-representation of specific domains within this group. The domains and families for which more than 80% of the ‘pan-proteome’ annotations fell within the ‘in planta’ group are shown in Table 1.

Table 1: Pfam domains and families in which >80% of ‘pan-proteome’ annotations were present in the ‘in planta’ group (http://pfam.sanger.ac.uk/).
Domain/Family     Name                                                             Pfam accession
ATP12             ATP12 chaperone protein                                          PF07542
BOP1NT            BOP1NT (NUC169) domain                                           PF08145
iPGM_N            BPG-independent PGAM N-terminus                                  PF06415
CDC37_M           Cdc37 Hsp90 binding domain                                       PF08565
CDC37_N           Cdc37 N terminal kinase binding domain                           PF03234
CDC37_C           Cdc37 C terminal domain                                          PF08564
Chalcone          Chalcone-flavanone isomerase                                     PF02431
Copper-bind       Copper binding proteins, plastocyanin/azurin family              PF00127
Sdh5              Flavinator of succinate dehydrogenase                            PF03937
HD_3              HD domain                                                        PF13023
Hpt               Hpt domain                                                       PF01627
Metalloenzyme     Metalloenzyme superfamily                                        PF01676
CENP-I            Mis6                                                             PF07778
Myosin_tail_1     Myosin tail                                                      PF01576
TRM               N2,N2-dimethylguanosine tRNA methyltransferase                   PF02005
Es2               Nuclear protein Es2                                              PF09751
Tom37             Outer mitochondrial membrane transport complex protein           PF10568
PAP2_3            PAP2 superfamily                                                 PF14378
PMC2NT            PMC2NT (NUC016) domain                                           PF08066
Porphobil_deam    Porphobilinogen deaminase, dipyromethane cofactor binding domain PF01379
Porphobil_deamC   Porphobilinogen deaminase C-terminal domain                      PF03900
DUF2012           Protein of unknown function                                      PF09430
DUF775            Protein of unknown function                                      PF05603
Prp31_C           Prp31 C terminal domain                                          PF09785
Ribosomal_L32p    Ribosomal L32p protein family                                    PF01783

Several of the Pfam hits struck us as interesting; these are described below. The pairs of numbers in brackets are the number found within the in planta group / the number found in the entire ‘pan-proteome’:

Porphobil_deam and Porphobil_deamC (6/6) were found in two AT1 isoforms, AT2, two Holt isoforms and Upton. There were no peptides with this domain in the Helotiales-binned KW1 proteome. Heme-biosynthetic porphobilinogen deaminase protects Aspergillus nidulans from nitrosative stress. In A. nidulans, a novel NO-tolerant (nitric oxide-tolerant) protein, PBG-D (the heme biosynthesis enzyme porphobilinogen deaminase), modulates the reduction of environmental NO and nitrite by flavohemoglobin (FHB, encoded by fhbA and fhbB) and nitrite reductase (NiR, encoded by niiA) (Zhou, Narukami et al. 2012).
NO is part of the plant hypersensitive response, a localized programmed cell death that confines the pathogen to the site of attempted infection (Mur, Carver et al. 2006).

Proteins matching the ‘copper binding proteins, plastocyanin/azurin’ family (Pfam: Copper-bind, PF00127) (3/3) were found in AT1, Holt & Upton. OrthoMCL clustered an AT2 protein with them, but the assembled transcript was incomplete at the 5’ end and the PF00127 domain was therefore not detected. BLASTX searches indicated amino acid sequence similarity to cupredoxin from Glarea lozoyensis, and HHpred predicts similarity to cucumber stellacyanin. Given the amino acid sequence similarity between phytocyanins and fungal laccases, this protein may be a laccase. White-rot fungi (e.g. Trametes cinnabarina, Trametes versicolor and Phlebia radiata) are reported to produce laccases which degrade lignin (Tuor, Winterhalter et al. 1995; Eggert, Temp et al. 1997), and laccase-mediated detoxification of phytoalexins generated by the plant defence systems has been observed in Botrytis cinerea (Pezet, Pont et al. 1991; Sbaghi, Jeandet et al. 1996; Adrian, Rajaei et al. 1998; Breuil, Jeandet et al. 1999).

The Hpt domain (Pfam: Hpt, PF01627) (5/5) was identified in two AT1 isoforms, AT2, Upton & Holt. The histidine-containing phosphotransfer (HPt) domain is a novel protein module with an active histidine residue that mediates phosphotransfer reactions in two-component signalling systems (Catlett, Yoder et al. 2003).

Although below the 80% threshold, 35.71% (5/14) of the CFEM domains identified in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were present in the ‘in planta’ group and none were present in the ‘ex planta’ group.
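The bracketed pairs quoted above reduce to a simple fraction: the number of annotations in the in planta group divided by the number in the whole pan-proteome, compared against the 80% cut-off. The sketch below recomputes this for the four domains whose counts are given in the text. The function and variable names are illustrative and not taken from the original analysis.

```python
THRESHOLD = 0.80  # domains with more than 80% of annotations in planta are reported

# (in planta count, pan-proteome count), as quoted in the text above.
DOMAIN_COUNTS = {
    "Porphobil_deam": (6, 6),
    "Copper-bind": (3, 3),
    "Hpt": (5, 5),
    "CFEM": (5, 14),
}


def in_planta_fraction(in_planta, pan):
    """Fraction of a domain's pan-proteome annotations found in planta."""
    if pan <= 0 or not 0 <= in_planta <= pan:
        raise ValueError("counts must satisfy 0 <= in_planta <= pan and pan > 0")
    return in_planta / pan


def summarise(domain_counts, threshold=THRESHOLD):
    """Return (domain, fraction, exceeds_threshold) for each domain."""
    rows = []
    for domain, (in_planta, pan) in domain_counts.items():
        fraction = in_planta_fraction(in_planta, pan)
        rows.append((domain, fraction, fraction > threshold))
    return rows


rows = summarise(DOMAIN_COUNTS)
for domain, fraction, over in rows:
    status = "above" if over else "below"
    print(f"{domain:15s} {fraction:7.2%}  {status} threshold")
```

With such small counts (three to six annotations per domain), a ratio of 100% is suggestive rather than statistically robust, which is worth bearing in mind when reading the table.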
The CFEM domains were distributed across 4 clusters, only one of which is not present in KW1:

ClusterID    Clustered protein present in
HELO2454     AT1, AT2, HOLT, UPTON
HELO4337     AT1, AT2, HOLT, UPTON, KW1
HELO5213     AT1, HOLT, UPTON, KW1
HELO5952     AT2, UPTON, KW1

[Figure: phylogenetic tree with clades labelled HELO4337, HELO5952, HELO5213 and HELO2454.]

Fig 2: Phylogenetic tree of H. pseudoalbidus sequences from four OrthoMCL clusters where at least one sequence in the cluster contains a CFEM domain (Pfam: PF05730). The names of full-length proteins are shown in black; in grey are the names of shorter proteins from incomplete transcript assemblies that lack a CFEM domain but cluster with CFEM domain sequences due to sequence similarity and inferred orthology. Orthologue clustering was performed on all translated transcripts binned to the Helotiales using MEGAN from the one H. pseudoalbidus isolate (KW1) and all four H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton).

The 33 clusters (representing 72 peptides) in the ex planta group, which were identified only in the isolate KW1, were annotated with PFAM as previously described. This resulted in the identification of 17 Pfam domains/families (Table 2).
Table 2: Pfam domains/families identified in the ex planta group

Domain/Family     Name                                                      Pfam accession
COX1              Cytochrome C and Quinol oxidase polypeptide I             PF00115
DASH_Spc34        DASH complex subunit Spc34                                PF08657
Pentapeptide_4    Pentapeptide repeats                                      PF13599
Vac7              Vacuolar segregation subunit 7                            PF12751
DHQ_synthase      3-dehydroquinate synthase                                 PF01761
LtrA              Bacterial low temperature requirement A protein           PF06772
FSH1              Serine hydrolase                                          PF03959
Tyrosinase        Common central domain of tyrosinase                       PF00264
Glyco_hydro_47    Glycosyl hydrolase family 47                              PF01532
DUF202            Domain of unknown function                                PF02656
SET               SET domain                                                PF00856
Abhydrolase_1     alpha/beta hydrolase fold                                 PF00561
adh_short_C2      Enoyl-(Acyl carrier protein) reductase                    PF13561
Glyco_hydro_3     Glycosyl hydrolase family 3 N terminal domain             PF00933
ADH_zinc_N        Zinc-binding dehydrogenase                                PF00107
AAA               ATPase family associated with various cellular activities PF00004
adh_short         short chain dehydrogenase                                 PF00106

The low number of peptides that were not identified in any of the H. pseudoalbidus-infected ash samples limits the scope for comparative analysis.

Conclusions

Proteins putatively involved in plant-pathogen interactions have been identified from groups of translated transcripts that were found exclusively in planta and not in isolate KW1. They included a copper binding protein within the plastocyanin/azurin family, porphobilinogen deaminase, a CFEM domain-containing protein and a Galactose mutarotase-like protein.

References

Adrian, M., H. Rajaei, et al. (1998). "Resveratrol Oxidation in Botrytis cinerea Conidia." Phytopathology 88: 472-476.
Breuil, A. C., P. Jeandet, et al. (1999). "Characterization of a Pterostilbene Dehydrodimer Produced by Laccase of Botrytis cinerea." Phytopathology 89: 298-302.
Catlett, N. L., O. C. Yoder, et al. (2003). "Whole-genome analysis of two-component signal transduction genes in fungal pathogens." Eukaryotic Cell 2: 1151-1161.
de Vries, R. P. and J. Visser (2001). "Aspergillus Enzymes Involved in Degradation of Plant Cell Wall Polysaccharides." Microbiology and Molecular Biology Reviews 65: 497-522.
Eggert, C., U. Temp, et al. (1997). "Laccase is essential for lignin degradation by the white-rot fungus Pycnoporus cinnabarinus." FEBS Letters 407: 89-92.
Kulkarni, R. D., H. S. Kelkar, et al. (2003). "An eight-cysteine-containing CFEM domain unique to a group of fungal membrane proteins." Trends in Biochemical Sciences 28: 118-121.
Mur, L. A. J., T. L. W. Carver, et al. (2006). "NO way to live; the various roles of nitric oxide in plant-pathogen interactions." Journal of Experimental Botany 57: 489-505.
Park, B. H., T. V. Karpinets, et al. (2010). "CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database." Glycobiology 20: 1574-1584.
Pezet, R., V. Pont, et al. (1991). "Evidence for oxidative detoxication of pterostilbene and resveratrol by a laccase-like stilbene oxidase produced by Botrytis cinerea." Physiological and Molecular Plant Pathology 39: 441-450.
Sbaghi, M., P. Jeandet, et al. (1996). "Degradation of stilbene-type phytoalexins in relation to the pathogenicity of Botrytis cinerea to grapevines." Plant Pathology: 139-144.
Schmid, R. and M. L. Blaxter (2008). "annot8r: GO, EC and KEGG annotation of EST datasets." BMC Bioinformatics 9: 180.
Tuor, U., K. Winterhalter, et al. (1995). "Enzymes of white-rot fungi involved in lignin degradation and ecological determinants for wood decay." Journal of Biotechnology 41: 1-17.
Zhou, S., T. Narukami, et al. (2012). "Heme-Biosynthetic Porphobilinogen Deaminase Protects Aspergillus nidulans from Nitrosative Stress." Applied and Environmental Microbiology 78: 103-109.