Orthologue_clustering_v3

Identification of protein-coding genes putatively involved in infection
by combining metagenomics analysis and protein orthologue
clustering.
Contributors
Christine Sambles and David Studholme. University of Exeter, Devon.
Introduction
In order to identify fungal protein-coding genes associated with Fraxinus:Hymenoscyphus
in planta interactions, we took an orthologue clustering approach. Identifying fungal
transcripts that are present in four samples taken from infected ash, and removing those
that are also present in the KW1 isolate, could reveal some infection-related transcripts from
H. pseudoalbidus. Additionally, F. excelsior transcripts present in the infected material and
absent from F. excelsior with no signs of infection could identify transcripts involved in the
plant's response to infection by H. pseudoalbidus.
Material
Transcriptome assemblies:
F. excelsior: ATU1
C. fraxinea: KW1
Mixed material: AT1, AT2, Upton, Holt
Output from BLASTX searches against GenBank:
F. excelsior: ATU1
C. fraxinea: KW1
Mixed material: AT1, AT2, Upton, Holt
Methods & Results
We used MEGAN, as previously described (http://oadb.tsl.ac.uk/?p=704), to assign
transcripts to taxonomic bins. These transcripts came from five transcript assemblies:
• 1 H. pseudoalbidus isolate (KW1) and
• 4 mixed material (AT1, AT2, Holt & Upton).
This resulted in 36,945 transcripts being allocated to the bin for order Helotiales.
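MEGAN's taxonomic binning rests on a lowest-common-ancestor (LCA) rule: a transcript is placed on the deepest taxon shared by its significant BLASTX hits. The sketch below illustrates that rule only. It is not MEGAN's implementation, the lineages are abbreviated for illustration, and the function name is hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages.

    Each lineage is a root-to-tip list of taxon names. Returns None when
    the lineages share no taxon at all.
    """
    shared = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this depth
        shared = ranks[0]
    return shared


# Abbreviated, illustrative lineages for the species hit by one transcript.
HELOTIALES = ["Fungi", "Ascomycota", "Leotiomycetes", "Helotiales"]
hits = {
    "Glarea lozoyensis": HELOTIALES + ["Glarea"],
    "Botrytis cinerea": HELOTIALES + ["Botrytis"],
    "Aspergillus nidulans": ["Fungi", "Ascomycota", "Eurotiomycetes",
                             "Eurotiales", "Aspergillus"],
}

# Hits confined to the Helotiales place the transcript in that bin.
helotiales_bin = lowest_common_ancestor(
    [hits["Glarea lozoyensis"], hits["Botrytis cinerea"]]
)
# Adding a hit from another class pushes the assignment up the tree.
broader_bin = lowest_common_ancestor(list(hits.values()))
print(helotiales_bin, broader_bin)
```

A transcript whose hits all fall within the Helotiales lands in that bin, whereas one with an additional hit outside the class Leotiomycetes would be assigned at a higher rank and so be excluded from the Helotiales set.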
The longest open reading frame for each Helotiales-binned transcript (Table 1) was
translated into a predicted protein sequence. These protein sequences were clustered using
OrthoMCL.
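The report does not state how the longest open reading frame was defined or which tool extracted it. As a minimal sketch, assuming an ORF runs from the first methionine to the next stop codon (or the end of the transcript) in any of the six reading frames and is translated with the standard genetic code, the step could look like this. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf_protein(transcript):
    """Return the longest Met-initiated peptide over all six reading frames."""
    seq = transcript.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    best = ""
    for strand in (seq, reverse_complement):
        for frame in range(3):
            # Split the frame translation at stop codons, then trim each
            # stop-free segment to start at its first methionine.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


print(longest_orf_protein("ATGGCCAAAGGGTTTTAG"))
```

Requiring a start codon is a design choice: transcripts that are incomplete at the 5' end would lose their true ORF under this rule, so a stop-to-stop definition may be preferable for fragmentary assemblies.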
Table 1: Numbers of transcripts and percentages of all transcripts for each sample or isolate
that were binned to the order Helotiales using MEGAN.
Sample/isolate    Helotiales    % all transcripts
AT1               8,214         15.61%
AT2               7,403         8.80%
Holt              6,930         6.44%
Upton             7,410         12.25%
KW1               6,561         31.75%
ATU1              0             0.00%
OrthoMCL analysis
Between 4,548 and 5,551 proteins were clustered from each sample; the number of protein
clusters was 6,505 in total. A Venn diagram of the clustered proteins can be seen in Figure
1.
Fig 1: Venn diagram of Helotiales-binned proteins clustered with OrthoMCL for one H.
pseudoalbidus isolate (KW1) and four mixed material samples from H. pseudoalbidus
infected F. excelsior (AT1, AT2, Holt and Upton).
There was a core set of 3,118 protein clusters from detectable transcripts. A set of 113
protein clusters was identified only in H. pseudoalbidus samples that were from infected F.
excelsior (AT1, AT2, Holt & Upton) and 33 only identified in KW1, a H. pseudoalbidus
isolate. These will be referred to as the ‘in planta’ and ‘ex planta’ groups respectively.
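The grouping can be expressed as set logic on the samples that contribute proteins to each cluster. The following is a minimal sketch, assuming that 'in planta' means a cluster with no KW1 member and that 'core' means a cluster with members from all five assemblies. The four HELO cluster memberships are those reported for the CFEM clusters in this report; `HYPO0001` and the function name are hypothetical.

```python
INFECTED = {"AT1", "AT2", "Holt", "Upton"}
ISOLATE = "KW1"


def classify_cluster(samples):
    """Assign a cluster to a group from the set of samples it contains."""
    samples = set(samples)
    if not samples or not samples <= INFECTED | {ISOLATE}:
        raise ValueError(f"unexpected sample set: {sorted(samples)}")
    if samples == {ISOLATE}:
        return "ex planta"   # only in the cultured isolate
    if ISOLATE not in samples:
        return "in planta"   # only in infected ash material
    if samples == INFECTED | {ISOLATE}:
        return "core"        # present in every assembly
    return "shared"          # KW1 plus some, but not all, infected samples


clusters = {
    "HELO2454": {"AT1", "AT2", "Holt", "Upton"},
    "HELO4337": {"AT1", "AT2", "Holt", "Upton", "KW1"},
    "HELO5213": {"AT1", "Holt", "Upton", "KW1"},
    "HELO5952": {"AT2", "Upton", "KW1"},
    "HYPO0001": {"KW1"},  # hypothetical cluster, for illustration only
}

groups = {cid: classify_cluster(members) for cid, members in clusters.items()}
for cid, group in groups.items():
    print(cid, group)
```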
The 113 protein clusters found only in H. pseudoalbidus infected F. excelsior (in planta)
contained a total of 565 transcripts (459 excluding isoforms). We annotated the transcript
sequences based on results of BLASTX searches. Additionally the GO, EC, KEGG, PFAM
and CAZy (Carbohydrate-Active enzymes) databases were used to annotate the full set of
565 transcripts.
GO, EC and KEGG annotations were inferred using annot8r (Schmid and Blaxter 2008),
PFAM domains were identified with Pfam scan (a wrapper script around hmmpfam) and
CAZy-family members were annotated using the CAZymes Analysis Toolkit (CAT) (Park,
Karpinets et al. 2010).
GO analysis revealed a lower proportion of growth-related proteins and a higher proportion
of cell differentiation and proliferation proteins in the in planta group than in the
pan-proteome (Fig 2).
[Figure 2 (bar chart): percentage of proteins (axis 0.00% to 25.00%) in each of 41 GO categories,
from 'growth' to 'cell proliferation', for the in planta group and the pan-proteome; categories
are marked as 'Enriched in pan-proteome' or 'Enriched in planta'.]
Figure 2: Gene Ontology (GO) analysis of the pan-proteome (KW1, AT1, AT2, Upton,
Holt) compared to in planta proteins. The in planta proteins were translated from
Helotiales-binned transcripts (MEGAN) and were identified only in H. pseudoalbidus samples that
were from infected F. excelsior (AT1, AT2, Holt & Upton). The pan-proteome proteins were also
translated from Helotiales-binned transcripts (MEGAN) and include the isolate, KW1.
PFAM and CAZy analysis of the 565 in planta transcripts resulted in 88 PFAM
domains/families and the following CAZy families:
• Glycosyl hydrolases family 18 (Pfam: Glyco_hydro_18, PF00704)
• Alcohol dehydrogenase GroES-like domain (Pfam: ADH_N, PF08240) & Zinc-binding
  dehydrogenase (Pfam: ADH_zinc_N, PF00107)
• alpha/beta hydrolase fold (Pfam: Abhydrolase_3, PF07859)
• Protein of unknown function, a putative transmembrane protein from bacteria. It is
  likely to be conserved between Mycobacterium species (Pfam: DUF2029, PF09594)
  & PAP2 superfamily (Pfam: PAP2_3, PF14378)
• Regulator of chromosome condensation (RCC1) repeat (Pfam: RCC1, PF00415)
• Chalcone-flavanone isomerase (Pfam: Chalcone, PF02431)
• Myosin head (motor domain) (Pfam: Myosin_head, PF00063) & Chitin synthase
  (Pfam: Chitin_synth_2, PF03142)
• RhgB_N|fn3_3|CBM-like
BLASTX hits from the in planta transcripts included a putative CFEM domain-containing
protein (Marssonina brunnea) and a galactose mutarotase-like protein (Glarea lozoyensis).
The galactose mutarotase-like protein is of interest because it is also similar to
rhamnogalacturonate lyase from Aspergillus spp., an enzyme known to degrade plant cell
walls by cleaving the pectin backbone (de Vries and Visser 2001). Some CFEM-containing
proteins are proposed to have important roles in fungal pathogenesis (Kulkarni, Kelkar et al.
2003).
Comparisons of Pfam domain content among samples
PFAM domains and families in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were
identified using the hmmpfam wrapper script, Pfam scan. These were compared to the
PFAM annotation of the ‘in planta’ group to identify over-representation of specific domains
within this group. The domains and families for which >80% of the ‘pan-proteome’
annotations fell within the ‘in planta’ group are shown in Table 1.
Table 1: Pfam domains and families in which >80% ‘pan-proteome’ annotations were
present in the ‘in planta’ group (http://pfam.sanger.ac.uk/).
Domain/Family    Name                                                             Pfam accession
ATP12            ATP12 chaperone protein                                          PF07542
BOP1NT           BOP1NT (NUC169) domain                                           PF08145
iPGM_N           BPG-independent PGAM N-terminus                                  PF06415
CDC37_M          Cdc37 Hsp90 binding domain                                       PF08565
CDC37_N          Cdc37 N terminal kinase binding domain                           PF03234
CDC37_C          Cdc37 C terminal domain                                          PF08564
Chalcone         Chalcone-flavanone isomerase                                     PF02431
Copper-bind      Copper binding proteins plastocyanin/azurin family               PF00127
Sdh5             Flavinator of succinate dehydrogenase                            PF03937
HD_3             HD domain                                                        PF13023
Hpt              Hpt domain                                                       PF01627
Metalloenzyme    Metalloenzyme superfamily                                        PF01676
CENP-I           Mis6                                                             PF07778
Myosin_tail_1    Myosin tail                                                      PF01576
TRM              N2,N2-dimethylguanosine tRNA methyltransferase                   PF02005
Es2              Nuclear protein Es2                                              PF09751
Tom37            Outer mitochondrial membrane transport complex protein           PF10568
PAP2_3           PAP2 superfamily                                                 PF14378
PMC2NT           PMC2NT (NUC016) domain                                           PF08066
Porphobil_deam   Porphobilinogen deaminase dipyromethane cofactor binding domain  PF01379
Porphobil_deamC  Porphobilinogen deaminase C-terminal domain                      PF03900
DUF2012          Protein of unknown function                                      PF09430
DUF775           Protein of unknown function                                      PF05603
Prp31_C          Prp31 C terminal domain                                          PF09785
Ribosomal_L32p   Ribosomal L32p protein family                                    PF01783
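The over-representation criterion is a simple ratio: for each Pfam domain, the number of annotations in the ‘in planta’ group divided by the number in the whole ‘pan-proteome’. The sketch below applies it to the four domains whose counts are quoted in this report; the variable and function names are hypothetical, and the script is an illustration rather than the one used for the analysis.

```python
THRESHOLD = 0.80  # domains above this fraction are reported in the table

# (annotations in the in planta group, annotations in the pan-proteome)
counts = {
    "Porphobil_deam": (6, 6),
    "Copper-bind": (3, 3),
    "Hpt": (5, 5),
    "CFEM": (5, 14),
}


def in_planta_fraction(in_planta, pan_proteome):
    """Fraction of a domain's pan-proteome annotations found in planta."""
    return in_planta / pan_proteome


fractions = {
    domain: in_planta_fraction(n_in_planta, n_pan)
    for domain, (n_in_planta, n_pan) in counts.items()
}
over_represented = [d for d, f in fractions.items() if f > THRESHOLD]

for domain, fraction in fractions.items():
    status = "above" if fraction > THRESHOLD else "below"
    print(f"{domain}: {fraction:.2%} ({status} threshold)")
```

With these counts the CFEM domain comes out at 35.71% (5/14), below the 80% cut-off, while the other three domains are found exclusively in planta.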
Several of the Pfam hits struck us as interesting; these are described below. The pairs of
numbers in brackets are the number found within the in planta group / number found in
entire ‘pan-proteome’:
Porphobil_deam and Porphobil_deamC (6/6) were found in two AT1 isoforms, AT2, two
Holt isoforms and Upton. There were no peptides with this domain in the Helotiales-binned
KW1 proteome. In Aspergillus nidulans, the heme biosynthesis enzyme porphobilinogen
deaminase (PBG-D) is a novel NO-tolerant (nitric oxide-tolerant) protein that protects the
fungus from nitrosative stress: it modulates the reduction of environmental NO and nitrite by
flavohemoglobin (FHB, encoded by fhbA and fhbB) and nitrite reductase (NiR, encoded by
niiA) (Zhou, Narukami et al. 2012). NO is part of the plant hypersensitive response, a
localized programmed cell death that confines the pathogen to the site of attempted infection
(Mur, Carver et al. 2006).
Proteins matching the ‘copper binding proteins, plastocyanin/azurin’ family (Pfam:
Copper-bind, PF00127) (3/3) were found in AT1, Holt & Upton. OrthoMCL clustered an AT2
protein with them, but the assembled transcript was incomplete at the 5’ end and the
PF00127 domain was therefore not present. BLASTX searches indicated amino acid sequence
similarity to cupredoxin from Glarea lozoyensis, and HHpred predicts similarity to cucumber
stellacyanin. Because of the amino acid sequence similarity between the phytocyanins and
fungal laccases, this protein may potentially be a laccase. White-rot fungi (e.g. Trametes
cinnabarina, Trametes versicolor and Phlebia radiata) are reported to produce laccases
which degrade lignin (Tuor, Winterhalter et al. 1995; Eggert, Temp et al. 1997), and
laccase-mediated detoxification of phytoalexins generated by the plant defence systems has been
observed in Botrytis cinerea (Pezet, Pont et al. 1991; Sbaghi, Jeandet et al. 1996; Adrian,
Rajaei et al. 1998; Breuil, Jeandet et al. 1999).
The Hpt domain (Pfam: Hpt, PF01627) (5/5) was identified in two AT1 isoforms, AT2, Upton
& Holt. The histidine-containing phosphotransfer (HPt) domain is a novel protein module
with an active histidine residue that mediates phosphotransfer reactions in the
two-component signalling systems (Catlett, Yoder et al. 2003).
Although below the threshold of 80%, 35.71% (5/14) of the CFEM domains identified in the
‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were present in the ‘in planta’ group and
none were present in the ‘ex planta’ group. The CFEM domains were distributed across 4
clusters, only one of which is not present in KW1:
ClusterID    Clustered protein present in
HELO2454     AT1, AT2, HOLT, UPTON
HELO4337     AT1, AT2, HOLT, UPTON, KW1
HELO5213     AT1, HOLT, UPTON, KW1
HELO5952     AT2, UPTON, KW1
[Figure (phylogenetic tree): clades labelled HELO4337, HELO5952, HELO5213 and HELO2454.]
Fig 2: Phylogenetic tree of H. pseudoalbidus sequences from four OrthoMCL clusters where
at least one sequence in the cluster contains a CFEM domain (Pfam: PF05730). The names
of full-length proteins are shown in black; in grey are names of shorter length proteins from
incomplete transcript assembly that lack a CFEM domain but that cluster with CFEM domain
sequences due to sequence similarity and inferred orthology. Orthologue clustering was
performed on all translated transcripts binned to the Helotiales using MEGAN from the one
H. pseudoalbidus isolate (KW1) and all four H. pseudoalbidus samples that were from
infected F. excelsior (AT1, AT2, Holt & Upton).
The 33 clusters (representing 72 peptides) in the ex planta group which were only identified
in the isolate KW1 were annotated with PFAM as previously described. This resulted in
identification of 17 Pfam domains/families (Table 2).
Table 2: Pfam domains/families identified in the ex planta group
Domain/Family   Name                                                       Pfam accession
COX1            Cytochrome C and Quinol oxidase polypeptide I              PF00115
DASH_Spc34      DASH complex subunit Spc34                                 PF08657
Pentapeptide_4  Pentapeptide repeats                                       PF13599
Vac7            Vacuolar segregation subunit 7                             PF12751
DHQ_synthase    3-dehydroquinate synthase                                  PF01761
LtrA            Bacterial low temperature requirement A protein            PF06772
FSH1            Serine hydrolase                                           PF03959
Tyrosinase      Common central domain of tyrosinase                        PF00264
Glyco_hydro_47  Glycosyl hydrolase family 47                               PF01532
DUF202          Domain of unknown function                                 PF02656
SET             SET domain                                                 PF00856
Abhydrolase_1   alpha/beta hydrolase fold                                  PF00561
adh_short_C2    Enoyl-(Acyl carrier protein) reductase                     PF13561
Glyco_hydro_3   Glycosyl hydrolase family 3 N terminal domain              PF00933
ADH_zinc_N      Zinc-binding dehydrogenase                                 PF00107
AAA             ATPase family associated with various cellular activities  PF00004
adh_short       short chain dehydrogenase                                  PF00106
Because so few peptides were found only in KW1 and in none of the H. pseudoalbidus-infected
ash samples, the scope for comparative analysis of the ex planta group is limited.
Conclusions
Proteins putatively involved in plant-pathogen interactions have been identified from groups of
translated transcripts that were found exclusively in planta and not in isolate KW1. They
included a copper binding protein within the plastocyanin/azurin family, porphobilinogen
deaminase, a CFEM domain-containing protein and a galactose mutarotase-like protein.
References
Adrian, M., H. Rajaei, et al. (1998). "Resveratrol Oxidation in Botrytis cinerea Conidia."
Phytopathology 88: 472-476.
Breuil, A. C., P. Jeandet, et al. (1999). "Characterization of a Pterostilbene Dehydrodimer Produced
by Laccase of Botrytis cinerea." Phytopathology 89: 298-302.
Catlett, N. L., O. C. Yoder, et al. (2003). "Whole-genome analysis of two-component signal
transduction genes in fungal pathogens." Eukaryotic cell 2: 1151-1161.
de Vries, R. P. and J. Visser (2001). "Aspergillus Enzymes Involved in Degradation of Plant Cell Wall
Polysaccharides." Microbiology and Molecular Biology Reviews 65: 497-522.
Eggert, C., U. Temp, et al. (1997). "Laccase is essential for lignin degradation by the white-rot fungus
Pycnoporus cinnabarinus." FEBS Letters 407: 89-92.
Kulkarni, R. D., H. S. Kelkar, et al. (2003). "An eight-cysteine-containing CFEM domain unique to a
group of fungal membrane proteins." Trends in Biochemical Sciences 28: 118-121.
Mur, L. A. J., T. L. W. Carver, et al. (2006). "NO way to live; the various roles of nitric oxide in
plant-pathogen interactions." Journal of experimental botany 57: 489-505.
Park, B. H., T. V. Karpinets, et al. (2010). "CAZymes Analysis Toolkit (CAT): web service for searching
and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy
database." Glycobiology 20: 1574-1584.
Pezet, R., V. Pont, et al. (1991). "Evidence for oxidative detoxication of pterostilbene and resveratrol
by a laccase-like stilbene oxidase produced by Botrytis cinerea." Physiological and Molecular
Plant Pathology 39: 441-450.
Sbaghi, M., P. Jeandet, et al. (1996). "Degradation of stilbene‐type phytoalexins in relation to the
pathogenicity of Botrytis cinerea to grapevines." Plant Pathology: 139-144.
Schmid, R. and M. L. Blaxter (2008). "annot8r: GO, EC and KEGG annotation of EST datasets." BMC
bioinformatics 9: 180.
Tuor, U., K. Winterhalter, et al. (1995). "Enzymes of white-rot fungi involved in lignin degradation and
ecological determinants for wood decay." Journal of Biotechnology 41: 1-17.
Zhou, S., T. Narukami, et al. (2012). "Heme-Biosynthetic Porphobilinogen Deaminase Protects
Aspergillus nidulans from Nitrosative Stress." Applied and Environmental Microbiology 78:
103-109.