Supporting Information Notes S1–S7

Heterobasidion genome project outline

1. Simple sequence repeats and transposable elements
   1. Simple sequence repeats identification and characterization
   2. Transposable element identification and characterization
2. The mitochondrial genome annotation and analysis
3. Targeted annotation of specific gene families
   1. Cerato-platanin family protein
   2. Oxidative enzymes in the ROS gene network
   3. Lignin peroxidases
   4. Copper Radical Oxidases
   5. Transporters
   6. Peptidases
   7. Signal transduction pathways
   8. Transcription Factors
4. The mating incompatibility locus (MAT)
5. Wood degradation, enzyme content, expression and growth
   1. Wood degradation
6. Pathogenicity
   1. Natural product genes in the H. irregulare genome
   2. Genes differentially regulated in interactions between H. irregulare and Pine
   3. Anchoring pathogenicity QTLs to the genome sequence
7. Trade-off
8. References

Notes S1 Simple sequence repeats and transposable elements

1.1. Simple sequence repeats identification and characterisation

The whole genome was searched for SSRs using SciRoKo (Kofler et al., 2007) with the ‘perfect MISA-mode’ search set at a minimum number of repeats of 14, 7, 5, 4, 4, 4 for mono-, di-, tri-, tetra-, penta-, and hexanucleotides, respectively. The distribution of SSRs was further described by manual annotation of all microsatellites present in the 10 largest scaffolds, representing 75% of the genome. SSRs were scored as present or absent in the following genomic regions within ORFs: exons, introns, 3’UTRs and 5’UTRs. Since gene expression is reported to be controlled by promoters located upstream of ORFs (Abeel et al., 2008), SSRs present within 50 bp and within 50-500 bp upstream or downstream of ORFs were also manually annotated. Finally, all SSRs located further than 500 bp from ORFs were annotated. Manual annotation was conducted using all JGI filtered models.
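The repeat thresholds above can be illustrated with a minimal sketch of a perfect-SSR scan. This is not the SciRoKo algorithm; it is a simple regular-expression search written only to show how the per-motif-length minimum repeat numbers (14, 7, 5, 4, 4, 4) act as a filter. The function names `is_primitive` and `find_perfect_ssrs` are hypothetical.

```python
import re

# Minimum repeat numbers from the text, keyed by motif length (mono- to hexanucleotide).
MIN_REPEATS = {1: 14, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq.

    Assumption: only uninterrupted (perfect) repeats on the given strand are
    reported; imperfect repeats and motif standardization are not handled.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "TT" + "ACG" * 5 + "TT" + "A" * 14 + "C" + "AT" * 6
ssrs = find_perfect_ssrs(example)
```

In this example the (ACG)5 and (A)14 tracts pass the thresholds, whereas (AT)6 falls below the dinucleotide minimum of seven repeats and is not reported.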
The frequency of SSRs was standardized based on the percentage of each of the above genomic regions as in the Frozen Catalog 090414. The total number of perfect SSRs found was 2541, and they comprised about 0.0017% of the genome. There is approximately one microsatellite per 13 kb, and the total numbers of SSRs in intergenic (n=1372) and intragenic regions (n=1169) are similar. Density of SSRs (number/Mb) in the ten largest scaffolds was highest in 3’UTRs, followed by regions located more than 500 bp from ORFs, regions within 50 bp upstream of ORFs, 5’UTRs, regions within 50 bp downstream of ORFs, 50-500 bp downstream of ORFs, introns, 50-500 bp upstream of ORFs and exons (Fig. S4).

The most frequent SSRs in exons are trinucleotides followed by hexanucleotides, while tetranucleotides are dominant in introns (Fig. S4). Trinucleotides are also clearly dominant in 5’UTRs, within 50 bp upstream of ORFs and in the 50-500 bp upstream of ORFs. Conversely, tetranucleotides are more frequent than other SSRs in the genome fraction located more than 500 bp from ORFs. Densities of tri- and tetranucleotides are similar in 3’UTRs and within 50 bp downstream of ORFs. Dinucleotides are present at extremely low frequencies both in exons and within 50 bp upstream of ORFs. The total number and density of SSRs are higher in 3’UTRs than in 5’UTRs. Overall, the highest concentration of trinucleotides is found in 5’UTRs and within 50 bp upstream of ORFs (Fig. S4).

The most frequent perfect fully standardized motifs are ACG, CCG and AGC, both in intergenic and intragenic regions. Some repeats frequently reported in fungi (i.e., AT, ATC) (Toth et al., 2000) are rare in the H. irregulare genome (Table S9). On the other hand, H. irregulare harbours a significant component of repeats absent or rare in fungi (i.e., CG, ACG) (Table S9). Although the density of SSRs is comparable between introns and exons, there appears to be a clear selection in favour of trinucleotides and hexanucleotides in the exonic coding regions that is absent in introns.
Such dominance of triplets over other repeats in coding regions may be explained by low tolerance for frameshift mutations (Metzgar et al., 2000). In parallel, it could be argued that dominance of trirepeats and selection against other repeat numbers (in particular di- and tetrarepeats) in other regions of the genome may be indicative of a delicate role played by SSRs in gene regulation, for instance by altering the abundance of nuclear protein binding sites as shown by changes in number of CCG repeats in humans (Richards et al., 1993, Stallings, 1994). In this light, the overall low number of SSRs, the clear dominance of trinucleotides in both the 5’UTR and the 50 bp upstream of ORFs, and the low representation of dinucleotides in the 50 bp upstream of ORFs in the H. irregulare genome suggest a history of negative selection against those SSRs (e.g. dinucleotides) that are more likely to disrupt the functions played by these regions.

We identified a dominance of tetranucleotides over trinucleotides in intronic non-coding regions. This finding is surprising since this dominance in introns is reported only for vertebrates (Toth et al., 2000, Mun et al., 2006, Lawson & Zhang, 2006). The parallel finding that tetranucleotides are dominant in regions further than 500 bp from ORFs suggests that these regions play a lesser role in the regulation of gene expression than regions closer to ORFs. The lack of dominance of trinucleotides was also observed in the 3’UTR (in contrast with the absolute dominance of these SSRs in 5’UTRs) and downstream of ORFs. Dominance of trirepeats in the 5’UTR has been generally reported for both animals and plants (Li et al., 2004). However, the H. irregulare genome is characterized by a higher number and density of SSRs in the 3’UTR, a trait commonly associated mostly with animals and not plants (Li et al., 2004, Lawson & Zhang, 2006). The most abundant triplet in the H.
irregulare genome was ACG, a motif rarely found in genomes of other organisms, including fungi (Toth et al., 2000). We also found ACG repeats in introns, despite the fact they are reported as absolutely absent in fungi and several other groups of organisms (Li et al., 2004). CG, a dinucleotide notoriously underrepresented in most organisms and not reported for other fungi, was also detected.

In conclusion, this is one of the first reports linking the presence and type of SSRs with the regulatory function of DNA regions immediately upstream of ORFs. This study also provides an example from the fungi of a specific selection process almost exclusively in favour of trinucleotides in the 5’UTR, a well-known mechanism in other groups of organisms. Conversely, the frequency of trinucleotides decreases further away from ORFs, and tetranucleotides dominate in regions further than 500 bp from ORFs, suggesting a loss of constraint that may be expected in regions less directly involved in the regulation of gene expression. A similar lack of constraint is exemplified by the dominance of tetranucleotides detected in introns of the H. irregulare genome.

1.2. Transposable element identification and characterization

RepeatScout (Price et al., 2005) was used for de novo identification of repetitive DNA in the H. irregulare genome assembly. The default parameters (with l=15) were used. RepeatScout generated 1,082 consensus sequences. This library was then filtered as follows: 1) all sequences shorter than 100 bp were eliminated; 2) low-complexity repeats and tandem repeats were removed as part of the RepeatScout algorithm using Nseg (Wootton & Federhen, 1996) and TRF (Benson, 1999); 3) repeats having fewer than 5 copies in the genome were removed and 4) repeats having significant hits to known proteins in Uniprot (The UniProt Consortium, 2008), except proteins known to belong to TEs, were removed.
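The four filtering steps can be summarised as a single predicate over the consensus library. The sketch below is illustrative only: the `Consensus` record and its field names are hypothetical, and the low-complexity and protein-hit flags are assumed to have been computed beforehand by the external tools named above (Nseg/TRF and a Uniprot search), which are not reimplemented here.

```python
from dataclasses import dataclass


@dataclass
class Consensus:
    """Hypothetical record for one RepeatScout consensus sequence."""
    name: str
    length: int           # consensus length in bp
    copies: int           # number of copies found in the genome
    low_complexity: bool  # flagged as low-complexity or tandem repeat
    protein_hit: bool     # significant hit to a known protein
    te_protein: bool      # the hit is to a protein known to belong to a TE


def keep(c):
    """Apply the four filters described in the text; True means the consensus is retained."""
    if c.length < 100:                       # step 1: shorter than 100 bp
        return False
    if c.low_complexity:                     # step 2: low-complexity / tandem repeats
        return False
    if c.copies < 5:                         # step 3: fewer than 5 genomic copies
        return False
    if c.protein_hit and not c.te_protein:   # step 4: non-TE protein hits
        return False
    return True


library = [
    Consensus("R1", 850, 40, False, True, True),    # TE protein hit: kept
    Consensus("R2", 90, 200, False, False, False),  # too short
    Consensus("R3", 400, 3, False, False, False),   # too few copies
    Consensus("R4", 1200, 12, False, True, False),  # host gene family, not a TE
    Consensus("R5", 300, 8, True, False, False),    # low complexity
    Consensus("R6", 500, 5, False, False, False),   # no protein hit: kept
]
retained = [c.name for c in library if keep(c)]
```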
The classification of the 272 consensus sequences remaining was conducted using the pipeline REPCLASS (Feschotte et al., 2009). The elements were annotated manually using the REPCLASS classification and a tBLASTx search (Altschul et al., 1990) against RepBase. TEs belonging to Class I and Class II as defined by Wicker and colleagues (2007) were identified (Fig. S1). The gypsy-like elements were the most frequent TEs, corresponding to 9.28% of the H. irregulare assembly. Class II TIR elements were the second most frequent categorized elements (1.05%). A further 3.67% of the genome was masked by repeated elements belonging to unknown families (Fig. S1).

To identify full-length LTR retrotransposons, a second de novo search was performed with LTR_STRUC (McCarthy & McDonald, 2003). The program yielded 116 full-length candidate LTR retrotransposon sequences, which were checked for homology using the BLASTN algorithm (Altschul et al., 1990) against sequences from the RepBase database. Among the 116 putative full-length LTR retrotransposons, 90 were attributed to Gypsy/Ty3-like elements and 17 to Copia/Ty1-like elements. Nine other elements did not exhibit significant homology with known TE families or had homology with non-LTR retrotransposons; these sequences were excluded from further analyses.

The insertion age of full-length LTR retrotransposons was determined from the evolutionary distance between 5’- and 3’-solo LTR derived from a ClustalW (Thompson et al., 1994) alignment of the two solo LTR sequences using the Kimura correction. For the conversion of the sequence distance to putative insertion age, a substitution rate of 1.3 × 10⁻⁸ mutations per site per year was used (Ma & Bennetzen, 2004). H. irregulare underwent a recent burst of TE activity, which peaks at an estimated 0.2 Mya and was preceded by a gradual increase starting 2 Mya. An older period of activity at 4-8 Mya could also be detected.
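The conversion from LTR divergence to insertion age can be sketched as follows. The text specifies the Kimura correction and the substitution rate; the sketch assumes the Kimura two-parameter distance and the commonly used relation T = K / (2r), in which both LTRs accumulate substitutions independently after insertion. Neither the exact formula nor the function names `k2p_distance` and `insertion_age_years` are stated in the source.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable aligned positions")
    transitions = sum(
        1 for x, y in pairs
        if x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    )
    transversions = sum(1 for x, y in pairs if x != y) - transitions
    p = transitions / n
    q = transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_age_years(distance, rate=1.3e-8):
    """Convert an LTR-LTR distance to an age in years, assuming T = K / (2r)."""
    return distance / (2 * rate)


d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")  # one transition in ten sites
age = insertion_age_years(0.0052)             # about 0.2 million years
```

Under these assumptions, an LTR pair that has diverged by about 0.52% corresponds to an insertion roughly 0.2 Mya, the age at which the recent activity peaks.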
The decrease between 12 and 8 Mya probably reflects element deterioration leading to loss of ability to detect these elements (Fig. S3).

The number of TE occurrences and the percentage of genome coverage were determined by masking the H. irregulare genome assembly using RepeatMasker (Smit et al., 1996; www.repeatmasker.org). Of the 379 consensus sequences used, 272 came from the RepeatScout/REPCLASS pipeline, and 90 Gypsy/Ty3-like and 17 Copia/Ty1-like full-length elements were identified by LTR_STRUC. RepeatMasker masked 16.21% of the H. irregulare genome assembly. Identified TEs are not uniformly distributed across the genome (χ² test, P < 0.05), but are clustered in gene-poor regions (Fig. S2).

Notes S2 The mitochondrial genome annotation and analysis

The mitochondrial genome (mt-genome) of H. irregulare, TC32-1, comprises 114193 bp and has a circular structure and a mean GC-content of 22.8%. Open reading frames (ORFs) longer than 150 bp were identified using ORFfinder, codon usage table 4 (http://www.ncbi.nlm.nih.gov/projects/gorf/). The ORFs that were 300 bp or longer were used to search the non-redundant NCBI database using BLAST in order to find conserved genes. Exon/intron boundaries were located by means of CLUSTALW alignment with homologous genes from other fungal species. The 300 bp or longer ORFs with no significant hits were considered as non-conserved putative genes. The small and large ribosomal RNA (rns, rnl rRNA) genes were located by a BLAST search of the H. irregulare mt-genome with the homologous genes from closely related species. The program tRNAscan-SE was used to identify the tRNA regions.
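The effect of using codon table 4 in the ORF search can be shown with a minimal sketch. In NCBI translation table 4, TGA encodes tryptophan rather than a stop, so only TAA and TAG terminate an ORF. The sketch is not ORFfinder: it assumes ATG as the only start codon (table 4 permits additional starts), scans a linear sequence, and the function names are hypothetical.

```python
# Stop codons under NCBI translation table 4 (TGA codes for Trp, not stop).
STOPS_TABLE4 = {"TAA", "TAG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=150):
    """Return (strand, start, end) for ORFs of at least min_len bp, stop codon included.

    Coordinates are 0-based, end-exclusive, and on the minus strand they
    refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a new ORF
                elif start is not None and codon in STOPS_TABLE4:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None                   # close the ORF
    return orfs


# 50 in-frame TGA codons would truncate this ORF under the standard code,
# but are read through under table 4.
example = "ATG" + "TGA" * 50 + "TAA"
orfs = find_orfs(example)
```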
Of the 15 protein coding genes identified, 14 are involved in energy production: seven genes belong to the NADH dehydrogenase complex (nad1, nad2, nad3, nad4, nad4L, nad5, nad6), one gene to the cytochrome bc1 complex (cob), three genes to the cytochrome c oxidase complex (cox1, cox2, cox3) and three genes to the ATP synthase complex (atp6, atp8, atp9) (Table S10). In addition, the mt-genome includes a ribosomal small subunit protein 3 (rps3) gene and one extra partial nad2 gene. The rns and rnl rRNA genes and 25 tRNAs were identified.

Two of the ORFs found are weakly similar to each other (E-value 6e-17) and were annotated as putative plasmid genes (Ppl1 and Ppl2), since they have low similarity hits with hypothetical plasmid proteins from P. ostreatus and Moniliophthora perniciosa (E-values 3e-12 and 3e-7). Next to one of the putative plasmid genes are two putative pseudo B-type DNA polymerase genes (PSdpo1 and PSdpo2). These polymerases are commonly found in mitochondrial plasmids and sometimes also in mt-genomes. The six non-conserved hypothetical genes found (NC-ORF1-6) have open reading frames larger than 100 amino acids, and in five of these InterProScan found transmembrane regions. Four of the non-conserved hypothetical genes are located adjacent to each other and the two others are both next to one of the two putative plasmid genes. The hypothetical genes show no similarity to one another.

A total of 24 group I introns were identified in the mt genes: nine in cox1, two in cox2, two in cox3, seven in cob, two in nad1, one in nad5 and one in rnl. In ten of these introns, 14 intronic genes were found. Of these 14 intronic genes, 10 were found in the introns of cox1, one in cox3, two in cob and one in rnl. There are as many as three intronic genes in intron four of cox1 and also one putative pseudo-intronic gene.
These intronic genes are conserved homing endonuclease genes (HEGs) with two different kinds of motifs: the LAGLIDADG motif and the GIY-YIG motif. HEGs are known to invade group I introns and promote mobility of the introns. Some of these HEGs are also maturases that assist in intron folding and thereby also in intron splicing.

Notes S3 Targeted annotation of specific gene families

3.1 Cerato-platanin family protein

The three members of the H. irregulare CP family were identified by recursive tBLASTn searches, initially using the sequences of the Ceratocystis platani CP members as queries (Comparini et al., 2009) (GenBank accession numbers EF017218.1 and AJ311644), the sequence of the CP paralog of Ceratocystis fimbriata isolate Cf 4 CF-MANG protein (EF017221) and the cerato-populin gene from Ceratocystis populicola isolate Cf 2 (EF017219) (Comparini et al., 2009). The ORFs identified were designated CP genes and used as queries for further searches. This process was repeated until no new CP genes were recovered. The H. irregulare proteins were found in two separate clades when compared to 77 fungal CP proteins (Fig. S11).

3.2. Oxidative enzymes in the ROS gene network

Handling massive reactive oxygen species (ROS) production is required for pathogenicity in Magnaporthe oryzae (Egan et al., 2007) or for the mutualistic relationship between Epichloe festucae and perennial ryegrass (Tanaka et al., 2008). Prevention of ROS toxicity and control of ROS signalling require a large gene network of at least 150 genes in Arabidopsis, named the “ROS gene network” (Mittler et al., 2004). Within this network, H. irregulare possesses 5 peroxiredoxins, 3 catalases and 5 haloperoxidases, numbers comparable to those identified in other fungal genomes (Table S11). Notable is the absence of Alkylhydroperoxidase D-like and of Glutathione peroxidase, usually detected in fungi. Class I peroxidases are found in all living organisms and are members of the ROS network.
Cytochrome C peroxidases (CcP) are found in mitochondria: they play a major role in the control of H2O2 concentrations. One CcP sequence and four hybrid sequences divergent from other fungal sequences were detected in H. irregulare. For the production of ROS, fungi may also use NADPH oxidase homologues (NOx) and ferric reductase (FRe) (Gessler et al., 2007) with distinct functions: NOx are necessary for superoxide generation during developmental processes, whereas FRe are involved in metal reduction required to acquire iron from the infected host. As expected, NOxA and B were detected in H. irregulare. Surprisingly, seven FRe encoding sequences, probably resulting from several recent duplications, were found. Iron uptake is required for virulence, resistance to oxidative stress, asexual/sexual development, and iron storage (Johnson, 2008). The high number of FRe copies could be associated with the pathogenic capacity of H. irregulare.

3.3. Lignin peroxidases

In the genome of H. irregulare, six sequences containing the Manganese Peroxidase (MnP) characteristic residues (ExxxE and D) were detected, and one that shows homology to MnP but lacks the specific residues. Other plant pathogens in the Russulales (such as different species of the Amylostereum, Echinodontium and Heterobasidion genera) also contained several MnP, based on cDNA sequencing. The putative MnPs from H. irregulare did not cluster directly with the major MnP clades composed of Polyporales species (Fig. S10).

3.4. Copper radical oxidases

Glyoxal oxidase (GLX) is a copper-radical oxidase (CRO) with wide substrate specificity for oxidizing simple aldehydes, such as glyoxal and methylglyoxal, to the corresponding carboxylic acids (Whittaker et al., 1996). These substrates are found in ligninolytic cultures, suggesting a role as physiological substrates for GLX.
GLX also has been implicated in the regulation of peroxidase activity, and is activated in vitro by lignin peroxidase (Kersten, 1990; Kersten & Kirk, 1987). Based on similarities to the galactose oxidase from Dactylium dendroides, the active site of GLX has been identified, and includes Tyr377, His378, Tyr135, Cys70 and His471 (Ito et al., 1991; Kersten & Cullen, 1993; Whittaker et al., 1996). These residues are conserved in more recently identified CRO genes, including 6 in P. chrysosporium (Vanden Wymelenberg et al., 2006) and 3 in Postia placenta (Martinez et al., 2009).

BLAST analysis of the H. irregulare genome identified five putative copper radical oxidases with significant similarity to P. chrysosporium glx1 or structurally related copper radical oxidases (Martinez et al., 2004). All sequences feature predicted secretion signals (SignalP v3.0, www.cbs.dtu.dk/services/SignalP/). Deduced mature proteins had identities ranging from 29% to 47% compared to the P. chrysosporium glx1. Multiple alignments identified conserved residues constituting the Cu-coordinating active site of GLX (Tyr135, Tyr377, His378, His471) (Whittaker et al., 1996; Whittaker et al., 1999). In addition, a cysteine crosslinked with Tyr135 forms the radical redox site, and is also conserved in all six sequences (Cys70). Thus, based on structure, these proteins are likely copper radical oxidases.

3.5. Transporters

Transporter proteins were classified with the aid of the system for membrane transporter classification (http://www.tcdb.org) and further refined manually. The H. irregulare genome contains 499 transporter gene models, a number comparable to that of other basidiomycete fungi (Table S13). The largest numbers of proteins were found in the secondary transporter families, with the highest individual number belonging to the major facilitator superfamily.

3.6. Peptidases

Automatically predicted proteinase genes and functions in the H.
irregulare genome were further refined by manual curation using web-based tools at http://merops.sanger.ac.uk/ (Rawlings et al., 2008). The predicted proteinases were categorized into different groups (Table S14). The presence of distinct members was compared with other basidiomycetes (Table S15).

3.7. Signal transduction pathways

Five signal transduction pathways have been investigated in H. irregulare using the well characterized S. cerevisiae genes as probes for the bioinformatic analysis. The five pathways are: the Fus3/Kss1 pheromone pathway, the Hog1 osmostress pathway, the Mpk1 cell integrity pathway, the calcium/calcineurin signaling pathway and the cAMP pathway (Gustin et al., 1998, Rispail et al., 2009).

Fus3/Kss1 pheromone pathway

The pheromone pathway in S. cerevisiae is triggered by the binding of pheromone to the cognate receptors Ste2p and Ste3p. No Ste2 homologue could be found in the H. irregulare genome. It has been documented that basidiomycetes (i.e. U. maydis and C. neoformans) lack Ste2-type receptors (Rispail et al., 2009). Ste3p from S. cerevisiae shows similarity with five proteins (protein id. 147162, 181128, 171777, 181123 and 147163) that are located in a cluster on scaffold 7. The interaction between the pheromone and the receptor leads to the downstream dissociation of the heterotrimeric G-protein. A Gpa1 homologue exists in H. irregulare (protein id. 33983) with typical small G protein domains. There are two other G-proteins with higher E-values but still with typical G-protein domains (protein id. 57348 and 31682). The β-subunit and γ-subunit are also present in the genome. A Cdc24 is present in the H. irregulare genome and possesses the DH-domain and the pleckstrin-like domain characteristic of guanine nucleotide exchange factors. Furthermore, Cdc42 (activated by Cdc24) is present and shows the typical Ras GTPase domain (Johnson, 1999). There is a Ste50 homologue in the H.
irregulare genome, a protein that functions as an adaptor between Cdc42-Ste20 and the MAPK Ste11 (Jung et al., 2011). In all basidiomycetes analyzed (C. cinereus, L. bicolor, P. chrysosporium, H. irregulare, C. neoformans (serotype A), C. neoformans (serotype B), C. neoformans (serotype D B-3501A), U. maydis, S. roseus) this protein is longer than in the ascomycota due to the presence of an SH3 domain (Fig. S6). A Ste5 homologue, a scaffold protein that binds the Ste11p, Ste7p, and Fus3p kinases in the central MAPK cascade module, is absent from H. irregulare and from all basidiomycetes analysed so far. However, an orthologue of Ste20, the Cdc42p-activated signal transducing kinase of the PAK (p21-activated kinase) family, is present (Malleshaiah et al., 2010). A Bem1 orthologue, which has a Phox-like domain and an SH3-domain thought to interact with the Ste5-MAPK complex of the pheromone pathway, is present. The central MAPK module of the pheromone pathway in S. cerevisiae is composed of the MAPKKK Ste11, the MAPKK Ste7 and the MAPK Fus3 (Gustin et al., 1998). Genes encoding homologues of these proteins are each present in the H. irregulare genome in one copy. Ste11 shows the typical SAM interactive domain that characterizes all the other Ste11 MAPKKK proteins in the fungi studied. The SAM domain in the Ste11 protein is thought to interact with the SAM domain in the upstream interaction protein Ste50 (Grimshaw et al., 2004).

HOG1 osmotolerance pathway

The Sln1 histidine phosphorelay sensor is used by fungi to sense changes in environmental osmolarity (Posas et al., 1996). In the H. irregulare genome there is one copy of a Sln1-like protein. Five transmembrane domains have been found in the H. irregulare Sln1, which differs from the 2 transmembrane domains of the Sln1 sensor in S. cerevisiae. There is a Sho1 orthologue in H. irregulare; this protein shows 4 transmembrane domains and an SH3 domain.
The phosphorelay intermediate Ypd1-like protein and the response regulator Ssk1 of the Hog1 osmoregulation pathway are present in H. irregulare (Posas et al., 1996). In the budding yeast there is another signaling branch related to the hyperosmolarity pathway, which is activated by the association between Sho1 and the signaling mucin protein Msb2. The search of the H. irregulare genome did not reveal any orthologue of S. cerevisiae Msb2. The core components of the Hog1 osmoregulation pathway are all present in the H. irregulare genome. There is one copy of each of the following proteins: the Ssk2 MAPKKK, the Pbs2 MAPKK and the Hog1 MAPK (Brewster et al., 1993; Rispail et al., 2009).

The Mpk1 cell integrity pathway

No members of the Wsc or the Mid2 cell integrity gene families are present in the H. irregulare genome. However, the downstream GTPase protein Rho1 is present in one copy. The Rho1-activation protein Tus1 is missing, while Sac7 and Bem2 are present. Rom1, a GEF for Rho1, is present and is characterized by the DH domain (GEF activity), a pleckstrin-like domain and a citron-like domain. The cell integrity pathway core is again characterized by the MAPKKK Bck1, the MAPKK Mkk1 and the effector MAPK Mpk1 (Gustin et al., 1998). There are single copies of each of these elements in the H. irregulare genome.

The Ca2+ pathway

There are three membrane proteins involved in Ca2+ metabolism. The large transmembrane protein Cch1 is present in the H. irregulare genome with 18 transmembrane domains. A Mid1 orthologue is also present; it has no transmembrane domains since it is anchored through a GPI-anchor (Locke et al., 2000). The calmodulin protein (Ca2+ binding protein) is present and shows the typical EF-hand domain that characterizes many Ca2+ binding proteins. There are two isoforms of the calcineurin A catalytic subunit. The regulatory subunit of calcineurin is also present as a single copy (Cyert, 2003).
The Ca2+ pathway is characterized by different types of vacuolar Ca2+ channels. In the H. irregulare genome there are two copies of the calcium-transporting ATPase Pmr1 and one copy of Pmc1. Interestingly, the genome contains 5 paralogues of the vacuolar H+/Ca2+ exchanger Vcx1, which is involved in the control of cytosolic Ca2+ concentration in S. cerevisiae (Rispail et al., 2009).

The cyclic-AMP-PKA pathway

The cAMP-PKA pathway is composed of several elements that are all present in the H. irregulare genome (Pukkila-Worley & Alspaugh, 2004; Shemarova, 2009). The adenylate cyclase (AC) shows the characteristic LRR domain typical of proteins involved in signal transduction. The phylogenetic analysis shows three distinct clusters of AC sequences that correspond to the phylogenetic groups basidiomycota, ascomycota and oomycota (Fig. S7). The AC shows no transmembrane domain, unlike the human AC, which shows 9 predicted transmembrane domains. There are two sequences related to the phosphodiesterase in the H. irregulare genome. The phosphodiesterase is responsible for reducing the intracellular concentration of the second messenger cAMP (Ma et al., 1999). The phosphodiesterase class I (PDE I) is the low-affinity enzyme for cAMP while PDE II is the high-affinity one. PKA is a kinase responsible for the downstream regulation and activation of target proteins in pathways regulated by the cAMP level. The H. irregulare genome contains both the catalytic and the regulatory subunit of PKA. The regulatory subunit is characterized by the type II PKA R subunit domain. Finally, the CAP protein (cyclase-associated protein) is also present (Shemarova, 2009).

3.8. Transcription Factors

Of the 11,464 H. irregulare proteins identified, 440 were characterised as putative transcription factors (TFs). The 440 H. irregulare TFs are distributed in 36 families. Among these 440 putative TFs, the zinc finger CCHC-type family is the most abundant in H.
irregulare, the C2H2 zinc finger family is the second most abundant, and the fungal-specific transcription factor Zn2Cys6 family ranks third (Fig. S5).

Notes S4 The mating incompatibility locus

Mating incompatibility in heterothallic Agaricomycotina is a post-fusion response controlled by the mating-type (MAT) loci. Tetrapolar species have two MAT loci, with one locus (MAT-A) encoding homeodomain transcription factors and controlling clamp-cell formation and conjugate nuclear division, and the other locus (MAT-B) encoding pheromones and pheromone receptors and controlling nuclear migration and clamp-cell fusion (Brown & Casselton, 2001). In bipolar basidiomycetes, these processes are regulated by a single locus, and where it is known, this locus is homologous to MAT-A in tetrapolar species (Aimi et al., 2005; James et al., 2006).

H. irregulare is a bipolar species in an order of Agaricomycetes (Russulales) in which the genomic structure of MAT has never been investigated. The species has multiple mating types and forms clamp connections after heterokaryotization, but has multinucleate rather than dikaryotic cells in the secondary mycelium (Korhonen & Stenlid, 1998). Initial queries of the genome identified two distinct regions apparently homologous to the MAT loci of other basidiomycetes. The MAT-A gene homologues (homeodomain genes) were found on scaffold 1, within a genomic region of highly conserved gene composition similar to other Agaricomycetes (Fig. S9) (James, 2007; Niculita-Hirzel et al., 2008; Ya et al., 2009). The MAT-B gene homologues (G-protein coupled pheromone receptors) were found on scaffold 5, with 5 receptors encoded in a genomic region of ~20 kb. These genes, like those of all basidiomycete MAT-B loci, are homologous to the STE3 gene (a-factor receptor) of S. cerevisiae (Brown & Casselton, 2001).
At least three putative pheromone genes (protein IDs: 181126, 181135, 181139) have also been identified in this region and another one (protein ID: 181275) about 110 kb upstream of these. These two genomic scaffolds are unlinked (Fig. 1), and either could represent the MAT locus of H. irregulare.

In order to test whether either of the two putative mating-type loci found in the genome of H. irregulare is the MAT locus, a segregation analysis of single basidiospore progeny was conducted. A mature basidiocarp was taken from a stump of a fallen Pinus resinosa tree in Stinchfield Woods, Washtenaw County, Michigan. A sample of 24 single basidiospore progeny was isolated from the basidiocarp, and the mating types of the progeny were determined by pairings on 0.5% malt extract agar and analysis of heterokaryotization by clamp-connection formation. The segregation of the putative MAT loci was analyzed by genotyping two markers: the MIP locus (protein ID: 59033) adjacent to the MAT-A homologues and one of the STE3 pheromone receptor homologues, Ha-STE3.3 (protein ID: 181123), by restriction enzyme analysis of PCR products of the regions (Fig. S9). Comparison of the segregation of mating type and the two genomic regions demonstrated that the MAT-A genomic region, but not the MAT-B genomic region, displays consistent cosegregation with mating type. Therefore these data provide strong evidence that the region adjacent to MIP, encoding homeodomain genes paired with a novel or highly derived gene (see below), is the MAT locus of H. irregulare.

Because the MAT loci of basidiomycetes are subject to strong balancing selection due to negative frequency dependent selection favouring rare mating-type alleles, the coalescence time and therefore the DNA polymorphism of the MAT alleles in a species has been found to be extensive relative to neutrally evolving genes (May et al., 1999). Ten homokaryotic isolates of H. irregulare are being sequenced at the MAT-A and MAT-B regions to determine levels of polymorphism.
Preliminary data confirm the cosegregation data because MAT-A sequences are hyperpolymorphic while MAT-B sequences are not.

Most basidiomycete MAT-A loci encode one or more pairs of homeodomain genes arranged in divergently transcribed pairs (Kües & Casselton, 1992). Each gene pair comprises one member from each of two classes (HD1 and HD2-types) (Hiscock & Kües, 1999). The HD2-type is considered the typical homeobox DNA binding motif, whereas the HD1-type proteins have an atypical DNA binding motif, and the heterodimerization of the two types is believed to be crucial for regulating heterokaryosis and sexual development of basidiomycetes (Kües, 2000). The putative MAT locus of H. irregulare is flanked by the genes MIP and βfg, as observed in most Agaricomycetes, and contains two HD1-type homeodomain genes, each paired in a divergently transcribed arrangement with a gene with very low similarity to other proteins in sequence databases (Fig. S5). These two genes (herein termed MAT-Aα2 and MAT-Aβ2) are clearly homologous, but their products share only ~50% amino acid similarity over 1/3 of their length. Similarity is seen at the N-terminal region between amino acids 48-71 and 150-173, the regions where other mating-type proteins typically encode their allelic specificity. The second regions of similarity are from amino acids 347-365 and 509-524 and contain two NLS signals but are otherwise S/T/P rich. MAT-Aα2 and MAT-Aβ2 are moderately predicted to have a nuclear subcellular localization based on the presence of 2-3 nuclear localization signals (NLS) in their sequence (Nakai & Horton, 1999). These two proteins are actively expressed, as evidenced by their hybridization to the NimbleGen microarray and the detection of MAT-Aα2 by EST sequencing. The major characteristic that distinguishes MAT-Aα2 and MAT-Aβ2 from the typical HD2 class of homeodomain proteins is the apparent absence of the DNA-binding homeodomain motif that in C.
cinerea is essential for HD2 function (Kües, 2000). Currently, the most parsimonious explanation is that the MAT-Aα2 and MAT-Aβ2 proteins are highly derived HD2 proteins that have lost or highly modified their DNA-binding motifs. The only similarity between the MAT-Aα2 and MAT-Aβ2 proteins and any others is a weak similarity between the H. irregulare MAT-Aβ2 protein and the MAT-A2 protein of P. chrysosporium (protein ID: 7556) in the N-terminal region of the two proteins. This region is the specificity-determining region in the other characterized basidiomycete homeodomain genes (Hiscock & Kües, 1999). The organization of the H. irregulare MAT locus, with two pairs each consisting of an HD1 gene divergently transcribed with a highly modified gene, suggests that one pair arose from the other by a duplication event. In summary, these data provide strong evidence that mating type in H. irregulare is controlled by homeodomain transcription factors in a manner similar to other Agaricomycetes. Thus, the evolutionary origin of the bipolar mating system is an additional example of convergent evolution of bipolarity from tetrapolarity, wherein the pheromone receptors of the MAT-B locus have lost their function as mating-type specificity determinants but have maintained their structure, genomic location and, presumably, function as regulators of sexual development (James et al., 2006). One novel aspect of the H. irregulare MAT locus is the absence of proteins with a detectable HD2 DNA-binding motif. Future research will be needed to determine whether these proteins perform a function similar to that of the HD2 proteins in other Agaricomycetes.

Notes S5 Wood degradation, enzyme content, expression and growth

The CAZy profile of H. irregulare in Glycoside Hydrolases (GHs), Polysaccharide Lyases (PLs), Carbohydrate Esterases (CEs) and Carbohydrate-Binding Modules (CBMs) was compared to those of 8 other fungi, including 7 basidiomycetes (U. maydis, P. placenta, P. chrysosporium, L. bicolor, C. cinerea, S. commune, C.
neoformans) and the ascomycete plant pathogen M. oryzae. The total numbers of GHs, PLs, CEs and CBMs of H. irregulare lie close to the median of each category (Table S16). However, these global numbers overlook important details that emerge from a detailed inspection of the GH profile at the family level. In particular, the spectrum of GH families dedicated to plant cell wall polysaccharide degradation in H. irregulare is almost as complete as those of the saprophytes S. commune and C. cinerea (Table S16). H. irregulare appears to have all the enzymatic equipment needed to digest cellulose (enzymes from families GH5, GH6, GH7 and GH45), xyloglucan and its side chains (GH27, GH29, GH12 and GH74), and pectin and its side chains (GH28, GH43, GH51, GH53, GH78, GH88, GH105, PL1, PL4, CE8 and CE12). In contrast, H. irregulare appears to have a limited xylanolytic potential, with only two xylanases from family GH10 and none from family GH11. The presence of 17 proteins with CBM1 modules indicates a strong cellulose-binding capacity (Table S16). The number of enzymes active in pectin degradation was found to be more than twice that in the other basidiomycetes and in the pathogenic M. oryzae (Table S17). Growth of H. irregulare was analysed on minimal media containing different carbon sources and compared to the growth of four other basidiomycetes, P. chrysosporium, L. bicolor, S. commune and P. placenta, along with M. oryzae. Growth of H. irregulare on monomeric and oligomeric sugars was slow compared to growth on some polysaccharides (data not shown). Growth on sucrose, however, correlated well with the presence of an invertase (GH32) in the genome of H. irregulare. Good growth was observed on guar gum (a galactomannan similar in structure to softwood galacto(gluco)mannans), apple pectin and starch, while poor growth was observed on beechwood xylan (glucuronoarabinoxylans), which correlates well with the CAZymes identified in the genome.
Gene models putatively involved in lignin degradation in the H. irregulare genome were characterized and compared to those of other fungi (Table S12).

5.1. Transcript profiling of wood degradation

Shavings of Pinus sylvestris (L.) sapwood, placed on moist soil and separated from it by a nylon net, were inoculated with H. irregulare and incubated for ten days at 20 °C and 80% relative humidity. Colonized wood shavings were frozen in liquid nitrogen and RNA was extracted (see above). Out of the 305 carbohydrate-active enzymes (CAZymes) present in the H. irregulare genome, 27 are differentially expressed (p<0.05) during wood degradation compared to growth in liquid medium (Table S18). The carbohydrate-active part of the transcriptome is dominated by cellulose-degrading glycoside hydrolases in families GH61, GH5 and GH12, which may compensate for the relatively low numbers of GH7, GH10 and GH11 enzymes. Pectin, a constituent of the middle lamellae between pine tracheid cell walls and ray cells, is the target of pectic lyases, which are highly expressed during pine wood decay. Moreover, cellulose fibres are protected by hemicellulose, which is degraded by arabinases and xylan-degrading enzymes that are expressed at intermediate and low levels. Generally, enzymes involved in oxidative lignocellulose degradation are expressed at lower levels than carbohydrate-hydrolysing enzymes during wood degradation. Manganese peroxidase and versatile peroxidases are expressed at threefold the level seen in liquid medium. Peroxidases require peroxide as an oxidant for their function. It may be provided by three different GMC oxidoreductases that are expressed at significantly (15-fold) higher levels in wood than in liquid culture (Table S18). Glyoxal oxidase is another producer of H2O2 and is likewise upregulated in wood. Laccases are also moderately upregulated. Cellobiose dehydrogenase, responsible for the oxidation of cellobiose, is moderately upregulated.
The fungal mycelium may connect energy and mineral resources at different physical locations. In the context of wood decay, this is accomplished by active transport of sugars from the site of wood degradation and of ammonium into this area. The differentially expressed genes also include those encoding proteins that facilitate transport of nutrients (Table S18).

Notes S6 Pathogenicity

The Plant-Associated Microbe Gene Ontology (PAMGO) consortium has extended the Gene Ontology (GO), which describes gene products in terms of their molecular function, biological process and cellular localisation, to include terms describing processes involving multi-organism interactions (Ashburner et al., 2000). Gene product identifiers from M. oryzae and C. albicans annotated with the term GO:0044403 (symbiosis, encompassing mutualism through parasitism) and its child terms were downloaded from the AmiGO web site (http://amigo.geneontology.org/cgi-bin/amigo/go.cgi). After manual curation and refinement of the lists, 219 and 216 unique sequences were retrieved from M. oryzae and C. albicans, respectively (Arnaud et al., 2009, Torto-Alalibo et al., 2009). Out of the 219 M. oryzae proteins, 116 had at least one hit among the filtered model gene set of H. irregulare with an e-value below 1e-10. There were 36 gene products that had 2 or more hits. From C. albicans, 181 gene products had significant homology with predicted proteins from the H. irregulare genome and 123 had more than two matches. Among the genes identified, only 13 show homology with GO:0044403-classified genes of both M. oryzae and C. albicans. The potentially conserved pathogenicity genes common to M. oryzae and C. albicans are mainly classified as signalling pathway and transcriptional regulation proteins. Two proteins (Protein id: 148775 and 57164) are MAP kinases that share similarities with the mitogen-activated protein kinases FUS3/KSS1 and SLT2/MPK1, respectively, from S. cerevisiae.
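The hit-counting steps described above (filter by e-value, count queries with one or more and two or more hits, then intersect the H. irregulare models matched from both reference organisms) can be sketched as follows. This is only an illustration under stated assumptions: the cut-off of 1e-10 is an assumed reading of the threshold, the identifiers and e-values are invented placeholders, and the function names are hypothetical rather than part of any tool used in the study.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-10  # assumed cut-off; adjust to the threshold actually used


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Map each query protein to the set of target models hit below the cut-off."""
    table = defaultdict(set)
    for query, target, e_value in hits:
        if e_value < cutoff:
            table[query].add(target)
    return dict(table)


def summarize(table):
    """Return (queries with >= 1 hit, queries with >= 2 hits)."""
    at_least_one = len(table)
    at_least_two = sum(1 for targets in table.values() if len(targets) >= 2)
    return at_least_one, at_least_two


def shared_targets(table_a, table_b):
    """Target models hit by queries from both reference organisms."""
    targets_a = set().union(*table_a.values())
    targets_b = set().union(*table_b.values())
    return targets_a & targets_b


# Invented placeholder hits: (query id, H. irregulare model id, e-value).
mo_hits = [("Mo1", "Hi_A", 1e-50), ("Mo1", "Hi_B", 1e-12),
           ("Mo2", "Hi_C", 1e-5), ("Mo3", "Hi_D", 1e-30)]
ca_hits = [("Ca1", "Hi_A", 1e-40), ("Ca2", "Hi_E", 1e-20),
           ("Ca2", "Hi_D", 1e-3)]

mo_table = significant_hits(mo_hits)
ca_table = significant_hits(ca_hits)
print(summarize(mo_table), summarize(ca_table), shared_targets(mo_table, ca_table))
```

Keeping the per-query hit sets, rather than only counts, makes it straightforward to go from the summary numbers back to the individual models shared between the two organisms.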
A catalytic subunit of a cAMP-dependent protein kinase (Protein id: 68091) and an adenylate cyclase (Protein id: 45792) were shared between M. oryzae and C. albicans. Furthermore, signalling-related proteins including a Ras-related GTPase (Protein id: 66761) and a small GTPase (Protein id: 154562) similar to Cdc42 from S. cerevisiae were found. Two proteins (Protein id: 99199 and 172816) show homology to proteins involved in transcription alteration and chromatin remodelling by interference with protein-DNA complexes. A trehalose-6-phosphate synthase (Protein id: 54829) and two proteins related to lipid metabolism were also among the shared proteins: protein 38714 is probably involved in beta-oxidation in the peroxisome, and protein 156259 shows similarity to an acyltransferase specific to short-chain fatty acids.

6.1. Natural product genes in the H. irregulare genome

H. annosum (s.l.) is recognized as a producer of at least 10 different secondary metabolites, such as the fomajorins (Donnelly et al., 1987) and fomannosin (Kepler et al., 1967), which all belong to the terpenoids, and fomannoxin (Hirotani et al., 1977). These compounds are produced both in axenic cultures and during interaction with plants and other fungi. Fomannosin was the first toxin isolated from H. annosum (s.l.) and was hypothesized to be involved in pathogenesis by Basset and co-workers in 1967, although they were not able to detect the compound in infected Pinus taeda stems or roots, nor in H. annosum (s.l.) cultures on pine sapwood shavings (Basset et al., 1967). Fomannosin was preferentially produced on media containing high sugar levels, in association with the declining growth phase of the fungus. Application of fomannosin to stem wounds in pine seedlings causes needle browning and death, as well as de novo synthesis of pinosylvin (Basset et al., 1967). Another toxin produced by H. annosum (s.l.)
is fomannoxin, which has a 100-fold greater toxicity to plant cells than fomannosin (Hirotani et al., 1977). This toxin has been isolated from H. annosum (s.l.)-infected Sitka spruce stem wood (Heslin et al., 1983). Uptake of fomannoxin by Sitka spruce seedlings resulted in rapid browning of the roots accompanied by chlorosis and progressive browning of needles (Heslin et al., 1983). This, and the production of fomannoxin by actively growing hyphae, suggests a role for fomannoxin during pathogenesis (Heslin et al., 1983). Fomannoxin, fomannosin and fomajorins show varying degrees of toxicity to plant, fungal and bacterial cells (Sonnenbichler et al., 1989).

H. irregulare strain TC32-1 was grown in liquid Hagem medium (Stenlid, 1985) at 20 °C under stationary conditions for 30 days. Five liters of culture filtrate were extracted using eight 10-g solid-phase extraction (SPE) columns (Isolute C18 (EC), Biotage, Uppsala, Sweden). The columns were activated and equilibrated (60 ml CH3OH and 120 ml H2O containing 0.2% v/v formic acid, respectively) before sample loading. Water-soluble organic and inorganic substances were washed out with 120 ml H2O containing 0.2% v/v formic acid, before the lipophilic analytes were eluted with 90 ml 95% aqueous CH3CN containing 0.2% v/v formic acid. The 95% aqueous CH3CN eluate was dried under reduced pressure and the residue was dissolved in 3 ml 30% aqueous CH3CN before it was subjected to preparative reversed-phase HPLC (Reprosil-Pur ODS-3, C18, 5 µm, 20 × 100 mm and 20 × 30 mm guard column, Dr Maisch GmbH, Ammerbuch, Germany). Separation (3 × 1 ml injected) was performed in 60 min using a gradient of CH3CN in H2O containing 0.2% v/v formic acid (8-70% CH3CN in 50 min, 70-95% in 5 min, followed by a hold at 95% for 5 min). The flow rate was 13.2 ml/min, and the eluate was monitored (UV at 254 nm) and collected in 2-ml fractions in deep-well plates using a liquid handler (Gilson FC204).
Selected fractions were concentrated under a stream of nitrogen and diluted with H2O before they were subjected to further SPE. The SPE columns (50 mg and 100 mg, Isolute C18 (EC), International Sorbent Technology, Hengoed, UK) were treated in the same manner as described above, except that D2O was used for washing and elution was done with CDCl3. The structures of the selected compounds were determined using nuclear magnetic resonance (NMR) spectroscopy and liquid chromatography-mass spectrometry (LC-MS). Data acquired with these techniques were analyzed and compared to literature data for structures already published from H. annosum (s.l.). NMR data were obtained using a Bruker DRX400 spectrometer equipped with a 5-mm QNP probe head and a Bruker Avance III 600 spectrometer with a 2.5-mm SEI probe head. All NMR experiments were recorded at 30 °C with CDCl3 as solvent. Standard pulse sequences supplied by Bruker were used to perform one-dimensional 1H and two-dimensional 1H-1H COSY and 1H-13C HSQC experiments. Chemical shifts were determined relative to internal CHCl3 (δH 7.27; δC 77.23). LC-MS was done using an HP1100 LC system (Hewlett-Packard, Palo Alto, CA, USA) connected to an Esquire-LC ion-trap mass spectrometer with an electrospray interface working in positive mode (Bruker Daltonics Inc., Billerica, MA, USA).

LC-MS analysis of the peak eluting after 39.5 min yielded one major ion at m/z 189.3 [M+H]+, which indicated a molecular mass of 188 Da, i.e. the molecular mass of the secondary metabolite fomannoxin (Fig. 2) (Hirotani et al., 1977). The acquired 1H NMR data were shown to be identical to data previously reported for fomannoxin (1) (Hirotani et al., 1977). Isolation and subsequent investigation of the peak eluting at 26.2 min with LC-MS (m/z 187.3 [M-H2O+H]+ and 205.3 [M+H]+) indicated a molecular mass of 204 Da.
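The step from an observed positive-mode ion to a molecular mass is simple adduct arithmetic, sketched below for the ions reported so far. The constants are standard monoisotopic values for the proton, water and the sodium ion; the dictionary and function names are hypothetical, and results are rounded to nominal mass because the ion-trap readings are quoted to one decimal place.

```python
# Standard monoisotopic masses in Da (not taken from the source text).
PROTON = 1.007276       # H+
WATER = 18.010565       # H2O
SODIUM_ION = 22.989221  # Na+

# Mass added to the neutral molecule M by each singly charged adduct.
ADDUCT_SHIFT = {
    "[M+H]+": PROTON,
    "[M-H2O+H]+": PROTON - WATER,  # protonation with loss of water
    "[M+Na]+": SODIUM_ION,
}


def neutral_mass(mz, adduct):
    """Neutral molecular mass implied by a singly charged ion at the given m/z."""
    return mz - ADDUCT_SHIFT[adduct]


# Ions reported above: the 39.5 min peak and the two ions of the 26.2 min peak.
observed = [(189.3, "[M+H]+"), (187.3, "[M-H2O+H]+"), (205.3, "[M+H]+")]
nominal = [round(neutral_mass(mz, adduct)) for mz, adduct in observed]
print(nominal)  # [188, 204, 204]
```

Both ions of the 26.2 min peak point to the same nominal mass, which is the consistency check that supports assigning them to a single compound.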
The 1H NMR spectrum revealed resonances partly similar to the pattern obtained for fomannoxin, including signals from a 1,2,4-trisubstituted benzene and an aldehyde group. The chemical shifts of these and the remaining signals were found to be comparable with data previously reported for the fomannoxin analogue 5-formyl-2-(isopropyl-1´-ol)benzofuran (2) (Donnelly et al., 1988). The peak eluting after 22.7 min yielded three major ions by LC-MS analysis, m/z 245.3, 263.3 and 285.3, probably corresponding to [M-H2O+H]+, [M+H]+ and [M+Na]+, respectively, indicating a molecular mass of 262 Da, i.e. the molecular mass of the sesquiterpene fomannosin. The NMR data were in accordance with a sesquiterpene structure and were found to be identical to data previously reported for fomannosin (Basset et al., 1967; Kepler et al., 1967; Paquette et al., 2008). This constitutes the first report on the production of secondary metabolites by the sequenced strain TC32-1. The results confirm the production of fomannoxin by the North American species H. irregulare, a metabolite that was previously found only in the European species H. annosum (s.s.) (Fig. 2).

Putative natural product genes were found in the H. irregulare genome (Table S20), suggestive of a biosynthetic capacity greater than is evident from the above chemical investigations. Computer analysis using the Secondary Metabolite Unique Regions Finder (SMURF) web tool (http://www.jcvi.org/smurf/index.php) and manual curation identified clustered co-localization of putative natural product genes (Table S21). Two polyketide synthase (PKS) genes (Prot. Id. 174227 and 174228), of which 174227 has an unusual tridomain structure including adenylation, carrier protein and ketosynthase domains, form a short cluster (cluster 1, Table S21) together with a halogenase gene. Further downstream genes in this cluster indicative of secondary metabolism encode a major facilitator protein (Prot. Id.
52305) and a transcription factor (Prot. Id. 118599) containing a Cys2His2 zinc finger motif. A third PKS gene (Prot. Id. 50938), encoding a canonical, non-reducing wA-like synthase, is located next to a gene for a dual Cys2His2 zinc finger/Zn(II)2Cys6 binuclear cluster transcriptional regulator (Prot. Id. 51550) (cluster 8, Table S21), as found in other species in the context of natural product genes (Zhang et al., 2004, Misiek & Hoffmeister, 2008). No polyketide metabolites have so far been isolated from H. annosum (s.l.), which suggests that the PKS genes are tightly regulated. Genes for multimodular nonribosomal peptide synthetases (NRPSs) were not detected; however, 13 genes for tridomain enzymes identically composed of typical NRPS domains (adenylation, thiolation, reduction) are present in the genome. These showed high homology either to LYS2 from yeast (Ehmann et al., 1999), which functions as α-aminoadipate reductase during L-lysine synthesis in primary metabolism, or to an ochratoxin biosynthesis NRPS-like gene, npsPN, of Penicillium nordicum (Karolewiez & Geisen, 2005). According to the microarray data, the gene corresponding to Prot. Id. 153301 was induced during infection of pine bark (P = 0.033) and during growth on wood (P = 0.030), compared with liquid cultures.

Putative terpene cyclase genes include those corresponding to Prot. Id. 181194, 115814 and 169607. Such an enzymatic activity is required to synthesize the intermediate en route to fomajorins and fomannosin. According to the microarray data, gene 169607 was significantly downregulated during growth on wood (P = 0.019) compared with liquid cultures. This finding fits well with the previous failure to detect fomannosin in infected P. taeda stems or roots, or in H. annosum (s.l.) cultures on pine sapwood shavings (Basset et al., 1967).
Instead, the production of fomannosin on media containing high sugar levels during the declining growth phase of the fungus suggests that the primary function of fomannosin may be related to fungal development or microbial interactions rather than to a role as a phytotoxin. Several genes encoding so-called tailoring enzymes, i.e., enzymes for post-backbone assembly modification, were identified, among them five genes for flavin-dependent halogenases (Prot. Id. 174229, 181184, 181189, 181191, 181192) and one for a dimethylallyltryptophan synthase (DMATS)-type prenyltransferase (Prot. Id. 108351). For both enzyme categories, involvement in fungal natural product assembly has been shown experimentally (Steffan et al., 2007, Wang et al., 2008). While a DMATS may catalyze the prenyl transfer during fomannoxin assembly, halogenated natural products have not yet been described for the secondary metabolome of H. annosum (s.l.).

Here we report that the genome of H. irregulare contains a number of putative genes for the biosynthesis of secondary metabolites. Chemical analyses of culture filtrates of strain TC32-1 predict the presence of terpene cyclase (fomannosin) and DMATS (fomannoxin) genes, which were identified in the genome. However, the presence of putative PKS, NRPS and halogenase genes predicts that the biosynthetic capacity of H. irregulare is greater than is evident from the above chemical investigations, as no compounds have so far been identified that can be classified as polyketides, non-ribosomal peptides or halogenated compounds.

6.2. Genes differentially regulated in interactions between H. irregulare and pine

Samples of Pinus sylvestris (L.) bark colonised by H. irregulare for four weeks were frozen in liquid nitrogen and RNA was extracted (see above). The 12200 gene models represented on the microarray were filtered for probes that cross-hybridized with transcripts from pine. This filtering eliminated 1747 gene models from further analysis.
From the remaining 10453 gene models, the most highly expressed genes were selected using a cut-off of > 400 units mean raw signal. This gave a list of 1353 gene models, of which 43 were more than 2-fold up-regulated and 15 were down-regulated (P<0.05). Genes highly expressed during infection that are not significantly induced in the bark sample compared to the liquid culture are presented in Table S22. Five of the secreted proteins are classified as CAZymes, one as a lipase, two as oxidases active on saccharide molecules and one as an acetylglucosaminyltransferase. The remaining 9 secreted proteins have no homology with any protein of known function. However, two of these have a structure with four predicted transmembrane domains, indicating a membrane localisation.

6.3. Anchoring pathogenicity QTLs to the genome sequence

An AFLP marker-based linkage map of H. annosum (s.l.) was originally published by Lind et al. (2005). This map was based on a mapping population of 102 single-spore isolates originating from a compatible mating between an H. occidentale North American S-type isolate (TC-122-12) and the now sequenced H. irregulare North American P-type isolate (TC-32-1) (Olson et al., 2005). The map has been used to map several traits of interest, such as growth rate (Olson, 2006), virulence (Lind et al., 2007b) and various intraspecific interactions (Lind et al., 2007a). To anchor the linkage groups containing the QTLs to the assembled genome, microsatellite markers were designed using the sequence information. One microsatellite marker on each of the 14 larger scaffolds was designed, screened in the progeny isolates and added to the existing set of markers. Using the JoinMap 3.0 software (van Ooijen, 2001), the new markers were mapped and new linkage groups formed. The most interesting linkage group from a virulence point of view is linkage group 15, which contains virulence QTLs from several different experiments on different hosts (Lind et al., 2007b).
This group absorbed microsatellite marker 35, which was designed from the end of scaffold 12 (at 1557077 base pairs out of 1764121). Linkage group 20, which contained another virulence QTL, absorbed microsatellite 4, at 524787 base pairs into scaffold 1. It might also be possible to anchor the other linkage groups containing known QTLs using an analogous approach. With the scaffolds containing the virulence QTLs identified, efforts were made to pinpoint their positions more precisely. The microsatellite markers x1p, x2p and 198 were successfully added to linkage group 20, fusing it to linkage group 4. Likewise, another 8 markers from scaffold 12 were added to linkage group 15. These new markers also caused linkage group 9 to fuse with group 15. This group now spans from marker 65_2, at 5473 bp, to marker 138.258, at 1742709 bp, indicating that the entire scaffold 12 is covered. Since no other markers have linked to the microsatellites at the ends of this scaffold, it seems possible that this scaffold in fact constitutes an entire single chromosome. This is confirmed by the presence of telomeric repeats (Fig. 1). Virulence assays described by Lind and colleagues (2007a) were re-mapped using the new linkage groups, and the QTL regions (Fig. 3) were scanned for candidate genes and compared to microarray data from RNA extracted from infection sites.

Notes S7 Trade-off

The trade-off between growing in living and dead hosts within a specific organism can be illustrated by differences in gene expression profiles (Table 1). The number of common genes differentially expressed during colonisation of living and dead host is small, as is the number of genes uniquely expressed under each condition (Fig. 5). Specifically, genes involved in degradation of lignocellulose and nutrient transport are key components for understanding the underlying mechanism of the trade-off in necrotrophic plant pathogens.
Not using the full potential for wood decomposition represents an energetic cost of the pathogenic lifestyle. H. irregulare has 17 CAZymes up-regulated relative to liquid culture when grown on wood, while only half that number are expressed during bark colonisation (Table S24). The cost is not related to the capacity to degrade pectin, whereas the capacity to degrade cellulose is connected with a trade-off. All genes associated with a trade-off effect show higher induction on wood vs liquid culture (LC) than on bark vs LC. H. irregulare expresses 27 membrane transport proteins during growth on wood (Table S25). The lower cellulose degradation potential expressed during growth in bark correlates with a lower capacity for nutrient transport. Of the 8 MFS1 transporters that are connected with the trade-off effect, only two show a higher expression relative to liquid culture on bark than on wood, while three of the four expressed on both wood and bark show higher expression on bark.

References

1. Abeel T, Saeys Y, Bonnet E, Rouzé P, van de Peer Y. 2008. Generic eukaryotic core promoter prediction using structural features of DNA. Genome Research 18: 310-323.
2. Aimi T, Yoshida R, Ishikawa M, Bao DP, Kitamoto Y. 2005. Identification and linkage mapping of the genes for the putative homeodomain protein (hox1) and the putative pheromone receptor protein homologue (rcb1) in a bipolar basidiomycete, Pholiota nameko. Current Genetics 48: 184-194.
3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of Molecular Biology 215: 403-410.
4. Arnaud MB, Costanzo MC, Shah P, Skrzypek MS, Sherlock G. 2009. Gene ontology and annotation of pathogen genomes: the case of Candida albicans. Trends in Microbiology 17: 295-303.
5. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. 2000. Gene ontology: tool for the unification of biology. Nature Genetics 25: 25-29.
6.
Basset C, Sherwood RT, Kepler JA, Hamilton PB. 1967. Production and biological activity of fomannosin, a toxic sesquiterpene metabolite of Fomes annosus. Phytopathology 57: 1046-1052.
7. Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27: 573-580.
8. Brewster JL, Devaloir T, Dwyer ND, Winter E, Gustin MC. 1993. An osmosensing signal transduction pathway in yeast. Science 259: 1760-1763.
9. Brown AJ, Casselton LA. 2001. Mating in mushrooms: increasing the chances but prolonging the affair. Trends in Genetics 17: 393-400.
10. Comparini C, Carresi L, Pagni E, Sbrana F, Sebastiani F, Luchi N, Santini A, Capretti P, Tiribilli B, Pazzagli L, et al. 2009. New proteins orthologous to cerato-platanin in various Ceratocystis species and the purification and characterization of cerato-populin from Ceratocystis populicola. Applied Microbiology and Biotechnology 84: 309-322.
11. Cyert MS. 2003. Calcineurin signaling in Saccharomyces cerevisiae: how yeast go crazy in response to stress. Biochem Biophys Res Commun 311: 1143-1150.
12. Donnelly DMX, Fukuda N, Kouno I, Martin M, O'Reilly J. 1988. Dihydrobenzofurans from Heterobasidion annosum. Phytochemistry 27: 2709-2713.
13. Donnelly DMX, O'Reilly J, Polonsky J, Sheridan MH. 1987. In vitro production and biosynthesis of fomajorin D and S by Fomes annosus (Fr.) Cooke. Journal of the Chemical Society Perkin Transactions 1: 1869-1872.
14. Egan MJ, Wang ZY, Jones MA, Smirnoff N, Talbot NJ. 2007. Generation of reactive oxygen species by fungal NADPH oxidases is required for rice blast disease. Proceedings of the National Academy of Sciences USA 104: 11772-11777.
15. Ehmann DE, Gehring AM, Walsh CT. 1999. Lysine biosynthesis in Saccharomyces cerevisiae: mechanism of α-aminoadipate reductase (Lys2) involves posttranslational phosphopantetheinylation by Lys5. Biochemistry 38: 6171-6177.
16. Feschotte C, Keswani U, Ranganathan N, Guibotsy ML, Levine D. 2009.
Exploring the repetitive DNA landscapes using REPCLASS, a tool that automates the classification of transposable elements in eukaryotic genomes. Genome Biology and Evolution 1: 205-220.
17. Gessler NN, Aver'yanov AA, Belozerskaya TA. 2007. Reactive oxygen species in regulation of fungal development. Biochemistry (Mosc) 72: 1091-1109.
18. Grimshaw SJ, Mott HR, Stott KM, Nielsen PR, Evetts KA, Hopkins LJ, Nietlispach D, Owen D. 2004. Structure of the sterile alpha motif (SAM) domain of the Saccharomyces cerevisiae mitogen-activated protein kinase pathway-modulating protein STE50 and analysis of its interaction with the STE11 SAM. Journal of Biological Chemistry 279: 2192-2201.
19. Gustin MC, Albertyn J, Alexander M, Davenport K. 1998. MAP kinase pathways in the yeast Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews 62: 1264-1300.
20. Heslin MC, Stuart MR, Murchu PO, Donnelly DMX. 1983. Fomannoxin, a phytotoxic metabolite of Fomes annosus: in vitro production, host toxicity and isolation from naturally infected Sitka spruce heartwood. European Journal of Forest Pathology 13: 11-23.
21. Hirotani M, O'Reilly J, Donnelly DMX. 1977. Fomannoxin - a toxic metabolite of Fomes annosus. Tetrahedron Letters 7: 651-652.
22. Hiscock SJ, Kües U. 1999. Cellular and molecular mechanisms of sexual incompatibility in plants and fungi. International Review of Cytology 193: 165-295.
23. Ito N, Phillips SEV, Stevens C, Ogel ZB, McPherson MJ, Keen JN, Yadav KDS, Knowles PF. 1991. Novel thioether bond revealed by a 1.7 Å crystal structure of galactose oxidase. Nature 350: 87-90.
24. James TY. 2007. Analysis of mating-type locus organization and synteny in mushroom fungi - beyond model species. In: Heitman J, Kronstad J, Taylor JW, Casselton LA, eds. Sex in fungi: molecular determination and evolutionary implications. Washington, DC, USA: ASM Press, 317-331.
25. James TY, Srivilai P, Kües U, Vilgalys R. 2006.
Evolution of the bipolar mating system of the mushroom Coprinellus disseminatus from its tetrapolar ancestors involves loss of mating-type-specific pheromone receptor function. Genetics 172: 1877-1891.
26. Johnson DI. 1999. Cdc42: an essential Rho-type GTPase controlling eukaryotic cell polarity. Microbiology and Molecular Biology Reviews 63: 54-105.
27. Johnson L. 2008. Iron and siderophores in fungal-host interactions. Mycological Research 112: 170-183.
28. Jung K, Kim S, Okagaki LH, Nielsen K, Bahn Y. 2011. Ste50 adaptor protein governs sexual differentiation of Cryptococcus neoformans via the pheromone-response MAPK signaling pathway. Fungal Genetics and Biology 48: 154-165.
29. Karolewiez A, Geisen R. 2005. Cloning a part of the ochratoxin A biosynthetic gene cluster of Penicillium nordicum and characterization of the ochratoxin polyketide synthase gene. Systematic and Applied Microbiology 28: 588-595.
30. Kepler JA, Wall ME, Mason JE, Basset C, McPhail AT, Sim GA. 1967. The structure of fomannosin, a novel sesquiterpene metabolite of the fungus Fomes annosus. Journal of the American Chemical Society 89: 1260-1261.
31. Kersten PJ. 1990. Glyoxal oxidase of Phanerochaete chrysosporium: its characterization and activation by lignin peroxidase. Proceedings of the National Academy of Sciences USA 87: 2936-2940.
32. Kersten PJ, Cullen D. 1993. Cloning and characterization of a cDNA encoding glyoxal oxidase, a peroxide-producing enzyme from the lignin-degrading basidiomycete Phanerochaete chrysosporium. Proceedings of the National Academy of Sciences USA 90: 7411-7413.
33. Kersten PJ, Kirk TK. 1987. Involvement of a new enzyme, glyoxal oxidase, in extracellular H2O2 production by Phanerochaete chrysosporium. Journal of Bacteriology 169: 2195-2201.
34. Kofler R, Schlötterer C, Lelley T. 2007. SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics 23: 1683-1685.
35. Korhonen K, Stenlid J. 1998. Biology of Heterobasidion annosum.
In: Woodward S, Stenlid J, Karjalainen R, Hüttermann A, eds. Heterobasidion annosum: Biology, Ecology, Impact and Control. Wallingford, UK: CAB International, 43-70.
36. Kües U. 2000. Life history and developmental processes in the basidiomycete Coprinus cinereus. Microbiology and Molecular Biology Reviews 64: 316-353.
37. Kües U, Casselton LA. 1992. Fungal mating type genes - regulators of sexual development. Mycological Research 96: 993-1006.
38. Lawson MJ, Zhang L. 2006. Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes. Genome Biology 7: R14.
39. Li Y-C, Korol AB, Fahima T, Nevo E. 2004. Microsatellites within genes: structure, function, and evolution. Molecular Biology and Evolution 21: 991-1007.
40. Lind M, Dalman K, Stenlid J, Karlsson B, Olson Å. 2007b. Identification of quantitative trait loci affecting virulence in the basidiomycete Heterobasidion annosum s.l. Current Genetics 52: 35-44.
41. Lind M, Olson Å, Stenlid J. 2005. An AFLP-markers based genetic linkage map of Heterobasidion annosum locating intersterility genes. Fungal Genetics and Biology 42: 519-527.
42. Lind M, Stenlid J, Olson Å. 2007a. Genetics and QTL mapping of somatic incompatibility and intraspecific interactions in the basidiomycete Heterobasidion annosum s.l. Fungal Genetics and Biology 44: 1242-1251.
43. Locke EG, Bonilla M, Liang L, Takita Y, Cunningham KW. 2000. A homolog of voltage-gated Ca2+ channels stimulated by depletion of secretory Ca2+ in yeast. Molecular and Cellular Biology 20: 6686-6694.
44. Ma J, Bennetzen JL. 2004. Rapid recent growth and divergence of rice nuclear genomes. Proceedings of the National Academy of Sciences USA 101: 12404-12410.
45. Ma PS, Wera S, van Dijck P, Thevelein JM. 1999. The PDE1-encoded low-affinity phosphodiesterase in the yeast Saccharomyces cerevisiae has a specific function in controlling agonist-induced cAMP signaling. Molecular Biology of the Cell 10: 91-104.
46. Malleshaiah MK, Shahrezaei V, Swain PS, Michnick SW.
2010. The scaffold protein Ste5 directly controls a switch-like mating decision in yeast. Nature 465: 101-105. 47. Martinez D, Larrondo LF, Putnam N, Sollewijn Gelpke MD, Huang K, Chapman J, Helfenbein KG, Ramaiya P, Detter JC, Larimer F, et al. 2004. Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nature Biotechnology 22: 695-700. 48. Martinez D, Challacombe J, Morgenstern I, Hibbett D, Schmoll M, Kubicek CP, Ferreira P, Ruiz-Duenas FJ, Martinez AT, Kersten P, et al. 2009. Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion. Proceedings of the National Academy of Sciences USA 106: 1954-1959. 49. May G, Shaw F, Badrane H, Vekemans X. 1999. The signature of balancing selection: fungal mating compatibility gene evolution. Proceedings of the National Academy of Sciences USA 96: 9172-9177. 50. McCarthy E, McDonald JF. 2003. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19: 362-367. 51. Metzgar D, Bytof J, Wills C. 2000. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Research 10: 72-80. 52. Misiek M, Hoffmeister D. 2008. Processing sites involved in intron splicing of Armillaria natural product genes. Mycological Research 112: 216-224. 53. Mittler R, Vanderauwera S, Gollery M, van Breusegem F. 2004. Reactive oxygen gene network of plants. Trends in Plant Science 9: 490-498. 54. Mun J-H, Kim D-J, Choi H-K, Gish J, Debelle F, Mudge J, Denny R, Endre G, Saurat O, Dudez A-M, et al. 2006. Distribution of microsatellites in the genome of Medicago truncatula: a resource of genetic markers that integrate genetic and physical maps. Genetics 172: 2541-2555. 55. Nakai K, Horton P. 1999. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends in Biochemical Sciences 24: 34-35. 56.
Niculita-Hirzel H, Labbé J, Kohler A, le Tacon F, Martin F, Sanders IR, Kües U. 2008. Gene organization of the mating type regions in the ectomycorrhizal fungus Laccaria bicolor reveals distinct evolution between the two mating type loci. New Phytologist 180: 329-342. 57. Olson Å. 2006. Genetic linkage between growth rate and the intersterility genes S and P in the basidiomycete Heterobasidion annosum s.lat. Mycological Research 110: 979-984. 58. Olson Å, Lind M, Stenlid J. 2005. In vitro test of virulence in the progeny of a Heterobasidion interspecific cross. Forest Pathology 35: 321-331. 59. Paquette LA, Peng X, Yang J, Kang H-J. 2008. The carbohydrate-sesquiterpene interface. Directed synthetic routes to both (+)- and (-)- fomannosin from D-glucose. Journal of Organic Chemistry 73: 4548-4558. 60. Posas F, Wurgler-Murphy SM, Maeda T, Witten EA, Thai TC, Saito H. 1996. Yeast HOG1 MAP kinase cascade is regulated by a multistep phosphorelay mechanism in the SLN1-YPD1-SSK1 "two-component" osmosensor. Cell 86: 865-875. 61. Price AL, Jones NC, Pevzner PA. 2005. De novo identification of repeat families in large genomes. In: Proceedings of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB-05). Detroit, MI, USA, 351-358. 62. Pukkila-Worley R, Alspaugh JA. 2004. Cyclic AMP signaling in Cryptococcus neoformans. FEMS Yeast Research 4: 361-367. 63. Rawlings ND, Morton FR, Kok CY, Kong J, Barrett AJ. 2008. MEROPS: the peptidase database. Nucleic Acids Research 36: 320-325. 64. Richards RI, Holman K, Yu S, Sutherland GR. 1993. Fragile X syndrome unstable element, p(CCG)n, and other simple tandem repeat sequences are binding sites for specific nuclear proteins. Human Molecular Genetics 2: 1429-1435. 65. Rispail N, Soanes DM, Ant C, Czajkowski R, Grünler A, Huguet R, Perez-Nadales E, Poli A, Sartorel E, Valiante V, et al. 2009. Comparative genomics of MAP kinase and calcium-calcineurin signalling components in plant and human pathogenic fungi.
Fungal Genetics and Biology 46: 287-298. 66. Shemarova IV. 2009. cAMP-dependent signal pathways in unicellular eukaryotes. Critical Reviews in Microbiology 35: 23-42. 67. Smit AFA, Hubley R, Green P. 1996-2010. RepeatMasker Open-3.0. 68. Sonnenbichler J, Bliestle IM, Peipp H, Holdenrieder O. 1989. Secondary fungal metabolites and their biological activities, I. Isolation of antibiotic compounds from cultures of Heterobasidion annosum synthesized in the presence of antagonistic fungi or host plant cells. Biological Chemistry Hoppe-Seyler 370: 1295-1303. 69. Stallings RL. 1994. Distribution of trinucleotide microsatellites in different categories of mammalian genomic sequence: implications for human genetic diseases. Genomics 21: 116-121. 70. Steffan N, Unsöld IA, Li S-M. 2007. Chemoenzymatic synthesis of prenylated indole derivatives by using a 4-dimethylallyltryptophan synthase from Aspergillus fumigatus. Chembiochem 8: 1298-1307. 71. Stenlid J. 1985. Population structure of Heterobasidion annosum as determined by somatic incompatibility, sexual incompatibility, and isoenzyme patterns. Canadian Journal of Botany 63: 2268-2273. 72. Tanaka A, Takemoto D, Hyon GS, Park P, Scott B. 2008. NoxA activation by the small GTPase RacA is required to maintain a mutualistic symbiotic association between Epichloe festucae and perennial ryegrass. Molecular Microbiology 68: 1165-1178. 73. The UniProt Consortium. 2008. The Universal Protein Resource (UniProt). Nucleic Acids Research 36: 190-195. 74. Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignments through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22: 4673-4680. 75. Tóth G, Gáspári Z, Jurka J. 2000. Microsatellites in Different Eukaryotic Genomes: Survey and Analysis. Genome Research 10: 967-981. 76. Vanden Wymelenberg A, Sabat G, Mozuch M, Kersten PJ, Cullen D, Blanchette RA. 2006.
Structure, organization, and transcriptional regulation of a family of copper radical oxidase genes in the lignin-degrading basidiomycete Phanerochaete chrysosporium. Applied and Environmental Microbiology 72: 4871-4877. 77. Wang S, Xu Y, Maine EA, Wijeratne EMK, Espinosa-Artiles P, Gunatilaka AAL, Molnár I. 2008. Functional characterization of the biosynthesis of radicicol, an Hsp90 inhibitor resorcylic acid lactone from Chaetomium chiversii. Chemistry & Biology 15: 1328-1338. 78. van Ooijen JW, Voorrips RP. 2001. JoinMap® 3.0, Software for the calculation of genetic linkage maps. Plant Research International, Wageningen, the Netherlands. 79. Whittaker MM, Kersten PJ, Cullen D, Whittaker JW. 1999. Identification of catalytic residues in glyoxal oxidase by targeted mutagenesis. Journal of Biological Chemistry 274: 36226-36232. 80. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, et al. 2007. A unified classification system for eukaryotic transposable elements. Nature Reviews Genetics 8: 973-982. 81. Whittaker MM, Kersten PJ, Nakamura N, Sanders-Loehr J, Schweizer ES, Whittaker JW. 1996. Glyoxal oxidase from Phanerochaete chrysosporium is a new radical-copper oxidase. Journal of Biological Chemistry 271: 681-687. 82. Wootton JC, Federhen S. 1996. Analysis of compositionally biased regions in sequence databases. Methods in Enzymology 266: 554-571. 83. Yi R, Tachikawa T, Ishikawa M, Mukaiyama H, Bao D, Aimi T. 2009. Genomic structure of the A mating-type locus in a bipolar basidiomycete, Pholiota nameko. Mycological Research 113: 240-248. 84. Zhang S, Monahan BJ, Tkacz JS, Scott B. 2004. Indole-diterpene gene cluster from Aspergillus flavus. Applied and Environmental Microbiology 70: 6875-6883.