The internal transcribed spacer as a universal DNA barcode marker for Fungi

Fungal Barcoding Consortium

Conrad L. Schoch1, Keith A. Seifert2, Sabine Huhndorf3, John L. Spouge1, Vincent Robert4, Elena Bolchacova5, Kerstin Voigt6, Wen Chen3, Pedro W. Crous4, Andrew N. Miller7, Michael J. Wingfield8, Gen Okada9, M. Suzuki9, Sarah Hambleton3, André Levesque3, J. Otte10, Imke Schmitt10, Nattawut Boonyuen11, E.B. Gareth Jones11, Satinee Suetrong11, Eric Tretter12, Merlin M. White12, Filip Högnabba13, Soili Stenroos13, Ferry Hagen4, Ursula Eberhardt4, Willem Quaedvlieg4, Teun Boekhout4, Ulrike Damm4, Sybren De Hoog4, Johannes Z. Groenewald4, Marizeth Groenewald4, G. Walther4, V. Duong4, Arthur Schüßler14, C. Qing15, Z.-L. Yang15, Mesfin Bogale16, Wendy A. Untereiner16, H. Maganti17, J.P. Xu17, S.D. Leavitt3, H. Thorsten Lumbsch3, Karen Hansen18, I. Olariaga18, T.A. Duong19, Z. Wilhelm De Beer8, R. Henrik Nilsson20, G. Cardinali21, Ana R. Burgaz22, Anna Crespo22, Ruth Del-Prado22, Pradeep K. Divakar22, Constantino Ruibal22, K. Sotome23, Seppo Huhtinen24, Katarina Fliegerova25, B. Douglas26, Gareth W. Griffith26, K.-D. An27, Peter R. Johnston28, D. Park28, Bevan S. Weir28, Meredith Blackwell29, Hector Urbina29, M. Catherine Aime30, G. Heller30, A. McTaggert30, Kevin D. Hyde31, Cletus P. Kurtzman32, Jennifer J. Luangsa-ard33, S. Mongkolsamrit33, Kentaro Hosaka34, Leho Tedersoo35, Marie-Josée Bergeron36, Richard C. Hamelin36, Agathe Vialle36, Izumi Okane37, Kare Liimatainen38, Tuula Niskanen38, Javier Diéguez-Uribeondo39, M. Dueñas39, M.A. García39, María P. Martin39, Raquel Pino-Bodas39, J.M. Sarmiento-Ramírez39, M.T. Telleria39, J.C. Zamora39, Brian J. Coppins40, Peter Harrold40, Peter Hollingsworth40, Laura J. Kelly40, Rebecca Yahr40, K. Griffiths41, T. May41, Frank O.P. Stefani41, Andrey Yurkov42, Dominik Begerow42, Feng-Yan Bai43, Lei Cai43, Liang-Dong Guo43, Huzefa A. Raja44, Dirk Redecker45, Herbert Stockinger45, Carol Shearer46, László G. Nagy46, I. Nyilasi46, Tamás Papp46, Tamás Petkovits46, Csaba Vágvölgyi46, Urmas Kõljalg47, Arthur Schüßler?, Roberto Barreto?, Bart Buyck?, Priscilla Chaverri?, Bryn Dentinger?, M.S. Elshahed?, Zai-Wei Ge?, Marieka Gryzenhout?, H.-M. Ho?, Valerie Hofstetter?, S.-B. Hong?, Jos Houbraken?, Karen Hughes?, Timothy James?, E. Johnson?, Paul Kirk?, Gábor M. Kovacs?, Sara Landvik?, Audra S. Liggenstoffer?, Lorenzo Lombard?, Wieland Meyer?, Jean-Marc Moncalvo?, T. Rintoul?, Sung-Oui Suh?, Kazuaki Tanaka?, D. Vu?, Y. Wang?, Michael Weiß?, Ning Zhang?, Wen-Ying Zhuang? and David Schindel?

Author Affiliations

1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, MSC 6510, Bethesda, Maryland 20892-6510, U.S.A.
2 Biodiversity (Mycology and Microbiology), Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, Ontario
3 Department of Botany, The Field Museum, 1400 S. Lake Shore Drive, Chicago, IL 60605, USA
4 CBS-KNAW Fungal Biodiversity Centre, P.O. Box 85167, 3508 AD Utrecht
5 LifeTech, Foster City, CA, USA
6 F. Schiller University, Institute of Microbiology, Jena, Germany
7 University of Illinois, Illinois Natural History Survey, 1909 South Oak Street, Champaign, IL 61820, USA
8 Department of Microbiology and Plant Pathology, Forestry & Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0001, South Africa
9 Japan Collection of Microorganisms, RIKEN BioResource Center, Wako, Saitama 351-0198, Japan
10 Biodiversity and Climate Research Center (BiK-F), Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, D-60325 Frankfurt (Main), Germany
11 BIOTEC, NSTDA, 113 Paholyothin Road, Pathum Thani 12120, Thailand
12 Boise State University, Department of Biological Sciences, 1910 University Dr., Boise, Idaho 83725
13 Botanical Museum, Finnish Museum of Natural History, FI-00014 University of Helsinki, Finland
14 Department of Biology, Biocenter of the Ludwig-Maximilian-University Munich, Martinsried, Germany
15 Chinese Academy of Sciences in Kunming (Kunming Institute of Botany)
16 Department of Biology, Brandon University, Brandon, Manitoba, Canada R7A 6A9
17 Department of Biology, McMaster University, Hamilton, Ontario, Canada
18 Department of Cryptogamic Botany, Swedish Museum of Natural History, P.O. Box 50007, SE-104 05, Stockholm, Sweden
19 Department of Genetics, Forestry & Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0001, South Africa
20 Department of Plant and Environmental Sciences, University of Gothenburg, Box 461, Gothenburg, Sweden
21 Dipartimento Biologia Applicata - Microbiologia, Università degli Studi di Perugia, Perugia, Italy
22 Dpto. Biología Vegetal II, Facultad de Farmacia, Universidad Complutense de Madrid, 28040 Madrid, Spain
23 Fungus/Mushroom Resource and Research Center, Tottori University, Japan
24 Herbarium, University of Turku, FI-20014 University of Turku, Finland
25 Institute of Animal Physiology and Genetics, Czech Academy of Sciences, Prague, Czech Republic
26 Institute of Biological, Environmental and Rural Sciences, Prifysgol Aberystwyth, Aberystwyth, Ceredigion, Wales SY23 3DD
27 Japan Collection of Microorganisms, RIKEN BioResource Center, Wako, Saitama 351-0198, Japan
28 Landcare Research, Private Bag 92170, Auckland 1142, New Zealand
29 Louisiana State University, 380 Life Sciences Building, Department of Biological Sciences, Baton Rouge, Louisiana 70803
30 Louisiana State University, Louisiana State University Agricultural Center, Department of Biological Sciences, Baton Rouge, Louisiana 70803
31 Mae Fah Luang University, 57100 Chiang Rai, Thailand & Botany and Microbiology Department, College of Science, King Saud University, P.O.
Box 2455, Riyadh 1145, Saudi Arabia
32 National Center for Agricultural Utilization Research, ARS, USDA, 1815 N. University St, Peoria, IL, USA
33 National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand
34 National Museum of Nature and Science, Tsukuba, Japan
35 Natural History Museum of Tartu University
36 Natural Resources Canada, Department of Forest Sciences, Faculty of Forestry, The University of British Columbia, 3rd Floor, 2424 Main Mall, Vancouver, British Columbia, Canada
37 NITE Biological Resource Center (NBRC), National Institute of Technology and Evaluation, Kisarazu, Chiba 292-0818, Japan
38 Plant Biology, Department of Biosciences, P.O. Box 65, 00014, University of Helsinki, Finland
39 Real Jardín Botánico, CSIC, Madrid
40 Royal Botanic Garden Edinburgh, Edinburgh, EH3 5LR
41 Royal Botanic Gardens Melbourne, Birwood Avenue, South Yarra, VIC 3141, Australia
42 Ruhr-Universität Bochum, Geobotanik ND03/169, Universitätstr. 150, 44801 Bochum, Germany
43 State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Bei-Chen-Xi Road, Chao Yang District, Beijing, 100101, China
44 The University of North Carolina at Greensboro, Department of Chemistry and Biochemistry, 457 Sullivan Science Building, P.O. Box 26170, Greensboro, NC 27402-6170
45 UMR Microbiologie du Sol et de l'Environnement, INRA/Université de Bourgogne, BP 86510, 17 rue Sully, 21065 Dijon cedex, France
46 University of Illinois, Department of Plant Biology, 505 South Goodwin Avenue, Urbana, IL 61801, USA
47 University of Tartu, Estonia

Abstract

Six genes were evaluated by a multi-laboratory, multi-national consortium as potential DNA barcodes for the Fungi, the second largest kingdom of eukaryotic life. Cytochrome c oxidase 1, the animal barcode, was excluded as a potential barcode because it is difficult to amplify in fungi, often includes large introns and can be insufficiently variable.
Three regions from the nuclear ribosomal RNA cistron were compared, along with three representative protein-coding genes, RPB1, RPB2 and MCM7. Although the protein-coding genes often had a higher percentage of correct identification than the ribosomal markers, their low PCR and sequencing success eliminates them as candidates for a universal fungal barcode. The ribosomal small subunit (SSU) has poor species-level resolution in fungi. Of the regions of the ribosomal cistron, the internal transcribed spacer (ITS) has the highest probability of successful identification (72%) across the broadest range of fungi, with the most clearly defined barcode gap between inter- and infraspecific variation. The ribosomal large subunit (LSU), a popular phylogenetic marker, had superior species resolution in some taxonomic groups, such as the basal fungal lineages and the yeasts, but was otherwise slightly inferior to the ITS. We will propose that ITS be formally adopted by the Consortium for the Barcode of Life as the first fungal barcode marker, with the possibility that supplementary barcodes may be proposed for narrow taxonomic groups.

Introduction

The absence of an accepted DNA barcode for Fungi, the second most speciose eukaryotic kingdom, is a serious limitation for multi-taxon ecological and biodiversity studies. DNA barcoding uses standardized 500-700 base pair (bp) sequences to identify species of all kingdoms, using primers universal for the broadest possible taxonomic group. Reference barcodes must have expertly identified vouchers deposited in biological collections with on-line metadata, and must be validated by on-line sequence chromatograms. Interspecific variation should exceed infraspecific variation (the ‘barcode gap’), and the process is optimal when a sequence is constant within, and unique to, one species (1-3). Ideally, the barcode locus would be the same for all kingdoms.
The mitochondrial gene cytochrome c oxidase 1 (CO1, or cox1) is the barcode for animals (1, 2) and is the default marker adopted by the Consortium for the Barcode of Life (CBOL) for all groups of organisms, including fungi (4). In plants, CO1 has limited value for differentiating species and a two-gene system was adopted (5, 6), based on portions of the ribulose-bisphosphate carboxylase (rbcL) gene and a maturase-encoding gene within the intron of the chloroplast trnK gene (matK). This sets a precedent for reconsidering CO1 as the default fungal barcode. CO1 functions well in some fungal groups such as Penicillium, with reliable primers and adequate species resolution (67% in this young lineage) (7), but results in the few other groups examined experimentally are inconsistent and cloning is often required (8). Degenerate primers applicable to many Ascomycota (9) exist, but are difficult to assess because amplification failures may not reflect priming mismatches. Extreme length variation occurs because of multiple introns (7, 10-12), which are not consistently present in any one species; multiple copies of different lengths and variable sequence occur; and identical sequences are sometimes shared by several species in some groups (9). Some fungal clades such as Neocallimastigomycota, a basal lineage of anaerobic, zoosporic gut fungi, lack mitochondria (13). Finally, because most fungi are microscopic and invisible without optical or molecular magnification, robust, universal primers must be available to detect a truly representative profile. This appears impossible with CO1.

For more than 20 years, the nuclear ribosomal RNA cistron has been used for fungal diagnostics and phylogenetics (14), and its components are most frequently discussed as alternatives to CO1 (10, 11, 15, 16). The eukaryotic rRNA cistron consists of the 18S, 5.8S and 28S rRNA genes, transcribed as a unit by RNA polymerase I.
Post-transcriptional processes split the cistron, removing two internal transcribed spacers. These two spacers, together with the 5.8S gene that lies between them, are usually called the ITS. The 18S nuclear ribosomal small subunit (SSU) is commonly used in phylogenetics, and although its homolog (16S) is often used as a species diagnostic for bacteria (17), it has fewer hypervariable domains in fungi. The 28S nuclear ribosomal large subunit (LSU) sometimes discriminates species on its own or in combination with ITS. For yeasts, the D1/D2 region of LSU was adopted for characterizing species long before the concept of DNA barcoding was promoted (18-20). ITS is the most frequently sequenced fungal genetic marker and is used for species identification in many fungal lineages, where it already functions as a de facto DNA barcode (12, 14, 15, 21). Currently, ~172,000 full-length fungal ITS sequences are in GenBank, of which 56% are identified with a Latin binomial, representing ~15,500 species and 2,500 genera, derived from ~11,500 scientific studies in ~500 journals (H. Nilsson, pers. comm.). An important part of these data consists of sequences from environmental samples (22-24).

Protein-coding genes are widely used in mycology for higher-level phylogenies and species diagnostics. For Ascomycota, protein-coding genes are superior to ribosomal genes for both purposes (16). Specialized identification databases employ several markers, e.g. translation elongation factor 1-α for Fusarium (25) and β-tubulin for Penicillium (26), but there is little standardization between groups. Available primers for such markers usually amplify a narrow taxonomic range. Among protein-coding genes, the largest subunit of RNA polymerase II (RPB1) may have potential as a fungal barcode; it is ubiquitous, single copy and has a slow rate of sequence divergence (27). Its phylogenetic utility was demonstrated in studies of Basidiomycota, Zygomycota and Microsporidia (28-32) and protists (33).
RPB1 primers were developed for the Assembling Fungal Tree of Life project (AFToL) and the locus is included in the subsequent AFToL2 (aftol.org/about.php) (34). However, its utility as a barcode remains untested.

This paper results from a multi-laboratory, multi-national initiative to establish a standard DNA barcode for Fungi. Using the probability of correct identification (PCI) and barcode gap analysis, we compared the barcoding performance of the three nuclear ribosomal regions (ITS, LSU and SSU) and one representative protein-coding gene, RPB1, based on newly generated sequences for 742 specimens or strains representing the 17 major fungal lineages (Fig. 1). Contributors used standard primers and protocols developed by AFToL and submitted sequences to a custom-built database for analysis. Some also contributed sequences of two optional genes: the second largest subunit of RNA polymerase II (RPB2), also an AFToL marker (35), and a gene encoding a mini-chromosome maintenance protein (MCM7). Both were chosen for their usefulness in phylogenetic studies and ease of amplification across Ascomycota (36, 37).

Materials and Methods

DNA isolation, amplification and sequencing

DNA was isolated and purified from cultures or specimens using the methods routinely employed by the participating laboratories. Similarly, PCR protocols (Table S1) and thermocyclers varied from laboratory to laboratory. PCR primers were based on those used in the AFToL projects (Table S1). Many samples were sent by contributors for PCR amplification and sequencing at LifeTech (Foster City, CA). For PCR at LifeTech, 1-2 μl of fungal DNA was amplified in a final volume of 30 μl with 15 μl AmpliTaq Gold® 360 Mastermix, PCR primers and water. Forward primers contained the M13-20F sequencing primer and reverse primers included the M13R-27 sequencing primer.
PCR products (3 μl) were enzymatically cleaned before cycle sequencing with 1 μl of ExoSap-IT® and 1 μl TE buffer, and incubated at 37 °C for 20 min, followed by 80 °C for 15 min. Cycle sequencing reactions contained 5 μl of cleaned PCR product, 2 μl of BigDye® Terminator v3.1 Ready Reaction Mix, 1 μl of 5x Sequencing Buffer, 1.6 pmol of M13F or M13R sequencing primer and water in a final volume of 10 μl. The standard cycle sequencing protocol was 27 cycles of 10 sec at 96 °C, 5 sec at 50 °C and 4 min at 60 °C, followed by a hold at 4 °C. Sequencing clean-up was performed with the BigDye XTerminator® Purification Kit as recommended by the manufacturer for 10 μl volumes. Sequencing reactions were analysed on a 3730xl Genetic Analyser.

PCR Success. Participants recorded their experience of the success of PCR amplification and sequencing for the genes and taxa they contributed to this study (Fig. 1). They also documented specific problems with PCR, quality of PCR amplification, primer problems (PCR and sequencing), and whether cloning was required. The genes were ranked for their ability to discriminate species and their overall taxonomic and phylogenetic utility in specialized taxonomic groups. Comments were parsed to identify taxon-specific problems and summarised in the supplemental data (Fig. S5).

Database

A query-based BioloMICS database (38) was established for 3256 strains (X species) provided by >70 members of the consortium (www.fungalbarcoding.org). The data are based on deposited voucher specimens or cultures identified by taxonomic specialists. The database allows pairwise sequence alignments or polyphasic identifications using one or any combination of the six genes used in this study. The taxon sampling covered fifteen of the seventeen major lineages attributed to the true Fungi (Fig. 1), weighted towards species-rich higher taxa such as the Pezizomycotina (the largest group of Ascomycota) and the Agaricomycotina (mushrooms and other macro-Basidiomycota).

Data Analyses

Sampling.
Closely related but separately named asexual and sexual species were coded under one genus name, and the data were then divided into subsets to allow a taxonomically targeted assessment of markers for each major clade (Fig. 1). From the barcoding database of 3256 samples, we selected a subset of 742 strains with sequences for all four markers (ITS, LSU, SSU and RPB1). This was divided into four taxonomically delimited data sets: 416 strains in Pezizomycotina (filamentous ascomycetes), 81 in Saccharomycotina (ascomycetous yeasts), 202 in Basidiomycota and 43 strains from the combined, polyphyletic basal lineages. Two additional analyses were performed for samples with three markers, to improve the evaluation of lineages that were undersampled in the four-marker set: the first for 683 strains of Pezizomycotina with ITS, LSU and RPB1 sequences, and the second for 152 representatives of basal lineages with ITS, LSU and SSU sequences. Finally, a six-marker comparison was made for a selection of 207 strains of Pezizomycotina, Basidiomycota and Saccharomycotina, with the first four markers supplemented with the two optional markers, MCM7 and RPB2. The species and strains used in the analysis are shown in Table S2.

Probability of Correct Identification. For each data set, we calculated the probability of correct species identification (PCI). All alignments used the BLAST default DNA scoring system (39, 40). Two kinds of sequence alignment were calculated between every sample pair, namely a) a global alignment using the Needleman-Wunsch algorithm, which aligns the entire sequence length with penalties for gaps at the alignment ends (41); and b) a semi-global alignment, using a variant Needleman-Wunsch algorithm that includes both ends of one sequence and finds the alignment with the highest score without penalizing end gaps in the other sequence. The latter algorithm then does the same with the roles of the two sequences reversed, returning the alignment with the higher of the two scores.
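As a rough illustration of the two alignment modes just described, the following Python sketch scores a pair of sequences globally and semi-globally. The function names, the linear gap penalty and the +1/-1/-2 scoring values are assumptions made for illustration only; they are not the BLAST default DNA scoring system used in the study, and the sketch returns scores rather than the alignments themselves.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, free_b_ends=False):
    """Needleman-Wunsch score for aligning all of `a` against `b`.

    Illustrative scoring with a linear gap penalty (an assumption).
    With free_b_ends=True, unaligned leading and trailing parts of `b`
    cost nothing, so `a` is matched to the best subsequence of `b`.
    """
    # Row 0: skipping a prefix of b is free in the semi-global case.
    prev = [0 if free_b_ends else j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # gaps opposite a's residues are always penalized
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    # Semi-global: the alignment may end anywhere along b.
    return max(prev) if free_b_ends else prev[-1]


def global_score(a, b):
    """End gaps penalized in both sequences."""
    return align_score(a, b)


def semi_global_score(a, b):
    """Fit a into b, then b into a, and keep the higher score."""
    return max(align_score(a, b, free_b_ends=True),
               align_score(b, a, free_b_ends=True))


if __name__ == "__main__":
    short, long_ = "ACGT", "TTACGTTT"
    print(global_score(short, long_))       # -4: four end gaps are penalized
    print(semi_global_score(short, long_))  # 4: overhanging ends are free
```

With these assumed scores, a short sequence embedded in a longer one is penalized under global alignment but not under semi-global alignment, which is the length effect the comparison of the two modes is meant to expose.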
Thus, the global alignment matches the whole length of two sequences and the semi-global alignment matches one sequence to a subsequence of the other, and then vice versa. Semi-global alignment checks whether disparate sequence lengths degrade species identification; if they do not, global and semi-global alignment should result in similar identifications. For the two types of alignment, the p-distance (the proportion of aligned nucleotide pairs consisting of differing nucleotides) was calculated. The ‘sequence diameter’ of a species is defined as the greatest p-distance between any two samples from within a species. Based on the sequence diameter, ‘correct identification’ of a species occurs if, for every sample in the species, no sample from another species lies within the sequence diameter. The corresponding ‘probability of correct identification’ (PCI) is the fraction of species correctly so identified (5). The Wilson score interval yielded 95% confidence intervals for each PCI estimate (42). PCI was also calculated for all possible combinations of two, three or four genes, to evaluate the potential payoff of a multigene barcoding system.

Sequence divergence and DNA gap analyses

Using the same data set as for the PCI analysis, a DNA barcode gap analysis was performed using matrix algebra and SAS (Statistical Analysis Software; SAS Institute Inc, Cary, NC, USA) as described previously (43), except that the lower triangular uncorrected distance matrix was calculated using Mothur (44). The result is shown in Fig. 3. Additional comparisons were done without a matrix and are described in the supplemental data (Fig. S4).

Results

PCR Success. Our survey (Fig. S5) showed that PCR of ribosomal genes was more reliable across the Fungi than PCR of the single protein-coding marker (Fig. 1). As expected, success varied by taxonomic group, e.g. ITS PCR success ranged from 100% (Saccharomycotina) to 65% (basal lineages). Ranges for the other ribosomal markers were similar.
In comparison, success for RPB1 varied from 80% (Saccharomycotina) to 14% (basal lineages). About 80% of respondents reported no problems with PCR amplification of ITS, 90% scored it as easy to obtain a high quality PCR product, and 80% reported no significant sequencing problems. In comparison, >70% reported PCR amplification problems for RPB1; 40-50% reported primer failure as the biggest problem.

Species identification. We performed several analyses to allow direct comparison of the barcoding utility of the four main genes under consideration, ITS, LSU, SSU and RPB1 (Figs 2, 3). To assess the Probability of Correct Identification (PCI), data were divided into four sets by taxonomic affinity. All four genes were available for 742 samples. Two three-gene comparisons were made to expand diversity for some major clades underrepresented in the initial analysis. For lichen-forming Pezizomycotina, SSU was often absent because our protocols favored amplicons from the algal phycobiont rather than the fungus. Eliminating the requirement for SSU allowed more intensive sampling, with 683 sequences (179 species) of the remaining three markers. Similarly, basal lineages yielded only 43 RPB1 sequences, and a comparison of the ribosomal markers included a larger set of 152 samples and 34 species.

The combined four-gene PCI comparisons (Fig. 2) included 142 species represented by more than one sample and 84 species with only one sample. SSU was consistently the worst performing marker, with the lowest species discrimination in Pezizomycotina (Fig. 2a) and Basidiomycota (Fig. 2b), and the second lowest in Saccharomycotina (Fig. 2c). In the basal lineages (Fig. 2d), SSU had a better PCI, on par with LSU and better than both ITS and RPB1. However, LSU had variable levels of PCI (0.46-0.88) amongst all groups (Fig. 2). ITS had the most resolving power for species discrimination in the Basidiomycota (0.79) but performed less well than RPB1 (0.79) in Pezizomycotina.
ITS had lower discriminatory power than RPB1 in the Saccharomycotina, and than SSU and LSU in the basal lineages, but margins of error were high. When all taxa were considered, the PCI of ITS (0.73) was marginally lower than that of RPB1 (0.74). RPB1 consistently yielded high levels of species discrimination, comparable to multi-gene combinations (Fig. 2), in all the fungal groups except the basal lineages. It had the best PCI in the Pezizomycotina (0.79), but in the Basidiomycota its PCI (0.67) was slightly lower than those of ITS (0.79) and LSU (0.72). In the Saccharomycotina, SSU had the highest PCI (0.46) and ITS the lowest (0.39) of the single genes, but margins of error were large for this set. Among the multigene combinations, the most effective two-gene combination in the combined analysis was ITS and RPB1, yielding a PCI of 0.77. This represented an increase of 0.04 over the highest-ranked single gene. The highest-ranked three- and four-gene combinations gave a similar increase (0.05).

The expanded set of Pezizomycotina taxa lacking SSU sequences allowed increased sampling of lichenised species (Fig. S6). The data set included 179 species with more than one sample and 117 species with single samples. The expanded data set for basal lineage taxa lacking RPB1 sequences included 34 species with more than one sample and 50 species with one sample; in this set, all sequences were unique to their species (Fig. S5). In neither analysis did the ranking of the candidate barcodes differ noticeably from that in the four-gene comparison.

The barcode gap analyses (Fig. 3) largely confirmed the trends seen in the PCI analysis. The clearest indication of a barcode gap is seen for RPB1, followed by ITS. LSU and SSU performed poorly, each lacking a significant barcode gap.

To test whether other single-copy protein-coding genes might have a similar barcoding performance to RPB1, RPB2 and MCM7 sequences were tested for a subset of taxa.
Neither yielded data from the basal lineages, but a combination of the remaining groups yielded 207 strains and 55 species with all six marker sequences. This data set (Supplemental Fig X) included 55 species with more than one sample and 23 species with one sample; for both markers, all sequences were unique to their species. The two supplementary genes had a similar barcoding performance to RPB1, with RPB2 yielding the best results, followed by RPB1 and MCM7.

Discussion

We compared the barcoding performance of four genes using newly generated sequences from 742 strains, with two additional protein-coding genes analyzed for a smaller subset of about 200 strains. Our taxon sampling was comprehensive, covering the main fungal lineages, with heavier sampling in the most speciose clades. We did not attempt to cover Glomeromycota in the absence of data for markers other than LSU and ITS; RPB1 could not be amplified consistently. We were also unable to cover Neocallimastigomycota, because of the absence of sufficient sequence data spanning the full length of the ribosomal operon. We omitted the Rozella and Microsporidia clades; arguments for and against their inclusion within Fungi continue (45, 46). For practical reasons, we had to assume that the species concepts employed by the many taxonomists participating in the consortium were equivalent and accurate, while acknowledging that species concepts vary from one fungal group to another according to the relative age and rate of divergence of the lineages and variable states of knowledge (ref?).

Overall, ribosomal markers had the fewest problems with failed priming or weak amplification (Fig. 1; Fig. S4). Based on overall performance in species discrimination, SSU had almost no barcode gap (47) and the worst combined PCI, and can be eliminated as a candidate locus (Figs. 2, 3).
LSU, a highly favored phylogenetic marker among mycologists, especially those involved in environmental metagenomics (ref?), had virtually no amplification, sequencing, alignment or editing problems, and its barcode gap was superior to that of SSU. Across the fungal kingdom, ITS was generally superior to LSU in species discrimination and had a more clearly defined barcode gap (Fig. 3). The probability of correct species identification using ITS is comparable to the success reported for the two-gene plant barcode system (0.73 vs 0.70) (5). Higher species identification success can be expected in the major macro-fungal groups in the Basidiomycota (0.79), and slightly lower success in the important micro-fungal groups in the filamentous Ascomycota (0.75). ITS performed as a close second to the most heavily sampled of our protein-coding genes, RPB1. However, the much higher PCR success rate for ITS is a critical difference in its performance as a barcode (Fig. 1).

Taking all these arguments into account, we propose ITS as the standard barcode for fungi. The proposal will satisfy most fungal biologists, but not all. It is unlikely that a single-gene barcode system will be capable of identifying all fungi to species. Furthermore, the limitations of ITS sequences for identifying species in some groups, and the failure of the ‘universal’ ITS primers to work in a minority of other groups, will have to be carefully documented (11). Such limitations have already been found in species-rich Ascomycota genera with shorter amplicons, such as Cladosporium (48), Penicillium (49) and Fusarium (50). In addition, genetic drift may prevent lineage sorting of ancestral polymorphisms in some slow-evolving groups such as lichens (51). Other data suggest that infragenomic variation requires careful consideration (52-54), e.g. within single sporocarps in basidiomycetes (55, 56). Multiple nonorthologous ITS variants have been reported, e.g. in the ascomycete genus Fusarium (50).
Highly variable lengths and high evolutionary rates in the Cantharellales (Basidiomycota) may provide challenges for sequencing and analysis of ITS in that order (57). Several evolutionary mechanisms affect ribosomal and ITS sequences, including horizontal DNA transfer (58) and a process unique to fungi called ‘repeat-induced point mutation’ (RIP) (59, 60). Despite these challenges, ITS combines the highest resolving power to discriminate between closely related species with a high PCR/sequencing success rate across a broad range of fungi.

In addition to Fungi, ITS may also be useful as a barcode for other organisms. Its utility is already demonstrated in Chlorophyta and certain medicinal plants (61, 62) and the fungus-like Oomycota (43). The possibility of multi-kingdom analyses of complex ecosystems such as soil, using the species-informative, stable, high-copy-number ITS, mirrors the original vision of DNA barcoding and seems feasible.

Protein-coding genes are very popular phylogenetic markers in mycology, and are used as de facto barcodes of limited taxonomic scope in several groups. We chose RPB1 as a representative marker to include in our broad comparisons, with RPB2 and MCM7 analyzed for a smaller sample. In general, such protein-coding genes had more species-resolving power, but PCR and sequencing failures eliminate them as potential universal barcodes for the broad phylogenetic scope of the Kingdom Fungi. Reliable, kingdom-wide PCR amplification seems unlikely for other widely employed protein-coding markers, such as translation elongation factor 1-α (TEF1), β-tubulin (TUB1), or actin (ACT1).

The possibility of a two-marker barcoding system for fungi, as adopted for plants, is often discussed amongst mycologists, in particular researchers working on yeasts (ref.) and Glomeromycota, who both prefer a system combining ITS and LSU (ref.).
The dual role of the ITS and LSU is already well utilised in studies of fungal diversity in environmental samples (22-24, 63), where tandem amplification allows simultaneous species identification with the ITS and phylogenetic analysis with the LSU. Our analyses with two-, three- or four-gene barcode systems (Fig. 2) reveal only a modest increase in the Probability of Correct Identification over a single-marker ITS barcode. The need for a second marker depends on the intended purpose of an experiment, i.e. whether a broad and general survey is intended, or whether particular critical species are being monitored. If these are taxa with low ITS interspecific variability, secondary markers must be employed in order not to underreport genetic diversity (64). Genome mining efforts have identified a few single-copy genes that might be amenable to broad-range priming, and these efforts will continue (65, 66). However, for taxa where ITS is too variable or inordinately invariant, standardization of supplementary barcode markers across the broadest possible clades will be required (67).

The recent discovery of a ubiquitous fungal class from soil (68), as well as a novel and diverse early diverging lineage tied to Rozella (69) from a freshwater pond on a university campus, highlights the fact that the majority of fungal diversity awaits discovery. It also suggests that direct barcoding of fungal DNA from environmental samples will play a critical role in the future of fungal taxonomy. Continued discovery will require DNA databases tied to reliable sequences and documented vouchers. Fungal barcoding will be central to this.

Legends

Figure 1. Dendrogram of 17 fungal lineages sampled in this study showing consensus relationships and sampling. Relationships with high levels of uncertainty are indicated with stippled lines. All lineages are labeled and listed together with a predicted number of species (rounded).
Two possible nodes for delineating Fungi are indicated with an ‘F’. The phyla Ascomycota and Basidiomycota are indicated with an ‘A’ and a ‘B’, respectively. Grey bars to the left indicate numbers of strains in the barcode database, with the longest bar equal to 1176 strains. Black bars indicate the proportions selected for a PCI analysis. The four data sets analysed for PCI are numbered as follows: 1) Pezizomycotina, 2) Saccharomycotina, 3) Basidiomycota, 4) basal lineages. Pie charts indicate success ratios of attempts to PCR-amplify the four marker genes in the following order: ITS, LSU, SSU, RPB1. Black indicates successful PCRs and sequences, grey indicates uncertain cases where no report was given, and white indicates reported PCR failures.

Figure 2. Monophyletic probability of identification for the four marker data sets. The plots show the combinations of barcode markers investigated on the Y-axis, with the following abbreviations: I (ITS), L (LSU), S (SSU) and R (RPB1). The X-axis shows the monophyletic PCI estimate for (a) Ascomycota, Pezizomycotina (142 species), (b) Basidiomycota (43 species), (c) Ascomycota, Saccharomycotina (13 species), (d) basal lineages (8 species), and (e) combined groups (a)-(d) (206 species). The error bars indicate 95% confidence intervals for the PCI estimate.

Figure 3. Barcode gap analysis using distance histograms for each marker. Histograms display intraspecific variation in grey and interspecific variation in black. Inset table summarizes distance data.

References

1. Hebert PDN, Cywinska A, Ball SL, & DeWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B-Biological Sciences 270:313-321.
2. Hebert PDN, Ratnasingham S, & deWaard JR (2003) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. P Roy Soc Lond B Bio 270:S96-S99.
3. Letourneau A, Seena S, Marvanova L, & Barlocher F (2010) Potential use of barcoding to identify aquatic hyphomycetes. Fungal Divers 40:51-64.
4. Schindel DE & Miller SE (2005) DNA barcoding a useful tool for taxonomists. Nature 435:17.
5. Hollingsworth PM, et al. (2009) A DNA barcode for land plants. P Natl Acad Sci USA 106:12794-12797.
6. Kress WJ, et al. (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. P Natl Acad Sci USA 106:18621-18626.
7. Seifert KA, et al. (2007) Prospects for fungus identification using CO1 DNA barcodes, with Penicillium as a test case. P Natl Acad Sci USA 104:3901-3906.
8. Dentinger BTM, Didukh M, & Moncalvo JM (in press) Evaluating COI as a DNA barcode marker for mushrooms and allies (Agaricomycotina). PLoS ONE.
9. Gilmore SR, Grafenhan T, Louis-Seize G, & Seifert KA (2009) Multiple copies of cytochrome oxidase 1 in species of the fungal genus Fusarium. Mol Ecol Res 9:90-98.
10. Rossman AY (2007) Report of the planning workshop for all fungi DNA Barcoding. Inoculum 58:15.
11. Seifert KA (2008) The all-fungi barcoding campaign (FunBOL). Persoonia 20:106.
12. Seifert KA (2009) Progress towards DNA barcoding of fungi. Mol Ecol Res 9:83-89.
13. Bullerwell CE & Lang BF (2005) Fungal evolution: the case of the vanishing mitochondrion. Curr Opin Microbiol 8:362-369.
14. Begerow D, Nilsson H, Unterseher M, & Maier W (2010) Current state and perspectives of fungal DNA barcoding and rapid identification procedures. Appl Microbiol Biotechnol 87:99-108.
15. Eberhardt U (2010) A constructive step towards selecting a DNA barcode for fungi. New Phytol 187:265-268.
16. Schoch CL, et al. (2009) The Ascomycota Tree of Life: A Phylum-wide Phylogeny Clarifies the Origin and Evolution of Fundamental Reproductive and Ecological Traits. Syst Biol 58:224-239.
17. Stackebrandt E & Goebel BM (1994) Taxonomic Note: A Place for DNA-DNA Reassociation and 16S rRNA Sequence Analysis in the Present Species Definition in Bacteriology. Int J Syst Evol Microbiol 44:846-849.
18. Fell JW, Boekhout T, Fonseca A, Scorzetti G, & Statzell-Tallman A (2000) Biodiversity and systematics of basidiomycetous yeasts as determined by large-subunit rDNA D1/D2 domain sequence analysis. Int J Syst Evol Microbiol 50:1351-1371.
19. Kurtzman CP & Robnett CJ (1998) Identification and phylogeny of ascomycetous yeasts from analysis of nuclear large subunit (26S) ribosomal DNA partial sequences. Anton Leeuw Int J G 73:331-371.
20. Scorzetti G, Fell JW, Fonseca A, & Statzell-Tallman A (2002) Systematics of basidiomycetous yeasts: a comparison of large subunit D1/D2 and internal transcribed spacer rDNA regions. FEMS Yeast Res 2:495-517.
21. Koljalg U, et al. (2005) UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol 166:1063-1068.
22. Buee M, et al. (2009) 454 Pyrosequencing analyses of forest soils reveal an unexpectedly high fungal diversity. New Phytol 184:449-456.
23. Jumpponen A & Jones KL (2009) Massively parallel 454 sequencing indicates hyperdiverse fungal communities in temperate Quercus macrocarpa phyllosphere. New Phytol 184:438-448.
24. Opik M, Metsis M, Daniell TJ, Zobel M, & Moora M (2009) Large-scale parallel 454 sequencing reveals host ecological group specificity of arbuscular mycorrhizal fungi in a boreonemoral forest. New Phytol 184:424-437.
25. O'Donnell K, et al. (2010) Internet-accessible DNA sequence database for identifying fusaria from human and animal infections. J Clin Microbiol 48:3708-3718.
26. Frisvad JC & Samson RA (2004) Polyphasic taxonomy of Penicillium subgenus Penicillium - A guide to identification of food and air-borne terverticillate Penicillia and their mycotoxins. Stud Mycol:1-173.
27. Tanabe Y, Watanabe MM, & Sugiyama J (2003) Are Microsporidia really related to Fungi? (vol 106, pg 1380, 2002). Mycol Res 107:511-511.
28. Cheney SA, Lafranchi-Tristem NJ, Bourges D, & Canning EU (2001) Relationships of microsporidian genera, with emphasis on the polysporous genera, revealed by sequences of the largest subunit of RNA polymerase II (RPB1). J Eukaryot Microbiol 48:111-117.
29. Garnica S, Weiss M, Oertel B, Ammirati J, & Oberwinkler F (2009) Phylogenetic relationships in Cortinarius, section Calochroi, inferred from nuclear DNA sequences. BMC Evol Biol 9:-.
30. Liu YJJ, Hodson MC, & Hall BD (2006) Loss of the flagellum happened only once in the fungal lineage: phylogenetic structure of Kingdom Fungi inferred from RNA polymerase II subunit genes. BMC Evol Biol 6:-.
31. Matheny PB, Liu YJJ, Ammirati JF, & Hall BD (2002) Using RPB1 sequences to improve phylogenetic inference among mushrooms (Inocybe, Agaricales). Am J Bot 89:688-698.
32. Tanabe Y, Saikawa M, Watanabe MM, & Sugiyama J (2004) Molecular phylogeny of Zygomycota based on EF-1 alpha and RPB1 sequences: limitations and utility of alternative markers to rDNA. Mol Phylogenet Evol 30:438-449.
33. Longet D & Pawlowski J (2007) Higher-level phylogeny of Foraminifera inferred from the RNA polymerase II (RPB1) gene. Eur J Protistol 43:171-177.
34. McLaughlin DJ, Hibbett DS, Lutzoni F, Spatafora JW, & Vilgalys R (2009) The search for the fungal tree of life. Trends Microbiol 17:488-497.
35. James TY, et al. (2006) Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature 443:818-822.
36. Aguileta G, et al. (2008) Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies. Syst Biol 57:613-627.
37. Schmitt I, et al. (2009) New primers for promising single-copy genes in fungal phylogenetics and systematics. Persoonia 23:35-40.
38. Robert VA, et al. (2011) BioloMICS Software: Biological Data Management, Identification, Classification and Statistics. Open Appl Inform J 5:87-98.
39. Altschul SF (1999) Hot papers - Bioinformatics - Gapped BLAST and PSI-BLAST: a new generation of protein database search programs by S.F. Altschul, T.L. Madden, A.A. Schaffer, J.H. Zhang, Z. Zhang, W. Miller, D.J. Lipman - Comments. Scientist 13:15.
40. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
41. Needleman SB & Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443-453.
42. Wilson EB (1927) Probable inference, the law of succession, and statistical inference. J Am Stat Assoc 22:209-212.
43. Robideau G, et al. (2011) DNA barcoding of oomycetes with cytochrome c oxidase subunit I and internal transcribed spacer. Mol Ecol Res.
44. Schloss PD, et al. (2009) Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol 75:7537-7541.
45. Voigt K & Kirk PM (2011) Recent developments in the taxonomic affiliation and phylogenetic positioning of fungi: impact in applied microbiology and environmental biotechnology. Appl Microbiol Biot 90:41-57.
46. Lee SC, et al. (2010) Evolution of the Sex-Related Locus and Genomic Features Shared in Microsporidia and Fungi. PLoS ONE 5:-.
47. Anonymous. Guidelines for CBOL Approval of Non-COI Barcode Regions. http://www.barcoding.si.edu/pdf/guidelines%20for%20non-co1%20selection%20final.pdf.
48. Schubert K, et al. (2007) Biodiversity in the Cladosporium herbarum complex (Davidiellaceae, Capnodiales), with standardisation of methods for Cladosporium taxonomy and diagnostics. Stud Mycol 58:105-156.
49. Skouboe P, et al. (1999) Phylogenetic analysis of nucleotide sequences from the ITS region of terverticillate Penicillium species. Mycol Res 103:873-881.
50. O'Donnell K & Cigelnik E (1997) Two divergent intragenomic rDNA ITS2 types within a monophyletic lineage of the fungus Fusarium are nonorthologous. Mol Phylogenet Evol 7:103-116.
51. Printzen C, Ekman S, & Tonsberg T (2003) Phylogeography of Cavernularia hultenii: evidence of slow genetic drift in a widely disjunct lichen. Mol Ecol 12:1473-1486.
52. Gomes EA, Kasuya MCM, de Barros EG, Borges AC, & Araujo EF (2002) Polymorphism in the internal transcribed spacer (ITS) of the ribosomal DNA of 26 isolates of ectomycorrhizal fungi. Genet Mol Biol 25:477-483.
53. Hijri M, Hosny M, van Tuinen D, & Dulieu H (1999) Intraspecific ITS polymorphism in Scutellospora castanea (Glomales, Zygomycota) is structured within multinucleate spores. Fungal Genet Biol 26:141-151.
54. Simon UK & Weiss M (2008) Intragenomic variation of fungal ribosomal genes is higher than previously thought. Mol Biol Evol 25:2251-2254.
55. Smith ME, Douhan GW, & Rizzo DM (2007) Intra-specific and intra-sporocarp ITS variation of ectomycorrhizal fungi as assessed by rDNA sequencing of sporocarps and pooled ectomycorrhizal roots from a Quercus woodland. Mycorrhiza 18:15-22.
56. Lindner DL & Banik MT (2011) Intragenomic variation in the ITS rDNA region obscures phylogenetic relationships and inflates estimates of operational taxonomic units in genus Laetiporus. Mycologia 103:731-740.
57. Moncalvo JM, et al. (2006) The cantharelloid clade: dealing with incongruent gene trees and phylogenetic reconstruction methods. Mycologia 98:937-948.
58. Xie J, et al. (2008) Intergeneric transfer of ribosomal genes between two fungi. BMC Evol Biol 8:87.
59. Selker EU (2002) Repeat-induced gene silencing in fungi. Adv Genet 46:439-450.
60. Rouxel T, et al. (2011) Effector diversification within compartments of the Leptosphaeria maculans genome affected by Repeat-Induced Point mutations. Nat Commun 2:202.
61. Buchheim MA, et al. (2011) Internal Transcribed Spacer 2 (nu ITS2 rRNA) Sequence-Structure Phylogenetics: Towards an Automated Reconstruction of the Green Algal Tree of Life. PLoS ONE 6:-.
62. Chen SL, et al. (2010) Validation of the ITS2 Region as a Novel DNA Barcode for Identifying Medicinal Plant Species. PLoS ONE 5:-.
63. Nagy LG, et al. (2011) Where is the unseen fungal diversity hidden? A study of Mortierella reveals a large contribution of reference collections to the identification of fungal environmental sequences. New Phytol:no-no.
64. Gazis R, Rehner S, & Chaverri P (2011) Species delimitation in fungal endophyte diversity studies and its implications in ecological and biogeographic inferences. Mol Ecol: in press.
65. Lewis CA, et al. (2011) Identification of fungal DNA barcode targets and PCR primers based on Pfam protein families and taxonomic hierarchy. Open Appl Inform J 5:30-44.
66. Robert V, et al. (2011) The quest for a general and reliable fungal DNA barcode. Open Appl Inform J 5:45-61.
67. Seifert KA (2009) Integrating DNA barcoding into the mycological sciences. Persoonia 21:162-166.
68. Rosling A, et al. (2011) Archaeorhizomycetes: Unearthing an Ancient Class of Ubiquitous Soil Fungi. Science 333:876-879.
69. Richards TA, et al. (2011) Discovery of novel intermediate forms redefines the fungal tree of life. Nature 474:200-U234.
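
Supplementary sketch. The Figure 2 legend reports 95% confidence intervals for the PCI estimate, and the reference list cites Wilson (1927). The Python sketch below shows one way such an interval could be computed, assuming the PCI is treated as a simple binomial proportion (species correctly identified out of species tested) and that a Wilson score interval is used. The function name `wilson_interval` and the example counts are hypothetical illustrations, not values or code from this study.

```python
import math


def wilson_interval(successes, trials, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    successes: number of species counted as correctly identified (assumed)
    trials:    number of species tested (assumed)
    z:         standard normal quantile; the default gives a 95% interval
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials))
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical example: 100 of 142 species resolved by a single marker.
    low, high = wilson_interval(100, 142)
    print(f"PCI = {100 / 142:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Unlike the simple normal approximation, the Wilson interval stays within 0 and 1 and behaves sensibly for small samples, which matters for groups represented by only a handful of species.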