Latest version 06/29

The internal transcribed spacer as a universal DNA barcode marker for Fungi
Fungal Barcoding Consortium
Conrad L. Schoch1, Keith A. Seifert2, Sabine Huhndorf3, John L. Spouge1, Vincent Robert4, Elena
Bolchacova5, Kerstin Voigt6, Wen Chen3, Pedro W. Crous4, Andrew N. Miller7, Michael J. Wingfield8,
Gen Okada9, M. Suzuki9, Sarah Hambleton3, André Levesque3, J. Otte10, Imke Schmitt10, Nattawut
Boonyuen11, E.B. Gareth Jones11, Satinee Suetrong11, Eric Tretter12, Merlin M. White12, Filip
Högnabba13, Soili Stenroos13, Ferry Hagen4, Ursula Eberhardt4, Willem Quaedvlieg4, Teun Boekhout4,
Ulrike Damm4, Sybren De Hoog4, Johannes Z. Groenewald4, Marizeth Groenewald4, G. Walther4, V.
Duong4, Arthur Schüßler14, C. Qing15, Z.-L. Yang15, Mesfin Bogale16, Wendy A. Untereiner16, H.
Maganti17, J.P. Xu17, S.D. Leavitt3, H. Thorsten Lumbsch3, Karen Hansen18, I. Olariaga18, T.A. Duong19, Z.
Wilhelm De Beer8, R. Henrik Nilsson20, G. Cardinali21, Ana R. Burgaz22, Anna Crespo22, Ruth Del-Prado22,
Pradeep K. Divakar22, Constantino Ruibal22, K. Sotome23, Seppo Huhtinen24, Katarina Fliegerova25, B.
Douglas26, Gareth W. Griffith26, K.-D. An27, Peter R. Johnston28, D. Park28, Bevan S. Weir28, Meredith
Blackwell29, Hector Urbina29, M. Catherine Aime30, G. Heller30, A. McTaggert30, Kevin D. Hyde31, Cletus
P. Kurtzman32, Jennifer J. Luangsa-ard33, S. Mongkolsamrit33, Kentaro Hosaka34, Leho Tedersoo35,
Marie-Josée Bergeron36, Richard C. Hamelin36, Agathe Vialle36, Izumi Okane37, Kare Liimatainen38,
Tuula Niskanen38, Javier Diéguez-Uribeondo39, M. Dueñas39, M.A. García39, María P. Martin39, Raquel
Pino-Bodas39, J.M. Sarmiento-Ramírez39, M.T. Telleria39, J.C. Zamora39, Brian J. Coppins40, Peter
Harrold40, Peter Hollingsworth40, Laura J. Kelly40, Rebecca Yahr40, K. Griffiths41, T. May41, Frank O.P.
Stefani41, Andrey Yurkov42, Dominik Begerow42, Feng-Yan Bai43, Lei Cai43, Liang-Dong Guo43, Huzefa A.
Raja44, Dirk Redecker45, Herbert Stockinger45, Carol Shearer46, László G. Nagy46, I. Nyilasi46, Tamás
Papp46, Tamás Petkovits46, Csaba Vágvölgyi46, Urmas Kõljalg47, Roberto Barreto?, Bart
Buyck?, Priscilla Chaverri?, Bryn Dentinger?, M.S. Elshahed?, Zai-Wei Ge?, Marieka Gryzenhout?, H.-M.
Ho?, Valerie Hofstetter?, S.-B. Hong?, Jos Houbraken?, Karen Hughes?, Timothy James?, E. Johnson?,
Paul Kirk?, Gábor M. Kovacs?, Sara Landvik?, Audra S. Liggenstoffer?, Lorenzo Lombard?, Wieland
Meyer?, Jean-Marc Moncalvo?, T. Rintoul?, Sung-Oui Suh?, Kazuaki Tanaka?, D. Vu?, Y. Wang?, Michael
Weiß?, Ning Zhang?, Wen-Ying Zhuang? and David Schindel?
Author Affiliations
1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, MSC 6510, Bethesda, Maryland 20892-6510, U.S.A.
2 Biodiversity (Mycology and Microbiology), Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, Ontario, Canada
3 Department of Botany, The Field Museum, 1400 S. Lake Shore Drive, Chicago, IL 60605, USA
4 CBS-KNAW Fungal Biodiversity Centre, P.O. Box 85167, 3508 AD Utrecht, The Netherlands
5 LifeTech, Foster City, CA, USA
6 F. Schiller University, Institute of Microbiology, Jena, Germany
7 University of Illinois, Illinois Natural History Survey, 1909 South Oak Street, Champaign, IL 61820, USA
8 Department of Microbiology and Plant Pathology, Forestry & Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0001, South Africa
9 Japan Collection of Microorganisms, RIKEN BioResource Center, Wako, Saitama 351-0198, Japan
10 Biodiversity and Climate Research Center (BiK-F), Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, D-60325 Frankfurt (Main), Germany
11 BIOTEC, NSTDA, 113 Paholyothin Road, Pathum Thani 12120, Thailand
12 Boise State University, Department of Biological Sciences, 1910 University Dr., Boise, Idaho 83725, USA
13 Botanical Museum, Finnish Museum of Natural History, FI-00014 University of Helsinki, Finland
14 Department of Biology, Biocenter of the Ludwig-Maximilians-University Munich, Martinsried, Germany
15 Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
16 Department of Biology, Brandon University, Brandon, Manitoba, Canada R7A 6A9
17 Department of Biology, McMaster University, Hamilton, Ontario, Canada
18 Department of Cryptogamic Botany, Swedish Museum of Natural History, P.O. Box 50007, SE-104 05 Stockholm, Sweden
19 Department of Genetics, Forestry & Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0001, South Africa
20 Department of Plant and Environmental Sciences, University of Gothenburg, Box 461, Gothenburg, Sweden
21 Dipartimento Biologia Applicata - Microbiologia, Università degli Studi di Perugia, Perugia, Italy
22 Dpto. Biología Vegetal II, Facultad de Farmacia, Universidad Complutense de Madrid, 28040 Madrid, Spain
23 Fungus/Mushroom Resource and Research Center, Tottori University, Japan
24 Herbarium, University of Turku, FI-20014 University of Turku, Finland
25 Institute of Animal Physiology and Genetics, Czech Academy of Sciences, Prague, Czech Republic
26 Institute of Biological, Environmental and Rural Sciences, Prifysgol Aberystwyth, Aberystwyth, Ceredigion, Wales SY23 3DD, UK
27 Japan Collection of Microorganisms, RIKEN BioResource Center, Wako, Saitama 351-0198, Japan
28 Landcare Research, Private Bag 92170, Auckland 1142, New Zealand
29 Louisiana State University, 380 Life Sciences Building, Department of Biological Sciences, Baton Rouge, Louisiana 70803, USA
30 Louisiana State University, Louisiana State University Agricultural Center, Department of Biological Sciences, Baton Rouge, Louisiana 70803, USA
31 Mae Fah Luang University, 57100 Chiang Rai, Thailand & Botany and Microbiology Department, College of Science, King Saud University, P.O. Box 2455, Riyadh 1145, Saudi Arabia
32 National Center for Agricultural Utilization Research, ARS, USDA, 1815 N. University St., Peoria, IL, USA
33 National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand
34 National Museum of Nature and Science, Tsukuba, Japan
35 Natural History Museum of Tartu University, Estonia
36 Natural Resources Canada, Department of Forest Sciences, Faculty of Forestry, The University of British Columbia, 3rd Floor, 2424 Main Mall, Vancouver, British Columbia, Canada
37 NITE Biological Resource Center (NBRC), National Institute of Technology and Evaluation, Kisarazu, Chiba 292-0818, Japan
38 Plant Biology, Department of Biosciences, P.O. Box 65, 00014 University of Helsinki, Finland
39 Real Jardín Botánico, CSIC, Madrid, Spain
40 Royal Botanic Garden Edinburgh, Edinburgh, EH3 5LR, UK
41 Royal Botanic Gardens Melbourne, Birdwood Avenue, South Yarra, VIC 3141, Australia
42 Ruhr-Universität Bochum, Geobotanik ND03/169, Universitätstr. 150, 44801 Bochum, Germany
43 State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Bei-Chen-Xi Road, Chao Yang District, Beijing, 100101, China
44 The University of North Carolina at Greensboro, Department of Chemistry and Biochemistry, 457 Sullivan Science Building, P.O. Box 26170, Greensboro, NC 27402-6170, USA
45 UMR Microbiologie du Sol et de l'Environnement, INRA/Université de Bourgogne, BP 86510, 17 rue Sully, 21065 Dijon cedex, France
46 University of Illinois, Department of Plant Biology, 505 South Goodwin Avenue, Urbana, IL 61801, USA
47 University of Tartu, Estonia
Abstract
Six genes were evaluated in a multi-laboratory, multi-national consortium as potential DNA barcodes for the Fungi, the second largest Kingdom of eukaryotic life. Cytochrome c oxidase 1 (CO1), the animal barcode, was excluded as a potential barcode because it is difficult to amplify in fungi, often includes large introns, and can be insufficiently variable. Three regions from the nuclear ribosomal RNA cistron were compared, along with three representative protein-coding genes, RPB1, RPB2 and MCM7. Although the protein-coding genes often had a higher percentage of correct identification than the ribosomal markers, low PCR and sequencing success eliminates them as candidates for a universal fungal barcode. The ribosomal small subunit (SSU) has poor species-level resolution in fungi. Of the regions of the ribosomal cistron, the internal transcribed spacer (ITS) has the highest probability of correct identification (72%) across the broadest range of fungi, with the most clearly defined barcode gap between inter- and infraspecific variation. The ribosomal large subunit (LSU), a popular phylogenetic marker, had superior species resolution in some taxonomic groups, such as the basal fungal lineages and the yeasts, but was otherwise slightly inferior to the ITS. We propose that ITS be formally adopted by the Consortium for the Barcode of Life as the first fungal barcode marker, with the possibility that supplementary barcodes may be proposed for narrow taxonomic groups.
Introduction
The absence of an accepted DNA barcode for Fungi, the second most speciose eukaryotic Kingdom, is a serious limitation for multi-taxon ecological and biodiversity studies. DNA barcoding uses standardized 500-700 base pair (bp) sequences, amplified with primers that are universal across the broadest possible taxonomic group, to identify species of all kingdoms. Reference barcodes must have expertly identified vouchers deposited in biological collections with online metadata, and must be validated by online sequence chromatograms. Interspecific variation should exceed infraspecific variation (the ‘barcode gap’), and the process is optimal when a sequence is constant and unique to one species (1-3). Ideally, the barcode locus would be the same for all kingdoms. The mitochondrial gene cytochrome c oxidase 1 (CO1, or cox1) is the barcode for animals (1, 2) and is the default marker adopted by the Consortium for the Barcode of Life (CBOL) for all groups of organisms, including fungi (4). In plants, CO1 has limited value for differentiating species, and a two-gene system was adopted (5, 6), based on portions of the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene and a maturase-encoding gene within the intron of the chloroplast trnK gene (matK). This sets a precedent for reconsidering CO1 as the default fungal barcode.
CO1 functions well in some fungal groups such as Penicillium, with reliable primers and adequate species resolution (67% in this young lineage) (7), but results in the few other groups examined experimentally are inconsistent, and cloning is often required (8). Degenerate primers applicable to many Ascomycota exist (9), but they are difficult to assess because amplification failures may not reflect priming mismatches. Multiple introns cause extreme length variation (7, 10-12); these introns are not consistently present in any one species, multiple copies of different lengths and variable sequence occur, and in some groups identical sequences are shared by several species (9). Some fungal clades, such as the Neocallimastigomycota, a basal lineage of anaerobic, zoosporic gut fungi, lack mitochondria (13). Finally, because most fungi are microscopic and invisible without optical or molecular magnification, robust, universal primers must be available to detect a truly representative profile. This appears impossible with CO1.
For more than 20 years, the nuclear ribosomal RNA cistron has been used for fungal diagnostics and phylogenetics (14), and its components are the most frequently discussed alternatives to CO1 (10, 11, 15, 16). The eukaryotic rRNA cistron consists of the 18S, 5.8S and 28S rRNA genes, transcribed as a unit by RNA polymerase I. Post-transcriptional processing cleaves this transcript, removing two internal transcribed spacers. These two spacers, together with the 5.8S gene between them, are usually called the ITS. The 18S nuclear ribosomal small subunit (SSU) is commonly used in phylogenetics, and although its homolog (16S) is often used as a species diagnostic for bacteria (17), the fungal SSU has fewer hypervariable domains. The 28S nuclear ribosomal large subunit (LSU) sometimes discriminates species on its own or in combination with ITS. For yeasts, the D1/D2 region of LSU was adopted for characterizing species long before the concept of DNA barcoding was promoted (18-20).
ITS is the most frequently sequenced fungal genetic marker and is used for species identification in many
fungal lineages where it already functions as a de facto DNA barcode (12, 14, 15, 21). Currently, ~172,000
full-length fungal ITS sequences are in GenBank, 56% identified with a Latin binomial, representing
~15,500 species and 2,500 genera, derived from ~11,500 scientific studies in ~500 journals (H. Nilsson,
pers. comm.). An important part of these data are sequences from environmental samples (22-24).
Protein-coding genes are widely used in mycology for higher-level phylogenies and species diagnostics. For Ascomycota, protein-coding genes are superior to ribosomal genes for both purposes (16). Specialized identification databases employ several markers, e.g. translation elongation factor 1-α for Fusarium (25) and β-tubulin for Penicillium (26), but there is little standardization between groups. Available primers for such markers usually amplify a narrow taxonomic range. Among protein-coding genes, the largest subunit of RNA polymerase II (RPB1) may have potential as a fungal barcode; it is ubiquitous, single-copy and has a slow rate of sequence divergence (27). Its phylogenetic utility has been demonstrated in studies of Basidiomycota, Zygomycota and Microsporidia (28-32) and protists (33). RPB1 primers were developed for the Assembling the Fungal Tree of Life project (AFToL), and the locus is included in the subsequent AFToL2 project (aftol.org/about.php; 34). However, its utility as a barcode remains untested.
This paper results from a multi-laboratory, multi-national initiative to establish a standard DNA barcode for Fungi. Using probability of correct identification (PCI) and barcode gap analyses, we compared the barcoding performance of the three nuclear ribosomal regions (ITS, LSU and SSU) and one representative protein-coding gene, RPB1, based on newly generated sequences for 742 specimens or strains representing the 17 major fungal lineages (Fig. 1). Contributors used standard primers and protocols developed by AFToL and submitted sequences to a custom-built database for analysis. Some also contributed sequences of two optional genes, chosen for their usefulness in phylogenetic studies and ease of amplification across Ascomycota (36, 37): the second largest subunit of RNA polymerase II (RPB2), also an AFToL marker (35), and a gene encoding a mini-chromosome maintenance protein (MCM7).
Materials and Methods
DNA isolation, amplification and sequencing
DNA was isolated and purified from cultures or specimens using the methods routinely employed by the participating laboratories. Similarly, PCR protocols (Table S1) and thermocyclers varied from laboratory to laboratory. PCR primers were based on those used in the AFToL projects (Table S1). Many samples were sent by contributors for PCR amplification and sequencing at LifeTech (Foster City, CA). For PCR at LifeTech, 1-2 μl of fungal DNA was amplified in a final volume of 30 μl with 15 μl AmpliTaq Gold® 360 Master Mix, PCR primers and water. Forward primers contained the M13-20F sequencing primer sequence and reverse primers contained the M13R-27 sequencing primer sequence. Before cycle sequencing, PCR products (3 μl) were enzymatically cleaned with 1 μl of ExoSAP-IT® and 1 μl TE buffer, incubated at 37 °C for 20 min, followed by 80 °C for 15 min. Cycle sequencing reactions contained 5 μl of cleaned PCR product, 2 μl of BigDye® Terminator v3.1 Ready Reaction Mix, 1 μl of 5x Sequencing Buffer, 1.6 pmol of M13F or M13R sequencing primer and water in a final volume of 10 μl. The standard cycle sequencing protocol was 27 cycles of 10 s at 96 °C, 5 s at 50 °C and 4 min at 60 °C, followed by a hold at 4 °C. Sequencing reactions were cleaned up with the BigDye XTerminator® Purification Kit as recommended by the manufacturer for 10 μl volumes and analysed on a 3730xl Genetic Analyser.
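The reaction volumes and cycling program above can be sanity-checked with simple arithmetic. The Python sketch below is illustrative only: the primer stock concentration (3.2 μM) is a hypothetical value, because the protocol specifies only the amount of primer (1.6 pmol), and the timing ignores ramp rates and the final 4 °C hold.

```python
"""Bookkeeping for the cycle-sequencing step described above.

Volumes and times come from the protocol text; PRIMER_STOCK_UM is a
hypothetical value chosen only for illustration.
"""

SEQ_FINAL_VOL_UL = 10.0
SEQ_FIXED_UL = {"cleaned_pcr_product": 5.0, "bigdye_v3_1_mix": 2.0, "5x_seq_buffer": 1.0}
PRIMER_PMOL = 1.6
PRIMER_STOCK_UM = 3.2  # hypothetical stock: 3.2 uM = 3.2 pmol per ul

# One cycle: 10 s at 96 C, 5 s at 50 C, 4 min at 60 C; 27 cycles in total.
CYCLE_STEPS_S = (10, 5, 4 * 60)
N_CYCLES = 27


def water_volume_ul(final_ul: float, fixed: dict, primer_pmol: float, stock_um: float) -> float:
    """Water needed to bring the reaction up to its final volume."""
    primer_ul = primer_pmol / stock_um  # a uM concentration is numerically pmol/ul
    water = final_ul - sum(fixed.values()) - primer_ul
    if water < 0:
        raise ValueError("components exceed the final reaction volume")
    return water


def cycling_hold_time_s(steps, n_cycles: int) -> int:
    """Total time at temperature, ignoring ramp times and the final 4 C hold."""
    return n_cycles * sum(steps)


if __name__ == "__main__":
    water = water_volume_ul(SEQ_FINAL_VOL_UL, SEQ_FIXED_UL, PRIMER_PMOL, PRIMER_STOCK_UM)
    total_s = cycling_hold_time_s(CYCLE_STEPS_S, N_CYCLES)
    print(f"water per reaction: {water:.1f} ul")
    print(f"time at temperature: {total_s} s ({total_s / 60:.2f} min)")
```

Under these assumptions the script reports 1.5 μl of water per sequencing reaction and 6885 s (about 115 min) at temperature for the 27 cycles; a different primer stock changes only the water volume.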
PCR Success. Participants recorded the success of PCR amplification and sequencing for the genes and taxa they contributed to this study (Fig. 1). They also documented specific problems with PCR, the quality of PCR amplification, primer problems (PCR and sequencing), and whether cloning was required. The genes were also ranked for their ability to discriminate species and for their overall taxonomic and phylogenetic utility in specialized taxonomic groups. Comments were parsed to identify taxon-specific problems and summarised in the supplemental data (Fig. S5).
Database
A query-based BioloMICS database (38) was established for 3256 strains (X species) provided by >70
members of the consortium (www.fungalbarcoding.org). The data are based on deposited voucher
specimens or cultures identified by taxonomic specialists. The database allows pairwise sequence
alignments or polyphasic identifications using one or any combination of the six genes used in this study.
The taxon sampling covered fifteen of the seventeen major lineages attributed to the true Fungi (Fig. 1),
weighted towards species-rich higher taxa such as the Pezizomycotina (the largest group of Ascomycota)
and the Agaricomycotina (mushrooms and other macro-Basidiomycota).
Data Analyses
Sampling. Closely related but separately named asexual and sexual species were coded under one genus name, and the data were then divided into subsets to allow a taxonomically targeted assessment of markers for each major clade (Fig. 1). From the barcoding database of 3256 samples, we selected a subset of 742 strains with sequences for all four markers (ITS, LSU, SSU and RPB1). This was divided into four taxonomically delimited data sets: 416 strains in Pezizomycotina (filamentous ascomycetes), 81 in Saccharomycotina (ascomycetous yeasts), 202 in Basidiomycota and 43 strains from the combined, polyphyletic basal lineages. Two additional three-marker analyses were performed to improve evaluation of certain lineages that were undersampled in the four-marker data set: the first for 683 strains of Pezizomycotina with ITS, LSU and RPB1 sequences, and the second for 152 representatives of basal lineages with ITS, LSU and SSU sequences. Finally, a six-marker comparison was made for a selection of 207 strains of Pezizomycotina, Basidiomycota and Saccharomycotina, with the first four markers supplemented by the two optional markers, MCM7 and RPB2. The species and strains used in the analysis are shown in Table S2.
Probability of Correct Identification. For each data set, we calculated the probability of correct species identification (PCI). All alignments used the BLAST default DNA scoring system (39, 40). Two kinds of sequence alignment were calculated between every sample pair: a) a global alignment using the Needleman-Wunsch algorithm, which aligns the entire length of both sequences and penalizes gaps at the alignment ends (41); and b) a semi-global alignment, using a variant of the Needleman-Wunsch algorithm that includes both ends of one sequence and finds the highest-scoring alignment without penalizing end gaps in the other sequence. The algorithm then repeats this with the roles of the two sequences reversed and returns the alignment with the higher of the two scores. Thus, the global alignment matches the whole length of two sequences, whereas the semi-global alignment matches one sequence to a subsequence of the other, and then vice versa. Semi-global alignment checks whether disparate sequence lengths degrade species identification; if they do not, global and semi-global alignment should result in similar identifications. For both types of alignment, the p-distance (the proportion of aligned nucleotide pairs consisting of differing nucleotides) was calculated. The ‘sequence diameter’ of a species is defined as the greatest p-distance between any two samples within that species. A species is ‘correctly identified’ if, for every sample in the species, no sample from another species lies within the sequence diameter. The corresponding ‘probability of correct identification’ (PCI) is the fraction of species correctly identified (5). The Wilson score interval yielded 95% confidence intervals for each PCI estimate (42). PCI was also calculated for all possible combinations of two, three or four genes, to evaluate the potential payoff of a multigene barcoding system.
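The difference between the two alignment modes can be illustrated with a compact Needleman-Wunsch implementation. This is a minimal sketch, not the consortium's code: it assumes a hypothetical linear scoring scheme (match +1, mismatch -2, gap -2) rather than the BLAST default scoring used in the study, which uses affine gap costs. The function names are likewise hypothetical. Setting `free_b_ends=True` leaves gaps that overhang the ends of the second sequence unpenalized, so the whole first sequence is fitted to a subsequence of the second; `semi_global_p_distance` tries both directions and keeps the higher-scoring alignment, and `p_distance` counts differing nucleotides among aligned nucleotide pairs only.

```python
from typing import List, Tuple

# Hypothetical linear scoring for illustration only; BLAST's default
# nucleotide scoring uses affine gap costs and different values.
MATCH, MISMATCH, GAP = 1, -2, -2

Pair = Tuple[str, str]


def _sub(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str, free_b_ends: bool = False) -> Tuple[int, List[Pair]]:
    """Align all of `a` to `b`. With free_b_ends=True, unaligned prefixes and
    suffixes of `b` cost nothing, so `a` is matched to a subsequence of `b`."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * GAP  # the ends of `a` are always penalized
    for j in range(1, m + 1):
        H[0][j] = 0 if free_b_ends else j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(H[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + GAP,
                          H[i][j - 1] + GAP)
    # Global: end in the bottom-right cell. Semi-global: best cell in the last row.
    j = max(range(m + 1), key=lambda c: H[n][c]) if free_b_ends else m
    i, score, pairs = n, H[n][j], []
    while i > 0 or j > 0:
        if i == 0 and free_b_ends:
            break  # the rest of b's prefix is an unpenalized end gap
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + GAP:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return score, pairs


def p_distance(pairs: List[Pair]) -> float:
    """Proportion of aligned nucleotide pairs (gap columns excluded) that differ."""
    compared = [(x, y) for x, y in pairs if x != "-" and y != "-"]
    if not compared:
        raise ValueError("alignment contains no nucleotide pairs")
    return sum(x != y for x, y in compared) / len(compared)


def global_p_distance(a: str, b: str) -> float:
    return p_distance(needleman_wunsch(a, b)[1])


def semi_global_p_distance(a: str, b: str) -> float:
    # Fit a inside b, then b inside a; keep whichever alignment scores higher.
    s_ab, pairs_ab = needleman_wunsch(a, b, free_b_ends=True)
    s_ba, pairs_ba = needleman_wunsch(b, a, free_b_ends=True)
    return p_distance(pairs_ab if s_ab >= s_ba else pairs_ba)


if __name__ == "__main__":
    full = "TTACGTACGTTT"
    partial = "ACGTACGT"  # a shorter read covering the middle of `full`
    print(needleman_wunsch(partial, full)[0])                    # 0
    print(needleman_wunsch(partial, full, free_b_ends=True)[0])  # 8
    print(global_p_distance("ACGT", "ACGA"))                     # 0.25
```

For a short read lying entirely within a longer sequence, the global alignment is charged for the four overhanging bases (score 0 in the example) while the semi-global alignment is not (score 8), which is why comparing the two modes reveals whether unequal sequence lengths affect identification.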
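The sequence-diameter criterion and the Wilson score interval can be sketched as follows. This is a hedged illustration, not the study's implementation: it assumes a precomputed symmetric matrix of p-distances, and it assumes that "lies within the sequence diameter" means a distance less than or equal to the diameter. The helper names (`pci`, `wilson_interval`, `symmetric`) are hypothetical. For a species represented by a single sample the diameter is 0, so that species fails only if another species has an identical sequence.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def pci(dist: Sequence[Sequence[float]], labels: Sequence[str]) -> Tuple[int, int]:
    """Return (number of species correctly identified, total number of species)."""
    members: Dict[str, List[int]] = defaultdict(list)
    for idx, sp in enumerate(labels):
        members[sp].append(idx)
    correct = 0
    for sp, own in members.items():
        # Sequence diameter: greatest p-distance between two samples of the species
        # (0.0 for a species represented by a single sample).
        diameter = max(dist[i][j] for i in own for j in own)
        others = [k for k in range(len(labels)) if labels[k] != sp]
        # Assumed reading of "lies within": a distance <= diameter is a conflict.
        if all(dist[i][k] > diameter for i in own for k in others):
            correct += 1
    return correct, len(members)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def symmetric(n: int, pairs: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Build a full symmetric distance matrix from {(i, j): d} with i != j."""
    d = [[0.0] * n for _ in range(n)]
    for (i, j), v in pairs.items():
        d[i][j] = d[j][i] = v
    return d


if __name__ == "__main__":
    labels = ["A", "A", "B", "B", "C"]
    dist = symmetric(5, {
        (0, 1): 0.01, (2, 3): 0.02,
        (0, 2): 0.05, (0, 3): 0.05, (1, 2): 0.05, (1, 3): 0.05,
        (0, 4): 0.06, (1, 4): 0.06, (2, 4): 0.015, (3, 4): 0.03,
    })
    ok, total = pci(dist, labels)
    lo, hi = wilson_interval(ok, total)
    print(f"PCI = {ok}/{total}, 95% CI {lo:.2f}-{hi:.2f}")
```

In the toy data, species A and C are correctly identified but B is not, because a sample of C (0.015) lies within B's diameter (0.02), giving a PCI of 2/3. With only three species the Wilson interval is very wide, which is consistent with the large margins of error in the smaller data sets.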
Sequence divergence and DNA barcode gap analyses
Using the same data sets as for the PCI analysis, a DNA barcode gap analysis was performed using matrix algebra and SAS (Statistical Analysis System; SAS Institute Inc., Cary, NC, USA) as described previously (43), except that the lower-triangular uncorrected distance matrix was calculated using Mothur (44). The results are shown in Fig. 3. Additional comparisons were done without a matrix and are described in the supplemental data (Fig. S4).
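The barcode gap idea can be made concrete on a small lower-triangular distance matrix. The Python sketch below is not the SAS matrix-algebra procedure used in the study; it is a simplified illustration assuming the matrix is held in memory as a list of rows, where row i stores the distances to samples 0 to i-1 (a hypothetical layout, not a parser for Mothur output files). It pools intraspecific and interspecific distances and, for each species, subtracts the largest intraspecific distance from the distance to the nearest sample of another species.

```python
from typing import Dict, List, Sequence, Tuple

# Lower-triangular matrix: row i holds distances to samples 0..i-1 (row 0 is empty).
LowerTri = Sequence[Sequence[float]]


def _d(lower: LowerTri, i: int, j: int) -> float:
    """Look up a pairwise distance in the lower triangle (0.0 on the diagonal)."""
    if i == j:
        return 0.0
    return lower[max(i, j)][min(i, j)]


def split_distances(lower: LowerTri, labels: Sequence[str]) -> Tuple[List[float], List[float]]:
    """Separate all pairwise distances into intraspecific and interspecific lists."""
    intra: List[float] = []
    inter: List[float] = []
    for i, row in enumerate(lower):
        if len(row) != i:
            raise ValueError(f"row {i} should have {i} entries")
        for j, d in enumerate(row):
            (intra if labels[i] == labels[j] else inter).append(d)
    return intra, inter


def per_species_gap(lower: LowerTri, labels: Sequence[str]) -> Dict[str, float]:
    """Distance to the nearest other species minus the largest intraspecific
    distance; a positive value means the species shows a barcode gap."""
    gaps: Dict[str, float] = {}
    for sp in sorted(set(labels)):
        own = [i for i, lab in enumerate(labels) if lab == sp]
        rest = [k for k, lab in enumerate(labels) if lab != sp]
        max_intra = max(_d(lower, i, j) for i in own for j in own)
        min_inter = min(_d(lower, i, k) for i in own for k in rest)
        gaps[sp] = min_inter - max_intra
    return gaps


if __name__ == "__main__":
    labels = ["A", "A", "B", "B", "C"]
    lower = [[], [0.01], [0.05, 0.05], [0.05, 0.05, 0.02], [0.06, 0.06, 0.015, 0.03]]
    intra, inter = split_distances(lower, labels)
    print("max intraspecific:", max(intra), "min interspecific:", min(inter))
    print(per_species_gap(lower, labels))
```

In this toy matrix species A has a clear gap, whereas the largest intraspecific distance in B (0.02) exceeds its distance to C (0.015), so the pooled intra- and interspecific distributions overlap and no overall gap exists.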
Results
PCR Success. Our survey (Fig. S5) showed that PCR amplification of ribosomal genes was more reliable across the Fungi than that of the single protein-coding marker (Fig. 1). As expected, success varied by taxonomic group; e.g. ITS PCR success ranged from 100% (Saccharomycotina) to 65% (basal lineages). Ranges for the other ribosomal markers were similar. In comparison, success for RPB1 ranged from 80% (Saccharomycotina) to 14% (basal lineages). About 80% of respondents reported no problems with PCR amplification of ITS, 90% scored it as easy to obtain a high-quality PCR product, and 80% reported no significant sequencing problems. In contrast, >70% reported PCR amplification problems for RPB1, and 40-50% reported primer failure as the biggest problem.
Species identification. We performed several analyses to allow direct comparison of the barcoding utility of the four main genes under consideration: ITS, LSU, SSU and RPB1 (Figs 2, 3). To assess the Probability of Correct Identification (PCI), data were divided into four sets by taxonomic affinity. All four genes were available for 742 samples. Two three-gene comparisons were made to increase sampling of some major clades that were underrepresented in the initial analysis. For lichen-forming Pezizomycotina, SSU was often absent because our protocols favored amplicons from the algal phycobiont rather than the fungus. Eliminating the requirement for SSU allowed more intensive sampling, with 683 sequences (179 species) of the remaining three markers. Similarly, the basal lineages yielded only 43 RPB1 sequences, so a comparison of the ribosomal markers alone included a larger set of 152 samples and 34 species.
The combined four-gene PCI comparisons (Fig. 2) included 142 species represented by more than one sample and 84 species with only one sample. SSU was consistently the worst-performing marker, with the lowest species discrimination in Pezizomycotina (Fig. 2a) and Basidiomycota (Fig. 2b), and the second lowest in Saccharomycotina (Fig. 2c). In the basal lineages (Fig. 2d), SSU had a better PCI, on par with LSU and better than both ITS and RPB1. LSU had variable levels of PCI (0.46-0.88) among groups (Fig. 2). ITS had the most resolving power for species discrimination in the Basidiomycota (0.79) but performed less well than RPB1 (0.79) in the Pezizomycotina. ITS had lower discriminatory power than RPB1 in the Saccharomycotina, and than SSU and LSU in the basal lineages, but margins of error were high. When all taxa were considered, the PCI of ITS (0.73) was marginally lower than that of RPB1 (0.74). RPB1 consistently yielded high levels of species discrimination, comparable to multigene combinations (Fig. 2), in all fungal groups except the basal lineages. It had the best PCI in the Pezizomycotina (0.79), but in the Basidiomycota it performed slightly worse (0.67) than ITS (0.79) and LSU (0.72). In the Saccharomycotina, SSU had the highest PCI (0.46) and ITS the lowest (0.39) of the single genes, but margins of error were large for this set. Among the multigene combinations, the most effective two-gene combination in the combined analysis was ITS and RPB1, yielding a PCI of 0.77, an increase of 0.04 over the highest-ranked single gene. The highest-ranked three- and four-gene combinations gave a similar increase (0.05).
The expanded set of Pezizomycotina taxa lacking SSU sequences allowed increased sampling of lichenised species (Fig. S6). This data set included 179 species with more than one sample and 117 species with single samples. The expanded data set for basal-lineage taxa lacking RPB1 sequences included 34 species with more than one sample and 50 species with one sample; in this set, all sequences were unique to their species (Fig. S5). In neither analysis did the ranking of the candidate barcodes differ noticeably from that in the four-gene comparison.
The barcode gap analyses (Fig. 3) largely confirmed the trends seen in the PCI analysis. The clearest
indication of a barcode gap is seen for RPB1, followed by ITS. LSU and SSU performed poorly, each
lacking a significant barcode gap.
To test whether other single-copy protein-coding genes might have barcoding performance similar to that of RPB1, RPB2 and MCM7 sequences were tested for a subset of taxa. Neither yielded data from the basal lineages, but a combination of the remaining groups yielded 207 strains and 55 species with sequences for all six markers. This data set (Supplemental Fig X) included 55 species with more than one sample and 23 species with one sample; for both markers, all sequences were unique to their species. The two supplementary genes had barcoding performance similar to that of RPB1, with RPB2 yielding the best results, followed by RPB1 and MCM7.
Discussion
We compared the barcoding performance of four genes using newly generated sequences from 746
strains, with two additional protein coding genes analyzed for a smaller subset of about 200 strains. Our
taxon sampling was comprehensive, covering the main fungal lineages, with heavier sampling in the
most speciose clades. We did not attempt to cover Glomeromycota in the absence of data for markers
other than LSU and ITS; RPB1 could not be amplified consistently. We were also unable to cover
Neocallimastigomycota, because of the absence of sufficient sequence data spanning the full length of
the ribosomal operon. We omitted the Rozella and Microsporidia clades; arguments for and against their
inclusion within Fungi continue (45, 46). For practical reasons, we had to assume that the species
concepts employed by the many taxonomists participating in the consortium were equivalent and
accurate, while acknowledging that species concepts vary from one fungal group to another according to
the relative age and rate of divergence of the lineages and variable states of knowledge (ref?).
Overall, ribosomal markers had the fewest problems with failed priming or weak amplification (Fig. 1; Fig. S4). In species discrimination, SSU had almost no barcode gap (47) and the worst combined PCI, and can be eliminated as a candidate locus (Figs 2, 3). LSU, a highly favored phylogenetic marker among mycologists, especially those involved in environmental metagenomics (ref?), had virtually no amplification, sequencing, alignment or editing problems, and its barcode gap was better defined than that of SSU. Across the fungal kingdom, ITS was generally superior to LSU in species discrimination and had a more clearly defined barcode gap (Fig. 3). The probability of correct species identification using ITS is comparable to the success reported for the two-gene plant barcode system (0.73 vs 0.70) (5). Higher identification success can be expected in the major macrofungal groups of the Basidiomycota (0.79), and slightly lower success in the important microfungal groups of the filamentous Ascomycota (0.75). ITS performed a close second to RPB1, the most heavily sampled of our protein-coding genes. However, the much higher PCR success rate of ITS is a critical advantage for a barcode (Fig. 1).
Taking all these arguments into account, we propose ITS as the standard barcode for fungi. The proposal will satisfy most fungal biologists, but not all. It is unlikely that a single-gene barcode system will be capable of identifying all fungi to species. Furthermore, the limitations of ITS sequences for identifying species in some groups, and the failure of the ‘universal’ ITS primers to work in a minority of other groups, will have to be carefully documented (11). Such limitations have already been found in species-rich Ascomycota genera with shorter amplicons, such as Cladosporium (48), Penicillium (49) and Fusarium (50). In addition, genetic drift may prevent lineage sorting of ancestral polymorphisms in some slowly evolving groups such as lichens (51). Other data suggest that infragenomic variation requires careful consideration (52-54), e.g. within single sporocarps in basidiomycetes (55, 56). Multiple non-orthologous ITS variants have been reported, e.g. in the ascomycete Fusarium (50). Highly variable lengths and high evolutionary rates in the Cantharellales (Basidiomycota) may complicate sequencing and analysis of ITS in that order (57). Several evolutionary mechanisms affect ribosomal and ITS sequences, including horizontal DNA transfer (58) and a process unique to fungi called ‘repeat-induced point mutation’ (RIP) (59, 60). Despite these challenges, ITS combines high resolving power for discriminating closely related species with a high PCR and sequencing success rate across a broad range of fungi.
In addition to Fungi, ITS may also be useful as a barcode for other organisms. Its utility has already been demonstrated in Chlorophyta and certain medicinal plants (61, 62) and in the fungus-like Oomycota (43). Multi-kingdom analyses of complex ecosystems such as soil, using the species-informative, stable, high-copy-number ITS region, mirror the original vision of DNA barcoding and seem feasible.
Protein-coding genes are very popular phylogenetic markers in mycology and are used as de facto barcodes of limited taxonomic scope in several groups. We chose RPB1 as a representative marker to include in our broad comparisons, with RPB2 and MCM7 analyzed for a smaller sample. In general, such protein-coding genes had more species-resolving power, but PCR and sequencing failures eliminate them as potential universal barcodes for the broad phylogenetic scope of the Kingdom Fungi. Reliable, kingdom-wide PCR amplification seems unlikely for other widely employed protein-coding markers, such as translation elongation factor 1-α (TEF1), β-tubulin (TUB1) or actin (ACT1).
The possibility of a two-marker barcoding system for fungi, as adopted for plants, is often discussed among mycologists, in particular researchers working on yeasts (ref.) and Glomeromycota, who both prefer a system combining ITS and LSU (ref.). The dual role of ITS and LSU is already well exploited in studies of fungal diversity in environmental samples (22-24, 63), where tandem amplification allows simultaneous species identification with the ITS and phylogenetic analysis with the LSU. Our analyses with two-, three- or four-gene barcode systems (Fig. 2) reveal only a modest increase in the Probability of Correct Identification over a single-marker ITS barcode. The need for a second marker depends on the intended purpose of an experiment, i.e. whether a broad, general survey is intended or whether particular critical species are being monitored. If these are taxa with low ITS interspecific variability, secondary markers must be employed so as not to underreport genetic diversity (64). Genome-mining efforts have identified a few single-copy genes that might be amenable to broad-range priming, and these efforts will continue (65, 66). However, for taxa in which ITS is too variable or inordinately invariant, standardization of supplementary barcode markers across the broadest possible clades will be required (67).
The recent discovery of a ubiquitous fungal class from soil (68), as well as of a novel and diverse early-diverging lineage allied with Rozella (69) from a freshwater pond on a university campus, underscores that the majority of fungal diversity still awaits discovery. It also suggests that direct barcoding of fungal DNA from environmental samples will play a critical role in the future of fungal taxonomy. Continued discovery will require DNA databases tied to reliable sequences and documented vouchers, and fungal barcoding will be central to this.
Legends
Figure 1. Dendrogram of the 17 fungal lineages sampled in this study, showing consensus relationships and
sampling. Relationships with high uncertainty are indicated by stippled lines. All lineages are
labeled and listed together with a predicted (rounded) number of species. Two possible nodes for
delimiting Fungi are marked ‘F’. The phyla Ascomycota and Basidiomycota are marked
‘A’ and ‘B’, respectively. Grey bars on the left indicate the number of strains in the barcode database,
with the longest bar equal to 1176 strains. Black bars indicate the proportions selected for PCI analysis.
The four data sets analysed for PCI are numbered as follows: 1) Pezizomycotina, 2) Saccharomycotina, 3)
Basidiomycota, 4) basal lineages. Pie charts indicate the success rates of attempts to amplify the four marker
genes by PCR, in the order ITS, LSU, SSU, RPB1. Black indicates successful PCR and sequencing, grey
indicates uncertain cases where no outcome was reported, and white indicates reported PCR failure.
Figure 2. Monophyletic probability of correct identification (PCI) for the four marker data sets.
The plots show the combinations of barcode markers investigated on the Y-axis, abbreviated as
I (ITS), L (LSU), S (SSU) and R (RPB1). The X-axis shows the monophyletic PCI estimate for
(a) Ascomycota, Pezizomycotina (n = 142 species), (b) Basidiomycota (n = 43 species), (c) Ascomycota,
Saccharomycotina (n = 13 species), (d) basal lineages (n = 8 species), and (e) the combined groups (a)-(d) (n = 206
species). Error bars indicate 95% confidence intervals for the PCI estimate.
Figure 3. Barcode gap analysis using distance histograms for each marker. Histograms display
intraspecific variation in grey and interspecific variation in black. Inset table summarizes distance data.
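A barcode gap analysis such as the one in Figure 3 compares the distribution of pairwise distances within species against the distribution between species. The following is a minimal sketch of that comparison, assuming uncorrected p-distances computed on sequences that have already been aligned to equal length; the distance measure and alignment procedure actually used for the figure are not described in this section, so `p_distance`, `barcode_gap`, and the toy sequences are hypothetical illustrations.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped (an assumption;
    other studies count gaps as differences).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def barcode_gap(records: list[tuple[str, str]]) -> tuple[list[float], list[float], float]:
    """Split all pairwise distances into intraspecific and interspecific sets.

    records: (species_label, aligned_sequence) pairs.
    Returns (intra, inter, gap) where gap = min(inter) - max(intra);
    a positive gap means the two histograms do not overlap.
    """
    intra: list[float] = []
    inter: list[float] = []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need at least one intra- and one interspecific pair")
    return intra, inter, min(inter) - max(intra)


# Hypothetical toy alignment: two specimens each of species A and B.
toy = [
    ("A", "ACGTACGTAC"),
    ("A", "ACGTACGTAT"),
    ("B", "ACGAACCTAC"),
    ("B", "ACGAACCTAT"),
]
intra_d, inter_d, gap = barcode_gap(toy)
```

The `intra` and `inter` lists are what a histogram like Figure 3 would plot. Defining the gap as the smallest interspecific distance minus the largest intraspecific distance is the strictest pooled criterion; a per-species comparison is a common, more forgiving alternative.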
References
1. Hebert PDN, Cywinska A, Ball SL, & deWaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci 270:313-321.
2. Hebert PDN, Ratnasingham S, & deWaard JR (2003) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc R Soc Lond B Biol Sci 270:S96-S99.
3. Letourneau A, Seena S, Marvanova L, & Barlocher F (2010) Potential use of barcoding to identify aquatic hyphomycetes. Fungal Divers 40:51-64.
4. Schindel DE & Miller SE (2005) DNA barcoding a useful tool for taxonomists. Nature 435:17.
5. Hollingsworth PM, et al. (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106:12794-12797.
6. Kress WJ, et al. (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci USA 106:18621-18626.
7. Seifert KA, et al. (2007) Prospects for fungus identification using CO1 DNA barcodes, with Penicillium as a test case. Proc Natl Acad Sci USA 104:3901-3906.
8. Dentinger BTM, Didukh M, & Moncalvo JM (in press) Evaluating COI as a DNA barcode marker for mushrooms and allies (Agaricomycotina). PLoS ONE.
9. Gilmore SR, Grafenhan T, Louis-Seize G, & Seifert KA (2009) Multiple copies of cytochrome oxidase 1 in species of the fungal genus Fusarium. Mol Ecol Resour 9:90-98.
10. Rossman AY (2007) Report of the planning workshop for all fungi DNA barcoding. Inoculum 58:15.
11. Seifert KA (2008) The all-fungi barcoding campaign (FunBOL). Persoonia 20:106.
12. Seifert KA (2009) Progress towards DNA barcoding of fungi. Mol Ecol Resour 9:83-89.
13. Bullerwell CE & Lang BF (2005) Fungal evolution: the case of the vanishing mitochondrion. Curr Opin Microbiol 8:362-369.
14. Begerow D, Nilsson H, Unterseher M, & Maier W (2010) Current state and perspectives of fungal DNA barcoding and rapid identification procedures. Appl Microbiol Biotechnol 87:99-108.
15. Eberhardt U (2010) A constructive step towards selecting a DNA barcode for fungi. New Phytol 187:265-268.
16. Schoch CL, et al. (2009) The Ascomycota Tree of Life: A Phylum-wide Phylogeny Clarifies the Origin and Evolution of Fundamental Reproductive and Ecological Traits. Syst Biol 58:224-239.
17. Stackebrandt E & Goebel BM (1994) Taxonomic Note: A Place for DNA-DNA Reassociation and 16S rRNA Sequence Analysis in the Present Species Definition in Bacteriology. Int J Syst Bacteriol 44:846-849.
18. Fell JW, Boekhout T, Fonseca A, Scorzetti G, & Statzell-Tallman A (2000) Biodiversity and systematics of basidiomycetous yeasts as determined by large-subunit rDNA D1/D2 domain sequence analysis. Int J Syst Evol Microbiol 50:1351-1371.
19. Kurtzman CP & Robnett CJ (1998) Identification and phylogeny of ascomycetous yeasts from analysis of nuclear large subunit (26S) ribosomal DNA partial sequences. Antonie van Leeuwenhoek 73:331-371.
20. Scorzetti G, Fell JW, Fonseca A, & Statzell-Tallman A (2002) Systematics of basidiomycetous yeasts: a comparison of large subunit D1/D2 and internal transcribed spacer rDNA regions. FEMS Yeast Res 2:495-517.
21. Koljalg U, et al. (2005) UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol 166:1063-1068.
22. Buee M, et al. (2009) 454 Pyrosequencing analyses of forest soils reveal an unexpectedly high fungal diversity. New Phytol 184:449-456.
23. Jumpponen A & Jones KL (2009) Massively parallel 454 sequencing indicates hyperdiverse fungal communities in temperate Quercus macrocarpa phyllosphere. New Phytol 184:438-448.
24. Opik M, Metsis M, Daniell TJ, Zobel M, & Moora M (2009) Large-scale parallel 454 sequencing reveals host ecological group specificity of arbuscular mycorrhizal fungi in a boreonemoral forest. New Phytol 184:424-437.
25. O'Donnell K, et al. (2010) Internet-accessible DNA sequence database for identifying fusaria from human and animal infections. J Clin Microbiol 48:3708-3718.
26. Frisvad JC & Samson RA (2004) Polyphasic taxonomy of Penicillium subgenus Penicillium - A guide to identification of food and air-borne terverticillate Penicillia and their mycotoxins. Stud Mycol:1-173.
27. Tanabe, Watanabe, & Sugiyama (2003) Are Microsporidia really related to Fungi? (vol 106, pg 1380, 2002). Mycol Res 107:511-511.
28. Cheney SA, Lafranchi-Tristem NJ, Bourges D, & Canning EU (2001) Relationships of microsporidian genera, with emphasis on the polysporous genera, revealed by sequences of the largest subunit of RNA polymerase II (RPB1). J Eukaryot Microbiol 48:111-117.
29. Garnica S, Weiss M, Oertel B, Ammirati J, & Oberwinkler F (2009) Phylogenetic relationships in Cortinarius, section Calochroi, inferred from nuclear DNA sequences. BMC Evol Biol 9.
30. Liu YJJ, Hodson MC, & Hall BD (2006) Loss of the flagellum happened only once in the fungal lineage: phylogenetic structure of Kingdom Fungi inferred from RNA polymerase II subunit genes. BMC Evol Biol 6.
31. Matheny PB, Liu YJJ, Ammirati JF, & Hall BD (2002) Using RPB1 sequences to improve phylogenetic inference among mushrooms (Inocybe, Agaricales). Am J Bot 89:688-698.
32. Tanabe Y, Saikawa M, Watanabe MM, & Sugiyama J (2004) Molecular phylogeny of Zygomycota based on EF-1 alpha and RPB1 sequences: limitations and utility of alternative markers to rDNA. Mol Phylogenet Evol 30:438-449.
33. Longet D & Pawlowski J (2007) Higher-level phylogeny of Foraminifera inferred from the RNA polymerase II (RPB1) gene. Eur J Protistol 43:171-177.
34. McLaughlin DJ, Hibbett DS, Lutzoni F, Spatafora JW, & Vilgalys R (2009) The search for the fungal tree of life. Trends Microbiol 17:488-497.
35. James TY, et al. (2006) Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature 443:818-822.
36. Aguileta G, et al. (2008) Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies. Syst Biol 57:613-627.
37. Schmitt I, et al. (2009) New primers for promising single-copy genes in fungal phylogenetics and systematics. Persoonia 23:35-40.
38. Robert VA, et al. (2011) BioloMICS Software: Biological Data Management, Identification, Classification and Statistics. Open Appl Inform J 5:87-98.
39. Altschul SF (1999) Hot papers - Bioinformatics - Gapped BLAST and PSI-BLAST: a new generation of protein database search programs by S.F. Altschul, T.L. Madden, A.A. Schaffer, J.H. Zhang, Z. Zhang, W. Miller, D.J. Lipman - Comments. Scientist 13:15.
40. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
41. Needleman SB & Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443-453.
42. Wilson EB (1927) Probable inference, the law of succession, and statistical inference. J Am Stat Assoc 22:209-212.
43. Robideau G, et al. (2011) DNA barcoding of oomycetes with cytochrome c oxidase subunit I and internal transcribed spacer. Mol Ecol Resour.
44. Schloss PD, et al. (2009) Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol 75:7537-7541.
45. Voigt K & Kirk PM (2011) Recent developments in the taxonomic affiliation and phylogenetic positioning of fungi: impact in applied microbiology and environmental biotechnology. Appl Microbiol Biotechnol 90:41-57.
46. Lee SC, et al. (2010) Evolution of the sex-related locus and genomic features shared in Microsporidia and Fungi. PLoS ONE 5.
47. Anonymous. Guidelines for CBOL Approval of Non-COI Barcode Regions. http://www.barcoding.si.edu/pdf/guidelines%20for%20non-co1%20selection%20final.pdf.
48. Schubert K, et al. (2007) Biodiversity in the Cladosporium herbarum complex (Davidiellaceae, Capnodiales), with standardisation of methods for Cladosporium taxonomy and diagnostics. Stud Mycol 58:105-156.
49. Skouboe P, et al. (1999) Phylogenetic analysis of nucleotide sequences from the ITS region of terverticillate Penicillium species. Mycol Res 103:873-881.
50. O'Donnell K & Cigelnik E (1997) Two divergent intragenomic rDNA ITS2 types within a monophyletic lineage of the fungus Fusarium are nonorthologous. Mol Phylogenet Evol 7:103-116.
51. Printzen C, Ekman S, & Tonsberg T (2003) Phylogeography of Cavernularia hultenii: evidence of slow genetic drift in a widely disjunct lichen. Mol Ecol 12:1473-1486.
52. Gomes EA, Kasuya MCM, de Barros EG, Borges AC, & Araujo EF (2002) Polymorphism in the internal transcribed spacer (ITS) of the ribosomal DNA of 26 isolates of ectomycorrhizal fungi. Genet Mol Biol 25:477-483.
53. Hijri M, Hosny M, van Tuinen D, & Dulieu H (1999) Intraspecific ITS polymorphism in Scutellospora castanea (Glomales, Zygomycota) is structured within multinucleate spores. Fungal Genet Biol 26:141-151.
54. Simon UK & Weiss M (2008) Intragenomic variation of fungal ribosomal genes is higher than previously thought. Mol Biol Evol 25:2251-2254.
55. Smith ME, Douhan GW, & Rizzo DM (2007) Intra-specific and intra-sporocarp ITS variation of ectomycorrhizal fungi as assessed by rDNA sequencing of sporocarps and pooled ectomycorrhizal roots from a Quercus woodland. Mycorrhiza 18:15-22.
56. Lindner DL & Banik MT (2011) Intragenomic variation in the ITS rDNA region obscures phylogenetic relationships and inflates estimates of operational taxonomic units in genus Laetiporus. Mycologia 103:731-740.
57. Moncalvo JM, et al. (2006) The cantharelloid clade: dealing with incongruent gene trees and phylogenetic reconstruction methods. Mycologia 98:937-948.
58. Xie J, et al. (2008) Intergeneric transfer of ribosomal genes between two fungi. BMC Evol Biol 8:87.
59. Selker EU (2002) Repeat-induced gene silencing in fungi. Adv Genet 46:439-450.
60. Rouxel T, et al. (2011) Effector diversification within compartments of the Leptosphaeria maculans genome affected by Repeat-Induced Point mutations. Nat Commun 2:202.
61. Buchheim MA, et al. (2011) Internal Transcribed Spacer 2 (nu ITS2 rRNA) Sequence-Structure Phylogenetics: Towards an Automated Reconstruction of the Green Algal Tree of Life. PLoS ONE 6.
62. Chen SL, et al. (2010) Validation of the ITS2 Region as a Novel DNA Barcode for Identifying Medicinal Plant Species. PLoS ONE 5.
63. Nagy LG, et al. (2011) Where is the unseen fungal diversity hidden? A study of Mortierella reveals a large contribution of reference collections to the identification of fungal environmental sequences. New Phytol.
64. Gaziz S, Rehner S, & Chaverri P (2011) Species delimitation in fungal endophyte diversity studies and its implications in ecological and biogeographic inferences. Mol Ecol (in press).
65. Lewis CA, et al. (2011) Identification of fungal DNA barcode targets and PCR primers based on Pfam protein families and taxonomic hierarchy. Open Appl Inform J 5:30-44.
66. Robert V, et al. (2011) The quest for a general and reliable fungal DNA barcode. Open Appl Inform J 5:45-61.
67. Seifert KA (2009) Integrating DNA barcoding into the mycological sciences. Persoonia 21:162-166.
68. Rosling A, et al. (2011) Archaeorhizomycetes: Unearthing an Ancient Class of Ubiquitous Soil Fungi. Science 333:876-879.
69. Richards TA, et al. (2011) Discovery of novel intermediate forms redefines the fungal tree of life. Nature 474:200-U234.