Flanking regions of monomorphic microsatellite loci provide a new source of data for plant species-level phylogenetics
Lars W. Chatrou a, M. Pilar Escribano b, Maria A. Viruel b, Jan W. Maas c, James E. Richardson d, José I. Hormaza b
a Nationaal Herbarium Nederland, Wageningen branch, and Wageningen UR, Biosystematics Group, Generaal Foulkesweg 37, 6703 BL Wageningen, the Netherlands. Email: Lars.Chatrou@wur.nl
b Estación Experimental La Mayora-CSIC, Algarrobo-Costa, Málaga 29750, Spain. Email: pilescri@eelm.csic.es, marian.viruel@eelm.csic.es, ihormaza@eelm.csic.es
c Nationaal Herbarium Nederland, Utrecht branch, Heidelberglaan 2, 3584 CS Utrecht, the Netherlands. Email: J.W.Maas@uu.nl
d Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, United Kingdom. Email: J.Richardson@rbge.ac.uk
Corresponding author:
Dr. Lars W. Chatrou
Nationaal Herbarium Nederland, Wageningen branch, and Wageningen UR,
Biosystematics Group
Generaal Foulkesweg 37
6703 BL Wageningen
The Netherlands
Phone: +31 – 317 – 483854
Fax: +31 – 317 – 484917
Email: Lars.Chatrou@wur.nl
Abstract
Well-resolved phylogenetic trees are essential for understanding evolutionary processes at the species level. However, the degree of species-level resolution in the plant phylogenetic literature is poor, largely because of the dearth of sufficiently variable molecular markers.
In contrast to the common genic approach to marker development, we generated DNA sequences of monomorphic nuclear microsatellite flanking regions in a phylogenetic study of Annona species (Annonaceae). The resulting data showed no evidence of paralogy or allelic diversity that would confound attempts to reconstruct the species tree. Microsatellite flanking regions are short, which makes them practical to use, yet they contain remarkably high proportions of variable characters. Their substitution rates are 3.5- to 10-fold higher than those of two commonly used chloroplast markers; they show no rate heterogeneity among nucleotide positions, evolve in a clock-like fashion, and show no evidence of saturation. These advantages are offset by the short length of the flanking regions, which yields numbers of parsimony-informative characters similar to those of the chloroplast markers.
The neutral evolution and high variability of flanking regions, together with the wide availability of monomorphic microsatellite loci in angiosperms, are useful qualities for species-level phylogenetics. The general methodology presented here facilitates the discovery of phylogenetic markers in groups for which microsatellites have been developed.
Key words: microsatellite flanking regions, species-level phylogenetics, neutral evolution.
1. Introduction
Increased focus on species-level phylogenetics in angiosperms has encouraged the search for molecular markers capable of resolving phylogenetic relationships at lower taxonomic levels, i.e. markers with a mutation rate fast enough to produce sufficient variation (Crawford and Mort, 2004). The need for such markers is widely expressed in the literature (Bailey et al., 2004; Choi et al., 2006; Crawford and Mort, 2004; Whittall et al., 2006), as only a small percentage of the published species-level phylogenetic trees in plants are fully resolved (Hughes et al., 2006).
Chloroplast markers have been an important source of data for plant phylogenetics.
Apparent advantages of chloroplast markers are their relative abundance in total plant DNA and their relatively conservative mutation rates, which facilitate extraction and amplification using conserved primer binding sites. Furthermore, chloroplast markers are essentially single-copy. This avoids the reconstruction of erroneous organismal phylogenies from paralogous gene copies, which may be a problem with nuclear markers (Bailey et al., 2003; Baker et al., 2000; Sanderson and Shaffer, 2002).
The features that simplify the application of cpDNA markers at the species level are, however, traded off against qualities that are less desirable for organismal species-level phylogenetics (Sang, 2002). Chloroplast markers generally evolve too slowly to provide sufficient phylogenetically informative characters over recent time spans (Richardson et al., 2001), even after considerable data collection (Perret et al., 2003; Pirie et al., 2006). This is not to say that incompletely resolved phylogenies are meaningless. As long as critical nodes are well supported, they can serve to pinpoint biogeographical phenomena (e.g. Erkens et al., 2007b) or to falsify current classifications based on morphological characters (e.g. Shaw and Small, 2004). Only a few papers with fairly large chloroplast data sets have generated reasonably well-resolved and robustly supported species-level phylogenies (e.g. Clarkson et al., 2004).
Furthermore, chloroplast markers are uniparentally inherited, usually maternally in angiosperms, and therefore provide only part of the evidence for the evolutionary development of a lineage if hybridization and introgression have taken place (Chase et al., 2003; Sunnucks, 2000; Vriesendorp and Bakker, 2005).
The search for useful markers for plant species-level systematics has predominantly yielded markers from genic regions or, in the case of noncoding DNA, from regions close to genes. There is a growing body of literature on single- or low-copy nuclear genes that provide sufficient informative characters and do not complicate organismal phylogeny reconstruction with paralogous copies (Edwards et al., 2008; Emshwiller and Doyle, 1999; Sang et al., 1997; Small et al., 2004; Whittall et al., 2006).
However, it has also become clear that rates of nucleotide substitution of these markers may differ significantly among lineages, and even among closely related species
(Hughes et al., 2006). Therefore, attempts to extrapolate the utility of these markers for resolving species-level relationships outside the taxonomic group for which they have been developed may not succeed. These difficulties have led some researchers to suggest that a universal approach should be abandoned in favour of a lineage-specific one (Small et al., 2004).
An alternative to the gene-based development of variable nuclear markers is a search strategy that focuses on randomly amplified regions throughout the genome (Bailey et al., 2004; Hughes et al., 2006). One class of such regions, microsatellites, is potentially useful at the species level because of its high variability, abundance, uniform genome-wide distribution and neutral evolution (Ellegren, 2004). However, the polymorphic nature of microsatellites causes analytical problems: translating allele sizes into distance-based characters makes them susceptible to incorrect homology assessment (Matsuoka et al., 2002; Primmer and Ellegren, 1998).
We avoid this drawback by focusing only on the nucleotide sequences of the regions flanking the microsatellite repeat, not on the repeat itself. A further factor that may complicate phylogeny reconstruction is the presence of multiple alleles, i.e. variation that does not necessarily have a one-to-one relationship with the organismal phylogeny, for example because of incomplete lineage sorting. The distinction between paralogous and orthologous microsatellite copies is less of a complicating factor: microsatellites, including their flanking regions, usually represent unique and therefore orthologous loci (Sunnucks, 2000), although duplication events affecting microsatellite loci have been reported (Antunes et al., 2006; Zhang and Rosenberg, 2007).
Thus, an optimal microsatellite flanking region marker for plant species-level phylogenetics has a rate of substitution that allows shallow relationships to be resolved, is represented by orthologous copies, and is monomorphic within species, populations and individuals. We present examples of such markers from the plant family Annonaceae and outline the potential for broader application of this approach in other clades of angiosperms. We took orthology-by-default as the starting point of our study, further hypothesizing that a neutral, highly variable marker system such as microsatellites, including the flanking regions, evolves fast enough to elucidate relationships among species of Annona (Annonaceae). In this plant family, chloroplast sequence data have produced phylogenies that are poorly resolved at the species level, despite the gathering of large amounts of data (Erkens et al., 2007a; Mols et al., 2004). Annona is paraphyletic with respect to Rollinia, and the two genera together comprise a clade of approximately 175 species. Species of Rollinia were recently synonymised with Annona (Rainer, 2007). Here we provide the first published phylogenetic support for this taxonomic decision, as the former species of Rollinia (Annona cuspidata, A. herzogii, A. mucosa, A. neochrysocarpa) appear as a well-supported clade within Annona in the analyses we present here. Although our taxon sampling covers one eighth of the species diversity in Annona, including the entire morphological diversity and geographical distribution of the genus, additional taxon sampling would be needed to confidently corroborate the inclusion of Rollinia in Annona.
To assess the utility of microsatellite flanking regions, we address the following questions: (1) Can we produce flanking region sequences that are monomorphic within individuals? (2) Can we confirm the assumed orthology of the flanking region sequences? (3) How transferable are the microsatellite loci across species of Annona and other Annonaceae? (4) How strong is the phylogenetic signal at the species level?
2. Materials and Methods
2.1 Taxon sampling
For this study we sampled 24 species: 22 species of Annona and two species of Asimina as outgroup (Table 1). Richardson et al. (2004) have shown that Asimina is sister to Annona. The samples of Annona represent the complete geographical range of the genus, as well as its considerable morphological (particularly floral) variation.
2.2 Character sampling
Two chloroplast markers, rbcL and trnLF, were sequenced. These markers have commonly been applied in phylogenetic analyses of Annonaceae (e.g. Couvreur et al., 2008; Erkens et al., 2007a; Pirie et al., 2006) and are generally considered useful at taxonomic levels above the species. In a family-wide analysis based on these two chloroplast markers only, the relationships among nine species of Annona were fairly well resolved but generally poorly supported (Richardson et al., 2004). The microsatellite loci were selected after screening the first 15 microsatellite loci developed in cherimoya (Annona cherimola) (Escribano et al., 2004). Seven of them (LMCH4, 5, 6, 9, 10, 11, and 14) produced amplification bands in the eight species studied initially (Annona sp. nov., A. glabra, A. montana, A. muricata, A. oligocarpa, A. reticulata, A. senegalensis, and Rollinia cuspidata [now Annona cuspidata; Rainer, 2007]). No amplification was obtained for two loci (LMCH1 and LMCH13). For two additional loci (LMCH7 and LMCH8), amplification was obtained only in A. montana and A. glabra. LMCH9 and LMCH10 were selected for this study because they showed clear, monomorphic, single-allele amplification bands in all the species tested.
2.3 DNA extraction, PCR amplification and sequencing
Total genomic DNA was extracted following a protocol adapted from the CTAB method (Doyle and Doyle, 1987), as described in Pirie et al. (2006). PCR conditions and primers for the chloroplast markers were standard and identical to those of Pirie et al. (2006). PCR products were purified using QIAquick PCR purification kits (Qiagen) and sequenced with the PCR primers. PCR conditions and primers for the microsatellites LMCH9 and LMCH10 followed Escribano et al. (2004).
PCR products were resolved by electrophoresis on 3% high-resolution agarose gels (Metaphor, FMC Bioproducts, Rockland, ME) in SB buffer at 5 V/cm.
Sequencing reactions, with a total volume of 10 µl, contained 0.5 µl DYEnamic ET Terminator (Amersham Pharmacia Biotech), 3.5 µl ET Terminator dilution buffer (Amersham Pharmacia Biotech), and 2-4 µl of DNA template, depending on its concentration. Template concentration was assessed by electrophoresis on a 1.5% agarose gel with a molecular weight marker (Smart-Ladder, Eurogentec, Seraing, Belgium). Sequencing products were purified on Sephadex G-50 DNA grade (Sigma-Aldrich, St. Louis, MO, USA) and analyzed on an ABI 3730XL automatic sequencer (Applied Biosystems).
2.4 Phylogenetic analysis
DNA sequences were edited in SeqMan 4.0 (DNAStar Inc., Madison, Wisconsin) and aligned manually. After exclusion of ambiguous positions, the alignments of rbcL, trnLF, LMCH9 and LMCH10 consisted of 1364, 845, 117 and 233 positions, respectively. Indels were coded following Simmons and Ochotorena (2000), resulting in 15 additional characters (trnLF: 4, LMCH9: 3, LMCH10: 8).
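To make the indel-coding step concrete, the sketch below implements the core rule of simple indel coding as we understand it from Simmons and Ochotorena (2000): each gap with a distinct start and end in the alignment becomes one presence/absence character, and a sequence that carries a longer gap completely spanning it is scored as missing. The helper names and the toy alignment are hypothetical, and gaps are assumed to be written as '-'; this is an illustration of the rule, not the tool used in this study.

```python
import re
from typing import List, Tuple

def gap_spans(seq: str) -> List[Tuple[int, int]]:
    """Half-open (start, end) coordinates of every run of '-' in an aligned sequence."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", seq)]

def simple_indel_coding(alignment: List[str]) -> List[str]:
    """Score each distinct gap as a binary character: '1' = present, '0' = absent,
    '?' = the sequence has a longer gap that completely covers this one."""
    spans_per_seq = [gap_spans(seq) for seq in alignment]
    # One character per unique (start, end) pair found anywhere in the alignment.
    characters = sorted({span for spans in spans_per_seq for span in spans})
    coded = []
    for spans in spans_per_seq:
        row = []
        for start, end in characters:
            if (start, end) in spans:
                row.append("1")
            elif any(s <= start and e >= end for s, e in spans):
                row.append("?")
            else:
                row.append("0")
        coded.append("".join(row))
    return coded

toy = ["ACGT--AC", "ACG---AC", "ACGTTTAC"]
print(simple_indel_coding(toy))  # ['01', '1?', '00']
```

Each returned string is one taxon's row of indel characters, which would be appended to the nucleotide matrix as extra binary columns.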
Maximum parsimony analyses (Fitch parsimony; Fitch, 1971) of each marker separately used heuristic searches with 100,000 random addition sequence replicates, saving at most 100 trees per replicate, and TBR branch swapping. All phylogenetic analyses were done with PAUP* 4.0b10 (Swofford, 2000). The concatenated data matrix of all markers was analyzed using the branch and bound method, with furthest addition sequence and the MulTrees option in effect.
Support was assessed by bootstrap resampling of the data matrix, with 1000 bootstrap replicates for each bootstrap analysis. For each marker individually, each bootstrap replicate was analyzed with a full heuristic search of 100 random addition sequences and TBR, saving 100 trees each time. For the concatenated matrix, each bootstrap-resampled matrix was analyzed using the branch and bound algorithm, with settings as above.
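The parsimony criterion and the bootstrap resampling described above can be illustrated with a small sketch. It scores a fixed rooted binary tree with Fitch's (1971) algorithm and draws one bootstrap pseudo-replicate by sampling alignment columns with replacement. It performs no heuristic or branch and bound tree search as PAUP* does; the tree encoding (nested tuples of taxon names), the toy data, and the assumption that there are no missing data are ours.

```python
import random
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return the Fitch state set and the number of changes for one site."""
    if isinstance(tree, str):                      # leaf: its observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:                                     # intersection: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # union: one change

def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length: sum of Fitch changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample columns with replacement, keeping the original number of sites."""
    n_sites = len(next(iter(alignment.values())))
    cols = rng.choices(range(n_sites), k=n_sites)
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

aln = {"a": "AA", "b": "AC", "c": "GC", "d": "GC"}
print(tree_length((("a", "b"), ("c", "d")), aln))  # 2
print(tree_length((("a", "c"), ("b", "d")), aln))  # 3
```

In a full bootstrap analysis, each pseudo-replicate would be searched for its most-parsimonious trees, and support for a clade is the fraction of replicates in which it appears.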
DNA substitution models were identified for each data partition separately using ModelTest 3.04 (Posada and Crandall, 1998). Individual data partitions were optimized onto the combined topology. Based on the model identified by ModelTest, a likelihood ratio test (Felsenstein, 1988) was applied to test whether each data partition evolves at a homogeneous rate along all branches of the combined topology (molecular clock). Twice the difference in log-likelihood between the tree with and the tree without a clock constraint was used as the likelihood ratio test statistic, which is reported in Table 2. Likelihood values were computed with PAUP* 4.0b10.
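As a worked illustration of the clock test, the sketch below turns two log-likelihoods into a likelihood ratio statistic and a p-value. It assumes the usual setting in which the statistic is compared with a chi-square distribution with n - 2 degrees of freedom for n taxa (22 for the 24 taxa sampled here). The closed-form tail probability used is valid only for an even number of degrees of freedom, and the log-likelihoods in the example are invented, not values from Table 2.

```python
import math
from typing import Tuple

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total

def clock_lrt(lnl_no_clock: float, lnl_clock: float, n_taxa: int) -> Tuple[float, int, float]:
    """Likelihood ratio test of a molecular clock (clock tree nested in non-clock tree)."""
    statistic = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_taxa - 2               # assumed df for n taxa
    return statistic, df, chi2_sf_even_df(statistic, df)

# Hypothetical log-likelihoods for one partition on a 24-taxon tree.
stat, df, p = clock_lrt(-1000.0, -1010.0, 24)
print(stat, df)  # 20.0 22
```

The clock is rejected at a given level when p falls below it; for real analyses a general chi-square routine (any df) would replace the even-df shortcut.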
To test for congruence between the chloroplast data partition and the flanking region data partitions, we applied the incongruence length difference (ILD) test (Farris et al., 1995a, b), as implemented in PAUP* 4.0b10, using informative characters only, with 5000 replicates and heuristic searches as described above. The chloroplast markers rbcL and trnLF were combined into a single data partition, and we tested incongruence of this plastid partition with each flanking region marker separately. Incongruence between the two flanking regions was tested as well. Statistics of the ILD tests are given in Table 3.
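The logic of the ILD test can be shown on a deliberately tiny example. With four taxa there are only three unrooted topologies, so the most-parsimonious length of any set of characters can be found exhaustively with Fitch's algorithm. The ILD statistic is the length of the combined data minus the summed lengths of the partitions, and its null distribution comes from randomly repartitioning the pooled characters into sets of the original sizes. This is an illustrative sketch only: real analyses such as those in PAUP* use tree searches on many more taxa, the toy characters are hypothetical, and the p-value convention (counting the observed partition among the replicates) is our assumption.

```python
import random
from typing import List, Tuple

# Four taxa (indices 0-3); each character is a string of four states, e.g. "AAGG".
# The three unrooted four-taxon topologies, written as rooted binary trees
# (root placement does not change Fitch lengths).
TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def fitch(tree, column: str):
    """Fitch state set and change count for one character on one tree."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], column), fitch(tree[1], column)
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)

def mp_length(chars: List[str]) -> int:
    """Length of the most-parsimonious tree, found by exhaustive search."""
    return min(sum(fitch(t, c)[1] for c in chars) for t in TOPOLOGIES)

def ild(part_a: List[str], part_b: List[str]) -> int:
    """Incongruence length difference: combined length minus summed partition lengths."""
    return mp_length(part_a + part_b) - mp_length(part_a) - mp_length(part_b)

def ild_test(part_a: List[str], part_b: List[str],
             replicates: int = 999, seed: int = 1) -> Tuple[int, float]:
    observed = ild(part_a, part_b)
    pooled = part_a + part_b
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(pooled)  # random repartition keeping the original partition sizes
        if ild(pooled[:len(part_a)], pooled[len(part_a):]) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (replicates + 1)

plastid_like = ["AAGG", "AAGG"]  # hypothetical characters grouping taxa 0+1 vs 2+3
flank_like = ["AGAG", "AGAG"]    # hypothetical characters grouping taxa 0+2 vs 1+3
print(ild_test(plastid_like, flank_like, replicates=99))
```

Here the two partitions support different topologies, so the observed ILD is 2, whereas identical partitions give an ILD of 0.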
Saturation plots were made by plotting corrected against uncorrected distances, both calculated in PAUP*, for all possible species pairs. Distances were corrected using the model of molecular evolution identified for each marker separately with Modeltest 3.06 (Posada and Crandall, 1998). These models are given in Table 2.
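The two axes of a saturation plot can be computed as in the sketch below, which returns the uncorrected p-distance and the K80 (Kimura two-parameter) corrected distance for one pair of aligned sequences. K80 is used here only because it is the model reported for LMCH9; other markers would need the model listed in Table 2. Skipping gaps and ambiguity codes pairwise is an assumption of this sketch, not a description of the PAUP* settings used.

```python
import math
from typing import Tuple

PURINES = set("AG")
PYRIMIDINES = set("CT")

def p_and_k80(seq1: str, seq2: str) -> Tuple[float, float]:
    """Uncorrected p-distance and K80-corrected distance between two aligned sequences."""
    compared = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes for this pair
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    P = transitions / compared
    Q = transversions / compared
    # K80: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q); math.log raises if saturated.
    d = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return P + Q, d

p, d = p_and_k80("AAAAAAAAAA", "GAAAAAAAAA")
print(round(p, 3), round(d, 4))  # 0.1 0.1116
```

Plotting d against p for all species pairs gives the saturation plot: points on the diagonal indicate little multiple substitution, while a curve that flattens towards a plateau indicates saturation.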
Substitution rates were estimated with the program r8s (Sanderson, 2004), using penalized likelihood as the reconstruction method. To estimate branch lengths as accurately as possible, the sequences of each individual marker were optimized onto the combined topology. Rates were calculated in absolute time (10^-9 substitutions/site/year) by calibrating the crown node of Annona at 19.1 myr. Richardson et al. (2004) estimated the age of the crown node of Annona at 25.6 ± 3.8 myr. Unpublished results (Pirie et al., in prep.), based on more sequence data and calibration with more fossils than Richardson et al. (2004), have revised this age to 19.1 ± 2.0 myr. Reliable fossil data for Annona are unavailable. Given the broad taxon and character sampling of the study from which we derive this age, the quality of the fossils, and the small confidence intervals, we consider this secondary calibration reliable.
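The conversion to absolute rates is, at its simplest, a division of the amount of change by the calibration age. The sketch below shows that arithmetic under a strict clock, where the rate equals the distance from the Annona crown node to the tips (in substitutions per site) divided by the crown age. This is a deliberate simplification: penalized likelihood in r8s allows rates to vary among branches and will not in general reproduce these numbers, and the depth of 0.05 substitutions/site is hypothetical. Bracketing the age by its reported error (19.1 ± 2.0 myr) shows how calibration uncertainty propagates into the rate.

```python
def strict_clock_rate(depth_subs_per_site: float, crown_age_myr: float) -> float:
    """Rate in substitutions/site/year if change accumulated evenly since the crown node."""
    return depth_subs_per_site / (crown_age_myr * 1e6)

def per_1e9_years(rate: float) -> float:
    """Express a per-year rate in units of 10^-9 substitutions/site/year."""
    return rate * 1e9

depth = 0.05  # hypothetical crown-to-tip distance, substitutions/site
for age in (17.1, 19.1, 21.1):  # calibration age and its +/- 2.0 myr bounds
    print(age, round(per_1e9_years(strict_clock_rate(depth, age)), 2))
```

An older calibration yields a proportionally lower rate, so the roughly 10% uncertainty in the crown age translates into a similar relative uncertainty in absolute rates.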
3. Results and discussion
3.1 Monomorphism and allelic diversity
Monomorphic microsatellites, i.e. those with only a single allele at a locus, are routinely discovered during the screening of microsatellite loci in plants (Squirrell et al., 2003). Two of the 15 microsatellites developed for Annona cherimola (Escribano et al., 2004), LMCH9 and LMCH10, meet this criterion, as they produced clear single amplification bands after PCR. We produced LMCH9 and LMCH10 nucleotide sequences for 22 species of Annona (Table 1). Both microsatellite loci contain a dinucleotide repeat region as well as short 5' and 3' flanking regions.
PCR of these regions generally produced homogeneous bands, supporting our assumption that the included flanking region sequences are orthologous (Small et al., 2004). Standard PCR conditions were adequate for obtaining amplification products, and no cloning was required. The small size of the fragments makes them easy to amplify, which is especially useful when working with degraded DNA.
In the unusual case of double bands, fragment size similarity among species was easy to assess, and fragments were excised from the high-resolution separation gel.
Sequencing of PCR products typically produced chromatograms indicative of monomorphic loci, which could be interpreted unequivocally. Single nucleotide polymorphisms (SNPs), i.e. identical polymorphisms present in both the forward and the reverse sequence, were rarely encountered. Three LMCH9 sequences (Annona dumetorum, A. mucosa and A. urbaniana) and two LMCH10 sequences (A. bicolor and A. hypoglauca) contained 1-3 SNPs, giving a polymorphism frequency between 0.4% and 1.7% for these five sequences. The polymorphic positions did not overlap between any of these species. These SNPs might indicate the presence of multiple alleles. However, given the very low frequency of SNPs, any such alleles would be highly similar, and their effect on the results of the phylogenetic analyses was non-existent. Even if the SNPs did point to allelic variation, it would only cause problems for species-level phylogeny reconstruction if the alleles coalesced at deeper phylogenetic levels, ancestral to the species sampled here. Thus, despite the use of a highly variable marker, the careful selection of monomorphic loci prevented the gathering of intraspecific polymorphisms that would have made reconstruction of the species phylogeny problematic.
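The criterion used above for calling a SNP (the same polymorphism visible in both the forward and the reverse read) can be written as a short check. The sketch assumes that both reads have already been edited into the same orientation and alignment, and that a double peak is recorded with a two-base IUPAC ambiguity code (R, Y, S, W, K, M); the function names and example reads are hypothetical. Dividing the count by the sequence length gives the polymorphism frequency: using the aligned lengths as rough denominators, 1 SNP in 233 positions is about 0.43% and 2 SNPs in 117 positions about 1.71%, consistent with the range reported above.

```python
from typing import List

# Two-base IUPAC codes, as typically used to record a double peak in a chromatogram.
TWO_BASE_CODES = set("RYSWKM")

def shared_snps(forward: str, reverse: str) -> List[int]:
    """Positions where both reads carry the same two-base ambiguity code.

    Assumes `reverse` is already reverse-complemented and aligned to `forward`.
    """
    return [
        i for i, (f, r) in enumerate(zip(forward.upper(), reverse.upper()))
        if f == r and f in TWO_BASE_CODES
    ]

def polymorphism_frequency(n_snps: int, length: int) -> float:
    """Percentage of positions that are polymorphic."""
    return 100.0 * n_snps / length

print(shared_snps("ACRTGYA", "ACRTGCA"))  # [2]: the Y at position 5 is seen in one read only
print(round(polymorphism_frequency(1, 233), 2), round(polymorphism_frequency(2, 117), 2))  # 0.43 1.71
```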
3.2 Orthology
Alignment of the LMCH9 and LMCH10 flanking regions was straightforward. The 5' flanking region of LMCH9 was only 30 bp long, and therefore only its 3' flanking region was included in the analyses. Both the 5' and 3' flanking regions of LMCH10 were included. The dinucleotide repeat regions were excluded from the analysis. The aligned flanking regions of LMCH9 and LMCH10 comprise 117 and 233 characters, respectively. Additionally, 11 indel characters were scored and included in the analyses (Table 2).
Orthology of the flanking region sequences was supported by the similarity of the phylogenetic signal in the flanking regions and in an independent data source, viz. chloroplast rbcL and trnL-F sequences. Inspection of the bootstrap support values for each marker separately (Fig. 1) revealed no well-supported conflicting clades (bootstrap support ≥ 85%). The sister-group relationship of the A. glabra/A. senegalensis clade with a clade containing the former Rollinia species, as reconstructed with LMCH9 sequences, conflicts with the position of the A. glabra/A. senegalensis clade in analyses of the other markers. However, the bootstrap support of 75% for this relationship is only moderate and insufficient to regard the signal of LMCH9 as conflicting.
Furthermore, the parsimony-based incongruence length difference (ILD) test detected no significant incongruence between the combined chloroplast markers and either flanking region (Table 3): neither LMCH9 nor LMCH10 was significantly incongruent with the chloroplast data partition at the 95% confidence level. Finally, incongruence between the two flanking regions was clearly rejected.
3.3 Transferability
We were unable to amplify either of the two microsatellite loci in any species outside Annona, not even in its sister genus Asimina (Richardson et al., 2004). It should be noted that the clade to which Annona and Asimina belong is characterized by long branches subtending generally species-rich clades, so that sister genera are relatively distant (Richardson et al., 2004). However, similar patterns of good amplification success within a target group and poor success in non-congeneric species have been reported in other plant and animal clades (Fraser et al., 2005; Peakall et al., 1998; Wilson et al., 2004). In Annona, the limited transferability only poses problems for rooting the tree, as flanking region sequences could be produced for the entire ingroup. We predict that the same would hold in similar studies of other angiosperms. Datasets for phylogenetic analyses typically comprise multiple markers and can easily be designed to contain flanking region sequences as well as markers that can be sequenced for a broader range of taxa. The latter would ensure appropriate rooting of phylogenetic trees, and the former would provide many informative characters at nodes within the ingroup. The potential for using microsatellite flanking regions in plant species-level phylogenetics is fairly large, as published monomorphic microsatellite sequences are available for species-rich tropical groups such as Begonia and Melaleuca, as well as for temperate groups such as Pinus and Primula (Squirrell et al., 2003), and could readily be scrutinized for their phylogenetic utility. At the same time, our results and other reports on transferability suggest that the methodology described here is useful only for resolving the shallow branches of the tree of life and will not contribute to taxonomically large data sets (Chase et al., 2006; Driskell et al., 2004).
3.4 Phylogenetic utility
The addition of flanking region sequence data improves the resolution of the phylogenetic tree of Annona. The number of well-supported nodes increases compared with the chloroplast sequences alone (Table 4), and the simultaneous analysis of the four markers produced a single most-parsimonious tree, generally with high bootstrap support for the nodes (Fig. 2).
The flanking region sequences are much more variable than the chloroplast markers, as shown by their higher percentages of both variable and parsimony-informative characters (Table 2). Mean substitution rates are also higher for the flanking regions, approximately 3.5- to 10-fold the rate of the chloroplast markers, although the standard deviations of the substitution rates are large (Table 2). The models of evolution of the flanking regions are simpler than those of the chloroplast markers. Within each flanking region, all sites evolve at the same rate, as the model estimates showed no rate heterogeneity among positions (gamma shape parameter α = ∞). Moreover, substitutions accumulate linearly over time in the flanking regions. For LMCH9, the molecular clock hypothesis was not rejected by the likelihood ratio test (LRT) at any conventional significance level (Table 2). For LMCH10, the molecular clock hypothesis was just rejected at the 5% level, though not at the 2.5% level. For both chloroplast markers, the molecular clock hypothesis was rejected (p < 0.001).
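The percentages of variable and parsimony-informative characters compared above rest on standard definitions: a site is variable if it shows at least two states, and parsimony-informative if at least two states each occur in at least two taxa. A minimal sketch of that count, assuming gaps and ambiguity codes are ignored as missing data and using a hypothetical four-taxon alignment:

```python
from collections import Counter
from typing import List, Tuple

def site_counts(alignment: List[str]) -> Tuple[int, int, int]:
    """Return (number of sites, variable sites, parsimony-informative sites)."""
    n_sites = len(alignment[0])
    variable = informative = 0
    for i in range(n_sites):
        # Count unambiguous nucleotides only; '-', '?', 'N', etc. are treated as missing.
        states = Counter(seq[i] for seq in alignment if seq[i] in "ACGT")
        if len(states) >= 2:
            variable += 1
            # Informative if at least two states are each shared by two or more taxa.
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1
    return n_sites, variable, informative

aln = ["AAAC", "AAGC", "ACGT", "ACGA"]
n, var, pi = site_counts(aln)
print(f"{100 * var / n:.0f}% variable, {100 * pi / n:.0f}% parsimony-informative")
```

Only the second column, with two taxa carrying A and two carrying C, is informative; the third and fourth columns are variable but contain singleton states, which any tree can explain with the same number of changes.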
For each marker, we plotted uncorrected pairwise distances against distances corrected using the models of molecular evolution (Table 2) identified with ModelTest (Posada and Crandall, 1998), to assess saturation (Fig. 3). The chloroplast markers show initial saturation, as their graphs deflect from linearity but do not yet reach a saturation plateau. In contrast, the flanking regions show no evidence of multiple substitutions at nucleotide positions despite their higher overall substitution rates. The first signs of saturation in the chloroplast markers might be attributable to the sampling of only 22 of the 175 species of Annona. Increased taxon sampling would likely reduce the phylogenetic distances among sequences and could consequently minimize the appearance of saturation. However, saturation in the four markers is compared against the background of the same taxon sampling, so the tentative conclusion that the microsatellite flanking regions are less saturated than the chloroplast markers is warranted.
Given the higher percentage of variable characters and the higher rate of substitution, one might expect higher levels of homoplasy in the flanking regions, simply because only four character states are available at each nucleotide position. First, superimposed substitutions at a nucleotide position would be expected to occur more readily, causing the saturation plot to deflect from linearity. In addition to this hidden homoplasy, we would expect higher levels of 'visible' homoplasy, i.e. the independent multiple origin of identical character states, reflected in lower values of the consistency index (CI). However, both the saturation plot (Fig. 3) and the CI values (Table 2) contradict these expectations: there is less saturation in the flanking regions than in the chloroplast markers, and CI values are similar. The explanation is likely to lie in the characteristics of molecular evolution of the flanking regions, notably the clock-like accumulation of substitutions and the absence of rate heterogeneity. Both characteristics are expected under neutral molecular evolution: a constant substitution rate across evolutionary lineages and across sites in DNA sequences (Bromham and Penny, 2003). The even distribution of substitutions over the nucleotide positions in the flanking regions allows substitution rates to be higher than in the chloroplast markers while at the same time making the flanking regions less prone to saturation. In addition, the K80 model of molecular evolution estimated for LMCH9 is consistent with these indications of neutrality.
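The consistency index used in this comparison is the ratio of the minimum conceivable number of changes (the number of observed states minus one, summed over characters) to the number of changes actually required on the tree. The sketch below computes an ensemble CI on a fixed tree with Fitch's algorithm; the tree and characters are hypothetical, and whether uninformative characters are excluded beforehand (as is often done) is left to the caller.

```python
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch_changes(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch (1971) state set and minimum number of changes for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch_changes(tree[0], states), fitch_changes(tree[1], states)
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)

def consistency_index(tree: Tree, taxa: List[str], characters: List[str]) -> float:
    """Ensemble CI = sum of minimum changes / sum of changes on the tree."""
    minimum = observed = 0
    for column in characters:            # one string per character, ordered as `taxa`
        states = dict(zip(taxa, column))
        minimum += len(set(column)) - 1  # each additional state needs at least one change
        observed += fitch_changes(tree, states)[1]
    return minimum / observed

tree = (("a", "b"), ("c", "d"))
chars = ["AAGG", "AGAG"]  # the second character is homoplastic on this tree
print(round(consistency_index(tree, ["a", "b", "c", "d"], chars), 3))  # 0.667
```

A CI of 1 means no homoplasy; values well below 1 indicate that identical states arose independently, the 'visible' homoplasy discussed above.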
These characteristics of the flanking regions are noteworthy because they contrast with findings in the literature on homoplasy in chloroplast markers. A correlation between levels of homoplasy, on the one hand, and substitution rates and/or levels of sequence divergence, on the other, is often assumed without further testing, for instance to rule out a possible deleterious effect of saturation in data partitions with low substitution rates (e.g. Cronn et al., 2002; Zgurski et al., 2008). Such a positive relationship between substitution rate and level of homoplasy, as expressed by the consistency index, has been demonstrated for nucleotide substitution rates in chloroplast genes (Graham and Olmstead, 2000), and even for rates of chloroplast indel characters (Ingvarsson et al., 2003). Wortley et al. (2005) found that simulating an increase in the substitution rate of rbcL, matK and ndhF soon resulted in a decrease in phylogenetic resolution, probably because of saturation. This is likely related to the fact that the evolution of these chloroplast genes is governed by mild functional constraints (Savolainen et al., 2002). The absence of a comparable correlation between substitution rate and homoplasy in microsatellite flanking regions mirrors the neutral evolution of microsatellites (Ellegren, 2004), which apparently extends to the nucleotide level in the flanking regions.
The clock-like behaviour of the flanking regions makes them a helpful tool for dating divergences. An additional advantage is that they provide a nuclear data partition, giving a more complete picture of species-level phylogenies than chloroplast markers alone.
These advantages are offset by the limited size of the flanking regions. Despite the high percentage of variable and parsimony-informative characters, the absolute number of these characters, which is the more relevant measure of phylogenetic utility, is comparable to that of the chloroplast markers. Wortley and Scotland (2006) refined this criterion by proposing the minimum number of parsimony-informative character-state changes as a measure of utility. For the markers in this paper, the difference between this measure and the number of parsimony-informative characters was negligible: there was a difference of only 1 between the two measures for trnLF and LMCH10.
3.5 Conclusion
To our knowledge, microsatellite flanking regions have only once before been shown to be congruent with phylogenies based on other data and subsequently used in species-level phylogenetics, viz. in cichlid fishes (Zardoya et al., 1996). Here we present the first example of the utility of flanking regions for angiosperm phylogenetics.
Our data strongly suggest that the suitability of the flanking regions for resolving species-level relationships is related to their neutral molecular evolution, as exemplified by substitution rate constancy, similar substitution rates across nucleotide positions, and the lack of saturation. Given the large number of microsatellite libraries that have been created for a broad range of species in many angiosperm genera, the potential of an approach similar to ours for discovering phylogenetically useful markers in problematic groups is high. In a survey of the use of microsatellites in population genetic studies of individual species, Squirrell et al. (2003) highlighted 66 examples in angiosperms. Across the studies considered in Table 3 of that publication, on average 17.7% of the microsatellite loci producing PCR products were monomorphic. Our study indicates that these loci may represent a large untapped source of phylogenetically informative, neutral nuclear markers for plant species-level systematics. Obviously, on the basis of our results we cannot accurately predict the phylogenetic utility of these markers in other plant groups, as little is known about the molecular evolution of microsatellite flanking regions in general. Nevertheless, their molecular evolutionary patterns are likely to resemble those in Annona, given the abundance and genome-wide distribution of microsatellites and the neutral evolution of the repeat regions (Ellegren, 2004). Because our approach can be applied wherever microsatellites have been developed, microsatellite flanking regions have the potential to become useful tools for resolving relationships among recently diverged taxa.
Acknowledgments
The authors acknowledge financial support from the Spanish Ministry of Education (Project Grants AGL2004-02290/AGR and AGL2007-60130/AGR). M.P.E. was supported by an FPI grant of the Spanish Ministry of Education.
References
Antunes, A., Gharbi, K., Alexandrino, P., Guyomard, R., 2006. Characterization of transferrin -linked microsatellites in brown trout ( Salmo trutta ) and Atlantic salmon
( Salmo salar ). Mol. Ecol. Notes 6, 547-549.
Bailey, C.D., Carr, T.G., Harris, S.A., Hughes, C.E., 2003. Characterization of angiosperm nrDNA polymorphism, paralogy, and pseudogenes. Mol. Phylogen. Evol. 29, 435-455.
Bailey, C.D., Hughes, C.E., Harris, S.A., 2004. Using RAPDs to identify DNA sequence loci for species level phylogeny reconstruction: an example from Leucaena (Fabaceae). Syst. Bot. 29, 4-14.
Baker, W.J., Hedderson, T.A., Dransfield, J., 2000. Molecular phylogenetics of subfamily Calamoideae (Palmae) based on nrDNA ITS and cpDNA rps16 intron sequence data. Mol. Phylogen. Evol. 14, 195-217.
Bromham, L., Penny, D., 2003. The modern molecular clock. Nat. Rev. Genet. 4, 216-224.
Chase, M.W., Fay, M.F., Soltis, D.E., Soltis, P.S., Takahashi, K.T., Savolainen, V., 2006. Simple phylogenetic tree searches easily "succeed" with large matrices of single genes. Taxon 55, 573-578.
Chase, M.W., Knapp, S., Cox, A.V., Clarkson, J.J., Butsko, Y., Joseph, J., Savolainen, V., Parokonny, A.S., 2003. Molecular systematics, GISH and the origin of hybrid taxa in Nicotiana (Solanaceae). Ann. Bot. 92, 107-127.
Choi, H.-K., Luckow, M., Doyle, J.J., Cook, D., 2006. Development of nuclear gene-derived molecular markers linked to legume genetic maps. Mol. Genet. Genomics 276, 56-70.
Clarkson, J.J., Knapp, S., Garcia, V.F., Olmstead, R.G., Leitch, A.R., Chase, M.W., 2004. Phylogenetic relationships in Nicotiana (Solanaceae) inferred from multiple plastid DNA regions. Mol. Phylogen. Evol. 33, 75-90.
Couvreur, T.L.P., Richardson, J.E., Sosef, M.S.M., Erkens, R.H.J., Chatrou, L.W., 2008. Evolution of syncarpy and other morphological characters in African Annonaceae: a posterior mapping approach. Mol. Phylogen. Evol. 47, 302-318.
Crawford, D.J., Mort, M.E., 2004. Single-locus molecular markers for inferring relationships at lower taxonomic levels: observations and comments. Taxon 53, 631-635.
Cronn, R.C., Small, R.L., Haselkorn, T., Wendel, J.F., 2002. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. Amer. J. Bot. 89, 707-725.
Doyle, J.J., Doyle, J.L., 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11-15.
Driskell, A.C., Ané, C., Burleigh, J.G., McMahon, M.M., O'Meara, B.C., Sanderson, M.J., 2004. Prospects for building the tree of life from large sequence databases. Science 306, 1172-1174.
Edwards, C.E., Lefkowitz, D., Soltis, D.E., Soltis, P.S., 2008. Phylogeny of Conradina and related southeastern scrub mints (Lamiaceae) based on GapC gene sequences. Int. J. Plant Sci. 169, 579-594.
Ellegren, H., 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5, 435-445.
Emshwiller, E., Doyle, J.J., 1999. Chloroplast-expressed glutamine synthetase (ncpGS): potential utility for phylogenetic studies with an example from Oxalis (Oxalidaceae). Mol. Phylogen. Evol. 12, 310-319.
Erkens, R.H.J., Chatrou, L.W., Koek-Noorman, J., Maas, J.W., Maas, P.J.M., 2007a. Classification of a large and widespread genus of Neotropical trees, Guatteria (Annonaceae) and its three satellite genera Guatteriella, Guatteriopsis and Heteropetalum. Taxon 56, 757-774.
Erkens, R.H.J., Chatrou, L.W., Maas, J.W., van der Niet, T., Savolainen, V., 2007b. A rapid diversification of rainforest trees (Guatteria; Annonaceae) following dispersal from Central into South America. Mol. Phylogen. Evol. 44, 399-411.
Escribano, P., Viruel, M.A., Hormaza, J.I., 2004. Characterization and cross-species amplification of microsatellite markers in cherimoya (Annona cherimola Mill., Annonaceae). Mol. Ecol. Notes 4, 746-748.
Farris, J.S., Källersjö, M., Kluge, A.G., Bult, C., 1995a. Constructing a significance test for incongruence. Syst. Biol. 44, 570-572.
Farris, J.S., Källersjö, M., Kluge, A.G., Bult, C., 1995b. Testing significance of incongruence. Cladistics 10, 315-319.
Felsenstein, J., 1988. Phylogenies and quantitative characters. Annu. Rev. Ecol. Syst. 19, 445-471.
Fitch, W.M., 1971. Toward defining the course of evolution: minimum change for a specified tree topology. Syst. Zool. 20, 406-416.
Fraser, L.G., McNeilage, M.A., Tsang, G.K., Harvey, C.F., De Silva, H., 2005. Cross-species amplification of microsatellite loci within the dioecious, polyploid genus Actinidia (Actinidiaceae). Theor. Appl. Genet. 112, 149-157.
Graham, S.W., Olmstead, R.G., 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Amer. J. Bot. 87, 1712-1730.
Hughes, C.E., Eastwood, R.J., Bailey, C.D., 2006. From famine to feast? Selecting nuclear DNA sequence loci for plant species-level phylogeny reconstruction. Philos. Trans. R. Soc. Lond., Ser. B: Biol. Sci. 361, 211-225.
Ingvarsson, P.K., Ribstein, S., Taylor, D.R., 2003. Molecular evolution of insertions and deletion in the chloroplast genome of Silene. Mol. Biol. Evol. 20, 1737-1740.
Matsuoka, Y., Mitchell, S.E., Kresovich, S., Goodman, M., Doebley, J., 2002. Microsatellites in Zea - variability, patterns of mutations, and use for evolutionary studies. Theor. Appl. Genet. 104, 436-450.
Mols, J.B., Gravendeel, B., Chatrou, L.W., Pirie, M.D., Bygrave, P., Chase, M.W., Kessler, P.J.A., 2004. Identifying clades in Asian Annonaceae: monophyletic genera in the polyphyletic Miliuseae. Amer. J. Bot. 91, 590-600.
Peakall, R., Gilmore, S., Keys, W., Morgante, M., Rafalski, A., 1998. Cross-species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15, 1275-1287.
Perret, M., Chautems, A., Spichiger, R., Kite, G., Savolainen, V., 2003. Systematics and evolution of tribe Sinningieae (Gesneriaceae): evidence from phylogenetic analyses of six plastid DNA regions and nuclear ncpGS. Amer. J. Bot. 90, 445-460.
Pirie, M.D., Chatrou, L.W., Erkens, R.H.J., Maas, J.W., van der Niet, T., Mols, J.B., Richardson, J.E., 2005. Phylogeny reconstruction and molecular dating in four Neotropical genera of Annonaceae: the effect of taxon sampling in age estimation. In: Bakker, F.T., Chatrou, L.W., Gravendeel, B., Pelser, P.B. (Eds.), Plant species-level systematics: new perspectives on pattern and process. A.R.G. Gantner Verlag, Ruggell, Liechtenstein, pp. 149-174.
Pirie, M.D., Chatrou, L.W., Mols, J.B., Erkens, R.H.J., Oosterhof, J., 2006. 'Andean-centred' genera in the short-branch clade of Annonaceae: testing biogeographical hypotheses using phylogeny reconstruction and molecular dating. J. Biogeogr. 33, 31-46.
Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817-818.
Primmer, C.R., Ellegren, H., 1998. Patterns of molecular evolution in avian microsatellites. Mol. Biol. Evol. 15, 997-1008.
Rainer, H., 2007. Monographic studies in the genus Annona L. (Annonaceae): Inclusion of the genus Rollinia A.St.-Hil. Ann. Naturhist. Mus. Wien, B 108, 191-205.
Richardson, J.E., Chatrou, L.W., Mols, J.B., Erkens, R.H.J., Pirie, M.D., 2004. Historical biogeography of two cosmopolitan families of flowering plants: Annonaceae and Rhamnaceae. Philos. Trans. R. Soc. Lond., Ser. B: Biol. Sci. 359, 1495-1508.
Richardson, J.E., Pennington, R.T., Pennington, T.D., Hollingsworth, P.M., 2001. Rapid diversification of a species-rich genus of Neotropical rainforest trees. Science 293, 2242-2245.
Sanderson, M.J., 2004. r8s, version 1.70. Distributed by the author, Section of Evolution and Ecology, University of California, Davis, USA.
Sanderson, M.J., Shaffer, H.B., 2002. Troubleshooting molecular phylogenetic analyses. Annu. Rev. Ecol. Syst. 33, 49-72.
Sang, T., 2002. Utility of low-copy nuclear gene sequences in plant phylogenies. Crit. Rev. Biochem. Mol. Biol. 37, 121-147.
Sang, T., Donoghue, M.J., Zhang, D., 1997. Evolution of alcohol dehydrogenase genes in peonies (Paeonia): phylogenetic relationships of putative nonhybrid species. Mol. Biol. Evol. 14, 994-1007.
Savolainen, V., Chase, M.W., Salamin, N., Soltis, D.E., Soltis, P.S., López, A.J., Fédrigo, O., Naylor, G.J.P., 2002. Phylogeny reconstruction and functional constraints in organellar genomes: plastid atpB and rbcL sequences versus animal mitochondrion. Syst. Biol. 51, 638-647.
Shaw, J., Small, R.L., 2004. Addressing the "hardest puzzle in American pomology:" Phylogeny of Prunus sect. Prunocerasus (Rosaceae) based on seven noncoding chloroplast DNA regions. Amer. J. Bot. 91, 985-996.
Simmons, M.P., Ochotorena, H., 2000. Gaps as characters in sequence-based phylogenetic analysis. Syst. Biol. 49, 369-381.
Small, R.L., Cronn, R.C., Wendel, J.F., 2004. L.A.S. Johnson review no. 2. Use of nuclear genes for phylogeny reconstruction in plants. Aust. Syst. Bot. 17, 145-170.
Squirrell, J., Hollingsworth, P.M., Woodhead, M., Russell, J., Lowe, A.J., Gibby, M., Powell, W., 2003. How much effort is required to isolate nuclear microsatellites from plants? Mol. Ecol. 12, 1339-1348.
Sunnucks, P., 2000. Efficient genetic markers for population biology. Trends Ecol. Evol. 15, 199-203.
Swofford, D.L., 2000. PAUP*. Phylogenetic Analysis Using Parsimony (* and other methods), version 4.0b10. Sinauer Associates, Sunderland (MA).
Vriesendorp, B., Bakker, F.T., 2005. Reconstructing patterns of reticulate evolution in angiosperms: what can we do? Taxon 54, 593-604.
Whittall, J.B., Medina-Marino, A., Zimmer, E.A., Hodges, S.A., 2006. Generating single-copy nuclear gene data for a recent adaptive radiation. Mol. Phylogen. Evol. 39, 124-134.
Wilson, A.C.C., Massonnet, B., Simon, J.-C., Prunier-Leterme, N., Dolatti, L., Llewellyn, K.S., Figueroa, C.C., Ramirez, C.C., Blackman, R.L., Estoup, A., Sunnucks, P., 2004. Cross-species amplification of microsatellite loci in aphids: assessment and application. Mol. Ecol. Notes 4, 104-109.
Wortley, A.H., Rudall, P.J., Harris, D.J., Scotland, R.W., 2005. How much data are needed to resolve a difficult phylogeny? Case study in Lamiales. Syst. Biol. 54, 697-709.
Wortley, A.H., Scotland, R.W., 2006. Determining the potential utility of datasets for phylogeny reconstruction. Taxon 55, 431-442.
Zardoya, R., Vollmer, D.M., Craddock, C., Streelman, J.T., Karl, S., Meyer, A., 1996. Evolutionary conservation of microsatellite flanking regions and their use in resolving the phylogeny of cichlid fishes (Pisces: Perciformes). Proc. R. Soc. Lond., Ser. B: Biol. Sci. 263, 1589-1598.
Zgurski, J.M., Rai, H.S., Fai, Q.M., Bogler, D.J., Francisco-Ortega, J., Graham, S.W., 2008. How well do we understand the overall backbone of cycad phylogeny? New insights from a large, multigene plastid data set. Mol. Phylogen. Evol. 47, 1232-1237.
Zhang, K., Rosenberg, N.A., 2007. On the genealogy of a duplicated microsatellite. Genetics 177, 2109-2122.
Table 1. Species, voucher information and GenBank (NCBI) accession numbers.
Species | Geography | Voucher | rbcL | trnLF | LMCH9 | LMCH10
Asimina angustifolia A.Gray | USA | Weerasooriya, A. s.n. (U) | DQ124939 | AY841677 | – | –
Asimina triloba (L.) Dunal | USA | Chatrou, L.W. 276 (U) | AY743441 | AY743460 | – | –
Annona amazonica R.E.Fr. | Bolivia | Chatrou, L.W. 462 (U) | EU420853 | EU420836 | EU420768 | EU420790
Annona bicolor Urb. | Mexico | Maas, P.J.M. 8381 (U) | EU420854 | EU420837 | EU420769 | EU420791
Annona cornifolia A.St.-Hil. | Bolivia | Chatrou, L.W. 343 (U) | EU420855 | – | EU420770 | EU420792
Annona cuspidata (Mart.) H.Rainer | Guyana | Jansen-Jacobs, M.J. 5957 (U) | EU420869 | EU420851 | EU420787 | EU420809
Annona deminuta R.E.Fr. | Peru | Rainer, H. 271 (WU) | EU420857 | EU420839 | EU420772 | EU420794
Annona dumetorum R.E.Fr. | Dominican Republic | Maas, P.J.M. 8374 (U) | EU420856 | EU420838 | EU420771 | EU420793
Annona glabra L. | Neotropical / African | Chatrou, L.W. 467 (U) | AY841596 | AY841673 | EU420773 | EU420795
Annona herzogii (R.E.Fr.) H.Rainer | Bolivia | Chatrou, L.W. 347 (U) | AY841656 | AY841734 | EU420788 | EU420810
Annona holosericea Saff. | Honduras | Maas, P.J.M. 8445 (U) | EU420858 | EU420840 | EU420774 | EU420796
Annona hypoglauca Mart. | Bolivia | Chatrou, L.W. 444 (U) | EU420859 | EU420841 | EU420775 | EU420797
Annona montana Macfad. | Neotropical | Chatrou, L.W. 484 (U) | EU420860 | EU420842 | EU420776 | EU420798
Annona mucosa Jacq. | Peru | Chatrou, L.W. 247 (U) | EU420870 | EU420852 | EU420789 | EU420811
Annona muricata L. | Neotropical | Chatrou, L.W. 468 (U) | AY743440 | AY743459 | EU420777 | EU420799
Annona neochrysocarpa H.Rainer | Peru | Pirie, M.D. 43 (U) | EU420868 | EU420850 | EU420786 | EU420808
Annona oligocarpa R.E.Fr. | Ecuador | Maas, P.J.M. 8522 (U) | EU420861 | EU420843 | EU420778 | EU420800
Annona pruinosa G.E.Schatz | Costa Rica | Chatrou, L.W. 77 (U) | EU420862 | EU420844 | EU420779 | EU420801
Annona reticulata L. | Bolivia | Chatrou, L.W. 290 (U) | EU420863 | EU420845 | EU420780 | EU420802
Annona scandens Diels | Bolivia | Chatrou, L.W. 365 (U) | EU420864 | EU420846 | EU420781 | EU420803
Annona senegalensis Pers. | West African | Chatrou, L.W. 469 (U) | AY841597 | AY841674 | EU420782 | EU420804
Annona squamosa L. | Curaçao | van Proosdij, A.S.J. 1133 (U) | EU420865 | EU420847 | EU420783 | EU420805
Annona symphyocarpa Sandw. | Guyana | Ek, R.C. 1270 (U) | EU420866 | EU420848 | EU420784 | EU420806
Annona urbaniana R.E.Fr. | Dominican Republic | Maas, P.J.M. 8392 (U) | EU420867 | EU420849 | EU420785 | EU420807
a This study. b Erkens et al. (2007b). c Pirie et al. (2005).
Table 2.
Statistics per marker on features of data and molecular evolution.
Marker | # nucl. chars. | # indel chars. | # and % variable chars. | # and % pars. inf. chars. | # most pars. trees | tree length | CI | RI | model | among-site rate variation (α) | LRT statistic | rate (10⁻⁹ substit. / site / year)
rbcL | 1364 | 0 | 72 / 5.3 % | 42 / 3.1 % | 24 | 103 | 0.874 | 0.913 | GTR + Γ | 0.1504 | 58.95 a | 0.3127 ± 0.2225
trnL-F | 845 | 4 | 101 / 12.0 % | 47 / 5.6 % | 66 | 142 | 0.887 | 0.898 | TIM + Γ | 0.7028 | 102.89 a | 0.8682 ± 0.5258
LMCH9 | 117 | 3 | 41 / 35.0 % | 26 / 22.2 % | 31 | 55 | 0.873 | 0.932 | K80 | ∞ | 16.37 b | 2.962 ± 1.999
LMCH10 | 233 | 8 | 69 / 29.6 % | 42 / 18.0 % | 189 | 98 | 0.898 | 0.906 | HKY | ∞ | 33.26 c | 3.065 ± 1.693
a p < 0.0001; b 0.6 < p < 0.7; c 0.03 < p < 0.04.
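The p-value classes in the footnotes of Table 2 can be checked against the LRT statistics using the upper tail of the χ² distribution. The sketch below is ours and is not the authors' procedure. It assumes that the LRT compares models with and without a molecular clock, for which the usual degrees of freedom are n − 2 for n taxa. For illustration only, it uses df = 20, i.e. a 22-taxon ingroup. The function name `chi2_sf` is hypothetical. Closed-form series for integer df are used so that only the Python standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Hypothetical helper using the closed-form series for integer degrees of
    freedom (scipy.stats.chi2.sf would give the same values).
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) (the df = 1 case) plus the terms
    # exp(-x/2) * (x/2)**(j - 1/2) / Gamma(j + 1/2) for j = 1 .. (df - 1)/2.
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


# Illustrative usage: df = 20 is an assumption (n - 2 for 22 taxa).
for lrt in (16.37, 33.26):
    print(lrt, round(chi2_sf(lrt, 20), 3))
```

With df = 20, the statistics 16.37 and 33.26 give p ≈ 0.69 and p ≈ 0.032. These fall within the footnote classes 0.6 < p < 0.7 and 0.03 < p < 0.04, respectively. A different taxon count changes df and therefore the exact p-values.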
Table 3.
Statistics of incongruence length difference tests.
Partitions | ILD p-value
chloroplast markers vs. LMCH9 | 0.0746
chloroplast markers vs. LMCH10 | 0.0856
LMCH9 vs. LMCH10 | 0.2160
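The ILD test (Farris et al., 1995a, b) compares the length of the most parsimonious tree for the combined data with the summed lengths of the most parsimonious trees for the separate partitions. It then assesses that difference by randomly repartitioning the characters. PAUP* implements this test as the partition homogeneity test. The toy sketch below illustrates the logic only. It uses exhaustive search with Fitch (1971) parsimony, which is feasible only for a handful of taxa, whereas analyses of real matrices rely on heuristic searches. The function names, the five-taxon matrix and the (k + 1)/(N + 1) p-value convention are our assumptions.

```python
import random


def rooted_topologies(taxa):
    """All rooted binary trees (nested tuples) on `taxa`, built by stepwise addition."""

    def insert_everywhere(tree, leaf):
        yield (tree, leaf)  # new leaf becomes sister to this whole subtree
        if isinstance(tree, tuple):
            left, right = tree
            for new_left in insert_everywhere(left, leaf):
                yield (new_left, right)
            for new_right in insert_everywhere(right, leaf):
                yield (left, new_right)

    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for tree in trees for new in insert_everywhere(tree, leaf)]
    return trees


def fitch_steps(tree, states):
    """Fitch (1971) minimum number of changes for one character on one tree."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch_steps(tree[0], states)
    right_set, right_cost = fitch_steps(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def length_table(matrix):
    """steps[t][c] = Fitch steps of column c on topology t (exhaustive; tiny data only)."""
    taxa = sorted(matrix)
    n_chars = len(matrix[taxa[0]])
    return [[fitch_steps(tree, {tx: matrix[tx][c] for tx in taxa})[1]
             for c in range(n_chars)]
            for tree in rooted_topologies(taxa)]


def best_length(steps, columns):
    """Length of the most parsimonious tree for a subset of columns."""
    return min(sum(row[c] for c in columns) for row in steps)


def ild_test(matrix, partition_sizes, n_reps=99, seed=1):
    """Hypothetical ILD sketch: (observed ILD, permutation p-value).

    Partitions are consecutive column blocks whose sizes sum to the matrix width.
    """
    steps = length_table(matrix)
    columns = list(range(sum(partition_sizes)))
    combined = best_length(steps, columns)  # unchanged by shuffling columns

    def ild(order):
        total, start = 0, 0
        for size in partition_sizes:
            total += best_length(steps, order[start:start + size])
            start += size
        return combined - total

    observed = ild(columns)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_reps):
        shuffled = columns[:]
        rng.shuffle(shuffled)
        if ild(shuffled) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_reps + 1)


# Hypothetical matrix: columns 0-3 group A+B, columns 4-7 group A+C (conflicting).
toy = {
    "A": "AAAATTTT",
    "B": "AAAACCCC",
    "C": "GGGGTTTT",
    "D": "GGGGCCCC",
    "E": "GGGGCCCC",
}
print(ild_test(toy, [4, 4]))
```

Because the two toy partitions support incompatible groupings, the combined tree is longer than the sum of the separate trees. Random repartitions rarely reproduce this, so the p-value is small. Non-significant values such as those in Table 3 indicate that the observed difference is within the range expected from random partitions of the characters.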
Table 4.
Effect of combining data partitions on the number of clades with bootstrap support ≥ 85 %.
Partitions | No. of clades with bootstrap support ≥ 85 %
rbcL | 10
trnLF | 6
rbcL / trnLF | 12
LMCH9 | 2
LMCH10 | 6
LMCH9 / LMCH10 | 9
rbcL / trnLF / LMCH9 | 13
trnLF / LMCH9 / LMCH10 | 12
rbcL / trnLF / LMCH9 / LMCH10 | 16
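The counts in Table 4 are numbers of clades with bootstrap support ≥ 85 %. The tally can be sketched as follows, assuming each bootstrap replicate tree has already been reduced to its set of clades, with each clade a set of taxon names. The representation, the function names and the example replicates are hypothetical. Trivial clades (single taxa and the full taxon set) should be left out of the input. Because 85 % exceeds 50 %, any two clades passing the threshold must co-occur in at least one replicate and are therefore mutually compatible.

```python
from collections import Counter
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_support(replicate_trees: List[Set[Clade]]) -> dict:
    """Proportion of bootstrap replicate trees in which each clade occurs."""
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    return {clade: n / len(replicate_trees) for clade, n in counts.items()}


def count_well_supported(replicate_trees: List[Set[Clade]], threshold: float = 0.85) -> int:
    """Number of clades whose bootstrap support is >= threshold (85 % as in Table 4)."""
    return sum(1 for s in clade_support(replicate_trees).values() if s >= threshold)


# Hypothetical example: 20 replicate trees on taxa A-E, with E as outgroup.
ab, cd, bc = frozenset("AB"), frozenset("CD"), frozenset("BC")
abc, abcd = frozenset("ABC"), frozenset("ABCD")
replicates = [{ab, cd, abcd}] * 16 + [{ab, abc, abcd}] + [{bc, abc, abcd}] * 3
print(count_well_supported(replicates))
```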
Figure legends
Figure 1. Maximum parsimony phylogram for each of the four markers. The total number of most parsimonious trees, from which the trees shown here were arbitrarily chosen, is given in Table 2. Thick grey branches indicate bootstrap support ≥ 85%, thick black branches bootstrap support between 70% and 84%. Lack of flanking region sequences for Asimina precluded outgroup rooting. For ease of comparison, trees of LMCH9 and LMCH10 have been drawn rooted at the midpoint between the clade with Annona muricata and the remainder of the ingroup species (as found in the plastid trees rooted with Asimina). Horizontal bars indicate branch lengths of five steps.
Figure 2. Single most parsimonious phylogram resulting from maximum parsimony analysis of all data (rbcL, trnLF, LMCH9 and LMCH10) combined. Thick grey branches indicate bootstrap support ≥ 85%, thick black branches bootstrap support between 70% and 84%.
Figure 3. Saturation plots, displaying corrected versus uncorrected pairwise distances.
Distances were corrected using the models of molecular evolution given in Table 2.
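Figure 3 plots model-corrected against uncorrected pairwise distances. The following sketch illustrates how one such pair of distances is obtained. It computes the uncorrected p-distance and the Kimura two-parameter (K80) distance, which is the model listed for LMCH9 in Table 2. It is a simplified, hypothetical helper that skips gapped or ambiguous sites (pairwise deletion). The corrections under HKY, TIM + Γ and GTR + Γ used for the other markers are not shown, because they require base frequencies and rate parameters estimated from the data.

```python
import math
from typing import Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def p_and_k80_distance(seq1: str, seq2: str) -> Tuple[float, float]:
    """Uncorrected p-distance and Kimura (1980) K80 distance for two aligned sequences.

    Hypothetical helper: sites with a gap or ambiguity code in either
    sequence are skipped for this pair only.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p_ts = transitions / compared
    q_tv = transversions / compared
    uncorrected = p_ts + q_tv
    arg_ts = 1.0 - 2.0 * p_ts - q_tv
    arg_tv = 1.0 - 2.0 * q_tv
    if arg_ts <= 0.0 or arg_tv <= 0.0:
        return uncorrected, math.inf  # too divergent for the K80 correction
    corrected = -0.5 * math.log(arg_ts) - 0.25 * math.log(arg_tv)
    return uncorrected, corrected


print(p_and_k80_distance("ACGTACGTAC", "GCGTACGTAC"))
```

Plotting the corrected value against the uncorrected value for all pairs of taxa gives a saturation plot. Points close to a straight line suggest little saturation. Uncorrected distances that level off while corrected distances keep increasing indicate multiple substitutions at the same sites.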