file - BioMed Central

Additional file 14
Phylogenetic analysis (Antirrhinum matrix)
Phylogenetic analyses of 190 concatenated trnS-trnG/trnK-matK sequences were
conducted using Bayesian inference (BI), maximum likelihood (ML) and maximum
parsimony (MP). Bayesian analyses were also performed on each plastid matrix
separately to examine congruence between the plastid gene trees. Gambelia speciosa and
Misopates orontium were selected as outgroups based on previous phylogenetic
evidence [1], and gaps were treated as missing data. The MP analysis was performed in
TNT 1.1 [2] using a heuristic search of 10,000 replicates, saving two most-parsimonious
trees per replicate, followed by a second heuristic search that retained all best trees
and used the trees obtained in the first 10,000 replicates as starting trees.
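The quantity that a parsimony search minimizes can be illustrated with Fitch's algorithm, which counts the minimum number of changes for one character on a fixed binary tree. The sketch below is a hypothetical illustration, not TNT's implementation: trees are nested tuples, the taxon names and sequences are invented, gap and ambiguity codes ('?', '-', 'N') are assumed to be treated as missing data, and the tree search itself is not shown.

```python
from typing import Dict, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

MISSING = set("?-N")  # codes treated as missing data (assumption)


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[set, int]:
    """Fitch state set and minimum number of changes for one character."""
    if isinstance(tree, str):
        state = states[tree]
        # Missing data is compatible with any nucleotide and costs nothing.
        return (set("ACGT") if state in MISSING else {state}), 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length of a fixed tree: Fitch changes summed over columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical four-taxon example; '-' is a gap coded as missing.
alignment = {"a": "AC", "b": "AT", "c": "GT", "d": "G-"}
tree = (("a", "b"), ("c", "d"))
print(tree_length(tree, alignment))  # 2
```

A heuristic search such as the one run in TNT repeatedly rearranges the tree and keeps topologies with the lowest total length computed in this way.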
Bootstrap support (MP-BS) for clades was assessed with 1000 standard replicates. For
the ML and BI analyses, the best-fitting model of sequence evolution, balancing fit
against the number of parameters, was determined under the Akaike Information
Criterion (AIC) in jModelTest 0.1.1 [3]. The General Time Reversible model with a
proportion of invariable sites and gamma-distributed rate heterogeneity (GTR+I+G)
was selected for both plastid DNA regions.
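AIC model choice reduces to computing AIC = 2k − 2 lnL for each candidate model and keeping the lowest value. The sketch below assumes invented log-likelihoods for three models; k counts only free substitution-model parameters, since branch lengths add the same number of parameters to every model when the tree is fixed and therefore do not change the ranking.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion, AIC = 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: dict) -> str:
    """Return the model name with the lowest AIC from {name: (lnL, k)}."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical log-likelihoods, for illustration only.
fits = {
    "JC": (-5400.0, 0),        # equal rates and base frequencies
    "HKY+G": (-5250.0, 5),     # kappa, 3 base frequencies, gamma shape
    "GTR+I+G": (-5230.0, 10),  # 5 exchangeabilities, 3 frequencies, pinv, gamma shape
}
print(best_model(fits))  # GTR+I+G
```

With these made-up values the extra parameters of GTR+I+G are justified because the gain in log-likelihood outweighs the penalty of 2 per parameter.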
ML analyses were run in PhyML 3.0 [4] with 500 non-parametric bootstrap replicates
(ML-BS). BI was performed in MrBayes v3.1.2 [5] with two independent runs using
identical settings, each of 10 million generations and sampled every 1000 generations.
Chain convergence was assessed with Tracer 1.5 [6]. After discarding the first 10% of
generations as burn-in, a 50% majority-rule consensus tree with Bayesian posterior
probabilities (PP) of clades was calculated with the sumt command to yield the final
Bayesian estimate of phylogeny. Trees were visualized using FigTree 1.3.1 [7].
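The posterior probability of a clade is the fraction of retained (post-burn-in) trees, pooled over runs, that contain it, and the 50% majority-rule consensus keeps clades with PP above 0.5. A minimal sketch, assuming a simplified representation in which each sampled tree is a set of clades and each clade a frozenset of terminal names; MrBayes' sumt also summarizes branch lengths, which is not shown.

```python
from collections import Counter


def clade_frequencies(runs, burnin_fraction=0.1):
    """Fraction of post-burn-in trees (pooled over runs) containing each clade.

    Each run is a list of sampled trees; each tree is a set of clades, and
    each clade is a frozenset of terminal names (simplified representation).
    """
    kept = [tree for run in runs
            for tree in run[int(len(run) * burnin_fraction):]]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(runs, burnin_fraction=0.1):
    """Clades found in more than 50% of the retained trees, with their PP."""
    return {clade: pp
            for clade, pp in clade_frequencies(runs, burnin_fraction).items()
            if pp > 0.5}


# Hypothetical samples from two runs of ten trees each.
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
run1 = [{AC}] + [{AB, CD}] * 9   # first sample falls in the burn-in
run2 = [{AB}] * 5 + [{AC}] * 5
print(majority_rule([run1, run2]))
```

Here 18 trees are retained; clade AB appears in 13 of them (PP = 13/18) and is kept, whereas CD (exactly 0.5) and AC are not. Applied with burnin_fraction=0 to the trees from bootstrap replicates, the same counting gives bootstrap support values such as MP-BS and ML-BS.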
Ancestral area reconstructions (Antirrhinum matrix)
A discrete phylogeographic analysis (DPA) using standard MCMC sampling, as
implemented in BEAST [8], was performed to assess the probability distribution of
geographic locations at each node of the maximum clade credibility tree. A total of 14
discrete areas were delimited: (i) the four Iberian quadrants (northeastern Iberia, NE;
northwestern Iberia, NW; southeastern Iberia, SE; southwestern Iberia, SW), divided by
the coordinates 40ºN/5ºW [see 9]; (ii) the Eastern, Central and Western Pyrenees, the
three recognized biogeographic regions within the Pyrenees (see below); (iii) two other
northern areas sampled near the Pyrenees (Southern French basin and Southwestern Alps);
and (iv) the remaining five regions sampled across the Mediterranean basin (Morocco,
Sicily, Sardinia, Italy and Turkey). Statistical support for dispersal rates between areas
was assessed with Bayes factor (BF) tests, as described by Lemey et al. [8].
Dispersal rates were allowed to be zero with some probability under Bayesian
stochastic search variable selection (BSSVS). The analysis consisted of two
independent runs of 100 million generations each, sampled every 10,000 generations.
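Under BSSVS, each dispersal rate has an indicator, and its Bayes factor compares the posterior odds that the indicator is on (p, the fraction of MCMC samples in which it equals 1) with the prior odds. A minimal sketch, assuming symmetric rates and the prior described by Lemey et al. [8], under which the prior inclusion probability of each rate is q = (ln 2 + K - 1) / (K(K - 1)/2) for K areas; the function names are hypothetical, and this reproduces only the arithmetic, not the RateIndicatorBF tool.

```python
import math


def prior_inclusion(k_areas: int) -> float:
    """Prior probability q that a given symmetric rate is included.

    Assumes a prior expectation of ln(2) + K - 1 non-zero rates among the
    K(K-1)/2 symmetric rates.
    """
    return (math.log(2) + k_areas - 1) / (k_areas * (k_areas - 1) / 2)


def rate_bayes_factor(posterior_inclusion: float, k_areas: int) -> float:
    """BF = posterior odds / prior odds that a rate indicator equals 1."""
    q = prior_inclusion(k_areas)
    p = posterior_inclusion  # fraction of MCMC samples with the indicator on
    if p >= 1.0:
        return math.inf  # indicator on in every sample; BF is unbounded
    return (p / (1 - p)) / (q / (1 - q))


K = 14  # number of discrete areas in the Antirrhinum analysis
for p in (0.3, 0.5, 0.9):
    bf = rate_bayes_factor(p, K)
    print(f"p = {p:.1f}  BF = {bf:.2f}  well supported: {bf > 3}")
```

With 14 areas q is about 0.15, so an indicator switched on in half of the samples already exceeds the BF > 3 threshold, whereas one switched on in 30% of the samples does not.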
Chain convergence was examined in Tracer 1.5 [10]. The two runs were combined in
LogCombiner 1.6.2 after discarding the first 10% of sampled generations as burn-in. A
maximum clade credibility (MCC) chronogram, i.e. the sampled tree with the maximum
sum of clade credibilities, was obtained with TreeAnnotator v1.6.2 and visualized in
FigTree 1.3.1 [7]. Well-supported dispersal rates (BF > 3) were visualized in Google
Earth using the RateIndicatorBF tool included in the BEAST code.
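The MCC criterion can be sketched as follows: discard burn-in from each run, pool the remaining trees, compute each clade's credibility (its frequency in the pooled sample), and keep the sampled tree whose clade credibilities sum highest. The sketch assumes trees represented as sets of clades (frozensets of terminal names) and invented samples; TreeAnnotator additionally annotates node heights, which is not shown.

```python
from collections import Counter


def mcc_tree(runs, burnin_fraction=0.1):
    """Pick the sampled tree with the maximum sum of clade credibilities.

    Burn-in is removed from each run before pooling; trees are sets of
    clades and clades are frozensets of terminal names.
    """
    pooled = [tree for run in runs
              for tree in run[int(len(run) * burnin_fraction):]]
    counts = Counter(clade for tree in pooled for clade in tree)
    n = len(pooled)

    def score(tree):
        # Clade credibility = fraction of pooled trees containing the clade.
        return sum(counts[clade] / n for clade in tree)

    best = max(pooled, key=score)
    return best, score(best)


# Hypothetical samples from two runs of ten trees each.
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
runs = [[{AC}] + [{AB, CD}] * 9, [{AB}] * 5 + [{AC}] * 5]
tree, credibility = mcc_tree(runs)
print(sorted(map(sorted, tree)), round(credibility, 3))
```

Unlike a majority-rule consensus, the MCC tree is always one of the sampled trees, which is why it can carry node ages directly as a chronogram.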
Additional ancestral range reconstructions were conducted on the Bayesian
time-calibrated molecular phylogeny to discriminate between a northern and a
southern origin of Pyrenean lineages. For this purpose, only four areas were delimited:
(i) the Iberian Peninsula, (ii) the Pyrenees and adjacent areas, (iii) the South-western
Alps, and (iv) the rest of the Mediterranean basin (excluding Iberia). Ancestral ranges
were allowed to include all four areas. Distribution ranges of sequences (haplotypes),
rather than of species, were used [see 11]. Two alternative reconstruction methods were
applied: (a) statistical dispersal-vicariance analysis (S-DIVA), implemented in RASP 1.1
[12], which extends the parsimony-based DIVA approach [13] by determining the
probability of each ancestral range at each node while accounting for the uncertainty of
the Bayesian phylogenetic analysis [14]; and (b) dispersal-extinction-cladogenesis
analysis (DEC), implemented in Lagrange v2.0.1 [15], a parametric likelihood-based
approach that estimates the most likely geographic ranges inherited by the two daughter
lineages following a speciation event. Whereas the first method estimates the ancestral
state at a node, the second estimates the states of the branches emanating from it. For
the S-DIVA analysis we followed the method of Harris & Xiang [16]: two hundred trees
were randomly sampled from the BEAST run after the burn-in period, and the MCC tree
(after pruning the outgroup taxa) was used as the final tree.
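The statistical part of S-DIVA can be pictured as averaging, over the sampled trees, how often each ancestral range is reconstructed at a given node. The sketch below is a simplified illustration of that averaging idea only: the DIVA optimization is not shown, the area codes (I, P, A, M) are hypothetical, equally optimal alternatives within a tree are assumed to share that tree's weight equally, and averaging is assumed to run over the trees that contain the node. RASP's exact procedure may differ.

```python
from collections import defaultdict


def range_frequencies(node_reconstructions):
    """Average frequency of each ancestral range at one node over sampled trees.

    node_reconstructions holds one entry per sampled tree that contains the
    node; each entry lists the equally optimal DIVA ranges found at that node
    (frozensets of area codes). Alternatives within a tree share its weight
    equally (assumed weighting scheme).
    """
    totals = defaultdict(float)
    for alternatives in node_reconstructions:
        for area_range in alternatives:
            totals[area_range] += 1.0 / len(alternatives)
    n_trees = len(node_reconstructions)
    return {r: w / n_trees for r, w in totals.items()}


# Hypothetical area codes: I = Iberia, P = Pyrenees and adjacent areas,
# A = South-western Alps, M = rest of the Mediterranean basin.
IP, P = frozenset("IP"), frozenset("P")
print(range_frequencies([[IP], [IP, P], [P]]))
```

In this toy case the node is reconstructed as IP in one tree, as P in another, and ambiguously in the third, so each range ends up with a frequency of 0.5.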
For the DEC analysis we used the pruned MCC tree, with symmetric dispersal between all
pairs of areas and dispersal rates held constant through time.
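Symmetric, time-constant dispersal means that a single rate matrix, built from one dispersal rate d and one extinction rate e, governs range change along every branch. A minimal sketch of the anagenetic part of such a DEC matrix for four areas, following our reading of Ree & Smith [15]: a range expands to a new area at rate d per occupied area, and loses an area at rate e. The area codes and rate values are hypothetical, the null (empty) range is omitted for simplicity, and the cladogenetic inheritance scenarios are not shown.

```python
from itertools import combinations


def all_ranges(areas):
    """All non-empty ranges; ancestors may occupy every area at once."""
    return [frozenset(c) for k in range(1, len(areas) + 1)
            for c in combinations(areas, k)]


def dec_rate_matrix(areas, d, e):
    """Anagenetic DEC rate matrix under symmetric, time-constant dispersal.

    Expansion r -> r + {j}: rate d for each occupied area that can disperse
    to j, i.e. d * |r|. Local extinction r -> r - {i}: rate e, allowed only
    when |r| > 1 (the null range is left out of this simplified sketch).
    """
    ranges = all_ranges(areas)
    index = {r: i for i, r in enumerate(ranges)}
    q = [[0.0] * len(ranges) for _ in ranges]
    for r, i in index.items():
        for j in areas:
            if j not in r:
                q[i][index[r | {j}]] += d * len(r)
        if len(r) > 1:
            for a in r:
                q[i][index[r - {a}]] += e
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return ranges, q


AREAS = ("I", "P", "A", "M")  # hypothetical codes for the four areas
ranges, q = dec_rate_matrix(AREAS, d=0.1, e=0.05)
print(len(ranges))  # 15
```

Transition probabilities along a branch would then follow from the matrix exponential of q multiplied by the branch duration.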
Genetic diversity and geographic structure (Pyrenees matrix)
Genetic diversity was analysed across the three recognized biogeographic regions into
which the Pyrenees are divided (Eastern, Central and Western Pyrenees; see Fig. 3a).
The boundaries of these three biogeographic areas have traditionally been established,
with slight differences, by both geologists [17, 18] and phytogeographers [19-24] on the
basis of geological, climatic and floristic data. Haplotype frequencies and molecular
diversity indices for each biogeographic area were calculated using DnaSP v5 [25]. In
addition, to identify potential hotspots of genetic diversity across the Pyrenees,
individuals were grouped geographically using a 10 × 10 km grid. Charts of haplotype
frequencies were constructed for each grid cell, and each cell was labelled with a
letter–number code (Fig. 3).
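Two widely used molecular diversity indices are haplotype diversity, Hd = n/(n − 1) (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sequences, and nucleotide diversity π, the mean number of pairwise differences per site. A minimal sketch on invented sequences, assuming that haplotypes are identical aligned strings and that columns containing gaps or missing data are excluded from π; DnaSP's own options and the full set of indices it reports are not reproduced.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity, Hd = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    freqs = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in freqs.values()))


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi), excluding gap/missing columns."""
    cols = [i for i in range(len(seqs[0]))
            if all(s[i] not in "-?N" for s in seqs)]
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return diffs / len(pairs) / len(cols)


# Hypothetical sample of four sequences from one biogeographic area.
seqs = ["AAAA", "AAAA", "AAAT", "AATT"]
print(haplotype_diversity(seqs), nucleotide_diversity(seqs))
```

For this sample, three haplotypes (frequencies 0.5, 0.25, 0.25) give Hd = 5/6, and 7 differences over 6 pairs and 4 sites give π = 7/24.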
To infer spatial genetic structure we used a Bayesian model-based approach
implemented in BAPS version 5.3 [26, 27]. This software assigns genotypes to
genetically structured groups (K) and can account for the dependence due to linkage
between sites within aligned sequences. Five replicate runs were conducted for each
maximum number of groups (Kmax = 5, 10 and 20) to determine the optimal number of
genetically homogeneous groups. The 'clustering of groups with linked loci' analysis was
chosen, with groups defined as the sampled natural populations. Admixture analyses [26]
were run with 100 iterations to estimate admixture coefficients for individuals, 200
reference individuals from each population and 20 iterations to estimate admixture
coefficients for the reference individuals.
To identify genetic subdivisions among the Eastern, Central and Western Pyrenees, we
performed an analysis of molecular variance (AMOVA) [28], which partitions haplotype
variation within and among groups. Pairwise FST statistics were also calculated to
estimate genetic distances. Both analyses were performed in ARLEQUIN [29]. An
additional AMOVA was performed to assess the partitioning of variance between Lineage
E (distributed mainly in the eastern part of the Pyrenees) and the remaining lineages
(see below).
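The core of an AMOVA is a partition of sums of squared distances between haplotypes into among- and within-population components, from which Φ_ST is the among-population share of the total variance. A minimal sketch of a single-level design (populations only), assuming the number of pairwise differences as the squared distance; applied to two populations it gives a pairwise Φ_ST analogous to a pairwise FST. The hierarchical design with groups (Φ_CT), the permutation tests and ARLEQUIN's distance options are not reproduced, and the example data are invented.

```python
from itertools import combinations


def differences(a, b):
    """Number of differing sites, used as the squared distance (assumption)."""
    return sum(x != y for x, y in zip(a, b))


def phi_st(populations):
    """Single-level AMOVA Phi_ST from lists of aligned haplotype sequences.

    Undefined (division by zero) if all sequences are identical.
    """
    seqs = [s for pop in populations for s in pop]
    n_total, n_pops = len(seqs), len(populations)

    def ssd(group):
        # Sum of squared distances over unordered pairs, divided by group size.
        return sum(differences(a, b) for a, b in combinations(group, 2)) / len(group)

    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(seqs) - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # n0 corrects for unequal population sizes.
    n0 = (n_total - sum(len(p) ** 2 for p in populations) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    return var_among / (var_among + msd_within)


# Hypothetical pairs of populations: fully differentiated vs. identical composition.
print(phi_st([["AA", "AA"], ["AT", "AT"]]))
print(phi_st([["AA", "AT"], ["AA", "AT"]]))
```

The first pair of populations shares no haplotypes and yields Φ_ST = 1; the second has identical composition and yields a negative value, which AMOVA estimators can produce when there is no differentiation.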
To identify the optimal grouping of the sampled sites without a priori assumptions, a
spatial analysis of molecular variance (SAMOVA), implemented in SAMOVA 1.0 [30], was
also performed. This analysis uses a simulated annealing approach, based on genetic and
geographical data, to identify groups of related populations. The program was run for
K = 2 to 20 groups from 100 initial conditions, and the most likely structure was taken
to be the one with the highest FCT (the proportion of genetic variation among groups of
populations), excluding configurations that contained any single-population group.
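The selection rule for the SAMOVA output can be written as a small filter: discard configurations that contain a group made of a single population, then take the K with the highest FCT. A minimal sketch, assuming the results have been collected as a mapping from K to (FCT, group sizes); the function name and the values below are hypothetical, and the simulated annealing search itself is not shown.

```python
def pick_structure(results):
    """Choose K with the highest F_CT among configurations without singleton groups.

    results maps K -> (F_CT, list of group sizes in numbers of populations).
    Raises ValueError if every configuration contains a single-population group.
    """
    eligible = {k: fct for k, (fct, sizes) in results.items() if min(sizes) > 1}
    return max(eligible, key=eligible.get)


# Hypothetical SAMOVA output, for illustration only.
results = {
    2: (0.40, [10, 8]),
    3: (0.55, [7, 6, 5]),
    4: (0.58, [7, 6, 4, 1]),  # contains a single-population group: excluded
}
print(pick_structure(results))  # 3
```

Excluding singleton groups guards against the tendency of FCT to keep rising as K grows simply by isolating individual populations.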
References
1.
Vargas P, Rosselló JA, Oyama R, Güemes J: Molecular evidence for
naturalness of genera in the tribe Antirrhineae (Scrophulariaceae) and
three independent evolutionary lineages from the New World and the Old.
Plant Systematics and Evolution 2004, 249(3-4):151-172.
2.
Goloboff PA, Farris JS, Nixon KC: TNT, a free program for phylogenetic
analysis. Cladistics 2008, 24(5):774-786.
3.
Posada D: jModelTest: Phylogenetic model averaging. Molecular Biology and
Evolution 2008, 25(7):1253-1256.
4.
Guindon S, Gascuel O: A Simple, Fast, and Accurate Algorithm to Estimate
Large Phylogenies by Maximum Likelihood. Systematic Biology 2003,
52(5):696-704.
5.
Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference
under mixed models. Bioinformatics 2003, 19(12):1572-1574.
6.
Rambaut A, Drummond A: Tracer version 1.5. Available at
http://beast.bio.ed.ac.uk/Tracer; 2009.
7.
Rambaut A: FigTree version 1.3.1. Available at
http://tree.bio.ed.ac.uk/software/figtree/; 2009.
8.
Lemey P, Rambaut A, Drummond AJ, Suchard MA: Bayesian phylogeography
finds its roots. PLoS Computational Biology 2009, 5(9).
9.
Vargas P, Carrió E, Guzmán B, Amat E, Güemes J: A geographical pattern of
Antirrhinum (Scrophulariaceae) speciation since the Pliocene based on
plastid and nuclear DNA polymorphisms. Journal of Biogeography 2009,
36(7):1297-1312.
10.
Drummond AJ, Rambaut A: BEAST: Bayesian evolutionary analysis by
sampling trees. BMC Evolutionary Biology 2007, 7(1).
11.
Fernández-Mazuecos M, Vargas P: Historical isolation versus recent long-distance
connections between Europe and Africa in bifid toadflaxes (Linaria
sect. Versicolores). PLoS ONE 2011, 6(7).
12.
Yu Y, Harris A, He X: RASP (Reconstruct Ancestral State in Phylogenies) 2.0
beta. Available at http://mnh.scu.edu.cn/soft/blog/RASP; 2011.
13.
Ronquist F: Dispersal-Vicariance Analysis: A New Approach to the
Quantification of Historical Biogeography. Systematic Biology 1997,
46(1):195-203.
14.
Nylander JA, Olsson U, Alström P, Sanmartín I: Accounting for phylogenetic
uncertainty in biogeography: a Bayesian approach to dispersal-vicariance
analysis of the thrushes (Aves: Turdus). Systematic Biology 2008, 57(2):257-268.
15.
Ree RH, Smith SA: Maximum likelihood inference of geographic range
evolution by dispersal, local extinction, and cladogenesis. Systematic Biology
2008, 57(1):4-14.
16.
Harris AJ, Xiang QY: Estimating ancestral distributions of lineages with
uncertain sister groups: A statistical approach to dispersal-vicariance
analysis and a case using Aesculus L. (Sapindaceae) including fossils.
Journal of Systematics and Evolution 2009, 47(5):349-368.
17.
Souquet P, Bilotte M, Canerot J, Debroas E, Peybernés B, Rey J: Nouvelle
interprétation de la structure des Pyrénées. Comptes Rendus de l’Academie
des Sciences de Paris 1975, 281:609-612.
18.
Souquet P, Mediavilla F: Nouvelle hypothèse sur la formation des Pyrénées.
CR Acad Sci 1976, 282:2139-2142.
19.
Rivas Martínez S: Memoria del mapa de series de vegetación de España
1:400.000. ICONA, Madrid; 1987.
20.
Rivas Martínez S, Báscones Carretero JC, Díaz González TE, Fernández
González F, Loidi Arregui J: Vegetación del Pirineo occidental y Navarra: VI
Excursión Internacional de Fitosociología (AEFA). Itinera Geobotanica
1991, 5:5-456.
21.
Montserrat P: L'exploration floristique des Pyrénées occidentales (The
floristic exploration of the western Pyrenees). Bol Soc Brot 1974, 47:227-241.
22.
Villar L: La vegetación del Pirineo occidental: estudio de Geobotánica
Ecológica. In: Príncipe de Viana Suplemento de Ciencias. 1982: 263-434.
23.
Vigo J, Ninot J: Los Pirineos. In: La vegetación de España. Edited by Peinado
Lorea M, Rivas Martinez S: Col. Aula Abierta. Publ. Univ. Alcalá de Henares;
1987: 351-384.
24.
Izard M: Le climat. In: La Végétation des Pyrénees. Edited by Dupias G.
Editions du CNRS, Paris; 1985: 17-36.
25.
Librado P, Rozas J: DnaSP v5: A software for comprehensive analysis of
DNA polymorphism data. Bioinformatics 2009, 25(11):1451-1452.
26.
Corander J, Marttinen P, Sirén J, Tang J: Enhanced Bayesian modelling in
BAPS software for learning genetic structures of populations. BMC
Bioinformatics 2008, 9:539.
27.
Corander J, Tang J: Bayesian analysis of population structure based on
linked molecular information. Mathematical Biosciences 2007, 205(1):19-31.
28.
Excoffier L, Smouse PE, Quattro JM: Analysis of molecular variance inferred
from metric distances among DNA haplotypes: Application to human
mitochondrial DNA restriction data. Genetics 1992, 131(2):479-491.
29.
Excoffier L, Laval G, Schneider S: Arlequin (version 3.0): an integrated
software package for population genetics data analysis. Evolutionary
Bioinformatics Online 2005, 1:47.
30.
Dupanloup I, Schneider S, Excoffier L: A simulated annealing approach to
define the genetic structure of populations. Molecular Ecology 2002,
11(12):2571-2581.