Electronic Supplementary Material (ESM)
New clues to the evolutionary history of the main European paternal lineage M269: dissection of
the Y-SNP S116 in Atlantic Europe and Iberia.
Laura Valverde, M. José Illescas, Patricia Villaescusa, Amparo M. Gotor, Ainara García, Sergio
Cardoso, Jaime Algorta, Susana Catarino, Karen Rouault, Claude Férec, Orla Hardiman, Maite
Zarrabeitia, Susana Jiménez, M. Fátima Pinheiro, Begoña M. Jarreta, Jill Olofsson, Niels Morling,
Marian M. de Pancorbo.
Eur J Hum Genet 2015
Box 1: Current controversy about the origin and expansion of maternal haplogroup H (hg H)
The most accepted theories for mitochondrial DNA (mtDNA) haplogroup H support its origin in the
Franco-Cantabrian refuge and its postglacial expansion [1].
However, the new adjustments of the mitochondrial time-scales based on complete mitochondrial
genomes have recently generated much controversy. Soares et al. [2] placed the departure of the H1
and H3 subgroups from the refuge after the last cold period, the Younger Dryas, i.e., in the early
Mesolithic. Fu et al. [3] detected a sharp increase in the population size of hg H at approximately
5,000-9,000 ybp and associated it with a Neolithic expansion. They based this on previous analyses
showing that hg H was frequent in Neolithic remains but absent from pre-Neolithic remains. This is
no longer considered true, since hg H has been detected in Upper Palaeolithic remains from the
Franco-Cantabrian refuge [4]. Interestingly, these dates would be Neolithic if the expansion
originated in Eastern Europe but Mesolithic if it originated in Western Europe. Alternatively, Pala et
al. [5] not only support the Palaeolithic origin of hg H in Western Europe, but their results also point
to a Palaeolithic entry from the east of the mtDNA haplogroups J and T, hitherto considered Neolithic.
Other studies have recalculated the mitochondrial time-scale by analysing complete mitochondrial
genomes from firmly dated Neolithic remains [6,7]. Both studies obtained much higher mutation rates
than previously estimated. This implies that all events dated so far with mtDNA are younger than
previously thought, and these authors therefore favour a Neolithic expansion of hg H. Nevertheless,
they acknowledge that their results do not rule out a postglacial expansion of hg H. Some of them had
reached that conclusion the year before in a study of the Basque population [8], whose results
suggested the presence of hg H in Basques since pre-Neolithic times, something now confirmed by
aDNA analyses in the Franco-Cantabrian refuge [4,9].
Page 1 of 18
References
1. Torroni A, Bandelt HJ, D'Urbano L et al: mtDNA analysis reveals a major late Paleolithic
population expansion from southwestern to northeastern Europe. Am J Hum Genet 1998; 62:
1137-1152.
2. Soares P, Achilli A, Semino O et al: The archaeogenetics of Europe. Curr Biol 2010; 20: R174-183.
3. Fu Q, Rudan P, Pääbo S, Krause J: Complete mitochondrial genomes reveal neolithic expansion
into Europe. PLoS One 2012; 7: e32473.
4. Hervella M, Izagirre N, Alonso S, Fregel R, Alonso A, Cabrera VM, de la Rúa C: Ancient DNA from
hunter-gatherer and farmer groups from northern Spain supports a random dispersion model
for the Neolithic expansion into Europe. PLoS One 2012; 7: e34417.
5. Pala M, Olivieri A, Achilli A et al: Mitochondrial DNA signals of late glacial recolonization of
Europe from near eastern refugia. Am J Hum Genet 2012; 90: 915-924.
6. Fu Q, Mittnik A, Johnson PL et al: A revised timescale for human evolution based on ancient
mitochondrial genomes. Curr Biol 2013; 23: 553-559.
7. Brotherton P, Haak W, Templeton J et al: Neolithic mitochondrial haplogroup H genomes and
the genetic origins of Europeans. Nat Commun 2013; 4: 1764.
8. Behar DM, Harmant C, Manry J et al: The Basque paradigm: genetic evidence of a maternal
continuity in the Franco-Cantabrian region since pre-Neolithic times. Am J Hum Genet 2012; 90:
486-493.
9. Lacan M, Keyser C, Crubézy E, Ludes B: Ancestry of modern Europeans: contributions of ancient
DNA. Cell Mol Life Sci 2013; 70: 2473-2487.
Box 2: Materials & Methods
Population
A total of 1,560 healthy, unrelated males from the Iberian Peninsula (Galicia, Asturias, Cantabria,
Basque Country, Barcelona, Alicante, Andalucía, Madrid, Portugal) and Atlantic Europe (Brittany
(Brest), Ireland, Denmark) were studied (Table S1). All participants provided written informed
consent. The procedures were in accordance with the ethical principles of the Helsinki Declaration of
1975, as revised in 2000.
Y-SNP analysis
The Y-SNP M269 was analysed using a TaqMan® predesigned assay (Applied Biosystems) for
rs9786153, following the manufacturer’s guidelines. Allelic discrimination analysis was performed
with a 7000 Real-Time PCR System (Applied Biosystems).
The Y-SNPs L11, U106, S116, U152, M529, DF27, DF19 and L238 [1,2] were analysed by high-resolution
melting (HRM). Y-SNP characteristics and the primers used to amplify each Y-SNP are shown in Table
S2. Y-SNPs were amplified with 0.5 µL of each primer (1 µM), 2.5 µL of SsoFast EvaGreen Supermix
(BioRad) and 1 ng of DNA in a final volume of 5 µL. Amplification and melting were performed in a
C1000 thermocycler equipped with a CFX96 optical module (BioRad) under the following conditions:
98°C for 10 s; 35 cycles of 98°C for 5 s and the corresponding annealing temperature (see Table S2)
for 20 s; 95°C for 30 s and 60°C for 2 min; and finally a melting step from 65°C to 95°C, increasing at
0.2°C/s, to detect the different allelic variants. Data were interpreted using Precision Melt Analysis
software (BioRad). Only high-quality amplification and melting curves with a cluster-assignment
confidence above 95% were considered. Allelic variants were assigned to each cluster using positive
and negative controls previously characterized by sequencing.
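The filtering rule above can be sketched in Python. This is an illustrative sketch only. The `MeltCall` record and the `assign_alleles` function are hypothetical, and they do not reproduce the export format or internal logic of Precision Melt Analysis. The sketch assumes that each sample has a cluster index and a confidence value (%), and that sequenced controls fix the allele of the cluster they fall in. Samples at or below the threshold, or in clusters without a control, are returned for reanalysis or sequencing.

```python
from dataclasses import dataclass


@dataclass
class MeltCall:
    """One sample in a melt-curve clustering run (hypothetical record)."""
    sample: str
    cluster: int        # cluster index assigned by the melt-analysis software
    confidence: float   # cluster-assignment confidence, in percent


def assign_alleles(calls, control_alleles, min_confidence=95.0):
    """Label clusters from sequenced controls and keep confident sample calls.

    control_alleles maps control sample IDs to their sequenced allele.
    Returns (accepted, rejected): accepted maps sample ID -> allele,
    rejected lists the samples to reanalyse or sequence.
    """
    # Each cluster takes the allele of the sequenced controls that fall in it.
    cluster_allele = {}
    for call in calls:
        if call.sample in control_alleles:
            allele = control_alleles[call.sample]
            if cluster_allele.setdefault(call.cluster, allele) != allele:
                raise ValueError(f"cluster {call.cluster} mixes controls with different alleles")

    accepted, rejected = {}, []
    for call in calls:
        if call.sample in control_alleles:
            continue
        # Keep a call only if it is above the threshold and in a control-anchored cluster.
        if call.confidence > min_confidence and call.cluster in cluster_allele:
            accepted[call.sample] = cluster_allele[call.cluster]
        else:
            rejected.append(call.sample)
    return accepted, rejected


if __name__ == "__main__":
    calls = [
        MeltCall("ctrl_derived", 1, 99.8),
        MeltCall("ctrl_ancestral", 2, 99.5),
        MeltCall("s1", 1, 98.0),
        MeltCall("s2", 2, 97.1),
        MeltCall("s3", 1, 90.0),   # below the threshold, so rejected
    ]
    controls = {"ctrl_derived": "derived", "ctrl_ancestral": "ancestral"}
    print(assign_alleles(calls, controls))
```

The comparison is strict (`>`) because the text requires a confidence *over* 95%.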
Danish males were typed for M269, S116 and U106 using custom-designed TaqMan Assays (Thermo
Fisher) and analysed on a 7900HT Fast Real-Time PCR System (Thermo Fisher).
Problematic samples were reanalysed or sequenced when necessary.
Amplifications for sequencing each Y-SNP were performed with 2.5 µL of KAPA2G™ Fast HotStart
Ready Mix (2X) (KAPA Biosystems), 0.5 µL of each primer at 1 µM (see Table S2) and 1 ng of DNA in a
final volume of 5 µL. They were run in a C1000 thermocycler (BioRad) under the same cycling
conditions described above, without the melting step. Sequencing reactions were carried out with the
BigDye Terminator v.3.1 Cycle Sequencing Kit (Applied Biosystems) following the manufacturer’s
guidelines.
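The HRM and sequencing set-ups above use the same per-reaction volumes. The final primer concentration and the volume left for DNA and water therefore follow from simple dilution arithmetic (C1V1 = C2V2). The sketch below shows this calculation. It assumes a 10% pipetting overage when scaling up to a batch of reactions. That overage and the function names are illustrative choices, not part of the published protocol.

```python
# Per-reaction volumes (µL) taken from the protocol text above.
MIX_UL = 2.5            # SsoFast EvaGreen Supermix or KAPA2G Ready Mix
PRIMER_UL = 0.5         # each of the forward and reverse primers
PRIMER_STOCK_UM = 1.0   # primer stock concentration (µM)
FINAL_UL = 5.0          # final reaction volume


def final_primer_concentration_um():
    """Final concentration of each primer in the reaction (C1*V1 = C2*V2)."""
    return PRIMER_STOCK_UM * PRIMER_UL / FINAL_UL


def dna_plus_water_ul():
    """Volume remaining for 1 ng of DNA plus water."""
    return FINAL_UL - (MIX_UL + 2 * PRIMER_UL)


def batch_volumes_ul(n_reactions, overage=0.10):
    """Reagent volumes for a batch of reactions; the 10% overage is an assumption."""
    factor = n_reactions * (1.0 + overage)
    return {
        "mix": round(MIX_UL * factor, 2),
        "forward_primer": round(PRIMER_UL * factor, 2),
        "reverse_primer": round(PRIMER_UL * factor, 2),
    }


if __name__ == "__main__":
    print(final_primer_concentration_um())  # 0.1 (µM), i.e. 100 nM
    print(dna_plus_water_ul())              # 1.5 (µL)
    print(batch_volumes_ul(96))
```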
Y-STR analysis
Individuals from the Basque Country were genotyped for a set of 17 Y-STR loci using the AmpFlSTR®
Yfiler™ kit (Applied Biosystems), following the manufacturer’s recommendations. Capillary
electrophoresis was performed on an ABI Prism 3130 Genetic Analyser, and fragment sizes were
assigned using GeneMapper® v. 4.0 software. The nomenclature follows the latest recommendations
of the DNA Commission of the International Society of Forensic Genetics, except for locus Y GATA H4,
which was named on the basis of the allelic ladder supplied with the AmpFlSTR® Yfiler™ kit.
Data analysis
The maps of haplogroup frequency distribution were constructed with Surfer v. 10.0.500 (Golden
Software) using the kriging method.
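The sketch below illustrates how kriging interpolates haplogroup frequencies at unsampled map points, using ordinary kriging in NumPy. It is not Surfer's implementation. The exponential variogram and its sill, range and nugget values are assumptions, whereas Surfer fits or defaults to its own variogram model. Coordinates are treated as planar.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=500.0, nugget=0.0):
    """Exponential semivariogram (assumed model); gamma(0) is 0 by definition."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-h / range_))
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(xy, z, targets, variogram=exponential_variogram):
    """Predict z at target points by ordinary kriging.

    xy: (n, 2) sample coordinates; z: (n,) values, e.g. haplogroup frequencies;
    targets: (m, 2) points to predict.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = len(z)

    # Kriging matrix: sample-to-sample semivariances bordered by a row and a
    # column of ones that force the weights to sum to 1 (unbiasedness).
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(dists)
    a[n, n] = 0.0

    predictions = np.empty(len(targets))
    for k, target in enumerate(targets):
        b = np.ones(n + 1)
        b[:n] = variogram(np.linalg.norm(xy - target, axis=1))
        solution = np.linalg.solve(a, b)  # weights followed by the Lagrange multiplier
        predictions[k] = solution[:n] @ z
    return predictions


if __name__ == "__main__":
    sites = [[0, 0], [100, 0], [0, 100], [100, 100]]  # hypothetical km coordinates
    freqs = [0.10, 0.20, 0.30, 0.40]                  # hypothetical frequencies
    print(ordinary_kriging(sites, freqs, [[50, 50], [0, 0]]))
```

With no nugget, ordinary kriging is an exact interpolator: a prediction at a sampled site returns the observed value. At the centre of the symmetric square above, the four weights are equal.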
The spatial genetic patterns were studied through spatial principal component analyses (sPCA),
implemented using the algorithm provided in the R software package adegenet [3-6]. The
eigenvalues of this method combine the genetic variance among populations with their spatial
autocorrelation (measured by Moran's I). The eigenvalues can be positive or negative. The most
informative components are those with the highest absolute eigenvalues, i.e., the most positive
(associated with positive spatial autocorrelation, global structure) and the most negative (associated
with negative spatial autocorrelation, local structure). A global structure implies that each sampling
location is genetically closer to its neighbours than to randomly chosen locations, as occurs with
spatial groups, clines or intermediate states. In contrast, a local structure is characterized by
stronger genetic differentiation among neighbours than among random pairs of populations.
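The sign convention above can be illustrated with Moran's I, the spatial autocorrelation index on which sPCA is built. The sketch below computes Moran's I for a single haplogroup frequency in four hypothetical populations arranged on a line, using a simple neighbour matrix. It is not a reimplementation of adegenet's sPCA, which combines Moran's I with the variance of all haplogroup frequencies at once.

```python
import numpy as np


def morans_i(values, weights):
    """Moran's I of one variable across populations.

    values: (n,) e.g. the frequency of one haplogroup per population.
    weights: (n, n) spatial weights with a zero diagonal
             (here 1 for neighbouring populations, 0 otherwise).
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    dev = x - x.mean()
    n = len(x)
    return (n / w.sum()) * (dev @ w @ dev) / (dev @ dev)


if __name__ == "__main__":
    # Four hypothetical populations along a line; each neighbours the next.
    w = np.zeros((4, 4))
    for i in range(3):
        w[i, i + 1] = w[i + 1, i] = 1.0
    print(morans_i([0.1, 0.2, 0.3, 0.4], w))  # cline: positive, about 0.33 (global structure)
    print(morans_i([0.1, 0.4, 0.1, 0.4], w))  # alternating: negative, about -1.0 (local structure)
```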
Genetic distances (Fst) between populations based on haplogroup frequencies were calculated with
Arlequin v. 3.1 [7] using 10,000 permutations. These distances were plotted in multidimensional
scaling (MDS) graphs using PAST software [8].
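The pairwise Fst calculation can be sketched as an AMOVA on haplogroup labels. Two individuals are at distance 1 if they carry different haplogroups and at distance 0 otherwise, and significance is assessed by permuting individuals between the two populations. This follows the AMOVA framework that Arlequin implements, but it has not been checked against Arlequin output. The p-value convention used here, (hits + 1) / (permutations + 1), is a common choice that may differ from Arlequin's. A Bonferroni per-test threshold for all pairwise comparisons is also included. For example, 12 populations give 66 pairs.

```python
import random
from collections import Counter


def _ssd(counts):
    """Within-group sum of squared deviations for 0/1 haplogroup distances.

    With delta = 1 between different haplogroups, SSD = (N^2 - sum n_k^2) / (2N).
    """
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return (n * n - sum(c * c for c in counts.values())) / (2 * n)


def pairwise_fst(pop_a, pop_b):
    """AMOVA-style Fst between two samples of haplogroup labels."""
    n_a, n_b = len(pop_a), len(pop_b)
    n = n_a + n_b
    counts_a, counts_b = Counter(pop_a), Counter(pop_b)
    ssd_within = _ssd(counts_a) + _ssd(counts_b)
    ssd_among = _ssd(counts_a + counts_b) - ssd_within
    df_among, df_within = 1, n - 2                # P - 1 and N - P, with P = 2
    msd_among = ssd_among / df_among
    msd_within = ssd_within / df_within
    n_coef = (n - (n_a ** 2 + n_b ** 2) / n) / df_among
    sigma2_among = (msd_among - msd_within) / n_coef
    total = sigma2_among + msd_within
    return sigma2_among / total if total > 0 else 0.0


def permutation_p_value(pop_a, pop_b, n_perm=10_000, seed=1):
    """Observed Fst and the permutation p-value (hits + 1) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = pairwise_fst(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign individuals to the two populations at random
        if pairwise_fst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def bonferroni_alpha(n_populations, alpha=0.05):
    """Per-test significance threshold for all pairwise population comparisons."""
    n_pairs = n_populations * (n_populations - 1) // 2
    return alpha / n_pairs


if __name__ == "__main__":
    pop_a = ["S116"] * 30 + ["U106"] * 10   # hypothetical samples
    pop_b = ["S116"] * 15 + ["U106"] * 25
    print(permutation_p_value(pop_a, pop_b, n_perm=1000))
    print(bonferroni_alpha(12))
```

Small negative Fst values, as returned for two identical samples, are expected from this estimator.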
The phylogenetic relationships of Y-STR haplotypes were estimated by median-joining networks using
NETWORK v. 4.5.1.6 [9]. Higher phylogenetic weight was allocated to the loci with lower mutation
rates [10,11], lower variance [VL, 12] and higher linearity [D, 13; calculated with the current range
published in YHRD, 14; Supplementary Box 4]. Coalescence times were estimated using Network
software and the recalibrated evolutionary STR mutation rate of 6.9 × 10⁻⁴ per locus per 25 years,
revised for this set of 17 Y-STRs [15,16].
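The coalescence-time calculation can be sketched with the rho statistic, the dating approach associated with Network. Rho is the mean number of mutations separating the sampled haplotypes from the root of a cluster. The standard error used below, the square root of the sum of squared descendant counts divided by the sample size, is the formula commonly paired with rho. To convert rho to years, the quoted rate (6.9 × 10⁻⁴ per locus per 25 years) is multiplied by the number of loci in the haplotype. The input format and function names are hypothetical. The 14-locus example is an assumption based on the loci retained in Supplementary Box 4, since this supplement does not state how many loci entered the dating.

```python
import math


def rho_and_sigma(descendants_per_mutation, n_samples):
    """Rho (mean mutations from the root) and its standard error.

    descendants_per_mutation: for each mutation in the cluster, the number of
    sampled individuals descending from it. Each mutation adds one step to the
    root distance of each of its descendants, hence rho = sum(n_m) / n,
    and sigma^2 = sum(n_m^2) / n^2.
    """
    rho = sum(descendants_per_mutation) / n_samples
    sigma = math.sqrt(sum(m * m for m in descendants_per_mutation)) / n_samples
    return rho, sigma


def years_per_mutation(n_loci, rate_per_locus=6.9e-4, years_per_rate=25.0):
    """Expected years between mutations in a haplotype of n_loci STRs."""
    return years_per_rate / (rate_per_locus * n_loci)


def tmrca_years(descendants_per_mutation, n_samples, n_loci):
    """Convert rho and its standard error into years."""
    rho, sigma = rho_and_sigma(descendants_per_mutation, n_samples)
    scale = years_per_mutation(n_loci)
    return rho * scale, sigma * scale


if __name__ == "__main__":
    # Hypothetical cluster: 4 individuals in two branches (one shared mutation
    # each) plus one private mutation in two of the individuals.
    age, se = tmrca_years([2, 2, 1, 1], n_samples=4, n_loci=14)
    print(f"TMRCA about {age:.0f} +/- {se:.0f} years")
```

Because the date is proportional to 1 / rate, a higher mutation rate gives proportionally younger dates for the same network. This is the effect discussed for mtDNA in Box 1.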
References
1. Rocca RA, Magoon G, Reynolds DF, Krahn T, Tilroe VO, Op den Velde Boots PM, Grierson AJ:
Discovery of Western European R1b1a2 Y chromosome variants in 1000 genomes project data:
an online community approach. PLoS One 2012; 7: e41634.
2. International Society of Genetic Genealogy 2014. Y-DNA Haplogroup Tree 2014, Version: 9.71,
Date: 21 July 2014, http://www.isogg.org/tree/.
3. Jombart T: adegenet: a R package for the multivariate analysis of genetic markers.
Bioinformatics 2008; 24: 1403-1405.
4. Jombart T, Devillard S, Dufour AB, Pontier D: Revealing cryptic spatial patterns in genetic
variability by a new multivariate method. Heredity (Edinb) 2008; 101: 92-103.
5. Montano V, Ferri G, Marcari V, Batini C, Anyaele O, Destro-Bisol G, Comas D: The Bantu
expansion revisited: a new analysis of Y chromosome variation in Central Western Africa. Mol
Ecol 2011; 20: 2693-2708.
6. R Core Team: R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria, 2013. URL http://www.R-project.org/.
7. Excoffier L, Laval G, Schneider S: Arlequin (version 3.0): an integrated software package for
population genetics data analysis. Evol Bioinform Online 2007; 1: 47-50.
8. Hammer O, Harper DAT, Ryan PD: PAST: Paleontological statistics software package for
education and data analysis. Palaeontologia Electronica 2001; 4: 9.
9. Bandelt HJ, Forster P, Röhl A: Median-joining networks for inferring intraspecific phylogenies.
Mol Biol Evol 1999; 16: 37-48.
10. Goedbloed M, Vermeulen M, Fang RN et al: Comprehensive mutation analysis of 17 Y-chromosomal short tandem repeat polymorphisms included in the AmpFlSTR Yfiler PCR
amplification kit. Int J Legal Med 2009; 123: 471-482.
11. Ballantyne KN, Goedbloed M, Fang O et al: Mutability of Y-chromosomal microsatellites: rates,
characteristics, molecular bases, and forensic implications. Am J Hum Genet 2010; 87: 341-353.
12. Kayser M, Krawczak M, Excoffier L et al: An extensive analysis of Y-chromosomal microsatellite
haplotypes in globally dispersed human populations. Am J Hum Genet 2001; 68: 990-1018.
13. Busby GB, Brisighelli F, Sánchez-Diz P et al: The peopling of Europe and the cautionary tale of Y
chromosome lineage R-M269. Proc Biol Sci 2012; 279: 884-892.
14. Willuweit S, Roewer L: International Forensic Y Chromosome User Group. Y chromosome
haplotype reference database (YHRD): update. Forensic Sci Int Genet 2007; 1: 83-87.
15. Zhivotovsky LA, Underhill PA, Cinnioğlu C et al: The effective mutation rate at Y chromosome
short tandem repeats, with application to human population-divergence time. Am J Hum Genet
2004; 74: 50-61.
16. Shi W, Ayub Q, Vermeulen M et al: A worldwide survey of human male demographic history
based on Y-SNP and Y-STR data from the HGDP-CEPH populations. Mol Biol Evol 2010; 27: 385-393.
Box 3: Special features of Basque population
The Basque region has historically been subject to genetic isolation and may therefore have
preserved ancient lineages. Moreover, because the native language of the Basque region is reflected
in surnames, indigenous individuals can be identified by their surnames [1]. In this regard, the
Basque population provides a unique opportunity to explore the oldest Y-chromosome genetic
substratum of the population. Non-native individuals, who arrived in the Basque Country during the
last century with the Industrial Revolution, can be recognized by their non-Basque surnames and
removed from the population sample for statistical purposes [2,3].
Table S1 shows how the haplogroup frequencies vary depending on whether the total Basque
population (native and non-native) or only the native population is assessed. The frequencies of
S116 and DF27 are slightly higher in the native population than in the total population. However, in
both cases, the frequencies of these haplogroups are the highest in Europe, which supports the
evolutionary inferences made about DF27 and S116 in this population.
The strong isolation of the Basque territory could, however, also have caused a loss of lineage
diversity through genetic drift or bottlenecks among the valley populations of the complex Basque
orography. Nevertheless, the analysis has shown that the haplogroup frequencies of the populations
adjacent to the Basque Country form a logical continuation of the frequency pattern. This suggests
that S116 and DF27 are indeed ancient lineages that originated in this region.
Although the native Basque population offers a special opportunity for studying potential ancient
lineages from the Franco-Cantabrian refuge (and has therefore been used for various phylogenetic
approaches), the total Basque population sample (native and non-native Basques) was used in the
statistical analyses, including population comparisons. Selecting individuals on the basis of
autochthonous surnames is difficult or impossible in other European populations, so comparing
current populations with only the native individuals of the Basque population would bias the
statistical calculations.
References
1. Valverde L, Rosique M, Köhnemann S et al: Y-STR variation in the Basque diaspora in the
Western USA: evolutionary and forensic perspectives. Int J Legal Med 2012; 126: 293-298.
2. Peña JA, Garcia-Obregon S, Perez-Miranda AM, De Pancorbo MM, Alfonso-Sanchez MA: Gene
flow in the Iberian Peninsula determined from Y-chromosome STR loci. Am J Hum Biol 2006; 18:
532-539.
3. Valverde L, Köhnemann S, Rosique M, Cardoso S, Zarrabeitia M, Pfeiffer H, de Pancorbo MM: 17
Y-STR haplotype data for a population sample of Residents in the Basque Country. Forensic Sci
Int Genet 2012; 6: e109-111.
Box 4: Settings for constructing Y-STR phylogeny by median joining networks
Native Basque individuals were selected for constructing the phylogeny, and the properties of the
Y-STRs were carefully assessed. Busby et al. [1] warned that the attributes of Y-STRs are rarely
considered in phylogenetic reconstructions and TMRCA calculations, which reduces the precision of
the results. Here, we have prioritised the Y-STRs with the highest phylogenetic value. DYS385 and
DYS389b were discarded as lacking phylogenetic value for two reasons. First, with the genotyping
method used here, a specific allele cannot be assigned to each of the two DYS385 loci. Second,
DYS389b has a very complex and repetitive structure that can give rise to several allelic variants
[2,3]. In contrast, the highest phylogenetic weight was given to DYS390, DYS392, DYS393, DYS437,
DYS438 and DYS448, which have the lowest mutation rates of the Y-STRs analysed [4,5]. DYS19,
DYS391 and DYS635 have higher mutation rates than these. They nevertheless received the same
phylogenetic weight because they show a very low variance (VL) in the Basque population [6], and
DYS635 also has a high linearity (D) [1]; these properties increase their phylogenetic value in this
population. Finally, the minimum weight of 1 was given to DYS389I, DYS439, DYS456, DYS458 and
GATA H4 because they have much higher mutation rates.
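The weighting scheme described above can be expressed as a weighted distance between two haplotypes. The sketch below only illustrates how the weight classes change a pairwise distance; it is not Network's median-joining algorithm. The numeric value of the high weight (10) is a hypothetical placeholder, because this supplement states only the minimum weight of 1. The locus key `GATA_H4` is an illustrative spelling.

```python
# Weight classes from Box 4. HIGH_WEIGHT is a hypothetical placeholder:
# the numeric value used in the study is not stated in this supplement.
HIGH_WEIGHT = 10
LOW_WEIGHT = 1

HIGH_WEIGHT_LOCI = ("DYS390", "DYS392", "DYS393", "DYS437", "DYS438", "DYS448",
                    "DYS19", "DYS391", "DYS635")
LOW_WEIGHT_LOCI = ("DYS389I", "DYS439", "DYS456", "DYS458", "GATA_H4")

LOCUS_WEIGHTS = {locus: HIGH_WEIGHT for locus in HIGH_WEIGHT_LOCI}
LOCUS_WEIGHTS.update({locus: LOW_WEIGHT for locus in LOW_WEIGHT_LOCI})
# DYS385a/b and DYS389b are excluded and therefore have no entry.


def weighted_distance(hap_a, hap_b, weights=LOCUS_WEIGHTS):
    """Weighted repeat-step distance between two Y-STR haplotypes.

    hap_a, hap_b: dicts mapping locus name -> repeat count. Each repeat-unit
    difference at a locus costs that locus's weight; loci without a weight
    are ignored.
    """
    return sum(w * abs(hap_a[locus] - hap_b[locus]) for locus, w in weights.items())


if __name__ == "__main__":
    # Hypothetical haplotypes: all weighted loci set to 10 repeats.
    base = {locus: 10 for locus in LOCUS_WEIGHTS}
    base["DYS385a"] = 11
    other = dict(base, DYS390=11, DYS456=12, DYS385a=14)
    # One step at DYS390 (weight 10), two steps at DYS456 (weight 1);
    # the DYS385a difference is ignored.
    print(weighted_distance(base, other))
```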
References
1. Busby GB, Brisighelli F, Sánchez-Diz P et al: The peopling of Europe and the cautionary tale of Y
chromosome lineage R-M269. Proc Biol Sci 2012; 279: 884-892.
2. Forster P, Röhl A, Lünnemann P, Brinkmann C, Zerjal T, Tyler-Smith C, Brinkmann B: A short
tandem repeat-based phylogeny for the human Y chromosome. Am J Hum Genet 2000; 67: 182-196.
3. Zhivotovsky LA, Underhill PA, Cinnioğlu C et al: The effective mutation rate at Y chromosome
short tandem repeats, with application to human population-divergence time. Am J Hum Genet
2004; 74: 50-61.
4. Goedbloed M, Vermeulen M, Fang RN et al: Comprehensive mutation analysis of 17 Y-chromosomal short tandem repeat polymorphisms included in the AmpFlSTR Yfiler PCR
amplification kit. Int J Legal Med 2009; 123: 471-482.
5. Ballantyne KN, Goedbloed M, Fang O et al: Mutability of Y-chromosomal microsatellites: rates,
characteristics, molecular bases, and forensic implications. Am J Hum Genet 2010; 87: 341-353.
6. Kayser M, Krawczak M, Excoffier L et al: An extensive analysis of Y-chromosomal microsatellite
haplotypes in globally dispersed human populations. Am J Hum Genet 2001; 68: 990-1018.
Box 5: Inferences combining genetic evidence found for Ychr hg R-M269 and mtDNA hg H.
The genetic evidence for hg H, the maternal-line counterpart of M269, could provide clues about the
history of M269. This must be done with caution, because non-contemporaneous histories have also
been proposed for these haplogroups [1]. Nevertheless, it seems reasonable to assume that both
haplogroups coexisted at some point in their long evolutionary histories. For example, our results
and other published data are compatible with the coexistence of the paternal S116 or DF27
haplogroups and the maternal H1 and H3 subhaplogroups in the Franco-Cantabrian refuge [2,3].
Recently, Brotherton et al. [4] have shown that some subgroups of hg H seem to have different
geographical distributions in Europe. This differential distribution of subhaplogroups mirrors that of
the subgroups of hg R, although it is too soon to determine whether some of them share a
geographic location.
Moreover, Brotherton et al. [4] analysed remains from the early, middle and late Neolithic (ENE, MNE
and LNE, respectively) and concluded that the ENE remains show genetic discontinuity with the
MNE/LNE remains. In fact, the authors report similarities between ENE remains and current
populations from Eastern Europe, and between MNE/LNE remains and current populations from
Central/SW Europe. This east-west genetic discontinuity could be interpreted as demic diffusion that
did not reach the western part of the continent. On this view, the presence of hg H in Palaeolithic
remains from the Franco-Cantabrian refuge would indicate that this haplogroup arrived in Western
Europe before the Neolithic. The Neolithic wave could have brought early farmers belonging to
subgroups of hg H that evolved independently in the East and differed from those present in Europe
in pre-Neolithic times. The demic diffusion would have been limited in extent because it was soon
superseded by cultural diffusion. Thus, the Western European Palaeolithic populations were
neolithised mainly by cultural diffusion, and the genetic substrate now predominant in Western and
Central Europe would correspond to the Palaeolithic genetic substrate.
Brotherton et al. [4] relate the dominant maternal gene pool of current Western Europe to the
expansion of the Bell Beaker culture from Iberia in the LNE, as Klyosov [5] does for Ychr. We consider
that this would also be consistent with the scenario proposed here. The Bell Beaker culture is
believed to have emerged from the megalithic cultures, and the Atlantic megalithic cultures are
believed to have arisen from the ancient inhabitants of the European Atlantic coast [6,7]. The apogee
of megalithism has been linked to the arrival of new models of social organization or even of
newcomers, which produced a sense of territoriality in the original inhabitants and led them to build
huge stone monuments. The ancient clans of hunter-gatherer-fishers who inhabited the Atlantic
coast from the Upper Palaeolithic onwards were spread across the Portuguese coast, the Cantabrian
Sea, the western and northern coasts of Europe, the islands and even the Baltic Sea coast. They are
believed to have been the source of the megalithic cultures. These ancient individuals could have
been carriers of L11 lineages, as the current maximum frequencies of L11* in these same Atlantic
territories may indicate. This would imply genetic continuity in SW Europe since Palaeolithic times,
with a minor influence of Neolithic lineages arriving from the East.
The dates of origin and expansion of the U106 and S116 subtypes that originated from these L11
individuals remain uncertain. Our calculations, which took into account all the precautions reported
so far, point to an origin and expansion at the beginning of the Holocene. This was previously
suggested by Myres et al. [8] for the Y chromosome and by Soares et al. [2] for mtDNA. It would
make sense because the improved climatic conditions would have allowed a population increase
large enough to drive the expansion of these lineages and the generation of new variability,
illustrated by the M529, U152 and DF27 sublineages. Moreover, the presence in the network of as yet
uncharacterized DF27 variability suggests that further expansion events and unknown histories
remain to be discovered.
References
1. Boattini A, Martinez-Cruz B, Sarno S et al: Uniparental markers in Italy reveal a sex-biased
genetic structure and different historical strata. PLoS One 2013; 8: e65441.
2. Soares P, Achilli A, Semino O et al: The archaeogenetics of Europe. Curr Biol 2010; 20: R174-183.
3. Cardoso S, Valverde L, Alfonso-Sánchez MA et al: The expanded mtDNA phylogeny of the Franco-Cantabrian region upholds the pre-neolithic genetic substrate of Basques. PLoS One 2013; 8:
e67835.
4. Brotherton P, Haak W, Templeton J et al: Neolithic mitochondrial haplogroup H genomes and
the genetic origins of Europeans. Nat Commun 2013; 4: 1764.
5. Klyosov A: Ancient history of the Arbins, bearers of haplogroup R1b, from central Asia to Europe,
16,000 to 1500 years before present. Advances in Anthropology 2012; 2: 87-105.
6. Fernández-Martínez VM: Prehistoria. El largo camino de la humanidad. Alianza Editorial, Madrid,
2007.
7. Barandiarán I, Martí B, Del Rincón MA, Maya JL: Prehistoria de la Península Ibérica. Editorial
Ariel, Barcelona, Spain, 2012.
8. Myres NM, Rootsi S, Lin AA et al: A major Y-chromosome haplogroup R1b Holocene era founder
effect in Central and Western Europe. Eur J Hum Genet 2011; 19: 95-101.
Supplementary Tables (collected in the Supplementary Excel File)
Table S1. Y-SNP frequencies (%) in the population samples analysed. For each haplogroup (column),
the higher the frequency, the more intense the colour. Detailed characteristics of the Basque
population sample are given below the table.
Table S2. Y-SNP characteristics, primer sequences and analysis conditions.
Table S3. 17 Y-STR and Y-SNP haplotype data from the Basque population sample.
Table S4. Genetic Fst distances based on Y-SNP haplogroup frequencies (above diagonal) and p
values (below diagonal). Statistically significant values after Bonferroni correction are shaded in blue.
Supplementary Figures
[Fig. S1 panels: Y-SNP tree (M269, U106, S116, U152, M529, L238, DF19, DF27); frequency maps of M269, U106, S116, S116 (xU152 xM529), M529 and U152]
Fig. S1. Frequency distribution maps of the data compiled in this study (blue stars) and the data from
Myres et al. (2011), Larmuseau et al. (2011) and Busby et al. (2012) (red points). Fig. S1 represents
comparisons at a lower level of tree resolution than Fig. S2, which uses only data from the present
study. This is because no higher-resolution data are available in the literature and this first
representation was intended to give a broad geographical overview of the European continent. The
Y-SNPs used to construct the Fig. S1 maps are highlighted in bold in the upper right tree.
[Fig. S2 panels: frequency maps of M269, Y (xM269), S116, S116* and DF27; Y-SNP tree (M269, U106, S116, U152, M529, L238, DF19, DF27)]
Fig. S2. Frequency distribution maps of M269, S116 and DF27 on the Atlantic coast and in the Iberian
Peninsula. The stars in the M269 map indicate the population samples analysed. The upper right tree
includes the Y-SNPs used to construct the distribution maps.
[Fig. S3 legend: Y-SNP tree (M269, L150, L11, U106, S116, U152, M529, L238, DF19, DF27)]
Fig. S3. Median joining network of the M269 haplogroup in the Basque native population (bearing
Basque surnames). The blue arrows indicate a phylogenetic split of DF27 haplogroup into two groups
bearing the alleles 14/18 and 15/19 in the Y-STR haplotype DYS437/DYS448.
[Fig. S4 legend: Y-SNP tree (M269, L150, L11, U106, S116, U152, M529, L238, DF19, DF27)]
Fig. S4. Median joining network of the total Basque population, including both native and non-native
individuals.
The network from Fig. S3 was assembled only for native individuals, with the aim of studying the
ancestral gene pool of the population, in this case M269 ancestral lineages.
It is well known that the Basque population is a genetic isolate. This may have caused a loss of
lineage diversity, which could affect the calculation of coalescence times and introduce errors into
the inferences. In addition, the native Basque sample was selected on the basis of Basque surnames.
This selection criterion could exclude part of the gene flow that occurred in recent years, which may
further reduce diversity and alter the calculations.
To check the reliability of the calculations based on the native sample, the TMRCA results from the
phylogeny in Fig. S3 (native individuals only) were compared with those from a parallel phylogeny
of the total current Basque population. This second sample comprised native and non-native males,
i.e., a random sample of current residents of the Basque Country with no selection of individuals by
surname. This phylogeny of the current Basque population is presented in Fig. S4.
The results were similar in both cases, and identical conclusions were reached with both population
samples. Thus, the inclusion of non-native individuals did not alter the structure of the S116 and
DF27 haplogroups. The blue arrows indicate the phylogenetic split of the DF27 haplogroup into two
groups bearing the alleles 14/18 and 15/19 in the Y-STR haplotype DYS437/DYS448.
The dates obtained (S116: 10,659.31 ± 1,511 YBP; DF27: 9,988 ± 1,374 YBP) were only slightly
lower, and they do not modify the prehistoric time window inferred with the native Basque
population.
These analyses demonstrate the reliability and robustness of the results obtained with the native
population sample. They also show that neither the genetic isolation of the Basque Country nor the
sampling strategy has altered the demographic inferences.
[Fig. S5 panels: sPCA maps of PC1, PC2, PC3 and PC4; Y-SNP tree (M269, U106, S116, U152, M529, L238, DF19, DF27)]
Fig. S5. Spatial PCAs based on haplogroup frequencies of the analysed populations and data
compiled from Myres et al. (2011), Larmuseau et al. (2011) and Busby et al. (2012). Here, the
resolution of the analysis is lower because S116 is not completely dissected in the literature data.
The Y-SNPs used for the analysis are marked in bold in the tree. All the components of the analysis
have positive eigenvalues (global structures). The spatial analyses of the four most representative
principal components are presented. The colour plot shows the spatial representation of the 2
principal components that explain the maximum variance. The colours make it easier to identify the
different haplogroup spatial patterns found by the analysis.
[Fig. S6 panels: bar plots of haplogroup contributions to PC1, PC2, PC3 and PC4 (M269 xL11, L11 xU106 xS116, U106, S116 xM529 xU152, M529, U152)]
Fig. S6. Contributions of the alleles to the principal components 1, 2, 3 and 4 (PC1, PC2, PC3 and PC4,
respectively) of sPCA of Fig. S5. The order of the Y-SNPs in the graph: (1) M269 (xL11), (2) L11 (xU106
xS116), (3) U106, (4) S116 (xM529 xU152), (5) M529 and (6) U152.
[Fig. S7 panels: PC1, PC2, PC3]
Fig. S7. Spatial PCAs based on haplogroup frequencies of the analysed populations. The bar plot
indicates the eigenvalue obtained for every component. Single-population scores are shown for the
components with the 2 positive eigenvalues (red; PC1 and PC2) and with a negative eigenvalue (blue;
PC3). Scores are represented by black/white squares, associated with positive/negative values,
respectively. Square size is proportional to the absolute value, indicating the degree of
differentiation.
[Fig. S8 panels: bar plots of haplogroup contributions to PC1, PC2 and PC3 (Y xM269, U106, U152, M529, L238, DF27, S116*)]
Fig. S8. Contribution of the alleles to the principal components 1, 2 and 3 (PC1, PC2 and PC3,
respectively) of the sPCA of Fig. S7. The order of the Y-SNPs in the graph: (1) xM269, (2) U106, (3)
U152, (4) M529, (5) L238, (6) DF27 and (7) S116*.
Fig. S9. Colour plot of the 2 principal positive components of the sPCA from Fig. S7. The colours
make it easier to identify the different haplogroup spatial patterns found by the analysis. In this case,
the red-orange dots identify the spatial pattern found for DF27 in Iberia, and the green dot identifies
that found for M529 in Ireland.
Fig. S10. Multidimensional scaling of genetic Fst distances calculated on the basis of Y-SNP
haplogroup frequencies. Stress 0.048.
Iberian populations appear more clustered because there are no statistically significant differences
between them. The exception is the Basque population, which differs significantly both from the rest
of Iberia (except the neighbouring Cantabrian population and the cosmopolitan cities of Madrid and
Barcelona) and from Brest and Ireland (see Table S4).
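The stress value quoted in the caption measures how well the two-dimensional plot preserves the Fst distances. As a hedged illustration, the sketch below uses classical (metric) MDS and Kruskal's stress-1. PAST's MDS may be non-metric and may report a different stress formula, so values from this code are not expected to reproduce 0.048. Setting negative Fst values to zero before the analysis, as in the usage example, is an assumption.

```python
import numpy as np


def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: coordinates from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering        # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]         # keep the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # negative eigenvalues -> 0
    return eigvecs[:, order] * scale


def kruskal_stress1(dist, coords):
    """Kruskal's stress-1 between input distances and distances in the plot."""
    d = np.asarray(dist, dtype=float)
    coords = np.asarray(coords, dtype=float)
    fitted = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    upper = np.triu_indices_from(d, k=1)
    return float(np.sqrt(((d[upper] - fitted[upper]) ** 2).sum() / (d[upper] ** 2).sum()))


if __name__ == "__main__":
    # Hypothetical pairwise Fst matrix for four populations.
    fst = np.array([[0.000, 0.020, 0.045, -0.003],
                    [0.020, 0.000, 0.030, 0.015],
                    [0.045, 0.030, 0.000, 0.050],
                    [-0.003, 0.015, 0.050, 0.000]])
    fst = np.clip(fst, 0.0, None)  # assumption: negative Fst treated as 0
    coords = classical_mds(fst)
    print(coords)
    print("stress-1:", kruskal_stress1(fst, coords))
```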