Electronic Supplementary Material (ESM)

New clues to the evolutionary history of the main European paternal lineage M269: dissection of the Y-SNP S116 in Atlantic Europe and Iberia.

Laura Valverde, M. José Illescas, Patricia Villaescusa, Amparo M. Gotor, Ainara García, Sergio Cardoso, Jaime Algorta, Susana Catarino, Karen Rouault, Claude Férec, Orla Hardiman, Maite Zarrabeitia, Susana Jiménez, M. Fátima Pinheiro, Begoña M. Jarreta, Jill Olofsson, Niels Morling, Marian M. de Pancorbo.

Eur J Hum Genet 2015

Box 1: Current controversy about the origin and expansion of maternal haplogroup H (hg H)

The most widely accepted theories for mitochondrial DNA (mtDNA) haplogroup H support its origin in the Franco-Cantabrian refuge and its postglacial expansion [1]. However, the recent recalibration of mitochondrial time-scales using complete mitochondrial genomes has generated much controversy. Soares et al. [2] proposed that the H1 and H3 subgroups left the refuge after the last cold period, the Younger Dryas, i.e., in the early Mesolithic. Fu et al. [3] found a sharp increase in the population size of hg H at approximately 5,000-9,000 ybp and associated it with an expansion during the Neolithic, based on previous analyses showing that Neolithic remains have a high frequency of hg H whereas this hg was absent in pre-Neolithic remains. That absence is no longer considered true, since hg H has been detected in Upper Palaeolithic remains from the Franco-Cantabrian refuge [4]. Interestingly, these dates would be Neolithic if the expansion started from Eastern Europe but Mesolithic if it started from Western Europe. Alternatively, Pala et al. [5] not only support the Palaeolithic origin of hg H in Western Europe, but their results also point to a Palaeolithic entry from the east of the mtDNA haplogroups J and T, hitherto considered Neolithic.
Other studies have recalculated the mitochondrial timescale by analysing complete mitochondrial genomes of firmly dated Neolithic remains [6,7]. Both studies obtained much higher mutation rates than previously estimated, which implies that all events dated so far for mtDNA are younger than thought. Thus, these authors are more supportive of a Neolithic expansion for hg H. Nevertheless, the authors acknowledge that their results do not rule out the postglacial expansion of hg H, as some of them concluded the year before in a study of the Basque population [8], whose results suggested the presence of hg H in Basques since pre-Neolithic times, something now confirmed by aDNA analysis in the Franco-Cantabrian refuge [4,9].

References
1. Torroni A, Bandelt HJ, D'Urbano L et al: mtDNA analysis reveals a major late Paleolithic population expansion from southwestern to northeastern Europe. Am J Hum Genet 1998; 62: 1137-1152.
2. Soares P, Achilli A, Semino O et al: The archaeogenetics of Europe. Curr Biol 2010; 4: R174-183.
3. Fu Q, Rudan P, Pääbo S, Krause J: Complete mitochondrial genomes reveal neolithic expansion into Europe. PLoS One 2012; 7: e32473.
4. Hervella M, Izagirre N, Alonso S, Fregel R, Alonso A, Cabrera VM, de la Rúa C: Ancient DNA from hunter-gatherer and farmer groups from northern Spain supports a random dispersion model for the Neolithic expansion into Europe. PLoS One 2012; 7: e34417.
5. Pala M, Olivieri A, Achilli A et al: Mitochondrial DNA signals of late glacial recolonization of Europe from near eastern refugia. Am J Hum Genet 2012; 90: 915-924.
6. Fu Q, Mittnik A, Johnson PL et al: A revised timescale for human evolution based on ancient mitochondrial genomes. Curr Biol 2013; 23: 553-559.
7. Brotherton P, Haak W, Templeton J et al: Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans. Nat Commun 2013; 4: 1764.
8. 
Behar DM, Harmant C, Manry J et al: The Basque paradigm: genetic evidence of a maternal continuity in the Franco-Cantabrian region since pre-Neolithic times. Am J Hum Genet 2012; 90: 486-493.
9. Lacan M, Keyser C, Crubézy E, Ludes B: Ancestry of modern Europeans: contributions of ancient DNA. Cell Mol Life Sci 2013; 70: 2473-2487.

Box 2: Materials & Methods

Population
A total of 1,560 healthy, unrelated males from the Iberian Peninsula (Galicia, Asturias, Cantabria, Basque Country, Barcelona, Alicante, Andalucía, Madrid, Portugal) and Atlantic Europe (Brittany (Brest), Ireland, Denmark) were studied (Table S1). All participants provided written informed consent. The procedures were in accordance with the ethical principles of the Helsinki Declaration of 1975, as revised in 2000.

Y-SNP analysis
The Y-SNP M269 was analysed using a TaqMan® predesigned assay (Applied Biosystems) for rs9786153, following the manufacturer’s guidelines. Allelic discrimination analysis was performed with a 7000 Real-Time PCR System (Applied Biosystems). The Y-SNPs L11, U106, S116, U152, M529, DF27, DF19 and L238 [1,2] were analysed by High Resolution Melting. Y-SNP characteristics and the primers used for the amplification of each Y-SNP are shown in Table S2. Y-SNPs were amplified with 0.5 µL of each primer (1 µM), 2.5 µL of SsoFast EvaGreen Supermix (BioRad) and 1 ng of DNA in a final volume of 5 µL. Amplification and melting were done in a C1000 thermocycler equipped with a CFX96 optic module (BioRad) under the following conditions: 98°C for 10 sec; 35 cycles of 98°C for 5 sec and the corresponding annealing temperature (see Table S2) for 20 sec; 95°C for 30 sec, 60°C for 2 min and, finally, the melting cycle from 65°C to 95°C with an increase of 0.2°C/sec to detect the different allelic variants. Data interpretation was performed using Precision Melt Analysis software (BioRad). Only high-quality amplification and melting curves with a cluster assignment confidence above 95% were considered.
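As a quick check of the reaction set-up described above, the following Python sketch works through the dilution arithmetic. It assumes that the volumes are in microlitres and the primer stock is in micromolar, and that "each primer" means one forward and one reverse primer. The helper names and the 10% overage in the master-mix function are illustrative assumptions, not part of the published protocol.

```python
# Illustrative reaction-mix arithmetic for the HRM set-up described above.
# Assumptions: volumes in microlitres, primer stocks in micromolar,
# two primers (forward and reverse) per reaction.

PRIMER_VOL_UL = 0.5      # volume of each primer per reaction
PRIMER_STOCK_UM = 1.0    # primer stock concentration
SUPERMIX_VOL_UL = 2.5    # SsoFast EvaGreen Supermix per reaction
FINAL_VOL_UL = 5.0       # final reaction volume


def final_primer_conc_um():
    """Final concentration of each primer after dilution (C1*V1 = C2*V2)."""
    return PRIMER_STOCK_UM * PRIMER_VOL_UL / FINAL_VOL_UL


def remaining_volume_ul():
    """Volume left for the DNA template plus water in one reaction."""
    return FINAL_VOL_UL - SUPERMIX_VOL_UL - 2 * PRIMER_VOL_UL


def master_mix(n_reactions, overage=0.10):
    """Hypothetical helper: scale shared reagents for n reactions plus overage."""
    factor = n_reactions * (1 + overage)
    return {
        "supermix_ul": SUPERMIX_VOL_UL * factor,
        "forward_primer_ul": PRIMER_VOL_UL * factor,
        "reverse_primer_ul": PRIMER_VOL_UL * factor,
    }


if __name__ == "__main__":
    print(final_primer_conc_um())   # final primer concentration in µM
    print(remaining_volume_ul())    # µL available for template and water
    print(master_mix(96))           # reagent volumes for a 96-well plate
```

Under these assumptions each primer ends up at one tenth of its stock concentration, and 1.5 µL per reaction remains for the 1 ng of template and water.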
The allelic variant of each cluster was assigned using positive and negative controls previously characterised by sequencing. Danish males were typed for M269, S116 and U106 using custom-designed TaqMan Assays (Thermo Fisher) and analysed on a 7900HT Fast Real-Time PCR System (Thermo Fisher). Problematic samples were reanalysed or sequenced when necessary. Amplifications for the sequencing of each Y-SNP were done with 2.5 µL of KAPA2G™ Fast HotStart Ready Mix (2X) (KAPA Biosystems), 0.5 µL of each primer at 1 µM (see Table S2) and 1 ng of DNA in a final volume of 5 µL. Amplification was conducted under the same conditions as described above (without the melting cycle) in a C1000 thermocycler (BioRad). Sequencing reactions were carried out with the BigDye Terminator v.3.1 Cycle Sequencing Kit (Applied Biosystems) following the manufacturer’s guidelines.

Y-STR analysis
Individuals from the Basque Country were genotyped for a set of 17 Y-STR loci using the AmpFlSTR® Yfiler™ kit (Applied Biosystems), following the manufacturer’s recommendations. Capillary electrophoresis was performed in an ABI Prism 3130 Genetic Analyser, and fragment sizes were assigned using GeneMapper® v. 4.0 software. The nomenclature follows the latest recommendations of the DNA Commission of the International Society for Forensic Genetics, except for locus Y GATA H4, which was named on the basis of the allelic ladder supplied with the AmpFlSTR® Yfiler™ kit.

Data analysis
The maps of haplogroup frequency distribution were constructed using Surfer (Golden Software) v. 10.0.500 by the kriging method. The spatial genetic patterns were studied through spatial principal component analyses (sPCA), implemented using the algorithm provided in the R software package adegenet [3-6]. This method calculates the components based on the genetic variance between populations and their spatial autocorrelation.
The components can be positive or negative. The most informative components are those with the highest absolute eigenvalues, i.e., the most positive (associated with positive spatial autocorrelation, global structure) and the most negative (associated with negative spatial autocorrelation, local structure). A global structure implies that each sampling location is genetically closer to its neighbours than to randomly chosen locations, as occurs with spatial groups, clines or intermediate states. In contrast, a local structure is characterised by stronger genetic differentiation among neighbours than among random pairs of populations. Genetic distances (Fst) between populations based on haplogroup frequencies were calculated with the Arlequin v 3.1 software [7] with 10,000 permutations. They were plotted in Multidimensional Scaling graphs using PAST software [8]. The phylogenetic relationships of Y-STR haplotypes were estimated by median joining networks using NETWORK v 4.5.1.6 [9]. Higher phylogenetic weight was allocated to the loci with lower mutation rate [10,11], lower variance [VL, 12] and higher linearity [D, 13; calculated with the actual range published in YHRD, 14; Supplementary Box 4]. Coalescent times were estimated using Network software and the re-calibrated evolutionary STR mutation rate of 6.9 × 10⁻⁴ per locus per 25 years, revised for this set of 17 Y-STRs [15,16].

References
1. Rocca RA, Magoon G, Reynolds DF, Krahn T, Tilroe VO, Op den Velde Boots PM, Grierson AJ: Discovery of Western European R1b1a2 Y chromosome variants in 1000 genomes project data: an online community approach. PLoS One 2012; 7: e41634.
2. International Society of Genetic Genealogy 2014. Y-DNA Haplogroup Tree 2014, Version: 9.71, Date: 21 July 2014, http://www.isogg.org/tree/.
3. Jombart T: adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 2008; 24: 1403-1405.
4. 
Jombart T, Devillard S, Dufour AB, Pontier D: Revealing cryptic spatial patterns in genetic variability by a new multivariate method. Heredity (Edinb) 2008; 101: 92-103.
5. Montano V, Ferri G, Marcari V, Batini C, Anyaele O, Destro-Bisol G, Comas D: The Bantu expansion revisited: a new analysis of Y chromosome variation in Central Western Africa. Mol Ecol 2011; 20: 2693-2708.
6. R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2013. URL http://www.R-project.org/.
7. Excoffier L, Laval G, Schneider S: Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online 2007; 1: 47-50.
8. Hammer O, Harper DAT, Ryan PD: PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica 2001; 4: 9.
9. Bandelt HJ, Forster P, Röhl A: Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 1999; 16: 37-48.
10. Goedbloed M, Vermeulen M, Fang RN et al: Comprehensive mutation analysis of 17 Y-chromosomal short tandem repeat polymorphisms included in the AmpFlSTR Yfiler PCR amplification kit. Int J Legal Med 2009; 123: 471-482.
11. Ballantyne KN, Goedbloed M, Fang O et al: Mutability of Y-chromosomal microsatellites: rates, characteristics, molecular bases, and forensic implications. Am J Hum Genet 2010; 87: 341-353.
12. Kayser M, Krawczak M, Excoffier L et al: An extensive analysis of Y-chromosomal microsatellite haplotypes in globally dispersed human populations. Am J Hum Genet 2001; 68: 990-1018.
13. Busby GB, Brisighelli F, Sánchez-Diz P et al: The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269. Proc Biol Sci 2012; 279: 884-892.
14. Willuweit S, Roewer L: International Forensic Y Chromosome User Group. Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 2007; 1: 83-87.
15. 
Zhivotovsky LA, Underhill PA, Cinnioğlu C et al: The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time. Am J Hum Genet 2004; 74: 50-61.
16. Shi W, Ayub Q, Vermeulen M et al: A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations. Mol Biol Evol 2010; 27: 385-393.

Box 3: Special features of the Basque population

The Basque region has historically been subject to genetic isolation and is therefore a possible stronghold of potentially ancient lineages. Moreover, the native language of the Basque region allows indigenous people to be identified by their surnames [1], so the Basque population provides a unique opportunity to explore the oldest Y-chromosome genetic substratum of the population. Individuals who arrived in the Basque Country during the last century with the Industrial Revolution, and who are therefore non-native Basques, can be recognised by their non-Basque surnames and removed from the population sample for statistical purposes [2,3]. Table S1 shows how the haplogroup frequencies vary depending on whether the total Basque population (native and non-native) or only the native population is assessed. The frequencies of S116 and DF27 are slightly higher in the native population than in the total population. In both cases, however, the frequencies of these haplogroups are the highest in Europe, which supports the evolutionary inferences made about DF27 and S116 in this population. On the other hand, the strong isolation of the Basque territory could also have caused a loss of lineage diversity, due to phenomena such as genetic drift or bottlenecks between the populations of the valleys of the complex Basque orography.
However, the analysis has shown that the haplogroup frequencies of the populations adjacent to the Basque Country follow a logical continuation of the frequency pattern, which suggests that S116 and DF27 are indeed ancient lineages that originated in this region. Although the native Basque population offers a special opportunity for studying potential ancient lineages from the Franco-Cantabrian refuge (and has therefore been used for various phylogenetic approaches), the total Basque population sample (native and non-native Basques) has been assessed in the statistical analyses, including the population comparisons. This has been done because selecting individuals on the basis of autochthonous surnames is difficult or impossible in other European populations, so comparing current populations with only the native individuals of the Basque population would introduce bias into the statistical calculations.

References
1. Valverde L, Rosique M, Köhnemann S et al: Y-STR variation in the Basque diaspora in the Western USA: evolutionary and forensic perspectives. Int J Legal Med 2012; 126: 293-298.
2. Peña JA, Garcia-Obregon S, Perez-Miranda AM, De Pancorbo MM, Alfonso-Sanchez MA: Gene flow in the Iberian Peninsula determined from Y-chromosome STR loci. Am J Hum Biol 2006; 18: 532-539.
3. Valverde L, Köhnemann S, Rosique M, Cardoso S, Zarrabeitia M, Pfeiffer H, de Pancorbo MM: 17 Y-STR haplotype data for a population sample of Residents in the Basque Country. Forensic Sci Int Genet 2012; 6: e109-111.

Box 4: Settings for constructing Y-STR phylogeny by median joining networks

For constructing the phylogeny, native Basque individuals were selected. Moreover, the properties of the Y-STRs were carefully assessed. Busby et al. [1] warned that the attributes of the Y-STRs are rarely considered in phylogenetic reconstructions and calculations of TMRCA, which reduces the precision of the results. Here, we have prioritised the Y-STRs with higher phylogenetic weight.
Thus, DYS385 and DYS389b were discarded for lacking phylogenetic interest: it is not possible to assign a specific allele to each locus of DYS385 with the genotyping method used here, and DYS389b has a very complex and repetitive structure and may therefore have several allelic variants [2,3]. In contrast, we gave the highest phylogenetic weight to the Y-STRs DYS390, DYS392, DYS393, DYS437, DYS438 and DYS448 because they have the lowest mutation rates of the Y-STRs analysed [4,5]. The Y-STRs DYS19, DYS391 and DYS635 have higher mutation rates than the previous group, but the same phylogenetic weight was applied because they have a very low variance VL in the Basque population [6], and DYS635 also has a high linearity D [1], which gives them greater phylogenetic weight in this population. Finally, a minimum weight of 1 was given to the Y-STRs DYS389I, DYS439, DYS456, DYS458 and GATA H4 because they have much higher mutation rates.

References
1. Busby GB, Brisighelli F, Sánchez-Diz P et al: The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269. Proc Biol Sci 2012; 279: 884-892.
2. Forster P, Röhl A, Lünnemann P, Brinkmann C, Zerjal T, Tyler-Smith C, Brinkmann B: A short tandem repeat-based phylogeny for the human Y chromosome. Am J Hum Genet 2000; 67: 182-196.
3. Zhivotovsky LA, Underhill PA, Cinnioğlu C et al: The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time. Am J Hum Genet 2004; 74: 50-61.
4. Goedbloed M, Vermeulen M, Fang RN et al: Comprehensive mutation analysis of 17 Y-chromosomal short tandem repeat polymorphisms included in the AmpFlSTR Yfiler PCR amplification kit. Int J Legal Med 2009; 123: 471-482.
5. Ballantyne KN, Goedbloed M, Fang O et al: Mutability of Y-chromosomal microsatellites: rates, characteristics, molecular bases, and forensic implications. Am J Hum Genet 2010; 87: 341-353.
6. 
Kayser M, Krawczak M, Excoffier L et al: An extensive analysis of Y-chromosomal microsatellite haplotypes in globally dispersed human populations. Am J Hum Genet 2001; 68: 990-1018.

Box 5: Inferences combining genetic evidence found for Ychr hg R-M269 and mtDNA hg H

The genetic evidence found for hg H, the sister haplogroup of M269 in the maternal line, could provide clues about the history of M269, although it must be treated with caution because non-contemporaneous histories have also been proposed for these haplogroups [1]. However, it seems reasonable to consider that both haplogroups have coexisted at some point in their long evolutionary trees. For example, our results and other published data would allow for the coexistence of the paternal S116 or DF27 haplogroups and the maternal H1 and H3 haplogroups in the Franco-Cantabrian refuge [2,3]. Recently, Brotherton et al. [4] have shown that some subgroups of hg H seem to have different geographical locations in Europe. This differential distribution of subhaplogroups resembles the distribution of the subgroups of hg R, although it is too soon to determine whether some of them share a geographic location. Moreover, Brotherton et al. [4] analysed remains from the early, mid and late Neolithic (ENE, MNE and LNE, respectively) and concluded that the remains from the ENE show genetic discontinuity with the MNE/LNE remains. In fact, the authors report similarities between ENE remains and current populations from Eastern Europe and between MNE/LNE remains and current Central/SW Europe, respectively. This east-west genetic discontinuity could be interpreted as demic diffusion not reaching the western part of the continent. That is, the presence of hg H in Palaeolithic remains of the Franco-Cantabrian refuge would indicate the arrival of this haplogroup in Western Europe before the Neolithic.
The Neolithic wave could have brought early farmers belonging to subgroups of hg H that evolved independently in the East and differed from those present in Europe in pre-Neolithic times. The demic diffusion would have been limited in extent because it was soon superseded by cultural diffusion. Thus, the Western European Palaeolithic populations were neolithised mainly by cultural diffusion, and the genetic substrate now predominant in Western and Central Europe would correspond to the Palaeolithic genetic substrate. Brotherton et al. [4] relate the dominant maternal gene pool of current Western Europe to the expansion of the Neolithic Bell Beaker culture from Iberia in the LNE, as Klyosov [5] does for Ychr. We consider that this would also be consistent with the scenario proposed here. The Bell Beaker culture is believed to have emerged from the megalithic cultures, and the Atlantic megalithic cultures are believed to have arisen from the ancient inhabitants of the European Atlantic coast [6,7]. The apogee of megalithism has been linked to the arrival of new models of social organization or even of newcomers, which produced a sense of territoriality in the original inhabitants. This led them to build huge stone monuments. The ancient clans of hunter-gatherer-fishers, who inhabited the Atlantic coast from the Upper Palaeolithic, were spread across the Portuguese coast, the Cantabrian Sea, the western and northern coasts of Europe, the islands and even the Baltic Sea coast. It is believed that they were the source of the megalithic cultures. These ancient individuals could have been carriers of L11 lineages. Evidence of this could be the current maximum frequencies of L11* in these same Atlantic territories. This would imply genetic continuity in SW Europe from Palaeolithic times, with a minor influence of Neolithic lineages that arrived from the East. The dates of origin and expansion of the U106 and S116 subtypes that originated from these L11 individuals remain uncertain.
Our calculations, which were made taking into account all the precautions reported so far, point to an origin and expansion at the beginning of the Holocene, as suggested previously by Myres et al. [8] and Soares et al. [2] for mtDNA. This would make sense because the improved weather conditions would have led to a population explosion large enough to allow expansion and the generation of new variability, illustrated by the M529, U152 and DF27 sublineages. In addition, the presence in the network of still undescribed DF27 variability suggests that more expansion events and unknown histories may remain to be discovered.

References
1. Boattini A, Martinez-Cruz B, Sarno S et al: Uniparental markers in Italy reveal a sex-biased genetic structure and different historical strata. PLoS One 2013; 8: e65441.
2. Soares P, Achilli A, Semino O et al: The archaeogenetics of Europe. Curr Biol 2010; 4: R174-183.
3. Cardoso S, Valverde L, Alfonso-Sánchez MA et al: The expanded mtDNA phylogeny of the Franco-Cantabrian region upholds the pre-neolithic genetic substrate of Basques. PLoS One 2013; 8: e67835.
4. Brotherton P, Haak W, Templeton J et al: Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans. Nat Commun 2013; 4: 1764.
5. Klyosov A: Ancient history of the Arbins, bearers of haplogroup R1b, from central Asia to Europe, 16,000 to 1500 years before present. Advances in Anthropology 2012; 2: 87-105.
6. Fernández-Martínez VM: Prehistoria. El largo camino de la humanidad. Alianza Editorial, Madrid, 2007.
7. Barandiarán I, Martí B, Del Rincón MA, Maya JL: Prehistoria de la Península Ibérica. Editorial Ariel, Barcelona, Spain, 2012.
8. Myres NM, Rootsi S, Lin AA et al: A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe. Eur J Hum Genet 2011; 19: 95-101.

Supplementary Tables (collected in the Supplementary Excel File)

Table S1. Y-SNP frequencies (%) in the analysed population samples. 
For each haplogroup/column, the higher the frequency, the more intense the colour. The detailed characteristics of the Basque population sample are given below the frequencies.
Table S2. Y-SNP characteristics, primer sequences and analysis conditions.
Table S3. 17 Y-STR and Y-SNP haplotype data from the Basque population sample.
Table S4. Genetic Fst distances based on Y-SNP haplogroup frequencies (above diagonal) and p values (below diagonal). Statistically significant values after Bonferroni correction are shaded in blue.

Supplementary Figures

[Figure S1 image. Tree labels: M269, U106, S116, U152, M529, L238, DF19, DF27. Map panels: M269, U106, S116, S116 (xU152 xM529), M529, U152.]
Fig. S1. Frequency distribution maps of the data compiled in this study (blue stars) and the data from Myres et al. (2011), Larmuseau et al. (2011) and Busby et al. (2012) (red points). Fig. S1 presents the comparisons at a lower level of tree resolution than Fig. S2 (which uses data exclusively from the present study), because no higher-resolution data are available in the literature and this first representation was intended as a broad geographical overview of the European continent. The Y-SNPs used to construct the Fig. S1 maps are highlighted in bold in the upper right tree.

[Figure S2 image. Tree labels: M269, U106, S116, U152, M529, L238, DF19, DF27. Other labels: M269, YxM269, S116, S116*, DF27.]
Fig. S2. Frequency distribution maps of M269, S116 and DF27 on the Atlantic coast and in the Iberian Peninsula. The stars in the M269 map indicate the population samples analysed. The upper right tree includes the Y-SNPs used for constructing the distribution maps.

[Figure S3 image. Tree labels: M269, L150, L11, U106, S116, U152, M529, L238, DF19, DF27.]
Fig. S3. Median joining network of the M269 haplogroup in the native Basque population (bearing Basque surnames). The blue arrows indicate a phylogenetic split of the DF27 haplogroup into two groups bearing the alleles 14/18 and 15/19 in the Y-STR haplotype DYS437/DYS448.

[Figure S4 image. Tree labels: M269, L150, L11, U106, S116, U152, M529, L238, DF19, DF27.]
Fig. S4. 
Median joining network of the total Basque population, including both native and non-native individuals. The network in Fig. S3 was assembled only for native individuals, with the aim of studying the ancestral gene pool of the population, in this case the M269 ancestral lineages. It is well known that the Basque population is a genetic isolate. This may have caused a loss of lineage diversity, which may affect the calculation of coalescence times and introduce errors in the inferences. In addition, the native Basque sample was selected on the basis of Basque surnames. This selection could remove part of the gene flow that has occurred in recent years, which may further reduce the diversity and alter the calculations. To ensure the reliability of the calculations done with the native population sample, the TMRCA results were compared between the phylogeny constructed in Fig. S3 (including only native individuals) and a parallel phylogeny constructed with the total current Basque population (native and non-native males, i.e. a random sample of the current population of the Basque Country without selection of individuals by surname). This phylogeny of the current Basque population is presented in Fig. S4. The results in both cases were similar, and identical conclusions were reached with both sets of population samples. So, the inclusion of non-native individuals did not alter the structure of the S116 and DF27 haplogroups. The blue arrows indicate the phylogenetic split of the DF27 haplogroup into two groups bearing the alleles 14/18 and 15/19 in the Y-STR haplotype DYS437/DYS448. The dates obtained (S116: 10659.31 ± 1511 YBP; DF27: 9988 ± 1374 YBP) were only slightly lower, and they do not modify the prehistoric time window inferred with the native Basque population.
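For orientation, age estimates of this kind are commonly derived in median joining networks from the rho statistic, the average number of mutational steps separating the sampled haplotypes from the inferred founder, which is then converted to years with the mutation rate. The Python sketch below assumes that approach, the rate of 6.9 × 10⁻⁴ per locus per 25 years quoted in the Methods, and 14 loci (the 17 Yfiler loci without DYS385a/b and DYS389b). The step counts are invented for illustration and are not data from the study, and the error term uses the star-like approximation rather than the full network-based variance.

```python
import math

# Assumed inputs (see the lead-in): rate per locus per 25-year generation
# and the number of loci contributing to the haplotype.
RATE_PER_LOCUS_PER_GEN = 6.9e-4
YEARS_PER_GENERATION = 25
N_LOCI = 14  # assumption: 17 Yfiler loci minus DYS385a/b and DYS389b


def years_per_mutation(n_loci=N_LOCI):
    """Average number of years for one mutation to arise in the haplotype."""
    return YEARS_PER_GENERATION / (RATE_PER_LOCUS_PER_GEN * n_loci)


def rho_age(steps_to_founder, n_loci=N_LOCI):
    """Return (age, standard error) in years from per-sample step counts.

    rho is the mean number of mutational steps to the founder haplotype.
    The error uses the star-like approximation sigma = sqrt(rho / n),
    which real networks only approximate.
    """
    n = len(steps_to_founder)
    rho = sum(steps_to_founder) / n
    sigma = math.sqrt(rho / n)
    scale = years_per_mutation(n_loci)
    return rho * scale, sigma * scale


if __name__ == "__main__":
    # Hypothetical step counts for eight haplotypes (not data from the study).
    steps = [3, 4, 5, 4, 2, 6, 4, 4]
    age, se = rho_age(steps)
    print(f"{age:.0f} +/- {se:.0f} years")
```

Under these assumptions one mutational step corresponds to roughly 2,600 years, so small changes in rho, in the number of loci counted or in the rate shift the estimate by millennia.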
These analyses demonstrate the reliability and robustness of the results obtained in the native population sample and indicate that neither the genetic isolation of the Basque Country nor the sampling strategy has altered the demographic inferences.

[Figure S5 image. Panels: PC1, PC2, PC3, PC4. Tree labels: M269, U106, S116, U152, M529, L238, DF19, DF27.]
Fig. S5. Spatial PCAs based on haplogroup frequencies of the analysed populations and data compiled from Myres et al. (2011), Larmuseau et al. (2011) and Busby et al. (2012). Here, the level of resolution of the analysis is lower because S116 is not completely dissected in the literature data. The Y-SNPs used for the analysis are marked in bold in the tree. All the components of the analysis have positive eigenvalues (global structures). The spatial analyses of the 4 most representative principal components are presented. The colour plot corresponds to the spatial representation of the 2 principal components that explain the maximum variance. The colours make it easier to identify the different haplogroup spatial patterns found by the analysis.

[Figure S6 image. Panels: PC1, PC2, PC3, PC4. Labels in each panel: M269 xL11, L11 xU106 xS116, U106, S116 xM529 xU152, M529, U152.]
Fig. S6. Contributions of the alleles to the principal components 1, 2, 3 and 4 (PC1, PC2, PC3 and PC4, respectively) of the sPCA of Fig. S5. The order of the Y-SNPs in the graph: (1) M269 (xL11), (2) L11 (xU106 xS116), (3) U106, (4) S116 (xM529 xU152), (5) M529 and (6) U152.

[Figure S7 image. Panels: PC1, PC2, PC3.]
Fig. S7. Spatial PCAs based on haplogroup frequencies of the analysed populations. The bar plot indicates the eigenvalues obtained for every component. Single-population scores of the 2 components with positive eigenvalues (red; PC1 and PC2) and the component with a negative eigenvalue (blue; PC3) are represented with black/white squares, associated with positive/negative values, respectively. 
Square size is proportional to the absolute value, indicating the degree of differentiation.

[Figure S8 image. Panels: PC1, PC2, PC3. Labels in each panel: YxM269, U106, U152, M529, L238, DF27, S116*.]
Fig. S8. Contribution of the alleles to the principal components 1, 2 and 3 (PC1, PC2 and PC3, respectively) of the sPCA of Fig. S7. The order of the Y-SNPs in the graph: (1) xM269, (2) U106, (3) U152, (4) M529, (5) L238, (6) DF27 and (7) S116*.

Fig. S9. Colour plot of the 2 principal positive components of the sPCA from Fig. S7. The colours make it easier to identify the different haplogroup spatial patterns found by the analysis. In this case, the red-orange dots identify the spatial pattern found for DF27 in Iberia, and the green dot identifies the pattern for M529 in Ireland.

Fig. S10. Multidimensional scaling of genetic Fst distances calculated on the basis of Y-SNP haplogroup frequencies. Stress 0.048. Iberian populations cluster more tightly because there are no statistically significant differences between them, with the exception of the Basque population, which differs significantly both from the rest of Iberia (except for the neighbouring Cantabria population and the cosmopolitan cities Madrid and Barcelona) and from Brest and Ireland (see Table S4).