Garrick RC, Benavides E, Russello MA, Hyseni C, Edwards DL, Gibbs JP, Tapia W, Ciofi C, Caccone A (2014) Lineage fusion in Galápagos giant tortoises. Molecular Ecology (doi: xxxxxx)

Supplementary material

Methods

Previous analysis of the number of Chelonoidis becki genotypic clusters

We recently re-examined the genetic composition of C. becki tortoises (Garrick et al. 2012; Edwards et al. 2013). The number of natural genetic clusters (K) within C. becki was assessed by analyzing microsatellite data from ~1700 Wolf Volcano tortoises in STRUCTURE v2.3.3 (Pritchard et al. 2000). Following Evanno et al. (2005), we found strong support for K = 2 groups [Piedras Blancas (PBL) and Puerto Bravo (PBR) herein]. In this initial analysis, estimated membership coefficients (Q-values) of Q ≥ 0.95 were considered indicative of purebred individuals (Vähä & Primmer 2006). Minimally related representatives of PBR and PBL purebreds (23 individuals per group), as determined by analyses using KINGROUP v2.0 (Konovalov et al. 2004; full-sib reconstruction, Descending Ratio search algorithm), replaced previously selected representatives of C. becki in our archipelago-wide reference dataset (Russello et al. 2007; Poulakakis et al. 2008). This dataset now comprises 354 individuals, representing all extant and most extinct taxa of Galápagos giant tortoises. STRUCTURE analysis of the reference dataset recovers K = 12 natural groups that are sufficiently differentiated for use in genetic assignment tests (Garrick et al. 2012; Edwards et al. 2013).

Classification of Chelonoidis becki individuals

STRUCTURE was used to estimate Q-values for 841 tortoises sampled from Wolf Volcano via comparison to the reference dataset, with K = 12 set as a fixed parameter.
Previously, in order to classify individuals on the basis of their Q-values, we simulated numerous admixture scenarios for Wolf Volcano tortoises, including heterospecific crosses (i.e., between C. becki and tortoise species endemic to other islands; Russello et al. 2007; Garrick et al. 2012; Edwards et al. 2013). Here we used the same approach but focused on conspecific crosses (i.e., hybridization and backcrossing between the two C. becki lineages, PBR and PBL).

We used HYBRIDLAB v1.0 (Nielsen et al. 2006) to simulate 12-locus microsatellite datasets that could be used for characterizing the distribution of Q-values associated with five different classes of C. becki tortoise: (1) PBR purebred; (2) PBL purebred; (3) F1 hybrid; (4) PBR × F1 backcross; or (5) PBL × F1 backcross. Simulations were conditioned on the empirical microsatellite data (i.e., same number of alleles, and extent of differentiation between parental gene pools). These simulated datasets, comprising 100 individuals each, were analyzed together with the 354-individual reference dataset in STRUCTURE (run settings above, with K = 12). Following Garrick et al. (2012), we were able to distinguish between the five different classes of C. becki tortoise by jointly considering two descriptors: QR, the Q-value range within each parental cluster (i.e., purebred PBL and PBR), and QD, the Q-value difference between parental clusters (table S1). Where necessary, ambiguous assignments of Wolf Volcano tortoises were resolved following Garrick et al. (2012) by comparing empirical Q-values to the distribution of simulated QR and QD, and then selecting the best-fit hybrid class.
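As an illustration of the kind of simulation described above, the following minimal sketch generates purebred, F1, and backcross genotypes by drawing alleles from parental allele-frequency distributions. It is not HYBRIDLAB's code: the function names, the two-locus setup, and the allele frequencies are hypothetical and were chosen only to keep the example small, whereas the analysis above used 12 loci conditioned on the empirical data.

```python
import random


def draw_allele(locus_freqs, rng):
    """Draw one allele from a {allele_size: frequency} mapping."""
    alleles = list(locus_freqs)
    weights = [locus_freqs[a] for a in alleles]
    return rng.choices(alleles, weights=weights, k=1)[0]


def simulate_purebred(pool, rng):
    """Purebred: both alleles at each locus come from the same parental pool."""
    return [(draw_allele(locus, rng), draw_allele(locus, rng)) for locus in pool]


def simulate_f1(pool_a, pool_b, rng):
    """F1 hybrid: one allele from each parental pool at every locus."""
    return [(draw_allele(la, rng), draw_allele(lb, rng))
            for la, lb in zip(pool_a, pool_b)]


def simulate_backcross(f1_genotype, pool, rng):
    """Backcross: one allele transmitted at random by the F1 parent,
    the other drawn from the purebred parental pool."""
    return [(rng.choice(genotype), draw_allele(locus, rng))
            for genotype, locus in zip(f1_genotype, pool)]


# Hypothetical allele frequencies at two loci (assumption, not empirical data).
PBR = [{150: 0.7, 152: 0.3}, {200: 1.0}]
PBL = [{154: 0.6, 156: 0.4}, {204: 1.0}]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
f1 = simulate_f1(PBR, PBL, rng)
pbr_backcross = simulate_backcross(f1, PBR, rng)
pbl_purebred = simulate_purebred(PBL, rng)
```

In a workflow like the one described, many such simulated individuals per class would then be analyzed alongside the reference dataset to obtain the expected distribution of Q-values for each class.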
Genetic diversity and differentiation

Pairwise genetic distance metrics. We framed our assessment of levels of genetic differentiation between purebred PBR and PBL lineages in the context of differences observed between pairs of the following 13 Galápagos giant tortoise taxa: C. hoodensis, C. chathamensis, C. abingdoni, C. porteri, C. ephippium, C. darwini, C. vandenburghi, C. microphyes, C. guntheri, C. vicina, C. elephantopus, C. sp. nov. (Russello et al. 2005), and C. becki (represented by the PBL lineage). For nuclear microsatellites, we compared a metric based on allele frequencies alone (FST) to a related metric that also incorporates information on mutational changes in alleles (RST). This approach can help determine whether divergences occurred over relatively short timescales where genetic drift dominates, or over the longer timescales on which new alleles evolve (Pons & Petit 1996; Hardy et al. 2003). For the 13 taxa (79 interspecific pairwise comparisons), FST and RST values were calculated in GENEPOP v4.0 (Rousset 2008) and RSTCALC v2.2 (Goodman 1997), respectively. Additionally, levels of divergence between the lineages were examined on the basis of mtDNA sequences, for which we calculated uncorrected p-distances, and maximum likelihood-corrected distances (using the best-fit model identified via MODELTEST v3.0; Posada & Crandall 1998) in PAUP* v4.0b10 (Swofford 2002). Here, non-redundant haplotypes were the units of analysis; after excluding mtDNA haplotypes that generate pairwise distances of zero, the 13 taxa were represented by 78 haplotypes (2619 interspecific pairwise comparisons).

Hybridization dynamics and forward-in-time projections

To examine the potential consequences of continued hybridization among C. becki tortoises on Wolf Volcano, we compared characteristics of the present generation (G0) of C.
becki tortoises with those after one generation of random mating (G1). These characteristics included changes in the frequency of PBL vs. PBR mtDNA haplogroups and purebred vs. admixed tortoises, the degree of genetic substructure (measured via linkage disequilibrium [LD]), and levels of genetic diversity (measured via allelic richness [AR] and observed heterozygosity [HO]).

For the following analyses, we used only adult C. becki tortoises with complete genetic data (i.e., individuals for which sex had been determined, and mtDNA sequences plus nuclear microsatellite genotypes were available; N = 502 assigned individuals). First, the observed frequencies of males and females for the following eight types of tortoises were used to calculate the probability of all possible mate pairings (subscript “mt” is the mtDNA haplogroup): (1) purebred PBL + PBLmt, (2) purebred PBR + PBRmt, (3) F1 hybrid + PBLmt, (4) F1 hybrid + PBRmt, (5) PBL × F1 backcross + PBLmt, (6) PBL × F1 backcross + PBRmt, (7) PBR × F1 backcross + PBLmt, and (8) PBR × F1 backcross + PBRmt (table S4). From this pairwise matrix, we projected the next-generation frequencies of purebred tortoises that had matching mtDNA and nuclear genetic backgrounds (i.e., types 1 and 2, above; figure S4). Based on nuclear genetic data alone, we also projected next-generation frequencies of purebred PBL, purebred PBR, and hybrid (i.e., F1 plus backcross) tortoises (figure S5).

Stochasticity is associated with the particular parental pairs that may form in a single reproductive cycle that generates the next generation of offspring, as well as with the random segregation of alleles that occurs within those individuals during gamete formation. To model this, we simulated crosses between the multilocus microsatellite genotypes of randomly selected male-female pairs.
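The pairing-probability step described earlier in this section (table S4) can be sketched as follows, assuming that under random mating the probability of a given male type pairing with a given female type is the product of their sex-specific frequencies. The function name, the three tortoise types, and the counts are hypothetical and purely illustrative; the analysis above used eight types and empirical frequencies from N = 502 adults.

```python
def pairing_probabilities(male_counts, female_counts):
    """Return {(male_type, female_type): probability} under random mating.

    Each probability is the product of the male-type frequency among males
    and the female-type frequency among females.
    """
    n_males = sum(male_counts.values())
    n_females = sum(female_counts.values())
    return {
        (m, f): (male_counts[m] / n_males) * (female_counts[f] / n_females)
        for m in male_counts
        for f in female_counts
    }


# Hypothetical counts of adults per type (assumption, not empirical data).
male_counts = {"PBL": 30, "PBR": 10, "F1": 10}
female_counts = {"PBL": 20, "PBR": 20, "F1": 10}

probs = pairing_probabilities(male_counts, female_counts)

# Only purebred x purebred pairings within a lineage yield purebred offspring,
# so these cells give the projected next-generation purebred frequencies.
next_gen_purebred_pbl = probs[("PBL", "PBL")]
next_gen_purebred_pbr = probs[("PBR", "PBR")]
```

Because the cells of the matrix sum to one, summing the relevant cells gives the projected frequency of any offspring category of interest.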
We also explored the possible impacts of assumptions about effective population size (Ne) by considering two different values that, assuming a current census size (Nc) of ~8,000 tortoises on Wolf Volcano (Garrick et al. 2012), correspond with Ne : Nc ratios of 0.05 and 0.10 (i.e., 200 and 400 adults, respectively). These ratios lie within the range commonly seen in numerous wild species (Frankham 1995). To perform this modeling, subsets of adult C. becki tortoises were randomly selected to represent the gene pool of the present (G0) generation’s breeders (N = 200 and N = 400 individuals, 3 replicates each).

To quantify the level of genetic substructure that exists in the present generation, GENEPOP was used to perform tests of LD for all possible pairs of microsatellite loci using the log likelihood ratio statistic (12 loci, 66 comparisons). To measure genetic diversity in the present generation, HO was also calculated in GENEPOP. For the same data, another genetic diversity metric, AR, was calculated using rarefaction correction, with a standardized sample size of 184 diploid individuals, implemented in HP-RARE. To represent the gene pool of the next generation (G1), random mating within the subsets of breeders was simulated between random male-female pairs using HYBRIDLAB, to generate a total of 200 offspring per set (N = 200 or 400 breeders × 3 replicates each = 6 simulated datasets comprising 200 offspring). Using these multilocus genotypes, we tested for LD and calculated HO and AR.

To compare G0 vs. G1, frequency distributions of P-values for tests of LD were plotted to illustrate change in the number of locus pairs showing significant LD (figure S6). Here, locus pairs that fall within the histogram category of P < 0.05 are of greatest interest, since these represent cases of significant deviation from random association of alleles across loci (i.e., linkage equilibrium).
We also included a category of P < 0.00001, which approximates the Bonferroni-corrected critical value, owing to multiple tests being performed on the same data (figure S6). For AR and HO, we calculated the difference in these summary statistic values between G0 and G1 for each microsatellite locus, and represented the distribution of these values as box-and-whisker plots (figures S7 and S8).

Supplementary Tables

Table S1. Criteria used to classify 841 Wolf Volcano tortoises as purebreds, F1 hybrids or backcrosses, based on STRUCTURE Q-values derived from crosses simulated in HYBRIDLAB. Following Garrick et al. (2012), a combination of two criteria was used: QR and QD.

Table S2. Assessment of historical divergence between the Puerto Bravo (PBR) and Piedras Blancas (PBL) lineages of C. becki, estimated using approximate Bayesian computation. Three data subsets (1-3) were run, each without or with bottleneck events (-NB, no bottleneck; -B, with bottleneck) included. Three scenarios were compared via posterior probabilities, which identified scenario 2a as the best-fit (error associated with scenario choice relates to this model). Estimates of parameters included in the best-fit scenario are reported as medians, together with 5% and 95% quantiles (Q5 and Q95, respectively). Parameters were effective population sizes (Ne; subscripts indicate population, where “AGO” = Santiago Island and “ancestor” is no longer extant), and splitting times, in units of generations (t1, younger event; t2, older event). Bottlenecks, when included, had two parameters: duration, in units of generations (dur), and severity of the size reduction, in units of Ne (sev1, younger event; sev2, older event).

Table S3. Proportions of offspring expected to have a PBR mtDNA haplogroup sequence if mating between the two lineages of C. becki on Wolf Volcano is random. Values were calculated based on operational sex-ratios from empirical data.
Below, the two parental gene pools in each cross are represented by females (♀) and males (♂), where red indicates individuals carrying a PBR mtDNA haplogroup sequence (blue is PBL mtDNA haplogroup). Proportions of each sex are given below. Grid cells represent random pairings (initial expected proportions of offspring from each type of cross are given in grey text), of which only male-female parental pairs produce offspring (adjusted proportions are given in black text). Panel A: Purebred PBL × purebred PBL. Panel B: Purebred PBL × F1 hybrid. Panel C: Purebred PBR × F1 hybrid. The delimitation of mtDNA haplogroups is shown in figure S1.

Table S4. Probability of random male (♂) × female (♀) pairings, calculated for each of eight types of C. becki tortoises. The eight different types take into account microsatellite (msat)-based assignment [i.e., PBL purebred, PBR purebred, F1 hybrid, and PBL backcross (F1 × PBL) or PBR backcross (F1 × PBR)], and mitochondrial DNA (mtDNA) haplogroup (PBLmt or PBRmt). The column and row labeled ‘Frequency’ are based on empirical data from N = 502 adult individuals from Wolf Volcano in the present (G0) generation. The interior cells of the matrix represent the projected next-generation (G1) frequencies of offspring from each of the potential crosses, assuming random mating. Comparisons between G0 vs. G1 were used to examine the trajectory of future changes in frequencies of mtDNA haplogroup sequences, as well as purebred vs. hybrid tortoises on Wolf Volcano.

Supplementary Figures

Figure S1. Statistical parsimony network showing evolutionary relationships among mitochondrial DNA (mtDNA) sequences carried by Wolf Volcano tortoises. Each mtDNA haplotype is represented by a pie chart (labeled R-1 to R-5 and L-1 to L-5). Pie slices indicate the frequency of a given haplotype for individuals classified as purebreds, F1 hybrids, and backcrosses (five classes) on the basis of their microsatellite genotypes.
Pie chart sizes reflect overall abundance of each mtDNA haplotype. Black diamonds are hypothetical (unsampled or extinct) haplotypes, and each black line between haplotypes represents a single mutational step. The two disconnected networks are separated by a large number of mutational steps. “Native” haplotypes are found almost exclusively in C. becki tortoises from Wolf Volcano, Isabela Island, whereas “non-native” haplotypes appear to be derived from C. vandenburghi from Alcedo Volcano, Isabela Island. This analysis was performed with sequences from 800 classified individuals. Fourteen additional individuals were omitted because they carried haplotypes characteristic of species endemic to other islands (i.e., C. hoodensis from Española [N = 10], C. chathamensis [N = 2] from San Cristóbal, or C. elephantopus [N = 2] from Floreana; main text, figure 1).

Figure S2. Best-fit model of historical divergence between the Puerto Bravo (PBR) and Piedras Blancas (PBL) lineages of C. becki, estimated using approximate Bayesian computation. Point estimates are median values, averaged over three data subsets, and 90% confidence intervals are given in parentheses. Model parameters are as follows: Ne = effective population size of contemporary and ancestral lineages, and t = splitting time in units of thousands of years ago.

Figure S3. Best-fit model of historical divergence between Puerto Bravo (PBR) and Piedras Blancas (PBL) lineages of C. becki, including hypothetical bottleneck events, estimated using approximate Bayesian computation. Point estimates are median values, averaged over three data subsets, and 90% confidence intervals are given in parentheses. Model parameters are as follows: Ne = effective population size of contemporary and ancestral lineages, and t = splitting time in units of thousands of years ago.
Bottleneck events are associated with long-distance over-water colonization of Wolf Volcano from a Santiago Island ancestor, characterized by duration (median generations = 3; 90% CI: 1–5 for both) and severity (median Ne = 8 or 11; 90% CI: 2–18 or 3–19).

Figure S4. Histograms comparing the frequency of two C. becki mtDNA haplogroups in the present generation (G0) vs. projected frequencies after one generation of random mating (G1). For each timescale, frequencies of PBL and PBR haplogroups sum to one. Within columns, polka-dots indicate the proportion of a given haplogroup that occurs in purebreds with a corresponding nuclear genetic background (i.e., PBR mtDNA in a purebred PBR individual).

Figure S5. Histograms comparing the frequency of three classes of C. becki tortoises, as determined using nuclear microsatellite data (i.e., purebred PBR, purebred PBL, and hybrids), in the present generation (G0) vs. projected frequencies after one generation of random mating (G1). For each timescale, frequencies of the three classes sum to one. Within the columns representing hybrids, the proportions of F1 hybrids, PBL × F1 backcrosses, and PBR × F1 backcrosses are indicated by dark, intermediate, and light grey shading, respectively (diagonal stripes are other kinds of admixture resulting from F2 and third-generation double backcrosses).

Figure S6. Frequency distributions comparing the current level of linkage disequilibrium (LD) among microsatellite alleles of C. becki tortoises (G0; solid lines, filled circles) vs. projected LD after one generation of random mating (G1; dashed lines, open circles). Distributions represent P-values of LD tests for each pair of microsatellite loci, and x-axis labels indicate upper bounds of P-value categories. For the present (G0) generation, these were calculated using empirical genotypic data from randomly selected subsets of adult C. becki tortoises (Panel A: N = 200 individuals; Panel B: N = 400).
Forward-in-time (G1) projections are based on simulations of a single episode of male-female crosses (N = 200 offspring in both cases) using the same subsets of Ne = 200 and Ne = 400 adult tortoises chosen as breeders from the present generation.

Figure S7. Box-and-whisker plots showing projected change in allelic richness (AR) at microsatellite loci of C. becki tortoises, after one generation of random mating. For the present (G0) generation, AR was calculated using empirical genotypic data from randomly selected subsets of adult C. becki tortoises (N = 200 and N = 400 individuals). Forward-in-time (G1) projections are based on simulations of a single episode of male-female crosses (N = 200 offspring in both cases) conducted using the same subsets of Ne = 200 and Ne = 400 adult tortoises chosen as breeders from the present generation. The change in AR was calculated for each of 12 microsatellite loci as G1 minus G0 (the red line on zero marks no change in AR). On each plot, the lower and upper boundaries of the box represent 25th and 75th percentiles, respectively, and the line within the box marks the median. Upper and lower whiskers indicate the 90th and 10th percentiles; outlying data points are shown as open circles.

Figure S8. Box-and-whisker plots showing projected change in observed heterozygosity (HO) at microsatellite loci of C. becki tortoises, after one generation of random mating. For the present (G0) generation, HO was calculated using empirical genotypic data from randomly selected subsets of adult C. becki tortoises (N = 200 and N = 400 individuals). Forward-in-time (G1) projections are based on simulations of a single episode of male-female crosses (N = 200 offspring in both cases) conducted using the same subsets of Ne = 200 and Ne = 400 adult tortoises chosen as breeders from the present generation. The change in HO was calculated for each of 12 microsatellite loci as G1 minus G0 (the red line on zero marks no change in HO).
On each plot, the lower and upper boundaries of the box represent 25th and 75th percentiles, respectively, and the line within the box marks the median. Upper and lower whiskers indicate the 90th and 10th percentiles; outlying data points are shown as open circles.

Supplementary References

Edwards DL, Benavides E, Garrick RC et al. (2013) The genetic legacy of Lonesome George survives: Giant tortoises with Pinta Island ancestry identified in Galápagos. Biological Conservation, 157, 225–228.

Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Molecular Ecology, 14, 2611–2620.

Frankham R (1995) Effective population-size: adult-population size ratios in wildlife: A review. Genetical Research, 66, 95–107.

Garrick RC, Benavides E, Russello MA et al. (2012) Genetic rediscovery of an ‘extinct’ Galápagos giant tortoise species. Current Biology, 22, R10–R11.

Goodman SJ (1997) RSTCalc: A collection of computer programs for calculating estimates of genetic differentiation from microsatellite data and determining their significance. Molecular Ecology, 6, 881–885.

Hardy OJ, Charbonnel N, Fréville H, Heuertz M (2003) Microsatellite allele sizes: A simple test to assess their significance on genetic differentiation. Genetics, 163, 1467–1482.

Konovalov DA, Manning C, Henshaw MT (2004) KINGROUP: A program for pedigree relationship reconstruction and kin group assignments using genetic markers. Molecular Ecology Notes, 4, 779–782.

Nielsen EE, Bach LA, Kotlicki P (2006) HYBRIDLAB (Version 1.0): A program for generating simulated hybrids from population samples. Molecular Ecology Notes, 6, 971–973.

Pons O, Petit RJ (1996) Measuring and testing genetic differentiation with ordered versus unordered alleles. Genetics, 144, 1237–1245.

Posada D, Crandall KA (1998) MODELTEST: Testing the model of DNA substitution. Bioinformatics, 14, 817–818.

Poulakakis N, Glaberman S, Russello M et al.
(2008) Historical DNA analysis reveals living descendants of an extinct species of Galápagos tortoise. Proceedings of the National Academy of Sciences, USA, 105, 15464–15469.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945–959.

Rousset F (2008) GENEPOP’007: A complete re-implementation of the GENEPOP software for Windows and Linux. Molecular Ecology Resources, 8, 103–106.

Russello MA, Beheregaray LB, Gibbs JP et al. (2007) Lonesome George is not alone among Galápagos tortoises. Current Biology, 17, R317–R318.

Russello MA, Glaberman S, Gibbs JP et al. (2005) A cryptic taxon of Galápagos tortoise in conservation peril. Biology Letters, 1, 287–290.

Swofford DL (2002) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer, Sunderland, Massachusetts, USA.

Vähä JP, Primmer CR (2006) Efficiency of model-based Bayesian methods for detecting hybrid individuals under different hybridization scenarios and with different numbers of loci. Molecular Ecology, 15, 63–72.