Biol Invasions
https://doi.org/10.1007/s10530-024-03386-3

ORIGINAL PAPER

New evidence contradicts the rapid spread of invasive genes into a threatened native species

Benjamin M. Fitzpatrick · Evan McCartney-Melstad · Jarrett R. Johnson · H. Bradley Shaffer

Received: 1 February 2024 / Accepted: 24 June 2024
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2024

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s10530-024-03386-3.

B. M. Fitzpatrick (*) Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA e-mail: benfitz@utk.edu
E. McCartney-Melstad · H. B. Shaffer Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA
E. McCartney-Melstad · H. B. Shaffer La Kretz Center for California Conservation Science, Institute of the Environment and Sustainability, University of California, Los Angeles, CA 90095, USA
Present Address: E. McCartney-Melstad Nutcracker Therapeutics, 5858 Horton Street, Suite 540, Emeryville, CA 94608, USA

Abstract When introduced species hybridize with native relatives, spread of advantageous invasive genes into native populations (introgression) is a conservation concern. Genome-scale SNP (single nucleotide polymorphism) analysis can be a powerful approach to detect hybridization and identify candidate loci experiencing selection in hybrid zones. However, followup studies are critical to verify and interpret potentially impactful patterns of introgression. In an earlier publication we identified three outlier loci (out of 68 unlinked SNPs) where non-native alleles appeared to have introgressed 90 km into the range of a threatened native salamander, while the other 65 markers showed no evidence of spread further than 12 km.
This was consistent with strong selection favoring a few invasive traits, but our inferences necessarily depended on limited reference samples of the native species. Here, we further tested our initial interpretation by interrogating the outlier markers in greater detail. First, we isolated DNA from two museum specimens of native salamanders collected several decades before the introduction. Both had the putatively invasive SNPs, indicating that the SNP alleles were present before the introduction and therefore not diagnostic for nonnative ancestry. Second, we developed a novel genealogical analysis of DNA sequences (rather than SNPs) to infer allelic ancestry, since genealogical analysis of haplotypes minimizes the ancestry assignment errors that can occur with SNPs. When applied to the original loci, this analysis confirmed that the genotypes formerly interpreted as ‘superinvasive’ are native variants, and non-native alleles remain limited to areas near the original introduction sites.

Keywords Hybridization · Gene flow · Tiger salamander · Ascertainment bias

J. R. Johnson Department of Biology, Western Kentucky University, Bowling Green, KY 42103, USA

Introduction

When introduced and native species interbreed, biologists and conservation managers need to track and understand the impacts of non-native genes in native populations and communities (Rhymer and Simberloff 1996; Allendorf et al. 2001; Fitzpatrick et al. 2015; Todesco et al. 2016; Wayne and Shaffer 2016; Draper et al. 2021). Of particular interest is the potential for rapid spread of invasive alleles owing to strong selection arising from ecological or sexual advantages (Petit 2004; Chatfield et al. 2010; Lipshutz et al. 2019).
Such patterns can be detected using outlier analysis of geographic or genomic clines if ancestry-informative markers can be characterized with sufficient accuracy from reference samples of the parental species (e.g., Gompert and Buerkle 2011; DeVos et al. 2023). The California tiger salamander, Ambystoma californiense, is one of the most intensely analyzed systems involving hybridization between an endangered native and a closely related introduced congener (e.g., Riley et al. 2003; Ryan et al. 2009; Johnson et al. 2011; Searcy et al. 2016; Cooper and Shaffer 2023). Previously, we presented evidence that 3/68 analyzed SNP markers had introgressed from introduced barred tiger salamanders deep into the native range of the California tiger salamander (Fitzpatrick et al. 2010). Because A. californiense is listed as Threatened/Endangered under both the US and California Endangered Species Acts, the correct identification of hybrid populations has major repercussions for both mitigation actions and species recovery efforts, and our earlier results suggested that strongly selected non-native alleles were somehow sweeping across native A. californiense populations. The key to any analysis of hybridization is accurate estimation of allele frequency differences between the parental populations. However, doing so in natural populations is often trickier than it first appears. At a minimum, correctly inferring the native/non-native ancestry of any marker depends on genetic data from reference populations that include a comprehensive inventory of variation in both species (Fitzpatrick 2012; Pereira et al. 2020). If important reference populations have already been invaded, and therefore cannot be used as reference data, then it might be impossible to accurately estimate pre-invasion allele frequencies and natural patterns of geographic variation.
As a result, outlier loci interpreted as showing exceptional patterns of introgression might instead be artifacts of inadequately sampled reference populations (Fig. 1). This inference problem cannot be resolved by simply using a larger number of markers from across the genome. While genome-wide data might lead to more precise estimates of overall ancestry proportions at the individual or population level, analyses and inferences of outlier loci depend on assumptions about allelic ancestry at particular candidate loci. If those assumptions are wrong, and a native allele has been mistakenly inferred to be derived from another species, no amount of additional genomic data will reveal the mistake. Rather, analysis of more loci would tend to enhance the outlier status of loci with unusual patterns of variation. Therefore, follow-up studies must investigate individual putative outlier loci in greater detail. Here, we investigate putative outliers discovered in our SNP analysis of hybridization between the endangered native salamander Ambystoma californiense and its invasive congener Ambystoma mavortium in central California. The epicenter of the original introduction is well-established and not controversial. During the 1950’s, large scale barred tiger salamander (A. mavortium) introductions were made for the emerging fishing-bait industry in and around Salinas, Monterey County, California (Riley et al. 2003). Extensive earlier work established that the hybrid area is largely restricted to the Salinas Valley and adjacent valleys in the coast range of central California with many populations reaching levels of non-native allele frequencies exceeding 50% across the genome (Fitzpatrick and Shaffer 2007b). It is important to emphasize that the reality of this introduction is not in question. We have confirmed it verbally with one of the individuals who brought in the non-native salamanders (Riley et al. 
2003), and we identified and talked to many of the land-owners where the bait dealers established the initial sites of introduction (Fitzpatrick and Shaffer 2007b). We have also discovered multiple populations consisting entirely of non-native genotypes within and outside of the range of A. californiense in California (Johnson et al. 2011).

Fig. 1 Hypothetical example illustrating the challenge of ascertaining the ancestry of alleles when the original natural configuration A is unknown. Reliance on reference samples far removed from a known region of introgression (e.g., above the dashed line in B) can result in bias against detecting natural variation (the haplotypes shaded purple) in reference samples of the native species and therefore misinterpretation of that variation when discovered in the area of hybridization (below the dashed line in B). This error can be avoided if the true genealogical relationships of haplotypes can be inferred (C). Tree-based haplotype inference does not require comprehensive pre-invasion samples if historical differentiation between native and introduced lineages is clearly reflected in gene trees, as long as some known native and introduced haplotypes are included in the reference sample set

Because we did not have access to genetic samples of A. californiense from before the historically documented introductions of A. mavortium, our earlier work could not unambiguously determine the native genotypes that were present before introductions occurred in the hybrid area. We had considered it critical to identify diagnostic alleles for both species; therefore, we avoided samples from the known hybridization area in our original reference panel to establish allelic ancestry.
Instead, we used multiple native reference populations from those parts of the native range that were geographically isolated from the hybrid area, and could therefore be considered a priori as pure native A. californiense genotypes. While this exposed us to the possibility that we might miss native variants that were geographically restricted to the region dominated by the invasion, it also safeguarded us against ignoring potentially informative diagnostic markers by misidentifying A. mavortium SNP alleles from within the hybrid area as ancestral polymorphisms segregating in A. californiense. Given our otherwise broad sampling across the range of the native species, the only concern that avoiding the hybridization area raised was the potential presence of true native variation where one SNP allele was largely restricted to the central coast hybridization area, the rest of the native range was fixed or nearly fixed for the alternative allele, and the invasive barred tiger salamander happened to share the central coast allele. In this scenario, the central coast SNP variant would be identical by state to the non-native allele (since biallelic SNPs only occur in two states), and would be misinterpreted as invasive wherever it was later found. We viewed this as extremely unlikely. Based on ancestry inference from this set of 18 reference individuals from 9 widely dispersed locations across the species range, we analyzed a transect from the heart of the hybrid area north into pure A. californiense for a set of 68 independent SNP markers that were available at the time to examine patterns of introgression across the genome (Fitzpatrick et al. 2010). Sixty-five markers showed the expected pattern, with a steep shift from high non-native frequency at the southern end of the transect in the Salinas Valley to pure native within 10–20 km north of the known introduction sites.
However, to our surprise, three markers showed what appeared to be an extraordinary penetration of the nonnative allele into native A. californiense populations, moving roughly 90 km further than the other 65 markers. We considered these outlier alleles to be ‘superinvasive’, with presumably strong selection driving their high frequency and rapid spread. Although this interpretation stretched our credulity of how quickly these non-native alleles could spread based on our knowledge of salamander movement biology, all evidence pointed to their reality as alleles that moved much more rapidly than the invasion front of non-native alleles. Here we provide two new lines of evidence that strongly suggest a reinterpretation of our earlier results. Armed with new genomic tools, additional key specimens, and a novel approach to infer the source of alleles, we reexamined the superinvasive hypothesis for these three candidate markers. First, we were able to sequence A. californiense samples collected from the central coast decades before the known introduction of A. mavortium, and found that the three SNPs that we had interpreted as superinvasive were naturally already present in the central coast portion of the range of A. californiense. Following that observation, we used new DNA sequencing capabilities to analyze the flanking genomic regions containing the presumptive superinvasive SNPs to more accurately recover the ancestry information in DNA data from hybrid populations. By analyzing these sequences phylogenetically, we were able to infer their ancestry, even though we lacked perfect knowledge of all haplotype variation in the reference species. Our conclusion is that we erroneously interpreted naturally occurring, geographically restricted SNPs as extraordinarily mobile non-native A. mavortium alleles. Our new analyses strongly suggest that the hybrid invasion is still largely restricted to the central coast part of the A. californiense range. 
Methods

Samples and data

We obtained two ethanol-preserved samples of A. californiense from the California Academy of Sciences collected near Stanford University in 1909 and 1921 (California Academy of Sciences specimen numbers CAS SUA-3466 and CAS 50182, respectively). These samples pre-date the introduction of barred tiger salamanders to California in the 1950s (Riley et al. 2003), and were collected in the central coast region roughly 100 km northwest of the epicenter of deliberate introductions in the Salinas Valley. We included these samples in a large exon capture analysis (McCartney-Melstad et al. 2016) of 1624 previously collected hybrid and reference samples, including one representative of the closely related model species, the Mexican axolotl (A. mexicanum), which is more closely related to A. mavortium than to A. californiense, to help contextualize estimated gene trees (Everson et al. 2021). Almost all DNA samples were taken from larvae, which cannot be identified by phenotype as native, non-native, or hybrid; therefore, the genetic status of each sample within California was inferred based on DNA data. The original SNPs were identified by Fitzpatrick et al. (2009) by first aligning EST sequences (Smith et al. 2005) from A. californiense, A. mexicanum, A. tigrinum, and A. mavortium sampled from their native range. Based on the Ambystoma phylogeny, they reasoned that SNPs differentiating A. californiense from the other three were likely to be reliable markers of native (A. californiense) vs non-native ancestry. After designing FP-TDI SNP assays, they tested the SNPs on a set of 19 A. californiense from nine populations assumed to be pure native based on their geographic distance from the known area of introduction (the specimen from Santa Barbara County listed by Fitzpatrick et al. (2009) was not used as a native reference owing to concerns about introgression from an introduced population in the nearby Lompoc Valley).
They also used a reference set of eight A. mavortium from an introduced population in Lake County, CA (USA), which was outside the native range of A. californiense and verified by a former bait dealer as a site of deliberate introduction (Fitzpatrick and Shaffer 2007a, b; Johnson et al. 2011). Moreover, adult salamanders collected at the Lake County site appeared to be A. mavortium based on color pattern and head shape (Ryan et al. 2009; Johnson et al. 2010). The final set of 68 SNP markers was diagnostic in the test panel (Fitzpatrick et al. 2009). For the current study, we extracted DNA from tissue samples using a salt-based extraction protocol (Sambrook and Russell 2001). Extractions were normalized to 100 ng/µL and sheared to approximately 300–500 bp on a Bioruptor NGS (Diagenode). We prepared libraries using KAPA LTP library preparation kit half reactions (KAPA Biosystems, Wilmington, MA) and universal stubs, adding dual 8-bp adapter index sequences via a 6-cycle PCR reaction (Glenn et al. 2019). We pooled libraries into groups of 8 with 500 ng of input library each and enriched for a set of 5,237 exons using biotinylated RNA probes (Arbor Biosciences, Ann Arbor, MI) in the presence of 30,000 ng of ambystomatid c0t-1 sequence blocker (McCartney-Melstad et al. 2016). We amplified enriched libraries with 14 cycles of PCR and combined 19 enrichments at an equimolar ratio to create pools of 152 samples for sequencing on Illumina HiSeq 4000 150 bp paired-end lanes. We used skewer v0.2.2 (Jiang et al. 2014) to trim reads for adapter contamination, discarding trimmed reads shorter than 40 bp. We used the Genome Analysis Toolkit (GATK) version 3.8–1 to call SNPs and genotypes (McKenna et al. 2010; Van der Auwera et al. 2013).
We ran HaplotypeCaller on individual samples and GenotypeGVCFs to jointly call and genotype SNPs and short indels with the target sequences as the reference set. The final target sequences were assembled by McCartney-Melstad et al. (2016). They derived these by mapping their reads to the original probe sequences, identifying reciprocal best BLAST hits (RBBHs), and masking potential chimeras by blasting the RBBHs to themselves. The target locus RBBH reference sequences we used here range from 534 to 1075 base pairs, and are given in Online Resource 1. We applied hard filters to SNPs, excluding sites with any of the following: QD < 2.0, MQ < 20.0, FS > 60.0, MQRankSum < −12.5, ReadPosRankSum < −8.0 or > 8.0, SOR > 5.0, or QUAL < 30. We applied hard filters to indels, excluding sites with any of the following: QD < 2.0, SOR > 10.0, FS > 60.0, ReadPosRankSum < −8.0 or > 8.0, or QUAL < 30. Genotype calls that had a depth lower than 8 or a GQ score lower than 20 were set to missing data. We removed samples with more than 75% missing data. Then, we removed indels and SNPs with more than 25% missing data across the remaining samples. We extracted data mapping to the original SNP loci, including those containing the putative superinvasive SNPs (E6E11, E12C11, and E23C6) with vcftools (Danecek et al. 2011), and phased genotypes within each target region using BEAGLE without imputing missing data (Browning et al. 2021). We extracted target-level haplotypes for each sample (a maximum of two haplotypes per sample). The final dataset included 1399 individuals (Online Resource 2).

Range-wide analysis of the original outlier SNPs

To verify our prior result with the current genotyping-by-sequencing methods and broader geographic sample, we used the same SNP positions as before (Fitzpatrick et al. 2010) to call genotypes and scan for outliers. We plotted the geographic distributions of each SNP allele of the putative superinvasive loci to assess the extent of apparent introgression of the alleles formerly classified as introduced.
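The site-level hard filters and genotype-level masking described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the GATK or vcftools commands used in the study: the dictionary-based record layout and the function names (`failed_filters`, `mask_genotype`, `missing_fraction`) are hypothetical, and the choice to skip a filter when an annotation is absent is an assumption of the sketch.

```python
# Hypothetical layout: each site is a dict of GATK-style annotation values.
# Thresholds are the SNP hard filters quoted in the Methods; a site is
# excluded if ANY rule returns True.
SNP_RULES = {
    "QD": lambda v: v < 2.0,
    "MQ": lambda v: v < 20.0,
    "FS": lambda v: v > 60.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0 or v > 8.0,
    "SOR": lambda v: v > 5.0,
    "QUAL": lambda v: v < 30.0,
}


def failed_filters(site, rules=SNP_RULES):
    """Return the names of the hard filters that a site fails.

    Assumption of this sketch: annotations missing from the record are skipped.
    """
    return [name for name, is_bad in rules.items()
            if site.get(name) is not None and is_bad(site[name])]


def mask_genotype(depth, gq, min_depth=8, min_gq=20):
    """True if a genotype call should be set to missing (depth < 8 or GQ < 20)."""
    return depth < min_depth or gq < min_gq


def missing_fraction(calls):
    """Fraction of missing calls (None) for one sample or one site."""
    return sum(call is None for call in calls) / len(calls)


good_site = {"QD": 10.0, "MQ": 60.0, "FS": 1.2, "MQRankSum": 0.1,
             "ReadPosRankSum": 0.3, "SOR": 0.9, "QUAL": 500.0}
bad_site = dict(good_site, QD=1.5, ReadPosRankSum=9.1)
```

Under the thresholds in the text, a sample would be dropped when `missing_fraction` over its calls exceeds 0.75, and a site would then be dropped when `missing_fraction` over the remaining samples exceeds 0.25.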
We tested for outlier SNPs using genomic clines fitted with maximum likelihood by the R function HIest (Fitzpatrick 2013). We fitted clines using three alternative mathematical models suited for diagnostic loci (Barton, logit-logistic, and beta cline functions) and chose the form with best fit to the putative superinvasive SNPs. To detect outliers, we entered the estimated cline parameters (μ and ν for each locus) as bivariate data in aq.plot (Filzmoser and Gschwandtner 2021) to compute a squared Mahalanobis distance (D²) for each locus and compare it to the quantiles of the χ² distribution with two degrees of freedom. To better conform to a multivariate normal distribution, we used the natural log of ν.

Analysis of haplotypes

To better evaluate the ancestry information in the SNP-containing loci, we used the inferred DNA sequences of the full phased haplotypes aligned to the final target sequences. We estimated haplotype trees (with unique haplotypes as tips) using maximum likelihood with FastTree2 (Price et al. 2010). As a check on the FastTree2 results for the three outlier loci, we also estimated trees with MrBayes (Ronquist et al. 2012) using the HKY + I + G substitution model after model selection using mrModeltest v2 (Nylander 2004). We ran the MrBayes algorithm for 2 × 10⁶ generations, 2 independent runs of 4 chains, and 25% burnin. The Bayesian and maximum likelihood trees were identical. We inferred the ancestry of haplotypes that did not appear in the reference samples if they unambiguously clustered with one set of reference samples. To score the ancestry of each unique haplotype as native or non-native, we used the following algorithm:

1. Identify the node that groups the reference set of native haplotypes into the smallest possible subtree and root the tree at this node. Note: this was done to establish polarity with respect to the nonnative haplotypes. This made it easier to automate the identification of the most recent common ancestor (MRCA) in the next step.
2. Identify the MRCA of the reference set of nonnative haplotypes and re-root the tree at this node.
3. Re-determine the MRCA of the reference native haplotypes on the re-rooted tree.
4. Discard the locus if the native and non-native MRCAs are the same (indicates an uninformative gene tree with respect to native/non-native ancestry).
5. For each haplotype (tip on the tree) calculate its distance to each node, and determine whether it is closer to one of the MRCA nodes or to a node on the path between the MRCA nodes.
6. If the haplotype is closest to the native MRCA, score it as native. If it is closest to the non-native MRCA, score it as non-native.
7. If the haplotype is closest to a node on the path between the MRCA nodes, then its ancestry is potentially ambiguous. If the internode it connects to is closer to either of the MRCA nodes than to the midpoint between the MRCA nodes, then it is scored as whichever MRCA it is closer to; otherwise, it is unscorable. In other words, the middle 50% of the path between the reference MRCA nodes is considered unscorable.

Using this gene tree approach, we then examined the geographic distribution of the scored haplotypes and performed outlier detection on fitted genomic clines as before. We were able to perform genomic cline outlier analyses for 42 of the original 65 loci in addition to the three candidate superinvasives for both SNP-based and haplotype-based analyses. The other 23 original loci either could not be included in the target capture (15 loci) or failed to resolve monophyletic native/non-native subtrees (i.e., failed at step 4 of the algorithm; eight loci).

Fig. 2 Approximate native range of the California tiger salamander (shown in grey: 870 unique locations from California Department of Fish and Wildlife (2018)) with the primary area of introductions indicated by the oval. Geographic distribution (blue) of SNP alleles characteristic of native reference populations, and (red) SNP alleles formerly thought to represent “superinvasive” alleles from non-native barred tiger salamanders (A. mavortium). E6E11, E12C11, and E23C6 are loci mapped to separate chromosomes in the Ambystoma genome (E23C6 actually maps to two chromosomes). The red alleles are concentrated in the central coast region where true nonnative alleles dominate, were fixed in extralimital populations known to be introduced, and were rare or absent in the rest of the range

Results

Range-wide analysis of the original outlier SNPs

For the three previously identified putatively superinvasive SNPs, we recovered the same geographic pattern as that published in 2010 using the FP-TDI SNP scoring method: alleles previously identified as characteristic of A. mavortium were common throughout much of the range of California tiger salamanders (Fig. 2). Genomic cline model comparison indicated that the Beta function was the best fit to the three putative superinvasive SNPs, therefore we performed outlier analysis using the fitted parameters of this model. To better conform to a bivariate normal distribution, we log-transformed the ν parameter prior to computing the squared Mahalanobis distances (D²). Outlier analyses identified all three markers as having D² exceeding the 99.999th percentile of the χ² (df = 2) distribution (Fig. 3). Their genomic clines are highly displaced, as previously observed, consistent with differential introgression of those SNPs relative to the genomic background (Fig. 3).
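The outlier criterion used here, a squared Mahalanobis distance on (μ, ln ν) compared against a χ² quantile with two degrees of freedom, can be illustrated with a short standard-library Python sketch. It is not a reimplementation of aq.plot: the sketch substitutes the ordinary sample mean and covariance for whatever location and scatter estimator aq.plot applies, and the function names and toy coordinates are hypothetical.

```python
import math


def chi2_df2_quantile(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    For df = 2 the CDF is 1 - exp(-x / 2), so the quantile has a closed form.
    """
    return -2.0 * math.log(1.0 - p)


def squared_mahalanobis(points):
    """Squared Mahalanobis distance of each (mu, ln_nu) pair from the centroid.

    Assumption of this sketch: classical sample mean and covariance
    (n - 1 denominator), with the 2 x 2 covariance matrix inverted by hand.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        distances.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def flag_outliers(points, p=0.99999):
    """Indices of points whose D^2 exceeds the chi-square (df = 2) quantile at p."""
    threshold = chi2_df2_quantile(p)
    return [i for i, d2 in enumerate(squared_mahalanobis(points)) if d2 > threshold]


# Toy, made-up coordinates used only to exercise the functions.
toy = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
toy_d2 = squared_mahalanobis(toy)
```

With classical estimates, D² for any single point is bounded above by (n − 1)²/n, so a strict threshold such as the 99.999th percentile (about 23.03 for df = 2) can only be exceeded when enough loci are analyzed.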
The only other marker exceeding this percentile was Gnat1, which is known to exhibit a heterozygote deficit arising from early embryonic mortality of hybrids (Johnson et al. 2010). The genomic cline for Gnat1 is exceptionally steep but not displaced, as expected for a locus with heterozygote disadvantage (Fitzpatrick 2013).

Fig. 3 Genomic clines fitted to biallelic SNP data where individuals were scored as having zero, one, or two copies of the SNP allele that had been classified as introduced based on the original reference data. Clines described by the beta equation (Fitzpatrick 2013) were fitted to each locus by maximum likelihood (left) and the parameters were used to calculate a squared Mahalanobis distance (D²) for each locus from the bivariate centroid of the distribution (right). Outlier loci with D² greater than the 99.999 percentile of the χ² distribution (df = 2) are shown in red
[Figure 3 panel titles: Genomic Clines; Beta Cline Parameters. Axis labels: Multi-locus Ancestry, Locus-specific Ancestry, ln(ν). Labeled loci: Gnat1, E6E11, E12C11, E23C6]

Analysis of pre-introduction museum specimens

Both of the pre-introduction A. californiense samples from Stanford also had the SNP alleles characteristic of introduced A. mavortium. Therefore, these SNPs cannot be diagnostic of introduced ancestry, as previously thought. We confirmed this inference using the entire DNA sequence flanking the SNPs to estimate gene trees for each locus, demonstrating that the haplotypes carried by the pre-introduction samples grouped with other native haplotypes (Online Resources 2–5: the samples were scored as homozygous for haplotypes 28 for E6E11, C for E12C11, and D for E23C6, all of which were common in modern samples from the western part of the native range).

Fig. 4 Unrooted haplotype trees estimated for each locus with MrBayes using one representative sequence per haplotype. Haplotypes with red branches contain the SNP alleles formerly assumed to be diagnostic of non-native ancestry. Our tree-based ancestry inference algorithm assigns as native all haplotypes unambiguously clustering with Ambystoma californiense reference samples (blue dots), and as introduced all haplotypes unambiguously clustering with A. mavortium reference samples (red dots). Haplotypes falling along the middle half of the internal branch connecting these clusters are considered ambiguous and cannot be assigned with confidence to either species. Support for the clades encircled by dashed lines was greater than 0.95 posterior probability

Range-wide analysis of haplotypes

The haplotype trees for loci E6E11 and E12C11 clearly show several haplotypes clustering with native A. californiense despite having the SNP formerly characterized as invasive (Fig. 4), further confirming that the SNP is an ancestral polymorphism shared by both the native and introduced species. Two ambiguous haplotypes of E6E11 that fell on the long internal branch of the gene tree (Fig. 4) were observed only in the Salinas Valley; otherwise, all haplotypes were clearly clustering with one of the two species. These ambiguous haplotypes could be results of recombination as they share variants with non-native haplotypes toward the beginning of the sequence and share variants with native haplotypes toward the end (Online Resource 3); however, the A. mexicanum reference sequences show the same pattern and there is no definitive evidence of recombination based on the four-gamete test. The tree for the third locus, E23C6, is not resolvable because it has few ancestry-informative sites. This lack of clear genealogical separation between native and non-native reference haplotypes indicates that E23C6 is not an appropriate marker for differentiating tiger salamander ancestry. Moreover, when we aligned E23C6 to the axolotl genome, it mapped to two chromosomes, suggesting that it may be a paralog. The full genome was not available when the original markers were developed, so it is not surprising if a few (particularly outliers) turn out not to be single-copy. We used the gene trees for E6E11 and E12C11 to score haplotypes as native or invasive based on their genealogical affinities illustrated in Fig. 4. The geographic distribution of inferred invasive haplotypes matches the known distribution of the hybrid populations (Fitzpatrick and Shaffer 2007b), with no evidence of differential introgression (Fig. 5). Genomic cline outlier analysis of the scored haplotype data did not support outlier status for the candidate loci E6E11 and E12C11, although it did still identify Gnat1 as an outlier with a signature consistent with heterozygote disadvantage (Fig. 6). In fact, the SNP and haplotype data were identical for all loci except for the three putative superinvasive SNPs. We used all 44 loci with haplotypes scored as native or non-native to re-evaluate the distribution of introduced alleles in California (Fig. 7). Outside of the Salinas Valley in Monterey County, we found non-native A. mavortium haplotypes in samples previously identified as independent introductions outside of the range of native A. californiense in Lake, Siskiyou, Mono, Kern, and San Diego counties (Johnson et al. 2011). A non-native population was also sampled in the Lompoc Valley (Santa Barbara County) near pure native populations of the Santa Barbara Distinct Population Segment of A. californiense. In addition, we found previously unpublished non-native genotypes in Great Valley Grasslands State Park (Merced County) and Stanford (Santa Clara County). The Stanford samples are of undocumented specific origin and may have been collected in the Salinas Valley.
Finally, we confirmed the presence of nonnative alleles in Sonoma County, the northernmost non-native alleles within the native range shown in Fig. 7 (see also Cooper et al., submitted).

Fig. 5 Geographic distributions of haplotypes scored as native (blue and violet) and introduced (red) based on genealogical affinities (Fig. 2). Violet indicates native haplotypes containing the SNP formerly assumed to be diagnostic of non-native ancestry for that locus

Discussion

Accurately identifying hybrid genotypes is central to evolutionary, population, and conservation genetics. Detection of introgression rests on the assumption that the ancestry of each allele can be inferred with an acceptable level of certainty. Previous work in this study system has addressed statistical and philosophical issues surrounding the designation of ‘pure’ and ‘hybrid’ populations in the general framework of multi-locus population genetics (e.g., Fitzpatrick and Shaffer 2007b; Fitzpatrick et al. 2010, 2015; Wayne and Shaffer 2016; Searcy et al. 2016). Here, we focused on the few markers that were identified as outliers in previous studies, suggesting incredibly fast differential introgression of non-native alleles at these loci relative to the majority of the genome. While individual or population level estimates of whole genome ancestry proportions do not require fixed differences or comprehensive knowledge of ancestral populations (e.g., Pritchard et al. 2000; Gompert and Buerkle 2011), interpretation of introgression outliers depends critically on the assumed distribution of specific alleles prior to the initiation of hybridization. That is, we must first understand which alleles are native, and which are introduced. Unfortunately, homoplasy or shared ancestral variation can lead to false inferences of introgression when a putative recipient population is naturally segregating for an allele that is characteristic of a putative source population.
False inference can be particularly problematic when working with SNPs because of their extreme simplicity. This is a special case of the broader problem of SNP ascertainment bias, wherein population genetic inferences can be biased owing to the systematic choice of nucleotide variants with particular characteristics in a reference sample rather than a truly random sample of sites from the genome (Rogers and Jorde 1996; Clark et al. 2005; Dokan et al. 2021). Figures 1 and 4 illustrate how inadequate sampling of a structured population can lead to the inaccurate assignment of a diagnostic SNP, and how tree-based haplotype inference can overcome the problem of inadequate reference samples.

Fig. 6 Genomic clines fitted to haplotype data where individuals are scored as having zero, one, or two introduced haplotypes. Clines described by the beta equation (Fitzpatrick 2013) were fitted to each locus by maximum likelihood (left) and the parameters were used to calculate a squared Mahalanobis distance (D²) for each locus from the bivariate centroid of the distribution (right). Only Gnat1 was an outlier with D² greater than the 99.999 percentile of the χ² distribution (df = 2). The candidate superinvasive loci E6E11 and E12C11, also shown in red, are near the centroid. [Panel titles: Genomic Clines (axes: Multi-locus Ancestry, Locus-specific Ancestry); Beta Cline Parameters. Gnat1 is labeled in both panels.]

Fig. 7 Updated map of California, USA, showing the geographic distribution of the native California tiger salamander (Ambystoma californiense) in black (870 unique locations) and localities with at least one introduced barred tiger salamander (A. mavortium) haplotype in red. [Map legend: Native CTS; Introduced or hybrid. Scale approx. 1:8,300,000.]
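The outlier criterion described in the Fig. 6 caption can be sketched as follows. This is an illustration under stated assumptions, not the analysis used in the study: it uses the classical sample mean and covariance, whereas the reference list cites a package for robust multivariate outlier detection, and all function names and numbers below are hypothetical. For a χ² distribution with 2 degrees of freedom the cumulative distribution function is 1 − exp(−x/2), so the quantile has a closed form.

```python
import math


def chi2_quantile_df2(p):
    """Quantile of the chi-square distribution with 2 df (closed form)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -2.0 * math.log(1.0 - p)


def squared_mahalanobis(points):
    """Squared Mahalanobis distance of each (x, y) pair from the centroid.

    Uses the classical sample covariance (n - 1 denominator); this is an
    assumption of the sketch, not a robust estimator.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical cline parameter pairs, one per locus (made-up values).
params = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
threshold = chi2_quantile_df2(0.99999)
outliers = [i for i, d2 in enumerate(squared_mahalanobis(params)) if d2 > threshold]
print(threshold, outliers)
```

With classical estimates, the largest attainable D² in a sample of n points is (n − 1)²/n, so a threshold this strict can only be exceeded when many loci are analyzed; robust estimators of location and scatter behave differently and are often preferred because outliers inflate the classical covariance.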
The basic steps involved in tracking non-native alleles in native populations are straightforward and well established. First, a comprehensive, complete set of reference individuals from both the native and non-native species that are known to be unaffected by hybridization must be assembled and sequenced for target markers. A complete inventory of variation is critical to avoid the misinterpretation of native polymorphism as non-native introgression (Fig. 1). Particularly for well-established hybridization situations, this can be more challenging than it first appears, because genetic variation at any marker within a known hybrid population might include ancestral native variation that was present before hybridization, variation introduced from the non-native species, or both. The logical path forward in such situations is to avoid including individuals from known hybrid populations or their nearby neighbors in the native reference panel, since variants cannot be unambiguously attributed to either species. However, the possibility always exists that the current area of hybridization harbors unique native alleles that happen to be shared with the non-native relative. In such a situation, if individuals from the hybrid area are excluded from the native reference panel, those alleles will be mistakenly interpreted as exclusively derived from the non-native species rather than as natural, but geographically limited, native variants (Fig. 1). We argue that such errors are more likely when (1) very simple genetic markers, especially biallelic SNPs, are analyzed (since the allele fixed in the non-native species and a potential local variant in the native species are by definition identical by state), and (2) the hybrid area is large, environmentally unique, or bounded by natural barriers to gene flow (in all cases, the potential for a native variant being restricted to the hybrid area is relatively great).
We further argue that, in the absence of genetic samples from the hybrid area collected before the hybridization occurred, this problem can sometimes be resolved with the phylogenetic analysis of genetic markers of sufficient complexity that their ancestry (native or non-native) can be unambiguously determined (Figs. 1 and 4). In our salamander case study, locus E6E11 includes a clade of haplotypes that share several unique SNPs, including the one used in our original study. Our reference sample included only individuals carrying those haplotypes, leading us to attribute alleles from the other native haplotype clade to introgression from A. mavortium based on a single SNP. This limitation was overcome in the current study by using the more expansive haplotype sequence and its associated gene tree to assign ancestry to haplotypes that were not observed in the reference sample. Doing so is a natural extension of the current practice of identifying diagnostic SNPs. However, by analyzing sequences, each containing multiple SNPs, the likelihood of misinterpreting ancestral polymorphism is dramatically reduced.

The probability that isolated populations share polymorphisms decreases with time divided by effective population size (Wright 1931). Conversely, the probability that any particular locus will have a well-resolved, reciprocally monophyletic gene tree increases with time divided by effective population size (Tajima 1983). Thus, for very recently diverged groups, no approach is certain to yield diagnostic markers for ancestry. However, there is likely to be a substantial window in which many gene trees will be well-resolved while many individual nucleotide positions will be non-diagnostic (i.e., when branch lengths within populations are long relative to between populations).
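As a rough illustration of tree-based ancestry scoring, the rule stated in the Fig. 4 caption (haplotypes attaching along the middle half of the internal branch are ambiguous) can be written as a simple threshold function. This sketch assumes that each haplotype's attachment point has already been expressed as a fraction of the internal branch length, with 0 at the native reference cluster and 1 at the introduced reference cluster. That input representation, the function name, and the default cut-offs of 0.25 and 0.75 are our simplifying assumptions for illustration and do not reproduce the full algorithm.

```python
def assign_ancestry(position, lower=0.25, upper=0.75):
    """Classify a haplotype by where it attaches along the internal branch.

    position: fraction of the internal branch length (hypothetical input),
              0.0 = node of the native reference cluster,
              1.0 = node of the introduced reference cluster.
    Attachment points within [lower, upper] (the middle half by default)
    are treated as ambiguous.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    if position < lower:
        return "native"
    if position > upper:
        return "introduced"
    return "ambiguous"


# Made-up attachment points for three unsampled haplotypes.
for pos in (0.10, 0.50, 0.90):
    print(pos, assign_ancestry(pos))
```

A haplotype carrying a formerly 'diagnostic' SNP allele can still be scored as native under such a rule if its other variants place it within the native cluster.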
In fact, because there are only four possible states for a nucleotide, there is always a reasonable chance of homoplasy at the nucleotide level, even for deeply diverged lineages (Jukes and Cantor 1969).

We believe that our previous inference of rapid spread of invasive alleles was incorrect, and arose from our misinterpretation of ancestral polymorphism present within the California tiger salamander. Our incorrect inference stemmed from several aspects of our earlier study. First, by using biallelic SNPs defined as diagnostic from a systematic but incomplete sampling of the range of the species, we assumed that the common native allele present in our reference samples also characterized the central coast (Fitzpatrick et al. 2009). For three loci and the reference individuals we happened to sample, that was not the case. Second, in that detailed analysis of the hybridization dynamics of California and barred tiger salamanders, we analyzed only a transect extending north from the hybrid area into a pure native area. While reasonable and comprehensive in scope, this design obscured the widespread occurrence of the putative invasive allele, especially at lower frequencies to the east (e.g., Fig. 2). Third, our relatively modest sampling of reference populations, with a single individual from each of nine widely dispersed sites outside of the known hybrid area, meant that we could, and did, miss an allele that was at low or moderate frequency, as happened for markers E6E11 and E12C11 in the southeastern populations of the southern San Joaquin Valley.

For two of the three loci, tree-based haplotype scoring alone would have averted the false inferences, and for the third, the locus would have been discarded as uninformative. Even in the absence of comprehensive reference population data for the central coast, the haplotypes are sufficiently rich in variation that genealogical affinities of most observed haplotypes can be inferred (Fig. 2).
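The consequence of a small reference panel can be made concrete with a short calculation. The sketch below assumes, for illustration only, that reference chromosomes are independent random draws from a single well-mixed population in which the allele has frequency `freq`; the function name and the example frequencies are hypothetical. Under that assumption, the probability that an allele is absent from a sample of n diploid individuals is (1 − freq) raised to the power 2n.

```python
def prob_allele_missed(freq, n_individuals, ploidy=2):
    """Probability that an allele at frequency `freq` is absent from a sample.

    Assumes independent random sampling of chromosomes from one well-mixed
    population (a simplification; real populations are structured).
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    return (1.0 - freq) ** (ploidy * n_individuals)


# Nine diploid reference individuals, as in the earlier reference panel.
for freq in (0.05, 0.10, 0.25):
    print(freq, round(prob_allele_missed(freq, 9), 3))
```

With nine diploid individuals, an allele at 10% frequency would be missed in roughly 15% of such samples. Because the calculation ignores geographic structure, it understates the risk when a native variant is concentrated in regions that contribute few or no reference individuals.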
In all three cases, the new data from the two pre-introduction specimens confirmed that the SNPs formerly scored as non-native were present in A. californiense decades before the known introductions of A. mavortium in the 1950s.

For those with a specific interest in the conservation of A. californiense, our updated assessment of the geographic distribution of introduced A. mavortium in California is as follows (Fig. 7). Relatively isolated pure introduced populations are found outside of the range of A. californiense in Lake, Siskiyou, Mono, Kern, and San Diego counties (Johnson et al. 2011). Hybrid populations are widespread in the Salinas Valley (Monterey, San Benito, and southern Santa Clara counties), and have been detected, but are rare, in Great Valley Grasslands State Park (Merced County), the Santa Rosa Plain (Sonoma County), Lompoc Valley (Santa Barbara County), and possibly Stanford (Santa Clara County). Many of these sites are known to have been targets of deliberate introductions to establish harvestable populations for fishing bait (Riley et al. 2003; Fitzpatrick and Shaffer 2007b). Contrary to our previous inference of extensive introgression of three genes, there is no evidence that the hybrid population centered in the Salinas Valley has spread much farther than 12 km from specific known or suspected introduction sites in the last seven decades (Fitzpatrick and Shaffer 2007b). Finally, hybrids do appear occasionally in places far from the Salinas Valley hybrid area. The presence of extralimital hybrids and pure A. mavortium suggests that humans have repeatedly translocated non-native tiger salamanders in California, and demonstrates the need for continued monitoring of native populations for signs of admixture and introgression.
Barred tiger salamanders and hybrids exhibit significant ecological differences from native California tiger salamanders, including negative impacts on other native amphibians and endangered vernal pool invertebrates (Ryan et al. 2009; Searcy et al. 2016). While the results presented here are more optimistic than our previous geographical assessment of the invasion progression (Fitzpatrick et al. 2010), hybrid larvae have been shown to have greater fitness than native larvae (Fitzpatrick and Shaffer 2007a; Cooper and Shaffer 2023) and to have negative ecological consequences for vernal pool ecosystems (Ryan et al. 2009; Searcy et al. 2016). Therefore, the spread of hybrid genotypes remains a risk that must be monitored and, ideally, reversed before hybrids spread further.

Acknowledgements

This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, and the Stampede2 cluster at the University of Texas through allocation TG-DEB180005, as well as the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH Instrumentation Grant S10 OD018174. We thank the Department of Herpetology, California Academy of Sciences for tissue sample loans of the two specimens that predate the introduction of barred tiger salamanders to California.

Author contributions

All authors contributed to the study conception, design, and field sampling. Lab work and bioinformatics were performed by Evan McCartney-Melstad. Data analysis was performed by Ben Fitzpatrick. The first draft of the manuscript was written by Ben Fitzpatrick and all authors contributed to previous versions of the manuscript. All authors read and approved the final manuscript.

Funding

Funding was provided by the National Science Foundation, the Central Valley Project Conservation Program, and the California Department of Transportation to HBS.
Data availability

All data supporting the findings of this study are available within the paper and its Supplementary Information.

Declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

References

Allendorf FW, Leary RF, Spruell P et al (2001) The problems with hybrids: setting conservation guidelines. Trends Ecol Evol 16:613–622
Browning BL, Tian X, Zhou Y et al (2021) Fast two-stage phasing of large-scale sequence data. Am J Hum Genet 108:1880–1890
California Department of Fish and Wildlife (2018) California Natural Diversity Database (CNDDB)–Government version dated June 1, 2018. Retrieved June 12, 2018 from https://map.dfg.ca.gov/rarefind/view/RareFind.aspx
Chatfield MWH, Kozak KH, Fitzpatrick BM et al (2010) Patterns of differential introgression in a salamander hybrid zone: inferences from genetic data and ecological niche modelling. Mol Ecol 19:4265–4282
Clark AG, Hubisz MJ, Bustamante CD et al (2005) Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15:1496–1502
Cooper RD, Luckau TK, Toffelmier E et al (submitted) A novel genetic tool to enable rapid detection of rare non-native alleles. Scientific Reports
Cooper RD, Shaffer HB (2023) Managing invasive hybrids with pond hydroperiod manipulation in an endangered salamander system. Cons Biol 2023:e14167
Danecek P, Auton A, Abecasis G et al (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158
DeVos TB, Bock DG, Kolbe JJ (2023) Rapid introgression of non-native alleles following hybridization between a native Anolis lizard species and a cryptic invader across an urban landscape. Mol Ecol 32:2930–2944
Dokan K, Kawamura S, Teshima KM (2021) Effects of single nucleotide polymorphism ascertainment on population structure inferences. G3 (Bethesda) 11
Draper D, Laguna E, Marques I (2021) Demystifying negative connotations of hybridization for less biased conservation policies. Front Ecol Evolut. https://doi.org/10.3389/fevo.2021.637100
Everson KM, Gray LN, Jones AG et al (2021) Geography is more important than life history in the recent diversification of the tiger salamander complex. Proc Natl Acad Sci U S A 118
Filzmoser P, Gschwandtner M (2021) Mvoutlier: multivariate outlier detection based on robust methods. R package version 2.1.1, https://CRAN.R-project.org/package=mvoutlier
Fitzpatrick BM (2012) Estimating ancestry and heterozygosity of hybrids using molecular markers. BMC Evolut Biol 12:1–14
Fitzpatrick BM (2013) Alternative forms for genomic clines. Ecol Evol 3:1951–1966
Fitzpatrick BM, Shaffer HB (2007a) Hybrid vigor between native and introduced salamanders raises new challenges for conservation. Proc Natl Acad Sci USA 104:15793–15798
Fitzpatrick BM, Shaffer HB (2007b) Introduction history and habitat variation explain the landscape genetics of hybrid tiger salamanders. Ecol Appl 17:598–608
Fitzpatrick BM, Johnson JR, Kump DK et al (2009) Rapid fixation of non-native alleles revealed by genome-wide SNP analysis of hybrid tiger salamanders. BMC Evol Biol 9:176
Fitzpatrick BM, Johnson JR, Kump DK et al (2010) Rapid spread of invasive genes into a threatened native species. Proc Natl Acad Sci USA 107:3606–3610
Fitzpatrick BM, Ryan ME, Johnson JR et al (2015) Hybridization and the species problem in conservation. Curr Zool 61:206–216
Glenn TC, Nilsen RA, Kieran TJ et al (2019) Adapterama I: universal stubs and primers for 384 unique dual-indexed or 147,456 combinatorially-indexed Illumina libraries (iTru & iNext). PeerJ 7:e7755
Gompert Z, Buerkle CA (2011) Bayesian estimation of genomic clines. Mol Ecol 20:2111–2127
Jiang H, Lei R, Ding S-W et al (2014) Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinform. https://doi.org/10.1186/1471-2105-15-182
Johnson JR, Fitzpatrick BM, Shaffer HB (2010) Retention of low-fitness genotypes over six decades of admixture between native and introduced tiger salamanders. BMC Evol Biol 10:147
Johnson JR, Thomson RC, Micheletti SJ et al (2011) The origin of tiger salamander (Ambystoma tigrinum) populations in California, Oregon, and Nevada: introductions or relicts. Conserv Genet 12:355–370
Jukes TH, Cantor CR (1969) Evolution of protein molecules. Academic Press, New York
Lipshutz SE, Meier JI, Derryberry GE et al (2019) Differential introgression of a female competitive trait in a hybrid zone between sex-role reversed species. Evolution 73:188–201
McCartney-Melstad E, Mount GG, Shaffer HB (2016) Exon capture optimization in amphibians with large genomes. Mol Ecol Resour 16:1084–1094
McKenna A, Hanna M, Banks E et al (2010) The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303
Nylander JAA (2004) MrModeltest v2. Evolutionary biology centre, Uppsala University, Program distributed by the author
Pereira V, Santangelo R, Børsting C et al (2020) Evaluation of the precision of ancestry inferences in south American admixed populations. Front Genet. https://doi.org/10.3389/fgene.2020.00966
Petit RJ (2004) Biological invasions at the gene level. Divers Distrib 10:159–165
Price MN, Dehal PS, Arkin AP (2010) FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE 5:e9490
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Rhymer JM, Simberloff D (1996) Extinction by hybridization and introgression. Annu Rev Ecol Syst 27:83–109
Riley SPD, Shaffer HB, Voss SR et al (2003) Hybridization between a rare, native tiger salamander (Ambystoma californiense) and its introduced congener. Ecol Appl 13:1263–1275
Rogers AR, Jorde LB (1996) Ascertainment bias in estimates of average heterozygosity. Am J Hum Genet 58:1033–1041
Ronquist F, Teslenko M, van der Mark P et al (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61:539–542
Ryan ME, Johnson JR, Fitzpatrick BM (2009) Invasive hybrid tiger salamander genotypes impact native amphibians. Proc Natl Acad Sci USA 106:11166–11171
Sambrook J, Russell DW (2001) Molecular cloning: a laboratory manual (3-volume set). Cold Spring Harbor, New York
Searcy CA, Rollins HB, Shaffer HB (2016) Ecological equivalency as a tool for endangered species management. Ecol Appl 26:94–103
Smith JJ, Kump DK, Walker JA et al (2005) A comprehensive expressed sequence tag linkage map for tiger salamander and Mexican axolotl: Enabling gene mapping and comparative genomics in Ambystoma. Genetics 171:1161–1171
Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460
Todesco M, Pascual MA, Owens GL et al (2016) Hybridization and extinction. Evol Appl 9:892–908
Van der Auwera GA, Carneiro MO, Hartl C et al (2013) From FastQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinform 43(1):11–10
Wayne RK, Shaffer HB (2016) Hybridization and endangered species protection in the molecular era. Mol Ecol 25:2680–2689
Wright S (1931) Evolution in mendelian populations. Genetics 16:97–159

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Springer Nature or its licensor (e.g.
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.