New evidence contradicts the rapid spread of invasive genes into a threatened native species

Biol Invasions
https://doi.org/10.1007/s10530-024-03386-3
ORIGINAL PAPER
Benjamin M. Fitzpatrick · Evan McCartney‑Melstad · Jarrett R. Johnson · H. Bradley Shaffer
Received: 1 February 2024 / Accepted: 24 June 2024
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2024
Abstract When introduced species hybridize with native relatives, spread of advantageous invasive genes into native populations (introgression) is a conservation concern. Genome-scale SNP (single nucleotide polymorphism) analysis can be a powerful approach to detect hybridization and identify candidate loci experiencing selection in hybrid zones. However, follow-up studies are critical to verify and interpret potentially impactful patterns of introgression. In an earlier publication we identified three outlier loci (out of 68 unlinked SNPs) where non-native alleles appeared to have introgressed 90 km into the range of a threatened native salamander, while the other 65 markers showed no evidence of spread farther than 12 km. This was consistent with strong selection favoring a few invasive traits, but our inferences necessarily depended on limited reference samples of the native species. Here, we further tested our initial interpretation by interrogating the outlier markers in greater detail. First, we isolated DNA from two museum specimens of native salamanders collected several decades before the introduction. Both had the putatively invasive SNPs, indicating that the SNP alleles were present before the introduction and are therefore not diagnostic of non-native ancestry. Second, we developed a novel genealogical analysis of DNA sequences (rather than SNPs) to infer allelic ancestry, since genealogical analysis of haplotypes minimizes the ancestry assignment errors that can occur with SNPs. When applied to the original loci, this analysis confirmed that the genotypes formerly interpreted as ‘superinvasive’ are native variants, and that non-native alleles remain limited to areas near the original introduction sites.

Keywords Hybridization · Gene flow · Tiger salamander · Ascertainment bias

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s10530-024-03386-3.

B. M. Fitzpatrick (*)
Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
e-mail: benfitz@utk.edu

E. McCartney‑Melstad · H. B. Shaffer
Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA

E. McCartney‑Melstad · H. B. Shaffer
La Kretz Center for California Conservation Science, Institute of the Environment and Sustainability, University of California, Los Angeles, CA 90095, USA

Present Address:
E. McCartney‑Melstad
Nutcracker Therapeutics, 5858 Horton Street, Suite 540, Emeryville, CA 94608, USA

J. R. Johnson
Department of Biology, Western Kentucky University, Bowling Green, KY 42103, USA
B. M. Fitzpatrick et al.
Introduction
When introduced and native species interbreed,
biologists and conservation managers need to track
and understand the impacts of non-native genes in
native populations and communities (Rhymer and
Simberloff 1996; Allendorf et al. 2001; Fitzpatrick
et al. 2015; Todesco et al. 2016; Wayne and Shaffer 2016; Draper et al. 2021). Of particular interest
is the potential for rapid spread of invasive alleles
owing to strong selection arising from ecological or
sexual advantages (Petit 2004; Chatfield et al. 2010;
Lipshutz et al. 2019). Such patterns can be detected
using outlier analysis of geographic or genomic clines
if ancestry-informative markers can be characterized
with sufficient accuracy from reference samples of
the parental species (e.g., Gompert and Buerkle 2011;
DeVos et al. 2023).
The California tiger salamander, Ambystoma californiense, is one of the most intensely analyzed systems involving hybridization between an endangered
native and a closely related introduced congener (e.g.,
Riley et al. 2003; Ryan et al. 2009; Johnson et al.
2011; Searcy et al. 2016; Cooper and Shaffer 2023).
Previously, we presented evidence that 3 of 68 analyzed SNP markers had introgressed from introduced
barred tiger salamanders deep into the native range
of the California tiger salamander (Fitzpatrick et al.
2010). Because A. californiense is listed as Threatened/Endangered under both the US and California
Endangered Species Acts, the correct identification
of hybrid populations has major repercussions for
both mitigation actions and species recovery efforts,
and our earlier results suggested that strongly selected
non-native alleles were somehow sweeping across
native A. californiense populations.
The key to any analysis of hybridization is accurate
estimation of allele frequency differences between the
parental populations. However, doing so in natural
populations is often trickier than it first appears. At
a minimum, correctly inferring the native/non-native
ancestry of any marker depends on genetic data from
reference populations that include a comprehensive
inventory of variation in both species (Fitzpatrick
2012; Pereira et al. 2020). If important reference
populations have already been invaded, and therefore cannot be used as reference data, then it might
be impossible to accurately estimate pre-invasion
allele frequencies and natural patterns of geographic
variation. As a result, outlier loci interpreted as
showing exceptional patterns of introgression might
instead be artifacts of inadequately sampled reference
populations (Fig. 1).
This inference problem cannot be resolved by simply using a larger number of markers from across the
genome. While genome-wide data might lead to more
precise estimates of overall ancestry proportions at
the individual or population level, analyses and inferences of outlier loci depend on assumptions about
allelic ancestry at particular candidate loci. If those
assumptions are wrong, and a native allele has been
mistakenly inferred to be derived from another species, no amount of additional genomic data will reveal
the mistake. Rather, analysis of more loci would tend
to enhance the outlier status of loci with unusual patterns of variation. Therefore, follow-up studies must
investigate individual putative outlier loci in greater
detail.
Here, we investigate putative outliers discovered in
our SNP analysis of hybridization between the endangered native salamander Ambystoma californiense
and its invasive congener Ambystoma mavortium in
central California. The epicenter of the original introduction is well established and not controversial. During the 1950s, large-scale introductions of barred tiger salamanders (A. mavortium) were made for the emerging fishing-bait industry in and around Salinas, Monterey County, California (Riley et al. 2003). Extensive earlier work established that the hybrid area is largely restricted to the Salinas Valley and adjacent valleys in the coast range of central California, with many populations having non-native allele frequencies exceeding 50% across the genome (Fitzpatrick and Shaffer 2007b). It is important to emphasize that
the reality of this introduction is not in question. We
have confirmed it verbally with one of the individuals who brought in the non-native salamanders (Riley
et al. 2003), and we identified and talked to many of
the land-owners where the bait dealers established
the initial sites of introduction (Fitzpatrick and Shaffer 2007b). We have also discovered multiple populations consisting entirely of non-native genotypes
within and outside of the range of A. californiense in
California (Johnson et al. 2011).
Fig. 1 Hypothetical example illustrating the challenge of ascertaining the ancestry of alleles when the original natural configuration (A) is unknown. Reliance on reference samples far removed from a known region of introgression (e.g., above the dashed line in B) can result in bias against detecting natural variation (the haplotypes shaded purple) in reference samples of the native species, and therefore in misinterpretation of that variation when it is discovered in the area of hybridization (below the dashed line in B). This error can be avoided if the true genealogical relationships of haplotypes can be inferred (C). Tree-based haplotype inference does not require comprehensive pre-invasion samples if historical differentiation between native and introduced lineages is clearly reflected in gene trees, as long as some known native and introduced haplotypes are included in the reference sample set.

Because we did not have access to genetic samples of A. californiense from before the historically documented introductions of A. mavortium, our earlier work could not unambiguously determine the native genotypes that were present in the hybrid area before introductions occurred. We had considered
it critical to identify diagnostic alleles for both species; therefore, we avoided samples from the known hybridization area in the original reference panel used to establish allelic ancestry. Instead, we used multiple native reference populations from parts of the native range that were geographically isolated from the hybrid area and could therefore be considered a priori to have pure native A. californiense genotypes. While this exposed us to the possibility that we might miss native variants that were geographically restricted to the region dominated by the invasion, it also safeguarded us against ignoring potentially informative diagnostic markers by misidentifying A. mavortium SNP alleles from within the hybrid area as ancestral polymorphisms segregating
in A. californiense. Given our otherwise broad sampling across the range of the native species, the only
concern that avoiding the hybridization area raised
was the potential presence of true native variation
where one SNP allele was largely restricted to the
central coast hybridization area, the rest of the native
range fixed or nearly fixed for the alternative allele,
and the invasive barred tiger salamander happened to
share the central coast allele. In this scenario, the central coast SNP variant would be identical by state to
the non-native allele (since biallelic SNPs only occur
in two states), and would be misinterpreted as invasive wherever it was later found. We viewed this as
extremely unlikely.
Based on ancestry inference from this set of 18 reference individuals from 9 widely dispersed locations
across the species range, we analyzed a transect from
the heart of the hybrid area north into pure A. californiense for a set of 68 independent SNP markers that were
available at the time to examine patterns of introgression across the genome (Fitzpatrick et al. 2010). Sixty-five markers showed the expected pattern, with a steep shift from high non-native frequency at the southern end of the transect in the Salinas Valley to pure native
within 10–20 km north of the known introduction sites.
However, to our surprise, three markers showed what appeared to be an extraordinary penetration of the non-native allele into native A. californiense populations, moving roughly 90 km farther than the other 65 markers. We considered these outlier alleles to be ‘superinvasive’, with presumably strong selection driving their high frequency and rapid spread. Although this interpretation strained our credulity about how quickly non-native alleles could spread, given our knowledge of salamander movement biology, all evidence pointed to their reality as alleles that had moved much more rapidly than the invasion front of non-native alleles.
Here we provide two new lines of evidence that
strongly suggest a reinterpretation of our earlier
results. Armed with new genomic tools, additional
key specimens, and a novel approach to infer the
source of alleles, we reexamined the superinvasive
hypothesis for these three candidate markers. First,
we were able to sequence A. californiense samples
collected from the central coast decades before the
known introduction of A. mavortium, and found that
the three SNPs that we had interpreted as superinvasive were naturally already present in the central
coast portion of the range of A. californiense. Following that observation, we used new DNA sequencing
capabilities to analyze the flanking genomic regions
containing the presumptive superinvasive SNPs to
more accurately recover the ancestry information in
DNA data from hybrid populations. By analyzing
these sequences phylogenetically, we were able to
infer their ancestry, even though we lacked perfect
knowledge of all haplotype variation in the reference
species.
Our conclusion is that we erroneously interpreted
naturally occurring, geographically restricted SNPs
as extraordinarily mobile non-native A. mavortium
alleles. Our new analyses strongly suggest that the
hybrid invasion is still largely restricted to the central
coast part of the A. californiense range.
Methods
Samples and data
We obtained two ethanol-preserved samples of A. californiense from the California Academy of Sciences
collected near Stanford University in 1909 and 1921
(California Academy of Sciences specimen numbers
CAS SUA-3466 and CAS 50182, respectively). These
samples pre-date the introduction of barred tiger
salamanders to California in the 1950’s (Riley et al.
2003), and were collected in the central coast region
roughly 100 km northwest of the epicenter of deliberate introductions in the Salinas Valley. We included
these samples in a large exon capture analysis (McCartney-Melstad et al. 2016) of 1624 previously collected hybrid and reference samples, including one
representative of the closely related model species,
the Mexican axolotl (A. mexicanum), which is more
closely related to A. mavortium than A. californiense,
to help contextualize estimated gene trees (Everson
et al. 2021). Almost all DNA samples were taken from larvae, which cannot be identified by phenotype as native, non-native, or hybrid; therefore, the genetic status of each sample within California was inferred from DNA data.
The original SNPs were identified by Fitzpatrick
et al. (2009) by first aligning EST sequences (Smith
et al. 2005) from A. californiense, A. mexicanum, A.
tigrinum, and A. mavortium sampled from their native
range. Based on the Ambystoma phylogeny, they
reasoned that SNPs differentiating A. californiense
from the other three were likely to be reliable markers of native (A. californiense) vs non-native ancestry. After designing FP-TDI SNP assays, they tested
the SNPs on a set of 19 A. californiense from nine
populations assumed to be pure native based on their
geographic distance from the known area of introduction (the specimen from Santa Barbara County listed
by Fitzpatrick et al. (2009) was not used as a native
reference owing to concerns about introgression from
an introduced population in the nearby Lompoc Valley). They also used a reference set of eight A. mavortium from an introduced population in Lake County,
CA (USA) which was outside the native range of A.
californiense and verified by a former bait dealer as a
site of deliberate introduction (Fitzpatrick and Shaffer
2007a, b; Johnson et al. 2011). Moreover, adult salamanders collected at the Lake County site appeared
to be A. mavortium based on color pattern and head
shape (Ryan et al. 2009; Johnson et al. 2010). The
final set of 68 SNP markers was diagnostic in the test
panel (Fitzpatrick et al. 2009).
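The marker-selection logic described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the EST alignment is available as a dictionary mapping species names to equal-length aligned sequences, and it flags columns where A. californiense carries one base while all other species share a different base. A second helper checks whether a marker is diagnostic in a reference panel, i.e., whether the native and non-native panels share no allele. The function names and data layout are illustrative only, not the assay-design pipeline actually used.

```python
def candidate_snp_positions(alignment, focal="californiense"):
    """Columns where the focal species differs from all others and all
    other species share a single base (gapped or N columns are skipped)."""
    others = [name for name in alignment if name != focal]
    positions = []
    for i, focal_base in enumerate(alignment[focal]):
        other_bases = {alignment[name][i] for name in others}
        if focal_base in ("-", "N") or other_bases & {"-", "N"}:
            continue  # ambiguous or gapped column
        if len(other_bases) == 1 and focal_base not in other_bases:
            positions.append(i)
    return positions


def is_diagnostic(native_alleles, nonnative_alleles):
    """Diagnostic in this panel only if the two reference panels share no allele."""
    return set(native_alleles).isdisjoint(nonnative_alleles)


if __name__ == "__main__":
    toy = {  # hypothetical toy alignment, not real EST data
        "californiense": "ACGTA",
        "mexicanum": "ACCTG",
        "tigrinum": "ACCTA",
        "mavortium": "ACCTA",
    }
    print(candidate_snp_positions(toy))
```

A marker that passes `is_diagnostic` on a geographically limited native panel can still carry an allele that is native elsewhere in the range, which is exactly the ascertainment problem illustrated in Fig. 1.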
For the current study, we extracted DNA from tissue samples using a salt-based extraction protocol
(Sambrook and Russell 2001). Extractions were normalized to 100 ng/µL and sheared to approximately
300–500 bp on a Bioruptor NGS (Diagenode). We
prepared libraries using KAPA LTP library preparation kit half reactions (KAPA Biosystems, Wilmington, MA) and universal stubs, adding dual 8-bp
adapter index sequences via a 6-cycle PCR reaction (Glenn et al. 2019). We pooled libraries into
groups of 8 with 500 ng of input library each and
enriched for a set of 5,237 exons using biotinylated
RNA probes (Arbor Biosciences, Ann Arbor, MI)
in the presence of 30,000 ng of ambystomatid c0t-1
sequence blocker (McCartney-Melstad et al. 2016).
We amplified enriched libraries with 14 cycles of
PCR and combined 19 enrichments at an equimolar
ratio to create pools of 152 samples for sequencing on
Illumina HiSeq 4000 150 bp paired-end lanes.
We used skewer v0.2.2 (Jiang et al. 2014) to trim
reads for adapter contamination, discarding trimmed
reads shorter than 40 bp. We used the Genome Analysis Toolkit (GATK) version 3.8-1 to call SNPs and
genotypes (McKenna et al. 2010; Van der Auwera
et al. 2013). We ran HaplotypeCaller on individual
samples and GenotypeGVCFs to jointly call and genotype SNPs and short indels with the target sequences
as the reference set. The final target sequences were
assembled by McCartney-Melstad et al. (2016). They
derived these by mapping their reads to the original
probe sequences, identifying reciprocal best BLAST
hits (RBBHs), and masking potential chimeras by
blasting the RBBHs to themselves. The target locus
RBBH reference sequences we used here range from
534 to 1075 base pairs, and are given in Online
Resource 1.
We filtered SNPs with the following hard filters: QD < 2.0, MQ < 20.0, FS > 60.0, MQRankSum < −12.5, ReadPosRankSum < −8.0 or > 8.0, SOR > 5.0, and QUAL < 30. We filtered indels with the following hard filters: QD < 2.0, SOR > 10.0, FS > 60.0, ReadPosRankSum < −8.0 or > 8.0, and QUAL < 30. Genotype calls with a depth lower than 8 or a GQ score lower than 20 were set to missing data. We removed samples with more than 75% missing data. Then, we removed indels and SNPs with more than 25% missing data across the remaining samples.
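The hard filters and missing-data thresholds above can be expressed as a short Python sketch. It does not reproduce GATK's VariantFiltration; instead it assumes, hypothetically, that each variant's annotations are available as a plain dictionary (with absent annotations treated as not triggering a filter) and that genotype calls are (DP, GQ) tuples or None. The ReadPosRankSum rule is read as excluding values below −8.0 or above 8.0. All function names are illustrative.

```python
# Hard-filter thresholds from the text; each rule returns True when a record FAILS.
SNP_RULES = {
    "QD": lambda x: x < 2.0,
    "MQ": lambda x: x < 20.0,
    "FS": lambda x: x > 60.0,
    "MQRankSum": lambda x: x < -12.5,
    "ReadPosRankSum": lambda x: x < -8.0 or x > 8.0,
    "SOR": lambda x: x > 5.0,
    "QUAL": lambda x: x < 30.0,
}
INDEL_RULES = {
    "QD": lambda x: x < 2.0,
    "SOR": lambda x: x > 10.0,
    "FS": lambda x: x > 60.0,
    "ReadPosRankSum": lambda x: x < -8.0 or x > 8.0,
    "QUAL": lambda x: x < 30.0,
}


def failed_filters(annotations, is_indel=False):
    """Names of hard filters a variant fails; missing annotations are skipped (assumption)."""
    rules = INDEL_RULES if is_indel else SNP_RULES
    return [name for name, fails in rules.items()
            if name in annotations and fails(annotations[name])]


def filter_missing(calls, min_dp=8, min_gq=20,
                   max_sample_missing=0.75, max_site_missing=0.25):
    """calls[site][sample] is a (DP, GQ) tuple, or None for no call.

    Returns (kept_site_indices, kept_sample_indices)."""
    n_sites, n_samples = len(calls), len(calls[0])
    # Genotypes with low depth or low GQ are treated as missing.
    called = [[c is not None and c[0] >= min_dp and c[1] >= min_gq for c in row]
              for row in calls]
    # Step 1: drop samples missing at more than 75% of sites.
    kept_samples = [j for j in range(n_samples)
                    if sum(not called[i][j] for i in range(n_sites)) / n_sites
                    <= max_sample_missing]
    # Step 2: among remaining samples, drop sites missing in more than 25% of them.
    kept_sites = [i for i in range(n_sites)
                  if kept_samples
                  and sum(not called[i][j] for j in kept_samples) / len(kept_samples)
                  <= max_site_missing]
    return kept_sites, kept_samples
```

The order matters: samples are pruned first, so that the per-site missingness is computed only over the samples that survive.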
We extracted data mapping to the original SNP
loci, including those containing the putative superinvasive SNPs (E6E11, E12C11, and E23C6) with
vcftools (Danecek et al. 2011), and phased genotypes
within each target region using BEAGLE without
imputing missing data (Browning et al. 2021). We
extracted target-level haplotypes for each sample (a
maximum of two haplotypes per sample). The final
dataset included 1399 individuals (Online Resource
2).
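Building target-level haplotypes from phased genotypes amounts to writing each sample's two phased alleles into copies of the target reference sequence. The Python sketch below assumes, hypothetically, biallelic SNPs given as (0-based position, REF, ALT) tuples and phased genotype strings such as `0|1`; indels are ignored for simplicity, and missing alleles become `N` rather than being imputed. Collapsing the resulting sequences with a counter yields the unique haplotypes that can serve as tips in gene trees. Function names and input formats are illustrative.

```python
from collections import Counter


def sample_haplotypes(reference, variants, genotypes):
    """Return the two phased haplotype sequences for one sample.

    reference: target sequence (string)
    variants:  list of (pos, ref, alt) SNPs with 0-based positions (hypothetical format)
    genotypes: phased GT strings aligned with `variants`, e.g. "0|1" or ".|."
    """
    haps = [list(reference), list(reference)]
    for (pos, ref, alt), gt in zip(variants, genotypes):
        if reference[pos] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        alleles = gt.split("|")
        if len(alleles) != 2:
            raise ValueError(f"expected a phased diploid genotype, got {gt!r}")
        for h, allele in enumerate(alleles):
            if allele == ".":
                haps[h][pos] = "N"  # missing data are left unimputed
            else:
                # Biallelic sites assumed: "0" = REF, "1" = ALT.
                haps[h][pos] = alt if allele == "1" else ref
    return ["".join(h) for h in haps]


def unique_haplotypes(samples, reference, variants):
    """Count distinct haplotypes across samples (at most two per sample)."""
    counts = Counter()
    for genotypes in samples.values():
        counts.update(sample_haplotypes(reference, variants, genotypes))
    return counts
```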
Range‑wide analysis of the original outlier SNPs
To verify our prior result with the current genotyping-by-sequencing methods and broader geographic
sample, we used the same SNP positions as before
(Fitzpatrick et al. 2010) to call genotypes and scan
for outliers. We plotted the geographic distributions
of each SNP allele of the putative superinvasive loci
to assess the extent of apparent introgression of the
alleles formerly classified as introduced. We tested for
outlier SNPs using genomic clines fitted with maximum likelihood by the R function HIest (Fitzpatrick
2013). We fitted clines using three alternative mathematical models suited for diagnostic loci (Barton,
logit-logistic, and beta cline functions) and chose
the form with best fit to the putative superinvasive
SNPs. To detect outliers, we entered the estimated
cline parameters (μ and ν for each locus) as bivariate
data in aq.plot (Filzmoser and Gschwandtner 2021)
to compute a squared Mahalanobis distance (D2) for
each locus and compare it to the quantiles of the χ2
distribution with two degrees of freedom. To better
conform to a multivariate normal distribution, we
used the natural log of ν.
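The outlier test above reduces to a squared Mahalanobis distance in two dimensions, (μ, ln ν), compared with a χ2 quantile with two degrees of freedom. Because a χ2 distribution with 2 df is exponential with mean 2, its p-quantile is simply −2 ln(1 − p), about 23.03 for p = 0.99999. The Python sketch below uses the classical sample mean and covariance for simplicity; aq.plot itself works with robust (MCD-based) distances, which are less prone to having the very outliers being sought inflate the covariance estimate. The input format (a dictionary of fitted (μ, ν) pairs) is hypothetical.

```python
import math


def mahalanobis_outliers(cline_params, p=0.99999):
    """cline_params: dict locus -> (mu, nu) from fitted beta clines (nu > 0).

    Returns (d2_by_locus, cutoff, outlier_loci)."""
    # Work on (mu, ln nu), as in the text.
    pts = {locus: (mu, math.log(nu)) for locus, (mu, nu) in cline_params.items()}
    n = len(pts)
    mx = sum(x for x, _ in pts.values()) / n
    my = sum(y for _, y in pts.values()) / n
    # Classical sample covariance (n - 1 denominator), not a robust estimate.
    sxx = sum((x - mx) ** 2 for x, _ in pts.values()) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts.values()) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts.values()) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Entries of the inverse 2x2 covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    d2 = {}
    for locus, (x, y) in pts.items():
        dx, dy = x - mx, y - my
        d2[locus] = ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy
    cutoff = -2.0 * math.log(1.0 - p)  # chi-square (df = 2) quantile
    outliers = sorted(locus for locus, value in d2.items() if value > cutoff)
    return d2, cutoff, outliers
```

With classical estimates the largest attainable D2 among n loci is (n − 1)2/n, so a single extreme locus can be partly masked when few loci are analyzed; robust estimators reduce this effect.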
Analysis of haplotypes
To better evaluate the ancestry information in the
SNP-containing loci, we used the inferred DNA
sequences of the full phased haplotypes aligned to the
final target sequences. We estimated haplotype trees
(with unique haplotypes as tips) using maximum likelihood with FastTree2 (Price et al. 2010). As a check
on the FastTree2 results for the three outlier loci, we
also estimated trees with MrBayes (Ronquist et al.
2012) using the HKY + I + G substitution model after
model selection using mrModeltest v2 (Nylander
2004). We ran MrBayes for 2 × 10⁶ generations with 2 independent runs of 4 chains each and a 25% burn-in. The Bayesian and maximum likelihood trees
were identical.
Fig. 2 Approximate native range of the California tiger salamander (shown in grey: 870 unique locations from California Department of Fish and Wildlife (2018)) with the primary area of introductions indicated by the oval. Geographic distribution of SNP alleles characteristic of native reference populations (blue) and of SNP alleles formerly thought to represent “superinvasive” alleles from non-native barred tiger salamanders (A. mavortium) (red). E6E11, E12C11, and E23C6 are loci mapped to separate chromosomes in the Ambystoma genome (E23C6 actually maps to two chromosomes). The red alleles are concentrated in the central coast region where true non-native alleles dominate, were fixed in extralimital populations known to be introduced, and were rare or absent in the rest of the range.

We inferred the ancestry of haplotypes that did not appear in the reference samples if they unambiguously clustered with one set of reference samples. To score the ancestry of each unique haplotype as native or non-native, we used the following algorithm:
1. Identify the node that groups the reference set of native haplotypes into the smallest possible subtree, and root the tree at this node. Note: this was done to establish polarity with respect to the non-native haplotypes, which made it easier to automate identification of the most recent common ancestor (MRCA) in the next step.
2. Identify the MRCA of the reference set of non-native haplotypes and re-root the tree at this node.
3. Re-determine the MRCA of the reference native haplotypes on the re-rooted tree.
4. Discard the locus if the native and non-native MRCAs are the same (this indicates a gene tree that is uninformative with respect to native/non-native ancestry).
5. For each haplotype (tip on the tree), calculate its distance to each node, and determine whether it is closer to one of the MRCA nodes or to a node on the path between the MRCA nodes.
6. If the haplotype is closest to the native MRCA, score it as native. If it is closest to the non-native MRCA, score it as non-native.
7. If the haplotype is closest to a node on the path between the MRCA nodes, its ancestry is potentially ambiguous. If the internode it connects to is closer to either of the MRCA nodes than to the midpoint between the MRCA nodes, it is scored as whichever MRCA it is closer to; otherwise, it is unscorable. In other words, the middle 50% of the path between the reference MRCA nodes is considered unscorable.
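Steps 4–7 of the scoring algorithm can be sketched in Python for an unrooted tree stored as a weighted adjacency list. The sketch assumes that the native and non-native MRCA nodes have already been located (steps 1–3 are not reproduced), and it relies on the fact that, in a tree, the path node closest to a tip is the point where the tip's lineage joins the path. A tip is scored native or non-native only if that junction lies outside the middle 50% of the path. Node names and helper functions are hypothetical; this is an illustration, not the authors' code.

```python
from collections import deque


def build_tree(edges):
    """Undirected weighted adjacency list from (node, node, branch_length) tuples."""
    tree = {}
    for u, v, length in edges:
        tree.setdefault(u, []).append((v, length))
        tree.setdefault(v, []).append((u, length))
    return tree


def distances_from(tree, source):
    """Patristic distance from `source` to every node (paths are unique in a tree)."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                stack.append(nbr)
    return dist


def path_between(tree, start, end):
    """Nodes on the unique path from `start` to `end`, inclusive."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr, _ in tree[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def score_haplotypes(tree, tips, native_mrca, nonnative_mrca):
    """Score tips as 'native', 'non-native', or 'unscorable' (steps 4-7).

    Returns None when both MRCAs coincide (uninformative gene tree, step 4)."""
    if native_mrca == nonnative_mrca:
        return None
    path = path_between(tree, native_mrca, nonnative_mrca)
    along = distances_from(tree, native_mrca)  # position along the path
    total = along[nonnative_mrca]
    scores = {}
    for tip in tips:
        d = distances_from(tree, tip)
        # The closest path node is where the tip's lineage joins the path.
        junction = min(path, key=lambda node: d[node])
        pos = along[junction]
        if pos < total / 4:          # closer to native MRCA than to the midpoint
            scores[tip] = "native"
        elif pos > 3 * total / 4:    # closer to non-native MRCA than to the midpoint
            scores[tip] = "non-native"
        else:                        # middle 50% of the path
            scores[tip] = "unscorable"
    return scores
```

The quarter-length thresholds follow directly from step 7: a junction at distance x from the native MRCA is closer to it than to the midpoint exactly when x < L/2 − x, i.e., x < L/4, where L is the path length.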
Using this gene tree approach, we then examined
the geographic distribution of the scored haplotypes
and performed outlier detection on fitted genomic
clines as before. We were able to perform genomic
cline outlier analyses for 42 of the original 65 loci in
addition to the three candidate superinvasives for both
SNP-based and haplotype-based analyses. The other
23 original loci either could not be included in the
target capture (15 loci) or failed to resolve monophyletic native/non-native subtrees (i.e., failed at step 4 of
the algorithm; eight loci).
Results
Range‑wide analysis of the original outlier SNPs
For the three previously identified putatively superinvasive SNPs, we recovered the same geographic pattern as that published in 2010 using the FP-TDI SNP scoring method: alleles previously identified as characteristic of A. mavortium were common throughout much of the range of California tiger salamanders (Fig. 2). Genomic cline model comparison indicated that the beta function was the best fit to the three
putative superinvasive SNPs; therefore, we performed outlier analysis using the fitted parameters of this
model. To better conform to a bivariate normal distribution, we log-transformed the ν parameter prior to
computing the squared Mahalanobis distances (D2).
Outlier analyses identified all three markers as having D2 exceeding the 99.999th percentile of the χ2
(df = 2) distribution (Fig. 3). Their genomic clines are
highly displaced, as previously observed, consistent
with differential introgression of those SNPs relative
to the genomic background (Fig. 3). The only other
marker exceeding this percentile was Gnat1, which is
known to exhibit a heterozygote deficit arising from
early embryonic mortality of hybrids (Johnson et al.
2010). The genomic cline for Gnat1 is exceptionally
steep but not displaced, as expected for a locus with
heterozygote disadvantage (Fitzpatrick 2013).
Analysis of pre‑introduction museum specimens
Both of the pre-introduction A. californiense samples
from Stanford also had the SNP alleles characteristic
of introduced A. mavortium. Therefore, these SNPs
cannot be diagnostic of introduced ancestry, as previously thought. We confirmed this inference using
the entire DNA sequence flanking the SNPs to estimate gene trees for each locus, demonstrating that the haplotypes carried by the pre-introduction samples grouped with other native haplotypes (Online Resources 2–5: the samples were scored as homozygous for haplotypes 28 for E6E11, C for E12C11, and D for E23C6, all of which were common in modern samples from the western part of the native range).

[Fig. 3 panels: left, "Genomic Clines", locus-specific ancestry vs. multi-locus ancestry; right, "Beta Cline Parameters", ln(ν) vs. μ; labeled loci E6E11, E12C11, E23C6, and Gnat1]
Fig. 3 Genomic clines fitted to biallelic SNP data where individuals were scored as having zero, one, or two copies of the SNP allele that had been classified as introduced based on the original reference data. Clines described by the beta equation (Fitzpatrick 2013) were fitted to each locus by maximum likelihood (left), and the parameters were used to calculate a squared Mahalanobis distance (D2) for each locus from the bivariate centroid of the distribution (right). Outlier loci with D2 greater than the 99.999th percentile of the χ2 distribution (df = 2) are shown in red.
Range‑wide analysis of haplotypes
The haplotype trees for loci E6E11 and E12C11 clearly show several haplotypes clustering with native A. californiense despite having the SNP formerly characterized as invasive (Fig. 4), further confirming that the SNP is an ancestral polymorphism shared by both the native and introduced species. Two ambiguous haplotypes of E6E11 that fell on the long internal branch of the gene tree (Fig. 4) were observed only in the Salinas Valley; otherwise, all haplotypes clearly clustered with one of the two species. These ambiguous haplotypes could be the result of recombination, as they share variants with non-native haplotypes toward the beginning of the sequence and with native haplotypes toward the end (Online Resource 3); however, the A. mexicanum reference sequences show the same pattern, and the four-gamete test provides no definitive evidence of recombination. The tree for the third locus, E23C6, is not resolvable because it has few ancestry-informative sites. This lack of clear genealogical separation between native and non-native reference haplotypes indicates that E23C6 is not an appropriate marker for differentiating tiger salamander ancestry. Moreover, when we aligned E23C6 to the axolotl genome, it mapped to two chromosomes, suggesting that it may be a paralog. The full genome was not available when the original markers were developed, so it is not surprising that a few markers (particularly outliers) turn out not to be single-copy.

We used the gene trees for E6E11 and E12C11 to score haplotypes as native or invasive based on their genealogical affinities illustrated in Fig. 4. The geographic distribution of inferred invasive haplotypes matches the known distribution of the hybrid populations (Fitzpatrick and Shaffer 2007b), with no evidence of differential introgression (Fig. 5). Genomic cline outlier analysis of the scored haplotype data did not support outlier status for the candidate loci E6E11 and E12C11, although it did still identify Gnat1 as an outlier with a signature consistent with heterozygote disadvantage (Fig. 6). In fact, the SNP and haplotype data were identical for all loci except the three putative superinvasive SNPs.

Fig. 4 Unrooted haplotype trees estimated for each locus with MrBayes using one representative sequence per haplotype. Haplotypes with red branches contain the SNP alleles formerly assumed to be diagnostic of non-native ancestry. Our tree-based ancestry inference algorithm assigns as native all haplotypes unambiguously clustering with Ambystoma californiense reference samples (blue dots), and as introduced all haplotypes unambiguously clustering with A. mavortium reference samples (red dots). Haplotypes falling along the middle half of the internal branch connecting these clusters are considered ambiguous and cannot be assigned with confidence to either species. Support for the clades encircled by dashed lines was greater than 0.95 posterior probability.

We used all 44 loci with haplotypes scored as native or non-native to re-evaluate the distribution of introduced alleles in California (Fig. 7). Outside of the Salinas Valley in Monterey County, we found non-native A. mavortium haplotypes in samples previously identified as independent introductions outside the range of native A. californiense in Lake, Siskiyou, Mono, Kern, and San Diego counties (Johnson et al. 2011). A non-native population was also sampled in the Lompoc Valley (Santa Barbara County) near pure native populations of the Santa Barbara Distinct Population Segment of A. californiense. In addition, we found previously unpublished non-native genotypes in Great Valley Grasslands State Park (Merced County) and Stanford (Santa Clara County). The Stanford samples are of undocumented specific origin and may have been collected in the Salinas Valley. Finally, we confirmed the presence of non-native alleles in Sonoma County, the northernmost non-native alleles within the native range shown in Fig. 7 (see also Cooper et al., submitted).
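The four-gamete test invoked above for the ambiguous E6E11 haplotypes checks, for every pair of biallelic sites, whether all four two-site allele combinations occur among the haplotypes. Under an infinite-sites model this is only possible if recombination (or, alternatively, recurrent mutation) has occurred between the two sites. A minimal Python sketch, assuming aligned haplotype strings with `N` for missing bases (a hypothetical input format):

```python
from itertools import combinations


def four_gamete_violations(haplotypes, missing="N"):
    """Pairs of biallelic sites (i, j) at which all four two-site gametes are observed."""
    length = len(haplotypes[0])
    biallelic = []
    for i in range(length):
        alleles = {h[i] for h in haplotypes if h[i] != missing}
        if len(alleles) == 2:
            biallelic.append(i)
    violations = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes
                   if h[i] != missing and h[j] != missing}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations
```

An empty result means the data are compatible with no recombination; it does not prove that none occurred, which is consistent with the cautious wording above.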
Fig. 5 Geographic distributions of haplotypes scored as native (blue and violet) and introduced (red) based on genealogical affinities (Fig. 4). Violet indicates native haplotypes containing the SNP formerly assumed to be diagnostic of non-native ancestry for that locus.

Discussion
Accurately identifying hybrid genotypes is central to evolutionary, population, and conservation genetics. Detection of introgression rests on the assumption that the ancestry of each allele can be inferred with an acceptable level of certainty. Previous work in this study system has addressed statistical and philosophical issues surrounding the designation of ‘pure’
and ‘hybrid’ populations in the general framework
of multi-locus population genetics (e.g., Fitzpatrick
and Shaffer 2007b; Fitzpatrick et al. 2010, 2015;
Wayne and Shaffer 2016; Searcy et al. 2016). Here,
we focused on the few markers that were identified
as outliers in previous studies, suggesting incredibly
fast differential introgression of non-native alleles
at these loci relative to the majority of the genome.
While individual or population level estimates of
whole genome ancestry proportions do not require
fixed differences or comprehensive knowledge of
ancestral populations (e.g., Pritchard et al. 2000;
Gompert and Buerkle 2011), interpretation of introgression outliers depends critically on the assumed
distribution of specific alleles prior to the initiation
of hybridization. That is, we must first understand
which alleles are native, and which are introduced.
Unfortunately, homoplasy or shared ancestral variation can lead to false inferences of introgression when
a putative recipient population is naturally segregating for an allele that is characteristic of a putative
source population. False inference can be particularly
problematic when working with SNPs because of
their extreme simplicity. This is a special case of the
broader problem of SNP ascertainment bias, wherein
population genetic inferences can be biased owing
Vol:. (1234567890)
New evidence contradicts the rapid spread of invasive genes into a threatened native species
Beta Cline Parameters
3.0
2.0
0.5
1.0
1.5
ln( )
0.6
0.4
0.0
0.2
0.0
0.2
0.4
0.6
0.8
1.0
Multi-locus Ancestry
Fig. 6 Genomic clines fitted to haplotype data where individuals are scored as having zero, one, or two introduced haplotypes. Clines described by the beta equation (Fitzpatrick 2013)
were fitted to each locus by maximum likelihood (left) and
the parameters were used to calculate a squared Mahalanobis
Native CTS
Introduced or hybrid
North
scale approx 1:8,300,000
0
Gnat1
2.5
0.8
Gnat1
0.0
Locus-specific Ancestry
1.0
Genomic Clines
100 200 km
Fig. 7 Updated map of California, USA, showing the geographic distribution of the native California tiger salamander
(Ambystoma californiense) in black (870 unique locations) and
localities with at least one introduced barred tiger salamander
(A. mavortium) haplotype in red
0.0
0.1
0.2
0.3
0.4
0.5
0.6
distance (D2) for each locus from the bivariate centroid of the
distribution (right). Only Gnat1 was an outlier with D2 greater
than the 99.999 percentile of the χ2 distribution (df = 2). The
candidate superinvasive loci E6E11 and E12C11, also shown
in red, are near the centroid
to the systematic choice of nucleotide variants with
particular characteristics in a reference sample rather
than a truly random sample of sites from the genome
(Rogers and Jorde 1996; Clark et al. 2005; Dokan
et al. 2021).
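The outlier criterion described in the Fig. 6 caption can be illustrated with a minimal sketch. It assumes each locus has already been summarized by two fitted beta-cline parameters (the hypothetical `params` list below; the cline fitting itself is not shown), computes each locus's squared Mahalanobis distance from the bivariate centroid, and flags loci whose D² exceeds the 99.999th percentile of a χ² distribution with 2 degrees of freedom. For df = 2 that quantile has the closed form −2 ln(1 − p), so no statistics library is needed. This is a simplified illustration, not the authors' code: it uses the classical (non-robust) mean and covariance, whereas robust estimators such as those in the R package mvoutlier (Filzmoser and Gschwandtner 2021) are less influenced by the outliers themselves.

```python
import math
from typing import List, Sequence, Tuple


def chi2_df2_quantile(p: float) -> float:
    """Quantile of the chi-square distribution with 2 df, whose CDF is 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - p)


def squared_mahalanobis(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Squared Mahalanobis distance of each 2-D point from the sample centroid.

    Simplification (assumption): classical, non-robust mean and covariance
    with an n - 1 denominator.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form dev^T S^-1 dev using the closed-form 2x2 inverse
        d2.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


def flag_outlier_loci(params: Sequence[Tuple[float, float]], p: float = 0.99999) -> List[int]:
    """Indices of loci whose D^2 exceeds the chi-square (df = 2) quantile at p."""
    cutoff = chi2_df2_quantile(p)
    return [i for i, d in enumerate(squared_mahalanobis(params)) if d > cutoff]


# Hypothetical toy data: 40 loci with similar cline parameters plus one extreme locus
params = [((i % 5) * 0.1, (i // 5) * 0.1) for i in range(40)] + [(5.0, 5.0)]
print(round(chi2_df2_quantile(0.99999), 2))  # 23.03
print(flag_outlier_loci(params))              # [40]
```

One caution about the classical version: with n loci, no sample D² can exceed (n − 1)²/n, so with very few loci a cutoff near 23 is unreachable; for the 68 loci of the original study that ceiling is about 66.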
Figures 1 and 4 illustrate how inadequate sampling of a structured population can lead to the inaccurate assignment of a diagnostic SNP, and how
tree-based haplotype inference can overcome the
problem of inadequate reference samples. The basic
steps involved in tracking non-native alleles in native
populations are straightforward and well established.
First, a comprehensive set of reference
individuals from both the native and non-native species that are known to be unaffected by hybridization
must be assembled and sequenced for target markers.
A complete inventory of variation is critical to avoid
the misinterpretation of native polymorphism as non-native introgression (Fig. 1). Particularly for well-established hybridization situations, this can be more
challenging than it first appears, because genetic variation at any marker within a known hybrid population
might include ancestral native variation that was present before hybridization, variation introduced from
the non-native species, or both. The logical path forward in such situations is to avoid including individuals from known hybrid populations or their nearby
neighbors in the native reference panel, since variants found there
cannot be unambiguously attributed to either species.
However, the possibility always exists that the current
area of hybridization harbors unique native alleles
that happen to be shared with the non-native relative.
In such a situation, if individuals from the hybrid area
are excluded from the native reference panel, those
alleles will be misinterpreted as exclusively derived from the non-native species rather than recognized
as natural but geographically limited native variants
(Fig. 1). We argue that such errors are more likely
when (1) very simple genetic markers, especially
biallelic SNPs, are analyzed (since the allele fixed in
the non-native species and a potential local variant in
the native species are by definition identical by state),
and (2) the hybrid area is large, environmentally unique, or bounded by natural barriers to gene
flow (in each case, the potential for a native variant
being restricted to the hybrid area is relatively great).
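The reference-panel logic above can be sketched in a few lines. The sketch treats a SNP as diagnostic when the two reference panels share no allele at that position, and then shows how omitting a native clade that is restricted to the hybrid area makes a shared position look diagnostic. The haplotype strings, panel names, and the helper `diagnostic_sites` are hypothetical illustrations, not data or code from this study.

```python
from typing import List, Set


def alleles_at(panel: List[str], site: int) -> Set[str]:
    """Set of nucleotides observed at one site across a panel of aligned haplotypes."""
    return {hap[site] for hap in panel}


def diagnostic_sites(native: List[str], nonnative: List[str]) -> List[int]:
    """Hypothetical helper: sites where the two reference panels share no allele."""
    n_sites = len(native[0])
    return [
        s for s in range(n_sites)
        if alleles_at(native, s).isdisjoint(alleles_at(nonnative, s))
    ]


# Toy haplotypes (hypothetical). Native clade A is widespread; native clade B is
# confined to the hybrid area and happens to carry the non-native 'T' at site 2.
native_clade_a = ["ACGTA", "ACGTA", "ACGAA"]
native_clade_b = ["ACTTG", "ACTTG"]
nonnative = ["GATCC", "GATCC"]

# Native panel that excludes the hybrid area (clade B unsampled)
print(diagnostic_sites(native_clade_a, nonnative))                   # [0, 1, 2, 3, 4]
# Native panel that includes clade B
print(diagnostic_sites(native_clade_a + native_clade_b, nonnative))  # [0, 1, 3, 4]
```

Site 2 is called diagnostic only when clade B is missing from the native panel; any clade B haplotype would then be scored as carrying a non-native allele at that site.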
We further argue that, in the absence of genetic
samples from the hybrid area collected before the
hybridization occurred, this problem can sometimes
be resolved with the phylogenetic analysis of genetic
markers of sufficient complexity that their ancestry
(native or non-native) can be unambiguously determined (Figs. 1 and 4). In our salamander case study,
locus E6E11 includes a clade of haplotypes that share
several unique SNPs, including the one used in our
original study. Our reference sample included only
individuals carrying those haplotypes, leading us to
attribute alleles from the other native haplotype clade
to introgression from A. mavortium based on a single SNP. This limitation was overcome in the current study by using the more expansive haplotype
sequence and its associated gene tree to assign ancestry to haplotypes that were not observed in the reference sample. Doing so is a natural extension of the
current practice of identifying diagnostic SNPs. However, by analyzing sequences, each containing multiple SNPs, the likelihood of misinterpreting ancestral
polymorphism is dramatically reduced.
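The contrast between scoring a single SNP and scoring a whole haplotype can be illustrated with a toy example. The study assigns haplotypes by their position in a gene tree; as a deliberately simplified stand-in, the sketch below assigns a query haplotype to the reference group that contains its nearest haplotype by Hamming distance. The sequences, the SNP position, and the nearest-neighbour rule are assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def score_by_snp(hap: str, site: int, nonnative_allele: str) -> str:
    """Single-SNP scoring: any haplotype carrying the putatively diagnostic allele is non-native."""
    return "non-native" if hap[site] == nonnative_allele else "native"


def score_by_haplotype(hap: str, reference: Dict[str, List[str]]) -> Optional[str]:
    """Hypothetical nearest-neighbour stand-in for gene-tree placement: return the
    label of the reference group holding the closest haplotype (ties go to the
    group listed first)."""
    best_label: Optional[str] = None
    best_dist: Optional[int] = None
    for label, haps in reference.items():
        d = min(hamming(hap, r) for r in haps)
        if best_dist is None or d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy reference haplotypes (hypothetical, not data from this study)
reference = {
    "native": ["ACGTAAGC", "ACGTAAGT"],
    "non-native": ["GATCCTAC", "GATCCTAA"],
}
# A native haplotype from an unsampled clade that carries the non-native 'T' at site 2
query = "ACTTAAGC"
print(score_by_snp(query, site=2, nonnative_allele="T"))  # non-native
print(score_by_haplotype(query, reference))              # native
```

The query differs from a native reference haplotype at one site but from every non-native haplotype at six or more, so the multi-site comparison recovers its native affinity even though the single SNP says otherwise.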
The probability that isolated populations share
polymorphisms decreases with time divided by
effective population size (Wright 1931). Conversely,
the probability that any particular locus will have a
well resolved reciprocally monophyletic gene tree
increases with time divided by effective population
size (Tajima 1983). Thus, for very recently diverged
groups, no approach is certain to yield diagnostic
markers for ancestry. However, there is likely to be a
substantial window in which many gene trees will be
well-resolved while many individual nucleotide positions will be non-diagnostic (i.e., when branch lengths
within populations are long relative to between populations). In fact, because there are only four possible
states for a nucleotide, there is always a non-negligible
chance of homoplasy at the nucleotide level, even for
deeply diverged lineages (Jukes and Cantor 1969).
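The last point can be made concrete with the Jukes and Cantor (1969) model, in which each site changes to each of the other three nucleotides at equal rates. Under that model, two sequences separated by a total branch length d (expected substitutions per site) share the same base at a site with probability 1/4 + (3/4)·exp(−4d/3), which declines toward 1/4, not zero, as divergence grows. The sketch below assumes equal base frequencies and no rate variation among sites; it also computes the probability that a site is identical by state even though at least one substitution occurred on the path between the two sequences, which is homoplasy in the sense used here.

```python
import math


def jc_prob_identical(d: float) -> float:
    """P(two sequences share the same base at a site) under Jukes-Cantor (1969);
    d is the total branch length between them in expected substitutions per site."""
    if d < 0:
        raise ValueError("branch length must be non-negative")
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def jc_prob_homoplasy(d: float) -> float:
    """P(same base at a site although at least one substitution occurred).
    Under the model, substitutions along the path are Poisson(d), so P(no change) = exp(-d)."""
    return jc_prob_identical(d) - math.exp(-d)


for d in (0.01, 0.1, 1.0, 10.0):
    print(f"d = {d:>5}: P(identical) = {jc_prob_identical(d):.3f}, "
          f"P(identical by homoplasy) = {jc_prob_homoplasy(d):.4f}")
```

For deeply diverged sequences both quantities approach 0.25, so roughly one site in four matches by chance alone, which is why a single shared nucleotide is weak evidence of shared ancestry.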
We believe that our previous inference of rapid
spread of invasive alleles was incorrect, and arose
from our misinterpretation of ancestral polymorphism present within the California tiger salamander. Our incorrect inference stemmed from several
aspects of our earlier study. First, by using biallelic
SNPs defined as diagnostic from a systematic but
incomplete sampling of the species' range, we
assumed that the common native allele present in our
reference samples also characterized the central coast
(Fitzpatrick et al. 2009). For three loci and the reference individuals we happened to sample, that was
not the case. Second, in that detailed analysis of the
hybridization dynamics of California and barred tiger
salamanders, we analyzed only a transect extending
north from the hybrid area into a pure native area.
Although this design was reasonable and comprehensive in scope,
it obscured the widespread occurrence of the putative invasive allele, especially at lower frequencies to
the east (e.g., Fig. 2). Third, our relatively modest sampling of reference populations, with a
single individual from each of nine widely dispersed
sites outside the known hybrid area, meant
that we could, and did, miss an allele if it was at a
low or moderate frequency, as happened for markers
E6E11 and E12C11 in the southeastern populations
of the southern San Joaquin Valley. For two of the
three loci, tree-based haplotype scoring alone would
have averted the false inferences, and for the third,
the locus would have been discarded as uninformative. Even in the absence of comprehensive reference
population data for the central coast, the haplotypes
are sufficiently rich in variation that genealogical
affinities of most observed haplotypes can be inferred
(Fig. 2). In all three cases, the new data from the two
pre-introduction specimens confirmed that the SNPs
formerly scored as non-native were present in A. californiense decades before the known introductions of
A. mavortium in the 1950s.
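The sampling argument above can be quantified under an idealized assumption. If an allele has frequency p in a randomly mating population and n diploid individuals are sampled at random, the probability that it is absent from all 2n sampled gene copies is (1 − p)^(2n). The sketch below evaluates this for a nine-individual reference panel, matching the nine sites described above. It treats the panel as one random sample, whereas real populations are structured: an allele confined to unsampled regions, as in the southern San Joaquin Valley, is missed with certainty regardless of its local frequency. The function names are hypothetical.

```python
def prob_allele_missed(p: float, n_diploid: int) -> float:
    """Probability that an allele at frequency p is absent from all 2n gene copies
    of n randomly sampled diploids (assumes random mating and random sampling)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a frequency between 0 and 1")
    return (1.0 - p) ** (2 * n_diploid)


def individuals_needed(p: float, miss_rate: float) -> int:
    """Smallest number of diploids for which P(allele missed) <= miss_rate."""
    if p <= 0.0 or not 0.0 < miss_rate < 1.0:
        raise ValueError("need p > 0 and 0 < miss_rate < 1")
    n = 1
    while prob_allele_missed(p, n) > miss_rate:
        n += 1
    return n


# Nine reference individuals, one per site, treated here as one random sample
for p in (0.05, 0.10, 0.25):
    print(f"p = {p:.2f}: P(missed | 9 individuals) = {prob_allele_missed(p, 9):.3f}; "
          f"individuals needed for a 1% miss rate = {individuals_needed(p, 0.01)}")
```

Even under these favorable assumptions, an allele at 10% frequency escapes a nine-individual panel about 15% of the time ((0.9)^18 ≈ 0.150), and reducing that to 1% would take 22 individuals.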
For those with a specific interest in the conservation of A. californiense, our updated assessment of
the geographic distribution of introduced A. mavortium in California is as follows (Fig. 7): Relatively
isolated pure introduced populations are found outside of the range of A. californiense in Lake, Siskiyou, Mono, Kern, and San Diego counties (Johnson
et al. 2011). Hybrid populations are widespread in the
Salinas Valley (Monterey, San Benito, and southern
Santa Clara counties), and have been detected, but are
rare, in Great Valley Grasslands State Park (Merced
County), the Santa Rosa Plain (Sonoma County),
Lompoc Valley (Santa Barbara County), and possibly
Stanford (Santa Clara County). Many of these sites
are known to have been targets of deliberate introductions to establish harvestable populations for fishing
bait (Riley et al. 2003; Fitzpatrick and Shaffer 2007b).
Contrary to our previous inference of extensive introgression of three genes, there is no evidence that the
hybrid population centered in the Salinas Valley has
spread much farther than 12 km from specific known
or suspected introduction sites in the last seven decades (Fitzpatrick and Shaffer 2007b).
Finally, hybrids do appear occasionally in places
far from the Salinas Valley hybrid area. The presence
of extralimital hybrids and pure A. mavortium suggests
that humans have repeatedly translocated non-native
tiger salamanders in California, and demonstrates
the need for continued monitoring of native populations for signs of admixture and introgression. Barred
tiger salamanders and hybrids exhibit significant ecological differences from native California tiger salamanders, including negative impacts on other native
amphibians and endangered vernal pool invertebrates
(Ryan et al. 2009; Searcy et al. 2016). While the
results presented here are more optimistic than our
previous geographical assessment of the invasion progression (Fitzpatrick et al. 2010), hybrid larvae have
been shown to have greater fitness than native larvae
(Fitzpatrick and Shaffer 2007a; Cooper and Shaffer
2023) and negative ecological consequences for vernal pool ecosystems (Ryan et al. 2009; Searcy et al.
2016). Therefore, the spread of hybrid genotypes
remains a risk that must be monitored and, ideally, reversed before it spreads further.
Acknowledgements This work used the Extreme Science
and Engineering Discovery Environment (XSEDE), which is
supported by National Science Foundation grant number ACI-1548562, and the Stampede2 cluster at the University of Texas
through allocation TG-DEB180005, as well as the Vincent J.
Coates Genomics Sequencing Laboratory at UC Berkeley,
supported by NIH Instrumentation Grant S10 OD018174. We
thank the Department of Herpetology, California Academy of
Sciences for tissue sample loans of the two specimens that predate the introduction of barred tiger salamanders to California.
Author contributions All authors contributed to the study
conception, design, and field sampling. Lab work and bioinformatics were performed by Evan McCartney-Melstad. Data
analysis was performed by Ben Fitzpatrick. The first draft of
the manuscript was written by Ben Fitzpatrick and all authors
contributed to previous versions of the manuscript. All authors
read and approved the final manuscript.
Funding Funding was provided by the National Science
Foundation, the Central Valley Project Conservation Program,
and the California Department of Transportation to HBS.
Data availability All data supporting the findings of this
study are available within the paper and its Supplementary
Information.
Declarations
Conflict of interest The authors have no relevant financial or
non-financial interests to disclose.
References
Allendorf FW, Leary RF, Spruell P et al (2001) The problems
with hybrids: setting conservation guidelines. Trends Ecol
Evol 16:613–622
Browning BL, Tian X, Zhou Y et al (2021) Fast two-stage
phasing of large-scale sequence data. Am J Hum Genet
108:1880–1890
California Department of Fish and Wildlife (2018) California
Natural Diversity Database (CNDDB)–Government version dated June 1, 2018. Retrieved June 12, 2018 from
https://map.dfg.ca.gov/rarefind/view/RareFind.aspx
Chatfield MWH, Kozak KH, Fitzpatrick BM et al (2010) Patterns of differential introgression in a salamander hybrid
zone: inferences from genetic data and ecological niche
modelling. Mol Ecol 19:4265–4282
Clark AG, Hubisz MJ, Bustamante CD et al (2005) Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15:1496–1502
Cooper RD, Luckau TK, Toffelmier E et al (submitted) A novel
genetic tool to enable rapid detection of rare non-native
alleles. Scientific Reports
Cooper RD, Shaffer HB (2023) Managing invasive hybrids
with pond hydroperiod manipulation in an endangered
salamander system. Cons Biol 2023:e14167
Danecek P, Auton A, Abecasis G et al (2011) The variant call
format and VCFtools. Bioinformatics 27:2156–2158
DeVos TB, Bock DG, Kolbe JJ (2023) Rapid introgression
of non-native alleles following hybridization between a
native Anolis lizard species and a cryptic invader across
an urban landscape. Mol Ecol 32:2930–2944
Dokan K, Kawamura S, Teshima KM (2021) Effects of single
nucleotide polymorphism ascertainment on population
structure inferences. G3 (Bethesda) 11
Draper D, Laguna E, Marques I (2021) Demystifying negative connotations of hybridization for less biased conservation policies. Front Ecol Evolut. https://doi.org/10.3389/fevo.2021.637100
Everson KM, Gray LN, Jones AG et al (2021) Geography is
more important than life history in the recent diversification of the tiger salamander complex. Proc Natl Acad
Sci U S A 118
Filzmoser P, Gschwandtner M (2021) Mvoutlier: multivariate
outlier detection based on robust methods. R package
version 2.1.1, https://CRAN.R-project.org/package=mvoutlier
Fitzpatrick BM (2012) Estimating ancestry and heterozygosity of hybrids using molecular markers. BMC Evolut
Biol 12:1–14
Fitzpatrick BM (2013) Alternative forms for genomic clines.
Ecol Evol 3:1951–1966
Fitzpatrick BM, Shaffer HB (2007a) Hybrid vigor between
native and introduced salamanders raises new challenges for conservation. Proc Natl Acad Sci USA
104:15793–15798
Fitzpatrick BM, Shaffer HB (2007b) Introduction history
and habitat variation explain the landscape genetics of
hybrid tiger salamanders. Ecol Appl 17:598–608
Fitzpatrick BM, Johnson JR, Kump DK et al (2009) Rapid
fixation of non-native alleles revealed by genome-wide
SNP analysis of hybrid tiger salamanders. BMC Evol
Biol 9:176
Fitzpatrick BM, Johnson JR, Kump DK et al (2010) Rapid
spread of invasive genes into a threatened native species. Proc Natl Acad Sci USA 107:3606–3610
Fitzpatrick BM, Ryan ME, Johnson JR et al (2015) Hybridization and the species problem in conservation. Curr
Zool 61:206–216
Glenn TC, Nilsen RA, Kieran TJ et al (2019) Adapterama I:
universal stubs and primers for 384 unique dual-indexed
or 147,456 combinatorially-indexed Illumina libraries
(iTru & iNext). PeerJ 7:e7755
Gompert Z, Buerkle CA (2011) Bayesian estimation of
genomic clines. Mol Ecol 20:2111–2127
Jiang H, Lei R, Ding S-W et al (2014) Skewer: a fast and
accurate adapter trimmer for next-generation sequencing
paired-end reads. BMC Bioinform. https://doi.org/10.1186/1471-2105-15-182
Johnson JR, Fitzpatrick BM, Shaffer HB (2010) Retention
of low-fitness genotypes over six decades of admixture
between native and introduced tiger salamanders. BMC
Evol Biol 10:147
Johnson JR, Thomson RC, Micheletti SJ et al (2011) The origin of tiger salamander (Ambystoma tigrinum) populations in California, Oregon, and Nevada: introductions
or relicts. Conserv Genet 12:355–370
Jukes TH, Cantor CR (1969) Evolution of protein molecules.
Academic Press, New York
Lipshutz SE, Meier JI, Derryberry GE et al (2019) Differential introgression of a female competitive trait in a
hybrid zone between sex-role reversed species. Evolution 73:188–201
McCartney-Melstad E, Mount GG, Shaffer HB (2016) Exon
capture optimization in amphibians with large genomes.
Mol Ecol Resour 16:1084–1094
McKenna A, Hanna M, Banks E et al (2010) The genome
analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res
20:1297–1303
Nylander JAA (2004) MrModeltest v2. Evolutionary biology
centre, Uppsala University, Program distributed by the
author
Pereira V, Santangelo R, Børsting C et al (2020) Evaluation
of the precision of ancestry inferences in South American admixed populations. Front Genet. https://doi.org/10.3389/fgene.2020.00966
Petit RJ (2004) Biological invasions at the gene level. Divers
Distrib 10:159–165
Price MN, Dehal PS, Arkin AP (2010) FastTree 2–approximately maximum-likelihood trees for large alignments.
PLoS ONE 5:e9490
Pritchard JK, Stephens M, Donnelly P (2000) Inference of
population structure using multilocus genotype data.
Genetics 155:945–959
Rhymer JM, Simberloff D (1996) Extinction by hybridization
and introgression. Annu Rev Ecol Syst 27:83–109
Riley SPD, Shaffer HB, Voss SR et al (2003) Hybridization
between a rare, native tiger salamander (Ambystoma
californiense) and its introduced congener. Ecol Appl
13:1263–1275
Rogers AR, Jorde LB (1996) Ascertainment bias in estimates of average heterozygosity. Am J Hum Genet
58:1033–1041
Ronquist F, Teslenko M, van der Mark P et al (2012)
MrBayes 3.2: efficient Bayesian phylogenetic inference
and model choice across a large model space. Syst Biol
61:539–542
Ryan ME, Johnson JR, Fitzpatrick BM (2009) Invasive hybrid
tiger salamander genotypes impact native amphibians.
Proc Natl Acad Sci USA 106:11166–11171
Sambrook J, Russell DW (2001) Molecular cloning: a laboratory manual (3-volume set). Cold Spring Harbor, New
York
Searcy CA, Rollins HB, Shaffer HB (2016) Ecological equivalency as a tool for endangered species management. Ecol
Appl 26:94–103
Smith JJ, Kump DK, Walker JA et al (2005) A comprehensive
expressed sequence tag linkage map for tiger salamander
and Mexican axolotl: Enabling gene mapping and comparative genomics in Ambystoma. Genetics 171:1161–1171
Tajima F (1983) Evolutionary relationship of DNA sequences
in finite populations. Genetics 105:437–460
Todesco M, Pascual MA, Owens GL et al (2016) Hybridization
and extinction. Evol Appl 9:892–908
Van der Auwera GA, Carneiro MO, Hartl C et al (2013) From
FastQ data to high confidence variant calls: the genome
analysis toolkit best practices pipeline. Curr Protoc Bioinform 43(1):11–10
Wayne RK, Shaffer HB (2016) Hybridization and endangered species protection in the molecular era. Mol Ecol
25:2680–2689
Wright S (1931) Evolution in mendelian populations. Genetics
16:97–159
Publisher’s Note Springer Nature remains neutral with regard
to jurisdictional claims in published maps and institutional
affiliations.
Springer Nature or its licensor (e.g. a society or other partner)
holds exclusive rights to this article under a publishing
agreement with the author(s) or other rightsholder(s); author
self-archiving of the accepted manuscript version of this article
is solely governed by the terms of such publishing agreement
and applicable law.