Cryptic population structuring in Scandinavian lynx: reply to Pamilo

Molecular Ecology (2006) 15, 1189–1192
doi: 10.1111/j.1365-294X.2005.02782.x
COMMENT
Blackwell Publishing Ltd
P. E. JORDE,*† E. K. RUENESS,* N. C. STENSETH* and K. S. JAKOBSEN*
*Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, PO Box 1066 Blindern, N-0316 Oslo, Norway,
†Institute of Marine Research, Flødevigen Research Station, N-4817 His, Norway
Abstract
In a recent Commentary in this journal, Pamilo (2004) criticized our analysis of the spatial
genetic structure of the Eurasian lynx in Scandinavia (Rueness et al. 2003). The analyses
uncovered a marked geographical differentiation along the Scandinavian peninsula with
an apparent linear gradient in the north–south direction. We used computer simulations to
check on the proposition that the observed geographical structure could have arisen by
genetic drift and isolation by distance in the approximately 25 generations that have passed
since the last bottleneck. Pamilo disapproved of our choice of population model and also
how we compared the outcome of the simulations with data. As these issues should be of
interest to a wider audience we discuss them in some detail.
Keywords: computer simulations, genetic structure, isolation by distance, lynx, recolonization
Received 5 September 2005; revision accepted 23 September 2005
Background
‘How cryptic is the Scandinavian lynx?’ asks Pamilo in his
commentary published in Molecular Ecology last year (Pamilo
2004). His paper is a critique of our paper entitled ‘Cryptic
population structure in a large, mobile mammalian predator:
the Scandinavian lynx’ (Rueness et al. 2003) published in
the same journal. Pamilo’s criticism contains several misunderstandings and errors, and in this reply we address
the more general ones. Additional details will be provided
by the authors upon request.
The genetic population structuring of the Scandinavian
lynx, first described by Hellborg et al. (2002), is surprisingly pronounced, given the high mobility of the species
and the short time since recolonization, presumably since
the 1950s. Aiming at a deeper understanding of the differentiation mechanism(s), Rueness et al. (2003) expanded on
the previous analysis and included additional samples from
Scandinavia. The earlier study by Hellborg et al. (2002)
included 29 individuals with only approximately known
sampling locations, and these individuals could for that
reason not be included in the later analysis, which focused
on fine-scaled geographical structure. In addition to a
common central Scandinavian lynx population, Rueness
et al. (2003) demonstrated the occurrence of two distinct
groups or populations: one in southern Norway and one
in the northernmost part of Scandinavia. Combining the
population genetic patterns we observed, through both
individual-based and frequency-based genetic analyses,
with information about history, geography and lynx biology,
we concluded that the population history of the Scandinavian lynx is most likely more complex than previously
assumed. In particular, we found with the aid of computer
simulations that the observed pattern was not likely to
have arisen after migration from a hypothetical single
source population in the brief time span available.
Correspondence: Per Erik Jorde, Fax: +47 37 05 90 01; E-mail: p.e.jorde@bio.uio.no
© 2006 Blackwell Publishing Ltd
Choice of simulation model
The pattern of genetic differentiation in lynx in Scandinavia
was found to increase linearly with geographical distance.
Because such a linear pattern is expected theoretically in a
standard one-dimensional stepping-stone model, we chose
this model as a basis for our computer simulations. The
simulations aimed at testing the null hypothesis that the
present genetic structure in Scandinavian lynx has arisen
by random genetic drift and geographically restricted gene
flow, after expansion from a single source population some
25 generations ago. We initiated the population array with
equal allele frequencies in all populations in order to study
how rapidly genetic differentiation builds up from an
undifferentiated source population. Biologically, this initial
condition represents the situation where the source population expands rapidly to fill the vacant habitat, corresponding
to the ‘radiation’ model of Slatkin (1993: p. 267). There is
thus no discrepancy between our verbal hypothesis on the one
hand and our simulation model on the other, contrary to
Pamilo’s claim.
Table 1 FST/(1 – FST) and its slope simulated with different total population array lengths (number of population units). The computer simulations were carried out as described in Rueness et al. (2003), using Ne = 25, t = 25 generations, and averaging over 10 000 replicate runs.
              Simulated population array length
Distance      5        10       100      1000
1 step        0.068    0.066    0.065    0.065
2 steps       0.104    0.098    0.093    0.092
3 steps       0.133    0.121    0.113    0.112
4 steps       0.161    0.139    0.127    0.126
Slope         0.031    0.024    0.021    0.020
Note that the simulated slopes are identical or nearly so for all
array lengths, except for the very shortest one (with five
populations). The latter reveals inflated FST values between the
terminal populations (four steps apart), probably caused by these
populations receiving only half the number of immigrants (three
instead of six per generation). This ‘edge-effect’ probably has
no counterpart in real lynx populations, which receive
immigrants from outside Scandinavia, at least in the north
(Rueness et al. 2003).
The standard stepping-stone model assumes a long array
of populations (and is sometimes made to be circular)
to avoid ‘edge effects’ caused by the terminal populations
(Table 1; Maruyama 1970). We used an arbitrary length of
100 populations in the simulations, but studied differentiation among five neighbouring populations only, i.e.
populations separated by at most four migration steps.
Pamilo claims that such a long (100) chain of connected
populations approaches drift–migration equilibrium much
more slowly than an array with only five populations. This
is a misunderstanding because the total length of the
population array has little influence on the building up of
genetic differentiation among nearby populations (Fig. 1;
see also Slatkin 1993). The amount of differentiation between
populations separated by a given distance at a given time
is therefore largely independent of the length of the total
population array (Table 1).
Fig. 1 Results of computer simulations depicting the approach
towards drift–migration equilibrium in a finite stepping-stone
model (length = 100 populations of size N = 25). The dashed lines
represent FST/(1 – FST) values after t generations, starting with
uniform allele frequencies at t = 0. The solid line represents the
theoretical equilibrium slope of 1/(4Nm) or 0.043.
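The kind of simulation described above can be sketched as follows. This is a minimal illustration and not the program used for Table 1 or Fig. 1: it assumes a single biallelic locus per replicate starting at frequency 0.5, binomial (Wright–Fisher) drift, a migration rate m shared equally between the two neighbours, and a GST-style ratio summed over replicates as the measure of FST. The function names, the default m = 0.232 (chosen so that Ne × m is close to 6) and the number of replicates are assumptions made for the example, so the numbers it produces should not be expected to reproduce Table 1.

```python
import numpy as np


def simulate_pairwise_fst(n_pops=100, n_e=25, m=0.232, generations=25,
                          replicates=2000, p0=0.5, block=5, seed=1):
    """Return FST/(1 - FST) for pairs 1..block-1 steps apart in a central block."""
    rng = np.random.default_rng(seed)
    # One biallelic locus per replicate; every population starts at frequency p0.
    p = np.full((replicates, n_pops), p0)
    for _ in range(generations):
        # Migration: a fraction m/2 of each population comes from each neighbour.
        mixed = (1.0 - m) * p
        mixed[:, 1:] += 0.5 * m * p[:, :-1]   # immigrants from the left
        mixed[:, :-1] += 0.5 * m * p[:, 1:]   # immigrants from the right
        # Terminal populations have one neighbour only, so they receive m/2.
        mixed[:, 0] += 0.5 * m * p[:, 0]
        mixed[:, -1] += 0.5 * m * p[:, -1]
        # Drift: binomial sampling of 2*Ne gene copies in each population.
        p = rng.binomial(2 * n_e, np.clip(mixed, 0.0, 1.0)) / (2.0 * n_e)

    start = (n_pops - block) // 2             # first population of the studied block
    ratios = []
    for d in range(1, block):
        a = p[:, start:start + block - d]
        b = p[:, start + d:start + block]
        mean = 0.5 * (a + b)
        num = ((a - b) ** 2 / 4.0).sum()      # variance between the two populations
        den = (mean * (1.0 - mean)).sum()     # pooled p(1 - p)
        fst = num / den                       # ratio of sums over pairs and replicates
        ratios.append(fst / (1.0 - fst))
    return np.array(ratios)


def slope_per_step(ratios):
    """Least-squares slope of FST/(1 - FST) against distance in migration steps."""
    steps = np.arange(1, len(ratios) + 1)
    return np.polyfit(steps, ratios, 1)[0]


if __name__ == "__main__":
    for length in (5, 10, 100):
        r = simulate_pairwise_fst(n_pops=length)
        print(length, np.round(r, 3), round(slope_per_step(r), 3))
```

Running the loop for several array lengths gives a quick check of whether the differentiation among the five studied populations depends on the total length of the array.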
The simple stepping-stone model obviously cannot capture all the details of the real situation (nor is it supposed
to), but has the important virtue of requiring very few
parameters: just m, the rate of exchange among
neighbouring populations, and Ne, the effective size of
local populations. (The standard stepping-stone model
also includes an additional long-distance migration rate, m∞, but this
term only lowers the overall amount of differentiation and
was therefore ignored.) To evaluate the genetic pattern
expected from this model, we simulated all realistic values
of Ne (12–200) and used an empirical estimate for M (the
product of m and Ne). As Pamilo observes, the estimation
of M assumes that the Scandinavian lynx populations are
in drift–migration equilibrium. As discussed in our original paper, this assumption is reasonable under the present
null hypothesis because the observed pattern is linear as
far as can be determined (cf. Figure 4 in Rueness et al. 2003),
and it would not be linear had it not yet reached equilibrium (cf. Fig. 1). This is so if the null hypothesis is correct.
If it is not, then a linear relationship may or may not arise
outside equilibrium, but in such situations the null hypothesis
should be rejected anyway, as was indeed the case.
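The arithmetic linking the slope to M can be made explicit. The sketch below assumes that M is obtained by inverting the equilibrium slope 1/(4Nm), using the value 0.043 per step quoted in the caption of Fig. 1, and that m then follows from each trial value of Ne with M held fixed. The function names and the particular Ne values printed are illustrative choices within the 12–200 range.

```python
def m_product_from_slope(slope_per_step):
    """Invert the equilibrium slope 1/(4*Ne*m) to obtain M = Ne*m."""
    return 1.0 / (4.0 * slope_per_step)


def migration_rate(m_product, n_e):
    """Exchange rate m implied by a fixed M and a trial effective size Ne."""
    return m_product / n_e


if __name__ == "__main__":
    M = m_product_from_slope(0.043)
    print("M =", round(M, 2))
    for n_e in (12, 25, 50, 100, 200):
        # A fixed M means m must fall as Ne rises.
        print(n_e, round(migration_rate(M, n_e), 3))
```

Holding M fixed leaves Ne as the only free quantity, because each trial Ne then determines m.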
Pamilo obviously concurs with our approach of using
computer simulations to evaluate the mechanisms behind
the present genetic structure of Scandinavian lynx. He also,
implicitly, agrees with the basic structure of our model, in
which the elongated Scandinavian peninsula is partitioned
into five geographical areas or ‘populations’ in a linear
configuration. Where our approaches differ is that we
chose the simplest model that could possibly explain the
data and evaluated that model over (almost) the entire
parameter space of the only free variable (Ne) in the model.
This evaluation was necessary because the real values are
very poorly known. Pamilo, in contrast, takes a different
approach and apparently tries to find a model that best fits
the data. His model introduces several additional parameters (a population growth rate, a carrying capacity, and
four different migration rates) with fixed, arbitrary values.
Pamilo’s ad hoc approach is useful to demonstrate what is
possible, but is problematic when the purpose is to test
hypotheses; if his model had been rejected by the data, how
many other parameter values or model variations should
be tried?
Comparing computer simulations with data
The slope of FST/(1 – FST) against distance in a one-dimensional habitat is robust to geographical scaling and
depends only on dispersal (Rousset 1997). Consequently,
we compare simulations and data by comparing the simulated slopes to the observed one. The comparison must
take uncertainty (sampling errors) into account and there
are two different approaches that may be taken to do this.
First, one may do as Pamilo did and simulate the uncertainty that can be expected from sampling a finite number
of loci (and individuals) for genetic analysis. In the
comparison the observed slope is then regarded as a fixed
value and one checks if it lies within the simulated range,
say within the range that includes 95% of the simulations.
Alternatively, one may follow the more conventional hypothesis testing route, as we did, and use the uncertainty of
the estimated slope. In this latter approach the simulated
slope is regarded as a fixed value, being averaged over a
sufficiently large number of computer runs to eliminate
stochastic variability. The simulated value thus represents
the expected value under the stipulated null hypothesis
and one then checks if this expected value lies within, say,
the 95% confidence interval (CI) of the observed slope.
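The two decision rules just described can be written side by side. This is a minimal sketch with hypothetical function names; it assumes that the first approach summarizes the replicate runs by the central range holding a given fraction of them, and that the second approach takes a confidence interval for the observed slope as given, however it was calculated.

```python
import numpy as np


def observed_within_simulated_range(simulated_slopes, observed_slope, level=0.95):
    """Approach 1: the observed slope is fixed; the simulations supply the spread."""
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(simulated_slopes, [tail, 1.0 - tail])
    return bool(low <= observed_slope <= high)


def expected_within_observed_ci(expected_slope, ci_low, ci_high):
    """Approach 2: the simulated mean is fixed; the data supply the spread."""
    return bool(ci_low <= expected_slope <= ci_high)
```

In either case a return value of False corresponds to rejecting the null hypothesis at the chosen level. The two rules differ only in which side of the comparison carries the sampling uncertainty.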
Both of the above approaches to compare simulation
results with data can be used, but both also have potential
problems with adequately describing sampling variability.
In the first approach it is impractical to take into account (and Pamilo did
not) differences among loci in number
of alleles and allele frequency profiles, as well as different
sample sizes across the study area. These factors all affect
the variability among computer runs and therefore the
probability of rejecting the null hypothesis. The second
approach implicitly includes most of these sources of sampling errors, but it is not clear how to best calculate CI from
the data. Problems arise because the data consist of pairwise measures (FST), and the measures are thus not all
independent. In our original paper we used the traditional
method of calculating CI, which assumes independence
among measures. Using that method we got a 95% CI for
the slope from 0.000129 to 0.000176 per km (corresponding
to 0.036–0.050 per population step of 283 km). We get reasonably similar results, however, if we instead jackknife over
populations (yielding 95% CI from 0.000094 to 0.000209) or
bootstrap over independent population pairs (0.000100–
0.000337), using the IBDWS version 2.0 Beta software (Jensen
et al. 2005). The steepest slope in our simulations, b =
0.00010 per km, which occurred for Ne = 12, lies just at the
lower limit of these new CIs and should probably still be
judged significant in a one-sided test. (A one-sided test is
appropriate because the alternative hypothesis is that the
simulated slope is less than the observed one.) It is unclear
which of these procedures, if any, is to be preferred for calculating CI from pairwise data, and it is beyond the scope
of this note to resolve this issue. At any rate, Pamilo’s claim
that we did not take stochastic variation into account is
incorrect.
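Two of the calculations mentioned above can be illustrated briefly: the conversion of a slope per km into a slope per population step of 283 km, and a delete-one-population jackknife for the slope of a pairwise measure against distance. The sketch is a generic jackknife written for illustration and is not the IBDWS implementation; the function names are hypothetical, the regression is ordinary least squares over all population pairs, and the demonstration data are synthetic.

```python
import numpy as np

STEP_KM = 283.0  # distance between neighbouring populations, as quoted in the text


def per_km_to_per_step(slope_per_km, step_km=STEP_KM):
    """Rescale a slope from per km to per population step."""
    return slope_per_km * step_km


def pairwise_slope(y, dist, keep):
    """Least-squares slope of y on dist over all pairs among the populations in keep."""
    idx = np.asarray(keep)
    i, j = np.triu_indices(len(idx), k=1)
    return np.polyfit(dist[idx[i], idx[j]], y[idx[i], idx[j]], 1)[0]


def jackknife_slope(y, dist):
    """Delete-one-population jackknife: returns (full-data slope, standard error)."""
    n = y.shape[0]
    everyone = np.arange(n)
    full = pairwise_slope(y, dist, everyone)
    # Drop each population in turn, together with all pairs it belongs to.
    loo = np.array([pairwise_slope(y, dist, np.delete(everyone, k))
                    for k in range(n)])
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return full, se


if __name__ == "__main__":
    print(per_km_to_per_step(0.000129), per_km_to_per_step(0.000176))
    # Synthetic example: five populations in a row, with made-up noise.
    rng = np.random.default_rng(0)
    pos = np.arange(5) * STEP_KM
    dist = np.abs(pos[:, None] - pos[None, :])
    noise = rng.normal(0.0, 0.005, size=dist.shape)
    y = 0.00015 * dist + (noise + noise.T) / 2.0
    print(jackknife_slope(y, dist))
```

Dropping a whole population removes all pairs that share it, which is one way of respecting the non-independence of pairwise measures. A confidence interval would then be formed from the standard error and a chosen critical value.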
Concluding remarks
As we have shown in this commentary, our original
analysis was appropriate. It was unfortunate that Pamilo’s
commentary was published without our consultation and
that it contains a number of misunderstandings. Pamilo’s
model does nevertheless represent an interesting alternative
to, although not necessarily more realistic than, our model.
Because the two models, representing different hypotheses
about the lynx recolonization process in Scandinavia, led
to different conclusions despite being based on the same
data, it seems that FST in the present case does not contain
enough information to distinguish among alternative hypotheses. This sobering lesson may reflect a more general
fact and should be kept in mind also in other studies.
Nevertheless, when additional evidence, as presented and
discussed in detail in the original paper, is brought into
consideration, an origin from multiple sources remains the
most likely scenario for the present lynx populations in
Scandinavia.
References
Hellborg L, Walker CW, Rueness EK et al. (2002) Differentiation
and levels of genetic variation in northern European lynx (Lynx
lynx) populations revealed by microsatellites and mitochondrial
DNA analysis. Conservation Genetics, 3, 97–111.
Jensen JL, Bohonak AJ, Kelley ST (2005) Isolation by distance, web
service. BMC Genetics, 6, 13 (http://phage.sdsu.edu/~jensen/).
Maruyama T (1970) Analysis of population structure. I. One-dimensional stepping-stone models of finite length. Annals of
Human Genetics, London, 34, 201–219.
Pamilo P (2004) How cryptic is the Scandinavian lynx? Molecular
Ecology, 13, 3257–3259.
Rousset F (1997) Genetic differentiation and estimation of gene
flow from F-statistics under isolation by distance. Genetics, 145,
1219–1228.
Rueness EK, Jorde PE, Hellborg L, Stenseth NC, Ellegren H, Jakobsen KS (2003) Cryptic population structure in a large, mobile
mammalian predator: the Scandinavian lynx. Molecular Ecology,
12, 2623–2633.
Slatkin M (1993) Isolation by distance in equilibrium and non-equilibrium populations. Evolution, 47, 264–279.
The data behind the present work were generated as a part of Eli K.
Rueness’ PhD thesis. She is now a postdoc working on the
molecular ecology of lynx and other species. Per Erik Jorde is a
population geneticist focusing on temporal genetic change and
spatial genetic structure. He performed the computer simulations
in this and in the original paper. Nils Chr. Stenseth, head of the
CEES (http://biologi.uio.no/cees/), is a population biologist
working on marine, freshwater and terrestrial systems. Kjetill S.
Jakobsen is an evolutionary geneticist with particular interest in
the interplay between ecology and genetics.