Molecular Ecology Resources (2012) 12, 476–480
doi: 10.1111/j.1755-0998.2011.03111.x
Allele frequency covariance among cohorts and its use in
estimating effective size of age-structured populations
PER ERIK JORDE*
Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway
Abstract
A general expression for the covariance of allele frequencies among cohorts in age-structured populations is derived. The
expression is used to extend the so-called temporal method for estimating effective population size from allele frequency
shifts among samples from cohorts born any number of years apart. Computer simulations are used to check on the
accuracy and precision of the method, and an application to coastal Atlantic cod is presented.
Keywords: effective population size, overlapping generations, temporal method
Received 2 August 2011; revision received 23 November 2011; accepted 10 December 2011
Introduction
The so-called temporal method is the most widely used
genetic approach for estimating the effective size of contemporary natural populations (Waples & Yokota 2007;
Luikart et al. 2010). The method relies on replicate sampling of individuals from the population, and effective
population size is estimated from observed allele
frequency shifts over the sampling interval. Various
alternatives exist for how the calculations are carried out
(reviewed by Wang 2005), but all depend on the premise
that the observed shifts are related only to the effective
size (Ne) of the population and the time (number of
generations, t) between samples. For diploid organisms,
the expected allele frequency shift, F, is:
$$E(F) \approx \frac{t}{2N_e}. \qquad \text{(eqn 1)}$$
A commonly used estimator for temporal allele
frequency shifts is (Nei & Tajima 1981):
$$F_c = \frac{1}{a}\sum_{l=1}^{a}\frac{(x_l - y_l)^2}{(x_l + y_l)/2 - x_l y_l}, \qquad \text{(eqn 2)}$$
where xl and yl are the frequencies of the l’th allele in the
first and second sample, respectively, and the summation
is over all a alleles at the locus. In practical application,
multiple loci are scored and an average Fc is calculated
over loci, weighting each locus by its number of alleles.
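As an illustration of eqn (2) and the locus weighting just described, the following Python sketch computes Fc for a single locus and an average over loci weighted by the number of alleles. The function names `fc_single_locus` and `fc_over_loci` are hypothetical and are not taken from any published software. The sketch assumes that alleles absent from both samples have already been dropped, since they would give a zero denominator.

```python
def fc_single_locus(x, y):
    """Fc of Nei & Tajima (1981), eqn 2, for one locus.

    x, y: allele frequencies in the first and second sample (same allele order).
    Assumes every allele is present in at least one sample and the locus is
    not fixed for the same allele in both samples (denominator > 0).
    """
    if len(x) != len(y):
        raise ValueError("x and y must list the same alleles")
    total = 0.0
    for xl, yl in zip(x, y):
        total += (xl - yl) ** 2 / ((xl + yl) / 2 - xl * yl)
    return total / len(x)


def fc_over_loci(loci):
    """Average Fc over loci, weighting each locus by its number of alleles.

    loci: list of (x, y) pairs, one pair of frequency lists per locus.
    """
    n_alleles = sum(len(x) for x, _ in loci)
    return sum(len(x) * fc_single_locus(x, y) for x, y in loci) / n_alleles
```

For example, `fc_single_locus([0.5, 0.5], [0.6, 0.4])` evaluates both allele terms as 0.01/0.25 and returns approximately 0.04.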
Correspondence: Per Erik Jorde; E-mail: p.e.jorde@bio.uio.no
*Present address: Institute of Marine Research, Flødevigen, N-4817 His, Norway

Depending on the details of the sampling strategy used, i.e., whether individuals for genotyping were collected after reproduction (or, alternatively, returned to the population after biopsy; the two are equivalent for the current purpose and are referred to as sample plan I), or before reproduction and not returned to the population (plan II), the calculated Fc can be used to estimate Ne after subtracting the expected contribution arising from sampling (Waples 1989, eqn 12):
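$$\hat{N}_e = \frac{t}{2(F_c - 1/\tilde{n} + 1/N)} \qquad \text{(plan I)} \qquad \text{(eqn 3)}$$
or (Waples 1989, eqn 11):
$$\hat{N}_e = \frac{t}{2(F_c - 1/\tilde{n})} \qquad \text{(plan II)} \qquad \text{(eqn 4)}$$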
where ñ is the (harmonic) mean sample size and, in the
case of sample plan I, N is the census population size.
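The two estimators (eqns 3 and 4) can be sketched in Python as follows. The function names are hypothetical. Returning infinity when the sampling-corrected F is not positive is a convention assumed here, not a rule stated in the text.

```python
def harmonic_mean(n_first, n_second):
    """Harmonic mean of the two sample sizes (the n-tilde of eqns 3 and 4)."""
    return 2.0 / (1.0 / n_first + 1.0 / n_second)


def ne_plan1(fc, t, n_harm, census_n):
    """Eqn 3 (sample plan I): individuals sampled after reproduction or returned."""
    corrected = fc - 1.0 / n_harm + 1.0 / census_n
    # Assumed convention: a non-positive corrected F gives no finite estimate.
    return t / (2.0 * corrected) if corrected > 0 else float("inf")


def ne_plan2(fc, t, n_harm):
    """Eqn 4 (sample plan II): individuals sampled before reproduction, not returned."""
    corrected = fc - 1.0 / n_harm
    return t / (2.0 * corrected) if corrected > 0 else float("inf")
```

With Fc = 0.03, t = 1 generation and two samples of 50 individuals, plan II gives 1/(2 × 0.01) = 50, whereas plan I with a census size of N = 1000 gives 1/(2 × 0.011), or about 45.5.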
Estimation of Ne is complicated in the presence of age structure and overlapping generations, as exemplified by the hypothetical population in Fig. 1. The figure depicts temporal shifts (F) over various numbers of years (up to T = 20) when samples are drawn from separate age classes, from sexually mature age classes and from all age classes combined. The pattern of temporal shifts depends on the demographic characteristics (age-specific survival and birth rates) and differs among populations, but some general features are apparent in the figure and may be summarized as follows. First, when allele frequency shifts are estimated from samples drawn from a single age class (i.e., comparing individuals collected at the same age, in different years), temporal shifts will be different from, and generally larger than, those pertaining to the total population over the same time period. Second, such temporal shifts from single age classes do not increase linearly with the number of years between samples (T) but instead follow a pattern of damped oscillations with a period of one generation (here G = 5.93 years). In particular, F displays local minima at T = 1G years, 2G years, etc., and local maxima at T = 0.5G years, 1.5G years, etc., superimposed on a general increasing trend. Third, when samples are drawn from multiple age classes, either mature adults only or also including juveniles (i.e., representative samples from the total population), F still does not follow the predicted value (eqn 6; solid black line in Fig. 1) closely, but also takes part in the oscillations of its constituent age classes, although considerably subdued in magnitude. Finally, temporal shifts for such aggregated age classes are small, and this could make them difficult to measure with precision.

Fig. 1 Temporal dynamics of allele frequency change (F) in an age-structured population (demographic parameters as in Table 1), as observed when sampling from a single age class (10 separate, partly overlapping curves with age class 1 at the top and 10 at the bottom), from sexually mature adults (5 age classes in each sample) and from the total population (10 age classes in each sample). Based on 100 000 replicate simulation runs. Also indicated are the expected temporal shift (eqn 6) in a hypothetical population with discrete generations and the same generation length (G = 5.93 years) and effective size (Ne = 456 per generation) (solid black line), and the derived correction factor (eqn 9) for this population, scaled by generation length: C(T)/G (red line).
[Plot not reproduced. Horizontal axis: Years apart (T), 1 to 20. Left vertical axis: Genetic difference (F), 0.000 to 0.014. Right vertical axis: C(T)/G, 0 to 12. Curve labels: Single age classes, Adult age classes, All age classes.]

© 2012 Blackwell Publishing Ltd

From the above observations, it is clear that estimating
Ne from temporal change in allele frequencies in
age-structured populations requires an adjustment of
the standard, discrete-generation expectation (eqn 1).
Corrective measures are needed to account for the particulars of sampling and demographic features of the population under consideration. When samples are drawn
from consecutive cohorts, Jorde & Ryman (1995) devised
a correction factor (C) that summarizes the pertinent
demographic characteristics of the population and
applied it to develop a corrected estimator for Ne (Jorde
& Ryman 1995, eqn 25):
$$\hat{N}_e = \frac{C}{2G(F_c - 1/\tilde{n} + 1/N_1)}, \qquad \text{(eqn 5)}$$
which differs from eqn (3) in that C/G replaces t and N1
(the number of newborns) replaces N (the total population
size). Estimator (eqn 5) has been used in several empirical
studies and is currently implemented in the software packages factorc
(Jorde & Ryman 1995) and GONe (Coombs et al. 2012).
In practical applications, samples are often not available from consecutive cohorts, and (eqn 5) then cannot be used. Such was the situation for
Knutsen et al. (2011), who estimated the effective sizes of
two coastal Atlantic cod populations from juvenile
cohorts that had been collected for various purposes over a
number of years. Screening all samples for 13 microsatellite loci, they estimated F among pairs of cohorts and
averaged over pairs that were separated by the same
number (T) of years. Because most of the cohort
pairs were from nonconsecutive years (T > 1), the
authors resorted to computer simulations to evaluate
additional correction factors, C(T), for pairs separated by
up to 9 years. These additional factors are needed for
unbiased estimation of Ne because of the non-linear relationship between F and T (cf. Fig. 1). Setting up and running computer simulations to estimate Ne is a
cumbersome approach, however, and the purpose of this
note is to extend the method of Jorde & Ryman (1995) and
develop an analytical expression for C(T) for cohorts born
any number of years apart.
Methods and results
As in Jorde & Ryman (1995), we consider an isolated,
monoecious population consisting of k age classes,
indexed by i, with constant size and age-specific survival
li and birth rates bi. Considering a diploid locus that at
one time, t = 0 (from here on, t indexes years), in the past
was segregating for a (selectively neutral) allele at frequency q, Jorde & Ryman (1995) developed recurrence
equations for the second moments of its subsequent frequency within, $\sigma_i^2(t)$ (i.e., variance), and among, $\sigma_{i,j}(t)$
(covariance), age classes i and j for any year t. It is a property of age-structured populations that, as long as the
demographic rates, li and bi, remain constant, all such
variance terms will eventually settle towards a common,
constant rate of increase per year (Felsenstein 1971). For
an age class i, this rate can be expressed as
$$r = \frac{\sigma_i^2(t+1) - \sigma_i^2(t)}{q(1-q) - \sigma_i^2(t)} \approx \frac{1}{2N_e G}, \qquad \text{(eqn 6)}$$
where G is the generation length (in years) and Ne is the
effective size per generation.
478 P . E . J O R D E
Measuring r directly is impractical because the variance terms in eqn (6) are generally not observable.
Instead, the temporal method relies on sampling the population at two different times and using the standardized variance, F (eqn 2), of allele frequency change as a
substitute for r. Considering observed allele frequencies
$x_t$ and $x_{t+T}$ in two samples drawn from single cohorts
born T years apart, the expected value of Fc (eqn 2) is (cf.
Jorde & Ryman 1995; eqn 16):
$$E[F_c(T)] = \frac{\mathrm{Var}(x_t) + \mathrm{Var}(x_{t+T}) - 2\,\mathrm{Cov}(x_t, x_{t+T})}{q(1-q) - \mathrm{Cov}(x_t, x_{t+T})}. \qquad \text{(eqn 7)}$$
Here, $\mathrm{Var}(x_t)$ and $\mathrm{Cov}(x_t, x_{t+T})$ represent the variance
and covariance, respectively, of allele frequencies within
and among the two samples with respect to the ancestral
population frequency, q. $\mathrm{Var}(x_t)$ has been derived previously (Jorde & Ryman 1995, eqn 17) and shown to be
approximately equal to $\sigma_1^2(t)$, plus terms involving sample sizes. These sample terms will subsequently be subtracted from Fc (cf. eqns 3 and 4) and will not concern us
in the following.
In deriving the covariance term in eqn (7), we denote
the true allele frequencies in the two sampled cohorts by
$q_{1,t}$ and $q_{1,t+T}$, respectively, which for the purpose of derivation we assume to be the newborn age class 1. Considering first sampling from consecutive cohorts (i.e., T = 1),
we let E denote the expected value operator and note that
$E(x_t) = q_{1,t}$ and that $E(q_{1,t+1}) = \sum_{i=1}^{k} p_i q_{i,t}$. The former
expression simply states the implicit but obvious assumption that sampling is representative so that the expected
allele frequency in the sample equals the allele frequency
in the sampled cohort. The latter expression relates the
allele frequency in age class 1 in year t + 1, from which
the second sample was taken, to the weighted mean
over their parents (in year t), the weights being the proportion of offspring derived from each parental age class,
or $p_i = l_i b_i$. Thus,
$$\begin{aligned}
\mathrm{Cov}(x_t, x_{t+1}) &= E[(x_t - q)(x_{t+1} - q)] = E[(q_{1,t} - q)(q_{1,t+1} - q)] \\
&= E\Big[(q_{1,t} - q)\Big(\sum_{i=1}^{k} p_i q_{i,t} - q\Big)\Big] = \sum_{i=1}^{k} p_i E[(q_{1,t} - q)(q_{i,t} - q)] = \sum_{i=1}^{k} p_i \sigma_{1,i}(t),
\end{aligned}$$
where the penultimate step uses $\sum_i p_i = 1$, as holds for a population of constant size. In other
words, the covariance of allele frequencies in samples
drawn from consecutive juvenile cohorts is simply the
weighted (by $p_i$) mean of the population covariances,
$\sigma_{1,i}(t)$, between age class 1 in the first sample and the
parents of those in the second sample (cf. Jorde & Ryman
1995, eqn 6). We note in passing that, because subsequent
mortality within cohorts at older ages is independent
between the two cohorts, this sample covariance will be
the same regardless of the age at which the cohorts were sampled, as long as sampling remains representative (which,
however, is not always the case, as discussed later).
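The weights $p_i = l_i b_i$ can be computed directly from a life table. The Python sketch below does so for the survival and birth rates listed in Table 1. It additionally assumes the usual definition of generation length as the mean age of parents, $G = \sum_i i\,p_i$, which is not spelled out in the text but reproduces the G = 5.93 years reported for this population. The function names are hypothetical.

```python
# Life table from Table 1 (age classes 1..10).
L_SURV = [1.0, 1.0, 0.9, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.5]
B_RATE = [0.0, 0.0, 0.0, 0.13333, 0.39375, 0.39286, 0.26667, 0.13, 0.13, 0.0]


def offspring_proportions(l, b):
    """p_i = l_i * b_i: proportion of newborns derived from parental age class i."""
    return [li * bi for li, bi in zip(l, b)]


def generation_length(p):
    """Mean age of parents, sum of i * p_i (assumed definition of G)."""
    return sum(age * pi for age, pi in enumerate(p, start=1))


p = offspring_proportions(L_SURV, B_RATE)
G = generation_length(p)
```

For this life table the weights sum to 1 (to within the rounding of the tabulated birth rates), and G evaluates to approximately 5.93.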
Expanding the derivations to cohorts born T = 2 years apart, we note that the frequency in the second
sample, $x_{t+2}$, has an expected value of $E(x_{t+2}) = E(q_{1,t+2}) = E(\sum_{i=1} p_i q_{i,t+1})$. To express this quantity in terms of allele
frequencies in the preceding year, t, we split the sum into
a juvenile (i = 1) and an older (i > 1) component:
$E(p_1 q_{1,t+1}) + E(\sum_{i>1} p_i q_{i,t+1})$. (By definition $p_i = 0$ for $i > k$, so we
hereafter drop the upper limit, k, on summations over
age classes.) The former, juvenile, component can, as
above, be expressed as a sum over parental age
classes: $p_1 E(\sum_{i=1} p_i q_{i,t})$. The component that consists of
older age classes (i > 1) can also be expressed in terms of
allele frequencies the previous year, when they were one
year younger (i − 1): $E(\sum_{i>1} p_i q_{i-1,t})$. Putting this together,
we find
$$\mathrm{Cov}(x_t, x_{t+2}) = p_1 E\Big[(q_{1,t} - q)\Big(\sum_{i=1} p_i q_{i,t} - q\Big)\Big] + E\Big[(q_{1,t} - q)\Big(\sum_{i>1} p_i q_{i-1,t} - q\Big)\Big] = p_1 \sum_{i=1} p_i \sigma_{1,i}(t) + \sum_{i>1} p_i \sigma_{1,i-1}(t).$$
But we have already found that $\sum_{i=1} p_i \sigma_{1,i}(t) = \mathrm{Cov}(x_t, x_{t+1})$, so
we see that the covariance between samples from cohorts
born T = 2 years apart can be expressed in terms of
the covariance between cohorts born T = 1 year apart,
plus an additional term:
$$\mathrm{Cov}(x_t, x_{t+2}) = p_1 \mathrm{Cov}(x_t, x_{t+1}) + \sum_{i>1} p_i \sigma_{1,i-1}(t).$$
Repeating the procedure for larger T,
the pattern repeats, and we are eventually led to a
recursive expression for cohorts born any number, T, of
years apart:
$$\mathrm{Cov}(x_t, x_{t+T}) = \sum_{j=1}^{T-1} p_j\, \mathrm{Cov}(x_t, x_{t+T-j}) + \sum_{i \ge T} p_i\, \sigma_{1,i-(T-1)}(t). \qquad \text{(eqn 8)}$$
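The recursion in eqn (8) is straightforward to evaluate once the population covariances $\sigma_{1,i}(t)$ between age class 1 and each age class i are available. The Python sketch below takes those covariances as given input; computing them requires the recurrence equations of Jorde & Ryman (1995), which are not reproduced here. The function name `cohort_covariances` and the numbers in the usage example are purely illustrative assumptions.

```python
def cohort_covariances(p, sigma1, t_max):
    """Evaluate eqn 8 for T = 1..t_max.

    p[i-1]      : p_i = l_i * b_i for age classes i = 1..k (p_i = 0 for i > k)
    sigma1[i-1] : sigma_{1,i}(t), covariance between age classes 1 and i in year t
                  (taken as given; not computed here)
    Returns a list whose element T-1 is Cov(x_t, x_{t+T}).
    """
    k = len(p)
    cov = {}
    for T in range(1, t_max + 1):
        # First sum of eqn 8: j = 1..T-1, restricted to j <= k since p_j = 0 beyond k.
        recursive = sum(p[j - 1] * cov[T - j] for j in range(1, min(T - 1, k) + 1))
        # Second sum of eqn 8: i = T..k, with sigma_{1, i-(T-1)} at 0-based index i - T.
        direct = sum(p[i - 1] * sigma1[i - T] for i in range(T, k + 1))
        cov[T] = recursive + direct
    return [cov[T] for T in range(1, t_max + 1)]
```

With the made-up inputs `p = [0.2, 0.5, 0.3]` and `sigma1 = [0.03, 0.02, 0.01]`, the first element equals the weighted mean 0.2 × 0.03 + 0.5 × 0.02 + 0.3 × 0.01 = 0.019, and the second equals 0.2 × 0.019 + 0.5 × 0.03 + 0.3 × 0.02 = 0.0248, in agreement with the T = 1 and T = 2 expressions derived above.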
We now take the ratio of E[F(T)] (eqn 7), that is, what
we can observe, to the annual rate of genetic drift r
(eqn 6), which is what we are interested in, as a correction factor,
C(T), for samples from cohorts born T years apart (cf.
Jorde & Ryman 1995, eqn 22):
$$C(T) \approx \frac{\sigma_1^2(t) + \sigma_1^2(t+T) - 2\,\mathrm{Cov}(t, t+T)}{\sigma_1^2(t+1) - \sigma_1^2(t)}. \qquad \text{(eqn 9)}$$
Here, the $\sigma$ terms are given by Jorde & Ryman (1995)
(eqns 4–7, or the alternative formulations in their
eqns 10–13) and Cov(t, t + T) is the quantity calculated in
eqn (8) above. In Fig. 1, the quantity C(T)/G for the hypothetical population is plotted (red line) and seen to match
closely the simulated F for juvenile age classes.
Finally, replacing the expected value of F(T) with its
empirical estimate, $\hat{F}(T)$, the per-generation effective size
can be estimated as
$$\hat{N}_e = \frac{C(T)}{2G[\hat{F}(T) - 1/\tilde{n} + 1/N_1]}, \qquad \text{(eqn 10)}$$
thus extending estimator (eqn 5) to pairs of cohorts born
any number (T) of years apart.
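Estimator (eqn 10) can be written as a short Python function. This is a minimal sketch with a hypothetical function name. As an assumed convention, a non-positive sampling-corrected F is reported as an infinite estimate.

```python
def ne_from_cohorts(f_hat, c_t, g, n_harm, n1):
    """Per-generation effective size from eqn 10.

    f_hat  : estimated temporal shift between two cohorts born T years apart
    c_t    : correction factor C(T) for that interval (eqn 9)
    g      : generation length G in years
    n_harm : harmonic mean sample size (n-tilde)
    n1     : number of newborns N1 (sample plan I)
    """
    corrected = f_hat - 1.0 / n_harm + 1.0 / n1
    # Assumed convention: no finite estimate when the corrected F is not positive.
    if corrected <= 0:
        return float("inf")
    return c_t / (2.0 * g * corrected)
```

As a consistency check, take the values quoted in this paper for the smaller simulated population: C(1) = 57.88, G = 5.93, N1 = 100 and a true Ne of 456. An error-free F observed with samples of 50 would then be 57.88/(2 × 5.93 × 456) + 1/50 − 1/100, and passing that value to the function returns 456.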
Computer simulations were used to check the accuracy and precision of Ne estimates obtained with the proposed
method. The simulations were carried out as in Jorde &
Ryman (1995). Briefly, a set of age-specific survival (li) and
birth rates (bi) was chosen (Table 1) and a population
consisting of $2N_i = 2N_1 l_i$ genes in each age class i was
created with equal allele frequencies q = 0.1 (10 alleles) in
all age classes in the initial year, t = 0. In each subsequent
year, a newborn cohort (age class i = 1) was generated by
binomial sampling of $2N_i b_i$ genes from each parental age
class. Older age classes, i > 1, were generated by hypergeometric sampling (i.e., sampling without replacement)
of $2N_i$ surviving genes out of the previous $2N_{i-1}$. The procedure was repeated for a number of years (50) so that the
allele frequency dynamics became independent of the
initial conditions. Sampling for genetic analyses was simulated by sampling 2n genes without replacement from
the juvenile age class in years t = 50, 51, 53 and 55. Samples were returned to the population after inspection,
before the next year’s reproduction (i.e., sampling followed plan I). Simulations were repeated 20 times, and
each run was taken to represent a separate, independently segregating gene locus. Temporal change in allele frequencies was calculated for each sample interval, T = 1, 3 and
5, using the estimator Fs of Jorde & Ryman (2007) as a substitute for Fc when estimating Ne (eqn 10).
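One year of the simulation scheme described above might be sketched in Python with NumPy as follows. This is an illustrative reconstruction, not the program used for the reported results. The function names are hypothetical, multinomial sampling is used here as the multi-allele counterpart of the binomial sampling mentioned in the text, and gene numbers are rounded to integers.

```python
import numpy as np

# Life table from Table 1 (age classes 1..10).
L_SURV = [1.0, 1.0, 0.9, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.5]
B_RATE = [0.0, 0.0, 0.0, 0.13333, 0.39375, 0.39286, 0.26667, 0.13, 0.13, 0.0]


def initial_population(n1, l, b, n_alleles=10):
    """Allele counts per age class in year 0, with equal allele frequencies."""
    n_genes = [int(round(2 * n1 * li)) for li in l]                 # 2N_i = 2N_1 * l_i
    offspring = [int(round(n * bi)) for n, bi in zip(n_genes, b)]   # 2N_i * b_i
    # Integer division assumes 2N_i is divisible by the number of alleles.
    counts = [np.full(n_alleles, n // n_alleles, dtype=np.int64) for n in n_genes]
    return counts, n_genes, offspring


def simulate_year(counts, n_genes, offspring, rng):
    """Advance all age classes by one year, using the year-t state throughout."""
    newborn = np.zeros(len(counts[0]), dtype=np.int64)
    for parent_counts, n_off in zip(counts, offspring):
        if n_off > 0:
            # Sampling with replacement from each parental age class.
            newborn += rng.multinomial(n_off, parent_counts / parent_counts.sum())
    # Survivors: sampling without replacement of 2N_i genes from the previous 2N_(i-1).
    older = [rng.multivariate_hypergeometric(counts[i - 1], n_genes[i])
             for i in range(1, len(counts))]
    return [newborn] + older
```

A run would call `initial_population(100, L_SURV, B_RATE)` once and then `simulate_year` repeatedly with a generator such as `np.random.default_rng()`. Drawing a sample of 2n genes from the newborn class can use the same `multivariate_hypergeometric` call.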
Table 1 reports the results of 5000 replicate simulations, each representing 20 independent loci with 10
alleles, for various combinations of population and sample sizes (n = 50, 100, and 200). Two different population
sizes were simulated, with N1 = 100 and 300 juveniles,
respectively, representing true effective sizes of 456
and 1367 per generation (calculated according to Felsenstein 1971). The generation length was G = 5.93 years,
and the calculated correction factors (eqn 9) for samples
from 1 to 5 years apart were C(T) = 57.88, 63.01, 65.56,
58.51 and 46.89. The different C(T) values compensate for
the different amounts of temporal change expected in a
population with the present demographic characteristics
(cf. Fig. 1) and should therefore result in correct estimates
of Ne for any T. This prediction is borne out in the simulations (Table 1), as can be seen by comparing estimates for
the same true Ne and n for T = 1, 3, and 5 years apart.
The similarity between estimated and true effective size
verifies that the method can provide accurate results.
Precision of the estimates, as indicated by the width of
the reported confidence intervals (95% CI), is quite low
when sample sizes are small relative to the true Ne
(Table 1). However, precision improves with larger samples, in accordance with previous findings for the temporal method (Nei & Tajima 1981; Waples 1989). Precision
improves further when estimates over separate time
intervals are combined, by correcting each F(T) estimate
with the corresponding C(T) factor before averaging to a
single estimate over all intervals (right column). The latter average has a CI approximately half as wide, and thus
about twice the precision, as compared to estimates for the
three separate intervals (cf. Table 1).
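One way to combine intervals as described is sketched below in Python: each sampling-corrected F(T) is divided by its C(T), the resulting per-year drift estimates are averaged, and the mean is converted to Ne. The unweighted mean, the common sample size across intervals and the function name `ne_combined` are assumptions made for illustration; the text does not specify these details.

```python
def ne_combined(f_hats, c_factors, g, n_harm, n1):
    """Single Ne estimate from several intervals T.

    f_hats    : estimated F(T), one per interval
    c_factors : matching correction factors C(T)
    g         : generation length G in years
    n_harm    : harmonic mean sample size, assumed equal for all intervals
    n1        : number of newborns N1
    """
    if len(f_hats) != len(c_factors) or not f_hats:
        raise ValueError("need one C(T) per F(T)")
    # Each ratio estimates the annual drift rate r = 1 / (2 * Ne * G), cf. eqn 6.
    rates = [(f - 1.0 / n_harm + 1.0 / n1) / c for f, c in zip(f_hats, c_factors)]
    mean_rate = sum(rates) / len(rates)  # unweighted mean (assumption)
    return 1.0 / (2.0 * g * mean_rate) if mean_rate > 0 else float("inf")
```

With error-free F(T) values for T = 1, 3 and 5 constructed from C(T) = 57.88, 65.56 and 46.89, G = 5.93, N1 = 100, samples of 50 and a true Ne of 456, the function returns 456.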
Applied to coastal Atlantic cod populations, C(T) was
calculated for cohorts born up to nine years apart and
compared to the computer simulated values of Knutsen
et al. (2011). As seen from the results (Table 2), the fit
between the simulated values and those calculated from
eqn (9) is very good.
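The agreement can be quantified directly from the values in Table 2. The short Python sketch below computes the relative difference between predicted and simulated C(T) for each interval; the variable and function names are illustrative.

```python
# C(T) for T = 1..9 years, copied from Table 2.
SIMULATED = [6.77, 8.56, 8.54, 9.01, 10.23, 11.57, 12.35, 13.12, 14.60]
PREDICTED = [6.68, 8.39, 8.48, 8.98, 10.26, 11.43, 12.37, 13.30, 14.30]


def relative_differences(simulated, predicted):
    """(predicted - simulated) / simulated for each interval T."""
    return [(p - s) / s for s, p in zip(simulated, predicted)]


rel = relative_differences(SIMULATED, PREDICTED)
max_abs = max(abs(d) for d in rel)
worst_t = 1 + max(range(len(rel)), key=lambda i: abs(rel[i]))
```

For these nine intervals the largest absolute relative difference is about 2%, at T = 9.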
Discussion
The estimator (eqn 10) is a direct extension of Jorde &
Ryman (1995)’s expression (eqn 5), applicable to temporal
samples from single cohorts that are born any number of
years apart and thus not restricted to consecutive cohorts.
As for the earlier approach, it requires knowledge, or reasonable estimates, of the age-specific survival and birth
rates, that is, the elements of a standard Leslie population
matrix or life table (e.g. Caswell 2001). From such estimates, correction factors, C(T), can be calculated (eqn 9)
for the population at hand. Software for carrying out the
calculations is available from http://folk.uio.no/ejorde.
In common with the original formulation (Jorde &
Ryman 1995), the present method assumes that the
demographic parameters (li, bi, and census population
size) remain reasonably constant for some time (see Jorde
& Ryman 1995 for a discussion). Hence, the method may
not be applicable to populations or organisms that are
known to fluctuate greatly in these characteristics. Other
potential limitations of the method concern sampling.
Because the method assumes that each sample represents
a single age class, there must be some way to reliably age
the sampled individuals, or else samples must be collected at a life stage when age can be inferred indirectly,
for example, as young-of-the-year.

Table 1 Results of computer simulations that estimate effective size of age-structured populations from temporal allele frequency shifts
between samples taken from juvenile cohorts born different numbers of years apart. Demographic population parameters were: survival
rates, li = 1.0, 1.0, 0.9, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.5 (10 age classes) and birth rates, bi = 0.0, 0.0, 0.0, 0.13333, 0.39375, 0.39286, 0.26667, 0.13,
0.13, 0.0 (adopted from Table 2 of Jorde & Ryman 1995)

              T = 1                    T = 3                    T = 5                    All
Ne     n      N̂e      95% CI           N̂e      95% CI           N̂e      95% CI           N̂e      95% CI
456    50     461.4   317.7–744.4      460.0   326.8–714.3      461.4   305.9–810.9      459.3   369.4–599.1
456    100    458.8   373.5–571.3      457.3   371.7–568.5      455.8   369.4–568.0      456.9   403.3–519.3
1367   50     1400.5  611.0–inf        1410.9  654.2–inf        1385.2  547.1–inf        1382.4  812.5–4183.6
1367   100    1395.8  838.6–3321.5     1384.5  869.7–2826.1     1394.6  792.0–3632.2     1368.4  1009.5–2083.2
1367   200    1380.0  1037.9–1913.9    1376.7  1054.4–1930.1    1382.0  1027.9–1996.1    1371.7  1161.1–1652.6

Table 2 Comparison between predicted (expression 9) and
simulated (Knutsen et al. 2011) correction factors C(T) for coastal
Atlantic cod cohorts, T = 1–9 years apart. Simulations and
calculations were based on life-table data in Table A1 of
Knutsen et al. (2011)

Years apart (T)   1      2      3      4      5      6      7      8      9
Simulated         6.77   8.56   8.54   9.01   10.23  11.57  12.35  13.12  14.60
Predicted         6.68   8.39   8.48   8.98   10.26  11.43  12.37  13.30  14.30

When ageing is possible, any age class may in principle be sampled and used to estimate Ne. A problem
arises, however, when comparing older (mature) age
classes over longer time intervals because individuals
that have survived to be sampled at an older age no
longer represent a random sample of the original cohort.
Instead, with increasing age, surviving individuals
increasingly represent those that have participated in
past reproductive events and are thus more strongly correlated genetically to subsequent cohorts. Older age classes are therefore genetically more similar than predicted
by eqn (8), which assumes that individuals are sampled
randomly. For organisms with high survival, this phenomenon has little effect on empirical estimates of
genetic differences and effective size. An example is the
population depicted in Fig. 1, where mortality (up to
50%) is seen to reduce F between samples from older age
classes taken several years apart (here, T > 4), but only
moderately so (about 10%) relative to F between juvenile
cohorts. For organisms with higher mortality, the effect
on older age classes is more pronounced and may not be
ignored (data not shown). Instead, such organisms are
better sampled as juveniles if Ne is to be estimated from
samples taken more than one year apart. (The original
Jorde & Ryman (1995) method, which assumes cohorts
born in consecutive years, is not affected by mortality
unless reproduction starts already in the year of birth,
which is probably rare for age-structured organisms.)
A common challenge when attempting to estimate
genetic drift and effective size of natural populations is that
the quantity to be measured is often quite small. This is
especially so for long-lived organisms that consist of multiple age classes and have long generation intervals. For the
two examples in Table 1, the expected rate of change for the
total population is only 1/(2GNe) = 0.00018 and 0.00006
per year, respectively (cf. eqn 6). Such minute levels of drift
are difficult to measure with any precision, unless samples
are huge (on the order of thousands) or the number of years
between samples is large. The original Jorde & Ryman
(1995) method delivers a boost in precision by focusing on
temporal change in a single age class, rather than in the total
population. The improvement in precision stems from the
much larger, by a factor of C/G, levels of temporal change
within single age classes relative to the total population (cf.
Fig. 1). The present development increases the usefulness
of the Jorde & Ryman (1995) method by also allowing samples from non-consecutive cohorts and therefore also provides a method for combining corrected estimates over
multiple time intervals into a single, more precise, estimate
of Ne. The improvements over the standard temporal
method come at the cost of needing to obtain demographic
estimates for the population at hand, but the cost can be reasonable in situations where estimation would otherwise be
difficult or lack precision.
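The magnitudes discussed in this paragraph can be reproduced with a few lines of Python. The sketch uses the approximation in eqn (6) for the annual drift rate and the definition of C(T) as the ratio of E[F(T)] to that rate (eqn 9); the function names are illustrative.

```python
def annual_drift_rate(ne, g):
    """Approximate annual rate of drift in the total population, 1 / (2 * G * Ne)."""
    return 1.0 / (2.0 * g * ne)


def expected_cohort_shift(c_t, ne, g):
    """Expected F between single cohorts T years apart, C(T) times the annual rate."""
    return c_t * annual_drift_rate(ne, g)


rate_small = annual_drift_rate(456, 5.93)              # population with Ne = 456
rate_large = annual_drift_rate(1367, 5.93)             # population with Ne = 1367
shift_t1 = expected_cohort_shift(57.88, 456, 5.93)     # consecutive cohorts, C(1) = 57.88
```

The two rates round to 0.00018 and 0.00006 per year, as quoted above, whereas the expected shift between consecutive cohorts in the smaller population is about 0.0107, which is far easier to detect with realistic sample sizes.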
Acknowledgements
This work was supported by the Research Council of Norway
and by the European Science Foundation’s MarinERA programme. I thank three anonymous referees for constructive comments on an earlier version of this manuscript.
References
Caswell H (2001) Matrix Population Models, 2nd edn. Sinauer, Sunderland,
MA.
Coombs JA, Letcher BH, Nislow KH (2012) GONe: Software for estimating effective population size in species with generational overlap.
Molecular Ecology Resources, 12, 160–163.
Felsenstein J (1971) Inbreeding and variance effective numbers in populations with overlapping generations. Genetics, 68, 581–597.
Jorde PE, Ryman N (1995) Temporal allele frequency change and estimation of effective size in populations with overlapping generations.
Genetics, 139, 1077–1090.
Jorde PE, Ryman N (2007) Unbiased estimator for genetic drift and effective population size. Genetics, 177, 927–935.
Knutsen H, Olsen EM, Jorde PE et al. (2011) Are low but statistically significant levels of genetic differentiation in marine fishes ‘biologically
meaningful’? A case study of coastal Atlantic cod. Molecular Ecology,
20, 768–783.
Luikart G, Ryman N, Tallmon DA, Schwartz MK, Allendorf FW (2010)
Estimation of census and effective population sizes: the increasing usefulness of DNA-based approaches. Conservation Genetics, 11, 355–373.
Nei M, Tajima F (1981) Genetic drift and estimation of effective population size. Genetics, 98, 625–640.
Wang J (2005) Estimation of effective population sizes from data on
genetic markers. Philosophical Transactions of the Royal Society of London.
Series B, Biological Sciences, 360, 1395–1409.
Waples RS (1989) A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics, 121,
379–391.
Waples RS, Yokota M (2007) Temporal estimates of effective population
size in species with overlapping generations. Genetics, 175, 219–233.