Molecular Ecology Resources (2012) 12, 476–480
doi: 10.1111/j.1755-0998.2011.03111.x

Allele frequency covariance among cohorts and its use in estimating effective size of age-structured populations

PER ERIK JORDE*
Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway

Abstract

A general expression for the covariance of allele frequencies among cohorts in age-structured populations is derived. The expression is used to extend the so-called temporal method for estimating effective population size from allele frequency shifts among samples from cohorts born any number of years apart. Computer simulations are used to check on the accuracy and precision of the method, and an application to coastal Atlantic cod is presented.

Keywords: effective population size, overlapping generations, temporal method

Received 2 August 2011; revision received 23 November 2011; accepted 10 December 2011

Introduction

The so-called temporal method is the most widely used genetic approach for estimating the effective size of contemporary natural populations (Waples & Yokota 2007; Luikart et al. 2010). The method relies on replicate sampling of individuals from the population, and effective population size is estimated from observed allele frequency shifts over the sampling interval. Various alternatives exist for how the calculations are carried out (reviewed by Wang 2005), but all depend on the premise that the observed shifts are related only to the effective size ($N_e$) of the population and the time (number of generations, t) between samples. For diploid organisms, the expected allele frequency shift, F, is:

$$E(F) \approx \frac{t}{2N_e}. \qquad \text{(eqn 1)}$$

A commonly used estimator for temporal allele frequency shifts is (Nei & Tajima 1981):

$$F_c = \frac{1}{a}\sum_{l=1}^{a}\frac{(x_l - y_l)^2}{(x_l + y_l)/2 - x_l y_l}, \qquad \text{(eqn 2)}$$

where $x_l$ and $y_l$ are the frequencies of the l'th allele in the first and second sample, respectively, and the summation is over all a alleles at the locus.
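A minimal sketch of eqn (2) in Python, for a single locus, may help make the estimator concrete. The function name `nei_tajima_fc` and the treatment of alleles that are absent from (or fixed in) both samples, where the term is 0/0 and is counted as zero here, are illustrative assumptions and not part of the original description.

```python
from typing import Sequence


def nei_tajima_fc(x: Sequence[float], y: Sequence[float]) -> float:
    """Fc (eqn 2) for one locus, given allele frequencies in two samples."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    total = 0.0
    for xl, yl in zip(x, y):
        denom = (xl + yl) / 2.0 - xl * yl
        # denom is zero only when the allele is absent from, or fixed in,
        # both samples; the numerator is then zero too, so the term is
        # counted as zero (an assumption of this sketch).
        if denom > 0.0:
            total += (xl - yl) ** 2 / denom
    # Average over all a alleles at the locus.
    return total / len(x)


# Two alleles whose frequencies shift by 0.1 between the samples.
fc = nei_tajima_fc([0.5, 0.5], [0.6, 0.4])
print(f"Fc = {fc:.4f}")
```

Each allele contributes its squared frequency difference scaled by a heterozygosity-like term, so the result is a standardized variance rather than a raw difference.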
In practical application, multiple loci are scored and an average $F_c$ is calculated over loci, weighting each locus by its number of alleles. Depending on the details of the sampling strategy used, i.e., whether individuals for genotyping were collected after reproduction (or, alternatively, returned to the population after biopsy; the two are equivalent for the current purpose and are referred to as sample plan I), or before reproduction and not returned to the population (plan II), the calculated $F_c$ can be used to estimate $N_e$ after subtracting the expected contribution arising from sampling (Waples 1989, eqn 12):

$$\hat{N}_e = \frac{t}{2(F_c - 1/\tilde{n} + 1/N)} \quad \text{(plan I)} \qquad \text{(eqn 3)}$$

or (Waples 1989, eqn 11):

$$\hat{N}_e = \frac{t}{2(F_c - 1/\tilde{n})} \quad \text{(plan II)} \qquad \text{(eqn 4)}$$

where $\tilde{n}$ is the (harmonic) mean sample size and, in the case of sample plan I, N is the census population size.

Correspondence: Per Erik Jorde; E-mail: p.e.jorde@bio.uio.no
*Present address: Institute of Marine Research, Flødevigen, N-4817 His, Norway

Estimation of $N_e$ is complicated in the presence of age structure and overlapping generations, as exemplified by the hypothetical population in Fig. 1. The figure depicts temporal shifts (F) over various numbers of years (up to T = 20) when samples are drawn from separate age classes, from sexually mature age classes and from all age classes combined. The pattern of temporal shifts depends on the demographic characteristics (age-specific survival and birth rates) and differs among populations, but some general features are apparent in the figure and may be summarized as follows. First, when allele frequency shifts are estimated from samples drawn from a single age class (i.e., comparing individuals collected at the same age, in different years), temporal shifts will be different from, and generally larger than, those pertaining to the total population over the same time period.
Second, such temporal shifts from single age classes do not increase linearly with the number of years between samples (T) but instead follow a pattern of damped oscillations with a period of one generation (here G = 5.93 years). In particular, F displays local minima at T = 1G years, 2G years, etc., and local maxima at T = 0.5G years, 1.5G years, etc., superimposed on a general increasing trend. Third, when samples are drawn from multiple age classes, either mature adults or also including juveniles (i.e., representative samples from the total population), F still does not follow the predicted value (eqn 6; solid black line in Fig. 1) closely, but also takes part in the oscillations of its constituent age classes, although considerably subdued in magnitude. Finally, temporal shifts for such aggregated age classes are small, and this could make them difficult to measure with precision.

© 2012 Blackwell Publishing Ltd

[Figure 1. Left axis: Genetic difference (F); right axis: C(T)/G; horizontal axis: Years apart (T), from 1 to 20. Curves labelled "Single age classes", "Adult age classes" and "All age classes".]

Fig. 1 Temporal dynamics of allele frequency change (F) in an age-structured population (demographic parameters as in Table 1), as observed when sampling from a single age class (10 separate, partly overlapping curves with age class 1 at the top and 10 at the bottom), from sexually mature adults (5 age classes in each sample) and from the total population (10 age classes in each sample). Based on 100 000 replicate simulation runs. Also indicated are the expected temporal shift (eqn 6) in a hypothetical population with discrete generations and the same generation length (G = 5.93 years) and effective size ($N_e$ = 456 per generation) (solid black line), and the derived correction factor (eqn 9) for this population, scaled by generation length: C(T)/G (red line).
From the above observations, it is clear that estimating $N_e$ from temporal change in allele frequencies in age-structured populations requires an adjustment of the standard, discrete-generation expectation (eqn 1). Corrective measures are needed to account for the particulars of sampling and demographic features of the population under consideration. When samples are drawn from consecutive cohorts, Jorde & Ryman (1995) devised a correction factor (C) that summarizes the pertinent demographic characteristics of the population and applied it to develop a corrected estimator for $N_e$ (Jorde & Ryman 1995, eqn 25):

$$\hat{N}_e = \frac{C}{2G(F_c - 1/\tilde{n} + 1/N_1)}, \qquad \text{(eqn 5)}$$

which differs from eqn (3) in that C/G replaces t and $N_1$ (the number of newborns) replaces N (the total population size). Estimator (eqn 5) has been used in several empirical studies and is currently implemented in the software packages factorc (Jorde & Ryman 1995) and GONe (Coombs et al. 2012).

In practical application, samples are often not available from consecutive cohorts, and eqn (5) then cannot be used. Such was the situation for Knutsen et al. (2011), who estimated the effective sizes of two coastal Atlantic cod populations from juvenile cohorts that were collected for various purposes over a number of years. Screening all samples for 13 microsatellite loci, they estimated F among pairs of cohorts and averaged over pairs that were separated by the same number (T) of years. Because most of the cohort pairs were from nonconsecutive years (T > 1), the authors resorted to computer simulations to evaluate additional correction factors, C(T), for pairs separated by up to 9 years. These additional factors are needed for unbiased estimation of $N_e$ because of the non-linear relationship between F and T (cf. Fig. 1).
Setting up and running computer simulations to estimate $N_e$ is a cumbersome approach, however, and the purpose of this note is to extend the method of Jorde & Ryman (1995) and develop an analytical expression for C(T) for cohorts born any number of years apart.

Methods and results

As in Jorde & Ryman (1995), we consider an isolated, monoecious population consisting of k age classes, indexed by i, with constant size and age-specific survival $l_i$ and birth rates $b_i$. Considering a diploid locus that at one time in the past, t = 0 (from here on, t indexes years), was segregating for a (selectively neutral) allele at frequency q, Jorde & Ryman (1995) developed recurrence equations for the second moments of its subsequent frequency within, $\sigma_i^2(t)$ (i.e., variance), and among, $\sigma_{i,j}(t)$ (covariance), age classes i and j for any year t. It is a property of age-structured populations that, as long as the demographic rates, $l_i$ and $b_i$, remain constant, all such variance terms will eventually settle towards a common, constant rate of increase per year (Felsenstein 1971). For an age class i, this rate can be expressed as

$$r = \frac{\sigma_i^2(t+1) - \sigma_i^2(t)}{q(1-q) - \sigma_i^2(t)} \approx \frac{1}{2N_e G}, \qquad \text{(eqn 6)}$$

where G is the generation length (in years) and $N_e$ is the effective size per generation.

Measuring r directly is impractical because the variance terms in eqn (6) are generally not observable. Instead, the temporal method relies on sampling the populations at two different times and using the standardized variance, F (eqn 2), of allele frequency change as a substitute for r. Considering observed allele frequencies $x_t$ and $x_{t+T}$ in two samples drawn from single cohorts born T years apart, the expected value of $F_c$ (eqn 2) is (cf.
Jorde & Ryman 1995, eqn 16):

$$E[F_c(T)] = \frac{\mathrm{Var}(x_t) + \mathrm{Var}(x_{t+T}) - 2\,\mathrm{Cov}(x_t, x_{t+T})}{q(1-q) - \mathrm{Cov}(x_t, x_{t+T})}. \qquad \text{(eqn 7)}$$

Here, $\mathrm{Var}(x_t)$ and $\mathrm{Cov}(x_t, x_{t+T})$ represent the variance and covariance, respectively, of allele frequencies within and among the two samples with respect to the ancestral population frequency, q. $\mathrm{Var}(x_t)$ has been derived previously (Jorde & Ryman 1995, eqn 17) and shown to be approximately equal to $\sigma_1^2(t)$, plus terms involving sample sizes. These sample terms will subsequently be subtracted from $F_c$ (cf. eqns 3 and 4) and will not concern us in the following.

In deriving the covariance term in eqn (7), we denote the true allele frequencies in the two sampled cohorts by $q_{1,t}$ and $q_{1,t+T}$, respectively, which for the purpose of derivation we assume to be the newborn age class 1. Considering first sampling from consecutive cohorts (i.e., T = 1), we let E denote the expected value operator and note that $E(x_t) = q_{1,t}$ and that $E(q_{1,t+1}) = \sum_{i=1}^{k} p_i q_{i,t}$. The former expression simply states the implicit but obvious assumption that sampling is representative, so that the expected allele frequency in the sample equals the allele frequency in the sampled cohort. The latter expression relates the allele frequency in age class 1 in year t + 1, from which the second sample was taken, to the weighted mean over their parents (in year t), the weights being the proportion of offspring derived from each parental age class, or $p_i = l_i b_i$. Thus,

$$\mathrm{Cov}(x_t, x_{t+1}) = E[(x_t - q)(x_{t+1} - q)] = E[(q_{1,t} - q)(q_{1,t+1} - q)] = E\Big[(q_{1,t} - q)\Big(\sum_{i=1}^{k} p_i q_{i,t} - q\Big)\Big] = \sum_{i=1}^{k} p_i E[(q_{1,t} - q)(q_{i,t} - q)] = \sum_{i=1}^{k} p_i \sigma_{1,i}(t).$$

In other words, the covariance of allele frequencies in samples drawn from consecutive juvenile cohorts is simply the weighted (by $p_i$) mean of the population covariances, $\sigma_{1,i}(t)$, between age class 1 in the first sample and the parents of those in the second sample (cf. Jorde & Ryman 1995, eqn 6).
We note in passing that, because subsequent mortality within cohorts at older ages is independent between the two cohorts, this sample covariance will be the same regardless of the age at which the cohorts were sampled, as long as sampling remains representative (which, however, is not always the case, as discussed later).

Expanding the derivations to cohorts born T = 2 years apart, we note that the frequency in the second sample, $x_{t+2}$, has an expected value of $E(x_{t+2}) = E(q_{1,t+2}) = E(\sum_{i=1} p_i q_{i,t+1})$. To express this quantity in terms of allele frequencies in the preceding year, t, we split the sum into a juvenile (i = 1) and an older (i > 1) component: $E(p_1 q_{1,t+1}) + E(\sum_{i>1} p_i q_{i,t+1})$. (By definition $p_{i>k} = 0$, so we hereafter drop the upper limit, k, on summations over age classes.) The former, juvenile, component can, as above, be expressed as a sum over parental age classes: $p_1 E(\sum_{i=1} p_i q_{i,t})$. The component that consists of older age classes (i > 1) can also be expressed in terms of allele frequencies in the previous year, when they were one year younger (i − 1): $E(\sum_{i>1} p_i q_{i-1,t})$. Putting this together, we find

$$\mathrm{Cov}(x_t, x_{t+2}) = p_1 E\Big[(q_{1,t} - q)\Big(\sum_{i=1} p_i q_{i,t} - q\Big)\Big] + E\Big[(q_{1,t} - q)\Big(\sum_{i>1} p_i q_{i-1,t} - q\Big)\Big] = p_1 \sum_{i=1} p_i \sigma_{1,i}(t) + \sum_{i>1} p_i \sigma_{1,i-1}(t).$$

But we have already found that $\sum_{i=1} p_i \sigma_{1,i}(t) = \mathrm{Cov}(x_t, x_{t+1})$, so we see that the covariance between samples from cohorts born T = 2 years apart can be expressed in terms of the covariance between cohorts born T = 1 year apart, plus an additional term: $\mathrm{Cov}(x_t, x_{t+2}) = p_1 \mathrm{Cov}(x_t, x_{t+1}) + \sum_{i>1} p_i \sigma_{1,i-1}(t)$.
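To illustrate how the pattern continues, the same splitting can be carried one step further, to T = 3. This worked step is an added sketch rather than part of the original derivation; it assumes, as above, constant $p_i$, representative sampling, and $\sigma_{1,i}(t)$ denoting the covariance between age classes 1 and i in year t. Parents in age class 1 in year t + 2 contribute the T = 2 covariance, parents in age class 2 were newborns in year t + 1 and contribute the T = 1 covariance, and parents in age classes i ≥ 3 were already alive in year t, in age class i − 2:

```latex
\begin{aligned}
E(q_{1,t+3}) &= p_1 E(q_{1,t+2}) + p_2 E(q_{2,t+2}) + \sum_{i \ge 3} p_i E(q_{i,t+2}),\\
E(q_{2,t+2}) &= E(q_{1,t+1}), \qquad E(q_{i,t+2}) = E(q_{i-2,t}) \quad (i \ge 3),\\
\mathrm{Cov}(x_t, x_{t+3}) &= p_1\,\mathrm{Cov}(x_t, x_{t+2}) + p_2\,\mathrm{Cov}(x_t, x_{t+1})
  + \sum_{i \ge 3} p_i\,\sigma_{1,i-2}(t).
\end{aligned}
```

Each additional year apart thus adds one more previously computed covariance to the first sum and shifts the age index of the remaining population covariances by one.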
By repeating the procedure for larger T, the pattern repeats, and we are eventually led to a recursive expression for cohorts born any number of T years apart:

$$\mathrm{Cov}(x_t, x_{t+T}) = \sum_{j=1}^{T-1} p_j\,\mathrm{Cov}(x_t, x_{t+T-j}) + \sum_{i \ge T} p_i\,\sigma_{1,i-(T-1)}(t). \qquad \text{(eqn 8)}$$

We now take the ratio of E[F(T)] (eqn 7), that is, what we can observe, to the annual rate of genetic drift r (eqn 6), what we are interested in, as a correction factor, C(T), for samples from cohorts born T years apart (cf. Jorde & Ryman 1995, eqn 22):

$$C(T) \approx \frac{\sigma_1^2(t) + \sigma_1^2(t+T) - 2\,\mathrm{Cov}(t, t+T)}{\sigma_1^2(t+1) - \sigma_1^2(t)}. \qquad \text{(eqn 9)}$$

Here, the $\sigma$ terms are given by Jorde & Ryman (1995) (eqns 4–7, or the alternative formulations in their eqns 10–13) and Cov(t, t + T) is the quantity calculated in eqn (8) above. In Fig. 1, the quantity C(T)/G for the hypothetical population is plotted (red line) and seen to match closely the simulated F for juvenile age classes.

Finally, replacing the expected value of F(T) with its empirical estimate, $\hat{F}(T)$, the per-generation effective size can be estimated as

$$\hat{N}_e = \frac{C(T)}{2G[\hat{F}(T) - 1/\tilde{n} + 1/N_1]}, \qquad \text{(eqn 10)}$$

thus extending estimator (eqn 5) to pairs of cohorts born any number (T) of years apart.

Computer simulations were used to check on the accuracy and precision of $N_e$ estimates using the proposed method. The simulations were carried out as in Jorde & Ryman (1995). Briefly, a set of age-specific survival ($l_i$) and birth rates ($b_i$) was chosen (Table 1) and a population consisting of $2N_i = 2N_1 l_i$ genes in each age class i was created with equal allele frequencies q = 0.1 (10 alleles) in all age classes in the initial year, t = 0. In each subsequent year, a newborn cohort (age class i = 1) was generated by binomial sampling $2N_i b_i$ genes from each parental age class.
Older age classes, i > 1, were generated by hypergeometric sampling (i.e., sampling without replacement) of $2N_i$ surviving genes out of the previous $2N_{i-1}$. The procedure was repeated for a number of years (50), for the allele frequency dynamics to become independent of the initial conditions. Sampling for genetic analyses was simulated by sampling 2n genes without replacement from the juvenile age class in years t = 50, 51, 53 and 55. Samples were returned to the population after inspection, before the next year's reproduction (i.e., sampling followed plan I). Simulations were repeated 20 times and each run taken to represent a separate, independently segregating gene locus. Temporal change in allele frequencies was calculated for each sample interval, T = 1, 3 and 5, using the estimator $F_s$ of Jorde & Ryman (2007) as a substitute for $F_c$ when estimating $N_e$ (eqn 10).

Table 1 reports the results of 5000 replicate simulations, each representing 20 independent loci with 10 alleles, for various combinations of population and sample sizes (n = 50, 100, and 200). Two different population sizes were simulated, with $N_1$ = 100 and 300 juveniles, respectively, representing true effective sizes of 456 and 1367 per generation (calculated according to Felsenstein 1971). The generation length was G = 5.93 years, and the calculated correction factors (eqn 9) for samples from 1 to 5 years apart were C(T) = 57.88, 63.01, 65.56, 58.51 and 46.89. The different C(T) values compensate for the different amount of temporal change expected in a population with the present demographic characteristics (cf. Fig. 1) and should therefore result in correct estimates of $N_e$ for any T. This prediction is borne out in the simulations (Table 1), as can be seen by comparing estimates for the same true $N_e$ and n for T = 1, 3, and 5 years apart. The similarity between estimated and true effective size verifies that the method can provide accurate results.
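The bookkeeping behind these numbers can be sketched in a few lines of Python. The sketch below is illustrative only. It takes the life-table values from the Table 1 caption and the reported C(T) values as given, and it assumes the usual definition of generation length as the mean age of parents, G = Σ i·p_i, which reproduces the reported 5.93 years. The function names, and the convention of returning infinity when the sampling-corrected shift is not positive, are assumptions of this sketch and not part of the published software.

```python
from typing import Sequence

# Life-table values as listed in the Table 1 caption (age classes 1..10).
SURVIVAL = [1.0, 1.0, 0.9, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.5]
BIRTH = [0.0, 0.0, 0.0, 0.13333, 0.39375, 0.39286, 0.26667, 0.13, 0.13, 0.0]

# Correction factors C(T) for T = 1..5 as reported in the text.
C_T = {1: 57.88, 2: 63.01, 3: 65.56, 4: 58.51, 5: 46.89}


def parental_proportions(l: Sequence[float], b: Sequence[float]) -> list:
    """p_i = l_i * b_i, the proportion of newborns derived from age class i."""
    return [li * bi for li, bi in zip(l, b)]


def generation_length(p: Sequence[float]) -> float:
    """Assumed definition: G = sum_i i * p_i (mean age of parents, ages from 1)."""
    return sum(age * pi for age, pi in enumerate(p, start=1))


def estimate_ne(f_hat: float, c_t: float, g: float,
                n_tilde: float, n1: float) -> float:
    """Eqn (10), sample plan I: C(T) / (2 G [F - 1/n~ + 1/N1]).

    Returns infinity when the sampling-corrected shift is not positive
    (an assumed convention for this sketch).
    """
    drift = f_hat - 1.0 / n_tilde + 1.0 / n1
    if drift <= 0.0:
        return float("inf")
    return c_t / (2.0 * g * drift)


p = parental_proportions(SURVIVAL, BIRTH)
G = generation_length(p)

# Round trip: build the F expected for a known Ne, then recover Ne from it.
NE_TRUE, N_SAMPLE, N1 = 1367.0, 100.0, 300.0
f_expected = C_T[3] / (2.0 * G * NE_TRUE) + 1.0 / N_SAMPLE - 1.0 / N1
ne_hat = estimate_ne(f_expected, C_T[3], G, N_SAMPLE, N1)

print(f"sum of p_i = {sum(p):.4f}, G = {G:.2f} years")
print(f"recovered Ne = {ne_hat:.1f}")
```

The round trip only confirms that the estimator inverts the expectation it is built on. It says nothing about sampling error, which is what the simulations in Table 1 address.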
Precision of the estimates, as indicated by the width of the reported confidence intervals (95% CI), is quite low when sample sizes are small relative to the true $N_e$ (Table 1). However, precision improves with larger samples, in accordance with previous findings for the temporal method (Nei & Tajima 1981; Waples 1989). Precision improves further when estimates over separate time intervals are combined, by correcting each F(T) estimate with the corresponding C(T) factor before averaging to a single estimate over all intervals (right column). The latter average has a CI approximately half as wide, and thus twice the precision, as compared to estimates for the three separate intervals (cf. Table 1).

Applied to coastal Atlantic cod populations, C(T) was calculated for cohorts born up to nine years apart and compared to the computer-simulated values of Knutsen et al. (2011). As seen from the results (Table 2), the fit between the simulated values and those calculated from eqn (9) is very good.

Discussion

The estimator (eqn 10) is a direct extension of the expression of Jorde & Ryman (1995) (eqn 5), applicable to temporal samples from single cohorts that are born any number of years apart and thus not restricted to consecutive cohorts. As for the earlier approach, it requires knowledge, or reasonable estimates, of the age-specific survival and birth rates, that is, the elements of a standard Leslie population matrix or life table (e.g. Caswell 2001). From such estimates, correction factors, C(T), can be calculated (eqn 9) for the population at hand. Software for carrying out the calculations is available from http://folk.uio.no/ejorde.

In common with the original formulation (Jorde & Ryman 1995), the present method assumes that the demographic parameters ($l_i$, $b_i$, and census population size) remain reasonably constant for some time (see Jorde & Ryman 1995 for a discussion).
Hence, the method may not be applicable to populations or organisms that are known to fluctuate greatly in these characteristics.

Table 1 Results of computer simulations that estimate effective size of age-structured populations from temporal allele frequency shifts between samples taken from juvenile cohorts born different numbers of years apart. Demographic population parameters were: survival rates, $l_i$ = 1.0, 1.0, 0.9, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.5 (10 age classes) and birth rates, $b_i$ = 0.0, 0.0, 0.0, 0.13333, 0.39375, 0.39286, 0.26667, 0.13, 0.13, 0.0 (adopted from Table 2 of Jorde & Ryman 1995)

               T = 1                      T = 3                      T = 5                      All
Ne     n       Est. Ne  95% CI            Est. Ne  95% CI            Est. Ne  95% CI            Est. Ne  95% CI
456    50      461.4    317.7–744.4       460.0    326.8–714.3       461.4    305.9–810.9       459.3    369.4–599.1
456    100     458.8    373.5–571.3       457.3    371.7–568.5       455.8    369.4–568.0       456.9    403.3–519.3
1367   50      1400.5   611.0–inf         1410.9   654.2–inf         1385.2   547.1–inf         1382.4   812.5–4183.6
1367   100     1395.8   838.6–3321.5      1384.5   869.7–2826.1      1394.6   792.0–3632.2      1368.4   1009.5–2083.2
1367   200     1380.0   1037.9–1913.9     1376.7   1054.4–1930.1     1382.0   1027.9–1996.1     1371.7   1161.1–1652.6

Table 2 Comparison between predicted (expression 9) and simulated (Knutsen et al. 2011) correction factors C(T) for coastal Atlantic cod cohorts, T = 1–9 years apart. Simulations and calculations were based on life-table data in Table A1 of Knutsen et al. (2011)

Years apart (T)   1      2      3      4      5      6      7      8      9
Simulated         6.77   8.56   8.54   9.01   10.23  11.57  12.35  13.12  14.60
Predicted         6.68   8.39   8.48   8.98   10.26  11.43  12.37  13.30  14.30

Other potential limitations of the method concern sampling. Because the method assumes that each sample represents a single age class, there must be some way to reliably age the sampled individuals, or else samples must be collected at a life stage when age can be inferred indirectly, for example, as young-of-the-year. When ageing is possible, any age class may in principle be sampled and used to estimate $N_e$.
A problem arises, however, when comparing older (mature) age classes over longer time intervals because individuals that have survived to be sampled at an older age no longer represent a random sample of the original cohort. Instead, with increasing age, surviving individuals increasingly represent those that have participated in past reproductive events and are thus more strongly correlated genetically to subsequent cohorts. Older age classes are therefore genetically more similar than predicted by eqn (8), which assumes that individuals are sampled randomly. For organisms with high survival, this phenomenon has little effect on empirical estimates of genetic differences and effective size. An example is the population depicted in Fig. 1, where mortality (up to 50%) is seen to reduce F between samples from older age classes taken several years apart (here, T > 4), but only moderately so (about 10%) relative to F between juvenile cohorts. For organisms with higher mortality, the effect on older age classes is more pronounced and may not be ignored (data not shown). Instead, such organisms are better sampled as juveniles if $N_e$ is to be estimated from samples taken more than one year apart. (The original Jorde & Ryman (1995) method, which assumes cohorts born in consecutive years, is not affected by mortality unless reproduction starts already in the year of birth, which is probably rare for age-structured organisms.)

A common challenge when attempting to estimate genetic drift and effective size of natural populations is that the quantity to be measured is often quite small. This is especially so for long-lived organisms that consist of multiple age classes and have long generation intervals. For the two examples in Table 1, the expected rate of change for the total population is only $1/(2GN_e)$ = 0.00018 and 0.00006 per year, respectively (cf. eqn 6).
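As a worked check of these figures, substituting G = 5.93 years and the two simulated effective sizes into $1/(2GN_e)$ gives the first two lines below. The last line is an added comparison that is not in the original text: it uses the reported C(1) = 57.88 to show the expected drift signal between consecutive single cohorts for $N_e$ = 456, before any sampling terms are added.

```latex
\begin{aligned}
\frac{1}{2 G N_e} &= \frac{1}{2 \times 5.93 \times 456} = \frac{1}{5408.16} \approx 0.000185,\\
\frac{1}{2 G N_e} &= \frac{1}{2 \times 5.93 \times 1367} = \frac{1}{16212.62} \approx 0.0000617,\\
\frac{C(1)}{2 G N_e} &= \frac{57.88}{5408.16} \approx 0.0107.
\end{aligned}
```

The single-cohort signal is larger than the total-population rate by the factor C(1)/G = 57.88/5.93 ≈ 9.8.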
Such minute levels of drift are difficult to measure with any precision, unless samples are huge (on the order of thousands) or the number of years between samples is large. The original Jorde & Ryman (1995) method delivers a boost in precision by focusing on temporal change in a single age class, rather than in the total population. The improvement in precision stems from the much larger, by a factor of C/G, levels of temporal change within single age classes relative to the total population (cf. Fig. 1). The present development increases the usefulness of the Jorde & Ryman (1995) method by also allowing samples from non-consecutive cohorts and therefore also provides a method for combining corrected estimates over multiple time intervals into a single, more precise, estimate of $N_e$. The improvements over the standard temporal method come at the cost of needing to obtain demographic estimates for the population at hand, but the cost can be reasonable in situations where estimation otherwise will be difficult or lack precision.

Acknowledgements

This work was supported by the Research Council of Norway and by the European Science Foundation's MarinERA programme. I thank three anonymous referees for constructive comments on an earlier version of this manuscript.

References

Caswell H (2001) Matrix Population Models, 2nd edn. Sinauer, Sunderland, MA.
Coombs JA, Letcher BH, Nislow KH (2012) GONe: Software for estimating effective population size in species with generational overlap. Molecular Ecology Resources, 12, 160–163.
Felsenstein J (1971) Inbreeding and variance effective numbers in populations with overlapping generations. Genetics, 68, 581–597.
Jorde PE, Ryman N (1995) Temporal allele frequency change and estimation of effective size in populations with overlapping generations. Genetics, 139, 1077–1090.
Jorde PE, Ryman N (2007) Unbiased estimator for genetic drift and effective population size. Genetics, 177, 927–935.
Knutsen H, Olsen EM, Jorde PE et al. (2011) Are low but statistically significant levels of genetic differentiation in marine fishes 'biologically meaningful'? A case study of coastal Atlantic cod. Molecular Ecology, 20, 768–783.
Luikart G, Ryman N, Tallmon DA, Schwartz MK, Allendorf FW (2010) Estimation of census and effective population sizes: the increasing usefulness of DNA-based approaches. Conservation Genetics, 11, 355–373.
Nei M, Tajima F (1981) Genetic drift and estimation of effective population size. Genetics, 98, 625–640.
Wang J (2005) Estimation of effective population sizes from data on genetic markers. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 360, 1395–1409.
Waples RS (1989) A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics, 121, 379–391.
Waples RS, Yokota M (2007) Temporal estimates of effective population size in species with overlapping generations. Genetics, 175, 219–233.