Life history analysis for inference of fitness and population growth

Title: Joint analysis of life-history stages for inference of lifetime fitness and population
growth
Authors: RG Shaw, CJ Geyer, S Wagenius, HH Hangelbroek and JR Etterson (in some,
yet-to-be-determined order)
It is well understood that the fitness of an individual is a function of its survival and
contribution of offspring to the next generation. It is important to obtain accurate
measurements of individual lifetime fitness to model ecological and evolutionary
dynamics of populations such as numerical contributions to population growth and the
quality of offspring relative to natural selection. The simplicity of these statements,
however, belies underlying statistical complications that arise in efforts to estimate fitness
by jointly analyzing life-history components measured by different metrics throughout
the lifespan of an individual. One problem is that different kinds of data follow different distributions.
For example, the parametric distribution of binary survival data is not the same as that for
continuous fecundity data. Another problem is the dependency of later life history stages
on previous ones. For example, fecundity depends upon survival to reproductive age.
Although these issues have long been acknowledged, to date there is no single rigorously
justified analysis that can simultaneously account for a diversity of fitness measurements
taken sequentially throughout the lifetime of an individual. This is an important gap in
our portfolio of statistical tools given the pervasiveness of these issues to inferences of
population growth, natural selection, and the evolution of life history strategies.
Here we present the application of a new statistical approach that can be employed to
analyze longitudinal records of individuals to estimate lifetime fitness and population
growth. This method accommodates different statistical distributions of distinct life
history stages and accounts for the dependency inherent in sequential fitness
measurements. A complete mathematical development can be found in Geyer et al.
(2006) and the program, referred to henceforth as “Aster,” is available at … Our goal
here is twofold. First, we illustrate the problem by reviewing the limitations of
approaches that have previously been employed in empirical studies. Second, we describe
how Aster resolves these problems and provide three empirical examples where our
approach has been implemented in the context of x, y, and z.
The problem
Two standard properties of life-history data are central to the statistical challenges
addressed by Aster. First, no single parametric distribution is suitable for modeling both
survival and fecundity. Second, fitness measurements taken at later life-history stages are
contingent upon the individual's life-history status at earlier stages. Consequently, contributions to
fitness realized over the lifespan do not conform to any available distribution, even if
transformations are applied. Previous studies have attempted to deal with these problems
by analyzing components of fitness separately or by combining components using a
variety of methods.
Separate analyses of fitness components
Components of fitness are sometimes analyzed separately to avoid the statistical
complications described above. This approach is most appropriate if the researcher is
interested in whether a particular life-history stage has fitness consequences that are independent
of other stages. For example, in Shaw's (1986) study of genetic variation within a
population of Salvia lyrata in its response to conspecific density, she provided analyses
of survival using categorical methods and, for those plants that survived to a given age,
ANOVA of size (as a proxy for reproductive capacity in this perennial plant). This
approach has the appeal that the statistical assumptions underlying the analyses are more
likely to be satisfied. Yet the analysis of survivors' size, or in other cases fecundity, in
effect treats the missing observations of individuals that did not survive as though they
are missing at random, even though failure of that condition is directly relevant to fitness
variation. Moreover, separate analyses of fitness components cannot substitute for an
analysis of overall fitness, particularly considering the possibility of tradeoffs between
components. [SW - Explicate the previous sentence.]
Joint analyses of fitness components
JE -What about simple multiplicative fitness functions?
A common approach to dealing jointly with survival and reproduction is to use fecundity
as the index of fitness and retain zeroes for individuals who died prior to reproduction.
The underlying justification for this approach is that the zeroes have biological meaning
that will be lost if they are excluded. However, the resultant distribution is often skewed
and cannot be rectified by data transformation, which undermines the validity of statistical
tests. [To obtain an analysis of overall fitness, Shaw (1986) also reported an analysis of
size means that included zeroes for plants that died. Here again, use of multiple
overlapping analyses undermines clear inference – this example could be blended in here]
[SW - why?] The excess of zeros can be dealt with… [SW - refer to Cheng's mixture model
and state its limitations OR refer to the "inflated-zero" approach and its limitations? – I
agree with Stuart but am not qualified to do this but found this citation that seems
relevant - Getachew 2004 Hierarchical Bayesian Analysis of Correlated Zero-inflated
Count Data. Biometrical Journal 46:653-663 – From the outline at the end, it appears as if
you intended to bring these points up in the discussion – but maybe they belong here]
Though authors rarely publish fitness distributions, they frequently remark on the
awkwardness of these distributions in their studies (e.g. Etterson 2004) and sometimes
report multiple analyses as efforts to address different distributional problems. For
example, in their experimental studies of frequency-dependent selection in the perennial
grass, Anthoxanthum odoratum, Antonovics and Ellstrand (1984) noted the extreme
skewness of the distribution of lifetime reproductive output. Finding no transformation
that yielded a normal distribution suitable for analysis of variance (ANOVA), they
assessed the robustness of their inferences by applying three distinct analyses (categorical
analysis of discrete fecundity classes, ANOVA of means, and nonparametric analysis). In
this study, results of the three analyses were largely consistent, but, in general, that need
not be the case.
JE – Contrast Aster approach to standard matrix projection models of population growth
with sensitivity analyses?
Individual fitness estimated with Aster
We propose an approach for rigorous statistical modeling and analysis of longitudinal
records of life history data as a sound basis for estimating and comparing total fitness
(Geyer et al. 2006). [JE – a fig as in Geyer et al. 2006 fig 1 would be helpful to refer to
through this explanation]. In this approach, life history is modeled through a sequence of
intervals bounded by the times at which fitness components are observed for all
individuals in the study population. The intervals could be days or years, and need not all
be the same length. Within a given interval, survival is modeled as a binomial variable.
Likewise, given that an individual survived to this point, whether it reproduced is
considered binomial. Fecundity, given that it reproduced in this interval, is modeled
according to a zero-truncated Poisson distribution. The components of fitness are
modeled jointly over successive intervals by explicitly taking into account the inherent
dependence of each stage on a previous stage. At the outset, any individual in
the cohort may either die or survive through the first interval. Whether an individual
reproduces in an interval is modeled only for individuals that survive that interval. Similarly,
fecundity is modeled only for individuals that reproduce. The modeling of each single
component of fitness with a suitable probability distribution leads to a sampling
distribution for the joint expression of the fitness components, lifetime fitness, that
adequately approximates the actual distribution, as we show in our examples. The theory
underlying the aster approach requires modeling fitness components with a distribution
drawn from the exponential family (ref), a requirement which retains considerable
flexibility, in view of the many distributions in this family, including Bernoulli, Poisson,
geometric, and normal. The analysis employs the principle of maximum likelihood,
developed by Fisher () and now widely adopted as a rigorous, general approach to
statistical estimation and inference (Kendall and Stuart). Software for conducting the analysis, written in
the R statistical language, is freely available (R Development Core Team 2005).
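The dependence structure described above can be made concrete with a small simulation. The following is only a sketch in Python, not the aster software itself: the function names, the per-interval probabilities, and the fecundity parameters are all made up for illustration. It shows how survival, reproduction given survival, and zero-truncated Poisson fecundity given reproduction combine into a lifetime fitness distribution with many zeroes.

```python
import math
import random

def zero_truncated_poisson(rng, mean_param):
    """Draw from a Poisson(mean_param) conditioned on the result being >= 1."""
    # Inverse-CDF sampling over k = 1, 2, ...; the untruncated masses for
    # k >= 1 sum to 1 - exp(-mean_param), so u is scaled to that total.
    u = rng.random() * (1.0 - math.exp(-mean_param))
    k = 1
    term = math.exp(-mean_param) * mean_param  # untruncated P(K = 1)
    cumulative = term
    while cumulative < u and term > 0.0:
        k += 1
        term *= mean_param / k
        cumulative += term
    return k

def simulate_lifetime(rng, p_survive, p_reproduce, fecundity_param):
    """Total offspring of one individual over successive intervals."""
    total = 0
    for ps, pr, lam in zip(p_survive, p_reproduce, fecundity_param):
        if rng.random() >= ps:
            break  # death: every later component is necessarily zero
        if rng.random() < pr:  # reproduction, given survival
            total += zero_truncated_poisson(rng, lam)  # fecundity, given reproduction
    return total

# Hypothetical parameters for three intervals (illustration only).
rng = random.Random(1)
fitness = [
    simulate_lifetime(rng, [0.9, 0.8, 0.7], [0.2, 0.6, 0.8], [2.0, 3.0, 3.0])
    for _ in range(5000)
]
share_zero = sum(f == 0 for f in fitness) / len(fitness)
print("proportion with zero lifetime fitness:", round(share_zero, 3))
```

Under these assumed parameters a substantial fraction of simulated individuals have zero lifetime fitness, which is the kind of distribution that no single standard parametric family or transformation handles well.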
As we have emphasized, whether for ecological or evolutionary interpretation, it is the
lifetime fitness that is usually of primary interest. Unconditional aster models yield
analyses in terms of lifetime fitness. In some cases, interest is in evaluating fitness of the
remainder of a cohort, given survival up to some point in the life. Conditional aster
models serve this role.
We demonstrate the value and versatility of this approach with three examples. The first
employs a small dataset that Lenski and Service (1982) used to illustrate a method they
developed for inference of population growth rates from individual life-histories. This
permits a useful comparison of these two approaches. Second, we apply aster to quantify
effects of inbreeding on fitness of Echinacea angustifolia, a long-lived composite that is a
common component of the once widespread tallgrass prairie community. In our last
example, we reanalyze data of Etterson (2004) to evaluate phenotypic selection on the
annual legume, Chamaecrista fasciculata, subjected to different climate regimes.
Order of examples? Maybe better 1) Echinacea inbreeding and mortality, 2) C.f.
phenotypic selection 3) aphid pop growth
Example 1 (or 3???): Estimation of population growth rate, phi
Lenski and Service (1982) recognized the need for a valid statistical approach to
inferences about finite rates of increase (phi) from life-history records. They emphasized
the importance of accounting for individual variation in survivorship and fecundity in
inference of phi via the stable age equation: (their eqn 1). This expression weights earlier
produced offspring more heavily in their contribution to population growth to an extent
dependent on the population growth rate itself (Fisher 1930, 1958). Lenski and Service
(1982) presented a nonparametric approach that resamples records of individuals
according to the jackknife procedure. Using the properties of the jackknife, they showed
how to obtain estimates and sampling variances of phi. They illustrated the approach with
a small dataset sampled from the aphid, Uroleucon rudbeckiae. In this simple case, the
survival and fecundity in each of nine age intervals, recorded for a cohort of 18
individuals, served as the basis for estimating phi and its sampling variance for this
cohort (data printed in Lenski and Service 1982).
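The jackknife logic can be sketched generically. This Python fragment is an illustration under our own assumptions, not Lenski and Service's code: the function names and the data are hypothetical, and the statistic used here is a simple mean, whereas in their application it would be phi computed from the cohort's records.

```python
import math

def jackknife(records, statistic):
    """Return (estimate, standard_error) based on jackknife pseudo-values."""
    n = len(records)
    if n < 2:
        raise ValueError("need at least two records")
    full = statistic(records)
    pseudo = []
    for i in range(n):
        leave_one_out = records[:i] + records[i + 1:]
        # Pseudo-value for individual i.
        pseudo.append(n * full - (n - 1) * statistic(leave_one_out))
    estimate = sum(pseudo) / n
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n * (n - 1))
    return estimate, math.sqrt(variance)

def mean_offspring(records):
    """Stand-in statistic; any function of the full set of records could be used."""
    return sum(records) / len(records)

# Made-up lifetime offspring counts for six individuals (illustration only).
lifetime_offspring = [0, 3, 5, 2, 8, 1]
estimate, se = jackknife(lifetime_offspring, mean_offspring)
print(estimate, se)
```

For a linear statistic such as the mean, the pseudo-values equal the observations and the jackknife standard error reduces to the usual standard error of the mean. For a nonlinear statistic such as phi, the pseudo-values differ from the individual records.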
We have used the aster approach to analyze these data. For simplicity, we modeled
survivorship, Sx, as a quadratic function of age, x, and found not only that survivorship
declined significantly with age (P < 0.001) but also that there is significant deviation (P =
0.018) from a linear decline in the binomial parameter governing survival probability.
Fecundities, Bx, modeled according to a Poisson distribution, were estimated for each
age class. In this context, conditional aster models are appropriate because they estimate
fecundities expected for individuals of each age, given that they survive to that age, as
eqn 1 requires.
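To show how phi follows from age-specific survivorship and fecundity, the sketch below solves a stable age equation numerically in Python. The draft does not reproduce eqn 1, so the form used here is an assumption: the standard discrete Euler-Lotka equation, sum over x of phi^(-x) Sx Bx = 1, with ages numbered from 1 and Sx taken as the probability of surviving from birth to age x. The function name and the example values are hypothetical.

```python
def growth_rate(survivorship, fecundity, lo=1e-3, hi=1e3):
    """Solve sum_x phi**(-x) * S_x * B_x = 1 for phi by bisection.

    Assumes ages x = 1, 2, ... and that S_x is survivorship to age x.
    """
    def excess(phi):
        return sum(
            s * b * phi ** -x
            for x, (s, b) in enumerate(zip(survivorship, fecundity), start=1)
        ) - 1.0

    # The left-hand side decreases in phi, so the root is bracketed when
    # excess(lo) > 0 > excess(hi).
    if excess(lo) <= 0 or excess(hi) >= 0:
        raise ValueError("root not bracketed; check inputs or widen [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up schedule for three age classes (illustration only).
phi = growth_rate([0.9, 0.7, 0.4], [1.0, 2.5, 3.0])
print(round(phi, 4))
```

Because each term is weighted by phi^(-x), offspring produced at early ages contribute more to the sum than offspring produced later, to an extent that depends on phi itself.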
Interest focuses primarily on phi and especially on its sampling variance. As a nonlinear
function of Sx and Bx, phi is estimated directly from their estimates. Importantly, the
parametric aster analysis yields direct estimates of sampling variances for Sx and Bx, and
these can be used to obtain the sampling variance of phi via the delta method. From the
data given, we estimated phi = 1.67741 [sig figs?] with a standard error of 0.056. Our
estimate is very close to Lenski and Service's (1982) value of 1.6876, whereas we
obtain a considerably smaller standard error than their value of 1.47. This gain in precision
is a clear advantage of the aster approach, attributable to its use of parametric models.
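The delta-method step can be sketched as follows. This Python fragment is illustrative only and rests on several assumptions: the stable age equation is taken to be the discrete Euler-Lotka form with ages numbered from 1, the gradient of phi is approximated by finite differences, and the covariance matrix of the estimates of Sx and Bx is made up rather than taken from any fitted model.

```python
def growth_rate(survivorship, fecundity):
    """Solve sum_x phi**(-x) * S_x * B_x = 1 for phi by bisection (ages 1, 2, ...)."""
    def excess(phi):
        return sum(
            s * b * phi ** -x
            for x, (s, b) in enumerate(zip(survivorship, fecundity), start=1)
        ) - 1.0

    lo, hi = 1e-3, 1e3  # assumed to bracket the root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_method_variance(survivorship, fecundity, covariance, step=1e-6):
    """Approximate var(phi) as g' V g, with g the gradient of phi.

    The parameter order is (S_1, ..., S_k, B_1, ..., B_k), and covariance
    must be the 2k x 2k matrix V in that same order.
    """
    theta = list(survivorship) + list(fecundity)
    k = len(survivorship)
    gradient = []
    for i in range(len(theta)):
        up, down = theta[:], theta[:]
        up[i] += step
        down[i] -= step
        # Central finite difference for d phi / d theta_i.
        diff = growth_rate(up[:k], up[k:]) - growth_rate(down[:k], down[k:])
        gradient.append(diff / (2 * step))
    size = len(theta)
    return sum(
        gradient[i] * covariance[i][j] * gradient[j]
        for i in range(size)
        for j in range(size)
    )

# Made-up estimates and covariance for two age classes (illustration only).
cov = [
    [0.0020, 0.0, 0.0, 0.0],
    [0.0, 0.0030, 0.0, 0.0],
    [0.0, 0.0, 0.0400, 0.0],
    [0.0, 0.0, 0.0, 0.0900],
]
variance = delta_method_variance([0.9, 0.6], [1.5, 2.0], cov)
print("standard error of phi:", variance ** 0.5)
```

With a single age class the equation gives phi = S1 B1, so the gradient is (B1, S1) and the result can be checked by hand.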
Example 3: Phenotypic selection analysis
Lande and Arnold (1983) proposed multiple regression of fitness on a set of quantitative
traits as a method for quantifying natural selection directly on each trait. In practice, these
analyses generally employ measures of components of fitness as the response variable,
rather than overall fitness (see examples in Lande and Arnold 1983). As a result, the
estimated selection gradients, the partial regression coefficients, reflect a 'bout of
selection', rather than selection over a cohort's lifespan. Focusing on this limitation,
Arnold and Wade (1984a) considered a partitioning of the overall selection gradient into
parts attributable to distinct episodes of selection, and Arnold and Wade (1984b)
illustrated the approach with example datasets. Wade and Kalisz (1989) modified this
approach to allow for change in phenotypic variance among selection episodes. Although
this method was developed to account for multiple stages of selection, it does
not directly account for the dependence of later components of fitness on ones expressed
earlier. Further, Mitchell-Olds and Shaw (1987), among others, have noted that statistical
testing of the selection gradients is vitiated, in many cases, by the failure of the analysis
to satisfy the assumption of normality of the fitness measure, given the predictors. To
address this problem for the case of dichotomous fitness outcomes, such as survival,
Janzen and Stern (1998) recommended the use of logistic regression for testing selection
on traits and showed how the estimates resulting from logistic regression could be
transformed to obtain selection gradients. More generally, Schluter (1988) suggested a
nonparametric method of estimating the form of a fitness function as a cubic spline.
Because aster appropriately models the dependence of components of fitness on those
expressed earlier, it serves as a basis for valid inference in the context of phenotypic
selection analysis. When phenotypic traits are included as predictors in the aster model, a
conditional analysis quantifies the relationship between each component of fitness and
each trait, while accounting for other traits in the model and also for earlier expressed
components. Moreover, an unconditional aster analysis estimates the relationship
between overall fitness and the traits.
We illustrate this use of aster via a reanalysis of Etterson's (2004) study of phenotypic
selection on three traits in three populations of the annual legume, Chamaecrista
fasciculata, reciprocally transplanted into three sites. The three traits, measured when the
plants were 8-9 wk old, are leaf number (log transformed, LN), leaf thickness (measured
as specific leaf area, SLA, the ratio of a leaf's area to its dry weight, log transformed) and
reproductive stage (RS, scored in 6 categories with values increasing with reproductive
advancement). Here, for simplicity, we consider only the data for the three populations
grown in the Minnesota site.
C. fasciculata grows with a strictly annual life-history. In this experiment, fitness was
measured as 1) survival to flowering, 2) flowering, given that the plant survived, 3) the
number of fruits a plant produced, and 4) the number of seeds per fruit in a sample of
three fruits, the last two contingent on the plant having flowered. Consequently, overall
fitness was modeled jointly as the number of seeds per fruit and the number of fruits per
plant, rather than as the total number of seeds per plant (Fig. 1 in chamae.pdf).
Preliminary analyses revealed that nearly all survivors flowered, so the first two fitness
components were collapsed to a single one, modeled as Bernoulli. Both fecundity
components were modeled by a truncated negative binomial distribution. In addition to
the traits of interest, the model included the spatial block in which individuals were
planted and the source population.
For this subset of the data, we analyzed several models. We here focus on the submodel
that includes the dependence of the two reproductive components on each of the three
traits, emphasizing that aster analyses account for the dependence of later expressed
fitness components on those expressed earlier. This aster model detected dependence of
both fruit number and seed count on LN and RS, but not on SLA (P > 0.18; Table on top
of p.17 in chamae.pdf). The aster analysis fits the data well, as reflected by the scatter
plots of Pearson residuals which show very little trend and only two extreme outliers for
fruit number (Fig 4 in chamae.pdf). This evidence that we have suitably modeled the data
justifies use of our fitted aster model to estimate phenotypic selection gradients, as we do
below.
We conducted a parametric bootstrap (Efron and Tibshirani 1993, Section 6.5),
employing the relationships between the fitness components and the traits estimated by
the aster analysis to simulate many (here, 1000) datasets from the model fitted to the
original data. Within each, we calculated the overall absolute fitness (W) of each
individual as the product of its seed number per fruit and number of fruits. From this,
individual relative fitness (w) was obtained by dividing by the average, over the whole
dataset, of absolute fitness. For each dataset, multiple regression of w on the three traits
via ordinary least squares (OLS) estimates the average selection gradients as described by
Lande and Arnold (1983). Over all the datasets in the sample obtained by the parametric
bootstrap, these estimates closely approximate a normal distribution (Fig 2 of
chamae.pdf); thus, their means and variances over all the datasets validly estimate the
selection gradients for the original data and their sampling variance. The adherence of
this analysis to required statistical sampling properties justifies this aster-based approach
to analyzing phenotypic selection. By contrast, the conventional OLS multiple regression
of relative fitness on the traits using the original data yields residuals that seriously
violate the usual OLS assumptions of homoscedasticity and normality (Fig. 7 or 8 of
chamae.pdf), as expected, given that 8% of plants have fitness of zero and that the
distribution of the number of fruits per plant is very heavily skewed.
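The calculation applied to each dataset can be sketched in Python as below. This is an illustration under our own assumptions, not the code used for the analysis: the function names and the data are hypothetical, and only the per-dataset step is shown (absolute fitness, relative fitness, and OLS regression on the traits). In a parametric bootstrap the same function would be applied to every dataset simulated from the fitted model, and the resulting gradients summarized by their means and variances.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def selection_gradients(traits, absolute_fitness):
    """OLS partial regression coefficients of relative fitness on the traits."""
    mean_w = sum(absolute_fitness) / len(absolute_fitness)
    relative = [value / mean_w for value in absolute_fitness]  # w = W / mean(W)
    design = [[1.0] + list(row) for row in traits]  # intercept plus traits
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, relative)) for i in range(p)]
    return solve_linear_system(xtx, xty)[1:]  # drop the intercept

# Made-up data for five plants and two traits (illustration only).
traits = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
fruits = [1, 3, 2, 3, 4]
seeds_per_fruit = [1, 1, 2, 2, 2]
absolute = [f * s for f, s in zip(fruits, seeds_per_fruit)]  # W for each plant
gradients = selection_gradients(traits, absolute)
print(gradients)
```

In this made-up example W is an exact linear function of the traits, so the regression recovers the generating coefficients divided by mean fitness. With real data the residuals of this regression are what violate the OLS assumptions.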
The structure of the aster model suitable for these data precluded direct aster analysis of
phenotypic selection. As noted, seed counts were available for a subset of fruits, rather
than counted for each whole plant. Data of this kind are often acquired this way, for
reasons of practicality, not only for plants, but also, for example, for frogs (Howard
1979). It is this feature of the data that dictated the choice of modeling the dependence of
seed count and fruit number jointly on fecundity status (Fig. 1 in chamae.pdf). This, in
turn, precluded the direct aster analysis of the dependence of overall fitness on the traits
and required application of the parametric bootstrap. We wish to emphasize, however,
that statistical tests of overall selection on individual traits can be done directly in aster
for datasets where direct observations of total reproductive output are available, i.e. for
our example, if total seed counts were available for each plant. In that case, total seed
count would be modeled as conditional on the number of fruits produced. We have not
used the product of fruit count and number of seeds per fruit in this way because this
product is not distributed according to an exponential family.
However, we can illustrate more direct use of aster in phenotypic selection analysis by
analyzing just the two fitness components, reproductive status and fruit count from the C.
fasciculata dataset, as the complete fitness response. For plants, observations of
reproductive output are often limited to counts of fruits. For simplicity, we further
reduced the predictors in the model to Population, Block, and the two traits, SLA and LN.
This model, which can serve as a complete analysis of directional selection on the
phenotypes, detected strong dependence of fitness on both traits (table at bottom of p.14
of chamae2.pdf), such that selection is toward more, thicker leaves. Extending this
analysis to assess curvature in the fitness function, we added quadratic components to the
model and detected highly significant negative curvature, suggestive of stabilizing
selection, for both traits, as well as interaction between the traits (table toward bottom of
p.15 of chamae2.pdf). The plot of the fitness function together with the observed
phenotypes reveals that the fitness optimum lies outside the range of the distribution of
leaf number for two populations, and it is very near the distribution's edge for the third
(Fig 1 on p.18 of chamae2.pdf). Thus, for this trait, selection against both extremes of the
trait distribution (i.e. stabilizing selection) is not observed. By contrast with this aster-inferred
bivariate quadratic fitness function, OLS estimates significant positive curvature
for the relationship between fitness and leaf number (p.21 of chamae2.pdf), suggestive of
disruptive selection.
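The distinction between negative curvature and observed stabilizing selection can be illustrated with a simplified, single-trait sketch in Python. It assumes a quadratic fitness function a + b z + c z^2 for one trait, which is a simplification of the bivariate function with interaction fitted above. The function names, coefficients, and trait values are made up.

```python
def quadratic_optimum(linear_coef, quadratic_coef):
    """Trait value maximizing a + b*z + c*z**2; requires c < 0."""
    if quadratic_coef >= 0:
        raise ValueError("no interior maximum without negative curvature")
    return -linear_coef / (2.0 * quadratic_coef)

def optimum_within_range(trait_values, linear_coef, quadratic_coef):
    """True if the fitness optimum lies inside the observed trait range."""
    peak = quadratic_optimum(linear_coef, quadratic_coef)
    return min(trait_values) <= peak <= max(trait_values)

# Made-up coefficients and trait values (illustration only).
observed_trait = [0.0, 1.0, 1.5]
print(quadratic_optimum(4.0, -1.0))
print(optimum_within_range(observed_trait, 4.0, -1.0))
```

When the optimum falls outside the observed range, fitness changes monotonically across the phenotypes actually present, so selection on them is directional even though the fitted curvature is negative.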
In this section, we have illustrated how aster can be used to obtain a complete phenotypic
selection analysis which does not suffer typical violations of assumptions of the OLS
regression of fitness on phenotypes. In some datasets, total
reproductive output cannot be modeled as dependent on all earlier expressed fitness
components, as above in the analysis of the data including counts of number of seeds per
fruit. Even then, aster estimates the parameters of a fitness model that, in combination with
parametric bootstrapping, yields a valid phenotypic selection analysis.
[Annual life history gives a simpler fitness graph, but Aster still greatly improves the analysis (fit).]
___________________________________________________________________
Intro:
`Fitness:
`contribution to pop growth (role in pop and evol dynamics)
`var in individual fitness throughout the lifespan is the
basis for evolutionary change by nat. sel.
Fitness records spanning more of the life history contain more
information about overall fitness and its variability, in some cases it is
feasible to obtain complete l.h. records
`Efforts to assess fitness comprehensively (Antonovics&Ellstrand 1984)
fraught with statistical challenges.
Impediment to inference about variation in overall fitness and its
causes: how to analyse multiple fitness traits jointly. (Stanton
and Thiede)
`Two key aspects of these data that pose challenges
`Early mortality before reproduction
`For different components of fitness, different prob dis'ns are suitable.
`Overview of Aster
Examples:
Lenski-Service data
Mortality of Echinacea (INB1 including prefield survival—To avoid overspecification,
could we lump the multiple pre-field intervals into one?)
Chamaecrista - whole life, selection analysis.
Discussion:
Understanding of individual contributions to the collective dynamics
of populations depends on comprehensive assessment of fitness, as
does understanding of evolutionary process, esp. nat. sel.
Caswell, Morris and Doak.
This likelihood-based approach that employs parametric models that
a) are well suited to the data and b) account for "censorship"
provides a joint analysis of all the data that is statistically valid
and more powerful than alternatives. Avoids arbitrary transformations
that can complicate interpretation (Stanton and Thiede). Most important,
it serves as a single unified framework for analysis of this kind of data
that is suitable for addressing questions that arise in different ecol and
evol contexts.
Efforts to deal (Lenski-Service;)
cf. zero-inflated poisson "zero-tolerance ecology"
not general...
Common practice of estimating selection based on a single component of fitness
(e.g. Arnold and Lande 1984, see also Kingsolver).
But for organisms with life spans more complex than binary fission, single
components of fitness do not, in general, match overall fitness.
simply incomplete.
with tradeoffs may be misleading. (Prout, Antonovics)
Valuable: yields coherent (rather than piecemeal)
1) prediction of change in pop size (requires inference of lambda)
cf. Leslie matrix
2) inference of genotype-dependent selection (differential fitness of genotypes)
3) inference of phenotype-dependent selection (Lande-Arnold, Arnold and Wade, van
Tienderen)
genotypic selection analysis - maybe some day...
Addresses some of the points raised by Beatty.
Beatty () has summarized key conceptual problems in defining fitness.
Current limitations:
1) single-parameter distributions only (i.e. not Normal)
2) fixed predictors only (i.e. no variance components models)
But these are limitations in current practice, not in principle, though
challenging. (Plans???)
Acknowledgments:
JA
Mark Borello
NSF
Janzen, F. J., and H. S. Stern. 1998. Logistic regression for empirical studies of
multivariate selection. Evolution 52:1564-1571.
Schluter, D. 1988. Estimating the form of natural selection on a quantitative trait.
Evolution 42:849-861.
Efron, B., and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman and Hall.
Stanton, M. L. and D. A. Thiede. 2005. Statistical convenience vs. biological insight:
consequences of data transformation for the analysis of fitness variation in heterogeneous
environments. New Phytologist 166: 319-337.
Howard, R.D. 1979. Estimating reproductive success in natural populations. Am. Nat.
114: 221-231.
R Development Core Team. 2005. R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0,
URL http://www.R-project.org.