Supporting Information Notes S1 Model Description
As explained in the text, S_it, which indicates whether an individual tree i is alive or dead in year t, follows a Bernoulli
distribution with parameter θ_it. If an individual is dead (S_it = 0), it remains dead in all subsequent time-steps. We convert the linear
equation containing the covariates and the error term into a 0-to-1 probability using a logit link:
\[ \lambda_{it} = X_i \vec{\beta} + W_{it} \vec{\alpha} + \varepsilon_{it}, \qquad \varepsilon_{it} \sim N(0, \sigma^2) \]
\[ \theta_{it} = \frac{e^{\lambda_{it}}}{1 + e^{\lambda_{it}}} \]
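The link above can be illustrated with a minimal Python sketch. The function names `inv_logit` and `simulate_history` and the example values of λ are hypothetical and chosen only for illustration; the original analysis was implemented in R, and this is not the authors' code.

```python
import math
import random

def inv_logit(lam):
    """Map a linear predictor lambda to a probability in (0, 1)."""
    # Branching on the sign avoids overflow in exp() for large |lam|.
    if lam >= 0:
        return 1.0 / (1.0 + math.exp(-lam))
    e = math.exp(lam)
    return e / (1.0 + e)

def simulate_history(lams, rng):
    """Simulate S_i1..S_iT; once a tree is dead it stays dead."""
    history, alive = [], True
    for lam in lams:
        if alive:
            # Bernoulli draw with survival probability theta = inv_logit(lam)
            alive = rng.random() < inv_logit(lam)
        history.append(1 if alive else 0)
    return history

rng = random.Random(1)  # fixed seed so the sketch is reproducible
history = simulate_history([2.0, 1.5, 1.0, 0.5, 0.0], rng)
```

Because death is absorbing, any simulated history is a run of 1s followed by a run of 0s.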
The year after planting, 1998, is indicated by t = 1. We assume that states (alive or dead) are observed without error, and that
the probability of survival in a given year depends on a matrix of constant covariates and state variables X (such as clone or treatment)
and a matrix of time-dependent state variables W_t (such as age or height).
It should be noted that β and α are vectors. Matrix X has n rows, one for each individual tree, and a number of columns equal
to one plus the number of constant covariates and state variables in the model under consideration (hereafter Q). Matrix W_t has n rows
and a number of columns equal to the number of time-dependent state variables in the model under consideration (hereafter J). X
consists of a first column of 1’s (to accommodate the intercept), with all subsequent columns consisting of indicator variables for
clone, treatment, and clone × treatment. For instance, consider this example X for a dataset with 4 individuals, 3 clones, and 2
treatments:
\[
X = \left[ \begin{array}{cccccccccccc}
\text{int} & c_1 & c_2 & c_3 & t_1 & t_2 & c_1 \times t_1 & c_1 \times t_2 & c_2 \times t_1 & c_2 \times t_2 & c_3 \times t_1 & c_3 \times t_2 \\
\hline
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0
\end{array} \right]
\]
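The example matrix can be rebuilt from clone and treatment labels with a short Python sketch. The helper name `design_row` and the coding of the control as treatment 0 are assumptions made for illustration; the column order follows the example X (intercept, clones, treatments, then clone × treatment).

```python
def design_row(clone, treatment, n_clones=3, n_treatments=2):
    """Return one row of X: intercept, clone indicators, treatment indicators,
    then clone x treatment indicators. treatment=0 is assumed to mean control."""
    row = [1]  # intercept column
    row += [1 if clone == c else 0 for c in range(1, n_clones + 1)]
    row += [1 if treatment == t else 0 for t in range(1, n_treatments + 1)]
    row += [1 if (clone == c and treatment == t) else 0
            for c in range(1, n_clones + 1)
            for t in range(1, n_treatments + 1)]
    return row

# (clone, treatment) for the four individuals in the example X
trees = [(2, 0), (3, 1), (1, 2), (2, 2)]
X = [design_row(c, t) for c, t in trees]
```

Each row has 1 + 3 + 2 + 6 = 12 entries, matching the twelve columns of the example.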
In this example, if β_2 were positive, this would mean that individuals belonging to clone 1 have a higher-than-average annual
survival probability; if β_6 were negative, that would mean that individuals in treatment 2 had a lower survival probability than control
individuals. Finally, if β_8 were positive, this would indicate a positive interaction between clone 1 and treatment 2 above and beyond
what can be explained by the higher survival of clone 1 and the lower survival in treatment 2 independently. W_t, on the other hand,
consists of measurements, or some transformation thereof. Age is squared because preliminary data suggested that mortality rates
tend to increase over time, in a manner consistent with a second-power relationship, as individuals grow and compete. Because annual growth
increment and age change over time, there is a different matrix W_t at each time-step t. In this part of the analysis, height was assumed to
be observed without error. If a year of height data was missing for an individual at time t, the height was assumed to be the mean of
the heights in the flanking years t-1 and t+1.
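The rule for missing heights can be sketched in Python as follows. The function name `fill_missing_heights`, the use of `None` for a missing year, and the height values are hypothetical. The text does not say how gaps at the ends of a series or gaps longer than one year were treated, so this sketch leaves them unfilled.

```python
def fill_missing_heights(heights):
    """Replace a missing value (None) with the mean of the flanking years.
    Gaps at either end, or gaps longer than one year, are left as None."""
    filled = list(heights)
    for t in range(1, len(heights) - 1):
        if heights[t] is None:
            before, after = heights[t - 1], heights[t + 1]
            if before is not None and after is not None:
                filled[t] = (before + after) / 2.0
    return filled

# Toy height series (metres) with one missing year
filled = fill_missing_heights([1.2, None, 2.0, 2.6])
```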
As in any Bayesian model, the joint posterior probability of the parameters (in this case, β, α, and σ) is proportional to the
likelihood of the data given the parameters multiplied by prior probability distributions for the parameters. The likelihood of an
individual’s survival history if, for instance, S_i = [1,1,1,0,0] is P(S_i1 = 1, S_i2 = 1, S_i3 = 1, S_i4 = 0, S_i5 = 0). Because the probability that a
dead individual remains dead is equal to 1, this simplifies to
\[ L(S_i) = P(S_i \mid \theta) = \theta_{i1}\, \theta_{i2}\, \theta_{i3}\, (1 - \theta_{i4}) \]
Because the survival probabilities depend on covariates, the full likelihood must be expressed conditionally:
\[ L(S_i \mid \vec{\beta}, \vec{\alpha}, \sigma) = \prod_{t=1}^{T} \left[ p(S_{it} \mid \theta_{it})\, p(\theta_{it} \mid X_i \vec{\beta}, W_{it} \vec{\alpha}, \sigma) \right] \]
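The complete Bayesian model can be expressed as:
\[ p(\vec{\beta}, \vec{\alpha}, \sigma \mid X, W, S) \propto \prod_{i=1}^{n} \prod_{t=1}^{T} \left[ p(S_{it} \mid \theta_{it})\, p(\theta_{it} \mid X_i \vec{\beta}, W_{it} \vec{\alpha}, \sigma) \right] p(\vec{\beta} \mid \vec{b}, \phi)\, p(\vec{\alpha} \mid a, \omega)\, p(\sigma \mid s_1, s_2) \]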
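or, where l(λ) represents the inverse of the logit link, which maps λ_it to θ_it:
\[ p(\vec{\beta}, \vec{\alpha}, \sigma \mid X, W, S) \propto \prod_{i=1}^{n} \prod_{t=1}^{T} \left[ p(S_{it} \mid l(\lambda_{it}))\, p(\lambda_{it} \mid X_i \vec{\beta}, W_{it} \vec{\alpha}, \sigma) \right] p(\vec{\beta} \mid \vec{b}, \phi)\, p(\vec{\alpha} \mid a, \omega)\, p(\sigma \mid s_1, s_2) \]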
Coefficient parameter vectors β and α are assigned multivariate normal priors and error parameter σ an inverse gamma prior,
because these distributions are conjugate with the normal distribution of the likelihood, allowing for direct sampling from the
posterior. The priors were chosen to reflect that a) average annual survival is high (>80%) in most trees, b) all treatments are likely to
have negative effects on survival, c) effects of clone or clone x treatment interactions may be positive or negative, and d) height and
height increment are likely to have positive effects on survival. Model results were relatively insensitive to the choice of priors.
Further details about the priors used and the sensitivity tests can be found in Appendix S2.
The model was implemented in R (www.r-project.org) using Gibbs sampling (Clark, 2007). Gibbs sampling involves
factoring a high-dimensional posterior density, representing the joint distribution of multiple parameters, into several lower-dimensional densities that can be sampled sequentially. For example:
\[ p(\vec{\beta}, \vec{\alpha} \mid \sigma, X, W, S) \propto \prod_{i=1}^{n} \prod_{t=1}^{T} \left[ p(\lambda_{it} \mid X_i \vec{\beta}, W_{it} \vec{\alpha}, \sigma) \right] p(\vec{\beta} \mid \vec{b}, \phi)\, p(\vec{\alpha} \mid a, \omega) \]
\[ = \prod_{i=1}^{n} \prod_{t=1}^{T} \left[ N(X_i \vec{\beta} + W_{it} \vec{\alpha}, \sigma^2) \right] N(\vec{\beta} \mid \vec{b}, \phi)\, N(\vec{\alpha} \mid a, \omega) = \mathrm{mvnorm}(V_2 v_2, V_2) \]
In each model run, parameters were initialized using a draw from a uniform distribution:
β_1 ~ Unif(3, 7)
β_2, β_3 ~ Unif(-3.5, -1.5)
β_4 ~ Unif(-2, -1)
β_5 to β_24 ~ Unif(-1, 1)
α_1 ~ Unif(0.01, 0.1)
α_2 ~ Unif(0.1, 0.8)
α_3 ~ Unif(-0.04, 0)
σ ~ Unif(0.05, 0.5)
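The initialization listed above could be sketched in Python as follows. The function name `draw_initial_values` and the fixed seed are hypothetical; the ranges are those given in the list.

```python
import random

def draw_initial_values(rng):
    """Draw one set of starting values from the uniform ranges listed above."""
    beta = [rng.uniform(3, 7)]                            # beta_1 (intercept)
    beta += [rng.uniform(-3.5, -1.5) for _ in range(2)]   # beta_2, beta_3
    beta += [rng.uniform(-2, -1)]                         # beta_4
    beta += [rng.uniform(-1, 1) for _ in range(20)]       # beta_5 .. beta_24
    alpha = [
        rng.uniform(0.01, 0.1),   # alpha_1
        rng.uniform(0.1, 0.8),    # alpha_2
        rng.uniform(-0.04, 0),    # alpha_3
    ]
    sigma = rng.uniform(0.05, 0.5)
    return beta, alpha, sigma

# A fixed seed is used here only to make the sketch reproducible.
beta0, alpha0, sigma0 = draw_initial_values(random.Random(42))
```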
An initial value for lambda is calculated based on these values. Then, at each Gibbs step we draw new values for β and α given σ and
λ from the multivariate normal distribution with mean vector V_1 v_1 and covariance matrix V_1:
\[ V_1^{-1} = \frac{1}{\sigma^2} \left( XW^{\top} XW \right) + I \left( \begin{bmatrix} 1/\phi \\ 1/\omega \end{bmatrix} \right) \]
\[ v_1 = \frac{1}{\sigma^2} \left( XW^{\top} L \right) + I \left( \begin{bmatrix} b/\phi \\ a/\omega \end{bmatrix} \right) \]
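where XW is a matrix with Q + J columns and nT rows, and L is an nT × 1 matrix, such that:
\[ XW = \begin{bmatrix} X & W_1 \\ X & W_2 \\ \vdots & \vdots \\ X & W_T \end{bmatrix}, \qquad L = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{bmatrix} \]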
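Next we draw new values for σ given β, α, and λ from the inverse gamma distribution with parameters u_1 and u_2, where:
\[ u_1 = s_1 + \frac{nT}{2} \]
\[ u_2 = s_2 + \frac{Q^{\top} Q}{2} \]
\[ Q = L - \left( XW \begin{bmatrix} \vec{\beta} \\ \vec{\alpha} \end{bmatrix} \right) \]
Here Q denotes the nT × 1 vector of residuals, not the column count Q defined for X.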
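A minimal Python sketch of this update is given below, assuming that the inverse gamma draw is for the variance σ² and that u_1 and u_2 are the shape and scale parameters; neither point is stated explicitly in the text. The function name `draw_sigma2` and the toy inputs are hypothetical. The standard library has no inverse gamma generator, so the sketch takes the reciprocal of a gamma draw.

```python
import random

def draw_sigma2(L, XW, coef, s1, s2, rng):
    """Draw sigma^2 from InvGamma(u1, u2) with
    u1 = s1 + nT/2 and u2 = s2 + (residual sum of squares)/2."""
    # Residuals: lambda minus the linear predictor XW * [beta; alpha]
    resid = [lam - sum(x * c for x, c in zip(row, coef))
             for lam, row in zip(L, XW)]
    u1 = s1 + len(L) / 2.0
    u2 = s2 + sum(r * r for r in resid) / 2.0
    # If G ~ Gamma(shape=u1, scale=1/u2), then 1/G ~ InvGamma(shape=u1, scale=u2).
    return 1.0 / rng.gammavariate(u1, 1.0 / u2)

# Toy inputs: nT = 4 stacked observations and a single coefficient
L_vec = [1.0, 2.0, 3.0, 4.0]
XW = [[1], [1], [1], [1]]
coef = [2.5]
sigma2 = draw_sigma2(L_vec, XW, coef, 2.0, 1.0, random.Random(7))
```

For these toy inputs the residual sum of squares is 5, so u_1 = 4 and u_2 = 3.5.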
Finally, at each Gibbs step new values for the lambdas are drawn from a normal distribution with mean Xβ + W_tα and variance σ².
These values are accepted or rejected with a probability equal to:
\[ \frac{\prod_{t=1}^{T} (\theta_{it}^{*})^{S_{it}} (1 - \theta_{it}^{*})^{(1 - S_{it})}}{\prod_{t=1}^{T} (\theta_{it})^{S_{it}} (1 - \theta_{it})^{(1 - S_{it})}} \]
where θ* is the inverse-logit transformation of the proposed lambda value and θ is the inverse-logit transformation of the current lambda value. These
three steps are then repeated until estimates for the parameters converge to a stable distribution. We used 25,000 Gibbs steps,
discarded the first 12,000 as burn-in, and sampled every 20th step (to reduce correlation between successive samples) from the remaining sequence to
generate the posterior distribution for each parameter.
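The acceptance ratio and the thinning schedule can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions rather than statements from the text: the ratio is capped at 1 (the usual Metropolis convention), and steps are indexed from 0 when thinning. The ratio is computed on the log scale for numerical stability.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def log_bernoulli(S, lams):
    """Log of prod_t theta^S (1 - theta)^(1 - S), with theta = inv_logit(lambda)."""
    total = 0.0
    for s, lam in zip(S, lams):
        # log(theta) if alive, log(1 - theta) if dead; 1 - sigmoid(x) = sigmoid(-x)
        total += log_sigmoid(lam) if s == 1 else log_sigmoid(-lam)
    return total

def accept_probability(S, lam_current, lam_proposed):
    """Ratio of proposed to current Bernoulli likelihoods, capped at 1."""
    log_ratio = log_bernoulli(S, lam_proposed) - log_bernoulli(S, lam_current)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

S_i = [1, 1, 0]  # toy history: alive, alive, dead
p_accept = accept_probability(S_i, [2.0, 2.0, -1.0], [0.0, 0.0, 0.0])

# Thinning: 25,000 steps, first 12,000 discarded, every 20th step kept
kept_steps = range(12000, 25000, 20)
n_kept = len(kept_steps)  # 650 retained samples per parameter
```

In a sampler, the proposal would be kept when a uniform random number on [0, 1) falls below `p_accept`.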
We constructed multiple nested and non-nested models in order to determine the importance of various covariates for mortality
risk (Table S1). All models included an intercept (β_1) and an error parameter (σ). The most complex model tested (shown in the
preceding equations) included effects of treatment (β_2 to β_4), clone (β_5 to β_9), clone × treatment interactions (β_10 to β_24), current height
(α_1), height increment in previous year (α_2), and age² (α_3). We compared models using predictive loss (D_m), a measure of fit
developed specifically for hierarchical Bayesian models (Gelfand & Ghosh, 1998), which can be calculated as part of the Gibbs
sampler (Clark, 2007). We aim to minimize D_m, where D_m = G_m + P_m. The cost of selecting the wrong model, G_m, is the error sum-of-squares while the penalty for overfitting, P_m, comes from the predictive variance. For this set of models:
\[ G_m = \frac{1}{G} \sum_{g=1}^{G} \left( \sum_{t=2}^{T} \sum_{i \in I_t} \left( \theta_{i,t-1}^{(g)} - 1 \right)^2 \right) \]
\[ P_m = \frac{1}{G} \sum_{g=1}^{G} \left( \sum_{t=2}^{T} \sum_{i \in I_t} \theta_{i,t-1}^{(g)} \left( \theta_{i,t-1}^{(g)} - 1 \right) \right) \]
where G is the number of Gibbs steps after burn-in, θ^(g) is the value of theta imputed at Gibbs step g, and I_t is the set of individuals still
alive at time t.