Supporting Information

Notes S1 Model Description

As explained in the text, $S_{it}$, which indicates whether an individual tree $i$ is alive or dead in year $t$, follows a Bernoulli distribution with parameter $\theta_{it}$. If an individual is dead ($S_{it} = 0$), it remains dead in all subsequent time-steps. We convert the linear equation containing the covariates and the error term into a 0-to-1 probability using a logit link:

$$\lambda_{it} = X_i\vec{\beta} + W_{it}\vec{\alpha} + \varepsilon_{it}, \qquad \varepsilon_{it} \sim N(0, \sigma^2)$$

$$\theta_{it} = \frac{e^{\lambda_{it}}}{1 + e^{\lambda_{it}}}$$

The year after planting, 1998, is indicated by $t = 1$. We assume that states (alive or dead) are observed without error, and that the probability of survival in a given year depends on a matrix $X$ of constant covariates and state variables (such as clone or treatment) and a matrix $W_t$ of time-dependent state variables (such as age or height). Note that $\vec{\beta}$ and $\vec{\alpha}$ are vectors.

Matrix $X$ has $n$ rows, one for each individual tree, and a number of columns (hereafter $Q$) equal to one plus the number of constant covariates and state variables in the model under consideration. Matrix $W_t$ has $n$ rows and a number of columns (hereafter $J$) equal to the number of time-dependent state variables in the model under consideration.

$X$ consists of a first column of 1's (to accommodate the intercept), with all subsequent columns consisting of indicator variables for clone, treatment, and clone × treatment. For instance, consider this example $X$ for a dataset with 4 individuals, 3 clones, and 2 treatments. The columns are, from left to right: intercept, $c_1$, $c_2$, $c_3$, $t_1$, $t_2$, $c_1 \times t_1$, $c_1 \times t_2$, $c_2 \times t_1$, $c_2 \times t_2$, $c_3 \times t_1$, $c_3 \times t_2$.

$$
X = \begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0
\end{bmatrix}
$$

In this example, if $\beta_2$ were positive, this would mean that individuals belonging to clone 1 have a higher-than-average annual survival probability. If $\beta_6$ were negative, that would mean that individuals in treatment 2 had a lower survival probability than control individuals.
Finally, if $\beta_8$ were positive, this would indicate a positive interaction between clone 1 and treatment 2 above and beyond what can be explained by the higher survival of clone 1 and the lower survival in treatment 2 independently.

$W_t$, on the other hand, consists of measurements or some transformation thereof. Age is squared because preliminary data suggested that mortality rates tend to increase over time, in a manner consistent with a second-power relationship, as individuals grow and compete. Because variables such as annual growth increment and age change over time, there is a different matrix $W_t$ at each time-step $t$. In this part of the analysis, height was assumed to be observed without error. If height data were missing for an individual at time $t$, the height was assumed to be the mean of the heights in the flanking years $t-1$ and $t+1$.

As in any Bayesian model, the joint posterior probability of the parameters (in this case, $\vec{\beta}$, $\vec{\alpha}$, and $\sigma$) is proportional to the likelihood of the data given the parameters multiplied by the prior probability distributions for the parameters. The likelihood of an individual's survival history if, for instance, $S_i = [1,1,1,0,0]$ is $P(S_{i1} = 1, S_{i2} = 1, S_{i3} = 1, S_{i4} = 0, S_{i5} = 0)$.
Because the probability that a dead individual remains dead is equal to 1, this simplifies to

$$L(S_i) = p(S_i \mid \theta) = \theta_{i1}\,\theta_{i2}\,\theta_{i3}\,(1 - \theta_{i4})$$

Because the survival probabilities depend on covariates, the full likelihood must be expressed conditionally:

$$L(S_i \mid \vec{\beta}, \vec{\alpha}, \sigma) = \prod_{t=1}^{T} \left[ p(S_{it} \mid \theta_{it})\; p(\theta_{it} \mid X_i\vec{\beta}, W_{it}\vec{\alpha}, \sigma) \right]$$

The complete Bayesian model can be expressed as:

$$p(\vec{\beta}, \vec{\alpha}, \sigma \mid S, X, W) \propto \prod_{i=1}^{n} \prod_{t=1}^{T} \left[ p(S_{it} \mid \theta_{it})\; p(\theta_{it} \mid X_i\vec{\beta}, W_{it}\vec{\alpha}, \sigma) \right] p(\vec{\beta} \mid b, v_b)\; p(\vec{\alpha} \mid a, v_a)\; p(\sigma \mid s_1, s_2)$$

or, where $l(\lambda)$ represents the transformation of $\lambda$ to $\theta$ through the logit link:

$$p(\vec{\beta}, \vec{\alpha}, \sigma \mid S, X, W) \propto \prod_{i=1}^{n} \prod_{t=1}^{T} \left[ p(S_{it} \mid l(\lambda_{it}))\; p(\lambda_{it} \mid X_i\vec{\beta}, W_{it}\vec{\alpha}, \sigma) \right] p(\vec{\beta} \mid b, v_b)\; p(\vec{\alpha} \mid a, v_a)\; p(\sigma \mid s_1, s_2)$$

Here $b$ and $v_b$ denote the prior mean and variance of $\vec{\beta}$, $a$ and $v_a$ those of $\vec{\alpha}$, and $s_1$ and $s_2$ the parameters of the prior for $\sigma$.

Coefficient parameter vectors $\vec{\beta}$ and $\vec{\alpha}$ are assigned multivariate normal priors and error parameter $\sigma$ an inverse gamma prior, because these distributions are conjugate with the normal distribution of the likelihood, allowing for direct sampling from the posterior. The priors were chosen to reflect that a) average annual survival is high (>80%) in most trees, b) all treatments are likely to have negative effects on survival, c) effects of clone or clone × treatment interactions may be positive or negative, and d) height and height increment are likely to have positive effects on survival. Model results were relatively insensitive to the choice of priors. Further details about the priors used and the sensitivity tests can be found in Appendix S2.

The model was implemented in R (www.r-project.org) using Gibbs sampling (Clark, 2007). Gibbs sampling involves factoring a high-dimensional posterior density, representing the joint distribution of multiple parameters, into several lower-dimensional densities that can be sampled sequentially.
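For example:

$$
\begin{aligned}
p(\vec{\beta}, \vec{\alpha} \mid \sigma, \lambda, X, W) &\propto \prod_{i=1}^{n} \prod_{t=1}^{T} \left[ p(\lambda_{it} \mid X_i\vec{\beta}, W_{it}\vec{\alpha}, \sigma) \right] p(\vec{\beta} \mid b, v_b)\; p(\vec{\alpha} \mid a, v_a) \\
&= \prod_{i=1}^{n} \prod_{t=1}^{T} \left[ N(X_i\vec{\beta} + W_{it}\vec{\alpha}, \sigma^2) \right] p(\vec{\beta} \mid b, v_b)\; p(\vec{\alpha} \mid a, v_a) \\
&= \mathrm{mvnorm}(V_1 v_1, V_1)
\end{aligned}
$$

where $b$ and $v_b$ denote the prior mean and variance of $\vec{\beta}$, and $a$ and $v_a$ those of $\vec{\alpha}$.

In each model run, parameters were initialized using a draw from a uniform distribution:

$\beta_1 \sim \mathrm{Unif}(3, 7)$
$\beta_2, \beta_3 \sim \mathrm{Unif}(-3.5, -1.5)$
$\beta_4 \sim \mathrm{Unif}(-2, -1)$
$\beta_5$ to $\beta_{24} \sim \mathrm{Unif}(-1, 1)$
$\alpha_1 \sim \mathrm{Unif}(0.01, 0.1)$
$\alpha_2 \sim \mathrm{Unif}(0.1, 0.8)$
$\alpha_3 \sim \mathrm{Unif}(-0.04, 0)$
$\sigma \sim \mathrm{Unif}(0.05, 0.5)$

An initial value for $\lambda$ is calculated based on these values. Then, at each Gibbs step, we draw new values for $\vec{\beta}$ and $\vec{\alpha}$ given $\sigma$ and $\lambda$ from the multivariate normal distribution with mean vector $V_1 v_1$ and covariance matrix $V_1$:

$$V_1^{-1} = \frac{1}{\sigma^2}\left(XW^{\top} \times XW\right) + \operatorname{diag}\!\left(\begin{bmatrix} 1/v_b \\ 1/v_a \end{bmatrix}\right)$$

$$v_1 = \frac{1}{\sigma^2}\left(XW^{\top} \times L\right) + \begin{bmatrix} b/v_b \\ a/v_a \end{bmatrix}$$

where $XW$ is a matrix with $Q + J$ columns and $nT$ rows, and $L$ is an $nT \times 1$ matrix, such that:

$$
XW = \begin{bmatrix} X & W_1 \\ X & W_2 \\ \vdots & \vdots \\ X & W_T \end{bmatrix},
\qquad
L = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_T \end{bmatrix}
$$

Next, we draw new values for $\sigma$ given $\vec{\beta}$, $\vec{\alpha}$, and $\lambda$ from the inverse gamma distribution with parameters $u_1$ and $u_2$, where:

$$u_1 = s_1 + \frac{nT}{2}, \qquad u_2 = s_2 + \frac{e^{\top} \times e}{2}, \qquad e = L - \left(XW \times \begin{bmatrix} \vec{\beta} \\ \vec{\alpha} \end{bmatrix}\right)$$

Finally, at each Gibbs step new values for the $\lambda$s are drawn from a normal distribution with mean $X\vec{\beta} + W_t\vec{\alpha}$ and variance $\sigma^2$. These values are accepted with a probability equal to the following ratio (or 1, if the ratio exceeds 1) and rejected otherwise:

$$\frac{\prod_{t=1}^{T} (\theta_{it}^{*})^{S_{it}} (1 - \theta_{it}^{*})^{(1 - S_{it})}}{\prod_{t=1}^{T} (\theta_{it})^{S_{it}} (1 - \theta_{it})^{(1 - S_{it})}}$$

where $\theta^{*}$ is the inverse-logit transformation of the proposed $\lambda$ value and $\theta$ is the inverse-logit transformation of the current $\lambda$ value.

These three steps are then repeated until estimates for the parameters converge to a stable distribution. We used 25,000 Gibbs steps, discarded the first 12,000 as burn-in, and sampled every 20th step from the remaining sequence (to reduce correlation between retained samples) to generate the posterior distribution for each parameter.

We constructed multiple nested and non-nested models in order to determine the importance of various covariates for mortality risk (Table S1).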
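The accept/reject update for the $\lambda$ values and the burn-in and thinning schedule described above can be illustrated with a minimal Python sketch. This is not the authors' R implementation: the function names (`log_inv_logit`, `log_bernoulli`, `update_lambdas`, `kept_steps`) are hypothetical, `states` is assumed to hold one individual's observed states for the years in which it was at risk, and the acceptance test is carried out on the log scale with the acceptance probability capped at 1.

```python
import math
import random


def log_inv_logit(lam):
    """Log of theta = exp(lam) / (1 + exp(lam)), computed stably."""
    if lam >= 0:
        return -math.log1p(math.exp(-lam))
    return lam - math.log1p(math.exp(lam))


def log_bernoulli(states, lams):
    """Log-likelihood of observed states (1 = alive, 0 = dead) given lambdas."""
    total = 0.0
    for s, lam in zip(states, lams):
        # log(theta) if the tree survived, log(1 - theta) if it died
        total += log_inv_logit(lam) if s == 1 else log_inv_logit(-lam)
    return total


def update_lambdas(states, current, means, sigma, rng):
    """One accept/reject update of an individual's lambda values.

    Proposals are drawn from Normal(mean, sigma^2); the acceptance ratio is
    the Bernoulli likelihood of the proposal over that of the current values.
    """
    proposal = [rng.gauss(m, sigma) for m in means]
    log_ratio = log_bernoulli(states, proposal) - log_bernoulli(states, current)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return proposal if math.log(u) < log_ratio else current


def kept_steps(n_steps=25000, burn_in=12000, thin=20):
    """Zero-based indices of the Gibbs steps retained for the posterior."""
    return list(range(burn_in, n_steps, thin))
```

With the settings reported above, `kept_steps()` retains 650 of the 25,000 steps. In a full sampler, `means` would be recomputed from the current $\vec{\beta}$ and $\vec{\alpha}$ at every Gibbs step before calling `update_lambdas`.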
All models included an intercept ($\beta_1$) and an error parameter ($\sigma$). The most complex model tested (shown in the preceding equations) included effects of treatment ($\beta_2$ to $\beta_4$), clone ($\beta_5$ to $\beta_9$), clone × treatment interactions ($\beta_{10}$ to $\beta_{24}$), current height ($\alpha_1$), height increment in the previous year ($\alpha_2$), and age² ($\alpha_3$).

We compared models using predictive loss ($D_m$), a measure of fit developed specifically for hierarchical Bayesian models (Gelfand & Ghosh, 1998), which can be calculated as part of the Gibbs sampler (Clark, 2007). We aim to minimize $D_m$, where $D_m = G_m + P_m$. The cost of selecting the wrong model, $G_m$, is the error sum of squares, while the penalty for overfitting, $P_m$, comes from the predictive variance. For this set of models:

$$G_m = \frac{1}{G} \sum_{g=1}^{G} \left( \sum_{t=2}^{T} \sum_{i \in I_t} \left( \theta_{i,t-1}^{(g)} - 1 \right)^2 \right)$$

$$P_m = \frac{1}{G} \sum_{g=1}^{G} \left( \sum_{t=2}^{T} \sum_{i \in I_t} \theta_{i,t-1}^{(g)} \left( 1 - \theta_{i,t-1}^{(g)} \right) \right)$$

where $G$ is the number of Gibbs steps after burn-in, $\theta^{(g)}$ is the value of $\theta$ imputed at Gibbs step $g$, and $I_t$ is the set of individuals still alive at time $t$.
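The predictive loss calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function name `predictive_loss` and the input layout are hypothetical, each retained Gibbs step is assumed to supply a flat list of the imputed $\theta_{i,t-1}$ values for every individual still alive at time $t$, and the penalty term is taken to be the Bernoulli predictive variance $\theta(1-\theta)$, so that it is non-negative.

```python
def predictive_loss(theta_draws):
    """Return (G_m, P_m, D_m) from imputed survival probabilities.

    theta_draws: one list per retained Gibbs step, each holding the imputed
    theta[i, t-1] for every (individual, year) pair with the tree alive at t.
    """
    n_draws = len(theta_draws)
    # Error sum of squares: the observed outcome for these pairs is 1 (alive)
    g_m = sum(sum((th - 1.0) ** 2 for th in draw) for draw in theta_draws) / n_draws
    # Penalty: Bernoulli predictive variance summed over the same pairs
    p_m = sum(sum(th * (1.0 - th) for th in draw) for draw in theta_draws) / n_draws
    return g_m, p_m, g_m + p_m


# Two hypothetical Gibbs steps, each with two (individual, year) pairs
g_m, p_m, d_m = predictive_loss([[0.9, 0.8], [1.0, 0.5]])
```

Models whose imputed survival probabilities sit close to 1 for trees that in fact survived have both a small $G_m$ and a small $P_m$, which is why lower $D_m$ indicates a better model.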