Spatial relationship between climatologies and changes in global vegetation activity
Rogier de Jong, Michael E. Schaepman, Reinhard Furrer, Sytze de Bruin, Peter H. Verburg
Supporting Online Material (SOM)
Materials and Methods
S1. Deterministic model (fixed effects)
For global application, we used a regression tree (RT) model. Such a model is built by
recursive partitioning of the sample (= root node) into more homogeneous nodes, or
children (Breiman et al., 1984). Each split is based on one predictor and is selected
according to a splitting criterion, which minimizes the total sum of squared deviations
from node centers. The tree is grown until no further splits can be made for lack of
data and is subsequently pruned: the least important splits are removed on the basis of
a cost-complexity measure (Steinberg, 2009). Cross-validation was used to derive the
optimal complexity parameter. In this way, the 54601 grid cells (root node) were
classified into 867 terminal nodes. All climate variables were used with approximately
equal frequency in the splits. The resulting model was used to predict the change in
vegetation activity by following the path from the root node down to the appropriate
terminal node of the tree. This provided the fixed-effects term of the additive model in
Equation 1 (main document). For this model, we used the tree package
in R (R Development Core Team, 2012).
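The splitting criterion described above can be illustrated with a minimal Python sketch. It is not the R tree package used for the analysis; the function name best_split and the exhaustive threshold search are assumptions made for illustration only. For a single predictor, it returns the threshold that minimizes the total sum of squared deviations from the two child-node means.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) for the split on predictor x that minimizes
    the total sum of squared deviations from the two child-node means.
    Hypothetical helper for illustration; not part of any package."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    best_t, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate identical predictor values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            # place the threshold halfway between the two neighbouring values
            best_t, best_sse = 0.5 * (xs[i - 1] + xs[i]), sse
    return best_t, best_sse
```

A full regression tree applies this search to every predictor at every node, takes the best of them, and recurses on the two children before pruning.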
S2. Spatial field model (random effects)
Spatial dependence in the non-associated effects was modeled in R using a stationary
Gaussian random field (GRF). A GRF is specified by its mean value function and its
covariance function. The main assumption is therefore that h follows a multivariate
normal distribution with, in this case, zero mean and covariance matrix Σ(Θ). The model
parameters Θ (i.e. sill and range parameter δ) fully characterize the random field, which
is expressed as:
h ~ N(0, Σ(Θ))    (Eq. S1)
A spherical covariance function induces a symmetric and positive definite
covariance matrix. The size of the covariance matrix (i.e. square of the number of
observations) may lead to serious computational issues for datasets of the size used here
(Furrer & Sain, 2009). We used two measures to deal with this.
First, a spherical function was selected because observations separated by more than
the range δ can be considered spatially uncorrelated. The range, therefore, defines the
mean 'patch size' in a realization of a GRF. We determined δ (in 1 degree steps, using
great circle distance) by evaluating minus twice the log-likelihood, -2ln(L), of the
observed spatial field of residuals. We found the optimum around 900 km (Figure S1a),
with the most substantial decrease in -2ln(L) below ~500 km. The latter value indicates
a minimum range that should be respected. We used the longer range δ = 897 km (~8 deg)
for estimation of the other model parameters. This resulted in a covariance-matrix
density of 3%, equivalent to 89.5 million nonzero elements for 54601 observations.
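The compact support of the spherical function, and the resulting matrix density, can be sketched in Python as follows. The formula is the standard textbook form of the spherical covariance model; the source does not spell it out, so it is an assumption here, as are the function names, the haversine formula and the mean Earth radius of 6371 km.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs in decimal degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlat = p2 - p1
    dlon = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dlat / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def spherical_cov(d, sill, delta):
    """Spherical covariance: sill * (1 - 1.5 d/delta + 0.5 (d/delta)^3)
    for d < delta, and exactly zero at or beyond the range delta."""
    d = np.asarray(d, dtype=float)
    r = d / delta
    return np.where(d < delta, sill * (1.0 - 1.5 * r + 0.5 * r ** 3), 0.0)

# Density check with the numbers reported in the text
n_obs = 54601
density = 89.5e6 / n_obs ** 2  # fraction of nonzero covariance entries
```

Because the function is exactly zero beyond δ, every pair of grid cells further apart than δ contributes a zero entry, which is what makes sparse storage worthwhile.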
Second, recognizing the sparse nature of the covariance matrices, only the nonzero
entries were stored and used for estimation of Θ. For this part of the analysis we used
the R package spam (Furrer & Sain, 2010). Given that Σ(Θ) is symmetric and positive
definite, Cholesky decomposition was used to construct a lower triangular matrix L,
such that the product LLᵀ returns the original matrix. Solving the linear systems that
arise in maximum likelihood estimation (MLE) becomes computationally more efficient
with this factorization (Higham, 2009), which we used to our advantage for optimizing Θ,
as described in the following steps.
(step 1) A set of initial parameters Θ0 was derived from the residuals of the
         fixed-effects model by method-of-moments using gstat (Pebesma &
         Wesseling, 1998).
(step 2) The distance matrix up to distance δ was calculated. Subsequently, the
         spherical covariance function was applied using the current parameter
         estimates.
(step 3) The parameters were optimized using MLE to obtain a new set Θ,
         which was used for predicting the spatial field ĥ (Eq. S2). In turn, this
         spatial field was used for backfitting of β; this procedure was repeated
         until convergence. The final spatial parameters and the resulting
         spherical covariance function are shown in Figure S1b.
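The likelihood evaluation at the core of step 3 can be sketched in Python with a Cholesky factor. This is an illustration under stated assumptions rather than the code used for the analysis: it uses dense NumPy arrays instead of the sparse matrices of the spam package, and the function name neg2_loglik is hypothetical. For a zero-mean Gaussian vector h with covariance Σ = LLᵀ, -2ln(L) equals n ln(2π) + 2 Σᵢ ln(Lᵢᵢ) + ‖L⁻¹h‖².

```python
import numpy as np

def neg2_loglik(h, cov):
    """Minus twice the log-likelihood of a zero-mean Gaussian vector h
    with symmetric positive definite covariance matrix cov."""
    h = np.asarray(h, dtype=float)
    L = np.linalg.cholesky(cov)              # cov = L @ L.T, L lower triangular
    # z = L^{-1} h; a dedicated triangular solver would exploit the
    # structure of L, the general solver is used here for simplicity
    z = np.linalg.solve(L, h)
    logdet = 2.0 * np.log(np.diag(L)).sum()  # ln det(cov) from the factor
    return len(h) * np.log(2.0 * np.pi) + logdet + z @ z
```

Evaluating this function over a grid of candidate ranges δ, with the covariance matrix rebuilt for each candidate, gives a profile of the kind shown in Figure S1a.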
The described methodology provides the empirical best linear unbiased prediction
(E-BLUP) of the spatial field (Henderson, 1975) and is, under the intrinsic assumption
of Eq. S1, analogous to kriging approaches in geostatistics (Lark et al., 2006). More
specifically, it is analogous to kriging approaches with nugget filtering, which yielded
the separation between the spatially correlated field (Figure 4c) and the uncorrelated
residuals (Figure 4d).
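The separation by nugget filtering can be illustrated with a minimal Python sketch. It assumes that the residuals r of the fixed-effects model are the sum of the spatial field h, with covariance Σ, and uncorrelated noise with variance τ² (the nugget). The symbol τ², the function name predict_field and the use of dense NumPy arrays are assumptions for illustration; Eq. S2 itself is not reproduced here. Under these assumptions the predictor is ĥ = Σ(Σ + τ²I)⁻¹r.

```python
import numpy as np

def predict_field(r, cov_field, nugget):
    """Split residuals r into a spatially correlated field and an
    uncorrelated remainder, given the field covariance and nugget variance.
    Hypothetical helper for illustration only."""
    r = np.asarray(r, dtype=float)
    total = cov_field + nugget * np.eye(len(r))      # covariance of r
    L = np.linalg.cholesky(total)                    # total = L @ L.T
    w = np.linalg.solve(L.T, np.linalg.solve(L, r))  # total^{-1} r
    h_hat = cov_field @ w                            # predicted spatial field
    return h_hat, r - h_hat                          # field, uncorrelated part
```

With a zero nugget the predictor reproduces the residuals exactly; a larger nugget shifts more of the residual variation into the uncorrelated part.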
Figure S1 (a) Maximum likelihood estimation (MLE) of spatial model parameters as a
function of range δ (x-axis). The y-axis shows minus twice the log-likelihood, -2ln(L).
The optimal range was found at 897 km (~8 deg). (b) The optimized spherical covariance
function obtained from the MLE and used for the Gaussian random field (GRF). The
spatial dependence reduces to zero at distance δ.
References (SOM only)
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and Regression Trees,
Wadsworth, Belmont, CA, Chapman and Hall / CRC Press.
Furrer R, Sain SR (2009) Spatial model fitting for large datasets with applications to climate
and microarray problems. Statistics and Computing, 19, 113-128.
Furrer R, Sain SR (2010) spam: a sparse matrix R package with emphasis on MCMC methods
for Gaussian Markov random fields. Journal of Statistical Software, 36, 1-25.
Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model.
Biometrics, 31, 423-447.
Higham NJ (2009) Cholesky factorization. Wiley Interdisciplinary Reviews: Computational
Statistics, 1, 251-254.
Lark RM, Cullis BR, Welham SJ (2006) On spatial prediction of soil properties in the
presence of a spatial trend: the empirical best linear unbiased predictor (E-BLUP) with
REML. European Journal of Soil Science, 57, 787-799.
Pebesma EJ, Wesseling CG (1998) Gstat: a program for geostatistical modelling, prediction
and simulation. Computers & Geosciences, 24, 17-31.
R Development Core Team (2012) R: A language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
http://www.R-project.org/.
Steinberg D (2009) CART: Classification and Regression Trees. In: The Top Ten Algorithms
in Data Mining. (eds Wu X, Kumar V) pp Page. Boca Raton, FL, USA, CRC Press
(Taylor & Francis Group).