1 Introduction - IDA - Linköping University

advertisement
Semiparametric smoothers for trend assessment of multiple
time series of environmental quality data
1
1
Grimvall, A., 1Wahlin, K., 1Hussian, M. and 2von Brömssen, C.
Department of Computer and Information Science, Linköping University, SE-581 83
Linköping, Sweden
2
Unit of Applied Statistics and Mathematics, Swedish University of Agricultural
Sciences, Box 7013, SE-750 07 Uppsala, Sweden
e-mail: anders.grimvall@liu.se
Abstract
Multiple time series of environmental quality data with similar, but not necessarily
identical, trends call for multivariate methods for trend detection and adjustment for
covariates. Here, we show how an additive model in which the multivariate trend
function is specified in a nonparametric fashion (and the adjustment for covariates is
based on a parametric expression) can be used to estimate how the human impact on an
ecosystem varies with time and across components of the observed vector time series.
More specifically, we demonstrate how a roughness penalty approach can be utilized to
impose different types of smoothness on the function surface that describes trends in
environmental quality as a function of time and vector component. Compared to other
tools used for this purpose, such as Gaussian smoothers and thin plate splines, an
advantage of our approach is that the smoothing pattern can easily be tailored to different
types of relationships between the vector components. We give explicit roughness
penalty expressions for data collected over several seasons or representing several classes
on a linear or circular scale. In addition, we define a general separable smoothing
method. A new resampling technique that preserves statistical dependencies over time
and across vector components enables realistic calculations of confidence and prediction
intervals.
1
1
Introduction
Deterioration of an ecosystem is usually a slow process, and mathematical
functions that describe the impact of humans on the environment will presumably vary
smoothly over time. Furthermore, in many cases, a substantial fraction of the temporal
changes in the measured state of the environment can be attributed to random fluctuations
in weather conditions or other types of natural variability. Consequently, there is an
obvious need for statistical methods that enable simultaneous extraction of smooth trends
and adjustment for covariates.
The approaches most often used to detect trends in environmental quality in the
presence of covariates have been reviewed by Thompson and coworkers (2001).
Regression methods predominate and several investigators have employed nonparametric
or semiparametric techniques because they enable unprejudiced inference about the shape
of the trend curves (Shively & Sager, 1999; Gardner & Dorling, 2000 a,b; Stålnacke &
Grimvall, 2001; Giannitrapani et al., 2004). In particular, it has been emphasized that
generalized additive models (GAMs) provide a suitable framework for such inference
(Giannitrapani et al., 2004).
Residual resampling (Mammen, 2000) is a standard technique to assess the
precision of parameter estimates in semiparametric or nonparametric regression models.
Because new residuals are drawn randomly from the original model residuals in this
procedure, it generates statistically independent error terms. Block resampling has been
proposed as a means of preserving short-term correlations in time series data (Lahiri,
1999). Furthermore, it has been shown how block resampling procedures can be refined
by concatenating with higher likelihood those blocks that match at their ends (Carlstein et
al., 1998; Srinivas & Srinivasan, 2005). However, it is unclear how two-dimensional
blocks should be selected and matched to achieve optimal results. Therefore, there is a
need for new resampling techniques that can preserve dependencies in more than one
direction.
Our research group has focused on applications in which the collected data
represent several classes of observations, and the models are estimated using a roughness
penalty approach that allows the smoothness conditions to be selected just as carefully as
2
other model features. Two papers have addressed time series of riverine load data
collected over several seasons (Stålnacke & Grimvall, 2001; Hussian et al., 2004), and
Libiseller and Grimvall (2005) have presented a method for trend analysis and
normalization of atmospheric deposition data representing several wind sectors. Both the
riverine loads by season and the deposition by wind sector can be regarded as vector time
series of annual data for which a joint analysis of temporal trends would be desirable.
Additional examples of such multivariate data are observations from several sampling
sites along a gradient or several congeners of an organic pollutant.
Here, we show how an additive model in which the multivariate trend function is
specified in a nonparametric fashion (and the adjustment for covariates is based on a
parametric expression) can be used to estimate how the human impact on the ecosystem
of interest varies with time and across components of the observed vector time series.
More specifically, we demonstrate how a roughness penalty approach can be utilized to
impose different types of smoothness on the function surface describing the trends in
environmental quality as a function of time and vector component. In addition, we show
how a new resampling technique that preserves statistical dependencies over time and
across vector components can be used to compute realistic confidence and prediction
intervals.
2
Basic semiparametric model
Let
y t  ( y t(1) , ..., y t( m ) ) T , t  1, ..., n
denote an m-dimensional vector time series representing the observed state of the
environment at n equidistant time points, and let
 x1(t1)

 .
x t   .
 .
 ( m )
 x1t
.
.
.
.
.
.
x (pt1) 

. 
.  , t  1, ..., n
. 

x (ptm ) 
3
be a matrix that includes contemporaneous values of p explanatory vectors representing
natural fluctuations in yt. Further, assume that
p
yt( j )   t( j )    i( j ) xit( j )   t( j ) ,
j  1, ..., m, t  1, ..., n
(1)
i 1
where the sequence of vectors αt  ( t(1) , ...,  t( m ) )T , t  1, ..., n represents a deterministic
temporal trend,
  1(1)

 .
β   .
 .
 ( m )
 1
.
.
.
 p(1) 

. 
. 
. 

 p( m ) 
is a matrix of time-independent regression coefficients, and the error terms  t( j ) , j = 1, …,
m, t = 1, …, n, are identically distributed with mean zero.
Estimation of this semiparametric model requires that smoothness conditions be
introduced to make the degrees of freedom (effective dimension) of the model less than
the number of observations, which imposes constraints on the variation over time and
across components. In addition, we note that smoothness conditions for the intercept
parameters  t( j ) are introduced in a more natural manner, if we reformulate the model so
that the intercepts represent the expected response values when the covariates are equal to
their expectations, that is
p
y t( j )   t( j )    i( j ) ( xit( j )  E ( xit( j ) ))   t( j ) ,
i 1
j  1, ..., m, t  1, ..., n
(2)
The following discussion considers how the roughness penalty expressions can be
tailored to achieve different smoothing patterns. The smoothing over time (years) is
identical for all variants, whereas the smoothing across vector components differs. For
example, when data represent different wind sectors, it is natural to introduce constraints
that force the trend levels (intercepts) of adjacent sectors to be similar. Likewise, when
data represent several seasons, it is natural to force the trend levels for the last season in
one year and the first season in the following year to be close to each other. We use the
4
terms circular and sequential smoothing for these types of constraints on the intercept
parameters (Figure 1a, b). Data collected along a gradient may also require tailored
smoothing in two directions: over time and along the sampled gradient (Figure 1c).
3
Circular smoothing
Circular smoothing is desirable when the collected data represent different classes
on a circular scale. The previously mentioned example of deposition data for different
wind sectors can be used to illustrate this situation. The expected responses are similar
for adjacent classes, but the first and last classes are also closely interrelated.
To introduce circular smoothing in our semiparametric model, we estimate the
parameters by minimizing the sum
S (α, β, λ)  1 L1 (α)  2 L2 (α)
(3)
where 1 and 2 are roughness penalty factors,
n
p
m
S (α, β , λ)    ( yt( j )   t( j )    i( j ) ( xit( j )  xi(. j ) )) 2
t 1 j 1
(4)
i 1
is the residual sum of squares,
n 1 m
L1 (α)    (
t  2 j 1
( j)
t

 t(j1)   t(j1)
2
)2
(5)
represents smoothing of the multivariate temporal trend over time, and
n
m
L2 (α)    ( t( j ) 
t 1 j 1
 t( j 1)   t( j 1)
2
)2
(6)
stands for smoothing across components of this trend. As usual, x i . denotes a mean value
and, to simplify the notation for the circular smoothing, we use the notation
 t( m 1)   t(1) and  t( 0)   t( m )
The selection of smoothing factors 1 and 2 is based on block cross-validation (see Section
10).
5
4
Sequential smoothing
When data are collected over several seasons, there is an obvious relationship
between the observations for adjacent seasons. This feature can be incorporated into the
smoothing pattern by letting s  t (m  1)  j represent the sequential order of the
observations and setting
mn1
 s 1   s 1
s 2
2
L2 (α)   ( s 
)2
(7)
where  s   t( j ) .
5
Gradient smoothing
Gradient smoothing can be suitable if the collected data come from sampling sites
located along a transect or along a gradient of elevation, temperature or precipitation.
Such smoothing can also be appropriate for data representing measured concentrations of
chemical compounds, which can be ordered linearly with respect to, for instance,
volatility, polarity, or lipophilicity. Ordering with respect to the mean of the vector
components constitutes another important option. Regardless of how the linear ordering
is defined, smoothing across coordinates can be accomplished by setting
n m 1
 t( j 1)   t( j 1)
t 1 j  2
2
L2 (α)    ( t( j ) 
)2
(8)
6
a)
b)
c)
Sector 1
Sector 2
.
.
.
.
Sector m
Sector 1
Sector 2
.
.
.
.
Sector m
Sector 1
Sector 2
.
.
.
.
Sector m
Sector 1
Sector 2
.
.
.
.
Sector m
Sector 1
Sector 2
.
.
.
.
Sector m
Year 1
Year 2
Year 3
Year 4
Year n
Season 1
Season 2
.
.
.
.
Season m
Season 1
Season 2
.
.
.
.
Season m
Season 1
Season 2
.
.
.
.
Season m
Season 1
Season 2
.
.
.
.
Season m
Season 1
Season 2
.
.
.
.
Season m
Year 1
Year 2
Year 3
Year 4
Year n
Site 1
Site 2
.
.
.
.
Site m
Site 1
Site 2
.
.
.
.
Site m
Site 1
Site 2
.
.
.
.
Site m
Site 1
Site 2
.
.
.
.
Site m
Site 1
Site 2
.
.
.
.
Site m
Year 1
Year 2
Year 3
Year 4
Year n
Figure 1. Different types of smoothing patterns for the intercept of the basic
semiparametric model. The three graphs show circular smoothing for data representing
several sectors (a), sequential smoothing for data collected over several season (b), and
gradient smoothing for data collected at different sites along a gradient (c).
6
Separable smoothing
The circular, sequential and gradient smoothing are special cases of more general
smoothing patterns that can be imposed by minimizing an expression of the form
7
n
m
  ( yt( j )  yˆ ) 2   L (W1 , α)   L2 (W2 , α)
( j)
t
t 1 j 1
1
1
(9)
2
where
  j   t2j 2    t3j3  
 t 1 

L1 W1 , α  

1

2
t1 , j1 , t 2 , j 2 , t 3 , j3 W1 


2
  j   t2j 2    t3j3  
 t 1 

L2 W2 , α  

1

2
t1 , j1 , t 2 , j 2 , t 3 , j3 W2 


(10)
2
(11)
and yˆ t( j ) denotes a prediction of y t( j ) based on data for all time points s  t . Normally, W1
is set to
W1  {(t, j, t  1, j, t  1, j), t  2, ..., n  1 j  1, ..., m }
in order to generate horizontal smoothing (smoothing over time), whereas W2 is used to
impose a vertical smoothing pattern (smoothing across coordinates). However, the model
can accommodate any user-defined smoothing patterns W1 and W2. We shall refer to this
general technique as separable smoothing. If both 1 and 2 are large, the fitted smooth
trend surface will be almost a plane in R3. If both 1 and 2 are small, the smoothing of
observed values will be practically negligible.
7
A new resampling technique preserving statistical dependencies
To assess the precision of parameter estimates in our semiparametric regression
model we introduce a new form of constrained residual resampling that can preserve
important statistical dependencies in the resampled data. In our case, constrained residual
resampling implies that the resampling favours those combinations of the original
residuals that have desirable dependencies over time and across vector coordinates. First
we compute all the model residuals and their total variation
 
Rtot e    et( j )
n
m
t 1 j 1
2
(12)
Thereafter, we determine
8
  j  et2j 2   et3j3  
 et 1 

R1 W1 , e  

1

2
t1 , j1 , t 2 , j 2 , t 3 , j 3 W1 


2
(13)
and
  j  et2j 2   et3j3  
 et 1 

R2 W2 , e  

1

2
t1 , j1 , t 2 , j 2 , t 3 , j3 W2 


2
(14)
where the first sum is used as a measure of the short-term temporal variation of the
residuals, and the second one is introduced to capture variation across series. Finally, we
form the ratios
T1 (e ) 
R1 (W1 , e )
Rtot (e )
(15)
T2 (e ) 
R2 (W2 , e )
Rtot (e )
(16)
and
that are subsequently used as targets for a step-by-step modification of ordinary bootstrap
residuals e*  et( j )* ,

j  1, ..., m, t  1, ..., n .
The modification of bootstrap residuals is based on an iterative proposal-rejection
algorithm in which pairs of residuals are swapped. At each iteration, a pair of residuals is
randomly selected, and the ratios
T1 (e*) 
R1 (W1 , e*)
Rtot (e*)
(17)
T2 (e*) 
R2 (W2 , e*)
Rtot (e*)
(18)
and
are computed before and after swapping the selected pair. If the swap decreases the
Euclidean distance
(T1 (e*)  T1 (e )) 2  (T2 (e*)  T2 (e )) 2
9
to the target it is accepted, otherwise it is rejected. The swapping is stopped after a
predefined number of proposed swaps or consecutive rejections.
Figure 2 illustrates how the swapping can restore the statistical features of the
residuals obtained for a synthetic dataset representing five sampling sites (stations). The
upper graph shows residuals with a strong correlation across series. The ordinary
bootstrap subsequently destroyed that correlation, but it was restored by the swapping.
Dependencies in residuals of seasonal data or records on a circular scale can also be
restored using our swapping procedure.
8
Smoothing and normalization
The models discussed in this article can be used to produce two major types of
outputs. First, we can estimate the trend surface  t( j ) , j  1, ..., m, t  1, ..., n by
suppressing all types of random variation in the collected data. Secondly, we can
normalize the observed data by removing the variation that can be attributed to the
covariates, that is, by forming
p
~
yt( j )  yt( j )   ˆi( j ) ( xit( j )  xi(. j ) ),
j  1, ..., m, t  1, ..., n
i 1
(19)
where ˆi( j ) is the estimated regression coefficient for the jth component of the ith
covariate. Figure 3 illustrates the results obtained when gradient smoothing was used to
summarize the trends in mean annual concentrations of mercury in muscle tissue from
flounder (Platichthys flesus) caught in the German Bight at five stations located at
varying distances from the mouth of the Elbe River, which was heavily polluted until the
beginning of the 1990s. Figure 4 illustrates the difference between the smooth trend
surface and the considerably rougher surface of normalized values for monthly loads of
nitrogen carried by the Rhine River.
10
3
2
Residual
1
0
-1
-2
2004
2002
2000
1998
1996
1994
1992
1988
1988
1988
1990
1984
1986
1984
1986
1984
1982
1982
1982
1986
1980
1980
1980
-3
Stn 5
Stn 3
Stn 1
3
2
Residual
1
0
-1
-2
2004
2002
2000
1998
1996
1994
1992
1990
-3
Stn 5
Stn 3
Stn 1
3
2
Residual
1
0
-1
-2
2004
2002
2000
1998
1996
1994
1992
1990
-3
Stn 5
Stn 3
Stn 1
Figure 2. Constrained resampling of a set of synthetic residuals that are strongly
correlated across vector coordinates (stations). The three graphs show the original data
(top), an ordinary bootstrap sample of that dataset (middle), and the same bootstrap
sample after 100,000 proposed swaps.
11
1000
800
600
Hg conc.
(ng/g ww)
400
200
Stn 5
2000
1998
Stn 3
1996
1994
Stn 1
1992
1990
1988
1986
0
1000
800
600
Hg conc.
(ng/g ww)
400
200
0
Stn3
Stn5
2000
1998
1996
1994
1992
1990
1988
1986
Stn1
Figure 3. Trend assessment of the concentration of mercury in muscle tissue from
flounder (Platichthys flesus) caught in the German Bight at five stations (53o 53’ N, 9o 11’
E; 53o 52’ N, 8o 52’ E; 53o 56’ N, 8o 38’ E; 53o 57’ N, 8o 30’ E; 53o 45’ N, 8o 2’ E) in or
outside the mouth of the Elbe River. The two graphs show observed annual means and a
trend surface obtained by gradient smoothing (1 = 2, 2 = 0.5).
12
Tot-N load (ton/month)
a)
Discharge
90000
16000
80000
14000
70000
12000
60000
10000
50000
8000
40000
6000
30000
4000
20000
2000
10000
0
Water discharge (109 m3/month)
Tot-N load
0
1988
1990
1992
1994
1996
b)
1998
c)
50000
2000
2002
2004
50000
40000
40000
30000
Total-N load
(ton/month) 20000
30000
Tot-N load
(ton/month)
10000
20000
10000
1999
2001
1993
1995
Sep
May
1997
2001
1998
1995
Apr
Jan
1989
0
Jul
1991
Oct
1992
1989
0
Jan
Figure 4. Trend assessment of the total nitrogen load in the Rhine River at Lobith on the
border between Germany and The Netherlands. The three graphs show observed monthly
nitrogen loads and water discharge values (a), the estimated trend function (b), and flownormalized monthly loads (c).
9
Missing and multiple values
To keep the notation as simple as possible, thus far we have assumed that we have
exactly one observation of the response variable and the p explanatory variables for each
combination of time point (year) and vector component. As pointed out by Hussian et al.
13
(2004), the algorithms used to estimate the semiparametric models can easily be adapted
to accommodate missing or multiple values by changing the diagonal elements of the
above-mentioned band matrix. More generally, our approach can handle data sets of the
type
( ytj((kk)) , x1j,t((kk)) , ..., x pj (,tk()k ) ), k  1, ..., N
where t(k) and j(k) respectively denote the time point and the vector component of the kth
observation.
Our resampling procedure can also be modified to accommodate missing and
multiple values. For the general case with a varying number of observations per cell, we
introduce the random effect model
et(, jk)  t( j )  t(, kj ) , t  1, ..., n,
j  1, ..., m, k  1, ..., n(t , j )
(20)
where et(,kj ) denotes the kth residual of the jth series at time t. First, the cell-specific
random effects  t( j ) are predicted using expressions of the form
ˆt( j )   t( j ) et(. j )
(21)
where
 t( j ) 
Vˆ (δ)
Vˆ (δ)  Vˆ (η) / n(t , j )
(22)
and Vˆ ( δ) and Vˆ (η) denote the estimated variances of the two types of random effects
(Hall & Maiti, 2006). Thereafter the t(, kj ) are predicted by subtracting the predicted cellspecific components from the original residuals. Finally, constrained resampling is used
to sample the ˆt( j ) , whereas the ordinary bootstrap is applied to the ˆt(, kj ) representing
variation within cells.
10 Computational aspects
For fixed values of the smoothing factors 1 and 2, the semiparametric models
described in this article can be estimated by using back-fitting algorithms in which
estimation of the slope parameters for fixed intercepts is alternated with estimation of the
intercepts for fixed slopes. More specifically, ordinary regression algorithms can be used
14
to estimate the slope parameters, whereas a system of linear equations with nm unknowns
must be solved to estimate the intercepts. The latter can be achieved by Cholesky
factorization of the coefficient matrix and sequential determination of the unknowns
(Silverman & Green, 1994). In the case of sequential smoothing, Stålnacke and Grimvall
(2001) demonstrated that the coefficient matrix is a positive definite symmetric band
matrix with lower and upper bandwidth 2m. The other smoothing techniques described
here give rise to symmetric band matrices that have the same bandwidth. Also, it can be
noted that, for a given bandwidth, the computational burden is proportional to the number
of time points n.
The smoothing factors 1 and 2 that control the effective dimension of the
estimated models can be determined by cross-validation. However, it should be noted that
conventional leave-one-out cross-validation may lead to over-fitting if the error terms are
correlated (Shao, 1993; Libiseller & Grimvall, 2003). Hence, we propose block crossvalidation in which the observed response values are divided into n blocks, each
composed of p contemporaneous components of yt.
The computational burden of the resampling varies strongly with the number of
bootstrap replicates and swaps. We normally used 200 replicates and a maximum of
100,000 swaps for each replicate. Under these conditions the resampling approximately
doubled the computational time.
A software package (Multitrend) containing our semiparametric smoothing method
and resampling technique is available (LiU, 2008).
11 Further generalizations
Time-varying slope parameters
The model introduced in formula 2 has time-dependent intercepts, whereas the
slope parameters are not dependent on time. Models in which the intercepts are time
independent and the time-dependence of the slope parameters is controlled by roughness
penalty expressions can be specified and estimated in a similar manner (Stålnacke &
Grimvall, 2001). However, for computational reasons, it is not feasible to handle more
15
than one explanatory variable in such semiparametric models. Furthermore, it is not
practicable to let both the intercept and the slope parameters vary with time due to the
following: back-fitting algorithms for such models may have convergence problems, and
the computational burden increases dramatically with the number of roughness penalty
factors determined by cross-validation.
Nonparametric analogues
The nonparametric specification of the intercepts can be justified by a desire to
make unprejudiced inference about the shape of the trend surface. Similar arguments
might rationalize the use of models in which both the trend and the influence of
covariates are specified nonparametrically, for example, those of the type
p
y t( j )   t( j )   hi( j ) ( xit( j )  E ( xit( j ) ))   t( j ) ,
j  1, ..., m, t  1, ..., n
i 1
(23)
where hi( j ) , i  1, ..., p, j  1, ..., m are smooth functions. Furthermore, this generalization
is easy to program when a back-fitting algorithm is used to estimate the model
parameters, because, with such an approach, the estimation of hi( j ) , i  1, ..., p, j  1, ..., m
for given intercepts can be reduced to mp nonparametric univariate regressions. However,
it should be noted that models with very few structural constraints may lead to problems
with over-fitting and slow convergence of the back-fitting algorithm.
12 Discussion
The current techniques for detecting trends in time series of environmental quality
data are dominated by approaches in which the collected data are analyzed separately for
each site and response variable. The major exceptions to this rule are the widespread use
of multivariate linear models and the growing interest in spatio-temporal geostatistical
models. In this article, we have demonstrated that roughness penalty approaches to the
estimation of semiparametric regression models constitute a very flexible group of
techniques for simultaneous detection of temporal trends and adjustment for covariates in
vector time series of environmental quality data.
16
In contrast to ARIMAX models and other classical multivariate time series models,
which are based on stationarity or very simple forms of nonstationarity, our method
enables careful modelling of the nonstationary features of the collected data. This
property, it shares with a large class of methods that can be referred to as smoothing
techniques for response surfaces. Thin plate splines, vector generalized additive models
(GAMs), and kernel smoothers (Wahba, 1990; Yee & Wild, 1996; Härdle, 1997; Hastie
et al., 2001) can all be appropriate for estimating nonlinear trends. However, this article
shows that our roughness penalty approach has the advantage that the smoothing pattern
can be tailored to take into account almost any relationship between the different
components of the observed random vectors. Furthermore, the degree of smoothing can
be fine-tuned in
two directions
without making the computational burden
insurmountable.
Dynamic factor analysis (DFA) represents another promising approach to
assessment of trends in vector time series data (Zuur et al., 2003). Inasmuch as DFA is a
latent variable technique, it is particularly useful when the dimension of the analyzed
vector time series is high. Nonetheless, it might be worth considering our method for
such data. This is particularly true when then there is considerable prior knowledge about
the interrelationship between the vector components or the proximity of the vector
components can be modeled.
Some comments should also be made about the residual resampling technique. Our
technique offers the important advantage of taking into account the correlation structure
of the model residuals without employing any sophisticated spatio-temporal modelling.
Moreover, it is well coordinated with the smoothers used to extract the trend surface.
However, like any other form of residual resampling, our method creates a new
resampled dataset by adding resampled residuals to fitted response values. This can be
justified if the uncertainty of the fitted responses is considerably smaller than the
individual error terms. In practice, this implies that the uncertainty estimates are reliable
for models with relatively large smoothing factors.
17
Finally, it is worth noting that, although some of the computational aspects of
parameter estimation in semiparametric models may require special attention (Schimek,
2001), the basic principles of such models are easy to communicate to a wide audience.
13 Acknowledgements
The authors are grateful for financial support from the Swedish Environmental
Protection Agency.
14 References
Carlstein E., Do K., Hall P., Hesterberg T., and Künsch H.R. (1998). Matched-block
bootstrap for dependent data. Bernoulli, 4, 305-328.
Gardner M.W. and Dorling S.R. (2000a). Statistical surface ozone models: an improved
methodology to account for non-linear behaviour. Atmospheric Environment, 34, 2134.
Gardner M.W. and Dorling S.R. (2000b). Meteorologically adjusted trends in UK daily
maximum surface ozone concentrations. Atmospheric Environment, 34, 171-176.
Giannitrapani M., Scott M., Bowman A., and Smith R. (2004). A statistical method to
analyze
trends
in
EMEP
data.
EMEP
Assessment
Report
(http://www.emep.int/assessment/appendix1.pdf).
Hall P. and Maiti T. (2006). Nonparametric estimation of mean-squared prediction error
in nested-error regression models. The Annals of Statistics, 34, 1733-1750.
Härdle W. (1997). Applied non-parametric regression. Cambridge, Cambridge
University.
Hastie T., Tibshirani R., and Friedman J. (2001). The elements of statistical learning.
New York, Springer.
Hussian M., Grimvall A., and Petersen W. (2004). Estimation of the human impact on
nutrient loads carried by the Elbe River. Environmental Monitoring and Assessment,
96, 15-33.
18
Lahiri S.N. (1999). Theoretical comparison of block bootstrap methods. Annals of
Statistics, 27, 386-404.
Libiseller C. and Grimvall A. (2003). Model selection for local and regional
meteorological normalisation of background concentrations of tropospheric ozone.
Atmospheric Environment, 37, 3923-3931.
Libiseller C. and Grimvall A. (2005). Trend analysis of CAMP data regarding wet and
dry nitrogen deposition. Report to the 2005 meeting of the OSPAR working group on
inputs to the marine environment (INPUT). London, OSPAR.
LiU
(Linköping
University)
(2008).
http://www.ida.liu.se/divisions/stat/research/.
Accessed 2008-08-20.
Mammen E. (2000). Resampling methods for nonparametric regression. In Schimek
M.G. (ed), Smoothing and Regression – Approaches, Computation, and Application.
New York, John Wiley & Sons.
Schimek M.G. (2001). Smoothing and regression: Approaches, Computation, and
Application. New York, Wiley-Interscience.
Shao J. (1993). Linear model selection by cross-validation. Journal of the American
Statistical Association, 88, 486-494.
Shively T.S. and Sager T.W. (1999). Semiparametric regression approach to adjusting for
meteorological variables in air pollution trends. Environmental Science &
Technology, 33, 3873-3880.
Silverman B.W. and Green P.J. (1994). Nonparametric regression and generalized linear
models: A roughness penalty approach. London, Chapman & Hall.
Srinivas V.V. and Srinivasan K. (2005). Matched block bootstrap for resampling
multiseason hydrologic time series. Hydrological Processes, 19, 3659-3682.
Stålnacke P. and Grimvall A. (2001). Semiparametric approaches to flow-normalisation
and source apportionment of substance transport in rivers. Environmetrics, 12, 233250.
19
Thompson M.L., Reynolds J., Cox L.H., Guttorp P., and Sampson P.D. (2001). A review
of statistical methods for the meteorological adjustment of ozone. Atmospheric
Environment, 35, 617-630.
Wahba G. (1990). Spline models for observational data. Philadelphia, SIAM.
Yee T.W. and Wild C.J. (1996) Additive extensions to generalized estimation equation
methods. Journal of the Royal Statistical Society Series B, 58, 711-725.
Zuur A.F., Fryer R.J., Jolliffe I.T., Dekker R., and Beukema J.J. (2003). Estimating
common trends in multivariate time series using dynamic factor analysis.
Environmetrics, 7, 665-685.
20
Download