Journal of Hydrology 407 (2011) 58–72
Contents lists available at ScienceDirect
Journal of Hydrology
journal homepage: www.elsevier.com/locate/jhydrol
Systematic evaluation of autoregressive error models as post-processors
for a probabilistic streamflow forecast system
Martin Morawietz *, Chong-Yu Xu, Lars Gottschalk, Lena M. Tallaksen
Department of Geosciences, University of Oslo, P.O. Box 1047 Blindern, 0316 Oslo, Norway
Article info
Article history:
Received 20 December 2010
Received in revised form 2 May 2011
Accepted 5 July 2011
Available online 12 July 2011
This manuscript was handled by András Bárdossy, Editor-in-Chief, with the assistance of Luis E. Samaniego, Associate Editor
Keywords:
Probabilistic forecast
Post-processor
Hydrologic uncertainty
Autoregressive error model
Ranked probability score
Bootstrap
Summary
In this study, different versions of autoregressive error models are evaluated as post-processors for probabilistic streamflow forecasts. The post-processors account for hydrologic uncertainties that are introduced by the precipitation–runoff model. The post-processors are evaluated with the discrete ranked
probability score (DRPS), and a non-parametric bootstrap is applied to investigate the significance of differences in model performance. The results show that differences in performance between most model
versions are significant. For these cases it is found that (1) error models with state dependent parameters
perform better than those with constant parameters, (2) error models with an empirical distribution for
the description of the standardized residuals perform better than those with a normal distribution, and
(3) procedures that use a logarithmic transformation of the original streamflow values perform better
than those that use a square root transformation.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
In recent years, the topic of probabilistic flow forecasting has
gained increased attention in hydrological research and operational applications. The traditional method of flow forecasting
has been based on using a deterministic meteorological forecast
and transforming it through a deterministic hydrological model
to attain a single deterministic flow value. However, it was recognised that such a forecast is often associated with considerable
uncertainties, which need to be described as well in order to assist
rational decision making.
A catalyst in this development has been the advent of meteorological ensemble forecasts (e.g. Molteni et al., 1996) that aim to describe the uncertainties of the meteorological forecasts.
Many hydrological studies have focused on treating the input
uncertainties of precipitation and temperature by using meteorological ensemble forecasts as inputs to hydrological models (see
for example the review on ensemble flood forecasting by Cloke
and Pappenberger (2009)). However, in order to obtain a proper
probabilistic flow forecast, all relevant sources of uncertainty in
the hydrological modelling process should be addressed, not only
the input uncertainties of forecast precipitation and temperature.
* Corresponding author. Tel.: +47 22854908; fax: +47 22854215.
E-mail address: martin.morawietz@geo.uio.no (M. Morawietz).
0022-1694/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.jhydrol.2011.07.007
Other uncertainties comprise the model uncertainty (model structure and parameters), uncertainty of the initial states of the hydrological model at the time of the forecast as well as uncertainties of
observed precipitation and temperature that drive the hydrological
model up to the point where the forecast starts (these latter uncertainties influence the uncertainties of initial states).
Ideally, a probabilistic forecast system would treat all possible
sources of uncertainty explicitly. However, this seems both theoretically and practically impossible (Cloke and Pappenberger,
2009). The complex interactions between the different sources of
uncertainties and the character of the unknown make an explicit
treatment impossible. Following the argument of Krzysztofowicz
(1999), ‘‘. . . for the purpose of real-time forecasting it is infeasible,
and perhaps unnecessary, to explicitly quantify every single source
of uncertainty’’. Similarly, with respect to parameter uncertainty,
Todini (2004) states ‘‘that in flood forecasting problems one is definitely not interested in a parameter sensitivity analysis, but
mainly focused at assessing the uncertainty conditional to the chosen model with its assumed parameter values’’.
A compromise between explicit and lumped treatment of the
different sources of uncertainty is laid out in the framework for
probabilistic forecast described by Krzysztofowicz (1999). He
proposes to treat those input variables that have the greatest
impact on the uncertainty of the forecast explicitly; that means
probability distributions of these variables are used as inputs to
the deterministic hydrological model. In the case of flow forecasting, these input variables are forecasts of precipitation and, in
addition, forecasts of temperature in catchments where snow plays
an important role for runoff formation. All other uncertainties are
then treated together in a lumped form as hydrologic uncertainty
with a so called hydrologic uncertainty processor, also called
post-processor.
When describing the hydrologic uncertainty through a hydrologic uncertainty processor for streamflow, the aim is to find the
distribution of future observed streamflow at time t, Qobs(t), conditional on the simulated streamflow at time t, Qsim(t), that is attained when the true values of precipitation and temperature are
used to drive the hydrological model. The distribution can be conditioned on additional variables, and, analogous to the use of observed river stage at time t = 0 for a hydrologic uncertainty
processor for river stage forecasting (Krzysztofowicz and Kelly,
2000), the observed streamflow at time t = 0 when the forecast
starts, Qobs(0), is used in a hydrologic uncertainty processor for
streamflow. The conditional distribution that is sought-after is
then u(Qobs(t)|Qsim(t), Qobs(0)).
The approach investigated in this paper is a direct estimation of the distribution u(Q_obs(t)|Q_sim(t), Q_obs(0)). The distribution is described with the help of a first order autoregressive error model of the form d′_t = a d′_{t−1} + σ ε_t, where d′_t is the model error of the deterministic hydrological model (observed minus simulated streamflow) at time t, a and σ are the parameters of the error model, and ε_t is the residual error described through a probability distribution κ. Based on this equation, the distribution u of the observed streamflow at time t, with time t − 1 = 0 as the time where the forecast starts (time of the last observed data), is given as:

u(Q_obs(t) | Q_sim(t), Q_obs(0), Q_sim(0)) = κ( [(Q_obs(t) − Q_sim(t)) − a (Q_obs(0) − Q_sim(0))] / σ )   (1)

Note that in this description the distribution is conditioned on an additional variable, Q_sim(0).
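To make Eq. (1) concrete, the quantiles of the post-processed forecast can be sketched as below. The log transformation, the standard normal residual distribution and the parameter values a = 0.7, σ = 0.3 are illustrative assumptions for this sketch, not the calibrated values of the study.

```python
import math
from statistics import NormalDist

# Illustrative AR(1) post-processor parameters (assumed, not calibrated values)
a, sigma = 0.7, 0.3

def predictive_quantile(q_sim_t, q_obs_0, q_sim_0, prob):
    """prob-quantile of Q_obs(t) | Q_sim(t), Q_obs(0), Q_sim(0) following
    Eq. (1), here with a log transformation and normal residuals."""
    d0 = math.log(q_obs_0) - math.log(q_sim_0)   # model error at forecast start
    mean_log = math.log(q_sim_t) + a * d0        # AR(1)-corrected mean in log space
    z = NormalDist().inv_cdf(prob)               # standard normal quantile
    return math.exp(mean_log + sigma * z)
```

For example, the forecast median is pulled towards the start-of-forecast error: with Q_sim(t) = 10 and a start ratio Q_obs(0)/Q_sim(0) = 1.2, the median becomes 10 · 1.2^0.7 ≈ 11.4.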
A similar approach of a direct estimation of the distribution
u(Qobs(t)|Qsim(t), Qobs(0)) by using an autoregressive model was
proposed by Seo et al. (2006). However, in their formulation the transformed observed streamflow itself is the autoregressive variable, not its error. Their autoregressive model then has the simulated streamflow at time t, Q_sim(t), as an exogenous variable, but the model does not contain the simulated streamflow at time 0, Q_sim(0).
In general, autoregressive error models have been used in
hydrology in many different contexts. For example, Engeland and
Gottschalk (2002) and Kuczera (1983) use them in the framework
of Bayesian parameter estimation, and Xu (2001) uses them to
study the residuals of a water balance model. In the context of flow
forecasting, they are used for updating of model outputs (e.g. Toth
et al., 1999). However, when used for updating of model outputs, only the deterministic part of the error model is applied, correcting one deterministic forecast into another deterministic forecast.
This paper analyses the use of an autoregressive error model as
hydrologic uncertainty processor, also called post-processor,
where not only the deterministic component of the error model
is used as for model updating, but the full distribution is applied.
When using autoregressive models for the description of
streamflow or streamflow errors, three aspects are important.
(1) In the simplest application, the parameters of the autoregressive model are assumed to be fixed. However, several authors propose the use of different parameters for different hydrological or meteorological states. Lundberg (1982) and Seo et al. (2006), for example, use different parameters for high and low flows, while Engeland and Gottschalk (2002) use a more detailed classification based on states of the variables temperature, precipitation and snow depth.
(2) In order to make the residuals homoscedastic, a transformation is often applied to the original observed and simulated
streamflow values. Common transformations are the logarithmic transformation (e.g. Engeland and Gottschalk,
2002) or the square root transformation (Xu, 2001), which
are (apart from a linear shift) special cases of the Box-Cox
transformation (Box and Cox, 1964).
(3) The errors et of the autoregressive model are usually
assumed to be normally distributed. However, when an
autoregressive error model is used as a hydrologic uncertainty processor to generate probabilistic forecasts, a violation of the distributional assumptions will distort the
results of such a forecast. A straightforward alternative to
solve this problem is the use of an empirical distribution
defined through the empirical standardized residuals of the
calibration period. Application of this approach for a hydrologic post-processor has to our knowledge so far not been
described in the literature.
Another important aspect for a proper evaluation of the results
is the assessment of the significance of differences found in the
evaluation measures. Such an evaluation does not always have to
be carried out through a formal analysis if sufficient experience
with a certain subject can ensure that a subjective evaluation leads
to a reliable assessment. However, as there is so far relatively little
experience in hydrological research with forecast evaluation measures such as the discrete ranked probability score (DRPS), an explicit treatment of the uncertainty of these evaluation measures
seems more appropriate. The flexible approach of bootstrap (Efron
and Tibshirani, 1993) allows such an explicit evaluation of the
uncertainty without the necessity of making distributional
assumptions of the variable being evaluated.
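A percentile-based non-parametric bootstrap of the kind applied later in this study can be sketched as follows; the function name, the 10,000 resamples and the 95% interval are our own illustrative defaults, not the study's settings.

```python
import random

def bootstrap_ci(values, n_boot=10000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`:
    resample with replacement, recompute the mean, take empirical quantiles."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(values[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

No distributional assumption about the evaluated variable enters; the interval is read directly off the resampled statistic.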
Based on these considerations, the main objective of this study
is an evaluation of different versions of autoregressive error models as hydrologic uncertainty processors. The following aspects are
investigated in particular:
(1) Use of state dependent parameters versus state independent
parameters.
(2) Use of a logarithmic transformation of the original streamflow values versus a square root transformation.
(3) Use of a standard normal distribution for the description of
the standardized residuals versus an empirical distribution.
In addition, the application of bootstrap to the forecast evaluation measures to evaluate the significance of the results is demonstrated, and a discussion on the discrete ranked probability score
for the evaluation of probabilistic streamflow forecasts is included.
The study was carried out by evaluating eight different autoregressive error models as hydrologic uncertainty processors. A well-known precipitation–runoff model, the HBV model, was chosen as
deterministic hydrological model to which the uncertainty processors were applied. The uncertainty processors were calibrated for
55 catchments in Norway and evaluated using the discrete ranked
probability score in combination with a non-parametric bootstrap.
2. Methods

2.1. Deterministic hydrological model: HBV model
The HBV model (Bergström, 1976, 1992) can be characterised as
a semi-distributed conceptual precipitation–runoff model. It
distinguishes different elevation zones based on the hypsographic
curve, and for these elevation zones temperature and precipitation
are adjusted according to temperature and precipitation gradients
and a temperature threshold to distinguish between rain and
snow. Within each elevation zone, different land use zones are distinguished by allowing different parameters for certain model processes. Each zone runs individual snow and soil moisture routines.
Since its original development in the early 1970s, the HBV model
has been applied and modified in many different operational and
research settings. The model version used in this study is the ‘‘Nordic’’ HBV model (Sælthun, 1996), which is used for operational flow
forecasting at the Norwegian Water Resources and Energy Directorate (NVE). The model is run with daily time steps with mean daily
temperature and accumulated daily precipitation as model inputs
and mean daily streamflow as model output. The model was calibrated by NVE for 117 catchments (Lawrence et al., 2009). The calibration was carried out as an automated model calibration using
the parameter estimation software PEST (Doherty, 2004), which is
based on an implementation of the Levenberg–Marquardt method
(Levenberg, 1944; Marquardt, 1963). From the 117 catchments,
55 catchments together with the calibrated HBV models were selected for this study based on a sufficiently long period of common
data. For these catchments, the HBV model calibration period was 1981–2000, and the validation period was the combined periods of 1961–1980 and 2001–2006. The Nash–Sutcliffe efficiency coefficients NE (Nash and Sutcliffe, 1970) for the validation period range
from 0.50 to 0.90. Twenty catchments have a good model performance (NE ≥ 0.80), 26 catchments have an intermediate model performance (0.65 ≤ NE < 0.80), and 9 catchments have a relatively weak model performance (0.50 ≤ NE < 0.65).
2.2. Versions of post-processors for the HBV model
The simulation errors of the deterministic precipitation–runoff model are described through an autoregressive error model:

d_t = a_t d_{t−1} + σ_t ε_t   (2)

The simulation error d_t is defined as the difference between the transformed observed streamflow, o_t, and the transformed simulated streamflow, s_t:

d_t = o_t − s_t   (3)

Parameters a_t and σ_t are the parameters of the error model, and ε_t is the standardized residual error described through a random variable with the probability density function κ.

Solving the error model for the observed streamflow yields:

o_t = s_t + a_t (o_{t−1} − s_{t−1}) + σ_t ε_t   (4)

The density of the observed streamflow o_t conditional on s_t, o_{t−1} and s_{t−1} is then equal to the density of the value ε_t that corresponds to o_t through Eq. (4):

u(o_t | s_t, o_{t−1}, s_{t−1}) = κ(ε_t)  with  ε_t = [(o_t − s_t) − a_t (o_{t−1} − s_{t−1})] / σ_t   (5)

i.e.

u(o_t | s_t, o_{t−1}, s_{t−1}) = κ( [(o_t − s_t) − a_t (o_{t−1} − s_{t−1})] / σ_t )   (6)
Table 1
Aspects investigated.

(1) Parameters a_t and σ_t:
    State dependent (SD):
      a_t = a_{i(t)} + b s_t for Q_sim(t) ≥ q_thresh;  a_t = a*_{i(t)} + b* s_t for Q_sim(t) < q_thresh
      ln σ_t = A_{i(t)} + B s_t for Q_sim(t) ≥ q_thresh;  ln σ_t = A*_{i(t)} + B* s_t for Q_sim(t) < q_thresh
    State independent (SI): a_t, σ_t constant

(2) Transformation type:
    Log transformation (Log): o_t = ln Q_obs(t), s_t = ln Q_sim(t)
    Square root transformation (Sqrt): o_t = √Q_obs(t), s_t = √Q_sim(t)

(3) Distribution κ of the standardized residuals ε_t:
    Standard normal distribution (Norm)
    Empirical distribution (Emp)
Table 2
Model versions investigated in the study; abbreviations are defined in Table 1.

Version   Label (parameters.transformation.distribution)
1         SD.Log.Norm
2         SI.Log.Norm
3         SD.Sqrt.Norm
4         SI.Sqrt.Norm
5         SD.Log.Emp
6         SI.Log.Emp
7         SD.Sqrt.Emp
8         SI.Sqrt.Emp
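The eight labels of Table 2 are simply the Cartesian product of the three two-valued aspects of Table 1, which can be generated as a quick sketch (the loop order is chosen to reproduce the table's numbering):

```python
# Generate the eight model-version labels of Table 2 from the three
# two-valued aspects of Table 1.
versions = [
    f"{params}.{transform}.{dist}"
    for dist in ("Norm", "Emp")          # (3) residual distribution
    for transform in ("Log", "Sqrt")     # (2) transformation type
    for params in ("SD", "SI")           # (1) parameter formulation
]
```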
In this study, the post-processors are applied to HBV-model
outputs that are produced without additional updating procedures such as Kalman filtering (Kalman, 1960) or variational data assimilation (Seo et al., 2003). Thus, updating is done only through the
autoregressive error model itself. According to the classification
of updating procedures by the World Meteorological Organization
(1992), this updating is classified as an updating of output variables. In principle, the post-processors may also be applied to model outputs that include other updating procedures like updating of
input variables, state variables or model parameters. In this case,
the post-processors would need to be calibrated on model outputs
that include these updating procedures.
2.2.1. State dependent parameter formulation
As state dependent parameter formulation, a parameter
description used at the Norwegian Water Resources and Energy
Directorate (NVE) is applied (Langsrud et al., 1998). State dependence of the parameters is realized in three ways:
(1) Firstly, the parameters at and rt of the autoregressive error
model are formulated to be linearly dependent on the transformed simulated streamflow st:
a_t = a_{i(t)} + b s_t   (7)

ln σ_t = A_{i(t)} + B s_t   (8)
Eq. (6) constitutes a post-processor. The aspects investigated are (1)
parameter formulation, (2) transformation type and (3) distribution
type (Table 1). Each of the three aspects has two possible realizations, and by combining these, in total eight model versions of
post-processors are generated (Table 2).
(2) Secondly, the parameters a_{i(t)} and A_{i(t)} of the linear relations can assume different values, depending on the states defined
through the variables observed temperature Tt, observed
precipitation Pt and simulated snow water equivalent SWEt
at time t. It is distinguished if the temperature is below or
above 0 °C, if precipitation occurs or not, and if the snow
water equivalent is above or below a certain threshold value
swethresh; if the amounts of snow are below swethresh, the
catchment is assumed to behave as a snow free catchment.
Table 3
States distinguished for different conditions of observed temperature T_t, observed precipitation P_t, simulated snow water equivalent SWE_t, and simulated streamflow Q_sim(t) at time t, and the corresponding parameters of the error model.

Meteorological and snow states                      State i(t)   Q_sim(t) ≥ q_thresh   Q_sim(t) < q_thresh
T_t ≤ 0 °C                                          1            a_1, A_1, b, B        a*_1, A*_1, b*, B*
T_t > 0 °C and P_t = 0 mm and SWE_t ≤ swe_thresh    2            a_2, A_2, b, B        a*_2, A*_2, b*, B*
T_t > 0 °C and P_t = 0 mm and SWE_t > swe_thresh    3            a_3, A_3, b, B        a*_3, A*_3, b*, B*
T_t > 0 °C and P_t > 0 mm and SWE_t ≤ swe_thresh    4            a_4, A_4, b, B        a*_4, A*_4, b*, B*
T_t > 0 °C and P_t > 0 mm and SWE_t > swe_thresh    5            a_5, A_5, b, B        a*_5, A*_5, b*, B*

Fig. 1. Schematic diagram of the empirical distribution function F(ε) of the standardized empirical residuals ε (a step function rising from 0 to 1 in jumps of 1/M).
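The five-state classification of Table 3 can be sketched as a small function; the function name and argument conventions are ours, not the paper's:

```python
def snow_meteo_state(temp, precip, swe, swe_thresh):
    """Return the state i(t) in {1,...,5} of Table 3 from observed temperature
    (deg C), observed precipitation (mm) and simulated snow water equivalent."""
    if temp <= 0.0:
        return 1                                 # accumulation; mainly base flow
    if precip == 0.0:
        return 2 if swe <= swe_thresh else 3     # warm and dry, without / with snow pack
    return 4 if swe <= swe_thresh else 5         # warm and wet, without / with snow pack
```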
Through the combination of the three variables, five different states i(t) ∈ {1, 2, ..., 5} are distinguished (Table 3). The classification into the five states is based on conceptual considerations about the presence or absence of different processes which may result in different error behaviour. For temperatures below zero, precipitation is accumulated in the snow storage and streamflow comes mainly from base flow. For temperatures above zero, different processes take place both in the model and in the real catchment, depending on whether precipitation is present or not and whether a snow pack is present or not.

(3) Thirdly, two different sets of parameters a_j, b, A_j, B and a*_j, b*, A*_j, B*, j = 1, ..., 5, are used, depending on whether the simulated streamflow at time t is above or below a flow threshold q_thresh (Table 3).

Summarizing points 1–3, the state dependent parameters a_t and σ_t are formulated as

a_t = a_{i(t)} + b s_t for Q_sim(t) ≥ q_thresh;  a_t = a*_{i(t)} + b* s_t for Q_sim(t) < q_thresh   (9)

ln σ_t = A_{i(t)} + B s_t for Q_sim(t) ≥ q_thresh;  ln σ_t = A*_{i(t)} + B* s_t for Q_sim(t) < q_thresh   (10)

The threshold for snow water equivalent was chosen as follows. A catchment is assumed to behave as snow free when the simulated snow cover falls below 10%. The threshold swe_thresh is then determined as the average snow water equivalent that corresponds to a snow cover of 10%.

As threshold q_thresh to distinguish between high and low flows, the 75th percentile of the observed streamflow of the calibration period is used.

2.2.2. Empirical distribution function
The empirical distribution function is based on the empirical standardized residuals

ê_t = (d_t − a_t d_{t−1}) / σ_t   (11)

of the error model, obtained by solving Eq. (2) for ε_t. The set of standardized empirical residuals ê_m, m ∈ {1, 2, ..., M}, is calculated from all days m = 1, ..., M of the calibration period after the parameters of the error model have been estimated. The empirical distribution function F(ε) (Fig. 1) of the standardized residual ε is then defined as the step function

F(ε) = (1/M) Σ_{m=1}^{M} I(ê_m ≤ ε)   (12)

where I(A) is an indicator variable that is 1 if A is true and 0 if A is false. For values of ε that lie outside the observed range of residuals, the non-exceedance probability is 0 or 1.

No further attempt was made to smooth the distribution function or to use some other plotting position formula to determine the points of the distribution function. Given the large number of points, it is not expected that such refinements would lead to any measurable changes in the results.

2.3. Estimation of the parameters of the error model

2.3.1. Models with state independent parameters
For models with state independent parameters, Eq. (2) constitutes a simple linear regression model. Parameter a is estimated using ordinary least squares, and parameter σ is estimated as the root mean square of the residuals of the regression model.

2.3.2. Models with state dependent parameters
For models with state dependent parameters (Eqs. (9) and (10)), Eq. (2) can be rewritten as:

d_t = a_{i(t)} d_{t−1} + b s_t d_{t−1} + exp(A_{i(t)} + B s_t) ε_t  for Q_sim(t) ≥ q_thresh
d_t = a*_{i(t)} d_{t−1} + b* s_t d_{t−1} + exp(A*_{i(t)} + B* s_t) ε_t  for Q_sim(t) < q_thresh   (13)

The first two terms on the right hand side of Eq. (13) comprise the deterministic component of the error model, while the third term constitutes the stochastic component.

The parameter estimation for this model type follows an iterative two-step procedure (Langsrud et al., 1998). In the first step, the parameters of the deterministic part of the error model, a_j, b, a*_j, b*, are estimated in a weighted linear regression. In the second step, the parameters of the stochastic component, A_j, B, A*_j, B*, are estimated in a generalized linear regression with logarithmic link function and Gamma distribution. Steps one and two are then repeated, using the results from the second step to update the weights of the linear regression of the first step; repetitions are continued until the parameter estimates converge. A detailed description of the estimation procedure is given in Appendix A.
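The step function of Eq. (12) can be sketched directly; sorting the calibration residuals once and evaluating F(ε) by binary search is an implementation choice of ours, not part of the paper:

```python
from bisect import bisect_right

def empirical_cdf(residuals):
    """Build F(eps) = (1/M) * sum_m I(e_m <= eps) from the standardized
    calibration residuals, as in Eq. (12)."""
    e = sorted(residuals)
    M = len(e)
    def F(eps):
        # number of residuals <= eps, found by binary search on the sorted sample
        return bisect_right(e, eps) / M
    return F
```

Outside the observed range of residuals the function returns 0 or 1, as stated above.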
2.3.3. Practical adjustments for the parameter estimation
2.3.3.1. Exclusion of the smallest streamflow values for 15 catchments. For several catchments, the smallest streamflow values
were removed from the data series due to problems with the logarithmic transformation.
(1) For 13 catchments, the streamflow series of observed or simulated streamflow contained instances (days) with zero values. However, the logarithm is only defined for values larger
than zero. Therefore, instances with zero values were
removed from the data series of the respective catchments.
(2) For streamflow values Q that tend towards zero, Q → 0, the log transformed values tend towards minus infinity, ln Q → −∞. As a result, infinitesimal differences between
Qobs(t) and Qsim(t) in the original scale can become arbitrarily
large in the transformed variable space. For the residuals
et = rtet of an autoregressive error model, this may then lead
to the opposite of what the transformation should achieve;
instead of harmonizing the residuals, the variance of the
residuals becomes arbitrarily large for Q_obs(t) → 0 or Q_sim(t) → 0. While for the majority of catchments the logarithmic transformation worked well (see example plot of the squared residuals e_t² versus ln Q_sim(t) for catchment Losna, Fig. 2a), for 10 catchments an arbitrary increase of the squared residuals for the smallest streamflow values was found (e.g. catchment Reinsnosvatn, Fig. 2b). This led to a non-convergence of the calibration routine for the parameters A*_i and B* in these catchments. Based on visual inspection of the plots of the squared residuals e_t² versus ln Q_sim(t), instances with very small values of observed or simulated streamflow were therefore removed from the respective streamflow series.
Based on points (1) and (2) above, the streamflow series of 15
catchments were truncated by their smallest flow values. The
catchments might also have been completely excluded from the
study. However, this would have substantially reduced the number
of catchments available. As the focus of this study is not on the
smallest values of the flow spectrum but rather on the general
range and high flows, it is considered that such a truncation is reasonable and that it is preferable to include these data series in the
study.
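The screening in points (1) and (2) above amounts to dropping paired days whose flows would break the log transformation. The function below is a simplified sketch: the study set the small-flow cutoff by visual inspection of residual plots, which is represented here by a hypothetical `q_min` argument.

```python
import math

def screen_for_log(q_obs, q_sim, q_min=0.0):
    """Drop days with zero (or, if q_min > 0, very small) observed or
    simulated streamflow, then return log-transformed pairs (o_t, s_t)."""
    kept = [(o, s) for o, s in zip(q_obs, q_sim) if o > q_min and s > q_min]
    return [(math.log(o), math.log(s)) for o, s in kept]
```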
2.3.3.2. Minimum number of data per class. For each of the 10 classes
defined by meteorological and snow states and states of simulated
streamflow (Table 3), a sufficient number of data should be available for estimation of the parameters of the respective class. Therefore a minimum requirement of 60 instances per class was
introduced. If a class contains fewer instances, it is merged with another class for estimation of common parameters.
A merging scheme was developed based on the similarity of
parameter estimates of the different classes. The merging scheme
was developed from 35 catchments that had sufficient data in all
10 classes.
In the final parameter estimation for the models with state
dependent parameters, the complete number of 24 parameters
was estimated in 43% of the estimations. For another 43%, the
number of estimated parameters was reduced to 22. Twenty
parameters were estimated in 11% of the estimations, and in only 3% of the estimations was the number of estimated parameters less than 20.
2.4. Evaluation of the error model
2.4.1. The discrete ranked probability score (DRPS)
The discrete ranked probability score (DRPS; Murphy, 1971;
Toth et al., 2003) was chosen as forecast evaluation measure. It is
a summary score that allows an evaluation of the forecast with respect to a number of user specified thresholds.
There are several definitions of the discrete ranked probability score. The essence of all definitions is the same, i.e. the DRPS evaluates the squared differences (p_k − o_k)² between the cumulative distribution function of the forecast, p, and the cumulative distribution function of a perfect forecast (observation), o, at some predefined thresholds x_k, k = 1, 2, ..., K (Fig. 3).
Table 4 gives an overview of different definitions found in the
literature. The different formulations lead to differences in range
and orientation of the score. The first definition defines the DRPS
as the sum of the squared differences (Murphy, 1971; Wilks,
1995), leading to a range of [0, K]. The second definition uses the
mean of the squared differences (e.g. Toth et al., 2003), resulting
in a standardized range [0, 1]. For both definitions the orientation
of the score is negative, i.e. the best score has the lowest value of
zero. In the third definition, the score is inverted to a positive orientation. In addition, the range is adjusted by adding a constant of
one. This third version was used in the original definition of the
ranked probability score by Epstein (1969).
Since the definitions of the DRPS given above are linear transformations of each other, the information conveyed through the
different scores is essentially the same. So far it does not seem that
one of the above definitions has become a definitive standard. For
this publication, the definition according to the second equation is
Fig. 2. Example plots of squared residuals e_t² = σ_t² ε_t² versus the logarithm of simulated streamflow ln Q_sim(t) for selected catchments for models with state dependent parameters (SD) and logarithmic transformation (Log). (a) Catchment Losna; the squared residuals are relatively homoscedastic over the whole range of flow values. (b) Catchment Reinsnosvatn; strong deviation from homoscedastic behaviour for small streamflow values.
Table 4
Definitions of the discrete ranked probability score (DRPS) for a single event.

No   Formula(a)                                 Range    Orientation   Source
1    Σ_{k=1}^{K} (p_k − o_k)²                   [0, K]   Negative      Murphy (1971), Wilks (1995)
2    (1/K) Σ_{k=1}^{K} (p_k − o_k)²             [0, 1]   Negative      Bougeault (2003)(b), Nurmi (2003), Toth et al. (2003), WWRP/WGNE Joint Working Group on Forecast Verification Research (2010)
3    1 − (1/(K−1)) Σ_{k=1}^{K} (p_k − o_k)²     [0, 1]   Positive      Epstein (1969)(c), Stanski et al. (1989)

(a) The formulas may use symbols or formulations different from the respective sources in the last column, but are mathematically equivalent.
(b) No explicit formula is given, but the definition is described in the text.
(c) Mathematical equivalence of the original definition by Epstein (1969) with the formula given here is demonstrated by Murphy (1971).

Fig. 3. Definition of the discrete ranked probability score: the differences p_k − o_k between the non-exceedance probabilities of the forecast, p, and of the observation, o, are evaluated at the thresholds x_1, x_2, ..., x_K.
chosen. On the one hand, a standardized range seems preferable
over a range that is dependent on the number of thresholds as in
definition 1. On the other hand, definition 2 is more direct and
intuitive than definition 3. The latter may be an important aspect when probabilistic forecasting and forecast verification are to be communicated to a wider audience.
Following Toth et al. (2003), the DRPS for one event is calculated as follows. K thresholds x_1 < x_2 < ... < x_K for the continuous variable X are selected. These thresholds define the events A_k = {X ≤ x_k}, k = 1, 2, ..., K, with the corresponding forecast probabilities p_1, p_2, ..., p_K. Analogously, for each event A_k a binary indicator variable o_k is defined that indicates if the event A_k occurs or not, i.e. o_k = 1 if A_k occurs, and o_k = 0 otherwise. The DRPS for one event is then calculated as

DRPS_ind = (1/K) Σ_{k=1}^{K} (p_k − o_k)²   (14)
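Eq. (14), together with the plain averaging used in the following equations, can be sketched as below; the function names are ours:

```python
def drps_ind(p, obs, thresholds):
    """DRPS for a single event (Eq. (14)): mean squared difference between the
    forecast non-exceedance probabilities p_k and the 0/1 observation CDF o_k."""
    o = [1.0 if obs <= x else 0.0 for x in thresholds]
    K = len(thresholds)
    return sum((pk - ok) ** 2 for pk, ok in zip(p, o)) / K

def mean_score(scores):
    """Plain average, as used over days and over catchments."""
    return sum(scores) / len(scores)
```

For example, with p = (0.2, 0.5, 0.9), thresholds (10, 20, 30) and an observation of 25, only the last event occurs (o = (0, 0, 1)) and DRPS_ind = (0.04 + 0.25 + 0.01)/3 = 0.1.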
In this study, the continuous variable X is daily streamflow, and a
single event is the daily streamflow occurring on a certain day in
a certain catchment.
To evaluate the performance of a probabilistic forecast model in one catchment over a certain time period, the average of the DRPS_ind values for the individual days n = 1, 2, ..., N is calculated as

DRPS = (1/N) Σ_{n=1}^{N} DRPS_ind,n   (15)
Furthermore, to evaluate and compare the overall performance of different forecast models over a number of catchments, the average of the DRPS values for catchments c = 1, 2, ..., C is calculated as

\overline{DRPS} = (1/C) Σ_{c=1}^{C} DRPS_c   (16)
To actually calculate a DRPS for an event, the K thresholds x1,
x2, . . . , xK have to be specified. Jaun and Ahrens (2009) in their evaluation study of a probabilistic hydrometeorological forecast system chose four thresholds (the 25th, 50th, 75th and 95th
percentile of the historic streamflow record). However, when such
a small number of thresholds is chosen, differences between the
cumulative distribution functions of the forecast and the corresponding observation might not be captured adequately. Therefore, if the purpose of the study does not implicate the selection
of certain specific thresholds, one should aim to select a relatively
large number of thresholds that extend over the whole range of
values. For this study, 99 thresholds were chosen as the 1st,
2nd, . . . , 99th percentile of the flow record of each catchment. Note
that the selection of the thresholds on the basis of flow quantiles
also allows for a direct comparison of DRPS values from different
catchments and justifies the calculation of averages of DRPS values
over several catchments as indicated by Eq. (16). The comparability
over catchments is also reflected in the decomposition of the DRPS
where the uncertainty component assumes the same value in all
catchments when flow quantiles are selected as thresholds (see
Section 2.4.3).
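As an illustration of Eqs. (14)–(16), the score can be computed directly from the forecast probabilities at the chosen thresholds. The sketch below is ours, not from the paper: the function names are invented, and we assume the events A_k are cumulative, i.e. A_k = {X ≤ x_k}.

```python
import numpy as np

def drps_single(p, obs, thresholds):
    """DRPS for one forecast day (Eq. 14).

    p[k] is the forecast probability of the event A_k, here assumed to be
    {X <= x_k}; the binary indicator o_k is 1 if the observation fulfils A_k.
    """
    p = np.asarray(p, dtype=float)
    o = (obs <= np.asarray(thresholds, dtype=float)).astype(float)
    return np.mean((p - o) ** 2)

def drps_catchment(p_series, obs_series, thresholds):
    """Average DRPS over the N days of a validation period (Eq. 15)."""
    return np.mean([drps_single(p, q, thresholds)
                    for p, q in zip(p_series, obs_series)])

# Thresholds as the 1st, 2nd, ..., 99th percentile of the flow record:
#   thresholds = np.percentile(flow_record, np.arange(1, 100))
# The average over catchments (Eq. 16) is then simply the mean of the
# per-catchment scores.
```

A forecast that puts all probability mass on the correct side of every threshold scores 0; larger values indicate a poorer probabilistic forecast.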
2.4.2. Significance of differences of the DRPS: bootstrap
The overall evaluation of the eight error models is done by calculating the average discrete ranked probability score over the 55
catchments, DRPS, according to Eq. (16) for each model version. In
order to assess if differences between the scores of the different
model versions are significant, confidence intervals are estimated
using a non-parametric bootstrap (Efron and Tibshirani, 1993). Because of the correlations between the DRPS values of different
model versions for individual catchments (see Section 3.1,
Figs. 5–7), a construction of confidence intervals for the DRPS values directly would not be helpful because the correlations would
not allow a distinction of significant differences based on overlapping or non-overlapping confidence intervals. Instead, confidence
intervals for the differences between DRPS values of different model
versions are calculated as described in the following.
Let xc, c = 1, 2, . . . , 55, be the DRPS values (Eq. (15)) for one version of the error model, say version 1, for the individual catchments
1, 2, . . . , 55. And let yc, c = 1, 2, . . . , 55, be the corresponding DRPS
values for another version of the error model, say version 2. Then
the mean discrete ranked probability scores over the 55 catchments, \overline{DRPS} (Eq. (16)), are \overline{DRPS}(1) = \frac{1}{55} \sum_{c=1}^{55} x_c and \overline{DRPS}(2) = \frac{1}{55} \sum_{c=1}^{55} y_c for model versions 1 and 2, respectively. Let now z_c = x_c - y_c, c = 1, 2, . . . , 55, be the differences between the DRPS of the two model versions in the individual catchments. Then the difference of the mean values \overline{DRPS}(1) and \overline{DRPS}(2) is equal to the mean of the differences, \bar{z}:

\overline{DRPS}(1) - \overline{DRPS}(2) = \frac{1}{55} \sum_{c=1}^{55} z_c = \bar{z}    (17)
Thus, estimating a confidence interval for the difference of the mean
values is equivalent to estimating a confidence interval for the
mean of the differences.
We now regard the differences zc, c = 1, 2, . . . , 55, as a sample of
an unknown population Z, which reflects the distribution of differences that might occur in general. A confidence interval for the
mean value of samples of size 55 from this unknown population
is estimated with non-parametric bootstrap, using the sample of
differences zc, c = 1, 2, . . . , 55, as a surrogate of the unknown population Z. The steps are as follows.
Fig. 4. Overview of the 55 catchments and streamflow stations in Norway. Eight catchments are nested in a larger catchment.
(1) Generate a bootstrap sample by sampling 55 times with
replacement from the original sample of z_c values. The bootstrap sample consists of the 55 values z*_i, i = 1, 2, . . . , 55.
(Note: the indices i of the bootstrap sample have no relation
to the catchment numbers c; each value z*_i can be equal to
any of the values z_c from the original sample.)
(2) Calculate the mean value of the bootstrap sample, \bar{z}^*, as

\bar{z}^* = \frac{1}{55} \sum_{i=1}^{55} z^*_i    (18)
(3) Repeat steps 1 and 2 n times, n being the number of repetitions. For each new bootstrap sample, a new mean value of the differences, \bar{z}^*_j, j = 1, 2, . . . , n, is calculated according to Eq. (18).
From the sample of bootstrap replicates of the mean values \bar{z}^*_j,
j = 1, 2, . . . , n, a confidence interval for the mean of the differences
can now be derived. The most straightforward method, simple
percentile mapping (Efron and Tibshirani, 1993), is used. That
means the 95% confidence interval is estimated through the 2.5th
and 97.5th percentile of the distribution of zj values as lower
and upper limit of the confidence interval, respectively.
If the value of zero lies outside the confidence interval, the difference between the two model versions is regarded as significantly different from zero. A number of n = 100,000 bootstrap replications was used for this study.
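The three steps above, together with the percentile mapping, can be sketched as follows. This is a vectorised illustration of the procedure; the function name and the fixed seed are our own choices.

```python
import numpy as np

def bootstrap_ci_mean(z, n_boot=100_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean difference.

    z holds the per-catchment differences z_c; each bootstrap sample draws
    len(z) values with replacement (step 1), its mean is one replicate
    z*_j (step 2), and this is repeated n_boot times (step 3).
    """
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(z), size=(n_boot, len(z)))
    boot_means = z[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# A difference between two model versions is regarded as significant
# if zero lies outside the interval (lo, hi).
```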
2.4.3. Decomposition of the DRPS
Analogously to the Brier score (Brier, 1950), the DRPS can be
decomposed into three components. The Brier score decomposition (Murphy, 1973) into reliability (REL), resolution (RES) and
uncertainty (UNC) is based on the calculation of mean observed frequencies conditioned/stratified on different forecast probabilities.
To avoid problems with sparseness and make the estimated conditional mean values less uncertain, it is advisable to divide
the interval of forecast probabilities [0, 1] into a finite set of nonoverlapping bins l = 1, . . . , L. When such stratification over bins of
probabilities is used, the decomposition of the Brier score yields
two extra components, the within-bin variance and within-bin
covariance (Stephenson et al., 2008). Including these extra components into the resolution term, which then is labelled as generalized
resolution (GRES), the decomposition of the Brier score can be written as (Stephenson et al., 2008)
BS_k = REL_k - GRES_k + UNC_k    (19)

with

REL_k = \frac{1}{N} \sum_{l=1}^{L} N_l (\bar{p}_l - \bar{o}_l)^2    (20)
Fig. 5. DRPS values of models with state independent parameters (SI) versus DRPS values of models with state dependent parameters (SD) for the independent validation
1984–2005 (Val.2). Each point shows the relation of the scores of two different models in one catchment. (a) DRPS(SI.Log.Norm) versus DRPS(SD.Log.Norm). (b)
DRPS(SI.Sqrt.Norm) versus DRPS(SD.Sqrt.Norm). (c) DRPS(SI.Log.Emp) versus DRPS(SD.Log.Emp). (d) DRPS(SI.Sqrt.Emp) versus DRPS(SD.Sqrt.Emp).
Fig. 6. DRPS values of models with square root transformation (Sqrt) versus DRPS values of models with log transformation (Log) for the independent validation 1984–2005
(Val.2). Each point shows the relation of the scores of two different models in one catchment. (a) DRPS(SD.Sqrt.Norm) versus DRPS(SD.Log.Norm). (b) DRPS(SI.Sqrt.Norm)
versus DRPS(SI.Log.Norm). (c) DRPS(SD.Sqrt.Emp) versus DRPS(SD.Log.Emp). (d) DRPS(SI.Sqrt.Emp) versus DRPS(SI.Log.Emp).
Fig. 7. DRPS values of models with normal distribution (Norm) versus DRPS values of models with empirical distribution (Emp) for the independent validation 1984–2005
(Val.2). Each point shows the relation of the scores of two different models in one catchment. (a) DRPS(SD.Log.Norm) versus DRPS(SD.Log.Emp). (b) DRPS(SI.Log.Norm) versus
DRPS(SI.Log.Emp). (c) DRPS(SD.Sqrt.Norm) versus DRPS(SD.Sqrt.Emp). (d) DRPS(SI.Sqrt.Norm) versus DRPS(SI.Sqrt.Emp).
GRES_k = \frac{1}{N} \sum_{l=1}^{L} N_l (\bar{o}_l - \bar{o})^2 - \frac{1}{N} \sum_{l=1}^{L} \sum_{j=1}^{N_l} (p_{lj} - \bar{p}_l)^2 + \frac{2}{N} \sum_{l=1}^{L} \sum_{j=1}^{N_l} (o_{lj} - \bar{o}_l)(p_{lj} - \bar{p}_l)    (21)

UNC_k = \bar{o}(1 - \bar{o})    (22)

BS_k is the Brier score for the forecast of the event A_k as defined in Section 2.4.1 for one catchment, averaged over a total number of N forecasts (days); N_l is the number of forecast probabilities that fall into the lth bin; p_{lj}, j = 1, . . . , N_l, are the forecast probabilities falling into the lth bin, and o_{lj} denote the binary observations (0 or 1) corresponding to the p_{lj}; the average of the forecast probabilities of the lth bin is calculated as \bar{p}_l = (1/N_l) \sum_{j=1}^{N_l} p_{lj}, and \bar{o}_l = (1/N_l) \sum_{j=1}^{N_l} o_{lj} is the corresponding average of the binary observations; \bar{o} = (1/N) \sum_{l=1}^{L} \sum_{j=1}^{N_l} o_{lj} is the overall mean of the binary observations, i.e. the climatological base rate.

The DRPS from Eq. (15) can be formulated as the mean of the Brier scores over all thresholds k = 1, . . . , K, i.e. DRPS = \frac{1}{K} \sum_{k=1}^{K} BS_k (Toth et al., 2003). Thus, a decomposition of the DRPS can be given as

DRPS = DREL - DGRES + DUNC    (23)

with DRPS reliability DREL = \frac{1}{K} \sum_{k=1}^{K} REL_k, DRPS generalized resolution DGRES = \frac{1}{K} \sum_{k=1}^{K} GRES_k, and DRPS uncertainty DUNC = \frac{1}{K} \sum_{k=1}^{K} UNC_k.
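For a single threshold, the decomposition of Eqs. (19)–(22) can be checked numerically. The sketch below is our own illustration (function name and binning details are ours): it stratifies the forecast probabilities into L equally spaced bins and reproduces the identity BS_k = REL_k − GRES_k + UNC_k exactly, since the within-bin variance and covariance terms are included in the generalized resolution.

```python
import numpy as np

def brier_decomposition(p, o, n_bins=10):
    """Decompose the Brier score for one threshold into reliability,
    generalized resolution and uncertainty (Eqs. (19)-(22)), stratifying
    the forecast probabilities into n_bins equally spaced bins on [0, 1]."""
    p = np.asarray(p, dtype=float)
    o = np.asarray(o, dtype=float)
    N = len(p)
    o_bar = o.mean()  # climatological base rate
    # bin index l for each forecast probability (p = 1.0 goes to the last bin)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    rel = 0.0
    gres = 0.0
    for l in range(n_bins):
        mask = bins == l
        n_l = int(mask.sum())
        if n_l == 0:
            continue
        p_l = p[mask].mean()
        o_l = o[mask].mean()
        rel += n_l * (p_l - o_l) ** 2
        gres += (n_l * (o_l - o_bar) ** 2                     # resolution
                 - np.sum((p[mask] - p_l) ** 2)               # minus within-bin variance
                 + 2.0 * np.sum((o[mask] - o_l) * (p[mask] - p_l)))  # plus covariance
    rel /= N
    gres /= N
    unc = o_bar * (1.0 - o_bar)
    return rel, gres, unc  # BS_k = rel - gres + unc holds exactly
```

Averaging the three components over all K thresholds then gives DREL, DGRES and DUNC of Eq. (23).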
To calculate the DRPS decomposition components in this study,
L = 10 equally spaced bins were chosen. Based on the selection of
the thresholds xk as quantiles of the flow distribution, the uncertainty component DUNC assumes the same value in all catchments.
With x_k as the 1st, . . . , 99th percentile, the theoretical value of the uncertainty component in this study is given as DUNC = \frac{1}{99} \sum_{\bar{o} \in \{0.01, \ldots, 0.99\}} \bar{o}(1 - \bar{o}) = 16.665/99 \approx 0.168.
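This theoretical value can be verified in a few lines, using the fact that the base rate for the kth percentile threshold is ō = k/100:

```python
# Uncertainty component when the thresholds are the 1st, ..., 99th
# percentiles: DUNC = (1/99) * sum of o_bar * (1 - o_bar) over
# o_bar in {0.01, 0.02, ..., 0.99}.
dunc = sum(k / 100 * (1 - k / 100) for k in range(1, 100)) / 99
print(round(dunc, 3))  # 0.168
```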
2.5. Catchments and data
Fifty-five catchments distributed over the whole of Norway
were selected (Fig. 4). The selection was based on a common period of data from 1.9.1961 to 31.12.2005. The catchments are spread
Table 5
Overview of the five validations carried out for each model version in each catchment.

Label     Validation type   Period for validation   Period for parameter estimation
Cal.1+2   Dependent         1+2 (1962–2005)         1+2 (1962–2005)
Cal.1     Dependent         1 (1962–1983)           1 (1962–1983)
Cal.2     Dependent         2 (1984–2005)           2 (1984–2005)
Val.1     Independent       1 (1962–1983)           2 (1984–2005)
Val.2     Independent       2 (1984–2005)           1 (1962–1983)
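The pairing of validation and estimation periods in Table 5 can be encoded compactly; the dictionary layout below is our own illustration and makes the dependent/independent distinction explicit:

```python
# Table 5 as a configuration: each validation names the period used for
# evaluation and the period used for parameter estimation.
PERIODS = {"1": "1962-1983", "2": "1984-2005", "1+2": "1962-2005"}

VALIDATIONS = {
    "Cal.1+2": {"type": "dependent",   "validate": "1+2", "estimate": "1+2"},
    "Cal.1":   {"type": "dependent",   "validate": "1",   "estimate": "1"},
    "Cal.2":   {"type": "dependent",   "validate": "2",   "estimate": "2"},
    "Val.1":   {"type": "independent", "validate": "1",   "estimate": "2"},
    "Val.2":   {"type": "independent", "validate": "2",   "estimate": "1"},
}

# Dependent validations reuse the estimation data; independent ones
# swap the two sub-periods.
for v in VALIDATIONS.values():
    assert (v["validate"] == v["estimate"]) == (v["type"] == "dependent")
```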
relatively evenly over Norway with some clustering in the southeast. Catchment sizes vary from 6 to 15,450 km². The majority of the catchments (45) are smaller than 1000 km² and of these 21 are between 100 and 300 km². The mean catchment elevations range from 181 to 1460 m above mean sea level. They are relatively evenly distributed over the range of elevations with some predominance in the interval 400–900 m. The mean annual runoff ranges from 370 to 3236 mm/year. Also the runoff values are relatively evenly distributed over the range of runoff values.

The data used in the study are daily data of mean air temperature, accumulated precipitation and mean streamflow for the 55 catchments. Station measurements of mean daily temperature and accumulated daily precipitation have been interpolated to a 1 × 1 km grid by the Norwegian Meteorological Institute (Mohr and Tveito, 2008). Catchment values of mean daily temperature (Tt) and accumulated daily precipitation (Pt) are then extracted from the grid as mean values of the grid cells that lie within the boundaries of the respective catchments. The streamflow data are series of mean daily streamflow (Qobs(t)) from the Norwegian Water Resources and Energy Directorate.

2.6. Error model calibration and validation procedure

The HBV model was run for the complete period in the 55 catchments, generating time series of simulated streamflow Qsim(t) and simulated snow water equivalent SWEt. The first four months of the model run were discarded as spin-up period and the remaining period (1.1.1962–31.12.2005) was kept to investigate the eight versions of the error model.

For each version of the error model, three different parameter sets were estimated from three different periods in each of the 55 catchments:

Period 1:     1.1.1962–31.12.1983.
Period 2:     1.1.1984–31.12.2005.
Period 1 + 2: 1.1.1962–31.12.2005.

The eight versions were then evaluated in three dependent and two independent validations in each catchment (Table 5). In the dependent validations, the error model was applied to the same data that was used for estimation of the parameters of the error model. In the independent validations, the error model was applied to independent data that had not been used in the estimation of the parameters of the error model.

For each day of a validation period, a probabilistic forecast was generated using Eq. (6). Based on the forecast distributions u(o_t | s_t, o_{t-1}, s_{t-1}) and the actual observations o_t, the discrete ranked probability score DRPS was calculated according to Eq. (15). This was done for all 55 catchments, for all 8 model versions (Table 3) and for all 5 validations (Table 5), i.e. altogether 2200 DRPS values were calculated.

For this study, a time step of one day was used as interval between t − 1 and t, corresponding to the basic time step of the precipitation–runoff model. In general, the error models may also be applied as post-processors for longer lead times. This can then be done in two ways. Either the error model is applied recursively with one parameter set estimated for the basic time step; or the error model is applied with different parameter sets that are estimated for different intervals between t − 1 and t.

3. Results

3.1. Model performances in the individual catchments
Figs. 5–7 show plots of the DRPS for the 55 catchments for
one model version versus the DRPS for another model version
for selected model combinations. All plots are for the independent validation for the period 1984–2005 (Val.2). The plots for
the other four validations in Table 5 are similar to the corresponding plots of the independent validation 1984–2005 and
are therefore not shown. Fig. 5 shows the four plots of state
independent (SI) models versus the corresponding state dependent (SD) models, Fig. 6 shows the four plots of square root
transformed (Sqrt) models versus the corresponding log transformed (Log) models, and Fig. 7 shows the four plots of models
with normal distribution (Norm) versus the corresponding models with empirical distribution (Emp). The following characteristics can be seen:
(a) A strong correlation between DRPS values of different model
versions is visible in all plots. The correlation varies to some
degree from plot to plot, but overall all plots show a distinct
correlation.
(b) The differences between DRPS values of different model versions in the same catchment (vertical deviations of the
points from the 1:1 line) are in all plots considerably smaller
than the range of DRPS values for one model version over the
55 catchments (maximum extent of the points in x or y
direction).
These two characteristics (a) and (b) show that a main influence on the performance of a probabilistic forecast that is based
on the post-processing of a deterministic forecast with some kind
of autoregressive error model lies in the performance of the deterministic forecast model in combination with the strength of the
autoregressive behaviour of the errors of the deterministic forecast
model. In a catchment where the deterministic forecast model
performs well or the autocorrelation of the errors of the deterministic forecast is high, the performance of the probabilistic forecast
in terms of the DRPS will in general be good, and the influence of
the specific implementation of the post-processor is of minor
importance if one compares the results with a catchment where
the deterministic forecast model has a weak performance and
the autocorrelation of the model errors is weak. This illustrates
that the error model cannot ‘cure’ a poor deterministic forecast
model. Though it may produce a probabilistic forecast that is well
calibrated, the resolution of the forecast, which has a main influence on the values of the DRPS (see Section 3.4.1), will always
be poor.
(c) As a third characteristic, the point clouds show a systematic
shift from the 1:1 line in many of the plots. The shift is very
distinct in some of the plots, for example Fig. 5b, while in
other plots, for example Fig. 7b, the shift is not that obvious.
Still, on closer inspection, most of the plots seem to exhibit a
systematic shift. Such a systematic shift reflects an on average better performance of one version of the error model
over another version with respect to the evaluation measure
DRPS.
67
M. Morawietz et al. / Journal of Hydrology 407 (2011) 58–72
3.2. Model performances averaged over all catchments

To further investigate and compare the average performance of the eight error models, the average discrete ranked probability score over all catchments, DRPS, was calculated according to Eq. (16). Fig. 8 shows the values of DRPS for all five validations.

For the three dependent validations (white backgrounds), it is apparent that the scores in period 2, Cal.2, are consistently better compared with corresponding models in period 1, Cal.1. The scores for the dependent validation for the complete period, Cal.1 + 2, lie between the scores of periods 1 and 2. Thus, the period of data has a clear impact on the performance of the error models in terms of the absolute values of DRPS. The differences between the DRPS values of period 1 and period 2 for the same model version can be much larger than differences between different model versions in the same period.

The scores for the two independent validations (grey backgrounds) show a similar behaviour as the dependent validations of the corresponding periods in that the scores of period 2 are better for all model versions compared with period 1 for the corresponding error models. However, the differences are less pronounced than for the dependent validation.

The better performance of period 2 over period 1 found for both the dependent and independent validation may be explained through the correlation found between the performance of the post-processors in terms of DRPS and the Nash–Sutcliffe efficiency coefficients of the underlying HBV models (see Section 3.4.2). For most of the catchments (45 out of 55), the HBV model performance is better in period 2 than in period 1, and this is correlated with better performance of the post-processors in terms of DRPS, which is then reflected in the average DRPS as well.

When comparing scores between independent and dependent validations for the same period, there is no consistent change of the scores from the dependent to the independent validation. Though for period 2, the scores of the independent validation, Val.2, are a bit worse than the corresponding scores of the dependent validation, Cal.2, the performance of the independent validation for period 1, Val.1, is more or less the same as for the dependent validation, Cal.1. This indicates that none of the eight versions of the error model has a clear over-parameterization in the sense that the model performance would strongly deteriorate in periods with independent data.

Fig. 8. Values of DRPS for the eight versions of the error model for the different periods with dependent (white background) and independent (grey background) validation.

3.3. Significance of differences in model performances

We will now look more specifically at the differences in model performance between the different model versions, namely the differences between (1) models with state dependent parameters (SD) and corresponding models with state independent parameters (SI), (2) models with square root transformation (Sqrt) and corresponding models with log transformation (Log) and (3) models with normal distribution (Norm) and corresponding models with empirical distribution (Emp). Figs. 9–11 show the differences between the average discrete ranked probability scores DRPS (Eq. (16)) of corresponding model versions according to the aspects 3, 1 and 2, respectively. Error bars indicate the 95% bootstrap confidence intervals.

The results for the differences between models with normal distribution (Norm) and models with empirical distribution (Emp) (Fig. 9) are very clear. In all five validations, the differences between model versions with normal distribution (Norm) and the corresponding model versions with empirical distribution (Emp) are significantly different from zero, and all are above zero. This means that on average models that use an empirical distribution perform significantly better than models that use the normal distribution. The degree of improvement in absolute values is largest for the SI.Sqrt.Norm model (magenta/grey¹ diamonds in Fig. 9), which shows the worst model performance with respect to DRPS in all five validations compared with the other model versions (magenta/grey diamonds in Fig. 8). The second largest improvement is given for the SI.Log.Norm model (blue/black diamonds in Fig. 9), which scores second worst of all models with normal distribution (red/grey circles in Fig. 8).

The results for the comparison of state dependent (SD) versus state independent (SI) models are also clear (Fig. 10). All differences are larger than zero and all differences except one (Log.Emp models in Val.2) are significantly different from zero. That means that model versions with state dependent parameters (SD) perform on average significantly better than the corresponding model versions with state independent parameters (SI). The largest improvement of DRPS (blue/black diamonds in Fig. 10) is again given for the version with the worst model performance in terms of DRPS, SI.Sqrt.Norm (magenta/grey diamonds in Fig. 8). The second largest improvement is given for SI.Sqrt.Emp (magenta/grey diamonds in Fig. 10), which has the second poorest model performance in the class of SI models (magenta/grey pluses in Fig. 8).

The results for models with log transformation (Log) versus models with square root transformation (Sqrt) are more complex. For the SI models (diamonds in Fig. 11) all differences are positive and all except one (SI.Emp in Val.1) are significantly different from zero. That means that for state independent models the model versions that use log transformation perform on average better than models with square root transformation. Again, the improvement is largest for the SI.Sqrt.Norm model (blue/black diamonds in Fig. 11), the model with the worst model performance in terms of DRPS (magenta/grey diamonds in Fig. 8). For the SD models however (circles in Fig. 11) none of the differences is significantly different from zero. That means that no significant difference in model performance can be detected for log versus square root transformation in the case of state dependent models.

¹ Colour for web version / black-and-white for print version.
Fig. 9. Differences of values of DRPS between models with normal distribution (Norm) and models with empirical distribution (Emp) with 95% bootstrap confidence intervals; the x-axis distinguishes the different periods with dependent (white background) and independent (grey background) validation.

Fig. 10. Differences of values of DRPS between models with state independent parameters (SI) and models with state dependent parameters (SD) with 95% bootstrap confidence intervals; the x-axis distinguishes the different periods with dependent (white background) and independent (grey background) validation.

Fig. 11. Differences of values of DRPS between models with square root transformation (Sqrt) and models with log transformation (Log) with 95% bootstrap confidence intervals; the x-axis distinguishes the different periods with dependent (white background) and independent (grey background) validation.

3.4. Decomposition of the DRPS and correlations with catchment parameters

3.4.1. Decomposition of the DRPS
The decomposition of the DRPS in the individual catchments according to Eq. (23) is shown in Fig. 12 for the SD.Log.Norm model for the independent validation in period 1 (Val.1). The features highlighted below are representative for the behaviour of equivalent plots for the other model versions and other validations.

The theoretical value of the uncertainty DUNC is indicated as a horizontal line. The uncertainty values calculated for the individual catchments show some deviation from the theoretical values. This is an artefact of the sampling uncertainty for the flow distributions in the different periods. To have a consistent basis for all DRPS calculations, the same thresholds x_k were used in all DRPS calculations, based on the percentiles of the flow distribution of the complete period of data (period 1 + 2). As the flow distributions in the other periods are slightly different, uncertainty values for validations in period 1 and period 2 show slight deviations from the theoretical uncertainty value.

When comparing the influence of the reliability DREL and generalized resolution DGRES on the final DRPS values, it is apparent that the main influence is given by the resolution component, while the reliability values DREL are all close to zero.

3.4.2. Correlation between DRPS and catchment characteristics
Correlations of the DRPS values in the individual catchments with the catchment characteristics area and runoff coefficient, as well as with the Nash–Sutcliffe efficiency coefficients of the HBV models, were investigated. Correlations with the runoff coefficients were found to be very weak; the Pearson product-moment correlation coefficients for the different model versions and validations lie between 0.07 and 0.26. Moderate negative correlations were found for the correlation of DRPS with the logarithm of the catchment area; the correlation coefficients lie between −0.41 and −0.51. The strongest (negative) correlations were found for DRPS with the Nash–Sutcliffe efficiency coefficients; the correlation coefficients lie between −0.64 and −0.80. This reflects the generally increasing performance of the probabilistic forecasts in the individual catchments with increasing performance of the underlying deterministic precipitation–runoff model.
4. Discussion
4.1. Normal distribution versus empirical distribution of the
standardized residuals
The models using an empirical distribution function to describe
the standardized residuals were found to perform significantly
better than corresponding models that use a standard normal distribution. This finding is not too surprising when looking at quantile–quantile plots of the standardized empirical residuals (Fig. 13). Those plots show a clear deviation of the distribution of the standardized empirical residuals from the standard normal distribution. Thus, when using a standard normal distribution one cannot expect to receive optimal results, and an empirical distribution function proved to be superior.

Still, some care has to be taken when assessing the approach of using an empirical distribution function as carried out in this study. When applying the same empirical distribution on each day, one assumes implicitly that the distribution of the standardized residuals is the same for each day. However, there is no theoretical basis that would warrant this assumption and it seems plausible that, in the same way that the regression parameters of the autoregressive model may be different for different conditions, the standardized residuals might be described through different distributions for different conditions as well. In this case, the use of the same empirical distribution for each day, though superior to the use of the standard normal distribution, is still sub-optimal. Identifying different distributions for different conditions might be more difficult, as larger samples are necessary to clearly identify a distribution than to estimate regression parameters, and a further investigation of this aspect was beyond the scope of this study. Given the importance of using a correct distribution for generating a valuable (calibrated) probabilistic forecast, it is an aspect where further improvements of autoregressive hydrologic uncertainty processors could be made if different distributions were present and could be identified.

Fig. 12. Decomposition of the DRPS values for the 55 catchments for the SD.Log.Norm model in the independent validation for period 1 (Val.1).

4.2. State dependent parameters versus state independent parameters

Considering the simplicity of the autoregressive model of Eq. (2), it seems reasonable to assume that the parameters of the autoregressive model might not be constant but vary depending on the states of the model or the environment. Indeed, the results showed clearly that the simple autoregressive models with state independent parameters have a significantly poorer performance than the corresponding models with state dependent parameters. The state dependent parameterization chosen for this study is relatively detailed. It is possible that a simpler classification scheme might lead to an equivalent performance, or that alternative types of classifications may lead to similar or even better performances. Further investigation of these aspects was beyond the scope of this study. However, it was clearly shown that for an autoregressive error model used as hydrologic uncertainty processor a significant improvement of the performance is achieved through a state dependent parameterization compared to a simple model with state independent parameters.
4.3. Log transformation versus square root transformation
For state independent models it was found that the models with
logarithmic transformation perform significantly better than models with square root transformation. An explanation for this behaviour can be found when looking at plots of the standardized
residuals ε_t versus simulated streamflow Qsim(t). For most catchments, plots of models with logarithmic transformation show a
fairly homoscedastic behaviour, while plots of models with square
root transformation reveal a systematic increase of the variance of
the residuals with increasing streamflow values. Fig. 14 gives an
example of the plots for the catchment Fustvatn. The x-axis in
these plots is scaled according to the rank of the simulated streamflow. This is done to assure a constant density of the points in the
x-dimension.

Fig. 13. Quantile–quantile plots for the empirical standardized residuals ε̂_t assuming a standard normal distribution as theoretical distribution for the catchment Bulken (period 1.1.1962–31.12.2005 with parameters estimated from the same period). (a) Complete data. (b) The same plot as (a), but only displaying the central 95% of the data, leaving out the 2.5% smallest and 2.5% highest values of ε̂_t.

Fig. 14. Plots of the standardized residuals ε_t versus the rank of the simulated streamflow Qsim(t) for models with state independent parameters for the catchment Fustvatn. (a) Model with logarithmic transformation of the original streamflow values. (b) Model with square root transformation of the original streamflow values.

Otherwise, if the density in the x-direction is very inhomogeneous, as is usually the case for streamflow values on a linear or logarithmic scale, the visual impression might be distorted
and a set of homoscedastic values may appear as strongly nonhomoscedastic. In Fig. 14a the standardized residuals for the state
independent model with logarithmic transformation show a fairly
homoscedastic behaviour, while in Fig. 14b the residuals for the
state independent model with square root transformation show
an increasing variance with increasing streamflow values. Thus,
the assumption of a state independent variance r is less justified
for the SI models with square root transformation, and this is reflected in their inferior DRPS compared to the SI models with logarithmic transformation.
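The rank scaling of the x-axis and the visual variance check described above can be reproduced in a few lines. The sketch below uses synthetic data and hypothetical variable names (it is not the study's code): it ranks the simulated streamflow so that points are equally dense along the x-axis, then compares the residual spread across rank bins as a simple numerical counterpart of the visual inspection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: residuals whose spread grows with simulated streamflow,
# as observed for the square root transformation in Fig. 14b.
q_sim = rng.lognormal(mean=2.0, sigma=1.0, size=2000)   # simulated streamflow
resid = rng.normal(0.0, 0.5 + 0.02 * q_sim)             # heteroscedastic residuals

# Rank transform: plotting resid against rank gives a constant point
# density in the x-dimension, regardless of the streamflow distribution.
rank = np.argsort(np.argsort(q_sim))

# Split the sorted order into bins and compare residual standard deviations;
# a clear increase across bins indicates heteroscedasticity.
order = np.argsort(q_sim)
bins = np.array_split(order, 4)
stds = [resid[idx].std() for idx in bins]
print(stds)  # increasing values: variance grows with streamflow
```

On a log-transformed model the binned standard deviations would stay roughly flat, which is the homoscedastic pattern seen in Fig. 14a.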
However, for models with state dependent parameters, there is no significant difference in performance between models with logarithmic transformation and models with square root transformation. The formulation of σ_t as dependent on the simulated streamflow (Eq. (10)), together with the other flexibilities introduced by the state dependent formulation, can account for the more non-homoscedastic behaviour and other deficiencies that the SI models with square root transformation may have compared to the models with logarithmic transformation. The similar performance of the Log and Sqrt models shows that the choice of transformation loses its importance for models with state dependent parameters. It is likely that for a range of transformations of the Box–Cox type (Box and Cox, 1964) that lie between the special cases of the logarithmic and square root transformations, the differences introduced by different transformations will be levelled out through the flexibility introduced with the state dependent formulation.
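The Box–Cox family referred to above contains both transformations as special cases. A minimal sketch (illustrative, not taken from the paper) showing the logarithm as the λ → 0 limit and an affine version of the square root transformation at λ = 0.5:

```python
import numpy as np

def box_cox(q, lam):
    # Box-Cox transformation (Box and Cox, 1964):
    # (q**lam - 1) / lam for lam != 0, and ln(q) in the limit lam -> 0.
    q = np.asarray(q, dtype=float)
    if lam == 0.0:
        return np.log(q)
    return (q**lam - 1.0) / lam

q = np.array([0.5, 1.0, 5.0, 50.0])
print(box_cox(q, 0.0))   # logarithmic transformation
print(box_cox(q, 0.5))   # equals 2*(sqrt(q) - 1), a shifted/scaled square root
```

Because an affine rescaling of the transformed values does not change the structure of the residuals, λ = 0.5 behaves like the plain square root transformation, and intermediate λ values interpolate between the two cases discussed in the text.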
5. Summary and conclusions
Eight different versions of autoregressive error models were
investigated as hydrologic uncertainty processors for probabilistic
streamflow forecasting. Evaluation with the discrete ranked probability score as forecast evaluation measure gave the following
main findings.
(1) The variation of DRPS values for the same model version across different catchments is larger than the differences between different model versions within the same catchment. This reflects the strong dependence of the quality of the probabilistic forecast on the quality of the underlying (updated) deterministic forecast.
(2) Given a certain catchment with its deterministic precipitation–runoff model, significant differences in model
performance between different versions of the autoregressive hydrologic uncertainty processors could be detected:
(a) Models with state dependent parameters perform significantly better than corresponding models with state
independent parameters.
(b) Models using an empirical distribution function to
describe the standardized residuals perform significantly
better than corresponding models using a standard normal distribution.
(c) For models with state independent parameters, those with a logarithmic transformation of the original streamflow values perform significantly better than those with a square root transformation. However, for models with state dependent parameters, the significance disappears and there is no difference in performance between the logarithmic and the square root transformation. The explanation lies in the flexibility introduced with the state dependent formulation, which can account for and alleviate the more non-homoscedastic behaviour found for the square root transformation.
The results give guidance for using an autoregressive error model as hydrologic uncertainty processor. If a simple model with constant parameters and a standard normal distribution for the standardized residuals is chosen, the choice of transformation used to attain homoscedastic residuals is important; in this study, the logarithmic transformation was clearly superior to the square root transformation. If a more complex model is chosen, both the use of an empirical distribution function and a formulation with state dependent parameters lead to improved performance of the uncertainty processor. The best performance is attained by models that use both an empirical distribution and state dependent parameters. For this type of model, with the formulation of state dependence used in this study, the transformation type no longer has a significant influence on model performance.
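All of the comparisons above were measured with the discrete ranked probability score. As a generic illustration of how such a score behaves (the category boundaries and discretization of the study are not reproduced here), a minimal sketch:

```python
import numpy as np

def drps(forecast_probs, obs_category):
    # Discrete ranked probability score: sum of squared differences between
    # the cumulative forecast distribution and the cumulative (step)
    # distribution of the observation over the ordered categories.
    cdf_f = np.cumsum(forecast_probs)
    cdf_o = np.zeros_like(cdf_f)
    cdf_o[obs_category:] = 1.0
    return float(np.sum((cdf_f - cdf_o) ** 2))

sharp = [0.0, 1.0, 0.0, 0.0]   # all mass on the observed category
near  = [0.0, 0.0, 1.0, 0.0]   # all mass one category off
far   = [0.0, 0.0, 0.0, 1.0]   # all mass two categories off
print(drps(sharp, 1), drps(near, 1), drps(far, 1))  # 0.0 1.0 2.0
```

Because the score is built on cumulative distributions, it penalizes forecasts more the further their probability mass sits from the observed category, which is what makes it sensitive over the whole ordered range of streamflow classes.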
Aspects that might lead to further improvements, not investigated in this study, are:

(1) The use of state dependent empirical distributions for the standardized residuals, analogous to the use of state dependent parameters.

(2) The investigation of alternative state dependent parameterization schemes. The scheme presented here is relatively detailed. There might be less complex schemes with performance equivalent to the one used in this study, or alternative schemes with better performance.

In addition to these findings, the study discussed the use of the discrete ranked probability score as an evaluation measure that allows evaluation over the whole range of streamflow values (apart from a certain discretization) and a direct averaging of the scores over several catchments for an overall evaluation and comparison of different methods.

Acknowledgements

We thank the Norwegian Water Resources and Energy Directorate (NVE) for the provision of the data, the ‘‘Nordic’’ HBV model program and computing facilities. We further thank Thomas Skaugen and Elin Langsholt for their information on the ‘‘Nordic’’ HBV model code and the uncertainty procedures operational at NVE. We are also very grateful to Lukas Gudmundsson and Klaus Vormoor for their discussions and comments, and we want to thank two anonymous reviewers for their valuable comments, which helped to improve the paper.

Appendix A. Parameter estimation for models with state dependent parameters

The iterative two-step procedure for parameter estimation for model versions with state dependent parameters works as follows. Input data to the calibration procedure are time series of the variables d_t, d_{t-1}, s_t, T_t, P_t, SWE_t and Q_sim(t), as defined in Section 2.2. In the beginning, the data set is divided into two subsets 1 and 2 for cases with Q_sim(t) ≥ q_thresh and Q_sim(t) < q_thresh, respectively. The parameters a_j, b, A_j, B are derived from subset 1, and the corresponding second set of parameters is derived from subset 2. In the following paragraphs, the derivation of the parameters for subset 1 is described; the derivation for subset 2 is analogous.

Step 1: The deterministic part of the error model is a linear model with response variable d_t and two predictor variables d_{t-1} and z_t = s_t d_{t-1}:

d_t = a_i(t) d_{t-1} + b z_t    (A1)

The parameters are estimated with ordinary least squares, which in the case of normally distributed residuals is equivalent to maximum likelihood estimation.

Step 2: With the parameters estimated in the first step, the residuals of the deterministic component are calculated as

e_t = d_t − a_i(t) d_{t-1} − b z_t    (A2)

Under the assumption of normally distributed residuals e_t, the squared residuals e_t² follow a chi-square distribution with one degree of freedom, which is a special case of the Gamma distribution. Also, the expected value of the squared residuals is equal to the variance:

E(e_t²) = σ_t²    (A3)

With the relation for the standard deviation given in Eq. (10) it follows that

ln σ_t² = 2 ln σ_t = 2 A_i(t) + 2 B s_t    (A4)

Combining Eqs. (A3) and (A4) leads to

ln(E(e_t²)) = 2 A_i(t) + 2 B s_t    (A5)

If the log-transformed expected values of the squared residuals e_t² are described through a linear predictor and the squared residuals themselves follow a Gamma distribution, then Eq. (A5) constitutes a generalized linear model (Faraway, 2006) with response variable e_t², predictor variable s_t, the natural logarithm as link function, and a Gamma distribution of the response variable. The parameters of the generalized linear model are estimated using iteratively reweighted least squares (IRWLS), which is equivalent to maximum likelihood estimation (Faraway, 2006; McCullagh and Nelder, 1989).

Iteration: Once the parameters A_j and B are estimated, the variance σ_t² is calculated for each time step from Eq. (10). Then steps one and two are iterated, with the only modification that the parameters in step one are now estimated with a weighted linear regression using the reciprocal variances 1/σ_t² as weights. The rationale is that cases with a larger variance in the linear regression of step one should receive less weight than cases with a smaller variance. The iterations are stopped once convergence of the parameters is obtained. In practice, the algorithm converged very quickly; ten iterations were enough to assure convergence of the parameters in this study.

For the sake of clarity, regression Eqs. (A1) and (A5) have been written without individual appearance of the parameters a_1, ..., a_5 and A_1, ..., A_5. However, the individual appearance of the parameters, as is necessary for a formal derivation of the regression parameters, can be achieved by introducing five indicator variables

I_j,t = 1 for i(t) = j, and I_j,t = 0 for i(t) ≠ j,    j = 1, ..., 5    (A6)

where i(t) is the number of the meteorological and snow state as defined in Table 1. Defining further the five variables d_j,t−1 = I_j,t d_{t−1}, j = 1, ..., 5, regression Eqs. (A1) and (A5) can be rewritten as

d_t = a_1 d_1,t−1 + a_2 d_2,t−1 + a_3 d_3,t−1 + a_4 d_4,t−1 + a_5 d_5,t−1 + b z_t    (A7)

and

ln(E(e_t²)) = 2 A_1 I_1,t + 2 A_2 I_2,t + 2 A_3 I_3,t + 2 A_4 I_4,t + 2 A_5 I_5,t + 2 B s_t    (A8)

respectively.

Under the assumption that the standardized residuals e_t in Eq. (13) are standard normally distributed, the ordinary least squares estimates of step one and the iteratively reweighted least squares estimates of step two are equivalent to maximum likelihood estimates. If the distributional assumptions are violated, the estimates are no longer strict maximum likelihood estimates. However, it is assumed that moderate violations of the distributional assumptions do not have a strong negative effect on the parameter estimates.
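The iterative two-step procedure above can be sketched compactly. The sketch below is a simplified illustration, not the study's code: it uses synthetic data with a single meteorological state (so one pair a, b and one pair A, B rather than five of each), and implements the Gamma GLM of Eq. (A5) with a hand-rolled IRWLS, exploiting the fact that for the log link and Gamma variance the working weights are constant.

```python
import numpy as np

def gamma_glm_irwls(y, X, n_iter=30):
    # IRWLS for a Gamma GLM with log link, E[y] = exp(X @ beta).
    # With log link and Gamma variance V(mu) = mu**2 the working weights
    # are constant, so each step is a plain least-squares fit of the
    # working response z on X.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # crude but sufficient start
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        beta = np.linalg.lstsq(X, z, rcond=None)[0]
    return beta

rng = np.random.default_rng(1)

# Synthetic data: one meteorological state, deterministic part
# d_t = a d_{t-1} + b z_t + e_t with z_t = s_t d_{t-1}, and
# ln(sigma_t) = A + B s_t as in Eq. (10).
n = 5000
s = rng.uniform(0.0, 1.0, n)
a_true, b_true = 0.6, -0.2
A_true, B_true = -1.0, 0.5

d = np.zeros(n)
for t in range(1, n):
    sigma_t = np.exp(A_true + B_true * s[t])
    d[t] = (a_true + b_true * s[t]) * d[t - 1] + rng.normal(0.0, sigma_t)

dt, dprev, st = d[1:], d[:-1], s[1:]
X1 = np.column_stack([dprev, st * dprev])    # predictors of Eq. (A1)
X2 = np.column_stack([np.ones(n - 1), st])   # predictors of Eq. (A5)

w = np.ones(n - 1)                           # first pass: ordinary least squares
for _ in range(10):                          # ten iterations sufficed in the study
    # Step 1: (weighted) linear regression for the deterministic part
    sw = np.sqrt(w)
    ab = np.linalg.lstsq(X1 * sw[:, None], dt * sw, rcond=None)[0]
    # Step 2: Gamma GLM for ln(E(e_t^2)) = 2A + 2B s_t  (Eq. (A5))
    e = dt - X1 @ ab
    A_hat, B_hat = gamma_glm_irwls(e**2, X2) / 2.0
    # Weights for the next pass: reciprocal variances 1 / sigma_t^2
    w = np.exp(-2.0 * (A_hat + B_hat * st))

print(ab, A_hat, B_hat)   # estimates near (0.6, -0.2), -1.0, 0.5
```

The state dependent case of the paper extends this sketch by replacing the single intercepts with the five indicator-variable columns of Eqs. (A7) and (A8) and by fitting the two subsets split at q_thresh separately.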
References
Bergström, S., 1976. Development and Application of a Conceptual Runoff Model for
Scandinavian Catchments. SMHI RHO 7. Swedish Meteorological and
Hydrological Institute, Norrköping, Sweden.
Bergström, S., 1992. The HBV Model – its Structure and Applications. SMHI Reports
Hydrology No. 4, Swedish Meteorological and Hydrological Institute,
Norrköping, Sweden.
Bougeault, P., 2003. The WGNE Survey of Verification Methods for Numerical Prediction of Weather Elements and Severe Weather Events. Météo-France, Toulouse, France. <http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/Bougeault/Bougeault_Verification-methods.htm> (accessed 26.11.10).
Box, G.E.P., Cox, D.R., 1964. An analysis of transformations. J. Roy. Stat. Soc. Ser. B 26
(2), 211–252.
Brier, G.W., 1950. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78 (1), 1–3. doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Cloke, H.L., Pappenberger, F., 2009. Ensemble flood forecasting: a review. J. Hydrol.
375 (3–4), 613–626. doi:10.1016/j.jhydrol.2009.06.005.
Doherty, J., 2004. PEST: Model Independent Parameter Estimation. User Manual,
fifth ed. Watermark Numerical Computing, Brisbane, Australia.
Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York, NY, US.
Engeland, K., Gottschalk, L., 2002. Bayesian estimation of parameters in a regional hydrological model. Hydrol. Earth Syst. Sci. 6 (5), 883–898. doi:10.5194/hess-6-883-2002.
Epstein, E.S., 1969. A scoring system for probability forecasts of ranked categories. J. Appl. Meteorol. 8 (6), 985–987. doi:10.1175/1520-0450(1969)008<0985:ASSFPF>2.0.CO;2.
Faraway, J.J., 2006. Extending the Linear Model with R: Generalized Linear, Mixed
Effects and Nonparametric Regression Models. Chapman & Hall/CRC, Boca
Raton, FL, US.
Jaun, S., Ahrens, B., 2009. Evaluation of a probabilistic hydrometeorological forecast system. Hydrol. Earth Syst. Sci. 13 (7), 1031–1043. doi:10.5194/hess-13-1031-2009.
Kalman, R.E., 1960. A new approach to linear filtering and prediction problems. J.
Basic Eng. 82 (1), 35–64.
Krzysztofowicz, R., 1999. Bayesian theory of probabilistic forecasting via
deterministic hydrological model. Water Resour. Res. 35 (9), 2739–2750.
doi:10.1029/1999WR900099.
Krzysztofowicz, R., Kelly, K.S., 2000. Hydrologic uncertainty processor for
probabilistic river stage forecasting. Water Resour. Res. 36 (11), 3265–3277.
doi:10.1029/2000WR900108.
Kuczera, G., 1983. Improved parameter inference in catchment models: 1.
Evaluating parameter uncertainty. Water Resour. Res. 19 (5), 1151–1162.
doi:10.1029/WR019i005p01151.
Langsrud, Ø., Frigessi, A., Høst, G., 1998. Pure Model Error of the HBV Model. HYDRA
Note No. 4, Norwegian Water Resources and Energy Directorate, Oslo, Norway.
Lawrence, D., Haddeland, I., Langsholt, E., 2009. Calibration of HBV Hydrological
Models Using PEST Parameter Estimation. Report No. 1 – 2009, Norwegian
Water Resources and Energy Directorate, Oslo, Norway.
Levenberg, K., 1944. A method for the solution of certain problems in least squares.
Quart. Appl. Math. 2, 164–168.
Lundberg, A., 1982. Combination of a conceptual model and an autoregressive
model for improving short time forecasting. Nord. Hydrol. 13 (4), 233–246.
Marquardt, D., 1963. An algorithm for least-squares estimation of non-linear
parameters. J. Soc. Ind. Appl. Math. 11 (2), 431–441. doi:10.1137/0111030.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, second ed. Chapman &
Hall, London, UK.
Mohr, M., Tveito, O.E., 2008. Daily temperature and precipitation maps with 1 km
resolution derived from Norwegian weather observations. In: Extended
Abstract from 17th Conference on Applied Climatology, American
Meteorological Society, 11–14 August, Whistler, BC, Canada. <http://ams.confex.com/ams/pdfpapers/141069.pdf> (accessed 26.11.10).
Molteni, F., Buizza, R., Palmer, T.N., Petroliagis, T., 1996. The ECMWF ensemble prediction system: methodology and validation. Quart. J. Roy. Meteorol. Soc. 122 (529), 73–119. doi:10.1002/qj.49712252905.
Murphy, A.H., 1971. A note on the ranked probability score. J. Appl. Meteorol. 10 (1),
155–156. doi:10.1175/1520-0450(1971)010<0155:ANOTRP>2.0.CO;2.
Murphy, A.H., 1973. A new vector partition of the probability score. J. Appl.
Meteorol. 12 (4), 595–600.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models: Part I – A discussion of principles. J. Hydrol. 10 (3), 282–290. doi:10.1016/0022-1694(70)90255-6.
Nurmi, P., 2003. Recommendations on the Verification of Local Weather Forecasts (at ECMWF Member States). Consultancy Report, ECMWF Operations Department. <http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/Rec_FIN_Oct.pdf> (accessed 26.11.10).
Seo, D.-J., Koren, V., Cajina, N., 2003. Real-time variational assimilation of hydrologic
and hydrometeorological data into operational hydrologic forecasting. J.
Hydrometeor. 4 (3), 627–641.
Seo, D.-J., Herr, H.D., Schaake, J.C., 2006. A statistical post-processor for accounting of hydrologic uncertainty in short-range ensemble streamflow prediction. Hydrol. Earth Syst. Sci. Discuss. 3 (4), 1987–2035. doi:10.5194/hessd-3-1987-2006.
Stanski, H.R., Wilson, L.J., Burrows, W.R., 1989. Survey of common verification methods in meteorology. WMO World Weather Watch Technical Report No. 8, WMO/TD No. 358, second ed. Atmospheric Environment Service, Downsview, Canada. <http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/Stanski_et_al/Stanski_et_al.html> (accessed 26.11.10).
Stephenson, D.B., Coelho, C.A.S., Jolliffe, I.T., 2008. Two extra components in the
Brier score decomposition. Weather Forecast. 23 (4), 752–757. doi:10.1175/
2007WAF2006116.1.
Sælthun, N.R., 1996. The ‘‘Nordic’’ HBV model. Norwegian Water Resources and
Energy Administration Publication No. 7, Oslo, Norway.
Todini, E., 2004. Role and treatment of uncertainty in real-time flood forecasting.
Hydrological Processes 18 (14), 2743–2746. doi:10.1002/hyp.5687.
Toth, E., Montanari, A., Brath, A., 1999. Real-time flood forecasting via combined use
of conceptual and stochastic models. Phys. Chem. Earth, Part B 24 (7), 793–798.
doi:10.1016/S1464-1909(99)00082-9.
Toth, Z., Talagrand, O., Candille, G., Zhu, Y., 2003. Probability and ensemble
forecasts. In: Jolliffe, I.T., Stephenson, D.B. (Eds.), Forecast Verification: a
Practitioner’s Guide in Atmospheric Science. John Wiley & Sons, Chichester,
UK, pp. 137–164.
Wilks, D.S., 1995. Statistical Methods in Atmospheric Sciences: an Introduction.
International Geophysics Series, vol. 59. Academic Press, San Diego, CA, US.
World Meteorological Organization, 1992. Simulated Real-time Intercomparison of
Hydrological Models. Operational Hydrology Report No. 38, WMO Publication
No. 779, World Meteorological Organization, Geneva, Switzerland.
WWRP/WGNE Joint Working Group on Forecast Verification Research, 2010.
Forecast verification: Issues, methods and FAQ. <http://www.cawcr.gov.au/
projects/verification/> (accessed 26.11.10).
Xu, C.-Y., 2001. Statistical analysis of parameters and residuals of a conceptual
water balance model – methodology and case study. Water Resour. Manage. 15
(2), 75–92. doi:10.1023/A:1012559608269.