Evaluating uncertainties in parameter estimation and

Assessing Hydrological Model Performance Using Stochastic Simulation
Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University
September 8, 2010
INTRODUCTION
• Very often, in hydrology, the problems are not clearly understood for a meaningful analysis using physically based methods.
• Rainfall-runoff modeling
  • Empirical models – regression, ANN
  • Conceptual models – Nash LR
  • Physical models – kinematic wave
• Regardless of which type of model is used, almost all models need to be calibrated using historical data.
• Model calibration encounters a range of uncertainties that stem from different sources, including
  • data uncertainty,
  • parameter uncertainty, and
  • model structure uncertainty.
• The uncertainties involved in model calibration inevitably propagate to the model outputs.
• Performance of a hydrological model must be evaluated with regard to the uncertainties in the model outputs.
Uncertainties in model performance evaluation.
ASCE Task Committee, 1993
“Although there have been a multitude of watershed and hydrologic models developed in the past several decades, there do not appear to be commonly accepted standards for evaluating the reliability of these models. There is a great need to define the criteria for evaluation of watershed models clearly so that potential users have a basis with which they can select the model best suited to their needs.”
• Unfortunately, almost two decades have passed and the need stated above still stands.
SOME CHARACTERISTICS OF FLOOD FLOW FORECASTING
• Incomplete knowledge of the hydrological process under investigation.
• Uncertainties in model parameters and model structure when historical data are used for model calibration.
• It is often impossible to observe the process with adequate density and spatial resolution.
• Due to our inability to observe and model the spatiotemporal variations of hydrological variables, stochastic models are sought for flow forecasting.
• A unique and important feature of the flow at the watershed outlet is its persistence, particularly for large watersheds.
• Even though the model input (rainfall) may exhibit significant spatial and temporal variations, flow at the outlet is generally more persistent in time.
Illustration of persistence in flood flow series
$$x_t = \phi_0 + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t$$
A measure of persistence is defined as the cumulative impulse response (CIR):
$$\mathrm{CIR} = \frac{1}{1 - \sum_{i=1}^{p} \phi_i}$$
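As a minimal sketch of the CIR measure, the function below evaluates 1 / (1 - sum of the AR coefficients) for an AR(p) model. The function name and the coefficient values are hypothetical and chosen only to contrast a weakly and a strongly persistent series.

```python
def cumulative_impulse_response(phi):
    """CIR = 1 / (1 - sum(phi_i)) for an AR(p) model.

    `phi` holds the autoregressive coefficients phi_1..phi_p;
    the intercept phi_0 does not enter the CIR.
    """
    total = sum(phi)
    if total >= 1.0:
        # The CIR is only finite and positive when the sum is below 1.
        raise ValueError("sum of AR coefficients must be below 1")
    return 1.0 / (1.0 - total)


# Hypothetical coefficients, for illustration only.
cir_low = cumulative_impulse_response([0.3])         # weak persistence
cir_high = cumulative_impulse_response([1.2, -0.3])  # strong persistence
print(round(cir_low, 3), round(cir_high, 3))
```

A larger CIR means that a unit shock keeps influencing the series for longer, which is the sense in which flow series are described here as more persistent than rainfall series.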
• The flow series have significantly higher persistence than the rainfall series.
• We have analyzed flow data at other locations, including Hamburg, Iowa, in the United States, and found similarly high persistence in the flow data series.
The Problem of Lagged Forecast
CRITERIA FOR MODEL PERFORMANCE EVALUATION
• Relative error (RE)
• Mean absolute error (MAE)
• Correlation coefficient (r)
• Root-mean-squared error (RMSE)
• Normalized root-mean-squared error (NRMSE)
$$NRMSE = \frac{RMSE}{\sigma_{obs}}$$
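The criteria listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: NRMSE is taken as RMSE divided by the standard deviation of the observations (computed with divisor n), the relative error is left out because the slides do not define it, and the `obs` and `sim` values are made up.

```python
import math


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root-mean-squared error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nrmse(obs, sim):
    """RMSE divided by the standard deviation of the observations (assumed form)."""
    mean_obs = sum(obs) / len(obs)
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / len(obs))
    return rmse(obs, sim) / sd_obs


def correlation(obs, sim):
    """Pearson correlation coefficient between observations and simulations."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


# Made-up observed and simulated flows.
obs = [10.0, 14.0, 30.0, 22.0, 16.0]
sim = [12.0, 13.0, 26.0, 25.0, 15.0]
print(mae(obs, sim), rmse(obs, sim), nrmse(obs, sim), correlation(obs, sim))
```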

• Coefficient of efficiency (CE) (Nash and Sutcliffe, 1970)
$$CE = 1 - \frac{SSE}{SST_m} = 1 - \frac{\sum_{t=1}^{n} (Q_t - \hat{Q}_t)^2}{\sum_{t=1}^{n} (Q_t - \bar{Q})^2}$$
• Coefficient of persistence (CP) (Kitanidis and Bras, 1980)
$$CP = 1 - \frac{SSE}{SSE_N} = 1 - \frac{\sum_{t=1}^{n} (Q_t - \hat{Q}_t)^2}{\sum_{t=1}^{n} (Q_t - Q_{t-k})^2}$$
• Error in peak flow (or stage), in percent or absolute value (Ep)
Coefficient of Efficiency (CE)
• The coefficient of efficiency evaluates the model performance with reference to the mean of the observed data.
$$CE = 1 - \frac{SSE}{SST_m} = 1 - \frac{\sum_{t=1}^{n} (Q_t - \hat{Q}_t)^2}{\sum_{t=1}^{n} (Q_t - \bar{Q})^2}$$
• Its value can vary from 1, when there is a perfect fit, to $-\infty$. A negative CE value indicates that the model predictions are worse than predictions using a constant equal to the average of the observed data.
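A short sketch of the CE calculation makes the three regimes concrete: a perfect fit, a forecast equal to the observed mean, and a forecast worse than the mean. The function name and the data are illustrative assumptions.

```python
def coefficient_of_efficiency(obs, sim):
    """CE = 1 - SSE / SSTm, with SSTm taken about the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Made-up observations.
obs = [10.0, 14.0, 30.0, 22.0, 16.0]
mean_obs = sum(obs) / len(obs)

ce_perfect = coefficient_of_efficiency(obs, obs)                        # perfect fit
ce_mean = coefficient_of_efficiency(obs, [mean_obs] * len(obs))         # constant mean
ce_poor = coefficient_of_efficiency(obs, [o + 20.0 for o in obs])       # biased forecast
print(ce_perfect, ce_mean, ce_poor)
```

The constant-mean forecast scores exactly zero, which is why the mean series acts as the implicit benchmark in CE.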
Model performance rating using CE (Moriasi et al., 2007)
• Moriasi et al. (2007) emphasized that the above performance ratings are for a monthly time step. If the evaluation time step decreases (for example, to a daily or hourly time step), a less strict performance rating should be adopted.
Coefficient of Persistence (CP)
• It compares the performance of the model under consideration with the performance of the naïve (or persistent) model, which assumes a steady state over the forecast lead time.
$$CP = 1 - \frac{SSE}{SSE_N} = 1 - \frac{\sum_{t=1}^{n} (Q_t - \hat{Q}_t)^2}{\sum_{t=1}^{n} (Q_t - Q_{t-k})^2}$$
$$-\infty < CP \le 1$$
• A small positive value of CP may imply occurrence of lagged prediction, whereas a negative CP value indicates that the considered model performs worse than the naïve model.
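The following sketch computes CP for a lead time of k steps and shows why a lagged forecast scores close to zero. It assumes that the first k time steps are skipped, since no naïve forecast exists for them; the slides do not say how those steps are handled. The names and data are illustrative.

```python
def coefficient_of_persistence(obs, sim, k=1):
    """CP = 1 - SSE / SSE_N for forecasts with a lead time of k steps.

    sim[t] is the model forecast of obs[t]; the naive forecast of obs[t]
    is obs[t - k]. The first k steps are skipped (an assumption).
    """
    steps = range(k, len(obs))
    sse = sum((obs[t] - sim[t]) ** 2 for t in steps)
    sse_naive = sum((obs[t] - obs[t - k]) ** 2 for t in steps)
    return 1.0 - sse / sse_naive


# Made-up observations.
obs = [10.0, 14.0, 30.0, 22.0, 16.0, 13.0]

# A "lagged" forecast that simply repeats the previous observation.
lagged = [obs[0]] + obs[:-1]
cp_lagged = coefficient_of_persistence(obs, lagged)

# A forecast that tracks the observations with a small constant offset.
close = [o + 1.0 for o in obs]
cp_close = coefficient_of_persistence(obs, close)
print(cp_lagged, round(cp_close, 4))
```

A model whose output is essentially the previous observation reproduces the naïve forecast, so its CP is zero regardless of how good its CE looks.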
An example of river stage forecasting
[Figure: observed river stage and ANN model forecast; model forecasting CE = 0.68466.]
[Figure: observed river stage with ANN model and naïve model forecasts; model forecasting CE = 0.68466, CP = -0.3314; naïve forecasting CE = 0.76315.]
Bench Coefficient
• Seibert (2001) addressed the importance of choosing an appropriate benchmark series with which the predicted series of the considered model is compared.
$$G_{bench} = 1 - \frac{\sum_{t=1}^{n} (Q_t - \hat{Q}_t)^2}{\sum_{t=1}^{n} (Q_t - Q_{b,t})^2}$$
$$CE = 1 - \frac{\sum_{t=1}^{n} (Q_t - \hat{Q}_t)^2}{\sum_{t=1}^{n} (Q_t - \bar{Q})^2}$$
$$CP = 1 - \frac{\sum_{t=1}^{n} (Q_t - \hat{Q}_t)^2}{\sum_{t=1}^{n} (Q_t - Q_{t-k})^2}$$
• The bench coefficient provides a general form for measures of goodness-of-fit based on benchmark comparisons.
• CE and CP are bench coefficients with respect to benchmark series of the constant mean series and the naïve-forecast series, respectively.
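To illustrate how CE and CP arise as special cases, here is a minimal sketch of a generic bench coefficient that takes the benchmark series as an argument. The function name and the data are hypothetical; for the naïve benchmark the first observation is dropped because it has no previous value.

```python
def bench_coefficient(obs, sim, bench):
    """G_bench = 1 - SSE(model) / SSE(benchmark)."""
    sse_model = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sse_bench = sum((o - b) ** 2 for o, b in zip(obs, bench))
    return 1.0 - sse_model / sse_bench


# Made-up observed and forecast series.
obs = [10.0, 14.0, 30.0, 22.0, 16.0, 13.0]
sim = [11.0, 15.0, 27.0, 24.0, 15.0, 14.0]

# CE: the benchmark is the constant mean of the observations.
mean_obs = sum(obs) / len(obs)
ce = bench_coefficient(obs, sim, [mean_obs] * len(obs))

# CP (lead 1): the benchmark is the previous observation.
cp = bench_coefficient(obs[1:], sim[1:], obs[:-1])
print(round(ce, 4), round(cp, 4))
```

Any other benchmark, such as the one-step forecasts of an autoregressive model, can be passed through the same `bench` argument.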
• The bottom line, however, is what the appropriate benchmark series should be for the kind of application (flood forecasting) under consideration.
• We propose to use the AR(1) or AR(2) model as the benchmark for flood forecasting model performance evaluation.
A CE-CP coupled MPE criterion.
Demonstration of parameter and model uncertainties
[Figures: parameter uncertainties without model structure uncertainty (three slides).]
[Figure: parameter uncertainties with model structure uncertainty.]
[Figures: uncertainties in model performance, RMSE (two slides).]
[Figures: uncertainties in model performance, CE (two slides).]
[Figures: uncertainties in model performance, CP (two slides).]
• It appears that the model specification error does not affect the parameter uncertainties. However, the bias in parameter estimation of AR(1) modeling will result in poorer forecasting performance and higher uncertainties in MPE criteria.
ASYMPTOTIC RELATIONSHIP BETWEEN CE AND CP
• Given a sample series $\{x_t,\ t = 1, 2, \ldots, n\}$, CE and CP respectively represent measures of model performance by choosing the constant mean series and the naïve forecast series as benchmark series.
• The sample series is associated with a lag-1 autocorrelation coefficient $\rho_1$.
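For reference, a common sample estimator of the lag-1 autocorrelation coefficient can be sketched as below. The slides do not state which estimator was used, so this particular form (lagged products over the full sum of squares about the sample mean) is an assumption, and the two test series are made up.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation about the sample mean (one common estimator)."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Made-up series: a smooth rise and fall versus an erratic alternation.
smooth = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0]
erratic = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0]
print(round(lag1_autocorrelation(smooth), 3), lag1_autocorrelation(erratic))
```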
[A]
• Given a data series with a specific lag-1 autocorrelation coefficient, we can choose various models for one-step lead time forecasting of the given data series.
• Equation [A] indicates that, although the forecasting performance of these models may differ significantly, their corresponding (CE, CP) pairs will all fall on a specific line determined by $\rho_1$.
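Equation [A] itself is not reproduced in this text. The sketch below derives a linear CE-CP relationship from the CE and CP definitions, assuming a one-step lead time (k = 1) and a series long enough that the sum of squared successive differences is approximately $2(1-\rho_1)$ times the sum of squares about the mean. It should be read as a reconstruction that is consistent with the numbers quoted in these slides, not necessarily as the authors' exact form of [A].

```latex
\begin{aligned}
SSE_N &= \sum_{t}(Q_t - Q_{t-1})^2 \;\approx\; 2(1-\rho_1)\sum_{t}(Q_t - \bar{Q})^2 = 2(1-\rho_1)\,SST_m \\
1 - CP &= \frac{SSE}{SSE_N} \;\approx\; \frac{SSE}{2(1-\rho_1)\,SST_m} = \frac{1 - CE}{2(1-\rho_1)} \\
CE &\approx 1 - 2(1-\rho_1)(1 - CP)
\end{aligned}
```

Setting CP = 0 gives CE ≈ 2ρ1 - 1, so ρ1 = 0.843 corresponds to CE ≈ 0.686, which matches the value quoted for that event in these slides.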
Asymptotic relationship between CE and CP for data
series of various lag-1 autocorrelation coefficients.
1  0.6
• The asymptotic CE-CP relationship can be used to determine whether a specific CE value, for example CE = 0.55, can be considered as having acceptable accuracy.
• The CE-based model performance rating recommended by Moriasi et al. (2007) does not take into account the autocorrelation structure of the data series under investigation, and thus may result in misleading recommendations.
• Consider a data series with significant persistence, that is, a high lag-1 autocorrelation coefficient, say 0.8. Suppose that a forecasting model yields a CE value of 0.55 (see point C). With this CE value, performance of the model is considered satisfactory according to the performance rating recommended by Moriasi et al. (2007).
• However, it corresponds to a negative value of CP (-0.125), indicating that the model performs even worse than naïve forecasting, and thus should not be recommended.
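The numbers in this example can be checked with a short sketch. It assumes the asymptotic line has the form CE ≈ 1 - 2(1 - ρ1)(1 - CP), a form that follows from the CE and CP definitions for a one-step lead time and reproduces the values quoted here; the function names are hypothetical.

```python
def cp_from_ce(ce, rho1):
    """CP implied by a CE value on the assumed line CE = 1 - 2(1 - rho1)(1 - CP)."""
    return 1.0 - (1.0 - ce) / (2.0 * (1.0 - rho1))


def ce_at_cp_zero(rho1):
    """CE achieved by naive forecasting (CP = 0) on the same assumed line."""
    return 2.0 * rho1 - 1.0


# Point C: CE = 0.55 for a series with lag-1 autocorrelation 0.8.
cp_point_c = cp_from_ce(0.55, 0.8)
print(round(cp_point_c, 3))

# CE needed merely to match naive forecasting when rho1 = 0.8.
print(round(ce_at_cp_zero(0.8), 3))
```

Under this assumed form, a model applied to a series with ρ1 = 0.8 needs a CE above roughly 0.6 before it outperforms naïve forecasting.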
[Figure: asymptotic relationship between CE and CP for data series of various lag-1 autocorrelation coefficients.]
[Figure: three flood events, with ρ1 = 0.843 (CE = 0.686 at CP = 0), ρ1 = 0.822 (CE = 0.644 at CP = 0), and ρ1 = 0.908 (CE = 0.816 at CP = 0).]
• For these three events, the very simple naïve forecasting yields CE values of 0.686, 0.644, and 0.816, respectively, which are nearly in the range of good to very good according to the rating of Moriasi et al. (2007).
• In the literature we have found that many flow forecasting applications resulted in CE values varying between 0.65 and 0.85. Given the high persistence in flow data series, it is likely that not all of these models performed better than naïve forecasting.
• Another point worth cautioning about when using CE for model performance evaluation is whether it should be applied to individual events or to a constructed continuous series of several events.
• Variation of CE values of individual events enables us to assess the uncertainties in model performance, whereas some studies constructed an artifactual continuous series of several events and calculated a single CE value from the multiple-event continuous series.
• A CE value based on such an artifactual series cannot be considered a measure of overall model performance with respect to all events.
• This is due to the fact that the denominator in the CE calculation is significantly larger for the artifactual series than for any individual event series, and thus the CE value of the artifactual series tends to be higher than the CE value of any individual event.
$$CE = 1 - \frac{\sum_{t=1}^{n} (Q_t - \hat{Q}_t)^2}{\sum_{t=1}^{n} (Q_t - \bar{Q})^2}$$
• For example, the CE value by naïve forecasting for an artifactual flow series of the three events in Figure 1 is 0.8784, which is significantly higher than the naïve-forecasting CE value of any individual event.
[Figure: the three events, with ρ1 = 0.843 (CE = 0.686 at CP = 0), ρ1 = 0.822 (CE = 0.644 at CP = 0), and ρ1 = 0.908 (CE = 0.816 at CP = 0).]
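The pooling effect can be reproduced with synthetic data. In the sketch below, three made-up events of very different magnitude are forecast with the naïve model; CE is computed per event and then once over the pooled pairs. The event values are invented for illustration and are not the events shown in the slides.

```python
def ce(obs, sim):
    """Coefficient of efficiency about the mean of `obs`."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Three synthetic events with very different flow magnitudes.
events = [
    [10, 14, 22, 30, 26, 20, 15, 12],
    [100, 140, 220, 300, 260, 200, 150, 120],
    [400, 520, 700, 900, 800, 650, 520, 450],
]

pooled_obs, pooled_sim, event_ce = [], [], []
for q in events:
    obs, naive = q[1:], q[:-1]  # naive forecast: the previous observation
    event_ce.append(ce(obs, naive))
    pooled_obs += obs
    pooled_sim += naive

# One CE value over all events: the denominator now also contains
# the large between-event differences in mean flow.
pooled_ce = ce(pooled_obs, pooled_sim)
print([round(v, 3) for v in event_ce], round(pooled_ce, 3))
```

The forecast errors are unchanged by pooling; only the denominator grows, which is what inflates the pooled CE.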
A nearly perfect forecasting model
[Figure: time series plot (vertical axis 0 to 1600, horizontal axis 1 to 421) annotated with CE values of 0.79021, 0.66646, 0.79109, 0.80027, 0.85599, 0.62629, 0.77926, 0.76404, and 0.84652.]
A CE-CP COUPLED MPE CRITERION
• Are we satisfied with using the constant mean series or naïve forecasting as the benchmark?
• Considering the highly persistent nature of flow data series, we argue that the performance of the autoregressive model AR(p) should be used as a benchmark for the performance of other flow forecasting models.
• From our previous experience in flood flow analysis and forecasting, we propose to use the AR(1) or AR(2) model for benchmark comparison.
• The asymptotic relationship between CE and CP indicates that when different forecasting models are applied to a given data series (with a specific value of $\rho_1$, say $\rho^*$), the resultant (CE, CP) pairs will all fall on a line determined by Eq. [A] with $\rho_1 = \rho^*$.
• In other words, points on the asymptotic line determined by $\rho_1 = \rho^*$ represent the forecasting performance of different models applied to the given data series.
• Using the AR(1) or AR(2) model as the benchmark, we need to know which point on the asymptotic line corresponds to the AR(1) or AR(2) model.
CE-CP relationships for AR(1) model
• AR(1)
$$CE = 4CP^2 - 4CP + 1 \qquad \text{[B]}$$
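The quadratic in [B] can be checked against the theoretical one-step forecasting performance of an AR(1) process. For an AR(1) series with lag-1 autocorrelation ρ1 and known parameter, the forecast error variance is (1 - ρ1²) times the series variance, which gives CE = ρ1² and, with the naïve benchmark, CP = (1 - ρ1) / 2. The sketch below uses these asymptotic values; the function names are hypothetical.

```python
def ar1_ce_cp(rho1):
    """Asymptotic (CE, CP) of one-step AR(1) forecasts with the true parameter."""
    ce = rho1 ** 2            # 1 - (1 - rho1^2)
    cp = (1.0 - rho1) / 2.0   # 1 - (1 - rho1^2) / (2 (1 - rho1))
    return ce, cp


def ce_on_ar1_curve(cp):
    """CE as a function of CP along the AR(1) curve, Eq. [B]."""
    return 4.0 * cp ** 2 - 4.0 * cp + 1.0


for rho1 in (0.5, 0.8, 0.843):
    ce, cp = ar1_ce_cp(rho1)
    print(rho1, round(ce, 4), round(cp, 4), round(ce_on_ar1_curve(cp), 4))
```

Each value of ρ1 therefore picks out a single point on the curve, which is the point a competing model has to beat when AR(1) is the benchmark.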
CE-CP relationships for AR(1) and AR(2) models
• AR(2)
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t$$
$$CE = \frac{4}{1-\phi_2^2}\,CP^2 + \left(4 - \frac{8}{1-\phi_2^2}\right) CP + \left(\frac{4}{1-\phi_2^2} - 3\right) \qquad \text{[C]}$$
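As a consistency check on the AR(2) relationship, the sketch below computes the asymptotic (CE, CP) of one-step AR(2) forecasts directly from the model parameters and compares the result with a quadratic in CP whose coefficients depend on φ2. The quadratic coded here, CE = a·CP² + (4 - 2a)·CP + (a - 3) with a = 4 / (1 - φ2²), is an assumed reading of Eq. [C]; it reduces to the AR(1) curve when φ2 = 0. The parameter values are hypothetical.

```python
def ar2_ce_cp(phi1, phi2):
    """Asymptotic (CE, CP) of one-step AR(2) forecasts with the true parameters."""
    rho1 = phi1 / (1.0 - phi2)  # lag-1 autocorrelation of a stationary AR(2)
    # Forecast error variance relative to the series variance.
    err_ratio = (1.0 - phi2 ** 2) * (1.0 - rho1 ** 2)
    ce = 1.0 - err_ratio
    cp = 1.0 - err_ratio / (2.0 * (1.0 - rho1))
    return ce, cp


def ce_on_ar2_curve(cp, phi2):
    """Assumed form of Eq. [C]: CE as a quadratic in CP for a given phi2."""
    a = 4.0 / (1.0 - phi2 ** 2)
    return a * cp ** 2 + (4.0 - 2.0 * a) * cp + (a - 3.0)


# Hypothetical stationary AR(2) parameters.
ce, cp = ar2_ce_cp(1.2, -0.3)
print(round(ce, 4), round(cp, 4), round(ce_on_ar2_curve(cp, -0.3), 4))
```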
Example of event-1
[Figure: CE-CP plot for event 1 (ρ1 = 0.843), showing the AR(1) and AR(2) model curves and the points obtained from AR(1) and AR(2) modeling of the data.]
Assessing uncertainties in (CE, CP) using model-based bootstrap resampling
[Figures: assessing uncertainties in MPE by bootstrap resampling, Event-1 (two slides).]
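The slides do not give the details of the resampling scheme, so the following is only a sketch of one common model-based (residual) bootstrap, assuming an AR(1) model: fit the model, resample its residuals with replacement, rebuild replicate series, refit on each replicate, and collect the resulting (CE, CP) pairs. All names and the flow values are hypothetical.

```python
import random


def fit_ar1(x):
    """Least-squares AR(1) fit about the sample mean: (mean, phi, residuals)."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    phi = sum(d[t] * d[t - 1] for t in range(1, len(d))) / sum(v * v for v in d[:-1])
    resid = [d[t] - phi * d[t - 1] for t in range(1, len(d))]
    return mean, phi, resid


def ce_cp(obs, sim):
    """CE and CP (lead 1) over t = 1..n-1; the naive forecast of obs[t] is obs[t-1]."""
    o, s, prev = obs[1:], sim[1:], obs[:-1]
    mean_o = sum(o) / len(o)
    sse = sum((a - b) ** 2 for a, b in zip(o, s))
    sst = sum((a - mean_o) ** 2 for a in o)
    sse_naive = sum((a - b) ** 2 for a, b in zip(o, prev))
    return 1.0 - sse / sst, 1.0 - sse / sse_naive


def bootstrap_ce_cp(x, n_boot=200, seed=0):
    """Residual bootstrap of (CE, CP) under a fitted AR(1) model."""
    rng = random.Random(seed)
    mean, phi, resid = fit_ar1(x)
    pairs = []
    for _ in range(n_boot):
        # Rebuild a replicate series from the fitted model and resampled residuals.
        y = [x[0]]
        for _ in range(len(x) - 1):
            y.append(mean + phi * (y[-1] - mean) + rng.choice(resid))
        # Refit on the replicate and score its one-step forecasts.
        m_b, phi_b, _ = fit_ar1(y)
        sim = [y[0]] + [m_b + phi_b * (y[t - 1] - m_b) for t in range(1, len(y))]
        pairs.append(ce_cp(y, sim))
    return pairs


# Made-up flood hydrograph.
flow = [50, 80, 150, 320, 560, 700, 640, 520, 410, 330,
        270, 220, 180, 150, 130, 110, 95, 85, 75, 70]
pairs = bootstrap_ce_cp(flow, n_boot=200, seed=1)
ce_values = sorted(ce for ce, _ in pairs)
print(round(ce_values[5], 3), round(ce_values[-6], 3))  # rough 95% range of CE
```

The spread of the resampled (CE, CP) pairs gives a rough picture of how uncertain the two criteria are for a single event of this length.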
Conclusions
• Performance of a flow forecasting model needs to be evaluated by taking into account the uncertainties in model performance.
• The AR(2) model should be considered as the benchmark.
• Bootstrap resampling can be helpful in evaluating the uncertainties in model performance.
• As a final remark, we would like to reiterate a remark made by Seibert (2001) a decade ago:
“Obviously there is the risk of discouraging results when a model does not outperform some simpler way to obtain a runoff series. But if we truly wish to assess the worth of models, we must take such risks. Ignorance is no defense.”
Thank you for your attention.
Your comments are most welcome!
What exactly does ensemble mean?
• “Ensemble” used in weather forecasting
  • Ensemble Prediction System (EPS)
  • Ensemble Streamflow Prediction (ESP)
  • Perturbation instead of stochastic variation
• “Ensemble” in statistics
  • A collection of all possible outcomes of a random experiment.
• In mathematical physics, especially as introduced into statistical mechanics and thermodynamics by J. Willard Gibbs in 1878, an ensemble (also statistical ensemble or thermodynamic ensemble) is an idealization consisting of a large number of mental copies (sometimes infinitely many) of a system, considered all at once, each of which represents a possible state that the real system might be in.
• In both cases, including stochastic physics in the model gives rise to higher forecast score values than using only an ensemble based on random perturbations to the initial conditions.