WATER RESOURCES RESEARCH, VOL. 45, W05418, doi:10.1029/2007WR006695, 2009
Model performance and parameter behavior for varying time
aggregations and evaluation criteria in the WASMOD-M
global water balance model
E. Widén-Nilsson,1,2 L. Gong,2 S. Halldin,2 and C.-Y. Xu2,3
Received 23 November 2007; revised 15 October 2008; accepted 17 March 2009; published 20 May 2009.
[1] Global discharge estimates commonly range between 36,500 km³ a⁻¹ and
44,500 km³ a⁻¹, i.e., by around 20%, and continental estimates differ much more. Data
uncertainties are assumed to be a main cause of simulated runoff uncertainties, but model
performance must also be addressed. The parsimonious WASMOD-M global water
balance model, using limited input data, was used to assess data and model uncertainty
(in contrast to models that use much data but are only modestly calibrated or not calibrated at all). A Monte
Carlo technique based on 15,000 parameter value sets was used to evaluate the model
against four criteria: observed snow and monthly, annual, and long-term discharge.
WASMOD-M was overparameterized when evaluated only against long-term average
discharge but not against monthly discharge, and its snow algorithm could be simplified.
Sequential calibration is suggested for confining the behavioral parameter space and
minimizing model equifinality starting with snow, followed by long-term volume error,
and ending with discharge dynamics.
Citation: Widén-Nilsson, E., L. Gong, S. Halldin, and C.-Y. Xu (2009), Model performance and parameter behavior for varying time
aggregations and evaluation criteria in the WASMOD-M global water balance model, Water Resour. Res., 45, W05418,
doi:10.1029/2007WR006695.
1. Introduction
[2] Global water balance models are increasingly used to
estimate present and future water resources at large scales
for purposes of, e.g., climate impact studies, freshwater
availability for a growing global population, transboundary
water management, and virtual water trade [Arnell, 2004;
Islam et al., 2007; Lehner et al., 2006; Nijssen et al., 2001a;
Vörösmarty et al., 2000a]. Such global models differ in
type and complexity. Models like MacPDM
[Arnell, 1999; 2003], WBM [Vörösmarty et al., 1998],
WGHM/WaterGAP [Alcamo et al., 2003; Döll et al.,
2003], WASMOD-M [Widén-Nilsson et al., 2007], and the
‘‘reduced-form’’ model of Kleinen and Petschel-Held
[2007] have roots in traditional catchment modeling.
Models like VIC [Liang et al., 1994] and the integrated
global water resources model of Hanasaki et al. [2008a,
2008b] are macroscale hydrological models with the possibility of GCM coupling because of their energy balance
simulations. Global runoff is also produced by dynamic
vegetation models like LPJ [Gerten et al., 2004] and IBIS
[Kucharik et al., 2000].
1 Department of Aquatic Science and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden.
2 Department of Earth Sciences, Uppsala University, Uppsala, Sweden.
3 Department of Geosciences, University of Oslo, Oslo, Norway.
Copyright 2009 by the American Geophysical Union.
0043-1397/09/2007WR006695$09.00
[3] Global water resources were originally assessed from
country statistics [e.g., L’vovich, 1979; Baumgartner and
Reichel, 1975], but global water balance modeling has
gradually taken over as an assessment tool. Internationally
coordinated efforts gradually improve data sets for such
models, and multimodel ensemble techniques are suggested
as a way to improve global assessments [Dirmeyer et al.,
2006]. In spite of such progress, there is still a large
uncertainty in global discharge estimation. Total global
discharge estimates commonly range between 36,500 km³
a⁻¹ and 44,500 km³ a⁻¹ [Widén-Nilsson et al., 2007], but
Oki et al. [2001] report a model ensemble value as low as
29,485 km³ a⁻¹ for a 2-year period using precipitation data
without gauge undercatch correction. Continental discharge
estimates differ much more [Widén-Nilsson et al., 2007].
Probst and Tardy [1987] report annual global fluctuations
between 34,500 and 44,000 km³ a⁻¹.
[4] Gerten et al. [2004] show large differences between
runoff simulated with the LPJ, WBM, Macro-PDM, and
WGHM models. Kleinen and Petschel-Held [2007] compare simulation volume for 31 large river basins calculated
with the VIC model [Nijssen et al., 2001b] and the land
surface GCM component of Russell and Miller [1990].
They find volume differences to vary from −70% to over
+2000% with an average of +10%. Global models rely on
global data sets and are confined by their availability and
often limited quality. All global models suffer from data
uncertainties, which are often assumed to be a main cause of
simulated runoff uncertainties. Döll et al. [2003] found that
they could not match the observed average discharge
without violating the physical range of the calibration
parameter for some basins. Döll et al. [2003] and Fekete
et al. [2002] had to apply large runoff correction factors in
some cells to make inflow to the downstream interstation
area equal to measured flow. They relate this to differences
between precipitation and runoff data and especially precipitation undercatch. Fekete et al. [2002], and in part Döll
et al. [2003], allow their correction factors to influence the
simulated global runoff fields.
[5] Global model performance has also received attention
in recent years. Evaluation techniques partly depend on the
modeler’s attitude to calibration. Most modelers agree that
global model parameters should preferably not be calibrated
[Arnell, 1999; Hanasaki et al., 2008a], but Döll et al. [2003]
state that erroneous data are one reason why calibration
cannot be avoided. WGHM, WASMOD-M, and the WTM
routing module sometimes used with WBM and VIC are
calibrated models [Döll et al., 2003; Widén-Nilsson et al.,
2007; Vörösmarty and Moore, 1991; Vörösmarty et al.,
1996; Liang et al., 1994]. Two ways to test model quality
are evaluation against independent data sets of the same
type and calibration against more than one variable. Another
is uncertainty and sensitivity analysis by Monte Carlo
simulations, which can reveal overparameterization and
equifinality, i.e., that several parameter value sets give
equally good results. Demaria et al. [2007] and Huang and
Liang [2006] apply Monte Carlo analysis to the subsurface parts
of VIC, for 4 and 12 U.S. basins, respectively, to find those model
structures that can be simplified without losing model
performance. Monte Carlo tests of WGHM are presented
by Kaspar [2004] and Güntner et al. [2007]. Kaspar [2004]
concludes that the most sensitive parameters are related to
lakes and wetlands for low flows and that the impact of
climate change scenarios is stronger than parameter uncertainty for long-term average runoffs. Güntner et al. [2007]
find a strong regional variation in the sensitivity of parameters governing total water storage (snow, soil moisture,
groundwater, and surface water) depending on which processes are most important.
[6] Wagener et al. [2003] present two reactions to the
equifinality problem. The first is to use parsimonious
models, with a risk of too simplistic model structures
[Kuczera and Mroczkowski, 1998]. The second is to search
for calibration methods that better use information in
available data series of, e.g., discharge, groundwater levels,
and snow cover. Therefore, ‘‘uncertainty evaluation of
models means analyzing the range of parameter sets and
sometimes even model structures that are viable for an
anticipated study’’ [Wagener, 2003, p. 3376]. The selection
of one performance criterion normally confines the behavioral parameter space differently than another, meaning that
the optimum parameter value varies with different criteria
[Madsen, 2000]. A combination of criteria, focusing on
different parts of the hydrograph, is usually needed to
evaluate the performance of a hydrological model [Krause
et al., 2005; Legates and McCabe, 1999; Madsen, 2000].
Schulze and Döll [2004] use satellite-derived snow cover
and discharge measurements to test a new subgrid snow
routine for WGHM. WGHM, which explicitly simulates
surface water storage [Güntner et al., 2007], has also been
evaluated by Schmidt et al. [2006] against water storage
variations from the Gravity Recovery and Climate Experiment (GRACE) satellite observations, whereas Werth et al.
[2007] use these data for calibration. The mismatch between
the GRACE data, which give a measure of total water
storage changes over large regions after subtracting the
atmospheric water content, with some blurring from surrounding
areas, and the WGHM simulations of major water
storages with conceptual model equations for 0.5° cells
[Güntner et al., 2007] is an example of the incommensurability problems between measured and modeled entities.
Hillard et al. [2003], Pan et al. [2003], and Sheffield et al.
[2003] present comparisons of nonglobal VIC applications
and satellite-derived snow data, whereas Nijssen et al.
[2001c] compare the uncalibrated global version of VIC
with measurements of snow cover and soil moisture in
addition to global runoff. Rawlins et al. [2005, 2007]
compare remotely sensed snow and locally measured river
discharge with results from PWBM, a modified version of
WBM. Fekete et al. [2006] use isotope data to evaluate
WBM/WTM runoff.
[7] It is difficult to calibrate a model against discharge
time series if the model does not include routing delays
from lakes, wetlands, and the river reach itself, as well as
dam regulation. The problem is exacerbated since discharge
information from upstream and downstream gauges often
represents different time periods. Most previous studies have
thus used long-term average discharge when evaluating
results or selecting behavioral parameter value sets. Some
global models, e.g., WGHM [Döll et al., 2003] and WBM/
WTM [Vörösmarty and Moore, 1991; Vörösmarty et al.,
1996], include travel time delay. Many global rivers have
regulation delays of 1 – 3 months [Vörösmarty et al., 1997],
but regulation data are often unavailable [Brakenridge et al.,
2005]. Algorithms for dam operation schemes are emerging
[Haddeland et al., 2006; Hanasaki et al., 2006] but are not
widely used. Model results are commonly reported for
climatological (long-term average) intra-annual patterns,
but efficiency measures are seldom calculated for such
averages. Relatively few studies compare model efficiency
at different time scales. Döll et al. [2003] and Hunger and
Döll [2008] do this for WGHM. Parkin et al. [1996],
Jothityangkoon et al. [2001], Eder et al. [2003], and Hay
et al. [2006] present techniques on the catchment scale to
deal with different time steps, and all agree that increased
model complexity can be supported at a finer time step if
required data are at hand.
[8] Given the uncertainties in the global and continental
discharge estimates, it is advantageous to have global
hydrological models using different approaches, just like
the ensembles of GCMs used in climate research. In this
study we used WASMOD-M [Widén-Nilsson et al., 2007]
to assess data and model uncertainty. We believe that
WASMOD-M, with its six parameters, has the most parsimonious structure of all models except the one of Kleinen
and Petschel-Held [2007]. In developing WASMOD-M, we
start from a very simple structure, with as few parameters
as possible, to avoid overparameterization. If it is found to
be necessary, more processes and input data sets will be
added to the model in the future. Widén-Nilsson et al. [2007,
p. 111] state that ‘‘In spite of its simplicity, it may be
questioned if WASMOD-M also is overparameterized as
long as only long-term average discharge is used for
validation.’’ Most of the current WASMOD-M parameters
are calibrated, contrary to those of many other models. We
wanted to find out how much the behavioral parameter
value sets of WASMOD-M could be confined by validation
against snow data in addition to discharge data. To what
degree would selection of performance criteria be instrumental
to simulation success? Would monthly and annual
validation data allow successful discharge simulation without modeling travel time delay? Would equifinality or bad
simulation indicate too simple or too complicated a model
structure at the different time scales?

Table 1. Range of the Five Tunable Parameters in WASMOD-M

Parameter      Governing Storage                    Range         Sampling Interval
Ts (°C)        snowfall (equations (2) and (4))     0 to 4        uniform
Tm (°C)        snowmelt (equations (2) and (4))     −4 to 0       uniform
Ac             actual evaporation (equation (7))    0 to 1        uniform
Ps (month⁻¹)   slow runoff (equation (8))           e⁻¹⁸ to e⁰    logarithmic
Pf (mm⁻¹)      fast runoff (equation (9))           e⁻¹⁴ to e⁰    logarithmic

2. Material and Methods
2.1. Global Data Sets
[9] WASMOD-M is driven by time series of monthly
precipitation (P), temperature (Ta), and potential evaporation (ep) on a 0.5° × 0.5° latitude-longitude grid. Precipitation, temperature, and water vapor pressure were taken
from the CRU TS 2.10 climate data [Mitchell and Jones,
2005], covering 1901 – 2002 but only used until 2000.
Precipitation was corrected for gauge undercatch with
long-term average monthly factors calculated from the
Global air temperature and precipitation regridded monthly
and annual climatologies version 2.01 (available at http://climate.geog.udel.edu/climate) [Legates and Willmott,
1990]. Gridded potential evaporation was preprocessed
from temperature Ta (°C) and relative humidity RH (%),
calculated from temperature and water vapor pressure:
$$ e_p = E_c \, [\max(T_a, 0)]^2 \, (100 - RH). \tag{1} $$
Ec (mm month⁻¹ °C⁻²) was set in an inverse process to
make the average annual potential evaporation equal to the
highest value in two evaporation data sets (Terrestrial water
balance data archive: Regridded monthly climatologies
version 1.02 by C. J. Willmott and K. Matsuura, available at
http://climate.geog.udel.edu/climate/, and Potential evapotranspiration by C.-H. Ahn and R. Tateishi, available at
http://www-cger.nies.go.jp/grid-e/). The minimum instead
of the maximum was chosen in some Arctic Canadian cells
to get Ec ≤ 1 mm month⁻¹ °C⁻².
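Equation (1) can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' MATLAB code: the function name and the input values (including the value of Ec) are hypothetical and chosen only to show how the temperature cutoff at 0 °C behaves.

```python
def potential_evaporation(ta_c, rh_percent, ec):
    """Equation (1): ep = Ec * max(Ta, 0)^2 * (100 - RH), in mm/month.

    ta_c: air temperature (deg C), rh_percent: relative humidity (%),
    ec: coefficient Ec (mm/month/degC^2).
    """
    return ec * max(ta_c, 0.0) ** 2 * (100.0 - rh_percent)


# Hypothetical inputs: Ec = 0.01, 60 % relative humidity
ep_warm = potential_evaporation(10.0, 60.0, 0.01)   # warm month
ep_frost = potential_evaporation(-5.0, 60.0, 0.01)  # month below freezing
```

With these assumed inputs the warm month gives 40 mm per month, and any month at or below 0 °C gives zero potential evaporation.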
[10] Flow network and cell and basin areas were taken
from STN-30p [Vörösmarty et al., 2000b]. Monthly discharge time series were taken from 654 Global Runoff
Data Centre (GRDC) stations, in 254 basins, coregistered
in 2007 to the STN-30p network in the UNH/GRDC
composite runoff fields version 1.0 [Fekete et al., 2002]
(available at http://www.grdc.sr.unh.edu/). Gauge data
before 1901 were discarded. The Northern Hemisphere
monthly snow cover extent 0.5° × 0.5° latitude-longitude
data set by R. L. Armstrong (available at http://islscp2.sesda.com/ISLSCP2_1/html_pages/groups/snow/snow_cover_xdeg.html) provided temporal snow-cover data
(percentage of weeks in a month for which a cell is snow-covered to more than 50%) for 344 of the 654 GRDC
basins for 1986–1995.
2.2. Global Water Balance Model
[11] The WASMOD-M global water balance model [Widén-Nilsson et al., 2007] is a distributed version of the monthly
catchment model WASMOD by Xu [2002]. WASMOD-M
calculates snow accumulation and melt and actual evaporation
and separates runoff into a fast and a slow component for each
grid cell with a time step Δt of 1 month. The present model
version does not calculate, e.g., time-delayed routing and
reservoir operation, open-water evaporation, glacier melt,
and anthropogenic water abstraction. The model has five
tunable parameters (Table 1). The version used in this study
was the same as the one presented by Widén-Nilsson et al.
[2007] except for a slightly different formulation of evaporation and total runoff.
[12] The model allows snowfall, rainfall,
and snowmelt to occur simultaneously in the same month. Snowfall and
rainfall (sf and rf, mm month⁻¹) as well as snowmelt (sm,
mm month⁻¹) and snow accumulation (sp) vary exponentially
between temperature thresholds Tm and Ts, °C:
$$ \mathit{sf} = P \left(1 - e^{-\left[\{(T_a - T_s)/(T_s - T_m)\}\right]^2}\right), \tag{2} $$
$$ \mathit{rf} = P - \mathit{sf}, \tag{3} $$
$$ \mathit{sm} = (\mathit{sp}_{old}/\Delta t + \mathit{sf}) \left(1 - e^{-\left[\{(T_m - T_a)/(T_s - T_m)\}\right]^2}\right), \tag{4} $$
$$ \mathit{sp} = \mathit{sp}_{old} + (\mathit{sf} - \mathit{sm})\,\Delta t, \tag{5} $$
where P is precipitation (mm month⁻¹), Δt = 1 month, and
{x} means min(x,0).
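The snow routine in equations (2)–(5) can be sketched as a single monthly step. The Python below is an illustration under stated assumptions, not the original MATLAB implementation: the function names are hypothetical, the exponent is read as the negative square of the {x} term, and Ts is assumed to be strictly greater than Tm so that the division is defined.

```python
import math


def _threshold_fraction(x):
    """1 - exp(-{x}^2), where {x} = min(x, 0) as defined in the text."""
    x = min(x, 0.0)
    return 1.0 - math.exp(-x * x)


def snow_step(p, ta, sp_old, ts, tm, dt=1.0):
    """One monthly snow step (assumes ts > tm).

    p: precipitation (mm/month), ta: air temperature (deg C),
    sp_old: snowpack from the previous step (mm),
    ts, tm: snowfall and snowmelt temperature thresholds (deg C).
    Returns snowfall, rainfall, snowmelt (mm/month) and snowpack (mm).
    """
    sf = p * _threshold_fraction((ta - ts) / (ts - tm))                   # eq. (2)
    rf = p - sf                                                           # eq. (3)
    sm = (sp_old / dt + sf) * _threshold_fraction((tm - ta) / (ts - tm))  # eq. (4)
    sp = sp_old + (sf - sm) * dt                                          # eq. (5)
    return sf, rf, sm, sp


# Hypothetical months with thresholds Ts = 2 and Tm = -2 deg C
cold = snow_step(p=50.0, ta=-10.0, sp_old=100.0, ts=2.0, tm=-2.0)
warm = snow_step(p=50.0, ta=10.0, sp_old=100.0, ts=2.0, tm=-2.0)
```

In the cold month nearly all precipitation is stored as snow and nothing melts; in the warm month all precipitation falls as rain and almost the whole pack melts. Between the two thresholds snowfall, rainfall, and melt coexist.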
[13] The ‘‘land moisture’’ variable (lm, mm) represents
the storage of water available for evaporation and runoff in
the next time step. Other authors use ‘‘soil moisture’’ for
similar state variables, but we prefer lm to avoid the
incommensurability problem of the soil moisture point
measurements compared to the modeled, conceptual entity.
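Actual evaporation (evap, mm month⁻¹) is calculated from
land moisture, potential evaporation, and available water
(aw, mm month⁻¹):
$$ \mathit{aw} = \mathit{lm}_{old}/\Delta t + \mathit{rf} + \mathit{sm}, \tag{6} $$
$$ \mathit{evap} = \min\left\{ e_p \left[1 - A_c^{\,\mathit{aw}/e_p}\right],\ \mathit{aw} \right\}. \tag{7} $$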
The slow runoff (sr, mm month⁻¹) is a base flow, provided
by land moisture, whereas the fast runoff (fr, mm month⁻¹)
is provided by both land moisture and water added during a
time step. Both runoffs are described by linear reservoirs:
$$ \mathit{sr} = P_s \, \mathit{lm}_{old}, \tag{8} $$
$$ \mathit{fr} = P_f \, \mathit{lm}_{old} \, (\mathit{sm} + \mathit{rf}), \tag{9} $$
$$ \mathit{tr} = \min\{(\mathit{sr} + \mathit{fr}),\ (\mathit{aw} - \mathit{evap})\}, \tag{10} $$
where tr is total runoff. Finally, the land moisture storage is
updated:
$$ \mathit{lm} = \mathit{lm}_{old} + (\mathit{rf} + \mathit{sm} - \mathit{evap} - \mathit{tr})\,\Delta t. \tag{11} $$
The code was written in MATLAB, and simulations were
made on a PC with support from a parallel cluster. The
model warmup time period was 5 years. Initial values of
land moisture and snow (where potentially occurring) were
globally uniform.

Table 2. Number of Basins With at Least One Parameter Value Set Resulting in Simulated Results Fulfilling Different Evaluation
Criteria Limits for Nash Coefficient, Volume Error, Limit of Acceptability, and Snow Fit When Evaluated Against Monthly and Annual
Observations (a)

Criterion Type   Limit                   Name    Monthly Number   Monthly Percent   Annual Number   Annual Percent
NC               0.8                     NC0.8   157              24                103             16
NC               0.5                     NC0.5   479              73                363             56
NC               0                       NC0     642              98                560             86
VE               1%                      VE1     632              97                630             96
VE               20%                     VE20    643              98                643             98
VE               50%                     VE50    651              100               651             100
LA               ±75% of observations    LA75    580              89                650             99
LA               ±99% of observations    LA99    654              100               654             100
SF               0.95                    SF95    274              77                -               -
SF               0.75                    SF75    321              91                -               -
SF               0.50                    SF50    339              96                -               -

(a) There were 654 basins with runoff measurements and 344 with snow measurements. Percentages relate to these totals. NC is Nash coefficient, VE is
volume error, LA is limit of acceptability, and SF is snow fit.

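The land moisture part of the model, equations (6)–(11), can be sketched as one monthly step per grid cell. This Python version is only an illustration, not the authors' MATLAB code. The function name and example values are hypothetical, the rainfall term in equation (9) is taken to be rf, and evaporation is set to zero when potential evaporation is zero (an assumption made to avoid dividing by zero).

```python
def land_step(lm_old, rf, sm, ep, ac, ps, pf, dt=1.0):
    """One monthly land moisture step for a single cell.

    lm_old: land moisture from the previous step (mm)
    rf, sm: rainfall and snowmelt (mm/month), ep: potential evaporation (mm/month)
    ac, ps, pf: evaporation, slow-runoff and fast-runoff parameters
    Returns actual evaporation, total runoff (mm/month) and new land moisture (mm).
    """
    aw = lm_old / dt + rf + sm                        # eq. (6)
    if ep > 0.0:
        evap = min(ep * (1.0 - ac ** (aw / ep)), aw)  # eq. (7)
    else:
        evap = 0.0                                    # assumption: no evaporation if ep = 0
    sr = ps * lm_old                                  # eq. (8)
    fr = pf * lm_old * (sm + rf)                      # eq. (9), rainfall term read as rf
    tr = min(sr + fr, aw - evap)                      # eq. (10)
    lm = lm_old + (rf + sm - evap - tr) * dt          # eq. (11)
    return evap, tr, lm


# Hypothetical cell and parameter values
evap, tr, lm = land_step(lm_old=100.0, rf=50.0, sm=0.0, ep=60.0,
                         ac=0.5, ps=0.01, pf=0.001)
```

The cap in equation (10) keeps the storage from going negative: runoff can never exceed the water left after evaporation, so with very large Ps or Pf the store simply empties.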
2.3. Model Evaluation
[14] The split-sample method was used to calibrate and
validate the model, in which the first half of each discharge
time series was used for calibration and the second was used
for validation, and vice versa. Snow calibration was made
for the whole 1986 – 1995 period. It was validated by
comparison with a benchmark snow calibration driven by
the long-term mean 1986– 1995 climatology. Calibration
was made independently for each basin area with uniform
parameter value sets. Interstation runoff was not calculated
in nested basins. An upstream cell, belonging to the basins
of several downstream stations, could thus get several
different parameter value sets.
[15] Calibration was a search of all ‘‘behavioral’’ parameter value sets at each discharge station. Monte Carlo
simulations were made with the same sets for all basins.
Parameter values were sampled from uniform and logarithmic distributions within given ranges (Table 1) and were
randomly combined to generate 15,000 parameter value
sets.
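The sampling described above can be sketched with Python's standard library. This is a hedged illustration rather than the authors' procedure: the function name and the seed are hypothetical, the ranges follow Table 1 with Tm read as −4 to 0 and the runoff parameters as e⁻¹⁸ to e⁰ and e⁻¹⁴ to e⁰ (signs that are ambiguous in the extracted table), and "logarithmic" sampling is interpreted as uniform sampling of the natural logarithm.

```python
import math
import random

# (lower, upper, kind); for "logarithmic" the bounds are natural-log exponents.
# Negative signs are assumptions where the extracted Table 1 is ambiguous.
RANGES = {
    "Ts": (0.0, 4.0, "uniform"),
    "Tm": (-4.0, 0.0, "uniform"),
    "Ac": (0.0, 1.0, "uniform"),
    "Ps": (-18.0, 0.0, "logarithmic"),
    "Pf": (-14.0, 0.0, "logarithmic"),
}


def sample_parameter_sets(n, seed=0):
    """Draw n random parameter value sets, each parameter sampled independently."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n):
        values = {}
        for name, (lower, upper, kind) in RANGES.items():
            draw = rng.uniform(lower, upper)
            values[name] = math.exp(draw) if kind == "logarithmic" else draw
        sets.append(values)
    return sets


parameter_sets = sample_parameter_sets(15000)
```

Because the same 15,000 sets are reused for every basin, a fixed seed makes the comparison between basins reproducible.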
[16] Calibration was made against monthly snow observations and monthly and annual discharge observations.
Calendar year measurement averages were calculated from
a minimum of 10 months, and missing months were also
excluded from simulated averages. Snow was evaluated
with one criterion and runoff with three criteria. Evaluation
criteria were calculated separately for monthly and annual
time series. Validation was based on the same runoff criteria
as calibration.
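The annual aggregation rule can be sketched as follows. The Python below is an assumption-laden illustration: the function name and the dictionary layout keyed by (year, month) are hypothetical, and a missing observation is represented either by an absent key or by None.

```python
def annual_means(obs, sim, min_months=10):
    """Calendar-year means of observed and simulated discharge.

    obs, sim: dicts keyed by (year, month). A year is kept only if at least
    `min_months` observed months exist, and months missing in the
    observations are also dropped from the simulated average.
    Returns {year: (observed mean, simulated mean)}.
    """
    pairs_by_year = {}
    for (year, month), value in obs.items():
        if value is not None and (year, month) in sim:
            pairs_by_year.setdefault(year, []).append((value, sim[(year, month)]))
    result = {}
    for year in sorted(pairs_by_year):
        pairs = pairs_by_year[year]
        if len(pairs) >= min_months:
            n = len(pairs)
            result[year] = (sum(o for o, _ in pairs) / n,
                            sum(s for _, s in pairs) / n)
    return result


# Hypothetical data: 10 observed months in 2000, only 9 in 2001
obs = {(2000, m): 2.0 for m in range(1, 11)}
obs.update({(2001, m): 1.0 for m in range(1, 10)})
sim = {(y, m): float(m) for y in (2000, 2001) for m in range(1, 13)}
annual = annual_means(obs, sim)
```

In this example only the year 2000 is kept, and its simulated mean is taken over months 1 to 10, the same months that have observations.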
[17] We defined parameter value sets to be ‘‘behavioral’’
in two ways. A first, relative definition was the selection of
the best 1, 3.3, and 20% (150, 500, and 3000 of 15,000)
simulations for each criterion. A second definition used the
absolute limits for each criterion (Table 2).
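The two definitions of "behavioral" can be sketched as two small selection functions. This is an illustration under assumptions: the function names and the synthetic scores are hypothetical, and ties at the cut are broken by list order, which the text does not specify.

```python
def best_n(scores, n_keep, higher_is_better=True):
    """Relative definition: indices of the n_keep best simulations."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    return set(order[:n_keep])


def within_limit(scores, limit, higher_is_better=True):
    """Absolute definition: indices of simulations that meet a fixed limit."""
    if higher_is_better:
        return {i for i, s in enumerate(scores) if s >= limit}
    return {i for i, s in enumerate(scores) if s <= limit}


# Hypothetical Nash coefficients for 15,000 parameter value sets
nc_scores = [i / 15000 for i in range(15000)]
best_1_percent = best_n(nc_scores, 150)       # 150 of 15,000
best_20_percent = best_n(nc_scores, 3000)     # 3000 of 15,000
nc_above_08 = within_limit(nc_scores, 0.8)    # absolute limit NC0.8
```

For the volume error, the absolute value of VE would be passed with higher_is_better=False, since smaller errors are better. The relative definition always returns a fixed number of sets, whereas the absolute definition may return none for a poorly simulated basin.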
[18] Since the snow in WASMOD-M is independent of
land moisture and evaporation, it was evaluated separately.
The snow criterion faced conceptual problems in both
measurements and simulations. The simulated snowpack
represents an amount on the last day of the month, available
for melting in the following month. The Northern Hemisphere snow measurements give no information on amounts,
only the percentage of weeks in a month for which a cell is snow
covered to more than 50%. Only months with snow during
100% of the time are guaranteed to have snow on the last
day. Months with 0% snow-covered (i.e., <50% cover) time
were assumed to represent no-snow conditions unless they
occurred in winter (December – February) or adjacent to a
month with a spatial snow coverage above 50%. Given
these conceptual limitations, the snow-fit criterion was
based on measured and simulated snow periods:
$$ SF = \min\left( \frac{\sum_{cells} \mathit{smo}_{correct}}{\sum_{cells} \mathit{smo}_{tot}},\ \frac{\sum_{cells} \mathit{nsmo}_{correct}}{\sum_{cells} \mathit{nsmo}_{tot}} \right), \tag{12} $$
where smocorrect is the number of months with simulated
snow fitting months with measured snow and nsmocorrect is
the number of simulated months with no snow fitting
measured snow-free months. The total number of measured
snow-covered and snow-free months is given by smotot and
nsmotot. Everything is summed for all cells in a basin. SF
varies from a perfect fit at 1 to no fit at 0. A simulated
snowpack of at least 1 mm was required to count as snow cover.
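The snow-fit criterion in equation (12) can be sketched for one basin as below. The Python is illustrative only. The function name and input layout are hypothetical: each cell-month carries an observed flag (True for snow, False for snow free, None for months that cannot be classified) and a simulated snowpack in mm. How the original code treats a basin with no observed snow months or no observed snow-free months is not stated, so ignoring the undefined ratio is an assumption.

```python
def snow_fit(obs_flags, sim_snowpack_mm, min_snow_mm=1.0):
    """Snow fit SF for all cell-months of one basin.

    obs_flags: True (snow), False (no snow) or None (not classified)
    sim_snowpack_mm: simulated snowpack for the same cell-months (mm)
    """
    smo_tot = nsmo_tot = smo_correct = nsmo_correct = 0
    for observed, snowpack in zip(obs_flags, sim_snowpack_mm):
        if observed is None:
            continue
        simulated_snow = snowpack >= min_snow_mm  # at least 1 mm counts as snow
        if observed:
            smo_tot += 1
            smo_correct += int(simulated_snow)
        else:
            nsmo_tot += 1
            nsmo_correct += int(not simulated_snow)
    ratios = []
    if smo_tot > 0:
        ratios.append(smo_correct / smo_tot)
    if nsmo_tot > 0:
        ratios.append(nsmo_correct / nsmo_tot)
    # Assumption: an undefined ratio is ignored; None if nothing can be evaluated.
    return min(ratios) if ratios else None


# Hypothetical cell-months: one snow month and one snow-free month are missed
sf_value = snow_fit([True, True, False, False, None], [5.0, 0.2, 0.0, 3.0, 9.0])
```

Taking the minimum of the two ratios prevents a parameter set from scoring well by always simulating snow or never simulating snow.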
[19] The Nash coefficient (NC), calculated from the
discharge time series, and the volume error (VE), calculated
from the long-term average discharge, are the most widely
used criteria for discharge:
$$ NC = 1 - \frac{\sum_{time} (d_{obs} - d_{sim})^2}{\sum_{time} (d_{obs} - \bar{d}_{obs})^2}, \tag{13} $$
$$ VE = \frac{\sum_{time} d_{sim} - \sum_{time} d_{obs}}{\sum_{time} d_{obs}}, \tag{14} $$
where dobs is observed discharge, dsim is simulated
discharge, and the overbar denotes the time mean.

Figure 1. Runoff performance for (left) River Sénégal at Bakel and (right) River Ob at Salekhard for
combinations of values of the evaporation (Ac) and slow-flow (Ps) and fast-flow (Pf) parameters.
Calibration was made with the Nash criterion (NC) against monthly and annual observations
and with the absolute value of the volume error (%) for the calibration time period (1904–1951 at Bakel
and 1930–1961 at Salekhard). Values for the 500 (3.3%) best sets are shown as large dots.

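The two discharge criteria in equations (13) and (14) translate directly into code. The sketch below uses hypothetical function names and made-up discharge values, and assumes that the observed and simulated series are already aligned in time with no missing values.

```python
def nash_coefficient(d_obs, d_sim):
    """Equation (13): 1 minus squared errors over variance of observations."""
    mean_obs = sum(d_obs) / len(d_obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(d_obs, d_sim))
    variance_sum = sum((o - mean_obs) ** 2 for o in d_obs)
    return 1.0 - squared_error / variance_sum


def volume_error(d_obs, d_sim):
    """Equation (14): relative error in the total (long-term) volume."""
    return (sum(d_sim) - sum(d_obs)) / sum(d_obs)


# Hypothetical monthly discharge series
observed = [1.0, 2.0, 3.0, 4.0]
flat = [2.5, 2.5, 2.5, 2.5]            # simulation equal to the observed mean
nc_flat = nash_coefficient(observed, flat)
ve_flat = volume_error(observed, flat)
```

The flat simulation has no volume error but a Nash coefficient of zero, which illustrates why VE alone confines the parameter space much less than NC: any simulation with the right long-term volume satisfies VE regardless of its dynamics.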
[20] The limit of acceptability (LA) criterion is presented
by Beven [2006]. It requires a modeler to predefine acceptable simulation errors on the basis of ‘‘effective observation
error’’ of input data and discharge measurements. These
limits can vary in time. Simulated runoff that falls within the
acceptable limits at a given point in time is weighted with a
triangular or a trapezoidal function where a simulation close
to the measured discharge is given 100% weight, whereas a
simulation outside limits is given zero weight. The choice of
predefined error limits was not obvious in our case, and we
started with subjective, wide limits. This was motivated by
the facts that GRDC do not generally report rating curve
errors and that the model-input data are uncertain. A
symmetrical, triangular weighting function (with a zero
minimum and a unit maximum) was used, and LA was
calculated as a time average.
[21] LA was defined by a range around the measured flow
that simulated flows had to meet at least 95% of the time;
that is, less than 3 months in 5 years was allowed to fall
outside of range. The initial range (LA75) was given as
±75% of the flow at each time step plus 3 mm to avoid high
relative low-flow errors. If a sufficient number of simulations did not meet this criterion, when selecting the 1 – 20%
best, we widened the range to ±99% of the flow plus 3 mm
(LA99). If this was not enough, we widened the range until
we obtained the required number of simulations (LAmax).
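One possible reading of the LA criterion is sketched below. It is an interpretation, not the authors' implementation: the function name is hypothetical, the half-width of the acceptable range is assumed to be the relative range times the observed flow plus the 3-mm offset, the triangular weight is assumed to fall linearly from 1 at the observation to 0 at the range limit, and the 95% rule is applied to the share of time steps with nonzero weight.

```python
def limit_of_acceptability(d_obs, d_sim, rel_range=0.75, offset_mm=3.0,
                           min_share_inside=0.95):
    """Return (time-averaged triangular weight, whether the 95% rule is met).

    d_obs, d_sim: observed and simulated runoff (mm per time step)
    rel_range: 0.75 for LA75, 0.99 for LA99
    """
    weights = []
    for o, s in zip(d_obs, d_sim):
        half_width = rel_range * o + offset_mm           # assumed range definition
        weights.append(max(0.0, 1.0 - abs(s - o) / half_width))
    share_inside = sum(1 for w in weights if w > 0.0) / len(weights)
    score = sum(weights) / len(weights)
    return score, share_inside >= min_share_inside


# Hypothetical series: 20 time steps, one simulated value far outside the range
observed = [10.0] * 20
simulated = [10.0] * 19 + [100.0]
la_score, la_accepted = limit_of_acceptability(observed, simulated)
```

With one of 20 time steps outside the range, exactly 95% of the steps are inside, so this simulation is still accepted; a second outlier would reject it. Widening the range to LA99 corresponds to calling the function with rel_range=0.99.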
3. Results
[22] WASMOD-M simulations exhibited a wide range of
results from poor to excellent. Examples shown below and
in the auxiliary material were selected to obey two criteria:
(1) locations should be reported for other global models,
and (2) results should represent typical cases, not just good
or bad.¹
¹ Auxiliary materials are available in the HTML. doi:10.1029/2007WR006695.
3.1. Time Aggregations
[23] Equifinality of both runoff and evaporation parameters increased when successively calibrated against monthly,
annual, and long-term average runoff (Figure 1). The trend
was very clear when going from annual to long-term average
aggregation but less clear when going from monthly to
annual aggregation. Equifinality decreased in some cases
from monthly to annual aggregation, especially for the
evaporation parameter. It was also evident that LA75 was
too permissive in combination with annual data (Table 2).
3.2. Parameter Behavior
[24] WASMOD-M showed a very good snow performance for many basins (Figure 2). SF values above 0.95
were found for 77% of all 344 snow observation basins
(Table 2), and 35% of the 344 had SF above 0.95 for all
parameter value sets. The greatest difficulty in simulating
snow observations correctly was found in basins with only
occasional snow cover. Snow calibration commonly confined the snow parameter space (Figure 3). The constriction
was clear for Ts but less so for Tm. The maximum SF values
were higher with the normal snow calibration for 40% of the
basins compared to the benchmark calibration. No improvement was seen in another 40%, mainly because SF = 1 with
both calibrations in these mainly small basins. Ts was better
confined with the normal snow calibration than with the
benchmark calibration for 2/3 of the basins, while 1/3 were
better confined with the benchmark. Tm was better confined
with the normal snow calibration than the benchmark
calibration for 47%, while 40% were better confined with
the benchmark and 13% were equally well confined with
both calibrations.

Figure 2. Observed and simulated snow properties for a 0.5° × 0.5° cell in the River Ob basin. Snow
observations give percentage of weeks when snow covered more than 50% of the area. Solid squares
indicate months when the cell was snow covered to 100%, and open squares indicate months completely
without snow. The top row of squares shows measured data, and the bottom row shows simulated data.

[25] The behavioral space for the evaporation parameter
(Ac) differed between snow-covered and dry, warm basins.
The evaporation parameter space was better confined for
dry and warm basins with all criteria, while the runoff
parameter (Ps, Pf) spaces were better confined for snow-covered basins, particularly by VE (Figure 1). Runoff
parameters were better confined for nonsnow basins by
the 1% best NC and LA values but for snow-covered basins
with the 20% best NC and LA values.
3.3. Criteria Relationships
[26] It was possible to meet the VE criterion within 1%
for almost all basins (Tables 2 and S1) during the calibration
periods. All basins (with one exception) where this criterion
could not be met had runoff coefficient problems (too-high
runoff compared to precipitation). It was also possible for
many basins to fulfill the LA75 criterion (Table 2). A very
high number of parameter value sets fulfilling LA75 were
found for warm and dry basins, whereas a smaller number
was found for snow-covered basins. This had to do with the
generous ±3-mm limit that was too high for some arid
basins with annual runoff sometimes below 3 mm. The
Nash criterion was the most demanding, and fewer than a
quarter of all basins reached NC above 0.8 when calibrated
against monthly data (Table 2).
[27] Parameter value sets that simultaneously fulfilled all
monthly criteria within their strictest limits (NC0.8, VE1,
LA75, and SF95; see Table 2) were found for 57 basins (9%
of all), none of which had runoff coefficient problems.
Almost half were located in Africa, likely because SF was
not used there and LA75 was too generous a limit in dry
basins. The NC performance was also high in Africa.
Parameter value sets that simultaneously fulfilled the
monthly NC, VE, and SF (where applicable) criteria at their
second (NC0.5, VE20, SF75) and third (NC0, VE50, SF50)
levels together with LA75 could be found for 52 and 84% of
all basins, respectively.

Figure 3. Runoff (Nash criterion (NC)) and snow (snow fit (SF)) performance for a range of Ts and Tm
values for River Ob at Salekhard. The best 3.3% (500 of 15,000) parameter value sets for each criterion
as well as 20 common value sets among the best are highlighted.

Figure 4. Pairwise comparison of three runoff criteria (Nash (NC), volume error (VE), and limit of
acceptability (LA75)) during calibration for (top) River Sénégal at Bakel and (bottom) River Ob at
Salekhard. The dashed and dash-dotted lines delineate the best 500 (3.3%) and 150 (1%) value sets,
respectively, for each parameter. Sets that are common for all three criteria are highlighted in black (best
3.3%) and are encircled (best 1%).

[28] It was possible to find good runoff parameter value
sets concurrent with the best 1% snow parameter value sets
for almost all snow-covered basins. It was also possible to
find common sets between the 1% best of NC and VE for
99% of all basins. NC and LA behave similarly in parameter
space, but common sets between the 1% best of them were
found for only 83% of all basins. Common sets between the
1% best VE and LA were found for 52% of all basins. When
pairwise common parameter value sets were found, the
largest number was found for NC-LA followed by NC-VE.
The smallest number of combinations was found between
VE and LA (Figure 4). The number of pairwise common
parameter value sets for snow-covered basins was usually
smaller than for nonsnow basins, especially for NC-VE.
Although the snow calibration narrowed the simulated
runoff range, the reduction was not proportional to the
reduction in the number of behavioral parameter value sets,
and the average runoff time series produced by the confined
and nonconfined sets were almost equal (Figure 5). The less
confined parameter space for VE compared to NC is clearly
seen in the range of hydrographs (Figure 6 and auxiliary
material). The average time course is also different between
the two, with NC simulations giving higher emphasis to
peak flows.
3.4. Model Performance
[29] The best parameter value sets identified during the
calibration period always produced validation results among the best for all criteria (Figure 7). The best NC values
decreased on average by 0.28 units during validation
(Figure 8). The average validation performance could be
better or worse, and the performance relation was seldom
one to one even if good calibrations led to good validations
(see Sénégal River in Figure 7). It was also common,
especially for VE, that the 1% best calibrations were not
found among the 1% best validations. The overlap between
the best parameter value sets from calibrations of the
two periods was on average 80% for sets selected from the
best 20% monthly NC values and 44% for the best 1%
monthly NC values. The overlap was better for monthly
than for annual calibration. Common sets were found for all
basins among the best 20% NC values, while the 1% best
had total misses for 11% of the basins.

Figure 5. Ob River runoff at Salekhard. The thick black
line gives observations. The light gray area delineates runoff
simulated with the 500 (3.3%) best parameter value sets
according to the Nash criterion. The dark gray area
delineates runoff simulated with 20 common parameter
value sets giving the best 3.3% fit for both the Nash and the
snow-fit criteria. The thin black line gives the average of the
best simulations (indistinguishable between Nash and
combined Nash and snow-fit criteria). Runoff calibration
for this basin was 1930–1964, and snow calibration was
1986–1995.

W05418
seasonality, should require somewhat less data, and 30 years
should suffice. This was available for only 21% of the
stations, whereas 27% had fewer than 10 annual data points.
The LA criterion should theoretically be comparable over
time aggregations if time series span at least 20 years to
make the 95% time limit meaningful. Less than half of the
stations fulfilled this requirement. The selected LA limits
should have been more restrictive for annual than for
monthly data to be comparable.
[32] Döll et al. [2003] use NC on annual runoff, whereas
other authors choose other annual criteria. Hunger and Döll
[2008] use the coefficient of determination, Mouelhi et al.
[2006] use an RMS error normalized by precipitation, and
Bari et al. [2005] use correlation coefficients. Jothityangkoon
et al. [2001] and Eder et al. [2003] compare measured and
simulated annual exceedance probability. Schaefli and Gupta
[2007] point out that an NC benchmark is needed since
almost any model can deliver high NC values for some
stations, whereas not even the best models are successful in
other basins. One of their proposed benchmarks, weighting
NC with precipitation, could be worth exploring for annual
data. An LA criterion, with specific limits for the annual
time step, might also be developed.
[33] The equifinality of runoff parameter value sets (Ps,
Pf) generally increased with increased time aggregation
(Figure 1). Monthly and annual time steps often gave
similar peaks in parameter space, whereas the long-term
average runoff hardly confined the space at all. The sharpest
peaks (sometimes more visible when integrated to give
probability densities) were most often seen for the shortest
time step. The behavioral monthly NC parameter value sets
for the two runoff parameters often coincided with the
annual NC sets. It was easy to find parameter value sets
that also obeyed the long-term VE criterion, although they
were not always among the 1% or 3.3% best sets. Common
4. Discussion
[30] A discussion of a global water balance model must
focus on general patterns, not on specific details. Unexpected results for single basins cannot be explored in detail
because of limited and uncertain information about individual basins.
4.1. Time Aggregation
[31] We selected the same runoff criteria for monthly and
annual aggregations, but no criterion could be directly
compared between aggregations. The volume error should
be invariant to time aggregation but differed slightly since
some monthly observations were excluded from the annual
aggregation. The Nash criterion was not comparable
between time aggregations for two reasons. NC values were
lower for the annual aggregation because the annual runoff
variability was lower. Annual NC values were also less
certain because the number of observations was lower. Xu
and Vandewiele [1994] show that the WASMOD catchment
model requires 10 years of data for a robust calibration in
humid climate. More than 10 years of monthly calibration
data (after dividing all time series into halves) were available for 76% of the discharge stations, whereas 2% provided
less than 5 years of data. Annual values, not affected by
Figure 6. Sénégal River runoff at Bakel for the last 10
years of the calibration period 1904 – 1951. The thick black
line gives observations. The light gray area delineates runoff
simulated with the 500 (3.3%) best parameter value sets
according to the volume error criterion, and dark gray areas
delineate simulation with the Nash criterion. The thin black
line gives the average of the best Nash simulations, and the
dotted line gives the average of the best volume error
simulations.
8 of 14
W05418
WIDÉN-NILSSON ET AL.: GLOBAL HYDROLOGICAL MODEL PERFORMANCE
W05418
Figure 7. Modeled runoff performance for validation (second half) versus calibration (first half of
observation) periods for three evaluation criteria (Nash (NC), limit of acceptability (LA), and volume
error (VE)). Shown are performance for (top) Sénégal River at Bakel (observations 1904 –1989) and
(bottom) Ob River at Salekhard (observations 1930– 1999). Simulations based on the best 150 (1%)
calibrated parameter value sets are highlighted, and the thin line gives the one-to-one relation. The best
calibrated parameter value sets for Ob River not fulfilling the 95% limit for LA75 during the calibration
period are marked with crosses.
parameter value sets selected on the basis of NC were more
frequent than those based on LA.
[34] The evaporation parameter Ac showed a complex
behavior. Ac sometimes exhibited more equifinality with NC
and LA for shorter than for longer time aggregations. This
complex behavior is difficult to assess without a reliable
global evaporation database.
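The dependence of the Nash criterion on time aggregation noted in this section can be illustrated with a minimal Python sketch. It is not part of WASMOD-M. It uses the standard Nash-Sutcliffe form of NC, and the synthetic runoff series, the function names (nash_criterion, annual_totals), and the assumption of complete 12-month years are hypothetical choices made only for the example.

```python
import numpy as np

def nash_criterion(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def annual_totals(monthly):
    """Sum a monthly series to annual totals (assumes complete years)."""
    monthly = np.asarray(monthly, dtype=float)
    return monthly.reshape(-1, 12).sum(axis=1)

# Synthetic "observed" runoff (mm/month): a fixed seasonal cycle scaled by
# a random factor per year, i.e., large seasonal but small annual variability.
rng = np.random.default_rng(0)
n_years = 20
seasonal = 50.0 + 40.0 * np.sin(2.0 * np.pi * np.arange(12) / 12.0)
year_factor = rng.normal(1.0, 0.1, n_years)
obs = (year_factor[:, None] * seasonal[None, :]).ravel()

# Synthetic "simulation": reproduces the seasonal cycle but has no
# interannual variation at all.
sim = np.tile(seasonal, n_years)

nc_monthly = nash_criterion(obs, sim)
nc_annual = nash_criterion(annual_totals(obs), annual_totals(sim))
print(f"monthly NC = {nc_monthly:.2f}, annual NC = {nc_annual:.2f}")
```

In this constructed case the monthly NC is high because the seasonal cycle dominates the observed variance, whereas the annual NC cannot exceed zero because the simulated annual totals are constant. Real records with missing months would require incomplete years to be excluded before aggregation.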
4.2. Parameter Behavior
[35] The SF criterion did not account for the start and end
of the snow period because of the incommensurability
problem between modeled and observed snow data. Still,
the use of monthly precipitation and temperature time series
in the normal calibration generally improved the results
compared to the climatological benchmark snow calibration. The equifinality of Tm indicated that it might be
discarded in future model versions. It is possible that this
finding can be challenged when high-resolution MODIS
snow cover data, which exist from 2000, can provide a
better evaluation data set (National Snow and Ice Data
Center, http://nsidc.org/data/modis/faq.html).
[36] The evaporation parameter Ac was normally more
confined by all runoff criteria for non–snow covered than
for seasonally snow covered basins. The VE criterion
confined Ac for 30% of the snow basins and for 70% of
the nonsnow basins, primarily for basins with a high runoff
coefficient. This constriction was therefore always toward
high Ac values giving low evaporation. The VE constriction
always acted to remove low Ac values but never high ones.
The NC and LA criteria could confine both high and low Ac
values. All runoff criteria acted oppositely on Ps and Ac
such that a smaller range of evaporation mostly coincided
with a wider base flow range and vice versa. The NC and
LA criteria successfully confined the Ps and Pf runoff
parameter space in its upper part and often also confined
their values to a small range, while VE often left these
parameters undetermined (Figure 1).
4.3. Criteria Relationships
[37] Dunn and Colohan [1999] and Udnæs et al. [2007]
show the importance of multiobjective calibration against
snow data to get a better internal model structure even if
simulated runoff is not improved. State variables updated
with remotely sensed snow cover can marginally improve
simulated streamflows, but their importance increases in
areas with seasonally variable snow cover [Andreadis and
Lettenmaier, 2006; Clark et al., 2006]. These findings are
similar to ours, where a well-simulated snow cover only
affected runoff to a small extent. The snow parameters were
confined almost only by the snow criterion (Figure 3) but in
a few cases also slightly by NC.
[38] The three runoff criteria had a complex interrelationship that depended on the relative criteria limits and the
presence or absence of snow. Among the runoff criteria, NC and
LA mostly gave similarly confined selections of behavioral
parameter value sets. NC was commonly more restrictive
than LA for nonsnow basins. VE results were commonly
least confined, i.e., produced most equifinality. It was
obviously more difficult to find pairwise common runoff
parameter value sets when tighter limits were put on
each criterion (Figure 4). It was possible to get combinations of behavioral sets between NC and VE for all basins.
The simultaneous requirements of LA and VE were seldom
met in basins with bad LA performance and in dry basins
Figure 8. (top) Best Nash (NC) performance of 15,000
tested parameter value combinations for 654 gauged basins
during calibration in the first half of the measured discharge
time series and (bottom) the corresponding NC values
during validation in the second half of the measured period.
The results refer to the whole upstream area for each
catchment but are shown only for their interstation
coverage.
where the 3-mm limit of LA was too permissive. For basins
where it was not possible to reconcile LA and NC, it was
also impossible to reconcile LA and VE. NC was the easiest
criterion to reconcile with any of the other criteria. The fact
that different criteria confined the parameter space differently is supported by earlier findings, e.g., Madsen [2000]
and Chahinian and Moussa [2007], who selected Pareto-optimal parameter values. Chahinian and Moussa [2007,
p. 1032] point out that ‘‘. . . the calibrated parameter values
are dependent on the type of criteria used. Significant tradeoffs are observed between the different objectives: no unique
set of parameter is able to satisfy all objectives simultaneously.’’ A similar conclusion is also drawn by Madsen
[2000]. We made tests to find common parameter value
combinations among the 1–33% best parameter value combinations for each criterion. Rather low performances had to
be accepted to find parameter value combinations within the
best ranges of all criteria. We thus instead suggest a stepwise
approach taking one criterion after the other.
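The search for common parameter value combinations among the best fractions of each criterion can be sketched as a set intersection. The Python sketch below is a hypothetical illustration with random scores, not the code used in this study. The helper names (best_indices, common_sets) and the use of the absolute volume error as a lower-is-better score are assumptions.

```python
import numpy as np

def best_indices(scores, fraction, higher_is_better=True):
    """Indices of the best `fraction` of parameter value sets for one criterion."""
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(fraction * scores.size)))
    order = np.argsort(-scores if higher_is_better else scores)
    return set(order[:n_keep].tolist())

def common_sets(criteria, fraction):
    """Indices that are within the best `fraction` for every criterion.

    `criteria` maps a criterion name to (scores, higher_is_better).
    """
    selections = [best_indices(scores, fraction, hib)
                  for scores, hib in criteria.values()]
    return set.intersection(*selections)

# Random stand-ins for the scores of 15,000 Monte Carlo parameter value sets.
rng = np.random.default_rng(1)
n_sets = 15000
nc = rng.uniform(-1.0, 1.0, n_sets)      # Nash criterion, higher is better
abs_ve = rng.uniform(0.0, 50.0, n_sets)  # |volume error| in %, lower is better
criteria = {"NC": (nc, True), "VE": (abs_ve, False)}

for fraction in (0.01, 0.033, 0.33):
    n_common = len(common_sets(criteria, fraction))
    print(f"best {fraction:.1%}: {n_common} common sets")
```

Because the random scores here are independent, few or no sets are shared at the strictest fractions, which mirrors the observation that rather low performances had to be accepted before common combinations appeared. With real model output the scores of different criteria are correlated, so the counts would differ.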
[39] We found a few geographical criteria patterns. One
was that almost no European basin fell within the NC0.8
limit (Figure 8), possibly because of the too-high precipitation correction factors in Europe [Arnell, 1999; Döll et al.,
2003]. Some Alpine catchments also have problems with
nonstationarity of their state variables. Further investigation
is needed to determine whether this is caused by retreating glaciers or something else. The high NC values in Africa might be surprising, given the usually lower data quality. Our modeling
experience, however, has shown that NC values are usually
higher in Africa and south Asia, where yearly discharge
has one or two distinct flood periods and a dry period,
compared to Europe, where the annual pattern can be more
complex.
[40] Widén-Nilsson et al. [2007] based their WASMOD-M
analysis on 1680 parameter value sets and found it difficult
to achieve very good fit in some cases. In this study we used
15,000 sets and had few problems in identifying ‘‘good’’
sets. It is likely that a still larger sample would allow the
identification of a larger number of good parameter value
sets and common parameter value sets between different
criteria. We do not believe, however, that analysis of, e.g.,
one million sets would greatly alter the parameter value
behavior or the relation between criteria.
4.4. Model Performance
[41] In an ideal setting, validation should be performed
with split-sample, proxy basin tests in a stationary climate
within basins of time-invariable land cover and flow network. Such conditions can be met for individual catchments
but not in a global basin analysis since human interventions
during the 20th century have transformed the majority of the
world’s basins. Many runoff records were systematically
less affected by human activity during early calibration than
during later validation periods. This nonstationarity, including climatic variations and possible changes in meteorological station network and data quality, can explain why
sometimes all parameter value sets gave much better values
for one criterion on one time period than on the other. It also
limited the possibility of drawing detailed conclusions from
the validation tests. The nonstationarity was also shown by
calibrations performed on the second period. The high
overlap between the best parameter value sets of the two calibration periods, combined with the general agreement between
high-calibration and high-validation performance for all criteria (Figure 7), showed that WASMOD-M was robust under
these conditions. It was also shown that WASMOD-M
could be calibrated to ±1% volume error for all basins
except those with runoff coefficient problems. It was often
also possible to find parameter value sets producing good
dynamics within those sets that produced small volume
errors.
[42] Validation was hampered not only by nonstationarity
but also by well-known data problems [Widén-Nilsson et al.,
2007]. Error-prone data fed into an otherwise ‘‘perfect’’
model can create substantial equifinality. Our combination
of precipitation and runoff data sets produced too-large
runoff in relation to precipitation for 4% of all basins. Such
problems were found, e.g., in some Alaskan basins and the
headwaters of the Ganges-Brahmaputra and the neighboring
Irrawaddy. Data problems, such as shifts in discharge
pattern after damming and strange discharge patterns possibly caused by unit conversion problems or changes in rating
curves, were found in another 4% of all basins. Several
problem basins could be well simulated, and the most
obvious consequence was the forcing of the evaporation
parameter toward its upper limit for basins with too-high
runoff coefficients. The usage of the maximum potential
evaporation from two databases gave too-high potential
evaporation in some regions. The dryness index (quotient of
annual potential evaporation to annual precipitation)
exceeded 10, meaning hyperarid and superarid [Ponce et
al., 2000], during the calibration period for as many as 4%
of all basins. The dryness indices were compared at the five
basins presented by Riebsame et al. [1995], and four basins
clearly exceeded the values by Riebsame et al. [1995]. The
too-high dryness was likely caused by a combination of too-high potential evaporation and too-low precipitation. Low
precipitation was likely the main problem in basins with
high runoff coefficients.
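The data checks described in this paragraph can be sketched in Python as a simple basin screening. The dryness index follows the definition given above (annual potential evaporation divided by annual precipitation) with the limit of 10. The runoff coefficient limit of 1.0, the function names, and the example values are assumptions made for illustration and are not taken from the study.

```python
def dryness_index(annual_pet_mm, annual_precip_mm):
    """Quotient of annual potential evaporation to annual precipitation."""
    return annual_pet_mm / annual_precip_mm

def runoff_coefficient(annual_runoff_mm, annual_precip_mm):
    """Quotient of annual runoff to annual precipitation."""
    return annual_runoff_mm / annual_precip_mm

def screen_basin(annual_precip_mm, annual_pet_mm, annual_runoff_mm,
                 dryness_limit=10.0, runoff_coeff_limit=1.0):
    """Return a list of data-problem flags for one basin.

    runoff_coeff_limit=1.0 is an assumed threshold for "too-large runoff
    in relation to precipitation"; dryness_limit=10.0 follows the text.
    """
    flags = []
    if runoff_coefficient(annual_runoff_mm, annual_precip_mm) > runoff_coeff_limit:
        flags.append("runoff too large relative to precipitation")
    if dryness_index(annual_pet_mm, annual_precip_mm) > dryness_limit:
        flags.append("dryness index above limit")
    return flags

# Hypothetical basins: (precipitation, potential evaporation, runoff) in mm/a.
print(screen_basin(500.0, 6000.0, 100.0))  # dryness index 12
print(screen_basin(800.0, 600.0, 900.0))   # runoff coefficient above 1
print(screen_basin(800.0, 600.0, 300.0))   # no flags
```

Such a screening only identifies suspicious combinations of input and validation data. It cannot tell whether the precipitation, the potential evaporation, or the discharge record is at fault.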
[43] Model problems were identified for a small number
(12, less than 2%) of basins where no parameter value set
produced NC above zero. Five of these basins were dominated by large lakes (Great Lakes Saint Lawrence River
basin, Owen reservoir in the Victoria Nile, and the Narva
River downstream of Lake Peipsi). The model treated all
basins as land and did not account for delays caused by
large lakes. WASMOD-M only excluded outlet lakes like
the Black Sea, the Caspian Sea, and the Aral Sea from the
land area.
5. Conclusions
[44] Most global water balance models are evaluated
against long-term average runoff, and results are also
presented in the form of reasonable within-year dynamics.
This study showed that WASMOD-M could always be
calibrated to a very small volume error (long-term average
runoff) for basins where input data were not found to be
unreasonable and where the model assumptions were not
obviously wrong. Calibration of WASMOD-M against
measured snow properties reduced equifinality of the snow
parameters. Runoff calibration against monthly and annual
time series with the NC or LA criteria was expected to provide a model with good dynamical behavior and increased
sensitivity to routing and damming, whereas runoff calibration against the long-term average runoff, i.e., the volume
error, was expected to provide a model with a less consistent
dynamic behavior but a smaller sensitivity to routing and
damming. Results only partly confirmed this picture. Evaluation against annual data was difficult because time series
were normally too short for generally accepted criteria. Calibration against monthly as opposed to annual data sometimes provided more equifinality, and calibration against
long-term average runoff mostly produced considerable
equifinality. This confirmed the concern of Widén-Nilsson
et al. [2007] that WASMOD-M was overparameterized
when evaluated against long-term average runoff but not
against monthly time series, with the possibility that the
snow algorithm can be simplified to use only one parameter.
The somewhat ambiguous evaluations against monthly and
annual observations as well as the model failure to mimic
seasonal dynamics if influenced by large lakes provided
incentives to develop a routing algorithm for WASMOD-M.
[45] Overparameterization is one of three interrelated
factors that cause equifinality and uncertainty in model
results. The other two relate to the quantity and quality of
input and validation data. The relation between input data
and equifinality was reasonably straightforward since our
input data were insufficient to confine all parameters. The
data errors increased the uncertainty of the modeling components that were not controlled by the objective function
and therefore enhanced the equifinality of the related model
parameters. Concerning overparameterization, things were
more complicated. This is because hydrological models are
of diverse forms, so there is no simple relation between the
number of parameters and the overparameterization of a
model, although a large number of parameters usually
increases the risk of overparameterization. Hornberger et
al. [1985] point out the great danger of overparameterization if a modeler attempts to simulate all hydrological
processes considered relevant and fit those parameters by
optimization against an observed discharge record. This is
because overparameterization is not only a problem of
model structure but is also related to data problems. The
WASMOD-M philosophy follows Beven [1989, p. 159],
who concludes that it ‘‘appears that three to five parameters
should be sufficient to reproduce most of the information in
a hydrological record.’’ Given the inadequacy and inaccuracy of input data, simple models that capture the essential
features should be preferable to complex models that are
designed to simulate a large number of processes. Overparameterization and equifinality are caused by a lack of
input data and data that are poor or modest representations
of their real-world entities.
[46] Since even the parsimonious WASMOD-M showed
signs of being overparameterized, it can be questioned
whether other, less parsimonious global water balance
models might also be overparameterized. Although WASMOD-M has the highest number of calibrated parameters,
other models (except the one by Kleinen and Petschel-Held
[2007]) have a much higher total number of parameters.
Noncalibrated parameters can suffer from large errors introduced by the physiographic data sets [Hannerz and Lotsch,
2006; Peel et al., 2007] used to estimate these parameters. Despite such problems, we are aiming at making use
of further data sets to possibly reduce the number of calibrated parameters in WASMOD-M. The uncertainty in data
could possibly be decreased by using several data sets of the
same entity.
[47] The generation and analysis of behavioral parameter
value sets were done to analyze the combined model and
data uncertainty in WASMOD-M. It was also done to
generate a basis for regionalization and to define behavioral
sets to be used in further model applications. VE was not
enough to confine the parameter space and had to be
accompanied by other criteria. Only 57 basins had parameter value sets that simultaneously fulfilled all monthly
criteria within their strictest limits. A stepwise criteria
application to select good parameter value sets was an
alternative to the search for common parameter value sets
among the best ones for each criterion. We suggest that
snow calibration should be a first, independent step to
confine parameter space before applying other criteria, since
snow simulation is independent of the runoff simulation in
WASMOD-M, and possibly other models, and since all land
surface above 40° northern latitude has seasonal snow
cover, and about 50% of the Northern Hemisphere runoff
comes from snowmelt. The NC and LA criteria gave similar
results, but LA needs further elaboration before use in global
modeling. Since it was always possible to find behavioral
parameter value sets fulfilling the VE criterion, we suggest
(as do Demaria et al. [2007] and Schaefli et al. [2005]) that
runoff calibration should start by confining the model to
those sets that give good fit to long-term average runoff.
The parameter space should then be further confined by the
NC criterion to provide as good a dynamical behavior as
possible. The long-term average runoff should also be given
a higher priority because anthropogenic influence is much
larger on time series than on long-term averages. We think
the LA criterion could be useful in future global modeling if
combined with region- or basin-specific benchmarks such
that model performance can be compared between different
parts of the world.
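The stepwise criteria application suggested here (snow first, then long-term volume error, then the Nash criterion) can be sketched in Python as successive filtering of the Monte Carlo parameter value sets. This is a minimal illustration with random scores, not the procedure used in this study. The function name, the retained fractions, and the ±1% volume error limit used as a default are assumptions.

```python
import numpy as np

def stepwise_selection(snow_fit, volume_error, nc,
                       snow_keep=0.2, ve_limit=1.0, nc_keep=0.1):
    """Select parameter value sets by applying one criterion after the other.

    snow_fit and nc are higher-is-better scores; volume_error is in percent.
    Returns the indices of the sets that survive all three steps.
    """
    snow_fit = np.asarray(snow_fit, dtype=float)
    volume_error = np.asarray(volume_error, dtype=float)
    nc = np.asarray(nc, dtype=float)
    idx = np.arange(snow_fit.size)

    # Step 1: keep the best fraction according to the snow criterion.
    n_snow = max(1, int(round(snow_keep * idx.size)))
    idx = idx[np.argsort(-snow_fit[idx])[:n_snow]]

    # Step 2: keep sets with a small long-term volume error.
    idx = idx[np.abs(volume_error[idx]) <= ve_limit]
    if idx.size == 0:
        return idx

    # Step 3: among the survivors, keep the best fraction by the Nash criterion.
    n_nc = max(1, int(round(nc_keep * idx.size)))
    return idx[np.argsort(-nc[idx])[:n_nc]]

# Random stand-ins for the scores of 15,000 parameter value sets.
rng = np.random.default_rng(2)
n_sets = 15000
snow_fit = rng.uniform(0.0, 1.0, n_sets)
volume_error = rng.uniform(-50.0, 50.0, n_sets)
nc = rng.uniform(-1.0, 1.0, n_sets)

selected = stepwise_selection(snow_fit, volume_error, nc)
print(f"{selected.size} parameter value sets remain after all three steps")
```

Unlike a search for sets that are simultaneously among the best for every criterion, each step here only ranks the sets that survived the previous step, so the order of the criteria expresses their priority.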
[48] This study used several parameter value sets to
simulate runoff ranges. Fekete et al. [2004] and Fiedler
and Döll [2007] showed varying model results with their
models driven by different precipitation data sets. We see
these studies as the start of a more general use of uncertainty
analysis of all future model results. Further development of
WASMOD-M and other global water balance models would
benefit from intercomparisons based on the same input and
validation data, the same land mask, and the same evaluation criteria. Simulated global runoff data should be specified to a given time period, and interannual variability
should be assessed. Such development would increase the
reliability of global runoff and discharge estimates and
would decrease the large uncertainty we face today.
[49] Acknowledgments. We are grateful to groups providing the free
global data we used: the Climate Research Unit (CRU) and David Viner at
the University of East Anglia; C. J. Willmott, K. Matsuura, and collaborators at the University of Delaware; GRID-Tsukuba at National Institute
for Environmental Studies; Chung-Hyun Ahn and Ryutaro Tateishi at Chiba
University; the Water Systems Research Group at the University of New
Hampshire; Thomas de Couet at the Global Runoff Data Centre (GRDC);
the International Satellite Land-Surface Climatology Project (ISLSCP); and
the National Snow and Ice Data Center (NSIDC) at the University of
Colorado, Boulder. The study was funded by the Swedish Research Council
through contracts 629-2002-287 and 621-2002-4352 and Formas, the
Swedish Research Council for Environment, Agricultural Sciences and
Spatial Planning through contract 214-2005-911. Parts of the computations
were performed on UPPMAX resources under project p2006015. Keith
Beven and Ida Westerberg were valuable discussion partners, and Keith
Beven kindly helped us to improve presentation and grammar. We thank
Balazs Fekete and an anonymous reviewer for their valuable manuscript
comments.
References
Alcamo, J., P. Döll, T. Henrichs, F. Kaspar, B. Lehner, T. Rosch, and
S. Siebert (2003), Development and testing of the WaterGAP 2 global
model of water use and availability, Hydrol. Sci. J., 48(3), 317 – 337,
doi:10.1623/hysj.48.3.317.45290.
Andreadis, K. M., and D. P. Lettenmaier (2006), Assimilating remotely
sensed snow observations into a macroscale hydrology model, Adv.
Water Resour., 29(6), 872 – 886, doi:10.1016/j.advwatres.2005.08.004.
Arnell, N. W. (1999), A simple water balance model for the simulation of
streamflow over a large geographic domain, J. Hydrol., 217(3 – 4), 314 –
335, doi:10.1016/S0022-1694(99)00023-2.
Arnell, N. W. (2003), Effects of IPCC SRES emissions scenarios on river
runoff: A global perspective, Hydrol. Earth Syst. Sci., 7, 619 – 641.
Arnell, N. W. (2004), Climate change and global water resources: SRES
emissions and socio-economic scenarios, Global Environ. Change,
14(1), 31 – 52.
Bari, M. A., K. R. J. Smettem, and M. Sivapalan (2005), Understanding
changes in annual runoff following land use changes: A systematic data-based approach, Hydrol. Processes, 19(13), 2463 – 2479, doi:10.1002/
hyp.5679.
Baumgartner, A., and E. Reichel (1975), Die Weltwasserbilanz, R. Oldenbourg, Munich, Germany.
Beven, K. J. (1989), Changing ideas in hydrology: The case of physically
based models, J. Hydrol., 105(1 – 2), 157 – 172, doi:10.1016/0022-1694(89)90101-7.
Beven, K. (2006), A manifesto for the equifinality thesis, J. Hydrol.,
320(1 – 2), 18 – 36, doi:10.1016/j.jhydrol.2005.07.007.
Brakenridge, G. R., S. V. Nghiem, E. Anderson, and S. Chien (2005),
Space-based measurement of river runoff, Eos Trans. AGU, 86, 185 –
188, doi:10.1029/2005EO190001.
Chahinian, N., and R. Moussa (2007), Comparison of different multiobjective calibration criteria of a conceptual rainfall-runoff model of
flood events, Hydrol. Earth Syst. Sci. Discuss., 4, 1031 – 1067.
Clark, M. P., A. G. Slater, A. P. Barrett, L. E. Hay, G. J. McCabe,
B. Rajagopalan, and G. H. Leavesley (2006), Assimilation of snow covered
area information into hydrologic and land-surface models, Adv. Water
Resour., 29(8), 1209 – 1221, doi:10.1016/j.advwatres.2005.10.001.
Demaria, E. M., B. Nijssen, and T. Wagener (2007), Monte Carlo sensitivity
analysis of land surface parameters using the variable infiltration capacity
model, J. Geophys. Res., 112, D11113, doi:10.1029/2006JD007534.
Dirmeyer, P. A., X. A. Gao, M. Zhao, Z. C. Guo, T. K. Oki, and
N. Hanasaki (2006), GSWP-2—Multimodel analysis and implications for
our perception of the land surface, Bull. Am. Meteorol. Soc., 87(10),
1381 – 1397, doi:10.1175/BAMS-87-10-1381.
Döll, P., F. Kaspar, and B. Lehner (2003), A global hydrological model
for deriving water availability indicators: Model tuning and validation,
J. Hydrol., 270(1 – 2), 105 – 134, doi:10.1016/S0022-1694(02)00283-4.
Dunn, S. M., and R. J. E. Colohan (1999), Developing the snow component
of a distributed hydrological model: A step-wise approach based on
multi-objective analysis, J. Hydrol., 223(1 – 2), 1 – 16, doi:10.1016/
S0022-1694(99)00095-5.
Eder, G., M. Sivapalan, and H. P. Nachtnebel (2003), Modelling water
balances in an Alpine catchment through exploitation of emergent properties over changing time scales, Hydrol. Processes, 17(11), 2125 – 2149,
doi:10.1002/hyp.1325.
Fekete, B. M., C. J. Vörösmarty, and W. Grabs (2002), High-resolution
fields of global runoff combining observed river discharge and simulated
water balances, Global Biogeochem. Cycles, 16(3), 1042, doi:10.1029/
1999GB001254.
Fekete, B. M., C. J. Vörösmarty, J. O. Roads, and C. J. Willmott (2004),
Uncertainties in precipitation and their impacts on runoff estimates,
J. Clim., 17, 294 – 304, doi:10.1175/1520-0442(2004)017<0294:UIPATI>2.0.CO;2.
Fekete, B. M., J. J. Gibson, P. Aggarwal, and C. J. Vörösmarty (2006),
Application of isotope tracers in continental scale hydrological modeling,
J. Hydrol., 330(3 – 4), 444 – 456, doi:10.1016/j.jhydrol.2006.04.029.
Fiedler, K., and P. Döll (2007), Global modelling of continental water
storage change—Sensitivity to different climate data sets, Adv. Geosci.,
11, 63 – 68.
Gerten, D., S. Schaphoff, U. Haberlandt, W. Lucht, and S. Sitch (2004),
Terrestrial vegetation and water balance—Hydrological evaluation of a
dynamic global vegetation model, J. Hydrol., 286(1 – 4), 249 – 270,
doi:10.1016/j.jhydrol.2003.09.029.
Güntner, A., J. Stuck, S. Werth, P. Döll, K. Verzano, and B. Merz (2007), A
global analysis of temporal and spatial variations in continental water
storage, Water Resour. Res., 43, W05416, doi:10.1029/2006WR005247.
Haddeland, I., T. Skaugen, and D. P. Lettenmaier (2006), Anthropogenic
impacts on continental surface water fluxes, Geophys. Res. Lett., 33,
L08406, doi:10.1029/2006GL026047.
Hanasaki, N., S. Kanae, and T. Oki (2006), A reservoir operation scheme
for global river routing models, J. Hydrol., 327(1 – 2), 22 – 41,
doi:10.1016/j.jhydrol.2005.11.011.
Hanasaki, N., S. Kanae, T. Oki, K. Masuda, K. Motoya, N. Shirakawa,
Y. Shen, and K. Tanaka (2008a), An integrated model for the assessment
of global water resources—Part 1: Model description and input meteorological forcing, Hydrol. Earth Syst. Sci., 12, 1007 – 1025.
Hanasaki, N., S. Kanae, T. Oki, K. Masuda, K. Motoya, N. Shirakawa,
Y. Shen, and K. Tanaka (2008b), An integrated model for the assessment
of global water resources—Part 2: Applications and assessments, Hydrol.
Earth Syst. Sci., 12, 1027 – 1037.
Hannerz, F., and A. Lotsch (2006), Assessment of land use and cropland
inventories for Africa, Discuss. Pap. 22, Cent. for Environ. Econ. and
Policy in Afr., Univ. of Pretoria, Pretoria, South Africa.
Hay, L. E., G. H. Leavesley, M. P. Clark, S. L. Markstrom, R. J. Viger, and
M. Umemoto (2006), Step wise, multiple objective calibration of a hydrologic model for a snowmelt dominated basin, J. Am. Water Resour. Assoc.,
42(4), 877 – 890, doi:10.1111/j.1752-1688.2006.tb04501.x.
Hillard, U., V. Sridhar, D. P. Lettenmaier, and K. C. McDonald (2003),
Assessing snowmelt dynamics with NASA scatterometer (NSCAT) data
and a hydrologic process model, Remote Sens. Environ., 86(1), 52 – 69,
doi:10.1016/S0034-4257(03)00068-3.
Hornberger, G. M., K. J. Beven, B. J. Cosby, and D. E. Sappington
(1985), Shenandoah watershed study: Calibration of a topographically
based, variable contributing area hydrological model to a small forested
catchment, Water Resour. Res., 21, 1841 – 1850, doi:10.1029/
WR021i012p01841.
Huang, M. Y., and X. Liang (2006), On the assessment of the impact of
reducing parameters and identification of parameter uncertainties for a
hydrologic model with applications to ungauged basins, J. Hydrol.,
320(1 – 2), 37 – 61, doi:10.1016/j.jhydrol.2005.07.010.
Hunger, M., and P. Döll (2008), Value of river discharge data for global-scale hydrological modeling, Hydrol. Earth Syst. Sci., 12, 841 – 861.
Islam, M. S., T. Oki, S. Kanae, N. Hanasaki, Y. Agata, and K. Yoshimura
(2007), A grid-based assessment of global water scarcity including virtual water trading, Water Resour. Manage., 21(1), 19 – 33, doi:10.1007/
s11269-006-9038-y.
Jothityangkoon, C., M. Sivapalan, and D. L. Farmer (2001), Process controls of water balance variability in a large semi-arid catchment: Downward approach to hydrological model development, J. Hydrol., 254(1 – 4),
174 – 198, doi:10.1016/S0022-1694(01)00496-6.
Kaspar, F. (2004), Entwicklung und Unsicherheitsanalyse eines Globalen
Hydrologischen Modells, Kassel Univ. Press, Kassel, Germany.
Kleinen, T., and G. Petschel-Held (2007), Integrated assessment of changes
in flooding probabilities due to climate change, Clim. Change, 81(3 – 4),
283 – 312, doi:10.1007/s10584-006-9159-6.
Krause, P., D. P. Boyle, and F. Bäse (2005), Comparison of different efficiency criteria for hydrological model assessment, Adv. Geosci., 5,
89 – 97.
Kucharik, C. J., J. A. Foley, C. Delire, V. A. Fisher, M. T. Coe, J. D.
Lenters, C. Young-Molling, N. Ramankutty, J. M. Norman, and S. T.
Gower (2000), Testing the performance of a dynamic global ecosystem
model: Water balance, carbon balance, and vegetation structure, Global
Biogeochem. Cycles, 14(3), 795 – 825, doi:10.1029/1999GB001138.
Kuczera, G., and M. Mroczkowski (1998), Assessment of hydrologic parameter uncertainty and the worth of multiresponse data, Water Resour.
Res., 34(6), 1481 – 1489, doi:10.1029/98WR00496.
Legates, D. R., and G. J. McCabe (1999), Evaluating the use of ‘‘goodness-of-fit’’ measures in hydrologic and hydroclimatic model validation,
Water Resour. Res., 35(1), 233 – 241, doi:10.1029/1998WR900018.
Legates, D. R., and C. J. Willmott (1990), Mean seasonal and spatial
variability in gauge-corrected, global precipitation, Int. J. Climatol.,
10(2), 111 – 127, doi:10.1002/joc.3370100202.
Lehner, B., P. Döll, J. Alcamo, T. Henrichs, and F. Kaspar (2006), Estimating the impact of global change on flood and drought risks in Europe: A
continental, integrated analysis, Clim. Change, 75(3), 273 – 299,
doi:10.1007/s10584-006-6338-4.
Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges (1994), A simple
hydrologically based model of land surface water and energy fluxes for
general-circulation models, J. Geophys. Res., 99(D7), 14,415 – 14,428,
doi:10.1029/94JD00483.
L’vovich, M. I. (1979), World Water Resources and Their Future, translated
from Russian by R. L. Nace, AGU, Washington, D. C.
Madsen, H. (2000), Automatic calibration of a conceptual rainfall-runoff
model using multiple objectives, J. Hydrol., 235(3 – 4), 276 – 288,
doi:10.1016/S0022-1694(00)00279-1.
Mitchell, T. D., and P. D. Jones (2005), An improved method of constructing a database of monthly climate observations and associated
high-resolution grids, Int. J. Climatol., 25(6), 693 – 712, doi:10.1002/joc.1181.
Mouelhi, S., C. Michel, C. Perrin, and V. Andreassian (2006), Linking
stream flow to rainfall at the annual time step: The Manabe bucket model
revisited, J. Hydrol., 328(1 – 2), 283 – 296, doi:10.1016/j.jhydrol.2005.12.022.
Nijssen, B., G. M. O’Donnell, A. F. Hamlet, and D. P. Lettenmaier (2001a),
Hydrologic sensitivity of global rivers to climate change, Clim. Change,
50(1 – 2), 143 – 175, doi:10.1023/A:1010616428763.
Nijssen, B., G. M. O’Donnell, D. P. Lettenmaier, D. Lohmann, and E. F.
Wood (2001b), Predicting the discharge of global rivers, J. Clim., 14(15),
3307 – 3323, doi:10.1175/1520-0442(2001)014<3307:PTDOGR>2.0.CO;2.
Nijssen, B., R. Schnur, and D. P. Lettenmaier (2001c), Global retrospective
estimation of soil moisture using the variable infiltration capacity land
surface model, 1980 – 93, J. Clim., 14(8), 1790 – 1808, doi:10.1175/1520-0442(2001)014<1790:GREOSM>2.0.CO;2.
Oki, T., Y. Agata, S. Kanae, T. Saruhashi, D. Yang, and K. Musiake (2001),
Global assessment of current water resources using total runoff integrating pathways, Hydrol. Sci. J., 46, 983 – 995.
Pan, M., et al. (2003), Snow process modeling in the North American Land
Data Assimilation System (NLDAS): 2. Evaluation of model simulated
snow water equivalent, J. Geophys. Res., 108(D22), 8850, doi:10.1029/2003JD003994.
Parkin, G., G. O’Donnell, J. Ewen, J. C. Bathurst, P. E. O’Connell, and
J. Lavabre (1996), Validation of catchment models for predicting
land-use and climate change impacts. 2. Case study for a Mediterranean catchment, J. Hydrol., 175(1 – 4), 595 – 613, doi:10.1016/S0022-1694(96)80027-8.
Peel, M. C., B. L. Finlayson, and T. A. McMahon (2007), Updated world
map of the Köppen-Geiger climate classification, Hydrol. Earth Syst.
Sci., 11, 1633 – 1644.
Ponce, V. M., R. P. Pandey, and S. Ercan (2000), Characterization of
drought across climatic spectrum, J. Hydrol. Eng., 5(2), 222 – 224,
doi:10.1061/(ASCE)1084-0699(2000)5:2(222).
Probst, J. L., and Y. Tardy (1987), Long-range streamflow and world continental runoff fluctuations since the beginning of this century, J. Hydrol.,
94(3 – 4), 289 – 311, doi:10.1016/0022-1694(87)90057-6.
Rawlins, M. A., K. C. McDonald, S. Frolking, R. B. Lammers, M. Fahnestock,
J. S. Kimball, and C. J. Vörösmarty (2005), Remote sensing of snow
thaw at the pan-Arctic scale using the SeaWinds scatterometer, J. Hydrol.,
312(1 – 4), 294 – 311, doi:10.1016/j.jhydrol.2004.12.018.
Rawlins, M. A., M. Fahnestock, S. Frolking, and C. J. Vörösmarty (2007),
On the evaluation of snow water equivalent estimates over the terrestrial
Arctic drainage basin, Hydrol. Processes, 21(12), 1616 – 1623,
doi:10.1002/hyp.6724.
Riebsame, W. E., et al. (1995), Complex river basins, in As Climate
Changes: International Impacts and Implications, edited by K. M.
Strzepek and J. B. Smith, pp. 57 – 91, Cambridge Univ. Press, Cambridge,
U. K.
Russell, G. L., and J. R. Miller (1990), Global river runoff calculated from a
global atmospheric general circulation model, J. Hydrol., 117(1 – 4),
241 – 254, doi:10.1016/0022-1694(90)90095-F.
Schaefli, B., and H. V. Gupta (2007), Do Nash values have value?, Hydrol.
Processes, 21(15), 2075 – 2080, doi:10.1002/hyp.6825.
Schaefli, B., B. Hingray, M. Niggli, and A. Musy (2005), A conceptual
glacio-hydrological model for high mountainous catchments, Hydrol.
Earth Syst. Sci., 9, 95 – 109.
Schmidt, R., et al. (2006), GRACE observations of changes in continental
water storage, Global Planet. Change, 50(1 – 2), 112 – 126, doi:10.1016/j.gloplacha.2004.11.018.
Schulze, K., and P. Döll (2004), Neue Ansätze zur Modellierung von
Schneeakkumulation und -schmelze im globalen Wassermodell WaterGAP, in Tagungsband zum 7. Workshop zur Großskaligen Modellierung
in der Hydrologie. München, 27 – 28 November 2003, edited by R. Ludwig
et al., pp. 145 – 154, Kassel Univ. Press, Kassel, Germany.
Sheffield, J., et al. (2003), Snow process modeling in the North American
Land Data Assimilation System (NLDAS): 1. Evaluation of model-simulated snow cover extent, J. Geophys. Res., 108(D22), 8849, doi:10.1029/2002JD003274.
Udnæs, H. C., E. Alfnes, and L. M. Andreassen (2007), Improving runoff
modelling using satellite-derived snow covered area?, Nord. Hydrol.,
38(1), 21 – 32, doi:10.2166/nh.2007.032.
Vörösmarty, C. J., and B. Moore (1991), Modeling basin-scale hydrology
in support of physical climate and global biogeochemical studies—An
example using the Zambezi River, Surv. Geophys., 12(1 – 3), 271 – 311,
doi:10.1007/BF01903422.
Vörösmarty, C. J., C. J. Willmott, B. J. Choudhury, A. L. Schloss, T. K.
Stearns, S. M. Robeson, and T. J. Dorman (1996), Analyzing the discharge regime of a large tropical river through remote sensing, ground-based climatic data, and modeling, Water Resour. Res., 32(10), 3137 –
3150, doi:10.1029/96WR01333.
Vörösmarty, C. J., K. P. Sharma, B. M. Fekete, A. H. Copeland,
J. Holden, J. Marble, and J. A. Lough (1997), The storage and aging
of continental runoff in large reservoir systems of the world, Ambio,
26(4), 210 – 219.
Vörösmarty, C. J., C. A. Federer, and A. L. Schloss (1998), Evaporation
functions compared on US watersheds: Possible implications for global-scale water balance and terrestrial ecosystem modeling, J. Hydrol.,
207(3 – 4), 147 – 169, doi:10.1016/S0022-1694(98)00109-7.
Vörösmarty, C. J., P. Green, J. Salisbury, and R. B. Lammers (2000a),
Global water resources: Vulnerability from climate change and population growth, Science, 289(5477), 284 – 288, doi:10.1126/science.289.5477.284.
Vörösmarty, C. J., B. M. Fekete, M. Meybeck, and R. B. Lammers (2000b),
Global system of rivers: Its role in organizing continental land mass and
defining land-to-ocean linkages, Global Biogeochem. Cycles, 14(2),
599 – 621, doi:10.1029/1999GB900092.
Wagener, T. (2003), Evaluation of catchment models, Hydrol. Processes,
17(16), 3375 – 3378, doi:10.1002/hyp.5158.
Wagener, T., N. McIntyre, M. J. Lees, H. S. Wheater, and H. V. Gupta
(2003), Towards reduced uncertainty in conceptual rainfall-runoff modelling: Dynamic identifiability analysis, Hydrol. Processes, 17(2), 455 –
476, doi:10.1002/hyp.1135.
Werth, S., A. Güntner, and B. Merz (2007), Calibration of the global
hydrology model WGHM with water storage variations from the
GRACE mission, Geophys. Res. Abstr., 9, 05743, sref:1607-7962/gra/EGU2007-A-05743.
Widén-Nilsson, E., S. Halldin, and C. Xu (2007), Global water-balance
modelling with WASMOD-M: Parameter estimation and regionalisation,
J. Hydrol., 340(1 – 2), 105 – 118, doi:10.1016/j.jhydrol.2007.04.002.
Xu, C.-Y. (2002), WASMOD—The water and snow balance modeling
system, in Mathematical Models of Small Watershed Hydrology and
Applications, edited by V. P. Singh and D. K. Frevert, chap. 17,
pp. 555 – 590, Water Resour. Publ., Highlands Ranch, Colo.
Xu, C. Y., and G. L. Vandewiele (1994), Sensitivity of monthly rainfall-runoff models to input errors and data length, Hydrol. Sci. J., 39(2),
157 – 176.
L. Gong and S. Halldin, Department of Earth Sciences, Uppsala
University, Villavägen 16, SE-752 36 Uppsala, Sweden. (sven.halldin@hyd.uu.se)
E. Widén-Nilsson, Department of Aquatic Science and Assessment,
Swedish University of Agricultural Sciences, P.O. Box 7050, SE-750 07
Uppsala, Sweden.
C.-Y. Xu, Department of Geosciences, University of Oslo, P.O. Box 1047,
Blindern, N-0316 Oslo, Norway.