WATER RESOURCES RESEARCH, VOL. 45, W05418, doi:10.1029/2007WR006695, 2009

Model performance and parameter behavior for varying time aggregations and evaluation criteria in the WASMOD-M global water balance model

E. Widén-Nilsson,1,2 L. Gong,2 S. Halldin,2 and C.-Y. Xu2,3

1 Department of Aquatic Science and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden.
2 Department of Earth Sciences, Uppsala University, Uppsala, Sweden.
3 Department of Geosciences, University of Oslo, Oslo, Norway.

Received 23 November 2007; revised 15 October 2008; accepted 17 March 2009; published 20 May 2009.

Copyright 2009 by the American Geophysical Union. 0043-1397/09/2007WR006695$09.00

[1] Global discharge estimates commonly range between 36,500 km³ a⁻¹ and 44,500 km³ a⁻¹, i.e., around 20%, and continental estimates differ much more. Data uncertainties are assumed to be a main cause of simulated runoff uncertainties, but model performance must also be addressed. The parsimonious WASMOD-M global water balance model, using limited input data, was used to assess data and model uncertainty (in contrast to models that use much data but are only modestly calibrated or not calibrated at all). A Monte Carlo technique based on 15,000 parameter value sets was used to evaluate the model against four criteria: observed snow and monthly, annual, and long-term discharge. WASMOD-M was overparameterized when evaluated only against long-term average discharge but not against monthly discharge, and its snow algorithm could be simplified. Sequential calibration is suggested for confining the behavioral parameter space and minimizing model equifinality, starting with snow, followed by long-term volume error, and ending with discharge dynamics.

Citation: Widén-Nilsson, E., L. Gong, S. Halldin, and C.-Y. Xu (2009), Model performance and parameter behavior for varying time aggregations and evaluation criteria in the WASMOD-M global water balance model, Water Resour. Res., 45, W05418, doi:10.1029/2007WR006695.

1. Introduction

[2] Global water balance models are increasingly used to estimate present and future water resources at large scales for purposes of, e.g., climate impact studies, freshwater availability for a growing global population, transboundary water management, and virtual water trade [Arnell, 2004; Islam et al., 2007; Lehner et al., 2006; Nijssen et al., 2001a; Vörösmarty et al., 2000a]. Such global models vary in type and complexity. Models like MacPDM [Arnell, 1999; 2003], WBM [Vörösmarty et al., 1998], WGHM/WaterGAP [Alcamo et al., 2003; Döll et al., 2003], WASMOD-M [Widén-Nilsson et al., 2007], and the “reduced-form” model of Kleinen and Petschel-Held [2007] have roots in traditional catchment modeling. Models like VIC [Liang et al., 1994] and the integrated global water resources model of Hanasaki et al. [2008a, 2008b] are macroscale hydrological models with the possibility of GCM coupling because of their energy balance simulations. Global runoff is also produced by dynamic vegetation models like LPJ [Gerten et al., 2004] and IBIS [Kucharik et al., 2000].

[3] Global water resources were originally assessed from country statistics [e.g., L’vovich, 1979; Baumgartner and Reichel, 1975], but global water balance modeling has gradually taken over as an assessment tool. Internationally coordinated efforts gradually improve data sets for such models, and multimodel ensemble techniques are suggested as a way to improve global assessments [Dirmeyer et al., 2006]. In spite of such progress, there is still a large uncertainty in global discharge estimation.
Total global discharge estimates commonly range between 36,500 km³ a⁻¹ and 44,500 km³ a⁻¹ [Widén-Nilsson et al., 2007], but Oki et al. [2001] report a model ensemble value as low as 29,485 km³ a⁻¹ for a 2-year period using precipitation data without gauge undercatch correction. Continental discharge estimates differ much more [Widén-Nilsson et al., 2007]. Probst and Tardy [1987] report annual global fluctuations between 34,500 and 44,000 km³ a⁻¹.

[4] Gerten et al. [2004] show large differences between runoff simulated with the LPJ, WBM, Macro-PDM, and WGHM models. Kleinen and Petschel-Held [2007] compare simulated volumes for 31 large river basins calculated with the VIC model [Nijssen et al., 2001b] and the land surface GCM component of Russell and Miller [1990]. They find volume differences to vary from −70% to over +2000% with an average of +10%. Global models rely on global data sets and are confined by their availability and often limited quality. All global models suffer from data uncertainties, which are often assumed to be a main cause of simulated runoff uncertainties. Döll et al. [2003] found that they could not match the observed average discharge without violating the physical range of the calibration parameter for some basins. Döll et al. [2003] and Fekete et al. [2002] had to apply large runoff correction factors in some cells to make inflow to the downstream interstation area equal to measured flow. They relate this to differences between precipitation and runoff data and especially precipitation undercatch. Fekete et al. [2002], and in part Döll et al. [2003], allow their correction factors to influence the simulated global runoff fields.

W05418 WIDÉN-NILSSON ET AL.: GLOBAL HYDROLOGICAL MODEL PERFORMANCE

[5] Global model performance has also received attention in recent years. Evaluation techniques partly depend on the modeler’s attitude to calibration.
Most modelers agree that global model parameters should preferably not be calibrated [Arnell, 1999; Hanasaki et al., 2008a], but Döll et al. [2003] state that erroneous data are one reason why calibration cannot be avoided. WGHM, WASMOD-M, and the WTM routing module sometimes used with WBM and VIC are calibrated models [Döll et al., 2003; Widén-Nilsson et al., 2007; Vörösmarty and Moore, 1991; Vörösmarty et al., 1996; Liang et al., 1994]. Two ways to test model quality are evaluation against independent data sets of the same type and calibration against more than one variable. Another is uncertainty and sensitivity analysis by Monte Carlo simulations, which can reveal overparameterization and equifinality, i.e., that several parameter value sets give equally good results. Demaria et al. [2007] apply Monte Carlo analysis for subsurface parts of VIC for 4 and Huang and Liang [2006] for 12 U.S. basins to find those model structures that can be simplified without losing model performance. Monte Carlo tests of WGHM are presented by Kaspar [2004] and Güntner et al. [2007]. Kaspar [2004] concludes that the most sensitive parameters are related to lakes and wetlands for low flows and that the impact of climate change scenarios is stronger than parameter uncertainty for long-term average runoffs. Güntner et al. [2007] find a strong regional variation in the sensitivity of parameters governing total water storage (snow, soil moisture, groundwater, and surface water) depending on which processes are most important. [6] Wagener et al. [2003] present two reactions to the equifinality problem. The first is to use parsimonious models, with a risk of too simplistic model structures [Kuczera and Mroczkowski, 1998]. The second is to search for calibration methods that better use information in available data series of, e.g., discharge, groundwater levels, and snow cover. 
Therefore, “uncertainty evaluation of models means analyzing the range of parameter sets and sometimes even model structures that are viable for an anticipated study” [Wagener, 2003, p. 3376]. The selection of one performance criterion normally confines the behavioral parameter space differently than another, meaning that the optimum parameter value varies with different criteria [Madsen, 2000]. A combination of criteria, focusing on different parts of the hydrograph, is usually needed to evaluate the performance of a hydrological model [Krause et al., 2005; Legates and McCabe, 1999; Madsen, 2000]. Schulze and Döll [2004] use satellite-derived snow cover and discharge measurements to test a new subgrid snow routine for WGHM. WGHM, which explicitly simulates surface water storage [Güntner et al., 2007], has also been evaluated by Schmidt et al. [2006] against water storage variations from the Gravity Recovery and Climate Experiment (GRACE) satellite observations, whereas Werth et al. [2007] use these data for calibration. The mismatch between the GRACE data and the WGHM simulations is an example of the incommensurability problem between measured and modeled entities: GRACE gives a measure of total water storage changes over large regions after subtracting the atmospheric water content, with some blurring from surrounding areas, whereas WGHM simulates the major water storages with conceptual model equations for 0.5° cells [Güntner et al., 2007]. Hillard et al. [2003], Pan et al. [2003], and Sheffield et al. [2003] present comparisons of nonglobal VIC applications and satellite-derived snow data, whereas Nijssen et al. [2001c] compare the uncalibrated global version of VIC with measurements of snow cover and soil moisture in addition to global runoff. Rawlins et al. [2005, 2007] compare remotely sensed snow and locally measured river discharge with results from PWBM, a modified version of WBM. Fekete et al. [2006] use isotope data to evaluate WBM/WTM runoff.
[7] It is difficult to calibrate a model against discharge time series if the model does not include routing delays from lakes, wetlands, and the river reach itself, as well as dam regulation. The problem is exacerbated since discharge information from upstream and downstream gauges often represents different time periods. Most previous studies have thus used long-term average discharge when evaluating results or selecting behavioral parameter value sets. Some global models, e.g., WGHM [Döll et al., 2003] and WBM/WTM [Vörösmarty and Moore, 1991; Vörösmarty et al., 1996], include travel time delay. Many global rivers have regulation delays of 1–3 months [Vörösmarty et al., 1997], but regulation data are often unavailable [Brakenridge et al., 2005]. Algorithms for dam operation schemes are emerging [Haddeland et al., 2006; Hanasaki et al., 2006] but are not widely used. Model results are commonly reported for climatological (long-term average) intra-annual patterns, but efficiency measures are seldom calculated for such averages. Relatively few studies compare model efficiency at different time scales. Döll et al. [2003] and Hunger and Döll [2008] do this for WGHM. Parkin et al. [1996], Jothityangkoon et al. [2001], Eder et al. [2003], and Hay et al. [2006] present techniques on the catchment scale to deal with different time steps, and all agree that increased model complexity can be supported at a finer time step if the required data are at hand.

[8] Given the uncertainties in the global and continental discharge estimates, it is advantageous to have global hydrological models using different approaches, just like the ensembles of GCMs used in climate research. In this study we used WASMOD-M [Widén-Nilsson et al., 2007] to assess data and model uncertainty. We believe that WASMOD-M, with its six parameters, has the most parsimonious structure of all models except the one of Kleinen and Petschel-Held [2007].
In developing WASMOD-M, we start from a very simple structure, with as few parameters as possible, to avoid overparameterization. If it is found to be necessary, more processes and input data sets will be added to the model in the future. Widén-Nilsson et al. [2007, p. 111] state that “In spite of its simplicity, it may be questioned if WASMOD-M also is overparameterized as long as only long-term average discharge is used for validation.” Most of the current WASMOD-M parameters are calibrated, in contrast to those of many other models. We wanted to find out how much the behavioral parameter value sets of WASMOD-M could be confined by validation against snow data in addition to discharge data. To what degree would selection of performance criteria be instrumental to simulation success? Would monthly and annual validation data allow successful discharge simulation without modeling travel time delay? Would equifinality or bad simulation indicate too simple or too complicated a model structure at the different time scales?

Table 1. Range of the Five Tunable Parameters in WASMOD-M

Parameter | Governing Storage | Range | Sampling Interval
Ts (°C) | snowfall (equations (2) and (4)) | 0 to 4 | uniform
Tm (°C) | snowmelt (equations (2) and (4)) | −4 to 0 | uniform
Ac | actual evaporation (equation (7)) | 0 to 1 | uniform
Ps (month⁻¹) | slow runoff (equation (8)) | e⁻¹⁸ to e⁰ | logarithmic
Pf (mm⁻¹) | fast runoff (equation (9)) | e⁻¹⁴ to e⁰ | logarithmic

2. Material and Methods

2.1. Global Data Sets

[9] WASMOD-M is driven by time series of monthly precipitation (P), temperature (Ta), and potential evaporation (ep) on a 0.5° × 0.5° latitude-longitude grid. Precipitation, temperature, and water vapor pressure were taken from the CRU TS 2.10 climate data [Mitchell and Jones, 2005], covering 1901–2002 but only used until 2000.
Precipitation was corrected for gauge undercatch with long-term average monthly factors calculated from the Global air temperature and precipitation regridded monthly and annual climatologies version 2.01 (available at http://climate.geog.udel.edu/climate) [Legates and Willmott, 1990]. Gridded potential evaporation was preprocessed from temperature Ta (°C) and relative humidity RH (%), calculated from temperature and water vapor pressure:

$$e_p = E_c \left[\max(T_a, 0)\right]^{2} (100 - RH). \tag{1}$$

Ec (mm month⁻¹ °C⁻²) was set in an inverse process to make the average annual potential evaporation equal to the highest value in two evaporation data sets (Terrestrial water balance data archive: Regridded monthly climatologies version 1.02 by C. J. Willmott and K. Matsuura, available at http://climate.geog.udel.edu/climate/, and Potential evapotranspiration by C.-H. Ahn and R. Tateishi, available at http://www-cger.nies.go.jp/grid-e/). The minimum instead of the maximum was chosen in some Arctic Canadian cells to get Ec ≤ 1 mm month⁻¹ °C⁻².

[10] Flow network and cell and basin areas were taken from STN-30p [Vörösmarty et al., 2000b]. Monthly discharge time series were taken from 654 Global Runoff Data Centre (GRDC) stations, in 254 basins, coregistered in 2007 to the STN-30p network in the UNH/GRDC composite runoff fields version 1.0 [Fekete et al., 2002] (available at http://www.grdc.sr.unh.edu/). Gauge data before 1901 were discarded. The Northern Hemisphere monthly snow cover extent 0.5° × 0.5° latitude-longitude data set by R. L. Armstrong (available at http://islscp2.sesda.com/ISLSCP2_1/html_pages/groups/snow/snow_cover_xdeg.html) provided temporal snow-cover data (percentage of weeks in a month for which a cell is snow-covered to more than 50%) for 344 of the 654 GRDC basins for 1986–1995.

2.2. Global Water Balance Model

[11] The WASMOD-M global water balance model [Widén-Nilsson et al., 2007] is a distributed version of the monthly catchment model WASMOD by Xu [2002].
WASMOD-M calculates snow accumulation and melt and actual evaporation and separates runoff into a fast and a slow component for each grid cell with a time step Δt of 1 month. The present model version does not include, e.g., time-delayed routing, reservoir operation, open-water evaporation, glacier melt, or anthropogenic water abstraction. The model has five tunable parameters (Table 1). The version used in this study was the same as the one presented by Widén-Nilsson et al. [2007] except for a slightly different formulation of evaporation and total runoff.

[12] The model allows snowfall, rainfall, and snowmelt to occur in the same month. Snowfall and rainfall (sf and rf, mm month⁻¹) as well as snowmelt (sm, mm month⁻¹) and snow accumulation (sp) vary exponentially between temperature thresholds Tm and Ts (°C):

$$\mathit{sf} = P\left(1 - e^{-\left[\{(T_a - T_s)/(T_s - T_m)\}\right]^{2}}\right), \tag{2}$$

$$\mathit{rf} = P - \mathit{sf}, \tag{3}$$

$$\mathit{sm} = \left(\mathit{sp}_{old}/\Delta t + \mathit{sf}\right)\left(1 - e^{-\left[\{(T_m - T_a)/(T_s - T_m)\}\right]^{2}}\right), \tag{4}$$

$$\mathit{sp} = \mathit{sp}_{old} + (\mathit{sf} - \mathit{sm})\,\Delta t, \tag{5}$$

where P is precipitation (mm month⁻¹), Δt = 1 month, and {x} means min(x, 0).

[13] The “land moisture” variable (lm, mm) represents the storage of water available for evaporation and runoff in the next time step. Other authors use “soil moisture” for similar state variables, but we prefer lm to avoid the incommensurability problem of the soil moisture point measurements compared to the modeled, conceptual entity. Actual evaporation (evap, mm month⁻¹) is calculated from land moisture, potential evaporation, and available water (aw, mm month⁻¹):

$$\mathit{aw} = \mathit{lm}_{old}/\Delta t + \mathit{rf} + \mathit{sm}, \tag{6}$$

$$\mathit{evap} = \min\left(e_p\left[1 - A_c^{\,\mathit{aw}/e_p}\right],\ \mathit{aw}\right). \tag{7}$$

The slow runoff (sr, mm month⁻¹) is a base flow, provided by land moisture, whereas the fast runoff (fr, mm month⁻¹) is provided by both land moisture and water added during a time step.
Both runoffs are described by linear reservoirs:

$$\mathit{sr} = P_s\,\mathit{lm}_{old}, \tag{8}$$

$$\mathit{fr} = P_f\,\mathit{lm}_{old}\,(\mathit{sm} + \mathit{rf}), \tag{9}$$

$$\mathit{tr} = \min\left(\mathit{sr} + \mathit{fr},\ \mathit{aw} - \mathit{evap}\right), \tag{10}$$

where tr is total runoff. Finally, the land moisture storage is updated:

$$\mathit{lm} = \mathit{lm}_{old} + (\mathit{rf} + \mathit{sm} - \mathit{evap} - \mathit{tr})\,\Delta t. \tag{11}$$

The code was written in MATLAB, and simulations were made on a PC with support from a parallel cluster. The model warmup time period was 5 years. Initial values of land moisture and snow (where potentially occurring) were globally uniform.

Table 2. Number of Basins With at Least One Parameter Value Set Resulting in Simulated Results Fulfilling Different Evaluation Criteria Limits for Nash Coefficient, Volume Error, Limit of Acceptability, and Snow Fit When Evaluated Against Monthly and Annual Observations(a)

Criterion Type | Limit | Name | Monthly Number | Monthly Percent | Annual Number | Annual Percent
NC | 0.8 | NC0.8 | 157 | 24 | 103 | 16
NC | 0.5 | NC0.5 | 479 | 73 | 363 | 56
NC | 0 | NC0 | 642 | 98 | 560 | 86
VE | 1% | VE1 | 632 | 97 | 630 | 96
VE | 20% | VE20 | 643 | 98 | 643 | 98
VE | 50% | VE50 | 651 | 100 | 651 | 100
LA | ±75% of observations | LA75 | 580 | 89 | 650 | 99
LA | ±99% of observations | LA99 | 654 | 100 | 654 | 100
SF | 0.95 | SF95 | 274 | 77 | - | -
SF | 0.75 | SF75 | 321 | 91 | - | -
SF | 0.50 | SF50 | 339 | 96 | - | -

(a) There were 654 basins with runoff measurements and 344 with snow measurements. Percentages relate to these totals. NC is Nash coefficient, VE is volume error, LA is limit of acceptability, and SF is snow fit.

2.3. Model Evaluation

[14] The split-sample method was used to calibrate and validate the model: the first half of each discharge time series was used for calibration and the second half for validation, and vice versa. Snow calibration was made for the whole 1986–1995 period. It was validated by comparison with a benchmark snow calibration driven by the long-term mean 1986–1995 climatology. Calibration was made independently for each basin area with uniform parameter value sets. Interstation runoff was not calculated in nested basins.
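As an aside to the model description in sections 2.1 and 2.2, equations (1)-(11) can be sketched for a single grid cell and a single month. This is a minimal illustration, not the authors' MATLAB code. The function and variable names are hypothetical, the exponent in equations (2) and (4) is read as negative, the fast runoff input is read as sm + rf, Ts > Tm is assumed, and setting evaporation to zero when ep = 0 is an assumption that the text does not state.

```python
import math


def potential_evaporation(Ta, RH, Ec):
    """Equation (1): potential evaporation (mm/month) from Ta (deg C) and RH (%)."""
    return Ec * max(Ta, 0.0) ** 2 * (100.0 - RH)


def neg_part(x):
    """The {x} notation of equations (2) and (4): min(x, 0)."""
    return min(x, 0.0)


def wasmod_m_step(P, Ta, ep, sp_old, lm_old, Ts, Tm, Ac, Ps, Pf, dt=1.0):
    """One monthly step for one cell; assumes Ts > Tm (hypothetical helper)."""
    # Snowfall, rainfall, snowmelt, and snowpack, equations (2)-(5)
    sf = P * (1.0 - math.exp(-(neg_part((Ta - Ts) / (Ts - Tm)) ** 2)))
    rf = P - sf
    sm = (sp_old / dt + sf) * (
        1.0 - math.exp(-(neg_part((Tm - Ta) / (Ts - Tm)) ** 2))
    )
    sp = sp_old + (sf - sm) * dt

    # Available water and actual evaporation, equations (6)-(7)
    aw = lm_old / dt + rf + sm
    if ep > 0.0:
        evap = min(ep * (1.0 - Ac ** (aw / ep)), aw)
    else:
        evap = 0.0  # assumption: no evaporation when ep is zero

    # Slow, fast, and total runoff, equations (8)-(10)
    sr = Ps * lm_old
    fr = Pf * lm_old * (sm + rf)
    tr = min(sr + fr, aw - evap)

    # Land moisture update, equation (11)
    lm = lm_old + (rf + sm - evap - tr) * dt
    return {"sf": sf, "rf": rf, "sm": sm, "sp": sp, "aw": aw,
            "evap": evap, "sr": sr, "fr": fr, "tr": tr, "lm": lm}
```

Because total runoff is capped at aw − evap, land moisture cannot become negative in this sketch, and the change in snow plus land moisture storage equals precipitation minus evaporation and runoff.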
An upstream cell, belonging to the basins of several downstream stations, could thus get several different parameter value sets. [15] Calibration was a search of all ‘‘behavioral’’ parameter value sets at each discharge station. Monte Carlo simulations were made with the same sets for all basins. Parameter values were sampled from uniform and logarithmic distributions within given ranges (Table 1) and were randomly combined to generate 15,000 parameter value sets. [16] Calibration was made against monthly snow observations and monthly and annual discharge observations. Calendar year measurement averages were calculated from a minimum of 10 months, and missing months were also excluded from simulated averages. Snow was evaluated with one criterion and runoff with three criteria. Evaluation criteria were calculated separately for monthly and annual time series. Validation was based on the same runoff criteria as calibration. [17] We defined parameter value sets to be ‘‘behavioral’’ in two ways. A first, relative definition was the selection of the best 1, 3.3, and 20% (150, 500, and 3000 of 15,000) simulations for each criterion. A second definition used the absolute limits for each criterion (Table 2). [18] Since the snow in WASMOD-M is independent of land moisture and evaporation, it was evaluated separately. The snow criterion faced conceptual problems in both measurements and simulations. The simulated snowpack represents an amount on the last day of the month, available for melting in the following month. The Northern Hemisphere snow measurements give no information on amounts but percentage of weeks in a month for which a cell is snow covered to more than 50%. Only months with snow during 100% of the time are guaranteed to have snow on the last day. Months with 0% snow-covered (i.e., <50% cover) time were assumed to represent no-snow conditions unless they occurred in winter (December – February) or adjacent to a month with a spatial snow coverage above 50%. 
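As an aside, the parameter sampling of paragraph [15] and the relative definition of behavioral sets in paragraph [17] can be sketched as follows. This is an illustrative sketch only: the function names, the seed, and the use of Python's standard random generator are assumptions, and the ranges follow our reading of Table 1 (Tm from −4 to 0 °C, Ps and Pf sampled uniformly in the exponent of e).

```python
import math
import random

# Ranges as read from Table 1 (an interpretation of the table, not author code)
UNIFORM_RANGES = {"Ts": (0.0, 4.0), "Tm": (-4.0, 0.0), "Ac": (0.0, 1.0)}
LOG_E_RANGES = {"Ps": (-18.0, 0.0), "Pf": (-14.0, 0.0)}  # exponents of e


def sample_parameter_sets(n, seed=0):
    """Draw n random parameter value sets (hypothetical helper)."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in UNIFORM_RANGES.items()}
        for k, (lo, hi) in LOG_E_RANGES.items():
            # logarithmic sampling: uniform in the exponent
            params[k] = math.exp(rng.uniform(lo, hi))
        sets.append(params)
    return sets


def best_n(scores, n, higher_is_better=True):
    """Indices of the n best simulations for one criterion."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    return set(order[:n])
```

With 15,000 sets, the best 1, 3.3, and 20% correspond to n = 150, 500, and 3000. Sets that are behavioral for two criteria at once can then be found as a set intersection, e.g., `best_n(nc, 500) & best_n(abs_ve, 500, higher_is_better=False)`, where `nc` and `abs_ve` are hypothetical lists of per-simulation scores.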
Given these conceptual limitations, the snow-fit criterion was based on measured and simulated snow periods:

$$SF = \min\left(\frac{\sum_{cells} \mathit{smo}_{correct}}{\sum_{cells} \mathit{smo}_{tot}},\ \frac{\sum_{cells} \mathit{nsmo}_{correct}}{\sum_{cells} \mathit{nsmo}_{tot}}\right), \tag{12}$$

where smocorrect is the number of months with simulated snow fitting months with measured snow and nsmocorrect is the number of simulated months with no snow fitting measured snow-free months. The total number of measured snow-covered and snow-free months is given by smotot and nsmotot. Everything is summed for all cells in a basin. SF varies from a perfect fit at 1 to no fit at 0. A minimum simulated 1 mm was required to accept a snow cover.

[19] The Nash coefficient (NC), calculated from the discharge time series, and the volume error (VE), calculated from the long-term average discharge, are the most widely used criteria for discharge:

$$NC = 1 - \frac{\sum_{time} \left(d_{obs} - d_{sim}\right)^{2}}{\sum_{time} \left(d_{obs} - \overline{d_{obs}}\right)^{2}}, \tag{13}$$

$$VE = \frac{\sum_{time} d_{sim} - \sum_{time} d_{obs}}{\sum_{time} d_{obs}}, \tag{14}$$

where dobs is observed discharge, dsim is simulated discharge, and the overbar denotes the time mean.

Figure 1. Runoff performance for (left) River Sénégal at Bakel and (right) River Ob at Salekhard for combinations of values of the evaporation (Ac) and slow-flow (Ps) and fast-flow (Pf) parameters. Calibration was made with the Nash criterion (NC) against monthly and annual observations and with the absolute value of the volume error (%) for the calibration time period (1904–1951 at Bakel and 1930–1961 at Salekhard). Values for the 500 (3.3%) best sets are shown as large dots.

[20] The limit of acceptability (LA) criterion is presented by Beven [2006]. It requires a modeler to predefine acceptable simulation errors on the basis of “effective observation error” of input data and discharge measurements. These limits can vary in time.
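As an aside, the snow-fit, Nash, and volume error criteria of equations (12)-(14) can be sketched as below. The function names and the input layout (flat lists pooled over all cells and months of a basin) are assumptions made for illustration. The sketch also assumes that both snow-covered and snow-free months occur in the measurements and that observed discharge is not constant, since the ratios are otherwise undefined.

```python
def snow_fit(measured_snow, simulated_sp, min_snow_mm=1.0):
    """Equation (12). measured_snow: booleans (True = measured snow month);
    simulated_sp: simulated snowpack (mm) for the same cells and months."""
    pairs = list(zip(measured_snow, simulated_sp))
    smo_tot = sum(1 for m, _ in pairs if m)
    nsmo_tot = len(pairs) - smo_tot
    # A simulated snow cover requires at least min_snow_mm of snowpack
    smo_correct = sum(1 for m, s in pairs if m and s >= min_snow_mm)
    nsmo_correct = sum(1 for m, s in pairs if not m and s < min_snow_mm)
    return min(smo_correct / smo_tot, nsmo_correct / nsmo_tot)


def nash_coefficient(d_obs, d_sim):
    """Equation (13): 1 minus squared error over variance of observations."""
    mean_obs = sum(d_obs) / len(d_obs)
    num = sum((o - s) ** 2 for o, s in zip(d_obs, d_sim))
    den = sum((o - mean_obs) ** 2 for o in d_obs)
    return 1.0 - num / den


def volume_error(d_obs, d_sim):
    """Equation (14): relative difference of simulated and observed totals."""
    return (sum(d_sim) - sum(d_obs)) / sum(d_obs)
```

A simulation that always returns the observed mean gives NC = 0 and VE = 0, which illustrates why the two criteria confine the parameter space differently.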
Simulated runoff that falls within the acceptable limits at a given point in time is weighted with a triangular or a trapezoidal function where a simulation close to the measured discharge is given 100% weight, whereas a simulation outside limits is given zero weight. The choice of predefined error limits was not obvious in our case, and we started with subjective, wide limits. This was motivated by the facts that GRDC does not generally report rating curve errors and that the model-input data are uncertain. A symmetrical, triangular weighting function (with a zero minimum and a unit maximum) was used, and LA was calculated as a time average.

[21] LA was defined by a range around the measured flow that simulated flows had to meet at least 95% of the time; that is, fewer than 3 months in 5 years were allowed to fall outside the range. The initial range (LA75) was given as ±75% of the flow at each time step plus 3 mm to avoid high relative low-flow errors. If a sufficient number of simulations did not meet this criterion, when selecting the 1–20% best, we widened the range to ±99% of the flow plus 3 mm (LA99). If this was not enough, we widened the range until we obtained the required number of simulations (LAmax).

3. Results

[22] WASMOD-M simulations exhibited a wide range of results from poor to excellent. Examples shown below and in the auxiliary material were selected to obey two criteria: (1) locations should be reported for other global models, and (2) results should represent typical cases, not just good or bad.1

1 Auxiliary materials are available in the HTML. doi:10.1029/2007WR006695.

3.1. Time Aggregations

[23] Equifinality of both runoff and evaporation parameters increased when successively calibrated against monthly, annual, and long-term average runoff (Figure 1). The trend was very clear when going from annual to long-term average aggregation but less clear when going from monthly to annual aggregation. Equifinality decreased in some cases from monthly to annual aggregation, especially for the evaporation parameter. It was also evident that LA75 was too permissive in combination with annual data (Table 2).

3.2. Parameter Behavior

[24] WASMOD-M showed a very good snow performance for many basins (Figure 2). SF values above 0.95 were found for 77% of all 344 snow observation basins (Table 2), and 35% of the 344 had SF above 0.95 for all parameter value sets. The biggest problems in simulating snow observations correctly were found in basins with only occasional snow cover. Snow calibration commonly confined the snow parameter space (Figure 3). The constriction was clear for Ts but less so for Tm. The maximum SF values were higher with the normal snow calibration for 40% of the basins compared to the benchmark calibration. No improvement was seen in another 40%, mainly because SF = 1 with both calibrations in these mainly small basins. Ts was better confined with the normal snow calibration than with the benchmark calibration for 2/3 of the basins, while 1/3 were better confined with the benchmark. Tm was better confined with the normal snow calibration than the benchmark calibration for 47%, while 40% were better confined with the benchmark and 13% were equally well confined with both calibrations.

Figure 2. Observed and simulated snow properties for a 0.5° × 0.5° cell in the River Ob basin. Snow observations give percentage of weeks when snow covered more than 50% of the area. Solid squares indicate months when the cell was snow covered to 100%, and open squares indicate months completely without snow. The top row of squares shows measured data, and the bottom row shows simulated data.

[25] The behavioral space for the evaporation parameter (Ac) differed between snow-covered and dry, warm basins.
The evaporation parameter space was better confined for dry and warm basins with all criteria, while the runoff parameter (Ps, Pf) spaces were better confined for snow-covered basins, particularly by VE (Figure 1). Runoff parameters were better confined for nonsnow basins by the 1% best NC and LA values but for snow-covered basins with the 20% best NC and LA values.

3.3. Criteria Relationships

[26] It was possible to meet the VE criterion within 1% for almost all basins (Tables 2 and S1) during the calibration periods. All basins (with one exception) where this criterion could not be met had runoff coefficient problems (too-high runoff compared to precipitation). It was also possible for many basins to fulfill the LA75 criterion (Table 2). A very high number of parameter value sets fulfilling LA75 were found for warm and dry basins, whereas a smaller number was found for snow-covered basins. This had to do with the generous ±3-mm limit that was too high for some arid basins with annual runoff sometimes below 3 mm. The Nash criterion was the most demanding, and less than a quarter of all basins got NC above 0.8 when calibrated against monthly data (Table 2).

[27] Parameter value sets that simultaneously fulfilled all monthly criteria within their strictest limits (NC0.8, VE1, LA75, and SF95; see Table 2) were found for 57 basins (9% of all), none of which had runoff coefficient problems. Almost half were located in Africa, likely because SF was not used there and LA75 was too generous a limit in dry basins. The NC performance was also high in Africa. Parameter value sets that simultaneously fulfilled the monthly NC, VE, and SF (where applicable) criteria at their second (NC0.5, VE20, SF75) and third (NC0, VE50, SF50) levels together with LA75 could be found for 52% and 84% of all basins, respectively.

Figure 3. Runoff (Nash criterion (NC)) and snow (snow fit (SF)) performance for a range of Ts and Tm values for River Ob at Salekhard. The best 3.3% (500 of 15,000) parameter value sets for each criterion as well as 20 common value sets among the best are highlighted.

Figure 4. Pairwise comparison of three runoff criteria (Nash (NC), volume error (VE), and limit of acceptability (LA75)) during calibration for (top) River Sénégal at Bakel and (bottom) River Ob at Salekhard. The dashed and dash-dotted lines delineate the best 500 (3.3%) and 150 (1%) value sets, respectively, for each parameter. Sets that are common for all three criteria are highlighted in black (best 3.3%) and are encircled (best 1%).

[28] It was possible to find good runoff parameter value sets concurrent with the best 1% snow parameter value sets for almost all snow-covered basins. It was also possible to find common sets between the 1% best of NC and VE for 99% of all basins. NC and LA behave similarly in parameter space, but common sets between the 1% best of them were found for only 83% of all basins. Common sets between the 1% best VE and LA were found for 52% of all basins. When pairwise common parameter value sets were found, the largest number was found for NC-LA followed by NC-VE. The smallest number of combinations was found between VE and LA (Figure 4). The number of pairwise common parameter value sets for snow-covered basins was usually smaller than for nonsnow basins, especially for NC-VE. Although the snow calibration narrowed the simulated runoff range, the reduction was not proportional to the reduction in the number of behavioral parameter value sets, and the average runoff time series produced by the confined and nonconfined sets were almost equal (Figure 5). The less confined parameter space for VE compared to NC is clearly seen in the range of hydrographs (Figure 6 and auxiliary material).
The average time course is also different between the two, with NC simulations giving higher emphasis to peak flows.

3.4. Model Performance

[29] The best parameter value sets identified during the calibration period always produced among the best validation results for all criteria (Figure 7). The best NC values decreased on average by 0.28 units during validation (Figure 8). The average validation performance could be better or worse, and the performance relation was seldom one to one even if good calibrations led to good validations (see Sénégal River in Figure 7). It was also common, especially for VE, that the 1% best calibrations were not found among the 1% best validations. The overlap between the best parameter value sets between calibrations of the two periods was on average 80% for sets selected from the best 20% monthly NC values and 44% for the best 1% monthly NC values. The overlap was better for monthly than for annual calibration. Common sets were found for all basins among the best 20% NC values, while the 1% best had total misses for 11% of the basins.

Figure 5. Ob River runoff at Salekhard. The thick black line gives observations. The light gray area delineates runoff simulated with the 500 (3.3%) best parameter value sets according to the Nash criterion. The dark gray area delineates runoff simulated with 20 common parameter value sets giving the best 3.3% fit for both the Nash and the snow-fit criteria. The thin black line gives the average of the best simulations (indistinguishable between Nash and combined Nash and snow-fit criteria). Runoff calibration for this basin was 1930–1964, and snow calibration was 1986–1995.

4. Discussion

[30] A discussion of a global water balance model must focus on general patterns, not on specific details. Unexpected results for single basins cannot be explored in detail because of limited and uncertain information about individual basins.

4.1. Time Aggregation

[31] We selected the same runoff criteria for monthly and annual aggregations, but no criterion could be directly compared between aggregations. The volume error should be invariant to time aggregation but differed slightly since some monthly observations were excluded from the annual aggregation. The Nash criterion was not comparable between time aggregations for two reasons. NC values were lower for the annual aggregation because the annual runoff variability was lower. Annual NC values were also less certain because the number of observations was lower. Xu and Vandewiele [1994] show that the WASMOD catchment model requires 10 years of data for a robust calibration in humid climate. More than 10 years of monthly calibration data (after dividing all time series into halves) were available for 76% of the discharge stations, whereas 2% provided less than 5 years of data. Annual values, not affected by seasonality, should require somewhat less data, and 30 years should suffice. This was available for only 21% of the stations, whereas 27% had fewer than 10 annual data points. The LA criterion should theoretically be comparable over time aggregations if time series span at least 20 years to make the 95% time limit meaningful. Less than half of the stations fulfilled this requirement. The selected LA limits should have been more restrictive for annual than for monthly data to be comparable.

Figure 6. Sénégal River runoff at Bakel for the last 10 years of the calibration period 1904–1951. The thick black line gives observations. The light gray area delineates runoff simulated with the 500 (3.3%) best parameter value sets according to the volume error criterion, and dark gray areas delineate simulation with the Nash criterion. The thin black line gives the average of the best Nash simulations, and the dotted line gives the average of the best volume error simulations.

[32] Döll et al. [2003] use NC on annual runoff, whereas other authors choose other annual criteria. Hunger and Döll [2008] use the coefficient of determination, Mouelhi et al. [2006] use an RMS error normalized by precipitation, and Bari et al. [2005] use correlation coefficients. Jothityangkoon et al. [2001] and Eder et al. [2003] compare measured and simulated annual exceedance probability. Schaefli and Gupta [2007] point out that an NC benchmark is needed since almost any model can deliver high NC values for some stations, whereas not even the best models are successful in other basins. One of their proposed benchmarks, weighting NC with precipitation, could be worth exploring for annual data. An LA criterion, with specific limits for the annual time step, might also be developed.

[33] The equifinality of runoff parameter value sets (Ps, Pf) generally increased with increased time aggregation (Figure 1). Monthly and annual time steps often gave similar peaks in parameter space, whereas the long-term average runoff hardly confined the space at all. The sharpest peaks (sometimes more visible when integrated to give probability densities) were most often seen for the shortest time step. The behavioral monthly NC parameter value sets for the two runoff parameters often coincided with the annual NC sets. It was easy to find parameter value sets that also obeyed the long-term VE criterion, although they were not always among the 1% or 3.3% best sets. Common parameter value sets selected on the basis of NC were more frequent than those based on LA.

Figure 7. Modeled runoff performance for validation (second half) versus calibration (first half of observation) periods for three evaluation criteria (Nash (NC), limit of acceptability (LA), and volume error (VE)). Shown is the performance for (top) Sénégal River at Bakel (observations 1904–1989) and (bottom) Ob River at Salekhard (observations 1930–1999). Simulations based on the best 150 (1%) calibrated parameter value sets are highlighted, and the thin line gives the one-to-one relation. The best calibrated parameter value sets for Ob River not fulfilling the 95% limit for LA75 during the calibration period are marked with crosses.

[34] The evaporation parameter Ac showed a complex behavior. Ac sometimes exhibited more equifinality with NC and LA for shorter than for longer time aggregations. This complex behavior is difficult to assess without a reliable global evaporation database.

4.2. Parameter Behavior

[35] The SF criterion did not account for the start and end of the snow period because of the incommensurability problem between modeled and observed snow data. Still, the use of monthly precipitation and temperature time series in the normal calibration generally improved the results compared to the climatological benchmark snow calibration. The equifinality of Tm indicated that it might be discarded in future model versions. It is possible that this finding can be challenged when high-resolution MODIS snow cover data, which exist from 2000, can provide a better evaluation data set (National Snow and Ice Data Center, http://nsidc.org/data/modis/faq.html).

[36] The evaporation parameter Ac was normally more confined by all runoff criteria for non–snow covered than for seasonally snow covered basins. The VE criterion confined Ac for 30% of the snow basins and for 70% of the nonsnow basins, primarily for basins with a high runoff coefficient. This constriction was therefore always toward high Ac values giving low evaporation. The VE constriction always acted to remove low Ac values but never high ones. The NC and LA criteria could confine both high and low Ac values. All runoff criteria acted oppositely on Ps and Ac such that a smaller range of evaporation mostly coincided with a wider base flow range and vice versa.
The NC and LA criteria successfully confined the Ps and Pf runoff parameter space in its upper part and often also confined their values to a small range, while VE often left these parameters undetermined (Figure 1).

4.3. Criteria Relationships

[37] Dunn and Colohan [1999] and Udnæs et al. [2007] show the importance of multiobjective calibration against snow data to get a better internal model structure even if simulated runoff is not improved. State variables updated with remotely sensed snow cover can marginally improve simulated streamflows, but their importance increases in areas with seasonally variable snow cover [Andreadis and Lettenmaier, 2006; Clark et al., 2006]. These findings are similar to ours, where a well-simulated snow cover only affected runoff to a small extent. The snow parameters were confined almost only by the snow criterion (Figure 3) but in a few cases also slightly by NC.

[38] The three runoff criteria had a complex interrelationship that depended on the relative criteria limits and the presence or absence of snow. Among the runoff criteria, NC and LA mostly gave similarly confined selections of behavioral parameter value sets. NC was commonly more restrictive than LA for nonsnow basins. VE results were commonly least confined, i.e., produced most equifinality. It was obviously more difficult to find pairwise common runoff parameter value sets when tighter limits were put on each criterion (Figure 4). It was possible to get combinations of behavioral sets between NC and VE for all basins. The simultaneous requirements of LA and VE were seldom met in basins with bad LA performance and in dry basins where the 3-mm limit of LA was too permissive. For basins where it was not possible to reconcile LA and NC, it was also impossible to reconcile LA and VE. NC was the easiest criterion to reconcile with any of the other criteria. The fact that different criteria confined the parameter space differently is supported by earlier findings, e.g., Madsen [2000] and Chahinian and Moussa [2007], who selected Pareto-optimal parameter values. Chahinian and Moussa [2007, p. 1032] point out that “. . . the calibrated parameter values are dependent on the type of criteria used. Significant tradeoffs are observed between the different objectives: no unique set of parameter is able to satisfy all objectives simultaneously.” A similar conclusion is also drawn by Madsen [2000]. We made tests to find common parameter value combinations among the 1–33% best parameter value combinations for each criterion. Rather low performances had to be accepted to find parameter value combinations within the best ranges of all criteria. We thus instead suggest a stepwise approach taking one criterion after the other.

Figure 8. (top) Best Nash (NC) performance of 15,000 tested parameter value combinations for 654 gauged basins during calibration in the first half of the measured discharge time series and (bottom) the corresponding NC values during validation in the second half of the measured period. The results refer to the whole upstream area for each catchment but are shown only for their interstation coverage.

[39] We found a few geographical criteria patterns. One was that almost no European basin fell within the NC0.8 limit (Figure 8), possibly because of the too-high precipitation correction factors in Europe [Arnell, 1999; Döll et al., 2003]. Some Alpine catchments also have problems with nonstationarity of their state variables. Further investigation is needed to determine whether this is caused by retreating glaciers or something else.
The high NC values in Africa might be surprising, given the usually lower data quality. Our modeling experience, however, has shown that NC values are usually higher in Africa and south Asia, where yearly discharge has one or two distinct flood periods and a dry period, compared to Europe, where the annual pattern can be more complex.

[40] Widén-Nilsson et al. [2007] based their WASMOD-M analysis on 1680 parameter value sets and found it difficult to achieve a very good fit in some cases. In this study we used 15,000 sets and had few problems in identifying “good” sets. It is likely that a still larger sample would allow the identification of a larger number of good parameter value sets and common parameter value sets between different criteria. We do not believe, however, that analysis of, e.g., one million sets would greatly alter the parameter value behavior or the relation between criteria.

4.4. Model Performance

[41] In an ideal setting, validation should be performed with split-sample, proxy basin tests in a stationary climate within basins of time-invariable land cover and flow network. Such conditions can be met for individual catchments but not in a global basin analysis since human interventions during the 20th century have transformed the majority of the world’s basins. Many runoff records were systematically less affected by human activity during early calibration than during later validation periods. This nonstationarity, including climatic variations and possible changes in meteorological station network and data quality, can explain why sometimes all parameter value sets gave much better values for one criterion for one time period than for the other. It also limited the possibility of drawing detailed conclusions from the validation tests. The nonstationarity was also shown by calibrations performed on the second period.
This high overlap, combined with the general agreement between high-calibration and high-validation performance for all criteria (Figure 7), showed that WASMOD-M was robust under these conditions. It was also shown that WASMOD-M could be calibrated to ±1% volume error for all basins except those with runoff coefficient problems. It was often also possible to find parameter value sets producing good dynamics within those sets that produced small volume errors.

[42] Validation was hampered not only by nonstationarity but also by well-known data problems [Widén-Nilsson et al., 2007]. Error-prone data fed into an otherwise “perfect” model can create substantial equifinality. Our combination of precipitation and runoff data sets produced too-large runoff in relation to precipitation for 4% of all basins. Such problems were found, e.g., in some Alaskan basins and the headwaters of the Ganges-Brahmaputra and the neighboring Irrawaddy. Data problems, such as shifts in discharge pattern after damming and strange discharge patterns possibly caused by unit conversion problems or changes in rating curves, were found in another 4% of all basins. Several problem basins could be well simulated, and the most obvious consequence was the forcing of the evaporation parameter toward its upper limit for basins with too-high runoff coefficients. The usage of the maximum potential evaporation from two databases gave too-high potential evaporation in some regions. The dryness index (quotient of annual potential evaporation to annual precipitation) exceeded 10, meaning hyperarid and superarid [Ponce et al., 2000], during the calibration period for as many as 4% of all basins. The dryness indices were compared at the five basins presented by Riebsame et al. [1995], and four basins clearly exceeded the values given by Riebsame et al. [1995].
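The two data checks described above are simple ratios of long-term means. A minimal sketch, assuming basin-average annual precipitation, potential evaporation, and runoff in the same units, is given here. The class and function names are hypothetical, the dryness limit of 10 follows the text, and the runoff coefficient limit of 1 is an assumed reading of "too-large runoff in relation to precipitation".

```python
from dataclasses import dataclass


@dataclass
class BasinMeans:
    """Long-term basin averages, all in mm per year."""
    name: str
    precipitation: float
    potential_evaporation: float
    runoff: float


def dryness_index(basin):
    """Annual potential evaporation divided by annual precipitation."""
    return basin.potential_evaporation / basin.precipitation


def runoff_coefficient(basin):
    """Annual runoff divided by annual precipitation."""
    return basin.runoff / basin.precipitation


def screen_basins(basins, max_dryness=10.0, max_runoff_coefficient=1.0):
    """Map each flagged basin name to the list of reasons it was flagged."""
    flagged = {}
    for basin in basins:
        reasons = []
        if runoff_coefficient(basin) > max_runoff_coefficient:
            reasons.append("runoff exceeds precipitation")
        if dryness_index(basin) > max_dryness:
            reasons.append("dryness index above limit")
        if reasons:
            flagged[basin.name] = reasons
    return flagged


# Invented values for illustration, not data from the study.
basins = [
    BasinMeans("humid", 1200.0, 900.0, 500.0),
    BasinMeans("too_wet", 400.0, 600.0, 450.0),
    BasinMeans("too_dry", 100.0, 1500.0, 5.0),
]
flagged = screen_basins(basins)
print(flagged)
```

A screen of this kind only identifies suspicious input combinations; it cannot tell whether precipitation, potential evaporation, or runoff is the faulty component.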
The too-high dryness was likely caused by a combination of too-high potential evaporation and too-low precipitation. Low precipitation was likely the main problem in basins with high runoff coefficients.

[43] Model problems were identified for a small number (12, less than 2%) of basins where no parameter value set produced NC above zero. Five of these basins were dominated by large lakes (Great Lakes Saint Lawrence River basin, Owen reservoir in the Victoria Nile, and the Narva River downstream of Lake Peipsi). The model treated all basins as land and did not account for delays caused by large lakes. WASMOD-M only excluded outlet lakes like the Black Sea, the Caspian Sea, and the Aral Sea from the land area.

5. Conclusions

[44] Most global water balance models are evaluated against long-term average runoff, and results are also presented in the form of reasonable within-year dynamics. This study showed that WASMOD-M could always be calibrated to a very small volume error (long-term average runoff) for basins where input data were not found to be unreasonable and where the model assumptions were not obviously wrong. Calibration of WASMOD-M against measured snow properties reduced equifinality of the snow parameters. Runoff calibration against monthly and annual time series with the NC or LA criteria was expected to provide a model with good dynamical behavior and increased sensitivity to routing and damming, whereas runoff calibration against the long-term average runoff, i.e., the volume error, was expected to provide a model with a less consistent dynamic behavior but a smaller sensitivity to routing and damming. Results only partly confirmed this picture. Evaluation against annual data was difficult because time series were normally too short for generally accepted criteria. Calibration against monthly as opposed to annual data sometimes provided more equifinality, and calibration against long-term average runoff mostly produced considerable equifinality.
This confirmed the concern of Widén-Nilsson et al. [2007] that WASMOD-M was overparameterized when evaluated against long-term average runoff but not against monthly time series, with the possibility that the snow algorithm can be simplified to use only one parameter. The somewhat ambiguous evaluations against monthly and annual observations as well as the model failure to mimic seasonal dynamics if influenced by large lakes provided incentives to develop a routing algorithm for WASMOD-M.

[45] Overparameterization is one of three interrelated factors that cause equifinality and uncertainty in model results. The other two relate to the quantity and quality of input and validation data. The relation between input data and equifinality was reasonably straightforward since our input data were insufficient to confine all parameters. The data errors increased the uncertainty of the modeling components that were not controlled by the objective function and therefore enhanced the equifinality of the related model parameters. Concerning overparameterization, things were more complicated. This is because hydrological models are of diverse forms, so there is no simple relation between the number of parameters and the overparameterization of a model, although a large number of parameters usually increases the risk of overparameterization. Hornberger et al. [1985] point out the great danger of overparameterization if a modeler attempts to simulate all hydrological processes considered relevant and fit those parameters by optimization against an observed discharge record. This is because overparameterization is not only a problem of model structure but is also related to data problems. The WASMOD-M philosophy follows Beven [1989, p. 159], who concludes that it “appears that three to five parameters should be sufficient to reproduce most of the information in a hydrological record.” Given the inadequacy and inaccuracy of input data, simple models that capture the essential features should be preferable to complex models that are designed to simulate a large number of processes. Overparameterization and equifinality are caused by a lack of input data and by data that are poor or modest representations of their real-world entities.

[46] Since even the parsimonious WASMOD-M showed signs of being overparameterized, it can be questioned whether other, less parsimonious global water balance models might also be overparameterized. Although WASMOD-M has the highest number of calibrated parameters, other models (except the one by Kleinen and Petschel-Held [2007]) have a much higher total number of parameters. Noncalibrated parameters can suffer from large errors introduced by the physiographic data sets [Hannerz and Lotsch, 2006; Peel et al., 2007] used to estimate these parameters. Despite such problems, we are aiming at making use of further data sets to possibly reduce the number of calibrated parameters in WASMOD-M. The uncertainty in data could possibly be decreased by using several data sets of the same entity.

[47] The generation and analysis of behavioral parameter value sets were done to analyze the combined model and data uncertainty in WASMOD-M. It was also done to generate a basis for regionalization and to define behavioral sets to be used in further model applications. VE was not enough to confine the parameter space and had to be accompanied by other criteria. Only 57 basins had parameter value sets that simultaneously fulfilled all monthly criteria within their strictest limits. A stepwise criteria application to select good parameter value sets was an alternative to the search for common parameter value sets among the best ones for each criterion.
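A stepwise application of criteria could be sketched as follows. This is an illustration only: the Nash criterion is written as the standard Nash-Sutcliffe efficiency, the volume error is assumed here to be the relative difference in total runoff in percent, and the thresholds, function names, and toy runoff series are hypothetical.

```python
import numpy as np


def nash(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed runoff."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def volume_error(obs, sim):
    """Relative error in total runoff volume, in percent (assumed definition)."""
    return 100.0 * (np.sum(sim) - np.sum(obs)) / np.sum(obs)


def stepwise_select(candidates, steps):
    """Apply (name, accept) steps in order; each step only sees the survivors."""
    remaining = list(candidates)
    history = []
    for name, accept in steps:
        remaining = [c for c in remaining if accept(c)]
        history.append((name, len(remaining)))
    return remaining, history


# Toy observed series and three simulations keyed by parameter value set.
obs = np.array([1.0, 3.0, 8.0, 4.0, 2.0, 1.0])
simulations = {
    "a": obs * 1.02,                     # small volume error, good dynamics
    "b": np.full(obs.size, obs.mean()),  # no volume error, no dynamics
    "c": obs * 1.6,                      # large volume error
}
steps = [
    ("VE within 20%", lambda k: abs(volume_error(obs, simulations[k])) <= 20.0),
    ("NC at least 0.5", lambda k: nash(obs, simulations[k]) >= 0.5),
]
selected, history = stepwise_select(simulations, steps)
print(selected, history)
```

For snow-covered basins, a snow-fit step could be placed first in `steps`; the order of the list is the order in which the parameter space is confined.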
We suggest that snow calibration should be a first, independent step to confine parameter space before applying other criteria. Snow simulation is independent of the runoff simulation in WASMOD-M, and possibly in other models; moreover, all land surface above 40° northern latitude has seasonal snow cover, and about 50% of the Northern Hemisphere runoff comes from snowmelt. The NC and LA criteria gave similar results, but LA needs further elaboration before use in global modeling. Since it was always possible to find behavioral parameter value sets fulfilling the VE criterion, we suggest (as do Demaria et al. [2007] and Schaefli et al. [2005]) that runoff calibration should start by confining the model to those sets that give good fit to long-term average runoff. The parameter space should then be further confined by the NC criterion to provide as good a dynamical behavior as possible. The long-term average runoff should also be given a higher priority because anthropogenic influence is much larger on time series than on long-term averages. We think the LA criterion could be useful in future global modeling if combined with region- or basin-specific benchmarks such that model performance can be compared between different parts of the world.

[48] This study used several parameter value sets to simulate runoff ranges. Fekete et al. [2004] and Fiedler and Döll [2007] showed varying model results with their models driven by different precipitation data sets. We see these studies as the start of a more general use of uncertainty analysis of all future model results. Further development of WASMOD-M and other global water balance models would benefit from intercomparisons based on the same input and validation data, the same land mask, and the same evaluation criteria. Simulated global runoff data should be specified to a given time period, and interannual variability should be assessed.
Such development would increase the reliability of global runoff and discharge estimates and would decrease the large uncertainty we face today. [49] Acknowledgments. We are grateful to groups providing the free global data we used: the Climate Research Unit (CRU) and David Viner at the University of East Anglia; C. J. Willmott, K. Matsuura, and collaborators at the University of Delaware; GRID-Tsukuba at National Institute for Environmental Studies; Chung-Hyun Ahn and Ryutaro Tateishi at Chiba University; the Water Systems Research Group at the University of New Hampshire; Thomas de Couet at the Global Runoff Data Centre (GRDC); the International Satellite Land-Surface Climatology Project (ISLSCP); and the National Snow and Ice Data Center (NSIDC) at the University of Colorado, Boulder. The study was funded by the Swedish Research Council through contracts 629-2002-287 and 621-2002-4352 and Formas, the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning through contract 214-2005-911. Parts of the computations were performed on UPPMAX resources under project p2006015. Keith Beven and Ida Westerberg were valuable discussion partners, and Keith Beven kindly helped us to improve presentation and grammar. We thank Balazs Fekete and an anonymous reviewer for their valuable manuscript comments. References Alcamo, J., P. Döll, T. Henrichs, F. Kaspar, B. Lehner, T. Rosch, and S. Siebert (2003), Development and testing of the WaterGAP 2 global model of water use and availability, Hydrol. Sci. J., 48(3), 317 – 337, doi:10.1623/hysj.48.3.317.45290. Andreadis, K. M., and D. P. Lettenmaier (2006), Assimilating remotely sensed snow observations into a macroscale hydrology model, Adv. Water Resour., 29(6), 872 – 886, doi:10.1016/j.advwatres.2005.08.004. Arnell, N. W. (1999), A simple water balance model for the simulation of streamflow over a large geographic domain, J. Hydrol., 217(3 – 4), 314 – 335, doi:10.1016/S0022-1694(99)00023-2. Arnell, N. 
W. (2003), Effects of IPCC SRES emissions scenarios on river runoff: A global perspective, Hydrol. Earth Syst. Sci., 7, 619 – 641. Arnell, N. W. (2004), Climate change and global water resources: SRES emissions and socio-economic scenarios, Global Environ. Change, 14(1), 31 – 52. Bari, M. A., K. R. J. Smettem, and M. Sivapalan (2005), Understanding changes in annual runoff following land use changes: A systematic data-based approach, Hydrol. Processes, 19(13), 2463 – 2479, doi:10.1002/hyp.5679. Baumgartner, A., and E. Reichel (1975), Die Weltwasserbilanz, R. Oldenbourg, Munich, Germany. Beven, K. J. (1989), Changing ideas in hydrology: The case of physically based models, J. Hydrol., 105(1 – 2), 157 – 172, doi:10.1016/0022-1694(89)90101-7. Beven, K. (2006), A manifesto for the equifinality thesis, J. Hydrol., 320(1 – 2), 18 – 36, doi:10.1016/j.jhydrol.2005.07.007. Brakenridge, G. R., S. V. Nghiem, E. Anderson, and S. Chien (2005), Space-based measurement of river runoff, Eos Trans. AGU, 86, 185 – 188, doi:10.1029/2005EO190001. Chahinian, N., and R. Moussa (2007), Comparison of different multiobjective calibration criteria of a conceptual rainfall-runoff model of flood events, Hydrol. Earth Syst. Sci. Discuss., 4, 1031 – 1067. Clark, M. P., A. G. Slater, A. P. Barrett, L. E. Hay, G. J. McCabe, B. Rajagopalan, and G. H. Leavesley (2006), Assimilation of snow covered area information into hydrologic and land-surface models, Adv. Water Resour., 29(8), 1209 – 1221, doi:10.1016/j.advwatres.2005.10.001. Demaria, E. M., B. Nijssen, and T. Wagener (2007), Monte Carlo sensitivity analysis of land surface parameters using the variable infiltration capacity model, J. Geophys. Res., 112, D11113, doi:10.1029/2006JD007534. Dirmeyer, P. A., X. A. Gao, M. Zhao, Z. C. Guo, T. K. Oki, and N. Hanasaki (2006), GSWP-2—Multimodel analysis and implications for our perception of the land surface, Bull. Am. Meteorol. Soc., 87(10), 1381 – 1397, doi:10.1175/BAMS-87-10-1381. Döll, P., F.
Kaspar, and B. Lehner (2003), A global hydrological model for deriving water availability indicators: Model tuning and validation, J. Hydrol., 270(1 – 2), 105 – 134, doi:10.1016/S0022-1694(02)00283-4. Dunn, S. M., and R. J. E. Colohan (1999), Developing the snow component of a distributed hydrological model: A step-wise approach based on multi-objective analysis, J. Hydrol., 223(1 – 2), 1 – 16, doi:10.1016/ S0022-1694(99)00095-5. Eder, G., M. Sivapalan, and H. P. Nachtnebel (2003), Modelling water balances in an Alpine catchment through exploitation of emergent properties over changing time scales, Hydrol. Processes, 17(11), 2125 – 2149, doi:10.1002/hyp.1325. Fekete, B. M., C. J. Vörösmarty, and W. Grabs (2002), High-resolution fields of global runoff combining observed river discharge and simulated water balances, Global Biogeochem. Cycles, 16(3), 1042, doi:10.1029/ 1999GB001254. Fekete, B. M., C. J. Vörösmarty, J. O. Roads, and C. J. Willmott (2004), Uncertainties in precipitation and their impacts on runoff estimates, J. Clim., 17, 294 – 304, doi:10.1175/1520-0442(2004)017<0294:UIPATI>2.0.CO;2. Fekete, B. M., J. J. Gibson, P. Aggarwal, and C. J. Vörösmarty (2006), Application of isotope tracers in continental scale hydrological modeling, J. Hydrol., 330(3 – 4), 444 – 456, doi:10.1016/j.jhydrol.2006.04.029. Fiedler, K., and P. Döll (2007), Global modelling of continental water storage change—Sensitivity to different climate data sets, Adv. Geosci., 11, 63 – 68. Gerten, D., S. Schaphoff, U. Haberlandt, W. Lucht, and S. Sitch (2004), Terrestrial vegetation and water balance—Hydrological evaluation of a dynamic global vegetation model, J. Hydrol., 286(1 – 4), 249 – 270, doi:10.1016/j.jhydrol.2003.09.029. Güntner, A., J. Stuck, S. Werth, P. Döll, K. Verzano, and B. Merz (2007), A global analysis of temporal and spatial variations in continental water storage, Water Resour. Res., 43, W05416, doi:10.1029/2006WR005247. Haddeland, I., T. Skaugen, and D. P. 
Lettenmaier (2006), Anthropogenic impacts on continental surface water fluxes, Geophys. Res. Lett., 33, L08406, doi:10.1029/2006GL026047. Hanasaki, N., S. Kanae, and T. Oki (2006), A reservoir operation scheme for global river routing models, J. Hydrol., 327(1 – 2), 22 – 41, doi:10.1016/j.jhydrol.2005.11.011. Hanasaki, N., S. Kanae, T. Oki, K. Masuda, K. Motoya, N. Shirakawa, Y. Shen, and K. Tanaka (2008a), An integrated model for the assessment of global water resources—Part 1: Model description and input meteorological forcing, Hydrol. Earth Syst. Sci., 12, 1007 – 1025. Hanasaki, N., S. Kanae, T. Oki, K. Masuda, K. Motoya, N. Shirakawa, Y. Shen, and K. Tanaka (2008b), An integrated model for the assessment of global water resources—Part 2: Applications and assessments, Hydrol. Earth Syst. Sci., 12, 1027 – 1037. Hannerz, F., and A. Lotsch (2006), Assessment of land use and cropland inventories for Africa, Discuss. Pap. 22, Cent. for Environ. Econ. and Policy in Afr., Univ. of Pretoria, Pretoria, South Africa. Hay, L. E., G. H. Leavesley, M. P. Clark, S. L. Markstrom, R. J. Viger, and M. Umemoto (2006), Step wise, multiple objective calibration of a hydrologic model for a snowmelt dominated basin, J. Am. Water Resour. Assoc., 42(4), 877 – 890, doi:10.1111/j.1752-1688.2006.tb04501.x. Hillard, U., V. Sridhar, D. P. Lettenmaier, and K. C. McDonald (2003), Assessing snowmelt dynamics with NASA scatterometer (NSCAT) data and a hydrologic process model, Remote Sens. Environ., 86(1), 52 – 69, doi:10.1016/S0034-4257(03)00068-3. Hornberger, G. M., K. J. Beven, B. J. Cosby, and D. E. Sappington (1985), Shenandoah watershed study: Calibration of a topographically based, variable contributing area hydrological model to a small forested catchment, Water Resour. Res., 21, 1841 – 1850, doi:10.1029/WR021i012p01841. Huang, M. Y., and X.
Liang (2006), On the assessment of the impact of reducing parameters and identification of parameter uncertainties for a hydrologic model with applications to ungauged basins, J. Hydrol., 320(1 – 2), 37 – 61, doi:10.1016/j.jhydrol.2005.07.010. Hunger, M., and P. Döll (2008), Value of river discharge data for global-scale hydrological modeling, Hydrol. Earth Syst. Sci., 12, 841 – 861. Islam, M. S., T. Oki, S. Kanae, N. Hanasaki, Y. Agata, and K. Yoshimura (2007), A grid-based assessment of global water scarcity including virtual water trading, Water Resour. Manage., 21(1), 19 – 33, doi:10.1007/s11269-006-9038-y. Jothityangkoon, C., M. Sivapalan, and D. L. Farmer (2001), Process controls of water balance variability in a large semi-arid catchment: Downward approach to hydrological model development, J. Hydrol., 254(1 – 4), 174 – 198, doi:10.1016/S0022-1694(01)00496-6. Kaspar, F. (2004), Entwicklung und Unsicherheitsanalyse eines Globalen Hydrologischen Modells, Kassel Univ. Press, Kassel, Germany. Kleinen, T., and G. Petschel-Held (2007), Integrated assessment of changes in flooding probabilities due to climate change, Clim. Change, 81(3 – 4), 283 – 312, doi:10.1007/s10584-006-9159-6. Krause, P., D. P. Boyle, and F. Bäse (2005), Comparison of different efficiency criteria for hydrological model assessment, Adv. Geosci., 5, 89 – 97. Kucharik, C. J., J. A. Foley, C. Delire, V. A. Fisher, M. T. Coe, J. D. Lenters, C. Young-Molling, N. Ramankutty, J. M. Norman, and S. T. Gower (2000), Testing the performance of a dynamic global ecosystem model: Water balance, carbon balance, and vegetation structure, Global Biogeochem. Cycles, 14(3), 795 – 825, doi:10.1029/1999GB001138. Kuczera, G., and M. Mroczkowski (1998), Assessment of hydrologic parameter uncertainty and the worth of multiresponse data, Water Resour. Res., 34(6), 1481 – 1489, doi:10.1029/98WR00496. Legates, D. R., and G. J.
McCabe (1999), Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation, Water Resour. Res., 35(1), 233 – 241, doi:10.1029/1998WR900018. Legates, D. R., and C. J. Willmott (1990), Mean seasonal and spatial variability in gauge-corrected, global precipitation, Int. J. Climatol., 10(2), 111 – 127, doi:10.1002/joc.3370100202. Lehner, B., P. Döll, J. Alcamo, T. Henrichs, and F. Kaspar (2006), Estimating the impact of global change on flood and drought risks in Europe: A continental, integrated analysis, Clim. Change, 75(3), 273 – 299, doi:10.1007/s10584-006-6338-4. Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges (1994), A simple hydrologically based model of land surface water and energy fluxes for general-circulation models, J. Geophys. Res., 99(D7), 14,415 – 14,428, doi:10.1029/94JD00483. L’vovich, M. I. (1979), World Water Resources and Their Future, translated from Russian by R. L. Nace, AGU, Washington, D. C. Madsen, H. (2000), Automatic calibration of a conceptual rainfall-runoff model using multiple objectives, J. Hydrol., 235(3 – 4), 276 – 288, doi:10.1016/S0022-1694(00)00279-1. Mitchell, T. D., and P. D. Jones (2005), An improved method of constructing a database of monthly climate observations and associated high-resolution grids, Int. J. Climatol., 25(6), 693 – 712, doi:10.1002/joc.1181. Mouelhi, S., C. Michel, C. Perrin, and V. Andreassian (2006), Linking stream flow to rainfall at the annual time step: The Manabe bucket model revisited, J. Hydrol., 328(1 – 2), 283 – 296, doi:10.1016/j.jhydrol.2005.12.022. Nijssen, B., G. M. O’Donnell, A. F. Hamlet, and D. P. Lettenmaier (2001a), Hydrologic sensitivity of global rivers to climate change, Clim. Change, 50(1 – 2), 143 – 175, doi:10.1023/A:1010616428763. Nijssen, B., G. M. O’Donnell, D. P. Lettenmaier, D. Lohmann, and E. F. Wood (2001b), Predicting the discharge of global rivers, J.
Clim., 14(15), 3307 – 3323, doi:10.1175/1520-0442(2001)014<3307:PTDOGR>2.0.CO;2.
Nijssen, B., R. Schnur, and D. P. Lettenmaier (2001c), Global retrospective estimation of soil moisture using the variable infiltration capacity land surface model, 1980 – 93, J. Clim., 14(8), 1790 – 1808, doi:10.1175/1520-0442(2001)014<1790:GREOSM>2.0.CO;2.
Oki, T., Y. Agata, S. Kanae, T. Saruhashi, D. Yang, and K. Musiake (2001), Global assessment of current water resources using total runoff integrating pathways, Hydrol. Sci. J., 46, 983 – 995.
Pan, M., et al. (2003), Snow process modeling in the North American Land Data Assimilation System (NLDAS): 2. Evaluation of model simulated snow water equivalent, J. Geophys. Res., 108(D22), 8850, doi:10.1029/2003JD003994.
Parkin, G., G. O’Donnell, J. Ewen, J. C. Bathurst, P. E. O’Connell, and J. Lavabre (1996), Validation of catchment models for predicting land-use and climate change impacts. 2. Case study for a Mediterranean catchment, J. Hydrol., 175(1 – 4), 595 – 613, doi:10.1016/S0022-1694(96)80027-8.
Peel, M. C., B. L. Finlayson, and T. A. McMahon (2007), Updated world map of the Köppen-Geiger climate classification, Hydrol. Earth Syst. Sci., 11, 1633 – 1644.
Ponce, V. M., R. P. Pandey, and S. Ercan (2000), Characterization of drought across climatic spectrum, J. Hydrol. Eng., 5(2), 222 – 224, doi:10.1061/(ASCE)1084-0699(2000)5:2(222).
Probst, J. L., and Y. Tardy (1987), Long-range streamflow and world continental runoff fluctuations since the beginning of this century, J. Hydrol., 94(3 – 4), 289 – 311, doi:10.1016/0022-1694(87)90057-6.
Rawlins, M. A., K. C. McDonald, S. Frolking, R. B. Lammers, M. Fahnestock, J. S. Kimball, and C. J. Vörösmarty (2005), Remote sensing of snow thaw at the pan-Arctic scale using the SeaWinds scatterometer, J. Hydrol., 312(1 – 4), 294 – 311, doi:10.1016/j.jhydrol.2004.12.018.
Rawlins, M. A., M. Fahnestock, S. Frolking, and C. J.
Vörösmarty (2007), On the evaluation of snow water equivalent estimates over the terrestrial Arctic drainage basin, Hydrol. Processes, 21(12), 1616 – 1623, doi:10.1002/hyp.6724.
Riebsame, W. E., et al. (1995), Complex river basins, in As Climate Changes: International Impacts and Implications, edited by K. M. Strzepek and J. B. Smith, pp. 57 – 91, Cambridge Univ. Press, Cambridge, U. K.
Russell, G. L., and J. R. Miller (1990), Global river runoff calculated from a global atmospheric general circulation model, J. Hydrol., 117(1 – 4), 241 – 254, doi:10.1016/0022-1694(90)90095-F.
Schaefli, B., and H. V. Gupta (2007), Do Nash values have value?, Hydrol. Processes, 21(15), 2075 – 2080, doi:10.1002/hyp.6825.
Schaefli, B., B. Hingray, M. Niggli, and A. Musy (2005), A conceptual glacio-hydrological model for high mountainous catchments, Hydrol. Earth Syst. Sci., 9, 95 – 109.
Schmidt, R., et al. (2006), GRACE observations of changes in continental water storage, Global Planet. Change, 50(1 – 2), 112 – 126, doi:10.1016/j.gloplacha.2004.11.018.
Schulze, K., and P. Döll (2004), Neue Ansätze zur Modellierung von Schneeakkumulation und -schmelze im globalen Wassermodell WaterGAP, in Tagungsband zum 7. Workshop zur Großskaligen Modellierung in der Hydrologie. München, 27 – 28 November 2003, edited by R. Ludwig et al., pp. 145 – 154, Kassel Univ. Press, Kassel, Germany.
Sheffield, J., et al. (2003), Snow process modeling in the North American Land Data Assimilation System (NLDAS): 1. Evaluation of model-simulated snow cover extent, J. Geophys. Res., 108(D22), 8849, doi:10.1029/2002JD003274.
Udnæs, H. C., E. Alfnes, and L. M. Andreassen (2007), Improving runoff modelling using satellite-derived snow covered area?, Nord. Hydrol., 38(1), 21 – 32, doi:10.2166/nh.2007.032.
Vörösmarty, C. J., and B. Moore (1991), Modeling basin-scale hydrology in support of physical climate and global biogeochemical studies—An example using the Zambezi River, Surv.
Geophys., 12(1 – 3), 271 – 311, doi:10.1007/BF01903422.
Vörösmarty, C. J., C. J. Willmott, B. J. Choudhury, A. L. Schloss, T. K. Stearns, S. M. Robeson, and T. J. Dorman (1996), Analyzing the discharge regime of a large tropical river through remote sensing, ground-based climatic data, and modeling, Water Resour. Res., 32(10), 3137 – 3150, doi:10.1029/96WR01333.
Vörösmarty, C. J., K. P. Sharma, B. M. Fekete, A. H. Copeland, J. Holden, J. Marble, and J. A. Lough (1997), The storage and aging of continental runoff in large reservoir systems of the world, Ambio, 26(4), 210 – 219.
Vörösmarty, C. J., C. A. Federer, and A. L. Schloss (1998), Evaporation functions compared on US watersheds: Possible implications for global-scale water balance and terrestrial ecosystem modeling, J. Hydrol., 207(3 – 4), 147 – 169, doi:10.1016/S0022-1694(98)00109-7.
Vörösmarty, C. J., P. Green, J. Salisbury, and R. B. Lammers (2000a), Global water resources: Vulnerability from climate change and population growth, Science, 289(5477), 284 – 288, doi:10.1126/science.289.5477.284.
Vörösmarty, C. J., B. M. Fekete, M. Meybeck, and R. B. Lammers (2000b), Global system of rivers: Its role in organizing continental land mass and defining land-to-ocean linkages, Global Biogeochem. Cycles, 14(2), 599 – 621, doi:10.1029/1999GB900092.
Wagener, T. (2003), Evaluation of catchment models, Hydrol. Processes, 17(16), 3375 – 3378, doi:10.1002/hyp.5158.
Wagener, T., N. McIntyre, M. J. Lees, H. S. Wheater, and H. V. Gupta (2003), Towards reduced uncertainty in conceptual rainfall-runoff modelling: Dynamic identifiability analysis, Hydrol. Processes, 17(2), 455 – 476, doi:10.1002/hyp.1135.
Werth, S., A. Güntner, and B. Merz (2007), Calibration of the global hydrology model WGHM with water storage variations from the GRACE mission, Geophys. Res. Abstr., 9, 05743, sref:1607-7962/gra/EGU2007-A-05743.
Widén-Nilsson, E., S. Halldin, and C.
Xu (2007), Global water-balance modelling with WASMOD-M: Parameter estimation and regionalisation, J. Hydrol., 340(1 – 2), 105 – 118, doi:10.1016/j.jhydrol.2007.04.002.
Xu, C.-Y. (2002), WASMOD—The water and snow balance modeling system, in Mathematical Models of Small Watershed Hydrology and Applications, edited by V. P. Singh and D. K. Frevert, chap. 17, pp. 555 – 590, Water Resour. Publ., Highlands Ranch, Colo.
Xu, C. Y., and G. L. Vandewiele (1994), Sensitivity of monthly rainfall-runoff models to input errors and data length, Hydrol. Sci. J., 39(2), 157 – 176.

L. Gong and S. Halldin, Department of Earth Sciences, Uppsala University, Villavägen 16, SE-752 36 Uppsala, Sweden. (sven.halldin@hyd.uu.se)
E. Widén-Nilsson, Department of Aquatic Science and Assessment, Swedish University of Agricultural Sciences, P.O. Box 7050, SE-750 07 Uppsala, Sweden.
C.-Y. Xu, Department of Geosciences, University of Oslo, P.O. Box 1047, Blindern, N-0316 Oslo, Norway.