Estimation of Regional Parameters 03/09/01

Estimation of Regional Parameters in a Macro Scale Hydrological Model

Kolbjørn Engeland, Lars Gottschalk and Lena Tallaksen
Department of Geophysics, University of Oslo, Norway

Abstract

Macro-scale hydrological modelling implies a repeated application of a model within an area using regional parameters. These parameters are based on climate and landscape characteristics, and they are used to calculate the water balance in ungauged areas. The regional parameters ought to be robust and not too dependent on the catchment and time period used for calibration. The ECOMAG model is applied to the NOPEX region as a macro-scale hydrological model distributed on a 2 x 2 km grid. Each model element is assigned parameters according to soil and vegetation classes. A Bayesian methodology is followed. An objective function describing the fit between observed and simulated values is used to describe the likelihood of the parameters. Using Bayes' theorem, these likelihoods are used to update the probability distributions of the parameters with additional data, be it an additional year of streamflow or an additional streamflow station. Two sampling methods are used: regular sampling and Metropolis-Hastings sampling. The results show that regional parameters exist according to some predefined criteria. The probability distribution of the parameters shows a decreasing variance as data from new catchments are used for updating. A few parameters do not, however, exhibit this property, and they are therefore not suitable in a regional context.

Key words: hydrological macro modelling, distributed models, regional parameters, GLUE, NOPEX

Introduction

Regional hydrological modelling or hydrological macro modelling implies a repeated use of a model everywhere within an area using regional parameters.
Observations for calibration and validation of the model are only available at a subset of the sites where the model is applied. For all sites without observations the model application needs to be based on the regional parameters. The problem as such is a classical one in hydrology: to be able to calculate streamflow, or possibly other hydrological variables like soil moisture or groundwater level, at ungauged sites. It has, however, received renewed interest in climate impact studies where water balance elements are estimated over large territories by linking the hydrological models more or less directly to General Circulation Models (GCM).

Regionalisation methods aim to find a relationship between the parameters of the modelling units and the physical characteristics of the corresponding landscape unit. Parameters of lumped conceptual models operating at the catchment scale can be regionalised by relating them to catchment characteristics using multiple regression (e.g. Abdulla and Lettenmaier, 1997). For a distributed hydrological model, the approach is different. First, as the catchment unit used for calibration can be composed of several modelling units, a regression analysis is difficult to perform. Secondly, the modelling strategy adopted in this paper is to include physical characteristics, e.g. soil and land use classes, in the parameterisation. The regression method would disturb this strategy.

Klemes (1986) suggested the split sample and proxy basin tests for regionalisation of parameters. The split sample test considers whether the model is transposable in time, and the proxy basin test whether the model is geographically transposable within a region. For both tests the model parameters are calibrated using a subset of data and then validated on independent data (data from years or catchments not used in the calibration).

Neither the regression method nor the split sample and proxy basin tests consider the non-uniqueness of parameter sets giving good model results.
This is especially important when model parameters are correlated. In such cases two catchments giving approximately the same parameter sets may not be hydrologically similar, and vice versa. The split sample and proxy basin tests were used when applying the ECOMAG model to the NOPEX region (Motovilov et al. 1999). A striking result was the variation in performance criteria between different years and different catchments. The final calibrated parameter set was therefore dependent on the data used for calibration. Regional parameters ought to be robust and not too dependent on the catchment and time period used for calibration.

A conclusion that can be drawn from the studies quoted above is that more formal procedures are needed, both for accepting a model for regional application and for the search for regional parameters. In the hydrological literature there are at least two approaches that can serve as appropriate tools: the multi-objective method (Gupta et al., 1998) and the Bayesian method, in hydrology referred to as Generalised Likelihood Uncertainty Estimation (GLUE) (Beven and Binley, 1992). Both the multi-objective method and the Bayesian method consider the uncertainty in the choice of parameter values instead of finding the one and only optimal parameter set. In a multi-objective context, the parameter variability is due to the trade-off between one or more objective functions for the different catchments, resulting in a set of Pareto optimal parameter sets. The Bayesian method, on the other hand, gives the statistical uncertainty around the optimum of one objective function for all the catchments.

In this study the Bayesian method is selected for analysing the performance of the ECOMAG model in the NOPEX area. Streamflow data from several catchments are used to update the probability distributions of the parameters.
For a model to perform satisfactorily in a regional context it might be expected that:
• The shape and the optima of the parameter distribution do not depend too much on the catchments or years used to estimate it.
• The model result (objective criterion) is sensitive to the parameters.
• The variance of the parameter distribution decreases as new streamflow data (from a new year or catchment) are used for updating. (This follows from the two points above.)
• A performance criterion (here the Nash-Sutcliffe coefficient Reff (Nash and Sutcliffe, 1970)) calculated for the optimal regional parameter set for each catchment should be higher than some lowest acceptable value (here a value of at least 0.75 is classified as a good result and a value between 0.36 and 0.75 as a satisfactory result).

An introductory part of the paper briefly presents the main features of the ECOMAG model and the basic data sets used from the NOPEX region. It is followed by a description of the Bayesian method and the applied sampling methods: regular sampling and Metropolis-Hastings (MH) sampling. The results of applying these methods to construct two-dimensional parameter probability distributions (regular sampling) and a nine-dimensional distribution (MH sampling) are presented. The distributions reveal how model structure and parameters behave in a regional context. Finally, conclusions are drawn concerning both the applicability of the ECOMAG model as a macro-scale hydrological model and the quality and quantity of the NOPEX data for use in regional hydrological modelling.

Catchment and Model Descriptions

The NOPEX Area

The NOPEX area (Halldin et al. 1995; 1999) is situated in southern Sweden, northwest of Uppsala. It is an area of low relief with altitude ranging from 5 to 145 m a.s.l. The area is crossed by some north-south oriented eskers reaching a height of 20-50 m over the surrounding terrain. Outcrops of bedrock also rise over the plain.
Till is the most common soil type, particularly in the north. Fine-grained clay soils, together with sandy and silty materials, dominate in the south. Part of the area is covered by peatland, which has its largest extent in the northern part (Seibert, 1994). The NOPEX area has a heterogeneous surface cover, represented by coniferous and mixed forest (57%), mires (2.6%), lakes (2.6%) and urbanised areas (2.0%). The remaining 35.8% is mainly agricultural land (evaluated from digital maps of the National Land Survey of Sweden). The portion of forest increases from south to north. The forest is predominantly coniferous.

Annual precipitation in the NOPEX area fluctuates between 600 and 800 mm, with a minimum in August and a maximum in February. 20 to 30 per cent of the total annual precipitation falls as snow. A snow cover lasts from the middle of November for 100 to 110 days on average, but is normally not continuous throughout the winter. The mean annual temperature for the period 1961-1990 at the station Uppsala is +6 °C, with a maximum in July (+17 °C) and a minimum in February (-5 °C). The vegetation period lasts about 180 days (Seibert, 1994).

The Swedish Meteorological and Hydrological Institute (SMHI) has 25 precipitation stations, 7 temperature stations, 5 air humidity stations and 10 streamflow gauging stations in the NOPEX area. The gauged catchments cover a large part of the area, as illustrated in Fig. 1. Short catchment descriptions are given in Table 1. All the data are available as daily values for the period 1981-1995 in the SINOP database (Halldin and Lundin, 1994) developed for the NOPEX project. Temperature and vapour pressure deficit are interpolated to a regular 2 km grid by inverse distance weighting, whereas the precipitation is interpolated by kriging (Motovilov et al. 1999).

Fig. 1.
NOPEX area and the ten gauged catchments.

Table 1 - Gauged catchments in the NOPEX area.

Station | Catchment | Area (km2) | Lake (%) | Forest (%) | Open land (%)
Vattholma | Vattholmån | 284.0 | 4.8 | 71.0 | 24.2
Ulva Kvarndamn | Fyrisån | 950.0 | 3.0 | 61.0 | 36.0
Sörsätra | Sagån | 612.0 | 1.1 | 61.0 | 37.9
Gränvad | Lillån | 168.0 | 0.0 | 41.0 | 59.0
Härnevi | Örsundaån | 305.0 | 1.0 | 55.0 | 44.0
Lurbo | Hågaån | 124.0 | 0.3 | 77.7 | 27.0
Ransta | Sävaån | 198.0 | 0.9 | 66.1 | 33.0
Sävja | Sävjaån | 727.0 | 2.0 | 64.0 | 34.0
Stabby | Stabbybäcken | 6.6 | 0.0 | 87.0 | 13.0
Tärnsjö | Stalbobäcken | 14.0 | 1.5 | 84.5 | 14.0

The ECOMAG Model

The ECOMAG model (Motovilov et al. 1999) describes the main processes of the land surface hydrological cycle: infiltration, evapotranspiration, heat and water regime of the soil, snowmelt and formation of surface, subsurface, groundwater and river runoff at a daily time resolution. The catchment is divided into grid cells (here 2 x 2 km), and the same model algorithms are applied to each cell. The vertical structure of each grid cell is shown in Fig. 2.

A threshold temperature decides the phase of precipitation: snow or rain. Snowmelt is calculated using a degree-day factor. The water reaching the ground, rain in summer or melt water in winter, infiltrates into horizon A, where it is partitioned between the capillary and the non-capillary zone. If horizon A is saturated or the infiltration capacity is exceeded, surface runoff is formed, described by a kinematic wave. In the non-capillary zone of horizon A, the water can flow horizontally to the river network following Darcy's law, or infiltrate vertically into horizon B. From the capillary zone of horizon A, water can only be removed by evapotranspiration. In horizon B the water can penetrate to the groundwater zone. In the groundwater zone the water flows horizontally to the river network following Darcy's law.

Each grid cell is assigned a soil class and a vegetation class.
Some of the parameters are determined from the soil or the vegetation class of the grid cell. The rest of the parameters are common for the whole region. Three parameters are described by a distribution function to account for the variability within a grid cell: the field capacity of horizon A, the surface retention storage and the vertical conductivity of horizon A. Table 2 lists the optimal parameter values found in Motovilov et al. (1999).

To reduce the number of parameters that need calibration, the parameters for soil and vegetation classes are not calibrated for each individual class. Instead the standard parameter values are multiplied by a common factor. This means that the relative differences between the parameter values of each soil or vegetation class are determined prior to the calibration.

Fig. 2. Vertical structure of the ECOMAG model.

Application of ECOMAG to the NOPEX Region

In this application the thickness of horizon B is set to zero because, in a typical Nordic catchment with mainly till deposits, the groundwater table is close to the surface. The present study is based on the work by Motovilov et al. (1999), where a regional calibration and validation of ECOMAG was done following the proxy basin scheme (Klemes, 1986) in addition to internal validation.
As a first step the model was calibrated on streamflow data for seven years for three catchments. An additional adjustment of the soil parameters was performed using soil moisture and groundwater data from five small experimental catchments. This was followed by validation of the model against streamflow for 14 years from six other catchments and against synoptic streamflow and evapotranspiration measurements performed during two concentrated field efforts in 1994 and 1995.

Table 2 - Parameters that need to be specified in the ECOMAG model and the parameter values found by Motovilov et al. (1999a; 1999b). d.l. = dimensionless

Parameters for soil classes

Parameter | Peat | Clay | Sand | Till | Shallow bedrocks | Lakes
Volume density (g cm-3) | 0.2 | 1.0 | 1.2 | 1.1 | 1.2 | 0.2
Porosity of horizon A (d.l.) | 0.90 | 0.65 | 0.45 | 0.60 | 0.45 | 0.90
Porosity of groundwater zone (d.l.) | 0.80 | 0.45 | 0.45 | 0.45 | 0.25 | 0.80
Field capacity of horizon A (d.l.) | 0.60 | 0.45 | 0.20 | 0.40 | 0.10 | 0.60
Field capacity of groundwater zone (d.l.) | 0.60 | 0.43 | 0.40 | 0.43 | 0.20 | 0.60
Wilting point of horizon A (d.l.) | 0.30 | 0.27 | 0.10 | 0.16 | 0.02 | 0.10
Vertical conductivity of horizon A (cm day-1), VCA | 464.7 | 139.4 | 464.7 | 232.3 | 464.7 | 464.7
**Thickness of horizon A (cm), THA | 95.44 | 47.72 | 47.72 | 47.72 | 47.72 | 95.44
Thickness of horizon B (cm) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
Maximal field capacity (d.l.) | 0.70 | 0.57 | 0.28 | 0.55 | 0.20 | 0.70

*Horizontal conductivity of horizon A (cm day-1), HCA, and *Horizontal conductivity of groundwater zone (cm day-1), HCG (values in source order; their assignment to soil classes is not recoverable): 1140 81202 11400 1140 1540 1540 114000 1710 0 4620 1540 1540 1540

Parameters for vegetation classes

Parameter | Open land | Forest | Lake | Swamp | Urban
Evaporation parameter (cm day-1 mb-1), EVAP | 0.072 | 0.072 | 0.080 | 0.080 | 0.072
Degree-day-factor (cm day-1 °C-1), DDF | 0.42 | 0.294 | 0.42 | 0.42 | 0.42
Density of new snow (g cm-3) | 0.15 | 0.12 | 0.15 | 0.15 | 0.15
Heat conductivity for thawed soil (cal cm-1 °C-1 day-1) | 120 | 96 | 96 | 96 | 120
Heat conductivity for frozen soil (cal cm-1 °C-1 day-1) | 240 | 192 | 192 | 192 | 240
**Maximal surface depression storage (cm), SDS | 4.0 | 4.0 | 7.0 | 7.0 | 4.0
Manning's roughness coefficient for slope | 11.5 | 11.5 | 11.5 | 11.5 | 11.5
Factor for thickness of horizon A (d.l.) | 1.0 | 1.3 | 1.0 | 1.0 | 1.0
Factor for horizontal conductivity of horizon A (d.l.) | 1.261 | 0.247 | 1.0 | 1.0 | 1.0

Parameters for whole catchment

Parameter | Value
Critical temperature snow/rain (°C), CTP | 0.69
Snow water holding capacity (volume/volume) | 0.045
Parameter of snow compaction (cm2 g-1 day-1) | 0.15
Snow evaporation parameter (cm day-1 mb-1) | 0.01
Depth of unchanged ground temperature (cm) | 120
Temperature of groundwater (°C) | 2.0
Part of actual evaporation from horizon A (d.l.) (the rest is evaporated from the groundwater zone) | 0.94
Critical temperature for start of snowmelt (°C), CTM | 0.00

*The parameter is multiplied by the factor (mean slope of element/mean slope of NOPEX area)
**The parameter is divided by the factor (mean slope of element/mean slope of NOPEX area)

The Bayesian Method

The Bayesian method aims to establish a multi-dimensional probability distribution for the parameters conditioned on hydrological observations. Given an observed data vector Y, the probability p of the parameter set $\theta_i$ is:

$$p(\theta_i \mid Y) \qquad (1)$$

Due to non-linearities in the model, the distribution may have an irregular surface containing several local maxima. Therefore an empirical non-parametric distribution of Eq. (1) is established.

Estimation of Likelihood

First the likelihood for the parameter sets is calculated as (e.g. Freer et al., 1996):

$$L(\theta_i \mid Y) \propto \exp\left(-\frac{\sigma_i^2}{\sigma_{obs}^2}\right) \qquad (2)$$

where $L(\theta_i \mid Y)$ is the likelihood of the ith parameter set given the observations Y, $\sigma_i^2$ is the sum of squared errors divided by the number of time steps in a period, and $\sigma_{obs}^2$ is the observed variance over a period (here one year). The likelihood function is calculated for a period of one year, 1 June to 31 May.

The choice of the likelihood function can be based on two different arguments.
The first is to assume that the likelihood of a parameter set is proportional to a quality of fit measure (Beven and Binley, 1992). The choice of likelihood function is then subjective. The function in Eq. (2) was chosen because, when Bayes' theorem is used for updating, the error variance of each period or catchment contributes linearly inside the exponent.

Secondly, a statistical derivation that reveals the assumptions hidden inside this likelihood function can be performed. In this case it is assumed that $Y_j \sim f(Y_j \mid \theta_i)$, where $f(\cdot \mid \cdot)$ is a density function indexed by a parameter vector $\theta_i$. When the data $Y_j$ are given, the likelihood function $L(\theta_i \mid Y_j)$ is any function proportional to $f(Y_j \mid \theta_i)$. It is assumed that the simulation errors $\varepsilon_{i,j}$ at each time step j are identically normally distributed:

$$\varepsilon_{i,j} \sim N(0, \sigma_\varepsilon^2) \qquad (3)$$

where $\sigma_\varepsilon^2$ is the variance of the simulation errors (the difference between the observed and simulated streamflow). Then the likelihood of the ith parameter set $\theta_i$, dependent on one observation $Y_j$, is:

$$L(\theta_i \mid Y_j) \propto \exp\left(-\frac{\varepsilon_{i,j}^2}{2\sigma_\varepsilon^2}\right) \qquad (4)$$

The likelihood of the parameter set $\theta_i$ when data Y from one year are given, and assuming the errors are independent, is:

$$L(\theta_i \mid Y) \propto \prod_{j=1}^{n} \exp\left(-\frac{\varepsilon_{i,j}^2}{2\sigma_\varepsilon^2}\right) = \exp\left(-\frac{\sum_{j=1}^{n} \varepsilon_{i,j}^2}{2\sigma_\varepsilon^2}\right) = \exp\left(-\frac{n\,\sigma_i^2}{2\sigma_\varepsilon^2}\right) \qquad (5)$$

To obtain Eq. (2), set n = 365; the variance of the simulation errors then has to be:

$$\sigma_\varepsilon^2 = \frac{n}{2}\,\sigma_{obs}^2 \qquad (6)$$

It should be noted that the variance in Eq. (6), corresponding to the likelihood in Eq. (2), is far too large to be a reasonable estimate of the variance of the simulation errors. The over-estimation of the variance in Eq. (2) results in an over-estimation of the variance of the model parameters. The likelihood in Eq. (2) is not a statistical likelihood but a generalised (GLUE) likelihood. The difference between Eq. (2) and Eq. (5) influences only the scale of the likelihood, neither the location nor the shape.
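The relation between Eqs. (2), (5) and (6) can be checked numerically. The following is a minimal sketch, not the implementation used in the study: it assumes NumPy, uses hypothetical function names and synthetic streamflow series, and takes the observed variance in its population form (division by n), which is an assumption.

```python
import numpy as np

def glue_likelihood(obs, sim):
    """Eq. (2): exp(-sigma_i^2 / sigma_obs^2) for one period (unnormalised)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    sigma_i2 = np.mean((obs - sim) ** 2)  # sum of squared errors / number of time steps
    sigma_obs2 = np.var(obs)              # observed variance (population form, assumed)
    return np.exp(-sigma_i2 / sigma_obs2)

def implied_error_variance(obs):
    """Eq. (6): error variance for which Eq. (5) reduces to Eq. (2)."""
    obs = np.asarray(obs, dtype=float)
    return 0.5 * obs.size * np.var(obs)

def gaussian_likelihood(obs, sim, sigma_eps2):
    """Eq. (5): independent normal errors with variance sigma_eps2 (unnormalised)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    sigma_i2 = np.mean((obs - sim) ** 2)
    return np.exp(-obs.size * sigma_i2 / (2.0 * sigma_eps2))

# Synthetic one-year daily series (hypothetical data, for illustration only).
rng = np.random.default_rng(1)
obs = 5.0 + np.sin(np.linspace(0.0, 2.0 * np.pi, 365)) + rng.normal(0.0, 0.2, 365)
sim = obs + rng.normal(0.0, 0.5, 365)

l_glue = glue_likelihood(obs, sim)
l_gauss = gaussian_likelihood(obs, sim, implied_error_variance(obs))
```

Under these assumptions the two values coincide, and since $\sigma_i^2 / \sigma_{obs}^2 = 1 - R_{eff}$, the likelihood in Eq. (2) equals $\exp(R_{eff} - 1)$ for the same period.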
The statistical model chosen for the simulation errors (Eq. 3) does not describe all the properties of the data. To find a statistically more correct likelihood function, the statistical properties of the simulation errors have to be carefully investigated: the distribution of the simulation errors, whether the variance is constant or dependent on streamflow and input variables (e.g. temperature, precipitation), and whether the simulation errors are autocorrelated. Sorooshian (1991), Romanowicz et al. (1994), Langsrud et al. (1998) and Kuczera (1983), among others, have constructed statistical models for the simulation error that take into account one or more of these three aspects. It is difficult to find a model that describes the simulation errors satisfactorily. Gupta et al. (1998) suggested that an objective, statistically correct choice for the error function may not exist.

Even though the chosen likelihood function is not statistically correct, it is a useful starting point for investigating the existence of regional parameters and approximately where in the parameter space the best parameter values are located. The likelihood function (Eq. 2) is selected because it is already commonly applied in hydrology. The aim is not to assess the model or parameter uncertainty in detail. Further research into this topic is needed, but it is not within the scope of the present work.

Definition of Prior Distribution of Parameters

The prior distribution of the parameters is chosen to be uniform. The upper and lower limits of the uniform distributions have to be decided. For some parameters the choice is easy, e.g. the threshold temperature for snow/rain precipitation. For other parameters, e.g. the horizontal conductivity of horizon A, test runs were performed to ensure a sufficiently large variation. The limits span the optimised parameters from Motovilov et al. (1999) (Table 2).
Estimation of Posterior Distribution of Parameters

The posterior distribution, $p(\theta_i \mid Y)$, is estimated using Bayes' theorem:

$$p(\theta_i \mid Y) = \frac{p(Y \mid \theta_i)\, p(\theta_i)}{p(Y)} \qquad (7)$$

When the data Y are given, $p(Y \mid \theta_i)$ can be regarded as a function of $\theta_i$, which is the likelihood function of $\theta_i$ given Y and is written $L(\theta_i \mid Y)$. We then have:

$$p(\theta_i \mid Y) = \frac{L(\theta_i \mid Y)\, p(\theta_i)}{C} \qquad (8)$$

where $p(\theta_i)$ is the prior probability for the parameter set, $p(\theta_i \mid Y)$ the posterior probability given observations Y, $L(\theta_i \mid Y)$ the likelihood function calculated from the set of observations Y, and C is a scaling constant making the cumulative sum equal to 1.

Data from an additional period or data from a neighbouring catchment are used for updating the distribution $p(\theta_i \mid Y)$. When Eq. (2) is used to calculate the likelihood, additional error variance contributes linearly inside the exponent. Repeated use of Eqs. (2) and (8) gives the posterior distribution conditioned on n sets of observations:

$$p(\theta_i \mid Y_1, \ldots, Y_n) = \frac{1}{C} \exp\left(-\frac{\sigma_{i,1}^2}{\sigma_{obs,1}^2} - \ldots - \frac{\sigma_{i,n}^2}{\sigma_{obs,n}^2}\right) \qquad (9)$$

Value of Additional Data

The Shannon entropy measure H describes the spread of a multi-dimensional distribution and is used here to measure the advantage of additional data:

$$H = -\frac{\sum_{i=1}^{M} p_i \log p_i}{\log M} \qquad (10)$$

where the probabilities $p_i$ are scaled so that their sum equals one, and M is the total number of simulations of different parameter sets. This function has a maximum value of 1 when all the realisations have the same probability, and a minimum of zero when one realisation has a probability of 1 and all others are zero.

Uncertainty Boundaries for Streamflow

The errors in the streamflow simulations have four main sources: measurement errors in the observed data used for calibration, measurement errors in the observed data used as input, errors in the model structure and errors in the parameter values.
To assess the uncertainty in calculated streamflow due to the uncertainty in parameter values, samples from the estimated parameter distribution are put into the model to calculate a sample of streamflows for each day. The 95% quantiles of the calculated streamflow can then be plotted together with the observed streamflow. Such a plot is useful for illustrating how important the uncertainty in parameter values is for the simulation error. If the observed streamflow falls outside the uncertainty boundaries, the other error sources are also important.

Sampling at regular points

The distributions are sampled at regular points in the parameter space. Following this strategy, the distributions are calculated only once for each data set (here 90 data sets in total, from 10 years and nine catchments), and afterwards Bayes' theorem makes it possible to combine the distributions. However, this implementation requires a huge calculation capacity when the parameter space has many dimensions. If the model has 9 parameters, $10^9$ simulations are necessary to run the model at a resolution of 1/10 of the initial parameter range. It is, however, still possible that the area of highest probability is not found. If the parameter range is chosen too wide, the sample in the parameter space may be too sparse, and if it is too narrow, the most probable part of the parameter space may fall outside the window. As 1000 iterations of the ECOMAG model for the nine catchments on a 500 MHz Compaq Alpha workstation require 7.5 hours, the computation time limits the dimension of the parameter space. This would also be the case for other sampling strategies, e.g. importance sampling and stratified sampling. Therefore conditional distributions are calculated.

Metropolis Hastings Sampling

The Metropolis Hastings (MH) algorithm (Metropolis et
al., 1953; Hastings, 1970), a Markov Chain Monte Carlo (MCMC) method, is used to find the distribution in a nine-dimensional parameter space. This sampling method is chosen because it only requires knowledge about the likelihood function, and because the number of required calculations is almost independent of the dimension of the parameter space. Kuczera and Parent (1998) used the MH algorithm to assess parameter uncertainty in a hydrological model, and they concluded that this algorithm produces reliable inference with modest sampling.

Here a random-walk MH algorithm is applied (Chib and Greenberg, 1995) to calculate the chain {θ(1), θ(2), .., θ(n)}, which gives the parameter vector for iterations 1, 2, .., n. The first iterations, before the chain has converged (the burn-in), have to be removed. From the rest of the chain, the mean, variance, correlation and histogram can be calculated. As one long chain is better than several short chains (Geyer, 1992), one chain of length 11000 is calculated.

For the MH algorithm the number of iterations is independent of the dimension of the parameter space, whereas for regular sampling the number of calculations increases exponentially with the dimension. As the MH algorithm investigates only the most probable part of the parameter space, the problem of too small a sampling density is avoided. But the MH algorithm has some drawbacks. The chain generated by the MH algorithm may not converge, and the algorithm may have difficulties especially with multi-modal distributions. The MH routine does not allow updating of the distribution (Eq. 9) when additional data are available; a new chain has to be calculated. In this study 90 data sets are available in total, but a chain is calculated only once using all the data sets.
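The random-walk MH procedure described above can be sketched as follows. This is an illustrative sketch only, not the routine used in the study: it assumes NumPy, a Gaussian proposal with a fixed step size, and a uniform prior expressed through hypothetical lower and upper bounds. The function `toy_log_likelihood` is a hypothetical stand-in for the exponent of Eq. (9), which in the study requires ECOMAG runs for all data sets; the target values, bounds and step size are invented for the example.

```python
import numpy as np

def random_walk_mh(log_likelihood, lower, upper, start, step, n_iter, seed=0):
    """Random-walk Metropolis sampler with a uniform prior on [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    def log_post(theta):
        # Uniform prior: zero probability outside the bounds.
        if np.any(theta < lower) or np.any(theta > upper):
            return -np.inf
        return log_likelihood(theta)

    theta = np.asarray(start, dtype=float)
    logp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    for k in range(n_iter):
        # Symmetric Gaussian proposal centred on the current point.
        proposal = theta + step * rng.standard_normal(theta.size)
        logp_new = log_post(proposal)
        # Accept with probability min(1, p_new / p_old).
        if np.log(rng.random()) < logp_new - logp:
            theta, logp = proposal, logp_new
        chain[k] = theta
    return chain

# Hypothetical two-parameter stand-in for the summed exponent in Eq. (9).
target = np.array([0.5, 1.5])

def toy_log_likelihood(theta):
    return -np.sum(((theta - target) / 0.1) ** 2)

chain = random_walk_mh(toy_log_likelihood, lower=[0.0, 0.0], upper=[2.0, 2.0],
                       start=[1.0, 1.0], step=0.1, n_iter=11000)
kept = chain[1000:]                          # drop the burn-in, as in the text
post_mean = kept.mean(axis=0)
post_corr = np.corrcoef(kept, rowvar=False)  # parameter correlation matrix
```

Because the log-likelihoods of independent data sets add inside the exponent, all data sets enter through a single call to the log-likelihood function; adding a data set changes that function and therefore requires a new chain.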
Results

To investigate how the entropy develops as more data sets are used in the likelihood function, regular sampling of conditional distributions in a two-dimensional parameter space is performed. The MH algorithm is applied for calculating the distribution in the nine-dimensional parameter space when data from 10 years and 9 catchments are used.

Regular sampling

A regular sampling of conditional distributions is performed in two-dimensional parameter spaces. To have a common reference, the horizontal conductivity of horizon A is chosen as one dimension in all the spaces. The other dimensions are, in turn, vertical conductivity of horizon A, horizontal conductivity of the groundwater zone, thickness of horizon A, evaporation parameter, surface depression storage, critical temperature for snow/rain precipitation, degree-day-factor and threshold temperature for start of snowmelt. The simulated streamflow proved to be most sensitive to these nine parameters. The likelihood function is computed for a time period of one year, 1 June to 31 May, for each catchment for the years 1981-1990. Fig. 3 shows the entropy measure of the conditional and marginal distributions as new streamflow data are used for updating (10 years for each station) when regular sampling is used.

MH sampling

The MH algorithm is tuned, and 11000 iterations are performed for a nine-dimensional parameter space. The histograms for the marginal distributions are shown in Fig. 4. As the starting point of the chain is assumed to be within the distribution, the first 1000 samples are removed as burn-in, and 10000 samples are left for the analysis. Several of the parameters are correlated (Table 3). The highest correlations are found between the three parameters representing winter conditions: degree-day-factor, critical temperature for start of snowmelt and critical temperature for phase of precipitation. The calculated chains seem to be stationary for all parameters.
The autocorrelations decrease slowly for the three correlated parameters; only after 1000 iterations are the autocorrelations insignificant. Test calculations show that the problems disappear if two of the three parameters are excluded. The high autocorrelation implies that longer chains are necessary to get a better estimate of the histograms and that the results from the two-dimensional conditional distributions may not be totally correct. The Nash-Sutcliffe coefficients Reff for the parameter values having the maximum likelihood value in the MH chain are given in Table 4.

Fig. 3. The entropy measure for the two-dimensional parameter spaces, as new data are used for updating of the probability distributions.

Table 3 - Correlation matrix for nine ECOMAG parameters (see Table 2 for the abbreviations of parameter names)

     | HCA | EVAP | DDF | CTM | CTP | THA | SDS | HCG | VCA
HCA | 1.00 | 0.34 | -0.09 | 0.03 | -0.02 | 0.47 | 0.43 | 0.43 | -0.10
EVAP | 0.34 | 1.00 | -0.01 | 0.07 | -0.13 | 0.18 | 0.08 | 0.32 | 0.02
DDF | -0.09 | -0.01 | 1.00 | 0.81 | -0.31 | 0.05 | 0.01 | -0.10 | -0.16
CTM | 0.03 | 0.07 | 0.81 | 1.00 | -0.65 | 0.05 | 0.15 | 0.02 | -0.13
CTP | -0.02 | -0.13 | -0.31 | -0.65 | 1.00 | 0.08 | -0.06 | -0.05 | 0.01
THA | 0.47 | 0.16 | 0.05 | 0.05 | 0.08 | 1.00 | 0.37 | 0.45 | -0.00
SDS | 0.43 | 0.08 | 0.01 | 0.15 | -0.06 | 0.37 | 1.00 | 0.41 | -0.25
HCG | 0.43 | 0.32 | -0.10 | 0.02 | -0.05 | 0.45 | 0.41 | 1.00 | -0.04
VCA | -0.01 | 0.02 | -0.16 | -0.13 | 0.01 | -0.00 | -0.25 | -0.04 | 1.00

Panels: horizontal conductivity horizon A, evaporation parameter, degree-day-factor, critical temperature for start of snowmelt, critical temperature for phase of precipitation, thickness horizon A, surface depression storage, horizontal conductivity groundwater zone, vertical conductivity horizon A.

Fig. 4.
The posterior marginal probability distributions updated on data from all years and catchments, calculated using the MH algorithm.

Table 4 - Model performance (Reff) for the gauged catchments in the NOPEX area, when the optimal parameter set is used.

Year | Fyrisån | Sagån | Lillån | Örsundaån | Hågaån | Sävaån | Sävjaån | Stabbybäcken | Stalbobäcken
1981/82 | 0.76 | 0.65 | 0.82 | 0.86 | 0.75 | 0.82 | 0.75 | 0.17 | 0.58
1982/83 | 0.82 | 0.62 | 0.69 | 0.67 | 0.53 | 0.69 | 0.53 | 0.68 | 0.61
1983/84 | 0.75 | 0.58 | 0.72 | 0.66 | 0.69 | 0.76 | 0.43 | 0.86 | 0.66
1984/85 | 0.63 | 0.85 | 0.74 | 0.84 | 0.80 | 0.91 | 0.81 | 0.46 | 0.72
1985/86 | 0.72 | 0.53 | 0.68 | 0.77 | 0.81 | 0.83 | 0.80 | 0.21 | 0.49
1986/87 | 0.89 | 0.58 | 0.66 | 0.77 | 0.66 | 0.80 | 0.80 | 0.54 | 0.54
1987/88 | 0.89 | 0.50 | 0.60 | 0.70 | 0.68 | 0.78 | 0.73 | 0.80 | 0.60
1988/89 | 0.66 | 0.28 | 0.28 | 0.29 | 0.40 | 0.37 | 0.19 | 0.57 | 0.36
1989/90 | 0.91 | 0.71 | 0.63 | 0.78 | 0.75 | 0.82 | 0.83 | 0.84 | 0.70
1990/91 | 0.75 | 0.38 | 0.49 | 0.58 | 0.53 | 0.68 | 0.66 | 0.38 | 0.63

The MH sample of the parameter distribution is put into the ECOMAG model to calculate a sample of streamflows for each day. In Fig. 5 the area between the 95% quantiles of the calculated streamflow is shaded grey, and the observed values are shown as solid lines. The results are shown for four of the catchments for the years 1986/87 and 1988/89.

Discussion

The entropy of the conditional distributions decreases as new data are used for updating them (Fig. 3), and quite well defined areas, with two exceptions, appear to contain good parameter values for the NOPEX region (Fig. 4). The two parameters that do not follow this trend are the vertical conductivity of horizon A and the surface depression storage. The reasons for the high variances of these two distributions are different.

When the vertical conductivity of horizon A is high enough, the model results are good. Above this limit, the model results are rather insensitive to the parameter. This can be explained by the high infiltration capacity in a typical Nordic catchment.
All precipitation will infiltrate in unsaturated areas. The vertical conductivity of horizon A is therefore not important to include in the parameter space, provided a sufficiently high value has been chosen. Consequently, it could be excluded to save computing time.

For surface depression storage there are differences between the catchments in the shape of the distributions, resulting in a regional distribution with high variance and without a clear modal value (Fig. 4). The differences between the catchments show that the parameterisation of the grid cells does not capture the physical properties that are most important for this parameter. It is also possible that this process description compensates for other physical processes, e.g. interception. A change in the description and/or parameterisation of this process is necessary to find a regional parameter value.

[Figure 5 panels: streamflow Q (m3/s) against Days for Fyrisån, Lillån, Stalbobäcken and Sagån, each for 1986/87 and 1988/89]

Fig. 5. The 95% quantiles (grey area) of simulated streamflow resulting from the parameter uncertainty, and the observed streamflow (solid line) for 1986/87 and 1988/89.

Fig. 3 reveals that for three parameters (critical temperature for start of snowmelt, degree-day-factor and critical temperature for phase of precipitation) the entropy stops decreasing when the data sets from the seven catchments have been used for updating the distributions. This suggests that a limit is reached for how accurately the parameters can be defined, and that new data do not provide more information about the parameters. However, as these three parameters are highly correlated, these conclusions may not be valid for the marginal distributions.

The parameter values found by Motovilov et al. (1999) (Table 2) appear to be close to the modal values found here. The Nash-Sutcliffe coefficients (Reff) for the parameter values giving the maximum likelihood in the MH chain (Table 4) also indicate satisfactory or good model performance for all catchments. For individual catchments and years, however, the criterion is not satisfactory. For Stabbybäcken the Reff has the smallest value, 0.17 for the year 1981/82, and for the year 1988/89 the model does not perform well for any catchment and performs satisfactorily for only five. Stalbobäcken is a special catchment as it contains parts of an esker, and it may have been better to exclude this catchment from the estimation of regional parameters.

The observed streamflow is outside the uncertainty boundaries much of the time (Fig. 5), indicating that error sources other than errors in parameter values are important. The observed streamflow is outside the uncertainty boundaries for all catchments during days 200-220 in the year 1988/89. ECOMAG does not manage to describe the general response resulting from the special winter conditions of this period. For Sagån the observed streamflow is almost always higher than the upper 95% quantile. The observed specific streamflow in Sagån is higher than in the other eight catchments.
Therefore a quality control of the streamflow data and rating curve ought to be performed before concluding that ECOMAG fails to model this catchment properly. For Stalbobäcken, too, streamflow is difficult to simulate. This is a special catchment as it contains parts of an esker, giving a relatively high streamflow in recession periods. This indicates that not all the important physical characteristics of the grid elements are properly included in the parameterisation of ECOMAG.

It is important to be aware that the chosen likelihood function is not a statistical likelihood but a generalised one. The variance of the simulation errors (Eq. 6) is far too large for a statistical likelihood. If a more correct variance were used, the entropy in Fig. 3 would be smaller, the variance of the distributions in Fig. 4 would be smaller, and the uncertainty boundaries in Fig. 5 would be narrower. The main conclusions, however, would not change much. The difference between Eq. (2) and Eq. (5) affects only the scale of the likelihood, not its location or shape. The posterior (Eq. 9), however, might vary, as the contribution to the posterior likelihood from the different data sets might change. This difference is not expected to influence the conclusions drawn on the basis of the four regionalisation criteria defined in the introduction. The GLUE approach lets the uncertainty in the parameter values account for too much of the simulation errors and does not recognise the three other sources of error: measurement errors in the observed data used for calibration, measurement errors in the input data, and errors in the model structure.

Conclusions

The aim of this work is to show that regional parameters exist for the macro-scale hydrological model ECOMAG, and that it is meaningful to define them.
The Bayesian method is used to find a distribution for the parameters, and data from different catchments and years are used for updating the distribution. The results herein suggest that for the NOPEX area regional parameters exist according to the predefined criteria. The use of additional data implies a decrease in the variances of the conditional distributions for most of the parameters, and a relatively narrow area in parameter space appears to contain good parameter sets for simulation of streamflow in the nine studied catchments. For three parameters (critical temperature for start of snowmelt, degree-day-factor and critical temperature for phase of precipitation) a limit is reached for how accurately the parameters can be defined. When this limit is reached, additional data do not give more information about the parameters.

It is shown that one parameter in the ECOMAG model, the surface depression storage, is not suitable for a regional application. The location of the best parameter value changed too much between years and catchments. The model results are found to be relatively insensitive to the vertical conductivity of horizon A; they are good provided a sufficiently high value is chosen for this parameter.

Even though the statistical assumptions for calculating the distribution of the parameters are violated, i.e. that the simulation errors are independent and normally distributed with zero mean and constant variance, the Bayesian method proves to be useful for estimating regional parameters. As the simulation errors may depend on climatic conditions as well as physical properties of the catchments, the search for a more appropriate error model needs further attention. To better identify the parameter uncertainty, it would also be necessary to identify the other three main error sources: measurement errors in the observed data used for calibration, measurement errors in the input data, and errors in the model structure.
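As a rough illustration of the updating summarised above, the following Python sketch applies Bayes' theorem on a regular grid of values for a single parameter, using a likelihood for independent, normally distributed errors with zero mean and constant variance. It is a toy example under stated assumptions, not the ECOMAG set-up: the linear `toy_model`, the synthetic data, the error variance and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_log_likelihood(observed, simulated, sigma2):
    """Log-likelihood assuming independent N(0, sigma2) simulation errors."""
    n = len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sse / (2.0 * sigma2)


def update(prior, grid, model, inputs, observed, sigma2):
    """One Bayesian update on a regular grid: posterior proportional to prior times likelihood."""
    log_post = []
    for p, theta in zip(prior, grid):
        simulated = [model(theta, x) for x in inputs]
        ll = gaussian_log_likelihood(observed, simulated, sigma2)
        log_post.append(math.log(p) + ll if p > 0.0 else -math.inf)
    peak = max(log_post)  # subtract the peak before exponentiating to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def variance(post, grid):
    """Variance of the discrete distribution `post` over the grid values."""
    mean = sum(p * t for p, t in zip(post, grid))
    return sum(p * (t - mean) ** 2 for p, t in zip(post, grid))


def entropy(post):
    """Shannon entropy of the discrete distribution `post`."""
    return -sum(p * math.log(p) for p in post if p > 0.0)


def toy_model(theta, x):
    # Hypothetical stand-in for a hydrological model: output scales linearly with input.
    return theta * x


random.seed(1)
true_theta, sigma2 = 0.6, 4.0                  # assumed values for the synthetic data
grid = [i / 100 for i in range(1, 101)]        # regular sampling of the parameter
posterior = [1.0 / len(grid)] * len(grid)      # uniform prior
history = [(variance(posterior, grid), entropy(posterior))]

for _ in range(5):                             # five synthetic "data sets"
    inputs = [random.uniform(0.0, 10.0) for _ in range(30)]
    observed = [toy_model(true_theta, x) + random.gauss(0.0, math.sqrt(sigma2))
                for x in inputs]
    posterior = update(posterior, grid, toy_model, inputs, observed, sigma2)
    history.append((variance(posterior, grid), entropy(posterior)))

for step, (var, ent) in enumerate(history):
    print(f"data sets used: {step}  variance: {var:.5f}  entropy: {ent:.3f}")
```

With these synthetic data the printed variance and entropy are smaller after the updates than for the uniform prior, which is the behaviour the regionalisation criteria look for. A parameter whose distribution fails to narrow in this way would be a doubtful regional candidate.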
The GLUE approach lets the parameter uncertainty account for too much of the simulation errors.

A practical problem is the required computing time. For regular sampling, the dimension of the parameter space and the density of the samples in parameter space cannot be as high as desired, whereas the MH algorithm requires recalculations when new data become available. Regular sampling would gain from a reduction in the dimension of the parameter space, which could be obtained by searching for relationships between parameters.

References

Abdulla, F. A. and Lettenmaier, D. P. (1997) Application of regional parameter estimation schemes to simulate the water balance of a large continental river, Journal of Hydrology, Vol. 197, pp. 258-285.
Beven, K. J. and Binley, A. M. (1992) The future of distributed models: Model calibration and uncertainty prediction, Hydrological Processes, Vol. 6, pp. 279-298.
Chib, S. and Greenberg, E. (1995) Understanding the Metropolis-Hastings Algorithm, The American Statistician, Vol. 49 (4), pp. 327-335.
Freer, J., Beven, K. and Ambroise, B. (1996) Bayesian estimation of uncertainty in runoff prediction and the value of data: An application of the GLUE approach, Water Resources Research, Vol. 32 (7), pp. 2161-2173.
Geyer, C. J. (1992) Practical Markov Chain Monte Carlo, Statistical Science, Vol. 7, pp. 473-511.
Gupta, H. V., Sorooshian, S. and Yapo, P. O. (1998) Towards improved calibration of hydrologic models: Multiple and noncommensurable measures of information, Water Resources Research, Vol. 34 (4), pp. 751-763.
Halldin, S. and Lundin, L-C. (1994) SINOP - system for information in NOPEX, NOPEX Technical Report No. 1, Institute of Earth Sciences, Uppsala University.
Halldin, S., Gottschalk, L., Van de Griend, A. A., Gryning, S-E., Heikinheimo, M., Högström, U., Jochum, A. and Lundin, L-C. (1995) Science plan for NOPEX, NOPEX Technical Report No. 12, Institute of Earth Sciences, Uppsala University.
Halldin, S., Gottschalk, L., Van de Griend, A. A., Gryning, S-E., Heikinheimo, M., Högström, U., Jochum, A. and Lundin, L-C. (1999) NOPEX - a northern hemisphere climate processes land surface experiment. Accepted for publication in a BACH special issue of Journal of Hydrology.
Hastings, W. K. (1970) Monte Carlo sampling methods using Markov chains and their applications, Biometrika, Vol. 57, pp. 97-109.
Klemes, V. (1986) Operational testing of hydrological simulation models, Hydrological Sciences Journal, Vol. 31 (1), pp. 13-24.
Kuczera, G. (1983) Improved parameter inference in catchment models: 1. Evaluating parameter uncertainty, Water Resources Research, Vol. 19 (5), pp. 1151-1162.
Kuczera, G. and Parent, E. (1998) Monte Carlo assessment of parameter uncertainty in conceptual catchment models: the Metropolis algorithm, Journal of Hydrology, Vol. 211, pp. 69-85.
Langsrud, Ø., Frigessi, A. and Høst, G. (1998) Pure Model Error of the HBV-model, Note 4/1998, Norwegian Water Resources and Energy Directorate, Oslo.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953) Equation of state calculations by fast computing machines, J. Chem. Phys., Vol. 21, pp. 1087-1092.
Motovilov, Y. G., Gottschalk, L., Engeland, K. and Rodhe, A. (1999) Validation of a distributed model against spatial observations, NOPEX special issue of Agricultural and Forest Meteorology, Vol. 98-99, pp. 257-277.
Nash, J. E. and Sutcliffe, J. V. (1970) River flow forecasting through conceptual models part 1 - A discussion of principles, Journal of Hydrology, Vol. 10, pp. 282-290.
Romanowicz, R., Beven, K. J. and Tawn, J. A. (1994) Evaluation of predictive uncertainty in nonlinear hydrological models using a Bayesian approach, Statistics for the Environment II, Water Related Issues, edited by Barnett, V. and Turkman, K. F., pp. 297-317, John Wiley, New York.
Seibert, P. (1994) Hydrological characteristics of the NOPEX research area, NOPEX Technical Report No. 3, Institute of Earth Sciences, Uppsala University.
Sorooshian, S. (1991) Parameter Estimation, Model Identification, and Model Validation: Conceptual-Type Models, Recent Advances in the Modelling of Hydrologic Systems, edited by Bowles, D. S. and O'Connell, P. E., pp. 443-467, NATO ASI Series, Vol. 345.
Tanner, M. A. (1996) Tools for Statistical Inference: Methods for Exploration of Posterior Distributions and Likelihood Functions, Springer, New York.

Acknowledgements

This work has partly been carried out within the framework of NOPEX - a NOrthern hemisphere climate Processes land-surface EXperiment. The data used in this investigation come from SINOP, the System for Information in NOPEX. The river streamflow and climate data were provided to SINOP by the Swedish Meteorological and Hydrological Institute (SMHI). The authors are grateful for the constructive and useful criticism raised by the reviewers.