ESTIMATION OF REGIONAL PARAMETERS IN A MACRO SCALE HYDROLOGICAL MODEL

Kolbjørn Engeland1, Lars Gottschalk2 and Lena Tallaksen3
Department of Geophysics, University of Oslo, P.O. Box 1022 Blindern, N-0315 Oslo, Norway
e-mail: 1kolbjorn.engeland@geofysikk.uio.no, 2lars.gottschalk@geofysikk.uio.no, 3lena.tallaksen@geofysikk.uio.no

INTRODUCTION

Regional hydrological modelling, or hydrological macro modelling, implies repeated use of a model everywhere within a region with a global set of parameters. Observations for calibration and validation are available only at a subset of the sites where the model is applied. For sites without observations the model application has to be based on the global parameters. In a distributed hydrological model the parameters are determined for each fundamental unit (grid cell) of the model from physiographic factors, e.g. topography, soil type and vegetation class of the units. The same global parameter values are used wherever the physiographic factors of a fundamental unit fall into the same classes. Parameters of lumped conceptual models can be regionalised by using multiple regression to relate them to catchment characteristics. In the split sample and proxy basin tests (Klemes, 1986), the model is calibrated on some of the data and then validated on data from time periods and catchments not used in the calibration. Neither the regression method nor the split sample and proxy basin tests consider the non-uniqueness of acceptable parameter sets. A striking result from Motovilov et al. (1999a; 1999b) was the variation in performance criteria between different years and different catchments. Regional parameters ought to be robust and physically based, and should depend little on the catchment and time period used for calibration.
A conclusion that can be drawn from earlier studies is that more formal procedures are needed both for accepting a model for regional application and for the search for regional parameters. The hydrologic literature offers at least two approaches that can serve as appropriate tools: the multi-objective method (Gupta et al., 1998) and the Bayesian method (Binley and Beven, 1991). In the multi-objective method the model is run for several possible parameter sets and catchments. On the basis of one or several error criteria it is possible to judge which parameter sets give acceptable simulations and which do not. The method provides a decision rule for selecting the parameter sets that perform satisfactorily for all catchments. The result is several possible parameter sets. The Bayesian method aims to estimate a probability distribution of the parameters. Parameter sets are given likelihoods based on a quality measure describing the goodness of fit between observed and simulated values. These likelihoods can be used for updating the probability with additional data following Bayes' theorem. The additional data can be streamflow from an additional year or from an additional catchment. Both the multi-objective method and the Bayesian method consider the uncertainty in the choice of parameter values. Instead of letting the user or an optimisation routine decide on a single optimal parameter set, several parameter sets are considered likely.

In this study the Bayesian method has been selected for analysing the performance of the ECOMAG model for the NOPEX region. Streamflow data from several catchments are used to update the probability distributions of the parameters. For a model to perform satisfactorily in a regional context it might be expected that:
• The shape and the optima of the parameter distribution do not depend too much on the catchment or year used for estimating the distribution.
• The model results are sensitive to the parameters (objective criteria).
• The variance of the parameter distribution decreases as additional streamflow data (from an additional year or catchment) are used for updating (this follows from the two points above).
• A performance criterion (here the Nash-Sutcliffe coefficient Reff (Nash and Sutcliffe, 1970)) calculated for the optimal regional parameter set for each catchment should be higher than some lowest acceptable value (here a value of at least 0.75 is classified as a good result and a value between 0.36 and 0.75 as a satisfactory result).

An introductory part of the paper briefly presents the main features of the study area, the ECOMAG model and the basic data sets used. It is followed by a description of the theoretical concepts of the Bayesian method and the application of this method for construction of two-dimensional parameter probability distributions. The purpose of this paper is to study the applicability of the ECOMAG model as a macro scale hydrological model using the Bayesian method, as well as the quality and quantity of the NOPEX data for use in regional hydrological modelling.

CATCHMENT AND MODEL DESCRIPTIONS

The NOPEX Area

The NOPEX area (Halldin et al., 1995) is situated in the southern part of Sweden, north-west of Uppsala. It is an area of low relief with altitudes ranging from 5 to 145 m a.s.l. Till and clay are the most common soil types, and coniferous forest and agricultural land are the dominating surface covers (Seibert, 1994). Data from the regular climate and discharge observation network of the Swedish Meteorological and Hydrological Institute in the NOPEX area were used for the period 1981-1990. The data set contains daily values from 10 streamflow stations (Fig. 1), 25 precipitation stations, 7 temperature stations and 5 stations for vapour pressure deficit. Temperature and vapour pressure deficit were interpolated to a regular 2 km grid by inverse distance weighting, and the precipitation was interpolated by kriging (Motovilov et al., 1999a).

Figure 1. NOPEX area and the ten gauged catchments.

The ECOMAG Model

The ECOMAG model (Motovilov et al., 1999a; 1999b) describes the main processes of the land surface hydrological cycle: infiltration, evapotranspiration, heat and water regime of the soil, snowmelt, and formation of surface, subsurface, groundwater and river runoff. The region is divided into grid cells, and the same model algorithms are applied to each cell. In this application a grid size of 2 x 2 km is used. The vertical structure of the model includes two layers: an upper layer called horizon A and a lower groundwater zone. Each grid cell is assigned a soil class and a vegetation class. Some of the parameters are determined from the respective classes, whereas the rest of the parameters are common for the whole region. To reduce the number of calibrated parameters, the relative differences between corresponding parameters for soil and vegetation classes are assumed to be constant, and the standard parameter values of a class are instead multiplied by a common calibration factor.

Application of ECOMAG to the NOPEX Region

The present study is based on the work by Motovilov et al. (1999a; 1999b), where a regional calibration and validation of ECOMAG was done following the proxy basin scheme suggested by Klemes (1986). As a first step the model was calibrated using standard meteorological and hydrological data for seven years for three basins. An additional adjustment of the soil parameters was performed using soil moisture and groundwater level data from five small experimental catchments. The model was validated against streamflow for 14 years from six other catchments, and against synoptic streamflow and evapotranspiration measurements made during two concentrated field efforts in 1994 and 1995.

THE BAYESIAN METHOD

The Bayesian approach used here is mainly as described by Binley and Beven (1991), but the notation from Tanner (1996) has been adopted.
The method aims to establish a multi-dimensional probability density distribution for the parameters conditional on observations. The posterior probability, $p$, of the parameter set $\theta_i$, given observations $Y$, is:

$$ p(\theta_i \mid Y) = \frac{L(\theta_i \mid Y)\, p(\theta_i)}{C} \qquad (1) $$

where $p(\theta_i)$ is the prior probability for the parameter set, $L(\theta_i \mid Y)$ is the likelihood function calculated from the observations $Y$, and $C$ is a scaling constant that makes the probabilities sum to one. Streamflow data from an additional period or a neighbouring area are used for updating the probability distribution $p(\theta_i \mid Y)$. An empirical, non-parametric probability distribution is established by sampling Eq. (1) in the parameter space.

Estimation of Likelihood and Prior Probability Distribution

First a likelihood function for the parameter sets has to be estimated. The formula used here is:

$$ L(\theta_i \mid Y) \propto \exp\left( -\frac{\sigma_i^2}{\sigma_{obs}^2} \right) \qquad (2) $$

where $L(\theta_i \mid Y)$ is the likelihood of the $i$th parameter set $\theta_i$ given the observations $Y$, $\sigma_i^2$ is the sum of squared simulation errors divided by the number of time steps in a period, and $\sigma_{obs}^2$ is the observed variance over a period. In this study the likelihood function is calculated for a period of one year, 1 June to 31 May. Repeated use of Eqs. (1) and (2) results in the following expression for the posterior probability function given $n$ sets of observations:

$$ p(\theta_i \mid Y_1, \ldots, Y_n) \propto \exp\left( -\frac{\sigma_{i,1}^2}{\sigma_{obs,1}^2} - \ldots - \frac{\sigma_{i,n}^2}{\sigma_{obs,n}^2} \right) \qquad (3) $$

The prior probability distributions are chosen to be uniform except for the conductivity parameters. Since conductivity often follows a log-normal distribution, a log-uniform distribution is chosen for these parameters. The upper and lower limits of the uniform and log-uniform distributions were set so that the optimised parameters from Motovilov et al. (1999a) lie within them.

Value of Additional Data

The normalised Shannon entropy measure $H$ (Eq. (4)) describes the variance of a multi-dimensional probability function.
It is used as a numerical measure of the advantage of using additional data in the estimation of the probability distribution of the parameters:

$$ H = -\frac{\sum_{i=1}^{M} p_i \log p_i}{\log M} \qquad (4) $$

where the probabilities $p_i$ are scaled so that their sum equals one, and $M$ is the total number of simulations. This function has a maximum of 1 when all the realisations have the same probability, and a minimum of zero when one realisation has a probability of 1 and all others have a probability of zero. The way $H$ develops as more observations are used depends on the probability function in Eq. (1) and on how the additional observations contribute in Eq. (3).

RESULTS

Calculations were performed in two-dimensional parameter spaces to avoid excessive computing time. To have a common reference, the horizontal conductivity of horizon A was chosen as one dimension in the parameter spaces for all the calculations. The other dimension was in turn the vertical conductivity of horizon A, the horizontal conductivity of the groundwater zone, the thickness of horizon A, the surface depression storage, the evaporation parameter, the critical temperature for snow/rain precipitation, the degree-day factor and the threshold temperature for start of snowmelt. Simulated streamflow proved to be most sensitive to these nine parameters. The likelihood function was computed for a time period of one year, 1 June to 31 May, for each catchment for the period 1981-1990. The probability distributions updated on all catchments and years for the two-dimensional parameter spaces are shown in Fig. 2. The entropy measure of the marginal probability distributions, as additional streamflow data are used for the updating (10 years for each station), is shown in Fig. 3. The entropy decreases as additional data are used for updating the probability distributions, and, with two exceptions, quite well defined areas appear to contain good parameter sets for the NOPEX region (Fig. 2). The parameter values found by Motovilov et al. (1999a) appear to be close to the modal values found here.
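The updating in Eqs. (1) to (3) and the entropy measure in Eq. (4) can be illustrated with a short numerical sketch. The Python example below is not the ECOMAG implementation: it assumes a hypothetical one-parameter toy model (simulated flow equals the parameter times a forcing series), synthetic observations, and an observed variance computed with divisor n. The function names `likelihood`, `reff`, `update` and `normalised_entropy` are illustrative only. Under the assumed variance convention, the exponent in Eq. (2) equals Reff - 1, so a parameter set with a higher Nash-Sutcliffe coefficient also receives a higher likelihood.

```python
import numpy as np


def likelihood(sim, obs):
    """Eq. (2): exp(-sigma_i^2 / sigma_obs^2) for one data set."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    sigma_i2 = np.mean((sim - obs) ** 2)  # squared errors / number of time steps
    sigma_obs2 = np.var(obs)              # assumption: variance with divisor n
    return float(np.exp(-sigma_i2 / sigma_obs2))


def reff(sim, obs):
    """Nash-Sutcliffe coefficient; with divisor n, log(likelihood) = Reff - 1."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


def update(prior, sims, obs):
    """Eq. (1): multiply the prior by the likelihood and rescale to sum to one."""
    like = np.array([likelihood(s, obs) for s in sims])
    post = like * np.asarray(prior, float)
    return post / post.sum()


def normalised_entropy(p):
    """Eq. (4): 1 for equal probabilities, 0 if one set has probability 1."""
    p = np.asarray(p, float)
    p = p / p.sum()
    nz = p > 0                            # treat 0 * log(0) as 0
    return float(-np.sum(p[nz] * np.log(p[nz])) / np.log(p.size))


# Hypothetical toy setup: none of these numbers come from the NOPEX data.
rng = np.random.default_rng(0)
theta = np.linspace(0.5, 3.0, 26)              # M = 26 sampled parameter values
forcings = rng.gamma(2.0, 1.0, size=(3, 365))  # three synthetic one-year data sets
observed = [1.5 * f + rng.normal(0.0, 0.3, f.size) for f in forcings]

p = np.full(theta.size, 1.0 / theta.size)      # uniform prior
entropies = [normalised_entropy(p)]
for f, obs in zip(forcings, observed):
    sims = [t * f for t in theta]              # one simulation per parameter value
    p = update(p, sims, obs)                   # repeated updating gives Eq. (3)
    entropies.append(normalised_entropy(p))

print("entropy after 0..3 data sets:", np.round(entropies, 3))
print("most probable parameter value:", theta[np.argmax(p)])
```

In this toy case the entropy starts at 1 for the uniform prior and falls with each added data set, which is the behaviour examined for the real parameters in Fig. 3. Working with sampled parameter values on a fixed grid mirrors the empirical, non-parametric distribution described above, at the cost of one model run per grid point and data set.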
The Nash-Sutcliffe coefficients Reff calculated for the optimal parameter values also indicate satisfactory or good model performance for all catchments. These results suggest that regional parameters, as defined in this paper, exist.

[Figure 2 panels: horizontal conductivity of horizon A* plotted against surface retention storage*, degree-day factor*, evaporation parameter*, vertical conductivity of horizon A*, conductivity of groundwater zone*, critical temperature for precipitation (°C), critical temperature for start of snowmelt (°C) and thickness of horizon A*.]
* Factor multiplied with the parameter values found by Motovilov et al. (1999a, 1999b).

Figure 2. The posterior probability distributions for two-dimensional parameter spaces updated on data from all years and catchments. The grey shades represent the probability.

The two parameters that do not follow this trend are the vertical conductivity of horizon A and the surface depression storage. The model results are relatively insensitive to the vertical conductivity of horizon A if the value is high enough. As all precipitation is likely to infiltrate in unsaturated areas in typical Nordic catchments, this is a reasonable result. Provided that a sufficiently high value is chosen, this parameter could therefore be excluded from the parameter space to save computing time.
The optimal value of the surface depression storage changes considerably between years and catchments. This suggests that the parameterisation of the grid cells does not represent the physical properties that are most important for this parameter. It is also possible that this process description compensates for other physical processes, e.g. interception. A change in the description and/or parameterisation of the process is necessary to determine a regional parameter value.

[Figure 3 axes: normalised entropy against number of data sets used for updating of probability, with one curve per parameter: surface retention storage, degree-day factor, evaporation parameter, vertical conductivity of horizon A, conductivity of groundwater zone, critical temperature for precipitation, critical temperature for start of snowmelt and thickness of horizon A. Catchments marked along the horizontal axis: Fyrisån, Lillån, Hågaån, Sagån, Örsundaån, Sävjaån, Sävaån, Stabbybäcken and Stalbobäcken.]

Figure 3. The entropy measure for the marginal probability distributions, as additional data are used for the updating.

Fig. 3 reveals that for three parameters (critical temperature for start of snowmelt, degree-day factor and critical temperature for phase of precipitation) the entropy stops decreasing when the data sets from the last three catchments are used for updating the probability distributions. This suggests that a limit is reached for how accurately the parameters can be determined. When the limit is reached, additional data do not provide more information about the parameters.

CONCLUSIONS

The aim of this work was to demonstrate that it is meaningful to define regional parameters for the macro scale hydrological model ECOMAG. The Bayesian method has proved to be useful for an objective estimation of regional parameters. For the NOPEX region a regional parameter set exists according to some predefined criteria.
The use of additional data implied a decrease in the variances of the marginal distributions for most of the parameters, and a relatively narrow area in parameter space appeared to contain good parameter sets for simulation of streamflow in the nine catchments studied in the NOPEX region.

Computing time limits the number of dimensions in the parameter space and the sampling density. Possible solutions are to reduce the dimensions of the parameter space by searching for relationships between parameters, or to use a Markov chain Monte Carlo sampling procedure, e.g. the Metropolis-Hastings algorithm. Another issue for further investigation is the choice of likelihood function. The statistical assumptions behind the likelihood function used here, i.e. that the simulation errors are independent and normally distributed with zero mean and constant variance, are not fulfilled. The simulation errors of a dynamical model are most likely autocorrelated, and their variance depends on the streamflow value, climatic conditions and physical properties of the catchments. The Bayesian method also provides the possibility to include internal variables, e.g. soil moisture and groundwater, in the updating of the probability distributions.

REFERENCES

Binley, A. and Beven, K. (1991) Physically-based modelling of catchment hydrology: a likelihood approach to reducing predictive uncertainty. In D. D. Farmer and M. J. Rycroft, Eds. Computer Modelling in the Environmental Sciences, The Institute of Mathematics and its Applications Conference Series, Clarendon Press, pp. 75-88.
Gupta, H. V., Sorooshian, S. and Yapo, P. O. (1998) Towards improved calibration of hydrologic models: Multiple and noncommensurable measures of information, Water Resources Research 34 (4), pp. 751-763.
Halldin, S., Gottschalk, L., van de Griend, A. A., Gryning, S-E., Heikinheimo, M., Högström, U., Jochum, A. and Lundin, L-C. (1995) Science plan for NOPEX, NOPEX Technical Report No. 12, Institute of Earth Sciences, Uppsala University.
Klemes, V. (1986) Operational testing of hydrological simulation models, Hydrological Sciences Journal 31 (1), pp. 13-24.
Motovilov, Y. G., Gottschalk, L., Engeland, K. and Rodhe, A. (1999a) Validation of a distributed model against spatial observations, NOPEX special issue of Journal of Agricultural and Forest Meteorological Research.
Motovilov, Y. G., Gottschalk, L. and Engeland, K. (1999b) ECOMAG - a physically based hydrological model - application to the NOPEX region, Technical report, Department of Geophysics, University of Oslo.
Nash, J. E. and Sutcliffe, J. V. (1970) River flow forecasting through conceptual models part 1 - A discussion of principles, Journal of Hydrology 10, pp. 282-290.
Seibert, P. (1994) Hydrological characteristics of the NOPEX research area, NOPEX Technical Report No. 3, Institute of Earth Sciences, Uppsala University.
Tanner, M. A. (1996) Tools for Statistical Inference: Methods for Exploration of Posterior Distributions and Likelihood Functions, Springer, New York.