
Water Resources Management 15: 75–92, 2001.

© 2001 Kluwer Academic Publishers. Printed in the Netherlands.


Statistical Analysis of Parameters and Residuals of a Conceptual Water Balance Model – Methodology and Case Study

CHONG-YU XU

Department of Earth Sciences, Hydrology, Uppsala University, Villavagen 16, S-752 36 Uppsala,

Sweden, e-mail: chong-yu.xu@hyd.uu.se

(Received: 30 May 2000; in final form: 24 May 2001)

Abstract.

Statistical analysis of the parameters and residuals of conceptual hydrological models has received little attention in hydrological research, certainly orders of magnitude less than problems such as the development and comparison of automatic calibration methods and optimisation algorithms. Much more work is required than is presently undertaken to investigate the properties of model residuals. There is a need for an easily understandable and applicable statistical analysis scheme. In this article, a procedure is presented through which two basic issues of model evaluation are accounted for. First, different techniques used for parameter analysis are discussed. Second, the methodology of residual analysis is discussed and the general behaviour of residuals is examined. To illustrate the procedure, a simple water balance model was applied to the Stabbybäcken River Basin in central Sweden.

Key words: calibration, conceptual models, parameter analysis, residuals analysis

1. Introduction

Conceptual catchment models have been formulated to varying degrees of complexity (e.g., Thornthwaite and Mather, 1955; Crawford and Linsley, 1966; Bergström, 1976, 1992; Xu and Singh, 1998). All these models accord with the definition given by Moore and Clarke (1981), namely: (1) they describe conceptually land-based hydrologic processes which are spatially averaged or lumped, and (2) some of their parameters are estimated by fitting to observed hydrologic data such as rainfall, potential evapotranspiration, and streamflow.

There are two basic approaches to the estimation of model parameters: manual and automatic. In recent years a great deal of research has been devoted to the development of the latter approach, which has two major components: (1) the estimation criterion and (2) the optimisation algorithm. The choice of estimation criterion has been discussed extensively (e.g., Sorooshian and Dracup, 1980; Sefe and Boughton, 1982; Servat and Dezetter, 1991). The selection of an automatic parameter optimisation algorithm has also been studied extensively (e.g., Dawdy and O’Donnell, 1965; Nash and Sutcliffe, 1970; Clarke, 1973; Pickup, 1977; Sorooshian and Gupta, 1983). Until recently, virtually all calibration methods belonged to ‘local-search’ procedures. The convergence problems encountered by local search algorithms have been well documented in the literature (e.g., Gupta and Sorooshian, 1985; Hendrickson et al., 1988). Research into optimisation methods has led to the use of population-evolution-based search strategies (e.g., Wang, 1991; Duan et al., 1992). In this regard the use of the shuffled complex evolution (SCE-UA) global optimisation algorithm has been reported in a number of studies (e.g., Duan et al., 1992; Sorooshian et al., 1993; Yapo et al., 1996; Kuczera, 1997; Gupta et al., 1998). Notwithstanding the success mentioned above, Vertessy et al. (1993) showed that the gains of using the global optimisation algorithm can come at a substantial computational price.

Regardless of the means employed for calibration, optimisation methods are based on a philosophy that involves some assumptions (hypotheses), which often are not satisfied. Typically, model parameters for gauged catchments are estimated by ordinary least squares (OLS), which involves solving the minimisation problem:

$$\min \text{(sum of squares)} = \min \sum_{t=1}^{n} \left[ q_{t,obs} - q_{t,com}(x_t; A) \right]^2 \qquad (1)$$

where $q_{t,obs}$ and $q_{t,com}$ are observed and computed discharge, respectively; their difference is called the model error or residual $\varepsilon_t$. The $x_t$ is a vector of inputs (such as rainfall and evaporation), and $A$ is a parameter vector about which inference is sought. Probably the main reason for the popularity of this criterion, as stated by Sorooshian and Dracup (1980), has been its direct applicability to any model. However, the use of Equation (1) as an objective function to be minimised for parameter optimisation implies certain assumptions about the residuals $\varepsilon_t$ (Clarke, 1973):

(a) that the $\varepsilon_t$ have zero mean and constant variance $\sigma_\varepsilon^2$ (i.e., $E(\varepsilon_t) = 0$, $E(\varepsilon_t^2) = \sigma_\varepsilon^2$);

(b) that the $\varepsilon_t$ are mutually uncorrelated ($E(\varepsilon_t, \varepsilon_{t-k}) = 0$ for all $k \neq 0$).

If it were known that either assumption (a) or (b), or both, were invalid, then Equation (1) would not be the most sensible objective function; estimates of model parameters would, of course, still be obtained by minimising Equation (1), but their interpretation would be fallacious.

Clarke (1973) also stated that if approximate confidence intervals are to be given for the estimated model parameters, a further assumption must be made about the probability distribution of the residuals, that is:

(c) that the $\varepsilon_t$ are distributed normally.
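The least-squares criterion of Equation (1) can be sketched in a few lines. This is only an illustration: `simulate` is a hypothetical stand-in for any conceptual model, and `toy_model` with its data is invented for the example and is not the water balance model used in this study.

```python
from typing import Callable, Sequence


def sum_of_squares(
    q_obs: Sequence[float],
    inputs: Sequence[float],
    params: Sequence[float],
    simulate: Callable[[float, Sequence[float]], float],
) -> float:
    """Equation (1): sum over t of [q_obs(t) - q_com(x_t; A)]^2."""
    residuals = [q - simulate(x, params) for q, x in zip(q_obs, inputs)]
    return sum(e * e for e in residuals)


def toy_model(x: float, params: Sequence[float]) -> float:
    """Hypothetical one-parameter runoff model, for illustration only."""
    return params[0] * x


# Invented inputs and observations (not data from the case study).
rain = [10.0, 20.0, 30.0]
flow = [5.0, 11.0, 14.0]
sse = sum_of_squares(flow, rain, [0.5], toy_model)
```

An optimiser would call `sum_of_squares` repeatedly with different parameter vectors and keep the one giving the smallest value.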

The above assumptions need to be tested. The success or otherwise of the fitted model as a description of the relation between rainfall and streamflow from the catchment is illustrated by the model residuals, which also give evidence of the validity or invalidity of the assumptions (such as (a), (b) and (c) above) made in the model formulation. However, in the field of hydrological modelling, few writers examine and describe any properties of the residuals given by their models when fitted to data, although most present diagrams comparing the observed and fitted hydrographs, together with a measure of ‘goodness of fit’ such as $R^2$, the analogue of the coefficient of determination. The lack of such studies was pointed out by, for example, Aitken (1973), Clarke (1973), Sorooshian and Dracup (1980), and Kuczera (1983). The point is that the whole set of residuals requires study, and not just a single descriptive measure calculated from them. One of the reasons might be the lack of an easily understandable and applicable statistical analysis procedure for the average hydrologist, or for that matter any model user.

The purpose of this article is not to discuss which optimisation methods are preferable to others. Instead, this article addresses two basic issues of model evaluation: what the basic requirements are for the statistical analysis of the parameters and residuals, and how to perform such an analysis with a simple procedure. To illustrate the procedure, a simple water balance model was applied to the Stabbybäcken River at Stabby in central Sweden.

2. The Model and Data

The water balance model used in this study was developed for water balance computation for the NOPEX region (Xu et al., 1996). The input data are areal precipitation, long-term average potential evapotranspiration and air temperature. The model outputs are river flow and other water balance components, such as actual evapotranspiration, slow and fast components of river flow, soil-moisture storage and accumulation of snowpack. The time step used in the study is 10 days. The model works as follows: precipitation $p_t$ is first split into rainfall $r_t$ and snowfall $s_t$ by using a temperature-index function; snowfall is added to the snowpack $sp_t$ (the first storage) at the end of the time step, of which a fraction $m_t$ melts and contributes to the soil-moisture storage $sm_t$. Snowmelt is calculated by using a temperature-index method. Before the rainfall contributes to the soil storage as ‘active’ rainfall, a part is subtracted and added to interception evaporation loss. The soil storage contributes to evapotranspiration $e_t$, to a fast component of flow $f_t$ and to slow flow $b_t$. The model concept is shown in Figure 1 and the main equations are shown in Table I.

The hydrological data of the Stabbybäcken river at gauge station Stabby are used in the study. The basin has an area of 6.6 km² and is situated 4 km SW of Uppsala. The area is dominated by forest with abundant bog areas. Eleven years (1981–1991) of ten-day precipitation, air temperature and runoff data were used in the study to calibrate the model parameters, of which the first year, 1981, was used as a ‘warming-up’ period.

Figure 1. Schematic computational flow chart of the NOPEX water balance model.

Table I. Principal equations of the NOPEX-6 monthly snow and water balance model

Snowfall:                      $s_t = p_t \{1 - \exp[-(c_t - a_1)/(a_1 - a_2)]^2\}^+$,   $a_1 \ge a_2$
Rainfall:                      $r_t = p_t - s_t$
Snow storage:                  $sp_t = sp_{t-1} + s_t - m_t$
Snowmelt:                      $m_t = sp_t \{1 - \exp[(c_t - a_2)/(a_1 - a_2)]^2\}^+$
Potential evapotranspiration:  $ep_t = (1 + a_3 (c_t - c_m))\, ep_m$
Actual evapotranspiration:     $e_t = \min[w_t (1 - e^{-a_4 ep_t}),\ ep_t]$,   $0 \le a_4 \le 1$
Slow flow:                     $b_t = a_5 (sm^+_{t-1})$,   $a_5 \ge 0$
Fast flow:                     $f_t = a_6 (sm^+_{t-1})(m_t + n_t)$,   $a_6 \ge 0$
Total computed runoff:         $d_t = b_t + f_t$
Water balance equation:        $sm_t = sm_{t-1} + r_t + m_t - e_t - d_t$

$w_t = r_t + sm^+_{t-1}$ is the available water; $sm^+_{t-1} = \max(sm_{t-1}, 0)$ is the available storage; $n_t = p_t - ep_t (1 - e^{-p_t/ep_t})$ is the active rainfall; $p_t$ and $c_t$ are monthly precipitation and air temperature, respectively; $ep_m$ and $c_m$ are long-term monthly average potential evapotranspiration and air temperature, respectively; $a_i$ ($i = 1, 2, \ldots, 6$) are model parameters.


3. Methodology of Statistical Analysis

The basic issues of statistical analysis of conceptual catchment models include the use of different techniques for the estimation and analysis of model parameters, for evaluation and analysis of the statistical behaviour of residuals, etc.

3.1. PARAMETER ESTIMATION

The estimation procedure is defined given the validity of assumptions (a) to (c) presented in the introduction. The validity or invalidity of these assumptions will be tested in the residual analysis section. A classical estimation method is to maximise the log-likelihood with respect to the parameters. Under hypotheses (a) to (c), maximising the log-likelihood with respect to the model parameters $a_i$ is equivalent to minimising the sum of squares (Equation (1)). Minimisation of Equation (1) with respect to the parameters $a_i$ results in estimates $\hat{a}_i$ of $a_i$. The model standard deviation, $\sigma$, is estimated by

$$S = \sqrt{\frac{\text{minimum sum of squares}}{N - K}} = \sqrt{\frac{SSE}{N - K}} \qquad (2)$$

where $N$ is the number of terms in the sum of squares (Equation (1)), and $K$ is the number of parameters.

An approximate standard deviation of the model variance is easily found to be (e.g., Spiegel, 1975):

$$\operatorname{std}(S^2) = \frac{2\sigma^2}{\sqrt{2(N-K)}} \qquad (3)$$

For $n$ (i.e., $N - K$ in this case) $\ge 100$ the sampling distribution of $S$ is very nearly normal. If the population is normal (or approximately normal), it can be shown that (Spiegel, 1975, Table 5.1, page 162):

$$\operatorname{std}(S) = \frac{\sigma}{\sqrt{2(N-K)}} \qquad (4)$$
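One way to see how Equations (3) and (4) are related is a first-order (delta-method) approximation of $S = \sqrt{S^2}$ around $S^2 = \sigma^2$. The derivation below is a sketch added for illustration under that approximation; it is not taken from Spiegel (1975).

```latex
\operatorname{std}(S)
  \approx \left|\frac{dS}{d(S^2)}\right|_{S=\sigma} \operatorname{std}(S^2)
  = \frac{1}{2\sigma}\cdot\frac{2\sigma^2}{\sqrt{2(N-K)}}
  = \frac{\sigma}{\sqrt{2(N-K)}}
```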

Note that (3) yields (4) in case the population is normal, with

$$\mu_{S^2} = E(S^2) = (n-1)\sigma^2/n \qquad (5)$$

which is very nearly $\sigma^2$ for large $n$ ($n \ge 30$).

The fact that $(N-K)S^2/\sigma^2$ has a chi-square distribution with $N - K$ degrees of freedom enables us to obtain confidence limits for $\sigma^2$ or $\sigma$. For example, if $\chi^2_{0.025}$ and $\chi^2_{0.975}$ are the values of $\chi^2$ for which 2.5% of the area lies in each ‘tail’ of the distribution, then a 95% confidence interval is

$$\chi^2_{0.025} \le \frac{(N-K)S^2}{\sigma^2} \le \chi^2_{0.975}. \qquad (6)$$

From these we see that $\sigma$ can be estimated to lie in the interval

$$\frac{S\sqrt{N-K}}{\chi_{0.975}} \le \sigma \le \frac{S\sqrt{N-K}}{\chi_{0.025}}. \qquad (7)$$

Applying large-sample theory, 95% confidence limits for $\sigma$ are approximately (e.g., Spiegel, 1975, page 204):

$$S \pm 1.96\,\sigma/\sqrt{2(N-K)}. \qquad (8)$$
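Equations (2) and (8) can be sketched as follows. The function names are hypothetical, the residuals are invented, and the unknown $\sigma$ in Equation (8) is replaced by its estimate $S$, which is an assumption of this sketch rather than something stated in the text.

```python
import math
from typing import Sequence, Tuple


def model_std(residuals: Sequence[float], n_params: int) -> float:
    """Equation (2): S = sqrt(SSE / (N - K))."""
    dof = len(residuals) - n_params
    if dof <= 0:
        raise ValueError("need more residuals than parameters")
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / dof)


def sigma_limits_large_sample(s: float, dof: int, z: float = 1.96) -> Tuple[float, float]:
    """Equation (8), with sigma replaced by its estimate S (an assumption)."""
    half_width = z * s / math.sqrt(2.0 * dof)
    return s - half_width, s + half_width


# Invented residual series: N = 104 values and K = 4 parameters, so N - K = 100.
residuals = [1.0, -1.0] * 52
s = model_std(residuals, n_params=4)
lower, upper = sigma_limits_large_sample(s, dof=len(residuals) - 4)
```

The chi-square interval of Equation (7) would need chi-square quantiles from a statistical table or library, so only the large-sample form is shown here.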

If the model is correct, then the quantity $\hat{a}_i/S_i$ is distributed as a $t$-distribution with $N - K$ degrees of freedom. Thus the confidence limits on parameter $a_i$ can be estimated from (e.g., Haan, 1977)

$$\hat{a}_i \pm t_{1-\alpha/2,\,N-K}\, S_i \qquad (9)$$

where $\hat{a}_i$ is an estimate of $a_i$, $\alpha$ is the significance level, and $S_i$ is an estimate for $\sigma_i$, the standard deviation of $\hat{a}_i$. For $N - K$ large enough (>100), a normal approximation can be applied for $\hat{a}_i/S_i$; the half width of a 95% confidence interval of $a_i$ thus is

$$HWCI(a_i) = 1.96\,\sigma_i. \qquad (10)$$

These formulae are standard results in regression analysis.
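A minimal sketch of the normal-approximation interval of Equation (10), together with one common use of it: checking whether zero lies inside the interval. The function names and the numerical values are hypothetical and are not results from the case study.

```python
from typing import Tuple


def half_width_ci(sigma_i: float, z: float = 1.96) -> float:
    """Equation (10): half width of the 95% interval (normal approximation)."""
    return z * sigma_i


def confidence_interval(a_hat: float, sigma_i: float, z: float = 1.96) -> Tuple[float, float]:
    """Interval a_hat -/+ z * sigma_i around the estimated parameter."""
    hw = half_width_ci(sigma_i, z)
    return a_hat - hw, a_hat + hw


def is_significant(a_hat: float, sigma_i: float, z: float = 1.96) -> bool:
    """A parameter is taken as significant when zero lies outside its interval."""
    lo, hi = confidence_interval(a_hat, sigma_i, z)
    return not (lo <= 0.0 <= hi)


# Hypothetical (estimate, standard deviation) pairs, for illustration only.
estimates = {"a_3": (0.02, 0.03), "a_6": (0.40, 0.05)}
flags = {name: is_significant(a, sd) for name, (a, sd) in estimates.items()}
```

For small $N - K$ the factor 1.96 would be replaced by the $t$ quantile of Equation (9).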

3.2. PARAMETER ANALYSIS

In parameter analysis we should ask: are all parameters statistically significant? Are parameters highly correlated with each other? Which parameter is more sensitive? The first question can be studied by checking whether the zero value belongs to the 95% confidence interval

$$(\hat{a}_i - 1.96\,\sigma_i,\ \hat{a}_i + 1.96\,\sigma_i).$$

If the zero value is included in the confidence interval, i.e., if the hypothesis $a_i = 0$ is acceptable, then parameter $a_i$ can be set equal to zero without diminishing the explanatory power of the model. Regarding Question 2, the correlation matrix of the parameters has to be computed. A correlation coefficient between two parameters very near +1 or –1 implies that perhaps a model can be found with a smaller number of parameters and with the same explanatory power; alternatively the parameters may have to be built into the model in a different way, so that their explanatory effects are more dissociated, and optimisation is easier. As for Question 3, an analysis of the parameter sensitivity is performed by plotting the criterion function $SSE$ versus the percentage of relative deviation of the parameter values from the optimised parameter value, i.e., $SSE$ versus $(\tilde{a}_i - \hat{a}_i)/\hat{a}_i$ (%), where $\hat{a}_i$ is the optimised parameter value and $\tilde{a}_i$ are the parameter values in the neighbourhood of $\hat{a}_i$. The smallest value of $SSE$, i.e., $S\hat{S}E$, is found at $\hat{a}_i$, and $SSE$ increases with both the increase and decrease of $a_i$. These curves (e.g., Figure 3) must have a parabolic aspect centred around the minimum $S\hat{S}E$, and according to the form of the cup one can make a qualitative analysis of the parameter sensitivity.
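The sensitivity curves described above can be sketched as follows. The criterion `toy_sse` is a hypothetical quadratic invented for the example; in practice it would be the sum of squares of the calibrated model evaluated with one parameter perturbed at a time.

```python
from typing import Callable, List, Sequence, Tuple


def sensitivity_curve(
    sse: Callable[[Sequence[float]], float],
    optimum: Sequence[float],
    index: int,
    deviations_pct: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (relative deviation in %, SSE) pairs for one parameter.

    Only parameter `index` is perturbed; the others stay at their optimised values.
    """
    curve = []
    for dev in deviations_pct:
        trial = list(optimum)
        trial[index] = optimum[index] * (1.0 + dev / 100.0)
        curve.append((dev, sse(trial)))
    return curve


def toy_sse(params: Sequence[float]) -> float:
    """Hypothetical criterion with its minimum at (2.0, 5.0), for illustration only."""
    return 10.0 * (params[0] - 2.0) ** 2 + 0.1 * (params[1] - 5.0) ** 2 + 1.0


devs = [-20.0, -10.0, 0.0, 10.0, 20.0]
curve_0 = sensitivity_curve(toy_sse, [2.0, 5.0], 0, devs)
curve_1 = sensitivity_curve(toy_sse, [2.0, 5.0], 1, devs)
```

In this toy example the curve for the first parameter rises much more steeply than that for the second, which is the qualitative sign of a more sensitive parameter. A parameter whose optimised value is zero cannot be perturbed in relative terms and would need an absolute perturbation instead.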

3.3. RESIDUALS ANALYSIS

Residual analysis checks whether the residuals $\varepsilon_t$ behave as required by the model hypotheses, especially whether they are independent, homoscedastic and normally distributed with zero expectation. The general behaviour of residuals can be studied by various methods (e.g., Aitken, 1973; Sorooshian and Dracup, 1980; Vandewiele et al., 1992). The methods used for testing the three assumptions on the residuals (i.e., homoscedasticity, independence, and normality) as stated in the introduction are discussed in this section.

3.3.1. Check on Homoscedasticity

The homoscedasticity of residuals can be checked by plotting the residuals versus important variables such as computed discharge. The scattergram has to be symmetric with respect to the horizontal axis (zero expectation), and the conditional standard deviation has to be constant (homoscedasticity). Examples are given in the case study section.

Homoscedasticity can also be tested by the Kruskal-Wallis statistic (Kruskal and Wallis, 1952). The Kruskal-Wallis test, or $H$ test, enables us to test the null hypothesis that $k$ independent random samples come from identical populations. It is a nonparametric test. The method assumes that the variable has a continuous distribution, but nothing is said about the form of the population distribution or distributions from which the samples were drawn. The test is based on the statistic

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1). \qquad (11)$$

In the test, all observations are ranked jointly, $R_i$ is the sum of the ranks occupied by the $n_i$ observations of the $i$th sample, and $n_1 + n_2 + \ldots + n_k = n$. When $n_i > 5$ for all $i$ and the null hypothesis is true, the sampling distribution of the $H$ statistic is well approximated by the chi-square distribution with $k - 1$ degrees of freedom. The null hypothesis of homoscedasticity will be rejected for a given significance level, $\alpha$, if the computed $H$ is bigger than $\chi^2_{1-\alpha,\,k-1}$. Examples and results of the test are given in the case study section.
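The steps of the $H$ test (joint ranking, rank sums per group, Equation (11)) can be sketched as below. The function names are hypothetical; ties are given average ranks, and no tie correction is applied to $H$, which is a simplification of this sketch. The last line uses the group sizes and rank sums reported in the case study section.

```python
from typing import List, Sequence


def joint_ranks(values: Sequence[float]) -> List[float]:
    """Rank all values jointly from smallest (rank 1); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(rank_sums: Sequence[float], sizes: Sequence[int]) -> float:
    """Equation (11): H = 12 / (n (n + 1)) * sum(R_i^2 / n_i) - 3 (n + 1)."""
    n = sum(sizes)
    total = sum(r * r / m for r, m in zip(rank_sums, sizes))
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def h_from_groups(groups: Sequence[Sequence[float]]) -> float:
    """Rank the pooled observations, sum the ranks per group, apply Equation (11)."""
    pooled = [v for g in groups for v in g]
    ranks = joint_ranks(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return kruskal_wallis_h(rank_sums, [len(g) for g in groups])


# Group sizes and rank sums as reported in the case study section.
h_case = kruskal_wallis_h([39770, 10703, 14507], [206, 64, 90])
```

With three groups, the computed value is compared with the chi-square quantile for 2 degrees of freedom (5.991 at the 5% level, as used in the case study).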

3.3.2. Check on Independence

The hypothesis that the residuals are mutually uncorrelated can be checked by computing the autocorrelations of the residuals, $\rho_k$, with time lag $k$ and the corresponding confidence interval. In general, the autocorrelation $\rho_k$ with time lag $k$ is

$$\rho_k = \frac{E[(x_t - \mu)(x_{t+k} - \mu)]}{\sigma^2} \qquad (12)$$

where $\mu$ and $\sigma^2$ are the mean and variance of the residuals, respectively. An estimate of $\rho_k$ is

$$\hat{\rho}_k = \frac{\sum_{t=1}^{n-k} x_t x_{t+k} - \frac{1}{n-k} \sum_{t=1}^{n-k} x_t \sum_{t=k+1}^{n} x_t}{\left[\sum_{t=1}^{n-k} x_t^2 - \frac{1}{n-k}\left(\sum_{t=1}^{n-k} x_t\right)^2\right]^{1/2} \left[\sum_{t=k+1}^{n} x_t^2 - \frac{1}{n-k}\left(\sum_{t=k+1}^{n} x_t\right)^2\right]^{1/2}} \qquad (13)$$

When $n$ is large and $k$ is small, $n/(n-k) \approx 1$, and a simpler estimator of the autocorrelation coefficient is

$$r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2} \qquad (14)$$

The confidence interval for the autocorrelation coefficient of an independent series is given by the limits (Haan, 1977)

$$r_k(95\%) = \frac{-1 \pm 1.96\sqrt{n-k-1}}{n-k} \qquad (15)$$

If the calculated $r_k$ falls outside these confidence limits, the hypothesis that $\rho_k$ is zero ($H_0$: $\rho_k = 0$ versus $H_a$: $\rho_k \neq 0$) is rejected. Examples of the tests are shown in the case study section.
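The simpler estimator of Equation (14) and the limits of Equation (15) can be sketched as follows. The function names and the alternating test series are hypothetical and are used only to show a clearly correlated case.

```python
import math
from typing import Sequence, Tuple


def autocorrelation(x: Sequence[float], k: int) -> float:
    """Equation (14): lag-k autocorrelation estimate r_k."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag must satisfy 0 <= k < n")
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def independence_limits(n: int, k: int, z: float = 1.96) -> Tuple[float, float]:
    """Equation (15): limits (-1 -/+ z * sqrt(n - k - 1)) / (n - k)."""
    root = z * math.sqrt(n - k - 1)
    return (-1.0 - root) / (n - k), (-1.0 + root) / (n - k)


def is_uncorrelated(x: Sequence[float], k: int) -> bool:
    """True when r_k lies inside the 95% limits for an independent series."""
    lo, hi = independence_limits(len(x), k)
    return lo <= autocorrelation(x, k) <= hi


# Invented, strongly alternating series: adjacent values are negatively correlated.
series = [1.0, -1.0] * 50
r1 = autocorrelation(series, 1)
```

Plotting `autocorrelation(series, k)` against `k` together with the two limits gives a correlogram of the kind used in the case study figures.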

3.3.3. Check on Normality

The hypothesis that the residuals are normally distributed is needed if estimated confidence regions for the parameters are required. Normality can be tested using different methods; the Kolmogorov-Smirnov test is used here. The test has several advantages: it is easy to use and the procedure is graphic; a large number of samples can be tested on the same plot; the test is nonparametric and is not subject to the very small sample limitation. The test is conducted as follows:

1) Let $F(x)$ be the completely specified theoretical cumulative distribution function under the null hypothesis.

2) Let $F_s(x)$ be the sample cumulative distribution function based on $n$ observations. For any observed $x$, $F_s(x) = k/n$, where $k$ is the number of observations less than or equal to $x$.

3) Determine the maximum deviation, $D$, defined by

$$D = \max |F(x) - F_s(x)| \qquad (16)$$

4) If, for the chosen significance level, the observed value of $D$ is greater than or equal to the critical tabulated value of the Kolmogorov-Smirnov statistic, the hypothesis is rejected.

Examples of the tests are shown in the case study section.
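The four steps of the test can be sketched as below for a fully specified normal distribution. The function names are hypothetical. Because $F_s$ is a step function, the sketch checks the deviation on both sides of each step. The large-sample coefficient 1.36 for the 5% level is a commonly quoted approximation and an assumption of this sketch; tabulated critical values should be preferred where available.

```python
import math
from typing import Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample: Sequence[float], mu: float, sigma: float) -> float:
    """Equation (16): D = max |F(x) - F_s(x)| against a fully specified normal."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # F_s jumps from (i - 1)/n to i/n at x, so both sides of the step are checked.
        d = max(d, abs(i / n - f), abs(f - (i - 1) / n))
    return d


def ks_critical_large_n(n: int, coefficient: float = 1.36) -> float:
    """Large-sample approximation to the 5% critical value (an assumption here)."""
    return coefficient / math.sqrt(n)


# Invented three-point sample, tested against the standard normal distribution.
d_small = ks_statistic([-1.0, 0.0, 1.0], 0.0, 1.0)
```

The hypothesis of normality would be rejected when the computed `D` is greater than or equal to the critical value for the sample size at hand.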

4. The Case Study

4.1. RESULTS OF PARAMETER ESTIMATION AND ANALYSIS

Optimised parameter values and their 95% confidence intervals were computed and are shown in Figure 2 (Run 1). The correlation matrix of the parameter values was computed and is shown in Table II (Run 1). The results show that five of the six parameters are statistically significant and that the correlation coefficients between different parameters range between 0 and 0.67 in absolute value. In addition, an analysis of the parameter sensitivity was performed by plotting $SSE$ versus the parameter values in the neighbourhood of the minimum (Figure 3). In the figure, $\hat{a}_i$ is the optimised value of $a_i$, and $\tilde{a}_i$ are the values of $a_i$ in the neighbourhood of the minimum. The figure shows that the snowmelt parameter $a_1$ and fast flow parameter $a_6$ are the most sensitive ones, while $a_3$, a parameter that is used to convert long-term average monthly potential evapotranspiration to the monthly potential evapotranspiration, is the least sensitive in this case.

4.2. RESULTS OF RESIDUALS ANALYSIS

4.2.1. Check on Homoscedasticity

The residual’s homoscedasticity was checked using both the graphic method and the Kruskal-Wallis test method as discussed before. The result of the graphic method is shown in Figure 4. This plot reveals two deficiencies. First, for low runoff there appears systematic bias in predicting runoff. Second, the residuals’ variability increases with increasing runoff. This suggests that the assumption of constant error variance (homoscedasticity) is violated.

In performing the Kruskal-Wallis test, the following procedure is used:

1) Divide the residual time series into three groups for low (computed discharge $d_t$ less than 75% of the long-term average value, $\bar{d}$), median (computed discharge $d_t$ greater than 75% and less than 125% of $\bar{d}$) and high (computed discharge $d_t$ greater than 125% of $\bar{d}$) flows, respectively. In this way, we get $n_1 = 206$, $n_2 = 64$, $n_3 = 90$, and $n = n_1 + n_2 + n_3 = 360$.

2) Rank these residuals jointly from smallest to largest. Each value in the residual series occupies a rank number.

3) Compute the sums of the ranks, which gives $R_1 = 39770$, $R_2 = 10703$, $R_3 = 14507$ for the low, median and high flow groups, respectively.

4) Applying Equation (11) we get:

$$H = \frac{12}{360 \times (360+1)} \left( \frac{39770^2}{206} + \frac{10703^2}{64} + \frac{14507^2}{90} \right) - 3 \times (360+1) = 7.139.$$

5) Since $H = 7.139$ exceeds 5.991, the value of $\chi^2_{0.95}$ for 2 degrees of freedom, the null hypothesis of homoscedasticity must be rejected.

Figure 2. Comparison of optimised parameter values with the 95% confidence interval (line with circle for the first run and line with plus for the second run). For the sake of plotting, parameters $a_1$ and $a_2$ have been scaled down by 0.1; parameters $a_4$ and $a_6$ have been scaled up by 10, respectively.

Table II. Correlation matrix of the parameters

Run 1
           a_1      a_2      a_3      a_4      a_5      a_6
a_1      1.000   –0.675   –0.134    0.057   –0.507    0.387
a_2     –0.675    1.000    0.052   –0.167    0.358   –0.453
a_3     –0.134    0.052    1.000    0.114    0.219   –0.075
a_4      0.057   –0.167    0.114    1.000    0.163    0.512
a_5     –0.507    0.358    0.219    0.163    1.000    0.036
a_6      0.387   –0.453   –0.075    0.512    0.036    1.000

Run 2
           a_1      a_2      a_3      a_4      a_5      a_6      α
a_1      1.000   –0.498   –0.067    0.077   –0.335    0.360   –0.095
a_2     –0.498    1.000    0.042   –0.074    0.268   –0.265   –0.052
a_3     –0.067    0.042    1.000   –0.014    0.230   –0.026    0.013
a_4      0.077   –0.074   –0.014    1.000   –0.072    0.205   –0.115
a_5     –0.335    0.268    0.230   –0.072    1.000   –0.145   –0.003
a_6      0.360   –0.265   –0.026    0.205   –0.145    1.000   –0.063
α       –0.095   –0.052    0.013   –0.115   –0.003   –0.063    1.000

Figure 3. Intersection of the hypersurface of the criterion function ($\hat{a}_i$ = optimised parameter value, $\tilde{a}_i$ = parameter values in the neighbourhood of $\hat{a}_i$). The diagram is used to check the sensitivity of the parameters to the criterion function value. The x-axis shows the percentage of relative deviation of the parameter values from their optimised values and the y-axis shows the change of the criterion function value.

Figure 4. Plot of residuals versus computed runoff for catchment Stabby (Run 1). Two deficiencies are revealed: systematic bias for low flows, and variability of the residuals that increases with increasing predicted runoff.

Figure 5. Autocorrelation of residuals for the catchment Stabby (Run 1). Correlated case.

4.2.2. Check on Independence

The residual autocorrelations computed by Equation (12) together with the 95% confidence interval by Equation (13) are plotted in Figure 5. It is seen that the residuals are correlated significantly for small time lags. Therefore, the hypothesis of independence of residual is not fulfilled in this case study.

Figure 6. Comparison of the cumulative probability distribution of residuals with the theoretical normal distribution function values. The maximum deviation between the theoretical line (solid) and the sample line (dashed) on the probability scale is about 0.17 at residual $\varepsilon_t = 0.53$. The critical value of the Kolmogorov-Smirnov test statistic for $\alpha = 0.05$ and $N - K = 360 - 6 = 354$ is 0.08. This figure shows that the hypothesis that the residuals are normally distributed is rejected at the significance level $\alpha = 0.05$.

4.2.3. Check on Normality

The Kolmogorov-Smirnov test is used to check whether the residuals are normally distributed. The theoretical normal probability distribution function values and the sample probability distribution function values are plotted in Figure 6. The maximum deviation between the theoretical line and the sample line on the probability scale is about 0.17 at residual $\varepsilon_t = 0.53$. The critical value of the Kolmogorov-Smirnov test statistic for $\alpha = 0.05$ and $N - K = 360 - 6 = 354$ is 0.08. Therefore the hypothesis that the residuals are normally distributed is rejected at the significance level $\alpha = 0.05$.

4.3. CORRECTIVE ACTIONS

Because all three assumptions discussed in the introduction appear to be violated, i.e., the residuals are heteroscedastic (variance increases as calculated runoff increases, Figure 4), significantly correlated (time dependent, Figure 5), and not normally distributed at the significance level $\alpha = 0.05$ (Figure 6), the following corrections were tried in this study:

1) To remove heteroscedasticity, i.e., dependence of the error variance on computed runoff, a square root transformation of the runoffs $q_{t,obs}$ and $q_{t,com}$, as suggested by Vandewiele et al. (1992), was used as a first try. Therefore, it is supposed that

$$\sqrt{q_{t,obs}} - \sqrt{q_{t,com}} = \varepsilon_t. \qquad (17)$$

2) To handle the correlated errors case, a first-order autoregressive scheme was assumed to represent the transformed residuals. The same procedure has also been used by Sorooshian and Dracup (1980) and Alley (1984):

$$\varepsilon_t = \alpha \varepsilon_{t-1} + \beta_t \qquad (18)$$

with

$$\beta_t \sim N(0, \sigma^2) \text{ and } \beta_t \text{ independent} \qquad (19)$$

where $\alpha$ is a so-called autocorrelation parameter.

The objective function (1) then becomes

$$\min \text{(sum of squares)} = \min \sum_{t=1}^{n} (\varepsilon_t - \alpha \varepsilon_{t-1})^2 \qquad (20)$$

The above function when minimised with respect to the unknown parameters should result in the parameter values that when used in the model for forecasting purposes will produce the hydrograph with the highest probability of being nearest to the true hydrograph. Again, hypotheses (18) and (19) have to be checked in the new calibration together with parameters analysis. Some of the results of the new calibration are discussed as follows.
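The transformed criterion of Equations (17) and (20) can be sketched as follows. The function names and the discharge values are hypothetical, and treating the residual before the first time step as zero is an assumption of this sketch, since the text does not state how the first term is handled.

```python
import math
from typing import List, Sequence


def transformed_residuals(q_obs: Sequence[float], q_com: Sequence[float]) -> List[float]:
    """Equation (17): eps_t = sqrt(q_obs_t) - sqrt(q_com_t)."""
    return [math.sqrt(o) - math.sqrt(c) for o, c in zip(q_obs, q_com)]


def ar1_sum_of_squares(eps: Sequence[float], alpha: float, eps_initial: float = 0.0) -> float:
    """Equation (20): sum over t of (eps_t - alpha * eps_{t-1})^2.

    eps_initial stands in for the residual before the first time step;
    taking it as zero is an assumption of this sketch.
    """
    total, previous = 0.0, eps_initial
    for e in eps:
        total += (e - alpha * previous) ** 2
        previous = e
    return total


# Invented discharges chosen so that every transformed residual equals 1.
eps = transformed_residuals([4.0, 9.0, 16.0], [1.0, 4.0, 9.0])
criterion = ar1_sum_of_squares(eps, alpha=0.5)
```

In a calibration run, $\alpha$ would be optimised together with the model parameters, which is why it appears as a seventh entry in the Run 2 correlation matrix of Table II.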

4.3.1. Check on Homoscedasticity

As in the first run, the plot of residuals versus computed discharge was first checked (Figure 7). Comparing Figure 7 with Figure 4 reveals that the variability of the residuals no longer displays dependence on predicted runoff. The tendency of residuals to be positive for very small runoff, as shown in Figure 4, was removed (Figure 7).

Again, the Kruskal-Wallis test was applied. Following the same procedure as described in Section 4.2.1, the computed $H$ statistic equals 2.89, which is less than 5.991, the value of $\chi^2_{0.95}$ for 2 degrees of freedom. The hypothesis that the three samples are from an identical distribution cannot be rejected.

4.3.2. Check on Independence

The residual autocorrelation computed by Equation (12) together with the 95% confidence interval by Equation (13) are plotted in Figure 8. The plot reveals that the strong time dependence (as measured by residual autocorrelation in Figure 5) was removed by fitting the AR (1) model to the transformed residuals.

Figure 7. Plot of residuals versus computed runoff for catchment Stabby (Run 2). A case of homoscedastic residuals.

Figure 8. Autocorrelation of residuals for the catchment Stabby (Run 2). Uncorrelated case.

4.3.3. Check on Normality

The theoretical normal probability distribution function values and the sample probability distribution function values are plotted in Figure 9. The maximum deviation between the theoretical line and the sample line on the probability scale is about 0.06 at residual $\varepsilon_t = 0.50$. The critical value of the Kolmogorov-Smirnov test statistic for $\alpha = 0.05$ and $N - K = 360 - 6 = 354$ is 0.08. Therefore the hypothesis that the residuals are normally distributed is not rejected at the significance level $\alpha = 0.05$.

Figure 9. Comparison of the cumulative probability distribution of residuals with the theoretical normal distribution function values. The maximum deviation between the theoretical line (solid) and the sample line (dashed) on the probability scale is about 0.06 at residual $\varepsilon_t = 0.50$. The critical value of the Kolmogorov-Smirnov test statistic for $\alpha = 0.05$ and $N - K = 360 - 6 = 354$ is 0.08. This figure shows that the hypothesis that the residuals are normally distributed is not rejected at the significance level $\alpha = 0.05$.

Together with the residual analysis, the parameter analysis techniques discussed in Section 3.2 were applied in the new calibration. For the sake of comparison, the optimised parameters and the parameter correlation matrix are shown in Figure 2 (Run 2) and Table II (Run 2), respectively. Compared with the first run, this particular case study shows that the parameter values are not significantly different, as the 95% confidence intervals overlap (Figure 2). Correlation coefficients between parameters are slightly smaller in the second run (Table II).

5. Conclusions

A procedure to analyse parameter significance and sensitivity and to evaluate residual behaviour in conceptual catchment models is discussed and exemplified. To illustrate the procedure, the NOPEX-6 water balance model (Xu et al., 1996) was applied to the Stabby catchment. The case study shows that the proposed approach is practical and simple to implement and can provide useful information that helps to identify and remove violations of the model hypotheses. The proposed methodology should also be usable in other model applications of the same type.


Acknowledgements

The data used in this investigation were provided by the SINOP (System for Information in NOPEX) database. The Swedish Meteorological and Hydrological Institute (SMHI) provided most of the data to SINOP. The referees’ comments are gratefully acknowledged.

References

Aitken, A. P.: 1973, Assessing systematic errors in rainfall-runoff models, J. Hydrol. 20, 121–136.

Alley, W. M.: 1984, On the treatment of evapotranspiration, soil moisture accounting and aquifer recharge in monthly water balance models, Water Resour. Res. 20(8), 1137–1149.

Bergström, S.: 1976, Development and Application of a Conceptual Model for Scandinavian Catchments, The Swedish Meteorological and Hydrological Institute (SMHI), Report RHO 7, Norrköping, Sweden.

Bergström, S.: 1992, The HBV Model – Its Structure and Applications, SMHI Report RH No. 4, Norrköping, Sweden.

Clarke, R. T.: 1973, A review of some mathematical models used in hydrology, with observations on their calibration and use, J. Hydrol. 19(1), 1–20.

Crawford, N. H. and Linsley, P. K.: 1966, Digital Simulation in Hydrology: Stanford Watershed Model IV, Technical Report 39, Stanford University, Stanford.

Dawdy, D. R. and O’Donnell, T.: 1965, Mathematical models of catchment behaviour, Proceedings of American Society of Civil Engineers. Journal of the Hydraulics Divisions of the ASCE 91(HY4), 123–137.

Duan, Q., Sorooshian, S. and Gupta, V. K.: 1992, Effective and efficient global optimisation for conceptual rainfall-runoff models, Water Resour. Res. 28(4), 1015–1031.

Gupta, H. V., Sorooshian, S. and Yapo, P. O.: 1998, Toward improved calibration of hydrologic models: Multiple and noncommensurable measures of information, Water Resour. Res. 34(4), 751–763.

Gupta, V. K. and Sorooshian, S.: 1985, The relationship between data and the precision of estimated parameters, J. Hydrol. 81, 57–77.

Haan, C. T.: 1977, Statistical Methods in Hydrology, The Iowa State University Press, Ames, Iowa, 378 pp.

Hendrickson, J., Sorooshian, S. and Brazil, L. E.: 1988, Comparison of Newton-type and direct search algorithms for calibration of conceptual rainfall-runoff models, Water Resour. Res. 24(5), 691–700.

Kruskal, W. H. and Wallis, W. A.: 1952, Use of ranks in one criterion variance analysis, J. Amer. Stat. Assoc. 47, 583–621.

Kuczera, G.: 1983, Improved parameter inference in catchment models 1. Evaluating parameter uncertainty, Water Resour. Res. 19(5), 1151–1162.

Kuczera, G.: 1997, Efficient subspace probabilistic parameter optimisation for catchment models, Water Resour. Res. 33(1), 177–185.

Moore, R. J. and Clarke, R. T.: 1981, A distribution function approach to rainfall runoff modelling, Water Resour. Res. 17(5), 1367–1382.

Nash, J. E. and Sutcliffe, J.: 1970, River flow forecasting through conceptual models. Part I. A discussion of principles, J. Hydrol. 10, 282–290.

Pickup, G.: 1977, Testing the efficiencies of algorithms and strategies for automatic calibration of rainfall-runoff models, Hydrological Science Bulletin 22(2), 257–274.

Sefe, F. T. and Boughton, W. C.: 1982, Variation of model parameter values and sensitivity with type of objective function, J. Hydrol. 21(1), 117–132.

Servat, E. and Dezetter, A.: 1991, Selection of calibration objective functions in the context of rainfall-runoff modeling in a Sudanese Savannah area, Hydrol. Sci. J. 36(4), 307–330.

Sorooshian, S. and Dracup, J. A.: 1980, Stochastic parameter estimation procedures for hydrologic rainfall-runoff models. Correlated and Heteroscedastic error cases, Water Resour. Res. 16(2), 430–442.

Sorooshian, S. and Gupta, V. K.: 1983, Automatic calibration of conceptual rainfall-runoff models: The question of parameter observability and uniqueness, Water Resour. Res. 19(1), 251–259.

Sorooshian, S., Duan, Q. and Gupta, V. K.: 1993, Calibration of rainfall-runoff models: Application of global optimisation to the Sacramento soil moisture accounting model, Water Resour. Res. 29(4), 1185–1194.

Spiegel, M. R.: 1980, Theory and Problems of Probability and Statistics, McGraw-Hill Book Company, 372 pp.

Thornthwaite, C. W. and Mather, J. R.: 1955, The Water Balance, Publications in Climatology, Laboratory of Climatology, Centerton, NJ, 8(1).

Vandewiele, G. L., Xu, C.-Y. and Ni-lar-win: 1992, Methodology and comparative study of monthly water balance models in Belgium, China and Burma, J. Hydrol. 134, 315–347.

Vertessy, R. A., Hatton, T. J., O’Shaughnessy, P. J. and Jayasuriya, M. D. A.: 1993, Predicting water yield from a mountain ash forest catchment using a terrain analysis based catchment model, J. Hydrol. 150, 665–700.

Wang, Q. J.: 1991, The genetic algorithm and its application to calibrating conceptual rainfall-runoff models, Water Resour. Res. 27(9), 2467–2471.

Xu, C.-Y. and Singh, V. P.: 1998, A review on monthly water balance models for water resources investigation and climatic impact assessment, Water Resources Management 12, 31–50.

Xu, C.-Y., Seibert, J. and Halldin, S.: 1996, Regional water balance modelling in the NOPEX area: Development and application of monthly water balance models, J. Hydrol. 180(1–4), 211–236.

Yapo, P. O., Gupta, H. V. and Sorooshian, S.: 1996, Automatic calibration of conceptual rainfall-runoff models: Sensitivity to calibration data, J. Hydrol. 181, 23–48.
