Entropy theory based multi-criteria resampling of rain gauge networks

Journal of Hydrology 525 (2015) 138–151


Entropy theory based multi-criteria resampling of rain gauge networks for hydrological modelling – A case study of humid area in southern China

Hongliang Xu a,b, Chong-Yu Xu a,c,*, Nils Roar Sælthun a, Youpeng Xu b, Bin Zhou d,e, Hua Chen c

a Department of Geosciences, University of Oslo, PO Box 1047, Blindern, 0316 Oslo, Norway

b Department of Land Resources and Tourism, Nanjing University, 22 Hankou Road, Nanjing, Jiangsu 210093, PR China

c Department of Hydrology and Water Resources, Wuhan University, PR China

d Department of Chemistry, University of Oslo, Oslo, Norway

e Tianjin Academy of Environmental Sciences, 17 Fukang Road, Tianjin, PR China

Article info

Article history:

Received 14 May 2014

Received in revised form 1 March 2015

Accepted 7 March 2015

Available online 25 March 2015

This manuscript was handled by Konstantine P. Georgakakos, Editor-in-Chief, with the assistance of Emmanouil N. Anagnostou, Associate Editor

Keywords:

Entropy

Mutual information

Multi-criteria

Xinanjiang Model

SWAT Model

Xiangjiang River Basin

Summary

Rain gauge networks provide estimates of areal average rainfall, spatial variability and point rainfall at the catchment scale, and supply the most important input for hydrological models. It is therefore desirable to design optimal rain gauge networks with a minimal number of gauges that still provide reliable data on both areal mean values and spatial–temporal variability. Based on a dense network of 185 rain gauges in the Xiangjiang River Basin, southern China, this study used an entropy theory based multi-criteria method to resample rain gauge networks of different gauge densities. The method simultaneously considers the information derived from the rainfall series, minimizes the bias of the areal mean rainfall, and minimizes the information overlapped among gauges. The optimal networks were examined using two hydrological models: the lumped Xinanjiang Model and the distributed SWAT Model. The results indicate that the performance of the lumped model is stable across the different optimal networks, while the performance of the distributed model keeps improving as the number of rain gauges increases. The results reveal that the entropy theory based multi-criteria strategy provides an optimal design of rain gauge networks, which is of vital importance in regional hydrological studies and water resources management.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Accurate rainfall estimation is an important and challenging task, and a good spatial distribution of rain gauges is vital for providing reliable areal rainfall. Modern rainfall networks established to monitor hydrological features should provide the necessary real-time information for purposes such as water resources management, reservoir operation, and flood forecasting and control (Chen et al., 2008; Shafiei et al., 2014). Direct measurement of rainfall can only be achieved by rain gauges, which provide a basis for characterizing the temporal and spatial variations of rainfall (Cheng et al., 2008). However, even if rain gauges can provide real-time rainfall information at very fine temporal resolution with the help of automatic recording equipment, it is still difficult to characterize the spatial variation of rainfall without a well-designed rain gauge network in the catchment.

* Corresponding author at: Department of Geosciences, University of Oslo, PO Box 1047, Blindern, 0316 Oslo, Norway. E-mail address: c.y.xu@geo.uio.no (C.-Y. Xu).

http://dx.doi.org/10.1016/j.jhydrol.2015.03.034

0022-1694/© 2015 Elsevier B.V. All rights reserved.

A well-designed rain gauge network with proper density and distribution is essential for providing valid precipitation information that reflects the spatial–temporal features of a catchment. However, most river basins of the world are poorly gauged or ungauged, and most rain gauge networks applied for hydrological purposes are largely inadequate even according to the most dilute density requirements of the World Meteorological Organization (WMO). The WMO recommends certain rain gauge densities for different types of basins: for example, 500 km² per gauge is recommended in flat regions of temperate zones, while 25 km² per gauge is recommended for small mountainous islands with irregular precipitation (WMO, 1994). Moreover, many non-hydrological factors considerably affect rain gauge network design, e.g. accessibility, cost and ease of maintenance, topographical aspects, etc. In addition, many studies have noted a marked decline in the amount of hydrometric data being collected in many parts of the world (Perks et al., 1996; Stokstad, 1999; Kizza et al., 2009). The decline of hydrometric gauges exists not only in developing countries but also in developed countries, e.g. the U.S. Geological Survey (USGS) network underwent some significant reductions in the mid-1990s (Mason and York, 1997; Pyrce, 2004). Meanwhile, with advancing technology, the widespread application of satellite rainfall products has further caused deterioration of rain gauge networks in some cases (Ali et al., 2003; Visessri and McIntyre, 2012). This decline in hydrometric gauges means that scientists and water resources engineers are less able to monitor water supplies, predict droughts, and forecast floods than they were 30 years ago (Stokstad, 1999).

Rain gauge network design involves analysis of the number and location of stations necessary for achieving the required accuracy (Bras, 1990) and meeting the objectives of the information provided by the network as efficiently and economically as possible (Hackett, 1966). It is therefore desirable to design a rain gauge network of minimum density for a required level of service in a given catchment.

Entropy theory has been applied as a useful tool for understanding the characteristics of precipitation, which are governed by complex factors, and thereby helps in designing rain gauge networks. Maruyama et al. (2005) assessed the global potential water resources availability using two different measures of entropy, the so-called Intensity Entropy (IE) and Apportionment Entropy (AE), and 11,260 rain gauges. Tapiador (2007) analysed the global satellite-based monthly precipitation database (NOAA Climate Prediction Centre Merged Analysis of Precipitation data) from 1979 to 2001 using a direct maximum entropy spectral analysis method and found that several cycles other than the annual or seasonal cycles affect the rainfall distribution of many areas, in particular western Europe. Liu et al. (2013) studied the large-scale spatial rainfall distribution in the Pearl River basin of China from 1959 to 2009 using information entropy theory and fuzzy cluster analysis. The study area was then classified into 10 zones, each with unique temporal and spatial distribution characteristics, to meet the increasing demand of domestic and industrial usage of water resources. Sang (2013) investigated the spatial and temporal variability of daily precipitation and precipitation extrema in the Yangtze River Delta (YRD) during 1958–2007 using the discrete wavelet entropy method and indicated that the daily precipitation variability in the YRD is determined by the combined impacts of atmospheric circulation, urbanization, and Taihu Lake, while the variability of precipitation extrema is mainly determined by natural atmospheric circulation.

Based on the understanding of precipitation patterns, various approaches using optimal selection of rain gauges have been applied in designing rain gauge networks to yield higher precision of rainfall estimation at minimum cost. Pardo-Igúzquiza (1998) presented an optimal network design for the estimation of areal mean rainfall events using the simulated annealing method, and demonstrated that the simulated annealing algorithm of random search for the optimal location of rain gauges takes estimation accuracy and economic cost into consideration simultaneously. Patra (2001) applied a statistical theory for rain gauge network design, using the coefficient of variation and the acceptable percentage of error to estimate the optimal number of rain gauges. St-Hilaire et al. (2003) evaluated the impact of meteorological network density on the estimation of basin precipitation and runoff in five drainage basins in the Mauricie watershed in Quebec, Canada, using the Kriging method to estimate the spatial distribution and variance of rainfall. Dong et al. (2005) used the variance reduction analysis method to find the appropriate number and location of rain gauges in the Qingjiang River basin, China, for flow simulation. The study demonstrated that both the cross-correlation coefficient and the modelling performance increased hyperbolically and levelled off once more than five rain gauges were included in the network for the study area. Anctil et al. (2006) randomly selected rain gauges to produce subsets of the rain gauge network in order to optimize the mean daily areal rainfall series in the Bas-en-Basset watershed, southern France, and used a genetic algorithm to orient the rain gauge combinatorial problem towards improved forecasting performance. Segond et al. (2007) investigated the relationship between spatial rainfall and runoff production using rain gauge networks of various densities and radar data in the Lee catchment, UK. The study concluded that the dominant effect on hydrological modelling is the spatial variability of the rainfall estimated by the different rain gauge networks and radar data. Bárdossy and Das (2008) studied the influence of the spatial resolution of rainfall input on model calibration and application by varying the distribution of the rain gauge network via the External Drift Kriging (EDK) method in southwest Germany. The study pointed out that the overall performance of the model worsened dramatically as the number of rain gauges was reduced, while there was no significant improvement of the model once the number of rain gauges passed a certain threshold. Chen et al. (2008) applied Kriging and an entropy-based algorithm to design a rain gauge network with the minimum number of rain gauges and an optimum spatial distribution in Taiwan Province, China. The study found that the saturation of rainfall information can be used to add or remove rain gauge stations in order to determine the optimum spatial distribution and the minimum number of rain gauges in the network. Yoo et al. (2008) compared the application of mixed and continuous distribution functions to the theory of entropy for the evaluation of rain gauge networks in the Choongju Dam basin, Korea. Owing to the small wet probability and the high coincidence of daily rainfall between rain gauge stations, the study found that the optimal number of rain gauges estimated with the mixed distribution function was much smaller than, but still as reasonable as, that estimated with the continuous distribution function. Wei et al. (2014) investigated the spatiotemporal scaling effect on the design of a relocated rainfall network by calculating the maximum joint entropy of rainfall in 1992–2012 for 1-, 3-, and 5-km grids in Taiwan Province, China. The study found that a smaller number and a lower percentage of required stations reached stable joint entropy, which provides key reference points for adjusting the network to capture more accurate information and minimize redundancy.

In many studies, rain gauge networks are designed to provide good estimates of areal rainfall and to support flood modelling and prediction (e.g. Nour et al., 2006; Segond et al., 2007; Barca et al., 2008; Volkmann et al., 2010; Chebbi et al., 2011; etc.). Tsintikidis et al. (2002) demonstrated that even when lumped models are used for flood forecasting, a proper gauge network can significantly improve the results. Because summer flash rainfall exhibits particularly high spatiotemporal variability and produces severe, quick, and sharply peaked flash flooding (Desilets et al., 2008), monitoring it represents the most difficult and important challenge for a rain gauge network designed for flood prediction. Volkmann et al. (2010) designed sparse rain gauge networks in semiarid catchments with complex terrain to predict flash floods. The study showed that a multi-criteria strategy, which provided a robust design when thinning the rain gauge network, could be used to design sparse but accurate rain gauge networks in semiarid catchments similar to the one studied.

Precipitation gauge network structure depends not only on station density; station location also plays an important role in determining whether information is captured properly. Gupta et al. (2002) and Yatheendradas et al. (2008) pointed out that mountain areas with rapidly changing precipitation patterns are poorly monitored, which makes it difficult to produce accurate hydrological forecasts with sufficient lead time. Therefore, the design of hydrological measurement networks has received considerable attention in research settings.

Rain gauge network optimization can be regarded as the process of finding the locations of a limited number of rain gauges that provide sufficient rainfall information on both the spatial distribution and the areal mean precipitation. The main objectives of this paper are to: (1) understand and quantify the variability of precipitation at the catchment scale using Shannon’s Entropy and Mutual Information; (2) design and evaluate a new entropy theory based multi-criteria strategy for identifying the best locations for rain gauges based on the existing dense rain gauge network; and (3) evaluate the impact of different rain gauge networks on hydrological simulation using lumped and distributed hydrological models.

2. Material and methods

2.1. Study area and data

The Xiangjiang River is a tributary of the Yangtze River located between 24°30′–29°30′N and 110°30′–114°E in central-south China, with a total river length of 856 km (Fig. 1). In the 94,660 km² catchment area, the terrain is ladder-like, with high mountains in the headwater region and low plains downstream. The monsoon climate, which brings moisture from the Pacific Ocean, is the main source of rainfall in the catchment. Nearly two thirds of the 1600 mm annual precipitation occurs in the rainy season (from April to September). The mean annual temperature and mean annual potential evapotranspiration are 17 °C and 1000 mm, respectively. The mean annual discharge at Xiangtan gauge is 72.2 billion m³, and the seasonal variation of runoff follows that of the precipitation, i.e. about 70% of the flooding events occur during the rainy season.

The precipitation dataset used in this study consists of 185 rain gauges (Fig. 1) and covers the period from 1st January 1991 to 31st December 2005. The China Meteorological Data Sharing Service System (http://cdc.cma.gov.cn) provided the meteorological data, including daily maximum and minimum air temperature, precipitation, wind speed, solar radiation and average daily humidity. The spatial data of the basin were prepared as inputs to the SWAT Model (Soil and Water Assessment Tool): elevation (a 90 m resolution DEM downloaded from the Shuttle Radar Topography Mission), a 1 km resolution soil dataset with texture, depth and drainage attributes (provided by the project Watershed EUTROphication management in China through system oriented process modelling of Pressures, Impacts and Abatement actions), and a land use map (consisting of five classes, namely forest, agriculture, grass, urban and water, interpreted from Landsat satellite images).

Fig. 1. Study area: Xiangjiang River Basin.

2.2. Methodology

To achieve the objectives of optimizing rain gauge networks of various gauge densities and investigating the impact of rain gauge location on hydrological modelling, the long-term rainfall record (from 1991 to 2005) is considered in the analysis. The rainfall from the 185 gauges and the areal mean rainfall computed with the Thiessen Polygon algorithm from all 185 gauges over the study area were assumed to represent the “true” point and areal mean precipitation, respectively. In the following three subsections, the basic theory of Shannon’s Entropy and Mutual Information, the strategy for rain gauge network optimization, and the hydrological models used in the study are described.
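The Thiessen Polygon areal mean used as the “true” reference can be approximated numerically. The following is a minimal sketch under stated assumptions, not the authors’ implementation: gauge coordinates and a set of grid-cell centres covering the catchment are hypothetical inputs, every cell is assigned to its nearest gauge, and the share of cells per gauge serves as the Thiessen weight. The function names `thiessen_weights` and `areal_mean` are illustrative only.

```python
import numpy as np

def thiessen_weights(gauge_xy, grid_xy):
    """Approximate Thiessen weights: fraction of grid cells nearest to each gauge.

    gauge_xy: (n_gauges, 2) gauge coordinates (assumed input).
    grid_xy:  (n_cells, 2) centres of equal-area cells inside the catchment (assumed input).
    """
    gauge_xy = np.asarray(gauge_xy, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    # Squared distance from every cell to every gauge, shape (n_cells, n_gauges).
    d2 = ((grid_xy[:, None, :] - gauge_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(gauge_xy))
    return counts / counts.sum()

def areal_mean(rain, weights):
    """Weighted areal mean per time step; rain has shape (n_steps, n_gauges)."""
    return np.asarray(rain, dtype=float) @ np.asarray(weights, dtype=float)
```

A finer grid gives weights closer to the exact polygon areas; an exact polygon construction clipped to the catchment boundary would be the alternative when cell-based accuracy is not sufficient.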

2.2.1. Shannon’s entropy and Mutual Information

The entropy of a system is commonly described as a measure of the inherent disorder within the system. Shannon’s Entropy ($H$) (1949), also called Information Entropy, is a measure of the uncertainty in a random variable (Ihara, 1993); it quantifies the expected value of the information contained in a message (i.e. the specific realization of the random variable) (Bush, 2010). Information Entropy is the average unpredictability in a random variable, which is equivalent to its information content (Strange et al., 2005). The basic hypotheses of entropy are as follows: for a discrete random variable $X$ with possible values $\{x_1, x_2, \ldots, x_n\}$ and probability mass function $P(X)$, the amount of information, $I(P)$, is a real, nonnegative, additive measure and a continuous function of the probability $p$, so that (Lin, 1991; Chen et al., 2008):

$$
\begin{cases}
I(P(x_i)) \ge 0 \\
I(P(x_1)\,P(x_2)) = I(P(x_1)) + I(P(x_2))
\end{cases}
\tag{1}
$$

For any discrete probability distribution, Shannon’s entropy is denoted as (Yao, 2003; Bhattacharyya and Sanyal, 2012):

$$
H(X) = E[I(X)] = E[-\log_b(P(X))] = \sum_i P(x_i)\,I(x_i) = -\sum_i P(x_i)\log_b P(x_i) \tag{2}
$$

where $E$ is the expected value operator, $I$ is the information content of $X$, $I(X) = -\log_b(P(X))$ is the self-information of a random variable (i.e. a measure of the information content associated with the outcome of a random variable), and $b$ is the base of the logarithm, with common values of 2, Euler’s number $e$, and 10; the unit of entropy is the bit for $b = 2$, the nat for $b = e$, and the dit (or digit) for $b = 10$. $P(x_i)$ is the probability mass function of outcome $x_i$. In the case of $P(x_i) = 0$ for some $i$, the value of the corresponding summand $0\log_b(0)$ is taken to be 0, which is consistent with the well-known limit:

$$
\lim_{p \to 0^{+}} p\,\log_b(p) = 0 \tag{3}
$$

An important property of entropy is that it is maximized when the system is in the highest possible state of disorder. For a system with a finite number of possible states, the entropy is maximized when all probabilities are equal, i.e. $P(x) = 1/n$ and $H_{\max}(X) = \log_b(n)$ (Borwein et al., 2014). Mathematically, the amount of information contained in the random variable $X$ is inversely related to the probability of occurrence of $x_i$. The principle of maximum entropy is a general method for estimating the probability distribution of a random variable $X$, and it can be used to obtain unbiased probability assessments (Guiasu and Shenitzer, 1985). The overriding principle in maximum entropy is a generalization of the classical principle of indifference: the distribution of $X$ should be as uniform as possible if nothing is known about $X$ except that it belongs to a certain class (Smith and Grandy (Eds.), 1985; Gull, 1989).
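Eq. (2) and the maximum-entropy property can be checked numerically. The sketch below is illustrative rather than the procedure used in the study: it assumes the rainfall series is discretized into equal-width bins (the `bin_width` parameter, in mm, is a hypothetical choice) and estimates the probability mass function from relative frequencies.

```python
import numpy as np

def shannon_entropy(series, bin_width=1.0, base=2.0):
    """Shannon entropy (Eq. 2) of a series discretized into equal-width bins."""
    values = np.asarray(series, dtype=float)
    # Map each value to a bin index; the bin width is an assumed discretization.
    bins = np.floor(values / bin_width).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()  # empirical probability mass function
    # Empty bins never appear in `counts`, so 0*log(0) = 0 holds implicitly.
    return float(-np.sum(p * np.log(p)) / np.log(base))

uniform = [0.5, 1.5, 2.5, 3.5]  # four equally likely states
skewed = [0.5, 0.5, 0.5, 3.5]   # one dominant state
print(shannon_entropy(uniform), shannon_entropy(skewed))
```

With four equally likely states the entropy reaches $\log_2 4$ bits, while the skewed series falls below that bound. The estimate depends on the chosen bin width, so entropies are only comparable between gauges when the same discretization is applied to all of them.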

Mutual Information measures the amount of information that can be obtained about one random variable by observing another (i.e. it is a quantitative measure of the mutual dependence of two random variables) (Steuer et al., 2001). In our case the two random variables are two rain gauges, $X$ with values $\{x_1, x_2, \ldots, x_n\}$ and $Y$ with values $\{y_1, y_2, \ldots, y_n\}$. The precipitation information of rain gauges $X$ and $Y$ may overlap. Corresponding to Eq. (2), the joint entropy of the two rain gauges is (Krstanovic and Singh, 1992):

$$
H(X,Y) = E[-\log_2(P(X,Y))] = -\sum_{x \in X}\sum_{y \in Y} p_{xy}\log_2 p_{xy} = -\sum_{x \in X}\sum_{y \in Y} p(x,y)\log_2 p(x,y) \tag{4}
$$

where $p(x,y)$ is the joint probability distribution function of $X$ and $Y$, and $p(x)$ and $p(y)$ are the marginal probability distribution functions of $X$ and $Y$, respectively. The entropies of these distributions are related to each other, and it can be proved that (Zhang and Yeung, 1998):

$$
\begin{cases}
H(X,Y) \ge \max[H(X), H(Y)] \\
H(X,Y) \le H(X) + H(Y)
\end{cases}
\tag{5}
$$

When the precipitation information is recorded at rain gauge $X$, the remaining uncertainty of rain gauge $Y$ can be expressed by the conditional entropy. The conditional probability of rain gauge $X$ given rain gauge $Y$ can be denoted as:

$$
p(X \mid Y) = p_{x|y} = \frac{p_{xy}}{p_y} \tag{6}
$$

Therefore

$$
\begin{aligned}
H(X,Y) &= -\sum_{x \in X}\sum_{y \in Y} p_{xy}\log_2 p_{xy} \\
&= -\sum_{x \in X}\sum_{y \in Y} [p(x \mid y)\,p(y)]\log_2\bigl(p(x \mid y)\,p(y)\bigr) \\
&= -\sum_{y \in Y}\left[p(y)\sum_{x \in X} p(x \mid y)\log_2 p(x \mid y)\right] - \sum_{y \in Y}\left[p(y)\log_2(p(y))\sum_{x \in X} p(x \mid y)\right]
\end{aligned}
\tag{7}
$$

The joint entropy, which measures the uncertainty associated with the two random variables $X$ and $Y$, is linked to the conditional entropy through Eq. (8) (Cover and Thomas, 2012):

$$
H(Y \mid X) = H(X,Y) - H(X) \tag{8}
$$

where $H(Y \mid X)$ is the conditional entropy of event $Y$ given event $X$. There is no uncertainty in the conditional entropy of identical variables. Hence, the value of the conditional entropy shown below is set to zero:

$$
H(x \mid x) = 0 \tag{9}
$$

Unlike Pearson’s correlation, which characterizes only linear dependence, Mutual Information is completely general (Pethel and Hahs, 2014). The amount of mutual (overlapped) information of two rain gauges can be estimated as the transferable information, which is similar to using rain gauge $X$ to forecast the information of rain gauge $Y$. The reduction in uncertainty about one variable (i.e. rain gauge $X$) due to knowledge of the other (i.e. rain gauge $Y$) is (Chen et al., 2008):

$$
I(X,Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y) \tag{10}
$$

where $I(X,Y)$ is the Mutual Information of variables (rain gauges) $X$ and $Y$. Zero dependence occurs if and only if $p(x,y) = p(x)\,p(y)$ (i.e. $X$ and $Y$ are independent and $\log_2\!\left(\frac{p(x,y)}{p(x)\,p(y)}\right) = \log_2 1 = 0$); otherwise $I(X,Y)$ is a positive quantity, and it can be proved that:

$$
\begin{cases}
I(X,Y) = I(Y,X) \\
I(X,Y) \le H(X) \\
I(Y,X) \le H(Y)
\end{cases}
\tag{11}
$$

The derivation of Eqs. (8) and (10) is given in Appendix A. For a detailed description of Information Entropy and Mutual Information, please refer to William (2007) and MacKay (2003).
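The joint entropy and Mutual Information relations of this subsection can be illustrated with a short numerical sketch. It is not the authors’ code: it assumes both series are discretized with the same equal-width bins (`bin_width` is a hypothetical parameter) and that probabilities are estimated from relative frequencies of paired observations.

```python
import numpy as np

def _entropy(p):
    """Entropy in bits of a probability array, skipping zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y, bin_width=1.0):
    """Return (I(X,Y), H(X), H(Y), H(X,Y)) for two paired series."""
    x_bins = np.floor(np.asarray(x, dtype=float) / bin_width).astype(int)
    y_bins = np.floor(np.asarray(y, dtype=float) / bin_width).astype(int)
    # Re-index the occupied bins as 0..k-1 so they can address a contingency table.
    _, xi = np.unique(x_bins, return_inverse=True)
    _, yi = np.unique(y_bins, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)   # count paired occurrences
    joint /= joint.sum()              # joint probability p(x, y)
    h_x = _entropy(joint.sum(axis=1))  # marginal entropy of X
    h_y = _entropy(joint.sum(axis=0))  # marginal entropy of Y
    h_xy = _entropy(joint.ravel())     # joint entropy
    return h_x + h_y - h_xy, h_x, h_y, h_xy

x = [0.2, 0.2, 1.2, 1.2]
y = [0.2, 1.2, 0.2, 1.2]
print(mutual_information(x, y))  # independent series
print(mutual_information(x, x))  # identical series
```

For the independent pair the Mutual Information vanishes and the joint entropy equals the sum of the marginal entropies; for identical series the Mutual Information equals the entropy of the series itself, which matches the bounds stated for joint entropy and Mutual Information.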

2.2.2. Rain gauge network optimization

To find the optimal rain gauge networks with different numbers of rain gauges, scenarios of different rain gauge densities were built. Nine broad scenarios comprising 5–75% of the total rain gauges (Table 1) were analysed. For each scenario, the best and good rain gauge networks were selected from 5000 different network configurations, drawn by the Monte Carlo method from the total of 185 gauges, following the steps below:

Step 1: Compute the Shannon’s Entropy ($H$) of each rain gauge and find the rain gauge with the maximum entropy, $H_{\max}$. The remaining 184 rain gauges (all except the gauge with $H_{\max}$) can be considered as a 184-dimensional dataset $R^{184}$, and the given number of rain gauges randomly selected from $R^{184}$ are considered as sub-datasets $R^{d} \in R^{184}$ ($d$ = 8, 18, 27, … etc.). Enumerating all possible combinations for a given gauge number ($d$) is neither necessary nor feasible (e.g. there are approximately $2.792 \times 10^{13}$ different combinations of 8 gauges out of 184), so we adopted the Monte Carlo stochastic selection method to compose the feasible rain gauge network set $\Theta'$, which includes 5000 different combinations for each given gauge number ($d$). The rain gauge of maximum entropy is then added to each rain gauge network in $\Theta'$ to compose the rain gauge network set $\Theta$.
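Step 1 can be sketched as follows. This is an illustrative sketch only: the function name `sample_networks`, the integer gauge identifiers and the fixed random seed are assumptions, and the paper does not state how duplicate draws were handled, so the sketch simply rejects repeated subsets to obtain distinct configurations.

```python
import math
import random

def sample_networks(gauge_ids, max_entropy_gauge, d, n_networks=5000, seed=0):
    """Draw distinct random networks of d gauges and add the maximum-entropy gauge to each."""
    others = [g for g in gauge_ids if g != max_entropy_gauge]
    if n_networks > math.comb(len(others), d):
        raise ValueError("more networks requested than distinct combinations exist")
    rng = random.Random(seed)
    seen, networks = set(), []
    while len(networks) < n_networks:
        subset = frozenset(rng.sample(others, d))
        if subset not in seen:  # keep only combinations not drawn before
            seen.add(subset)
            networks.append(sorted(subset | {max_entropy_gauge}))
    return networks

# Size of the full search space for d = 8 out of 184 gauges.
print(math.comb(184, 8))
```

The count printed at the end is on the order of $10^{13}$, which is why random sampling is used instead of exhaustive enumeration.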

Step 2: To find the “best” network configuration for a given number of rain gauges, we applied a multi-criteria algorithm based on values computed for three objective functions (OFs) in dataset $\Theta$:

$$
\begin{cases}
F_1(\theta) = I = \dfrac{\sum_{i=1}^{d-1}\sum_{j=i+1}^{d} I(X_i, X_j)}{C_d^2} \\[2ex]
F_2(\theta) = PBIAS = \dfrac{\sum_{t=1}^{n} |x_t - p_t|}{\sum_{t=1}^{n} p_t} \\[2ex]
F_3(\theta) = NSC = 1 - \dfrac{\sum_{t=1}^{n} (x_t - p_t)^2}{\sum_{t=1}^{n} (p_t - \bar{p})^2}
\end{cases}
\tag{12}
$$

where $X_i$ and $X_j$ are a rain gauge pair drawn from $\Theta$, and $C_d^2$ is the combinatorial number, equal to $d(d-1)/2$. $p_t$ is the “true” areal mean precipitation (computed by using all 185 rain gauges in the catchment) at time interval $t$, $x_t$ is the sampled areal mean precipitation from a given network configuration at that time interval, and $n$ is the number of 1-day time intervals analysed. The overbar (as in $\bar{x}$ and $\bar{p}$) indicates the average of the measure (e.g. $p_t$) over all $n$ time intervals considered.

The first OF ($F_1(\theta)$) is the arithmetic mean of the Mutual Information computed over all pairwise combinations of the rain gauges ($X_i$ and $X_j$) from $\Theta$, which represents the rainfall information “overlapped” among gauges. The second OF ($F_2(\theta)$) is PBIAS, which measures errors in the total rainfall volume input to the catchment, computed as the percent bias of the absolute error. Given the temporal dynamics of the discharge generation process, an accurate estimate of rainfall volume alone cannot provide sufficient information to predict the discharge volume and hydrograph shape accurately. Therefore, the third OF ($F_3(\theta)$), the Nash–Sutcliffe Coefficient (NSC), is also used; it determines the relative magnitude of the residual variance (“noise”) compared to the measured data variance (“information”) (Nash and Sutcliffe, 1970). Simultaneous optimization of the non-commensurable information provided by the three OFs (i.e. $F_1(\theta)$, $F_2(\theta)$ and $F_3(\theta)$) helps to extract relevant information from a single signal for resampling the optimal rain gauge networks applied in hydrological modelling (Gupta et al., 1998). Note that any number of relevant OFs could be considered for the multi-criteria purpose, but the selected OFs should not be highly correlated, so that they evaluate different aspects of the agreement between the calculated and “true” precipitation in the catchment.
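The three objective functions described above can be written compactly. This is a minimal sketch under stated assumptions: `mi_matrix` is a hypothetical precomputed symmetric matrix of pairwise Mutual Information between gauges, `x` and `p` are the sampled and “true” areal mean rainfall series, and PBIAS is returned as a fraction exactly as the ratio of sums described in the text, without any scaling to percent.

```python
import numpy as np

def mean_pairwise_mi(mi_matrix, network):
    """F1: arithmetic mean of Mutual Information over all gauge pairs in a network."""
    idx = list(network)
    d = len(idx)
    total = sum(mi_matrix[idx[i]][idx[j]]
                for i in range(d - 1) for j in range(i + 1, d))
    return float(total / (d * (d - 1) / 2))  # divide by the number of pairs

def pbias(x, p):
    """F2: sum of absolute errors relative to the total 'true' rainfall volume."""
    x, p = np.asarray(x, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(np.abs(x - p)) / np.sum(p))

def nsc(x, p):
    """F3: Nash-Sutcliffe Coefficient of the sampled series against the 'true' series."""
    x, p = np.asarray(x, dtype=float), np.asarray(p, dtype=float)
    return float(1.0 - np.sum((x - p) ** 2) / np.sum((p - p.mean()) ** 2))
```

A network that reproduces the reference series exactly gives `pbias = 0` and `nsc = 1`, the ideal values of the second and third criteria; the first criterion is to be kept as low as possible.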

Step 3: To find rain gauge networks in $\Theta$ that optimize all OFs simultaneously, formulate the multi-criteria network design problem (Eq. (12)) as (Gupta et al., 1998):

$$
\underset{\theta \in \Theta}{\operatorname{optim}}\; F(\theta) = [F_1(\theta), F_2(\theta), \ldots, F_m(\theta)] \tag{13}
$$

where $\theta$ stands for a possible gauge combination within the rain gauge network set $\Theta$ and $m$ stands for the number of criteria used (in this study, $m = 3$). In general, this multi-criteria optimization problem has a set of solutions rather than a unique solution that simultaneously optimizes each criterion. It is therefore necessary to use a Pareto set of solutions, which has the property that moving from one solution to another improves at least one criterion while causing deterioration in at least one other (Gupta et al., 2003; Jayawardena, 2014). As Volkmann et al. (2010) stated, “This Pareto set defines the minimum uncertainty in network selection that can be achieved without stating a subjective relative preference for minimizing one specific component of $F(\theta)$ at the expense of another”.
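The Pareto set described above can be extracted with a simple dominance test. The sketch below is a generic non-dominated filter and not the authors’ implementation; it assumes each candidate network is summarized by a row `[I, PBIAS, NSC]`, where the first two criteria are minimized and the third is maximized. The function name `pareto_mask` is illustrative.

```python
import numpy as np

def pareto_mask(scores):
    """Boolean mask of non-dominated rows for scores given as [I, PBIAS, NSC]."""
    s = np.asarray(scores, dtype=float).copy()
    s[:, 2] = -s[:, 2]  # negate NSC so that all three columns are minimized
    mask = np.ones(len(s), dtype=bool)
    for i in range(len(s)):
        # Row j dominates row i if it is no worse everywhere and better somewhere.
        dominates_i = np.all(s <= s[i], axis=1) & np.any(s < s[i], axis=1)
        if dominates_i.any():
            mask[i] = False
    return mask

scores = [[0.10, 0.05, 0.90],
          [0.20, 0.04, 0.95],
          [0.30, 0.06, 0.80]]
print(pareto_mask(scores))
```

In this small example the third network is worse than the first in all three criteria and is therefore excluded, while the first two trade a lower overlap against a lower bias and both remain in the Pareto set. The pairwise comparison is quadratic in the number of networks, which is acceptable for a few thousand candidates.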

However, in practice, rain gauge network design needs a unique solution rather than a set of Pareto solutions, but it is not a simple task to objectively assess the goodness of the possible solutions in the Pareto set and to identify the best solution before it is validated in hydrological models. A restricted set of Pareto solutions that allows an appropriate single network to be identified subjectively can be called “compromise solutions”, i.e. the Pareto solutions that correspond to a more balanced trade-off between the three OFs and consequently optimize all the OFs simultaneously (low $I$ and PBIAS and high NSC). That is to say, the Pareto solution that represents the most satisfactory compromise among the three OFs is the one for which no OF can be optimized any further without a slight improvement of one OF coming at the expense of a strong deterioration of at least one other objective (in practice, a slight deterioration of only one function is acceptable to achieve the improvement of the other two functions). Therefore, the appropriate final “best” rain gauge network can be identified from the compromise solutions that fulfil the multiple-criteria requirements. In a simple manner in this study, the best network is the compromise solution whose three criteria indices (1) all take their best values; or (2) include at least one best value while the other values are only slightly worse than the best values; or (3) do not reach their best values but are all only slightly worse than them. In such cases, the value of an OF is usually reasonable but may not be optimal in a single-criterion sense. After selecting the best rain gauge network, all solutions close to the ideal values of the three criteria (i.e. $I = \min(I)$, PBIAS = 0 and NSC = 1) and in proximity to the best solution may be chosen as “good” rain gauge networks (denoted as $\Theta_g$) with respect to their compromise between the three OFs. In the 3-D space built by the three criteria, the good networks should simultaneously be (1) close to the ideal values of the three criteria, and (2) close to the best network. To find the good rain gauge networks, a compromise set of solutions (denoted as $\Theta_c$) should first be identified from the highly compromised part of the Pareto set (denoted as $\Theta_p$), excluding the solutions that only correspond to single-criterion optimizations, so that $\Theta_g \subset \Theta_c \subset \Theta_p \subset \Theta$. Then, the good rain gauge networks can be found in proximity to the best rain gauge network solution. The proximity is defined as the solutions in a region of the three-criteria space with vertices located at (1) $I = \min(I)$, PBIAS = 0 and NSC = 1 and (2) the $x$th percentile of $F_i(\theta)$ with the other two OFs equal to their best values (e.g. the $x$th percentile of $I$, PBIAS = 0 and NSC = 1). The value $x$ is selected by iteratively increasing it from the 90th percentile until the resulting region contains 10 solutions or fewer. The best and the good rain gauge networks are identified using the entire rainfall record (from 1st January 1991 to 31st December 2005).

Table 1
Number of rain gauges for each selection percentage.

Percentage of rain gauges    Number of rain gauges
5%                           9
10%                          19
15%                          28
20%                          37
25%                          46
30%                          56
40%                          74
50%                          93
75%                          139
100%                         185

2.2.3. Evaluation of rainfall estimated by optimal rain gauge networks – variance reduction due to the increase in the number of rain gauges

Variance in the rainfall time series provides one estimate of the variability of the rainfall at a location or over a region (Dong et al., 2005). The variance of the areal mean rainfall is expected to decrease if the number of rain gauges used for calculating the areal rainfall is increased. The algorithm adopted from Yevjevich (1972) provides a good measure of the relationship between the variance of station measurements and that of the areal mean rainfall.

Under the hypothesis that the rain gauge network of a basin is composed of $n$ gauges with record length $N$, and that the rainfall recorded in the area is spatially ergodic and homogeneous, the variance of the areal mean rainfall can be expressed as (Yevjevich, 1972; Dong et al., 2005):

$$
s^2 = \frac{\bar{s}_j^{\,2}}{n}\,[1 + \bar{r}\,(n - 1)] \tag{14}
$$

where

$$
\bar{s}_j^{\,2} = \frac{1}{n}\sum_{j=1}^{n} s_j^2 \tag{15}
$$

$$
s_j^2 = \left[\sum_{i=1}^{N} (x_{ij} - \bar{x}_j)^2\right] / N \tag{16}
$$

$$
\bar{x}_j = \left(\sum_{i=1}^{N} x_{ij}\right) / N \tag{17}
$$

$$
\bar{r} = 2\left(\sum_{j=1}^{n-1}\sum_{i=j+1}^{n} r_{ij}\right) / \,n(n - 1) \tag{18}
$$

where $\overline{s_j^2}$ is the mean of the gauge variances; $s_j^2$ is the variance of the $j$th rain gauge; $x_{ij}$ is the rainfall recorded at the $i$th time point at the $j$th rain gauge; $\bar{x}_j$ is the mean of the $j$th rain gauge; $r_{ij}$ is the sample product–moment correlation coefficient between the rainfall series of gauges $i$ and $j$; and $\bar{r}$ is the arithmetic mean of the correlation coefficients of all pairwise combinations of the rain gauges.

According to Eq. (14), the variance of areal mean rainfall is expected to decrease hyperbolically with an increasing number of rain gauges $n$ (more details can be found in Rodriguez-Iturbe and Mejía, 1974; Booij, 2002).

2.3. Hydrological models

To investigate the impact of rain gauge density and spatial location on hydrological modelling performance, and to test the utility of the entropy theory based multi-criteria resampling algorithm, two different types of hydrological models are applied in the study.

The Xinanjiang Model is a lumped conceptual rainfall-runoff hydrological model developed in 1973 (Zhao et al., 1980). Since then, the Xinanjiang Model has been widely applied in humid and semi-humid areas of China and abroad (Zhao et al., 1995; Singh, 1995; Zhang et al., 2012). Based on the concept of runoff formation on repletion of storage, the main application of the model is hydrological forecasting (Zhao, 1992; Li et al., 2009); meanwhile, the model has also demonstrated great potential in water resources assessment, catchment management, hydrological design, water quality accounting and impact studies of climate change and land use change (e.g. Yao et al., 2009; Yuan et al., 2012; Zhang et al., 2012; Xu et al., 2013).

The SWAT (Soil and Water Assessment Tool) Model is a physically based, continuous, long-term, distributed-parameter model designed to predict the effects of land management practices on the hydrology, sediment, and contaminant transport in agricultural watersheds under varying soils, land use, and management conditions (Arnold et al., 1998). SWAT is based on the concept of Hydrologic Response Units (HRUs), which are portions of a sub-basin that possess unique land use, management, and soil attributes. The runoff from each HRU is calculated separately based on weather, soil properties, topography, vegetation, and land management, and is then summed to determine the total value for the sub-basin. The hydrologic cycle simulated by SWAT is based on the water balance equation (Neitsch et al., 2002):

$$SW_t = SW_0 + \sum_{i=1}^{t}\left(R_{day} - Q_{surf} - E_a - W_{seep} - Q_{gw}\right) \tag{19}$$

where $SW_t$ is the final soil water content, $SW_0$ is the initial soil water content on day $i$, $t$ is the time (days), $R_{day}$ is the amount of precipitation on day $i$, $Q_{surf}$ is the amount of surface runoff on day $i$, $E_a$ is the amount of evapotranspiration on day $i$, $W_{seep}$ is the amount of water entering the vadose zone from the soil profile on day $i$, and $Q_{gw}$ is the amount of return flow on day $i$. For a detailed description and explanation of the SWAT Model, please refer to Neitsch et al. (2011).

The hydrological models are calibrated from 1st January 1991 to 31st December 1999 and validated from 1st January 2000 to 31st December 2005. The performance of the hydrological simulations is evaluated using the Relative Mean Error ($R_E$) and the Nash–Sutcliffe efficiency coefficient (NSC) (e.g., Li et al., 2014).

3. Results

3.1. Optimal networks with different number of rain gauges

3.1.1. The distribution of Information Entropy in the catchment

Daily precipitation records from all 185 rain gauges of the Xiangjiang River Basin are used as benchmark data in designing the optimal rainfall network. Fig. 2 shows the distribution of Information Entropy over the study area, calculated from the daily precipitation of the 185 rain gauges from 1st January 1991 to 31st December 2005 and mapped using the Ordinary Kriging method. The Information Entropy of rainfall is spatially non-homogeneous and is larger in the south and east of the basin than in the north and west. Referring to the DEM map in Fig. 1, the high values of Information Entropy (>4.6 bit) are mainly distributed in the mountain area along the south-east border of the catchment (rain gauges located outside the basin are not considered in the study because of limited precipitation data availability). This is due to the high elevation, which blocks the transport of vapour carried by the monsoon from the Pacific Ocean and thereby produces very high rainfall volumes; meanwhile, the complex topography also causes considerable precipitation variation in this area.
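As a minimal sketch of how the Information Entropy of a single gauge record could be computed, the following Python snippet discretises a daily rainfall series into equal-width bins and evaluates the Shannon entropy in bits. The function name `information_entropy`, the 1 mm bin width and the two sample series are assumptions made for illustration only; the discretisation actually used in the study is not restated in this section.

```python
import math
from collections import Counter


def information_entropy(series, bin_width=1.0):
    """Shannon entropy (bits) of a rainfall series discretised into equal-width bins.

    bin_width (mm) is a hypothetical choice made for this sketch.
    """
    if not series:
        raise ValueError("series must contain at least one value")
    # Map every value to the index of the bin it falls into.
    counts = Counter(math.floor(value / bin_width) for value in series)
    total = len(series)
    # H = sum(p * log2(1 / p)) over the occupied bins.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical daily rainfall (mm) at a mostly dry gauge and a variable gauge.
lowland_gauge = [0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.0, 0.1]
mountain_gauge = [0.0, 3.5, 12.1, 0.4, 7.8, 25.0, 1.2, 5.6]

print(information_entropy(lowland_gauge))
print(information_entropy(mountain_gauge))
```

A series whose values spread over many bins yields a larger entropy than one concentrated in a single bin, which is consistent with the higher entropy reported for the mountainous, high-variability part of the basin.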

Fig. 2. The distribution of Information Entropy in Xiangjiang River Basin.

3.1.2. The Pareto front of best and good rain gauge networks

As expected, a clear trend of improvement in PBIAS, NSC and $I$ is found with the increasing number of rain gauges (Fig. 3). When nine rain gauges are included in the networks, the maximum/minimum values over the 5000 network combinations are 0.59/0.09, 0.99/0.44 and 0.542/0.434 bit for PBIAS, NSC and $I$, respectively; the values of PBIAS and NSC then gradually decrease and increase, respectively, towards their theoretical ideal values (PBIAS = 0 and NSC = 1) as the number of rain gauges increases. On the other hand, the maximum values of NSC show nearly no difference among the networks with different numbers of rain gauges, and the maximum and minimum values of $I$ decrease and increase progressively to approach 0.467 bit, the value calculated from all 185 rain gauges in the catchment. It is also seen that the maximum/minimum values of PBIAS decrease nearly linearly with the increase of gauge numbers in the networks, while the maximum/minimum values of NSC and $I$ show no considerable differences once more than 74 rain gauges are selected in the network.

Fig. 3. The maximum and minimum values of multi-criteria results (PBIAS, NSC and Mutual Information) for all combinations (5000 combinations for each given rain gauge number) of networks with different numbers of rain gauges.

For illustrative purposes, Fig. 4 shows the results of the entropy theory based multi-criteria rain gauge network optimization algorithm for networks composed of nine, 19, 46 and 93 rain gauges. As the number of rain gauges in the network increases (moving from Fig. 4(a) to (d)), the network Pareto solutions, including the best network and the good networks, move closer in performance to the network including all 185 rain gauges (i.e. PBIAS = 0, NSC = 1 and $I$ = 0.467 bit). The borders (proximity) of the Pareto front of the best and good rain gauge networks (green lines) become gradually shorter and move towards the location of the theoretically perfect network in the three-dimensional coordinate system (i.e. the three-criterion space). In addition, the best and good rain gauge networks increasingly concentrate in the corner that lies in the diagonal position relative to the theoretically perfect solution (with respect to each combination of two OFs) as the number of rain gauges increases. This result shows that the algorithm adopted is an informed method to identify the best and good rain gauge networks.

Fig. 4. Three-dimensional projections of the objective function space for the multi-criteria optimization of rain gauge networks of (a) 9 rain gauges; (b) 19 rain gauges; (c) 46 rain gauges; and (d) 93 rain gauges (the red star indicates the best network, the blue dots indicate the good networks and the green lines are the Pareto fronts). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
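The idea of picking best and good networks from a Pareto set can be illustrated with a small Python sketch. It filters non-dominated candidates in the ($I$, PBIAS, NSC) space and then ranks them by a normalised distance to the ideal point ($I = \min(I)$, PBIAS = 0, NSC = 1). The distance ranking is a simplification assumed for this sketch; it stands in for, and is not identical to, the percentile-based proximity region used in the study. All function names and candidate values are hypothetical.

```python
import math


def to_costs(candidate):
    """Turn (I, PBIAS, NSC) into three costs that are all to be minimised."""
    mutual_info, pbias, nsc = candidate
    return (mutual_info, abs(pbias), 1.0 - nsc)


def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_indices(candidates):
    """Indices of the candidates that no other candidate dominates."""
    costs = [to_costs(c) for c in candidates]
    return [k for k, ck in enumerate(costs)
            if not any(dominates(cj, ck) for j, cj in enumerate(costs) if j != k)]


def rank_by_ideal_distance(candidates, indices):
    """Sort Pareto members by normalised Euclidean distance to the ideal point."""
    costs = {k: to_costs(candidates[k]) for k in indices}
    ideal = (min(c[0] for c in costs.values()), 0.0, 0.0)
    spans = []
    for d in range(3):
        span = max(c[d] for c in costs.values()) - ideal[d]
        spans.append(span if span > 0 else 1.0)  # avoid division by zero

    def distance(k):
        return math.sqrt(sum(((costs[k][d] - ideal[d]) / spans[d]) ** 2
                             for d in range(3)))

    return sorted(indices, key=distance)


# Hypothetical (I in bit, PBIAS, NSC) values for four candidate networks.
networks = [(0.45, 0.10, 0.95), (0.50, 0.05, 0.90),
            (0.55, 0.20, 0.85), (0.44, 0.30, 0.99)]

front = pareto_indices(networks)
ranked = rank_by_ideal_distance(networks, front)
best, good = ranked[0], ranked[1:11]  # at most 10 "good" networks
print(front, best, good)
```

In this toy example the third candidate is dominated by the first and drops out of the Pareto set, while the remaining three represent different trade-offs between the criteria.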

Fig. 5. Effect of the number of rain gauges of optimized networks on the (a) mean and (b) variance of areal mean rainfall, shown for the best network and the good networks.

3.1.3. Rainfall variation evaluated by different optimal rain gauge networks

The areal annual mean rainfall and variance computed from 1991 to 2005 for the best and good rain gauge networks with different numbers of rain gauges are shown in Fig. 5. The areal mean rainfalls estimated by the best and good networks of different rain gauge configurations vary around the 1602 mm line (i.e. the areal annual mean rainfall calculated from all 185 rain gauges in the catchment, Fig. 5(a)), and the relative errors are all less than 5% compared with the areal annual mean rainfall estimated by the network containing 185 gauges. Moreover, Student's t-test at the 5% significance level shows no significant differences between the areal annual mean rainfalls computed from different gauge configurations. Similarly, the effects of the number of rain gauges of optimal networks on the variance of the areal mean rainfall series are shown in Fig. 5(b). The optimal networks with sparse gauges are expected to yield a variance similar to that of the optimal networks with dense gauges, which would demonstrate that there are no considerable differences between the optimal networks with different numbers of rain gauges. The variance of the best networks computed from Eq. (14) decreases slightly from 61.5 mm²/day² to 59 mm²/day² as the number of rain gauges increases from 9 to 185. After a certain threshold (the best network with 74 gauges), the variance levels off to a final value of 59 mm²/day², which implies that the variance of the areal mean rainfall series remains stable when the number of rain gauges $n$ is greater than a certain threshold. In addition, the variation range of the variance estimated by good networks is less than 6 mm²/day² and narrows gradually once more than 46 rain gauges are included in the good networks.
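The variance of areal mean rainfall in Eq. (14) is the mean gauge variance divided by $n$ and multiplied by $1 + \bar{r}(n-1)$, where $\bar{r}$ is the mean pairwise correlation. A minimal Python sketch of this calculation is given below; the function names and the toy series are hypothetical, and the gauge variance uses the divisor $N$ as in the definition of the gauge variance.

```python
import math
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def gauge_variance(series):
    """Variance of one gauge record with divisor N."""
    m = mean(series)
    return sum((x - m) ** 2 for x in series) / len(series)


def pearson(a, b):
    """Sample product-moment correlation (undefined for constant series)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))


def areal_mean_variance(gauges):
    """Eq. (14): mean gauge variance / n * [1 + r_bar * (n - 1)]."""
    n = len(gauges)
    mean_var = mean([gauge_variance(g) for g in gauges])
    if n == 1:
        return mean_var
    # Mean correlation over all n(n-1)/2 gauge pairs.
    r_bar = mean([pearson(a, b) for a, b in combinations(gauges, 2)])
    return mean_var / n * (1.0 + r_bar * (n - 1))


# Hypothetical daily series (mm): identical gauges vs. opposite gauges.
identical = [[0.0, 2.0, 4.0], [0.0, 2.0, 4.0]]
opposite = [[0.0, 2.0, 4.0], [4.0, 2.0, 0.0]]
print(areal_mean_variance(identical))
print(areal_mean_variance(opposite))
```

For large $n$ the expression tends to the mean gauge variance multiplied by $\bar{r}$, so with positively correlated gauges the variance approaches a floor rather than zero, which is in line with the levelling-off described above.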

3.1.4. The distribution of rain gauges of the best and good network in the catchment

For illustrative purposes, Fig. 6 gives an example of the geographical distribution of the optimal networks of nine, 19 and 37 rain gauges (the unique best network and two good networks are shown for each). The values of the three criteria indices (i.e. $I$, NSC and PBIAS) are similar in the best and good networks with a given number of rain gauges (e.g. the values of $I$/NSC/PBIAS are 0.459/0.953/0.186, 0.458/0.949/0.186 and 0.455/0.945/0.195 for the best and good networks with nine rain gauges shown in Fig. 6(a)–(c), respectively). Two characteristics of the geographical locations can be detected from the combinations that are sparsely distributed in the catchment: (1) a strong effect of the geographical location: rain gauges in the best and good networks are mainly located in the upper and middle reaches of the main stream and its tributaries, and (2) referring to Figs. 1 and 2, the rain gauges located in the mountain areas in the south and west parts of the basin, which have higher values of Information Entropy, play an important role in designing the optimal networks.

Fig. 6. The distribution of optimized rain gauge networks with sparse gauges. (a): best network of 9 gauges, (b) and (c): good networks of 9 gauges; (d): best network of 19 gauges, (e) and (f): good networks of 19 gauges; (g): best network of 37 gauges, (h) and (i): good networks of 37 gauges.

3.2. Evaluation of rainfall estimates for hydrological simulation

3.2.1. Comparison of simulation results in Xinanjiang Model and SWAT Model

The rainfall estimation errors will be reflected in hydrological modelling performance. To test the suitability of the Information Entropy theory based multi-criteria rain gauge network optimization algorithm and to study the impact of rain gauge density and spatial location on hydrological modelling, Figs. 7 and 8 show the simulation results of the lumped Xinanjiang Model and the distributed SWAT Model at Xiangtan gauge, respectively.

In the case of the lumped Xinanjiang Model, there are no considerable differences between the model results based on the various optimal rain gauge networks in terms of Relative Error ($R_E$) and Nash–Sutcliffe efficiency coefficient (NSC).

In the calibration period, $R_E$ (Fig. 7(a)) and NSC (Fig. 7(c)) are 0.3% and 0.92, respectively, in the model using the best network of nine rain gauges. With the increasing number of rain gauges in the best and good networks, the simulation results show that the values of $R_E$ lie around zero, and all $|R_E|$ values are less than 2%. Moreover, a gradually increasing trend can be observed in the values of NSC. When the models use the best network with more than 28 rain gauges, the values of NSC show nearly no differences and are approximately equal to 0.95. Considering the models based on good networks, the variation ranges of $R_E$ and NSC of the simulation results narrow progressively with the increasing number of rain gauges.

In the validation period, similar patterns of change can be observed for $R_E$ (Fig. 7(b)) and NSC (Fig. 7(d)) as in the calibration period, but the values of $R_E$ and NSC are slightly larger and lower, respectively, and the variation ranges are wider. The $|R_E|$ values are less than 3% for all model simulations based on best networks, and the values of NSC show nearly no differences and are approximately equal to 0.93 when the models use the best networks including more than 19 rain gauges. However, the variation range of $R_E$ is −8% to 3% and narrows to ±4% once more than 74 rain gauges are included in the good networks. Furthermore, the variation range of NSC is 0.89–0.92 when simulated using good networks including nine gauges and gradually narrows once 37 gauges are included in the good networks.

Fig. 7. Simulation results of Xinanjiang Model. (a) and (c) are calibration period; (b) and (d) are validation period.

Fig. 8. Simulation results of SWAT Model. (a) and (c) are calibration period; (b) and (d) are validation period.

Fig. 9. Simulated hydrographs for 1999 from hydrological models based on optimized rain gauge networks: (a), (b) and (c) are the Xinanjiang Model based on 9, 37 and 93 gauges; (d), (e) and (f) are the SWAT Model based on 9, 37 and 93 gauges.

In the case of the distributed SWAT Model, both the density and the distribution of rain gauges affect the model performance considerably compared with the lumped Xinanjiang Model. Similar to the rainfall estimates, the uncertainty in runoff simulation is strongly reduced by increasing the number of rain gauges in the optimal networks in the distributed SWAT Model.

In the calibration period, the $R_E$ values between simulated and observed discharge for the models based on best networks range between 5% and 7% (Fig. 8(a)). The range of $R_E$ is 6% to 9% for the good networks, and gradually narrows with the increasing number of rain gauges, especially once 56 rain gauges are included. Fig. 8(c) shows a clear improvement trend of NSC from 0.66 (nine rain gauges) to 0.91 (185 rain gauges). However, although the NSC of model simulations based on good networks increases, its variation range shows only a slight narrowing trend (e.g. the variation ranges of NSC are 0.61–0.72, 0.73–0.8 and 0.86–0.9 for the model results based on good networks with 9, 37 and 139 rain gauges, respectively).

In the validation period, the models using the various best networks overestimate the volume of discharge (except the models using the best networks including 74 and 139 rain gauges), and all $R_E$ values are located between 1% and 9% (Fig. 8(b)). The variation range of $R_E$ of the models using the various good networks does not show a clear narrowing trend, but it is progressively more centred on zero with the increasing number of rain gauges (e.g. the variation ranges of $R_E$ are 3–14%, 3–8% and −5% to 4% for the good networks of nine, 37 and 139 rain gauges, respectively), which suggests that the mean error of the discharge simulations decreases when the optimal networks include more rain gauges. The values of NSC increase clearly with the increasing number of rain gauges included in the optimal networks (Fig. 8(d)). In models using best networks, the NSC increases progressively from 0.57 (nine rain gauges) to 0.86 (185 rain gauges); meanwhile, the variation range of NSC narrows gradually with the increasing number of rain gauges in the good networks used in hydrological modelling (e.g. the variation ranges of NSC are 0.43–0.78, 0.58–0.79 and 0.85–0.88 for the models based on good networks with nine, 37 and 139 rain gauges, respectively), and once more than 93 rain gauges are used in the simulation, the difference between the upper and lower limits of NSC is less than 0.1. If the subjectively decided acceptance domain of the model performance is $|R_E|$ < 5% and NSC > 0.8, more than one fourth of the rain gauges (i.e. 46 rain gauges) should be included in the network when using the SWAT Model for discharge simulation in the catchment.
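The acceptance check described above can be sketched in Python as follows. The formulas are the commonly used definitions of relative volume error and Nash–Sutcliffe efficiency and are assumed here, since the exact expressions used in the study are not restated in this section; the function names and the discharge values are hypothetical.

```python
def relative_error(observed, simulated):
    """Relative volume error in percent: 100 * (sum(sim) - sum(obs)) / sum(obs)."""
    total_obs = sum(observed)
    return 100.0 * (sum(simulated) - total_obs) / total_obs


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def is_acceptable(observed, simulated, max_abs_re=5.0, min_nsc=0.8):
    """Acceptance domain from the text: |R_E| < 5% and NSC > 0.8."""
    return (abs(relative_error(observed, simulated)) < max_abs_re
            and nash_sutcliffe(observed, simulated) > min_nsc)


# Hypothetical discharge series (m3/s).
observed = [1.0, 2.0, 3.0, 4.0]
biased = [1.1, 2.2, 3.3, 4.4]  # 10% too much volume everywhere

print(relative_error(observed, biased), nash_sutcliffe(observed, biased))
print(is_acceptable(observed, observed), is_acceptable(observed, biased))
```

The biased series shows why both criteria are needed: its NSC is still high, but the volume error exceeds the 5% limit, so it falls outside the acceptance domain.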

As Pareto optimality does not imply maximum utility (Friedland, 2009), the hydrological models based on "best" rain gauge networks do not always give better model performance than those based on "good" networks, for three reasons: (1) both the rainfall-runoff transformation processes in reality and the modelling processes are highly non-linear, (2) runoff generation and routing mechanisms are model dependent and cannot easily be generalized (for example, as shown in this paper, lumped and distributed models respond differently to the density and distribution of rain gauges in a network), and (3) the criteria used to evaluate the performance of rain gauge networks and of hydrological models are also different. However, good and similar performances can be achieved with the "best" and "good" networks using the lumped Xinanjiang Model with a lower number of rain gauges, and for the distributed SWAT Model good performances can be achieved with the "best" and "good" networks that contain more than a certain number of gauges.

In summary, there is no considerable difference in model performance for the lumped Xinanjiang Model using various gauge configurations of best and good networks. However, as the distributed SWAT Model uses only the single rain gauge nearest to the sub-basin's centroid as rainfall input for each sub-basin (Galván et al., 2014), it is much more sensitive to the number of gauges and their spatial distribution than a lumped model. When using low-density gauge networks, the gauge assigned to a particular sub-basin may be quite far from its centroid. Consequently, there may be large errors between the rainfall used by the SWAT Model for this sub-basin and the "true" areal rainfall. This will result in large errors in discharge simulation for the affected sub-basins. Moreover, in an extreme and ideal situation, good simulation results can be achieved with the lumped Xinanjiang Model using a network that includes only one well-located rain gauge that perfectly represents the areal mean rainfall, whereas the distributed SWAT Model requires a minimum number of properly located rain gauges to generate acceptable simulation results.
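The nearest-gauge assignment described above can be illustrated with a short Python sketch. It assigns to each sub-basin the gauge closest to its centroid and reports the distance, using planar Euclidean distance on hypothetical coordinates (in km). The names and coordinates are assumptions for illustration and do not reproduce SWAT's internal implementation.

```python
import math


def nearest_gauge(centroid, gauges):
    """Name of the gauge with the smallest planar distance to the centroid."""
    return min(gauges, key=lambda name: math.dist(centroid, gauges[name]))


def assign_gauges(centroids, gauges):
    """Map each sub-basin to (nearest gauge, distance to that gauge)."""
    assignment = {}
    for sub_basin, centroid in centroids.items():
        name = nearest_gauge(centroid, gauges)
        assignment[sub_basin] = (name, math.dist(centroid, gauges[name]))
    return assignment


# Hypothetical sub-basin centroids and gauge locations (x, y in km).
centroids = {"sb1": (0.0, 0.0), "sb2": (10.0, 0.0)}
sparse_network = {"g1": (1.0, 0.0)}
dense_network = {"g1": (1.0, 0.0), "g2": (9.0, 1.0)}

print(assign_gauges(centroids, sparse_network))
print(assign_gauges(centroids, dense_network))
```

With the sparse network, sub-basin `sb2` is fed by a gauge 9 km away, whereas the denser network provides a gauge about 1.4 km from its centroid, which illustrates why a distributed model is sensitive to gauge density and placement.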

3.2.2. Comparison of hydrographs in Xinanjiang Model and SWAT Model

For illustrative purposes, an example of hydrographs is generated using the calibrated Xinanjiang Model and SWAT Model (Fig. 9) with optimal networks of nine, 37 and 93 rain gauges for the period 1st January 1999–31st December 1999 at Xiangtan gauge. It is seen that: (1) The major temporal characteristics of the dynamic discharge process can be captured in both hydrological models by using the optimal networks with different numbers of rain gauges. (2) The ranges of the simulated hydrographs from the good networks narrow gradually with the increasing number of rain gauges in the catchment. (3) Compared with the lumped Xinanjiang Model (Fig. 9(a)–(c)), the hydrographs derived from the distributed SWAT Model (Fig. 9(d)–(f)) have a higher probability of overestimating or underestimating the peak flows, and the ranges of the simulated hydrographs are also wider.

4. Conclusions and outlook

This paper designs an entropy theory based multi-criteria rain gauge resampling method to investigate the influence of optimal gauge networks with various rain gauge densities and gauge locations on the performance of lumped and distributed hydrological models. Several aspects of the results reveal that the rain gauge networks selected by this method are robust and optimal. It is concluded from the study that:

1. There is no significant difference between the annual areal mean rainfall estimated by the benchmark network (185 gauges) and by the optimal rain gauge networks with different gauge densities, although the spatial distribution patterns change with the number and location of the stations. This is the main reason why the performance of the lumped Xinanjiang Model does not change much with the number of rain gauges used, whereas that of the distributed SWAT Model does. Meanwhile, the effect of an increase in the number of rain gauges in optimal networks on the variance reduction of mean areal precipitation is not obvious.

2. In the optimal rain gauge networks (best and good), most of the rain gauges are distributed in the upper and middle reaches of the main stream and tributaries or in the mountain areas.

3. For the best and good rain gauge networks, the lumped Xinanjiang Model gives small relative errors (all $|R_E|$ < 2%) and high values of the Nash–Sutcliffe efficiency coefficient (all NSC > 0.92) in the calibration period, while in the validation period the relative errors (−8% < $R_E$ < 3%) and the values of NSC (>0.89) are slightly increased and decreased, respectively, when using the good networks with only nine rain gauges in the simulation.

4. In general, the performance of the distributed SWAT Model is somewhat lower than that of the lumped Xinanjiang Model, and larger variability can be observed in the simulated runoff using optimal networks with different rain gauge densities. The relative errors are 6% to 9% and 7% to 14% for the calibration and validation periods, respectively. However, a clear improving trend can be observed in the values of NSC as more rain gauges are included in the optimal networks.

It should be noted that (1) the multi-criteria objective functions can be changed depending on the design objectives and hydrological models to provide improved discharge simulations. (2) Following the idea of Chen et al. (2008), we select the unique rain gauge with maximum Information Entropy in the first step of designing the optimal rain gauge networks; however, the $x$ rain gauges that simultaneously fulfil the conditions of (i) having Information Entropy values larger than a certain threshold and (ii) having an arithmetic mean Mutual Information below a certain threshold can be considered in the first step of rain gauge network design in future studies. (3) In this study region, high-density and good-quality rain gauges are available, which can be used as "true" or benchmark precipitation. However, in many catchments good-quality and high-density rain gauges may not be available; satellite-based precipitation data and other global rainfall datasets with high spatial–temporal resolution have been used by some researchers as an alternative "true" precipitation (Krajewski and Smith, 2002; Germann et al., 2006; Germann et al., 2009; Adjei et al., 2015; Al-Mukhtar et al., 2014; Castro et al., 2015; Kang and Merwade, 2014; etc.). (4) As discussed in Section 3.2.1, the "best" rain gauge networks do not always give the best performances of hydrological models due to the nonlinear behaviour of rainfall-runoff transformation processes and modelling processes, differences in runoff generation and routing mechanisms among the models, and differences in the criteria used to evaluate the performance of rain gauge networks and hydrological models. However, this method is time saving and guarantees that only small differences arise between hydrological simulations using the optimal networks and those using the densest benchmark network.

Appendix A

1. The deduction of Eq. (8)

$$
\begin{aligned}
H(Y \mid X) &= \sum_{x \in X}\left[p(x)\,H(Y \mid X = x)\right]
= -\sum_{x \in X}\left[p(x)\left(\sum_{y \in Y} p(y \mid x)\log_2 p(y \mid x)\right)\right] \\
&= -\sum_{x \in X}\sum_{y \in Y}\left[p(x,y)\log_2 p(y \mid x)\right]
= -\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2 p(y \mid x)\right] \\
&= -\sum_{x \in X,\, y \in Y} p(x,y)\log_2\frac{p(x,y)}{p(x)}
= \sum_{x \in X,\, y \in Y} p(x,y)\log_2\frac{p(x)}{p(x,y)} \\
&= \left\{-\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2 p(x,y)\right]\right\}
+ \left\{\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2 p(x)\right]\right\} \\
&= \left\{H(X,Y)\right\} + \left\{\sum_{x \in X}\left[p(x)\log_2 p(x)\right]\right\} \\
&= H(X,Y) - H(X)
\end{aligned}
$$

2. The deduction of Eq. (10)

$$
\begin{aligned}
I(X;Y) &= \sum_{x \in X,\, y \in Y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)} \\
&= \left\{\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2 p(x,y)\right]\right\}
- \left\{\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2\left(p(x)\,p(y)\right)\right]\right\} \\
&= \left\{\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2\left(p(x)\,p(y \mid x)\right)\right]\right\}
- \left\{\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2\left(p(x)\,p(y)\right)\right]\right\} \\
&= \left\{\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2 p(x)\right] + \sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2 p(y \mid x)\right]\right\} \\
&\quad - \left\{\sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2 p(x)\right] + \sum_{x \in X,\, y \in Y}\left[p(x,y)\log_2 p(y)\right]\right\} \\
&= \left\{\sum_{x \in X}\left[\log_2 p(x)\left(\sum_{y \in Y} p(x,y)\right)\right] + \sum_{x \in X,\, y \in Y}\left[\left(p(x)\,p(y \mid x)\right)\log_2 p(y \mid x)\right]\right\} \\
&\quad - \left\{\sum_{x \in X}\left[\log_2 p(x)\left(\sum_{y \in Y} p(x,y)\right)\right] + \sum_{y \in Y}\left[\log_2 p(y)\left(\sum_{x \in X} p(x,y)\right)\right]\right\} \\
&= \left\{\sum_{x \in X}\left[p(x)\log_2 p(x)\right] + \sum_{x \in X}\left[p(x)\sum_{y \in Y}\left(p(y \mid x)\log_2 p(y \mid x)\right)\right]\right\} \\
&\quad - \left\{\sum_{x \in X}\left[\left(\log_2 p(x)\right)p(x)\right] + \sum_{y \in Y}\left[\left(\log_2 p(y)\right)p(y)\right]\right\} \\
&= \left\{\sum_{x \in X}\left[p(x)\log_2 p(x)\right] + \sum_{x \in X}\left[p(x)\left(-H(Y \mid X = x)\right)\right]\right\}
- \left\{\left[-H(Y)\right] + \left[-H(X)\right]\right\} \\
&= \left\{\left[-H(X)\right] + \left[-H(Y \mid X)\right]\right\} - \left\{\left[-H(Y)\right] + \left[-H(X)\right]\right\} \\
&= H(Y) - H(Y \mid X)
\end{aligned}
$$
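The two identities derived above, $H(Y \mid X) = H(X,Y) - H(X)$ and $I(X;Y) = H(Y) - H(Y \mid X)$, can be checked numerically on a small joint distribution. The Python sketch below computes each quantity directly from its definition; the wet/dry joint probabilities and all function names are hypothetical and serve only to illustrate the algebra.

```python
import math


def entropy(dist):
    """Shannon entropy (bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginals(joint):
    """Marginal distributions p(x) and p(y) from a joint {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def conditional_entropy(joint):
    """H(Y|X) = -sum p(x,y) * log2 p(y|x), with p(y|x) = p(x,y) / p(x)."""
    px, _ = marginals(joint)
    return -sum(p * math.log2(p / px[x]) for (x, _y), p in joint.items() if p > 0)


def mutual_information(joint):
    """I(X;Y) = sum p(x,y) * log2[p(x,y) / (p(x) p(y))]."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


# Hypothetical joint distribution of wet/dry days at two gauges X and Y.
joint = {("dry", "dry"): 0.5, ("dry", "wet"): 0.1,
         ("wet", "dry"): 0.1, ("wet", "wet"): 0.3}
px, py = marginals(joint)

print(conditional_entropy(joint), entropy(joint) - entropy(px))
print(mutual_information(joint), entropy(py) - conditional_entropy(joint))
```

Each printed pair should agree up to floating-point rounding, and the Mutual Information of two independent gauges evaluates to zero.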


References

Adjei, K.A., Ren, L.L., Appiah-Adjei, M.K., Odai, S.N., 2015. Application of satellitederived rainfall for hydrological modelling in the data-scarce Black Volta transboundary basin. Hydrology Research, in press. doi: http://dx.doi.org/10.2166/ nh.2014.111

.

Ali, A., Lebel, T., Amani, A., 2003. Invariance in the spatial structure of Sahelian rain fields at climatological scales. J. Hydrometeorol. 4, 996–1011 .

Al-Mukhtar, M., Dunger, V., Merkel, B., 2014. Evaluation of the climate generator model CLIGEN for rainfall data simulation in Bautzen catchment area, Germany.

Hydrol. Res. 45 (4–5), 615–630.

http://dx.doi.org/10.2166/nh.2013.073

.

Anctil, F., Lauzon, N., Andréassian, V., Oudin, L., Perrin, C., 2006. Improvement of rainfall-runoff forecasts through mean areal rainfall optimization. J. Hydrol.

328, 717–725 .

Arnold, J.G., Srinivasan, R., Muttiah, R.S., Williams, J.R., 1998. Large area hydrologic modeling and assessment Part I: model development. J. Am. Water Resour.

Assoc. 34, 73–89 .

Barca, E., Passarella, G., Uricchio, V., 2008. Optimal extension of the rain gauge monitoring network of the Apulian Regional Consortium for Crop Protection.

Environ. Monitor. Assess. 145 (1–3), 375–386 .

Bárdossy, A., Das, T., 2008. Influence of rainfall observation network on model calibration and application. Hydrol. Earth Syst. Sci. Discuss. 12, 77–89 .

Bhattacharyya, S., Sanyal, G., 2012. A robust image steganography using DWT difference modulation (DWTDM). Int. J. Comput. Network Inform. Security 7,

27–40 .

Booij, M.J., 2002. Extreme daily precipitation in Western Europe with climate change at appropriate spatial scales. Int. J. Climatol. 22, 69–85 .

Borwein, J., Howlett, P., Piantadosi, J., 2014. Modelling and simulation of seasonal rainfall using the principle of maximum entropy. Entropy 16 (2), 747–769 .

Bras, R.L., 1990. Hydrology: An Introduction to Hydrologic Science. Addison-Wesley

Publishing Company .

Bush, S.F., 2010. Nanoscale Communication Networks. Artech House.

Castro, L.M., Salas, M., Fernández, B., 2015. Evaluation of TRMM Multi-satellite precipitation analysis (TMPA) in a mountainous region of the central Andes range with a Mediterranean Climate. Hydrology Research, in press.

http:// dx.doi.org/10.2166/nh.2013.096

.

Chebbi, A., Bargaoui, Z.K., Cunha, M.D.C., 2011. Optimal extension of rain gauge monitoring network for rainfall intensity and erosivity index interpolation. J.

Hydrol. Eng. 16 (8), 665–676 .

Chen, Y.C., Wei, C., Yeh, H.C., 2008. Rainfall network design using Kriging and entropy. Hydrol. Process. 22, 340–346 .

Cheng, K.S., Lin, Y.C., Liou, J.J., 2008. Rain gauge network evaluation and augmentation using geostatistics. Hydrol. Process. 22, 2554–2564 .

Cover, T.M., Thomas, J.A., 2012. Elements of Information Theory. John Wiley & Sons .

Desilets, S.L., Ferré, T.P., Ekwurzel, B., 2008. Flash flood dynamics and composition in a semiarid mountain watershed. Water Resour. Res. 44, W12436 .

Dong, X., Dohmen-Janssen, C.M., Booij, M.J., 2005. Appropriate spatial sampling of rainfall or flow simulation. Hydrol. Sci. J. 50, 279–297 .

Friedland, J. (Ed.), 2009. Doing Well and Good: The Human Face of the New

Capitalism. Information Age Publishing Inc.

Galván, L., Olías, M., Izquierdo, T., Cerón, J.C., Fernández de Villarán, R., 2014.

Rainfall estimation in SWAT: an alternative method to simulate orographic precipitation. J. Hydrol. 509, 257–265 .

Germann, U., Galli, G., Boscacci, M., Bolliger, M., 2006. Radar precipitation measurement in a mountainous region. Quarterly J. Royal Meteorol. Soc. 132, 1669–1692 .

Germann, U., Berenguer, M., Sempere-Torres, D., Zappa, M., 2009. REAL—ensemble radar precipitation estimation for hydrology in a mountainous region. Quart. J.

Royal Meteorol. Soc. 135, 445–456 .

Guiasu, S., Shenitzer, A., 1985. The principle of maximum entropy. Math. Intell. 7, 42–48.

Gull, S.F., 1989. Developments in Maximum Entropy Data Analysis. Maximum Entropy and Bayesian Methods. Springer, Netherlands, pp. 53–71.

Gupta, H.V., Sorooshian, S., Yapo, P.O., 1998. Toward improved calibration of hydrologic models: multiple and noncommensurable measures of information. Water Resour. Res. 34, 751–763.

Gupta, H.V., Sorooshian, S., Gao, X., Imam, B., Hsu, K., Bastidas, L., Li, J., Mahani, S., 2002. The challenge of predicting flash floods from thunderstorm rainfall. Philos. Trans. Royal Soc. Lond. Ser. A: Math., Phys. Eng. Sci. 360, 1363–1371.

Gupta, H.V., Sorooshian, S., Hogue, T.S., Boyle, D.P., 2003. Advances in automatic calibration of watershed models. Water Sci. Appl. 6, 9–28.

Hackett, O.M., 1966. National water data program. J. Am. Water Works Assoc. 58, 786–792.

Ihara, S., 1993. Information Theory for Continuous Systems. World Scientific, Singapore.

Jayawardena, A.W., 2014. Environmental and Hydrological Systems Modelling. CRC Press.

Kang, K., Merwade, V., 2014. The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling. Hydrol. Res. 45 (1), 23–42. http://dx.doi.org/10.2166/nh.2013.194.

Kizza, M., Rodhe, A., Xu, C.-Y., Ntale, H.K., Halldin, S., 2009. Temporal rainfall variability in the Lake Victoria Basin in East Africa during the Twentieth Century. Theor. Appl. Climatol. 98, 119–135.

Krajewski, W.F., Smith, J.A., 2002. Radar hydrology: rainfall estimation. Adv. Water Resour. 25, 1387–1394.

Krstanovic, P.F., Singh, V.P., 1992. Evaluation of rainfall networks using entropy: I. Theoretical development. Water Resour. Manage. 6, 279–293.

Li, H., Zhang, Y., Chiew, F.H.S., Xu, S., 2009. Predicting runoff in ungauged catchments by using Xinanjiang Model with MODIS leaf area index. J. Hydrol. 370, 155–162.

Li, H., Beldring, S., Xu, C.-Y., 2014. Implementation and testing of routing algorithms in the distributed HBV model for mountainous catchments. Hydrol. Res. 45 (3), 322–333.

Lin, J., 1991. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory 37 (1), 145–151.

Liu, B., Chen, X., Lian, Y., Wu, L., 2013. Entropy-based assessment and zoning of rainfall distribution. J. Hydrol. 490, 32–40.

MacKay, D.J., 2003. Information Theory, Inference and Learning Algorithms. Cambridge University Press.

Maruyama, T., Kawachi, T., Singh, V.P., 2005. Entropy-based assessment and clustering of potential water resources availability. J. Hydrol. 309 (1), 104–113.

Mason, R.R., York, T.H., 1997. Streamflow information for the nation. U.S. Geological Survey.

Nash, J., Sutcliffe, J., 1970. River flow forecasting through conceptual models part I: a discussion of principles. J. Hydrol. 10, 282–290.

Neitsch, S.L., Arnold, J.G., Kiniry, J.R., Williams, J.R., King, K.W., 2002. Soil and Water Assessment Tool Theoretical Documentation, Version 2000. Texas, USA.

Neitsch, S.L., Arnold, J.G., Kiniry, J.R., Williams, J.R., King, K.W., 2011. Soil and Water Assessment Tool Theoretical Documentation, Version 2009. Texas, USA.

Nour, M., Smit, D., El-Din, M., 2006. Geostatistical mapping of precipitation: implications for rain gauge network design. Water Sci. Technol. 53 (10), 101–110.

Pardo-Igúzquiza, E., 1998. Optimal selection of number and location of rainfall gauges for areal rainfall estimation using geostatistics and simulated annealing. J. Hydrol. 210, 206–220.

Patra, K.C., 2001. Hydrology and Water Resources Engineering. Alpha Science International Limited.

Perks, A., Winkler, T., Stewart, B., 1996. The Adequacy of Hydrological Networks: A Global Assessment. Secretariat of the World Meteorological Organization.

Pethel, S.D., Hahs, D.W., 2014. Exact test of independence using mutual information. Entropy 16 (5), 2839–2849.

Pyrce, R.S., 2004. Review and Analysis of Stream Gauge Networks for the Ontario Stream Gauge Rehabilitation Project. Watershed Science Centre, Trent University, Peterborough, Ontario, Canada.

Rodriguez-Iturbe, I., Mejía, J.M., 1974. On the transformation from point rainfall to areal rainfall. Water Resour. Res. 10, 729–735.

Sang, Y.F., 2013. Wavelet entropy-based investigation into the daily precipitation variability in the Yangtze River Delta, China, with rapid urbanizations. Theor. Appl. Climatol. 111 (3–4), 361–370.

Segond, M.L., Wheater, H.S., Onof, C., 2007. The significance of spatial rainfall representation for flood runoff estimation: a numerical evaluation based on the Lee catchment, UK. J. Hydrol. 347 (1), 116–131.

Shafiei, M., Ghahraman, B., Saghafian, B., Pande, S., Gharari, S., Davary, K., 2014. Assessment of rain-gauge networks using a probabilistic GIS based approach. Hydrol. Res. 45 (4–5), 551–562.

Shannon, C.E., 1949. Communication theory of secrecy systems. Bell Syst. Tech. J. 28, 656–715.

Singh, V.P., 1995. Computer Models of Watershed Hydrology. Water Resources Publications. ISBN 0-918334-91-8.

Smith, C.R., Grandy, W., Jr. (Eds.), 1985. Maximum-Entropy and Bayesian Methods in Inverse Problems, vol. 14. Springer.

Steuer, R., Kurths, J., Daub, C.O., Weise, J., Selbig, J., 2001. The mutual information: detecting and evaluating dependencies between variables. Bioinformatics 18, S231–S240.

St-Hilaire, A., Ouarda, T.B., Lachance, M., Bobée, B., Gaudet, J., Gignac, C., 2003. Assessment of the impact of meteorological network density on the estimation of basin precipitation and runoff: a case study. Hydrol. Process. 17, 3561–3580.

Stokstad, E., 1999. Scarcity of rain, stream gages threatens forecasts. Science 285, 1199–1200.

Strange, B.A., Duggins, A., Penny, W., Dolan, R.J., Friston, K.J., 2005. Information theory, novelty and hippocampal responses: unpredicted or unpredictable? Neural Networks 18, 225–230.

Tapiador, F.J., 2007. A maximum entropy analysis of global monthly series of rainfall from merged satellite data. Int. J. Rem. Sens. 28 (6), 1113–1121.

Tsintikidis, D., Georgakakos, K.P., Sperfslage, J.A., Smith, D.E., Carpenter, T.M., 2002. Precipitation uncertainty and rain gauge network design within Folsom Lake watershed. J. Hydrol. Eng. 7, 175–184.

Visessri, S., McIntyre, N., 2012. Comparison between the TRMM Product and Rainfall Interpolation for Prediction in Ungauged Catchments. In: 2012 International Congress on Environmental Modelling and Software: Managing Resources of a Limited Planet, Sixth Biennial Meeting, Leipzig, Germany.

Volkmann, T.H., Lyon, S.W., Gupta, H.V., Troch, P.A., 2010. Multicriteria design of rain gauge networks for flash flood prediction in semiarid catchments with complex terrain. Water Resour. Res. 46, W11554.

Wei, C., Yeh, H.C., Chen, Y.C., 2014. Spatiotemporal scaling effect on rainfall network design using entropy. Entropy 16 (8), 4626–4647.

William, H., 2007. Numerical Recipes 3rd edition: The Art of Scientific Computing. Cambridge University Press.

WMO (World Meteorological Organization), 1994. Guide to Hydrological Practices. WMO-No. 168. WMO, Geneva, Switzerland.

Xu, H., Xu, C.Y., Chen, H., Zhang, Z.X., Li, L., 2013. Assessing the influence of rain gauge density and distribution on hydrological model performance in a humid region of China. J. Hydrol. 505, 1–12.

Yao, Y.Y., 2003. Information-theoretic measures for knowledge discovery and data mining. Entropy Measures, Maximum Entropy Principle and Emerging Applications, pp. 115–136.

Yao, C., Li, Z., Bao, H., Yu, Z., 2009. Application of a developed Grid-Xinanjiang Model to Chinese watersheds for flood forecasting purpose. J. Hydrol. Eng. 14, 923–934.

Yatheendradas, S., Wagener, T., Gupta, H., Unkrich, C., Goodrich, D., Schaffner, M., Stewart, A., 2008. Understanding uncertainty in distributed flash flood forecasting for semiarid regions. Water Resour. Res. 44, W05S19.

Yevjevich, V., 1972. Probability and Statistics in Hydrology. Water Resources Publications, Fort Collins, Colorado, USA.

Yoo, C., Jung, K., Lee, J., 2008. Evaluation of rain gauge network using entropy theory: comparison of mixed and continuous distribution function applications. J. Hydrol. Eng. 13 (4), 226–235.

Yuan, F., Ren, L.L., Yu, Z.B., Zhu, Y.H., Xu, J., Fang, X.Q., 2012. Potential natural vegetation dynamics driven by future long-term climate change and its hydrological impacts in the Hanjiang River basin, China. Hydrol. Res. 43, 73–90.

Zhang, Z., Yeung, R.W., 1998. On characterization of entropy function via information inequalities. IEEE Trans. Inform. Theory 44 (4), 1440–1452.

Zhang, D.R., Zhang, L.R., Guan, Y.Q., Chen, X., Chen, X.F., 2012. Sensitivity analysis of Xinanjiang rainfall–runoff model parameters: a case study in Lianghui, Zhejiang province, China. Hydrol. Res. 43, 123–134.

Zhao, R.J., 1992. The Xinanjiang Model applied in China. J. Hydrol. 135, 371–381.

Zhao, R.J., Zhuang, Y.L., Fang, L.R., Liu, X.R., Zhang, Q.S., 1980. The Xinanjiang Model. Hydrological Forecasting Proceedings Oxford Symposium. IASH 129, 351–356.

Zhao, R.J., Liu, X., Singh, V.P., 1995. The Xinanjiang Model. Computer Models of Watershed Hydrology. Water Resources Publications, pp. 215–232.