joc3906-sup-0001-AppendixS1

Supporting Document
for
“A coupled K-nearest neighbour and Bayesian neural network model for daily
rainfall downscaling”
Y. Lu and X.S. Qin*
Content
S1. Backward stepwise regression
S2. ASD method
S3. GLM method
S4. Downscaled daily rainfall distributions based on monthly data at S44 station
Figure S1. Daily rainfall distributions downscaled by the ASD, GLM and KNN-BNN methods, applied
to December only, during the verification period at S44.
Figure S2. Comparison of KNN-BNN model performance for June and August rainfall downscaling
at S44 based on both full-year and individual-month data. 1 and 2 denote June and August,
respectively; a, b and c denote the Mean, Standard Deviation (STD) and 90th percentile rainfall
amount (PERC90), respectively; APE-I and APE-F denote the absolute percentage error (APE) for
individual-month and full-year data, respectively. The APE values are calculated from the mean
value and the observed value.
S1. Backward stepwise regression
Stepwise regression selects potential predictors with a regression model; its variants are
forward selection, backward elimination and bidirectional elimination. In this study, backward
stepwise regression is applied to select the large-scale weather variables. The model initially
includes all variables and then removes the least significant one at each step until all
remaining predictors are statistically significant (Hessami et al., 2008). In each selection
step, an F-test is carried out based on the following equation:
$$F = \frac{\left(R_q^2 - R_{q-1}^2\right)\left(n - q - 1\right)}{1 - R_q^2} \tag{S1}$$
where n is the number of observations; q is the number of predictors; Rq is the correlation
coefficient between the criterion variable and the values predicted with q predictors (Rq-1 is
the corresponding coefficient with q - 1 predictors). If the F value is smaller than a
threshold, the predictor is removed. The threshold is a criterion F value, which can be
calculated by the following equation (Hessami et al., 2008):
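$$a' = 1 - \left(1 - \frac{a}{2}\right)^{1/q} \tag{S2}$$
where a' is the adjusted level used to obtain the criterion F value and a is the significance
level (in this study, the 95% confidence level is used, i.e. a = 0.05).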
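The elimination procedure described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ASD implementation: `partial_f` evaluates the F statistic of Eq. (S1), while `backward_eliminate`, the callable `r2_of` (which returns the squared correlation for a given predictor subset) and the fixed `f_threshold` are hypothetical names introduced here. In practice the criterion F value would be derived from the significance level and an F distribution rather than passed in as a constant.

```python
from typing import Callable, List, Sequence, Tuple


def partial_f(r2_q: float, r2_q_minus_1: float, n: int, q: int) -> float:
    """F statistic of Eq. (S1) for dropping one of q predictors."""
    return (r2_q - r2_q_minus_1) * (n - q - 1) / (1.0 - r2_q)


def backward_eliminate(
    predictors: Sequence[str],
    r2_of: Callable[[Tuple[str, ...]], float],  # hypothetical: R^2 of a subset
    n: int,
    f_threshold: float,  # assumed to be supplied by the caller
) -> List[str]:
    """Drop the least significant predictor until all pass the F-test."""
    selected = list(predictors)
    while len(selected) > 1:
        q = len(selected)
        r2_full = r2_of(tuple(selected))
        # F value associated with removing each predictor in turn.
        f_values = {
            p: partial_f(r2_full, r2_of(tuple(s for s in selected if s != p)), n, q)
            for p in selected
        }
        weakest = min(f_values, key=f_values.get)
        if f_values[weakest] >= f_threshold:
            break  # every remaining predictor is significant
        selected.remove(weakest)
    return selected


# Toy example: R^2 is the sum of made-up contributions per predictor.
contribution = {"slp": 0.30, "humidity": 0.25, "noise": 0.001}
kept = backward_eliminate(
    list(contribution),
    lambda subset: sum(contribution[p] for p in subset),
    n=100,
    f_threshold=4.0,
)
print(kept)
```

In the toy example the predictor names and contributions are invented; the near-zero `noise` term yields a small F value and is removed first, after which the remaining predictors exceed the threshold.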
S2. Automated statistical downscaling tool (ASD)
ASD is based on multiple linear regression. It adds two procedures to enhance downscaling
performance: (i) backward stepwise regression and partial correlation coefficients to select
predictors; and (ii) ridge regression to alleviate the effect of non-orthogonality (Hessami et
al., 2008). The two sub-models of ASD for downscaling rainfall, the occurrence model and the
amount model, can be written as follows (Hessami et al., 2008):
$$O_i = a_0 + \sum_{j=1}^{n} a_j p_{ij} \tag{S3}$$
$$R_i^{0.25} = \beta_0 + \sum_{j=1}^{n} \beta_j p_{ij} + e_i \tag{S4}$$
where i denotes the ith day; j denotes the jth predictor; n is the number of predictors; pij is
the value of the jth predictor on the ith day; a and β are the model parameters; ei is the
error; Oi is the occurrence of daily rainfall; and Ri is the amount of daily rainfall. Further
details can be found in Hessami et al. (2008).
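As a rough illustration of the two ASD sub-models, the sketch below evaluates the linear occurrence predictor of Eq. (S3) and inverts the fourth-root transform of Eq. (S4) to obtain a rainfall amount. The function names and coefficient values are hypothetical, the error term defaults to zero, and treating a negative fourth root as zero rainfall is an assumption made here rather than something stated in the source.

```python
from typing import Sequence


def linear_predictor(
    intercept: float, coeffs: Sequence[float], predictors: Sequence[float]
) -> float:
    """intercept + sum_j coeffs[j] * predictors[j]."""
    if len(coeffs) != len(predictors):
        raise ValueError("coeffs and predictors must have the same length")
    return intercept + sum(c * p for c, p in zip(coeffs, predictors))


def occurrence(a0: float, a: Sequence[float], p_i: Sequence[float]) -> float:
    """O_i of Eq. (S3) for one day."""
    return linear_predictor(a0, a, p_i)


def amount(
    beta0: float, beta: Sequence[float], p_i: Sequence[float], e_i: float = 0.0
) -> float:
    """R_i from Eq. (S4): the model predicts R_i ** 0.25, so raise to the 4th power."""
    root = linear_predictor(beta0, beta, p_i) + e_i
    # Assumption: a negative fourth root is treated as zero rainfall.
    return max(root, 0.0) ** 4


# Made-up predictor values and coefficients for a single day.
p_day = [1.0, 2.0]
o_day = occurrence(0.1, [0.2, 0.3], p_day)
r_day = amount(1.0, [0.5, 0.25], p_day)
print(o_day, r_day)
```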
S3. Generalized linear model (GLM)
GLM was proposed by Chandler and Wheater (2002) and applied to the west of Ireland. It also uses
two sub-models, an occurrence model and an amount model, to estimate daily rainfall. Besides the
large-scale predictors, GLM also considers other covariates to filter noise and predict the
daily rainfall, such as seasonal effects, station correlation, autocorrelation and interactions
among covariates. GLM can therefore address many effects that are difficult to tackle with
other methods. Generally, two sub-models based on logistic regression and the gamma
distribution are used for occurrence and amount modeling, respectively (Chandler and Wheater,
2002; Yang et al., 2005; Segond et al., 2006):
$$\ln\left(\frac{p_i}{1 - p_i}\right) = x_i^{T}\beta \tag{S5}$$
$$\ln\left(\mu_i\right) = \varepsilon_i^{T}\gamma \tag{S6}$$
where pi is the probability of rain on the ith day; xi is the predictor vector for the ith day
in the occurrence model; β is the coefficient vector; T denotes the transpose; μi is the mean of
the gamma distribution of rainfall amount for the ith day; εi is the predictor vector for the
ith day in the amount model; and γ is the coefficient vector. A further parameter of the
rainfall amount model is the dispersion coefficient ν, which is assumed to be common to all
gamma distributions, i.e. they share a common shape (Yang et al., 2005; Segond et al., 2006).
Predictor selection is based on the MSE values at each regression step. For a detailed
description of GLM, readers are referred to Chandler and Wheater (2002), Yang et al. (2005) and
Segond et al. (2006).
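The two GLM sub-models can be illustrated with a short Python sketch that applies the inverse logit link for occurrence and the inverse log link for the mean amount, then draws one simulated day. All function names and inputs are hypothetical. Treating ν as the gamma shape parameter, so that the scale equals the mean divided by ν, is an assumption made for this example; the cited references give the exact parameterisation.

```python
import math
import random
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def rain_probability(x_i: Sequence[float], beta: Sequence[float]) -> float:
    """p_i from Eq. (S5): inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-dot(x_i, beta)))


def mean_amount(eps_i: Sequence[float], gamma: Sequence[float]) -> float:
    """Mean wet-day amount from Eq. (S6): inverse of the log link."""
    return math.exp(dot(eps_i, gamma))


def simulate_day(
    x_i: Sequence[float],
    beta: Sequence[float],
    eps_i: Sequence[float],
    gamma: Sequence[float],
    nu: float,
    rng: random.Random,
) -> float:
    """Draw one daily rainfall value (mm/day); 0.0 means a dry day."""
    if rng.random() >= rain_probability(x_i, beta):
        return 0.0
    mu = mean_amount(eps_i, gamma)
    # Assumption: nu is the gamma shape, so scale = mu / nu keeps the mean at mu.
    return rng.gammavariate(nu, mu / nu)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rain = simulate_day([1.0, 0.5], [0.2, -0.4], [1.0, 0.5], [1.5, 0.3], nu=0.7, rng=rng)
print(rain)
```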
S4. Downscaled daily rainfall distributions based on monthly data at S44 station
Figure S1 compares one randomly selected sequence of downscaled daily rainfall from ASD-D,
GLM-D and KNN-BNN-D for December during the verification period. All three methods notably
improve the reproduction of peak rainfall amounts compared with the original models in Figure 7
of the main manuscript. However, all three models also underestimate the extreme rainfall
frequency; the numbers of extreme events (NEEs) are 5 for ASD-D, 9 for GLM-D and 9 for
KNN-BNN-D. KNN-BNN-D shows the best performance for extreme frequency and amount among the
three models. Over the entire 50 ensembles, the average NEEs of the three methods are 6.1, 7.1
and 10, respectively. However, all three models give an
underestimation of the amount of high-intensity rainfall, which is also reflected by the PERC90
in Table 6.
[Figure S1 panels: (a) OBS and ASD-D; (b) OBS and GLM-D; (c) OBS and KNN-BNN-D. Vertical axis:
Daily rainfall (mm/day), 0 to 160.]
Figure S1: Daily rainfall distributions downscaled by the ASD, GLM and KNN-BNN methods, applied
to December only, during the verification period at S44. (a) ASD-D method; (b) GLM-D method;
(c) KNN-BNN-D method.
[Figure S2 panels: (a1) Jun and (a2) Aug, Mean (mm/day); (b1) Jun and (b2) Aug, STD (mm/day);
(c1) Jun and (c2) Aug, PERC90 (mm/day). Each panel shows OBS with the Individual and Full
results. APE annotations in the panels: APE-I = 0.135, APE-F = 0.107; APE-I = 0.205,
APE-F = 0.254; APE-I = 0.034, APE-F = 0.140; APE-I = 0.126, APE-F = 0.233; APE-I = 0.016,
APE-F = 0.172; APE-I = 0.024, APE-F = 0.249.]
Figure S2: Comparison of KNN-BNN model performance for June and August rainfall downscaling
at S44 based on both full-year and individual-month data. 1 and 2 denote June and August,
respectively; a, b and c denote the Mean, Standard Deviation (STD) and 90th percentile rainfall
amount (PERC90), respectively; APE-I and APE-F denote the absolute percentage error (APE) for
individual-month and full-year data, respectively. The APE values are calculated from the mean
value and the observed value.
In Section 4.2.2 of the main manuscript, the results of the KNN-BNN model based on the
full-year record are presented; those in Section 4.2.4 are based on individual monthly data.
They indicate that the downscaling performance based on individual months is better. Figure 5
of the main manuscript shows the reproduction of rainfall for June and August in terms of the
Mean, STD and PERC90, where the downscaled results do not cover the observed data well. In this
section, the individual monthly data of June and August are applied to the KNN-BNN model to
mitigate this problem. Six KNN classes are used, delimited by the 10th, 25th, 50th, 75th and
90th percentile values of rainfall amount. Figure S2 compares the performance of the KNN-BNN
model for June and August rainfall downscaling based on full-year and individual-month data.
In terms of the absolute percentage error (APE), the downscaled results based on monthly data
show some improvement for the monthly indicators, except for the Mean of June. The PERC90
shows the most notable enhancement, but with a slight increase in the uncertainty range. The
results indicate that downscaling based on the full-year data does not reflect the rainfall
patterns in these two months well, although it has a narrower uncertainty range. In terms of
the maximum values, the results from individual-month downscaling for June and August are
104.6 mm/day and 97.6 mm/day, respectively, which are slightly closer to the observed data
(i.e. 131.2 mm/day and 85.7 mm/day) than those based on full-year data (i.e. 101.1 mm/day and
62.8 mm/day). Overall, the above analysis demonstrates that separating the yearly data into
monthly subsets could considerably enhance the model performance.
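The APE comparison and the six-class percentile split used in this section can be sketched in Python as below. This is an illustrative sketch only: the function names are hypothetical, the linear-interpolation percentile convention and the handling of values that fall exactly on a threshold are assumptions, and the rainfall series is synthetic rather than station data.

```python
from bisect import bisect_right
from typing import List, Sequence


def ape(simulated_mean: float, observed: float) -> float:
    """Absolute percentage error between a simulated mean and an observed value."""
    return abs(simulated_mean - observed) / abs(observed)


def percentile(values: Sequence[float], pct: float) -> float:
    """Percentile by linear interpolation between sorted values (assumed convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def class_thresholds(
    rain: Sequence[float], pcts: Sequence[float] = (10, 25, 50, 75, 90)
) -> List[float]:
    """Five thresholds that split rainfall amounts into six classes."""
    return [percentile(rain, p) for p in pcts]


def rain_class(amount: float, thresholds: Sequence[float]) -> int:
    """Class index from 0 (lightest) to 5 (heaviest)."""
    # Assumption: a value equal to a threshold is placed in the upper class.
    return bisect_right(thresholds, amount)


# Synthetic rainfall amounts of 1..100 mm/day, for illustration only.
synthetic = [float(v) for v in range(1, 101)]
thresholds = class_thresholds(synthetic)
classes = [rain_class(v, thresholds) for v in (5.0, 60.0, 95.0)]
print(thresholds, classes, ape(9.0, 10.0))
```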
References
Chandler, R.E. and Wheater, H.S., 2002. Analysis of rainfall variability using generalized linear
models: A case study from the west of Ireland. Water Resources Research 38, 1192, doi:
10.1029/2001WR000906.
Hessami, M., Gachon, P., Ouarda, T.B.M.J., St-Hilaire, A., 2008. Automated regression-based
statistical downscaling tool. Environmental Modelling & Software 23, 813-834.
Segond, M.L., Neokleous, N., Makropoulos, C., Onof, C., Maksimovic, C., 2007. Simulation and
spatio-temporal disaggregation of multi-site rainfall data for urban drainage applications.
Hydrological Sciences Journal – Journal des Sciences Hydrologiques 52, 917-935.
Yang, C., Chandler, R.E., Isham, V.S., Wheater, H.S., 2005. Spatial-temporal rainfall simulation
using generalized linear models. Water Resources Research 41, W11415,
doi:10.1029/2004WR003739.