Supporting Document for “A coupled K-nearest neighbour and Bayesian neural network model for daily rainfall downscaling”

Y. Lu and X.S. Qin*

Content
S1. Backward stepwise regression
S2. ASD method
S3. GLM method
S4. Downscaled daily rainfall distributions based on monthly data at S44 station
Figure S1. Downscaled daily rainfall distributions from the ASD, GLM and KNN-BNN methods applied to December data only during the verification period at S44.
Figure S2. Comparison of the performance of the KNN-BNN model for June and August rainfall downscaling at S44 based on full-year and individual-month data.

S1. Backward stepwise regression

Stepwise regression selects potential predictors using a regression model; its variants are forward selection, backward elimination and bidirectional elimination. In this study, backward stepwise regression is applied to select the large-scale weather variables. The model initially includes all variables and then removes the least significant one, step by step, until all remaining predictors are statistically significant (Hessami et al., 2008). At each selection step, an F-test is carried out based on the following equation:

$$F = \frac{\left(R_q^2 - R_{q-1}^2\right)\left(n - q - 1\right)}{1 - R_q^2} \tag{S1}$$

where $n$ is the number of observed data; $q$ is the number of predictors; $R_q$ is the correlation coefficient between the criterion variables and the predicted ones with $q$ predictors, and $R_{q-1}$ is the same quantity after one predictor has been removed. If the F value is smaller than a threshold, the predictor should be removed.
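The elimination loop built on Eq. (S1) can be sketched as follows. This is a minimal illustration rather than the ASD implementation: the function names, the predictor names (`mslp`, `shum`, `uwnd`) and the toy R² table are hypothetical, the regression itself is replaced by a caller-supplied `r2_of` function, and the critical F value is passed in as a fixed number instead of being recomputed at each step.

```python
def partial_f(r2_q, r2_q_minus_1, n, q):
    """Eq. (S1): F statistic for the predictor whose removal lowers R^2
    from r2_q (q predictors) to r2_q_minus_1 (q - 1 predictors)."""
    return (r2_q - r2_q_minus_1) * (n - q - 1) / (1.0 - r2_q)


def backward_eliminate(predictors, r2_of, n, f_crit):
    """Remove the least significant predictor until all partial F >= f_crit.

    r2_of(subset) is a caller-supplied function (hypothetical here) that
    returns R^2 of the regression fitted on `subset`.
    f_crit is the critical F value, treated as a fixed input in this sketch.
    The loop stops at a single predictor so that r2_of is never called on
    an empty set.
    """
    kept = list(predictors)
    while len(kept) > 1:
        q = len(kept)
        r2_full = r2_of(kept)
        # Partial F of each predictor: R^2 lost when it alone is dropped.
        f_values = {
            p: partial_f(r2_full, r2_of([k for k in kept if k != p]), n, q)
            for p in kept
        }
        weakest = min(f_values, key=f_values.get)
        if f_values[weakest] >= f_crit:
            break  # every remaining predictor is significant
        kept.remove(weakest)
    return kept


# Toy R^2 values for three hypothetical predictors (not data from the study).
R2_TABLE = {
    frozenset({"mslp", "shum", "uwnd"}): 0.500,
    frozenset({"mslp", "shum"}): 0.499,
    frozenset({"mslp", "uwnd"}): 0.420,
    frozenset({"shum", "uwnd"}): 0.400,
    frozenset({"mslp"}): 0.300,
    frozenset({"shum"}): 0.280,
}

selected = backward_eliminate(
    ["mslp", "shum", "uwnd"],
    r2_of=lambda subset: R2_TABLE[frozenset(subset)],
    n=100,
    f_crit=4.0,
)
print(selected)  # ['mslp', 'shum']
```

In the toy table, dropping `uwnd` costs almost no R², so its partial F (about 0.19) falls below the assumed critical value and it is removed first; the two remaining predictors both have large partial F values and are kept.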
The threshold is a critical F value and can be calculated from the following equation (Hessami et al., 2008):

a a 1 1 2 1/ q (S2)

where a is the significance level (95% in this study).

S2. Automated statistical downscaling tool (ASD)

ASD is based on multiple linear regression. It adds two procedures to enhance downscaling performance: (i) backward stepwise regression and partial correlation coefficients to select predictors; (ii) ridge regression to alleviate the effect of non-orthogonality (Hessami et al., 2008). The two sub-models of ASD for downscaling rainfall, the occurrence model and the amount model, can be written as follows (Hessami et al., 2008):

$$O_i = a_0 + \sum_{j=1}^{n} a_j p_{ij} \tag{S3}$$

$$R_i^{0.25} = \beta_0 + \sum_{j=1}^{n} \beta_j p_{ij} + e_i \tag{S4}$$

where $i$ denotes the ith day; $j$ denotes the jth predictor; $n$ is the number of predictors; $p_{ij}$ is the jth predictor on the ith day; $a$ and $\beta$ are the model parameters; $e_i$ is the error; $O_i$ is the occurrence of daily rainfall; $R_i$ is the amount of daily rainfall. Further details can be found in Hessami et al. (2008).

S3. Generalized linear model (GLM)

The GLM method was proposed by Chandler and Wheater (2002) and applied to western Ireland. It also needs two sub-models, an occurrence model and an amount model, to estimate daily rainfall. Besides the large-scale predictors, GLM also considers other covariates to filter noise and predict daily rainfall, such as seasonal effects, inter-station correlation, autocorrelation and interactions among covariates. GLM can therefore address many effects that are difficult to handle with other methods.
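The occurrence/amount split described above can be sketched as a two-step simulation of one day's rainfall. The sketch assumes the logit-link occurrence model and log-link gamma amount model of Eqs. (S5) and (S6). The function names, coefficient values and covariates are hypothetical, and the gamma parameterization (shape ν, scale μ/ν, so that the mean equals μ) is an assumption of this illustration rather than a detail taken from the cited papers.

```python
import math
import random


def rain_probability(x, beta):
    """Eq. (S5) solved for p: p = 1 / (1 + exp(-x^T beta))."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))


def mean_amount(eps, gamma):
    """Eq. (S6) solved for the mean: mu = exp(eps^T gamma)."""
    return math.exp(sum(ei * gi for ei, gi in zip(eps, gamma)))


def simulate_day(x, eps, beta, gamma, nu, rng):
    """Draw one day's rainfall (mm): 0 if dry, gamma-distributed if wet."""
    if rng.random() >= rain_probability(x, beta):
        return 0.0  # dry day
    mu = mean_amount(eps, gamma)
    # Assumed parameterization: shape nu, scale mu / nu, so the mean is mu.
    return rng.gammavariate(nu, mu / nu)


# Hypothetical coefficients and covariates (intercept plus one predictor).
beta = [-0.5, 1.2]   # occurrence model coefficients
gamma = [1.5, 0.4]   # amount model coefficients
nu = 0.8             # common dispersion (shape) parameter
rng = random.Random(0)

series = [
    simulate_day([1.0, 0.3], [1.0, 0.3], beta, gamma, nu, rng)
    for _ in range(365)
]
wet_fraction = sum(r > 0 for r in series) / len(series)
print(round(wet_fraction, 2))
```

Keeping the two sub-models as separate functions mirrors the structure of the method: the occurrence model decides whether a day is wet, and the amount model is only evaluated for wet days.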
Generally, two sub-models based on logistic regression and the gamma distribution are used for occurrence and amount modelling, respectively (Chandler and Wheater, 2002; Yang et al., 2005; Segond et al., 2007):

$$\ln\left(\frac{p_i}{1 - p_i}\right) = x_i^T \beta \tag{S5}$$

$$\ln \mu_i = \varepsilon_i^T \gamma \tag{S6}$$

where $p_i$ is the probability of rain on the ith day, $x_i$ is the vector of predictors for the ith day in the occurrence model, $\beta$ is the coefficient vector and $T$ denotes the transpose; $\mu_i$ is the mean rainfall amount on the ith day, $\varepsilon_i$ is the vector of predictors for the ith day in the amount model and $\gamma$ is the coefficient vector. Another parameter of the rainfall amount model is the dispersion coefficient $\nu$, which is assumed to give a common shape for all gamma distributions (Yang et al., 2005; Segond et al., 2007). Predictor selection is based on the MSE values at each regression step. For a detailed description of GLM, readers are referred to Chandler and Wheater (2002), Yang et al. (2005) and Segond et al. (2007).

S4. Downscaled daily rainfall distributions based on monthly data at S44 station

Figure S1 compares one random sequence of downscaled daily rainfall from ASD-D, GLM-D and KNN-BNN-D in December during the verification period. All three methods notably improve the reproduction of peak rainfall amounts compared with the original models in Figure 7. However, all three models underestimate the frequency of extreme rainfall; the numbers of extreme events (NEEs) are 5 for ASD-D, 9 for GLM-D and 9 for KNN-BNN-D. KNN-BNN-D shows the best performance for extreme frequency and amount among the three models. Over the entire 50 ensembles, the average NEEs of the three methods are 6.1, 7.1 and 10, respectively. However, the three models underestimate the amount of high-intensity rainfall, which is also reflected by the PERC90 in Table 6.

[Figure S1: three time-series panels, (a) OBS vs ASD-D, (b) OBS vs GLM-D and (c) OBS vs KNN-BNN-D; y-axis: Daily rainfall (mm/day), 0 to 160; x-axis ticks from 20 to 180.]

Figure S1: Downscaled daily rainfall distributions from the ASD, GLM and KNN-BNN methods applied to December data only during the verification period at S44. a. ASD-D method; b. GLM-D method; c. KNN-BNN-D method.

[Figure S2: six panels comparing OBS with the Individual and Full results: (a1) Jun and (a2) Aug, Mean (mm/day); (b1) Jun and (b2) Aug, STD (mm/day); (c1) Jun and (c2) Aug, PERC90 (mm/day). Each panel is annotated with its APE-I and APE-F values; panel (a1) reports APE-I = 0.135 and APE-F = 0.107. The remaining annotated (APE-I, APE-F) pairs are (0.205, 0.254), (0.034, 0.140), (0.126, 0.233), (0.016, 0.172) and (0.024, 0.249).]

Figure S2: Comparison of the performance of the KNN-BNN model for June and August rainfall downscaling at S44 based on full-year and individual-month data. 1 and 2 denote June and August, respectively; a, b and c denote the Mean, Standard Deviation (STD) and 90th percentile rainfall amount (PERC90), respectively; APE-I and APE-F denote the absolute percentage error (APE) for the individual-month and full-year cases, respectively. The APE values are calculated from the mean value and the observed value.

In Section 4.2.2 of the manuscript, the results of the KNN-BNN model based on the full-year record are presented; those in Section 4.2.4 are based on individual monthly data. The results indicate that downscaling performance based on individual months is better. Figure 5 of the main manuscript shows the reproduction of rainfall for June and August in terms of the Mean, STD and PERC90. The corresponding downscaled results could not cover the observed data well.
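The APE annotations in Figure S2 compare an ensemble-mean statistic with its observed counterpart. A minimal sketch follows, assuming APE is the absolute difference between the ensemble mean and the observed value, divided by the observed value and reported as a fraction; the exact formula is not given here, and the function name and numbers below are hypothetical, not values from the study.

```python
from statistics import mean


def ape(ensemble_values, observed):
    """Absolute percentage error of the ensemble mean relative to the
    observed value, returned as a fraction (assumed definition)."""
    return abs(mean(ensemble_values) - observed) / abs(observed)


# Hypothetical monthly-mean rainfall (mm/day) from three ensemble members.
ensemble_means = [6.0, 7.0, 8.0]
observed_mean = 8.0

print(ape(ensemble_means, observed_mean))  # 0.125
```

The same function applies to any of the three statistics (Mean, STD or PERC90): pass the statistic computed for each ensemble member together with the observed statistic.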
In this section, the individual monthly data of June and August are applied to the KNN-BNN model, aiming to mitigate this problem. The number of KNN classes is 6, which are classified according to the 10th, 25th, 50th, 75th and 90th percentile values of rainfall amount. Figure S2 compares the performance of the KNN-BNN model for June and August rainfall downscaling based on full-year and individual-month data. In terms of the absolute percentage error (APE), the downscaled results based on monthly data show some improvement in the various monthly indicators, except for the Mean of June. It is worth mentioning that the PERC90 shows the most notable enhancement, but with a slight increase in the uncertainty range. The results indicate that the downscaled results based on full-year data could not reflect the rainfall patterns in these two months well, although they have a narrower uncertainty range. In terms of the maximum values, the results from individual-month downscaling for June and August are 104.6 mm/day and 97.6 mm/day, respectively, which are slightly closer to the observed data (i.e. 131.2 mm/day and 85.7 mm/day) than those based on full-year data (i.e. 101.1 mm/day and 62.8 mm/day). Overall, the above analysis demonstrates that separating the yearly data into monthly data could considerably enhance the model performance.

References

Chandler, R.E., Wheater, H.S., 2002. Analysis of rainfall variability using generalized linear models: A case study from the west of Ireland. Water Resources Research 38, 1192, doi:10.1029/2001WR000906.

Hessami, M., Gachon, P., Ouarda, T.B.M.J., St-Hilaire, A., 2008. Automated regression-based statistical downscaling tool. Environmental Modelling & Software 23, 813-834.

Segond, M.L., Neokleous, N., Makropoulos, C., Onof, C., Maksimovic, C., 2007. Simulation and spatio-temporal disaggregation of multi-site rainfall data for urban drainage applications. Hydrological Sciences Journal – Journal des Sciences Hydrologiques 52, 917-935.

Yang, C., Chandler, R.E., Isham, V.S., Wheater, H.S., 2005. Spatial-temporal rainfall simulation using generalized linear models. Water Resources Research 41, W11415, doi:10.1029/2004WR003739.