CHAPTER III METHODOLOGY 3.1 Introduction Disaggregation is stochastic generation that starts with the previously generated aggregate series and subdivides or disaggregates that series into a finer scale time series. Disaggregation modelling has proven to be a very practical approach, especially for multi site and multivariate analysis. Disaggregation modelling has major advantages over other modelling techniques. It is easily understood, easily applied and very flexible. There are opportunities to use long term memory models and to include parameter uncertainty. The stochastic simulation of seasonal rainfall series consisting of two approaches: a. Generate seasonal rainfall directly using models whose parameters (at least means and variances) change from season to season. b. Generate annual rainfall first and then disaggregation these into seasonal rainfall. This study investigates the performance of disaggregation models and uses this model to generate annual rainfall first and then disaggregation into seasonal rainfall. 24 Disaggregation procedures are to generate annual rainfall with appropriate model and then divide those flows among the modelled period within each year. In general, disaggregation procedures can generate reasonable monthly rainfall data which are consistent with a desired annual rainfall model. Calculation for the disaggregation models employed the software “Stochastic Analysis Modelling and Simulations” (SAMS) 2000 that were developed by Salas et al.(1996). The method include the Valencia-Schaake, Mejia-Rousselle and Lane also other methods. This study will covered three major disaggregation methods such as basic model (Valencia and Shake, 1973); extended model (Mejia and Rousselle, 1973); and the condensed model (Lane, 1979; Grygier and Stedinger, 1991). The development of the rainfall simulation models comprises three processes: 1) statistical analysis of data, 2) fitting a stochastic model and 3) generating synthetic series. For the evaluation, generally it is evaluation the properties of the process such as means and standard deviation comparing the statistical properties (statistics) of the process being modelled. 3.2 Data and Site Description The data that use in this study are actual data from observation that measured in millimetre (mm). This data were taken from the JPS Hydrology and Water Resources Division, one of division in Department of Irrigation and Drainage of Malaysia (DID). The data modelled were the data rainfall for 52 years from 1949 until 2000. The utility of this approach is demonstrated through the application of average monthly rainfall from 5 sites in Johor. 25 The sites (station) are: 1. Ladang Sungai Pelentong, Johor Bahru 2. Ladang Kulai Young, Kulai 3. Ladang Getah Malaya, Kota Tinggi 4. Pintu Kawalan Semberong, Johor 5. Ladang Simpang Renggam, Simpang Renggam The locations of sites can refer in web page DID Malaysia at http://Agrolink.moa.my/did and go to Main program >Hydrology and Water Resources> Hydrology Resources>National Hydrological Networks>Negeri>Rainfall Station. These locations are chosen because they have less missing data than other locations. 3.3 Disaggregation Models Salas et al. (1980) classified the disaggregation models into temporal and spatial disaggregation models. A temporal example is the disaggregation of an annual time series into a seasonal time series. The temporal disaggregation also include annual to semiannual, semiannual to biweekly, annual to monthly and annual to biweekly. The classic use of spatial disaggregation is the disaggregation of total natural flow of a river basin into individual tributary flows. In the present study the application is for the disaggregation of annual to monthly series in case of a single site. Three forms of a single site temporal disaggregation are presented here: the basic model (Valencia and Schaake, 1973); extended model (Meija and Rousselle, 1976); and the condensed model (Lane, 1979; Grygier and Stedinger, 1991). 26 3.3.1 Valencia-Schaake In the present study, the basic model of Valencia and Schaake (1973) disaggregation model is applied. For the case of a single site the equation is: Yt = AQt + Bεt (3.1) where; Qt is a 1 x 1 matrix of the annual rainfall value of year t; Yt is a vector of m seasonal rainfall values which sum to Qt; Yti = (yt,1, yt,2,…yt,m) m is taken as 12 month; εt is the an m x 1 matrix of independent standard normal deviates; A and B are parameter matrices with dimensions of m x 1 and m x m respectively. The disaggregation model is expected to satisfy additivity, e.g. the seasonal rainfalls in a year add up to the annual rainfalls. The model parameter matrices A and B can be estimated by using the method of moments as Valencia and Schaake (1973): A = M0(YQ)M0-1(Q) (3.2) BBT = M0(Y) – M0(YQ)M0-1(Q)M0(QY) (3.3) Equation (3.2) and (3.3) can be used to obtain estimates of A and B by substituting the population moments M0 (Q), M0 (Y), M0 (QY) and M0 (YQ) by their corresponding sample estimates. 27 3.3.2 Meija-Rousselle The extended model, developed by Meija and Rousselle (1976), is an extension of the basic temporal model. An additional term is included in the model to preserve the seasonal covariances between seasons of the current year and the seasons of the past year. This disaggregation model is used in the simulation of the seasonal rainfall and the equation is: Yt = AQt + Bεt + CYt-1 (3.4) where; Yt , Qt , εt , A and B are the same way as for Valencia-Schaake model; C is an additional (m x m) parameter matrix. The model parameter matrices A, B and C can be estimated by using the method of moment as: A = {[M0(YQ) – M1(Y)M0-1(Y)M0T(QY)] [M0(QQ) – M1(QY)M0-1(Y)M1T(QY)]-1} (3.5) C = [M1(Y) – AM1(QY)]M0-1(Y) (3.6) BBT = M0(Y) – AM0(QY) – CM1T(Y) (3.7) Equation (3.5) trough (3.7) can be used to obtain estimates of A, B and C by substituting the population moments M0 (Q), M0 (Y), M0 (QY), M0 (YQ),M1(Q), M1(Y), M1(QY) and M1(YQ) by their corresponding sample estimates. 3.3.3 Lane Lane (1979) has developed the condensed model. His approach essentially set to zero several parameters of the Meija and Rousselle (1976) model which is not important. The number of parameters and moments are reduced drastically. The present study 28 employs the Lane model for the seasonal rainfall simulation. The equation of Lane model for a single site may be written as, Yν,τ = AτQν + Bτεν,τ + CτYν,τ-1 (3.8) where; Yν,τ is the seasonal rainfall vector; Qν,τ is the annual rainfall vector; εν,τ is the vector of normally distributed noise terms with mean zero and the identity matrix as its variance-covariance matrix. The noises εν,τ are independent in time and space; ν donates the number of year; τ is for season (month). Salas et al. (1996) estimated the model parameters matrices A, B and C based on Lane and Frevert (1991). The equation are given as Aτ = {[M0,τ(YQ) – M1,τ(Y)M-10,τ-1 (Y)MT1,τ(QY)] [M0(QQ) – M1,τ(QY) M-10,τ-1 (Y)MT1,τ(QY)]-1} (3.9) Cτ = [M1,τ(Y) – AτM1,τ(QY)] M-10,τ-1 (Y) (3.10) BτBτT = M0,τ(Y) – AτM0,τ(QY) – CτM1,τT(Y) (3.11) in which, the term M denotes the population moment. The method of moment parameter matrices can estimate by substituting the population moments by their corresponding sample estimates and solving equation (3.9), (3.10) and (3.11). 3.4 Development of the Rainfall Simulation Models The development of the rainfall simulation models comprises three processes: a. statistical analysis of data b. fitting a stochastic model c. generating synthetic series 29 Analysis is carried out for the rainfall series into a system of single site. This historical input data is prepared in a text file for use in the analysis of stochastic modelling. The development of the disaggregation models is in accordance to the available standard procedure. The guidelines for modelling the Valencia-Schaake, Meija and Rousselle and Lane are illustrated by Salas et al. (1996) in the SAMS manual. The data analysis is including the data plotting, checking the normality of the data, data transformation and data statistical characteristics. Initially, it is required to check for the basic statistical characteristics and the normality of the rainfall data. A time series process can be characterized by a number of statistical properties such as the mean, standard deviation, coefficient of variation and skewness coefficient. Probability plots are included for verifying the normality of the data. The data can be transformed to normal by using different transformation techniques. The normality test for the data is done by plotting the data on normal probability paper so it can be checked whether the data is normal or not. Salas et al. (1996) applied the skewness test of normality. For fitting process, the stochastic models shall be selected for testing with the rainfalls data series. The parameter estimation of the stochastic model requires the fitting process. After estimation the model parameters, the fitted model should be tested to ensure that the model is suitable for the data. In general this can be done by testing of the residual and comparing the model and the historical properties. Testing of the residuals is an important process to identify whether the fitted model is appropriate or not. The basic assumptions about the residuals are that they are normal and independent. For testing the normality of residuals, the skewness test is available. The independence of the residuals may be checked the performing the Portmentau test and the test of Akaike Information Criterion (AIC). The Portmentau test measure white noise that shall be significantly zero. The AIC measures the total “worth” of the parameters in fitting the model and corresponds to an expected maximum likelihood criterion, where the expectation is taken over the distribution of the estimated 30 parameters. The lower value of the AIC is the better for residuals. Once the model has been fitted to the data and the parameters are estimated, the theoretical covariance structure can be calculated. Comparing the model and historical covariance correlation structure is another method by which one can check if a certain model is appropriate or not. Once the model has been defined and the parameters have been estimated, the synthetic samples of the rainfall can be generated. The comparison is made for the important statistical characteristics of the historical and the generated data. Such comparison is important to checking whether the model used in the generation is adequate or not. All the candidate models employ the available statistical software to generate the synthetic samples. The resulting generated synthetic rainfall series should be retransformed and adjusted after the analysis performed for the transformed data. Adjustment techniques are applicable for the disaggregation models to eliminate discrepancies between the historical and generated aggregate annual rainfall. It is found that the method of the adjustment proposed by Grygier and Stidinger (1991) is different from Salas et al. (1996). Basically, the stochastic models are proposed for the case of the transformed and untransformed rainfalls. For the case of the untransformed, the annual and monthly inflow series are assumed normally distributed. Therefore retransformation is not required for the model of untransformed rainfall series. The results of statistical analysis of the generated data for each model considered will be evaluated related to its performance. Figure 3.1 showed the schematic diagram for modelling hydrological times series. However, in this study disaggregation was choose for modelling approach but the sequence or flow chart of the analysis is in the same way. The processes that use in this study are in bold words. 31 3.5 Evaluation the Performance of Simulation Models Evaluation the properties of the process generally means comparing the statistical properties (statistics) of the process being modelled. In general, one would like the model to be capable of reproducing the necessary statistics that affect the variability of the data. Furthermore, the model should be capable of reproducing certain statistics that are related to the intended use of the model. The present study investigates the performance of stochastic rainfall models for use in the rainfall-runoff modelling. In this case the seasonal properties should be tested. Further, the reliable models are chosen and will be simulated for the properties of rainfall-runoff model. Generally, the statistic properties that is important these studies are the seasonal mean, standard deviation, variance, skewness and correlation. 32 Yes Trends Model the Trends No Trend Removal Normal Distribution No No Transform the Series Yes Yes Use Model with Skewed Noise Yes No Use Non-Normal Models Yes Use Normal Models Periodicity in the Parameter No Yes Model Periodic Parameter Yes Use Fourier Series Analysis No Use Sample Statistics Select Modeling Approach Direct Modeling Disaggregation Aggregation Estimation of Parameters Test of Goodness of Fit Figure 3.1: Schematic diagram for modelling hydrological times series 33 3.6 Definition of Statistical Characteristics A time series processes can be characterized a by a number of statistical properties such as the men, standard deviation. skewness and season-to-season correlation. These statistics are defined for both annual and seasonal data. 3.6.1 Annual Data The mean, y and the standard deviation, s of the time series yt , are estimated by 1 N y y t N t 1 1 N s y N t 1 t y (3.12) 2 (3.13) respectively, where N is the sample size. 3.6.2 Seasonal Data Seasonal hydrologic time series, such as monthly flows are better characterized by seasonal statistics. Let yν,τ be the seasonal time series, where ν represent years and τ seasons; ν = 1….. N with N =number of years and 1 ,…., =number of seasons. The mean and standard deviation for season can be estimated by yt and 1 N N y 1 v, (3.14) 34 1 N ( yν , y t )2 N 1 st (3.15) respectively. The seasonal coefficient of variation is cντ= sτ/yτ. Similarly, the seasonal skewness coefficient is estimated by gt 1 N y y N 3 v, 1 t (3.16) s3 The sample lag-k season-to-season correlation coefficient may be estimated by rk , m mk , mo , k o , (3.17) 1 2 where mk ,t 1 N y y y y N v, v 1 k v, (3.18) in which m0,τ represent the sample variances for season . Likewise, for multi site series, the lag-k sample cross-correlation between site i and site j, for season τ, rijk,τ may be estimated by rkij, mkij, m (3.19) 1 ij 2 k ,m ij o , and mkij, 1 N y y N 1 i , i y j , k j y k (3.20) in which m ij 0 , represents the sample variance for season τ and i. Note that in equation (3.17) through (3.20) when k 1, the terms, ν=1, yv,τ-k, y v,τ-k ,m0, τ-k, y(j) v,τ-k, y (j)τ-k and mjj0,τ-k are replaced by = 2, yv-1ω+τ-k, y v,ω+τ-k, m0,ω+ τ-k, y(j) v-1,ω+τ-k, y (j)ω+τ-k and mjj0,ω+ τ-k respectively.