Prediction of Rainfall in Saudi Arabia A THESIS SUBMITTED TO THE GRADUATE EDUCATIONAL POLCIES COUNCIL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS For the degree MASTER OF SCIENCE By Hadi Obaid M Alshammari Adviser Dr. Rahmatullah Imon Ball State University Muncie, Indiana May 2015 ACKNOWLEDGEMENTS I would like to gratefully and sincerely thank my supervisor Professor Dr. Rahmatullah Imon for the patient guidance, encouragement and advice he has provided throughout my time as his student. I have been extremely lucky to have a supervisor who cared so much about my work, and who responded to my questions and queries so promptly. His mentorship was paramount in providing a well-rounded experience consistent my long-term career goals. For everything you’ve done for me Dr. Imon, thank you. I would like also to thank the rest of my thesis committee: Dr. Rich Stankewitz and Dr. Yayuan Xiao for their encouragement, insightful comments and patience. Finally, I would like to thank my family: my mother, my brothers and sisters, for supporting me throughout my life. Hadi Obaid M Alshammari March, 30, 2015 Table of Contents 4 CHAPTER 1 INTRODUCTION 1.1 Saudi Arabia Climate Data 1.2 Outline of the Study 4 5 10 12 CHAPTER 2 ESTIMATION OF MISSING VALUES IN SAUDI ARABIA CLIMATE DATA 2.1 Imputation Methods 12 2.2 Expectation Maximization Algorithm 14 2.3 Estimation with Trend Models 15 2.4 Estimation with Smoothing Techniques 17 2.5 Robust Methods 18 2.6 Nonparametric Bootstrap 21 2.7 Estimation of Missing Values for Saudi Arabia Climate Data 24 2.8 Trend of Climate Data for Saudi Arabia 32 CHAPTER 3 MODELING AND FITTING OF DATA USING REGRESSION AND MEDIATION METHODS 34 34 3.1 Classical Regression Analysis 34 3.2 Mediation 42 3.3 Accuracy Measures 43 3.4 A Comparison of Regression and Mediation Fits for Gizan 45 48 CHAPTER 4 FORECASTNG WITH ARIMA MODELS 48 4.1 The Box-Jenkins Methodology 48 4.2 Stationary and Nonstationary Time Series 49 4.3 Test for Significance of a White Noise Autocorrelation Function 52 4.4 ARIMA Models 54 4.5 Estimation and Specification of ARIMA Models 60 4.6 Diagnostic Checking 62 4.7 Computing a Forecast 63 4.8 Fitting of the ARIMA Model to Saudi Arabia Rainfall Data 69 CHAPTER 5 79 EVALUATION OF FORECASTS BY REGRESSION, MEDIATION AND ARIMA MODELS 77 5.1 Cross Validation in Regression and Time Series Models 79 5.2 Evaluation of Forecasts for Rainfall Data 81 CHAPTER 6 CONCLUSIONS AND AREAS OF FUTURE RESEARCH 86 86 6.1 Conclusions 86 6.2 Areas of Future Research 87 REFERENCES 88 APPENDIX A 91 Saudi Arabia Climate Data List of Tables Chapter 2 Table 2.1: Trend of Complete Climate Data for Saudi Arabia 32 Chapter 3 Table 3.1: Actual and Predicted Fits of Rainfall for Gizan 45 Table 3.2: Accuracy Measures for Regression and Mediation Fits of Rainfall for Gizan 47 Chapter 4 Table 4.1: Specification of ARIMA Models 62 Table 4.2: ACF and PACF Values of Rainfall Data for Gizan 70 Chapter 5 Table 5.1: Rainfall Forecast for Gizan 82 Table 5.2: MSPE of Regression, Mediation and ARIMA Forecasts of Rainfall for Gizan 83 Table 5.3: Predicted Trend of Rainfall in Saudi Arabia 84 List of Figures Chapter 1 Figure 1.1 Major Cities of Saudi Arabia 5 Figure 1.2: Time Series Plot of Total Number of Rainy Days in Gizan 7 Figure 1.3: Time Series Plot of Yearly Temperature of Gizan 7 Figure 1.4: Time Series Plot of Maximum Temperature in Gizan 7 Figure 1.5: Time Series Plot of Minimum Temperature in Gizan 8 Figure 1.6: Time Series Plot of Wind Speed in Gizan 8 Figure 1.7: Time Series Plot of Rainfall in Gizan 8 Figure 1.8: Individual Value Plot of Rainy Days vs Rainfall in Gizan 9 Chapter 2 Figure 2.1: Time Series Plot of Total Number of Rainy Days in Gizan with Complete Data 26 Figure 2.2: Time Series Plot of Yearly Temperature of Gizan with Complete Data 27 Figure 2.3: Time Series Plot of Maximum Temperature in Gizan with Complete Data 28 Figure 2.4: Time Series Plot of Minimum Temperature in Gizan with Complete Data 29 Figure 2.5: Time Series Plot of Wind Speed in Gizan with Complete Data 30 Figure 2.6: Time Series Plot of Rainfall in Gizan with Complete Data 31 1 Chapter 3 Figure 3.1: Normal Probability Plot of Rainfall vs Rainy Days in Gizan 40 Figure 3.2: Normal Probability Plot of Rainfall vs Different Climate Variables in Gizan 41 Figure 3.3: Time Series Plot of Original and Predicted Rainfall for Gizan 41 Figure 3.4: Regression vs Mediation Analysis 43 Figure 3.5: Time Series Plot of Actual and Predicted Fits of Rainfall for Gizan 46 Chapter 4 Figure 4.1: ACF and PACF of Rainfall Data for Gizan 70 Figure 4.2: ACF and PACF of Rainfall Data for Hail 70 Figure 4.3: ACF and PACF of Rainfall Data for Abha 71 Figure 4.4: ACF and PACF of Rainfall Data for Al Ahsa 71 Figure 4.5: ACF and PACF of Rainfall Data for Al Baha 71 Figure 4.6: ACF and PACF of Rainfall Data for Arar 72 Figure 4.7: ACF and PACF of Rainfall Data for Buriedah 72 Figure 4.8: ACF and PACF of Rainfall Data for Dahran 72 Figure 4.9: ACF and PACF of Rainfall Data for Jeddah 73 Figure 4.10: ACF and PACF of Rainfall Data for Khamis Mashit 73 Figure 4.11: ACF and PACF of Rainfall Data for Madinah 73 Figure 4.12: ACF and PACF of Rainfall Data for Mecca 74 Figure 4.13: ACF and PACF of Rainfall Data for Quriat 74 2 Figure 4.14: ACF and PACF of Rainfall Data for Rafha 74 Figure 4.15: ACF and PACF of Rainfall Data for Riyadh 75 Figure 4.16: ACF and PACF of Rainfall Data for Sakaka 75 Figure 4.17: ACF and PACF of Rainfall Data for Sharurah 75 Figure 4.18: ACF and PACF of Rainfall Data for Tabuk 76 Figure 4.19: ACF and PACF of Rainfall Data for Taif 76 Figure 4.20: ACF and PACF of Rainfall Data for Turaif 76 Figure 4.21: ACF and PACF of Rainfall Data for Unayzah 77 Figure 4.22: ACF and PACF of Rainfall Data for Wejh 77 Figure 4.23: ACF and PACF of Rainfall Data for Yanbu 77 Figure 4.24: ACF and PACF of Rainfall Data for Bishah 78 Figure 4.25: ACF and PACF of Rainfall Data for Najran 78 Chapter 5 Figure 5.1: Time Series Plot of Rainfall Forecast for Gizan 83 3 CHAPTER 1 INTRODUCTION Prediction of rainfall is still a huge challenge to the climatologists. It is the most important component of a climate system. Most of the burning issues of our time like global warming, floods, draught, heat waves, soil erosion and many other climatic issues are directly related with rainfall. Every country should have a better understanding and prediction of its weather to avoid environmental issues in the future. In Saudi Arabia, we have little rain and snow comparing with other western countries. However, Saudi Arabia has a lot of damages if the weather becomes rainy or snowy. Since we do not have enough studies regarding environmental issues in Saudi Arabia, we usually face problems with that. As a desert country, people think that we do not need to deal seriously with weather damages, but the reality is different. Our government spends a lot of money to fix the damages which are caused by environmental change. In 2011, Jeddah Municipality made a session to discuss the causes of environmental issues. “Nowadays, the environmental change is major problem in Saudi Arabia, and we need to forecast the damages before it happened”. (Eng. Alzahrani 2011). In this session, all speakers agree that we should have a lot of studies about our future weather to be able to avoid the problems and damages of it. In that year (2011), Prince Khaled Al-Feisal employed more than 9 companies to deal with environmental issues just in Jeddah. (Okaz newspaper, 2011). Saudi Arabia has been suffering with lots of kinds of environmental issues. Since the population is increasing, the need for solving these issues becomes very significant. We are interested to study the weather of Saudi Arabia and its environmental issues. In this study, we will forecast the future rainfall of Saudi Arabia, and try to know the reasons that cause rainfall. We will employ appropriate statistical 4 models and methods to predict rainfall correctly and also to identify the variables responsible for rain. 1.1 Saudi Arabia Climate Data In our study, we would like to consider a variety of climate data that might be useful to predict the rainfall of Saudi Arabia. A political map of Saudi Arabia is presented in Figure 1.1 Figure 1.1 Major Cities of Saudi Arabia 5 Saudi Arabia is a large country and its climate patterns in different regions are different. For this reason instead of considering data for the entire Saudi Arabia we consider data from 26 major cities of this country which are: Gizan, Hail, Madinah, Makkah (Mecca), Najran, Rafha, Riyadh, Sharurah, Tabuk, Taif, Turaif, Wejh, Yanbo, Abha, Al Baha, Sakaka, Guriat, Arar, Buraydah, Alqasim (Unayzah), Dahran, Al Ahsa, Khamis, Mushait, Jeddah and Bishah. Initially we have considered eleven climate variables of Saudi Arabia from year 1986 to 2014. Data are taken from the Saudi Arabia Presidency of Meteorology and Environment which is available in http://ww2.pme.gov.sa/ which are also presented in Appendix A. The variables we observe here are Data T Scale of measurement - Average annual daily temperature Celsius (°C) TM - Annual average daily maximum temperature Celsius (°C) Tm – Annual Average daily minimum temperature Celsius (°C) PP - Rain or snow precipitation total (annual) (mm) V - Annual daily average wind speed Km\h RA - Number of days with rain Days SN - Number of days with snow Days TS - Number of days with storm Days FG - Number of foggy days Days TN - Number of days with tornado Days GR - Number of days with hail Days 6 At first we construct time series plots of six of the most important variables for the city Gizan and they are shown in Figures 1.2 – 1.7. Time Series Plot of Rainy Days 35 30 Rainy Days 25 20 15 10 5 1986 1990 1994 1998 2002 2006 2010 2014 Year Figure 1.2: Time Series Plot of Total Number of Rainy Days (RA) in Gizan Time Series Plot of Temp 31.25 31.00 Temp 30.75 30.50 30.25 30.00 29.75 29.50 1986 1990 1994 1998 2002 2006 2010 2014 Year Figure 1.3: Time Series Plot of Yearly Temperature (T) of Gizan 7 Time Series Plot of Temp_Max 36.25 36.00 Temp_Max 35.75 35.50 35.25 35.00 34.75 34.50 1986 1990 1994 1998 2002 2006 2010 2014 Year Figure 1.4: Time Series Plot of Maximum Temperature (TM) in Gizan Time Series Plot of Temp_Min 27.0 Temp_Min 26.5 26.0 25.5 25.0 1986 1990 1994 1998 2002 2006 2010 2014 Year Figure 1.5: Time Series Plot of Minimum Temperature (Tm) in Gizan Time Series Plot of Wind Speed 14 Wind Speed 13 12 11 10 1986 1990 1994 1998 2002 2006 2010 2014 Year Figure 1.6: Time Series Plot of Wind Speed (V) in Gizan 8 Time Series Plot of Rainfall 350 300 Rainfall 250 200 150 100 50 0 1986 1990 1994 1998 2002 2006 2010 2014 Year Figure 1.7: Time Series Plot of Rainfall (PP) in Gizan The time series plots of the variables reveal several interesting features of time series. For each variable the information for the year 2005 is missing. This is not only the case with Gizan, for all other 25 cities we observe the same situation. We also observe that the yearly average temperature and minimum temperature trend up. The number of rainy days and the maximum temperature show slight decreasing trend. The average wind speed shows a decreasing pattern and then tends to increase. The total amount of rainfall shows slightly decreasing pattern although it is surprising to see that on many years there is zero amount of rainfall whereas there are quite a few rainy days on those years. Certainly it cannot be right. To investigate this issue a bit further we consider a one-way ANOVA test for the difference in the number of rainy days among the days when we have no rain and enough rain and the MINITAB output and boxplot and individual value plot are given below. Here no reported rainfall days are categorized by 0 and the reported rainfall days are categorized by 1. 9 Individual Value Plot of Rainy Days vs Rainfall 35 30 Rainy Days 25 20 15 10 5 0 1 Rainfall Figure 1.8: Individual Value Plot of Rainy Days (RA) vs Rainfall (PP) in Gizan One-way ANOVA: Rainy Days versus Rainfall Source Rain Error Total DF 1 27 28 S = 5.636 Level 0 1 N 10 19 SS 36.9 857.6 894.6 MS 36.9 31.8 R-Sq = 4.13% Mean 13.100 15.474 StDev 4.433 6.150 F 1.16 P 0.291 R-Sq(adj) = 0.58% Individual 95% CIs For Mean Based on Pooled StDev --+---------+---------+---------+------(-------------*--------------) (----------*----------) --+---------+---------+---------+------10.0 12.5 15.0 17.5 The above MINITAB output for one way analysis of variance (ANOVA) between reported no rainfall days and rainfall days show that the difference of total amount of rainfall on those two types of days are not statistically significant. This table contains sum of squares (SS), degrees of freedom (DF), mean squares (MS), the F statistic and its associated p-value for rainfall over rainy days. The p-value for the test statistic F is 0.291 (much above the cut-off value 0.05) shows that the difference is not significant. The 95% confidence interval for no rainfall has a huge overlap with the 95% confidence interval of rainfall days and hence they are not separated at the 5% level of significance. But practically it is simply impossible to believe that the total number 10 of days when the total rainfall is zero is statistically insignificant with the number of day where there was high rainfall. We need to address this issue later on. 1.2 Outline of the Study Here is the outline of my thesis. In Chapter 2, we introduce different methodologies we use in our research that include the estimation of missing values that include a robust and bootstrap expectation minimization (EM) algorithm. We also study the trend of variables in this section. In Chapter 3 we employ regression and mediation methods for the prediction of rainfall in different parts of Saudi Arabia. Fitting and forecasting of data using ARIMA models are discussed in Chapter 4. In order to determine the most appropriate ARIMA model we employed both graphical methods based on the autocorrelation function (ACF) and the partial autocorrelation function and numerical tests such as the t-test and the Ljung-Box test based on the ACF and the PACF. We report a cross validation in Chapter 5 which is designed to investigate which of the regression, mediation and ARIMA models can generate better forecasts of rainfall in Saudi Arabia. We also report the future trend of rainfall in different regions of Saudi Arabia. 11 CHAPTER 2 ESTIMATION OF MISSING VALUES IN SAUDI ARABIA CLIMATE DATA We have observed in Section 1.1 that each climate variable of Saudi Arabia has a missing observation in it. Missing data are a part of almost all research and it has a negative influence on the analysis, such as information loss and, as a result, a loss of efficiency, loss of unbiasedness of estimated parameters and loss of power. An excellent review of different aspects of missing values is available in Little and Rubin (2002) and Alshammari (2015). In this section we introduce a few commonly used missing values estimation techniques which are popular with statisticians. 2.1 Imputation Methods In this section we will discuss a couple of imputation techniques for estimating missing values. 2.1.1 Mean Imputation Technique Among the different methods for solving the missing value, Imputation methods (Little and Rubin (2002)) is one of the most widely used techniques to solve incomplete data problems. Therefore, this study stresses several imputation methods to determine the best methods to replace missing data. Let us consider n observations x1 , x2 ,..., xn together with m missing values denoted by x1* , x2* ,..., xm* . Thus the total number of observed data with missing values is n + m given as 12 x1 , x2 ,..., xn1 , x1* , xn1 1 , xn1 2 ,..., xn2 , x2* , xn2 1 , xn2 2 ,..., xm* ,......., xn . (2.1) Therefore, the first missing value occurs after n1 observations, the second missing value occurs after n 2 total observations, and so on. Note that there might be more than one consecutive missing observation. Mean-before Technique The mean-before technique is one of the most popular imputation techniques in handling missing data. This technique consists of substituting all missing values with the mean of observed data since the last missing data point. Thus for the data in (2.1), x1* will be replaced by x1 1 n1 xi n1 i 1 (2.2) and x 2* will be replaced by x2 n2 1 xi (n2 n1 1) i n1 1 (2.3) and so on. Mean-before-after Technique The mean-before-after technique substitutes all missing values with the mean of one observed datum before the missing value and one observed datum after the missing value. Thus for the data in (2.1), x1* will be replaced by x1 x n1 x n1 1 (2.4) 2 and x 2* will be replaced by 13 x2 x n2 x n2 1 (2.5) 2 and so on. 2.1.2 Median Imputation Technique Since mean is highly sensitive to extreme observations and/or outliers, median imputation is becoming more popular now-a-days [Mamun (2014)]. Instead of mean the missing value for a data set given in (2.1), the missing value is estimated by the median. For example, for the median before technique, x1* in (2.2) and x 2* in (2.3) will be replaced by their respective sample medians and so on. 2.2 Expectation Maximization Algorithm Let y be an incomplete (containing few missing observations) data vector, e.g., as in (2.1), whose density function is f ( y; ) where is a p-dimensional parameter. If y were complete, the maximum likelihood of would be based on the distribution of y. The log-likelihood function of y, log L( y; ) l ( y; ) log f ( y; ) where l = Log L is required to be maximized. As y is incomplete we may denote it as ( yobs , y mis ) where y obs is the observed data and y mis is the unobserved missing data. Let us assume that the missing data is missing by random, then for some functions f 1 and f 2 : f ( y, ) f ( yobs , y mis ; ) f1 ( yobs ; ) f 2 ( y mis | yobs ; ) . (2.6) Considering the log-likelihood function, we see lobs ( ; yobs ) l ( ; y) log f 2 ( y mis | yobs ; ). (2.7) The EM algorithm is focused on maximizing l ( ; y) in each iteration by replacing it by its 14 conditional expectation given the observed data y obs . The EM algorithm has an E-step (estimation step) followed by an M-step (maximization step) as follows: E-step: Compute Q( ; (t ) ) where Q( ; (t ) ) E ( t ) [l ( ; y) | yobs ]. (2.8) for the t-th iteration and E stands for expectation. M-step: Find (t 1) such that Q( (t 1) ; (t ) ) ≥ Q( ; (t ) ) . (2.9) The E-step and M-step are repeated alternately until the difference L( (t 1) ) L( (t ) ) is less than , where is a small quantity. If the convergence attribute of the likelihood function of the complete data, that is L( ; y) , is attainable, then convergence of EM algorithm also attainable. The rate of convergence depends on the number of missing observations. Dempster, Laird, and Rubin (1977) show that convergence is linear with rate proportional to the fraction of information about in l ( ; y) that is observed. 2.3 Estimation with Trend Models The missing value estimation techniques discussed above are designed for independent observations. But in regression or in time series we assume a model and that should have a consideration when we try to estimate missing values. In time series things are even more challenging as the observations are dependent. Here we consider several commonly used missing value technique useful for time series. 15 We begin with simple models that can be used to forecast a time series on the basis of its past behavior. Most of the series we encounter are not continuous in time, instead, they consist of discrete observations made at regular intervals of time. We denote the values of a time series by { y t }, t = 1, 2, …, T. Our objective is to model the series y t and use that model to forecast y t beyond the last observation yT . We denote the forecast l periods ahead by yˆ T l . We sometimes can describe a time series y t by using a trend model defined as yt TR t t (2.10) where TR t is the trend in time period t. 2.3.1. Linear Trend Model: TR t 0 1t . (2.11) for constants 0 and 1 . We can predict y t by yˆ t ˆ0 ˆ1t . (2.12) 2.3.2. Polynomial Trend Model of Order p TR t 0 1t 2 t 2 ... p t p (2.13) for constant coefficients. We can predict y t by ŷt ˆ0 ˆ1t ˆ2t 2 ... ˆ pt p . (2.14) 2.3.3. Exponential Trend Model: The exponential trend model, which is often referred to a growth curve model, is defined as 16 y t 0 1 t . t (2.15) If 0 > 0 and 1 > 0, applying a logarithmic transformation to the above model yields y t 0 1 t t * * * * (2.16) where yt ln yt , 0 ln 0 , 1* ln 1 and t ln t . * * * 2.4 Estimation with Smoothing Techniques Several decomposition and smoothing techniques are available in the literature for estimating the missing value of time series data. 2.4.1. Moving Average Moving average is a technique of updating averages by dropping the earliest observation and adding the latest observation of a series. Let us suppose we have T = nL observations, where L is the number of seasons, then the first moving average computed for that season is obtained as y1 y1 y 2 ... y L / L . (2.17) The second moving average is obtained as y2 y2 ... y L y L1 / L y1 y L1 y1 / L . (2.18) In general, the m-th moving average is obtained as y m y m1 y m L1 y m1 / L . (2.19) 2.4.2. Centered Moving Average Instead of the moving average we often use the centered moving average (CMA) which is the average of two successive moving average values. 17 Some other useful smoothing techniques that can be used for estimating the missing values in time series are exponentially weighted moving average (EWMA) and the Holt-Winter model. We can use these methods to estimate the missing values by taking moving average of 5 to 10 neighboring data points around the missing observation. 2.5 Robust Methods Robust procedures are nearly as efficient as the classical procedure when classical assumptions hold strictly but are considerably more efficient overall when there is a small departure from them. The main application of robust techniques in a time series problem is to try to devise estimators that are not strongly affected by outliers or departures from the assumed model. A large body of literature is now available [Rousseuw and Leroy (1987), Maronna, Martin, and Yohai (2006), Hadi, Imon and Werner (2009)] for robust techniques that are readily applicable in linear regression or in time series. 2.5.1. Least Median of Squares Rousseeuw (1984) proposed the Least Median of Squares (LMS) method which is a fitting technique less sensitive to outliers than the OLS. In the OLS, we estimate parameters by minimizing the sum of n squared residuals u t 1 2 t which is obviously the same if we minimize the mean of squared residuals 1 n 2 1 n 2 . Sample means are sensitive to outliers, but medians are not. Hence to make u t ut less n t 1 n t 1 sensitive we can replace the mean by a median to obtain median sum of squared residuals 2 MSR ( ˆ ) = Median { uˆ t }. 18 (2.20) Then the LMS estimate of is the value that minimizes MSR ( ˆ ). Rousseeuw and Leroy (1987) have shown that LMS estimates are very robust with respect to outliers. 2.5.2. Least Trimmed Squares The least trimmed (sum of) squares (LTS) estimator is proposed by Rousseeuw (1984). Here we arbitrarily trim a certain amount of extreme observations from both tails of the data. Let us assume that we have trimmed 100 % observations and h is the remaining number of observations after trimming. In this method we try to estimate in such a way that h 2 LTS ( ˆ ) = Minimize uˆ t , (2.21) t 1 where ût is the t-th ordered residual. For a trimming percentage of , Rousseeuw and Leroy (1987) suggested choosing the number of observations h based on which the model is fitted as h = [n (1 – )] + 1. The advantage of using LTS over LMS is that, in the LMS we always fit the regression line based on roughly 50% of the data, but in the LTS we can control the level of trimming. When we suspect that the data contains nearly 10% outliers, the LTS with 10% trimming will certainly produce better result than the LMS. We can increase the level of trimming if we suspect there are more outliers in the data. 2.5.3. M – estimator Huber (1973) generalized the estimation of parameters by considering a class of estimators, which chooses to n n Minimize i Minimize yi xi i 1 i 1 T (2.22) where is a symmetric function less sensitive to outliers than squares. An estimator of this type is called an M – estimator, where M stands for maximum likelihood. It is easy to see from (2.22) that the function is related to the likelihood function for an appropriate choice of an error distribution. For 19 example if the error distribution is normal, then z z 2 / 2 , –∞ < z < ∞, which also yields the OLS estimator. The M – estimator obtained from (2.22) is not scale invariant. To obtain a scale invariant version of this estimator we solve n y i xi T i . Minimize Minimize i 1 i 1 n (2.23) In most of the practical applications, the value of is unknown and it is usually estimated before solving equation (2.22). A popular choice of is ~ = MAD (normalized) where MAD stands for the median absolute deviation. To minimize (2.22), we have to equate the first partial derivatives of w.r.t. for j = 0, 1, …, p to zero, yielding a necessary condition for a minimum. This gives a system of k = p + 1 equations y i xi T = 0, xij ~ i 1 n j = 0, 1, …, p (2.24) where . In general the function is nonlinear and (2.24) must be solved by iterative methods. 2.5.4. S – estimator Rouusseeuw and Yohai (1984) suggested another class of robust estimator based on the minimization of the dispersion of the residuals: ˆ 1 , ˆ 2 , ..., ˆ n s The above dispersion is defined as the solution of in a way that 1 n ˆ i / s K n i 1 (2.25) K is often put equal to E where is the standard normal. The function must satisfy the following conditions: is symmetric and continuously differentiable and (0) = 0. The estimator 20 thus obtained is called an S – estimator because it is derived from a scale statistic in an implicit way. In fact s given in the above estimating equation is an M – estimator of scale. 2.5.5. MM – estimator The MM – estimator was originally proposed by Yohai (1987). The objective was to produce a robust point estimator that maintained good efficiency. The MM – estimator has three stages: The initial estimate is an S – estimate, so it is fairly robust. The second stage computed an M – estimate of the error standard deviation using the residuals from the initial S – estimate. The last step is an M – estimate of the parameters using a hard redescending weight function to put a very small (often zero) weight to sufficiently large residuals. In an extensive performance evaluation of several robust regression estimators, Simpson and Montgomery (1998) report that MM –estimators have high efficiency and work well in most outlier scenarios. 2.6 Nonparametric Bootstrap Nonparametric methods have recently been very popular to statisticians because they do not require standard assumptions to hold where the reality is otherwise. Among the nonparametric techniques the bootstrap technique proposed by Efron (1979) has become extremely popular with statisticians. In this procedure one can create a huge number of sub-samples from a pre-observed data set by a simple random sampling with replacement. These sub-samples could be later used to investigate the nature of the population without having any assumption about the population itself. There are several forms of the bootstrap, and additionally, several other resampling methods that are related to it, such as jackknifing, cross-validation, randomization tests, and permutation 21 tests. Suppose that we draw a sample S= x1 , x2 ,..., xn from a population P= x1 , x2 ,..., x N imagine further at least for the time being, that N is very much larger than n, and that S is either a simple random or an independent random sample, from P. Now suppose that we are interested in some statistic T = t(S) as an estimate of the corresponding population parameter θ = t(P). Again, θ could be a vector of parameters and T the corresponding vector of estimates, but for simplicity assumes that θ is scalar. A traditional approach to statistical inference is to make assumptions about the structure of the population (e.g., an assumption of normality), and along with the stipulation of random sampling, to use these assumptions to derive the sampling distribution of T, on which classical inference is based. In certain instances, the exact distribution of T may be intractable and so we instead derive its asymptotic distribution. This familiar approach has two potentially important deficiencies: 1. If the assumptions about the population are wrong, then the corresponding sampling distribution of the statistic may be seriously inaccurate. On the other hand, if asymptotic results are relied upon these may not hold to the required level of accuracy in a relatively small sample. 2. The approach requires sufficient mathematical process to derive the sampling distribution of the statistic of interest. In some cases, such a derivation may be prohibitively difficult. In contrast, the nonparametric bootstrap allows us to estimate the sampling distribution of a statistic which is empirical in nature. That means this statistic is obtained from a data without making any assumptions about the form of the population of that data, without deriving the sampling distribution explicitly. The essential idea of the nonparametric bootstrap is as follows: We proceed to draw a sample of size n from among the elements of S, sampling with * *, , x12 ..., x1*n } . It is necessary to sample replacement. Call the resulting bootstrap sample S1* {x11 22 with replacement, because we would otherwise simply reproduce the sample S. In effect, we are treating the S as an estimate of the population P; that is, each element Xi of S is selected for the bootstrap sample with probability 1/n, mimicking the original selection of the sample S from the population P. We repeat this procedure a large number of times, B selecting many bootstrap * samples; the bth such bootstrap sample is denoted S b* {xb*1 , xb*2 ,..., xbn } .The basic hypothesis is this: representative and sufficient resample from the original sample would contain information about the original population as the original sample does represent the population. In the real world, an unknown distribution F has given the observed data S= x1 , x2 ,..., xn by random sampling. We calculate a statistic of interest T = t(S). In the bootstrap world, the * empirical distribution F̂ gives bootstrap samples Sb* {xb*1 , xb*2 ,..., xbn } by random sampling from which we calculate bootstrap replications of the statistic of interest, Tb t ( Sb ) . The big * * advantage of the bootstrap world is that we can calculate as many replications of Tb* as we want or at least as many as we can afford. Next, we compute the statistic T for each of the bootstrap samples; that is Tb t ( S b ) . Then the distribution of Tb* around the original estimate T is * * analogous to the sampling distribution of the estimator T around the population parameter θ. For B example the average of the bootstrapped statistics, T * Eˆ * (T * ) average of the bootstrap statistics; then Bia sˆ* T * T is T b 1 B * b . Estimate of the an estimate of the bias of T. * Similarly, the estimated bootstrap variance of T, Vˆ * (T * ) (Tb T * ) 2 /( B 1) , that estimates the sampling variance of T. The random selection of bootstrap samples is not an essential aspect of the nonparametric bootstrap. At least in principle, we could enumerate all bootstrap samples 23 of size n. Then we could calculate E * (T * ) and V * (T * ) exactly, rather than having to estimate them. The number of bootstrap samples, however, is astronomically large unless n is tiny. There are, therefore two sources of error in bootstrap inference: (a) the error induced by using a particular sample S to represent the population; (b) the sampling error produce by failing to estimate all bootstrap samples. Making the number of bootstrap replications B sufficiently large can control the latter sources of error. How large should we take B, the number of bootstrap replications uses to evaluate different estimates. The possible bootstrap replications is B = nn. To estimate the standard error we usually use the number of bootstrap replications to be between 25 to 250. But for other estimates such as confidence interval or a regression estimate, B is much bigger. We may increase B if T= t (S) a very complicated function of X to increasing variability. Bootstrap replications depend on the value of X, if n ≤ 100 we generally replicate B ≤ 10000. Here are two rules of thumb gathered from Efron and Tibshirani's experience Even a small number of bootstrap replications, say B = 25, is usually informative. B = 50 is often enough to give a good estimate of standard error. Very seldom are more than B = 200 replications needed for estimating a standard error For estimating a standard error, the number B will ordinarily be in the range 25-2000. Much bigger values of B are required for bootstrap confidence interval, cross-validation, randomization tests and permutation test. Some other suggestions are also available in the literature [see Efron (1987), Hall (1992), Booth and Sarker (1998)] about the number of replications needed in bootstrap. 24 2.7 Estimation of Missing Values for Saudi Arabia Climate Data In order to find the best method for estimating the missing values in time series such as the number of road accidents in Saudi Arabia, Alshammari (2015) considered the following methods: Mean Imputation Median Imputation Linear Trend Model Quadratic Trend Model Exponential Trend Model Centered Moving Average EM-OLS EM-LTS EM-LMS EM-M EM-MM BOOT-OLS BOOT-LTS BOOT-LMS BOOT-M BOOT-MM According to Alshammari (2015), the expectation minimization algorithm based on the robust MM estimator (EM-MM) method performs the best in estimating the missing values. We 25 followed his suggestion and estimate missing values for each of the variable for each city by the EM-MM method. Here we present a graphical display of estimated values in a time series plot for six important variables in Gizan. It is worth mentioning that we have looked at the graphical displays for all 26 cities as reported in Table 2.1 but they are not presented here because of brevity. Time Series Plot of Rainy Days 35 30 Rainy Days 25 20 15 10 5 1986 1990 1994 1998 2002 2006 2010 2014 Year Time Series Plot of Rainy Days 35 30 Rainy Days 25 20 15 10 5 1986 1990 1994 1998 2002 2006 2010 2014 Year Figure 2.1: Time Series Plot of Total Number of Rainy Days in Gizan with Complete Data 26 Time Series Plot of Temp 31.25 31.00 Temp 30.75 30.50 30.25 30.00 29.75 29.50 1986 1990 1994 1998 2002 2006 2010 2006 2010 2014 Year Time Series Plot of Temp 31.25 31.00 Temp 30.75 30.50 30.25 30.00 29.75 29.50 1986 1990 1994 1998 2002 2014 Year Figure 2.2: Time Series Plot of Yearly Temperature of Gizan with Complete Data 27 Time Series Plot of Temp_Max 36.25 36.00 Temp_Max 35.75 35.50 35.25 35.00 34.75 34.50 1986 1990 1994 1998 2002 2006 2010 2014 2010 2014 Year Time Series Plot of Temp_Max 36.25 36.00 Temp_Max 35.75 35.50 35.25 35.00 34.75 34.50 1986 1990 1994 1998 2002 2006 Year Figure 2.3: Time Series Plot of Maximum Temperature in Gizan with Complete Data 28 Time Series Plot of Temp_Min 27.0 Temp_Min 26.5 26.0 25.5 25.0 1986 1990 1994 1998 2002 2006 2010 2014 2010 2014 Year Time Series Plot of Temp_Min 27.0 Temp_Min 26.5 26.0 25.5 25.0 1986 1990 1994 1998 2002 2006 Year Figure 2.4: Time Series Plot of Minimum Temperature in Gizan with Complete Data 29 Time Series Plot of Wind Speed 14 Wind Speed 13 12 11 10 1986 1990 1994 1998 2002 2006 2010 2014 2010 2014 Year Time Series Plot of Wind Speed 14 Wind Speed 13 12 11 10 1986 1990 1994 1998 2002 2006 Year Figure 2.5: Time Series Plot of Wind Speed in Gizan with Complete Data 30 Time Series Plot of Rainfall 350 300 Rainfall 250 200 150 100 50 0 1986 1990 1994 1998 2002 2006 2010 2014 2006 2010 2014 Year Time Series Plot of Rainfall 350 300 Rainfall 250 200 150 100 50 0 1986 1990 1994 1998 2002 Year Figure 2.6: Time Series Plot of Rainfall in Gizan with Complete Data The above plots show that the EM-MM method yields pretty good estimates of the missing values since they are quite consistent with their previous and next observations. 31 2.8 Trend of Climate Data for Saudi Arabia We have just mentioned that we estimated missing values of climate data for the year 2005 for all 25 major cities in Saudi Arabia but for brevity we are not presenting all of them. The following table gives a summary of our findings for all 25 cities. This table presents the trends of six most important climate variables for studying rainfall such as rainy days, yearly average temperature, maximum temperature, minimum temperature, wind speed and total amount of rain fall after the estimation the estimation of missing values. Table 2.1: Trend of Complete Climate Data for Saudi Arabia City Rainy Days Temp (Y) Temp (Max) Temp (Min) Wind Speed Rainfall Gizan Decrease Increase Decrease Increase Decrease Decrease Hail Decrease Increase Increase Increase Increase Increase Madinah Decrease Increase Increase Increase Decrease Increase Makkah Increase Increase Increase Increase Decrease Decrease Najran Decrease Increase Increase Increase Decrease Decrease Rafha Decrease Increase Increase Increase Decrease Decrease Riyadh Decrease Increase Increase Increase Decrease Decrease Sharurah Decrease Increase Increase Increase Increase Decrease Tabuk Decrease Increase Increase Increase Increase Decrease Taif Decrease Increase Increase Increase Increase Increase Turaif Decrease Increase Increase Increase Increase Decrease Wejh Decrease Increase Increase Increase Decrease Increase Yanbo Decrease Increase Increase Increase Decrease Decrease 32 Abha Decrease Increase Decrease Decrease Al Baha Decrease Increase Increase Increase Decrease Decrease Sakaka Decrease Increase Increase Increase Decrease Decrease Guriat Decrease Decrease Decrease Arar Decrease Increase Increase Increase Increase Increase Buraydah Decrease Increase Increase Increase Decrease Decrease Alqasim Decrease Increase Increase Increase Decrease Decrease Dahran Decrease Increase Increase Increase Increase Decrease Al Ahsa Increase Increase Increase Increase Decrease Increase Khamis Decrease Increase Increase Increase Increase Decrease Jeddah Decrease Increase Increase Increase Decrease Increase Bishah Decrease Decrease Decrease Increase Increase Increase Decrease Decrease 33 Increase Increase Increase CHAPTER 3 MODELING AND FITTING OF DATA USING REGRESSION AND MEDIATION METHODS In this chapter at first we discuss different classical, robust and nonparametric methods commonly used in regression and later use them for the revenue and expenditure data of Saudi Arabia. 3.1 Classical Regression Analysis Regression is probably the most popular and commonly used statistical method in all branches of knowledge. It is a conceptually simple method for investigating functional relationships among variables. The user of regression analysis attempts to discern the relationship between a dependent (response) variable and one or more independent (explanatory/predictor/regressor) variables. Regression can be used to predict the value of a response variable from knowledge of the values of one or more explanatory variables. To describe this situation formally, we define a simple linear regression model Yi X i ui (3.1) where Y is a random variable, X is a fixed (nonstochastic) variable and u is a random error term whose value is based on an underlying probability distribution (usually normal). For every value X there exists a probability distribution of u and therefore a probability distribution of the Y’s. We can now fully specify the two-variable linear regression model as given in (3.1) by listing its important assumptions. 1. The relationship between Y and X is linear. 34 2. The X’s are nonstochastic variables whose values are fixed. 3. Each error ui has zero expected values: E( ui )= 0 4. The error term has constant variance for all observations, i.e., E( u i ) = 2 , i = 1, 2, …, n. 2 5. The random variables u i are statistically independent. Thus, E( u i u j ) = 0, for all i j. 6. Each error term is normally distributed. 3.1.1 Tests of Regression Coefficients, Analysis of Variance and Goodness of Fit We often like to establish that the explanatory variable X has a significant effect on Y, that the coefficient of X (which is ) is significant. In this situation the null hypothesis is constructed in way that makes its rejection possible. We begin with a null hypothesis, which usually states that a certain effect is not present, i.e., = 0. We estimate by ˆ and the standard error of ˆ denoted by s ˆ from the data and compute the statistic t= ˆ ~ t n2 . s ˆ (3.2) which means the statistic t follows a t distribution with n – 2 degrees of freedom. Residuals can provide a useful measure of the fit between the estimated regression line and the data. A good regression equation is one which helps explain a large proportion of the variance of Y. Large residuals imply a poor fit, while small residuals imply a good fit. The problem with using the residuals as a measure of goodness of fit is that its value depends on the units of the dependent 35 variable. For this we require a unit-free measure for the goodness of fit. Thus the total variation of Y (usually known as total sum of squares TSS) can be decomposed into two parts: the residual variation of Y (error sum of squares ESS) and the explained variation of Y (regression sum of squares RSS). To standardize, we divide both sides of the equation TSS = ESS + RSS by TSS to obtain 1= ESS RSS + . TSS TSS We define the R – squared ( R 2 ) of the regression equation as R2 = RSS ESS =1– . TSS TSS (3.3) Thus R 2 is the proportion of the total variation in Y explained by the regression of Y on X. It is easy to show that R 2 ranges in value between 0 and 1. But it is only a descriptive statistics. Roughly speaking, we associate a high value of R 2 (close to 1) with a good fit of the model by the regression line and associate a low value of R 2 (close to 0) with a poor fit. How large must R 2 be for the regression equation to be useful? That depends upon the area of application. If we could develop a regression equation to predict the stock market, we would be ecstatic if R 2 = 0.50. On the other hand, if we were predicting death in a road accident, we would want the prediction equation to have strong predictive ability, since the consequences of poor prediction could be quite serious. It is often useful to summarize the decomposition of the variation in Y in terms of an analysis of variance (ANOVA). In such a case the total explained and unexplained variations in Y are converted into variances by dividing by the appropriate degrees of freedom. This helps us to develop a formal procedure to test the goodness of fit by the regression line. Initially we set 36 the null hypothesis that the fit is not good. In other words, our hypothesis is that the overall regression is not significant in a sense that the explanatory variable is not able to explain the response variable in a satisfactory way. ANOVA Table for a Two Variable Regression Model Components Sum of Squares Degrees of freedom Mean SS F statistic Regression RSS 1 RSS/1 = RMS RMS/EMS~ F1,n 2 Error ESS n–2 ESS/(n –2) = EMS Total TSS n–1 Here we compute the mean sum of squares for both regression (RMS) and error (ESS) by dividing RSS and ESS by their respective degrees of freedom as shown in column 4 of the above ANOVA table. Finally we compute the ratio RMS / EMS which follows an F distribution with 1 and n – 2 degrees of freedom (which is also a square of a t distribution with n – 2 d. f.). If the calculated value of this ratio is greater than F1, n 2,0.05 , we reject the null hypothesis and conclude that the overall regression is significant at the 5% level of significance. 3.1.2. Regression Diagnostics and Tests for Normality Diagnostics are designed to find problems with the assumptions of any statistical procedure. In a diagnostic approach we estimate the parameters by the classical method (the OLS) and then see whether there is any violation of assumptions and/or irregularity in the results regarding the six standard assumptions mentioned at the beginning of this section. But among them the assumption of normality is the most important assumption. The normality assumption means the errors are distributed as normal. The simplest graphical display for checking normality in regression analysis is the normal probability plot. 37 This method is based in the fact that if the ordered residuals are plotted against their cumulative probabilities on normal probability paper, the resulting points should lie approximately on a straight line. An excellent review of different analytical tests for normality is available in Imon (2003). A test based on the correlation of true observations and the expectation of normalized order statistics is known as the Shapiro – Wilk test. A test based on an empirical distribution function is known as the Anderson – Darling test. It is often very useful to test whether a given data set approximates a normal distribution. This can be evaluated informally by checking to see whether the mean and the median are nearly equal, whether the skewness is approximately zero, and whether the kurtosis is close to 3. Skewness and kurtosis are measures of skewed and peaked behavior of a population respectively. The commonly used measures of skewness and peakness are S 3 2 3 and K 4 where k E[ X E ( X )]k is called the k-th central moment of the 4 2 random variable X. When the data come from a normal population with expectation and variance 2 , then the standard results show that 3 = 0 and 4 3 4 which yield S = 0 and K = 3. For samples, we can estimate k by mk 1 n x x k where x is the sample mean for a n i 1 m3 sample of size n. Then the coefficients of skewness and kurtosis are estimated by Sˆ and 3 m2 m Kˆ 44 . Thus a more formal test for normality is given by the Jarque – Bera statistic: m2 JB = [n / 6] [ Sˆ 2 ( Kˆ 3) 2 / 4] (3.4) Imon (2003) suggests a slight adjustment to the JB statistic to make it more suitable for the regression problems. His proposed statistic based on rescaled moments (RM) of ordinary least squares residuals is defined as 38 RM = [n c 3 / 6] [ Sˆ 2 c ( Kˆ 3) 2 / 4] (3.5) where c = n/(n – p), p is the number of independent variables in a regression model. Both the JB and the RM statistic follow a chi square distribution with 2 degrees of freedom. If the values of these statistics are greater than the critical value of the chi square, we reject the null hypothesis of normality. Here we report a regression analysis for Gizan. We try to predict rainfall. At first we fit the total amount of rainfall on the number of rainy days. Our common sense tells us that there must be a very strong linear relationship here, but because of too many strange values in the rainfall the data relationship does not turn out that strong. The MINITAB output of this analysis is given below. We observe from this output that although the number of rainy days has a significant impact on rainfall as the p-value is 0.016, the R 2 of this fit is only 19.7%. Regression Analysis: Rainfall versus Rainy Days The regression equation is Rainfall = - 56.1 + 7.48 Rainy Days Predictor Constant Rainy Days Coef -56.14 7.480 S = 86.9938 SE Coef 45.58 2.909 R-Sq = 19.7% T -1.23 2.57 P 0.229 0.016 R-Sq(adj) = 16.7% Analysis of Variance Source Regression Residual Error Total DF 1 27 28 SS 50053 204334 254387 MS 50053 7568 F 6.61 P 0.016 The normal probability plot as shown in Figure 3.1 clearly shows a nonnormal pattern and the corresponding RM statistic has p-value 0.003. 39 Normal Probability Plot (response is Rainfall) 99 95 90 Percent 80 70 60 50 40 30 20 10 5 1 -200 -100 0 100 200 300 Residual Figure 3.1: Normal Probability Plot of Rainfall vs Rainy Days in Gizan When we include a few more explanatory variables such as temperature, maximum temperature, minimum temperature and wind speed, the fit improves a bit. The R 2 value goes up to 54.6% and the normal probability plot (see Figure 3.2) shows a better normality pattern. The p-value of the RM statistic is 0.765. Regression Analysis: Rainfall versus Rainy Days, Temp, ... The regression equation is Rainfall = 1933 + 6.57 Rainy Days + 118 Temp - 92.4 Temp_Max - 97.6 Temp_Min + 21.1 Wind Speed Predictor Constant Rainy Days Temp Temp_Max Temp_Min Wind Speed Coef 1933 6.575 118.43 -92.45 -97.60 21.10 S = 70.8575 SE Coef 1648 2.565 84.63 39.58 55.44 12.53 R-Sq = 54.6% T 1.17 2.56 1.40 -2.34 -1.76 1.68 P 0.253 0.017 0.175 0.029 0.092 0.106 R-Sq(adj) = 44.7% Analysis of Variance Source Regression Residual Error Total DF 5 23 28 Source Rainy Days Temp Seq SS 50053 14986 DF 1 1 SS 138909 115478 254387 MS 27782 5021 F 5.53 P 0.002 40 Temp_Max Temp_Min Wind Speed 1 1 1 19635 39996 14239 Normal Probability Plot (response is Rainfall) 99 95 90 Percent 80 70 60 50 40 30 20 10 5 1 -150 -100 -50 0 50 Residual 100 150 200 Figure 3.2: Normal Probability Plot of Rainfall (PP) vs Different Climate Variables in Gizan But we are not entirely happy with this fit. Figure 3.3 gives the fitted values of rainfall together with the original values. We observe from this plot that for 8 years we predict negative amount of rainfall which cannot be true. Scatterplot of Rainfall, Fitted Rainfall vs Year 350 Variable Rainfall Fitted Rainfall 300 250 Y-Data 200 150 100 50 0 -50 1985 1990 1995 2000 Year 2005 2010 2015 Figure 3.3: Time Series Plot of Original and Predicted Rainfall (PP) for Gizan 41 We see similar pattern in rainfall data for other regions of Saudi Arabia and the details results are omitted for brevity. 3.2 Mediation We have already observed some strange behavior of rainfall data for different parts in Saudi Arabia. In many years the total amounts of rainfall are zero although there are quite a few rainy days there. It clearly indicates something is wrong with the data. When there are 30 to 40 rainy days in a year, the total amount of rainfall cannot be zero. Probably the corresponding data are missing and in the data sheet they wrongly type those as zeroes. In such a situation a regression model can fail to predict rainfall. But we have seen in the previous section that the number of rainy days has a high correlation with the total amount of rainfall and we can also establish a significant linear relationship between them. Since the total number of rainy days data is complete we can use that information to predict or forecast rainfall by an indirect regression approach. Indirect regression is popularly known as mediation in the regression literature. In statistics, a mediation model is one that seeks to identify and explicate the mechanism or process that underlies an observed relationship between an explanatory (independent) variable and a response (dependent) variable via the inclusion of a third explanatory variable, known as a mediator variable. The mediation method was first proposed by Baron and Kenny (1986) and then developed by many authors. An excellent review of mediation technique is available in Montgomery et al. (2014). Rather than hypothesizing a direct causal relationship between the independent variable and the dependent variable, a mediational model hypothesizes that the independent variable influences the mediator variable, which in turn influences the dependent variable. Thus, the mediator variable serves to clarify the nature of the relationship between the 42 independent and dependent variables. In other words, mediating relationships occur when a third variable plays an important role in governing the relationship between the other two variables. Figure 3.4: Regression vs Mediation Analysis Figure 3.4 gives a visual representation of the overall mediating relationship to be explained. Mediation analyses are employed to understand a known relationship by exploring the underlying mechanism or process by which one variable (X) influences another variable (Y) through a mediator (M). For example, suppose a cause X affects a variable (Y) presumably through some intermediate process (M). In other words X leads to M which leads to Y. In our study we assume that the climate variables will lead to the number of rainy days and that will lead to total amount of rainfall. Thus, the number of rainy days has become an intervening variable which is called a mediator. 3.3 Accuracy Measures When several methods are available for fitting or predicting a regression or time series, we need to find some kind of accuracy measures for comparison of the goodness of fit. For a regular regression and/or time series three measures of accuracy of the fitted model: MAPE, MAD, and 43 MSD are very commonly used. For all three measures, the smaller the value, the better the fit of the model. Use these statistics to compare the fits of the different methods. 3.3.1. MAPE Mean absolute percentage error (MAPE) measures the accuracy of fitted time series values. It expresses accuracy as a percentage. MAPE = | y yˆt / yt | t T 100 (3.6) where yt equals the actual value, ŷt equals the fitted value, and T equals the number of observations. 3.3.2. MAD MAD stands for mean absolute deviation, measures the accuracy of fitted time series values. It expresses accuracy in the same units as the data, which helps conceptualize the amount of error. MAD = | y t yˆt | T (3.7) where yt equals the actual value, ŷt equals the fitted value, and T equals the number of observations. 3.3.3. MSD Mean squared deviation (MSD) is always computed using the same denominator, T, regardless of the model, so we can compare MSD values across models. MSD is a more sensitive measure of an unusually large forecast error than MAD. MSD = y yˆt 2 t T 44 (3.8) where yt equals the actual value, ŷt equals the fitted value, and T equals the number of observations. 3.4 A Comparison of Regression and Mediation Fits for Gizan To offer a comparison between regression and mediation methods in fitting the rainfall data, we fit the data by both methods and the results of the fitted values together with the actual rainfall values are presented in Table 3.1. It is worth mentioning that we employed the ordinary least squares method to compute the regression fits. Here the response variable is total amount of rainfall and the explanatory variables are temperature, maximum temperature, minimum temperature, number of rainy days and wind speed. For the mediation values at first we predict the mediator variable total number of rainy days by temperature, maximum and minimum temperature and wind speed and then obtain the predicted rainfall as Predicted rainfall = Predicted number of rainy days × average rainfall per day Table 3.1: Actual and Predicted Fits of Rainfall for Gizan Year 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 Rainfall 0.00 0.00 0.00 28.96 53.09 0.00 318.77 310.89 2.03 113.80 257.05 48.01 2.03 Regression 44.283 8.429 -38.270 -7.876 42.723 48.151 256.727 219.078 96.859 184.732 89.395 87.237 81.041 45 Mediation 16.100 17.120 14.940 15.930 53.140 48.151 256.727 309.800 56.320 184.732 210.236 87.237 39.876 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 0.00 0.00 0.00 0.00 0.00 12.95 46.70 10.20 9.80 9.15 9.15 9.80 10.20 91.70 206.78 0.00 10.041 26.527 27.865 -15.492 -40.160 -1.602 44.897 77.789 85.927 2.663 -9.468 48.110 -0.415 99.371 39.181 43.317 10.041 26.527 27.865 15.234 16.132 14.768 44.897 34.935 55.673 2.663 12.254 15.897 16.944 99.371 134.143 55.045 Scatterplot of Rainfall, Regression, Mediation vs Year 350 Variable Rainfall Regression Mediation 300 250 Y-Data 200 150 100 50 0 -50 1985 1990 1995 2000 Year 2005 2010 2015 Figure 3.5: Time Series Plot of Actual and Predicted Fits of Rainfall (PP) for Gizan A graphical display of this data in terms of a time series plot is presented in Figure 3.5. This graph clearly shows that the mediation method is fitting the data better than the regression method for this data. 46 Table 3.2: Accuracy Measures for Regression and Mediation Fits of Rainfall for Gizan Measures Regression Mediation MAPE 228.50 68.34 MAD 47.06 25.84 MSD 3930.88 1156.93 Finally we compute the accuracy measures MAPE, MAD and MSD for the fitted values obtained by regression and mediation methods and present them in Table 3.2. This table clearly shows that all three accuracy measures for mediation are much less than their corresponding regression results. Thus we may conclude that there exists empirical evidence which shows mediation method can fit the rainfall data of Gizan better than regression methods. 47 CHAPTER 4 FORECASTNG WITH ARIMA MODELS In this chapter we discuss different aspects of data analysis techniques useful in time series analysis. Here the prime topic of our discussion will be ARIMA models. We will talk about fitting of a model and generating forecasts using ARIMA models. An excellent review of different aspects of stochastic time series modelling is available in Pyndick and Rubenfield (1998), Bowerman et al. (2005) and Imon (2015). We assume that the time series models have been generated by a stochastic process. In other words, we assume that each value y1 , y 2 , …, yT in the series is randomly drawn from a probability distribution. We could assume that the observed series y1 , y 2 , …, yT is drawn from a set of jointly distributed random variables. If we could specify the probability distribution function of our series, we could determine the probability of one or another future outcome. Unfortunately, the complete specification of the probability distribution function for a time series is usually impossible. However, it usually is possible to construct a simplified model of the time series which explains its randomness in a manner that is useful for forecasting purposes. 4.1 The Box-Jenkins Methodology The Box-Jenkins methodology consists of a four-step iterative procedure. Step 1: Tentative Identification: Historical data are used to tentatively identify an appropriate Box-Jenkins model. Step 2: Estimation: Historical data are used to estimate the parameters of tentatively identified model. 48 Step 3: Diagnostic Checking: Various diagnostics are used to check the adequacy of the tentatively identified model and, if need be, to suggest an improved model, which is then regarded as a new tentatively identified model. Step 4: Forecasting: Once a final model is obtained, it is used to forecast future time series values. 4.2 Stationary and Nonstationary Time Series It is important to know whether the stochastic process that generates the series can be assumed to be invariant with respect to time. If the characteristic of the stochastic process changes over time we call the process nonstationary. If the process is nonstationary, it will often be difficult to represent the time series over past and future intervals of time by a simple algebraic model. By contrast, if the process is stationary, one can model the process via an equation with fixed coefficients that can be estimated from past data. Properties of Stationary Process We have said that any stochastic time series y1 , y 2 , …, yT can be thought of as having been generated by a set of jointly distributed random variables; i.e., the set of data points y1 , y 2 , …, yT represents a particular outcome (also known as a realization) of the joint probability distribution function p( y1 , y 2 , …, yT ). Similarly, a future observation yT 1 can be thought of as being generated by a conditional probability distribution function p( yT 1 | y1 , y 2 , …, yT ) (4.1) that is, a probability distribution for yT 1 given the past observations y1 , y 2 , …, yT . We define a stationary process, then, as one whose joint distribution and conditional distribution both are invariant with respect to displacement in time. In other words, if the series is stationary, then 49 p( y t , y t 1 , …, y t k ) = p( yt m , yt m1 , …, yt m k ) and p( y t ) = p( yt m ) (4.2) for any t, k, and m. If the series y t is stationary, the mean of the series, which is defined as y = E( y t ) (4.3) must also be stationary, so that E( y t ) = E( yt m ), for any two different time period t and m. Furthermore, the variance of the series y 2 = E[( y t – y )] 2 (4.4) must be stationary, so that E[( y t – y )] 2 = E[( yt m – y )] 2 . (4.5) Finally, for any lag k, the covariance of the series k = Cov ( y t , y t k ) = E[( y t – y ) ( y t k – y )] (4.6) must be stationary, so that Cov ( y t , y t k ) = Cov ( yt m , yt m k ). If a stochastic process is stationary, the probability distribution p( y t ) is the same for all time t and its shape can be inferred by looking at the histogram of the observations y1 , y 2 , …, yT . An estimate of the mean y can be obtained from the sample mean T y = y t 1 t /T (4.7) and an estimate of the variance y can be obtained from the sample variance 2 50 ˆ y 2 = T y t 1 y / T. 2 t (4.8) Usually it is very difficult to get a complete description of a stochastic process. The autocorrelation function could be extremely useful because it provides a partial description of the process for modeling purposes. The autocorrelation function tells us how much correlation there is between neighboring data points in the series y t . We define the autocorrelation with lag k as k Cov y t , y t k V yt V yt k (4.9) For a stationary time series the variance at time t is the same as the variance at time t + k; thus from (4.6) the autocorrelation becomes k k k 0 y2 (4.10) where 0 = Cov ( y t , y t ) = E[( y t – y ) ( y t – y )] = y2 and thus 0 = 1 for any stochastic process. Suppose the stochastic process is simply y t t where t is an independently distributed random variable with zero mean. Then it is easy to show that for this process, 0 = 1 and k = 0 for k > 0. This particular process is known as white noise, and there is no model that can provide a forecast any better than yˆ T l = 0 for all l. Thus, if the autocorrelation function is zero (or close to zero) for all k > 0, there is little or no value in using a model to forecast the series. 51 4.3 Test for Significance of a White Noise Autocorrelation Function In practice, we use an estimate of the autocorrelation function, called the sample autocorrelation (SAC) function T k rk y t 1 t y y t k y T y t 1 t y (4.11) 2 It is easy to see from their definitions that both the theoretical and the estimated autocorrelation functions are symmetrical; i.e., k = k and rk = r k . 4.3.1. Bartlett’s test Here the null hypothesis is H 0 : k = 0 for k > 0. Bartlett shows that when the time series is generated by a white noise process, the sample autocorrelation function is distributed approximately as a normal with mean 0 and variance 1/T. Hence, the test statistic is |z| = T | rk | (4.12) and we reject the null hypothesis at the 95% level of significance, if |z| is greater than 1.96. 4.3.2. The t-test based on SAC The standard error of rk is given by 1/ T k 1 SE rk 2 1 2 ri / T i 1 if k 1 if k 1 . (4.13) The t-statistic for testing the hypothesis H 0 : k = 0 for k > 0 is defined as T = rk /SE( rk ) (4.14) 52 and this test is significant when |T| > 2. 4.3.3. Box and Pierce Test and Ljung and Box Test To test the joint hypothesis that all the autocorrelation coefficients are zero we use a test statistic introduced by Box and Pierce (1970). Here the null hypothesis is H 0 : 1 = 2 = … = k = 0. Box and Pierce show that the appropriate statistic for testing this null hypothesis is k Q=T r i 1 2 (4.15) i is distributed as chi-square with k degrees of freedom. A slight modification of the Box-Pierce test was suggested by Ljuang and Box (1978), which is known as the Ljuang-Box Q (LBQ) test defined as k Q T (T 2) (T k ) 1 ri 2 i 1 . (4.16) Thus, if the calculated value of Q is greater than, say, the critical 5% level, we can be 95% sure that the true autocorrelation coefficients are not all zero. 4.3.4. Stationarity and the Autocorrelation Function How can we decide whether a series is stationary or determine the appropriate number of times a homogenous nonstationary series should be differenced to arrive at a stationary series? The correlogram (a plot of autocorrelation coefficients against the number of lag periods) could be a useful indicator of it. For a stationary series, the autocorrelation function drops off as k becomes large, but this usually is not the case for a nonstationary series. In order to employ the BoxJenkins methodology, we must examine the behavior of the SAC. The SAC for a nonseasonal 53 time series can display a variety of behaviors. First, the SAC for a nonseasonal time series can cut off at lag k. We say that a spike at lag k exists in the SAC if the SAC at lag k is statistically significant. Second, we say that the SAC dies down if this function does not cut-off but rather decreases in a steady fashion. In general, it can be said that 1. If the SAC of the time series values either cuts off fairly quickly or dies down fairly quickly, then the time series values should be considered stationary. 2. If the SAC of the time series values dies down extremely slowly, then the time series values should be considered nonstationary. 4.4 ARIMA Models In this section we introduce some commonly used stochastic time series models which are popularly known as integrated autoregressive moving average (ARIMA) models. 4.4.1. White Noise The simplest example of a stochastic time series is the white noise. Here the response y t is determined by just by the noise or errors as y t = t (4.17) with E( t ) = 0 and E( t s ) = 0 for t s. In the ARIMA version this model can be denoted as ARIMA (0,0,0). 54 4.4.2. Moving Average Models In the moving average process of order q each observation y t is generated by a weighted average of random disturbances going back to q periods. We denote this process as MA(q) and write its equation as y t = t 1 t 1 2 t 2 ... q t q (4.18) . In the moving average model the random disturbances are assumed to be independently distributed across time, i.e., generated by a white noise process. In particular, each t is assumed to be normal random variable with mean 0 and variance , and covariance k = 0 for 2 k 0. We have E( y t ) = which shows that a moving average process is independent of time. The process MA(q) is described by exactly q + 2 parameters, the mean , the disturbance 2 and the parameters 1 , 2 ,..., q . Let us now look at the variance of the moving average process of order q. We observe from (4.6) that 2 0 = E yt (4.19) = E( t 1 t 1 2 t 2 ... q t q 21 t t 1 y t 1 – …) 2 2 2 2 2 2 2 2 = (1 + 1 2 ... q ) 2 2 2 The moving average process of order q has autocorrelation function k 1 k 1 2 k 2 ... q k q 2 2 2 k = 1 1 2 ... q 0 55 k 1,2,..., q kq . (4.20) 4.4.3. Autoregressive Models In the autoregressive process of order p the current observation y t is generated by a weighted average of past observations going back p periods, together with a random disturbance in the current period. We denote this process as AR(p) and write its equation as y t = 1 yt 1 2 yt 2 ... p yt p t (4.21) If the autoregressive process is stationary, then its mean, which we denote by , must be invariant with respect to time; i.e., E( y t ) = E( y t 1 ) = E( y t p ) = for all p. The mean is thus given by = 1 2 ... p or = . If the process is stationary, 1 1 2 ... p then must be finite. For this, it is necessary that 1 2 ... p < 1. For the first-order process AR(1): y t = 1 yt 1 t , k = k k = 1 . 0 (4.22) In general, for k > 1, k = E yt k 1 yt 1 2 yt 2 t = 1 k 1 + 2 k 2 . (4.23) Thus the autocorrelation function is given by 1 = 1 1 2 (4.24) k = 1 k 1 + 2 k 2 for k > 1 . 4.4.4. The Partial Autocorrelation Function The partial autocorrelation function is used to determine the order of an autoregressive process. For an autoregressive process of order p, the covariance with displacement k is determined from 56 k = E yt k 1 yt 1 2 yt 2 ... p yt p t (4.25) which gives 0 = 1 1 + 2 2 + … + p p + 2 1 = 1 0 + 2 1 + … + p p 1 …………………………………… p = 1 p 1 + 2 p 2 + … + p 0 . (4.26) The above equations also give a set of p equations, known as Yule-Walker equations, to determine the first p values of the autocorrelation functions: 1 = 1 + 2 1 + p p 1 ……………………………… p = 1 p 1 + 2 p 2 + … + p (4.27) The solution of the Yule-Walker equations requires the knowledge of p. Therefore we solve these equations for successive values of p. We begin by hypothesizing that p = 1. We compute the sample autocorrelation ˆ1 as an estimate of 1 . If this value is significantly different from 0, we know that the autoregressive process is at least order 1. Next we consider the hypothesis that p = 2. We solve the Yule-Walker equations for p = 2 and obtain a new set of estimates for 1 and 2 . If 2 is significantly different from 0, we may conclude that the process is at least order 2. Otherwise we conclude that the process is order 1. We repeat this process for successive values of p. We call the series 1 , 2 , …, partial autocorrelation function. If the true order of the process is p, we should observe that ˆ j 0 for j > p. To test whether a particular j is zero, we can use the fact that it is approximately normally distributed with mean 0 and variance 1 / T. 57 Hence we can check whether it is statistically significant at, say, the 5% level by determining whether it exceeds 2 / T in magnitude. 4.4.5. Mixed Autoregressive – Moving Average (ARMA) Models Many stationary random processes cannot be modeled as a purely moving average or as purely autoregressive, since they have qualities of both types of processes. In this situation we can use the mixed autoregressive – moving average (ARMA) process. An ARMA process of order (p, q) is defined as y t = 1 yt 1 2 yt 2 ... p yt p t 1 t 1 2 t 2 ... q t q . (4.28) We assume that the process is stationary, then its mean, which we denote by , must be invariant with respect to time and is given by = 1 2 ... p or = For the general ARMA(p, q) process it is not easy to obtain the 1 1 2 ... p . variances, covariances and autocorrelations by solving equations. It can be shown easily, however, that k = 1 k 1 + 2 k 2 + … + p k p , k > q . (4.29) It is interesting to note that q is the memory of the moving average part of the process, so that for k > q, the autocorrelation function exhibits the properties of a purely autoregressive process. The above equations also give a set of p equations, known as Yule-Walker equations, to determine the first p values of the autocorrelation functions: 1 = 1 + 2 1 + p p 1 ……………………………… p = 1 p 1 + 2 p 2 + … + p 58 (4.30) The solution of the Yule-Walker equations requires the knowledge of p. Therefore we solve these equations for successive values of p. We begin by hypothesizing that p = 1. We compute the sample autocorrelation ˆ1 as an estimate of 1 . If this value is significantly different from 0, we know that the autoregressive process is at least order 1. Next we consider the hypothesis that p = 2. We solve the Yule-Walker equations for p = 2 and obtain a new set of estimates for 1 and 2 . If 2 is significantly different from 0, we may conclude that the process is at least order 2. Otherwise we conclude that the process is order 1. We repeat this process for successive values of p. We call the series 1 , 2 , …, partial autocorrelation function. If the true order of the process is p, we should observe that ˆ j 0 for j > p. To test whether a particular j is zero, we can use the fact that it is approximately normally distributed with mean 0 and variance 1 / T. Hence we can check whether it is statistically significant at, say, the 5% level by determining whether it exceeds 2 / T in magnitude. 4.4.6. Homogenous Nonstationary Processes: ARIMA Models Probably very few of the time series one meets in practice are stationary. Fortunately, however, many of the nonstationary time series that are encountered have the desirable property that they are differenced one or more times, the resulting series will be stationary. Such a nonstationary series is termed homogenous. The number of times the original series must be differenced before a stationary series results in is called the order of homogeneity. Thus, if y t is first order homogenous nonstationary, the series wt = y t – y t 1 = y t is stationary. Here we should construct models for those nonstationary series, which can be transformed into stationary series 59 by differencing them one or more times. We say that y t is homogenous nonstationary of order d if wt d yt (4.31) is a stationary series. If wt d yt and wt is an ARMA(p, q) process, then we say that y t is an integrated autoregressive moving average process of order p, d and q, or simply ARIMA(p, d, q). 4.5 Estimation and Specification of ARIMA Models It is often convenient to describe time lags by using the backward shift operator B. The operator B imposes a one-period time lag each time it is applied to a variable. Thus B t = t 1 , B 2 t = t 2 , …, B n t = t n . (4.32) Using this operator, we can write an MA(q) process as: y t = t 1 1 B 2 B 2 ... q B q = t (B) (4.33) In a similar way, the AR(p) process can be rewritten as yt 1 1 B 2 B 2 ... p B p = t . (4.34) Finally, an ARMA(p, q) process can be reexpressed as yt 1 1 B 2 B 2 ... p B p = t 1 1 B 2 B 2 ... q B q . (4.35) It is easy to show that any homogenous nonstationary process can be modeled as an ARIMA process. We can write the equation for an ARIMA(p, d, q) as: B d y t = B t (4.36) B y t = B t (4.37) If d = 0, we obtain 60 which is an ARMA(p, q). When q = 0, i.e., when wt d yt is just AR(p), we call y t an integrated autoregressive process of order (p, d) and denote it as ARIMA(p, d, 0) or ARI(p, d, 0). When p = 0, i.e., when wt is just MA(q), we call y t an integrated moving average process of order (d, q) and denote it as ARIMA(0, d, q) or IMA(0, d, q). In practice, it is crucial to specify the ARIMA model, i.e., to choose the most appropriate values for p, d and q. Given a series y t , the first problem is to determine the degree of homogeneity d. To do this one first examines the autocorrelation function of the original series y t and determines whether it is stationary. If it is not, difference the series and examine the autocorrelation function for y t . Repeat this process until a value of d is reached such that d yt is stationary, i.e., the autocorrelation function goes to 0 as k becomes large. After d is determined, one can work with the stationary series wt d yt and examine both its autocorrelation and partial autocorrelation function to determine possible specifications for p and q. The lower order processes like AR(1), AR(2), MA(1), MA(2), ARMA(1, 1) etc are easy to recognize. The autoregressive and moving average of order p and q respectively are determined by using the partial autocorrelation function (PACF) and ACF respectively. Table 4.1 summarizes the characteristics of the ARIMA(p,0,q) or ARMA(p,q) model. 61 Table 4.1: Specification of ARIMA Models Model Autocorrelation Function (ACF) Partial Autocorrelation Function (PACF) White noise All zero All zero MA(1) Zero after 1 lag Declining from 1st lag MA(2) Zero after 2 lags Declining from 2nd lag MA(q) Zero after q lags Declining from qth lag AR(1) Geometric decline from 1 lag Zero after 1 lag AR(2) Geometric decline from 2 lags Zero after 2 lags AR(p) Geometric decline from pth lag Zero after p-lags ARMA(1,1) Geometric decline from 1 lag Declining from first lag ARMA(p,q) Geometric decline from pth lag Declining from qth lag 4.6 Diagnostic Checking After a time series model has been estimated, one must test whether the specification was correct. We assume that the random errors t in the actual process are normally distributed and independent. Then, if the model is specified correctly, the residuals ̂t should resemble a white noise process. Consequently a sample autocorrelation function of the residuals for n observations is n rˆk ˆ ˆ t t 2 t k n 2 ˆ t t 1 62 (4.38) would be close to 0 for k > 0. We can use the Box and Pierce Test for this purpose. Consider the statistic Q composed of the first K residual autocorrelations K Q=T rˆ k 1 2 k (4.39) which is distributed as chi-square with K – p – q degrees of freedom. 4.7 Computing a Forecast In ARIMA models we generate forecasts by the minimum mean-square error method. Here our objective is to predict future values of a time series. For this reason we consider the optimum forecast to be that forecast which has the minimum mean square forecast error. Since the forecast error is a random variable, we minimize the expected value. Thus, we wish to choose our forecast yˆ T l so that E [ e 2 T l ] = E {[ yT l – yˆ T l ]} 2 (4.40) is minimized. It is easy to show that this forecast is given by the conditional expectation of yT l , that is, by yˆ T l = E ( yT l | yT , yT 1 , …, y1 ) . The computation of the forecast yˆ T l can be done recursively by using the estimated ARIMA model. This involves first computing a forecast one period ahead, then using this forecast to compute a forecast two periods ahead, and continuing until the l-period forecast has been reached. Let us write the ARIMA(p, d, q) model as wt = 1 wt 1 + 2 wt 2 + … + p wt p + t – 1 t 1 – … – q t q + . 63 (4.41) To compute the forecast yˆ T l , we begin by computing the one-period forecast of wt , ŵT (1). To do so, we write wT 1 = 1 wT + 2 wT 1 + … + p wT p 1 + T 1 – 1 T – … – q T q 1 + . (4.42) We obtain ŵT (1) = E( wT 1 | wT , ...) ˆ T q 1 + . = 1 wT + 2 wT 1 + … + p wT p 1 – 1 ̂T – … – q (4.43) Now, using the one-period forecast ŵT (1), we can obtain the second-period forecast ŵT (2): ˆ T q 2 + ŵT (2) = 1 ŵT (1) + 2 wT + … + p wT p 2 – 2 ̂T – … – q (4.44) Thus the l-period forecast is given by ŵT (l) = 1 ŵT (l–1) +…+ 1 wT + …+ p wT p l – 1 ̂T – … – q ̂T q l + . (4.45) If l > p and l > q, this forecast will be ŵT (l) = 1 ŵT (l – 1) + … + p ŵT (p – 1) wT p l . When d = 1, our l-period forecast yˆ T l is given by yˆ T l = ŵT (1) + ŵT (2) + … + ŵT (l) . 4.7.1. The Forecast Error We can write the ARIMA (p, d, q) model as B 1 B d yt = B t where = 0 and = 1 – B. Therefore, y t = B 1 1 Bd B t = (B) t = j 0 j t j . Thus an ARIMA model can be expressed as a purely moving average process of infinite order. Then 64 yT l = 0 T l + 1 T l 1 + … + l 1 T 1 + j 0 l j T j . The desired forecast yˆ T l can be based only on information available up to time T that gives yˆ T l = l j j 0 T j . We define the forecast error eT l as eT l = yT l – yˆ T l = 0 T l + 1 T l 1 + … + l 1 T 1 . (4.46) The variance of the forecast error is given by E( e 2 T l ) = ( 0 + 1 + … + l 1 ) . 2 2 2 2 It is easy to show that 0 = 1. Therefore, for any ARIMA specification, the forecast error one period ahead is just eT l = T l . 4.7.2. Forecast Confidence Interval The estimate of is given by 2 ˆ 2 = T 2 t t 1 / (T – p – q) (4.47) Hence the 100 (1 – )% confidence interval around a forecast l periods ahead would be given by yˆ T l z 1 / 2 (1 + l 1 j 1 65 2 j ) 1/ 2 . (4.48) 4.7.3. White Noise Forecasts For the white noise model y t = t forecasts are generated as ŷ t = ̂ y . (4.49) 4.7.4. The AR(1) Process Let us consider the AR(1) process: y t = 1 y t 1 + + t The one-period forecast is yˆ T 1 = 1 yT + . The two-period forecast is 2 yˆ T 2 = 1 yˆ T 1 = 1 yT + ( 1 + 1) . Thus the l-period forecast is l l 1 l 2 yˆ T l = 1 yT + ( 1 + 1 + … + 1 + 1) . As l becomes large, the forecast converges to the value lim yˆ T l = l j 0 = / (1 – 1 ). j 1 which is the mean of the process. The forecast error is given by eT l = T l + 1 T l 1 + … + 1 T 1 l 1 which has a variance E( e 2 T l ) = (1 + 1 + … + 1 2 66 2l 2 ) . 2 (4.50) 4.7.5. The MA(1) Process Let us consider the MA(1) process: y t = + t – 1 t 1 The one-period forecast is yˆ T 1 = – 1 ̂T . The l-period forecast is yˆ T l = E ( + T l – 1 T l 1 ) = (4.51) for l > 1. The forecast error is given by eT l = T l – 1 T l 1 which has a variance E( e 2 T l ) = (1 + 1 ) . 2 2 4.7.6. The ARMA(1, 1) Process An ARMA (1, 1) process is: y t = + 1 y t 1 + t – 1 t 1 The one-period forecast is yˆ T 1 = E( 1 yT + + T l – 1 T ) = 1 y t 1 + – 1 ̂T . The two-period forecast is 2 yˆ T 2 = 1 yˆ T 1 + = 1 yT + ( 1 + 1) – 1 1 ̂T . Finally the l-period forecast is l l 1 l 2 l 1 yˆ T l = 1 yT + ( 1 + 1 + … + 1 + 1) – 1 1 ̂T . 67 (4.53) 4.7.7. The ARI(1, 1, 0) Process Now we examine a simple nonstationary process, the integrated autoregressive process ARI(1, 1, 0): wt = 1 wt 1 + + t with wt = y t – y t 1 . Since wt is AR(1), the l-period forecast is l l 1 l 2 wˆ T l = 1 wT + ( 1 + 1 + … + 1 + 1) . We also have yˆ T l = yT + ŵT (1) + ŵT (2) + … + ŵT (l). The one-period forecast is yˆ T 1 = yT + 1 ( yT – yT 1 ) + = (1 + 1 ) yT – 1 yT 1 . The two-period forecast is yˆ T 2 = yT + ŵT (1) + ŵT (2) = yˆ T 1 + ŵT (2) = (1 + 1 ) yT – 1 yT 1 + 1 ( yT – yT 1 ) + ( 1 + 1) 2 = (1 + 1 + 1 ) yT – ( 1 + 1 ) yT 1 + ( 1 + 1) + . 2 2 2 Since wˆ T 2 = 1 wT + ( 1 + 1) = 1 ( 1 wT + ) + = 1 wˆ T 1 + , hence yˆ T 2 = yˆ T 1 + 1 wˆ T 1 + . Similarly, yˆ T l = yˆ T l 1 + 1 wˆ T l 1 + . The forecast error for one-period is given by eT 1 = yT 1 – yˆ T 1 = yT + wT 1 – yT – ŵT (1)= T 1 . 68 (4.54) The two-period forecast error is given by eT 2 = yT 2 – yˆ T 2 = yT + wT 1 + wT 2 – yT – ŵT (1) – ŵT (2) = [ wT 1 – ŵT (1)] + [ wT 2 – ŵT (2)] = (1 + 1 ) T 1 + T 2 . Finally eT l = (1 + 1 + 1 +…+ 1 ) T 1 + (1 + 1 + 1 +…+ 1 2 l 1 2 l 2 ) T 2 +… + (1 + 1 ) T l 1 + T l . This has a variance l l i j 2 E( e 2 T l ) = 1 . i 1 j 0 4.8 Fitting of the ARIMA Model to Saudi Arabia Rainfall Data In our study we try to predict rainfall in different cities of Saudi Arabia. In this section we would like to employ ARIMA models to fit this variable. In order to determine the order of ARIMA we compute the ACF, the PACF, and the corresponding t value. We begin with Gizan. Figure 4.1 presents the ACF and the PACF of rainfall of Gizan. We observe that ACF and PACF values of all order are close to zero so the data should fit a white noise model. Similar remarks may apply with the ACF and PACF values with the associated t and Ljuang-Box tests as shown in Tables 4.2. 69 Autocorrelation Function for Rainfall Partial Autocorrelation Function for Rainfall (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 1 2 3 4 Lag 5 6 -1.0 7 1 2 3 4 Lag 5 6 7 Figure 4.1: ACF and PACF of Rainfall Data for Gizan Table 4.2: ACF and PACF Values of Rainfall Data for Gizan ACF TSTAT LBQ PACF TSTAT 0.309823 -0.055817 0.276673 0.130178 -0.197858 -0.224183 -0.182002 1.66845 -0.27532 1.36113 0.60306 -0.90531 -0.99795 -0.78371 3.0820 3.1857 5.8325 6.4419 7.9083 9.8727 11.2263 0.309823 -0.167927 0.397144 -0.174101 -0.113085 -0.253456 -0.129367 1.66845 -0.90432 2.13869 -0.93756 -0.60898 -1.36490 -0.69666 It is worth mentioning that each city in Saudi Arabia shows a similar pattern. For this reason we present results for Gizan. We also present graphs of ACF and PACF for different cities. The numerical results are omitted for brevity. Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 1.0 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 1 2 3 Lag Figure 4.2: ACF and PACF of Rainfall Data for Hail 70 4 Lag 5 6 7 Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Figure 4.3: ACF and PACF of Rainfall Data for Abha Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Autocorrelation Function for PP Partial Autocorrelation Function for RA (with 5% significance limits for the autocorrelations) (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation Figure 4.4: ACF and PACF of Rainfall Data for Al Ahsa 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 1 Lag 2 3 4 Lag Figure 4.5: ACF and PACF of Rainfall Data for Al Baha 71 5 6 7 Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Figure 4.6: ACF and PACF of Rainfall Data for Arar Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the autocorrelations) (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation Figure 4.7: ACF and PACF of Rainfall Data for Buriedah 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 1 Lag 2 3 4 Lag Figure 4.8: ACF and PACF of Rainfall Data for Dahran 72 5 6 7 Partial Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation Autocorrelation Function for PP (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 1 2 3 4 Lag 5 6 7 Lag Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the autocorrelations) (with 5% significance limits for the partial autocorrelations) 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation 1.0 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 1 2 3 4 Lag 5 6 7 Lag Figure 4.10: ACF and PACF of Rainfall Data for Khamis Mashit Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) (with 5% significance limits for the autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation Autocorrelation Figure 4.9: ACF and PACF of Rainfall Data for Jeddah 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 Lag 1 2 3 4 Lag Figure 4.11: ACF and PACF of Rainfall Data for Madinah 73 5 6 7 Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Figure 4.12: ACF and PACF of Rainfall Data for Mecca Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Figure 4.13: ACF and PACF of Rainfall Data for Quriat Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 1 Lag 2 3 4 Lag Figure 4.14: ACF and PACF of Rainfall Data for Rafha 74 5 6 7 Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Figure 4.15: ACF and PACF of Rainfall Data for Riyadh Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Figure 4.16: ACF and PACF of Rainfall Data for Sakaka Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 Lag Lag Figure 4.17: ACF and PACF of Rainfall Data for Sharurah 75 5 6 7 Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the autocorrelations) (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation Figure 4.18: ACF and PACF of Rainfall Data for Tabuk 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Figure 4.19: ACF and PACF of Rainfall Data for Taif Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 Lag Lag Figure 4.20: ACF and PACF of Rainfall Data for Turaif 76 5 6 7 Partial Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation Autocorrelation Function for PP (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Figure 4.21: ACF and PACF of Rainfall Data for Unayzah Partial Autocorrelation Function for PP Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 5 6 7 Lag Lag Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the autocorrelations) (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation Figure 4.22: ACF and PACF of Rainfall Data for Wejh 0.4 0.2 0.0 -0.2 -0.4 -0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -0.8 -1.0 -1.0 1 2 3 4 5 6 1 7 2 3 4 Lag Lag Figure 4.23: ACF and PACF of Rainfall Data for Yanbu 77 5 6 7 Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation (with 5% significance limits for the autocorrelations) 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 1 Lag 2 3 4 5 6 7 Lag Autocorrelation Function for PP Partial Autocorrelation Function for PP (with 5% significance limits for the autocorrelations) (with 5% significance limits for the partial autocorrelations) 1.0 1.0 0.8 0.8 0.6 0.6 Partial Autocorrelation Autocorrelation Figure 4.24: ACF and PACF of Rainfall Data for Bishah 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 -1.0 1 2 3 4 5 6 7 1 Lag 2 3 4 5 6 7 Lag Figure 4.25: ACF and PACF of Rainfall Data for Najran Most of the plots show no evidence of high correlation and partial correlation and for this reason their corresponding bands look horizontal. For some cities like Al Ahsa, Madinah, Mecca, Riyadh, Sharurah, Taif and Turaif the bands do not look horizontal because their corresponding first order correlation values are relatively larger. But they are not statistically significant (the largest one has p-value 0.073) at the 5% level and do not show geometric declining pattern, Hence we can conclude that the white noise is the most appropriate ARIMA model for rainfall in different cities of Saudi Arabia. 78 CHAPTER 5 EVALUATION OF FORECASTS BY REGRESSION, MEDIATION AND ARIMA MODELS In this chapter our main objective is to evaluate forecasts made by different regression, mediation and time series methods. We employ the cross validation technique for doing this. 5.1 Cross Validation in Regression and Time Series Models Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). An excellent review of different types of cross validation techniques is available in Izenman (2008). Picard and Cook (1984) developed all the basic fundamentals of applying cross validation techniques in regression and time series. According to Montgomery et al. (2008), three types of procedures are useful for validating a regression or time series model. (i) Analysis of the model coefficients and predicted values including comparisons with prior experience, physical theory, and other analytical models or simulation results, (ii) Collection of new data with which to investigate the model’s predictive performance, 79 (iii) Data splitting, that is, setting aside some of the original data and using these observations to investigate the model’s predictive performance. We prefer the data splitting technique for crossvalidation of the fitted model in our study. In randomly missing data, generally three performance indicators; say, mean absolute error (MAE), root mean square error (RMSE) and estimated bias (EB) are considered to examine the accuracy of theses imputation methods. In order to select the best method for estimation of missing values, the predicted and observed data were compared. The mean absolute error is the average difference between predicted and actual data values, and is given by MAE 1 N N P O i 1 i i (5.1) where N is the number of imputations, Pi and Oi are the imputed and observed data points, respectively. MAE varies from 0 to infinity and perfect fit is obtained when MAE = 0. The root mean squared error is one of the most commonly used measures and it is computed by RMSE 1 N Pi Oi 2 . N i 1 (5.2) The smaller the RMSE value, the better is the performance of the model. The estimated bias is the absolute difference between the observed and the estimated value of the respective parameters and defined as EB Oi Ei (5.3) where Ei is the estimated value of the parameter that obtained from the imputation methods. In order to find out the best prediction model we usually leave out say, l observations aside as holdback period. The size of l is usually 10% to 20% of the original data. Suppose that 80 we tentatively select two models namely, A and B. We fit both the models using (T – l) set of observations. Then we compute MSPE A 1 l 2 e Ai l t 1 (5.4) for model A and MSPE B 1 l 2 eBi l t 1 (5.5) for model B. Several methods have been devised to determine whether one mean sum of square prediction error (MSPE) is statistically different from the other. One such popular method of testing is the F-test approach, where F-statistic is constructed as a ratio between the two MSPEs keeping the larger MSPE in the numerator of the F-statistic. If the MSPE for model A is larger, this statistic takes the form: F MSPE A MSPE B (5.6) This statistic follows an F distribution with (l , l) degrees of freedom under the null hypothesis of equal forecasting performance. If the F-test is significant we will choose model B for this data otherwise, we would conclude that there is little difference in choosing between these two models. 5.2 Evaluation of Forecasts for Rainfall Data To evaluate forecasts we carry out an experiment using cross validation. Here we compare three different methods: regression, mediation and ARIMA model. We have rainfall data for 29 years. We use 20 of them (roughly two-third) as the training set and left the last 9 observations (onethird) for cross validation. We have produced three sets of results for predicting rainfall, the first 81 one is the regression forecast, the second one is the mediation forecast, and the third one is the ARIMA forecast. The forecasted values together with the true values for Gizan are given in Table 5.1 and are also presented in Figure 5.1. Table 5.1: Rainfall Forecast for Gizan Year Original Regression Mediation ARIMA 2006 10.20 77.7891 34.935 59.7 2007 9.80 85.9272 55.673 59.7 2008 9.15 2.6628 2.663 59.7 2009 9.15 -9.4679 12.254 59.7 2010 9.80 48.1100 15.897 59.7 2011 10.20 -0.4147 16.944 59.7 2012 91.70 99.3708 99.371 59.7 2013 206.78 39.1808 134.143 59.7 2014 0 43.3172 55.045 59.7 The above table and the following figure clearly show that the mediation generates better forecasts. ARIMA forecasts are constants which should not be the case. The regression method predicts negative rainfall for the year 2009 and 2011 which are simply impossible and in fact Gizan had 9.20 and 10.15 cm rainfall in those years. 82 Time Series Plot of Original, Regression, Mediation, ARIMA V ariable O riginal Regression M ediation A RIM A 200 Data 150 100 50 0 2006 2007 2008 2009 2010 Year 2011 2012 2013 2014 Figure 5.1: Time Series Plot of Rainfall Forecast for Gizan Table 5.2 offers a comparison regarding the quality of forecasts between regression, mediation and ARIMA models. Obviously the mediation is the clear cut winner. It possesses the lowest MSD and MSPE values. Bothe regression and ARIMA forecasts yield very high MSD and MAPE values and p-values corresponding to the MSPE for both of them are significant at the 5% level. Table 5.2: MSPE of Regression, Mediation and ARIMA Forecasts of Rainfall for Gizan Measures MSD MSPE p-value Regression 4706 3.777 0.030 Mediation 1246 1.000 0.500 ARIMA 4579 3.675 0.033 83 Now we do the same kind of experiment for each major city of Saudi Arabia. It is worth mentioning that the mediation performs best for all of the cities and the details results are not presented for brevity. But we present the summary information regarding which method is predicting what trend of future rainfall as shown in Table 5.3. Table 5.3: Predicted Trend of Rainfall in Saudi Arabia City Regression ARIMA Mediation Gizan Decrease No change Increase Hail Decrease No change Increase Madinah Decrease No change Increase Makkah Increase No change Increase Najran Decrease No change Decrease Rafha Decrease No change Decrease Riyadh Decrease No change Decrease Sharurah Decrease No change Increase Tabuk Increase No change Increase Taif Increase No change Increase Turaif Decrease No change Decrease Wejh Decrease No change Increase Yanbo Decrease No change Increase Abha Increase No change Increase Al Baha Decrease No change Decrease 84 Sakaka Increase No change Increase Guriat Increase No change Increase Arar Increase No change Increase Buraydah Increase No change Increase Alqasim Decrease No change Increase Dahran Decrease No change Decrease Al Ahsa Increase No change Increase Khamis Mushait Decrease No change Decrease Jeddah Decrease No change Increase Bishah Increase No change Increase . Table 5.3 reveals an interesting feature regarding the future rainfall pattern in Saudi Arabia. ARIMA forecasts say it will be remain the same for all cities in future. Regression predicts that out of 25 cities 10 cities will have more rainfall in future and 15 other cities will have less rainfall. The mediation technique predicts that 18 cities will have more rainfall than present time and 7 cities will have less rainfall. Since mediation maintains much better control over prediction error we will go by this and conclude that Saudi Arabia will have higher rainfall in future. 85 CHAPTER 6 CONCLUSIONS AND AREAS OF FUTURE RESEARCH In this chapter we will summarize the findings of our research to draw some conclusions and outline ideas for our future research. 6.1 Conclusions Our prime objective was to predict rainfall of Saudi Arabia for the next few years. In order to do that we considered few climate variables that could possibly determine rainfall. We had a few missing observations here in our data and we used the EM MM algorithm to estimate those. After that we employed regression methods to fit the rainfall data and observed that the fits are not great. ARIMA models also failed to produce good fit for this data. Later we employed the mediation technique where number of rainy days is used as a mediator for predicting the total amount rainfall. Our data suggests that mediation produces a much better fit than regression and ARIMA models for all 25 major cities of Saudi Arabia. Finally we used cross validation to assess the goodness of forecasts and we observed that mediation yields much better forecasts as well. From these forecasts we can say that Saudi Arabia will have more rainfall in future. We predict that 18 out of 25 major cities will have higher rainfall including the holy cities Mecca and Madinah and only 7 cities will have less rainfall. 86 6.2 Areas of Future Research In our study we were not able to consider a few important climate variables such as humidity and evaporation because this information was not available. We would like to predict the rainfall once again if this information are available. Neural network could be an interesting method to employ on these data. In the future we would like to extend our research by considering a large number of alternative methods such as neural networking, clustering and studying volatility of the series of data. 87 REFERENCES 1. 2. 3. 4. 5. 6 7. 8. 9. 10. 11. 12. 13 14. 15. 16 17. 18. 19. 20. 21. 22 Altman, N.S. (1992). An Introduction to Kernel and Nearest-neighbor Nonparametric Regression, The American Statistician, 46, pp. 175-185. Alshammari, A.O.M. (2015). Modeling Motor Accidents in Saudi Arabia (Unpublished MS Thesis), Department of Mathematical Sciences, Ball State University. Baron, R. M. and Kenny, D. A. (1986). The Moderator-Mediator Variable Distinction in Social Psychological Research – Conceptual, Strategic, and Statistical Considerations, Journal of Personality and Social Psychology, 51, pp. 1173–1182. Bowerman, B. L., O’Connell, R. T., and Koehler, A. B. (2005). Forecasting, Time Series, and Regression: An Applied Approach, 4th Ed., Duxbury Publishing, Thomson Books/Cole, New Jersey. Box, G. E. P. and Pierce, D. A. (1970). Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models, Journal of the American Statistical Association, 65, pp. 1509-1526. Cleveland, W.S. (1979), Robust locally weighted regression and smoothing scatterplots, Journal of the American Statistical Association, 74, pp. 829-836. Hadi, A.S., Imon, A.H.M.R. and Werner, M. (2009). Detection of outliers, Wiley Interdisciplinary Reviews: Computational Statistics, 1, pp. 57 – 70. Hastie, T. and Tibshirani, R. (1987). Analysis of Categorical Data, Wiley, New York. Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W. (1986). Robust Statistics: The Approach Based on Influence Function, Wiley, New York. Huber, P.J. (1973). Robust Regression: Asymptotics, Conjectures, and Monte Carlo, The Annals of Statistics, 1, pp. 799-821. Imon, A. H. M. R. (2003). Regression Residuals, Moments, and Their Use in Tests for Normality, Communications in Statistics—Theory and Methods, 32, pp. 1021 – 1034. Izenman, A.J. (2008), Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning, Springer, New York. Kadane, J.B. (1984). Robustness of Bayesian Analysis, Elsevier North-Holland, Amsterdam. Ljung, G. M. and Box, G. E. P. (1978), .On a Measure of a Lack of Fit in Time Series Models, Biometrika, 65, pp. 297–303. Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006), Robust Statistics: Theory and Methods, Wiley, New York. Ministry of Finance, Government of Saudi Arabia, www,mof.gov.sa Picard, R. and Cook, R.D. (1984), Cross-Validation of Regression Models, Journal of the American Statistical Association, 79, pp. 575–583. Pindyck, R. S. and Rubenfeld, D. L. (1998), Econometric Models and Economic Forecasts, 4th Ed. Irwin/McGraw-Hill Boston. Rousseeuw, P.J. (1984). Least Median of Squares Regression, Journal of the American Statistical Association, 79, pp. 871 – 880. Rousseeuw, P.J. and Leroy, A.M. (1987). Robust Regression and Outlier Detection, Wiley, New York. Rousseeuw, P.J. and Leroy, A.M. (1987). A Fast Algorithm for S-Regression Estimates, Journal of Computational and Graphical Statistics, 15, pp. 414–427. Silverman, B.W. (1985), Some Aspects of the Spline Smoothing Approach to Non88 parametric Regression Curve Fitting, Journal of the Royal Statistical Society. Series B, 47, pp. 1-52. 23. Simpson, J.R. and Montgomery, D.C. (1998). A Robust Regression Technique Using Compound Estimation, Naval Research Logistics (NRL), 45, pp. 125–139. 24. Yohai, V. (1987). High Breakdown-Point and High Efficiency Robust Estimates for Regression, The Annals of Statistics, 15, pp. 642-656. 89 APPENDIX A Saudi Arabia Climate Data Variables: Data T - Average annual temperature TM - Annual average maximum temperature Tm – Annual Average minimum temperature PP - Rain or snow precipitation total annual V - Annual average wind speed RA - Number of days with rain SN - Number of days with snow TS - Number of days with storm FG - Number of foggy days TN - Number of days with tornado GR - Number of days with hail City: Gizan Year T Scale of measurement Celsius (°C) Celsius (°C) Celsius (°C) (mm) Km\h Days Days Days Days Days Days TM Tm PP V RA SN TS FG TN GR 1986 30.1 35.7 25.9 0.00 10.9 22 0 19 1 0 0 1987 1988 1989 1990 1991 1992 1993 1994 1995 30.4 30.5 30.2 30.2 30.3 29.5 30.0 30.1 30.4 35.7 36.2 36.2 36.1 35.6 34.4 34.5 35.1 35.5 26.1 26.0 25.5 25.3 25.5 25.4 25.6 25.7 25.1 0.00 0.00 28.96 53.09 0.00 318.77 310.89 2.03 113.80 10.0 10.2 10.7 10.8 11.1 12.9 13.1 14.2 14.1 17 13 14 17 11 33 22 8 13 0 0 0 0 0 0 2 1 1 19 21 13 9 11 26 27 14 13 0 0 1 5 0 0 0 0 1 0 0 0 0 0 0 0 1 0 2 1 0 3 1 0 0 0 0 1996 1997 1998 1999 2000 2001 2002 30.2 30.2 30.5 30.2 30.3 30.4 30.5 35.2 34.8 35.2 35.5 35.4 35.5 35.6 25.7 26.2 26.4 25.7 26.0 26.2 26.7 257.05 48.01 2.03 0.00 0.00 0.00 0.00 13.1 12.0 12.7 11.9 11.2 13.0 12.2 10 15 15 6 12 9 12 2 1 0 1 0 0 0 22 29 36 28 36 28 22 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 2003 30.9 35.7 27.0 0.00 10.3 13 0 28 0 0 0 2004 2005 2006 30.4 30.5 34.9 34.8 26.4 26.7 12.95 10.2 9.9 10.0 9 22 0 0 32 37 0 0 0 0 0 0 2007 2008 2009 2010 2011 2012 30.7 30.4 30.9 30.9 30.7 31.0 35.0 34.7 35.1 35.2 35.0 34.9 26.6 26.5 27.1 27.0 26.9 27.0 9.8 9.15 9.15 9.8 10.2 91.70 10.3 10.0 9.9 9.8 10.4 11.6 20 8 12 21 11 17 0 0 0 0 0 0 33 31 17 37 34 30 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2013 2014 31.2 31.2 35.5 35.7 27.0 27.1 206.78 0.00 11.5 12.1 13 16 0 0 29 31 0 0 0 0 0 0 City:Hail Year T 1986 21.4 1987 23.0 TM 28.4 30.1 Tm 13.7 14.8 PP 0.00 0.00 V 9.5 12.2 RA 22 27 SN 1 1 TS 20 17 FG 6 1 TN 0 0 GR 2 4 1988 1989 1990 22.6 21.7 22.5 29.2 28.6 29.7 14.5 13.6 14.1 0.00 19.06 24.88 13.6 12.3 12.4 17 30 20 0 0 0 10 9 6 3 4 3 1 0 0 0 0 1 1991 22.4 29.3 14.4 10.00 12.5 41 2 17 2 0 0 1992 1993 1994 1995 20.8 22.0 22.0 22.6 27.5 28.9 28.8 29.2 13.1 14.1 14.1 14.5 6.09 0.00 0.00 199.90 12.9 11.9 11.4 10.6 32 47 24 22 1 0 0 0 17 27 24 18 7 7 2 9 1 0 0 0 2 0 1 1 1996 1997 1998 1999 23.0 21.9 23.4 23.6 29.8 28.6 30.6 30.7 14.9 13.9 14.9 15.3 0.00 0.00 0.00 0.00 11.4 12.4 11.4 10.7 27 29 25 25 1 0 3 2 13 16 12 16 4 3 1 2 0 0 0 0 1 0 0 0 2000 2001 2002 2003 23.0 23.5 23.5 23.3 30.0 30.6 30.6 30.6 14.8 15.2 15.4 15.1 0.00 0.00 0.00 0.00 11.1 10.8 10.9 11.1 34 28 22 17 0 0 0 0 26 21 19 15 7 1 4 4 0 0 0 0 0 0 0 0 2004 2005 2006 22.9 23.1 30.4 30.2 14.5 15.0 0.00 31.5 12.5 13.1 18 36 0 0 20 27 5 2 0 0 0 1 2007 23.2 30.6 14.5 32.00 14.3 4 0 7 2 0 1 91 2008 22.9 30.0 14.5 33.5 14.0 15 1 7 5 0 1 2009 23.2 30.5 14.8 39.37 14.8 14 0 18 6 0 0 2010 2011 2012 2013 24.4 23.0 23.6 23.1 31.9 30.0 30.8 30.1 16.0 14.6 14.9 15.1 87.89 51.07 65.00 101.58 14.0 12.1 12.4 12.3 13 15 17 20 0 0 0 0 12 12 20 14 0 1 1 2 0 0 0 0 0 0 0 0 2014 23.6 30.4 15.5 0.00 12.3 19 0 27 2 0 0 TM 34.1 35.2 35.2 34.5 35.2 35.1 33.3 34.8 34.6 35.0 Tm 20.7 21.3 21.4 21.0 21.0 21.4 19.9 21.0 21.2 21.8 PP 0.00 0.25 0.00 6.60 129.54 145.03 28.19 135.89 190.25 193.04 V 12.1 10.3 11.9 12.0 11.0 12.0 11.1 10.9 13.9 11.4 RA 17 13 17 21 6 27 16 24 13 10 SN 0 0 0 1 0 0 0 1 1 0 TS 11 3 6 12 4 15 14 22 21 14 FG 0 1 1 1 0 2 2 0 0 0 TN 0 0 1 0 1 0 0 0 0 0 GR 0 0 2 1 1 0 1 0 0 1 City: Madinah Year T 1986 27.9 1987 28.6 1988 28.7 1989 28.1 1990 28.4 1991 28.4 1992 26.7 1993 28.1 1994 28.1 1995 28.7 1996 1997 1998 1999 2000 2001 2002 2003 2004 29.1 27.8 29.1 29.3 28.8 29.3 29.4 29.1 29.1 35.5 34.1 35.9 35.9 35.5 36.0 36.2 35.9 35.7 22.2 21.0 21.9 21.9 21.4 21.9 21.9 21.9 21.7 27.94 6.10 0.00 0.00 0.0 0.0 0.00 0.00 70.10 12.2 11.4 11.6 11.5 11.6 11.1 11.5 11.8 13.1 8 16 13 6 10 11 12 4 11 2 0 0 0 0 1 0 0 0 11 18 15 10 14 15 6 9 14 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2005 2006 28.7 34.9 21.6 39.37 9.9 12 0 16 0 0 1 2007 2008 2009 2010 2011 29.0 28.7 29.2 30.2 28.9 35.5 35.1 35.7 37.0 35.3 21.7 21.5 21.8 22.7 21.8 5.08 31.49 18.29 114.81 69.85 9.3 9.7 9.6 9.7 10.0 4 10 12 9 9 0 0 0 0 0 5 5 13 8 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 2012 29.2 35.5 21.8 94.23 10.6 7 0 7 0 0 0 2013 2014 29.3 29.6 35.3 34.9 22.4 23.5 28.19 0.00 11.1 12.5 8 11 0 0 12 12 0 0 0 0 0 0 City: Makkah (Mecca) Year T TM 1985 29.8 38.0 1986 30.0 37.6 1987 30.8 38.1 1988 31.2 38.5 1989 30.1 37.7 Tm 23.7 23.6 24.1 24.5 23.4 PP 2.03 0.00 0.00 0.00 50.03 V 6.4 7.2 6.9 7.7 7.4 RA 8 6 12 8 15 SN 0 0 0 1 0 TS 3 4 7 7 5 FG 0 0 1 1 5 TN 0 0 0 0 1 GR 0 1 0 1 4 1990 30.4 38.2 23.4 7.11 7.2 5 0 7 2 0 1 1991 1992 1993 1994 1995 1996 1997 1998 30.3 29.1 30.5 30.6 31.0 31.0 30.4 30.5 37.2 36.4 37.7 37.4 37.8 37.8 37.2 37.2 23.9 22.6 24.0 24.5 24.7 24.8 24.2 24.5 62.23 30.98 0.00 0.00 0.00 0.00 0.00 0.00 7.0 6.6 6.4 5.6 6.1 5.9 5.4 4.0 18 21 9 8 4 20 15 16 0 0 2 0 0 0 0 0 8 12 6 6 4 15 6 13 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1999 2000 2001 2002 2003 2004 31.5 31.0 31.4 31.5 31.4 30.7 39.2 38.5 38.8 38.5 38.5 37.9 25.1 24.9 25.1 25.3 25.4 25.0 0.00 0.00 0.00 0.00 0.00 0.00 4.3 5.4 6.0 6.5 4.6 6.2 20 10 10 10 3 10 1 0 0 0 0 0 17 13 16 15 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2005 2006 30.9 37.9 25.3 164.84 6.2 14 0 15 0 0 0 2007 30.9 38.0 25.4 170.2 6.3 12 0 13 0 0 0 2008 2009 2010 2011 2012 2013 2014 31.5 31.3 31.8 31.0 31.5 31.7 31.8 39.2 39.4 39.9 38.7 39.1 39.1 39.0 25.5 25.4 25.7 25.3 26.1 26.2 26.3 274.83 254.1 251.9 127.27 76.21 39.87 87.38 3.1 3.0 3.1 2.3 2.4 2.5 2.0 6 14 12 8 10 9 8 0 0 0 0 0 0 0 12 13 13 9 12 6 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 93 City: Najran Year T 1986 25.0 1987 26.1 1988 26.5 1989 26.0 1990 26.4 1991 25.5 TM 31.6 32.2 32.0 31.2 32.3 32.7 Tm 17.2 17.3 17.2 16.7 17.1 17.1 PP 139.95 0.00 0.00 3.30 201.43 6.10 V 9.4 9.1 9.8 10.4 10.5 10.0 RA 29 24 23 13 10 11 SN 0 0 0 0 0 0 TS 15 2 6 3 6 4 FG 0 0 3 1 0 2 TN 0 0 0 0 0 0 GR 1 0 0 0 0 0 1992 1993 25.8 26.8 31.6 32.3 17.0 17.1 0.00 0.00 10.4 10.2 32 31 0 1 7 17 7 2 0 0 0 0 1994 1995 27.1 25.4 32.8 32.7 17.7 17.1 0.00 27.94 9.3 7.4 6 14 2 1 10 3 1 1 0 0 0 0 1996 1997 1998 1999 25.4 25.9 26.1 25.7 32.9 33.3 33.8 33.5 16.6 17.4 17.0 15.8 0.00 4.06 0.00 0.00 6.4 7.5 7.0 6.9 21 25 20 7 2 1 1 0 10 10 13 2 0 0 0 0 0 0 0 0 0 0 0 1 2000 2001 2002 2003 26.1 25.8 25.6 26.1 33.9 33.5 33.4 34.2 16.3 16.9 16.3 16.8 0.00 0.00 0.00 0.00 6.6 7.5 7.3 6.5 6 20 10 16 0 0 0 0 5 12 0 16 1 0 1 0 0 0 0 0 0 0 0 1 2004 2005 2006 25.4 26.5 33.5 34.4 16.4 17.7 15.49 0.00 6.3 6.8 20 8 0 0 11 7 0 0 0 0 0 0 2007 2008 2009 26.6 26.1 26.9 34.5 34.1 34.9 17.9 17.2 18.1 0.00 0.00 1.02 7.0 7.0 7.3 6 13 7 0 0 0 7 8 5 0 0 0 0 0 0 0 0 0 2010 2011 2012 26.5 26.5 26.0 34.4 34.2 34.0 18.1 18.3 17.5 0.00 0.00 59.71 7.8 8.2 7.9 15 23 12 1 0 0 21 19 17 0 0 0 0 0 0 0 0 0 2013 25.8 33.3 17.8 0.00 7.9 16 0 17 0 0 0 2014 26.1 33.9 17.7 0.00 8.2 12 0 9 0 0 0 City : Rafha Year T 1986 23.3 1987 23.7 1988 23.1 TM 30.9 31.2 30.2 Tm 15.7 16.0 15.4 PP 0.00 0.00 0.00 V 15.5 13.9 14.5 RA 36 29 25 SN 0 0 0 TS 13 15 5 FG 1 0 1 TN 0 0 0 GR 0 0 0 94 1989 24.9 30.5 15.4 0.00 15.9 28 0 8 2 0 2 1990 1991 1992 1993 1994 1995 1996 1997 1998 23.7 23.1 21.3 22.9 23.2 23.2 24.0 23.0 25.3 31.4 30.7 29.1 30.7 30.8 30.5 31.5 30.0 32.4 15.5 15.5 13.7 14.9 15.2 15.3 16.2 15.4 16.5 3.05 80.26 65.54 0.00 0.00 197.36 0.00 55.88 22.10 12.3 11.5 10.7 14.9 15.4 14.3 14.7 15.6 14.3 11 32 27 40 32 29 23 29 14 0 0 0 1 0 0 0 2 0 2 19 8 14 17 9 5 6 2 3 7 6 9 7 7 7 7 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1999 25.0 32.1 16.5 0.00 14.0 20 0 2 8 0 0 2000 2001 2002 2003 2004 2005 2006 24.5 25.0 2501 23.6 23.5 24.1 31.2 32.1 32.1 31.4 31.5 31.4 16.1 15.2 15.3 15.5 15.4 16.4 0.00 0.00 0.00 0.00 2.03 0.00 14.4 13.8 13.9 14.0 16.6 12.5 28 20 23 21 26 27 0 0 0 0 0 0 3 4 5 10 5 11 12 5 5 9 8 4 0 0 0 0 0 0 0 0 0 0 0 0 2007 24.3 31.7 16.1 0.00 13.5 11 0 4 0 0 0 2008 2009 2010 2011 2012 2013 2014 24.1 24.1 25.8 23.7 24.6 23.7 24.4 31.1 31.1 33.2 31.0 31.6 30.5 31.1 16.3 16.5 18.0 15.9 17.0 16.4 17.1 0.00 39.38 92.96 0.00 68.83 0.00 0.00 12.3 13.2 13.4 13.9 13.7 13.7 13.3 14 15 14 20 17 39 39 0 0 0 0 0 0 0 4 6 6 5 8 15 21 3 10 0 1 6 3 4 0 0 0 0 0 0 0 0 0 0 0 0 1 2 City: Riyadh Year T 1986 24.7 1987 25.9 1988 25.5 1989 24.8 1990 25.6 1991 25.3 1992 24.0 TM 32.0 33.5 32.8 32.3 33.6 32.9 31.4 Tm 16.7 17.6 17.0 16.4 16.6 16.9 16.1 PP 0.00 0.00 0.00 102.11 21.34 175.76 121.92 V 12.3 12.0 11.9 11.5 10.8 10.9 10.5 RA 47 20 27 31 22 28 43 SN 0 0 0 0 0 2 1 TS 20 8 20 15 5 11 12 FG 4 3 1 2 6 8 9 TN 0 0 0 0 0 0 0 GR 2 0 0 3 2 2 2 95 1993 25.3 32.7 17.0 108.21 9.7 42 0 23 4 0 0 1994 1995 1996 1997 1998 1999 2000 2001 2002 26.0 25.5 26.0 25.3 26.7 26.9 26.4 26.4 26.4 33.2 32.7 33.3 32.4 34.4 34.9 34.0 34.3 34.2 17.6 17.2 17.8 17.5 18.1 18.2 18.4 18.1 18.5 0.00 0.00 5.08 12.19 0.00 0.00 0.00 0.00 0.00 11.2 10.6 9.7 10.6 9.3 9.4 10.3 10.8 10.9 20 30 27 44 15 14 20 16 24 0 0 1 1 0 1 0 0 0 7 8 9 16 1 1 3 2 6 1 5 4 3 2 2 5 0 8 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 2003 26.5 34.1 18.6 0.00 11.0 17 1 8 1 0 1 2004 2005 2006 26.0 26.1 33.6 33.3 17.8 18.1 18.04 0.00 10.5 10.6 23 20 0 0 10 6 4 0 0 0 0 0 2007 2008 2009 2010 2011 26.3 25.9 26.4 27.0 25.9 33.7 33.4 33.9 35.0 33.2 18.1 17.8 18.6 18.7 18.5 36.08 0.00 70.87 123.43 154.19 9.3 10.5 10.2 9.8 10.8 16 17 14 15 20 0 0 0 0 0 8 1 8 11 13 0 3 2 0 0 0 0 0 0 0 0 0 0 0 0 2012 2013 2014 26.4 26.1 26.4 33.5 33.1 33.0 18.7 18.5 18.9 0.00 0.00 0.00 10.9 11.0 9.9 29 25 22 0 0 0 18 15 10 2 0 0 0 0 0 1 0 0 TM 35.4 35.8 35.3 34.7 Tm 19.9 20.3 20.4 19.4 PP 0.00 10.92 0.00 0.00 V 12.3 11.3 15.2 15.2 RA 7 6 9 12 SN 0 0 0 0 TS 1 0 2 0 FG 0 0 0 0 TN 0 0 0 0 GR 0 0 0 0 35.7 36.2 35.0 35.5 36.2 35.8 35.9 19.4 19.8 18.6 17.5 19.1 18.1 18.1 0.00 0.00 0.00 0.00 0.00 117.35 2.03 14.9 14.8 14.2 14.5 16.1 15.0 10.4 4 2 14 13 0 3 3 0 0 1 0 0 0 1 1 1 5 3 0 2 4 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 City: Sharurah Year T 1986 27.9 1987 28.4 1988 29.3 1989 28.0 1990 1991 1992 1993 1994 1995 1996 29.2 29.6 28.9 28.5 30.1 28.9 28.3 96 1997 28.6 36.5 18.1 0.00 11.2 7 0 3 0 0 0 1998 1999 2000 2001 2002 2003 2004 2005 2006 29.2 29.0 29.2 29.1 29.9 28.8 28.3 28.9 36.9 36.6 36.5 36.5 26.6 36.4 36.1 36.4 19.8 20.0 19.9 19.8 19.8 19.8 19.0 19.9 0.00 0.00 0.00 0.00 0.00 59.94 59.94 17.02 12.5 12.7 12.3 10.7 9.9 9.1 6.7 11.6 7 1 2 4 5 9 8 6 0 1 0 0 0 0 0 0 3 1 0 2 0 7 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2007 2008 2009 2010 2011 2012 2013 2014 28.9 28.3 29.3 29.2 28.9 29.3 28.3 28.8 36.2 35.9 36.9 36.8 36.5 36.8 35.5 36.3 20.0 19.0 20.4 20.4 20.3 20.7 19.9 20.2 24.90 34.03 7.11 30.99 37.85 59.70 87.13 70.36 11.2 12.2 12.7 10.8 11.5 12.8 12.6 13.3 8 6 1 6 7 2 11 3 0 0 0 0 0 0 0 0 3 3 3 3 4 1 12 3 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 City: Tabuk Year T 1986 21.5 TM 29.4 Tm 14.6 PP 2.04 V 11.0 RA 13 SN 0 TS 4 FG 1 TN 0 GR 2 1987 1988 1989 1990 22.2 22.0 21.4 21.9 30.3 29.7 29.5 29.9 14.5 14.6 13.8 14.2 26.41 2.03 119.89 2.04 11.0 11.8 10.3 10.2 19 29 22 14 1 1 0 0 9 13 14 7 1 0 4 2 0 1 0 0 0 2 2 1 1991 1992 1993 22.0 20.5 21.6 29.6 28.1 29.4 14.6 13.0 14.1 0.51 0.00 0.00 10.1 10.1 9.8 22 26 32 2 2 2 5 6 12 0 2 1 0 0 0 1 1 0 1994 21.4 29.2 14.4 195.83 9.9 30 0 17 1 0 0 1995 1996 1997 1998 21.8 22.4 21.2 22.8 29.8 30.4 29.3 30.9 14.3 14.7 12.5 15.2 1.02 1.02 33.02 0.00 10.5 10.4 10.0 10.2 14 16 14 9 0 1 1 1 2 8 8 3 0 1 1 0 0 0 0 0 0 0 1 0 1999 2000 22.5 21.9 30.4 29.6 15.0 14.5 0.00 0.00 9.1 9.1 13 10 0 0 4 10 0 0 0 1 1 0 97 2001 22.6 30.5 15.0 10.93 7.8 8 0 8 1 0 1 2002 22.6 30.4 14.9 0.00 8.2 7 0 6 0 0 0 2003 2004 2005 2006 22.6 22.5 22.3 30.3 30.2 29.9 14.9 14.7 14.8 0.00 1.02 58.17 9.5 9.1 9.5 7 7 7 0 0 0 5 4 5 0 0 0 0 0 0 0 0 0 2007 2008 2009 2010 2011 2012 22.5 22.7 22.6 24.6 22.3 23.0 30.2 30.3 30.1 32.1 29.6 30.2 14.8 15.1 15.1 16.7 14.9 15.4 81.80 1.02 30.48 19.56 20.83 0.00 9.5 10.3 9.5 8.7 9.8 10.8 10 6 13 11 8 17 1 0 0 0 0 1 5 1 2 7 7 9 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 3 1 0 2013 2014 22.9 23.2 30.1 30.4 15.5 16.0 93.75 42.90 11.0 11.4 18 16 0 0 4 6 0 1 0 0 0 0 City: Taif Year T 1985 21.8 1986 22.0 1987 23.1 TM 29.0 28.9 29.7 Tm 16.1 15.9 16.7 PP 0.00 6.10 0.00 V 12.7 12.8 12.7 RA 35 37 35 SN 0 0 0 TS 49 51 43 FG 8 5 3 TN 0 0 0 GR 1 3 2 1988 1989 1990 1991 1992 1993 1994 1995 1996 23.4 22.3 23.2 23.4 21.9 23.1 23.2 23.4 23.3 29.6 28.7 29.7 29.7 28.4 29.9 29.6 30.3 30.3 16.8 16.0 16.6 16.8 15.3 16.4 16.4 16.3 15.0 0.00 4.06 9.91 36.59 77.98 23.88 0.00 0.00 0.00 13.8 12.8 12.8 13.2 14.3 14.4 14.8 13.6 12.0 31 49 19 34 31 24 17 19 32 0 0 1 1 0 1 0 0 3 34 42 29 44 35 44 37 39 53 7 10 6 6 5 3 4 8 8 0 0 0 0 0 0 0 0 0 4 6 3 5 3 1 1 0 3 1997 1998 1999 2000 2001 2002 2003 22.7 23.3 23.6 23.6 23.4 23.5 23.8 29.7 30.9 31.3 31.0 30.7 30.8 31.0 14.8 15.5 15.6 15.7 15.0 15.5 16.5 34.04 0.00 0.00 0.00 0.00 0.00 0.00 12.2 12.7 13.5 12.5 13.8 13.9 13.5 34 27 21 30 28 26 17 0 1 0 1 0 1 1 55 43 54 55 51 40 23 12 5 10 5 5 9 11 0 0 0 0 0 0 0 1 0 0 2 3 2 0 98 2004 23.5 30.6 16.1 0.00 13.0 27 1 42 14 0 0 2005 2006 23.0 30.3 16.2 0.00 13.5 33 0 43 6 0 2 2007 2008 2009 2010 2011 2012 2013 23.2 23.2 23.6 24.0 23.4 23.6 23.5 30.5 30.4 30.8 31.2 30.5 30.5 30.3 16.2 16.0 16.8 16.7 16.5 17.0 16.6 0.00 0.00 0.00 0.00 71.87 101.62 101.87 13.2 12.1 13.2 13.5 14.2 15.0 14.4 18 23 24 20 11 20 14 0 0 0 0 0 0 0 34 36 46 46 16 34 40 8 10 10 1 9 9 6 0 0 0 0 0 0 0 1 0 1 1 0 1 0 2014 23.8 30.4 17.3 0.00 14.5 30 1 47 8 0 1 City: Turaif Year T 1986 18.5 TM 26.3 Tm 11.3 PP 101.09 V 14.0 RA 51 SN 0 TS 9 FG 5 TN 0 GR 1 1987 1988 1989 1990 1991 19.1 19.5 20.5 20.0 19.7 26.8 25.8 26.5 26.4 26.3 11.4 11.4 11.0 11.7 11.5 0.00 0.00 0.00 184.94 38.36 13.9 13.7 12.5 12.8 13.5 35 42 24 35 31 0 0 0 1 1 8 7 6 4 4 8 5 5 7 3 0 0 0 0 0 3 3 1 2 0 1992 17.9 24.4 10.2 7.12 18.2 53 7 8 17 0 1 1993 1994 1995 1996 18.8 19.2 20.7 20.3 26.0 25.6 25.8 27.0 11.3 12.4 11.8 12.3 0.00 0.00 41.91 85.85 16.8 18.8 17.5 16.4 35 47 17 37 3 0 0 5 9 9 5 4 13 4 8 14 0 0 0 0 0 0 0 0 1997 1998 1999 2000 19.6 20.6 20.0 19.4 25.6 27.7 27.5 26.5 11.3 13.0 12.0 11.6 0.00 46.99 0.00 0.00 16.6 14.1 13.5 14.2 25 22 14 20 0 2 0 2 5 0 2 3 8 4 7 10 0 0 1 0 0 0 0 0 2001 2002 2003 2004 19.9 19.9 19.2 19.3 27.2 27.1 26.2 26.4 12.2 12.2 12.2 12.0 0.00 0.00 0.00 0.00 14.4 15.4 16.2 16.1 25 36 16 24 0 1 0 0 1 9 2 3 5 4 8 4 0 1 2 0 0 0 0 0 2005 2006 19.9 26.1 11.9 0.00 15.1 25 0 5 2 0 0 2007 19.4 26.3 12.3 0.00 15.9 20 0 5 1 0 0 99 2008 2009 2010 19.3 19.3 21.4 26.3 26.0 28.4 11.8 11.8 14.1 0.00 0.00 0.00 15.0 15.1 14.8 12 11 25 4 0 1 0 3 5 2 5 4 0 0 0 0 1 1 2011 2012 2013 2014 18.9 19.8 19.3 19.6 25.7 26.4 26.1 26.3 11.7 12.7 12.1 12.6 0.00 0.00 0.00 0.00 14.8 15.4 16.1 15.1 27 21 24 25 0 0 1 0 4 2 4 4 5 8 7 10 0 0 0 0 0 0 0 1 City: Wejh Year T 1986 24.8 TM 29.8 Tm 20.0 PP 0.00 V 16.9 RA 7 SN 0 TS 2 FG 3 TN 0 GR 1 1987 1988 1989 1990 1991 1992 25.1 25.4 24.4 25.0 25.0 24.0 30.0 30.5 29.8 30.1 30.0 28.8 19.8 20.0 19.5 19.7 20.0 18.7 0.00 4.06 0.00 0.00 0.00 0.00 17.5 18.3 17.7 17.6 17.5 17.4 9 10 11 8 19 18 2 0 0 0 0 0 4 7 5 4 6 2 4 3 7 3 5 4 0 0 0 0 0 0 2 3 3 3 1 2 1993 1994 1995 25.0 25.0 25.1 29.5 29.9 29.8 19.9 19.9 19.8 0.00 0.00 0.00 17.1 16.1 16.7 14 8 6 0 0 1 8 4 0 5 0 4 0 0 0 1 0 1 1996 25.3 29.9 19.8 0.00 18.3 10 2 5 7 1 0 1997 1998 1999 2000 24.8 25.7 25.6 24.8 29.3 30.2 30.4 29.4 19.5 20.5 20.2 19.3 2.03 0.00 0.00 1.02 16.6 15.8 16.2 15.5 13 7 7 7 2 0 0 0 7 3 2 0 7 3 1 1 0 0 0 0 0 0 0 0 2001 2002 2003 2004 25.4 25.3 25.6 25.4 30.1 30.5 30.7 30.5 20.3 20.3 20.5 20.1 0.51 1.02 1.02 1.02 15.3 15.2 14.2 14.7 8 5 5 5 0 1 0 0 3 2 2 1 5 2 0 7 0 0 0 0 1 0 0 0 2005 2006 25.2 30.1 20.1 8.89 14.9 3 0 2 8 0 0 2007 25.1 30.0 19.8 111.00 13.7 4 0 0 6 0 0 2008 2009 2010 25.5 25.3 27.1 30.5 30.5 32.2 20.3 20.1 22.0 79.24 196.85 67.06 12.6 14.2 13.6 8 7 6 0 0 0 1 3 4 2 2 6 0 0 0 0 0 0 2011 25.6 30.6 20.2 12.19 14.3 4 0 1 1 0 0 100 2012 26.1 31.2 20.8 32.75 13.8 5 0 3 1 0 0 2013 2014 26.4 26.7 31.2 31.0 21.2 21.7 28.19 71.12 15.2 15.2 7 7 0 0 3 6 2 1 0 0 0 0 City: Yanbo Year T 1986 26.7 1987 27.5 TM 33.9 34.9 Tm 20.5 20.8 PP 0.00 0.00 V 14.8 14.4 RA 4 2 SN 0 0 TS 2 2 FG 3 7 TN 0 0 GR 0 0 1988 1989 1990 28.0 27.3 27.1 35.1 34.4 34.6 21.2 20.8 20.2 8.89 139.20 1.02 15.2 11.7 11.8 5 9 3 0 1 0 0 7 2 1 10 12 0 0 0 0 1 1 1991 27.4 34.2 21.0 219.20 13.1 9 0 3 6 0 1 1992 1993 1994 1995 26.1 27.6 27.4 27.6 33.0 34.6 34.3 34.5 19.6 21.0 21.1 20.9 100.33 0.00 0.00 181.86 13.3 12.7 11.6 14.2 12 12 3 3 0 0 0 0 8 6 6 3 2 5 2 6 0 0 0 0 0 1 0 0 1996 1997 1998 1999 27.6 27.1 28.1 28.6 35.0 34.8 35.6 36.0 20.8 19.9 20.9 21.6 194.06 0.00 105.92 0.00 12.3 14.7 15.9 16.2 7 9 2 7 1 0 1 0 7 3 4 6 6 4 4 7 0 0 0 0 0 0 0 0 2000 2001 2002 2003 27.4 28.9 28.9 28.5 35.2 36.3 36.3 36.0 20.1 21.9 21.2 21.9 0.00 0.00 0.00 0.00 14.5 14.0 14.1 14.1 3 3 7 3 0 0 0 0 1 4 3 1 6 5 4 3 0 0 0 0 0 0 0 0 2004 2005 2006 28.6 28.6 37.1 35.9 21.3 21.7 0.00 0.00 13.6 13.4 4 2 0 0 2 3 4 6 0 1 0 0 2007 2008 27.7 28.3 34.8 34.7 21.1 22.0 0.00 6.10 12.3 13.3 2 4 0 0 0 2 6 5 0 0 0 0 2009 28.1 35.2 21.3 106.93 12.7 3 0 4 8 0 0 2010 2011 2012 2013 29.6 27.7 28.1 28.5 36.5 34.9 35.1 35.3 22.9 20.9 21.3 22.0 42.16 5.08 41.14 67.06 13.9 13.0 12.8 12.8 8 3 5 4 0 0 0 0 7 5 8 6 7 3 7 4 0 0 0 0 0 0 0 0 2014 29.0 35.5 22.5 35.81 13.3 7 0 6 4 0 0 101 City:Abha Year T 1986 17.6 1987 18.5 TM 24.7 25.4 Tm 11.7 12.3 PP 0.00 0.00 V 11.7 12.1 RA 35 49 SN 0 0 TS 39 51 FG 17 6 TN 0 0 GR 1 3 1988 1989 1990 1991 18.6 18.6 18.3 19.0 25.5 24.6 25.2 25.5 12.1 11.5 11.5 12.1 0.00 0.00 0.00 0.00 13.9 14.9 14.8 11.2 36 41 47 40 0 1 0 1 41 46 41 36 13 16 11 9 0 0 0 0 2 3 5 0 1992 1993 17.7 18.4 23.6 25.0 11.8 12.1 0.00 0.00 11.1 10.4 70 57 1 2 46 53 14 4 0 0 2 2 1994 1995 18.8 18.6 26.0 25.3 12.6 12.3 7.11 0.00 10.5 9.6 28 41 0 0 38 39 4 8 0 0 0 1 1996 1997 1998 1999 18.6 18.3 18.9 19.0 25.7 25.1 26.0 26.5 12.2 12.3 12.5 11.9 279.91 36.07 0.00 0.00 10.3 11.4 11.5 9.3 46 52 44 30 1 0 0 1 52 78 45 55 4 2 4 2 0 0 0 0 0 0 0 0 2000 2001 2002 2003 19.3 18.9 18.9 19.2 26.3 26.5 26.5 26.0 12.5 12.2 12.8 13.0 0.00 0.00 0.00 8.89 11.0 10.8 10.5 9.6 35 26 27 22 0 0 0 0 57 47 43 38 3 5 5 4 0 0 0 0 1 0 1 0 2004 2005 2006 18.9 19.1 25.8 25.9 12.8 13.1 0.00 0.00 10.9 12.6 28 37 0 0 55 65 3 2 0 0 0 2 2007 2008 2009 19.2 19.4 19.8 26.1 26.5 26.8 12.9 12.6 13.5 0.00 0.00 0.00 12.5 11.2 12.3 27 25 24 1 0 0 72 63 53 10 9 5 0 0 0 0 0 1 2010 2011 2012 19.5 19.3 19.7 26.6 26.2 26.5 13.1 13.4 13.8 0.00 0.00 0.00 11.1 11.5 11.8 40 37 26 0 0 0 82 73 55 4 6 9 0 0 0 1 3 1 2013 19.5 26.4 13.6 0.00 11.2 31 0 59 3 0 0 2014 19.8 26.7 13.9 171.97 11.1 18 0 51 5 0 1 City: Al Baha Year T 1986 21.7 1987 22.4 TM 28.7 29.2 Tm 16.2 16.9 PP 54.10 0.00 V 13.5 13.0 RA 38 50 SN 1 0 TS 42 40 FG 3 0 TN 0 0 GR 5 4 29.2 16.8 0.00 14.7 22 0 34 4 0 3 1988 23.6 102 1989 22.6 28.2 15.8 0.00 14.7 41 0 38 9 0 7 1990 1991 22.7 22.7 28.8 29.3 16.4 16.8 0.00 100.33 14.7 14.8 34 32 1 0 31 35 1 1 0 0 2 0 1992 1993 1994 1995 21.9 23.6 24.2 23.9 26.9 28.4 28.8 28.8 16.2 17.1 17.4 17.0 0.00 0.00 0.00 1.02 16.2 16.0 15.9 15.6 40 30 11 15 2 3 0 0 25 26 11 17 1 6 0 3 0 0 0 0 0 1 0 0 1996 1997 1998 23.5 24.5 26.1 29.8 29.7 30.8 16.8 16.4 16.7 0.00 25.91 3.05 11.9 9.8 13.8 48 52 28 1 0 1 45 63 35 4 1 3 1 1 0 2 0 0 1999 26.5 30.8 17.7 0.00 13.7 30 0 24 3 0 0 2000 2001 2002 2003 26.1 25.0 20.1 24.0 30.8 30.2 30.2 30.4 17.1 16.6 16.8 17.4 0.00 0.00 0.00 0.00 12.8 12.1 12.1 11.2 26 34 28 41 0 1 0 1 40 42 63 61 1 4 3 0 1 0 0 0 1 1 1 0 2004 2005 2006 23.9 23.0 29.8 29.4 17.0 16.9 0.00 0.00 11.3 10.2 38 32 0 0 68 43 6 2 0 0 0 0 2007 2008 2009 23.1 23.1 23.7 29.8 30.1 30.5 16.9 16.6 17.4 0.00 50.05 0.00 12.1 11.7 11.7 34 19 16 0 0 0 47 34 34 0 2 4 0 0 0 0 0 0 2010 2011 2012 2013 23.5 23.3 23.5 23.3 30.1 29.9 30.3 29.7 17.3 17.3 17.4 17.4 0.00 0.00 0.00 0.00 11.7 12.2 11.8 11.7 20 14 30 20 0 0 0 0 40 29 47 42 1 4 3 1 0 0 0 0 0 0 0 0 2014 23.4 29.7 17.5 0.00 10.8 20 0 31 1 0 0 City: Sakaka Year T 1986 21.1 1987 24.3 1988 23.6 1989 23.3 1990 22.5 1991 21.5 1992 20.5 TM 28.5 29.5 28.5 28.5 29.1 28.6 27.2 Tm 14.0 14.9 14.6 14.0 14.5 14.4 13.2 PP 0.00 0.00 0.00 0.00 0.00 168.91 0.00 V 13.3 13.8 13.5 12.3 13.1 14.2 15.7 RA 21 20 24 15 14 26 17 SN 0 0 0 0 1 0 3 TS 9 9 12 6 7 14 5 FG 1 1 0 3 1 4 4 TN 0 0 0 0 0 0 0 GR 1 1 1 0 2 0 0 103 1993 21.4 28.6 14.3 3.05 12.3 30 0 11 4 0 0 1994 1995 1996 1997 1998 1999 2000 2001 2002 21.8 22.9 23.6 22.8 24.5 25.1 24.4 24.8 24.7 28.6 28.8 29.9 28.3 30.5 30.0 28.9 29.8 29.5 14.9 14.7 15.1 14.2 16.0 15.7 15.4 15.2 15.2 66.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 13.9 13.4 15.1 16.2 14.1 13.7 15.1 16.8 16.7 31 19 23 19 15 18 13 10 16 0 1 0 1 0 0 0 0 0 14 6 8 8 0 3 6 2 8 4 5 5 3 0 4 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 2003 22.2 29.4 15.1 0.00 15.0 16 0 7 6 0 0 2004 2005 2006 22.1 22.7 29.2 29.6 14.9 15.7 59.94 94.24 13.2 13.7 13 11 0 0 7 11 4 0 0 0 0 0 2007 2008 2009 2010 2011 23.1 23.2 23.2 25.2 22.8 30.0 29.9 29.9 32.3 29.7 15.8 16.1 16.2 17.8 15.8 7.11 21.09 00.0 24.91 18.79 14.3 13.1 13.5 13.3 12.5 4 7 12 13 9 0 0 0 0 0 1 2 8 9 3 0 3 6 0 1 0 0 0 0 0 0 0 0 0 0 2012 2013 2014 23.8 23.1 23.7 30.3 29.7 30.2 17.0 16.5 17.0 13.98 109.97 0.00 12.9 14.6 13.8 10 17 24 0 0 0 9 10 17 2 1 1 0 0 0 0 0 0 City: Guriat Year T 1986 19.1 1987 19.4 TM 28.7 29.0 Tm 11.4 11.2 PP 0.00 0.00 V 15.3 15.5 RA 35 27 SN 0 1 TS 6 3 FG 1 4 TN 0 0 GR 2 0 1988 1989 19.3 19.3 28.3 28.3 11.4 10.5 0.00 3.05 15.9 15.6 43 19 2 1 14 1 4 6 0 0 3 1 1990 1991 19.2 20.1 28.4 27.9 11.0 12.1 4.58 36.57 15.8 16.1 37 31 1 0 6 4 1 5 0 0 0 1 1992 1993 1994 1995 19.2 19.9 21.6 19.7 26.6 28.0 28.5 28.0 10.3 11.0 12.4 11.2 0.00 330.20 0.00 2.03 17.6 16.3 18.0 15.3 38 39 26 19 2 1 0 4 3 14 12 6 3 5 3 7 0 0 0 0 1 0 1 0 1996 20.3 29.1 11.4 0.00 15.5 20 1 4 5 0 0 104 1997 19.6 27.9 11.2 0.00 14.3 27 0 8 1 0 0 1998 1999 20.9 20.2 29.6 29.2 12.5 11.7 0.00 0.00 14.1 13.7 27 11 0 0 0 1 5 4 0 0 0 0 2000 2001 20.5 20.9 28.6 29.2 10.7 11.8 0.00 0.00 14.8 14.0 17 19 1 0 1 5 1 6 0 0 0 0 2002 20.8 29.1 11.9 0.00 15.1 28 1 8 2 0 0 2003 2004 2005 2006 19.6 19.6 20.0 28.8 28.8 28.0 11.8 11.8 11.6 0.00 2.03 0.00 15.1 15.1 14.0 20 28 20 0 0 0 5 3 6 10 5 7 0 0 0 0 0 0 2007 20.0 28.6 11.8 0.00 15.8 23 0 7 1 0 0 2008 2009 2010 2011 20.0 20.2 21.1 19.7 28.7 27.2 29.3 28.2 11.7 11.9 13.8 11.8 78.23 0.00 53.84 0.00 15.8 14.3 15.1 14.9 14 14 13 18 0 0 0 0 0 1 5 2 3 4 1 1 0 0 0 0 0 1 2 0 2012 2013 2014 20.8 20.2 20.4 29.2 28.4 28.1 13.0 12.5 12.7 0.00 37.82 124.98 15.2 15.5 14.8 21 15 18 0 0 0 2 11 10 3 2 4 0 0 0 0 0 0 City: Arar Year T 1986 21.6 TM 29.5 Tm 14.8 PP 0.00 V 10.4 RA 33 SN 0 TS 10 FG 1 TN 0 GR 1 1987 1988 1989 21.8 21.3 21.7 29.4 28.2 28.8 15.1 14.4 14.7 3.05 12.95 2.03 10.1 13.3 11.2 30 33 15 0 0 0 13 12 4 1 3 2 0 0 0 0 1 0 1990 1991 1992 1993 22.1 21.8 20.1 21.7 28.6 29.0 27.2 28.8 14.7 15.0 13.3 14.5 0.00 0.00 4.06 4.07 11.1 10.2 9.7 12.0 13 29 31 27 0 0 3 0 3 9 4 9 1 2 8 9 0 0 0 0 0 0 1 0 1994 1995 1996 1997 21.8 23.8 24.2 22.9 29.2 28.9 29.7 28.2 15.1 14.9 15.4 14.3 1.02 0.00 0.00 0.00 10.8 12.6 11.6 12.4 30 11 16 15 1 1 1 0 7 5 1 10 7 3 7 9 0 0 0 0 0 0 0 0 1998 1999 2000 25.5 25.1 24.7 30.9 30.6 29.8 15.1 15.1 14.5 0.00 0.00 0.00 14.2 14.5 14.6 13 14 10 1 0 0 0 6 7 0 3 3 0 0 0 0 0 0 105 2001 25.1 30.7 15.2 0.00 14.6 13 0 8 6 0 0 2002 2003 2004 2005 2006 25.2 22.7 22.4 23.7 30.7 30.7 30.8 30.5 15.2 15.2 14.6 15.5 0.00 0.00 0.00 0.00 14.5 12.5 15.0 15.1 13 12 23 15 0 0 0 0 8 5 10 15 5 10 5 2 0 0 0 0 0 0 0 0 2007 24.1 30.6 15.4 0.00 16.1 8 0 5 1 0 0 2008 2009 2010 23.4 22.9 24.8 31.0 31.0 33.0 15.3 15.3 17.0 70.10 0.00 0.00 12.2 15.6 14.7 6 12 15 1 0 0 3 4 5 3 3 0 0 0 0 0 0 0 2011 22.5 29.6 15.2 0.00 14.5 19 0 8 1 0 0 2012 2013 2014 23.5 22.9 23.5 30.7 30.0 30.7 16.2 15.8 16.4 16.51 0.00 0.00 14.7 15.1 14.3 15 30 37 0 0 0 2 10 21 4 6 4 0 0 0 0 0 0 City: Buraydah Year T TM 1986 24.5 32.3 Tm 17.8 PP 0.00 V 15.1 RA 40 SN 0 TS 17 FG 3 TN 0 GR 1 1987 1988 25.9 25.4 33.4 32.7 18.6 18.1 0.00 0.00 14.6 16.4 35 29 0 2 17 14 1 4 0 0 1 1 1989 24.6 32.2 17.5 0.00 16.5 36 1 13 10 0 0 1990 1991 1992 1993 25.7 25.2 23.9 24.8 32.8 32.5 30.2 31.6 17.9 18.2 16.6 17.7 0.00 0.50 0.00 0.00 16.2 15.7 15.2 14.4 30 31 30 42 0 0 0 0 9 10 9 17 2 6 11 10 0 0 0 0 0 0 1 0 1994 1995 1996 1997 25.0 25.1 26.0 24.7 31.9 32.2 33.3 31.9 17.9 17.4 18.4 17.3 57.92 261.11 0.00 307.08 13.1 12.3 12.9 12.9 40 43 34 40 0 4 2 3 24 15 16 19 6 8 12 13 0 0 0 0 2 0 1 1 1998 1999 2000 2001 26.5 26.3 25.7 26.2 34.1 34.0 33.0 33.9 18.6 18.5 18.1 18.1 60.96 0.00 0.00 0.00 10.1 10.1 11.3 12.4 26 34 34 31 0 1 0 0 8 16 20 17 3 2 12 6 0 0 0 0 1 0 0 0 2002 2003 2004 26.2 26.0 25.6 33.8 33.5 33.1 18.5 18.7 18.1 0.00 0.00 156.46 12.3 12.2 11.3 28 26 24 0 0 1 10 8 13 4 6 7 0 0 0 0 0 0 106 2005 2006 25.9 33.3 18.4 0.00 11.8 29 0 11 2 0 0 2007 26.0 33.5 18.5 0.00 12.5 16 0 6 1 0 0 2008 2009 2010 2011 26.0 25.9 27.3 25.5 33.6 33.6 35.3 33.3 18.3 18.1 19.3 17.6 0.00 0.00 0.00 57.17 12.2 11.6 11.0 11.4 15 26 18 19 0 0 0 0 3 8 14 9 0 7 0 1 0 0 0 0 0 0 0 0 2012 2013 2014 26.3 25.8 26.4 33.9 33.2 33.3 18.7 18.1 18.8 197.62 0.00 0.00 11.0 12.5 11.3 16 27 29 0 0 0 8 16 16 9 6 3 0 0 0 0 0 0 Tm 17.0 18.2 17.3 15.8 16.5 16.9 15.4 PP 0.00 0.00 0.00 4.06 21.08 137.15 514.10 V 11.4 11.8 11.9 10.7 10.5 10.1 9.2 RA 25 16 16 15 12 30 30 SN 0 0 0 0 1 0 0 TS 5 11 4 14 6 11 16 FG 2 3 2 3 2 7 8 TN 0 0 0 1 0 0 0 GR 0 0 1 4 2 0 0 City: Alqasim (Unayzah) Year T TM 1986 24.2 31.4 1987 25.9 32.9 1988 25.1 32.2 1989 23.9 31.2 1990 24.8 32.4 1991 24.8 31.7 1992 23.1 30.3 1993 1994 1995 1996 1997 1998 1999 2000 2001 24.5 25.0 25.0 25.6 24.7 26.2 26.1 25.5 26.0 31.7 32.1 32.1 33.1 32.0 34.1 34.1 33.5 34.5 16.4 16.9 16.9 17.5 16.7 17.5 17.6 17.4 17.4 0.00 92.97 0.00 2.03 25.91 0.00 0.00 0.00 0.00 6.7 9.5 10.4 9.6 9.8 9.7 8.9 9.9 12.7 40 23 27 25 18 20 13 17 16 0 0 0 1 2 1 0 0 0 19 17 17 12 14 13 8 15 11 6 1 14 6 7 1 4 14 6 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 2002 2003 2004 2005 2006 26.1 25.9 25.5 25.6 34.2 34.2 33.7 33.3 17.3 17.1 16.8 17.7 0.00 0.00 130.30 211.34 12.6 12.6 11.8 11.5 26 17 21 21 0 0 0 0 8 14 17 20 4 2 6 0 0 0 0 0 0 0 0 0 2007 2008 25.9 25.7 34.2 33.4 17.5 17.0 13.98 92.48 11.4 10.9 7 13 0 1 3 8 2 1 0 0 0 0 107 2009 26.1 34.2 17.7 280.95 11.0 16 0 14 5 0 0 2010 2011 2012 2013 2014 26.8 26.0 26.7 26.3 27.0 35.2 33.6 34.4 33.6 33.6 18.4 18.1 18.5 18.3 19.5 0.00 0.00 0.00 0.00 0.00 9.9 11.4 11.3 12.0 11.9 17 28 17 16 11 0 0 0 0 0 21 22 22 20 23 0 1 2 1 1 0 0 0 0 0 1 1 0 0 0 City: Dahran Year T 1986 26.3 TM 33.1 Tm 20.2 PP 0.00 V 14.7 RA 36 SN 1 TS 13 FG 11 TN 0 GR 0 1987 1988 1989 1990 1991 1992 26.7 26.6 26.0 26.5 25.7 25.0 33.6 33.3 32.7 33.4 32.1 31.7 20.2 20.4 19.6 20.2 20.2 19.2 0.00 0.00 0.00 0.00 0.00 0.00 14.2 16.1 16.2 16.1 15.8 15.8 27 35 27 22 37 46 0 1 0 0 1 0 5 11 11 5 14 12 18 11 10 11 15 13 0 0 1 0 0 0 0 0 3 1 1 2 1993 1994 1995 26.0 26.7 26.2 32.9 33.3 32.7 20.0 20.7 20.2 37.08 522.22 0.00 15.9 16.2 14.9 34 34 41 0 0 0 12 8 8 15 8 15 0 0 0 0 1 0 1996 1997 1998 1999 26.8 26.3 27.4 27.6 33.3 32.7 34.4 34.6 20.8 20.2 21.0 21.2 0.00 75.95 0.25 0.00 14.4 15.4 15.5 16.3 30 42 19 16 0 2 1 0 8 16 4 9 19 12 13 8 0 1 0 0 0 0 0 1 2000 2001 2002 2003 26.8 27.2 27.1 26.8 33.7 34.5 34.1 33.8 20.2 20.3 20.1 20.6 0.00 0.00 0.00 159.00 17.8 17.2 17.2 14.5 18 11 22 29 0 0 0 0 9 2 8 19 11 19 4 16 0 0 0 0 0 0 0 0 2004 2005 2006 26.8 27.7 33.7 34.9 20.2 21.0 173.99 0.00 14.5 15.4 34 33 0 0 15 14 6 8 0 0 0 0 2007 27.9 35.7 20.9 0.00 15.3 24 0 15 20 0 0 2008 2009 2010 27.8 28.0 27.9 35.7 35.5 35.4 20.7 21.1 21.2 24.13 0.00 115.56 15.6 15.8 14.3 13 26 16 0 0 0 7 15 13 16 8 10 0 0 0 0 0 1 2011 2012 27.0 27.2 33.8 34.3 20.4 20.6 0.00 0.00 15.6 14.7 26 26 0 0 8 9 7 9 0 0 0 0 108 2013 27.0 33.8 20.4 0.00 15.4 24 0 17 15 0 0 2014 27.6 34.1 21.1 0.00 15.8 21 0 10 12 0 0 City: Al Ahsa Year T 1986 28.1 1987 27.0 TM 33.9 34.6 Tm 19.6 19.3 PP 0.00 0.00 V 16.8 15.0 RA 15 19 SN 0 2 TS 4 6 FG 3 2 TN 0 0 GR 0 2 1988 1989 1990 1991 27.7 27.9 27.6 26.0 33.9 33.6 34.4 33.7 19.1 18.3 19.1 18.8 0.00 0.00 0.00 0.00 16.1 17.2 17.2 14.9 23 16 16 30 1 0 1 1 13 9 10 8 7 4 7 10 0 0 0 0 0 2 1 0 1992 1993 1994 1995 25.5 26.8 27.2 26.7 33.0 34.4 34.8 34.0 18.0 18.8 19.3 18.5 27.18 0.00 1.02 0.00 14.9 14.5 15.0 13.8 29 31 14 32 0 0 1 0 13 20 6 11 5 6 3 4 0 0 0 0 2 0 0 0 1996 1997 1998 1999 26.9 26.5 27.8 27.9 34.6 34.0 36.1 36.3 19.0 19.1 19.7 19.9 0.00 36.08 0.00 0.00 11.9 13.1 9.8 9.8 23 27 10 11 0 2 0 0 7 16 6 3 2 5 4 5 0 0 0 0 0 1 0 0 2000 2001 2002 2003 27.2 27.7 27.6 27.6 35.4 36.4 36.5 35.8 19.3 19.5 19.4 20.7 0.00 0.76 0.00 0.00 9.4 9.7 9.7 12.6 13 7 24 23 0 0 1 0 4 4 2 13 5 2 5 7 0 0 0 0 0 0 0 1 2004 2005 2006 27.4 27.8 35.7 35.5 20.2 20.6 0.00 70.62 12.0 11.7 24 26 0 0 8 8 7 3 0 0 0 0 2007 27.7 27.5 35.7 35.4 20.4 20.1 58.69 18.80 10.9 8.6 18 12 0 0 10 1 2 3 0 0 1 0 2008 2009 27.8 35.7 20.7 0.00 10.0 20 0 9 6 0 0 2010 2011 2012 2013 28.5 27.4 28.0 27.6 36.9 35.3 35.9 35.3 21.0 20.1 20.4 20.1 0.00 154.17 91.44 124.96 10.2 14.3 15.2 14.9 20 17 21 20 0 0 0 0 11 10 10 14 5 8 4 6 0 0 0 0 0 0 0 0 2014 28.0 35.6 20.3 0.00 13.7 22 0 12 6 0 0 109 City: Khamis Mushait Year T TM 1986 18.5 25.9 1987 19.5 27.0 Tm 12.9 13.4 PP 0.00 0.00 V 10.1 8.5 RA 47 59 SN 1 0 TS 35 35 FG 2 1 TN 1 0 GR 2 3 1988 1989 1990 1991 19.7 19.1 19.2 19.8 27.7 26.3 26.7 27.1 13.6 13.2 13.3 14.0 1.02 36.07 0.00 16.00 8.5 8.6 8.2 9.5 54 52 48 44 0 1 1 0 25 29 32 41 1 4 3 0 0 0 0 0 6 7 4 0 1992 1993 18.6 19.3 25.4 26.4 13.4 13.3 0.00 0.00 13.8 10.5 66 58 0 1 45 48 2 3 0 0 3 2 1994 1995 19.7 19.6 26.9 26.5 13.5 13.3 0.00 0.00 10.5 11.3 41 44 0 1 38 34 0 0 0 0 0 0 1996 1997 1998 1999 19.7 19.4 20.0 20.1 26.8 26.4 27.6 28.1 13.3 13.6 13.7 13.3 25.91 55.88 0.00 0.00 11.1 10.2 11.5 11.5 47 60 47 32 1 0 1 1 40 66 46 33 1 4 0 1 0 0 0 0 1 1 1 3 2000 2001 2002 2003 20.3 20.0 20.1 20.6 28.1 27.8 27.9 28.0 13.5 13.1 13.1 13.9 0.00 0.00 0.00 7.11 11.1 11.8 11.9 12.5 31 32 29 20 0 0 0 0 43 36 34 29 2 0 3 3 0 0 0 0 0 1 1 1 2004 2005 2006 20.3 20.1 27.5 27.4 13.6 13.6 0.00 0.00 11.9 10.2 36 41 0 0 50 45 3 3 0 0 1 1 2007 2008 2009 20.0 20.2 20.5 27.4 27.8 27.9 13.5 13.4 13.7 0.00 0.00 30.49 10.1 9.2 10.6 34 30 14 0 0 0 49 43 24 2 5 0 0 0 0 1 0 0 2010 2011 2012 20.2 20.0 20.3 28.0 27.8 28.0 13.3 13.0 13.0 0.00 0.00 0.00 11.1 11.5 11.2 30 30 28 0 0 0 60 39 43 0 2 1 0 0 0 1 0 0 2013 20.2 27.2 14.0 0.00 10.9 28 0 40 0 0 0 2014 20.3 27.2 14.2 0.00 11.3 26 0 36 0 0 0 City:Jeddah Year T 1986 28.1 TM 34.5 Tm 22.1 PP 10.92 V 12.6 RA 3 SN 0 TS 3 FG 3 TN 0 GR 1 35.0 35.1 22.3 22.7 0.00 0.00 12.5 12.5 15 11 0 1 8 5 2 3 0 0 0 1 1987 1988 28.4 28.5 110 1989 1990 1991 1992 27.7 27.6 27.9 26.8 34.1 33.9 33.8 33.2 22.0 21.6 22.5 21.3 224.03 1.02 0.00 0.00 13.3 13.4 14.0 14.1 9 6 12 15 0 0 0 0 8 2 9 13 2 8 6 2 0 1 0 0 0 0 1 1 1993 1994 1995 1996 28.1 28.5 28.4 28.3 34.0 34.8 34.8 34.7 22.6 23.0 22.8 22.7 0.00 0.00 0.00 0.00 14.7 14.1 13.2 14.2 15 2 3 14 1 1 0 1 6 2 3 21 2 0 4 5 0 1 0 0 0 0 0 0 1997 1998 27.9 28.6 34.5 35.2 22.2 22.7 0.00 0.00 12.8 12.2 6 7 1 2 7 9 3 0 0 0 0 0 1999 2000 28.7 28.1 35.5 34.8 23.0 22.6 0.00 0.00 11.6 13.3 6 3 2 0 7 9 2 5 0 0 0 0 2001 2002 2003 2004 28.6 28.7 28.8 28.5 35.4 35.4 35.4 35.2 23.0 23.0 23.2 22.7 0.00 0.00 0.00 0.00 13.8 13.7 12.8 12.9 7 4 8 5 1 0 0 0 8 4 8 6 2 3 3 1 0 0 0 0 0 0 0 0 2005 2006 28.7 35.2 23.1 0.25 13.8 3 0 6 5 0 0 2007 28.7 35.2 23.0 3.05 13.4 1 0 1 5 0 0 2008 2009 2010 28.9 28.9 29.5 35.5 35.4 36.4 23.3 23.4 23.7 134.87 196.33 101.34 12.4 12.1 10.3 7 8 9 0 0 0 7 12 14 1 2 2 0 0 0 0 0 0 2011 2012 2013 2014 28.5 29.2 29.3 29.6 34.6 35.1 34.3 34.7 22.9 23.8 24.6 25.0 80.77 24.63 1.52 48.53 12.9 13.4 13.4 13.4 10 5 2 9 0 0 0 0 10 10 7 12 2 1 0 1 0 0 0 0 0 0 0 0 City: Bishah Year T 1986 24.9 TM 32.6 Tm 16.4 PP 0.00 V 8.0 RA 17 SN 0 TS 18 FG 1 TN 0 GR 0 33.0 33.0 32.1 32.8 33.7 32.2 17.0 16.9 15.3 16.6 17.0 16.2 0.00 0.00 0.00 264.92 73.92 134.62 8.3 9.1 10.5 10.3 10.2 7.8 29 19 26 16 21 32 0 0 0 0 0 0 15 15 15 12 17 19 1 1 2 3 1 1 0 0 0 0 0 0 1 0 3 2 1 3 1987 1988 1989 1990 1991 1992 25.6 25.8 26.6 27.3 26.1 24.6 111 1993 1994 1995 1996 25.7 26.6 27.5 26.5 32.9 33.7 33.6 33.7 16.6 17.4 17.1 17.0 7.12 0.00 0.00 190.00 8.1 8.9 8.6 8.0 21 8 14 20 0 0 0 1 16 14 19 23 0 0 6 0 0 0 0 0 2 0 1 0 1997 1998 1999 25.5 27.3 28.0 33.3 34.3 34.5 17.2 17.6 17.5 29.97 26.92 0.00 7.7 8.0 9.7 40 21 8 2 2 0 46 20 12 0 1 0 0 0 0 0 0 0 2000 2001 28.2 27.7 34.7 34.4 17.9 17.3 0.00 0.00 9.6 11.7 11 12 0 0 14 18 0 0 0 0 0 0 2002 2003 2004 27.8 28.5 26.7 34.5 34.9 34.1 17.4 18.3 17.5 0.00 0.00 0.00 10.6 10.5 8.4 9 9 13 0 0 0 11 15 17 0 0 1 0 0 0 0 0 1 2005 2006 26.2 34.1 17.1 0.00 6.7 22 0 17 0 0 0 2007 26.4 34.2 17.6 33.51 7.0 7 0 21 1 0 0 2008 2009 2010 26.2 27.1 26.4 34.0 34.9 34.8 17.6 18.9 18.0 31.50 148.85 0.00 7.9 7.6 7.5 13 6 19 0 0 0 19 11 18 0 2 0 0 0 0 0 0 0 2011 2012 2013 2014 26.6 26.1 25.6 25.9 34.7 34.3 33.4 33.5 18.6 17.5 17.6 17.7 40.13 54.37 191.76 0.00 7.0 7.3 6.6 8.9 9 14 13 14 0 0 0 0 11 22 24 17 1 0 1 0 0 0 0 0 0 0 0 0 112