Chapter 1
INTRODUCTION

1.1 World Natural Rubber Industry
Natural rubber (NR) is produced from the latex of rubber trees (Hevea brasiliensis). According to the United Nations Conference on Trade and Development (UNCTAD), NR's elasticity, toughness and resilience make it a commercially important input for a wide variety of products in the transportation, industrial, consumer and medical sectors. In 2010, more than 90 percent of the global supply of NR was produced by the Association of Natural Rubber Producing Countries (ANRPC), which consists of Thailand, Indonesia, Malaysia, India, Vietnam, China, Sri Lanka, the Philippines and Cambodia. ANRPC supplied 9.422 million metric tons (MT) of NR to the world market in 2010.

Malaysia is one of the main NR producers and consumers. It was the third-ranked NR producer in 2010, with 970,000 MT, slightly more than 10 percent of global NR output (Table 1.1). According to the ANRPC, the country was expected to increase its production to 1.05 million MT in 2011. The ANRPC also estimated that Malaysia consumed five percent of the total NR supply in 2010. The Malaysian market is therefore one of the main commercial hubs of the natural rubber industry (Burger and Smit 2002). This study analyzes the price of Standard Malaysian Rubber No. 20 (SMR20) because it is the most commonly traded grade of NR.

NR is a vital source of income for the region, especially in Thailand, Malaysia, Indonesia, Vietnam, India and Cambodia. The International Rubber Research and Development Board (IRRDB) estimated that approximately 20 million families depend on this commodity for their basic income. Most farms are small, averaging two hectares (4.94 acres) planted with Hevea. Large estates cover at least 300 hectares (741.31 acres) and are commonly owned by states and private enterprises.
In Malaysia, 94.5% of total NR production comes from smallholdings, while the estate sector accounts for only 5.5%.

Table 1.1 Supply of Natural Rubber in 2009 and 2010
Country        2009 ('000 MT)   2010 ('000 MT)   Percentage Change
Thailand       3,164            3,072            -3%
Indonesia      2,440            2,843            14%
Malaysia       857              970              12%
India          820              845              3%
Vietnam        724              750              3%
China          643              647              1%
Sri Lanka      137              148              7%
Philippines    98               102              4%
Cambodia       35               45               22%
Total          8,918            9,422            5%
Source of Data: The Association of Natural Rubber Producing Countries

NR is traded between producing and consuming countries, such as China, Singapore, European countries, Japan and the United States, through brokers using futures contracts (Romprasert 2009). Futures markets give local traders and producers a choice of trading options, such as when to buy, sell, or stock NR. A better understanding of what determines prices is therefore necessary to make sound decisions when hedging risk and speculating in this commodity futures market. Because prices vary, hedging protects investments against losses by taking offsetting (balancing or compensating) positions in the futures market.

1.2 The Price of Natural Rubber
Natural rubber is technically an intermediate good. It is used as an input to produce a variety of final consumer goods, such as medical gloves, condoms, general rubber goods (GRG), and tires (Table 1.2). NR is predominantly demanded by the manufacturing sector. In particular, the tire and automotive industries play a key role in the growth of NR demand (Burger and Smit 1989). Today, more than 65 percent of NR is used in the tire and automobile industries (ANRPC 2010). Table 1.2 indicates that the share of NR in the tire and GRG sectors has been decreasing. The economic reason is that synthetic rubber is being substituted for increasingly costly natural rubber.
Table 1.2 Percentages of NR Used in the Tire and GRG Sectors, 2003 to 2010
Period   NR-Share in Tire Sector (%)   NR-Share in GRG Sector (%)
2003     76                            79
2004     76                            79
2005     76                            79
2006     73                            78
2007     72                            78
2008     73                            77
2009     71                            76
2010     68                            75
Source of Data: The Association of Natural Rubber Producing Countries

Like most agricultural commodities in the world market, the price of NR fluctuates over time (Figure 3.1). From January 2000 to mid-2008, the price of natural rubber increased by around 4.5 percent annually. In the fourth quarter of 2008, the price of SMR20 plunged because the global economic downturn reduced demand for NR. Since Hevea trees are grown primarily in developing countries, the price of NR has an important effect on poverty in the region. Global economic factors are viewed as the main determinants of NR prices. For example, the United States economy grew at an annual rate of 2.6 percent in the third quarter of 2010, slightly faster than in the second quarter. This growth gave the economy a significant boost and, through stronger U.S. consumer demand, signaled a jump in global NR prices. The economic recovery in the United States and elsewhere continues to boost commodity prices through speculative investment (ANRPC 2010).

1.3 Problem Statement
Because prices are highly volatile, it is important to forecast future spot prices of natural rubber accurately using statistical methods. Current production decisions depend heavily on prevailing futures prices (Feder et al. 1980; Allen 1994). Accurate price forecasting has therefore become even more critical for the producers, traders and consumers involved in the NR industry. Several econometric techniques are used in price forecasting. Agricultural economists have done extensive work forecasting the prices of many specific agricultural commodities, yet few studies focus on forecasting NR prices.
Romprasert (2009) applied several forecasting models to the futures price of Thai ribbed smoked sheet no. 3 (RSS3): regression analysis, exponential smoothing, Holt's linear exponential smoothing, and Box-Jenkins models. Khin et al. (2011) used autoregressive integrated moving average (ARIMA) and multivariate autoregressive moving average (MARMA) models to forecast the futures prices of SMR20 over the period January 1990 to December 2008. Burger et al. (2002) used a vector error correction (VEC) model to analyze the relationship between the price of NR and exchange rates during the 1997 Asian financial crisis.

Three econometric models are employed in this analysis: ARIMA, vector autoregressive (VAR) and VEC. The ARIMA model is a form of extrapolation. It forecasts future values from the past behavior of prices, together with other influences captured in the error terms. ARIMA models have no explanatory variables other than the past history of the variable in question; that is, ARIMA is a univariate forecasting method. The VAR model, by contrast, involves N variables; technically, it extends univariate time series analysis to multivariate time series. Economically, an event may be driven by several time series variables at once. Moreover, these variables may be correlated not only contemporaneously but also through their past values. Analyzing multiple time series jointly therefore uses additional information to determine the dynamic relationships among the series over time (Stock and Watson 2011).

1.4 Objectives and Overview
The objective of this thesis is to develop univariate and multivariate time series models (ARIMA, VAR and VEC) for NR prices and to evaluate their forecasting accuracy.
The models are based on the average monthly spot price of SMR20, the most commonly used specification in the industry. Economically, other variables also influence prices. Most importantly, the multivariate analysis uses the average monthly spot price of crude oil and the average monthly exchange rate between the Malaysian ringgit (MYR) and the United States dollar (USD). Before estimation, all variables are converted to logs and tested for stationarity. Any nonstationary series is differenced. If the variables in the forecasting models are integrated of the same order, tests for cointegration are then conducted. If cointegration is found, the multivariate analysis uses a VEC model instead of a VAR model. Once the specific models are determined, their forecasting ability is evaluated by comparing the forecast errors of each model. In essence, this thesis carefully studies the historical NR data to arrive at appropriate econometric models for short-term forecasts of NR prices.

1.5 Contributions of the Study
Besides palm oil, rice and cassava, NR is one of the most important commodities in Southeast Asia, since it plays a major role in improving socioeconomic conditions throughout the region. Millions of families' incomes depend on the price of natural rubber. Consequently, unstable NR prices pose a significant risk to producers, traders, consumers and others involved in NR production. This study therefore provides valuable information for planning and decision making in NR production. As a producer and distributor of natural rubber in Cambodia, I can adopt these forecasting methods to develop a market strategy for working with local farmers and improving the well-being of NR producers in Cambodia.
Furthermore, this study can be introduced to the Department of Rubber Development in Cambodia, where policies can be implemented to maximize the welfare of rubber farmers.

Chapter 2
LITERATURE REVIEW

2.1 Introduction
Forecasting uses past data to predict future values. Since natural rubber is a storable intermediate good, current production depends heavily on future prices. In essence, forecasts provide complementary information that aids in implementing policies and making decisions about future events.

2.2 Methodology Reviews
The ARIMA model deals with univariate time series data and combines an autoregressive (AR) and a moving average (MA) component. An AR process depends on a weighted sum of its past values and a random disturbance term. An MA process depends on a weighted sum of current and lagged random disturbances. A nonstationary time series can be differenced (integrated) one or more times to make it stationary. The stationary ARIMA process is therefore a combination of lagged values and lagged random disturbances, together with a current disturbance term (Pindyck and Rubinfeld 1998).

For short-term forecasting, univariate time series models frequently outperform sophisticated structural models (Harvey and Todd 1983). Khin (2010) developed multiple forecasting models, including ARIMA, to predict the short-term future price of natural rubber in Malaysia, and generated short-term ex ante and ex post forecasts. Khin et al. (2008) fitted the historical prices of SMR20 over the period January 1990 to December 2006 with a time series model. Using Box-Jenkins methods, they concluded that the data suit an ARIMA(1,1,1) model. The resulting forecasts were statistically satisfactory.
Harvey and Todd (1983) suggested that univariate time series forecasts of diesel prices were reasonably accurate over short horizons, but that long-term prices were much harder to forecast. Forecasts and statistical results become somewhat inconsistent when few data points are available and when the forecast horizon is extended (Bailey and Gupta 1999). Accurate forecasting requires a long series of observations; at least 50 observations are generally recommended to obtain good results (Meyler, Kenny and Quinn 1998). Zant (1994) studied the Indian rubber market, focusing on explaining short-run prices and stocks over the period 1978 to 1990. He concluded that forecasting performance on the historical time series was not impressive because of enormous price fluctuations.

Theoretically, several economic variables influence NR prices. According to UNCTAD, synthetic rubber, a close substitute for NR in manufacturing most industrial products, can be processed from crude oil. It is therefore important to include the price of crude oil in the analysis in order to reduce forecasting errors. Fluctuations in the exchange rates of NR-producing countries are also one of the main factors influencing NR prices (ANRPC 2010). Sims (1980) used and promoted VAR models as a method of estimating economic relationships. Accordingly, exchange rates and oil prices are used as additional predictors in the VAR or VEC model. A set of interacting variables calls for multivariate models, which take multiple variables and their interactions into account. Burger et al. (2002) indicated that exchange rates have affected the price of NR. The price of crude oil and the exchange rate between the MYR and the USD therefore play a key role in changes in the price of NR in Malaysia.
Frank and Garcia (2010) used VAR and VEC procedures to determine the linkages among several commodity prices, oil prices and exchange rates. Their study suggested that agricultural commodity markets depend more on the exchange rate and, to a lesser extent, on oil prices. Both variables are used in this thesis.

Chapter 3
FORECASTING DATA

3.1 The Data
This empirical analysis uses four time series:
- the average monthly spot price of SMR20;
- the average monthly spot price of crude oil;
- the average monthly exchange rate between the MYR and the USD;
- the estimated end-of-month sales of motor vehicles in the United States.

The data are drawn from three sources: the Malaysia Rubber Board (MRB), Bank Negara Malaysia (the Central Bank of Malaysia) and the United States Department of Commerce. Each series has 141 monthly observations covering January 2000 to September 2011. The prices of SMR20 and crude oil are monthly averages of spot prices, in USD per MT and USD per barrel, respectively. The estimated sales of motor vehicles in the U.S. are total end-of-month retail sales. For convenience, I denote the following as PSMR20, PCO, TSMV and EXM, respectively: the price of SMR20, the price of crude oil, total sales of motor vehicles in the U.S., and the average monthly exchange rate between the MYR and the USD.

3.2 Forecasting Periods
To evaluate the out-of-sample forecasting ability of the models, the observations at the end of the sample period are not used in estimation. The analysis therefore has two periods:
- an in-sample period, January 2000 to December 2009 (120 observations);
- an out-of-sample period, January 2010 to September 2011 (21 observations).

The in-sample series are used to estimate the forecasting models, and the out-of-sample forecasts are checked against the actual data.
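The in-sample/out-of-sample split described above can be sketched in a few lines of standard-library Python, together with the comparison of forecast errors mentioned in Section 1.4. The thesis does not name its error measures at this point. This sketch therefore assumes three common ones: root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The function names split_sample and forecast_errors are hypothetical, and the placeholder series is not the SMR20 data.

```python
import math

N_IN_SAMPLE = 120  # Jan 2000 to Dec 2009; the remaining 21 months are held out


def split_sample(series, n_in=N_IN_SAMPLE):
    """Split a monthly series into in-sample and out-of-sample parts."""
    return series[:n_in], series[n_in:]


def forecast_errors(actual, forecast):
    """Return (RMSE, MAE, MAPE in percent) for paired actual and forecast values.

    The choice of these three measures is an assumption of this sketch.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Usage with a placeholder series of 141 monthly values (not the thesis data).
in_sample, out_sample = split_sample(list(range(1, 142)))
print(len(in_sample), len(out_sample))  # 120 21
```

Because MAPE divides by the actual values, it is only meaningful for strictly positive series such as prices.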
Figure 3.1 shows that the price of SMR20 plummeted during the fourth quarter of 2008. The global economic downturn at that time reduced demand for commodities, including NR, and was likely the primary reason for this large decline in NR prices. Auto sales are therefore included, along with oil prices and exchange rates, to account for macroeconomic factors.

Figure 3.1 Average Monthly Spot Prices of SMR20 (Jan. 2000 to Sep. 2011) [Figure: price in USD/MT over time]
Source: Malaysia Rubber Board

3.3 Descriptive Statistics
Descriptive statistics are presented in Table 3.1. Panel A covers January 2000 to December 2009 and Panel B covers January 2010 to September 2011. Over January 2000 to December 2009, the mean of PSMR20 is 1,406.52 USD/MT and the standard deviation is 703.40 USD/MT. PSMR20 ranged from a minimum of 503.80 USD/MT to a maximum of 3,183.10 USD/MT. The coefficient of variation (the standard deviation divided by the mean) is slightly over 50 percent, indicating that the price of SMR20 varied widely over this period. Between January 2010 and September 2011, volatility relative to the mean was less than one-half of what it had been before January 2010.

Table 3.1 Descriptive Statistics for PSMR20 (USD/MT)
Panel A: January 2000 to December 2009 (120 observations)
Mean       SD       Min        Max        Coef. of Variation (%)
1,406.52   703.40   503.80     3,183.10   50.01
Panel B: January 2010 to September 2011 (21 observations)
Mean       SD       Min        Max        Coef. of Variation (%)
3,962.25   880.80   2,861.00   5,560.80   22.33
Source: Malaysia Rubber Board and Malaysia Central Bank

Chapter 1 briefly noted that the trends of PSMR20 and PCO are closely related. The correlation between the two variables over the period January 2000 to December 2010 is close to perfect, at 0.94.
This result is consistent with demand theory. Crude oil is the raw material for synthetic rubber, a close substitute for NR, and the prices of substitutes tend to move together. The negative correlation between log PSMR20 and log EXM indicates that an appreciation of the Malaysian ringgit is associated with a higher price of SMR20. The positive correlation between log PSMR20 and log TSMV suggests that strong demand for motor vehicles is associated with higher SMR20 prices.

Table 3.2 Correlations of Log PSMR20 with Log PCO, Log EXM and Log TSMV
Variable      Correlation with Log PSMR20
Log PCO       0.94
Log EXM       -0.79
Log TSMV      0.20
Source: Malaysia Rubber Board and Malaysia Central Bank

It is also important to examine plots of the data. Plots provide visual evidence of structural breaks, outliers or data errors, can reveal a significant seasonal pattern, and suggest potential relationships among the series. There appears to be a strong positive relationship between PSMR20 and PCO. Economic theory supports this, since synthetic rubber and natural rubber are key components in producing commercial tires. NR is traded in USD, and its prices normally rise when the currencies of NR-producing countries strengthen (ANRPC 2010). The appreciation of the Malaysian ringgit from late 2006 to mid-2008 led to an increase in PSMR20. Furthermore, NR prices are generally expected to follow the path of the crude oil market (ANRPC 2010); correspondingly, Figure 3.2 shows that crude oil prices follow the same trend as SMR20 prices. Frank and Garcia (2010) suggested that commodity prices depend on the exchange rate. The U.S. index dropped to its lowest point in 2008, at the same time as crude oil prices peaked (Figure 3.2). The plots also show that all the series appear to behave like random walks, with no seasonal pattern. Figure 3.3 suggests that PSMR20 and TSMV tend to move together.
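The statistics behind Tables 3.1 and 3.2 can be reproduced with the standard library alone: the coefficient of variation (standard deviation divided by the mean, in percent) and the Pearson correlation between logged series. This is a minimal sketch, and the function names are hypothetical. The only thesis figures it uses are the Panel A mean and standard deviation.

```python
import math


def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: 100 * SD / mean."""
    return 100.0 * sd / mean


def log_correlation(x, y):
    """Pearson correlation between ln(x) and ln(y); both series must be positive."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two points")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    return sxy / math.sqrt(sxx * syy)


# Panel A of Table 3.1: mean 1,406.52 and SD 703.40 USD/MT
print(round(coefficient_of_variation(1406.52, 703.40), 2))  # 50.01
```

Calling log_correlation on the full PSMR20 and PCO series would give the kind of figure reported in Table 3.2. Any toy series passed to it here are illustrative only.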
Adding the demand for motor vehicles should therefore help explain the variation in PSMR20.

Figure 3.2 Log Forms of the Variables (Jan. 2000 to Dec. 2009)
Source: Malaysia Rubber Board and Malaysia Central Bank

Figure 3.3 Log Forms of the Variables (Jan. 2010 to Sep. 2011)
Source: Malaysia Rubber Board and the United States Department of Commerce

Chapter 4
EMPIRICAL METHODOLOGY

4.1 Preliminary Procedures
4.1.1 Unit Root Processes
The time series data must be stationary. A stationary time series has a constant mean, a constant variance, and a covariance that does not depend on time (Stock and Watson 2011). A nonstationary time series is transformed by differencing to induce stationarity. Several tests for stationarity exist, including the Dickey-Fuller (DF) test (Dickey and Fuller 1981) and the Phillips-Perron test (Phillips and Perron 1988). The Phillips-Perron test suffers from severe size distortions when the errors follow a moving average process with a negative coefficient. This analysis uses the DF test, since it is generally the most reliable and is easy to implement and interpret (Stock and Watson 2011). Consider a first-order autoregressive model, AR(1):

$$\mathrm{PSMR20}_t = \beta_0 + \beta_1 \mathrm{PSMR20}_{t-1} + u_t \qquad (4.1)$$

In the DF test for a unit root, the null hypothesis is $\beta_1 = 1$, which indicates that $\mathrm{PSMR20}_t$ is nonstationary and has an autoregressive root of 1. The alternative is $\beta_1 < 1$, which implies that $\mathrm{PSMR20}_t$ is stationary. In practice, the DF test is implemented by subtracting $\mathrm{PSMR20}_{t-1}$ from both sides of equation (4.1) to yield

$$\Delta\mathrm{PSMR20}_t = \beta_0 + \delta\,\mathrm{PSMR20}_{t-1} + u_t \qquad (4.2)$$

where $\Delta\mathrm{PSMR20}_t = \mathrm{PSMR20}_t - \mathrm{PSMR20}_{t-1}$ and $\delta = \beta_1 - 1$. The null hypothesis is now $\delta = 0$ (unit root) against the alternative $\delta < 0$ (stationary). The DF test applies only to an AR(1) model.
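Regression (4.2) can be estimated by ordinary least squares, and its t-statistic on $\delta$ is then compared with a Dickey-Fuller critical value. Below is a minimal intercept-only DF test in plain Python. It assumes the 5 percent critical value of -2.86 tabulated by Stock and Watson (2011). It is not the EViews implementation, and it omits the lagged differences of the augmented test. The function name dickey_fuller is hypothetical, and the simulated series is illustrative only.

```python
import random

DF_CRIT_5PCT = -2.86  # intercept-only DF critical value at 5% (Stock and Watson 2011)


def dickey_fuller(y):
    """OLS estimate of regression (4.2): dy_t = b0 + delta * y_{t-1} + u_t.

    Returns (delta, t-statistic for delta). The t-statistic must be compared
    with Dickey-Fuller critical values, not with the usual normal ones.
    """
    x = y[:-1]                                        # lagged level y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # first difference of y
    n = len(dy)
    mx, mdy = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    delta = sum((xi - mx) * (di - mdy) for xi, di in zip(x, dy)) / sxx
    b0 = mdy - delta * mx
    resid = [di - b0 - delta * xi for xi, di in zip(x, dy)]
    s2 = sum(r * r for r in resid) / (n - 2)          # two estimated coefficients
    return delta, delta / (s2 / sxx) ** 0.5


# Usage: a simulated stationary AR(1) with coefficient 0.5 (not the SMR20 data).
rng = random.Random(42)
y = [0.0]
for _ in range(500):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))
delta, t_stat = dickey_fuller(y)
print(t_stat < DF_CRIT_5PCT)  # a unit root should be rejected for this series
```

The same function applied to a random walk would typically give a t-statistic above -2.86, so the unit root would not be rejected.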
In some cases, an AR(1) model does not capture all of the serial correlation in a time series, and a higher-order autoregression is needed. Testing for a unit root then requires the augmented Dickey-Fuller (ADF) test, and the ADF critical values are used in this analysis.

4.1.2 Checking for Seasonality
Hevea trees mature after seven years and can then yield latex all year round. In some regions, however, the trees are not tapped while they grow new leaves, which normally occurs in February and March. The latex is processed into dried block or sheet rubber, which can be stored in warehouses for a long period of time, so NR is a storable good. The supply of NR is considered non-seasonal (Barlow 1978). Demand for rubber, however, may be seasonal, so the empirical analysis checks the price of SMR20 for seasonality. A consistent recurring pattern in a time series reveals seasonality (Goetz and Weber 1986). Plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) are an effective way to check for seasonality. In monthly data, for instance, a seasonal pattern shows up as ACF spikes that recur every 12 observations. More formally, seasonal dummy variables can be included in the regression models: if the ACF and PACF plots indicate seasonality, seasonal dummy variables are incorporated in the analysis.

4.2 Univariate Time Series Model
4.2.1 Autoregressive Integrated Moving Average (ARIMA) Model
To obtain forecasts with minimal errors, roughly seven characteristics of a good ARIMA model are taken into consideration (Pankratz 1983). First, a good model is parsimonious, which gives model building a strong practical orientation.
Parsimony means including only the smallest number of coefficients needed to explain the available data. Second, a good autoregressive (AR) model is stationary, which implies that the time series has a constant mean and variance through time. Third, a good moving average (MA) model is invertible, which requires the MA coefficients to satisfy certain conditions (Table 4.1). Fourth, a good model has high-quality estimated AR and MA coefficients at the estimation stage; the coefficients should be statistically significantly different from zero. Fifth, a good model has statistically independent residuals. Sixth, the residuals of the estimated model should be normally distributed; statistical tests for these residual properties are described below. Finally, a good model has sufficiently small forecast errors: it fits the past data well and produces acceptable forecasts of future values. An out-of-sample forecast provides the forecast errors needed to judge this last criterion, and the forecast errors of the fitted models will be generated and discussed. As discussed in Chapters 1 and 3, a global recession started in mid-2008, and price variation during that period was excessive. To reduce forecasting errors, a dummy variable is therefore included to account for the global recession in the fourth quarter of 2008.
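The seasonal dummies of Section 4.1.2 and the recession dummy described above can both be built as a simple indicator matrix. This sketch makes two assumptions. January is the omitted base month, so the 11 seasonal dummies do not duplicate the intercept. The recession window is October to December 2008, matching "the fourth quarter of 2008". The function name monthly_dummies is hypothetical.

```python
def monthly_dummies(start_year, start_month, n_obs,
                    recession=((2008, 10), (2008, 12))):
    """Build 11 seasonal dummies (January = base month) and a recession dummy.

    Returns one row per month: [D_feb, ..., D_dec, D_recession].
    The recession window is an assumption (Q4 2008 by default).
    """
    (ry0, rm0), (ry1, rm1) = recession
    rec_start = ry0 * 12 + rm0 - 1           # months counted from year 0
    rec_end = ry1 * 12 + rm1 - 1
    rows = []
    for i in range(n_obs):
        idx = start_year * 12 + (start_month - 1) + i
        month = idx % 12 + 1                 # 1 = January, ..., 12 = December
        seasonal = [1 if month == m else 0 for m in range(2, 13)]
        in_recession = 1 if rec_start <= idx <= rec_end else 0
        rows.append(seasonal + [in_recession])
    return rows


rows = monthly_dummies(2000, 1, 120)  # in-sample: Jan 2000 to Dec 2009
print(len(rows), len(rows[0]), sum(r[-1] for r in rows))  # 120 12 3
```

Each row can be appended to the regressors of the ARIMA model. If the ACF and PACF show no seasonality, only the last column (the recession dummy) is needed.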
Table 4.1 Summary of Invertibility Conditions for MA Coefficients
Model Type             Invertibility Conditions
ARMA(p,0)              Always invertible
MA(1) or ARMA(p,1)     $|\theta_1| < 1$
MA(2) or ARMA(p,2)     $|\theta_2| < 1$, $\theta_2 + \theta_1 < 1$, $\theta_2 - \theta_1 < 1$

4.2.1.1 Identification and Estimation
The analysis and model are based on observations of $\ln(\mathrm{PSMR20}_t)$ over January 2000 to December 2009. The observations from January 2010 to September 2011 are reserved for out-of-sample forecasting. Box-Jenkins model building essentially involves examining the patterns of the ACF and PACF. The estimated ACF and PACF guide the choice of one or more ARIMA models that might fit the available data. These tools are important in the identification stage because they measure the statistical relationship between observations within a univariate time series. The ACF plot is also useful for identifying the trend of a time series. A series is considered nonstationary if its ACF decays extremely slowly, whereas it can be identified as stationary if its ACF cuts off fairly quickly. To see how the ACF works, consider the time series values $\mathrm{PSMR20}_b, \mathrm{PSMR20}_{b+1}, \dots, \mathrm{PSMR20}_n$. The sample autocorrelation at lag k, denoted $r_k$, is

$$r_k = \frac{\sum_{t=b}^{n-k} \left(\mathrm{PSMR20}_t - \overline{\mathrm{PSMR20}}\right)\left(\mathrm{PSMR20}_{t+k} - \overline{\mathrm{PSMR20}}\right)}{\sum_{t=b}^{n} \left(\mathrm{PSMR20}_t - \overline{\mathrm{PSMR20}}\right)^2} \qquad (4.3)$$

where $\overline{\mathrm{PSMR20}}$ is the mean of the observations. The estimated autocorrelation coefficient $r_k$ measures the direction and strength of the relationship between observations separated by a lag of k. $r_k$ always falls between -1 and 1. A value close to 1 indicates a strong positive linear correlation, and a value close to -1 indicates a strong negative linear correlation. When $r_k = 0$, there is no correlation between observations separated by k time units.
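Equation (4.3) translates directly into code. The sketch below indexes the series from zero (so b = 0 in the formula) and uses the full-sample sum of squares in the denominator. The function name sample_acf is hypothetical.

```python
def sample_acf(z, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag as in equation (4.3)."""
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]               # deviations from the mean
    denom = sum(d * d for d in dev)           # full-sample sum of squares
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


print([round(r, 2) for r in sample_acf([1, 2, 3, 4, 5], 2)])  # [0.4, -0.1]
```

For the series 1, 2, 3, 4, 5 the deviations from the mean are -2, -1, 0, 1, 2. This gives $r_1 = 4/10 = 0.4$ and $r_2 = -1/10 = -0.1$. A slowly decaying sequence of such values across many lags is the nonstationarity signal described above.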
Table 4.2 Characteristics of the ACF and PACF for AR and MA Processes
AR(p)
  ACF:  Tails off toward zero as an exponential decay or a damped sine wave.
  PACF: Cuts off to zero after lag p (the lag of the last spike equals the AR order of the process).
MA(q)
  ACF:  Cuts off to zero after lag q (the lag of the last spike equals the MA order of the process).
  PACF: Tails off toward zero as an exponential decay or a damped sine wave.
ARMA(p,q)
  ACF:  Tails off toward zero after lag (q - p).
  PACF: Tails off toward zero after lag (p - q).
Source: Excerpt from Wei (2005, p. 109) and Pankratz (1983, p. 55).

Table 4.3 Detailed Characteristics of Five Common Stationary Processes
AR(1)
  ACF:  Exponential decay: (i) on the positive side if $\phi_1 > 0$; (ii) alternating in sign, starting on the negative side, if $\phi_1 < 0$.
  PACF: Spike at lag 1, then cuts off to zero: (i) spike is positive if $\phi_1 > 0$; (ii) spike is negative if $\phi_1 < 0$.
AR(2)
  ACF:  A mixture of exponential decays or a damped sine wave; the exact pattern depends on the signs and sizes of $\phi_1$ and $\phi_2$.
  PACF: Spikes at lags 1 and 2, then cuts off to zero.
MA(1)
  ACF:  Spike at lag 1, then cuts off to zero: (i) spike is positive if $\theta_1 < 0$; (ii) spike is negative if $\theta_1 > 0$.
  PACF: Damps out exponentially: (i) alternating in sign, starting on the positive side, if $\theta_1 < 0$; (ii) on the negative side, if $\theta_1 > 0$.
MA(2)
  ACF:  Spikes at lags 1 and 2, then cuts off to zero.
  PACF: A mixture of exponential decays or a damped sine wave; the exact pattern depends on the signs and sizes of $\theta_1$ and $\theta_2$.
ARMA(1,1)
  ACF:  Exponential decay from lag 1: (i) sign of $\rho_1$ = sign of $(\phi_1 - \theta_1)$; (ii) all one sign if $\phi_1 > 0$; (iii) alternating in sign if $\phi_1 < 0$.
  PACF: Exponential decay from lag 1: (i) $\phi_{11} = \rho_1$; (ii) all one sign if $\theta_1 > 0$; (iii) alternating in sign if $\theta_1 < 0$.
Source: Excerpt from Pankratz (1983, p. 55).

Autoregressive (AR) Processes
Assuming that the time series is stationary, the next step is to establish an appropriate lag structure for the ARIMA model by following the guidelines in Tables 4.2 and 4.3.
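The identification guidelines in Tables 4.2 and 4.3 rely on the sample PACF, which the text does not define by formula. One standard way to obtain partial autocorrelations from autocorrelations is the Durbin-Levinson recursion, used in the sketch below. It was chosen for this illustration and is not necessarily the algorithm EViews uses. The function name pacf_from_acf is hypothetical. Given the theoretical AR(1) autocorrelations $\rho_k = \phi_1^k$ with $\phi_1 = 0.6$, it reproduces the AR(1) row of Table 4.3: a positive spike at lag 1 and zeros afterwards.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion.

    rho[0] must be 1.0; returns [phi_11, phi_22, ..., phi_KK], K = len(rho) - 1.
    """
    K = len(rho) - 1
    pacf = []
    phi_prev = []                  # phi_{k-1, j} for j = 1..k-1
    for k in range(1, K + 1):
        num = rho[k] - sum(phi_prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # Update phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
        phi_curr = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1]
                    for j in range(1, k)]
        phi_curr.append(phi_kk)
        phi_prev = phi_curr
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) with phi_1 = 0.6: rho_k = 0.6 ** k
print(pacf_from_acf([1.0, 0.6, 0.36, 0.216]))  # lag-1 spike of 0.6, then (numerically) zero
```

Passing the sample autocorrelations [1.0, r_1, ..., r_K] instead of the theoretical ones gives the sample PACF used in identification.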
First, consider an AR model, which expresses the current value $\mathrm{PSMR20}_t$ as a function of its past values $\mathrm{PSMR20}_{t-1}, \mathrm{PSMR20}_{t-2}, \dots, \mathrm{PSMR20}_{t-p}$:

$$\mathrm{PSMR20}_t = \delta + \phi_1 \mathrm{PSMR20}_{t-1} + \phi_2 \mathrm{PSMR20}_{t-2} + \dots + \phi_p \mathrm{PSMR20}_{t-p} + e_t \qquad (4.4)$$

$\mathrm{PSMR20}_t$ is the dependent variable at time t. The independent variables are $\mathrm{PSMR20}_{t-1}, \mathrm{PSMR20}_{t-2}, \dots, \mathrm{PSMR20}_{t-p}$ at lags t-1, t-2, ..., t-p, respectively. $\phi_1, \phi_2, \dots, \phi_p$ are the unknown parameters on these lagged values, $\delta$ is a constant and $e_t$ is a random error.

Moving Average (MA) Processes
The number of lags for the moving average (MA) terms is likewise determined by following the guidelines in Tables 4.2 and 4.3. Moving average terms capture the impact of the current random shock $e_t$ and of past random shocks $e_{t-1}, e_{t-2}, \dots, e_{t-q}$. The qth-order moving average model, MA(q), can be written as

$$\mathrm{PSMR20}_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q} \qquad (4.5)$$

$e_t$ is the error at time t, and $e_{t-1}, \dots, e_{t-q}$ are the errors in periods t-1, ..., t-q. $\theta_1, \theta_2, \dots, \theta_q$ are the unknown parameters on $e_{t-1}, e_{t-2}, \dots, e_{t-q}$, respectively. Pankratz (1983), the source of Tables 4.1 and 4.3, writes MA terms with minus signs ($e_t - \theta_1 e_{t-1} - \dots$). The signs of $\theta$ in those tables are therefore reversed relative to equation (4.5).

Mixed Autoregressive Moving Average (ARMA) Processes
When the theoretical ACF and PACF show both AR and MA characteristics, a mixed process is used, again identified by following the guidelines in Tables 4.2 and 4.3. Having defined the lag lengths for the AR and MA parts, the two are combined into a single ARIMA(p,d,q) model. Here $\mathrm{PSMR20}_t$ denotes the series after it has been differenced d times:

$$\mathrm{PSMR20}_t = \delta + \phi_1 \mathrm{PSMR20}_{t-1} + \phi_2 \mathrm{PSMR20}_{t-2} + \dots + \phi_p \mathrm{PSMR20}_{t-p} + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q} \qquad (4.6)$$

4.2.1.2 Model Diagnostic
Once an ARIMA model has been identified and its parameters satisfactorily estimated, the model is examined for possible improvement.
If there is evidence of residual autocorrelation or of statistically insignificant coefficients, one must return to the identification stage and modify the model. Several diagnostic tools are available for checking that an ARIMA model is statistically adequate. A first check is simply to plot the autocorrelogram of the residuals of the fitted model. If the model is correctly specified, the residuals should resemble a white noise process, with negligible autocorrelation at every lag from lag 1 on. Any significant autocorrelation indicates model misspecification.

Second, a statistically acceptable model has random shocks $e_t$ that are statistically independent. The residual autocorrelations $r_k$ (k = 1, 2, ..., K) should be approximately uncorrelated and normally distributed as N(0, 1/n). The chosen models are assessed for residual autocorrelation. The test sets the null hypothesis of no residual autocorrelation against the alternative that at least one autocorrelation is nonzero:

$$H_0: r_k = 0 \ \text{for } k = 1, 2, \dots, K \qquad H_1: r_k \neq 0 \ \text{for at least one } k = 1, 2, \dots, K$$

To carry out this test, a diagnostic chi-square test known as the Ljung-Box test can be applied to the residual autocorrelations to check the adequacy of the model. The Ljung-Box statistic is

$$Q^* = n(n+2)\sum_{k=1}^{K} (n-k)^{-1}\, r_k^2(\hat{a}) \;\sim\; \chi^2_{K-m} \qquad (4.7)$$

In (4.7), n is the number of observations used to fit the model and K is the number of autocorrelations included in the test. m is the number of estimated ARMA parameters (p + q), and $r_k^2(\hat{a})$ is the squared sample autocorrelation of the residuals at lag k. $Q^*$ approximately follows a chi-square distribution with K - m degrees of freedom. If $Q^*$ is large enough to be statistically significant, the null hypothesis is rejected, which indicates that the residuals of the estimated model are autocorrelated.
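The Ljung-Box statistic (4.7) can be computed from the residuals of a fitted model, as can the Jarque-Bera normality statistic discussed in the following paragraph. The sketch below does both. The Jarque-Bera formula it uses, $JB = \frac{n}{6}\left(S^2 + \frac{(\mathrm{Kurt}-3)^2}{4}\right)$ with sample skewness S and sample kurtosis Kurt, is the standard textbook form; the thesis does not state it. The 5 percent critical value of the chi-square distribution with 2 degrees of freedom is 5.991. The function names are hypothetical. P-values are left out, since they would need a chi-square distribution function.

```python
def residual_acf(e, max_lag):
    """Sample autocorrelations of residuals, r_1..r_max_lag (same form as eq. 4.3)."""
    n = len(e)
    mean = sum(e) / n
    dev = [v - mean for v in e]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box(e, K, m):
    """Ljung-Box Q* of eq. (4.7) and its degrees of freedom K - m (m = p + q)."""
    n = len(e)
    r = residual_acf(e, K)
    q = n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, K + 1))
    return q, K - m


JB_CRIT_5PCT = 5.991  # chi-square(2) critical value at 5%


def jarque_bera(e):
    """Jarque-Bera statistic n/6 * (S^2 + (Kurt - 3)^2 / 4) from sample moments."""
    n = len(e)
    mean = sum(e) / n
    m2 = sum((v - mean) ** 2 for v in e) / n
    m3 = sum((v - mean) ** 3 for v in e) / n
    m4 = sum((v - mean) ** 4 for v in e) / n
    skew = m3 / m2 ** 1.5          # third standardized moment
    kurt = m4 / m2 ** 2            # fourth standardized moment (3 for a normal)
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


q_stat, df = ljung_box([1, 2, 3, 4, 5], K=2, m=0)
print(round(q_stat, 4), df)  # 1.5167 2
```

$Q^*$ is then compared with the chi-square critical value for K - m degrees of freedom. JB is compared with 5.991, and a larger value rejects normality at the 5 percent level.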
Third, one must test whether the standardized residuals are normally distributed. The test uses the third and fourth moments, comparing the skewness and kurtosis of the series with those of the normal distribution.

- Skewness measures the symmetry of the histogram. The skewness of a symmetric distribution such as the normal is zero.
- Kurtosis measures the thickness of the tails of the distribution. The kurtosis of a normal distribution is 3, and a distribution with thicker tails than the normal has kurtosis greater than 3.

One common test for normality is the Jarque-Bera (JB) test. Under the null hypothesis of normality, the JB statistic is distributed as chi-square with 2 degrees of freedom:

$$H_0: E(u_t^s)^3 = 0 \text{ (skewness) and } E(u_t^s)^4 = 3 \text{ (kurtosis)} \qquad H_1: E(u_t^s)^3 \neq 0 \text{ or } E(u_t^s)^4 \neq 3$$

If JB is large, the null hypothesis is rejected, which indicates that the residuals are not normally distributed. A model is therefore statistically acceptable when its residuals behave as white noise: not autocorrelated and normally distributed. A model may fail to be statistically acceptable because its random shocks are not statistically independent or because its residuals are non-normally distributed. In that case, one must reformulate the model by returning to the identification stage and repeat the process until an acceptable (best) model is found.

4.3 Multivariate Time Series Model

4.3.1 Vector Autoregressive (VAR) Model

A univariate time series analysis involves only one variable. In contrast, a vector autoregressive (VAR) model involves N variables; technically, it extends univariate time series analysis to the multivariate case. In most cases, an event is driven by several time series variables. Furthermore, these variables are not only contemporaneously correlated with each other; their past values may also be correlated with each other.
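To make the cross-variable dependence concrete, the Python sketch below iterates a two-variable VAR(1), y_t = c + A y_{t−1}, forward. It produces multi-step (dynamic) forecasts with future innovations set to zero. The off-diagonal entries of A are what let the lagged value of one variable, for example EXM, affect the other, PSMR20, and vice versa. The coefficient values are hypothetical and chosen only for illustration; this is not an estimated model from the study.

```python
from typing import List, Sequence


def var_forecast(last: Sequence[float], const: Sequence[float],
                 a: Sequence[Sequence[float]], steps: int) -> List[List[float]]:
    """Dynamic forecasts from a VAR(1): y_{t+h} = c + A y_{t+h-1}, innovations set to zero.

    Each forecast is fed back in as the lagged value for the next step.
    """
    k = len(last)
    y = list(last)
    path: List[List[float]] = []
    for _ in range(steps):
        y = [const[i] + sum(a[i][j] * y[j] for j in range(k)) for i in range(k)]
        path.append(y)
    return path


if __name__ == "__main__":
    # Hypothetical coefficients: row 0 is the PSMR20 equation, row 1 the EXM equation.
    A = [[0.5, 0.1],
         [0.2, 0.3]]
    for h, (psmr20, exm) in enumerate(var_forecast([1.0, 2.0], [0.0, 0.0], A, 3), start=1):
        print(h, round(psmr20, 4), round(exm, 4))
```

Setting the off-diagonal entries to zero turns this into two separate AR(1) models. The VAR gains over univariate models only through those cross-lag terms.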
By considering multiple time series jointly, the analysis uses additional information to determine the dynamic relationships among the series over time (Stock and Watson 2011). For two endogenous variables, $\mathrm{PSMR20}_t$ and $\mathrm{EXM}_t$, the standard form of a VAR with $n$ lags can be expressed as:

$$\mathrm{PSMR20}_t = a_1 \mathrm{PSMR20}_{t-1} + \cdots + a_n \mathrm{PSMR20}_{t-n} + a_{n+1} \mathrm{EXM}_{t-1} + \cdots + a_{2n} \mathrm{EXM}_{t-n} + \varepsilon_{\mathrm{PSMR20},t} \tag{4.8}$$

$$\mathrm{EXM}_t = b_1 \mathrm{PSMR20}_{t-1} + \cdots + b_n \mathrm{PSMR20}_{t-n} + b_{n+1} \mathrm{EXM}_{t-1} + \cdots + b_{2n} \mathrm{EXM}_{t-n} + \varepsilon_{\mathrm{EXM},t} \tag{4.9}$$

Both $\mathrm{PSMR20}_t$ and $\mathrm{EXM}_t$ depend on their own lags and on the lags of the other variable, which captures the interrelationship between them. F-tests and information criteria are useful tools for selecting an appropriate lag length for a VAR model (Stock and Watson 2011). The F-statistic tests the joint null hypothesis that all coefficients on a given set of lags equal zero, against the alternative that at least one of them is nonzero. The F-statistic, however, may overestimate the true lag order (Stock and Watson 2011). Therefore, the Bayes Information Criterion (BIC) and the Akaike Information Criterion (AIC) will also be used to determine the appropriate lag length for the VAR model:

$$\mathrm{BIC}(p) = \ln[\det(\hat{\Sigma}_u)] + k(kp+1)\frac{\ln T}{T} \tag{4.10}$$

$$\mathrm{AIC}(p) = \ln[\det(\hat{\Sigma}_u)] + k(kp+1)\frac{2}{T} \tag{4.11}$$

Here $\hat{\Sigma}_u$ is the estimated covariance matrix of the VAR residuals, $k$ is the number of variables, $p$ is the number of lags, and $T$ is the number of observations. The lag length will be selected with the statistical package EViews.

4.3.2 Cointegration and Vector Error Correction (VEC) Model

So far, the variables in the VAR analysis have been assumed to be stationary. If the time series are nonstationary, the VAR may produce spurious results, so the data must be made stationary by differencing a sufficient number of times before a VAR is applied. However, if two or more nonstationary variables share a linear combination that is stationary, the variables are said to be cointegrated. If $\mathrm{PSMR20}_t$ and $\mathrm{EXM}_t$ are cointegrated, they share a common stochastic trend (Stock and Watson 2011). Heij et al. (2004) list four steps for the analysis of cointegration and VEC models:

Step 1: Test whether the trend in each time series is deterministic or stochastic, using the unit root tests discussed in Section 4.1. If the null hypothesis of a stochastic trend is not rejected, the variables are nonstationary; continue with Step 2.

Step 2: Test for cointegration with the Johansen trace test. This involves choosing the relevant VEC model specification, starting with constant terms where trends are included. If the null hypothesis of no cointegration is not rejected, continue with Step 4; if it is rejected, continue with Step 3.

Step 3: Estimate the VEC model with the cointegrating relations identified by the Johansen procedure.

Step 4: Make the variables stationary by taking first differences, and estimate a VAR model for the differenced variables.

If cointegration is detected among the variables of a multivariate model, there is a long-run equilibrium relation between the series. In that case, a VEC model is used instead of a VAR model to avoid misspecification errors. For two endogenous variables, $\mathrm{PSMR20}_t$ and $\mathrm{EXM}_t$, both I(1), with lag order 1, the regression equations can be written as:

$$\Delta \mathrm{PSMR20}_t = \alpha_{\mathrm{PSMR20}} (\mathrm{PSMR20}_{t-1} - \beta\, \mathrm{EXM}_{t-1}) + \varepsilon_{\mathrm{PSMR20},t} \tag{4.14}$$

$$\Delta \mathrm{EXM}_t = \alpha_{\mathrm{EXM}} (\mathrm{PSMR20}_{t-1} - \beta\, \mathrm{EXM}_{t-1}) + \varepsilon_{\mathrm{EXM},t} \tag{4.15}$$

In terms of the coefficients $a_{ij}$ of the underlying first-order VAR in levels, these expressions give $\alpha_{\mathrm{PSMR20}} = -a_{12}a_{21}/(1-a_{22})$, $\beta = (1-a_{22})/a_{21}$, and $\alpha_{\mathrm{EXM}} = a_{21}$. Equations (4.14) and (4.15) also show that $\mathrm{PSMR20}_t$ and $\mathrm{EXM}_t$ change in response to $\mathrm{PSMR20}_{t-1} - \beta\,\mathrm{EXM}_{t-1}$, the previous period's deviation from the long-run equilibrium (Enders 2004).

4.4 Forecast and Forecasting Evaluation

There are two forecasting periods in this analysis: an in-sample and an out-of-sample period.
The in-sample period, January 2000 to December 2009, is used to obtain the model specifications of the ARIMA and VAR or VEC models. To evaluate out-of-sample forecasting ability, some observations at the end of the sample are not used in building the models; the out-of-sample period is January 2010 to September 2011. Out-of-sample forecasts can be dynamic or static. Dynamic forecasts use the model to predict multiple steps ahead, whereas static forecasts predict one step ahead. This study considers dynamic forecasts. The estimated models produce six-step-ahead forecasts (six future months), which are intended to aid in trading NR.

The specific models generated from the in-sample period are used to produce multiple-step-ahead forecasts. The approach is to estimate the model recursively and forecast a specific number of observations ahead (Tashman 2000). Technically, the model first computes the one-step-ahead forecast, uses this forecast to compute the two-step-ahead forecast, and continues until the required horizon is reached.

For example, suppose the time series are available from January 2000 to September 2011 and one wishes to forecast six steps ahead (i.e., October 2011 to March 2012). The procedure runs as follows:

- First, estimate the model over January 2000 to December 2009 and forecast six steps ahead (January 2010 to June 2010).
- Then re-estimate the model over January 2000 to January 2010 and forecast another six steps ahead (February 2010 to July 2010).
- Repeat the estimation until no out-of-sample observation is left in the available data.

With 21 out-of-sample months, this produces 21 one-step-ahead, 20 two-step-ahead, 19 three-step-ahead, 18 four-step-ahead, 17 five-step-ahead, and 16 six-step-ahead forecasts.
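The expanding-window scheme and the resulting forecast counts can be checked with a short Python sketch. It indexes monthly observations from January 2000. The assumptions are 120 in-sample months (January 2000 to December 2009) and 21 out-of-sample months (January 2010 to September 2011), with re-estimation every month. The function names are illustrative, and only the bookkeeping is shown, not the estimation itself.

```python
from typing import Dict, List

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_label(index: int, start_year: int = 2000) -> str:
    """Label of observation `index` in a monthly series starting in January of start_year."""
    year = start_year + index // 12
    return f"{MONTHS[index % 12]}-{year % 100:02d}"


def recursive_forecast_targets(n_in: int, n_out: int, max_h: int) -> Dict[int, List[int]]:
    """Expanding-window scheme: for each estimation end point, the target indices h = 1..max_h.

    The estimation sample is observations 0..end-1; targets beyond the data are dropped.
    """
    total = n_in + n_out
    return {end: [end + h - 1 for h in range(1, max_h + 1) if end + h - 1 < total]
            for end in range(n_in, total)}


def count_by_horizon(n_in: int, n_out: int, max_h: int) -> Dict[int, int]:
    """Number of h-step-ahead forecasts that can be compared with an actual value."""
    counts = {h: 0 for h in range(1, max_h + 1)}
    for targets in recursive_forecast_targets(n_in, n_out, max_h).values():
        for h in range(1, len(targets) + 1):  # targets are ordered by horizon
            counts[h] += 1
    return counts


if __name__ == "__main__":
    first = recursive_forecast_targets(120, 21, 6)[120]
    print([month_label(i) for i in first])   # Jan-10 through Jun-10
    print(count_by_horizon(120, 21, 6))      # {1: 21, 2: 20, 3: 19, 4: 18, 5: 17, 6: 16}
```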
Tests of forecast accuracy are based on the differences between the forecast and actual values of the dependent variable at time $t$. The closer the two are, the smaller the forecast error and the better the forecast. Most measures of forecasting accuracy are based on an average of these errors. The forecast error is:

$$e_t = \mathrm{PSMR20}_t - \mathrm{PSMR20}_t^F \tag{4.16}$$

Here $e_t$ is the error between the actual and forecast price at time $t$, $\mathrm{PSMR20}_t$ is the actual price, and $\mathrm{PSMR20}_t^F$ is the forecast price at time $t$.

The mean absolute error (MAE), the root mean squared error (RMSE), and the Theil inequality coefficient (TIC) will be used to evaluate forecast performance:

$$\mathrm{MAE} = \frac{1}{F}\sum_{i=1}^{F} |e_i| \tag{4.17}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{F}\sum_{i=1}^{F} e_i^2} \tag{4.18}$$

$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{F}\sum_{i=1}^{F} e_i^2}}{\sqrt{\frac{1}{F}\sum_{i=1}^{F} (e_i^N)^2}} = \frac{\mathrm{RMSE}}{\mathrm{RMSE}^N} \tag{4.19}$$

$F$ is the number of out-of-sample forecasts, and the superscript $N$ denotes the naïve no-change model.

The MAE and RMSE depend on the scale of the dependent variable, so they should be used as relative measures to compare forecasts of the same series. Both have been used extensively in applied and theoretical work on forecasting. The smaller the MAE and RMSE, the better the forecasting ability of the model. The RMSE is always at least as large as the MAE, and the two are equal only when all errors have the same magnitude.

The TIC is the ratio of the RMSE of the estimated model to that of the naïve model. A value below one means the model beats the naïve forecast, and a value close to zero indicates a nearly perfect fit. The advantage of the Theil statistic is that it is unit-free, whereas the MAE and RMSE depend on the units of the dependent variable.
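A minimal Python sketch of Eqs. (4.17) to (4.19) follows. The naïve benchmark here is the one-step no-change forecast, in which each month is predicted by the previous month's actual value; for longer horizons the naïve forecast would instead hold the value observed at the forecast origin. The TIC follows the ratio definition in (4.19). Some packages, EViews among them as far as we understand, print a different Theil coefficient that is bounded between 0 and 1. The function names and numbers are illustrative.

```python
import math
from typing import List, Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Eq. (4.17): mean absolute forecast error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Eq. (4.18): root mean squared forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def naive_no_change(last_observed: float, actual: Sequence[float]) -> List[float]:
    """One-step naive forecasts: each period is predicted by the previous period's actual value."""
    return [last_observed] + list(actual[:-1])


def tic(actual: Sequence[float], forecast: Sequence[float], naive: Sequence[float]) -> float:
    """Eq. (4.19): RMSE of the model relative to the RMSE of the naive model."""
    return rmse(actual, forecast) / rmse(actual, naive)


if __name__ == "__main__":
    actual = [10.0, 12.0, 11.0]
    model = [11.0, 11.0, 11.0]
    naive = naive_no_change(9.0, actual)
    print(round(mae(actual, model), 4), round(rmse(actual, model), 4),
          round(tic(actual, model, naive), 4))  # 0.6667 0.8165 0.5774
```

In this toy case, the TIC of about 0.58 means the model's RMSE is a little over half that of the no-change forecast.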
Chapter 5 EMPIRICAL RESULTS AND DISCUSSION

5.1 Introduction

The main objective of this research paper is to forecast the price of SMR20, the most commonly traded type of NR. This chapter presents the empirical results of the methods outlined in Chapter 4 and analyzes the forecasting performance of each model based on statistical criteria.

5.2 Unit Root Test Results

Table 5.1 Results of Unit Root Tests with Trend (Jan. 2000 to Dec. 2009)

| Variable | Test (level) | Level | Test (first difference) | First difference |
| LPSMR20t | DF, 1 lag | -0.69 | DF, 0 lags | -8.09* |
| LPCOt | DF, 1 lag | -1.29 | DF, 0 lags | -8.27* |
| LEXMt | DF, 1 lag | -1.33 | DF, 0 lags | -8.39* |
| LTSMVt | ADF, 12 lags | -1.62* | ADF, 11 lags | -11.74* |

Note: *, **, *** indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 5.2 Critical Values of the Augmented Dickey-Fuller t-Statistic for the Unit Root Test

| Deterministic regressors | 10% | 5% | 1% |
| Intercept only | -2.57 | -2.86 | -3.43 |
| Intercept and time trend | -3.12 | -3.41 | -3.99 |

The number of lags in the ADF test is chosen with the Schwarz Information Criterion, so that the highest lag that is significantly different from zero is retained. The ADF critical value is approximately -3.41 at the 5% level (time trend included). Against this value, the results in Table 5.1 fail to reject the null hypothesis that LPSMR20t, LPCOt, LEXMt, and LTSMVt have a unit root in levels.

The four series are then first differenced, and the DF and ADF tests are performed on the differenced data. Table 5.1 shows that the DF and ADF statistics are significant, rejecting the unit-root null for all four series in first differences. Therefore, LPSMR20t, LPCOt, LEXMt, and LTSMVt are integrated of order one, I(1). The ARIMA and VAR or VEC analysis is carried out on the first differences, denoted DLPSMR20t, DLPCOt, DLEXMt, and DLTSMVt.

5.3 Seasonality

The plot of SMR20 price levels in Figure 3.1 shows no indication of a seasonal pattern.
However, the ACF and PACF plots in Figures 5.3 and 5.4 suggest possible seasonal patterns, although those lags are not significant at the 5% level. To determine whether the price of SMR20 is seasonal, seasonal dummy variables are included in the analysis. If they are statistically significant, the dummies are important in explaining the variation of the dependent variable.

5.4 Univariate Model Specifications

The analysis is based on LPSMR20t over the period January 2000 to December 2009. Having made the series stationary by first differencing (d = 1), the next step is to establish appropriate lag lengths for an ARIMA model. Box-Jenkins forecasting models essentially involve examining the patterns of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The estimated ACF and PACF are used as guidelines in choosing one or more ARIMA models that might fit the available data. These tools are important in the identification stage because they measure the statistical relationship between observations in a univariate time series.

5.4.1 Univariate Results

Figures 5.1 and 5.2 show the estimated ACF and PACF of the log of the original series, LPSMR20, over January 2000 to December 2009. The estimated ACF falls to zero slowly, while the estimated PACF has a spike at lag 1 and then cuts off; this pattern is characteristic of a nonstationary series.

Figures 5.3 and 5.4 show the first-differenced series, DLPSMR20. There, the ACF decays, starting on the positive side and then switching to the negative side, while the PACF has a spike at lag 1 and then cuts off to zero. This pattern is associated with an autoregressive model of order 1, AR(1), according to the guidelines in Tables 4.3 and 4.4.
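The estimated ACF and PACF plotted in Figures 5.1 to 5.4 can be computed as in this Python sketch. It uses sample autocorrelations around the mean, partial autocorrelations obtained through the Durbin-Levinson recursion, and the approximate ±1.96/√n band used to judge significance. This is one standard way of computing them; the statistical package's own output may differ in small details. The function names are illustrative.

```python
import math
from typing import List, Sequence


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag around the sample mean."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0 for k in range(1, max_lag + 1)]


def pacf_from_acf(r: Sequence[float]) -> List[float]:
    """Partial autocorrelations phi_kk from autocorrelations r_1..r_K (Durbin-Levinson recursion)."""
    pacf: List[float] = []
    phi_prev: List[float] = []  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            phi_curr = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf


def significance_band(n: int) -> float:
    """Approximate 95% band, +/- 1.96 / sqrt(n), drawn around zero in ACF/PACF plots."""
    return 1.96 / math.sqrt(n)


if __name__ == "__main__":
    # Theoretical ACF of an AR(1) with coefficient 0.5: the PACF cuts off after lag 1.
    print(pacf_from_acf([0.5, 0.25, 0.125]))  # [0.5, 0.0, 0.0]
    print(round(significance_band(120), 3))   # 0.179
```

The AR(1) example reproduces the "spike at lag 1, then cut off" PACF pattern used above to identify the AR order. For roughly 120 monthly observations, the significance band is about ±0.18.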
The following equation represents the AR(1) model:

$$\mathrm{DLPSMR20}_t = \delta + \phi_1 \mathrm{DLPSMR20}_{t-1} + \varepsilon_t \tag{5.1}$$

Figures 5.3 and 5.4 also show that the ACF has spikes at lags 1 and 2, followed by a mixture of exponential decays. This pattern is associated with a moving average model of order 2, MA(2), with negative coefficients, according to the guideline in Table 4.5. The following equation represents the MA(2) model:

$$\mathrm{DLPSMR20}_t = \alpha - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} + \varepsilon_t \tag{5.2}$$

With p = 1 and q = 2, ARIMA(1,1,2) is chosen, according to the guideline in Table 4.5. Thus, ARIMA(1,1,2) can be expressed as:

$$\mathrm{DLPSMR20}_t = \gamma + \phi_1 \mathrm{DLPSMR20}_{t-1} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} + \varepsilon_t \tag{5.3}$$

Figure 5.1 Autocorrelation of LPSMR20

Figure 5.2 Partial Autocorrelation of LPSMR20

Figure 5.3 Autocorrelation of DLPSMR20

Figure 5.4 Partial Autocorrelation of DLPSMR20

Table 5.3 ARIMA and VEC of PSMR20t

ARIMA (2,1,1) Specification Regressors AR(1) AR(2) MA(1) 1.13*** (11.93) -0.22** (2.41) 0.98*** (81.37) (1) ARIMA (2,1,1) Seasonal Dummy Intercept January February March April May June July August September October November 0.30*** (7.82) 0.02*** (7.42) (3) ARIMA (1,1,2) Seasonal Dummy ARIMA (2,1,2) ARIMA (2,1,2) Seasonal Dummy -0.11 (0.21) 0.14 (0.88) -0.90*** (25.11) -0.56 (1.33) 1.44*** (4.52) -0.52* (1.80) 1.48*** (5.08) -0.56** (2.09) 0.37 (0.7) 1.17*** (11.39) 0.82* (1.94) - 1.32*** (3.77) -1.33** (4.05) 0.19* (1.94) 0.21* (1.69) 0.34 (0.99) 0.35 (1.09) MA(2) D2008Q4 (2) ARIMA (1,1,2) -0.31*** (6.67) -0.34*** (8.18) 0.01 (0.42) 0.06** (2.12) 0.03 (1.15) -0.01 (0.19) 0.01 (0.05) 0.01 (0.34) 0.02 (0.60) -0.10 (0.35) 0.01 (0.33) 0.01 (0.04) 0.02 (0.79) 0.01 (0.64) 0.02*** (2.79) -0.27*** (7.34) 0.01 (0.45) 0.05** (2.05) 0.03 (1.00) -0.01 (0.36) 0.01 (0.12) 0.01 (0.21) 0.02 (0.57) -0.01 (0.36) 0.01 (0.31) 0.00 (0.02) 0.02 (0.77) 0.01 (0.46) 0.02*** (8.88) 0.01 (0.28) 0.06** (2.07) 0.03 (1.09) 0.03 (0.20) 0.01 (0.18) 0.01 (0.40) 0.02 (0.77) 0.01 (0.16) 0.01 (0.49) 0.01 (0.20) 0.02 (0.77) 0.01 (0.41)

Note: ***, **, and * denote
significance at the 1%, 5%, and 10% levels, respectively. The values of the t-statistic are in parentheses.

Table 5.4 Statistical Results of ARIMA and VEC of PSMR20t

ARIMA (2,1,1) (1) ARIMA (2,1,1) Seasonal Dummy 0.36 (2) (3) ARIMA ARIMA (2,1,2) (2,1,2) Seasonal Dummy 0.40 0.39 0.39 ARIMA (1,1,2) Seasonal Dummy 0.37 -2.17 -2.56 -2.17 -2.55 -2.18 -2.59 -2.55 -2.67 -2.55 -2.69 -2.59 0.96 0.03 -.60 -0.56 0.95 1.08 0.37 0.61 1.03 0.99 0.98 Q(16) = 7.37 3.70 Q(16) = 4.81 7.57 Q(16) = 5.53 5.41 Q(16) = 5.09 9.00 Q(16) = 9.01 1.89 Q(16) = 7.44 1.37 0.6422 0.6219 0.5129 0.4868 0.5785 0.5546 0.5583 0.5344 1.0705 1.0549 0.8714 0.8560 Specification Adjusted R2 0.40 BIC -2.70 AIC Inverted AR Roots Inverted MA Roots Ljung-Box Q-statistic Jarque_Bera RSME MAE -0.98 ARIMA (1,1,2)

As discussed in Chapters 2 and 4, the prices of NR plummeted in the fourth quarter of 2008 because of the global recession. Therefore, a dummy variable, D2008Q4, is included in the regressions to absorb this unusually large price change. Besides ARIMA(1,1,2), several other ARIMA models can be constructed from the characteristics of the ACF and PACF plots in Figures 5.3 and 5.4. Table 5.3 reports these alternative ARIMA models, which yield different statistical results. One can compare the models and determine which stand out in terms of the seven stages discussed in Chapter 4.

Tables 5.3 and 5.4 show that the seasonal dummy variables are not statistically significant. In addition, including the seasonal dummies makes the other coefficients statistically insignificant. Therefore, there appears to be no seasonality in the series of PSMR20t. Having established that seasonality is absent, the statistical results of the ARIMA models are analyzed as follows. First, to satisfy stationarity, the inverted AR roots must be less than one in absolute value.
Table 5.4 shows that the inverted AR roots of all models are less than one, so all models satisfy the stationarity condition. Second, the MA terms must satisfy the invertibility condition, which requires the inverted MA roots to be less than one in absolute value. Only the ARIMA(2,1,1), ARIMA(1,1,2), and ARIMA(2,1,2) specifications meet this requirement. Next, the model should have well-estimated coefficients, meaning that all coefficients on the right-hand side are statistically significant. On this criterion, only ARIMA(1,1,2) and ARIMA(2,1,1) have all coefficients statistically significant at the 5% or 1% level. This process of elimination leaves two models: ARIMA(1,1,2) and ARIMA(2,1,1).

The Ljung-Box statistics of ARIMA(2,1,1) and ARIMA(1,1,2) reported in Table 5.4 are not statistically significant. The null hypothesis of no remaining autocorrelation in the residuals is therefore not rejected for either model, and the prices of SMR20 can be fitted by an ARIMA(2,1,1) or an ARIMA(1,1,2) model. The JB statistics of 3.70 and 5.41 are below 5.99, the 5% critical value of the chi-square distribution with 2 degrees of freedom. The null hypothesis of normality is thus not rejected, and the residuals can be considered normally distributed at the 5% level. Finally, the MAE and RMSE of ARIMA(1,1,2) are smaller than those of ARIMA(2,1,1), which indicates that ARIMA(1,1,2) fits the available (past) data well enough for the analysis. The ACF and PACF graphs in Figures 5.3 and 5.4 also indicate that the models are satisfactory.

5.5 Multivariate Results

This section presents the empirical results of the multivariate VAR and VEC models. The unit root tests in Table 5.1 fail to reject the null hypothesis of a stochastic trend (unit root) in LPSMR20t, LPCOt, and LEXMt. The tests on the first-differenced series, DLPSMR20t, DLPCOt, DLEXMt, and DLTSMVt, show that these series are stationary.
Therefore, LPSMR20t, LPCOt, LEXMt, and LTSMVt are integrated of order one, I(1), and their first differences are stationary.

5.5.1 Cointegration and VEC Model

The Johansen trace statistic rejects the null hypothesis of no cointegration at the 5% level (49.83 > 47.86). It does not reject the null hypothesis of at most one cointegrating relation (11.80 < 29.80) (Table 5.5). The tests therefore indicate that the variables are cointegrated, with one cointegrating relation.

Table 5.5 Results of Cointegration Tests

| Hypothesized No. of CE(s) | Eigenvalue | Trace Statistic | 5% Critical Value |
| None | 0.130080 | 49.82959 | 47.85613 |
| At most 1 | 0.068932 | 11.80386 | 29.79707 |
| At most 2 | 0.030729 | 3.590188 | 15.49471 |
| At most 3 | 0.000001 | 0.000974 | 3.441466 |

Based on the discussion in Chapter 4, a VEC model is used when the variables are cointegrated. The AIC and BIC (reported as SC by EViews) both select one lag for each variable (Table 5.6). Therefore, following EViews' suggestion, one lag is included in the cointegrating equation and incorporated into the VEC model for the price of SMR20.

Table 5.6 AIC and BIC Criteria from EViews

| Lag | LogL | LR | FPE | AIC | SC | HQ |
| 0 | 593.0104 | NA | 4.85E-09 | -10.63082 | -10.55759 | -10.60111 |
| 1 | 621.318 | 54.57483 | 3.42E-09* | -10.97870* | -10.68578* | -10.85987* |
| 2 | 625.5368 | 7.905638 | 3.73E-09 | -10.89256 | -10.37994 | -10.6846 |
| 3 | 631.625 | 11.07929 | 3.94E-09 | -10.84009 | -10.10778 | -10.54301 |
| 4 | 637.2915 | 10.00575 | 4.19E-09 | -10.78003 | -9.82803 | -10.39383 |

Note: * indicates the lag order selected by the criterion.

The existence of cointegration implies a long-run relationship among the variables. The long-run equilibrium relation estimated by EViews is shown in Table 5.7.

Table 5.7 The Estimated Long-Run Relationship for One Cointegration Vector

| Variable | LPSMR20 | LPCO | LEXM | LTSMV | Constant |
| Coefficient | 1 | 2.67*** | -11.30*** | 4.63*** | -3.28*** |
| t-statistic | | (6.08) | (2.72) | (3.46) | (5.15) |

Note: ***, **, and * denote significance at the 1%, 5%, and 10% levels, respectively.

In Table 5.7, all the coefficients are statistically significant at the 1% level.
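The lag-order and cointegration-rank decisions above follow simple rules, sketched here in Python:

- the information criteria of Eqs. (4.10) and (4.11), with the lag chosen as the one that minimizes each criterion;
- the sequential reading of the trace test, which picks the first rank r whose null "at most r cointegrating relations" is not rejected.

The usage lines apply these rules to the values reported in Tables 5.5 and 5.6. EViews computes its criteria from the log-likelihood, so its printed AIC and SC may differ from (4.10) and (4.11) by scaling constants. Only the lag at which each criterion is minimized matters for the choice. The function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def var_information_criteria(logdet_sigma: float, k: int, p: int, T: int) -> Tuple[float, float]:
    """Eqs. (4.10)-(4.11): (BIC, AIC) of a k-variable VAR(p) estimated on T observations."""
    n_params = k * (k * p + 1)  # k equations, each with k*p lag coefficients and a constant
    bic = logdet_sigma + n_params * math.log(T) / T
    aic = logdet_sigma + n_params * 2.0 / T
    return bic, aic


def selected_lag(criterion_by_lag: Sequence[float]) -> int:
    """Lag (list position, starting at lag 0) that minimises an information criterion."""
    return min(range(len(criterion_by_lag)), key=lambda lag: criterion_by_lag[lag])


def johansen_trace_rank(trace: Sequence[float], critical: Sequence[float]) -> int:
    """Sequential trace test: smallest r whose null 'at most r cointegrating relations' is not rejected."""
    for r, (stat, cv) in enumerate(zip(trace, critical)):
        if stat < cv:
            return r
    return len(trace)


if __name__ == "__main__":
    # Values copied from Tables 5.5 and 5.6.
    aic = [-10.63082, -10.97870, -10.89256, -10.84009, -10.78003]
    sc = [-10.55759, -10.68578, -10.37994, -10.10778, -9.82803]
    hq = [-10.60111, -10.85987, -10.6846, -10.54301, -10.39383]
    print(selected_lag(aic), selected_lag(sc), selected_lag(hq))        # 1 1 1
    print(johansen_trace_rank([49.82959, 11.80386, 3.590188, 0.000974],
                              [47.85613, 29.79707, 15.49471, 3.441466]))  # 1
```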
The results suggest that a 1% increase in the price of crude oil leads to a 2.67% increase in the price of NR. This interpretation is consistent with economic theory, since NR and SR are complementary goods. However, a 1% increase in the exchange rate between the MYR and the USD causes the price of NR to drop by 11.30%. When the currencies of producing countries appreciate against the dollar, the goods being traded are undervalued. Finally, the price of NR rises by 4.63% if the demand for motor vehicles in the United States increases by 1%. This reflects the law of supply and demand: when the quantity demanded rises, the price increases.

Having estimated the long-run relationship among LPSMR20t, LPCOt, LEXMt, and LTSMVt, the VEC model can be constructed as in Equation 5.4 and Table 5.8:

$$\mathrm{DLPSMR20}_t = \alpha_1(\mathrm{LPSMR20}_{t-1} + 2.67\,\mathrm{LPCO}_{t-1} - 11.30\,\mathrm{LEXM}_{t-1} + 4.63\,\mathrm{LTSMV}_{t-1} - 3.28) + \gamma_{11}\mathrm{DLPSMR20}_{t-1} + \gamma_{12}\mathrm{DLPCO}_{t-1} + \gamma_{13}\mathrm{DLEXM}_{t-1} + \gamma_{14}\mathrm{DLTSMV}_{t-1} + \beta_1 \mathrm{D2008Q4} + c + e_{1t} \tag{5.4}$$

The terms in Equation 5.4 are defined as follows:

- $\mathrm{DLPSMR20}_t$ is the change in the log price of SMR20 at time $t$;
- $\alpha_1$ is the coefficient on the error correction term;
- $\gamma_{11}, \ldots, \gamma_{14}$ are the coefficients on the lagged differenced terms;
- $\beta_1$ is the coefficient on the exogenous dummy variable for the global recession in late 2008;
- $c$ is the intercept;
- $e_{1t}$ is an innovation assumed to be serially uncorrelated.

Table 5.8 reveals that the coefficients on the lagged value of DLPCO and on D2008Q4 are statistically significant at the 1% level.

Table 5.8 VEC(1,1) Model for PSMR20t (Jan. 2000 to Dec. 2009)

Dependent variable DLPSMR20t Regressors Cointegration Equation (Table 5.7) Adjusted R2 -0.003 (1.40) -0.049 (1.57) -0.022 (1.29) -1.880*** (2.91) -0.108 (0.65) -0.234*** (4.81) 0.022 (1.44) 0.37 Jarque_Bera 4.75 DLPSMR20t-1 DLPCO t-1 DLEXM t-1 DLTSMV t-1 D2008Q4 Intercept RSME 0.3896 MAE 0.3624

Note: ***, **, and * denote significance at the 1%, 5%, and 10% levels, respectively. The values of the t-statistic are in parentheses.
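Equation (5.4) turns the long-run relation into a one-month-ahead prediction, as the following Python sketch shows.

- The cointegrating vector and constant are taken from Table 5.7, with the fourth variable written LTSMV as in the text.
- The adjustment coefficient α₁, the short-run γ coefficients, the dummy coefficient and the intercept are supplied by the caller.
- The numbers in the usage lines are hypothetical, not the estimates of Table 5.8, and the innovation e₁t is set to zero.

With this normalization, a negative α₁ pulls DLPSMR20 back toward the long-run relation whenever the error correction term is nonzero.

```python
from typing import Dict

# Cointegrating vector and constant from Table 5.7, normalised on LPSMR20.
COINTEGRATING_VECTOR: Dict[str, float] = {
    "LPSMR20": 1.0, "LPCO": 2.67, "LEXM": -11.30, "LTSMV": 4.63,
}
COINTEGRATING_CONSTANT = -3.28


def error_correction_term(levels: Dict[str, float]) -> float:
    """Previous period's deviation from the long-run relation inside Eq. (5.4)."""
    return (sum(coef * levels[name] for name, coef in COINTEGRATING_VECTOR.items())
            + COINTEGRATING_CONSTANT)


def vec_one_step(levels_lag: Dict[str, float], diffs_lag: Dict[str, float],
                 alpha: float, gamma: Dict[str, float],
                 beta_dummy: float, dummy: float, intercept: float) -> float:
    """Predicted DLPSMR20_t from Eq. (5.4), with the innovation e_1t set to zero."""
    short_run = sum(g * diffs_lag[name] for name, g in gamma.items())
    return alpha * error_correction_term(levels_lag) + short_run + beta_dummy * dummy + intercept


if __name__ == "__main__":
    # Hypothetical inputs: all lagged log levels at zero, so the error correction term is -3.28.
    levels = {"LPSMR20": 0.0, "LPCO": 0.0, "LEXM": 0.0, "LTSMV": 0.0}
    pred = vec_one_step(levels, {"DLPSMR20": 0.0}, alpha=-0.1, gamma={"DLPSMR20": 0.5},
                        beta_dummy=-0.2, dummy=0.0, intercept=0.02)
    print(round(pred, 4))  # 0.348
```

When the lagged levels satisfy the long-run relation exactly, the error correction term is zero. The prediction then reduces to the short-run terms plus the intercept.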
5.6 Evaluations and Forecast Accuracy

There are two fitted models in this study, ARIMA(1,1,2) and VEC(1,1), both estimated over January 2000 to December 2009. Tables 5.9 and 5.10 present forecast statistics for the SMR20 price series over the out-of-sample period January 2010 to September 2011. As described in Chapter 4, the estimated models are used to forecast six steps ahead. The models are then re-estimated recursively, stepping forward one month at a time, and again used to forecast six steps ahead, until September 2011.

Table 5.9 Forecast Statistics for ARIMA of LPSMR20t (Jan. 2010 to Sep. 2011)

| Steps | MAE | RMSE | TIC | Number of Obs. |
| 1 | 0.5520 | 0.5887 | 0.0564 | 21 |
| 2 | 0.5512 | 0.5885 | 0.0563 | 20 |
| 3 | 0.5642 | 0.6025 | 0.0575 | 19 |
| 4 | 0.5546 | 0.5889 | 0.0562 | 18 |
| 5 | 0.5510 | 0.5895 | 0.0562 | 17 |
| 6 | 0.5423 | 0.5829 | 0.0558 | 16 |
| Average 1-6 | 0.5526 | 0.5902 | 0.0564 | |

Table 5.10 Forecast Statistics for VEC of LPSMR20t (Jan. 2010 to Sep. 2011)

| Steps | MAE | RMSE | TIC | Number of Obs. |
| 1 | 0.3574 | 0.4280 | 0.0598 | 21 |
| 2 | 0.3903 | 0.4254 | 0.0413 | 20 |
| 3 | 0.3883 | 0.4233 | 0.0411 | 19 |
| 4 | 0.3878 | 0.4224 | 0.0410 | 18 |
| 5 | 0.3863 | 0.4208 | 0.0408 | 17 |
| 6 | 0.3838 | 0.4182 | 0.0406 | 16 |
| Average 1-6 | 0.3823 | 0.4230 | 0.0441 | |

Tables 5.9 and 5.10 indicate that the multivariate VEC(1,1) model clearly outperforms the univariate ARIMA(1,1,2) model. The RMSE of the ARIMA forecasts is higher than that of the VEC forecasts at every horizon. The ARIMA RMSE ranges from 0.5829% to 0.6025%, compared with 0.4182% to 0.4280% for the VEC model. The average RMSEs of ARIMA(1,1,2) and VEC(1,1) are 0.5902% and 0.4230%, respectively. Of the two models, therefore, VEC(1,1) provides better forecasts because of its lower RMSE.

In addition, the TIC statistics are consistently well below one, which indicates that both estimated models outperform the naïve no-change model. Values this close to zero indicate a nearly perfect fit, so the forecasting performance of the estimated models is satisfactory.
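The "Average 1-6" rows of Tables 5.9 and 5.10 are simple means over the six horizons, which the following Python check reproduces for the RMSE. It also expresses the VEC improvement as a relative reduction and checks that the VEC RMSE is lower at every step. This is only arithmetic on the reported numbers, not a formal test of equal forecast accuracy.

```python
from statistics import mean

# Step-by-step RMSE values (steps 1-6) copied from Tables 5.9 and 5.10.
RMSE_ARIMA = [0.5887, 0.5885, 0.6025, 0.5889, 0.5895, 0.5829]
RMSE_VEC = [0.4280, 0.4254, 0.4233, 0.4224, 0.4208, 0.4182]

avg_arima = mean(RMSE_ARIMA)
avg_vec = mean(RMSE_VEC)
reduction = 1.0 - avg_vec / avg_arima   # share of the average ARIMA RMSE removed by the VEC model

print(f"average RMSE  ARIMA: {avg_arima:.4f}  VEC: {avg_vec:.4f}")
print(f"VEC RMSE is {100 * reduction:.1f}% lower on average")
print("VEC better at every step:", all(v < a for v, a in zip(RMSE_VEC, RMSE_ARIMA)))
```

The same averaging applied to the MAE and TIC columns gives the other entries in the "Average 1-6" rows.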
Table 5.11 Six-Step Forecast for PSMR20t and Model Evaluations

Actual ARIMA Date Price (1,1,2) 2,888.90 Jun-10 3,285.26 2,884.70 Jul-10 3,351.71 3,065.40 Aug-10 3,419.51 3,369.00 Sep-10 3,488.68 3,774.10 Oct-10 3,559.25 4,170.60 Nov-10 3,631.25 4,584.40 Dec-10 3,704.71 5,230.40 Jan-11 3,779.65 5,560.80 Feb-11 3,856.11 4,767.40 Mar-11 3,934.11 5,007.30 Apr-11 4,013.69 4,491.20 May-11 4,094.88 4,527.90 Jun-11 4,177.71 4,535.80 Jul-11 4,262.22 4,587.70 Aug-11 4,348.44 4,489.70 Sep-11 3,285.26 0.5887 Avg. RMSE 0.5520 Avg. MAE 0.0564 Avg. TIC VEC (1,1) 3,333.56 3,501.57 3,672.53 3,843.05 3,927.45 3,963.79 3,999.08 4,232.70 4,351.63 4,466.22 4,614.74 4,676.30 4,709.78 4,888.39 4,993.83 3,333.56 0.4287 0.3574 0.0598

The approach to six-step-ahead forecasting is discussed in Chapter 4. Table 5.11 compares the dynamic six-step-ahead forecasts of each model with the actual values of PSMR20t. Table 5.12 shows the percentage difference between the forecast and actual values at time t. The large differences between the forecast and actual values indicate that the models do not perform well as the forecast horizon is extended. The RMSE and MAE of the VEC(1,1) model are smaller than those of ARIMA(1,1,2), which indicates that the multivariate forecasting model performs better in terms of statistical results.

For one-step-ahead forecasts, both models track the prices well in terms of the proximity between actual and forecast values (Figures 5.5 and 5.6). However, both fail to predict the sharp turns in the periods April 2010 to June 2010 and February 2011 to June 2011. In addition, the percentage differences between the actual values and the multivariate VEC(1,1) forecasts are smaller than those for the univariate ARIMA(1,1,2) model.
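The text does not state the formula behind Table 5.12. The Python sketch below assumes it is (actual − forecast)/actual × 100, so that a positive value means the model under-predicts the price. Under that assumption, the formula reproduces the Jul-10, Jan-11 and Sep-11 entries of Table 5.12 when applied to the actual prices and model forecasts listed in Table 5.11. The pairing of those values with dates is read from the table, and the function name is illustrative.

```python
def percentage_difference(actual: float, forecast: float) -> float:
    """(actual - forecast) / actual * 100: positive when the model under-predicts."""
    return (actual - forecast) / actual * 100.0


if __name__ == "__main__":
    # Actual price, ARIMA(1,1,2) and VEC(1,1) forecasts (USD/MT) as listed in Table 5.11.
    rows = {"Jul-10": (2884.70, 3285.26, 3333.56),
            "Jan-11": (5230.40, 3704.71, 3999.08),
            "Sep-11": (4489.70, 4348.44, 4993.83)}
    for date, (actual, arima, vec) in rows.items():
        print(date, round(percentage_difference(actual, arima), 2),
              round(percentage_difference(actual, vec), 2))
```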
Table 5.12 Percentage Difference Between the Forecast and Actual Values

| Date | ARIMA(1,1,2) | VEC(1,1) |
| Jun-10 | -11.47% | -14.48% |
| Jul-10 | -13.89% | -15.56% |
| Aug-10 | -9.34% | -14.23% |
| Sep-10 | -10.50% | -9.01% |
| Oct-10 | 7.56% | -1.83% |
| Nov-10 | 14.66% | 5.83% |
| Dec-10 | 20.79% | 13.54% |
| Jan-11 | 29.17% | 23.54% |
| Feb-11 | 32.03% | 23.88% |
| Mar-11 | 19.12% | 8.72% |
| Apr-11 | 21.43% | 10.81% |
| May-11 | 10.63% | -2.75% |
| Jun-11 | 9.56% | -3.28% |
| Jul-11 | 7.89% | -3.84% |
| Aug-11 | 7.09% | -6.55% |
| Sep-11 | 3.15% | -11.23% |

Figure 5.5 One-Step Ahead Forecast of ARIMA(1,1,2) [plot of actual and forecast prices, USD/MT, Jan-10 to Sep-11]

Figure 5.6 One-Step Ahead Forecast of VEC(1,1) [plot of actual and forecast prices, USD/MT, Jan-10 to Sep-11]

Chapter 6 CONCLUSIONS AND IMPLICATIONS

6.1 Conclusion

The ARIMA(1,1,2) and VEC(1,1) models produce satisfactory statistical results and reasonable out-of-sample forecasts. The average RMSE and MAE of the univariate ARIMA model are noticeably larger than those of the multivariate VEC model. Furthermore, the one-step-ahead out-of-sample forecasts are close to the actual values. The empirical results indicate a positive relationship between the price of NR and the price of crude oil, which is consistent with economic theory since the two commodities are considered complementary goods. As the demand for motor vehicles increases, the price of NR rises, which reflects the law of supply and demand. By contrast, the price of NR declines when the currencies of producing countries appreciate against the US dollar. Overall, the multivariate model outperforms the univariate model in terms of both statistical results and forecast accuracy.
6.2 Remarks and Future Research

This paper demonstrates that simple forecasting techniques provide valuable information for improving planning and decision-making. As one of the producers and distributors of NR in Cambodia, I can adopt these forecasting methods to develop a market strategy for working with local farmers and improving the well-being of rubber producers. While several models and techniques were employed in this thesis, there is room to improve the models further before introducing this study to the Department of Rubber Development in Cambodia, where policies could be implemented to maximize the welfare of rubber farmers.

REFERENCES

Allen, P. G. (1994). Economic forecasting in agriculture. International Journal of Forecasting, 10(1), 81-135. Retrieved from http://www.sciencedirect.com/science/article/pii/0169207094900523

ANRPC. (n.d.). Natural rubber trends and statistic. Retrieved March 15, 2011, from http://www.anrpc.org/html/archive.aspx

Bailey, C. D., & Gupta, S. (1999). Judgment in learning-curve forecasting: A laboratory study. Journal of Forecasting, 18(1), 39-57. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1099131X(199901)18:1%3C39::AID-FOR683%3E3.0.CO;2-N/abstract

Burger, K., & Smit, H. P. (1989). Long term and short term analysis of the natural rubber market. Review of World Economics, 125(4), 718-747. Retrieved from http://www.springerlink.com/content/d537678331523827/

Burger, K., Smit, H. P., & Vogelvang, B. (2002). Exchange rates and natural rubber prices, the effect of the Asian crisis. 10th EAAE Congress: Exploring Diversity in the European Agri-Food System. Retrieved from http://ageconsearch.umn.edu/bitstream/24958/1/cp02bu31.pdf

Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366), 427-431.
Retrieved from http://www.deu.edu.tr/userweb/onder.hanedar/dosyalar/1979.pdf

Enders, W. (2004). Applied econometric time series (2nd ed.). New York, NY: Wiley.

Feder, G., Just, R. E., & Schmitz, A. (1980). Futures markets and the theory of the firm under price uncertainty. The Quarterly Journal of Economics, 94(2), 317-328. Retrieved from http://qje.oxfordjournals.org/content/94/2/317

Frank, J., & Garcia, P. (2010). How strong are the linkages among agricultural, oil, and exchange rate markets? In Proceedings of the NCCC-134 Conference on Applied Commodity Price Analysis, Forecasting, and Market Risk Management. Retrieved from http://www.ucema.edu.ar/conferencias/download/2010/11.06.pdf

Harvey, A. C., & Todd, E. C. (1983). Forecasting econometric time series with structural and Box-Jenkins models (with discussion). Journal of Business and Economic Statistics, 1(4), 299-315. Retrieved from http://www.jstor.org/pss/1391661

Heij, C., de Boer, P., Franses, P. H., Kloek, T., & van Dijk, H. K. (2004). Econometric methods with applications in business and economics. New York, NY: Oxford University Press.

Khin, A. A. (2011). Econometrics forecasting models for short term natural rubber prices. Saarbrücken, Germany: Lambert Academic Publishing.

Meyler, A., Kenny, G., & Quinn, T. (1998). Forecasting Irish inflation using ARIMA models. Central Bank of Ireland Technical Paper. Retrieved from http://mpra.ub.uni-muenchen.de/11359/1/cbi_3RT98_inflationarima.pdf

Pankratz, A. (1983). Forecasting with univariate Box-Jenkins models. Hoboken, NJ: John Wiley & Sons.

Phillips, P. C. B., & Perron, P. (1988). Testing for a unit root in time series regression. Biometrika, 75(2), 335-346. Retrieved from ftp://ftp.icesi.edu.co/jcalonso/Analisis%20de%20Series%20de%20tiempo/Paper/PP1988.pdf

Pindyck, R. S., & Rubinfeld, D. L. (1998). Econometric models and economic forecasts. New York, NY: Irwin/McGraw-Hill.

Romprasert, S. (2009). Forecasting model of RSS3 price in futures market.
Kasetsart University Journal of Economics, 16(1), 54-74. Retrieved from http://journal.eco.ku.ac.th/upload/document/thai/20090958015458.pdf

Sims, C. A. (1980). Macroeconomics and reality. Econometrica, 48(1), 1-48. Retrieved from http://www.ekonometria.wne.uw.edu.pl/uploads/Main/macroeconomics_and_reality.pdf

Stock, J. H., & Watson, M. W. (2011). Introduction to econometrics (3rd ed.). Boston, MA: Pearson.

Tashman, L. (2000). Out-of-sample tests of forecasting accuracy: An analysis and review. International Journal of Forecasting, 16(4), 437-450. Retrieved from http://www.sciencedirect.com/science/article/pii/S0169207000000650

UNCTAD. (n.d.). Market information in the commodities area: The main uses of NR. Retrieved August 11, 2011, from http://unctad.org/infocomm/anglais/rubber/uses.htm

Zant, W. (1994). Price and stock information with rational expectations in the Indian natural rubber market. Discussion Paper, Tinbergen Institute, Amsterdam. Retrieved from ftp://zappa.ubvu.vu.nl/19930020.pdf