Vector Autoregressive (VAR) Models and Cointegration Analysis
Time Series Analysis
Dr. Sevtap Kestel

VECTOR TIME SERIES

Vector Autoregression

Vector autoregression (VAR) is an econometric model used to capture the evolution of, and the interdependencies between, multiple time series, generalizing univariate AR models. All the variables in a VAR are treated symmetrically: each variable has its own equation, explaining its evolution by its own lags and the lags of all the other variables in the model.

A VAR model describes the evolution of a set of k variables measured over the same sample period (t ∈ T) as a linear function of only their past evolution. The variables are collected in a k x 1 vector $y_t$, whose ith element $y_{i,t}$ is the time-t observation of variable $y_i$. For example, if the ith variable is GDP, then $y_{i,t}$ is the value of GDP at time t.

A (reduced-form) p-th order VAR, VAR(p), is

\[ y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + \varepsilon_t \]

where c is a k x 1 vector of constants (intercepts), $A_i$ is a k x k matrix (for every i = 1, ..., p) and $\varepsilon_t$ is a k x 1 vector of error terms satisfying the following conditions.

Properties:
• $E[\varepsilon_t] = 0$: every error term has mean zero
• $E[\varepsilon_t \varepsilon_t'] = \Omega$: the contemporaneous covariance matrix of the errors
• $E[\varepsilon_t \varepsilon_{t-k}'] = 0$ for any non-zero k: the errors are uncorrelated across time

Order of integration of the variables

Note that all the variables used have to be of the same order of integration. We have the following cases:
• All the variables are I(0) (stationary): this is the standard case, i.e. a VAR in levels.
• All the variables are I(d) (non-stationary) with d > 0:
– The variables are cointegrated: the error correction term has to be included in the VAR. The model becomes a vector error correction model (VECM), which can be seen as a restricted VAR.
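– The variables are not cointegrated: the variables first have to be differenced d times, which gives a VAR in differences.

Example: VAR(1)

Suppose $\{y_{1t}\}_{t \in T}$ denotes real GDP growth and $\{y_{2t}\}_{t \in T}$ denotes inflation. Then

\[ \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix} \]

or, equation by equation,

\[ y_{1t} = c_1 + A_{11} y_{1,t-1} + A_{12} y_{2,t-1} + \varepsilon_{1t} \]
\[ y_{2t} = c_2 + A_{21} y_{1,t-1} + A_{22} y_{2,t-1} + \varepsilon_{2t} \]

There is one equation for each variable in the model. The current (time t) observation of each variable depends on its own lags as well as on the lags of every other variable in the VAR.

VAR(1) PROCESS

• Example:

\[ Y_t = \begin{pmatrix} 1.1 & -0.3 \\ 0.6 & 0.2 \end{pmatrix} Y_{t-1} + Z_t \]

\[ \det\left( \lambda I - \begin{pmatrix} 1.1 & -0.3 \\ 0.6 & 0.2 \end{pmatrix} \right) = (\lambda - 1.1)(\lambda - 0.2) + (0.6)(0.3) = \lambda^2 - 1.3\lambda + 0.4 = 0 \]

\[ \lambda_1 = 0.8, \quad \lambda_2 = 0.5 \]

Both eigenvalues lie inside the unit circle, so the process is stationary.

Structural VAR (SVAR) with p lags

\[ B_0 y_t = c_0 + B_1 y_{t-1} + B_2 y_{t-2} + \dots + B_p y_{t-p} + e_t \]

where $c_0$ is a k x 1 vector of constants, $B_i$ is a k x k matrix (i = 0, ..., p), and $e_t$ is a k x 1 vector of error terms. The main diagonal terms of the $B_0$ matrix (the coefficients on the ith variable in the ith equation) are scaled to 1.

The error terms $e_t$ (structural shocks) satisfy the same conditions as the reduced-form errors, with the additional property that all off-diagonal elements of the covariance matrix $E(e_t e_t') = \Sigma$ are zero. That is, the structural shocks are uncorrelated.

IMPULSE RESPONSE FUNCTION

The key tool for tracing short-run effects with an SVAR is the impulse response function. The VAR(p)

\[ y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + \varepsilon_t \]

can be expressed as an MA(∞) process

\[ y_t = \mu + \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \dots = \mu + \Psi(B)\,\varepsilon_t \]

where $\mu$ is the mean of the process and B is the backshift operator, so that

\[ \frac{\partial y_{t+l}}{\partial \varepsilon_t'} = \Psi_l \]

The row i, column j element of $\Psi_l$ identifies the consequences of a one-unit increase in the jth variable's innovation at date t ($\varepsilon_{jt}$) for the value of the ith variable at time t+l, holding all other innovations at all dates constant. A plot of the row i, column j element of $\Psi_l$ as a function of the lag l is called the non-orthogonalized impulse response function.

GRANGER CAUSALITY

• In time series analysis, we sometimes want to know whether changes in one variable have an impact on changes in other variables.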
• To examine this more precisely, we need the Granger causality test.
• In principle, the concept is as follows: if X causes Y, then changes in X happen first and are followed by changes in Y.
• If X causes Y, two conditions must be satisfied:
1. X helps in predicting Y: in a regression of Y on its own lags and on lags of X, the lags of X add explanatory power.
2. Y does not help in predicting X.

COINTEGRATION

• In many applications, integrated processes are considered together and form equilibrium relationships:
– Short-term and long-term interest rates
– Income and consumption
• This leads to the concept of cointegration.
• The idea behind cointegration is that although a multivariate time series is integrated, certain linear transformations of it may be stationary.

SPURIOUS REGRESSION

• If we regress a series y with a unit root on regressors that also have unit roots, the usual t tests on the regression coefficients indicate statistically significant relationships even when none exists in reality.
• The spurious regression problem can also appear with I(0) series.
• In a spurious regression the errors are correlated, and the standard t-statistic is wrongly calculated because the variance of the errors is not consistently estimated. In the I(0) case the solution is

\[ \frac{\hat{\beta}}{\hat{\sigma}_{\hat{\beta}}} \rightarrow t\text{-distribution}, \quad \text{where } \hat{\sigma}_{\hat{\beta}} = (\text{long-run variance of } \hat{\beta})^{1/2} \]

Typical symptom: "High R², t-values, F-value, but low DW"

1. Egyptian infant mortality rate (Y), 1971-1990, annual data, on gross aggregate income of American farmers (I) and total Honduran money supply (M):
   $\hat{Y} = 179.9 - 0.2952\,I - 0.0439\,M$, R² = 0.918, DW = 0.4752, F = 95.17
   t-values: (16.63) (-2.32) (-4.26)
   Corr = 0.8858, -0.9113, -0.9445
2. US Export Index (Y), 1960-1990, annual data, on Australian males' life expectancy (X):
   $\hat{Y} = -2943 + 45.7974\,X$, R² = 0.916, DW = 0.3599, F = 315.2
   t-values: (-16.70) (17.76)
   Corr = 0.9570

Cointegration

If two or more series are themselves non-stationary, but a linear combination of them is stationary, then the series are said to be cointegrated.

Example: A stock market index and the price of its associated futures contract each follow a random walk over time. Testing the hypothesis that there is a statistically significant connection between the futures price and the spot price can be done by testing for a cointegrating vector.

The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run Ordinary Least Squares (OLS) regressions on data that had first been differenced. Although this method is correct in large samples, cointegration provides more powerful tools when the data sets are of limited length, as most economic time series are.

The two main methods for testing for cointegration are:
• The Engle-Granger three-step method
• The Johansen procedure

Granger Causality

• According to Granger, causality can be further subdivided into long-run and short-run causality.
• This requires the use of error correction models (ECMs) or VECMs, depending on the approach for determining causality.
• Long-run causality is determined by the error correction term: if it is significant, there is evidence of long-run causality from the explanatory variable to the dependent variable.
• Short-run causality is determined as before, with a test on the joint significance of the lagged explanatory variables, using an F-test or Wald test.
• Before the ECM can be formed, there first has to be evidence of cointegration. Since cointegration implies a significant error correction term, cointegration can be viewed as an indirect test of long-run causality.
• It is possible to have evidence of long-run causality but not short-run causality, and vice versa.
• In multivariate causality tests, testing long-run causality between two variables is more problematic, because it is impossible to tell which explanatory variable is responsible for the causality that runs through the error correction term.

Engle-Granger Approach

The parameters can be estimated by OLS on the linear regression equation

\[ Y_{1t} = \beta_0 + \beta_1 Y_{2t} + \dots + \beta_{M-1} Y_{Mt} + \varepsilon_t \]

The Dickey-Fuller t test is applied to the OLS residuals $\hat{\varepsilon}_t$. Rejecting the null hypothesis of non-stationarity leads to the conclusion that a cointegration relationship exists.

Three-step approach
• Determine the order of integration I(d) of every variable (Dickey-Fuller, Perron tests). H0: series is non-stationary
• Estimate the cointegration relation by OLS regression:

\[ y_{1t} = \beta_0 + \beta_1 y_{2t} + \varepsilon_t \]

• Test the residuals for stationarity:

\[ \varepsilon_t = y_{1t} - \beta_0 - \beta_1 y_{2t}, \qquad \hat{\varepsilon}_t = y_{1t} - \hat{\beta}_0 - \hat{\beta}_1 y_{2t} \]

H0: series are not cointegrated

The standard ADF critical values are not valid here because the test is applied to estimated OLS residuals; the MacKinnon tables are used to determine the critical values instead.

Multicointegration extends the cointegration technique beyond two variables, and occasionally to variables integrated of different orders.

Error Correction Model

Granger Representation Theorem: the dynamic relationship between cointegrated variables can be expressed in terms of their stationary error terms.

For the bivariate case, with two I(1) variables $y_{1t}$ and $y_{2t}$ yielding one cointegrated combination $\varepsilon_t \sim I(0)$:

\[ \Delta y_{1t} = \alpha_1 \varepsilon_{t-1} + \sum_{i=1}^{p-1} \left( a_{11,i}\,\Delta y_{1,t-i} + a_{12,i}\,\Delta y_{2,t-i} \right) + \eta_{1t} \]
\[ \Delta y_{2t} = \alpha_2 \varepsilon_{t-1} + \sum_{i=1}^{p-1} \left( a_{21,i}\,\Delta y_{1,t-i} + a_{22,i}\,\Delta y_{2,t-i} \right) + \eta_{2t} \]

where $\eta_{1t}$ and $\eta_{2t}$ are the equation errors. The parameters are estimated by OLS; the regression has only stationary variables on both sides.

Multivariate Cointegration Analysis - Johansen Test

A VAR(1) with M I(1) variables can be expressed as

\[ Y_t = \mu + \Gamma Y_{t-1} + \varepsilon_t \]

where $Y_t$, $\mu$ and $\varepsilon_t$ are (M x 1) vectors and $\Gamma$ is an (M x M) matrix. Subtracting $Y_{t-1}$ from both sides gives $\Delta Y_t = \mu + (\Gamma - I) Y_{t-1} + \varepsilon_t$, so the long-run behaviour is governed by the matrix $(\Gamma - I)$.

Johansen Test

The approach of Johansen is based on maximum likelihood estimation of the matrix $(\Gamma - I)$ under the assumption of normally distributed errors.
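The role of the matrix (Γ - I) can be illustrated with a small numerical sketch. It is not the Johansen estimator itself: it takes a known, hypothetical 2 x 2 coefficient matrix (the values of `GAMMA` are invented for illustration and do not come from the slides) and shows that the rank r of Π = Γ - I is what the test is about. With M variables, r = 0 means no cointegration, r = M means the levels are already stationary, and 0 < r < M means there are r cointegrating relations, with Π = αβ'.

```python
# Hypothetical 2x2 VAR(1) coefficient matrix; values are illustrative only.
GAMMA = [[0.5, 0.5],
         [0.25, 0.75]]


def long_run_matrix(gamma):
    """Return Pi = Gamma - I for a square matrix given as nested lists."""
    n = len(gamma)
    return [[gamma[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def rank_2x2(m, tol=1e-12):
    """Rank of a 2x2 matrix: 2 if det != 0, 1 if some entry != 0, else 0."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) > tol:
        return 2
    if any(abs(x) > tol for row in m for x in row):
        return 1
    return 0


PI = long_run_matrix(GAMMA)
r = rank_2x2(PI)

# For this example r = 1, so Pi factors as the outer product alpha * beta'.
# beta is normalised so that its first element is 1 (assumes PI[0][0] != 0).
beta = [1.0, PI[0][1] / PI[0][0]]   # cointegrating vector
alpha = [PI[0][0], PI[1][0]]        # adjustment (loading) coefficients

print("Pi =", PI)
print("rank r =", r)
print("beta =", beta, "alpha =", alpha)
```

In this sketch Π has rank 1 and β = (1, -1)', so $y_{1t} - y_{2t}$ is the stationary combination even though each series has a unit root. In practice Π is unknown and its rank has to be inferred from the data.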
Following the estimation, the hypotheses H0: r = 0, H0: r = 1, ..., H0: r = M-1 are tested using likelihood ratio (LR) tests, where r is the number of cointegrating relations.

The Johansen Trace and Maximal Eigenvalue Tests

• One of the best-known tests of whether the variables are cointegrated is the Johansen trace test. The Johansen test checks for the existence of cointegration. It is based on maximum likelihood estimation of the ECM under various assumptions about the trend or intercept parameters and the number of cointegrating vectors, followed by likelihood ratio tests.

Example: Exchange rate, interest rates, S&P 500 (GLOBAL) index, ISE index

Series: GLOBAL EXCHANGE_RATE INTEREST_RATE ISE
Lags interval (in first differences): 1 to 4

Trace Test

Hypothesized                    Trace        0.05
No. of CE(s)    Eigenvalue      Statistic    Critical Value    Prob.**
None *          0.065285        156.7717     47.85613          0.0000
At most 1 *     0.017048        38.96042     29.79707          0.0034
At most 2       0.005118        8.955273     15.49471          0.3695
At most 3       1.20E-06        0.002096     3.841466          0.9599

Trace test indicates 2 cointegrating eqn(s) at the 0.05 level
* denotes rejection of the hypothesis at the 0.05 level

Unrestricted Cointegration Rank Test (Maximum Eigenvalue)

Hypothesized                    Max-Eigen    0.05
No. of CE(s)    Eigenvalue      Statistic    Critical Value    Prob.**
None *          0.065285        117.8113     27.58434          0.0000
At most 1 *     0.017048        30.00514     21.13162          0.0022
At most 2       0.005118        8.953177     14.26460          0.2901
At most 3       1.20E-06        0.002096     3.841466          0.9599

Max-eigenvalue test indicates 2 cointegrating eqn(s) at the 0.05 level
* denotes rejection of the hypothesis at the 0.05 level
** MacKinnon-Haug-Michelis (1999) p-values

1 Cointegrating Equation(s): Log likelihood -46009.85
Normalized cointegrating coefficients (standard error in parentheses)

GLOBAL      EXCHANGE_RATE    INTEREST_RATE    ISE
1.000000    -0.000120        -16.55210        -0.026394
            (0.00014)        (1.44457)        (0.00218)

2 Cointegrating Equation(s): Log likelihood -45994.85
Normalized cointegrating coefficients (standard error in parentheses)

GLOBAL      EXCHANGE_RATE    INTEREST_RATE    ISE
1.000000    0.000000         -15.66567        -0.024910
                             (1.30729)        (0.00194)
0.000000    1.000000         7382.376         12.36210
                             (1649.84)        (2.44689)

Therefore, we can conclude that in the long run the four series are cointegrated, with 2 cointegrating equations.

Granger Causality Test: pairwise Granger causality tests are used to compare the variables pair by pair.

Pairwise Granger Causality Tests
Date: 07/20/08 Time: 10:40
Sample: 1/02/2001 12/31/2007
Lags: 5

Null Hypothesis                                         Obs     F-Statistic    Probability
INTEREST_RATE does not Granger Cause EXCHANGE_RATE      1745    28.3482        1.1E-27
EXCHANGE_RATE does not Granger Cause INTEREST_RATE              32.1459        2.0E-31
ISE does not Granger Cause EXCHANGE_RATE                1745    58.2545        3.6E-56
EXCHANGE_RATE does not Granger Cause ISE                        2.31559        0.04151
GLOBAL does not Granger Cause EXCHANGE_RATE             1745    21.4690        6.7E-21
EXCHANGE_RATE does not Granger Cause GLOBAL                     1.16105        0.32611
ISE does not Granger Cause INTEREST_RATE                1745    12.2991        9.7E-12
INTEREST_RATE does not Granger Cause ISE                        0.10286        0.99158
GLOBAL does not Granger Cause INTEREST_RATE             1745    2.98831        0.01084
INTEREST_RATE does not Granger Cause GLOBAL                     1.48645        0.19105
GLOBAL does not Granger Cause ISE                       1745    20.7727        3.3E-20
ISE does not Granger Cause GLOBAL                               1.91422        0.08894

ARDL APPROACH TO COINTEGRATION

AUTOREGRESSIVE DISTRIBUTED LAGS (ARDL) APPROACH

• In regression analysis, a model that includes both current and lagged values of the independent variables is called a distributed lag model; if it also includes lagged values of the dependent variable, it is called an autoregressive model (Gujarati, 2004).
• The autoregressive distributed lags method allows us to express the cointegrated behavior of variables that have different orders of integration.
• The ARDL procedure can be applied irrespective of whether the variables in the model are I(0), I(1) or mutually cointegrated (Pesaran et al., 2001).

ARDL MODEL REPRESENTATION

BOUNDS TESTING PROCEDURE

• The cointegration test in the ARDL method is carried out through the bounds testing procedure.
• The test uses two sets of asymptotic critical values: one set assumes that all variables are I(1), the other that all variables are I(0). Together the two sets provide critical value bounds for cointegration that cover both I(1) and I(0) data.
• Applying the ARDL procedure requires 3 steps:
– Applying the bounds testing procedure to detect cointegration relations between the variables
– Estimating the long-run coefficients of the cointegration relations found in the first step
– Estimating the short-run dynamic coefficients through vector error correction modeling

DECISION RULE FOR THE TEST

• The null hypothesis of no long-run relationship is tested with an F-statistic, using the critical values calculated by Pesaran et al. (2001).
• The lower bound critical values apply when the variables are I(0), and the upper bound critical values apply when they are I(1).
– If the computed F-statistic is less than the lower bound critical value, the null hypothesis of no long-run relationship between the variables cannot be rejected.
– If the computed F-statistic is greater than the upper bound critical value, the null hypothesis is rejected and the variables in the model can be claimed to be cointegrated.
– If the computed F-statistic falls between the lower and upper bound values, the test result is inconclusive.

EXAMPLE OF BOUNDS TESTING PROCEDURE

Cointegration hypothesis           F-statistic
F(CON|GDP,IND,LOS,PRC,URB)         3.1012**
F(GDP|CON,IND,LOS,PRC,URB)         6.3478*
F(IND|CON,GDP,LOS,PRC,URB)         7.2093*
F(LOS|CON,GDP,IND,PRC,URB)         1.8595
F(PRC|CON,GDP,IND,LOS,URB)         5.5008*
F(URB|CON,GDP,IND,LOS,PRC)         0.88845

* significance at the 1% level, ** at the 2.5% level, with respect to Pesaran and Pesaran (1997) critical values.

The bounds test results indicate that there are four cointegrated relations, obtained when the dependent variable is annual electricity consumption, GDP, industry value added, or mean-adjusted annual average electricity prices.

VECTOR ERROR CORRECTION MODELS

• After the bounds testing procedure has been used to determine the cointegrating relationships, the short-run and long-run coefficients and the related Error Correction Models (ECM) are estimated within the ARDL method, with lag orders selected by the Schwarz Information Criterion (SIC).
• A vector error correction model can be described as a restricted vector autoregression, used for cointegrated non-stationary variables.
• The VECM is useful for determining the short-term dynamics between variables while restricting their long-run behavior. It restricts the long-run relationships through the cointegrating relations, and the error correction term represents the deviation from long-run equilibrium.

EXAMPLE FOR ESTIMATING COEFFICIENTS

The long-run coefficient estimates and ECM. Dependent variable: CON, ARDL(3,2,2,3,3,3)

(a) Estimated Long Run Coefficients

Regressor    Coefficient    Standard Error    T-Ratio (Prob: * 1%, ** 5%)
GDP          0.0020823      0.1058E-3         19.6864*
IND          -0.50248       0.052200          -9.6260*
LOS          1.9291         0.059938          32.1848*
PRC          -0.17769       3.6958            -0.048078*
URB          1977.1         104.0641          18.9993*
Constant     -627.1865      37.8846           -16.5551

(b) Error Correction Representation for the ARDL Model

Regressor    Coefficient    Standard Error    T-Ratio (Prob: * 1%, ** 5%)
ΔCON(-1)     1.1063         0.23387           4.7303*
ΔCON(-2)     0.63176        0.18166           3.4777*
ΔGDP         0.0027787      0.4203E-3         6.6117*
ΔGDP(-1)     -0.0017604     0.4911E-3         -3.5848*
ΔIND         -0.74040       0.14494           -5.1084*
ΔIND(-1)     0.42188        0.13797           3.0579*
ΔLOS         2.7343         0.44020           6.2116*
ΔLOS(-1)     -0.035601      0.46645           -0.076323 (.940)
ΔLOS(-2)     -2.0454        0.49798           -4.1074*
ΔPRC         -21.6140       6.7027            -3.2247*
ΔPRC(-1)     -14.7163       6.7179            -2.1906**
ΔPRC(-2)     -19.7450       8.4128            -2.3470**
ΔURB         2992.8         1910.6            1.5664 (.132)
ΔURB(-1)     141.2130       2060.3            0.068541 (.946)
ΔURB(-2)     -5826.9        1823.4            -3.1957*
INTERCEPT    -1286.2        192.2404          -6.6905*
ECM(-1)      -2.0507        0.30552           -6.7122*

REFERENCES

• Pesaran, M.H., Shin, Y., Smith, R.J., 2001. Bounds testing approaches to the analysis of level relationships. Journal of Applied Econometrics. 16, 289-326.
• Pesaran, M.H., Pesaran, B., 1997. Working with Microfit 4.0: interactive econometric analysis. Oxford University Press.
• Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, Princeton, New Jersey.
• Engle, R.F., Granger, C.W.J., 1987. Co-integration and error correction: Representation, estimation and testing. Econometrica. 55, 251-276.
• World Bank Statistics Service (data resource). Available at: http://data.worldbank.org/ (Accessed: 4.29.2013).
• TEIAS Electricity Statistics (data resource). Available at: http://www.teias.gov.tr/istatistikler.aspx (Accessed: 4.29.2013).