Preliminary Draft July 2013

The Essentials of Time Series Modeling: An Applied Treatment with Emphasis on Topics Relevant to Financial Analysis

Houston H. Stokes
Professor of Economics
University of Illinois at Chicago
hhstokes@uic.edu

PREFACE
Chapter 1
1.1 OVERVIEW
Figure 1.1-1 GAM Smoothed AGE series
Figure 1.1-2 GAM Smoothed BDEFICIT series
1.2 Plan of Book
Table 1.1 Useful Concepts

PREFACE

This monograph is intended to give the reader the essentials of modern time series analysis without emphasis on proving theorems. The focus is on illustrating theorems with actual problems, using the B34S® software and other programs such as SCA and RATS. Actual and generated datasets are used. The reader can cut and paste self-contained code to extend the examples. If the reader does not have access to a computer, the output of the examples can be studied for insight.
This monograph should be read in conjunction with Stokes (1997, 200x), Stokes-Neuburger (1998), Enders (1995, 2004, 2010) and other time series texts such as Liu (2005). While Stokes (1997, 200x) covers a wide range of topics, this monograph's goal is to expand on material in Stokes (1997, 200x), particularly chapters 7, 8, 14 and 15. A major objective is to show the setups and output of a wide range of problems and discuss the results. I learned many years ago from Henri Theil that it is only through practice, through looking at printouts and results, that one gets a real feel for how to apply econometrics to real problems. The goal is to facilitate this process by showing how time series methods can be thought of as an extension of traditional econometric analysis.

Many people have assisted me in the writing phase of this study. My wife, Diana A. Stokes, has helped in the editing phase of the document and provided continuing and invaluable support. Jin-Man Lee has helped in the testing of the matrix command and advised on content. My students in Economics 537 and 538 have helped me improve the exposition and the focus. This document, however, is still a work in progress.

Houston H. Stokes
September 2007

Chapter 1 in The Essentials of Time Series Modeling: An Applied Treatment with Emphasis on Topics Relevant to Financial Analysis © by Houston H. Stokes 16 July 2013. All rights reserved. Preliminary Draft

Chapter 1

1.1 OVERVIEW

This overview will give you a brief "road map" of how modern time series analysis fits into general econometric modeling. In this overview, not all details are given nor are terms completely defined. As the monograph proceeds, please refer to this section to see where you are going, where you came from and how much you now understand.¹ At every step computer applications are shown to see how the theory is applied. These applications use the B34S® software.
Most sample problems are variants of examples distributed with B34S so that they can be modified by users for further experimentation. If this is not the case, since the examples are complete, they can be "cut" out of this document and run. The goal of this overview is to provide a rough outline of how time series fits into applied econometric analysis.

The basic OLS model for k input series² is

$$y_t = \alpha + \sum_{i=1}^{k} \beta_i x_{t,i} + u_t \tag{1.1-1}$$

Equation (1.1-1) assumes no serial correlation, $E(u_t u_{t-k}) = 0$ for $k > 0$, no heteroskedasticity (constant residual variance), and that all coefficients are stable over time and independent of the level of the explanatory variable. Define B as the backshift operator such that $B^k x_t \equiv x_{t-k}$ for $k > 0$. Many books use L in place of B. The simple GLS model for one input series (here $x_t \equiv x_{t,1}$) is

$$y_t = \alpha + \beta_1 x_t + \beta_2 z_t + [u_t / (1 - \phi_1 B - \dots - \phi_k B^k)]$$

$$y_t (1 - \phi_1 B - \dots - \phi_k B^k) = \alpha (1 - \phi_1 B - \dots - \phi_k B^k) + \beta_1 x_t (1 - \phi_1 B - \dots - \phi_k B^k) + \beta_2 z_t (1 - \phi_1 B - \dots - \phi_k B^k) + u_t \tag{1.1-2}$$

¹ Key references for these notes are: Enders (1995, 2004, 2010), Stokes (1997, 200x), Nelson (1973) and the major work Box-Jenkins-Reinsel (2008) that started everything. Hamilton (1994) is a comprehensive but highly technical reference. Much of modern time series has found its way into basic texts such as Greene (2000). A simplified treatment is contained in Liu (2005). Finance applications of time series include Campbell-Lo-MacKinlay (1997), Lo-MacKinlay (1999) and Stokes-Neuburger (1998). This monograph attempts to summarize key topics in these references with a major emphasis on applied examples. Until this note is removed, it should be treated as a "provisional text." The B34S matrix command and other time series commands are used to illustrate the theory.

² It is assumed that the reader has had a basic statistics course and knows the OLS model.

Equation (1.1-2) can be viewed as a special case of the more general rational distributed lag model and the even more general transfer function model given in (1.1-8).
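The quasi-differenced form of (1.1-2) can be illustrated with a short numerical sketch. The Python code below is not B34S code and is not taken from the monograph; it assumes a single AR(1) error term, simulated data with made-up coefficients, and a simple two pass scheme (OLS residuals give an estimate of the AR coefficient, which is then used to transform every column, including the constant). The function names `ols` and `two_pass_gls_ar1` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y = X b + u."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_pass_gls_ar1(y, x, z):
    """Two pass GLS sketch for y_t = a + b1*x_t + b2*z_t + u_t/(1 - phi*B)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x, z])
    b_ols = ols(X, y)
    resid = y - X @ b_ols
    # Pass 1: estimate phi by regressing the OLS residual on its own lag.
    phi = float(resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1]))
    # Pass 2: apply (1 - phi*B) to y and to every column of X,
    # including the constant, as in the second line of (1.1-2).
    y_star = y[1:] - phi * y[:-1]
    X_star = X[1:] - phi * X[:-1]
    b_gls = ols(X_star, y_star)
    return phi, b_ols, b_gls

# Simulated data; all numerical values are illustrative assumptions.
rng = np.random.default_rng(0)
n = 2000
x, z, e = rng.normal(size=(3, n))
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + e[t]          # AR(1) error with phi = 0.7
y = 1.0 + 2.0 * x - 1.0 * z + u

phi_hat, b_ols, b_gls = two_pass_gls_ar1(y, x, z)
print("phi estimate:", round(phi_hat, 3))
print("GLS coefficients (a, b1, b2):", np.round(b_gls, 3))
```

Because the constant column is transformed along with the regressors, the first GLS coefficient estimates the intercept of the original equation directly. A joint (ML) estimator would instead estimate the AR coefficient and the regression coefficients together.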
A number of methods can be used to estimate the GLS AR parameters. ML (maximum likelihood) methods use nonlinear estimation to estimate these terms jointly with the model. Two pass methods are simpler but implicitly assume that the covariance between the AR terms and the other coefficients in the model is zero.

A subset of the transfer function model is the ARIMA(p,d,q) model where p refers to the AR part of the model, d refers to the differencing part of the model and q refers to the MA part of the model. If we take the GLS model (1.1-2) and generalize the error process to have q moving average terms and p autoregressive terms and no differencing terms (d=0), we have

$$y_t = \alpha + \beta_1 x_t + \beta_2 z_t + u_t (1 - \theta_1 B - \dots - \theta_q B^q) / (1 - \phi_1 B - \dots - \phi_p B^p) \tag{1.1-3}$$

It may be more parsimonious to express the error process as a ratio of polynomials than as many AR terms. In general, if there is invertibility, an MA(1) model can be written as an AR(∞) model and an AR(1) model can be written as an MA(∞) model. If we have only one series, then it may be possible to filter the series (remove its autocorrelation) with an ARIMA model, which can be written from (1.1-3) as

$$y_t = u_t (1 - \theta_1 B - \dots - \theta_q B^q) / (1 - \phi_1 B - \dots - \phi_p B^p) \tag{1.1-4}$$

or, if differencing of $y_t$ is needed to obtain stationarity,

$$(1 - B)\, y_t = u_t (1 - \theta_1 B - \dots - \theta_q B^q) / (1 - \phi_1 B - \dots - \phi_p B^p) \tag{1.1-5}$$

Note that (1.1-5) is a special case of the more general multiple series model (1.1-3). It will be shown later that (1.1-3) is a special case of the more general transfer function model. The great advantage of the ARMA model (1.1-5) is that it can be applied in cases where we do not have a good theory on what series to use on the right hand side of a model such as (1.1-3), although autocorrelation analysis indicates there is structure in the $y_t$ series. An ARMA model, if correctly identified, captures the structure in the series, or filters the series to obtain white noise, and can be used to forecast ahead.
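The statement that an invertible MA(1) model can be written as an AR(∞) model, and that such a model filters a series to white noise, can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the monograph's B34S setup: it assumes the sign convention $y_t = (1 - \theta B)u_t$, a made-up value $\theta = 0.5$, and a truncation of the infinite AR expansion at 30 lags. The helper names are hypothetical.

```python
import numpy as np

theta = 0.5                      # assumed MA(1) coefficient, |theta| < 1
rng = np.random.default_rng(1)
n = 2000
u = rng.normal(size=n)           # white noise input
y = u.copy()
y[1:] -= theta * u[:-1]          # y_t = u_t - theta*u_{t-1}, pre-sample u = 0

def ar_weights(theta, n_lags):
    """Weights of 1/(1 - theta*B) = 1 + theta*B + theta^2*B^2 + ..."""
    return theta ** np.arange(n_lags + 1)

def filter_ma1(y, theta, n_lags):
    """Recover u_t as sum_j theta^j * y_{t-j}, truncated at n_lags."""
    w = ar_weights(theta, n_lags)
    return np.convolve(y, w)[: len(y)]

def acf1(s):
    """Lag 1 sample autocorrelation."""
    s = s - s.mean()
    return float(s[1:] @ s[:-1] / (s @ s))

u_hat = filter_ma1(y, theta, n_lags=30)
max_err = float(np.max(np.abs(u_hat - u)))

print("lag 1 ACF of y:", round(acf1(y), 3))        # theory: -theta/(1+theta^2)
print("lag 1 ACF of filtered series:", round(acf1(u_hat), 3))
print("max recovery error:", max_err)
```

The truncation error is of order $\theta^{31}$, which is why 30 lags are enough here; a value of $\theta$ near one would need many more AR terms, which is the parsimony argument for keeping the MA form.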
Forecasts of such models can be updated in the field without having to re-estimate the model. In contrast, a model of the form $y_t = f(x_t)$ with no lags can only be used to forecast if we have future data on $x_t$. This is often hard to achieve in practice unless $x_{t+k}$ is an expectation.

In summary, the objective of ARIMA model building is to select the appropriate terms in the AR and MA parts of the model so that there is no structure left in the error process. The autocorrelation and partial autocorrelation functions help us in this task. The Dickey-Fuller and Phillips-Perron tests can be used to determine if the series are stationary. Consider (1.1-6). Unless the series $[\omega_1(B)/\delta_1(B)]x_t$ is stationary, the ACF and PACF of $y_t$ cannot be calculated. If a stationary series is being used to predict a nonstationary series, the theory of the model is not consistent with the state of the world. If a model between two nonstationary series is estimated, the error may or may not be stationary. If the errors of the model are stationary, then the series are said to be cointegrated, that is, a linear combination of the two series is stationary. Unit root analysis helps us in this task.

A rational distributed lag model starts with (1.1-1) and adds a ratio of polynomials for the coefficients. Using mathematical shorthand

$$y_t = \alpha + [\omega_1(B)/\delta_1(B)]x_t + [\omega_2(B)/\delta_2(B)]z_t + u_t \tag{1.1-6}$$

where $\omega_i(B)$ and $\delta_i(B)$ are polynomials in the lag operator B. If we add an ARMA process for the error from (1.1-3) we obtain

$$y_t = \alpha + [\omega_1(B)/\delta_1(B)]x_t + [\omega_2(B)/\delta_2(B)]z_t + u_t [(1 - \theta_1 B - \dots - \theta_q B^q) / (1 - \phi_1 B - \dots - \phi_p B^p)] \tag{1.1-7}$$

which can be written more generally and compactly as

$$y_t = \alpha + [\omega_1(B)/\delta_1(B)]x_t + [\omega_2(B)/\delta_2(B)]z_t + [\theta(B)/\phi(B)]u_t \tag{1.1-8}$$

Equation (1.1-8) is a transfer function. It assumes that the dynamic relationship moves from x and z to y. Feedback from y to x and y to z is ruled out. The noise model is an ARIMA model that allows us to model systematic errors and improve the forecasting.
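To see what a ratio of lag polynomials does to an input series, it can help to compute the implied lag weights. The Python sketch below is an illustration only. It assumes the conventions $\omega(B) = \omega_0 + \omega_1 B + \dots$ and $\delta(B) = 1 - \delta_1 B - \dots$ (the sign convention on $\delta(B)$ is an assumption), made-up values $\omega_0 = 2$ and $\delta_1 = 0.5$, and a hypothetical helper `rational_lag` that applies the filter recursively.

```python
import numpy as np

def rational_lag(x, omega, delta):
    """Apply [omega(B)/delta(B)] to x.

    omega(B) = omega[0] + omega[1]*B + ...
    delta(B) = 1 - delta[0]*B - delta[1]*B^2 - ...   (assumed sign convention)
    Pre-sample values are treated as zero.
    """
    out = np.zeros(len(x))
    for t in range(len(x)):
        num = sum(omega[i] * x[t - i] for i in range(len(omega)) if t - i >= 0)
        rec = sum(delta[j] * out[t - 1 - j]
                  for j in range(len(delta)) if t - 1 - j >= 0)
        out[t] = num + rec
    return out

# Impulse response: weights on x_t, x_{t-1}, ... implied by 2/(1 - 0.5*B).
impulse = np.zeros(8)
impulse[0] = 1.0
weights = rational_lag(impulse, omega=[2.0], delta=[0.5])

# Step response: converges to the long-run gain omega(1)/delta(1) = 4.
step = rational_lag(np.ones(60), omega=[2.0], delta=[0.5])

print("lag weights:", weights)
print("long-run gain:", step[-1])
```

Two parameters generate an infinite sequence of geometrically declining weights $2(0.5)^j$, which is the sense in which the rational form is more parsimonious than a long unrestricted distributed lag.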
Transfer function models can be tested to see if there is feedback. If feedback is found, VAR, VMA or VARMA models should be considered. If we drop the assumption of no feedback, then the general VARMA model (see Stokes (1997) page 197) becomes

$$G(B) Z_t = D(B) u_t \tag{1.1-9}$$

where $Z_t'$ is the t-th observation on k series $\{x_{1t}, \dots, x_{kt}\}$ and G(B) and D(B) are k by k polynomial matrices in which each element, $G_{ij}(B)$ and $D_{ij}(B)$, is itself a polynomial in the lag operator B. Assuming k = 3, we can write (1.1-9) as

$$\begin{bmatrix} G_{11}(B) & G_{12}(B) & G_{13}(B) \\ G_{21}(B) & G_{22}(B) & G_{23}(B) \\ G_{31}(B) & G_{32}(B) & G_{33}(B) \end{bmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \end{bmatrix} = \begin{bmatrix} D_{11}(B) & D_{12}(B) & D_{13}(B) \\ D_{21}(B) & D_{22}(B) & D_{23}(B) \\ D_{31}(B) & D_{32}(B) & D_{33}(B) \end{bmatrix} \begin{bmatrix} u_{1t} \\ u_{2t} \\ u_{3t} \end{bmatrix} \tag{1.1-10}$$

If $G_{ij}(B) = D_{ij}(B) = 0$ for $i \neq j$, then the above model reduces to three ARIMA models. If $G_{ij}(B) = D_{ij}(B) = 0$ for $i < j$, then series 1 is exogenous to series 2 and series 2 is exogenous to series 3. If $D(B) = I$, then we have a VAR model, while if $G(B) = I$ we have a VMA model. It is important to note that while the VARMA form of the model may be the most parsimonious form, if invertibility is possible, a VARMA model can be written in VAR or VMA form as $[D(B)]^{-1} G(B)$ or $[G(B)]^{-1} D(B)$ respectively. If such a model were to be estimated, then many parameters would turn out not to be significant due to covariance between the parameters. Note that the transfer function model (1.1-8) is a special case of the more general VARMA model (1.1-10) just as the ARIMA model is a special case of the transfer function model.

The above models all assume constant variance of the error term, or homoskedasticity. Engle (1982) and others developed a class of models that drops this assumption and estimates both the first moment and the second moment of a process. These approaches, which can be applied to both univariate and multivariate models, will be sketched.
Assume the variance of a model of $y_t$, conditional on all information known up to period t-1, $\Omega_{t-1}$, can be written

$$V(y_t \mid \Omega_{t-1}) = \alpha_0 + \alpha_1 e_{t-1}^2 + \dots + \alpha_q e_{t-q}^2 \tag{1.1-11}$$

where $e_t$ is the error of the first moment equation. Such an ARCH model attempts to explain variance clustering in the residuals. ARCH models imply nonlinear dependence among the squared errors of the first moment model. If we define $v_t \equiv V(y_t \mid \Omega_{t-1})$, then the GARCH second moment equation is:

$$v_t = \alpha_0 + \alpha_1 e_{t-1}^2 + \dots + \alpha_q e_{t-q}^2 + \beta_1 v_{t-1} + \dots + \beta_p v_{t-p} \tag{1.1-12}$$

which can be seen as an ARIMA(p,0,q) model on the second moment. Such a model can be generalized to a transfer function. For example, a GARCH-M(p,q) model with an MA(1) term maximizes $-.5 \sum_t (\log(v_t) + e_t^2 / v_t)$ where

$$\begin{aligned} y_t &= a + \sum_{i=0}^{k} \omega_i B^i x_t + \sum_{j=1}^{m} \phi_j B^j y_t + \lambda (v_t)^2 + e_t - \theta e_{t-1} \\ v_t &= a_0 + \sum_{j=1}^{q} a_j e_{t-j}^2 + \sum_{i=1}^{p} g_i v_{t-i} \end{aligned} \tag{1.1-13}$$

If we drop $e_t$ from the first equation and assume either no input series or that the input series starts at lag t-1, we can think of it as the expected value of $y_t$ conditional on information known at period t-1. The second equation calculates the expected value of the squared error of the first equation, conditional on information known up to period t-1, or $E_{t-1} e_t^2 = v_t$. If $\lambda = 0$, $k = 1$, $m = 1$ we have a GARCH(1,1) model on the error term of a transfer function. If $\lambda \neq 0$ we have a GARCH(p,q)-M transfer function model, since information from the second equation on the second moment is feeding back into the first equation. Such models cannot be estimated using the two pass method if we use $E_{t-1} e_t^2 = v_t$. If there is no input, then we just have a GARCH(p,q) or GARCH(p,q)-M model depending on the value of $\lambda$. Here we show the case where $\lambda = 0$.

$$\begin{aligned} y_t &= a + \sum_{j=1}^{m} \phi_j B^j y_t + e_t \\ v_t &= a_0 + \sum_{j=1}^{q} a_j (e_{t-j})^2 + \sum_{i=1}^{p} g_i v_{t-i} \end{aligned} \tag{1.1-14}$$

If $\lambda = 0$, there is no feedback from the second moment equation to the first moment equation and the GARCH(1,1) model can be estimated using either the joint (one pass) method or the two pass method. The two pass method is computationally much simpler and was originally suggested by Engle (1982).
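The two pass idea can be sketched numerically for the simplest case. The Python code below is a minimal illustration, not the B34S implementation: it assumes an AR(1) first moment equation with an ARCH(1) second moment (a special case of (1.1-14) with m = 1, q = 1 and no lagged $v_t$ terms), simulated data with made-up parameter values, and OLS in both passes. The helper `garch_loglik` evaluates the objective $-.5\sum(\log(v_t) + e_t^2/v_t)$ that a joint estimator would maximize; all names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y = X b + u."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def garch_loglik(e, v):
    """Gaussian log likelihood up to a constant: -0.5*sum(log(v) + e^2/v)."""
    return float(-0.5 * np.sum(np.log(v) + e**2 / v))

# Simulate y_t = a + phi*y_{t-1} + e_t with v_t = a0 + a1*e_{t-1}^2.
# All parameter values are illustrative assumptions.
rng = np.random.default_rng(2)
n = 20000
a, phi, a0, a1 = 1.0, 0.5, 1.0, 0.25
y = np.zeros(n)
e = np.zeros(n)
v = np.full(n, a0 / (1.0 - a1))          # start at the unconditional variance
e[0] = np.sqrt(v[0]) * rng.normal()
y[0] = a / (1.0 - phi) + e[0]
for t in range(1, n):
    v[t] = a0 + a1 * e[t - 1] ** 2
    e[t] = np.sqrt(v[t]) * rng.normal()
    y[t] = a + phi * y[t - 1] + e[t]

# Pass 1: first moment equation, AR(1) by OLS.
X1 = np.column_stack([np.ones(n - 1), y[:-1]])
b1 = ols(X1, y[1:])
e_hat = y[1:] - X1 @ b1

# Pass 2: second moment equation, squared residual on its own lag.
e2 = e_hat**2
X2 = np.column_stack([np.ones(len(e2) - 1), e2[:-1]])
a0_hat, a1_hat = ols(X2, e2[1:])

print("AR(1) coefficient:", round(float(b1[1]), 3))
print("ARCH(1) estimates a0, a1:", round(float(a0_hat), 3), round(float(a1_hat), 3))
print("log likelihood, ARCH variance:    ", garch_loglik(e, v))
print("log likelihood, constant variance:", garch_loglik(e, np.full(n, e.var())))
```

The second pass treats the squared residuals as data for an ordinary autoregression, which is why it is simple to compute. It is only appropriate when the second moment does not feed back into the first moment equation.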
For one-variable (ARIMA) models, the two pass approach involves estimation of two ARIMA models. The first model is on the series. The second model is on the square of the errors of the first moment model. The ARCH/GARCH class of models can be extended to multiple series, where it is called BGARCH or bivariate GARCH. This can be seen as a VARMA model of the form of (1.1-10) on the second moment.

One main assumption of the VARMA model is that there is a multidimensional hyperplane. This means that no matter what the values of the variables, the same coefficients are used. This is a strong, but widely made, assumption that, if the model is nonlinear, will trigger the nonlinearity tests in Stokes (200x chapter 8). One way to proceed is to use theory, or luck, to parameterize a nonlinear model. This is very difficult unless theory provides a clear formulation. As an alternative, MARS, GAM and ACE models attempt to automatically detect any nonlinearity in the model. These approaches are covered in some detail in Stokes (200x) and will only be sketched here.

The MARS model, covered in more detail in Stokes (200x Chapter 14), drops the assumption of a hyperplane. Variables on the right no longer have to be "switched on" all the time. The VAR, AR, and OLS models become special cases of this more general nonlinear representation that includes the TAR (threshold autoregressive) model as a special case. Assume a nonlinear model of the form

$$y = f(x_1, \dots, x_m) + e \tag{1.1-15}$$

involving N observations on m right-hand-side variables, $x_1, \dots, x_m$. A MARS model attempts to approximate the nonlinear function f( ) by

$$\hat{f}(X) = \sum_{j=1}^{s} c_j K_j(X), \tag{1.1-16}$$

where $\hat{f}(X)$ is an additive function of the product basis functions $\{K_j(X)\}_{j=1}^{s}$ associated with the s sub regions $\{R_j\}_{j=1}^{s}$ and $c_j$ is the coefficient for the j-th product basis function.
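The basis function expansion (1.1-16) can be made concrete with a small numerical sketch. The Python code below is a simplified illustration, not the B34S mars procedure or Friedman's algorithm: it assumes one right-hand-side variable, basis functions built from truncated linear splines of the form max(x - knot, 0) and max(knot - x, 0), a single knot chosen by a crude grid search over candidate values, and simulated data with made-up coefficients. The names `hinge` and `fit_hinge_pair` are hypothetical.

```python
import numpy as np

def hinge(u):
    """Truncated spline: returns u where u > 0 and 0 otherwise."""
    return np.maximum(u, 0.0)

def fit_hinge_pair(x, y, knot):
    """OLS fit of y on a constant, (x - knot)_+ and (knot - x)_+.

    Returns the coefficient vector and the residual sum of squares.
    """
    K = np.column_stack([np.ones(len(x)), hinge(x - knot), hinge(knot - x)])
    coef, *_ = np.linalg.lstsq(K, y, rcond=None)
    resid = y - K @ coef
    return coef, float(resid @ resid)

# Simulated data with a slope change at x = 100 (illustrative values only).
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(50.0, 150.0, size=n)
y = (5.0 + 0.8 * hinge(x - 100.0) + 0.3 * hinge(100.0 - x)
     + rng.normal(scale=0.2, size=n))

# Coefficients when the knot is known.
coef_true_knot, sse_true_knot = fit_hinge_pair(x, y, 100.0)

# Crude knot search: pick the candidate knot with the smallest SSE.
candidates = np.arange(60.0, 141.0, 1.0)
sse = np.array([fit_hinge_pair(x, y, k)[1] for k in candidates])
best_knot = float(candidates[np.argmin(sse)])

print("coefficients at knot 100:", np.round(coef_true_knot, 3))
print("knot chosen by grid search:", best_knot)
```

An interaction term would be formed as the product of two such functions on different variables, for example `hinge(x - k1) * hinge(z - k2)`. A full MARS implementation adds and prunes such terms automatically, which this sketch does not attempt.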
If all sub regions include the complete range of each of the right-hand-side variables, then the coefficients $\{c_j\}_{j=1}^{s}$ can be interpreted as just OLS coefficients of variables or interactions among variables. The B34S mars procedure can identify the sub regions under which the coefficients are stable and detect any possible interactions up to a maximum number of possible interactions controllable by the user. For example, assume the model

$$y = \begin{cases} \alpha + \beta_1 x + e & \text{for } x > 100 \\ \alpha + \beta_2 x + e & \text{for } x \le 100 \end{cases} \tag{1.1-17}$$

In terms of the MARS notation, this is written

$$y = \alpha' + c_1 (x - \tau^*)_+ + c_2 (\tau^* - x)_+ + e, \tag{1.1-18}$$

where $\tau^* = 100$ and $(\ )_+$ is the right (+) truncated spline function, which takes on the value 0 if the expression inside $(\ )_+$ is negative and its actual value if the expression inside $(\ )_+$ is > 0. Here $c_1 = \beta_1$ and $c_2 = -\beta_2$. In terms of equation (1.1-16), $K_1(X) = (x - \tau^*)_+$ and $K_2(X) = (\tau^* - x)_+$. Note that the derivative of the spline function is not defined for values of x at the knot value of 100. Friedman (1991b) suggests using either a linear or cubic approximation to determine the exact y value. In the results reported later, both evaluation techniques were tested and the one with the lowest sum of squares of the residual was selected. The MARS user selects the maximum number of knots to consider and the highest order interaction to investigate. Alternatively, the minimum number of observations between knots can be set. An example of an interaction model for $y = f(x, z)$ follows.

$$y = \alpha + c_1 (x - \tau_1^*)_+ + c_2 (\tau_1^* - x)_+ + c_3 (x - \tau_1^*)_+ (z - \tau_2^*)_+ + e \tag{1.1-19}$$

implies that

$$y = \begin{cases} \alpha + c_1 x - c_1 \tau_1^* + e & \text{for } x > \tau_1^* \text{ and } z \le \tau_2^* \\ \alpha - c_2 x + c_2 \tau_1^* + e & \text{for } x \le \tau_1^* \\ \alpha + c_1 x - c_1 \tau_1^* + c_3 (xz - \tau_1^* z - \tau_2^* x + \tau_1^* \tau_2^*) + e & \text{for } x > \tau_1^* \text{ and } z > \tau_2^* \end{cases} \tag{1.1-20}$$

An alternative to the MARS model is the GAM (generalized additive model) discussed by Hastie-Tibshirani (1986, 1990) and Faraway (2006, 240). Assuming $y = f(x_1, x_2, \dots, x_k)$ where $x_i$ and y are one dimensional vectors, a GAM model can be written as

$$E(y \mid x_1, x_2, \dots, x_k) = \alpha_0 + \sum_{j=1}^{k} a_j \phi_j(x_j) \tag{1.1-21}$$

where the $\phi_j(\cdot)$
are smooth functions standardized (to remove free constants) so that $E\phi_j(x_j) = 0$ and estimated one at a time using forward stepwise estimation with a scatterplot smoother. When (1.1-21) is estimated with OLS, the expected coefficients are all 1.0. The user sets the degree of the smoother. The B34S implementation allows the user who has set the degree of the smoother > 1 to see the "cost," in the sense of an increase in the error sum of squares, if the linearity assumption (degrees of freedom = 1) were imposed. A significance test that measures the difference of the sum of squares of the residuals for the linear restriction case and the DF > 1 case allows a relative measure of the degree of nonlinearity, by variable, that is assumed away if the model is estimated using OLS.

The Hastie-Tibshirani (1990, 87) data on diabetes that models ln(level of serum C-peptide) (lpeptide) as a function of age and base deficit (bdeficit) will be used to show how the GAM model can be used to graphically display the degree of nonlinearity. Annotated output follows.

Ordinary Least Squares Estimation

  Dependent variable                  LPEPTIDE
  Centered R**2                       0.3742769804943026
  Adjusted R**2                       0.3429908295190177
  Residual Sum of Squares             0.6685211174195327
  Residual Variance                   1.671302793548832E-02
  Standard Error                      0.1292788766020510
  Total Sum of Squares                1.068397831915541
  Log Likelihood                      28.50922237373565
  Mean of the Dependent Variable      1.545441506335143
  Std. Error of Dependent Variable    0.1594930832890150
  Sum Absolute Residuals              4.189827641219706
  F( 2, 40)                           11.96302417609537
  F Significance                      0.9999153469426932
  1/Condition XPX                     5.670219014249498E-04
  Maximum Absolute Residual           0.2937646199111468
  Number of Observations              43

  Variable    Lag   Coefficient       SE                t
  AGE          0    0.15016835E-01    0.50989105E-02    2.9451067
  BDEFICIT     0    0.89648947E-02    0.28794517E-02    3.1134033
  CONSTANT     0    1.4828545         0.59691527E-01    24.841960

Generalized Additive Models (GAM) Analysis

  Reference: Generalized Additive Models by Hastie and Tibshirani. Chapman (1990)
  Model estimated with GPL code obtained from R.
  Gaussian additive model assumed
  Identity link - yhat = x*b + sum(splines)

  Response variable ....       LPEPTIDE
  Number of observations:      43
  Residual Sum of Squares      0.4518579290553415
  # iterations                 1
  # smooths/variable           9
  Mean Squared Residual        1.050832393151957E-02
  df of deviance               35.99996595688997
  Scale Estimate               1.255162100976546E-02
  Primary tolerence            1.000000000000000E-09
  Secondary tolerance          1.000000000000000E-09
  R square                     0.5770695937811844
  Total sum of Squares         1.068397831915541

  Model df   coef            st err        z score   nl pval   lin_res   Name       Lag
  --------   ----            ------        -------   -------   -------   ----       ---
   1.        1.47871         0.5173E-01    28.59                         intcpt
   3.00      0.147574E-01    0.4419E-02    3.340     0.9941    0.6085    AGE         0
   3.00      0.816856E-02    0.2495E-02    3.274     0.8278    0.5145    BDEFICIT    0
  -----
   7.00

The gain from going from OLS to GAM was to reduce $e'e$ from .66852 to .45186. When age is restricted to be linear, $e'e$ becomes .6085, which is a highly significant difference. When bdeficit is restricted to be linear, $e'e$ increases to .5145, which is significant at .8278. Plots of the smoothed right hand side variables are shown next and should be compared to the same plots in Hastie-Tibshirani (1990, 87) figure 4.3.

[Plot "Surface AGE": LEVERAGE plotted against X_VAR, with LOWER_B and UPPER_B bounds]

Figure 1.1-1 GAM Smoothed AGE series

[Plot "Surface BDEFICIT": LEVERAGE plotted against X_VAR, with LOWER_B and UPPER_B bounds]

Figure 1.1-2 GAM Smoothed BDEFICIT series

Another alternative is the ACE model (Breiman-Friedman (1985)) that smooths both the left hand side and the right hand side. The ACE model is written as

$$\theta(y) = \alpha_0 + \sum_{j=1}^{k} a_j \phi_j(x_j) \tag{1.1-22}$$

If $\theta$ is invertible, then once the model is estimated (1.1-22) can be written as

$$y = \hat{\theta}^{-1}\Big[\hat{\alpha}_0 + \sum_{j=1}^{k} \hat{a}_j \phi_j(x_j)\Big]. \tag{1.1-23}$$

The ACE algorithm minimizes the squared error $E\{\theta(y) - \alpha_0 - \sum_{j=1}^{k} \phi_j(x_j)\}^2$ subject to $\mathrm{var}\{\theta(y)\} = 1$.
The steps of the ACE algorithm³ are:

(i) Initialize by setting $\theta(y) = \{y - E(y)\} / \{\mathrm{var}(y)\}^{.5}$.
(ii) Fit an additive model to $\theta(y)$ that will obtain new functions $f_1(x_1), \dots, f_p(x_p)$.
(iii) Compute $\hat{\theta}(y) = E\{\sum_j f_j(x_j) \mid y\}$ and update the left hand side by forming $\theta(y) = \hat{\theta}(y) / [\mathrm{var}\{\hat{\theta}(y)\}]^{.5}$.
(iv) Alternate steps (ii) and (iii) until $E\{\theta(y) - \sum_j f_j(x_j)\}^2$ does not change.

Step (ii) can be read as follows: for a fixed $\theta$, the minimizing $f_i(x_i)$ is $f(X) = E\{\theta(y) \mid X\}$. Step (iii) can be read as: for fixed $f(\ )$, the minimizing $\theta$ is $\theta(y) = E\{f(X) \mid y\}$.

In addition to forecasting, an advantage of the ACE procedure may be as a diagnostic tool that will lead to an understanding of whether a model can be safely estimated as linear or whether some kind of transformation is needed on one or both sides of the equation. The question arises concerning why it is often necessary to transform both sides of the equation, not just the right hand side as is done with GAM and MARS. Hastie and Tibshirani (1990) make the point that a model of the form $y = \exp(x + z^2)e$ cannot be estimated in additive form by GAM or MARS, but a simple additive model can be found that describes $\log(y)$. For example, $\log(y) = x + z^2 + \log(e)$.

³ See Hastie-Tibshirani (1990, 176) for details. The discussion of ACE has been taken from this key reference with minor modifications.

1.2 Plan of Book

Table 1.1 lists a number of useful concepts that will be developed further in this book and in the revision of Stokes (1997) that is referred to as Stokes (200x).

Chapter 2 provides an overview of time series modeling objectives. White noise is defined and a number of basic filtering techniques are discussed and illustrated using the B34S software system. While these may appear naive, they provide a "base case" which must be beaten if a more complex model is to be used.
Many data mining applications use such simple models when time or data constraints make it impossible to develop a more sophisticated approach. The most basic of these forecasting methods is the "no change extrapolation," which asserts that the expectation of a series in period t+1 formed in period t is its actual value in period t, or ${}_{t+1}x_t^e = x_t$.

Chapter 3 looks at stability conditions and outlines how the frequency and time series representations of a series are related.

Chapter 4 provides a brief discussion of stationary time series models. The autocorrelation and partial autocorrelation functions are defined and shown to map to the AR (autoregressive) and MA (moving average) coefficients. Use of simulation techniques will give the reader confidence in how to use these core diagnostic tools.

Chapter 5 covers the estimation of AR(p), MA(q) and ARMA(p,q) models both with user input and using automatic or "expert" systems.

Chapter 6 deals with filtering issues and introduces some of the concepts involved in cointegration tests of economic series. Rather than dealing solely with the theory, the main point is to show that models that do not adequately check for unit roots run serious risks of being invalid.

Chapter 7 involves relaxing the assumption of homoskedasticity. The efficient markets assumption suggests that an expected stock price series for period t+j, ${}_{t+j}p_t^e$, should embody all known information $\Omega_t$ up to that time period, or ${}_{t+j}p_t^e = E(p_{t+j} \mid \Omega_t)$. While the ACF of $p_t$ may show no spikes, the ACF of the squared residual $(p_t)^2$ may show spikes. ARCH and GARCH models model both the first and second moment of ARIMA models and have been shown to be quite useful in financial modeling. In general the squared residual of any time series model can and should be inspected.

Chapter 8 covers bivariate GARCH models that are discussed in the light of VAR models.
Table 1.1 Useful Concepts
______________________________________________________________________

The VAR model can be expressed in the frequency domain.

The assumption of coefficient stability can be tested using recursive residual analysis, which involves recursive estimation. If the assumption of coefficient stability is violated, the marspline, gamfit, acefit and pispline commands can be used, provided that the instability is level dependent. These techniques allow for automatic detection of parameter shifts and provide diagnostic tests for how the model is doing. If the instability is time dependent, then it may be possible to parameterize the model to take this into account.

A dynamic system can be studied in both the time and frequency domain. The varfreq command can be used to study a VAR model in the frequency domain. The wavelet command under the matrix command can be used to remove short duration noise from data.

The assumption of linearity can be tested using the Hinich (1982) test, the Hinich (1996) test, the BDS test, the Tsay (1986) test and other nonlinearity tests. Since not all nonlinearity tests can detect all types of models, which test shows nonlinearity may indicate how linearity is violated and provide guidance on solutions.

The assumption of homoskedasticity can be tested using the Engle (1982) test or tests on the autocorrelations of the squared residuals. If heteroskedasticity is found, an ARCH or GARCH model can be estimated (see Stokes (1997) pages 337-342). For multiple series models a BGARCH or bivariate GARCH model can be used. Such models can jointly estimate the first moment and the second moment or employ the more econometrically tractable two pass method of estimation. A bivariate GARCH model for two series can be thought of as a VARMA model on both the first and the second moment.

Simultaneous equations models can be shown to be special cases of the more general VARMA class of models.
_____________________________________________________________________________________