Financial Econometric Modelling

Stan Hurn, Vance Martin, Peter Phillips and Jun Yu

Preface

This book provides a broad-ranging introduction to financial econometrics, from a thorough grounding in basic regression and inference to more advanced financial econometric methods and applications in financial markets. The target audiences are intermediate and advanced undergraduate students, honours students who wish to specialise in financial econometrics, and postgraduate students with limited backgrounds in finance who are taking masters courses designed to offer an introduction to finance. Throughout the exposition, special emphasis is placed on illustrating core concepts using interesting data sets and on a hands-on approach to learning by doing. The guiding principle is that only by working through plenty of applications and exercises can a coherent understanding of the properties of financial econometric models, and of their interrelationships with the underlying finance theory, be achieved.

Organization of the Book

Part ONE is designed to be a semester-long first course in financial econometrics. Consequently, the level of technical difficulty is kept to a bare minimum, with the emphasis on intuition. Slightly more challenging sections are included, but they are clearly marked with a dagger (†) and may be omitted without losing the flow of the exposition. The main estimation technique is limited to ordinary least squares. Of course, this choice requires the discussion to be quite loose in places, but these instances are revisited later in Parts TWO and THREE so that a fuller picture can be obtained if desired. Although there are specific applications and reproductions of results from papers that use a variety of data sources, by and large the general concepts are illustrated using the stock market data that is downloadable from the homepage of Nobel Laureate Robert Shiller.¹ This data set consists of monthly stock price, dividend and earnings data, together with the consumer price index, all starting in January 1871. The data set used in the book is truncated at June 2004; at the time of writing the data is current to 2013 and is updated regularly. This truncation is deliberate: it allows the reproduction of the examples and illustrations in the book, while also allowing the reader to explore the effects of using more recent data.

The level of difficulty steps up a little in Parts TWO and THREE, which are aimed at more advanced undergraduate, honours and masters students. The material in these two parts is more than enough for a semester course in advanced financial econometrics.

¹ http://www.econ.yale.edu/~shiller/data.htm

Computation

All the results reported in the book may be reproduced using the econometric software packages EViews and Stata. In some cases the programming languages of these packages need to be used. For those who actively choose to learn by programming, the results are also reproducible using the R programming language.² Presenting the numerical results of the examples in the text immediately raises the issue of numerical precision. In all of the examples listed in the front of the book where computer code has been used, the numbers appearing in the text are rounded versions of those generated by EViews. The publication-quality graphics were generated using Stata.
The fact that all the exercises, figures and tables in the text can be easily reproduced in these three environments helps to bridge the gap between theory and practice by enabling the reader to build on the code and tailor it to more involved applications. The data files used by the book are all available for download from a companion website (www.finects.book) in EViews format (.wf1), Stata format (.dta) and as Excel spreadsheets (.xlsx). A complete description of the variables, frequency, sample and number of observations in each data set is available in Appendix A. Code to reproduce the figures and examples, and to complete the exercises, is also available.

Acknowledgements

Stan Hurn, Vance Martin, Peter Phillips and Jun Yu
December 2013

² EViews is the copyright of IHS Inc. (www.eviews.com), Stata is the copyright of StataCorp LP (www.stata.com) and R (www.r-project.org) is a free software environment for statistical computation and graphics which is part of the GNU Project.

Contents

List of illustrations

PART ONE  BASICS

1  Properties of Financial Data
   1.1  Introduction
   1.2  A First Look at the Data
        1.2.1  Prices
        1.2.2  Returns
        1.2.3  Simple Returns
        1.2.4  Log Returns
        1.2.5  Excess Returns
        1.2.6  Yields
        1.2.7  Dividends
        1.2.8  Spreads
        1.2.9  Financial Distributions
        1.2.10 Transactions
   1.3  Summary Statistics
        1.3.1  Univariate
        1.3.2  Bivariate
   1.4  Percentiles and Computing Value-at-Risk
   1.5  The Efficient Markets Hypothesis and Return Predictability
   1.6  Efficient Market Hypothesis and Variance Ratio Tests†
   1.7  Exercises

2  Linear Regression Models
   2.1  Introduction
   2.2  Portfolio Risk Management
   2.3  Linear Models in Finance
        2.3.1  The Constant Mean Model
        2.3.2  The Market Model
        2.3.3  The Capital Asset Pricing Model
        2.3.4  Arbitrage Pricing Theory
        2.3.5  Term Structure of Interest Rates
        2.3.6  Present Value Model
        2.3.7  C-CAPM†
   2.4  Estimation
   2.5  Some Results for the Linear Regression Model†
   2.6  Diagnostics
        2.6.1  Diagnostics on the Dependent Variable
        2.6.2  Diagnostics on the Explanatory Variables
        2.6.3  Diagnostics on the Disturbance Term
   2.7  Estimating the CAPM
   2.8  Qualitative Variables
        2.8.1  Stock Market Crashes
        2.8.2  Day-of-the-week Effects
        2.8.3  Event Studies
   2.9  Measuring Portfolio Performance
   2.10 Exercises

3  Modelling with Stationary Variables
   3.1  Introduction
   3.2  Stationarity
   3.3  Univariate Autoregressive Models
        3.3.1  Specification
        3.3.2  Properties
        3.3.3  Mean Aversion and Reversion in Returns
   3.4  Univariate Moving Average Models
        3.4.1  Specification
        3.4.2  Properties
        3.4.3  Bid-Ask Bounce
   3.5  Autoregressive-Moving Average Models
   3.6  Regression Models
   3.7  Vector Autoregressive Models
        3.7.1  Specification and Estimation
        3.7.2  Lag Length Selection
        3.7.3  Granger Causality Testing
        3.7.4  Impulse Response Analysis
        3.7.5  Variance Decomposition
        3.7.6  Diebold-Yilmaz Spillover Index
   3.8  Exercises

4  Nonstationarity in Financial Time Series
   4.1  Introduction
   4.2  Characteristics of Financial Data
   4.3  Deterministic and Stochastic Trends
        4.3.1  Unit Roots†
   4.4  The Dickey-Fuller Testing Framework
        4.4.1  Dickey-Fuller (DF) Test
        4.4.2  Augmented Dickey-Fuller (ADF) Test
   4.5  Beyond the Dickey-Fuller Framework†
        4.5.1  Structural Breaks
        4.5.2  Generalised Least Squares Detrending
        4.5.3  Nonparametric Adjustment for Autocorrelation
        4.5.4  Unit Root Test with Null of Stationarity
        4.5.5  Higher Order Unit Roots
   4.6  Price Bubbles
   4.7  Exercises
5  Cointegration
   5.1  Introduction
   5.2  Equilibrium Relationships
   5.3  Equilibrium Adjustment
   5.4  Vector Error Correction Models
   5.5  Relationship between VECMs and VARs
   5.6  Estimation
   5.7  Fully Modified Estimation†
   5.8  Testing for Cointegration
        5.8.1  Residual-based tests
        5.8.2  Reduced-rank tests
   5.9  Multivariate Cointegration
   5.10 Exercises

6  Forecasting
   6.1  Introduction
   6.2  Types of Forecasts
   6.3  Forecasting with Univariate Time Series Models
   6.4  Forecasting with Multivariate Time Series Models
        6.4.1  Vector Autoregressions
        6.4.2  Vector Error Correction Models
   6.5  Forecast Evaluation Statistics
   6.6  Evaluating the Density of Forecast Errors
        6.6.1  Probability integral transform
        6.6.2  Equity Returns
   6.7  Combining Forecasts
   6.8  Regression Model Forecasts
   6.9  Predicting the Equity Premium
   6.10 Stochastic Simulation
        6.10.1 Exercises

PART TWO  ADVANCED TOPICS

7  Maximum Likelihood
   7.1  Introduction
   7.2  The Likelihood Principle and the CAPM
   7.3  A Duration Model for Trades
   7.4  A Constant Mean Model of the Interest Rate
   7.5  The Log-likelihood Function
   7.6  Analytical Solution
        7.6.1  Duration Model
        7.6.2  Returns
        7.6.3  Models of Interest Rates
   7.7  The Log-Likelihood Function
   7.8  Numerical Approach
        7.8.1  Returns
        7.8.2  Durations
   7.9  Properties of Maximum Likelihood Estimators
   7.10 Hypothesis Tests based on the Likelihood Principle
   7.11 Testing CAPM
   7.12 Testing the Vasicek Model of Interest Rates
   7.13 Exercises

8  Generalised Method of Moments
   8.1  Introduction
   8.2  Moment Conditions
   8.3  Estimation
        8.3.1  Just Identified
        8.3.2  Over Identified
        8.3.3  Choice of Weighting Matrix
        8.3.4  Choice of estimation method
   8.4  The Distribution of the GMM Estimator
   8.5  Testing
   8.6  Consumption CAPM
   8.7  Exercises

9  Panel Data
   9.1  Introduction
   9.2  Portfolio Returns
        9.2.1  Time Series Regressions
        9.2.2  Fama-MacBeth Regressions
   9.3  No Common Effects
   9.4  Pooling Time Series and Cross Section Data
   9.5  Fixed Effects
        9.5.1  Dummy Variable Estimator
        9.5.2  Fixed Effects Estimator
   9.6  Random Effects
        9.6.1  Generalised Least Squares
        9.6.2  Fixed or Random Effects
   9.7  Applications
        9.7.1  Performance of Family Owned Firms
   9.8  Exercises

10 Factor Models

11 Risk and Volatility Models
   11.1 Introduction
   11.2 Volatility Clustering
   11.3 GARCH
        11.3.1 Specification
        11.3.2 Estimation
        11.3.3 Forecasting
   11.4 Asymmetric GARCH Models
   11.5 GARCH in Mean
   11.6 Multivariate GARCH
        11.6.1 BEKK Model
        11.6.2 Estimation
        11.6.3 DCC
   11.7 Exercises

PART THREE  FINANCIAL MARKETS

12 Fixed Interest Securities
   12.1 Introduction
   12.2 Background and Terminology
   12.3 Statistical Properties of Yields
   12.4 Forecasting the Yield Curve
   12.5 Expectations Hypothesis
        12.5.1 Hypothesis Testing
   12.6 Discrete Time Models
        12.6.1 Simple Model
        12.6.2 Autoregressive Dynamics
   12.7 Fitting Term Structure Models to Data
        12.7.1 Square Root Models
        12.7.2 Levels Effects
   12.8 Testing a CKLS Model of Interest Rates
   12.9 Continuous Time Models
        12.9.1 Vasicek
        12.9.2 Cox-Ingersoll-Ross
        12.9.3 Singleton
        12.9.4 Option Price Formulae
   12.10 Estimation
        12.10.1 Jackknifing
   12.11 Interpreting Factors
   12.12 Application to Option Pricing
   12.13 Conclusions
   12.14 Computer Applications
        12.14.1 EViews Commands
        12.14.2 Exercises

13 Futures Markets

14 Microstructure
   14.1 Introduction

Appendix A  Data Description
Appendix B  Long-Run Variance: Theory and Estimation
Appendix C  Numerical Optimisation

References
Author index
Subject index

Illustrations

1.1   Monthly U.S. equity price index from 1933 to 1990
1.2   Logarithm of monthly U.S. equity price index from 1933 to 1990
1.3   Monthly U.S. equity returns from 1933 to 1990
1.4   Monthly U.S. zero coupon yields from 1946 to 1987
1.5   Monthly U.S. equity prices and dividends 1933 to 1990
1.6   Monthly U.S. dividend yield 1933 to 1990
1.7   U.S. zero coupon 6 and 9 month spreads from 1933 to 1990
1.8   Histogram of $/£ exchange rate returns
1.9   Histogram of durations between trades for AMR
1.10  U.S. equity returns for the period 1933 to 1990 with sample average superimposed
1.11  U.S. equity prices for the period 1933 to 1990 with sample average superimposed
1.12  Histogram of monthly U.S. equity returns 1933-1990
1.13  Histogram of Bank of America trading revenue
1.14  Daily 1% VaR for Bank of America
2.1   Least squares residuals from CAPM regressions
2.2   Microsoft prices and returns 1990-2004
2.3   Histogram of Microsoft CAPM residuals
2.4   Fama-French and momentum factors
3.1   S&P Index 1957-2012
3.2   S&P500 log returns 1957-2012
3.3   VAR impulse responses for equity-dividend model
4.1   Simulated random walk with drift
4.2   Different filters applied to U.S. equity prices
4.3   Deterministic and stochastic trends
4.4   Simulated distribution of Dickey-Fuller test
4.5   NASDAQ Index 1973-2009
4.6   Recursive estimation of ADF tests on the NASDAQ
4.7   Rolling window estimation of ADF tests on the NASDAQ
5.1   Logarithm of U.S. equity prices, dividends and earnings
5.2   Phase diagram to demonstrate equilibrium adjustment
5.3   Scatter plot of U.S. equity prices, dividends and earnings
5.4   Residuals from cointegrating regression
6.1   AR(1) forecast of United States equity returns
6.2   Probability integral transform
6.3   Illustrating the probability integral transform
6.4   Illustrating the probability integral transform
6.5   Equity premium, dividend yield and dividend price ratio
6.6   Recursive coefficients from predictive regressions
6.7   Evaluating predictive regressions of the equity premium
6.8   Stochastic simulation of equity prices
6.9   Simulating VAR
7.1   Durations between AMR trades
7.2   Log-likelihood function of exponential model
7.3   Eurodollar interest rates
7.4   Density of Eurodollar interest rates
7.5   Transitional density of Eurodollar interest rates
7.6   Illustrating the LR and Wald tests
7.7   Illustrating the LM test
8.1   Moment conditions
9.1   Fama-MacBeth regression coefficients
11.1  Volatility clustering in merger hedge fund returns
11.2  Empirical distribution of merger hedge fund returns
11.3  Conditional variance
11.4  News impact curve
12.1  U.S. Term structure January 2000
12.2  U.S. zero coupon yields
12.3  Yield curve factor loadings
12.4  Diebold and Li (2006) factor loadings
12.5  Monthly U.S. zero coupon bond yields 1946 to 1991
12.6  Impulse responses of a VECM (zero.*)

PART ONE

BASICS

1 Properties of Financial Data

1.1 Introduction

The financial pages of newspapers and magazines, online financial sites, and academic journals all routinely report a plethora of financial statistics. Even within a specific financial market, the data may be recorded at different observation frequencies and the same data may be presented in various ways. As will be seen, the time series based on these representations have very different statistical properties and reveal different features of the underlying phenomena relating to both long run and short run behaviour. A simple understanding of these everyday encounters with financial data requires at least a passing knowledge of the tools for the presentation of data, which is the subject matter of this chapter.

The characteristics of financial data may also differ across markets. For example, there is no reason to expect that equity markets behave in the same way as currency markets, or that commodity markets behave in the same way as bond markets. In some cases, like currency markets, trading is a nearly continuous activity, while other markets open and close in a regulated manner according to specific times and days. Options markets have their own special characteristics and offer a wide and growing range of financial instruments that relate to other financial assets and markets.

One important preliminary role of statistical analysis is to find stylised facts that characterise different types of financial data and particular markets. Such analysis is primarily descriptive and helps us to understand the prominent features of the data and the differences that can arise from basic elements like varying the sampling frequency and implementing various transformations. Accordingly, the primary aim of this chapter is to highlight the main characteristics of financial data and establish a set of stylised facts for financial time series. These characteristics will be used throughout the book as important inputs in the building and testing of financial models.

1.2 A First Look at the Data

This section identifies the key empirical characteristics of financial data. Special attention is devoted to establishing a set of stylised empirical facts that characterise financial data. These empirical characteristics are important for building financial models. A more detailed treatment of the material covered in this section may be found in Campbell, Lo and MacKinlay (1997).

1.2.1 Prices

Figure 1.1 gives a plot of the monthly United States equity price index (S&P500) for the period January 1933 to December 1990. The time path of equity prices shows long-run growth over this period whose general shape is well captured by an exponential trend. This observed exponential pattern in the equity price index may be expressed formally as

    P_t = P_{t-1} \exp(r_t),                                    (1.1)

where P_t is the current equity price, P_{t-1} is the previous month's price and r_t is the rate of increase between month t-1 and month t.

Figure 1.1 Monthly equity price index for the United States from January 1933 to December 1990, with an exponential trend superimposed.
If r_t in (1.1) is restricted to take the same constant value, r, in all time periods, then equation (1.1) becomes

    P_t = P_{t-1} \exp(r).                                      (1.2)

The relationship between the current price, P_t, and the price two months earlier, P_{t-2}, is

    P_t = P_{t-1} \exp(r) = P_{t-2} \exp(r) \exp(r) = P_{t-2} \exp(2r).

By continuing this recursion, the relationship between the current price, P_t, and the price T months earlier, P_0, is given by

    P_t = P_0 \exp(rT).                                         (1.3)

It is this exponential function that is plotted in Figure 1.1, in which P_0 = 7.09 is the equity price in January 1933 and r = 0.0055.

The exponential function in equation (1.3) provides a predictive relationship based on long-run growth behaviour. It shows that an investor in January 1933 who wished to know the price of equities in December 1990 (T = 695) would compute

    P(Dec. 1990) = 7.09 × exp(0.0055 × 695) = 324.143.

The actual equity price in December 1990 is 328.75, so that the percentage forecast error is

    100 × (324.143 - 328.75)/328.75 = -1.401%.

Of course, equation (1.3) is based on information over the intervening period that would not be available to an investor in 1933, so the prediction is called ex post, meaning that it is performed after the event. If we wanted to use this relationship to predict the equity price in December 2000, then the prediction would be ex ante, or forward looking, and the suggested trend price would be

    P(Dec. 2000) = 7.09 × exp(0.0055 × 815) = 627.15.

In contrast to the ex post prediction, the predicted share price of 627.15 now grossly underestimates the actual equity price of 1330.93. The fundamental reason for this is that the information between 1990 and 2000 has not been used to inform the choice of the crucial parameter r.

An alternative way of analysing the long run time series behaviour of asset prices is to plot the logarithm of price over time. An example is given in Figure 1.2, where the natural logarithm of the equity price in Figure 1.1 is presented. Comparing the two series shows that while prices increase at an increasing rate (Figure 1.1), the logarithm of price increases at a constant rate (Figure 1.2). To see why this is the case, take natural logarithms of equation (1.3) to yield

    p_t = p_0 + rT,                                             (1.4)

where lowercase letters now denote the natural logarithms of the variables, namely p_t = \log P_t and p_0 = \log P_0. This is a linear equation between p_t and T in which the slope is equal to the constant r. This equation also forms the basis of the definition of log returns, a point that is developed in more detail below.

Figure 1.2 The natural logarithm of the monthly equity price index for the United States from January 1933 to December 1990.
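These trend calculations are easy to reproduce. The following short R sketch (R being one of the three computing environments used in the book) recomputes the ex post and ex ante predictions from equation (1.3); the only inputs are the values of P_0 and r quoted above.

    # Parameters of the exponential trend in equation (1.3)
    P0 <- 7.09      # equity price in January 1933
    r  <- 0.0055    # constant monthly growth rate

    # Ex post prediction for December 1990 (T = 695 months after January 1933)
    P_dec1990 <- P0 * exp(r * 695)
    error_pct <- 100 * (P_dec1990 - 328.75) / 328.75   # actual price is 328.75

    # Ex ante prediction for December 2000 (T = 815)
    P_dec2000 <- P0 * exp(r * 815)

    round(c(P_dec1990, error_pct, P_dec2000), 3)
    # approximately 324.143, -1.401 and 627.15, matching the text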
1.2.2 Returns

The return to a financial asset is one of the most fundamental concepts in financial econometrics, and traditionally more attention is focussed on returns, which are a scale-free measure of the results of an investment, than on prices. Abstracting for the moment from the way in which returns are computed, Figure 1.3 plots monthly equity returns for the United States over the period January 1933 to December 1990. The returns are seen to hover around a value that is near zero over the sample period, in fact r = 0.0055 as discussed earlier. Indeed, data on financial asset returns are often considered to be distributed about a mean return value of zero. This feature of equity returns contrasts dramatically with the trending character of the corresponding equity prices presented in Figure 1.1.

Figure 1.3 Monthly United States equity returns for the period January 1933 to December 1990.

The empirical differences in the two series for prices and returns reveal an interesting aspect of stock market behaviour. It is often emphasised in the financial literature that investment in equities should be based on long run considerations rather than the prospect of short run gains. The reason is that stock prices can be very volatile in the short run. This short run behaviour is reflected in the high variability of the stock returns shown in Figure 1.3. Yet, although stock returns themselves are generally distributed about a mean value of approximately zero, stock prices (which accumulate these returns) tend to trend noticeably upwards over time, as is apparent in Figure 1.1. If stock prices were based solely on the accumulation of quantities with a zero mean, then there would be no reason for this upwards drift over time, a point which is taken up again in Chapter 4. For present purposes, it is sufficient to remark that when returns are measured over very short periods of time, any tendency of prices to drift upwards is virtually imperceptible because that effect is so small and is swamped by the apparent volatility of the returns. This interpretation puts emphasis on the fact that returns generally focus on short run effects, whereas price movements can trend noticeably upwards over long periods of time.

1.2.3 Simple Returns

The simple return on an asset between time t-1 and t is given by

    R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1.

The compound return over n periods, R_{n,t}, is therefore given by

    R_{n,t} = \frac{P_t}{P_{t-n}} - 1
            = \frac{P_t}{P_{t-1}} \times \frac{P_{t-1}}{P_{t-2}} \times \cdots \times \frac{P_{t-(n-1)}}{P_{t-n}} - 1
            = (1 + R_t)(1 + R_{t-1}) \cdots (1 + R_{t-(n-1)}) - 1
            = \prod_{j=0}^{n-1} (1 + R_{t-j}) - 1.

The most common period over which a return is quoted is one year, and returns data are commonly presented in per annum terms. In the case of monthly returns, the associated annualised simple return is computed as a geometric mean,

    \text{Annualised } R_{n,t} = \Big[ \prod_{j=0}^{11} (1 + R_{t-j}) \Big]^{1/12} - 1.    (1.5)

1.2.4 Log Returns

The log return of an asset is defined as

    r_t = \log P_t - \log P_{t-1} = \log(1 + R_t).              (1.6)

Log returns are also referred to as continuously compounded returns. This definition of log returns is identical to that given in equation (1.4) with T = 1. The motivation for dealing with log returns stems from the ease with which compound returns may be handled. For example, the compound 2-period log return is given by

    r_{2,t} = (\log P_t - \log P_{t-1}) + (\log P_{t-1} - \log P_{t-2}) = r_t + r_{t-1},    (1.7)

so that, by extension, the n-period compound return is simply

    r_{n,t} = r_t + r_{t-1} + \cdots + r_{t-(n-1)} = \sum_{j=0}^{n-1} r_{t-j}.    (1.8)

In other words, the n-period compound log return is simply the sum of the single period log returns over the pertinent period. For example, for monthly log returns the annualised rate is

    \text{Annualised } r_{n,t} = \sum_{j=0}^{n-1} r_{t-j} = \log P_t - \log P_{t-n},    (1.9)

where the last equality follows from the telescoping cancellation of the intermediate log prices, exactly as in equation (1.7).
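The telescoping result in equation (1.9) is easily verified numerically. The following R sketch uses a short made-up price series (illustrative values only, not data from the book) to compute simple and log returns and to confirm that the compound log return is just the difference of the end-point log prices.

    # Illustrative monthly prices (hypothetical values)
    P <- c(100, 102, 101, 105, 108)

    # Simple returns R_t = P_t/P_{t-1} - 1 and log returns r_t = diff(log(P_t))
    R <- P[-1] / P[-length(P)] - 1
    r <- diff(log(P))

    # The n-period compound log return is the sum of the one-period log
    # returns, which telescopes to log(P_t) - log(P_{t-n}) as in (1.9)
    sum(r)                          # about 0.0770
    log(P[length(P)]) - log(P[1])   # identical by construction

    # For small returns, log and simple returns are close: r_t = log(1 + R_t)
    all.equal(r, log(1 + R))        # TRUE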
The major implication of the result in expression (1.9) is that a series of monthly returns can be expressed on a per annum basis by simply multiplying the monthly return by 12, the implicit assumption being that the best guess of the per annum return is that the current monthly return will persist for the next 12 months. Another way to look at this is as follows. If r_t is regarded as a constant, then the return over the year is

    r_t × 12 = \log P_t - \log P_{t-12},

and the price increase over the year is given by

    P_t = P_{t-12} \exp(r_t × 12).                              (1.10)

This is exactly the relationship established in equation (1.2). By analogy, if prices are observed quarterly, then individual quarterly returns are annualised by multiplying them by 4. Similarly, if prices are observed daily, then daily returns are annualised by multiplying them by the number of trading days, 252. The choice of 252 for the number of trading days is an approximation that allows for weekends, holidays and leap years; other choices are 250 and, very rarely, the number of calendar days, 365.

One major problem with using log returns as opposed to simple returns relates to the construction of portfolios of assets. Taking a logarithm is a nonlinear transformation, and the log return on a portfolio cannot be expressed as the sum of the log returns on the constituent assets, with each return weighted by the asset's share in the portfolio. The reason is that the logarithm of a sum is not equivalent to the sum of the logarithms of the constituents of the sum. We will largely ignore this problem because, when returns are measured over short intervals and are therefore small, the log return on a portfolio is negligibly different from the weighted sum of the log returns on the constituent assets. A more detailed treatment of this point is provided in the excellent texts of Campbell, Lo and MacKinlay (1997) and Tsay (2010).

1.2.5 Excess Returns

The excess return is the difference between the return on a risky financial asset and the return on some benchmark asset that is usually assumed to be a risk-free alternative, with return denoted r_{f,t}. The risk-free return is usually taken to be the return on a government bond because the risk of default on this investment is so low as to be negligible. The simple and log excess returns on an asset are therefore defined, respectively, as

    Z_t = R_t - r_{f,t},    z_t = r_t - r_{f,t}.                (1.11)

1.2.6 Yields

A bond can be viewed simply as an interest-only loan, in the sense that the borrower pays the interest in every period up to the maturity of the loan, but none of the principal. The principal (or face value) of the bond is then repaid in full at the end of the life of the bond (at maturity). The number of years until the face value is paid off is called the bond's time to maturity. The yield on a bond is then defined as the discount rate that equates the present value of the bond's face value to its price. For present purposes, assume that the bond pays no interest at all (a zero coupon bond), so that the investor's return comes solely from the difference between the purchase price of the bond and its face value at maturity.
Bonds are dealt with in detail in Chapter 12, but for the moment it suffices to state that the price of a zero coupon bond that pays $1 at maturity in n years is given by

    P_{n,t} = \exp(-n y_{n,t}),                                 (1.12)

in which y_{n,t} represents the yield, commonly expressed in per annum terms. The yield can be derived by taking natural logarithms and rearranging equation (1.12) to give

    y_{n,t} = -\frac{1}{n} p_{n,t},                             (1.13)

where p_{n,t} = \log P_{n,t}. This expression shows that the yield is inversely proportional to the natural logarithm of the price of the bond.

Figure 1.4 gives plots of yields on United States zero coupon bonds for maturities ranging from 2 months (n = 2/12) to 9 months (n = 9/12). The plots show that the actual time series behaviour of bond yields is fairly complex, with periods of rising and falling yields that have a random wandering character. Randomly wandering series such as those in Figure 1.4 are very common in both finance and economics. One particularly important feature of such series is that they behave as if they have no fixed mean level, wandering around in an apparently random manner over time and continually revisiting earlier levels.

Figure 1.4 Monthly United States zero coupon bond yields for maturities ranging from 2 months to 9 months for the period December 1946 to February 1987.

1.2.7 Dividends

In many applications in finance, as in economics, the focus is on understanding the relationships among two or more series. For instance, in present value models of equities, the price of an equity is equal to the discounted future stream of dividend payments,

    P_t = E_t\Big[ \frac{D_{t+1}}{1 + \delta_{t+1}} + \frac{D_{t+2}}{(1 + \delta_{t+2})^2} + \frac{D_{t+3}}{(1 + \delta_{t+3})^3} + \cdots \Big],    (1.14)

where E_t[D_{t+n}] represents the expectation of dividends at time t+n given information available at time t, and \delta_{t+n} is the corresponding discount rate.

The relationship between equity prices and dividends is highlighted in Figure 1.5, which plots United States equity prices and dividend payments from January 1933 to December 1990. There appears to be a relationship between the two series as both exhibit positive exponential trends.

Figure 1.5 Monthly United States equity prices and dividend payments for the period January 1933 to December 1990.

To analyse the relationship between equity prices and dividends more closely, consider the dividend yield

    YIELD_t = \frac{D_t}{P_t},                                  (1.15)

which is presented in Figure 1.6 based on the data in Figure 1.5. The dividend yield exhibits no upward trend and instead wanders randomly around the level 0.05. This behaviour is in stark contrast to the equity price and dividend series, which both exhibit strong upward trending behaviour.

Figure 1.6 Monthly United States dividend yield for the period January 1933 to December 1990.

The calculation of the dividend yield in (1.15) provides an example of how combining two or more series can change the time series properties of the data, in the present case by apparently eliminating the strong upward trending behaviour. The process of combining trending financial variables into new variables that do not exhibit trends is a form of trend reduction. An extremely important case of trend reduction by combining variables is known as cointegration, a concept that is discussed in detail in Chapter 5.
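Both equation (1.13) and equation (1.15) translate directly into one-line computations. The R sketch below applies them to made-up numbers (a hypothetical bond price and a hypothetical price-dividend pair), chosen purely for illustration.

    # Zero coupon bond: price of $0.94 per $1 of face value, maturing in
    # n = 9/12 years
    n      <- 9 / 12
    P_bond <- 0.94
    y      <- -log(P_bond) / n   # yield from equation (1.13), per annum
    y                            # about 0.0825, i.e. 8.25% per annum

    # Dividend yield from equation (1.15): hypothetical prices and dividends
    P_equity <- c(100, 104, 110)
    D        <- c(5.0, 5.1, 5.3)
    D / P_equity                 # hovers around the 0.05 level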
The expression for the dividend yield in (1.15) can be motivated from the present value equation in (1.14) by adopting two simplifying assumptions. First, expectations of future dividends are given by present dividends, E_t[D_{t+n}] = D. Second, the discount rate is assumed to be fixed at \delta. Using these two assumptions in (1.14) gives

    P_t = D \Big[ \frac{1}{1+\delta} + \frac{1}{(1+\delta)^2} + \cdots \Big]
        = \frac{D}{1+\delta} \Big[ 1 + \frac{1}{1+\delta} + \frac{1}{(1+\delta)^2} + \cdots \Big]
        = \frac{D}{1+\delta} \cdot \frac{1}{1 - 1/(1+\delta)}
        = \frac{D}{\delta},

where the penultimate step uses the sum of a geometric progression.¹ Rearranging this expression gives

    \delta = \frac{D}{P_t},                                     (1.16)

which shows that the discount rate, \delta, is equivalent to the dividend yield, YIELD_t.

¹ An infinite geometric progression is summed as follows: 1 + \lambda + \lambda^2 + \lambda^3 + \cdots = 1/(1-\lambda), |\lambda| < 1, where in the example \lambda = 1/(1+\delta).

An alternative representation of the present value model suggested by equation (1.15) is to transform this equation into natural logarithms and rearrange for \log P_t as

    \log P_t = -\log \delta_t + \log D_t.

Assuming equities are priced according to the present value model, this equation shows that there is a one-to-one relationship between \log P_t and \log D_t. The relationship is explored in detail in Chapter 5 using the concept of cointegration.

1.2.8 Spreads

An important characteristic of the bond yields presented in Figure 1.4 is that they all exhibit similar time series patterns, in particular a general upward drift with increasing volatility. This commonality suggests that yields do not move too far apart from each other. One way to highlight this feature is to compute the spread between the yields on a long maturity and a short maturity,

    SPREAD_t = y_{LONG,t} - y_{SHORT,t}.

Figure 1.7 gives the 6 and 9 month spreads relative to the 3 month zero coupon yield. Neither spread exhibits any noticeable trend and both seem to hover around a constant level. The spreads also show increasing volatility over the sample period, with the gyrations increasing towards the end of the sample. Comparison of Figures 1.4 and 1.7 reveals that yields exhibit vastly different time series patterns from spreads, the former having upward trends while the latter show no evidence of trends. This example is another illustration of how combining two or more series can change the time series properties of the data.

Figure 1.7 Monthly United States 6-month and 9-month zero coupon spreads computed relative to the 3-month zero coupon yield for the period January 1933 to December 1990.

1.2.9 Financial Distributions

An important assumption underlying many theoretical and empirical models in finance is that returns are normally distributed. This assumption is widely used in portfolio allocation models, in Value-at-Risk (VaR) calculations, in pricing options, and in many other applications. An example of an empirical returns distribution is given in Figure 1.8, which shows the histogram of hourly $/£ exchange rate returns.
Even though this distribution exhibits some characteristics that are consistent with a normal distribution, such as symmetry, it differs from normality in two important ways:

(1) the presence of heavy tails; and
(2) a sharp peak in the centre of the distribution.

Distributions exhibiting these properties are known as leptokurtic distributions. Because the empirical distribution exhibits tails that are much thicker than those of a normal distribution, the actual probability of observing extreme returns is higher than that implied by the normal distribution. The empirical distribution also exhibits peakedness at the centre of the distribution, around zero, that is sharper than that of a normal distribution. This feature suggests that there are many more observations with small returns, where the exchange rate hardly moves, than there would be in the case of draws from a normal population.

Figure 1.8 Empirical distribution of hourly $/£ exchange rate returns for the period 1 January 1986 00:00 to 15 July 1986 11:00, with a normal distribution overlaid.

The example given in Figure 1.8 is for exchange rate returns, but heavy tails and peakedness of the distribution of returns are common to other asset markets as well, including equities, commodities and real estate. All of these empirical distributions are therefore inconsistent with the assumption of normality, and financial models that are based on normality may result in financial instruments such as options being incorrectly priced, or in measures of risk being underestimated.

1.2.10 Transactions

A property of all of the financial data analysed so far is that observations on a particular variable are recorded at discrete and regularly spaced points in time. The data on equity prices and dividend payments in Figure 1.5 and the data on zero coupon bond yields in Figure 1.4 are all recorded every month. In fact, higher frequency data are also available at regularly spaced time intervals, including daily, hourly and even 10-15 minute observations. More recently, transactions data have become available which record the price of every trade conducted during the trading day. An example is given in Table 1.1, which gives a snapshot of the trades recorded on American Airlines (AMR) on August 1, 2006. The variable Trade, x_t, is a binary variable signifying whether a trade has taken place: x_t = 1 if a trade occurs and x_t = 0 if no trade occurs. The duration between trades, u, is measured in seconds, and the corresponding price of the asset at the time of the trade, P, is also recorded.

The table shows that there is a trade at the 5 second mark where the price is $21.58. The next trade occurs at the 11 second mark at a price of $21.59, so the duration between trades is u = 6 seconds. There is another trade straight away at the 12 second mark at the same price of $21.59, in which case the duration is just u = 1 second. There is no trade in the following second, but there is one two seconds later at the 14 second mark, again at the same price of $21.59, so the duration is u = 2 seconds.

The time differences between trades of American Airlines (AMR) shares are further highlighted by the histogram of the duration times, u, given in Figure 1.9. This distribution has an exponential shape, with a duration time of u = 1 second being the most common. However, there are a number of durations in excess of u = 25 seconds, and some even in excess of 50 seconds.
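The exponential shape of the duration histogram suggests fitting an exponential distribution, whose single parameter (the rate) can be estimated as the reciprocal of the mean duration. The R sketch below illustrates the idea on simulated durations rather than the AMR data; the rate of 0.2, implying a mean duration of 5 seconds, is an arbitrary choice for the illustration.

    set.seed(42)

    # Simulate durations (in seconds) from an exponential distribution
    u <- rexp(23401, rate = 0.2)

    # The maximum likelihood estimate of the rate is 1/mean(u)
    rate_hat <- 1 / mean(u)
    rate_hat   # close to the true value of 0.2

    # Overlay the fitted exponential density on the histogram
    hist(u, breaks = 50, freq = FALSE, xlab = "Duration (secs)",
         main = "Histogram of simulated durations")
    curve(dexp(x, rate = rate_hat), add = TRUE)

Duration models of this kind are estimated formally by maximum likelihood in Chapter 7.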
Table 1.1 American Airlines (AMR) transactions data on August 1, 2006, at 9 hours and 42 minutes.

    Sec.    Trade (x)    Duration (u)    Price (P)
     5          1             1           $21.58
     6          0             1           $21.58
     7          0             1           $21.58
     8          0             1           $21.58
     9          0             1           $21.58
    10          0             1           $21.58
    11          1             6           $21.59
    12          1             1           $21.59
    13          0             1           $21.59
    14          1             2           $21.59

The important feature of transactions data that distinguishes them from the time series data discussed above is that the time interval between trades is not regular or equally spaced. In fact, if high frequency data are used, such as 1 minute data, there will be periods of time in which no trades occur and the price does not change. This is especially so in thinly traded markets. The implication of using such transactions data is that the models specified in econometric work need to incorporate these features, including the apparent randomness of the observation interval between trades. Correspondingly, the appropriate statistical techniques are expected to differ from the techniques used to analyse regularly spaced financial time series data. These issues for high frequency, irregularly spaced data are investigated further in Chapter 14 on financial microstructure effects.

Figure 1.9 Empirical distribution of durations (in seconds) between trades of American Airlines (AMR) on 1 August 2006 from 09:30 to 16:00 (23,401 observations).

1.3 Summary Statistics

In the previous section, the time series properties of financial data were explored using a range of graphical tools, including line charts, scatter diagrams and histograms. In this section a number of statistical methods are used to summarise financial data. While these methods are general summary measures of financial data, a few important cases will be highlighted in which it is inappropriate to summarise financial data using these simple measures.

1.3.1 Univariate

Sample Mean
An important feature of the United States equity returns in Figure 1.3 is that they hover around some average value over the sample period. This average value is formally known as the sample mean. For the log returns series, r_t, the sample mean is defined as

    \bar{r} = \frac{1}{T} \sum_{t=1}^{T} r_t.                   (1.17)

For the United States equity returns in Figure 1.3, the sample mean is \bar{r} = 0.005568. This value is plotted in Figure 1.10 together with the actual returns data. Not surprisingly, this value is very close to the value of r = 0.0055 used in Figure 1.1. Expressing the monthly sample mean in annual terms gives 0.005568 × 12 = 0.0668, which shows that average returns over the period 1933 to 1990 are 6.68% per annum.

Figure 1.10 Monthly United States equity returns for the period January 1933 to December 1990 with the sample average superimposed.
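The sample mean in equation (1.17) and its annualisation are two lines of R. The sketch below uses simulated monthly returns as a stand-in for the equity returns series; with the actual data the same code reproduces the 0.005568 and 6.68% figures quoted above.

    set.seed(1)

    # Simulated monthly returns standing in for the equity returns series
    # (696 months corresponds to January 1933 to December 1990)
    r <- rnorm(696, mean = 0.0055, sd = 0.04)

    rbar <- mean(r)   # sample mean, equation (1.17)
    rbar * 12         # in annual terms; roughly 0.067, i.e. about 6.7% p.a.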
An example where computing the sample mean is an inappropriate summary measure is the equity price index given in Figure 1.1. Figure 1.11 plots the equity price index again, together with its sample mean of \bar{P} = 80.253. Clearly the sample mean is not a representative measure of the equity price, as there is no tendency for the equity price to return to its mean. In fact, the equity price trends upwards away from its sample mean. A comparison of Figures 1.10 and 1.11 suggests that models of returns and models of prices need to be different.

Figure 1.11 Monthly United States equity price index for the period January 1933 to December 1990 with the sample average superimposed.

Sample Variance and Standard Deviation
Risk refers to the uncertainty surrounding the value of, or payoff from, a financial investment. In other words, risk reflects the chance that the actual return on an investment may be very different from the expected return, and the increased potential for loss from an investment has obvious ramifications for individual investors. Figure 1.10 shows that actual returns deviate from the sample mean in most periods, and the larger these deviations are, the riskier is the investment. The classic measure of risk is given by the average squared deviation of returns from the mean, which is known as the sample variance,

    s^2 = \frac{1}{T-1} \sum_{t=1}^{T} (r_t - \bar{r})^2.       (1.18)

In the case of the returns data, the sample variance is s^2 = 0.040260^2 = 0.00162. In finance, the sample standard deviation, which is the square root of the variance,

    s = \sqrt{ \frac{1}{T-1} \sum_{t=1}^{T} (r_t - \bar{r})^2 },    (1.19)

is usually used as the measure of the riskiness of an investment and is called the volatility of a financial return. The standard deviation has the same scale as a return (rather than a squared return) and is therefore easily interpretable. The sample standard deviation of the returns series in Figure 1.3 is s = 0.040260.

Sample Skewness
While the variance provides an average summary measure of deviations of returns around the sample mean, investors are also interested in the occurrence of extreme returns. Figure 1.12 gives a histogram of the United States equity returns previously plotted in Figure 1.3, which shows that there is a larger concentration of returns below the sample mean of \bar{r} = 0.005568 (left tail) than there is for returns above the sample mean (right tail). In fact, the sample skewness is computed to be SK = -0.299. Formally, the distribution in this case is referred to as negatively skewed, as there is a greater chance (probability) of large returns below the sample mean than of large returns above the sample mean. A distribution is positively skewed if the opposite is true, whereas a distribution is symmetric if the probabilities of extreme returns above and below the sample mean are the same.

Sample Kurtosis
The sample skewness statistic focusses on whether the extreme returns are in the left or the right tail of the distribution. The sample kurtosis statistic identifies whether there are extreme returns, regardless of sign, relative to some benchmark, typically the normal distribution. The measure of kurtosis is

    KT = \frac{1}{T} \sum_{t=1}^{T} \Big( \frac{r_t - \bar{r}}{s} \Big)^4,    (1.20)

which is compared to the value of KT = 3 that would occur if the returns came from a normal distribution. In the case of the United States equity returns in Figure 1.12, the sample kurtosis is KT = 7.251. As this value is greater than 3, there are more extreme returns in the data than predicted by the normal distribution.

Figure 1.12 Empirical distribution of United States equity returns with the sample average superimposed. Data are monthly for the period January 1933 to December 1990.
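Base R has no built-in skewness or kurtosis functions, so coding equations (1.18) to (1.20) by hand is both necessary and instructive. The sketch below again uses simulated returns as a stand-in for the equity returns series; applied to the actual data it would return the values s = 0.040260, SK = -0.299 and KT = 7.251 quoted above.

    set.seed(1)
    r <- rnorm(696, mean = 0.0055, sd = 0.04)   # stand-in returns series

    Tn   <- length(r)
    rbar <- mean(r)

    s2 <- sum((r - rbar)^2) / (Tn - 1)   # sample variance, equation (1.18)
    s  <- sqrt(s2)                       # volatility, equation (1.19)

    z  <- (r - rbar) / s
    SK <- mean(z^3)                      # sample skewness
    KT <- mean(z^4)                      # sample kurtosis, equation (1.20);
                                         # equal to 3 for normal data

    c(variance = s2, volatility = s, skewness = SK, kurtosis = KT)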
1.3.2 Bivariate

Covariance
The statistical measures discussed so far summarise the characteristics of a single series. Perhaps more important in finance is understanding the interrelationships between two or more financial time series. For example, in constructing a diversified portfolio, the aim is to include assets whose returns are not perfectly correlated. Figure ?? provides an example of prices and dividends moving in the same direction, as reflected by the positive slope of the scatter diagram. One way to measure co-movements between the returns on two assets, r_{it} and r_{jt}, is by computing the covariance

    s_{ij} = \frac{1}{T} \sum_{t=1}^{T} (r_{it} - \bar{r}_i)(r_{jt} - \bar{r}_j),    (1.21)

where \bar{r}_i and \bar{r}_j are the respective sample means of the returns on assets i and j.

A positive covariance, s_{ij} > 0, shows that the returns on assets i and j have a tendency to move together: when the return on asset i is above its mean, the return on asset j is also likely to be above its mean. A negative covariance, s_{ij} < 0, indicates that when the returns on asset i are above their sample mean, the returns on asset j are, on average, likely to be below their sample mean. Covariance has a particularly important role to play in portfolio theory and asset pricing, as will become clear in Chapter 2.

Correlation
Another measure of association that is widely used in finance is the correlation coefficient, defined as

    c_{ij} = \frac{s_{ij}}{\sqrt{s_{ii} s_{jj}}},               (1.22)

where

    s_{ii} = \frac{1}{T} \sum_{t=1}^{T} (r_{it} - \bar{r}_i)^2,    s_{jj} = \frac{1}{T} \sum_{t=1}^{T} (r_{jt} - \bar{r}_j)^2,

represent the respective variances of the returns on assets i and j. The correlation coefficient is the covariance scaled by the standard deviations of the two returns. The correlation has the same sign as the covariance, with the additional property that it lies in the range -1 ≤ c_{ij} ≤ 1.
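A minimal R sketch of equations (1.21) and (1.22) follows, using two simulated return series with a built-in positive dependence. The hand-coded correlation can be checked against R's cor(); note that R's cov() uses a T-1 divisor, so it differs from equation (1.21) by the factor (T-1)/T, while the correlation is invariant to the choice of divisor.

    set.seed(7)
    Tn <- 500
    ri <- rnorm(Tn, 0.005, 0.04)              # returns on asset i
    rj <- 0.5 * ri + rnorm(Tn, 0.002, 0.03)   # returns on asset j, positively
                                              # related to asset i by design

    # Covariance as in equation (1.21), with a 1/T divisor
    sij <- mean((ri - mean(ri)) * (rj - mean(rj)))

    # Correlation as in equation (1.22)
    sii <- mean((ri - mean(ri))^2)
    sjj <- mean((rj - mean(rj))^2)
    cij <- sij / sqrt(sii * sjj)

    c(covariance = sij, correlation = cij)
    cor(ri, rj)   # agrees with cij exactly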
1.4 Percentiles and Computing Value-at-Risk

The percentiles of a distribution are a set of summary statistics that capture both the location and the spread of a distribution. Formally, a percentile indicates the value of a given random variable below which a given percentage of observations fall. So the median, the important measure of the location of a distribution below which 50% of the observations of the random variable fall, is also the 50th percentile. The median is an alternative to the sample mean as a measure of location and can be very important for financial distributions in which large outliers are encountered. The difference between the 25th percentile (or first quartile) and the 75th percentile (or third quartile) is known as the inter-quartile range, which provides an alternative to the variance as a measure of the dispersion of the distribution. It transpires that the percentiles of a distribution, particularly the 1st and 5th percentiles, are important statistics in the computation of a key risk measure in finance known as Value-at-Risk, or VaR.

Losses faced by financial institutions have the potential to be propagated through the financial system and undermine its stability. The onset of heightened fears for the riskiness of the banking system can be rapid and have widespread ramifications. The potential loss faced by banks is therefore a crucial measure of the stability of the financial sector. A bank's fundamental soundness may be measured by its hypothetical revenue based on the portfolio allocation decisions made by the bank. For the most part, such a measure does not exist, but it is possible to ascertain actual daily trading revenues, which include the effects of intraday trades made by the bank as well as trading fees and/or commissions, from graphical reports published by some major banks.

Pérignon and Smith (2010) adopted an innovative method for collecting these data. They searched for banks that had disclosed graphs of daily trading revenues over a sufficiently long sample period (2001-2004). They then downloaded each graph, converted it to a JPG image and captured the co-ordinates of each point in order to recover a numerical value for daily trading revenue. The summary statistics and percentiles of the daily trading revenues of Bank of America, obtained by this method, are presented in Table 1.2.

Table 1.2 Descriptive statistics and percentiles for daily trading revenue ($ million) of Bank of America for the period 2 January 2001 to 31 December 2004.

    Statistics                          Percentiles
    Observations        1008            1%     -24.82143
    Mean            13.86988            5%     -9.445714
    Std. Dev.       14.90892            10%    -2.721429
    Skewness       0.1205408            25%     4.842857
    Kurtosis        4.925995            50%     13.14839
    Maximum         84.32714            75%     22.96184
    Minimum        -57.38857            90%     30.85943
                                        95%     36.43548
                                        99%     57.10429

The mean is greater than the median, indicating that the bulk of the values lie to the left of the mean and that the distribution is positively skewed. This conclusion is borne out by the positive value of the skewness statistic, 0.1205, and also by Figure 1.13, which shows a histogram of daily trading revenue with a normal distribution superimposed. The histogram also shows very clearly that the distribution of daily trading revenue exhibits excess kurtosis, 4.9260. The peak of the distribution is higher than that of the associated normal distribution and the tails are also fatter. This situation is known as leptokurtosis.

Figure 1.13 Histogram of daily trading revenue from 2 January 2001 to 31 December 2004 reported by Bank of America. A normal distribution with mean 13.8699 and standard deviation 14.9090 is superimposed.

How may this information be used to inform a discussion about risk? Following a wave of banking collapses in the 1990s, financial regulators, in the guise of the Basel Committee on Banking Supervision (1996), began requiring banks to hold capital as a buffer against possible losses, measured using a method called Value-at-Risk (VaR). VaR quantifies the loss that a bank can face on its trading portfolio within a given period and for a given confidence level. More formally, in the context of a bank, VaR is defined in terms of the lower tail of the distribution of trading revenues. Specifically, the 1% VaR for the next h periods, conditional on information at time T, is the 1st percentile of expected trading revenue at the end of the next h periods. For example, if the daily 1% h-period VaR is $30 million, then there is a 99% chance that the bank's loss at the end of h periods will be less than $30 million, and a 1% chance that the bank will lose $30 million or more. Although $30 million is a loss in this example, by convention the minus sign is not used.

There are three common ways to compute VaR; an R sketch of the first two methods follows the list.

1. Historical Simulation. The historical method simply computes the percentiles of the distribution from historical data and assumes that, from a risk perspective, history will repeat itself.
From Table 1.2, the 1% daily VaR for Bank of America using all available historical data (2001-2004) is $24.8214 million. There is evidence that most banks use historical simulation to compute VaR (Pérignon and Smith, 2010). Its popularity is probably due to a combination of simplicity, both conceptual and computational, and the fact that estimates of VaR will be reasonably smooth over time.

2. The Variance-Covariance Method. This method assumes that the trading revenues are normally distributed. In other words, only two parameters, the expected (or mean) return and the standard deviation, need to be estimated in order to describe the entire distribution of trading revenue. From Table 1.2, the mean is $13.8699 million and the standard deviation is $14.9089 million, which taken together generate the normal curve superimposed on the histogram in Figure 1.13. From the assumption of a normal distribution it follows that 1% of the distribution lies in the tail delimited by 2.33 standard deviations below the mean. The daily 1% VaR for Bank of America is therefore 13.8699 - 2.33 × 14.9089 = -20.8679, reported as $20.8679 million under the sign convention noted above. This value is slightly lower than that provided by historical simulation because the assumption of normality ignores the slightly fatter tails exhibited by the empirical distribution of daily trading revenues.

3. Monte Carlo Simulation. The third method involves developing a model for future returns and running multiple hypothetical trials through the model. A Monte Carlo simulation refers to any method that randomly generates trials, but by itself it does not tell us anything about the underlying methodology. This approach is revisited in Chapter 6.
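The first two methods are one-line computations in R. The sketch below applies them to simulated trading revenues standing in for the Bank of America data; with the actual revenue series the same code would return the $24.8214 million and $20.8679 million figures computed above.

    set.seed(123)

    # Simulated daily trading revenues ($ million), standing in for the data
    revenue <- rnorm(1008, mean = 13.87, sd = 14.91)

    # 1. Historical simulation: the 1% VaR is the 1st percentile of revenues
    VaR_hist <- quantile(revenue, probs = 0.01)

    # 2. Variance-covariance method: under normality the 1st percentile lies
    #    qnorm(0.01) = -2.33 standard deviations below the mean
    VaR_norm <- mean(revenue) + qnorm(0.01) * sd(revenue)

    c(historical = unname(VaR_hist), variance.covariance = VaR_norm)
    # By convention the minus sign is dropped when the VaR is reported.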
Figure 1.14 plots the daily trading revenue of Bank of America together with the 1% daily VaR reported by the bank, obtained by Pérignon and Smith in the manner just described. Even to the naked eye it is apparent that Bank of America had only four violations of the 1% daily reported VaR during the period 2001-2004 (T = 1008), amounting to only 0.4%. The daily VaR computed from historical simulation is also shown, and it provides compelling evidence that Bank of America has been over-conservative in its estimation of daily VaR. Furthermore, Figure 1.14 reveals that the reported values of VaR are not always closely related to the actual observed volatility in daily trading revenue. The VaR reported by Bank of America for the year 2001 is fairly consistent and, if anything, trends upward over the year. This is counter-intuitive given the volatility in trading revenue following the events of 11 September 2001.

Figure 1.14 Time series plot of the daily 1% Value-at-Risk reported by Bank of America from 2 January 2001 to 31 December 2004, together with daily trading revenue and the historical-simulation VaR.

1.5 The Efficient Markets Hypothesis and Return Predictability

The correlation statistic in (1.22) measures the strength of the co-movements between the returns on one asset and the returns on another asset. An important alternative application of correlation is to measure the strength of the association between the current return on an asset, r_t, and the return on the same asset k periods earlier, r_{t-k}. As this correlation is based on an asset's own lags, it is referred to as an autocorrelation. For any series of returns, the autocorrelation coefficient at lag k is defined as

    \rho_k = \frac{ \sum_{t=k+1}^{T} (r_t - \bar{r})(r_{t-k} - \bar{r}) }{ \sum_{t=1}^{T} (r_t - \bar{r})^2 }.

If a series of returns does not exhibit autocorrelation, then there is no discernible pattern in its behaviour, making future movements in returns unpredictable. If a series of returns exhibits positive autocorrelation, however, then successive values of returns tend to have the same sign, and this pattern can be exploited in predicting the future behaviour of returns. Similarly, negative autocorrelation results in the signs of successive returns alternating, and prediction based on this pattern is possible.

The fact that the presence of autocorrelation in asset returns represents a pattern which can potentially be used in the prediction of future returns is the cornerstone of an important concept in modern finance, namely the efficient markets hypothesis (Fama, 1965; Samuelson, 1965). In its most general form, the efficient markets hypothesis theorises that all available information concerning the value of a risky asset is factored into the current price of the asset. A natural corollary of the efficient markets hypothesis is that the current price provides no information on the direction of the future price and that asset returns should exhibit no autocorrelation. An empirical test of the efficient markets hypothesis in the context of a particular asset is therefore that all the autocorrelations of its returns are zero, or \rho_1 = \rho_2 = \rho_3 = \cdots = 0.

Table 1.3 gives the first 10 autocorrelations of hourly DM/$ exchange rate returns in column 2. All the autocorrelations appear close to zero, suggesting that exchange rate returns are not predictable and that the foreign exchange market is therefore efficient, in the sense that all information about the DM/$ exchange rate is contained in the current quoted price.

Table 1.3 Autocorrelation properties of returns and functions of returns for the hourly DM/$ exchange rate for the period 1 January 1986 00:00 to 15 July 1986 11:00.

    Lag      r_t      r_t^2     |r_t|    |r_t|^0.5
     1     -0.022     0.079     0.182     0.214
     2      0.020     0.074     0.128     0.129
     3      0.023     0.042     0.086     0.085
     4     -0.027     0.055     0.070     0.055
     5      0.030     0.004     0.034     0.043
     6     -0.024     0.018     0.058     0.064
     7     -0.010    -0.007     0.018     0.035
     8      0.013    -0.009     0.020     0.033
     9     -0.007    -0.019     0.004     0.015
    10      0.027     0.017    -0.014    -0.021

The calculation of autocorrelations of returns reveals information on the mean of returns. This suggests that applying the same approach to squared returns reveals information on the variance of returns. The autocorrelation between squared returns at time t and squared returns k periods earlier is defined as

    \rho_k = \frac{ \sum_{t=k+1}^{T} (r_t^2 - \overline{r^2})(r_{t-k}^2 - \overline{r^2}) }{ \sum_{t=1}^{T} (r_t^2 - \overline{r^2})^2 },

where \overline{r^2} is the sample mean of the squared returns. The application of autocorrelations to squared returns represents an important diagnostic tool in models of time-varying volatility, which are discussed in Chapter 11. Following in particular the seminal work of Engle (1982) and Bollerslev (1986), positive autocorrelation in squared returns suggests that there is a higher chance of high (low) volatility in the next period if volatility in the previous period is high (low). Formally, this phenomenon is known as volatility clustering.

Column 3 in Table 1.3 gives the first 10 autocorrelations of hourly squared DM/$ exchange rate returns. Comparing these autocorrelations to the autocorrelations based on returns shows that there is now stronger positive autocorrelation. This suggests that while the mean return is not predictable, the variance of returns is potentially predictable because of the phenomenon of volatility clustering in exchange rate returns. Note, however, that this conclusion does not violate the efficient markets hypothesis, because that hypothesis is concerned only with the expected value of the level of returns.
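The pattern in Table 1.3 is easy to replicate with R's acf() function. The sketch below uses returns simulated with volatility clustering (a simple GARCH-type recursion coded by hand, anticipating Chapter 11) so that, as in the table, the returns themselves show little autocorrelation while their squares and absolute values do; the recursion parameters are arbitrary illustrative choices.

    set.seed(11)
    Tn <- 2000
    h  <- numeric(Tn)
    r  <- numeric(Tn)
    h[1] <- 1e-4

    # Simulate returns with time-varying variance (volatility clustering)
    for (t in 2:Tn) {
      h[t] <- 1e-5 + 0.15 * r[t - 1]^2 + 0.80 * h[t - 1]
      r[t] <- sqrt(h[t]) * rnorm(1)
    }

    # Autocorrelations of returns, squared returns and absolute returns
    acf(r,      lag.max = 10, plot = FALSE)   # near zero at all lags
    acf(r^2,    lag.max = 10, plot = FALSE)   # positive and persistent
    acf(abs(r), lag.max = 10, plot = FALSE)   # typically stronger still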
This suggests that while the mean return is not predictable, the variance of returns is potentially predictable because of the phenomenon of volatility clustering in exchange rate returns. Note, however, that this conclusion does not violate the efficient markets hypothesis because this hypothesis is concerned only with the expected value of the level of returns.

It is also possible to compute autocorrelations for various transformations of returns, including $r_t^3$, $r_t^4$, $|r_t|$ and $|r_t|^\alpha$. The first two transformations provide evidence of autocorrelation in skewness and kurtosis, respectively. The third transformation provides an alternative measure of the presence of autocorrelation in the variance. The last case represents a general transformation; for example, setting $\alpha = 0.5$ computes the autocorrelation of the standard deviation (the square root of the variance). The presence of stronger autocorrelation in squared returns than in returns suggests that other transformations of returns may reveal even stronger autocorrelation patterns, a conjecture that is borne out by the results reported in Table 1.3. Columns 4 and 5 of Table 1.3 respectively give the first 10 autocorrelations of absolute hourly DM/$ exchange rate returns, $|r_t|$, and of the square root of absolute DM/$ exchange rate returns, $|r_t|^{0.5}$. Comparing these autocorrelations with those based on returns (column 2) and squared returns (column 3) reveals even stronger positive autocorrelation patterns, with the strongest pattern revealed by the standard deviation transformation $|r_t|^{0.5}$.

1.6 Efficient Market Hypothesis and Variance Ratio Tests†

Another statement of the efficient markets hypothesis is that the price of a financial asset encapsulates all available information. Consider the following simple model of asset prices
\[
p_t = \alpha + p_{t-1} + u_t \;\Longrightarrow\; r_t = p_t - p_{t-1} = \alpha + u_t\,, \qquad (1.23)
\]
in which the constant $\alpha$ represents a small positive compensation for holding a risky asset. The main implication of this model is that the predictability of asset returns, and hence prices, depends solely upon the characteristics of the disturbance term $u_t$. Based on this simple model, a formal test of the predictability of asset returns may be developed using the concept of a variance ratio, which in fact turns out to be a clever way of testing that the autocorrelations of returns are zero. Campbell, Lo and MacKinlay (1997) provide a thorough treatment of the different versions of the variance ratio test.

Suppose that $E[u_t^2] = \sigma^2$ and that $E[u_{t-i}u_{t-j}] = 0$ for all $i \neq j$. In this situation there is no information in the disturbance term that may be used to predict asset returns and the market is therefore efficient. Under these assumptions the $q$-period return is simply the sum of the single-period log returns, as discussed previously, and the variance of the $q$-period return, $\mathrm{var}(u_t + \cdots + u_{t-q+1})$, is simply $q\sigma^2$. Let $\hat{\sigma}_q^2$ be an estimator of $\mathrm{var}(u_t + \cdots + u_{t-q+1})$ and $\hat{\sigma}^2$ be the sample variance. Under the null hypothesis, the statistic based on the ratio of variances
\[
V_q = \frac{\hat{\sigma}_q^2}{q\hat{\sigma}^2}
\]
should, on average, be equal to one.

The intuition behind the test may be developed a little further. Assume that the disturbance term $u_t$ has constant variance $\sigma^2$, but that the covariance between $u_t$ and $u_{t-j}$ is not zero but $\gamma_j$.
For example, the variance of the 3-period return is
\[
\mathrm{var}(r_{3t}) = \mathrm{var}(r_t + r_{t-1} + r_{t-2})
= 3\,\mathrm{var}(r_t) + 2\left[\mathrm{cov}(r_t, r_{t-1}) + \mathrm{cov}(r_{t-1}, r_{t-2}) + \mathrm{cov}(r_t, r_{t-2})\right]
= 3\gamma_0 + 2(2\gamma_1 + \gamma_2)\,,
\]
recognising that $\mathrm{var}(r_t) = \sigma^2 = \gamma_0$. The variance ratio for the 3-period return is then
\[
V_3 = \frac{3\gamma_0 + 2(2\gamma_1 + \gamma_2)}{3\gamma_0}\,.
\]
This expression may be simplified by recalling that the autocorrelation at lag $i$ is given by $\rho_i = \gamma_i/\gamma_0$. The variance ratio may then be written as
\[
V_3 = 1 + 2\left(\tfrac{2}{3}\rho_1 + \tfrac{1}{3}\rho_2\right),
\]
which is a weighted sum of autocorrelations with weights declining as the order of the autocorrelation increases. Of course, if both $\rho_1$ and $\rho_2$ are zero, then $V_3 = 1$. In other words, the variance ratio is simply a test that all the autocorrelations of $u_t$ are zero and that therefore returns are not predictable.

To construct a proper statistical test it is necessary to specify how to compute the variance ratio and what the distribution of the test statistic is under the null hypothesis. Suppose that there are $T+1$ observations on log prices $\{p_1, p_2, \cdots, p_{T+1}\}$ so that there are $T$ observations on log returns. The variance ratio statistic for returns defined over $q$ periods is defined as
\[
\widehat{V}_q = \frac{\hat{\sigma}_q^2}{\hat{\sigma}^2}\,,
\]
in which
\[
\hat{\alpha} = \frac{1}{T}\sum_{k=1}^{T} r_k\,, \qquad (1.24)
\]
\[
\hat{\sigma}^2 = \frac{1}{T}\sum_{k=1}^{T} (r_k - \hat{\alpha})^2\,, \qquad (1.25)
\]
\[
\hat{\sigma}_q^2 = \frac{1}{qT}\sum_{k=q+1}^{T+1} (p_k - p_{k-q} - q\hat{\alpha})^2\,. \qquad (1.26)
\]
Lo and MacKinlay (1988) show that, in large samples, the test statistic $\widehat{V}_q - 1$ is distributed as follows:
\[
\sqrt{T}\,(\widehat{V}_q - 1) \sim N(0,\, 2(q-1)) \qquad \text{or} \qquad \left[\frac{T}{2(q-1)}\right]^{1/2}(\widehat{V}_q - 1) \sim N(0, 1)\,.
\]
There are many other versions of the variance ratio test statistic. Small-sample bias adjustments may be made to the estimators $\hat{\sigma}^2$ and $\hat{\sigma}_q^2$. The assumptions about the behaviour of the underlying disturbance term, $u_t$, may also be relaxed. For example, it will become apparent in Chapter 11 that, when dealing with the returns to financial assets, the assumption of a constant variance for the disturbance term is unrealistic. Furthermore, although the test is still a test of zero autocorrelation in $u_t$, there is strong evidence to suggest dependence in the squares of the disturbance term. This situation can also be dealt with by adjusting the definition of the variance ratio statistic.
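A minimal R implementation of (1.24) to (1.26) and the standardised test statistic is sketched below. The function name vr_test is ours, and the log prices are a simulated random walk, so the null of no predictability should not be rejected.

```r
# Sketch: variance ratio test of Lo and MacKinlay (1988) under iid
# disturbances, following equations (1.24)-(1.26)
vr_test <- function(p, q) {
  r   <- diff(p)                                  # log returns from log prices
  n   <- length(r)
  a   <- mean(r)                                  # (1.24)
  s2  <- mean((r - a)^2)                          # (1.25)
  pq  <- p[(q + 1):length(p)] - p[1:(length(p) - q)]
  s2q <- sum((pq - q * a)^2) / (q * n)            # (1.26)
  Vq  <- s2q / s2
  z   <- sqrt(n / (2 * (q - 1))) * (Vq - 1)       # N(0,1) under the null
  c(Vq = Vq, z = z, p.value = 2 * pnorm(-abs(z)))
}

set.seed(123)
p <- cumsum(c(0, rnorm(1000, 0.0005, 0.01)))      # random walk in log prices
vr_test(p, q = 4)
```

Applied to returns with positive autocorrelation, the statistic would exceed one and the z-statistic would move into the rejection region.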
1.7 Exercises

(1) Equity Prices, Dividends and Returns
pv.wf1, pv.dta, pv.xlsx
(a) Plot the equity price over time and interpret its time series properties. Compare the result with Figure 1.1.
(b) Plot the natural logarithm of the equity price over time and interpret its time series properties. Compare this graph with Figure 1.2.
(c) Plot the return on equities over time and interpret its time series properties. Compare this graph with Figure 1.3.
(d) Plot the price and dividend series using a line chart and compare the result with Figure 1.5.
(e) Compute the dividend yield and plot this series using a line chart. Compare the graph with Figure 1.6.
(f) Compare the graphs in parts (a) and (b) and discuss the time series properties of equity prices, dividend payments and dividend yields.
(g) The present value model predicts a one-to-one relationship between the logarithm of equity prices and the logarithm of dividends. Use a scatter diagram to verify this property and compare the result with Figure ??.
(h) Compute the returns on United States equities and then calculate the sample mean, variance, skewness and kurtosis of these returns. Interpret the statistics.

(2) Yields
zero.wf1, zero.dta, zero.xlsx
(a) Plot the 2, 3, 4, 5, 6 and 9 month United States zero coupon yields using a line chart and compare the result with Figure 1.4.
(b) Compute the spreads on the 3-month, 5-month and 9-month zero coupon yields relative to the 2-month yield and plot these spreads using a line chart. Compare the graph with Figure 1.4.
(c) Compare the graphs in parts (a) and (b) and discuss the time series properties of yields and spreads.

(3) Computing Betas
capm.wf1, capm.dta, capm.xlsx
(a) Compute the monthly excess returns on the United States stock Exxon and the market excess returns.
(b) Compute the variances and covariances of the two excess returns. Interpret the statistics.
(c) Compute the Beta of Exxon and interpret the result.
(d) Repeat parts (a) to (c) for General Electric, Gold, IBM, Microsoft and Wal-Mart.

(4) Duration Times Between American Airlines (AMR) Trades
amr.wf1, amr.dta, amr.xlsx
(a) Use a histogram to graph the empirical distribution of the duration times between American Airlines trades. Compare the graph with Figure 1.9.
(b) Interpret the shape of the distribution of duration times.

(5) Exchange Rates
hour.wf1, hour.dta, hour.xlsx
(a) Draw a line chart of the $/£ exchange rate and discuss its time series characteristics.
(b) Compute the returns on the $/£ exchange rate. Draw a line chart of this series and discuss its time series characteristics.
(c) Compare the graphs in parts (a) and (b) and discuss the time series properties of exchange rates and exchange rate returns.
(d) Use a histogram to graph the empirical distribution of the returns on the $/£ exchange rate. Compare the graph with Figure 1.12.
(e) Compute the first 10 autocorrelations of the returns, squared returns, absolute returns and the square root of the absolute returns.
(f) Repeat parts (a) to (e) using the DM/$ exchange rate and comment on the time series characteristics, empirical distributions and patterns of autocorrelation for the two series. Discuss the implications of these results for the efficient markets hypothesis.

(6) Value-at-Risk
bankamerica.wf1, bankamerica.dta, bankamerica.xlsx
(a) Compute summary statistics and percentiles for the daily trading revenues of Bank of America. Compare the results with Table 1.2.
(b) Draw a histogram of the daily trading revenues and superimpose a normal distribution on top of the plot. What do you deduce about the distribution of the daily trading revenues?
(c) Plot the trading revenue together with the historical 1% VaR and the reported 1% VaR. Compare the results with Figure 1.14.
(d) Now assume that a weekly VaR is required. Repeat parts (a) to (c) for weekly trading revenues.

2 Linear Regression Models

2.1 Introduction

One of the most widely used models in empirical finance is the linear regression model. This model provides a framework in which to explain the movements of one financial variable in terms of one, or many, explanatory variables. Important examples include, but are not limited to, measuring Beta-risk in the capital asset pricing model (CAPM); extensions and variations of the CAPM, such as the Fama-French three-factor model and the consumption-CAPM version; arbitrage pricing theory; the term structure of interest rates; and the present value model of equity prices. Although these basic models stipulate linear relationships between the variables, the framework is easily extended to a range of nonlinear relationships as well.
Sharp changes in returns caused by stock market crashes, day-of-the-week effects and policy announcements are easily handled by means of qualitative response variables, or dummy variables. The importance of the linear regression modelling framework is highlighted by appreciating its flexibility in quantifying changes in key financial parameters arising from changes in the financial landscape. From Chapter 1, the traditional approach to modelling the Beta-risk of an asset is to assume that it is a constant ratio of the covariance between the excess returns on the asset and the market, to the variance of the market excess returns. However, one or both of these quantities may change over time, resulting in changes in the Beta-risk of the asset. The linear regression model provides a flexible and natural approach to modelling time-variation in Beta-risk.

2.2 Portfolio Risk Management

Risk management concerns choosing a portfolio of assets in which the relative contribution of each asset is chosen to minimise the overall risk of the portfolio, as measured by its volatility, or its variance. To derive the minimum variance portfolio, consider a portfolio consisting of two assets with returns $r_{1,t}$ and $r_{2,t}$, respectively, with the following properties:

Mean: $\mu_1 = E[r_{1,t}]$, $\mu_2 = E[r_{2,t}]$
Variance: $\sigma_1^2 = E[(r_{1,t}-\mu_1)^2]$, $\sigma_2^2 = E[(r_{2,t}-\mu_2)^2]$
Covariance: $\sigma_{1,2} = E[(r_{1,t}-\mu_1)(r_{2,t}-\mu_2)]$

The return on the portfolio is given by
\[
r_{p,t} = w_1 r_{1,t} + w_2 r_{2,t}\,, \qquad (2.1)
\]
where $w_1$ and $w_2$ are weights that define the relative contributions of each asset to the portfolio and satisfy
\[
w_1 + w_2 = 1\,. \qquad (2.2)
\]
The expected return on this portfolio is
\[
\mu_p = E[w_1 r_{1,t} + w_2 r_{2,t}] = w_1 E[r_{1,t}] + w_2 E[r_{2,t}] = w_1\mu_1 + w_2\mu_2\,, \qquad (2.3)
\]
while a measure of the portfolio's risk is
\[
\sigma_p^2 = E[(r_{p,t}-\mu_p)^2]
= w_1^2 E[(r_{1,t}-\mu_1)^2] + w_2^2 E[(r_{2,t}-\mu_2)^2] + 2w_1 w_2 E[(r_{1,t}-\mu_1)(r_{2,t}-\mu_2)]
= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1 w_2\sigma_{1,2}\,. \qquad (2.4)
\]
Using the restriction imposed by equation (2.2), the risk of the portfolio is equivalent to
\[
\sigma_p^2 = w_1^2\sigma_1^2 + (1-w_1)^2\sigma_2^2 + 2w_1(1-w_1)\sigma_{1,2}\,. \qquad (2.5)
\]
To find the optimal portfolio that minimises risk, the following optimisation problem is solved
\[
\min_{w_1}\; \sigma_p^2\,.
\]
Differentiating (2.5) with respect to $w_1$ gives
\[
\frac{d\sigma_p^2}{dw_1} = 2w_1\sigma_1^2 - 2(1-w_1)\sigma_2^2 + 2(1-2w_1)\sigma_{1,2}\,.
\]
Setting this derivative to zero and rearranging gives the optimal portfolio weight on the first asset as
\[
w_1 = \frac{\sigma_2^2 - \sigma_{1,2}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{1,2}}\,. \qquad (2.6)
\]
Using (2.2), the optimal weight on the other asset is
\[
w_2 = 1 - w_1 = \frac{\sigma_1^2 - \sigma_{1,2}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{1,2}}\,. \qquad (2.7)
\]
An alternative way of expressing the minimum variance portfolio model is to consider the linear regression equation
\[
y_t = \beta_0 + \beta_1 x_t + u_t\,, \qquad (2.8)
\]
where the variables are defined as
\[
y_t = r_{2,t}\,, \qquad x_t = r_{2,t} - r_{1,t}\,, \qquad (2.9)
\]
and $u_t$ is a disturbance term which is shown below to be also the return on the portfolio. The parameters $\beta_0$ and $\beta_1$ are chosen such that their estimated values $\hat{\beta}_0$ and $\hat{\beta}_1$, given by
\[
\hat{\beta}_1 = \frac{\mathrm{cov}(y_t, x_t)}{\mathrm{var}(x_t)}\,, \qquad \hat{\beta}_0 = E[y_t] - \hat{\beta}_1 E[x_t]\,, \qquad (2.10)
\]
respectively, minimise the variance $\sigma^2 = E[u_t^2]$.
To see that the expressions in (2.10) yield the minimum variance portfolio, the definitions of $y_t$ and $x_t$ in (2.9) are substituted into (2.10) to give
\[
\hat{\beta}_1 = \frac{\mathrm{cov}(y_t, x_t)}{\mathrm{var}(x_t)}
= \frac{\mathrm{cov}(r_{2,t},\, r_{2,t}-r_{1,t})}{\mathrm{var}(r_{2,t}-r_{1,t})}
= \frac{\mathrm{var}(r_{2,t}) - \mathrm{cov}(r_{2,t}, r_{1,t})}{\mathrm{var}(r_{2,t}) + \mathrm{var}(r_{1,t}) - 2\,\mathrm{cov}(r_{2,t}, r_{1,t})}
= \frac{\sigma_2^2 - \sigma_{1,2}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{1,2}}\,, \qquad (2.11)
\]
and
\[
\hat{\beta}_0 = E[y_t] - \hat{\beta}_1 E[x_t]
= E[r_{2,t}] - \hat{\beta}_1 E[r_{2,t} - r_{1,t}]
= \hat{\beta}_1 E[r_{1,t}] + (1-\hat{\beta}_1)E[r_{2,t}]
= \hat{\beta}_1\mu_1 + (1-\hat{\beta}_1)\mu_2\,. \qquad (2.12)
\]
The expression for $\hat{\beta}_1$ is equivalent to the optimal weight on the first asset in the portfolio given in (2.6), that is $\hat{\beta}_1 = w_1$. A comparison of the expression for $\hat{\beta}_0$ with the expected return on the portfolio in (2.3) shows that $\hat{\beta}_0$ represents the mean return on the minimum variance portfolio. Moreover, the estimate of the disturbance term in (2.8) is
\[
\hat{u}_t = y_t - \hat{\beta}_0 - \hat{\beta}_1 x_t
= r_{2,t} - (\hat{\beta}_1\mu_1 + (1-\hat{\beta}_1)\mu_2) - \hat{\beta}_1(r_{2,t} - r_{1,t})
= \hat{\beta}_1(r_{1,t} - \mu_1) + (1-\hat{\beta}_1)(r_{2,t} - \mu_2)\,,
\]
where the second equality makes use of the expression for $\hat{\beta}_0$ in (2.12). The disturbance term is therefore a weighted average of the deviations of the returns from their average values, where the weights are the portfolio weights. This also means that the variance of the disturbance term, $\sigma^2 = E[u_t^2]$, corresponds to the risk of the portfolio, $\sigma_p^2$. This one-to-one relationship between the minimum variance portfolio and the parameters of the linear regression in (2.8) forms the basis of the least squares estimator which is used to estimate the parameters of this model from a sample of data; a numerical illustration is given in the sketch below. Before exploiting this connection, some further examples showing the relationship between the linear regression model and theoretical models in finance are given in the next section.
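The following R sketch verifies the equivalence numerically on simulated returns: the weight computed directly from (2.6) coincides with the slope of the regression (2.8) with the variables defined in (2.9). The series and parameter values are purely illustrative.

```r
# Sketch: minimum variance weight from (2.6) versus the slope of
# regression (2.8) with y = r2 and x = r2 - r1 (simulated returns)
set.seed(7)
r1 <- rnorm(500, 0.010, 0.040)
r2 <- 0.5 * r1 + rnorm(500, 0.005, 0.030)

# Direct formula (2.6) using sample variances and covariances
w1_direct <- (var(r2) - cov(r1, r2)) / (var(r1) + var(r2) - 2 * cov(r1, r2))

# Least squares slope of the regression (2.8)
fit <- lm(r2 ~ I(r2 - r1))
c(formula = w1_direct, regression = unname(coef(fit)[2]))
```

The two numbers agree exactly because the least squares slope is precisely the ratio of sample moments in (2.11).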
2.3 Linear Models in Finance

This section highlights the importance of the linear regression model in empirical finance by demonstrating that it is central to a number of well-known theories in finance. In many of these examples the parameters of the linear regression model are shown to have very clear and explicit interpretations that directly relate to financial inputs and quantities.

2.3.1 The Constant Mean Model

The simplest linear model in finance is one where the average return on an asset is assumed to be constant
\[
r_t = \mu + u_t\,, \qquad (2.13)
\]
where $r_t$ is the return and $\mu = E[r_t]$ is the average or expected return. The disturbance term $u_t$ represents the deviation of the return on the asset at time $t$ from its mean, $u_t = r_t - \mu$. This term has two important properties which follow immediately from (2.13). First, it has zero mean since
\[
E[u_t] = E[r_t - \mu] = E[r_t] - \mu = \mu - \mu = 0\,. \qquad (2.14)
\]
Second, the variance of $u_t$ is
\[
\sigma^2 = E[u_t^2] = E[(r_t - \mu)^2]\,, \qquad (2.15)
\]
which shows that the variances of $u_t$ and $r_t$ are equivalent.

2.3.2 The Market Model

The market model extends the constant mean model in (2.13) by assuming that the return on the asset follows movements in the return on the market portfolio, $r_{m,t}$, and is given by
\[
r_t = \beta_0 + \beta_1 r_{m,t} + u_t\,, \qquad (2.16)
\]
in which $u_t$ is the disturbance term. The parameters $\beta_0$ and $\beta_1$ represent, respectively, the intercept and the slope of the linear function $\beta_0 + \beta_1 r_{m,t}$. Equation (2.16) is a regression line in which $r_t$ is the dependent variable and $r_{m,t}$ is the explanatory variable, so-called because movements in $r_{m,t}$ help to explain movements in $r_t$. Of course, the variation in $r_t$ is only partially explained by movements in $r_{m,t}$, with any unexplained variation in $r_t$ being captured by the disturbance term.

In the market model, the expected return on the asset is given by
\[
E_t[r_t] = \beta_0 + \beta_1 r_{m,t}\,, \qquad (2.17)
\]
where $E_t[\cdot]$ is the conditional expectations operator based on information at time $t$, as given by $r_{m,t}$. In the special case where the return is not affected by the return on the market, $\beta_1 = 0$, the market model reduces to the constant mean model in (2.13) and the conditional expectation reduces to the unconditional expectation, $E_t[r_t] = E[r_t] = \beta_0$. Put simply, the $t$ subscript on the conditional expectations operator is dropped because the expectation is not based on any information at time $t$, or at any other point in time for that matter.

2.3.3 The Capital Asset Pricing Model

Building on the efficient portfolio theory developed by Markowitz (1952, 1959), the Capital Asset Pricing Model (CAPM), which is credited to Sharpe (1964) and Lintner (1965), relates the return on the $i$th asset at time $t$, $r_{i,t}$, to the return on the market portfolio, $r_{m,t}$, with both returns adjusted by the return on a risk-free asset, $r_{f,t}$, usually taken to be the interest rate on a government security. As in equation (1.11) of Chapter 1, the log excess returns for asset $i$ and the market are defined as
\[
z_{i,t} = r_{i,t} - r_{f,t}\,, \qquad z_{m,t} = r_{m,t} - r_{f,t}\,.
\]
As pointed out in Chapter 1, the risk characteristics of an asset are encapsulated by its Beta-risk
\[
\beta = \frac{\mathrm{cov}(z_{i,t}, z_{m,t})}{\mathrm{var}(z_{m,t})}\,. \qquad (2.18)
\]
The CAPM is equivalent to the linear regression model
\[
r_{i,t} - r_{f,t} = \alpha + \beta(r_{m,t} - r_{f,t}) + u_t\,, \qquad (2.19)
\]
in which $u_t$ is a disturbance term, $\beta$ represents the asset's Beta-risk as given in (2.18), and the constant, traditionally labelled $\alpha$, represents the abnormal return to the asset over and above the asset's exposure to the excess return on the market. This model postulates a linear relationship between the excess return on the asset and the excess return on the market, with the slope given by the asset's Beta-risk, $\beta$. In the pure form of the CAPM, when the return on the market equals the return on the risk-free asset, $r_{m,t} = r_{f,t}$, the return on the asset should also equal the risk-free rate of return. For this relationship to be satisfied, the intercept of the regression model is restricted to be zero, $\alpha = 0$, and the CAPM regression line passes through the origin.

A further feature of the linear regression equation in (2.19) is that it conveniently decomposes the total risk of an asset at time $t$ into a component that is systematic and a part which is idiosyncratic
\[
\underbrace{E[(r_{i,t} - r_{f,t})^2]}_{\text{Total risk}}
= \underbrace{E[(\alpha + \beta(r_{m,t} - r_{f,t}))^2]}_{\text{Systematic risk}}
+ \underbrace{E[u_t^2]}_{\text{Idiosyncratic risk}}\,, \qquad (2.20)
\]
a result which uses the fact that $E[(r_{m,t} - r_{f,t})u_t] = 0$. Systematic risk is so-called because it relates to the risk of the overall market portfolio. The idiosyncratic risk, $\sigma^2 = E[u_t^2]$, relates to that part of risk which is unique to the individual asset and is uncorrelated with the market.
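Estimating (2.19) is a single call to lm() in R once the excess returns are constructed. The sketch below uses simulated excess returns, so the estimates of alpha and beta are illustrative rather than a reproduction of any result in the text.

```r
# Sketch: estimating the CAPM regression (2.19) by ordinary least
# squares (simulated excess returns)
set.seed(11)
z_m <- rnorm(171, 0.005, 0.045)                  # market excess return
z_i <- 0.001 + 1.2 * z_m + rnorm(171, 0, 0.060)  # asset excess return

capm <- lm(z_i ~ z_m)
summary(capm)                   # intercept = alpha, slope = Beta-risk

# Decomposition (2.20): R-squared is the systematic share of total risk
summary(capm)$r.squared
```

An aggressive stock would return a slope estimate above one, a conservative stock a slope between zero and one, and a hedge a negative slope.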
2.3.4 Arbitrage Pricing Theory

An alternative approach to using Fama-French factors to extend the CAPM equation in (2.19) is to include variables that capture unanticipated movements in key economic variables such as commodity prices and output growth. This class of models is based on the arbitrage pricing theory (APT) developed by Ross (1976), which is summarised by the linear regression equation
\[
r_{i,t} - r_{f,t} = \beta_0 + \beta_1(r_{m,t} - r_{f,t}) + \beta_2 U_t + u_t\,, \qquad (2.21)
\]
where $U_t$ represents unanticipated movements in a particular variable or set of variables and $u_t$ is a disturbance term. This model reduces to the CAPM in (2.19) when $\beta_2 = 0$, a situation which occurs when unanticipated movements in the economy do not contribute to explaining movements in the excess returns on the asset.

One of the drawbacks of the APT model is that it does not identify the factors, $U_t$, to be included in equation (2.21). In applied work the choice of factors is usually driven either by theoretical considerations or by the data. The theoretical approach attempts to discern macroeconomic and financial market variables that relate to the systematic risk of the economy. The statistical or data-driven approach normally uses a technique known as principal component analysis to identify a number of underlying 'factors' that drive returns, without specifying how exactly these factors are to be interpreted. This approach to factor choice is the subject matter of Chapter 10.

2.3.5 Term Structure of Interest Rates

Consider the relationship between the return on a long-term bond maturing in $n$ periods, $r_{n,t}$, and a short-term 1-period bond, $r_{1,t}$. The expectations hypothesis of the term structure of interest rates requires that the yield on an $n$-period long-term bond, $r_{n,t}$, is equal to a constant risk premium, $\phi$, plus the average of current and expected future 1-period short-term rates
\[
r_{n,t} = \phi + \frac{r_{1,t} + E_t[r_{1,t+1}] + E_t[r_{1,t+2}] + \cdots + E_t[r_{1,t+n-1}]}{n}\,, \qquad (2.22)
\]
in which $E_t[r_{1,t+j}]$ represents the conditional expectation of future short rates based on information at time $t$. Assuming that expectations of future short-term rates are formed according to $E_t[r_{1,t+j}] = r_{1,t}$, the term structure relationship in (2.22) reduces to
\[
r_{n,t} = \phi + r_{1,t}\,. \qquad (2.23)
\]
Equation (2.23) suggests that the term structure of interest rates can be modelled by the following linear regression model
\[
r_{n,t} = \beta_0 + \beta_1 r_{1,t} + u_t\,,
\]
in which $u_t$ is a disturbance term. Under the expectations hypothesis the slope parameter is given by $\beta_1 = 1$ and the intercept may then be interpreted as the risk premium, $\beta_0 = \phi$.

2.3.6 Present Value Model

The price of an asset is equal to the expected discounted dividend stream
\[
P_t = E_t\!\left[\frac{D_{t+1}}{(1+\delta)} + \frac{D_{t+2}}{(1+\delta)^2} + \frac{D_{t+3}}{(1+\delta)^3} + \cdots\right], \qquad (2.24)
\]
where $D_t$ is the dividend payment, $\delta$ is the discount rate, which is assumed to be constant for simplicity, and $E_t[D_{t+j}]$ represents the conditional expectation of $D_{t+j}$ based on information at time $t$. Adopting the assumption that expectations of future dividends are given by present dividends, $E_t[D_{t+n}] = D_t$, Chapter 1 shows that the price of the asset simplifies to
\[
P_t = \frac{D_t}{\delta}\,. \qquad (2.25)
\]
Taking natural logarithms of both sides gives a linear relationship between $\log P_t$ and $\log D_t$
\[
\log(P_t) = -\log(\delta) + \log(D_t)\,.
\]
This suggests that the present value model can be represented by the following linear regression model
\[
\log(P_t) = \beta_0 + \beta_1\log(D_t) + u_t\,, \qquad (2.26)
\]
in which $u_t$ is a disturbance term. A test of the present value model is based on the restriction $\beta_1 = 1$. This model also shows that the intercept term $\beta_0$ is a function of the discount rate, $\beta_0 = -\log(\delta)$, which suggests that the discount rate is given by $\delta = \exp(-\beta_0)$.
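Both the term structure and present value restrictions amount to a test of a unit slope. A rough R illustration for the present value regression (2.26) follows; the log dividend and log price series are simulated so that the restriction holds by construction, and the implied discount rate is recovered from the intercept.

```r
# Sketch: testing the present value restriction beta_1 = 1 in the
# regression (2.26) of log prices on log dividends (simulated data)
set.seed(3)
log_d <- 2 + cumsum(rnorm(400, 0, 0.02))          # log dividends
log_p <- 3 + 1.0 * log_d + rnorm(400, 0, 0.05)    # log prices, unit slope

fit <- lm(log_p ~ log_d)
b1  <- coef(summary(fit))["log_d", "Estimate"]
se1 <- coef(summary(fit))["log_d", "Std. Error"]
t1  <- (b1 - 1) / se1                             # t-test of H0: beta_1 = 1
c(beta1   = b1,
  t.stat  = t1,
  p.value = 2 * pt(-abs(t1), df = fit$df.residual),
  delta   = exp(-coef(fit)[1]))                   # implied discount rate
```

With real price and dividend data the nonstationarity of both series complicates inference, an issue taken up in later chapters.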
2.3.7 C-CAPM†

The consumption-based Capital Asset Pricing Model (C-CAPM) assumes that a representative agent chooses current and future real consumption $\{C_t, C_{t+1}, C_{t+2}, \cdots\}$ to maximise the inter-temporal expected utility function
\[
E_t\!\left[\sum_{j=0}^{\infty}\delta^j\,\frac{C_{t+j}^{1-\gamma} - 1}{1-\gamma}\right], \qquad (2.27)
\]
subject to the wealth constraint
\[
W_{t+1} = (1 + r_{i,t+1})(W_t - C_t)\,, \qquad (2.28)
\]
where $W_t$ is wealth, $r_{i,t}$ is the return on an asset (more precisely, on wealth), and $E_t$ is the conditional expectations operator based on information at time $t$. The parameters are the discount rate, $\delta$, and the relative risk aversion coefficient, $\gamma$. Solving this maximisation problem yields the first-order condition
\[
E_t\!\left[\delta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}(1 + r_{i,t+1})\right] = 1\,. \qquad (2.29)
\]
Taking natural logarithms of this equation gives
\[
\log E_t\!\left[\delta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}(1 + r_{i,t+1})\right] = 0\,, \qquad (2.30)
\]
since $\log 1 = 0$. The left-hand side of expression (2.30) is essentially the logarithm of a conditional expectation. This expression may be simplified by recognising that if a variable $X$ follows the log-normal distribution, then
\[
\log E_t[X] = E_t[\log X] + \frac{1}{2}\mathrm{var}_t(\log X)\,. \qquad (2.31)
\]
The trick is now to define $X = \delta(C_{t+1}/C_t)^{-\gamma}(1 + r_{i,t+1})$ and then find relatively straightforward expressions for the two terms on the right-hand side of (2.31), based on the assumption that $X$ does indeed follow a log-normal distribution. The properties of natural logarithms require that
\[
\log X = \log\delta - \gamma\log\frac{C_{t+1}}{C_t} + \log(1 + r_{i,t+1})\,,
\]
so that
\[
E_t[\log X] = \log\delta - \gamma E_t\!\left[\log\frac{C_{t+1}}{C_t}\right] + E_t[\log(1 + r_{i,t+1})]\,,
\]
which is the first term on the right-hand side of (2.31). The second term is
\[
\mathrm{var}_t(\log X) = \mathrm{var}_t\!\left(\log\delta - \gamma\log\frac{C_{t+1}}{C_t} + \log(1 + r_{i,t+1})\right),
\]
which may be simplified by recognising that the only contributions to $\mathrm{var}_t(\log X)$ come from the variances and the covariance of the terms in $C_{t+1}/C_t$ and $r_{i,t+1}$. Writing $\sigma_c^2 = \mathrm{var}_t(\log(C_{t+1}/C_t))$, $\sigma_r^2 = \mathrm{var}_t(\log(1 + r_{i,t+1}))$ and $\sigma_{c,r} = \mathrm{cov}_t(\log(C_{t+1}/C_t),\,\log(1 + r_{i,t+1}))$, these contributions are
\[
\mathrm{var}_t\!\left(\gamma\log\frac{C_{t+1}}{C_t}\right) = \gamma^2\sigma_c^2\,, \qquad
\mathrm{var}_t(\log(1 + r_{i,t+1})) = \sigma_r^2\,, \qquad
2\,\mathrm{cov}_t\!\left(-\gamma\log\frac{C_{t+1}}{C_t},\,\log(1 + r_{i,t+1})\right) = -2\gamma\sigma_{c,r}\,,
\]
so that $\mathrm{var}_t(\log X) = \gamma^2\sigma_c^2 + \sigma_r^2 - 2\gamma\sigma_{c,r}$. Using these results, it follows that (2.30) can be re-expressed as
\[
\log\delta - \gamma E_t\!\left[\log\frac{C_{t+1}}{C_t}\right] + E_t[\log(1 + r_{i,t+1})] + \frac{1}{2}\left(\gamma^2\sigma_c^2 + \sigma_r^2 - 2\gamma\sigma_{c,r}\right) = 0\,,
\]
or
\[
E_t[\log(1 + r_{i,t+1})] = -\log\delta - \frac{1}{2}\left(\gamma^2\sigma_c^2 + \sigma_r^2 - 2\gamma\sigma_{c,r}\right) + \gamma E_t\!\left[\log\frac{C_{t+1}}{C_t}\right].
\]
To convert this equation from expected to observable variables, define the following expectations-generating equations
\[
\log(1 + r_{i,t+1}) = E_t[\log(1 + r_{i,t+1})] + u_{1,t}\,, \qquad
\log\frac{C_{t+1}}{C_t} = E_t\!\left[\log\frac{C_{t+1}}{C_t}\right] + u_{2,t}\,,
\]
in which $u_{1,t}$ and $u_{2,t}$ represent errors in forming conditional expectations. Using these expressions in the previous equation gives a linear regression model relating the log return on an asset to the growth rate in consumption, $\log(C_{t+1}/C_t)$,
\[
\log(1 + r_{i,t+1}) = \beta_0 + \beta_1\log\frac{C_{t+1}}{C_t} + u_t\,, \qquad (2.32)
\]
in which
\[
\beta_0 = -\log\delta - \frac{1}{2}\left(\gamma^2\sigma_c^2 + \sigma_r^2 - 2\gamma\sigma_{c,r}\right), \qquad \beta_1 = \gamma\,,
\]
and where $u_t = u_{1,t} - \gamma u_{2,t}$ is a composite disturbance term. In this expression the slope parameter of the regression equation is in fact the relative risk aversion coefficient, $\gamma$. The expression for the intercept shows that $\beta_0$ is a function of a number of parameters including the relative risk aversion parameter $\gamma$, the discount rate $\delta$, the variance of consumption growth $\sigma_c^2$, the variance of log asset returns $\sigma_r^2$, and the covariance between the logarithm of asset returns and real consumption growth, $\sigma_{c,r}$.
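Once (2.32) is written as a regression, estimating the relative risk aversion coefficient requires nothing beyond lm(). The R sketch below simulates consumption growth and returns with $\gamma = 2$; with real data the two series would come from consumption and asset return records.

```r
# Sketch: the C-CAPM regression (2.32); the slope estimates the
# relative risk aversion coefficient gamma (simulated data, gamma = 2)
set.seed(5)
dc <- rnorm(300, 0.005, 0.010)                # log consumption growth
r  <- 0.01 + 2 * dc + rnorm(300, 0, 0.020)    # log gross asset return

ccapm <- lm(r ~ dc)
coef(ccapm)        # intercept = beta_0, slope = estimate of gamma
```

In practice the slope is estimated very imprecisely because consumption growth is extremely smooth relative to asset returns.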
2.4 Estimation

The finance models presented in Section 2.3 are all representable in terms of the following generic linear regression equation
\[
y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_K x_{K,t} + u_t\,, \qquad (2.33)
\]
in which the dependent variable $y_t$ is a function of a constant, a set of $K$ explanatory variables $x_{1,t}, x_{2,t}, \cdots, x_{K,t}$ and a disturbance term $u_t$. The disturbance term represents movements in the dependent variable $y_t$ that are not explained by movements in the explanatory variables. The regression parameters, $\beta_0, \beta_1, \beta_2, \cdots, \beta_K$, control the strength of the relationships between the dependent and the explanatory variables.

For equation (2.33) to represent a valid model, $u_t$ needs to satisfy a number of properties, some of which have already been discussed.

(1) Mean: The disturbance term has zero mean, $E[u_t] = 0$.
(2) Homoskedasticity: The disturbance variance is constant for all observations, $\mathrm{var}(u_t) = \sigma^2$.
(3) No autocorrelation: Disturbances corresponding to different observations are independent, $E[u_t u_{t+j}] = 0$, $j \neq 0$.
(4) Independence: The disturbance is uncorrelated with the explanatory variables, $E[u_t x_{j,t}] = 0$, $j = 1, 2, \cdots, K$.
(5) Normality: The disturbance has a normal distribution.

These assumptions are usually summarised as $u_t \sim iid\;N(0, \sigma^2)$ in the specification of the regression model.

The regression model in (2.33) represents the population. The aim of estimation is to compute the unknown parameters $\beta_0, \beta_1, \beta_2, \cdots, \beta_K$ given a sample of $T$ observations on the dependent variable and the $K$ explanatory variables. As it is the sample that is used to estimate the population parameters, the sample counterpart of (2.33) is
\[
y_t = \hat{\beta}_0 + \hat{\beta}_1 x_{1,t} + \hat{\beta}_2 x_{2,t} + \cdots + \hat{\beta}_K x_{K,t} + \hat{u}_t\,, \qquad (2.34)
\]
where $\hat{\beta}_k$ is the sample estimate of $\beta_k$ and $\hat{u}_t$ represents the regression residual. Given a sample of $T$ observations, the $\hat{\beta}_k$'s are estimated by minimising the residual sum of squares
\[
RSS = \sum_{t=1}^{T}\hat{u}_t^2\,. \qquad (2.35)
\]
The $\hat{\beta}_k$'s represent the ordinary least squares estimates of the parameters of the model. From the discussion of the minimum variance portfolio problem in Section 2.2, the least squares solution corresponds to estimating the population moments by the sample moments. In the case of a portfolio with two assets, the expressions in (2.10) in terms of sample moments become
\[
\hat{\beta}_1 = \frac{\frac{1}{T}\sum_{t=1}^{T}(y_t - \bar{y})(x_t - \bar{x})}{\frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{x})^2}\,, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\,, \qquad (2.36)
\]
where $\bar{y}$ and $\bar{x}$ are the sample means
\[
\bar{y} = \frac{1}{T}\sum_{t=1}^{T}y_t\,, \qquad \bar{x} = \frac{1}{T}\sum_{t=1}^{T}x_t\,.
\]
These formulas are easily extended to the multiple regression model in which there is more than one explanatory variable.
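The sample-moment formulas in (2.36) are easy to verify against a packaged routine. The following R sketch computes the two estimates by hand on simulated data and compares them with the output of lm(); the true parameter values are illustrative.

```r
# Sketch: the least squares formulas (2.36) from sample moments,
# checked against R's built-in lm() (simulated data)
set.seed(9)
x <- rnorm(200)
y <- 0.5 + 1.5 * x + rnorm(200)

b1 <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

print(c(b0 = b0, b1 = b1))    # sample-moment formulas (2.36)
print(coef(lm(y ~ x)))        # identical values from lm()
```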
2.5 Some Results for the Linear Regression Model†

This section provides a limited derivation of the ordinary least squares estimator of the multiple linear regression model and also the sampling distribution of the estimator. Attention is focussed on a model with one dependent variable and two explanatory variables in order to give some insight into the general result. Consider the linear regression model
\[
y_t = \beta_1 x_{1,t} + \beta_2 x_{2,t} + u_t\,, \qquad u_t \sim iid\;N(0, \sigma^2)\,, \qquad (2.37)
\]
in which the variables are defined as deviations from their means so that there is no constant term in equation (2.37). This assumption simplifies the algebra but has no substantive effect. The residual sum of squares is given by
\[
RSS(\hat{\beta}) = \sum_{t=1}^{T}\hat{u}_t^2 = \sum_{t=1}^{T}(y_t - \hat{\beta}_1 x_{1,t} - \hat{\beta}_2 x_{2,t})^2\,. \qquad (2.38)
\]
Differentiating $RSS$ with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$ and setting the results equal to zero yields
\[
\frac{\partial RSS}{\partial\hat{\beta}_1} = -2\sum_{t=1}^{T}(y_t - \hat{\beta}_1 x_{1,t} - \hat{\beta}_2 x_{2,t})x_{1,t} = 0\,, \qquad
\frac{\partial RSS}{\partial\hat{\beta}_2} = -2\sum_{t=1}^{T}(y_t - \hat{\beta}_1 x_{1,t} - \hat{\beta}_2 x_{2,t})x_{2,t} = 0\,. \qquad (2.39)
\]
This system of first-order conditions can be written in matrix form as
\[
\begin{bmatrix} \sum_{t=1}^{T}x_{1,t}^2 & \sum_{t=1}^{T}x_{1,t}x_{2,t} \\ \sum_{t=1}^{T}x_{1,t}x_{2,t} & \sum_{t=1}^{T}x_{2,t}^2 \end{bmatrix}
\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}
-
\begin{bmatrix} \sum_{t=1}^{T}x_{1,t}y_t \\ \sum_{t=1}^{T}x_{2,t}y_t \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]
and solving for $\hat{\beta}_1$ and $\hat{\beta}_2$ gives
\[
\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}
=
\begin{bmatrix} \sum_{t=1}^{T}x_{1,t}^2 & \sum_{t=1}^{T}x_{1,t}x_{2,t} \\ \sum_{t=1}^{T}x_{1,t}x_{2,t} & \sum_{t=1}^{T}x_{2,t}^2 \end{bmatrix}^{-1}
\begin{bmatrix} \sum_{t=1}^{T}x_{1,t}y_t \\ \sum_{t=1}^{T}x_{2,t}y_t \end{bmatrix}, \qquad (2.40)
\]
which is the ordinary least squares estimator $\hat{\beta} = [\hat{\beta}_1, \hat{\beta}_2]'$ of the population parameters $\{\beta_1, \beta_2\}$.

Inspection of the terms on the right-hand side of (2.40) allows a number of notational simplifications. The first matrix on the right-hand side of (2.40), when multiplied by $T^{-1}$, is the sample covariance matrix of $x_{1,t}$ and $x_{2,t}$, which may be denoted $M_{xx}$. Similarly, the second object on the right-hand side of (2.40), when multiplied by $T^{-1}$, contains the sample covariances of $x_{1,t}$ and $x_{2,t}$ with $y_t$, respectively, which may be denoted $M_{xy}$. The ordinary least squares estimator of the multiple regression model in equation (2.37) may therefore be written as
\[
\hat{\beta} = M_{xx}^{-1}M_{xy} = \left[\frac{1}{T}\sum_{t=1}^{T}x_t x_t'\right]^{-1}\left[\frac{1}{T}\sum_{t=1}^{T}x_t y_t\right], \qquad (2.41)
\]
in which $x_t = [x_{1,t}, x_{2,t}]'$. The beauty of this notation is that it is completely general. In the event of $K > 2$ regressors, the relevant vector $x_t$ is defined and the estimator is still given by (2.41).

Once the ordinary least squares estimates have been computed, the ordinary least squares estimator, $s^2$, of the variance $\sigma^2$ in the case of $K = 2$ is obtained from
\[
s^2 = \frac{1}{T}\sum_{t=1}^{T}(y_t - \hat{\beta}_1 x_{1,t} - \hat{\beta}_2 x_{2,t})^2\,. \qquad (2.42)
\]
In computing $s^2$ in equation (2.42) it is common to express the denominator in terms of the degrees of freedom, $T - K$, instead of merely $T$. If $K > 2$, the estimation of $\sigma^2$ proceeds exactly as in equation (2.42) where, of course, the appropriate number of regressors and coefficients are included in the computation.

Equation (2.41) for the ordinary least squares estimator of the parameters of the $K$-variable regression model may be rearranged and written as
\[
\hat{\beta} = \left[\frac{1}{T}\sum_{t=1}^{T}x_t x_t'\right]^{-1}\left[\frac{1}{T}\sum_{t=1}^{T}x_t y_t\right]
= \beta + \left[\frac{1}{T}\sum_{t=1}^{T}x_t x_t'\right]^{-1}\frac{1}{T}\sum_{t=1}^{T}x_t u_t\,, \qquad (2.43)
\]
where the last term is obtained by substituting for $y_t$ from the regression equation (2.37). This expression shows that the distribution of the estimator $\hat{\beta}$ is going to depend crucially on $T^{-1}\sum_{t=1}^{T}x_t u_t$ and $T^{-1}\sum_{t=1}^{T}x_t x_t'$.
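The matrix estimator (2.41) is two lines of linear algebra in R. The sketch below constructs demeaned data so that (2.37) applies, solves the normal equations directly, and computes the residual variance with the degrees-of-freedom correction mentioned above; the data and coefficient values are simulated for illustration.

```r
# Sketch: the matrix form (2.40)-(2.42) of the least squares
# estimator for two demeaned regressors (simulated data)
set.seed(13)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 0.8 * x1 - 0.4 * x2 + rnorm(n)

# Demean all variables so that (2.37) has no constant term
X  <- scale(cbind(x1, x2), scale = FALSE)
yd <- y - mean(y)

b  <- solve(t(X) %*% X, t(X) %*% yd)           # solves the normal equations (2.40)
s2 <- sum((yd - X %*% b)^2) / (n - ncol(X))    # (2.42) with degrees-of-freedom correction
print(drop(b)); print(s2)
```

The same solve() call works unchanged for any number of regressors, which is the generality of the notation in (2.41).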
The distribution of the ordinary least squares estimator $\hat{\beta}$ is established in terms of two important results. In order to invoke these results the variables $x_t$ and $y_t$ need to satisfy a number of important conditions.¹ The first result is the weak law of large numbers (WLLN), which is used to claim that the sample covariance matrix of the $x_t$ variables converges, as the sample size gets infinitely large, to the population covariance matrix, or
\[
\frac{1}{T}\sum_{t=1}^{T}x_t x_t' \stackrel{p}{\longrightarrow} \Omega\,,
\]
where $\Omega$ is the population covariance matrix of $x_t$ and $\stackrel{p}{\longrightarrow}$ represents convergence in probability as $T \to \infty$. The second result is the application of a central limit theorem to claim that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{T}x_t u_t \stackrel{d}{\longrightarrow} N(0, \sigma^2\Omega)\,,
\]
where $\sigma^2$ is the population variance of $u_t$ and $\stackrel{d}{\longrightarrow}$ represents convergence in distribution as $T \to \infty$. Rearranging equation (2.43) slightly and using these two convergence results yields
\[
\sqrt{T}(\hat{\beta} - \beta) \stackrel{d}{\longrightarrow} \Omega^{-1} \times N(0, \sigma^2\Omega) = N(0, \sigma^2\Omega^{-1})\,.
\]
This is the usual expression for the distribution of the least squares estimator of the multiple regression model as $T \to \infty$.

¹ For expediency, it will simply be assumed here that the requisite conditions on $x_t$ and $y_t$ are indeed satisfied. For a more detailed discussion of these conditions and the appropriate choice of central limit theorem see Hamilton (1994) or Martin, Hurn and Harris (2013).

2.6 Diagnostics

The estimated regression model is based on the assumption that the model is correctly specified. To test this assumption a number of diagnostic procedures are performed. These diagnostics are divided into three categories which relate to the key variables that summarise the model, namely, the dependent variable $y_t$, the explanatory variables $x_t$ and the disturbances $u_t$.

2.6.1 Diagnostics on the Dependent Variable

The fundamental aim of the linear regression model is to explain the movements in the dependent variable $y_t$. This suggests that a natural measure of the success of an estimated model is the proportion of the variation in the dependent variable explained by the model. This statistic is the coefficient of determination
\[
R^2 = \frac{\text{Explained sum of squares}}{\text{Total sum of squares}}
= \frac{\sum_{t=1}^{T}(y_t - \bar{y})^2 - \sum_{t=1}^{T}\hat{u}_t^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2}\,. \qquad (2.44)
\]
The coefficient of determination satisfies the inequality $0 \le R^2 \le 1$, with values close to unity suggesting a very good model fit and values close to zero representing a poor fit. From equation (2.20), the explained sum of squares provides an overall estimate of the systematic (non-diversifiable) risk of the asset, while the unexplained part gives an estimate of its idiosyncratic (or diversifiable) risk. This suggests that $R^2$ provides a measure of the proportion of the total risk of an asset that is non-diversifiable, while $1 - R^2$ represents the proportion that is diversifiable.

A potential drawback of $R^2$ is that it never decreases when another variable is added to the model. By continually including variables, until their number just matches the sample size, it is possible to obtain a coefficient of determination of $R^2 = 1$, with all risk effectively diversified away. From a statistical point of view, what is important in selecting explanatory variables is to include just those variables which significantly improve the explanatory power of the model. This is achieved by penalising the $R^2$ statistic for the loss in degrees of freedom. The resulting statistic is referred to as the adjusted coefficient of determination, computed as
\[
\bar{R}^2 = 1 - (1 - R^2)\frac{T-1}{T-K-1}\,. \qquad (2.45)
\]
A related measure to the coefficient of determination is the standard error of the regression
\[
s = \sqrt{\frac{\sum_{t=1}^{T}\hat{u}_t^2}{T-K-1}}\,, \qquad (2.46)
\]
which is simply the standard deviation of the ordinary least squares residuals. As the residuals in the CAPM regression represent the component of risk that is diversifiable, this statistic provides an overall measure of diversifiable risk. A value of $s = 0$ implies a perfect fit with $R^2 = 1$, with the resultant implication that all risk is non-diversifiable. An estimate of $s > 0$ suggests a less than perfect fit with some risk being diversifiable. However, it is not possible to determine the quality of fit of a model simply by looking at the value of $s$, because this quantity is affected by the units of measurement of the variables. For example, re-expressing returns in terms of percentages has the effect of increasing $s$ by a factor of 100 without changing the fit of the model.
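All three goodness-of-fit measures can be computed directly from the residuals of a fitted regression, as the following short R sketch illustrates on simulated data (the values are illustrative only).

```r
# Sketch: the fit measures (2.44)-(2.46) computed from residuals
set.seed(17)
x <- rnorm(171); y <- 0.01 + 0.9 * x + rnorm(171, 0, 0.05)
fit  <- lm(y ~ x)
uhat <- resid(fit)

n <- length(y); K <- 1
R2    <- 1 - sum(uhat^2) / sum((y - mean(y))^2)    # (2.44)
R2bar <- 1 - (1 - R2) * (n - 1) / (n - K - 1)      # (2.45)
s     <- sqrt(sum(uhat^2) / (n - K - 1))           # (2.46)
c(R2 = R2, R2bar = R2bar, s = s)
```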
2.6.2 Diagnostics on the Explanatory Variables

As the aim of the regression model is to explain movements in the dependent variable, over and above its mean $\bar{y}$, using information on the explanatory variables $x_{1,t}, x_{2,t}, \cdots, x_{K,t}$, for this information to be important the slope parameters $\beta_1, \beta_2, \cdots, \beta_K$ associated with these explanatory variables must be non-zero. To investigate this proposition, tests are performed on these parameters individually and jointly.

To test the importance of a single explanatory variable in the regression equation, the associated parameter estimate is tested to see if it is zero using a t-test. The null and alternative hypotheses are respectively

H0 : βk = 0 [x_{k,t} does not contribute to explaining y_t]
H1 : βk ≠ 0 [x_{k,t} does contribute to explaining y_t].

The t-statistic to perform this test is
\[
t = \frac{\hat{\beta}_k}{se(\hat{\beta}_k)}\,, \qquad (2.47)
\]
where $\hat{\beta}_k$ is the estimate of $\beta_k$ and $se(\hat{\beta}_k)$ is the corresponding standard error. The decision rule is based on the p-value of the test:

p-value < α : Reject H0 at the α level of significance
p-value > α : Fail to reject H0 at the α level of significance.    (2.48)

It is typical to choose $\alpha = 0.05$ as the significance level, which means that there is a 5% chance of rejecting the null hypothesis when it is actually true.

A joint test of all of the explanatory variables is performed using either an F-test or a chi-square test. The null and alternative hypotheses are respectively

H0 : β1 = β2 = · · · = βK = 0
H1 : at least one βk is not zero.

Notice that this test does not include the intercept parameter $\beta_0$, so the total number of restrictions is $K$. The F-statistic is computed as
\[
F = \frac{R^2/K}{(1-R^2)/(T-K-1)}\,, \qquad (2.49)
\]
which is distributed as $F_{K,T-K-1}$ under the null hypothesis. The chi-square statistic is computed as
\[
\chi^2 = KF = \frac{R^2}{(1-R^2)/(T-K-1)}\,, \qquad (2.50)
\]
which is distributed as $\chi^2$ with $K$ degrees of freedom. Values of the test statistics yielding p-values less than 0.05 constitute rejection of the null hypothesis, as in (2.48).

The t-test in (2.47) is designed to determine the importance of an explanatory variable by testing whether its slope parameter is zero. From the discussion of the various theories of finance presented in Section 2.3, other types of tests are also of interest which focus on testing whether a population parameter equals a particular non-zero value. For example, in the case of the CAPM it is of interest to see whether an asset tracks the market one-to-one by determining if the slope parameter is unity. The t-statistic to perform this test is obtained by generalising (2.47) to
\[
t = \frac{\hat{\beta}_k - 1}{se(\hat{\beta}_k)}\,. \qquad (2.51)
\]
More generally, sets of restrictions can be tested using either an F-test or a chi-square test as before. In the case of testing one restriction, $F = \chi^2 = t^2$.
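Standard regression output reports the t-statistic for the zero restriction in (2.47), but the statistic in (2.51) has to be assembled by hand from the coefficient and its standard error. A short R sketch on simulated CAPM-style data follows.

```r
# Sketch: the t-test (2.51) of H0: beta = 1, the one-to-one market
# tracking hypothesis (simulated excess returns)
set.seed(19)
z_m <- rnorm(171, 0.005, 0.045)
z_i <- 1.1 * z_m + rnorm(171, 0, 0.050)
fit <- lm(z_i ~ z_m)

b  <- coef(summary(fit))["z_m", "Estimate"]
se <- coef(summary(fit))["z_m", "Std. Error"]
t1 <- (b - 1) / se                                 # (2.51)
c(t.stat = t1, p.value = 2 * pt(-abs(t1), df = fit$df.residual))
```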
2.6.3 Diagnostics on the Disturbance Term

The third and final set of diagnostic tests is based on the disturbance term, $u_t$. For the regression model to be well specified, there should be no information contained in the disturbance term. If this condition is not satisfied, not only does it represent a violation of the assumptions underlying the linear regression model, but it also suggests that there are arbitrage opportunities which can be used to improve predictions of the dependent variable.

Residual Plots
A visual plot of the least squares residuals over the sample provides an initial descriptive tool to identify potential patterns. Positive residuals show that the model underestimates the dependent variable, whereas negative residuals show that the model overestimates the dependent variable. A sequence of positive (negative) residuals suggests that the model continually underestimates (overestimates) the dependent variable, thereby raising the possibility of arbitrage opportunities in predicting movements in the dependent variable. Residual plots are also helpful in identifying abnormal movements in financial variables.

LM Test of Autocorrelation
This test is very important when using time series data. The aim of the test is to detect if the disturbance term is related to previous disturbance terms. The null and alternative hypotheses are respectively

H0 : No autocorrelation
H1 : Autocorrelation.

If there is no autocorrelation this provides support for the model, whereas rejection of the null hypothesis suggests that the model excludes important information. The test consists of using the least squares residuals $\hat{u}_t$ in the following equation
\[
\hat{u}_t = \gamma_0 + \gamma_1 x_{1,t} + \gamma_2 x_{2,t} + \cdots + \gamma_K x_{K,t} + \rho_1\hat{u}_{t-1} + v_t\,, \qquad (2.52)
\]
where $v_t$ is a disturbance term. This equation is similar to the linear regression model (2.33) with the exception that $y_t$ is replaced by $\hat{u}_t$ and there is an additional explanatory variable given by the lagged residual $\hat{u}_{t-1}$. The test statistic is
\[
LM = TR^2\,, \qquad (2.53)
\]
where $T$ is the sample size and $R^2$ is the coefficient of determination from estimating (2.52). This statistic is distributed as $\chi^2$ with one degree of freedom. The test based on (2.52) constitutes a test of first order autocorrelation. Extensions to higher order autocorrelation are straightforward. For example, a test of second order autocorrelation is based on the regression equation
\[
\hat{u}_t = \gamma_0 + \gamma_1 x_{1,t} + \gamma_2 x_{2,t} + \cdots + \gamma_K x_{K,t} + \rho_1\hat{u}_{t-1} + \rho_2\hat{u}_{t-2} + v_t\,. \qquad (2.54)
\]
The test statistic is still (2.53), with the exception that the degrees of freedom is now equal to 2, corresponding to a joint test of lags 1 and 2.
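The auxiliary regression in (2.52) and the T R² statistic in (2.53) are simple to code directly. The R sketch below tests first order autocorrelation on simulated data; note that the effective sample in the auxiliary regression loses one observation to the lag.

```r
# Sketch: LM test (2.52)-(2.53) of first order autocorrelation
set.seed(23)
x <- rnorm(200); y <- 0.2 + 0.7 * x + rnorm(200)
uhat <- resid(lm(y ~ x))

n   <- length(uhat)
aux <- lm(uhat[-1] ~ x[-1] + uhat[-n])   # residuals on x and lagged residuals
LM  <- (n - 1) * summary(aux)$r.squared  # T * R^2 from the auxiliary regression
c(LM = LM, p.value = 1 - pchisq(LM, df = 1))
```

Adding a second lagged residual to the auxiliary regression, and testing against a chi-square with 2 degrees of freedom, gives the second order version in (2.54).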
White Test of Heteroskedasticity
White's test of heteroskedasticity (White, 1980) is important when using cross-section data or when modelling time-varying volatility, a topic dealt with in Chapter 11. The aim of the test is to determine the constancy of the disturbance variance $\sigma^2$. The null and alternative hypotheses are respectively

H0 : Homoskedasticity [σ² is constant]
H1 : Heteroskedasticity [σ² is time-varying].

The test consists of estimating the following equation for the case of $K = 2$ explanatory variables
\[
\hat{u}_t^2 = \gamma_0 + \gamma_1 x_{1,t} + \gamma_2 x_{2,t} + \alpha_{1,1}x_{1,t}^2 + \alpha_{1,2}x_{1,t}x_{2,t} + \alpha_{2,2}x_{2,t}^2 + v_t\,, \qquad (2.55)
\]
where $v_t$ is a disturbance term. The choice of explanatory variables can be extended to include additional variables that are not necessarily included in the initial regression equation. The test statistic is $LM = TR^2$, where $T$ is the sample size and $R^2$ is the coefficient of determination from estimating (2.55). This statistic is distributed as $\chi^2$ with 5 degrees of freedom, which corresponds to the number of explanatory variables in (2.55) excluding the constant. If the disturbance variance is constant it should not be affected by the explanatory variables in (2.55). In this special case $\gamma_1 = \gamma_2 = \alpha_{1,1} = \alpha_{1,2} = \alpha_{2,2} = 0$ and the variance reduces to a constant given by $\sigma^2 = \gamma_0$.

Normality Test
The assumption that $u_t$ is normally distributed is important in performing hypothesis tests. A common way to test this assumption is the Jarque-Bera test. The null and alternative hypotheses are respectively

H0 : Normality
H1 : Nonnormality.

The test statistic is
\[
JB = T\left(\frac{SK^2}{6} + \frac{(KT-3)^2}{24}\right), \qquad (2.56)
\]
where $T$ is the sample size, and $SK$ and $KT$ are the skewness and kurtosis, respectively, of the least squares residuals
\[
SK = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{\hat{u}_t}{s}\right)^3\,, \qquad
KT = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{\hat{u}_t}{s}\right)^4\,,
\]
and $s$ is the standard error of the regression in (2.46). The JB statistic is distributed as $\chi^2$ with 2 degrees of freedom.

This set of diagnostics is especially helpful in situations where, for example, the fit of the model is poor, as indicated by a small value of the coefficient of determination. In this situation the specified model is only able to explain a small proportion of the overall movements in the dependent variable. But if it is the case that $u_t$ is random, this suggests that the model cannot be improved, even though a relatively large proportion of the variation in the dependent variable remains unexplained. In empirical finance this type of situation is perhaps the norm, particularly when modelling financial returns, because the volatility tends to dominate the mean. In this noisy environment it is difficult to identify the signal in the data.
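The Jarque-Bera statistic in (2.56) is easily computed from standardised residuals, as the following R sketch shows. For simplicity the residuals are simulated and are scaled by a simple moment-based estimate of the standard error rather than the degrees-of-freedom-corrected version in (2.46); the difference is negligible in moderate samples.

```r
# Sketch: the Jarque-Bera statistic (2.56) from standardised residuals
set.seed(29)
uhat <- rnorm(171)                 # stand-in for regression residuals
n  <- length(uhat)
z  <- uhat / sqrt(mean(uhat^2))    # standardised residuals
SK <- mean(z^3)                    # skewness
KT <- mean(z^4)                    # kurtosis
JB <- n * (SK^2 / 6 + (KT - 3)^2 / 24)
c(JB = JB, p.value = 1 - pchisq(JB, df = 2))
```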
2.7 Estimating the CAPM

Ordinary least squares estimates of the capital asset pricing model in (2.19) are given in Table 2.1 for five United States stocks (Exxon, General Electric, IBM, Microsoft, Walmart) and one commodity (gold) using continuously compounded monthly excess returns from May 1990 to July 2004. The p-values associated with a t-test of the significance of each parameter estimate are given in parentheses. General Electric, IBM and Microsoft are all aggressive stocks ($\hat{\beta}_1 > 1$), Exxon and Walmart are conservative stocks ($0 < \hat{\beta}_1 < 1$) and gold is an imperfect hedge ($\hat{\beta}_1 < 0$).

Table 2.1: Ordinary least squares estimates of the CAPM in equation (2.19) for monthly returns to five United States stocks and gold for the period April 1990 to July 2004. P-values are given in parentheses.

Stock               b0               b1               Σû²     R̄²      s
Exxon               0.012 (0.000)    0.502 (0.000)    0.249   0.235   0.038
General Electric    0.016 (0.000)    1.144 (0.000)    0.510   0.440   0.055
Gold               -0.003 (0.238)   -0.098 (0.066)    0.149   0.014   0.030
IBM                 0.004 (0.474)    1.205 (0.000)    1.048   0.297   0.079
Microsoft           0.012 (0.069)    1.447 (0.000)    1.282   0.333   0.087
Walmart             0.007 (0.156)    0.868 (0.000)    0.747   0.234   0.066

The t-statistic to test that the market excess return is an important explanatory variable for the excess return on, say, Exxon is computed as
\[
t = \frac{0.502}{0.009} = 55.778\,.
\]
The associated p-value is 0.000, which is given in parentheses in Table 2.1. As 0.000 < 0.05, the null hypothesis is rejected at the 5% level. The same qualitative results occur for the other assets in Table 2.1 with the exception of gold. For gold the p-value of the test is 0.066, suggesting that this restriction is rejected at the 10% level, but not at the 5% level.

These results may also be used to test the hypothesis that a stock tracks the market one-to-one. The pertinent null hypothesis is H0 : β1 = 1, which may be tested using a t-test. In the case of General Electric, the test statistic is
\[
t = \frac{1.144 - 1}{0.098} = 1.458\,.
\]
The p-value of this statistic is 0.1447 and the conclusion is that the null hypothesis cannot be rejected at the 5% level.

The $\bar{R}^2$ statistics of the estimated CAPM regressions are given in the second last column of Table 2.1. The largest value reported is for General Electric, which shows that 44% of the variation in its excess returns is explained by movements in the market returns relative to the risk-free rate. Gold has the lowest $\bar{R}^2$, with just 1.4% of movements explained by the market, a result which also suggests that gold has the highest proportion of risk that is diversifiable. Estimates of the diversifiable risk characteristics of each asset are given by $s$ in the last column of the table.

[Figure 2.1: Least squares residuals from estimated CAPM regressions for six United States assets for the period April 1990 to July 2004.]

Plots of the least squares residuals in Figure 2.1 highlight the presence of some outliers in gold (+16.43%) and IBM (−28.48%) in October of 1999, and in Microsoft during the dot-com crisis of 2000, with the biggest movement occurring in April (−38.56%). The diagnostic tests reported in Table 2.2 show that the estimated CAPM regressions for Exxon and Walmart do not exhibit any significant model misspecification. The IBM model does not exhibit autocorrelation at the 1% level, but fails the normality test. The gold and Microsoft CAPM regressions exhibit second order autocorrelation, but not first or twelfth order autocorrelation at the 5% level, and also fail the normality test. In contrast, the General Electric CAPM exhibits autocorrelation at all lags, but does not fail the normality test at the 5% level. All estimated models pass the White heteroskedasticity test.

Table 2.2: Diagnostic test statistics (with p-values in parentheses) of the estimated CAPM models for monthly returns to five United States stocks and gold for the period April 1990 to July 2004. The test statistics are LM(j), the LM test of jth order autocorrelation; WHITE, the White test of heteroskedasticity with regressors given by the levels and squares; and JB, the Jarque-Bera test of normality.

Stock       LM(1)           LM(2)           LM(12)           WHITE           JB
Exxon       0.567 (0.452)   1.115 (0.573)   12.824 (0.382)   1.022 (0.600)   2.339 (0.310)
GE          5.458 (0.019)   7.014 (0.030)   41.515 (0.000)   5.336 (0.069)   5.519 (0.063)
Gold        1.452 (0.228)   7.530 (0.023)   17.082 (0.146)   2.579 (0.275)   224.146 (0.000)
IBM         0.719 (0.396)   0.728 (0.695)   10.625 (0.561)   1.613 (0.446)   34.355 (0.000)
Microsoft   3.250 (0.071)   6.134 (0.047)   12.220 (0.428)   0.197 (0.906)   52.449 (0.000)
Walmart     1.270 (0.260)   1.270 (0.530)   12.681 (0.393)   2.230 (0.328)   4.010 (0.135)

2.8 Qualitative Variables

In all of the applications and examples investigated so far the explanatory variables are quantitative, with each variable taking a different value for each sample observation.
However, there are a number of applications in financial econometrics where it is appropriate to allow some of the explanatory variables to exhibit qualitative movements. Formally this is achieved by using a dummy variable which takes the value 1 for an event and 0 for a non-event
\[
Dum_t = \begin{cases} 0 & \text{(non-event)} \\ 1 & \text{(event).} \end{cases}
\]

2.8.1 Stock Market Crashes

Consider the augmented present value model
\[
P_t = \beta_0 + \beta_1 D_t + \beta_2 Dum_t + u_t\,,
\]
where $P_t$ is the stock market price, $D_t$ is the dividend payment and $u_t$ is a disturbance term. The variable $Dum_t$ is a dummy variable that captures the effects of a stock market crash on the price of the asset
\[
Dum_t = \begin{cases} 0 & \text{(pre-crash period)} \\ 1 & \text{(post-crash period).} \end{cases}
\]
The dummy variable has the effect of changing the intercept in the regression equation according to
\[
P_t = \beta_0 + \beta_1 D_t + u_t \qquad \text{(pre-crash period)}
\]
\[
P_t = (\beta_0 + \beta_2) + \beta_1 D_t + u_t \qquad \text{(post-crash period).}
\]
For a stock market crash $\beta_2 < 0$, which represents a downward shift in the present value relationship between the asset price and the dividend payment.

An important stock market crash that began on 10 March 2000 is known as the dot-com crash because the stocks of technology companies fell sharply. The effect on one of the largest tech stocks, Microsoft, is highlighted in Figure 2.2 by the large falls in its share price over 2000. The biggest movement is in April 2000, where there is a negative return of 42.07% for the month. Modelling of Microsoft is also complicated by the unfavourable ruling in its antitrust case at the same time, which would have exacerbated the size of the fall in April. Further inspection of the returns shows that there is a further fall of 27.94% in December, followed by a correction of 34.16% in January of the next year.

[Figure 2.2: Monthly Microsoft price and returns for the period April 1990 to July 2004.]

These three large movements are also apparent in the Microsoft residual plot in Figure 2.1. Introducing dummy variables for each of these three months into a CAPM model yields
\[
r_{i,t} - r_{f,t} = 0.015 + 1.370\,(r_{m,t} - r_{f,t}) - 0.391\,Apr00_t - 0.298\,Dec00_t - 0.282\,Jan01_t + \hat{u}_t\,.
\]
Figure 2.3 gives histograms of the residuals without and with these three dummy variables and shows that the dummy variables are successful in purging the outliers from the tails of the distribution. This result is confirmed by the JB statistic, which has a p-value of 0.651 for the augmented model.

[Figure 2.3: Histograms of residuals from a CAPM regression using Microsoft returns for the period April 1990 to July 2004, both without and with dummy variables for the dot-com crash.]
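Constructing event dummies and adding them to a CAPM regression is mechanical, as the R sketch below illustrates. The data are simulated, the outlier dates are arbitrary placeholders for April 2000, December 2000 and January 2001, and the dummy names mirror those in the estimated equation above.

```r
# Sketch: a CAPM regression augmented with dummies for three crash
# months (simulated monthly excess returns; dates are placeholders)
set.seed(31)
n   <- 171
z_m <- rnorm(n, 0.005, 0.045)
z_i <- 0.01 + 1.4 * z_m + rnorm(n, 0, 0.060)
z_i[c(120, 128, 129)] <- z_i[c(120, 128, 129)] - c(0.39, 0.30, 0.28)  # inject outliers

apr00 <- as.numeric(seq_len(n) == 120)
dec00 <- as.numeric(seq_len(n) == 128)
jan01 <- as.numeric(seq_len(n) == 129)

fit <- lm(z_i ~ z_m + apr00 + dec00 + jan01)
coef(fit)    # each dummy coefficient picks up one outlying month
```

Because each dummy is non-zero for a single observation, its coefficient equals the residual of that month in the non-augmented regression, which is exactly why the residual histogram is purged of the outliers.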
2.8.2 Day-of-the-week Effects

Sometimes share prices exhibit greater movements on Mondays than during the rest of the week. One reason for this extra volatility is the build-up of information over the weekend when the stock market is closed. To capture this behaviour consider the regression model
\[
r_t = \beta_0 + \beta_1 Mon_t + \beta_2 Tue_t + \beta_3 Wed_t + \beta_4 Thu_t + u_t\,,
\]
where the data are daily. The dummy variables are defined as
\[
Mon_t = \begin{cases} 0 & \text{not Monday} \\ 1 & \text{Monday} \end{cases} \qquad
Tue_t = \begin{cases} 0 & \text{not Tuesday} \\ 1 & \text{Tuesday} \end{cases} \qquad
Wed_t = \begin{cases} 0 & \text{not Wednesday} \\ 1 & \text{Wednesday} \end{cases} \qquad
Thu_t = \begin{cases} 0 & \text{not Thursday} \\ 1 & \text{Thursday.} \end{cases}
\]
Notice that there are just 4 dummy variables to explain the 5 days of the week. This is because setting all the dummy variables to zero, $Mon_t = Tue_t = Wed_t = Thu_t = 0$, defines the regression model on Friday as
\[
r_t = \beta_0 + u_t\,.
\]
The intercept $\beta_0$ in the model represents a benchmark average return which corresponds to the default day, namely Friday. All of the other average returns are measured with respect to this value. For example, the Monday average return is
\[
E[r_t \,|\, Mon] = \beta_0 + \beta_1\,.
\]
So a significant value of $\beta_1$ shows that average returns on Monday differ significantly from average returns on Friday.

2.8.3 Event Studies

Event studies are widely used in empirical finance to model the effects on financial variables of qualitative changes arising from a particular event. Typically events arise from some announcement caused by, for example, a change in the CEO of a company, an unfavourable antitrust decision, or the effects of monetary policy announcements on the market. In fact, the stock market crash and day-of-the-week examples of dummy variables given above also constitute event studies. A typical event study involves specifying a regression equation based on a particular model to represent 'normal' returns, and then defining separate dummy variables at each point in time over the event window to capture the 'abnormal' returns, positive or negative. The parameter on a particular dummy is the 'abnormal' return at that point in time, as it represents the return over and above the 'normal' return. In defining the event window, two periods are included on either side of the time of the actual announcement. The period before the announcement is included to identify how the market behaves in anticipation of the announcement, while the period after the announcement captures the reaction of the market to the announcement.

For an event study with 'normal' returns based on the market model in (2.16) and 'abnormal' returns corresponding to an event window covering the last 5 days of the sample, with the actual announcement occurring on the 3rd last day, the regression equation is
\[
r_t = \underbrace{\beta_0 + \beta_1 r_{m,t}}_{\text{`Normal' return}}
+ \underbrace{\delta_{-2}E_{T-4} + \delta_{-1}E_{T-3} + \delta_0 E_{T-2} + \delta_1 E_{T-1} + \delta_2 E_T}_{\text{`Abnormal' return}} + u_t\,.
\]
The normal return at each point in time is given by $\beta_0 + \beta_1 r_{m,t}$. The abnormal return on the day of the announcement is $\delta_0$, on the days prior to the announcement $\delta_{-2}$ and $\delta_{-1}$, and on the days after the announcement $\delta_1$ and $\delta_2$. The abnormal return for the whole of the event window is
\[
\text{Total abnormal return} = \delta_{-2} + \delta_{-1} + \delta_0 + \delta_1 + \delta_2\,.
\]
This suggests that a test of the statistical significance of the event, and of its effect in generating abnormal returns over the event window, is based on the restrictions

H0 : δ−2 = δ−1 = δ0 = δ1 = δ2 = 0 (Normal returns)
H1 : at least one restriction is not valid (Abnormal returns).

A χ² test can be used with 5 degrees of freedom.
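The joint restriction can be tested by comparing restricted and unrestricted regressions. The R sketch below uses the F-test version of the joint test via anova() rather than the χ² version described above; the two are equivalent in large samples. The returns, event dates and abnormal return sizes are all simulated.

```r
# Sketch: event study with a five-day window at the end of the sample
# and a joint F-test of no abnormal returns (simulated daily returns)
set.seed(37)
n   <- 250
r_m <- rnorm(n, 0, 0.010)
r   <- 0.0002 + 0.9 * r_m + rnorm(n, 0, 0.008)
r[(n - 4):n] <- r[(n - 4):n] + c(0.001, 0.002, 0.015, 0.004, 0.001)  # abnormal returns

# One dummy for each day in the event window
E <- sapply((n - 4):n, function(s) as.numeric(seq_len(n) == s))
colnames(E) <- paste0("E", 1:5)

restricted   <- lm(r ~ r_m)         # 'normal' returns only
unrestricted <- lm(r ~ r_m + E)     # adds the five event dummies
anova(restricted, unrestricted)     # joint test of the five restrictions
```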
Treynor Index (Treynor, 1966)
The Treynor index is defined as

$$T = \frac{\bar{r} - \bar{r}_f}{\beta},$$

where β is the beta-risk of the portfolio. Like the Sharpe ratio, this measure gives excess returns per unit of risk, but it uses beta-risk as the denominator rather than total portfolio risk.

Jensen's Alpha (Jensen, 1968)
Jensen's alpha is obtained from the CAPM regression as

α = E[ri,t − rf,t] − β E[rm,t − rf,t].

To illustrate the general ideas involved in measuring portfolio performance, a data set comprising monthly returns to 10 industry portfolios was downloaded from Ken French's webpage at Dartmouth (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html), together with benchmark monthly returns to the market and the monthly return on a risk-free rate of interest. The industry portfolios are: consumer nondurables (nondur), consumer durables (dur), manufacturing (man), energy (energy), technology (hitec), telecommunications (telcom), wholesale and retail (shops), healthcare (health), utilities (utils) and a catch-all that includes mining, construction, entertainment and finance (other). The return on the market is constructed as the value-weighted return of all CRSP firms incorporated in the United States and listed on the NYSE, AMEX, or NASDAQ, and the risk-free rate is the 1-month U.S. Treasury Bill rate (for more details see Appendix A).

Table 2.3 reports summary statistics for the portfolio returns as well as the market and risk-free variables. Table 2.4 tabulates the Sharpe ratio, the Treynor index and Jensen's alpha for the 10 industry portfolios, together with the beta coefficient obtained from estimating the CAPM equation. Consumer durables, manufacturing, technology and the sectors summarised in 'other' are the aggressive portfolios with β > 1. The retail, wholesale and service shop industry provides the sector portfolio that comes closest to being a tracking portfolio, with β = 0.96. All the other industry portfolios are relatively conservative with 0 < β < 1. As expected, none of the industry portfolios provides a hedge against systematic risk.

Table 2.3 Summary statistics for monthly returns data on the market portfolio, risk-free rate of interest and 10 United States industry portfolios for the period January 1927 to December 2008 (T = 984). Data are downloaded from Ken French's data library.

Variable   Mean     Std. Dev.   Skewness   Kurtosis
emkt       0.5895   5.4545       0.1886    10.5619
rf         0.3046   0.2522       1.0146     1.0146
nondur     0.9489   4.7127      −0.0323     8.7132
dur        1.0001   7.6647       1.0988    18.1815
man        0.9810   6.3799       0.9177    15.3365
energy     1.0625   6.0306       0.2118     6.1139
hitec      1.0505   7.4844       0.2807     8.8840
telcom     0.8026   4.6422       0.0109     6.2314
shops      0.9584   5.9160      −0.0313     8.3867
health     1.0628   5.7923       0.1684    10.0623
utils      0.8694   5.7101       0.0881    10.4817
other      0.8762   6.5295       0.9197    16.4520
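Since the Treynor index and Jensen's alpha both depend on the CAPM beta, all three measures reported in Table 2.4 can be recovered from a single least squares regression per portfolio. A sketch in R, with hypothetical return vectors rp (portfolio), rm (market) and rf (risk-free):

    performance <- function(rp, rm, rf) {
      ex_p <- rp - rf
      ex_m <- rm - rf
      fit  <- lm(ex_p ~ ex_m)                # CAPM regression
      beta <- unname(coef(fit)[2])
      c(sharpe  = mean(ex_p) / sd(ex_p),     # total-risk adjustment
        treynor = mean(ex_p) / beta,         # beta-risk adjustment
        beta    = beta,
        alpha   = unname(coef(fit)[1]))      # Jensen's alpha (the intercept)
    }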
Table 2.4 Measures of portfolio performance for monthly returns data on 10 United States industry portfolios for the period January 1927 to December 2008 (T = 984). Data are downloaded from Ken French's data library.

Variable   Sharpe Ratio   Treynor Index   Beta    Jensen's Alpha   Rank (Sharpe)   Rank (Treynor)   Rank (Alpha)
nondur     0.137          0.845           0.762    0.195           1               3                3
dur        0.091          0.568           1.225   −0.027           8               9                9
man        0.106          0.601           1.126    0.013           6               7                7
energy     0.126          0.892           0.850    0.257           3               1                1
hitec      0.010          0.597           1.249    0.010           10              8                8
telcom     0.107          0.768           0.649    0.116           5               4                4
shops      0.111          0.681           0.960    0.088           4               6                6
health     0.131          0.884           0.858    0.252           2               2                2
utils      0.099          0.707           0.799    0.094           7               5                5
other      0.088          0.510           1.120   −0.089           9               10               10

The correct treatment of risk in evaluating portfolio models has been the subject of much research. While it is well understood that adjusting the portfolio for risk is important, the exact nature of this adjustment is more problematic. The results in Table 2.4 highlight a feature that is commonly encountered in practical performance evaluation, namely, that the Sharpe and Treynor measures rank performance differently. This is not surprising, because the Sharpe ratio accounts for total portfolio risk, while the Treynor measure adjusts excess portfolio returns for systematic risk only. The similarity between the rankings provided by the Treynor index and Jensen's alpha is also to be expected, given that the alpha measure is derived from a CAPM regression which explicitly accounts for systematic risk via the inclusion of the market factor. On the other hand, the precision of the alpha measure is questionable in these regressions, a point that is returned to a little later. All of the rankings are consistent in one respect, namely that a positive alpha is a necessary condition for good performance, and hence alpha is probably the most commonly used measure. Table 2.4 confirms that the consumer durables and other industry portfolios are the only ones to return a negative alpha, and they are uniformly ranked as poor performers by all metrics.

The importance of the alpha of a portfolio has led to a substantial literature that extends the basic CAPM to account for risk factors over and above the market risk factor. If these factors can be reliably identified, then the exposure of a portfolio to them can be included in its expected return, so that the true excess return, or alpha, is identified. Fama and French (1992, 1993) augment the CAPM by including two additional factors that measure the performance of small stocks relative to big stocks (SMB) and the performance of value stocks relative to growth stocks (HML). The inclusion of the SMB or 'size' factor is usually justified by arguing that small firms have greater sensitivity to economic conditions than large firms and embody greater informational asymmetry. The motivation for HML is that a high book value relative to market value implies a greater probability of financial distress and bankruptcy. The combined model is commonly referred to as the Fama-French three-factor model. Carhart (1997) suggested that a fourth factor be included in the extended CAPM following the work of Jegadeesh and Titman (1993), who found that a portfolio formed by buying stocks that had high returns over the past three to twelve months, and selling stocks that had poor returns over the same period, earned a higher return than that predicted by the three-factor model.
This factor is known as the momentum factor, MOMt, and its inclusion in the extended CAPM is usually justified by appealing to behavioural aspects of investors such as herding and over- or under-reaction to news.

[Figure 2.4 Monthly data for the market, size, value and momentum factors of the extended CAPM model for the period January 1927 to December 2012 (four panels: market factor, size factor, value factor, momentum factor).]

Figure 2.4 plots the evolution of the four factors of the extended CAPM model. The linear regression equation to be estimated in order to implement the extended model is

ri,t − rf,t = α + β1 (rm,t − rf,t) + β2 SMBt + β3 HMLt + β4 MOMt + ut ,   (2.57)

where ut is a disturbance term. The contributions of SMB, HML and MOM are determined by the parameters β2, β3 and β4, respectively. In the special case where these additional factors do not explain movements in the excess return on the asset, ri,t − rf,t, so that β2 = β3 = β4 = 0, equation (2.57) reduces to the standard CAPM regression equation in (2.19).
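A sketch of how equation (2.57) might be estimated in R for a single industry portfolio. The data frame ff and its column names (exr for the excess portfolio return, and emkt, smb, hml and mom for the factors) are assumptions; the factor series themselves are available from Ken French's data library.

    # ff: hypothetical data frame with columns exr, emkt, smb, hml, mom
    four_factor <- lm(exr ~ emkt + smb + hml + mom, data = ff)
    summary(four_factor)            # the intercept is the four-factor alpha

    # When beta2 = beta3 = beta4 = 0 the model collapses to the CAPM:
    capm <- lm(exr ~ emkt, data = ff)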
Table 2.5 reports the results of estimating this model for the 10 United States industry portfolios, with statistical significance marked by asterisks for ease of interpretation.

Table 2.5 The four-factor CAPM model, equation (2.57), estimated using monthly returns data on 10 United States industry portfolios for the period January 1927 to December 2008 (T = 984). Data are downloaded from Ken French's data library.

Variable   Constant (α)   emkt (β1)    smb (β2)      hml (β3)      mom (β4)
nondur      0.1659*       0.7693***    −0.0246        0.0318        0.0229
dur         0.0344        1.1663***     0.0122        0.1566***    −0.1205***
man        −0.0210        1.1034***    −0.0030        0.1385***    −0.0116
energy      0.0836        0.8859***    −0.2042***     0.2719***     0.1157***
hitec       0.2026*       1.2564***     0.0825**     −0.3592***    −0.0910***
telcom      0.2513*       0.6669***    −0.1373***    −0.1141***    −0.0870***
shops       0.1796*       0.9476***     0.0787**     −0.1435***    −0.0575**
health      0.3180**      0.9025***    −0.0896**     −0.1810***     0.0044
utils       0.0227        0.7835***    −0.1540***     0.3090***    −0.0122
other      −0.1319*       1.0380***     0.0662***     0.3328***    −0.0775***

* p < 0.05, ** p < 0.01, *** p < 0.001.

There are a number of interesting features in the results reported in Table 2.5. The strength of the market factor in driving the returns to the portfolios is striking, with every industry portfolio β1 significant at the 0.1% level. There is also strong evidence that the factors other than the market factor are important explanatory variables in the extended CAPM equation, although the results are not as uniform across the 10 portfolios. Not only does statistical significance vary, but there are also changes in sign, indicating that different industries have vastly differing exposures to these factors. Perhaps the most interesting result is the effect of the additional factors on Jensen's alpha. The statistical significance of α is not nearly as strong as might be expected: four of the industry portfolios have statistically insignificant estimates of α, while the catch-all sector 'other' has a negative and significant estimate. The biggest loser in this extended analysis is the energy sector. Energy was ranked first in Table 2.4 on both the Treynor and Jensen measures, but its estimate of α here is statistically insignificant. Health and telecommunications appear to come out of the extended CAPM with the highest measures of excess return.

2.10 Exercises

(1) Minimum Variance Portfolios

capm.wf1, capm.dta, capm.xlsx

Consider the equity prices of the United States companies Microsoft and Walmart for the period April 1990 to July 2004 (T = 172).

(a) Compute the continuously compounded returns on Microsoft and Walmart.
(b) Compute the variance-covariance matrix of the returns on these two stocks. Verify that the covariance matrix of the returns is

$$\begin{pmatrix} 0.011332 & 0.002380 \\ 0.002380 & 0.005759 \end{pmatrix},$$

where the diagonal elements are the variances of the individual asset returns and the off-diagonal elements are the covariances. Note that the off-diagonal elements are identical because the covariance matrix is symmetric.
(c) Use the expressions in (2.6) and (2.7) to verify that the minimum variance portfolio weights for these two assets are

$$w_1 = \frac{\sigma_2^2 - \sigma_{1,2}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{1,2}} = \frac{0.005759 - 0.002380}{0.011332 + 0.005759 - 2 \times 0.002380} = 0.274, \qquad w_2 = 1 - w_1 = 0.726.$$

(d) Using the weights computed in part (c), compute the return on the portfolio as well as its mean and variance (without any degrees of freedom adjustment).
(e) Estimate the regression equation

rWmart,t = β0 + β1 (rWmart,t − rMsoft,t) + ut ,

where ut is a disturbance term.
(i) Interpret the estimate of β1 and discuss how it is related to the optimal portfolio weights computed in part (c).
(ii) Interpret the estimate of β0.
(iii) Compute the least squares residuals ût and interpret this quantity in the context of the minimum variance portfolio problem.
(iv) Compute the variance of the least squares residuals, without any degrees of freedom adjustment, and interpret the result.
(f) Using the results in part (e):
(i) Construct a test of an equally weighted portfolio, w1 = w2 = 0.5.
(ii) Construct a test of portfolio diversification.
(g) Repeat parts (a) to (f) for Exxon and GE.
(h) Repeat parts (a) to (f) for gold and IBM.

(2) Estimating the CAPM

capm.wf1, capm.dta, capm.xlsx

(a) Compute the monthly excess returns on Exxon, General Electric, gold, IBM, Microsoft and Walmart. Be particularly careful when computing the correct risk-free rate to use. [Hint: the variable TBILL is quoted as an annual rate.]
(b) Estimate the CAPM in (2.19) for each asset and interpret the estimated beta-risk.
(c) For each asset, test the restriction β1 = 0. Assuming that this restriction holds, what is the relationship between the CAPM and the constant mean model in (2.13)?
(d) For each asset, test the restriction β1 = 1. Assuming that this restriction holds, what is the relationship between the CAPM and the market model in (2.16)?
(e) For each asset, test the restriction β0 = 0. Provide an interpretation of the CAPM if this restriction is valid.

(3) Fama-French Three-Factor Model

fama french.wf1, fama french.dta, fama french.xlsx

(a) For each of the 25 portfolios in the data set, estimate the CAPM and interpret the beta-risk.
(b) Estimate the Fama-French three-factor model for each portfolio, interpret the estimate of the beta-risk and compare it with the estimate obtained in part (a).
(c) Perform a joint test of the size (SMB) and value (HML) risk factors in explaining excess returns in each portfolio; a sketch of one way to do this follows.
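For part (c), one way of performing the joint test is to compare the restricted (CAPM) and unrestricted (three-factor) regressions with an F-test; a base-R sketch using the same hypothetical column names as in the earlier four-factor example:

    unrestricted <- lm(exr ~ emkt + smb + hml, data = ff)
    restricted   <- lm(exr ~ emkt, data = ff)
    anova(restricted, unrestricted)   # F-test of H0: coefficients on smb and hml are zero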
(4) Present Value Model

pv.wf1, pv.dta, pv.xlsx

The present value model for the price in terms of dividends is represented by the regression model

pt = β0 + β1 dt + ut ,

where ut is a disturbance term and lowercase letters denote logarithms.

(a) Estimate the model and interpret the parameter estimates.
(b) Examine the properties of the model by:
(i) Plotting the OLS residuals.
(ii) Testing for autocorrelation.
(iii) Testing for heteroskedasticity.
(iv) Testing for nonnormality.
(c) Test the restriction β1 = 1 and interpret the result. In particular, interpret the estimate of β0 when β1 = 1.

(5) International CAPM

icapm.wf1, icapm.dta, icapm.xlsx

(a) Estimate the ICAPM for the NYSE and interpret the parameter estimates.
(b) Examine the properties of the model by:
(i) Plotting the OLS residuals.
(ii) Testing for autocorrelation.
(iii) Testing for heteroskedasticity.
(iv) Testing for nonnormality.
(c) Test the restriction β1 = 1 and interpret the result.
(d) Test the joint restrictions β0 = 0, β1 = 1 and interpret the result.

(6) Fisher Hypothesis

fisher.wf1, fisher.dta, fisher.xlsx

The Fisher hypothesis states that nominal interest rates fully reflect long-run movements in inflation. To test this hypothesis, consider the linear regression model

rt = β0 + β1 πt + ut ,

where πt is the inflation rate and ut is a disturbance term. If the Fisher hypothesis is correct, β1 = 1.

(a) Estimate this model and interpret the parameter estimates.
(b) Test the restriction β1 = 1 and interpret the result. In particular, interpret the estimate of β0 when β1 = 1.

(7) Term Structure of U.S. Zero Coupon Rates

termstructure.wf1, termstructure.dta, termstructure.xlsx

The expectations theory of the term structure of interest rates is represented by a linear relationship between long-term and short-term interest rates

LONGt = β0 + β1 SHORTt + ut ,

where ut is a disturbance term.

(a) Estimate the model where the long rate is the 2-year yield and the short rate is the 1-year yield. Interpret the parameter estimates.
(b) Show that the assumption Et[SHORTt+1] = SHORTt implies that β1 = 1. Test this restriction.
(c) Repeat (a) and (b) where the long rate is chosen, respectively, as the 3-year rate, the 4-year rate and so on up to the 15-year rate.
(d) Suppose that the conditional expectation of the short rate is now given by

Et[SHORTt+j] = φ^j SHORTt ,   j = 1, 2, · · · ,

where φ is an unknown parameter. Show that, for the case where the short and long rates are respectively the 1-year and 2-year yields, the slope parameter is given by

$$\beta_1 = \frac{1 + \phi}{2}.$$

Use the results obtained in part (a) to estimate φ.
(e) Repeat part (d) where the long rate is the 3-year yield and compare the estimate of φ with the estimate obtained in part (d). [Hint: in deriving an expression for φ it is necessary to solve a quadratic equation in terms of β1.]
(f) Suppose that the long-term bond is a consol with n → ∞. Show that the slope parameter in a regression of the consol yield on a constant and the 1-year short rate equals zero for |φ| < 1 in part (d) and unity for |φ| = 1.

(8) Fama-Bliss Regressions

fama bliss.wf1, fama bliss.dta, fama bliss.xlsx

(a) Convert the prices of United States zero coupon bonds into yields using

$$y_{n,t} = -\frac{1}{n} \log\left(\frac{P_{n,t}}{100}\right), \quad n = 1, 2, 3, 4, 5,$$

where Pn,t is the price of an n-year zero coupon bond at time t.
(b) Compute the forward yields as

fn,t = log(Pn−1,t) − log(Pn,t),   n = 2, 3, 4, 5.

(c) Compute the annual holding period returns as

hn,t = log(Pn−1,t) − log(Pn,t−12),   n = 2, 3, 4, 5.

(d) Compute the annual excess returns as

ûn,t = hn,t − y1,t−12,   n = 2, 3, 4, 5.

(e) Fama and Bliss (1987) specify a regression equation in which the excess return is a function of the lagged forward spread in the previous year

ûn,t = β0 + β1 (fn,t−12 − y1,t−12) + ut ,

where ut is a disturbance term. Estimate this equation for maturities n = 2, 3, 4, 5 over the sample period January 1965 to December 2003, and compare the estimates with those reported by Cochrane and Piazzesi (2009), who provide updated estimates of the Fama-Bliss regressions. Fama and Bliss found that the ability to forecast excess returns increased with maturity for horizons of less than 5 years. Discuss this proposition by comparing R² for each estimated regression equation.
(f) An alternative approach, suggested by Cochrane and Piazzesi (2009), specifies the regression equation in terms of all forward rates in the previous year

ûn,t = β0 + β1 y1,t−12 + β2 f2,t−12 + β3 f3,t−12 + β4 f4,t−12 + β5 f5,t−12 + ut ,

where ut is a disturbance term. Estimate this equation for maturities n = 2, 3, 4, 5 over the sample period January 1965 to December 2003, and compare the estimates with those reported by Cochrane and Piazzesi (2009). Discuss the pattern of the slope parameter estimates {β1, β2, β3, β4, β5} in each of the four regression equations. Briefly discuss the advantages of this specification over the Fama-Bliss regression model.

(9) The Retirement of Lee Raymond as the CEO of Exxon

capm.wf1, capm.dta, capm.xlsx

In December 2005, Lee Raymond retired as the CEO of Exxon, receiving the largest retirement package ever recorded of around $400m. How did the markets view the Lee Raymond event?

(a) Estimate the market model for Exxon from January 1970 to September 2005

rt = β0 + β1 rm,t + ut ,

where rt is the log return on Exxon and rm,t is the market return computed from the S&P500. Verify that the result is

rt = 0.009 + 0.651 rm,t + ût ,

where ût is the residual.
(b) Construct the dummy variables

$$D_{2005:10,t} = \begin{cases} 1 & \text{Oct. 2005} \\ 0 & \text{otherwise,} \end{cases}$$

with the dummies for November 2005 through February 2006, D2005:11,t, ..., D2006:2,t, defined analogously.
(c) Re-estimate the market model including the 5 dummy variables constructed in part (b) over the extended sample from January 1970 to February 2006. Verify that the estimated regression equation is

rt = 0.009 + 0.651 rm,t − 0.121 Oct05t + 0.007 Nov05t − 0.041 Dec05t + 0.086 Jan06t − 0.059 Feb06t + ût .

(i) What is the relationship between the parameter estimates of β0 and β1 computed in parts (a) and (c)?
(ii) Do you agree that the total estimated abnormal return on Exxon from October 2005 to February 2006 is

Total abnormal return = −0.121 + 0.007 − 0.041 + 0.086 − 0.059 = −0.128?

(d) An alternative way to compute abnormal returns is to use the estimated model in part (a) and substitute in the values of rm,t for the event window. Given that the monthly returns on the market for this period are

{−0.0179, 0.0346, −0.0009, 0.0251, 0.0004},

recompute the abnormal returns and compare these estimates with those obtained in part (c).
(e) Perform the following tests of abnormal returns, for which a sketch is given below.
(i) There was no abnormal return at the time of retirement in December 2005.
(ii) There were no abnormal returns before retirement.
(iii) There were no abnormal returns after retirement.
(iv) There were no abnormal returns at all.
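As a hint for these tests, the no-abnormal-returns hypotheses amount to zero restrictions on subsets of the dummy coefficients, which can be checked by comparing nested regressions. A base-R sketch with hypothetical column names (r, rm and the five event dummies) in a data frame exxon:

    base  <- lm(r ~ rm, data = exxon)
    event <- lm(r ~ rm + oct05 + nov05 + dec05 + jan06 + feb06, data = exxon)
    anova(base, event)   # test (iv): no abnormal returns at all
    # Tests (i)-(iii): drop only the relevant dummy or dummies from 'event'
    # and compare the resulting pair of nested models in the same way.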
3 Modelling with Stationary Variables

3.1 Introduction

An important feature of the linear regression model discussed in Chapter 2 is that all variables are dated at the same point in time. To allow financial variables to adjust to shocks over time, the linear regression model is extended to allow for a range of dynamics. The first class of dynamic models developed is univariate, whereby a single financial variable is modelled using its own lags as well as lags of other financial variables. Then multivariate specifications are developed in which several financial variables are modelled jointly. An important characteristic of the multivariate class of models investigated in this chapter is that each variable in the system is expressed as a function of its own lags as well as the lags of all of the other variables in the system. This model is known as a vector autoregression (VAR), a model characterised by the important feature that every equation has the same set of explanatory variables.

This feature of a VAR has several advantages. First, estimation is straightforward, being simply the application of ordinary least squares to each equation one at a time. Second, the model provides the basis for causality tests, which can be used to quantify the value of information in determining financial variables. Third, the dynamics of the system can be explored in three related ways: Granger causality tests, impulse response functions and variance decompositions. Fourth, multivariate tests of financial theories can be undertaken, since these theories are shown to impose explicit restrictions on the parameters of a VAR which can be verified empirically. Fifth, the VAR provides a very convenient and flexible forecasting tool for computing predictions of financial variables.

3.2 Stationarity

The models in this chapter, which use standard linear regression techniques, require that the variables involved satisfy a condition known as stationarity. Stationarity, or more correctly its absence, is the subject matter of Chapters 4 and 5. For the present, a simple illustration will convey the main idea. Consider Figures 3.1 and 3.2, which show the daily S&P500 index and the associated log returns, respectively.

[Figure 3.1 Snapshots of the time series of the S&P500 index comprising daily observations for the period January 1957 to December 2012.]

[Figure 3.2 Snapshots of the time series of S&P500 log returns computed from daily observations for the period January 1957 to December 2012.]

Assume that an observer is able to take a snapshot of the two series at different points in time; the first snapshot shows the behaviour of the series for the decade of the 1960s and the second shows their behaviour from 2000 to 2010. It is clear that the behaviour of the index in Figure 3.1 is completely different in these two time periods: what the impartial observer sees in 1960-1970 looks nothing like what happens in 2000-2010. The situation is quite different for the log returns plotted in Figure 3.2. To the naked eye, the behaviour in the two shaded areas is remarkably similar, given that the intervening time span is 30 years.

In both this chapter and the next it will simply be assumed that the series being dealt with exhibit behaviour similar to that in Figure 3.2. This assumption is needed so that past observations can be used to estimate relationships, interpret those relationships and forecast future behaviour by extrapolating from the past. In practice, of course, stationarity must be established using the techniques described in Chapter 4; it is not sufficient merely to assume that the condition is satisfied.
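The visual impression given by the two snapshots can also be checked numerically. A minimal R sketch, assuming a hypothetical data frame sp with a Date column date and daily log returns ret:

    early <- subset(sp, date >= as.Date("1960-01-01") & date < as.Date("1970-01-01"))
    late  <- subset(sp, date >= as.Date("2000-01-01") & date < as.Date("2010-01-01"))

    # For a stationary series the two snapshots should have
    # similar means and standard deviations.
    sapply(list(sixties = early$ret, noughties = late$ret),
           function(x) c(mean = mean(x), sd = sd(x)))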
3.3 Univariate Autoregressive Models

3.3.1 Specification

The simplest specification of a dynamic model of the dependent variable yt is one in which the explanatory variables are its own lags

yt = φ0 + φ1 yt−1 + φ2 yt−2 + · · · + φp yt−p + ut ,   (3.1)

where ut is a disturbance term with zero mean and variance σ², and φ0, φ1, · · · , φp are unknown parameters. This equation shows that the information used to explain movements in yt consists of its own lags, with the longest lag being the pth. This property is formally represented by the conditional expectations operator, which gives the predictor of yt based on information available at time t − 1

Et−1[yt] = φ0 + φ1 yt−1 + φ2 yt−2 + · · · + φp yt−p .   (3.2)

Equation (3.1) is referred to as an autoregressive model with p lags, or simply AR(p). Estimation of the unknown parameters is achieved using ordinary least squares. The parameter estimates can also be used to identify the role of past information by performing tests on the parameters.

3.3.2 Properties

To understand the properties of AR models, consider the AR(1) model

yt = φ0 + φ1 yt−1 + ut ,

where |φ1| < 1. Applying the unconditional expectations operator to both sides gives

E[yt] = E[φ0 + φ1 yt−1 + ut] = φ0 + φ1 E[yt−1].

As E[yt] = E[yt−1], the unconditional mean is

$$E[y_t] = \frac{\phi_0}{1 - \phi_1}.$$

The unconditional variance is defined as γ0 = E[(yt − E[yt])²]. Now

yt − E[yt] = (φ0 + φ1 yt−1 + ut) − (φ0 + φ1 E[yt−1]) = φ1 (yt−1 − E[yt−1]) + ut .

Squaring both sides and taking unconditional expectations gives

E[(yt − E[yt])²] = φ1² E[(yt−1 − E[yt−1])²] + E[ut²] + 2E[(yt−1 − E[yt−1]) ut] = φ1² E[(yt−1 − E[yt−1])²] + E[ut²],

since E[(yt−1 − E[yt−1]) ut] = 0. Moreover, because γ0 = E[(yt − E[yt])²] = E[(yt−1 − E[yt−1])²], it follows that γ0 = φ1² γ0 + σ², which upon rearranging gives

$$\gamma_0 = \frac{\sigma^2}{1 - \phi_1^2}.$$

The first-order autocovariance is

γ1 = E[(yt − E[yt])(yt−1 − E[yt−1])] = E[(φ1 (yt−1 − E[yt−1]) + ut)(yt−1 − E[yt−1])] = φ1 E[(yt−1 − E[yt−1])²] = φ1 γ0 ,

and it follows that the kth autocovariance is

γk = φ1^k γ0 .   (3.3)

It is immediate from this result that the autocorrelation function (ACF) of the AR(1) model is

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \phi_1^k.$$

For 0 < φ1 < 1 the autocorrelation function declines as k increases, so that the effects of previous values on yt gradually diminish. For higher-order AR models the properties of the ACF are in general more complicated.

To compute the ACF, the following sequence of AR models is estimated by ordinary least squares

yt = φ10 + ρ1 yt−1 + ut
yt = φ20 + ρ2 yt−2 + ut
  ⋮
yt = φk0 + ρk yt−k + ut ,

where the estimated ACF is given by {ρ̂1, ρ̂2, · · · , ρ̂k}. The notation adopted for the constant term emphasises that this term will be different in each equation.
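This sequence of regressions translates directly into R. A sketch, with y a hypothetical stationary returns series; the slope estimates can be compared with the automated acf() function:

    acf_by_regression <- function(y, max.lag) {
      n <- length(y)
      sapply(1:max.lag, function(k)
        unname(coef(lm(y[(k + 1):n] ~ y[1:(n - k)]))[2]))  # slope of y_t on y_{t-k}
    }
    # Compare: acf_by_regression(y, 6) versus acf(y, lag.max = 6, plot = FALSE);
    # small differences arise from end effects in finite samples.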
Another measure of the dynamic properties of AR models is the partial autocorrelation function (PACF), which measures the relationship between yt and yt−k with the intermediate lags included in the regression model. The PACF at lag k is denoted φk,k. By implication, the PACF of an AR(p) model is zero for lags greater than p. For example, in the AR(1) model the PACF has a spike at lag 1 and thereafter φk,k = 0 for all k > 1. This is in contrast to the ACF, which in general has non-zero values at higher lags. Note that by construction the ACF and PACF at lag 1 are equal to each other. To compute the PACF, the following sequence of AR models is estimated by ordinary least squares

yt = φ10 + φ11 yt−1 + ut
yt = φ20 + φ21 yt−1 + φ22 yt−2 + ut
yt = φ30 + φ31 yt−1 + φ32 yt−2 + φ33 yt−3 + ut
  ⋮
yt = φk0 + φk1 yt−1 + φk2 yt−2 + · · · + φkk yt−k + ut ,

where the estimated PACF is given by {φ̂1,1, φ̂2,2, · · · , φ̂k,k}.

Consider United States monthly data on real equity returns expressed as a percentage, rpt, from February 1871 to June 2004. The ACF and PACF of the equity returns are computed by means of a sequence of regressions. The ACF for lags 1 to 3 is computed using the following three regressions (standard errors in parentheses):

rpt = 0.247 + 0.285 rpt−1 + v̂t
     (0.099)  (0.024)
rpt = 0.342 + 0.008 rpt−2 + v̂t
     (0.103)  (0.025)
rpt = 0.361 − 0.053 rpt−3 + v̂t .
     (0.103)  (0.025)

The estimated ACF is {ρ̂1 = 0.285, ρ̂2 = 0.008, ρ̂3 = −0.053}. By contrast, the PACF for lags 1 to 3 is computed using the following three regressions (standard errors in parentheses):

rpt = 0.247 + 0.285 rpt−1 + v̂t
     (0.099)  (0.024)
rpt = 0.266 + 0.308 rpt−1 − 0.080 rpt−2 + v̂t
     (0.098)  (0.025)    (0.025)
rpt = 0.274 + 0.305 rpt−1 − 0.070 rpt−2 − 0.035 rpt−3 + v̂t .
     (0.099)  (0.025)    (0.026)    (0.025)

The estimated PACF is {φ̂1,1 = 0.285, φ̂2,2 = −0.080, φ̂3,3 = −0.035}.

The significance of the estimated coefficients in the regressions used to compute the ACF and PACF suggests that a useful starting point for a dynamic model of real equity returns is a simple univariate autoregressive model. The parameter estimates obtained by estimating an AR(6) model by ordinary least squares are as follows (standard errors in parentheses):

rpt = 0.243 + 0.303 rpt−1 − 0.064 rpt−2 − 0.041 rpt−3 + 0.019 rpt−4 + 0.056 rpt−5 + 0.022 rpt−6 + v̂t ,
     (0.099)  (0.025)    (0.026)    (0.026)    (0.026)    (0.026)    (0.025)

in which v̂t is the least squares residual. The first lag is the most important both economically, having the largest point estimate (0.303), and statistically, having the largest t-statistic (0.303/0.025 = 12.12). The second and fifth lags are also statistically important at the 5% level. The insignificance of the parameter estimate on the sixth lag suggests that an AR(5) model may be a more appropriate and parsimonious model of real equity returns.
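A sketch of how the AR(6) regression might be reproduced in R, using embed() to build the matrix of lags; rp is the hypothetical vector of monthly real equity returns in percent:

    Z   <- embed(rp, 7)            # column 1 is rp_t, columns 2-7 are lags 1 to 6
    ar6 <- lm(Z[, 1] ~ Z[, 2:7])   # AR(6) estimated by ordinary least squares
    summary(ar6)

    ar5 <- lm(Z[, 1] ~ Z[, 2:6])   # AR(5) on the same estimation sample
    anova(ar5, ar6)                # tests the significance of the sixth lag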
3.3.3 Mean Aversion and Reversion in Returns

There is evidence that returns on assets exhibit positive autocorrelation at shorter horizons and negative autocorrelation at longer horizons. Positive autocorrelation represents mean aversion, since a positive shock to returns in one period results in a further increase in returns in the next period, whereas mean reversion arises when negative autocorrelation causes a positive shock to returns to be followed by a decrease in returns in the next period. An interesting illustration of mean aversion and reversion in autocorrelations is provided by the NASDAQ share index. Using monthly, quarterly and annual frequencies for the period 1989 to 2009, the following results are obtained from estimating a simple AR(1) model (standard errors in parentheses):

Monthly:   rt = 0.599 + 0.131 rt−1 + et
                (0.438)  (0.063)
Quarterly: rt = 1.950 + 0.058 rt−1 + et
                (1.520)  (0.111)
Annual:    rt = 8.974 − 0.131 rt−1 + et .
                (7.363)  (0.238)

There appears to be mean aversion in returns at horizons of less than a year, as the first-order autocorrelation is positive for monthly and quarterly returns. By contrast, there is mean reversion at horizons of a year or more, the first-order autocorrelation now being negative, with a value of −0.131 for annual returns.

To understand the change in the autocorrelation properties of returns over different horizons, consider the following model of prices, Pt, in terms of fundamentals, Ft,

pt = ft + ut ,   ut ∼ iid N(0, σu²)
ft = ft−1 + vt ,   vt ∼ iid N(0, σv²),

where lowercase letters denote logarithms and vt and ut are disturbance terms assumed to be independent of each other. Note that ut represents transient deviations of the actual price from the fundamental price. The 1-period return is

rt = pt − pt−1 = vt + ut − ut−1 ,

and the h-period return is

rt(h) = pt − pt−h = rt + rt−1 + · · · + rt−h+1
      = (vt + ut − ut−1) + (vt−1 + ut−1 − ut−2) + · · · + (vt−h+1 + ut−h+1 − ut−h)
      = vt + vt−1 + · · · + vt−h+1 + ut − ut−h .

The autocovariance of h-period returns is

γh = E[(pt − pt−h)(pt−h − pt−2h)]
   = E[(vt + vt−1 + · · · + vt−h+1 + ut − ut−h)(vt−h + vt−h−1 + · · · + vt−2h+1 + ut−h − ut−2h)]
   = E[ut ut−h] − E[ut ut−2h] − E[u²t−h] + E[ut−h ut−2h]
   = 2E[ut ut−h] − E[ut ut−2h] − E[u²t−h],

where the last step uses the stationarity of ut. For h = 0 the h-period return is trivially zero, so γ0 = 0. As ut is stationary by assumption, at longer horizons E[ut ut−h] and E[ut ut−2h] both approach zero, and

lim_{h→∞} γh = −E[u²t−h],

implying that the autocovariance must eventually become negative. At intermediate horizons, however, this expression can be positive, thereby implying mean aversion in intermediate-horizon returns.

3.4 Univariate Moving Average Models

3.4.1 Specification

An alternative way to introduce dynamics into univariate models is to allow the lags of the dependent variable yt to be determined implicitly via the disturbance term ut. The specification of the model is

yt = ψ0 + ut ,   (3.4)

with ut specified as

ut = vt + ψ1 vt−1 + ψ2 vt−2 + · · · + ψq vt−q ,   (3.5)

where vt is a disturbance term with zero mean and constant variance σv², and ψ0, ψ1, · · · , ψq are unknown parameters. As ut is a weighted sum of current and past disturbances, this model is referred to as a moving average model with q lags, or simply MA(q). Estimation of the unknown parameters is more involved for this class of models than for autoregressive models, as it requires a nonlinear least squares algorithm.

3.4.2 Properties

To understand the properties of MA models, consider the MA(1) model

yt = ψ0 + vt + ψ1 vt−1 ,   (3.6)

where |ψ1| < 1. Applying the unconditional expectations operator to both sides gives the unconditional mean

E[yt] = E[ψ0 + vt + ψ1 vt−1] = ψ0 + E[vt] + ψ1 E[vt−1] = ψ0 .

The unconditional variance is

γ0 = E[(yt − E[yt])²] = E[(vt + ψ1 vt−1)²] = σv² (1 + ψ1²).

The first-order autocovariance is

γ1 = E[(yt − E[yt])(yt−1 − E[yt−1])] = E[(vt + ψ1 vt−1)(vt−1 + ψ1 vt−2)] = ψ1 σv² ,

whilst the autocovariances for k > 1 are γk = 0. The ACF of an MA(1) model is therefore summarised as

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \begin{cases} \dfrac{\psi_1}{1 + \psi_1^2} & k = 1 \\[2pt] 0 & \text{otherwise.} \end{cases} \tag{3.7}$$
This result is in contrast to the ACF of the AR(1) model: the MA(1) ACF has a single spike at lag 1. As this spike corresponds to the lag length of the model, it follows that the ACF of an MA(q) model has non-zero values at the first q lags and is zero thereafter.

To understand the PACF properties of the MA(1) model, consider rewriting (3.6) using the lag operator

yt = ψ0 + (1 + ψ1 L) vt ,

where L vt = vt−1. As |ψ1| < 1, this equation can be rearranged by multiplying both sides by (1 + ψ1 L)⁻¹

(1 + ψ1 L)⁻¹ yt = (1 + ψ1 L)⁻¹ ψ0 + vt
(1 − ψ1 L + ψ1² L² − · · ·) yt = (1 + ψ1 L)⁻¹ ψ0 + vt .

As this is an infinite-order AR model, the PACF is non-zero at higher-order lags, in contrast to the AR(p) model, which has non-zero partial autocorrelations only up to and including lag p.

3.4.3 Bid-Ask Bounce

Market-makers provide liquidity in asset markets: they are prepared to post prices and respond to the demands of buyers and sellers. Market-makers buy at the bid price, bid, and sell at the ask price, ask, with the difference between the two, the bid-ask spread s = ask − bid, representing their profit. The price pt is assumed to behave according to

pt = f + (s/2) It ,

where f is the fundamental price, assumed constant, and It is a binary indicator variable that pushes the price of the asset upwards (downwards) if there is a buyer (seller)

$$I_t = \begin{cases} +1 & \text{with probability } 0.5 \text{ (buyer)} \\ -1 & \text{with probability } 0.5 \text{ (seller).} \end{cases}$$

The change in the price exhibits negative first-order autocorrelation

corr(Δpt, Δpt−1) = −1/2,   corr(Δpt, Δpt−k) = 0,   k > 1.

Since the autocorrelation function has a single spike at lag 1, this process is equivalent to a first-order moving average process.

3.5 Autoregressive-Moving Average Models

The autoregressive and moving average models are now combined to yield the autoregressive-moving average model

yt = φ0 + φ1 yt−1 + φ2 yt−2 + · · · + φp yt−p + ut
ut = vt + ψ1 vt−1 + ψ2 vt−2 + · · · + ψq vt−q ,

where vt is a disturbance term with zero mean and constant variance σv². This model is denoted ARMA(p,q). As with the MA model, the ARMA model requires a nonlinear least squares procedure to estimate the unknown parameters.

3.6 Regression Models

A property of the regression models discussed in the previous chapter is that the dependent and explanatory variables all occur at time t. To allow for dynamics, the autoregressive and moving average specifications discussed above can be used. Some ways in which dynamics can be incorporated are as follows (a code sketch is given at the end of this section).

(1) Including autoregressive disturbance terms:
yt = β0 + β1 xt + ut ,   ut = ρ1 ut−1 + vt .
(2) Including moving average disturbance terms:
yt = β0 + β1 xt + ut ,   ut = vt + θ1 vt−1 .
(3) Including lagged dependent variables:
yt = β0 + β1 xt + λ yt−1 + ut .
(4) Including lagged explanatory variables:
yt = β0 + β1 xt + γ1 xt−1 + γ2 xt−2 + β2 zt−1 + ut .
(5) Joint specification:
yt = β0 + β1 xt + λ1 yt−1 + γ1 xt−1 + γ2 xt−2 + β2 zt−1 + ut ,   ut = ρ1 ut−1 + vt + θ1 vt−1 .

A natural specification of dynamics in the linear regression model arises in models of forward market efficiency. Lags are needed there for two reasons. First, the forward rate acts as a predictor of future spot rates. Second, if the data are overlapping, whereby the maturity of the forward rate is longer than the frequency of observation, the disturbance term will have a moving average structure. This point is taken up in Exercise 6.
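As a sketch of specifications (3) and (4), lagged variables are simply additional regressors once the series are shifted; y and x below are hypothetical series:

    lag1 <- function(z) c(NA, z[-length(z)])   # one-period lag

    # y_t = b0 + b1 x_t + lambda y_{t-1} + g1 x_{t-1} + u_t
    dyn <- lm(y ~ x + lag1(y) + lag1(x))       # lm() drops the leading NA row
    summary(dyn)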
An important reason for including dynamics in a regression model is to correct for potential misspecification problems that arise from incorrectly excluding explanatory variables. In Chapter 2, misspecification of this type is detected using the LM autocorrelation test applied to the residuals of the estimated regression model.

3.7 Vector Autoregressive Models

Once a decision is made to move to a multivariate setting, it becomes difficult to single out one variable as the 'dependent' variable to be explained in terms of all the others. It may be that all the variables are in fact jointly determined.

3.7.1 Specification and Estimation

This problem was first investigated by Sims (1980) using United States data on the nominal interest rate, money, prices and output. He suggested that a useful starting point is to treat all variables as determined jointly by a system of equations, with one equation for each of the variables under consideration. The most important distinguishing feature of the system, however, is that each equation has exactly the same set of explanatory variables. This type of model is known as a vector autoregressive model (VAR). An example of a bivariate VAR(p) is

$$y_{1,t} = \phi_{10} + \sum_{i=1}^{p} \phi_{11,i}\, y_{1,t-i} + \sum_{i=1}^{p} \phi_{12,i}\, y_{2,t-i} + u_{1,t} \tag{3.8}$$

$$y_{2,t} = \phi_{20} + \sum_{i=1}^{p} \phi_{21,i}\, y_{1,t-i} + \sum_{i=1}^{p} \phi_{22,i}\, y_{2,t-i} + u_{2,t}, \tag{3.9}$$

where y1,t and y2,t are the dependent variables, p is the lag length, which is the same for all equations, and u1,t and u2,t are disturbance terms. Interestingly, despite being a multivariate system of equations with lagged values of each variable potentially influencing all the others, estimation of a VAR is performed by simply applying ordinary least squares to each equation, one at a time. This is appropriate precisely because the set of explanatory variables is the same in every equation.

Higher-dimensional VARs containing k variables {y1,t, y2,t, · · · , yk,t} are specified and estimated in the same way. For example, in the trivariate case with k = 3 the VAR is

$$y_{1,t} = \phi_{10} + \sum_{i=1}^{p} \phi_{11,i} y_{1,t-i} + \sum_{i=1}^{p} \phi_{12,i} y_{2,t-i} + \sum_{i=1}^{p} \phi_{13,i} y_{3,t-i} + u_{1,t}$$

$$y_{2,t} = \phi_{20} + \sum_{i=1}^{p} \phi_{21,i} y_{1,t-i} + \sum_{i=1}^{p} \phi_{22,i} y_{2,t-i} + \sum_{i=1}^{p} \phi_{23,i} y_{3,t-i} + u_{2,t} \tag{3.10}$$

$$y_{3,t} = \phi_{30} + \sum_{i=1}^{p} \phi_{31,i} y_{1,t-i} + \sum_{i=1}^{p} \phi_{32,i} y_{2,t-i} + \sum_{i=1}^{p} \phi_{33,i} y_{3,t-i} + u_{3,t}.$$

Estimation of the first equation involves regressing y1,t on a constant and all of the lagged variables. This is repeated for the second equation, where y2,t is the dependent variable, and for the third equation, where y3,t is the dependent variable. In matrix notation the VAR is conveniently represented as

$$y_t = \Phi_0 + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + u_t, \tag{3.11}$$

where the parameters are given by

$$\Phi_0 = \begin{pmatrix} \phi_{10} \\ \phi_{20} \\ \vdots \\ \phi_{k0} \end{pmatrix}, \qquad \Phi_i = \begin{pmatrix} \phi_{11,i} & \phi_{12,i} & \cdots & \phi_{1k,i} \\ \phi_{21,i} & \phi_{22,i} & \cdots & \phi_{2k,i} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{k1,i} & \phi_{k2,i} & \cdots & \phi_{kk,i} \end{pmatrix}.$$

The disturbances ut = {u1,t, u2,t, ..., uk,t} have zero mean and covariance matrix

$$\Omega = \begin{pmatrix} \mathrm{var}(u_{1,t}) & \mathrm{cov}(u_{1,t}, u_{2,t}) & \cdots & \mathrm{cov}(u_{1,t}, u_{k,t}) \\ \mathrm{cov}(u_{2,t}, u_{1,t}) & \mathrm{var}(u_{2,t}) & \cdots & \mathrm{cov}(u_{2,t}, u_{k,t}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(u_{k,t}, u_{1,t}) & \mathrm{cov}(u_{k,t}, u_{2,t}) & \cdots & \mathrm{var}(u_{k,t}) \end{pmatrix}. \tag{3.12}$$

This matrix has two properties. First, it is symmetric, so that the upper triangular part of the matrix is the mirror image of the lower triangular part

cov(ui,t, uj,t) = cov(uj,t, ui,t),   i ≠ j.
Second, the disturbance terms in each equation are allowed to be correlated with the disturbances of the other equations

cov(ui,t, uj,t) ≠ 0,   i ≠ j.

This last property is important when undertaking impulse response analysis and computing variance decompositions, topics which are addressed at a later stage.

Now consider extending the AR(6) model of real equity returns to include lagged real dividend returns, rdt, as possible explanatory variables. This seems a reasonable course of action given that the present value model establishes a theoretical link between equity prices and dividends. Setting the lag length p equal to six yields the following estimated equation (standard errors in parentheses):

ret = 0.254 + 0.296 ret−1 − 0.064 ret−2 − 0.040 ret−3 + 0.021 ret−4 + 0.053 ret−5 + 0.013 ret−6
     (0.102)  (0.025)    (0.026)    (0.026)    (0.026)    (0.026)    (0.025)
    − 0.019 rdt−1 + 0.504 rdt−2 − 0.296 rdt−3 + 0.395 rdt−4 − 0.259 rdt−5 − 0.350 rdt−6 + ût ,
     (0.193)     (0.262)     (0.258)     (0.257)     (0.263)     (0.191)

where ût is the least squares residual. Equally important, however, is a model to explain real dividend returns, and a natural specification is to include as explanatory variables both own lags and lags of real equity returns. Using the same data, an AR(6) model of rdt augmented with lagged values of ret is estimated by ordinary least squares, with the following results:

rdt = 0.016 + 0.001 ret−1 + 0.008 ret−2 + 0.007 ret−3 + 0.001 ret−4 + 0.012 ret−5 + 0.014 ret−6
     (0.013)  (0.003)    (0.003)    (0.003)    (0.003)    (0.003)    (0.003)
    + 0.918 rdt−1 + 0.015 rdt−2 − 0.282 rdt−3 + 0.250 rdt−4 + 0.015 rdt−5 − 0.030 rdt−6 + ût .
     (0.025)     (0.034)     (0.033)     (0.033)     (0.034)     (0.025)

The parameter estimates on real equity returns at lags 2, 3, 5 and 6 are all statistically significant. A joint test of the parameters on the lags of ret yields a chi-square statistic of 60.395 with a p-value of 0.000, showing that the restrictions are easily rejected and that lagged values of ret are important in explaining the behaviour of rdt.

Treating both real equity returns, ret, and real dividend returns, rdt, as potentially endogenous, a VAR(6) model is estimated for monthly United States data from 1871 to 2004. The parameter estimates (with standard errors in parentheses) are given in Table 3.1. A comparison of the point estimates of the VAR(6) with those of the univariate models of equity and dividend returns given previously shows that the estimates are indeed the same.

Table 3.1 Parameter estimates of a bivariate VAR(6) model for United States monthly real equity returns and real dividend returns for the period 1871 to 2004 (standard errors in parentheses).

             Equity returns equation       Dividend returns equation
Lag          ret−i        rdt−i            ret−i        rdt−i
1            0.296        −0.019           0.001        0.918
             (0.025)      (0.193)          (0.003)      (0.025)
2            −0.064       0.504            0.008        0.015
             (0.026)      (0.262)          (0.003)      (0.034)
3            −0.040       −0.296           0.007        −0.282
             (0.026)      (0.258)          (0.003)      (0.033)
4            0.021        0.395            0.001        0.250
             (0.026)      (0.257)          (0.003)      (0.033)
5            0.053        −0.259           0.012        0.015
             (0.026)      (0.263)          (0.003)      (0.034)
6            0.013        −0.350           0.014        −0.030
             (0.025)      (0.191)          (0.003)      (0.025)
Constant     0.254                         0.016
             (0.102)                       (0.013)
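Because both equations share the same regressors, the VAR(6) in Table 3.1 can be estimated with one lm() call per equation. A sketch with hypothetical vectors re and rd holding the two monthly returns series:

    p <- 6
    Z <- embed(cbind(re, rd), p + 1)  # cols 1-2: current re, rd; cols 3-14: lags 1-6 of both
    X <- Z[, -(1:2)]                  # identical regressor set for every equation
    eq_re <- lm(Z[, 1] ~ X)           # equity-return equation
    eq_rd <- lm(Z[, 2] ~ X)           # dividend-return equation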
3.7.2 Lag Length Selection

An important part of the specification of a VAR is the choice of the lag length p. If the lag length is too short, important parts of the dynamics are omitted from the model. If the lag length is too long, there are redundant lags which reduce the precision of the parameter estimates, inflating the standard errors and yielding t-statistics that are relatively too small. Moreover, in choosing the lag structure of a VAR care needs to be exercised, as degrees of freedom diminish quickly for even moderate lag lengths.

A common data-driven way of selecting the lag order is to use information criteria. An information criterion is a scalar measure that balances the improvement in the fit of the equations against the loss of degrees of freedom which results from increasing the lag order of a time series model. The three most commonly used information criteria for selecting a parsimonious time series model are the Akaike information criterion (AIC) (Akaike, 1974, 1976), the Hannan-Quinn information criterion (HIC) (Hannan and Quinn, 1979; Hannan, 1980) and the Schwarz information criterion (SIC) (Schwarz, 1978). If k is the number of parameters estimated in the model, these information criteria are given by

$$\mathrm{AIC} = \log|\hat{\Omega}| + \frac{2k}{T - p}$$

$$\mathrm{HIC} = \log|\hat{\Omega}| + \frac{2k \log(\log(T - p))}{T - p} \tag{3.13}$$

$$\mathrm{SIC} = \log|\hat{\Omega}| + \frac{k \log(T - p)}{T - p},$$

in which p is the maximum lag order being tested and Ω̂ is the ordinary least squares estimate of the covariance matrix in equation (3.12). In the scalar case, the determinant of the estimated covariance matrix, |Ω̂|, is replaced by the estimated residual variance s².

Choosing an optimal lag order using information criteria involves the following steps.

Step 1: Choose a maximum lag length for the VAR. This choice is informed by the ACFs and PACFs of the data, the frequency with which the data are observed and the sample size.
Step 2: Estimate the model sequentially for all lag orders up to and including the maximum. For each, compute the relevant information criteria.
Step 3: Choose the specification of the model corresponding to the minimum values of the information criteria. In some cases different criteria will disagree, and the final choice is then a matter of judgement.

The bivariate VAR(6) for equity returns and dividend returns in Table 3.1 arbitrarily set p = 6. To verify this choice, the information criteria in (3.13) can be used. The Hannan-Quinn criterion for this VAR at lags 1 to 8 is as follows:

Lag:  1      2      3      4      5      6       7      8
HIC:  7.155  7.148  7.146  7.100  7.084  7.079*  7.086  7.082

The minimum value of the statistic is HIC = 7.079, which corresponds to an optimal lag length of 6. This provides support for the choice of the number of lags used to estimate the VAR.
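A sketch of how the Hannan-Quinn criterion in (3.13) might be computed for the bivariate VAR in R, reusing the per-equation OLS approach above. The small-sample conventions (the divisor in Ω̂ and the parameter count k) differ slightly across packages, so this should be read as one reasonable implementation rather than the book's exact output.

    hq_var <- function(y, p) {                   # y: T x 2 matrix, p: lag order
      Z    <- embed(y, p + 1)
      X    <- Z[, -(1:2)]
      U    <- residuals(lm(Z[, 1:2] ~ X))        # residuals of both equations
      Teff <- nrow(Z)
      Omega <- crossprod(U) / Teff               # estimated covariance matrix
      k <- 2 * (1 + 2 * p)                       # constant + 2p slopes per equation
      log(det(Omega)) + 2 * k * log(log(Teff)) / Teff
    }
    sapply(1:8, function(p) hq_var(cbind(re, rd), p))  # choose p at the minimum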
3.7.3 Granger Causality Testing

In a VAR model, all lags are assumed to contribute information about each dependent variable, but in most empirical applications a large number of the estimated coefficients are statistically insignificant. A question of crucial importance is therefore whether at least one of the parameters on the lags of a given explanatory variable in an equation is non-zero. In the bivariate VAR, a test of the information content of y2,t for y1,t in equation (3.8) is given by testing the joint restrictions

φ12,1 = φ12,2 = φ12,3 = · · · = φ12,p = 0.

These restrictions can be tested jointly using a chi-square test. If y2,t is important in predicting future values of y1,t over and above the lags of y1,t alone, then y2,t is said to cause y1,t in Granger's sense (Granger, 1969). It is important to remember, however, that Granger causality is simply a statement about predictability. Evidence of Granger causality, and of its absence, from y2,t to y1,t is denoted, respectively,

y2,t → y1,t   and   y2,t ↛ y1,t .

It is also possible to test for Granger causality in the reverse direction by performing a joint test on the lags of y1,t in the y2,t equation. Combining both sets of causality results can yield a range of statistical causal patterns:

Unidirectional (from y2,t to y1,t):   y2,t → y1,t ,  y1,t ↛ y2,t
Bidirectional (feedback):             y2,t → y1,t ,  y1,t → y2,t
Independence:                         y2,t ↛ y1,t ,  y1,t ↛ y2,t

Table 3.2 gives the results of Granger causality tests based on the chi-square statistic. Both p-values are less than 0.05, showing that there is bidirectional Granger causality between real equity returns (re) and real dividend returns (rd). Note that the result for rd ↛ re reported in Table 3.2 may easily be verified from the univariate model in which real equity returns are a function of lags 1 to 6 of ret and rdt: a joint test of the information value of real dividend returns gives the chi-square statistic χ² = 20.288 with 6 degrees of freedom and a p-value of 0.0025, so real dividend returns are statistically important in explaining real equity returns at the 5% level. This is in complete agreement with the results of the Granger causality tests concerning the information content of dividends.

Table 3.2 Results of Granger causality tests based on the estimates of a bivariate VAR(6) model for United States monthly real equity returns and real dividend returns for the period 1871 to 2004.

Null Hypothesis   Chi-square   Degrees of Freedom   p-value
rd ↛ re           20.288       6                    0.0025
re ↛ rd           60.395       6                    0.0000
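A base-R sketch of the test rd ↛ re, comparing the unrestricted equity equation with one that omits all lags of rd (hypothetical series re and rd, p = 6); the F statistic reported by anova() is asymptotically equivalent to the chi-square version in Table 3.2.

    Z <- embed(cbind(re, rd), 7)
    re_lags <- Z[, seq(3, 13, by = 2)]   # lags 1-6 of re
    rd_lags <- Z[, seq(4, 14, by = 2)]   # lags 1-6 of rd
    unrestricted <- lm(Z[, 1] ~ re_lags + rd_lags)
    restricted   <- lm(Z[, 1] ~ re_lags)
    anova(restricted, unrestricted)      # joint test that all lags of rd are zero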
3.7.4 Impulse Response Analysis

The Granger causality test provides one method for understanding the overall dynamics of lagged variables. An alternative, but related, approach is to track the effects of shocks on the dependent variables through the model. In this way the full dynamics of the system, and the way the variables interact with each other over time, are displayed. This approach is formally called impulse response analysis.

In performing impulse response analysis, a natural candidate to represent a shock is the disturbance term ut = {u1,t, u2,t, ..., uk,t} of the VAR, as it represents the part of the dependent variables that is not predicted from past information. The problem, though, is that the disturbance terms are correlated, as highlighted by the fact that the covariance matrix in (3.12) in general has non-zero off-diagonal terms. The approach adopted in impulse response analysis is to transform ut into a new set of disturbances whose covariance matrix has zero off-diagonal terms. Formally, the transformed residuals are referred to as orthogonalised shocks, constructed so that u2,t to uk,t have no immediate effect on u1,t, u3,t to uk,t have no immediate effect on u2,t, and so on.

Figure 3.3 gives the impulse responses of the VAR equity-dividend model, with four panels capturing the four sets of impulses. The first column gives the responses of re and rd to a shock in re, whereas the second column shows how re and rd are affected by a shock to rd. A positive shock to re has a damped oscillatory effect on re which quickly dissipates. The effect on rd is initially negative but quickly becomes positive, reaching a peak after 8 months before decaying monotonically. The effect of a positive rd shock on rd dissipates slowly, approaching zero after nearly 30 periods. The immediate effect of this shock on re is zero by construction, after which re hovers near zero, exhibiting a damped oscillatory pattern.

[Figure 3.3 Impulse responses for the VAR(6) model of equity prices and dividends (panels: RE→RE, RD→RE, RE→RD, RD→RD, each plotted against the forecast horizon). Data are monthly for the period January 1871 to June 2004.]

3.7.5 Variance Decomposition

The impulse response analysis provides information on the dynamics of the VAR system of equations and on how each variable responds and interacts with shocks to the other variables in the system. To gain insight into the relative importance of the shocks for movements in the variables, a variance decomposition is performed. In this analysis, movements in each variable over the horizon of the impulse response analysis are decomposed into the separate relative effects of each shock, with the results expressed as a percentage of the overall movement. It is because the impulse responses are expressed in terms of orthogonalised shocks that this decomposition is possible. The variance decomposition of real equity returns (re) and real dividend returns (rd) at selected horizons, based on the bivariate VAR equity-dividend model, is as follows:

         Decomposition of re      Decomposition of rd
Period   re         rd            re         rd
1        100.000    0.000         0.316      99.684
5        98.960     1.040         1.114      98.886
10       98.651     1.348         8.131      91.869
15       98.593     1.406         10.698     89.302
20       98.554     1.445         11.686     88.313
25       98.539     1.460         11.996     88.004
30       98.535     1.465         12.081     87.919

The rd shocks contribute very little to re, with the maximum contribution still less than 2%. In contrast, re shocks contribute more than 10% of the variance of rd after 15 periods. These results suggest that the effects of shocks in re on rd are relatively more important than the reverse.
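In R, the orthogonalised impulse responses and the variance decomposition can be obtained from the contributed vars package (one possibility among several; EViews and Stata provide equivalent routines):

    library(vars)   # contributed package: install.packages("vars")
    var6 <- VAR(data.frame(re = re, rd = rd), p = 6, type = "const")
    plot(irf(var6, n.ahead = 30))   # orthogonalised impulse responses (Figure 3.3)
    fevd(var6, n.ahead = 30)        # forecast error variance decomposition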
3.7.6 Diebold-Yilmaz Spillover Index

An important application of the variance decomposition of a VAR is the spillover index proposed by Diebold and Yilmaz (2009), in which the aim is to compute the total contribution to an asset market of shocks arising in all other markets. Table 3.3 gives the volatility decomposition at a 10-week horizon of the weekly asset returns of 19 countries, based on a VAR with 2 lags and a constant. The sample period begins 4 December 1996 and ends 23 November 2007. The first row of the table gives the contributions to the 10-week forecast variance of US weekly returns of shocks in all 19 asset markets. Excluding own shocks, which equal 93.6%, the total contribution of the other 18 asset markets is given in the last column and equals 1.6 + 1.5 + · · · + 0.3 = 6.4%. Similarly, for the UK, the total contribution of the other 18 asset markets to its forecast variance is 40.3 + 0.7 + · · · + 0.5 = 44.3%. Of the 19 asset markets, the US appears to be the most independent of all international asset markets, having the lowest total contribution from other asset markets at just 6.4%. The next lowest is Turkey with a contribution of 14.2%. Germany's asset market appears to be the most affected by international asset markets, with the contribution of shocks from external markets to its forecast variance equal to 72.4%.

Table 3.3 Diebold-Yilmaz spillover index of global stock market returns, based on a VAR with 2 lags and a constant, with the variance decomposition computed at a 10-week horizon. Rows give the market whose forecast variance is being decomposed; columns give the source of the shocks. The final column (Others) is the row sum excluding own shocks; the final two rows give each market's total contribution to the other markets, excluding and including its own market.

From   US    UK    FRA   GER   HKG   JPN   AUS   IDN   KOR   MYS   PHL   SGP   TAI   THA   ARG   BRA   CHL   MEX   TUR   Others
US     93.6  1.6   1.5   0.0   0.3   0.2   0.1   0.1   0.2   0.3   0.2   0.2   0.3   0.2   0.1   0.1   0.0   0.5   0.3   6.4
UK     40.3  55.7  0.7   0.4   0.1   0.5   0.1   0.2   0.2   0.3   0.2   0.0   0.1   0.1   0.1   0.1   0.0   0.4   0.5   44.3
FRA    38.3  21.7  37.2  0.1   0.0   0.2   0.3   0.3   0.3   0.2   0.2   0.1   0.1   0.3   0.1   0.1   0.1   0.1   0.3   62.8
GER    40.8  15.9  13.0  27.6  0.1   0.1   0.3   0.4   0.6   0.1   0.3   0.3   0.0   0.2   0.0   0.1   0.0   0.1   0.1   72.4
HKG    15.3  8.7   1.7   1.4   69.9  0.3   0.0   0.1   0.0   0.3   0.1   0.0   0.2   0.9   0.3   0.0   0.1   0.3   0.4   30.1
JPN    12.1  3.1   1.8   0.9   2.3   77.7  0.2   0.3   0.3   0.1   0.2   0.3   0.3   0.1   0.1   0.0   0.0   0.1   0.1   22.3
AUS    23.2  6.0   1.3   0.2   6.4   2.3   56.8  0.1   0.4   0.2   0.2   0.2   0.4   0.5   0.1   0.3   0.1   0.6   0.7   43.2
IDN    6.0   1.6   1.2   0.7   6.4   1.6   0.4   77.0  0.7   0.4   0.1   0.9   0.2   1.0   0.7   0.1   0.3   0.1   0.4   23.0
KOR    8.3   2.6   1.3   0.7   5.6   3.7   1.0   1.2   72.8  0.0   0.0   0.1   0.1   1.3   0.2   0.2   0.1   0.1   0.7   27.2
MYS    4.1   2.2   0.6   1.3   10.5  1.5   0.4   6.6   0.5   69.2  0.1   0.1   0.2   1.1   0.1   0.6   0.4   0.2   0.3   30.8
PHL    11.1  1.6   0.3   0.2   8.1   0.4   0.9   7.2   0.1   2.9   62.9  0.3   0.4   1.5   1.6   0.1   0.0   0.1   0.2   37.1
SGP    16.8  4.8   0.6   0.9   18.5  1.3   0.4   3.2   1.6   3.6   1.7   43.1  0.3   1.1   0.8   0.5   0.1   0.3   0.4   56.9
TAI    6.4   1.3   1.2   1.8   5.3   2.8   0.4   0.4   2.0   1.0   1.0   0.9   73.6  0.4   0.8   0.3   0.1   0.3   0.0   26.4
THA    6.3   2.4   1.0   0.7   7.8   0.2   0.8   7.6   4.6   4.0   2.3   2.2   0.3   58.2  0.5   0.2   0.1   0.4   0.3   41.8
ARG    11.9  2.1   1.6   0.1   1.3   0.8   1.3   0.4   0.4   0.6   0.4   0.6   1.1   0.2   75.3  0.1   0.1   1.4   0.3   24.7
BRA    14.1  1.3   1.0   0.7   1.3   1.4   1.6   0.5   0.5   0.7   1.0   0.8   0.1   0.7   7.1   65.8  0.1   0.6   0.7   34.2
CHL    11.8  1.1   1.0   0.0   3.2   0.6   1.4   2.3   0.3   0.3   0.1   0.9   0.3   0.8   2.9   4.0   65.8  2.7   0.4   34.2
MEX    22.2  3.5   1.2   0.4   3.0   0.3   1.2   0.2   0.3   0.9   1.0   0.1   0.3   0.5   5.4   1.6   0.3   56.9  0.6   43.1
TUR    3.0   2.5   0.2   0.7   0.6   0.9   0.6   0.1   0.6   0.3   0.6   0.1   0.9   0.8   0.5   1.1   0.6   0.2   85.8  14.2

To others   291.9 84.1  31.0  11.2  80.8  19.2  11.5  31.4  13.6  16.2  9.9   8.2   5.9   11.8  21.4  9.4   2.6   8.4   6.7   675.0
Incl. own   385.5 139.8 68.2  38.8  150.6 96.9  68.3  108.3 86.4  85.4  72.8  51.2  79.5  70.0  96.7  75.2  68.4  65.4  92.4  Index = 35.5%

Adding up the separate contributions to each asset market in the last column gives the total contribution of non-own shocks across all 19 asset markets

6.4 + 44.3 + · · · + 14.2 = 675.0%.

As the contributions to the total forecast variance are by construction normalised to sum to 100% for each of the 19 asset markets, the percentage contribution of external shocks to the 19 asset markets is given by the spillover index

SPILLOVER = 675.0 / 19 = 35.5%.

This value shows that, on average, approximately one-third of the forecast variance of asset returns is the result of shocks from external asset markets, with the remaining two-thirds arising from internal shocks.

3.8 Exercises

(1) Estimating AR and MA Models

pv.wf1, pv.dta, pv.xlsx

(a) Compute the percentage monthly returns on equities and dividends. Plot the two returns series and interpret their time series patterns.
(b) Estimate an AR(6) model of equity returns. Interpret the parameter estimates.
(c) Estimate an AR(6) model of equity returns but now augment the model with 6 lags of dividend returns.
Perform a test of the information value of dividend returns in understanding equity returns.
(d) Repeat parts (b) and (c) for real dividend returns.
(e) Estimate a MA(3) model of real equity returns.
(f) Estimate a MA(6) model of equity returns.
(g) Perform a test that the parameters on lags 4 to 6 are zero.
(h) Repeat parts (e) to (g) using real dividend returns.

(2) Computing the ACF and PACF
pv.wf1, pv.dta, pv.xlsx
(a) Compute the percentage monthly return on equities and dividends.
(b) Compute the ACF of real equity returns for up to 6 lags. Compare a manual procedure with an automated version provided by econometric software.
(c) Compute the PACF of real equity returns for up to 6 lags. Compare a manual procedure with an automated version provided by econometric software.
(d) Repeat parts (b) and (c) for real dividend returns.

(3) Mean Aversion and Reversion in Stock Returns
int yr.wf1, int yr.dta, int yr.xlsx
int qr.wf1, int qr.dta, int qr.xlsx
int mn.wf1, int mn.dta, int mn.xlsx
(a) Estimate the following regression equation using returns on the NASDAQ (r_t) for each frequency (monthly, quarterly, annual)

r_t = φ_0 + φ_1 r_{t-1} + u_t,

where u_t is a disturbance term. Interpret the results.
(b) Repeat part (a) for the Australian share price index.
(c) Repeat part (a) for the Singapore Straits Times stock index.

(4) Poterba-Summers Pricing Model
Poterba and Summers (1988) assume that the price of an asset, p_t, behaves according to

log p_t = log f_t + u_t
log f_t = log f_{t-1} + v_t
u_t = φ_1 u_{t-1} + w_t,

where f_t is the fundamental price, u_t represents transient price movements, and v_t and w_t are independent disturbance terms with zero means and constant variances, σ_v² and σ_w², respectively.
(a) Show that the k-th order autocorrelation of the one period return r_t = log p_t − log p_{t-1} = v_t + u_t − u_{t-1}, is

ρ_k = σ_w² φ_1^{k-1}(φ_1 − 1) / [σ_v²(1 + φ_1 + 2σ_w²/σ_v²)] < 0.

(b) Show that the first order autocovariance function of the h-period return r_t(h) = log p_t − log p_{t-h} = r_t + r_{t-1} + · · · + r_{t-h+1}, is

γ_h = σ_w²(2φ_1^h − φ_1^{2h} − 1) / (1 − φ_1²) < 0.

(5) Roll Model of Bid-Ask Bounce
spot.wf1, spot.dta, spot.xlsx
Roll (1984) assumes that the price, p_t, of an asset follows

p_t = f + (s/2) I_t,

where f is a constant fundamental price, s is the bid-ask spread and I_t is a binary indicator variable given by

I_t = +1 : with probability 0.5 (buyer)
      −1 : with probability 0.5 (seller).

(a) Derive E[I_t], var(I_t), cov(I_t, I_{t-1}) and corr(I_t, I_{t-1}).
(b) Derive E[∆I_t], var(∆I_t), cov(∆I_t, ∆I_{t-1}) and corr(∆I_t, ∆I_{t-1}).
(c) Show that the autocorrelation function of ∆p_t is

corr(∆p_t, ∆p_{t-1}) = −1/2
corr(∆p_t, ∆p_{t-k}) = 0,   k > 1.

(d) Suppose that the price is now given by

p_t = f_t + (s/2) I_t,

where the fundamental price f_t is now assumed to be random with zero mean and variance σ². Derive the autocorrelation function of ∆p_t.

(6) Forward Market Efficiency
spot.wf1, spot.dta, spot.xlsx
The forward market is efficient if the lagged forward rate is an unbiased predictor of the current spot rate.
(a) Estimate the following model of the spot rate and the lagged 1-month forward rate

S_t = β_0 + β_1 F_{t-4} + u_t,

where the forward rate is lagged four periods because the data are weekly and the forward contract matures in one month. Verify that weekly data on the $/AUD spot exchange rate and the 1-month forward rate yield

S_t = 0.066 + 0.916 F_{t-4} + û_t.
Test the restriction β_1 = 1 and interpret the result.
(b) Compute the ACF and PACF of the least squares residuals, û_t, for the first 8 lags. Verify that the results are as follows.

Lag:    1      2      3      4      5      6      7      8
ACF:    0.80   0.54   0.29   0.07   0.07   0.09   0.13   0.15
PACF:   0.80  -0.28  -0.14  -0.07   0.40  -0.11  -0.04  -0.02

(c) There is evidence to suggest that the ACF decays quickly after 3 lags. Interpret this result and use this information to improve the specification of the model and redo the test of β_1 = 1.
(d) Repeat parts (a) to (c) for the 3-month and the 6-month forward rates.

(7) Microsoft in the Dot-Com Crisis
capm.wf1, capm.dta, capm.xlsx
(a) Compute the monthly excess returns for Microsoft and the market.
(b) Estimate a CAPM augmented by dummy variables to capture the large movements in the Microsoft returns in April 2000, December 2000 and January 2001. Perform a test of autocorrelation on u_t and interpret the result.
(c) Reestimate the CAPM in part (b) augmented by including the first lag of Microsoft excess returns. Perform a test of autocorrelation on u_t and interpret the result.
(d) Briefly discuss other ways that dynamics can be included in the model.

(8) An Equity-Dividend VAR
pv.wf1, pv.dta, pv.xlsx
(a) Compute the percentage monthly return on equities and dividends and estimate a bivariate VAR for these variables with 6 lags.
(b) Test for the optimum choice of lag length using the Hannan-Quinn criterion and specifying a maximum lag length of 12. If required, re-estimate the VAR.
(c) Test for Granger causality between equity returns and dividends and interpret the results.
(d) Compute the impulse responses for 30 periods and interpret the results.
(e) Compute the variance decomposition for 30 periods and interpret the results.

(9) Campbell-Shiller Present Value Model
cam shiller.wf1, cam shiller.dta, cam shiller.xlsx
Let rd_t be real dividend returns (expressed in percentage terms) and let v_t be deviations from the present value relationship between equity prices and dividends computed from the linear regression

p_t = β + αd_t + v_t.

Campbell and Shiller (1987) develop a VAR model for rd_t and v_t given by

[ rd_t ]   [ µ_1 ]   [ φ_{1,1,1}  φ_{1,2,1} ] [ rd_{t-1} ]   [ u_{1,t} ]
[ v_t  ] = [ µ_2 ] + [ φ_{2,1,1}  φ_{2,2,1} ] [ v_{t-1}  ] + [ u_{2,t} ].

(a) Estimate the parameter α by regressing equity prices, STOCK_t, on a constant and dividend payments, DIV_t, and compute the least squares residuals v̂_t.
(b) Estimate a VAR(1) containing the variables rd_t and v̂_t.
(c) Campbell and Shiller show that φ_{2,2,1} = δ^{-1} − αφ_{1,2,1}, where δ represents the discount factor. Use the parameter estimate of α obtained in part (a) and the parameter estimates of φ_{1,2,1} and φ_{2,2,1} obtained in part (b) to estimate δ. Interpret the result.

(10) Causality Between Stock Returns and Output Growth
stock out.wf1, stock out.dta, stock out.xlsx
(a) For the United States, compute the percentage continuous stock returns and output growth rates, respectively.
(b) It is hypothesised that stock returns lead output growth but not the reverse. Test this hypothesis by performing a test for Granger causality between the two series using 1 lag.
(c) Test the robustness of these results by using higher order lags up to a maximum of 4. What do you conclude about the causal relationships between stock returns and output growth in the United States?
(d) Repeat parts (a) to (c) for Japan, Singapore and Taiwan.
(11) Volatility Linkages
diebold.wf1, diebold.dta, diebold.xlsx
Diebold and Yilmaz (2009) construct spillover indexes of international real asset returns and volatility based on the variance decomposition of a VAR. The data file contains weekly data on real asset returns, rets, and volatility, vol, of 7 developed countries and 12 emerging countries from the first week of January 1992 to the fourth week of November 2007.
(a) Compute descriptive statistics of the 19 real asset market returns given in rets. Compare the estimates with the results reported in Table 1 of Diebold and Yilmaz.
(b) Estimate a VAR(2) containing a constant and the 19 real asset market returns.
(c) Estimate VD_10, the variance decomposition for horizon h = 10, and compare the estimates with the results reported in Table 3 of Diebold and Yilmaz.
(d) Using the results in part (c), compute the 'Contribution from Others' by summing each row of VD_10 excluding the diagonal elements, and the 'Contribution to Others' by summing each column of VD_10 excluding the diagonal elements. Interpret the results.
(e) Repeat parts (a) to (d) with the 19 series in rets replaced by vol, and the comparisons now based on Tables 2 and 4 in Diebold and Yilmaz.

4 Nonstationarity in Financial Time Series

4.1 Introduction
An important property of asset prices identified in Chapter 1 is that they exhibit strong trends. Financial series exhibiting no trending behaviour are referred to as being stationary and are the subject matter of Chapter 3, while series that are characterised by trending behaviour are referred to as being nonstationary. This chapter focuses on identifying and testing for nonstationarity in financial time series. The identification of nonstationarity will hinge on a test for ρ = 1 in a model of the form

y_t = ρy_{t-1} + u_t,

in which u_t is a disturbance term. This test is commonly referred to as a test for a unit root. This situation is different from the hypothesis tests conducted in Chapter 3, where the process is stationary under the null hypothesis, because here the process is nonstationary under the null hypothesis ρ = 1 and, as a consequence, the test statistic does not have a normal distribution in large samples.

The classification of variables as either stationary or nonstationary has important implications in both finance and econometrics. From a finance point of view, the presence of nonstationarity in the price of a financial asset is consistent with the efficient markets hypothesis, which states that all of the information in the price of an asset is contained in its most recent price. If the nonstationary process is explosive then this may be taken as evidence of a bubble in the price of the asset.

4.2 Characteristics of Financial Data
In Chapter 1 the efficient markets hypothesis was introduced, which theorises that all available information concerning the value of a risky asset is factored into the current price of the asset. The return to a risky asset may be written as

r_t = p_t − p_{t-1} = α + v_t,    v_t ~ iid(0, σ²),    (4.1)

where p_t is the logarithm of the asset price. The parameter α represents the average return on the asset. From an efficient markets point of view, provided that v_t is not autocorrelated, then r_t is unpredictable using information at time t. An alternative representation of equation (4.1) is to rearrange it in terms of p_t as

p_t = α + p_{t-1} + v_t.    (4.2)

This representation of p_t is known as a random walk with drift, where the mean parameter α represents the drift.
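A random walk with drift of this kind is straightforward to simulate, and doing so is a useful way of seeing its properties. The following R sketch generates a single path from equation (4.2); the drift and volatility values are illustrative placeholders rather than the S&P500-based estimates used in Figure 4.1 below.

# Simulate a random walk with drift, p_t = alpha + p_{t-1} + v_t
set.seed(42)
T     <- 200
alpha <- 0.004                      # illustrative drift (mean log return)
sigma <- 0.04                       # illustrative volatility
v     <- rnorm(T, mean = 0, sd = sigma)
p     <- cumsum(c(1.5, alpha + v))  # p_0 = 1.5, then accumulate drift and shocks
plot(p, type = "l", xlab = "t", ylab = "log price",
     main = "Simulated Random Walk with Drift")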
From an efficient market point of view this equation shows that in predicting the price of an asset in the next period, all of the relevant information is contained in the current price. To understand the properties of the random walk with drift model of asset prices in (4.2), Figure 4.1 provides a plot of a simulated random walk with drift. In simulating equation (4.2), the drift parameter α is set equal to the mean return on the S&P500, while the volatility, σ², corresponds to the variance of the logarithm of S&P500 returns. The simulated price has similar time series characteristics to the observed logarithm of the price index given in Figure 1.2 in Chapter 1 and in Figure 4.2 below. In particular, the simulated price exhibits two important characteristics, namely, an increasing mean and an increasing variance.

[Figure 4.1: Simulated random walk with drift model using equation (4.2). The initial value of the simulated data is the natural logarithm of the S&P500 equity price index in February 1871 and the drift and volatility parameters are estimated from the returns to the S&P500 index. The distribution of the disturbance term is taken to be the normal distribution.]

These characteristics may be demonstrated formally as follows. Lagging the random walk with drift model in equation (4.2) by one period yields

p_{t-1} = α + p_{t-2} + v_{t-1},

and substituting this expression for p_{t-1} in (4.2) gives

p_t = α + α + p_{t-2} + v_t + v_{t-1}.

Repeating this recursive substitution process for t steps in total gives

p_t = p_0 + αt + v_t + v_{t-1} + v_{t-2} + · · · + v_1,

in which p_t is fully determined by its initial value, p_0, a deterministic trend component and the summation of the complete history of disturbances. Taking expectations of this expression and using the property that E[v_t] = E[v_{t-1}] = · · · = 0 gives the mean of p_t

E[p_t] = p_0 + αt.

This demonstrates that the mean of the random walk with drift model increases over time provided that α > 0. The variance of p_t in the random walk model is defined as

var(p_t) = E[(p_t − E[p_t])²] = tσ²,

by using the property that the disturbances are independent. As with the expression for the mean, the variance is also an increasing function of time, that is, p_t exhibits fluctuations with increasing amplitude as time progresses.

It is now clear that the efficient market hypothesis has implications for the time series behaviour of financial asset prices. Specifically, in an efficient market asset prices will exhibit trending behaviour. In Chapter 3 the idea was developed of an observer who observes snapshots of a financial time series at different points in time. If the snapshots exhibit similar behaviour in terms of the mean and variance of the observed series, the series is said to be stationary, but if the observed behaviour in either the mean or the variance of the series (or both) is completely different then it is nonstationary. More formally, a variable y_t is stationary if its distribution, or some important aspect of its distribution, is constant over time. There are two commonly used definitions of stationarity, known as weak (or covariance) and strong (or strict) stationarity,1 and it is the former that will be of primary interest.
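Before turning to the formal definitions, note that the mean and variance expressions just derived are easy to verify by simulation. The R sketch below, using the same illustrative parameter values as before, compares the sample mean and variance of p_T across many replications with the theoretical values p_0 + αT and Tσ².

# Monte Carlo check that E[p_t] = p0 + alpha*t and var(p_t) = t*sigma^2
set.seed(1)
nrep  <- 10000; T <- 200
alpha <- 0.004; sigma <- 0.04; p0 <- 1.5

shocks <- matrix(rnorm(nrep * T, 0, sigma), nrep, T) + alpha
paths  <- apply(shocks, 1, cumsum)        # T x nrep matrix of p_t - p0

c(p0 + mean(paths[T, ]), p0 + alpha * T)  # sample versus theoretical mean
c(var(paths[T, ]), T * sigma^2)           # sample versus theoretical variance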
Definition: Weak (or Covariance) Stationarity
A process is weakly stationary if both the population mean and the population variance are constant over time and if the covariance between two observations is a function only of the distance between them and not of time.

The efficient markets hypothesis requires that financial asset returns have a non-zero (positive) mean and a variance that are independent of time, as in equation (4.1). Formally, this means that returns are weakly or covariance stationary. By contrast, the logarithm of prices is a random walk with drift, (4.2), in which the mean and the variance are functions of time. It follows, therefore, that a series with these properties is referred to as being nonstationary.

[Figure 4.2: Different transformations of monthly United States equity prices for the period January 1871 to June 2004 (panels: Equity Prices, Logarithm of Equity Prices, First Difference of Equity Prices, Equity Returns).]

1 Strict stationarity is a stronger requirement than weak stationarity in that it pertains to all of the moments of the distribution, not just the first two.

Figure 4.2 highlights the time series properties of the real United States equity price and various transformations of this series, from January 1871 to June 2004. The transformed equity prices are the logarithm of the equity price, the first difference of the equity price and the first difference of the logarithm of the equity price (log returns). A number of conclusions may be drawn from the behaviour of equity prices in Figure 4.2 which both reinforce and extend the ideas developed previously. Both the equity price and its logarithm are nonstationary in the mean as both exhibit positive trends. Furthermore, a simple first difference of the equity price renders the series stationary in the mean, which is now constant over time, but the variance is still increasing with time. The implication of this is that simply first differencing the equity price does not yield a stationary series. Finally, equity returns, defined as the first difference of the logarithm of prices, are stationary in both mean and variance. The appropriate choice of filter to detrend the data is the subject matter of the next section.

4.3 Deterministic and Stochastic Trends
While the term 'trend' is deceptively easy to define, being the persistent long-term movement of a variable over time, in practice it transpires that trends are fairly tricky to deal with and the appropriate choice of filter to detrend the data is therefore not entirely straightforward. The main reason for this is that there are two very different types of trending behaviour that are difficult to distinguish between.

(i) Deterministic trend
A deterministic trend is a nonrandom function of time

y_t = α + δt + u_t,

in which t is a simple time trend taking integer values from 1 to T. In this model, shocks to the system have a transitory effect in that the process always reverts to its mean of α + δt. This suggests that removing the deterministic trend from y_t will give a series that does not trend. That is,

y_t − α̂ − δ̂t = û_t,

in which ordinary least squares has been used to estimate the parameters, is stationary.
Another approach to estimating the parameters of the deterministic components, generalised least squares, is considered at a later stage.

(ii) Stochastic trend
By contrast, a stochastic trend is random and varies over time, for example,

y_t = α + y_{t-1} + u_t,    (4.3)

which is known as a random walk with drift model. In this model, the best guess for the next value of the series is the current value plus some constant, rather than a deterministic mean value. As a result, this kind of model is also called a 'local trend' or 'local level' model. The appropriate filter here is to difference the data to obtain a stationary series as follows

∆y_t = α + u_t.

Distinguishing between deterministic and stochastic trends is important as the correct choice of detrending filter depends upon this distinction. The deterministic trend model is stationary once the deterministic trend has been removed (and is called a trend-stationary process) whereas a stochastic trend can only be removed by differencing the series (a difference-stationary process). Most financial econometricians would agree that the behaviour of many financial time series is due to stochastic rather than deterministic trends. It is hard to reconcile the predictability implied by a deterministic trend with the complications and surprises faced period after period by financial forecasters.

Consider the simple AR(1) regression equation

y_t = α + ρy_{t-1} + u_t.

The results obtained by fitting this regression to monthly data on United States zero coupon bonds with maturities ranging from 2 months to 9 months for the period January 1947 to February 1987 are given in Table 4.1.

Table 4.1: Ordinary least squares estimates of an AR(1) model estimated using monthly data on United States zero coupon bonds with maturities ranging from 2 months to 9 months for the period January 1947 to February 1987.

Maturity (mths)   Intercept (α̂)   se(α̂)   Slope (ρ̂)   se(ρ̂)
2                 0.090            0.046    0.983        0.008
3                 0.087            0.045    0.984        0.008
4                 0.085            0.044    0.985        0.007
5                 0.085            0.044    0.985        0.007
6                 0.087            0.045    0.985        0.007
9                 0.088            0.046    0.985        0.007

The major result of interest in Table 4.1 is that in all the estimated regressions the estimate of the slope coefficient, ρ̂, is very close to unity, indicative of a stochastic trend in the data along the lines of equation (4.3). This empirical result is consistent across all the maturities and, furthermore, the pattern is a fairly robust one that applies to other financial markets such as currency markets (spot and forward exchange rates) and equity markets (share prices and dividends) as well.

The behaviour of series with deterministic trends (dashed lines) and stochastic trends (solid lines) is demonstrated in Figure 4.3 using simulated data. The nonstationary series look similar, both showing clear evidence of trending. The key difference between a deterministic trend and a stochastic trend, however, is that removing a deterministic trend from the difference-stationary process, illustrated by the solid line in panel (b) of Figure 4.3, does not result in a stationary series. The longer the simulation, the more evident the erratic behaviour of the incorrectly detrended difference-stationary process becomes, as the sketch below illustrates.
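The consequences of choosing the wrong filter are easy to see in a small simulation along the lines of Figure 4.3. In the R sketch below, with illustrative parameter values, ordinary least squares detrending renders the trend-stationary series stationary but leaves the random walk wandering.

# Compare OLS detrending of a trend-stationary series and a random walk
set.seed(7)
T <- 200; t <- 1:T
y_det  <- 0.01 + 0.01 * t + rnorm(T, 0, 0.05)  # deterministic trend
y_stoc <- cumsum(0.01 + rnorm(T, 0, 0.05))     # stochastic trend (drift)

detrend <- function(y) residuals(lm(y ~ t))    # remove fitted alpha + delta*t

par(mfrow = c(1, 2))
plot(detrend(y_det),  type = "l", main = "Detrended trend-stationary")
plot(detrend(y_stoc), type = "l", main = "Detrended random walk")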
It is in fact this feature of the makeup of y_t that makes its behaviour very different to the simple deterministic trend model, because simply removing the deterministic trend will not remove the nonstationarity in the data that is due to the summation of the disturbances. This accumulation of disturbances is the origin of an important term, the order of integration of a series.

Definition: Order of Integration
A process is integrated of order d, denoted by I(d), if it can be rendered stationary by differencing d times. That is, y_t is nonstationary, but ∆^d y_t is stationary, where ∆ is the first-difference operator.

Accordingly, a process is said to be integrated of order one, denoted by I(1), if it can be rendered stationary by differencing once, that is, y_t is nonstationary, but ∆y_t = y_t − y_{t-1} is stationary. If d = 2, then y_t is I(2) and needs to be differenced twice to achieve stationarity as follows

∆²y_t = ∆y_t − ∆y_{t-1} = (y_t − y_{t-1}) − (y_{t-1} − y_{t-2}) = y_t − 2y_{t-1} + y_{t-2}.

By analogy, a stationary process is integrated of order zero, I(0), if it does not require any differencing to achieve stationarity.

[Figure 4.3: Panel (a) compares a process with a deterministic time trend (dashed line) to a process with a stochastic trend (solid line). In panel (b) the estimated deterministic trend is used to detrend both series: the deterministically trending data (dashed line) is now stationary, but the series with a stochastic trend (solid line) is still not stationary. In panel (c) both series are differenced.]

There is one final important point that arises out of the simulated behaviour illustrated in Figure 4.3. At first sight panel (c) may suggest that differencing a financial time series, irrespective of whether it is trend or difference stationary, may be a useful strategy because both of the resultant series in panel (c) appear to be stationary. The logic of the argument then becomes: if the series has a stochastic trend then this is the correct course of action, and if it is trend stationary then a stationary series will result in any event. This is not, however, a strategy to be recommended. Consider again the deterministic trend model

y_t = α + δt + u_t.

In first-difference form this becomes

∆y_t = δ + u_t − u_{t-1},

so that the process of taking the first difference has introduced a moving average error term which has a unit root. This is known as over-differencing and it can have treacherous consequences for subsequent econometric analysis, should the true data generating process actually be trend-stationary. In fact, for the simple problem of estimating the coefficient δ in the differenced model, it produces an estimate that is tantamount to using only the first and last data points in the estimation process.

4.3.1 Unit Roots†
A series that is I(1) is also said to have a unit root and tests for nonstationarity are called tests for unit roots. The reason for this is easily demonstrated. Consider the general n-th order autoregressive process

y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + . . . + φ_n y_{t-n} + u_t.

This may be written in a different way by using the lag operator, L, which is defined as

y_{t-1} = Ly_t,    y_{t-2} = L²y_t,    · · ·    y_{t-n} = L^n y_t,

so that

y_t = φ_1 Ly_t + φ_2 L²y_t + . . . + φ_n L^n y_t + u_t,

or

Φ(L) y_t = u_t,

where Φ(L) = 1 − φ_1 L − φ_2 L² − . . . − φ_n L^n is called a polynomial in the lag operator.
The roots of this polynomial are the values of L which satisfy the equation

1 − φ_1 L − φ_2 L² − . . . − φ_n L^n = 0.

If all of the roots of this equation are greater in absolute value than one, then y_t is stationary. If, on the other hand, any of the roots is equal to one (a unit root) then y_t is nonstationary. The AR(1) model is

(1 − φ_1 L) y_t = u_t,

and the roots of the equation 1 − φ_1 L = 0 are of interest. The single root of this equation is given by L* = 1/φ_1 and the root is greater than unity in absolute value only if |φ_1| < 1. If this is the case then the AR(1) process is stationary. If, on the other hand, the root of the equation is unity, then |φ_1| = 1 and the AR(1) process is nonstationary. In the AR(2) model

(1 − φ_1 L − φ_2 L²) y_t = u_t,

it is possible that there are two unit roots, corresponding to the roots of the equation

1 − φ_1 L − φ_2 L² = 0.

A solution is obtained by factoring the equation to yield

(1 − ϕ_1 L)(1 − ϕ_2 L) = 0,

in which ϕ_1 + ϕ_2 = φ_1 and ϕ_1 ϕ_2 = −φ_2. The roots of this equation are 1/ϕ_1 and 1/ϕ_2, respectively, and y_t will have a unit root if either of the roots is unity. If φ_1 = 2 and φ_2 = −1 then both roots of the equation are one and y_t has two unit roots and is therefore I(2).

4.4 The Dickey-Fuller Testing Framework
The original testing procedures for unit roots were developed by Dickey and Fuller (1979, 1981) and this framework remains one of the most popular methods to test for nonstationarity in financial time series.

4.4.1 Dickey-Fuller (DF) Test
Consider again the AR(1) regression equation

y_t = α + ρy_{t-1} + u_t,    (4.4)

in which u_t is a disturbance term with zero mean and constant variance σ². The null and alternative hypotheses are respectively

H_0: ρ = 1    (variable is nonstationary)
H_1: ρ < 1    (variable is stationary).    (4.5)

To carry out the test, equation (4.4) is estimated by ordinary least squares and a t-statistic is constructed to test that ρ = 1

t_ρ = (ρ̂ − 1) / se(ρ̂).    (4.6)

This is all correct up to this stage: the estimation of (4.4) by ordinary least squares and the use of the t-statistic in (4.6) to test the hypothesis are both sound procedures. The problem is that the statistic in (4.6) is not distributed as a Student t distribution. In fact, the distribution of this statistic under the null hypothesis of nonstationarity is non-standard. The correct distribution is known as the Dickey-Fuller distribution and the t-statistic given in (4.6) is commonly known as the Dickey-Fuller unit root test to recognize that even though it is a t-statistic by construction, its distribution is not Student t.

In practice, equation (4.4) is transformed in such a way as to convert the t-statistic in (4.6) to a test that the slope parameter of the transformed equation is zero. This has the advantage that the t-statistic commonly reported in standard regression packages directly yields the Dickey-Fuller statistic. Subtract y_{t-1} from both sides of (4.4) and collect terms to give

y_t − y_{t-1} = α + (ρ − 1)y_{t-1} + u_t,    (4.7)

or, by defining β = ρ − 1,

y_t − y_{t-1} = α + βy_{t-1} + u_t.    (4.8)

Equations (4.4) and (4.8) are exactly the same models with the connection being that β = ρ − 1. Consider again the monthly data on United States zero coupon bonds with maturities ranging from 2 months to 9 months for the period January 1947 to February 1987 used in the estimation of the AR(1) regressions reported in Table 4.1.
Estimating equation (4.4) yields the following results (with standard errors in parentheses)

y_t = 0.090 + 0.983 y_{t-1} + û_t.    (4.9)
     (0.046)  (0.008)

On the other hand, estimating the transformed equation (4.8) yields

y_t − y_{t-1} = 0.090 − 0.017 y_{t-1} + û_t.    (4.10)
              (0.046)  (0.008)

Comparing the estimated equations in (4.9) and (4.10) shows that they differ only in terms of the slope estimate on y_{t-1}. The difference in the two slope estimates is easily reconciled: the slope estimate of (4.9) is ρ̂ = 0.983, whereas an estimate of β may be recovered as β̂ = ρ̂ − 1 = 0.983 − 1 = −0.017, which is also the slope estimate obtained in (4.10). To perform the test of H_0: ρ = 1, the relevant t-statistics are

t_ρ = (ρ̂ − 1)/se(ρ̂) = (0.983 − 1)/0.008 = −2.120,
t_β = (β̂ − 0)/se(β̂) = (−0.017 − 0)/0.008 = −2.120,

which demonstrates that the two methods are indeed equivalent.

The Dickey-Fuller test regression must now be extended to deal with the possibility that under the alternative hypothesis, the series may be stationary around a deterministic trend. As established in Sections 4.2 and 4.3, financial data often exhibit trends and one of the problems faced by the empirical researcher is distinguishing between stochastic and deterministic trends. If the data are trending and if the null hypothesis of nonstationarity is rejected, it is imperative that the model under the alternative hypothesis is able to account for the major characteristics displayed by the series being tested. If the test regression in equation (4.8) is used and the null hypothesis of a unit root rejected, the alternative hypothesis is that of a process which is stationary around the constant mean α. In other words, the model under the alternative hypothesis contains no deterministic trend. Consequently, the important extension of the Dickey-Fuller framework is to include a linear time trend, t, in the test regression so that the estimated equation becomes

y_t − y_{t-1} = α + βy_{t-1} + δt + u_t.    (4.11)

The Dickey-Fuller test still consists of testing β = 0. Under the alternative hypothesis, y_t is now a stationary process with a deterministic trend. Once again using the monthly data on United States zero coupon bonds, the estimated regression including the time trend gives the following results (with standard errors in parentheses)

∆y_t = 0.030 − 0.046 y_{t-1} + 0.001 t + û_t.
      (0.052)  (0.014)         (0.001)

The value of the Dickey-Fuller test is

t_β = (β̂ − 0)/se(β̂) = (−0.046 − 0)/0.014 = −3.172.

Finally, the Dickey-Fuller test can be performed without a constant and a time trend by setting α = 0 and δ = 0 in (4.11). This form of the test, which assumes that the process has zero mean, is only really of use when testing the residuals of a regression for stationarity as they are known to have zero mean, a problem that is returned to in Chapter 5.

[Figure 4.4: Comparing the standard normal distribution (solid line) to the simulated Dickey-Fuller distribution without an intercept or trend (dashed line), with an intercept but without a trend (dot-dashed line) and with both an intercept and a trend (dotted line).]

There are therefore three forms of the Dickey-Fuller test, namely,

Model 1: ∆y_t = βy_{t-1} + u_t
Model 2: ∆y_t = α + βy_{t-1} + u_t
Model 3: ∆y_t = α + δt + βy_{t-1} + u_t.    (4.12)
For each of these three models the form of the Dickey-Fuller test is still the same, namely the test of β = 0. The pertinent distribution in each case, however, is not the same because the distribution of the test statistic changes depending on whether a constant and/or a time trend is included. The distributions of the different versions of the Dickey-Fuller test are shown in Figure 4.4. The key point to note is that all three Dickey-Fuller distributions are skewed to the left with respect to the standard normal distribution. In addition, the distribution becomes less negatively skewed as more deterministic components (constants and time trends) are included.

The monthly United States zero coupon bond data have been used to estimate Model 2 and Model 3. Using the Dickey-Fuller distribution, the p-value for the Model 2 Dickey-Fuller test statistic (−2.120) is 0.237 and because 0.237 > 0.05 the null hypothesis of nonstationarity cannot be rejected at the 5% level of significance. This is evidence that the interest rate is nonstationary. For Model 3, using the Dickey-Fuller distribution reveals that the p-value of the test statistic (−3.172) is 0.091 and because 0.091 > 0.05, the null hypothesis cannot be rejected at the 5% level of significance. This result is qualitatively the same as the Dickey-Fuller test based on Model 2, although there is quite a large reduction in the p-value from 0.237 in the case of Model 2 to 0.091 in Model 3.

4.4.2 Augmented Dickey-Fuller (ADF) Test
In estimating any one of the test regressions in equation (4.12), there is a real possibility that the disturbance term will exhibit autocorrelation. One reason for the presence of autocorrelation is that many financial series interact with each other, and because the test regressions are univariate equations the effects of these interactions are ignored. One common solution to correct for autocorrelation is to proceed as in Chapter 3 and include lags of the dependent variable ∆y_t in the test regressions (4.12). These equations then become

Model 1: ∆y_t = βy_{t-1} + Σ_{i=1}^{p} φ_i ∆y_{t-i} + u_t
Model 2: ∆y_t = α + βy_{t-1} + Σ_{i=1}^{p} φ_i ∆y_{t-i} + u_t
Model 3: ∆y_t = α + δt + βy_{t-1} + Σ_{i=1}^{p} φ_i ∆y_{t-i} + u_t,    (4.13)

in which the lag length p is chosen to ensure that u_t does not exhibit autocorrelation. The unit root test still consists of testing β = 0. The inclusion of lagged values of the dependent variable represents an augmentation of the Dickey-Fuller regression equation, so this test is commonly referred to as the Augmented Dickey-Fuller (ADF) test. Setting p = 0 in any version of the test regressions in (4.13) gives the associated Dickey-Fuller test. The distribution of the ADF statistic in large samples is also the Dickey-Fuller distribution.

For example, using Model 2 in (4.13) to construct the augmented Dickey-Fuller test with p = 2 lags for the United States zero coupon 2-month bond yield, the estimated regression equation is

∆y_t = 0.092 − 0.017 y_{t-1} + 0.117 ∆y_{t-1} − 0.080 ∆y_{t-2} + û_t.
      (0.046)  (0.008)         (0.045)          (0.046)

The value of the Augmented Dickey-Fuller test is

t_β = (β̂ − 0)/se(β̂) = (−0.017 − 0)/0.008 = −2.157.

Using the Dickey-Fuller distribution, the p-value is 0.223. Since 0.223 > 0.05 the null hypothesis is not rejected at the 5% level of significance. This result is qualitatively the same as the Dickey-Fuller test with p = 0 lags.

The selection of p affects both the size and power properties of a unit root test.
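In R, the Dickey-Fuller and augmented Dickey-Fuller regressions can be estimated with the urca package, as in the hedged sketch below, in which y stands in for the 2-month zero coupon yield series.

# DF and ADF tests for Models 2 and 3; 'y' is a placeholder series
library(urca)

summary(ur.df(y, type = "drift", lags = 0))  # Model 2, Dickey-Fuller test
summary(ur.df(y, type = "drift", lags = 2))  # Model 2, ADF test with p = 2
summary(ur.df(y, type = "trend", lags = 2))  # Model 3, ADF test with a trend
# The t-statistic on the lagged level is compared with Dickey-Fuller,
# not Student t, critical values.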
If p is chosen to be too small, then substantial autocorrelation will remain in the error term of the test regressions (4.13) and this will result in distorted statistical inference because the large sample distribution under the null hypothesis no longer applies in the presence of autocorrelation. However, including an excessive number of lags will have an adverse effect on the power of the test.

To select the lag length p to use in the ADF test, a common approach is to base the choice on information criteria, as discussed in Chapter 3. Two commonly used criteria are the Akaike information criterion (AIC) and the Schwarz information criterion (SIC). A lag-length selection procedure that has good properties in unit root testing is the modified Akaike information criterion (MAIC) method proposed by Ng and Perron (2001). The lag length is chosen to satisfy

p̂ = arg min_p MAIC(p) = log(σ̂²) + 2(τ_p + p)/(T − p_max),    (4.14)

in which

τ_p = (α̂²/σ̂²) Σ_{t=p_max+1}^{T} û²_{t-1},

and the maximum lag length is chosen as p_max = int[12(T/100)^{1/4}]. In estimating p̂, it is important that the sample over which the computations are performed is held constant.

There are two other more informal ways of choosing the length of the lag structure p. The first of these is to include lags until the t-statistic on the lagged variable is statistically insignificant using the t-distribution. Unlike the ADF statistic, the t-statistic on the lagged dependent variables has a standard distribution based on the Student t distribution. The second informal approach to dealing with the need to choose the lag length p is effectively to circumvent making a decision at all. The ADF test is performed for a range of lags, say p = 0, 1, 2, 3, 4, · · · . If all of the tests show that the series is nonstationary then the conclusion is clear. If four of the five tests show evidence of nonstationarity then there is still stronger evidence of nonstationarity than there is of stationarity.

4.5 Beyond the Dickey-Fuller Framework†
A number of extensions and alternatives to the Dickey-Fuller and Augmented Dickey-Fuller unit root tests have been proposed. Several of these developments, some of which are commonly available in econometric software packages, are considered briefly.

4.5.1 Structural Breaks
The form of the nonstationarity emphasised so far is based on the series following a random walk. An alternative form of nonstationarity discussed earlier is based on a deterministic linear time trend. Another form of nonstationarity arises when the series exhibits a structural break, as this represents a shift in the mean and hence by definition is non-mean reverting.

The simplest approach applies where the timing of the structural break is known. A dummy variable is included in (4.13) to capture the structural break according to

∆y_t = α + βy_{t-1} + δt + Σ_{i=1}^{p} φ_i ∆y_{t-i} + γBREAK_t + u_t,    (4.15)

where the structural break dummy variable is defined as

BREAK_t = 0 : t ≤ τ
          1 : t > τ,    (4.16)

and τ is the observation at which the break occurs. The unit root test is still based on testing β = 0; however, the p-values are now also a function of the timing of the structural break τ, so even more tables are needed. The correct p-values for a unit root test with a structural break are available in Perron (1989). For a review of further extensions of unit root tests with structural breaks, see Maddala and Kim (1998).
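With a known break date, the test regression (4.15) can be estimated directly by least squares. The R fragment below is a sketch with p = 1 lag; y and tau are placeholders for the series and the break observation, and the t-statistic on the lagged level must be compared with the critical values in Perron (1989) rather than the standard Dickey-Fuller tables.

# ADF-type test regression with a structural break dummy at observation tau
T      <- length(y)
BREAK  <- as.numeric(seq_len(T) > tau)  # 0 for t <= tau, 1 for t > tau
dy     <- diff(y)                       # dependent variable, t = 2, ..., T
ylag   <- y[-T]                         # lagged level y_{t-1}
trend  <- 2:T                           # linear time trend
dylag1 <- c(NA, dy[-length(dy)])        # one lag of the dependent variable

breakreg <- lm(dy ~ ylag + trend + dylag1 + BREAK[-1])
summary(breakreg)                       # unit root test: t-statistic on ylag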
An example of a possible structural break is highlighted in Figure 4.2 where there is a large fall in the share price at the time of the 1929 stock market crash.

4.5.2 Generalised Least Squares Detrending
Consider the following model

y_t = α + δt + u_t    (4.17)
u_t = φu_{t-1} + v_t,    (4.18)

in which v_t is a disturbance term with zero mean and constant variance σ². This is the fundamental equation from which Model 3 of the Dickey-Fuller test is derived. If the aim is still to test for a unit root in y_t, the null and alternative hypotheses are

H_0: φ = 1    [Nonstationary]
H_1: φ < 1    [Stationary].    (4.19)

Instead of proceeding in the manner described previously and using Model 3 in either (4.12) or (4.13), an alternative approach is to use a two-step procedure.

Step 1: Detrending. Estimate the parameters of equation (4.17) by ordinary least squares and then construct a detrended version of y_t given by

y*_t = y_t − α̂ − δ̂t.

Step 2: Testing. Test for a unit root using the deterministically detrended data, y*_t, from the first step, using the Dickey-Fuller or augmented Dickey-Fuller test. Model 1 will be the appropriate model to use because, by construction, y*_t will have zero mean and no deterministic trend.

It turns out that in large samples (or asymptotically) this procedure is equivalent to the single-step approach based on Model 3.

Elliott, Rothenberg and Stock (1996) suggest an alternative detrending step which proceeds as follows. Define a constant φ* = 1 + c/T, in which the value of c depends upon whether the detrending equation has only a constant or both a constant and a time trend. The proposed values of c are

c = −7       [Constant (α ≠ 0, δ = 0)]
c = −13.5    [Trend (α ≠ 0, δ ≠ 0)],

and this constant is used to rewrite the detrending regression as

y*_t = γ_0 α* + γ_1 t* + u*_t,    (4.20)

in which u*_t is a composite disturbance term,

y*_t = y_t − φ* y_{t-1},    t = 2, · · ·, T    (4.21)
α* = 1 − φ*,                t = 2, · · ·, T    (4.22)
t* = t − φ*(t − 1),         t = 2, · · ·, T    (4.23)

and the starting values for each of the series at t = 1 are taken to be y*_1 = y_1 and α*_1 = t*_1 = 1, respectively. The starting values are important because if c = −T the detrending equation reverts to the simple detrending regression (4.17). If, on the other hand, c = 0 then the detrending equation is an equation in first differences. It is for this reason that this method, which is commonly referred to as generalised least squares detrending, is also known as quasi-differencing and partial generalised least squares (Phillips and Lee, 1995). Once the ordinary least squares estimates γ̂_0 and γ̂_1 are available, the detrended data

û*_t = y*_t − γ̂_0 α* − γ̂_1 t*,

is tested for a unit root. If Model 1 of the Dickey-Fuller framework is used then the test is referred to as the GLS-DF test. Note, however, that because the detrended data depend on the value of c, the critical values are different from the Dickey-Fuller critical values, which rely on simple detrending.

The generalised least squares (or quasi-differencing) approach was introduced to try to overcome one of the important shortcomings of the Dickey-Fuller approach, namely that the Dickey-Fuller tests have low power. What this means is that the Dickey-Fuller tests struggle to reject the null hypothesis of nonstationarity (a unit root) when it is in fact false.
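The quasi-differencing step is simple to implement directly, as the following sketch for the constant-plus-trend case (c = −13.5) shows; y is again a placeholder series, and detrending the levels with the quasi-differenced estimates follows the usual Elliott-Rothenberg-Stock practice.

# Generalised least squares (quasi-difference) detrending, trend case
T     <- length(y)
cbar  <- -13.5
phi   <- 1 + cbar / T

ystar <- c(y[1], y[-1] - phi * y[-T])       # quasi-differences, y*_1 = y_1
astar <- c(1, rep(1 - phi, T - 1))          # alpha*, with alpha*_1 = 1
tstar <- c(1, 2:T - phi * (1:(T - 1)))      # t*, with t*_1 = 1

gls  <- lm(ystar ~ 0 + astar + tstar)       # regression (4.20), no intercept
uhat <- y - coef(gls)[1] - coef(gls)[2] * (1:T)  # detrended data in levels
# 'uhat' is then tested using Model 1 of the Dickey-Fuller framework
# (the GLS-DF test), with critical values that depend on c.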
The modified detrending approach proposed by Elliott, Rothenberg and Stock (1996) is based on the premise that the test is more likely to reject the null hypothesis of a unit root if, under the alternative hypothesis, the process is very close to being nonstationary. The choice of the value of c in the detrending process ensures that the quasi-differenced data have an autoregressive root that is very close to one. For example, based on a sample size of T = 200, the quasi-difference parameter φ* = 1 + c/T is 0.9650 for a regression with only a constant and 0.9325 for a regression with a constant and a time trend.

4.5.3 Nonparametric Adjustment for Autocorrelation
Phillips and Perron (1988) propose an alternative method for adjusting the Dickey-Fuller test for autocorrelation. Their test is based on estimating the Dickey-Fuller regression equation, either (4.8) or (4.11), by ordinary least squares but using a nonparametric approach to correct for the autocorrelation. The Phillips-Perron statistic is

t̃_β = t_β (γ̂_0/f̂_0)^{1/2} − T(f̂_0 − γ̂_0) se(β̂) / (2 f̂_0^{1/2} s),    (4.24)

where t_β is the ADF statistic, s is the standard error of the regression and f̂_0 is known as the long-run variance, which is computed as

f̂_0 = γ̂_0 + 2 Σ_{j=1}^{p} (1 − j/p) γ̂_j,    (4.25)

where p is the length of the lag and γ̂_j is the j-th estimated autocovariance function of the ordinary least squares residuals obtained from estimating either (4.8) or (4.11)

γ̂_j = (1/T) Σ_{t=j+1}^{T} û_t û_{t-j}.    (4.26)

The critical values are the same as the Dickey-Fuller critical values when the sample size is large.

4.5.4 Unit Root Test with Null of Stationarity
The Dickey-Fuller testing framework for unit roots, including the generalised least squares detrending and Phillips-Perron variants, is for the null hypothesis that a time series y_t is nonstationary or I(1). There is, however, a popular test that is often reported in the empirical literature which has a null hypothesis of stationarity or I(0). Consider the regression model

y_t = α + δt + z_t,

where z_t is given by

z_t = z_{t-1} + ε_t,    ε_t ~ iid N(0, σ_ε²).

The null hypothesis that y_t is a stationary I(0) process is tested in terms of the null hypothesis H_0: σ_ε² = 0, in which case z_t is simply a constant. Define {ẑ_1, · · ·, ẑ_T} as the ordinary least squares residuals from the regression of y_t on a constant and a deterministic trend. Now define the standardised test statistic

S = Σ_{t=1}^{T} (Σ_{j=1}^{t} ẑ_j)² / (T² f̂_0),

in which f̂_0 is a consistent estimator of the long-run variance of z_t. This test statistic is most commonly known as the KPSS test, after Kwiatkowski, Phillips, Schmidt and Shin (1992). Following the earlier discussion, it can also be regarded as a test for over-differencing.

4.5.5 Higher Order Unit Roots
A failure to reject the null hypothesis of nonstationarity suggests that the series needs to be differenced at least once to render it stationary, i.e., d ≥ 1. The question is how many times the series has to be differenced to achieve stationarity. To identify the value of d, the unit root tests discussed above are performed sequentially as follows.

(1) Test the level of the series for a unit root.
(a) If the null is rejected, stop and conclude that the series is I(0).
(b) If you fail to reject the null, conclude that the process is at least I(1) and move to the next step.
(2) Test the first difference of the series for a unit root.
(a) If the null is rejected, stop and conclude that the series is I(1).
(b) If you fail to reject the null, conclude that the process is at least I(2) and move to the next step.
(3) Test the second difference of the series for a unit root.
(a) If the null is rejected, stop and conclude that the series is I(2).
(b) If you fail to reject the null, conclude that the process is at least I(3) and move to the next step.

As it is very rare for financial series to exhibit orders of integration higher than I(2), it is safe to stop at this point. The pertinent p-values vary at each stage of the sequential unit root testing procedure.

4.6 Price Bubbles
During the 1990s, led by Dot-Com stocks and the internet sector, the United States stock market experienced a spectacular rise in all major indices, especially the NASDAQ index. Figure 4.5 plots the monthly NASDAQ index, expressed in real terms, for the period February 1973 to January 2009. The series grows fairly steadily until the early 1990s, when it begins to surge. The steep upward movement in the series continues until the late 1990s as investment in Dot-Com stocks grew in popularity. Early in the year 2000 the index drops abruptly and then continues to fall to the mid-1990s level. In summary, over the decade of the 1990s the NASDAQ index rose to a historical high on 10 March 2000. Concomitant with this striking rise in stock market indices, there was much popular talk among economists about the effects of the internet and computing technology on productivity and the emergence of a new economy associated with these changes. What caused the unusual surge and fall in prices, whether there were bubbles, and whether the bubbles were rational or behavioural are among the most actively debated issues in macroeconomics and finance in recent years.

[Figure 4.5: The monthly NASDAQ index expressed in real terms for the period February 1973 to January 2009.]

A recent series of papers develops empirical tests for bubbles and rational exuberance, an interesting new development in the field of unit root testing (Phillips and Yu, 2011; Phillips, Wu and Yu, 2011). Instead of concentrating on performing a test of a unit root against the alternative of stationarity (essentially using a one-sided test where the critical region is defined in the left-hand tail of the distribution of the unit root test statistic), these papers show that a test against the alternative of an explosive root (the right-hand tail of the distribution) is appropriate for asset prices exhibiting price bubbles. The null hypothesis of interest is still ρ = 1, but the alternative hypothesis is now ρ > 1 in (4.4), or

H_0: ρ = 1    (variable is nonstationary, no price bubble)
H_1: ρ > 1    (variable is explosive, price bubble).    (4.27)

To motivate the presence of a price bubble, consider the following model

P_t(1 + R) = E_t[P_{t+1} + D_{t+1}],    (4.28)

where P_t is the price of an asset, R is the risk-free rate of interest, assumed to be constant for simplicity, D_t is the dividend and E_t[·] is the conditional expectations operator. This equation highlights two types of investment strategies. The first is given by the left-hand side, which involves investing in a risk-free asset at time t, yielding a payoff of P_t(1 + R) in the next period.
Alternatively, the right-hand side shows that by holding the asset the investor earns the capital gain from owning an asset with a higher price the next period plus a dividend payment. In equilibrium there are no arbitrage opportunities, so the two types of investment are equal to each other. Now write the equation as

P_t = β E_t[P_{t+1} + D_{t+1}],    (4.29)

where β = (1 + R)^{-1} is the discount factor. Writing this expression at t + 1 gives

P_{t+1} = β E_{t+1}[P_{t+2} + D_{t+2}],    (4.30)

which can be used to substitute out P_{t+1} in (4.29)

P_t = β E_t[β E_{t+1}[P_{t+2} + D_{t+2}] + D_{t+1}] = β E_t[D_{t+1}] + β² E_t[D_{t+2}] + β² E_t[P_{t+2}].

Repeating this approach N times gives the price of the asset in terms of two components

P_t = Σ_{j=1}^{N} β^j E_t[D_{t+j}] + β^N E_t[P_{t+N}].    (4.31)

The first term on the right-hand side is the standard present value of an asset, whereby the price of an asset equals the discounted present value stream of expected dividends. The second term represents the price bubble

B_t = β^N E_t[P_{t+N}],    (4.32)

as it is an explosive nonstationary process. Consider the conditional expectation of the bubble the next period, discounted by β and using the property E_t[E_{t+1}[·]] = E_t[·]:

β E_t[B_{t+1}] = β E_t[β^N E_{t+1}[P_{t+N+1}]] = β^{N+1} E_t[P_{t+N+1}].    (4.33)

However, this expression would also correspond to the bubble in (4.32) if the N forward iterations that produced (4.31) actually went for N + 1 iterations, in which case

B_t = β E_t[B_{t+1}],

or, as β = (1 + R)^{-1},

E_t[B_{t+1}] = (1 + R)B_t,

which represents an autoregressive process in B_t with an explosive parameter 1 + R.

[Figure 4.6: Testing for price bubbles in the monthly NASDAQ index expressed in real terms for the period February 1973 to January 2009 by means of recursive Augmented Dickey-Fuller tests with 1 lag. The start-up sample is 39 observations from February 1973 to April 1976. The approximate 5% critical value is also shown.]

[Figure 4.7: Testing for price bubbles in the monthly NASDAQ index expressed in real terms for the period February 1973 to January 2009 by means of rolling window Augmented Dickey-Fuller tests with 1 lag. The size of the window is set to 77 observations so that the starting sample is February 1973 to June 1979. The approximate 5% critical value is also shown.]

Interestingly enough, if we were to follow convention and apply the ADF test to the full sample (February 1973 to January 2009), the unit root test would not reject the null hypothesis H_0: ρ = 1 in favour of the right-tailed alternative hypothesis H_1: ρ > 1 at the 5% level of significance. One would conclude that there is no significant evidence of exuberance in the behaviour of the NASDAQ index over the sample period. This result would sit comfortably with the consensus view that there is little empirical evidence to support the hypothesis of explosive behaviour in stock prices (see, for example, Campbell, Lo and MacKinlay, 1997, p. 260). On the other hand, Evans (1991) argues that explosive behaviour is only temporary in the sense that bubbles eventually collapse and that, therefore, the observed trajectories of asset prices may appear rather more like an I(1) or even a stationary series than an explosive series, thereby confounding empirical evidence. Evans demonstrates by simulation that standard unit root tests have difficulties in detecting such periodically collapsing bubbles.
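A forward recursive version of the ADF test of the kind plotted in Figure 4.6, and discussed further below, can be sketched in a few lines of R; here y stands in for the logarithm of the real NASDAQ index and the start-up sample of 39 observations follows the text.

# Forward recursive ADF statistics for detecting explosive behaviour
library(urca)

T       <- length(y)
start   <- 39                       # start-up sample from the text
adf_rec <- rep(NA, T)
for (r in start:T) {
  adf_rec[r] <- ur.df(y[1:r], type = "drift", lags = 1)@teststat[1]
}
# A bubble is signalled when the statistic exceeds the right-tail
# critical value of the Dickey-Fuller distribution
plot(adf_rec, type = "l", ylab = "Recursive ADF statistic")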
For unit root test procedures to be powerful in detecting bubbles, recursive unit root testing of this kind proves to be an invaluable approach to the detection and dating of bubbles. Figure 4.6 plots the ADF statistic with 1 lag computed from forward recursive regressions by fixing the start of the sample period and progressively increasing the sample size observation by observation until the entire sample is being used. Interestingly, the NASDAQ shows no evidence of rational exuberance until June 1995. In July 1995, the test detects the presence of a bubble, β̂ > 0 (that is, ρ̂ > 1), with the supporting evidence becoming stronger from this point until reaching a peak in February 2000. The bubble continues until February 2001 and by March 2001 the bubble appears to have dissipated, with β̂ < 0. Interestingly, the first occurrence of the bubble is July 1995, which is more than one year before Greenspan (1996) coined the phrase 'irrational exuberance' on 5 December 1996 to characterise herding behaviour in stock markets.

To check the robustness of the results, Figure 4.7 plots the ADF statistic with 1 lag for a series of rolling window regressions. Each regression is based on a subsample of size T = 77 with the first sample period from February 1973 to June 1979. The fixed window is then rolled forward one observation at a time. The general pattern to emerge is completely consistent with the results reported in Figure 4.6.

Of course, these results do not provide any causal explanation for the exuberance of the 1990s in internet stocks. Several possibilities exist, including the presence of a rational bubble, herding behaviour, or explosive effects on economic fundamentals arising from time variation in discount rates. Identification of the explicit economic source or sources will involve more explicit formulation of structural models of behaviour. What this recursive methodology does provide, however, is support for the hypothesis that the NASDAQ index may be regarded as a mildly explosive propagating mechanism. This methodology can also be applied to study recent phenomena in real estate, commodity, foreign exchange and equity markets, which have attracted attention.

4.7 Exercises

(1) Unit Root Properties of Commodity Price Data
commodity.wf1, commodity.dta, commodity.xlsx
(a) For each of the commodity prices in the dataset, compute the natural logarithm and use the following unit root tests to determine the stationarity properties of each series. Where appropriate, test for higher orders of integration.
(i) Dickey-Fuller test with a constant and no time trend.
(ii) Augmented Dickey-Fuller test with a constant and no time trend, and p = 2 lags.
(iii) Phillips-Perron test with a constant and no time trend.
(b) Perform a panel unit root test on the 7 commodity prices with a constant and no time trend and with p = 2 lags.

(2) Equity Market Data
pv.wf1, pv.dta, pv.xlsx
(a) Use the equity price series to construct the following transformed series: the natural logarithm of equity prices, the first difference of equity prices and log returns of equity prices. Plot the series and discuss the stationarity properties of each series. Compare the results with Figure 4.2.
(b) Construct similarly transformed series for dividend payments and discuss the stationarity properties of each series.
(c) Construct similarly transformed series for earnings and discuss the stationarity properties of each series.
(d) Use the following unit root tests to test for stationarity of the natural logarithms of prices, dividends and earnings:
(i) Dickey-Fuller test with a constant and no time trend.
(ii) Augmented Dickey-Fuller test with a constant and no time trend and p = 1 lag.
(iii) Phillips-Perron test with a constant and no time trend and p = 1 lag.
In performing these tests it may be necessary to test for higher orders of integration.
(e) Repeat part (d) where the lag length for the ADF and PP tests is based on the automatic bandwidth selection procedure.

(3) Unit Root Tests of Bond Market Data
zero.wf1, zero.dta, zero.xlsx
(a) Use the following unit root tests to determine the stationarity properties of each yield:
(i) Dickey-Fuller test with a constant and no time trend.
(ii) Augmented Dickey-Fuller test with a constant and no time trend, and p = 2 lags.
(iii) Phillips-Perron test with a constant and no time trend.
In performing these tests it is necessary to test for higher orders of integration.
(b) Perform a panel unit root test on the 6 yield series with a constant and no time trend and with p = 2 lags.

(4) The Term Structure of Interest Rates
zero.wf1, zero.dta, zero.xlsx
The expectations hypothesis of the term structure of interest rates predicts the following relationship between a long-term interest rate of maturity n and a short-term rate of maturity m < n

y_{n,t} = β_0 + β_1 y_{m,t} + u_t,

where u_t is a disturbance term, β_0 represents the term premium and β_1 = 1 under the pure expectations hypothesis.
(a) Test for cointegration between y_{9,t} and y_{3,t} using Model 2 and p = 1 lag.
(b) Given the results in part (a), estimate a bivariate ECM for y_{9,t} and y_{3,t} using Model 2 with p = 1 lag. Write out the estimated model (the cointegrating equation(s) and the ECM). In estimating the VECM, order the yields from the longest maturity to the shortest.
(c) Interpret the long-run parameter estimates of β_1 and β_2.
(d) Interpret the error correction parameter estimates of γ_1 and γ_2.
(e) Interpret the short-run parameter estimates of π_{i,j}.
(f) Test the restriction β_1 = 1.
(g) Repeat parts (a) to (f) for the 6-month (y_{6,t}) and 3-month (y_{3,t}) yields.
(h) Repeat parts (a) to (f) for the 9-month (y_{9,t}), 6-month (y_{6,t}) and 3-month (y_{3,t}) yields.
(i) Repeat parts (a) to (f) for all 6 yields (y_{9,t}, y_{6,t}, y_{5,t}, y_{4,t}, y_{3,t}, y_{2,t}).
(j) Discuss whether the empirical results support the term structure of interest rate model.
(k) Questions (a) to (j) are all based on specifying Model 2 as the ECM. Reestimate the VECM where Model 3 is chosen. As the difference between Model 2 and Model 3 is the inclusion of intercepts in each equation of the VECM, perform a test that each intercept is zero. Interpret the results of this test.
(l) In estimating the VECM in the previous question, the ordering of the yields consists of choosing the longest maturity first and the shortest maturity last, i.e., y_{9,t}, y_{6,t}, y_{3,t}. Now reestimate the VECM choosing the ordering y_{9,t}, y_{3,t}, y_{6,t}. Show that the estimated cointegrating equation(s) from this system can be obtained from the previous system based on an alternative ordering. Hence show that the estimates of the cointegrating equation(s) are not unique.
(m) Test for weak exogeneity in the bivariate system containing y_{9,t} and y_{3,t} by performing the test that y_{9,t} is weakly exogenous.
Repeat the test for a system that contains the interest rates $y_{6,t}$ and $y_{3,t}$, and then for the trivariate system $y_{9,t}$, $y_{6,t}$ and $y_{3,t}$.

(5) Purchasing Power Parity
ppp.wf1, ppp.dta, ppp.xlsx

Under the assumption of purchasing power parity (PPP), the nominal exchange rate adjusts in the long run to the price differential between the domestic and foreign countries

$$S = \frac{P}{F}.$$

This suggests that the relationship between the nominal exchange rate and the prices in the two countries is given by

$$s_t = \beta_0 + \beta_1 p_t + \beta_2 f_t + u_t,$$

where lower case letters denote natural logarithms and $u_t$ is a disturbance term which represents departures from PPP, with $\beta_2 = -\beta_1$.

(a) Construct the relevant variables $s$, $f$, $p$ and the difference $diff = p - f$.
(b) Use unit root tests to determine the level of integration of all of these series. In performing the unit root tests, test the sensitivity of the results by using a model with a constant and no time trend, and a model with a constant and a time trend. Let the lags be p = 12. Discuss the results in terms of the level of integration of each series.
(c) Test for cointegration between $s$, $p$ and $f$ using Model 3 with p = 12 lags.
(d) Given the results in part (c), estimate a trivariate ECM for $s$, $p$ and $f$ using Model 3 and p = 12 lags. Write out the estimated model (the cointegrating equation(s) and the ECM).
(e) Interpret the long-run parameter estimates. Hint: if the number of cointegrating equations is greater than one, it is helpful to rearrange the cointegrating equations so that one of the equations expresses $s$ as a function of $p$ and $f$.
(f) Interpret the error correction parameter estimates.
(g) Interpret the short-run parameter estimates.
(h) Test the restriction $H_0: \beta_2 = -\beta_1$.
(i) Discuss the long-run properties of the $/AUD foreign exchange market.

(6) Fisher Hypothesis
fisher.wf1, fisher.dta, fisher.xlsx

Under the Fisher hypothesis the nominal interest rate fully reflects the long-run movements in the inflation rate.

(a) Construct the percentage annualised inflation rate, $\pi_t$.
(b) Plot the nominal interest rate and inflation.
(c) Perform unit root tests to determine the level of integration of the nominal interest rate and inflation. In performing the unit root tests, test the sensitivity of the results by using a model with a constant and no time trend, and a model with a constant and a time trend. Let the lags be determined by the automatic lag length selection procedure. Discuss the results in terms of the level of integration of each series.
(d) Compute the real interest rate as

$$r_t = i_t - \pi_t,$$

where $i_t$ is the nominal interest rate and $\pi_t$ is the inflation rate. Test the real interest rate $r_t$ for stationarity using a model with a constant but no time trend. Does the Fisher hypothesis hold? Discuss.

(7) Price Bubbles in the Share Market
bubbles.wf1, bubbles.dta, bubbles.xlsx

The data represent a subset of the equity us.* data in order to focus on the 1987 stock market crash. The present value model predicts the following relationship between the (logarithms of the) share price $P_t$ and the dividend $D_t$:

$$p_t = \beta_0 + \beta_1 d_t + u_t,$$

where $u_t$ is a disturbance term. A rational bubble occurs when the actual price persistently deviates from the present value price $\beta_0 + \beta_1 d_t$. The null and alternative hypotheses are

H0: Bubble ($u_t$ is nonstationary)
H1: Cointegration ($u_t$ is stationary)

(a) Create the logarithms of real equity prices and real dividends and use unit root tests to determine the level of integration of the series.
(b) Estimate a bivariate VAR with a constant and use the SIC lag length criterion to determine the optimal lag structure.
(c) Test for a bubble by performing a cointegration test between $p_t$ and $d_t$ using Model 3 with the number of lags based on the optimal lag length obtained from the estimated VAR.
(d) Are United States equity prices driven solely by market fundamentals or do bubbles exist?

5 Cointegration

5.1 Introduction

An important implication of the analysis of stochastic trends and the unit root tests discussed in Chapter 4 is that nonstationary time series can be rendered stationary by differencing. The differencing operator represents a univariate approach to achieving stationarity, since the discussion of nonstationary processes so far has concentrated on a single time series. In the case of $N > 1$ nonstationary time series $y_t = \{y_{1,t}, y_{2,t}, \cdots, y_{N,t}\}$, an alternative method of achieving stationarity is to form linear combinations of the series. The ability to find stationary linear combinations of nonstationary time series is known as cointegration (Engle and Granger, 1987).

Cointegration provides a basis for interpreting a number of models in finance in terms of long-run relationships. Having uncovered the long-run relationships between two or more variables by establishing evidence of cointegration, the short-run properties of financial variables are modelled by combining the information in the lags of the variables with the long-run relationships obtained from the cointegrating relationship. This model is known as a vector error-correction model (VECM), which is shown to be a restricted form of the vector autoregressive (VAR) model discussed in Chapter 3.

The existence of cointegration among sets of nonstationary time series has three important implications.
(1) Cointegration implies a set of dynamic long-run equilibria where the weights used to achieve stationarity represent the parameters of the equilibrium relationship.
(2) The estimates of the weights used to achieve stationarity (the long-run parameter estimates) converge to their population values at the super-consistent rate of $T$, compared to the usual $\sqrt{T}$ rate of convergence for stationary variables.
(3) Modelling a system of cointegrated variables allows for specification of both long-run and short-run dynamics in terms of the VECM.

5.2 Equilibrium Relationships

An important property of asset prices identified in Chapter 1 is that they exhibit strong trends. This is indeed the case for the United States, as seen in Figure 5.1, which shows that the logarithm of monthly real equity prices, $p_t = \log P_t$, exhibits a strong positive trend over the period 1871 to 2004. The same is true for the logarithms of real dividends, $d_t = \log D_t$, and real earnings per share, $y_t = \log Y_t$, also illustrated in Figure 5.1. As discussed in Chapter 4, many important financial time series exhibit trending behaviour and are therefore nonstationary.

Figure 5.1 Time series plots of the logarithms of monthly United States real equity prices, real dividends and real earnings per share for the period February 1871 to June 2004.

It may be an empirical fact that the financial variables illustrated in Figure 5.1 are I(1), but theory also suggests a link between the behaviour of prices, dividends and earnings. An early influential paper in this area is by Gordon (1959), who outlines two views of asset price determination.
In the dividend view, the investor purchases a stock to acquire the entire future stream of dividend payments. This path of future dividends is approximated by the current dividend and the expected growth in the dividend. If the expected growth of dividends is assumed constant then there is a long-run relationship between prices and dividends given by

$$p_t = \mu_d + \beta_d d_t + u_{d,t}. \qquad \text{[Dividend model]} \quad (5.1)$$

An important feature is that both $p_t$ and $d_t$ are I(1), but if $\mu_d + \beta_d d_t$ truly does represent the expected value of $p_t$, then it must follow that the disturbance term $u_{d,t}$ is stationary, or I(0).

Alternatively, in the earnings view of the world, the investor buys equity in order to obtain the income per share and is indifferent as to whether the returns are packaged in terms of the fraction of earnings distributed as a dividend or in terms of the rise in the share's value. This suggests a relationship of the form

$$p_t = \mu_y + \beta_y y_t + u_{y,t}, \qquad \text{[Earnings model]} \quad (5.2)$$

where once again $u_{y,t}$ must be I(0) if this represents a valid long-run relationship. In other words, in either view of the world, $p_t$ can be decomposed into a long-run component and a short-run component which represents temporary deviations of $p_t$ from its long-run level. This can be represented as

$$\underbrace{p_t}_{\text{Actual}} = \underbrace{\mu_d + \beta_d d_t}_{\text{Long-run}} + \underbrace{u_{d,t}}_{\text{Short-run}}$$

or, in the case of the earnings model,

$$\underbrace{p_t}_{\text{Actual}} = \underbrace{\mu_y + \beta_y y_t}_{\text{Long-run}} + \underbrace{u_{y,t}}_{\text{Short-run}}.$$

That a linear combination of nonstationary variables can generate a new variable that is stationary is a result known as cointegration. Furthermore, the concept of cointegration is not limited to the bivariate case. If the growth of dividends is driven by retained earnings, then the path of future dividends is approximated by the current dividend and the expected growth in the dividend given by retained earnings. This suggests an equilibrium relationship of the form

$$p_t = \mu + \beta_d d_t + \beta_y y_t + u_t, \qquad \text{[Combined model]}$$

where, as before, $p_t$, $d_t$ and $y_t$ are I(1) and $u_t$ is I(0). If the owner of the share is indifferent to the fraction of earnings distributed, then the cointegrating parameters $\beta_d$ and $\beta_y$ will be identical. Of course, all dividends are paid out of retained earnings so there will be a relationship between these two variables as well, a fact which raises the interesting question of more than one cointegrating relationship being present in multivariate contexts. This issue is taken up again in Section 5.8.

5.3 Equilibrium Adjustment

Assume that there are two variables $y_{1,t}$ and $y_{2,t}$ which share a long-run equilibrium relationship given by

$$y_{1,t} = \mu + \beta y_{2,t} + u_t,$$

in which $u_t$ is a mean-zero disturbance term. Although the equation is normalised with respect to $y_{1,t}$, the notation is deliberately chosen to reflect the fact that both variables are possibly endogenously determined. This relationship is presented in Figure 5.2 for $\beta > 0$.

Figure 5.2 Phase diagram to demonstrate the equilibrium adjustment if two variables are cointegrated. The system is in equilibrium anywhere along the line ADC.

Now suppose there is a shock to the system such that $y_{1,t-1} > \mu + \beta y_{2,t-1}$, or equivalently $u_{t-1} > 0$, and the system is displaced to point B. An equilibrium relationship necessarily implies that any shock to the system will result in an adjustment taking place in such a way that equilibrium is restored. There are three cases.

(1) The adjustment is done by $y_{1,t}$:

$$\Delta y_{1,t} = \alpha_1 (y_{1,t-1} - \mu - \beta y_{2,t-1}) + u_{1,t}. \quad (5.3)$$
Since $y_{1,t-1} - \mu - \beta y_{2,t-1} > 0$, inspection of equation (5.3) reveals that $\Delta y_{1,t}$ should be negative, which in turn suggests the restriction $\alpha_1 < 0$. In Figure 5.2 this adjustment is represented by a perpendicular move down from B towards A.

(2) The adjustment is done by $y_{2,t}$:

$$\Delta y_{2,t} = \alpha_2 (y_{1,t-1} - \mu - \beta y_{2,t-1}) + u_{2,t}. \quad (5.4)$$

Since $y_{1,t-1} - \mu - \beta y_{2,t-1} > 0$, inspection of equation (5.4) reveals that $\Delta y_{2,t}$ should be positive, which in turn suggests the restriction $\alpha_2 > 0$. In Figure 5.2 this adjustment is represented by a horizontal move from B towards C.

(3) Both $y_{1,t}$ and $y_{2,t}$ adjust: In this case both equations (5.3) and (5.4) operate, with $y_{1,t}$ decreasing and $y_{2,t}$ increasing. The strength of the movements in the two variables is determined by the relative magnitudes of the parameters $\alpha_1$ and $\alpha_2$. If both variables bear an equal share of the adjustment, the movement back to equilibrium is from point B to point D as shown in Figure 5.2.

Prima facie evidence of equilibrium relationships between equity prices and dividends, and equity prices and earnings, is presented in panels (a) and (b), respectively, of Figure 5.3. Scatter plots of these relationships together with lines of best fit demonstrate that both relationships are similar to the equilibrium represented in Figure 5.2. Furthermore, casual inspection of the equilibrium relationships suggests that the values of $\beta_d$ and $\beta_y$ are both close to 1.

Figure 5.3 Scatter plots of the logarithms of monthly United States real equity prices and real dividends, panel (a), and real equity prices and real earnings per share, panel (b), for the period February 1871 to June 2004.

In order to explore which of the variables do the adjusting in the event of a shock which forces the system away from equilibrium, equations (5.3) and (5.4) must be estimated. Particularising these equations to the equity prices/dividends and equity prices/earnings relationships and estimating by sequential application of ordinary least squares yields the following results. For the dividend model the estimates are

$$\Delta p_t = -0.0009\,(p_{t-1} - 1.1787\,d_{t-1} - 3.128) + \hat{u}_{1,t}$$
$$\Delta d_t = \phantom{-}0.0072\,(p_{t-1} - 1.1787\,d_{t-1} - 3.128) + \hat{u}_{2,t},$$

while for the earnings model the results are

$$\Delta p_t = -0.0053\,(p_{t-1} - 1.0410\,y_{t-1} - 2.6073) + \hat{u}_{1,t}$$
$$\Delta y_t = \phantom{-}0.0035\,(p_{t-1} - 1.0410\,y_{t-1} - 2.6073) + \hat{u}_{2,t}.$$

It appears that the equilibrium adjustment predicted by equations (5.3) and (5.4) is confirmed for these two relationships. In particular, the signs of the adjustment parameters satisfy the conditions required for there to be equilibrium adjustment.

5.4 Vector Error Correction Models

Taken together, equations (5.3) and (5.4) are known as a vector error correction model or VECM. In practice, the specification of a VECM requires the inclusion of more complex short-run dynamics, introduced through the addition of lags of the dependent variables, and also the inclusion of constants and time trends in the same way that these deterministic variables are included in unit root tests. Here the situation is slightly more involved because the deterministic variables can appear either in the long-run cointegrating equation or in the short-run dynamics, or VAR, part of the equation. There are five different models to consider, all of which are listed below. For simplicity, the short-run dynamics or VAR part of the VECM are not included in this listing of the models.
Model 1 (No Constant or Trend): No intercept and no trend in the cointegrating equation and no intercept and no trend in the VAR:

$$\Delta y_{1,t} = \alpha_1 (y_{1,t-1} - \beta y_{2,t-1}) + u_{1,t}$$
$$\Delta y_{2,t} = \alpha_2 (y_{1,t-1} - \beta y_{2,t-1}) + u_{2,t}$$

This specification is included for completeness but, in general, the model will only rarely be of any practical use, as most empirical specifications will require at least a constant, whether in the long run, the short run, or both.

Model 2 (Restricted Constant): Intercept and no trend in the cointegrating equation and no intercept and no trend in the VAR:

$$\Delta y_{1,t} = \alpha_1 (y_{1,t-1} - \beta y_{2,t-1} - \mu) + v_{1,t}$$
$$\Delta y_{2,t} = \alpha_2 (y_{1,t-1} - \beta y_{2,t-1} - \mu) + v_{2,t}$$

This model is referred to as the restricted constant model, as there is only one intercept term $\mu$, in the long-run equation, which acts as the intercept for both dynamic equations.

Model 3 (Unrestricted Constant): Intercept and no trend in the cointegrating equation and intercept and no trend in the VAR:

$$\Delta y_{1,t} = \delta_1 + \alpha_1 (y_{1,t-1} - \beta y_{2,t-1} - \mu) + v_{1,t}$$
$$\Delta y_{2,t} = \delta_2 + \alpha_2 (y_{1,t-1} - \beta y_{2,t-1} - \mu) + v_{2,t}$$

Model 4 (Restricted Trend): Intercept and trend in the cointegrating equation and intercept and no trend in the VAR:

$$\Delta y_{1,t} = \delta_1 + \alpha_1 (y_{1,t-1} - \beta y_{2,t-1} - \mu - \phi\,\text{TREND}) + v_{1,t}$$
$$\Delta y_{2,t} = \delta_2 + \alpha_2 (y_{1,t-1} - \beta y_{2,t-1} - \mu - \phi\,\text{TREND}) + v_{2,t}$$

Similar to Model 2, this model is called the restricted trend model because there is only one trend term in the long-run equation.

Model 5 (Unrestricted Trend): Intercept and trend in the cointegrating equation and intercept and trend in the VAR:

$$\Delta y_{1,t} = \delta_1 + \theta_1\,\text{TREND} + \alpha_1 (y_{1,t-1} - \beta y_{2,t-1} - \mu - \phi\,\text{TREND}) + v_{1,t}$$
$$\Delta y_{2,t} = \delta_2 + \theta_2\,\text{TREND} + \alpha_2 (y_{1,t-1} - \beta y_{2,t-1} - \mu - \phi\,\text{TREND}) + v_{2,t}$$

As with the unit root tests, lagged values of all of the dependent variables (the VAR terms) are included as additional regressors to capture the short-run dynamics. As the system is multivariate, the lags of all dependent variables are included in all equations. For example, a VECM based on Model 2 (restricted constant) with p lags on the dynamic terms becomes

$$\Delta y_{1,t} = \alpha_1 (y_{1,t-1} - \beta y_{2,t-1} - \mu) + \sum_{i=1}^{p} \pi_{11,i}\Delta y_{1,t-i} + \sum_{i=1}^{p} \pi_{12,i}\Delta y_{2,t-i} + v_{1,t}$$
$$\Delta y_{2,t} = \alpha_2 (y_{1,t-1} - \beta y_{2,t-1} - \mu) + \sum_{i=1}^{p} \pi_{21,i}\Delta y_{1,t-i} + \sum_{i=1}^{p} \pi_{22,i}\Delta y_{2,t-i} + v_{2,t}.$$

Exogenous variables determined outside of the system are also allowed. Finally, the system can be extended to include more than two variables. In this case there is the possibility of more than a single cointegrating equation, which means that the system in general adjusts to several shocks, a theme taken up again in Section 5.8.

5.5 Relationship between VECMs and VARs

The VECM represents a restricted form of a VAR. Instead of the VAR format in which all variables are stationary (first differences in this instance), the VECM specifically includes the long-run equilibrium relationship in which the variables enter in levels. To highlight this relationship, consider a simple VECM given by

$$y_{1,t} - y_{1,t-1} = \alpha_1 (y_{1,t-1} - \beta y_{2,t-1}) + u_{1,t}$$
$$y_{2,t} - y_{2,t-1} = \alpha_2 (y_{1,t-1} - \beta y_{2,t-1}) + u_{2,t}, \quad (5.5)$$

in which there is one cointegrating equation and no lagged difference terms on the right-hand side. There are three parameters to be estimated, namely the cointegrating parameter $\beta$ and the two error correction parameters $\alpha_1$ and $\alpha_2$. Now re-express each equation in terms of the levels of the variables as

$$y_{1,t} = (1 + \alpha_1)\, y_{1,t-1} - \alpha_1\beta\, y_{2,t-1} + u_{1,t}$$
$$y_{2,t} = \alpha_2\, y_{1,t-1} + (1 - \alpha_2\beta)\, y_{2,t-1} + u_{2,t}. \quad (5.6)$$

Note that this is a VAR(1), with one lag of the levels of the variables on the right-hand side.
This is a general relationship between a VAR and a VECM: if the underlying VAR is specified to be a VAR(n), then the VECM will have n − 1 lagged difference terms, that is, it will be a VECM(n − 1). In the present case the levels representation is the VAR(1)

$$y_{1,t} = \phi_{11} y_{1,t-1} + \phi_{12} y_{2,t-1} + u_{1,t}$$
$$y_{2,t} = \phi_{21} y_{1,t-1} + \phi_{22} y_{2,t-1} + u_{2,t}, \quad (5.7)$$

where the parameters in (5.7) are related to those in (5.6) by the restrictions

$$\phi_{11} = 1 + \alpha_1, \qquad \phi_{12} = -\alpha_1\beta$$
$$\phi_{21} = \alpha_2, \qquad \phi_{22} = 1 - \alpha_2\beta.$$

Equation (5.7) is a VAR in the levels of the variables as discussed in Chapter 3. Estimating the VAR yields estimates of $\phi_{11}$, $\phi_{12}$, $\phi_{21}$ and $\phi_{22}$. A comparison of equations (5.6) and (5.7) shows that cointegration imposes one cross-equation restriction on this system, which accounts for the difference in the number of parameters in the VAR and the VECM. This restriction arises because both variables are determined by the same underlying long-run relationship, which involves the parameter $\beta$. The form of the restriction is recovered by noting that

$$\alpha_1 = \phi_{11} - 1, \qquad \alpha_2 = \phi_{21}, \qquad \beta = (1 - \phi_{22})\phi_{21}^{-1}.$$

The additional VAR parameter can then be expressed as a function of the other three VAR parameters as

$$\phi_{12} = (1 - \phi_{11})(1 - \phi_{22})\phi_{21}^{-1}.$$

This result suggests that if there is cointegration, estimating the unrestricted VAR in levels produces an estimate of $\phi_{12}$ that is close to the value obtained from substituting the remaining VAR parameter estimates into this expression.

Alternatively, if there is no cointegration then there is nothing for the system to error-correct to, and the error-correction parameters in (5.5) are simply $\alpha_1 = \alpha_2 = 0$. The VECM is then a VAR in first differences. This is recognition of a second-best strategy: if no long-run relationship exists, the next-best approach is to model just the short-run relationships amongst the variables.

This discussion touches on the old problem in time series modelling of when to difference variables in order to address the problem of nonstationarity. The solution is to know whether there is cointegration or not. If there is cointegration, a VAR in levels is a correct specification. If there is no cointegration, a VAR in first differences is required. Of course, if there is cointegration a VECM can be specified, but in large samples this would be equivalent to estimating the VAR in levels. This result also highlights the importance of VECMs in modelling financial variables, because it demonstrates that the old practice of automatically differencing variables to render them stationary and then estimating a VAR on the differenced data rules out the possibility of a long-run relationship, and hence any role for an error-correction term in modelling the dynamics.
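The cross-equation restriction is easy to verify numerically. The following R fragment is a small sketch using illustrative parameter values (not estimates from the text): it builds the VAR(1) parameters implied by equation (5.6) and confirms that $\phi_{12}$ satisfies the restriction.

```r
# Hypothetical VECM parameters, chosen only for the check
alpha1 <- -0.2; alpha2 <- 0.1; beta <- 1.2

# Implied VAR(1) parameters from equation (5.6)
phi11 <- 1 + alpha1
phi12 <- -alpha1 * beta
phi21 <- alpha2
phi22 <- 1 - alpha2 * beta

# Cross-equation restriction: phi12 = (1 - phi11)(1 - phi22)/phi21
all.equal(phi12, (1 - phi11) * (1 - phi22) / phi21)   # TRUE
```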
5.6 Estimation

To illustrate the estimation of a VECM, consider a very simple specification based on Model 3 (unrestricted constant) in which the dynamics are limited to one lag on all the dynamic terms. The full VECM consists of the following three equations

$$y_{1,t} = \mu + \beta y_{2,t} + u_t \quad (5.8)$$
$$\Delta y_{1,t} = \delta_1 + \phi_{11}\Delta y_{1,t-1} + \phi_{12}\Delta y_{2,t-1} + \alpha_1 (y_{1,t-1} - \beta y_{2,t-1}) + v_{1,t} \quad (5.9)$$
$$\Delta y_{2,t} = \delta_2 + \phi_{21}\Delta y_{1,t-1} + \phi_{22}\Delta y_{2,t-1} + \alpha_2 (y_{1,t-1} - \beta y_{2,t-1}) + v_{2,t}, \quad (5.10)$$

whose parameters must be estimated. Two estimators are discussed initially: the Engle-Granger two-step procedure, which provides estimates of the cointegrating equation without considering the dynamics of the VECM or the potential endogeneity of $y_{2,t}$, and the Johansen estimator, which provides estimates of the cointegrating equation that take into account all of the dynamics of the model. For this reason, the Johansen procedure is referred to as an efficient estimation procedure and the Engle-Granger method as an inefficient estimation procedure.

The Engle and Granger estimator (Engle and Granger, 1987)
The Engle-Granger two-stage procedure is implemented by estimating equations (5.8), (5.9) and (5.10) by ordinary least squares in two steps.

Long run: Regress $y_{1,t}$ on a constant and $y_{2,t}$ and compute the residuals $\hat{u}_t$.
Short run: Estimate each equation of the error correction model in turn by ordinary least squares as follows:
(1) Regress $\Delta y_{1,t}$ on a constant, $\hat{u}_{t-1}$, $\Delta y_{1,t-1}$ and $\Delta y_{2,t-1}$.
(2) Regress $\Delta y_{2,t}$ on a constant, $\hat{u}_{t-1}$, $\Delta y_{1,t-1}$ and $\Delta y_{2,t-1}$.

The error correction parameter estimates, $\hat{\alpha}_1$ and $\hat{\alpha}_2$, are the slope parameter estimates on $\hat{u}_{t-1}$ in these two equations, respectively. This estimator yields super-consistent estimates of the cointegrating vector (Stock, 1987; Phillips, 1987). Nevertheless, the Engle-Granger estimator does not produce estimates that are asymptotically efficient, except under very strict conditions which are, in practice, unlikely to be satisfied. This results in the estimates having nonstandard distributions, which invalidates the use of standard inferential methods. The econometric problems with the Engle-Granger procedure arise from the potential endogeneity of $y_{2,t}$ and autocorrelation in the disturbances $u_t$ when simply estimating equation (5.8) by ordinary least squares. Thus, while it is not necessary to take account of the short-run dynamics to obtain super-consistent estimates of the long-run parameters, it is necessary to model the short-run dynamics to obtain an efficient estimator with t-statistics that have standard distributions.

The Johansen estimator (Johansen, 1988, 1991, 1995)
In estimating the cointegrating regression in the two-step procedure, none of the dynamics from the VECM are included in the estimation. A way to correct for this is to estimate all the parameters of the model jointly, a procedure known as the Johansen estimator. This estimator provides more efficient estimates of the cointegrating parameters; the second stage still involves the same sequence of least squares regressions, but the $\hat{u}_{t-1}$ will be different.
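A minimal R sketch of the Engle-Granger calculations, assuming y1 and y2 are vectors holding two I(1) series, may help to fix ideas; it is an illustration of the mechanics rather than a complete implementation.

```r
# Step 1 (long run): cointegrating regression and residuals
step1 <- lm(y1 ~ y2)
u_hat <- residuals(step1)

# Step 2 (short run): each ECM equation estimated by OLS with one lag
# of the differences; the coefficient on the lagged residual is alpha-hat
dy1 <- diff(y1); dy2 <- diff(y2)
n   <- length(dy1)
ecm1 <- lm(dy1[-1] ~ u_hat[2:n] + dy1[-n] + dy2[-n])   # alpha1-hat
ecm2 <- lm(dy2[-1] ~ u_hat[2:n] + dy1[-n] + dy2[-n])   # alpha2-hat
```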
The Engle-Granger and Johansen estimators are now compared by estimating the VECM specified in equations (5.8) to (5.10) using the United States data on equity prices, dividends and earnings. Two separate cointegrating regressions are estimated, one for prices and dividends (the dividend model) and one for prices and earnings (the earnings model). The Engle-Granger two-stage estimates are reported in Table 5.1.

Table 5.1 Engle-Granger two-stage estimates of the VECMs for equity prices and dividends and equity prices and earnings per share. Estimates are for Model 3 (unrestricted constant) with 1 lag. The sample period is January 1871 to June 2004. Standard errors in parentheses.

                    Dividend Model                     Earnings Model
Variable    Long Run      ∆p_t       ∆d_t      Long Run      ∆p_t       ∆y_t
β            1.179                               1.042
            (0.005)                             (0.005)
µ            3.129                               2.607
            (0.008)                             (0.009)
δ_i                       0.002      0.000                    0.002      0.000
                         (0.001)    (0.000)                  (0.001)    (0.000)
φ_i1                      0.291      0.000                    0.286      0.011
                         (0.024)    (0.003)                  (0.024)    (0.007)
φ_i2                      0.148      0.877                    0.074      0.878
                         (0.087)    (0.012)                  (0.042)    (0.012)
α_i                      -0.007      0.002                   -0.008      0.004
                         (0.003)    (0.000)                  (0.003)    (0.001)

The cointegration parameters in both cases are slightly greater than unity. Although it is tempting to look at the standard errors and claim that the parameters are in fact significantly different from unity, this conclusion is premature, as will become apparent later. The signs of the error-correction parameters are consistent with the system converging to its long-run equilibrium as given by the cointegrating equation, because in both dynamic equations $\hat\alpha_1 < 0$ and $\hat\alpha_2 > 0$, respectively. Finally, one really interesting result concerns the estimate of the intercept $\mu$ in the cointegrating equation for dividends. Equation (1.16) in Chapter 1 establishes that this intercept is related to the factor at which future dividends are discounted, $\delta$. The relationship is

$$\delta = \exp(-\mu) = \exp(-3.129) = 0.044.$$

This estimate lines up nicely with the rough estimate of 0.05 obtained from Figure 1.6 in Chapter 1.

Table 5.2 gives the estimates of the VECM specified in equations (5.8) to (5.10) for the United States data on equity prices, dividends and earnings using the Johansen estimator.

Table 5.2 Estimates of the VECMs for equity prices and dividends and equity prices and earnings per share using the Johansen estimator. Estimates are based on Model 3 (unrestricted constant) with 1 lag. The sample period is January 1871 to June 2004. Standard errors in parentheses.

                    Dividend Model                     Earnings Model
Variable    Long Run      ∆p_t       ∆d_t      Long Run      ∆p_t       ∆y_t
β            1.169                               1.079
            (0.039)                             (0.039)
µ            3.390                               2.791
            (—–)                                (—–)
δ_i                       0.002      0.000                    0.001      0.001
                         (0.001)    (0.000)                  (0.001)    (0.000)
φ_i1                      0.291      0.000                    0.286      0.012
                         (0.024)    (0.003)                  (0.024)    (0.007)
φ_i2                      0.148      0.877                    0.072      0.871
                         (0.087)    (0.012)                  (0.042)    (0.012)
α_i                      -0.007      0.002                   -0.008      0.004
                         (0.003)    (0.000)                  (0.003)    (0.001)

Not surprisingly, there are few changes to the dynamic parameters of the VAR. The major changes are in the parameter estimates of the cointegrating vector and their standard errors. The $\beta$ estimates are 1.169 as opposed to 1.179 for dividends, and 1.079 as opposed to 1.042 for earnings. These results suggest that the problems with the single equation approach are more severe in the earnings equation. This accords with intuition, particularly insofar as possible endogeneity is concerned: dividend policy is changed by firms very reluctantly, but retained earnings will be more responsive to the factors that influence equity prices. In addition, the estimated standard errors of the Johansen estimates of the cointegrating parameter are about ten times larger. This appreciable difference in standard errors illustrates very clearly that inference using the standard errors obtained from the Engle-Granger procedure cannot be relied on.

5.7 Fully Modified Estimation†

The ordinary least squares estimator of $\beta$ in (5.8) is superconsistent but inefficient. Solutions to the efficiency problem, and to the bias introduced by possible endogeneity of the right-hand-side variables and serial correlation in $u_t$, have also been addressed within a single equation framework, as opposed to the system framework adopted by the Johansen estimator. Consider the following system of equations

$$\begin{bmatrix} 1 & -\beta \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix}, \quad (5.11)$$

in which it should be apparent that both $y_{1,t}$ and $y_{2,t}$ are I(1) variables and $u_{1,t}$ and $u_{2,t}$ are I(0) disturbances.
The first equation in the system is the cointegrating regression between $y_{1,t}$ and $y_{2,t}$, with the constant term taken to be zero for simplicity. The second equation is the nonstationary generating process for $y_{2,t}$. In order to complete the system it is still necessary to specify the properties of the disturbance vector $u_t = [u_{1,t} \; u_{2,t}]'$. The simplest generating process that allows for serial correlation in $u_t$ and possible endogeneity of $y_{2,t}$ is the following autoregressive scheme of order 1

$$u_{1,t} = b_{11,1} u_{1,t-1} + b_{12,0} u_{2,t} + b_{12,1} u_{2,t-1} + \epsilon_{1,t}$$
$$u_{2,t} = b_{21,0} u_{1,t} + b_{21,1} u_{1,t-1} + b_{22,1} u_{2,t-1} + \epsilon_{2,t} \quad (5.12)$$

in which $\epsilon_t = [\epsilon_{1,t} \; \epsilon_{2,t}]' \sim iid(0, \Sigma)$ with

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}.$$

The notation in equation (5.12) is particularly cumbersome, but it can be simplified significantly by using the lag operator L, defined as

$$L^0 z_t = z_t, \quad L^1 z_t = z_{t-1}, \quad L^2 z_t = z_{t-2}, \quad \cdots \quad L^n z_t = z_{t-n}.$$

For more information on the lag operator see, for example, Hamilton (1994) and Martin, Hurn and Harris (2013).

Using the lag operator, the system of equations (5.12) can be written as $B(L)u_t = \epsilon_t$ where

$$B(L) = \begin{bmatrix} 1 - b_{11,1}L & -b_{12,0} - b_{12,1}L \\ -b_{21,0} - b_{21,1}L & 1 - b_{22,1}L \end{bmatrix} = \begin{bmatrix} b_{11}(L) & b_{12}(L) \\ b_{21}(L) & b_{22}(L) \end{bmatrix}. \quad (5.13)$$

Once $B(L)$ is written in the form of the second matrix on the right-hand side of (5.13), the matrix polynomials in the lag operator $b_{ij}(L)$ can be specified to have any order and, in addition, leads as well as lags of $u_t$ can be entertained in the specification. In other words, the assumption of a simple autoregressive model of order 1 at the outset can be generalised without any additional effort.

In order to express the system (5.11) in terms of $\epsilon_t$ rather than $u_t$, and hence remove the serial correlation, it is necessary to premultiply by $B(L)$. The result is

$$\begin{bmatrix} b_{11}(L) & -\beta b_{11}(L) + b_{12}(L) \\ b_{21}(L) & -\beta b_{21}(L) + b_{22}(L) \end{bmatrix}\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} 0 & b_{12}(L) \\ 0 & b_{22}(L) \end{bmatrix}\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}. \quad (5.14)$$

The problem with single equation estimation of the cointegrating regression is now obvious: the cointegrating parameter $\beta$ appears in both equations of (5.14). This suggests that to estimate the cointegrating vector a systems approach is needed which takes into account this cross-equation restriction, the solution provided by the Johansen estimator (Johansen, 1988, 1991, 1995). It follows from (5.14) that for a single equation approach to produce asymptotically efficient parameter estimates, two requirements need to be satisfied.

(1) There should be no cross-equation restrictions, so that $b_{21}(L) = 0$.
(2) There should be no contemporaneous correlation between the disturbance term in the equation used to estimate $\beta$ and $\epsilon_{2,t}$, the error term in the equation generating $y_{2,t}$. If this condition is not satisfied, the second equation in (5.14) cannot be ignored in the estimation of $\beta$.

Assuming now that $b_{21}(L) = 0$, adding and subtracting $(y_{1,t} - \beta y_{2,t})$ in the first equation of (5.14) and rearranging yields

$$y_{1,t} - \beta y_{2,t} + [b_{11}(L) - 1](y_{1,t} - \beta y_{2,t}) + b_{12}(L)\Delta y_{2,t-1} = \epsilon_{1,t}. \quad (5.15)$$

The problem remains that $E[\epsilon_{1,t}\,\epsilon_{2,t}] = \sigma_{12} \neq 0$, so that the second condition outlined earlier is not yet satisfied. The remedy is to multiply the second equation by $\rho = \sigma_{12}/\sigma_{22}$ and subtract the result from the first equation in (5.14). The result is

$$y_{1,t} - \beta y_{2,t} + [b_{11}(L) - 1](y_{1,t} - \beta y_{2,t}) + [b_{12}(L) - \rho\, b_{22}(L)]\Delta y_{2,t-1} = v_t, \quad (5.16)$$

in which $v_t = \epsilon_{1,t} - \rho\,\epsilon_{2,t}$.
As a result of this restructuring it follows that

$$E[v_t\,\epsilon_{2,t}] = E[(\epsilon_{1,t} - \rho\,\epsilon_{2,t})\,\epsilon_{2,t}] = \sigma_{12} - \rho\,\sigma_{22} = \sigma_{12} - \frac{\sigma_{12}}{\sigma_{22}}\,\sigma_{22} = 0,$$

so that the second condition for efficient single equation estimation of the cointegrating parameter $\beta$ is now satisfied.

Equation (5.16) provides a relationship between $y_{1,t}$ and its long-run equilibrium level, $\beta y_{2,t}$, with the dynamics of the relationship being controlled by the structure of the polynomials in the lag operator, $b_{11}(L)$, $b_{12}(L)$ and $b_{22}(L)$. A very general specification of these lag polynomials will allow for different lag orders and also for leads as well as lags. In other words, a general version of (5.16) will allow for both the leads and lags of the cointegrating relationship, $(y_{1,t} - \beta y_{2,t})$, and the leads and lags of $\Delta y_{2,t}$. A reduced form version of this equation is

$$y_{1,t} = \beta y_{2,t} + \sum_{k=-q,\,k \neq 0}^{q} \pi_k (y_{1,t-k} - \beta y_{2,t-k}) + \sum_{k=-q}^{q} \alpha_k \Delta y_{2,t-k} + \eta_t, \quad (5.17)$$

where for the sake of simplicity the lag length in all cases has been set at q. As noted by Lim and Martin (1995), this approach to obtaining asymptotically efficient parameter estimates of the cointegrating vector can be interpreted as a parametric filtering procedure, in which the filter expresses $u_{1,t}$ in terms of observable variables which are then included as regressors in the estimation of the cointegrating vector. The intuition behind this approach is that improved estimates of the long-run parameters can be obtained by using information on the short-run dynamics.

The Phillips and Loretan estimator (Phillips and Loretan, 1991)
The Phillips and Loretan (1991) estimator excludes the leads of the cointegrating relationship from equation (5.17). The equation is

$$y_{1,t} = \beta y_{2,t} + \sum_{k=1}^{q} \pi_k (y_{1,t-k} - \beta y_{2,t-k}) + \sum_{k=-q}^{q} \alpha_k \Delta y_{2,t-k} + \eta_t, \quad (5.18)$$

which is estimated by non-linear least squares. This procedure yields (super) consistent and asymptotically efficient estimates of the cointegrating vector if all the restrictions in moving from (5.14) to (5.18) are satisfied.

Dynamic least squares (Saikkonen, 1991; Stock and Watson, 1993)
The dynamic least squares estimator excludes the lags and leads of the cointegrating relationship from equation (5.17). The equation is

$$y_{1,t} = \beta y_{2,t} + \sum_{k=-q}^{q} \alpha_k \Delta y_{2,t-k} + \eta_t, \quad (5.19)$$

which has the advantage of being estimable by ordinary least squares. This procedure yields (super) consistent and asymptotically efficient estimates of the cointegrating vector if all the restrictions in moving from (5.14) to (5.19) are satisfied.

Fully modified least squares (Phillips and Hansen, 1990)
The fully modified estimator excludes the lags and leads of the cointegrating relationship and limits the terms in $\Delta y_{2,t}$ to the contemporaneous difference with coefficient $\rho$. The resulting model is

$$y_{1,t} = \beta y_{2,t} + \rho\,\Delta y_{2,t} + \eta_t. \quad (5.20)$$

Comparison of the first equation in (5.11) with (5.20) implies that

$$u_{1,t} = \rho\,\Delta y_{2,t} + \eta_t. \quad (5.21)$$

The fully modified ordinary least squares approach is now implemented in three steps.
(1) Estimate the first equation in (5.11) by ordinary least squares to obtain $\hat\beta$ and $\hat{u}_{1,t}$.
(2) Estimate (5.21) by ordinary least squares to obtain estimates $\hat\rho$ and $\hat\sigma^2_\eta$.
(3) Regress the constructed variable $y_{1,t} - \hat\rho\,\Delta y_{2,t}$ on $y_{2,t}$ to obtain a revised estimate of $\beta$. Use the estimate $\hat\sigma^2_\eta$ to construct standard errors.
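The three steps can be sketched in R as follows, again assuming y1 and y2 hold two I(1) series; this is a bare-bones illustration of the mechanics described above, not a production implementation of fully modified least squares.

```r
# Step 1: OLS on the first equation of (5.11); the constant is suppressed
# because it is taken to be zero in (5.11)
step1 <- lm(y1 ~ 0 + y2)
u1    <- residuals(step1)

# Step 2: estimate rho in u1_t = rho * dy2_t + eta_t, equation (5.21)
dy2   <- diff(y2)
step2 <- lm(u1[-1] ~ 0 + dy2)
rho   <- coef(step2)[1]

# Step 3: regress the constructed variable y1_t - rho*dy2_t on y2_t
# to obtain the revised estimate of beta
y1_adj <- y1[-1] - rho * dy2
step3  <- lm(y1_adj ~ 0 + y2[-1])
coef(step3)
```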
The Engle and Yoo estimator (Engle and Yoo, 1991)
The Engle and Yoo estimator starts by formulating the error correction version of equation (5.20), obtained by subtracting $y_{1,t-1}$ from the left-hand side, adding and subtracting $\beta y_{2,t-1}$ on the right-hand side, and rearranging to yield

$$\Delta y_{1,t} = -(y_{1,t-1} - \beta y_{2,t-1}) + (\beta + \rho)\Delta y_{2,t} + \eta_t. \quad (5.22)$$

Given an estimate $\hat\beta$, a reduced form version of (5.22) is

$$\Delta y_{1,t} = -\delta(y_{1,t-1} - \hat\beta y_{2,t-1}) + \alpha\,\Delta y_{2,t} + w_t, \quad (5.23)$$

in which

$$w_t = \alpha\delta\, y_{2,t-1} + \eta_t, \qquad \alpha = \beta - \hat\beta. \quad (5.24)$$

The Engle and Yoo estimator is implemented in three steps.
(1) Estimate the first equation in (5.11) by ordinary least squares to obtain $\hat\beta$ and $\hat{u}_{1,t}$.
(2) Estimate (5.23) by ordinary least squares to obtain estimates $\hat{w}_t$ and $\hat\delta$.
(3) Regress the residuals $\hat{w}_t$ on $y_{2,t-1}$ in order to obtain $\hat\alpha$. The revised estimate of $\beta$ is given by $\hat\beta + \hat\alpha$.

Table 5.3 Single equation estimates of the cointegrating regression between stock prices and dividends and stock prices and earnings, respectively. The dynamic ordinary least squares estimates use one forward lead and one backward lag. The sample period is January 1871 to June 2004. Standard errors in parentheses.

             Dividend Model                 Earnings Model
          OLS      DOLS     FMOLS        OLS      DOLS     FMOLS
β        1.179     1.174     1.191      1.042     1.043     1.065
        (0.005)   (0.040)   (0.038)    (0.005)   (0.039)   (0.038)
µ        3.129     3.117     3.143      2.607     2.607     2.612
        (0.008)   (0.056)   (0.053)    (0.009)   (0.065)   (0.064)

Table 5.3 compares the ordinary least squares estimator of the cointegrating regression with the fully modified and dynamic ordinary least squares estimators. Comparison with the results in Table 5.2 shows that the fully modified ordinary least squares estimator works particularly well in the case of the earnings model, which was previously identified as the more problematic of the two models in terms of potential endogeneity. The dynamic least squares estimator is less impressive in this situation, although there may be scope for improvement by considering a longer lead/lag structure. Interestingly, the standard errors of the fully modified and dynamic least squares approaches are similar to those of the Johansen approach. The results suggest that modified single equation approaches can help to improve inference in the cointegrating regression. The limitation of these approaches remains that the dimension of the cointegrating space is always limited to unity.

5.8 Testing for Cointegration

Up to this point the existence of a cointegrating relationship has merely been posited or assumed. Of course, the identification of cointegration is a crucial step in modelling with nonstationary variables and is, in fact, the place where the modelling procedure actually begins. Yule (1926) first drew attention to the problems of modelling with unrelated nonstationary variables, and Granger and Newbold (1974) later showed that regressions involving nonstationary variables can lead to spurious correlations. Spurious regressions arise when unrelated nonstationary variables are found to have a statistically significant relationship. If $y_t$ and $x_t$ are unrelated I(1) variables, the chance of obtaining a nonzero estimate of the regression coefficient of $y_t$ on $x_t$, even though the true value is zero, is substantial. Banerjee, Dolado, Galbraith and Hendry (1993) showed that in a sample of size 100 a rejection probability of 75.3% was obtained.
Moreover, the problem does not go away in large samples; in fact the opposite is true, with the rejection probability of a zero coefficient increasing as the sample size grows. To guard against spurious regressions it is critically important that cointegration can be identified reliably.

5.8.1 Residual-based tests

A natural way to test for cointegration is a two-step procedure consisting of estimating the cointegrating equation by least squares in the first step and testing the residuals for stationarity in the second step. As the unit root test treats the null hypothesis as nonstationarity, in applying the unit root procedure to test for cointegration the null hypothesis is no cointegration, whereas the alternative hypothesis of stationarity represents cointegration:

H0: No cointegration ($u_t$ is nonstationary)
H1: Cointegration ($u_t$ is stationary)

This is a sensible strategy given that the estimator of the cointegrating equation is super-consistent, converging to its population value at the faster rate of $T$ compared to the usual rate of $\sqrt{T}$ for stationary variables. However, in applying a unit root test to the ordinary least squares residuals, the critical values must take into account the loss of degrees of freedom in estimating the cointegrating equation. The critical values of the tests depend on the sample size and the number of deterministic terms and other regressors in the first stage regression. Tables are provided by Engle and Granger (1987) and Engle and Yoo (1987). MacKinnon (1991) provides response surface estimates of the critical values that are now used in most computer packages.

The residuals obtained by estimating the cointegrating regressions for the dividend model, (5.1), and the earnings model, (5.2), by ordinary least squares are plotted in Figure 5.4. The series appear to have mean zero with no apparent trend, giving the appearance of stationarity.

Figure 5.4 Plot of the residuals from the first stage of the Engle-Granger two-stage procedure applied to the dividend model and the earnings model, respectively. Data are monthly observations from February 1871 to June 2004 on United States equity prices, dividends and earnings per share.

Formal tests of the stationarity of the residuals are carried out using the Dickey-Fuller framework, based on a test regression with no constant or trend. The results are shown in Table 5.4 for up to four lags used to augment the test regression.

Table 5.4 Testing for cointegration between United States equity prices and dividends and equity prices and earnings. Augmented Dickey-Fuller tests based on the test regression with no constant term and with the number of lags shown. Critical values are from MacKinnon (1991).

         Dividend Model            Earnings Model
Lags   Statistic    5% CV       Statistic    5% CV
0       -2.654      -3.340       -2.674      -3.340
1       -3.890      -3.340       -4.090      -3.340
2       -3.630      -3.340       -3.921      -3.340
3       -3.576      -3.340       -3.936      -3.340
4       -3.814      -3.340       -4.170      -3.340

Despite the aberration of the Dickey-Fuller test (0 lags) failing to reject the null hypothesis of nonstationarity, the results from the augmented Dickey-Fuller tests are unequivocal. The null hypothesis of nonstationarity is rejected and the residuals are I(0). This confirms the intuition provided by Figure 5.4 and allows the conclusion that both the dividend model and the earnings model represent valid long-run relationships between equity prices and dividends and equity prices and earnings per share, respectively.

Although residual-based tests of cointegration are a natural way to think about the problem of testing for cointegration, they suffer from the same problem as all single equation approaches to cointegration, namely that the number of cointegrating relationships is necessarily limited to one. This is not problematic in the case of two variables, but it is severely limiting when considering the multivariate case.
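In R, the residual-based test amounts to a Dickey-Fuller regression on the first-stage residuals with no deterministic terms. The sketch below illustrates the test for the dividend model, assuming p and d hold the logarithms of real equity prices and real dividends; the resulting statistic must be compared with the MacKinnon (1991) critical values, not the standard Dickey-Fuller ones.

```r
# First stage: cointegrating regression for the dividend model (5.1)
u_hat <- residuals(lm(p ~ d))

# Second stage: ADF(1) regression on the residuals, no constant or trend
du <- diff(u_hat)
n  <- length(du)
test_reg <- lm(du[-1] ~ 0 + u_hat[2:n] + du[-n])
summary(test_reg)$coefficients[1, "t value"]   # residual-based test statistic
```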
5.8.2 Reduced-rank tests

Consider the following simple model

$$\begin{bmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{bmatrix} = \begin{bmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{bmatrix}\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}, \quad (5.25)$$

which is a bivariate VAR rearranged to look like a VECM but with no long-run equilibrium relationships imposed. In other words, the matrix

$$\Pi = \begin{bmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{bmatrix}$$

is an unrestricted matrix in which the rows and columns are not related in a linear fashion. This condition is referred to as the matrix having full rank. As this model is simply a VAR written in a particular way, for it to be a correct representation of the data both $y_{1,t}$ and $y_{2,t}$ must be stationary.

Now consider the situation in which $y_{1,t}$ and $y_{2,t}$ share a long-run relationship with cointegrating parameter $\beta$ and speed of adjustment parameters $\alpha_1$ and $\alpha_2$ in the first and second equations, respectively. Equation (5.25) must be restricted to reflect this long-run relationship, yielding the familiar VECM

$$\begin{bmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{bmatrix} = \begin{bmatrix} \alpha_1 & \alpha_1\beta \\ \alpha_2 & \alpha_2\beta \end{bmatrix}\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}, \quad (5.26)$$

so that

$$\Pi = \begin{bmatrix} \alpha_1 & \alpha_1\beta \\ \alpha_2 & \alpha_2\beta \end{bmatrix} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}\begin{bmatrix} 1 & \beta \end{bmatrix}.$$

The effect of the long-run relationship is to restrict the elements of the matrix $\Pi$. In particular, the second column of $\Pi$ is simply the first column multiplied by $\beta$, so that there is now dependence between the columns of the matrix. The matrix $\Pi$ is now referred to as having reduced rank, in this case rank one. If the matrix $\Pi$ has rank zero then the system becomes

$$\begin{bmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{bmatrix} = \begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}, \quad (5.27)$$

in which both $y_{1,t}$ and $y_{2,t}$ are nonstationary.

It is now apparent from equations (5.25) to (5.27) that testing for cointegration is equivalent to testing the validity of restrictions on the matrix $\Pi$, or determining the rank of this matrix. In other words, testing for cointegration amounts to testing whether the matrix $\Pi$ has reduced rank. As the rank of the matrix is determined by the number of significant eigenvalues, Johansen provides two tests of cointegration based on the eigenvalues of the matrix $\Pi$, known as the maximal eigenvalue test and the trace test, respectively (Johansen, 1988, 1991, 1995). Testing for cointegration based on the eigenvalues of $\Pi$ is now widely used because it has two advantages over the two-step residual-based test: the tests generate the correct p-values, and the tests are easily applied in a multivariate context where testing for several cointegrating equations jointly is required.

The Johansen cointegration test proceeds sequentially. If there are two variables being tested for cointegration, the maximum number of hypotheses considered is two. If there are N variables being tested for possible cointegration, the maximum number of hypotheses considered is N.

Stage 1:
H0: No cointegrating equations
H1: One or more cointegrating equations

Under the null hypothesis all of the variables are I(1) and there is no linear combination of the variables that achieves cointegration.
Under the alternative hypothesis there is (at least) one linear combination of the I(1) variables that yields a stationary disturbance, and hence cointegration. If the null hypothesis is not rejected then the hypothesis testing stops. Alternatively, if the null hypothesis is rejected, it could be the case that there is more than one linear combination of the variables that achieves stationarity, so the process continues.

Stage 2:
H0: One cointegrating equation
H1: Two or more cointegrating equations

If the null hypothesis is not rejected, the testing procedure stops with the conclusion that there is one cointegrating equation. Otherwise proceed to the next stage.

Stage N:
H0: N − 1 cointegrating equations
H1: All variables are stationary

At the final stage, the alternative hypothesis is that all variables are stationary and not that there are N cointegrating equations. For there to be N stationary linear combinations of the variables, the variables need to be stationary in the first place.

Large values of the Johansen cointegration statistic relative to the critical value result in rejection of the null hypothesis. Alternatively, small p-values, less than 0.05 for example, represent a rejection of the null hypothesis at the 5% level. In performing the cointegration test, it is necessary to specify the VECM to be used in the estimation of the matrix $\Pi$. The deterministic components (constant and time trend), as well as the number of lagged dependent variables needed to capture autocorrelation in the residuals, must be specified.
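In R both variants of the test are available through the ca.jo function in the urca package. The fragment below is a hedged sketch: it assumes pdy is a matrix containing the logarithms of real prices, dividends and earnings, and that Model 3 corresponds to ecdet = "none" in urca, which leaves the constant unrestricted in the VAR; check the package documentation before relying on this mapping.

```r
library(urca)

# Johansen tests with 2 lags in the underlying VAR
trace_test <- ca.jo(pdy, type = "trace", ecdet = "none", K = 2)
eigen_test <- ca.jo(pdy, type = "eigen", ecdet = "none", K = 2)

summary(trace_test)   # trace statistics and critical values
summary(eigen_test)   # maximal eigenvalue statistics and critical values
```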
The results of the Johansen cointegration test applied to the United States equity prices, dividends and earnings data are given in Table 5.5. Results are provided for the dividend model, the earnings model and a combined model which tests all three variables simultaneously.

Table 5.5 Johansen tests of cointegration between United States equity prices, dividends and earnings. Testing is based on Model 3 (unrestricted constant) with 2 lags in the underlying VAR.

Dividend Model
Rank   Eigenvalue   Trace Statistic   5% CV   Max Statistic   5% CV
0          ·            32.2643       15.41      30.8132      14.07
1       0.01907          1.4510        3.76       1.4510       3.76
2       0.00091            ·             ·           ·            ·

Earnings Model
Rank   Eigenvalue   Trace Statistic   5% CV   Max Statistic   5% CV
0          ·            33.1124       15.41      32.1310      14.07
1       0.01988          0.9814        3.76       0.9814       3.76
2       0.00061            ·             ·           ·            ·

Combined Model
Rank   Eigenvalue   Trace Statistic   5% CV   Max Statistic   5% CV
0          ·           109.6699       29.68      83.0022      20.97
1       0.05055         26.6677       15.41      25.4183      14.07
2       0.01576          1.2495        3.76       1.2495       3.76
3       0.00078            ·             ·           ·            ·

For the first two models, N = 2, so the maximum rank of the $\Pi$ matrix is 2. Inspection of the first null hypothesis of zero rank, or no cointegration, shows that the null hypothesis is easily rejected at the 5% level for both the dividend and earnings models. There is therefore at least one cointegrating vector in both of these specifications. The next hypothesis corresponds to $\Pi$ having rank one, or there being one cointegrating equation. The null hypothesis is not rejected at the 5% level for either model, so the conclusion is that there is one cointegrating equation that combines prices and dividends, and one cointegrating equation that combines prices and earnings, into stationary series.

The results of the Johansen cointegration test applied to the combined model of real equity prices, real dividends and real earnings per share are also given in Table 5.5. The body of the table now allows for three testable hypotheses, as there are N = 3 variables being examined. The first null hypothesis of zero rank, or no cointegration, is easily rejected at the 5% level, so there is at least one linear combination of these variables that is stationary. The next hypothesis corresponds to $\Pi$ having rank one, or there being one cointegrating equation. The null hypothesis is again rejected at the 5% level, so there are at least two cointegrating relationships between these three variables. The null hypothesis of a rank of two cannot be rejected at the 5% level, so the conclusion is that there are two linear combinations of these three variables that produce a stationary residual.

5.9 Multivariate Cointegration

The results of the Johansen cointegration test applied to the three-variable system of real equity prices, real dividends and real earnings per share in the previous section indicated that there are two cointegrating vectors. There are thus two combinations of these three nonstationary variables that yield stationary residuals. The next logical step is to estimate a VECM which takes all three variables as arguments and imposes a cointegrating rank of two on the estimation. The results of this estimation are shown in Table 5.6.

Table 5.6 Estimates of a three-variable VECM(1) for equity prices, dividends and earnings per share using the Johansen estimator based on Model 3 (unrestricted constant). The sample period is January 1871 to June 2004. Standard errors in parentheses. The two estimated cointegrating equations are

p_t = 1.072 y_t + 2.798   [Ecm1]
      (0.042)
d_t = 0.910 y_t − 0.445   [Ecm2]
      (0.012)

Variable       ∆p_t        ∆d_t        ∆y_t
Ecm1         -0.0082      0.0017      0.0029
             (0.0034)    (0.0004)    (0.0010)
Ecm2          0.0014     -0.0072      0.0049
             (0.0069)    (0.0009)    (0.0020)
∆p_{t-1}      0.2868     -0.0020      0.01339
             (0.0242)    (0.0032)    (0.0070)
∆d_{t-1}      0.3674      0.8194      0.0542
             (0.1015)    (0.0133)    (0.0292)
∆y_{t-1}      0.0699      0.0235      0.8748
             (0.0465)    (0.0061)    (0.0133)
Constant      0.0005      0.0006      0.0009
             (0.0012)    (0.0001)    (0.0004)

The interpretation of the results in Table 5.6 proceeds as follows.

(1) Cointegrating equations:
The first cointegrating equation estimates the long-run relationship between prices and earnings and is normalised with respect to price. The second cointegrating relationship is between dividends and earnings, normalised with respect to dividends.

(2) Speed of adjustment parameters:
The signs and significance of the speed of adjustment parameters on the error correction terms help to establish the stability of the estimated relationships. Stability requires that the coefficient of adjustment on the error correction term in the equation for $\Delta p_t$ be negative. This is indeed the case, and the estimate is also significant, although only marginally so. The coefficient of adjustment in the earnings equation is positive and significant, which is also required by theory. Interestingly, the adjustment coefficient in the dividend equation is also significant. This is to be expected because earnings and dividends are closely related, as demonstrated by the second cointegrating equation. What this suggests is that dividends and earnings adjust more aggressively than prices do to correct any deviation from long-run equilibrium. As expected, the adjustment parameter on the second error-correction term is negative and significant in the dividend equation and positive and significant in the earnings equation. Notice, however, that the coefficient of adjustment on Ecm2 in the $\Delta p_t$ equation is insignificant, which is to be expected given that price is not expected to adjust to a divergence from the long-run equilibrium between dividends and earnings.
(3) Dynamic parameters:
The first test of interest on the parameters of the VECM relates to the significance of the constant terms in the short-run dynamic specification of the system. This bears on the choice of Model 3 (unrestricted constant) as opposed to Model 2 (restricted constant), where the constant term appears only in the cointegrating equations. Although the constants are all small in absolute size, at least two of them appear to be estimated fairly precisely. The joint hypothesis that they are all zero, or equivalently that Model 2 is preferable to Model 3, is therefore unlikely to be accepted.

An important issue in estimating multivariate systems in which there are cointegrating relationships is that the estimates of the cointegrating vectors are not unique, but depend on the normalisation rules which are adopted. For example, the results obtained when estimating this three-variable system while imposing the normalisation rule that both cointegrating equations are normalised on $p_t$ are reported in Table 5.7.

Table 5.7 Estimates of the three-variable VECM for equity prices, dividends and earnings per share using the Johansen estimator. Estimates are based on Model 3 (unrestricted constant) with 1 lag of the differenced variables. The sample period is January 1871 to June 2004. Standard errors in parentheses. The two estimated cointegrating equations are

p_t = 1.072 y_t + 2.798   [Ecm1]
      (0.039)
p_t = 1.177 d_t + 3.323   [Ecm2]
      (0.039)

Variable       ∆p_t        ∆d_t        ∆y_t
Ecm1         -0.0070     -0.0045      0.0071
             (0.0051)    (0.0007)    (0.0015)
Ecm2          0.0012      0.0062     -0.0042
             (0.0059)    (0.0008)    (0.0017)
∆p_{t-1}      0.2868     -0.0020      0.01339
             (0.0242)    (0.0032)    (0.0070)
∆d_{t-1}      0.3674      0.8194      0.0542
             (0.1015)    (0.0133)    (0.0292)
∆y_{t-1}      0.0699      0.0235      0.8748
             (0.0465)    (0.0061)    (0.0133)
Constant      0.0005      0.0006      0.0009
             (0.0012)    (0.0001)    (0.0004)

The two cointegrating regressions reported in Table 5.7 are now the familiar expressions that have been dealt with in the bivariate cases throughout the chapter (see, for example, Table 5.2). While this seems to contradict the results reported in Table 5.6, the two sets of long-run relationships are easily reconciled. It follows directly from the results in Table 5.7 that

$$p_t = 1.177\,d_t = 1.072\,y_t \quad \Rightarrow \quad d_t = (1.072/1.177)\,y_t = 0.9107\,y_t,$$

which corresponds to the second cointegrating equation in Table 5.6 (the intercepts reconcile in the same way). One final interesting point to note is that Table 5.7 confirms the rather weak adjustment of prices to any disequilibrium. Both the adjustment parameters on Ecm1 and Ecm2 in the $\Delta p_t$ equation of this specification are insignificantly different from zero. What this suggests is that dividends and earnings per share tend to pick up most of the adjustment in response to shocks which disturb the long-run equilibrium.

Multivariate cointegration modelling is a very useful tool in dealing with financial models and will be encountered again in Chapters 12 and 13. The potentially more complicated issues of testing and interpretation are left to those later chapters.

5.10 Exercises

(1) Simulating a VECM

Consider a simple bivariate VECM

$$y_{1,t} - y_{1,t-1} = \delta_1 + \alpha_1 (y_{2,t-1} - \beta y_{1,t-1} - \mu)$$
$$y_{2,t} - y_{2,t-1} = \delta_2 + \alpha_2 (y_{2,t-1} - \beta y_{1,t-1} - \mu)$$

(a) Using the initial conditions for the endogenous variables $y_1 = 100$ and $y_2 = 110$, simulate the model for 30 periods using the parameters

$$\delta_1 = \delta_2 = 0; \quad \alpha_1 = -0.5; \quad \alpha_2 = 0.1; \quad \beta = 1; \quad \mu = 0.$$

Compare the two series. Also check to see that the long-run value of $y_2$ is given by $\beta y_1 + \mu$. (A sketch of one possible implementation in R appears after this exercise.)

(b) Simulate the model using the following parameters:

$$\delta_1 = \delta_2 = 0; \quad \alpha_1 = -1.0; \quad \alpha_2 = 0.1; \quad \beta = 1; \quad \mu = 0.$$

Compare the resultant series with those in (a) and hence comment on the role of the error correction parameter $\alpha_1$.

(c) Simulate the model using the following parameters:

$$\delta_1 = \delta_2 = 0; \quad \alpha_1 = 1.0; \quad \alpha_2 = -0.1; \quad \beta = 1; \quad \mu = 0.$$

Compare the resultant series with the previous ones and hence comment on the relationship between stability and cointegration.

(d) Simulate the model using the following parameters:

$$\delta_1 = \delta_2 = 0; \quad \alpha_1 = -1.0; \quad \alpha_2 = 0.1; \quad \beta = 1; \quad \mu = 10.$$

Comment on the role of the parameter $\mu$. Also check to see that the long-run value of $y_2$ is given by $\beta y_1 + \mu$.

(e) Simulate the model using the following parameters:

$$\delta_1 = \delta_2 = 1; \quad \alpha_1 = -1.0; \quad \alpha_2 = 0.1; \quad \beta = 1; \quad \mu = 0.$$

Comment on the role of the parameters $\delta_1$ and $\delta_2$.

(f) Explore a richer class of models which also include short-run dynamics. For example, consider the model

$$y_{1,t} - y_{1,t-1} = \delta_1 + \alpha_1 (y_{2,t-1} - \beta y_{1,t-1} - \mu) + \phi_{11}(y_{1,t-1} - y_{1,t-2}) + \phi_{12}(y_{2,t-1} - y_{2,t-2})$$
$$y_{2,t} - y_{2,t-1} = \delta_2 + \alpha_2 (y_{2,t-1} - \beta y_{1,t-1} - \mu) + \phi_{21}(y_{1,t-1} - y_{1,t-2}) + \phi_{22}(y_{2,t-1} - y_{2,t-2})$$
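A minimal R sketch of the simulation in part (a) is given below. The function is hypothetical and deterministic (no disturbance terms are added), and parts (b) to (e) can be reproduced simply by changing the arguments.

```r
simulate_vecm <- function(n = 30, y1_0 = 100, y2_0 = 110,
                          delta1 = 0, delta2 = 0,
                          alpha1 = -0.5, alpha2 = 0.1,
                          beta = 1, mu = 0) {
  y1 <- numeric(n); y2 <- numeric(n)
  y1[1] <- y1_0;    y2[1] <- y2_0
  for (t in 2:n) {
    ecm   <- y2[t - 1] - beta * y1[t - 1] - mu   # lagged equilibrium error
    y1[t] <- y1[t - 1] + delta1 + alpha1 * ecm
    y2[t] <- y2[t - 1] + delta2 + alpha2 * ecm
  }
  cbind(y1 = y1, y2 = y2)
}

out <- simulate_vecm()
matplot(out, type = "l", lty = 1)   # inspect the paths of the two series
```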
(2) The Present Value Model
pv.wf1, pv.dta, pv.xlsx

The present value model predicts the following relationship between the two series

$$p_t = \beta_0 + \beta_1 d_t + u_t,$$

where $p_t$ is the natural logarithm of the real price of equities, $d_t$ is the natural logarithm of real dividend payments, $u_t$ is a disturbance term, $\beta_0$ is related to the discount rate, and $\beta_1 = 1$.

(a) Test for cointegration between $p_t$ and $d_t$ using Model 3 and p = 1 lag.
(b) Given the results in part (a), estimate a bivariate ECM for $p_t$ and $d_t$ using Model 3 with p = 1 lag. Interpret the results, paying particular attention to the long-run parameter estimates $\beta_0$ and $\beta_1$ and the error correction parameter estimates $\hat\alpha_i$.
(c) Derive an estimate of the long-run real discount rate from $R = \exp(-\beta_0)$ and interpret the result.
(d) Test the restriction $H_0: \beta_1 = 1$.
(e) Discuss whether the empirical results support the present value model.

(3) Forward Market Efficiency
spot.wf1, spot.dta, spot.xlsx

The data for this question were obtained from Corbae, Lim and Ouliaris (1992), who test for speculative efficiency by considering the equation

$$s_t = \beta_0 + \beta_1 f_{t-n} + u_t,$$

where $s_t$ is the natural logarithm of the spot rate, $f_{t-n}$ is the natural logarithm of the forward rate lagged n periods and $u_t$ is a disturbance term. In the case of weekly data where the forward rate is the 1-month rate, $f_{t-4}$ is an unbiased estimator of $s_t$ if $\beta_1 = 1$.

(a) Use unit root tests to determine the level of integration of $s_t$, $f_{t-1}$, $f_{t-2}$ and $f_{t-3}$.
(b) Test for cointegration between $s_t$ and $f_{t-4}$ using Model 2 with p = 0 lags.
(c) Provided that the two rates are cointegrated, estimate a bivariate VECM for $s_t$ and $f_{t-4}$ using Model 2 with p = 0 lags.
(d) Interpret the coefficients $\beta_0$ and $\beta_1$. In particular, test that $\beta_1 = 1$.
(e) Repeat these tests for the 3-month and 6-month forward rates. Hint: remember that the frequency of the data is weekly.

(4) Spurious Regression Problem
Program files nts_spurious1.*, nts_spurious2.*

A spurious relationship occurs when two independent variables are incorrectly identified as being related. A simple test of independence is based on the estimated correlation coefficient, $\hat\rho$.
(a) Consider the following bivariate models

    (i)   y1,t = v1,t                       y2,t = v2,t
    (ii)  y1,t = y1,t−1 + v1,t              y2,t = y2,t−1 + v2,t
    (iii) y1,t = y1,t−1 + v1,t              y2,t = 2y2,t−1 − y2,t−2 + v2,t
    (iv)  y1,t = 2y1,t−1 − y1,t−2 + v1,t    y2,t = 2y2,t−1 − y2,t−2 + v2,t

in which v1,t and v2,t are iid N(0, σ2) with σ2 = 1. Simulate each bivariate model 10000 times for a sample of size T = 100 and compute the correlation coefficient, ρ̂, for each draw. Compute the sampling distributions of ρ̂ for the four sets of bivariate models and discuss the properties of these distributions in the context of the spurious regression problem.
(b) Repeat part (a) with T = 500. What do you conclude?
(c) Repeat part (a), except for each draw estimate the regression model

    y2,t = β0 + β1 y1,t + ut,   ut ∼ iid (0, σ2).

Compute the sampling distributions of the least squares estimator β̂1 and its t statistic for the four sets of bivariate models. Discuss the properties of these distributions in the context of the spurious regression problem.

(5) Fisher Hypothesis

fisher.wf1, fisher.dta, fisher.xlsx

Under the Fisher hypothesis the nominal interest rate fully reflects the long-run movements in the inflation rate. The Fisher hypothesis is represented by

    it = β0 + β1 πt + ut,

where ut is a disturbance term and the slope parameter is β1 = 1.

(a) Construct the percentage annualised inflation rate, πt.
(b) Perform unit root tests to determine the level of integration of the nominal interest rate and inflation. In performing the unit root tests, test the sensitivity of the results by using a model with a constant and no time trend, and a model with a constant and a time trend. Let the lags be determined by the automatic lag length selection procedure. Discuss the results in terms of the level of integration of each series.
(c) Estimate a bivariate VAR with a constant and use the SIC lag length criterion to determine the optimal lag structure.
(d) Test for cointegration between it and πt using Model 2 with the number of lags based on the optimal lag length obtained from the estimated VAR. Remember that if the optimal lag length of the VAR is p, the lag structure of the VECM is p − 1.
(e) Redo part (d) subject to the restriction that β1 = 1.
(f) Does the Fisher hypothesis hold in the long run? Discuss.

(6) Purchasing Power Parity

ppp.wf1, ppp.dta, ppp.xlsx

Under the assumption of purchasing power parity (PPP), the nominal exchange rate adjusts in the long run to the price differential between the foreign and domestic countries,

    S = P/F.

This suggests that the relationship between the nominal exchange rate and the prices in the two countries is given by

    st = β0 + β1 pt + β2 ft + ut,

where lower case letters denote natural logarithms and ut is a disturbance term which represents departures from PPP, with β2 = −β1.

(a) Construct the relevant variables, s, f, p and the difference diff = p − f.
(b) Use unit root tests to determine the level of integration of all of these series. In performing the unit root tests, test the sensitivity of the results by using a model with a constant and no time trend, and a model with a constant and a time trend. Let the lags be p = 12. Discuss the results in terms of the level of integration of each series.
(c) Test for cointegration between s, p and f using Model 3 with p = 12 lags.
(d) Given the results in part (c), estimate a trivariate ECM for s, p and f using Model 3 and p = 12 lags. Write out the estimated model (the cointegrating equation(s) and the ECM).
(e) Interpret the long-run parameter estimates. Hint: if the number of cointegrating equations is greater than one, it is helpful to rearrange the cointegrating equations so that one of the equations expresses s as a function of p and f.
(f) Interpret the error correction parameter estimates.
(g) Interpret the short-run parameter estimates.
(h) Test the restriction H0 : β2 = −β1.
(i) Discuss the long-run properties of the $/AUD foreign exchange market.

6 Forecasting

6.1 Introduction

The future values of variables are important inputs into the current decision making of agents in financial markets, and forecasting methods are therefore widely used in financial markets. Formally, a forecast is a quantitative estimate of the most likely value of a variable based on past and current information, where the relationship between variables is embodied in an estimated model. In the previous chapters a wide variety of econometric models have been introduced, ranging from univariate to multivariate time series models, and from single equation regression models to multivariate vector autoregressive models. The specification and estimation of these financial models provides a mechanism for producing forecasts that are objective in the sense that the forecasts can be recomputed exactly by knowing the structure of the model and the data used to estimate it. This contrasts with back-of-the-envelope methods which are not reproducible. Forecasting can also serve as a method for comparing alternative models: forecasting methods not only provide an important way to choose between alternative models, but also a way of combining the information contained in forecasts produced by different models.

6.2 Types of Forecasts

Illustrative examples of forecasting in financial markets abound.

(i) The determination of the price of an asset based on present value methods requires discounting the present and future dividend stream at a discount rate that potentially may change over time.
(ii) Firms are interested in forecasting the future health of the economy when making decisions about current capital outlays because this investment earns a stream of returns over time.
(iii) In currency markets, forward exchange rates provide an estimate, or forecast, of the future spot exchange rate.
(iv) In options markets, the Black-Scholes method for pricing options is based on the assumption that the volatility of the underlying asset on which the option is written is constant over the life of the option.
(v) In futures markets, buyers and sellers enter a contract to buy and sell commodities at a future date.
(vi) Model-based computation of Value-at-Risk requires repeated forecasting of the value of a portfolio over a given time horizon.

Although all these examples are vastly different, the forecasting principles in each case are identical. Before delving into the actual process of generating forecasts it is useful to establish some terminology. Consider an observed sample of data {y1, y2, · · · , yT} and suppose that an econometric model is to be used to generate forecasts of y over a horizon of H periods. The forecasts of y, which are denoted ŷ, are of two main types.

Ex Ante Forecasts: The entire sample {y1, y2, · · · , yT} is used to estimate the model and the task is to forecast the variable over a horizon of H periods beginning after the last observation of the dataset.

Ex Post Forecasts: The model is estimated over a restricted sample period that excludes the last H observations, {y1, y2, · · · , yT−H}.
The model is then used to forecast out-of-sample over these H observations; as the actual values of these observations have already been observed, it is possible to compare the accuracy of the forecasts with the actual values. Ex post and ex ante forecasts may be illustrated as follows:

    Sample:   y1, y2, · · · , yT−H, yT−H+1, yT−H+2, · · · , yT
    Ex Post:  y1, y2, · · · , yT−H, ŷT−H+1, ŷT−H+2, · · · , ŷT
    Ex Ante:  y1, y2, · · · , yT−H, yT−H+1, yT−H+2, · · · , yT, ŷT+1, · · · , ŷT+H

It is clear therefore that forecasting ex ante for H periods ahead requires the successive generation of ŷT+1, ŷT+2, up to and including ŷT+H. This is referred to as a multi-step forecast. On the other hand, ex post forecasting allows some latitude for choice. The forecast ŷT−H+1 is based on data up to and including yT−H. In generating the forecast ŷT−H+2 the observation yT−H+1 is available for use. Forecasts that use this observation are referred to as one-step-ahead or static forecasts. Ex post forecasting also allows multi-step forecasting using data up to and including yT−H only, and this is known as dynamic forecasting.

There is a distinction between forecasting based on dynamic time series models and forecasts based on broader linear or nonlinear regression models. Forecasts based on the dynamic univariate or multivariate time series models developed in Chapter 3 are referred to as recursive forecasts. Forecasts that are based on econometric models that relate one variable to another, as in the linear regression model outlined in Chapter 2, are known as structural forecasts. It should be noted, however, that the distinction between these two types of forecasts is often unclear as econometric models often contain both structural and dynamic time series features. An area in forecasting that has attracted a lot of recent interest and which incorporates both recursive and structural elements is the problem of predictive regressions, dealt with in Section 6.9.

Finally, a forecast in which only a single figure, say ŷT+H, is reported for period T+H is known as a point forecast. The point forecast represents the best guess of the value of yT+H. Even if this guess is a particularly good one and it is known that on average the forecast is correct, or more formally E[ŷT+H] = yT+H, there is some uncertainty associated with every forecast. Interval forecasts encapsulate this uncertainty by providing a range of forecast values for ŷT+H within which the actual value yT+H is expected to be found at some given level of confidence.

6.3 Forecasting with Univariate Time Series Models

To understand the basic principles of forecasting financial econometric models, the simplest example, namely a univariate autoregressive model with one lag, the AR(1) model, is sufficient to demonstrate the key elements. Extending the model to more complicated univariate and multivariate models only increases the complexity of the computation but not the underlying technique by which the forecasts are generated. Consider the AR(1) model

    yt = φ0 + φ1 yt−1 + vt.   (6.1)

Suppose that the data consist of T sample observations y1, y2, · · · , yT. Now consider using the model to forecast the variable one period into the future, at T+1. The model at time T+1 is

    yT+1 = φ0 + φ1 yT + vT+1.   (6.2)

To be able to compute a forecast of yT+1 it is necessary to know everything on the right-hand side of equation (6.2).
Inspection of this equation reveals that some of these terms are known and some are unknown at time T:

    Observations:  yT          Known
    Parameters:    φ0, φ1      Unknown
    Disturbance:   vT+1        Unknown

The aim of forecasting is to replace the unknowns with the best guess of these quantities. In the case of the parameters, the best guess is simply to replace them with their point estimates, φ̂0 and φ̂1, where all the sample data are used to obtain the estimates. Formally this involves using the mean of the sampling distribution to replace the population parameters φ0, φ1 by their sample estimates. Adopting the same strategy, the unknown disturbance term vT+1 in (6.2) is replaced by the mean of its distribution, namely E[vT+1] = 0. The resulting forecast of yT+1 based on equation (6.2) is given by

    ŷT+1 = φ̂0 + φ̂1 yT + 0 = φ̂0 + φ̂1 yT,   (6.3)

where the replacement of yT+1 by ŷT+1 emphasises the fact that the latter is a forecast quantity.

Now consider extending the forecast range to T+2, the second period after the end of the sample period. The strategy is the same as before, with the first step being to express the model at time T+2 as

    yT+2 = φ0 + φ1 yT+1 + vT+2,   (6.4)

in which all terms are now unknown at the end of the sample at time T:

    Parameters:    φ0, φ1      Unknown
    Observations:  yT+1        Unknown
    Disturbance:   vT+2        Unknown

As before, replace the parameters φ0 and φ1 by their sample estimators, φ̂0 and φ̂1, and the disturbance vT+2 by its mean E[vT+2] = 0. What is new in equation (6.4) is the appearance of the unknown quantity yT+1 on the right-hand side of the equation. Again, adopting the strategy of replacing unknowns by a best guess requires that the forecast of this variable obtained in the previous step, ŷT+1, be used. Accordingly, the forecast for the second period is

    ŷT+2 = φ̂0 + φ̂1 ŷT+1 + 0 = φ̂0 + φ̂1 ŷT+1.

Clearly, extending this analysis to a horizon of H periods implies a forecasting equation of the form

    ŷT+H = φ̂0 + φ̂1 ŷT+H−1 + 0 = φ̂0 + φ̂1 ŷT+H−1.

The need to use the forecast from the previous step to generate a forecast in the next step is commonly referred to as recursive forecasting. Moreover, as all of the information embedded in the forecasts ŷT+1, ŷT+2, · · · , ŷT+H is based on information up to and including the last observation in the sample at time T, the forecasts are commonly referred to as conditional mean forecasts, where the conditioning is based on information at time T.

Extending the AR(1) model to an AR(2) model,

    yt = φ0 + φ1 yt−1 + φ2 yt−2 + vt,

involves the same strategy to forecast yt. Writing the model at time T+1 gives

    yT+1 = φ0 + φ1 yT + φ2 yT−1 + vT+1.

Replacing the parameters {φ0, φ1, φ2} by their sample estimators {φ̂0, φ̂1, φ̂2} and the disturbance vT+1 by its mean E[vT+1] = 0, the forecast for the first period into the future is

    ŷT+1 = φ̂0 + φ̂1 yT + φ̂2 yT−1.

To generate the forecast for the second period, the AR(2) model is written at time T+2,

    yT+2 = φ0 + φ1 yT+1 + φ2 yT + vT+2.

Replacing all of the unknowns on the right-hand side by their appropriate best guesses gives

    ŷT+2 = φ̂0 + φ̂1 ŷT+1 + φ̂2 yT.

To derive the forecast of yt at time T+3 the AR(2) model is written at T+3,

    yT+3 = φ0 + φ1 yT+2 + φ2 yT+1 + vT+3.

Now all terms on the right-hand side are unknown and the forecasting equation becomes

    ŷT+3 = φ̂0 + φ̂1 ŷT+2 + φ̂2 ŷT+1.

This univariate recursive forecasting procedure is easily demonstrated.
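Before turning to the data, the recursion itself can be sketched in a few lines of R. This is a minimal sketch in which the estimates φ̂0 and φ̂1 are assumed to be available already; the numerical values in the final line are purely illustrative.

    ar1_forecast <- function(phi0, phi1, yT, H) {
      yhat <- numeric(H)
      ylag <- yT                       # last observation in the sample
      for (h in 1:H) {
        yhat[h] <- phi0 + phi1 * ylag  # unknown disturbance replaced by 0
        ylag <- yhat[h]                # previous forecast feeds the next step
      }
      yhat
    }

    ar1_forecast(phi0 = 0.25, phi1 = 0.29, yT = 2.68, H = 6)

The same pattern extends directly to the AR(2) case by carrying two lags through the loop.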
Consider the logarithm of the monthly United States equity index, pt, for which data are available from February 1871 to June 2004, and the associated returns, rpt = pt − pt−1, expressed as percentages.

Ex ante forecasts
To generate ex ante forecasts of returns using a simple AR(1) model, the parameters are estimated using the entire available sample period and these estimates, together with the actual return for June 2004, are used to generate the recursive forecasts. Consider the case where ex ante forecasts are required for July and August 2004. The estimated model is

    rpt = 0.2472 + 0.2853 rpt−1 + v̂t,

where v̂t is the least squares residual. Given that the actual return for June 2004 is 2.6823%, the forecasts for July and August are, respectively,

    July:    r̂pT+1 = 0.2472 + 0.2853 rpT   = 0.2472 + 0.2853 × 2.6823 = 1.0122%
    August:  r̂pT+2 = 0.2472 + 0.2853 r̂pT+1 = 0.2472 + 0.2853 × 1.0122 = 0.5360%

Ex post forecasts
Suppose now that ex post forecasts are required for the period January 2004 to June 2004. The model is now estimated over the period February 1871 to December 2003 to yield

    rpt = 0.2459 + 0.2856 rpt−1 + v̂t,

where v̂t is the least squares residual. The forecasts are now generated recursively using the estimated model together with the fact that the equity return in December 2003 is 2.8858%:

    January:   r̂pT+1 = 0.2459 + 0.2856 rpT    = 0.2459 + 0.2856 × 2.8858 = 1.0701%
    February:  r̂pT+2 = 0.2459 + 0.2856 r̂pT+1 = 0.2459 + 0.2856 × 1.0701 = 0.5515%
    March:     r̂pT+3 = 0.2459 + 0.2856 r̂pT+2 = 0.2459 + 0.2856 × 0.5515 = 0.4034%
    April:     r̂pT+4 = 0.2459 + 0.2856 r̂pT+3 = 0.2459 + 0.2856 × 0.4034 = 0.3611%
    May:       r̂pT+5 = 0.2459 + 0.2856 r̂pT+4 = 0.2459 + 0.2856 × 0.3611 = 0.3490%
    June:      r̂pT+6 = 0.2459 + 0.2856 r̂pT+5 = 0.2459 + 0.2856 × 0.3490 = 0.3456%

The forecasts are illustrated in Figure 6.1. It is readily apparent how quickly the forecasts are driven toward the unconditional mean of returns. This is typical of time series forecasts.

Figure 6.1 Forecasts (dashed line) of United States equity returns generated by an AR(1) model. The estimation sample period is February 1871 to December 2003 and the forecast period is from January 2004 to June 2004.

6.4 Forecasting with Multivariate Time Series Models

The recursive method used to generate the forecasts of a univariate time series model is easily generalised to multivariate models.

6.4.1 Vector Autoregressions

Consider a bivariate vector autoregression with one lag, VAR(1), given by

    y1,t = φ10 + φ11 y1,t−1 + φ12 y2,t−1 + v1,t
    y2,t = φ20 + φ21 y1,t−1 + φ22 y2,t−1 + v2,t.   (6.5)

Given data up to time T, a forecast one period ahead is obtained by writing the model at time T+1,

    y1,T+1 = φ10 + φ11 y1,T + φ12 y2,T + v1,T+1
    y2,T+1 = φ20 + φ21 y1,T + φ22 y2,T + v2,T+1.

The knowns on the right-hand side are the last observations of the two variables, y1,T and y2,T, and the unknowns are the disturbance terms v1,T+1 and v2,T+1 and the parameters {φ10, φ11, φ12, φ20, φ21, φ22}. Replacing the unknowns by the best guesses, as in the univariate AR model, yields the following forecasts for the two variables at time T+1:

    ŷ1,T+1 = φ̂10 + φ̂11 y1,T + φ̂12 y2,T
    ŷ2,T+1 = φ̂20 + φ̂21 y1,T + φ̂22 y2,T.
To generate forecasts of the VAR(1) model in (6.5) two periods ahead, the model is written at time T+2,

    y1,T+2 = φ10 + φ11 y1,T+1 + φ12 y2,T+1 + v1,T+2
    y2,T+2 = φ20 + φ21 y1,T+1 + φ22 y2,T+1 + v2,T+2.

Now all terms on the right-hand side are unknown. As before, the parameters are replaced by their estimators and the disturbances are replaced by their means, while y1,T+1 and y2,T+1 are replaced by their forecasts from the previous step, resulting in the two-period ahead forecasts

    ŷ1,T+2 = φ̂10 + φ̂11 ŷ1,T+1 + φ̂12 ŷ2,T+1
    ŷ2,T+2 = φ̂20 + φ̂21 ŷ1,T+1 + φ̂22 ŷ2,T+1.

In general, the forecasts of the VAR(1) model for H periods ahead are

    ŷ1,T+H = φ̂10 + φ̂11 ŷ1,T+H−1 + φ̂12 ŷ2,T+H−1
    ŷ2,T+H = φ̂20 + φ̂21 ŷ1,T+H−1 + φ̂22 ŷ2,T+H−1.

An important feature of this result is that even if forecasts are required for just one of the variables, say y1,t, it is necessary to generate forecasts of the other variables as well.

To illustrate forecasting using a VAR, consider, in addition to the logarithm of the equity index, pt, and the associated returns, rpt, the logarithm of real dividends, dt, and the returns to dividends, rdt. As before, data are available for the period February 1871 to June 2004 and suppose that ex ante forecasts are required for July and August 2004. The estimated bivariate VAR is

    rpt = 0.2149 + 0.2849 rpt−1 + 0.1219 rdt−1 + v̂1,t
    rdt = 0.0301 + 0.0024 rpt−1 + 0.8862 rdt−1 + v̂2,t,

where v̂1,t and v̂2,t are the residuals from the two equations. The forecasts for equity and dividend returns in July are

    r̂pT+1 = 0.2149 + 0.2849 rpT + 0.1219 rdT
           = 0.2149 + 0.2849 × 2.6823 + 0.1219 × 1.0449 = 1.1065%
    r̂dT+1 = 0.0301 + 0.0024 rpT + 0.8862 rdT
           = 0.0301 + 0.0024 × 2.6823 + 0.8862 × 1.0449 = 0.9625%.

The corresponding forecasts for August are

    r̂pT+2 = 0.2149 + 0.2849 r̂pT+1 + 0.1219 r̂dT+1
           = 0.2149 + 0.2849 × 1.1065 + 0.1219 × 0.9625 = 0.6475%
    r̂dT+2 = 0.0301 + 0.0024 r̂pT+1 + 0.8862 r̂dT+1
           = 0.0301 + 0.0024 × 1.1065 + 0.8862 × 0.9625 = 0.8857%.
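In matrix form the recursion is ŷT+h = ν̂ + Â ŷT+h−1, which makes the R implementation immediate. The following sketch reproduces the two ex ante forecasts just reported, using the coefficient estimates given above; the object names are illustrative.

    nu <- c(0.2149, 0.0301)            # intercepts: rp and rd equations
    A  <- rbind(c(0.2849, 0.1219),     # lag coefficients, rp equation
                c(0.0024, 0.8862))     # lag coefficients, rd equation
    ylag <- c(2.6823, 1.0449)          # June 2004 values of (rp, rd)

    H  <- 2
    fc <- matrix(NA, H, 2, dimnames = list(c("July", "August"), c("rp", "rd")))
    for (h in 1:H) {
      ylag <- as.numeric(nu + A %*% ylag)  # one step of the recursion
      fc[h, ] <- ylag
    }
    fc                                 # forecasts in percentages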
6.4.2 Vector Error Correction Models

An important relationship between vector autoregressions and the vector error correction models discussed in Chapter 5 is that a VECM represents a restricted VAR. This suggests that a VECM can be re-expressed as a VAR which, in turn, can be used to forecast the variables of the model. Consider the following bivariate VECM containing one lag:

    ∆y1,t = γ1 (y2,t−1 − βy1,t−1 − µ) + π11 ∆y1,t−1 + π12 ∆y2,t−1 + v1,t
    ∆y2,t = γ2 (y2,t−1 − βy1,t−1 − µ) + π21 ∆y1,t−1 + π22 ∆y2,t−1 + v2,t.

Rearranging the VECM as a (restricted) VAR(2) in the levels of the variables gives

    y1,t = −γ1µ + (1 + π11 − γ1β) y1,t−1 − π11 y1,t−2 + (γ1 + π12) y2,t−1 − π12 y2,t−2 + v1,t
    y2,t = −γ2µ + (π21 − γ2β) y1,t−1 − π21 y1,t−2 + (1 + γ2 + π22) y2,t−1 − π22 y2,t−2 + v2,t.

Alternatively, it is possible to write

    y1,t = φ10 + φ11 y1,t−1 + φ12 y1,t−2 + φ13 y2,t−1 + φ14 y2,t−2 + v1,t
    y2,t = φ20 + φ21 y1,t−1 + φ22 y1,t−2 + φ23 y2,t−1 + φ24 y2,t−2 + v2,t,   (6.6)

in which the VAR and VECM parameters are related as follows:

    φ10 = −γ1µ            φ20 = −γ2µ
    φ11 = 1 + π11 − γ1β   φ21 = π21 − γ2β
    φ12 = −π11            φ22 = −π21
    φ13 = γ1 + π12        φ23 = 1 + γ2 + π22
    φ14 = −π12            φ24 = −π22.   (6.7)

Now that the VECM is re-expressed as a VAR in the levels of the variables in equation (6.6), the forecasts are generated as for a VAR, as discussed in Section 6.4.1, with the VAR parameter estimates computed from the VECM parameter estimates using the relationships in (6.7). Using the same dataset as that used in producing the ex ante VAR forecasts, the procedure is easily repeated for the VECM. The estimated VECM with an unrestricted constant (Model 3) and with two lags in the underlying VAR model is¹

    rpt = 0.2056 − 0.0066 (pt−1 − 1.1685 dt−1 − 312.9553) + 0.2911 rpt−1 + 0.1484 rdt−1 + v̂1,t
    rdt = 0.0334 + 0.0023 (pt−1 − 1.1685 dt−1 − 312.9553) + 0.0002 rpt−1 + 0.8768 rdt−1 + v̂2,t,

where v̂1,t and v̂2,t are the residuals from the two equations. Writing the VECM as a VAR in levels gives

    pt = (0.2056 + 0.0066 × 312.9553) + (1 − 0.0066 + 0.2911) pt−1 − 0.2911 pt−2
         + (0.0066 × 1.1685 + 0.1484) dt−1 − 0.1484 dt−2 + v̂1,t
    dt = (0.0334 − 0.0023 × 312.9553) + (0.0023 + 0.0002) pt−1 − 0.0002 pt−2
         + (1 − 0.0023 × 1.1685 + 0.8768) dt−1 − 0.8768 dt−2 + v̂2,t,

or

    pt = 2.2711 + 1.2845 pt−1 − 0.2911 pt−2 + 0.1561 dt−1 − 0.1484 dt−2 + v̂1,t
    dt = −0.6864 + 0.0025 pt−1 − 0.0002 pt−2 + 1.8741 dt−1 − 0.8768 dt−2 + v̂2,t.

The forecast for July log equities is

    p̂T+1 = 2.2711 + 1.2845 pT − 0.2911 pT−1 + 0.1561 dT − 0.1484 dT−1 = 704.0600,

and for July log dividends is

    d̂T+1 = −0.6864 + 0.0025 pT − 0.0002 pT−1 + 1.8741 dT − 0.8768 dT−1 = 293.3700.

Similar calculations reveal that the forecasts for August log equities and dividends are, respectively,

    p̂T+2 = 704.3400,   d̂T+2 = 294.4300.

Based on these forecasts of the logarithms of equity prices and dividends, the forecasts for the percentage equity returns in July and August 2004 are, respectively,

    r̂pT+1 = 704.0600 − 703.2412 = 0.8188%
    r̂pT+2 = 704.3400 − 704.0600 = 0.2800%,

and the corresponding forecasts for dividend returns are, respectively,

    r̂dT+1 = 293.3700 − 292.3162 = 1.0538%
    r̂dT+2 = 294.4300 − 293.3700 = 1.0600%.

¹ These estimates are the same as the estimates reported in Chapter 5 with the exception that the intercepts now reflect the fact that the variables are scaled by 100.
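The mapping in (6.7) is purely mechanical, so it is easily coded. The following R sketch returns the implied levels-VAR coefficients for a generic bivariate VECM(1); an unrestricted (Model 3) constant, if present, can simply be added to the intercepts φ10 and φ20. The function name is illustrative.

    vecm_to_var <- function(gamma, beta, mu, Pi) {
      # gamma = c(gamma1, gamma2); Pi = 2x2 matrix of short-run coefficients
      phi1 <- c(-gamma[1] * mu,                  # phi10
                1 + Pi[1, 1] - gamma[1] * beta,  # phi11
                -Pi[1, 1],                       # phi12
                gamma[1] + Pi[1, 2],             # phi13
                -Pi[1, 2])                       # phi14
      phi2 <- c(-gamma[2] * mu,
                Pi[2, 1] - gamma[2] * beta,
                -Pi[2, 1],
                1 + gamma[2] + Pi[2, 2],
                -Pi[2, 2])
      rbind(eq1 = phi1, eq2 = phi2)
    }

Once in VAR form, the recursive forecasting scheme of Section 6.4.1 applies unchanged.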
6.5 Forecast Evaluation Statistics

The discussion so far has concentrated on forecasting a variable or variables over a forecast horizon H beginning after the last observation in the dataset. This is, of course, the most common way of computing forecasts, and formally these forecasts are known as ex ante forecasts. However, it is also of interest to be able to compare the forecasts with the actual values that are realised in order to determine their accuracy. One approach is to wait until the future values are observed, but this is not convenient if an answer concerning the forecasting ability of a model is required immediately.

A common solution adopted to determine the forecast accuracy of a model is to estimate the model over a restricted sample period that excludes the last H observations. The model is then used to forecast out-of-sample over these observations; as the actual values of these observations have already been observed, it is possible to compare the accuracy of the forecasts with the actual values. As the data are already observed, forecasts computed in this way are known as ex post forecasts.

There are a number of simple summary statistics that are used to determine the accuracy of forecasts. Define the forecast errors over the forecast horizon as the differences between the actual and forecast values,

    yT+1 − ŷT+1, yT+2 − ŷT+2, · · · , yT+H − ŷT+H;

it follows immediately that the smaller the forecast errors, the better is the forecast. The most commonly used summary measures of the overall closeness of the forecasts to the actual values are:

    Mean Absolute Error:             MAE  = (1/H) Σ_{h=1}^{H} |yT+h − ŷT+h|
    Mean Absolute Percentage Error:  MAPE = (1/H) Σ_{h=1}^{H} |(yT+h − ŷT+h)/yT+h|
    Mean Square Error:               MSE  = (1/H) Σ_{h=1}^{H} (yT+h − ŷT+h)²
    Root Mean Square Error:          RMSE = √[ (1/H) Σ_{h=1}^{H} (yT+h − ŷT+h)² ]
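These four measures are one-liners in R. In the sketch below, actual and forecast are numeric vectors covering the evaluation period; the function is illustrative rather than part of any package.

    forecast_stats <- function(actual, forecast) {
      e <- actual - forecast               # forecast errors
      c(MAE  = mean(abs(e)),
        MAPE = mean(abs(e / actual)),
        MSE  = mean(e^2),
        RMSE = sqrt(mean(e^2)))
    }

Applied to the worked example that follows, forecast_stats(c(4.6892, 0.9526, -1.7095, 0.8311, -2.7352, 2.6823), c(1.0701, 0.5515, 0.4034, 0.3611, 0.3490, 0.3456)) returns an MSE of 5.4861 and an RMSE of 2.3422.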
The use of these statistics is easily demonstrated in the context of United States equity returns, rpt. To allow the generation of ex post forecasts, an AR(1) model is estimated using data for the period February 1871 to December 2003. Forecasts for the period January to June of 2004 are then used together with the observed monthly percentage returns on equities to generate the required summary statistics. To compute the MSE for the forecast period, the actual sample observations of equity returns from January 2004 to June 2004 are required. These are

    4.6892%, 0.9526%, −1.7095%, 0.8311%, −2.7352%, 2.6823%.

The MSE is

    MSE = (1/6) Σ_{h=1}^{6} (yT+h − ŷT+h)²
        = (1/6) [ (4.6892 − 1.0701)² + (0.9526 − 0.5515)² + (−1.7095 − 0.4034)²
          + (0.8311 − 0.3611)² + (−2.7352 − 0.3490)² + (2.6823 − 0.3456)² ]
        = 5.4861,

and the RMSE is

    RMSE = √[ (1/6) Σ_{h=1}^{6} (yT+h − ŷT+h)² ] = √5.4861 = 2.3422.

Taken on its own, the root mean square error of the forecast, 2.3422, does not provide a descriptive measure of the relative accuracy of the model per se, as its value can easily be changed by simply changing the units of the data. For example, expressing the data as returns and not percentage returns results in the RMSE falling by a factor of 100. Even though the RMSE is now smaller, that does not mean that the forecasting performance of the AR(1) model has improved. The way that the RMSE and the MSE are used to evaluate the forecasting performance of a model is to compute the same statistics for an alternative model: the model with the smaller RMSE or MSE is judged to be the better forecasting model.

The forecasting performance of several models is now compared. The models are an AR(1) model of equity returns, a VAR(1) model containing equity and dividend returns, and a VECM(1) based on Model 3 containing log equity prices and log dividends. Each model is estimated using a reduced sample of United States monthly percentage equity returns from February 1871 to December 2003, and the forecasts are computed from January to June of 2004. The forecasts are then compared using the MSE and RMSE statistics. The results in Table 6.1 show that the VAR(1) is the best forecasting model as it yields the smallest MSE and RMSE. The AR(1) is second best, followed by the VECM(1).

Table 6.1 Forecasting performance of models of United States monthly percentage equity returns. All models are estimated over the period January 1871 to December 2003 and the forecasts are computed from January to June of 2004.

    Forecast/Statistic    AR(1)      VAR(1)     VECM(1)
    January 2004          1.0701%    1.2241%    0.9223%
    February 2004         0.5515%    0.7333%    0.3509%
    March 2004            0.4034%    0.5780%    0.1890%
    April 2004            0.3611%    0.5200%    0.1474%
    May 2004              0.3490%    0.4912%    0.1411%
    June 2004             0.3456%    0.4721%    0.1447%
    MSE                   5.4861     5.4465     5.5560
    RMSE                  2.3422     2.3338     2.3571

There is an active research area in financial econometrics at present in which these statistical (or direct) measures of forecast performance are replaced by problem-specific (or indirect) measures of forecast performance in which the evaluation relates specifically to an economic decision (Elliott and Timmermann, 2008; Patton and Sheppard, 2009). Early examples of the indirect approach to forecast evaluation are Engle and Colacito (2006), who evaluate forecast performance in terms of portfolio return variance, and Fleming, Kirby and Ostdiek (2001, 2003), who apply a quadratic utility function that values one forecast relative to another. Becker, Clements, Doolan and Hurn (2013) provide a survey and comparison of these different approaches to forecast evaluation.

6.6 Evaluating the Density of Forecast Errors

The discussion of generating forecasts of financial variables has thus far focussed on either the conditional mean (point forecasts) or the conditional variance (interval forecasts) of the forecast distribution. A natural extension is also to forecast higher order moments, including skewness and kurtosis. In fact, it is of interest in the area of risk management to forecast all moments of the distribution and hence to forecast the entire probability density of key financial variables. As is the case with point forecasts, where statistics are computed to determine the relative accuracy of the forecasts, the quality of density forecasts must also be evaluated to determine their relative accuracy in forecasting all moments of the distribution. The approach, however, is not to try to evaluate the forecast properties of each moment separately, but rather to test all moments jointly by using the probability integral transform (PIT).

6.6.1 Probability integral transform

Consider a very simple model of a data generating process,

    yt = µ + vt,   vt ∼ iid N(0, σ²),

in which µ = 0.0 and σ² = 1.0. Now denote the cumulative distribution function of the standard normal distribution evaluated at any point z as Φ(z). If a sample of observed values yt is indeed generated by this model, then

    ut = Φ(yt − µ),   t = 1, 2, · · · , T,

results in a transformed time series ut that has an iid uniform distribution. This transformation is known as the probability integral transform. Figure 6.2 contains an example of how the transformed time series ut is obtained from the actual time series yt when the specified model is N(0, 1). This result is a reflection of the property that if the cumulative distribution is indeed the correct distribution, transforming yt to ut means that each yt has the same probability of being realised as any other value of yt.

Figure 6.2 Probability integral transform showing how the time series yt is transformed into ut based on the distribution N(0, 1).
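The construction underlying Figures 6.2 and 6.3 is easily replicated. The following R sketch transforms simulated data with the N(0, 1) distribution function, once when that distribution is correct and once when the mean is misspecified, mirroring panels (a) and (b) of Figure 6.3; the object names are illustrative.

    set.seed(42)
    y_correct <- rnorm(1000, mean = 0.0, sd = 1)  # true DGP is N(0,1)
    y_shifted <- rnorm(1000, mean = 0.5, sd = 1)  # true DGP is N(0.5,1)

    u_correct <- pnorm(y_correct)   # PIT using the (correct) N(0,1) cdf
    u_shifted <- pnorm(y_shifted)   # PIT using the (wrong) N(0,1) cdf

    par(mfrow = c(1, 2))
    hist(u_correct, breaks = 20, main = "Correct distribution")
    hist(u_shifted, breaks = 20, main = "Mean misspecified")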
The probability integral transform in the case where the specified model is chosen correctly is highlighted in panel (a) of Figure 6.3. A time series plot of 1000 simulated observations, yt, drawn from a N(0, 1) distribution is transformed via the cumulative normal distribution into ut. Finally, the histogram of the transformed time series, ut, is shown. Inspection of this histogram confirms that the distribution of ut is uniform and hence that the distribution used in transforming yt is indeed the correct one.

Figure 6.3 Simulated time series to show the effects of misspecification on the probability integral transform. In panel (a) there is no misspecification, while panels (b) and (c) demonstrate the effect of misspecification in the mean and the variance of the distribution respectively.

Now consider the case where the true data generating process for yt is the N(0.5, 1) distribution, but the incorrect distribution, N(0, 1), is used as the forecast distribution to perform the PIT. The effect of this misspecification of the mean of the forecast distribution is illustrated in panel (b) of Figure 6.3. A time series of 1000 simulated observations from a N(0.5, 1.0) distribution, yt, is transformed using the incorrect distribution, N(0, 1), and the histogram of the transformed time series, ut, is plotted. The fact that ut is not uniform in this case is a reflection of a misspecified model. The histogram exhibits a positive slope, reflecting the fact that larger values of yt have a relatively higher probability of occurring than small values of yt.

Now consider the case where the variance of the model is misspecified. If the data generating process is a N(0, 2) distribution, but the forecast distribution used in the PIT is once again N(0, 1), then it is to be expected that the forecast distribution will understate the true spread of the data. This is clearly visible in panel (c) of Figure 6.3. The histogram of ut is now U-shaped, implying that large negative and large positive values have a higher probability of occurring than predicted by the N(0, 1) distribution.

6.6.2 Equity Returns

The models used to forecast United States equity returns rpt in Section 6.3 are all based on the assumption of normality. Consider the AR(1) model

    rpt = φ0 + φ1 rpt−1 + vt,   vt ∼ N(0, σ²).

Assuming the forecast is ex post, so that rpt is available, the one-step ahead forecast error is given by

    v̂t = rpt − φ̂0 − φ̂1 rpt−1,

which under the model is distributed as N(0, σ²). Using monthly data from January 1871 to June 2004, the estimated distribution is N(0, 3.929²). The PIT corresponding to this estimated distribution is computed from the transformed time series

    ut = Φ(v̂t / σ̂),

in which σ̂ is the standard error of the regression. A histogram of the transformed time series, ut, is given in Figure 6.4. It appears that the AR(1) forecasting model of equity returns is misspecified because the distribution of ut is non-uniform. The interior peak of the distribution of ut suggests that the distribution of yt is more peaked than that predicted by the normal distribution. Also, the pole in the distribution at zero suggests that there are some observed negative values of yt that are not consistent with the specification of a normal distribution. These two properties combined suggest that the specified model fails to take into account the presence of higher order moments such as skewness and kurtosis. The analysis of the one-step ahead AR(1) forecasting model can easily be extended to the other estimated models of equity returns, including the VAR and the VECM investigated in Section 6.4.

Figure 6.4 Probability integral transform applied to the estimated one-step ahead forecast errors of the AR(1) model of United States equity returns, January 1871 to June 2004.
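In R, this ex post PIT for the AR(1) model amounts to a few lines; in the sketch below, rp is assumed to hold the monthly percentage returns.

    n    <- length(rp)
    fit  <- lm(rp[-1] ~ rp[-n])          # AR(1) fitted by least squares
    vhat <- resid(fit)                   # one-step ahead forecast errors
    sig  <- summary(fit)$sigma           # standard error of the regression
    u    <- pnorm(vhat / sig)            # probability integral transform
    hist(u, breaks = 20, main = "PIT of AR(1) forecast errors")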
As applied here, the PIT is ex post in that it uses the within-sample one-step ahead prediction errors to perform the analysis, and it is also a simple graphical implementation in which misspecification is detected by inspection of the histogram of the transformed time series, ut. It is possible to relax both of these features. Diebold, Gunther and Tay (1998) discuss an alternative ex ante approach, while Ghosh and Bera (2005) propose a class of formal statistical tests of the null hypothesis that ut is uniformly distributed.

6.7 Combining Forecasts

Given that all models are wrong but some are useful, it is not surprising that the issue of combining forecasts has generated a great deal of interest (Timmermann, 2006; Elliott and Timmermann, 2008), and very often the financial press will report consensus forecasts, which are essentially averages of different forecasts of the same quantity. This raises an important question in forecasting: is it better to rely on the best individual forecast or is there any gain to averaging the competing forecasts?

Suppose there are two unbiased forecasts of a variable yt given by ŷt1 and ŷt2, with respective variances σ1² and σ2² and covariance σ12. A weighted average of these two forecasts is

    ŷt = ω ŷt1 + (1 − ω) ŷt2,

and the variance of this average is

    σ² = ω² σ1² + (1 − ω)² σ2² + 2ω(1 − ω) σ12.

A natural approach is to choose the weight ω in order to minimise the variance of the combined forecast. Solving the first order condition

    ∂σ²/∂ω = 2ω σ1² − 2(1 − ω) σ2² + 2(1 − 2ω) σ12 = 0

for the optimal weight gives

    ω = (σ2² − σ12) / (σ1² + σ2² − 2σ12).

It is clear therefore that the weight attached to ŷt1 varies inversely with its variance. In passing, these weights are of course identical to the optimal weights for the minimum variance portfolio derived in Chapter 2.

This point can be illustrated more clearly if the forecasts are assumed to be uncorrelated, σ12 = 0. In this case,

    ω = σ2² / (σ1² + σ2²),   1 − ω = σ1² / (σ1² + σ2²),

and it is clear that both forecasts have weights varying inversely with their variances. By rearranging the expression for ω as follows,

    ω = [σ2² σ2⁻² σ1⁻²] / [(σ1² + σ2²) σ2⁻² σ1⁻²] = σ1⁻² / (σ1⁻² + σ2⁻²),   (6.8)

the inverse proportionality is now manifestly clear in the numerator of expression (6.8). This simple intuition in the two-forecast case translates into the situation in which there are N forecasts {ŷt1, ŷt2, · · · , ŷtN} of the same variable yt. If these forecasts are all unbiased and uncorrelated, and if the weights satisfy

    Σ_{i=1}^{N} ωi = 1,   ωi ≥ 0,   i = 1, 2, · · · , N,

then from (6.8) the optimal weights are

    ωi = σi⁻² / Σ_{j=1}^{N} σj⁻²,

and the weight on forecast i is inversely proportional to its variance.
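A minimal R sketch of the optimal two-forecast weight, taking the variances and covariance as given, is as follows (the function name is illustrative).

    combine_weight <- function(s1sq, s2sq, s12 = 0) {
      (s2sq - s12) / (s1sq + s2sq - 2 * s12)
    }

    combine_weight(s1sq = 4, s2sq = 1)   # uncorrelated case: weight 0.2

With σ1² = 4 and σ2² = 1 the first forecast receives only 20 per cent of the weight, which is exactly the inverse-variance proportion in (6.8).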
While the weights in expression (6.8) are intuitively appealing, being based on the principle of producing a minimum variance portfolio, important questions remain about how best to implement the combination of forecasts in practice. Bates and Granger (1969) suggested using (6.8) with each σi² estimated by the forecast mean square error. All this approach then requires is an estimate of the MSE of each of the competing forecasts in order to compute the optimal weights, ω̂i. Granger and Ramanathan (1984) later showed that this method is numerically equivalent to weights constructed from running the restricted regression

    yt = ω1 ŷt1 + ω2 ŷt2 + · · · + ωN ŷtN + vt,

in which the coefficients are constrained to be non-negative and to sum to one. Of course, enforcing these restrictions in practice can be tricky and sometimes ad hoc methods need to be adopted. One method is the sequential elimination of forecasts with weights estimated to be negative until all the remaining forecasts in the proposed combination have positive weights. This is sometimes referred to as forecast encompassing because the forecasts that eventually remain in the regression encompass all the information in those that are left out.

Yet another approach to averaging forecasts is based on the use of information criteria (Buckland, Burnham and Augustin, 1997; Burnham and Anderson, 2002), which may be interpreted as measuring the relative quality of an econometric model. Suppose there are N different models, each with an estimated Akaike information criterion, AIC1, AIC2, · · · , AICN; the model that returns the minimum value of the information criterion is usually the model of choice. Denote the minimum value of the information criterion for this set of models as AICmin. Then

    exp[∆i/2] = exp[(AICi − AICmin)/2]

may be interpreted as a relative measure of the loss of information² due to using model i instead of the model yielding AICmin. It is therefore natural to allow the forecast combination to reflect this relative information by computing the weights

    ω̂i = exp[−∆i/2] / Σ_{j=1}^{N} exp[−∆j/2],

so that models with larger information losses receive smaller weights. The Schwarz (Bayesian) Information Criterion (SIC) has also been suggested as an alternative information criterion to use in this context.³

Of course, the simplest idea is to assign equal weight to the forecasts and construct the simple average

    ŷt = (1/N) Σ_{i=1}^{N} ŷti.

Interestingly enough, simulation studies and practical work generally indicate that this simplistic strategy often works best, especially when there are large numbers of forecasts to be combined, notwithstanding all the subsequent work on the optimal estimation of weights (Stock and Watson, 2001). Two possible explanations of why averaging might in practice work better than constructing the optimal combination are as follows.

(i) There may be significant error in the estimation of the weights, due either to parameter instability (Clemen, 1989; Winkler and Clemen, 1992; Smith and Wallis, 2009) or to structural breaks (Hendry and Clements, 2004).
(ii) The fact that the variances of the competing forecasts may be very similar and their covariances positive suggests that large gains from constructing optimal weights are unlikely (Elliott, 2011).

² The exact form of this expression derives from the likelihood principle, which is discussed in Chapter 7.
³ The AIC is an unbiased estimate of −2 times the log-likelihood of model i, so after dividing by −2 and exponentiating, the result is a measure of the likelihood that model i actually generated the observed data. When the SIC is used instead, the weights have the interpretation of a Bayesian averaging procedure. Illustrative examples may be found in Garratt, Koop and Vahey (2008) and Kapetanios, Labhard and Price (2008).
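Both of these practical weighting schemes are straightforward to code. The R sketch below implements Bates-Granger weights from estimated forecast MSEs and Akaike weights from a vector of AIC values; the numerical inputs are purely illustrative.

    bates_granger <- function(mse) (1 / mse) / sum(1 / mse)

    akaike_weights <- function(aic) {
      d <- aic - min(aic)              # information loss relative to best model
      exp(-d / 2) / sum(exp(-d / 2))
    }

    bates_granger(c(5.49, 5.45, 5.56))          # similar MSEs: near-equal weights
    akaike_weights(c(1012.4, 1010.1, 1015.8))   # best model dominates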
6.8 Regression Model Forecasts

The forecasting methods for the univariate and multivariate models discussed so far are all based on time series models, as each dependent variable is expressed as a function of its own lags and lags of the other variables. Now consider forecasting with the linear regression model

    yt = β0 + β1 xt + ut,

where yt is the dependent variable, xt is the explanatory variable, ut is a disturbance term, and the sample period is t = 1, 2, · · · , T. To generate a forecast of yt at time T+1, as before, the model is written at T+1 as

    yT+1 = β0 + β1 xT+1 + uT+1.

The unknown values on the right-hand side are xT+1 and uT+1, as well as the parameters {β0, β1}. As before, uT+1 is replaced by its expected value E[uT+1] = 0, while the parameters are replaced by their sample estimates, {β̂0, β̂1}. However, it is not clear how to deal with xT+1, the future value of the explanatory variable. One strategy is to specify hypothetical future values of the explanatory variable that in some sense capture scenarios the researcher is interested in. A less subjective approach is to specify a time series model for xt and use this model to generate forecasts of xT+i. Suppose, for the sake of argument, that an AR(2) model is proposed for xt. The bivariate system of equations to be estimated is then

    yt = β0 + β1 xt + ut   (6.9)
    xt = φ0 + φ1 xt−1 + φ2 xt−2 + vt.   (6.10)

To generate the first forecast at time T+1 the system of equations is written as

    yT+1 = β0 + β1 xT+1 + uT+1
    xT+1 = φ0 + φ1 xT + φ2 xT−1 + vT+1.

Replacing the unknowns with the best available guesses yields

    ŷT+1 = β̂0 + β̂1 x̂T+1   (6.11)
    x̂T+1 = φ̂0 + φ̂1 xT + φ̂2 xT−1.   (6.12)

Equation (6.12) is used to generate the forecast x̂T+1, which is then substituted into equation (6.11) to generate ŷT+1. Alternatively, these calculations can be performed in one step by substituting (6.12) for x̂T+1 into (6.11) to give

    ŷT+1 = β̂0 + β̂1 (φ̂0 + φ̂1 xT + φ̂2 xT−1) = β̂0 + β̂1 φ̂0 + β̂1 φ̂1 xT + β̂1 φ̂2 xT−1.

Of course, the case where there are multiple explanatory variables is easily handled by specifying a VAR to generate the required multivariate forecasts.

The regression model may be used to forecast United States equity returns, rpt, using dividend returns, rdt. As in the earlier illustrations, the data are from February 1871 to June 2004. Estimation of equations (6.9) and (6.10), in which for simplicity the latter is restricted to an AR(1) representation, gives

    yt = 0.3353 + 0.0405 xt + ût
    xt = 0.0309 + 0.8863 xt−1 + v̂t,

where yt is the equity return and xt is the dividend return. Based on these estimates, the forecasts for dividend returns in July and August are, respectively,

    x̂T+1 = 0.0309 + 0.8863 xT   = 0.0309 + 0.8863 × 1.0449 = 0.9570%
    x̂T+2 = 0.0309 + 0.8863 x̂T+1 = 0.0309 + 0.8863 × 0.9570 = 0.8791%,

so that the forecast equity returns in July and August are

    ŷT+1 = 0.3353 + 0.0405 x̂T+1 = 0.3353 + 0.0405 × 0.9570 = 0.3741%
    ŷT+2 = 0.3353 + 0.0405 x̂T+2 = 0.3353 + 0.0405 × 0.8791 = 0.3709%.
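The chaining of forecasts in (6.11)-(6.12) is easily verified in R using the estimates just reported; the object names are illustrative.

    b0 <- 0.3353; b1 <- 0.0405        # regression of equity on dividend returns
    r0 <- 0.0309; r1 <- 0.8863        # AR(1) model for dividend returns
    xT <- 1.0449                      # dividend return in June 2004

    xhat1 <- r0 + r1 * xT             # July dividend forecast:   0.9570
    xhat2 <- r0 + r1 * xhat1          # August dividend forecast: 0.8791
    c(b0 + b1 * xhat1, b0 + b1 * xhat2)  # equity forecasts: 0.3741, 0.3709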
6.9 Predicting the Equity Premium

Forecasting in finance using regression models, or predictive regressions, as outlined in Section 6.8, is an area currently receiving quite a lot of attention (Stambaugh, 1999). In a series of recent papers, Goyal and Welch (2003, 2008) provide empirical evidence on the predictability of the equity premium, eqpt, defined as the total rate of return on the S&P 500 index, rmt, minus the short-term interest rate, in terms of the dividend-price ratio, dpt, and the dividend yield, dyt. What follows reproduces some of the results from Goyal and Welch (2003).

Table 6.2 provides summary statistics for the data. There are difficulties in reproducing all the summary statistics reported by Goyal and Welch in their papers because the data they provide are updated continuously.⁴ The summary statistics reported here are for slightly different sample periods than those listed in Goyal and Welch (2003), but the mean and standard deviation for the sample period 1927 to 2005 of 6.04% and 19.17%, respectively, are identical to those for the same period listed in Goyal and Welch (2008). Furthermore, the plots of the logarithm of the equity premium and the logarithms of the dividend yield and dividend-price ratio in Figure 6.5 are almost identical to the plots in Figure 1 of Goyal and Welch (2003).

Table 6.2 Descriptive statistics for the annual total market return, the equity premium, the dividend-price ratio and the dividend yield, all defined in terms of the S&P 500 index. All variables are in percentages.

                  Mean     St.dev.   Min.      Max.     Skew.    Kurt.
    1926 - 2003
      rmt          9.79    19.10    -53.99     42.51    -0.82    3.69
      eqpt         6.11    19.28    -55.13     42.26    -0.65    3.41
      dpt         -3.28     0.44     -4.48     -2.29    -0.64    3.63
      dyt         -3.22     0.42     -4.50     -2.43    -1.07    4.33
    1946 - 2003
      rmt         10.52    15.58    -30.12     41.36    -0.46    2.66
      eqpt         5.88    15.93    -37.64     40.43    -0.43    2.84
      dpt         -3.37     0.42     -4.48     -2.63    -0.76    3.52
      dyt         -3.30     0.43     -4.50     -2.43    -0.81    3.96
    1927 - 2005
      rmt          9.69    18.98    -53.99     42.51    -0.80    3.71
      eqpt         6.04    19.17    -55.13     42.26    -0.65    3.44
      dpt         -3.30     0.45     -4.48     -2.29    -0.57    3.28
      dyt         -3.24     0.43     -4.50     -2.43    -0.96    3.79

Figure 6.5 Plots of the time series of the logarithm of the equity premium (panel (a)) and of the dividend yield and dividend-price ratio (panel (b)).

The predictive regressions used in this piece of empirical analysis are, respectively,

    eqpt = αy + βy dyt−1 + uy,t   (6.13)
    eqpt = αp + βp dpt−1 + up,t.   (6.14)

The parameter estimates obtained from estimating these equations for two different sample periods, namely 1926 to 1990 and 1926 to 2002, are reported in Table 6.3.

Table 6.3 Predictive regressions for the equity premium using the dividend-price ratio, dpt, and the dividend yield, dyt, as explanatory variables. Standard errors and p-values are given in parentheses.

                     α                       β                      R²       R̄²       Std. error   N
    Sample 1926 - 1990
      dpt   0.570 (0.257) (0.030)   0.163 (0.0818) (0.050)    0.0595   0.0446   0.1930      65
      dyt   0.738 (0.282) (0.011)   0.221 (0.0913) (0.018)    0.0851   0.0706   0.1903      65
    Sample 1926 - 2002
      dpt   0.379 (0.169) (0.028)   0.0984 (0.0517) (0.061)   0.0461   0.0334   0.1898      77
      dyt   0.467 (0.176) (0.010)   0.128 (0.0547) (0.022)    0.0680   0.0556   0.1876      77

These results suggest that dividend yields and dividend-price ratios had at least some forecasting power with respect to the equity premium for the period 1926 to 1990, at least for the S&P 500 index. It is noticeable, however, that the size of the coefficients on both dpt−1 and dyt−1 is substantially reduced when the sample is extended to 2002.

⁴ See http://www.hec.unil.ch/agoyal/
Although the results are not identical to those in Table 2 of Goyal and Welch (2003) because of data revisions, the coefficients are similar, and so is the pattern of coefficient estimates decreasing in size as the sample is extended. This sub-sample instability of the estimated regression coefficients in Table 6.3 is further illustrated by the recursive plots of the slope coefficients on dpt−1 and dyt−1 in Figure 6.6, which reveal some important problems with this interpretation, at least from the forecasting perspective. The plots reveal that although the coefficient on dyt−1 appears to be marginally statistically significant at the 5% level over long periods, the coefficient on dpt−1 increases over time while the coefficient on dyt−1 steadily decreases. In other words, as time progresses the forecaster would rely less on dyt and more on dpt, despite the fact that the dyt coefficient appears more reliable in terms of statistical significance. In fact, the dividend yield almost always produces an inferior forecast to the unconditional mean of the equity premium, and the dividend-price ratio fares only slightly better. The point being made is that a trader relying only on information available at the time a forecast was being made, and not on information relating to the entire sample, would have had difficulty in extracting meaningful forecasts.

Figure 6.6 Recursive estimates of the coefficients on the dividend-price ratio (panel (a)) and the dividend yield (panel (b)) from equations (6.13) and (6.14).

The main tool for interpreting the performance of predictive regressions supplied by Goyal and Welch (2003) is a plot of the cumulative sum of squared one-step-ahead forecast errors of the predictive regressions expressed relative to the forecast errors of the best current estimate of the mean of the equity premium. Let the one-step-ahead forecast errors of the dividend yield and dividend-price ratio models be ûy,t+1|t and ûp,t+1|t, respectively, and let the forecast errors for the best estimate of the unconditional mean be ût+1|t. Figure 6.7 then plots the two series

    SSE(y) = Σ_{t=1946}^{2003} ( û²t+1|t − û²y,t+1|t )   [Dividend Yield Model]
    SSE(p) = Σ_{t=1946}^{2003} ( û²t+1|t − û²p,t+1|t )   [Dividend-Price Ratio Model].

A positive value for SSE means that the model forecasts have been superior to the forecasts based solely on the mean thus far. A positive slope implies that over the most recent year the forecasting model has performed better than the mean.

Figure 6.7 Plots of the cumulative squared relative one-step-ahead forecast errors obtained from the equity premium predictive regressions. The squared one-step-ahead forecast errors obtained from the models are subtracted from the squared one-step-ahead forecast errors based solely on the best current estimate of the unconditional mean of the equity premium.

Figure 6.7 indicates that the forecasting ability of a predictive regression using the dividend yield is abysmal, as SSE(y) is almost uniformly less than zero. There are two years in the mid-1970s and two years around 2000 when SSE(y) has a positive slope, but these episodes are aberrations. The forecasting performance of the predictive regression using the dividend-price ratio is slightly better than the forecasts generated by the mean, SSE(p) > 0. This is not a conclusion that emerges naturally from Figure 6.6, which indicates that the slope coefficient from this regression is almost always statistically insignificant.
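A recursive exercise of this kind is straightforward to code. The following R sketch computes the cumulative SSE difference for the dividend yield model, assuming annual vectors eqp and dy aligned so that dy[t] is the predictor dated t; all names, and the choice of a 20-observation initial window, are illustrative.

    gw_sse <- function(eqp, dy, start = 20) {
      n <- length(eqp); d <- numeric(0)
      for (t in start:(n - 1)) {
        fit    <- lm(eqp[2:t] ~ dy[1:(t - 1)])         # estimated with data to t only
        f_reg  <- coef(fit)[1] + coef(fit)[2] * dy[t]  # predictive regression forecast
        f_mean <- mean(eqp[1:t])                       # prevailing mean benchmark
        d <- c(d, (eqp[t + 1] - f_mean)^2 - (eqp[t + 1] - f_reg)^2)
      }
      cumsum(d)   # positive values favour the predictive regression
    }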
There are a few important practical lessons to learn from predictive regressions. The first of these is that good in-sample performance does not necessarily imply that the estimated equation will provide good ex ante forecasting ability. As in the case of the performance of pooled forecasts, parameter instability is a problem for good predictive performance. Second, there is a fundamental problem in using variables that are nearly nonstationary processes as explanatory variables in predictive regressions which purport to explain stationary variables. Stambaugh (1999) finds that dividend ratios are almost random walks while equity premia are stationary. It may therefore be argued that dividend ratios are good predictors of their own future behaviour only, and not of the future path of the equity premium.

6.10 Stochastic Simulation

Forecasting need not necessarily be about point forecasts or best guesses. Sometimes important information is conveyed by the degree of uncertainty inherent in the best guess. One important application of this uncertainty in finance is the concept of Value-at-Risk, which was introduced in Chapter 1. Stated formally, Value-at-Risk represents the loss that is expected to occur with probability α on an asset or portfolio of assets, P, after N days. The N-day (1−α)% Value-at-Risk is expressed as VaR(P, N, 1−α).

That Value-at-Risk is related to the uncertainty in the forecast of future values of the portfolio is easily demonstrated. Consider the case of United States monthly data on equity prices. Suppose that the asset in question is one which pays the value of the index. An investor who holds this asset in June 2004, the last date in the sample, would observe that the value of the portfolio is $1132.76. The value of the portfolio is now forecast out for six months to the end of December 2004. In assessing the decision to hold the asset or liquidate the investment, it is not so much the best guess of the future value that is important as the spread of the distribution of the forecast. The situation is illustrated in Figure 6.8, where the shaded region captures the 90% confidence interval of the forecast. Clearly, the investor needs to take this spread of likely outcomes into account, and this is exactly the idea of Value-at-Risk. It is clear therefore that forecast uncertainty and Value-at-Risk are intimately related.

Figure 6.8 Stochastic simulation of the equity price index over the period July 2004 to December 2004. The ex ante forecasts are shown by the solid line while the confidence interval encapsulates the uncertainty inherent in the forecast.

Recall from Chapter 1 that Value-at-Risk may be computed by historical simulation, the variance-covariance method, or Monte Carlo simulation. Using a model to make forecasts of future values of the asset or portfolio and then assessing the uncertainty in the forecast is the method of Monte Carlo simulation. In general, simulation refers to any method that randomly generates repeated trials of a model and seeks to summarise the uncertainty in the model forecast in terms of the distribution of these random trials.
The steps to perform a simulation are as follows:

Step 1: Estimate the model. Estimate the (simple) AR(1) regression model

    yt = φ0 + φ1 yt−1 + vt,

and store the parameter estimates φ̂0 and φ̂1. Note that the AR(1) model is used for illustrative purposes only and any model of yt could be used.

Step 2: Solve the model. For each available time period t in the sample, use φ̂0 and φ̂1 to generate a one-step-ahead forecast

    ŷt+1 = φ̂0 + φ̂1 yt,

and then compute and store the one-step-ahead forecast errors v̂t+1|t = ŷt+1 − yt+1.

Step 3: Simulate the model. Now forecast the model forward, but instead of basing the forecast solely on the best guesses for the unknowns, the uncertainty is explicitly accounted for by including an error term. The error term is obtained either by drawing from some parametric distribution (such as the normal distribution) or by taking a random draw from the estimated one-step-ahead forecast errors:

    ŷ¹T+1 = φ̂0 + φ̂1 yT + ṽT+1
    ŷ¹T+2 = φ̂0 + φ̂1 ŷ¹T+1 + ṽT+2
      ⋮
    ŷ¹T+H = φ̂0 + φ̂1 ŷ¹T+H−1 + ṽT+H,

where the ṽT+i are random drawings from v̂t+1|t, the computed one-step-ahead forecast errors from Step 2. The series of forecasts {ŷ¹T+1, ŷ¹T+2, · · · , ŷ¹T+H} represents one repetition of a Monte Carlo simulation of the model.

Step 4: Repeat. Step 3 is now repeated S times to obtain an ensemble of forecasts in which repetition s yields the path {ŷˢT+1, ŷˢT+2, · · · , ŷˢT+H}, so that the ensemble may be arranged as an H × S array with one column per repetition.

Step 5: Summarise the uncertainty. Each column of this ensemble of forecasts is representative of a possible outcome of the model and therefore collectively the ensemble captures the uncertainty of the forecast. In particular, the percentiles of these simulated forecasts for each time period T+i give an accurate picture of the distribution of the forecast at that time.

If the disturbances used to generate the forecasts are drawn from the actual one-step-ahead prediction errors and not from a normal distribution, the forecast uncertainty will reflect any non-symmetry or fat tails present in the estimated prediction errors.

One practical item of importance concerns the reproduction of the results of the simulation. In order to reproduce simulation results it is necessary to use the same set of random numbers. To ensure this reproducibility it is important to set the seed of the random number generator before carrying out the simulations. If this is not done, a different set of random numbers will be used each time the simulation is undertaken. Of course, as S → ∞ this step becomes unnecessary, but in most practical situations the number of replications is set as a realistic balance between computing considerations and accuracy of results.
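These five steps are compactly expressed in R. The sketch below assumes a vector rp of percentage returns and the last observed price P_T; both names are illustrative.

    set.seed(123)                          # ensure reproducibility
    n    <- length(rp)
    fit  <- lm(rp[-1] ~ rp[-n])            # Step 1: estimate the AR(1) model
    phi  <- coef(fit)
    vhat <- resid(fit)                     # Step 2: one-step-ahead errors

    H <- 6; S <- 10000
    P_end <- numeric(S)
    for (s in 1:S) {                       # Steps 3 and 4: S simulated paths
      r <- rp[n]; P <- P_T
      for (h in 1:H) {
        r <- phi[1] + phi[2] * r + sample(vhat, 1)  # bootstrap an error draw
        P <- P * exp(r / 100)              # returns are in percentages
      }
      P_end[s] <- P
    }
    quantile(P_end - P_T, 0.01)            # Step 5: six-month 99% Value-at-Risk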
Consider now the problem of computing the 99% Value-at-Risk for the asset which pays the value of the United States equity index over a time horizon of six months. On the assumption that equity returns are generated by an AR(1) model, the estimated equation is
$$rp_t = 0.2472 + 0.2853\, rp_{t-1} + \hat v_t,$$
which may be used to forecast returns for period $T+1$ while ensuring that uncertainty is explicitly introduced. The forecasting equation is therefore
$$\widehat{rp}_{T+1} = 0.2472 + 0.2853\, rp_T + \tilde v_{T+1},$$
where $\tilde v_{T+1}$ is a random draw from the one-step-ahead forecast errors computed by means of an in-sample static forecast. The value of the asset at $T+1$ in repetition $s$ is computed as
$$\hat P^{s}_{T+1} = P_T \exp(\widehat{rp}^{s}_{T+1}/100),$$
where the forecast returns are adjusted so that they are no longer expressed as percentages. A recursive procedure is now used to forecast the value of the asset out to $T+6$ and the whole process is repeated $S$ times. The distribution of the value of the asset at $T+6$ after $S$ repetitions of the simulation is shown in panel (a) of Figure 6.9, with the initial value at time $T$ of $P_T = \$1132.76$ superimposed. The distribution of simulated losses, obtained by subtracting the initial value of the asset from the terminal value, is shown in panel (b) of Figure 6.9.

[Figure 6.9: Simulated distribution of the equity index (panel (a)) and of the profit/loss on the equity index (panel (b)) over a six-month horizon from July 2004.]

The first percentile of this terminal distribution is $833.54, so that the six-month 99% Value-at-Risk is $833.54 − $1132.76 = −$299.22. By convention, the minus sign is dropped when reporting Value-at-Risk. Of course, this approach is equally applicable to simulating Value-at-Risk for more complex portfolios comprising more than one asset and for portfolios that include derivatives.

6.10.1 Exercises

(1) Recursive Ex Ante Forecasts of Real Equity Returns
pv.wf1, pv.dta, pv.xlsx
Consider monthly data on the logarithm of real United States equity prices, $p_t$, and the logarithm of real dividend payments, $d_t$, from January 1871 to June 2004.
(a) Estimate an AR(1) model of real equity returns, $rp_t$, with the sample period ending in June 2004. Generate forecasts of $rp_t$ from July to December of 2004. (A code sketch for parts (a) and (d) follows part (k).)
(b) Estimate an AR(2) model of real equity returns, $rp_t$, with the sample period ending in June 2004. Generate forecasts of $rp_t$ from July to December of 2004.
(c) Repeat parts (a) and (b) for real dividend returns, $rd_t$.
(d) Estimate a VAR(1) for $rp_t$ and $rd_t$ with the sample period ending in June 2004. Generate forecasts of real equity returns from July to December of 2004.
(e) Estimate a VAR(2) for $rp_t$ and $rd_t$ with the sample period ending in June 2004. Generate forecasts of real equity returns from July to December of 2004.
(f) Estimate a VECM(1) for $rp_t$ and $rd_t$ with the sample period ending in June 2004, where the specification is based on Model 3 as set out in Chapter 5. Generate forecasts of real equity returns from July to December of 2004.
(g) Repeat part (f) with the lag length in the VECM increased from 1 to 2.
(h) Repeat part (g) with the VECM specification based on Model 2, as set out in Chapter 5.
(i) Now estimate a VECM(1) containing real equity returns, $rp_t$, real dividend returns, $rd_t$, and real earnings growth, $ry_t$, with the sample period ending in June 2004 and the specification based on Model 3. Assume a cointegrating rank of 1. Generate forecasts of real equity returns from July to December of 2004.
(j) Repeat part (i) with the lag length in the VECM increased from 1 to 2.
(k) Repeat part (i) with the VECM specification based on Model 2.
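These forecasting exercises can be attempted in any of the three computing environments. As a minimal illustration, the following R sketch covers parts (a) and (d); the object names rp and rd for real equity and real dividend returns are assumptions about how the data have been read in, and the vars package is one possible (but not the only) choice for estimating the VAR.

library(vars)                        # assumed available for the VAR in (d)

# Part (a): AR(1) for real equity returns and six ex ante forecasts
ar1 <- arima(rp, order = c(1, 0, 0))
predict(ar1, n.ahead = 6)$pred       # July to December 2004

# Part (d): VAR(1) in rp and rd, forecasts of real equity returns
var1 <- VAR(cbind(rp, rd), p = 1, type = "const")
predict(var1, n.ahead = 6)$fcst$rp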
(2) Recursive Ex Post Forecasts of Real Equity Returns
pv.wf1, pv.dta, pv.xlsx
Consider monthly data on the logarithm of real United States equity prices, $p_t$, and the logarithm of real dividend payments, $d_t$, from January 1871 to June 2004.
(a) Estimate an AR(1) model of real equity percentage returns ($y_{1,t}$) with the sample period ending December 2003, and generate ex post forecasts from January to June of 2004.
(b) Estimate a VAR(1) model of real equity percentage returns ($y_{1,t}$) and real dividend percentage returns ($y_{2,t}$) with the sample period ending December 2003, and generate ex post forecasts from January to June of 2004.
(c) Estimate a VECM(1) model of real equity percentage returns ($y_{1,t}$) and real dividend percentage returns ($y_{2,t}$) using Model 3, with the sample period ending December 2003, and generate ex post forecasts from January to June of 2004.
(d) For each set of forecasts generated in parts (a) to (c), compute the MSE and the RMSE (see the code sketch after question (3)). Which is the better forecasting model? Discuss.

(3) Regression Based Forecasts of Real Equity Returns
pv.wf1, pv.dta, pv.xlsx
Consider monthly data on the logarithm of real United States equity prices, $p_t$, and the logarithm of real dividend payments, $d_t$, from January 1871 to June 2004.
(a) Estimate the following regression of real equity returns ($y_{1,t}$) on real dividend returns ($y_{2,t}$) as the explanatory variable, with the sample period ending in June 2004:
$$y_{1,t} = \beta_1 + \beta_2 y_{2,t} + u_t.$$
(b) Estimate an AR(1) model of dividend returns,
$$y_{2,t} = \rho_0 + \rho_1 y_{2,t-1} + v_t,$$
and combine this model with the estimated model in part (a) to generate forecasts of real equity returns from July to December of 2004.
(c) Estimate an AR(2) model of dividend returns,
$$y_{2,t} = \rho_0 + \rho_1 y_{2,t-1} + \rho_2 y_{2,t-2} + v_t,$$
and combine this model with the estimated model in part (a) to generate forecasts of real equity returns from July to December of 2004.
(d) Use the estimated model in part (a) to generate forecasts of real equity returns from July to December of 2004, assuming that real dividends increase at 3% per annum.
(e) Use the estimated model in part (a) to generate forecasts of real equity returns from July to December of 2004, assuming that real dividends increase at 10% per annum.
(f) Use the estimated model in part (a) to generate forecasts of real equity returns from July to December of 2004, assuming that real dividends increase at 3% per annum from July to September and at 10% per annum from October to December.
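Although the MSE and RMSE calculations in question (2), part (d), are simple, a small sketch helps to fix the definitions. In the following R fragment, the vectors actual and fc are placeholders for the six realised returns and the six ex post forecasts from one of the estimated models.

mse  <- mean((actual - fc)^2)   # mean squared error
rmse <- sqrt(mse)               # root mean squared error

The model with the smallest RMSE over the forecast period is preferred on this criterion.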
(4) Pooling Forecasts
This question is based on the EViews file HEDGE.WF1, which contains daily data on the percentage returns of seven hedge fund indexes from the 1st of April 2003 to the 28th of May 2010, a sample size of $T = 1869$.

R CONVERTIBLE : Convertible Arbitrage
R DISTRESSED : Distressed Securities
R EQUITY : Equity Hedge
R EVENT : Event Driven
R MACRO : Macro
R MERGER : Merger Arbitrage
R NEUTRAL : Equity Market Neutral

(a) Estimate an AR(2) model of the returns on the equity market neutral hedge fund ($y_{1,t}$) with the sample period ending on the 21st of May 2010 (a Friday),
$$y_{1,t} = \rho_0 + \rho_1 y_{1,t-1} + \rho_2 y_{1,t-2} + v_{1,t}.$$
Generate forecasts of $y_{1,t}$ for the next working week, from the 24th to the 28th of May 2010, and report the forecasts.
(b) Repeat part (a) for S&P500 returns ($y_{2,t}$) and report the forecasts.
(c) Estimate a VAR(2) containing the returns on the equity market neutral hedge fund ($y_{1,t}$) and the returns on the S&P500 ($y_{2,t}$), with the sample period ending on the 21st of May 2010 (a Friday),
$$\begin{aligned}
y_{1,t} &= \alpha_0 + \alpha_1 y_{1,t-1} + \alpha_2 y_{1,t-2} + \alpha_3 y_{2,t-1} + \alpha_4 y_{2,t-2} + v_{1,t} \\
y_{2,t} &= \beta_0 + \beta_1 y_{1,t-1} + \beta_2 y_{1,t-2} + \beta_3 y_{2,t-1} + \beta_4 y_{2,t-2} + v_{2,t}.
\end{aligned}$$
Generate forecasts of $y_{1,t}$ for the next working week, from the 24th to the 28th of May 2010.
(d) For the AR(2) and VAR(2) forecasts obtained for the returns on the equity market neutral hedge fund ($y_{1,t}$) and the S&P500 ($y_{2,t}$), compute the RMSE (a total of four RMSEs). Discuss which model yields the superior forecasts.
(e) Let $f^{AR}_{1,t}$ be the forecasts from the AR(2) model of the returns on the equity market neutral hedge fund and $f^{VAR}_{1,t}$ be the corresponding VAR(2) forecasts. Restricting the sample period to the forecast period, the 24th to the 28th of May, estimate the following regression, which pools the two sets of forecasts,
$$y_{1,t} = \phi_0 + \phi_1 f^{AR}_{1,t} + \phi_2 f^{VAR}_{1,t} + \eta_t,$$
where $\eta_t$ is a disturbance term with zero mean and variance $\sigma^2_\eta$. Interpret the parameter estimates and discuss whether pooling the forecasts has improved the forecasts of the returns on the equity market neutral hedge fund.

(5) Evaluating Forecast Distributions using the PIT
pv.wf1, pv.dta, pv.xlsx
(a) (Correct Model Specification) Simulate observations $y_1, y_2, \cdots, y_{1000}$ ($T = 1000$) from the true model given by a $N(0,1)$ distribution. Assuming that the specified model is also $N(0,1)$, for each $t$ compute the PIT $u_t = \Phi(y_t)$. Interpret the properties of the histogram of $u_t$. (A code sketch for this part and part (b) follows question (6).)
(b) (Mean Misspecification) Repeat part (a) except that the true model is $N(0.5, 1)$ and the misspecified model is $N(0,1)$.
(c) (Variance Misspecification) Repeat part (a) except that the true model is $N(0, 2)$ and the misspecified model is $N(0,1)$.
(d) (Skewness Misspecification) Repeat part (a) except that the true model is the standardised gamma distribution
$$y_t = \frac{g_t - br}{\sqrt{b^2 r}},$$
where $g_t$ is a gamma random variable with parameters $\{b = 0.5, r = 2\}$ and the misspecified model is $N(0,1)$.
(e) (Kurtosis Misspecification) Repeat part (a) except that the true model is the standardised Student t distribution
$$y_t = \frac{s_t}{\sqrt{\nu/(\nu - 2)}},$$
where $s_t$ is a Student t random variable with degrees of freedom $\nu = 5$, and the misspecified model is $N(0,1)$.

(6) Now estimate an AR(1) model of real equity returns, $rp_t$, on monthly United States data for the period February 1871 to June 2004,
$$rp_t = \phi_0 + \phi_1 rp_{t-1} + v_t,$$
and compute the standard error of the residuals, $\hat\sigma$. Use the PIT to compute the transformed time series
$$u_t = \Phi\left(\frac{\hat v_t}{\hat\sigma}\right).$$
Interpret the properties of the histogram of $u_t$.
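The PIT experiments in questions (5) and (6) are straightforward to code. The following R sketch covers parts (a) and (b) of question (5); the sample size and the number of histogram bins are illustrative choices.

set.seed(1)

# Part (a): true model and specified model are both N(0,1)
y <- rnorm(1000)
u <- pnorm(y)                  # PIT under the specified model
hist(u, breaks = 20)           # approximately uniform when the model is correct

# Part (b): true model N(0.5,1), misspecified model N(0,1)
yb <- rnorm(1000, mean = 0.5)
hist(pnorm(yb), breaks = 20)   # mass shifts towards 1 under mean misspecification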
(7) Predicting the Equity Premium
goyal annual.wf1, goyal annual.dta, goyal annual.xlsx
The data are annual observations on the S&P 500 index, dividends, $d12_t$, and the risk-free rate of interest, $rfree_t$, used by Goyal and Welch (2003; 2008) in their research on the determinants of the United States equity premium.
(a) Compute the equity premium, the dividend-price ratio and the dividend yield as defined in Goyal and Welch (2003).
(b) Compute basic summary statistics for S&P 500 returns, $rm_t$, the equity premium, $eqp_t$, the dividend-price ratio, $dp_t$, and the dividend yield, $dy_t$.
(c) Plot $eqp_t$, $dp_t$ and $dy_t$ and compare the results with Figure ??.
(d) Estimate the predictive regressions
$$\begin{aligned}
eqp_t &= \alpha_y + \beta_y dy_{t-1} + u_{y,t} \\
eqp_t &= \alpha_p + \beta_p dp_{t-1} + u_{p,t}
\end{aligned}$$
for two different sample periods, 1926 to 1990 and 1926 to 2002, and compare your results with Table 6.3.
(e) Estimate the regressions recursively, using data up to 1940 as the starting sample, in order to obtain recursive estimates of $\beta_y$ and $\beta_p$ together with 95% confidence intervals. Plot and interpret the results.

(8) Simulating VaR for a Single Asset
pv.wf1, pv.dta, pv.xlsx
The data are monthly observations on the logarithm of real United States equity returns, $rp_t$, from January 1871 to June 2004, expressed as percentages. The problem is to simulate the 99% Value-at-Risk over a time horizon of six months for the asset that pays the value of the United States equity index.
(a) Assume that the equity returns are generated by an AR(1) model,
$$rp_t = \phi_0 + \phi_1 rp_{t-1} + v_t.$$
(b) Use the model to provide ex post static forecasts over the entire sample and thus compute the one-step-ahead prediction errors, $\hat v_{t+1}$.
(c) Generate 1000 forecasts of the terminal equity price $P_{T+6}$ using stochastic simulation by implementing the following steps. (A code sketch implementing this scheme follows part (d).)
(i) Forecast $\widehat{rp}^{s}_{T+k}$ using the scheme
$$\widehat{rp}^{s}_{T+k} = \hat\phi_0 + \hat\phi_1 \widehat{rp}^{s}_{T+k-1} + \tilde v_{T+k},$$
where $\tilde v_{T+k}$ is a random draw from the estimated one-step-ahead prediction errors, $\hat v_{t+1}$.
(ii) Compute the simulated equity price
$$\hat P^{s}_{T+k} = \hat P^{s}_{T+k-1} \exp(\widehat{rp}^{s}_{T+k}/100).$$
(iii) Repeat (i) and (ii) for $k = 1, 2, \cdots, 6$.
(iv) Repeat (i), (ii) and (iii) for $s = 1, 2, \cdots, 1000$.
(d) Compute the 99% Value-at-Risk based on the $S$ simulated equity prices at $T+6$, $\hat P^{s}_{T+6}$.
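As a guide to question (8), the following R sketch implements the recursive scheme in part (c). It assumes that rp is the vector of percentage returns read in from the data set and that the June 2004 index value of $1132.76 reported earlier in the chapter is used as the initial price; a least squares fit stands in for whichever estimation routine the reader prefers.

set.seed(123)                  # ensure the simulation is reproducible
n    <- length(rp)
fit  <- lm(rp[2:n] ~ rp[1:(n - 1)])   # part (a): AR(1) by least squares
phi0 <- coef(fit)[1]
phi1 <- coef(fit)[2]
v    <- fitted(fit) - rp[2:n]         # part (b): one-step-ahead errors

PT  <- 1132.76                 # index value in June 2004
S   <- 1000
PT6 <- numeric(S)
for (s in 1:S) {
  r <- rp[n]                   # the simulation starts from the last observation
  P <- PT
  for (k in 1:6) {
    r <- phi0 + phi1 * r + sample(v, 1)   # step (i): simulate the return
    P <- P * exp(r / 100)                 # step (ii): update the price
  }
  PT6[s] <- P                  # steps (iii) and (iv): store the terminal price
}
VaR <- quantile(PT6, 0.01) - PT   # part (d): 99% VaR, reported without the sign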