Financial Econometric Modelling
Financial Econometric Modelling
Stan Hurn, Vance Martin, Peter Phillips and Jun Yu
Preface
This book provides a broad-ranging introduction to financial econometrics, from a thorough grounding in basic regression and inference to more advanced financial econometric methods and applications in financial markets. The target audiences are intermediate and advanced undergraduate students, honours students who wish to specialise in financial econometrics, and postgraduate students with limited backgrounds in finance who are taking masters courses designed to offer an introduction to finance. Throughout the exposition, special emphasis is placed on the illustration of core concepts using interesting data sets and on a hands-on approach to learning by doing. The guiding principle is that only by working through plenty of applications and exercises can a coherent understanding of the properties of financial econometric models and their interrelationships with the underlying finance theory be achieved.
Organization of the Book
Part ONE is designed to be a semester-long first course in financial econometrics. Consequently, the level of technical difficulty is kept to a bare minimum, with the emphasis on intuition. Slightly more challenging sections are included but are clearly marked with a dagger † and may be omitted without losing the flow of the exposition. The main estimation technique is limited to ordinary least squares. Of course, this choice does require the discussion to be quite loose in places, but these instances are revisited later in Parts TWO and THREE so that a fuller picture can be obtained if desired.
Although there are specific applications and reproductions of results from papers that use a variety of data sources, by and large the general concepts are illustrated using the stock market data that is downloadable from the homepage of Nobel Laureate Robert Shiller.1 This data set consists of monthly stock prices, dividends, and earnings data and the consumer price index, all starting in January 1871. The data set used here is truncated at June 2004; at the time of writing, the data are current to 2013 and are updated regularly. This is deliberate: it allows the reproduction of the examples and illustrations in the book, while also allowing the reader to explore the effects of using the more recent data.
The level of difficulty steps up a little in Parts TWO and THREE, which are aimed at more advanced undergraduates, honours and masters students.
The material in these two parts is more than enough for a semester course in advanced financial econometrics.

1 http://www.econ.yale.edu/~shiller/data.htm
Computation
All the results reported in the book may be reproduced using the econometric software packages EViews and Stata. In some cases the programming languages of these packages need to be used. For those who actively choose to learn by programming, the results are also reproducible using the R programming language.2 Presenting the numerical results of the examples in the text immediately gives rise to two important issues concerning numerical precision. In all of the examples listed in the front of the book where computer code has been used, the numbers appearing in the text are rounded versions of those generated by EViews. The publication quality graphics were generated using Stata.
The fact that all the exercises, figures and tables in the text can be easily reproduced in these three environments helps to bridge the gap between theory and practice by enabling the reader to build on the code and tailor it to more involved applications. The data files used by the book are all available for download from a companion website (www.finects.book) in EViews format (.wf1), Stata format (.dta) and as Excel spreadsheets (.xlsx). A complete description of the variables, frequency, sample and number of observations in each data set is available in Appendix A. Code to reproduce the figures and examples and to complete the exercises is also available.
Acknowledgements
Stan Hurn, Vance Martin, Peter Phillips and Jun Yu
December 2013
2 EViews is the copyright of IHS Inc. (www.eviews.com), Stata is the copyright of StataCorp LP (www.stata.com) and R (www.r-project.org) is a free software environment for statistical computation and graphics which is part of the GNU Project.
Contents

List of illustrations

PART ONE  BASICS

1  Properties of Financial Data
   1.1  Introduction
   1.2  A First Look at the Data
        1.2.1  Prices
        1.2.2  Returns
        1.2.3  Simple Returns
        1.2.4  Log Returns
        1.2.5  Excess Returns
        1.2.6  Yields
        1.2.7  Dividends
        1.2.8  Spreads
        1.2.9  Financial Distributions
        1.2.10 Transactions
   1.3  Summary Statistics
        1.3.1  Univariate
        1.3.2  Bivariate
   1.4  Percentiles and Computing Value-at-Risk
   1.5  The Efficient Markets Hypothesis and Return Predictability
   1.6  Efficient Market Hypothesis and Variance Ratio Tests†
   1.7  Exercises

2  Linear Regression Models
   2.1  Introduction
   2.2  Portfolio Risk Management
   2.3  Linear Models in Finance
        2.3.1  The Constant Mean Model
        2.3.2  The Market Model
        2.3.3  The Capital Asset Pricing Model
        2.3.4  Arbitrage Pricing Theory
        2.3.5  Term Structure of Interest Rates
        2.3.6  Present Value Model
        2.3.7  C-CAPM†
   2.4  Estimation
   2.5  Some Results for the Linear Regression Model†
   2.6  Diagnostics
        2.6.1  Diagnostics on the Dependent Variable
        2.6.2  Diagnostics on the Explanatory Variables
        2.6.3  Diagnostics on the Disturbance Term
   2.7  Estimating the CAPM
   2.8  Qualitative Variables
        2.8.1  Stock Market Crashes
        2.8.2  Day-of-the-week Effects
        2.8.3  Event Studies
   2.9  Measuring Portfolio Performance
   2.10 Exercises

3  Modelling with Stationary Variables
   3.1  Introduction
   3.2  Stationarity
   3.3  Univariate Autoregressive Models
        3.3.1  Specification
        3.3.2  Properties
        3.3.3  Mean Aversion and Reversion in Returns
   3.4  Univariate Moving Average Models
        3.4.1  Specification
        3.4.2  Properties
        3.4.3  Bid-Ask Bounce
   3.5  Autoregressive-Moving Average Models
   3.6  Regression Models
   3.7  Vector Autoregressive Models
        3.7.1  Specification and Estimation
        3.7.2  Lag Length Selection
        3.7.3  Granger Causality Testing
        3.7.4  Impulse Response Analysis
        3.7.5  Variance Decomposition
        3.7.6  Diebold-Yilmaz Spillover Index
   3.8  Exercises

4  Nonstationarity in Financial Time Series
   4.1  Introduction
   4.2  Characteristics of Financial Data
   4.3  Deterministic and Stochastic Trends
        4.3.1  Unit Roots†
   4.4  The Dickey-Fuller Testing Framework
        4.4.1  Dickey-Fuller (DF) Test
        4.4.2  Augmented Dickey-Fuller (ADF) Test
   4.5  Beyond the Dickey-Fuller Framework†
        4.5.1  Structural Breaks
        4.5.2  Generalised Least Squares Detrending
        4.5.3  Nonparametric Adjustment for Autocorrelation
        4.5.4  Unit Root Test with Null of Stationarity
        4.5.5  Higher Order Unit Roots
   4.6  Price Bubbles
   4.7  Exercises

5  Cointegration
   5.1  Introduction
   5.2  Equilibrium Relationships
   5.3  Equilibrium Adjustment
   5.4  Vector Error Correction Models
   5.5  Relationship between VECMs and VARs
   5.6  Estimation
   5.7  Fully Modified Estimation†
   5.8  Testing for Cointegration
        5.8.1  Residual-based Tests
        5.8.2  Reduced-rank Tests
   5.9  Multivariate Cointegration
   5.10 Exercises

6  Forecasting
   6.1  Introduction
   6.2  Types of Forecasts
   6.3  Forecasting with Univariate Time Series Models
   6.4  Forecasting with Multivariate Time Series Models
        6.4.1  Vector Autoregressions
        6.4.2  Vector Error Correction Models
   6.5  Forecast Evaluation Statistics
   6.6  Evaluating the Density of Forecast Errors
        6.6.1  Probability Integral Transform
        6.6.2  Equity Returns
   6.7  Combining Forecasts
   6.8  Regression Model Forecasts
   6.9  Predicting the Equity Premium
   6.10 Stochastic Simulation
   6.11 Exercises

PART TWO  ADVANCED TOPICS

7  Maximum Likelihood
   7.1  Introduction
   7.2  The Likelihood Principle and the CAPM
   7.3  A Duration Model for Trades
   7.4  A Constant Mean Model of the Interest Rate
   7.5  The Log-likelihood Function
   7.6  Analytical Solution
        7.6.1  Duration Model
        7.6.2  Returns
        7.6.3  Models of Interest Rates
   7.7  The Log-Likelihood Function
   7.8  Numerical Approach
        7.8.1  Returns
        7.8.2  Durations
   7.9  Properties of Maximum Likelihood Estimators
   7.10 Hypothesis Tests based on the Likelihood Principle
   7.11 Testing CAPM
   7.12 Testing the Vasicek Model of Interest Rates
   7.13 Exercises

8  Generalised Method of Moments
   8.1  Introduction
   8.2  Moment Conditions
   8.3  Estimation
        8.3.1  Just Identified
        8.3.2  Over Identified
        8.3.3  Choice of Weighting Matrix
        8.3.4  Choice of Estimation Method
   8.4  The Distribution of the GMM Estimator
   8.5  Testing
   8.6  Consumption CAPM
   8.7  Exercises

9  Panel Data
   9.1  Introduction
   9.2  Portfolio Returns
        9.2.1  Time Series Regressions
        9.2.2  Fama-MacBeth Regressions
   9.3  No Common Effects
   9.4  Pooling Time Series and Cross Section Data
   9.5  Fixed Effects
        9.5.1  Dummy Variable Estimator
        9.5.2  Fixed Effects Estimator
   9.6  Random Effects
        9.6.1  Generalised Least Squares
        9.6.2  Fixed or Random Effects
   9.7  Applications
        9.7.1  Performance of Family Owned Firms
   9.8  Exercises

10 Factor Models

11 Risk and Volatility Models
   11.1 Introduction
   11.2 Volatility Clustering
   11.3 GARCH
        11.3.1 Specification
        11.3.2 Estimation
        11.3.3 Forecasting
   11.4 Asymmetric GARCH Models
   11.5 GARCH in Mean
   11.6 Multivariate GARCH
        11.6.1 BEKK Model
        11.6.2 Estimation
        11.6.3 DCC
   11.7 Exercises

PART THREE  FINANCIAL MARKETS

12 Fixed Interest Securities
   12.1 Introduction
   12.2 Background and Terminology
   12.3 Statistical Properties of Yields
   12.4 Forecasting the Yield Curve
   12.5 Expectations Hypothesis
        12.5.1 Hypothesis Testing
   12.6 Discrete Time Models
        12.6.1 Simple Model
        12.6.2 Autoregressive Dynamics
   12.7 Fitting Term Structure Models to Data
        12.7.1 Square Root Models
        12.7.2 Levels Effects
   12.8 Testing a CKLS Model of Interest Rates
   12.9 Continuous Time Models
        12.9.1 Vasicek
        12.9.2 Cox-Ingersoll-Ross
        12.9.3 Singleton
        12.9.4 Option Price Formulae
   12.10 Estimation
        12.10.1 Jackknifing
   12.11 Interpreting Factors
   12.12 Application to Option Pricing
   12.13 Conclusions
   12.14 Computer Applications
        12.14.1 EViews Commands
        12.14.2 Exercises

13 Futures Markets

14 Microstructure
   14.1 Introduction

Appendix A  Data Description
Appendix B  Long-Run Variance: Theory and Estimation
Appendix C  Numerical Optimisation
References
Author index
Subject index
Illustrations

1.1  Monthly U.S. equity price index from 1933 to 1990
1.2  Logarithm of monthly U.S. equity price index from 1933 to 1990
1.3  Monthly U.S. equity returns from 1933 to 1990
1.4  Monthly U.S. zero coupon yields from 1946 to 1987
1.5  Monthly U.S. equity prices and dividends 1933 to 1990
1.6  Monthly U.S. dividend yield 1933 to 1990
1.7  U.S. zero coupon 6 and 9 month spreads from 1933 to 1990
1.8  Histogram of $/£ exchange rate returns
1.9  Histogram of durations between trades for AMR
1.10 U.S. equity returns for the period 1933 to 1990 with sample average superimposed
1.11 U.S. equity prices for the period 1933 to 1990 with sample average superimposed
1.12 Histogram of monthly U.S. equity returns 1933-1990
1.13 Histogram of Bank of America trading revenue
1.14 Daily 1% VaR for Bank of America
2.1  Least squares residuals from CAPM regressions
2.2  Microsoft prices and returns 1990-2004
2.3  Histogram of Microsoft CAPM residuals
2.4  Fama-French and momentum factors
3.1  S&P Index 1957-2012
3.2  S&P500 log returns 1957-2012
3.3  VAR impulse responses for equity-dividend model
4.1  Simulated random walk with drift
4.2  Different filters applied to U.S. equity prices
4.3  Deterministic and stochastic trends
4.4  Simulated distribution of Dickey-Fuller test
4.5  NASDAQ Index 1973-2009
4.6  Recursive estimation of ADF tests on the NASDAQ
4.7  Rolling window estimation of ADF tests on the NASDAQ
5.1  Logarithm of U.S. equity prices, dividends and earnings
5.2  Phase diagram to demonstrate equilibrium adjustment
5.3  Scatter plot of U.S. equity prices, dividends and earnings
5.4  Residuals from cointegrating regression
6.1  AR(1) forecast of United States equity returns
6.2  Probability integral transform
6.3  Illustrating the probability integral transform
6.4  Illustrating the probability integral transform
6.5  Equity premium, dividend yield and dividend price ratio
6.6  Recursive coefficients from predictive regressions
6.7  Evaluating predictive regressions of the equity premium
6.8  Stochastic simulation of equity prices
6.9  Simulating VAR
7.1  Durations between AMR trades
7.2  Log-likelihood function of exponential model
7.3  Eurodollar interest rates
7.4  Density of Eurodollar interest rates
7.5  Transitional density of Eurodollar interest rates
7.6  Illustrating the LR and Wald tests
7.7  Illustrating the LM test
8.1  Moment conditions
9.1  Fama-MacBeth regression coefficients
11.1 Volatility clustering in merger hedge fund returns
11.2 Empirical distribution of merger hedge fund returns
11.3 Conditional variance
11.4 News impact curve
12.1 U.S. term structure January 2000
12.2 U.S. zero coupon yields
12.3 Yield curve factor loadings
12.4 Diebold and Li (2006) factor loadings
12.5 Monthly U.S. zero coupon bond yields 1946 to 1991
12.6 Impulse responses of a VECM (zero.*)
PART ONE
BASICS
1
Properties of Financial Data
1.1 Introduction
The financial pages of newspapers and magazines, online financial sites, and
academic journals all routinely report a plethora of financial statistics. Even
within a specific financial market, the data may be recorded at different
observation frequencies and the same data may be presented in various ways.
As will be seen, the time series based on these representations have very
different statistical properties and reveal different features of the underlying
phenomena relating to both long run and short run behaviour. A simple
understanding of these everyday encounters with financial data requires at
least a passing knowledge of the tools for the presentation of data, which is
the subject matter of this chapter.
The characteristics of financial data may also differ across markets. For
example, there is no reason to expect that equity markets behave the same
way as currency markets, or for commodity markets to behave the same
way as bond markets. In some cases, like currency markets, trading is a
nearly continuous activity, while other markets open and close in a regulated
manner according to specific times and days. Options markets have their
own special characteristics and offer a wide and growing range of financial
instruments that relate to other financial assets and markets.
One important preliminary role of statistical analysis is to find stylised
facts that characterise different types of financial data and particular markets. Such analysis is primarily descriptive and helps us to understand the
prominent features of the data and the differences that can arise from basic elements like varying the sampling frequency and implementing various
transformations. Accordingly, the primary aim of this chapter is to highlight
the main characteristics of financial data and establish a set of stylised facts
for financial time series. These characteristics will be used throughout the
book as important inputs in the building and testing of financial models.
1.2 A First Look at the Data
This section identifies the key empirical characteristics of financial data. Special attention is devoted to establishing a set of stylised empirical facts that
characterise financial data. These empirical characteristics are important for
building financial models. A more detailed treatment of the material covered
in this section may be found in Campbell, Lo and MacKinlay (1997).
1.2.1 Prices
Figure 1.1 gives a plot of the monthly United States equity price index
(S&P500) for the period January 1933 to December 1990. The time path of
equity prices shows long-run growth over this period whose general shape is
well captured by an exponential trend. This observed exponential pattern
in the equity price index may be expressed formally as
P_t = P_{t-1} \exp(r_t), \qquad (1.1)
where Pt is the current equity price, Pt−1 is the previous month’s price and
rt is the rate of the increase between month t − 1 and month t.
Figure 1.1 Monthly equity price index for the United States from January 1933 to December 1990, with an exponential trend superimposed.
If rt in (1.1) is restricted to take the same constant value, r, in all time
periods, then equation (1.1) becomes
P_t = P_{t-1} \exp(r). \qquad (1.2)
The relationship between the current price, Pt and the price two months
earlier, Pt−2 , is
P_t = P_{t-1} \exp(r) = P_{t-2} \exp(r) \exp(r) = P_{t-2} \exp(2r).
By continuing this recursion, the relationship between the current price, Pt ,
and the price T months earlier, P0 , is given by
P_t = P_0 \exp(rT). \qquad (1.3)
It is this exponential function that is plotted in Figure 1.1 in which P0 = 7.09
is the equity price in January 1933 and r = 0.0055.
The exponential function in equation (1.3) provides a predictive relationship based on long-run growth behaviour. It shows that in January 1933
an investor who wished to know the price of equities in December 1990
(T = 695) would use
P (Dec.1990) = 7.09 × exp (0.0055 × 695) = 324.143.
The actual equity price in December 1990 is 328.75 so that the percentage
forecast error is
100 \times \frac{324.143 - 328.75}{328.75} = -1.401\%.
Of course, equation (1.3) is based on information over the intervening
period that would not be available to an investor in 1933. So, the prediction
is called ex post, meaning that it is performed after the event. If we wanted
to use this relationship to predict the equity price in December 2000, then
the prediction would be ex ante or forward looking and the suggested trend
price would be
P (Dec.2000) = 7.09 × exp (0.0055 × 815) = 627.15.
In contrast to the ex post prediction, the predicted share price of 627.15 now
grossly underestimates the actual equity price of 1330.93. The fundamental
reason for this is that the information between 1990 and 2000 has not been
used to inform the choice of the value of the crucial parameter r.
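As a minimal sketch, the ex post calculation above can be reproduced in a few lines of R (the numbers are those quoted in the text; the variable names are ours):

    # Ex post trend prediction for December 1990, as in the text
    P0 <- 7.09                           # equity price in January 1933
    r  <- 0.0055                         # constant monthly growth rate
    TT <- 695                            # months from January 1933 to December 1990
    P_hat <- P0 * exp(r * TT)            # trend prediction: approximately 324.143
    P_actual <- 328.75                   # actual equity price in December 1990
    100 * (P_hat - P_actual) / P_actual  # percentage forecast error: about -1.4%

Changing TT to 815 reproduces the ex ante prediction of 627.15 for December 2000.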
An alternative way of analysing the long run time series behaviour of asset
prices is to plot the logarithms of price over time. An example is given in
Figure 1.2 where the natural logarithm of the equity price given in Figure 1.1
is presented. Comparing the two series shows that while prices increase at
an increasing rate (Figure 1.1) the logarithm of price increases at a constant
rate (Figure 1.2). To see why this is the case, we take natural logarithms of
equation (1.3) to yield
p_t = p_0 + rT, \qquad (1.4)
where lowercase letters now denote the natural logarithms of the variables,
namely, log Pt and log P0 . This is a linear equation between pt and T in
which the slope is equal to the constant r. This equation also forms the
basis of the definition of log returns, a point that is now developed in more
detail.
Figure 1.2 The natural logarithm of the monthly equity price index for the
United States from January 1933 to December 1990.
1.2.2 Returns
The return to a financial asset is one of the most fundamental concepts
in financial econometrics and traditionally more attention is focussed on
returns, which are a scale-free measure of the results of an investment, than
on prices. Abstracting for the moment from the way in which returns are
computed, Figure 1.3 plots monthly equity returns for the United States
over the period January 1933 to December 1990. The returns are seen to
hover around a return value that is near zero over the sample period; in fact, r = 0.0055 as discussed earlier. Indeed, we often consider data on financial asset returns to be distributed about a mean return value of zero. This
feature of equity returns contrasts dramatically with the trending character
of the corresponding equity prices presented in Figure 1.1.
Figure 1.3 Monthly United States equity returns for the period January 1933 to December 1990.
The empirical differences in the two series for prices and returns reveal an
interesting aspect of stock market behaviour. It is often emphasised in the
financial literature that investment in equities should be based on long run
considerations rather than the prospect of short run gains. The reason is that
stock prices can be very volatile in the short run. This short run behaviour is
reflected in the high variability of the stock returns shown in Figure 1.3. Yet,
although stock returns themselves are generally distributed about a mean
value of approximately zero, stock prices (which accumulate these returns)
tend to trend noticeably upwards over time, as is apparent in Figure 1.1.
If stock prices were based solely on the accumulation of quantities with a
zero mean, then there would be no reason for this upwards drift over time, a point which is taken up again in Chapter ??. For present purposes, it is sufficient to
remark that when returns are measured over very short periods of time, any
tendency of prices to drift upwards is virtually imperceptible because that
effect is so small and is swamped by the apparent volatility of the returns.
This interpretation puts emphasis on the fact that returns generally focus
on short run effects whereas price movements can trend noticeably upwards
over long periods of time.
1.2.3 Simple Returns
The simple return on an asset between time t and t − 1 is given by
R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1.
The compound return for n periods, Rn,t , is therefore given by
R_{n,t} = \frac{P_t}{P_{t-n}} - 1
        = \left( \frac{P_t}{P_{t-1}} \times \frac{P_{t-1}}{P_{t-2}} \times \cdots \times \frac{P_{t-(n-1)}}{P_{t-n}} \right) - 1
        = (1 + R_t)(1 + R_{t-1}) \cdots (1 + R_{t-(n-1)}) - 1
        = \prod_{j=0}^{n-1} (1 + R_{t-j}) - 1.
The most common period over which a return is quoted is one year and
returns data are commonly presented in per annum terms. In the case of
monthly returns, the associated annualised simple return is computed as a
geometric mean given by

\text{Annualised } R_{n,t} = \left[ \prod_{j=0}^{11} (1 + R_{t-j}) \right]^{1/12} - 1. \qquad (1.5)
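As a minimal sketch of these formulas in R (the monthly price series is invented purely for illustration):

    # Simple, compound and annualised returns from an invented price series
    P <- c(100, 102, 101, 105, 107, 104, 108, 110, 109, 112, 115, 114, 118)
    R_t <- P[-1] / P[-length(P)] - 1             # simple one-period returns
    R_n <- prod(1 + R_t) - 1                     # compound return over the whole sample
    R_ann <- prod(1 + tail(R_t, 12))^(1/12) - 1  # geometric mean over 12 months, eq. (1.5)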
1.2.4 Log Returns
The log return of an asset is defined as
r_t = \log P_t - \log P_{t-1} = \log(1 + R_t). \qquad (1.6)
Log returns are also referred to as continuously compounded returns. It is
now clear that this definition of log returns is identical to that given in
equation (1.4) with t = 1. The motivation for dealing with log returns stems
from the associated ease with which compound returns may be dealt with.
For example, the compound 2-period return is given by
r_{2,t} = (\log P_t - \log P_{t-1}) + (\log P_{t-1} - \log P_{t-2}) = r_t + r_{t-1}, \qquad (1.7)
so that, by extension, the n-period compound return is simply
r_{n,t} = r_t + r_{t-1} + \cdots + r_{t-(n-1)} = \sum_{j=0}^{n-1} r_{t-j}, \qquad (1.8)
In other words, the n-period compound log return is simply the sum of the
single period log returns over the pertinent period. For example, for monthly
log returns the annualised rate is
\text{Annualised } r_{n,t} = \sum_{j=0}^{n-1} r_{t-j} = \log P_t - \log P_{t-n}, \qquad (1.9)
where the last equality may be deduced from inspection of the first term
on the right hand side of equation (1.7), after cancellation of terms. The
major implication of the result in expression (1.9) is that a series of monthly
returns can be expressed on a per annum basis by simply multiplying all
monthly returns by 12, the implicit assumption being that the best guess of
the per annum return is that the current monthly return will persist for the
next 12 months. Another way to look at this is as follows. If rt is regarded
as a constant, then it follows that the return over the year is
rt × 12 = log Pt − log Pt−12 ,
and the price increase over the year is given by
P_t = P_{t-12} \exp(r_t \times 12). \qquad (1.10)
This is exactly the relationship established in equation (1.2). By analogy,
if prices are observed quarterly, then the individual quarterly returns can
be annualised by multiplying the quarterly returns by 4. Similarly, if prices
are observed daily, then the daily returns are annualised by multiplying the daily returns by the number of trading days, usually taken to be 252. The choice of 252 for the number of trading days is an approximation necessitated by holidays, leap years and the like. Other choices are 250 and, very rarely, the number of calendar days, 365, is used.
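A minimal sketch of log returns and their annualisation in R, again with an invented monthly price series:

    # Log returns and annualisation for an invented monthly price series
    P <- c(100, 102, 101, 105, 107, 104, 108, 110, 109, 112, 115, 114, 118)
    r_t <- diff(log(P))        # log returns, eq. (1.6)
    r_pa <- 12 * r_t           # monthly log returns expressed per annum
    sum(tail(r_t, 12))         # 12-month compound log return ...
    log(P[13]) - log(P[1])     # ... equals log P_t - log P_{t-12}, eq. (1.9)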
One major problem with using log returns as opposed to simple returns relates to the construction of portfolios of assets. Because taking a logarithm is a nonlinear transformation, the log return on a portfolio cannot be expressed as the sum of the log returns on the constituent assets, each weighted by the asset's share in the portfolio. The reason is that the logarithm of a sum is not the sum of the logarithms of the constituents of the sum. We will largely ignore this problem because when returns are measured over short intervals, and are therefore small, the log return on the portfolio is negligibly different from the weighted sum of the log returns on the constituent assets. A more detailed treatment of this point is provided in the excellent texts of Campbell, Lo and MacKinlay (1997) and Tsay (2010).
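The point can be checked numerically. In the following R fragment (the weights and returns are invented for illustration) the exact portfolio log return and the weighted sum of log returns differ only in the fifth decimal place:

    # Portfolio log return versus weighted sum of log returns
    w <- c(0.6, 0.4)           # portfolio weights
    R <- c(0.010, -0.005)      # simple returns on the two assets
    R_p <- sum(w * R)          # exact simple return on the portfolio
    log(1 + R_p)               # exact portfolio log return (about 0.00399)
    sum(w * log(1 + R))        # weighted sum of log returns (about 0.00397)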
1.2.5 Excess Returns
The difference between the return on a risky financial asset and the return on
some benchmark asset that is usually assumed to be a risk-free alternative,
usually denoted rf,t , is known as the excess return. The risk-free return is
usually taken to be the return on a government bond because the risk of
default on this investment is so low as to be negligible. The simple and log
excess returns on an asset are therefore defined, respectively, as
Z_t = R_t - r_{f,t}, \qquad z_t = r_t - r_{f,t}. \qquad (1.11)
1.2.6 Yields
A bond can be viewed simply as an interest only loan in the sense that the
borrower will pay the interest in every period up to the maturity of the loan, but none of the principal. The principal (or face value) of the bond is then repaid in full at the end of the life of the bond (that is, at maturity). The number of
years until the face value is paid off is called the bond’s time to maturity.
The yield on a bond is now defined as the discount rate that equates the
present value of the bond’s face value to its price. For present purposes,
assume that the bond pays no interest at all (a zero coupon bond) and the
investor’s return comes solely from the difference between the sale price of
the bond and its face value at maturity. Bonds are dealt with in detail in
Chapter 12 but for the moment, it suffices to state that the price of a zero
coupon bond that pays $1 at maturity in n years is given by
P_{n,t} = \exp(-n\, y_{n,t}), \qquad (1.12)

in which y_{n,t} represents the yield, commonly expressed in per annum terms. The yield can be derived by taking natural logarithms of equation (1.12) and rearranging to give

y_{n,t} = -\frac{1}{n}\, p_{n,t}, \qquad (1.13)

where p_{n,t} = \log P_{n,t}.
This expression shows that the yield is inversely proportional to the natural
logarithm of the price of the bond. Figure 1.4 gives plots of yields on United
States zero coupon bonds for maturities ranging from 2 months (n = 2/12)
to 9 months (n = 9/12).
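A quick numerical sketch of the price-yield relationship in equations (1.12) and (1.13), using an illustrative yield rather than a value taken from the data:

    # Price and yield of a zero coupon bond paying $1 at maturity
    n <- 9/12            # maturity in years (9 months)
    y <- 0.05            # yield, per annum (illustrative)
    P <- exp(-n * y)     # bond price, eq. (1.12)
    -log(P) / n          # recovers the yield of 0.05, eq. (1.13)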
The plots shown in Figure 1.4 show that the actual time series behaviour
of bond yields is fairly complex, with periods of rising and falling yields
that have a random wandering character. Randomly wandering series such
as these in Figure 1.4 are very common in both finance and economics.
Figure 1.4 Monthly United States zero coupon bond yields for maturities ranging from 2 months to 9 months for the period December 1946 to February 1987.
One particularly important feature of such series is that they behave as if
they have no fixed mean level, so that they wander around in an apparently
random manner over time continually revisiting earlier levels.
1.2.7 Dividends
In many applications in finance, as in economics, the focus is on understanding the relationships among two or more series. For instance, in present value
models of equities, the price of an equity is equal to the discounted future
stream of dividend payments
P_t = \mathrm{E}_t\!\left[ \frac{D_{t+1}}{(1+\delta_{t+1})} + \frac{D_{t+2}}{(1+\delta_{t+2})^2} + \frac{D_{t+3}}{(1+\delta_{t+3})^3} + \cdots \right], \qquad (1.14)
where Et [Dt+n ] represents the expectation of dividends in the future at time
t + n given information available at time t and δt+n is the corresponding
discount rate.
The relationship between equity prices and dividends is highlighted in
Figure 1.5 which plots United States equity prices and dividend payments
from January 1933 to December 1990. There appears to be a relationship
between the two series as both series exhibit positive exponential trends.

Figure 1.5 Monthly United States equity prices (panel (a)) and dividend payments (panel (b)) for the period January 1933 to December 1990.

To analyse the relationship between equity prices and dividends more closely, consider the dividend yield

\mathrm{YIELD}_t = \frac{D_t}{P_t}, \qquad (1.15)
which is presented in Figure 1.6 based on the data in Figure 1.5. The dividend yield exhibits no upward trend and instead wanders randomly around
the level 0.05. This behaviour is in stark contrast to the equity price and
dividend series which both exhibit strong upward trending behaviour.
The calculation of the dividend yield in (1.15) provides an example of
how combining two or more series can change the time series properties of
the data - in the present case by apparently eliminating the strong upward
trending behaviour. The process of combining trending financial variables
into new variables that do not exhibit trends is a form of trend reduction.
An extremely important case of trend reduction by combining variables is
known as cointegration, a concept that is discussed in detail in Chapter 5.
The expression for the dividend yield in (1.15) can be motivated from
the present value equation in (1.14) by adopting two simplifying assumptions.

Figure 1.6 Monthly United States dividend yield for the period January 1933 to December 1990.

First, expectations of future dividends are given by present dividends, E_t[D_{t+n}] = D. Second, the discount rate is assumed to be fixed at \delta. Using
these two assumptions in (1.14) gives
P_t = D \left[ \frac{1}{(1+\delta)} + \frac{1}{(1+\delta)^2} + \cdots \right]
    = \frac{D}{1+\delta} \left[ 1 + \frac{1}{(1+\delta)} + \frac{1}{(1+\delta)^2} + \cdots \right]
    = \frac{D}{1+\delta} \cdot \frac{1}{1 - 1/(1+\delta)}
    = \frac{D}{\delta},
where the penultimate step uses the sum of a geometric progression.1 Rearranging this expression gives

\delta = \frac{D}{P_t}, \qquad (1.16)

which shows that the discount rate, \delta, is equivalent to the dividend yield, YIELD_t.

1 An infinite geometric progression is summed as follows: 1 + \lambda + \lambda^2 + \lambda^3 + \cdots = \frac{1}{1-\lambda}, \quad |\lambda| < 1, where in the example \lambda = 1/(1+\delta).
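The derivation can also be verified numerically; the following R fragment (with invented values of D and δ) truncates the infinite sum and recovers D/δ:

    # Discounted sum of constant dividends converges to D/delta
    D <- 5
    delta <- 0.05
    sum(D / (1 + delta)^(1:1000))  # truncated infinite sum: approximately 100
    D / delta                      # the closed form from eq. (1.16): exactly 100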
An alternative representation of the present value model suggested by
equation (1.15) is to transform this equation into natural logarithms and rearrange for \log(P_t) as
\log(P_t) = -\log(\delta_t) + \log(D_t).
Assuming equities are priced according to the present value model, this
equation shows that there is a one-to-one relationship between log Pt and
log Dt . The relationship is explored in detail in Chapter 5 using the concept
of cointegration.
1.2.8 Spreads
An important characteristic of the bond yields presented in Figure 1.4 is that
they all exhibit similar time series patterns, in particular a general upward
drift with increasing volatility. This commonality suggests that yields do
not move too far apart from each other. One way to highlight this feature
is to compute the spread between the yields on a long maturity and a short
maturity
\mathrm{SPREAD}_t = y_{LONG,t} - y_{SHORT,t}.
Figure 1.7 gives the 6 and 9 month spreads relative to the 3 month zero
coupon yield. None of these spreads exhibit any noticeable trend and all
seem to hover around a constant level. The spreads also show increasing
volatility over the sample period with the gyrations increasing towards the
end of the sample.
Comparison of Figures 1.4 and 1.7 reveals that yields exhibit vastly different time series patterns to spreads, with the former having upward trends while the latter show no evidence of trends. This example is another illustration of how combining two or more series can change the time series properties of the data.

Figure 1.7 Monthly United States 6-month and 9-month zero coupon spreads computed relative to the 3-month zero coupon yield for the period December 1946 to February 1987.

1.2.9 Financial Distributions

An important assumption underlying many theoretical and empirical models in finance is that returns are normally distributed. This assumption is widely used in portfolio allocation models, in Value-at-Risk (VaR) calculations, in pricing options, and in many other applications. An example of an empirical returns distribution is given in Figure 1.8, which gives the histogram of hourly United States exchange rate returns computed relative to the British pound. Even though this distribution exhibits some characteristics that are consistent with a normal distribution, such as symmetry, the
distribution differs from normality in two important ways:
(1) The presence of heavy tails.
(2) A sharp peak in the centre of the distribution.
Distributions exhibiting these properties are known as leptokurtic distributions. As the empirical distribution exhibits tails that are much thicker
than those of a normal distribution, the actual probability of observing excess returns is higher than that implied by the normal distribution. The
empirical distribution also exhibits some peakedness at the centre of the distribution around zero, and this peakedness is sharper than that of a normal
distribution. This feature suggests that there are many more observations
for which the exchange rate hardly moves, and hence returns are very small, than there would be in the case of draws from a normal population.
Figure 1.8 Empirical distribution of hourly $/£ exchange rate returns for
the period 1 January 1986 00:00 to 15 July 1986 11:00 with a normal
distribution overlaid.
The example given in Figure 1.8 is for exchange rate returns. But the
property of heavy tails and peakedness of the distribution of returns is common for other asset markets including equities, commodities and real estate
markets. All of these empirical distributions are therefore inconsistent with
the assumption of normality and financial models that are based on normality, therefore, may result in financial instruments such as options being
incorrectly priced or measures of risk being underestimated.
1.2.10 Transactions
A property of all of the financial data analysed so far is that observations
on a particular variable are recorded at discrete and regularly spaced points
in time. The data on equity prices and dividend payments in Figure 1.5 and
the data on zero coupon bond yields in Figure 1.4, are all recorded every
month. In fact, higher frequency data are also available at regularly spaced
time intervals, including daily, hourly and even 10-15 minute observations.
More recently, transactions data have become available which record the
price of every trade conducted during the trading day. An example is given in
Table 1.1 which gives a snapshot of the trades recorded on American Airlines
on August 1, 2006. The variable Trade, x, is a binary variable signifying whether a trade has taken place, so that

x_t = \begin{cases} 1 & : \text{trade occurs} \\ 0 & : \text{no trade occurs}. \end{cases}
The duration between trades, u, is measured in seconds, and the corresponding price of the asset at the time of the trade, P , is also recorded. The
table shows that there is a trade at the 5 second mark where the price is
$21.58. The next trade occurs at the 11 second mark at a price of $21.59,
so the duration between trades is u = 6 seconds. There is another trade
straight away at the 12 second mark at the same price of $21.59, in which
case the duration is just u = 1 second. There is no trade in the following
second, but there is one two seconds later at the 14 second mark, again at
the same price of $21.59, so the duration is u = 2 seconds.
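In R, durations of this kind are simply first differences of the times at which trades occur; the snippet below uses the trade times read off Table 1.1:

    # Durations between trades, from the trade times in Table 1.1
    trade_times <- c(5, 11, 12, 14)  # seconds at which x_t = 1
    diff(trade_times)                # durations u = 6, 1, 2 seconds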
The time differences between trades of American Airlines (AMR) shares are further highlighted by the histogram of the duration times, u, given in
Figure 1.9. This distribution has an exponential shape with the duration
time of u = 1 second, being the most common. However, there are a number
of durations in excess of u = 25 seconds, and there are some times even in
excess of 50 seconds.
Table 1.1 American Airlines (AMR) transactions data on August 1 2006, at 9 hours and 42 minutes.

Sec.   Trade (x)   Duration (u)   Price (P)
  5        1            1          $21.58
  6        0            1          $21.58
  7        0            1          $21.58
  8        0            1          $21.58
  9        0            1          $21.58
 10        0            1          $21.58
 11        1            6          $21.59
 12        1            1          $21.59
 13        0            1          $21.59
 14        1            2          $21.59
The important feature of transactions data that distinguishes it from the
time series data discussed above, is that the time interval between trades
is not regular or equally spaced. In fact, if high frequency data are used,
such as 1 minute data, there will be periods where no trades occur in the
window of time and the price will not change. This is especially so in thinly
traded markets. The implication of using such transactions data is that the
models specified in econometric work need to incorporate those features, including the apparent randomness in the observation interval between trades.
Correspondingly, the appropriate statistical techniques are expected to be
different from the techniques used to analyse regularly spaced financial time
series data. These issues for high frequency irregularly spaced data are investigated further in Chapter 14 on financial microstructure effects.
1.3 Summary Statistics
In the previous section, the time series properties of financial data are explored using a range of graphical tools, including line charts, scatter diagrams and histograms. In this section a number of statistical methods are
used to summarise financial data. While these methods are general summary
measures of financial data, a few important cases will be highlighted in which
it is inappropriate to summarise financial data using these simple measures.
Figure 1.9 Empirical distribution of durations (in seconds) between trades of American Airlines (AMR) on 1 August 2006 from 09:30 to 16:00 (23,401 observations).
1.3.1 Univariate
Sample Mean
An important feature of United States equity returns in Figure 1.3 is that
they hover around some average value over the sample period. This average
value is formally known as the sample mean. For the log returns series, rt ,
the sample mean is defined as
\bar{r} = \frac{1}{T} \sum_{t=1}^{T} r_t. \qquad (1.17)
For the United States equity returns in Figure 1.3, the sample mean
is r = 0.005568. This value is plotted in Figure 1.10 together with the
actual returns data. Not surprisingly, this value is very close to the value
of r = 0.0055 used in Figure 1.1. Expressing the monthly sample mean in
annual terms gives
0.005568 × 12 = 0.0668,
which shows that average returns over the period 1933 to 1990 are 6.68%
per annum.
Figure 1.10 Monthly United States equity returns for the period January 1933 to December 1990 with the sample average superimposed.
An example where computing the sample mean is an inappropriate summary measure is the equity price index given in Figure 1.1. Figure 1.11 plots
the equity price index again, together with its sample mean of \bar{P} = 80.253. Clearly the sample mean is not a representative measure of the equity price as there is no tendency for the equity price to return to its mean. In fact, the equity price is trending upwards away from its sample mean. A comparison of Figures 1.10 and 1.11 suggests that models of returns and prices need to be different.

Figure 1.11 Monthly United States equity price index for the period January 1933 to December 1990 with the sample average superimposed.
Sample Variance and Standard Deviation
Risk refers to the uncertainty surrounding the value of, or payoff from, a
financial investment. In other words, risk reflects the chance that the actual
return on an investment may be very different from the expected return, and an increased potential for loss from an investment has obvious ramifications for
individual investors. Figure 1.10 shows that actual returns deviate from the
sample mean in most periods and the larger are these deviations the more
risky is the investment. The classic measure of risk is given by the average
squared deviation of returns from the mean, which is known as the sample
variance
s^2 = \frac{1}{T-1} \sum_{t=1}^{T} (r_t - \bar{r})^2. \qquad (1.18)
In the case of the returns data, the sample variance is s^2 = 0.040260^2 = 0.00162. In finance, the sample standard deviation, which is the square root of the variance,

s = \sqrt{ \frac{1}{T-1} \sum_{t=1}^{T} (r_t - \bar{r})^2 }, \qquad (1.19)

is usually used as the measure of the riskiness of an investment and is called the volatility of a financial return. The standard deviation has the same scale as a return (rather than a squared return) and is therefore easily interpretable. The sample standard deviation of the returns series in Figure 1.3 is s = 0.040260.
Sample Skewness
Whilst the variance provides an average summary measure of deviations of
returns around the sample mean, investors are also interested in the occurrence of extreme returns. Figure 1.12 gives a histogram of the United States
equity returns previously plotted in Figure 1.3, which shows that there is a
larger concentration of returns below the sample mean of r = 0.005568 (left
tail) than there is for returns above the sample mean (right tail). In fact, the
sample skewness is computed to be SK = −0.299. Formally, the distribution
in this case is referred to as being negatively skewed as it shows that there is
a greater chance (probability) of large returns below the sample mean than
large returns above the sample mean. A distribution is positively skewed if
the opposite is true, whereas a distribution is symmetric if the probabilities of extreme returns above and below the sample mean are the same.
Sample Kurtosis
The sample skewness statistic focusses on whether the extreme returns are
in the left or the right tail of the distribution. The sample kurtosis statistic
identifies if there are extreme returns, regardless of sign, relative to some
benchmark, typically the normal distribution.
The measure of kurtosis is
KT = \frac{1}{T} \sum_{t=1}^{T} \left( \frac{r_t - \bar{r}}{s} \right)^4, \qquad (1.20)
which is compared to a value of KT = 3 that would occur if the returns
came from a normal distribution. In the case of the United States equity
returns in Figure 1.12, the sample kurtosis is KT = 7.251. As this value is
greater than 3, there are more extreme returns in the data than are predicted by the normal distribution.

Figure 1.12 Empirical distribution of United States equity returns with the sample average superimposed. Data are monthly for the period January 1933 to December 1990.
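The univariate statistics of this section translate directly into R. The sketch below uses simulated returns as a stand-in for the equity data (which is not loaded here), so the numbers will not match those quoted in the text:

    # Univariate summary statistics from simulated stand-in returns
    set.seed(42)
    r <- rnorm(696, mean = 0.0056, sd = 0.0403)   # stand-in monthly returns
    r_bar <- mean(r)                              # sample mean, eq. (1.17)
    s2 <- sum((r - r_bar)^2) / (length(r) - 1)    # sample variance, eq. (1.18)
    s <- sqrt(s2)                                 # volatility, eq. (1.19)
    SK <- mean(((r - r_bar) / s)^3)               # sample skewness (a common convention)
    KT <- mean(((r - r_bar) / s)^4)               # sample kurtosis, eq. (1.20); 3 under normality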
1.3.2 Bivariate
Covariance
The statistical measures discussed so far summarise the characteristics of a
single series. Perhaps what is more important in finance is understanding the
interrelationships between two or more financial time series. For example,
in constructing a diversified portfolio, the aim is to include assets whose
returns are not perfectly correlated. Figure ?? provides an example of prices
and dividends moving in the same direction, as reflected by the positive
slope of the scatter diagram. One way to measure co-movements between
the returns on two assets, rit and rjt , is by computing the covariance
s_{ij} = \frac{1}{T} \sum_{t=1}^{T} (r_{it} - \bar{r}_i)(r_{jt} - \bar{r}_j), \qquad (1.21)
where ri and rj are the respective sample means of the returns on assets i
and j.
A positive covariance, s_{ij} > 0, shows that the returns of asset i and asset j have a tendency to move together. That is, when the return on asset i is above its mean, the return on asset j is also likely to be above its mean. A negative covariance, s_{ij} < 0, indicates that when the returns of asset i are above its sample mean, on average, the returns on asset j are likely to be below its sample mean.
in portfolio theory and asset pricing, as will become clear in Chapter 2.
Correlation
Another measure of association that is widely used in finance is the correlation coefficient, defined as
c_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\, s_{jj}}}, \qquad (1.22)
where
s_{ii} = \frac{1}{T} \sum_{t=1}^{T} (r_{it} - \bar{r}_i)^2, \qquad s_{jj} = \frac{1}{T} \sum_{t=1}^{T} (r_{jt} - \bar{r}_j)^2,
represent the respective variances of the returns of assets i and j. The correlation coefficient is the covariance scaled by the standard deviations of the
two returns. The correlation has the property that it has the same sign as
the covariance, as well as the additional property that it lies in the range
−1 ≤ cij ≤ 1.
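As a sketch in R, using the 1/T convention of equations (1.21) and (1.22) with two simulated return series (the real data are not loaded here):

    # Covariance and correlation with the 1/T convention of the text
    set.seed(1)
    r_i <- rnorm(500, mean = 0, sd = 0.04)
    r_j <- 0.5 * r_i + rnorm(500, mean = 0, sd = 0.03)       # built to co-move with r_i
    TT <- length(r_i)
    s_ij <- sum((r_i - mean(r_i)) * (r_j - mean(r_j))) / TT  # covariance, eq. (1.21)
    s_ii <- sum((r_i - mean(r_i))^2) / TT
    s_jj <- sum((r_j - mean(r_j))^2) / TT
    s_ij / sqrt(s_ii * s_jj)                                 # correlation, eq. (1.22)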
1.4 Percentiles and Computing Value-at-Risk
The percentiles of a distribution are a set of summary statistics that summarise both the location and the spread of a distribution. Formally, a percentile is a measure that indicates the value of a given random variable below
which a given percentage of observations fall. So the important measure of
the location of a distribution, the median, below which 50% of the observations of the random variable fall, is also the 50th percentile. The median
is an alternative to the sample mean as a measure of location and can be
very important in financial distributions in which large outliers are encountered. The difference between the 25th percentile (or first quartile) and the
75th percentile (or third quartile) is known as the inter-quartile range, which
provides an alternative to the variance as a measure of the dispersion of the
distribution. It transpires that the percentiles of the distribution, particularly the 1st and 5th percentiles are important statistics in the computation
of an important risk measure in finance known as Value-at-Risk or VaR.
Losses faced by financial institutions have the potential to be propagated
through the financial system and undermine its stability. The onset of heightened fears for the riskiness of the banking system can be rapid and have
widespread ramifications. The potential loss faced by banks is therefore a
crucial measure of the stability of the financial sector.
A bank’s fundamental soundness may be measured by its trading revenue,
which is a hypothetical revenue based on portfolio allocation decisions made
by the bank. For the most part, such a measure does not exist, but it is
possible to ascertain actual daily trading revenues, which include the effects
of intraday trades made by the bank and also trading fees and/or commissions, from graphical reports published by some major banks. Pérignon and
Smith (2010) adopted an innovative method for collecting this data. They
searched for banks that disclosed graphs of their daily trading revenues
over a sufficiently long sample period (2001 - 2004). They then downloaded
the graph, converted it to a JPG image and captured the co-ordinates of
each point in order to return a numerical value for daily trading revenue.
The summary statistics and percentiles of the daily trading revenues of Bank
of America, obtained by this method, are presented in Table 1.2.
Table 1.2 Descriptive statistics and percentiles for daily trading revenue of Bank of America for the period 2 January 2001 to 31 December 2004.

Statistics                      Percentiles
Observations       1008         1%    -24.82143
Mean           13.86988         5%    -9.445714
Std. Dev.      14.90892         10%   -2.721429
Skewness      0.1205408         25%    4.842857
Kurtosis       4.925995         50%    13.14839
Maximum        84.32714         75%    22.96184
Minimum       -57.38857         90%    30.85943
                                95%    36.43548
                                99%    57.10429
The mean is greater than the median, indicating that the bulk of the values lie to the left of the mean and that the distribution is positively skewed. This conclusion is borne out by the positive value of the skewness statistic, 0.1205, and also by Figure 1.13 which shows a histogram of daily trading revenue with a normal distribution superimposed. The histogram also shows very clearly that the distribution of daily trading revenue exhibits kurtosis, 4.9260. The histogram indicates that the peak of the distribution is higher
than that of the associated normal distribution and the tails are also fatter.
This situation is known as leptokurtosis.
Figure 1.13 Histogram of daily trading revenue from 2 January 2001 to
31 December 2004 reported by Bank of America. Normal distribution with
mean 13.8699 and standard deviation 14.9090 is superimposed.
How may this information be used to inform a discussion about risk?
Following a wave of banking collapses in the 1990s financial regulators, in
the guise of the Basel Committee on Banking Supervision (1996), started
requiring banks to hold capital to buffer against possible losses, measured
using a method called Value-at-Risk (VaR). VaR quantifies the loss that a
bank can face on its trading portfolio within a given period and for a given
confidence interval. More formally in the context of a bank, VaR is defined in
terms of the lower tail of the distribution of trading revenues. Specifically,
the 1% VaR for the next h periods conditional on information at time T
is the 1st percentile of expected trading revenue at the end of the next h
periods. For example, if the daily 1% h-period VaR is $30 million, then there is a 99% chance that at the end of h periods the bank's trading loss will not exceed $30 million, but there is a 1% chance that the bank will lose $30 million or more.
Although $30 million is a loss in this example, by convention the minus sign
is not used.
There are three common ways to compute VaR.
1. Historical Simulation
The historical method simply computes the percentiles of the distribution from historical data and assumes that history will repeat
itself from a risk perspective. From Table 1.2 the 1% daily VaR for
Bank of America using all available historical data (2001 - 2004) is
$24.8214 million. There is evidence that most banks use historical
simulation to compute VaR (Pérignon and Smith, 2010). Its popularity is probably due to a combination of simplicity, both conceptually
and computationally, and the fact that estimates of VaR will be
reasonably smooth over time.
2. The Variance-Covariance Method
This method assumes that the trading revenues are normally distributed. In other words, it requires that we estimate only two factors, the expected (or mean) return and the standard deviation, in
order to describe the entire distribution of trading revenue. From
Table 1.2 the mean is $13.8699 million and the standard deviation is $14.9089 million, which taken together generate the normal curve superimposed on the histogram in Figure 1.13. From the assumption of a
normal distribution it follows that 1% of the distribution lies in the
tail delimited by −2.33 standard deviations from the mean. The daily
1% VaR for Bank of America is therefore
13.8699 − 2.33 × 14.9089 = −20.8679, giving a daily 1% VaR of $20.8679 million (a calculation sketched in code after this list).
This value is slightly lower than that provided by historical simulation because the assumption of normality ignores the slightly fatter
tails exhibited by the empirical distribution of daily trading revenues.
3. Monte Carlo Simulation
The third method involves developing a model for future stock price
returns and running multiple hypothetical trials through the model.
A Monte Carlo simulation refers to any method that randomly generates trials, but by itself does not tell us anything about the underlying methodology. This approach is revisited in Chapter 6.
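The first two calculations can be sketched in a few lines of R. The revenue series below is simulated as a stand-in for the Bank of America data, so the resulting numbers are illustrative only:

    # Historical and variance-covariance 1% VaR from stand-in revenues
    set.seed(7)
    revenue <- rnorm(1008, mean = 13.87, sd = 14.91)
    quantile(revenue, probs = 0.01)      # 1. historical simulation: 1st percentile
    mean(revenue) - 2.33 * sd(revenue)   # 2. variance-covariance method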
Figure 1.14 plots the daily trading revenue of the Bank of America together with the 1% daily VaR reported by the bank obtained by Pérignon
and Smith in the manner just described. Even to the naked eye it is apparent
that Bank of America had only four violations of the 1% daily reported VaR
during the period 2001-2004 (T = 1008), amounting to only 0.4%. The daily
VaR computed from historical simulation is also shown and it provides compelling evidence that the Bank of America has been over-conservative in its
estimation of daily VaR. Furthermore, Figure 1.14 reveals that the reported
values of VaR are not always closely related to actual observed volatility
in daily trading revenue.

Figure 1.14 Time series plot of the daily 1% Value-at-Risk reported by Bank of America from 2 January 2001 to 31 December 2004, together with daily trading revenue and the VaR computed by historical simulation.

The VaR reported by Bank of America for the year 2001 is fairly consistent and, if anything, trends upward over the year. This is counter-intuitive given the volatility in trading revenue following the events of 11 September 2001.
1.5 The Efficient Markets Hypothesis and Return Predictability
The correlation statistic in (1.22) determines the strength of the co-movements
between the returns on one asset and the returns on another asset. An important alternative application of correlation is to measure the strength of
movements in current returns on an asset, rt with returns on the same asset
k periods earlier, rt−k . As the correlation is based on own lags, it is referred
to as the autocorrelation. For any series of returns, the autocorrelation coefficient for k lags is defined as
\rho_k = \frac{ \sum_{t=k+1}^{T} (r_t - \bar{r})(r_{t-k} - \bar{r}) }{ \sum_{t=1}^{T} (r_t - \bar{r})^2 }.
If the series of returns does not exhibit autocorrelation then there is no
discernible pattern in their behaviour, making future movements in returns
unpredictable. If a series of returns exhibits positive autocorrelation, however, then successive values of returns tend to have the same sign and this
pattern can be exploited in predicting the future behaviour of returns. Similarly, negative autocorrelation results in the signs of successive returns alternating, and prediction based on this pattern is possible.
The fact that the presence of autocorrelation in asset returns represents
a pattern which can potentially be used in prediction of future returns is
the cornerstone of an important concept in modern finance, namely the
efficient markets hypothesis (Fama, 1965; Samuelson, 1965). In its most
general form, the efficient markets hypothesis theorises that all available
information concerning the value of a risky asset is factored into the current
price of the asset. A natural corollary of the efficient markets hypothesis
is that the current price provides no information on the direction of the
future price and that the asset returns should exhibit no autocorrelation.
An empirical test of the efficient market hypothesis in the context of a
particular asset is therefore that all the autocorrelations in its returns are
zero, or ρ1 = ρ2 = ρ3 = · · · = 0.
Table 1.3 gives the first 10 autocorrelations of hourly DM/$ exchange rate
returns in column 2. All autocorrelations appear close to zero, suggesting
that exchange rate returns are not predictable and that the foreign exchange
market is therefore efficient in the sense that all information about the DM/$
exchange rate is contained in the current quoted price.
Table 1.3
Autocorrelation properties of returns and functions of returns for the hourly
DM/$ exchange rate for the period 1 January 1986 00:00 to 15 July 1986 11:00.
Lag      r_t       r_t^2     |r_t|     |r_t|^0.5
 1      -0.022     0.079     0.182     0.214
 2       0.020     0.074     0.128     0.129
 3       0.023     0.042     0.086     0.085
 4      -0.027     0.055     0.070     0.055
 5       0.030     0.004     0.034     0.043
 6      -0.024     0.018     0.058     0.064
 7      -0.010    -0.007     0.018     0.035
 8       0.013    -0.009     0.020     0.033
 9      -0.007    -0.019     0.004     0.015
10       0.027     0.017    -0.014    -0.021
The calculation of autocorrelations of returns reveals information on the mean of returns. This suggests that applying the same approach to squared returns reveals information on the variance of returns. The autocorrelation between squared returns at time t and squared returns k periods earlier is defined as
\[
\rho_k = \frac{\sum_{t=k+1}^{T} \big(r_t^2 - \overline{r^2}\big)\big(r_{t-k}^2 - \overline{r^2}\big)}{\sum_{t=1}^{T} \big(r_t^2 - \overline{r^2}\big)^2},
\]
where $\overline{r^2}$ denotes the sample mean of the squared returns.
The application of autocorrelations to squared returns represents an important diagnostic tool in models of time-varying volatility, which are discussed in Chapter 11. Following, in particular, the seminal work of Engle (1982) and Bollerslev (1986), positive autocorrelation in squared returns suggests that there is a higher chance of high (low) volatility in the next period if volatility in the previous period is high (low). Formally this phenomenon is known as volatility clustering.
Column 3 in Table 1.3 gives the first 10 autocorrelations of hourly squared DM/$ exchange rate returns. Comparing these autocorrelations to the autocorrelations based on returns shows that there is now stronger positive autocorrelation. This suggests that while the mean return is not predictable, the variance of returns is potentially predictable because of the phenomenon of volatility clustering in exchange rate returns. Note, however, that this conclusion does not violate the efficient markets hypothesis because this hypothesis is concerned only with the expected value of the level of returns.
It is also possible to compute autocorrelations for various transformations of returns, including $r_t^3$, $r_t^4$, $|r_t|$ and $|r_t|^\alpha$. The first two transformations provide evidence of autocorrelation in skewness and kurtosis, respectively. The third transformation provides an alternative measure of the presence of autocorrelation in the variance. The last case simply represents a general transformation. For example, setting $\alpha = 0.5$ computes the autocorrelation of the standard deviation transformation, the square root of the absolute return.
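These autocorrelations are straightforward to compute. As an illustration, the following sketch in the R language (one of the three computational environments used throughout this book) computes the first 10 autocorrelations of returns and of the three transformations, mirroring the columns of Table 1.3; the simulated returns are a placeholder assumption standing in for the actual hourly DM/$ data.

# Autocorrelations of returns and of transformations of returns, lags 1 to 10.
# A sketch assuming r is a numeric vector of returns; simulated data are used
# here purely as a placeholder.
ac <- function(x, k = 10) acf(x, lag.max = k, plot = FALSE)$acf[-1]

set.seed(1)
r <- rnorm(2000, mean = 0, sd = 0.001)
tab <- cbind(returns  = ac(r),
             squares  = ac(r^2),
             absolute = ac(abs(r)),
             sqrt_abs = ac(sqrt(abs(r))))
round(tab, 3)

Note that acf() computes the sample autocorrelation function using the full-sample mean, which is numerically very close to the definition given above.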
The presence of stronger autocorrelation in squared returns than in returns suggests that other transformations of returns may reveal even stronger autocorrelation patterns, and this conjecture is borne out by the results reported in Table 1.3. Columns 4 and 5 in Table 1.3 respectively give the first 10 autocorrelations of the hourly absolute DM/$ exchange rate returns, $|r_t|$, and the square root of the absolute DM/$ exchange rate returns, $|r_t|^{0.5}$. Comparing these autocorrelations to the autocorrelations based on returns (column 2) and squared returns (column 3) reveals even stronger positive autocorrelation patterns, with the strongest pattern revealed by the standard deviation transformation $|r_t|^{0.5}$.
1.6 Efficient Market Hypothesis and Variance Ratio Tests†
Another statement of the efficient markets hypothesis is that the price of a
financial asset encapsulates all available information. Consider the following
simple model of asset prices
\[
p_t = \alpha + p_{t-1} + u_t \quad \Longrightarrow \quad p_t - p_{t-1} = r_t = \alpha + u_t, \tag{1.23}
\]
in which the constant α represents a small positive compensation for holding
a risky asset. The main implication of this model is that the predictability of
asset returns and hence prices depends solely upon the characteristics of
the disturbance term ut . Based on this simple model a formal test of the
predictability of asset returns may be developed based on the concept of
a variance ratio, which in fact just turns out to be a clever way of testing
that the autocorrelations of returns are zero. Campbell, Lo and MacKinlay
(1997) provide a thorough treatment of the different versions of the variance
ratio tests.
Suppose that $E[u_t^2] = \sigma^2$ and that $E[u_{t-i} u_{t-j}] = 0$ for all $i \neq j$. In this situation there is no information in the disturbance term that may be used to predict asset returns and the market is therefore efficient. Under these assumptions, the $q$-period return is simply the sum of the single period log returns, as discussed previously, and the variance of the multi-period return, $var(u_t + \cdots + u_{t-q+1})$, is simply $q\sigma^2$. Let $\hat\sigma_q^2$ be an estimator of $var(u_t + \cdots + u_{t-q+1})$ and $\hat\sigma^2$ be the sample variance of single-period returns. Under the null hypothesis, the statistic based on the ratio of variances
\[
V_q = \frac{\hat\sigma_q^2}{q\,\hat\sigma^2}
\]
should, on average, be equal to one.
The intuition behind the test may be developed a little further. Assume
that the disturbance term ut has constant variance σ 2 , but that the covariance between ut and ut−j is not zero but γj . For example, the 3-period
return is
\begin{align*}
var(r_{3t}) &= var(r_t + r_{t-1} + r_{t-2}) \\
&= 3\,var(r_t) + 2\,[cov(r_t, r_{t-1}) + cov(r_{t-1}, r_{t-2}) + cov(r_t, r_{t-2})] \\
&= 3\gamma_0 + 2(2\gamma_1 + \gamma_2),
\end{align*}
recognising that $var(r_t) = \sigma^2 = \gamma_0$. The variance ratio for the 3-period return is then
\[
V_3 = \frac{3\gamma_0 + 2(2\gamma_1 + \gamma_2)}{3\gamma_0}.
\]
This expression may be simplified by recalling that the autocorrelation at lag $i$ is given by $\rho_i = \gamma_i/\gamma_0$. The variance ratio may then be written as
\[
V_3 = 1 + 2\Big(\frac{2}{3}\rho_1 + \frac{1}{3}\rho_2\Big),
\]
which is a weighted sum of autocorrelations with weights declining as the
order of autocorrelation increases. Of course if both ρ1 and ρ2 are zero,
then V3 = 1. In other words, the variance ratio is simply a test that all the
autocorrelations of ut are zero and that therefore returns are not predictable.
To construct a proper statistical test it is necessary to specify how to compute the variance ratio and to establish the distribution of the test statistic under the null hypothesis. Suppose that there are $T+1$ observations on log prices $\{p_1, p_2, \cdots, p_{T+1}\}$ so that there are $T$ observations on log returns.
The variance ratio statistic for returns defined over $q$ periods is defined as
\[
\hat V_q = \frac{\hat\sigma_q^2}{\hat\sigma^2},
\]
in which
\[
\hat\alpha = \frac{1}{T}\sum_{k=1}^{T} r_k, \tag{1.24}
\]
\[
\hat\sigma^2 = \frac{1}{T}\sum_{k=1}^{T} (r_k - \hat\alpha)^2, \tag{1.25}
\]
\[
\hat\sigma_q^2 = \frac{1}{qT}\sum_{k=q+1}^{T+1} (p_k - p_{k-q} - q\hat\alpha)^2. \tag{1.26}
\]
Lo and MacKinlay (1988) show that, in large samples, the test statistic $\hat V_q - 1$ is distributed as follows:
\[
\sqrt{T}\,(\hat V_q - 1) \sim N(0,\, 2(q-1)), \quad \text{or} \quad \Big[\frac{T}{2(q-1)}\Big]^{1/2} (\hat V_q - 1) \sim N(0, 1).
\]
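As a concrete illustration, the following R sketch implements (1.24)-(1.26) and the standard normal form of the test statistic; the function name is an assumption for exposition and the random-walk prices are simulated so that the null hypothesis holds by construction.

# Variance ratio test of Lo and MacKinlay: a minimal sketch.
# Assumes p is a numeric vector of T+1 log prices and q >= 2.
variance_ratio_test <- function(p, q) {
  T <- length(p) - 1                  # number of one-period log returns
  r <- diff(p)                        # one-period log returns
  alpha_hat <- mean(r)                # drift estimate, equation (1.24)
  sig2 <- sum((r - alpha_hat)^2) / T  # one-period variance, equation (1.25)
  # overlapping q-period returns and their variance, equation (1.26)
  rq <- p[(q + 1):(T + 1)] - p[1:(T + 1 - q)]
  sig2q <- sum((rq - q * alpha_hat)^2) / (q * T)
  Vq <- sig2q / sig2
  z <- sqrt(T / (2 * (q - 1))) * (Vq - 1)   # N(0,1) under the null
  c(Vq = Vq, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Simulated random-walk log prices, for which returns are unpredictable
set.seed(42)
p <- cumsum(c(0, rnorm(1000, mean = 0.0005, sd = 0.01)))
variance_ratio_test(p, q = 4)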
There are many other versions of the variance ratio test statistic. Small sample bias adjustments may be made to the estimators $\hat\sigma^2$ and $\hat\sigma_q^2$. The assumptions about the behaviour of the underlying disturbance term, $u_t$, may also be relaxed. For example, it will become apparent in Chapter 11 that, when dealing with the returns to financial assets, the assumption of a constant variance for the disturbance term is unrealistic. Furthermore, although the test is still for zero autocorrelations in the $u_t$, there is strong evidence to suggest dependence in the squares of the disturbance term. This situation can also be dealt with by adjusting the definition of the variance ratio statistic.
1.7 Exercises
(1) Equity Prices, Dividends and Returns
pv.wf1, pv.dta, pv.xlsx
(a) Plot the equity price over time and interpret its time series properties. Compare the result with Figure 1.1.
(b) Plot the natural logarithm of the equity price over time and interpret
its time series properties. Compare this graph with Figure 1.2.
(c) Plot the return on equities over time and interpret its time series
properties. Compare this graph with Figure 1.3.
(d) Plot the price and dividend series using a line chart and compare the result with Figure 1.5.
(e) Compute the dividend yield and plot this series using a line chart.
Compare the graph with Figure 1.6.
(f) Compare the graphs in parts (a) and (b) and discuss the time series
properties of equity prices, dividend payments and dividend yields.
(g) The present value model predicts a one-to-one relationship between
the logarithm of equity prices and the logarithm of dividends. Use a
scatter diagram to verify this property and compare the result with
Figure ??.
(h) Compute the returns on United States equities and then calculate
the sample mean, variance, skewness and kurtosis of these returns.
Interpret the statistics.
(2) Yields
zero.wf1, zero.dta, zero.xlsx
(a) Plot the 2, 3, 4, 5, 6 and 9 month United States zero coupon yields using a line chart and compare the result with Figure 1.4.
(b) Compute the spreads on the 3-month, 5-month and 9-month zero coupon yields relative to the 2-month yield and plot these spreads using a line chart. Compare the graph with Figure 1.4.
(c) Compare the graphs in parts (a) and (b) and discuss the time series
properties of yields and spreads.
(3) Computing Betas
capm.wf1, capm.dta, capm.xlsx
(a) Compute the monthly excess returns on the United States stock
Exxon and the market excess returns.
(b) Compute the variances and covariances of the two excess returns.
Interpret the statistics.
(c) Compute the Beta of Exxon and interpret the result.
(d) Repeat parts (a) to (c) for General Electric, Gold, IBM, Microsoft
and Wal-Mart.
(4) Duration Times Between American Airline (AMR) Trades
amr.wf1, amr.dta, amr.xlsx
(a) Use a histogram to graph the empirical distribution of the duration times between American Airlines trades. Compare the graph with Figure 1.9.
(b) Interpret the shape of the distribution of duration times.
(5) Exchange Rates
hour.wf1, hour.dta, hour.xlsx
(a) Draw a line chart of the $/£ exchange rate and discuss its time
series characteristics.
(b) Compute the returns on the $/£ exchange rate. Draw a line chart of this series and discuss its time series characteristics.
(c) Compare the graphs in parts (a) and (b) and discuss the time series
properties of exchange rates and exchange rate returns.
(d) Use a histogram to graph the empirical distribution of the returns
on the $/£. Compare the graph with Figure 1.12.
(e) Compute the first 10 autocorrelations of the returns, squared returns, absolute returns and the square root of the absolute returns.
(f) Repeat parts (a) to (e) using the DM/$ exchange rate and comment
on the time series characteristics, empirical distributions and patterns of autocorrelation for the two series. Discuss the implications
of these results for the efficient markets hypothesis.
(6) Value-at-Risk
bankamerica.wf1, bankamerica.dta, bankamerica.xlsx
(a) Compute summary statistics and percentiles for the daily trading
revenues of Bank of America. Compare the results with Table 1.2.
(b) Draw a histogram of the daily trading revenues and superimpose a normal distribution on top of the plot. What do you deduce about the distribution of the daily trading revenues?
(c) Plot the trading revenue together with the historical 1% VaR and the reported 1% VaR. Compare the results with Figure 1.14.
(d) Now assume that a weekly VaR is required. Repeat parts (a) to (c)
for weekly trading revenues.
2 Linear Regression Models
2.1 Introduction
One of the most widely used models in empirical finance is the linear regression model. This model provides a framework in which to explain the
movements of one financial variable in terms of one or many explanatory
variables. Important examples include, but are not limited to, measuring
Beta-risk in the capital asset pricing model (CAPM), extensions and variations of the CAPM model, such as the Fama-French three factor model and
the consumption-CAPM version, arbitrage pricing theory, the term structure of interest rates and the present value model of equity prices. Although
these basic models stipulate linear relationships between the variables, the
framework is easily extended to a range of nonlinear relationships as well.
Sharp changes in returns caused by stock market crashes, day-of-the-week effects and policy announcements are easily handled by means of qualitative response variables, or dummy variables.
The importance of the linear regression modelling framework is highlighted by appreciating its flexibility in quantifying changes in key financial parameters arising from changes in the financial landscape. From Chapter 1, the traditional approach to modelling the Beta-risk of an asset is to assume that it is a constant ratio of the covariance between the excess returns on the asset and the market, to the variance of the market excess returns.
However, one or both of these quantities may change over time resulting in
changes in the Beta-risk of the asset. The linear regression model provides
a flexible and natural approach to modelling time-variations in Beta-risk.
2.2 Portfolio Risk Management
Risk management concerns choosing a portfolio of assets in which the relative contribution of each asset is chosen to minimise the overall risk of the portfolio, as measured by its volatility, or variance. To derive the minimum variance portfolio, consider a portfolio consisting of two assets with returns $r_{1,t}$ and $r_{2,t}$, respectively, with the following properties:
Mean: $\mu_1 = E[r_{1,t}]$, $\mu_2 = E[r_{2,t}]$
Variance: $\sigma_1^2 = E[(r_{1,t} - \mu_1)^2]$, $\sigma_2^2 = E[(r_{2,t} - \mu_2)^2]$
Covariance: $\sigma_{1,2} = E[(r_{1,t} - \mu_1)(r_{2,t} - \mu_2)]$
The return on the portfolio is given by
\[
r_{p,t} = w_1 r_{1,t} + w_2 r_{2,t}, \tag{2.1}
\]
where the weights, which define the relative contributions of each asset in the portfolio, satisfy
\[
w_1 + w_2 = 1. \tag{2.2}
\]
The expected return on this portfolio is
\[
\mu_p = E[w_1 r_{1,t} + w_2 r_{2,t}] = w_1 E[r_{1,t}] + w_2 E[r_{2,t}] = w_1\mu_1 + w_2\mu_2, \tag{2.3}
\]
and a measure of the portfolio's risk is
\begin{align*}
\sigma_p^2 &= E[(r_{p,t} - \mu_p)^2] \\
&= E[(w_1(r_{1,t} - \mu_1) + w_2(r_{2,t} - \mu_2))^2] \\
&= w_1^2 E[(r_{1,t} - \mu_1)^2] + w_2^2 E[(r_{2,t} - \mu_2)^2] + 2w_1 w_2 E[(r_{1,t} - \mu_1)(r_{2,t} - \mu_2)] \\
&= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1 w_2\sigma_{1,2}. \tag{2.4}
\end{align*}
Using the restriction imposed by equation (2.2), the risk of the portfolio is equivalent to
\[
\sigma_p^2 = w_1^2\sigma_1^2 + (1-w_1)^2\sigma_2^2 + 2w_1(1-w_1)\sigma_{1,2}. \tag{2.5}
\]
To find the optimal portfolio that minimises risk, the following optimisation problem is solved:
\[
\min_{w_1} \sigma_p^2.
\]
Differentiating (2.5) with respect to $w_1$ gives
\[
\frac{d\sigma_p^2}{dw_1} = 2w_1\sigma_1^2 - 2(1-w_1)\sigma_2^2 + 2(1-2w_1)\sigma_{1,2}.
\]
Setting this derivative to zero and rearranging for $w_1$ gives the optimal portfolio weight on the first asset as
\[
w_1 = \frac{\sigma_2^2 - \sigma_{1,2}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{1,2}}. \tag{2.6}
\]
Using (2.2), the optimal weight on the other asset is
\[
w_2 = 1 - w_1 = \frac{\sigma_1^2 - \sigma_{1,2}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{1,2}}. \tag{2.7}
\]
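A minimal R sketch of equations (2.6) and (2.7), assuming only that r1 and r2 are numeric vectors of returns on the two assets (simulated here for illustration):

# Minimum variance portfolio weights for two assets, equations (2.6)-(2.7).
min_variance_weights <- function(r1, r2) {
  s11 <- var(r1); s22 <- var(r2); s12 <- cov(r1, r2)
  w1 <- (s22 - s12) / (s11 + s22 - 2 * s12)   # equation (2.6)
  c(w1 = w1, w2 = 1 - w1)                     # equation (2.7)
}

set.seed(1)
r1 <- rnorm(250, mean = 0.001, sd = 0.01)     # simulated returns, asset 1
r2 <- rnorm(250, mean = 0.002, sd = 0.02)     # simulated returns, asset 2
w <- min_variance_weights(r1, r2)
w
var(w["w1"] * r1 + w["w2"] * r2)              # portfolio variance at the optimum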
An alternative way of expressing the minimum variance portfolio model
is to consider the linear regression equation
yt = β0 + β1 xt + ut ,
(2.8)
where the variables are defined as
yt = r2,t ,
xt = r2,t − r1,t ,
(2.9)
and ut is a disturbance term which is shown below to be also the return on
the portfolio. The parameters $\beta_0$ and $\beta_1$ are chosen such that their estimated values $\hat\beta_0$ and $\hat\beta_1$, given by
\[
\hat\beta_1 = \frac{cov(y_t, x_t)}{var(x_t)}, \qquad \hat\beta_0 = E[y_t] - \hat\beta_1 E[x_t], \tag{2.10}
\]
respectively, minimise the variance $\sigma^2 = E[u_t^2]$.
To see that the expressions in (2.10) yield the minimum variance portfolio, the definitions of $y_t$ and $x_t$ in (2.9) are substituted into (2.10) to give
\begin{align*}
\hat\beta_1 &= \frac{cov(y_t, x_t)}{var(x_t)} = \frac{cov(r_{2,t},\, r_{2,t} - r_{1,t})}{var(r_{2,t} - r_{1,t})} \\
&= \frac{var(r_{2,t}) - cov(r_{2,t}, r_{1,t})}{var(r_{2,t}) + var(r_{1,t}) - 2\,cov(r_{2,t}, r_{1,t})} = \frac{\sigma_2^2 - \sigma_{1,2}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{1,2}}, \tag{2.11}
\end{align*}
and
\begin{align*}
\hat\beta_0 &= E[y_t] - \hat\beta_1 E[x_t] = E[r_{2,t}] - \hat\beta_1 E[r_{2,t} - r_{1,t}] \\
&= \hat\beta_1 E[r_{1,t}] + (1 - \hat\beta_1) E[r_{2,t}] = \hat\beta_1 \mu_1 + (1 - \hat\beta_1)\mu_2. \tag{2.12}
\end{align*}
The expression for $\hat\beta_1$ is equivalent to the optimal weight of the first asset in the portfolio given in (2.6), that is $\hat\beta_1 = w_1$. A comparison of the expression for $\hat\beta_0$ with the expected return on the portfolio in (2.3) shows that $\hat\beta_0$ represents the mean return on the minimum variance portfolio.
Moreover, the estimate of the disturbance term in (2.8) is
\begin{align*}
\hat u_t &= y_t - \hat\beta_0 - \hat\beta_1 x_t \\
&= r_{2,t} - \hat\beta_0 - \hat\beta_1 (r_{2,t} - r_{1,t}) \\
&= r_{2,t} - \big(\hat\beta_1\mu_1 + (1-\hat\beta_1)\mu_2\big) - \hat\beta_1 (r_{2,t} - r_{1,t}) \\
&= \hat\beta_1 (r_{1,t} - \mu_1) + (1 - \hat\beta_1)(r_{2,t} - \mu_2),
\end{align*}
where the third line makes use of the expression for $\hat\beta_0$ in (2.12). The disturbance term is a weighted average of the deviations of the returns from their average values, where the weights are the portfolio weights. This also means that the variance of the disturbance term, $\sigma^2 = E[u_t^2]$, corresponds to the risk of the portfolio, $\sigma_p^2$.
This one-to-one relationship between the minimum variance portfolio and
the linear regression parameters in (2.8) forms the basis of the least squares
estimator which is used to estimate the parameters of this model from a
sample of data. Before exploiting this connection, some further examples
showing the relationship between the linear regression model and finance
theoretical models are given next.
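Before turning to those examples, the equivalence just derived can be checked numerically. The following R sketch, which reuses the simulated returns r1 and r2 from the previous illustration, estimates the regression in (2.8) and (2.9) by least squares; the slope should match the weight from equation (2.6), up to the use of sample rather than population moments.

# Recovering the minimum variance weights from the regression (2.8)-(2.9).
y <- r2
x <- r2 - r1
fit <- lm(y ~ x)                     # least squares estimates of (beta0, beta1)
coef(fit)["x"]                       # slope: compare with w1 from (2.6)
mean(y) - coef(fit)["x"] * mean(x)   # intercept: the portfolio mean return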
2.3 Linear Models in Finance
This section highlights the importance of the linear regression model in empirical finance by demonstrating that it is central to a number of well-known
theories in finance. In many of these examples the parameters of the linear
regression model are shown to have very clear and explicit interpretations
that directly relate to financial inputs and quantities.
2.3.1 The Constant Mean Model
The simplest linear model in finance is where the average return on an asset
is assumed to be constant
rt = µ + ut ,
(2.13)
where rt is the return and µ = E[rt ] is the average return or expected return.
The disturbance term ut represents the deviation of the return on the asset
at time t from its mean
ut = rt − µ.
This term has two important properties which follow immediately from
(2.13). First, it has zero mean since
E[ut ] = E[rt − µ] = E[rt ] − µ = µ − µ = 0 .
(2.14)
Second, the variance of ut is
σ 2 = E[u2t ] = E[(rt − µ)2 ] ,
(2.15)
where the last step shows that the variances of $u_t$ and $r_t$ are equivalent.
2.3.2 The Market Model
The market model extends the constant mean model in (2.13) by assuming
that the return on the asset follows movements in the return on the market
portfolio, rm,t , and is given by
rt = β0 + β1 rm,t + ut ,
(2.16)
in which ut is the disturbance term. The parameters β0 and β1 represent,
respectively, the intercept and the slope of the linear function β0 + β1 rm,t .
Equation (2.16) is a regression line in which $r_t$ is the dependent variable and $r_{m,t}$ is the explanatory variable, so-called because movements in $r_{m,t}$ help to explain movements in $r_t$. Of course the variation in $r_t$ is only partially
explained by movements in rm,t , with any unexplained variation in rt being
captured by the disturbance term.
In the market model, the expected return on the asset is given by
Et [rt ] = β0 + β1 rm,t ,
(2.17)
where Et [·] is the conditional expectations operator based on information at
time t, as given by rm,t . In the special case where the return is not affected
by the return on the market, β1 = 0, the market model reduces to the
constant mean model in (2.13) and the conditional expectations operator
reduces to the unconditional expectation, Et [rt ] = E[rt ] = β0 . Put simply,
the t subscript on the conditional expectations operator is now dropped as
the expectation is not based on any information at time t, or any other point
in time for that matter.
2.3.3 The Capital Asset Pricing Model
Building on efficient portfolio theory developed by Markowitz (1952, 1959),
the Capital Asset Pricing Model (CAPM), which is credited to Sharpe (1964)
and Lintner (1965), relates the return on the ith asset at time t, ri,t , to the
return on the market portfolio, rm,t , with both returns adjusted by the return
on a risk-free asset, rf,t , usually taken to be the interest rate on a government
security. As in equation (1.11) of Chapter 1, the log excess returns on asset $i$ and on the market are defined as
\[
z_{i,t} = r_{i,t} - r_{f,t}, \qquad z_{m,t} = r_{m,t} - r_{f,t}.
\]
As pointed out in Chapter 1, the risk characteristics of an asset are encapsulated by its Beta-risk
\[
\beta = \frac{cov(z_{i,t}, z_{m,t})}{var(z_{m,t})}. \tag{2.18}
\]
The CAPM is equivalent to the linear regression model
ri,t − rf,t = α + β(rm,t − rf,t ) + ut ,
(2.19)
in which $u_t$ is a disturbance term, $\beta$ represents the asset's Beta-risk as given in (2.18), and the constant, which is traditionally labelled $\alpha$, represents the abnormal return to the asset over and above the asset's exposure to the excess return on the market. This model postulates a linear relationship between the excess return on the asset and the excess return on the market, with the slope given by the asset's Beta-risk, $\beta$.
In the pure form of the CAPM, when the return on the market equals the return on the risk-free asset, so that $r_{m,t} = r_{f,t}$, the return on the asset should also equal the risk-free rate of return. For this relationship to be satisfied, the intercept of the regression model is restricted to be zero, $\alpha = 0$, and the CAPM regression line passes through the origin.
A further feature of the linear regression equation in (2.19) is that it conveniently decomposes the total risk of an asset at time $t$ into a component that is systematic and a part that is idiosyncratic
\[
\underbrace{E[(r_{i,t} - r_{f,t})^2]}_{\text{Total risk}} = \underbrace{E[(\alpha + \beta(r_{m,t} - r_{f,t}))^2]}_{\text{Systematic risk}} + \underbrace{E[u_t^2]}_{\text{Idiosyncratic risk}}, \tag{2.20}
\]
a result which uses the fact that $E[(r_{m,t} - r_{f,t})\, u_t] = 0$. Systematic risk is
so-called because it relates to the risk of the overall market portfolio. The
idiosyncratic risk, σ 2 = E[u2t ], relates to that part of risk which is unique to
the individual asset and uncorrelated with the market.
2.3.4 Arbitrage Pricing Theory
An alternative approach to using Fama-French factors to extend the CAPM equation in (2.19) is to include variables that capture unanticipated movements in key economic variables such as commodity prices and output growth. This class of models is based on the arbitrage pricing theory (APT) developed by Ross (1976), which is summarised by the linear regression equation
ri,t − rf,t = β0 + β1 (rm,t − rf,t ) + β2 Ut + ut ,
(2.21)
where Ut represents unanticipated movements in a particular variable or
set of variables and $u_t$ is a disturbance term. This model reduces to the CAPM in (2.19) when $\beta_2 = 0$, a situation which occurs when unanticipated movements in the economy do not contribute to explaining movements in the excess returns on the asset.
One of the drawbacks of the APT model is that it does not identify the factors, $U_t$, to be included in equation (2.21). In applied work, the choice of factors is usually driven either by theoretical considerations or by the data. The theoretical approach attempts to discern macroeconomic and financial market variables that relate to the systematic risk of the economy. The statistical, or data-driven, approach normally uses a technique known as principal component analysis to identify a number of underlying 'factors' that drive returns, without specifying how exactly these factors are to be interpreted. This approach to factor choice is the subject matter of Chapter 10.
2.3.5 Term Structure of Interest Rates
Consider the relationship between the return on a long-term bond maturing
in n-periods rn,t , and a short-term 1-period bond r1,t . The expectations
hypothesis of the term structure of interest rates requires that the yield on
an n-period long-term bond, $r_{n,t}$, is equal to a constant risk premium, $\phi$, plus
the average of current and expected future 1-period short-term rates
\[
r_{n,t} = \phi + \frac{r_{1,t} + E_t[r_{1,t+1}] + E_t[r_{1,t+2}] + \cdots + E_t[r_{1,t+n-1}]}{n}, \tag{2.22}
\]
in which $E_t[r_{1,t+j}]$ represents the conditional expectation of future short rates based on information at time $t$. Assuming that expectations of future
short-term rates are formed according to
Et [r1,t+j ] = r1,t ,
the term structure relationship in (2.22) reduces to
rn,t = φ + r1,t .
(2.23)
Equation (2.23) suggests that the term structure of interest rates can be
modelled by the following linear regression model
rn,t = β0 + β1 r1,t + ut ,
in which ut is a disturbance term. Under the expectations hypothesis the
slope parameter is given by β1 = 1 and the intercept may then be interpreted
as the risk premium, β0 = φ.
2.3.6 Present Value Model
The price of an asset is equal to the expected discounted dividend stream
\[
P_t = E_t\Big[ \frac{D_{t+1}}{1+\delta} + \frac{D_{t+2}}{(1+\delta)^2} + \frac{D_{t+3}}{(1+\delta)^3} + \cdots \Big], \tag{2.24}
\]
where Dt is the dividend payment, δ is the discount factor, which is assumed to be constant for simplicity, and Et [Dt+j ] represents the conditional
expectations of Dt+j based on information at time t. Adopting the assumptions that expectations of future dividends are given by present dividends,
Et [Dt+n ] = Dt , and the discount rate is constant and equal to δ, then Chapter 1 shows that the price of the asset simplifies to
\[
P_t = \frac{D_t}{\delta}. \tag{2.25}
\]
Taking natural logarithms of both sides gives a linear relationship between $\log P_t$ and $\log D_t$
\[
\log(P_t) = -\log(\delta) + \log(D_t).
\]
This suggests that the present value model can be represented by the following linear regression model
log(Pt ) = β0 + β1 log(Dt ) + ut ,
(2.26)
in which ut is a disturbance term. A test of the present value model is based
on the restriction β1 = 1. This model also shows that the intercept term β0
is a function of the discount factor, β0 = − log(δ), which suggests that the
discount factor is given by δ = exp(−β0 ).
2.3.7 C-CAPM †
The consumption based Capital Asset Pricing Model (C-CAPM) assumes
that a representative agent chooses current and future real consumption
{Ct , Ct+1 , Ct+2 , · · · } to maximise the inter-temporal expected utility function
\[
\sum_{j=0}^{\infty} \delta^j E_t\Big[ \frac{C_{t+j}^{1-\gamma} - 1}{1-\gamma} \Big], \tag{2.27}
\]
subject to the wealth constraint
Wt+1 = (1 + ri,t+1 )(Wt − Ct ),
(2.28)
where Wt is wealth, ri,t is the return on an asset (more precisely on wealth),
and Et is the conditional expectations operator based on information at
time t. The parameters are the discount rate δ, and the relative risk aversion coefficient, γ. Solving this maximisation problem yields the first order
condition
" #
Ct+1 −γ
Et δ
(2.29)
(1 + ri,t+1 ) = 1.
Ct
Taking natural logarithms of this equation gives
\[
\log E_t\Big[ \delta \Big( \frac{C_{t+1}}{C_t} \Big)^{-\gamma} (1 + r_{i,t+1}) \Big] = 0, \tag{2.30}
\]
since log 1 = 0.
The left hand side of expression (2.30) is essentially the logarithm of a conditional expectation. This expression may be simplified by recognising that if a variable $X$ follows the log-normal distribution, then
\[
\log E_t[X] = E_t[\log X] + \frac{1}{2} var_t(\log X). \tag{2.31}
\]
The trick is now to define X = δ(Ct+1 /Ct )−γ (1 + ri,t+1 ) and then find
relatively straightforward expressions for the two terms on the right hand
side of (2.31), based on the assumption that X does indeed follow a lognormal distribution.
The properties of natural logarithms require that
\[
\log X = \log\delta - \gamma \log\Big(\frac{C_{t+1}}{C_t}\Big) + \log(1 + r_{i,t+1}),
\]
so that
\[
E_t[\log X] = \log\delta - \gamma E_t\Big[\log\Big(\frac{C_{t+1}}{C_t}\Big)\Big] + E_t[\log(1 + r_{i,t+1})],
\]
which is the first term on the right hand side of (2.31). The second term is
\[
var_t(\log X) = var_t\Big(\log\delta - \gamma \log\Big(\frac{C_{t+1}}{C_t}\Big) + \log(1 + r_{i,t+1})\Big),
\]
which may be simplified by recognising that the only contributions to $var_t(\log X)$ come from the variances and covariance of the terms in $C_{t+1}/C_t$ and $r_{i,t+1}$. These terms are as follows:
\begin{align*}
var_t\Big(\gamma \log\Big(\frac{C_{t+1}}{C_t}\Big)\Big) &= \gamma^2\, var_t\Big(\log\Big(\frac{C_{t+1}}{C_t}\Big)\Big) = \gamma^2\sigma_c^2, \\
var_t(\log(1 + r_{i,t+1})) &= \sigma_r^2, \\
cov_t\Big(-\gamma \log\Big(\frac{C_{t+1}}{C_t}\Big),\, \log(1 + r_{i,t+1})\Big) &= -\gamma\sigma_{c,r},
\end{align*}
so that $var_t(\log X) = \gamma^2\sigma_c^2 + \sigma_r^2 - 2\gamma\sigma_{c,r}$.
Using these results, it follows that (2.30) can be re-expressed as
\[
\log\delta - \gamma E_t\Big[\log\Big(\frac{C_{t+1}}{C_t}\Big)\Big] + E_t[\log(1 + r_{i,t+1})] + \frac{1}{2}\big(\gamma^2\sigma_c^2 + \sigma_r^2 - 2\gamma\sigma_{c,r}\big) = 0,
\]
or
\[
E_t[\log(1 + r_{i,t+1})] = -\log\delta - \frac{1}{2}\big(\gamma^2\sigma_c^2 + \sigma_r^2 - 2\gamma\sigma_{c,r}\big) + \gamma E_t\Big[\log\Big(\frac{C_{t+1}}{C_t}\Big)\Big].
\]
To convert this equation from expected variables to observable variables, define the following expectations generating equations
\begin{align*}
\log(1 + r_{i,t+1}) &= E_t[\log(1 + r_{i,t+1})] + u_{1,t} \\
\log\Big(\frac{C_{t+1}}{C_t}\Big) &= E_t\Big[\log\Big(\frac{C_{t+1}}{C_t}\Big)\Big] + u_{2,t},
\end{align*}
in which $u_{1,t}$ and $u_{2,t}$ represent errors in forming conditional expectations.
Using these expressions in the previous equation gives a linear regression model between the log returns on an asset and the growth rate in consumption, $\log(C_{t+1}/C_t)$,
\[
\log(1 + r_{i,t+1}) = \beta_0 + \beta_1 \log\Big(\frac{C_{t+1}}{C_t}\Big) + u_t, \tag{2.32}
\]
in which
\[
\beta_0 = -\log\delta - \frac{1}{2}\big(\gamma^2\sigma_c^2 + \sigma_r^2 - 2\gamma\sigma_{c,r}\big), \qquad \beta_1 = \gamma,
\]
and where ut = u1,t − γu2,t is a composite disturbance term. In this expression, the slope parameter of the regression equation is in fact the relative risk
aversion coefficient, γ. The expression of the intercept term shows that β0
is a function of a number of parameters including the relative risk aversion
parameter γ, the discount rate δ, the variance of consumption growth σc2 ,
the variance of log asset returns σr2 and the covariance between logarithm
of asset returns and real consumption growth.
2.4 Estimation
The finance models presented in Section 2.3 are all representable in terms
of the following generic linear regression equation
yt = β0 + β1 x1,t + β2 x2,t + · · · + βK xK,t + ut ,
(2.33)
in which yt is the dependent variable which is a function of a constant, a set of
K explanatory variables given by x1,t , x2,t , · · · , xK,t and a disturbance term,
$u_t$. The disturbance term represents movements in the dependent variable $y_t$ not explained by movements in the explanatory variables. The regression parameters, $\beta_0, \beta_1, \beta_2, \cdots, \beta_K$, control the strength of the relationships between the dependent and the explanatory variables.
For equation (2.33) to represent a valid model ut needs to satisfy a number
of properties, some of which have already been discussed.
(1) Mean:
The disturbance term has zero mean, E[ut ] = 0.
(2) Homoskedasticity:
The disturbance variance is constant for all observations, var(ut ) = σ 2 .
(3) No autocorrelation:
Disturbances corresponding to different observations are independent,
E[ut ut+j ] = 0, j 6= 0.
(4) Independence:
The disturbance is uncorrelated with the explanatory variables, E[ut xj,t ] =
0, j = 1, 2, · · · , K.
(5) Normality:
The disturbance has a normal distribution.
These assumptions are usually summarised as ut ∼ iid N (0, σ 2 ) in the specification of the regression model.
The regression model in (2.33) represents the population. The aim of estimation is to compute the unknown parameters β0 , β1 , β2 , · · · , βK , given a
sample of T observations on the dependent variables and the K explanatory variables. As it is the sample that is used to estimate the population
parameters, the sample counterpart of (2.33) is
\[
y_t = \hat\beta_0 + \hat\beta_1 x_{1,t} + \hat\beta_2 x_{2,t} + \cdots + \hat\beta_K x_{K,t} + \hat u_t, \tag{2.34}
\]
where $\hat\beta_k$ is the sample estimate of $\beta_k$ and $\hat u_t$ represents the regression residual. Given a sample of $T$ observations, the $\hat\beta_k$ are estimated by minimising the residual sum of squares
\[
RSS = \sum_{t=1}^{T} \hat u_t^2. \tag{2.35}
\]
The resulting $\hat\beta_k$ are the ordinary least squares estimates of the parameters of the model.
From the discussion of the minimum variance portfolio problem in Section 2.2, the least squares solution corresponds to estimating the population moments by the sample moments. In the case of a portfolio with two assets, the expressions in (2.10) in terms of the sample moments become
\[
\hat\beta_1 = \frac{\frac{1}{T}\sum_{t=1}^{T} (y_t - \bar y)(x_t - \bar x)}{\frac{1}{T}\sum_{t=1}^{T} (x_t - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x, \tag{2.36}
\]
where $\bar y$ and $\bar x$ are the sample means
\[
\bar y = \frac{1}{T}\sum_{t=1}^{T} y_t, \qquad \bar x = \frac{1}{T}\sum_{t=1}^{T} x_t.
\]
These formulas are easily extended to the multiple regression model in which
there is more than one explanatory variable.
2.5 Some Results for the Linear Regression Model†
This section provides a limited derivation of the ordinary least squares estimators of the multiple linear regression model and also the sampling distributions of the estimators. Attention is focussed on a model with one dependent variable and two explanatory variables in order to give some insight into the general result.
Consider the linear regression model
yt = β1 x1,t + β2 x2,t + ut ,
ut ∼ iid N (0, σ 2 ) ,
(2.37)
in which the variables are defined as deviations from their means so that there is no constant term in equation (2.37). This assumption simplifies the algebra but has no substantive effect. The residual sum of squares is
given by
\[
RSS(\hat\beta) = \sum_{t=1}^{T} \hat u_t^2 = \sum_{t=1}^{T} \big( y_t - \hat\beta_1 x_{1,t} - \hat\beta_2 x_{2,t} \big)^2. \tag{2.38}
\]
Differentiating RSS with respect to $\hat\beta_1$ and $\hat\beta_2$ and setting the results equal to zero yields
\begin{align*}
\frac{\partial RSS}{\partial \hat\beta_1} &= \sum_{t=1}^{T} (y_t - \hat\beta_1 x_{1,t} - \hat\beta_2 x_{2,t})\, x_{1,t} = 0 \\
\frac{\partial RSS}{\partial \hat\beta_2} &= \sum_{t=1}^{T} (y_t - \hat\beta_1 x_{1,t} - \hat\beta_2 x_{2,t})\, x_{2,t} = 0. \tag{2.39}
\end{align*}
This system of first-order conditions can be written in matrix form as
\[
\begin{bmatrix} \sum_{t=1}^{T} y_t x_{1,t} \\ \sum_{t=1}^{T} y_t x_{2,t} \end{bmatrix}
- \begin{bmatrix} \sum_{t=1}^{T} x_{1,t}^2 & \sum_{t=1}^{T} x_{1,t}x_{2,t} \\ \sum_{t=1}^{T} x_{1,t}x_{2,t} & \sum_{t=1}^{T} x_{2,t}^2 \end{bmatrix}
\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]
and solving for $\hat\beta_1$ and $\hat\beta_2$ gives
\[
\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
= \begin{bmatrix} \sum_{t=1}^{T} x_{1,t}^2 & \sum_{t=1}^{T} x_{1,t}x_{2,t} \\ \sum_{t=1}^{T} x_{1,t}x_{2,t} & \sum_{t=1}^{T} x_{2,t}^2 \end{bmatrix}^{-1}
\begin{bmatrix} \sum_{t=1}^{T} x_{1,t} y_t \\ \sum_{t=1}^{T} x_{2,t} y_t \end{bmatrix}, \tag{2.40}
\]
which are the ordinary least squares estimators $\hat\beta = [\hat\beta_1, \hat\beta_2]'$ of the population parameters $\{\beta_1, \beta_2\}$.
Inspection of the terms on the right-hand side of (2.40) allows a number of simplifications of notation to be made. The first matrix on the right-hand side of (2.40), when multiplied by $T^{-1}$, is the sample covariance matrix of $x_{1,t}$ and $x_{2,t}$, which may be denoted $M_{xx}$. Similarly, the second object on the right-hand side of (2.40), when multiplied by $T^{-1}$, contains the sample covariances of $x_{1,t}$ and $x_{2,t}$ with $y_t$, which may be denoted $M_{xy}$. The ordinary least squares estimator of the multiple regression model in equation (2.37) may therefore be written as
\[
\hat\beta = M_{xx}^{-1} M_{xy} = \Big[ \frac{1}{T}\sum_{t=1}^{T} x_t x_t' \Big]^{-1} \Big[ \frac{1}{T}\sum_{t=1}^{T} x_t y_t \Big], \tag{2.41}
\]
in which $x_t = [x_{1,t}, x_{2,t}]'$. The beauty of this notation is that it is completely general. In the event of $K > 2$ regressors the relevant vector $x_t$ is defined accordingly and the estimator is still given by (2.41).
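The generality of (2.41) is easily verified numerically. The following R sketch constructs the moment matrices directly for simulated data and checks the result against the built-in least squares routine; all data-generating values are illustrative assumptions.

# Ordinary least squares in matrix form, equation (2.41).
set.seed(123)
T <- 200
X <- cbind(rnorm(T), rnorm(T))              # two explanatory variables in deviations
beta <- c(0.5, -1.0)
y <- drop(X %*% beta) + rnorm(T, sd = 0.2)  # simulate the regression (2.37)

Mxx <- crossprod(X) / T                     # (1/T) sum of x_t x_t'
Mxy <- crossprod(X, y) / T                  # (1/T) sum of x_t y_t
beta_hat <- solve(Mxx, Mxy)                 # equation (2.41)
beta_hat
coef(lm(y ~ X - 1))                         # check against R's built-in estimator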
Once the ordinary least squares estimates have been computed, the ordinary least squares estimator, $s^2$, of the variance $\sigma^2$, in the case of $K = 2$, is obtained from
\[
s^2 = \frac{1}{T}\sum_{t=1}^{T} \big( y_t - \hat\beta_1 x_{1,t} - \hat\beta_2 x_{2,t} \big)^2. \tag{2.42}
\]
In computing $s^2$ in equation (2.42) it is common to express the denominator in terms of the degrees of freedom, $T - K$, instead of merely $T$. If $K > 2$, the estimation of $\sigma^2$ proceeds exactly as in equation (2.42) where, of course, the appropriate number of regressors and coefficients are included in the computation.
Equation (2.41) for the ordinary least squares estimator of the parameters of the $K$ variable regression model may be rearranged and written as
\[
\hat\beta = \Big[\frac{1}{T}\sum_{t=1}^{T} x_t x_t'\Big]^{-1} \Big[\frac{1}{T}\sum_{t=1}^{T} x_t y_t\Big] = \beta + \Big[\frac{1}{T}\sum_{t=1}^{T} x_t x_t'\Big]^{-1} \frac{1}{T}\sum_{t=1}^{T} x_t u_t, \tag{2.43}
\]
where the last term is obtained by substituting for $y_t$ from the regression equation (2.37). This expression shows that the distribution of the estimator $\hat\beta$ depends crucially on $T^{-1}\sum_{t=1}^{T} x_t u_t$ and $T^{-1}\sum_{t=1}^{T} x_t x_t'$.
The distribution of the ordinary least squares estimator $\hat\beta$ is established in terms of two important results. In order to invoke these results the variables $x_t$ and $y_t$ need to satisfy a number of important conditions. (For expediency, it is simply assumed here that the requisite conditions on $x_t$ and $y_t$ are indeed satisfied; for a more detailed discussion of these conditions and the appropriate choice of central limit theorem see Hamilton (1994) or Martin, Hurn and Harris (2013).) The first result is the weak law of large numbers (WLLN), which is used to claim that the sample covariance matrix of the $x_t$ variables converges, as the sample size gets infinitely large, to the population covariance matrix, or
\[
\frac{1}{T}\sum_{t=1}^{T} x_t x_t' \overset{p}{\longrightarrow} \Omega,
\]
where $\Omega$ is the population covariance matrix of $x_t$ and $\overset{p}{\longrightarrow}$ represents convergence in probability as $T \to \infty$. The second result is the application of a central limit theorem to claim that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{T} x_t u_t \overset{d}{\longrightarrow} N(0, \sigma^2\Omega),
\]
where $\sigma^2$ is the population variance of $u_t$ and $\overset{d}{\longrightarrow}$ represents convergence of
the distribution as $T \to \infty$. Rearranging equation (2.43) slightly and using these two important convergence results yields
\[
\sqrt{T}(\hat\beta - \beta) \overset{d}{\longrightarrow} \Omega^{-1} \times N(0, \sigma^2\Omega) = N(0, \sigma^2\Omega^{-1}).
\]
This is the usual expression for the distribution of the least squares estimator
of the multiple regression model as T → ∞.
2.6 Diagnostics
The estimated regression model is based on the assumption that the model
is correctly specified. To test this assumption a number of diagnostic procedures are performed. These diagnostics are divided into three categories
which relate to the key variables that summarise the model, namely, the
dependent variable $y_t$, the explanatory variables $x_t$ and the disturbances $u_t$.
2.6.1 Diagnostics on the Dependent Variable
The fundamental aim of the linear regression model is to explain the movements in the dependent variable yt . This suggests that a natural measure of
the success of an estimated model is given by the proportion of the variation
in the dependent variable explained by the model. This statistic is given by
the coefficient of determination
\[
R^2 = \frac{\text{Explained sum of squares}}{\text{Total sum of squares}} = \frac{\sum_{t=1}^{T}(y_t - \bar y)^2 - \sum_{t=1}^{T}\hat u_t^2}{\sum_{t=1}^{T}(y_t - \bar y)^2}. \tag{2.44}
\]
The coefficient of determination satisfies the inequality $0 \le R^2 \le 1$, with values close to unity suggesting a very good model fit and values close to zero representing a poor fit.
From equation (2.20), the explained sum of squares provides an overall
estimate of the systematic (non-diversifiable) risk of the asset, while the
unexplained part gives an estimate of its idiosyncratic (or diversifiable risk).
This suggests that R2 provides a measure of the proportion of the total risk
of an asset that is non-diversifiable, and 1 − R2 represents the proportion
that is diversifiable.
A potential drawback with $R^2$ is that it never decreases when another variable is added to the model. By continually including variables, until the number just matches the actual sample size, it is possible to obtain a coefficient of determination of $R^2 = 1$, with all risk effectively diversified away.
From a statistical point of view, what is important in selecting explanatory
variables is to include just those variables which significantly help to improve
the explanatory power of the model. This is achieved by penalising the R2
statistic through the loss in degrees of freedom. This statistic is referred to as the adjusted coefficient of determination, which is computed as
\[
\bar R^2 = 1 - (1 - R^2)\,\frac{T-1}{T-K-1}. \tag{2.45}
\]
A related measure to the coefficient of determination is the standard error of the regression
\[
s = \sqrt{\frac{\sum_{t=1}^{T} \hat u_t^2}{T-K-1}}, \tag{2.46}
\]
which is simply the standard deviation of the ordinary least squares residuals. As the residuals in the CAPM model represent the component of risk
that is diversifiable, this statistic provides an overall measure of diversifiable
risk. A value of s = 0 implies a perfect fit with R2 = 1, with the resultant
implication that all risk is non-diversifiable. An estimate of s > 0 suggests
a less than perfect fit with some risk being diversifiable. However, it is not
possible to determine the quality of fit of a model by simply looking at the
value of s because this quantity is affected by the units in the measurement
of the variables. For example, re-expressing returns in terms of percentages
has the effect of increasing s by a factor of 100, without changing the fit of
the model.
2.6.2 Diagnostics on the Explanatory Variables
As the aim of the regression model is to explain movements in the dependent variable over and above its mean, $\bar y$, using information on the explanatory variables $x_{1,t}, x_{2,t}, \cdots, x_{K,t}$, for this information to be important the slope parameters $\beta_1, \beta_2, \cdots, \beta_K$ associated with these explanatory variables must be non-zero. To investigate this proposition, tests are performed on these parameters individually and jointly.
To test the importance of a single explanatory variable in the regression
equation, the associated parameter estimate is tested to see if it is zero using
a t-test. The null and alternative hypotheses are respectively
H0 : $\beta_k = 0$ [$x_{k,t}$ does not contribute to explaining $y_t$]
H1 : $\beta_k \neq 0$ [$x_{k,t}$ does contribute to explaining $y_t$].
The t statistic to perform this test is
\[
t = \frac{\hat\beta_k}{se(\hat\beta_k)}, \tag{2.47}
\]
where $\hat\beta_k$ is the estimate of $\beta_k$ and $se(\hat\beta_k)$ is the corresponding standard error. The null hypothesis is rejected at the $\alpha$ significance level if the test yields a p-value smaller than $\alpha$:
p-value < $\alpha$ : Reject H0 at the $\alpha$ level of significance
p-value > $\alpha$ : Fail to reject H0 at the $\alpha$ level of significance. (2.48)
It is typical to choose α = 0.05 as the significance level, which means that
there is a 5% chance of rejecting the null hypothesis when it is actually true.
A joint test of all of the explanatory variables is performed using either an F-test or a chi-square test. The null and alternative hypotheses are respectively
H0 : β1 = β2 = ... = βK = 0
H1 : at least one βk is not zero.
Notice that this test does not include the intercept parameter β0 , so the
total number of restrictions is K. The F-statistic is computed as
\[
F = \frac{R^2/K}{(1-R^2)/(T-K-1)}, \tag{2.49}
\]
which is distributed as $F_{K,\,T-K-1}$. The $\chi^2$ test is computed as
\[
\chi^2 = KF = \frac{R^2}{(1-R^2)/(T-K-1)}, \tag{2.50}
\]
which is distributed as $\chi^2$ with $K$ degrees of freedom. Values of the test statistics yielding p-values less than 0.05 constitute rejection of the null hypothesis as in (2.48).
The t-test in (2.47) is designed to determine the importance of an explanatory variable by testing whether the slope parameter is zero. From the discussion of the various theories in finance presented in Section 2.3, other types of tests are of interest which focus on testing whether a population parameter equals a particular non-zero value. For example, in the case of the CAPM it is of interest to see whether an asset tracks the market one-to-one by determining if the slope parameter is unity. The t-statistic to perform this test is obtained by generalising (2.47) as
\[
t = \frac{\hat\beta_k - 1}{se(\hat\beta_k)}. \tag{2.51}
\]
More generally, sets of restrictions can be tested using either an F-test or a chi-square test as before. In the case of testing one restriction, $F = \chi^2 = t^2$.
2.6.3 Diagnostics on the Disturbance Term
The third and final set of diagnostic tests are based on the disturbance term,
ut . For the regression model to represent a well specified model there should
be no information contained in the disturbance term. If this condition is
not satisfied, not only does this represent a violation of the assumptions
underlying the linear regression model, but it also suggests that there are
some arbitrage opportunities which can be used to improve predictions of
the dependent variable.
Residual Plots
A visual plot of the least squares residuals over the sample provides an initial
descriptive tool to identify potential patterns. Positive residuals show that
the model underestimates the dependent variable, whereas negative residuals show that the model overestimates the dependent variable. A sequence of
positive (negative) residuals suggests that the model continually underestimates (overestimates) the dependent variable, thereby raising the possibility
of arbitrage opportunities in predicting movements in the dependent variable. Residual plots are also helpful in identifying abnormal movements in
financial variables.
LM Test of Autocorrelation
This test is very important when using time series data. The aim of the test
is to detect if the disturbance term is related to previous disturbance terms.
The null and alternative hypotheses are respectively
H0 : No autocorrelation
H1 : Autocorrelation
If there is no autocorrelation this provides support for the model, whereas
rejection of the null hypothesis suggests that the model excludes important
information. The test consists of using the least squares residuals $\hat u_t$ in the following equation
\[
\hat u_t = \gamma_0 + \gamma_1 x_{1,t} + \gamma_2 x_{2,t} + \cdots + \gamma_K x_{K,t} + \rho_1 \hat u_{t-1} + v_t, \tag{2.52}
\]
where $v_t$ is a disturbance term. This equation is similar to the linear regression model (2.33) with the exception that $y_t$ is replaced by $\hat u_t$ and there is an additional explanatory variable given by the lagged residual $\hat u_{t-1}$. The test statistic is
\[
LM = T R^2, \tag{2.53}
\]
where $T$ is the sample size and $R^2$ is the coefficient of determination from estimating (2.52). This statistic is distributed as $\chi^2$ with one degree of freedom. The test of autocorrelation using (2.52) constitutes a test of first order autocorrelation. Extensions to higher order autocorrelation are straightforward. For example, a test for second order autocorrelation is based on the regression equation
\[
\hat u_t = \gamma_0 + \gamma_1 x_{1,t} + \gamma_2 x_{2,t} + \cdots + \gamma_K x_{K,t} + \rho_1 \hat u_{t-1} + \rho_2 \hat u_{t-2} + v_t. \tag{2.54}
\]
The test statistic is still (2.53) with the exception that the degrees of freedom
is now equal to 2 to correspond to performing a joint test of lags 1 and 2.
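A sketch of the LM test in R, assuming fit is a model estimated with lm(); the helper name and the convention of padding the initial lagged residuals with zeros are illustrative choices.

# LM test of autocorrelation of a given order, equations (2.52)-(2.53).
lm_autocorr_test <- function(fit, order = 1) {
  u <- resid(fit)
  X <- model.matrix(fit)[, -1, drop = FALSE]   # regressors without intercept
  # lagged residuals, padding the first `order` observations with zeros
  lags <- sapply(1:order, function(j) c(rep(0, j), head(u, -j)))
  aux <- lm(u ~ X + lags)                      # auxiliary regression (2.52)
  LM <- length(u) * summary(aux)$r.squared     # test statistic (2.53)
  c(LM = LM, p.value = pchisq(LM, df = order, lower.tail = FALSE))
}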
White Test of Heteroskedasticity
White's test of heteroskedasticity (White, 1980) is important when using cross-section data or when modelling time-varying volatility, a topic that is dealt with in Chapter 11. The aim of the test is to determine the constancy of the disturbance variance $\sigma^2$. The null and alternative hypotheses are respectively
H0 : Homoskedasticity [$\sigma^2$ is constant]
H1 : Heteroskedasticity [$\sigma^2$ is time-varying].
The test consists of estimating the following equation for the case of K = 2
explanatory variables
\[
\hat u_t^2 = \gamma_0 + \gamma_1 x_{1,t} + \gamma_2 x_{2,t} + \alpha_{1,1} x_{1,t}^2 + \alpha_{1,2} x_{1,t} x_{2,t} + \alpha_{2,2} x_{2,t}^2 + v_t, \tag{2.55}
\]
where vt is a disturbance term. The choice of the explanatory variables can
be extended to include additional variables that are not necessarily included
in the initial regression equation. The test statistic is LM = T R2 , where T
is the sample size and R2 is the coefficient of determination from estimating
(2.55). This statistic is distributed as χ2 with 5 degrees of freedom which
corresponds to the number of explanatory variables in (2.55) excluding the
constant. If the disturbance variance is constant it should not be affected by
the explanatory variables in (2.55). In this special case
γ1 = γ2 = α1,1 = α1,2 = α2,2 = 0,
and the variance reduces to a constant given by σ 2 = γ0 .
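A corresponding R sketch of the White test for the case of K = 2 explanatory variables, assuming u holds the least squares residuals and x1 and x2 the regressors from the original model:

# White test of heteroskedasticity, equation (2.55), for K = 2 regressors.
white_test <- function(u, x1, x2) {
  aux <- lm(I(u^2) ~ x1 + x2 + I(x1^2) + I(x1 * x2) + I(x2^2))
  LM <- length(u) * summary(aux)$r.squared     # LM = T R^2
  c(LM = LM, p.value = pchisq(LM, df = 5, lower.tail = FALSE))
}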
Normality Test
The assumption that ut is normally distributed is important in performing
hypothesis tests. A common way to test this assumption is the Jarque-Bera test. The null and alternative hypotheses are respectively:
H0 : Normality
H1 : Nonnormality
The test statistic is
\[
JB = T\Big( \frac{SK^2}{6} + \frac{(KT-3)^2}{24} \Big), \tag{2.56}
\]
where $T$ is the sample size, and $SK$ and $KT$ are the skewness and kurtosis, respectively, of the least squares residuals
\[
SK = \frac{1}{T}\sum_{t=1}^{T} \Big(\frac{\hat u_t}{s}\Big)^3, \qquad KT = \frac{1}{T}\sum_{t=1}^{T} \Big(\frac{\hat u_t}{s}\Big)^4,
\]
and s is the standard error of the regression in (2.46). The JB statistic is
distributed as χ2 with 2 degrees of freedom.
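The JB statistic is simple to compute directly. A sketch in R, assuming u is the vector of least squares residuals and s the standard error of the regression (the sample standard deviation is used below as a default stand-in):

# Jarque-Bera test of normality, equation (2.56).
jarque_bera <- function(u, s = sd(u)) {
  T <- length(u)
  SK <- mean((u / s)^3)                    # skewness of the residuals
  KT <- mean((u / s)^4)                    # kurtosis of the residuals
  JB <- T * (SK^2 / 6 + (KT - 3)^2 / 24)   # equation (2.56)
  c(JB = JB, p.value = pchisq(JB, df = 2, lower.tail = FALSE))
}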
This set of diagnostics is especially helpful in those situations where, for
example, the fit of the model is poor as given by a small value of the coefficient of determination. In this situation, the specified model is only able
to explain a small proportion of the overall movements in the dependent
variable. But if it is the case that ut is random, this suggests that the model
cannot be improved despite a relatively large proportion of variation in the
dependent variable is unexplained. In empirical finance this type of situation
is perhaps the norm particularly in the case of modelling financial returns
because the volatility tends to dominate the mean. In this noisy environment
it is difficult to identify the signal in the data.
2.7 Estimating the CAPM
Ordinary least squares estimates of the capital asset pricing model in (2.19) are given in Table 2.1 for five United States stocks (Exxon, General Electric, IBM, Microsoft, Walmart) and one commodity (gold) using continuously compounded monthly excess returns from May 1990 to July 2004. The p-values associated with a t-test of the significance of each parameter estimate are given in square brackets.
General Electric, IBM and Microsoft are all aggressive stocks ($\hat\beta_1 > 1$), Exxon and Walmart are conservative stocks ($0 < \hat\beta_1 < 1$) and gold is an imperfect hedge ($\hat\beta_1 < 0$).
Table 2.1
Ordinary least squares estimates of the CAPM in equation (2.19) for monthly returns to five United States stocks and gold for the period April 1990 to July 2004. P-values are given in square brackets.

Stock               β̂0              β̂1              Σû²t     R²       s
Exxon               0.012 [0.000]   0.502 [0.000]   0.249   0.235   0.038
General Electric    0.016 [0.000]   1.144 [0.000]   0.510   0.440   0.055
Gold               -0.003 [0.238]  -0.098 [0.066]   0.149   0.014   0.030
IBM                 0.004 [0.474]   1.205 [0.000]   1.048   0.297   0.079
Microsoft           0.012 [0.069]   1.447 [0.000]   1.282   0.333   0.087
Walmart             0.007 [0.156]   0.868 [0.000]   0.747   0.234   0.066
The t-statistic to test that the market excess return is an important explanatory variable of the excess return on, say, Exxon is computed as
\[
t = \frac{0.502}{0.009} = 55.778,
\]
where 0.009 is the standard error of the slope estimate. The p-value of this test is 0.000, which is given in square brackets in Table 2.1. As $0.000 < 0.05$, the null hypothesis is rejected at the 5% level. The same qualitative results occur for the other assets in Table 2.1, with the exception of gold. For gold the p-value of the test is 0.066, suggesting that the null hypothesis is rejected at the 10% level, but not at the 5% level.
These results may also be used to test the hypothesis that a stock tracks the market one-to-one. The pertinent null hypothesis is $H_0 : \beta_1 = 1$, which may be tested using a t-test. In the case of General Electric, the test statistic is
\[
t = \frac{1.144 - 1}{0.098} = 1.458.
\]
The p-value of this statistic is 0.1447 and the conclusion is that the null hypothesis cannot be rejected at the 5% level.
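Both of these tests are easily reproduced in R. The following sketch estimates the CAPM regression (2.19) on simulated excess returns (the data and parameter values are illustrative assumptions, not the data underlying Table 2.1) and then tests H0 : beta = 1 using the statistic in (2.51).

# Estimating the CAPM and testing whether the asset tracks the market.
set.seed(7)
z_market <- rnorm(171, mean = 0.005, sd = 0.04)   # market excess returns
z_asset  <- 0.01 + 1.1 * z_market + rnorm(171, sd = 0.05)

fit <- lm(z_asset ~ z_market)
summary(fit)                         # alpha, beta, standard errors, p-values

b  <- coef(fit)["z_market"]
se <- coef(summary(fit))["z_market", "Std. Error"]
t_stat <- (b - 1) / se               # test of H0: beta = 1, as in (2.51)
2 * pt(-abs(t_stat), df = fit$df.residual)   # two-sided p-value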
The $R^2$ statistics of the estimated CAPM for the various assets are also given in the second last column of Table 2.1. The largest value reported is for General Electric, which shows that 44% of the variation of movements in its excess returns is explained by movements in the market returns relative to the risk-free rate.
Figure 2.1 Least squares residuals from estimated CAPM regressions for six United States assets (Exxon, General Electric, Gold, IBM, Microsoft, Walmart) for the period April 1990 to July 2004.
Gold has the lowest $R^2$, with just 1.4% of movements explained by the market. This result also suggests that gold has the highest proportion of risk that is diversifiable. Estimates of the diversifiable risk characteristics of each asset are given by $s$ in the last column of the table.
Plots of the least squares residuals in Figure 2.1 highlight the presence of
some outliers in gold (+16.43%) and IBM (−28.48%) in October of 1999,
and Microsoft during the dot-com crisis of 2000 with the biggest movement
occurring in April (−38.56%). The estimated CAPM for Exxon and Walmart
do not exhibit any significant model misspecification. The IBM model does not exhibit autocorrelation at the 1% level, but fails the normality test. The gold and Microsoft CAPMs exhibit second order autocorrelation, but not first or twelfth order autocorrelation at the 5% level, and also fail the normality test.
In contrast, the General Electric CAPM exhibits autocorrelation at all lags,
but does not fail the normality test at the 5% level. All estimated models
pass the White heteroskedasticity test.
Table 2.2
Diagnostic test statistics of the estimated CAPM models for monthly returns to five United States stocks and gold for the period April 1990 to July 2004. P-values are given in parentheses. The test statistics are LM(j), which is the LM test for jth order autocorrelation; WHITE, which is the White test of heteroskedasticity with regressors given by the levels and squares; and JB, which is the Jarque-Bera test of normality.

Stock       LM(1)           LM(2)           LM(12)           WHITE             JB
Exxon       0.567 (0.452)   1.115 (0.573)   12.824 (0.382)   1.022 (0.600)     2.339 (0.310)
GE          5.458 (0.019)   7.014 (0.030)   41.515 (0.000)   5.336 (0.069)     5.519 (0.063)
Gold        1.452 (0.228)   7.530 (0.023)   17.082 (0.146)   2.579 (0.275)   224.146 (0.000)
IBM         0.719 (0.396)   0.728 (0.695)   10.625 (0.561)   1.613 (0.446)    34.355 (0.000)
Microsoft   3.250 (0.071)   6.134 (0.047)   12.220 (0.428)   0.197 (0.906)    52.449 (0.000)
Walmart     1.270 (0.260)   1.270 (0.530)   12.681 (0.393)   2.230 (0.328)     4.010 (0.135)
2.8 Qualitative Variables
In all of the applications and examples investigated so far the explanatory
variables are all quantitative whereby each variable takes on a different value
for each sample observation. However, there are a number of applications in
financial econometrics where it is appropriate to allow some of the explanatory variables to exhibit qualitative movements. Formally this is achieved
by using a dummy variable which is 1 for an event and 0 for a non-event:
\[
Dum_t = \begin{cases} 0 & \text{(non-event)} \\ 1 & \text{(event).} \end{cases}
\]
2.8.1 Stock Market Crashes
Consider the augmented present value model
Pt = β0 + β1 Dt + β2 Dumt + ut ,
where Pt is the stock market price, Dt is the dividend payment and ut is
a disturbance term. The variable Dumt is a dummy variable that captures
the effects of a stock market crash on the price of the asset:
\[
Dum_t = \begin{cases} 0 & \text{(pre-crash period)} \\ 1 & \text{(post-crash period).} \end{cases}
\]
The dummy variable has the effect of changing the intercept in the regression equation according to
\begin{align*}
P_t &= \beta_0 + \beta_1 D_t + u_t &&\text{(pre-crash period)} \\
P_t &= (\beta_0 + \beta_2) + \beta_1 D_t + u_t &&\text{(post-crash period).}
\end{align*}
For a stock market crash $\beta_2 < 0$, which represents a downward shift in the present value relationship between the asset price and dividend payment.
An important stock market crash that began on 10 March 2000 is known as the dot-com crash because the stocks of technology companies fell sharply.
The effect on one of the largest tech stocks, Microsoft, is highlighted in Figure 2.2 by the large falls in its share price over 2000. The biggest movement
is in April 2000 where there is a negative return of 42.07% for the month.
Modelling of Microsoft is also complicated by the unfavourable ruling of its
antitrust case at the same time which would have exacerbated the size of the
fall in April. Further inspection of the returns shows that there is a further
fall in December of 27.94%, followed by a correction of 34.16% in January
of the next year.
Figure 2.2 Monthly Microsoft price (panel (a)) and returns (panel (b)) for the period April 1990 to July 2004.
These three large movements are also apparent in the residual plot in Figure 2.1. Introducing dummy variables for each of these three months into a CAPM model yields
\[
r_{i,t} - r_{f,t} = 0.015 + 1.370\,(r_{m,t} - r_{f,t}) - 0.391\,Apr00_t - 0.298\,Dec00_t - 0.282\,Jan01_t + \hat u_t.
\]
Figure 2.3 gives histograms of the residuals without and with these three dummy variables and shows that the dummy variables are successful in purging the outliers from the tails of the distribution. This result is confirmed by the JB statistic, which has a p-value of 0.651 for the augmented model.
Figure 2.3 Histograms of residuals from a CAPM regression using Microsoft returns for the period April 1990 to July 2004, both with and
without dummy variables for the dot-com crash.
2.8.2 Day-of-the-week Effects
Sometimes share prices exhibit greater movements on Mondays than during the rest of the week. One reason for this extra volatility arises from the build up of information over the weekend when the stock market is closed. To capture
this behaviour consider the regression model
rt = β0 + β1 Mont + β2 Tuet + β3 Wedt + β4 Thut + ut ,
where the data are daily. The dummy variables are defined as
\[
Mon_t = \begin{cases} 1 & \text{Monday} \\ 0 & \text{not Monday} \end{cases}
\qquad
Tue_t = \begin{cases} 1 & \text{Tuesday} \\ 0 & \text{not Tuesday} \end{cases}
\]
\[
Wed_t = \begin{cases} 1 & \text{Wednesday} \\ 0 & \text{not Wednesday} \end{cases}
\qquad
Thu_t = \begin{cases} 1 & \text{Thursday} \\ 0 & \text{not Thursday.} \end{cases}
\]
Notice that there are just four dummy variables to explain the five days of the week. This is because setting all of the dummy variables to zero,
\[
Mon_t = Tue_t = Wed_t = Thu_t = 0,
\]
defines the regression model on Friday as
\[
r_t = \beta_0 + u_t.
\]
The intercept β0 in the model represents a benchmark average return which
corresponds to the default day, namely Friday. All of the other average
returns are measured with respect to this value. For example, the Monday
average return is
E[ rt | Mon] = β0 + β1 .
So a significant value of β1 shows that average returns on Monday differ
significantly from average returns on Friday.
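A hedged sketch of this regression in R, assuming an English locale and a data frame daily with a Date column date and daily returns r (both names hypothetical):

    # Day-of-the-week dummies; Friday is the omitted benchmark day
    wd <- weekdays(daily$date)
    daily$mon <- as.integer(wd == "Monday")
    daily$tue <- as.integer(wd == "Tuesday")
    daily$wed <- as.integer(wd == "Wednesday")
    daily$thu <- as.integer(wd == "Thursday")

    # The intercept estimates the Friday average return and each dummy
    # coefficient measures the deviation of that day's average from Friday
    fit <- lm(r ~ mon + tue + wed + thu, data = daily)
    summary(fit)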
2.8.3 Event Studies
Event studies are widely used in empirical finance to model the effects of
qualitative changes arising from a particular event on financial variables.
Typically, events arise from an announcement such as a change in the CEO of a company, an unfavourable antitrust decision, or a monetary policy announcement. In fact, the
stock market crash and day-of-the-week effects examples of dummy variables
given above also constitute event studies. A typical event study involves
specifying a regression equation based on a particular model to represent
‘normal’ returns, and then defining separate dummy variables at each point
in time over the event window to capture the ‘abnormal’ returns, positive
or negative. The parameter on a particular dummy is the ‘abnormal’ return
at that point in time as it represents the return over and above the ‘normal’
return.
In defining the event window, two periods are included on either side of the point in time of the actual announcement. The period before the announcement is included to identify how the market behaves in anticipation of the announcement. The period after the announcement captures the reaction of the market to the announcement. For an event study with 'normal' returns based on the market model in (2.15) and 'abnormal' returns corresponding to an event window that occurs in the last 5 days of the sample, with the actual announcement occurring on the third-last day in the sample, the regression equation is
    rt = β0 + β1 rm,t + δ−2 ET−4 + δ−1 ET−3 + δ0 ET−2 + δ1 ET−1 + δ2 ET + ut .
The normal return at each point in time is given by β0 + β1 rm,t . The abnormal return on the day of the announcement is δ0 , on the days prior to the
announcement given by δ−2 and δ−1 , and on the days after the announcement given by δ1 and δ2 . The abnormal return for the whole of the event
window is
Total abnormal return = δ−2 + δ−1 + δ0 + δ1 + δ2 .
This suggests that a test of the statistical significance of the event and its
effect on generating abnormal returns over the event window period is based
on the restrictions
    H0 : δ−2 = δ−1 = δ0 = δ1 = δ2 = 0          (Normal returns)
    H1 : at least one restriction is not valid  (Abnormal returns).
A χ2 test can be used with 5 degrees of freedom.
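A minimal sketch of this joint test in R, assuming a data frame df containing the returns r, the market return r_m, and the five event-window dummies e_m2, e_m1, e_0, e_p1 and e_p2 (all names hypothetical):

    # Unrestricted model: market model plus the event-window dummies
    unrestricted <- lm(r ~ r_m + e_m2 + e_m1 + e_0 + e_p1 + e_p2, data = df)

    # Restricted model imposing H0: all five deltas equal to zero
    restricted <- lm(r ~ r_m, data = df)

    # F form of the joint test with 5 numerator degrees of freedom,
    # the counterpart of the chi-square test described in the text
    anova(restricted, unrestricted)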
2.9 Measuring Portfolio Performance
There are three commonly used metrics to measure portfolio performance.
Sharpe Ratio (Sharpe, 1966)
The Sharpe ratio is a measure of the average return, r, in excess of the risk-free rate, rf, per unit of total portfolio risk, s, and is defined as

    S = (r − rf) / s .
The Sharpe ratio demonstrates how well the return of an asset compensates the investor for the risk taken. In particular, when comparing two risky assets, the one with the higher Sharpe ratio provides a better return for the same risk. The Sharpe ratio has proved very
popular in empirical finance because it may be computed directly
from any observed time series of returns.
Treynor Index (Treynor, 1966)
The Treynor ratio is defined as

    T = (r − rf) / β ,

where β is the Beta-risk of the portfolio. Like the Sharpe ratio, this measure gives excess returns per unit of risk, but it uses Beta-risk as the denominator rather than total portfolio risk as in the Sharpe ratio.
Jensen’s Alpha (Jensen, 1968)
Jensen’s alpha is obtained from the CAPM regression as
α = E[ri,t − rf,t ] − βE[rm,t − rf,t ] .
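The three measures are easily computed from a returns series. A minimal sketch in R, assuming vectors r_p, r_m and r_f of monthly portfolio, market and risk-free returns (names hypothetical):

    ex_p <- r_p - r_f                  # excess portfolio return
    ex_m <- r_m - r_f                  # excess market return

    sharpe <- mean(ex_p) / sd(r_p)     # excess return per unit of total risk

    fit     <- lm(ex_p ~ ex_m)         # CAPM regression
    beta    <- coef(fit)[2]            # Beta-risk of the portfolio
    treynor <- mean(ex_p) / beta       # excess return per unit of Beta-risk
    alpha   <- coef(fit)[1]            # Jensen's alpha is the intercept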
To illustrate the general ideas involved in measuring portfolio performance, a data set comprising monthly returns to 10 industry portfolios was downloaded from Ken French's webpage at Dartmouth (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html), together with benchmark monthly returns to the market and the monthly return on a risk-free rate of interest. The industry portfolios are: consumer nondurables (nondur), consumer durables (dur), manufacturing (man), energy (energy), technology (hitec), telecommunications (telcom), wholesale and retail (shops), healthcare (health), utilities (utils) and a catch-all that includes mining, construction, entertainment and finance (other). The return on the market is constructed as the value-weighted return of all CRSP firms incorporated in the United States and listed on the NYSE, AMEX, or NASDAQ, and the risk-free rate is the 1-month U.S. Treasury Bill rate (for more details see Appendix A).
Table 2.3 reports summary statistics for the portfolio returns as well as the
market and risk free variables. Table 2.4 tabulates the Sharpe ratio, Treynor
index and Jensen’s alpha for the 10 industry portfolios together with their
Beta coefficient obtained from estimation of the CAPM equation. Consumer durables, manufacturing and the sectors summarised in 'other' are all aggressive portfolios with β > 1. The retail, wholesale and service shop industry provides a sector portfolio that is closest to being a tracking portfolio
with β = 0.96. All the other industry portfolios are relatively conservative
with 0 < β < 1. As expected none of the industry portfolios provide a hedge
against systematic risk.
Table 2.3
Summary statistics for monthly returns data on the market portfolio, risk free rate of interest and 10 United States industry portfolios for the period January 1927 to December 2008 (T = 984). Data are downloaded from Ken French's data library.

    Variable   Mean     Std. Dev.   Skewness   Kurtosis
    emkt       0.5895   5.4545       0.1886    10.5619
    rf         0.3046   0.2522       1.0146     1.0146
    nondur     0.9489   4.7127      −0.0323     8.7132
    dur        1.0001   7.6647       1.0988    18.1815
    man        0.9810   6.3799       0.9177    15.3365
    energy     1.0625   6.0306       0.2118     6.1139
    hitec      1.0505   7.4844       0.2807     8.8840
    telcom     0.8026   4.6422       0.0109     6.2314
    shops      0.9584   5.9160      −0.0313     8.3867
    health     1.0628   5.7923       0.1684    10.0623
    utils      0.8694   5.7101       0.0881    10.4817
    other      0.8762   6.5295       0.9197    16.4520
Table 2.4
Measures of portfolio performance for monthly returns data on 10 United States industry portfolios for the period January 1927 to December 2008 (T = 984). Data are downloaded from Ken French's data library.

    Variable   Sharpe   Treynor   Beta    Jensen's   Rank     Rank      Rank
               Ratio    Index             Alpha      Sharpe   Treynor   Alpha
    nondur     0.137    0.845     0.762    0.195      1        3         3
    dur        0.091    0.568     1.225   −0.027      8        9         9
    man        0.106    0.601     1.126    0.013      6        7         7
    energy     0.126    0.892     0.850    0.257      3        1         1
    hitec      0.010    0.597     1.249    0.010     10        8         8
    telcom     0.107    0.768     0.649    0.116      5        4         4
    shops      0.111    0.681     0.960    0.088      4        6         6
    health     0.131    0.884     0.858    0.252      2        2         2
    utils      0.099    0.707     0.799    0.094      7        5         5
    other      0.088    0.510     1.120   −0.089      9       10        10
The correct treatment of risk in evaluating portfolio models has been the
subject of much research. While it is well understood that adjusting the
portfolio for risk is important, the exact nature of this adjustment is more
problematic. The results in Table 2.4 highlight a feature that is commonly
encountered in practical performance evaluation, namely, that the Sharpe
and Treynor measures rank performance differently. Of course, this is not
surprising because the Sharpe ratio accounts for total portfolio risk, while
the Treynor measure adjusts excess portfolio returns for systematic risk
only. The similarity between the rankings provided by Treynor’s index and
Jensen’s alpha is also to be expected given that the alpha measure is derived
from a CAPM regression which explicitly accounts for systematic risk via the
inclusion of the market factor. On the other hand, the precision of the alpha
measure is questionable in these regressions, a factor that will be returned
to a little later.
All of the rankings are consistent in one respect, namely that a positive alpha is a necessary condition for good performance and hence alpha
is probably the most commonly used measure. Table 2.4 confirms that the
consumer durables and other industry portfolios are the only ones to return
a negative alpha and they are uniformly ranked as poor performers by all
metrics. The importance of the alpha of a portfolio has led to a substantial
literature that extends the basic CAPM model to account for risk factors
over and above the market risk factor. If these factors can be reliably identified, then the exposure of a portfolio to each risk factor can be included in the expected return. In this way the true excess return or alpha is identified.
Fama and French (1992, 1993) augment the CAPM model by including
two additional factors that measure the performance of small stocks relative
to big stocks (SMB) and the performance of value stocks relative to growth
stocks (HML). The inclusion of a SMB or ‘size’ factor is usually justified
by arguing that this factor captures the fact that small firms have greater
sensitivity to economic conditions than large firms and embody greater informational asymmetry. The motivation for HML is that high book value
relative to market value implies a greater probability of financial distress
and bankruptcy. The combined model is commonly referred to as the Fama-French three-factor model.
Carhart (1997) suggested a fourth factor be included in the extended CAPM model following the work of Jegadeesh and Titman (1993). Jegadeesh and Titman found that a portfolio constructed by buying stocks that had high returns over the past three to twelve months and selling those that had poor returns over the same period had a higher return than that predicted by the three-factor model. This factor is known as the momentum factor, MOMt, and its inclusion in the extended CAPM model is usually justified
by appealing to behavioural aspects of investors such as herding and over- or under-reaction to news.
Figure 2.4 Monthly data for market, size, value and momentum factors of the extended CAPM model for the period January 1927 to December 2012. Panels: Market Factor, Size Factor, Value Factor, Momentum Factor.
Figure 2.4 plots the evolution of the four factors of the extended CAPM
model. The linear regression equation to be estimated in order to implement
the extended model is given by
ri,t − rf,t = α + β1 (rm,t − rf,t ) + β2 SMBt + β3 HMLt + β4 MOMt + ut , (2.57)
where ut is a disturbance term. The contributions of SMB, HML and MOM are determined by the parameters β2 , β3 and β4 respectively. In the special
case where these additional factors do not explain movements in the excess
return on the asset ri,t − rf,t , or β2 = β3 = β4 = 0, equation (2.57) reduces
to the standard CAPM regression equation in (2.19). Table 2.5 reports the
results of estimating this model for the 10 United States industry portfolios.
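A minimal sketch of estimating (2.57) in R, assuming a data frame ff with the excess industry return ex_r, the excess market return ex_mkt, and the factors smb, hml and mom (all names hypothetical):

    # Four-factor regression in (2.57)
    fit4 <- lm(ex_r ~ ex_mkt + smb + hml + mom, data = ff)
    summary(fit4)

    # Joint test of beta2 = beta3 = beta4 = 0, under which (2.57)
    # collapses to the standard CAPM regression
    fit1 <- lm(ex_r ~ ex_mkt, data = ff)
    anova(fit1, fit4)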
There are a number of interesting features to note about the results reported in Table 2.5 in which statistical significance is marked with asterisks
Table 2.5
The four-factor CAPM model, equation (2.57), estimated using monthly returns data on 10 United States industry portfolios for the period January 1927 to December 2008 (T = 984). Data are downloaded from Ken French's data library.

    Variable   Constant    emkt        smb          hml          mom
               α           β1          β2           β3           β4
    nondur      0.1659*    0.7693***   −0.0246       0.0318       0.0229
    dur         0.0344     1.1663***    0.0122       0.1566***   −0.1205***
    man        −0.0210     1.1034***   −0.0030       0.1385***   −0.0116
    energy      0.0836     0.8859***   −0.2042***    0.2719***    0.1157***
    hitec       0.2026*    1.2564***    0.0825**    −0.3592***   −0.0910***
    telcom      0.2513*    0.6669***   −0.1373***   −0.1141***   −0.0870***
    shops       0.1796*    0.9476***    0.0787**    −0.1435***   −0.0575**
    health      0.3180**   0.9025***   −0.0896**    −0.1810***    0.0044
    utils       0.0227     0.7835***   −0.1540***    0.3090***   −0.0122
    other      −0.1319*    1.0380***    0.0662***    0.3328***   −0.0775***

    ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001.
for easy interpretation. The strength of the market factor in driving the
returns to the portfolios is striking, with all the industry portfolio β's being significant at the 0.1% level. There is strong evidence that the factors other than the market factor are important explanatory variables in the extended CAPM equation, but the results are not quite as uniform over
the 10 portfolios. Not only does statistical significance vary, but there are
also changes in sign, which indicates that different industries have vastly
differing exposures to these factors.
Perhaps the most interesting result is the effect of the additional factors
on Jensen’s alpha. The statistical significance of α is not nearly as strong
as expected: 4 of the industry portfolios have statistically insignificant estimates of α while the catch-all sector 'other' has a negative and significant
estimate. The biggest loser in this extended analysis is the energy sector.
Energy was ranked first in Table 2.4 on both the Treynor and Jensen measures, but the estimate of α here is statistically insignificant. Health and
telecommunications appear to come out of the extended CAPM with the
highest measure of excess return.
2.10 Exercises
(1) Minimum Variance Portfolios
capm.wf1, capm.dta, capm.xlsx
Consider the equity prices of the United States companies Microsoft
and Walmart for the period April 1990 to July 2004 (T = 172).
(a) Compute the continuously compounded returns on Microsoft and
Walmart.
(b) Compute the variance-covariance matrix of the returns on these two
stocks. Verify that the covariance matrix of the returns is
    [ 0.011332   0.002380 ]
    [ 0.002380   0.005759 ] ,
where the diagonal elements are the variances of the individual asset
returns and the off-diagonal elements are the covariances. Note that
the off-diagonal elements are in fact identical because the covariance
matrix is a symmetric matrix.
(c) Use the expressions in (2.6) and (2.7) to verify that the minimum
variance portfolio weights between these two assets are
    w1 = (σ2² − σ1,2) / (σ1² + σ2² − 2σ1,2)
       = (0.005759 − 0.002380) / (0.011332 + 0.005759 − 2 × 0.002380) = 0.274
    w2 = 1 − w1 = 1 − 0.274 = 0.726.
(d) Using the computed weights in part (c), compute the return on the
portfolio as well as its mean and variance (without any degrees of
freedom adjustment).
(e) Estimate the regression equation
rWmart,t = β0 + β1 (rWmart,t − rMsoft,t ) + ut ,
where ut is a disturbance term.
(i) Interpret the estimate of β1 and discuss how it is related to the
optimal portfolio weights computed in part (c).
(ii) Interpret the estimate of β0 .
(iii) Compute the least squares residuals ût , and interpret this quantity in the context of the minimum variance portfolio problem.
(iv) Compute the variance of the least squares residuals, without
any degrees of freedom adjustment, and interpret the result.
(f) Using the results in part (e)
(i) Construct a test of an equal weighted portfolio, w1 = w2 = 0.5.
(ii) Construct a test of portfolio diversification.
(g) Repeat parts (a) to (f) for Exxon and GE.
(h) Repeat parts (a) to (f) for gold and IBM.
(2) Estimating the CAPM
capm.wf1, capm.dta, capm.xlsx
(a) Compute the monthly excess returns on Exxon, General Electric, Gold, IBM, Microsoft and Walmart. Be particularly careful when computing the correct risk-free rate to use. [Hint: the variable TBILL
is quoted as an annual rate.]
(b) Estimate the CAPM in (2.19) for each asset and interpret the estimated Beta-risk.
(c) For each asset, test the restriction β1 = 0. Assuming that this restriction holds, what is the relationship between CAPM and the
Constant Mean Model in (2.13)?
(d) For each asset, test the restriction β1 = 1. Assuming that this restriction holds, what is the relationship between CAPM and the
Market Model in (2.16)?
(e) For each asset, test the restriction β0 = 0. Provide an interpretation
of the CAPM if this restriction is valid.
(3) Fama-French Three Factor Model
fama french.wf1, fama french.dta, fama french.xlsx
(a) For each of the 25 portfolios in the data set, estimate the CAPM
and interpret the Beta-risk.
(b) Estimate the Fama-French three factor model for each portfolio, interpret the estimate of the Beta-risk and compare it with the estimate obtained in part (a).
(c) Perform a joint test of the size (SMB) and value (HML) risk factors
in explaining excess returns in each portfolio.
(4) Present Value Model
pv.wf1, pv.dta, pv.xlsx
The present value model for price in terms of dividends is represented
by the following regression model
pt = β0 + β1 dt + ut
where ut is a disturbance term and lowercase denotes logarithms.
(a) Estimate the model and interpret the parameter estimates.
(b) Examine the properties of the model by
(i) Plotting the OLS residuals.
(ii) Testing for autocorrelation.
(iii) Testing for heteroskedasticity.
(iv) Testing for nonnormality.
(c) Test the restriction β1 = 1 and interpret the result. In particular,
interpret the estimate of β0 when β1 = 1.
(5) International CAPM
icapm.wf1, icapm.dta, icapm.xlsx
(a) Estimate the ICAPM for the NYSE and interpret the parameter
estimates.
(b) Examine the properties of the model by
(i) Plotting the OLS residuals.
(ii) Testing for autocorrelation.
(iii) Testing for heteroskedasticity.
(iv) Testing for nonnormality.
(c) Test the restriction β1 = 1 and interpret the result.
(d) Test the joint restrictions β0 = 0, β1 = 1 and interpret the result.
(6) Fisher Hypothesis
fisher.wf1, fisher.dta, fisher.xlsx
The Fisher hypothesis states that nominal interest rates fully reflect
long-run movements in inflation. To test this model consider the linear
regression model
rt = β0 + β1 πt + ut ,
where πt is the inflation rate and ut is a disturbance term. If the Fisher
hypothesis is correct, β1 = 1.
(a) Estimate this model and interpret the parameter estimates.
(b) Test the restriction β1 = 1 and interpret the result. In particular,
interpret the estimate of β0 when β1 = 1.
(7) Term Structure of U.S. Zero Coupon Rates
termstructure.wf1, termstructure.dta, termstructure.xlsx
The expectations theory of the term structure of interest rates is represented by a linear relationship between long-term and short-term interest rates
LONGt = β0 + β1 SHORTt + ut
where ut is a disturbance term.
(a) Estimate the model where the long rate is the 2-year yield and the
short rate is the 1-year yield. Interpret the parameter estimates.
(b) Show that the assumption Et [SHORTt+1] = SHORTt implies that β1 = 1. Test this restriction.
(c) Repeat (a) and (b) where the long rate is chosen, respectively, as
the 3-year rate, the 4-year rate and so on up to the 15-year rate.
(d) Suppose that the conditional expected value of the short rate is now
given by
    Et [SHORTt+j] = φʲ SHORTt ,        j = 1, 2, · · · ,
where φ is an unknown parameter. Show that for the case where the
short and long rates are respectively the 1-year and 2-year yields,
the slope parameter is given by

    β1 = (1 + φ) / 2 .

Use the results obtained in part (a) to estimate φ.
(e) Repeat part (d) where the long rate is the 3-year yield and compare
the estimate of φ with the estimate obtained in part (d). [ Hint:
in deriving an expression for φ it is necessary to solve a quadratic
equation in terms of β1 .]
(f) Suppose that the long-term bond is a consol with n → ∞. Show that the slope parameter in a regression of the consol yield on a constant and the 1-year short rate equals zero for |φ| < 1 in part (d) and unity for |φ| = 1.
(8) Fama-Bliss Regressions
fama bliss.wf1, fama bliss.dta, fama bliss.xlsx
(a) Convert the prices of United States zero coupon bonds into yields
using
    yn,t = −(1/n) log(Pn,t / 100) ,        n = 1, 2, 3, 4, 5,

where Pn,t is the price of an n-year zero coupon bond at time t.
(b) Compute the forward yields as
fn,t = log(Pn−1,t ) − log(Pn,t ),
n = 2, 3, 4, 5,
(c) Compute the annual holding period returns as
hn,t = log(Pn−1,t ) − log(Pn,t−12 ),
n = 2, 3, 4, 5,
(d) Compute the annual excess returns as
    ûn,t = hn,t − y1,t−12 ,        n = 2, 3, 4, 5,
(e) Fama and Bliss (1987) specify a regression equation where the excess
return is a function of the lagged forward spread in the previous year
    ûn,t = β0 + β1 (fn,t−12 − y1,t−12) + ut ,

where ut is a disturbance term. Estimate this equation for maturities n = 2, 3, 4, 5, over the sample period January 1965 to December 2003, and compare the estimates with those reported by Cochrane and Piazzesi (2009) who provide updated estimates of the Fama-Bliss regressions. Fama and Bliss found that the ability to forecast excess returns increased as maturity increased for horizons less than 5 years. Discuss this proposition by comparing R² for each estimated regression equation.
(f) An alternative approach is suggested by Cochrane and Piazzesi
(2009) who specify the regression equation in terms of all forward
rates in the previous year
    ûn,t = β0 + β1 y1,t−12 + β2 f2,t−12 + β3 f3,t−12 + β4 f4,t−12 + β5 f5,t−12 + ut ,
where ut is a disturbance term. Estimate this equation for maturities n = 2, 3, 4, 5 over the sample period January 1965 to December
2003, and compare the estimates with those reported by Cochrane
and Piazzesi (2009). Discuss the pattern of the slope parameter estimates {β1, β2, β3, β4, β5} in each of the four regression equations. Briefly discuss the advantages of this specification over the Fama-Bliss regression model.
(9) The Retirement of Lee Raymond as the CEO of Exxon
capm.wf1, capm.dta, capm.xlsx
In December of 2005, Lee Raymond retired as the CEO of Exxon
receiving the largest retirement package ever recorded of around $400m.
How did the markets view the Lee Raymond event?
(a) Estimate the market model for Exxon from January 1970 to September 2005
rt = β0 + β1 rm,t + ut ,
where rt is the log return on Exxon and rm,t is the market return
computed from the S&P500. Verify that the result is
    rt = 0.009 + 0.651 rm,t + ût ,

where ût is the residual.
(b) Construct the dummy variables
    D2005:10,t = 1 : Oct. 2005
                 0 : otherwise,
    D2005:11,t = 1 : Nov. 2005
                 0 : otherwise,
        ...
    D2006:2,t  = 1 : Feb. 2006
                 0 : otherwise.
(c) Re-estimate the market model including the 5 dummy variables constructed in part (b) over the extended sample from January 1970 to February 2006. Verify that the estimated regression equation is

    rt = 0.009 + 0.651 rm,t − 0.121 Oct05t + 0.007 Nov05t − 0.041 Dec05t + 0.086 Jan06t − 0.059 Feb06t + ût .
(i) What is the relationship between the parameter estimates of β0
and β1 computed in parts (a) and (c)?
(ii) Do you agree that the total estimated abnormal return on Exxon
from October 2005 to February 2006 is
Total abnormal return = −0.121 + 0.007 − 0.041 + 0.086 − 0.059 = −0.128?
(d) An alternative way to compute abnormal returns is to use the estimated model in part (a) and substitute in the values of rm,t for the
event window. As the monthly returns on the market for this period
are
{−0.0179, 0.0346, −0.0009, 0.0251, 0.0004} ,
recompute the abnormal returns. Compare these estimates with the
estimates obtained in part (c).
(e) Perform the following tests of abnormal returns.
(i) There was no abnormal return at the time of retirement in December 2005.
(ii) There were no abnormal returns before retirement.
(iii) There were no abnormal returns after retirement.
(iv) There were no abnormal returns at all.
3 Modelling with Stationary Variables
3.1 Introduction
An important feature of the linear regression model discussed in Chapter 2
is that all variables are dated at the same point in time. To allow for
financial variables to adjust to shocks over time the linear regression model is
extended to allow for a range of dynamics. The first class of dynamic models
developed is univariate, whereby a single financial variable is modelled using its own lags as well as lags of other financial variables. Then multivariate
specifications are developed in which several financial variables are jointly
modelled.
An important characteristic of the multivariate class of models investigated in the chapter is that each variable in the system is expressed as a
function of its own lags as well as the lags of all of the other variables in
the system. This model is known as a vector autoregression (VAR), a model that is characterised by the important feature that every equation has the same set of explanatory variables. This feature of a VAR has several advantages. First, estimation is straightforward, being simply the application of ordinary least squares to each equation one at a time. Second, the model provides the basis for performing causality tests which can be used to quantify the value of information in determining financial variables. Third, the overall dynamics of the system can be examined in several ways, beginning with Granger causality tests and extending to impulse response functions and variance decompositions. Fourth, multivariate tests of financial theories can be undertaken as these theories are shown
to impose explicit restrictions on the parameters of a VAR which can be
verified empirically. Fifth, the VAR provides a very convenient and flexible
forecasting tool to compute predictions of financial variables.
3.2 Stationarity
The models in this chapter, which use standard linear regression techniques,
require that the variables involved satisfy a condition known as stationarity.
Stationarity, or more correctly, its absence is the subject matter of Chapters 4 and 5. For the present a simple illustration will indicate the main
idea. Consider Figures 3.1 and 3.2 which show the daily S&P500 index and
associated log returns, respectively.
Figure 3.1 Snapshots of the time series of the S&P500 index comprising daily observations for the period January 1957 to December 2012.

Figure 3.2 Snapshots of the time series of S&P500 log returns computed from daily observations for the period January 1957 to December 2012.
Assume that an observer is able to take a snapshot of the two series at
different points in time; the first snapshot shows the behaviour of the series for the decade of the 1960s and the second shows their behaviour from 2000 to 2010. It is clear that the behaviour of the series in Figure 3.1 is completely
different in these two time periods. What the impartial observer sees in
1960-1970 looks nothing like what happens in 2000-2010. The situation is
quite different for the log returns plotted in Figure 3.2. To the naked eye the behaviour in the two periods is remarkably similar given that the intervening time span is 30 years.
In both this chapter and the next chapter it will simply be assumed that the series we deal with exhibit behaviour similar to that in Figure 3.2. This
assumption is needed so that past observations can be used to estimate
relationships, interpret the relationships and forecast future behaviour by
extrapolating from the past. In practice, of course, stationarity must be
established using the techniques described in Chapter 4. It is not sufficient
merely to assume that the condition is satisfied.
3.3 Univariate Autoregressive Models
3.3.1 Specification
The simplest specification of a dynamic model of the dependent variable yt
is where the explanatory variables are the own lags of the dependent variable
yt = φ0 + φ1 yt−1 + φ2 yt−2 + · · · + φp yt−p + ut ,
(3.1)
where ut is a disturbance term with zero mean and variance σ², and φ0 , φ1 , · · · , φp ,
are unknown parameters. This equation shows that the information used to
explain movements in yt are the own lags with the longest lag being the pth
lag. This property is formally represented by the conditional expectations
operator which gives the predictor of yt based on information available at
time t − 1
Et−1 [yt ] = φ0 + φ1 yt−1 + φ2 yt−2 + · · · + φp yt−p .
(3.2)
Equation (3.1) is referred to as an autoregressive model with p lags, or simply
AR(p). Estimation of the unknown parameters is achieved by using ordinary
least squares. These parameter estimates can also be used to identify the
role of past information by performing tests on the parameters.
3.3.2 Properties
To understand the properties of AR models, consider the AR(1) model
yt = φ0 + φ1 yt−1 + ut ,
where |φ1 | < 1. Applying the unconditional expectations operator to both
sides gives
E[yt ] = E[φ0 + φ1 yt−1 + ut ] = φ0 + φ1 E[yt−1 ].
As E[yt ] = E[yt−1 ], the unconditional mean is
    E[yt] = φ0 / (1 − φ1).
The unconditional variance is defined as
    γ0 = E[(yt − E[yt])²].
Now
yt − E[yt ] = (φ0 + φ1 yt−1 + ut ) − (φ0 + φ1 E[yt−1 ]) = φ1 (yt−1 − E[yt−1 ]) + ut .
Squaring both sides and taking unconditional expectations gives

    E[(yt − E[yt])²] = φ1² E[(yt−1 − E[yt−1])²] + E[ut²] + 2E[(yt−1 − E[yt−1])ut]
                     = φ1² E[(yt−1 − E[yt−1])²] + E[ut²],

as E[(yt−1 − E[yt−1])ut] = 0. Moreover, because

    γ0 = E[(yt − E[yt])²] = E[(yt−1 − E[yt−1])²]

it follows that

    γ0 = φ1² γ0 + σ²,

which upon rearranging gives

    γ0 = σ² / (1 − φ1²).

The first order autocovariance is
    γ1 = E[(yt − E[yt])(yt−1 − E[yt−1])]
       = E[(φ1 (yt−1 − E[yt−1]) + ut)(yt−1 − E[yt−1])]
       = φ1 E[(yt−1 − E[yt−1])²]
       = φ1 γ0 .
It follows that the kth autocovariance is

    γk = φ1ᵏ γ0 .        (3.3)
It immediately follows from this result that the autocorrelation function (ACF) of the AR(1) model is

    ρk = γk / γ0 = φ1ᵏ .
For 0 < φ1 < 1, the autocorrelation function declines for increasing k so
that the effects of previous values on yt gradually diminish. For higher order
AR models the properties of the ACF are in general more complicated.
To compute the ACF, the following sequence of AR models are estimated
by ordinary least squares
    yt = φ10 + ρ1 yt−1 + ut
    yt = φ20 + ρ2 yt−2 + ut
       ...
    yt = φk0 + ρk yt−k + ut ,
where the estimated ACF is given by {ρ̂1, ρ̂2, · · · , ρ̂k}. The notation adopted
for the constant term emphasises that this term will be different for each
equation.
Another measure of the dynamic properties of AR models is the partial
autocorrelation function (PACF), which measures the relationship between
yt and yt−k but now with the intermediate lags included in the regression
model. The PACF at lag k is denoted as φk,k . By implication the PACF for
an AR(p) model is zero for lags greater than p. For example, in the AR(1)
model the PACF has a spike at lag 1 and thereafter is φk,k = 0, ∀ k > 1. This
is in contrast to the ACF which in general has non-zero values for higher
lags. Note that by construction the ACF and PACF at lag 1 are equal to
each other.
To compute the PACF the following sequence of AR models are estimated
by ordinary least squares
    yt = φ10 + φ11 yt−1 + ut
    yt = φ20 + φ21 yt−1 + φ22 yt−2 + ut
    yt = φ30 + φ31 yt−1 + φ32 yt−2 + φ33 yt−3 + ut
       ...
    yt = φk0 + φk1 yt−1 + φk2 yt−2 + · · · + φkk yt−k + ut ,
where the estimated PACF is therefore given by {ϕ̂1 = φ̂11, ϕ̂2 = φ̂22, · · · , ϕ̂k = φ̂kk}.
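As a rough illustration, these regression-based ACF and PACF calculations can be reproduced in R; a minimal sketch assuming a numeric vector rp of returns (the name is hypothetical):

    # ACF at lag k: regress y_t on y_{t-k} alone
    acf_k <- function(y, k) coef(lm(y[-(1:k)] ~ y[1:(length(y) - k)]))[2]
    sapply(1:3, function(k) acf_k(rp, k))

    # PACF at lag k: regress y_t on lags 1 to k and keep the last coefficient
    pacf_k <- function(y, k) {
      X <- embed(y, k + 1)            # columns: y_t, y_{t-1}, ..., y_{t-k}
      coef(lm(X[, 1] ~ X[, -1]))[k + 1]
    }
    sapply(1:3, function(k) pacf_k(rp, k))

    # Built-in equivalents, computed by slightly different formulae
    acf(rp, lag.max = 3, plot = FALSE)
    pacf(rp, lag.max = 3, plot = FALSE)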
Consider United States monthly data on real equity returns expressed as
a percentage, rpt , from February 1871 to June 2004. The ACF and PACF
of the equity returns are computed by means of a sequence of regressions.
The ACF for lags 1 to 3 is computed using the following three regressions
(standard errors in parentheses):
    rpt = 0.247 + 0.285 rpt−1 + v̂t ,
         (0.099)  (0.024)
    rpt = 0.342 + 0.008 rpt−2 + v̂t ,
         (0.103)  (0.025)
    rpt = 0.361 − 0.053 rpt−3 + v̂t .
         (0.103)  (0.025)

The estimated ACF is

    {ρ̂1 = 0.285, ρ̂2 = 0.008, ρ̂3 = −0.053} .
By contrast, the PACF for lags 1 to 3 is computed using the following three regressions (standard errors in parentheses):

    rpt = 0.247 + 0.285 rpt−1 + v̂t ,
         (0.099)  (0.024)
    rpt = 0.266 + 0.308 rpt−1 − 0.080 rpt−2 + v̂t ,
         (0.098)  (0.025)      (0.025)
    rpt = 0.274 + 0.305 rpt−1 − 0.070 rpt−2 − 0.035 rpt−3 + v̂t .
         (0.099)  (0.025)      (0.026)      (0.025)

The estimated PACF is

    {ϕ̂1 = 0.285, ϕ̂2 = −0.080, ϕ̂3 = −0.035} .
The significance of the estimated coefficients in the regressions required to compute the ACF and PACF suggests that a useful starting point for a dynamic model of real equity returns is a simple univariate autoregressive model. The parameter estimates obtained by estimating an AR(6) model by ordinary least squares are as follows (standard errors in parentheses):

    rpt = 0.243 + 0.303 rpt−1 − 0.064 rpt−2 − 0.041 rpt−3
         (0.099)  (0.025)      (0.026)      (0.026)
        + 0.019 rpt−4 + 0.056 rpt−5 + 0.022 rpt−6 + v̂t ,
         (0.026)       (0.026)      (0.025)
in which v̂t is the least squares residual. The first lag is the most important both economically, having the largest point estimate (0.303), and statistically, having the largest t-statistic (0.303/0.025 = 12.12). The second and fifth lags are also statistically important at the 5% level. The insignificance of the parameter estimate on the sixth lag suggests that an AR(5) model may be a more appropriate and parsimonious model of real equity returns.
3.3.3 Mean Aversion and Reversion in Returns
There is evidence that returns on assets exhibit positive autocorrelation for
shorter maturities and negative autocorrelation for longer maturities. Positive autocorrelation represents mean aversion as a positive shock in returns
in one period results in a further increase in returns in the next period,
whereas negative autocorrelation arises when a positive shock in returns
leads to a decrease in returns in the next period.
An interesting illustration of mean aversion and reversion in autocorrelations is provided by the NASDAQ share index. Using monthly, quarterly
and annual frequencies for the period 1989 to 2009 the following results are
obtained from estimating a simple AR(1) model (standard errors in parentheses):
    Monthly   :  rt = 0.599 + 0.131 rt−1 + et
                      (0.438)  (0.063)
    Quarterly :  rt = 1.950 + 0.058 rt−1 + et
                      (1.520)  (0.111)
    Annual    :  rt = 8.974 − 0.131 rt−1 + et .
                      (7.363)  (0.238)
There appears to be mean aversion in returns for time horizons less than a
year as the first order autocorrelation is positive for monthly and quarterly
returns. By contrast, there is mean reversion for horizons of at least a year
as the first order autocorrelation is now negative with a value of −0.131 for
annual returns.
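A minimal sketch of these AR(1) regressions in R, assuming return vectors r_m, r_q and r_a at the monthly, quarterly and annual frequencies (names hypothetical):

    # AR(1) by OLS: regress the return on its own first lag
    ar1 <- function(r) summary(lm(r[-1] ~ r[-length(r)]))$coefficients

    ar1(r_m)   # positive slope: mean aversion at the monthly frequency
    ar1(r_q)   # positive slope: mean aversion at the quarterly frequency
    ar1(r_a)   # negative slope: mean reversion at the annual frequency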
To understand the change in the autocorrelation properties of returns over
different maturities, consider the following model of prices, Pt , in terms of
fundamentals, Ft
    pt = ft + ut ,        ut ∼ iid N(0, σu²)
    ft = ft−1 + vt ,      vt ∼ iid N(0, σv²),
where lower case letters denote logarithms and vt and ut are disturbance
terms assumed to be independent of each other. Note that ut represents
transient movements in the actual price from its fundamental price.
The 1-period return is
rt = pt − pt−1 = vt + ut − ut−1 .
and the h-period return is
rt (h) = pt − pt−h = rt + rt−1 + · · · + rt−h+1
= (vt + ut − ut−1 ) + (vt−1 + ut−1 − ut−2 ) + · · ·
+(vt−h+1 + ut−h+1 − ut−h )
    = vt + vt−1 + · · · + vt−h+1 + ut − ut−h .
The autocovariance is
γh = E[(log pt − log pt−h )(log pt−h − log pt−2h )]
= E[(vt + vt−1 · · · vt−h+1 + ut − ut−h )
×(vt−h + vt−h−1 + · · · vt−2h+1 + ut−h − ut−2h )]
= E[ut ut−h ] − E[ut ut−2h ] − E[u2t−h ] + E[ut−h ut−2h ]
= 2E[ut ut−h ] − E[ut ut−2h ] − E[u2t−h ].
For h = 0, the returns variance is γ0 = 0. As ut is stationary by assumption,
for longer maturities E[ut ut−h ] and E[ut ut−2h ] both approach zero, and
    lim_{h→∞} γh = −E[u²t−h] ,
implying that the autocovariance must eventually become negative. For intermediate maturities, however, this expression can be positive thereby implying mean aversion in these intermediate returns.
3.4 Univariate Moving Average Models
3.4.1 Specification
An alternative way to introduce dynamics into univariate models is to allow
the lags in the dependent variable yt to be implicitly determined via the
disturbance term ut . The specification of the model is
    yt = ψ0 + ut ,                                    (3.4)

with ut specified as

    ut = vt + ψ1 vt−1 + ψ2 vt−2 + · · · + ψq vt−q ,   (3.5)
where vt is a disturbance term with zero mean and constant variance σv2 , and
ψ0 , ψ1 , · · · , ψq are unknown parameters. As ut is a weighted sum of current
and past disturbances, this model is referred to as a moving average model
with q lags, or more simply MA(q). Estimation of the unknown parameters
is more involved for this class of models than it is for the autoregressive
model as it requires a nonlinear least squares algorithm.
3.4.2 Properties
To understand the properties of MA models, consider the MA(1) model
yt = ψ0 + vt + ψ1 vt−1 ,
(3.6)
where |ψ1 | < 1. Applying the unconditional expectations operator to both
sides gives the unconditional mean
E[yt ] = E[ψ0 + vt + ψ1 vt−1 ] = ψ0 + E[vt ] + ψ1 E[vt−1 ] = ψ0 .
The unconditional variance is

    γ0 = E[(yt − E[yt])²] = E[(vt + ψ1 vt−1)²] = σv² (1 + ψ1²).
The first order autocovariance is

    γ1 = E[(yt − E[yt])(yt−1 − E[yt−1])]
       = E[(vt + ψ1 vt−1)(vt−1 + ψ1 vt−2)]
       = ψ1 σv² ,
whilst for autocovariances of k > 1, γk = 0. The ACF of a MA(1) model is summarised as

    ρk = γk / γ0 = ψ1 / (1 + ψ1²)  :  k = 1           (3.7)
                 = 0               :  otherwise.
This result is in contrast to the ACF of the AR(1) model as now there is a
spike in the ACF at lag 1. As this spike corresponds to the lag length of the
model, it follows that the ACF of a MA(q) model has non-zero values for
the first q lags and zero thereafter.
To understand the PACF properties of the MA(1) model, consider rewriting (3.6) using the lag operator

    yt = ψ0 + (1 + ψ1 L) vt ,

whereby L vt = vt−1 . As |ψ1| < 1, this equation is rearranged by multiplying both sides by (1 + ψ1 L)⁻¹

    (1 + ψ1 L)⁻¹ yt = (1 + ψ1 L)⁻¹ ψ0 + vt
    (1 − ψ1 L + ψ1² L² − · · · ) yt = (1 + ψ1 L)⁻¹ ψ0 + vt .
As this is an infinite AR model, the PACF is non-zero for higher order lags, in contrast to the AR model which has non-zero values only up to and including lag p.
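These ACF and PACF patterns are easily verified by simulation. A minimal sketch in R (the MA coefficient 0.6 is an arbitrary illustrative choice):

    # Simulate 500 observations from an MA(1) with psi1 = 0.6
    set.seed(1)
    y <- arima.sim(model = list(ma = 0.6), n = 500)

    acf(y, lag.max = 10, plot = FALSE)    # single spike at lag 1, near zero after
    pacf(y, lag.max = 10, plot = FALSE)   # decays over several lags

    # One way to estimate the MA(1) parameters (by maximum likelihood)
    arima(y, order = c(0, 0, 1))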
3.4.3 Bid-Ask Bounce
Market-makers provide liquidity in asset markets as they are prepared to
post prices and respond to the demand of buyers and sellers. The market-makers buy at the bid price, bid, and sell at the ask price, ask, with the difference between the two, the bid-ask spread, given by
s = ask − bid,
representing their profit. The price pt is assumed to behave according to

    pt = f + (s/2) It ,
where f is the fundamental price assumed to be constant and It is a binary
indicator variable that pushes the price of the asset upwards (downwards)
if there is a buyer (seller)
It =
+1 : with probability 0.5 (buyer)
−1 : with probability 0.5 (seller).
The change in the price exhibits negative first-order autocorrelation
    corr(∆pt , ∆pt−1) = −1/2
    corr(∆pt , ∆pt−k) = 0 ,        k > 1.
Since the autocorrelation function has a spike at lag 1, this process is equivalent to a first-order MA process.
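A minimal simulation of this mechanism in R (the fundamental price 100 and spread 0.50 are arbitrary illustrative values):

    set.seed(2)
    f <- 100                                     # constant fundamental price
    s <- 0.50                                    # bid-ask spread
    I <- sample(c(-1, 1), 1000, replace = TRUE)  # buyer (+1) or seller (-1)
    p <- f + (s / 2) * I                         # observed transaction price

    # Price changes display first-order autocorrelation near -0.5
    acf(diff(p), lag.max = 3, plot = FALSE)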
3.5 Autoregressive-Moving Average Models
The autoregressive and moving average models are now combined to yield
an autoregressive-moving average model
yt = φ0 + φ1 yt−1 + φ2 yt−2 + · · · + φp yt−p + ut
ut = vt + ψ1 vt−1 + ψ2 vt−2 + · · · + ψq vt−q ,
where vt is a disturbance term with zero mean and constant variance σv2 .
This model is denoted as ARMA(p,q). As with the MA model, the ARMA
model requires a nonlinear least squares procedure to estimate the unknown
parameters.
3.6 Regression Models
A property of the regression models discussed in the previous chapter is that the dependent and explanatory variables all occur at time t. To introduce dynamics into this model, the autoregressive and moving average specifications discussed above can be used. Some ways that dynamics are incorporated into this model are as follows.
(1) Including lagged autoregressive disturbance terms:
yt = β0 + β1 xt + ut
ut = ρ1 ut−1 + vt .
(2) Including lagged moving average disturbance terms:
yt = β0 + β1 xt + ut
ut = vt + θ1 vt−1 .
(3) Including lagged dependent variables:
yt = β0 + β1 xt + λyt−1 + ut .
(4) Including lagged explanatory variables:
yt = β0 + β1 xt + γ1 xt−1 + γ2 xt−2 + β2 zt−1 + ut .
(5) Joint specification:
yt = β0 + β1 xt + λ1 yt−1 + γ1 xt−1 + γ2 xt−2 + β2 zt−1 + ut
ut = ρ1 ut−1 + vt + θ1 vt−1 .
A natural specification of dynamics in the linear regression model arises
in the case of models of forward market efficiency. Lags here are needed for
two reasons. First, the forward rate acts as a predictor of future spot rates.
Second, if the data are overlapping whereby the maturity of the forward rate
is longer than the frequency of observations, the disturbance term will have
a moving average structure. This point is taken up in Exercise 6.
An important reason for including dynamics into a regression model is to
correct for potential misspecification problems that arise from incorrectly
excluding explanatory variables. In Chapter 2, misspecification of this type
is detected using the LM autocorrelation test applied to the residuals of the
estimated regression model.
3.7 Vector Autoregressive Models
Once a decision is made to move into a multivariate setting, it becomes
difficult to delimit one variable as the ‘dependent’ variable to be explained
in terms of all the others. It may be that all the variables are in fact jointly
determined.
3.7.1 Specification and Estimation
This problem was first investigated by Sims (1980) using United States data on the nominal interest rate, money, prices and output. He suggested that to start with it was useful to treat all variables as determined by the system of equations. The model will therefore have an equation for each of the variables under consideration. The most important distinguishing feature of the system of equations, however, is that each equation will have exactly the same set of explanatory variables. This type of model is known as a
vector autoregressive model (VAR).
An example of a bivariate VAR(p) is

    y1,t = φ10 + Σ_{i=1}^p φ11,i y1,t−i + Σ_{i=1}^p φ12,i y2,t−i + u1,t        (3.8)
    y2,t = φ20 + Σ_{i=1}^p φ21,i y1,t−i + Σ_{i=1}^p φ22,i y2,t−i + u2,t ,      (3.9)

where y1,t and y2,t are the dependent variables, p is the lag length which is the same for all equations and u1,t and u2,t are disturbance terms.
Interestingly, despite being a multivariate system of equations with lagged values of each variable potentially influencing all the others, estimation
of a VAR is performed by simply applying ordinary least squares to each
equation one at a time. Despite the model being a system of equations,
ordinary least squares applied to each equation is appropriate because the
set of explanatory variables is the same in each equation.
Higher dimensional VARs containing k variables {y1,t , y2,t , · · · , yk,t }, are
specified and estimated in the same way as they are for bivariate VARs. For
example, in the case of a trivariate model with k = 3, the VAR is specified
as

    y1,t = φ10 + Σ_{i=1}^p φ11,i y1,t−i + Σ_{i=1}^p φ12,i y2,t−i + Σ_{i=1}^p φ13,i y3,t−i + u1,t
    y2,t = φ20 + Σ_{i=1}^p φ21,i y1,t−i + Σ_{i=1}^p φ22,i y2,t−i + Σ_{i=1}^p φ23,i y3,t−i + u2,t     (3.10)
    y3,t = φ30 + Σ_{i=1}^p φ31,i y1,t−i + Σ_{i=1}^p φ32,i y2,t−i + Σ_{i=1}^p φ33,i y3,t−i + u3,t .
Estimation of the first equation involves regressing y1,t on a constant and
all of the lagged variables. This is repeated for the second equation where
y2t is the dependent variable, and for the third equation where y3t is the
dependent variable.
In matrix notation the VAR is conveniently represented as

    yt = Φ0 + Φ1 yt−1 + Φ2 yt−2 + · · · + Φp yt−p + ut ,        (3.11)

where the parameters are given by

    Φ0 = [ φ10 ]        Φi = [ φ11,i   φ12,i   · · ·   φ1k,i ]
         [ φ20 ]             [ φ21,i   φ22,i   · · ·   φ2k,i ]
         [  ..  ] ,          [   ..      ..     ..      ..   ] .
         [ φk0 ]             [ φk1,i   φk2,i   · · ·   φkk,i ]
The disturbances ut = {u1,t , u2,t , ..., uk,t } have zero mean with covariance matrix

    Ω = [ var(u1t)         cov(u1t , u2t)   · · ·   cov(u1t , ukt) ]
        [ cov(u2t , u1t)   var(u2t)         · · ·   cov(u2t , ukt) ]        (3.12)
        [      ..               ..           ..          ..        ]
        [ cov(ukt , u1t)   cov(ukt , u2t)   · · ·   var(ukt)       ]
This matrix has two properties. First, it is a symmetric matrix so that the
upper triangular part of the matrix is the mirror of the lower triangular part
    cov(uit , ujt) = cov(ujt , uit) ,        i ≠ j.
Second, the disturbance terms in each equation are allowed to be correlated
with the disturbances of other equations
    cov(uit , ujt) ≠ 0 ,        i ≠ j.
This last property is important when undertaking impulse response analysis
and computing variance decompositions, topics which are addressed at a
later stage.
Now consider extending the AR(6) model for real equity returns to include lagged real dividend returns, rdt , as possible explanatory variables. This seems like a reasonable course of action given that the present value model established a theoretical link between equity prices and dividends. Setting the lag length, p, equal to six yields the following estimated equation:
    ret = 0.254 + 0.296 ret−1 − 0.064 ret−2 − 0.040 ret−3
         (0.102)  (0.025)      (0.026)      (0.026)
        + 0.021 ret−4 + 0.053 ret−5 + 0.013 ret−6
         (0.026)       (0.026)      (0.025)
        − 0.019 rdt−1 + 0.504 rdt−2 − 0.296 rdt−3
         (0.193)       (0.262)      (0.258)
        + 0.395 rdt−4 − 0.259 rdt−5 − 0.350 rdt−6 + ût .
         (0.257)       (0.263)      (0.191)
As before, standard errors are shown in parentheses and ût is the least squares residual.
Equally important, however, is a model to explain real dividend returns
and a natural specification of a model of real dividend returns is to include
as explanatory variables both own lags and lags of real equity returns. Using
the same data as in the estimated models of real equity returns, an AR(6)
model of rdt which also includes lagged values of ret , is estimated by ordinary
least squares. The results are as follows:
    rdt = 0.016 + 0.001 ret−1 + 0.008 ret−2 + 0.007 ret−3
         (0.013)  (0.003)      (0.003)      (0.003)
        + 0.001 ret−4 + 0.012 ret−5 + 0.014 ret−6
         (0.003)       (0.003)      (0.003)
        + 0.918 rdt−1 + 0.015 rdt−2 − 0.282 rdt−3
         (0.025)       (0.034)      (0.033)
        + 0.250 rdt−4 + 0.015 rdt−5 − 0.030 rdt−6 + ût .
         (0.033)       (0.034)      (0.025)
The parameter estimates on real equity returns at lags 2, 3, 5 and 6 are
all statistically significant. A joint test of the parameters of the lags of ret yields a chi-square statistic of 60.395. The p-value is 0.000, showing that the
restrictions are easily rejected and that lagged values of ret are important
in explaining the behaviour of rdt .
Treating both real equity returns, ret , and real dividend payments, rdt ,
as potentially endogenous, a VAR(6) model is estimated for monthly United
States data from 1871 to 2004. The parameter estimates (with standard
errors in parentheses) are given in Table 3.1. A comparison of the point
estimates of the VAR(6) and the univariate models of equity and dividend
returns given previously will show that the estimates are indeed the same.
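A minimal sketch of estimating such a VAR equation by equation in R, assuming numeric vectors re and rd of monthly real equity and dividend returns (names hypothetical):

    p <- 6
    # Lags 1..p of each variable; embed() aligns the lagged observations
    X <- cbind(embed(re, p + 1)[, -1], embed(rd, p + 1)[, -1])
    colnames(X) <- c(paste0("re_l", 1:p), paste0("rd_l", 1:p))
    y_re <- re[-(1:p)]
    y_rd <- rd[-(1:p)]

    # Each equation has the same regressors, so OLS one equation at a time
    eq_re <- lm(y_re ~ X)   # equity-returns equation
    eq_rd <- lm(y_rd ~ X)   # dividend-returns equation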
Table 3.1
Parameter estimates of a bivariate VAR(6) model for United States monthly real
equity returns and real dividend payments for the period 1871 to 2004.
    Lag        Equity Returns            Dividend Returns
               re         rd             re         rd
    1           0.296     −0.019          0.001      0.918
               (0.025)    (0.193)        (0.003)    (0.025)
    2          −0.064      0.504          0.008      0.015
               (0.026)    (0.262)        (0.003)    (0.034)
    3          −0.040     −0.296          0.007     −0.282
               (0.026)    (0.258)        (0.003)    (0.033)
    4           0.021      0.395          0.001      0.250
               (0.026)    (0.257)        (0.003)    (0.033)
    5           0.053     −0.259          0.012      0.015
               (0.026)    (0.263)        (0.003)    (0.034)
    6           0.013     −0.350          0.014     −0.030
               (0.025)    (0.191)        (0.003)    (0.025)
    Constant    0.254                     0.016
               (0.102)                   (0.013)
3.7.2 Lag Length Selection
An important part of the specification of a VAR is the choice of the lag
structure p. If the lag length is too short important parts of the dynamics
are excluded from the model. If the lag structure is too long then there are
redundant lags which can reduce the precision of the parameter estimates,
thereby raising the standard errors and yielding t-statistics that are too small. Moreover, in choosing a lag structure in a VAR, care needs
to be exercised as degrees of freedom can quickly diminish for even moderate
lag lengths.
An important practical consideration in estimating the parameters of a
VAR(p) model is the optimal choice of lag order. A common data-driven
way of selecting the lag order is to use information criteria. An information
criterion is a scalar that is a simple but effective way of balancing the improvement in the fit of the equations with the loss of degrees of freedom
which results from increasing the lag order of a time series model.
The three most commonly used information criteria for selecting a parsimonious time series model are the Akaike information criterion (AIC)
(Akaike, 1974, 1976), the Hannan information criterion (HIC) (Hannan and
Quinn, 1979; Hannan, 1980) and the Schwarz information criterion (SIC)
(Schwarz, 1978). If k is the number of parameters estimated in the model,
these information criteria are given by
    AIC = log |Ω̂| + 2k / (T − p)
    HIC = log |Ω̂| + 2k log(log(T − p)) / (T − p)        (3.13)
    SIC = log |Ω̂| + k log(T − p) / (T − p) ,

in which p is the maximum lag order being tested for and Ω̂ is the ordinary least squares estimate of the matrix in equation (3.12). In the scalar case, the determinant of the estimated covariance matrix, |Ω̂|, is replaced by the estimated residual variance, s².
Choosing an optimal lag order using information criteria requires the following steps.
Step 1: Choose a maximum number of lags for the VAR model. This choice
is informed by the ACFs and PACFs of the data, the frequency with
which the data are observed and also the sample size.
Step 2: Estimate the model sequentially for all lags up to and including p.
For each regression, compute the relevant information criteria.
Step 3: Choose the specification of the model corresponding to the minimum values of the information criteria. In some cases there will
be disagreement between different information criteria and the final
choice is then an issue of judgement.
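A rough sketch of computing the Hannan-Quinn criterion over candidate lag orders in R, assuming a T × 2 matrix Y holding the two return series (name hypothetical); all fits use the same T − pmax observations so the criteria are comparable:

    hic <- function(Y, p, pmax) {
      Z <- embed(Y, pmax + 1)              # common sample across candidate lags
      y <- Z[, 1:2]                        # contemporaneous values
      X <- Z[, 2 + seq_len(2 * p)]         # lags 1..p of both variables
      U <- resid(lm(y ~ X))                # OLS residuals, equation by equation
      Omega <- crossprod(U) / nrow(U)      # estimated residual covariance matrix
      k <- 2 * (1 + 2 * p)                 # parameters across the two equations
      log(det(Omega)) + 2 * k * log(log(nrow(U))) / nrow(U)
    }

    sapply(1:8, function(p) hic(Y, p, pmax = 8))  # choose the minimising lag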
The bivariate VAR(6) for equity returns and dividend returns in Table 3.1
arbitrarily chose p = 6. In order to verify this choice the information criteria
outlined in Section 3.7.2 should be used. For example, the Hannan-Quinn
criterion (HIC) for this VAR for lags from 1 to 8 is as follows:
    Lag:   1       2       3       4       5       6       7       8
    HQ:    7.155   7.148   7.146   7.100   7.084   7.079*  7.086   7.082
It is apparent that the minimum value of the statistic is HQ = 7.079, which
corresponds to an optimal lag structure of 6. This provides support for the
choice of the number of lags used to estimate the VAR.
3.7.3 Granger Causality Testing
In a VAR model, all lags are assumed to contribute to information on each dependent variable, but in most empirical applications a large number of the estimated coefficients are statistically insignificant. It is then a question of crucial importance to determine whether at least one of the parameters on the lagged values of the explanatory variables in any equation is not zero. In
the bivariate VAR case, this suggests that a test of the information content
of y2t on y1t in equation (3.8) is given by testing the joint restrictions
    φ12,1 = φ12,2 = φ12,3 = · · · = φ12,p = 0.
These restrictions can be tested jointly using a chi-square test.
If y2t is important in predicting future values of y1t over and above lags
of y1t alone, then y2t is said to cause y1t in Granger’s sense (Granger, 1969).
It is important to remember, however, that Granger causality is based on
the presence of predictability. Evidence of Granger causality and the lack of
Granger causality from y2t to y1t , are denoted, respectively, as
    y2t → y1t ,        y2t ↛ y1t .
It is also possible to test for Granger causality in the reverse direction by
performing a joint test of the lags of y1t in the y2t equation. Combining both
sets of causality results can yield a range of statistical causal patterns:
    Unidirectional (from y2t to y1t):   y2t → y1t ,   y1t ↛ y2t
    Bidirectional (feedback):           y2t → y1t ,   y1t → y2t
    Independence:                       y2t ↛ y1t ,   y1t ↛ y2t
Table 3.2 gives the results of the Granger causality tests based on the
chi-square statistic. Both p-values are less than 0.05 showing that there is
bidirectional Granger causality between real equity returns (re) and real
dividend returns (rd). Note that the results of the Granger causality test for rd ↛ re reported in Table 3.2 may easily be verified using the estimation results obtained from the model where real equity returns are a function of lags 1 to 6 of ret and rdt : a test of the information value of real dividend returns is given by the chi-square statistic χ² = 20.288. There are 6 degrees of freedom, resulting in a p-value of 0.0025, suggesting real dividend
returns are statistically important in explaining real equity returns at the
5% level. This is in complete agreement with the results of the Granger
causality tests concerning the information content of dividends.
Table 3.2
Results of Granger causality tests based on the estimates of a bivariate VAR(6)
model for United States monthly real equity returns and real dividend payments
for the period 1871 to 2004.
    Null Hypothesis    Chi-square    Degrees of Freedom    p-value
    rd ↛ re            20.288        6                     0.0025
    re ↛ rd            60.395        6                     0.0000
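A minimal sketch of this test in R, reusing the lag matrix X and the dependent variable y_re from the VAR sketch above (all names hypothetical):

    # Unrestricted equity equation: own lags plus dividend lags
    unres <- lm(y_re ~ X)

    # Restricted equation: the six dividend lags are excluded
    res <- lm(y_re ~ X[, 1:6])

    # F form of the joint test of the dividend-lag parameters; the
    # chi-square version reported in Table 3.2 tests the same restrictions
    anova(res, unres)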
3.7.4 Impulse Response Analysis
The Granger causality test provides one method for understanding the overall dynamics of lagged variables. An alternative, but related, approach is to track the effects of shocks through the model on the dependent variables. In this way the full dynamics of the system are displayed, showing how the variables interact with each other over time. This approach is formally called impulse response analysis.
In performing impulse response analysis a natural candidate to represent
a shock is the disturbance term ut = {u1,t , u2,t , ..., uk,t } in the VAR as it
represents that part of the dependent variables that is not predicted from
past information. The problem though is that the disturbance terms are
correlated as highlighted by the fact that the covariance matrix in (3.12) in
general has non-zero off-diagonal terms. The approach in impulse response
analysis is to transform ut into another disturbance term which has the property that it has a covariance matrix with zero off-diagonal terms. Formally
the transformed residuals are referred to as orthogonalized shocks which
have the property that u2,t to uk,t do not have an immediate effect on u1,t , u3,t to uk,t do not have an immediate effect on u2,t , etc.
Figure 3.3 gives the impulse responses of the VAR equity-dividend model. There are four panels to capture the four sets of impulses. The first column
gives the response of re and rd to a shock in re, whereas the second column
shows how re and rd are affected by a shock to rd. A positive shock to re
has a damped oscillatory effect on re which quickly dissipates. The effect
on rd is initially negative which quickly becomes positive, reaching a peak
after 8 months, before decaying monotonically. The effect of a positive rd
shock on rd slowly dissipates, approaching zero after nearly 30 periods.
Figure 3.3 Impulse responses for the VAR(6) model of equity prices and dividends. Data are monthly for the period January 1871 to June 2004. Panels: RE → RE, RD → RE, RE → RD, RD → RD; the horizontal axis of each panel is the forecast horizon.
The immediate effect of this shock on re is zero by construction, and thereafter the response hovers near zero, exhibiting a damped oscillatory pattern.
3.7.5 Variance Decomposition
The impulse response analysis provides information on the dynamics of the
VAR system of equations and how each variable responds and interacts to
shocks in the other variables in the system. To gain insight into the relative
importance of shocks on the movements in the variables in the system a
variance decomposition is performed. In this analysis, movements in each
variable over the horizon of the impulse response analysis are decomposed
into the separate relative effects of each shock with the results expressed as
a percentage of the overall movement. It is because the impulse responses
are expressed in terms of orthogonalized shocks that it is possible to carry
out this decomposition.
The variance decomposition for selected periods of real equity (re) and
real dividend (rd) returns based on the bivariate VAR equity-dividend model
is as follows:
            Decomposition of re        Decomposition of rd
Period        re         rd              re         rd
1          100.000      0.000           0.316     99.684
5           98.960      1.040           1.114     98.886
10          98.651      1.348           8.131     91.869
15          98.593      1.406          10.698     89.302
20          98.554      1.445          11.686     88.313
25          98.539      1.460          11.996     88.004
30          98.535      1.465          12.081     87.919
The rd shocks contribute very little to re, with the maximum contribution still less than 2%. In contrast, re shocks after 15 periods contribute more than 10% of the variance in rd. These results suggest that the effects of shocks in re on rd are relatively more important than the reverse.
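Decompositions of this kind are produced in R by the fevd() function of the vars package; a minimal sketch, again assuming the fitted VAR(6) object model from the earlier examples:

    # Forecast error variance decomposition for 30 periods (a sketch).
    library(vars)
    vd <- fevd(model, n.ahead = 30)
    100 * vd$re[c(1, 5, 10, 15, 20, 25, 30), ]  # decomposition of re (percent)
    100 * vd$rd[c(1, 5, 10, 15, 20, 25, 30), ]  # decomposition of rd (percent)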
3.7.6 Diebold-Yilmaz Spillover Index
An important application of the variance decomposition of a VAR is the spillover index proposed by Diebold and Yilmaz (2009), where the aim is to compute the total contribution of shocks to an asset market arising from all other markets. Table 3.3 gives the variance decomposition for a 10-week horizon of the weekly asset returns of 19 countries based on a VAR with 2 lags and a constant. The sample period begins December 4th 1996 and ends November 23rd 2007.
The first row of the table gives the contributions of shocks in all 19 asset markets to the 10-week forecast variance of US weekly returns. Excluding the own shock, which equals 93.6%, the total contribution of the other 18 asset markets is given in the last column and equals

1.6 + 1.5 + · · · + 0.3 = 6.4%.

Similarly, for the UK, the total contribution of the other 18 asset markets to its forecast variance is

40.3 + 0.7 + · · · + 0.5 = 44.3%.

Of the 19 asset markets, the US appears to be the most independent of all international asset markets as it has the lowest contribution from other asset markets, equal to just 6.4%. The next lowest is Turkey with a contribution of 14.2%. Germany's asset market appears to be the most affected by international asset markets, with the contribution of shocks from external markets to its forecast variance equal to 72.4%.
Table 3.3
Diebold-Yilmaz spillover index of global stock market returns. Based on a VAR with 2 lags and a constant, with the variance decomposition based on a 10 week horizon. Rows give the decomposition of each market's forecast variance; columns give the market in which the shock originates.

        US    UK   FRA   GER   HKG   JPN   AUS   IDN   KOR   MYS   PHL   SGP   TAI   THA   ARG   BRA   CHL   MEX   TUR  Others
US    93.6   1.6   1.5   0.0   0.3   0.2   0.1   0.1   0.2   0.3   0.2   0.2   0.3   0.2   0.1   0.1   0.0   0.5   0.3     6.4
UK    40.3  55.7   0.7   0.4   0.1   0.5   0.1   0.2   0.2   0.3   0.2   0.0   0.1   0.1   0.1   0.1   0.0   0.4   0.5    44.3
FRA   38.3  21.7  37.2   0.1   0.0   0.2   0.3   0.3   0.3   0.2   0.2   0.1   0.1   0.3   0.1   0.1   0.1   0.1   0.3    62.8
GER   40.8  15.9  13.0  27.6   0.1   0.1   0.3   0.4   0.6   0.1   0.3   0.3   0.0   0.2   0.0   0.1   0.0   0.1   0.1    72.4
HKG   15.3   8.7   1.7   1.4  69.9   0.3   0.0   0.1   0.0   0.3   0.1   0.0   0.2   0.9   0.3   0.0   0.1   0.3   0.4    30.1
JPN   12.1   3.1   1.8   0.9   2.3  77.7   0.2   0.3   0.3   0.1   0.2   0.3   0.3   0.1   0.1   0.0   0.0   0.1   0.1    22.3
AUS   23.2   6.0   1.3   0.2   6.4   2.3  56.8   0.1   0.4   0.2   0.2   0.2   0.4   0.5   0.1   0.3   0.1   0.6   0.7    43.2
IDN    6.0   1.6   1.2   0.7   6.4   1.6   0.4  77.0   0.7   0.4   0.1   0.9   0.2   1.0   0.7   0.1   0.3   0.1   0.4    23.0
KOR    8.3   2.6   1.3   0.7   5.6   3.7   1.0   1.2  72.8   0.0   0.0   0.1   0.1   1.3   0.2   0.2   0.1   0.1   0.7    27.2
MYS    4.1   2.2   0.6   1.3  10.5   1.5   0.4   6.6   0.5  69.2   0.1   0.1   0.2   1.1   0.1   0.6   0.4   0.2   0.3    30.8
PHL   11.1   1.6   0.3   0.2   8.1   0.4   0.9   7.2   0.1   2.9  62.9   0.3   0.4   1.5   1.6   0.1   0.0   0.1   0.2    37.1
SGP   16.8   4.8   0.6   0.9  18.5   1.3   0.4   3.2   1.6   3.6   1.7  43.1   0.3   1.1   0.8   0.5   0.1   0.3   0.4    56.9
TAI    6.4   1.3   1.2   1.8   5.3   2.8   0.4   0.4   2.0   1.0   1.0   0.9  73.6   0.4   0.8   0.3   0.1   0.3   0.0    26.4
THA    6.3   2.4   1.0   0.7   7.8   0.2   0.8   7.6   4.6   4.0   2.3   2.2   0.3  58.2   0.5   0.2   0.1   0.4   0.3    41.8
ARG   11.9   2.1   1.6   0.1   1.3   0.8   1.3   0.4   0.4   0.6   0.4   0.6   1.1   0.2  75.3   0.1   0.1   1.4   0.3    24.7
BRA   14.1   1.3   1.0   0.7   1.3   1.4   1.6   0.5   0.5   0.7   1.0   0.8   0.1   0.7   7.1  65.8   0.1   0.6   0.7    34.2
CHL   11.8   1.1   1.0   0.0   3.2   0.6   1.4   2.3   0.3   0.3   0.1   0.9   0.3   0.8   2.9   4.0  65.8   2.7   0.4    34.2
MEX   22.2   3.5   1.2   0.4   3.0   0.3   1.2   0.2   0.3   0.9   1.0   0.1   0.3   0.5   5.4   1.6   0.3  56.9   0.6    43.1
TUR    3.0   2.5   0.2   0.7   0.6   0.9   0.6   0.1   0.6   0.3   0.6   0.1   0.9   0.8   0.5   1.1   0.6   0.2  85.8    14.2

To others        291.9  84.1  31.0  11.2  80.8  19.2  11.5  31.4  13.6  16.2   9.9   8.2   5.9  11.8  21.4   9.4   2.6   8.4   6.7   675
To incl. own     385.5 139.8  68.2  38.8 150.6  96.9  68.3 108.3  86.4  85.4  72.8  51.2  79.5  70.0  96.7  75.2  68.4  65.4  92.4   Index = 35.5%
Adding up the separate contributions to each asset market in the last column gives the total contribution of non-own shocks to all 19 asset markets

6.4 + 44.3 + · · · + 14.2 = 675.0%.

As the contributions to the total forecast variance are by construction normalized to sum to 100% for each of the 19 asset markets, the percentage contribution of external shocks to the 19 asset markets is given by the spillover index

SPILLOVER = 675.0 / 19 = 35.5%.

This value shows that, on average, approximately one-third of the forecast variance of asset returns is the result of shocks from external asset markets, with the remaining two-thirds arising from internal shocks.
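The index itself is a simple function of the variance decomposition. The sketch below computes it in R under the assumption that rets is a T x 19 matrix of weekly returns; random numbers stand in so that the fragment runs as-is.

    # Diebold-Yilmaz spillover index from a VAR(2) variance decomposition.
    library(vars)
    set.seed(1)
    rets <- matrix(rnorm(570 * 19), ncol = 19,
                   dimnames = list(NULL, paste0("mkt", 1:19)))
    dy <- VAR(rets, p = 2, type = "const")
    vd <- fevd(dy, n.ahead = 10)
    # Row i of V: percentage of market i's 10-week forecast variance
    # attributable to shocks in each of the 19 markets.
    V <- 100 * t(sapply(vd, function(m) m[10, ]))
    others <- rowSums(V) - diag(V)   # the 'Others' column of Table 3.3
    sum(others) / ncol(V)            # the index; 675.0/19 = 35.5 in Table 3.3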
3.8 Exercises
(1) Estimating AR and MA Models
pv.wf1, pv.dta, pv.xlsx
(a) Compute the percentage monthly return on equities and dividends.
Plot the two returns and interpret their time series patterns.
(b) Estimate an AR(6) model of equity returns. Interpret the parameter
estimates.
(c) Estimate an AR(6) model of equity returns but now augment the
model with 6 lags on dividend returns. Perform a test of the information value of dividend returns in understanding equity returns.
(d) Repeat parts (b) and (c) for real dividend returns.
(e) Estimate an MA(3) model of real equity returns.
(f) Estimate an MA(6) model of equity returns.
(g) Perform a test that the parameters on lags 4 to 6 are zero.
(h) Repeat parts (e) to (g) using real dividend returns.
(2) Computing the ACF and PACF
pv.wf1, pv.dta, pv.xlsx
(a) Compute the percentage monthly return on equities and dividends.
(b) Compute the ACF of real equity returns for up to 6 lags. Compare a manual procedure with an automated version provided by
econometric software.
(c) Compute the PACF of real equity returns for up to 6 lags. Compare a manual procedure with an automated version provided by econometric software.
(d) Repeat parts (b) and (c) for real dividend returns.
(3) Mean Aversion and Reversion in Stock Returns
int yr.wf1, int yr.dta, int yr.xlsx
int qr.wf1, int qr.dta, int qr.xlsx
int mn.wf1,int mn.dta, int mn.xlsx
(a) Estimate the following regression equation using returns on the
NASDAQ (rt ) for each frequency (monthly, quarterly, annual)
rt = φ0 + φ1 rt−1 + ut ,
where ut is a disturbance term. Interpret the results.
(b) Repeat part (a) for the Australian share price index.
(c) Repeat part (a) for the Singapore Straits Times stock index.
(4) Poterba-Summers Pricing Model
Poterba and Summers (1988) assume that the price of an asset, pt, behaves according to
log pt = log ft + ut
log ft = log ft−1 + vt
ut = φ1 ut−1 + wt ,
where ft is the fundamental price, ut represents transient price movements, and vt and wt are independent disturbance terms with zero means and constant variances, σv² and σw², respectively.

(a) Show that the k-th order autocorrelation of the one period return

    rt = log pt − log pt−1 = vt + ut − ut−1 ,

is

    ρk = σw² φ1^(k−1) (φ1 − 1) / [σv² (1 + φ1 + 2σw²/σv²)] < 0.
(b) Show that the first order autocovariance function of the h-period return

    rt(h) = log pt − log pt−h = rt + rt−1 + · · · + rt−h+1 ,

is

    γh = σw² (2φ1^h − φ1^(2h) − 1) / (1 − φ1²) < 0.
(5) Roll Model of Bid-Ask Bounce
spot.wf1, spot.dta, spot.xlsx
Roll (1984) assumes that the price, pt, of an asset follows

    pt = f + (s/2) It ,

where f is a constant fundamental price, s is the bid-ask spread and It is a binary indicator variable given by

    It = { +1 : with probability 0.5 (buyer)
         { −1 : with probability 0.5 (seller).
(a) Derive E[It ], var(It ), cov(It , It−1 ), corr(It , It−1 ).
(b) Derive E[∆It ], var(∆It ), cov(∆It , ∆It−1 ), corr(∆It , ∆It−1 ).
(c) Show that the autocorrelation function of ∆pt is

    corr(∆pt, ∆pt−1) = −1/2 ,     corr(∆pt, ∆pt−k) = 0 ,   k > 1.
(d) Suppose that the price is now given by

    pt = ft + (s/2) It ,
where the fundamental price ft is now assumed to be random with
zero mean and variance σ 2 . Derive the autocorrelation function of
∆pt .
(6) Forward Market Efficiency
spot.wf1, spot.dta, spot.xlsx
The forward market is efficient if the lagged forward rate is an unbiased
predictor of the current spot rate.
(a) Estimate the following model of the spot and the lagged 1-month
forward rate
St = β0 + β1 Ft−4 + ut ,
where the forward rate is lagged four periods (the data are weekly).
Verify that weekly data on the $/AUD spot exchange rate and the
1 month forward rate yields
St = 0.066 + 0.916Ft−4 + et ,
where a lag length of four is chosen as the data are weekly and the
forward contract matures in one month. Test the restriction β1 = 1
and interpret the result.
(b) Compute the ACF and PACF of the least squares residuals, et , for
the first 8 lags. Verify that the results are as follows.
Lag:      1      2      3      4      5      6      7      8
ACF    0.80   0.54   0.29   0.07   0.07   0.09   0.13   0.15
PACF   0.80  -0.28  -0.14  -0.07   0.40  -0.11  -0.04  -0.02
(c) There is evidence to suggest that the ACF decays quickly after 3
lags. Interpret this result and use this information to improve the
specification of the model and redo the test of β1 = 1.
(d) Repeat parts (a) to (c) for the 3-month and the 6-month forward
rates.
(7) Microsoft in the Dot-Com Crisis
capm.wf1, capm.dta, capm.xlsx
(a) Compute the monthly excess returns for Microsoft and the market.
(b) Estimate a CAPM augmented by dummy variables to capture the
large movements in the Microsoft returns in April 2000, December
2000 and January 2001. Perform a test of autocorrelation on ut and
interpret the result.
(c) Re-estimate the CAPM in part (b) augmented by including the first lag of Microsoft excess returns. Perform a test of autocorrelation on ut and interpret the result.
(d) Briefly discuss other ways that dynamics can be included in the
model.
(8) An Equity-Dividend VAR
pv.wf1, pv.dta, pv.xlsx
(a) Compute the percentage monthly return on equities and dividends
and estimate a bivariate VAR for these variables with 6 lags.
(b) Test for the optimum choice of lag length using the Hannan-Quinn
criterion and specifying a maximum lag length of 12. If required,
re-estimate the VAR.
(c) Test for Granger causality between equity returns and dividends
and interpret the results.
(d) Compute the impulse responses for 30 periods and interpret the
results.
(e) Compute the variance decomposition for 30 periods and interpret
the results.
(9) Campbell-Shiller Present Value Model
cam shiller.wf1, cam shiller.dta, cam shiller.xlsx
Let rdt be real dividend returns (expressed in percentage terms) and
let vt be deviations from the present value relationship between equity
prices and dividends computed from the linear regression
pt = β + αdt + vt .
Campbell and Shiller (1987) develop a VAR model for rdt and vt given
by
[ rdt ]   [ µ1 ]   [ φ1,1,1   φ1,2,1 ] [ rdt−1 ]   [ u1,t ]
[ vt  ] = [ µ2 ] + [ φ2,1,1   φ2,2,1 ] [ vt−1  ] + [ u2,t ] .
(a) Estimate the parameter α by regressing equity prices, STOCKt, on a constant and dividend payments, DIVt, and compute the least squares residuals v̂t.
(b) Estimate a VAR(1) containing the variables rdt and v̂t.
(c) Campbell and Shiller show that

    φ2,2,1 = δ⁻¹ − α φ1,2,1 ,

where δ represents the discount factor. Use the parameter estimate of α obtained in part (a) and the parameter estimates of φ1,2,1 and φ2,2,1 obtained in part (b) to estimate δ. Interpret the result.
(10) Causality Between Stock Returns and Output Growth
stock out.wf1, stock out.dta, stock out.xlsx
(a) For the United States, compute the percentage continuous stock
returns and output growth rates, respectively.
(b) It is hypothesised that stock returns lead output growth but not
the reverse. Test this hypothesis by performing a test for Granger
causality between the two series using 1 lag.
(c) Test the robustness of these results by using higher order lags up to a
maximum of 4. What do you conclude about the causal relationships
between stock returns and output growth in the United States?
(d) Repeat parts (a) to (c) for Japan, Singapore and Taiwan.
(11) Volatility Linkages
diebold.wf1, diebold.dta, diebold.xlsx
Diebold and Yilmaz (2009) construct spillover indexes of international
real asset returns and volatility based on the variance decomposition of
a VAR. The data file contains weekly data on real asset returns, rets,
and volatility, vol, of 7 developed countries and 12 emerging countries
from the first week of January 1992 to the fourth week of November
2007.
(a) Compute descriptive statistics of the 19 real asset market returns
given in rets. Compare the estimates with the results reported in
Table 1 of Diebold and Yilmaz.
(b) Estimate a VAR(2) containing a constant and the 19 real asset market returns.
(c) Estimate V D10 , the variance decomposition for horizon h = 10,
and compare the estimates with the results reported in Table 3 of
Diebold and Yilmaz.
(d) Using the results in part (c) compute the ‘Contribution from Others’
by summing each row of V D10 excluding the diagonal elements,
and the ‘Contribution to Others’ by summing each column of V D10
excluding the diagonal elements. Interpret the results.
(e) Repeat parts (a) to (d) with the 19 series in rets replaced by vol,
and the comparisons now based on Tables 2 and 4 in Diebold and
Yilmaz.
4
Nonstationarity in Financial Time Series
4.1 Introduction
An important property of asset prices identified in Chapter 1 is that they
exhibit strong trends. Financial series exhibiting no trending behaviour are
referred to as being stationary and are the subject matter of Chapter 3,
while series that are characterised by trending behaviour are referred to
as being nonstationary. This chapter focuses on identifying and testing for
nonstationarity in financial time series. The identification of nonstationarity
will hinge on a test for ρ = 1 in a model of the form
yt = ρyt−1 + ut ,
in which ut is a disturbance term. This test is commonly referred to as a test for a unit root. This situation differs from the hypothesis tests performed on stationary processes in Chapter 3 because the process is nonstationary under the null hypothesis ρ = 1 and, as a consequence, the test statistic does not have a normal distribution in large samples.
The classification of variables as either stationary or nonstationary has important implications in both finance and econometrics. From a finance point of view, the presence of nonstationarity in the price of a financial asset is consistent with the efficient markets hypothesis, which states that all of the information in the price of an asset is contained in its most recent price. If the nonstationary process is explosive then this may be taken as evidence of a bubble in the price of the asset.
4.2 Characteristics of Financial Data
In Chapter 1 the efficient markets hypothesis was introduced which theorises
that all available information concerning the value of a risky asset is factored
into the current price of the asset. The return to a risky asset may be written
as
rt = pt − pt−1 = α + vt ,     vt ∼ iid(0, σ²) ,          (4.1)

where pt is the logarithm of the asset price. The parameter α represents the average return on the asset. From an efficient markets point of view, provided that vt is not autocorrelated, rt is unpredictable using information at time t.
An alternative representation of equation (4.1) is to rearrange it in terms
of pt as
pt = α + pt−1 + vt .
(4.2)
This representation of pt is known as a random walk with drift, where the
mean parameter α represents the drift. From an efficient market point of
view this equation shows that in predicting the price of an asset in the next
period, all of the relevant information is contained in the current price.
To understand the properties of the random walk with drift model of asset prices in (4.2), Figure 4.1 provides a plot of a simulated random walk with drift. In simulating equation (4.2), the drift parameter α is set equal to the mean return on the S&P500 while the volatility, σ², corresponds to the variance of the logarithm of S&P500 returns. The simulated price has similar time series characteristics to the observed logarithm of the price index given in Figure 1.2 in Chapter 1 and in Figure 4.2 below.
In particular, the simulated price exhibits two important characteristics,
namely, an increasing mean and an increasing variance. These characteristics
may be demonstrated formally as follows. Lagging the random walk with drift model in equation (4.2) by one period yields

pt−1 = α + pt−2 + vt−1 ,

and then substituting this expression for pt−1 in (4.2) gives

pt = α + α + pt−2 + vt + vt−1 .

Repeating this recursive substitution process for t steps in total gives
pt = p0 + αt + vt + vt−1 + vt−2 + · · · + v1 ,
in which pt is fully determined by its initial value, p0 , a deterministic trend
component and the summation of the complete history of disturbances.
Taking expectations of this expression and using the property that E[vt ] =
E[vt−1 ] · · · = 0, gives the mean of pt
E[pt ] = p0 + αt .
[Figure 4.1: Simulated random walk with drift model using equation (4.2), plotted for 200 periods. The initial value of the simulated data is the natural logarithm of the S&P500 equity price index in February 1871 and the drift and volatility parameters are estimated from the returns to the S&P500 index. The distribution of the disturbance term is taken to be normal.]
This demonstrates that the mean of the random walk with drift model increases over time provided that α > 0. The variance of pt in the random walk model is defined as

var(pt) = E[(pt − E[pt])²] = tσ² ,

using the property that the disturbances are independent. As with the expression for the mean, the variance is also an increasing function of time; that is, pt exhibits fluctuations of increasing amplitude as time progresses.
It is now clear that the efficient market hypothesis has implications for the time series behaviour of financial asset prices. Specifically, in an efficient market, asset prices will exhibit trending behaviour.
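These properties are easy to visualise by simulation. The following R sketch generates a random walk with drift as in equation (4.2); the drift and volatility values are illustrative stand-ins rather than the S&P500 estimates used for Figure 4.1.

    # Simulate the random walk with drift p_t = alpha + p_{t-1} + v_t (4.2).
    # alpha, sigma and p0 are assumed illustrative values.
    set.seed(123)
    T <- 200
    alpha <- 0.004                  # drift
    sigma <- 0.04                   # standard deviation of v_t
    p0 <- 1.5                       # initial log price
    p <- p0 + cumsum(alpha + rnorm(T, mean = 0, sd = sigma))
    plot(p, type = "l", main = "Random Walk with Drift")
    # E[p_t] = p0 + alpha*t and var(p_t) = t*sigma^2 grow linearly with t.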
In Chapter 3 the idea was developed of an observer who observes snapshots
of a financial time series at different points in time. If the snapshots exhibit
similar behaviour in terms of the mean and variance of the observed series,
the series is said to be stationary, but if the observed behaviour in either the
mean or the variance of the series (or both) is completely different then it is
non-stationary. More formally, a variable yt is stationary if its distribution,
or some important aspect of its distribution, is constant over time. There are
two commonly used definitions of stationarity known as weak (or covariance)
and strong (or strict) stationarity1 and it is the former that will be of primary
interest.
Definition: Weak (or Covariance) Stationarity
A process is weakly stationary if both the population mean and the population variance are constant over time and if the covariance between two observations is a function only of the distance between them and not of time.
The efficient markets hypothesis requires that financial asset returns have a non-zero (positive) mean and a variance that are independent of time, as in equation (4.1). Formally this means that returns are weakly or covariance stationary. By contrast, the logarithm of prices is a random walk with drift, (4.2), in which the mean and the variance are functions of time. It follows, therefore, that a series with these properties is referred to as being nonstationary.
[Figure 4.2: Different transformations of monthly United States equity prices for the period January 1871 to June 2004, shown in four panels: equity prices, the logarithm of equity prices, the first difference of equity prices and equity returns.]
1 Strict stationarity is a stronger requirement in that it pertains to all of the moments of the distribution, not just the first two.
Figure 4.2 highlights the time series properties of the real United States
equity price and various transformations of this series, from January 1871
to June 2004. The transformed equity prices are the logarithm of the equity
price, the first difference of the equity price and the first difference of
the logarithm of the equity price (log returns).
A number of conclusions may be drawn from the behaviour of equity prices
in Figure 4.2 which both reinforce and extend the ideas developed previously.
Both the equity price and its logarithm are nonstationary in the mean as
both exhibit positive trends. Furthermore, a simple first difference of the equity price renders the series stationary in the mean, which is now constant over time, but the variance is still increasing with time. The implication of this is that simply taking the first difference of the equity price does not yield a stationary series. Finally, equity returns, defined as the first difference of the logarithm of prices, are stationary in both mean and variance. The appropriate choice of filter to detrend the data is the subject matter of the next section.
4.3 Deterministic and Stochastic Trends
While the term ‘trend’ is deceptively easy to define, being the persistent
long-term movement of a variable over time, in practice it transpires that
trends are fairly tricky to deal with and the appropriate choice of filter to
detrend the data is therefore not entirely straightforward. The main reason
for this is that there are two very different types of trending behaviour that
are difficult to distinguish between.
(i) Deterministic trend
A deterministic trend is a nonrandom function of time

    yt = α + δt + ut ,

in which t is a simple time trend taking integer values from 1 to T. In this model, shocks to the system have a transitory effect in that the process always reverts to its mean of α + δt. This suggests that removing the deterministic trend from yt will give a series that does not trend. That is,

    yt − α̂ − δ̂t = ût ,

in which ordinary least squares has been used to estimate the parameters, is stationary. Another approach to estimating the parameters of the deterministic elements, generalised least squares, is considered at a later stage.
(ii) Stochastic trend
By contrast, a stochastic trend is random and varies over time, for example,

    yt = α + yt−1 + ut ,          (4.3)

which is known as a random walk with drift model. In this model, the best guess for the next value of the series is the current value plus some constant, rather than a deterministic mean value. As a result, this kind of model is also known as a 'local trend' or 'local level' model. The appropriate filter here is to difference the data to obtain a stationary series as follows

    ∆yt = α + ut .
Distinguishing between deterministic and stochastic trends is important as the correct choice of detrending filter depends upon this distinction. The deterministic trend model is stationary once the deterministic trend has been removed (and is called a trend-stationary process), whereas a stochastic trend can only be removed by differencing the series (a difference-stationary process).
Most financial econometricians would agree that the behaviour of many
financial time series is due to stochastic rather than deterministic trends.
It is hard to reconcile the predictability implied by a deterministic trend
with the complications and surprises faced period-after-period by financial
forecasters. Consider the simple AR(1) regression equation
yt = α + ρyt−1 + ut .
The results obtained by fitting this regression to monthly data on United States zero coupon bonds with maturities ranging from 2 months to 9 months for the period January 1947 to February 1987 are given in Table 4.1.
The major result of interest in Table 4.1 is that in all the estimated regressions the estimate of the slope coefficient, ρ̂, is very close to unity, indicative of a stochastic trend in the data along the lines of equation (4.3). This empirical result is quite a consistent one across all the maturities and, furthermore, the pattern is a fairly robust one that applies to other financial markets such as currency markets (spot and forward exchange rates) and equity markets (share prices and dividends) as well.
The behaviour of series with deterministic trends (dashed lines) and stochastic trends (solid lines) is demonstrated in Figure 4.3 using simulated data. The nonstationary series look similar, both showing clear evidence of trending. The key difference between a deterministic trend and
Table 4.1
Ordinary least squares estimates of an AR(1) model estimated using monthly
data on United States zero coupon bonds with maturities ranging from 2 months
to 9 months for the period January 1947 to February 1987.

Maturity      Intercept              Slope
(mths)        α̂        se(α̂)        ρ̂        se(ρ̂)
2             0.090     0.046        0.983     0.008
3             0.087     0.045        0.984     0.008
4             0.085     0.044        0.985     0.007
5             0.085     0.044        0.985     0.007
6             0.087     0.045        0.985     0.007
9             0.088     0.046        0.985     0.007
a stochastic trend, however, is that removing a deterministic trend from the difference stationary process, illustrated by the solid line in panel (b) of Figure 4.3, does not result in a stationary series. The longer the simulation period, the more clearly the erratic behaviour of the incorrectly detrended difference-stationary process is revealed.
It is in fact this feature of the makeup of yt that makes its behaviour very different from that of the simple deterministic trend model, because simply removing the deterministic trend will not remove the nonstationarity in the data that is due to the summation of the disturbances.
The element of summation of the disturbances in nonstationarity is the
origin of an important term, the order of integration of a series.
Definition: Order of Integration
A process is integrated of order d, denoted by I(d), if it can be rendered stationary by differencing d times. That is, yt is non-stationary, but ∆^d yt is stationary.

Accordingly a process is said to be integrated of order one, denoted by I(1), if it can be rendered stationary by differencing once, that is yt is nonstationary, but ∆yt = yt − yt−1 is stationary. If d = 2, then yt is I(2) and needs to be differenced twice to achieve stationarity as follows

∆²yt = (yt − yt−1) − (yt−1 − yt−2) = yt − 2yt−1 + yt−2 .

By analogy, a stationary process is integrated of order zero, I(0), if it does not require any differencing to achieve stationarity.
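In R, d-th order differencing is performed directly by diff(); a small sketch, using the random walk p simulated in the previous example:

    # Differencing to achieve stationarity: an I(1) series needs d = 1,
    # an I(2) series needs d = 2 ('p' is the simulated random walk above).
    dp  <- diff(p)                   # first difference, Delta p_t
    d2p <- diff(p, differences = 2)  # second difference, Delta^2 p_t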
[Figure 4.3: Simulated series with deterministic and stochastic trends, in three panels: (a) raw simulated data, (b) detrended data and (c) differenced data. Panel (a) compares a process with a deterministic time trend (dashed line) to a process with a stochastic trend (solid line). In panel (b) the estimated deterministic trend is used to detrend both series; the deterministically trending data (dashed line) is now stationary, but the series with a stochastic trend (solid line) is still not stationary. In panel (c) both series are differenced.]
There is one final important point that arises out of the simulated behaviour illustrated in Figure 4.3. At first sight panel (c) may suggest that differencing a financial time series, irrespective of whether it is trend or difference stationary, may be a useful strategy because both of the resultant series in panel (c) appear to be stationary. The logic of the argument then becomes: if the series has a stochastic trend then this is the correct course of action, and if it is trend stationary then a stationary series will result in
any event. This is not, however, a strategy to be recommended. Consider again the deterministic trend model

yt = α + δt + ut .

In first-difference form this becomes

∆yt = δ + ut − ut−1 ,

so that the process of taking the first difference has introduced a moving average error term which has a unit root. This is known as over-differencing and it can have treacherous consequences for subsequent econometric analysis should the true data generating process actually be trend-stationary. In fact, for the simple problem of estimating the coefficient δ in the differenced model, it produces an estimate that is tantamount to using only the first and last data points in the estimation process.
4.3.1 Unit Roots†
A series that is I(1) is also said to have a unit root and tests for nonstationarity are called tests for unit roots. The reason for this is easily demonstrated.
Consider the general n-th order autoregressive process
yt = φ1 yt−1 + φ2 yt−2 + . . . + φn yt−n + ut .
This may be written in a different way by using the lag operator, L, which
is defined as
yt−1 = Lyt ,   yt−2 = L²yt ,   · · · ,   yt−n = Lⁿyt ,

so that

yt = φ1 Lyt + φ2 L²yt + . . . + φn Lⁿyt + ut

or

Φ(L) yt = ut

where

Φ(L) = 1 − φ1 L − φ2 L² − . . . − φn Lⁿ

is called a polynomial in the lag operator. The roots of this polynomial are the values of L which satisfy the equation

1 − φ1 L − φ2 L² − . . . − φn Lⁿ = 0.
If all of the roots of this equation are greater in absolute value than one,
then yt is stationary. If, on the other hand, any of the roots is equal to one
(a unit root) then yt is non-stationary.
The AR(1) model is
(1 − φ1 L) yt = ut
and the roots of the equation
1 − φ1 L = 0
are of interest. The single root of this equation is given by
L∗ = 1/φ1
and the root is greater than unity only if |φ1 | < 1. If this is the case then the
AR(1) process is stationary. If, on the other hand, the root of the equation
is unity, then |φ1 | = 1 and the AR(1) process is non-stationary.
In the AR(2) model

(1 − φ1 L − φ2 L²) yt = ut

it is possible that there are two unit roots, corresponding to the roots of the equation

1 − φ1 L − φ2 L² = 0.

A solution is obtained by factoring the equation to yield

(1 − ϕ1 L)(1 − ϕ2 L) = 0 ,

in which ϕ1 + ϕ2 = φ1 and ϕ1 ϕ2 = −φ2. The roots of this equation are 1/ϕ1 and 1/ϕ2, respectively, and yt will have a unit root if either of these roots is unity. In the event that φ1 = 2 and φ2 = −1, both roots of the equation are one and yt has two unit roots and is therefore I(2).
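The roots of a lag polynomial can be checked numerically in R with the base function polyroot(), which takes the coefficients of 1 − φ1 L − φ2 L² in ascending powers of L:

    # Moduli of the roots of the AR(2) lag polynomial 1 - phi1*L - phi2*L^2.
    Mod(polyroot(c(1, -0.5, -0.3)))  # illustrative stationary case: roots > 1
    Mod(polyroot(c(1, -2, 1)))       # phi1 = 2, phi2 = -1: two unit roots, I(2)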
4.4 The Dickey-Fuller Testing Framework
The original testing procedures for unit roots were developed by Dickey and
Fuller (1979, 1981) and this framework remains one of the most popular
methods to test for nonstationarity in financial time series.
4.4.1 Dickey-Fuller (DF) Test
Consider again the AR(1) regression equation
yt = α + ρyt−1 + ut ,
(4.4)
in which ut is a disturbance term with zero mean and constant variance σ 2 .
The null and alternative hypotheses are respectively

H0 : ρ = 1   (Variable is nonstationary)
H1 : ρ < 1   (Variable is stationary).          (4.5)
To carry out the test, equation (4.4) is estimated by ordinary least squares and a t-statistic is constructed to test that ρ = 1:

tρ = (ρ̂ − 1) / se(ρ̂) .          (4.6)
This is all correct up to this stage: the estimation of (4.4) by ordinary least squares and the use of the t-statistic in (4.6) to test the hypothesis are both sound procedures. The problem is that the statistic in (4.6) is not distributed as a Student t distribution. In fact the distribution of this statistic under the null hypothesis of nonstationarity is non-standard. The correct distribution is known as the Dickey-Fuller distribution and the t-statistic given in (4.6) is commonly known as the Dickey-Fuller unit root test to recognize that, even though it is a t-statistic by construction, its distribution is not Student t.
In practice, equation (4.4) is transformed in such a way as to convert the t-statistic in (4.6) into a test that the slope parameter of the transformed equation is zero. This has the advantage that the t-statistic commonly reported in standard regression packages directly yields the Dickey-Fuller statistic. Subtract yt−1 from both sides of (4.4) and collect terms to give

yt − yt−1 = α + (ρ − 1)yt−1 + ut ,          (4.7)

or, defining β = ρ − 1,

yt − yt−1 = α + βyt−1 + ut .          (4.8)
Equations (4.4) and (4.8) are exactly the same models with the connection
being that β = ρ − 1.
Consider again the monthly data on United States zero coupon bonds
with maturities ranging from 2 months to 9 months for the period January 1947
to February 1987 used in the estimation of the AR(1) regressions reported
in Table 4.1. Estimating equation (4.4) yields the following results (with
standard errors in parentheses)
yt = 0.090 + 0.983 yt−1 + êt ,          (4.9)
    (0.046)   (0.008)
On the other hand, estimating the transformed equation (4.8) yields

yt − yt−1 = 0.090 − 0.017 yt−1 + ût .          (4.10)
           (0.046)   (0.008)
Comparing the estimated equations in (4.9) and (4.10) shows that they differ only in terms of the slope estimate on yt−1. The difference in the two slope estimates is easily reconciled as the slope estimate of (4.9) is ρ̂ = 0.983, whereas an estimate of β may be recovered as

β̂ = ρ̂ − 1 = 0.983 − 1 = −0.017.
This is also the slope estimate obtained in (4.10). To perform the test of H0 : ρ = 1, the relevant t-statistics are

tρ = (ρ̂ − 1)/se(ρ̂) = (0.983 − 1)/0.008 = −2.120 ,
tβ = (β̂ − 0)/se(β̂) = (−0.017 − 0)/0.008 = −2.120 ,

which demonstrates that the two methods are indeed equivalent.
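The calculation is easily verified in R by estimating the transformed regression (4.8) directly; a minimal sketch in which a simulated random walk stands in for the bond yield series:

    # Dickey-Fuller regression (4.8): Delta y_t = alpha + beta*y_{t-1} + u_t.
    # 'y' stands in for the zero coupon bond yield (simulated here).
    set.seed(7)
    y <- cumsum(rnorm(482))                   # random walk, T = 482
    dy <- diff(y)
    ylag <- y[-length(y)]                     # y_{t-1}
    df.reg <- lm(dy ~ ylag)
    summary(df.reg)$coefficients["ylag", "t value"]  # Dickey-Fuller statistic
    # This t-statistic must be referred to the Dickey-Fuller distribution,
    # not the Student t distribution.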
The Dickey-Fuller test regression must now be extended to deal with the possibility that, under the alternative hypothesis, the series may be stationary around a deterministic trend. As established in Sections 4.2 and 4.3, financial data often exhibit trends and one of the problems faced by the empirical researcher is distinguishing between stochastic and deterministic trends. If the data are trending and the null hypothesis of nonstationarity is rejected, it is imperative that the model under the alternative hypothesis is able to account for the major characteristics displayed by the series being tested. If the test regression in equation (4.8) is used and the null hypothesis of a unit root rejected, the alternative hypothesis is that of a process which is stationary around the constant mean α. In other words, the model under the alternative hypothesis contains no deterministic trend. Consequently, an important extension of the Dickey-Fuller framework is to include a linear time trend, t, in the test regression so that the estimated equation becomes

yt − yt−1 = α + βyt−1 + δt + ut .          (4.11)
The Dickey-Fuller test still consists of testing β = 0. Under the alternative
hypothesis, yt is now a stationary process with a deterministic trend.
Once again using the monthly data on United States zero coupon bonds,
the estimated regression including the time trend gives the following results
(with standard errors in parentheses)
∆yt = 0.030 − 0.046 yt−1 + 0.001 t + ût .
     (0.052)   (0.014)     (0.001)
The value of the Dickey-Fuller test statistic is

tβ = (β̂ − 0)/se(β̂) = (−0.046 − 0)/0.014 = −3.172.
Finally, the Dickey-Fuller test can be performed without a constant and a
time trend by setting α = 0 and δ = 0 in (4.11). This form of the test, which
assumes that the process has zero mean, is only really of use when testing
the residuals of a regression for stationarity as they are known to have zero
mean, a problem that is returned to in Chapter 5.
[Figure 4.4: Distribution of the Dickey-Fuller tests. The figure compares the standard normal distribution (solid line) to the simulated Dickey-Fuller distribution without an intercept or trend (dashed line), with an intercept but without a trend (dot-dashed line) and with both an intercept and a trend (dotted line).]
There are therefore three forms of the Dickey-Fuller test, namely,

Model 1:  ∆yt = βyt−1 + ut
Model 2:  ∆yt = α + βyt−1 + ut                       (4.12)
Model 3:  ∆yt = α + δt + βyt−1 + ut .
For each of these three models the form of the Dickey-Fuller test is still the same, namely the test of β = 0. The pertinent distribution in each case, however, is not the same because the distribution of the test statistic changes depending on whether a constant and/or a time trend is included. The distributions of the different versions of the Dickey-Fuller test are shown in Figure 4.4. The key point to note is that all three Dickey-Fuller distributions are skewed to the left relative to the standard normal distribution. In addition, the distribution becomes less negatively skewed as more deterministic components (constants and time trends) are included.
The monthly United States zero coupon bond data have been used to estimate Model 2 and Model 3. Using the Dickey-Fuller distribution the p-value
for the Model 2 Dickey-Fuller test statistic (−2.120) is 0.237 and because
0.237 > 0.05 the null hypothesis of nonstationarity cannot be rejected at
the 5% level of significance. This is evidence that the interest rate is nonstationary. For Model 3, using the Dickey-Fuller distribution reveals that the
p-value of the test statistic (−3.172) is 0.091 and because 0.091 > 0.05, the
null hypothesis cannot be rejected at the 5% level of significance. This result is qualitatively the same as that of the Dickey-Fuller test based on Model 2,
although there is quite a large reduction in the p-value from 0.237 in the
case of Model 2 to 0.091 in Model 3.
4.4.2 Augmented Dickey-Fuller (ADF) Test
In estimating any one of the test regressions in equation (4.12), there is a real possibility that the disturbance term will exhibit autocorrelation. One reason for the presence of autocorrelation is that many financial series interact with each other and, because the test regressions are univariate equations, the effects of these interactions are ignored. One common solution to correct for autocorrelation is to proceed as in Chapter 3 and include lags of the dependent variable ∆yt in the test regressions (4.12). These equations then become
Model 1:  ∆yt = βyt−1 + Σ_{i=1}^{p} φi ∆yt−i + ut
Model 2:  ∆yt = α + βyt−1 + Σ_{i=1}^{p} φi ∆yt−i + ut              (4.13)
Model 3:  ∆yt = α + δt + βyt−1 + Σ_{i=1}^{p} φi ∆yt−i + ut ,
in which the lag length p is chosen to ensure that ut does not exhibit autocorrelation. The unit root test still consists of testing β = 0.
The inclusion of lagged values of the dependent variable represents an augmentation of the Dickey-Fuller regression equation, so this test is commonly referred to as the Augmented Dickey-Fuller (ADF) test. Setting p = 0 in any version of the test regressions in (4.13) gives the associated Dickey-Fuller test. The distribution of the ADF statistic in large samples is also the Dickey-Fuller distribution.
For example, using Model 2 in (4.13) to construct the augmented Dickey-Fuller test with p = 2 lags for the United States zero coupon 2-month bond yield, the estimated regression equation is

∆yt = 0.092 − 0.017 yt−1 + 0.117 ∆yt−1 − 0.080 ∆yt−2 + ût .
     (0.046)   (0.008)     (0.045)      (0.046)
The value of the Augmented Dickey-Fuller test statistic is

tβ = (β̂ − 0)/se(β̂) = (−0.017 − 0)/0.008 = −2.157.
Using the Dickey-Fuller distribution the p-value is 0.223. Since 0.223 > 0.05 the null hypothesis is not rejected at the 5% level of significance. This result is qualitatively the same as that of the Dickey-Fuller test with p = 0 lags.
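Pre-packaged versions of the test report the statistic together with Dickey-Fuller critical values. A minimal sketch using ur.df() from the urca package, applied to the series y from the earlier Dickey-Fuller sketch:

    # Augmented Dickey-Fuller test with a constant and p = 2 lags.
    library(urca)
    adf <- ur.df(y, type = "drift", lags = 2)  # type: "none","drift","trend"
    summary(adf)   # tau statistic with Dickey-Fuller critical values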
The selection of p affects both the size and power properties of a unit
root test. If p is chosen to be too small, then substantial autocorrelation will
remain in the error term of the test regressions (4.13) and this will result
in distorted statistical inference because the large sample distribution under
the null hypothesis no longer applies in the presence of autocorrelation.
However, including an excessive number of lags will have an adverse effect
on the power of the test.
To select the lag length p to use in the ADF test, a common approach is to base the choice on information criteria as discussed in Chapter 3. Two commonly used criteria are the Akaike information criterion (AIC) and the Schwarz information criterion (SIC). A lag-length selection procedure that has good properties in unit root testing is the modified Akaike information criterion (MAIC) method proposed by Ng and Perron (2001). The lag length is chosen to satisfy

p̂ = arg min_p MAIC(p) = log(σ̂²) + 2(τp + p)/(T − pmax) ,          (4.14)

in which

τp = (α̂²/σ̂²) Σ_{t=pmax+1}^{T} û²_{t−1} ,

and the maximum lag length is chosen as pmax = int[12(T/100)^{1/4}]. In estimating p̂, it is important that the sample over which the computations are performed is held constant.
There are two other more informal ways of choosing the length of the lag structure p. The first of these is to include lags until the t-statistic on the longest lag is statistically insignificant using the t-distribution. Unlike the ADF statistic, the t-statistics on the lagged dependent variables have a standard distribution based on the Student t distribution. The second informal approach to dealing with the need to choose the lag length p is effectively to circumvent making a decision at all. The ADF test is performed for a range of lags, say p = 0, 1, 2, 3, 4. If all of the tests show that the series is nonstationary then the conclusion is clear. If four of the five tests show evidence of nonstationarity then there is still stronger evidence of nonstationarity than there is of stationarity.
4.5 Beyond the Dickey-Fuller Framework†
A number of extensions and alternatives to the Dickey-Fuller and Augmented Dickey-Fuller unit roots tests have been proposed. A number of
developments, some of which are commonly available in econometric software packages, are considered briefly.
4.5.1 Structural Breaks
The form of nonstationarity emphasised so far is based on the series following a random walk. An alternative form of nonstationarity discussed earlier is based on a deterministic linear time trend. Another form of nonstationarity arises when the series exhibits a structural break, as this represents a shift in the mean and hence, by definition, is non-mean reverting. The simplest case is where the timing of the structural break is known. The approach is to include a dummy variable in (4.13) to capture the structural break according to
∆yt = α + βyt−1 + δt + Σ_{i=1}^{p} φi ∆yt−i + γ BREAKt + ut ,          (4.15)

where the structural break dummy variable is defined as

BREAKt = { 0 : t ≤ τ
         { 1 : t > τ ,          (4.16)
and τ is the observation at which the break occurs. The unit root test is still based on testing β = 0; however, the p-values are now also a function of the timing of the structural break τ, so even more tables are needed. The correct p-values for a unit root test with a structural break are available in Perron
(1989). For a review of further extensions of unit root tests with structural
breaks, see Maddala and Kim (1998).
An example of a possible structural break is highlighted in Figure 4.2
where there is a large fall in the share price at the time of the 1929 stock
market crash.
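The test regression (4.15) can be estimated by ordinary least squares once the dummy is constructed. A minimal sketch in R with one augmentation lag, assuming the series y and a known break observation tau:

    # ADF-type test regression with a known structural break (4.15)-(4.16).
    # 'y' and the break point 'tau' are assumed; one lag of the difference
    # is included for illustration.
    tau <- 250
    T <- length(y)
    dy <- diff(y); n <- length(dy)
    lhs   <- dy[-1]                   # Delta y_t,  t = 3,...,T
    ylag  <- y[2:(T - 1)]             # y_{t-1}
    dylag <- dy[-n]                   # Delta y_{t-1}
    trend <- 3:T                      # deterministic trend
    BREAK <- as.numeric(trend > tau)  # 0 before the break, 1 after
    fit <- lm(lhs ~ ylag + trend + dylag + BREAK)
    summary(fit)$coefficients["ylag", "t value"]  # refer to Perron (1989)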
4.5.2 Generalised Least Squares Detrending
Consider the following model
yt = α + δt + ut
(4.17)
ut = φut−1 + vt
(4.18)
in which vt is a disturbance term with zero mean and constant variance σ².
This is the fundamental equation from which Model 3 of the Dickey-Fuller
test is derived. If the aim is still to test for a unit root in yt the null and
alternative hypotheses are
H0 : φ = 1   [Nonstationary]
H1 : φ < 1   [Stationary] .          (4.19)
Instead of proceeding in the manner described previously and using Model
3 in either (4.12) or (4.13), an alternative approach is to use a two-step
procedure.
Step 1: Detrending
Estimate the parameters of equation (4.17) by ordinary least squares and then construct a detrended version of yt given by

    y∗t = yt − α̂ − δ̂t .
Step 2: Testing
Test for a unit root in the deterministically detrended data, y∗t, from the first step, using the Dickey-Fuller or augmented Dickey-Fuller test. Model 1 will be the appropriate model to use because, by construction, y∗t will have zero mean and no deterministic trend.

It turns out that in large samples (or asymptotically) this procedure is equivalent to the single-step approach based on Model 3.
Elliott, Rothenberg and Stock (1996) suggest an alternative detrending step which proceeds as follows. Define a constant φ∗ = 1 + c/T in which the value of c depends upon whether the detrending equation has only a constant or both a constant and a time trend. The proposed values of c are

c = −7       [Constant (α ≠ 0, δ = 0)]
c = −13.5    [Trend (α ≠ 0, δ ≠ 0)] ,
and use this constant to rewrite the detrending regression as

y∗t = γ0 α∗ + γ1 t∗ + u∗t ,          (4.20)

in which u∗t is a composite disturbance term and

y∗t = yt − φ∗ yt−1 ,     t = 2 · · · T          (4.21)
α∗ = 1 − φ∗ ,                                  (4.22)
t∗ = t − φ∗ (t − 1) ,    t = 2 · · · T          (4.23)
and the starting values for each of the series at t = 1 are taken to be y∗1 = y1 and α∗1 = t∗1 = 1, respectively. The starting values are important because if c = −T the detrending equation reverts to the simple detrending regression (4.17). If, on the other hand, c = 0 then the detrending equation is an equation in first differences. It is for this reason that this method, which is commonly referred to as generalised least squares detrending, is also known as quasi-differencing and partial generalised least squares (Phillips and Lee, 1995).
Once the ordinary least squares estimates γ̂0 and γ̂1 are available, the detrended data

û∗t = y∗t − γ̂0 α∗ − γ̂1 t∗ ,
is tested for a unit root. If Model 1 of the Dickey-Fuller framework is used then the test is referred to as the GLS-DF test. Note, however, that because the detrended data depend on the value of c, the critical values are different from the Dickey-Fuller critical values, which rely on simple detrending. The generalised least squares (or quasi-differencing) approach was introduced to try and overcome one of the important shortcomings of the Dickey-Fuller approach, namely that the Dickey-Fuller tests have low power. What this means is that the Dickey-Fuller tests struggle to reject the null hypothesis of nonstationarity (a unit root) when it is in fact false. The modified detrending approach proposed by Elliott, Rothenberg and Stock (1996) is based on the premise that the test is more likely to reject the null hypothesis of a unit root if, under the alternative hypothesis, the process is very close to being nonstationary. The choice of the value of c in the detrending process ensures that the quasi-differenced data have an autoregressive root that is very close to one. For example, based on a sample size of T = 200, the quasi-difference
parameter φ∗ = 1 + c/T is 0.9650 for a regression with only a constant and
0.9325 for a regression with a constant and a time trend.
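The test based on this detrending step is implemented in the urca package as ur.ers(); a minimal sketch, where y again denotes the series under test:

    # DF-GLS (ERS) test with GLS detrending around a constant and trend.
    library(urca)
    ers <- ur.ers(y, type = "DF-GLS", model = "trend", lag.max = 4)
    summary(ers)   # statistic with Elliott-Rothenberg-Stock critical values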
4.5.3 Nonparametric Adjustment for Autocorrelation
Phillips and Perron (1988) propose an alternative method for adjusting the
Dickey-Fuller test for autocorrelation. Their test is based on estimating the
Dickey-Fuller regression equation, either (4.8) or (4.11), by ordinary least
squares but using a nonparametric approach to correct for the autocorrelation. The Phillips-Perron statistic is

t̃β = tβ (γ̂0/f̂0)^{1/2} − T (f̂0 − γ̂0) se(β̂) / (2 f̂0^{1/2} s) ,          (4.24)

where tβ is the ADF statistic, s is the standard error of the regression, and f̂0 is known as the long-run variance, which is computed as

f̂0 = γ̂0 + 2 Σ_{j=1}^{p} (1 − j/p) γ̂j ,          (4.25)

where p is the length of the lag and γ̂j is the j-th estimated autocovariance function of the ordinary least squares residuals obtained from estimating either (4.8) or (4.11),

γ̂j = (1/T) Σ_{t=j+1}^{T} ût ût−j .          (4.26)
The critical values are the same as the Dickey-Fuller critical values when
the sample size is large.
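The statistic is available in the urca package as ur.pp(); a minimal sketch, where the lags argument controls the truncation p in (4.25):

    # Phillips-Perron test with a nonparametric autocorrelation correction.
    library(urca)
    pp <- ur.pp(y, type = "Z-tau", model = "constant", lags = "short")
    summary(pp)    # Dickey-Fuller critical values apply in large samples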
4.5.4 Unit Root Test with Null of Stationarity
The Dickey-Fuller testing framework, including the generalised least squares detrending and Phillips-Perron variants, is designed for the null hypothesis that a time series yt is nonstationary or I(1). There is, however, a popular test that is often reported in the empirical literature which has a null hypothesis of stationarity or I(0). Consider the regression
model
yt = α + δt + zt ,
where zt is given by
zt = zt−1 + εt ,     εt ∼ iid N(0, σε²) .
The null hypothesis that yt is a stationary I(0) process is tested in terms of the null hypothesis H0 : σε² = 0, in which case zt is simply a constant. Define {ẑ1, · · · , ẑT} as the ordinary least squares residuals from the regression of yt on a constant and a deterministic trend. Now define the standardised test statistic

S = Σ_{t=1}^{T} ( Σ_{j=1}^{t} ẑj )² / (T² f̂0) ,

in which f̂0 is a consistent estimator of the long-run variance of zt. This test statistic is most commonly known as the KPSS test, after Kwiatkowski, Phillips, Schmidt and Shin (1992). It can also be regarded as a test for over-differencing, following the earlier discussion of over-differencing.
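The statistic is implemented in the urca package as ur.kpss(), where type "mu" corresponds to stationarity around a constant and "tau" to stationarity around a trend; a minimal sketch:

    # KPSS test: stationarity is the null hypothesis, so a rejection here,
    # together with a non-rejection by the ADF test, points to a unit root.
    library(urca)
    kpss <- ur.kpss(y, type = "tau", lags = "short")
    summary(kpss)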
4.5.5 Higher Order Unit Roots
A failure to reject the null hypothesis of nonstationarity suggests that the series needs to be differenced at least once to render it stationary, i.e., d ≥ 1. The question is how many times the series has to be differenced to achieve stationarity. To identify the value of d, the unit root tests discussed above are performed sequentially as follows.
(1) Test the level of the series for a unit root.
(a) If the null is rejected, stop and conclude that the series is I(0).
(b) If you fail to reject the null, conclude that the process is at least
I(1) and move to the next step.
(2) Test the first difference of the series for a unit root.
(a) If the null is rejected, stop and conclude that the series is I(1).
(b) If you fail to reject the null, conclude that the process is at least
I(2) and move to the next step.
(3) Test the second difference of the series for a unit root.
(a) If the null is rejected, stop and conclude that the series is I(2).
(b) If you fail to reject the null, conclude that the process is at least
I(3) and move to the next step.
As it is very rare for financial series to exhibit orders of integration higher
than I(2), it is safe to stop at this point. The pertinent p-values vary at each
stage of the sequential unit root testing procedure.
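The sequential procedure is straightforward to script; a minimal sketch in R using adf.test() from the tseries package, stopping at the first rejection (y denotes the series under test):

    # Sequential unit root testing for the order of integration d.
    library(tseries)
    z <- y
    for (d in 0:2) {
      if (adf.test(z)$p.value < 0.05) {
        cat("Series appears to be I(", d, ")\n"); break
      }
      z <- diff(z)   # fail to reject: difference once more and re-test
    }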
4.6 Price Bubbles
During the 1990s, led by Dot-Com stocks and the internet sector, the United States stock market experienced a spectacular rise in all major indices, especially the NASDAQ index. Figure 4.5 plots the monthly NASDAQ index, expressed in real terms, for the period February 1973 to January 2009. The series grows fairly steadily until the early 1990s, when it begins to surge. The steep upward movement in the series continues until the late 1990s as investment in Dot-Com stocks grew in popularity. Early in the year 2000 the index drops abruptly and then continues to fall to the mid-1990s level. In summary, over the decade of the 1990s, the NASDAQ index rose to a historical high on 10 March 2000. Concomitant with this striking rise in stock market indices, there was much popular talk among economists about the effects of the internet and computing technology on productivity and the emergence of a new economy associated with these changes. What caused the unusual surge and fall in prices, whether there were bubbles, and whether the bubbles were rational or behavioural are among the most actively debated issues in macroeconomics and finance in recent years.
[Figure 4.5: The monthly NASDAQ index expressed in real terms for the period February 1973 to January 2009.]
A recent series of papers on empirical tests for bubbles and rational exuberance is an interesting new development in the field of unit root testing (Phillips and Yu, 2011; Phillips, Wu and Yu, 2011). Instead of concentrating
on performing a test of a unit root against the alternative of stationarity (essentially using a one-sided test where the critical region is defined in the left-hand tail of the distribution of the unit root test statistic), they show that a test against the alternative of an explosive root (the right tail of the distribution) is appropriate for asset prices exhibiting price bubbles. The null hypothesis of interest is still ρ = 1 but the alternative hypothesis is now ρ > 1 in (4.4), or
H0 : ρ = 1   (Variable is nonstationary, no price bubble)
H1 : ρ > 1   (Variable is explosive, price bubble).          (4.27)
To motivate the presence of a price bubble, consider the following model
Pt (1 + R) = Et [Pt+1 + Dt+1 ] ,
(4.28)
where Pt is the price of an asset, R is the risk-free rate of interest assumed to be constant for simplicity, Dt is the dividend and Et[·] is the conditional expectations operator. This equation highlights two types of investment strategy. The first is given by the left-hand side, which involves investing in a risk-free asset at time t yielding a payoff of Pt(1 + R) in the next period. Alternatively, the right-hand side shows that by holding the asset the investor earns the capital gain from owning an asset with a higher price the next period plus a dividend payment. In equilibrium there are no arbitrage opportunities, so the two types of investment are equal to each other.
Now write the equation as
Pt = β Et [Pt+1 + Dt+1 ] ,
(4.29)
where β = (1 + R)−1 is the discount factor. Now writing this expression at
t+1
Pt+1 = β Et [Pt+2 + Dt+2 ] ,
(4.30)
which can be used to substitute out Pt+1 in (4.29)
Pt = β Et [β Et [Pt+2 + Dt+2] + Dt+1] = β Et [Dt+1] + β² Et [Dt+2] + β² Et [Pt+2] .
Repeating this approach N times gives the price of the asset in terms of two components

Pt = Σ_{j=1}^{N} β^j Et [Dt+j] + β^N Et [Pt+N] .          (4.31)
The first term on the right-hand side is the standard present value of an asset
whereby the price of an asset equals the discounted present value stream of
expected dividends. The second term represents the price bubble
Bt = β^N Et [Pt+N] ,          (4.32)
as it is an explosive nonstationary process. Consider the conditional expectation of the bubble in the next period, discounted by β, using the property Et[Et+1[·]] = Et[·]:

β Et [Bt+1] = β Et [β^N Et+1 [Pt+N+1]] = β^{N+1} Et [Pt+N+1] .          (4.33)
However, this expression would also correspond to the bubble in (4.32) if the N forward iterations that produced (4.31) were instead carried out for N + 1 iterations, in which case

Bt = β Et [Bt+1] ,

or, as β = (1 + R)⁻¹,

Et [Bt+1] = (1 + R) Bt ,

which represents a random walk in Bt but with an explosive parameter 1 + R.
[Figure 4.6: Testing for price bubbles in the monthly NASDAQ index expressed in real terms for the period February 1973 to January 2009 by means of recursive Augmented Dickey-Fuller tests with 1 lag. The startup sample is 39 observations from February 1973 to April 1976. The approximate 5% critical value is also shown.]
[Figure 4.7: Testing for price bubbles in the monthly NASDAQ index expressed in real terms for the period February 1973 to January 2009 by means of rolling window Augmented Dickey-Fuller tests with 1 lag. The size of the window is set to 77 observations so that the starting sample is February 1973 to June 1979. The approximate 5% critical value is also shown.]
Interestingly enough, if we were to follow convention and apply the ADF test to the full sample (February 1973 to January 2009), the unit root test would not reject the null hypothesis H0 : ρ = 1 in favour of the right-tailed alternative hypothesis H1 : ρ > 1 at the 5% level of significance. One would conclude that there is no significant evidence of exuberance in the behaviour of the NASDAQ index over the sample period. This result would sit comfortably with the consensus view that there is little empirical evidence to support the hypothesis of explosive behaviour in stock prices (see, for example, Campbell, Lo and MacKinlay, 1997, p. 260).
On the other hand, Evans (1991) argues that explosive behaviour is only temporary in the sense that economic bubbles eventually collapse, and that therefore the observed trajectories of asset prices may appear rather more like an I(1) or even a stationary series than an explosive series, thereby confounding empirical evidence. Evans demonstrates by simulation that standard unit root tests have difficulties in detecting such periodically collapsing bubbles. In order for unit root test procedures to be powerful in detecting
4.7 Exercises
125
bubbles, the use of recursive unit root testing proves to an invaluable approach in the detection and dating of bubbles.
Figure 4.6 plots the ADF statistic with 1 lag computed from forward recursive regressions, obtained by fixing the start of the sample period and progressively increasing the sample size observation by observation until the entire sample is being used. Interestingly, the NASDAQ shows no evidence of exuberance until June 1995. In July 1995 the test detects the presence of a bubble, ρ̂ > 0, with the supporting evidence becoming stronger from this point until reaching a peak in February 2000. The bubble continues until February 2001 and by March 2001 the bubble appears to have dissipated, with ρ̂ < 0. Interestingly, the first occurrence of the bubble is July 1995, more than a year before the remark by Greenspan (1996) on 5 December 1996 which coined the phrase irrational exuberance to characterise herding behaviour in stock markets.
To check the robustness of the results Figure 4.7 plots the ADF statistic
with 1 lag for a series of rolling window regressions. Each regression is based
on a subsample of size T = 77 with the first sample period from February
1973 to June 1979. The fixed window is then rolled forward one observation
at a time. The general pattern to emerge is completely consistent with the
results reported in Figure 4.6.
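The recursive and rolling calculations behind Figures 4.6 and 4.7 are straightforward to reproduce. The following R sketch assumes the log real NASDAQ index has been loaded into a numeric vector nasdaq (a hypothetical name) and uses ur.df from the urca package to generate the sequence of ADF statistics with 1 lag; the deterministic specification (a constant, no trend) is an assumption made here, and each statistic is compared against right-tailed critical values.

    library(urca)   # provides the ur.df() unit root test

    n  <- length(nasdaq)   # nasdaq: log real NASDAQ index (assumed loaded)
    n0 <- 39               # startup sample of 39 observations
    w  <- 77               # rolling window of 77 observations

    # forward recursive ADF statistics: fix the start, grow the sample
    rec_adf <- sapply(n0:n, function(s)
      ur.df(nasdaq[1:s], type = "drift", lags = 1)@teststat[1])

    # rolling window ADF statistics: fixed window rolled one step at a time
    roll_adf <- sapply(w:n, function(s)
      ur.df(nasdaq[(s - w + 1):s], type = "drift", lags = 1)@teststat[1])

Plotting rec_adf and roll_adf against the date of the final observation in each subsample reproduces the general shape of Figures 4.6 and 4.7.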
Of course, these results do not provide any causal explanation for the exuberance of the 1990s in internet stocks. Several possibilities exist, including the presence of a rational bubble, herding behaviour, or explosive effects on economic fundamentals arising from time variation in discount rates. Identification of the explicit economic source or sources will require more explicit formulation of structural models of behaviour. What this recursive methodology does provide, however, is support for the hypothesis that the NASDAQ index may be regarded as a mildly explosive propagating mechanism. The methodology can also be applied to study recent phenomena in real estate, commodity, foreign exchange, and equity markets, which have attracted attention.
4.7 Exercises
(1) Unit Root Properties of Commodity Price Data
commodity.wf1, commodity.dta, commodity.xlsx
(a) For each of the commodity prices in the dataset, compute the natural logarithm and use the following unit root tests to determine the stationarity properties of each series. Where appropriate test for higher orders of integration.
(i) Dickey-Fuller test with a constant and no time trend.
(ii) Augmented Dickey-Fuller test with a constant and no time trend,
and p = 2 lags.
(iii) Phillips-Perron test with a constant and no time trend.
(b) Perform a panel unit root test on the 7 commodity prices with a
constant and no time trend and with p = 2 lags.
(2) Equity Market Data
pv.wf1, pv.dta, pv.xlsx
(a) Use the equity price series to construct the following transformed series: the natural logarithm of equity prices, the first difference
of equity prices and log returns of equity prices. Plot the series
and discuss the stationarity properties of each series. Compare the
results with Figure 4.2.
(b) Construct similarly transformed series for dividend payments and
discuss the stationarity properties of each series.
(c) Construct similarly transformed series for earnings and discuss
the stationarity properties of each series.
(d) Use the following unit root tests to test for stationarity of the natural
logarithms of prices, dividends and earnings:
(i) Dickey-Fuller test with a constant and no time trend.
(ii) Augmented Dickey-Fuller test with a constant and no time trend
and p = 1 lag.
(iii) Phillips-Perron test with a constant and no time trend and p = 1 lag.
In performing these tests it may be necessary to test for higher
orders of integration.
(e) Repeat part (d) where the lag length for the ADF and PP tests is
based on the automatic bandwidth selection procedure.
(3) Unit Root Tests of Bond Market Data
zero.wf1, zero.dta, zero.xlsx
(a) Use the following unit root tests to determine the stationarity properties of each yield
(i) Dickey-Fuller test with a constant and no time trend.
(ii) Augmented Dickey-Fuller test with a constant and no time trend,
and p = 2 lags.
(iii) Phillips-Perron test with a constant and no time trend.
In performing these tests it is necessary to test for higher orders of
integration.
(b) Perform a panel unit root test on the 6 yield series with a constant
and no time trend and with p = 2 lags.
(4) The Term Structure of Interest Rates
zero.wf1, zero.dta, zero.xlsx
The expectations hypothesis of the term structure of interest rates predicts the following relationship between a long-term interest rate of maturity n and a short-term rate of maturity m < n

yn,t = β0 + β1 ym,t + ut ,

where ut is a disturbance term, β0 represents the term premium and β1 = 1 under the pure expectations hypothesis.
(a) Test for cointegration between y9,t and y3,t using Model 2 and p = 1
lags.
(b) Given the results in part (a) estimate a bivariate ECM for y9,t and
y3,t using Model 2 with p = 1 lags. Write out the estimated model
(the cointegrating equation(s) and the ECM). In estimating the
VECM order the yields from the longest maturity to the shortest.
(c) Interpret the long-run parameter estimates of β1 and β2 .
(d) Interpret the error correction parameter estimates of γ1 and γ2 .
(e) Interpret the short-run parameter estimates of πi,j .
(f) Test the restriction β1 = 1.
(g) Repeat parts (a) to (f) for the 6-month (y6,t ) and 3-month (y3,t )
yields.
(h) Repeat parts (a) to (f) for the 9-month (y9,t ), 6-month (y6,t ) and
3-month (y3,t ) yields.
(i) Repeat parts (a) to (f) for all 6 yields (y9,t , y6,t , y5,t , y4,t , y3,t , y2,t ).
(j) Discuss whether the empirical results support the term structure of
interest rate model.
(k) Questions (a) to (j) are all based on specifying Model 2 as the ECM.
Reestimate the VECM where Model 3 is chosen. As the difference
between Model 2 and Model 3 is the inclusion of intercepts in each
equation of the VECM, perform a test that each intercept is zero.
Interpret the results of this test.
(l) In estimating the VECM in the previous question, the order of the
yields consists of choosing the longest maturity first and the shortest
maturity last, i.e.,
y9,t , y6,t , y3,t .
Now reestimate the VECM choosing the ordering
y9,t , y3,t , y6,t .
Show that the estimated cointegrating equation(s) from this system
can be obtained from the previous system based on an alternative
ordering. Hence show that the estimates of the cointegrating equation(s) is (are) not unique.
(m) Test for weak exogeneity in the bivariate system containing y9,t and y3,t by performing the test that y9,t is weakly exogenous. Repeat the test for a system that contains the interest rates y6,t and y3,t and then for the trivariate system y9,t , y6,t and y3,t .
(5) Purchasing Power Parity
ppp.wf1, ppp.dta, ppp.xlsx
Under the assumption of purchasing power parity (PPP), the nominal
exchange rate adjusts in the long-run to the price differential between
foreign and domestic countries
S = P/F .

This suggests that the relationship between the nominal exchange rate and the prices in the two countries is given by

st = β0 + β1 pt + β2 ft + ut ,
where lower case letters denote natural logarithms and ut is a disturbance term which represents departures from PPP with β2 = −β1 .
(a) Construct the relevant variables, s, f, p and the difference diff = p − f.
(b) Use unit root tests to determine the level of integration of all of
these series. In performing the unit root tests, test the sensitivity of
the results by using a model with a constant and no time trend, and
a model with a constant and a time trend. Let the lags be p = 12.
Discuss the results in terms of the level of integration of each series.
(c) Test for cointegration between s, p and f using Model 3 with p = 12 lags.
(d) Given the results in part (c) estimate a trivariate ECM for s, p and f using Model 3 and p = 12 lags. Write out the estimated model (the cointegrating equation(s) and the ECM).
(e) Interpret the long-run parameter estimates. Hint: if the number of
cointegrating equations is greater than one, it is helpful to rearrange
the cointegrating equations so one of the equations expresses s as a
function of p and f .
(f) Interpret the error correction parameter estimates.
(g) Interpret the short-run parameter estimates.
(h) Test the restriction H0 : β2 = −β1 .
(i) Discuss the long-run properties of the $/AUD foreign exchange market.
(6) Fisher Hypothesis
fisher.wf1, fisher.dta, fisher.xlsx
Under the Fisher hypothesis the nominal interest rate fully reflects
the long-run movements in the inflation rate.
(a) Construct the percentage annualised inflation rate, πt .
(b) Plot the nominal interest rate and inflation.
(c) Perform unit root tests to determine the level of integration of the
nominal interest rate and inflation. In performing the unit root tests,
test the sensitivity of the results by using a model with a constant
and no time trend, and a model with a constant and a time trend.
Let the lags be determined by the automatic lag length selection
procedure. Discuss the results in terms of the level of integration of
each series.
(d) Compute the real interest rate as
rt = it − πt ,
130
Nonstationarity in Financial Time Series
where it is nominal interest rate and πt is the inflation rate. Test the
real interest rate rt for stationarity using a model with a constant
but no time trend. Does the Fisher hypothesis hold? Discuss.
(7) Price Bubbles in the Share Market
bubbles.wf1, bubbles.dta, bubbles.xlsx
The data represents a subset of the equity us.* data in order to focus
on the 1987 stock market crash. The present value model predicts the
following relationship between the share price Pt , and the dividend Dt
pt = β0 + β1 dt + ut
where ut is a disturbance term. A rational bubble occurs when the actual
price persistently deviates from the present value price β0 + β1 dt . The
null and alternative hypotheses are
H0 :
H1 :
Bubble
Cointegration
(ut is nonstationary)
(ut is stationary)
(a) Create the logarithms of real equity prices and real dividends and
use unit root tests to determine the level of integration of the series.
(b) Estimate a bivariate VAR with a constant and use the SIC lag length
criteria to determine the optimal lag structure.
(c) Test for a bubble by performing a cointegration test between pt and dt using Model 3 with the number of lags based on the optimal lag length obtained from the estimated VAR.
(d) Are United States equity prices driven solely by market fundamentals or do bubbles exist?
5
Cointegration
5.1 Introduction
An important implication of the analysis of stochastic trends and the unit
root tests discussed in Chapter 4 is that nonstationary time series can be
rendered stationary through differencing the series. This use of the differencing operator represents a univariate approach to achieving stationarity since the discussion of nonstationary processes so far has concentrated
on a single time series. In the case of N > 1 nonstationary time series
yt = {y1,t , y2,t , · · · , yN,t }, an alternative method of achieving stationarity is
to form linear combinations of the series. The ability to find stationary linear
combinations of nonstationary time series is known as cointegration (Engle
and Granger, 1987).
Cointegration provides a basis for interpreting a number of models in
finance in terms of long-run relationships. Having uncovered the long-run
relationships between two or more variables by establishing evidence of
cointegration, the short-run properties of financial variables are modelled
by combining the information from the lags of the variables with the long-run relationships obtained from the cointegrating relationship. This model
is known as a vector error-correction model (VECM) which is shown to be
a restricted form of the vector autoregression models (VAR) discussed in
Chapter 3.
The existence of cointegration among sets of nonstationary time series has
three important implications.
(1) Cointegration implies a set of dynamic long-run equilibria where the
weights used to achieve stationarity represent the parameters of the
equilibrium relationship.
(2) The estimates of the weights to achieve stationarity (the long-run parameter estimates) converge to their population values at a super-consistent rate of T compared to the usual √T rate of convergence for stationary variables.
(3) Modelling a system of cointegrated variables allows for specification of
both long-run and short-run dynamics in terms of the VECM.
5.2 Equilibrium Relationships
An important property of asset prices identified in Chapter 1 is that they exhibit strong trends. This is indeed the case for the United States as seen in Figure 5.1, which shows that the logarithm of monthly real equity prices, pt = log Pt, exhibits a strong positive trend over the period 1871 to 2004. The same is true for the logarithms of real dividends, dt = log Dt, and real earnings per share, yt = log Yt, also illustrated in Figure 5.1. As discussed in Chapter 4, many important financial time series exhibit trending behaviour and are therefore nonstationary.
Figure 5.1 Time series plots of the logarithms of monthly United States
real equity prices, real dividends and real earnings per share for the period
February 1871 to June 2004.
It may be an empirical fact that the financial variables illustrated in Figure 5.1 are I(1), but theory also suggests a link between the behaviour of prices, dividends and earnings. An early influential paper in this area is by Gordon (1959), who outlines two views of asset price determination. In the dividend view, the investor purchases a stock to acquire the entire future stream of dividend payments. This path of future dividends is approximated by the current dividend and the expected growth in the dividend. If the expected growth of dividends is assumed constant then there is a long-run relationship between prices and dividends given by

pt = µd + βd dt + ud,t .   [Dividend model]   (5.1)

The important feature is that both pt and dt are I(1), but if µd + βd dt truly does represent the expected value of pt, then it must follow that the disturbance term ud,t is stationary or I(0).
Alternatively, in the earnings view of the world, the investor buys equity in order to obtain the income per share and is indifferent as to whether the returns are packaged in terms of the fraction of earnings distributed as a dividend or in terms of the rise in the share's value. This suggests a relationship of the form

pt = µy + βy yt + uy,t ,   [Earnings model]   (5.2)

where once again uy,t must be I(0) if this represents a valid long-run relationship.
In other words, in either view of the world, pt can be decomposed into a long-run component and a short-run component which represents temporary deviations of pt from its long-run value. This can be represented as

pt = µd + βd dt + ud,t ,

where pt is the actual series, µd + βd dt the long-run component and ud,t the short-run component, or, in the case of the earnings model,

pt = µy + βy yt + uy,t ,

with the analogous interpretation of the long-run and short-run components.
The result that a linear combination of nonstationary variables can generate a new variable that is stationary is known as cointegration. Furthermore, the concept of cointegration is not limited to the bivariate case. If the growth of
dividends is driven by retained earnings, then the path of future dividends is
approximated by the current dividend and the expected growth in the dividend given by retained earnings. This suggests an equilibrium relationship
of the form
pt = µ + βd dt + βy yt + ut ,   [Combined model]
where as before pt, dt and yt are I(1) and ut is I(0). If the owner of the share is indifferent to the fraction of earnings distributed, then the cointegrating parameters βd and βy will be identical. Of course, all dividends are paid out of retained earnings so there will be a relationship between these two variables as well, a fact which raises the interesting question of more than one cointegrating relationship being present in multivariate contexts. This issue is taken up again in Section 5.8.
5.3 Equilibrium Adjustment
Assume that there are two variables y1,t and y2,t which share a long-run equilibrium relationship given by

y1,t = µ + βy2,t + ut ,

in which ut is a mean-zero disturbance term, and although the equation is normalised with respect to y1,t the notation is deliberately chosen to reflect the fact that both variables are possibly endogenously determined. This relationship is presented in Figure 5.2 for β > 0.
Figure 5.2 Phase diagram to demonstrate the equilibrium adjustment if
two variables are cointegrated.
The system is in equilibrium anywhere along the line ADC. Now suppose there is a shock to the system such that y1,t−1 > µ + βy2,t−1, or equivalently ut−1 > 0, and the system is displaced to point B. An equilibrium relationship necessarily implies that any shock to the system will result in an adjustment taking place in such a way that equilibrium is restored. There are three cases.
(1) The adjustment is done by y1,t :
∆y1,t = α1 (y1,t−1 − µ − βy2,t−1 ) + u1,t .
(5.3)
Since y1,t−1 − µ − βy2,t−1 > 0, inspection of equation (5.3) reveals that
∆y1,t should be negative, which in turn suggests the restriction α1 < 0.
In Figure 5.2 this adjustment is represented by a perpendicular move
down from B towards A.
(2) The adjustment is done by y2,t :
∆y2,t = α2 (y1,t−1 − µ − βy2,t−1 ) + u2,t .
(5.4)
Since y1,t−1 − µ − βy2,t−1 > 0, inspection of equation (5.4) reveals that
∆y2,t should be positive, which in turn suggests the restriction α2 > 0.
In Figure 5.2 this adjustment is represented by a horizontal move from
B towards C.
(3) Both y1,t and y2,t adjust:
In this case both equations (5.3) and (5.4) operate, with y1,t decreasing and y2,t increasing. The strength of the movements in the two variables is determined by the relative magnitudes of the parameters α1 and α2. If both variables bear an equal share of the adjustment the movement back to equilibrium is from point B to point D as shown in Figure 5.2. A small simulation of this adjustment mechanism is sketched below.
Prima facie evidence of equilibrium relationships between equity prices
and dividends, and equity prices and earnings is presented in panels (a) and
(b), respectively, of Figure 5.3. Scatter plots of these relationships together
with lines of best fit demonstrate that both these relationships are similar
to the equilibrium represented in Figure 5.2. Furthermore, casual inspection
of the equilibrium relationships suggests that the values of βd and βy are
both close to 1.
In order to explore which of the variables do the adjusting in the event
of a shock which forces the system away from equilibrium, equations (5.3)
and (5.4) must be estimated. Particularising these equations to the equity
prices/dividends and equity prices/earnings relationships and estimating by
sequential application of ordinary least squares yields the following results.
For the dividend model the estimates are
∆pt = −0.0009 pt−1 − 1.1787 dt−1 − 3.128 + u
b1,t
∆dt = 0.0072 pt−1 − 1.1787 dt−1 − 3.128 + u
b2,t ,
while for the earnings model the results are
∆pt = −0.0053 pt−1 − 1.0410 yt−1 − 2.6073 + u
b1,t
∆yt = 0.0035 pt−1 − 1.0410 yt−1 − 2.6073 + u
b2,t .
It appears that the equilibrium adjustment predicted by equations (5.3) and (5.4) is confirmed for these two relationships. In particular, the signs on the adjustment parameters satisfy the conditions required for there to be equilibrium adjustment.

Figure 5.3 Scatter plots of the logarithms of monthly United States real equity prices and real dividends, panel (a), and real equity prices and real earnings per share, panel (b), for the period February 1871 to June 2004.
5.4 Vector Error Correction Models
Taken together, equations (5.3) and (5.4) are known as a vector error correction model or VECM. In practice, the specification of a VECM requires the inclusion of more complex short-run dynamics, introduced through the addition of lags of the dependent variables, and also the inclusion of constants and time trends in the same way that these deterministic variables are included in unit root tests. Here the situation is slightly more involved because these deterministic variables can appear either in the long-run cointegrating equation or in the short-run dynamics, or VAR, part of the model. There are five different models to consider, all of which are listed below. For simplicity the short-run dynamics or VAR part of the VECM are not included in this listing of the models.
Model 1 (No Constant or Trend):
No intercept and no trend in the cointegrating equation and no intercept and no trend in the VAR:
∆y1,t = α1 (y1,t−1 − βy2,t−1 ) + u1,t
∆y2,t = α2 (y1,t−1 − βy2,t−1 ) + u2,t
This specification is included for completeness but, in general, the model will only rarely be of any practical use as most empirical specifications will require at least a constant, whether in the long-run or the short-run or both.
Model 2 (Restricted Constant):
Intercept and no trend in the cointegrating equation and no intercept
and no trend in the VAR
∆y1,t = α1 (y1,t−1 − βy2,t−1 − µ) + v1,t
∆y2,t = α2 (y1,t−1 − βy2,t−1 − µ) + v2,t
This model is referred to as the restricted constant model as there
is only one intercept term µ in the long-run equation which acts as
the intercept for both dynamic equations.
Model 3 (Unrestricted Constant):
Intercept and no trend in the cointegrating equation and intercept
and no trend in the VAR
∆y1,t = δ1 + α1 (y1,t−1 − βy2,t−1 − µ) + v1,t
∆y2,t = δ2 + α2 (y1,t−1 − βy2,t−1 − µ) + v2,t
Model 4 (Restricted Trend):
Intercept and trend in the cointegrating equation and intercept and
no trend in the VAR
∆y1,t = δ1 + α1 (y1,t−1 − βy2,t−1 − µ − φTREND) + v1,t
∆y2,t = δ2 + α2 (y1,t−1 − βy2,t−1 − µ − φTREND) + v2,t
Similar to Model 2, this model is called the restricted trend model
because there is only one trend term in the long-run equation.
Model 5 (Unrestricted Trend):
Intercept and trend in the cointegrating equation and intercept and
trend in the VAR
∆y1,t = δ1 + θ1 TREND + α1 (y1,t−1 − βy2,t−1 − µ − φTREND) + v1,t
∆y2,t = δ2 + θ2 TREND + α2 (y1,t−1 − βy2,t−1 − µ − φTREND) + v2,t
As with the unit root tests lagged values of all of the dependent variables
(VAR terms) are included as additional regressors to capture the short-run
dynamics. As the system is multivariate, the lags of all dependent variables
are included in all equations. For example, a VECM based on Model 2
(restricted constant) with p lags on the dynamic terms becomes

∆y1,t = α1 (y1,t−1 − βy2,t−1 − µ) + Σ_{i=1}^{p} π11,i ∆y1,t−i + Σ_{i=1}^{p} π12,i ∆y2,t−i + v1,t
∆y2,t = α2 (y1,t−1 − βy2,t−1 − µ) + Σ_{i=1}^{p} π21,i ∆y1,t−i + Σ_{i=1}^{p} π22,i ∆y2,t−i + v2,t .
Exogenous variables determined outside of the system are also allowed. Finally, the system can be extended to include more than two variables. In
this case there is the possibility of more than a single cointegrating equation
which means that the system adjusts in general to several shocks, a theme
taken up again in Section 5.8.
5.5 Relationship between VECMs and VARs
The VECM represents a restricted form of a VAR. Instead of the VAR format
where all variables are stationary (first differences in this instance), the
VECM specifically includes the long-run equilibrium relationship in which
the variables enter in levels. To highlight this relationship consider a simple
VECM given by
y1,t − y1,t−1 = α1 (y1,t−1 − βy2,t−1 ) + u1,t
y2,t − y2,t−1 = α2 (y1,t−1 − βy2,t−1 ) + u2,t ,
(5.5)
in which there is one cointegrating equation and no lagged difference terms
on the right hand side. There are three parameters to be estimated, namely,
the cointegrating parameter β and the two error correction parameters α1
and α2 .
Now re-express each equation in terms of the levels of the variables as
y1,t = (1 + α1 )y1,t−1 − α1 βy2,t−1 + u1,t
y2,t = α2 y1,t−1 + (1 − α2 β)y2,t−1 + u2,t .
(5.6)
Note that the result is a VAR(1) which has one lag of the levels of the variables on the right-hand side. This is a general relationship between a VAR and a VECM: if the underlying VAR is specified to be a VAR(n) then the VECM will have n − 1 lagged difference terms, that is, a VECM(n − 1). Now consider the unrestricted VAR(1)

y1,t = φ11 y1,t−1 + φ12 y2,t−1 + u1,t
y2,t = φ21 y1,t−1 + φ22 y2,t−1 + u2,t ,   (5.7)
where the parameters in (5.7) are related to those in (5.6) by the restrictions
φ11 = 1 + α1 ,   φ12 = −α1 β ,   φ21 = α2 ,   φ22 = 1 − α2 β .
Equation (5.7) is a VAR in the levels of the variables discussed in Chapter
3. Estimating the VAR yields estimates of φ11 , φ12 , φ21 and φ22 .
A comparison of equations (5.6) and (5.7) shows that cointegration imposes one cross-equation restriction on this system, which accounts for the
difference in the number of parameters in the VAR and the VECM. This
restriction arises as both variables are determined by the same underlying
long-run relationship which involves the parameter β. The form of the restriction is recovered by noting that
α1 = φ11 − 1 ,   α2 = φ21 ,   β = (1 − φ22)/φ21 .

The additional VAR parameter can be expressed as a function of the other three VAR parameters as

φ12 = (1 − φ11)(1 − φ22)/φ21 .
This result suggests that if there is cointegration, estimating the unrestricted
VAR in levels produces an estimate of φ12 that is close to the value that
would be obtained from substituting the remaining VAR parameter estimates into this expression.
Alternatively, if there is no cointegration then there is nothing for the
system to error-correct to and the error-correction parameters in (5.5) are
simply α1 = α2 = 0. The VECM is then simply a VAR in first differences. This represents a second-best strategy: if no long-run relationship exists, then the next-best option is to model just the short-run relationships amongst the variables.
This discussion touches on the old problem in time-series modelling of when to difference variables in order to address the problem of nonstationarity. The solution is to know whether there is cointegration or not. If there is cointegration, a VAR in levels is the correct specification. If there is no cointegration a VAR in first differences is required. Of course, if there is cointegration a VECM can be specified, but in large samples this would be equivalent to estimating the VAR in levels. This result also highlights the importance of VECMs in modelling financial variables because it demonstrates that the old practice of automatically differencing variables to render them stationary, and then estimating a VAR on the differenced data, rules out the possibility of a long-run relationship and hence any role for an error-correction term in modelling the dynamics.
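The cross-equation restriction implied by cointegration is easy to verify numerically. The following R sketch simulates the cointegrated system (5.5) for assumed parameter values, estimates the unrestricted VAR(1) in levels by ordinary least squares, and checks that the estimate of φ12 is close to the value implied by the other three parameters.

    set.seed(1)
    n  <- 5000
    b  <- 1;  a1 <- -0.2;  a2 <- 0.1     # assumed beta and adjustment parameters
    y1 <- y2 <- numeric(n)
    for (t in 2:n) {
      ecm   <- y1[t-1] - b * y2[t-1]
      y1[t] <- y1[t-1] + a1 * ecm + rnorm(1)
      y2[t] <- y2[t-1] + a2 * ecm + rnorm(1)
    }

    # estimate the unrestricted VAR(1) in levels, equation by equation
    var1  <- lm(y1[-1] ~ y1[-n] + y2[-n] - 1)
    var2  <- lm(y2[-1] ~ y1[-n] + y2[-n] - 1)
    phi11 <- coef(var1)[1]; phi12 <- coef(var1)[2]
    phi21 <- coef(var2)[1]; phi22 <- coef(var2)[2]

    # the restriction: phi12 should be close to (1-phi11)(1-phi22)/phi21
    c(phi12, (1 - phi11) * (1 - phi22) / phi21)

With the values above the true parameters are φ11 = 0.8, φ12 = 0.2, φ21 = 0.1 and φ22 = 0.9, which satisfy the restriction exactly.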
5.6 Estimation
To illustrate the estimation of a VECM, consider a very simple specification
based on Model 3 (unrestricted constant) in which the dynamics are limited
to one lag on all the dynamic terms. The full VECM consists of the following three equations

y1,t = µ + βy2,t + ut   (5.8)
∆y1,t = δ1 + φ11 ∆y1,t−1 + φ12 ∆y2,t−1 + α1 (y1,t−1 − βy2,t−1) + v1,t   (5.9)
∆y2,t = δ2 + φ21 ∆y1,t−1 + φ22 ∆y2,t−1 + α2 (y1,t−1 − βy2,t−1) + v2,t ,   (5.10)
whose parameters must be estimated. Two estimators are discussed initially, namely, the Engle-Granger two-step procedure, which provides estimates of the cointegrating equation without considering the dynamics from the VECM or the potential endogeneity of y2,t, and the Johansen estimator, which provides estimates of the cointegrating equation that take into account all of the dynamics of the model. For this reason, the Johansen procedure is referred to as an efficient estimation procedure and the Engle-Granger method as an inefficient estimation procedure.
The Engle and Granger estimator (Engle and Granger, 1987)
The Engle-Granger two-stage procedure is implemented by estimating equations (5.8), (5.9) and (5.10) by ordinary least squares in two steps.

Long-run:
Regress y1,t on a constant and y2,t and compute the residuals ût.

Short-run:
Estimate each equation of the error correction model in turn by ordinary least squares as follows.
(1) Regress ∆y1,t on a constant, ût−1, ∆y1,t−1 and ∆y2,t−1.
(2) Regress ∆y2,t on a constant, ût−1, ∆y1,t−1 and ∆y2,t−1.
The error correction parameter estimates, α̂1 and α̂2, are the slope parameter estimates on ût−1 in these two equations, respectively.
This estimator yields super-consistent estimates of the cointegrating vector (Stock, 1987; Phillips, 1987). Nevertheless the Engle-Granger estimator
does not produce estimates that are asymptotically efficient, except under
very strict conditions which are, in practice, unlikely to be satisfied. This
results in the estimates having nonstandard distributions which invalidates
the use of standard inferential methods.
The econometric problems with the Engle-Granger procedure arise from the potential endogeneity of y2,t and autocorrelation in the disturbances ut when simply estimating equation (5.8) by ordinary least squares. Thus, while it is not necessary to take into account short-run dynamics to obtain super-consistent estimates of the long-run parameters, it is necessary to model the short-run dynamics to obtain an efficient estimator with t-statistics that have standard distributions.
The Johansen estimator (Johansen, 1988, 1991, 1995)
In estimating the cointegrating regression in the two-step procedure none of the dynamics from the VECM are included in the estimation. A way to correct for this is to estimate all the parameters of the model jointly, a procedure known as the Johansen estimator. This estimator provides more efficient estimates of the cointegrating parameters; the second stage still involves the same sequence of least squares regressions but the ût−1 will be different.
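In R the Johansen estimator is available through, for example, the ca.jo and cajorls functions in the urca package. The sketch below assumes y is a two-column matrix containing the series; the choice ecdet = "none" is intended to mimic the unrestricted constant of Model 3, but the mapping of the package's deterministic-term options onto Models 1 to 5 should be verified against the package documentation.

    library(urca)

    # K = 2 lags in the levels VAR corresponds to 1 lagged difference in the VECM
    jo <- ca.jo(y, type = "eigen", ecdet = "none", K = 2, spec = "transitory")

    # impose a cointegrating rank of r = 1 and estimate by restricted OLS
    vecm <- cajorls(jo, r = 1)
    vecm$beta          # the cointegrating vector
    summary(vecm$rlm)  # the short-run (error correction) equations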
Table 5.1
Engle-Granger two-stage estimates of the VECMs for equity prices and dividends
and equity prices and earnings per share. Estimates are for Model 3 (unrestricted
constant) with 1 lag. The sample period is January 1871 to June 2004.
                       Dividend Model                       Earnings Model
Variable    Long Run      ∆pt         ∆dt        Long Run      ∆pt         ∆yt
β            1.179                                 1.042
            (0.005)                               (0.005)
µ            3.129                                 2.607
            (0.008)                               (0.009)
δi                         0.002       0.000                    0.002       0.000
                          (0.001)     (0.000)                  (0.001)     (0.000)
φi1                        0.291       0.000                    0.286       0.011
                          (0.024)     (0.003)                  (0.024)     (0.007)
φi2                        0.148       0.877                    0.074       0.8781
                          (0.087)     (0.012)                  (0.042)     (0.012)
αi                        -0.007       0.002                   -0.008       0.004
                          (0.003)     (0.000)                  (0.003)     (0.001)
The Engle-Granger and Johansen estimators are now compared by estimating the VECM specified in equations (5.8) to (5.10) using the United States data on equity prices, dividends and earnings. Two separate cointegrating regressions are estimated, one for prices and dividends (the dividend model) and one for prices and earnings (the earnings model).

The Engle-Granger two-stage estimates are reported in Table 5.1. The cointegration parameters in both cases are slightly greater than unity. Although it is tempting to look at the standard errors and claim that they
Table 5.2
Estimates of the VECM for equity prices and earnings per share using the
Johansen estimator. Estimates are based on Model 3 (unrestricted constant) with
1 lag. The sample period is January 1871 to June 2004.
                       Dividend Model                       Earnings Model
Variable    Long Run      ∆pt         ∆dt        Long Run      ∆pt         ∆yt
β            1.169                                 1.079
            (0.039)                               (0.039)
µ            3.390                                 2.791
            (—)                                   (—)
δi                         0.002       0.000                    0.001       0.001
                          (0.001)     (0.000)                  (0.001)     (0.000)
φi1                        0.291       0.000                    0.286       0.012
                          (0.024)     (0.003)                  (0.024)     (0.007)
φi2                        0.148       0.877                    0.072       0.871
                          (0.087)     (0.012)                  (0.042)     (0.012)
αi                        -0.007       0.002                   -0.008       0.004
                          (0.003)     (0.000)                  (0.003)     (0.001)
are in fact significantly different from unity, this conclusion is premature as will become apparent later. The signs of the error-correction parameters are consistent with the system converging to its long-run equilibrium as given by the cointegrating equation because in both dynamic equations α̂1 < 0 and α̂2 > 0, respectively. Finally, one really interesting result concerns the estimate of the intercept µ in the cointegration equation for dividends. Equation (1.16) in Chapter 1 establishes that this intercept is related to the factor at which future dividends are discounted, δ. The relationship is

δ = exp(−µ) = exp(−3.129) = 0.044 .

This estimate lines up nicely with the rough estimate of 0.05 obtained from Figure 1.6 in Chapter 1.
Table 5.2 gives the estimates of the VECM specified in equations (5.8)-(5.10) for the United States data on equity prices and earnings using the Johansen estimator. Not surprisingly there are few changes to the dynamic parameters of the VAR. The major changes, however, are in the parameter estimates of the cointegrating vector and their standard errors. The β estimates are 1.169 as opposed to 1.179 for dividends and 1.079 as opposed to 1.042 for earnings. These results are suggestive of the conclusion that problems with the single equation approach are more severe in the earnings equation. This does accord with intuition, particularly insofar as possible endogeneity is concerned. Dividend policy by firms is changed very reluctantly but retained earnings will be more responsive to the factors that influence equity prices. In addition, the estimates of the standard errors of the Johansen estimates of the cointegration parameter are about ten times larger. This appreciable difference in standard errors illustrates very clearly that inference using the standard errors obtained from the Engle-Granger procedure cannot be relied on.
5.7 Fully Modified Estimation†
The ordinary least squares estimator of β in (5.8) is super-consistent but inefficient. Solutions to the efficiency problem, and to the bias introduced by possible endogeneity of the right-hand-side variables and serial correlation in ut, have also been developed within a single equation framework, as opposed to the system framework adopted by the Johansen estimator.
Consider the following system of equations
[ 1  −β ] [ y1,t ]   [ 0  0 ] [ y1,t−1 ]   [ u1,t ]
[ 0   1 ] [ y2,t ] = [ 0  1 ] [ y2,t−1 ] + [ u2,t ] ,   (5.11)
in which it should be apparent that both y1,t and y2,t are I(1) variables
and u1,t and u2,t are I(0) disturbances. The first equation in the system is
the cointegrating regression between y1,t and y2,t with the constant term
taken to be zero for simplicity. The second equation is the nonstationary
generating process for y2,t . In order to complete the system fully it is still
necessary to specify the properties of the disturbance vector ut = [u1,t  u2,t]′ .
The simplest generating process that allows for serial correlation in ut and possible endogeneity of y2,t is the following autoregressive scheme of order 1
u1,t = b11,1 u1,t−1 + b12,0 u2,t + b12,1 u2,t−1 + ε1,t
u2,t = b21,0 u1,t + b21,1 u1,t−1 + b22,1 u2,t−1 + ε2,t ,   (5.12)

in which εt = [ε1,t  ε2,t]′ ∼ iid(0, Σ) with

Σ = [ σ11  σ12 ]
    [ σ21  σ22 ] .
The notation in equation (5.12) is particularly cumbersome, but it can be
simplified significantly by using the lag operator L, defined as
L^0 zt = zt ,   L^1 zt = zt−1 ,   L^2 zt = zt−2 ,   · · · ,   L^n zt = zt−n .
For more information on the lag operator see, for example, Hamilton (1994)
and Martin, Hurn and Harris (2013).
Using the lag operator, the system of equations (5.12) can be written as

B(L) ut = εt ,

where

B(L) = [ 1 − b11,1 L       −b12,0 − b12,1 L ]   [ b11(L)  b12(L) ]
       [ −b21,0 − b21,1 L   1 − b22,1 L     ] = [ b21(L)  b22(L) ] .   (5.13)
Once B(L) is written in the form of the second matrix on the right-hand
side of (5.13), then the matrix polynomials in the lag operator bij (L) can
be specified to have any order and, in addition, leads as well as lags of ut
can be entertained in the specification. In other words, the assumption of
a simple autoregressive model of order 1 at the outset can be generalised
without any additional effort.
In order to express the system (5.11) in terms of εt and not ut, and hence remove the serial correlation, it is necessary to premultiply by B(L). The result is

[ b11(L)  −β b11(L) + b12(L) ] [ y1,t ]   [ 0  b12(L) ] [ y1,t−1 ]   [ ε1,t ]
[ b21(L)  −β b21(L) + b22(L) ] [ y2,t ] = [ 0  b22(L) ] [ y2,t−1 ] + [ ε2,t ] .   (5.14)
The problem with single equation estimation of the cointegrating regression
is now obvious: the cointegrating parameter β appears in both equations of
(5.14). This suggests that to estimate the cointegrating vector, a systems
approach is needed which takes into account this cross-equation restriction,
the solution provided by the Johansen estimator (Johansen, 1988, 1991, 1995).
It follows from (5.14) that for a single equation approach to produce asymptotically efficient parameter estimates, two requirements need to be satisfied.

(1) There should be no cross-equation restrictions, so that b21(L) = 0.
(2) There should be no contemporaneous correlation between the disturbance term in the equation used to estimate β and ε2,t, the error term in the equation generating y2,t. If this condition is not satisfied, the second equation in (5.14) cannot be ignored in the estimation of β.
Assuming now that b21(L) = 0, adding and subtracting (y1,t − βy2,t) from the first equation in (5.14) and rearranging yields

y1,t − βy2,t + [b11(L) − 1](y1,t − βy2,t) + b12(L)∆y2,t−1 = ε1,t .   (5.15)
The problem remains that E[ε1,t ε2,t] = σ12 ≠ 0, so that the second condition outlined earlier is not yet satisfied. The remedy is to multiply the second equation by ρ = σ12/σ22 and subtract the result from the first equation in (5.14). The result is

y1,t − βy2,t + [b11(L) − 1](y1,t − βy2,t) + [b12(L) − ρ b22(L)]∆y2,t−1 = vt ,   (5.16)

in which vt = ε1,t − ρε2,t. As a result of this restructuring it follows that

E[vt ε2,t] = E[(ε1,t − ρε2,t) ε2,t] = σ12 − ρσ22 = σ12 − (σ12/σ22) σ22 = 0 ,
so that the second condition for efficient single equation estimation of the
cointegrating parameter β is now satisfied.
Equation (5.16) provides a relationship between y1,t and its long-run equilibrium level, βy2,t, with the dynamics of the relationship being controlled by the structure of the polynomials in the lag operator, b11(L), b12(L) and b22(L). A very general specification of these lag polynomials will allow for different lag orders and also leads as well as lags. In other words, a general version of (5.16) will allow for both the leads and lags of the cointegrating relationship, (y1,t − βy2,t), and the leads and lags of ∆y2,t. A reduced form version of this equation is
y1,t = βy2,t + Σ_{k=−q, k≠0}^{q} πk (y1,t−k − β y2,t−k) + Σ_{k=−q}^{q} αk ∆y2,t−k + ηt ,   (5.17)

where for the sake of simplicity the lag length in all cases has been set at q.
As noted by Lim and Martin (1995), this approach to obtaining asymptotically efficient parameter estimates of the cointegrating vector can be interpreted as a parametric filtering procedure, in which the filter expresses u1,t in terms of observable variables which are then included as regressors in the estimation of the cointegrating vector. The intuition behind this approach is that improved estimates of the long-run parameters can be obtained by using information on the short-run dynamics.
The Phillips and Loretan estimator (Phillips and Loretan, 1991)
The Phillips and Loretan (1991) estimator excludes the leads of the cointegrating vector from equation (5.17). The equation is

y1,t = βy2,t + Σ_{k=1}^{q} πk (y1,t−k − β y2,t−k) + Σ_{k=−q}^{q} αk ∆y2,t−k + ηt ,   (5.18)

which is estimated by non-linear least squares. This procedure yields (super) consistent and asymptotically efficient estimates of the cointegrating vector if all the restrictions in moving from (5.14) to (5.18) are satisfied.
Dynamic least squares (Saikkonen, 1991; Stock and Watson, 1993)
The dynamic least squares estimator excludes the lags and leads of the cointegrating vector from equation (5.17). The equation is

y1,t = βy2,t + Σ_{k=−q}^{q} αk ∆y2,t−k + ηt ,   (5.19)

which has the advantage of being estimated by ordinary least squares. This procedure yields (super) consistent and asymptotically efficient estimates of the cointegrating vector if all the restrictions in moving from (5.14) to (5.19) are satisfied.
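Because (5.19) is linear, dynamic least squares amounts to a single lm call once the leads and lags of ∆y2,t are assembled; embed is a convenient way to do this. The R sketch below uses q = 1 and assumes y1 and y2 are numeric vectors holding the two series (hypothetical names).

    # dynamic OLS with q leads and lags of the differenced regressor
    q   <- 1
    n   <- length(y1)
    dy2 <- diff(y2)                 # Delta y2 at t = 2, ..., n
    Z   <- embed(dy2, 2 * q + 1)    # row for t: dy2(t+q), ..., dy2(t-q)
    idx <- (q + 2):(n - q)          # usable observations t

    yy <- y1[idx]; xx <- y2[idx]
    dols <- lm(yy ~ xx + Z)
    coef(dols)["xx"]                # the DOLS estimate of beta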
Fully modified least squares (Phillips and Hansen, 1990)
The fully modified estimator excludes the lags and leads of the cointegrating vector and limits the terms in ∆y2,t to the contemporaneous difference with coefficient ρ. The resulting model is

y1,t = βy2,t + ρ∆y2,t + ηt .   (5.20)

Comparison of the first equation in (5.11) and (5.20) implies that

u1,t = ρ∆y2,t + ηt .   (5.21)
The fully modified ordinary least squares approach is now implemented in three steps.

(1) Estimate the first equation in (5.11) by ordinary least squares to obtain β̂ and û1,t.
(2) Estimate (5.21) by ordinary least squares to obtain estimates ρ̂ and σ̂η².
(3) Regress the constructed variable y1,t − ρ̂∆y2,t on y2,t and get a revised estimate of β. Use the estimate of σ̂η² to construct standard errors.
The Engle and Yoo estimator (Engle and Yoo, 1991)
The Engle and Yoo estimator starts by formulating the error correction version of equation (5.20) by adding and subtracting y1,t−1 on the left-hand side and adding and subtracting βy2,t−1 on the right-hand side and rearranging to yield

∆y1,t = −(y1,t−1 − βy2,t−1) + (β + ρ)∆y2,t + ηt .   (5.22)

Given an estimate β̂, a reduced form version of (5.22) is

∆y1,t = −δ(y1,t−1 − β̂ y2,t−1) + α∆y2,t + wt ,   (5.23)
in which

wt = αδ y2,t−1 + ηt ,   α = β − β̂ .   (5.24)
The Engle and Yoo estimator is implemented in three steps.

(1) Estimate the first equation in (5.11) by ordinary least squares to obtain β̂ and û1,t.
(2) Estimate (5.23) by ordinary least squares to obtain estimates ŵt and δ̂.
(3) Regress the residuals ŵt on y2,t−1 in order to obtain α̂. The revised estimate of β is given by β̂ + α̂.
Table 5.3
Single equation estimates of the cointegration regression between stock prices and
dividends and stock prices and earnings, respectively. The dynamic ordinary least
squares estimates use one forward lead and one backward lag. The sample period
is January 1871 to June 2004.
            Dividend Model                  Earnings Model
        OLS       DOLS      FMOLS       OLS       DOLS      FMOLS
β       1.179     1.174     1.191       1.042     1.043     1.065
       (0.005)   (0.040)   (0.038)     (0.005)   (0.039)   (0.038)
µ       3.129     3.117     3.143       2.607     2.607     2.612
       (0.008)   (0.056)   (0.053)     (0.009)   (0.065)   (0.064)
Table 5.3 compares the ordinary least squares estimator of the cointegrating regression with the fully modified and dynamic ordinary least squares
estimators. Comparison with the results in Table 5.2 shows that the fully
modified ordinary least squares estimator works particularly well in the case
of the earnings model, which previously was identified as the more problematic of the two models in terms of potential endogeneity. The dynamic
least squares estimator is less impressive in this situation, although there
may be scope for improvement by considering a longer lead/lag structure.
Interestingly, the standard errors on the fully modified and dynamic least
squares approaches are similar to those of the Johansen approach. The results suggest that modified single equation approaches can help to improve
inference in the cointegrating regression. The limitation of these approaches
remains that the dimension of the cointegration space is always limited to
unity.
5.8 Testing for Cointegration
Up to this point the existence of a cointegrating relationship has merely been
posited or assumed. Of course, the identification of cointegration is a crucial
step in modelling with nonstationary variables and is, in fact, the place where
the modelling procedure actually begins. Yule (1926) first drew attention
to the problems of modelling with unrelated nonstationary variables and
Granger and Newbold (1974) later showed that regressions involving nonstationary variables can lead to spurious correlations. Spurious regressions arise when unrelated nonstationary variables are found to have a statistically significant relationship. Suppose yt and xt are unrelated I(1) variables; the chance of obtaining a nonzero estimate of the coefficient on xt in a regression of yt on xt, even though the true value is zero, is substantial. Banerjee, Dolado, Galbraith and Hendry (1993) showed that in a sample of size 100 a rejection probability of 75.3% was obtained. Moreover, the problem does not go away in large samples; in fact the opposite is true, with the rejection probability of a zero coefficient increasing the larger the sample gets. To guard against spurious regressions it is critically important that cointegration can be identified reliably.
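The spurious regression problem is easy to reproduce by simulation. The following R sketch regresses one pure random walk on another, unrelated, random walk and records how often a conventional t-test (wrongly) rejects a zero slope at the 5% level; the rejection frequency should come out close to the 75.3% reported above for a sample of 100 and grows with the sample size.

    set.seed(123)
    spurious_rejections <- function(nobs, reps = 1000) {
      rej <- replicate(reps, {
        y <- cumsum(rnorm(nobs))    # two independent random walks
        x <- cumsum(rnorm(nobs))
        p <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
        p < 0.05
      })
      mean(rej)                     # empirical rejection frequency
    }

    spurious_rejections(100)   # roughly 0.75
    spurious_rejections(400)   # larger still: the problem worsens with T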
5.8.1 Residual-based tests
A natural way to test for cointegration is a two-step procedure consisting of
estimating the cointegrating equation by least squares in the first step and
testing the residuals for stationarity in the second step. As the unit root
test treats the null hypothesis as nonstationary, in applying the unit root
procedure to test for cointegration the null hypothesis is no cointegration
whereas the alternative hypothesis of stationarity represents cointegration:
H0 : No Cointegration (ut is nonstationary)
H1 : Cointegration (ut is stationary)
This is a sensible strategy given that the estimator of the cointegrating equation is super-consistent and converges at the faster rate of T to its population value compared to the usual rate of √T for stationary variables. However, in
applying a unit root test to the ordinary least squares residuals the critical
values must take into account the loss of degrees of freedom in estimating the
cointegrating equation. The critical values of the tests depend on the sample
size and the number of deterministic terms and other regressors in the first
stage regression. Tables are provided by Engle and Granger (1987) and Engle and Yoo (1987). MacKinnon (1991) provides response surface estimates
of the critical values that are now used in most computer packages.
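In R the second step is an ADF regression on the first-stage residuals with no deterministic terms. A sketch, assuming uhat holds the residuals from the cointegrating regression (as in the Engle-Granger sketch of Section 5.6); note that the critical values printed by ur.df apply to observed series and are not appropriate here, so the statistic should be compared with the MacKinnon (1991) values instead.

    library(urca)

    # residual-based cointegration test: ADF with no constant or trend
    eg_test <- ur.df(uhat, type = "none", lags = 1)
    eg_test@teststat    # compare with MacKinnon (1991) critical values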
Figure 5.4 Plot of the residuals from the first stage of the Engle-Granger two-stage procedure applied to the dividend model and the earnings model, respectively. Data are monthly observations from February 1871 to June 2004 on United States equity prices, dividends and earnings per share.
The residuals obtained by estimating the cointegrating regressions for the
dividend model, (5.1), and the earnings model, (5.2), respectively, by ordinary least squares are plotted in Figure 5.4. The series appear to have mean zero and no apparent trend, giving the appearance of stationarity. Formal tests of the stationarity of the residuals are carried out using
the Dickey-Fuller framework, based on a test regression with no constant or
trend. The results are shown in Table 5.4 for up to four lags used to augment the test regression. Despite the aberration of the Dickey-Fuller test
(0 lags) failing to reject the null hypothesis of nonstationarity, the results
from the augmented Dickey-Fuller test are unequivocal. The null hypothesis
of nonstationarity is rejected and the residuals are I(0). This confirms the
intuition provided by Figure 5.4 and allows the conclusion that both the
dividend model and the earnings model represent valid long-run relationships between equity prices and dividends and equity prices and earnings
per share, respectively.
Although residual-based tests of cointegration are a natural way to think
about the problem of testing for cointegration they suffer from the same
problem as all single equation approaches to cointegration, namely, that the
number of cointegrating relationships is necessarily limited to one. This is
not problematic in the case of two variables, but it is severely limiting when
wanting to consider the multivariate case.
Table 5.4
Testing for cointegration between United States equity prices and dividends and
equity prices and earnings. Augmented Dickey-Fuller tests based on the test
regression with no constant term and with number of lags shown. Critical values
are from MacKinnon (1991).
            Dividend Model              Earnings Model
Lags     Statistic     5% CV        Statistic     5% CV
0         -2.654      -3.340         -2.674      -3.340
1         -3.890      -3.340         -4.090      -3.340
2         -3.630      -3.340         -3.921      -3.340
3         -3.576      -3.340         -3.936      -3.340
4         -3.814      -3.340         -4.170      -3.340
5.8.2 Reduced-rank tests
Consider the following simple model
∆y1,t
π11 π12
y1,t−1
1,t
=
+
,
∆y2,t
π21 π22
y2,t−1
2,t
(5.25)
which is a bivariate VAR rearranged to look like a VECM but with no long-run equilibrium relationships imposed. In other words, the matrix

Π = [ π11  π12 ]
    [ π21  π22 ] ,

is an unrestricted matrix in which the rows and columns are not related in a linear fashion. This condition is referred to as the matrix having full rank. As this model is simply a VAR written in a particular way, for it to be a correct representation of the data both y1,t and y2,t must be stationary.
Now consider the situation when y1,t and y2,t share a long-run relationship
with cointegrating parameter β with speed of adjustment parameters α1 and
α2 in the first and second equations, respectively. Equation (5.25) must be
restricted to reflect this long-run relationship to yield the familiar VECM

[ ∆y1,t ]   [ α1  −α1 β ] [ y1,t−1 ]   [ ε1,t ]
[ ∆y2,t ] = [ α2  −α2 β ] [ y2,t−1 ] + [ ε2,t ] ,   (5.26)

so that

Π = [ α1  −α1 β ]   [ α1 ]
    [ α2  −α2 β ] = [ α2 ] [ 1  −β ] .
The effect of the long-run relationship is to restrict the elements of the matrix Π. In particular, the second column of Π is simply the first column multiplied by −β, so that there is now a linear dependence between the columns of the matrix. The matrix Π is now referred to as having reduced rank, in this case rank one.
If the matrix Π has rank zero then the system becomes

[ ∆y1,t ]   [ ε1,t ]
[ ∆y2,t ] = [ ε2,t ] ,   (5.27)
in which both y1,t and y2,t are nonstationary.
It is now apparent from equations (5.25) to (5.27) that testing for cointegration is equivalent to testing the validity of restrictions on the matrix Π, or determining the rank of this matrix. In other words, testing for cointegration amounts to testing if the matrix Π has reduced rank. As the rank of the matrix is determined by the number of significant eigenvalues, Johansen provides two tests of cointegration based on the eigenvalues of the matrix Π, known as the maximal eigenvalue test and the trace test respectively (Johansen, 1988, 1991, 1995). Testing for cointegration based on the eigenvalues of Π is now widely used because it has two advantages over the two-step residual-based test, namely, the tests generate the correct p-values and the tests are easily applied in a multivariate context where testing for several cointegrating equations jointly is required.
The Johansen cointegration test proceeds sequentially. If there are two
variables being tested for cointegration the maximum number of hypotheses
considered is two. If there are N variables being tested for possible cointegration the maximum number of hypotheses considered is N .
Stage 1:
H0 : No cointegrating equations
H1 : One or more cointegrating equations
Under the null hypothesis all of the variables are I(1) and there is
no linear combination of the variables that achieves cointegration.
Under the alternative hypothesis there is (at least) one linear combination of the I(1) variables that yields a stationary disturbance
and hence cointegration. If the null hypothesis is not rejected then
the hypothesis testing stops. Alternatively, if the null hypothesis is
rejected it could be the case that there is more than one linear combination of the variables that achieves stationarity so the process
continues.
Stage 2:
H0 : One cointegrating equation
H1 : Two or more cointegrating equations

If the null hypothesis is not rejected the testing procedure stops and the conclusion is that there is one cointegrating equation. Otherwise proceed to the next stage.
Stage N:
H0 : N − 1 cointegrating equations
H1 : All variables are stationary

At the final stage, the alternative hypothesis is that all variables are stationary and not that there are N cointegrating equations. For there to be N stationary linear combinations of the variables, the variables need to be stationary in the first place.
Large values of the Johansen cointegration statistic relative to the critical value result in rejection of the null hypothesis. Equivalently, small p-values, less than 0.05 for example, represent a rejection of the null hypothesis at the 5% level. In performing the cointegration test, it is necessary to specify the VECM to be used in the estimation of the matrix Π. The deterministic components (constant and time trend) as well as the number of lagged dependent variables to capture autocorrelation in the residuals must be specified.
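Both tests are available in R through ca.jo in the urca package by switching the type argument; a sketch, with y an assumed matrix holding the N series and with the deterministic specification again to be checked against the package documentation. Reading each summary from the r = 0 row downwards mimics the sequential procedure described above.

    library(urca)

    jo_trace <- ca.jo(y, type = "trace", ecdet = "none", K = 2)
    jo_eigen <- ca.jo(y, type = "eigen", ecdet = "none", K = 2)

    summary(jo_trace)   # reject r = 0, then r <= 1, ... until a null survives
    summary(jo_eigen)

For a system found to have rank two, the restricted estimation is obtained with, for example, cajorls(jo_trace, r = 2).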
The results of the Johansen cointegration test applied to the United States equity prices, dividends and earnings data are given in Table 5.5. Results are provided for the dividend model, the earnings model and a combined model which tests all three variables simultaneously. For the first two models, N = 2, so the maximum rank of the Π matrix is 2. Inspection of the first null hypothesis of zero rank, or no cointegration, shows that the null hypothesis is easily rejected at the 5% level for both the dividend and earnings models. There is therefore at least one cointegrating vector in both of these specifications. The next hypothesis corresponds to Π having rank one, or there being one cointegrating equation. The null hypothesis is not rejected
Table 5.5
Johansen tests of cointegration between United States equity prices, dividends
and earnings. Testing is based on Model 3 (unrestricted constant) with 2 lags in
the underlying VAR.
Dividend Model
Rank   Eigenvalue   Trace Statistic   5% CV    Max Statistic   5% CV
0      ·             32.2643          15.41     30.8132        14.07
1      0.01907        1.4510           3.76      1.4510         3.76
2      0.00091        ·                ·         ·              ·

Earnings Model
Rank   Eigenvalue   Trace Statistic   5% CV    Max Statistic   5% CV
0      ·             33.1124          15.41     32.1310        14.07
1      0.01988        0.9814           3.76      0.9814         3.76
2      0.00061        ·                ·         ·              ·

Combined Model
Rank   Eigenvalue   Trace Statistic   5% CV    Max Statistic   5% CV
0      ·            109.6699          29.68     83.0022        20.97
1      0.05055       26.6677          15.41     25.4183        14.07
2      0.01576        1.2495           3.76      1.2495         3.76
3      0.00078        ·                ·         ·              ·
at the 5% level for both models, so the conclusion is that there is one cointegrating equation that combines prices and dividends and one cointegrating
equation that combines prices and earnings into stationary series.
The results of the Johansen cointegration test applied to the combined
model of real equity prices, real dividends and earnings per share are given
in Table 5.5. The body of the table contains three rows as there are now
N = 3 variables being examined. The first null hypothesis of zero rank or
no cointegration is easily rejected at the 5% level so there is at least one
linear combination of these variables that is stationary. The next hypothesis
corresponds to Π having rank one or there being one cointegrating equation.
The null hypothesis is again rejected at the 5% level so there are at least two
cointegrating relationships between these three variables. The null hypothesis of a rank of two cannot be rejected at the 5% level, so the conclusion is
that there are two linear combinations of these three variables that produce
a stationary residual.
5.9 Multivariate Cointegration
The results of the Johansen cointegration test applied to the three-variable system of real equity prices, real dividends and earnings per share in the previous section indicated that there are two cointegrating vectors. There are thus two combinations of these three nonstationary variables that yield stationary residuals. The next logical step is to estimate a VECM which takes all three variables as arguments and imposes a cointegrating rank of two on the estimation. The results of this estimation are shown in Table 5.6.
Table 5.6
Estimates of a three-variable VECM(1) for equity prices, dividends and earnings
per share using the Johansen estimator based on Model 3 (unrestricted constant).
The sample period is January 1871 to June 2004.
The two estimated cointegrating equations are

pt = 1.072 yt + 2.798   [Ecm1]
     (0.042)
dt = 0.910 yt − 0.445   [Ecm2]
     (0.012)

Variable     ∆pt                 ∆dt                 ∆yt
Ecm1         -0.0082 (0.0034)     0.0017 (0.0004)     0.0029 (0.0010)
Ecm2          0.0014 (0.0069)    -0.0072 (0.0009)     0.0049 (0.0020)
∆pt−1         0.2868 (0.0242)    -0.0020 (0.0032)     0.0134 (0.0070)
∆dt−1         0.3674 (0.1015)     0.8194 (0.0133)     0.0542 (0.0292)
∆yt−1         0.0699 (0.0465)     0.0235 (0.0061)     0.8748 (0.0133)
Constant      0.0005 (0.0012)     0.0006 (0.0001)     0.0009 (0.0004)
The interpretation of the results in Table 5.6 proceeds as follows.
(1) Cointegrating equations:
The first cointegrating equation estimates the long-run relationship
between price and earnings and is normalised with respect to price.
The second cointegrating relationship is between dividends and earnings, normalised with respect to dividends.
(2) Speed of adjustment parameters:
The signs and significance of the speed of adjustment parameters
on the error correction terms help to establish the stability of the
5.9 Multivariate Cointegration
155
estimated relationships. Stability requires that the coefficient of adjustment on the error correction term in the equation for ∆pt be
negative. This is indeed the case and the estimate is also significant, although marginally so. The coefficient of adjustment in the
earnings equation is positive and significant which is also required by
theory. Interestingly, the adjustment coefficient in the dividend equation is also significant. This is to be expected because earnings and
dividends are closely related as demonstrated by the second cointegrating equation. What this suggests is that dividends and earnings
adjust more aggressively than prices do to correct any deviation from
long-run equilibrium.
As expected, the adjustment parameter on the second error-correction
term is negative and significant in the dividend equation and positive
and significant in the earnings equation. Notice however that the coefficient of adjustment on Ecm2 in the ∆pt equation is insignificant
which is to be expected given that price is not expected to adjust
to a divergence from long-run equilibrium between dividends and
earnings.
(3) Dynamic parameters:
The first test of interest on the parameters of the VECM relates
to the significance of the constant terms in the short-run dynamic
specification of the system. This relates to the choice of Model 3
(unrestricted constant) as opposed to Model 2 (restricted constant)
where the constant term only appears in the cointegrating equations.
Although the constants are all small in absolute size at least two of
them appear to be estimated fairly precisely. The joint hypothesis
that they are all zero, or equivalently that Model 2 is preferable to
Model 3, is therefore unlikely to be accepted.
An important issue in estimating multivariate systems in which there are
cointegrating relationships is that the estimates of the cointegrating vectors
are not unique, but depend on the normalisation rules which are adopted.
For example, the results obtained when estimating this three variable system
but imposing the normalisation rule that both cointegrating equations are
normalised on pt are reported in Table 5.7.
The two cointegrating regressions reported in Table 5.7 are now the familiar expressions that have been dealt with in the bivariate cases throughout
the chapter (see for example, Table 5.2). While this seems to contradict the
results reported in Table 5.6 the two sets of long-run relationships are easily
Table 5.7
Estimates of the three-variable VECM for equity prices, dividends and earnings
per share using the Johansen estimator. Estimates are based on Model 3
(unrestricted constant) with 1 lag of the differenced variables. The sample period
is January 1871 to June 2004.
The two estimated cointegrating equations are

    pt = 1.072 yt + 2.798        [Ecm1]
        (0.039)
    pt = 1.177 dt + 3.323        [Ecm2]
        (0.039)

Variable        ∆pt          ∆dt          ∆yt
Ecm1          -0.0070      -0.0045       0.0071
              (0.0051)     (0.0007)     (0.0015)
Ecm2           0.0012       0.0062      -0.0042
              (0.0059)     (0.0008)     (0.0017)
∆pt−1          0.2868      -0.0020       0.0134
              (0.0242)     (0.0032)     (0.0070)
∆dt−1          0.3674       0.8194       0.0542
              (0.1015)     (0.0133)     (0.0292)
∆yt−1          0.0699       0.0235       0.8748
              (0.0465)     (0.0061)     (0.0133)
Constant       0.0005       0.0006       0.0009
              (0.0012)     (0.0001)     (0.0004)
reconciled. From Table 5.7, pt = 1.177 dt + 3.323 and pt = 1.072 yt + 2.798, so
equating the two expressions and ignoring the constants gives

    1.177 dt = 1.072 yt   ⇒   dt = (1.072/1.177) yt = 0.9107 yt ,

which corresponds to the second cointegrating equation in Table 5.6.
One final interesting point to note is that Table 5.7 confirms the rather
weak adjustment by prices to any disequilibrium. Both the adjustment parameters on Ecm1 and Ecm2 in this specification are insignificantly different
from zero. What this suggests is that dividends and earnings per share tend
to pick up most of the adjustment in relation to shocks which disturb the
long-run equilibrium.
Multivariate cointegration modelling is a very useful tool in dealing with
financial models and will be encountered again in Chapters 12 and 13. The
potentially more complicated issues of testing and interpretation will be
dealt with in those later chapters.
5.10 Exercises
(1) Simulating a VECM
Consider a simple bivariate VECM
y1,t − y1,t−1 = δ1 + α1 (y2,t−1 − βy1,t−1 − µ)
y2,t − y2,t−1 = δ2 + α2 (y2,t−1 − βy1,t−1 − µ)
(a) Using the initial conditions for the endogenous variables y1 = 100
and y2 = 110, simulate the model for 30 periods using the parameters
δ1 = δ2 = 0; α1 = −0.5; α2 = 0.1; β = 1; µ = 0.
(A sketch of this simulation in R appears after part (f).)
Compare the two series. Also check to see that the long-run value
of y2 is given by βy1 + µ.
(b) Simulate the model using the following parameters:
δ1 = δ2 = 0; α1 = −1.0; α2 = 0.1; β = 1; µ = 0
Compare the resultant series with those in (a) and hence comment on the role of the error correction parameter α1 .
(c) Simulate the model using the following parameters:
δ1 = δ2 = 0; α1 = 1.0; α2 = −0.1; β = 1; µ = 0
Compare the resultant series with the previous ones and hence comment on the relationship between stability and cointegration.
(d) Simulate the model using the following parameters:
δ1 = δ2 = 0; α1 = −1.0; α2 = 0.1; β = 1; µ = 10
Comment on the role of the parameter µ. Also check to see that the
long-run value of y2 is given by βy1 + µ.
(e) Simulate the model using the following parameters:
δ1 = δ2 = 1; α1 = −1.0; α2 = 0.1; β = 1; µ = 0
Comment on the role of the parameters δ1 and δ2 .
(f) Explore a richer class of models which also includes short-run dynamics. For example, consider the model
y1,t − y1,t−1 = δ1 + α1 (y2,t−1 − βy1,t−1 − µ) + φ11 (y1,t−1 − y1,t−2 )
+φ12 (y2,t−1 − y2,t−2 )
y2,t − y2,t−1 = δ2 + α2 (y2,t−1 − βy1,t−1 − µ) + φ21 (y1,t−1 − y1,t−2 )
+φ22 (y2,t−1 − y2,t−2 )
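As flagged in part (a), the following is a minimal sketch in R of the simulation
without short-run dynamics. The function name simulate_vecm is illustrative, the
default arguments are the part (a) parameter values, and the model is simulated
deterministically exactly as written since the exercise specifies no disturbances.

# Simulate the bivariate VECM of this exercise; defaults are the part (a)
# parameter values with initial conditions y1 = 100 and y2 = 110.
simulate_vecm <- function(T = 30, delta1 = 0, delta2 = 0,
                          alpha1 = -0.5, alpha2 = 0.1, beta = 1, mu = 0) {
  y1 <- numeric(T); y2 <- numeric(T)
  y1[1] <- 100; y2[1] <- 110
  for (t in 2:T) {
    ecm <- y2[t - 1] - beta * y1[t - 1] - mu   # error-correction term
    y1[t] <- y1[t - 1] + delta1 + alpha1 * ecm
    y2[t] <- y2[t - 1] + delta2 + alpha2 * ecm
  }
  cbind(y1, y2)
}

y <- simulate_vecm()
matplot(y, type = "l", lty = 1, ylab = "y")   # compare the two series

Changing the arguments reproduces the remaining parts, for example
simulate_vecm(alpha1 = -1.0) for part (b) or simulate_vecm(mu = 10) for part (d).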
(2) The Present Value Model
pv.wf1, pv.dta, pv.xlsx
The present value model predicts the following relationship between
the two series
pt = β0 + β1 dt + ut ,
where pt is the natural logarithm of the real price of equities, dt is the natural
logarithm of real dividend payments and ut is a disturbance term. Under the
present value model the slope parameter satisfies β1 = 1, while the constant
β0 is related to the long-run discount rate.
(a) Test for cointegration between pt and dt using Model 3 and p = 1
lags.
(b) Given the results in part (a), estimate a bivariate ECM for pt and dt
using Model 3 with p = 1 lag. Interpret the results paying particular
attention to the long-run parameter estimates, β0 and β1 , and the
error correction parameter estimates, α̂i .
(c) Derive an estimate of the long-run real discount rate from R =
exp(−β0 ) and interpret the result.
(d) Test the restriction H0 : β1 = 1.
(e) Discuss whether the empirical results support the present value
model.
(3) Forward Market Efficiency
spot.wf1, spot.dta, spot.xlsx
The data for this question were obtained from Corbae, Lim and Ouliaris (1992) who test for speculative efficiency by considering the equation
st = β0 + β1 ft−n + ut ,
where st is the natural logarithm of the spot rate, ft−n is the natural
logarithm of the forward rate lagged n periods and ut is a disturbance
term. Since the data are weekly and the forward rate is the 1-month
rate, ft−4 is an unbiased estimator of st if β1 = 1.
(a) Use unit root tests to determine the level of integration of st , ft−1 ,
ft−2 and ft−3 .
(b) Test for cointegration between st and ft−4 using Model 2 with p = 0
lags.
(c) Provided that the two rates are cointegrated, estimate a bivariate
VECM for st and ft−4 using Model 2 with p = 0 lags.
(d) Interpret the coefficients β0 and β1 . In particular, test that β1 = 1.
(e) Repeat these tests for the 3 month and 6 month forward rates. Hint:
remember that the frequency of the data is weekly.
(4) Spurious Regression Problem
Program files
nts_spurious1.*, nts_spurious2.*
A spurious relationship occurs when two independent variables are
incorrectly identified as being related. A simple test of independence is
based on the estimated correlation coefficient, ρ̂.
(a) Consider the following bivariate models

    (i)    y1,t = v1,t                          y2,t = v2,t
    (ii)   y1,t = y1,t−1 + v1,t                 y2,t = y2,t−1 + v2,t
    (iii)  y1,t = y1,t−1 + v1,t                 y2,t = 2y2,t−1 − y2,t−2 + v2,t
    (iv)   y1,t = 2y1,t−1 − y1,t−2 + v1,t       y2,t = 2y2,t−1 − y2,t−2 + v2,t
in which v1,t , v2,t are iid N (0, σ²) with σ² = 1. Simulate each bivariate model 10000 times for a sample of size T = 100 and compute
the correlation coefficient, ρ̂, for each draw. Compute the sampling
distributions of ρ̂ for the four sets of bivariate models and discuss
the properties of these distributions in the context of the spurious
regression problem. (A sketch of this simulation in R appears at the
end of this exercise.)
(b) Repeat part (a) with T = 500. What do you conclude?
(c) Repeat part (a), except for each draw estimate the regression model
y2,t = β0 + β1 y1,t + ut ,
ut ∼ iid (0, σ 2 ) .
Compute the sampling distributions of the least squares estimator
β̂1 and its t statistic for the four sets of bivariate models. Discuss
the properties of these distributions in the context of the spurious
regression problem.
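A minimal sketch in R of the simulation in part (a) for model (ii), the bivariate
random walk case, follows; the other models only change the way y1,t and y2,t are
generated, for instance replacing cumsum(rnorm(T)) by cumsum(cumsum(rnorm(T)))
for the I(2) processes of models (iii) and (iv).

# Sampling distribution of the correlation coefficient for model (ii)
set.seed(42)
ndraws <- 10000
T <- 100
rho <- numeric(ndraws)
for (i in 1:ndraws) {
  y1 <- cumsum(rnorm(T))   # y1,t = y1,t-1 + v1,t
  y2 <- cumsum(rnorm(T))   # y2,t = y2,t-1 + v2,t
  rho[i] <- cor(y1, y2)
}
hist(rho, breaks = 50, main = "Sampling distribution of rho, model (ii)")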
(5) Fisher Hypothesis
fisher.wf1, fisher.dta, fisher.xlsx
Under the Fisher hypothesis the nominal interest rate fully reflects
the long-run movements in the inflation rate. The Fisher hypothesis is
represented by
it = β0 + β1 πt + ut ,
where ut is a disturbance term and the slope parameter is β1 = 1.
(a) Construct the percentage annualised inflation rate, πt .
(b) Perform unit root tests to determine the level of integration of the
nominal interest rate and inflation. In performing the unit root tests,
test the sensitivity of the results by using a model with a constant
and no time trend, and a model with a constant and a time trend.
Let the lags be determined by the automatic lag length selection
procedure. Discuss the results in terms of the level of integration of
each series.
(c) Estimate a bivariate VAR with a constant and use the SIC lag length
criterion to determine the optimal lag structure.
(d) Test for cointegration between it and πt using Model 2 with the
number of lags based on the optimal lag length obtained from the
estimated VAR. Remember if the optimal lag length of the VAR is
p, the lag structure of the VECM is p − 1.
(e) Redo part (d) subject to the restriction that β1 = 1.
(f) Does the Fisher hypothesis hold in the long-run? Discuss.
(6) Purchasing Power Parity
ppp.wf1, ppp.dta, ppp.xlsx
Under the assumption of purchasing power parity (PPP), the nominal
exchange rate adjusts in the long-run to the price differential between
foreign and domestic countries
    S = P / F .
This suggests that the relationship between the nominal exchange rate
and the prices in the two countries is given by
st = β0 + β1 pt + β2 ft + ut
where lower case letters denote natural logarithms and ut is a disturbance term which represents departures from PPP with β2 = −β1 .
(a) Construct the relevant variables, s, f , p and the difference diff = p − f .
(b) Use unit root tests to determine the level of integration of all of
these series. In performing the unit root tests, test the sensitivity of
the results by using a model with a constant and no time trend, and
a model with a constant and a time trend. Let the lags be p = 12.
Discuss the results in terms of the level of integration of each series.
(c) Test for cointegration between s, p and f using Model 3 with p = 12
lags.
(d) Given the results in part (c) estimate a trivariate ECM for s, p and
f using Model 3 and p = 12 lags. Write out the estimated
cointegrating equation(s) and the ECM.
(e) Interpret the long-run parameter estimates. Hint: if the number of
cointegrating equations is greater than one, it is helpful to rearrange
the cointegrating equations so one of the equations expresses s as a
function of p and f .
(f) Interpret the error correction parameter estimates.
(g) Interpret the short-run parameter estimates.
(h) Test the restriction H0 : β2 = −β1 .
(i) Discuss the long-run properties of the $/AUD foreign exchange market.
6
Forecasting
6.1 Introduction
The future values of variables are important inputs into the current decision
making of agents in financial markets and forecasting methods, therefore,
are widely used in financial markets. Formally, a forecast is a quantitative
estimate about the most likely value of a variable based on past and current
information and where the relationship between variables is embodied in
an estimated model. In the previous chapters a wide variety of econometric
models have been introduced, ranging from univariate to multivariate time
series models, from single equation regression models to multivariate vector
autoregressive models. The specification and estimation of these financial
models provides a mechanism for producing forecasts that are objective in
the sense that the forecasts can be recomputed exactly by knowing the structure of the model and the data used to estimate the model. This contrasts
with back-of-the-envelope methods which are not reproducible. Forecasting
can also serve as a method for comparing alternative models. Forecasting
methods not only provide an important way to choose between alternative
models, but also a way of combining the information contained in forecasts
produced by different models.
6.2 Types of Forecasts
Illustrative examples of forecasting in financial markets abound.
(i) The determination of the price of an asset based on present value methods requires discounting the present and future dividend stream at a
discount rate that potentially may change over time.
(ii) Firms are interested in forecasting the future health of the economy
when making decisions about current capital outlays because this investment earns a stream of returns over time.
(iii) In currency markets, forward exchange rates provide an estimate, or forecast, of the future spot exchange rate.
(iv) In options markets, the Black-Scholes method for pricing options is
based on the assumption that the volatility of the underlying asset that
the option is written on is constant over the life of the option.
(v) In futures markets, buyers and sellers enter a contract to buy and sell
commodities at a future date.
(vi) Model-based computation of Value-at-Risk requires repeated forecasting
of the value of a portfolio over a given time horizon.
Although all these examples are vastly different, the forecasting principles in
each case are identical. Before delving into the actual process of generating
forecasts it is useful to establish some terminology.
Consider an observed sample of data {y1 , y2 , · · · , yT } and suppose that an
econometric model is to be used to generate forecasts of y over an horizon of
H periods. The forecasts of y, which are denoted ŷ, are of two main types.

Ex Ante Forecasts: The entire sample {y1 , y2 , · · · , yT } is used to estimate the model and the task is to forecast the variable over an horizon
H beginning after the last observation of the dataset.

Ex Post Forecasts: The model is estimated over a restricted sample period that excludes the last H observations, {y1 , y2 , · · · , yT −H }. The
model is then used to forecast out-of-sample over these H observations; as the actual values of these observations have already been
observed, it is possible to compare the accuracy of the forecasts with
the actual values.
Ex post and ex ante forecasts may be illustrated as follows:
    Sample:   y1 , y2 , · · · , yT −H , yT −H+1 , yT −H+2 , · · · , yT
    Ex Post:  y1 , y2 , · · · , yT −H , ŷT −H+1 , ŷT −H+2 , · · · , ŷT
    Ex Ante:  y1 , y2 , · · · , yT −H , yT −H+1 , yT −H+2 , · · · , yT , ŷT +1 , · · · , ŷT +H
It is clear therefore that forecasting ex ante for H periods ahead requires
the successive generation of ŷT +1 , ŷT +2 up to and including ŷT +H . This is
referred to as a multi-step forecast. On the other hand, ex post forecasting
allows some latitude for choice. The forecast ŷT −H+1 is based on data up
to and including yT −H . In generating the forecast ŷT −H+2 the observation
yT −H+1 is available for use. Forecasts that use this observation are referred to
as a one-step ahead or static forecast. Ex post forecasting also allows multistep forecasting using data up to and including yT −H and this is known as
dynamic forecasting.
There is a distinction between forecasting based on dynamic time series
models and forecasts based on broader linear or nonlinear regression models.
Forecasts based on dynamic univariate or multivariate time series models developed in Chapter ?? are referred to as recursive forecasts. Forecasts that
are based on econometric models that related one variable to another as in
the linear regression model outlined in Chapter 2 are known as structural
forecasts. It should be noted, however, that the distinction between
these two types of forecasts is often unclear as econometric models often
contain both structural and dynamic time series features. An area in forecasting that has attracted a lot of recent interest which incorporates both
recursive and structural elements is the problem of predictive regressions,
dealt with in Section 6.9.
Finally, a forecast in which only a single figure, say ŷT +H , is reported for
period T + H is known as a point forecast. The point forecast represents
the best guess of the value of yT +H . Even if this guess is a particularly
good one and it is known that on average the forecast is correct, or more
formally E[ŷT +H ] = yT +H , there is some uncertainty associated with every
forecast. Interval forecasts encapsulate this uncertainty by providing a range
of forecast values around ŷT +H within which the actual value yT +H is expected
to be found at some given level of confidence.
6.3 Forecasting with Univariate Time Series Models
To understand the basic principles of forecasting financial econometric models, the simplest example, namely a univariate autoregressive model with one
lag, AR(1), is sufficient to demonstrate the key elements. Extending the model to more complicated univariate and multivariate models only
increases the complexity of the computation but not the fundamental technique by which the forecasts are generated.
Consider the AR(1) model
yt = φ0 + φ1 yt−1 + vt .
(6.1)
Suppose that the data consist of T sample observations y1 , y2 , · · · , yT . Now
consider using the model to forecast the variable one period into the future,
at T + 1. The model at time T + 1 is
yT +1 = φ0 + φ1 yT + vT +1 .
(6.2)
To be able to compute a forecast of yT +1 it is necessary to know everything
on the right-hand side of equation (6.2). Inspection of this equation
reveals that some of these terms are known and some are unknown at time
T:
    Observations:   yT           Known
    Parameters:     φ0 , φ1      Unknown
    Disturbance:    vT +1        Unknown
The aim of forecasting is to replace the unknowns with the best guess
of these quantities. In the case of parameters, the best guess is simply to
replace them with their point estimates, φ̂0 and φ̂1 , where all the sample data
is used to obtain the estimates. Formally this involves using the mean of the
sampling distribution to replace the population parameters φ0 , φ1 by their
sample estimates. Adopting the same strategy, the unknown disturbance
term vT +1 in (6.2) is replaced by using the mean of its distribution, namely
E[vT +1 ] = 0. The resulting forecast of yT +1 based on equation (6.2) is given
by
ŷT +1 = φ̂0 + φ̂1 yT + 0 = φ̂0 + φ̂1 yT ,
(6.3)
where the replacement of yT +1 by ŷT +1 emphasizes the fact that the latter
is a forecast quantity.
Now consider extending the forecast range to T + 2, the second period
after the end of the sample period. The strategy is the same as before with
the first step being expressing the model at time T + 2 as
    yT +2 = φ0 + φ1 yT +1 + vT +2 ,        (6.4)

in which all terms are now unknown at the end of the sample at time T:

    Parameters:     φ0 , φ1      Unknown
    Observations:   yT +1        Unknown
    Disturbance:    vT +2        Unknown
As before, replace the parameters φ0 and φ1 by their sample estimators,
φ̂0 and φ̂1 , and the disturbance vT +2 by its mean E[vT +2 ] = 0. What is
new in equation (6.4) is the appearance of the unknown quantity yT +1 on the
right-hand side of the equation. Again, adopting the strategy of replacing
unknowns by a best guess requires that the forecast of this variable obtained
in the previous step, ŷT +1 , be used. Accordingly, the forecast for the second
period is

    ŷT +2 = φ̂0 + φ̂1 ŷT +1 + 0 = φ̂0 + φ̂1 ŷT +1 .

Clearly, extending this analysis to an horizon H implies a forecasting equation of the
form

    ŷT +H = φ̂0 + φ̂1 ŷT +H−1 + 0 = φ̂0 + φ̂1 ŷT +H−1 .
The need to use the forecast from the previous step to generate a forecast
in the next step is commonly referred to as recursive forecasting. Moreover,
as all of the information embedded in the forecasts ŷT +1 , ŷT +2 , · · · , ŷT +H
is based on information up to and including the last observation in the
sample at time T , the forecasts are commonly referred to as conditional
mean forecasts where the conditioning is based on information at time T .
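The recursion is easily coded. The following is a minimal sketch in R in which
recursive_forecast is an illustrative function name; the parameter values
anticipate the ex ante example for equity returns given later in this section.

# Recursive H-step ahead forecasts from an AR(1) model; phi0 and phi1 are
# the least squares estimates and y_T is the last sample observation.
recursive_forecast <- function(phi0, phi1, y_T, H) {
  yhat <- numeric(H)
  y_prev <- y_T
  for (h in 1:H) {
    yhat[h] <- phi0 + phi1 * y_prev   # disturbance replaced by its mean, 0
    y_prev <- yhat[h]                 # previous forecast feeds the next step
  }
  yhat
}

# Ex ante forecasts of equity returns for July and August 2004
recursive_forecast(phi0 = 0.2472, phi1 = 0.2853, y_T = 2.6823, H = 2)
# roughly 1.012 and 0.536 per cent, matching the example up to rounding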
Extending the AR(1) model to an AR(2) model
yt = φ0 + φ1 yt−1 + φ2 yt−2 + vt ,
involves the same strategy to forecast yt . Writing the model at time T + 1
gives
yT +1 = φ0 + φ1 yT + φ2 yT −1 + vT +1 .
Replacing the parameters {φ0 , φ1 , φ2 } by their sample estimators {φ̂0 , φ̂1 , φ̂2 }
and the disturbance vT +1 by its mean E[vT +1 ] = 0, the forecast for the first
period into the future is

    ŷT +1 = φ̂0 + φ̂1 yT + φ̂2 yT −1 .
To generate the forecasts for the second period, the AR(2) model is written
at time T + 2
yT +2 = φ0 + φ1 yT +1 + φ2 yT + vT +2 .
Replacing all of the unknowns on the right-hand side by their appropriate
best guesses, gives
    ŷT +2 = φ̂0 + φ̂1 ŷT +1 + φ̂2 yT .
To derive the forecast of yt at time T + 3 the AR(2) model is written at
T +3
yT +3 = φ0 + φ1 yT +2 + φ2 yT +1 + vT +3 .
Now all terms on the right-hand side are unknown and the forecasting equation becomes
    ŷT +3 = φ̂0 + φ̂1 ŷT +2 + φ̂2 ŷT +1 .
This univariate recursive forecasting procedure is easily demonstrated.
Consider the logarithm of the monthly United States equity index, pt , for which
data are available from February 1871 to June 2004, and associated returns,
rpt = pt − pt−1 , expressed as percentages.
Ex ante forecasts
To generate ex ante forecasts of returns using a simple AR(1) model, the
parameters are estimated using the entire available sample period and these
estimates, together with the actual return for June 2004 are used to generate
the recursive forecasts. Consider the case where ex ante forecasts are required
for July and August 2004. The estimated model is
rpt = 0.2472 + 0.2853 ret−1 + vb1,t ,
where vb1,t is the least squares residual. Given that the actual return for June
2004 is 2.6823% the forecasts for July and August are, respectively,
: rp
b T +1 =
=
February : rp
b T +2 =
=
January
0.2472 + 0.2853 rpT
0.2472 + 0.2853 × 2.6823 = 1.0122%
0.2472 + 0.2853 rp
b T +1
0.2472 + 0.2853 × 1.0120 = 0.5359%
Ex post forecasts
Suppose now that ex post forecasts are required for the period January 2004
to June 2004. The model is now estimated over the period February 1871 to
December 2003 to yield

    rpt = 0.2459 + 0.2856 rpt−1 + v̂t ,
where v̂t is the least squares residual. The forecasts are now generated recursively using the estimated model and also the fact that the equity return
in December 2003 is 2.8858%:

    January  : r̂pT +1 = 0.2459 + 0.2856 rpT
                       = 0.2459 + 0.2856 × 2.8858 = 1.0701%
    February : r̂pT +2 = 0.2459 + 0.2856 r̂pT +1
                       = 0.2459 + 0.2856 × 1.0701 = 0.5515%
    March    : r̂pT +3 = 0.2459 + 0.2856 r̂pT +2
                       = 0.2459 + 0.2856 × 0.5515 = 0.4034%
    April    : r̂pT +4 = 0.2459 + 0.2856 r̂pT +3
                       = 0.2459 + 0.2856 × 0.4034 = 0.3611%
    May      : r̂pT +5 = 0.2459 + 0.2856 r̂pT +4
                       = 0.2459 + 0.2856 × 0.3611 = 0.3490%
    June     : r̂pT +6 = 0.2459 + 0.2856 r̂pT +5
                       = 0.2459 + 0.2856 × 0.3490 = 0.3456% .
The forecasts are illustrated in Figure 6.1. It is readily apparent how
quickly the forecasts are driven toward the unconditional mean of returns.
This is typical of time series forecasts.
Figure 6.1 Forecasts (dashed line) of United States equity returns generated by an AR(1) model. The estimation sample period is February 1871
to December 2003 and the forecast period is from January 2004 to June
2004.
6.4 Forecasting with Multivariate Time Series Models
The recursive method used to generate the forecasts of a univariate time
series model is easily generalised to multivariate models.
6.4.1 Vector Autoregressions
Consider a bivariate vector autoregression with one lag, VAR(1), given by
y1,t = φ10 + φ11 y1,t−1 + φ12 y2,t−1 + v1,t
y2,t = φ20 + φ21 y1,t−1 + φ22 y2,t−1 + v2,t .
(6.5)
Given data up to time T , a forecast one period ahead is obtained by writing
the model at time T + 1
y1,T +1 = φ10 + φ11 y1,T + φ12 y2,T + v1,T +1
y2,T +1 = φ20 + φ21 y1,T + φ22 y2,T + v2,T +1 .
The knowns on the right-hand side are the last observations of the two
variables, y1,T and y2,T , and the unknowns are the disturbance terms
v1,T +1 and v2,T +1 and the parameters {φ10 , φ11 , φ12 , φ20 , φ21 , φ22 }. Replacing
the unknowns by the best guesses, as in the univariate AR model, yields the
following forecasts for the two variables at time T + 1:
    ŷ1,T +1 = φ̂10 + φ̂11 y1,T + φ̂12 y2,T
    ŷ2,T +1 = φ̂20 + φ̂21 y1,T + φ̂22 y2,T .
To generate forecasts of the VAR(1) model in (6.5) two periods ahead,
the model is written at time T + 2
y1,T +2 = φ10 + φ11 y1,T +1 + φ12 y2,T +1 + v1,T +2
y2,T +2 = φ20 + φ21 y1,T +1 + φ22 y2,T +1 + v2,T +2 .
Now all terms on the right-hand side are unknown. As before the parameters
are replaced by the estimators and the disturbances are replaced by their
means, while y1,T +1 and y2,T +1 are replaced by their forecasts from the
previous step, resulting in the two-period ahead forecasts
    ŷ1,T +2 = φ̂10 + φ̂11 ŷ1,T +1 + φ̂12 ŷ2,T +1
    ŷ2,T +2 = φ̂20 + φ̂21 ŷ1,T +1 + φ̂22 ŷ2,T +1 .

In general, the forecasts of the VAR(1) model for H periods ahead are

    ŷ1,T +H = φ̂10 + φ̂11 ŷ1,T +H−1 + φ̂12 ŷ2,T +H−1
    ŷ2,T +H = φ̂20 + φ̂21 ŷ1,T +H−1 + φ̂22 ŷ2,T +H−1 .
An important feature of this result is that even if forecasts are required for
just one of the variables, say y1,t , it is necessary to generate forecasts of the
other variables as well.
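A sketch of the same recursion in R for a bivariate VAR(1) follows; the function
name var1_forecast is illustrative, and the coefficient values are those of the
estimated model for equity and dividend returns reported in the illustration
immediately below.

# H-step ahead forecasts from a bivariate VAR(1); Phi0 is the intercept
# vector, Phi1 the coefficient matrix and y_T the last observed vector.
var1_forecast <- function(Phi0, Phi1, y_T, H) {
  forecasts <- matrix(0, nrow = H, ncol = length(y_T))
  y_prev <- y_T
  for (h in 1:H) {
    forecasts[h, ] <- Phi0 + Phi1 %*% y_prev  # unknowns replaced by guesses
    y_prev <- forecasts[h, ]                  # feed forecasts forward
  }
  forecasts
}

Phi0 <- c(0.2149, 0.0301)
Phi1 <- matrix(c(0.2849, 0.1219,
                 0.0024, 0.8862), nrow = 2, byrow = TRUE)
var1_forecast(Phi0, Phi1, y_T = c(2.6823, 1.0449), H = 2)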
To illustrate forecasting using a VAR, consider, in addition to the logarithm
of the equity index pt and associated returns rpt , the logarithm of real dividends dt and the returns to dividends rdt . As before data
are available for the period February 1871 to June 2004 and suppose ex ante
forecasts are required for July and August 2004. The estimated bivariate
VAR model is
    rpt = 0.2149 + 0.2849 rpt−1 + 0.1219 rdt−1 + v̂1,t
    rdt = 0.0301 + 0.0024 rpt−1 + 0.8862 rdt−1 + v̂2,t ,

where v̂1,t and v̂2,t are the residuals from the two equations. The forecasts
for equity and dividend returns in July are

    r̂pT +1 = 0.2149 + 0.2849 rpT + 0.1219 rdT
            = 0.2149 + 0.2849 × 2.6823 + 0.1219 × 1.0449
            = 1.1065%
    r̂dT +1 = 0.0301 + 0.0024 rpT + 0.8862 rdT
            = 0.0301 + 0.0024 × 2.6823 + 0.8862 × 1.0449
            = 0.9625% .

The corresponding forecasts for August are

    r̂pT +2 = 0.2149 + 0.2849 r̂pT +1 + 0.1219 r̂dT +1
            = 0.2149 + 0.2849 × 1.1065 + 0.1219 × 0.9625
            = 0.6475%
    r̂dT +2 = 0.0301 + 0.0024 r̂pT +1 + 0.8862 r̂dT +1
            = 0.0301 + 0.0024 × 1.1065 + 0.8862 × 0.9625
            = 0.8857% .
6.4.2 Vector Error Correction Models
An important relationship between vector autoregressions and vector error
correction models discussed in Chapter 5 is that a VECM represents a restricted VAR. This suggests that a VECM can be re-expressed as a VAR
which, in turn, can be used to forecast the variables of the model.
Consider the following bivariate VECM containing one lag
∆y1,t = γ1 (y2,t−1 − βy1,t−1 − µ) + π11 ∆y1,t−1 + π12 ∆y2,t−1 + v1,t
∆y2,t = γ2 (y2,t−1 − βy1,t−1 − µ) + π21 ∆y1,t−1 + π22 ∆y2,t−1 + v2,t .
Rearranging the VECM as a (restricted) VAR(2) in the levels of the variables, gives
y1,t = −γ1 µ + (1 + π11 − γ1 β)y1,t−1 − π11 y1,t−2 + (γ1 + π12 )y2,t−1 − π12 y2,t−2 + v1,t
y2,t = −γ2 µ + (π21 − γ2 β)y1,t−1 − π21 y1,t−2 + (1 + γ2 + π22 )y2,t−1 − π22 y2,t−2 + v2,t ,
Alternatively, it is possible to write
y1,t = φ10 + φ11 y1,t−1 + φ12 y1,t−2 + φ13 y2,t−1 + φ14 y2,t−2 + v1,t
y2,t = φ20 + φ21 y1,t−1 + φ22 y1,t−2 + φ23 y2,t−1 + φ24 y2,t−2 + v2,t ,
(6.6)
in which the VAR and VECM parameters are related as follows

    φ10 = −γ1 µ                  φ20 = −γ2 µ
    φ11 = 1 + π11 − γ1 β         φ21 = π21 − γ2 β
    φ12 = −π11                   φ22 = −π21
    φ13 = γ1 + π12               φ23 = 1 + γ2 + π22
    φ14 = −π12                   φ24 = −π22 .            (6.7)
Now that the VECM is re-expressed as a VAR in the levels of the variables
in equation (6.6), the forecasts are generated as for a VAR, as discussed in
Section 6.4.1 with the VAR parameter estimates computed from the VECM
parameter estimates based on the relationships in (6.7).
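A minimal sketch in R of the mapping in (6.7) follows; the function name
vecm_to_var is illustrative and the arguments are the VECM parameter estimates.

# Map VECM(1) parameters into the implied VAR(2) coefficients of (6.6)
# using the relationships in (6.7).
vecm_to_var <- function(gamma1, gamma2, beta, mu, pi11, pi12, pi21, pi22) {
  list(
    eq1 = c(phi10 = -gamma1 * mu,
            phi11 = 1 + pi11 - gamma1 * beta,
            phi12 = -pi11,
            phi13 = gamma1 + pi12,
            phi14 = -pi12),
    eq2 = c(phi20 = -gamma2 * mu,
            phi21 = pi21 - gamma2 * beta,
            phi22 = -pi21,
            phi23 = 1 + gamma2 + pi22,
            phi24 = -pi22)
  )
}

Note that in the empirical example below the cointegrating equation is
normalised on pt and the VECM includes an unrestricted constant, so the
intercepts and the roles of the two variables in the mapping adjust accordingly.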
Using the same dataset as that used in producing ex ante VAR forecasts,
the procedure is easily repeated for the VECM. The estimated VECM with
an unrestricted constant (Model 3) and with two lags in the underlying
VAR model is¹

    rpt = 0.2056 − 0.0066 (pt−1 − 1.1685 dt−1 − 312.9553)
          + 0.2911 rpt−1 + 0.1484 rdt−1 + v̂1,t
    rdt = 0.0334 + 0.0023 (pt−1 − 1.1685 dt−1 − 312.9553)
          + 0.0002 rpt−1 + 0.8768 rdt−1 + v̂2,t ,
where v̂1,t and v̂2,t are the residuals from the two equations. Writing the
VECM as a VAR in levels gives

    pt = (0.2056 + 0.0066 × 312.9553) + (1 − 0.0066 + 0.2911) pt−1 − 0.2911 pt−2
         + (0.0066 × 1.1685 + 0.1484) dt−1 − 0.1484 dt−2 + v̂1,t
    dt = (0.0334 − 0.0023 × 312.9553) + (0.0023 + 0.0002) pt−1 − 0.0002 pt−2
         + (1 − 0.0023 × 1.1685 + 0.8768) dt−1 − 0.8768 dt−2 + v̂2,t ,
¹ These estimates are the same as the estimates reported in Chapter 5, with the exception that
the intercepts now reflect the fact that the variables are scaled by 100.
or

    pt = 2.2711 + 1.2845 pt−1 − 0.2911 pt−2 + 0.1561 dt−1 − 0.1484 dt−2 + v̂1,t
    dt = −0.6864 + 0.0025 pt−1 − 0.0002 pt−2 + 1.8741 dt−1 − 0.8768 dt−2 + v̂2,t .
The forecast for July log equities is

    p̂T +1 = 2.2711 + 1.2845 pT − 0.2911 pT −1 + 0.1561 dT − 0.1484 dT −1
           = 704.0600 ,

and for July log dividends is

    d̂T +1 = −0.6864 + 0.0025 pT − 0.0002 pT −1 + 1.8741 dT − 0.8768 dT −1
           = 293.3700 .
Similar calculations reveal that the forecasts for August log equities and
dividends are, respectively,

    p̂T +2 = 704.3400
    d̂T +2 = 294.4300 .
Based on these forecasts of the logarithms of equity prices and dividends,
the forecasts for the percentage equity returns in July and August 2004 are,
respectively,
    r̂pT +1 = 704.0600 − 703.2412 = 0.8188%
    r̂pT +2 = 704.3400 − 704.0600 = 0.2800% ,

and the corresponding forecasts for dividend returns are, respectively,

    r̂dT +1 = 293.3700 − 292.3162 = 1.0538%
    r̂dT +2 = 294.4300 − 293.3700 = 1.0600% .
6.5 Forecast Evaluation Statistics
The discussion so far has concentrated on forecasting a variable or variables
over a forecast horizon H, beginning after the last observation in the dataset.
This of course is the most common way of computing forecasts. Formally
these forecasts are known as ex ante forecasts. However, it is also of interest
to be able to compare the forecasts with the actual values that are realised
to determine their accuracy. One approach is to wait until the future values
are observed, but this is not that convenient if an answer concerning the
forecasting ability of a model is required immediately.
A common solution adopted to determine the forecast accuracy of a model
is to estimate the model over a restricted sample period that excludes the
last H observations. The model is then forecasted out-of-sample over these
observations, but as the actual value of these observations have already
been observed it is possible to compare the accuracy of the forecasts with
the actual values. As the data are already observed, forecasts computed in
this way are known as ex post forecasts.
There are a number of simple summary statistics that are used to determine the accuracy of forecasts. Define the forecast error in period T + h as
the difference between the actual and forecast value over the forecast horizon
    yT +1 − ŷT +1 , yT +2 − ŷT +2 , · · · , yT +H − ŷT +H ,
then it follows immediately that the smaller the forecast error the better is
the forecast. The most commonly used summary measures of overall closeness of the forecasts to the actual values are:

    Mean Absolute Error:             MAE  = (1/H) Σ_{h=1}^{H} | yT +h − ŷT +h |
    Mean Absolute Percentage Error:  MAPE = (1/H) Σ_{h=1}^{H} | (yT +h − ŷT +h) / yT +h |
    Mean Square Error:               MSE  = (1/H) Σ_{h=1}^{H} (yT +h − ŷT +h)²
    Root Mean Square Error:          RMSE = √[ (1/H) Σ_{h=1}^{H} (yT +h − ŷT +h)² ]
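A minimal sketch in R of these four statistics follows; forecast_stats is an
illustrative function name, and the numerical check uses the ex post AR(1)
forecasts and actual returns reported below.

# Forecast evaluation statistics; y holds the realised values over the
# forecast horizon and yhat the corresponding forecasts.
forecast_stats <- function(y, yhat) {
  e <- y - yhat
  c(MAE  = mean(abs(e)),
    MAPE = mean(abs(e / y)),
    MSE  = mean(e^2),
    RMSE = sqrt(mean(e^2)))
}

# Ex post AR(1) forecasts of equity returns, January to June 2004
actual   <- c(4.6892, 0.9526, -1.7095, 0.8311, -2.7352, 2.6823)
forecast <- c(1.0701, 0.5515, 0.4034, 0.3611, 0.3490, 0.3456)
forecast_stats(actual, forecast)   # MSE = 5.4861, RMSE = 2.3422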
The use of these statistics is easily demonstrated in the context of the
United States equity returns, rpt . To allow the generation of ex post forecasts,
an AR(1) model is estimated using data for the period February 1871 to
December 2003. Forecasts for the period January to June of 2004 are then
used together with the observed monthly percentage returns on equities to generate
the required summary statistics.
To compute the MSE for the forecast period the actual sample observations of equity returns from January 2004 to June 2004 are required. These
are
4.6892%, 0.9526%, −1.7095%, 0.8311%, −2.7352%, 2.6823%.
The MSE is

    MSE = (1/6) Σ_{h=1}^{6} (yT +h − ŷT +h)²
        = (1/6) [ (4.6892 − 1.0701)² + (0.9526 − 0.5515)² + (−1.7095 − 0.4034)²
          + (0.8311 − 0.3611)² + (−2.7352 − 0.3490)² + (2.6823 − 0.3456)² ]
        = 5.4861 .

The RMSE is

    RMSE = √5.4861 = 2.3422 .
Taken on its own, the root mean squared error of the forecast, 2.3422, does
not provide a descriptive measure of the relative accuracy of this model per
se, as its value can easily be changed by simply changing the units of the
data. For example, expressing the data as returns and not percentage returns
results in the RMSE falling by a factor of 100. Even though the RMSE is now
smaller, that does not mean that the forecasting performance of the AR(1)
model has improved in this case. The way that the RMSE and the MSE are
used to evaluate the forecasting performance of a model is to compute the
same statistics for an alternative model: the model with the smaller RMSE
or MSE is judged to be the better forecasting model.
The forecasting performance of several models are now compared. The
models are an AR(1) model of equity returns, a VAR(1) model containing
equity and dividend returns, and a VECM(1) based on Model 3, containing
log equity prices and log dividends. Each model is estimated using a reduced
sample on United States monthly percentage equity returns from February
1871 to December 2003, and the forecasts are computed from January to
June of 2004. The forecasts are then compared using the MSE and RMSE
statistics.
The results in Table 6.1 show that the VAR(1) is the best forecasting
model as it yields the smallest MSE and RMSE. The AR(1) is second best
followed by the VECM(1).
There is an active research area in financial econometrics at present in
which these statistical (or direct) measures of forecast performance are replaced by problem-specific (or indirect) measures of forecast performance in
which the evaluation relates specifically to an economic decision (Elliot and
Timmerman, 2008; Patton and Sheppard, 2009). Early examples of the indi-
Table 6.1
Forecasting performance of models of United States monthly percentage equity
returns. All models are estimated over the period January 1871 to December 2003
and the forecasts are computed from January to June of 2004.
Forecast/Statistic    AR(1)      VAR(1)     VECM(1)
January 2004          1.0701%    1.2241%    0.9223%
February 2004         0.5515%    0.7333%    0.3509%
March 2004            0.4034%    0.5780%    0.1890%
April 2004            0.3611%    0.5200%    0.1474%
May 2004              0.3490%    0.4912%    0.1411%
June 2004             0.3456%    0.4721%    0.1447%

MSE                   5.4861     5.4465     5.5560
RMSE                  2.3422     2.3338     2.3571
rect approach to forecast evaluation are Engle and Colacito (2006), who evaluate
forecast performance in terms of portfolio return variance, and Fleming,
Kirby and Ostdiek (2001, 2003), who apply a quadratic utility function that values one forecast relative to another. Becker, Clements, Doolan and Hurn
(2013) provide a survey and comparison of these different approaches to
forecast evaluation.
6.6 Evaluating the Density of Forecast Errors
The discussion of generating forecasts of financial variables thus far focusses
on either the conditional mean (point forecasts) or the conditional variance
(interval forecasts) of the forecast distribution. A natural extension is also
to forecast higher order moments, including skewness and kurtosis. In fact,
it is of interest in the area of risk management to forecast all moments of the
distribution and hence forecast the entire probability density of key financial
variables.
As is the case with point forecasts, where statistics are computed to determine the relative accuracy of the forecasts, the quality of density
forecasts is also evaluated to determine their relative accuracy in forecasting all moments of the distribution. However, the approach is not to try and
evaluate the forecast properties of each moment separately, but rather to test
all moments jointly by using the probability integral transformation (PIT).
6.6.1 Probability integral transform
Consider a very simple model of the data generating process for a variable yt

    yt = µ + vt ,        vt ∼ iid N (0, σ²) ,

in which µ = 0.0 and σ² = 1.0. Now denote the cumulative distribution
function of the standard normal distribution evaluated at any point z as
Φ(z). If a sample of observed values yt is indeed generated by this model,
then

    ut = Φ(yt − µ) ,        t = 1, 2, · · · , T ,

results in the transformed time series ut having an iid uniform distribution.
This transformation is known as the probability integral transform.
Figure 6.2 contains an example of how the transformed times series ut is
obtained from the actual time series yt where the specified model is N (0, 1).
This result is a reflection of the property that if the cumulative distribution
is indeed the correct distribution, transforming yt to ut means that each yt
has the same probability of being realised as any other value of yt .
Figure 6.2 Probability integral transform showing how the time series
yt is transformed into ut based on the distribution N (0, 1).
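A minimal sketch in R of the transform follows, first under the correct N (0, 1)
specification and then with the mean misspecified as in panel (b) of Figure 6.3
below; all object names are illustrative.

# Probability integral transform under a correctly specified model:
# the transformed series should be approximately uniform.
set.seed(1)
y <- rnorm(1000)                      # data generated by N(0,1)
u <- pnorm(y)                         # PIT using the N(0,1) distribution
hist(u, breaks = 20, main = "PIT: correct distribution")

# Misspecified mean: data from N(0.5,1) transformed using N(0,1)
u_bad <- pnorm(rnorm(1000, mean = 0.5))
hist(u_bad, breaks = 20, main = "PIT: misspecified mean")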
The probability integral transform in the case where the specified model
is chosen correctly is highlighted in panel (a) of Figure 6.3. A time series
plot of 1000 simulated observations, yt , drawn from a N (0, 1) distribution
is transformed into via the cumulative normal distribution to ut . Finally
Figure 6.3 Simulated time series to show the effects of misspecification on
the probability integral transform. In panel (a) there is no misspecification
while panels (b) and (c) demonstrate the effect of misspecification in the
mean and variance of the distribution respectively.
Inspection of
this histogram confirms that the distribution of ut is uniform and that the
distribution used in transforming yt is indeed the correct one.
Now consider the case where the true data generating process for yt is
the N (0.5, 1) distribution, but the incorrect distribution, N (0, 1), is used as
the forecast distribution to perform the PIT. The effect of misspecification
of the mean on the forecasting distribution is illustrated in panel (b) of
Figure 6.3. A time series of 1000 simulated observations from a N (0.5, 1.0)
distribution, yt , is transformed using the incorrect distribution, N (0, 1), and
the histogram of the transformed time series, ut is plotted. The fact that
ut is not uniform in this case is a reflection of a misspecified model. The
histogram exhibits a positive slope reflecting that larger values of yt have a
relatively higher probability of occurring than small values of yt .
Now consider the case where the variance of the model is misspecified.
If the data generating process is a N (0, 2) distribution, but the forecast
distribution used in the PIT is once again N (0, 1) then it is to be expected
that the forecast distribution will understate the true spread of the data.
This is clearly visible in panel (c) of Figure 6.3. The histogram of ut is
now U-shaped implying that large negative and large positive values have a
higher probability of occurring than predicted by the N (0, 1) distribution.
6.6.2 Equity Returns
The models used to forecast United States equity returns rpt in Section 6.3
are all based on the assumption of normality. Consider the AR(1) model
rpt = φ0 + φ1 rpt−1 + vt ,
vt ∼ N (0, σ 2 ) .
Assuming the forecast is ex post so that rpt is available, the one-step ahead
forecast error is given by

    v̂t = rpt − φ̂0 − φ̂1 rpt−1 ,

which, if the model is correctly specified, has distribution

    v̂t ∼ N (0, σ²) .

Using monthly data from January 1871 to June 2004, the estimated distribution is

    v̂t ∼ N (0, 3.929²) .
The PIT corresponding to this estimated distribution transforms the time series as

    ut = Φ( v̂t / σ̂ ) ,

in which σ̂ is the standard error of the regression. A histogram of the transformed time series, ut , is given in Figure 6.4. It appears that the AR(1)
forecasting model of equity returns is misspecified because the distribution
of ut is non-uniform. The interior peak of the distribution of ut suggests
that the distribution of yt is more peaked than that predicted by the normal
distribution. Also, the pole in the distribution at zero suggests that there
are some observed negative values of yt that are also not consistent with
the specification of a normal distribution. These two properties combined
suggest that the specified model fails to take into account the presence of
higher order moments such as skewness and kurtosis. The analysis of the
one-step ahead AR(1) forecasting model can easily be extended to the other
estimated models of equity returns including the VAR and the VECM investigated in Section 6.4 to forecast equity returns.
Figure 6.4 Probability integral transform applied to the estimated one-step
ahead forecast errors of the AR(1) model of United States equity returns,
January 1871 to June 2004.
As applied here, the PIT is ex post, as it uses the within-sample
one-step ahead prediction errors to perform the analysis; it is also a simple graphical implementation in which misspecification is detected by
inspection of the histogram of the transformed time series, ut . It is possible
to relax both these assumptions. Diebold, Gunther and Tay (1998) discuss
an alternative ex ante approach, while Ghosh and Bera (2005) propose a
class of formal statistical tests of the null hypothesis that ut is uniformly
distributed.
6.7 Combining Forecasts
Given that all models are wrong but some are useful, it is not surprising
that the issue of combining forecasts has generated a great deal of interest
(Timmerman, 2006; Elliott and Timmerman, 2008) and very often the financial press will report consensus forecasts which are essentially averages
180
Forecasting
of different forecasts of the same quantity. This raises an important question
in forecasting: is it better to rely on the best individual forecast or is there
any gain to averaging the competing forecasts?
Suppose there are two unbiased forecasts of a variable yt , given by ŷ1,t
and ŷ2,t , with respective variances σ1² and σ2² and covariance σ12 . A weighted
average of these two forecasts is

    ŷt = ω ŷ1,t + (1 − ω) ŷ2,t ,

and the variance of the combined forecast is

    σ² = ω² σ1² + (1 − ω)² σ2² + 2ω(1 − ω) σ12 .
A natural approach is to choose the weight ω in order to minimise the
variance of the combined forecast. Solving the first order condition

    ∂σ²/∂ω = 2ω σ1² − 2(1 − ω) σ2² + 2σ12 − 4ω σ12 = 0

for the optimal weight gives

    ω = (σ2² − σ12) / (σ1² + σ2² − 2σ12) .
It is clear therefore that the weight attached to ŷ1,t varies inversely with its
variance. In passing, these weights are of course identical to the optimal
weights for the minimum variance portfolio derived in Chapter 2.
This point can be illustrated more clearly if the forecasts are assumed to
be uncorrelated, σ12 = 0. In this case,

    ω = σ2² / (σ1² + σ2²) ,        1 − ω = σ1² / (σ1² + σ2²) ,

and it is clear that both forecasts have weights varying inversely with their
variances. By multiplying the numerator and denominator of the expression
for ω by σ1⁻² σ2⁻², it can be rewritten as

    ω = σ1⁻² / (σ1⁻² + σ2⁻²) ,        (6.8)

so that the inverse proportionality is now manifestly clear in the numerator of
expression (6.8). This simple intuition in the two-forecast case extends to the
situation in which there are N forecasts {ŷ1,t , ŷ2,t , · · · , ŷN,t } of the same
variable yt . If these forecasts are all unbiased and uncorrelated, and if the
weights satisfy

    Σ_{i=1}^{N} ωi = 1 ,        ωi ≥ 0 ,        i = 1, 2, · · · , N ,

then from (6.8) the optimal weights are

    ωi = σi⁻² / Σ_{j=1}^{N} σj⁻² ,

and the weight on forecast i is inversely proportional to its variance.
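A minimal sketch in R of the two-forecast weight calculation follows;
optimal_weight is an illustrative function name and the example values are
hypothetical variances.

# Minimum variance weight on the first of two unbiased forecasts with
# error variances sig1sq and sig2sq and covariance sig12.
optimal_weight <- function(sig1sq, sig2sq, sig12) {
  (sig2sq - sig12) / (sig1sq + sig2sq - 2 * sig12)
}

# Uncorrelated case: the weights are inversely proportional to variances
optimal_weight(sig1sq = 1, sig2sq = 4, sig12 = 0)   # returns 0.8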
The weights in expression (6.8) are intuitively appealing as they are
based on the principle of producing a minimum variance portfolio. Important
questions remain, however, about how best to implement the combination
of forecasts approach in practice. Bates and Granger (1969) suggested using
of forecasts approach in practice. Bates and Granger (1969) suggested using
(6.8), estimating the σi² using the forecast mean square error as an estimate
of the forecast variance. All this approach requires, then, is an estimate of the
MSE of all the competing forecasts in order to compute the optimal weights,
ω̂i . Granger and Ramanathan (1984) later showed that this method was
numerically equivalent to weights constructed from running the restricted
regression
    yt = ω1 ŷ1,t + ω2 ŷ2,t + · · · + ωN ŷN,t + vt ,
in which the coefficients are constrained to be non-negative and to sum to
one. Of course enforcing these restrictions in practice can be tricky and
sometimes ad hoc methods need to be adopted. One method is the
sequential elimination of forecasts with weights estimated to be negative
until all the remaining forecasts in the proposed combination forecast have
positive weights. This is sometimes referred to as forecast encompassing
because all the forecasts that eventually remain in the regression encompass
all the information in those that are left out.
Yet another approach to averaging forecasts is based on the use of information criteria (Buckland, Burnham and Augustin, 1997; Burnham and
Anderson, 2002), which may be interpreted as the relative quality of an
econometric model. Suppose you have N different models each with an estimated Akaike information criterion AIC1 , AIC2 , · · · , AICN , then the model
that returns the minimum value of the information criterion is usually the
model of choice. Denote the minimum value of the information criterion for
this set of models as AICmin , then
exp [∆Ii /2] = exp [(AICi − AICmin )/2]
may be interpreted as a relative measure of the loss of information² due
to using model i instead of the model yielding AICmin . It is therefore natural to allow the forecast combination to reflect this relative information by
computing the weights

    ω̂i = exp [−∆Ii /2] / Σ_{j=1}^{N} exp [−∆Ij /2] ,

so that models with a larger information loss receive a smaller weight.
The Schwarz (Bayesian) Information Criterion (SIC) has also been suggested
as an alternative information criterion to use in this context.3
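A minimal sketch in R of this weighting scheme follows; aic_weights is an
illustrative function name and the AIC values in the example are hypothetical.

# Combination weights based on the AIC: models with a larger information
# loss relative to the best model receive a smaller weight.
aic_weights <- function(aic) {
  w <- exp(-(aic - min(aic)) / 2)   # relative likelihood of each model
  w / sum(w)                        # normalise so the weights sum to one
}

aic_weights(c(102.1, 100.0, 104.7))   # hypothetical AIC values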
Of course, the simplest idea would be to assign equal weight to each forecast
and construct the simple average

    ŷt = (1/N) Σ_{i=1}^{N} ŷi,t .
Interestingly enough, simulation studies and practical work generally indicated that this simplistic strategy often works best, especially when there are
large numbers of forecasts to be combined, notwithstanding all the subsequent work on the optimal estimation of weights (Stock and Watson, 2001).
Two possible explanations of why averaging might work better in practice
than constructing the optimal combination are as follows.
(i) There may be significant error in the estimation of the weights, due either to parameter instability (Clemen, 1989; Winkler and Clemen, 1992,
Smith and Wallis, 2009) or structural breaks (Hendry and Clements,
2004).
(ii) The fact that the variances of the competing forecasts may be very
similar and their covariances positive suggests that large gains obtained
by constructing optimal weights are unlikely (Elliott, 2011).
6.8 Regression Model Forecasts
The forecasts of the univariate and multivariate models discussed so far are
all based on time series methods, as each dependent variable is expressed as
² The exact form of this expression derives from the likelihood principle which is discussed in
Chapter 7. The AIC is an unbiased estimate of −2 times the log-likelihood function of model
i, so after dividing by −2 and exponentiating, the result is a measure of the likelihood that
model i actually generated the observed data.
³ When the SIC is used to construct the weights, the optimal weights have the interpretation of
a Bayesian averaging procedure. Illustrative examples may be found in Garratt, Koop and
Vahey (2008) and Kapetanios, Labhard and Price (2008).
a function of own lags and lags of other variables. Now consider forecasting
the linear regression model
yt = β0 + β1 xt + ut ,
where yt is the dependent variable, xt is the explanatory variable, ut is a
disturbance term, and the sample period is t = 1, 2, · · · , T . To generate a
forecast of yt at time T + 1, as before, the model is written at T + 1 as
yT +1 = β0 + β1 xT +1 + uT +1
The unknown values on the right-hand side are xT +1 and uT +1 , as well as
the parameters {β0 , β1 }. As before, uT +1 is replaced by its expected value of
E[uT +1 ] = 0, while the parameters are replaced by their sample estimates,
{β̂0 , β̂1 }. However, it is not clear how to deal with xT +1 , the future value
of the explanatory variable. One strategy is to specify hypothetical future
values of the explanatory variable that in some sense capture scenarios the
researcher is interested in.
A less subjective approach is to specify a time series model for xt and use
this model to generate forecasts of xT +i . Suppose for the sake of argument
that an AR(2) model is proposed for xt . The bivariate system of equations
to be estimated is then
    yt = β0 + β1 xt + ut ,                          (6.9)
    xt = φ0 + φ1 xt−1 + φ2 xt−2 + vt .              (6.10)
To generate the first forecast at time T +1 the system of equations is written
as
yT +1 = β0 + β1 xT +1 + uT +1
xT +1 = φ0 + φ1 xT + φ2 xT −1 + vT +1 .
Replacing the unknowns with the best available guesses yields

    ŷT +1 = β̂0 + β̂1 x̂T +1 ,                        (6.11)
    x̂T +1 = φ̂0 + φ̂1 xT + φ̂2 xT −1 .                (6.12)

Equation (6.12) is used to generate the forecast x̂T +1 , which is then substituted into equation (6.11) to generate ŷT +1 .
Alternatively, these calculations can be performed in one step by substituting (6.12) for x̂T +1 into (6.11) to give

    ŷT +1 = β̂0 + β̂1 (φ̂0 + φ̂1 xT + φ̂2 xT −1 )
           = β̂0 + β̂1 φ̂0 + β̂1 φ̂1 xT + β̂1 φ̂2 xT −1 .
Of course, the case where there are multiple explanatory variables is easily
handled by specifying a VAR to generate the required multivariate forecasts.
The regression model may be used to forecast United States equity returns, rpt , using dividend returns, rdt . As in earlier illustrations, the data
are from February 1871 to June 2004. Estimation of equations (6.9) and
(6.10), in which for simplicity the latter is restricted to an AR(1) representation, gives
    yt = 0.3353 + 0.0405 xt + ût ,
    xt = 0.0309 + 0.8863 xt−1 + v̂t ,

where yt denotes equity returns and xt dividend returns. Based on these
estimates, the forecasts for dividend returns in July and August are, respectively,

    x̂T +1 = 0.0309 + 0.8863 xT = 0.0309 + 0.8863 × 1.0449 = 0.9570%
    x̂T +2 = 0.0309 + 0.8863 x̂T +1 = 0.0309 + 0.8863 × 0.9570 = 0.8791% ,

so that the forecast equity returns in July and August are

    ŷT +1 = 0.3353 + 0.0405 x̂T +1 = 0.3353 + 0.0405 × 0.9570 = 0.3741%
    ŷT +2 = 0.3353 + 0.0405 x̂T +2 = 0.3353 + 0.0405 × 0.8791 = 0.3709% .
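These calculations are easily reproduced, and the following is a minimal sketch
in R using the estimates reported above; all object names are illustrative.

# Two-step regression model forecasts as in (6.11) and (6.12)
b0 <- 0.3353; b1 <- 0.0405   # regression of equity on dividend returns
f0 <- 0.0309; f1 <- 0.8863   # AR(1) model for dividend returns
x_T <- 1.0449                # dividend return in June 2004

x_hat1 <- f0 + f1 * x_T      # July forecast of dividend returns
x_hat2 <- f0 + f1 * x_hat1   # August forecast of dividend returns
y_hat1 <- b0 + b1 * x_hat1   # July forecast of equity returns
y_hat2 <- b0 + b1 * x_hat2   # August forecast of equity returns
c(y_hat1, y_hat2)            # roughly 0.3741 and 0.3709 per cent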
6.9 Predicting the Equity Premium
Forecasting in finance using regression models, or predictive regressions,
as outlined in Section 6.8, is an area currently receiving quite a lot of
attention (Stambaugh, 1999). In a series of papers, Goyal and Welch
(2003, 2008) examine the empirical evidence on the predictability of the equity
premium, eqpt , defined as the total rate of return on the S&P 500 index,
rmt , minus the short-term interest rate, in terms of the dividend-price ratio
dpt and the dividend yield dyt . What follows reproduces some of the results
from Goyal and Welch (2003).
Table 6.2 provides summary statistics for the data. There are difficulties
in reproducing all the summary statistics reported by Goyal and Welch in
their papers because the data they provide is updated continuously.4 The
summary statistics reported here are for slightly different sample periods
than those listed in Goyal and Welch (2003), but the mean and standard
deviation for the sample period 1927 to 2005 of 6.04% and 19.17%, respectively, are identical to those for the same period listed in Goyal and Welch
(2008). Furthermore the plots of the logarithm of the equity premium and
⁴ See http://www.hec.unil.ch/agoyal/
the logarithms of the dividend yield and dividend price ratio in Figure 6.5
are almost identical to the plots in Figure 1 of Goyal and Welch (2003).
Table 6.2
Descriptive statistics for the annual total market return, the equity premium, the
dividend price ratio and the dividend yield all defined in terms of the S&P 500
index. All variables are in percentages.
Sample         Variable    Mean    St.dev.    Min.      Max.     Skew.    Kurt.
1926 - 2003    rmt         9.79    19.10     -53.99     42.51    -0.82    3.69
               eqpt        6.11    19.28     -55.13     42.26    -0.65    3.41
               dpt        -3.28     0.44      -4.48     -2.29    -0.64    3.63
               dyt        -3.22     0.42      -4.50     -2.43    -1.07    4.33

1946 - 2003    rmt        10.52    15.58     -30.12     41.36    -0.46    2.66
               eqpt        5.88    15.93     -37.64     40.43    -0.43    2.84
               dpt        -3.37     0.42      -4.48     -2.63    -0.76    3.52
               dyt        -3.30     0.43      -4.50     -2.43    -0.81    3.96

1927 - 2005    rmt         9.69    18.98     -53.99     42.51    -0.80    3.71
               eqpt        6.04    19.17     -55.13     42.26    -0.65    3.44
               dpt        -3.30     0.45      -4.48     -2.29    -0.57    3.28
               dyt        -3.24     0.43      -4.50     -2.43    -0.96    3.79
Figure 6.5 Plots of the time series of the logarithm of the equity premium,
dividend yield, and dividend-price ratio.
The predictive regressions used in this piece of empirical analysis are,
respectively,
eqpt = αy + βy dyt−1 + uy,t
(6.13)
eqpt = αp + βp dpt−1 + up,t .
(6.14)
The parameter estimates obtained from estimating these equations for two
different sample periods, namely, 1926 to 1990 and 1926 to 2002, respectively,
are reported in Table 6.3.
Table 6.3
Predictive regressions for the equity premium using the dividend price ratio, dpt ,
and the dividend yield, dyt , as explanatory variables.

                 Sample 1926 - 1990             Sample 1926 - 2002
                  dpt          dyt               dpt          dyt
α                0.570        0.738             0.379        0.467
                (0.257)      (0.282)           (0.169)      (0.176)
                (0.030)      (0.011)           (0.028)      (0.010)
β                0.163        0.221             0.0984       0.128
                (0.0818)     (0.0913)          (0.0517)     (0.0547)
                (0.050)      (0.018)           (0.061)      (0.022)
R²               0.0595       0.0851            0.0461       0.0680
R̄²               0.0446       0.0706            0.0334       0.0556
Std. error       0.193        0.1903            0.1898       0.1876
N                65           65                77           77
These results suggest that dividend yields and dividend-price ratios had
at least some forecasting power with respect to the equity premium for the
period 1926 - 1990, at least for the S&P 500 index. It is noticeable however
that the size of the coefficients on both dpt−1 and dyt−1 is substantially
reduced when the sample size is increased to 2002. Although the results
are not identical to those in Table 2 of Goyal and Welch (2003) because of
data revisions, the coefficients are similar and so is the pattern of size of the
coefficient estimates decreasing as the sample size is increased.
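The predictive regressions (6.13) and (6.14) are straightforward to estimate by ordinary least squares in any of the three computing environments. The following minimal R sketch assumes a data frame gw containing a year variable together with columns eqp, dp and dy for the equity premium and the logarithms of the dividend-price ratio and dividend yield; these names are illustrative and are not those of the distributed data files.

# Predictive regressions (6.13) and (6.14): minimal sketch.
# Assumes a data frame gw with columns year, eqp, dp, dy (illustrative names).
gw$dy_lag <- c(NA, head(gw$dy, -1))     # dy lagged one year
gw$dp_lag <- c(NA, head(gw$dp, -1))     # dp lagged one year

sub   <- subset(gw, year >= 1926 & year <= 1990)
eq_dy <- lm(eqp ~ dy_lag, data = sub)   # equation (6.13)
eq_dp <- lm(eqp ~ dp_lag, data = sub)   # equation (6.14)
summary(eq_dy)                          # coefficients, standard errors, R^2
summary(eq_dp)

Re-estimating the two equations with year <= 2002 in the subset call gives the second panel of Table 6.3.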
This sub-sample instability of the estimated regression coefficients in Table 6.3 is further illustrated by the recursive plots of the slope coefficients on $dp_{t-1}$ and $dy_{t-1}$ in Figure 6.6, which reveal some important problems with this interpretation, at least from a forecasting perspective.
Figure 6.6 Recursive estimates of the coefficients on the dividend-price ratio (panel (a)) and the dividend yield (panel (b)) from (6.13) and (6.14).
The plot reveals that although the coefficient on $dy_{t-1}$ appears to be marginally statistically significant at the 5% level over long periods, the coefficient on $dp_{t-1}$ increases over time while the coefficient on $dy_{t-1}$ steadily decreases. In other words, as time progresses the forecaster would rely less on $dy_t$ and more on $dp_t$, despite the fact that the $dy_t$ coefficient appears more reliable in terms of statistical significance. In fact, the dividend yield almost always produces an inferior forecast to the unconditional mean of the equity premium, and the dividend-price ratio fares only slightly better. The point being made is that a trader relying only on information available at the time a forecast was being made, and not on information relating to the entire sample, would have had difficulty in extracting meaningful forecasts.
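The recursive estimates plotted in Figure 6.6 are obtained by re-estimating (6.13) and (6.14) on an expanding window of data. A sketch of the calculation for $\beta_y$ in R, continuing with the illustrative data frame gw introduced earlier, is as follows.

# Recursive (expanding-window) estimates of the slope coefficient in (6.13).
years  <- 1940:2002                     # end dates of the expanding windows
beta_y <- se_y <- numeric(length(years))
for (i in seq_along(years)) {
  win <- subset(gw, year >= 1926 & year <= years[i])
  fit <- lm(eqp ~ dy_lag, data = win)
  beta_y[i] <- coef(fit)["dy_lag"]
  se_y[i]   <- coef(summary(fit))["dy_lag", "Std. Error"]
}
plot(years, beta_y, type = "l", ylab = "Recursive slope on dy")
lines(years, beta_y + 1.96 * se_y, lty = 2)   # approximate 95% bands
lines(years, beta_y - 1.96 * se_y, lty = 2)

Replacing dy_lag with dp_lag in the loop produces the recursive estimates of $\beta_p$.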
The main tool for interpreting the performance of predictive regressions supplied by Goyal and Welch (2003) is a plot of the cumulative sum of squared one-step-ahead forecast errors of the predictive regressions expressed relative to the forecast errors of the best current estimate of the mean of the equity premium. Let the one-step-ahead forecast errors of the dividend yield and dividend-price ratio models be $\hat{u}_{y,t+1|t}$ and $\hat{u}_{p,t+1|t}$, respectively, and let the forecast errors for the best estimate of the unconditional mean be $\hat{u}_{t+1|t}$. Figure 6.7 then plots the two series
$$SSE(y) = \sum_{t=1946}^{2003} \left( \hat{u}^2_{t+1|t} - \hat{u}^2_{y,t+1|t} \right) \qquad \text{[Dividend Yield Model]}$$
$$SSE(p) = \sum_{t=1946}^{2003} \left( \hat{u}^2_{t+1|t} - \hat{u}^2_{p,t+1|t} \right) \qquad \text{[Dividend-Price Ratio Model]}.$$
A positive value for SSE means that the model forecasts have so far been superior to the forecasts based solely on the mean, while a positive slope implies that in the most recent year the forecasting model performed better than the mean.
Figure 6.7 Plots of the cumulative squared relative one-step-ahead forecast errors, SSE(y) (Dividend Yield Model) and SSE(p) (Dividend-Price Ratio Model), obtained from the equity premium predictive regressions. The squared one-step-ahead forecast errors obtained from the models are subtracted from the squared one-step-ahead forecast errors based solely on the best current estimate of the unconditional mean of the equity premium.
Figure 6.7 indicates that the forecasting ability of a predictive regression using the dividend yield is abysmal, as SSE(y) is almost uniformly less than zero. There are two years in the mid-1970s and two years around 2000 when SSE(y) has a positive slope, but these episodes are aberrations. The forecasting performance of the predictive regression using the dividend-price ratio is slightly better than that of the forecasts generated by the mean, with SSE(p) > 0. This is not a conclusion that emerges naturally from Figure 6.6, which indicates that the slope coefficient from this regression is almost always statistically insignificant.
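Given the one-step-ahead forecast errors, the two SSE series are simple cumulative sums. The fragment below is a sketch only: it assumes vectors u_mean, u_dy and u_dp (hypothetical names) holding the one-step-ahead forecast errors over 1946 to 2003 for the prevailing-mean, dividend-yield and dividend-price models, respectively.

# Cumulative SSE comparison in the style of Goyal and Welch (2003).
sse_y <- cumsum(u_mean^2 - u_dy^2)   # SSE(y): positive favours the dy model
sse_p <- cumsum(u_mean^2 - u_dp^2)   # SSE(p): positive favours the dp model
plot(1946:2003, sse_y, type = "l", ylab = "Cumulative SSE difference")
lines(1946:2003, sse_p, lty = 2)
abline(h = 0)                        # the prevailing-mean benchmark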
There are a few important practical lessons to learn from predictive regressions. The first of these is that good in-sample performance does not necessarily imply that the estimated equation will provide good ex ante forecasting ability. As in the case of the performance of pooled forecasts, parameter instability is a problem for good predictive performance. Second, there is a fundamental problem in using variables that are almost nonstationary processes as explanatory variables in predictive regressions which purport to explain stationary variables. Stambaugh (1999), for example, finds that dividend ratios are almost random walks while equity premia are stationary. It may therefore be argued that dividend ratios are good predictors only of their own future behaviour and not of the future path of the equity premium.
6.10 Stochastic Simulation
Forecasting need not necessarily be about point forecasts or best guesses.
Sometimes important information is conveyed by the degree of uncertainty
inherent in the best guess. One important application of this uncertainty
in finance is the concept of Value-at-Risk which was introduced in Chapter
1. Stated formally, Value-at-Risk represents the losses that are expected to
occur with probability $\alpha$ on an asset or portfolio of assets, $P$, after $N$ days. The $N$-day $(1-\alpha)\%$ Value-at-Risk is expressed as $VaR(P, N, 1-\alpha)$.
That Value-at-Risk is related to the uncertainty in the forecast of future values of the portfolio is easily demonstrated. Consider the case of US
monthly data on equity prices. Suppose that the asset in question is one
which pays the value of the index. An investor who holds this asset in June
2004, the last date in the sample, would observe that the value of the portfolio is $1132.76. The value of the portfolio is now forecast out for six months
to the end of December 2004. In assessing the decision to hold the asset or
liquidate the investment, it is not so much the best guess of the future value
that is important as the spread of the distribution of the forecast. The situation is illustrated in Figure 6.8 where the shaded region captures the 90%
confidence interval of the forecast. Clearly, the investor needs to take this
spread of likely outcomes into account, and this is exactly the idea of Value-at-Risk. It is clear, therefore, that forecast uncertainty and Value-at-Risk are intimately related.
Recall from Chapter 1 that Value-at-Risk may be computed by historical simulation, the variance-covariance method, or Monte Carlo simulation.
Using a model to make forecasts of future values of the asset or portfolio
and then assessing the uncertainty in the forecast is the method of Monte
Carlo simulation. In general, simulation refers to any method that randomly
generates repeated trials of a model and seeks to summarise uncertainty in the model forecast in terms of the distribution of these random trials.

Figure 6.8 Stochastic simulation of the equity price index over the period July 2004 to December 2004. The ex ante forecasts are shown by the solid line while the confidence interval encapsulates the uncertainty inherent in the forecast.

The steps to perform a simulation are as follows:
Step 1: Estimate the model
Estimate the following (simple) AR(1) regression model
$$y_t = \phi_0 + \phi_1 y_{t-1} + v_t$$
and store the parameter estimates $\hat{\phi}_0$ and $\hat{\phi}_1$. Note that the AR(1) model is used for illustrative purposes only and any model of $y_t$ could be used.
Step 2: Solve the model
For each available time period $t$ in the sample, use $\hat{\phi}_0$ and $\hat{\phi}_1$ to generate a one-step-ahead forecast
$$\hat{y}_{t+1} = \hat{\phi}_0 + \hat{\phi}_1 y_t$$
and then compute and store the one-step-ahead forecast errors
$$\hat{v}_{t+1|t} = \hat{y}_{t+1} - y_{t+1}.$$
Step 3: Simulate the model
Now forecast the model forward, but instead of a forecast based solely on the best guesses for the unknowns, the uncertainty is explicitly accounted for by including an error term. The error term is obtained either by drawing from some parametric distribution (such as the normal distribution) or by taking a random draw from the estimated one-step-ahead forecast errors
$$\begin{aligned}
\hat{y}^1_{T+1} &= \hat{\phi}_0 + \hat{\phi}_1 y_T + \tilde{v}_{T+1} \\
\hat{y}^1_{T+2} &= \hat{\phi}_0 + \hat{\phi}_1 \hat{y}^1_{T+1} + \tilde{v}_{T+2} \\
&\ \vdots \\
\hat{y}^1_{T+H} &= \hat{\phi}_0 + \hat{\phi}_1 \hat{y}^1_{T+H-1} + \tilde{v}_{T+H}
\end{aligned}$$
where the $\tilde{v}_{T+i}$ are all random drawings from $\hat{v}_{t+1|t}$, the computed one-step-ahead forecast errors from Step 2. The series of forecasts $\{\hat{y}^1_{T+1}, \hat{y}^1_{T+2}, \cdots, \hat{y}^1_{T+H}\}$ represents one repetition of a Monte Carlo simulation of the model.
Step 4: Repeat
Step 3 is now repeated $S$ times to obtain an ensemble of forecasts
$$\begin{array}{ccccc}
\hat{y}^1_{T+1} & \hat{y}^2_{T+1} & \cdots & \hat{y}^{S-1}_{T+1} & \hat{y}^S_{T+1} \\
\hat{y}^1_{T+2} & \hat{y}^2_{T+2} & \cdots & \hat{y}^{S-1}_{T+2} & \hat{y}^S_{T+2} \\
\vdots & \vdots & & \vdots & \vdots \\
\hat{y}^1_{T+H} & \hat{y}^2_{T+H} & \cdots & \hat{y}^{S-1}_{T+H} & \hat{y}^S_{T+H}
\end{array}$$
Step 5: Summarise the uncertainty
Each column of this ensemble of forecasts is representative of a possible outcome of the model, and therefore collectively the ensemble captures the uncertainty of the forecast. In particular, the percentiles of these simulated forecasts for each time period $T+i$ give an accurate picture of the distribution of the forecast at that time. If the disturbances used to generate the forecasts are drawn from the actual one-step-ahead prediction errors and not from a normal distribution, the forecast uncertainty will then reflect any asymmetry or fat tails present in the estimated prediction errors.
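These five steps translate directly into a few lines of code. The sketch below is illustrative rather than a reproduction of the EViews or Stata routines used in the text: it fits the AR(1) model by least squares in R and bootstraps the estimated forecast errors, with y assumed to be a numeric vector holding the observed series.

# Steps 1 to 5: stochastic simulation of an AR(1) model by bootstrapping
# the one-step-ahead forecast errors. y is the observed series (assumed).
set.seed(42)                          # fix the seed so results are reproducible

T <- length(y)
fit  <- lm(y[-1] ~ y[-T])             # Step 1: y_t = phi0 + phi1 y_{t-1} + v_t
phi0 <- coef(fit)[1]
phi1 <- coef(fit)[2]
vhat <- residuals(fit)                # Step 2: one-step-ahead forecast errors

H <- 6                                # forecast horizon
S <- 1000                             # number of replications
paths <- matrix(NA_real_, nrow = H, ncol = S)

for (s in 1:S) {                      # Steps 3 and 4: simulate S forecast paths
  y_prev <- y[T]
  for (h in 1:H) {
    v_tilde <- sample(vhat, 1)        # random draw from the forecast errors
    y_prev  <- phi0 + phi1 * y_prev + v_tilde
    paths[h, s] <- y_prev
  }
}

# Step 5: percentiles of the simulated forecasts at each horizon
apply(paths, 1, quantile, probs = c(0.05, 0.50, 0.95))

Note that the seed of the random number generator is set before the simulation loop, for the reason discussed in the next paragraph.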
One practical item of importance concerns the reproduction of the results
of the simulation. In order to reproduce simulation results it is necessary
to use the same set of random numbers. To ensure this reproducibility it is
important to set the seed of the random number generator before carrying
out the simulations. If this is not done, a different set of random numbers
will be used each time the simulation is undertaken. Of course, as $S \to \infty$ this step becomes unnecessary, but in most practical situations the number of replications is set as a realistic balance between computing considerations and accuracy of results.
Figure 6.9 Simulated distribution of the equity index (panel (a), Simulated Index Distribution) and the profit/loss on the equity index (panel (b), Simulated Loss Distribution) over a six month horizon from July 2004.
Consider now the problem of computing the 99% Value-at-Risk for the asset which pays the value of the United States equity index over a time horizon of six months. On the assumption that equity returns are generated by an AR(1) model, the estimated equation is
$$rp_t = 0.2472 + 0.2853\, rp_{t-1} + \hat{v}_t,$$
which may be used to forecast returns for period $T+1$ while ensuring that uncertainty is explicitly introduced. The forecasting equation is therefore
$$\widehat{rp}_{T+1} = 0.2472 + 0.2853\, rp_T + \tilde{v}_{T+1},$$
where $\tilde{v}_{T+1}$ is a random draw from the one-step-ahead forecast errors computed by means of an in-sample static forecast. The value of the asset at $T+1$ in repetition $s$ is computed as
$$\hat{P}^s_{T+1} = P_T \exp\left( \widehat{rp}^s_{T+1}/100 \right)$$
where the forecast returns are adjusted so that they are no longer expressed as percentages. A recursive procedure is now used to forecast the value of the
asset out to $T+6$, and the whole process is repeated $S$ times. The distribution of the value of the asset at $T+6$ after $S$ repetitions is shown in panel (a) of Figure 6.9, with the initial value at time $T$ of $P_T = \$1132.76$ superimposed. The distribution of simulated losses, obtained by subtracting the initial value of the asset from the terminal value, is shown in panel (b) of Figure 6.9. The first percentile value of this terminal distribution is \$833.54, so that the six-month 99% Value-at-Risk is $\$833.54 - \$1132.76 = -\$299.22$. By convention, the minus sign is dropped when reporting Value-at-Risk.
Of course this approach is equally applicable to simulating Value-at-Risk
for more complex portfolios comprising more than one asset and portfolios
that include derivatives.
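Continuing the simulation sketch given after the five steps above, with y taken to be the percentage returns $rp_t$, the six-month 99% Value-at-Risk can be extracted from the matrix of simulated return paths as follows. The numbers in the text will only be reproduced approximately, since the result depends on the particular random draws.

# Six-month 99% Value-at-Risk from the simulated return paths.
P_T  <- 1132.76                            # value of the index at time T
P_T6 <- P_T * exp(colSums(paths) / 100)    # simulated terminal prices
loss <- P_T6 - P_T                         # simulated profit/loss distribution

VaR99 <- quantile(loss, probs = 0.01)      # first percentile of the losses
abs(VaR99)                                 # the minus sign is dropped by convention

The terminal price in each replication is the initial price multiplied by the exponential of the cumulated simulated returns, which is why colSums is all that is needed to roll the recursion forward six months.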
6.10.1 Exercises
(1) Recursive Ex Ante Forecasts of Real Equity Returns
pv.wf1, pv.dta, pv.xlsx
Consider monthly data on the logarithm of real United States equity
prices, pt , and the logarithm of real dividend payments, dt , from January
1871 to June 2004.
(a) Estimate an AR(1) model of real equity returns, rpt , with the sample
period ending in June 2004. Generate forecasts of rpt from July to
December of 2004.
(b) Estimate an AR(2) model of real equity returns, rpt , with the sample
period ending in June 2004. Generate forecasts of rpt from July to
December of 2004.
(c) Repeat parts (a) and (b) for real dividend returns, rdt .
(d) Estimate a VAR(1) for rpt and rdt with the sample period ending in June 2004. Generate forecasts of real equity returns
from July to December of 2004.
(e) Estimate a VAR(2) for rpt and rdt with the sample period ending
in June 2004. Generate forecasts of real equity returns from July to
December of 2004.
(f) Estimate a VECM(1) for rpt and rdt with the sample period ending
in June 2004 and where the specification is based on Model 3, as
set out in Chapter 5. Generate forecasts of real equity returns from
July to December of 2004.
(g) Repeat part (f) with the lag length in the VECM increasing from 1
to 2.
(h) Repeat part (g) with the VECM specification based on Model 2, as
set out in Chapter 5.
(i) Now estimate a VECM(1) containing real equity returns, rpt , real
dividend returns, rdt , and real earnings growth, ryt , with the sample
period ending in June 2004, where the specification is based on Model
3. Assume a cointegrating rank of 1. Generate forecasts of real equity
returns from July to December of 2004.
(j) Repeat part (i) with the lag length in the VECM increasing from 1 to 2.
(k) Repeat part (i) with the VECM specification based on Model 2.
(2) Recursive Ex Post Forecasts of Real Equity Returns
pv.wf1, pv.dta, pv.xlsx
Consider monthly data on the logarithm of real United States equity
prices, pt , and the logarithm of real dividend payments, dt , from January
1871 to June 2004.
(a) Estimate an AR(1) model of real equity percentage returns (y1,t )
with the sample period ending December 2003, and generate ex
post forecasts from January to June of 2004.
(b) Estimate a VAR(1) model of real equity percentage returns (y1,t )
and real dividend percentage returns (y2,t ) with the sample period
ending December 2003, and generate ex post forecasts from January
to June of 2004.
(c) Estimate a VECM(1) model of real equity percentage returns (y1,t )
and real dividend percentage returns (y2,t ) using Model 3, with the
sample period ending December 2003, and generate ex post forecasts
from January to June of 2004.
(d) For each set of forecasts generated in parts (a) to (c), compute the
MSE and the RMSE. Which is the better forecasting model? Discuss.
(3) Regression Based Forecasts of Real Equity Returns
pv.wf1, pv.dta, pv.xlsx
Consider monthly data on the logarithm of real United States equity
prices, pt , and the logarithm of real dividend payments, dt , from January
1871 to June 2004.
(a) Estimate the following regression of real equity returns (y1,t ) with
real dividend returns (y2,t ) as the explanatory variable, with the
sample period ending in June 2004
$$y_{1,t} = \beta_1 + \beta_2 y_{2,t} + u_t.$$
(b) Estimate an AR(1) model of dividend returns
$$y_{2,t} = \rho_0 + \rho_1 y_{2,t-1} + v_t,$$
and combine this model with the estimated model in part (a) to
generate forecasts of real equity returns from July to December of
2004.
(c) Estimate an AR(2) model of dividend returns
$$y_{2,t} = \rho_0 + \rho_1 y_{2,t-1} + \rho_2 y_{2,t-2} + v_t,$$
and combine this model with the estimated model in part (a) to
generate forecasts of real equity returns from July to December of
2004.
(d) Use the estimated model in part (a) to generate forecasts of real
equity returns from July to December of 2004 assuming that real
dividends increase at 3% per annum.
(e) Use the estimated model in part (a) to generate forecasts of real
equity returns from July to December of 2004 assuming that real
dividends increase at 10% per annum.
(f) Use the estimated model in part (a) to generate forecasts of real
equity returns from July to December of 2004 assuming that real
dividends increase at 3% per annum from July to September and
by 10% from October to December.
(4) Pooling Forecasts
This question is based on the EViews file HEDGE.WF1 which contains
daily data on the percentage returns of seven hedge fund indexes, from
the 1st of April 2003 to the 28th of May 2010, a sample size of T = 1869.
R_CONVERTIBLE : Convertible Arbitrage
R_DISTRESSED  : Distressed Securities
R_EQUITY      : Equity Hedge
R_EVENT       : Event Driven
R_MACRO       : Macro
R_MERGER      : Merger Arbitrage
R_NEUTRAL     : Equity Market Neutral
(a) Estimate an AR(2) model of the returns on the equity market neutral hedge fund (y1,t ) with the sample period ending on the 21st of
May 2010 (Friday)
$$y_{1,t} = \rho_0 + \rho_1 y_{1,t-1} + \rho_2 y_{1,t-2} + v_{1,t}.$$
Generate forecasts of y1,t for the next working week, from the 24th
to the 28th of May, 2010 (save the forecasts in the EViews file and
write out the forecasts in the exam script).
(b) Repeat part (a) for S&P500 returns (y2,t ) (save the forecasts in the
EViews file and write out the forecasts in the exam script).
(c) Estimate a VAR(2) containing the returns on the equity market
neutral hedge fund (y1,t ) and the returns on the S&P500 (y2,t ), with
the sample period ending on the 21st of May 2010 (Friday)
$$y_{1,t} = \alpha_0 + \alpha_1 y_{1,t-1} + \alpha_2 y_{1,t-2} + \alpha_3 y_{2,t-1} + \alpha_4 y_{2,t-2} + v_{1,t}$$
$$y_{2,t} = \beta_0 + \beta_1 y_{1,t-1} + \beta_2 y_{1,t-2} + \beta_3 y_{2,t-1} + \beta_4 y_{2,t-2} + v_{2,t}.$$
Generate forecasts of y1,t for the next working week, from the 24th
to the 28th of May, 2010.
(d) For the AR(2) and VAR(2) forecasts obtained for the returns on
the equity market neutral hedge fund (y1,t ) and the S&P500 (y2,t ) ,
compute the RMSE (a total of four RMSEs). Discuss which model
yields the superior forecasts.
(e) Let $f^{AR}_{1,t}$ be the forecasts from the AR(2) model of the returns on the equity market neutral hedge fund and $f^{VAR}_{1,t}$ be the corresponding VAR(2) forecasts. Restricting the sample period just to the forecast period, 24th to the 28th of May, estimate the following regression which pools the two sets of forecasts
$$y_{1,t} = \phi_0 + \phi_1 f^{AR}_{1,t} + \phi_2 f^{VAR}_{1,t} + \eta_t,$$
where $\eta_t$ is a disturbance term with zero mean and variance $\sigma^2_\eta$. Interpret the parameter estimates and discuss whether pooling the
forecasts has improved the forecasts of the returns on the equity
market neutral hedge fund.
(5) Evaluating Forecast Distributions using the PIT
pv.wf1, pv.dta, pv.xlsx
(a) (Correct Model Specification) Simulate $y_1, y_2, \cdots, y_{1000}$ observations ($T = 1000$) from the true model given by a $N(0,1)$ distribution. Assuming that the specified model is also $N(0,1)$, for each $t$ compute the PIT
$$u_t = \Phi(y_t).$$
Interpret the properties of the histogram of $u_t$.
(b) (Mean Misspecification) Repeat part (a) except that the true model
is N (0.5, 1) and the misspecified model is N (0, 1).
(c) (Variance Misspecification) Repeat part (a) except that the true
model is N (0, 2) and the misspecified model is N (0, 1) .
(d) (Skewness Misspecification) Repeat part (a) except that the true model is the standardised gamma distribution
$$y_t = \frac{g_t - br}{\sqrt{b^2 r}},$$
where $g_t$ is a gamma random variable with parameters $\{b = 0.5, r = 2\}$ and the misspecified model is $N(0,1)$.
(e) (Kurtosis Misspecification) Repeat part (a) except that the true model is the standardised Student t distribution
$$y_t = \frac{s_t}{\sqrt{\nu/(\nu-2)}},$$
where $s_t$ is a Student t random variable with degrees of freedom equal to $\nu = 5$, and the misspecified model is $N(0,1)$.
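As a starting point, part (a) requires only a few lines of R; the remaining parts simply change the distribution from which $y_t$ is simulated while keeping the $N(0,1)$ transform.

# PIT under correct specification: the histogram should be roughly uniform.
set.seed(1)
y <- rnorm(1000)       # simulate T = 1000 observations from the true model N(0,1)
u <- pnorm(y)          # PIT computed under the specified model N(0,1)
hist(u, breaks = 20)
# For part (b), use y <- rnorm(1000, mean = 0.5) instead: the histogram is
# then no longer flat, revealing the misspecification of the mean.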
(6) Now estimate an AR(1) model of real equity returns, $rp_t$, on monthly United States data for the period February 1871 to June 2004,
$$rp_t = \phi_0 + \phi_1 rp_{t-1} + v_t,$$
and compute the standard error of the residuals, $\hat{\sigma}$. Use the PIT to compute the transformed time series
$$u_t = \Phi\left( \frac{\hat{v}_t}{\hat{\sigma}} \right).$$
Interpret the properties of the histogram of $u_t$.
(7) Predicting the Equity Premium
goyal annual.wf1, goyal annual.dta, goyal annual.xlsx
The data are annual observations on the S&P 500 index, dividends, $d12_t$, and the risk-free rate of interest, $rfree_t$, used by Goyal and Welch (2003; 2008) in their research on the determinants of the United States equity premium.
(a) Compute the equity premium, the dividend price ratio and the dividend yields as defined in Goyal and Welch (2003).
(b) Compute basic summary statistics for S&P 500 returns, rmt , the
equity premium, eqpt , the dividend-price ratio dpt and the dividend
yield, dyt .
(c) Plot eqpt, dpt and dyt and compare the results with Figure 6.5.
(d) Estimate the predictive regressions
eqpt = αy + βy dyt−1 + uy,t
eqpt = αp + βp dpt−1 + up,t
for two different sample periods, 1926 to 1990 and 1926 to 2002, and
compare your results with Table 6.3.
(e) Estimate the regressions recursively using data up to 1940 as the
starting sample in order to obtain recursive estimates of βy and
βp together with 95% confidence intervals. Plot and interpret the
results.
(8) Simulating VaR for a Single Asset
pv.wf1, pv.dta, pv.xlsx
The data are monthly observations on the logarithm of real United
States equity returns, rpt , from January 1871 to June 2004, expressed as
percentages. The problem is to simulate 99% Value-at-Risk over a time
horizon of six months for the asset that pays the value of the United
States equity index.
(a) Assume that the equity returns are generated by an AR(1) model
rpt = φ0 + φ1 rpt−1 + vt .
(b) Use the model to provide ex post static forecasts of the entire sample
and thus compute the one-step-ahead prediction errors, vbt+1 .
(c) Generate 1000 forecasts of the terminal equity price PT +6 using
stochastic simulation by implementing the following steps.
(i) Forecast $\widehat{rp}^s_{T+k}$ using the scheme
$$\widehat{rp}^s_{T+k} = \hat{\phi}_0 + \hat{\phi}_1 \widehat{rp}^s_{T+k-1} + \tilde{v}_{T+k},$$
where $\tilde{v}_{T+k}$ is a random draw from the estimated one-step-ahead prediction errors, $\hat{v}_{t+1}$.
(ii) Compute the simulated equity price
$$\hat{P}^s_{T+k} = \hat{P}^s_{T+k-1} \exp\left( \widehat{rp}^s_{T+k}/100 \right),$$
with $\hat{P}^s_{T} = P_T$.
(iii) Repeat (i) and (ii) for k = 1, 2, · · · 6.
(iv) Repeat (i), (ii) and (iii) for s = 1, 2, · · · 1000.
(d) Compute the 99% Value-at-Risk based on the $S$ simulated equity prices at $T+6$, $\hat{P}^s_{T+6}$.