Solutions to the Review Questions at the End of Chapter 8
“Introductory Econometrics for Finance” © Chris Brooks 2008
1. (a). A number of stylised features of financial data have been suggested at
the start of Chapter 8 and in other places throughout the book:
- Frequency: Stock market prices are measured every time there is a trade or
somebody posts a new quote, so often the frequency of the data is very high
- Non-stationarity: Financial data (asset prices) are covariance non-stationary; but if we assume that we are talking about returns from here on,
then we can validly consider them to be stationary.
- Linear Independence: They typically have little evidence of linear
(autoregressive) dependence, especially at low frequency.
- Non-normality: They are not normally distributed – they are fat-tailed.
- Volatility pooling and asymmetries in volatility: The returns exhibit
volatility clustering and leverage effects.
Of these, we can allow for the non-stationarity within the linear (ARIMA)
framework, and we can use whatever frequency of data we like to form the
models, but we cannot hope to capture the other features using a linear model
with Gaussian disturbances.
(b) GARCH models are designed to capture the volatility clustering effects in
the returns (GARCH(1,1) can model the dependence in the squared returns, or
squared residuals), and they can also capture some of the unconditional
leptokurtosis, so that even if the residuals of a linear model of the form given
by the first part of the equation in part (e), the ût's, are leptokurtic, the
standardised residuals from the GARCH estimation are likely to be less
leptokurtic. Standard GARCH models cannot, however, account for leverage
effects.
(c) This is essentially a “which disadvantages of ARCH are overcome by
GARCH” question. The disadvantages of ARCH(q) are:
- How do we decide on q?
- The required value of q might be very large.
- Non-negativity constraints might be violated: when we estimate an ARCH model, we require α_i > 0 ∀ i = 1, 2, ..., q (since a variance cannot be negative).
GARCH(1,1) goes some way towards getting around these problems. The GARCH(1,1) model has only three parameters in the conditional variance equation, compared to q+1 for the ARCH(q) model, so it is more parsimonious. Since there are fewer parameters than in a typical qth-order ARCH model, it is less likely that the estimated values of one or more of these three parameters will be negative than for all q+1 ARCH parameters. Also, the GARCH(1,1) model can usually still capture all of the significant dependence in the squared returns, since it is possible to write the GARCH(1,1) model as an ARCH(∞), so that lags of the squared residuals back into the infinite past help to explain the current value of the conditional variance, h_t.
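To make the ARCH(∞) representation explicit, repeatedly substituting for the lagged conditional variance gives the following standard derivation (shown here for completeness):

```latex
% GARCH(1,1) as ARCH(infinity), valid for |\beta| < 1
\begin{aligned}
\sigma_t^2 &= \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\,\sigma_{t-1}^2 \\
           &= \alpha_0 + \alpha_1 u_{t-1}^2
              + \beta\left(\alpha_0 + \alpha_1 u_{t-2}^2 + \beta\,\sigma_{t-2}^2\right) \\
           &= \alpha_0\,(1 + \beta + \beta^2 + \cdots)
              + \alpha_1 \sum_{i=1}^{\infty} \beta^{\,i-1} u_{t-i}^2
            \;=\; \frac{\alpha_0}{1-\beta}
              + \alpha_1 \sum_{i=1}^{\infty} \beta^{\,i-1} u_{t-i}^2 .
\end{aligned}
```

This is an ARCH(∞) with geometrically declining weights on the lagged squared residuals.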
(d) There are a number of models that you could choose from; the relevant ones discussed in Chapter 8 include EGARCH, GJR and GARCH-M.
The first two of these are designed to capture leverage effects. These are
asymmetries in the response of volatility to positive or negative returns. The
standard GARCH model cannot capture these, since we are squaring the
lagged error term, and we are therefore losing its sign.
The conditional variance equations for the EGARCH and GJR models are, respectively,
log(σ²_t) = ω + β log(σ²_{t−1}) + γ u_{t−1}/√(σ²_{t−1}) + α [ |u_{t−1}|/√(σ²_{t−1}) − √(2/π) ]
and
σ²_t = α_0 + α_1 u²_{t−1} + β σ²_{t−1} + γ u²_{t−1} I_{t−1}
where I_{t−1} = 1 if u_{t−1} < 0
       = 0 otherwise
For a leverage effect, we would expect to see γ < 0 in the EGARCH model and γ > 0 in the GJR model, so that negative shocks increase volatility by more than positive shocks of the same magnitude.
The EGARCH model also has the added benefit that the model is expressed in
terms of the log of ht, so that even if the parameters are negative, the
conditional variance will always be positive. We do not therefore have to
artificially impose non-negativity constraints.
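As an illustration of how such asymmetric models might be estimated in practice, here is a minimal sketch using the Python arch package; the package choice, the simulated data and the lag orders are my own assumptions and are not part of the original answer.

```python
# Sketch only: assumes the Python "arch" package (pip install arch) and uses
# simulated returns in place of the real data referred to in the question.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.standard_normal(1000)   # placeholder for daily percentage returns

# GJR-GARCH(1,1): the o=1 term adds the asymmetry (leverage) component
gjr = arch_model(returns, mean="Constant", vol="GARCH", p=1, o=1, q=1)
gjr_res = gjr.fit(disp="off")

# EGARCH(1,1) with an asymmetry term
egarch = arch_model(returns, mean="Constant", vol="EGARCH", p=1, o=1, q=1)
egarch_res = egarch.fit(disp="off")

print(gjr_res.params)     # a leverage effect would show up as a positive asymmetry coefficient
print(egarch_res.params)  # and as a negative asymmetry coefficient in the EGARCH parameterisation
```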
One form of the GARCH-M model can be written
y_t = μ + (other terms) + δ σ_{t−1} + u_t ,   u_t ~ N(0, h_t)
σ²_t = α_0 + α_1 u²_{t−1} + β σ²_{t−1}
so that the model allows the lagged value of the conditional standard deviation (the volatility) to affect the return. In other words, our best current estimate of the total risk of the asset influences the return, so that we would expect a positive coefficient for δ. Note that some authors use σ_t (i.e. a contemporaneous term).
(e) Since y_t are returns, we would expect their mean value (which will be given by μ) to be positive and small. We are not told the frequency of the data, but suppose that we had a year of daily returns data; then μ would be the average daily percentage return over the year, which might be, say, 0.05 (per cent). We would expect the value of α_0 again to be small, say 0.0001, or something of that order. The unconditional variance of the disturbances would be given by α_0/(1 − (α_1 + α_2)). Typical values for α_1 and α_2 would be around 0.15 and 0.8 respectively. The important thing is that all three alphas must be positive, and the sum of α_1 and α_2 would be expected to be less than, but close to, unity, with α_2 > α_1.
(f) Since the model was estimated using maximum likelihood, it does not seem
natural to test this restriction using the F-test via comparisons of residual
sums of squares (and a t-test cannot be used since it is a test involving more
than one coefficient). Thus we should use one of the approaches to hypothesis
testing based on the principles of maximum likelihood (Wald, Lagrange
Multiplier, Likelihood Ratio). The easiest one to use would be the likelihood
ratio test, which would be computed as follows:
1. Estimate the unrestricted model and obtain the maximised value of
the log-likelihood function.
2. Impose the restriction by rearranging the model, and estimate the
restricted model, again obtaining the value of the likelihood at the
new optimum. Note that this value of the LLF will be likely to be
lower than the unconstrained maximum.
3. Then form the likelihood ratio test statistic given by
LR = −2(L_r − L_u) ~ χ²(m)
where Lr and Lu are the values of the LLF for the restricted and
unrestricted models respectively, and m denotes the number of
restrictions, which in this case is one.
4. If the value of the test statistic is greater than the critical value, reject
the null hypothesis that the restrictions are valid.
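As a numerical illustration of these steps, the sketch below computes the test statistic and compares it with the chi-squared critical value using scipy; the log-likelihood values are invented purely for the example.

```python
# Hypothetical example: Lu and Lr are the maximised log-likelihoods of the
# unrestricted and restricted GARCH models; m is the number of restrictions.
from scipy.stats import chi2

Lu = -1210.4   # unrestricted LLF (made-up value)
Lr = -1214.9   # restricted LLF (made-up value; should not exceed Lu)
m = 1          # one restriction in this case

LR = -2 * (Lr - Lu)              # likelihood ratio test statistic
crit = chi2.ppf(0.95, df=m)      # 5% critical value from chi-squared(m)
p_value = chi2.sf(LR, df=m)

print(f"LR = {LR:.2f}, 5% critical value = {crit:.2f}, p-value = {p_value:.3f}")
# Reject the null that the restriction is valid if LR exceeds the critical value.
```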
(g) In fact, it is possible to produce volatility (conditional variance) forecasts
in exactly the same way as forecasts are generated from an ARMA model by
iterating through the equations with the conditional expectations operator.
We know all of the information up to and including that available at time T. The answer to this question will use the convention from the GARCH modelling literature of denoting the conditional variance by h_t rather than σ²_t. What we want to generate are forecasts of h_{T+1}|Ω_T, h_{T+2}|Ω_T, ..., h_{T+s}|Ω_T, where Ω_T denotes all information available up to and including observation T. Adding one, two and then three to each of the time subscripts, we have the conditional variance equations for times T+1, T+2 and T+3:
h_{T+1} = α_0 + α_1 u²_T + β h_T        (1)
h_{T+2} = α_0 + α_1 u²_{T+1} + β h_{T+1}        (2)
h_{T+3} = α_0 + α_1 u²_{T+2} + β h_{T+2}        (3)
Let h^f_{1,T} be the one-step-ahead forecast for h made at time T. This is easy to calculate since, at time T, we know the values of all of the terms on the RHS. Given h^f_{1,T}, how do we calculate h^f_{2,T}, that is, the 2-step-ahead forecast for h made at time T?
From (2), we can write
h^f_{2,T} = α_0 + α_1 E_T(u²_{T+1}) + β h^f_{1,T}        (4)
where E_T(u²_{T+1}) is the expectation, made at time T, of u²_{T+1}, which is the squared disturbance term. The model assumes that the series u_t has zero mean, so we can now write
Var(u_t) = E[(u_t − E(u_t))²] = E[(u_t)²].
The conditional variance of u_t is h_t, so
h_t = σ²_t = E[(u_t)²]
Turning this argument around, and applying it to the problem that we have,
E_T[(u_{T+1})²] = h_{T+1}
but we do not know h_{T+1}, so we replace it with h^f_{1,T}, so that (4) becomes
h^f_{2,T} = α_0 + α_1 h^f_{1,T} + β h^f_{1,T}
       = α_0 + (α_1 + β) h^f_{1,T}
What about the 3-step-ahead forecast?
By similar arguments,
h^f_{3,T} = E_T(α_0 + α_1 u²_{T+2} + β h_{T+2})
       = α_0 + (α_1 + β) h^f_{2,T}
       = α_0 + (α_1 + β)[α_0 + (α_1 + β) h^f_{1,T}]
And so on. This is the method we could use to forecast the conditional
variance of yt. If yt were, say, daily returns on the FTSE, we could use these
volatility forecasts as an input in the Black-Scholes equation to help determine
the appropriate price of FTSE index options.
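A minimal sketch of this recursion in Python follows; the parameter values and the values of u_T and h_T are illustrative assumptions, not estimates taken from the question.

```python
# Iterative s-step-ahead conditional variance forecasts from a GARCH(1,1),
# following the recursion h^f_{s,T} = alpha0 + (alpha1 + beta) * h^f_{s-1,T}.
def garch_variance_forecasts(alpha0, alpha1, beta, u_T, h_T, steps):
    forecasts = []
    # the one-step-ahead forecast uses the known u_T^2 and h_T
    h = alpha0 + alpha1 * u_T**2 + beta * h_T
    forecasts.append(h)
    # for horizons beyond one step, E_T(u_{T+j}^2) is replaced by its own forecast
    for _ in range(steps - 1):
        h = alpha0 + (alpha1 + beta) * h
        forecasts.append(h)
    return forecasts

# Illustrative values only
print(garch_variance_forecasts(alpha0=0.0001, alpha1=0.15, beta=0.8,
                               u_T=0.01, h_T=0.002, steps=5))
```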
(h) An s-step ahead forecast for the conditional variance could be written
h^f_{s,T} = α_0 Σ_{i=1}^{s−1} (α_1 + β)^{i−1} + (α_1 + β)^{s−1} h^f_{1,T}        (x)
For the new value of β, the persistence of shocks to the conditional variance, given by (α_1 + β), is 0.1251 + 0.98 = 1.1051, which is bigger than 1. It is obvious from equation (x) that any value of (α_1 + β) bigger than one will lead the forecasts to explode. The forecasts will keep on increasing and will tend to infinity as the forecast horizon increases (i.e. as s increases). This is obviously an undesirable property of a forecasting model! This is called "non-stationarity in variance".
For (α_1 + β) < 1, the forecasts will converge on the unconditional variance as the forecast horizon increases. For (α_1 + β) = 1, known as "integrated GARCH" or IGARCH, there is a unit root in the conditional variance, and the forecasts will stay constant as the forecast horizon increases.
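The contrast can be seen directly by evaluating equation (x) for the two persistence values; only the persistence figures come from the question, while α_0 and the one-step forecast below are arbitrary illustrative choices.

```python
# Closed-form s-step-ahead forecast from equation (x):
# h^f_{s,T} = alpha0 * sum_{i=1}^{s-1} (alpha1+beta)^(i-1) + (alpha1+beta)^(s-1) * h^f_{1,T}
def h_forecast(alpha0, persistence, h1, s):
    return (alpha0 * sum(persistence**(i - 1) for i in range(1, s))
            + persistence**(s - 1) * h1)

alpha0, h1 = 0.0001, 0.003          # arbitrary illustrative values
for s in (1, 5, 20, 100):
    explosive = h_forecast(alpha0, 0.1251 + 0.98, h1, s)   # persistence > 1: forecasts explode
    stationary = h_forecast(alpha0, 0.15 + 0.80, h1, s)    # persistence < 1: forecasts converge
    print(s, round(explosive, 4), round(stationary, 4))
```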
2. (a) Maximum likelihood works by finding the most likely values of the
parameters given the actual data. More specifically, a log-likelihood function is
formed, usually based upon a normality assumption for the disturbance terms,
and the values of the parameters that maximise it are sought. Maximum
likelihood estimation can be employed to find parameter values for both linear
and non-linear models.
(b) The three hypothesis testing procedures available within the maximum
likelihood approach are Lagrange multiplier (LM), likelihood ratio (LR) and
Wald tests. The differences between them are described in Figure 8.4, and are
not defined again here. The Lagrange multiplier test involves estimation only
under the null hypothesis, the likelihood ratio test involves estimation under
both the null and the alternative hypothesis, while the Wald test involves
estimation only under the alternative. Given this, it should be evident that the
LM test will in many cases be the simplest to compute since the restrictions
implied by the null hypothesis will usually lead to some terms cancelling out to
give a simplified model relative to the unrestricted model.
(c) OLS will give identical parameter estimates for all of the intercept and
slope parameters, but will give a slightly different parameter estimate for the
variance of the disturbances. These are shown in the Appendix to Chapter 8.
The difference in the OLS and maximum likelihood estimators for the variance
of the disturbances can be seen by comparing the divisors of equations (8A.25)
and (8A.26).
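As a reminder of the standard result being referred to, for a linear regression with T observations and k parameters in the mean equation, the two estimators of the disturbance variance differ only in their divisors:

```latex
\hat{\sigma}^2_{\text{OLS}} = \frac{1}{T-k}\sum_{t=1}^{T}\hat{u}_t^{\,2},
\qquad
\hat{\sigma}^2_{\text{ML}} = \frac{1}{T}\sum_{t=1}^{T}\hat{u}_t^{\,2}.
```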
3. (a) The unconditional variance of a random variable could be thought of,
abusing the terminology somewhat, as the variance without reference to a
time index, or rather the variance of the data taken as a whole, without
conditioning on a particular information set. The conditional variance, on the
other hand, is the variance of a random variable at a particular point in time,
conditional upon a particular information set. The variance of u_t, σ²_t, conditional upon its previous values, may be written σ²_t = Var(u_t | u_{t−1}, u_{t−2}, ...) = E[(u_t − E(u_t))² | u_{t−1}, u_{t−2}, ...], while the unconditional variance would simply be Var(u_t) = σ².
Forecasts from models such as GARCH would be conditional forecasts,
produced for a particular point in time, while historical volatility is an
unconditional measure that would generate unconditional forecasts. For
producing 1-step ahead forecasts, it is likely that a conditional model making
use of recent relevant information will provide more accurate forecasts
(although whether it would in any particular application is an empirical
question). As the forecast horizon increases, however, a GARCH model that is
“stationary in variance” will yield forecasts that converge upon the long-term
average (historical) volatility. By the time we reach 20-steps ahead, the
GARCH forecast is likely to be very close to the unconditional variance so that
there is little gain likely from using GARCH models for forecasts with very
long horizons. Approaches such as EWMA, for which there is no convergence upon an unconditional average as the prediction horizon increases, are likely to produce increasingly inferior forecasts as the horizon lengthens for series that show a long-term mean-reverting pattern in volatility. This arises because, if the volatility estimate is above its historical average at the end of the in-sample estimation period, EWMA would predict that it would continue at this level, while in reality it is likely to fall back towards its long-term mean eventually.
(b) Equation (8.110) is an equation showing that the variance of the
disturbances is not fixed over time, but rather varies systematically according
to a GARCH process. This is therefore an example of heteroscedasticity. Thus,
the consequences if it were present but ignored would be those described in
Chapter 4. In summary, the coefficient estimates would still be consistent and
unbiased but not efficient. There is therefore the possibility that the standard
error estimates calculated using the usual formulae would be incorrect leading
to inappropriate inferences.
(c) There are of course a large number of competing methods for measuring
and forecasting volatility, and it is worth stating at the outset that no research
has suggested that one method is universally superior to all others, so that
each method has its merits and may work well in certain circumstances.
Historical measures of volatility are just simple average measures – for
example, the standard deviation of daily returns over a 3-year period. As such,
they are the simplest to calculate, but suffer from a number of shortcomings.
First, since the observations are unweighted, historical volatility can be slow to
respond to changing market circumstances, and would not take advantage of
short-term persistence in volatility that could lead to more accurate short-term forecasts. Second, if there is an extreme event (e.g. a market crash), this
will lead the measured volatility to be high for a number of observations equal
to the measurement sample length. For example, suppose that volatility is
being measured using a 1-year (250-day) sample of returns, which is being
rolled forward one observation at a time to produce a series of 1-step ahead
volatility forecasts. If a market crash occurs on day t, this will increase the
measured level of volatility by the same amount right until day t+250 (i.e. it
will not decay away) and then it will disappear completely from the sample so
that measured volatility will fall abruptly. Exponential weighting of observations, as in the EWMA model, where the weight attached to each observation in the calculation of volatility declines exponentially as the observations go further back in time, will resolve both of these issues.
However, if forecasts are produced from an EWMA model, these forecasts will
not converge upon the long-term mean volatility estimate as the prediction
horizon increases, and this may be undesirable (see part (a) of this question).
There is also the issue of how the λ parameter is calculated (see equation (8.5)
on page 443, although, of course, it can be estimated using maximum
likelihood). GARCH models overcome this problem with the forecasts as well,
since a GARCH model that is “stationary in variance” will have forecasts that
converge upon the long-term average as the horizon increases (see part (a) of
this question). GARCH models will also overcome the two problems with
unweighted averages described above. However, GARCH models are far more
difficult to estimate than the other two models, and sometimes, when
estimation goes wrong, the resulting parameter estimates can be nonsensical,
leading to nonsensical forecasts as well. Thus it is important to apply a “reality
check” to estimated GARCH models to ensure that the coefficient estimates
are intuitively plausible. Finally, implied volatility estimates are those derived
from the prices of traded options. The “market-implied” volatility forecasts are
obtained by “backing out” the volatility from the price of an option using an
option pricing formula together with an iterative search procedure. Financial
market practitioners would probably argue that implied forecasts of the future
volatility of the underlying asset are likely to be more accurate than those
estimated from statistical models because the people who work in financial
markets know more about what is likely to happen to those instruments in the
future than econometricians do. Also, an “inaccurate” volatility forecast
implied from an option price may imply an inaccurate option price and
therefore the possibility of arbitrage opportunities. However, the empirical
evidence on the accuracy of implied versus statistical forecasting models is
mixed, and some research suggests that implied volatility systematically overestimates the true volatility of the underlying asset returns. This may arise
from the use of an incorrect option pricing formula to obtain the implied
volatility – for example, the Black-Scholes model assumes that the volatility of
the underlying asset is fixed (non-stochastic), and also that the returns to the
underlying asset are normally distributed. Both of these assumptions are at
best tenuous. A further reason for the apparent failure of the implied model
may be a manifestation of the “peso problem”. This occurs when market
practitioners include in the information set that they use to price options the
possibility of a very extreme return that has a low probability of occurrence,
but has important ramifications for the price of the option due to its sheer size.
If this event does not occur in the sample period over which the implied and
actual volatilities are compared, the implied model will appear inaccurate. Yet
this does not mean that the practitioners’ forecasts were wrong, but rather
simply that the low-probability, high-impact event did not happen during that
sample period. It is also worth stating that only one implied volatility can be
calculated from each option price for the “average” volatility of the underlying
asset over the remaining lifetime of the option.
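To make the contrast between the unweighted historical measure and the exponentially weighted measure discussed above concrete, here is a small sketch; the simulated returns and the RiskMetrics-style λ = 0.94 are my own assumptions rather than values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.standard_normal(1000) * 0.01   # placeholder daily returns

# Unweighted historical variance over a rolling 250-day (1-year) window
window = 250
hist_var = np.array([returns[t - window:t].var()
                     for t in range(window, len(returns))])

# EWMA variance: sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2
lam = 0.94                                   # assumed smoothing parameter
ewma_var = np.empty(len(returns))
ewma_var[0] = returns[:window].var()         # crude initialisation
for t in range(1, len(returns)):
    ewma_var[t] = lam * ewma_var[t - 1] + (1 - lam) * returns[t - 1] ** 2

print(hist_var[-1], ewma_var[-1])            # latest historical vs EWMA variance estimates
```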
4. (a) A possible diagonal VECH model would be
y_{1t} = μ_1 + u_{1t}
y_{2t} = μ_2 + u_{2t} ,   u_t = (u_{1t}, u_{2t})′ ~ N(0, H_t) ,   H_t = [ h_{11t}  h_{12t} ; h_{12t}  h_{22t} ]
h_{11t} = ω_{11} + α_{11} u²_{1,t−1} + β_{11} h_{11,t−1}
h_{12t} = ω_{12} + α_{12} u_{1,t−1} u_{2,t−1} + β_{12} h_{12,t−1}
h_{22t} = ω_{22} + α_{22} u²_{2,t−1} + β_{22} h_{22,t−1}
The coefficients expected would be very small for the conditional mean coefficients, μ_1 and μ_2, since they are average daily returns, and they could be positive or negative, although a positive average return is probably more likely. Similarly, the intercept terms in the conditional variance equations would also be expected to be small and positive, since this is daily data. The coefficients on the lagged squared error and the lagged conditional variance in the conditional variance equations must lie between zero and one, and more specifically, the following might be expected: α_{11} and α_{22} ≈ 0.1–0.3; β_{11} and β_{22} ≈ 0.5–0.8, with α_{11} + β_{11} < 1 and α_{22} + β_{22} < 1. The coefficient values for the conditional covariance equation are more difficult to predict, although α_{12} + β_{12} < 1 is still required for the model to be useful for forecasting covariances. The parameters in this equation could be negative, although given that the returns for two stock markets are likely to be positively correlated, the parameters would probably be positive; the model would still be a valid one if they were not.
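A sketch of how the one-step-ahead variance and covariance forecasts would be generated from such a model, once the parameters had been estimated, is given below; all numerical values are hypothetical.

```python
# One-step-ahead forecasts of h11, h12 and h22 from a diagonal VECH model,
# using the recursions given above. All parameter and data values are hypothetical.
def diagonal_vech_forecast(params, u1, u2, h11, h12, h22):
    w11, a11, b11, w12, a12, b12, w22, a22, b22 = params
    h11_f = w11 + a11 * u1**2 + b11 * h11          # conditional variance, market 1
    h12_f = w12 + a12 * u1 * u2 + b12 * h12        # conditional covariance
    h22_f = w22 + a22 * u2**2 + b22 * h22          # conditional variance, market 2
    return h11_f, h12_f, h22_f

params = (0.0001, 0.2, 0.7,    # omega11, alpha11, beta11
          0.00005, 0.15, 0.7,  # omega12, alpha12, beta12
          0.0001, 0.25, 0.65)  # omega22, alpha22, beta22
print(diagonal_vech_forecast(params, u1=0.01, u2=-0.005,
                             h11=0.002, h12=0.0008, h22=0.0015))
```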
(b) One of two procedures could be used. Either the daily returns data would
be transformed into weekly returns data by adding up the returns over all of
the trading days in each week, or the model would be estimated using the daily
data. Daily forecasts would then be produced up to 10 days (2 trading weeks)
ahead.
In both cases, the models would be estimated, and forecasts made of the conditional variances and the conditional covariance. If daily data were used to estimate the model, the conditional covariance forecasts for the 5 trading days in each week would be added together to form a covariance forecast for that week, and similarly for the variances. If the returns had been
aggregated to the weekly frequency, the forecasts used would simply be 1-step
ahead.
Finally, the conditional covariance forecast for the week would be divided by
the product of the square roots of the conditional variance forecasts to obtain
a correlation forecast.
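For example, if the daily model produced 5 daily forecasts of each conditional variance and of the conditional covariance for the week of interest, the weekly correlation forecast could be formed as follows; the forecast values used here are hypothetical.

```python
import numpy as np

# Hypothetical daily forecasts for the 5 trading days of the target week
h11_daily = np.array([0.0020, 0.0021, 0.0021, 0.0022, 0.0022])  # variance, market 1
h22_daily = np.array([0.0015, 0.0015, 0.0016, 0.0016, 0.0017])  # variance, market 2
h12_daily = np.array([0.0008, 0.0008, 0.0009, 0.0009, 0.0009])  # covariance

# Aggregate to the weekly frequency by summing, then form the correlation forecast
weekly_cov = h12_daily.sum()
weekly_corr = weekly_cov / np.sqrt(h11_daily.sum() * h22_daily.sum())
print(round(weekly_corr, 3))
```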
(c) There are various approaches available, including computing simple
historical correlations, exponentially weighted measures, and implied
correlations derived from the prices of traded options.
(d) The simple historical approach is obviously the simplest to calculate, but
has two main drawbacks. First, it does not weight information: so any
observations within the sample will be given equal weight, while those outside
the sample will automatically be given a weight of zero. Second, any extreme
observations in the sample will have an equal effect until they abruptly drop
out of the measurement period. For example, suppose that one year of daily
data is used to estimate volatility. If the sample is rolled through one day at a
time, an observation corresponding to a market crash will appear in the next
250 samples, with equal effect, but will then disappear altogether.
Exponentially weighted moving average models of covariance and variance
(which can be used to construct correlation measures) more plausibly give
additional weight to more recent observations, with the weight given to each
observation declining exponentially as they go further back into the past.
These models have the undesirable property that the forecasts for different
numbers of steps ahead will be the same. Hence the forecasts will not tend to
the unconditional mean as those from a suitable GARCH model would.
Finally, implied correlations may at first blush appear to be the best method
for calculating correlation forecasts accurately, for they rely on information
obtained from the market itself. After all, who should know better about
future correlations in the markets than the people who work in those markets?
However, market-based measures of volatility and correlation are sometimes
surprisingly inaccurate, and are also sometimes difficult to obtain. Most
fundamentally, correlation forecasts will only be available where there is an
option traded whose payoffs depend on the prices of two underlying assets.
For all other situations, a market-based correlation forecast will simply not be
available.
Finally, multivariate GARCH models will give more weight to recent observations in computing the forecasts, but may be difficult and computationally intensive to estimate.
5. (a) A news impact curve shows the effect of shocks of different magnitudes on
the next period’s volatility. These curves can be used to examine visually
whether there are any asymmetry effects in volatility for a particular set of
data. For the data given in this question, the way I would approach it is to put
values of the lagged error into column A ranging from –1 to +1 in increments
of 0.01. Then simply enter the formulae for the GARCH and EGARCH models
into columns 2 and 3 that refer to those values of the lagged error put in
column A. The graph obtained would be
[Figure: news impact curves for the GARCH and EGARCH models. Vertical axis: value of the conditional variance (approximately 0 to 0.2); horizontal axis: value of the lagged shock, from -1 to +1.]
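The same calculation can be done in a few lines of Python rather than a spreadsheet; the parameter values below are placeholders for the estimates given in the question, which are not reproduced here.

```python
import numpy as np

# Placeholder parameter values (the question supplies the estimated ones)
a0, a1, b = 0.02, 0.1, 0.8                        # GARCH(1,1)
w, beta_e, gamma, alpha = 0.0, 0.9, -0.1, 0.2     # EGARCH; gamma < 0 implies asymmetry

sigma2_bar = a0 / (1 - a1 - b)        # hold the lagged variance at its long-run level
shocks = np.arange(-1, 1.01, 0.01)    # lagged shocks from -1 to +1 in steps of 0.01

nic_garch = a0 + a1 * shocks**2 + b * sigma2_bar
nic_egarch = np.exp(w + beta_e * np.log(sigma2_bar)
                    + gamma * shocks / np.sqrt(sigma2_bar)
                    + alpha * (np.abs(shocks) / np.sqrt(sigma2_bar) - np.sqrt(2 / np.pi)))

# The GARCH curve is symmetric in the shock; the EGARCH curve is higher for negative shocks
print(nic_garch[0], nic_garch[-1])    # equal for shocks of -1 and +1
print(nic_egarch[0], nic_egarch[-1])  # unequal whenever gamma is non-zero
```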
This graph is a bit of an odd one, in the sense that the conditional variance is
always lower for the EGARCH model. This may suggest estimation error in
one of the models. There is some evidence for asymmetries in the case of the
EGARCH model since the value of the conditional variance is 0.1 for a shock of
1 and 0.12 for a shock of –1.
(b) This is a tricky one. The leverage effect is used to rationalise a finding of
asymmetries in equity returns, but such an argument cannot be applied to
foreign exchange returns, since the concept of a Debt/Equity ratio has no
meaning in that context.
On the other hand, there is equally no reason to suppose that there are no
asymmetries in the case of FX data. The data used here were daily USD-GBP returns for 1974-1994. It might be the case, for example, that news relating to one country has a different impact on volatility from equally good or bad news relating to another. To offer one illustration, it might be the case that the bad
news for the currently weak euro has a bigger impact on volatility than news
about the currently strong dollar. This would lead to asymmetries in the news
impact curve. Finally, it is also worth noting that the asymmetry term in the EGARCH model is not statistically significant in this case.