Solutions to the Review Questions at the End of Chapter 8

1. (a) A number of stylised features of financial data were suggested at the start of Chapter 8 and in other places throughout the book:

- Frequency: Stock market prices are measured every time there is a trade or somebody posts a new quote, so the frequency of the data is often very high.
- Non-stationarity: Financial data (asset prices) are covariance non-stationary; but if we assume that we are talking about returns from here on, then we can validly consider them to be stationary.
- Linear independence: They typically show little evidence of linear (autoregressive) dependence, especially at low frequency.
- Non-normality: They are not normally distributed – they are fat-tailed.
- Volatility pooling and asymmetries in volatility: The returns exhibit volatility clustering and leverage effects.

Of these, we can allow for the non-stationarity within the linear (ARIMA) framework, and we can use whatever frequency of data we like to form the models, but we cannot hope to capture the other features using a linear model with Gaussian disturbances.

(b) GARCH models are designed to capture the volatility clustering effects in the returns (GARCH(1,1) can model the dependence in the squared returns, or squared residuals), and they can also capture some of the unconditional leptokurtosis, so that even if the residuals of a linear model of the form given by the first part of the equation in part (e), the û_t's, are leptokurtic, the standardised residuals from the GARCH estimation are likely to be less leptokurtic. Standard GARCH models cannot, however, account for leverage effects.

(c) This is essentially a "which disadvantages of ARCH are overcome by GARCH" question. The disadvantages of ARCH(q) are:

- How do we decide on q?
- The required value of q might be very large.
- Non-negativity constraints might be violated: when we estimate an ARCH model, we require α_i ≥ 0 for all i = 1, 2, ..., q (since a variance cannot be negative).

GARCH(1,1) goes some way towards getting around these. The GARCH(1,1) model has only three parameters in the conditional variance equation, compared with q+1 for the ARCH(q) model, so it is more parsimonious. Since there are fewer parameters than in a typical qth-order ARCH model, it is less likely that the estimated values of one or more of these three parameters would be negative than for all q+1 parameters. Also, the GARCH(1,1) model can usually still capture all of the significant dependence in the squared returns, since it is possible to write the GARCH(1,1) model as an ARCH(∞), so that lags of the squared residuals back into the infinite past help to explain the current value of the conditional variance, h_t.

(d) There are a number of models that you could choose from, and the relevant ones discussed in Chapter 8 include EGARCH, GJR and GARCH-M. The first two of these are designed to capture leverage effects, which are asymmetries in the response of volatility to positive or negative returns. The standard GARCH model cannot capture these, since we are squaring the lagged error term, and we therefore lose its sign. The conditional variance equations for the EGARCH and GJR models are, respectively,

log(σ_t²) = ω + β log(σ_{t−1}²) + γ u_{t−1}/√(σ_{t−1}²) + α [ |u_{t−1}|/√(σ_{t−1}²) − √(2/π) ]

and

σ_t² = α_0 + α_1 u_{t−1}² + β σ_{t−1}² + γ u_{t−1}² I_{t−1},   where I_{t−1} = 1 if u_{t−1} < 0, and I_{t−1} = 0 otherwise.

For a leverage effect, we would expect to see γ > 0 in the GJR model and γ < 0 in the EGARCH model as parameterised above. The EGARCH model also has the added benefit that it is expressed in terms of the log of the conditional variance, so that even if the parameters are negative, the conditional variance will always be positive; we do not therefore have to artificially impose non-negativity constraints.
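As an illustration of the asymmetry mechanism (not part of the original answer), the following sketch evaluates the GJR conditional variance for a positive and a negative shock of the same size, showing that the indicator term adds the extra γ u_{t−1}² component only when the shock is negative. The parameter values and the function name gjr_variance are purely hypothetical.

```python
# Illustrative sketch: GJR-GARCH(1,1) response to positive vs negative shocks.
# Parameter values (alpha0, alpha1, beta, gamma) are hypothetical, not estimates from the text.

def gjr_variance(u_lag, h_lag, alpha0=0.0001, alpha1=0.1, beta=0.8, gamma=0.1):
    """One-step GJR conditional variance: extra gamma * u^2 term applies to negative shocks only."""
    indicator = 1.0 if u_lag < 0 else 0.0
    return alpha0 + alpha1 * u_lag**2 + beta * h_lag + gamma * u_lag**2 * indicator

h_prev = 0.0005  # hypothetical lagged conditional variance
print(gjr_variance(+0.02, h_prev))  # response to a positive 2% shock
print(gjr_variance(-0.02, h_prev))  # larger response to a negative 2% shock (leverage effect)
```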
One form of the GARCH-M model can be written

y_t = μ + other terms + δ σ_{t−1}² + u_t,   u_t ~ N(0, σ_t²)
σ_t² = α_0 + α_1 u_{t−1}² + β σ_{t−1}²

so that the model allows the lagged value of the conditional variance to affect the return. In other words, our best current estimate of the total risk of the asset influences the return, so that we would expect a positive coefficient δ. Note that some authors use a contemporaneous term, σ_t², instead.

(e) Since the y_t are returns, we would expect their mean value (which will be given by μ) to be positive and small. We are not told the frequency of the data, but suppose that we had a year of daily returns data; then μ would be the average daily percentage return over the year, which might be, say, 0.05 (per cent). We would expect the value of α_0 again to be small, say 0.0001, or something of that order. The unconditional variance of the disturbances would be given by α_0/(1 − (α_1 + α_2)). Typical values for α_1 and α_2 would be around 0.15 and 0.8 respectively. The important thing is that all three alphas must be positive, and the sum of α_1 and α_2 would be expected to be less than, but close to, unity, with α_2 > α_1.

(f) Since the model was estimated using maximum likelihood, it does not seem natural to test this restriction using the F-test via comparisons of residual sums of squares (and a t-test cannot be used since the test involves more than one coefficient). Thus we should use one of the approaches to hypothesis testing based on the principles of maximum likelihood (Wald, Lagrange multiplier, likelihood ratio). The easiest one to use would be the likelihood ratio test, which would be computed as follows:

1. Estimate the unrestricted model and obtain the maximised value of the log-likelihood function.
2. Impose the restriction by rearranging the model, and estimate the restricted model, again obtaining the value of the likelihood at the new optimum. Note that this value of the LLF is likely to be lower than the unconstrained maximum.
3. Form the likelihood ratio test statistic, given by

LR = −2(L_r − L_u) ~ χ²(m)

where L_r and L_u are the values of the LLF for the restricted and unrestricted models respectively, and m denotes the number of restrictions, which in this case is one.
4. If the value of the test statistic is greater than the critical value, reject the null hypothesis that the restrictions are valid.
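A minimal sketch of steps 3 and 4, assuming the two maximised log-likelihood values have already been obtained from the restricted and unrestricted estimations; the numerical values below are purely illustrative.

```python
# Likelihood ratio test sketch; L_u and L_r are hypothetical maximised log-likelihood values.
from scipy.stats import chi2

L_u = -1278.4   # unrestricted log-likelihood (illustrative value)
L_r = -1281.1   # restricted log-likelihood (illustrative value)
m = 1           # number of restrictions

LR = -2 * (L_r - L_u)                  # test statistic, asymptotically chi-square(m) under H0
critical_value = chi2.ppf(0.95, df=m)  # 5% critical value
p_value = chi2.sf(LR, df=m)

print(f"LR = {LR:.3f}, 5% critical value = {critical_value:.3f}, p-value = {p_value:.3f}")
# Reject the null that the restriction is valid if LR exceeds the critical value.
```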
(g) In fact, it is possible to produce volatility (conditional variance) forecasts in exactly the same way as forecasts are generated from an ARMA model, by iterating through the equations with the conditional expectations operator, conditioning on all information available up to and including time T. The answer to this question will use the convention from the GARCH modelling literature of denoting the conditional variance by h_t rather than σ_t². What we want to generate are forecasts of h_{T+1|Ω_T}, h_{T+2|Ω_T}, ..., h_{T+s|Ω_T}, where Ω_T denotes all information available up to and including observation T.

Adding 1, then 2, then 3 to each of the time subscripts, we have the conditional variance equations for times T+1, T+2 and T+3:

h_{T+1} = α_0 + α_1 u_T² + β h_T   (1)
h_{T+2} = α_0 + α_1 u_{T+1}² + β h_{T+1}   (2)
h_{T+3} = α_0 + α_1 u_{T+2}² + β h_{T+2}   (3)

Let h^f_{1,T} be the one-step-ahead forecast for h made at time T. This is easy to calculate since, at time T, we know the values of all the terms on the RHS of (1). Given h^f_{1,T}, how do we calculate h^f_{2,T}, that is, the two-step-ahead forecast for h made at time T? From (2), we can write

h^f_{2,T} = α_0 + α_1 E_T(u_{T+1}²) + β h^f_{1,T}   (4)

where E_T(u_{T+1}²) is the expectation, made at time T, of u_{T+1}², which is the squared disturbance term. The model assumes that the series u_t has zero mean, so we can write Var(u_t) = E[(u_t − E(u_t))²] = E[u_t²]. The conditional variance of u_t is h_t, so h_t = E_{t−1}[u_t²]. Turning this argument around, and applying it to the problem that we have, E_T[(u_{T+1})²] = h_{T+1}, but we do not know h_{T+1}, so we replace it with h^f_{1,T}; then (4) becomes

h^f_{2,T} = α_0 + α_1 h^f_{1,T} + β h^f_{1,T} = α_0 + (α_1 + β) h^f_{1,T}.

What about the three-step-ahead forecast? By similar arguments,

h^f_{3,T} = E_T(α_0 + α_1 u_{T+2}² + β h_{T+2}) = α_0 + (α_1 + β) h^f_{2,T} = α_0 + (α_1 + β)[α_0 + (α_1 + β) h^f_{1,T}],

and so on. This is the method we could use to forecast the conditional variance of y_t. If y_t were, say, daily returns on the FTSE, we could use these volatility forecasts as an input to the Black–Scholes formula to help determine the appropriate price of FTSE index options.

(h) An s-step-ahead forecast for the conditional variance could be written

h^f_{s,T} = α_0 Σ_{i=1}^{s−1} (α_1 + β)^{i−1} + (α_1 + β)^{s−1} h^f_{1,T}   (x)

For the new value of β, the persistence of shocks to the conditional variance, given by (α_1 + β), is 0.1251 + 0.98 = 1.1051, which is bigger than one. It is obvious from equation (x) that any value of (α_1 + β) bigger than one will lead the forecasts to explode: the forecasts will keep on increasing and will tend to infinity as the forecast horizon increases (i.e. as s increases). This is obviously an undesirable property of a forecasting model, and it is known as "non-stationarity in variance". For (α_1 + β) < 1, the forecasts will converge on the unconditional variance as the forecast horizon increases. For (α_1 + β) = 1, known as "integrated GARCH" or IGARCH, there is a unit root in the conditional variance, and the forecasts will stay constant as the forecast horizon increases.
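To illustrate the recursion and the explosive case discussed in part (h), the sketch below iterates h^f_{s,T} = α_0 + (α_1 + β) h^f_{s−1,T} for a stationary and a non-stationary parameterisation. The starting forecast, α_0 and the stationary β are hypothetical; α_1 = 0.1251 and β = 0.98 in the second call are the values from part (h).

```python
# GARCH(1,1) multi-step variance forecasts: h_s = alpha0 + (alpha1 + beta) * h_{s-1}.

def garch_variance_forecasts(alpha0, alpha1, beta, h1, steps):
    """Return the list [h_1, h_2, ..., h_steps] of variance forecasts made at time T."""
    forecasts = [h1]
    for _ in range(steps - 1):
        forecasts.append(alpha0 + (alpha1 + beta) * forecasts[-1])
    return forecasts

h1 = 0.0004  # hypothetical one-step-ahead forecast

# Stationary case: forecasts converge towards alpha0 / (1 - alpha1 - beta)
print(garch_variance_forecasts(0.0001, 0.1251, 0.80, h1, steps=10))

# Non-stationary case from part (h): alpha1 + beta = 1.1051 > 1, so the forecasts explode
print(garch_variance_forecasts(0.0001, 0.1251, 0.98, h1, steps=10))
```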
2. (a) Maximum likelihood works by finding the most likely values of the parameters given the actual data. More specifically, a log-likelihood function is formed, usually based upon a normality assumption for the disturbance terms, and the values of the parameters that maximise it are sought. Maximum likelihood estimation can be employed to find parameter values for both linear and non-linear models.

(b) The three hypothesis testing procedures available within the maximum likelihood approach are Lagrange multiplier (LM), likelihood ratio (LR) and Wald tests. The differences between them are described in Figure 8.4, and are not repeated here. The Lagrange multiplier test involves estimation only under the null hypothesis, the likelihood ratio test involves estimation under both the null and the alternative hypotheses, while the Wald test involves estimation only under the alternative. Given this, it should be evident that the LM test will in many cases be the simplest to compute, since the restrictions implied by the null hypothesis will usually lead to some terms cancelling out to give a simplified model relative to the unrestricted model.

(c) OLS will give identical parameter estimates for all of the intercept and slope parameters, but will give a slightly different estimate of the variance of the disturbances. These results are shown in the appendix to Chapter 8. The difference between the OLS and maximum likelihood estimators of the variance of the disturbances can be seen by comparing the divisors of equations (8A.25) and (8A.26).
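The divisor difference referred to in part (c) is simply that the maximum likelihood estimator of the disturbance variance divides the residual sum of squares by T, whereas the unbiased OLS estimator divides by T − k. A small sketch with simulated data (all values hypothetical) makes the point:

```python
# OLS vs ML estimators of the disturbance variance computed from the same fitted residuals.
import numpy as np

rng = np.random.default_rng(0)
T, k = 200, 3                                   # sample size and number of estimated parameters
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([0.1, 0.5, -0.3]) + rng.normal(scale=0.2, size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # identical point estimates under OLS and ML
resid = y - X @ beta_hat
rss = resid @ resid

sigma2_ols = rss / (T - k)   # unbiased OLS estimator of the disturbance variance
sigma2_ml = rss / T          # maximum likelihood estimator (biased downwards in small samples)
print(sigma2_ols, sigma2_ml)
```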
3. (a) The unconditional variance of a random variable could be thought of, abusing the terminology somewhat, as the variance without reference to a time index, or rather the variance of the data taken as a whole, without conditioning on a particular information set. The conditional variance, on the other hand, is the variance of a random variable at a particular point in time, conditional upon a particular information set. The variance of u_t, σ_t², conditional upon its previous values, may be written

σ_t² = Var(u_t | u_{t−1}, u_{t−2}, ...) = E[(u_t − E(u_t))² | u_{t−1}, u_{t−2}, ...],

while the unconditional variance would simply be Var(u_t) = σ².

Forecasts from models such as GARCH are conditional forecasts, produced for a particular point in time, while historical volatility is an unconditional measure that would generate unconditional forecasts. For producing one-step-ahead forecasts, it is likely that a conditional model making use of recent relevant information will provide more accurate forecasts (although whether it would in any particular application is an empirical question). As the forecast horizon increases, however, a GARCH model that is "stationary in variance" will yield forecasts that converge upon the long-term average (historical) volatility. By the time we reach 20 steps ahead, the GARCH forecast is likely to be very close to the unconditional variance, so that there is little to be gained from using GARCH models for forecasts with very long horizons. Approaches such as EWMA, for which there is no convergence on an unconditional average as the prediction horizon increases, are likely to produce inferior forecasts as the horizon increases for series that show a long-term mean-reverting pattern in volatility. This arises because, if the volatility estimate is above its historical average at the end of the in-sample estimation period, EWMA would predict that it would continue at this level, while in reality it is likely to fall back towards its long-term mean eventually.

(b) Equation (8.110) shows that the variance of the disturbances is not fixed over time, but rather varies systematically according to a GARCH process. This is therefore an example of heteroscedasticity, so the consequences if it were present but ignored would be those described in Chapter 4. In summary, the coefficient estimates would still be consistent and unbiased but not efficient. There is therefore the possibility that the standard error estimates calculated using the usual formulae would be incorrect, leading to inappropriate inferences.

(c) There are, of course, a large number of competing methods for measuring and forecasting volatility, and it is worth stating at the outset that no research has suggested that one method is universally superior to all others, so each method has its merits and may work well in certain circumstances.

Historical measures of volatility are just simple average measures – for example, the standard deviation of daily returns over a three-year period. As such, they are the simplest to calculate, but they suffer from a number of shortcomings. First, since the observations are unweighted, historical volatility can be slow to respond to changing market circumstances, and would not take advantage of short-term persistence in volatility that could lead to more accurate short-term forecasts. Second, if there is an extreme event (e.g. a market crash), this will lead the measured volatility to be high for a number of observations equal to the measurement sample length. For example, suppose that volatility is being measured using a one-year (250-day) sample of returns, which is rolled forward one observation at a time to produce a series of one-step-ahead volatility forecasts. If a market crash occurs on day t, this will increase the measured level of volatility by the same amount right up until day t+250 (i.e. it will not decay away), and then it will disappear completely from the sample so that measured volatility will fall abruptly.

Exponential weighting of observations, as in the EWMA model, where the weight attached to each observation in the calculation of volatility declines exponentially as the observations go further back in time, will resolve both of these issues. However, if forecasts are produced from an EWMA model, these forecasts will not converge upon the long-term mean volatility estimate as the prediction horizon increases, and this may be undesirable (see part (a) of this question). There is also the issue of how the smoothing parameter λ is calculated (see equation (8.5) on page 443), although, of course, it can be estimated using maximum likelihood.

GARCH models overcome this problem with the forecasts as well, since a GARCH model that is "stationary in variance" will have forecasts that converge upon the long-term average as the horizon increases (see part (a) of this question). GARCH models will also overcome the two problems with unweighted averages described above. However, GARCH models are far more difficult to estimate than the other two approaches, and sometimes, when estimation goes wrong, the resulting parameter estimates can be nonsensical, leading to nonsensical forecasts as well. Thus it is important to apply a "reality check" to estimated GARCH models to ensure that the coefficient estimates are intuitively plausible.

Finally, implied volatility estimates are those derived from the prices of traded options. The "market-implied" volatility forecasts are obtained by "backing out" the volatility from the price of an option using an option pricing formula together with an iterative search procedure. Financial market practitioners would probably argue that implied forecasts of the future volatility of the underlying asset are likely to be more accurate than those estimated from statistical models, because the people who work in financial markets know more about what is likely to happen to those instruments in the future than econometricians do. Also, an "inaccurate" volatility forecast implied from an option price may imply an inaccurate option price, and therefore the possibility of arbitrage opportunities. However, the empirical evidence on the accuracy of implied versus statistical forecasting models is mixed, and some research suggests that implied volatility systematically overestimates the true volatility of the underlying asset returns. This may arise from the use of an incorrect option pricing formula to obtain the implied volatility – for example, the Black–Scholes model assumes that the volatility of the underlying asset is fixed (non-stochastic), and also that the returns to the underlying asset are normally distributed; both of these assumptions are at best tenuous. A further reason for the apparent failure of the implied model may be a manifestation of the "peso problem". This occurs when market practitioners include, in the information set that they use to price options, the possibility of a very extreme return that has a low probability of occurrence but important ramifications for the price of the option due to its sheer size. If this event does not occur in the sample period over which the implied and actual volatilities are compared, the implied model will appear inaccurate. Yet this does not mean that the practitioners' forecasts were wrong, but rather simply that the low-probability, high-impact event did not happen during that sample period. It is also worth stating that only one implied volatility can be calculated from each option price, for the "average" volatility of the underlying asset over the remaining lifetime of the option.
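To make the "backing out" procedure concrete, the sketch below inverts the Black–Scholes call price for volatility using a simple bisection search; the option inputs are hypothetical, and bisection is just one of several possible iterative search schemes.

```python
# Backing out implied volatility from a European call price (Black-Scholes, bisection search).
from math import log, sqrt, exp
from scipy.stats import norm

def bs_call_price(S, K, r, T, sigma):
    """Black-Scholes price of a European call option."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, r, T, lo=1e-4, hi=3.0, tol=1e-8):
    """Bisection search for the volatility that reproduces the observed option price."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(S, K, r, T, mid) > price:
            hi = mid   # model price too high: volatility guess too large
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical market data: spot 100, strike 105, 3-month maturity, 5% rate, observed price 2.50
print(implied_vol(price=2.50, S=100.0, K=105.0, r=0.05, T=0.25))
```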
4. (a) A possible diagonal VECH model would be

y_{1t} = μ_1 + u_{1t}
y_{2t} = μ_2 + u_{2t},   u_t = (u_{1t}, u_{2t})′ ~ N(0, H_t),   H_t = [ h_{11t}  h_{12t} ; h_{12t}  h_{22t} ]

h_{11t} = ω_{11} + α_{11} u_{1,t−1}² + β_{11} h_{11,t−1}
h_{12t} = ω_{12} + α_{12} u_{1,t−1} u_{2,t−1} + β_{12} h_{12,t−1}
h_{22t} = ω_{22} + α_{22} u_{2,t−1}² + β_{22} h_{22,t−1}

The conditional mean coefficients, μ_1 and μ_2, would be expected to be very small, since they are average daily returns, and they could be positive or negative, although a positive average return is probably more likely. Similarly, the intercept terms in the conditional variance equations would also be expected to be small and positive, since this is daily data. The coefficients on the lagged squared errors and lagged conditional variances in the conditional variance equations must lie between zero and one, and more specifically, the following might be expected: α_{11} and α_{22} around 0.1–0.3; β_{11} and β_{22} around 0.5–0.8, with α_{11} + β_{11} < 1 and α_{22} + β_{22} < 1. The coefficient values for the conditional covariance equation are more difficult to predict, although α_{12} + β_{12} < 1 is still required for the model to be useful for forecasting covariances. The parameters in this equation could be negative, although given that the returns for two stock markets are likely to be positively correlated, the parameters would probably be positive; the model would still be a valid one if they were not.
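A minimal sketch of how the three diagonal VECH recursions above would be updated by one step, and how a conditional correlation could be read off from the results; all parameter and starting values are hypothetical.

```python
# One-step update of a bivariate diagonal VECH model (hypothetical parameter values).
from math import sqrt

# (omega, alpha, beta) for the h11, h12 and h22 equations respectively
PARAMS = {
    "11": (1e-5, 0.15, 0.75),
    "12": (5e-6, 0.10, 0.70),
    "22": (2e-5, 0.20, 0.65),
}

def vech_update(u1_lag, u2_lag, h11_lag, h12_lag, h22_lag):
    """Return (h11_t, h12_t, h22_t) from lagged shocks and lagged (co)variances."""
    w, a, b = PARAMS["11"]
    h11 = w + a * u1_lag**2 + b * h11_lag
    w, a, b = PARAMS["12"]
    h12 = w + a * u1_lag * u2_lag + b * h12_lag
    w, a, b = PARAMS["22"]
    h22 = w + a * u2_lag**2 + b * h22_lag
    return h11, h12, h22

h11, h12, h22 = vech_update(u1_lag=0.01, u2_lag=-0.005,
                            h11_lag=1e-4, h12_lag=4e-5, h22_lag=2e-4)
print("conditional correlation:", h12 / sqrt(h11 * h22))
```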
(b) One of two procedures could be used. Either the daily returns data would be transformed into weekly returns data by adding up the returns over all of the trading days in each week, or the model would be estimated using the daily data and daily forecasts would then be produced up to 10 days (two trading weeks) ahead. In both cases, the models would be estimated, and forecasts made of the conditional variances and the conditional covariance. If daily data were used to estimate the model, the conditional covariance forecasts for the five trading days in a week would be added together to form a covariance forecast for that week, and similarly for the variances. If the returns had been aggregated to the weekly frequency, the forecasts used would simply be one-step-ahead. Finally, the conditional covariance forecast for the week would be divided by the product of the square roots of the conditional variance forecasts to obtain a correlation forecast.

(c) There are various approaches available, including computing simple historical correlations, exponentially weighted measures, and implied correlations derived from the prices of traded options.

(d) The simple historical approach is obviously the simplest to calculate, but it has two main drawbacks. First, it does not weight information: any observations within the sample will be given equal weight, while those outside the sample will automatically be given a weight of zero. Second, any extreme observations in the sample will have an equal effect until they abruptly drop out of the measurement period. For example, suppose that one year of daily data is used to estimate volatility. If the sample is rolled through one day at a time, an observation corresponding to a market crash will appear in the next 250 samples, with equal effect, but will then disappear altogether.

Exponentially weighted moving average models of covariance and variance (which can be used to construct correlation measures) more plausibly give additional weight to more recent observations, with the weight given to each observation declining exponentially as it goes further back into the past. These models have the undesirable property that the forecasts for different numbers of steps ahead will all be the same, so the forecasts will not tend to the unconditional mean as those from a suitable GARCH model would.

Finally, implied correlations may at first blush appear to be the best method for calculating correlation forecasts accurately, for they rely on information obtained from the market itself. After all, who should know better about future correlations in the markets than the people who work in those markets? However, market-based measures of volatility and correlation are sometimes surprisingly inaccurate, and are also sometimes difficult to obtain. Most fundamentally, correlation forecasts will only be available where there is a traded option whose payoffs depend on the prices of two underlying assets; for all other situations, a market-based correlation forecast will simply not be available. Multivariate GARCH models will give more weight to recent observations in computing the forecasts, but they may be difficult and computationally intensive to estimate.

5. (a) A news impact curve shows the effect of shocks of different magnitudes on the next period's volatility. These curves can be used to examine visually whether there are any asymmetry effects in volatility for a particular set of data. For the data given in this question, the way I would approach it is to put values of the lagged error into column A of a spreadsheet, ranging from −1 to +1 in increments of 0.01, and then simply enter the formulae for the GARCH and EGARCH models into the next two columns, with each formula referring to the corresponding value of the lagged error in column A.
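Equivalently, the same news impact curves could be traced out with a short script rather than a spreadsheet. The sketch below does this for a GARCH(1,1) and an EGARCH(1,1), holding the lagged conditional variance fixed; all parameter values are hypothetical stand-ins for the estimates given in the question.

```python
# News impact curves for GARCH(1,1) and EGARCH(1,1) (hypothetical parameter values).
import numpy as np
import matplotlib.pyplot as plt

shocks = np.arange(-1.0, 1.0 + 0.01, 0.01)   # lagged errors from -1 to +1 in steps of 0.01
h_bar = 0.05                                  # fixed lagged conditional variance (illustrative)

# GARCH(1,1): response is symmetric in the sign of the lagged shock
alpha0, alpha1, beta = 0.01, 0.12, 0.85
garch_h = alpha0 + alpha1 * shocks**2 + beta * h_bar

# EGARCH(1,1): gamma < 0 gives a larger response to negative shocks
omega, beta_e, gamma, alpha_e = -0.30, 0.90, -0.10, 0.15
log_h = (omega + beta_e * np.log(h_bar)
         + gamma * shocks / np.sqrt(h_bar)
         + alpha_e * (np.abs(shocks) / np.sqrt(h_bar) - np.sqrt(2 / np.pi)))
egarch_h = np.exp(log_h)

plt.plot(shocks, garch_h, label="GARCH")
plt.plot(shocks, egarch_h, label="EGARCH")
plt.xlabel("Value of lagged shock")
plt.ylabel("Next-period conditional variance")
plt.legend()
plt.show()
```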
The graph obtained would be as follows.

[Figure: news impact curves for the GARCH and EGARCH models, plotting the value of the conditional variance (from 0 to 0.2) against the value of the lagged shock (from −1 to +1).]

This graph is a bit of an odd one, in the sense that the conditional variance is always lower for the EGARCH model, which may suggest estimation error in one of the models. There is some evidence of asymmetries in the case of the EGARCH model, since the value of the conditional variance is 0.1 for a shock of +1 and 0.12 for a shock of −1.

(b) This is a tricky one. The leverage effect is used to rationalise a finding of asymmetries in equity returns, but such an argument cannot be applied to foreign exchange returns, since the concept of a debt/equity ratio has no meaning in that context. On the other hand, there is equally no reason to suppose that there are no asymmetries in the case of FX data. The data used here were daily USD/GBP returns for 1974–1994. It might be the case, for example, that news relating to one country has a differential impact on volatility compared with equally good or bad news relating to the other country. To offer one illustration, it might be that bad news about the currently weak euro has a bigger impact on volatility than news about the currently strong dollar. This would lead to asymmetries in the news impact curve. Finally, it is also worth noting that the asymmetry term in the EGARCH model is not statistically significant in this case.