GARCH Estimation by True Maximum Likelihood
J. Huston McCulloch
Ohio State University
July 31, 2001
The GARCH(1,1) model

$$\epsilon_t = \sigma_t z_t, \qquad z_t \sim \text{iid } f(z),$$
$$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \qquad (1)$$
has an easily computed conditional likelihood, conditional on $\sigma_1$, of

$$L(\alpha, \beta, \omega;\, \sigma_1) = \prod_{t=1}^{T} \frac{f(\epsilon_t/\sigma_t)}{\sigma_t}. \qquad (2)$$
There may be additional parameters, not listed, governing the shape of the distribution $f(z)$, and/or regression parameters if the errors are regression residuals.
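As a concrete illustration, the conditional likelihood (2) is straightforward to evaluate by running the recursion (1) forward from a given $\sigma_1$. The sketch below assumes a standard Gaussian $f(z)$; the function name and the parameter values in the example are hypothetical.

```python
import numpy as np

def cond_loglik(eps, omega, alpha, beta, sigma1):
    """Log of the conditional likelihood (2) for GARCH(1,1), given the
    initial scale sigma1, assuming a standard Gaussian f(z)."""
    T = len(eps)
    sig2 = np.empty(T)
    sig2[0] = sigma1 ** 2
    # Recursion (1): sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2
    for t in range(1, T):
        sig2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sig2[t - 1]
    # log of prod_t f(eps_t/sigma_t)/sigma_t for the Gaussian density
    return float(np.sum(-0.5 * np.log(2 * np.pi * sig2) - 0.5 * eps ** 2 / sig2))

# Hypothetical example with placeholder errors (not a real GARCH sample):
rng = np.random.default_rng(0)
print(cond_loglik(rng.standard_normal(500), 0.1, 0.1, 0.8, sigma1=1.0))
```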
Each scale value t (and therefore the initial value 1) has a common
unconditional distribution with some precise p.d.f. g() and corresponding c.d.f. G().
This is not a standard distribution, but it can be approximated numerically, given the
parameters and f(z), by iterating repeatedly on (1), using an appropriate transformation of
 onto a finite interval. Even if there were a long run of zero errors, t2 could never fall
below    /(1   ) , although there is some probability that it could come arbitrarily
2
close to this value from above. The square root of  is therefore the infimum of the
support of g().1
Given g(), the exact likelihood of the errors may be found simply by taking the
expectation of (2) over g(1):
L( ,  ,  )  E g ( 1 ) L( ,  ,  ;  1 )



L( ,  ,  ;  1 ) g ( 1 )d 1
(3)
The improper integral in (3) may be simplified to a proper integral by means of the substitution $y = G(\sigma_1)$, as follows:²

$$L(\alpha,\beta,\omega) = \int_0^1 L(\alpha,\beta,\omega;\, G^{-1}(y))\, dy. \qquad (4)$$
Note that as 1  , the initial terms in (2) decline to 0, so that
L( ,  ,  ; G 1 (1))  L( ,  ,  ; )  0 .
(5)
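Putting (2) and (4) together, a minimal numerical sketch of the exact likelihood might look as follows, with $G^{-1}$ approximated by empirical quantiles from a long simulation of (1) and the integral over $y$ computed by a simple midpoint rule. Gaussian $f(z)$ is assumed, and all names and parameter values are hypothetical.

```python
import numpy as np

def sigma2_path(eps, omega, alpha, beta, sigma1_sq):
    """sigma_t^2 for t = 1, ..., T from the GARCH(1,1) recursion (1)."""
    sig2 = np.empty(len(eps))
    sig2[0] = sigma1_sq
    for t in range(1, len(eps)):
        sig2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sig2[t - 1]
    return sig2

def cond_loglik(eps, omega, alpha, beta, sigma1):
    """Log of the conditional likelihood (2), Gaussian f(z) assumed."""
    sig2 = sigma2_path(eps, omega, alpha, beta, sigma1 ** 2)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sig2) - 0.5 * eps ** 2 / sig2))

def exact_loglik(eps, omega, alpha, beta, n=11, ndraws=20_000, seed=0):
    """Approximate the log of (4).  G^{-1} is taken from the empirical
    quantiles of a simulation of (1); the integral over y uses a midpoint
    rule, with a shift by the maximum log-likelihood to avoid underflow
    (the precaution described under Computational Considerations)."""
    rng = np.random.default_rng(seed)
    sig2 = omega / (1 - alpha - beta)
    draws = np.empty(ndraws)
    for t in range(ndraws):
        e = np.sqrt(sig2) * rng.standard_normal()
        sig2 = omega + alpha * e ** 2 + beta * sig2
        draws[t] = np.sqrt(sig2)
    y = (np.arange(n) + 0.5) / n        # midpoints of (0, 1)
    s1 = np.quantile(draws, y)          # empirical G^{-1}(y)
    Ls = np.array([cond_loglik(eps, omega, alpha, beta, s) for s in s1])
    Lmax = Ls.max()
    return float(Lmax + np.log(np.mean(np.exp(Ls - Lmax))))

# Hypothetical example with placeholder errors (not a real GARCH sample):
rng = np.random.default_rng(1)
print(exact_loglik(rng.standard_normal(50), 0.1, 0.1, 0.8))
```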
Comparison to Alternative Initializations
Most practitioners simply insert some value of $\sigma_1$ into (2) and appeal to the fact that maximizing the resulting function over the GARCH parameters (and any other parameters that may be present) gives Quasi-Maximum Likelihood (QML) estimates whose dependence on $\sigma_1$ dies out as the sample size grows, and which therefore share the consistency of true ML estimates. However, with finite sample sizes, the choice of $\sigma_1$ can have an important effect on the parameter estimates and can alter the distribution of test statistics. Even asymptotically, it is not clear (at least not to me) that the distribution of QML test statistics is insensitive to the choice of $\sigma_1$.

¹ If the process truly began at t = 1, $\sigma_1$ could conceivably take on any positive value. It is assumed here that the process has in fact already been going on indefinitely by t = 1, even though it is not observed until t = 1.

² Equation (4) is an example of what might be called "GARCHian integration," by analogy to "Gaussian integration." In the latter, the Gaussian c.d.f. is used as a transformation to simplify an expectation under a Gaussian p.d.f.
McCulloch (1985) introduced a conditionally symmetric stable IGARCH(1,1) model to characterize interest rate volatility. That study maximized the conditional likelihood (2), basing $\sigma_1$ on the first 12 monthly observations on $\epsilon_t$ and otherwise discarding those observations.³ This procedure of course makes less than full use of the available data. Bollerslev (1986, p. 315, n. 4) likewise conditions on pre-sample values.
Engle and Bollerslev (1986) set

$$\sigma_1^2 = \frac{1}{T} \sum_{t=1}^{T} \epsilon_t^2. \qquad (6)$$
EViews 4.0 (2000, p. 385) similarly "backcasts" $\sigma_1^2$, using a geometrically declining weighted average of the squared errors, with an arbitrary decay factor of 0.7. By thus conditioning on in-sample values, both of these approaches technically violate the joint probability decomposition that leads to (2). Such an initialization will not adequately penalize parameter values that do not fit the data well, since $\sigma_1$ is being determined by the data rather than by the parameters. "Backcasting" will overfit particularly well, since it accommodates the volatility of the first several errors quite closely, regardless of the parameters being considered.
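EViews' exact backcast formula is not reproduced here; the following is only a hedged sketch of the general idea of a geometrically declining weighted average of the squared errors, with the decay factor of 0.7 mentioned above. The function name is hypothetical.

```python
import numpy as np

def backcast_sigma1_sq(eps, decay=0.7):
    """A geometrically declining weighted average of the squared errors,
    read backwards from the start of the sample, as a proxy for sigma_1^2.
    This is a sketch of the idea only; EViews' exact formula may differ."""
    T = len(eps)
    w = decay ** np.arange(T)  # weight 1 on eps_1, decay on eps_2, ...
    w /= w.sum()               # normalize the weights to sum to one
    return float(np.dot(w, eps ** 2))

# Hypothetical example errors:
print(backcast_sigma1_sq(np.array([1.0, -2.0, 0.5, 1.5])))
```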
³ Like Engle and Bollerslev (1986), McCulloch (1985) erroneously omitted the constant term $\omega$ in (1), in the mistaken belief that this is necessary to prevent the process from exploding when $\alpha + \beta = 1$. See Nelson (1990) for the true stationarity conditions in the conditionally Gaussian cases. In order to keep expectations finite, McCulloch (1985) also replaced the squares in (1) with first powers of absolute values, but in retrospect this was not necessary. The symmetric stable distributions include the Gaussian as a special case, so that the GARCH-normal model is subsumed within the GARCH-stable class.
Bidarkota and McCulloch (1998) treat $\sigma_1$ as an additional parameter to be found by maximizing (2). This clearly overstates the value of the integral in (4), and therefore the likelihood, since it replaces the integrand by its maximal value. Again, the overstatement will generally be at its worst for bad parameter values, since $\sigma_1$, whose distribution is in fact determined precisely by the parameters, is instead being set to conform to the data.
Andrews (2001) sets

$$\sigma_1^2 = \underline{\omega} = \omega/(1-\beta). \qquad (7)$$
This choice almost certainly understates the likelihood when $\alpha > 0$, since it replaces the integrand in (4) by the relatively small value it takes on at $y = 0$. His paper is concerned with testing the null hypothesis of no GARCH, which he correctly notes is the single restriction $\alpha = 0$ in (1). Under this null hypothesis, $\beta$ becomes unidentified, and $g(\sigma)$ collapses onto $\sqrt{\underline{\omega}}$, so that

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_T^2 = \underline{\omega}, \qquad (8)$$

and Andrews' choice is appropriate. Under the alternative hypothesis $\alpha > 0$ and assuming $Ez_t^2 = 1$, however, a more appropriate single choice, short of evaluating (4), would be

$$\sigma_1^2 = E\sigma_t^2 = \omega/(1-\alpha-\beta). \qquad (9)$$
This choice does not clearly over- or under-state the likelihood. It does break down in the IGARCH case, but that is not an issue in testing for the complete absence of GARCH, as in Andrews (2001). A QML test based on (7) probably has reduced power in comparison to ML, or to QML based on (9), since it does not allow the objective function to respond adequately to deviations from the null. If one's goal is to estimate GARCH parameters by QML without excluding the IGARCH or trans-IGARCH cases, an appropriate improvement over (7), short of computing (4), would be to set $\sigma_1$ equal to the median or mode of $g(\sigma)$.⁴
Computational Considerations

Equation (4) may be numerically integrated by any simple method, such as Simpson's rule. If Simpson's rule is computed with n = 4m+1 equally spaced y-values, including the end points 0 and 1, where m is a positive integer, its precision may be roughly estimated by recomputing the integral using 2m+1 of these values and assuming that the remaining error is less in absolute value than the difference between the two results. If the estimated error in the log likelihood is less than 0.01, the precision is more than adequate for likelihood-comparison purposes. A value of m as small as 2 (yielding n = 9) might be sufficient, since $\sigma_1$ has little effect on any but the first few terms of (2). If so, true ML would take only about n − 1 = 8 times as long as QML using one of the alternative initializations, since the evaluation at y = 1 is free by (5). With today's computers, this is not a significant constraint for most problems.⁵
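The error-estimation device just described (compute with n = 4m+1 points, recompute with the coarser 2m+1 subset, take the difference as an error bound) can be sketched as follows, here tested on a generic integrand with a known answer rather than on the GARCH likelihood itself. The function names are hypothetical.

```python
import numpy as np

def simpson(vals, h):
    """Composite Simpson's rule over an odd number of equally spaced values."""
    return h / 3 * (vals[0] + vals[-1]
                    + 4 * vals[1:-1:2].sum() + 2 * vals[2:-1:2].sum())

def simpson_with_error(f, m):
    """Integrate f over [0, 1] with n = 4m + 1 points, and estimate the
    error by recomputing with the 2m + 1 coarser points, as in the text."""
    n = 4 * m + 1
    y = np.linspace(0.0, 1.0, n)
    vals = f(y)
    h = y[1] - y[0]
    fine = simpson(vals, h)
    coarse = simpson(vals[::2], 2 * h)  # every other point: 2m + 1 values
    return fine, abs(fine - coarse)

# Check on a known integral: the integral of exp(y) over [0, 1] is e - 1.
val, err = simpson_with_error(np.exp, m=2)
print(val, err)
```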
The exact likelihood could in principle be computed using the Sorenson-Alspach/Kitagawa filtering equations given by Harvey (1989, pp. 163-4) and employed by Bidarkota and McCulloch (1998) to model the trend of U.S. inflation. However, this would require laborious numerical integrals at a large number of points for each value of t, alternately computing the prior distribution for $\sigma_t$ given $\epsilon_1, \ldots, \epsilon_{t-1}$ and the posterior distribution for $\sigma_t$ given $\epsilon_1, \ldots, \epsilon_t$, just to get the likelihood for a single value of the parameters. Equation (4) above gives the same result with far less computation.

⁴ If the variance of $\epsilon_t$ is infinite, as in the stable cases considered by McCulloch (1985) and Bidarkota and McCulloch (1998), the rationale for (9) disappears, but the mode or median of the unconditional distribution would still be appropriate.

⁵ I see no particular advantage to using Gauss-Legendre integration rather than Simpson's rule, since the integrand is unlikely to be globally well approximated by a single polynomial; Simpson's rule instead approximates it locally by unrelated quadratics.
A certain precaution should be exercised in computing (4). The log likelihood is ordinarily a manageable number, but (4) requires that the likelihood itself be integrated, and the likelihood itself can easily either underflow or overflow. To circumvent this problem, let $\mathcal{L}_i = \log L(\alpha, \beta, \omega;\, G^{-1}(y_i))$ be the log likelihood at the selected points $y_i$ at which the integrand is to be evaluated,* and let $\mathcal{L}_{\max}$ be the maximum of these $\mathcal{L}_i$. Define $\mathcal{L}_i^* = \mathcal{L}_i - \mathcal{L}_{\max}$ and $L_i^* = \exp(\mathcal{L}_i^*)$. The maximum value of $L_i^*$ is unity, and any underflows may simply be treated as zeros. Integrate the $L_i^*$ numerically as above to obtain $L^*$. The log of the likelihood as given by (4) is then $\mathcal{L} = \log(L^*) + \mathcal{L}_{\max}$.
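The shift by $\mathcal{L}_{\max}$ described above is a log-sum-exp device. A minimal sketch, assuming the log-likelihood values $\mathcal{L}_i$ and the quadrature weights have already been computed (all names hypothetical):

```python
import numpy as np

def log_integral(logL, weights):
    """Given log-likelihood values logL_i = log L(.; G^{-1}(y_i)) and
    quadrature weights, return the log of the integrated likelihood
    without under- or overflow, via the shift by Lmax in the text."""
    Lmax = np.max(logL)
    Lstar = np.exp(logL - Lmax)  # max value is 1; underflows become 0
    return float(Lmax + np.log(np.dot(weights, Lstar)))

# Log-likelihoods near -700 would underflow exp() directly,
# but the shifted computation remains stable:
logL = np.array([-705.0, -700.0, -702.0, -710.0])
w = np.full(4, 0.25)  # e.g. equal weights on [0, 1]
print(log_integral(logL, w))
```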
* Typographic note: The symbol $\mathcal{L}$ is a script upper-case L, which may not show up correctly if the reader's computer does not have the Lucida Casual font loaded.

References

Andrews, Donald W. K., "Testing when a Parameter is on the Boundary of the Maintained Hypothesis." Econometrica 69 (2001): 683-734.

Bidarkota, Prasad V., and J. Huston McCulloch, "Optimal Univariate Inflation Forecasting with Symmetric Stable Shocks." Journal of Applied Econometrics 13 (1998): 659-70.

Bollerslev, Tim, "Generalized Autoregressive Conditional Heteroskedasticity." Journal of Econometrics 31 (1986): 307-27.

Engle, Robert F., and Tim Bollerslev, "Modelling the Persistence of Conditional Variances." Econometric Reviews 5 (1986): 1-50.

EViews 4 Command and Programming Reference. Quantitative Micro Software, Irvine, CA, 2000.

Harvey, Andrew C., Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, 1989.

McCulloch, J. Huston, "Interest-Risk Sensitive Deposit Insurance Premia." Journal of Banking and Finance 9 (1985): 137-156.

Nelson, Daniel B., "Stationarity and Persistence in the GARCH(1,1) Model." Econometric Theory 6 (1990): 318-34.