Massachusetts Institute of Technology, Department of Economics, 50 Memorial Drive, E52-251B, Cambridge, MA 02215. Tel (617) 253 2118.

This paper is part of my Ph.D. dissertation at Yale University. I wish to thank Peter C.B. Phillips for continued encouragement and support. I have also benefited from comments by Donald Andrews, Oliver Linton, Chris Sims and participants of the econometrics workshop at Yale and seminars at Barcelona, Brussels, Chicago, CREST, Harvard, LSE, Minnesota, MIT, Northwestern, Oxford, Pennsylvania, Princeton and Stanford. All remaining errors are my own. Financial support from an Alfred P. Sloan Doctoral Dissertation Fellowship is gratefully acknowledged.

Abstract

This paper analyzes autoregressive time series models where the errors are assumed to be martingale difference sequences with weak additional restrictions on their distribution. Under these conditions Quasi Maximum Likelihood estimators of the autoregressive parameters are no longer efficient in the GMM sense. The main result of the paper is the construction of efficient semiparametric instrumental variables estimators for the autoregressive parameters. The optimal instruments are a linear function of the innovation sequence. It is shown that a frequency domain approximation of the optimal instruments leads to an estimator which only depends on the data periodogram and an unknown linear filter. Semiparametric methods to estimate the optimal filter are proposed. The procedure is equivalent to GMM estimators where lagged observations are used as instruments. The number of instruments is allowed to grow at the same rate as the sample. No lag truncation parameters are needed to implement the estimator, which makes it particularly appealing from an applied point of view.

1. Introduction

This paper develops new instrumental variables (IV) estimators for autoregressive time series models when the errors are uncorrelated but not independent.
The specification includes error processes which are conditionally heteroskedastic of unknown functional form. Efficiency gains are obtained without having to specify a model for the dependence in the errors. The setup is general enough to account for stylized facts in many economic time series displaying features such as thick tailed distributions and time dependent conditional variances.

Classical efficiency results for the quasi maximum likelihood estimator (QMLE) of the autoregressive parameters such as Hannan (1973) depend on independence of the errors. In the more general case of conditional heteroskedasticity considered here the QMLE no longer attains a GMM lower bound for the asymptotic covariance matrix, which now depends on fourth moments of the innovation process. In Kuersteiner (1997) it is shown how a decomposition of the higher moment terms leads to a lower bound for the covariance matrix. An instrumental variables estimator based on this decomposition is shown to achieve the lower bound for the covariance matrix in the class of IV estimators with instruments that are linear in the innovations.

The feasible GMM estimators developed in this paper are similar to the estimators of Hayashi and Sims (1983), Stoica, Soderstrom and Friedlander (1985) and Hansen and Singleton (1991, 1996). In this literature lagged observations are used as instruments to account for unmodelled MA(q) innovations which lead to inconsistent OLS estimators. Here lagged observations are used as instruments to account for unmodelled conditional heteroskedasticity of the error terms, which renders OLS inefficient. Apart from the different motivation for the use of instruments, inefficiency versus inconsistency, this paper extends the previous literature as it explicitly treats feasible estimation. The number of instruments in our case is allowed to grow at the same rate as the sample size.
This is made possible at the cost of an additional restriction on the fourth order cumulants compared to the treatment in Kuersteiner (1997). The advantage of making this assumption is that the estimator can be implemented without the need for a truncation or bandwidth parameter for the number of instruments used.

Since the optimal instruments are unobservable they need to be estimated nonparametrically. Assumptions about the generating mechanism of the volatility process, or more generally the dependence in higher moments, are replaced by smoothness assumptions for higher order cumulant spectra of the errors. This setup allows for the treatment of dependence in higher moments as a nuisance parameter. Nonparametric estimators of this nuisance parameter are used to construct the optimal instruments.

Other semiparametric procedures proposed to handle conditional heteroskedasticity include Robinson (1987) and Newey (1991). No parametric assumptions about the form of conditional heteroskedasticity are made in these treatments. However, in order to estimate the conditional variance these authors have to assume independent errors. This assumption has precluded direct application of their techniques to the stochastic conditional variance case. Hidalgo (1992) relaxes the iid assumption for the errors but has to assume instead that the conditional variance is a smooth function of an independent stationary process. Hansen (1995) treats the stochastic volatility model in a semiparametric GLS framework. He assumes that the conditional variance process converges to a Brownian motion in the limit. Sample path continuity of the limit process then allows for consistent kernel estimation of the conditional variance.

More generally, Hansen (1985) and Hansen, Heaton and Ogaki (1988) prove existence of instrumental variable estimators achieving a GMM lower bound in the presence of conditional heteroskedasticity.
The high dimensional and nonlinear character of these instruments has so far precluded implementation of such an estimator. Here we limit ourselves to the implementation of optimal procedures in the much smaller class of IV estimators with instruments that are linear functions of the observations. Ordinary least squares is a particular member of this class. In the general case of conditional heteroskedasticity it is inefficient. This paper achieves the construction of a feasible version of the most efficient IV estimator with linear instruments.

The remainder of the paper is organized as follows. Section 2 describes the model assumptions. Section 3 develops the efficient IV estimator for the AR(p) model. A frequency domain approximation for the IV estimator is derived in Section 4. Section 5 shows that a semiparametric estimator with the same optimality properties can be constructed. Some Monte Carlo simulations are reported in Section 6 and concluding remarks are made in Section 7. Some important lemmas are quoted in Appendix A while the main results of the paper are proved in Appendix B.

2. Model

We start by defining the stochastic environment of the model. Let $(\Omega, \mathcal{F}, P)$ be a general probability space and define a filtration $\mathcal{F}_t$ to be an increasing sequence of $\sigma$-fields such that $\mathcal{F}_t \subseteq \mathcal{F}_{t+1} \subseteq \mathcal{F}$ for all $t$. There is a doubly infinite sequence of random variables $\varepsilon_t$ generating the filtration $\mathcal{F}_t$. We assume that we observe a sample of size $n$ of a univariate time series $y_t$ where $t \in \{1, \dots, n\}$. More specifically, we assume that $y_t$ is generated by the following autoregressive model

$$\phi(L) y_t = \varepsilon_t \qquad (1)$$

where $\varepsilon_t$ is a martingale difference sequence generating $\mathcal{F}_t$. Here $\phi' = (\phi_1, \dots, \phi_p)$ is the vector of parameters describing the mean equation of the model. It is assumed that $\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p$ has all roots outside the unit circle. We are interested in estimating the parameter vector $\phi$. The martingale difference assumption for $\varepsilon_t$ implies absence of correlation between the errors. However, it is not assumed that the errors are independent. Rather we allow for dependence in higher than second moments to account for thick tails and conditional heteroskedasticity.

Assumption A-1. (i) $\varepsilon_t$ is strictly stationary and ergodic, $E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0$, $E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2$, $E(\varepsilon_t^2) = \sigma^2 < \infty$, (ii) $\phi(L)$ has all roots outside the unit circle, (iii) $E(\varepsilon_t^2 \varepsilon_{t-s}^2) - \sigma^4 = \sigma(s) < \infty$ for $s > 0$, (iv) $E(\varepsilon_t^2 \varepsilon_{t-s} \varepsilon_{t-r}) = 0$ for all $s \neq r$, $s, r > 0$, (v) $\sum_{s} |s|\,|\sigma(s)| = B < \infty$, and $E(\varepsilon_t^2 \varepsilon_{t-s}^2) > \alpha > 0$ for some $\alpha > 0$ and all $s$.

Remark 1. Assumption (iv) is added to the assumptions in Kuersteiner (1997) in order to simplify the form of the optimal instruments. It is somewhat restrictive as it rules out some nonsymmetric parametric examples such as EGARCH. The IV estimators proposed in Section 3 are still consistent and asymptotically normal if (iv) fails. However, in this case they lose their optimality properties.

Remark 2. It should be emphasized that no assumptions about third moments are made. In particular this allows for skewness in the error process.

Remark 3. GARCH(p,q) processes satisfy Assumption (A-1) under certain conditions. Nelson (1990) obtains sufficient conditions for stationarity and ergodicity of the GARCH(1,1) model. The martingale difference property follows immediately from the definition of a GARCH process. Assumption (A-1 iv) is shown to hold for the ARCH(p) case in Milhoj (1985) for symmetric innovation densities. The same argument extends to the GARCH(p,q) case. If the innovation distribution is normal then fourth moments are known to exist for the GARCH(1,1) case if $3\gamma_1^2 + 2\gamma_1\beta_1 + \beta_1^2 < 1$. This condition is valid for $\beta_1 = 0$ and thus covers the ARCH case. In Milhoj (1985) and Bollerslev (1986) the autocorrelation structure $\sigma(s)$ is shown to be identical to the AR(p) and ARMA(max(p,q), q) case for ARCH(p) and GARCH(p,q) respectively.
This implies that the summability condition holds if fourth moments exist. Similar arguments can be made to show that stochastic volatility models satisfy the assumptions.

Based on the results in Kuersteiner (1997), we will now introduce the optimal instrumental variables estimator for the AR(p) model. The estimator is constructed by reweighting the innovation sequence by the unconditional fourth moments $\sigma(k) + \sigma^4$ of the error process. Without parametric assumptions about the form of conditional heterogeneity these moments typically have to be estimated.

3. Instrumental Variables Estimator

The parameter vector $\phi$ can be consistently estimated by OLS. Under the assumptions in this paper OLS amounts to an inefficient IV estimator in the class of IV estimators with linear instruments. Kuersteiner (1997) derives the form of the optimal instrument as a function of the fourth moments and the impulse response function of the underlying process $y_t$ in a more general context. In this section these more general results are specialized to the autoregressive model.

Let $z_t \in \mathbb{R}^p$ be $\mathcal{F}_{t-1}$ measurable and square integrable, strictly stationary and ergodic. Then, the instrument satisfies the moment condition $E[(\phi(L) y_t) z_t] = 0$ where $\phi(L) y_t = \varepsilon_t$. Let $\bar{y}_t' = [y_t, y_{t-1}, \dots, y_{t-p}]$ and $\bar{\phi} = [1, -\phi']'$. An instrumental variables estimator is then defined as

$$\hat{\phi} = \arg\min_{\phi} \Big\| \sum_{t=p+1}^{n} z_t\, \bar{\phi}' \bar{y}_t \Big\|$$

with an explicit solution in the case of an autoregressive model as

$$\hat{\phi} = (Z' Y_{-1})^{-1} Z' Y \qquad (1)$$

where $Y' = [y_{1+p}, \dots, y_n]$ and $Y_{-1}' = [\mathbf{y}_p, \dots, \mathbf{y}_{n-1}]$ where $\mathbf{y}_t = [y_t, y_{t-1}, \dots, y_{t-p+1}]'$. Note that $Y = Y_{-1}\phi + \varepsilon$. By the ergodic theorem it follows that $\hat{\phi}$ is consistent.

As argued before, instruments $z_t$ which are linear in past observations are considered. Such instruments are equivalent to instruments that are linear functions of past innovations. This set up is formalized here.
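Using $\phi^{-1}(z) = \sum_{j=0}^{\infty} \psi_{\phi,j} z^{j}$ the time series $y_t$ can be expressed as a linear filter of past shocks, $y_t = \sum_{j=0}^{\infty} \psi_{\phi,j} \varepsilon_{t-j}$. Also, let $\alpha_k = E(\varepsilon_t^2 \varepsilon_{t-k}^2) = \sigma(k) + \sigma^4$ and $b_k' = (\psi_{\phi,k-1}, \dots, \psi_{\phi,k-p})$ with $\psi_{\phi,k} = 0$ for $k < 0$. Then define

$$P_m' = [b_1, \dots, b_m]. \qquad (2)$$

In Kuersteiner (1997) it is shown that for $z_t = \sum_{k=1}^{m} a_k \varepsilon_{t-k}$ with $a_k \in \mathbb{R}^p$ such that $\sum_{k=1}^{\infty} |a_{k,i}| < \infty$ for all $i$, where $a_{k,i}$ is the $i$-th element of $a_k$, the asymptotic distribution of $\hat{\phi}$ is normal with mean zero and covariance matrix

$$\sigma^{-4} \lim_{m \to \infty} (A_m P_m)^{-1} A_m \Omega_m A_m' (P_m' A_m')^{-1}$$

where $A_m = [a_1, \dots, a_m]$ and $\Omega_m$ is the $m \times m$ matrix with typical element $E(\varepsilon_t^2 \varepsilon_{t-k} \varepsilon_{t-l})$. The lower bound is

$$\Xi = \lim_{m \to \infty} (P_m' \Omega_m^{-1} P_m)^{-1}. \qquad (3)$$

It should be noted that under Assumption (A-1), $\Omega_m = \mathrm{Diag}(\alpha_1, \dots, \alpha_m)$. It then follows immediately that the optimal instrument achieving (3) has weights $a_k = b_k / \alpha_k$. A different way of writing the optimal instrument is $z_t = \lim_{m \to \infty} P_m' \Omega_m^{-1} [\varepsilon_{t-1}, \dots, \varepsilon_{t-m}]'$ almost surely. This is a special case of the more general formulation in Kuersteiner (1997).

The optimal instrument matrix $Z$ is now defined by setting $z_t' = [z_{t,1}, \dots, z_{t-p+1,p}]$ with $z_{t+1,k} = \sum_{i=0}^{\infty} (\psi_{\phi,i} / \alpha_{i+k})\, \varepsilon_{t-i}$ and letting $Z' = [z_p, \dots, z_{n-1}]$. It should be emphasized that the $k$-th instrument, i.e. the instrument for parameter $\phi_k$, is lagged by $k$ periods and has a convolution filter which is also shifted by $k$ elements.

The estimator introduced so far is not feasible and needs to be approximated. Let $h_0$ be a function of the sequence $\{b_k / \alpha_k\}_{k=1}^{\infty}$ of optimal weights and the true parameter value $\phi_0$. Define $\tilde{z}_t = \sum_{k=1}^{t-1} (b_k / \alpha_k)\, \varepsilon_{t-k}(\phi_0)$ with $\varepsilon_t(\phi_0) = \bar{\phi}_0' \bar{y}_t$, where $\bar{\phi}_0 = [1, -\phi_0']'$ and $\bar{y}_t = [y_t, \dots, y_{t-p}]'$, and let $\tilde{Z}' = [\tilde{z}_p, \dots, \tilde{z}_{n-1}]$. The approximate version of $\hat{\phi}$ is now

$$\tilde{\phi}(h_0) = (\tilde{Z}' Y_{-1})^{-1} \tilde{Z}' Y. \qquad (4)$$

A feasible estimator is then obtained by replacing $h_0$ with a consistent estimate $\hat{h}$. The proof of feasibility proceeds in two steps. In Section 4 it is shown that $\sqrt{n}(\tilde{\phi}(h_0) - \hat{\phi}) = o_p(1)$ and in Section 5 we establish that $\sqrt{n}(\tilde{\phi}(\hat{h}) - \tilde{\phi}(h_0)) = o_p(1)$.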
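The time domain construction above can be illustrated with a minimal sketch for the AR(1) case. It is not the paper's implementation: the function names are hypothetical, the burn-in period is an added assumption, and the closed form used for $\alpha_k$ is the one that applies to a Gaussian ARCH(1) error process (it requires $3\gamma_1^2 < 1$). The sketch computes the impulse responses by the usual recursion, forms the weights $b_k/\alpha_k$, and evaluates both OLS and the infeasible estimator (4) at the true parameter value.

```python
import numpy as np


def simulate_ar1_arch1(n, phi, gamma0, gamma1, rng, burn=200):
    """Simulate y_t = phi*y_{t-1} + e_t with e_t = u_t*sqrt(h_t),
    h_t = gamma0 + gamma1*e_{t-1}^2 and u_t ~ N(0,1).
    The burn-in length is an assumption of this sketch."""
    total = n + burn
    u = rng.standard_normal(total)
    y = np.zeros(total)
    e_prev, y_prev = 0.0, 0.0
    for t in range(total):
        h = gamma0 + gamma1 * e_prev ** 2
        e_prev = u[t] * np.sqrt(h)
        y_prev = phi * y_prev + e_prev
        y[t] = y_prev
    return y[burn:]


def impulse_responses(phi_vec, m):
    """First m coefficients psi_0..psi_{m-1} of 1/phi(z), using
    psi_s = phi_1*psi_{s-1} + ... + phi_p*psi_{s-p} with psi_0 = 1."""
    phi_vec = np.atleast_1d(np.asarray(phi_vec, dtype=float))
    p = len(phi_vec)
    psi = np.zeros(m)
    psi[0] = 1.0
    for s in range(1, m):
        for j in range(1, min(p, s) + 1):
            psi[s] += phi_vec[j - 1] * psi[s - j]
    return psi


def arch1_alpha(k, gamma0, gamma1):
    """alpha_k = E[e_t^2 e_{t-k}^2] for Gaussian ARCH(1); needs 3*gamma1^2 < 1."""
    sigma4 = (gamma0 / (1.0 - gamma1)) ** 2
    scale = 2.0 * gamma0 ** 2 / ((1.0 - gamma1) ** 2 * (1.0 - 3.0 * gamma1 ** 2))
    return scale * gamma1 ** k + sigma4


def ols_and_infeasible_iv(y, phi0, gamma0, gamma1):
    """Return (OLS, infeasible IV) estimates of the AR(1) coefficient."""
    n = len(y)
    eps = y[1:] - phi0 * y[:-1]                 # eps[i] is the residual at time i+1
    lags = np.arange(1, n)                      # k = 1, ..., n-1
    psi = impulse_responses([phi0], n - 1)      # psi[k-1] = psi_{k-1}
    weights = psi / arch1_alpha(lags, gamma0, gamma1)   # b_k / alpha_k
    # inst[m] = sum_k weights_k * eps_{t-k} for t = m + 2 (uses past residuals only)
    inst = np.convolve(eps, weights)[: n - 2]
    phi_iv = np.sum(inst * y[2:]) / np.sum(inst * y[1:-1])
    phi_ols = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
    return phi_ols, phi_iv


rng = np.random.default_rng(0)
y = simulate_ar1_arch1(4000, phi=0.5, gamma0=0.1, gamma1=0.5, rng=rng)
phi_ols, phi_iv = ols_and_infeasible_iv(y, phi0=0.5, gamma0=0.1, gamma1=0.5)
print(phi_ols, phi_iv)
```

Setting all weights proportional to $\psi_{\phi,k-1}$ (constant $\alpha_k$) would reproduce the OLS weighting, which is why the two estimates coincide asymptotically when no conditional heteroskedasticity is present.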
In the next two sections the approximation (4) is represented in terms of frequency domain integrals. This imposes no practical limitations since the frequency domain version differs from (4) by the exclusion of only a few observations at the beginning of the sample. The difference is asymptotically of order $O_p(n^{-1})$ and therefore does not affect the first order asymptotic properties of the estimator.

Formulating the estimator in the frequency domain however offers the potential for improvements in terms of computational efficiency since the frequency domain formulation can be used to implement FFT-algorithms. Using FFT algorithms reduces the computational complexity of $\tilde{\phi}(\hat{h})$ from $O(n^2)$ to $O(n \log n)$. This improvement is substantial in applications where the dataset is extremely large and the estimator is computed many times. Leading examples are model selection procedures in forecasting applications and simulation studies.

4. IV Estimation in the Frequency Domain

In this section a frequency domain approximation to the optimal IV estimator is derived. Consider the inverse of the spectral density for the AR(p) model, $f_y^{-1}(\lambda) = (2\pi/\sigma^2)\,|\phi(e^{i\lambda})|^2$, and let $|\phi(e^{i\lambda})|^2 = g_{yy}(\phi, \lambda)$. The definition of the spectrum of the squared errors will be useful later and is

$$f_{\varepsilon^2\varepsilon^2}(\lambda) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} \sigma(k) e^{-i\lambda k}. \qquad (1)$$

Define the lag operator $a(\lambda) = [e^{i\lambda}, \dots, e^{i\lambda p}]$ and denote the complex conjugate transpose by $a(\lambda)^*$. Also introduce the matrix $A(\lambda) = a(\lambda)^* a(\lambda)$. Then $g_{yy}(\phi, \lambda)$ can be represented as

$$g_{yy}(\phi, \lambda) = \bar{\phi}' \begin{bmatrix} 1 & a(\lambda) \\ a(\lambda)^* & A(\lambda) \end{bmatrix} \bar{\phi}, \qquad \bar{\phi} = [1, -\phi']'.$$

Define

$$\eta(\phi, \lambda) = \frac{\partial \ln g_{yy}(\phi, \lambda)}{\partial \phi} \qquad (2)$$

and introduce the discrete Fourier transform of the data $\omega_y(\lambda) = n^{-1/2} \sum_{t=1}^{n} y_t e^{-it\lambda}$ and the periodogram as $I_{n,yy}(\lambda) = |\omega_y(\lambda)|^2$. With these definitions we turn to the frequency domain implementation of the instrumental variables estimator introduced at the beginning.
It is easy to show that 1 = = to the frequency mental variables estimator introduced at 4> as cOy (A) (A)] d\ instru- -1 Re[In,zy{\)\dX + Op{n-^) (3) contribution of the paper consists in deriving an approximation to this frequency domain formulation that does only depend on observable quantities. In a first step we decompose the instrument into observable data and an unobservable optimal filter. In the next section it is shown how the unknown optimal filter can be consistently estimated. For the purpose of this and the next section, we introduce the spaces L'^ [— 7r,7r] of functions / [— 7r,7r] —> C^ such that J" |/| d\ < oo. Also, define the spaces C^ [— 7r,7r] of functions / [— 7r,7r] —> such that f is k times continuously differentiable. Throughout, the function Rn (A) will denote a generic remainder term whose definition can change. We start by approximating the discrete Fourier transform of the instrument variables ztj : : W ^., (A) = l^,k (A) e^^V(e-^^)a;, (A) + i?^,^ (A) (4) _-i = ^-„ ^e-^>0+^) and^i?^^^) ^ „-i/2 ^oo^ _^^-,A;^^^^. ^^^ and ^{k) = {i/jQ/akjipi/ai+k, Then let the define the sequences ^ = {ipQj'ipi, where 'A, /^., (A) _ } } j^^^ inner product = -^ Yl'^o ^J^ Rl^^s (A) Unj (A) = Ylt=i-j ^te"^^ a compact form — ^^^^n,j (A) Y17=i ^te^^^- We be defined for a = ••' Kmp)^^^^'' {sq, si, ...} R^ f. where (A) in = [i?^ ^ ^^ ^^ analogous way u^^ (-A) (A) is , and ...i?^p (A)] the complex conju- (A) as l.^ . = + RriMk)^>')- denote the vectors of stacked error terms by -R^ (A) [^n,^(i)^^)' K,i,^^^ gate of uJzi^ (A) Define s can now write the remainder term -iX <fc(A) = -W(A)0(e- )K,i,W We sequence ^^,1 (A) = l^ (A) (5) 't/i.p The properties of /,/, determine the asymptotic distribution of the instrumental (A) The next lemma ables estimator. (A) gives a representation of operators. This shows that the smoothness of l^ (A) vari- terms of convolution inherited from the smoothness of AR(p) spectrum. the Lemma 4.1. 
Let l^^k (A) sumption (A-1) and assume that Irj is (A) in /,/, = (A) -t- 1^ 1 ^^'^^ otk —^ a{z)* has = ZT=-oo ^j^''^' (A) all {fa * v) (A) and -I- -^r) {(f), A) . 8.10, that Ir, (A) G . Also Then ^ = (^ - ^) ^^^h a, The properties of For fa & L} [— 7r,7r] and fj. satisfying As- et characteristic roots outside the unit circle. Using the convolution operator 4. , . B Proof. See Appendix Remark (e?e?_fc) (—A) can be represented as J —IT where fa = E being the coefficients of the power series expansion of 4>{z)~ ipj = (f){z) l^ (A) = E^o ^1^7^"'^^^'^''^ C'' [-7r,7r] rj E C^ Irj * a compact notation for (A) [—-n,-n] can it l^i (A) is now be determined from FoUand follows from Ij^ (A) = those of fa (1984), Theorem oo implying that J] j=0 \j\ < OO. While fa e L^ [-7r,7r] is «fc+j it is not necessary. Alternatively if a^ > a > for some sufRcient to obtain this result a and all k then sup a^. | | < a ^ < oo and Yl'^o These arguments show that central limit theorem in Appendix A. if f] E C"^ [— 7r,7r] The DFT . Irj oik+j \j\ < a-'^T=okj\\j\ <oo (A) is sufficiently smooth to apply the representation of the discrete Fourier transforms of the instruments in terms of the for the data allows to obtain a frequency domain version of 4> without the need to go through an explicit calculation of the instruments in the time domain. The approximation relies on the fact that convolutions in the time domain are transformed into multiplications 8 = domain and the by (f){e'^^). in the frequency multiphcation of The fact that the residuals W lOz,, by k periods leading to transforms in the following u, ^z, (A) cl^{e-'^)uy (A) l^ (A) (A) ,-tAp where Rn(X) = e~*^'^w^^ (A) way -iX a (A)* ©i?^ (A) and © is the element by element product. = = e'^PcOz, where the symmetry of Rn{X), (6) The correspond- is (-A) e'^cvz, u>z (A)* + '^z, (A) ing expression for the conjugate transpose of Uz (A) + R^{-X) 4,{e'^)u;y{X)*l^{-X) (7) (-A) Z^ (A) is used. 
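The computational point about FFT algorithms can be made concrete with a short sketch. It assumes the normalization $\omega_y(\lambda_j) = n^{-1/2}\sum_{t=1}^{n} y_t e^{-it\lambda_j}$ evaluated at the Fourier frequencies $\lambda_j = 2\pi j/n$; the function names are hypothetical. The direct sum costs $O(n^2)$ operations, while the FFT version costs $O(n \log n)$ and returns the same values.

```python
import numpy as np


def dft_direct(y):
    """n^{-1/2} * sum_{t=1}^n y_t exp(-i t lambda_j) at lambda_j = 2*pi*j/n, O(n^2)."""
    n = len(y)
    t = np.arange(1, n + 1)
    lam = 2.0 * np.pi * np.arange(n) / n
    return np.exp(-1j * np.outer(lam, t)) @ y / np.sqrt(n)


def dft_fft(y):
    """Same transform via the FFT, O(n log n).
    np.fft.fft sums over t = 0..n-1, so shifting the time index to t = 1..n
    multiplies each coefficient by exp(-i lambda_j)."""
    n = len(y)
    lam = 2.0 * np.pi * np.arange(n) / n
    return np.exp(-1j * lam) * np.fft.fft(y) / np.sqrt(n)


def periodogram(y):
    """I_{n,yy}(lambda_j) = |omega_y(lambda_j)|^2 at the Fourier frequencies."""
    return np.abs(dft_fft(y)) ** 2


rng = np.random.default_rng(1)
y = rng.standard_normal(256)
max_gap = np.max(np.abs(dft_direct(y) - dft_fft(y)))
print(max_gap)
```

By Parseval's identity the average of the periodogram ordinates over the Fourier frequencies equals the sample second moment of the data, which gives a quick sanity check on the normalization.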
The residuals can be computed by a simple multiplication of $\omega_y(\lambda)$ by $\phi(e^{-i\lambda})$, and the discrete Fourier transform of the instrument matrix is then obtained from (4) by lagging each instrument and stacking the resulting transforms. The discrete Fourier transform approximation for $z_t$ introduced at the beginning of this section produces an asymptotically equivalent estimator based on the periodogram of $y_t$ and an unknown filter. It is convenient to define

$$h^{a}(\phi, \lambda) = \mathrm{Re}\left[ l_\eta(-\lambda)\, \phi(e^{i\lambda})\, a(-\lambda) \right]$$

and

$$h(\phi, \lambda) = \mathrm{Re}\left[ l_\eta(-\lambda)\, \phi(e^{i\lambda}) \right].$$

By substituting for equations (6) and (7), (3) can be approximated by

$$\tilde{\phi}(h_0) = \left[ \int_{-\pi}^{\pi} I_{n,yy}(\lambda)\, h^{a}(\phi, \lambda)\, d\lambda \right]^{-1} \int_{-\pi}^{\pi} I_{n,yy}(\lambda)\, h(\phi, \lambda)\, d\lambda. \qquad (8)$$

It is shown in the proof of Proposition (4.2) that $\tilde{\phi}(h_0) - \hat{\phi} = o_p(n^{-1/2})$ and

$$\tilde{\phi}(h_0) - \phi_0 = \left[ \int_{-\pi}^{\pi} I_{n,yy}(\lambda)\, h^{a}(\phi, \lambda)\, d\lambda \right]^{-1} \int_{-\pi}^{\pi} I_{n,\varepsilon\varepsilon}(\lambda)\, l_\eta(\lambda)\, d\lambda + o_p(n^{-1/2})$$

such that consistency follows from ergodicity and the fact that

$$E \int_{-\pi}^{\pi} I_{n,\varepsilon\varepsilon}(\lambda)\, l_\eta(\lambda)\, d\lambda = \sigma^2 \int_{-\pi}^{\pi} l_\eta(\lambda)\, d\lambda = 0.$$

It is transparent from equation (8) that $\tilde{\phi}(h_0)$ is infeasible as it stands, since it depends on knowledge of the true parameter values $\phi$ and the correlation structure of the squared errors. Feasible versions of $\tilde{\phi}(h_0)$ will be discussed in Section 5 below.

Under the assumption that the weight matrix $\mathrm{Re}[l_\eta(-\lambda)\phi(e^{i\lambda})]$ is known, the asymptotic distribution of $\tilde{\phi}(h_0)$ is now a straightforward consequence of Lemmas (A.2) and (A.4). The next proposition summarizes this result.

Proposition 4.2. Let $\phi(L) y_t = \varepsilon_t$ where all roots of $\phi(L)$ are outside the unit circle. If $\varepsilon_t$ satisfies Assumption (A-1) then for $\hat{\phi}$ defined in (1) and $\tilde{\phi}(h_0)$ defined in (8) we have $\tilde{\phi}(h_0) - \hat{\phi} = o_p(n^{-1/2})$ and

$$\sqrt{n}\,(\tilde{\phi}(h_0) - \phi_0) \Rightarrow N(0, \sigma^{-4}\Xi)$$

where $\Xi$ is defined in (3).

Proof. See Appendix B.

The remainder of the paper will now be concerned with the construction of a semiparametric estimator with the same distribution as $\hat{\phi}$.

5. Adaptive Estimation

To develop a feasible efficient IV procedure, it has to be established that $h(\phi, \lambda) = \mathrm{Re}[l_\eta(\lambda)\phi(e^{-i\lambda})]$ and $h^{a}(\phi, \lambda) = \mathrm{Re}[l_\eta(\lambda)\phi(e^{-i\lambda}) a(\lambda)]$ can be replaced by consistent estimates without affecting the limiting properties of the estimator. A semiparametric estimator having this property is called adaptive. No confusion should arise between this use of the terminology and the literature on feasible local minimax estimators such as Bickel (1982), Kreiss (1987), Linton (1993) and Steigerwald (1994). The main difference, apart from efficiency issues, is the fact that here a nonparametric correction to the criterion function is made while the local minimax literature makes a nonparametric one step Newton Raphson improvement to a consistent first stage estimator.

Different approaches to prove adaptiveness are used in the semiparametric literature. Direct calculation is used in Robinson (1987, 1988) in the context of iid models and partially linear models and by Hidalgo (1992) in the context of time series regression models. Newey (1991) applies similar techniques as Robinson (1987) to the instrumental variables case for iid data. Andrews (1994) develops a general methodology based on stochastic equicontinuity arguments and applies it to the partially linear framework. Andrews' approach will be used here to break the proof into two parts. First, it is shown that a nonparametric estimate $\hat{h}(\hat{\phi}, \lambda)$ converges to $h(\phi_0, \lambda)$ uniformly with probability one. The second step is to establish that, uniformly in a shrinking neighborhood of the true filter $h(\phi_0, \lambda)$, the distribution of an estimator is arbitrarily close to the distribution of the estimator based on the true filter. This argument will now be formalized.

Let $l_\eta : [-\pi, \pi] \to \mathbb{C}^p$ and $\phi : [-\pi, \pi] \to \mathbb{C}$ where $\mathbb{C}$ is the complex plane. Then introduce a set of functions $\mathcal{H}$ defined as

$$\mathcal{H} = \left\{ h : [-\pi, \pi] \to \mathbb{R}^p \;\middle|\; h = \mathrm{Re}\left[ l_\eta(-\lambda)\phi(e^{i\lambda}) \right];\ \mathrm{Re}[l_\eta(-\lambda)],\ \mathrm{Re}[\phi(e^{i\lambda})] \in C^{2}[-\pi, \pi] \right\} \qquad (1)$$

where $C^{k}[-\pi, \pi]$ denotes the space of $k$ times continuously differentiable functions. The norm used on this space is the $L^{\infty}$ Sobolev norm of order one.
Define norm of order one as 1= sup 11/ (A) 11+ sup A£[-7r,7r] AG[-7r,7r] 10 I^W where ||.|| on 7^ as is the Euclidean matrix norm = p{hi,h2) {HjP) is a complete metric space. Lemma We proceed (4.1) that l^ A) ((/>, G C" defined by - ||i,/,,i If . + li,,2\\l — 1101 (trAA*) - ' . Introduce the metric ?f'2lll given by (1) and is (f) [-tt,it] \\A\\ /,/, by (5) then it follows from Therefore h{(t>,X) G H. by defining the estimator for h (</>, A) We have established that we can = {Y_iY-i)~^Y-iY or from its frequency obtain a consistent estimate 4> for example from function of some fixed parameter value before. Residuals as a domain analog introduced (p are obtained as in Kreiss (1987) from . cf) e« {<f^) = such that the estimated error measurable part (0 — et £t ((/'o) {(f)) n (</>- 0o) -^yt-p) {yt-i, We form the following t=p+k+l else dr, > where the sequence d„ for all n with dn = O {n ^1'^+^) for is < some z^ < 1/2. The dn are used to avoid "too large" values for a^^(^). Truncation was More introduced by Bickel (1982) in the context of score estimation. our context J-'t-i statistics. = oikQ)) truncation numbers (2) can be decomposed into the true error and the ,yt-p)- {yt-i, (po) + closely related to Hidalgo's (1992) semiparametric frequency domain estimator. Simulation experiments indicate that the truncation playes no role in practice and can therefore be ignored in applications. Next, an estimate for b^ = (27r)~^ J^^ A) e^^^dX 77 {4>, is The needed. vector hk contains we the impulse response function of the AR{p) model evaluated at different points. Here want to express definition of 6fc 77 b^ directly as a function of the underlying yli?-parameters. (0, A) in (2) 4)~'^ and the expansion {z) = with ip^j Yl''p4>,j'^^ = From for j the < 0, can be written as bk '^4>,k-p where the all s > coefficients ip^^j satisfy the recursion ip^^ and tp^^Q coefficients of the = 1 Let (see Kreiss, 1987). polynomial expansion of (f)~^ ifJ^ (z) . 
— (pi^p^^s-i This vector " 0<^.o <^1 1 <?^2 -01 — (t>p4'4,,s-p is = first ^o^ p + 1 the solution to " " " 1 — denote the vector of the 1 ^<t>A (3) 1 - 11 . ^*,P . _ _ which denoted by is defined in pxp+1 (3). t/)? Then, = ^~^ei where ei is the first unit vector bp+i denote the vector of coefficients let selector matrix ^i picking the last p elements from a p and ^o ^p^^2 +1 x 1 $ is the matrix '^4>,p+\- vector Using a we have 1 hp+i = Si^l = r^5i$-^ei 1 <t>V where we define T^ = (/>! if p = 1. In a similar way we obtain bp+s = T^Si^ ^ei. The vectors h\ .b^ can now be expressed as functions of the underlying parameters by . . bk where the convention T9 = with the indicator function coefficient b^ is Ip = is {.} = 5p,fcr("^"'t°''^-^1^5i$-iei assumed and the 1 if selector matrix Sp^k the expression inside the bracket continuous in the underlying parameters for all finite is is defined by true. The Fourier k and can therefore be consistently estimated from a consistent estimate ^. A nonparametric estimate of h (cj), A) is now defined as n—p—l and hr (^,.A No =Re (A)0(e -i\ (4) is needed. The reason is, that h (</>, A) is already a conbounded sequence and a twice continuously differentiable function. In implicitly contain a bandwidth since for every inside the stationary region additional kernel smoothing volution between a fact, the bk (f) they will decay to zero quickly. We need the following matrix h^{4',X), whose elements are continuous func^) and which is defined by will also tions of hn{4'-, h^^ The (^, a) = Re [l^ (A) ^ (e-^^) a (A) success of a semiparametric estimator depends on the ability to uniformly estimate the weights aj^. Additional assumptions about the moments of the driving error process on fourth moments such conditions necessarily involve higher than fourth moments. Here we prove uniform convergence by a mean square argument which necessitates summability assumptions on eighth moments. The following assumption is sufficient to prove the main result. 
are needed to assure this. Since aj depends 12 Assumption B-1. Let Cs.,.e (ii, . . . , ifc-i) he the k-th order cumulant of the error process Then et. ^(1 + \tj\) \c,,„, {ti,..., tk-i)\ < • J]] tl = oo, for all j 1, ..., k - 1 and k = 2, 3, ..,8 t-k-l Assumption (B-1) implies that higher order cumulant spectra assumption enables us to state the following Proposition 5.1. Let hn{4>n^ -^) he as deRned assume that ^„ -^ (pQ in probability. Then sup of order eight exist. This result. Assumptions (A-1, B-1) hold and in (4), let hn{(t>n^>^)-h{(t)Q,X) =Op(l) AG[-7r,7r] ^ n — Proof. See Appendix B We proceed to define the semiparametric estimator ^„{hn) by replacing Hq with a nonparametric estimate (4) We will establish that = n as P Also oo. [hn{(l>n-> P (p ^'H] -^ ^) 1 f /i„((/)„, as A), /i ((/>o, A) >^) — any for > ) (5 > as > oo and ^ oo. n /i((/)q, A) . Vn By applying Lemma (A. 2) In,yy (A) h it ((/), (^^„(^n) - (ho)) 4>n = can be shown that ior h A) = In,yy (A) Rc Op (1) (5) . eH [l^ (A) (/.(e-'^)a (A)] +In,ee (A) h^^ (0, \) + h ((/., (6) A) i?„ (A) where the remainder term Rn (A) is such that ^Jn f Rn{X)'^ (A) dX = Op uous function (A) with absolutely summable Fourier coefficients. Let (1) for (7) any contin- <; h^^ such that h^^ {(pQ, A) then follows = Re = Re [l^ (—A)] = V (5) icf>,X) (^, a) Ir/ = Re [l^ (-A) (A) and [i^ (-A) 4>{e'^)<t>^\e'^) 4>{e'^)<Po'i<^'^) if £^ In,yy (A) (A^(^„, A) - h^ (</>o, A)) dX = Op{l) (8) and j^In,e.{X){h^^Q>^,X)-lr,{X))dX = Op{l) /l„(A)(/i„(^„,A)-/i(<^o,A))c!A = Op{l). 13 (9) (10) (8) can be established easily with the help of Proposition ^" - /„,,, (A) (/i^(,^„, A) < sup r In,yy{X)dX /l^(0„,A)-^-((^O,A) J —n ,^A^ sup 2 Ae[-7r,7r] last h = = {a a){b b) expression goes to zero by (5.1) (9) we work with hn{4>n, A), is a positive scalar and the second where a and b are two conformable vectors. The and the fact that sup;^£t_ ^.^i ||a(A)|| is bounded. 
To inequality uses the fact, that In,yy (A) first inequality uses tr{ab ba) prove ||a(A)||%j,(0)^0 sup /V(-A)^(e^^)-Z^,o(-A).^o(e Ae[—tt.tt] where the by the following argument h^ (00, A)) d\ Ae(-7r,7r] < (5.1) 0^ Vn = cP{e'^), = (h) the metric space {H,p) defined in = Me'^) ct>^ y/n (1). Also let Hq - Hq) dX = h{(j)Q,X) , and In,ee (A) {h^^ {(f), A) - /,,,o) + Rn (A) (h J —TT for (5 h G H. Following Andrews (1994), > any given ??, > e limsupPf Vn (hn) - Vn (ho) > < limsupPf Vn (hn) - Vn (ho) > &,hn€H,p{hn,ho) <6 ^ n—>(x> + lim sup P < there exists a 1?) ' (hn ^ 7i oi p{hn, ho) sup \\vn {h) > 6 — f„ (/io)|| > > =0. i? | < £- \h€n,p{hr,M)<i we have established in Proposition (5.1) that lim sup n—too Therefore ^ hmsup P ^-"^ Since (9) follows if for such that P (Jin^H V or p (hn,hQ] ^ 6] ^ if limsupP "-°° sup ll^n (/i) — '"n (^o)ll > <^ 'i^ (11) I yft.e-H,p(/i„,/io)«5 then the following theorem can be established. ^ Theorem 5.2. Let hn{^ny^) defined in (4). Let assumption (A-1) hold and let ^„ be a previous estimator for which ^„ —> (pQ in probability or almost surely. Then, the semiparametric estimator 4>(hn) defined by ^ [hn = In,yy{X)hlQ)^,X)dX ) •TT / J 14 ./— TT /„,yy (A) /i„(^„, A)dA . has a limiting distribution characterized by ^ (4,{hn) - 4>o) ^N{0,a-^E) . Proof. See Appendix B This result establishes the feasibility of a semiparametric estimator that improves on the efficiency of the conventional Gaussian estimator in the presence of higher order depen- The frequency domain dence. representation allows to avoid estimating the instruments each observation in the sample. Instead an optimal for filter applied to the periodogram of the data leads to an asymptotically equivalent procedure. 
Moreover, the fact that the optimal a convolution integral in the frequency filter itself is domain solves the problem of truncating the approximation of the optimal instrument at a given lag in a natural and elegant way and eliminates the need for lag truncation parameters for the number of instruments used. Monte Carlo Simulations 6. Monte Carlo experiment is reported. To keep the exposition as simple on an AR{1) model. We consider what the efficiency gains/losses of In this section a small as possible the IV we focus estimator are relative to a correctly specified likelihood procedure and relative to OLS. The following questions are of interest: estimator achieve efficiency gains, how much how Under what circumstances does the optimal IV QMLE and big are they relative to the Gaussian by not specifying the true likelihood. These questions are analyzed for mechanism is an ARCH{1) process. generate samples of size n = 256, n = 512 and n = 1024 from the following model is lost the case where the true generating We yt where Ut ~ et is generated by the ARCH{1) A''(0,1). Starting values are yo = = + et (fyyt-i process and ej eq (i) = Ufh/ where ht = 'Yq + 7i£t_i with = 0. Small sample properties of three be defined below are evaluated for different values of 0, 7;^ G [0, 1) from Milhoj (1985) that asymptotic normality established in previous chapters different estimators to It is clear only obtains for values of 7^ € [0, -^1/3). Nevertheless, simulation results are reported for parametrizations outside this interval in order to analyze the robustness of the proposed IV procedure to departures from the assumptions. The parameter 70 is fixed at .1 for all experiments. The parameter is (f) ^ mator is is The least squares esti- = Yl^=2ytyt-i/Y17=2yt-i- ^^^ optimal instrumental variables obtained from the consistent first stage estimator 4>oLS ^^ denoted by 0„ estimator estimated by three different estimators. 
$$\hat\phi_n^{IV} = \left[\int_{-\pi}^{\pi} I_{n,yy}(\lambda)\, h^{\phi}(\hat\phi_n^{OLS}, \lambda)\, d\lambda\right]^{-1} \int_{-\pi}^{\pi} I_{n,yy}(\lambda)\, h(\hat\phi_n^{OLS}, \lambda)\, d\lambda$$

where $h^{\phi}(\phi, \lambda)$ and $h(\phi, \lambda)$ are computed as explained in Section 5. If the data are generated by (1), the likelihood estimator $\hat\phi_n^{ML}$ is obtained from maximizing

$$l(\phi, \gamma_0, \gamma_1) = -\frac{1}{2n} \sum_{t=1}^{n} \left( \ln h_t + \frac{\varepsilon_t^2}{h_t} \right) \qquad (2)$$

with $\varepsilon_t = y_t - \phi y_{t-1}$ and $h_t = \gamma_0 + \gamma_1 \varepsilon_{t-1}^2$. We use the BHHH algorithm described in Engle (1982) to maximize the likelihood.

Figure 1

Figure 1 shows the potential efficiency gains of the IV estimator relative to the Gaussian QMLE as a function of the autoregressive parameter $\phi$. The efficiency gains are computed from the asymptotic covariance matrix when the generating mechanism is (1). More explicitly, the asymptotic covariance matrix of $\hat\phi_n^{OLS}$ can be expressed as

$$\sigma^2_{OLS}(\phi, \gamma_0, \gamma_1) = \frac{(1 - \phi^2)^2}{\sigma^4} \sum_{i=0}^{\infty} \phi^{2i} \alpha_{i+1} \qquad (3)$$

where $\sigma^4 = (\gamma_0 / (1 - \gamma_1))^2$ and $\alpha_{i+1} = 2\gamma_0^2 \gamma_1^{i+1} / [(1 - \gamma_1)^2 (1 - 3\gamma_1^2)] + \sigma^4$. The asymptotic covariance matrix for the optimal IV estimator can be obtained from (3). It is given by

$$\sigma^2_{IV}(\phi, \gamma_0, \gamma_1) = \left[ \sigma^4 \sum_{i=0}^{\infty} \frac{\phi^{2i}}{\alpha_{i+1}} \right]^{-1} \qquad (4)$$

Figure 1 plots $\sigma^2_{IV}(\phi, .1, \gamma_1) / \sigma^2_{OLS}(\phi, .1, \gamma_1)$ for $\phi \in [0, 1)$ and different values of $\gamma_1$. These theoretical gains are contrasted to the empirical efficiency of the estimators $\hat\phi_n^{OLS}$, $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ based on 3000 replications for sample sizes 256, 512 and 1024. The results are summarized in Table 1.

Table 1

As expected, gains for the IV estimator are achieved for models where the autoregressive parameter is above .5. This conforms with the theoretical analysis based on asymptotic approximations. For the sample sizes considered here, the theoretical efficiency gains are not achieved completely. The table shows that the relative efficiency of the estimator improves with the sample size. The most significant increase takes place from size 256 to 512. It is also interesting to note that the IV procedure maintains its properties even for values of $\gamma_1 > \sqrt{1/3}$. In fact the gains are strongest when both autocorrelation and dependence in the conditional variance are strong.
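The theoretical ratio plotted in Figure 1 can be evaluated numerically from the two asymptotic variance expressions (3) and (4). The following is a minimal sketch, assuming the ARCH(1) moment $\alpha_j = 2\gamma_0^2\gamma_1^j/[(1-\gamma_1)^2(1-3\gamma_1^2)] + \sigma^4$ with $\sigma^2 = \gamma_0/(1-\gamma_1)$, the OLS variance $(1-\phi^2)^2\sigma^{-4}\sum_i \phi^{2i}\alpha_{i+1}$ and the IV variance $[\sigma^4\sum_i \phi^{2i}/\alpha_{i+1}]^{-1}$. The function names and the truncation point `terms` of the infinite sums are illustrative choices, not part of the paper.

```python
import numpy as np


def alpha(j, gamma0, gamma1):
    """E[eps_t^2 * eps_{t-j}^2] for ARCH(1) errors, j >= 1.

    Requires a finite fourth moment, i.e. 3 * gamma1**2 < 1.
    """
    if not 0.0 <= gamma1 < np.sqrt(1.0 / 3.0):
        raise ValueError("need 0 <= gamma1 < sqrt(1/3) for a finite fourth moment")
    sigma2 = gamma0 / (1.0 - gamma1)
    var_eps2 = 2.0 * gamma0**2 / ((1.0 - gamma1) ** 2 * (1.0 - 3.0 * gamma1**2))
    return var_eps2 * gamma1 ** np.asarray(j) + sigma2**2


def avar_ols(phi, gamma0, gamma1, terms=2000):
    """Asymptotic variance of OLS, with the infinite sum truncated (assumption)."""
    i = np.arange(terms)
    sigma4 = (gamma0 / (1.0 - gamma1)) ** 2
    return (1.0 - phi**2) ** 2 / sigma4 * np.sum(phi ** (2 * i) * alpha(i + 1, gamma0, gamma1))


def avar_iv(phi, gamma0, gamma1, terms=2000):
    """Asymptotic variance of the optimal IV estimator, same truncation."""
    i = np.arange(terms)
    sigma4 = (gamma0 / (1.0 - gamma1)) ** 2
    return 1.0 / (sigma4 * np.sum(phi ** (2 * i) / alpha(i + 1, gamma0, gamma1)))


def efficiency_ratio(phi, gamma0, gamma1):
    """Ratio of IV to OLS asymptotic variance; values below 1 indicate IV gains."""
    return avar_iv(phi, gamma0, gamma1) / avar_ols(phi, gamma0, gamma1)


if __name__ == "__main__":
    for g1 in (0.0, 0.3, 0.5):
        row = [efficiency_ratio(p, 0.1, g1) for p in (0.0, 0.5, 0.7, 0.9)]
        print(g1, np.round(row, 4))
```

Under iid errors ($\gamma_1 = 0$) both expressions reduce to $1 - \phi^2$, and at $\phi = 0$ the ratio equals one for any admissible $\gamma_1$, which matches the observation that gains appear only when the autoregressive parameter is large.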
Figure 2 shows the empirical densities of the three estimators $\hat\phi_n^{OLS}$, $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ when no ARCH effects are present.

Figure 2

The graph confirms the information summarized in the tables: the three estimators are identical under iid conditions. Figure 3 shows the empirical distributions of $\hat\phi_n^{OLS}$, $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ for a sample size of 1024 when $\phi = .9$ and $\gamma_1 = .9$.

Figure 3

Here $\hat\phi_n^{ML}$ clearly dominates the two other estimators in terms of efficiency and mean and median unbiasedness. The IV estimator has surprisingly good properties even though the asymptotic theory used for its construction does not hold for this set of parameter values. Table 2 contains the means and medians of $\hat\phi_n^{OLS}$, $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ when $n = 512$, based on 3000 replications.

Table 2

The bias tends to be largest for the IV estimator, but the difference between $\hat\phi_n^{IV}$ and $\hat\phi_n^{OLS}$ is smaller than the difference of the former with $\hat\phi_n^{ML}$. The bias for $\hat\phi_n^{OLS}$ and $\hat\phi_n^{IV}$ increases with $\gamma_1$. For a fixed $\gamma_1$, it is largest when $\phi = .5$. The bias of the ML estimator, on the other hand, is little affected by the parametrization of the model.

7. Conclusions

This paper develops efficient IV estimators for autoregressive models with martingale difference innovations. Popular parametric examples of such innovation processes are ARCH, GARCH and stochastic volatility models. A vast empirical literature documents the presence of these effects in many macroeconomic and financial time series.

The paper shows that estimation of the autoregressive parameters by standard Gaussian ML techniques leads to inefficient estimators. An important result in Kuersteiner (1997) is that GMM estimators based on lagged instruments improve efficiency and therefore dominate OLS. It is shown how to construct the best GMM estimator based on instruments that are linear in past observations.

A common problem of GMM procedures based on a large number of instruments is their bad small sample performance. This is mainly due to estimates of a high dimensional weight matrix.
We introduce a novel decomposition of the weight matrix which leads to an their orthogonahzation of the instrument space. This decomposition has the advantage that it implemented by using FFT algorithms. Moreover, the estimator does not require a bandwidth choice for the number of instruments used and is can be computationally efficiently therefore straight forward to use in practice. The small sample properties of the procedures developed are very promising. mators are equivalent to conventional OLS The esti- even when the innovations are iid and strictly dominate OLS when the innovations are conditionally heteroskedastic. These results hold for sample sizes as small as 250 observations and are therefore relevant for macroeconomic time series. 17 A. Appendix Lemma - Lemmas A.l. Under Assumption (A-1) the following statements hold: a) for each m 6 N+\ {1} m fixed, the vector , -^ Yl^=i i^t^t-i, ,£t£t-m] =^ N{0,Q) with a{l)+a^ n= (T{m) + (7^ Leram.a A. 2. Let In,yy (A) be the periodogram of {yi,. ,yn} and /„_££ odogram of {ei ,...,£„} Assume et satisfy Assumption {A — 1) and that yt . . (A) is the peri- = YI'jLq i^j^t-j with spectral density ^gyyiPot^) such that Yl'jLo |'0j| \j\ < o°- Let (.) be any continuous tt] —^M. with absolutely summable Fourier coefficients {zk,—oo <k < oo} <; function on [— IT, then for any 77, P(y^ as n e , > J In,yy (A) <; {\) dX - ^J /„,,, (A) <e dX >r/ gyy{pQ, X)q (A) 00. Proof. The proof is the same minor modifications. Lemma A. 3. (Billingsley) pose that for each m Xr, p. 388, with Let Xmn, Yn be random variables defined on {Q,,J^,P) Xm —^ 1 and Davis (1987), as the proof in Brockwell as — n 00 and that > Xm —^ X as m . Sup- -^ 00. Suppose further that hmsup lim for each positive Lemma e. A. 4. Let Then Yn ^>- X as P {\Xmn - i^n| > n —f 00. In,yy{X) be the periodogram of {yi,. riodogram of {£!,...,£„}. Suppose the Y^'jLoi'j^t-j with spectral density any continuous even function on = e} satisfy et . . 
,yn} and Assumption {A — [—n,Tr] such that fc=i and f^^ q (A) QyyiPo, X)dX = 0, then In,yy (A) ? (A) dA T with bk A iV 0, \ = Ejl-oo 7yy {k - j) Zj. 18 I) is the pe- and that yt = that Yl'jLo °°- ^^^ "^(O ^^ l^il bl < with Fourier coefficients {zk, —00 < k < 00} ^gyyiPoA) such —> M /n,ee (A) 4 ^ t,_i fc=i Qfet^ , B. Appendix - Proofs Proof of Lemma 4.1 — —T Y^°^_ J < First oo- we show that /„ Next note (A) for a typical € Li which follows from J^ [—tt,7t] \fa (A)| element k —n oo /jr -•^ oo oo oo j=0 l=-oo j=0 /=-oo oo oo I / oo ^ I oo gJA(j+fc) such that the result follows. Proof of Proposition 4.2 Using equation can be written as Re[In,zy W] = +4>{e-'^)li. (4) In,yy (A) Re[h^{(f), (-A) u;, i-X) Rl^ and the A)](?!) (A) definition for h^{(f)., A), /n,2y (A) + /^ (A) In,ee (X) + Re[Rn (A)] + l^ (-A) where |<^(e'^)|' |i?^_^ (A)|' +i?i,^(-A)i?2(A) and a;,(A) The two remainder terms R^ ^ quences uct i?^ Un,j (A) g = rne"^>e(A)+<^(A). and R^ (A) (A) are defined next. First define the se- = {ipQ,ip-y,...} and Tpik) = {^o/<^fc! V'l/^i+A:) •••}• Then let the inner prod= -^Y^'jLoSj^~^^''Un,j be defined for a sequence s = {so,si,...} where = Yl7=i-j £te*'^* — Y17=i ^te'^*. We can now write the remainder term R^ (A) in a Tp W (A) compact form Rl where R^ - (A) ' = (A) = -l^ (A) 0(e-^^)Hi,^(A) ^(uW' ...,R^ [i?-^ + a (X)* © i?;^_- (A) such that i?^ (A) r, ,(A)] is a p x vector. This leads 1 ' to Re[In,zyiX)]dX = / + In,yyiX)Re[h%(j),X)]dX(f> /" Ir^ (A) /„,.. (A) + dA J —ir Alternatively one has also Re[In,yz (A) a (—A)] with RP, (A) = (l)-\e-^^)uj, (A) Rl i-X) + r Re[i?„ (A)]dA. (1) J —TV = Rj, (A) 19 In,yy (A) Re[/i^((?!>, A)] R^ (-A) Now . 
if -I- Re[i?^ (A) /Re [i?^ a(— A)] (A)] <r(A)dA = dX = ^/^), Op (n (f) + where has absolutely summable Fourier coefficients, <;(A) follows that it (f) = Opin^^f^) and -1 In,yy{X)Re[h''{cj,,X)]dX Vn[j)-(j)j x-v/n r (A) In,.e (A) c?A Ir, Since l^ (A) G C'^ [-7r,7r] and (?!>(e^'^) G J —TV C'^ [-tTjTt] / /•IT ^^(A)J„,eaA)tiA^7V V^J where the matrix Ym^i '^i W {^n > s'^') follows it is Lemma (A. 4) that A (A) {^n , i\l , has typical element e*'^') = J2'^J+kvl'l'4-,J%,J+\k-l\- j=0 k,l (4.1) from OO 1=1 Lemma it 0,^a,(z,(A),e'^^)(z,(A),e'^^) J^ai{l^iX),e'^')(lr,{X),e Using T Re[i?„ {X)]d\ + J — TT easy to show that 5^a,(/,(A),e^^A(z,(A),e'^^- faiX-Ov{<l>,Ov{^,X) d^dX + -, / 77 a'* a A) 77 (<^, A) dX ^/n X^tt -^ (A) dX = Op (1) We discuss the four different types of remainder terms separately. Since Z^ (A) and Z^ (A) (j){e^^) have absolutely summable The result then follows if . Fourier coefficients and are uniformly bounded on [— 7r,7r] the proof of be applied to the remainder terms involving R\ ^ Lemma To show that -Jnj^^(f)~^{e^^)u)^{—X)R'^{X)dX = Op(l) is it enough to show that element by element Ey/n [%-'ie^^)u,{-X)Rl,,{X)dX 0. J —IT Using the definition of /?^ (A) EV^ one has r r' (e^^)'^e J — IT < (- A) Rlf, (A) dX E^ Jr l^^k{X)ct>'\e'^)<p{e-'^)u,{-X)R\^^{X)dX —IT +Ey/n £ r' (f") -. (-A) n- V2e-A'=iii 20 (A. 2) can (A) alone. (A) - ^^^ dX (e ''^) and applying term can be analyzed by setting q {X) = l^ (A) (^ ^ (e*"^) the proof of Lemma (A. 2). Next look at a typical element k of the second term The first (f) i^i EV^ ^ / ^ -^ Then for j, Z, and r oo oo J ,, n EE E E S^'. (='- - -"--) --"<'-'-'*"<iA j=0 1=0 r=l t=l "'=+J fixed such that l<r — k + l<n 1/2 E fc+/ Now summing i, over '-"j-n-k+l m gives j, U^) u>, i-X) n-V2 Y^ ^.e-'^Wnj This holds true 1/2 ' for all k = l,...,p. (A) ? (A) dX <sup24/vv2j;j: ^,-^/ j=0 /=0 "fc+j lJl-0. 
Next consider r Rli-X)Rl,{X)dX J—TT Vfc(A)0(e-'^)|i?i(A)|'dA r R\ (-A) n-1/2 V Ae-^(^+'=)C/„,, +Ey/n ^ V-TT such that the first OLj term goes to zero by the proof of (A) dA +k Lemma The second term then (A. 2). is OO E^ / y J^e-Mi+fc)c/^ Ri (_A) n-1/2 . (A) dX /TT < En (fA -TT < n-^l'' '^ /TT 1=0 j=0 EE "j+'= E\Un,l{->^)\\UnjW\dX /=o j=0 OO . \ 1/2 "" 1/2 < n (^it^n,,(A)r 0!j+k /TI" < n-V2 CO OO ElV'dmin(/,n)^/2E '^ Z=0 j=0 21 mm{j,n)^^^dX^O. dX The remaining terms can be shown to go to zero by the same arguments and the proofs are omitted. This completes the proof of the proposition Lemma = ^ Z7=i+p+i £ti<t>)^t-M, ""^ = Ea* {4>) with e^cp) defined satisfying Assumptions (A-1) and (B-1) and any fixed S < oo it follows that (2), £t B.l. For a* (</.) sup maxt;ar(n^/^ Proof. Let yt = ,yt-p\ and [yt, m,^ and a"^ = ai^^{n = — — p)/n. et {(pf Et-i 4>q = = fi'n,y...y{qi, 94=0 = sup \ai sup '^Y^\^g^---(J}g^\\il'^yy{qi,...,qi,l)\. <l>^N6{<l>o) n'^ Y17=i+p+l {(f)) - q^=Q ' --yt-i-qi - Eyt-q^ --yt-i-q^. Then ai^^l (2) 54=0 can be bounded by s fmax(|(?!)i_o| ^ it yt-qi p / so that write ^'Yl^i^'" ^giyt-qiyt-q2yt-l-q3yt-l-q4- p (2) we can 1 -,94,0 < Expression 0(1). p gi=0 define = vec{ct)(f)'y{yty't0yt-iy't_i)vec{(t)(f)') p Now af^\) vec{(l)cj)')'E{ytyt®yt-iy't-i)vec{(j)(j)') = ((P)'^ - Then =[l,(t)']'. </> Letting I |a,* {(()) in + ^) p p XI ' ' ' J ^91=0 ' remains to show var n |An,y...2/(9ij XI |An,2/...2/(9i> •••>94,0| 94=0 •)94)0| = 0{n~^) for all /. Now var \jj!n^y„,y\ is n X X "~^ 4 COv{yt-q^ yt-l-q^,ys-qs --ys-l-qs)- t=l+l+p s=l+k+p We "~^ can substitute n n X Yl for the definition of yj oo ^ '4'j^t-j which leads to oo X X t=l+l+ps=l+k+pji=0 = ^Jl ^J8^°^(^*-9l-Jl ^t-l-qi-H,£s~qs.-h ' ' ' ^s-k-qs-js) J8=0 (3) The covariances can be represented by eighth and lower order cumulants by considering the following matrix X{t,sJ,qi -ji,...,q8-J8) = ^t-qi-jl ^t-q2-h ^t-l-q3-J3 ^t-l~q4-J4 ^s—qz—jb ^s-qe—je ^s-l—qr-jr ^s-l-qs—js 22 . 
with typical element Xij where reference to the time indices Brillinger (1981), Theorem Then from suppressed. is 2.3.2, 4 4 coz;(JJ j=l Xi , J, n JJX2j) = X] ctim(Xij,z,j G Vs) Vs€v V j=l where cum{Xij,i,j € Vs) is the joint cumulant of all the Xij with indices the sum is over all indecomposable partitions v of the table (1,1) (1,4) (2,1) (2,4) i,j S Vs and • A given in Brillinger (1981), p.20. We note that cumulants are zero. Second cumulants are nonzero only if the time indices of et definition of indecomposable partitions all first By coincide. indecomposability of partitions there from both rows n -1 is in X in each product Ylv is one cumulant with elements at least £v^'^''^i-^i,j^'''i3 ^ Es Et co'u(n)=i ^i,j, 11^=1 ^2,j) = Es cou(nl=i ^l,j, nj=i dices for ^Y ^s)- strict stationarity ^2j) where the time in- have been normalized to zero. The summability assumption (B-1) implies that t Es<^^''^(nt=i ^ij,n7=i-^2,j) < oo uniformly in I, j\,...,js- This shows that the term (3) is of order n~^ uniformly in I as had to be shown. Proof of Proposition for any ry, 5.1 For the first part of the proposition we have to show that A^^ ((^g) is contained > e lim P{ sup hr, > (0„,A)-/i(0o,A) 77) < e. Ae[-7r,7r] This holds there if a ^ and a neighborhood is A^^ lim P( Consistency of considered. 4> hn{(l),X) implies P{(p € -h{(po,X) A^^ {4>o)) of (pQ such that — 1 > > rj) + lim P{(j) so that only the ^ Ng first {(f)o)) < e. term needs to be From hni(f),X) Re < sup sup {(J)q) parameter space and in the interior of the stationary region of the - h{(f)Q,X) f^(A)</>(e-^) l^ (A) </.(e-'^) - Re hfi{X)Ue-^^) /^,o (A) -iA> M^~n + 1 i^(-X)^{e'^)-l^^o{-X)Me'^) and l^{X)cp{e-^^j-l^,o{X)Me-'^) < kW-h,o{X) <t>{e-^^) +||Z^,o(A)|| 23 0(e-^)-(/)o(e-^^) , it is enough to show that while \(l){e'-^) To estabhsh element aj^tj) = ^ip,k — (e''^)| (f>Q sup sup A 4>€Ns{4>o) < (A) — ltp,kfi Eel{4i)el_A(j)). 
sup on Ns 57/2 sup^^sup^^g^y^^^^^^ (A) - li, Z,/, (A) Now let ctj /^,o (A) = = Op (1) Ee1e1_j, Then, using the definition of sup 'i/',fe (A) — Z^,fc,o = l^fi (A) Op (1) by uniform continuity of ^q {4>o) — - (A) bj^k it is {^^^) o^i enough to look denotes the A;-th [-t'",'?''] at a typical element of bj and l^^k (A) (A) Ae[-7r,7r](;6eiVi(<^o) sup n—p—1 oo j=i j=i sup n—p—l < sup sup Yl (^j(^) ^hk,4>-(^j,lhk,4> + (^jibj,k,ct>-aj^bj^kfi)e '^^ (4) j=i + sup A j>n—p We note that sup;^ Er>n-p cq\ks>e-''^ < sup,, afn-'/' T.T>n-pj"' \hkfi\ = o{n-'''). Next n—p—\ n—p—l < (5) n—p—l + E 3 It is therefore enough to show that (5) and (6) {oc-\bi^k,^-b,,kfl))e-''^^ (6) =1 go to zero uniformly on [— 7r,7r] as 6 — > 0. First consider (6). n—p—l sup {(f)Q,n))dn Ae[-7r,7r] < sup Ae[-7r,7r] n—p—l 1 +-4 (^ sup (7) A£[-7r,7r] 24 where, for E Ng cf) {(po) the finite Fourier approximation of , 77 {(f>, ^) converges uniformly such that n—p—1 sup yZ Ae[-7r,7r] J = sup ihk,4> - ^j,kfi) e' i\j =l (77 {cp, X)-fj (</>o, A))fc - Y, AG[-7r,7r] < ibj^k,4> - bj,k,o) e-'^' ]=n—p sup \7]{4>,X)-f]((f)Q,X)\f. + V rap 2 \bj^k,4 AGf-TT.TT < +2 e ^ sup \bj,k,<p\ 4>eNs{4>o)j=n-p Then, letting ctj = a~^ — a~^ the term first in (7) dominated by is n—p—1 < sup \Vk{4>,iA -'?fc(0o>M)M/^ Ag[— 7r,7r] J —TT j=i n—p—1 < J—n „_, j=—n+p+l , < The constant continuity of from which 1 Ce{5). n—p—1 C is bounded by X^,=i 7) it < 00. From uniform < Ejli Qj (-^o) Wo) — f] {(f>Q, ^) < e for some 5 > (cj), fj) on [— vr, tt] x A^^ ((/)q) we have \ri {(p, fi) follows that J_^ ri{(j),fx) — f] ((f)Q,fj,)\i^dfi < 2'ne. The constant C can be [. bounded by (27r^°l^ goes to zero as aij n — > dj 00. ((/>o)~ + 0-4) independent of n and sup^g^^(^^) Ejln-p Next we consider (5). First l^j.fc.^l look at Pj>-^jl From aj_0 a.s. 
and (j), = E[4>{L)yt)^ {(f){L)yt-j)'^ But, since (f){L)yt 4>{L)yt has = (f){L)(f)Q it > a^ > since otherwise (j){L)yt = an ARMA{p,p) process with parameters 0q follows that a^^^ {L)et is nonzero variance contradicting («j,0)~^ - (^J^ < Ci (f>{L)yt \aj,4, - = a.s. aj\ some constant Ci. Since Eej < 00 we can uniformly bound C2 is a finite constant depending on (f>Q and Eef. Then for Then \aj^^ n—p—1 n—p—1 j=l 25 — aj\ by 6C2 where ^ 1 where Y^=i < \^oM,<t'\ oo on Ne {(po)- Now turn to the two terms of first (4) n—p—l n—p—l < Now "j (</•) - ",i < ^ = . " ^ ^ 1 - ajj < {4>) \aj Y^ a^-,,^) ("j (0) l^i where a^ sup {(f))''^ |aj (</)) \aj {(f)) - aj^^-^ aj, <^| ^j,k,4>\ and —— - a]J + J'+P n the expected value of a* ^^.(^ is - ((^) |aj- Then . n-p—l n—p—l w— p— n First note, that since Qj^^ replace {ctj ((f)) is bounded away from zero and aj {(f>)~ < cn^/^~'", we can aj^^)"^ by n^^'^~'" times a constant that does not affect the argument. Then the second term n—p—l ^-i/2-i; ^ {j +p) \aj^^\ \bj^k,4>\ -^ n as ^ CO 3=1 uniformly on Ng ((^q) The • term now first is shown to go to zero in probability by looking at P{ The term n}''^~^ X]?=f~ ^n-p-l The last term is ni/2- sup . ^i i^) ~ Y"-^~ ^?<4 \^J i^) l^j,fc,</>l - ^U \hk,4>\ > n) dominated by ^^ n—p—l ^ zero with probability tending to one which follows form maxj P(n-^/^ sup j,g^(0 the first term is bounded by n )('-''*i d> maxj P{sup^^j^^/^ if ~ '^1 d>^ > s—^n-p-l n"' ry) — sup > 0. \ a^ By Markov's |a*(<^)-a^^^| 4>&Ns{ct>o) 1/2 < ^^YT. ^ \hk,<l>\\^ar{n^^^i 26 sup \a* {^) - - al^\)) < d„) — > inequality . The ^ from result then follows nmaxfar( This shown is It Lemma l^j.fc.^l sup it is |q;* (0) 0{n-'') — if = O (1) a"^|) . (B.l). remains to show that P(p(/i„(^„, part first in = ^ YTjZl A), /i(<;6o)'^)) enough to show that sup;^g[_^ ,r] > ^^ ^) dhn{4>ni Using the result from the 0. 
^)/dX - dh A) (</)q, /dX = Op (1) Since implies that e C^[-7r,n] f]{(l),X),^ V/i from Folland (1984), Theorem follows ~ Bernstein's Theorem, ^\j\ < 8.22e, that {ij^hj \bj {(p)\ < = oo and d^r]{(j>,-T^)^ /dX^ {(j)) =J d^i]{(i),T:)^ /dX^ (^T]((f),X)) e-'^^dX. it By oo such that ^(^j)6,(</>)e'^^->^r,(</>,A) uniformly on [— 7r,7r] the first Since that . Using these part can be applied to the {Ti.,p) is PChn and noting that facts, first lr]{X) € C^ [— vr,7r] , the proof of derivative. P (p a complete metric space ( /i„(^„, A), ^((J^iq, A) ] > 6) ^ implies eH) ^i.m Proof of Theorem the proof of Lemma 5.2 (A. 2) We start by obtaining an expression for Rn (A) in (6). we have oo ^y = (A) ct>o' (e~'^) ^. (A) + n'^/^ ^V'^-e-^^^'C/^,- (A) j=0 Letting R^ (A) In,yy{X) = n.-^/^ Yl'^o ^je'^^^Unj (A) we can write = UJy{X)Uy{-X) = ^y = In,yy (A) (A) ^y (-A) a (A) a (A) 4>o (/.q + Uy (A) u, (-A) + cjy (A) 0o + In,ee (A) <Po' +u,{X)-p^Ri{-X) + 00 (e In the same way we In,yy (A) also = [^"^) + ^e (e'^) (-A) hi < (-A) (A) \Ri{-X)\' ^-^j have In,yy (A) a (-A) <^o + ^n,.. (A) <t>o^ 27 (e^^) + o;, (A) R], (-A) From Let Rn Next write = Re (A) (-A) (A) i?i a;, = A) /iQ (</>0, Re In,yy (A) +In,ee (A) while a filter based on a different parameter = In,yy{\)h{(p,X) (A) Zr,,0 </> (-A) l^^Q + /i0|j that ^/n (<?!), j Rn coefficients. = Re A) (A) (-A) (A) dX (11) is <; Next [l^p In,yy{X)Re = Op(l) for P limsup ra—oo which in turn follows lim sup [e'^j a (-A) (pQ /iQ (</'o, (po A) i?n (A) {- X) l^ (j) + (e'^^ a h {- X) (t>Q (^, A) Rr, (A) <; for all sup \h&n,pihM)<6 i?, > e ||f„ {h) - there exists a 5 Vn > (/io)|| < 'J? 
> such that ^ J from P n->oo as (pQ i^''^)] all if (-A)I' Then by the proof of Lemma (^.2) it follows continuous (A) with absolutely summable Fourier (j){e'^)(t)Q^ estabhshed |i?i expressed as is +In,ee (A) h^, (0, X) with + (A) based on the true parameter In,yy (A) h^ {cpQ, A) for the filter In,yy (A) %^i?i + ^, (-A) sup ^/n <e In,esW{h4,,{(f>,X)-lr^fiW)dX [ (8) J — IT \h&H,p{h,ho)<6 and P lim sup n-xx, For (9) define sup all (A) [h - ho] <£. dX (9) the open neighborhood ns = {h: Now for Rn ^/n \hGn,p{h,ho)<6 h € TCs it is [-7r,7r] ^RP \h en,p{h,ho) < 6} the case that h (A) € C^[— 7r,7r]. . By Bernstein's theorem this implies — ho is also continuous and uniformly bounded with absolutely summable Fourier coefficients such that (9) follows absolute summability of the Fourier coefficients of h. In turn h from the proof of Lemma (A. 2). Next turn to (8) The main idea of the proof by parts is used to separate h from In^ee (A) . is taken from Robinson where integration - < r In,eeW{h4,,{<P,X)-lr^fiW)dX sup y/n h&Hs J —TV sup y/n h&Hs J —IT + sup r Vn {In,,e (A) - EIn,ee (A)) {h^, (</>, A) - lr,,0 EIn,ee{X){h<t>,{(l),X)-lr,fiiX))dX 28 (A)) dX Since EIn,ee (A) note that J^^ = Ir^fi cr^ it (A) dA sup y/n heUs r and /^^ sup \/n + - h,p^ {In,,e (A) {(f), EIn,ee (A)) {h^o (</), (V ^) (<^' {4>, Tt) - - (A)) ^'/.O for /i /^,0 (tI")) / J —-K (-^n.ee (A^) Q PIT / 'dX remains to show that y/n J^^ J_^ ity. Let /^gg(^) = — /„_££ (/i) " ^^^"^ ^'''° — {^n,e£ (/^) ting Hg = = ^, In,ee we (m + 71") The /j,,o (7r)|| are uniformly ~ '^^ E^n,6e (/^)) C^MC^A dX. (fJ-)) d^ dX is bounded in probabil- fj.<X /Lt> A we work with y/n J^'' n-l n - tt) (10) /q (/„,ee (f^) - Eln^ee (a^)) C^Ai dX. 
Let- show that first 27r £^/n,e£ {lA) df^^X Eln^ee (yu)and define the function 1 (yu) - -E/„,ee (ju)) C?A (</>, ^"^"'^^ / EIn,ee t{X,h) Since /„,££ dA /-A ^^'^° ^^' '^^ d^ It " ||/i^g (<?!', sup y/n hens (A)) /^,o {In,ee (m) / A) - lr,fl (A))|| and ein Tis both ||^ (/i^^, bounded by C6 for some constant C < oo such that Now - A) Now r\ -Q^ sup y/n h&He 0. (V, Q J^^ (h^^ {(p, A) - Irjfl (A)) dX. Next use integration by parts is a'^ = dX A) J — IT TT < term follows that the last = V „27r r(A, /xjl°,ee (Ms) - / ^(A, M)^n,e£ (A^) ^/^ dX = Op(l). inner integral can be split into a part "-1 " f27r -^0 5=1 "-1 /•27r(s+l)/n = X] [(7"(A,AXs) / - T{X,ll))In,s6 ifJ-s) + r{X,IJ.)(In,,£ (/J-s) - In,ee {^J'))] dfM r2iT/n + T(X,n)In,eei^)dlX (11) Jo and o-2(^ ^^Zj^ r(A, /xj 27ri < An implying that /(f" t(A, n)dn). If M^ < A for all s 2TTt/n < 27r/n such that "-1 /r27r — y]^(A,/xJ-/ 27r n<T Vni < i then <A— 29 T{X,fj,)diJ, = 0(n-i/2) 27r(i + 1) > An and 1 uniformly in A. Using Markov's inequality we look at each term of (11) separately. First "-1 u yfnE r27r(s+l)/n {t^K Ms) - t{X, lj))In,ee {f^s) ^/^ l2TTs/n "-1 IV — < /•27r(s+l)/n p Wn sup 2ns/n where sup^ \T{X,^g) — = t(A,/x)| "-1 |r(A, Us) - t(A, /x) a'^dfj. I fJ- A ^ if + [27rs/n, 27r(s l)/n] and 1 otherwise. Therefore ^ /•27r(s+l)/n 27r 727rs/n V^l A^ uniformly in A. Also /•27r/n ^/nE <a' T{X,^)In,te{lj)d^ / JO where the bound is again uniform in "-1 \fnE "-1 2tt ^/n A. Finally, since E{In,ee {^Js) ~ In,ee ifj)) = 0, r2n(s+l)/n T{X,jl){In,ee ~( J2-n s=l •'27rs/ra (mJ " -^n.ee (Ai))c^M r27r{s+l)/n g—l J2-Ks/n where, from Brillinger (1981), p. 417, nvar(In,e£ 2iTs/n < fi < 2n{s + l)/n. This shows that n-l uniformly in A. is = We bounded 0{n~^) uniformly on = 0{n-'/^) can therefore consider n J —IT is In,ee{fJ')) n 2n which — j.2n 27r y/nE {/J^s) in probability n-l ^^(A,Ms)^°,£e(Ats) dX s=l by Markov's inequality if sup;^ bounded. 
From Brillinger (1981), Theorem ^ Yl^=i 5.10.1, 2 n-l nE nE — ^T(A,/xJ/°_^^(AiJ s=l Jo + / / Jo o/o /e..e(/^l,Ai2>-j"l)c?/^l<^M2+C>(n 30 ^) '''(^^ f^s)^n,ee (Ms) where the error is uniform in A. Then f^^{X) bounded under assumption (A-1). From Brillinger (1981), Theorem 5.10.1 and = cr'^ 5.10.2, and it /^..^ (/ii,/X2, — /^i) is uniformly follows immediately that VnE J —TI is bounded such that the second term in (10) is the proof. 31 small in probability on Tis. This completes References Andrews, D.W.K. (1994) Asymptotics for semiparametric econometric models via sto- chastic equicontinuity. Econometrica, 62:43-72. Bickel, P.J. (1982) On adaptive estimation. The Annals of Statistics, 10(3):647-671. BoUerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, pages 307-327. Brillinger, D.R. (1981) Time Series, Data Analysis and Theory, Expanded Edition. Holden- Day, Inc. Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation. Econometrica, pages 987-1007. Folland, G.B. (1984) Real Analysis - Modern Techniques and Their Applications. John Wiley and Sons. Hannan, The asymptotic theory E.J. (1973) of linear time series models. Journal of Applied Probability, 10:130-145. Hansen, B.E. (1995) Regression with nonstationary volatility. Econometrica, 63(5):1113- 1132. Hansen, L.P. (1985) A method for calculating bounds on the asymptotic covariance matri- ces of generalized method of moments estimators. Journal of Econometrics, 30:203- 238. Hansen, L.P. &c Heaton, J.C. moment conditional & Ogaki, M. (1988) restrictions. Efficiency bounds implied by multiperiod Journal of the American Statistical Association, 83:863-871. Hansen, L.P. &; linear time editors, Singleton, K.J. (1991) series Computing semiparametric efficiency bounds for models. In William A. 
Barnett, James Powell, and George Tauchen, Nonparametric and Semiparametric Methods in Econometrics and Statistics, pages 387-412. Hansen, L.P. & Singleton, K.J. (1996) Efficient estimation of linear asset-pricing models with moving average errors. Journal of Business and Economic Statistics, pages 53-68. Hayashi, F. & Sims, C. (1983) Nearly efficient estimation of time series models with predetermined, but not exogenous, instruments. Econometrica, 51(3):783-798. Hidalgo, J. (1992) Adaptive estimation in time series regression models with heteroskedas- ticity of unknown Kreiss, J. P. (1987) On form. Econometric Theory, 8:161-187. adaptive estimation in stationary of Statistics, 15(1):112-113. 32 ARMA processes. The Annals Kuersteiner, tors. G.M. (1997). Linear covariance matrix lower bounds for time series estima- Manuscript. Linton, O. (1993) Adaptive estimation in ARCH models. Econometric Theory, 9(4):539- 569. Milhoj, A. (1985) The moment structure of arch processes. Scandinavian Journal of Statistics, 12:281-292. Nelson, D.B. (1990) Stationarity and persistence in the GARCH(1,1) model. Econometric Theory, 6:318-334. Newey, W.K. (1990) Efficient instrumental variables estimation of nonlinear models. Econometrica, 58:809-837. Robinson, P.M. (1987) Asymptotically efficient estimation in the presence of heteroskedasticity of unknown form. Econometrica, 55:875-891. Robinson, P.M. (1988) Root-n-consistent semiparametric regression. Econometrica, 56:931954. Robinson, P.M. (1991) Automatic frequency domain inference on semiparametric and nonparametric models. Econometrica, 59:1329-1363. Steigerwald, D.G. (1994) Efficient estimation of models with conditional heteroscedasticity. Technical report, Department of Economics, University of California Santa Barbara. Optimal instrumental variable estiprocess. IEEE Transactions on Automatic Stoica, P. &: Soderstrom, T. &: Friedlander B. (1985) mates of the ar parameters of an ARMA Control, 30:1066-1074. 
Table 1: Relative efficiency of $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ for ARCH(1) innovations

Model: $y_t = \phi y_{t-1} + \varepsilon_t$, $\varepsilon_t = u_t h_t^{1/2}$, $h_t = 0.1 + \gamma_1 \varepsilon_{t-1}^2$

                    n = 256              n = 512              n = 1024
  φ     γ_1       IV        ML         IV        ML         IV        ML
 0.0    0.0     0.9969    1.0294     0.9966    1.0126     0.9980    1.0107
               (0.0023)  (0.0058)   (0.0017)  (0.0041)   (0.0011)  (0.0029)
 0.0    0.5     0.9918    0.5158     0.9947    0.4144     0.9984    0.3801
               (0.0015)  (0.0158)   (0.0013)  (0.0134)   (0.0011)  (0.0126)
 0.0    0.9     0.9502    0.4432     0.9619    0.4458     0.9631    0.3978
               (0.0031)  (0.0138)   (0.0043)  (0.0131)   (0.0046)  (0.0123)
 0.5    0.0     0.9937    1.0272     0.9985    1.0087     0.9971    1.0106
               (0.0030)  (0.0054)   (0.0021)  (0.0039)   (0.0016)  (0.0032)
 0.5    0.5     0.9388    0.4761     0.9375    0.4515     0.9616    0.3658
               (0.0056)  (0.0144)   (0.0054)  (0.0144)   (0.0082)  (0.0121)
 0.5    0.9     0.9361    0.4083     0.9171    0.4478     0.9106    0.4759
               (0.0078)  (0.0131)   (0.0096)  (0.0137)   (0.0108)  (0.0141)
 0.7    0.0     1.0046    1.0190     1.0046    1.0096     1.0012    1.0025
               (0.0040)  (0.0054)   (0.0029)  (0.0036)   (0.0022)  (0.0029)
 0.7    0.5     0.9071    0.5084     0.8958    0.4416     0.8590    0.3715
               (0.0070)  (0.0155)   (0.0081)  (0.0139)   (0.0076)  (0.0121)
 0.7    0.9     5.5958    0.3998     0.8274    0.4470     0.8274    0.4470
               (0.1854)  (0.0134)   (0.0120)  (0.0138)   (0.0120)  (0.0138)
 0.9    0.0     1.0391    1.0097     1.0152    1.0101     1.0085    1.0022
               (0.0069)  (0.0047)   (0.0053)  (0.0035)   (0.0037)  (0.0024)
 0.9    0.5     0.9209    0.5420     0.8522    0.4570     0.8308    0.4663
               (0.0101)  (0.0154)   (0.0094)  (0.0135)   (0.0094)  (0.0142)
 0.9    0.9     0.7281    0.4352     0.5951    0.3486     0.4878    0.4247
               (0.0123)  (0.0151)   (0.0113)  (0.0119)   (0.0107)  (0.0139)

Relative efficiency is defined as $S^2_{IV}/S^2_{OLS}$ or $S^2_{ML}/S^2_{OLS}$, respectively, where $S^2$ is the estimated variance of the estimator $\hat\phi$. Numbers in parentheses are asymptotic standard deviations of the variance ratio. Results are based on 3000 replications.
Table 2: Means and Medians

Model: $y_t = \phi y_{t-1} + \varepsilon_t$, $\varepsilon_t = u_t h_t^{1/2}$, $h_t = 0.1 + \gamma_1 \varepsilon_{t-1}^2$

                      OLS                  IV                   ML
  φ     γ_1      Mean     Median      Mean     Median      Mean     Median
 0.0    0.0    -0.0007   -0.0006    -0.0006   -0.0005    -0.0007   -0.0007
 0.0    0.5    -0.0025   -0.0014    -0.0024   -0.0016    -0.0009   -0.0009
 0.0    0.9     0.0006    0.0000     0.0009   -0.0002     0.0011    0.0009
 0.5    0.0     0.4978    0.4994     0.4969    0.4980     0.4977    0.4992
 0.5    0.5     0.4927    0.4935     0.4908    0.4919     0.4987    0.4995
 0.5    0.9     0.4770    0.4830     0.4694    0.4780     0.4945    0.4984
 0.7    0.0     0.6962    0.6969     0.6948    0.6957     0.6962    0.6970
 0.7    0.5     0.6933    0.6950     0.6913    0.6933     0.6979    0.7000
 0.7    0.9     0.6779    0.6888     0.6733    0.6847     0.6945    0.6992
 0.9    0.0     0.8965    0.8981     0.8947    0.8962     0.8965    0.8982
 0.9    0.5     0.8948    0.8977     0.8933    0.8957     0.8982    0.8992
 0.9    0.9     0.8850    0.8925     0.8860    0.8921     0.8984    0.8999

Sample size is 512. Results are based on 3000 replications.
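The sampling design behind the means and medians can be sketched in a few lines. The following is a minimal illustration, assuming the AR(1) model with ARCH(1) errors and starting values $y_0 = 0$, $\varepsilon_0 = 0$ as stated in the Monte Carlo section, and reproducing only the least squares estimator $\sum_{t=2}^{n} y_t y_{t-1} / \sum_{t=2}^{n} y_{t-1}^2$. The IV and ML estimators are not implemented here. All function names, the seed and the random number generator are assumptions made for the example, so the numbers it produces will not match the table digit for digit.

```python
import numpy as np


def simulate_ar1_arch1(n, phi, gamma0, gamma1, reps, rng):
    """Draw `reps` independent paths y_1..y_n of
    y_t = phi*y_{t-1} + eps_t, eps_t = u_t*sqrt(h_t),
    h_t = gamma0 + gamma1*eps_{t-1}^2, u_t ~ N(0,1), y_0 = eps_0 = 0.
    """
    y = np.zeros((reps, n + 1))      # column 0 holds the starting value y_0 = 0
    eps_prev = np.zeros(reps)        # eps_0 = 0
    for t in range(1, n + 1):
        h = gamma0 + gamma1 * eps_prev**2
        eps = rng.standard_normal(reps) * np.sqrt(h)
        y[:, t] = phi * y[:, t - 1] + eps
        eps_prev = eps
    return y[:, 1:]                  # drop y_0, keep y_1..y_n


def ols_ar1(y):
    """Least squares AR(1) coefficient for each row of y (sums over t = 2..n)."""
    num = np.sum(y[:, 1:] * y[:, :-1], axis=1)
    den = np.sum(y[:, :-1] ** 2, axis=1)
    return num / den


def mc_summary(n=512, phi=0.5, gamma0=0.1, gamma1=0.5, reps=3000, seed=0):
    """Monte Carlo mean, median and n times the variance of the OLS estimator."""
    rng = np.random.default_rng(seed)  # seed is an arbitrary illustrative choice
    est = ols_ar1(simulate_ar1_arch1(n, phi, gamma0, gamma1, reps, rng))
    return {
        "mean": float(np.mean(est)),
        "median": float(np.median(est)),
        "n_var": float(n * np.var(est, ddof=1)),
    }


if __name__ == "__main__":
    for g1 in (0.0, 0.5, 0.9):
        print(g1, mc_summary(gamma1=g1))
```

Running `mc_summary` over a grid of $\phi$ and $\gamma_1$ values gives the OLS columns of a table with the same layout; the scaled variance `n_var` can be compared with the asymptotic OLS variance when $\gamma_1 < \sqrt{1/3}$.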