Massachusetts Institute of Technology
Department of Economics
Working Paper Series

RATE-ADAPTIVE GMM ESTIMATORS FOR LINEAR TIME SERIES MODELS

Guido Kuersteiner

Working Paper 03-17
October 2002

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the
Social Science Research Network Paper Collection at
http://ssrn.com/abstract=410061
RATE-ADAPTIVE GMM ESTIMATORS FOR LINEAR TIME SERIES MODELS

By Guido M. Kuersteiner*

First Draft: November 1999
This Version: October 2002
Abstract

In this paper we analyze Generalized Method of Moments (GMM) estimators for time series models as advocated by Hansen and Singleton. It is known that these estimators achieve efficiency bounds if the number of lagged observations in the instrument set goes to infinity. However, to this date no data dependent way of selecting the number of instruments in a finite sample is available. This paper derives an asymptotic mean squared error (MSE) approximation for the GMM estimator. The optimal number of instruments is selected by minimizing a criterion based on the MSE approximation. It is shown that the fully feasible version of the GMM estimator is higher order adaptive. In addition a new version of the GMM estimator based on kernel weighted moment conditions is proposed. The kernel weights are selected in a data-dependent way. Expressions for the asymptotic bias of kernel weighted and standard GMM estimators are obtained. It is shown that standard GMM procedures have a larger asymptotic bias and MSE than optimal kernel weighted GMM. A bias correction for both standard and kernel weighted GMM estimators is proposed. It is shown that the bias corrected version achieves a faster rate of convergence of the higher order terms than the uncorrected estimator.
Key Words: time series, feasible GMM, number of instruments, rate-adaptive kernels, higher order adaptive, bias correction
*Massachusetts Institute of Technology, Dept. of Economics, 50 Memorial Drive E52-371A, Cambridge, MA 02142, USA. Email: gkuerste@mit.edu. Web: http://web.mit.edu/gkuerste/www/. I wish to thank Xiaohong Chen, Ronald Gallant, Jinyong Hahn, Jerry Hausman, Whitney Newey and Ken West for helpful comments. Stavros Panageas provided excellent research assistance. Comments by a co-editor and three anonymous referees led to substantial improvements of the paper. Financial support from NSF grant SES-0095132 is gratefully acknowledged. All remaining errors are my own.
1. Introduction

In recent years GMM estimators have become one of the main tools in estimating economic models based on first order conditions for optimal behavior of economic agents. Hansen (1982) established the asymptotic properties of a large class of GMM estimators. Based on first order asymptotic theory it was subsequently shown by Chamberlain (1987), Hansen (1985) and Newey (1988) that GMM estimators based on conditional moment restrictions can be constructed to achieve semiparametric efficiency bounds.

The focus of this paper is the higher order asymptotic analysis of GMM estimators for the time series case. In the cross-sectional literature it is well known that using a large number of instruments can result in substantial second order bias of GMM estimators, thus putting limits to the implementation of efficient procedures. Similar results are obtained in this paper for the time series case. In addition, fully feasible, second order optimal implementations of efficient GMM estimators for time series models are developed.
In independent sampling situations feasible versions of efficient GMM estimators were implemented, amongst others, by Newey (1990). In a time series context examples of first order efficient estimators are Hayashi and Sims (1983), Stoica, Soderstrom and Friedlander (1985), Hansen and Singleton (1991, 1996) and Hansen, Heaton and Ogaki (1996). Under special circumstances and in a slightly different context, Kuersteiner (2002) constructs a feasible, efficient GMM estimator for autoregressive models where the number of instruments is allowed to increase at the same rate as the sample size. More generally, however, such expansion rates do not lead to consistent estimates. In fact, to this date no analysis of GMM procedures depending on an infinite dimensional instrument set has been provided in the context of time series models. In this paper a data dependent selection rule for the optimal expansion rate for the number of instruments for efficient GMM estimators for linear time series models is obtained and a fully feasible version of the GMM estimator is proposed.

Several moment selection procedures, applicable to time series data, were proposed in the literature. Andrews (1999) considers selection of valid instruments out of a finite dimensional instrument set containing potentially invalid instruments. Hall and Inoue (2001) propose an information criterion based on the asymptotic variance-covariance matrix of the GMM estimator to select relevant moment conditions from a finite set of potential moments. Both approaches do not directly apply to the case of infinite dimensional instrument sets considered here.
Linton (1995) analyses the optimal choice of bandwidth parameters for kernel estimates in the partially linear regression model based on minimizing the asymptotic MSE of the estimator. Xiao and Phillips (1996) apply similar ideas to determine the optimal bandwidth for the estimation of the residual spectral density in a Whittle likelihood based regression set up. More recently Linton (1997) extended his procedure to the determination of the optimal bandwidth for an efficient semiparametric instrumental variables estimator. Donald and Newey (2001) use similar arguments to determine the optimal number of base functions in polynomial approximations to the optimal instrument. They analyze higher order asymptotic expansions of the estimators around their true parameter values. While the first order asymptotic terms typically do not depend on the estimation of infinite dimensional nuisance parameters, as shown in Andrews (1994) and Newey (1994), this is not the case for higher order terms of the expansions.
In this paper we will obtain expansions similar to the ones of Donald and Newey (2001) for the case of GMM estimators for models with lagged dependent right hand side variables. This set up is important for the analysis of intertemporal optimization models which are characterized by first order conditions of maximization. One particular area of application is asset pricing models. Minimizing the asymptotic approximation to the MSE with respect to the number of lagged instruments leads to a feasible GMM estimator for time series models. The trade off is between more asymptotic efficiency as measured by the asymptotic covariance matrix and bias.
Full implementation of the procedure requires the specification of estimators for the criterion function used to determine the optimal number of instruments. It is established that a plug-in estimator for the optimal number of instruments leads to a GMM estimator that has the same asymptotic distribution as the infeasible optimal estimator. We also propose a new kernel weighted version of GMM. It is shown that the asymptotic bias can be reduced if suitable kernel weights are applied to the moment conditions. For this purpose a new rate-adaptive kernel that adjusts its smoothness to the smoothness of the underlying model is introduced. In addition, a data-dependent way to pick the optimal kernel is proposed. Finally, a semiparametric correction of the asymptotic bias term is proposed. The bias corrected GMM estimator achieves a faster optimal rate of convergence of the higher order terms.
The paper is organized as follows. Section 2 presents the time series models and introduces notation. Section 3 introduces the kernel weighted version of the GMM estimator, contains the analysis of higher order MSE terms and derives a selection criterion for the optimal number of instruments. Section 4 discusses implementation of the procedure, in particular consistent estimation of the criterion function for optimal GMM bandwidth selection. Section 5 analyzes the asymptotic bias of the kernel weighted GMM estimator and introduces a data-dependent procedure to select the optimal kernel. Section 6 discusses non-parametric bias correction. Section 7 contains a small Monte Carlo experiment. The proofs are collected in Appendix A. Auxiliary Lemmas are collected in Appendix B which is available upon request.
2. Linear Time Series Models

We consider the linear time series framework of Hansen and Singleton (1996). Let $y_t \in \mathbb{R}^p$ be a strictly stationary stochastic process. It is assumed that economic theory imposes restrictions in the form of a structural econometric equation on the process $y_t$. In order to describe this structural equation we partition $y_t = [y_{t,1}, y_{t,2}', y_{t,3}']'$. Here, $y_{t,1}$ is the scalar left hand side variable, $y_{t,2}$ are the included and $y_{t,3}$ are the excluded contemporaneous endogenous variables. The vector $x_t$ is defined to contain, possibly a subset, of the lagged dependent variables $y_{t-1}, \ldots, y_{t-r}$, where $r$ is known and fixed. The structural equation then takes the form

$$y_{t,1} = \alpha_0 + \beta_0' y_{t,2} + \beta_1' x_t + \varepsilon_t. \qquad (2.1)$$

The structural model (2.1) is subject to exclusion and normalization restrictions implied by economic theory. Letting $\beta = [\beta_0', \beta_1']'$ and collecting all the regressors in $x_t$ we can also write (2.1) as $y_{t,1} = \alpha_0 + \beta' x_t + \varepsilon_t$. The innovation $\varepsilon_t$ is strictly stationary with $E\varepsilon_t = 0$ and follows a Moving-Average (MA) process of order $m-1$ such that $E\varepsilon_t \varepsilon_{t-j} = 0$ for all $j \geq m$, where $m$ is assumed known and finite. An alternative representation of (2.1) is obtained by setting $a(L, \beta) = a_0 + a_1 L + \cdots + a_r L^r$ with $1 \times p$ vectors $a_j$ implied by $\beta$ such that $a(L, \beta) y_t = \alpha_0 + \varepsilon_t$.

In addition to the structural equation (2.1) we assume that the reduced form of $y_t$ admits a representation as a vector autoregressive moving average (VARMA) process $A(L) y_t = A(1)\mu_y + B(L) u_t$, where $A(L)$ and $B(L)$ are $p \times p$ matrices of finite order polynomials in $L$, $\mu_y \in \mathbb{R}^p$ is a constant and $u_t$ is a strictly stationary and conditionally homoskedastic martingale difference sequence. More specifically, $y_t$ then has the moving average representation

$$y_t = \mu_y + A^{-1}(L) B(L) u_t. \qquad (2.2)$$
In order to completely relate model (2.1) to the generating process (2.2) we define additional $p-1 \times p$ matrices of lag polynomials $A_1(L)$ and $B_1(L)$ such that $A_1(L) y_t = \alpha_1 + B_1(L) u_t$. The matrices $A_1(L)$ and $B_1(L)$ satisfy $[a'(L,\beta), A_1'(L)]' = A(L)$ and $[b'(L), B_1'(L)]' = B(L)$ with $[\alpha_0, \alpha_1']' = A(1)\mu_y$. It follows from this representation that the structural innovations $\varepsilon_t$ are related to the reduced form innovations by $\varepsilon_t = b(L) u_t$, where $b(L) = b_0 + b_1 L + \cdots + b_{m-1} L^{m-1}$ and all $b_i \in \mathbb{R}^p$ are $1 \times p$ vectors of constants.

We assume that $A(L)$ and $B(L)$ have all their roots outside the unit circle and that all elements of $A(L)$ and $B(L)$ are finite order polynomials in $L$. Economic theory is assumed to provide restrictions on the polynomials $a(L,\beta)$ and $b(L)$ such that their degrees are known to the investigator. No restrictions are assumed to be known about the polynomials $A_1(L)$ and $B_1(L)$. In particular their degrees in $L$ are unknown, although assumed finite.

The economic model is concerned with inference regarding the parameter vector $\beta = [\beta_0', \beta_1']'$ while $b(L)$, $A_1(L)$ and $B_1(L)$ are treated as nuisance parameters. The MA($m-1$) structure of the errors in (2.1) implies moment restrictions of the form

$$E(\varepsilon_{t+m} y_{t-j}) = 0 \quad \text{for all } j \geq 0. \qquad (2.3)$$

These moment restrictions are the basis for the formulation of GMM estimators.
The restrictions (2.3) are often implied by economic theory and then lead to the formulation of a structural model of the form (2.1). A well known example is asset pricing models. In addition to the structural restrictions of Equation (2.1) we impose the following formal Assumptions on the innovations $u_t$ and on $A(L)$, $B(L)$ and $b(L)$.

Assumption A. Let $u_t \in \mathbb{R}^p$ be strictly stationary and ergodic, with $E(u_t \mid \mathcal{F}_{t-1}) = 0$ and $E(u_t u_t' \mid \mathcal{F}_{t-1}) = \Sigma$, a positive definite symmetric matrix of constants. Let $u_t^{i_1}$ be the $i_1$-th element of $u_t$ and $\operatorname{cum}_{i_1,\ldots,i_k}(t_1,\ldots,t_{k-1})$ the $k$-th order cross cumulant of $u_{t_1}^{i_1},\ldots,u_{t_{k-1}}^{i_k}$. Assume that $\sum_{\ell_1=-\infty}^{\infty} \cdots \sum_{\ell_{k-1}=-\infty}^{\infty} |\operatorname{cum}_{i_1,\ldots,i_k}(\ell_1,\ldots,\ell_{k-1})| < \infty$ for all $k \leq 8$.

Assumption B. The lag polynomial $C(L)$ is defined as $C(L) = A^{-1}(L) B(L) = \sum_{j=0}^{\infty} C_j L^j$ with coefficient matrices $C_j$, where $\det A(z) \neq 0$ and $\det B(z) \neq 0$ for $|z| \leq 1$. Moreover, assume $b(z) \neq 0$ for $|z| \leq 1$ and define $f_\varepsilon(\lambda) = (2\pi)^{-1} b(e^{i\lambda}) \Sigma b(e^{-i\lambda})'$, which can equivalently be written as $f_\varepsilon(\lambda) = (2\pi)^{-1} \sigma_\varepsilon^2 |\theta(e^{i\lambda})|^2$ for some constant $\sigma_\varepsilon^2$ and a lag polynomial $\theta(L) = 1 - \theta_1 L - \cdots - \theta_{m-1} L^{m-1}$. Let $p_a$ be the degree of the polynomial $A(L)$ and let $\lambda_1$ be the root of maximum modulus of $\det(z^{p_a} A(z^{-1})) = 0$. Let $p_b$ be the degree of the polynomial $B(L)$ and let $\bar{\lambda}_1$ be the root of maximum modulus of $\det(z^{p_b} B(z^{-1})) = 0$. Define $\lambda = \max(\lambda_1, \bar{\lambda}_1)$ and assume that $\lambda \in (0,1)$. Define the infinite dimensional instrument vector $z_{t,\infty} = (y_t', y_{t-1}', \ldots)'$ and let $P = \operatorname{Cov}(x_{t+m}, z_{t,\infty})$. Assume that $P$ has full column rank.
Remark 1. The fact that $u_t$ is a martingale difference sequence arises naturally in rational expectations models. In our context it is needed together with the conditional homoskedasticity assumption to guarantee that the optimal GMM weight matrix is of a sufficiently simple form. This allows us to construct estimates of the bias terms converging fast enough for bias correction and optimal number of instruments selection. The column rank assumption for $P$ is needed for identification (see Kuersteiner (2001) for an extensive discussion of this point). Assumption (B) guarantees that $f_\varepsilon(\lambda) \neq 0$ for $\lambda \in [-\pi, \pi]$. Then $1/f_\varepsilon(\lambda)$ is well defined and corresponds to the spectral density of an AR($m-1$) model.

The conditional homoskedasticity condition $E(u_t u_t' \mid \mathcal{F}_{t-1}) = E u_t u_t'$ is restrictive as it rules out time changing variances. Relaxing this restriction results in more complicated GMM weight matrices of the type analyzed in Andrews (1991) and Kuersteiner (1997, 2001). In principle the higher order moment restrictions implied by conditional homoskedasticity could be used in addition to the conditions (2.3). The resulting estimator is however nonlinear and will not be considered here. The summability assumption for the cumulants limits the temporal dependence of the innovation process. The cumulant summability condition used here is similar but slightly stronger than the second part of Condition A in Andrews (1991). Andrews (1991) shows for $k = 4$ that the summability condition on the cumulants is implied by a strong mixing assumption for $u_t$. What is needed both in Andrews (1991) and here is a restriction on the eighth-moment dependence of the underlying process $u_t$.
Infeasible efficient GMM estimation is based on exploiting all the implications of the moment restriction (2.3). In our context this is equivalent to choosing $z_{t,\infty}$, defined in Assumption (B), as instruments. An infeasible estimator of $\beta$ based on all lagged observations is used as a reference point around which we expand all feasible versions of the estimator. For this purpose let $\Omega = \sum_{l=-m+1}^{m-1} \gamma_\varepsilon(l)\, \Omega_z(l)$, where $\gamma_\varepsilon(l) = E\varepsilon_t\varepsilon_{t-l}$, $\Omega_z(l) = \operatorname{Cov}(z_{t,\infty}, z_{t-l,\infty}')$ and $D = P'\Omega^{-1}P$ with $P$ defined in Assumption (B). A detailed analysis of these infinite dimensional matrices can be found in Kuersteiner (2001) and Appendix (B.2). The infeasible estimator of $\beta$ is given by

$$\hat{\beta}_{n,\infty} = D^{-1} P' \Omega^{-1}\, \frac{1}{n} \sum_{t=1}^{n-m} z_{t,\infty}\,(y_{t+m,1} - \bar{\alpha}_0).$$

Let $d_0 = P'\Omega^{-1} n^{-1/2} \sum_t (z_{t,\infty} - 1_\infty \otimes \mu_y)\, \varepsilon_{t+m}$, such that $\sqrt{n}\,(\hat{\beta}_{n,\infty} - \beta_0) - D^{-1} d_0 \to 0$ almost surely. It can be shown that $D^{-1} d_0 \Rightarrow N(0, D^{-1})$ as $n \to \infty$ under the assumptions made about $y_t$ and $\varepsilon_t$.

For any fixed integer $M$, let $z_{t,M} = (y_t', y_{t-1}', \ldots, y_{t-M+1}')'$ be a finite dimensional vector of instruments. An approximate version $\beta_{n,M}$ of $\hat{\beta}_{n,\infty}$ is then based on $D_M = P_M' \Omega_M^{-1} P_M$ and $z_{t,M}$, where $P_M$ and $\Omega_M$ are defined in the same way as $P$ and $\Omega$ but with $z_{t,\infty}$ replaced by $z_{t,M}$. It then follows that $\sqrt{n}\,(\beta_{n,M} - \beta) - D^{-1} d_0 \to 0$ as $n, M \to \infty$. The fully feasible estimator $\hat{\beta}_{n,M}$ in Equation (3.2) is defined in the same way as $\beta_{n,M}$ but with $P_M$ and $\Omega_M$ replaced by estimates $\hat{P}_M$ and $\hat{\Omega}_M$, such that $\hat{\beta}_{n,M}$ is a function of the data alone. The last statement is no longer true, at least not without specifying the rate at which $M$ goes to infinity, once $M$ is replaced by a data-dependent selection $\hat{M}$. A more detailed definition of $\hat{\beta}_{n,M}$ is given in Section 3, while data-dependent selection of $M$ is discussed in Section 4. We call $\hat{\beta}_{n,\hat{M}}$ fully feasible.
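To fix ideas, the following minimal sketch (my own illustration, not code from the paper) simulates a bivariate system in the spirit of (2.1) with an MA(1) structural error, so that $m = 2$, and checks the sample analogues of the moment conditions (2.3). All parameter values and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200_000, 2                            # eps_t is MA(m-1) = MA(1)
beta, phi, theta = 1.0, 0.5, 0.4             # illustrative parameter values

u = rng.standard_normal(n + 1)
v = rng.standard_normal(n) + 0.5 * u[1:]     # correlated with u: y2 is endogenous
eps = u[1:] + theta * u[:-1]                 # structural MA(1) innovation
y2 = np.zeros(n)
for t in range(1, n):
    y2[t] = phi * y2[t - 1] + v[t]           # reduced form AR(1) for the regressor
y1 = beta * y2 + eps                         # structural equation, alpha_0 = 0

print("E[eps_t y_t]        ~ %+.3f" % np.mean(eps * y2))  # nonzero: simultaneity
for j in range(3):                           # (2.3): E[eps_(t+m) y_(t-j)] = 0, j >= 0
    g = np.mean(eps[m + j:] * y2[: n - m - j])
    print("E[eps_(t+m) y_(t-%d)] ~ %+.3f" % (j, g))        # all close to zero
```

The contemporaneous moment is violated while all lagged moments beyond the MA order hold, which is exactly the structure the instrument set $z_{t,M}$ exploits.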
3. Kernel Weighted GMM

In this paper a generalized class of GMM estimators based on kernel weighted moment restrictions is introduced. Conventional GMM estimators are based on using the first $M$ of the moment restrictions (2.3). More generally one can consider non-random weights $k(j,M)$ such that

$$k(j,M)\, E\varepsilon_{t+m} y_{t-j-1} = 0.$$

The function $k(j,M)$ is a generalized kernel weight. For the special case where $k(j,M) = k(j/M)$, $k(j,M)$ is a standard kernel function. The truncated kernel is $k(j,M) = k(j/M) = 1\{|j/M| \leq 1\}$, where we use $1\{.\}$ to denote the indicator function. The general kernel weighted approach therefore covers the standard GMM procedure as a special case when the truncated kernel is used. In Section 5 it is shown that many kernel functions reduce the higher order bias of GMM and that there always exists a kernel function that dominates the traditional truncated kernel in terms of higher order MSE.
We now describe the kernel weighted GMM estimator having kernel weight $k(j-1,M)$ in the $j$-th sample moment. Define $\bar{k}_M = \operatorname{diag}(k(0,M), \ldots, k(M-1,M))$ with $k(j-1,M)$ as $j$-th diagonal element and zeros otherwise, and let $K_M = \bar{k}_M \otimes I_p$, where $I_p$ is the $p$-dimensional identity matrix. An instrument selection matrix $S_M(t) = \operatorname{diag}(1\{t \geq 1\}, \ldots, 1\{t \geq M\})$ is introduced to exclude instruments for which there is no data. The vector of available instruments is denoted by $\tilde{z}_{t,M} = (S_M(t) \otimes I_p)[z_{t,M} - 1_M \otimes \bar{y}]$, where $\bar{y} = n^{-1}\sum_{t=1}^{n} y_t$.

An estimate of the weight matrix $\Omega_M$ is obtained as follows. Let $\tilde{\varepsilon}_t = a(L, \tilde{\beta}_{n,M})(y_{t+m} - \bar{y})$ for some consistent but possibly inefficient first stage estimator $\tilde{\beta}_{n,M}$. For fixed and possibly small $M$ it is well known that such an estimator can be obtained from standard inefficient GMM procedures where $\hat{\Omega}_M = I_{Mp}$. Let $\tilde{\gamma}_\varepsilon(l) = n^{-1} \sum_{t=\max(1,r-m+1)}^{n-m} \tilde{\varepsilon}_t \tilde{\varepsilon}_{t-l}$ and define $\hat{\Omega}_M(l)$ from the demeaned elements in $y_t$. Then

$$\hat{\Omega}_M = \sum_{l=-m+1}^{m-1} \tilde{\gamma}_\varepsilon(l)\, \hat{\Omega}_M(l). \qquad (3.1)$$

Let $Z_M = [\tilde{z}_{\max(1,r-m+1),M}, \ldots, \tilde{z}_{n-m,M}]'$ be the matrix of stacked instruments and $X = [x_{\max(m+1,r+1)}, \ldots, x_n]'$ the matrix of regressors. Assuming that $M \geq d/p$, where $d$ is the dimension of the parameter space, define the $d \times Mp$ matrix $\hat{P}_M' = n^{-1} X' Z_M$ as well as the $Mp \times 1$ vector $n^{-1} Z_M' \mathcal{Y}$, where $\mathcal{Y}$ is the stacked vector of the demeaned left hand side variables. The estimator $\hat{\beta}_{n,M}$ is then given by

$$\hat{\beta}_{n,M} = \big(\hat{P}_M' K_M \hat{\Omega}_M^{-1} K_M \hat{P}_M\big)^{-1} \hat{P}_M' K_M \hat{\Omega}_M^{-1} K_M\, n^{-1} Z_M' \mathcal{Y}. \qquad (3.2)$$

For the truncated kernel with $K_M = I_{Mp}$, (3.2) is the standard GMM formula. As can be inferred from (3.2), the kernel matrix $K_M$ distorts efficiency by using $\bar{k}_M$ instead of the optimal $I_{Mp}$ as weight matrix. As shown below, these second order effects are more than compensated by a reduction of the second order bias for suitably chosen kernel functions. It is also shown that the second order loss of efficiency is small for suitable choices of the kernel function $k(j,M)$ and bandwidth $M$.
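As a concrete illustration of (3.2), the sketch below (my own code with hypothetical input names, not the author's implementation) computes the kernel weighted GMM estimate from stacked instruments, regressors, a weight matrix estimate and a vector of kernel weights.

```python
import numpy as np

def kgmm(y, X, Z, Omega_hat, k_weights):
    """Kernel weighted GMM of (3.2); k_weights = (k(0,M), ..., k(M-1,M))."""
    n = len(y)
    p = Z.shape[1] // len(k_weights)
    K = np.diag(np.repeat(k_weights, p))     # K_M = kbar_M (x) I_p
    P_hat = X.T @ Z / n                      # P_hat' = n^{-1} X'Z, a d x Mp matrix
    W = K @ np.linalg.solve(Omega_hat, K)    # K_M Omega_hat^{-1} K_M
    A = P_hat @ W @ P_hat.T                  # P' K O^{-1} K P
    b = P_hat @ W @ (Z.T @ y / n)            # P' K O^{-1} K n^{-1} Z'y
    return np.linalg.solve(A, b)
```

With `k_weights = np.ones(M)` the call reduces to standard two-step GMM based on the truncated kernel with $M$ lagged instrument blocks.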
We now define the constant $s$ which plays a role in determining the rate of convergence of $\hat{\beta}_{n,M}$ to $\hat{\beta}_{n,\infty}$.

Definition 3.1. Let $\lambda_1$ and $\bar{\lambda}_1$ be as defined in Assumption (B). Let $s_1$ be the multiplicity of $\lambda_1$ and $\bar{s}_1$ the multiplicity of $\bar{\lambda}_1$. Define $s = s_1 - 1/2$ if $\lambda_1 > \bar{\lambda}_1$, $s = \bar{s}_1 - 1/2$ if $\lambda_1 < \bar{\lambda}_1$ and $s = 3/2\, s_1 - 1/2$ if $\lambda_1 = \bar{\lambda}_1$.

We turn to the formal requirements the kernel weight function $k(.,.)$ has to satisfy. We distinguish regular kernels, for which $k(j,M) = k(j/M)$, and rate-adaptive kernels.

Assumption C. The kernel function $k : \mathbb{R} \to [-1,1]$ satisfies $k(0) = 1$, $k(x) = k(-x)\ \forall x \in \mathbb{R}$, $k(x) = 0$ for $|x| > 1$, and $k(.)$ is continuous at $0$ and at all but a finite number of points. For some $q \geq 1$ there exists a constant $k_q$ such that $k_q = \lim_{x \to 0} (1 - k(x))/|x|^q$ with $k_q \in (0, \infty)$.

Assumption D. The kernel function $k : [-1,1] \times \mathbb{R}_+ \to \mathbb{R}$ is rate-adaptive if it satisfies $|k(x,y)| \leq c < \infty\ \forall (x,y)$, $k(-x,y) = k(x,y)$, $k(0,y) = 1$ and $k(x,y) = 0$ for $|x| > 1$. Furthermore, for $\lambda$ defined in Assumption (B) and $A = -\log(\lambda)$,

$$\lim_{y \to \infty} \big(1 - k(x/y,\ y^{2s}\lambda^{2y})\big)\big/\big(y^{2s}\lambda^{2y}\big) = c_0 k_q |x|^q$$

for all $x \in \mathbb{R}$, for some $q \geq 1$ and finite constants $c_0, k_q$ with $c_0 = 1$ if $q = 0$.

Assumption (C) corresponds to the assumptions made in Andrews (1991) except that we require $k(x) = 0$ for $|x| > 1$. This assumption ensures that only a finite number of moment conditions, controlled by the bandwidth parameter, are used in estimation. The assumption could be relaxed at the cost of having to introduce additional bandwidth parameters to estimate the optimal weight matrix. This seems unattractive from a practical point of view and is not pursued here. Assumption (C) rules out certain parametric kernel functions such as the Quadratic Spectral kernel but is satisfied by a number of well known kernels such as the Truncated, Bartlett, Parzen and Tukey-Hanning kernels.

In the case of regular kernels the constant $k_q$ measures the higher order loss in efficiency induced by the kernel function, which is proportional to $1 - k(i/M) = k_q M^{-q} |i|^q$ for large $M$ and some $q$ such that $k_q \neq 0$. For rate-adaptive kernels we obtain an efficiency loss $1 - k(i/M, M^{2s}\lambda^{2M}) = M^{2s}\lambda^{2M} c_0 k_q |i|^q$ which, as will be shown, is of the same order of magnitude as the efficiency loss due to truncating the number of instruments. The kernels are called rate-adaptive because their smoothness locally at zero adapts to the smoothness $\lambda$ of the model. In Section 4 it is shown how the argument $M^{2s}\lambda^{2M}$ in $k(.,.)$ can be replaced by an estimate.
Kernel functions that satisfy Assumption (D) can be generated by exploiting the chain rule of differentiation. Consider functions of the form

$$\phi(v,z) = \big(2 - z(-\log(z))^q\big)\, v + \big(z(-\log(z))^q - 1\big)\, v^2 \qquad (3.3)$$

for $v \in [-1,1]$, $z \in (0,1)$ and some non-negative integer $q$. Note that $\phi(1,z) = 1$ and $\phi(0,z) = 0$ for all $z$. It thus follows that $z$ parametrizes the partial derivative of $\phi$ with respect to $v$ at $v = 1$, $\partial\phi(v,z)/\partial v|_{v=1} = z(-\log(z))^q \neq 0$. The rate adaptive kernel is then obtained as

$$k(j,M) = \phi\big(k(j/M),\ M^{2s}\lambda^{2M}\big) \qquad (3.4)$$

for any kernel $k(j/M)$ satisfying Assumption (C) to which $\phi(v,z)$ is applied, chosen such that $\lim_{M \to \infty}\big(1 - \phi(k(x), M^{2s}\lambda^{2M})\big)/\big(M^{2s}\lambda^{2M}\big)$ exists uniformly in $x$ and $\int k(x)^2 dx$ exists. The latter is the case for all kernels satisfying Assumption (C). Also note that $\int \phi(k(x), M^{2s}\lambda^{2M})^2 dx < \infty$ uniformly in $M$. We use the short hand notation $\int \phi(x)^2 dx = \lim_{M \to \infty} \int \phi(k(x), M^{2s}\lambda^{2M})^2 dx$.

The bandwidth parameter $M$ is chosen such that the approximate MSE of $\hat{\beta}_{n,M}$ is minimized. We use the Nagar (1959) type approximation to the MSE of $\hat{\beta}_{n,M}$ as in Donald and Newey (2001). Let $\sqrt{n}\,(\hat{\beta}_{n,M} - \beta) = b_{n,M} + r_{n,M}$, where $r_{n,M}$ is an error term. For $\ell \in \mathbb{R}^d$ with $\ell'\ell = 1$, define the approximate mean squared error $\varphi_n(M, \ell, k(.))$ of $\hat{\beta}_{n,M}$ by

$$\varphi_n(M, \ell, k(.)) = \ell' D^{1/2} E\big(b_{n,M} b_{n,M}'\big) D^{1/2\prime} \ell \qquad (3.5)$$

and require that $\sqrt{n}\,(\hat{\beta}_{n,M} - \beta)$ be stochastically approximated by $b_{n,M}$ such that

$$n\, \ell' D^{1/2} (\hat{\beta}_{n,M} - \beta)(\hat{\beta}_{n,M} - \beta)' D^{1/2\prime} \ell = \varphi_n(M, \ell, k(.)) + R_{n,M} \qquad (3.6)$$

where the error terms $r_{n,M}$ and $R_{n,M}$ satisfy $R_{n,M} = o_p(1)$ as $M \to \infty$, $n \to \infty$, $\sqrt{M}/n \to 0$. It is shown in the proof of Proposition (3.4) that this is the case for the kernels considered here. The only difference to Donald and Newey (2001) is that in our case $\varphi_n(M, \ell, k(.))$ is an unconditional expectation. As noted by Donald and Newey, the approximation is only valid for $M \to \infty$, which is the case of interest in the context of this paper.
Proposition 3.2. Suppose Assumptions (A) and (B) hold and $\ell \in \mathbb{R}^d$ with $\ell'\ell = 1$. Let $\Gamma_{\varepsilon x}(j) = E\varepsilon_t x_{t-j}$ and define $f_{\varepsilon x}(\lambda) = (2\pi)^{-1} \sum_{j=-\infty}^{\infty} \Gamma_{\varepsilon x}(j) e^{-ij\lambda}$. Define

$$A_1 = 2\pi \int_{-\pi}^{\pi} f_{\varepsilon x}(\lambda)\, f_\varepsilon^{-1}(\lambda)\, d\lambda$$

and $\mathcal{A} = \ell' D^{-1/2} A_1' A_1 D^{-1/2} \ell$. Let $\sigma_{1M} = \ell' D^{-1/2} P_M' \Omega_M^{-1} P_M D^{-1/2} \ell$ and

$$\sigma_{2M} = \ell' D^{-1/2} \big( P_M' \Omega_M^{-1}(I_{Mp} - K_M) P_M + P_M'(I_{Mp} - K_M) \Omega_M^{-1} P_M \big) D^{-1/2} \ell,$$

and define $B_1 = \lim_{M \to \infty} (1 - \sigma_{1M})/(M^{2s}\lambda^{2M})$ and $B^{(q)} = \lim_{M \to \infty} \sigma_{2M}/(c_0 k_q M^{2s}\lambda^{2M})$. Then, for $k(.,.)$ defined in (3.4) such that Assumption (D) holds, it follows that

i) $\varphi_n(M, \ell, k(.)) = O(M^2/n) + O(M^{-2q})$ for regular kernels satisfying Assumption (C);

ii) for rate-adaptive kernels, if $n, M \to \infty$ and $M^{2s}\lambda^{2M} n \to 1/\kappa$ with $0 < \kappa < \infty$, then

$$\lim\, n/M^2\ \varphi_n(M, \ell, k(.)) = \mathcal{A} \Big( \int \phi(x)^2 dx \Big)^2 + \big( c_0^2 k_q^2 B^{(q)} + B_1 \big)/\kappa;$$

iii) for the truncated kernel $k(x) = 1\{|x| \leq 1\}$, if $n, M \to \infty$ and $M^{2s}\lambda^{2M} n \to 1/\kappa$ with $0 < \kappa < \infty$, then $\lim\, n/M^2\ \varphi_n(M, \ell, k(.)) = \mathcal{A} + B_1/\kappa$.

This result shows that using standard kernels introduces variance terms of order $O(M^{-2q})$ that are larger than for the truncated kernel. Using the rate-adaptive kernels overcomes this problem, leading to variance terms of the same order as in the standard case. Nevertheless, kernel weighting introduces additional variance terms of order $M^{2s}\lambda^{2M}$. Intuitively, the kernel function distorts the optimal weight matrix, resulting in an increased variance of higher order terms in the expansion. As will be shown, this increased variance can be traded off against a reduction in the bias by an appropriate choice of the kernel function. Since any kernel other than the rate-adaptive kernels leads to slower rates of convergence, only rate-adaptive kernels will be considered from now on.
An immediate corollary resulting from Lemma (3.2) is that the feasible estimator has the same asymptotic distribution as the optimal infeasible estimator as long as $M/\sqrt{n} \to 0$.

Corollary 3.3. Assume that the assumptions of Lemma (3.2) hold. If $n, M \to \infty$ and $M/\sqrt{n} \to 0$ then

$$\sqrt{n}\,(\hat{\beta}_{n,M} - \beta) - D^{-1} d_0 = o_p(1). \qquad (3.7)$$

Because $B^{(q)}$ and $B_1$ are difficult to estimate, we do not minimize $\varphi_n(M, \ell, k(.))$ directly. Instead we propose the following criterion.
Proposition 3.4. Let

$$\bar{M}^* = \operatorname*{argmin}_{M \in I} \mathrm{MIC}(M) = \operatorname*{argmin}_{M \in I} \Big\{ \frac{M^2}{n}\, \mathcal{A} \Big( \int \phi(x)^2 dx \Big)^2 - \log \sigma_M \Big\}$$

with $I = \{[d/p]+1, [d/p]+2, \ldots\}$, where $[a]$ denotes the largest integer smaller than $a$ and $\sigma_M = \sigma_{1M} - \sigma_{2M}$ with $\sigma_{1M}, \sigma_{2M}$ defined in Proposition (3.2). Let $M^*$ minimize $\varphi_n(M, \ell, k(.))$. Then, as $n \to \infty$,

$$\bar{M}^*/M^* \to 1.$$

Remark 2. By construction, $\sigma_{1M}$ satisfies $-\log \sigma_{1M} = c_1 M^{2s}\lambda^{2M} + o(\lambda_r^{2M})$ for some constant $c_1$ and some $\lambda_r < 1$. If in addition, for some constant $a > 1$, $\lambda_r \leq \lambda^a$, then $\bar{M}^*/M^* - 1 = O(n^{-1/2}(\log n)^{1/2})$ and $\hat{\beta}_{n,\bar{M}^*}$ is higher order efficient under quadratic risk in the class of all GMM estimators $\hat{\beta}_{n,M}$, $M \in I$.

Remark 3. For the truncated kernel $\int \phi(x)^2 dx = 2$ and $\sigma_{2M} = 0$. Note that $\sigma_{1M}$ measures the second order loss of efficiency caused by a finite number of instruments. This result follows from the fact that $(P_M' \Omega_M^{-1} P_M)^{-1}$ is the asymptotic variance matrix of the estimator based on $M$ instruments. Then $\sigma_{1M}$ has the interpretation of a generalized measure of relative efficiency of $\hat{\beta}_{n,M}$. It corresponds to $N^{-1}\sigma^2 H^{-1} f'(I - P^z) f H^{-1}$ of Donald and Newey (2001, their notation). The term $\sigma_{2M}$ measures the additional loss in efficiency due to the kernel.

Remark 4. It can be seen easily in the proof of Proposition (4.3) that $M^* = \log n/(2A) + o(\log n)$ with $A = -\log \lambda$, and that asymptotically $M^*$ does not depend on $\mathcal{A}$, $B_1$ or $\kappa$ up to order $o(\log n)$. If the result $\bar{M}^*/M^* - 1 = o(1)$ is sufficient, it is then possible to replace $\mathrm{MIC}(M)$ by the simpler criterion $n^{-1}(Mp) - \log \sigma_{1M}$, where $Mp$ is the total number of instruments. Note that here the penalty term is essentially the same as in Donald and Newey (2001). The constant $A_1$ measures the simultaneity bias caused by estimates of the optimal instruments. When $\varepsilon_t$ is serially uncorrelated, i.e. $m = 1$, $-\log \sigma_{1M}$ decays at an exponential rate; this simplification is allowed only for models where this is the case, as it is for the ARMA class.

Remark 5. A further simplification is available if the definition of $\varphi_n(M, \ell, k(.))$ is based on $\operatorname{tr} D^{1/2} E b_{n,M} b_{n,M}' D^{1/2}$ instead of $\ell' D^{1/2} E(b_{n,M} b_{n,M}') D^{1/2\prime} \ell$. Then the approximate MSE depends on $\operatorname{tr} \sigma_{1M}$, where $\sigma_{1M} = D^{-1/2} P_M' \Omega_M^{-1} P_M D^{-1/2}$. Note that in this formulation $\log |\sigma_{1M}| = \log |P_M' \Omega_M^{-1} P_M| - \log |D|$, and knowledge of the variance lower bound $D^{-1}$ is no longer required. The simplified criterion for this case is $\mathrm{MIC}(M) = n^{-1}(Mp) - \log |\hat{P}_M' \hat{\Omega}_M^{-1} \hat{P}_M|$, which is quite similar to Hall and Inoue (2000) except for the penalty term $(Mp)/n$.
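Operationally, Proposition 3.4 amounts to a one-dimensional grid search. The hedged sketch below scans $M$ over the index set $I$ and keeps the minimizer of an MIC-type criterion; `a_hat`, `int_phi2` and the map `sigma1` stand in for the plug-in estimates constructed in Section 4, so their exact form here is an assumption.

```python
import numpy as np

def select_M(n, M_grid, a_hat, int_phi2, sigma1):
    """Return argmin over M of MIC(M) = (M^2/n) * a_hat * int_phi2^2 - log sigma1(M)."""
    mic = [(M ** 2 / n) * a_hat * int_phi2 ** 2 - np.log(sigma1(M)) for M in M_grid]
    return M_grid[int(np.argmin(mic))]

# With serially uncorrelated errors (m = 1), the simpler criterion of Remark 4,
# (M * p) / n - log sigma1(M), can be scanned in exactly the same way.
```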
4. Fully Feasible GMM

In this section we derive the missing results that are needed to obtain a fully feasible procedure. In particular one needs to replace the unknown optimal bandwidth parameter $M^*$ by an estimate $\hat{M}^*$. In order to have a fully feasible procedure we need consistent estimates of the constants $A_1$, $D$ and $\sigma_M$, converging at sufficiently fast rates. The following analysis shows that estimation of $A_1$ can be done nuisance parameter free in the sense that consistent estimates of $A_1$ do not depend on additional unknown parameters. Unfortunately the same is not true for $D$ and $\sigma_M$. We use an approximating parametric model for $C(L)$ to estimate $D$ and $\sigma_M$.

We first consider the simpler estimation problem for the constant $A_1$. Note that $A_1$ depends on the coefficients $\psi_j$ in the series expansion of $f_\varepsilon^{-1}(\lambda) = \sum_{j=-\infty}^{\infty} \psi_j e^{-ij\lambda}$. Consistent estimates of the $\psi_j$ can be obtained by using consistent estimates of the parameters of the MA($m-1$) representation of $\varepsilon_t$. An MA($m-1$) model is then estimated for $\hat{\varepsilon}_t$. This can be done by using a nonlinear least squares or pseudo maximum likelihood procedure as described in chapter 8 of Brockwell and Davis (1991); the procedure is outlined in the proof of Proposition (4.1). Because of the exponential decay of the $\psi_j$ and the fact that $m$ is finite, $\Gamma_{\varepsilon y}(j)$ can be replaced by a simple sample average based on estimated residuals, $\hat{\Gamma}_{\varepsilon y}(j) = n^{-1} \sum_{t=\max(1,j+1)}^{n-m} \hat{\varepsilon}_{t+m}\, \hat{y}_{t-j}$. Using these estimates one forms $\hat{A}_1$ by

$$\hat{A}_1 = \sum_{j=-n+1}^{n-1} \hat{\psi}_j\, \hat{\Gamma}_{\varepsilon y}'(j-m). \qquad (4.1)$$

To summarize, we state the following proposition.

Proposition 4.1. Let Assumptions (A) and (B) be satisfied. Let $\hat{A}_1$ be defined in (4.1). Then

$$\sqrt{n}\,(\hat{A}_1 - A_1) = O_p(1).$$
We use a finite order VAR(h) approximation to $C(L)^{-1}$, with $C(L) = \sum_{j=0}^{\infty} C_j L^j$, to estimate the parameters $D$ and $\sigma_M$. Let $\pi(L) = I - \sum_{j=1}^{\infty} \pi_j L^j$ denote the autoregressive representation implied by $C(L)$. It follows that $y_t = \pi(1)\mu_y + \sum_{j=1}^{\infty} \pi_j y_{t-j} + v_t$, where $v_t = C_0 u_t$ and $E v_t v_t' = \Sigma_v$. The approximate model with VAR coefficient matrices $\pi_{1,h}, \ldots, \pi_{h,h}$ is then given by

$$y_t = \mu_{y,h} + \pi_{1,h} y_{t-1} + \cdots + \pi_{h,h} y_{t-h} + v_{t,h} \qquad (4.2)$$

where $\Sigma_{v,h} = E v_{t,h} v_{t,h}'$ is the mean squared prediction error of the approximating model. The use of VAR approximations in a bandwidth selection context was proposed by Andrews (1991). There, however, $h$ is kept fixed such that the resulting bandwidth choice is asymptotically suboptimal. We let $h \to \infty$ at a data-dependent rate, to be described below, leading to the approximating model being asymptotically equivalent to $\pi(L)$.
It was shown by Berk (1974) and Lewis and Reinsel (1985) that the parameters $\hat{\pi}(h) = (\hat{\pi}_{1,h}', \ldots, \hat{\pi}_{h,h}')'$ are root-n consistent and asymptotically normal if $h$ does not increase too quickly, i.e. $h^3/n \to 0$. Berk (1974) shows that $h$ needs to increase such that $n^{1/2} \sum_{i=h+1}^{\infty} \|\pi_i\| \to 0$; at the same time $h$ must not increase too slowly to avoid asymptotic biases. Ng and Perron (1995) argue that information criteria such as the Akaike criterion do not satisfy these conditions. Moreover, the results of Hannan and Kavalieris (1984, 1986) imply that if $h$ is selected by AIC or BIC then $\hat{h}$ is $O_p(\log n)$ and fails to be adaptive in the sense required here, and can therefore not be used to choose $h$. To avoid the problems that arise from using information criteria to select the order of the approximating model we use the sequential testing procedure analyzed in Ng and Perron (1995).

Let $Y_{t,h} = (y_t' - \bar{y}', \ldots, y_{t-h+1}' - \bar{y}')'$ and let $\hat{\Gamma}_h = (n-h)^{-1} \sum_{t=h}^{n-1} Y_{t,h} Y_{t,h}'$ and $\hat{\Gamma}_{1,h} = (n-h)^{-1} \sum_{t=h}^{n-1} Y_{t,h}\,(y_{t+1} - \bar{y})'$, with population counterpart $\Gamma_{i-j} = \operatorname{Cov}(y_{t-i}, y_{t-j}')$. The coefficients of the approximate model, $(\hat{\pi}_{1,h}, \ldots, \hat{\pi}_{h,h}) = \hat{\Gamma}_{1,h}' \hat{\Gamma}_h^{-1}$, satisfy the Yule-Walker equations. The estimated error covariance matrix is $\hat{\Sigma}_{v,h} = (n-h)^{-1} \sum_{t=h+1}^{n} \hat{v}_{t,h} \hat{v}_{t,h}'$ with $\hat{v}_{t,h} = (y_t - \bar{y}) - \hat{\pi}_{1,h}(y_{t-1} - \bar{y}) - \cdots - \hat{\pi}_{h,h}(y_{t-h} - \bar{y})$. Let $\hat{M}_h(1)$ be the lower-right $p \times p$ block of $\hat{\Gamma}_h^{-1}$. A Wald test for the null hypothesis that the coefficients of the last lag $h$ are jointly zero is then, in Ng and Perron's notation,

$$J(h,h) = n\, (\operatorname{vec} \hat{\pi}_{h,h})' \big( \hat{\Sigma}_{v,h} \otimes \hat{M}_h(1) \big)^{-1} (\operatorname{vec} \hat{\pi}_{h,h}).$$

We adopt the following lag order selection procedure from Ng and Perron (1995).

Definition 4.2. The general-to-specific procedure chooses $\hat{h} = h_{\max}$ if $J(i,i)$ is significantly different from zero at significance level $\alpha$ for $i = h_{\max}$; and $\hat{h} = h$ if, at significance level $\alpha$, $J(h,h)$ is significantly different from zero while $J(i,i)$ is not significantly different from zero for all $i = h+1, \ldots, h_{\max}$; where $h_{\max}$ is chosen such that $h_{\max}^3/n \to 0$.
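A minimal univariate sketch of the general-to-specific rule in Definition 4.2 ($p = 1$ for readability; my own simplification, not the Ng-Perron code): fit the AR(h) by least squares for $h = h_{\max}, h_{\max}-1, \ldots$ and stop at the first lag whose last coefficient is significant, using a $\chi^2(1)$ Wald statistic.

```python
import numpy as np
from scipy import stats

def ar_ls(y, h):
    """Least squares fit of an AR(h) on demeaned data; returns (coef, cov)."""
    yd = y - y.mean()
    Y = yd[h:]
    X = np.column_stack([yd[h - i: len(y) - i] for i in range(1, h + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    s2 = resid @ resid / (len(Y) - h)
    return coef, s2 * np.linalg.inv(X.T @ X)

def select_h(y, h_max, alpha=0.10):
    """General-to-specific lag selection: test the last lag at level alpha."""
    for h in range(h_max, 0, -1):
        coef, cov = ar_ls(y, h)
        J = coef[-1] ** 2 / cov[-1, -1]       # Wald statistic for pi_{h,h} = 0
        if J > stats.chi2.ppf(1 - alpha, df=1):
            return h
    return 1
```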
In order to calculate the impulse response coefficients associated with (4.2) define the matrix

$$\hat{A}_h = \begin{bmatrix} \hat{\pi}_{1,h} & \hat{\pi}_{2,h} & \cdots & \hat{\pi}_{h,h} \\ & I_{(h-1)p} & & 0 \end{bmatrix}$$

with dimensions $hp \times hp$. The $j$-th impulse coefficient of the approximating model is $\hat{C}_{j,h} = E_h' \hat{A}_h^j E_h$ with $E_h' = [I_p, 0, \ldots, 0]$. Estimates $\hat{C}_{j,h}$ of the $C_j$ are obtained by substituting $\hat{\pi}(h)$ for $\pi(h)$ in the obvious way.
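The impulse coefficients are conveniently read off powers of the companion matrix. The sketch below (my own helper, assuming a list of $p \times p$ coefficient matrices) builds $\hat{A}_h$ and returns $\hat{C}_{j,h} = E_h' \hat{A}_h^j E_h$.

```python
import numpy as np

def companion(pi_list):
    """Stack [pi_1, ..., pi_h] (each p x p) into the hp x hp companion matrix."""
    h, p = len(pi_list), pi_list[0].shape[0]
    A = np.zeros((h * p, h * p))
    A[:p, :] = np.hstack(pi_list)             # first block row
    A[p:, :-p] = np.eye((h - 1) * p)          # shift identity below the diagonal
    return A

def impulse_coefficients(pi_list, J):
    """Return C_{0,h}, ..., C_{J-1,h} with C_{j,h} = E_h' A_h^j E_h."""
    h, p = len(pi_list), pi_list[0].shape[0]
    A = companion(pi_list)
    E = np.zeros((h * p, p))
    E[:p, :p] = np.eye(p)                     # E_h' = [I_p, 0, ..., 0]
    return [E.T @ np.linalg.matrix_power(A, j) @ E for j in range(J)]
```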
The autocovariance function of $y_t$ is then approximated by $\hat{\Gamma}_j^h = \sum_{i=0}^{\infty} \hat{C}_{i+j,h} \hat{\Sigma}_{v,h} \hat{C}_{i,h}'$. We use the approximate autocovariance matrices $\hat{\Gamma}_j^h$ to form the possibly infinite dimensional matrices $\hat{P}_{\infty,h}$ and $\hat{\Omega}_{\infty,h}$ in the same way as described above, which leads to an estimate $\hat{D}_h$ of $D$; we denote the $k,l$-th block of $\hat{\Omega}_{\infty,h}^{-1}$ by $\hat{\vartheta}_{k,l,h}$ and define $\hat{D}_h$ accordingly. In the proof of Proposition (4.3) it is shown that $\sqrt{n}$-consistency of $\hat{\pi}(h)$ implies $\hat{D}_h - D = O_p(n^{-1/2})$, uniformly in $h$.

Next, we form the matrix $\hat{\Omega}_{M,h}$ in the same way and define

$$\hat{\sigma}_{M,h} = \hat{\sigma}_{1M,h} - \hat{\sigma}_{2M,h} \qquad (4.3)$$

where $\hat{\sigma}_{1M,h} = \ell' \hat{D}_h^{-1/2} \hat{P}_{M,h}' \hat{\Omega}_{M,h}^{-1} \hat{P}_{M,h} \hat{D}_h^{-1/2} \ell$ and $\hat{\sigma}_{2M,h} = \ell' \hat{D}_h^{-1/2} \big( \hat{P}_{M,h}' \hat{\Omega}_{M,h}^{-1} (I_{Mp} - K_M) \hat{P}_{M,h} + \hat{P}_{M,h}' (I_{Mp} - K_M) \hat{\Omega}_{M,h}^{-1} \hat{P}_{M,h} \big) \hat{D}_h^{-1/2} \ell$. In practice, infinite dimensional matrices such as $\hat{P}_{\infty,h}$ have to be replaced by finite dimensional matrices. The resulting approximation can be made arbitrarily accurate by an appropriate choice of the relevant dimensions without affecting the convergence results.

We now define the estimate $\hat{M}^*$ as

$$\hat{M}^* = \operatorname*{argmin}_{M \in I}\ \frac{M^2}{n}\, \ell' \hat{D}_h^{-1/2} \hat{A}_1' \hat{A}_1 \hat{D}_h^{-1/2} \ell\, \Big( \int \phi\big(k(x),\, -\log \hat{\sigma}_{1M,h}\big)^2 dx \Big)^2 - \log \hat{\sigma}_{M,h} \qquad (4.4)$$

where $\hat{A}_1$ is defined in (4.1) and where we replace all elements in (4.3) with corresponding estimates based on $\hat{\pi}(h)$. Moreover, we replace the smoothness argument $M^{2s}\lambda^{2M}$ of the kernel $\phi(k(i/M), \cdot)$ by the estimate $-\log \hat{\sigma}_{1M,h}$. The following result can be established.
Proposition 4.3. Let $\hat{M}^*$ be defined in (4.4) with $\hat{A}_1$, $\hat{D}_h$ and $\hat{\sigma}_{M,h}$ as defined above and $\hat{h}$ as in Definition (4.2). Then

$$\big(\hat{M}^*/M^* - 1\big) = o_p(1).$$

If in addition $\log \sigma_{1M} = c_1 M^{2s}\lambda^{2M} + o(\lambda_r^{2M})$ for some $\lambda_r \leq \lambda^{3/2}$, then

$$\big(\hat{M}^*/M^* - 1\big) = O_p\big(n^{-1/2} (\log n)^{1/2+s'}\big)$$

where the constant $s'$ depends on the multiplicities $s_1$ and $\bar{s}_1$ of the largest roots $\lambda_1$ and $\bar{\lambda}_1$. Establishing this result requires a stronger form of uniform convergence of the approximating parameters $\hat{\pi}(h)$ than was needed to show $\hat{D}_h - D = O_p(n^{-1/2})$, thus explaining the slower rate of convergence.

Remark 6. It is always the case that $\log \sigma_{1M} = c_1 M^{2s}\lambda^{2M} + o(\lambda_r^{2M})$ for some $\lambda_r < 1$. Here we require that $\lambda_r$ is not too close to $\lambda$, i.e. $\lambda_r \leq \lambda^{3/2}$, such that the remainder term disappears sufficiently fast.

Ultimately, one is interested in the properties of a fully automated estimator $\hat{\beta}_{n,\hat{M}^*}$, where the data determined optimal bandwidth $\hat{M}^*$ is plugged into the kernel function and the data-dependent kernel $\phi\big(k(i/\hat{M}^*),\, -\log \hat{\sigma}_{1\hat{M}^*,h}\big)$ is used. In order to analyze this estimator we need an additional Lipschitz condition for the class of permitted kernels.
Assumption E. The kernel $k(j/M)$ satisfies $|k(x) - k(y)| \leq c_1 |x^q - y^q|\ \forall x, y \in [0,1]$ for some $c_1 < \infty$ and $q \geq 1$.

Assumption (E) corresponds to the assumptions made in Andrews (1991). Using the previous results we are now in a position to state one of the main results of this paper, which establishes that an automated bandwidth selection procedure can be used to pick the number of instruments based on sample information alone.

Theorem 4.4. Suppose Assumptions (A) and (B) hold and either i) $k(.,.)$ is as defined in (3.4) and satisfies Assumptions (D) and (E), where $\phi(.,.)$ is defined in (3.3), or ii) $k(.,.) = 1\{|x| \leq 1\}$. Assume $\lambda_r \leq \lambda^{3/2}$. Let $\hat{M}^*$ be defined in (4.4). Then for case i)

$$n/\sqrt{M^*}\, \big( \hat{\beta}_{n,\hat{M}^*} - \hat{\beta}_{n,M^*} \big) = o_p(1).$$

Also, if i) holds with $s' - q < 0$, or if ii) holds, then

$$n/\sqrt{M^*}\, \big( \hat{\beta}_{n,\hat{M}^*} - \hat{\beta}_{n,M^*} \big) = O_p\big( (\log n)^{\max(0,\, s'-q)} \big)$$

and $M^*$ can be replaced by $\hat{M}^*$ in the last display as well as in Proposition (4.3).

Remark 7. Under the conditions of the Theorem, $-\log \hat{\sigma}_{1M,h}$ can be used as an argument in $\phi(.,.)$ in place of $M^{2s}\lambda^{2M}$.

Remark 8. Although the constant $s'$ is typically unknown, it is often reasonable to assume that $s' = 1$, which requires that $\lambda_1 \neq \bar{\lambda}_1$ and there are no multiplicities in the largest roots.

Theorem (4.4) shows that using the feasible bandwidth estimator $\hat{M}^*$ results in estimates $\hat{\beta}_{n,\hat{M}^*}$ that have asymptotic mean squared errors which are equivalent to the asymptotic mean squared errors of estimators where a nonrandom optimal bandwidth sequence $M^*$ is used. An immediate consequence of the Theorem is also that $\hat{\beta}_{n,\hat{M}^*}$ is first order asymptotically equivalent to the infeasible estimator $D^{-1}d_0$. The theorem however shows in addition higher order equivalence of $\hat{\beta}_{n,\hat{M}^*}$ and $\hat{\beta}_{n,M^*}$. In this respect Theorem (4.4) is stronger than Proposition 4 of Donald and Newey (2001).
5. Bias Reduction and Kernel Selection

In this section we analyze the asymptotic bias of $\hat{\beta}_{n,M}$ as a function of the sample size $n$ and the bandwidth parameter $M$.

Theorem 5.1. Suppose Assumptions (A) and (B) hold and $k(.,.)$ satisfies Assumption (D). If $M, n \to \infty$ and $M/n^{1/2} \to 0$, then

$$\lim_{n \to \infty} \frac{\sqrt{n}}{M}\, E[b_{n,M}] = D^{-1} A_1' \int \phi^2(x)\, dx.$$

Remark 9. This result also holds for kernels satisfying Assumption (C) if $\int \phi^2(x)\, dx$ is replaced by $\int k^2(x)\, dx$.

A simple consequence of this result is that for many standard kernels the asymptotic bias of the kernel weighted GMM estimator is lower than the bias for the standard GMM estimator based on the truncated kernel.

Corollary 5.2. Suppose Assumptions (A) and (B) hold and $k(.,.)$ satisfies Assumption (D) with $\int \phi(x)^2 dx < 2$ or Assumption (C) with $\int k^2(x)\, dx < 2$. If $n, M \to \infty$ and $M/n^{1/2} \to 0$, then

$$\lim_{n \to \infty} \big\| \sqrt{n}/M\; E(b_{n,M} - \beta) \big\| < \lim_{n \to \infty} \big\| \sqrt{n}/M\; E(b_{n,M}^{tr} - \beta) \big\|$$

where $b_{n,M}^{tr}$ is the stochastic approximation to the GMM estimator based on the truncated kernel.

It can be shown easily that substituting well known kernels such as the Bartlett, Parzen or Tukey-Hanning kernels in $\phi(v,z)$ leads to $\int \phi(x)^2 dx$ being equal to 16/15, 67/64 and 1.34 respectively.
We now turn to the question of optimality in the higher order MSE sense of the choice of kernel function. Let $\tilde{M}^*$ be the optimal bandwidth for the truncated kernel. From Proposition (3.2) it follows at once that a kernel $k(.,.)$ dominates the truncated kernel along the sequence $\tilde{M}^*$ if

$$\mathcal{A} \Big( \int \phi(k(x))^2 dx \Big)^2 + \big( c_0^2 k_q^2 B^{(q)} + B_1 \big)/\kappa^* \leq \mathcal{A} + B_1/\kappa^* \qquad (5.1)$$

where $\kappa^*$ is the limit of $1/(\tilde{M}^{*2s}\lambda^{2\tilde{M}^*} n)$. Note that any kernel $k(.,.)$ such that this inequality is satisfied dominates the truncated kernel. In Theorem (5.3) a simple variational argument is used to show that we can always find a kernel $k(.,.)$ for which (5.1) is satisfied uniformly on a compact subset of the parameter space. This result raises the question of finding an optimal or at least dominating kernel. When $q$ is fixed this problem has been solved for standard kernels by Priestley (see Priestley, 1981, p. 569). In our context of rate-adaptive kernels with fully flexible $q$ it is not known whether closed form solutions of the associated functional optimization problem exist. In any event, such solutions most likely depend on the constant $B^{(q)}$ which is difficult to estimate. To avoid these complications we only choose the kernel from a parametrized class and propose the following data-dependent kernel selection problem.

Let $\tau$ be a finite integer and let $\psi = (\psi_1, \ldots, \psi_\tau)$ with $\psi \in U^\tau$, where $U^\tau \subset \mathbb{R}^\tau$ is compact. Define $k(x) = 1 + \sum_{i=1}^{\tau} \psi_i x^i$ for $x \in [0,1)$ and $k(x) = 0$ if $|x| \geq 1$, where it is understood that $k(x) = k(|x|)$ also for $x < 0$. The permissible class of kernels is $\mathcal{K}_\tau = \{k(x) \mid k(x) = 1 + \sum_{i=1}^{\tau} \psi_i x^i,\ \psi \in U^\tau,\ k(x) \in [-1,1]\}$, where $U^\tau$ enforces adaptiveness of $\phi(\cdot,\cdot)$ and the restrictions outlined before. Because by the Weierstrass Theorem any continuous kernel can be approximated by polynomials, this is a reasonable class to search over; it is also a reasonable optimality criterion because one of the main objectives is to construct kernel functions that dominate the truncated kernel.

The data-dependent optimal kernel $\hat{k}^*$ is then defined as

$$\hat{k}^* = \operatorname*{argmin}_{k \in \mathcal{K}_\tau}\ \hat{\mathcal{A}} \Big( \int \phi(k(x))^2 dx \Big)^2 + \hat{\sigma}_{2\tilde{M}^*}(k) \qquad (5.2)$$

where the notation $\hat{\sigma}_{2\tilde{M}^*}(k)$, depending on $k$ and $\tilde{M}^*$, is used to emphasize that the $K_M$-matrix used to construct $\sigma_{2M}$ contains diagonal elements $\phi(k(i/\tilde{M}^*), \cdot)$ and is evaluated at $\tilde{M}^*$, the optimal bandwidth for the truncated kernel. To see why (5.1) implies dominance, note that the particular choice of $k^*$ guarantees that $k^*$ has the same variance term $B_1/\kappa^*$ as the truncated kernel when evaluated along the sequence $\tilde{M}^*$. Once the kernel $k^*$ is selected, its MSE under its own optimal bandwidth sequence can be no worse than under the sequence $\tilde{M}^*$, the optimal bandwidth for the truncated kernel.
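Since $\mathcal{K}_1$ contains a single free coefficient, the search in (5.2) reduces to a one-dimensional grid. The sketch below (my own simplification with $\tau = 1$; `a_hat` and `penalty` are placeholders for the estimated variance constant and the $\hat{\sigma}_{2\tilde{M}^*}(k)$ term, which the paper estimates from the data) illustrates the idea.

```python
import numpy as np

def int_phi2(psi, z, q=1, grid=2001):
    """Approximate int phi(k(x), z)^2 dx for k(x) = 1 + psi * x on [-1, 1]."""
    x = np.linspace(0.0, 1.0, grid)
    k = 1.0 + psi * x                         # tau = 1 polynomial kernel
    zeta = z * (-np.log(z)) ** q
    phi_k = (2.0 - zeta) * k + (zeta - 1.0) * k ** 2
    return 2.0 * float(np.mean(phi_k ** 2))   # symmetry doubles the [0, 1] integral

def select_kernel(a_hat, z, penalty, psi_grid=np.linspace(-1.0, 0.0, 101)):
    """Grid search version of (5.2) over the coefficient psi_1."""
    obj = [a_hat * int_phi2(psi, z) ** 2 + penalty(psi) for psi in psi_grid]
    return psi_grid[int(np.argmin(obj))]
```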
We establish the following result.

Theorem 5.3. Let $\hat{k}^*(x)$ be defined as in (5.2) and let $\hat{\beta}_{n,\hat{M}^*(\hat{k}^*)}$ be the kernel weighted GMM estimator with kernel $\hat{k}^*$, where $\hat{M}^*(\hat{k}^*) = \operatorname{argmin}_{M \in I} n^{-1} M^2 \hat{\mathcal{A}} \big( \int \phi(\hat{k}^*(x))^2 dx \big)^2 - \log \hat{\sigma}_{M,h}$ and where $\phi\big(\hat{k}^*(i/M),\, -\log \hat{\sigma}_{1M,h}\big)$ is used as kernel weight. Then

$$\sup_x \big( \hat{k}^*(x) - k^*(x) \big) = O_p\big( n^{-1/2} (\log n)^{1/2+s'} \big)$$

pointwise in the parameter space. Furthermore, let $\Theta = \{A_1, \ldots, A_{p_a}, B_1, \ldots, B_{q_b};\ p_a, q_b\ \text{finite}\}$ be the set of all reduced form models that satisfy Assumption (B). Let $\Theta_0$ be a compact set $\Theta_0 \subset \Theta$ such that $\sup_{\Theta_0} \lambda < 1$ and $\inf_{\Theta_0} \mathcal{A} > 0$. Then, for some collection of sets $U^\tau$, each sufficiently large, there exists a $k \in \mathcal{K}^{q'}$ with $k_{q'} \neq 0$ for any $q' \in [1,\tau]$, $\tau < \infty$, such that

$$\sup_{\Theta_0} \Big\{ \mathcal{A} \Big( \int \phi(k(x))^2 dx \Big)^2 - \mathcal{A} + c_0^2 k_{q'}^2 B^{(q')}/\kappa^* \Big\} < 0.$$

The second part of the theorem implies that the truncated kernel is always dominated by $k^*$ and that there is an element in $\mathcal{K}$ that strictly dominates it. This is the case because the truncated kernel $1\{|x| \leq 1\}$ is itself an element of $\mathcal{K}$.
6. Bias Correction

Another important issue is whether the bias term can be corrected for. The benefits of such a correction are analyzed first. It turns out that correcting for the bias term increases the optimal rate of expansion for the bandwidth parameter and consequently accelerates the speed of convergence to the asymptotic normal limit distribution. Using the result in Theorem (5.1) the following bias corrected estimator is proposed

$$\hat{\beta}_{n,M}^c = \hat{\beta}_{n,M} - \frac{M}{n} \big( \hat{P}_M' \hat{\Omega}_M^{-1} \hat{P}_M \big)^{-1} \hat{A}_1' \int \phi^2(x)\, dx. \qquad (6.1)$$

Note that $A_1$ can be estimated by the methods described in the previous section. For standard GMM (truncated kernel) the bias correction term is simply $2 \frac{M}{n} \big( \hat{P}_M' \hat{\Omega}_M^{-1} \hat{P}_M \big)^{-1} \hat{A}_1'$. The quality of the estimator $\hat{A}_1$ determines the impact of the correction on the higher order convergence rate of the estimator.
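In code, the correction (6.1) is one extra line after the GMM step. The hedged sketch below (shapes assumed conformable; `A1_hat` from (4.1), `int_phi2` equal to 2 for the truncated kernel) subtracts the estimated leading bias term.

```python
import numpy as np

def bias_correct(beta_hat, M, n, P_hat, Omega_hat, A1_hat, int_phi2):
    """Bias corrected estimator (6.1): beta - (M/n) (P'O^{-1}P)^{-1} A1' * int_phi2."""
    D_hat = P_hat @ np.linalg.solve(Omega_hat, P_hat.T)   # P' Omega^{-1} P
    return beta_hat - (M / n) * int_phi2 * np.linalg.solve(D_hat, A1_hat)
```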
If $\hat{A}_1 - A_1 = O_p(n^{-\eta})$ for $\eta \in (0, 1/2]$, then the convergence rate of the estimator is improved. If $\hat{A}_1 - A_1$ is only $O_p(1)$, then the convergence rate of the estimator is essentially the same as the one for $\hat{\beta}_{n,M}$. The mean squared error of the bias corrected estimator is defined as before by $\sqrt{n}\,(\hat{\beta}_{n,M}^c - \beta) = b_{n,M}^c + r_{n,M}^c$ with $r_{n,M}^c$ as in (3.5). We define

$$n\, \ell' D^{1/2}\, E\, b_{n,M}^c b_{n,M}^{c\prime}\, D^{1/2\prime} \ell = \varphi_n^c(M, \ell, k(.)) + R_{n,M}^c$$

with the same restrictions imposed on the remainder terms $R_{n,M}^c$ as before. We obtain the following result.
Theorem 6.1. Suppose Assumptions (A) and (B) hold and $k(.,.)$ satisfies Assumptions (C) or (D). Then, for any $\ell \in \mathbb{R}^d$ with $\ell'\ell = 1$,

$$\varphi_n^c(M, \ell, k(.,.)) = O(M/n) - \log \sigma_M.$$

The optimal $M_c^*$ can be chosen by

$$M_c^* = \operatorname*{argmin}_{M \in I} \frac{Mp}{n} - \log \sigma_M.$$

If $\hat{M}_c^* = \operatorname{argmin}_{M \in I} \frac{Mp}{n} - \log \hat{\sigma}_{M,h}$, then $(\hat{M}_c^*/M_c^* - 1) = O_p\big(n^{-1/2}(\log n)^{1/2+s'}\big)$ and

$$n/\sqrt{\hat{M}_c^*}\, \Big( \hat{\beta}_{n,\hat{M}_c^*}^c - \hat{\beta}_{n,M_c^*}^c + \frac{M_c^*}{n} D^{-1} A_1' \int \phi^2(x)\, dx \Big) = O_p(1).$$

Remark 10. The result remains valid if $\hat{\beta}_{n,\hat{M}_c^*}^c$ is replaced with $\hat{\beta}_{n,\hat{M}_c^*(\hat{k}^*)}^c$ as long as $q' > s'$. It follows from Theorem (6.1) that for $\hat{\beta}_{n,M_c^*}^c$ the higher order MSE is $O(\log n/n)$ compared to $O((\log n)^2/n)$ for the GMM estimator without bias correction.
7. Monte Carlo Simulations

A small Monte Carlo experiment is conducted in order to assess the performance of the proposed moment selection and bias correction methods. For the simulations we consider the following data generating process

$$y_{t,1} = \beta y_{t,2} + u_t, \qquad y_{t,2} = \varphi y_{t-1,2} + \theta u_{t-1} + v_t \qquad (7.1)$$

with $[u_t, v_t]' \sim N(0, \Sigma)$ where $\Sigma$ has elements $\sigma_u^2 = \sigma_v^2 = 1$ and $\sigma_{12}$. The parameter $\beta$ is the parameter to be estimated and is set to 1 in all simulations. All remaining parameters are nuisance parameters not explicitly estimated. The parameter $\sigma_{12}$ is one of the determinants of the small sample bias of both Ordinary Least Squares (OLS) and GMM estimators and is chosen in $\{.1, .3, .5\}$. The parameter $\varphi$ controls the quality of lagged instruments and is set to $\{-.9, -.5, 0, .5, .9\}$. Low values of $\varphi$ imply that the model is poorly identified. The parameter $\theta$ finally is set to $.5$. We generate samples of size $n = \{128, 512\}$ from Model (7.1). Starting values are $y_0 = 0$ and in each sample the first 1,000 observations are discarded to eliminate dependence on initial conditions.
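For reference, a sketch of this design (my reading of the description above; names and defaults are assumptions) is:

```python
import numpy as np

def simulate(n, beta=1.0, phi=0.5, theta=0.5, sigma12=0.3, burn=1000, seed=0):
    """Draw (y1, y2) from the design (7.1) after discarding burn-in observations."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, sigma12], [sigma12, 1.0]])
    uv = rng.multivariate_normal(np.zeros(2), cov, size=n + burn)
    u, v = uv[:, 0], uv[:, 1]
    y2 = np.zeros(n + burn)
    for t in range(1, n + burn):
        y2[t] = phi * y2[t - 1] + theta * u[t - 1] + v[t]
    y1 = beta * y2 + u
    return y1[burn:], y2[burn:]

y1, y2 = simulate(512, phi=0.9, sigma12=0.5)   # a strongly identified design point
```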
Standard GMM estimators are obtained from applying Formula (3.2) with $K_M = I_M$. In order to estimate $\hat{\Omega}_M$ we first construct an inefficient but consistent estimate $\tilde{\beta}_n$ based on (3.2), setting $K_M = I_M$ and $\hat{\Omega}_M = I_M$. We then construct residuals $\hat{\varepsilon}_t = y_{1t} - \tilde{\beta}_n y_{2t}$ and estimate $\hat{\Omega}_M$ as described in (3.1). Kernel weighted GMM estimators (KGMM) are based on the same inefficient initial estimate, such that the estimate for $\hat{\Omega}_M$ is identical to the weight matrix used for the standard GMM estimators. In the second stage we again apply (3.2) with $\hat{\Omega}_M$ and the matrix $K_M$ based on the optimal data-dependent kernel $\hat{k}^*$ defined in Equation (5.2) with $\mathcal{K}_q = \mathcal{K}_1$. The estimated optimal bandwidth $\hat{M}^*$ is computed according to the procedure laid out in (5.3).¹

For each simulation replication we obtain a consistent estimate of $A_1$ based on Formula (4.1). Next, we estimate $\theta$ by fitting an MA(1) model to $\hat{\varepsilon}_t$ using the Matlab procedure arimax. We then estimate the sample autocovariances $\hat{\Gamma}_j$ for $y_t = [y_{1t}, y_{2t}]'$ and $j = 0, \ldots, n/2$, where $n$ is the sample size. We use the procedure of Definition (4.2) to determine the optimal specification of the approximating VAR, allowing for a maximum of $2\,[n^{1/4}]$ lags. Based on the optimal lag length specification we compute the impulse coefficients of the VAR which are then used to estimate the remaining parameters $\hat{D}$ and $\hat{\sigma}_M$ needed for the criterion MIC(M) as well as for optimal kernel selection.
In Tables 1-3 we compare² the performance of feasible kernel weighted GMM with optimally chosen kernel, KGMM-Opt, to feasible standard GMM with truncated kernel, GMM-Opt, where both estimators use the automatic selection of $\hat{M}^*$. We also consider both estimators, GMM-X and KGMM-X (with optimally chosen kernel), with a fixed number of $X = 1, \ldots, 20$ lagged instruments. In addition we report the performance of the corresponding bias corrected versions BGMM-Opt and BKGMM-Opt, where $\hat{M}^*$ is selected according to the procedure described in Theorem (6.1).

In order to separate the effects of selecting $M$ from the properties of using weights for the moment conditions, we first consider the estimators with a fixed number of instruments. Tables 1-3 show that for $\varphi$ small relative to the sample size there is little difference between the two estimators. They are also not very different from OLS. As the identification of the model improves, KGMM starts to dominate GMM both in terms of (median) bias and mean absolute error (MAE). This effect becomes more pronounced as more and more instruments are being used, which can be explained by the predominance of bias terms in this case as well as the bias reducing property of the kernel weighted estimator.

Turning now to the fully feasible versions, we see that the same results remain to hold. For poorly identified models the choice of $M$ does not affect bias that much and all the estimators considered have roughly the same bias properties. Especially for poorly identified parametrizations optimal GMM is much more disperse than optimal KGMM. The reason for this lies in the fact that optimal GMM tends to be based on fewer instruments, which results in somewhat lower bias but comes at the cost of increased variability.

¹The Matlab code is available on request.
²Results for $\theta = \{-.9, .9\}$ are available on request.
As identification, parameterized by $\varphi$, and/or sample size improve, optimally chosen kernel weighted GMM starts to dominate standard GMM for most parameter combinations. The bias corrected versions of both estimators attain further improvements both as far as bias as well as MSE and MAE are concerned when the model is well identified. In these circumstances the kernel weighted and bias corrected version tends to have a somewhat larger MSE compared to the non-weighted version; on the other hand the non-weighted version tends to overcorrect bias in some cases. A clear ranking is thus not possible. When identification is weak the bias corrected estimators tend to perform relatively poorly because bias correction produces severe outliers, resulting in inflated dispersion measures. For this reason only inter decile ranges (IDR) are reported.

In the theoretical development of the paper we have maintained the assumption of conditional homoskedasticity of the innovations. While the strongest results concerning higher order adaptiveness and optimality of bias correction are not expected to go through without the homoskedasticity assumption, it is still plausible that the criterion MIC(M) performs reasonably well even with heteroskedastic errors. We investigate this question by changing the first equation in Model (7.1) to $y_{t,1} = \beta y_{t,2} + \varepsilon_t$, where $\varepsilon_t = h_t^{1/2}\xi_t$ and $h_t$ follows the IGARCH(1,1) process $h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \theta_1 h_{t-1}$, while $[\xi_t, v_t]'$ are defined as before. We set $\theta_1 = .2$, $\alpha_0 = .1$ and $\alpha_1 = .8$. Since heteroskedasticity of this form is not easy to detect in the data, we assume that the GMM estimators are now implemented with heteroskedasticity consistent covariance matrix estimators $\hat{\Omega}_M$. For simplicity we use the procedure of Newey and West (1987) with a fixed number of lags. The criterion MIC(M) and the optimal bandwidth $\hat{M}^*$ are implemented as before. The results are reported in Table 4 for the case of $\varphi = .5$; results for other parametrizations are available on request.

For a fixed number of moment conditions KGMM continues to perform well, while KGMM-Opt now dominates GMM-Opt in many cases for the larger sample size $n = 512$. The estimator does worse when identification is weak but performs at least as well when identification is strong and/or sample sizes are large. As expected, bias correction is no longer effective in reducing bias; moreover, when combined with kernel weighting, it is no longer expected that the bias corrected estimators dominate asymptotically. Overall, the bias corrected estimators tend to perform relatively poorly in this design and their performance is sensitive to the underlying data-generating process.
8. Conclusions

We have analyzed the higher order asymptotic properties of GMM estimators for time series models. Using expressions for the asymptotic Mean Squared Error, a selection rule for the optimal number of lagged instruments is derived. It is shown that plugging an estimated version of the optimal rule into the estimator leads to a fully feasible GMM procedure. A new version of the GMM estimator for linear time series models is proposed where the moment conditions are weighted by a kernel function. It is shown that optimally chosen kernel weights of the moment restrictions reduce the asymptotic bias and MSE. Correcting the estimator for the highest order bias term leads to an overall increase in the optimal rate at which higher order terms vanish asymptotically. A fully automatic procedure to choose both the number of instruments as well as the optimal kernel is proposed.
A. Proofs

Auxiliary Lemmas are collected in Appendix B which is available upon request. They are referred to in the text as Lemma (B.XX). Before stating the proofs a few commonly used terms are defined.

Definition A.1. Let $\mu_x = E x_t$. Define $w_{t,i} = \varepsilon_{t+m}(y_{t-i+1} - \mu_y)$ and $w_{t,i}^x = (x_{t+m} - \mu_x)(y_{t-i+1} - \mu_y)'$, with $\Gamma_i^w = E w_{t,i}$ and $\Gamma_i^{wx} = E w_{t,i}^x$. Next define $v_{t,i,j} = (y_{t-i} - \mu_y)(y_{t-j} - \mu_y)'$ with $\Gamma_{i,j}^v = E v_{t,i,j}$, and let $E\varepsilon_{t+m} y_s = \Gamma_{\varepsilon y}$. For a matrix $A$, $\|A\|^2 = \operatorname{tr}(AA')$. The $j,k$-th elements of $\Omega$, $\Omega^{-1}$ and $\hat{\Omega}$ are denoted by $\vartheta_{j,k}$, $\bar{\vartheta}_{j,k}$ and $\hat{\vartheta}_{j,k}$ respectively.
Proof of Proposition (3.2): Consider a second order Taylor approximation of \hat D_M^{-1}\hat d_M around D^{-1}, for \hat d_M = \hat P_M' K_M \hat\Omega_M^{-1} K_M n^{-1/2}\sum_t \varepsilon_{t+m} Z_t^M and \hat D_M = \hat P_M' K_M \hat\Omega_M^{-1} K_M \hat P_M,

(A.1)  \sqrt{n}(\hat\beta_{n,M} - \beta) = D^{-1}[I - (\hat D_M - D)D^{-1} + (\hat D_M - D)D^{-1}(\hat D_M - D)D^{-1}]\hat d_M + O_p(M/n^{3/2})

where the error term is O_p(M/n^{3/2}) as shown in Lemma (B.33), using the Taylor theorem and the fact that \det D \neq 0. We decompose the expansion into \hat D_M - D = H_1 + H_2 + H_3 + H_4, where H_1 = P_M'\Omega_M^{-1}P_M - P'\Omega^{-1}P is deterministic, H_2 = \hat P_M' K_M \Omega_M^{-1} K_M \hat P_M - P_M' K_M \Omega_M^{-1} K_M P_M, H_3 = -\hat P_M' K_M \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} K_M \hat P_M and H_4 is a remainder term defined in (A.14), and \hat d_M = d_0 + d_1 + \dots + d_9 with d_j defined in (A.15)-(A.24), such that

\sqrt{n}(\hat\beta_{n,M} - \beta) = D^{-1}\sum_{j=0}^{9} d_j - D^{-1}\sum_{i=1}^{4}\sum_{j=0}^{9} H_i D^{-1} d_j + o_p(M/\sqrt{n}).

The terms H_3 and H_4 contain a Taylor series expansion of \hat\Omega_M^{-1} around \Omega_M^{-1} given by \hat\Omega_M^{-1} = \Omega_M^{-1} - \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} + B + o_p(\|\hat\Omega_M - \Omega_M\|^2), where B has typical element k,l given by \tfrac{1}{2}\operatorname{vec}(\hat\Omega_M - \Omega_M)'\,[\partial^2\vartheta_{k,l}/\partial\operatorname{vec}\Omega\,\partial\operatorname{vec}\Omega']\,\operatorname{vec}(\hat\Omega_M - \Omega_M), and \hat\Omega_M - \Omega_M = o_p(1) by Lemma (B.9).

Let g_k(M) denote the rate of the highest order bias terms, where the notation g_k indicates the dependence of the rate on the kernel used. Define the constant c_k = 1 for regular kernels k(j/M) and c_k = -\log\lambda for rate-adaptive kernels k(j, M). The notation = means 'equal by definition'. In Lemmas (B.14) to (B.16) it is shown that H_1 = H_{11} + H_{12} + H_{13} + H_{14} with

(A.2)  H_{11} = P_M'\Omega_M^{-1}P_M - P'\Omega^{-1}P = O(M^{2s}\lambda^{2M})
(A.3)  H_{12} = P_M'(I - K_M)\Omega_M^{-1}(I - K_M)P_M = O(g_k(M)^2)
(A.4)  H_{13} = -P_M'\Omega_M^{-1}(I - K_M)P_M = O(g_k(M))
(A.5)  H_{14} = -P_M'(I - K_M)\Omega_M^{-1}P_M = O(g_k(M)).

In Lemmas (B.17) to (B.20) the term H_2 = H_{211} + H_{212} + H_{221} + H_{222} is analyzed to be

(A.6)  H_{211} = -(\hat P_M - P_M)' K_M \Omega_M^{-1} K_M (\hat P_M - P_M) = O_p(M/n)
(A.7)  H_{212} = P_M' K_M \Omega_M^{-1} K_M (\hat P_M - P_M) + (\hat P_M - P_M)' K_M \Omega_M^{-1} K_M P_M = O_p(n^{-1/2})
(A.8)  H_{221} = -(\hat P_M - P_M)' K_M \Omega_M^{-1} K_M' (\hat P_M - P_M) = O_p(M/n)
(A.9)  H_{222} = P_M' K_M \Omega_M^{-1} K_M (\hat P_M - P_M) + (\hat P_M - P_M)' K_M \Omega_M^{-1} K_M P_M = O_p(M/n^{1/2})

where \hat P_M is defined in Section 3. Lemmas (B.21) and (B.22) show that H_3 = H_{31} + H_{32} + H_{33} + H_{34} + O_p(M/n^{3/2}), where \hat\Omega_M = [\hat\Gamma^{yy}_{i-j}]_{i,j=1,\dots,M} with \hat\Gamma^{yy}_j = n^{-1}\sum_{t=\max(j+1,\,t-m)+1}(y_t - \bar y)(y_{t-j} - \bar y)', and

(A.10)  H_{31} = (\hat P_M - P_M)' K_M \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} K_M (\hat P_M - P_M) = O_p(M/n)
(A.11)  H_{32} = -P_M' K_M \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} K_M (\hat P_M - P_M) = O_p(M/n)
(A.12)  H_{33} = -(\hat P_M - P_M)' K_M \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} K_M P_M = O_p(M/n)
(A.13)  H_{34} = -P_M' K_M \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} K_M P_M = O_p(n^{-1/2})

and H_4, which is a remainder term defined as

(A.14)  H_4 = \hat P_M' K_M(\hat\Omega_M^{-1} - \Omega_M^{-1} + \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1})K_M \hat P_M = O_p(M/n)

where the last equality follows from Lemma (B.23).

Next we turn to the analysis of \hat d_M, which is decomposed as \hat d_M = d_0 + d_1 + \dots + d_9. Define V_M = n^{-1/2}\sum_t (w_{t,1}', \dots, w_{t,M}')' and V = V_\infty, such that it follows from Lemmas (B.24) to (B.33) that

(A.15)  d_0 = P'\Omega^{-1}V = O_p(1)
(A.16)  d_1 = P_M'\Omega_M^{-1}V_M - P'\Omega^{-1}V = O_p(M^s\lambda^M)
(A.17)  d_2 = P_M'(I - K_M)\Omega_M^{-1}(I - K_M)V_M = O_p(g_k(M)^2)
(A.18)  d_3 = -P_M'(I - K_M)\Omega_M^{-1}V_M - P_M'\Omega_M^{-1}(I - K_M)V_M = O_p(g_k(M))
(A.19)  d_4 = (\hat P_M - P_M)' K_M(\hat\Omega_M^{-1} - \Omega_M^{-1})K_M V_M = O_p(M/n)
(A.20)  d_5 = (\hat P_M - P_M)' K_M \Omega_M^{-1} K_M V_M = O_p(M/n^{1/2})
(A.21)  d_6 = -(\hat P_M - P_M)' K_M \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} K_M V_M = O_p(M/n)
(A.22)  d_7 = P_M' K_M \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} K_M V_M = O_p(n^{-1/2})
(A.23)  d_8 = \hat P_M' K_M B K_M V_M = O_p(M/n)
(A.24)  d_9 = n^{-1/2}\sum_t \varepsilon_{t+m} P_M' K_M \Omega_M^{-1} K_M [1_M \otimes (\bar y - \mu_y)] = O_p(M/n^{1/2}).
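For completeness, the algebraic identity behind the second order approximation in (A.1) can be written out as follows. This is a standard matrix geometric-series expansion, recorded here as a sketch under the assumption that D is nonsingular and \|(\hat D_M - D)D^{-1}\| < 1.

```latex
% Second order expansion of \hat D_M^{-1} around D^{-1}:
% with E = \hat D_M - D and \|E D^{-1}\| < 1,
\hat D_M^{-1} = (D + E)^{-1}
             = D^{-1}\sum_{j=0}^{\infty}\bigl(-E D^{-1}\bigr)^{j}
             = D^{-1}\left[I - E D^{-1} + E D^{-1} E D^{-1}\right] + O\!\left(\|E\|^{3}\right).
```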
We first focus on regular kernels where g_k(M) = M^{-q}. We consider the terms in the expansion of the estimator, which is D^{-1}\sum_{j=0}^{9} d_j - D^{-1}\sum_{i=1}^{4}\sum_{j=0}^{9} H_i D^{-1} d_j in probability. From the results in Equations (A.2) to (A.24) it follows that the largest such terms are d_0, d_2, d_3, d_5, H_{13}, H_{14} and H_{222}. Of those terms we examine cross products of the form Ed_i d_j', Ed_i d_0' D^{-1} H_j' and EH_i D^{-1} d_0 d_0' D^{-1} H_j'. Letting B^{(q)} = c_k^{q} k_q \lim_{M\to\infty}(H_{13} + H_{14})/g_k(M), the largest terms vanishing at rate M^{-q} as M \to \infty are Ed_0 d_3' and Ed_0 d_0' D^{-1}(H_{13} + H_{14})'. The two terms cancel because they are of opposite sign. Now define \tilde B^{(q)} = k_q^2 \lim_{M\to\infty} P_M'(I - K_M)\Omega_M^{-1}(I - K_M)P_M / g_k(M)^2. Terms of order M^{-2q} include Ed_0 d_2' = M^{-2q} k_q^2 \tilde B^{(q)} + o(M^{-2q}) by Lemma (B.39) and -Ed_0 d_0' D^{-1} H_{12}' = -M^{-2q} k_q^2 \tilde B^{(q)} + o(M^{-2q}) by Lemma (B.38). Since Ed_0 d_2' and -Ed_0 d_0' D^{-1} H_{12}' are of opposite sign these terms cancel. We are left with E(d_3 - (H_{13} + H_{14})D^{-1}d_0)(d_3 - (H_{13} + H_{14})D^{-1}d_0)' = O(M^{-2q}) by Lemmas (B.16), (B.24), (B.25), (B.39) and (B.43). Terms that grow with M and n and are largest in order are H_{222}D^{-1}d_0 and d_5. It follows that the cross product term EH_{222}D^{-1}d_0 d_5' is O(n^{-1}) by Lemma (B.35), and Ed_5 d_5' = O(M^2/n) by Lemmas (B.40) and (B.41). We are left with EH_{222}D^{-1}d_0 d_0' D^{-1}H_{222}' = O(M^2/n) by Lemma (B.44). Then

\varphi_n(M, \ell, k(.)) = O(M^2/n) + O(M^{-2q})

and all remaining terms are of lower order.

Next we turn to the case of the rate-adaptive kernel where g_k(M) = M^s\lambda^M. Now H_{11}, H_{12}, H_{13}, H_{14}, H_{222} and d_0, d_1, d_2, d_3, d_5 are largest in probability. In Lemmas (B.34) and (B.36) we show that Ed_0 d_0' D^{-1} H_{11} = -Ed_0 d_1' such that these terms cancel out. Because Ed_0 d_1' is of lower order, it follows that Ed_0 d_0' D^{-1}(H_{13} + H_{14})' = M^{2s}\lambda^{2M} B_1 + o(g_k(M)^2), where B_1 = \lim_{M\to\infty}(H_{13} + H_{14})D^{-1}(H_{13} + H_{14})'/g_k(M)^2 = O(M^{2s}\lambda^{2M}) by Lemmas (B.16), (B.24), (B.25) and (B.39). The largest terms remaining are therefore Ed_1 d_1' and E(d_3 - (H_{13} + H_{14})D^{-1}d_0)(d_3 - (H_{13} + H_{14})D^{-1}d_0)' = O(M^{2s}\lambda^{2M}) by Lemma (B.43). The largest term growing with M is not affected by the kernel choice and is therefore Ed_5 d_5' = O(M^2/n) as before.

For part ii) and iii) we only need to consider the terms A_n = Ed_5 d_5' and B_n = E(d_3 - (H_{13} + H_{14})D^{-1}d_0)(d_3 - (H_{13} + H_{14})D^{-1}d_0)' + Ed_1 d_1'. Since for all n > 1 we have A_n \geq 0 and B_n \geq 0, it follows that \liminf_n A_n \geq 0 and \liminf_n B_n \geq 0 such that A, \tilde B^{(q)} and B_1 are nonnegative. From Lemma (B.44) it follows that

E\ell'D^{-1/2}d_5 d_5'D^{-1/2}\ell = \frac{M^2}{n}\left(\int_0^1 \phi(x)^2 dx\right)\ell'D^{-1/2}A_1 A_1'D^{-1/2}\ell + o(M^2/n).

From Lemma (B.39) it follows that Ed_0 d_0' = D + o(1), and from Lemma (B.24) it follows that M^{-s}\lambda^{-M}Ed_1 = -k_q B_1^{(q)} + o(1), such that

M^{2s}\lambda^{2M}\,E(H_{13} + H_{14})D^{-1}d_0 d_0'D^{-1}(H_{13} + H_{14})' = c_k^{2q}k_q^2 B^{(q)}D^{-1}B^{(q)\prime} + o(M^{2s}\lambda^{2M}).

This implies that B_n = Ed_3 d_3' - E(H_{13} + H_{14})D^{-1}d_0 d_0'D^{-1}(H_{13} + H_{14})' + Ed_1 d_1' + o(M^{2s}\lambda^{2M}), or in other words

B_n = M^{2s}\lambda^{2M}\left(k_q^2 B^{(q)} + B_1\right) + o(M^{2s}\lambda^{2M}).
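The rate trade-off just derived for regular kernels determines the optimal growth of M. The following one-line minimization is a sketch of how the optimal rate arises, under the simplifying assumption that the criterion behaves like \varphi_n(M) \approx c_1 M^2/n + c_2 M^{-2q} with constants c_1, c_2 > 0.

```latex
% Minimizing  \varphi_n(M) \approx c_1 M^2/n + c_2 M^{-2q}  over M:
%   \frac{d\varphi_n}{dM} = \frac{2 c_1 M}{n} - 2q\, c_2 M^{-2q-1} = 0
%   \quad\Longrightarrow\quad
%   M^{*} = \left(\frac{q\, c_2\, n}{c_1}\right)^{1/(2q+2)} \;\propto\; n^{1/(2q+2)} .
```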
Proof of Proposition (3.4): First note that by Lemma (B.43), D - P_M'\Omega_M^{-1}P_M = Q_M\Omega_M Q_M', where Q_M is defined in (B.19), such that

\ell'D^{-1/2}(D - P_M'\Omega_M^{-1}P_M)D^{-1/2}\ell = M^{2s}\lambda^{2M}B_1 + o(M^{2s}\lambda^{2M}).

Since P_M'\Omega_M^{-1}P_M > 0, by standard arguments it follows that |1 - \sigma_{1M}| \to 0 as M \to \infty. Also,

b_M\Omega_M b_M' = Q_M\Omega_M Q_M' - (H_{13} + H_{14})(P_M'\Omega_M^{-1}P_M)^{-1}(H_{13} + H_{14})'

where the last line follows from \lim (H_{13} + H_{14})/g_k(M) = c_k^q B^{(q)} and the fact that b_1^{(q)} = c_k^{2q}k_q^2 \lim Q_M\Omega_M Q_M'/g_k(M)^2, by similar arguments as in the proof of Lemma (B.43). More specifically, consider for example Q_M P_M = H_{13} + H_{14} with H_{13} = M^s\lambda^M\sum_{j,k=1}^{\infty}\Gamma^{wy}_j\vartheta_{j,k}(\cdot). We use the chain rule of differentiation to write, with \phi(v, z) the kernel modification function,

1 - \phi(k(i/M), M^s\lambda^M) = (1 - k(i/M))\left[1 + k(i/M)\,M^s\lambda^M\left(-\log(M^s\lambda^M)\right)^q\right].

Because \partial\phi(v, z)/\partial v is continuous in v and z, and -\log(M^s\lambda^M) = -M\log\lambda - s\log M = -M\log\lambda\,(1 + o_M(1)), it follows that

\lim_{M\to\infty}\frac{1 - \phi(k(i/M), M^s\lambda^M)}{1 - k(i/M)} = 1 + \lim_{M\to\infty} k(i/M)\,M^s\lambda^M(-M\log\lambda)^q(1 + o_M(1)) = 1,

since M^{s+q}\lambda^M \to 0 and -\log(1 + o_M(1))/(M\log\lambda) \to 0, with k_q(-\log\lambda)^q appearing as the limiting constant of the correction term. Here, o_M(1) is a term that goes to zero as M \to \infty.

Define \sigma_M = 1 - \sigma_{1M} + \sigma_{2M}. It follows that \sigma_M = 1 + O(M^{2s}\lambda^{2M}). Let x = 1 - \sigma_{1M} + \sigma_{2M} such that -\log(1 - x) = x + o(x). Then,

-\log\sigma_M = 1 - \sigma_{1M} + \sigma_{2M} + o(x) = M^{2s}\lambda^{2M}(k_q^2 B^{(q)} + B_1) + o(M^{2s}\lambda^{2M}).

Since M^* \to \infty as n \to \infty, it follows that \varphi_n(M^*, \ell, k(.)) = MIC(M^*)(1 + o(1)). The result then follows by the same arguments as in Hannan and Deistler (1988, p.333). If in addition \log\sigma_{1M} = c_1 M^{2s}\lambda_1^{2M} + o(\lambda_1^{2M}) with \lambda_1 such that 0 < \lambda_1 < \lambda^{1/2}, then 1 - \sigma_{1M} = c_1 M^{2s}\lambda_1^{2M} + o(\lambda_1^{2M}) and the same holds for \sigma_{2M} by construction of k(., .). Then, it follows that \varphi_n(M^*, \ell, k(.)) = MIC(M^*)(1 + O(n^{-1/2}(\log n)^{1/2+s'})) by the same arguments as in the proof of Proposition (4.3).
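The rate comparison used above for the rate-adaptive kernel rests on a simple logarithm expansion, recorded here as a sketch for fixed s and 0 < \lambda < 1.

```latex
% Rate comparison for the rate-adaptive kernel:
%   -\log\!\left(M^{s}\lambda^{M}\right) = -M\log\lambda - s\log M
%      = -M\log\lambda\left(1 - \frac{s\log M}{M\log\lambda}\right)
%      = -M\log\lambda\,\bigl(1 + o_M(1)\bigr), \qquad M \to \infty .
```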
Proof of Proposition (4.1): Since x_t is composed of elements of y_t, y_{t-1}, \dots, y_{t-r}, it is enough to show without loss of generality that \sum_j \hat c_{j+m}\hat\Gamma^{yy}_{j-k} \to \sum_j c_{j+m}\Gamma^{yy}_{j-k} in probability. Let \hat\beta be a \sqrt{n}-consistent first stage estimate. The estimated residuals \hat\varepsilon_t = (y_t - \bar y) - (x_t - \bar x)'\hat\beta are used to estimate \theta. Let g(\lambda, \theta) = |\theta(e^{i\lambda})|^2 with \theta(z) = 1 - \theta_1 z - \dots - \theta_{m-1}z^{m-1}. Define the parameter space \Theta_1 \subset \mathbb{R}^{m-1} such that \theta = (\theta_1, \dots, \theta_{m-1}) \in \Theta_1 if \theta(z) \neq 0 for |z| \leq 1. By Assumption (B), \theta_0 \in \operatorname{int}\Theta_2 with \Theta_2 \subset \Theta_1 compact. The periodogram of \hat\varepsilon_t is \hat I_n(\lambda) = n^{-1}\sum_{t,s}\hat\varepsilon_t\hat\varepsilon_s e^{i\lambda(t-s)}. The maximum likelihood estimator for \theta is asymptotically equivalent to

(A.25)  \hat\theta = \operatorname{argmin}_{\theta\in\Theta_2}\hat A_n(\theta), \qquad \hat A_n(\theta) = n^{-1}\sum_j \hat I_n(\lambda_j)/g(\lambda_j, \theta)

for \lambda_j = 2\pi j/n, j = -n+1, \dots, 0, \dots, n-1. Define I^x_n(\lambda) = n^{-1}\sum_{t,s}(x_t - \mu_x)(x_s - \mu_x)'e^{i\lambda(t-s)}, I^{x\varepsilon}_n(\lambda) = n^{-1}\sum_{t,s}\varepsilon_t(x_s - \mu_x)e^{i\lambda(t-s)}, I^0_n(\lambda) = n^{-1}\sum_{t,s}\varepsilon_t\varepsilon_s e^{i\lambda(t-s)}, I^{\varepsilon}_n(\lambda) = n^{-1}(\hat\alpha_0 - \alpha_0)\sum_{t,s}\varepsilon_t e^{i\lambda(t-s)} and I^{\alpha x}_n(\lambda) = n^{-1}(\hat\alpha_0 - \alpha_0)\sum_{t,s}(x_t - \mu_x)e^{i\lambda(t-s)}, with \hat\alpha_0 - \alpha_0 based on \bar y - \mu_y and (\hat\beta - \beta). It follows that

\hat I_n(\lambda) = I^0_n(\lambda) + (\hat\beta - \beta)'I^x_n(\lambda)(\hat\beta - \beta) - 2(\hat\beta - \beta)'I^{x\varepsilon}_n(\lambda) + 2I^{\varepsilon}_n(\lambda) - 2(\hat\alpha_0 - \alpha_0)(\hat\beta - \beta)'I^{\alpha x}_n(\lambda) + (\hat\alpha_0 - \alpha_0)^2 n^{-1}\sum_{t,s}e^{i\lambda(t-s)}.

Note that I^{\varepsilon}_n(\lambda_j) = I^{\alpha x}_n(\lambda_j) = n^{-1}(\hat\alpha_0 - \alpha_0)^2\sum_{t,s}e^{i\lambda_j(t-s)} = 0 for j \neq 0, while the last term equals n(\hat\alpha_0 - \alpha_0)^2 for j = 0. We now have

\hat A_n(\theta) = A^0_n(\theta) + (\hat\beta - \beta)'A^x_n(\theta)(\hat\beta - \beta) - 2(\hat\beta - \beta)'A^{x\varepsilon}_n(\theta) + (\hat\alpha_0 - \alpha_0)^2/g(0, \theta)

with A^j_n(\theta) = n^{-1}\sum_j I^j_n(\lambda_j)/g(\lambda_j, \theta). From standard arguments (see Brockwell and Davis 1991, ch. 10) it follows that A^0_n(\theta) \to 2\pi\int f_\varepsilon(\lambda)/g(\lambda, \theta)\,d\lambda and \partial^2 A^0_n(\theta)/\partial\theta\,\partial\theta' \to 2\pi\int f_\varepsilon(\lambda)\,\partial^2 g^{-1}(\lambda, \theta)/\partial\theta\,\partial\theta'\,d\lambda uniformly in \theta \in \Theta_2. Consistency of \hat\theta follows from standard arguments.

To establish \sqrt{n}-consistency note that \sqrt{n}\,\partial A^0_n(\theta_0)/\partial\theta = O_p(1), n^{-1/2}\sum_t \varepsilon_t x_t = O_p(1) and \hat\alpha_0 - \alpha_0 = O_p(n^{-1/2}). Therefore

(A.26)  \sqrt{n}\,\partial\hat A_n(\theta)/\partial\theta = \sqrt{n}\,\partial A^0_n(\theta)/\partial\theta + O_p(1).

We also define \partial A'(\theta)/\partial\theta = 2\pi\int f_\varepsilon(\lambda)\,\partial g^{-1}(\lambda, \theta)/\partial\theta\,d\lambda such that

(A.27)  \left\|\frac{\partial A'(\hat\theta)}{\partial\theta}\right\| \leq \left\|\frac{\partial\hat A_n(\hat\theta)}{\partial\theta} - \frac{\partial A'(\hat\theta)}{\partial\theta}\right\| + \left\|\frac{\partial\hat A_n(\hat\theta)}{\partial\theta} - \frac{\partial\hat A_n(\theta_0)}{\partial\theta}\right\| + \left\|\frac{\partial\hat A_n(\theta_0)}{\partial\theta}\right\|

where \|\partial\hat A_n(\theta_0)/\partial\theta\| = O_p(n^{-1/2}) by (A.26). Definition (A.25) for \hat\theta implies that \partial\hat A_n(\hat\theta)/\partial\theta = 0, such that

\left\|\frac{\partial\hat A_n(\hat\theta)}{\partial\theta} - \frac{\partial\hat A_n(\theta_0)}{\partial\theta}\right\| \leq 2\left\|\frac{\partial\hat A_n(\theta_0)}{\partial\theta}\right\| = O_p(n^{-1/2}).

Finally,

\frac{\partial\hat A_n(\theta)}{\partial\theta} - \frac{\partial A'(\theta)}{\partial\theta} = \frac{\partial A^0_n(\theta)}{\partial\theta} - \int 2\pi f_\varepsilon(\lambda)\,\partial g^{-1}(\lambda, \theta)/\partial\theta\,d\lambda + 2(\hat\beta - \beta)'\partial A^{x\varepsilon}_n(\theta_0)/\partial\theta + O_p(n^{-1})

where the second term is O_p(n^{-1/2}) since \hat\beta - \beta = O_p(n^{-1/2}). The first term can be written as

\partial A^0_n(\theta)/\partial\theta - \int 2\pi f_\varepsilon(\lambda)\,\partial g^{-1}(\lambda, \theta)/\partial\theta\,d\lambda = n^{-1}\sum_j [I_n(\lambda_j) - 2\pi f_\varepsilon(\lambda_j)]\,\partial g^{-1}(\lambda_j, \theta)/\partial\theta + n^{-1}\sum_j 2\pi f_\varepsilon(\lambda_j)\,\partial g^{-1}(\lambda_j, \theta)/\partial\theta - \int 2\pi f_\varepsilon(\lambda)\,\partial g^{-1}(\lambda, \theta)/\partial\theta\,d\lambda

where the second term is O(n^{-1}). Now define \xi_l(\theta) = (2\pi)^{-1}\int\partial g^{-1}(\lambda, \theta)/\partial\theta\,e^{il\lambda}d\lambda such that \partial g^{-1}(\lambda, \theta)/\partial\theta = \sum_{l=-\infty}^{\infty}\xi_l(\theta)e^{-il\lambda}, and

n^{-1}\sum_j [I_n(\lambda_j) - 2\pi f_\varepsilon(\lambda_j)]\,\partial g^{-1}(\lambda_j, \theta)/\partial\theta = n^{-1}\sum_{l=-n}^{n}\sum_{t=\max(l,1)}^{n-|\min(l,0)|}(\varepsilon_t\varepsilon_{t-l} - E\varepsilon_t\varepsilon_{t-l})\xi_l(\theta) + O_p(n^{-1})
\leq \sum_{l=-n}^{n}|\xi_l(\theta)|\left(n^{-2}\left\|\sum_t(\varepsilon_t\varepsilon_{t-l} - E\varepsilon_t\varepsilon_{t-l})\right\|^2\right)^{1/2}

where the second equality follows from n^{-1}\sum_t e^{i\lambda_j(t-s)} = 0 for t \neq s, and the inequality follows from the Cauchy-Schwarz inequality. Then note that n^{-1}\sum_t(\varepsilon_t\varepsilon_{t-l} - E\varepsilon_t\varepsilon_{t-l}) = O_p(n^{-1/2}), and \sum_{l=-n}^{n}|\xi_l(\theta)| is uniformly converging for \theta with |\theta - \theta_0| < \delta for some \delta such that \theta(z) has no zeros on or inside the unit circle. Consistency of \hat\theta then implies \sum_{l=-n}^{n}|\xi_l(\hat\theta)| = O_p(1). These results establish that \partial\hat A_n(\hat\theta)/\partial\theta - \partial A'(\hat\theta)/\partial\theta = O_p(n^{-1/2}), such that by a continuity argument \sqrt{n}(\hat\theta - \theta_0) = O_p(1).

To show consistency of \sum_j \hat c_{j+m}\hat\Gamma^{yy}_{j-k} we next show that \sum_{j=-n+1}^{n-1}\hat c_{j+m}\hat\Gamma^{yy}_{j-k} - \sum_j c_{j+m}\Gamma^{yy}_{j-k} = O_p(n^{-1/2}). Write

\sum_{j=-n+1}^{n-1}\hat c_{j+m}\hat\Gamma^{yy}_{j-k} - \sum_{j=-n+1}^{n-1}c_{j+m}\Gamma^{yy}_{j-k} = \sum_{j=-n+1}^{n-1}(\hat c_{j+m} - c_{j+m})\hat\Gamma^{yy}_{j-k} + \sum_{j=-n+1}^{n-1}c_{j+m}(\hat\Gamma^{yy}_{j-k} - \Gamma^{yy}_{j-k}).

First consider

\left\|\sum_{j=-n+1}^{n-1}(\hat c_j - c_j)\hat\Gamma^{yy}_{j-k}\right\| \leq n^{1/2}\sup_j\|\hat c_j - c_j\|\; n^{-1/2}\sum_{j=-n+1}^{n-1}\|\hat\Gamma^{yy}_{j-k}\|

where P(n^{1/2}\sup_j\|\hat c_j - c_j\| > C) goes to zero for some C large enough by the previous result. For any \delta such that |\theta - \theta_0| \leq \delta implies \theta(z) has no zeros on or inside the unit circle, consider

P\left(n^{-1/2}\sum_{j=-n+1}^{n-1}|c_j(\hat\theta)|\,\|\hat\Gamma^{yy}_{j-k}\| > \eta\right) \leq P\left(n^{-1/2}\sup_{|\theta-\theta_0|\leq\delta}\sum_j|c_j(\theta)|\,\|\hat\Gamma^{yy}_{j-k}\| > \eta\right) + P(|\hat\theta - \theta_0| > \delta).

We use the triangular inequality \|\hat\Gamma^{yy}_{j-k}\| \leq \|\hat\Gamma^{yy}_{j-k} - \Gamma^{yy}_{j-k}\| + \|\Gamma^{yy}_{j-k}\| and the fact that \sup_{|\theta-\theta_0|\leq\delta}\sum_{j=-n+1}^{n-1}|c_j(\theta)| = O(1) uniformly in n. In the same way it follows from Equation (B.10) that

n^{1/2}\sup\sum_{j=-n+1}^{n-1}|c_j(\hat\theta)|\,\|\hat\Gamma^{yy}_{j-k} - \Gamma^{yy}_{j-k}\| = O_p(1)

by Equation (B.9) and the fact that \sup_{|\theta-\theta_0|\leq\delta}\sum_{j=-n+1}^{n-1}|c_j(\theta)| = O(1). This establishes that \hat A_1 - A_1 = O_p(n^{-1/2}). The result then follows by Lemma (B.47).
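To make the Whittle-type objective in (A.25) concrete, the following minimal sketch evaluates A_n(\theta) on a grid for an MA(1) residual polynomial, the simplest case of \theta(z) above. The grid search and the function names are illustrative assumptions, not the estimator actually used in the paper's computations.

```python
import numpy as np

def periodogram(e):
    # I_n(lambda_j) = |sum_t e_t exp(-i t lambda_j)|^2 / n at Fourier frequencies.
    e = np.asarray(e) - np.mean(e)  # demean the estimated residuals
    n = len(e)
    return (np.abs(np.fft.fft(e)) ** 2) / n

def whittle_objective(theta, e):
    # A_n(theta) = n^{-1} sum_j I_n(lambda_j) / g(lambda_j, theta)
    # with g(lambda, theta) = |1 - theta e^{i lambda}|^2 for an MA(1) polynomial.
    n = len(e)
    lam = 2 * np.pi * np.arange(n) / n
    g = np.abs(1.0 - theta * np.exp(1j * lam)) ** 2
    return np.mean(periodogram(e) / g)

def whittle_ma1(e, grid=np.linspace(-0.99, 0.99, 199)):
    # Grid-search stand-in for argmin_theta A_n(theta) over the invertible region.
    vals = [whittle_objective(th, e) for th in grid]
    return grid[int(np.argmin(vals))]
```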
Proof of Proposition (4.3): Let

\bar L_n(M) = -\log\sigma_{1M,h} + \frac{M^2}{n}A\left(\int_0^1\phi(x)^2 dx\right).

We first show that uniformly in M,

(A.28)  L_n(M) = \bar L_n(M)\left(1 + O_p(n^{-1/2}(\log n)^{1/2+s'})\right).

Consider

\frac{|L_n(M) - \bar L_n(M)|}{\bar L_n(M)} \leq \frac{|\log\hat\sigma_{2M,h} - \log\sigma_{2M}| + (M^2/n)|\hat A - A|\left(\int_0^1\phi(x)^2 dx\right)}{\bar L_n(M)}

because \bar L_n(M) \geq (M^2/n)A(\int_0^1\phi(x)^2 dx). First, \hat A - A = O_p(n^{-1/2}) by Proposition (4.1). Let g(M) = |M|^s\lambda^{|M|}, g'(M) = |M|^{s-1}\lambda^{|M|} and g_r(M) = |M|^{s-1+s'}\lambda^{|M|}.

Next we analyze \hat\sigma_{1M,h}; it is enough to show that

(A.29)  |\ell'D_h^{-1/2}\hat H_{13,h}D_h^{-1/2}\ell - \ell'D^{-1/2}H_{13}D^{-1/2}\ell| = O_p(n^{-1/2}(\log n)^{1/2}g'(M)^2).

Note that \hat\Gamma^{yx}_{k,i} - \Gamma^{yx}_{k,i} = O_p(n^{-1/2}(\log n)^{1/2}) and \hat\vartheta_{k,i} - \vartheta_{k,i} = O_p(n^{-1/2}g_r(|i - k|)(\log n)^{1/2}) by Lemma (B.46), because \sum_{l=-m+1}^{m-1}\vartheta_{k,i+l} = O(g_r(|i - k|)). To analyze terms involving \hat\vartheta_{i,j+M} we use the expansion

(A.30)  \hat\Omega^{-1} - \Omega^{-1} = \Omega^{-1}(\Omega - \hat\Omega)\Omega^{-1} + O_p(n^{-1}\log n)

such that \vartheta_{i,j+M} = O(g_r(j)); \vartheta_{i,j+M} is of the same order as the M-th autocorrelation of an AR process with lag polynomial (1 - \lambda_1 L)(1 - \lambda_i L)^{m+1}, where s_2 = 2s_1 - 1 if \lambda_i < \lambda_1, s_2 = 2s_1 if \lambda_i = \lambda_1 and s_2 = s_1 if \lambda_i > \lambda_1. Consider now the largest order element of \hat H_{13,h} - H_{13}, namely P_{M,h}'(\hat\Omega_M^{-1} - \Omega_M^{-1})P_{M,h}; the largest order terms in this difference are of the form

n^{-1}\sum_{h}\sum_{j_1,\dots,j_6}\hat\Gamma^{yx}_{j_1,h}\hat\vartheta_{j_1,j_2}(\hat\vartheta_{j_2,j_3+M}\hat\vartheta_{j_3,j_4}\hat\vartheta_{j_4+M,j_5})\hat\vartheta_{j_5,j_6}\hat\Gamma^{yx}_{j_6,h}

and, taking into account (A.30),

\sum_{j_1,\dots,j_8}\hat\Gamma^{yx}_{j_1}\vartheta_{j_1,j_2}(\vartheta_{j_2,j_3+M}\vartheta_{j_3,j_4}\vartheta_{j_4+M,j_5}\vartheta_{j_5,j_6}\vartheta_{j_6+M,j_7})\vartheta_{j_7,j_8}\hat\Gamma^{yx}_{j_8} = O_p(n^{-1/2}\sqrt{\log n}\,M^{s_3}\lambda^{2M})

where s_3 = 2s + 4s_1 + s' - 1 if \lambda_i < \lambda_1, s_3 = 2s + 4s_1 - 1 if \lambda_i = \lambda_1 and s_3 = 5s_1 - 1 if \lambda_i > \lambda_1. The remaining terms of \hat H_{13,h} - H_{13} are of smaller order. This establishes (A.29).

Next, we show that \sqrt{n}\sup_M|\hat\sigma_{2M,h} - \sigma_{2M}|/g'(M)^2 = O_p((\log n)^{1/2}). We focus on a typical term in the matrix \hat S_M = \hat P_{M,h}'(I_M - \hat K_M)\hat\Omega_M^{-1}(I_M - \hat K_M)\hat P_{M,h} with population analogue S_M; other terms of \hat\sigma_{2M,h} can be handled in a similar way. Let \hat s_M = \sqrt{\log\hat\sigma_{1M,h}}, s_M = \sqrt{\log\sigma_{1M}}, and note \phi(v, z) = 1 - (1 - v)(1 + vz(-\log z)^q). The matrix \hat K_M contains diagonal elements \hat\phi_j = \phi(k(j/M), \hat s_M); also let \phi_j = \phi(k(j/M), s_M), with |1 - k(j/M)| \leq k_q(j/M)^q. Now,

E\left((1 - \hat\phi_{j_1})(1 - \hat\phi_{j_2}) - (1 - \phi_{j_1})(1 - \phi_{j_2})\right)\hat\Gamma^{xy}_{j_1}\hat\vartheta_{j_1,j_2}
= E\left\{(\phi_{j_1} - \hat\phi_{j_1})(\phi_{j_2} - \hat\phi_{j_2}) + (\phi_{j_1} - \hat\phi_{j_1})(1 - \phi_{j_2}) + (1 - \hat\phi_{j_1})(\phi_{j_2} - \hat\phi_{j_2})\right\}\hat\Gamma^{xy}_{j_1}\hat\vartheta_{j_1,j_2}

such that

\frac{|\hat S_M - S_M|}{g'(M)^2} \leq 2\sup_M\frac{|\hat s_M(-\log\hat s_M)^q - s_M(-\log s_M)^q|}{g'(M)}\cdot\frac{1}{M}\sum_{j_1,j_2}\frac{1 - k(j_1/M)}{(j_1/M)^q}\frac{1 - k(j_2/M)}{(j_2/M)^q}\|\hat\vartheta_{j_1,j_2}\| + \dots

where the supremum factors are bounded by c_1\sup_M(s_M(-\log s_M)^q/g'(M))/M and c_2 = \sup_M(-\log(M^s\lambda^M))^q(\log(M^s\lambda^M)/M), both O(1) as M \to \infty. The result then follows because \sup_M|(\hat s_M - s_M)/g'(M)| = O_p(n^{-1/2}(\log n)^{1/2}) with probability going to one, \sup_M|\sigma_{1M}/g'(M)| is uniformly continuous in M, and, by the same arguments as in the proof of (A.29), \sum_{j_1,j_2}\Gamma^{xy}_{j_1}\vartheta_{j_1,j_2}n^{-1}\sum_t \hat v_{t,j_2} = O_p(n^{-1/2}(\log n)^{1/2}) such that the third term of the bound is O_p(n^{-1/2}(\log n)^{1/2}). It thus follows that |\hat s_M - s_M|/g'(M)^2 = O_p(n^{-1/2}(\log n)^{1/2}) uniformly in M. We have therefore established (A.28).

Let \tilde L_n(M) = L_n(M) + g(M)^2 + \log\sigma_M, where g(M)^2 + \log\sigma_M = O(\lambda_1^{2M}) with \lambda_1 < \lambda by Hannan and Kavalieris (1986, p.47). Since \lambda_1^{2M}/g(M)^2 = (\lambda_1/\lambda)^{2M}M^{-2s} \to 0 as M \to \infty, it follows that \tilde L_n(M) = L_n(M)(1 + O(g_r(M)^2)) with g_r(M) = M^s(\lambda_1/\lambda)^M. Let \tilde M^* minimize \tilde L_n(M). By the same arguments as in Hannan and Deistler (1988) it now follows that \tilde M^*/M^* - 1 = o_p(1) and \tilde L_n(\tilde M^*)/L_n(M^*) = 1 + o_p(1). This establishes the first part of the Proposition.

Moreover, optimality of M^* implies that -\log\sigma_{M^*} + (2M^* + 1)/n \leq -\log\sigma_{M^*+1} + (2M^* + 3)/n + c/n for some constant c. This leads to

\log n + 2s\log(M^* + 1) + 2(M^* + 1)\log\lambda \leq \log(2M^* + 1 + o(g(M^*)^2))

or

\frac{\log n}{M^*} \leq \frac{\log(2M^* + 1 + o(g(M^*)^2)) - 2s\log(M^* + 1)}{M^*} - 2\log\lambda \to -2\log\lambda.

In a similar way we note that -\log\sigma_{M^*-1} + \dots \geq 0 leads to \log n/M^* \geq \dots \to -2\log\lambda, such that \log n/M^* \to -2\log\lambda. This implies that M^* = -\log n/(2\log\lambda) + o_p(\log n). Optimality of \hat M^* then implies that L_n(\hat M^*) = O_p((\log n)^2/n) and \log\hat\sigma_{\hat M^*,h} = O_p((\log n)^2/n). We have seen before that \log\sigma_{M^*} = O_p((\log n)^2/n), which in turn implies that g(M^*)^2 = \lambda^{2M^*}M^{*2s} = O_p((\log n)^{2s}/n). Substituting M^* = -\log n/(2\log\lambda) + O_p(\log n) shows that (\lambda_1/\lambda)^{M^*} = O_p(\sqrt{\log n}) if s = 0 and O_p(1) otherwise, since \lambda_1/\lambda < \lambda^{1/2} by assumption. Then, consider

g_r(M^*)^2 = O_p\left((\log n)^{2s+1/2}n^{-(\log\lambda_1/\log\lambda - 1)}\right)

for all s, where (\log\lambda_1/\log\lambda - 1) \geq 1/2 if \lambda_1 \leq \lambda^{3/2}. It follows that

(A.31)  L_n(\hat M^*) = L_n(M^*)\left(1 + O_p(n^{-1/2}(\log n)^{1/2+s'})\right)

by similar arguments as before. Also, L_n(\hat M^*) \leq L_n(M^*) = \tilde L_n(M^*)(1 + o(g_r(M^*)^2)) and \tilde L_n(M^*) \leq \tilde L_n(\hat M^*), since \tilde L_n(\hat M^*) - \tilde L_n(M^*) \leq \tilde L_n(M^*)O_p(n^{-1/2}(\log n)^{1/2+s'}) from (A.28) and \tilde L_n(\hat M^*) \leq \tilde L_n(\tilde M^*)(1 + o(g_r(M^*)^2)). Consider a second order mean value expansion of L_n(\hat M^*) around M^*,

L_n(\hat M^*) = L_n(M^*) + \frac{1}{2}\frac{\partial^2 L_n(\bar M)}{\partial M^2}(\hat M^* - M^*)^2

with |\bar M - M^*| \leq |\hat M^* - M^*|, where we have used the fact that \partial L_n(M^*)/\partial M = 0. Since \partial^2 L_n(\bar M)/\partial M^2 \cdot \bar M^2/L_n(\bar M) = O(1), this leads to

(A.32)  \frac{(\hat M^* - M^*)^2}{2M^{*2}g_r(\hat M^*)}L_n(M^*) = O_p(n^{-1/2}(\log n)^{1/2+s'})

which implies that \hat M^* - M^* = O_p(n^{-1/4}M^*(\log n)^{1/4+s'/2}) = o_p(M^*). We use this result to sharpen the convergence result for \hat M^*. By the same arguments as in Hannan and Deistler (1988, p.333-334), optimality of \hat M^* implies that L_n(M^*) \leq L_n(\hat M^*) \leq L_n(\tilde M^*)(1 + O_p(n^{-1/2}(\log n)^{1/2+s'})), or, by rewriting the inequalities,

(A.33)  0 \leq \frac{L_n(\hat M^*) - L_n(M^*)}{L_n(M^*)} \leq O_p(n^{-1/2}(\log n)^{1/2+s'})

where by a mean value expansion

\frac{(g(\hat M^*)^2 - g(M^*)^2)n}{M^{*2}} = \frac{\partial g(\bar M)^2/\partial M\,(\hat M^* - M^*)\,n}{M^{*2}}

with |\bar M - M^*| \leq |\hat M^* - M^*|. Note that g(M^*)^2 n/M^{*2} = O(1) by optimality of M^*. Furthermore, \partial g(M)^2/\partial M = g(M)^2(2s/M + 2\log\lambda). By the first order condition for M^* it follows that \partial g(M^*)^2/\partial M + 2M^*/n = 0, so that (n/M^*)\partial g(M^*)^2/\partial M is bounded; since \bar M/M^* - 1 = o_p(1) by previous arguments, the same holds for (n/M^*)\partial g(\bar M)^2/\partial M. We now rewrite (A.33) as

\left(\frac{\hat M^* - M^*}{M^*}\right)^2(1 + o_p(1)) \leq O_p(n^{-1/2}(\log n)^{1/2+s'})

such that the result follows, i.e. \hat M^* - M^* = O_p(n^{-1/2}M^*(\log n)^{1/2+s'}).
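The rate for M^* used repeatedly above can be obtained from a short calculation. The display below is a sketch under the simplifying assumption that the criterion behaves like c\,M^{2s}\lambda^{2M} + M^2/n with c > 0.

```latex
% First order condition for M^* minimizing  c\,M^{2s}\lambda^{2M} + M^{2}/n :
%   c\,M^{2s}\lambda^{2M}\left(\tfrac{2s}{M} + 2\log\lambda\right) + \tfrac{2M}{n} = 0 ,
% so that  \lambda^{2M^*} \asymp M^{*\,2-2s}/n .  Taking logarithms,
%   2M^*\log\lambda = -\log n + O(\log M^*)
%   \quad\Longrightarrow\quad
%   M^* = -\frac{\log n}{2\log\lambda} + O(\log\log n).
```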
Proof of Theorem (4.4): The decomposition \sqrt{n}(\hat\beta_{n,\hat M^*} - \hat\beta_{n,M^*}) = \hat D_{\hat M^*}^{-1}(\hat d_{\hat M^*} - \hat d_{M^*}) + (\hat D_{\hat M^*}^{-1} - \hat D_{M^*}^{-1})\hat d_{M^*} is used. Note that \hat D_{M^*} = O_p(1) and \hat d_{M^*} = O_p(1). It is therefore enough to show that \sqrt{n/M^*}(\hat D_{\hat M^*} - \hat D_{M^*}) = O_p((\log n)^{\max(s'-q,-1)}) and \sqrt{n/M^*}(\hat d_{\hat M^*} - \hat d_{M^*}) = O_p((\log n)^{\max(s'-q,-1)}). The following calculations also establish \hat D_{\hat M^*}^{-1} = O_p(1). Define

\hat K^*_{\hat M^*} = \operatorname{diag}\left(\phi(k(0/\hat M^*), \hat M^{*s}\lambda^{\hat M^*}), \dots, \phi(k((n-1)/\hat M^*), \hat M^{*s}\lambda^{\hat M^*})\right) \otimes I_p

with elements \phi(k(j/\hat M^*), \hat s_{\hat M^*}), where \hat s_{\hat M^*} is defined in the proof of Theorem (4.3). We write \hat K^*_{M^*} for matrices of the same form based on M^*; for the truncated kernel, \hat K^*_{M^*} is a matrix where the first M^* diagonal elements are one and all the other elements are zero. Let \hat P^*_{n-m} = \hat K^*_{M^*}\hat P_{n-m} and \hat Q^*_M be defined in the same way, replacing \hat\Omega_M and \hat\Omega_M^{-1} by \hat\Omega^*_M and \hat\Omega^{*-1}_M. Using these definitions we can rewrite \hat d_M = \hat P_{n-m}'\hat K^*_M\hat\Omega^{*-1}_M\hat K^*_M\hat V. First consider

\sqrt{n/M^*}(\hat d_{\hat M^*} - \hat d_{M^*}) = \sqrt{n/M^*}\left(\hat P_{n-m}'\Delta_{\hat M^*}\hat\Omega^{*-1}\hat K^*_{\hat M^*}\hat V + \hat P_{n-m}'\hat K^*_{M^*}\hat\Omega^{*-1}\Delta_{\hat M^*}\hat V + \hat P_{n-m}'\hat K^*_{M^*}(\hat\Omega^{*-1}_{\hat M^*} - \hat\Omega^{*-1}_{M^*})\hat K^*_{M^*}\hat V + \hat P_{n-m}'\Delta_{\hat M^*}\hat\Omega^{*-1}\Delta_{\hat M^*}\hat V\right)

with \Delta_{\hat M^*} = \hat K^*_{\hat M^*} - \hat K^*_{M^*}. From Assumption (E) it follows that for some constants c_1, c_2, c_3 and \hat M^* \to \infty,

|\phi(k(j/\hat M^*), \hat s_{\hat M^*}) - \phi(k(j/M^*), s_{M^*})| \leq c_1|k(j/\hat M^*)^2 - k(j/M^*)^2| + 2c_2|\hat s_{\hat M^*}(-\log\hat s_{\hat M^*})^q - s_{M^*}(-\log s_{M^*})^q| + c_3|k(j/\hat M^*) - k(j/M^*)|

and |k(j/\hat M^*) - k(j/M^*)| \leq b_r|(1/\hat M^*)^r - (1/M^*)^r| \leq 2c_1 b_r(1/M^{*r})((M^*/\hat M^*)^r - 1). From Theorem (4.3) we have 1/\hat M^{*r}((M^*/\hat M^*)^r - 1) = O_p(n^{-1/2}\log n^{1/2}/M^*), and \hat s_{M^*}/s_{M^*} - 1 = O_p(n^{-1/2}(\log n)^{1/2+s'/2}) by Theorem (4.3) and the delta method. Since s_{M^*} = M^{*s}\lambda^{M^*} + o(M^{*s}\lambda^{M^*}) and s_{M^*}(-\log s_{M^*})^q \to 0, it follows that |\hat s_{\hat M^*}(-\log\hat s_{\hat M^*})^q - s_{M^*}(-\log s_{M^*})^q| = O_p(n^{-1/2}(\log n)^{1/2+q}/M^{*q}), such that

\phi(k(j/\hat M^*), \hat s_{\hat M^*}) - \phi(k(j/M^*), s_{M^*}) = O_p(|j|\,n^{-1/2}(\log n)^{1/2+\max(s'-q,-1)}).

For the truncated kernel \hat K^*_{\hat M^*} = \hat K^*_{M^*} unless \hat M^* \neq M^*, and P(|\hat M^* - M^*| \geq 1) \to 0 with probability tending to one, so the rest of the proof is trivial for the truncated kernel and we only consider the more general case. Denote the k,j-th element of \hat\Omega^{*-1} by \hat\vartheta^*_{k,j}. Then, letting \hat k(j_1, \hat M^*) = \phi(k(j_1/\hat M^*), \hat s_{\hat M^*}),

\sqrt{n/M^*}\,\hat P_{n-m}'\Delta_{\hat M^*}\hat\Omega^{*-1}\hat K^*_{M^*}\hat V = \sqrt{n/M^*}\,C_2\left[\hat d^\Delta_1 + \dots + \hat d^\Delta_9\right]

where

\hat d^\Delta_5 = \frac{1}{n}\sum_{t=1}^{n-m}\sum_{j_1,j_2=1}^{n}\hat\Gamma^{yx}_{j_1}\left[\hat k(j_1, \hat M^*) - \hat k(j_1, M^*)\right]\hat\vartheta^*_{j_1,j_2}\hat k(j_2, M^*)\hat v_{t,j_2},
\hat d^\Delta_6 = \frac{1}{n}\sum_{t=1}^{n-m}\sum_{j_1,j_2=1}^{n}\left(\hat\Gamma^{yx}_{j_1} - \Gamma^{yx}_{j_1}\right)\left[\hat k(j_1, \hat M^*) - \hat k(j_1, M^*)\right]\hat\vartheta^*_{j_1,j_2}\hat k(j_2, M^*)\hat v_{t,j_2},

and similarly for the remaining \hat d^\Delta_j corresponding to Definitions (A.20)-(A.24), where we replace K_M by \Delta_{\hat M^*} and \hat\Omega_M by \hat\Omega^*_M in the same way as in \hat d_M. We consider the largest term \hat d^\Delta_5:

\sqrt{n/M^*}\,\hat d^\Delta_5 \leq C\,n^{1/2-1}(\log n)^{1/2+\max(s'-q,-1)}\sum_{j_1,j_2}\left\||j_1|\,\hat\Gamma^{yx}_{j_1}\hat\vartheta^*_{j_1,j_2}\hat k(j_2, M^*)\sum_t\hat v_{t,j_2}\right\|.

By the same arguments as in the proof of Lemma (B.29) it follows that

(A.34)  \sum_{j_1,j_2=1}^{n-m}\left\||j_1|\left(\hat\Gamma^{yx}_{j_1} - \Gamma^{yx}_{j_1}\right)\hat\vartheta^*_{j_1,j_2}\hat k(j_2, M^*)\sum_t\hat v_{t,j_2}\right\| = O_p(M^*)

where we have used that \hat\vartheta^*_{j_1,j_2} has the same summability properties as \vartheta_{j_1,j_2}; also note that \sum_{j_1}|j_1|^q\|\Gamma^{yx}_{j_1}\vartheta^*_{j_1,j_2}\| < \infty uniformly in j_2 and that \hat k(j_2, M^*) = 0 for |j_2| > M^*. The bound (A.34) implies that \sqrt{n/M^*}\,\hat d^\Delta_6 = O_p(n^{-1/2}(\log n)^{1/2+\max(s'-q,-1)}M^*). Using similar arguments based on the proofs of Lemmas (B.28, B.30-B.33) it can be shown that the remaining terms \hat d^\Delta_1, \hat d^\Delta_2, \dots are of smaller order. For \hat d^\Delta_5 itself, note that \sum_{j_1,j_2}|j_1|^q\|\Gamma^{yx}_{j_1}\vartheta^*_{j_1,j_2}\| = O(1) such that \left(E\|n^{-1}\sum_t\hat v_{t,j_2}\|^2\right)^{1/2} = O(n^{-1/2}), and thus \sqrt{n/M^*}\,\hat d^\Delta_5 = O_p((\log n)^{\max(s'-q,-1)}).

For \sqrt{n/M^*}\,\hat P_{n-m}'\Delta_{\hat M^*}\hat\Omega^{*-1}\Delta_{\hat M^*}\hat V = \sqrt{n/M^*}\,C_2[\hat d^{\Delta\Delta}_1 + \dots + \hat d^{\Delta\Delta}_9] we define

\hat d^{\Delta\Delta}_1 = \frac{1}{n}\sum_{t=1}^{n-m}\sum_{j_1,j_2=1}^{n}\hat\Gamma^{yx}_{j_1}\left[\hat k(j_1, \hat M^*) - \hat k(j_1, M^*)\right]\hat\vartheta^*_{j_1,j_2}\left[\hat k(j_2, \hat M^*) - \hat k(j_2, M^*)\right]\hat v_{t,j_2}

and similarly for the other terms. From \sum_{j_1,j_2=1}^{n}\|j_1\|\|j_2\|\,E\|\hat\Gamma^{yx}_{j_1}\hat\vartheta^*_{j_1,j_2}n^{-1}\sum_t\hat v_{t,j_2}\| = O(1) it follows that \sqrt{n/M^*}\,\hat d^{\Delta\Delta}_1 = O_p(n^{-1/2}(\log n)^{1+2\max(s'-q,-1)}M^{*-1/2}) = o_p(1). For \sqrt{n/M^*}\,\hat d^{\Delta\Delta}_2 note that for some C_1,

\|\hat d^{\Delta\Delta}_2\| \leq C_1 O_p(n^{-1}(\log n)^{1+2\max(s'-q,-1)})\sum_{j_1,j_2=1}^{\max(\hat M^*, M^*)}|j_1||j_2|\left\|(\hat\Gamma^{yx}_{j_1} - \Gamma^{yx}_{j_1})\hat\vartheta^*_{j_1,j_2}n^{-1}\sum_t\hat v_{t,j_2}\right\|

such that for any finite \varepsilon > 0 and some C we consider \lceil M^* + \varepsilon\rceil, where \lceil M^* + \varepsilon\rceil denotes the smallest integer larger than M^* + \varepsilon:

P\left(\sum_{j_1,j_2=1}^{\max(\hat M^*, M^*)}(\cdot) > C\right) \leq P\left(\sum_{j_1,j_2=1}^{\lceil M^* + \varepsilon\rceil}(\cdot) > C\right) + P(\hat M^* > M^* + \varepsilon).

Using the Markov inequality it follows that \sqrt{n/M^*}\,\hat d^{\Delta\Delta}_2 = O_p(n^{-1/2}(\log n)^{1+2\max(s'-q,-1)}). The remaining terms are of smaller order by the same arguments as before.

Finally, for \hat\Omega^{*-1}_{\hat M^*} - \hat\Omega^{*-1}_{M^*} we expand \hat\Omega^{*-1} around \Omega^{*-1} as in (A.1), leading to

(A.35)  \hat\Omega^{*-1}_{\hat M^*} - \hat\Omega^{*-1}_{M^*} = \left(\Omega^{*-1}_{\hat M^*} - \Omega^{*-1}_{M^*}\right) - \hat\Omega^{*-1}_{\hat M^*}\left(\hat\Omega^*_{\hat M^*} - \Omega^*_{\hat M^*}\right)\Omega^{*-1}_{\hat M^*} + \hat\Omega^{*-1}_{M^*}\left(\hat\Omega^*_{M^*} - \Omega^*_{M^*}\right)\Omega^{*-1}_{M^*}

and

(A.36)  \Omega^{*-1}_{\hat M^*} - \Omega^{*-1}_{M^*} = -\Omega^{*-1}_{\hat M^*}\left(\Omega^*_{\hat M^*} - \Omega^*_{M^*}\right)\Omega^{*-1}_{M^*}.

Note that \Omega^*_{\hat M^*} - \Omega^*_{M^*} = 0 if \hat M^* = M^*. Because \hat M^* and M^* are integer valued by definition, P(|\hat M^* - M^*| \geq 1) tends to zero, and it follows that for any e > 0,

P\left(\left\|\Omega^*_{\hat M^*} - \Omega^*_{M^*}\right\| > e\right) \leq P(|\hat M^* - M^*| \geq 1) \to 0,

so that \Omega^*_{\hat M^*} - \Omega^*_{M^*} in fact converges to zero in probability at arbitrarily fast rates. Also note that \hat\Omega^*_{M^*} - \Omega^*_{M^*} = O_p(n^{-1/2}(\log n)^{1/2}) by Lemma (B.9). Combining (A.35) and (A.36) then leads to \sqrt{n/M^*}\,\hat P_{n-m}'\hat K^*_{M^*}(\hat\Omega^{*-1}_{\hat M^*} - \hat\Omega^{*-1}_{M^*})\hat K^*_{M^*}\hat V = o_p(1). It thus follows that \sqrt{n/M^*}(\hat d_{\hat M^*} - \hat d_{M^*}) = O_p((\log n)^{\max(s'-q,-1)}).

Next consider \sqrt{n/M^*}(\hat D_{\hat M^*} - \hat D_{M^*}). First, we analyze \hat P_{n-m}'\Delta_{\hat M^*}\hat\Omega^{*-1}\hat K^*_{M^*}\hat P_{n-m} = H^\Delta_1 + H^\Delta_2 + H^\Delta_3, where H^\Delta_2 = H^\Delta_{211} + H^\Delta_{212} + H^\Delta_{221} + H^\Delta_{222}; the definitions follow from the definitions in (A.6)-(A.13) with the appropriate substitutions for \hat\Omega^{*-1} and \Delta_{\hat M^*}. Now

\sqrt{n/M^*}\left\|H^\Delta_{222}\right\| \leq \sqrt{n/M^*}\,O_p(n^{-1/2}(\log n)^{1/2+\max(s'-q,-1)})\,\frac{1}{n}\sum_{t=0}^{n-m}\sum_{j_1,j_2=1}^{n-m}|j_1|\left\|\hat\Gamma^{yx}_{j_1}\hat\vartheta^*_{j_1,j_2}\hat k(j_2/M^*)\hat\Gamma^{yx}_{j_2}\right\| = O_p(M^{*\max(s'-q,-1)})O(1)

and by the proof of Lemma (B.20), E\sum_{j_1,j_2=1}^{n-m}|j_1|\,\|(\hat\Gamma^{yx}_{j_1} - \Gamma^{yx}_{j_1})\hat\vartheta^*_{j_1,j_2}\hat k(j_2/M^*)\hat\Gamma^{yx}_{j_2}\| = O(n^{-1/2}M^*), such that \sqrt{n/M^*}\|H^\Delta_{222}\| = O_p(n^{-1/2}(\log n)^{1/2+\max(s'-q,-1)}). Using the results of Lemmas (B.17)-(B.21) we can show in the same way that \sqrt{n/M^*}\|H^\Delta\| = O_p(1). All the remaining terms are of lower order by Lemma (B.22).

Next, we turn to \hat P_{n-m}'\Delta_{\hat M^*}\hat\Omega^{*-1}\Delta_{\hat M^*}\hat P_{n-m} = H^{\Delta\Delta}_1 + H^{\Delta\Delta}_2 + H^{\Delta\Delta}_3, where H^{\Delta\Delta}_1, H^{\Delta\Delta}_2, H^{\Delta\Delta}_3 are defined in the obvious way. It follows immediately that

\sqrt{n/M^*}\left\|H^{\Delta\Delta}_1\right\| \leq \sqrt{n/M^*}\,O_p(n^{-1}(\log n)^{1+2\max(s'-q,-1)})\sum_{j_1,j_2=1}^{n}|j_1||j_2|\left\|\hat\Gamma^{xy}_{j_1}\hat\vartheta^*_{j_1,j_2}\hat\Gamma^{yx}_{j_2}\right\| = O_p(n^{-1/2}(\log n)^{1/2+2\max(s'-q,-1)}).

For H^{\Delta\Delta}_2 we note that E\sum_{j_1,j_2=1}^{n}|j_1||j_2|\,\|\hat\Gamma^{xy}_{j_1}\vartheta^*_{j_1,j_2}\hat\Gamma^{yx}_{j_2}\| = O(1) such that again \sqrt{n/M^*}\|H^{\Delta\Delta}_2\| = O_p(1). The same type of arguments also establish \sqrt{n/M^*}\|H^{\Delta\Delta}_3\| = O_p(1). All the other terms are of lower order. Finally, we turn to \sqrt{n/M^*}\,\hat P_{n-m}'\hat K^*_{M^*}(\hat\Omega^{*-1}_{\hat M^*} - \hat\Omega^{*-1}_{M^*})\hat K^*_{M^*}\hat P_{n-m}, which is handled based on (A.35)-(A.36); the corresponding terms are of lower order, which completes the proof.
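Several of the remainder bounds above (for example around (A.30) and (A.35)) rest on the exact resolvent identity for matrix inverses, recorded here for convenience. This is standard linear algebra, valid whenever both matrices are nonsingular.

```latex
% Resolvent identity: for nonsingular \hat\Omega and \Omega,
%   \hat\Omega^{-1} - \Omega^{-1}
%     = -\,\Omega^{-1}\bigl(\hat\Omega - \Omega\bigr)\hat\Omega^{-1}
%     = -\,\Omega^{-1}\bigl(\hat\Omega - \Omega\bigr)\Omega^{-1}
%       + \Omega^{-1}\bigl(\hat\Omega - \Omega\bigr)\Omega^{-1}\bigl(\hat\Omega - \Omega\bigr)\hat\Omega^{-1}.
```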
Proof of Proposition (5.3): Note that \sigma_{2M} - \hat\sigma_{2M,h} = O_p(n^{-1/2}(\log n)^{1/2+s'}) by the proof of Proposition (4.3), and that \lim_{M\to\infty}\sigma_{2M}(n/M)^2 = A\int_0^1\phi(x)^2 dx. Define Q_{n,j}(k) accordingly and let

Q_j(k) = A\int_0^1\phi(k(x))^2 dx + k_q^2 B^{(q)}/\kappa^*

where \kappa^* is defined through M^*/\log n \to (-2\log\lambda)^{-1}. We therefore analyze the problem of finding \hat k^{j*} = \operatorname{argmin}_{k\in\mathcal{K}_j}Q_{n,j}(k) and k^{j*} = \operatorname{argmin}_{k\in\mathcal{K}_j}Q_j(k) for each j = 0, 1, \dots, \bar q. By Lemma (B.47), \sup_{k\in\mathcal{K}_j}|Q_{n,j}(k) - Q_j(k)| = O_p(n^{-1/2}(\log n)^{1/2+s'}). Because \mathcal{K}_j is countable, the optimum in \mathcal{K}_j is finite and Q_j is uniformly continuous in k for each j, it then follows from standard arguments that \hat k^{j*} \to k^{j*} for each j fixed, and Q_{n,q}(\hat k^{q*}) converges to a fixed value. Thus, \hat q converges. For the second part note that M^*/\log n = (-2\log\lambda)^{-1} + o(1). We have shown that

\phi\left(\hat k(i/M^*(\hat k)), \sqrt{\log\hat\sigma_{1,\hat M^*,h}}\right) - \phi\left(k^*(i/M^*(k^*)), \sqrt{\log\sigma_{1M^*}}\right) = O_p(n^{-1/2}(\log n)^{1/2+\max(s'-q,-1)})

such that for k(x) \in \mathcal{K}_q the proof of Proposition (4.3) still goes through when k is replaced by \hat k; when k is not uniformly continuous in \kappa_j this can be checked directly for each j. The terms EH_3 D^{-1}d_j and EH_1 D^{-1}d_j are of lower order by the same arguments as in the proofs of Lemmas (B.28, B.30-B.33). First consider Ed_1 and therefore Ed_3, which are of lower order by the same arguments as before; the largest order term is

Ed_5 = -\frac{M}{\sqrt{n}}A_1\int_0^1\phi^2(x)dx + o(M/\sqrt{n}).

Proof of Theorem (5.1): We first consider the truncated kernel, for which, when M is chosen optimally, the approximate MSE is B_1/\kappa^*. Now consider a perturbation to the truncated kernel, say k_\varepsilon(x) = 1 - \varepsilon|x| - x^4, such that

A\int k_\varepsilon(x)^2 dx = A\left(\frac{64}{45} - \frac{4}{3}\varepsilon + \frac{2}{3}\varepsilon^2\right)

where \int h^2(x)dx = 64/45 for h(x) = 1 - x^4, while the squared bias contribution becomes \varepsilon^2 B^{(1)}/\kappa^*. In other words, for k_\varepsilon the approximate MSE is

A\left(\int h^2(x)dx + \delta(\varepsilon)\right)\frac{M^2}{n} + \varepsilon^2 B^{(1)}/\kappa^*

with \delta(\varepsilon) < 0 for \varepsilon small enough. Choose \varepsilon such that (\varepsilon^2\sup B^{(1)}/\kappa^*) is dominated by the reduction A\delta(\varepsilon); this is possible because of the properties of \Theta_0, so the approximate MSE is made smaller than the approximate MSE B_1/\kappa^* obtained when the truncated kernel is used as a bandwidth choice. But then clearly, if M is chosen optimally for k_\varepsilon, the estimator can not do worse than with M_\varepsilon. Similar arguments can be used to handle the case where q > 1, and the rest of the proof goes through as before. This shows that k_\varepsilon dominates the truncated kernel.

Proof of Theorem (6.1): We consider the expansion of \sqrt{n}(\tilde\beta_{n,M} - \beta), where the analysis is then the same as the analysis for \sqrt{n}(\hat\beta_{n,M} - \beta) before, with d_5 replaced by d_5 - Ed_5 to first order, and the additional term

d_b = \sqrt{M/n}\,(\hat A_1 - A_1)\int_0^1\phi^2(x)dx,

which follows from Proposition (4.1), needs to be considered. First note that Ed_5 d_5' = E(d_5 - Ed_5)(d_5 - Ed_5)' + o(M/n), where the order of magnitude follows from Ed_5 = -\sqrt{M/n}\,A_1\int\phi^2(x)dx = O(\sqrt{M/n}). Lemma (B.40) shows that Ed_b d_b' = O_p(M/n). Also EH_{222}D^{-1}d_0 d_b' = o(M/n) by Lemmas (B.41) and (B.44), and Ed_b d_5' = o(M/n). Thus it follows that all remaining terms are at most of order M/n. Next,

\sqrt{M^*/n}\left(\hat M^*/M^* - 1\right)\hat D^{-1}\hat A_1\int_0^1\phi^2(x)dx = O_p(n^{-1/2}(\log n)^{1/2+s'})

follows from Lemma (B.47). In light of Lemma (B.44), note that by Remark (4), \hat M^* does not depend on the constants of the first order MSE approximation, so that it is possible to minimize \hat M^2/n - \log\sigma_{\hat M}; then \hat M^*/M^* - 1 = O_p(n^{-1/2}(\log n)^{1/2+s'}) by Proposition (4.3). Finally, note that \hat D^{-1}\hat A_1 = O_p(1) by Proposition (4.4); together with Theorem (4.4) this establishes the last part of the Theorem.
References

Andrews, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59(3), 817-858.

Andrews, D. W. K. (1999): "Consistent Moment Selection Procedures for Generalized Method of Moments Estimation," Econometrica, pp. 543-564.

Andrews, D. W. K. (1994): "Asymptotics for Semiparametric Econometric Models via Stochastic Equicontinuity," Econometrica, 62, 43-72.

Berk, K. N. (1974): "Consistent Autoregressive Spectral Estimates," Annals of Statistics, 2, 489-502.

Brillinger, D. R. (1981): Time Series, Data Analysis and Theory, Expanded Edition. Holden-Day, Inc.

Brockwell, P. J., and R. A. Davis (1991): Time Series: Theory and Methods, second edn. Springer-Verlag New York, Inc.

Chamberlain, G. (1987): "Asymptotic Efficiency in Estimation with Conditional Moment Restrictions," Journal of Econometrics, 34, 305-334.

Donald, S. G., and W. K. Newey (2001): "Choosing the Number of Instruments," Econometrica, 69, 1161-1191.

Gawronski, W. (1977): "Matrix Functions: Taylor Expansion, Sensitivity and Error Analysis," Applicationes Mathematicae, XVI, 73-89.

Hall, A. R., and A. Inoue (2001): "A Canonical Correlations Interpretation of Generalized Method of Moments Estimation with Applications to Moment Selection," mimeo.

Hannan, E. J., and M. Deistler (1988): The Statistical Theory of Linear Systems. John Wiley & Sons.

Hannan, E. J., and L. Kavalieris (1984): "Multivariate Linear Time Series Models," Adv. Appl. Prob., 16, 492-561.

Hannan, E. J., and L. Kavalieris (1986): "Regression, Autoregression Models," Journal of Time Series Analysis, 7, 27-49.

Hansen, L. P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50(4), 1029-1053.

Hansen, L. P. (1985): "A Method for Calculating Bounds on the Asymptotic Covariance Matrices of Generalized Method of Moments Estimators," Journal of Econometrics, 30, 203-238.

Hansen, L. P., J. C. Heaton, and M. Ogaki (1988): "Efficiency Bounds Implied by Multiperiod Conditional Moment Restrictions," Journal of the American Statistical Association, 83, 863-871.

Hansen, L. P., and K. J. Singleton (1991): "Computing Semiparametric Efficiency Bounds for Linear Time Series Models," in Nonparametric and Semiparametric Methods in Econometrics and Statistics, ed. by W. A. Barnett, J. Powell, and G. Tauchen, pp. 387-412.

Hansen, L. P., and K. J. Singleton (1996): "Efficient Estimation of Linear Asset-Pricing Models With Moving Average Errors," Journal of Business and Economic Statistics, pp. 53-68.

Hayashi, F., and C. Sims (1983): "Nearly Efficient Estimation of Time Series Models with Predetermined, but not Exogenous, Instruments," Econometrica, 51(3), 783-798.

Kuersteiner, G. M. (1997): "Efficient Inference in Time Series Models with Conditional Heterogeneity," Ph.D. thesis, Yale University.

Kuersteiner, G. M. (2001): "Optimal Instrumental Variables Estimation for ARMA Models," Journal of Econometrics, 104, 359-405.

Kuersteiner, G. M. (2002): "Efficient IV Estimation for Autoregressive Models with Conditional Heteroskedasticity," Econometric Theory, 18, 547-583.

Lewis, R., and G. Reinsel (1985): "Prediction of Multivariate Time Series by Autoregressive Model Fitting," Journal of Multivariate Analysis, pp. 393-411.

Linton, O. (1995): "Second Order Approximation in the Partially Linear Regression Model," Econometrica, pp. 1079-1112.

Linton, O. B. (1997): "Second Order Approximation for Semiparametric Instrumental Variable Estimators and Test Statistics," mimeo, Yale University.

Magnus, J. R., and H. Neudecker (1979): "The Commutation Matrix: Some Properties and Applications," Annals of Statistics, 7(2), 381-394.

Nagar, A. L. (1959): "The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations," Econometrica, 27(4), 575-595.

Newey, W. K. (1988): "Adaptive Estimation of Regression Models via Moment Restrictions," Journal of Econometrics, 38, 301-339.

Newey, W. K. (1990): "Efficient Instrumental Variables Estimation of Nonlinear Models," Econometrica, 58, 809-837.

Newey, W. K. (1994): "The Asymptotic Variance of Semiparametric Estimators," Econometrica, pp. 1349-1382.

Ng, S., and P. Perron (1995): "Unit Root Tests in ARMA Models with Data-Dependent Methods for the Selection of the Truncation Lag," Journal of the American Statistical Association, 90, 268-281.

Parzen, E. (1957): "On Consistent Estimates of the Spectrum of a Stationary Time Series," Annals of Statistics, 28, 329-348.

Priestley, M. (1981): Spectral Analysis and Time Series. Academic Press.

Stoica, P., T. Soderstrom, and B. Friedlander (1985): "Optimal Instrumental Variable Estimates of the AR Parameters of an ARMA Process," IEEE Transactions on Automatic Control, 30, 1066-1074.

Xiao, Z., and P. C. Phillips (1998): "Higher Order Approximations for Frequency Domain Time Series Regression," Journal of Econometrics, pp. 297-336.
Table 1: Performance of GMM Estimators

                             n = 128                             n = 512
 phi   Estimator     Bias      MAE       MSE        Bias      MAE       MSE
 0.1   OLS           0.51652   0.51672   0.52458    0.5174    0.52002   0.52203
       GMM-1         0.54524   1.3457    2.5025     0.5095    1.0594    1.653
       GMM-20        0.4481    0.45146   0.47529    0.43876   0.44586   0.4726
       KGMM-20       0.44917   0.47111   0.53162    0.44736   0.46461   0.52371
       GMM-Opt       0.48439   1.0699    2.1788     0.47735   0.8655    1.4054
       BGMM-Opt      0.39075   1.0655    3.6218     0.33739   0.69265   0.99113
       KGMM-Opt      0.48483   0.75581   1.1466     0.46242   0.66349   0.92119
       BKGMM-Opt     0.4259    0.85469   1.3366     0.38961   0.77809   1.0823
 0.3   OLS           0.52227   0.52425   0.53233    0.52319   0.52238   0.52433
       GMM-1         0.37319   0.92571   1.7589     0.19208   0.64946   2.8861
       GMM-20        0.4699    0.47633   0.49813    0.4479    0.45124   0.47233
       KGMM-20       0.45252   0.46546   0.51769    0.40358   0.41276   0.4601
       GMM-Opt       0.41832   0.8571    1.6191     0.2193    0.60238   2.8213
       BGMM-Opt      0.1965    0.84522   1.6398     -0.01157  0.48039   0.64576
       KGMM-Opt      0.40261   0.65701   0.93193    0.26319   0.46613   0.68559
       BKGMM-Opt     0.26391   0.71097   1.0397     0.063104  0.50435   0.80424
 0.5   OLS           0.46675   0.46775   0.47616    0.47121   0.46856   0.47072
       GMM-1         0.064923  0.36801   0.63065    0.012124  0.17127   0.22942
       GMM-20        0.4193    0.42041   0.44244    0.2859    0.28971   0.31147
       KGMM-20       0.33826   0.35308   0.40205    0.18586   0.19964   0.23394
       GMM-Opt       0.076623  0.36358   0.6227     0.01198   0.17069   0.2287
       BGMM-Opt      -0.20138  0.52444   0.70699    -0.17292  0.27818   0.37291
       KGMM-Opt      0.085374  0.32911   0.52499    0.013643  0.16615   0.21996
       BKGMM-Opt     0.021164  0.36326   0.55213    -0.02036  0.17825   0.24791
Table 2: Performance of GMM Estimators with theta

                             n = 128                             n = 512
 phi   Estimator     Bias      MAE       MSE        Bias      MAE       MSE
 0.1   OLS           0.49705   0.49725   0.50385    0.49483   0.49372   0.49516
       GMM-1         0.49461   0.98011   1.8786     0.48621   1.1214    5.2146
       GMM-20        0.50373   0.50334   0.52432    0.49198   0.49328   0.51224
       KGMM-20       0.49943   0.51901   0.58448    0.48642   0.49693   0.55721
       GMM-Opt       0.51603   0.82588   1.5626     0.48211   0.95272   5.0932
       BGMM-Opt      0.48989   1.1095    2.8016     0.45823   1.326     12.9356
       KGMM-Opt      0.50207   0.69179   1.1779     0.48339   0.66513   0.89137
       BKGMM-Opt     0.52045   0.76303   1.1456     0.46361   0.74174   1.0006
 0.3   OLS           0.45384   0.45547   0.4613     0.45384   0.45403   0.45568
       GMM-1         0.28972   0.66679   1.2017     0.1072    0.47244   0.90049
       GMM-20        0.44322   0.44483   0.46525    0.40133   0.40931   0.4302
       KGMM-20       0.43339   0.4501    0.51347    0.34142   0.35464   0.40636
       GMM-Opt       0.36738   0.62256   1.1651     0.2135    0.48551   0.87465
       BGMM-Opt      0.28444   0.66182   1.101      0.11745   0.35947   0.49946
       KGMM-Opt      0.36569   0.52213   0.7151     0.20702   0.39769   0.57164
       BKGMM-Opt     0.28996   0.57883   0.84856    0.11337   0.40762   0.66559
 0.5   OLS           0.38046   0.37889   0.38487    0.37494   0.37593   0.3775
       GMM-1         0.042111  0.29423   0.54636    0.015656  0.13132   0.1768
       GMM-20        0.32936   0.32894   0.35015    0.21703   0.21913   0.23704
       KGMM-20       0.2695    0.28151   0.32307    0.15066   0.1643    0.19171
       GMM-Opt       0.08868   0.31714   0.58005    0.030298  0.14035   0.18463
       BGMM-Opt      0.04296   0.27811   0.41247    0.016908  0.13559   0.18328
       KGMM-Opt      0.090384  0.28488   0.44025    0.02436   0.13273   0.17765
       BKGMM-Opt     0.054996  0.28023   0.42958    0.015666  0.13461   0.18303
Table 3: Performance of GMM Estimators with theta = .5

                             n = 128                             n = 512
 phi   Estimator     Bias      MAE       MSE        Bias      MAE       MSE
 0.1   OLS           0.4708    0.47223   0.4798     0.46821   0.47019   0.47208
       GMM-1         0.52711   1.1213    2.0323     0.52264   1.1775    2.2054
       GMM-20        0.39267   0.39842   0.42841    0.36723   0.37541   0.40458
       KGMM-20       0.40082   0.42934   0.4976     0.37235   0.40024   0.45747
       GMM-Opt       0.41931   0.80943   1.357      0.41969   0.86022   1.5581
       BGMM-Opt      0.32466   1.2808    13.0744    0.30929   0.60111   0.88186
       KGMM-Opt      0.43155   0.67594   0.9787     0.40402   0.67063   0.97716
       BKGMM-Opt     0.35219   0.70706   0.98258    0.32216   0.76215   1.232
 0.3   OLS           0.38604   0.38641   0.3945     0.38685   0.38754   0.38952
       GMM-1         0.19769   0.82882   1.6984     0.087729  0.496     0.87353
       GMM-20        0.27067   0.28668   0.31713    0.24542   0.25758   0.28589
       KGMM-20       0.26555   0.30311   0.36346    0.22235   0.25503   0.30632
       GMM-Opt       0.23217   0.62133   1.0808     0.13734   0.41268   0.70857
       BGMM-Opt      0.098508  0.57105   1.1184     0.018366  0.29295   0.46272
       KGMM-Opt      0.24356   0.50607   0.742      0.16227   0.35728   0.54694
       BKGMM-Opt     0.13393   0.53538   0.84079    0.025493  0.35742   0.59875
 0.5   OLS           0.27857   0.28049   0.28786    0.28095   0.2819    0.28394
       GMM-1         0.011779  0.3225    0.67531    -0.00211  0.11246   0.15008
       GMM-20        0.15784   0.16854   0.19507    0.1009    0.10759   0.12591
       KGMM-20       0.13523   0.16429   0.20386    0.065555  0.09014   0.11271
       GMM-Opt       0.058289  0.253     0.46363    0.01898   0.10532   0.14026
       BGMM-Opt      -0.01254  0.21014   0.33764    -0.02236  0.095026  0.12984
       KGMM-Opt      0.065003  0.22165   0.34951    0.012326  0.10667   0.13995
       BKGMM-Opt     0.003854  0.22473   0.42831    -0.01114  0.094756  0.13116
Table 4: Performance of GMM Estimators with theta = .5 and Heteroskedasticity

                        n = 128               n = 512
 phi   Estimator     Bias      IDR        Bias      IDR
 0.1   OLS           0.36203   0.39565    0.37789   0.22017
       GMM-1         0.50136   5.22       0.51375   8.5451
       GMM-20        0.36409   0.9079     0.43471   1.163
       KGMM-20       0.36288   0.99398    0.43279   1.2035
       GMM-Opt       0.37306   1.8038     0.44232   2.2475
       BGMM-Opt      0.35293   1.0481     0.42576   1.1657
       KGMM-Opt      0.41278   2.0442     0.49196   2.7351
       BKGMM-Opt     0.38757   1.931      0.48469   2.158
 0.3   OLS           0.28962   0.29852    0.31254   0.19993
       GMM-1         0.19565   2.991      0.039369  2.5622
       GMM-20        0.24842   0.65978    0.28085   0.9905
       KGMM-20       0.248     0.93733    0.26538   0.78716
       GMM-Opt       0.22376   1.2064     0.22999   1.1617
       BGMM-Opt      0.24336   0.73583    0.27032   0.97506
       KGMM-Opt      0.2297    1.4793     0.24377   1.62
       BKGMM-Opt     0.23773   1.37       0.25496   1.5437
 0.5   OLS           0.21107   0.26352    0.22779   0.15279
       GMM-1         -0.00746  1.0207     -0.00346  0.60612
       GMM-20        0.12121   0.50708    0.08675   0.5928
       KGMM-20       0.11991   0.54298    0.076     0.47081
       GMM-Opt       0.082524  0.57197    0.072163  0.4773
       BGMM-Opt      0.1116    0.49428    0.084331  0.57169
       KGMM-Opt      0.077325  0.67164    0.057138  0.46211
       BKGMM-Opt     0.086783  0.65593    0.06762   0.50118