Nearly Weighted Risk Minimal Unbiased Estimation Economics Department Princeton University

advertisement
Nearly Weighted Risk Minimal Unbiased Estimation
Ulrich K. Müller and Yulong Wang
Economics Department
Princeton University
July 2015
Abstract
We study non-standard parametric estimation problems, such as the estimation
of the AR(1) coe¢ cient close to the unit root. We develop a numerical algorithm
that determines an estimator that is nearly (mean or median) unbiased, and among
all such estimators, comes close to minimizing a weighted average risk criterion. We
demonstrate the usefulness of our generic approach by also applying it to estimation
in a predictive regression, estimation of the degree of time variation, and long-range
quantile point forecasts for an AR(1) process with coe¢ cient close to unity.
JEL classi…cation: C13, C22
Keywords: Mean bias, median bias, local-to-unity, nearly non-invertible MA(1)
The authors thank seminar participants at Harvard/MIT, Princeton, University of Kansas and University
of Pennsylvania for helpful comments and suggestions. Müller gratefully acknowledges …nancial support by
the NSF via grant SES-1226464.
1
Introduction
Much recent work in econometrics concerns inference in non-standard problems, that is
problems that don’t reduce to a Gaussian shift experiment, even asymptotically. Prominent
examples include inference under weak identi…cation (cf. Andrews, Moreira and Stock (2006,
2008)), in models involving local-to-unity time series (Elliott, Rothenberg, and Stock (1996),
Jansson and Moreira (2006), Mikusheva (2007)), and in moment inequality models. The
focus of this recent work is the derivation of hypothesis tests and con…dence sets, and in
many instances, optimal or nearly optimal procedures have been identi…ed. In comparison,
the systematic derivation of attractive point estimators has received less attention.
In this paper we seek to make progress by deriving estimators that come close to minimizing a weighted average risk (WAR) criterion, under the constraint of having uniformly
low bias. Our general framework allows for a wide range of loss functions and bias constraints, such as mean or median unbiasedness. The basic approach is to …nitely discretize
the bias constraint. For instance, one might impose zero or small bias only under m distinct
parameter values. Under this discretization, the derivation of a WAR minimizing unbiased
estimator reduces to a Lagrangian problem with m non-negative Lagrange multipliers. The
Lagrangian can be written as an integral over the data, since both the objective and the
constraints can be written as expectations. Thus, for given multipliers, the best estimator
simply minimizes the integrand for all realizations of the data, and so is usually straightforward to determine. Furthermore, it follows from standard duality theory that the value of
the Lagrangian, evaluated at arbitrary non-negative multipliers, provides a lower bound on
the WAR of any estimator that satis…es the constraints. This lower bound holds a fortiori in
the non-discretized version of the problem, since the discretization amounts to a relaxation
of the uniform unbiasedness constraint.
We use numerical techniques to obtain approximately optimal Lagrange multipliers, and
use them for two conceptually distinct purposes. On the one hand, close to optimal Lagrange
multipliers imply a large, and thus particularly informative lower bound on the WAR of
any nearly unbiased estimator. On the other hand, close to optimal Lagrange multipliers
yield an estimator that is nearly unbiased in the discrete problem. Thus, with a …ne enough
discretization and some smoothness of the problem, this estimator also comes close to having
uniformly small bias. Combining these two usages then allows us to conclude that we have in
fact identi…ed a nearly WAR minimizing estimator among all estimators that have uniformly
1
small bias.
It is not possible to …nely discretize a problem involving an unbounded parameter space.
At the same time, many non-standard problems become approximately standard in the
unbounded part of the parameter space. For instance, a large concentration parameter in a
weak instrument problem, or a large local-to-unity parameter in a time series problem, induce
nearly standard behavior of test statistics and estimators. We thus suggest “switching” to
a suitable nearly unbiased estimator in this standard part of the parameter space in a way
that ensures overall attractive properties.
These elements— discretization of the original problem, analytical lower bound on risk,
numerical approximation, switching— are analogous to Elliott, Müller and Watson’s (2015)
approach to the determination of nearly weighted average power maximizing tests in the
presence of nuisance parameters. Our contribution here is the transfer of the same ideas to
estimation problems.
There are two noteworthy special cases that have no counterpart in hypothesis testing.
First, under squared loss and a mean bias constraint, the Lagrangian minimizing estimator
is linear in the Lagrange multipliers. WAR then becomes a positive de…nite quadratic form
in the multipliers, and the constraints are a linear function of the multipliers. The numerical
determination of the multipliers thus reduces to a positive de…nite quadratic programming
problem, which is readily solved even for large m by well-known and widely implemented
algorithms.1
Second, under a median unbiasedness constraint and in absence of any nuisance parameters, it is usually straightforward to numerically determine an exactly median unbiased
estimator by inverting the median function of a suitable statistic, even in a non-standard
problem. This approach was prominently applied in econometrics in Andrews (1993) and
Stock and Watson (1998), for instance. However, the median inversion of di¤erent statistics typically yields di¤erent median unbiased estimators. Our approach here can be used
to determine the right statistic to invert under a given WAR criterion: Median inverting a
nearly WAR minimizing nearly median unbiased estimator typically yields an exactly median
unbiased nearly WAR minimizing estimator, as the median function of the nearly median
unbiased estimator is close to the identity function.
1
In this special case, and with a weighting function that puts all mass at one parameter value, the
Lagrangian bound on WAR reduces to a Barankin (1946)-type bound on the MSE of an unbiased or biased
estimator, as discussed by McAulay and Hofstetter (1971), Glave (1972) and Albuquerque (1973).
2
In addition, invariance considerations are more involved and potentially more powerful in
an estimation problem compared to a hypothesis testing problem, since it sometimes makes
sense to impose invariance (or equivariance) with respect to the parameter of interest. We
suitably extend the theory of invariant estimators in Chapter 3 of Lehmann and Casella
(1998) in Section 4 below.
We demonstrate the usefulness of our approach in four time series estimation problems.
In all of them, we consider an asymptotic version of the problem.
First, we consider mean and median unbiased estimation of the local-to-unity AR(1)
parameter with known and unknown mean, and with di¤erent assumptions on the initial
condition. The analysis of the bias in the AR(1) model has spawned a very large literature
(see, among others, Hurvicz (1950), Marriott and Pope (1954), Kendall (1954), White (1961),
Phillips (1977, 1978), Sawa (1978), Tanaka (1983, 1984), Shaman and Stine (1988), Yu
(2012)), with numerous suggestions of alternative estimators with less bias (see, for instance,
Quenouille (1956), Orcutt and Winokur (1969), Andrews (1993), Andrews and Chen (1994),
Park and Fuller (1995), MacKinnon and Smith (1998), Cheang and Reinsel (2000), Roy and
Fuller (2001), Crump (2008), Phillips and Han (2008), Han, Phillips, and Sul (2011) and
Phillips (2012)). With the exception of the conditional optimality of Crump’s (2008) median
unbiased estimator in a model with asymptotically negligible initial condition, none of these
papers make an optimality claim, and we are not aware of a previously suggested estimator
with uniformly nearly vanishing mean bias. As such, our approach comes close to solving
this classic time series problem.
Second, we consider a nearly non-invertible MA(1) model. The parameter estimator here
su¤ers from the so-called pile-up problem, that is the MA(1) root is estimated to be exactly
unity with positive probability in the non-invertible case. Stock and Watson (1998) derive
an exactly median unbiased estimator by median inverting the Nyblom (1989) statistic in
the context of estimating the degree of parameter time variation. We derive an alternative,
nearly WAR minimizing median unbiased estimator under absolute value loss and …nd it to
have very substantially lower risk under moderate and large amounts of time variation.
Third, we derive nearly WAR minimizing mean and median unbiased estimators of the
regression coe¢ cient in the so-called predictive regression problem, that is with a weakly
exogenous local-to-unity regressor. This contributes to previous analyses and estimator
suggestions by Mankiw and Shapiro (1986), Stambaugh (1986, 1999), Cavanagh, Elliott,
and Stock (1995), Jansson and Moreira (2006), Kothari and Shanken (1997), Amihud and
3
Hurvich (2004), Eliasz (2005) and Chen and Deo (2009).
Finally, we use our framework to derive nearly quantile unbiased long-range forecasts from
a local-to-unity AR(1) process with unknown mean. The unbiased property here means that
in repeated applications of the forecast rule for the
be smaller than the forecast with probability
quantile, the future value realizes to
, under all parameter values. See Phillips
(1979), Stock (1996), Kemp (1999), Gospodinov (2002), Turner (2004) and Elliott (2006) for
related studies of forecasts in a local-to-unity AR(1).
The remainder of the paper is organized as follows. Section 2 sets up the problem,
derives the lower bound on WAR, and discusses the numerical implementation. Section
3 considers the two special cases of mean unbiased estimation under squared loss, and of
median unbiased estimation without nuisance parameters. Section 4 extends the framework
to invariant estimators. Throughout Sections 2-4, we use the problem of estimating the
AR(1) coe¢ cient under local-to-unity asymptotics as our running example. In Section 5, we
provide details on the three additional examples mentioned above. Section 6 concludes.
2
Estimation, Bias and Risk
2.1
Set-Up and Notation
We observe the random element X in the sample space X . The density of X relative to some
-…nite measure
is f , where
= h( ) 2 H with estimators
2
is the parameter space. We are interested in estimating
: X 7! H. For scalar estimation problems, H
R, but
our set-up allows for more general estimands. Estimation errors lead to losses as measured
by the function ` : H
7! [0; 1) (so that `( (x); ) is the loss incurred by the estimate
(x) if the true parameter is ). The risk of the estimator
R
r( ; ) = E [`( (X); )] = `( (x); )f (x)d (x).
is given by its expected loss,
Beyond the estimator’s risk, we are also concerned about its bias. For some function
c:H
7! H, the bias of
mean bias, c( ; ) =
is de…ned as b( ; ) = E [c( (X); )]. For instance, for the
h( ), so that b( ; ) = E [ (X)]
scalar parameter of interest , c( ; ) = 1[ > h( )]
We are interested in deriving estimators
1
,
2
h( ), and for the median bias of a
so that b( ; ) = P [ (X) > h( )]
1
.
2
that minimize risk subject to an unbiasedness
constraint. In many problems of interest, a uniformly risk minimizing
might not exist,
even under the bias constraint. To make further progress, we thus measure the quality
4
of estimators by their weighted average risk R( ; F ) =
negative …nite measure F with support in
.
R
r( ; )dF ( ) for some given non-
In this notation, the weighted risk minimal unbiased estimator
solves the program
min R( ; F ) s.t.
(1)
b( ; ) = 0 8 2
(2)
.
More generally, one might also be interested in deriving estimators that are only approximately unbiased, that is solutions to (1) subject to
"
for some "
"8 2
b( ; )
(3)
0. Allowing the bounds on b( ; ) to depend on
does not yield greater
generality, as they can be subsumed in the de…nition of the function c. For instance, a
restriction on the relative mean bias to be no more than 5% is achieved by setting c( ; ) =
(
h( ))=h( ) and " = 0:05.
An estimator is called risk unbiased if E 0 [`( (X);
0 )]
E 0 [`( (X); )] for any
0;
2
.
As discussed in Chapter 3 of Lehmann and Casella (1998), risk unbiasedness is a potentially
attractive property as it ensures that under
true value
0
0,
the estimate (x) is at least as close to the
in expectation as measured by ` as it is to any other value of . It is straight-
forward to show that under squared loss `( (x); ) = ( (x)
h( ))2 a risk unbiased estimator
is necessarily mean unbiased, and under absolute value loss `( (x); ) = j (x)
h( )j, it is
necessarily median unbiased. From this perspective, squared loss and mean unbiased constraints, and absolute value loss and median unbiased contraints, form natural pairs, and
we report results for these pairings in our examples. The following development, however,
does not depend on any assumptions about the relationship between loss function and bias
constraint, and other pairings of loss function and constraints might be more attractive in
speci…c applications.
In many examples, the risk of good estimators is far from constant, as the information
in X about h( ) varies with . This makes risk comparisons and the weighting function
F more di¢ cult to interpret. Similarly, also the mean bias of an estimator is naturally
gauged relative to its sampling variability. To address this issue, we introduce a normalization function n( ) that roughly corresponds to the root mean squared error of a good
estimator at : The normalized risk rn ( ; ) is then given by rn ( ; ) = E [( (X)
5
)2 ]=n( )2
and rn ( ; ) = E [j (X)
h( )j]=n( ) under squared and absolute value loss, respectively,
and the normalized mean bias is bn ( ; ) = (E [ (X)] h( ))=n( ). The weighted average
R
normalized risk with weighting function Fn , rn ( ; )dFn ( ), then reduces to R( ; F ) above
with dF ( ) = dFn ( )=n( )2 and dF ( ) = dFn ( )=n( ) in the squared loss and absolute value
loss case, respectively, and bn ( ; ) = b( ; ) with c( ; ) = (
h( ))=n( ). The median bias,
of course, is readily interpretable without any normalization.
Running Example: Consider an autoregressive process of order 1, yt = yt
1; : : : ; T , where "t
iidN (0; 1) and y0 = 0. Under local-to-unity asymptotics,
1
+ "t , t =
=
T
=
=T , this model converges to the limit experiment (in the sense of LeCam) of observing
1
X, where X is an Ornstein-Uhlenbeck process dX(s) =
X(s)ds + dW (s) with X(0) = 0
and W a standard Wiener process. The random element X takes on values in the set of
cadlag functions on the unit interval X , and the Radon-Nikodym derivative of X relative to
Wiener measure
is
f (x) = exp[
The aim is to estimate
1
2
(x(1)
2
1)
1 2
2
Z
1
x(s)2 ds]:
(4)
0
2 R in a way that (nearly) minimizes weighted risk under a mean
or median bias constraint.
More generally, suppose that yt = yt
and
= 1
1 + ut
with y0 = op (T 1=2 ), T
1=2
2
P[ T ]
t=1
ut ) W ( ),
=T . Then for any consistent estimator ^ of the long run variance
fut g, XT ( ) = T
1=2
2
of
^ 1 y[ T ] ) X( ). Thus, for any -almost surely continuous estimator
: X ! R in the limiting problem, (T
1=2
^ 1 y[ T ] ) has the same asymptotic distribution as
(X) by the continuous mapping theorem. Therefore, if (X) is (nearly) median unbiased
for , then (XT ) is asymptotically (nearly) median unbiased. For a (nearly) mean unbiased
estimator (X), one could put constraints on fut g that ensure uniform integrability of (XT ),
so that again E [ (XT )] ! E [ (X)] for all . Alternatively, one might de…ne “asymptotic
mean bias”and “asymptotic risk”as the sequential limits
lim lim E [1[j (XT )j < K] (XT )
K!1 T !1
h( )]
lim lim E [1[`( (XT ); ) < K]`( (XT ); )]
K!1 T !1
respectively. With these de…nitions, the bias and (weighted average) risk of (XT ) converges
to the bias and weighted average risk of
by construction, as long as
continuous with …nite mean and risk relative to (4).
6
is -almost surely
Figure 1: Normalized Mean Bias and Mean Squared Error of OLS Estimator in Local-toUnity AR(1)
The usual OLS estimator for (which is also equal to the MLE) is OLS (x) = 12 (x(1)2
R1
1)= 0 x(s)2 ds. As demonstrated by Phillips (1987) and Chan and Wei (1987), for large ,
1=2
a
) ) N (0; 1) as ! 1, yielding the approximation OLS (X) N ( ; 2 ).
p
We thus use the normalization function n( ) = 2 + 10 (the o¤set of 10 ensures that
(2 )
(
OLS (X)
n( ) > 0 even as
= 0).
We initially focus on estimating
under squared loss and a mean bias constraint for
0; we return to the problem of median unbiased estimation in Section 3.2 below. Figure
1 depicts the normalized mean bias and the normalized mean squared error of
function of . As is well known, the bias of
strong mean reversion of, say,
OLS
OLS
as a
is quite substantial: even under fairly
= 60, the bias is approximately on …fth of the root mean
squared error of the estimator. N
2.2
A Lower Bound on Weighted Risk of Unbiased Estimators
In general, it will be di¢ cult to analytically solve (1) subject to (3). Both X and H are
typically uncountable, so we are faced with an optimization problem in a function space.
Moreover, it is usually di¢ cult to obtain analytical closed-form expressions for the integrals
de…ning r( ; ) and b( ; ), so one must rely on approximation techniques, such as Monte
Carlo simulation. For these reasons, it seems natural to resort to numerical techniques to
obtain an approximate solution.
7
There are potentially many ways of approaching this numerical problem. For instance,
one might posit some sieve-type space for , and numerically determine the (now …nitedimensional) parameter that provides the relatively best approximate solution. But to make
this operational, one must choose some dimension of the sieve space, and it unclear how much
better of a solution one might have been able to …nd in a di¤erent or more highly-dimensional
sieve space.
It would therefore be useful to have a lower bound on the weighted risk R( ; F ) that
holds for all estimators that satisfy (3). If an approximate solution ^ is found that also
satis…es (3) and whose weighted risk R(^; F ) is close to the bound, then we know that we
have found the nearly best solution overall.
To derive such a bound, we relax the constraint (3) by replacing it by a …nite number of
constraints: Let Gi ; i = 1; : : : ; m be probability distributions on ; and de…ne the weighted
R
average bias B( ; G) of the estimator as B( ; G) = b( ; )dG( ). Then any estimator
that satis…es (3) clearly also satis…es
"
" for all i = 1; : : : ; m:
B( ; Gi )
A special case of (5) has Gi equal to a point mass at
at the …nite number of parameter values
i,
(5)
so that (5) amounts to imposing (3)
1; : : : ; m.
Now consider the Lagrangian for the problem (1) subject to (5),
L( ; ) = R( ; F ) +
m
X
u
i (B(
; Gi )
") +
i=1
where
= ( 1; : : : ;
m)
and
i
= ( li ;
m
X
l
i(
B( ; Gi )
(6)
")
i=1
u
i ).
By writing R( ; F ) and B( ; Gi ) in terms of their
de…ning integrals and by assuming that we can change the order of integration, we obtain
L( ; ) =
Z
Z
f (x)`( (x); )dF ( ) +
+
m
X
i=1
m
X
Z
u
i(
l
i(
i=1
Let
be the estimator such that for a given ,
f (x)c( (x); )dGi ( )
Z
f (x)c( (x); )dGi ( )
(7)
")
!
") d (x):
(x) minimizes the integrand on the right
hand side of (7) for each x. Since minimizing the integrand at each point is su¢ cient to
minimize the integral,
necessarily minimizes L over .
8
This argument requires that the change of the order of integration in (7) is justi…ed. By
Fubini’s Theorem, this is always the case for R( ; F ) (since ` is non-negative), and also for
B( ; Gi ) if c is bounded (as is always the case for the median bias, and for the mean bias
if H is bounded). For unbounded H and mean bias constraints, it su¢ ces that there exists
some estimator with uniformly bounded second moment.
Standard Duality Theory for optimization implies the following Lemma.
Lemma 1 Suppose ~ satis…es (5), and for arbitrary
0 (that is, each element in
non-negative),
minimizes L( ; ) over . Then R(~; F ) L( ; ).
is
L( ; ) by de…nition of , so in particular, L(~; )
L(~; ) since
0 and ~ satis…es (5). Combining these
Proof. For any , L( ; )
L( ; ). Furthermore, R(~; F )
inequalities yields the result.
Note that the bound on R(~; F ) in Lemma 1 does not require an exact solution to the
discretized problem (1) subject to (5). Any
some choices for
0 implies a valid bound L( ; ), although
yield better (i.e., larger) bounds than others. Since (5) is implied by (3),
these bounds hold a fortiori for any estimator satisfying the uniform unbiasedness constraint
(3).
Running Example: Suppose m = 1, Fn puts unit mass at
mass at
1
>
0.
0
= 0 and G1 puts point
)2 , normalized mean bias constraint
With square loss `( ; ) = (
c( ; ) = (
)=n( ), " = 0 and u1 = 0, (x) minimizes
Z
Z
2
( (x)
0)
l
f (x)`( (x); )dF ( ) 1 f (x)c( (x); )dGi ( ) = f 0 (x)
n( 0 )2
This is solved by
(x) =
0
+
1
2
2
l n( 0 ) f
1
(x)=(f 0 (x)n( 1 )). For
numerical calculation yields R( ; F ) = rn ( ;
0)
1
l
1f
1
(x)
= 3 and
= 0:48 and B( ; G1 ) = bn ( ;
l
1
(x)
1
:
n( 1 )
= 1:5, a
1)
= 0:06;
so that L( ; ) = 0:39. We can conclude from Lemma 1 that no estimator that is unbiased
for all
2 R can have risk R( ; F ) = rn ( ;
for all estimators
0)
that are unbiased only at
smaller than 0:39, and the same holds also
1.
Moreover, this lower risk bound also holds
for all estimators whose normalized bias is smaller than 0.06 uniformly in
2 R, and more
generally for all estimators whose normalized mean bias is smaller than 0.06 in absolute
value only at
If
u
i
1.
N
0 is such that
(B( ; Gi ) ") = 0 and
l
i
satis…es (5) and the complementarity slackness conditions
( B( ; Gi ) ") = 0 hold, then (6) implies L(
so that by an application of Lemma 1, L(
;
9
) is the best lower bound.
;
) = R(
; F ),
2.3
Numerical Approach
Solving the program (1) subject to (5) thus yields the largest, and thus most informative
bound on weighted risk R( ; F ). In addition, solutions
to this program also plausibly
satisfy a slightly more relaxed version of original non-discretized constraint (3). The reason
is that if the bias b(
dense in
; ) does not vary greatly in , and the distributions Gi are su¢ ciently
, then any estimator that satis…es (5) cannot violate (3) by very much.
This suggests the following strategy to obtain a nearly weighted risk minimizing nearly
unbiased estimator, that is, for given "B > 0 and "R > 0, an estimator ^ that (i) satis…es (3)
with " = "B ; (ii) has weighted risk R(^; F )
(1 + "R )R( ; F ) for any estimator
satisfying
(3) with " = "B .
1. Discretize
by point masses or other distributions Gi , i = 1; : : : ; m.
y
2. Obtain approximately optimal Lagrange multipliers ^ for the problem (1) subject to
y
(5) for " = "B , and associated value R = L( y ; ^ ).
^
3. Obtain an approximate solution (^" ; ^ ) to the problem
min " s.t. R( ; F )
and
(8)
(1 + "R )R
" 0;
"
B( ; Gi )
", i = 1; : : : ; m
(9)
and check whether ^ satis…es (3). If it doesn’t, go back to Step 1 and use a …ner
discretization. If it does, ^ = ^ has the desired properties by an application of Lemma
1.
nor ^ have to be exact solutions to their respective programs
to be able to conclude that ^ is indeed a nearly weighted risk minimizing nearly unbiased
Importantly, neither
^y
estimator as de…ned above.
The solution ^ to the problem in Step 3 has the same form as
a weighted average of the integrand in (7), but
, that is ^ minimizes
now is the ratio of the Lagrange multipli-
ers corresponding to the constraints (9) and the Lagrange multiplier corresponding to the
constraint (8). These constraints are always feasible, since = ^y satis…es (9) with " = "B
y
(at least if ^ is the exact solution), and R( ^y ; F ) = R < (1 + "R )R. The additional slack
provided by "R is used to tighten the constraints on B(^ ; Gi ) to potentially obtain a ^
satisfying (3). A …ner discretization implies additional constraints in (5) and thus (weakly)
10
increases the value of R. At the same time, a …ner discretization also adds additional constraints on the bias function of ^ , making it more plausible that it satis…es the uniform
constraint (3).
We suggest using simple …xed point iterations to obtain approximate solutions in Steps
2 and 3, similar to EMW’s approach to numerically approximate an approximately least
favorable distribution. See Appendix B for details. Once the Lagrange multipliers underlying
^ are determined, ^ (x) for given data x is simply the minimizer of the integrand on the
right hand side of (7). We provide corresponding tables and matlab code for all examples in
this paper under www.princeton.edu/ yulong.
An important class of non-standard estimation problems are limit experiments that arise
from the localization of an underlying parameter
2
around problematic parameter values
, such as the local-to-unity asymptotics for the AR(1) estimation problem in the
N
running example. Other examples include weak instrument asymptotics (where the limit is
taken for values of the concentration parameter local-to-zero), or local-to-binding moment
conditions in moment inequality models. In these models, standard estimators, such as, say,
(bias corrected) maximum likelihood estimators, are typically asymptotically unbiased and
e¢ cient for values of
2
=
N.
Moreover, this good behavior of standard estimators typically
also arises, at least approximately, under local asymptotics with localization parameter
&( ) ! 1 for some appropriate function & :
as
7! R. For instance, as already mentioned in
Section 2.1, Phillips (1987) and Chan and Wei (1987) show that the OLS estimator for the
AR(1) coe¢ cient
=T as &( ) =
1
=
is approximately normal under local-to-unity asymptotics
=
T
=
! 1. This can be made precise by limit of experiment arguments, as
discussed in Section 4.1 of EMW. Intuitively, the boundary of the localized parameter space
typically continuously connects to the unproblematic part of the parameter space
of the underlying parameter .
n
N
It then makes sense to focus the numerical strategy on the truly problematic part of the
local parameter space, where the standard estimator
S
is substantially biased or ine¢ cient.
As in EMW, consider a partition of the parameter space
For large enough
where
S,
S
= f : &( ) >
Sg
into two regions
=
N
[
S.
is the (typically unbounded) standard region
n S is the problematic nonN =
^
standard region. Correspondingly, consider estimators that are of the following switching
S
is approximately unbiased and e¢ cient, and
form
^(x) = (x) S (x) + (1
11
(x))
N (x)
(10)
where
(x) = 1[^& (x) >
cut-o¤
S,
>
] for some estimator ^& (X) of &( ). By choosing a large enough
we can ensure that
S
is only applied to realizations of X that stem from
the relatively unproblematic part of the parameter space
S
with high probability. The
numerical problem then becomes the determination of the estimator N : X !
7 H that leads
to overall appealing bias and e¢ ciency properties of ^ in (10) for any 2 . As discussed
in EMW, an additional advantage of this switching approach is that the weighting function
F can then focus on
and the “seam”part of S where P ( (X)) is not close to zero or
one. The good e¢ ciency properties of ^ on the remaining part of S are simply inherited
N
by the good properties of
S.
Incorporating this switching approach in the numerical strategy above is straightforward:
For given
, the problem in Step 3 now has constrained to be of the form (10), that is for
all x that lead to (x) = 1, ^ (x) = S (x), and the numerical challenge is only to determine
a suitable N = ^ . By not imposing the form (10) in Step 2, we still obtain a bound on
N
weighted risk that is valid for any estimator
satisfying
"B
"B for all
b( ; )
(that
is, without imposing the functional form (10)). Of course, for a given switching rule , it
might be that no nearly minimal weighted risk unbiased estimator of the form (10) exists,
because
S
is not unbiased or e¢ cient enough relative to the weighting F , so one might try
instead with larger values for
S
and
.
Running Example: Taking the local-to-unity limit of Marriott and Pope’s (1954) mean
bias expansion E[^OLS
] =
model without intercept suggests
2 =T + o(T
S (x)
=
1
) for the AR(1) OLS estimator ^OLS in a
OLS (x)
2 as a good candidate for
indeed, it can be inferred from Figure 1 that the normalized mean bias of
0:01 for
P (^& (X) >
>
S
= 20. Furthermore, setting ^& (x) =
) < 0:1% for all
S,
OLS (x)
and
S
S.
And
is smaller than
= 40, we …nd that
so draws from the problematic part of the parameter
space are only very rarely subject to the switching.
It is now an empirical matter whether imposing the switching to
S
in this fashion is costly
in terms of risk, and this can be evaluated with the algorithm above by setting the upper
bound of the support of F to, say, 80, so that the weighted average risk criterion incorporates
parameter values where switching to
for
S
occurs with high probability (P (^& (X) >
> 70). N
12
) > 99%
3
Two Special Cases
3.1
Mean Unbiased Estimation under Quadratic Loss
Consider the special case of quadratic loss `( ; ) = (
constraint c( ; ) = (
h( ))2 and normalized mean bias
h( ))=n( ) (we focus on a scalar estimand for expositional ease, but
the following discussion straightforwardly extends to vector valued ). Then the integrand
in (7) becomes a quadratic function of (x), and the minimizing value
(x) =
a linear function of ~ i = 21 (
R
u
i
Pm ~ R
f (x)h( )dF ( )
i=1 i
R
f (x)dF ( )
l
i ).
f (x)
dGi (
n( )
)
(x) is
;
(11)
Plugging this back into the objective and constraints
yields
R( ; F ) = ~
0
~ + !R,
B( ; Gi ) = ! i
0~
i ,
where
Z
!R =
and
i
ij
!i =
Z
=
Z
Z R
linear combination of
R
R
)
R
f (x)
dGj (
n( )
)
d (x),
f (x)dF ( )
!
R f (x)
R
Z
dG
(
)
f
(x)h(
)dF
(
)
i
f
(x)
n( )
R
h( )dGi ( ) d (x)
n( )
f (x)dF ( )
h( )2 f (x)dF ( )
is the ith column of the m
semi-de…nite. In fact,
f (x)
dGi (
n( )
d (x);
. Note that is symmetric and positive
R
is positive de…nite as long as fn((x)) dGi ( ) cannot be written as a
f (x)
dGj (
n( )
m matrix
R
[ f (x)h( )dF ( )]2
R
f (x)h( )2 dF ( )
), j 6= i almost surely. Thus, minimizing R( ; F ) subject
to (5) becomes a (semi)-de…nite quadratic programming problem, which is readily solved
1
by well-known algorithms. In the special case of " = 0, the solution is ~ =
!, where
! = (! 1 ; : : : ; ! m )0 . Step 3 of the algorithm generally becomes a quadratically constrained
quadratic program, but since " is scalar, it can easily be solved by conditioning on ", with a
line search as outer-loop. This remains true also if
(10), where
N
is restricted to be of the switching form
is of the form (11), and R( ; F ) and B( ; Gi ) contain additional constants
that arise from expectations over draws with (x) = 1.
Either way, it is computationally trivial to implement the strategy described in Section
2.3 under mean square risk and a mean bias constraint, even for a very …ne discretization
m.
13
Figure 2: Mean Bias and MSE in Local-to-Unity AR(1) with Known Mean
Running Example: We set Fn uniform on [0; 80], "B = "R = 0:01, and restrict ^ to be
of the switching form (10) with ^& =
OLS
and
= 40 as described at the end of the last
section. With Gi equal to point masses at f0:0; 0:22 ; : : : ; 102 g, the algorithm successfully
identi…es a nearly weighted risk minimizing estimator ^ that is uniformly nearly unbiased
for
0. The remaining bias in ^ is very small: with the normalized mean squared error
of ^ close to one, "B = 0:01 implies that about 10,000 Monte Carlo draws are necessary
for the largest bias of ^ to be approximately of the same magnitude as the Monte Carlo
standard error of its estimate.
Figure 2 plots the normalized bias and risk of the nearly weighted risk minimizing unbiased estimator ^ . As a point of reference, we also report the normalized bias and risk of
the OLS estimator
OLS
(replicated from Figure 1), and of the WAR minimizing estimator
without any constraints. By construction, the unconstrained WAR minimizing estimator
(which equals the posterior mean of
possible average normalized risk on
under a prior proportional to F ) has the smallest
2 [0; 80]. As can be seen from Figure 2, however, this
comes at the cost of a large mean bias.
The thick line plots a lower bound on the risk envelope for nearly unbiased estimators:
For each , we set the weighting function equal to a point mass at that , and then report
the lower bound on risk at
among all estimator whose Gi averaged normalized bias is no
larger than "B = 0:01 in absolute value for all i (that is, for each , we perform Step 2 of
the algorithm in Section 2.3, with F equal to a point mass at ). The risk of ^ is seen
14
to be approximately 10% larger than this lower bound on the envelope. This implies that
normalized weightings other than the uniform weighting on 2 [0; 80] cannot decrease the
risk very much below the risk of ^ , and to the extent that 10% is considered small, one
could call ^ approximately uniformly minimum variance unbiased. N
3.2
Median Unbiased Estimation Without Nuisance Parameters
Suppose h( ) =
, that is there are no nuisance parameters, and
R. Let
an estimator taking on values in R, and let m B ( ) be its median function, P (
m B ( )) = 1=2 for all
, then the estimator
2
U (X)
. If m B ( ) is one-to-one on
1
= m B(
B (X))
estimators m B (
B (X)),
B
be
B (X)
: R 7!
is exactly median unbiased by construction
(cf. Lehmann (1959)). In general, di¤erent estimators
1
with inverse m
1
B
which raises the question how
B
yield di¤erent median unbiased
B
should be chosen for
U
to have
low risk.
Running Example: Stock (1991) and Andrews (1993) construct median unbiased estimators for the largest autoregressive root based on the OLS estimator. Their results allow for
the possibility that there exists another median unbiased estimator with much smaller risk.
N
A good candidate for B is a nearly weighted risk minimizing nearly median unbiased
estimator ^. A small median bias of ^ typically also yields monotonicity of m^ , so if P ( y =
^
1
1=2 at the boundary points of (if there are any), then m^ has an inverse m^ , and
^U (x) = m 1 (^(x)) is exactly median unbiased. Furthermore, if ^ is already nearly median
^
unbiased, then m 1 : R 7! is close to the identity transformation, so that R(^U ; F ) is close
)
^
to the nearly minimal risk R(^; F ).
This suggests the following modi…ed strategy to numerically identify an exactly median
unbiased estimator ^U whose weighted risk R(^U ; F ) is within (1 + "R ) of the risk of any
other exactly median unbiased estimator.
1. Discretize
by the distributions Gi , i = 1; : : : ; m.
y
2. Obtain approximately optimal Lagrange multipliers ^ for the problem (1) subject to
y
y
(5) for " = 0, and associated value R = L( y ; ^ ). Choose Gi and ^ to ensure that
^
P(
^y
= ) < 1=2 for any boundary points of
. If m
go back to Step 1 and use a …ner discretization.
15
^y
still does not have an inverse,
y
y
3. Compute R(^U ; F ) for the exactly median unbiased estimator ^U (X) = m^ 1 (
^ y (X)).
^y
y
If R(^U ; F ) > (1 + "R )R, go back to Step 1 and use a …ner discretization. Otherwise,
^U = ^y has the desired properties.
U
For the same reasons as discussed in Section 2.3, it can make sense to restrict the estimator
of Step 2 of the switching form (10). For unbounded
compute m
^y
, it is not possible to numerically
exactly, but if the problem becomes well approximated by a standard problem
with standard unbiased estimator
S
as &( ) ! 1, then setting the inverse median function
to be the identity for very large ^& (X) leads to numerically negligible median biases.
Running Example: As discussed in Section 2.1, we combine the median unbiased constraint on
0 with absolute value loss. We make largely similar choices as in the
derivation of nearly mean unbiased estimators: the normalized weighting function Fn is
uniform on [0; 80], with risk now normalized as rn ( ; ) = E [j (X)
j]=n( ), where
p
n( ) = 2 + 10. Phillips (1974, 1975) provides Edgeworth expansions of ^OLS , which
implies that ^OLS =(1
as
T
1
) has a median bias of order o(T
! 1 suggests that the median bias of
S (x)
=
OLS (x)
1
) for j j < 1. Taking the limit
1 should be small for moderate
and large , and a numerical analysis shows that this is indeed the case. We set "R = 0:01,
and impose switching to
S
when
OLS (x)
>
= 40. Under a median bias constraint
and absolute value loss, using point masses for Gi leads to a discontinuous integrand in the
Lagrangian (7) for given x. In order to ensure a smooth estimator
, it makes sense to
instead choose the distributions Gi in a way that any mixture of the Gi ’s has a continuous
density. To this end we set the Lebesgue density of Gi , i = 1; : : : ; m proportional to the
ith third-order basis spline on the m = 51 knots f0; 0:22 ; : : : ; 102 g (with “not-a-knot” end
conditions). We only compute m^ 1 for
y
100 and set it equal to the identity function
^y
otherwise; the median bias of ^U for
100 is numerically indistinguishable from zero with
a Monte Carlo standard error of the median bias estimate of approximately 0.2%.
y
Figure 3 reports the median bias and normalized risk of ^ , the OLS estimator
U
OLS-based median unbiased estimator
U;OLS (x)
=m
1
OLS
(
OLS (x)),
OLS ,
the
and the estimator that
minimizes weighted risk relative to F without any median unbiased constraints. The risk of
^y is indistinguishable from the risk of U;OLS . Consequently, this analysis reveals U;OLS to
U
be (also) nearly weighted risk minimal unbiased in this problem. N
16
Figure 3: Median and Absolute Loss in Local-to-Unity AR(1) with Known Mean
4
Invariance
In this section, we consider estimation problems that have some natural invariance (or
equivariance) structure, so that it makes sense to impose the corresponding invariance also
on estimators. We show that the problem of identifying a (nearly) minimal risk unbiased invariant estimator becomes equivalent to the problem of identifying an unrestricted (nearly)
minimal risk unbiased estimator in a related problem with a sample and parameter space
generated by maximal invariants. Imposing invariance then reduces the dimension of the
e¤ective parameter space, which facilitates numerical solutions.
Consider a group of transformations on the sample space g : X
denotes a group action. We write a2
A 7! X , where a 2 A
a1 2 A for the composite action g(g(x; a1 ); a2 ) =
g(x; a2 a1 ) for all a1 ; a2 2 A, and we denote the inverse of action a by a , that is g(x; a
a) =
x for all a 2 A and x 2 X . Assume that the elements in a are distinct in the sense that
g(x; a1 ) = g(x; a2 ) for some x 2 X implies a1 = a2 .
Now suppose the problem is invariant in the sense that there exists a corresponding
group g :
is Pg(
;a) ,
A 7!
for all
on the parameter space, and the distribution of g(X; a) under X
P
and a 2 A (cf. De…nition 2.1 of Chapter 3 in Lehmann and Casella
(1998)). Let M (x) and M ( ) for M : X 7! X and M :
7!
be maximal invariants of
these two groups. To simplify notation, assume that M and M select speci…c orbits, that is
M (x) = g(x; O(x) ) for all x 2 X and M ( ) = g( ; O( ) ) for all
17
2
for some functions
O : X 7! A and O :
7! A. Then by de…nition of a maximal invariant, M (M (x)) = M (x),
M (M ( )) = M ( ), and we have the decomposition
x = g(M (x); O(x))
(12)
= g(M ( ); O( )):
(13)
By Theorem 6.3.2 of Lehmann and Romano (2005), the distribution of M (X) only depends
on M ( ). The following Lemma provides a slight generalization, which we require below.
The proof is in Appendix A.
Lemma 2 The joint distribution of (M (X); g( ; O(X) )) under
only depends on
through
M ( ).
Suppose further that the estimand h( ) is compatible with the invariance structure in the
sense that h( 1 ) = h( 2 ) implies h(g( 1 ; a)) = h(g( 2 ; a)) for all
1; 2
2
and a 2 A. As
discussed in Chapter 3 of Lehmann and Casella (1998), this induces a group g^ : H
satisfying h(g( ; a)) = g^(h( ); a) for all
2
A 7! H
and a 2 A, and it is natural for the loss
function and constraints to correspondingly satisfy
`( ; ) = `(^
g ( ; a); g( ; a)) for all
2 H,
2
and a 2 A
(14)
c( ; ) = c(^
g ( ; a); g( ; a)) for all
2 H,
2
and a 2 A:
(15)
Any loss function `( ; ) that depends on
i
only through the parameter of interest, `( ; ) =
i
` ( ; h( )) for some function ` : H 7! H, such as quadratic loss or absolute value loss,
automatically satis…es (14). Similarly, constraints of the form c( ; ) = ci ( ; h( )); such as
those arising from mean or median unbiased constraints, satisfy (15).
With these notions of invariance in place, it makes sense to impose that estimators
conform to this structure and satisfy
(g(x; a)) = g^( (x); a) for all x 2 X and a 2 A.
(16)
Equations (12) and (16) imply that any invariant estimator satis…es
(x) = g^( (M (x)); O(x)) for all x 2 X.
(17)
It is useful to think about the right-hand side of (17) as inducing the invariance property: Any function
a
: M (X ) ! H de…nes an invariant estimator (x) via (x) =
18
g^( a (M (x)); O(x)). Given that any invariant estimator satis…es (17), the set of all invariant estimators can therefore be generated by considering all (unconstrained)
a,
and setting
(x) = g^( a (M (x)); O(x)).
Running Example: Consider estimation of the AR(1) coe¢ cient after observing yt =
my + ut , and ut = ut
asymptotics
1
+ "t , where "t
=T for
= 1
iidN (0; 1) and u0 = 0. Under local-to-unity
0 and my = my;T =
T 1=2 for
2 R, the limiting
problem involves the observation X on the space of continuous functions on the unit interval,
Rs
where X(s) = + J(s) and J(s) = 0 e (s r) dW (r) is an Ornstein-Uhlenbeck process with
J(0) = 0. This observation is indexed by
= ( ; ), and the parameter of interest is
= h( ). The problem is invariant to transformations g(x; a) = x + a with a 2 A = R,
and corresponding group g(( ; ); a) = ( ;
+ a). One choice for maximal invariants is
M (x) = g(x; x(0)) and M (( ; )) = g(( ; );
) = ( ; 0). The Lemma asserts that
M (X) = X
X(0) and g( ; O(X) ) = ( ;
X(0)) have a joint distribution that only
depends on
through . The induced group g^ is given by g^( ; a) =
for all a 2 A.
Under (14), the loss must not depend on the location parameter . Invariant estimators
are numerically invariant to translation shifts of X, and can all be written in the form
(x) = g^( a (x
x(0)); x(0)) =
a (x
x(0)) for some function
a
that maps paths y on the
unit interval with y(0) = 0 to H. N
Now under these assumptions, we can write the risk of any invariant estimator as
r( ; ) = E [`( (X); )]
= E [`
(M (X)); g( ; O(X) ) ]
= EM ( ) [`
(by (12) and (14) with a = O(X) )
(M (X)); g(M ( ); O(X) ) ]
= EM ( ) [EM ( ) [`
(by Lemma 1)
(M (X)); g(M ( ); O(X) ) jM (X)]]
and similarly
b( ; ) = E [c( (X); )]
= EM ( ) [EM ( ) [c
Now set
and
= M( ) 2
(M (X)); g(M ( ); O(X) ) jM (X)]]:
2 H = h(
= M ( ); h( ) =
), x = M (x) 2 X = M (X )
` (x ;
) = E [`
(X ); g( ; O(X) ) jX = x ]
(18)
c ( (x );
) = E [c
(X ); g( ; O(X) ) jX = x ].
(19)
19
Then the starred problem has exactly the same structure as the problem considered in
Sections 2 and 3, and the same solution techniques can be applied to identify a nearly
weighted average risk minimal estimator ^ : X 7! H (with the weighting function a
nonnegative measure on
). This solution is then extended to the domain of the original
sample space X via ^(x) = g^(^ (M (x)); O(x)) from (17), and the near optimality of ^
implies the corresponding near optimality of ^ in the class of all invariant estimators.
Running Example: Since
`( (X );
= h( ), and g( ; a) does not a¤ect h( ), `( (X ); g( ; O(X) )) =
) and c( (X ); g( ; O(X) )) = c( (X );
to estimating
). Thus the starred problem amounts
from the observation X = M (X) = X
X(0) (whose distribution does
not depend on ). Since X is equal to the Ornstein-Uhlenbeck process J with J(0) = 0,
the starred problem is exactly what was previously solved in the running example. So in
y
particular, the solutions ^ and ^ , analyzed in Figures 2 and 3, now applied to X , are also
U
(nearly) weighted risk minimizing unbiased translation invariant estimators in the problem
of observing X =
+ J.
A di¤erent problem arises, however, in the empirical more relevant case with the initial
condition drawn from the stationary distribution. With yt = my + ut ; ut = ut
iidN (0; 1) and u0
N (0; (1
limiting problem under
of the process X(s) =
independent of J for
2
) 1 ) independent of f"t g for
+ "t , "t
< 1 (and zero otherwise), the
1=2
asymptotics is the observation
p
+ J S (s), where J S (s) = Ze s = 2 + J(s) and Z
N (0; 1) is
=1
=T and my = my;T = T
1
> 0 (and J S = J otherwise). For
is then given by (cf. Elliott (1999))
"
r
2
1
f (x ) =
exp
(x (1)2
2
2+
1)
1 2
2
Z
0
0, the density of X = X X(0)
1
2
x (s) ds +
1
2
(
R1
0
#
x (s)ds + x (1))2
:
2+
(20)
Analogous to the discussion below (4), this asymptotic problem can be motivated more
generally under suitable assumptions about the underlying processes yt :
The algorithms discussed in Section 3 can now be applied to determine nearly WAR
minimizing invariant unbiased estimators in the problem of observing X with density (20).
p
We use the normalization function n( ) = 2 + 20, and make the same choices for Fn and
Gi on
as in the problem with zero initial condition. Taking the local-to-unity limit of the
approximately mean unbiased Orcutt and Winokur (1969) estimator of
in a regression that
includes a constant suggests OLS 4 to be nearly mean unbiased for large ; where now
R1
R1
1
2
2
(x
(1)
1)=
x
(s)
ds
with
x
(s)
=
x
(s)
x (l)dl. Similarly, Phillips’
OLS (x ) =
2
0
0
20
Figure 4: Mean Bias and MSE in Local-to-Unity AR(1) with Unknown Mean
(1975) results suggest
OLS (x
)>
OLS
3 to be nearly median unbiased. We switch to these estimators if
= 60, set "B = 0:01 and "R = 0:01 for the nearly median unbiased estimator,
and "R = 0:02 for the mean unbiased estimator.
Figures 4 and 5 show the normalized bias and risk of the resulting nearly weighted risk
minimizing unbiased invariant estimators. We also plot the performance of the analogous
set of comparisons as in Figures 2 and 3. The mean and median bias of the OLS estimator
is now even larger, and the (nearly) unbiased estimators have substantially lower normalized
risk. The nearly WAR minimizing mean unbiased estimator
still has risk only about 10%
above the (lower bound) on the envelope. In contrast to the case considered in Figure 3, the
WAR minimizing median unbiased estimator now has perceptibly lower risk than the OLS
based median unbiased estimator for small , but the gains are still fairly moderate.
Figure 6 compares the performance of the nearly mean unbiased estimator with some previously suggested alternative estimators: The local-to-unity limit
OLS
4 of the analytically
bias corrected estimator by Orcutt and Winokur (1969), the weighted symmetric estimator
analyzed by Pantula, Gonzalez-Farias, and Fuller (1994) and Park and Fuller (1995), and the
MLE
f (x ) based on the maximal invariant likelihood (20) (called
“restricted” MLE by Cheang and Reinsel (2000)). In contrast to ^ , all of these previously
M LE (x
) = arg max
0
suggested estimators have non-negligible mean biases for values of
smallest mean bias at
close to zero, with the
= 0 of nearly a third of the root mean squared error. N
21
Figure 5: Median and Absolute Loss in Local-to-Unity AR(1) with Unknown Mean
Figure 6: Mean Bias and MSE in Local-to-Unity AR(1) with Unknown Mean
22
5
Further Applications
5.1
Degree of Parameter Time Variation
Consider the canonical “local level”model in the sense of Harvey (1989),
t
X
yt = m +
"s + ut , ("t ; ut )
(21)
iidN (0; I2 )
s=1
with only yt observed. The parameter of interest is , the degree of time variation of the
Pt
1=2
‘local mean’ m +
and = T 1 asymptotics, we have the
s=1 "s : Under m = T
functional convergence of the partial sums
XT (s) = T
1=2
bsT c
X
t=1
yt ) X(s) = s +
Z
s
(22)
W" (l)dl + Wu (s)
0
with W" and Wu two independent standard Wiener processes. The limit problem is the
estimation of
from the observation X, whose distribution depends on
= ( ; ) 2 [0; 1)
R: This problem is invariant to transformations g(x; a) = x + a s with a 2 A = R,
and corresponding group g(( ; ); a) = ( ;
+ a). One choice for maximal invariants is
M (x) = g(x; x(1)) and M (( ; )) = g(( ; );
) = ( ; 0). The results of Section 4 imply
that all invariant estimators can be written as functions of X = M (X), so that
Z s
Z 1
X (s) =
W" (l)dl + Wu (s) s
W" (l)dl + Wu (1)
0
(23)
0
which does not depend on any nuisance parameters. The distribution (23) arises more
generally as the limit of the suitably scaled partial sum process of the scores, evaluated at
the maximum likelihood estimator, of well-behaved time series models with a time varying
parameter; see Stock and Watson (1998), Elliott and Müller (2006a) and Müller and Petalas
(2010). Lemma 2 and Theorem 3 of Elliott and Müller (2006a) imply that the density of X
with respect to the measure of a Brownian Bridge W (s)
f (x ) =
r
2 e
1 e
where x~ (s) = x (s)
2
x~ (1)2
exp
Rs
0
e
(s l)
2
R1
0
x~ (s)2 ds
2
1 e
sW (1) is given by
2
e x~ (1) +
+ x~ (1) +
x (l)dl.
23
R1
0
e
R1
0
s
2
x~ (s)ds
2
x~ (s)ds
As discussed by Stock (1994), the local level model (21) is intimately linked to the
MA(1) model
yt =
"t +
ut . Under small time variation asymptotics
is nearly noninvertible with local-to-unity MA(1) coe¢ cient
XT (s) = XT (s)
=1
yt gTt=2
sXT (1) and the …rst di¤erences f
= =T ,
=T + o(T
1
yt
). Both
form small sample maximal
invariants to translations of yt , so that they contain the same information. This suggests
that X is also the limit problem (in the sense of LeCam) of a stationary Gaussian local-tounity MA(1) model without a constant, and it follows from the development in Elliott and
Müller (2006a) that this is indeed the case.
It has long been recognized that the maximum likelihood estimator of the MA(1) coe¢ cient exhibits non-Gaussian limit behavior under non-invertibility; see Stock (1994) for
a historical account and references. In particular, the MLE ^ su¤ers from the so-called
pile-up problem P (^ = 1j = 1) > 0; and Sargan and Bhargava (1983) derive the limiting
probability to be 0.657. The full limiting distribution of the MLE does not seem to have
been previously derived. The above noted equivalence leads us to conjecture that under
=T asymptotics,
=1
T (1
^) )
M LE (X
(24)
) = argmax f (X ):
A numerical calculation based on (23) con…rms that under
= 0, P (
M LE (X
) = 0) = 0:657,
but we leave a formal proof of the convergence in (24) to future research.
In any event, with P (
M LE (X
median unbiased estimator of
) = 0) > 1=2 under
2 [0; 1) on
M LE (X
= 0, it is not possible to base a
), as the median function m
M LE
is
not one-to-one for small values of . Stock and Watson (1998) base their exactly median
R1
unbiased estimator on the Nyblom (1989) statistic 0 X (s)2 ds:
We employ the algorithm discussed in Section 3.2 to derive an alternative median unbi-
ased estimator that comes close to minimizing weighted average risk under absolute value
loss.
Taking the local-to-unity limit of Tanaka’s (1984) expansions suggests
S (x
) =
) + 1 to be approximately median unbiased and normal with variance 2 for large .
p
+ 6 (the o¤set 6 ensures that normalized
We thus consider risk normalized by n( ) =
M LE (x
risk is well de…ned also for
= 0). We impose switching to
S
whenever
M LE (x
)>
= 10,
set Fn to be uniform on [0; 30], choose the Lebesgue density of Gi to be proportional to the
ith basis spline on the knots f0; 0:22 ; : : : ; 82 g, and set the inverse median function m^ 1 to the
identity for arguments larger than 50 (the median bias of
S
for
^y
> 50 is indistinguishable
from zero with a Monte Carlo standard error of approximately 0.2%). With "R = 0:01, the
24
Figure 7: Median Bias and Absolute Loss Performance in Local-to-Noninvertible MA(1)
y
algorithm then successfully delivers a nearly weighted risk minimizing estimator ^U that is
exactly median unbiased for
50, and very nearly median unbiased for
> 50.
Figure 7 displays its median bias and normalized risk, along with Stock and Watson’s
(1998) median unbiased estimator and the weighted risk minimizing estimator without any
y
bias constraints. Our estimator ^U is seen to have very substantially lower risk than the
previously suggested median unbiased estimator by Stock and Watson (1998) for all but
very small values of . For instance, under
= 15 (a degree of time variation that is
detected approximately 75% of the time by a 5% level Nyblom (1989) test of parameter
constancy), the mean absolute loss of the Stock and Watson (1998) estimator is about twice
as large.
5.2
Predictive Regression
A stylized small sample version of the predictive regression problem with a weakly exogenous
and persistent regressor zt is given by
yt = my + bzt
1
+ r"zt +
p
1
r2 "yt
zt = mz + ut
ut =
ut
1
+ "zt u0
25
N (0; (1
2
) 1)
with ("zt ; "yt )0
iidN (0; I2 ) independent of u0 ,
1 < r < 1 known, and only fyt ; zt 1 gTt=1
are observed. The parameter of interest is the regression coe¢ cient b. Other formulations of
the predictive regression problem di¤er by not allowing an unknown mean for the regressor,
or by di¤erent assumptions about the distribution of u0 .
p
p
=T , b = =T , my = y = T and mz = z T , we obtain
!
X1 ( )
) X( ) =
T 1=2 z[ T ]
X2 ( )
!
!
p
Rs
2 W (s)
s
+
X
(l)dl
+
rW
(s)
+
1
r
X1 (s)
2
z
y
y
0
Rs
=
(s l)
s pZ
dWz (l)
+ 0e
X2 (s)
z +e
2
Under asymptotics with
P Tc !
T 1=2 bt=1
yt
with Z
=1
N (0; 1) independent of the standard Wiener processes Wy and Wz . The distri-
bution of the limit process X depends on the unknown
and
= (
y;
z;
; ) 2 R3
(0; 1),
is the parameter of interest. The problem is seen to be invariant to the group of
transformations indexed by a = (ay ; az ; a ) 2 R3
X1 ( ) + ay +a
g(X; a) =
g( ; a) =
X2 (s)ds
0
X2 ( ) + az
!
+ (ay ; az ; a ; 0)
and one choice for maximal invariants is
!
X1 ( )
=
M (X) = X =
X2 ( )
M( ) =
R
X1 ( )
X1 (1)
X2 ( )
= (0; 0; 0; )
!
R
^OLS (X) 0 X2 (s)ds
R1
X2 (s)ds
0
R1
R1
where ^OLS (X) = 0 X2 (s)dX1 (s)= 0 X2 (s)2 ds. With g^( ; a) = + a and O(X) =
R1
(X1 (1); 0 X2 (s)ds; ^OLS (X); 0), (17) implies that all invariant estimators can be written
in the form
^OLS (x) +
a (x
),
(25)
that is, they are equal to the OLS estimator ^OLS (x), plus a “correction term” that is a
function of the maximal invariant X .
The implications of imposing invariance are quite powerful in this example: the problem
of determining the nearly weighted average risk minimizing unbiased estimator relative to
(18) and (19) is only indexed by
, that is by the mean reversion parameter . Invariance
26
thus also e¤ectively eliminates the parameter of interest
from the problem. Solving for `
and c in (18) and (19) requires the computation of the conditional expectation of the loss
function and constraint evaluated at g(M ( ); O(X) ) given X under
depending on
through , this requires derivation of the conditional distribution of ^OLS (X)
given X . A calculation shows that under
equal to
. With ` and c only
(that is, with the true value of
=(
y;
z;
; )
= (0; 0; 0; ))
R1
X (s)dX2 (s)
1 r2
r 0R 1 2
+r ; R1
X2 (s)2 ds
X2 (s)2 ds
0
0
N
^OLS (X)jX
!
:
(26)
X2 (0) follows the same distribution as the process considered in
Furthermore, since X2 ( )
the running example at the end of Section 4, the density of X2 follows again readily from
Elliott (1998). Finally, the conditional density of X1 given X2 (relative to the measure of
a Brownian Bridge) does not depend on
minimizer
, so it plays no role in the determination of the
(x ) of the integrand in the Lagrangian (7). Appendix C.3 provides further
details on these derivations.
Equation (26) implies that under θ*, the unconditional expectation of β̂_OLS(X) is equal to −r times the bias of the OLS estimator γ̂_OLS = −∫_0^1 X2*(s)dX2(s) / ∫_0^1 X2*(s)²ds of the mean reversion parameter γ. This tight relationship between the bias in the OLS coefficient estimator, and the OLS bias in estimating the autoregressive parameter, was already noted by Stambaugh (1986). What is more, let γ̃ be an alternative estimator of the mean reverting parameter based on X2*, and set δ_a(x*) in (25) equal to r(γ̂_OLS − γ̃). Then the unconditional bias of the resulting estimator of β is equal to −r times the mean bias of γ̃, and its mean squared error is equal to r² times the mean squared error of γ̃ plus E[(1 − r²)/∫_0^1 X2*(s)²ds]. Thus, setting δ_a(x*) equal to r(γ̂_OLS − γ̂) for γ̂ the nearly WAR minimizing mean unbiased estimator derived at the end of Section 4 yields a nearly WAR minimizing mean unbiased estimator of β in the predictive regression problem. Correspondingly, the Orcutt and Winokur (1969)-based bias correction in the estimation of γ suggested by Stambaugh (1986) and the restricted maximum likelihood estimator suggested by Chen and Deo (2009) have mean biases that are equal to −r times the mean biases reported in Figure 6. We omit a corresponding plot of bias and risk for brevity, but note that just as in Figure 4, a comparison with the (lower bound on the) risk envelope shows that our estimator comes quite close to being uniformly minimum variance among all (nearly) unbiased estimators.
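To make the mapping from the AR(1) estimation problem to the predictive regression concrete, the following sketch (not the paper's code; the discretization, variable names, and the placeholder gamma_hat_fn standing in for the Section 4 estimator are our own assumptions) assembles the estimator of β from a discretized path according to (25) with δ_a(x*) = r(γ̂_OLS − γ̂):

```python
import numpy as np

# Illustrative sketch: given a discretized regressor path X2 on [0,1] and the cumulated
# process X1, form the OLS quantities of Section 5.2 and map an (approximately unbiased)
# estimator gamma_hat of the mean reversion parameter into the predictive regression
# estimator beta_ols + r*(gamma_ols - gamma_hat). `gamma_hat_fn` is a hypothetical
# placeholder for the nearly WAR minimizing unbiased estimator of Section 4.

def predictive_regression_estimator(X1, X2, r, gamma_hat_fn):
    n = len(X2) - 1
    ds = 1.0 / n
    X2_star = X2 - np.mean(X2)                       # demeaned regressor path
    dX1 = np.diff(X1)
    dX2 = np.diff(X2)
    denom = np.sum(X2_star[:-1] ** 2) * ds           # integral of X2*(s)^2 ds
    beta_ols = np.sum(X2_star[:-1] * dX1) / denom    # OLS coefficient estimator
    gamma_ols = -np.sum(X2_star[:-1] * dX2) / denom  # OLS mean reversion estimator
    gamma_hat = gamma_hat_fn(X2)                     # alternative estimator of gamma
    return beta_ols + r * (gamma_ols - gamma_hat)    # correction term r*(gamma_ols - gamma_hat)
```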
The median bias of β̂_OLS + r(γ̂_OLS − γ̃), however, is not simply equal to −r times the median bias of γ̃. One also cannot simply invert the median function of an estimator of β due to the presence of the nuisance parameter γ. We thus apply our general algorithm to obtain a nearly WAR minimizing nearly median unbiased estimator. We use n(γ) = √(2γ + 20), ε_B = ε_R = 0.01, set F_n uniform on [0, 80] and let G_i be point masses on γ ∈ {0, 0.2², ..., 10²} (the Gaussian noise in (26) induces smooth solutions even for point masses, in contrast to direct median unbiased estimation of the mean reverting parameter). For γ̂_OLS > 60 we switch to the estimator with δ_a(x*) = 3r, which is nearly median unbiased for large γ and |r| close to 1.

Figure 8: Median Bias and Absolute Loss Performance in Predictive Regression

Figure 8 reports the results for r = −0.95, along with the same analogous set of comparisons already reported in Figure 3.²

²We could not replicate the favorable results reported by Eliasz (2005) for his conditionally median unbiased estimator, derived for a closely related model with zero initial condition on z_0.
5.3 Quantile Long-range Forecasts from a Local-To-Unity Autoregressive Process
The final example again involves a local-to-unity AR(1) process, but now we are interested in constructing long-run quantile forecasts. To be specific, assume y_t = m + u_t, u_t = ρu_{t−1} + ε_t, ε_t ∼ iid N(0, 1) and u_0 ∼ N(0, (1 − ρ²)⁻¹) independent of ε_t. We observe {y_t}_{t=1}^T, and given some 0 < α < 1 and horizon h, seek to estimate the conditional α quantile of y_{T+h}. Under asymptotics with m = T^{1/2}μ, ρ = 1 − γ/T and h = ⌊rT⌋, the limiting problem involves the observation of the stationary Ornstein-Uhlenbeck process X on the unit interval,

X(s) = μ + Z e^{−γs}/√(2γ) + ∫_0^s exp(−γ(s − l)) dW(l)
with Z ∼ N(0, 1) independent of W. The problem is indexed by the unknown parameters (μ, γ) ∈ R × (0, ∞), and the object of interest is the α quantile of X(1 + r) conditional on X. It is potentially attractive to construct estimators δ(X) that are quantile unbiased in the sense that

P_{(μ,γ)}( X(1 + r) < δ(X) ) = α  for all (μ, γ) ∈ R × (0, ∞).      (27)

This ensures that in repeated applications, X(1 + r) indeed realizes to be smaller than the estimator δ(X) of the α quantile with probability α, irrespective of the true value of the parameters governing X.
A straightforward calculation shows that X(1 + r)|X ∼ N( μ_{X(1+r)|X}, σ²_{X(1+r)|X} ) with μ_{X(1+r)|X} = μ + (X(1) − μ)e^{−γr} and σ²_{X(1+r)|X} = (1 − e^{−2γr})/(2γ), so that the conditional α quantile is equal to μ_{X(1+r)|X} + σ_{X(1+r)|X} z_α, where P(Z < z_α) = α. It is not possible to use this expression as an estimator, however, since μ_{X(1+r)|X} and σ_{X(1+r)|X} depend on the unknown parameters (μ, γ). A simple plug-in estimator is given by δ^{PI}(x) = μ̂_{X(1+r)|X} + σ̂_{X(1+r)|X} z_α, which replaces (μ, γ) by (μ̂, γ̂_OLS) = ( ∫_0^1 X(s)ds , −½((X(1) − μ̂)² − 1)/∫_0^1 (X(s) − μ̂)²ds ).
In order to cast this problem in the framework of this paper, let η₁ = (X(1) − μ)√(2γ) and η₂ = (X(1 + r) − μ_{X(1+r)|X})/σ_{X(1+r)|X}, so that η = (η₁, η₂)′ ∼ N(0, I₂). We treat η as fixed and part of the parameter, θ = (γ, μ, η). In order to recover the stochastic properties of η, we integrate over η ∼ N(0, I₂) in the weighting functions F and G_i. In this way, weighted average bias constraints over η ∼ N(0, I₂) amount to a bias constraint in the original model with η stochastic and distributed N(0, I₂).

Now in this notation, X(1 + r) becomes non-stochastic and equals

h(θ) = μ + ( e^{−γr} η₁ + √(1 − e^{−2γr}) η₂ ) / √(2γ)

and the density f_θ(x) depends on η₁ (but not η₂). Furthermore, the quantile unbiased constraint (27) is equivalent to a weighted average bias constraint over η ∼ N(0, I₂) with c(δ, θ) equal to c(δ, θ) = 1[h(θ) < δ] − α. We use the usual quantile check function to measure the loss of estimation errors, ℓ(δ, θ) = |h(θ) − δ| · |α − 1[h(θ) < δ]|.
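The reparametrized estimand and the two functions just defined are simple to evaluate; the following sketch (parameter ordering and names are our own) spells them out:

```python
import numpy as np

# Sketch of the reparametrized estimand h(theta) and the loss/constraint functions above;
# theta is represented by (gamma, mu, eta1, eta2), alpha is the quantile level, r the horizon.

def h(gamma, mu, eta1, eta2, r):
    return mu + (np.exp(-gamma * r) * eta1
                 + np.sqrt(1.0 - np.exp(-2.0 * gamma * r)) * eta2) / np.sqrt(2.0 * gamma)

def check_loss(delta, h_val, alpha):
    u = h_val - delta
    return abs(u) * abs(alpha - (h_val < delta))   # |h - delta| * |alpha - 1[h < delta]|

def quantile_bias(delta, h_val, alpha):
    return float(h_val < delta) - alpha            # c(delta, theta) = 1[h < delta] - alpha
```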
Figure 9: 5% Quantile Bias and Normalized Expected 5% Quantile Check Function Loss of
Forecasts in Local-to-Unity AR(1) for 20% Horizon
With these choices, the problem is invariant to the group of translations g(x, a) = x + a and g(θ, a) = (γ, μ + a, η) for a ∈ R, with ĝ(q, a) = q + a for the quantile q of interest. One set of maximal invariants is given by X* = M(X) = X(·) − X(1) and θ* = M(θ) = (γ, 0, η), and the density of X* relative to the measure of W(·) − W(1) is given by (cf. equation (25) in Elliott and Müller (2006b))

f_θ(x*) = √(2/(2 + γ)) exp[ −½γ(x*(0)² − 1) − ½γ² ∫_0^1 x*(s)²ds + γ( γ∫_0^1 x*(s)ds + x*(0) )² / (2(2 + γ)) ].      (28)

A calculation shows that with γ fixed and η ∼ N(0, I₂), the distribution of h(θ) − X(1) conditional on X* = x* is N( μ_{h|X}, σ²_{h|X} ) with μ_{h|X} = ( γ∫_0^1 x*(s)ds + x*(0) )(1 − e^{−γr})/(2 + γ) and σ²_{h|X} = ( 1 − e^{−2γr} + 2(1 − e^{−γr})²/(2 + γ) )/(2γ). The weighted averages of ℓ*(δ, θ) and c*(δ, θ) in (18) and (19) with weighting function η ∼ N(0, I₂) then become

∫ f_θ(x*) c*(δ, θ) φ_{I₂}(η) dη / ∫ f_θ(x*) φ_{I₂}(η) dη = Φ( (δ − μ_{h|X})/σ_{h|X} ) − α

∫ f_θ(x*) ℓ*(δ, θ) φ_{I₂}(η) dη / ∫ f_θ(x*) φ_{I₂}(η) dη = (δ − μ_{h|X})( Φ( (δ − μ_{h|X})/σ_{h|X} ) − α ) + σ_{h|X} φ( (δ − μ_{h|X})/σ_{h|X} )

where Φ, φ and φ_{I₂} are the cdf and pdfs of a scalar and bivariate standard normal, respectively.
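Since the weighted average bias and risk have the closed forms above, they can be evaluated without simulation; the sketch below (ours, using scipy for the normal cdf and pdf, with a Monte Carlo cross-check of the loss formula) illustrates this:

```python
import numpy as np
from scipy.stats import norm

# Sketch of the closed-form conditional (weighted average) bias and check-function risk
# used in the quantile forecasting example, given mu_hX = mu_{h|X}, sd_hX = sigma_{h|X}.

def weighted_bias(delta, mu_hX, sd_hX, alpha):
    return norm.cdf((delta - mu_hX) / sd_hX) - alpha

def weighted_check_loss(delta, mu_hX, sd_hX, alpha):
    z = (delta - mu_hX) / sd_hX
    return (delta - mu_hX) * (norm.cdf(z) - alpha) + sd_hX * norm.pdf(z)

if __name__ == "__main__":
    # Monte Carlo sanity check of the loss formula for one configuration
    rng = np.random.default_rng(0)
    mu_hX, sd_hX, alpha, delta = 0.3, 0.8, 0.05, 0.1
    h = rng.normal(mu_hX, sd_hX, size=1_000_000)
    mc = np.mean(np.abs(h - delta) * np.abs(alpha - (h < delta)))
    print(mc, weighted_check_loss(delta, mu_hX, sd_hX, alpha))
```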
With these expressions at hand, it becomes straightforward to implement the algorithm of Section 2.3. We normalize risk by n(θ) = (2γ + 10)^{−1/2}, set ε_B = 0.002 and ε_R = 0.01, and, in addition to the integration over η ∼ N(0, I₂), choose γ uniform on [0, 80] in F_n, and G_i equal to point masses on γ ∈ {0, 0.2², ..., 10²}. For γ̂_OLS > 60, we switch to an estimator of the unconditional quantile δ^S(x*) = μ̂ + z̃_α/√(2γ̂), with μ̂ = ∫_0^1 x(s)ds, γ̂ = γ̂_OLS − 3 and z̃_α = z_α√( (2γ̂ + 4)/(2γ̂ − z_α²) ) (the use of the approximately median unbiased estimator γ̂ and the replacement of z_α by z̃_α help correct for the estimation error in (μ̂, γ̂); see Appendix C.4 for details).
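A minimal sketch of the switching forecast δ^S as reconstructed above (the constants and the form of z̃_α follow this reconstruction and should be checked against the original; not the authors' code):

```python
import numpy as np
from scipy.stats import norm

# Sketch of the "large gamma" switching forecast: gamma_ols and the path X are assumed given.

def switching_quantile_forecast(X, gamma_ols, alpha):
    mu_hat = np.mean(X)                     # estimate of the unconditional mean
    gamma_hat = gamma_ols - 3.0             # approximately median unbiased for large gamma
    z = norm.ppf(alpha)
    z_tilde = z * np.sqrt((2.0 * gamma_hat + 4.0) / (2.0 * gamma_hat - z ** 2))
    return mu_hat + z_tilde / np.sqrt(2.0 * gamma_hat)
```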
Figure 9 plots the results for r = 0.2 and α = 0.05, along with the plug-in estimator δ^{PI} defined above, and the WAR minimizing estimator without any quantile bias constraints. The plug-in estimator is seen to very severely violate the quantile unbiased constraint (27) for small γ: instead of the nominal 5%, X(1 + r) takes on a value smaller than the quantile estimator about 16% of the time.
6 Conclusion
The literature contains numerous suggestions for estimators in non-standard problems, and
their evaluation typically considers bias and risk. For a recent example in the context of
estimating a local-to-unity AR(1) coefficient, see, for instance, Phillips (2012).
We argue here that the problem of identifying estimators with low bias and risk may
be fruitfully approached in a systematic manner. We suggest a numerical approximation
technique to the resulting programming problem, and show that it delivers very nearly
unbiased estimators with demonstrably close to minimal weighted average risk in a number
of classic time series estimation problems.
The general approach might well be useful also for cross sectional non-standard problems,
such as those arising under weak instruments asymptotics.3 In ongoing work, we consider
an estimator of a very small quantile that has the property that in repeated applications,
an independent draw from the same underlying distribution realizes to be smaller than
the estimator with probability equal to that quantile level. This is qualitatively similar to the forecasting problem
considered in Section 5.3, but concerns a limiting problem of observing a finite-dimensional
random vector with joint extreme value distribution.
In all the examples we considered in this paper, it turned out that nearly unbiased
estimators with relatively low risk exist. This may come as something of a surprise, and may
well fail to be the case in other non-standard problems. For instance, as recently highlighted
by Hirano and Porter (2012), mean unbiased estimation is not possible for the minimum of
the means in a bivariate normal problem. Unreported results show that an application of
our numerical approach to this problem yields very large lower bounds on weighted average
risk for any estimator with moderately small mean bias. The conclusion then is that even
insisting on small (but not necessarily zero) mean bias is not a fruitful constraint in that
particular problem.

³Andrews and Armstrong (2015) derive a mean unbiased estimator of the structural parameter under a known sign of the first stage in this problem.
References
Albuquerque, J. (1973): “The Barankin Bound: A Geometric Interpretation,” IEEE
Transactions on Information Theory, 19, 559–561.
Amihud, Y., and C. Hurvich (2004): “Predictive regressions: A reduced-bias estimation
method,”Journal of Financial and Quantitative Analysis, 39, 813–841.
Andrews, D. W. K. (1993): “Exactly Median-Unbiased Estimation of First Order Autoregressive/Unit Root Models,”Econometrica, 61, 139–165.
Andrews, D. W. K., and H. Chen (1994): “Approximately Median-Unbiased Estimation
of Autoregressive Models,”Journal of Business and Economic Statistics, 12, 187–204.
Andrews, D. W. K., M. J. Moreira, and J. H. Stock (2006): “Optimal Two-Sided
Invariant Similar Tests for Instrumental Variables Regression,” Econometrica, 74, 715–
752.
(2008): “Efficient Two-Sided Nonsimilar Invariant Tests in IV Regression with
Weak Instruments,”Journal of Econometrics, 146, 241–254.
Andrews, I., and T. B. Armstrong (2015): “Unbiased Instrumental Variables Estimation Under Known First-Stage Sign,”Working Paper, Harvard University.
Barankin, E. W. (1946): “Locally Best Unbiased Estimates,” Annals of Mathematical
Statistics, 20, 477–501.
Cavanagh, C. L., G. Elliott, and J. H. Stock (1995): “Inference in Models with
Nearly Integrated Regressors,”Econometric Theory, 11, 1131–1147.
Chan, N. H., and C. Z. Wei (1987): “Asymptotic Inference for Nearly Nonstationary
AR(1) Processes,”The Annals of Statistics, 15, 1050–1063.
Cheang, W.-K., and G. C. Reinsel (2000): “Bias Reduction of Autoregressive Estimates
in Time Series Regression Model through Restricted Maximum Likelihood,” Journal of
the American Statistical Association, 95, 1173–1184.
Chen, W. W., and R. S. Deo (2009): “Bias Reduction and Likelihood-Based Almost Exactly Sized Hypothesis Testing in Predictive Regressions Using the Restricted Likelihood,”
Econometric Theory, 25, 1143–1179.
Crump, R. K. (2008): “Optimal Conditional Inference in nearly-integrated autoregressive
processes,”Working paper. Federal Reserve Bank of New York.
Eliasz, P. (2005): “Optimal Median Unbiased Estimation of Coefficients on Highly Persistent Regressors,” Ph.D. Thesis, Princeton University.
Elliott, G. (1998): “The Robustness of Cointegration Methods When Regressors Almost
Have Unit Roots,”Econometrica, 66, 149–158.
(1999): “Efficient Tests for a Unit Root When the Initial Observation is Drawn
From its Unconditional Distribution,”International Economic Review, 40, 767–783.
(2006): “Forecasting with Trending Data,” in Handbook of Economic Forecasting,
Volume 1, ed. by G. Elliott, C. W. J. Granger, and A. Timmermann. North-Holland.
Elliott, G., and U. K. Müller (2006a): “Efficient Tests for General Persistent Time
Variation in Regression Coe¢ cients,”Review of Economic Studies, 73, 907–940.
(2006b): “Minimizing the Impact of the Initial Condition on Testing for Unit
Roots,”Journal of Econometrics, 135, 285–310.
Elliott, G., U. K. Müller, and M. W. Watson (2015): “Nearly Optimal Tests When
a Nuisance Parameter is Present Under the Null Hypothesis,”Econometrica, 83, 771–811.
Elliott, G., T. J. Rothenberg, and J. H. Stock (1996): “Efficient Tests for an
Autoregressive Unit Root,”Econometrica, 64, 813–836.
Glave, F. E. (1972): “A New Look at the Barankin Lower Bound,”IEEE Transactions on
Information Theory, 18, 349–356.
Gospodinov, N. (2002): “Median unbiased forecasts for highly persistent autoregressive
processes,”Journal of Econometrics, 111, 85–101.
Han, C., P. Phillips, and D. Sul (2011): “Uniform asymptotic normality in stationary
and unit root autoregression,”Econometric Theory, 27, 1117–1151.
Harvey, A. C. (1989): Forecasting, Structural Time Series Models and the Kalman Filter.
Cambridge University Press.
Hirano, K., and J. Porter (2012): “Impossibility Results for Nondifferentiable Functionals,” Econometrica, 80, 1769–1790.
Hurwicz, L. (1950): “Least Squares Bias in Time Series,” in Statistical Inference in Dynamic
Economic Models, ed. by T. Koopmans, pp. 365–383, New York. Wiley.
Jansson, M., and M. J. Moreira (2006): “Optimal Inference in Regression Models with
Nearly Integrated Regressors,”Econometrica, 74, 681–714.
Kemp, G. C. R. (1999): “The Behavior of Forecast Errors from a Nearly Integrated AR(1)
Model as Both Sample Size and Forecast Horizon Become Large,” Econometric Theory,
15(2), pp. 238–256.
Kendall, M. G. (1954): “Note on the Bias in the Estimation of Autocorrelation,” Biometrika, 41, 403–404.
Kothari, S. P., and J. Shanken (1997): “Book-to-Market, Dividend Yield, and Expected
Market Returns: A Time-Series Analysis,”Journal of Financial Economics, 18, 169–203.
Lehmann, E. L., and G. Casella (1998): Theory of Point Estimation. Springer, New
York, 2nd edn.
Lehmann, E. L., and J. P. Romano (2005): Testing Statistical Hypotheses. Springer, New
York.
MacKinnon, J. G., and A. A. Smith (1998): “Approximate Bias Correction in Econometrics,”Journal of Econometrics, 85, 205–230.
Mankiw, N. G., and M. D. Shapiro (1986): “Do we reject too often? Small Sample
Properties of Tests of Rational Expectations Models,”Economics Letters, 20, 139–145.
Marriott, F. H. C., and J. A. Pope (1954): “Bias in the Estimation of Autocorrelations,”Biometrika, 41, 393–402.
McAulay, R., and E. Hofstetter (1971): “Barankin Bounds on Parameter Estimation,”
IEEE Transactions on Information Theory, 17, 669 –676.
Mikusheva, A. (2007): “Uniform Inference in Autoregressive Models,” Econometrica, 75,
1411–1452.
Müller, U. K., and P. Petalas (2010): “Efficient Estimation of the Parameter Path in
Unstable Time Series Models,”Review of Economic Studies, 77, 1508–1539.
Nyblom, J. (1989): “Testing for the Constancy of Parameters Over Time,”Journal of the
American Statistical Association, 84, 223–230.
Orcutt, G. H., and H. S. Winokur (1969): “First Order Autoregression: Inference,
Estimation, and Prediction,”Econometrica, 37, 1–14.
Pantula, S. G., G. Gonzalez-Farias, and W. A. Fuller (1994): “A Comparison of
Unit-Root Test Criteria,”Journal of Business & Economic Statistics, 12, 449–459.
Park, H. J., and W. A. Fuller (1995): “Alternative Estimators and Unit Root Tests for
the Autoregressive Process,”Journal of Time Series Analysis, 16, 415–429.
Phillips, P. (1977): “Approximations to some finite sample distributions associated with
a first order stochastic difference equation,” Econometrica, 45, 463–486.
(1978): “Edgeworth and saddlepoint approximations in the first-order noncircular
autoregression,”Biometrika, 65, 91–108.
(1979): “The Sampling Distribution of Forecasts from a First Order Autoregression,”Journal of Econometrics, 9, 241–261.
(2012): “Folklore Theorems, Implicit Maps, and Indirect Inference,”Econometrica,
80, 425–454.
Phillips, P. C. B. (1987): “Towards a Unified Asymptotic Theory for Autoregression,”
Biometrika, 74, 535–547.
Phillips, P. C. B., and C. Han (2008): “Gaussian Inference in AR(1) Time Series with
or Without a Unit Root,”Econometric Theory, 24, 631–650.
Quenouille, M. H. (1956): “Notes on Bias in Estimation,”Biometrika, 43, 353–360.
Roy, A., and W. A. Fuller (2001): “Estimation for Autoregressive Time Series with a
Root Near 1,”Journal of Business and Economic Statistics, 19, 482–493.
Sargan, J. D., and A. Bhargava (1983): “Maximum Likelihood Estimation of Regression
Models with First Order Moving Average Errors when the Root Lies on the Unit Circle,”
Econometrica, 51, 799–820.
Sawa, T. (1978): “The exact moments of the least squares estimator for the autoregressive
model,”Journal of Econometrics, 8, 159–172.
Shaman, P., and R. A. Stine (1988): “The Bias of Autoregressive Coefficient Estimators,”
Journal of the American Statistical Association, 83, 842–848.
Stambaugh, R. F. (1986): “Bias in Regressions with Lagged Stochastic Regressors,”Working Paper, University of Chicago.
(1999): “Predictive regressions,”Journal of Financial Economics, 54, 375–421.
Stock, J. H. (1991): “Confidence Intervals for the Largest Autoregressive Root in U.S.
Macroeconomic Time Series,”Journal of Monetary Economics, 28, 435–459.
(1994): “Unit Roots, Structural Breaks and Trends,”in Handbook of Econometrics,
ed. by R. F. Engle, and D. McFadden, vol. 4, pp. 2740–2841. North Holland, New York.
(1996): “VAR, Error Correction and Pretest Forecasts at Long Horizons,” Oxford
Bulletin of Economics and Statistics, 58, 685–701.
Stock, J. H., and M. W. Watson (1998): “Median Unbiased Estimation of Coefficient
Variance in a Time-Varying Parameter Model,” Journal of the American Statistical Association, 93, 349–358.
Variance in a Time-Varying Parameter Model,” Journal of the American Statistical Association, 93, 349–358.
Tanaka, K. (1983): “Asymptotic Expansions Associated with the AR(1) Model with Unknown Mean,”Econometrica, 51, 1221–1231.
(1984): “An asymptotic expansion associated with the maximum likelihood estimators in ARMA models,”Journal of the Royal Statistical Society B, 46, 58–67.
Turner, J. (2004): “Local to Unity, Long-Horizon Forecasting Thresholds for Model selection in the AR(1),”Journal of Forecasting, 23, 513–539.
White, J. S. (1961): “Asymptotic Expansions for the Mean and Variance of the Serial
Correlation Coe¢ cient,”Biometrika, 48, 85–94.
Yu, J. (2012): “Bias in the Estimation of the Mean Reversion Parameter in Continuous
Time Models,”Journal of Econometrics, 169, 114–122.
A Proofs
The proof of Lemma 2 uses the following Lemma.
Lemma 3 For all a ∈ A and x ∈ X, O(g(x, a))⁻¹ = O(x)⁻¹ ∘ a⁻¹.

Proof: Replacing x by g(x, a) in (12) yields

g(x, a) = g(M(g(x, a)), O(g(x, a))) = g(M(x), O(g(x, a))).

Alternatively, applying a on both sides of (12) yields

g(x, a) = g(g(M(x), O(x)), a) = g(M(x), a ∘ O(x)).

Thus g(M(x), a ∘ O(x)) = g(M(x), O(g(x, a))). By the assumption that the group actions a are distinct, this implies that O(g(x, a)) = a ∘ O(x). The result now follows from (a₂ ∘ a₁)⁻¹ = a₁⁻¹ ∘ a₂⁻¹.
Proof of Lemma 2: Suppose θ₁ and θ₂ are such that M(θ₁) = M(θ₂). Then there exists a ∈ A such that θ₂ = g(θ₁, a). Thus, for an arbitrary measurable subset B,

P_{θ₂}((M(X), g(θ₂, O(X)⁻¹)) ∈ B)
= P_{g(θ₁,a)}((M(X), g(θ₂, O(X)⁻¹)) ∈ B)
= P_{θ₁}((M(g(X, a)), g(θ₂, O(g(X, a))⁻¹)) ∈ B)   (invariance of problem)
= P_{θ₁}((M(X), g(θ₂, O(X)⁻¹ ∘ a⁻¹)) ∈ B)   (property of M and Lemma 3)
= P_{θ₁}((M(X), g(g(θ₁, a), O(X)⁻¹ ∘ a⁻¹)) ∈ B)   (by θ₂ = g(θ₁, a))
= P_{θ₁}((M(X), g(θ₁, O(X)⁻¹)) ∈ B).
B Details on Algorithm of Section 2.3
The basic idea of the algorithm is to start with some guess for the Lagrange multipliers λ, compute the biases B(δ_λ; G_i), and adjust (λ_i^l, λ_i^u) iteratively as a function of B(δ_λ; G_i), i = 1, ..., m.

To facilitate the repeated computation of B(δ; G_i), i = 1, ..., m, it is useful to employ an importance sampling estimator. We use the proposal density f_p, where f_p is the mixture density f_p(x) = (24 + m_p − 2)⁻¹(12f_{θ_{p,1}}(x) + 12f_{θ_{p,m_p}}(x) + Σ_{i=2}^{m_p−1} f_{θ_{p,i}}(x)) for some application-specific grid of m_p parameter values {θ_{p,i}}_{i=1}^{m_p} (that is, under f_p, X is generated by first drawing an index J̃ uniformly from {−10, −9, ..., m_p + 11}, and then drawing X from f_{θ_{p,J}}, where J = min(max(J̃, 1), m_p)). The overweighting of the boundary values θ_{p,1} and θ_{p,m_p} by a factor of 12 counteracts the lack of additional importance sampling points to one side, leading to approximately constant importance sampling Monte Carlo standard errors in problems where θ is one-dimensional, as is the case in all our applications once invariance is imposed.
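As an illustration, drawing from this mixture proposal can be sketched as follows (ours; `draw_from_model` is a hypothetical simulator of X given a parameter value):

```python
import numpy as np

# Sketch of drawing from the mixture proposal f_p: draw an index uniformly from
# {-10, ..., m_p + 11}, clamp it to {1, ..., m_p}, and draw X at the corresponding grid point.
# The clamping gives the two boundary grid points weight 12 and interior points weight 1.

def draw_from_proposal(theta_grid, draw_from_model, rng):
    m_p = len(theta_grid)
    j = int(rng.integers(-10, m_p + 12))     # uniform on {-10, ..., m_p + 11}
    j = min(max(j, 1), m_p)                  # clamp to {1, ..., m_p}
    theta = theta_grid[j - 1]
    return theta, draw_from_model(theta, rng)
```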
Let X_l, l = 1, ..., N be i.i.d. draws from f_p. For a given estimator δ, we approximate B(δ; G_i) by

B(δ; G_i) = E_{f_p}[ ∫ c(δ(X), θ) (f_θ(X)/f_p(X)) dG_i(θ) ] ≈ N⁻¹ Σ_{l=1}^N ∫ c(δ(X_l), θ) (f_θ(X_l)/f_p(X_l)) dG_i(θ).
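A schematic implementation of this importance sampling approximation (ours; G_i is represented by a finite set of support points and weights, and all function arguments are placeholders) could look as follows:

```python
import numpy as np

# Illustrative importance sampling approximation of B(delta; G_i). Assumptions: X_draws are
# N draws from the proposal f_p; f(theta, x) evaluates the model density; fp(x) the proposal
# density; G_i is represented by points theta_points with weights theta_weights; c(d, theta)
# is the bias function and delta(x) the estimator.

def bias_is_estimate(X_draws, delta, c, f, fp, theta_points, theta_weights):
    total = 0.0
    for x in X_draws:
        d = delta(x)
        integrand = sum(w * c(d, th) * f(th, x)
                        for th, w in zip(theta_points, theta_weights))
        total += integrand / fp(x)
    return total / len(X_draws)
```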
We further approximate ∫ c(δ, θ) f_θ(X_l) dG_i(θ) for arbitrary δ by quadratic interpolation based on the closest three points in the grid H_l = {δ_{l,1}, ..., δ_{l,M}}, δ_{l,i} < δ_{l,j} for i < j (we use the same grid for all i = 1, ..., m). The grid is chosen large enough so that δ(X_l) is never artificially constrained for any value of λ considered by the algorithm. Similarly, R(δ; F) is approximated by

R(δ; F) ≈ N⁻¹ Σ_{l=1}^N ∫ ℓ(δ(X_l), θ) f_θ(X_l) dF(θ) / f_p(X_l),

and ∫ ℓ(δ, θ) f_θ(X_l) dF(θ) for arbitrary δ is approximated with the analogous quadratic interpolation scheme.
Furthermore, for given λ, the minimizer δ_λ(X_l) of the function L_l : R → R,

L_l(δ) = ∫ ℓ(δ, θ) f_θ(X_l) dF(θ) + Σ_{i=1}^m (λ_i^u − λ_i^l) ∫ c(δ, θ) f_θ(X_l) dG_i(θ),

is approximated by first obtaining the global minimum over δ ∈ H_l, followed by a quadratic approximation of L_l(δ) around the minimizing δ_{l,j} ∈ H_l based on the three values of L_l(δ) for δ ∈ {δ_{l,j−1}, δ_{l,j}, δ_{l,j+1}} ⊂ H_l.
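A small sketch (ours) of this grid search plus local quadratic refinement:

```python
import numpy as np

# Evaluate L_l on the grid H_l, take the grid minimizer, and refine it by fitting a parabola
# through the three neighboring grid values (falling back to the grid point at the boundary).

def minimize_on_grid(grid, values):
    grid = np.asarray(grid, dtype=float)
    values = np.asarray(values, dtype=float)
    j = int(np.argmin(values))
    if j == 0 or j == len(grid) - 1:
        return grid[j]                          # no interior neighbors: keep grid point
    x, y = grid[j - 1:j + 2], values[j - 1:j + 2]
    a, b, _ = np.polyfit(x, y, 2)               # quadratic fit through the three points
    return -b / (2 * a) if a > 0 else grid[j]   # vertex of the parabola if it is a minimum
```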
For given ", the approximate solution to (1) subject to (5) is now determined as follows:
1. Generate i.i.d. draws X_l, l = 1, ..., N with density f_p.

2. Compute and store ∫ c(δ, θ) f_θ(X_l) dG_i(θ) and ∫ ℓ(δ, θ) f_θ(X_l) dF(θ) for δ ∈ H_l, i = 1, ..., m, l = 1, ..., N.

3. Initialize λ_i^{u,(0)} = λ_i^{l,(0)} = 0.0001 and ω_i^{u,(0)} = ω_i^{l,(0)} = 0.05, i = 1, ..., m.

4. For k = 0, ..., K − 1:

(a) Compute δ_{λ^{(k)}}(X_l) as described above, l = 1, ..., N.

(b) Compute B(δ_{λ^{(k)}}; G_i) as described above, i = 1, ..., m.

(c) Compute λ^{(k+1)} from λ^{(k)} via λ_i^{u,(k+1)} = λ_i^{u,(k)} exp(ω_i^{u,(k)}(B(δ_{λ^{(k)}}; G_i) − ε)) and λ_i^{l,(k+1)} = λ_i^{l,(k)} exp(−ω_i^{l,(k)}(B(δ_{λ^{(k)}}; G_i) + ε)), i = 1, ..., m.

(d) Compute ω_i^{u,(k+1)} via ω_i^{u,(k+1)} = max(0.01, 0.5ω_i^{u,(k)}) if (B(δ_{λ^{(k+1)}}; G_i) − ε)(B(δ_{λ^{(k)}}; G_i) − ε) < 0, and ω_i^{u,(k+1)} = min(100, 1.03ω_i^{u,(k)}) otherwise, and similarly ω_i^{l,(k+1)} = max(0.01, 0.5ω_i^{l,(k)}) if (B(δ_{λ^{(k+1)}}; G_i) + ε)(B(δ_{λ^{(k)}}; G_i) + ε) < 0, and ω_i^{l,(k+1)} = min(100, 1.03ω_i^{l,(k)}) otherwise.
The idea of Step 4.d is to slowly increase the speed of the change in the Lagrange multipliers
as long as the sign of the violation remains the same in consecutive iterations, and to decrease it
otherwise.
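The updates in Steps 4.c and 4.d can be sketched, for a single constraint i, as follows (illustrative only; the constants are those given above):

```python
import numpy as np

# Sketch of the multiplier updates in Steps 4.c-4.d for one constraint i.
# bias_old and bias_new are B(delta_{lambda^(k)}; G_i) and B(delta_{lambda^(k+1)}; G_i),
# eps is the bias tolerance; lam_u, lam_l are the multipliers, om_u, om_l the step sizes.

def update_multipliers(lam_u, lam_l, om_u, om_l, bias_old, bias_new, eps):
    # Step 4.c: multiplicative update of the Lagrange multipliers
    lam_u_next = lam_u * np.exp(om_u * (bias_old - eps))
    lam_l_next = lam_l * np.exp(-om_l * (bias_old + eps))
    # Step 4.d: halve the step size if the sign of the violation flipped between
    # consecutive iterations, otherwise increase it slowly; keep it in [0.01, 100]
    if (bias_new - eps) * (bias_old - eps) < 0:
        om_u_next = max(0.01, 0.5 * om_u)
    else:
        om_u_next = min(100.0, 1.03 * om_u)
    if (bias_new + eps) * (bias_old + eps) < 0:
        om_l_next = max(0.01, 0.5 * om_l)
    else:
        om_l_next = min(100.0, 1.03 * om_l)
    return lam_u_next, lam_l_next, om_u_next, om_l_next
```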
In the context of Step 2 of the algorithm in Section 2.3, we set λ̂ = λ^{(K)} for K = 300.
For Step 3 of the algorithm in Section 2.3, we first initialize and apply the above iterations 300 times for ε = ε_B. We then continue to iterate the above algorithm for another 400 iterations, but every 100 iterations increase or decrease ε via a simple bisection method based on whether or not R(δ_{λ^{(k)}}; F) < (1 + ε_R)R̄, where R̄ denotes the lower bound on the weighted average risk. After these 4 bisections for ε, we continue to iterate another 300 times, for a total of K = 1000 iterations. The check of whether the resulting δ̂ = δ_{λ^{(K)}} satisfies the uniform bias constraint (3) is performed by directly computing its bias b(δ̂; θ) via the importance sampling approximation

b(δ; θ) ≈ N⁻¹ Σ_{l=1}^N c(δ(X_l), θ) f_θ(X_l)/f_p(X_l)

over a fine but discrete grid θ ∈ Θ_g.
If the estimator is constrained to be of the switching form (10), the above algorithm is unchanged, except that δ_{λ^{(k)}}(X_l) in Step 4.a is predetermined to equal δ_{λ^{(k)}}(X_l) = δ^S(X_l) for all draws X_l for which the switching indicator equals one.
We set the number of draws to N = 250,000 in all applications. Computations take no more
than minutes on a modern PC, with a substantial fraction of time spent in Step 2.
C Additional Application-Specific Details

C.1 AR(1) Coefficient
The process X is approximated by a discrete time Gaussian AR(1) with T = 2500 observations. We use θ_{p,i} = ((i − 1)/5)², i = 1, ..., 61, Θ_g = {(12j/500)²}_{j=0}^{500}, and H_l = {{(j/4)(i/5)² + ((4 − j)/4)((i + 1)/5)²}_{j=0}^{3}}_{i=0}^{49} ∪ {144}. Integration over G_i and F is performed using a Gaussian quadrature rule with 7 points separately on each of the intervals determined by the sequence of knots that underlie G_i.
C.2 Degree of Parameter Time Variation

The process X is approximated by partial sums of a discrete time MA(1) with T = 800 observations. We set H_l = {{(j/4)(i/5)² + ((4 − j)/4)((i + 1)/5)²}_{j=0}^{3}}_{i=0}^{40} ∪ {64}. After invariance, the effective parameter is the scalar degree of time variation, for which we use the grids {((i − 1)/5)²}_{i=1}^{41} as proposal and {(j/50)²}_{j=0}^{500} to check the uniform bias property. Integration over G_i and F is performed using a Gaussian quadrature rule with 7 points separately on each of the intervals determined by the sequence of knots that underlie G_i.
C.3 Predictive Regression

The process X2 is approximated by a discrete time Gaussian AR(1) with T = 2500 observations. We set H_l = {(i/10)(∫_0^1 X*_{2,l}(s)²ds)^{−1/2}}_{i=−80}^{80} (corresponding to a range of the “correction term” in (25) of up to 8 OLS standard deviations of β̂_OLS). After invariance and integration over N(0, I₂), the effective parameter is γ, and for γ we use the grids {(i/5)²}_{i=0}^{60} as proposal and {(12j/500)²}_{j=0}^{500} to check the uniform bias property.
To derive (26), note that under θ*, X1(s) = rW_z(s) + √(1 − r²)W_y(s) and X2 is independent of W_y. Further dX2(s) = −γX2(s)ds + dW_z(s), so that W_z(s) = X2(s) − X2(0) + γ∫_0^s X2(l)dl. Thus,

E[X1(s)|X2] = rX2(s) − rX2(0) + rγ∫_0^s X2(l)dl
E[X1(1)|X2] = rX2(1) − rX2(0) + rγ∫_0^1 X2(l)dl
E[β̂_OLS(X)|X2] = r ∫_0^1 X2*(l)dX2(l) / ∫_0^1 X2*(l)²dl + rγ,

so from X1*(s) = X1(s) − sX1(1) − β̂_OLS(X)∫_0^s X2*(l)dl,

E[X1*(s)|X2] = rX2(s) − rsX2(1) − r(1 − s)X2(0) − r [ ∫_0^1 X2*(l)dX2(l) / ∫_0^1 X2*(l)²dl ] ∫_0^s X2*(l)dl.
Furthermore, the conditional covariance between X1*(s) and β̂_OLS(X) given X2 is

E[(β̂_OLS(X) − E[β̂_OLS(X)|X2])(X1*(s) − E[X1*(s)|X2])|X2]
= (1 − r²) [ ∫_0^s X2*(l)dl / ∫_0^1 X2*(l)²dl − s ∫_0^1 X2*(l)dl / ∫_0^1 X2*(l)²dl − ∫_0^1 X2*(l)²dl ∫_0^s X2*(l)dl / (∫_0^1 X2*(l)²dl)² ] = 0,

so that by Gaussianity, β̂_OLS(X) is independent of X1* conditional on X2. Thus, E[β̂_OLS(X)|X*] = E[β̂_OLS(X)|X2], and (26) follows.
We now show that the conditional density of X1* given X2 does not depend on γ. Clearly, X1* is Gaussian conditional on X2, so the conditional distribution of X1* given X2 is fully described by the conditional mean and conditional covariance kernel. For 0 ≤ s1 ≤ s2 ≤ 1, we obtain

E[(X1*(s1) − E[X1*(s1)|X2])(X1*(s2) − E[X1*(s2)|X2])|X2]
= (1 − r²) [ s1 − s1 s2 − ∫_0^{s1} X2*(l)dl ∫_0^{s2} X2*(l)dl / ∫_0^1 X2*(l)²dl ].

Neither this expression, nor E[X1*(s)|X2] computed above, depend on γ, which proves the claim.

C.4 Quantile Long-range Forecasts
The process X is approximated by a discrete time Gaussian AR(1) with T = 2500 observations. We set H_l = {μ̂_{X(1+r)|X,l} + σ̂_{X(1+r)|X,l} z_α + (i/10)σ̂_{X(1+r)|X,l}}_{i=−80}^{80}, where μ̂_{X(1+r)|X,l} and σ̂_{X(1+r)|X,l} are estimators of μ_{X(1+r)|X} and σ_{X(1+r)|X} that estimate (μ, γ) via (μ̂_l, γ̂_l) = ( ∫_0^1 X_l(s)ds , max(−½((X_l(1) − μ̂_l)² − 1)/∫_0^1 (X_l(s) − μ̂_l)²ds, 1) ). For γ, we use the grids {(i/5)²}_{i=0}^{60} as proposal and {(12j/500)²}_{j=0}^{500} to check the uniform bias property.
To further motivate the form of δ^S, observe that for large γ, the unconditional distribution of a mean-zero Ornstein-Uhlenbeck process with mean reverting parameter γ is N(0, (2γ)⁻¹), suggesting μ̂ + z_α(2γ̂)^{−1/2} as the estimator of the unconditional α quantile. Recall from the discussion in Section 2.1 that for large γ, γ̂ is approximately distributed N(γ, 2γ). Further, μ̂ is approximately distributed N(μ, γ⁻²), independent of γ̂. Thus, from a first order Taylor expansion,

P( μ̂ + z̃(2γ̂)^{−1/2} < μ + (2γ)^{−1/2}Z ) ≈ P( z̃(2γ)^{−1/2} < ((2γ)⁻¹ + γ⁻² + z̃²(2γ)⁻²)^{1/2} Z ) = α,

which is solved by z̃ given in the main text.
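A quick numerical check of this approximation (ours; it simulates the approximate distributions stated above rather than the exact model) is:

```python
import numpy as np
from scipy.stats import norm

# Check that the coverage of mu_hat + z_tilde*(2*gamma_hat)**-0.5 is close to alpha when
# gamma_hat ~ N(gamma, 2*gamma) and mu_hat ~ N(mu, gamma**-2), as assumed above.

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gamma, mu, alpha, n = 60.0, 0.0, 0.05, 1_000_000
    z = norm.ppf(alpha)
    gamma_hat = np.maximum(rng.normal(gamma, np.sqrt(2 * gamma), n), 5.0)
    mu_hat = rng.normal(mu, 1 / gamma, n)
    future = rng.normal(mu, np.sqrt(1 / (2 * gamma)), n)
    z_tilde = z * np.sqrt((2 * gamma_hat + 4) / (2 * gamma_hat - z ** 2))
    coverage = np.mean(future < mu_hat + z_tilde / np.sqrt(2 * gamma_hat))
    print(coverage)  # should be close to alpha
```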