Three Essays on Econometrics

By
Joonhwan Lee
M.A., Massachusetts Institute of Technology (2014)
B.A., Seoul National University (2007)
SUBMITTED TO THE DEPARTMENT OF ECONOMICS IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY IN ECONOMICS
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
SEPTEMBER 2014
© 2014 Massachusetts Institute of Technology. All rights reserved.
Signature of Author: Signature redacted
Department of Economics
August 15, 2014
Certified by: Signature redacted
Victor Chernozhukov
Professor of Economics
Thesis Supervisor
Accepted by: Signature redacted
Nancy L. Rose
Charles P. Kindleberger Professor of Applied Economics
Chairman, Departmental Committee on Graduate Studies
THREE ESSAYS ON ECONOMETRICS
by
Joonhwan Lee
SUBMITTED TO THE DEPARTMENT OF ECONOMICS IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF
PHILOSOPHY IN ECONOMICS
ABSTRACT
This thesis consists of three chapters that cover separate topics in econometrics.
The first chapter demonstrates a negative result on the asymptotic sizes of subset Anderson-Rubin tests with weakly identified nuisance parameters and a general covariance structure. The result of Guggenberger et al. (2012), established under homoskedasticity, is shown to break down when a general covariance structure is allowed. I provide thorough simulation results showing that the breakdown occurs over a wide range of parameter values that are plausible in empirical applications.
The second chapter proposes an inference procedure for quasi-Bayesian estimators that accounts for Monte Carlo numerical errors. Quasi-Bayesian methods have been applied in numerous settings to tackle the non-convex objective functions that arise in certain extremum estimation problems. The method involves drawing a finite number of Markov chain Monte Carlo (MCMC) samples to conduct inference, so some degree of numerical error is inevitable. This chapter quantifies the numerical error arising from the finite draws and provides a method to incorporate such errors into the final inference. I show that a sufficient condition for establishing correct numerical standard errors is geometric ergodicity of the MCMC chain. It is also shown that geometric ergodicity holds for Metropolis-Hastings chains with a quasi-posterior for the whole class of extremum estimators.
The third chapter considers fixed effects estimation and inference in nonlinear panel data models
with random coefficients and endogenous regressors. The quantities of interest are estimated by cross
sectional sample moments of generalized method of moments (GMM) estimators applied separately
to the time series of each individual. To deal with the incidental parameter problem introduced by
the noise of the within-individual estimators in short panels, we develop bias corrections. These
corrections are based on higher-order asymptotic expansions of the GMM estimators and produce
improved point and interval estimates in moderately long panels. Under asymptotic sequences
where the cross sectional and time series dimensions of the panel pass to infinity at the same rate,
the uncorrected estimators have asymptotic biases of the same order as their asymptotic standard
deviations. The bias corrections remove the bias without increasing variance.
Thesis Supervisor: Victor Chernozhukov
Title: Professor of Economics
ACKNOWLEDGMENT
I would like to express the deepest appreciation to my advisers, Professor Victor Chernozhukov and Professor Anna Mikusheva. Without their guidance and support, I could not have gone through the PhD program. They have shown inexhaustible patience and generosity toward my slow progress, and constantly helped me move forward. Countless communications with them were definitely the most essential element in my thesis writing.

I would also like to show my gratitude to Professor Whitney Newey. Having been a teaching assistant for him multiple times, I have witnessed his dedication to teaching and the great amount of attention he pays to every student. He has been a truly exemplary mentor. He has always been supportive, and he was the very person I sought out for help when I had a hard time. I also thank Iván Fernández-Val, who coauthored one of my thesis chapters. Working with him has been an enjoyable experience.

I would like to thank Gary King, the administrator of the economics department. As anybody who has been in this department would agree, his attention to detail and administrative support have been crucial during my years at MIT. My sincere thanks to my colleagues, especially Youngjun Jang, who has been my closest friend at MIT.

I greatly appreciate the financial support from the Korea Foundation for Advanced Studies. For five years, the foundation provided generous financial support for my study at MIT.

Finally, I would like to thank my parents, who have always encouraged me in whatever I have pursued. Their spiritual support has been a great part of my life. My special thanks to my fiancée, Junhyeon Jeong, who has supported me from a long distance for such a long time.
Contents

ABSTRACT  2
ACKNOWLEDGMENT  3
Chapter 1. ASYMPTOTIC SIZES OF SUBSET ANDERSON-RUBIN TESTS WITH WEAKLY IDENTIFIED NUISANCE PARAMETERS AND GENERAL COVARIANCE STRUCTURE  6
1.1. INTRODUCTION  6
1.2. LINEAR INSTRUMENTAL REGRESSION MODEL AND WEAK IV  8
1.3. SIMPLIFICATION OF THE MODEL  11
1.4. DOMINANCE OF AR(β₀) BY χ²(k − m_W) WITH CONDITIONAL HOMOSKEDASTICITY  14
1.5. SIMULATION RESULTS  17
1.6. SIMULATION OF MORE GENERAL SETTING  23
1.7. CONCLUSION  27
Bibliography  30
Chapter 2. INFERENCE ON QUASI-BAYESIAN ESTIMATORS ACCOUNTING FOR MONTE-CARLO MARKOV CHAIN NUMERICAL ERRORS  32
2.1. INTRODUCTION  32
2.2. OVERVIEW ON QUASI-BAYESIAN ESTIMATORS  34
2.3. GEOMETRIC ERGODICITY OF MCMC CHAIN  38
2.4. CONSISTENT ESTIMATION OF VARIANCE-COVARIANCE MATRIX  42
2.5. MONTE-CARLO SIMULATIONS  43
2.6. CONCLUSION  45
Bibliography  46
Chapter 3. PANEL DATA MODELS WITH NONADDITIVE UNOBSERVED HETEROGENEITY: ESTIMATION AND INFERENCE  50
3.1. INTRODUCTION  51
3.2. MOTIVATING EXAMPLES  54
3.3. THE MODEL AND ESTIMATORS  58
3.4. ASYMPTOTIC THEORY FOR FE-GMM ESTIMATORS  60
3.5. BIAS CORRECTIONS  68
3.6. EMPIRICAL EXAMPLE  72
3.7. CONCLUSION  74
Bibliography  74
APPENDIX A. NUMERICAL EXAMPLE  79
APPENDIX B. CONSISTENCY OF ONE-STEP AND TWO-STEP FE-GMM ESTIMATOR  81
APPENDIX C. ASYMPTOTIC DISTRIBUTION OF ONE-STEP AND TWO-STEP FE-GMM ESTIMATOR  82
APPENDIX D. ASYMPTOTIC DISTRIBUTION OF BIAS-CORRECTED TWO-STEP GMM  85
APPENDIX E. STOCHASTIC EXPANSION  88
APPENDIX F. STOCHASTIC EXPANSION  92
APPENDIX G. SCORES AND DERIVATIVES  94
TABLES A1-A4  98
CHAPTER 1

ASYMPTOTIC SIZES OF SUBSET ANDERSON-RUBIN TESTS WITH WEAKLY IDENTIFIED NUISANCE PARAMETERS AND GENERAL COVARIANCE STRUCTURE
1.1. INTRODUCTION
Making inference on structural parameters in linear instrumental variables (IV) regression models is one of the classic problems of econometrics. One of the biggest problems that many applications of linear IV models encounter is that instruments are often weak, i.e., they are poorly correlated with the corresponding endogenous variables. Classical asymptotics has very poor finite-sample behavior with weak instruments, and thus classical inference is practically unreliable; see Stock, Wright and Yogo (2002). Naturally, the problem of developing inference procedures that are robust to weak instruments has been one of the central questions of econometrics for the last 15 years.
There has been rich progress in constructing robust test statistics, most notably the AR statistic of Anderson and Rubin (1949), the Lagrange multiplier (LM) statistic of Kleibergen (2002), and the conditional likelihood ratio (CLR) statistic of Moreira (2003). An important shortcoming of these methods is that they are designed to test only the simple full-vector hypothesis of the form H₀ : β = β₀, where β contains the coefficients of all the endogenous variables. Testing a subset of the parameters is not straightforward because the unrestricted structural parameters enter as additional nuisance parameters. Projection-based tests are a general solution for such problems, but they are often very conservative, especially when the number of dimensions projected out is large. When the unrestricted structural parameters are strongly identified, the above test statistics can be adapted to have correct asymptotic size and improved power compared to projection-based tests; see Stock and Wright (2000), Kleibergen (2004) and Guggenberger and Smith (2005), among others.
The problem of testing without any assumption on the identification of the unrestricted structural parameters was a long-standing question. Guggenberger et al. (2012) provided a partial answer. They showed that, under a Kronecker product structure on a certain covariance matrix, the subset AR statistic with the LIML (limited information maximum likelihood) estimator plugged in has correct asymptotic size and a power improvement over projection-based tests. The Kronecker product structure, however, essentially implies conditional homoskedasticity of the reduced-form disturbances. Thus, the result of Guggenberger et al. (2012) is of limited practical use, because most economic data involve a high degree of heteroskedasticity or serial correlation. An important question therefore arises: does the result of Guggenberger et al. (2012) hold with a general covariance structure?

This paper provides an answer to this question by documenting a counterexample. I consider a reduced model of the linear IV model with normal disturbances and perform thorough simulations. It is shown that the result of Guggenberger et al. (2012) breaks down for a wide range of covariance structures once the Kronecker product assumption is removed. Moreover, it is demonstrated via simulation that the projection-based tests have sharp asymptotic size. The range of covariance structures for which the breakdown is observed in this paper, however, necessarily implies serial correlation among the reduced-form disturbances. Thus, the implications of this paper may shed light on weak-identification-robust inference procedures for time series data.
The paper is organized as follows. Section 2 briefly discusses the model and the problem of interest. Section 3 considers a simplification of the model to make it tractable for analysis and simulations. Section 4 reiterates the result of Guggenberger et al. (2012) in the simplified model and shows that the Kronecker structure is actually isomorphic to the identity matrix in the context of the test statistic. Sections 5 and 6 discuss the counterexample to their result with a general covariance structure, along with thorough simulation results.
1.2. LINEAR INSTRUMENTAL REGRESSION MODEL AND WEAK IV
Hausman (1983) wrote that an IV regression model can be represented as a limited information simultaneous equations model, in which we specify only the single structural equation of interest. The full structural model is

  y = Yβ + Wγ + ε
  Y = ZΠ_Y + V_Y
  W = ZΠ_W + V_W

where y, Y, W are T × 1, T × m_Y, T × m_W matrices that contain the endogenous variables. W consists solely of endogenous variables, while Y may contain some exogenous variables that are of interest. Z is a T × k matrix of instruments. We assume away any other included exogenous variables in the structural equation by regarding all the variables as pre-multiplied by M_X = I_T − X(X'X)⁻¹X', where X is a T × m_X matrix of exogenous variables not contained in Y. As usual, we assume that Z has full rank with k ≥ m_Y + m_W, so that the rank condition is satisfied. The hypothesis we are interested in is

  H₀ : β = β₀   vs.   H₁ : β ≠ β₀.
With an appropriate re-parametrization, we can also test a general linear restriction in this framework. If we have a test with correct asymptotic size, we can construct a corresponding confidence interval for β by inverting the test. Under classical asymptotics, when [Π_Y Π_W] is a fixed full-rank matrix and the sample size T increases to infinity, we can easily establish asymptotic normality of the estimators of β and γ, and we can test these parameters, or any function of them, with conventional Wald (or t) statistics.

Testing the parameters under potentially weak identification, that is, when [Π_Y Π_W] is close to degenerate along some direction, is problematic because the usual asymptotic approximation does not work well even for very large T. Staiger and Stock (1997), among others, examine this problem by considering an alternative asymptotics in which [Π_Y Π_W] changes with the sample size T at the rate 1/√T. More recent work has focused on finding robust tests that have asymptotically correct size under arbitrarily weak identification. These robust tests are based on the Anderson-Rubin statistic (Anderson and Rubin, 1949), the conditional likelihood ratio statistic (Moreira, 2003) and a Lagrange multiplier statistic (Kleibergen, 2002). The aforementioned statistics are known to have limiting distributions that do not depend on nuisance parameters when testing a hypothesis on the whole set of endogenous variables, which in our case is H₀ : β = β₀, γ = γ₀.
Contrary to classical asymptotics, it is not straightforward to perform a test on a subset of parameters based on weak-instrument-robust statistics. This is because the unrestricted structural parameters constitute additional nuisance parameters in the testing problem. In our model, the hypothesis of interest is

  H₀ : β = β₀,

while allowing γ to be unrestricted. If γ is strongly identified, the robust tests above can be adapted by replacing γ with γ̂, a consistent estimator of γ. Stock and Wright (2000) show that such a modification of the AR statistic in the GMM setting provides a valid test. Kleibergen (2004) extends the result to the CLR and LM statistics for a linear regression model. Guggenberger and Smith (2005) and Otsu (2006) address the similar issue in a more general GEL (generalized empirical likelihood) framework.
Without the assumption of strong identification of γ, a natural approach is to apply projection-type tests; see Dufour (1997) and Dufour and Taamouti (2005), among others. The projection test based on the AR statistic can be described as follows. Consider the AR statistic for both β and γ, AR(β, γ). For testing the hypothesis H₀ : β = β₀, the projection test rejects the null when AR(β₀, γ) > χ²(k)₁₋α for all values of γ. Thus, the corresponding test statistic is

  AR(β₀) = min_{γ ∈ R^{m_W}} AR(β₀, γ).

The problem with the projection approach is that it does not provide an efficient test if γ happens to be strongly identified; i.e., it has lower power than potentially optimal tests in some sense. One can note that the level-α projected AR test uses χ²(k)₁₋α as a critical value, while the subset AR test, under the assumption that γ₀ is strongly identified, uses χ²(k − m_W)₁₋α with the same test statistic.
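To make the contrast between the two critical values concrete, here is a minimal numerical sketch. The χ² critical values are standard; the criterion minimized below is only a stand-in with the same quadratic-over-(1 + γ'γ) shape that the AR objective takes in the homoskedastic case, evaluated at arbitrary simulated inputs rather than real data:

```python
import numpy as np
from scipy import stats

k, m_w = 2, 1        # number of instruments, dimension of gamma
alpha = 0.05

# Projection test: chi2(k) critical value; subset AR test of
# Guggenberger et al. (2012): chi2(k - m_w) critical value.
cv_projection = stats.chi2.ppf(1 - alpha, df=k)
cv_subset = stats.chi2.ppf(1 - alpha, df=k - m_w)
assert cv_subset < cv_projection   # the subset test is less conservative

def ar_criterion(gamma, xi, phi_hat):
    # Stand-in criterion: homoskedastic-case form of the AR objective
    # for m_w = 1 (see Section 4).
    resid = xi - phi_hat * gamma
    return float(resid @ resid) / (1.0 + gamma**2)

rng = np.random.default_rng(0)
xi = rng.standard_normal(k)        # illustrative transformed data
phi_hat = rng.standard_normal(k)
grid = np.linspace(-50.0, 50.0, 20001)
ar_min = min(ar_criterion(g, xi, phi_hat) for g in grid)

# A rejection by the projection test always implies a rejection by the
# subset test, because the subset critical value is smaller.
assert (ar_min <= cv_projection) or (ar_min > cv_subset)
```

The point of the sketch is only the ordering of the critical values: both tests minimize the same statistic over γ, but the subset test rejects more often, which is the source of both its power gain and the size question studied in this paper.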
Guggenberger et al. (2012) show that we can actually improve upon projection tests even with weakly identified γ. They show that under a Kronecker product covariance of (ε, V_W), that is,

  E[vec(ZᵢUᵢ')(vec(ZᵢUᵢ'))'] = E[UᵢUᵢ'] ⊗ E[ZᵢZᵢ'],

where Uᵢ = (εᵢ, V_{W,i}')', the same subset AR test statistic AR(β₀) = min_{γ ∈ R^{m_W}} AR(β₀, γ) has a limiting distribution that is stochastically dominated by χ²(k − m_W). Along with the fact that the limiting distribution is exactly χ²(k − m_W) under strong identification of γ, one can conclude that the test based on the subset AR statistic with the χ²(k − m_W)₁₋α critical value has correct asymptotic size α and provides power improvements over the projected AR test.
The crucial Kronecker product assumption essentially corresponds to conditional homoskedasticity of Uᵢ = (εᵢ, V_{W,i}')'. This can be very restrictive in many empirical applications, especially where weak-instrument-robust procedures are widely used. For example, Kleibergen and Mavroeidis (2009) applied the subset AR test based on the χ²(k − m_W) critical value to time series data to make inference on the New Keynesian Phillips curve. Presumably, the data have significant autocorrelation and conditional heteroskedasticity, which are common in any time series data. They applied subset tests with the AR statistic based on a conjecture that the power improvement over projection tests would carry over to the case of a general covariance structure. However, as pointed out later by Guggenberger et al. (2012), the subset LM and CLR tests do not have correct asymptotic size even under conditional homoskedasticity, and the positive result holds only for the subset AR test under the Kronecker product covariance assumption. Whether the result of Guggenberger et al. (2012) holds for a general covariance structure allowing heteroskedasticity and autocorrelation remains an open question. If this question has a negative answer, in other words, if the stochastic domination by χ²(k − m_W) does not hold with a general covariance structure, then we are back to the lower power of projection tests unless we are willing to accept the very restrictive assumption of conditional homoskedasticity.
This paper addresses the question by providing a counterexample, showing that the stochastic domination by χ²(k − m_W) breaks down for a wide range of non-Kronecker covariance structures. A thorough Monte Carlo simulation experiment is also conducted to examine the region of parameters that causes the breakdown of the result of Guggenberger et al. (2012).
1.3. SIMPLIFICATION OF THE MODEL
Here, I analyze the reduced form in the case of fixed instruments, normal errors and a known covariance matrix. The model is canonical in the literature for several reasons. First, it provides a benchmark that allows simple exposition and finite-sample analysis of the statistic of interest. Second, it is the foundation for the asymptotics of more general models; see Moreira (2003, 2009), Andrews, Moreira and Stock (2006) and Guggenberger et al. (2012), among others. Since the purpose of the paper is to provide a counterexample along with a thorough Monte Carlo study, the notation and definitions of the following benchmark model will be used throughout the remaining sections.
We can rewrite the model in reduced form as

  y = Zπ₁ + Ū
  Y = ZΠ_Y + V_Y
  W = ZΠ_W + V_W

where Ū = ε + V_Y β₀ + V_W γ₀ and π₁ = Π_Y β₀ + Π_W γ₀. Since the error terms are normal and the instruments are fixed, the model can be reduced to

  (π̂₁', vec(Π̂_Y)', vec(Π̂_W)')' ~ N( ((Π_Y β₀ + Π_W γ₀)', vec(Π_Y)', vec(Π_W)')', Σ ),

where

  Σ = (I_{1+m_Y+m_W} ⊗ (Z'Z)⁻¹) Var( vec(Z'(Ū V_Y V_W)) ) (I_{1+m_Y+m_W} ⊗ (Z'Z)⁻¹)

and (π̂₁, Π̂_Y, Π̂_W) are the OLS estimators of (π₁, Π_Y, Π_W). Under the null hypothesis H₀ : β = β₀, we can further concentrate the model by incorporating the information about the true β. We have

  ((π̂₁ − Π̂_Y β₀)', vec(Π̂_W)')' ~ N( ((Π_W γ₀)', vec(Π_W)')', Σ̃ ),

where

  Σ̃ = (I_{1+m_W} ⊗ (Z'Z)⁻¹) Var( vec(Z'(Ũ V_W)) ) (I_{1+m_W} ⊗ (Z'Z)⁻¹),   Ũ = ε + V_W γ₀.

Thus, there is no need to specify the covariance structure between V_Y and the other stochastic terms in order to analyze this model under the null. That is exactly why Guggenberger et al. (2012) did not need the Kronecker product assumption for terms involving V_Y. Note also that the reduced-form covariance matrix Σ̃ is consistently estimable, so we can treat Σ̃ as known in the asymptotic analysis of the model. Here, the unknown parameters are Π_W and γ₀.
The statistic of interest is the subset AR test statistic, defined as

  AR(β₀) = min_γ min_{Π_W} ( (π̃ − Π_W γ)', (vec(Π̂_W) − vec(Π_W))' ) Σ̃⁻¹ ( (π̃ − Π_W γ)', (vec(Π̂_W) − vec(Π_W))' )',

where π̃ = π̂₁ − Π̂_Y β₀. We can decompose

  Σ̃⁻¹ = Q'Q,

where Q can be represented as

  Q = ( Q₁₁  Q₁₂ ; 0  Q₂₂ ).

Such a decomposition includes the Cholesky decomposition, which makes Q upper triangular with positive diagonal entries. Then we can rewrite the statistic as

  AR(β₀) = min_γ min_Φ [ (ξ̂ − H(γ)vec(Φ))'(ξ̂ − H(γ)vec(Φ)) + (vec(Φ̂) − vec(Φ))'(vec(Φ̂) − vec(Φ)) ],
where

  ξ̂ = Q₁₁π̃ + Q₁₂vec(Π̂_W);
  Φ̂ = Q₂₂vec(Π̂_W);
  Φ = Q₂₂vec(Π_W);
  H(γ) = (γ' ⊗ Q₁₁)Q₂₂⁻¹ + Q₁₂Q₂₂⁻¹,

and

  (ξ̂', vec(Φ̂)')' ~ N( ((H(γ₀)vec(Φ))', vec(Φ)')', I_{k(1+m_W)} ).
The parameter Φ can be concentrated out straightforwardly, because the statistic is a quadratic function of vec(Φ) given the value of γ. The first-order condition with respect to vec(Φ) is

  2H(γ)'H(γ)vec(Φ) − 2H(γ)'ξ̂ − 2(vec(Φ̂) − vec(Φ)) = 0,

which gives

  vec(Φ*) = (H(γ)'H(γ) + I_{km_W})⁻¹ (H(γ)'ξ̂ + vec(Φ̂)).

Plugging in the optimal value of Φ, we have

  AR(β₀) = min_γ (ξ̂ − H(γ)vec(Φ̂))' (H(γ)H(γ)' + I_k)⁻¹ (ξ̂ − H(γ)vec(Φ̂)),
where the simplification to this form uses the following identity¹ applied to H(γ):

  (I_{km_W} + H(γ)'H(γ))⁻¹ = I_{km_W} − H(γ)'(I_k + H(γ)H(γ)')⁻¹H(γ).

¹ Note that for any m × n matrix H, the following holds: (I_n + H'H)⁻¹ = I_n − H'(I_m + HH')⁻¹H.
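The concentration step can be checked numerically. The sketch below uses arbitrary simulated stand-ins for ξ̂, vec(Φ̂) and H(γ), minimizes over vec(Φ) via the first-order condition, and verifies both the concentrated form of the statistic and the footnoted matrix identity:

```python
import numpy as np

rng = np.random.default_rng(1)
k, m_w = 3, 2
H = rng.standard_normal((k, k * m_w))      # stands in for H(gamma)
xi = rng.standard_normal(k)                # stands in for xi-hat
phi_hat = rng.standard_normal(k * m_w)     # stands in for vec(Phi-hat)

# Optimal vec(Phi) from the first-order condition:
# (H'H + I) vec(Phi*) = H' xi + vec(Phi-hat)
phi_star = np.linalg.solve(H.T @ H + np.eye(k * m_w), H.T @ xi + phi_hat)
direct = ((xi - H @ phi_star) @ (xi - H @ phi_star)
          + (phi_hat - phi_star) @ (phi_hat - phi_star))

# Concentrated form: (xi - H phi_hat)' (HH' + I)^{-1} (xi - H phi_hat)
r = xi - H @ phi_hat
concentrated = r @ np.linalg.solve(H @ H.T + np.eye(k), r)
assert np.isclose(direct, concentrated)

# Footnote identity: (I + H'H)^{-1} = I - H'(I + HH')^{-1} H
lhs = np.linalg.inv(np.eye(k * m_w) + H.T @ H)
rhs = np.eye(k * m_w) - H.T @ np.linalg.solve(np.eye(k) + H @ H.T, H)
assert np.allclose(lhs, rhs)
```

The identity is what turns the (km_W)-dimensional inverse from the first-order condition into the k-dimensional inverse appearing in the concentrated statistic.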
1.4. DOMINANCE OF AR(β₀) BY χ²(k − m_W) WITH CONDITIONAL HOMOSKEDASTICITY

Here I briefly discuss the result and proof of Guggenberger et al. (2012), using the concentration and simplification developed above, for the linear IV model with conditional homoskedasticity. Given the Kronecker product assumption (i.e., conditional homoskedasticity), we have

  Σ̃ = Ω ⊗ (Z'Z)⁻¹,

where Ω = Var((Ũᵢ, V_{W,i}')'). Let P'P = Ω⁻¹ be the Cholesky decomposition of Ω⁻¹ and consider the decomposition

  Σ̃⁻¹ = (P ⊗ (Z'Z)^{1/2})'(P ⊗ (Z'Z)^{1/2}).

Also, let

  P = ( P₁₁  P₁₂ ; 0  P₂₂ ),

where P₁₁ is a scalar and P₂₂ is an m_W × m_W matrix. One can note that in this case

  H(γ) = (γ'P₁₁P₂₂⁻¹ + P₁₂P₂₂⁻¹) ⊗ I_k = γ̃' ⊗ I_k,   H(γ₀)vec(Φ) = Φγ̃₀,

with a re-parametrization of γ into γ̃. The AR statistic can then be written as

  AR(β₀) = min_{γ̃} (ξ̂ − Φ̂γ̃)'(ξ̂ − Φ̂γ̃) / (1 + γ̃'γ̃).

This shows that the Kronecker product assumption makes the model equivalent to one with Σ̃ = I_{k(1+m_W)} and a different set of parameters. That is, the class of models {(γ₀, Π_W, Ω ⊗ (Z'Z)⁻¹)} is equivalent to the class of models {(γ̃₀, Φ, I_{k(m_W+1)})}, and we can achieve a significant reduction in dimensionality by assuming the Kronecker product structure. Now define the independent standard normal errors

  ε₁ = ξ̂ − Φγ̃₀,   ε₂ = vec(Φ̂) − vec(Φ).

The proof in Guggenberger et al. (2012) that the statistic AR(β₀) is dominated by χ²(k − m_W) hinges on rewriting the minimization over γ̃ as a minimization over a normalized direction (d₁, d₂')' with d₁² + d₂'d₂ held fixed, and then exhibiting a feasible point (d₁*, d₂*')' that depends only on the realization of ε₂. Conditional on ε₂, the criterion evaluated at this feasible point is a quadratic form in ε₁ with an idempotent weight matrix of rank k − m_W, and is therefore distributed χ²(k − m_W); the independence of ε₁ and ε₂ makes this hold unconditionally. Obviously, none of these objects is feasible in practice, but that is not a concern here: they are used only to show that, for every realization of ε₁ and ε₂, the minimized criterion is bounded above by a χ²(k − m_W) random variable. That is sufficient to show that the AR statistic is indeed dominated by χ²(k − m_W).
NEGATIVE RESULT IN THE GENERAL COVARIANCE CASE. The above result, however, does not hold generally with potential heteroskedasticity and autocorrelation. Here I document a case where it breaks down under a general covariance structure. Since one counterexample is enough to prove the claim, I demonstrate such an example in the simplest possible setting, i.e., with m_W = 1 and k = 2. I set the parameters so that

  Q₁ ≡ Q₁₁Q₂₂⁻¹ = I₂,   Q₂ ≡ Q₁₂Q₂₂⁻¹ = 4R,   R = ( 0  −1 ; 1  0 ),

with Φ close to zero. Note that this setup does not seem to be much of a departure from the homoskedastic case, except that Q₁₂Q₂₂⁻¹ is not a scalar multiple of I₂. Also, the value of Φ indicates that the strength of identification is very weak. As shown later in more thorough simulation experiments, the upward departure from χ²(k − m_W) is more pronounced as Φ and γ₀ get closer to zero. I ran a Monte Carlo simulation based on the concentrated and reduced model

  (ξ̂', vec(Φ̂)')' ~ N( ((H(γ₀)vec(Φ))', vec(Φ)')', I_{2k} )

and examined the distribution of the AR statistic

  AR(β₀) = min_γ (ξ̂ − H(γ)vec(Φ̂))'(H(γ)H(γ)' + I_k)⁻¹(ξ̂ − H(γ)vec(Φ̂)).

The projection method guarantees that AR(β₀) is dominated by χ²(2), and the question we wish to address is whether AR(β₀) is dominated by χ²(1). Although the minimization involved is over just a one-dimensional space, the criterion function potentially has many humps, so guaranteeing global minimization for every simulation draw is not an easy task. I employed a Newton-type algorithm with multiple starting values. Figure 1 shows the quantile function of AR(β₀) along with those of χ²(1) and χ²(2). As we can see from Figure 1, there is a clear stochastic ordering in this case: the statistic AR(β₀) is stochastically larger than χ²(1), and thus using χ²(1) critical values would clearly produce over-rejection, i.e., size distortion.
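A scaled-down version of this experiment can be sketched as follows. The Q blocks mirror the counterexample above, but the number of draws is reduced and the global minimization over γ is approximated by a multi-start quasi-Newton search, so this is an illustration of the design rather than a reproduction of the reported figures:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
k = 2
eta = 4.0
R = np.array([[0.0, -1.0], [1.0, 0.0]])
Q11, Q12 = np.eye(k), eta * R      # Q22 normalized to I (m_W = 1)
gamma0, phi = 0.0, np.zeros(k)     # weakest-identification corner

def H(g):
    # H(gamma) = (gamma' x Q11) Q22^{-1} + Q12 Q22^{-1}; with m_W = 1
    # and Q22 = I this reduces to g*Q11 + Q12.
    return g * Q11 + Q12

def ar_stat(xi, phi_hat):
    def crit(g):
        Hg = H(g[0])
        r = xi - Hg @ phi_hat
        return r @ np.linalg.solve(Hg @ Hg.T + np.eye(k), r)
    # multiple starting values guard against local minima
    return min(optimize.minimize(crit, np.array([s])).fun
               for s in (-10.0, -1.0, 0.0, 1.0, 10.0))

n_sim = 1000
draws = np.empty(n_sim)
for i in range(n_sim):
    xi = H(gamma0) @ phi + rng.standard_normal(k)
    phi_hat = phi + rng.standard_normal(k)
    draws[i] = ar_stat(xi, phi_hat)

q95 = np.quantile(draws, 0.95)
# Compare with the chi2(1) and chi2(2) 95% quantiles (3.84 and 5.99):
# a q95 well above 3.84 means chi2(1) critical values over-reject.
print(q95, stats.chi2.ppf(0.95, 1), stats.chi2.ppf(0.95, 2))
```

Evaluating the criterion at γ = γ₀ shows each draw is bounded by a χ²(2) variable, so the simulated quantiles should always sit below those of χ²(2); the question is whether they exceed those of χ²(1).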
[FIGURE 1. Quantile functions of AR(β₀), χ²(1) and χ²(2), for the case k = 2, m_W = 1, based on 100,000 simulation draws.]
1.5. SIMULATION RESULTS

In this section, I report Monte Carlo simulation results with varying model parameters γ₀, Φ, Q₁ and Q₂, as well as the number of instruments k. For the most part, I look into the case of k = 2 for simplicity of specification. As in the previous example, I set Q₁ = I₂ and consider combinations of the following sets of parameters:

  Q₂ = ηR,   R = ( 0  −1 ; 1  0 ),   η ∈ {0, 1, 4, 10},
  γ₀ ∈ {0, 0.1, 0.5, 1, 5, 20},

and Φ proportional to a fixed unit vector with ‖Φ‖ ∈ {0, 0.1, 0.5, 1, 5, 20}. Due to the nature of AR(β₀), its distribution is expected to be continuous in the above parameters: the criterion function is continuous in those parameters, and since the minimum is a continuous functional, one can argue that the distribution of AR(β₀) and its functionals are also continuous in all the parameters (although the minimizer γ* is not necessarily continuous in the parameters).
Before examining the simulation results, it may be helpful to see what the above parameters represent in terms of the original model of interest. The reduced form is written

  y = Zπ₁ + Ū
  Y = ZΠ_Y + V_Y
  W = ZΠ_W + V_W

and in the model we consider for the simulation, where k = 2 and m_W = 1, under the null hypothesis H₀ : β = β₀ the model reduces to

  ((π̂₁ − Π̂_Y β₀)', Π̂_W')' ~ N( ((Π_W γ₀)', Π_W')', Σ̃ ).

If we assume that Z is normalized so that Z'Z = I_k, we have

  Σ̃ = Var( vec(Z'(Ũ V_W)) ),

where Ũ = Ū − V_Y β₀. We can see how Q₁ and Q₂ are related to Σ̃, the reduced-form covariance, as follows. Since the model is a reduced version, there are infinitely many Σ̃ that correspond to the same values of Q₁ and Q₂. If we normalize so that Σ̃₂₂ = I₂, then with Q₂ = ηR we have

  Σ̃ = ( (1 + η²)I₂   −ηR ; −ηR'   I₂ ).
Note that when Q₂ is the zero matrix, that is, when η = 0, we have conditional homoskedasticity. Larger values of η mean that the reduced-form error Z'Ũ has a much larger variance (larger by the factor 1 + η²) than that of Z'V_W. Larger values of η also translate into higher correlation, either negative or positive, between Z₁'Ũ and Z₂'V_W, and between Z₂'Ũ and Z₁'V_W, because the corresponding correlation coefficients are ±η/√(1 + η²). Given the value of R in the simulation, these two correlations have opposite signs. The matrix R is chosen in this way because it produced the largest upward deviation from χ²(k − m_W). However, a slight change in the value of R also generated such a deviation; the deviation is therefore not a singularity, but occurs over a wide range of parameters. One can ask whether the structure of Σ̃ that generates the deviation is plausible or realistic in empirical applications. This question will be addressed more thoroughly in the later sections.
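The mapping from η to the reduced-form covariance can be checked directly. This sketch builds Σ̃ from an upper-triangular Q with Q₁ = I₂ and Q₂ = ηR under the normalization Σ̃₂₂ = I₂, and verifies the block structure and implied correlations stated above (η = 4 is the counterexample value):

```python
import numpy as np

eta = 4.0
k = 2
R = np.array([[0.0, -1.0], [1.0, 0.0]])

# Upper-triangular Q with Q11 = Q22 = I and Q12 = eta*R, so that
# Sigma-tilde^{-1} = Q'Q.
Q = np.block([[np.eye(k), eta * R],
              [np.zeros((k, k)), np.eye(k)]])
Sigma = np.linalg.inv(Q.T @ Q)

# Block structure: [[(1 + eta^2) I, -eta R], [-eta R', I]]
expected = np.block([[(1 + eta**2) * np.eye(k), -eta * R],
                     [-eta * R.T, np.eye(k)]])
assert np.allclose(Sigma, expected)

# Correlations: rows 0-1 are Z'U-tilde, rows 2-3 are Z'V_W.
sd = np.sqrt(np.diag(Sigma))
corr = Sigma / np.outer(sd, sd)
target = eta / np.sqrt(1 + eta**2)
assert np.isclose(corr[0, 3], target)    # Z1'U-tilde with Z2'V_W
assert np.isclose(corr[1, 2], -target)   # Z2'U-tilde with Z1'V_W
```

For η = 4 the two cross-correlations are ±4/√17 ≈ ±0.97, which makes concrete how far this design sits from conditional homoskedasticity.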
The vector parameter Φ indicates the strength of identification, because under conditional homoskedasticity with Σ̃ = Ω ⊗ (Z'Z)⁻¹ we have

  Φ = Q₂₂vec(Π_W) = (Z'Z)^{1/2}Π_W/σ_W,   ‖Φ‖² = Π_W'Z'ZΠ_W/σ_W²,

where σ_W² denotes the variance of the reduced-form error V_W (m_W = 1 here); i.e., the norm of Φ is the familiar concentration parameter for W under the assumption that H₀ : β = β₀ holds. See Stock, Wright and Yogo (2002) for the definition. In the general case, we can note that

  ‖Φ‖² = Π_W'Σ̃₂₂⁻¹Π_W,

where Σ̃₂₂ is the variance-covariance matrix of vec(Π̂_W). The cases ‖Φ‖ = 20 and ‖Φ‖ = 5 in this experimental setting correspond to concentration parameters of 400 and 25, which are regarded as strong identification in most of the empirical literature when k = 2. The value of γ₀ also presumably affects the distribution of AR(β₀). For the conditionally homoskedastic case, I showed that a model with (γ₀, Π_W, Σ̃) is equivalent to some (γ̃₀, Φ, I) in terms of the AR(β₀) distribution, and, as shown above, the value of γ̃₀ is affected by the elements of Σ̃. The simulation results clearly indicate that the value of γ₀ indeed affects the distribution of AR(β₀) in a subtle manner.
The number of simulation draws is 50,000 for each combination of parameters. I tabulate the 95% and 90% quantiles of the empirical distribution of the simulated AR(β₀), denoted AR95 and AR90 respectively, along with their standard errors. I also report P(AR(β₀) > χ²(1)₁₋α) for α = 0.05 and 0.1, which is the true size of the test based on χ²(1)₁₋α critical values. These values show the size distortion incurred when the subset test of Guggenberger et al. (2012) is applied to the described model without the Kronecker product assumption. Throughout the simulation experiment, I found that the behavior of AR95 or AR90 is a good representation of the behavior of the whole distribution of AR(β₀). That is, at least in this experiment, when AR95 is above χ²(1)₀.₉₅, the distribution of AR(β₀) stochastically dominates χ²(1); and when AR95 converges to χ²(1)₀.₉₅ under some change of parameters, the distribution of AR(β₀) is also found to converge to χ²(1). The simulation results can therefore be interpreted accordingly.
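The standard errors attached to the tabulated quantiles can be obtained by resampling the simulation draws. The sketch below illustrates one such approach on a stand-in sample (χ²(1) draws in place of simulated AR(β₀) values), using the nonparametric bootstrap for the standard error of the 95% sample quantile:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in for 50,000 simulated AR(beta0) draws.
draws = rng.chisquare(df=1, size=50_000)

q95 = np.quantile(draws, 0.95)

# Bootstrap standard error of the 95% sample quantile:
# resample the simulation draws with replacement and recompute.
n_boot = 200
boot = np.array([np.quantile(rng.choice(draws, size=draws.size), 0.95)
                 for _ in range(n_boot)])
se = boot.std(ddof=1)
print(f"AR95 = {q95:.2f} (se {se:.3f})")
```

With 50,000 draws the standard error of the 95% quantile is on the order of a few hundredths, matching the magnitude of the standard errors reported in parentheses in the tables.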
The simulation results in Tables 1 and 2 show some notable tendencies. First, as Φ moves further from zero, the distribution of AR(β₀) converges to χ²(1). This has a natural explanation: ‖Φ‖² is the concentration parameter, and larger values of it indicate stronger identification. This is consistent with the result of Stock and Wright (2000) that the subset test statistic follows χ²(k − m_W) when γ₀ is strongly identified. However, the speed of convergence varies with the values of γ₀ and Q₂. Most notably, higher values of γ₀ seem to be highly correlated with the speed of convergence. Figure 2 shows how AR95 changes with the identification strength and γ₀. In the homoskedastic case where Q₂ = 0, we observe a monotonic increase of AR95 until it converges to χ²(k − 1)₀.₉₅. In the general covariance case where Q₂ = 4R, AR95 is well above χ²(1)₀.₉₅ under weaker identification, then decreases well below χ²(1)₀.₉₅, and finally increases monotonically while converging. In both cases, the speed of convergence is faster when γ₀ is larger. We can also note that convergence is much slower in the case of Q₂ = 4R.
[TABLE 1. Simulation results of AR95 for k = 2, m_W = 1, across η ∈ {0, 1, 4, 10} and γ₀ ∈ {0, 0.1, 0.5, 1, 5, 20}; standard errors in parentheses. The 95% quantile of χ²(1) is 3.84 and that of χ²(2) is 5.99. The result is from 50,000 simulation draws for each configuration.]
Break-down of the dominance by χ²(k − mw) is observed in the weak identification region
of D, and it is more pronounced when η is larger. Figure 3 shows that AR(β0) can indeed get
very close to χ²(2) when η is sufficiently large and there is no identification. This example
demonstrates that, at least with k = 2 and mw = 1, the asymptotic size of the subset AR
test based on the critical value of the projection AR test sharply equals α if we consider every
possible Ω, not just the class of Ω's that have a Kronecker product structure. In fact, the
next section demonstrates that this is generally true for k > 2 and mw ≥ 1. These results
show that applying the critical value χ²(k − mw)_{1−α} in empirical applications may generate
significant size distortion when γ0 is weakly identified and there is a high degree of
heteroskedasticity and auto-correlation. Even in the most severe case of break-down in
this setup, the upward deviation from χ²(1) seems to disappear when λ > 2, which translates
into a concentration parameter of 4. This is a fairly small number, but obviously one cannot
use such a threshold to decide whether it is safe to use the χ²(k − mw) critical values in
empirical applications.

[TABLE 2. Simulation Results of AR90 for k = 2, mw = 1: AR90 over λ ∈ {0, 0.1, 0.5, 1, 5, 20}
and γ0 ∈ {0, 0.1, 0.5, 1, 5, 20}, for η ∈ {0, 1, 4, 10}. The 90% quantile of χ²(1) is 2.71 and that
of χ²(2) is 4.61. The results are from 50,000 simulation draws for each configuration; standard
errors from simulation in parentheses.]
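The two critical values quoted in the table notes have simple closed forms; the following sketch (not part of the original simulations) reproduces them with only the Python standard library.

```python
import math
from statistics import NormalDist

# The 95% critical values quoted in the table notes have closed forms:
# chi2(1) is the square of a standard normal quantile, and chi2(2) is an
# exponential distribution with mean 2.
crit_chi2_1 = NormalDist().inv_cdf(0.975) ** 2   # 95% quantile of chi2(1)
crit_chi2_2 = -2.0 * math.log(0.05)              # 95% quantile of chi2(2)
print(round(crit_chi2_1, 2), round(crit_chi2_2, 2))   # 3.84 5.99
```

The gap between 3.84 and 5.99 is the efficiency at stake when deciding between the subset critical value χ²(k − mw) and the conservative projection value χ²(k).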
[FIGURE 1.5.1. Change of AR95 in Identification Strength. For η = 0 (conditional homoskedastic
case) and η = 4 (non-Kronecker case). 50,000 simulation draws for each configuration.]
1.6. SIMULATION OF MORE GENERAL SETTING
The previous section discussed the simulation results for k = 2 and Ω2 a scalar multiple of

R = ( 0  -1
      1   0 ).

One may ask whether the break-down of dominance by χ²(k − mw) can be observed over a
wider range of parameters. The answer is positive, both for Ω2 ≠ ηR and for k > 2. This
section is devoted to exploring the region of parameters that generates an AR(β0) distribution
dominating χ²(k − mw), and to discussing whether that region of parameters is plausible in
empirical applications.

[FIGURE 1.5.2. Case of AR being close to χ²(k): quantile plot of AR. Note that the value of γ0
does not matter here because we set λ = 0.]
First, I consider Ω2 that is not a scalar multiple of R in the case k = 2. Instead, I consider
a class of Ω2,

Ω2 ∈ { ηR(θ) : η ∈ R, θ ∈ [-π/2, π/2] },   R(θ) = ( cos θ  -sin θ
                                                     sin θ   cos θ ),

that is, Ω2 is a scalar multiple of a rotation matrix in R². The class contains the values of Ω2
used in the previous section as the special case θ = π/2. Figure 4 shows how AR95 changes over
different values of θ, where the other parameters are set to {Ω1 = I2, Φ = 0.1λ, γ0 = 1,
η ∈ {2, 8, 20}}. Interestingly, AR95 is increasing in the rotation angle and takes its maximum
at θ = π/2. We can note that when η is sufficiently large, we observe AR95 > χ²(1)_0.95 for a
wide range of θ. It should be noted that there exist Ω2 that produce an AR statistic
dominating χ²(1) outside the class considered here. The class of scalar multiples of
rotation matrices is considered just for convenience of characterization. In fact, any Ω2 with
complex eigenvalues whose imaginary parts are sufficiently large could generate an AR statistic
that dominates χ²(1). The exact mechanism of this, however, could not be clearly described
analytically.
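The class ηR(θ) and the complex-eigenvalue observation can be illustrated with a small numerical sketch (illustrative only; the helper name is mine):

```python
import numpy as np

def omega2(eta, theta):
    """Omega2 = eta * R(theta), a scalar multiple of a 2x2 rotation matrix."""
    return eta * np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta),  np.cos(theta)]])

# theta = pi/2 recovers eta * R = eta * [[0, -1], [1, 0]], the matrix used in
# the previous section.
assert np.allclose(omega2(4.0, np.pi / 2), [[0.0, -4.0], [4.0, 0.0]])

# The eigenvalues of eta * R(theta) are eta * exp(+/- i*theta), so rotating
# away from theta = 0 makes the imaginary parts large -- the feature
# associated here with dominance of the AR statistic over chi2(1).
eig = np.linalg.eigvals(omega2(4.0, np.pi / 2))
assert np.allclose(sorted(eig.imag), [-4.0, 4.0])
```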
[FIGURE 1.6.1. Change of AR95 on θ. 50,000 simulation draws with an increment of π/180 in θ.]
Second, I consider the case of k > 2 with k = 2l, i.e. k even with l a positive integer.
Then an obvious extension of Ω2 with ηR(θ) is

Ω2 = I_l ⊗ ηR(θ),

where η = 20 and θ = π/2. The other parameters are set as Ω1 = I_k, γ0 = 0, and Φ = 0_k,
where 0_k is a k × 1 vector of zeros. Table 3 shows the simulation results for k ∈ {4, 6, 10, 30}.
We can see that both AR95 and AR90 exceed χ²(k − 1)_0.95 and χ²(k − 1)_0.90, respectively,
by a significant margin in all cases.

TABLE 3. AR95 and AR90 in case of k = 2l

         χ²(k−1)_0.90  AR90           χ²(k)_0.90  χ²(k−1)_0.95  AR95           χ²(k)_0.95
k = 4    6.25          7.56 (0.023)   7.78        7.81          9.28 (0.033)   9.49
k = 6    9.24          10.42 (0.027)  10.64       11.07         12.36 (0.037)  12.59
k = 10   14.68         15.75 (0.033)  15.99       16.92         18.05 (0.044)  18.31
k = 30   39.09         39.93 (0.051)  40.26       42.56         43.45 (0.067)  43.77

Simulation standard errors in parentheses.

Note that the Ω2's in these cases have quite a sparse structure when k is larger, yet they
still generate an AR(β0) statistic that stochastically dominates χ²(k − 1).
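The block-diagonal construction Ω2 = I_l ⊗ ηR(θ), and its sparsity for larger k, can be sketched as follows (illustrative code, not from the thesis):

```python
import numpy as np

def omega2_even(k, eta=20.0, theta=np.pi / 2):
    """Omega2 = I_l kron (eta * R(theta)) for even k = 2l."""
    assert k % 2 == 0
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return np.kron(np.eye(k // 2), eta * R)

Om = omega2_even(6)
# Each 2x2 diagonal block is eta * R(pi/2) = [[0, -20], [20, 0]] ...
assert np.allclose(Om[:2, :2], [[0.0, -20.0], [20.0, 0.0]])
# ... so only one entry per row is nonzero: the matrix gets sparser as k grows.
assert np.count_nonzero(np.round(Om, 12)) == 6
```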
For k = 2l + 1, it is also possible to observe a breakdown of χ²(k − 1) dominance. I set
the parameters as

Ω1 = ( I_{k−1}    0_{k−1}
       0'_{k−1}   ε ),

Ω2 = ( I_l ⊗ ηR(θ)   0_{k−1}
       0'_{k−1}       0 ),

where η = 20, θ = π/2, and ε = 0.0001; the other parameters are set as γ0 = 0 and Φ = 0_k.
Table 4 shows the simulation results for k ∈ {3, 7, 11, 31}. I could not observe a case with Ω1
not close to a singular matrix for k = 2l + 1. When k is an odd number, I was not able to find
any Ω2 that generates AR ≻ χ²(k − 1) with Ω1 = I_k among 5,000 randomly generated Ω2's. Such
cases were only observed with Ω1 near singular. An Ω1 as above with very small ε implies that
one of the instruments carries nearly no information.
TABLE 4. AR95 and AR90 in case of k = 2l + 1

         χ²(k−1)_0.90  AR90           χ²(k)_0.90  χ²(k−1)_0.95  AR95           χ²(k)_0.95
k = 3    4.61          5.83 (0.022)   6.25        5.99          7.37 (0.030)   7.81
k = 7    10.64         11.65 (0.029)  12.02       12.59         13.69 (0.039)  14.07
k = 11   15.99         16.85 (0.034)  17.28       18.31         19.21 (0.045)  19.68
k = 31   40.26         41.02 (0.052)  41.42       43.77         44.53 (0.067)  44.99

Simulation standard errors in parentheses.

The whole region of parameters where I found the break-down of χ²(k − mw) dominance,
however, implies some degree of auto-correlation in the errors. If we only have conditional
heteroskedasticity without any auto-correlation, we have

(I_{1+mw} ⊗ (Z'Z)^{-1}) Var(vec(Z'w)) (I_{1+mw} ⊗ (Z'Z)^{-1})
    = (I_{1+mw} ⊗ (Z'Z)^{-1}) E[ E[w_j w'_j | z_j] ⊗ (z_j z'_j) ] (I_{1+mw} ⊗ (Z'Z)^{-1}),

which implies that all k × k blocks that constitute Ω should be symmetric. It can be shown
that the parameter values at which I documented the stochastic dominance of the AR statistic
over χ²(k − mw) are not feasible under this block-wise symmetry of Ω. This does not necessarily
imply that the AR statistic is dominated by χ²(k − mw) when we have only conditional
heteroskedasticity. As Guggenberger et al. (2012) noted, proving the result analytically is not
an easy feat. Thus, the findings in this paper have implications for applying the subset AR test
to data prone to auto-correlation, e.g. time-series data. A notable application is inference
on the New Keynesian Phillips curve as in Kleibergen and Mavroeidis (2009).
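The block-wise symmetry argument can be checked numerically; the sketch below (my own illustration) verifies that the ηR(π/2) blocks used above are antisymmetric, and hence infeasible when there is conditional heteroskedasticity but no auto-correlation:

```python
import numpy as np

def blocks_symmetric(Omega, k):
    """Check that every k x k block of a (q*k) x (q*k) matrix is symmetric,
    as required under conditional heteroskedasticity without auto-correlation."""
    q = Omega.shape[0] // k
    return all(np.allclose(Omega[i*k:(i+1)*k, j*k:(j+1)*k],
                           Omega[i*k:(i+1)*k, j*k:(j+1)*k].T)
               for i in range(q) for j in range(q))

eta_R = 4.0 * np.array([[0.0, -1.0], [1.0, 0.0]])  # eta * R(pi/2), k = 2
assert not blocks_symmetric(eta_R, 2)   # antisymmetric block: infeasible
assert blocks_symmetric(np.eye(4), 2)   # a symmetric benchmark passes
```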
1.7. CONCLUSION
Reducing the degrees of freedom for testing a subset of parameters with a weak-identification-robust
test statistic and weakly identified nuisance parameters has been a challenging problem
in the literature. Such dimension reduction is important from a practical perspective because the
efficiency of the test can be substantially improved. Recent work by Guggenberger et al. (2012)
showed that the subset Anderson-Rubin statistic can be applied with a reduced degree of freedom
under conditional homoskedasticity, or a Kronecker product covariance structure. This paper is
the first to document that the result of Guggenberger et al. (2012) does not hold under a general
covariance structure. With a thorough simulation study, I show that the projection-based tests
have sharp asymptotic size and that this cannot be improved without further assumptions on the
covariance structure. It is also shown that the break-down of the result is most pronounced
when the identification of the nuisance parameters is weak.

The region of parameters where I found the break-down, however, necessarily implies some
degree of auto-correlation in the errors. This leaves an important question: can we achieve
the dimension reduction with only conditional heteroskedasticity? The simulation results
suggest that the answer might be positive. A theoretical proof may be a daunting task, but it
would allow the subset AR test to be applied to a much wider range of problems.
Bibliography
[1] Anderson, Theodore W., and Herman Rubin. "Estimation of the parameters of a single equation in a
complete system of stochastic equations." The Annals of Mathematical Statistics 20.1 (1949): 46-63.
[2] Andrews, Donald W. K., Marcelo J. Moreira, and James H. Stock. "Optimal Two-Sided Invariant Similar
Tests for Instrumental Variables Regression." Econometrica 74.3 (2006): 715-752.
[3] Guggenberger, Patrik, and Richard J. Smith. "Generalized empirical likelihood estimators and tests
under partial, weak, and strong identification." Econometric Theory 21.04 (2005): 667-709.
[4] Guggenberger, Patrik, et al. "On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier
tests in linear instrumental variables regression." Econometrica 80.6 (2012): 2649-2666.
[5] Dufour, Jean-Marie. "Some impossibility theorems in econometrics with applications to structural and
dynamic models." Econometrica 65.6 (1997): 1365-1387.
[6] Dufour, Jean-Marie, and Mohamed Taamouti. "Projection-Based Statistical Inference in Linear Structural
Models with Possibly Weak Instruments." Econometrica 73.4 (2005): 1351-1365.
[7] Hausman, Jerry A. "Specification and estimation of simultaneous equation models." Handbook of
Econometrics 1.1 (1983): 391-448.
[8] Kleibergen, Frank. "Pivotal statistics for testing structural parameters in instrumental variables
regression." Econometrica 70.5 (2002): 1781-1803.
[9] Kleibergen, Frank. "Testing parameters in GMM without assuming that they are identified." Econometrica
73.4 (2005): 1103-1123.
[10] Kleibergen, Frank, and Sophocles Mavroeidis. "Weak instrument robust tests in GMM and the new
Keynesian Phillips curve." Journal of Business & Economic Statistics 27.3 (2009): 293-311.
[11] Moreira, Marcelo J. "A conditional likelihood ratio test for structural models." Econometrica 71.4 (2003):
1027-1048.
[12] Moreira, Marcelo J. Tests with correct size when instruments can be arbitrarily weak. Center for Labor
Economics, University of California, Berkeley, 2001.
[13] Otsu, Taisuke. "Generalized empirical likelihood inference for nonlinear and time series models under
weak identification." Econometric Theory 22.03 (2006): 513-527.
[14] Staiger, Douglas, and James H. Stock. "Instrumental Variables Regression with Weak Instruments."
Econometrica 65.3 (1997): 557-586.
[15] Stock, James H., and Jonathan H. Wright. "GMM with weak identification." Econometrica 68.5 (2000):
1055-1096.
[16] Stock, James H., Jonathan H. Wright, and Motohiro Yogo. "A survey of weak instruments and weak
identification in generalized method of moments." Journal of Business & Economic Statistics 20.4 (2002).
CHAPTER 2

INFERENCE ON QUASI-BAYESIAN ESTIMATORS ACCOUNTING FOR MONTE-CARLO
MARKOV CHAIN NUMERICAL ERRORS
2.1. INTRODUCTION
Quasi-Bayesian estimators (QBEs), or Laplace-type estimators (LTEs), are defined similarly
to Bayesian estimators, replacing the parametric likelihood function with a general statistical
criterion function. Chernozhukov and Hong (2003) provide a detailed discussion of the
properties of QBEs. They show that, under some primitive conditions that are necessary
for asymptotic normality of a general class of extremum estimators, QBEs are asymptotically
equivalent to the corresponding extremum estimators. QBEs are particularly useful when the
statistical criterion function of interest exhibits irregular and non-smooth behavior in finite
samples. Such behavior makes it hard to compute extremum point estimates, since the process
involves global optimization of a highly irregular non-smooth function. Several prominent
econometric models, such as the censored quantile regression model and the instrumental
variable quantile regression model, are well known to have this problem. These models
have well-defined global minima in finite samples whose behavior follows standard asymptotics.
Thus, one can say that the challenge of dealing with such statistical models is purely
computational.
QBEs essentially transform the extremum estimation problem into a Bayesian estimation
problem with a properly constructed quasi-posterior. Thus, point estimation comes down to
taking the mean or median of MCMC draws from the quasi-posterior, and constructing a
confidence interval can be as simple as taking appropriate quantiles of the quasi-posterior.
Theoretically, we can make the number of draws in the MCMC algorithm large enough
to assure a certain degree of precision in the computation of the Bayesian or Quasi-Bayesian
estimators of interest. In practice, however, there are many computational restrictions that
make a large number of draws very costly or nearly impossible. In the QBE context, the problem
at hand is to obtain the value of E[g(θ)] as precisely as possible, where θ is the parameter of
interest and the expectation is taken over the quasi-posterior. As noted in Gelman and Shirley
(2011), inference on E[g(θ)] or a functional of the posterior requires considerably more MCMC
draws than making Bayesian inference on g(θ). Therefore, with a practical number of draws,
it is plausible that there is a non-trivial amount of error in the computed QBE purely due to
the finiteness of the MCMC draws. This error is rarely addressed in applications, not only of
QBEs but also of general Bayesian estimation using MCMC draws. For Bayesian inference, one
can argue that the Monte-Carlo error does not much affect the construction of the credible
interval, which is often the main object of interest. For QBEs or Bayesian point estimates,
however, one should take a close look at the Monte-Carlo error and incorporate it into the
statistical inference procedure.
Similar problems have been treated in the context of the simulated method of moments.
McFadden (1989) discusses the numerical error from using simulated moments in discrete
response models. Pakes and Pollard (1989) provide asymptotic theory for a general class of
simulation-based estimators. The simulated method of moments and its variants, however,
generally assume the possibility of independent draws. For QBE problems, independent draws
are generally impossible and thus we rely on Markov chain Monte-Carlo, which makes the
asymptotics substantially more complicated than for independent simulation draws.
This paper provides a framework for incorporating the Monte-Carlo error into the Quasi-Bayesian
estimation problem. First, I briefly review statistical inference based on QBEs following
Chernozhukov and Hong (2003), and quantify Monte-Carlo-error-adjusted standard errors
for QBEs. Then, I establish conditions for a central limit theorem that is needed to ensure
asymptotic normality of the QBE when the number of MCMC draws B is large. Next, I
establish conditions for consistent estimation of the variance-covariance matrix of the Monte-Carlo
errors. Finally, I provide a simple Monte-Carlo study using a censored quantile regression
model similar to Chernozhukov and Hong (2003) and show that incorporating the Monte-Carlo
error improves the coverage probability.
2.2. OVERVIEW ON QUASI-BAYESIAN ESTIMATORS
In Chernozhukov and Hong (2003), QBEs are proposed as an alternative estimation
method for general extremum estimation problems. Thus, we first define the extremum
estimator of interest and the basic assumptions under which the estimator is consistent and
asymptotically normal. Given a probability space (Ω, F, P), define

θ̂ ∈ argmax_{θ∈Θ} L_n(θ),

where L_n(θ) is a measurable function of θ on Θ and a random variable on Ω. This framework
of extremum estimation encompasses various estimation methods such as maximum likelihood
estimation and the generalized method of moments. The following assumptions are generally used
in the literature to ensure consistency and asymptotic normality of an extremum estimator.
They are primitive assumptions for identification and validity of the asymptotic expansion. See
Newey and McFadden (1994) or Gallant and White (1988) for further discussion.

(1) The true parameter θ0 belongs to the interior of a compact convex subset Θ ⊂ R^d,
and θ0 is identifiable in the sense that there exists ε > 0 such that

liminf_{n→∞} P*( sup_{|θ−θ0|≥δ} (1/n)(L_n(θ) − L_n(θ0)) ≤ −ε ) = 1

for all δ > 0.

(2) For θ in an open neighborhood of θ0, we have

L_n(θ) − L_n(θ0) = (θ − θ0)'Δ_n(θ0) − (1/2)(θ − θ0)'[nJ_n(θ0)](θ − θ0) + R_n(θ),

where Ω_n(θ0)^{-1/2} Δ_n(θ0)/√n →d N(0, I_d), and J_n(θ0) = O(1) and Ω_n(θ0) =
Var(Δ_n(θ0)/√n) = O(1) are, uniformly in n, positive definite constant matrices. For the
residual term R_n(θ), for any ε > 0 there exist δ > 0 and M > 0 such that

limsup_{n→∞} P*( sup_{M/√n ≤ |θ−θ0| ≤ δ} |R_n(θ)| / (n|θ − θ0|²) > ε ) < ε,
limsup_{n→∞} P*( sup_{|θ−θ0| ≤ M/√n} |R_n(θ)| > ε ) < ε.

Under Assumptions 1 and 2, we have consistency of θ̂ and

Ω_n(θ0)^{-1/2} J_n(θ0) √n (θ̂ − θ0) →d N(0, I_d).
In theory, extremum estimation provides a simple unified framework for many standard
statistical models. However, actually computing extremum estimates in practice is not
a trivial problem. As noted by Andrews (1997), the problem can be cumbersome for some
important econometric models, such as censored quantile regression models and quantile
instrumental variables models.

The Quasi-Bayesian estimator (QBE), or Laplace-type estimator (LTE), provides an asymptotic
equivalent to the extremum estimator. The QBE is defined as

θ̂_Q = arg inf_{ξ∈Θ} ∫_Θ ρ_n(θ − ξ) p_n(θ) dθ,   p_n(θ) = exp(L_n(θ)) π(θ) / ∫_Θ exp(L_n(θ)) π(θ) dθ,

where ρ_n(u) is a penalty function and π(θ) is a prior density. By construction, θ̂_Q is a
Bayesian decision based on the penalty function ρ_n and the quasi-posterior p_n(θ).
Chernozhukov and Hong (2003) consider a fairly general class of ρ_n(u) and state their main
results accordingly. However, I mainly consider ρ_n(u) = n|u|², which makes θ̂_Q the mean
of the quasi-posterior p_n(θ). The key results of Chernozhukov and Hong (2003)
(Theorems 1 and 2) state that the quasi-posterior, with appropriate normalization, converges
to a Gaussian distribution in the total variation of moments norm, and thus we have

√n (θ̂_Q − θ̂) = o_p(1),
Ω_n(θ0)^{-1/2} J_n(θ0) √n (θ̂_Q − θ0) →d N(0, I_d).

This essentially states that we can substitute the QBE for the extremum estimator. Furthermore,
we can make statistical inference on θ0, or on any smooth function of θ0, with consistent estimates
of Ω_n(θ0) and J_n(θ0). Ω_n(θ0) can be estimated straightforwardly by standard methods, and
J_n(θ0) can be estimated from the variance of the quasi-posterior.
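As a concrete illustration of the procedure (the data-generating process, criterion, and tuning below are my own toy choices with a flat prior π(θ) ∝ 1, not the thesis's application), a random walk Metropolis-Hastings sampler for a quasi-posterior built from an LAD criterion can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for a median-regression (LAD) criterion: y = 1 + 2x + noise.
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_cauchy(size=n)

def L_n(theta):
    """Quasi-likelihood: minus the least-absolute-deviations criterion,
    so the quasi-posterior is p_n(theta) proportional to exp(L_n(theta))."""
    a, b = theta
    return -np.abs(y - a - b * x).sum()

def rw_metropolis(L, theta0, steps=5000, scale=0.1):
    """Random walk Metropolis-Hastings draws targeting exp(L(theta))."""
    theta = np.asarray(theta0, dtype=float)
    logp = L(theta)
    draws = np.empty((steps, theta.size))
    for s in range(steps):
        prop = theta + scale * rng.normal(size=theta.size)
        logp_prop = L(prop)
        # Accept with probability min(1, exp(logp_prop - logp)).
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop
        draws[s] = theta
    return draws

S = rw_metropolis(L_n, theta0=[0.0, 0.0])
theta_bar = S[1000:].mean(axis=0)   # feasible QBE point estimate after burn-in
```

Taking the mean of the post-burn-in draws gives the feasible point estimate, and quantiles of the same draws give quasi-posterior interval endpoints.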
Let us consider a statistical inference problem in the context of QBE. Let g(θ) be a
continuously differentiable function and let g(θ0) be the object of interest. An asymptotic
100(1 − α)% confidence interval for g(θ0) can be constructed as

[ g(θ̂_Q) + Φ^{-1}(α/2) · s.e.(g(θ̂_Q)),  g(θ̂_Q) + Φ^{-1}(1 − α/2) · s.e.(g(θ̂_Q)) ],

where

s.e.(g(θ̂_Q)) = √( ∇g(θ̂_Q)' Ĵ_n(θ0)^{-1} Ω̂_n(θ0) Ĵ_n(θ0)^{-1} ∇g(θ̂_Q) / n ),

Ĵ_n(θ0)^{-1} = ∫_Θ n (θ − θ̂_Q)(θ − θ̂_Q)' p_n(θ) dθ,

Ω̂_n(θ0) is a consistent estimate of Ω_n(θ0), and Φ is the standard normal distribution
function. In practice, the values of g(θ̂_Q) and Ĵ_n(θ0)^{-1} are obtained by calculating the
sample mean and variance of the MCMC sequence S = {θ^(1), θ^(2), θ^(3), ..., θ^(B)}. For
simplicity of the discussion, I assume that the MCMC sequence is properly burned in. The
confidence interval constructed above therefore implicitly assumes that we can compute the
mean and the variance of the quasi-posterior with arbitrarily high precision. In fact, the
feasible point estimate and confidence interval are calculated based on

θ̄_Q = (1/B) Σ_{s=1}^{B} θ^(s),   θ^(s) ∈ S,

instead of θ̂_Q, the true mean of the quasi-posterior. Hence, the standard error of the feasible
estimate θ̄_Q should reflect the additional error from the finite number of MCMC draws. Since
the Monte-Carlo draws are statistically independent of the sampling error in θ̂_Q by
construction, the Monte-Carlo-error-corrected standard error of g(θ̄_Q) can be written as

s.e.(g(θ̄_Q)) = √( ∇g(θ̄_Q)' Ĵ_n(θ0)^{-1} Ω̂_n(θ0) Ĵ_n(θ0)^{-1} ∇g(θ̄_Q) / n
                 + ∇g(θ̄_Q)' Var( (1/B) Σ_{s=1}^{B} θ^(s) ) ∇g(θ̄_Q) ),

where

Ĵ_n(θ0)^{-1} = (n/B) Σ_{s=1}^{B} (θ^(s) − θ̄_Q)(θ^(s) − θ̄_Q)'.
Whether the latter term is significant mainly depends on how large B is. That
is, we can reduce the latter part of the error arbitrarily simply by increasing the number of
MCMC draws. In practice, however, many computational restrictions hinder the possibility of
having very many MCMC draws, i.e. setting B very large. For example, some problems
have highly complicated criterion functions L_n(θ), which makes the repeated evaluation of
L_n(θ) required in MCMC quite costly. When calculating L_n(θ) involves nested iterative
computation, increasing B may be very costly because the marginal computational cost
involves an additional set of nested iterations.

Even for fixed B, there is another factor that affects the size of the Monte-Carlo
error: the persistence of the Markov chain. For example, for a random walk Metropolis-Hastings
algorithm in a high-dimensional problem, the optimal acceptance rate is known to be about 0.234
(see Gelman, Roberts and Gilks (1996) or Roberts and Rosenthal (2007)). This means that the
Markov chain S is quite persistent, which makes the variance of (1/B) Σ_s θ^(s), if it exists,
generally large compared to that of independent Monte-Carlo draws.
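The effect of persistence on the Monte-Carlo error can be seen in a small experiment (an illustrative stand-in that uses an AR(1) sequence in place of an actual Metropolis-Hastings chain):

```python
import numpy as np

rng = np.random.default_rng(1)

def chain_mean(rho, B=1000):
    """Mean of B draws from a stationary AR(1) chain with unit variance."""
    x = np.empty(B)
    x[0] = rng.normal()
    shocks = rng.normal(size=B) * np.sqrt(1.0 - rho**2)
    for t in range(1, B):
        x[t] = rho * x[t - 1] + shocks[t]
    return x.mean()

reps = 2000
var_iid = np.var([rng.normal(size=1000).mean() for _ in range(reps)])
var_ar1 = np.var([chain_mean(0.9) for _ in range(reps)])
# The persistent chain inflates the variance of the Monte-Carlo mean; the
# asymptotic inflation factor for rho = 0.9 is (1 + rho)/(1 - rho) = 19.
ratio = var_ar1 / var_iid
```

With the same number of draws B, the persistent chain's sample mean is roughly an order of magnitude noisier than the independent-draw benchmark, which is exactly why the Ω_M term below cannot be ignored.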
There are two questions to address before we attempt to quantify the key component of
the Monte-Carlo error,

B · Var( (1/B) Σ_{s=1}^{B} g(θ^(s)) ).

First, we need to verify whether this quantity exists at all in the limit. That is, we need
to ensure that a central limit theorem goes through for the sum of MCMC draws. More
precisely, we need to show that

(1/√B) Σ_{s=1}^{B} (θ^(s) − θ̂_Q) →d N(0, Ω_M).

Second, we need a consistent estimator for Ω_M, which can be used as a proxy for
B · Var((1/B) Σ_s g(θ^(s))). With these two points addressed, we can construct the Monte-Carlo-error-corrected
standard error for the feasible QBE as follows:

s.e.(g(θ̄_Q)) = √( ∇g(θ̄_Q)' Ĵ_n(θ0)^{-1} Ω̂_n(θ0) Ĵ_n(θ0)^{-1} ∇g(θ̄_Q) / n
                 + ∇g(θ̄_Q)' Ω̂_M ∇g(θ̄_Q) / B ),

where Ω̂_M is a consistent estimator for Ω_M.
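The corrected standard error above can be written as a small helper (a sketch with hypothetical inputs; in practice Ω̂_M would come from the estimator developed in the next section):

```python
import numpy as np

def corrected_se(grad, J_inv, Omega_n, Omega_M, n, B):
    """Monte-Carlo-error-corrected standard error:
    sqrt( grad' J^-1 Omega_n J^-1 grad / n  +  grad' Omega_M grad / B )."""
    stat_var = grad @ J_inv @ Omega_n @ J_inv @ grad / n
    mc_var = grad @ Omega_M @ grad / B
    return np.sqrt(stat_var + mc_var)

# One-parameter illustration with made-up variance components; the second
# term is the penalty for a short, persistent chain (Omega_M = 5, B = 500).
se = corrected_se(np.array([1.0]), np.eye(1), np.eye(1),
                  Omega_M=5.0 * np.eye(1), n=1000, B=500)
assert np.isclose(se, np.sqrt(1.0 / 1000 + 5.0 / 500))
```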
2.3. GEOMETRIC ERGODICITY OF MCMC CHAIN
In this section, I show that the MCMC draws from the quasi-posterior obey a central limit
theorem. I mainly consider the Metropolis-Hastings algorithm with a random walk proposal
density, which is the most common choice in practice for QBEs and other non-hierarchical
Bayesian models. For the detailed discussion, I review some of the results on central limit
theorems for Markov chains. The approach in this section follows that of Meyn and Tweedie
(2009), Jones (2004) and Roberts and Tweedie (1996). We start with some basic definitions
from Markov chain theory.

Definitions.

(1) A Markov chain transition kernel P is π-irreducible if for any x ∈ X and any set
A with π(A) > 0, there exists an n such that P^n(x, A) > 0.
(2) A π-irreducible P is periodic if there exist an integer d ≥ 2 and a collection of
disjoint sets A_1, ..., A_d ∈ B such that for each x ∈ A_j, P(x, A_{j+1}) = 1 for
j = 1, ..., d − 1, and for x ∈ A_d, P(x, A_1) = 1. Otherwise, P is said to be aperiodic.
(3) A π-irreducible Markov chain {X_n} with stationary distribution π is Harris recurrent
if for every set A with π(A) > 0, Pr(X_n ∈ A i.o. | X_1 = x) = 1 for all x.
(4) If π is a probability distribution, the Markov chain X is said to be Harris ergodic if
it is π-irreducible, aperiodic and Harris recurrent.

One obvious example that satisfies all the conditions above except stationarity is a random
walk in a single dimension, because a random walk can potentially visit any set with positive
probability measure in X infinitely often. Straightforwardly, a Metropolis-Hastings algorithm
with a random walk proposal density satisfies irreducibility, aperiodicity and Harris recurrence
as long as its acceptance probability is bounded away from zero. In fact, Mengersen and
Tweedie (1996) showed that this is true for any Metropolis-Hastings chain in general. The
following proposition from Athreya et al. (1996) provides a useful implication of this.
PROPOSITION. If P is Harris ergodic, then we have

||P^n(x, ·) − π(·)|| → 0   as n → ∞,

where the norm is the total variation norm,

||P^n(x, ·) − π(·)|| = sup_{A ∈ B} |P^n(x, A) − π(A)|.

This shows that the Markov chain eventually converges to the stationary distribution in
total variation norm. With Harris ergodicity, we can treat a Markov chain as if it were a
stationary process for the purposes of our problem. This follows from the following proposition
of Meyn and Tweedie (2009).

PROPOSITION. If a CLT holds for any one initial distribution for a Harris ergodic Markov
chain, then it holds for every initial distribution.

The convergence in total variation norm provides a direct link to Markov chain central limit
theorems. In order to establish a proper rate of convergence and obtain a limit distribution, we
need a stronger notion of ergodicity: geometric ergodicity. A Harris ergodic Markov
chain P is said to be geometrically ergodic if there exist t ∈ (0, 1) and a non-negative function
M(x) such that

||P^n(x, ·) − π(·)|| ≤ M(x) t^n.
We have established that the Metropolis-Hastings random walk chain is Harris ergodic by
construction. Roberts and Tweedie (1996) give a set of sufficient conditions for a multivariate
Metropolis-Hastings chain to be geometrically ergodic. They show that the geometric
ergodicity of a Metropolis-Hastings random walk chain is purely determined by the shape
and tail behavior of the target density, the quasi-posterior in our case. The following theorem
from Roberts and Tweedie (1996) sets sufficient conditions on the shape.

THEOREM. Let p and h be polynomials on R^d, and let

P = { π : π(x) = h(x) exp{p(x)} s.t. lim_{|x|→∞} p_m(x) = −∞ for some m ≥ 2 },

where p_m denotes the polynomial consisting only of p's mth-order terms. For any target
density π ∈ P, a symmetric random walk Metropolis-Hastings chain is geometrically ergodic.

The large-sample limiting distribution of the quasi-posterior under the assumptions in
Chernozhukov and Hong (2003) is a Gaussian distribution, which is contained in P. Moreover, we
assumed a compact parameter space Θ for the extremum estimation. Thus, there is no tail
issue for the quasi-posterior, and all of the primitive conditions for geometric ergodicity in
Theorem 3.2 of Roberts and Tweedie (1996) are easily satisfied.
THEOREM. A random walk Metropolis-Hastings chain for a quasi-posterior that has
acceptance probability bounded away from zero is geometrically ergodic under Assumptions
1 and 2.

Geometric ergodicity turns out to be a sufficient condition for establishing a CLT. This can
be proved by showing that a geometrically ergodic Markov chain is strongly mixing (or α-mixing)
with exponential rate.

LEMMA. A geometrically ergodic Markov chain is strongly mixing with exponential mixing rate
if

E_π[M(X)] < ∞,

where M(x) is defined as above.

PROOF. A sequence {X_n} is said to be strongly mixing if the mixing coefficient α(n) → 0
as n → ∞, where

α(n) = sup_{k≥1} sup_{B ∈ F_1^k, A ∈ F_{k+n}^∞} |P(A ∩ B) − P(A)P(B)|,

and F_{t1}^{t2} is the sigma field generated by {X_t : t1 ≤ t ≤ t2}. Let P^n(x, ·) be the transition
kernel of a geometrically ergodic Markov chain with stationary distribution π(x). Note that

∫_B |P^n(x, A) − π(A)| π(dx) ≥ | ∫_B (P^n(x, A) − π(A)) π(dx) | = | Pr(X_n ∈ A, X_0 ∈ B) − π(A)π(B) |,

and thus we have

α(n) ≤ ∫ sup_{A ∈ B} |P^n(x, A) − π(A)| π(dx) ≤ E_π[M(X)] t^n,

where B is the Borel sigma field on the support X of π. □
Note that the assumption on the existence of E_π[M(X)] holds in our case because the
parameter set Θ is assumed to be compact. Using this result along with the following classic
central limit theorem for strongly mixing sequences, we can establish the desired result.

THEOREM. Let X_n be a centered, strictly stationary, strongly mixing sequence with mixing
coefficient α(n) such that E|X_n|^{2+δ} < ∞ for some δ > 0 and

Σ_n α(n)^{δ/(2+δ)} < ∞.

Then we have

σ² = E[X_0²] + 2 Σ_{j=1}^{∞} E[X_0 X_j] < ∞,

and if σ² > 0,

(1/√n) Σ_{k=1}^{n} X_k →d N(0, σ²).
COROLLARY. Let {θ^(s), s = 1, 2, ...} be a random walk Metropolis-Hastings Markov chain
for a quasi-posterior p_n(θ). Suppose E_{p_n}|θ|^{2+δ} < ∞ for some δ > 0. Let θ̂_Q = E_{p_n}[θ]
and

Ω_M = E_{p_n}[(θ^(1) − θ̂_Q)(θ^(1) − θ̂_Q)']
      + Σ_{i=2}^{∞} ( E_{p_n}[(θ^(1) − θ̂_Q)(θ^(i) − θ̂_Q)'] + E_{p_n}[(θ^(i) − θ̂_Q)(θ^(1) − θ̂_Q)'] ).

Then we have

(1/√B) Σ_{s=1}^{B} (θ^(s) − θ̂_Q) →d N(0, Ω_M).
2.4. CONSISTENT ESTIMATION OF VARIANCE-COVARIANCE MATRIX
We have shown that a central limit theorem holds for random walk Metropolis-Hastings
chains for quasi-posteriors. For the actual calculation of the Monte-Carlo error, we need to
obtain a consistent estimator of the variance-covariance matrix Ω_M. In this section, I show
that geometric ergodicity is indeed a sufficient condition for the consistency of classic
non-parametric HAC estimators. Andrews (1991) provides a set of primitive conditions for
kernel-based non-parametric estimators to be consistent. Let V_t = θ^(t) − θ̂_Q ∈ R^p and let
Γ(j) be the covariance of V_t and V_{t+j}. Let κ_{abcd}(t, t + j, t + m, t + n) be the
element-by-element fourth-order cumulant of (V_t, V_{t+j}, V_{t+m}, V_{t+n}), where a, b, c, d
indicate indices of elements. Suppose that the following conditions hold:

(1) {V_t} is a mean-zero, fourth-order stationary sequence of random variables with
Σ_{j=−∞}^{∞} ||Γ(j)|| < ∞ and

Σ_{j=−∞}^{∞} Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} |κ_{abcd}(0, j, m, n)| < ∞   for all a, b, c, d ≤ p.

(2) √T V̄ = O_p(1) with V̄ = (1/T) Σ_t V_t, sup_{t≥1} E||V_t||² < ∞, and
∫_{−∞}^{∞} |k(x)| dx < ∞, where k(x) is a kernel.
Under (1) and (2), a consistent estimator for $\Omega = \sum_{j=-\infty}^{\infty} \Gamma(j)$ exists and it can be written as
\[
\hat{\Omega} = \sum_{j=-T+1}^{T-1} k\left(\frac{j}{S_T}\right) \hat{\Gamma}(j), \qquad
\hat{\Gamma}(j) = \begin{cases} T^{-1} \sum_{t=1}^{T-j} \hat{V}_t \hat{V}_{t+j}' & \text{for } j \ge 0, \\ T^{-1} \sum_{t=1-j}^{T} \hat{V}_t \hat{V}_{t+j}' & \text{for } j < 0. \end{cases}
\]
Here, $S_T$ is the truncation parameter that is increasing in $T$ and $\hat{V}_t = \theta^{(t)} - \bar{\theta}$. See Andrews (1991) for a detailed discussion. The following theorem connects geometric ergodicity and the consistency of $\hat{\Omega}_M$.
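To make the estimator concrete, the following sketch (my own illustration, not code from the original; `bartlett_hac` and its argument names are mine) computes the kernel HAC estimate from the matrix of chain draws using the Bartlett kernel $k(x) = \max(0, 1 - |x|)$, the kernel employed in the simulation section below.

```python
import numpy as np

def bartlett_hac(draws, S_T):
    """Kernel HAC estimate of the long-run variance from MCMC draws (T x p array).

    Implements Omega_hat = sum_{j=-T+1}^{T-1} k(j / S_T) * Gamma_hat(j)
    with the Bartlett kernel k(x) = max(0, 1 - |x|), so only lags |j| < S_T
    contribute.
    """
    V = draws - draws.mean(axis=0)          # V_hat_t = theta^(t) - theta_bar
    T = V.shape[0]
    omega = V.T @ V / T                      # Gamma_hat(0)
    for j in range(1, min(S_T, T)):          # Bartlett weight vanishes at j = S_T
        w = 1.0 - j / S_T
        gamma_j = V[j:].T @ V[:-j] / T       # Gamma_hat(j)
        omega += w * (gamma_j + gamma_j.T)   # adds Gamma_hat(j) + Gamma_hat(-j)
    return omega
```

For independent draws the estimate is close to the ordinary sample covariance; for a positively autocorrelated chain it is larger, which is exactly the Monte-Carlo inflation being quantified here.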
THEOREM. Let $\{\theta^{(s)}, s = 1, \ldots\}$ be a random walk Metropolis-Hastings Markov chain for a quasi-posterior $p_n(\theta)$ and $\hat{\Omega}_M$ be defined as above. Then $\hat{\Omega}_M$ is a consistent estimator of $\Omega_M$ if $S_T \to \infty$ and $S_T/T \to 0$.
PROOF. Lemma 1 of Andrews (1991) states that if $\{V_t\}$ is a mean zero $\alpha$-mixing sequence of rv's with
\[
\sup_{t \ge 1} E\|V_t\|^{4\nu} < \infty \quad \text{and} \quad \sum_{j=1}^{\infty} j^2 \alpha(j)^{(\nu-1)/\nu} < \infty
\]
for some $\nu > 1$, then we have
\[
\sum_{j=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} |\kappa_{abcd}(0, j, m, n)| < \infty \quad \forall a, b, c, d \le p.
\]
The moment condition holds for any $\nu$ because the quasi-posterior has compact support, and the mixing coefficient condition is easily satisfied by the geometric ergodicity. Thus, the first condition for consistency holds. Also, the second condition is satisfied by the result of the previous section. Therefore, by Theorem 1 of Andrews (1991), we have $\hat{\Omega}_M \to_p \Omega_M$. $\square$
2.5. MONTE-CARLO SIMULATIONS
There has been a fairly large literature concerning the censored quantile regression model since Powell (1986). One of the main issues has been the difficulty of calculating the extremum estimate. Buchinsky and Hahn (1998) proposed an iterated linear programming (ILP) algorithm to tackle the problem. As shown in Chernozhukov and Hong (2003), the ILP algorithm can be unreliable in cases with a high degree of censoring compared to quasi-Bayesian estimation. Thus, censored quantile regression is a good example of the merits of QBE.
I consider a simple version of the censored quantile regression model for expository purposes. The model is the same as that of Chernozhukov and Hong (2003) for comparison. The actual computational cost of calculating the QBE in this model is relatively low since it is a simple toy model that does not involve any nested iterations. However, the simulation results from this simple model can shed light on the importance of accounting for Monte-Carlo errors. Consider the following model:
\[
Y^* = \beta_0 + X'\beta + u, \qquad X \sim N(0, I_3), \qquad u = X_1 \cdot N(0, 1), \qquad Y = \max\{0, Y^*\},
\]
where the true parameter $(\beta_0, \beta_1, \beta_2, \beta_3) = (-1, 3, 3, 3)$. We observe a data set of copies of $(X, Y)$ with sample size $n$. The setting produces approximately 55% censoring. The criterion function for extremum estimation is
\[
L_n(\beta) = -\sum_{i=1}^{n} \left| Y_i - \max(0, \beta_0 + X_i'\beta) \right|.
\]
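A minimal sketch of this design can be written as follows (my own illustration; the function names `simulate` and `criterion` are not the author's code). It draws $(X, Y)$ from the model and evaluates $L_n(\beta)$:

```python
import numpy as np

def simulate(n, beta=(-1.0, 3.0, 3.0, 3.0), rng=None):
    """Draw (X, Y) from the censored design of the text:
    Y* = b0 + X'b + u, X ~ N(0, I_3), u = X_1 * N(0, 1), Y = max(0, Y*)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X = rng.standard_normal((n, 3))
    u = X[:, 0] * rng.standard_normal(n)   # heteroskedastic error
    y_star = beta[0] + X @ np.asarray(beta[1:]) + u
    return X, np.maximum(0.0, y_star)

def criterion(beta, X, Y):
    """Censored median-regression objective L_n(beta) (to be maximized)."""
    fitted = np.maximum(0.0, beta[0] + X @ np.asarray(beta[1:]))
    return -np.abs(Y - fitted).sum()

X, Y = simulate(20000)
censoring = float((Y == 0).mean())   # roughly 0.55 in this design
```

The criterion is non-smooth and non-concave in $\beta$ because of the $\max(0, \cdot)$ kink, which is precisely why direct optimization is awkward and a quasi-Bayesian treatment is attractive.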
For quasi-Bayesian estimation, I set the initial values equal to the OLS estimates and burned the first 500 draws. A Gaussian random walk Metropolis-Hastings algorithm is employed, and the step size is adjusted to attain an acceptance probability of 20%-25%, which is considered optimal in the literature. Table 1 shows the percentage of the Monte-Carlo error contained in the mean-squared error. The ratio is calculated as
\[
R = \frac{\hat{\Omega}_{M,11}/B}{MSE(\hat{\beta}_{0,n})},
\]
where $\hat{\Omega}_{M,11}$ is the first diagonal element of $\hat{\Omega}_M$ defined as above. The table also shows the coverage probability of 90% confidence intervals using the regular standard error calculated from the QBE and the Monte-Carlo-error-corrected standard error. Finally, the table provides the accuracy of the Monte-Carlo variance term $\hat{\Omega}_{M,11}$ by calculating its normalized RMSE. Following the usual convention, the normalizing constant is the difference between the maximum and minimum among the set of estimates. For the last statistic, I fixed the values of $(Y, X)$ and generated many different Markov chains for calculating QBEs.
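The sampler just described can be sketched as follows (my own illustration under the stated design, not the author's exact implementation; `log_post`, `step`, and the tuning convention are assumptions). In practice one would plug $L_n(\beta)$ in for `log_post` and adjust `step` until the realized acceptance rate falls in the 20%-25% band.

```python
import numpy as np

def rw_metropolis(log_post, theta0, B, step, burn=500, rng=None):
    """Gaussian random walk Metropolis-Hastings sketch for a quasi-posterior.

    log_post: log quasi-posterior up to a constant (e.g. L_n(beta));
    theta0: starting value (the OLS estimates in the text);
    step: proposal standard deviation, tuned via the acceptance rate.
    Returns the post-burn-in draws and the realized acceptance rate.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    draws, accepted = [], 0
    for s in range(B + burn):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept with prob min(1, ratio)
            theta, lp = prop, lp_prop
            accepted += 1
        if s >= burn:
            draws.append(theta.copy())
    return np.array(draws), accepted / (B + burn)

# toy check on a standard normal "quasi-posterior"
draws, acc = rw_metropolis(lambda t: -0.5 * float(t @ t), [0.0], B=5000, step=2.4)
```

The chain's autocorrelation is what makes $\Omega_M$ exceed the naive per-draw variance, so the HAC estimator of the previous section, rather than the plain sample variance of the draws, is needed to measure the Monte-Carlo error.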
TABLE 1. Monte-Carlo Error in Censored Quantile Regression (based on 1,000 repetitions)

                                            n = 800                 n = 200
                                      B = 1000   B = 4000     B = 1000   B = 4000
Ratio of MC Error                       0.0672     0.0209       0.0941     0.0304
Coverage of S.E.                        0.8130     0.8690       0.7750     0.8340
Coverage of Corrected S.E.              0.8320     0.8760       0.8130     0.8460
Normalized RMSE of $\hat{\Omega}_{M,11}$   0.1198     0.0916       0.1494     0.1113
We can observe that the Monte-Carlo error accounts for a non-trivial proportion of the actual mean squared error of the QBE. Obviously, this problem is most pronounced when we have a relatively small number of MCMC draws. By incorporating the Monte-Carlo error, we can achieve some improvement in the coverage probabilities of confidence intervals. The estimation of $\Omega_M$ is done by setting $S_T = 150$ for the $B = 1000$ case and $S_T = 250$ for the $B = 4000$ case. This is rather arbitrary, but considering the nature of geometric mixing, these numbers can be deemed reasonable. As for the kernel, the Bartlett kernel is employed. Despite the relatively large number of MCMC draws, the accuracy of $\hat{\Omega}_M$ is not so impressive. This is largely due to bias in the estimation, potentially caused by the low acceptance rate of the Metropolis-Hastings chain.

The sample size $n$ has two counteracting effects on the importance of the Monte-Carlo error. As the sample size $n$ grows, the precision of the QBE improves and thus the relative size of the Monte-Carlo error increases. However, a growing sample size also shrinks the quasi-posterior at the rate of $\sqrt{n}$. A shrunk quasi-posterior reduces the Monte-Carlo error by construction of the Metropolis-Hastings chain, i.e., the chain will bounce within a narrower space so that the variance gets reduced. These counteracting effects make the Monte-Carlo error relevant for any sample size.
2.6. CONCLUSION

QBE provides a simple alternative to a general class of extremum estimators when direct optimization is cumbersome or infeasible in practice. QBE utilizes MCMC to calculate the mean of the quasi-posterior, and therefore numerical error, or Monte-Carlo error, arises. However, the Monte-Carlo error is rarely accounted for in practice. This paper quantifies the Monte-Carlo error and provides a method to incorporate such error into statistical inference for quasi-Bayesian estimators. A central limit theorem is established for the random walk Metropolis-Hastings algorithm in QBE problems, and a consistent estimator for the Monte-Carlo error is provided.

It turns out that the Monte-Carlo error can be a substantial part of the actual standard error of quasi-Bayesian estimators. Adjusting standard errors to incorporate the Monte-Carlo error results in better statistical inference, i.e., more accurate coverage of confidence intervals. The practical relevance of the Monte-Carlo error depends purely on the nature of the statistical model, and one cannot be sure about it ex ante. Unless a researcher can generate a large number of MCMC draws cheaply, he or she should take a close look at the Monte-Carlo error and incorporate it into statistical inference procedures.
Bibliography
[1] Donald W. K. Andrews. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica: Journal of the Econometric Society, pages 817-858, 1991.
[2] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. Handbook of Markov Chain Monte Carlo. CRC Press, 2011.
[3] Moshe Buchinsky and Jinyong Hahn. An alternative estimator for the censored quantile regression model. Econometrica, pages 653-671, 1998.
[4] Victor Chernozhukov and Han Hong. Three-step censored quantile regression and extramarital affairs. Journal of the American Statistical Association, 97(459), 2002.
[5] Victor Chernozhukov and Han Hong. An MCMC approach to classical estimation. Journal of Econometrics, 115(2):293-346, 2003.
[6] A. Ronald Gallant and Halbert White. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York, 1988.
[7] A. Gelman, G. Roberts, and W. Gilks. Efficient Metropolis jumping rules. Bayesian Statistics, 5:599-608, 1996.
[8] Galin L. Jones. On the Markov chain central limit theorem. Probability Surveys, 1:299-320, 2004.
[9] Claude Kipnis and S. R. Srinivasa Varadhan. Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions. Communications in Mathematical Physics, 104(1):1-19, 1986.
[10] Kerrie L. Mengersen and Richard L. Tweedie. Rates of convergence of the Hastings and Metropolis algorithms. The Annals of Statistics, 24(1):101-121, 1996.
[11] Sean P. Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2009.
[12] Whitney K. Newey and Daniel McFadden. Large sample estimation and hypothesis testing. Handbook of Econometrics, 4:2111-2245, 1994.
[13] James L. Powell. Censored regression quantiles. Journal of Econometrics, 32(1):143-155, 1986.
[14] Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer, 2004.
[15] Gareth O. Roberts and Jeffrey S. Rosenthal. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability, pages 458-475, 2007.
[16] Gareth O. Roberts and Jeffrey S. Rosenthal. Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science, 16(4):351-367, 2001.
[17] Gareth O. Roberts and Richard L. Tweedie. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika, 83(1):95-110, 1996.
CHAPTER 3

PANEL DATA MODELS WITH NONADDITIVE UNOBSERVED HETEROGENEITY: ESTIMATION AND INFERENCE¹

Coauthored with Iván Fernández-Val

¹This chapter is published in Quantitative Economics, November 2013.
3.1. INTRODUCTION
This paper considers estimation and inference in linear and nonlinear panel data models
with random coefficients and endogenous regressors. The quantities of interest are means,
variances, and other moments of the distribution of the random coefficients. In a state level
panel model of rational addiction, for example, we might be interested in the mean and variance of the distribution of the price effect on cigarette consumption across states, controlling
for endogenous past and future consumptions. These models pose important challenges in
estimation and inference if the relation between the regressors and random coefficients is
left unrestricted. Fixed effects methods based on GMM estimators applied separately to
the time series of each individual can be severely biased due to the incidental parameter
problem. The source of the bias is the finite-sample bias of GMM if some of the regressors are endogenous or the model is nonlinear in parameters, or nonlinearities if the parameter of interest is the variance or another higher-order moment of the random coefficients. Neglecting the heterogeneity and imposing fixed coefficients does not solve the problem, because the resulting estimators are generally inconsistent for the mean of the random coefficients (Yitzhaki, 1996, and Angrist, Graddy and Imbens, 2000).¹ Moreover, imposing fixed coefficients does
not allow us to estimate other moments of the distribution of the random coefficients.
We introduce a class of bias-corrected panel fixed effects GMM estimators. Thus, instead
of imposing fixed coefficients, we estimate different coefficients for each individual using the
time series observations and correct for the resulting incidental parameter bias. For linear
models, in addition to the bias correction, these estimators differ from the standard fixed
effects estimators in that both the intercept and the slopes are different for each individual.
Moreover, unlike for the classical random coefficient estimators, they do not rely on any
restriction in the relationship between the regressors and random coefficients; see Hsiao and
Pesaran (2004) for a recent survey on random coefficient models. This flexibility allows us
to account for Roy (1951) type selection where the regressors are decision variables with
levels determined by their returns. Linear models with Roy selection are commonly referred
to as correlated random coefficient models in the panel data literature. In the presence of
endogenous regressors, treating the random coefficients as fixed effects is also convenient to
overcome the identification problems in these models pointed out by Kelejian (1974).
The most general models we consider are semiparametric in the sense that the distribution of the random coefficients is unspecified and the parameters are identified from moment
conditions. These conditions can be nonlinear functions in parameters and variables, accommodating both linear and nonlinear random coefficient models, and allowing for the presence
of time varying endogeneity in the regressors not captured by the random coefficients. We
¹Heckman and Vytlacil (2000) and Angrist (2004) find sufficient conditions for fixed coefficient OLS and IV estimators to be consistent for the average coefficient.
use the moment conditions to estimate the model parameters and other quantities of interest
via GMM methods applied separately to the time series of each individual. The resulting
estimates can be severely biased in short panels due to the incidental parameters problem,
which in this case is a consequence of the finite-sample bias of GMM (Newey and Smith,
2004) and/or the nonlinearity of the quantities of interest in the random coefficients. We
develop analytical corrections to reduce the bias.
To derive the bias corrections, we use higher-order expansions of the GMM estimators,
extending the analysis in Newey and Smith (2004) for cross sectional estimators to panel data
estimators with fixed effects and serial dependence. If n and T denote the cross sectional
and time series dimensions of the panel, the corrections remove the leading term of the bias
of order $O(T^{-1})$, and center the asymptotic distribution at the true parameter value under sequences where $n$ and $T$ grow at the same rate. This approach is aimed at performing well in
econometric applications that use moderately long panels, where the most important part
of the bias is captured by the first term of the expansion. Other previous studies that used
a similar approach for the analysis of linear and nonlinear fixed effects estimators in panel
data include, among others, Kiviet (1995), Phillips and Moon (1999), Alvarez and Arellano
(2003), Hahn and Kuersteiner (2002), Lancaster (2002), Woutersen (2002), Hahn and Newey
(2004), and Hahn and Kuersteiner (2011). See Arellano and Hahn (2007) for a survey of this
literature and additional references.
A first distinctive feature of our corrections is that they can be used in overidentified models where the number of moment restrictions is greater than the dimension of the parameter
vector. This situation is common in economic applications such as rational expectation models. Overidentification complicates the analysis by introducing an initial stage for estimating
optimal weighting matrices to combine the moment conditions, and precludes the use of
the existing methods. For example, Hahn and Newey's (2004) and Hahn and Kuersteiner's
(2011) general bias reduction methods for nonlinear panel data models do not cover optimal
two-step GMM estimators. A second distinctive feature is that our results are specifically
developed for models with multidimensional nonadditive heterogeneity, whereas the previous studies focused mostly on models with additive heterogeneity captured by a scalar
individual effect. Exceptions include Arellano and Hahn (2006) and Bester and Hansen
(2008), which also considered multidimensional heterogeneity, but they focus on parametric
likelihood-based panel models with exogenous regressors. Bai (2009) analyzed related linear
panel models with exogenous regressors and multidimensional interactive individual effects.
Bai's nonadditive heterogeneity allows for interaction between individual effects and unobserved factors, whereas the nonadditive heterogeneity that we consider allows for interaction
between individual effects and observed regressors. A third distinctive feature of our analysis is the focus on moments of the distribution of the individual effects as one of the main
quantities of interest.
We illustrate the applicability of our methods with empirical and numerical examples
based on the cigarette demand application of Becker, Grossman and Murphy (1994). Here,
we estimate a linear rational addictive demand model with state-specific coefficients for price
and common parameters for the other regressors using a panel data set of U.S. states. We find
that standard estimators that do not account for non-additive heterogeneity by imposing a
constant coefficient for price can have important biases for the common parameters, mean of
the price coefficient and demand elasticities. The analytical bias corrections are effective in
removing the bias of the estimates of the mean and standard deviation of the price coefficient.
Figure 1 gives a preview of the empirical results. It plots a normal approximation to the
distribution of the price effect based on uncorrected and bias corrected estimates of the
mean and standard deviation of the distribution of the price coefficient. The figure shows
that there is important heterogeneity in the price effect across states. The bias correction
reduces by more than 15% the absolute value of the estimate of the mean effect and by 30%
the estimate of the standard deviation.
Some of the results for the linear model are related to the recent literature on correlated
random coefficient panel models with fixed T. Graham and Powell (2008) gave identification
and estimation results for average effects. Arellano and Bonhomme (2010) studied identification of the distributional characteristics of the random coefficients in exogenous linear
models. None of these papers considered the case where some of the regressors have time
varying endogeneity not captured by the random coefficients or the model is nonlinear. For
nonlinear models, Chernozhukov, Fernández-Val, Hahn and Newey (2010) considered identification and estimation of average and quantile treatment effects. Their nonparametric and
semiparametric bounds do not require large-T, but they do not cover models with continuous
regressors and time varying endogeneity.
The rest of the paper is organized as follows. Section 2 illustrates the type of models
considered and discusses the nature of the bias in two examples. Section 3 introduces the
general model and fixed effects GMM estimators. Section 4 derives the asymptotic properties
of the estimators. The bias corrections and their asymptotic properties are given in Section
5. Section 6 describes the empirical and numerical examples. Section 7 concludes with a
summary of the main results. Additional numerical examples, proofs and other technical
details are given in the online supplementary appendix, Fernández-Val and Lee (2012).
3.2. MOTIVATING EXAMPLES
In this section we describe in detail two simple examples to illustrate the nature of the bias
problem. The first example is a linear correlated random coefficient model with endogenous
regressors. We show that averaging IV estimators applied separately to the time series of each
individual is biased for the mean of the random coefficients because of the finite-sample bias
of IV. The second example considers estimation of the variance of the individual coefficients
in a simple setting without endogeneity. Here the sample variance of the estimators of the
individual coefficients is biased because of the non-linearity of the variance operator in the
individual coefficients. The discussion in this section is heuristic leaving to Section 4 the
specification of precise regularity conditions for the validity of the asymptotic expansions
used.
3.2.1. Correlated random coefficient model with endogenous regressors.
Consider the following panel model:
\[
y_{it} = \alpha_{0i} + \alpha_{1i} x_{it} + \epsilon_{it}, \quad (i = 1, \ldots, n; \; t = 1, \ldots, T); \tag{2.1}
\]
where $y_{it}$ is a response variable, $x_{it}$ is an observable regressor, $\epsilon_{it}$ is an unobservable error term, and $i$ and $t$ usually index individual and time period, respectively.² This is a linear random coefficient model where the effect of the regressor is heterogeneous across individuals, but no restriction is imposed on the distribution of the individual effect vector $\alpha_i := (\alpha_{0i}, \alpha_{1i})'$. The regressor can be correlated with the error term, and a valid instrument $(1, z_{it})$ is available for $(1, x_{it})$; that is, $E[\epsilon_{it} \mid \alpha_i] = 0$, $E[z_{it}\epsilon_{it} \mid \alpha_i] = 0$ and $\mathrm{Cov}[z_{it}, x_{it} \mid \alpha_i] \ne 0$. An important example of this model is the panel version of the treatment-effect model (Wooldridge, 2002, Chapter 10.2.3, and Angrist and Hahn, 2004). Here, the objective is to evaluate the effect of a treatment ($D$) on an outcome variable ($Y$). The average causal effect for each level of treatment is defined as the difference between the potential outcome that the individual would obtain with and without the treatment, $Y_d - Y_0$. If individuals can choose the level of treatment, potential outcomes and levels of treatment are generally correlated. An instrumental variable $Z$ can be used to identify the causal effect. If potential outcomes are represented as the sum of permanent individual components and transitory individual-time specific shocks, that is, $Y_{jit} = Y_{ji} + \epsilon_{jit}$ for $j \in \{0, 1\}$, then we can write this model as a special case of (2.1) with $y_{it} = (1 - D_{it})Y_{0it} + D_{it}Y_{1it}$, $\alpha_{0i} = Y_{0i}$, $\alpha_{1i} = Y_{1i} - Y_{0i}$, $x_{it} = D_{it}$, $z_{it} = Z_{it}$, and $\epsilon_{it} = (1 - D_{it})\epsilon_{0it} + D_{it}\epsilon_{1it}$.
²More generally, $i$ denotes a group index and $t$ indexes the observations within the group. Examples of groups include individuals, states, households, schools, or twins.

Suppose that we are ultimately interested in $\alpha_1 := E[\alpha_{1i}]$, the mean of the random slope coefficient. We could neglect the heterogeneity and run fixed effects OLS and IV regressions in
\[
y_{it} = \alpha_{0i} + \alpha_1 x_{it} + u_{it},
\]
where $u_{it} = x_{it}(\alpha_{1i} - \alpha_1) + \epsilon_{it}$ in terms of the model (2.1). In this case, OLS and IV estimate
weighted means of the random coefficients in the population; see, for example, Yitzhaki
(1996) and Angrist and Krueger (1999) for OLS, and Angrist, Graddy and Imbens (2000)
for IV. OLS puts more weight on individuals with higher variances of the regressor because
they give more information about the slope; whereas IV weighs individuals in proportion to
the variance of the first stage fitted values because these variances reflect the amount of information that the individuals convey about the part of the slope affected by the instrument.
These weighted means are generally different from the mean effect because the weights can
be correlated with the individual effects.
To see how these implicit OLS and IV weighting schemes affect the estimand of the fixed-coefficient estimators, assume for simplicity that the relationship between $x_{it}$ and $z_{it}$ is linear, that is, $x_{it} = \pi_{0i} + \pi_{1i} z_{it} + v_{it}$, $(\epsilon_{it}, v_{it})$ is normal conditional on $(z_{it}, \alpha_i, \pi_i)$, $z_{it}$ is independent of $(\alpha_i, \pi_i)$, and $(\alpha_i, \pi_i)$ is normal, for $\pi_i := (\pi_{0i}, \pi_{1i})'$. Then, the probability limits of the OLS and IV estimators are³
\[
\alpha_1^{OLS} = \alpha_1 + \{\mathrm{Cov}[\epsilon_{it}, v_{it}] + 2E[\pi_{1i}]\mathrm{Var}[z_{it}]\mathrm{Cov}[\alpha_{1i}, \pi_{1i}]\}/\mathrm{Var}[x_{it}],
\]
\[
\alpha_1^{IV} = \alpha_1 + \mathrm{Cov}[\alpha_{1i}, \pi_{1i}]/E[\pi_{1i}].
\]
These expressions show that the OLS estimand differs from the average coefficient in the presence of endogeneity, i.e., nonzero correlation between the individual-time specific error terms, or whenever the random coefficients are correlated; the IV estimand differs from the average coefficient only in the latter case.⁴ In the treatment-effects model, correlation between the error terms arises in the presence of endogeneity bias, and correlation between the individual effects arises under Roy-type selection, i.e., when individuals who experience a higher permanent effect of the treatment are relatively more prone to accept the offer of treatment. Wooldridge (2005) and Murtazashvili and Wooldridge (2005) give sufficient conditions for consistency of standard OLS and IV fixed effects estimators. These conditions amount to $\mathrm{Cov}[\epsilon_{it}, v_{it}] = 0$ and $\mathrm{Cov}[x_{it}, (\alpha_{0i}, \alpha_{1i})] = 0$.
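The weighting argument can be seen in a small simulation (my own illustration; the design, with the within-individual regressor variance $e^{\alpha_{1i}}$ deliberately correlated with the random slope and no time-varying endogeneity, is chosen only to make the point visible). Pooled fixed effects OLS converges to a variance-weighted mean of the slopes, while averaging individual OLS slopes recovers $E[\alpha_{1i}]$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 500, 50
alpha1 = rng.normal(1.0, 1.0, size=n)             # random slopes, E[alpha_1i] = 1
sd_x = np.exp(alpha1 / 2)                         # Var(x_it | i) = e^{alpha_1i}
x = sd_x[:, None] * rng.standard_normal((n, T))   # regressor variance rises with slope
eps = rng.standard_normal((n, T))                 # exogenous errors
y = alpha1[:, None] * x + eps                     # intercepts set to 0 for simplicity

xd = x - x.mean(axis=1, keepdims=True)            # within-individual demeaning
yd = y - y.mean(axis=1, keepdims=True)

pooled_fe = (xd * yd).sum() / (xd ** 2).sum()     # fixed-coefficient FE-OLS
mean_group = ((xd * yd).sum(axis=1) / (xd ** 2).sum(axis=1)).mean()
```

In this design the pooled estimand is $E[e^{\alpha_{1i}}\alpha_{1i}]/E[e^{\alpha_{1i}}] = 2$ (the lognormal weights tilt the mean up by the variance of the slope), while the mean-group estimate stays near the true average of $1$.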
Our proposal is to estimate the mean coefficient from separate time series estimators
for each individual. This strategy consists of running OLS or IV for each individual, and
then estimating the population moment of interest by the corresponding sample moment
³The limit of the IV estimator is obtained from a first stage equation that also imposes fixed coefficients, that is, $x_{it} = \pi_0 + \pi_1 z_{it} + w_{it}$, where $w_{it} = z_{it}(\pi_{1i} - \pi_1) + v_{it}$. When the first stage equation is different for each individual, the limit of the IV estimator is
\[
\alpha_1^{IV} = \alpha_1 + 2E[\pi_{1i}]\mathrm{Cov}[\alpha_{1i}, \pi_{1i}]/\{E[\pi_{1i}]^2 + \mathrm{Var}[\pi_{1i}]\}.
\]
See Theorems 2 and 3 in Angrist and Imbens (1995) for a related discussion.
⁴This feature of the IV estimator is also pointed out in Angrist, Graddy and Imbens (1999), p. 507.
of the individual estimators. For example, the mean of the random slope coefficient in the population is estimated by the sample average of the OLS or IV slopes. These sample moments converge to the population moments of interest as the number of individuals $n$ and time periods $T$ grow. However, since a different coefficient is estimated for each individual, the asymptotic distribution of the sample moments can have asymptotic bias due to the incidental parameter problem (Neyman and Scott, 1948).

To illustrate the nature of this bias, consider the estimator of the mean coefficient $\alpha_1$ constructed from individual time series IV estimators. In this case the incidental parameter problem is caused by the finite-sample bias of IV. This can be explained using some expansions. Thus, assuming independence across $t$, standard higher-order asymptotics gives (e.g., Rilstone et al., 1996), as $T \to \infty$,
\[
\hat{\alpha}_{1i}^{IV} - \alpha_{1i} = \frac{\psi_{iT}}{\sqrt{T}} + \frac{\beta_i}{T} + o_p(T^{-1}),
\]
where $\psi_{iT} = E[\tilde{z}_{it}\tilde{x}_{it} \mid \alpha_i, \pi_i]^{-1} T^{-1/2}\sum_{t=1}^{T} \tilde{z}_{it}\epsilon_{it}$ is the influence function of IV, $\beta_i = -E[\tilde{z}_{it}\tilde{x}_{it} \mid \alpha_i, \pi_i]^{-2} E[\tilde{z}_{it}^2\epsilon_{it} \mid \alpha_i, \pi_i]$ is the higher-order bias of IV (see, e.g., Nagar, 1959, and Buse, 1992), and the variables with tilde are in deviation from their individual means, e.g., $\tilde{z}_{it} = z_{it} - E[z_{it} \mid \alpha_i, \pi_i]$. In the previous expression the first order asymptotic distribution of the individual estimator is centered at the truth since $\sqrt{T}(\hat{\alpha}_{1i}^{IV} - \alpha_{1i}) \to_d N(0, \sigma_i^2)$ as $T \to \infty$, where
\[
\sigma_i^2 = E[\tilde{z}_{it}\tilde{x}_{it} \mid \alpha_i, \pi_i]^{-2} E[\tilde{z}_{it}^2\epsilon_{it}^2 \mid \alpha_i, \pi_i].
\]
Let $\hat{\alpha}_1 = n^{-1}\sum_{i=1}^{n}\hat{\alpha}_{1i}^{IV}$, the sample average of the IV estimators. The asymptotic distribution of $\hat{\alpha}_1$ is not centered around $\alpha_1$ in short panels, or more precisely under asymptotic sequences where $T/\sqrt{n} \to 0$. To see this, consider the expansion for $\hat{\alpha}_1$:
\[
\sqrt{n}(\hat{\alpha}_1 - \alpha_1) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\alpha_{1i} - \alpha_1) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\hat{\alpha}_{1i}^{IV} - \alpha_{1i}).
\]
The first term is the standard influence function for a sample mean of known elements. The second term comes from the estimation of the individual elements inside the sample mean. Assuming independence across $i$ and combining the previous expansions,
\[
\sqrt{n}(\hat{\alpha}_1 - \alpha_1) = \underbrace{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\alpha_{1i} - \alpha_1)}_{= O_p(1)} + \underbrace{\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\psi_{iT}}_{= O_p(1/\sqrt{T})} + \underbrace{\frac{\sqrt{n}}{T}\frac{1}{n}\sum_{i=1}^{n}\beta_i}_{= O_p(\sqrt{n}/T)} + o_p(\sqrt{n}/T).
\]
This expression shows that the bias term dominates the asymptotic distribution of $\hat{\alpha}_1$ in short panels under sequences where $T/\sqrt{n} \to 0$. Averaging reduces the order of the variance of $\hat{\alpha}_1$ without affecting the order of its bias. In this case the estimation of the random coefficients has no first order effect on the asymptotic variance of $\hat{\alpha}_1$ because the second term is of smaller order than the first term.
A potential drawback of the individual-by-individual time series estimation is that it might be more sensitive to weak identification problems than fixed coefficient pooled estimation.⁵ In the random coefficient model, for example, we require that $E[\tilde{z}_{it}\tilde{x}_{it} \mid \alpha_i, \pi_i] \ne 0$, i.e., $\pi_{1i} \ne 0$ with probability one, i.e., for all the individuals, whereas fixed coefficient IV only requires that this condition holds on average, i.e., $E[\pi_{1i}] \ne 0$. The individual estimators are therefore more sensitive than traditional pooled estimators to weak instruments problems. On the other hand, individual-by-individual estimation relaxes the exogeneity condition by conditioning on additive and non-additive time invariant heterogeneity, i.e., $E[z_{it}\epsilon_{it} \mid \alpha_i, \pi_i] = 0$. Traditional fixed effects estimators only condition on additive time invariant heterogeneity. A formal treatment of these identification issues is beyond the scope of this paper.
3.2.2. Variance of individual coefficients. Consider the panel model:
\[
y_{it} = \alpha_i + \epsilon_{it}, \quad \epsilon_{it} \mid \alpha_i \sim (0, \sigma_i^2), \quad \alpha_i \sim (\alpha, \sigma_\alpha^2), \quad (t = 1, \ldots, T; \; i = 1, \ldots, n),
\]
where $y_{it}$ is an outcome variable of interest, which can be decomposed into an individual effect $\alpha_i$ with mean $\alpha$ and variance $\sigma_\alpha^2$, and an error term $\epsilon_{it}$ with zero mean and variance $\sigma_i^2$ conditional on $\alpha_i$. The parameter of interest is $\sigma_\alpha^2 = \mathrm{Var}[\alpha_i]$ and its fixed effects estimator is
\[
\hat{\sigma}_\alpha^2 = (n-1)^{-1}\sum_{i=1}^{n}(\hat{\alpha}_i - \hat{\alpha})^2,
\]
where $\hat{\alpha}_i = T^{-1}\sum_{t=1}^{T} y_{it}$ and $\hat{\alpha} = n^{-1}\sum_{i=1}^{n}\hat{\alpha}_i$. Assuming independence across $i$ and $t$, a standard asymptotic expansion gives, as $n, T \to \infty$,
\[
\sqrt{n}(\hat{\sigma}_\alpha^2 - \sigma_\alpha^2) = \underbrace{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[(\alpha_i - \alpha)^2 - \sigma_\alpha^2\right]}_{= O_p(1)} + \underbrace{\frac{2}{\sqrt{n}}\sum_{i=1}^{n}(\alpha_i - \alpha)\bar{\epsilon}_i}_{= O_p(1/\sqrt{T})} + \underbrace{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bar{\epsilon}_i^{\,2}}_{= O_p(\sqrt{n}/T)} + o_p(1),
\]
where $\bar{\epsilon}_i = T^{-1}\sum_{t=1}^{T}\epsilon_{it}$. The first term corresponds to the influence function of the sample variance if the $\alpha_i$'s were known. The second term comes from the estimation of the $\alpha_i$'s. The third term is a bias term that comes from the nonlinearity of the variance in $\hat{\alpha}_i$. The bias term dominates the expansion in short panels under sequences where $T/\sqrt{n} \to 0$. As in the previous example, the estimation of the $\alpha_i$'s has no first order effect on the asymptotic variance since the second term is of smaller order than the first term.
⁵We thank a referee for pointing out this issue.
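The order-$T^{-1}$ bias in the variance example can be made concrete in a simulation (my own sketch; the specific design and the simple plug-in correction are illustrative, not taken from the paper's later sections). With $\sigma_\alpha^2 = 1$ and $\sigma_i^2 = 4$ for all $i$, the fixed effects estimator is biased upward by approximately $E[\sigma_i^2]/T = 0.4$ when $T = 10$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2000, 10
sigma2_alpha, sigma2_eps = 1.0, 4.0

alpha = rng.normal(0.0, np.sqrt(sigma2_alpha), size=n)
y = alpha[:, None] + np.sqrt(sigma2_eps) * rng.standard_normal((n, T))

a_hat = y.mean(axis=1)             # alpha_i hat (individual time series means)
s2_i = y.var(axis=1, ddof=1)       # unbiased within estimate of sigma_i^2
var_fe = a_hat.var(ddof=1)         # fixed effects estimator, E = 1 + 4/T = 1.4
var_bc = var_fe - s2_i.mean() / T  # remove the estimated E[sigma_i^2]/T bias
```

Subtracting the estimated $n^{-1}\sum_i \hat{\sigma}_i^2/T$ removes the leading bias term; this is the same logic as the analytical corrections developed later in the chapter, here in its simplest possible form.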
3.3. THE MODEL AND ESTIMATORS

We consider a general model with a finite number of moment conditions $d_g$. To describe it, let the data be denoted by $z_{it}$ $(i = 1, \ldots, n; \; t = 1, \ldots, T)$. We assume that $z_{it}$ is independent over $i$ and stationary and strongly mixing over $t$. Also, let $\theta$ be a $d_\theta$-vector of common parameters, $\{\alpha_i : 1 \le i \le n\}$ be a sequence of $d_\alpha$-vectors with the realizations of the individual effects, and $g(z; \theta, \alpha_i)$ be a $d_g$-vector of functions, where $d_g \ge d_\theta + d_\alpha$.⁶ The model has true parameters $\theta_0$ and $\{\alpha_{i0} : 1 \le i \le n\}$, satisfying the moment conditions
\[
E[g(z_{it}; \theta_0, \alpha_{i0})] = 0, \quad (t = 1, \ldots, T; \; i = 1, \ldots, n),
\]
where $E[\cdot]$ denotes the conditional expectation with respect to the distribution of $z_{it}$ conditional on the individual effects.
Let $\bar{E}[\cdot]$ denote the expectation taken with respect to the distribution of the individual effects. In the previous model, the ultimate quantities of interest are smooth functions of parameters and observations, which in some cases could be the parameters themselves,
\[
\zeta = \bar{E}\,E[\zeta(z_{it}; \theta_0, \alpha_{i0})],
\]
if $\bar{E}\,E|\zeta(z_{it}; \theta_0, \alpha_{i0})| < \infty$, or moments or other smooth functions of the individual effects,
\[
\mu = \bar{E}[\mu(\alpha_{i0})],
\]
if $\bar{E}|\mu(\alpha_{i0})| < \infty$. In the correlated random coefficient example, $g(z_{it}; \theta_0, \alpha_{i0}) = z_{it}(y_{it} - \alpha_{0i0} - \alpha_{1i0}x_{it})$, $d_\theta = 0$, $d_\alpha = 2$, and $\mu(\alpha_{i0}) = \alpha_{1i0}$. In the variance of the random coefficients example, $g(z_{it}; \theta_0, \alpha_{i0}) = y_{it} - \alpha_{i0}$, $d_\theta = 0$, $d_\alpha = 1$, and $\mu(\alpha_{i0}) = (\alpha_{i0} - \bar{E}[\alpha_{i0}])^2$.
Some more notation, which will be extensively used in the definition of the estimators and in the analysis of their asymptotic properties, is the following:
\[
\Omega_{j,i}(\theta, \alpha_i) := E[g(z_{it}; \theta, \alpha_i)g(z_{i,t-j}; \theta, \alpha_i)'], \quad j \in \{0, 1, 2, \ldots\},
\]
\[
G_{\theta,i}(\theta, \alpha_i) := E[G_\theta(z_{it}; \theta, \alpha_i)] = E[\partial g(z_{it}; \theta, \alpha_i)/\partial\theta'],
\]
\[
G_{\alpha,i}(\theta, \alpha_i) := E[G_\alpha(z_{it}; \theta, \alpha_i)] = E[\partial g(z_{it}; \theta, \alpha_i)/\partial\alpha'],
\]
where the superscript $'$ denotes transpose and higher-order derivatives will be denoted by adding subscripts. Here $\Omega_{j,i}$ is the covariance matrix between the moment conditions for individual $i$ at times $t$ and $t-j$, and $G_{\theta,i}$ and $G_{\alpha,i}$ are time series average derivatives of these conditions.

⁶We impose that some of the parameters are common for all the individuals to help preserve degrees of freedom in estimation of short panels with many regressors. An order condition for this model is that the number of individual specific parameters $d_\alpha$ has to be less than the time dimension $T$.
Analogously, for sample moments,
\[
\hat{\Omega}_{j,i}(\theta, \alpha_i) := T^{-1}\sum_{t=j+1}^{T} g(z_{it}; \theta, \alpha_i)g(z_{i,t-j}; \theta, \alpha_i)', \quad j \in \{0, 1, \ldots, T-1\},
\]
\[
\hat{G}_{\theta,i}(\theta, \alpha_i) := T^{-1}\sum_{t=1}^{T} G_\theta(z_{it}; \theta, \alpha_i) = T^{-1}\sum_{t=1}^{T} \partial g(z_{it}; \theta, \alpha_i)/\partial\theta',
\]
\[
\hat{G}_{\alpha,i}(\theta, \alpha_i) := T^{-1}\sum_{t=1}^{T} G_\alpha(z_{it}; \theta, \alpha_i) = T^{-1}\sum_{t=1}^{T} \partial g(z_{it}; \theta, \alpha_i)/\partial\alpha'.
\]
In the sequel, the arguments of the expressions will be omitted when the functions are
evaluated at the true parameter values (6, a' )', e.g., g(zit) means g(zit; 0 o, ao).
In cross-section and time series models, parameters defined from moment conditions are usually estimated using the two-step GMM estimator of Hansen (1982). To describe how to adapt this method to panel models with fixed effects, let ĝ_i(θ, α_i) := T⁻¹ Σ_{t=1}^T g(z_it; θ, α_i), and let (θ̃′, {α̃_i′}_{i=1}^n)′ be some preliminary one-step FE-GMM estimator, given by

(θ̃′, {α̃_i′}_{i=1}^n)′ = arg inf_{(θ, {α_i}) ∈ Υ} Σ_{i=1}^n ĝ_i(θ, α_i)′ Ŵ_i⁻¹ ĝ_i(θ, α_i),

where Υ ⊆ R^{d_θ + n d_α} denotes the parameter space, and {Ŵ_i : 1 ≤ i ≤ n} is a sequence of positive definite symmetric d_g × d_g weighting matrices. The two-step FE-GMM estimator is the solution to the following program:

(θ̂′, {α̂_i′}_{i=1}^n)′ = arg inf_{(θ, {α_i}) ∈ Υ} Σ_{i=1}^n ĝ_i(θ, α_i)′ Ω̂_i(θ̃, α̃_i)⁻¹ ĝ_i(θ, α_i),

where Ω̂_i(θ̃, α̃_i) is an estimator of the optimal weighting matrix for individual i.
To facilitate the asymptotic analysis, in the estimation of the optimal weighting matrix we assume that g(z_it; θ_0, α_i0) is a martingale difference sequence with respect to the sigma algebra σ(α_i, z_{i,t−1}, z_{i,t−2}, ...), so that Ω_i = Ω_{0,i} and Ω̂_i(θ̃, α̃_i) = Ω̂_{0,i}(θ̃, α̃_i). This assumption holds in rational expectation models. We do not impose this assumption to derive the limiting distribution of the one-step FE-GMM estimator.
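The two-step logic above — estimate with a preliminary weighting matrix, form Ω̂_0 from first-step residuals, then re-estimate — can be sketched for a single individual's moment conditions. The linear design below is a hypothetical illustration, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for one individual: y_t = beta0 * x_t + e_t, with two instruments z_t,
# so d_g = 2 > dim(beta) = 1 and the weighting matrix matters.
T, beta0 = 500, 1.0
z = rng.normal(size=(T, 2))                          # instruments
x = z @ np.array([1.0, 0.5]) + rng.normal(size=T)    # regressor driven by z
y = beta0 * x + rng.normal(size=T)

def gmm_beta(W):
    """Linear GMM: minimize gbar(b)' W^{-1} gbar(b) with g_t(b) = z_t (y_t - b x_t)."""
    # gbar(b) = a - b*c with a = z'y/T, c = z'x/T; the FOC gives a closed form.
    a, c = z.T @ y / T, z.T @ x / T
    Wi = np.linalg.inv(W)
    return (c @ Wi @ a) / (c @ Wi @ c)

b1 = gmm_beta(np.eye(2))                             # one-step, W = identity
u = y - b1 * x                                       # first-step residuals
Omega0 = (z * u[:, None]).T @ (z * u[:, None]) / T   # sample Omega_0 at first step
b2 = gmm_beta(Omega0)                                # two-step, optimal weighting
print(b1, b2)
```

Both estimates are consistent for beta0; the second step reuses the same moment functions and only changes the metric on them.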
For the subsequent analysis of the asymptotic properties of the estimator, it is convenient
to consider the concentrated or profile problem. This problem is a two-step procedure. In
the first step the program is solved for the individual effects, given the value of the common
parameter θ. The First Order Conditions (FOC) for this stage, reparametrized conveniently
as in Newey and Smith (2004), are the following
t̂_i(θ, γ̂_i(θ)) := [ Ĝ_{α,i}(θ, α̂_i(θ))′ λ̂_i(θ) ; ĝ_i(θ, α̂_i(θ)) + Ω̂_i(θ̃, α̃_i) λ̂_i(θ) ] = 0  (i = 1, ..., n),
where λ_i is a d_g-vector of individual Lagrange multipliers for the moment conditions, and γ_i := (α_i′, λ_i′)′ is an extended (d_α + d_g)-vector of individual effects. Then, the solutions to the previous equations are plugged into the original problem, leading to the following first order conditions for θ: ŝ(θ̂) = 0, where
ŝ(θ) = n⁻¹ Σ_{i=1}^n ŝ_i(θ, γ̂_i(θ)) = −n⁻¹ Σ_{i=1}^n Ĝ_{θ,i}(θ, α̂_i(θ))′ λ̂_i(θ)

is the profile score function for θ.⁷
Fixed effects estimators of smooth functions of parameters and observations are constructed using the plug-in principle, i.e., ζ̂ = ζ̂(θ̂), where

ζ̂(θ) = (nT)⁻¹ Σ_{i=1}^n Σ_{t=1}^T ζ(z_it; θ, α̂_i(θ)).

Similarly, moments of the individual effects are estimated by μ̂ = μ̂(θ̂), where

μ̂(θ) = n⁻¹ Σ_{i=1}^n μ(α̂_i(θ)).
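The concentration steps — solve the individual FOCs given θ, then solve the profile score in θ — can be sketched in a simple exactly identified design (hypothetical, chosen so both steps have closed forms):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical linear panel y_it = alpha_i + theta0 * x_it + e_it, used only to
# illustrate profiling; the moments are the least squares scores.
n, T, theta0 = 200, 10, 0.5
alpha = rng.normal(size=(n, 1))
x = rng.normal(size=(n, T)) + alpha          # regressor correlated with alpha_i
y = alpha + theta0 * x + rng.normal(size=(n, T))

def alpha_hat(theta):
    # Step 1: solve the individual FOCs given the common parameter theta.
    return (y - theta * x).mean(axis=1, keepdims=True)

def score(theta):
    # Step 2: profile score for theta, with the alpha_i concentrated out.
    return ((y - alpha_hat(theta) - theta * x) * x).mean()

# The profile score is affine in theta here, so one secant step finds its root.
s0, s1 = score(0.0), score(1.0)
theta_hat = 0.0 - s0 * (1.0 - 0.0) / (s1 - s0)
print(theta_hat)
```

In this linear case the profiled solution coincides with the within (fixed effects) estimator; in nonlinear models the inner step is an optimization per individual.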
3.4. ASYMPTOTIC THEORY FOR FE-GMM ESTIMATORS
In this section we analyze the properties of one-step and two-step FE-GMM estimators in
large samples. We show consistency and derive the asymptotic distributions for estimators
of individual effects, common parameters and other quantities of interest under sequences
where both n and T pass to infinity with the sample size. We establish results separately
for one-step and two-step estimators because the former are derived under less restrictive
assumptions.
We make the following assumptions to show uniform consistency of the FE-GMM one-step
estimator:
Condition 1 (Sampling and asymptotics). (i) For each i, conditional on α_i, {z_it : 1 ≤ t ≤ T} is a stationary mixing sequence of random vectors with strong mixing coefficients a_i(l) = sup_t sup_{A ∈ 𝒜_t^i, D ∈ 𝒟_{t+l}^i} |P(A ∩ D) − P(A)P(D)|, where 𝒜_t^i = σ(α_i, z_it, z_{i,t−1}, ...) and 𝒟_t^i = σ(α_i, z_it, z_{i,t+1}, ...), such that sup_i |a_i(l)| ≤ C a^l for some 0 < a < 1 and some C > 0; (ii) {(z_i, α_i) : 1 ≤ i ≤ n} are independent and identically distributed across i; (iii) n, T → ∞ such that n/T → κ², where 0 < κ² < ∞; and (iv) dim[g(·; θ, α_i)] = d_g < ∞.
7 In the original parametrization, the FOC can be written as

n⁻¹ Σ_{i=1}^n Ĝ_{θ,i}(θ, α̂_i(θ))′ Ω̂_i(θ̃, α̃_i)⁻ ĝ_i(θ, α̂_i(θ)) = 0,

where the superscript ⁻ denotes a generalized inverse.
For a matrix or vector A, let |A| denote the Euclidean norm, that is, |A|² = trace[AA′].
Condition 2 (Regularity and identification). (i) The vector of moment functions g(·; θ, α) = (g_1(·; θ, α), ..., g_{d_g}(·; θ, α))′ is continuous in (θ, α) ∈ Υ; (ii) the parameter space Υ is a compact, convex subset of R^{d_θ + d_α}; (iii) dim(θ, α_i) = d_θ + d_α ≤ d_g; (iv) there exists a function M(z_it) such that |g_k(z_it; θ, α_i)| ≤ M(z_it), |∂g_k(z_it; θ, α_i)/∂(θ, α_i)| ≤ M(z_it), for k = 1, ..., d_g, and sup_i E[M(z_it)^{4+δ}] < ∞ for some δ > 0; and (v) there exists a deterministic sequence of symmetric finite positive definite matrices {W_i : 1 ≤ i ≤ n} such that sup_{1≤i≤n} |Ŵ_i − W_i| →_P 0, and, for each η > 0,

inf_i [ Q_i^W(θ_0, α_i0) − sup_{(θ,α): |(θ,α)−(θ_0,α_i0)| > η} Q_i^W(θ, α) ] > 0,

where Q_i^W(θ, α_i) := −g_i(θ, α_i)′ W_i⁻¹ g_i(θ, α_i) and g_i(θ, α_i) := E[ĝ_i(θ, α_i)].
Conditions 1(i)-(ii) impose cross sectional independence, but allow for weak time series dependence as in Hahn and Kuersteiner (2011). Conditions 1(iii)-(iv) describe the asymptotic sequences that we consider, where T and n grow at the same rate with the sample size, whereas the number of moments d_g is fixed. Condition 2 adapts standard assumptions of the GMM literature to guarantee the identification of the parameters based on time series variation for all the individuals; see Newey and McFadden (1994). The dominance and moment conditions in 2(iv) are used to establish uniform consistency of the estimators of the individual effects.
Theorem 1 (Uniform consistency of one-step estimators). Suppose that Conditions 1 and 2 hold. Then, for any η > 0,

Pr(|θ̃ − θ_0| > η) = o(T⁻¹),

where θ̃ = arg max_{(θ, {α_i}) ∈ Υ} Σ_{i=1}^n Q̂_i^W(θ, α_i) and Q̂_i^W(θ, α_i) := −ĝ_i(θ, α_i)′ Ŵ_i⁻¹ ĝ_i(θ, α_i). Also, for any η > 0,

Pr(sup_i |α̃_i − α_i0| > η) = o(T⁻¹) and Pr(sup_i |λ̃_i| > η) = o(T⁻¹),

where α̃_i = arg max_{α_i} Q̂_i^W(θ̃, α_i) and ĝ_i(θ̃, α̃_i) + Ŵ_i λ̃_i = 0.

Let Σ_i^W := (G_{α,i}′ W_i⁻¹ G_{α,i})⁻¹, H_i^W := Σ_i^W G_{α,i}′ W_i⁻¹, P_i^W := W_i⁻¹ − W_i⁻¹ G_{α,i} H_i^W, J_i^W := G_{θ,i}′ P_i^W G_{θ,i}, and J^W := E[J_i^W]. We use the following additional assumptions to derive the limiting distribution of the one-step estimator:
Condition 3 (Regularity). (i) For each i, (θ_0, α_i0) ∈ int[Υ]; and (ii) J^W is finite positive definite, and {G_{α,i}′ W_i⁻¹ G_{α,i} : 1 ≤ i ≤ n} is a sequence of finite positive definite matrices, where {W_i : 1 ≤ i ≤ n} is the sequence of matrices of Condition 2(v).
Condition 4 (Smoothness). (i) There exists a function M(z_it) such that, for k = 1, ..., d_g,

|∂^{d_1+d_2} g_k(z_it; θ, α_i)/∂θ^{d_1} ∂α_i^{d_2}| ≤ M(z_it),  0 ≤ d_1 + d_2 ≤ 5,

and sup_i E[M(z_it)^{5(d_θ+d_α+6)/(1−10v)+δ}] < ∞, for some δ > 0 and 0 < v < 1/10; and (ii) there exists ξ(z_it) such that Ŵ_i = W_i + Σ_{t=1}^T ξ_i(z_it)/T + R_i^W/T, where max_i |R_i^W| = o_P(√T), E[ξ_i(z_it)] = 0, and sup_i E[|ξ_i(z_it)|^{20/(1−10v)+δ}] < ∞, for some δ > 0 and 0 < v < 1/10.
Condition 3 is the panel data analog of the standard asymptotic normality condition for GMM with cross sectional data; see Newey and McFadden (1994). Condition 4 is similar to Condition 4 in Hahn and Kuersteiner (2011), and guarantees the existence of higher order expansions for the GMM estimators and the uniform convergence of their remainder terms.

Let G_{αα,i} := (G_{αα_1,i}′, ..., G_{αα_{d_α},i}′)′, where G_{αα_j,i} := E[∂G_α(z_it)/∂α_{ij}], and G_{θα,i} := (G_{θα_1,i}′, ..., G_{θα_{d_α},i}′)′, where G_{θα_j,i} := E[∂G_θ(z_it)/∂α_{ij}]. The symbol ⊗ denotes the Kronecker product of matrices, I_d a d × d identity matrix, e_j a unitary d_g-vector with 1 in row j, and P_{i,j}^W the j-th column of P_i^W. Recall that the extended individual effect is γ_i = (α_i′, λ_i′)′.
Lemma 1 (Asymptotic expansion for one-step estimators of individual effects). Under Conditions 1, 2, 3, and 4,

(4.1)  √T(γ̃_i − γ_i0) = ψ_i^W + T^{−1/2} Q_{1i}^W + T⁻¹ R_{1i}^W,

where

ψ_i^W := −(H_i^W; P_i^W) T^{−1/2} Σ_{t=1}^T g(z_it) →_d N(0, V_i^W),  V_i^W = (H_i^W; P_i^W) Ω_i (H_i^W; P_i^W)′,

with (H_i^W; P_i^W) denoting the (d_α + d_g) × d_g matrix that stacks H_i^W over P_i^W; n^{−1/2} Σ_{i=1}^n ψ_i^W →_d N(0, E[V_i^W]); E[Q_{1i}^W] = B_i^W = B_i^{W,I} + B_i^{W,G} + B_i^{W,1S}; and sup_{1≤i≤n} |R_{1i}^W| = O_P(√T). The component B_i^{W,I} collects the higher-order bias terms of a GMM estimator with the optimal score, built from spectral averages of the form Σ_{j≥0} E[G_α(z_it)′ H_i^W g(z_{i,t−j})] and the second derivatives G_{αα_j,i}; B_i^{W,G} collects the terms arising from the estimation of G_{α,i}, of the form Σ_{j≥0} E[G_α(z_it)′ P_i^W g(z_{i,t−j})]; and B_i^{W,1S} collects the terms arising from the use of the non-optimal weighting matrix W_i.
Theorem 2 (Limit distribution of one-step estimators of common parameters). Under Conditions 1, 2, 3 and 4,

√(nT)(θ̃ − θ_0) →_d (J^W)⁻¹ N(κ B^W, V^W),

where J^W = E[G_{θ,i}′ P_i^W G_{θ,i}], V^W = E[G_{θ,i}′ P_i^W Ω_i P_i^W G_{θ,i}], B^W = E[B_i^{W,B} + B_i^{W,C} + B_i^{W,V}], and

B_i^{W,B} = −G_{θ,i}′ (B_i^{W,I} + B_i^{W,G} + B_i^{W,1S}),
B_i^{W,C} = −Σ_{j=0}^∞ E[G_θ(z_it)′ P_i^W g(z_{i,t−j})],

and B_i^{W,V} is a variance term built from the second derivatives G_{θα,i}, the matrices H_i^W and Ω_i, and the columns P_{i,j}^W. The expressions for B_i^{W,I}, B_i^{W,G}, and B_i^{W,1S} are given in Lemma 1.
The source of the bias is the non-zero expectation of the profile score of θ at the true parameter value, due to the substitution of the unobserved individual effects by sample estimators. These estimators converge to their true parameter values at rate √T, which is slower than √(nT), the rate of convergence of the estimator of the common parameter. Intuitively, the rate for γ̃_i is √T because only the T observations for individual i convey information about γ_i0. In nonlinear and dynamic models, the slow convergence of the estimators of the individual effects introduces bias in the estimators of the rest of the parameters. The expression of this bias can be explained with an expansion of the score around the true value of the individual effects:⁸

E[ŝ_i^W(θ_0, γ̃_i)] = E[ŝ_i^W] + E[ŝ_{iγ}^W]′ E[γ̃_i − γ_i0] + E[(ŝ_{iγ}^W − E[ŝ_{iγ}^W])′ (γ̃_i − γ_i0)] + Σ_j E[(γ̃_ij − γ_ij0) ŝ_{iγγ_j}^W (γ̃_i − γ_i0)]/2 + o(T⁻¹)
= 0 + B_i^{W,B}/T + B_i^{W,C}/T + B_i^{W,V}/T + o(T⁻¹).
This expression shows that the bias has the same three components as in the MLE case; see Hahn and Newey (2004). The first component, B^{W,B}, comes from the higher-order bias of the estimators of the individual effects. The second component, B^{W,C}, is a correlation term, present because individual effects and common parameters are estimated using the same
8 Using the notation introduced in Section 3, the score is

ŝ^W(θ_0) = n⁻¹ Σ_{i=1}^n ŝ_i^W(θ_0, γ̂_i0) = −n⁻¹ Σ_{i=1}^n Ĝ_{θ,i}(θ_0, α̂_i0)′ λ̂_i0,

where γ̂_i0 = (α̂_i0′, λ̂_i0′)′ is the solution to Ĝ_{α,i}(θ_0, α̂_i0)′ λ̂_i0 = 0 and ĝ_i(θ_0, α̂_i0) + Ŵ_i λ̂_i0 = 0.
observations. The third component, B^{W,V}, is a variance term. The bias of the individual effects, B^{W,B}, can be further decomposed in three terms corresponding to the asymptotic bias for a GMM estimator with the optimal score, B^{W,I}, when W_i is used as the weighting function; the bias arising from the estimation of G_{α,i}, B^{W,G}; and the bias arising from not using an optimal weighting matrix, B^{W,1S}.
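The order-1/T nature of the incidental parameters bias, and its removability by an analytic correction, can be illustrated with a deliberately simple variance example (a stand-in for the general GMM case, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(2)
# Neyman-Scott style illustration: estimating sigma^2 = Var(e_it) after profiling
# out the individual effects alpha_i gives an exact bias of -(sigma^2)/T,
# which the analytic correction T/(T-1) removes.
n, sigma2 = 5000, 1.0
for T in (5, 20):
    y = rng.normal(size=(n, T))                 # alpha_i = 0 w.l.o.g.; e_it ~ N(0, 1)
    resid = y - y.mean(axis=1, keepdims=True)   # profile out alpha_i
    s2_fe = (resid ** 2).mean()                 # fixed effects plug-in estimator
    s2_bc = s2_fe * T / (T - 1)                 # analytic bias correction
    print(T, s2_fe - sigma2, s2_bc - sigma2)
```

The uncorrected bias is roughly −1/T (about −0.2 for T = 5, −0.05 for T = 20), so it shrinks as T grows but dominates the 1/√(nT) sampling noise in short panels, exactly the regime the corrections target.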
We use the following condition to show the consistency of the two-step FE-GMM estimator:
Condition 5 (Smoothness, regularity, and martingale). (i) There exists a function M (zit)
such that jgk(zjt;O,aj)j 5 M(zit), j|gk(zijt;O,aj)/(0,aj)j 5 M(zit), fork = 1,..., 4,
and supi E [M (zit)10(d+d.+6)/(1-10v)+6 < co, for some 6 > 0 and 0 < v < 1/10; (ii)
{Rn : 1 < i < n} is a sequence of finite positive definite matrices; and (iii) for each i,
g(zt; 00, ajo) is a martingale difference sequence with respect to c-(aZi, ,
1
, Zi,t-2,...).
Conditions 5(i)-(ii) are used to establish the uniform consistency of the estimators of the
individual weighting matrices. Condition 5(iii) is convenient to simplify the expressions of
the optimal weighting matrices. It holds, for example, in rational expectation models that
commonly arise in economic applications.
Theorem 3 (Uniform consistency of two-step estimators). Suppose that Conditions 1, 2, 3 and 5 hold. Then, for any η > 0,

Pr(|θ̂ − θ_0| > η) = o(T⁻¹),

where θ̂ = arg max_{(θ, {α_i}) ∈ Υ} Σ_{i=1}^n Q̂_i(θ, α_i) and Q̂_i(θ, α_i) := −ĝ_i(θ, α_i)′ Ω̂_{0,i}(θ̃, α̃_i)⁻¹ ĝ_i(θ, α_i). Also, for any η > 0,

Pr(sup_i |α̂_i − α_i0| > η) = o(T⁻¹) and Pr(sup_i |λ̂_i| > η) = o(T⁻¹),

where α̂_i = arg max_{α_i} Q̂_i(θ̂, α_i) and ĝ_i(θ̂, α̂_i) + Ω̂_{0,i}(θ̃, α̃_i) λ̂_i = 0.
We replace Condition 4 by the following condition to obtain the limit distribution of the two-step estimator:

Condition 6 (Smoothness). There exists a function M(z_it) such that, for k = 1, ..., d_g,

|∂^{d_1+d_2} g_k(z_it; θ, α_i)/∂θ^{d_1} ∂α_i^{d_2}| ≤ M(z_it),  0 ≤ d_1 + d_2 ≤ 5,

and sup_i E[M(z_it)^{10(d_θ+d_α+6)/(1−10v)+δ}] < ∞, for some δ > 0 and 0 < v < 1/10.
Condition 6 guarantees the existence of higher order expansions for the estimators of the
weighting matrices and uniform convergence of their remainder terms. Conditions 5 and 6
are stronger versions of conditions 2(iv), 2(v) and 4. They are presented separately because
they are only needed when there is a first stage where the weighting matrices are estimated.
Let Σ_{α_i} := (G_{α,i}′ Ω_i⁻¹ G_{α,i})⁻¹, H_{α_i} := Σ_{α_i} G_{α,i}′ Ω_i⁻¹, and P_{α_i} := Ω_i⁻¹ − Ω_i⁻¹ G_{α,i} H_{α_i}.
Lemma 2 (Asymptotic expansion for two-step estimators of individual effects). Under Conditions 1, 2, 3, 4, and 5,

(4.2)  √T(γ̂_i − γ_i0) = ψ_i + T^{−1/2} Q_{1i} + T⁻¹ R_{2i},

where

ψ_i := −(H_{α_i}; P_{α_i}) T^{−1/2} Σ_{t=1}^T g(z_it) →_d N(0, V_i),  V_i = diag(Σ_{α_i}, P_{α_i}),

with (H_{α_i}; P_{α_i}) stacking H_{α_i} over P_{α_i}; n^{−1/2} Σ_{i=1}^n ψ_i →_d N(0, E[V_i]); E[Q_{1i}] = B_i^γ = B_i^I + B_i^G + B_i^Ω + B_i^W; and sup_{1≤i≤n} |R_{2i}| = O_P(√T). Here B_i^I is the higher-order bias of a GMM estimator with the optimal score, built from spectral averages of the form Σ_{j≥0} E[G_α(z_it)′ H_{α_i} g(z_{i,t−j})] and the second derivatives G_{αα_j,i}; B_i^G arises from the estimation of G_{α,i}, through terms of the form Σ_{j≥0} E[G_α(z_it)′ P_{α_i} g(z_{i,t−j})]; B_i^Ω arises from the estimation of Ω_i, through terms of the form Σ_{j≥0} E[g(z_it) g(z_it)′ P_{α_i} g(z_{i,t−j})]; and B_i^W arises from the choice of the preliminary one-step estimator.
Theorem 4 (Limit distribution for two-step estimators of common parameters). Under Conditions 1, 2, 3, 4, 5 and 6,

√(nT)(θ̂ − θ_0) →_d J_θ⁻¹ N(κ B_θ, J_θ),

where J_θ = E[G_{θ,i}′ P_{α_i} G_{θ,i}], B_θ = E[B_i^B + B_i^C], B_i^B = −G_{θ,i}′ (B_i^I + B_i^G + B_i^Ω + B_i^W), and B_i^C = −Σ_{j=0}^∞ E[G_θ(z_it)′ P_{α_i} g(z_{i,t−j})]. The expressions for B_i^I, B_i^G, B_i^Ω, and B_i^W are given in Lemma 2.
Theorem 4 establishes that one iteration of the GMM procedure not only improves asymptotic efficiency by reducing the variance of the influence function, but also removes the variance and non-optimal weighting matrix components from the bias. The higher-order bias of the estimators of the individual effects, B_i^γ, now has four components, as in Newey and Smith (2004). These components correspond to the asymptotic bias for a GMM estimator with the optimal score, B_i^I; the bias arising from the estimation of G_{α,i}, B_i^G; the bias arising from the estimation of Ω_i, B_i^Ω; and the bias arising from the choice of the preliminary first-step estimator, B_i^W. An additional iteration of the GMM estimator removes the term B_i^W.
The general procedure for deriving the asymptotic distribution of the FE-GMM estimators consists of several expansions. First, we derive higher-order asymptotic expansions for the estimators of the individual effects, with the common parameter fixed at its true value θ_0. Next, we obtain the asymptotic distribution for the profile score of the common parameter at θ_0 using the expansions of the estimators of the individual effects. Finally, we derive the asymptotic distribution of the estimator of the common parameter by multiplying the asymptotic distribution of the score by the inverse of the limit profile Jacobian matrix. This procedure is detailed in the online appendix Fernández-Val and Lee (2012). Here we characterize the asymptotic bias in a linear correlated random coefficient model with endogenous regressors. Motivated by the numerical and empirical examples that follow, we consider a model where only the variables with common parameter are endogenous, and allow for the moment conditions not to be martingale difference sequences.
Example: Correlated random coefficient model with endogenous regressors. We consider a simplified version of the models in the empirical and numerical examples. The notation is the same as in the theorems discussed above. The moment condition is

g(z_it; θ, α_i) = w_it(y_it − x_1it′ α_i − x_2it′ θ),

where w_it = (x_1it′, w_2it′)′ and z_it = (x_1it′, x_2it′, w_2it′, y_it)′. That is, only the regressors with common coefficients are endogenous. Let ε_it = y_it − x_1it′ α_i0 − x_2it′ θ_0. To simplify the expressions for the bias, we assume that ε_it | w_i, α_i ~ i.i.d. (0, σ_i²) and E[x_2it ε_{i,t−j} | w_i, α_i] = E[x_2it ε_{i,t−j}], for w_i = (w_i1, ..., w_iT)′ and j ∈ {0, 1, ...}. Under these conditions, the optimal weighting matrices are proportional to E[w_it w_it′], which do not depend on θ_0 and α_i0. We can therefore obtain the optimal GMM estimator in one step, using the sample averages T⁻¹ Σ_{t=1}^T w_it w_it′ to estimate the optimal weighting matrices.

In this model, it is straightforward to see that the estimators of the individual effects have no bias, that is, B_i^{W,I} = B_i^{W,G} = B_i^{W,1S} = 0. By linearity of the first order conditions in θ and α_i, B_i^{W,V} = 0. The only source of bias is the correlation between the estimators of θ and α_i. After some straightforward but tedious algebra, this bias simplifies to

B_i^{W,C} = −(d_g − d_α) Σ_{j=−∞}^∞ E[x̃_2it ε_{i,t−j}].

For the limit Jacobian, we find

J^W = E{ E[x̃_2it w̃_2it′] E[w̃_2it w̃_2it′]⁻¹ E[w̃_2it x̃_2it′] },

where variables with a tilde indicate residuals of population linear projections of the corresponding variable on x_1it; for example,

x̃_2it = x_2it − E[x_2it x_1it′] E[x_1it x_1it′]⁻¹ x_1it.
The expression of the bias is

(4.3)  B_θ^W = −(d_g − d_α)(J^W)⁻¹ Σ_{j=−∞}^∞ E[x̃_2it(ỹ_{i,t−j} − x̃_{2i,t−j}′ θ_0)].
In random coefficient models the ultimate quantities of interest are often functions of
the data, model parameters and individual effects. The following corollaries characterize
the asymptotic distributions of the fixed effects estimators of these quantities. The first
corollary applies to averages of functions of the data and individual effects such as average
partial effects and average derivatives in nonlinear models, and average elasticities in linear
models with variables in levels. Section 6 gives an example of these elasticities. The second
corollary applies to averages of smooth functions of the individual effects including means,
variances and other moments of the distribution of these effects. Sections 2 and 6 give
examples of these functions. We state the results only for estimators constructed from two-step estimators of the common parameters and individual effects. Similar results apply to
estimators constructed from one-step estimators. Both corollaries follow from Lemma 2 and
Theorem 4 by the delta method.
Corollary 1 (Asymptotic distribution for fixed effects averages). Let ζ(z; θ, α_i) be a twice continuously differentiable function in its second and third arguments, such that inf_i Var[ζ(z_it)] > 0, E E[ζ(z_it)²] < ∞, E E[|ζ_α(z_it)|²] < ∞, and E E[|ζ_θ(z_it)|²] < ∞, where the subscripts on ζ denote partial derivatives. Then, under the conditions of Theorem 4, for some deterministic sequence r_nT → ∞ such that r_nT = O(√(nT)),

r_nT(ζ̂ − ζ − B_ζ/T) →_d N(0, V_ζ),

where ζ = E E[ζ(z_it)],

B_ζ = E[ Σ_{j=0}^∞ E[ζ_α(z_it)′ H_{α_i} g(z_{i,t−j})] + E[ζ_α(z_it)]′ B_{α_i} + Σ_{j=1}^{d_α} E[ζ_{αα_j}(z_it)]′ Σ_{α_i,j}/2 ],

for B_{α_i} = B_{α_i}^I + B_{α_i}^G + B_{α_i}^Ω + B_{α_i}^W, Σ_{α_i,j} the j-th column of Σ_{α_i}, and, for r² = lim_{n,T→∞} r_nT²/(nT),

V_ζ = r² E E[ζ_α(z_it)′ Σ_{α_i} ζ_α(z_it) + ζ_θ(z_it)′ J_θ⁻¹ ζ_θ(z_it)] + lim_{n,T→∞} (r_nT²/n) E[(E[ζ(z_it) | α_i] − ζ)²].
Corollary 2 (Asymptotic distribution for smooth functions of individual effects). Let μ(α_i) be a twice differentiable function such that E[μ(α_i0)²] < ∞ and E[|μ_{αα}(α_i0)|²] < ∞, where the subscripts on μ denote partial derivatives. Then, under the conditions of Theorem 4,

√n(μ̂ − μ) →_d N(κ B_μ, V_μ),

where μ = E[μ(α_i0)],

B_μ = E[ μ_α(α_i0)′ B_{α_i} + Σ_{j=1}^{d_α} μ_{αα_j}(α_i0)′ Σ_{α_i,j}/2 ],

for B_{α_i} = B_{α_i}^I + B_{α_i}^G + B_{α_i}^Ω + B_{α_i}^W, and V_μ = E[(μ(α_i0) − μ)²].
The convergence rate r_nT in Corollary 1 depends on the function ζ(z; θ, α_i). For example, r_nT = √(nT) for functions that do not depend on α_i, such as ζ(z; θ, α_i) = c′θ, where c is a known d_θ-vector. In general, r_nT = √n for functions that depend on α_i. In this case r² = 0 and the first two terms of V_ζ drop out. Corollary 2 is an important special case of Corollary 1. We present it separately because the asymptotic bias and variance have simplified expressions.
3.5. BIAS CORRECTIONS
The FE-GMM estimators of common parameters, while consistent, have bias in their asymptotic distributions under sequences where n and T grow at the same rate. These sequences provide a good approximation to the finite sample behavior of the estimators in empirical applications where the time dimension is moderately large. The presence of bias invalidates asymptotic inference because the bias is of the same order as the variance. In this section we describe bias correction methods that adjust the asymptotic distribution of the FE-GMM estimators of the common parameter and of smooth functions of the data, model parameters and individual effects. All the corrections considered are analytical. Alternative corrections based on variations of the Jackknife can be implemented using the approaches described in Hahn and Newey (2004) and Dhaene and Jochmans (2010).⁹
We consider three analytical methods that differ in whether the bias is corrected from the estimator or from the first order conditions, and in whether the correction is one-step or iterated for methods that correct the bias from the estimator. All these methods reduce the order of the asymptotic bias without increasing the asymptotic variance. They are based on analytical estimators of the bias of the profile score, B_s, and of the profile Jacobian matrix, J_θ. Since these quantities include cross sectional and time series means evaluated at the true values of the common parameter and individual effects, they are estimated by the corresponding cross sectional and time series averages evaluated at the FE-GMM estimates. Thus, for any function of the data, common parameter and individual effects f_it(θ, α_i), let f̂_it(θ) := f_it(θ, α̂_i(θ)), f̂_i(θ) := T⁻¹ Σ_{t=1}^T f̂_it(θ), and f̂(θ) := n⁻¹ Σ_{i=1}^n f̂_i(θ). Next, define Σ̂_{α_i}(θ) := [Ĝ_{α,i}(θ)′ Ω̂_{0,i}(θ)⁻¹ Ĝ_{α,i}(θ)]⁻¹, Ĥ_{α_i}(θ) := Σ̂_{α_i}(θ) Ĝ_{α,i}(θ)′ Ω̂_{0,i}(θ)⁻¹,
9 Hahn, Kuersteiner and Newey (2004) show that analytical, Bootstrap, and Jackknife bias correction methods are asymptotically equivalent up to third order for MLE. We conjecture that the same result applies to GMM estimators, but the proof is beyond the scope of this paper.
and P̂_{α_i}(θ) := Ω̂_{0,i}(θ)⁻¹ − Ω̂_{0,i}(θ)⁻¹ Ĝ_{α,i}(θ) Ĥ_{α_i}(θ). To simplify the presentation, we only give explicit formulas for FE-GMM three-step estimators in the main text; we give the expressions for one- and two-step estimators in the Supplementary Appendix. Let

B̂_θ(θ) = −Ĵ_θ(θ)⁻¹ B̂_s(θ),  B̂_s(θ) = n⁻¹ Σ_{i=1}^n [B̂_i^B(θ) + B̂_i^C(θ)],  Ĵ_θ(θ) = n⁻¹ Σ_{i=1}^n Ĝ_{θ,i}(θ)′ P̂_{α_i}(θ) Ĝ_{θ,i}(θ),

where B̂_i^B(θ) = −Ĝ_{θ,i}(θ)′ [B̂_i^I(θ) + B̂_i^G(θ) + B̂_i^Ω(θ)],

B̂_i^C(θ) = −T⁻¹ Σ_{j=0}^ℓ Σ_{t=j+1}^T G_θ(z_it; θ, α̂_i(θ))′ P̂_{α_i}(θ) g(z_{i,t−j}; θ, α̂_i(θ)),

and B̂_i^I(θ), B̂_i^G(θ) and B̂_i^Ω(θ) are the sample analogs of the components of the bias of the individual effects in Lemma 2, evaluated at θ and α̂_i(θ). In the previous expressions, the spectral time series averages that involve an infinite number of terms are trimmed: the trimming parameter ℓ is a positive bandwidth that needs to be chosen such that ℓ → ∞ and ℓ/T → 0 as T → ∞ (Hahn and Kuersteiner, 2011).
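The trimmed spectral averages behave like truncated long-run covariance estimators; a minimal sketch with an assumed bandwidth rule ℓ ≈ T^{1/3} (the rule is an illustration, not the paper's prescription):

```python
import numpy as np

rng = np.random.default_rng(3)
# Estimate sum_{j=0}^{inf} E[u_t u_{t-j}] by truncating the sum at a bandwidth
# ell satisfying ell -> inf and ell/T -> 0; here ell = floor(T^{1/3}).
T = 2000
e = rng.normal(size=T + 1)
u = e[1:] + 0.5 * e[:-1]        # MA(1): the target sum is 1.25 + 0.5 = 1.75
ell = int(T ** (1 / 3))         # illustrative bandwidth choice
total = sum((u[j:] * u[:T - j]).mean() for j in range(ell + 1))
print(total)
```

Truncating keeps the variance of the estimator under control while the growing bandwidth lets the truncation bias vanish, which is exactly the trade-off behind the ℓ → ∞, ℓ/T → 0 requirement.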
The one-step correction of the estimator subtracts an estimator of the expression of the asymptotic bias from the estimator of the common parameter. Using the expressions defined above evaluated at θ̂, the bias-corrected estimator is

(5.1)  θ̂_BC = θ̂ − B̂_θ(θ̂)/T.

This bias correction is straightforward to implement because it only requires one optimization. The iterated correction is equivalent to solving the nonlinear equation

(5.2)  θ̂_IBC = θ̂ − B̂_θ(θ̂_IBC)/T.

When θ ↦ θ + B̂_θ(θ)/T is invertible in θ, it is possible to obtain a closed-form solution to the previous equation.¹⁰ Otherwise, an iterative procedure is needed. The score bias-corrected estimator is the solution to the following estimating equation:

(5.3)  ŝ(θ̂_SBC) − B̂_s(θ̂_SBC)/T = 0.

This procedure, while computationally more intensive, has the attractive feature that both estimator and bias are obtained simultaneously. Hahn and Newey (2004) show that fully iterated bias-corrected estimators solve approximate bias-corrected first order conditions. IBC and SBC are equivalent if the first order conditions are linear in θ.
10 See MacKinnon and Smith (1998) for a comparison of one-step and iterated bias correction methods.
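The difference between the one-step correction (5.1) and the iterated correction (5.2) is just evaluation point versus fixed point; a sketch with a hypothetical linear bias estimate:

```python
# Hypothetical bias estimate Bhat(theta); linear so the fixed point is unique
# and the iteration is a contraction. This is a generic sketch, not the paper's model.
def Bhat(theta):
    return 2.0 + 0.5 * theta

theta_hat, T = 1.0, 20

# One-step correction (5.1): subtract the bias evaluated at theta_hat.
theta_bc = theta_hat - Bhat(theta_hat) / T

# Iterated correction (5.2): solve theta = theta_hat - Bhat(theta)/T by fixed point.
theta_ibc = theta_hat
for _ in range(100):
    theta_ibc = theta_hat - Bhat(theta_ibc) / T

print(theta_bc, theta_ibc)
```

Because the bias enters at order 1/T, the two corrections differ only at order 1/T² here, which is why they are first-order equivalent in Theorem 5.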
Example: Correlated random coefficient model with endogenous regressors. The previous methods can be illustrated in the correlated random coefficient model example of Section 4. Here, the fixed effects GMM estimators have closed forms:

α̂_i(θ) = (Σ_{t=1}^T x_1it x_1it′)⁻¹ Σ_{t=1}^T x_1it(y_it − x_2it′ θ),

and

θ̂ = (Ĵ^W)⁻¹ n⁻¹ Σ_{i=1}^n (T⁻¹ Σ_{t=1}^T x̃_2it w̃_2it′)(T⁻¹ Σ_{t=1}^T w̃_2it w̃_2it′)⁻¹ (T⁻¹ Σ_{t=1}^T w̃_2it ỹ_it),

where

Ĵ^W = n⁻¹ Σ_{i=1}^n (T⁻¹ Σ_{t=1}^T x̃_2it w̃_2it′)(T⁻¹ Σ_{t=1}^T w̃_2it w̃_2it′)⁻¹ (T⁻¹ Σ_{t=1}^T w̃_2it x̃_2it′),

and variables with a tilde now indicate residuals of sample linear projections of the corresponding variable on x_1it; for example, x̃_2it = x_2it − Σ_{t=1}^T x_2it x_1it′ (Σ_{t=1}^T x_1it x_1it′)⁻¹ x_1it.
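The tilde (partialling-out) operation is an individual-by-individual least squares projection; a minimal sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(5)
# Residualize x2 on the individual-specific regressors x1, individual by individual,
# as in the sample linear projections defining the tilde variables.
n, T = 50, 30
x1 = rng.normal(size=(n, T, 2))             # regressors with individual coefficients
x2 = rng.normal(size=(n, T)) + x1[..., 0]   # regressor with a common coefficient

x2_tilde = np.empty_like(x2)
for i in range(n):
    X, v = x1[i], x2[i]
    # v - X (X'X)^{-1} X' v: the within-individual projection residual
    x2_tilde[i] = v - X @ np.linalg.solve(X.T @ X, X.T @ v)

# By construction the residuals are exactly orthogonal to x1 within each individual.
print(np.abs(x1[0].T @ x2_tilde[0]).max())
```

This exact within-individual orthogonality is what makes the α̂_i(θ) step drop out of the closed-form expressions for θ̂ above.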
We can estimate the bias of θ̂ from the analytic formula in expression (4.3), replacing population by sample moments and θ_0 by θ̂, and trimming the number of terms in the spectral expectation:

B̂_θ(θ̂) = −(d_g − d_α)(Ĵ^W)⁻¹ Σ_{i=1}^n Σ_{j=−ℓ}^ℓ Σ_{t=max(1,j+1)}^{min(T,T+j)} x̃_2it(ỹ_{i,t−j} − x̃_{2i,t−j}′ θ̂)/(nT).

The one-step bias corrected estimates of the common parameter θ and of the average of the individual parameter α := E[α_i] are

θ̂_BC = θ̂ − B̂_θ(θ̂)/T,  α̂_BC = n⁻¹ Σ_{i=1}^n α̂_i(θ̂_BC).
The iterated bias correction estimator can be derived analytically by solving

θ̂_IBC = θ̂ − B̂_θ(θ̂_IBC)/T,

which has the closed-form solution

θ̂_IBC = [ I_{d_θ} + (d_g − d_α)(Ĵ^W)⁻¹ Σ_{i=1}^n Σ_{j=−ℓ}^ℓ Σ_{t=max(1,j+1)}^{min(T,T+j)} x̃_2it x̃_{2i,t−j}′/(nT²) ]⁻¹ [ θ̂ + (d_g − d_α)(Ĵ^W)⁻¹ Σ_{i=1}^n Σ_{j=−ℓ}^ℓ Σ_{t=max(1,j+1)}^{min(T,T+j)} x̃_2it ỹ_{i,t−j}/(nT²) ].
The score bias correction is the same as the iterated correction because the first order conditions are linear in θ.

The bias correction methods described above yield normal asymptotic distributions centered at the true parameter value for panels where n and T grow at the same rate with
the sample size. This result is formally stated in Theorem 5, which establishes that all the
methods are asymptotically equivalent, up to first order.
Theorem 5 (Limit distribution of bias-corrected FE-GMM). Assume that √(nT)(B̂_s(θ̄) − B_s)/T →_P 0 and √(nT)(Ĵ_θ(θ̄) − J_θ)/T →_P 0, for some θ̄ = θ_0 + O_P((nT)^{−1/2}). Under Conditions 1, 2, 3, 4, 5 and 6, for C ∈ {BC, SBC, IBC},

(5.4)  √(nT)(θ̂_C − θ_0) →_d N(0, J_θ⁻¹),

where θ̂_BC, θ̂_IBC and θ̂_SBC are defined in (5.1), (5.2) and (5.3), and J_θ = E[G_{θ,i}′ P_{α_i} G_{θ,i}].
The convergence condition for the estimators of B_s and J_θ holds for sample analogs evaluated at the initial FE-GMM one-step or two-step estimators if the trimming sequence is chosen such that ℓ → ∞ and ℓ/T → 0 as T → ∞. Theorem 5 also shows that all the bias-corrected estimators considered are first-order asymptotically efficient, since their variances achieve the semiparametric efficiency bound for the common parameters in this model; see Chamberlain (1992).
The following corollaries give bias corrected estimators for averages of the data and individual effects and for moments of the individual effects, together with the limit distributions
of these estimators and consistent estimators of their asymptotic variances. To construct
the corrections, we use bias corrected estimators of the common parameter. The corollaries
then follow from Lemma 2 and Theorem 5 by the delta method. We use the same notation
as in the estimation of the bias of the common parameters above to denote the estimators
of the components of the bias and variance.
Corollary 3 (Bias correction for fixed effects averages). Let ζ(z; θ, α_i) be a twice continuously differentiable function in its second and third arguments, such that inf_i Var[ζ(z_it)] > 0, E E[ζ(z_it)²] < ∞, E E[|ζ_α(z_it)|²] < ∞, and E E[|ζ_θ(z_it)|²] < ∞. For C ∈ {BC, SBC, IBC}, let ζ̂_C = ζ̂(θ̂_C) − B̂_ζ(θ̂_C)/T, where

B̂_ζ(θ) = n⁻¹ Σ_{i=1}^n [ T⁻¹ Σ_{j=0}^ℓ Σ_{t=j+1}^T ζ̂_{α,it}(θ)′ Ĥ_{α_i}(θ) ĝ_{i,t−j}(θ) + ζ̂_{α,i}(θ)′ B̂_{α_i}(θ) + Σ_{j=1}^{d_α} ζ̂_{αα_j,i}(θ)′ Σ̂_{α_i,j}(θ)/2 ],

and ℓ is a positive bandwidth such that ℓ → ∞ and ℓ/T → 0 as T → ∞. Then, under the conditions of Theorem 5,

r_nT(ζ̂_C − ζ) →_d N(0, V_ζ),

where r_nT, ζ, and V_ζ are defined in Corollary 1. Also, θ̂_C = θ_0 + O_P((nT)^{−1/2}), ζ̂_C = ζ + O_P(r_nT⁻¹), and

V̂_ζ = (r_nT²/(nT)) (nT)⁻¹ Σ_{i=1}^n Σ_{t=1}^T [ ζ̂_{α,it}(θ̂_C)′ Σ̂_{α_i}(θ̂_C) ζ̂_{α,it}(θ̂_C) + ζ̂_{θ,it}(θ̂_C)′ Ĵ_θ(θ̂_C)⁻¹ ζ̂_{θ,it}(θ̂_C) ] + (r_nT²/n) n⁻¹ Σ_{i=1}^n [ ζ̂_i(θ̂_C) − ζ̂_C ]²
is a consistent estimator of V_ζ.
Corollary 4 (Bias correction for smooth functions of individual effects). Let μ(α_i) be a twice differentiable function such that E[μ(α_i0)²] < ∞ and E[|μ_{αα}(α_i0)|²] < ∞. For C ∈ {BC, SBC, IBC}, let μ̂_C = n⁻¹ Σ_{i=1}^n μ̂_i(θ̂_C) − B̂_μ(θ̂_C)/T, where μ̂_i(θ) = μ(α̂_i(θ)) and

B̂_μ(θ) = n⁻¹ Σ_{i=1}^n [ μ_α(α̂_i(θ))′ B̂_{α_i}(θ) + Σ_{j=1}^{d_α} μ_{αα_j}(α̂_i(θ))′ Σ̂_{α_i,j}(θ)/2 ].

Then, under the conditions of Theorem 5,

√n(μ̂_C − μ) →_d N(0, V_μ),

where μ = E[μ(α_i0)] and V_μ = E[(μ(α_i0) − μ)²]. Also, θ̂_C = θ_0 + O_P((nT)^{−1/2}), μ̂_C = μ + O_P(n^{−1/2}), and

(5.5)  V̂_μ = n⁻¹ Σ_{i=1}^n { [μ̂_i(θ̂_C) − μ̂_C]² + μ_α(α̂_i(θ̂_C))′ Σ̂_{α_i}(θ̂_C) μ_α(α̂_i(θ̂_C))/T }

is a consistent estimator of V_μ. The second term in (5.5) is included to improve the finite sample properties of the estimator in short panels.
3.6. EMPIRICAL EXAMPLE
We illustrate the new estimators with an empirical example based on the classical cigarette
demand study of Becker, Grossman and Murphy (1994) (BGM hereafter). Cigarettes are addictive goods. To account for this addictive nature, early cigarette demand studies included
lagged consumption as explanatory variables (e.g., Baltagi and Levin, 1986). This approach,
however, ignores that rational or forward-looking consumers take into account the effect of
today's consumption decision on future consumption decisions. Becker and Murphy (1988)
developed a model of rational addiction where expected changes in future prices affect the
current consumption. BGM empirically tested this model using a linear structural demand
function based on quadratic utility assumptions. The demand function includes both future
and past consumptions as determinants of current demand, and the future price affects the
current demand only through the future consumption. They found that the effect of future
consumption on current consumption is significant, what they took as evidence in favor of
the rational model.
Most of the empirical studies in this literature use yearly state-level panel data sets. They
include fixed effects to control for additive heterogeneity at the state-level and use leads and
lags of cigarette prices and taxes as instruments for leads and lags of consumption. These
studies, however, do not consider possible non-additive heterogeneity in price elasticities or
sensitivities across states. There are multiple reasons why there may be heterogeneity in the
price effects across states correlated with the price level. First, the considerable differences
in income, industrial, ethnic and religious composition at inter-state level can translate into
different tastes and policies toward cigarettes. Second, from the perspective of the theoretical
model developed by Becker and Murphy (1988), the price effect is a function of the marginal
utility of wealth that varies across states and depends on cigarette prices. If the price
effect is heterogenous and correlated with the price level, a fixed coefficient specification
may produce substantial bias in estimating the average elasticity of cigarette consumption
because the between variation of price is much larger than the within variation. Wangen
(2004) gives additional theoretical reasons against a fixed coefficient specification for the
demand function in this application.
We consider the following linear specification for the demand function:

(6.1)  C_it = α_0i + α_1i P_it + θ_1 C_{i,t−1} + θ_2 C_{i,t+1} + X_it′ δ + ε_it,

where C_it is cigarette consumption in state i at time t, measured by per capita sales in packs; α_0i is an additive state effect; α_1i is a state-specific price coefficient; P_it is the price in 1982-1984 dollars; and X_it is a vector of covariates which includes income, various measures of incentives for smuggling across states, and year dummies. We estimate the model parameters using OLS and IV methods, with both a fixed coefficient and a random coefficient for price. The data set, consisting of an unbalanced panel of 51 U.S. states over the years 1957 to 1994, is the same as in Fenn, Antonovitz and Schroeter (2001). The set of instruments for C_{i,t−1} and C_{i,t+1} in the IV estimators is the same as in specification 3 of BGM and includes X_it, P_it, P_{i,t−1}, P_{i,t+1}, Tax_it, Tax_{i,t−1}, and Tax_{i,t+1}, where Tax_it is the state excise tax on cigarettes in 1982-1984 dollars.
Table 1 reports estimates of coefficients and demand elasticities. We focus on the coefficients of the key variables, namely P_it, C_{i,t−1} and C_{i,t+1}. Throughout the table, FC refers to the fixed coefficient specification with α_1i = α_1, and RC refers to the random coefficient specification in equation (6.1). BC and IBC refer to estimates after bias correction and iterated bias correction, respectively. Demand elasticities are calculated using the expressions in Appendix A of BGM. They are functions of C_it, P_it, α_1i, θ_1 and θ_2, linear in α_1i. For random coefficient estimators, we report the mean of the individual elasticities, i.e.,

ē_h = (nT)⁻¹ Σ_{i=1}^n Σ_{t=1}^T e_h(z_it; θ̂, α̂_i),

where e_h(z_it; θ, α_i) = ∂ log C_it(h)/∂ log P_it(h) are price elasticities at different time horizons h. Standard errors for the elasticities are obtained by the delta method, as described in Corollaries 3 and 4. For bias-corrected RC estimators, the standard errors use bias-corrected estimates of θ and α_i.
Like BGM, we find that OLS estimates differ substantially from their IV counterparts. IV-FC underestimates the elasticities relative to IV-RC. For example, the long-run elasticity estimate is -0.70 with IV-FC, whereas it is -0.88 with IV-RC. This difference is also pronounced for short-run elasticities, where the IV-RC estimates are more than 25 percent
larger than the IV-FC estimates. We observe the same pattern throughout the table for every elasticity. The bias comes from both the estimation of the common parameter θ_2 and the mean of the individual specific parameter E[α_1i]. The bias corrections increase the coefficient of future consumption C_{i,t+1} and reduce the absolute value of the mean of the price coefficient. Moreover, they have a significant impact on the estimator of the dispersion of the price coefficient: the uncorrected estimates of the standard deviation are more than 20% larger than their bias corrected counterparts. In the online appendix Fernández-Val and Lee (2012), we show through a Monte-Carlo experiment calibrated to this empirical example that the bias is generally large for dispersion parameters and that the bias corrections are effective in reducing it. As a consequence of shrinking the estimates of the dispersion of α_1i, we obtain smaller standard errors for the estimates of E[α_1i] throughout the table. In the Monte-Carlo experiment, we also find that this correction in the standard errors provides improved inference.
3.7. CONCLUSION
This paper introduces a new class of fixed effects GMM estimators for panel data models with unrestricted nonadditive heterogeneity and endogenous regressors. Bias correction
methods are developed because these estimators suffer from the incidental parameters problem. Other estimators based on moment conditions, like the class of GEL estimators, can be
analyzed using a similar methodology. An attractive alternative framework for estimation
and inference in random coefficient models is a flexible Bayesian approach. It would be interesting to explore whether there are connections between moments of posterior distributions
in the Bayesian approach and the fixed effects estimators considered in the paper. Another
interesting extension would be to find bias reducing priors in the GMM framework similar
to the ones characterized by Arellano and Bonhomme (2009) in the MLE framework. We
leave these extensions to future research.
Bibliography
[1] Alvarez, J. and M. Arellano (2003): "The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators," Econometrica, 71, 1121-1159.
[2] Angrist, J. D. (2004): "Treatment Effect Heterogeneity in Theory and Practice," The Economic Journal, 114(494), C52-C83.
[3] Angrist, J. D., K. Graddy, and G. W. Imbens (2000): "The Interpretation of Instrumental Variables Estimators in Simultaneous Equation Models with an Application to the Demand for Fish," Review of Economic Studies, 67, 499-527.
[4] Angrist, J. D., and J. Hahn (2004): "When to Control for Covariates? Panel Asymptotics for Estimates of Treatment Effects," Review of Economics and Statistics, 86(1), 58-72.
[5] Angrist, J. D., and G. W. Imbens (1995): "Two-Stage Least Squares Estimation of Average Causal Effects in Models With Variable Treatment Intensity," Journal of the American Statistical Association, 90, 431-442.
[6] Angrist, J. D., and A. B. Krueger (1999): "Empirical Strategies in Labor Economics," in O. Ashenfelter and D. Card, eds., Handbook of Labor Economics, Vol. 3, Elsevier Science.
[7] Arellano, M. and S. Bonhomme (2009): "Robust Priors in Nonlinear Panel Data Models," Econometrica, 77, 489-536.
[8] Arellano, M. and S. Bonhomme (2010): "Identifying Distributional Characteristics in Random Coefficients Panel Data Models," unpublished manuscript, CEMFI.
[9] Arellano, M., and J. Hahn (2006): "A Likelihood-based Approximate Solution to the Incidental Parameter Problem in Dynamic Nonlinear Models with Multiple Effects," mimeo, CEMFI.
[10] Arellano, M., and J. Hahn (2007): "Understanding Bias in Nonlinear Panel Models: Some Recent Developments," in R. Blundell, W. K. Newey and T. Persson, eds., Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, Vol. 3, Cambridge University Press: Cambridge.
[11] Bai, J. (2009): "Panel Data Models With Interactive Fixed Effects," Econometrica, 77(4), 1229-1279.
[12] Baltagi, B. H. and D. Levin (1986): "Estimating Dynamic Demand for Cigarettes Using Panel Data: The Effects of Bootlegging, Taxation and Advertising Reconsidered," The Review of Economics and Statistics, 68, 148-155.
[13] Becker, G. S., M. Grossman, and K. M. Murphy (1994): "An Empirical Analysis of Cigarette Addiction," The American Economic Review, 84, 396-418.
[14] Becker, G. S. and K. M. Murphy (1988): "A Theory of Rational Addiction," Journal of Political Economy, 96, 675-700.
[15] Bester, A. and C. Hansen (2008): "A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects," Journal of Business and Economic Statistics, 27(2), 131-148.
[16] Buse, A. (1992): "The Bias of Instrumental Variables Estimators," Econometrica, 60, 173-180.
[17] Chamberlain, G. (1992): "Efficiency Bounds for Semiparametric Regression," Econometrica, 60, 567-596.
[18] Chernozhukov, V., Fernández-Val, I., Hahn, J., and W. K. Newey (2010): "Average and Quantile Effects in Nonseparable Panel Models," unpublished manuscript, MIT.
[19] Dhaene, G., and K. Jochmans (2010): "Split-Panel Jackknife Estimation of Fixed Effects Models," unpublished manuscript, K.U. Leuven.
[20] Fernández-Val, I., and J. Lee (2012): "Supplementary Appendix to Panel Data Models with Nonadditive Unobserved Heterogeneity: Estimation and Inference," unpublished manuscript, Boston University.
[21] Fenn, A. J., F. Antonovitz, and J. R. Schroeter (2001): "Cigarettes and Addiction Information: New Evidence in Support of the Rational Addiction Model," Economics Letters, 72, 39-45.
[22] Graham, B. S. and J. L. Powell (2008): "Identification and Estimation of 'Irregular' Correlated Random Coefficient Models," NBER Working Paper No. 14469.
[23] Hahn, J., and G. Kuersteiner (2002): "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T are Large," Econometrica, 70, 1639-1657.
[24] Hahn, J., and G. Kuersteiner (2011): "Bias Reduction for Dynamic Nonlinear Panel Models with Fixed Effects," Econometric Theory, 27, 1152-1191.
[25] Hahn, J., G. Kuersteiner, and W. Newey (2004): "Higher Order Properties of Bootstrap and Jackknife Bias Corrections," unpublished manuscript.
[26] Hahn, J., and W. Newey (2004): "Jackknife and Analytical Bias Reduction for Nonlinear Panel Models," Econometrica, 72, 1295-1319.
[27] Hansen, L. P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029-1054.
[28] Heckman, J., and E. Vytlacil (2000): "Instrumental Variables Methods for the Correlated Random Coefficient Model," Journal of Human Resources, XXXIII(4), 974-987.
[29] Hsiao, C., and M. H. Pesaran (2004): "Random Coefficient Panel Data Models," mimeo, University of Southern California.
[30] Kelejian, H. H. (1974): "Random Parameters in a Simultaneous Equation Framework: Identification and Estimation," Econometrica, 42(3), 517-528.
[31] Kiviet, J. F. (1995): "On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models," Journal of Econometrics, 68(1), 53-78.
[32] Lancaster, T. (2002): "Orthogonal Parameters and Panel Data," Review of Economic Studies, 69, 647-666.
[33] MacKinnon, J. G., and A. A. Smith (1998): "Approximate Bias Correction in Econometrics," Journal of Econometrics, 85, 205-230.
[34] Murtazashvili, I., and J. M. Wooldridge (2005): "Fixed Effects Instrumental Variables Estimation in Correlated Random Coefficient Panel Data Models," unpublished manuscript, Michigan State University.
[35] Newey, W. K., and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in R. F. Engle and D. L. McFadden, eds., Handbook of Econometrics, Vol. 4, Elsevier Science, Amsterdam: North-Holland.
[36] Newey, W. K., and R. Smith (2004): "Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators," Econometrica, 72, 219-255.
[37] Nagar, A. L. (1959): "The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations," Econometrica, 27, 575-595.
[38] Neyman, J., and E. L. Scott (1948): "Consistent Estimates Based on Partially Consistent Observations," Econometrica, 16, 1-32.
[39] Phillips, P. C. B., and H. R. Moon (1999): "Linear Regression Limit Theory for Nonstationary Panel Data," Econometrica, 67, 1057-1111.
[40] Rilstone, P., V. K. Srivastava, and A. Ullah (1996): "The Second-Order Bias and Mean Squared Error of Nonlinear Estimators," Journal of Econometrics, 75, 369-395.
[41] Roy, A. (1951): "Some Thoughts on the Distribution of Earnings," Oxford Economic Papers, 3, 135-146.
[42] Wangen, K. R. (2004): "Some Fundamental Problems in Becker, Grossman and Murphy's Implementation of Rational Addiction Theory," Discussion Papers 375, Research Department of Statistics Norway.
[43] Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge.
[44] Wooldridge, J. M. (2005): "Fixed Effects and Related Estimators in Correlated Random Coefficient and Treatment Effect Panel Data Models," Review of Economics and Statistics, forthcoming.
[45] Woutersen, T. M. (2002): "Robustness Against Incidental Parameters," unpublished manuscript, University of Western Ontario.
[46] Yitzhaki, S. (1996): "On Using Linear Regressions in Welfare Economics," Journal of Business and Economic Statistics, 14, 478-486.
[Figure 1 omitted: plot of two normal density curves over the price effect range -60 to -10; solid line labeled "Uncorrected," dashed line labeled "Bias Corrected"; horizontal axis labeled "Price effect."]

FIGURE 1. Normal approximation to the distribution of price effects using uncorrected (solid line) and bias corrected (dashed line) estimates of the mean and standard deviation of the distribution of price effects. Uncorrected estimates of the mean and standard deviation are -36 and 13; bias corrected estimates are -31 and 10.
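The two normal approximations can be compared directly from the moments reported in the caption; a small sketch evaluating the densities (no plotting):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Moments reported in the caption of Figure 1.
mu_unc, sd_unc = -36.0, 13.0   # uncorrected
mu_bc,  sd_bc  = -31.0, 10.0   # bias corrected

# The corrected approximation is more concentrated: its peak density
# 1/(sd*sqrt(2*pi)) exceeds that of the uncorrected approximation.
peak_unc = normal_pdf(mu_unc, mu_unc, sd_unc)
peak_bc = normal_pdf(mu_bc, mu_bc, sd_bc)
```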
Table 1: Estimates of Rational Addiction Model for Cigarette Demand

                                                 OLS-RC                     IV-RC
                        OLS-FC   IV-FC    NBC      BC       IBC      NBC      BC       IBC
Coefficients
(Mean) P_t              -9.58    -34.10   -13.49   -13.58   -13.26   -36.39   -31.26   -31.26
                        (1.86)   (4.10)   (3.55)   (3.55)   (3.55)   (4.85)   (4.62)   (4.64)
(Std. Dev.) P_t                           4.35     4.22     4.07     12.86    10.45    10.60
                                          (0.98)   (1.02)   (1.03)   (2.35)   (2.13)   (2.15)
C_{t-1}                 0.49     0.45     0.48     0.48     0.48     0.44     0.44     0.45
                        (0.01)   (0.06)   (0.04)   (0.04)   (0.04)   (0.04)   (0.04)   (0.04)
C_{t+1}                 0.44     0.17     0.43     0.44     0.44     0.23     0.29     0.27
                        (0.01)   (0.07)   (0.04)   (0.04)   (0.04)   (0.05)   (0.05)   (0.05)
Price elasticities
Long-run                -1.05    -0.70    -1.30    -1.31    -1.28    -0.88    -0.91    -0.90
                        (0.24)   (0.12)   (0.28)   (0.28)   (0.28)   (0.09)   (0.10)   (0.10)
Own Price               -0.20    -0.32    -0.27    -0.27    -0.27    -0.38    -0.35    -0.35
(Anticipated)           (0.04)   (0.04)   (0.06)   (0.06)   (0.06)   (0.04)   (0.04)   (0.04)
Own Price               -0.11    -0.29    -0.15    -0.16    -0.15    -0.33    -0.29    -0.29
(Unanticipated)         (0.02)   (0.03)   (0.04)   (0.04)   (0.04)   (0.04)   (0.04)   (0.04)
Future Price            -0.07    -0.05    -0.10    -0.10    -0.09    -0.09    -0.10    -0.09
(Unanticipated)         (0.01)   (0.03)   (0.02)   (0.02)   (0.02)   (0.02)   (0.02)   (0.02)
Past Price              -0.08    -0.14    -0.11    -0.11    -0.10    -0.16    -0.15    -0.15
(Unanticipated)         (0.01)   (0.02)   (0.03)   (0.02)   (0.03)   (0.02)   (0.02)   (0.02)
Short-Run               -0.30    -0.35    -0.41    -0.41    -0.40    -0.44    -0.44    -0.43
                        (0.05)   (0.06)   (0.12)   (0.12)   (0.12)   (0.06)   (0.06)   (0.06)

Note: Standard errors are in parentheses. RC/FC refers to the random/fixed coefficient model. NBC/BC/IBC refers to no bias correction/bias correction/iterated bias correction estimates.
Supplementary Appendix to Panel Data Models with Nonadditive Unobserved
Heterogeneity: Estimation and Inference
Iván Fernández-Val and Joonhwan Lee
October 15, 2013
This supplement to the paper "Panel Data Models with Nonadditive Unobserved Heterogeneity: Estimation and Inference" provides additional numerical examples and the proofs of the main results. It is organized
in seven appendices. Appendix A contains a Monte Carlo simulation calibrated to the empirical example of
the paper. Appendix B gives the proofs of the consistency of the one-step and two-step FE-GMM estimators. Appendix C includes the derivations of the asymptotic distribution of one-step and two-step FE-GMM
estimators. Appendix D provides the derivations of the asymptotic distribution of bias corrected FE-GMM
estimators. Appendix E and Appendix F contain the characterization of the stochastic expansions for the
estimators of the individual effects and the scores. Appendix G includes the expressions for the scores and
their derivatives.
Throughout the appendices, $O_{up}$ and $o_{up}$ denote uniform orders in probability. For example, for a sequence of random variables $\{\xi_i : 1 \le i \le n\}$, $\xi_i = O_{up}(1)$ means $\sup_{1\le i\le n} \xi_i = O_P(1)$ as $n \to \infty$, and $\xi_i = o_{up}(1)$ means $\sup_{1\le i\le n} \xi_i = o_P(1)$ as $n \to \infty$. It can be shown that the usual algebraic properties for $O_P$ and $o_P$ orders also apply to the uniform orders $O_{up}$ and $o_{up}$. Let $e_j$ denote a $1 \times d_g$ unitary vector with a one in position $j$. For a matrix $A$, $|A|$ denotes the Euclidean norm, that is, $|A|^2 = \mathrm{trace}[AA']$. HK refers to Hahn and Kuersteiner (2011).
APPENDIX A. NUMERICAL EXAMPLE
We design a Monte Carlo experiment to closely match the cigarette demand empirical example in the
paper. In particular, we consider the following linear model with common and individual specific parameters:
\[
C_{it} = \alpha_{0i} + \alpha_{1i} P_{it} + \theta_1 C_{i,t-1} + \theta_2 C_{i,t+1} + \phi\,\epsilon_{it},
\]
\[
P_{it} = \eta_{0i} + \eta_{1i}\,\mathrm{Tax}_{it} + u_{it}, \qquad (i = 1, 2, \ldots, n;\ t = 1, 2, \ldots, T);
\]
where $\{(\alpha_{ji}, \eta_{ji}) : 1 \le i \le n\}$ is i.i.d. bivariate normal with means $(\mu_{\alpha_j}, \mu_{\eta_j})$, variances $(\sigma_{\alpha_j}^2, \sigma_{\eta_j}^2)$, and correlation $\rho_j$, for $j \in \{0, 1\}$, independent across $j$; $\{u_{it} : 1 \le t \le T,\ 1 \le i \le n\}$ is i.i.d. $N(0, \sigma_u^2)$; and $\{\epsilon_{it} : 1 \le t \le T,\ 1 \le i \le n\}$ is i.i.d. standard normal. We fix the values of $\mathrm{Tax}_{it}$ to the values in the data set. All the parameters other than $\rho_1$ and $\phi$ are calibrated to the data set. Since the panel is balanced only for 1972 to 1994, we set $T = 23$ and generate balanced panels for the simulations. Specifically, we consider
\[
n = 51,\ T = 23;\quad \mu_{\alpha_0} = 72.86,\ \mu_{\alpha_1} = -31.26,\ \mu_{\eta_0} = 0.81,\ \mu_{\eta_1} = 0.13,\quad \sigma_{\alpha_0} = 18.54,\ \sigma_{\alpha_1} = 10.60,\ \sigma_{\eta_0} = 0.14,\ \sigma_{\eta_1} = 2.05,
\]
\[
\sigma_u = 0.15,\quad \theta_1 = 0.45,\ \theta_2 = 0.27,\quad \rho_0 = -0.17,\ \rho_1 \in \{0, 0.3, 0.6, 0.9\},\quad \phi \in \{2, 4, 6\}.
\]
In the empirical example, the estimated values of $\rho_1$ and $\phi$ are close to 0.3 and 5, respectively.
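A minimal sketch of the draw of the heterogeneous parameters and the price equation in this design (the pairing of the calibrated standard deviations with $(\sigma_{\eta_0}, \sigma_{\eta_1})$ follows the listing above, and the uniform `tax` array is only a stand-in for the fixed observed tax series):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 51, 23
sigma_u = 0.15

def draw_pair(mu_a, mu_e, sd_a, sd_e, rho, size, rng):
    """Draw (alpha_ji, eta_ji) pairs from a bivariate normal with correlation rho."""
    cov = [[sd_a**2, rho * sd_a * sd_e],
           [rho * sd_a * sd_e, sd_e**2]]
    return rng.multivariate_normal([mu_a, mu_e], cov, size=size)

# j = 0 (intercepts) and j = 1 (price slopes), independent across j.
a0_e0 = draw_pair(72.86, 0.81, 18.54, 0.14, -0.17, n, rng)
a1_e1 = draw_pair(-31.26, 0.13, 10.60, 2.05, 0.30, n, rng)   # rho1 = 0.3

# Price equation: P_it = eta0_i + eta1_i * Tax_it + u_it.
tax = rng.uniform(0.1, 0.6, size=(n, T))     # stand-in for observed taxes
P = a0_e0[:, 1:2] + a1_e1[:, 1:2] * tax + sigma_u * rng.standard_normal((n, T))
```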
Since the model is dynamic with leads and lags of the dependent variable on the right hand side, we construct the series of $C_{it}$ by solving the difference equation following BGM. The stationary part of the solution is
\[
C_{it} = \frac{1}{\theta_1\varphi_1(\varphi_2 - \varphi_1)}\sum_{s=0}^{\infty} \varphi_1^{s+1}\, h_i(t+s) + \frac{1}{\theta_1\varphi_2(\varphi_2 - \varphi_1)}\sum_{s=0}^{\infty} \varphi_2^{-s}\, h_i(t-1-s),
\]
where
\[
h_i(t) = \alpha_{0i} + \alpha_{1i} P_{it} + \phi\,\epsilon_{it}, \qquad \varphi_1 = \frac{1 - (1 - 4\theta_1\theta_2)^{1/2}}{2\theta_1}, \qquad \varphi_2 = \frac{1 + (1 - 4\theta_1\theta_2)^{1/2}}{2\theta_1}.
\]
In our specification, these values are $\varphi_1 = 0.31$ and $\varphi_2 = 1.91$. The parameters that we vary across the experiments are $\rho_1$ and $\phi$. The parameter $\rho_1$ controls the degree of correlation between $\alpha_{1i}$ and $P_{it}$, and determines the bias caused by using fixed coefficient estimators. The parameter $\phi$ controls the degree of endogeneity in $C_{i,t-1}$ and $C_{i,t+1}$, which determines the bias of OLS and the incidental parameter bias of random coefficient IV estimators. Although $\phi$ is not an ideal experimental parameter because it is the scale of the error, it is the only free parameter that affects the endogeneity of $C_{i,t-1}$ and $C_{i,t+1}$. In this design we cannot fully remove the endogeneity of $C_{i,t-1}$ and $C_{i,t+1}$ because of the dynamics.
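The roots can be checked numerically from the calibrated $\theta_1 = 0.45$ and $\theta_2 = 0.27$:

```python
import math

theta1, theta2 = 0.45, 0.27

disc = math.sqrt(1.0 - 4.0 * theta1 * theta2)
phi1 = (1.0 - disc) / (2.0 * theta1)   # stable root, approx 0.31
phi2 = (1.0 + disc) / (2.0 * theta1)   # unstable root, approx 1.91

# Sanity checks implied by the characteristic polynomial theta1*x^2 - x + theta2:
# the roots sum to 1/theta1 and multiply to theta2/theta1.
```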
In each simulation, we estimate the parameters with standard fixed coefficient OLS and IV with additive individual effects (FC), and with the FE-GMM OLS and IV estimators with individual specific coefficients (RC). For IV, we use the same set of instruments as in the empirical example. We report results only for the common coefficient $\theta_2$, and for the mean and standard deviation of the individual specific coefficient $\alpha_{1i}$. Throughout the tables, Bias refers to the mean of the bias across simulations; SD refers to the standard deviation of the estimates; SE/SD denotes the ratio of the average standard error to the standard deviation; and p.05 is the rejection frequency of a two-sided test with nominal level of 0.05 that the parameter is equal to its true value. For bias-corrected RC estimators the standard errors are calculated using bias-corrected estimates of the common parameter and individual effects.
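These summary statistics can be computed as follows (a sketch applied to hypothetical simulation draws, not the paper's actual output):

```python
import numpy as np

def mc_summary(estimates, std_errors, true_value):
    """Bias, SD, SE/SD, and p.05 (two-sided 5% rejection rate) across simulations."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - true_value
    sd = estimates.std(ddof=1)
    se_sd = std_errors.mean() / sd
    p05 = (np.abs(estimates - true_value) / std_errors > 1.96).mean()
    return bias, sd, se_sd, p05

# Hypothetical draws of theta2_hat centered at the true value 0.27.
rng = np.random.default_rng(0)
est = 0.27 + 0.02 * rng.standard_normal(1000)
se = np.full(1000, 0.02)          # correctly sized standard errors
bias, sd, se_sd, p05 = mc_summary(est, se, 0.27)
```

With unbiased estimates and correct standard errors, SE/SD should be near 1 and p.05 near the nominal 0.05.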
Table A.1 reports the results for the estimators of $\theta_2$. We find significant biases in all the OLS estimators relative to the standard deviations of these estimators. The bias of OLS grows with $\phi$. The IV-FC estimator is biased unless $\rho_1 = 0$, that is, unless there is no correlation between $\alpha_{1i}$ and $P_{it}$, and its test shows size distortions due to the bias and the underestimation of the standard errors. IV-RC estimators have no bias in any configuration and their tests display much smaller size distortions than those of the other estimators. The bias corrections preserve the low bias and good inference properties of the IV-RC estimator.
Table A.2 reports similar results for the estimators of the mean of the individual specific coefficient $\mu_1 = E[\alpha_{1i}]$. We find substantial biases for the OLS and IV-FC estimators. IV-RC displays some bias, which is removed by the corrections in some configurations. The bias corrections provide significant improvements in the estimation of the standard errors. IV-RC standard errors overestimate the dispersion by more than 15% when $\phi$ is greater than 2, whereas the IV-BC and IV-IBC estimators have SE/SD ratios close to 1. As a result, the bias-corrected estimators show smaller size distortions. This improvement comes from the bias correction in the estimates of the dispersion of $\alpha_{1i}$ that we use to construct the standard errors. The bias of the estimator of the dispersion is generally large, and is effectively removed by the correction. We can see more evidence on this phenomenon in Table A.3.
Table A.3 shows the results for the estimators of the standard deviation of the individual specific coefficient, $\sigma_{\alpha_1} = E[(\alpha_{1i} - \mu_1)^2]^{1/2}$. As noted above, the bias corrections are relevant in this case. As $\phi$ increases, the bias grows in orders of $\phi$. Most of the bias is removed by the correction even when $\phi$ is large. For example, when $\phi = 6$, the bias of the IV-RC estimator is about 4, which is larger than two times its standard deviation. The correction reduces the bias to about 0.5, which is small relative to the standard deviation. Moreover, despite the overestimation of the standard errors, there are important size distortions for IV-RC estimators for tests on $\sigma_{\alpha_1}$ when $\phi$ is large. The bias corrections bring the rejection frequencies close to their nominal levels.
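The mechanics behind this dispersion bias can be illustrated with a simple noise-subtraction sketch (my construction for illustration, not the paper's exact correction): the sample dispersion of the estimated $\hat\alpha_{1i}$ adds the sampling variance of each estimate to the true dispersion, so subtracting an estimate of that noise shrinks the plug-in value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 51
sigma_alpha = 10.60      # true dispersion of alpha_1i (calibrated value)
se_i = 5.0               # hypothetical sampling sd of each alpha_1i estimate

alpha = -31.26 + sigma_alpha * rng.standard_normal(n)       # true coefficients
alpha_hat = alpha + se_i * rng.standard_normal(n)           # noisy estimates

plug_in = alpha_hat.std(ddof=1)                             # biased upward
corrected = np.sqrt(alpha_hat.var(ddof=1) - se_i**2)        # noise subtracted
```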
Overall, the calibrated Monte-Carlo experiment confirms that the IV-RC estimator with bias correction
provides improved estimation and inference for all the parameters of interest for the model considered in the
empirical example.
APPENDIX B. CONSISTENCY OF ONE-STEP AND TWO-STEP FE-GMM ESTIMATOR
Lemma 3. Suppose that Conditions 1 and 2 hold. Then, for every $\eta > 0$,
\[
\Pr\Big[\sup_{1\le i\le n}\ \sup_{(\theta,\alpha)\in\Upsilon} \big|\hat{Q}_i^W(\theta,\alpha) - Q_i^W(\theta,\alpha)\big| > \eta\Big] = o(T^{-1}),
\]
and
\[
\sup_{\alpha} \big|Q_i^W(\theta,\alpha) - Q_i^W(\theta',\alpha)\big| \le C\, E[M(z_{it})]^2\, |\theta - \theta'|
\]
for some constant $C > 0$.
Proof. First, note that
\[
\big|\hat{Q}_i^W(\theta,\alpha) - Q_i^W(\theta,\alpha)\big| \le \big|\hat{g}_i(\theta,\alpha)'\hat{W}_i^{-1}\hat{g}_i(\theta,\alpha) - g_i(\theta,\alpha)'W_i^{-1} g_i(\theta,\alpha)\big|
\]
\[
\le \big|[\hat{g}_i(\theta,\alpha) - g_i(\theta,\alpha)]'\hat{W}_i^{-1}[\hat{g}_i(\theta,\alpha) - g_i(\theta,\alpha)]\big| + 2\,\big|[\hat{g}_i(\theta,\alpha) - g_i(\theta,\alpha)]'\hat{W}_i^{-1} g_i(\theta,\alpha)\big| + \big|g_i(\theta,\alpha)'(\hat{W}_i^{-1} - W_i^{-1})\, g_i(\theta,\alpha)\big|
\]
\[
\le d_g^2\,\big|\hat{W}_i^{-1}\big| \max_{1\le k\le d_g} \big|\hat{g}_{k,i}(\theta,\alpha) - g_{k,i}(\theta,\alpha)\big|^2 + 2\, d_g^2\, E[M(z_{it})]\,\big|\hat{W}_i^{-1}\big| \max_{1\le k\le d_g} \big|\hat{g}_{k,i}(\theta,\alpha) - g_{k,i}(\theta,\alpha)\big| + o_{up}(1),
\]
where we use that $\sup_{1\le i\le n} |\hat{W}_i - W_i| = o_{up}(1)$. Then, by Condition 2, we can apply Lemma 4 of HK to $|\hat{g}_{k,i}(\theta,\alpha) - g_{k,i}(\theta,\alpha)|$ to obtain the first part.

The second part follows from
\[
\big|Q_i^W(\theta,\alpha) - Q_i^W(\theta',\alpha)\big| \le \big|g_i(\theta,\alpha)'W_i^{-1}[g_i(\theta,\alpha) - g_i(\theta',\alpha)]\big| + \big|[g_i(\theta,\alpha) - g_i(\theta',\alpha)]'W_i^{-1} g_i(\theta',\alpha)\big| \le 2\, d_g^2\, E[M(z_{it})]^2\, \big|W_i^{-1}\big|\, |\theta - \theta'|. \qquad\Box
\]
B.1. Proof of Theorem 1.

Proof. Part I: Consistency of $\hat\theta$. For any $\eta > 0$, let
\[
\epsilon := \inf_i \Big[ Q_i^W(\theta_0,\alpha_{i0}) - \sup_{\{(\theta,\alpha):\,|(\theta,\alpha)-(\theta_0,\alpha_{i0})| > \eta\}} Q_i^W(\theta,\alpha) \Big] > 0,
\]
as defined in Condition 2. Using the standard argument for consistency of extremum estimators, as in Newey and McFadden (1994), with probability $1 - o(T^{-1})$,
\[
\max_{\{\theta:\,|\theta - \theta_0| > \eta\}}\ n^{-1}\sum_{i=1}^n \hat{Q}_i^W(\theta, \hat\alpha_i(\theta)) < n^{-1}\sum_{i=1}^n \hat{Q}_i^W(\theta_0, \alpha_{i0}),
\]
by the definition of $\epsilon$ and Lemma 3. Thus, by continuity of the objective function and the definition of $\hat\theta$, we conclude that $\Pr[|\hat\theta - \theta_0| > \eta] = o(T^{-1})$.

Part II: Consistency of $\hat\alpha_i$. By Part I and Lemma 3,
\[
\Pr\Big[\sup_{1\le i\le n}\ \sup_{\alpha} \big|\hat{Q}_i^W(\hat\theta,\alpha) - Q_i^W(\theta_0,\alpha)\big| > \eta\Big] = o(T^{-1}) \tag{B.1}
\]
for any $\eta > 0$. Let
\[
\epsilon := \inf_i \Big[ Q_i^W(\theta_0,\alpha_{i0}) - \sup_{\{\alpha:\,|\alpha - \alpha_{i0}| > \eta\}} Q_i^W(\theta_0,\alpha) \Big] > 0.
\]
Condition on the event
\[
\sup_{1\le i\le n}\ \sup_{\alpha} \big|\hat{Q}_i^W(\hat\theta,\alpha) - Q_i^W(\theta_0,\alpha)\big| < \tfrac{1}{2}\epsilon,
\]
which has a probability equal to $1 - o(T^{-1})$ by (B.1). Then
\[
\max_{\{\alpha:\,|\alpha-\alpha_{i0}|>\eta\}} \hat{Q}_i^W(\hat\theta,\alpha) \le \max_{\{\alpha:\,|\alpha-\alpha_{i0}|>\eta\}} Q_i^W(\theta_0,\alpha) + \tfrac{1}{2}\epsilon \le Q_i^W(\theta_0,\alpha_{i0}) - \tfrac{1}{2}\epsilon < \hat{Q}_i^W(\hat\theta,\alpha_{i0}).
\]
This is inconsistent with $\hat{Q}_i^W(\hat\theta,\hat\alpha_i) \ge \hat{Q}_i^W(\hat\theta,\alpha_{i0})$, and therefore $|\hat\alpha_i - \alpha_{i0}| \le \eta$ with probability $1 - o(T^{-1})$, for every $i$.

Part III: Consistency of $\hat\lambda_i$. First, note that
\[
\big|\hat\lambda_i - \lambda_{i0}\big| = \big|\hat{W}_i^{-1}\hat{g}_i(\hat\theta,\hat\alpha_i)\big| \le d_g\,\big|\hat{W}_i^{-1}\big| \max_{1\le k\le d_g}\Big( \big|\hat{g}_{k,i}(\hat\theta,\hat\alpha_i) - g_{k,i}(\hat\theta,\hat\alpha_i)\big| + \big|g_{k,i}(\hat\theta,\hat\alpha_i)\big| \Big),
\]
with
\[
\big|g_{k,i}(\hat\theta,\hat\alpha_i)\big| \le \sup_{(\theta,\alpha_i)\in\Upsilon}\Big|\frac{\partial g_{k,i}(\theta,\alpha_i)}{\partial(\theta,\alpha_i)}\Big|\, \big(|\hat\theta-\theta_0| + |\hat\alpha_i - \alpha_{i0}|\big) \le E[M(z_{it})]\,\big(|\hat\theta-\theta_0| + |\hat\alpha_i - \alpha_{i0}|\big).
\]
Then, the result follows because $\sup_{1\le i\le n}|\hat{W}_i - W_i| = o_{up}(1)$ and $\{W_i : 1\le i\le n\}$ are positive definite by Condition 2, $\max_{1\le k\le d_g}\sup_{(\theta,\alpha_i)\in\Upsilon} |\hat{g}_{k,i}(\theta,\alpha_i) - g_{k,i}(\theta,\alpha_i)| = o_{up}(1)$ by Lemma 4 in HK, and $\hat\theta - \theta_0 = o_P(1)$ and $\sup_{1\le i\le n}|\hat\alpha_i - \alpha_{i0}| = o_{up}(1)$ by Parts I and II. $\Box$
B.2. Proof of Theorem 3.

Proof. First, assume that Conditions 1, 2, 3 and 5 hold. The proof is exactly the same as that of Theorem 1, using the uniform convergence of the criterion function.

To establish the uniform convergence of the criterion function as in Lemma 3, we need
\[
\sup_i \big|\hat\Omega_i(\hat\theta,\hat\alpha_i) - \Omega_i(\theta_0,\alpha_{i0})\big| = o_{up}(1),
\]
along with an extended version of the continuous mapping theorem for $o_{up}$. This can be shown by noting that
\[
\big|\hat\Omega_i(\hat\theta,\hat\alpha_i) - \Omega_i(\theta_0,\alpha_{i0})\big| \le \big|\hat\Omega_i(\hat\theta,\hat\alpha_i) - \Omega_i(\hat\theta,\hat\alpha_i)\big| + \big|\Omega_i(\hat\theta,\hat\alpha_i) - \Omega_i(\theta_0,\alpha_{i0})\big|
\le \big|\hat\Omega_i(\hat\theta,\hat\alpha_i) - \Omega_i(\hat\theta,\hat\alpha_i)\big| + d_g^2\, E[M(z_{it})^2]\, \big|(\hat\theta,\hat\alpha_i) - (\theta_0,\alpha_{i0})\big|.
\]
The convergence follows by the consistency of $\hat\theta$ and the $\hat\alpha_i$'s, and the application of Lemma 2 of HK to $g_k(z_{it};\theta,\alpha_i)\,g_l(z_{it};\theta,\alpha_i)$, using that $|g_k(z_{it};\theta,\alpha_i)\,g_l(z_{it};\theta,\alpha_i)| \le M(z_{it})^2$. $\Box$

APPENDIX C. ASYMPTOTIC DISTRIBUTION OF ONE-STEP AND TWO-STEP FE-GMM ESTIMATOR
C.1. Some Lemmas.
Lemma 4. Assume that Condition 1 holds. Let $h(z_{it};\theta,\alpha_i)$ be a function such that (i) $h(z_{it};\theta,\alpha_i)$ is continuously differentiable in $(\theta,\alpha_i) \in \Upsilon \subset \mathbb{R}^{d_\theta+d_\alpha}$; (ii) $\Upsilon$ is convex; (iii) there exists a function $M(z_{it})$ such that $|h(z_{it};\theta,\alpha_i)| \le M(z_{it})$ and $|\partial h(z_{it};\theta,\alpha_i)/\partial(\theta,\alpha_i)| \le M(z_{it})$, with $E[M(z_{it})^{5(d_\theta+d_\alpha+6)/(1-10v)+\delta}] < \infty$ for some $\delta > 0$ and $0 < v < 1/10$. Define $\hat{H}_i(\theta,\alpha_i) := T^{-1}\sum_{t=1}^T h(z_{it};\theta,\alpha_i)$, and $H_i(\theta,\alpha_i) := E[\hat{H}_i(\theta,\alpha_i)]$. Let
\[
\bar\alpha_i = \arg\max_{\alpha_i}\ \hat{Q}_i^W(\theta^*,\alpha_i),
\]
such that $\bar\alpha_i - \alpha_{i0} = o_{up}(T^{a_\alpha})$ and $\theta^* - \theta_0 = o_P(T^{a_\theta})$, with $-2/5 \le a \le 0$ for $a = \max(a_\alpha, a_\theta)$. Then, for any $\bar\theta$ between $\theta^*$ and $\theta_0$, and $\tilde\alpha_i$ between $\bar\alpha_i$ and $\alpha_{i0}$,
\[
\sqrt{T}\big[\hat{H}_i(\bar\theta,\tilde\alpha_i) - H_i(\bar\theta,\tilde\alpha_i)\big] = o_{up}(T^{1/10}), \qquad H_i(\bar\theta,\tilde\alpha_i) - H_i(\theta_0,\alpha_{i0}) = o_{up}(T^{a}).
\]

Proof. The first statement follows from Lemma 2 in HK. The second statement follows from the first statement and the conditions of the Lemma by a mean value expansion, since
\[
\big|H_i(\bar\theta,\tilde\alpha_i) - H_i(\theta_0,\alpha_{i0})\big| \le E[M(z_{it})]\,\big(|\bar\theta - \theta_0| + |\tilde\alpha_i - \alpha_{i0}|\big) = o_{up}(T^{a}). \qquad\Box
\]
Lemma 5. Assume that Conditions 1, 2, 3 and 4 hold. Let $\hat{t}_i^W(\theta,\gamma_i)$ denote the first stage GMM score for the fixed effects, with blocks
\[
\hat{t}_i^W(\theta,\gamma_i) = \begin{pmatrix} \hat{G}_{\alpha,i}(\theta,\alpha_i)'\lambda_i \\ \hat{g}_i(\theta,\alpha_i) + \hat{W}_i\lambda_i \end{pmatrix},
\]
where $\gamma_i = (\alpha_i',\lambda_i')'$; let $\hat{s}_i^W(\theta,\gamma_i)$ denote the one-step GMM score for the common parameter, that is
\[
\hat{s}_i^W(\theta,\gamma_i) = -\hat{G}_{\theta,i}(\theta,\alpha_i)'\lambda_i;
\]
and let $\hat\gamma_i^W(\theta)$ be such that $\hat{t}_i^W(\theta,\hat\gamma_i^W(\theta)) = 0$.

Let $\hat{T}_i^W(\theta,\gamma_i)$ denote $\partial\hat{t}_i^W(\theta,\gamma_i)/\partial\gamma_i'$; $\hat{M}_i^{W,j}(\theta,\gamma_i)$ denote $\partial^2\hat{t}_i^W(\theta,\gamma_i)/\partial\gamma_{i,j}\partial\gamma_i'$, for some $0 \le j \le d_g + d_\alpha$, where $\gamma_{i,j}$ is the $j$th element of $\gamma_i$ and $j = 0$ denotes no second derivative. Let $\hat{N}_i^W(\theta,\gamma_i)$ denote $\partial\hat{t}_i^W(\theta,\gamma_i)/\partial\theta'$ and $\hat{S}_i^W(\theta,\gamma_i)$ denote $\partial\hat{s}_i^W(\theta,\gamma_i)/\partial\theta'$. Let $(\hat\theta, \hat\gamma_1, \ldots, \hat\gamma_n)$ be the one-step GMM estimator.

Then, for any $\bar\theta$ between $\hat\theta$ and $\theta_0$, and $\bar\gamma_i$ between $\hat\gamma_i$ and $\gamma_{i0}$,
\[
\hat{T}_i^W(\bar\theta,\bar\gamma_i) - T_i^W = o_{up}(1), \quad \hat{M}_i^W(\bar\theta,\bar\gamma_i) - M_i^W = o_{up}(1), \quad \hat{N}_i^W(\bar\theta,\bar\gamma_i) - N_i^W = o_{up}(1), \quad \hat{S}_i^W(\bar\theta,\bar\gamma_i) - S_i^W = o_{up}(1).
\]
Also, for any $\bar\gamma_{i0}$ between $\hat\gamma_{i0}$ and $\gamma_{i0} = \gamma_i^W(\theta_0)$,
\[
\sqrt{T}\,\hat{t}_i^W(\theta_0,\gamma_{i0}) = o_{up}(T^{1/10}), \quad \sqrt{T}\big(\hat{T}_i^W(\theta_0,\bar\gamma_{i0}) - T_i^W\big) = o_{up}(T^{1/10}), \quad \sqrt{T}\big(\hat{M}_i^W(\theta_0,\bar\gamma_{i0}) - M_i^W\big) = o_{up}(T^{1/10}).
\]

Proof. The first set of results follows by inspection of the scores and their derivatives (the expressions are given in Appendix G), uniform consistency of $\hat\gamma_i$ by Theorem 1, and application of the first part of Lemma 4 to $\theta^* = \hat\theta$ and $\bar\alpha_i = \hat\alpha_i$ with $a = 0$.

The following steps are used to prove the second set of results. By Lemma 4,
\[
\sqrt{T}\,\hat{t}_i^W(\theta_0,\gamma_{i0}) = o_{up}(T^{1/10}), \qquad \hat{T}_i^W(\theta_0,\bar\gamma_{i0}) - T_i^W = o_{up}(1),
\]
where $\bar\gamma_{i0}$ is between $\hat\gamma_{i0}$ and $\gamma_{i0}$. Then, a mean value expansion of the FOC for $\hat\gamma_{i0}$, $\hat{t}_i^W(\theta_0,\hat\gamma_{i0}) = 0$, around $\hat\gamma_{i0} = \gamma_{i0}$ gives
\[
0 = \sqrt{T}\,\hat{t}_i^W(\theta_0,\gamma_{i0}) + \hat{T}_i^W(\theta_0,\bar\gamma_{i0})\,\sqrt{T}(\hat\gamma_{i0} - \gamma_{i0}) = o_{up}(T^{1/10}) + \big[T_i^W + o_{up}(1)\big]\,\sqrt{T}(\hat\gamma_{i0} - \gamma_{i0}),
\]
by Condition 3 and the previous result. Therefore,
\[
\big(1 + o_{up}(1)\big)\,\sqrt{T}(\hat\gamma_{i0} - \gamma_{i0}) = o_{up}(T^{1/10}), \qquad \text{so } \sqrt{T}(\hat\gamma_{i0} - \gamma_{i0}) = o_{up}(T^{1/10}).
\]
Given this uniform rate for $\hat\gamma_{i0}$, the desired result can be obtained by applying the second part of Lemma 4 to $\theta^* = \theta_0$ and $\bar\alpha_i = \hat\alpha_{i0}$ with $a = -2/5$. $\Box$
C.2. Proof of Theorem 2.

Proof. By a mean value expansion of the FOC for $\hat\theta$ around $\hat\theta = \theta_0$,
\[
0 = \hat{s}^W(\hat\theta) = \hat{s}^W(\theta_0) + \frac{d\hat{s}^W(\bar\theta)}{d\theta'}\,(\hat\theta - \theta_0),
\]
where $\bar\theta$ lies between $\hat\theta$ and $\theta_0$.

Part I: Asymptotic limit of $d\hat{s}^W(\bar\theta)/d\theta'$. Note that
\[
\frac{d\hat{s}^W(\bar\theta)}{d\theta'} = \frac{1}{n}\sum_{i=1}^n \frac{d\hat{s}_i^W(\bar\theta,\hat\gamma_i^W(\bar\theta))}{d\theta'} = \frac{1}{n}\sum_{i=1}^n \Big[ \frac{\partial \hat{s}_i^W(\bar\theta,\hat\gamma_i^W(\bar\theta))}{\partial\theta'} + \frac{\partial \hat{s}_i^W(\bar\theta,\hat\gamma_i^W(\bar\theta))}{\partial\gamma_i'}\, \frac{\partial\hat\gamma_i^W(\bar\theta)}{\partial\theta'} \Big]. \tag{C.1}
\]
By Lemma 5,
\[
\frac{\partial \hat{s}_i^W(\bar\theta,\hat\gamma_i^W(\bar\theta))}{\partial\theta'} = S_i^W + o_{up}(1), \qquad \frac{\partial \hat{s}_i^W(\bar\theta,\hat\gamma_i^W(\bar\theta))}{\partial\gamma_i'} = M_i^W + o_{up}(1).
\]
Then, differentiation of the FOC for $\hat\gamma_i^W(\theta)$, $\hat{t}_i^W(\theta,\hat\gamma_i^W(\theta)) = 0$, with respect to $\theta$ and $\gamma_i$ gives
\[
\hat{N}_i^W(\theta,\hat\gamma_i^W(\theta)) + \hat{T}_i^W(\theta,\hat\gamma_i^W(\theta))\,\frac{\partial\hat\gamma_i^W(\theta)}{\partial\theta'} = 0.
\]
By repeated application of Lemma 5 and Condition 3,
\[
\frac{\partial\hat\gamma_i^W(\bar\theta)}{\partial\theta'} = -(T_i^W)^{-1} N_i^W + o_{up}(1).
\]
Finally, replacing the expressions for the components in (C.1) and using the formulae for the derivatives, which are provided in Appendix G,
\[
\frac{d\hat{s}^W(\bar\theta)}{d\theta'} = -\frac{1}{n}\sum_{i=1}^n G_{\theta,i}'\, P_i^W\, G_{\theta,i} + o_{up}(1) = -\bar{J}^W + o_P(1), \qquad \bar{J}^W = \frac{1}{n}\sum_{i=1}^n E\big[G_{\theta,i}'\, P_i^W\, G_{\theta,i}\big]. \tag{C.2}
\]

Part II: Asymptotic expansion for $\hat\theta - \theta_0$. By (C.2) and Lemma 22, which states the stochastic expansion of $\sqrt{nT}\,\hat{s}^W(\theta_0)$,
\[
0 = \underbrace{\sqrt{nT}\,\hat{s}^W(\theta_0)}_{O_P(1)} + \underbrace{\frac{d\hat{s}^W(\bar\theta)}{d\theta'}}_{O_P(1)}\,\sqrt{nT}(\hat\theta - \theta_0).
\]
Therefore, $\sqrt{nT}(\hat\theta - \theta_0) = O_P(1)$, and by Part I, Lemma 22 and Condition 3,
\[
\sqrt{nT}(\hat\theta - \theta_0) \to_d (J^W)^{-1}\, N(\kappa B^W, V^W). \qquad\Box
\]
C.3. Proof of Theorem 4.

Proof. Applying Lemma 4 with a minor modification, along with Condition 4, we can prove an exact counterpart to Lemma 5 for the two-step GMM score for the fixed effects, $\hat{t}_i(\theta,\gamma_i)$, and for the two-step score of the common parameter, $\hat{s}(\theta,\hat\gamma(\theta))$, whose expressions are given in Appendix G. The only difference arises due to the terms that involve $\hat\Omega_i(\bar\theta,\bar\alpha_i) - \Omega_i$. Lemma 8 shows that $\sqrt{T}\big(\hat\Omega_i(\bar\theta,\bar\alpha_i) - \Omega_i\big) = o_{up}(T^{1/10})$, so that a result similar to Lemma 5 holds for the two-step scores. Thus, we can make the same argument as in the proof of Theorem 2 using the stochastic expansion of $\sqrt{nT}\,\hat{s}(\theta_0)$ given in Lemma 23. $\Box$
APPENDIX D. ASYMPTOTIC DISTRIBUTION OF BIAS-CORRECTED TWO-STEP GMM ESTIMATOR
D.1. Some Lemmas.
Lemma 6. Assume that Conditions 1, 2, 3, 4 and 5 hold. Let $\hat{t}_i(\theta,\gamma_i)$ denote the two-step GMM score for the fixed effects, $\hat{s}_i(\theta,\gamma_i)$ denote the two-step GMM score for the common parameter, and $\hat\gamma_i(\theta)$ be such that $\hat{t}_i(\theta,\hat\gamma_i(\theta)) = 0$. Let $\hat{T}_i(\theta,\gamma_i)$ denote $\partial\hat{t}_i(\theta,\gamma_i)/\partial\gamma_i'$; $\hat{M}_i^j(\theta,\gamma_i)$ denote $\partial^2\hat{t}_i(\theta,\gamma_i)/\partial\gamma_{i,j}\partial\gamma_i'$, for some $0 \le j \le d_g + d_\alpha$, where $\gamma_{i,j}$ is the $j$th component of $\gamma_i$ and $j = 0$ denotes no second derivative. Let $\hat{N}_i(\theta,\gamma_i)$ denote $\partial\hat{t}_i(\theta,\gamma_i)/\partial\theta'$, and $\hat{S}_i(\theta,\gamma_i)$ denote $\partial\hat{s}_i(\theta,\gamma_i)/\partial\theta'$. Let $(\hat\theta, \hat\gamma_1, \ldots, \hat\gamma_n)$ be the two-step GMM estimators.

Then, for any $\bar\theta$ between $\hat\theta$ and $\theta_0$, and $\bar\gamma_i$ between $\hat\gamma_i$ and $\gamma_{i0}$,
\[
\sqrt{T}\big(\hat{T}_i(\bar\theta,\bar\gamma_i) - T_i\big) = o_{up}(T^{1/10}), \qquad \sqrt{T}\big(\hat{M}_i(\bar\theta,\bar\gamma_i) - M_i\big) = o_{up}(T^{1/10}),
\]
\[
\sqrt{T}\big(\hat{N}_i(\bar\theta,\bar\gamma_i) - N_i\big) = o_{up}(T^{1/10}), \qquad \sqrt{T}\big(\hat{S}_i(\bar\theta,\bar\gamma_i) - S_i\big) = o_{up}(T^{1/10}).
\]

Proof. Let $\bar\gamma_i = \hat\gamma_i(\bar\theta)$ and $\hat\gamma_{i0} = \hat\gamma_i(\theta_0)$. First, note that
\[
\sqrt{T}(\bar\gamma_i - \hat\gamma_{i0}) = -(T_i)^{-1} N_i\, \sqrt{T}(\bar\theta - \theta_0) + o_{up}\big(\sqrt{T}(\bar\theta - \theta_0)\big) = O_{up}(n^{-1/2}),
\]
where the second equality follows from the proofs of Theorems 2 and 4. Thus, by the same argument as used in the proof of Lemma 5,
\[
\sqrt{T}(\bar\gamma_i - \gamma_{i0}) = \sqrt{T}(\bar\gamma_i - \hat\gamma_{i0}) + \sqrt{T}(\hat\gamma_{i0} - \gamma_{i0}) = o_{up}(T^{1/10}).
\]
Given this result and inspection of the scores and their derivatives (see Appendix G), the proof is similar to the proof of the second part of Lemma 5. $\Box$
Lemma 7. Assume that Condition 1 holds. Let $h_j(z_{it};\theta,\alpha_i)$, $j = 1, 2$, be two functions such that (i) $h_j(z_{it};\theta,\alpha_i)$ is continuously differentiable in $(\theta,\alpha_i) \in \Upsilon \subset \mathbb{R}^{d_\theta+d_\alpha}$; (ii) $\Upsilon$ is convex; (iii) there exists a function $M(z_{it})$ such that $|h_j(z_{it};\theta,\alpha_i)| \le M(z_{it})$ and $|\partial h_j(z_{it};\theta,\alpha_i)/\partial(\theta,\alpha_i)| \le M(z_{it})$, with $E[M(z_{it})^{10(d_\theta+d_\alpha+6)/(1-10v)+\delta}] < \infty$ for some $\delta > 0$ and $0 < v < 1/10$. Define $\hat{F}_i(\theta,\alpha_i) := T^{-1}\sum_{t=1}^T h_1(z_{it};\theta,\alpha_i)\,h_2(z_{it};\theta,\alpha_i)$, and $F_i(\theta,\alpha_i) := E[\hat{F}_i(\theta,\alpha_i)]$. Let
\[
\bar\alpha_i = \arg\sup_{\alpha_i}\ \hat{Q}_i(\theta^*,\alpha_i),
\]
such that $\bar\alpha_i - \alpha_{i0} = o_{up}(T^{a_\alpha})$ and $\theta^* - \theta_0 = o_P(T^{a_\theta})$, with $-2/5 \le a \le 0$ for $a = \max(a_\alpha, a_\theta)$. Then, for any $\bar\theta$ between $\theta^*$ and $\theta_0$, and $\tilde\alpha_i$ between $\bar\alpha_i$ and $\alpha_{i0}$,
\[
F_i(\bar\theta,\tilde\alpha_i) - F_i(\theta_0,\alpha_{i0}) = o_{up}(T^{a}), \qquad \sqrt{T}\big[\hat{F}_i(\bar\theta,\tilde\alpha_i) - F_i(\bar\theta,\tilde\alpha_i)\big] = o_{up}(T^{1/10}).
\]

Proof. Same as for Lemma 4, replacing $H$ by $F$ and $M(z_{it})$ by $M(z_{it})^2$. $\Box$

Lemma 8. Assume that Conditions 1, 2, 3, 4, 5, and 6 hold. Let $\hat\Omega_i(\bar\theta,\bar\alpha_i) = T^{-1}\sum_{t=1}^T g(z_{it};\bar\theta,\bar\alpha_i)\,g(z_{it};\bar\theta,\bar\alpha_i)'$ be an estimator of the covariance function $\Omega_i = E[g(z_{it})g(z_{it})']$, where $\bar\theta = \theta_0 + o_P(T^{-2/5})$ and $\bar\alpha_i = \alpha_{i0} + o_{up}(T^{-2/5})$. Let
\[
\hat\Omega_i^{d_1 d_2}(\theta,\alpha_i) := \frac{\partial^{d_1+d_2}\hat\Omega_i(\theta,\alpha_i)}{\partial\theta^{d_1}\,\partial\alpha_i^{d_2}}
\]
for $0 \le d_1 + d_2 \le 2$. Then,
\[
\sqrt{T}\big(\hat\Omega_i^{d_1 d_2}(\bar\theta,\bar\alpha_i) - \Omega_i^{d_1 d_2}\big) = o_{up}(T^{1/10}).
\]

Proof. Note that the $(k, l)$ element of $\hat\Omega_i(\bar\theta,\bar\alpha_i)$ is $T^{-1}\sum_{t=1}^T g_k(z_{it};\bar\theta,\bar\alpha_i)\,g_l(z_{it};\bar\theta,\bar\alpha_i)$, with population counterpart $E[g_k(z_{it})\,g_l(z_{it})]$. Then we can apply Lemma 7 to $h_1 = g_k$ and $h_2 = g_l$ with $a = -2/5$. A similar argument applies to the derivatives, since they are sums of products of elements that satisfy the assumptions of Lemma 7. $\Box$
Lemma 9. Assume that Conditions 1, 2, 3, 4, 5, and 6 hold, and $\ell \to \infty$ such that $\ell/T \to 0$ as $T \to \infty$. For any $\bar\theta$ between $\hat\theta$ and $\theta_0$, let
\[
\hat\Sigma_i(\bar\theta) = \big[\hat{G}_{\alpha,i}(\bar\theta)'\,\hat\Omega_i(\bar\theta)^{-1}\,\hat{G}_{\alpha,i}(\bar\theta)\big]^{-1}, \quad \hat{H}_i(\bar\theta) = \hat\Sigma_i(\bar\theta)\,\hat{G}_{\alpha,i}(\bar\theta)'\,\hat\Omega_i(\bar\theta)^{-1}, \quad \hat{P}_i(\bar\theta) = \hat\Omega_i(\bar\theta)^{-1} - \hat\Omega_i(\bar\theta)^{-1}\,\hat{G}_{\alpha,i}(\bar\theta)\,\hat{H}_i(\bar\theta),
\]
and let $\hat\Sigma_i^W(\bar\theta)$, $\hat{H}_i^W(\bar\theta)$, $\hat{J}_{\alpha,i}(\bar\theta)$, $\hat{B}_i^\lambda(\bar\theta)$ and $\hat{B}_i^\Sigma(\bar\theta)$ be the corresponding plug-in estimators of $\Sigma_i^W$, $H_i^W$, $J_{\alpha,i}$, $B_i^\lambda$ and $B_i^\Sigma$, where the spectral expectations are estimated by truncated sums of the form $\sum_{j=0}^{\ell} T^{-1}\sum_{t=j+1}^{T} (\cdot)_{it}\,(\cdot)_{i,t-j}'$ evaluated at $(\bar\theta, \hat\gamma_i(\bar\theta))$. Let $\hat{F}_{d_1 d_2 i}(\theta, \hat\gamma_i(\theta))$ and $F_{d_1 d_2 i}$, $F \in \{\Sigma, H, P, \Sigma^W, H^W, J_\alpha, B^\lambda, B^\Sigma\}$, denote these estimators and objects and their derivatives for $0 \le d_1 + d_2 \le 1$, with $F_i := F_i$ if $d_1 + d_2 = 0$. Then,
\[
\hat{F}_{d_1 d_2 i}(\bar\theta, \hat\gamma_i(\bar\theta)) - F_{d_1 d_2 i} = o_{up}(T^{1/10}).
\]

Proof. The results follow by Theorem 3 and Lemma 6, using the algebraic properties of the $o_{up}$ orders and Lemma 12 of HK to show the properties of the estimators of the spectral expectations. $\Box$

Lemma 10. Assume that Conditions 1, 2, 3, 4, 5, and 6 hold. Then, for any $\bar\theta$ between $\hat\theta$ and $\theta_0$,
\[
\hat{J}_s(\bar\theta) = J_s + o_P(T^{-2/5}).
\]

Proof. Note that
\[
\hat{J}_s(\bar\theta) - J_s = \frac{1}{n}\sum_{i=1}^n \big( G_{\theta,i}'\, P_i\, G_{\theta,i} - E[G_{\theta,i}'\, P_i\, G_{\theta,i}] \big) + o_{up}(T^{-2/5}),
\]
by Theorem 3 and Lemmas 6 and 9, using the algebraic properties of the $o_{up}$ orders. The result then follows by a CLT for independent sequences. $\Box$

Lemma 11. Assume that Conditions 1, 2, 3, 4, 5, and 6 hold. Then, for any $\bar\theta$ between $\hat\theta$ and $\theta_0$,
\[
\hat{B}_s(\bar\theta) = B_s + o_P(T^{-2/5}).
\]

Proof. Analogous to the proof of Lemma 10, replacing $J_s$ by $B_s$. $\Box$

Lemma 12. Assume that Conditions 1, 2, 3, 4, 5, and 6 hold. Then, for any $\bar\theta$ between $\hat\theta$ and $\theta_0$, and $B_\theta = -J_s^{-1} B_s$,
\[
\hat{B}_\theta(\bar\theta) = -\hat{J}_s(\bar\theta)^{-1}\hat{B}_s(\bar\theta) = B_\theta + o_P(T^{-2/5}).
\]

Proof. The result follows from Lemmas 10 and 11, using a Taylor expansion argument. $\Box$
D.2. Proof of Theorem 5.

Proof. Case I: $C = BC$. By Lemmas 10 and 25,
\[
\hat{B}_\theta(\bar\theta) = -\hat{J}_s(\bar\theta)^{-1}\hat{B}_s(\bar\theta) = -J_s^{-1}\hat{B}_s(\bar\theta) + o_P(T^{-2/5}) = B_\theta + o_P(1).
\]
Then, by Lemmas 12 and 25,
\[
\sqrt{nT}\big(\tilde\theta^{BC} - \theta_0\big) = \sqrt{nT}(\hat\theta - \theta_0) - \sqrt{n/T}\,\hat{B}_\theta(\bar\theta) = \sqrt{nT}(\hat\theta - \theta_0) + \sqrt{n/T}\,J_s^{-1}B_s + o_P(1) \to_d N(0, J_s^{-1}),
\]
since the correction removes the asymptotic bias of $\sqrt{nT}(\hat\theta - \theta_0)$.

Case II: $C = SBC$. First, note that since the correction of the score is of order $O_P(T^{-1})$, $\tilde\theta^{SBC} - \hat\theta = O_P(T^{-1})$. Then, by a Taylor expansion of the corrected FOC around $\tilde\theta^{SBC} = \theta_0$,
\[
0 = \hat{s}(\tilde\theta^{SBC}) - T^{-1}\hat{B}_s(\tilde\theta^{SBC}) = \hat{s}(\theta_0) + \frac{d\hat{s}(\bar\theta)}{d\theta'}\big(\tilde\theta^{SBC} - \theta_0\big) - T^{-1}\hat{B}_s(\tilde\theta^{SBC}),
\]
where $\bar\theta$ lies between $\tilde\theta^{SBC}$ and $\theta_0$. Then, by Lemma 25,
\[
\sqrt{nT}\big(\tilde\theta^{SBC} - \theta_0\big) = \hat{J}_s(\bar\theta)^{-1}\big[\sqrt{nT}\,\hat{s}(\theta_0) - n^{1/2}T^{-1/2}\hat{B}_s\big] = J_s^{-1}\big[\sqrt{nT}\,\hat{s}(\theta_0) - \sqrt{n/T}\,B_s\big] + o_P(1) \to_d N(0, J_s^{-1}).
\]

Case III: $C = IBC$. A similar argument applies to the estimating equation (5.2), since $\tilde\theta^{IBC}$ is in a $O(T^{-1})$ neighborhood of $\theta_0$. $\Box$
APPENDIX E. STOCHASTIC EXPANSIONS FOR $\hat\gamma_{i0}^W = \hat\gamma_i^W(\theta_0)$ AND $\hat\gamma_{i0} = \hat\gamma_i(\theta_0)$

We characterize the stochastic expansions up to second order of the one-step and two-step estimators of the individual effects, given the true common parameter. We only provide detailed proofs of the results for the two-step estimator $\hat\gamma_{i0}$, because the proofs for the one-step estimator $\hat\gamma_{i0}^W$ follow by similar arguments. Lemmas 1 and 2 in the main text are corollaries of these expansions. The expressions for the scores and their derivatives in the components of the expansions are given in Appendix G.
Lemma 13. Suppose that Conditions 1, 2, 3, and 4 hold. Then,
\[
\sqrt{T}\big(\hat\gamma_{i0}^W - \gamma_{i0}\big) = \psi_i^W + T^{-1/2} R_{1i}^W, \qquad \psi_i^W = -(T_i^W)^{-1}\sqrt{T}\,\hat{t}_i^W(\theta_0,\gamma_{i0}) \to_d N(0, V_i^W),
\]
where $\psi_i^W = o_{up}(T^{1/10})$, $R_{1i}^W = o_{up}(T^{1/5})$, and $V_i^W = E[\psi_i^W \psi_i^W{}']$. Also,
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_i^W = O_P(1).
\]

Proof. We only show the result for the remainder term, because the rest of the proof is similar to the proof of Lemma 16. By the proof of Lemma 5, $\sqrt{T}(\hat\gamma_{i0}^W - \gamma_{i0}) = o_{up}(T^{1/10})$ and
\[
R_{1i}^W = -(T_i^W)^{-1}\,\underbrace{\sqrt{T}\big(\hat{T}_i^W(\theta_0,\bar\gamma_{i0}) - T_i^W\big)}_{o_{up}(T^{1/10})}\,\underbrace{\sqrt{T}\big(\hat\gamma_{i0}^W - \gamma_{i0}\big)}_{o_{up}(T^{1/10})} = o_{up}(T^{1/5}). \qquad\Box
\]
Lemma 14. Suppose that Conditions 1, 2, 3, and 4 hold. Then,
\[
\sqrt{T}\big(\hat\gamma_{i0}^W - \gamma_{i0}\big) = \psi_i^W + T^{-1/2} Q_{1i}^W + T^{-1} R_{2i}^W,
\]
where
\[
Q_{1i}^W = -(T_i^W)^{-1}\Big[\sqrt{T}\big(\hat{T}_i^W(\theta_0,\gamma_{i0}) - T_i^W\big)\,\psi_i^W + \frac{1}{2}\sum_{j=1}^{d_g+d_\alpha} M_i^{W,j}\,\psi_i^W\,\psi_{i,j}^W\Big],
\]
$\psi_{i,j}^W$ is the $j$th element of $\psi_i^W$, $Q_{1i}^W = o_{up}(T^{1/5})$, and $R_{2i}^W = o_{up}(T^{3/10})$. Also,
\[
\frac{1}{n}\sum_{i=1}^n Q_{1i}^W = O_P(1).
\]

Proof. Similar to the proof of Lemma 18. $\Box$
Lemma 15. Suppose that Conditions 1, 2, 3, and 4 hold. Then,

  n^{−1/2} Σ_{i=1}^n ψ_i^W →_d N(0, V^W),  n^{−1} Σ_{i=1}^n E[Q_{1i}^W] → E[B_i^{W,I} + B_i^{W,G} + B_i^{W,Σ}] =: B^W,

where V^W = lim_{n→∞} n^{−1} Σ_{i=1}^n E[ψ_i^W ψ_i^W′] and, for

  Σ_i^W = (G_{α_i}′ W_i⁻¹ G_{α_i})⁻¹,  H_i^W = Σ_i^W G_{α_i}′ W_i⁻¹,  P_i^W = W_i⁻¹ − W_i⁻¹ G_{α_i} H_i^W,

the components B_i^{W,I}, B_i^{W,G}, and B_i^{W,Σ} of the higher-order bias are spectral sums of the forms

  Σ_{j=−∞}^∞ E[G_{α_i}(z_it)′ P_i^W g(z_{i,t−j})]  and  Σ_{j=−∞}^∞ E[g(z_it) g(z_it)′ P_i^W g(z_{i,t−j})],

combined through Σ_i^W, H_i^W, and P_i^W.
Proof. The results follow from Lemmas 13 and 14, noting that

  ψ_i^W = −T^{−1/2} Σ_{t=1}^T [ H_i^W ; P_i^W ] g(z_it),

  E[ψ_i^W ψ_i^W′] = Σ_{j=−∞}^∞ [ H_i^W Ω_{ij} H_i^W′ , H_i^W Ω_{ij} P_i^W ; P_i^W Ω_{ij} H_i^W′ , P_i^W Ω_{ij} P_i^W ],  Ω_{ij} = E[g(z_it) g(z_{i,t−j})′],

and that the expectations involving T̄_{ij}^W ψ_i^W take different forms for j ≤ d_α and j > d_α, with the j > d_α cases involving G_{αα_i}′(I_{d_α} ⊗ e_{j−d_α}), as given in Appendix G. ∎
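The spectral sums Σ_j E[·] above are long-run (HAC) covariances. As an illustration only, a truncated Bartlett-kernel estimate of such a sum can be sketched as follows (the bandwidth choice here is hypothetical, in the spirit of Andrews (1991), not the estimator used in the thesis):

```python
import numpy as np

def long_run_cov(u, L):
    """Bartlett-kernel estimate of sum_{j=-inf}^{inf} E[u_t u_{t-j}']."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean(axis=0)               # demean each column
    T = u.shape[0]
    S = u.T @ u / T                      # lag-0 autocovariance
    for j in range(1, L + 1):
        w = 1.0 - j / (L + 1.0)          # Bartlett weight
        G = u[j:].T @ u[:-j] / T         # sample autocovariance at lag j
        S += w * (G + G.T)               # add lag j and lag -j
    return S

# Usage: an MA(1) process e_t + 0.5 e_{t-1} has long-run variance (1 + 0.5)^2.
rng = np.random.default_rng(1)
e = rng.normal(size=500)
u = (e[1:] + 0.5 * e[:-1]).reshape(-1, 1)
S = long_run_cov(u, L=12)                # rough estimate of 2.25
```

The Bartlett weights guarantee a positive semi-definite estimate; with a fixed lag truncation the same loop gives the truncated sums used in bias estimation.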
Lemma 16. Suppose that Conditions 1, 2, 3, 4, 5, and 6 hold. Then,

  √T(γ̃_{i0} − γ_{i0}) = ψ̃_i + T^{−1/2} R̃_{1i},  ψ̃_i →_d N(0, Ṽ_i),

where ψ̃_i = −(T̄_i)⁻¹ √T s̃_i(θ₀, γ_{i0}) and Ṽ_i = E[ψ̃_i ψ̃_i′]. Also,

  ψ̃_i = o_uP(T^{1/10}),  R̃_{1i} = o_uP(T^{1/5}),  n^{−1/2} Σ_{i=1}^n ψ̃_i = O_P(1).
Proof. The statements about ψ̃_i follow by the proof of Lemma 5 applied to the second stage, and the CLT in Lemma 3 of HK. From an argument similar to the proof of Lemma 5,

  R̃_{1i} = −(T̄_i)⁻¹ √T(T̂_i(θ₀, γ̄_i) − T̄_i) √T(γ̃_{i0} − γ_{i0}) = O_uP(1) o_uP(T^{1/10}) o_uP(T^{1/10}) = o_uP(T^{1/5}),

by Conditions 3 and 4. ∎
Lemma 17. Suppose that Conditions 1, 2, 3, 4, and 5 hold. Then,

  ĝ_i(θ₀, α̂_i) = ĝ_i(θ₀, α_{i0}) + T^{−1/2} ψ_i^g + T^{−1} R_{1i}^g,

where

  ψ_i^g = G_{α_i} ψ_i^{W,α},  ψ_i^{W,α} = (ψ_{i,1}^W, …, ψ_{i,d_α}^W)′,

and ψ_{i,j}^W is the jth element of ψ_i^W. Also, ψ_i^g = o_uP(T^{1/10}) and R_{1i}^g = o_uP(T^{1/5}).
Proof. By a mean value expansion around (θ₀, α_{i0}),

  ĝ_i(θ₀, α̂_i) = ĝ_i(θ₀, α_{i0}) + Ĝ_{α_i}(θ₀, α_{i0})(α̂_i − α_{i0}) + ½ Σ_{j=1}^{d_α} (α̂_{i,j} − α_{i0,j}) Ĝ_{αα_{ij}}(θ₀, ᾱ_i)(α̂_i − α_{i0}),

where (θ₀, ᾱ_i) lies between (θ₀, α̂_i) and (θ₀, α_{i0}). The expression for ψ_i^g can be obtained using the expansion for γ̂_{i0} in Lemma 13, since α̂_i − α̂_{i0} = o_uP(T^{−3/10}). The order of this term follows from Lemma 13 and the CLT for independent sequences. The remainder term R_{1i}^g collects the higher-order terms

  G_{α_i} R_{1i}^{W,α} + √T(Ĝ_{α_i}(θ₀, α_{i0}) − G_{α_i}) √T(α̂_i − α_{i0}) + ½ Σ_{j=1}^{d_α} √T(α̂_{i,j} − α_{i0,j}) Ĝ_{αα_{ij}}(θ₀, ᾱ_i) √T(α̂_i − α_{i0}).

The uniform rate of convergence then follows by Lemmas 8 and 13, and Theorem 1. ∎
Lemma 18. Suppose that Conditions 1, 2, 3, 4, and 5 hold. Then,

  √T(γ̃_{i0} − γ_{i0}) = ψ̃_i + T^{−1/2} Q̃_{1i} + T^{−1} R̃_{2i},   (E.1)

where Q̃_{1i} = Q̃_{1i}(ψ̃_i, ψ_i^W) is obtained in the proof,

  Ã_i = −(T̄_i)⁻¹ √T(T̂_i(θ₀, γ_{i0}) − T̄_i) = o_uP(T^{1/10}),  R̃_{2i} = o_uP(T^{3/10}).

Also, n^{−1} Σ_{i=1}^n Q̃_{1i} = O_P(1).
Proof. By a second order Taylor expansion of the FOC for γ̃_{i0}, we have

  0 = s̃_i(θ₀, γ_{i0}) + T̂_i(γ̃_{i0} − γ_{i0}) + ½ Σ_{j=1}^{d_g+d_α} (γ̃_{i0,j} − γ_{i0,j}) T̂_{ij}(θ₀, γ̄_i)(γ̃_{i0} − γ_{i0}) + diag[0, Ω̂_i − Ω_i](γ̃_{i0} − γ_{i0}),

where γ̄_i lies between γ̃_{i0} and γ_{i0}. The expression for Q̃_{1i} can be obtained in a similar fashion as in Lemma A4 in Newey and Smith (2004). The rest of the properties for Q̃_{1i} follow by Lemma 5 applied to the second stage, Lemma 16, and an argument similar to the proof of Theorem 1 in HK that uses Corollary A.2 of Hall and Heyde (1980, p. 278) and Lemma 1 of Andrews (1991). The remainder term R̃_{2i} collects the higher-order terms of the expansion, including diag[0, √T(Ω̂_i − Ω_i)] R̃_{1i}. The uniform rate of convergence then follows by Lemmas 5 and 16, and Conditions 3 and 4. ∎
Lemma 19. Suppose that Conditions 1, 2, 3, 4, 5, and 6 hold. Then,

  n^{−1/2} Σ_{i=1}^n ψ̃_i →_d N(0, E[Ṽ_i]),  n^{−1} Σ_{i=1}^n E[Q̃_{1i}] → E[B_i^I + B_i^G + B_i^Σ + B_i^Ω] =: B̃,

where, for

  Σ_{α_i} = (G_{α_i}′ Ω_i⁻¹ G_{α_i})⁻¹,  H_{α_i} = Σ_{α_i} G_{α_i}′ Ω_i⁻¹,  P_{α_i} = Ω_i⁻¹ − Ω_i⁻¹ G_{α_i} H_{α_i},

the components B_i^I, B_i^G, B_i^Σ, and B_i^Ω are spectral sums of the forms

  Σ_{j=−∞}^∞ E[G_{α_i}(z_it)′ P_{α_i} g(z_{i,t−j})]  and  Σ_{j=−∞}^∞ E[g(z_it) g(z_it)′ P_{α_i} g(z_{i,t−j})],

combined through Σ_{α_i}, H_{α_i}, and P_{α_i}, with B_i^Ω collecting the contribution of the estimated weight Ω̂_i.
Proof. The results follow by Lemmas 16 and 18, noting that

  ψ̃_i = −T^{−1/2} Σ_{t=1}^T [ H_{α_i} ; P_{α_i} ] g(z_it),

that the expectations involving T̄_{ij} ψ̃_i take different forms for j ≤ d_α and j > d_α, as in the proof of Lemma 15 with W_i replaced by Ω_i, and that the correction term diag[0, Ω̂_i − Ω_i] contributes the spectral sum Σ_{j=−∞}^∞ E[g(z_it) g(z_it)′ P_{α_i} g(z_{i,t−j})] to B_i^Ω. ∎
APPENDIX F. STOCHASTIC EXPANSION FOR ŝ_{θi}^W(θ₀, γ̂_{i0}) AND s̃_{θi}(θ₀, γ̃_{i0})
We characterize stochastic expansions up to second order for the one-step and two-step profile scores of the common parameter evaluated at the true value of the common parameter. The expressions for the scores and their derivatives in the components of the expansions are given in Appendix G.
Lemma 20. Suppose that Conditions 1, 2, 3, and 4 hold. Then,

  ŝ_{θi}^W(θ₀, γ̂_{i0}) = T^{−1/2} b_i^W + T^{−1} Q_i^W + T^{−3/2} R_i^W,

where

  b_i^W = M̄_i^W ψ_i^W = o_uP(T^{1/10}),
  Q_i^W = M̄_i^W Q_{1i}^W + ½ Σ_{j=1}^{d_g+d_α} ψ_{i,j}^W M̄_{ij}^W ψ_i^W = o_uP(T^{1/5}),
  R_i^W = o_uP(T^{2/5}).

Also,

  n^{−1/2} Σ_{i=1}^n b_i^W = O_P(1),  n^{−1} Σ_{i=1}^n Q_i^W = O_P(1).
Proof. By a second order Taylor expansion of ŝ_{θi}^W(θ₀, γ̂_{i0}) around γ̂_{i0} = γ_{i0},

  ŝ_{θi}^W(θ₀, γ̂_{i0}) = ŝ_{θi}^W(θ₀, γ_{i0}) + M̂_i^W (γ̂_{i0} − γ_{i0}) + ½ Σ_{j=1}^{d_g+d_α} (γ̂_{i0,j} − γ_{i0,j}) M̂_{ij}^W(θ₀, γ̄_i)(γ̂_{i0} − γ_{i0}),

where γ̄_i lies between γ̂_{i0} and γ_{i0}. Noting that ŝ_{θi}^W(θ₀, γ_{i0}) = 0 and using the expansion for γ̂_{i0} in Lemma 14, we can obtain the expressions for b_i^W and Q_i^W, after some algebra. The rest of the properties for these terms follow by the properties of ψ_i^W and Q_{1i}^W. The remainder term R_i^W collects the higher-order terms, including

  ½ Σ_{j=1}^{d_g+d_α} √T(γ̂_{i0,j} − γ_{i0,j}) (M̂_{ij}^W(θ₀, γ̄_i) − M̄_{ij}^W) √T(γ̂_{i0} − γ_{i0}).

The uniform order of R_i^W follows by the properties of the components in the expansion of γ̂_{i0}, Lemma 5, and Conditions 3 and 4. ∎
Lemma 21. Suppose that Conditions 1, 2, 3, and 4 hold. Then,

  n^{−1/2} Σ_{i=1}^n b_i^W →_d N(0, V_s^W),  V_s^W = E[G_{θ_i}′ P_i^W Ω_i P_i^W G_{θ_i}],

  n^{−1} Σ_{i=1}^n E[Q_i^W] → E[B_i^{W,B} + B_i^{W,C} + B_i^{W,V}] =: B_s^W,

where B_i^{W,B} = −G_{θ_i}′(B_i^{W,I} + B_i^{W,G} + B_i^{W,Σ}) for the components of B^W in Lemma 15,

  B_i^{W,C} = Σ_{j=−∞}^∞ E[G_{θ_i}(z_it)′ P_i^W g(z_{i,t−j})],

and B_i^{W,V} collects the terms involving G_{θα_i}′(I_{d_α} ⊗ e_j) and H_i^W, with Σ_i^W = (G_{α_i}′ W_i⁻¹ G_{α_i})⁻¹, H_i^W = Σ_i^W G_{α_i}′ W_i⁻¹, and P_i^W = W_i⁻¹ − W_i⁻¹ G_{α_i} H_i^W.
Proof. The results follow by Lemmas 20 and 15, noting that

  b_i^W = G_{θ_i}′ P_i^W T^{−1/2} Σ_{t=1}^T g(z_it),  E[b_i^W b_i^W′] = G_{θ_i}′ P_i^W Ω_i P_i^W G_{θ_i},  Ω_i = Σ_{j=−∞}^∞ E[g(z_it) g(z_{i,t−j})′],

and that the expectations involving M̄_{ij}^W ψ_i^W take different forms for j ≤ d_α and j > d_α, as given in Appendix G. ∎
Lemma 22. Suppose that Conditions 1, 2, 3, and 4 hold. Then, for ŝ_θ^W(θ₀) = n^{−1} Σ_{i=1}^n ŝ_{θi}^W(θ₀, γ̂_{i0}),

  √(nT) ŝ_θ^W(θ₀) →_d N(κ B_s^W, V_s^W),

where B_s^W and V_s^W are defined in Lemma 21.

Proof. By Lemma 20,

  √(nT) ŝ_θ^W(θ₀) = n^{−1/2} Σ_{i=1}^n b_i^W + √(n/T) n^{−1} Σ_{i=1}^n Q_i^W + o_P(1).

Then, the result follows by Lemma 21. ∎
Lemma 23. Suppose that Conditions 1, 2, 3, 4, 5, and 6 hold. Then,

  s̃_{θi}(θ₀, γ̃_{i0}) = T^{−1/2} b̃_{θi} + T^{−1} Q̃_{θi} + T^{−3/2} R̃_{θi},

where all the terms are identical to those of Lemma 20 after replacing W_i by Ω_i. Also, the properties of all the terms of the expansion are analogous to those of Lemma 20.

Proof. The proof is similar to the proof of Lemma 20. ∎
Lemma 24. Suppose that Conditions 1, 2, 3, 4, 5, and 6 hold. Then,

  n^{−1/2} Σ_{i=1}^n b̃_{θi} →_d N(0, J_s),  J_s = E[G_{θ_i}′ P_{α_i} G_{θ_i}],

  n^{−1} Σ_{i=1}^n E[Q̃_{θi}] → E[B_i^B + B_i^C] =: B_s,

where B_i^B = −G_{θ_i}′(B_i^I + B_i^G + B_i^Σ + B_i^Ω) for the components of B̃ in Lemma 19,

  B_i^C = Σ_{j=−∞}^∞ E[G_{θ_i}(z_it)′ P_{α_i} g(z_{i,t−j})],

P_{α_i} = Ω_i⁻¹ − Ω_i⁻¹ G_{α_i} H_{α_i}, H_{α_i} = Σ_{α_i} G_{α_i}′ Ω_i⁻¹, and Σ_{α_i} = (G_{α_i}′ Ω_i⁻¹ G_{α_i})⁻¹.

Proof. The results follow by Lemmas 16, 18, 19 and 23, noting that

  b̃_{θi} = G_{θ_i}′ P_{α_i} T^{−1/2} Σ_{t=1}^T g(z_it),  E[b̃_{θi} b̃_{θi}′] = G_{θ_i}′ P_{α_i} Ω_i P_{α_i} G_{θ_i} = G_{θ_i}′ P_{α_i} G_{θ_i},

since P_{α_i} Ω_i P_{α_i} = P_{α_i}, and that the remaining cross terms have expectation zero. ∎
Lemma 25. Suppose that Conditions 1, 2, 3, 4, and 5 hold. Then, for s̃_θ(θ₀) = n^{−1} Σ_{i=1}^n s̃_{θi}(θ₀, γ̃_{i0}),

  √(nT) s̃_θ(θ₀) →_d N(κ B_s, J_s),

where J_s and B_s are defined in Lemmas 23 and 24, respectively.

Proof. Using the expansion form obtained in Lemma 23, we can get the result by examining each term with Lemma 24. ∎
APPENDIX G. SCORES AND DERIVATIVES
G.1. One-Step Score and Derivatives: Individual Effects. We denote the dimensions of g(z_it), α_i, and θ by d_g, d_α, and d_θ. The symbol ⊗ denotes the Kronecker product of matrices, and I_{d_α} denotes an identity matrix of order d_α. Let G_{αα_i}(z_it; θ, α_i) := (G_{αα_{i1}}(z_it; θ, α_i)′, …, G_{αα_{id_α}}(z_it; θ, α_i)′)′, where

  G_{αα_{ij}}(z_it; θ, α_i) = ∂G_α(z_it; θ, α_i)/∂α_{i,j}.

We denote derivatives of G_{αα_i}(z_it; θ, α_i) with respect to α_{i,j} by G_{αα_iα_{ij}}(z_it; θ, α_i), and use additional subscripts for higher order derivatives.
G.1.1. Score.

  ŝ_i^W(θ, γ_i) = T^{−1} Σ_{t=1}^T [ G_α(z_it; θ, α_i)′ λ_i ; g(z_it; θ, α_i) + W_i λ_i ] = [ Ĝ_{α_i}(θ, α_i)′ λ_i ; ĝ_i(θ, α_i) + W_i λ_i ].
G.1.2. Derivatives with respect to the fixed effects.
First Derivatives.

  T̂_i^W(θ, γ_i) = ∂ŝ_i^W(θ, γ_i)/∂γ_i′ = [ Ĝ_{αα_i}(θ, α_i)′(I_{d_α} ⊗ λ_i) , Ĝ_{α_i}(θ, α_i)′ ; Ĝ_{α_i}(θ, α_i) , W_i ],

  T̄_i^W = E[T̂_i^W] = [ 0 , G_{α_i}′ ; G_{α_i} , W_i ],  (T̄_i^W)⁻¹ = [ −Σ_i^W , H_i^W ; H_i^W′ , P_i^W ].
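The partitioned-inverse formula for (T̄_i^W)⁻¹ can be checked numerically. A small sketch (random G and W stand in for G_{α_i} and W_i; the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
dg, da = 5, 2
G = rng.normal(size=(dg, da))            # plays the role of G_{alpha_i}
A = rng.normal(size=(dg, dg))
W = A @ A.T + dg * np.eye(dg)            # positive-definite weight matrix W_i

Winv = np.linalg.inv(W)
Sigma = np.linalg.inv(G.T @ Winv @ G)    # Sigma_i^W = (G' W^{-1} G)^{-1}
H = Sigma @ G.T @ Winv                   # H_i^W
P = Winv - Winv @ G @ H                  # P_i^W

# Expected Hessian T_bar = [[0, G'], [G, W]] and its claimed inverse
Tbar = np.block([[np.zeros((da, da)), G.T], [G, W]])
Tinv = np.block([[-Sigma, H], [H.T, P]])
err = np.max(np.abs(Tbar @ Tinv - np.eye(da + dg)))
```

The block products reduce to G′H′ = I, G′P = 0, WH′ = GΣ_i^W, and GH + WP = I, which is exactly the partitioned-inverse identity used throughout Appendices E and F.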
Second Derivatives.

  T̂_{ij}^W(θ, γ_i) = ∂²ŝ_i^W(θ, γ_i)/∂γ_i′∂γ_{i,j} =
    [ Ĝ_{αα_iα_{ij}}(θ, α_i)′(I_{d_α} ⊗ λ_i) , Ĝ_{αα_{ij}}(θ, α_i)′ ; Ĝ_{αα_{ij}}(θ, α_i) , 0 ], if j ≤ d_α;
    [ Ĝ_{αα_i}(θ, α_i)′(I_{d_α} ⊗ e_{j−d_α}) , 0 ; 0 , 0 ], if j > d_α,

where e_k denotes a d_g-dimensional unit vector with a one in the kth position. Hence,

  T̄_{ij}^W = E[T̂_{ij}^W] =
    [ 0 , G_{αα_{ij}}′ ; G_{αα_{ij}} , 0 ], if j ≤ d_α;
    [ G_{αα_i}′(I_{d_α} ⊗ e_{j−d_α}) , 0 ; 0 , 0 ], if j > d_α.
Third Derivatives.

  T̂_{ijk}^W(θ, γ_i) = ∂³ŝ_i^W(θ, γ_i)/∂γ_i′∂γ_{i,j}∂γ_{i,k} =
    [ Ĝ_{αα_iα_{ij}α_{ik}}(θ, α_i)′(I_{d_α} ⊗ λ_i) , Ĝ_{αα_{ij}α_{ik}}(θ, α_i)′ ; Ĝ_{αα_{ij}α_{ik}}(θ, α_i) , 0 ], if j ≤ d_α, k ≤ d_α;
    [ Ĝ_{αα_iα_{ij}}(θ, α_i)′(I_{d_α} ⊗ e_{k−d_α}) , 0 ; 0 , 0 ], if j ≤ d_α, k > d_α;
    [ Ĝ_{αα_iα_{ik}}(θ, α_i)′(I_{d_α} ⊗ e_{j−d_α}) , 0 ; 0 , 0 ], if j > d_α, k ≤ d_α;
    0, if j > d_α, k > d_α.

T̄_{ijk}^W = E[T̂_{ijk}^W] takes the corresponding expectations, with the j > d_α, k > d_α case equal to zero.
G.1.3. Derivatives with respect to the common parameter.
First Derivatives.

  N̂_i^W(θ, γ_i) = ∂ŝ_i^W(θ, γ_i)/∂θ′ = [ (I_{d_α} ⊗ λ_i)′ Ĝ_{θα_i}(θ, α_i) ; Ĝ_{θ_i}(θ, α_i) ],  N̄_i^W = E[N̂_i^W] = [ 0 ; G_{θ_i} ].
G.2. One-Step Score and Derivatives: Common Parameters. Let G_{θα_i}(z_it; θ, α_i) := (G_{θα_{i1}}(z_it; θ, α_i)′, …, G_{θα_{id_α}}(z_it; θ, α_i)′)′, where

  G_{θα_{ij}}(z_it; θ, α_i) = ∂G_θ(z_it; θ, α_i)/∂α_{i,j}.

We denote the derivatives of G_{θα_i}(z_it; θ, α_i) with respect to α_{i,j} by G_{θα_iα_{ij}}(z_it; θ, α_i), and use additional subscripts for higher order derivatives.
G.2.1. Score.

  ŝ_{θi}^W(θ, γ_i) = −T^{−1} Σ_{t=1}^T G_θ(z_it; θ, α_i)′ λ_i = −Ĝ_{θ_i}(θ, α_i)′ λ_i.
G.2.2. Derivatives with respect to the fixed effects.
First Derivatives.

  M̂_i^W(θ, γ_i) = ∂ŝ_{θi}^W(θ, γ_i)/∂γ_i′ = −( Ĝ_{θα_i}(θ, α_i)′(I_{d_α} ⊗ λ_i) , Ĝ_{θ_i}(θ, α_i)′ ),  M̄_i^W = E[M̂_i^W] = −( 0 , G_{θ_i}′ ).
Second Derivatives.

  M̂_{ij}^W(θ, γ_i) =
    −( Ĝ_{θα_iα_{ij}}(θ, α_i)′(I_{d_α} ⊗ λ_i) , Ĝ_{θα_{ij}}(θ, α_i)′ ), if j ≤ d_α;
    −( Ĝ_{θα_i}(θ, α_i)′(I_{d_α} ⊗ e_{j−d_α}) , 0 ), if j > d_α.

  M̄_{ij}^W = E[M̂_{ij}^W] =
    −( 0 , G_{θα_{ij}}′ ), if j ≤ d_α;
    −( G_{θα_i}′(I_{d_α} ⊗ e_{j−d_α}) , 0 ), if j > d_α.
Third Derivatives.

  M̂_{ijk}^W(θ, γ_i) =
    −( Ĝ_{θα_iα_{ij}α_{ik}}(θ, α_i)′(I_{d_α} ⊗ λ_i) , Ĝ_{θα_{ij}α_{ik}}(θ, α_i)′ ), if j ≤ d_α, k ≤ d_α;
    −( Ĝ_{θα_iα_{ij}}(θ, α_i)′(I_{d_α} ⊗ e_{k−d_α}) , 0 ), if j ≤ d_α, k > d_α;
    −( Ĝ_{θα_iα_{ik}}(θ, α_i)′(I_{d_α} ⊗ e_{j−d_α}) , 0 ), if j > d_α, k ≤ d_α;
    0, if j > d_α, k > d_α.

M̄_{ijk}^W = E[M̂_{ijk}^W] takes the corresponding expectations.
G.2.3. Derivatives with respect to the common parameters.
First Derivatives.

  ∂ŝ_{θi}^W(θ, γ_i)/∂θ′ = −Ĝ_{θθ_i}(θ, α_i)′(I_{d_θ} ⊗ λ_i),  E[∂ŝ_{θi}^W(θ, γ_i)/∂θ′] = 0.
G.3. Two-Step Score and Derivatives: Fixed Effects.
G.3.1. Score.

  s̃_i(θ, γ_i) = T^{−1} Σ_{t=1}^T [ G_α(z_it; θ, α_i)′ λ_i ; g(z_it; θ, α_i) + Ω̂_i λ_i ] = [ Ĝ_{α_i}(θ, α_i)′ λ_i ; ĝ_i(θ, α_i) + Ω_i λ_i ] + [ 0 ; (Ω̂_i − Ω_i) λ_i ] =: ŝ_i^Ω(θ, γ_i) + r̂_i(θ, γ_i).

Note that the formulae for the derivatives of Appendix G.1 apply to ŝ_i^Ω, replacing W_i by Ω_i. Hence, we only need to derive the derivatives for r̂_i.
G.3.2. Derivatives with respect to the fixed effects.
First Derivatives.

  T̂_i^r(θ, γ_i) = ∂r̂_i(θ, γ_i)/∂γ_i′ = [ 0 , 0 ; 0 , Ω̂_i − Ω_i ],  T̄_i^r = E[T̂_i^r] = [ 0 , 0 ; 0 , E[Ω̂_i − Ω_i] ].

Second and Third Derivatives.
Since T̂_i^r(θ, γ_i) does not depend on γ_i, the derivatives (and their expectations) of order greater than one are zero.
G.3.3. Derivatives with respect to the common parameters.
First Derivatives.

  ∂r̂_i(θ, γ_i)/∂θ′ = 0.
G.4. Two-Step Score and Derivatives: Common Parameters.
G.4.1. Score.

  s̃_{θi}(θ, γ_i) = −T^{−1} Σ_{t=1}^T G_θ(z_it; θ, α_i)′ λ_i = −Ĝ_{θ_i}(θ, α_i)′ λ_i.

Since this score does not depend explicitly on Ω̂_i(θ, α_i), the formulae for the derivatives are the same as in Appendix G.2.
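The one-step/two-step distinction in Appendices G.1–G.4 is the usual GMM weighting: the first step uses a known weight W_i, and the second step re-weights by an estimate Ω̂_i of the moment variance. A scalar sketch for a hypothetical linear moment g_t = z_t(y_t − α), illustrative only and not the thesis's model:

```python
import numpy as np

def gmm_alpha(y, Z, W):
    """GMM for scalar alpha with moments g_t = z_t (y_t - alpha).

    Minimizes gbar(alpha)' W^{-1} gbar(alpha); closed form because the
    moment vector is linear in alpha.
    """
    gbar0 = Z.T @ y / len(y)             # average moments at alpha = 0
    Galpha = -Z.mean(axis=0)             # d gbar / d alpha
    Winv = np.linalg.inv(W)
    # FOC: Galpha' Winv (gbar0 + Galpha * alpha) = 0
    return -(Galpha @ Winv @ gbar0) / (Galpha @ Winv @ Galpha)

rng = np.random.default_rng(3)
T, alpha0 = 200, 1.5
Z = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])   # instruments
y = alpha0 + rng.normal(0.0, 1.0, T)

a1 = gmm_alpha(y, Z, np.eye(2))          # one-step: known weight W = I
u = Z * (y - a1)[:, None]
Omega = u.T @ u / T                      # estimated moment variance
a2 = gmm_alpha(y, Z, Omega)              # two-step: weight Omega_hat
```

The second call is the analogue of s̃_i: the same score formulae apply with W_i replaced by Ω̂_i, and the extra term r̂_i in G.3.1 tracks the estimation error Ω̂_i − Ω_i.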
REFERENCES
ANDREWS, D. W. K. (1991), "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica 59, 817-858.
FERNÁNDEZ-VAL, I., AND J. LEE (2012), "Panel Data Models with Nonadditive Unobserved Heterogeneity: Estimation and Inference," unpublished manuscript, Boston University.
HAHN, J., AND G. KUERSTEINER (2011), "Bias Reduction for Dynamic Nonlinear Panel Models with Fixed Effects," Econometric Theory 27, 1152-1191.
HALL, P., AND C. C. HEYDE (1980), Martingale Limit Theory and Its Application. New York: Academic Press.
NEWEY, W. K., AND D. McFADDEN (1994), "Large Sample Estimation and Hypothesis Testing," in R. F. ENGLE AND D. L. McFADDEN, eds., Handbook of Econometrics, Vol. 4. Amsterdam: North-Holland.
NEWEY, W. K., AND R. SMITH (2004), "Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators," Econometrica 72, 219-255.
Table A1: Common Parameter θ₂
[For each design ρ₁ ∈ {0, 0.3, 0.6, 0.9}, the table reports the Bias, SD, SE/SD, and rejection frequency p;.05 of the estimators OLS-FC, IV-FC, OLS-RC, BC-OLS, IBC-OLS, IV-RC, BC-IV, and IBC-IV.]
RC/FC refers to random/fixed coefficient model. BC/IBC refers to bias corrected/iterated bias corrected estimates.
Note: 1,000 repetitions.
Table A2: Mean of Individual Specific Parameter μ₁ = E[α_{i1}]
[For each design ρ₁ ∈ {0, 0.3, 0.6, 0.9}, the table reports the Bias, SD, SE/SD, and rejection frequency p;.05 of the estimators OLS-FC, IV-FC, OLS-RC, BC-OLS, IBC-OLS, IV-RC, BC-IV, and IBC-IV.]
RC/FC refers to random/fixed coefficient model. BC/IBC refers to bias corrected/iterated bias corrected estimates.
Note: 1,000 repetitions.
Table A3: Standard Deviation of the Individual Specific Parameter σ₁ = E[(α_{i1} − μ₁)²]^{1/2}
[For each design ρ₁ ∈ {0, 0.3, 0.6, 0.9}, the table reports the Bias, SD, SE/SD, and rejection frequency p;.05 of the estimators OLS-RC, BC-OLS, IBC-OLS, IV-RC, BC-IV, and IBC-IV.]
RC/FC refers to random/fixed coefficient model. BC/IBC refers to bias corrected/iterated bias corrected estimates.
Note: 1,000 repetitions.