Supplement to ‘Individual and Time Effects in Nonlinear Panel Models
with Large N, T ’
Iván Fernández-Val‡
Martin Weidner§
March 31, 2015
Abstract
This supplemental material contains five appendices. Appendix S.1 presents the results of an empirical
application and a Monte Carlo simulation calibrated to the application. Following Aghion et al. (2005),
we use a panel of U.K. industries to estimate Poisson models with industry and time effects for the
relationship between innovation and competition. Appendix S.2 gives the proofs of Theorems 4.3 and
4.4. Appendices S.3, S.4, and S.5 contain the proofs of Appendices B, C, and D, respectively. Appendix
S.6 collects some useful intermediate results that are used in the proofs of the main results.
S.1 Relationship between Innovation and Competition
S.1.1 Empirical Example
To illustrate the bias corrections with real data, we revisit the empirical application of Aghion, Bloom,
Blundell, Griffith and Howitt (2005) (ABBGH) that estimated a count data model to analyze the
relationship between innovation and competition. They used an unbalanced panel of seventeen U.K.
industries followed over the 22 years between 1973 and 1994.1 The dependent variable, Yit , is innovation
as measured by a citation-weighted number of patents, and the explanatory variable of interest, Zit , is
competition as measured by one minus the Lerner index in the industry-year.
Following ABBGH we consider a quadratic static Poisson model with industry and year effects where

Y_it | Z_i^T, α_i, γ_t ∼ P(exp[β_1 Z_it + β_2 Z_it^2 + α_i + γ_t]),

for (i = 1, ..., 17; t = 1973, ..., 1994), and extend the analysis to a dynamic Poisson model with industry and year effects where

Y_it | Y_i^{t−1}, Z_it, α_i, γ_t ∼ P(exp[β_Y log(1 + Y_{i,t−1}) + β_1 Z_it + β_2 Z_it^2 + α_i + γ_t]),
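To make the estimation problem concrete, the following sketch fits a quadratic Poisson model with industry and year dummies by maximum likelihood, using a damped Newton ascent on the log-likelihood. The data, parameter values, and panel dimensions here are simulated placeholders, not the ABBGH data or estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy panel loosely shaped like the application: 17 industries, a few years.
N, T = 17, 6
Z = rng.uniform(0.5, 1.0, size=(N, T))        # "competition"-style regressor
alpha = rng.normal(0, 0.2, size=N)            # industry effects
gamma = rng.normal(0, 0.2, size=T)            # year effects
b1, b2 = 2.0, -1.0                            # illustrative coefficients

eta = b1 * Z + b2 * Z**2 + alpha[:, None] + gamma[None, :]
Y = rng.poisson(np.exp(eta))

# Stack the panel and build the design: Z, Z^2, industry dummies, year dummies
# (first year dropped so the dummies are not collinear).
y = Y.ravel()
z = Z.ravel()
ind = np.repeat(np.arange(N), T)
yr = np.tile(np.arange(T), N)
D_ind = (ind[:, None] == np.arange(N)).astype(float)
D_yr = (yr[:, None] == np.arange(1, T)).astype(float)
X = np.column_stack([z, z**2, D_ind, D_yr])

def poisson_mle(y, X, tol=1e-10, max_iter=100):
    """Damped Newton ascent for the Poisson log-likelihood sum(y*eta - exp(eta))."""
    b = np.zeros(X.shape[1])
    ll = lambda c: y @ (X @ c) - np.exp(X @ c).sum()
    for _ in range(max_iter):
        mu = np.exp(X @ b)
        score = X.T @ (y - mu)
        if np.max(np.abs(score)) < tol:
            break
        hess = X.T @ (X * mu[:, None])    # negative of the Hessian of ll
        step = np.linalg.solve(hess, score)
        t = 1.0
        for _ in range(30):               # step-halving keeps the ascent monotone
            if ll(b + t * step) >= ll(b):
                break
            t /= 2
        b = b + t * step
    return b

b_hat = poisson_mle(y, X)
max_score = np.max(np.abs(X.T @ (y - np.exp(X @ b_hat))))   # FOC residual, ~0
```

In practice one would use a packaged GLM routine; the hand-rolled Newton loop just makes explicit that the industry and year effects are estimated jointly with (β_1, β_2) as dummy coefficients.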
‡ Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215-1403, USA. Email: ivanf@bu.edu
§ Department of Economics, University College London, Gower Street, London WC1E 6BT, UK, and CeMMaP. Email: m.weidner@ucl.ac.uk
1 We assume that the observations are missing at random conditional on the explanatory variables and unobserved effects, and apply the corrections without change since the level of attrition is low in this application.
for (i = 1, ..., 17; t = 1974, ..., 1994). In the dynamic model we use the year 1973 as the initial condition
for Yit .
Table S1 reports the results of the analysis. Columns (2) and (3) for the static model replicate the
empirical results of Table I in ABBGH (p. 708), adding estimates of the APEs. Columns (4) and (5)
report estimates of the analytical corrections that do not assume that competition is strictly exogenous
with L = 1 and L = 2, and column (6) reports estimates of the jackknife bias corrections described
in equation (3.4) of the paper. Note that we do not need to report separate standard errors for the
corrected estimators, because the standard errors of the uncorrected estimators are consistent for the
corrected estimators under the asymptotic approximation that we consider.2 Overall, the corrected
estimates, while numerically different from the uncorrected estimates in column (3), agree with the
inverted-U pattern in the relationship between innovation and competition found by ABBGH. The close
similarity between the uncorrected and bias corrected estimates gives some evidence in favor of the strict
exogeneity of competition with respect to the innovation process.
The results for the dynamic model show substantial positive state dependence in the innovation
process that is not explained by industry heterogeneity. Uncorrected fixed effects underestimates the
coefficient and APE of lag patents relative to the bias corrections, especially relative to the jackknife.
The pattern of the differences between the estimates is consistent with the biases that we find in the
numerical example in Table S4. Accounting for state dependence does not change the inverted-U pattern,
but flattens the relationship between innovation and competition.
Table S2 implements Chow-type homogeneity tests for the validity of the jackknife corrections. These
tests compare the uncorrected fixed effects estimators of the common parameters within the elements of
the cross section and time series partitions of the panel. Under time homogeneity, the probability limit
of these estimators is the same, so that a standard Wald test can be applied based on the difference
of the estimators in the sub panels within the partition. For the static model, the test is rejected at
the 1% level in both the cross section and time series partitions. Since the cross sectional partition is
arbitrary, these rejections might be a signal of model misspecification. For the dynamic model, the test
is rejected at the 1% level in the time series partition, but it cannot be rejected at conventional levels in
the cross section partition. The rejection of the time homogeneity might explain the difference between
the jackknife and analytical corrections in the dynamic model.
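A Chow-type Wald statistic of the kind described above can be sketched as follows. The subpanel estimates and variances below are hypothetical two-dimensional examples, and the closed-form p-value uses the fact that the chi-squared survival function with 2 degrees of freedom is exp(−x/2).

```python
import numpy as np

def chow_wald(b1, b2, V1, V2):
    """Wald statistic for H0: plim b1 = plim b2, where b1 and b2 are estimates
    of the common parameters from disjoint subpanels with variance estimates
    V1 and V2.  Under H0 the statistic is asymptotically chi-squared with
    dim(b) degrees of freedom."""
    d = np.asarray(b1, float) - np.asarray(b2, float)
    return d @ np.linalg.solve(np.asarray(V1) + np.asarray(V2), d)

# Illustrative numbers (not the ABBGH estimates): two coefficients per subpanel.
b_sub1 = np.array([1.5, -0.8])
b_sub2 = np.array([0.5, 1.2])
V1 = np.diag([0.5, 0.5])
V2 = np.diag([0.5, 0.5])

stat = chow_wald(b_sub1, b_sub2, V1, V2)    # here (1)^2 + (-2)^2 = 5
p_value = np.exp(-stat / 2)                 # chi2(2) survival function
```

For dim(b) other than 2, the p-value would come from a general chi-squared survival function rather than the exp(−x/2) shortcut.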
S.1.2 Calibrated Monte Carlo Simulations
We conduct a simulation that mimics the empirical example. The designs correspond to static and
dynamic Poisson models with additive individual and time effects. We calibrate all the parameters and
exogenous variables using the dataset from ABBGH.
2 In numerical examples, we find very little gain in terms of the ratio SE/SD and coverage probabilities when we reestimate the standard errors using bias corrected estimates.
S.1.2.1 Static Poisson model
The data generating process is
Y_it | Z_i^T, α, γ ∼ P(exp[Z_it β_1 + Z_it^2 β_2 + α_i + γ_t]), (i = 1, ..., N; t = 1, ..., T),
where P denotes the Poisson distribution. The variable Zit is fixed to the values of the competition
variable in the dataset and all the parameters are set to the fixed effect estimates of the model. We
generate unbalanced panel data sets with T = 22 years and three different numbers of industries N : 17,
34, and 51. In the second (third) case, we double (triple) the cross-sectional size by merging two (three)
independent realizations of the panel.
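The device of doubling or tripling the cross-section by merging independent realizations can be sketched as follows; the regressor values, effects, and coefficients are placeholders rather than the calibrated ABBGH values.

```python
import numpy as np

def draw_static_panel(Z, beta1, beta2, alpha, gamma, copies, rng):
    """Draw a static Poisson panel calibrated to fixed Z, alpha, gamma, and
    expand the cross-section by stacking `copies` independent realizations
    that reuse the same regressors and effects (as in the N = 34, 51 designs)."""
    eta = beta1 * Z + beta2 * Z**2 + alpha[:, None] + gamma[None, :]
    blocks = [rng.poisson(np.exp(eta)) for _ in range(copies)]
    return np.vstack(blocks)          # shape: (copies * N, T)

rng = np.random.default_rng(1)
N, T = 17, 22
Z = rng.uniform(0.8, 1.0, size=(N, T))     # stand-in for the competition data
alpha = rng.normal(0, 0.1, size=N)
gamma = rng.normal(0, 0.1, size=T)
panel = draw_static_panel(Z, 1.0, -0.5, alpha, gamma, copies=3, rng=rng)
```

Because only the Poisson draws differ across copies, the merged panel keeps the empirical design (regressors and effects) fixed while tripling N, exactly as in the text's description of the N = 51 design.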
Table S3 reports the simulation results for the coefficients β_1 and β_2, and the APE of Z_it. We compute the APE using the expression (2.5) with H(Z_it) = Z_it^2. Throughout the table, MLE corresponds
to the pooled Poisson maximum likelihood estimator (without individual and time effects), MLE-TE
corresponds to the Poisson estimator with only time effects, MLE-FETE corresponds to the Poisson
maximum likelihood estimator with individual and time fixed effects, Analytical (L=l) is the bias corrected estimator that uses the analytical correction with L = l, and Jackknife is the bias corrected
estimator that uses SPJ in both the individual and time dimensions. The analytical corrections differ from the uncorrected estimator because they do not impose that the regressor Z_it is strictly exogenous. The cross-sectional division in the jackknife follows the order of the observations. The choice
of these estimators is motivated by the empirical analysis of ABBGH. All the results in the table are
reported in percentage of the true parameter value.
The results of the table agree with the no asymptotic bias result for the Poisson model with exogenous
regressors. Thus, the bias of MLE-FETE for the coefficients and APE is negligible relative to the
standard deviation and the coverage probabilities get close to the nominal level as N grows. The
analytical corrections preserve the performance of the estimators and have very little sensitivity to the
trimming parameter. The jackknife correction increases dispersion and rmse, especially for the small
cross-sectional size of the application. The estimators that do not control for individual effects are
clearly biased.
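The quantities reported in the tables (bias, SD, and rmse in percent of the true value, plus coverage of nominal 95% intervals) can be computed from simulation draws with a helper of the following form; the function name and the toy inputs are illustrative, not part of the paper.

```python
import numpy as np

def mc_summary(estimates, true_value, std_errors, crit=1.96):
    """Summarize Monte Carlo draws: bias, SD, and RMSE in percent of the true
    parameter value, plus empirical coverage of +/- crit*SE intervals."""
    est = np.asarray(estimates, float)
    se = np.asarray(std_errors, float)
    bias = est.mean() - true_value
    sd = est.std()                                    # population SD (ddof=0)
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    coverage = np.mean(np.abs(est - true_value) <= crit * se)
    pct = 100.0 / abs(true_value)
    return {"bias%": bias * pct, "sd%": sd * pct,
            "rmse%": rmse * pct, "coverage": coverage}

# Tiny worked example: two draws symmetric around the truth.
out = mc_summary([1.1, 0.9], true_value=1.0, std_errors=[0.2, 0.2])
```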
S.1.2.2 Dynamic Poisson model
The data generating process is
Y_it | Y_i^{t−1}, Z_it, α, γ ∼ P(exp[β_Y log(1 + Y_{i,t−1}) + Z_it β_1 + Z_it^2 β_2 + α_i + γ_t]), (i = 1, ..., N; t = 1, ..., T).
The competition variable Zit and the initial condition for the number of patents Yi0 are fixed to the
values in the dataset and all the parameters are set to the fixed effect estimates of the model. To generate
panels, we first impute values to the missing observations of Zit using forward and backward predictions
from a panel AR(1) linear model with individual and time effects. We then draw panel data sets with
T = 21 years and three different numbers of industries N : 17, 34, and 51. As in the static model, we
double (triple) the cross-sectional size by merging two (three) independent realizations of the panel. We
make the generated panels unbalanced by dropping the values corresponding to the missing observations
in the original dataset.
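The sequential draw of the dynamic panel plus the final unbalancing step can be sketched as follows. The parameter values, missingness pattern, and initial counts are placeholders, and the AR(1) imputation of missing Z_it described in the text is omitted.

```python
import numpy as np

def draw_dynamic_panel(Z, y0, beta_y, beta1, beta2, alpha, gamma, observed, rng):
    """Sequentially draw Y_it ~ P(exp(beta_y*log(1+Y_{i,t-1}) + beta1*Z_it
    + beta2*Z_it^2 + alpha_i + gamma_t)) starting from the initial counts y0,
    then mark entries as missing (NaN) according to the `observed` indicator."""
    N, T = Z.shape
    Y = np.empty((N, T))
    y_lag = np.asarray(y0, float)
    for t in range(T):
        eta = (beta_y * np.log1p(y_lag) + beta1 * Z[:, t]
               + beta2 * Z[:, t] ** 2 + alpha + gamma[t])
        Y[:, t] = rng.poisson(np.exp(eta))
        y_lag = Y[:, t]
    Y[~observed] = np.nan          # drop values to mimic the unbalanced panel
    return Y

rng = np.random.default_rng(2)
N, T = 17, 21
Z = rng.uniform(0.8, 1.0, size=(N, T))
observed = rng.random((N, T)) > 0.05    # hypothetical missingness pattern
Y = draw_dynamic_panel(Z, y0=np.ones(N), beta_y=0.5, beta1=1.0, beta2=-0.5,
                       alpha=np.zeros(N), gamma=np.zeros(T),
                       observed=observed, rng=rng)
```

Note that, as in the text, the full balanced panel is drawn first and values are dropped afterwards, so the feedback through Y_{i,t−1} is generated from the complete history.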
Table S4 reports the simulation results for the coefficient β_Y^0 and the APE of Y_{i,t−1}. The estimators considered are the same as for the static Poisson model above. We compute the partial effect of Y_{i,t−1} using (2.5) with Z_it = Y_{i,t−1}, H(Z_it) = log(1 + Z_it), and dropping the linear term. Table S5 reports the simulation results for the coefficients β_1^0 and β_2^0, and the APE of Z_it. We compute the partial effect using (2.5) with H(Z_it) = Z_it^2. Again, all the results in the tables are reported in percentage of the true parameter value.
The results in Table S4 show biases of the same order of magnitude as the standard deviation for the fixed effects estimators of the coefficient and APE of Y_{i,t−1}, which cause severe undercoverage of confidence intervals. Note that in this case the rate of convergence for the estimator of the APE is r_NT = √(NT), because the individual and time effects are held fixed across the simulations. The analytical corrections reduce bias by more than half without increasing dispersion, substantially reducing rmse and bringing coverage probabilities closer to their nominal levels. The jackknife corrections reduce bias and increase dispersion, leading to smaller improvements in rmse and coverage probability than the analytical corrections. The results for the coefficient of Z_it in Table S5 are similar to the static model. The results for the APE of Z_it are imprecise, because the true value of the effect is close to zero.
S.2 Proofs of Theorems 4.3 and 4.4
We start with a lemma that shows the consistency of the fixed effects estimators of averages of the data
and parameters. We will use this result to show the validity of the analytical bias corrections and the
consistency of the variance estimators.
Lemma S.1. Let G(β, φ) := [N(T − j)]⁻¹ Σ_{i,t≥j+1} g(X_it, X_{i,t−j}, β, α_i + γ_t, α_i + γ_{t−j}) for 0 ≤ j < T, and let B_ε^0 be a subset of R^{dim β + 2} that contains an ε-neighborhood of (β, π_it^0, π_{i,t−j}^0) for all i, t, j, N, T, and for some ε > 0. Assume that (β, π_1, π_2) ↦ g_itj(β, π_1, π_2) := g(X_it, X_{i,t−j}, β, π_1, π_2) is Lipschitz continuous over B_ε^0 a.s., i.e. |g_itj(β_1, π_11, π_21) − g_itj(β_0, π_10, π_20)| ≤ M_itj ‖(β_1, π_11, π_21) − (β_0, π_10, π_20)‖ for all (β_0, π_10, π_20) ∈ B_ε^0, (β_1, π_11, π_21) ∈ B_ε^0, and some M_itj = O_P(1) for all i, t, j, N, T. Let (β̂, φ̂) be an estimator of (β, φ) such that ‖β̂ − β^0‖ →_P 0 and ‖φ̂ − φ^0‖_∞ →_P 0. Then,

G(β̂, φ̂) →_P E[G(β^0, φ^0)],

provided that the limit exists.
Proof of Lemma S.1. By the triangle inequality,

|G(β̂, φ̂) − E[G(β^0, φ^0)]| ≤ |G(β̂, φ̂) − G(β^0, φ^0)| + o_P(1),

because |G(β^0, φ^0) − E[G(β^0, φ^0)]| = o_P(1). By the local Lipschitz continuity of g_itj and the consistency of (β̂, φ̂),

|G(β̂, φ̂) − G(β^0, φ^0)| ≤ [N(T − j)]⁻¹ Σ_{i,t≥j+1} M_itj ‖(β̂, α̂_i + γ̂_t, α̂_i + γ̂_{t−j}) − (β^0, α_i^0 + γ_t^0, α_i^0 + γ_{t−j}^0)‖
≤ [N(T − j)]⁻¹ Σ_{i,t≥j+1} M_itj (‖β̂ − β^0‖ + 4‖φ̂ − φ^0‖_∞)

wpa1. The result then follows because [N(T − j)]⁻¹ Σ_{i,t≥j+1} M_itj = O_P(1) and ‖β̂ − β^0‖ + 4‖φ̂ − φ^0‖_∞ = o_P(1) by assumption.
Proof of Theorem 4.3. We separate the proof in three parts corresponding to the three statements of the theorem.

Part I: Proof of Ŵ →_P W̄_∞. The asymptotic variance and its fixed effects estimator can be expressed as W̄_∞ = E[W(β^0, φ^0)] and Ŵ = W(β̂, φ̂), where W(β, φ) has a first order representation as a continuously differentiable transformation of terms that have the form of G(β, φ) in Lemma S.1. The result then follows by the continuous mapping theorem noting that ‖β̂ − β^0‖ →_P 0 and ‖φ̂ − φ^0‖_∞ ≤ ‖φ̂ − φ^0‖_q →_P 0 by Theorem C.1.

Part II: Proof of √(NT)(β̃^A − β^0) →_d N(0, W̄_∞⁻¹). By the argument given after equation (3.3) in the text, we only need to show that B̂ →_P B̄_∞ and D̂ →_P D̄_∞. These asymptotic biases and their fixed effects estimators are either time-series averages of fractions of cross-sectional averages, or vice versa. The nesting of the averages makes the analysis a bit more cumbersome than the analysis of Ŵ, but the result follows by similar standard arguments, also using that L → ∞ and L/T → 0 guarantee that the trimmed estimator in B̂ is also consistent for the spectral expectations; see Lemma 6 in Hahn and Kuersteiner (2011).
Part III: Proof of √(NT)(β̃^J − β^0) →_d N(0, W̄_∞⁻¹). For T_1 = {1, ..., ⌊(T + 1)/2⌋}, T_2 = {⌊T/2⌋ + 1, ..., T}, T_0 = T_1 ∪ T_2, N_1 = {1, ..., ⌊(N + 1)/2⌋}, N_2 = {⌊N/2⌋ + 1, ..., N}, and N_0 = N_1 ∪ N_2, let β̂_(jk) be the fixed effects estimator of β in the subpanel defined by i ∈ N_j and t ∈ T_k.3 In this notation,

β̃^J = 3β̂_(00) − β̂_(10)/2 − β̂_(20)/2 − β̂_(01)/2 − β̂_(02)/2.

We derive the asymptotic distribution of √(NT)(β̃^J − β^0) from the joint asymptotic distribution of the vector B̂ = √(NT)(β̂_(00) − β^0, β̂_(10) − β^0, β̂_(20) − β^0, β̂_(01) − β^0, β̂_(02) − β^0), which has dimension 5 × dim β.
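The split-panel jackknife combination can be written as a one-line function. The check below uses a stylized first-order bias β + B/T + D/N with hypothetical numbers: halving T doubles the B term, halving N doubles the D term, and the weights (3, −1/2, −1/2, −1/2, −1/2) then cancel both terms exactly.

```python
import numpy as np

def spj(b00, b10, b20, b01, b02):
    """Split-panel jackknife: 3 times the full-panel estimate minus half the
    sum of the four half-panel estimates (two cross-section halves, two time
    halves)."""
    return (3 * np.asarray(b00, float)
            - (np.asarray(b10, float) + np.asarray(b20, float)
               + np.asarray(b01, float) + np.asarray(b02, float)) / 2)

# Stylized bias arithmetic (illustrative values, not estimates from the paper).
beta, B, D, N, T = 1.0, 0.6, -0.4, 20, 10
b00 = beta + B / T + D / N            # full panel
b10 = b20 = beta + B / T + 2 * D / N  # half the individuals: D term doubles
b01 = b02 = beta + 2 * B / T + D / N  # half the time periods: B term doubles
corrected = spj(b00, b10, b20, b01, b02)   # recovers beta exactly
```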
By Theorem C.1,

√(NT)(β̂_(jk) − β^0) = (2^{1(j>0)} 2^{1(k>0)} / √(NT)) Σ_{i∈N_j, t∈T_k} [ψ_it + b_it + d_it] + o_P(1),

for ψ_it = W̄_∞⁻¹ D_β ℓ_it, b_it = W̄_∞⁻¹ [U_it^(1a,1) + U_it^(1b,1,1)], and d_it = W̄_∞⁻¹ [U_it^(1a,4) + U_it^(1b,4,4)], where the U_it^(·) are implicitly defined by U^(·) = (NT)^{−1/2} Σ_{i,t} U_it^(·). Here, none of the terms carries a superscript (jk) by Assumption 4.3. The influence function ψ_it has zero mean and determines the asymptotic variance W̄_∞⁻¹, whereas b_it and d_it determine the asymptotic biases B̄_∞ and D̄_∞, but do not affect the asymptotic variance. By this representation,

B̂ →_d N( κ (1, 1, 1, 2, 2)′ ⊗ B̄_∞ + κ⁻¹ (1, 2, 2, 1, 1)′ ⊗ D̄_∞ , V ⊗ W̄_∞⁻¹ ),

where V is the symmetric 5 × 5 matrix with rows (1, 1, 1, 1, 1), (1, 2, 0, 1, 1), (1, 0, 2, 1, 1), (1, 1, 1, 2, 0), and (1, 1, 1, 0, 2). Here we use that {ψ_it : 1 ≤ i ≤ N, 1 ≤ t ≤ T} is independent across i and martingale difference across t, together with Assumption 4.3.

3 Note that this definition of the subpanels covers all the cases regardless of whether N and T are even or odd.
The result follows by writing √(NT)(β̃^J − β^0) = (3, −1/2, −1/2, −1/2, −1/2) B̂ and using the properties of the multivariate normal distribution.
Proof of Theorem 4.4. We separate the proof in three parts corresponding to the three statements of the theorem.

Part I: V̂^δ →_P V̄^δ_∞. V̄^δ_∞ and V̂^δ have a similar structure to W̄_∞ and Ŵ in part I of the proof of Theorem 4.3, so that the consistency follows by an analogous argument.

Part II: √(NT)(δ̃^A − δ^0_NT) →_d N(0, V̄^δ_∞). As in the proof of Theorem 4.2, we decompose

r_NT (δ̃^A − δ^0_NT) = r_NT (δ̄ − δ^0_NT) + (r_NT/√(NT)) √(NT)(δ̃^A − δ̄).

Then, by the Mann-Wald theorem,

√(NT)(δ̃^A − δ̄) = √(NT)(δ̂ − B̂^δ/T − D̂^δ/N − δ̄) →_d N(0, V̄^δ(1)_∞),

provided that B̂^δ →_P B̄^δ_∞ and D̂^δ →_P D̄^δ_∞, and r_NT (δ̄ − δ^0_NT) →_d N(0, V̄^δ(2)_∞), where V̄^δ(1)_∞ and V̄^δ(2)_∞ are defined as in the proof of Theorem 4.2. The statement thus follows by using a similar argument to part II of the proof of Theorem 4.3 to show the consistency of B̂^δ and D̂^δ, and because (δ̄ − δ^0_NT) and (δ̃^A − δ̄) are asymptotically independent, and V̄^δ_∞ = V̄^δ(1)_∞ + V̄^δ(2)_∞ lim_{N,T→∞} (r_NT/√(NT))².

Part III: √(NT)(δ̃^J − δ^0_NT) →_d N(0, V̄^δ_∞). As in part II, we decompose

r_NT (δ̃^J − δ^0_NT) = r_NT (δ̄ − δ^0_NT) + (r_NT/√(NT)) √(NT)(δ̃^J − δ̄).

Then, by an argument similar to part III of the proof of Theorem 4.3,

√(NT)(δ̃^J − δ̄) →_d N(0, V̄^δ(1)_∞),

and r_NT (δ̄ − δ^0_NT) →_d N(0, V̄^δ(2)_∞), where V̄^δ(1)_∞ and V̄^δ(2)_∞ are defined as in the proof of Theorem 4.2. The statement follows because (δ̄ − δ^0_NT) and (δ̃^J − δ̄) are asymptotically independent, and V̄^δ_∞ = V̄^δ(1)_∞ + V̄^δ(2)_∞ lim_{N,T→∞} (r_NT/√(NT))².
S.3 Proofs of Appendix B (Asymptotic Expansions)
The following lemma contains some statements that are not explicitly imposed in Assumption B.1, but that are implied by it.

Lemma S.1. Let Assumption B.1 be satisfied. Then
(i) H(β, φ) > 0 for all β ∈ B(r_β, β^0) and φ ∈ B_q(r_φ, φ^0) wpa1, and

sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖∂_ββ′ L(β, φ)‖ = O_P(√(NT)),
sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖∂_βφ′ L(β, φ)‖_q = O_P((NT)^{1/(2q)}),
sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖∂_βφφ L(β, φ)‖_q = O_P((NT)^ε),
sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖∂_φφφ L(β, φ)‖_q = O_P((NT)^ε),
sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖H⁻¹(β, φ)‖_q = O_P(1).

(ii) Moreover, ‖S‖ = O_P(1), ‖H⁻¹‖ = O_P(1), ‖H̄⁻¹‖ = O_P(1), ‖H⁻¹ − H̄⁻¹‖ = o_P((NT)^{−1/8}), ‖H⁻¹ − H̄⁻¹ + H̄⁻¹H̃H̄⁻¹‖ = o_P((NT)^{−1/4}), ‖∂_βφ′ L‖ = O_P((NT)^{1/4}), ‖∂_βφφ L‖ = O_P((NT)^ε), ‖Σ_g ∂_φφ′φ_g L [H⁻¹S]_g‖ = O_P((NT)^{−1/4+1/(2q)+ε}), and ‖Σ_g ∂_φφ′φ_g L [H̄⁻¹S]_g‖ = O_P((NT)^{−1/4+1/(2q)+ε}).
Proof of Lemma S.1. # Part (i): Let v ∈ R^{dim β} and w, u ∈ R^{dim φ}. By a Taylor expansion of ∂_βφ′φ_g L(β, φ) around (β^0, φ^0),

Σ_g u_g v′ [∂_βφ′φ_g L(β, φ)] w = Σ_g u_g v′ [∂_βφ′φ_g L + Σ_k (β_k − β_k^0) ∂_β_k βφ′φ_g L(β̃, φ̃) + Σ_h (φ_h − φ_h^0) ∂_βφ′φ_g φ_h L(β̃, φ̃)] w,

with (β̃, φ̃) between (β^0, φ^0) and (β, φ). Thus

‖∂_βφφ L(β, φ)‖_q = sup_{‖v‖=1} sup_{‖u‖_q=1} sup_{‖w‖_{q/(q−1)}=1} Σ_g u_g v′ [∂_βφ′φ_g L(β, φ)] w
≤ ‖∂_βφφ L‖_q + ‖β − β^0‖ sup_{(β̃,φ̃)} ‖∂_ββφφ L(β̃, φ̃)‖_q + ‖φ − φ^0‖_q sup_{(β̃,φ̃)} ‖∂_βφφφ L(β̃, φ̃)‖_q,

where the supremum over (β̃, φ̃) is necessary because those parameters depend on v, w, u. By Assumption B.1, for N and T large enough,

sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖∂_βφφ L(β, φ)‖_q ≤ ‖∂_βφφ L‖_q + r_β sup_{β,φ} ‖∂_ββφφ L(β, φ)‖_q + r_φ sup_{β,φ} ‖∂_βφφφ L(β, φ)‖_q = O_P[(NT)^ε + r_β (NT)^ε + r_φ (NT)^ε] = O_P((NT)^ε).

The proofs for the bounds on ‖∂_ββ′ L(β, φ)‖, ‖∂_βφ′ L(β, φ)‖_q, and ‖∂_φφφ L(β, φ)‖_q are analogous.
Next, we show that H(β, φ) is non-singular for all β ∈ B(r_β, β^0) and φ ∈ B_q(r_φ, φ^0) wpa1. By a Taylor expansion and Assumption B.1, for N and T large enough,

sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖H(β, φ) − H‖_q ≤ r_β sup_{β,φ} ‖∂_βφφ L(β, φ)‖_q + r_φ sup_{β,φ} ‖∂_φφφ L(β, φ)‖_q = o_P(1).   (S.1)

Define ∆H(β, φ) := H̄ − H(β, φ). Then ‖∆H(β, φ)‖_q ≤ ‖H(β, φ) − H‖_q + ‖H̃‖_q, and therefore

sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖∆H(β, φ)‖_q = o_P(1),

by Assumption B.1 and equation (S.1).
For any square matrix A with ‖A‖_q < 1, ‖(1 − A)⁻¹‖_q ≤ (1 − ‖A‖_q)⁻¹; see, e.g., p. 301 in Horn and Johnson (1985). Then

sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} ‖H⁻¹(β, φ)‖_q = sup_{β,φ} ‖[H̄ − ∆H(β, φ)]⁻¹‖_q
= sup_{β,φ} ‖H̄⁻¹ [1 − ∆H(β, φ) H̄⁻¹]⁻¹‖_q
≤ ‖H̄⁻¹‖_q sup_{β,φ} ‖[1 − ∆H(β, φ) H̄⁻¹]⁻¹‖_q
≤ ‖H̄⁻¹‖_q sup_{β,φ} [1 − ‖∆H(β, φ) H̄⁻¹‖_q]⁻¹
≤ ‖H̄⁻¹‖_q [1 − o_P(1)]⁻¹ = O_P(1).
# Part (ii): By the properties of the ℓ_q-norm and Assumption B.1(v),

‖S‖ = ‖S‖_2 ≤ (dim φ)^{1/2−1/q} ‖S‖_q = O_P(1).

Analogously,

‖∂_βφ′ L‖ ≤ (dim φ)^{1/2−1/q} ‖∂_βφ′ L‖_q = O_P((NT)^{1/4}).

By Lemma S.4, ‖H̄⁻¹‖_{q/(q−1)} = ‖H̄⁻¹‖_q because H̄ is symmetric, and

‖H̄⁻¹‖ = ‖H̄⁻¹‖_2 ≤ √(‖H̄⁻¹‖_{q/(q−1)} ‖H̄⁻¹‖_q) = ‖H̄⁻¹‖_q = O_P(1).

Analogously,

‖∂_βφφ L‖ ≤ ‖∂_βφφ L‖_q = O_P((NT)^ε),
‖Σ_g ∂_φφ′φ_g L [H⁻¹S]_g‖ ≤ ‖Σ_g ∂_φφ′φ_g L [H⁻¹S]_g‖_q ≤ ‖∂_φφφ L‖_q ‖H⁻¹‖_q ‖S‖_q = O_P((NT)^{−1/4+1/(2q)+ε}),
‖Σ_g ∂_φφ′φ_g L [H̄⁻¹S]_g‖ ≤ ‖Σ_g ∂_φφ′φ_g L [H̄⁻¹S]_g‖_q ≤ ‖∂_φφφ L‖_q ‖H̄⁻¹‖_q ‖S‖_q = O_P((NT)^{−1/4+1/(2q)+ε}).
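The norm comparison ‖x‖_2 ≤ (dim)^{1/2−1/q} ‖x‖_q for q ≥ 2, used repeatedly above, can be checked numerically; the vectors here are arbitrary test inputs, not objects from the paper.

```python
import numpy as np

def lq_norm(x, q):
    """Vector l_q norm: (sum |x_i|^q)^(1/q)."""
    return np.sum(np.abs(x) ** q) ** (1.0 / q)

rng = np.random.default_rng(3)
n, q = 50, 8
x = rng.normal(size=n)

lhs = np.linalg.norm(x)                      # Euclidean norm ||x||_2
rhs = n ** (0.5 - 1.0 / q) * lq_norm(x, q)   # (dim)^(1/2 - 1/q) * ||x||_q
slack = rhs - lhs                            # nonnegative by the inequality

# The bound is tight for the constant vector: both sides equal sqrt(n).
ones_gap = n ** (0.5 - 1.0 / q) * lq_norm(np.ones(n), q) - np.linalg.norm(np.ones(n))
```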
Assumption B.1 guarantees that ‖H̄⁻¹H̃‖ < 1 wpa1. Therefore,

H⁻¹ = H̄⁻¹ (1 + H̃H̄⁻¹)⁻¹ = H̄⁻¹ Σ_{s=0}^∞ (−H̃H̄⁻¹)^s = H̄⁻¹ − H̄⁻¹H̃H̄⁻¹ + H̄⁻¹ Σ_{s=2}^∞ (−H̃H̄⁻¹)^s.   (S.2)

Note that ‖H̄⁻¹ Σ_{s=2}^∞ (−H̃H̄⁻¹)^s‖ ≤ ‖H̄⁻¹‖ Σ_{s=2}^∞ ‖H̄⁻¹H̃‖^s, and therefore

‖H⁻¹ − H̄⁻¹ + H̄⁻¹H̃H̄⁻¹‖ ≤ ‖H̄⁻¹‖³ ‖H̃‖² / (1 − ‖H̄⁻¹H̃‖) = o_P((NT)^{−1/4}),

by Assumption B.1(vi) and equation (S.2). The results for ‖H⁻¹‖ and ‖H⁻¹ − H̄⁻¹‖ follow immediately.
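The truncated Neumann-series argument above can be illustrated numerically: for a small perturbation H̃ of an invertible H̄, the two-term approximation error is within the stated bound. The matrices below are random stand-ins, not the Hessians of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
Hbar = np.eye(n) + 0.05 * rng.normal(size=(n, n))  # stand-in for the expected Hessian
Htil = 0.02 * rng.normal(size=(n, n))              # small perturbation
H = Hbar + Htil

Hbar_inv = np.linalg.inv(Hbar)
H_inv = np.linalg.inv(H)

# Two-term Neumann truncation H^-1 ~ Hbar^-1 - Hbar^-1 Htil Hbar^-1
approx = Hbar_inv - Hbar_inv @ Htil @ Hbar_inv
err = np.linalg.norm(H_inv - approx, 2)

# Bound ||Hbar^-1||^3 ||Htil||^2 / (1 - ||Htil Hbar^-1||) on the truncation error
rho = np.linalg.norm(Htil @ Hbar_inv, 2)
bound = (np.linalg.norm(Hbar_inv, 2) ** 3
         * np.linalg.norm(Htil, 2) ** 2 / (1 - rho))
```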
S.3.1 Legendre Transformed Objective Function
We consider the shrinking neighborhood B(r_β, β^0) × B_q(r_φ, φ^0) of the true parameters (β^0, φ^0). Statement (i) of Lemma S.1 implies that the objective function L(β, φ) is strictly concave in φ in this shrinking neighborhood wpa1. We define

L*(β, S) = max_{φ∈B_q(r_φ,φ^0)} [L(β, φ) − φ′S],    Φ(β, S) = argmax_{φ∈B_q(r_φ,φ^0)} [L(β, φ) − φ′S],   (S.3)

where β ∈ B(r_β, β^0) and S ∈ R^{dim φ}. The function L*(β, S) is the Legendre transformation of the objective function L(β, φ) in the incidental parameter φ. We denote the parameter S as the dual parameter to φ, and L*(β, S) as the dual function to L(β, φ). We only consider L*(β, S) and Φ(β, S) for parameters β ∈ B(r_β, β^0) and S ∈ S(β, B_q(r_φ, φ^0)), where the optimal φ is defined by the first order conditions, i.e. is not a boundary solution. We define the corresponding set of pairs (β, S) that is dual to B(r_β, β^0) × B_q(r_φ, φ^0) by

SB_r(β^0, φ^0) = {(β, S) ∈ R^{dim β + dim φ} : (β, Φ(β, S)) ∈ B(r_β, β^0) × B_q(r_φ, φ^0)}.
Assumption B.1 guarantees that for β ∈ B(r_β, β^0) the domain S(β, B_q(r_φ, φ^0)) includes S = 0, the origin of R^{dim φ}, as an interior point, wpa1, and that L*(β, S) is four times differentiable in a neighborhood of S = 0 (see Lemma S.2 below). The optimal φ = Φ(β, S) in equation (S.3) satisfies the first order condition S = S(β, φ). Thus, for given β, the functions Φ(β, S) and S(β, φ) are inverse to each other, and the relationship between φ and its dual S is one-to-one. This is a consequence of the strict concavity of L(β, φ) in the neighborhood of the true parameter value that we consider here.4 One can show that

Φ(β, S) = −∂L*(β, S)/∂S,

which shows the dual nature of the functions L(β, φ) and L*(β, S). For S = 0 the optimization in (S.3) is just over the objective function L(β, φ), so that Φ(β, 0) = φ̂(β) and L*(β, 0) = L(β, φ̂(β)), the profile objective function. We already introduced S = S(β^0, φ^0), i.e. at β = β^0 the dual of φ^0 is S, and vice versa. We can write the profile objective function L(β, φ̂(β)) = L*(β, 0) as a Taylor series expansion of L*(β, S) around (β, S) = (β^0, S), namely

L(β, φ̂(β)) = L*(β^0, S) + (∂_β′ L*)∆β − ∆β′(∂_βS′ L*)S + ½ ∆β′(∂_ββ′ L*)∆β + . . . ,

where ∆β = β − β^0, and here and in the following we omit the arguments of L*(β, S) and of its partial derivatives when they are evaluated at (β^0, S). Analogously, we can obtain Taylor expansions for the profile score ∂_β L(β, φ̂(β)) = ∂_β L*(β, 0) and the estimated nuisance parameter φ̂(β) = −∂_S L*(β, 0) in ∆β and S; see the proof of Theorem B.1 below. Apart from combinatorial factors those expansions feature the same coefficients as the expansion of L(β, φ̂(β)) itself. They are standard Taylor expansions that can be truncated at a certain order, and the remainder term can be bounded by applying the mean value theorem.

4 Another consequence of the strict concavity of L(β, φ) is that the dual function L*(β, S) is strictly convex in S. The original L(β, φ) can be recovered from L*(β, S) by again performing a Legendre transformation, namely L(β, φ) = min_{S∈R^{dim φ}} [L*(β, S) + φ′S].
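The duality Φ(β, S) = −∂L*(β, S)/∂S can be verified numerically on a strictly concave toy objective with β held fixed; the quadratic L below is a stand-in for the panel log-likelihood, so the closed forms for Φ and L* are specific to this example.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.normal(size=(n, n))
A = A @ A.T + n * np.eye(n)     # positive definite, so L is strictly concave
b = rng.normal(size=n)

def L(phi):                     # toy objective L(phi) = -phi'A phi / 2 + b'phi
    return -0.5 * phi @ A @ phi + b @ phi

def Phi(S):                     # argmax_phi [L(phi) - phi'S], from the FOC A phi = b - S
    return np.linalg.solve(A, b - S)

def Lstar(S):                   # Legendre transform L*(S) = L(Phi(S)) - Phi(S)'S
    p = Phi(S)
    return L(p) - p @ S

# Duality check: Phi(S) should equal -dL*/dS (central finite differences,
# one coordinate).
S = rng.normal(size=n)
h = 1e-6
e0 = np.eye(n)[0]
fd = -(Lstar(S + h * e0) - Lstar(S - h * e0)) / (2 * h)
gap = abs(fd - Phi(S)[0])       # ~0 up to finite-difference roundoff
```

Because L is quadratic here, L* is also quadratic and the central difference is exact up to floating-point roundoff, which makes the envelope identity easy to see.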
The function L(β, φ) and its dual L*(β, S) are closely related. In particular, for given β their first derivatives with respect to the second argument, S(β, φ) and Φ(β, S), are inverse functions of each other. We can therefore express partial derivatives of L*(β, S) in terms of partial derivatives of L(β, φ). This is done in Lemma S.2. The norms ‖∂_βSSS L*(β, S)‖_q, ‖∂_SSSS L*(β, S)‖_q, etc., are defined as in equations (A.1) and (A.2).
Lemma S.2. Let Assumption B.1 be satisfied.

(i) The function L*(β, S) is well-defined and four times continuously differentiable in SB_r(β^0, φ^0), wpa1.

(ii) For L* = L*(β^0, S),

∂_S L* = −φ^0,  ∂_β L* = ∂_β L,  ∂_SS′ L* = −(∂_φφ′ L)⁻¹ = H⁻¹,  ∂_βS′ L* = −(∂_βφ′ L)H⁻¹,
∂_ββ′ L* = ∂_ββ′ L + (∂_βφ′ L)H⁻¹(∂_φ′β L),
∂_SS′S_g L* = −Σ_h H⁻¹(∂_φφ′φ_h L)H⁻¹ (H⁻¹)_gh,
∂_β_k SS′ L* = H⁻¹(∂_β_k φ′φ L)H⁻¹ + Σ_g H⁻¹(∂_φ_g φ′φ L)H⁻¹ [H⁻¹ ∂_β_k φ L]_g,
∂_β_k β_l S′ L* = −(∂_β_k β_l φ′ L)H⁻¹ − (∂_β_l φ′ L)H⁻¹(∂_β_k φφ′ L)H⁻¹ − (∂_β_k φ′ L)H⁻¹(∂_β_l φ′φ L)H⁻¹ − Σ_g (∂_β_k φ′ L)H⁻¹(∂_φ_g φ′φ L)H⁻¹ [H⁻¹ ∂_β_l φ L]_g,
∂_β_k β_l β_m L* = ∂_β_k β_l β_m L + Σ_g (∂_β_k φ′ L)H⁻¹(∂_φ_g φ′φ L)H⁻¹(∂_β_l φ L)[H⁻¹ ∂_φβ_m L]_g
+ (∂_β_k φ′ L)H⁻¹(∂_β_l φ′φ L)H⁻¹ ∂_φβ_m L + (∂_β_m φ′ L)H⁻¹(∂_β_k φ′φ L)H⁻¹ ∂_φβ_l L
+ (∂_β_l φ′ L)H⁻¹(∂_β_m φ′φ L)H⁻¹ ∂_φβ_k L
+ (∂_β_k β_l φ′ L)H⁻¹(∂_φ′β_m L) + (∂_β_k β_m φ′ L)H⁻¹(∂_φ′β_l L) + (∂_β_l β_m φ′ L)H⁻¹(∂_φ′β_k L),

and

∂_SS′S_gS_h L* = Σ_{f,e} H⁻¹(∂_φφ′φ_fφ_e L)H⁻¹ (H⁻¹)_gf (H⁻¹)_he + 3 Σ_{f,e} H⁻¹(∂_φφ′φ_e L)H⁻¹(∂_φφ′φ_f L)H⁻¹ (H⁻¹)_gf (H⁻¹)_he,
∂_β_k SS′S_g L* = − Σ_h H⁻¹(∂_β_k φ′φ L)H⁻¹(∂_φφ′φ_h L)H⁻¹ [H⁻¹]_gh
− Σ_h H⁻¹(∂_φφ′φ_h L)H⁻¹(∂_β_k φ′φ L)H⁻¹ [H⁻¹]_gh
− Σ_h H⁻¹(∂_φφ′φ_h L)H⁻¹ [H⁻¹(∂_β_k φ′φ L)H⁻¹]_gh
− Σ_{h,f} H⁻¹(∂_φ_f φ′φ L)H⁻¹(∂_φφ′φ_h L)H⁻¹ [H⁻¹]_gh [H⁻¹ ∂_β_k φ L]_f
− Σ_{h,f} H⁻¹(∂_φφ′φ_h L)H⁻¹(∂_φ_f φ′φ L)H⁻¹ [H⁻¹]_gh [H⁻¹ ∂_β_k φ L]_f
− Σ_{h,f} H⁻¹(∂_φφ′φ_h L)H⁻¹ [H⁻¹(∂_φ_f φ′φ L)H⁻¹]_gh [H⁻¹ ∂_β_k φ L]_f
− Σ_h H⁻¹(∂_β_k φφ′φ_h L)H⁻¹ [H⁻¹]_gh
− Σ_{h,f} H⁻¹(∂_φφ′φ_hφ_f L)H⁻¹ [H⁻¹]_gh [H⁻¹(∂_β_k φ L)]_f.

(iii) Moreover,

sup_{(β,S)∈SB_r(β^0,φ^0)} ‖∂_βββ L*(β, S)‖ = O_P((NT)^{1/2+1/(2q)+ε}),
sup_{(β,S)∈SB_r(β^0,φ^0)} ‖∂_ββS L*(β, S)‖_q = O_P((NT)^{1/q+ε}),
sup_{(β,S)∈SB_r(β^0,φ^0)} ‖∂_βSS L*(β, S)‖_q = O_P((NT)^{1/(2q)+ε}),
sup_{(β,S)∈SB_r(β^0,φ^0)} ‖∂_βSSS L*(β, S)‖_q = O_P((NT)^{1/(2q)+2ε}),
sup_{(β,S)∈SB_r(β^0,φ^0)} ‖∂_SSSS L*(β, S)‖_q = O_P((NT)^{2ε}).
Proof of Lemma S.2. # Part (i): According to the definition (S.3), L*(β, S) = L(β, Φ(β, S)) − Φ(β, S)′S, where Φ(β, S) solves the FOC, S(β, Φ(β, S)) = S, i.e. S(β, ·) and Φ(β, ·) are inverse functions for every β. Taking the derivative of S(β, Φ(β, S)) = S wrt both S and β yields

[∂_S Φ(β, S)′][∂_φ S(β, Φ(β, S))′] = 1,   [∂_β S(β, Φ(β, S))′] + [∂_β Φ(β, S)′][∂_φ S(β, Φ(β, S))′] = 0.   (S.4)

By definition, S = S(β^0, φ^0). Therefore, Φ(β, S) is the unique function that satisfies the boundary condition Φ(β^0, S) = φ^0 and the system of partial differential equations (PDE) in (S.4). Those PDE's can equivalently be written as

∂_S Φ(β, S)′ = −[H(β, Φ(β, S))]⁻¹,   ∂_β Φ(β, S)′ = [∂_βφ′ L(β, Φ(β, S))][H(β, Φ(β, S))]⁻¹.   (S.5)

This shows that Φ(β, S) (and thus L*(β, S)) is well-defined in any neighborhood of (β, S) = (β^0, S) in which H(β, Φ(β, S)) is invertible (inverse function theorem). Lemma S.1 shows that H(β, φ) is invertible in B(r_β, β^0) × B_q(r_φ, φ^0), wpa1. The inverse function theorem thus guarantees that Φ(β, S) and L*(β, S) are well-defined in SB_r(β^0, φ^0). The partial derivatives of L*(β, S) of up to fourth order can be expressed as continuous transformations of the partial derivatives of L(β, φ) up to fourth order (see, e.g., the proof of part (ii) of the lemma). Hence, L*(β, S) is four times continuously differentiable because L(β, φ) is four times continuously differentiable.

# Part (ii): Differentiating L*(β, S) = L(β, Φ(β, S)) − Φ(β, S)′S wrt β and S and using the FOC of the maximization over φ in the definition of L*(β, S) gives ∂_β L*(β, S) = ∂_β L(β, Φ(β, S)) and ∂_S L*(β, S) = −Φ(β, S), respectively. Evaluating these expressions at (β, S) = (β^0, S) gives the first two statements of part (ii).

Using ∂_S L*(β, S) = −Φ(β, S), the PDE (S.5) can be written as

∂_SS′ L*(β, S) = H⁻¹(β, Φ(β, S)),   ∂_βS′ L*(β, S) = −[∂_βφ′ L(β, Φ(β, S))] H⁻¹(β, Φ(β, S)).

Evaluating these expressions at (β, S) = (β^0, S) gives the next two statements of part (ii).

Taking the derivative of ∂_β L*(β, S) = ∂_β L(β, Φ(β, S)) wrt β and using the second equation of (S.5) gives the next statement when evaluated at (β, S) = (β^0, S).

Taking the derivative of ∂_SS′ L*(β, S) = −[∂_φφ′ L(β, Φ(β, S))]⁻¹ wrt S_g and using the first equation of (S.5) gives the next statement when evaluated at (β, S) = (β^0, S).

Taking the derivative of ∂_SS′ L*(β, S) = −[∂_φφ′ L(β, Φ(β, S))]⁻¹ wrt β_k and using the second equation of (S.5) gives

∂_β_k SS′ L*(β, S) = H⁻¹(β, φ)[∂_β_k φ′φ L(β, φ)]H⁻¹(β, φ) + Σ_g H⁻¹(β, φ)[∂_φ_g φ′φ L(β, φ)]H⁻¹(β, φ){H⁻¹(β, φ)[∂_β_k φ L(β, φ)]}_g,   (S.6)

where φ = Φ(β, S). This becomes the next statement when evaluated at (β, S) = (β^0, S).

We omit the proofs for ∂_β_k β_l S′ L*, ∂_β_k β_l β_m L*, ∂_SS′S_gS_h L* and ∂_β_k SS′S_g L* because they are analogous.

# Part (iii): We only show the result for ‖∂_βSS L*(β, S)‖_q; the proof of the other statements is analogous. By equation (S.6),

‖∂_βSS L*(β, S)‖_q ≤ ‖H⁻¹(β, φ)‖_q² ‖∂_βφφ L(β, φ)‖_q + ‖H⁻¹(β, φ)‖_q³ ‖∂_φφφ L(β, φ)‖_q ‖∂_βφ′ L(β, φ)‖_q,

where φ = Φ(β, S). Then, by Lemma S.1,

sup_{(β,S)∈SB_r(β^0,φ^0)} ‖∂_βSS L*(β, S)‖_q ≤ sup_{β∈B(r_β,β^0)} sup_{φ∈B_q(r_φ,φ^0)} [‖H⁻¹(β, φ)‖_q² ‖∂_βφφ L(β, φ)‖_q + ‖H⁻¹(β, φ)‖_q³ ‖∂_φφφ L(β, φ)‖_q ‖∂_βφ′ L(β, φ)‖_q] = O_P((NT)^{1/(2q)+ε}).

To derive the rest of the bounds we can use that the expressions from part (ii) hold not only for (β^0, S), but also for other values (β, S), provided that (β, Φ(β, S)) is used as the argument on the rhs expressions.
S.3.2 Proofs of Theorem B.1, Corollary B.2, and Theorem B.3
Proof of Theorem B.1, Part 1: Expansion of φ̂(β). Let β = β_NT ∈ B(β^0, r_β). A Taylor expansion of ∂_S L*(β, 0) around (β^0, S) gives

φ̂(β) = −∂_S L*(β, 0) = −∂_S L* − (∂_Sβ′ L*)∆β + (∂_SS′ L*)S − ½ Σ_g (∂_SS′S_g L*) S S_g + R^φ(β),

where we first expand in β holding S = S fixed, and then expand in S. For any v ∈ R^{dim φ} the remainder term satisfies

v′R^φ(β) = v′ [ −½ Σ_k [∂_Sβ′β_k L*(β̃, S)] (∆β)(∆β_k) + Σ_k [∂_SS′β_k L*(β^0, S̃)] S (∆β_k) + (1/6) Σ_{g,h} [∂_SS′S_gS_h L*(β^0, S̄)] S S_g S_h ],

where β̃ is between β^0 and β, and S̃ and S̄ are between 0 and S. By part (ii) of Lemma S.2,

φ̂(β) − φ^0 = H⁻¹(∂_φβ′ L)∆β + H⁻¹S + ½ H⁻¹ Σ_g (∂_φφ′φ_g L) H⁻¹S (H⁻¹S)_g + R^φ(β).

Using that the vector norm ‖·‖_{q/(q−1)} is dual to the vector norm ‖·‖_q, Assumption B.1, and Lemmas S.1 and S.2 yields

‖R^φ(β)‖_q = sup_{‖v‖_{q/(q−1)}=1} v′R^φ(β)
≤ ½ ‖∂_Sββ L*(β̃, S)‖_q ‖∆β‖² + ‖∂_SSβ L*(β^0, S̃)‖_q ‖S‖_q ‖∆β‖ + (1/6) ‖∂_SSSS L*(β^0, S̄)‖_q ‖S‖_q³
= O_P[(NT)^{1/q+ε} r_β ‖∆β‖ + (NT)^{−1/4+1/q+ε} ‖∆β‖ + (NT)^{−3/4+3/(2q)+2ε}]
= o_P((NT)^{−1/2+1/(2q)}) + o_P((NT)^{1/(2q)}) ‖β − β^0‖,

uniformly over β ∈ B(β^0, r_β) by Lemma S.2.
Proof of Theorem B.1, Part 2: Expansion of profile score. Let β = β_NT ∈ B(β^0, r_β). A Taylor expansion of ∂_β L*(β, 0) around (β^0, S) gives

∂_β L(β, φ̂(β)) = ∂_β L*(β, 0) = ∂_β L* + (∂_ββ′ L*)∆β − (∂_βS′ L*)S + ½ Σ_g (∂_βS′S_g L*) S S_g + R_1(β),

where we first expand in β for fixed S = S, and then expand in S. For any v ∈ R^{dim β} the remainder term satisfies

v′R_1(β) = v′ [ ½ Σ_k [∂_ββ′β_k L*(β̃, S)] (∆β)(∆β_k) − Σ_k [∂_ββ_kS′ L*(β^0, S̃)] S (∆β_k) − (1/6) Σ_{g,h} [∂_βS′S_gS_h L*(β^0, S̄)] S S_g S_h ],

where β̃ is between β^0 and β, and S̃ and S̄ are between 0 and S. By Lemma S.2,

∂_β L(β, φ̂(β)) = ∂_β L + [∂_ββ′ L + (∂_βφ′ L)H⁻¹(∂_φ′β L)](β − β^0) + (∂_βφ′ L)H⁻¹S + ½ Σ_g (∂_βφ′φ_g L + [∂_βφ′ L] H⁻¹ [∂_φφ′φ_g L]) [H⁻¹S]_g H⁻¹S + R_1(β),

where for any v ∈ R^{dim β},

‖R_1(β)‖ = sup_{‖v‖=1} v′R_1(β)
≤ ½ ‖∂_βββ L*(β̃, S)‖ ‖∆β‖² + (NT)^{1/2−1/q} ‖∂_ββS L*(β^0, S̃)‖_q ‖S‖_q ‖∆β‖ + (1/6) (NT)^{1/2−1/q} ‖∂_βSSS L*(β^0, S̄)‖_q ‖S‖_q³
= O_P[(NT)^{1/2+1/(2q)+ε} r_β ‖∆β‖ + (NT)^{1/4+1/(2q)+ε} ‖∆β‖ + (NT)^{−1/4+1/q+2ε}]
= o_P(1) + o_P(√(NT) ‖β − β^0‖),
uniformly over β ∈ B(β 0 , rβ ) by Lemma S.2. We can also write
∂_β L(β, φ̂(β)) = ∂_β L − √(NT) W̄ (∆β) + (∂_βφ′ L̄)H̄⁻¹S + (∂_βφ′ L̃)H̄⁻¹S − (∂_βφ′ L̄)H̄⁻¹H̃H̄⁻¹S
+ ½ Σ_g (∂_βφ′φ_g L̄ + [∂_βφ′ L̄] H̄⁻¹ [∂_φφ′φ_g L̄]) [H̄⁻¹S]_g H̄⁻¹S + R(β)
= U − √(NT) W̄ (∆β) + R(β),

where we decompose the term that is linear in S into multiple terms by using that

−(∂_βS′ L*) = (∂_βφ′ L)H⁻¹ = [(∂_βφ′ L̄) + (∂_βφ′ L̃)][H̄⁻¹ − H̄⁻¹H̃H̄⁻¹ + . . .].
The new remainder term is

R(β) = R_1(β) + (∂_ββ′ L̃)∆β + [(∂_βφ′ L)H⁻¹(∂_φ′β L) − (∂_βφ′ L̄)H̄⁻¹(∂_φ′β L̄)]∆β
+ (∂_βφ′ L)[H⁻¹ − H̄⁻¹ + H̄⁻¹H̃H̄⁻¹]S − (∂_βφ′ L̃)H̄⁻¹H̃H̄⁻¹S
+ ½ [Σ_g ∂_βφ′φ_g L [H⁻¹S]_g H⁻¹S − Σ_g ∂_βφ′φ_g L̄ [H̄⁻¹S]_g H̄⁻¹S]
+ ½ [Σ_g [∂_βφ′ L] H⁻¹ [∂_φφ′φ_g L][H⁻¹S]_g H⁻¹S − Σ_g [∂_βφ′ L̄] H̄⁻¹ [∂_φφ′φ_g L̄][H̄⁻¹S]_g H̄⁻¹S].
14
By Assumption B.1 and Lemma S.1,
\[
\|R(\beta)\| \le \|R_1(\beta)\| + \big\|\partial_{\beta\beta'}\widetilde L\big\|\,\|\Delta\beta\|
+ \|\partial_{\beta\phi'} L\|\,\big\|H^{-1} - \bar H^{-1}\big\|\,\|\partial_{\phi'\beta} L\|\,\|\Delta\beta\|
+ \big\|\partial_{\beta\phi'}\widetilde L\big\|\,\big\|\bar H^{-1}\big\|\,\big(\|\partial_{\phi'\beta} L\| + \big\|\partial_{\phi'\beta}\widetilde L\big\|\big)\,\|\Delta\beta\|
\]
\[
+ \|\partial_{\beta\phi'} L\|\,\big\|H^{-1} - \bar H^{-1} + \bar H^{-1}\widetilde H\bar H^{-1}\big\|\,\|S\|
+ \big\|\partial_{\beta\phi'}\widetilde L\big\|\,\big\|\bar H^{-1}\big\|^2\,\big\|\widetilde H\big\|\,\|S\|
\]
\[
+ \frac{1}{2}\,\|\partial_{\beta\phi\phi} L\|\,\big(\|H^{-1}\| + \|\bar H^{-1}\|\big)\,\big\|H^{-1} - \bar H^{-1}\big\|\,\|S\|^2
+ \frac{1}{2}\,\big\|\bar H^{-1}\big\|^2\,\big\|\partial_{\beta\phi\phi}\widetilde L\big\|\,\|S\|^2
\]
\[
+ \frac{1}{2}\,\Bigg\|\sum_g [\partial_{\beta\phi'} L]\, H^{-1}[\partial_{\phi\phi'\phi_g} L][H^{-1} S]_g\, H^{-1} S
- \sum_g [\partial_{\beta\phi'}\bar L]\,\bar H^{-1}[\partial_{\phi\phi'\phi_g}\bar L][\bar H^{-1} S]_g\,\bar H^{-1} S\Bigg\|
\]
\[
= \|R_1(\beta)\| + o_P(1) + o_P\big(\sqrt{NT}\,\|\beta - \beta^0\|\big) + O_P\big((NT)^{-1/8+\epsilon+1/(2q)}\big)
= o_P(1) + o_P\big(\sqrt{NT}\,\|\beta - \beta^0\|\big),
\]
uniformly over $\beta \in \mathcal B(\beta^0, r_\beta)$. Here we use that
\[
\Bigg\|\sum_g [\partial_{\beta\phi'} L]\, H^{-1}[\partial_{\phi\phi'\phi_g} L][H^{-1} S]_g\, H^{-1} S
- \sum_g [\partial_{\beta\phi'}\bar L]\,\bar H^{-1}[\partial_{\phi\phi'\phi_g}\bar L][\bar H^{-1} S]_g\,\bar H^{-1} S\Bigg\|
\]
\[
\le \|\partial_{\beta\phi'} L\|\,\big\|H^{-1} - \bar H^{-1}\big\|\,\big(\|H^{-1}\| + \|\bar H^{-1}\|\big)\,\|S\|\,\sum_g \big\|\partial_{\phi\phi'\phi_g} L\big\|\,\big|[H^{-1} S]_g\big|
+ \|\partial_{\beta\phi'} L\|\,\big\|H^{-1} - \bar H^{-1}\big\|\,\big\|\bar H^{-1}\big\|\,\|S\|\,\sum_g \big\|\partial_{\phi\phi'\phi_g} L\big\|\,\big|[\bar H^{-1} S]_g\big|
\]
\[
+ \big\|\partial_{\beta\phi'}\widetilde L\big\|\,\big\|\bar H^{-1}\big\|^2\,\|S\|\,\sum_g \big\|\partial_{\phi\phi'\phi_g} L\big\|\,\big|[\bar H^{-1} S]_g\big|
+ \big\|\partial_{\beta\phi'}\bar L\big\|\,\big\|\bar H^{-1}\big\|\,\sum_{g,h} \big\|\partial_{\phi\phi_g\phi_h}\widetilde L\big\|\,\big|[\bar H^{-1} S]_g\big|\,\big|[\bar H^{-1} S]_h\big|.
\]
Proof of Corollary B.2. $\widehat\beta$ solves the FOC
\[
\partial_\beta L(\widehat\beta, \widehat\phi(\widehat\beta)) = 0.
\]
By $\widehat\beta - \beta^0 = o_P(r_\beta)$ and Theorem B.1,
\[
0 = \partial_\beta L(\widehat\beta, \widehat\phi(\widehat\beta)) = U - \overline W\,\sqrt{NT}(\widehat\beta - \beta^0) + o_P(1) + o_P\big(\sqrt{NT}\,\|\widehat\beta - \beta^0\|\big).
\]
Thus, $\sqrt{NT}(\widehat\beta - \beta^0) = \overline W^{-1} U + o_P(1) + o_P(\sqrt{NT}\,\|\widehat\beta - \beta^0\|) = \overline W_\infty^{-1} U + o_P(1) + o_P(\sqrt{NT}\,\|\widehat\beta - \beta^0\|)$, where we use that $\overline W = \overline W_\infty + o_P(1)$ is invertible wpa1 and that $\overline W^{-1} = \overline W_\infty^{-1} + o_P(1)$. We conclude that $\sqrt{NT}(\widehat\beta - \beta^0) = O_P(1)$ because $U = O_P(1)$, and therefore $\sqrt{NT}(\widehat\beta - \beta^0) = \overline W_\infty^{-1} U + o_P(1)$.
Proof of Theorem B.3. \# Consistency of $\widehat\phi(\beta)$: Let $\eta = \eta_{NT} > 0$ be such that $\eta = o_P(r_\phi)$, $(NT)^{-1/4+1/(2q)} = o_P(\eta)$, and $(NT)^{1/(2q)} r_\beta = o_P(\eta)$. For $\beta \in \mathcal B(r_\beta, \beta^0)$, define
\[
\widehat\phi^*(\beta) := \operatorname*{argmin}_{\{\phi:\ \|\phi - \phi^0\|_q \le \eta\}} \|S(\beta, \phi)\|_q. \tag{S.7}
\]
Then, $\|S(\beta, \widehat\phi^*(\beta))\|_q \le \|S(\beta, \phi^0)\|_q$, and therefore by a Taylor expansion of $S(\beta, \phi^0)$ around $\beta = \beta^0$,
\[
\|S(\beta, \widehat\phi^*(\beta)) - S(\beta, \phi^0)\|_q \le \|S(\beta, \widehat\phi^*(\beta))\|_q + \|S(\beta, \phi^0)\|_q \le 2\,\|S(\beta, \phi^0)\|_q
\le 2\,\|S\|_q + 2\,\big\|\partial_{\phi\beta'} L(\tilde\beta, \phi^0)\big\|_q \|\beta - \beta^0\|
= O_P\!\left[(NT)^{-1/4+1/(2q)} + (NT)^{1/(2q)}\,\|\beta - \beta^0\|\right],
\]
uniformly over $\beta \in \mathcal B(r_\beta, \beta^0)$, where $\tilde\beta$ is between $\beta^0$ and $\beta$, and we use Assumption B.1(v) and Lemma S.1. Thus,
\[
\sup_{\beta \in \mathcal B(r_\beta, \beta^0)} \|S(\beta, \widehat\phi^*(\beta)) - S(\beta, \phi^0)\|_q = O_P\!\left[(NT)^{-1/4+1/(2q)} + (NT)^{1/(2q)}\, r_\beta\right].
\]
By a Taylor expansion of $\Phi(\beta, S)$ around $S = S(\beta, \phi^0)$,
\[
\big\|\widehat\phi^*(\beta) - \phi^0\big\|_q = \big\|\Phi(\beta, S(\beta, \widehat\phi^*(\beta))) - \Phi(\beta, S(\beta, \phi^0))\big\|_q
\le \big\|\partial_{S'}\Phi(\beta, \tilde S)\big\|_q\,\big\|S(\beta, \widehat\phi^*(\beta)) - S(\beta, \phi^0)\big\|_q
\]
\[
= \big\|H^{-1}(\beta, \Phi(\beta, \tilde S))\big\|_q\,\big\|S(\beta, \widehat\phi^*(\beta)) - S(\beta, \phi^0)\big\|_q
= O_P(1)\,\big\|S(\beta, \widehat\phi^*(\beta)) - S(\beta, \phi^0)\big\|_q,
\]
where $\tilde S$ is between $S(\beta, \widehat\phi^*(\beta))$ and $S(\beta, \phi^0)$, and we use Lemma S.1(i). Thus,
\[
\sup_{\beta \in \mathcal B(r_\beta, \beta^0)} \big\|\widehat\phi^*(\beta) - \phi^0\big\|_q = O_P\!\left[(NT)^{-1/4+1/(2q)} + (NT)^{1/(2q)}\, r_\beta\right] = o_P(\eta).
\]
This shows that $\widehat\phi^*(\beta)$ is an interior solution of the minimization problem (S.7), wpa1. Thus, $S(\beta, \widehat\phi^*(\beta)) = 0$, because the objective function $L(\beta, \phi)$ is strictly concave and differentiable, and therefore $\widehat\phi^*(\beta) = \widehat\phi(\beta)$. We conclude that
\[
\sup_{\beta \in \mathcal B(r_\beta, \beta^0)} \big\|\widehat\phi(\beta) - \phi^0\big\|_q = O_P(\eta) = o_P(r_\phi).
\]
\# Consistency of $\widehat\beta$: We have already shown that Assumption B.1(ii) is satisfied, in addition to the remaining parts of Assumption B.1, which we assume. The bounds on the spectral norm in Assumption B.1(vi) and in part (ii) of Lemma S.1 can be used to show that $U = O_P((NT)^{1/4})$.

First, we consider the case $\dim(\beta) = 1$. The extension to $\dim(\beta) > 1$ is discussed below. Let $\eta = 2(NT)^{-1/2}\,\overline W^{-1}|U|$. Our goal is to show that $\widehat\beta \in [\beta^0 - \eta, \beta^0 + \eta]$. By Theorem B.1 and $|U| = \overline W\sqrt{NT}\,\eta/2$,
\[
\partial_\beta L(\beta^0 + \eta, \widehat\phi(\beta^0 + \eta)) = U - \overline W\sqrt{NT}\,\eta + o_P(1) + o_P(\sqrt{NT}\,\eta) \le o_P(\sqrt{NT}\,\eta) - \overline W\sqrt{NT}\,\eta/2,
\]
\[
\partial_\beta L(\beta^0 - \eta, \widehat\phi(\beta^0 - \eta)) = U + \overline W\sqrt{NT}\,\eta + o_P(1) + o_P(\sqrt{NT}\,\eta) \ge o_P(\sqrt{NT}\,\eta) + \overline W\sqrt{NT}\,\eta/2,
\]
and therefore for sufficiently large $N, T$,
\[
\partial_\beta L(\beta^0 + \eta, \widehat\phi(\beta^0 + \eta)) \le 0 \le \partial_\beta L(\beta^0 - \eta, \widehat\phi(\beta^0 - \eta)).
\]
Thus, since $\partial_\beta L(\widehat\beta, \widehat\phi(\widehat\beta)) = 0$, for sufficiently large $N, T$,
\[
\partial_\beta L(\beta^0 + \eta, \widehat\phi(\beta^0 + \eta)) \le \partial_\beta L(\widehat\beta, \widehat\phi(\widehat\beta)) \le \partial_\beta L(\beta^0 - \eta, \widehat\phi(\beta^0 - \eta)).
\]
The profile objective $L(\beta, \widehat\phi(\beta))$ is strictly concave in $\beta$ because $L(\beta, \phi)$ is strictly concave in $(\beta, \phi)$. Thus, $\partial_\beta L(\beta, \widehat\phi(\beta))$ is strictly decreasing. The previous set of inequalities implies that for sufficiently large $N, T$,
\[
\beta^0 + \eta \ge \widehat\beta \ge \beta^0 - \eta.
\]
We conclude that $\|\widehat\beta - \beta^0\| \le \eta = O_P((NT)^{-1/4})$. This concludes the proof for $\dim(\beta) = 1$.

To generalize the proof to $\dim(\beta) > 1$ we define $\beta_\pm = \beta^0 \pm \eta\,\frac{\widehat\beta - \beta^0}{\|\widehat\beta - \beta^0\|}$. Let $\langle\beta_-, \beta_+\rangle = \{r\beta_- + (1-r)\beta_+ \mid r \in [0,1]\}$ be the line segment between $\beta_-$ and $\beta_+$. By restricting attention to values $\beta \in \langle\beta_-, \beta_+\rangle$ we can repeat the above argument for the case $\dim(\beta) = 1$ and thus show that $\widehat\beta \in \langle\beta_-, \beta_+\rangle$, which implies $\|\widehat\beta - \beta^0\| \le \eta = O_P((NT)^{-1/4})$.
S.3.3 Proof of Theorem B.4

Proof of Theorem B.4. A Taylor expansion of $\Delta(\beta, \phi)$ around $(\beta^0, \phi^0)$ yields
\[
\Delta(\beta, \phi) = \Delta + [\partial_{\beta'}\Delta](\beta - \beta^0) + [\partial_{\phi'}\Delta](\phi - \phi^0) + \tfrac{1}{2}(\phi - \phi^0)'[\partial_{\phi\phi'}\Delta](\phi - \phi^0) + R_1^\Delta(\beta, \phi),
\]
with remainder term
\[
R_1^\Delta(\beta, \phi) = \tfrac{1}{2}(\beta - \beta^0)'\big[\partial_{\beta\beta'}\Delta(\bar\beta, \phi)\big](\beta - \beta^0) + (\beta - \beta^0)'\big[\partial_{\beta\phi'}\Delta(\beta^0, \tilde\phi)\big](\phi - \phi^0)
+ \tfrac{1}{6}\sum_g (\phi - \phi^0)'\big[\partial_{\phi\phi'\phi_g}\Delta(\beta^0, \bar\phi)\big](\phi - \phi^0)\,[\phi - \phi^0]_g,
\]
where $\bar\beta$ is between $\beta$ and $\beta^0$, and $\tilde\phi$ and $\bar\phi$ are between $\phi$ and $\phi^0$.
By assumption, $\|\widehat\beta - \beta^0\| = o_P((NT)^{-1/4})$, and by the expansion of $\widehat\phi = \widehat\phi(\widehat\beta)$ in Theorem B.1,
\[
\|\widehat\phi - \phi^0\|_q \le \big\|\bar H^{-1}\big\|_q \|S\|_q + \big\|\bar H^{-1}\big\|_q \|\partial_{\phi\beta'} L\|_q\,\big\|\widehat\beta - \beta^0\big\| + \tfrac{1}{2}\big\|\bar H^{-1}\big\|_q^3 \|\partial_{\phi\phi\phi} L\|_q \|S\|_q^2 + \big\|R^\phi(\widehat\beta)\big\|_q
= O_P\big((NT)^{-1/4+1/(2q)}\big).
\]
Thus, for $\widehat R_1^\Delta := R_1^\Delta(\widehat\beta, \widehat\phi)$,
\[
\big\|\widehat R_1^\Delta\big\| \le \tfrac{1}{2}\,\|\widehat\beta - \beta^0\|^2 \sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)} \|\partial_{\beta\beta'}\Delta(\beta, \phi)\|
+ (NT)^{1/2-1/q}\,\|\widehat\beta - \beta^0\|\,\|\widehat\phi - \phi^0\|_q \sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)} \|\partial_{\beta\phi'}\Delta(\beta, \phi)\|_q
\]
\[
+ \tfrac{1}{6}\,(NT)^{1/2-1/q}\,\|\widehat\phi - \phi^0\|_q^3 \sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)} \|\partial_{\phi\phi\phi}\Delta(\beta, \phi)\|_q
= o_P(1/\sqrt{NT}).
\]
Again by the expansion of $\widehat\phi = \widehat\phi(\widehat\beta)$ from Theorem B.1,
\[
\widehat\delta - \delta = \Delta(\widehat\beta, \widehat\phi) - \Delta
= \Big(\partial_{\beta'}\Delta + [\partial_\phi\Delta]'\,\bar H^{-1}\,[\partial_{\phi\beta'} L]\Big)(\widehat\beta - \beta^0)
+ [\partial_\phi\Delta]'\,\bar H^{-1} S
+ \tfrac{1}{2}\,[\partial_\phi\Delta]'\,\bar H^{-1}\Bigg(\sum_{g=1}^{\dim\phi}[\partial_{\phi\phi'\phi_g} L]\,\bar H^{-1} S\,[\bar H^{-1} S]_g\Bigg)
+ \tfrac{1}{2}\, S'\,\bar H^{-1}[\partial_{\phi\phi'}\Delta]\,\bar H^{-1} S + R_2^\Delta, \tag{S.8}
\]
where
\[
R_2^\Delta = \widehat R_1^\Delta + [\partial_\phi\Delta]'\, R^\phi(\widehat\beta) + \tfrac{1}{2}\big(\widehat\phi - \phi^0 + \bar H^{-1} S\big)'[\partial_{\phi\phi'}\Delta]\big(\widehat\phi - \phi^0 - \bar H^{-1} S\big)
\]
\[
\le \big\|\widehat R_1^\Delta\big\| + (NT)^{1/2-1/q}\,\|\partial_\phi\Delta\|_q\,\big\|R^\phi(\widehat\beta)\big\|_q
+ \tfrac{1}{2}(NT)^{1/2-1/q}\,\big\|\widehat\phi - \phi^0 + \bar H^{-1} S\big\|_q\,\|\partial_{\phi\phi'}\Delta\|_q\,\big\|\widehat\phi - \phi^0 - \bar H^{-1} S\big\|_q
= o_P(1/\sqrt{NT}),
\]
that uses $\big\|\widehat\phi - \phi^0 - \bar H^{-1} S\big\|_q = O_P\big((NT)^{-1/2+1/q+\epsilon}\big)$. From equation (S.8), the terms of the expansion for $\widehat\delta - \delta$ are analogous to the terms of the expansion for the score in Theorem B.1, with $\Delta(\beta, \phi)$ taking the role of $\frac{1}{\sqrt{NT}}\,\partial_{\beta_k} L(\beta, \phi)$.

S.4 Proofs of Appendix C (Theorem C.1)
Proof of Theorem C.1, Part (i). Assumption B.1(i) is satisfied because $\lim_{N,T\to\infty}\frac{\dim\phi}{\sqrt{NT}} = \lim_{N,T\to\infty}\frac{N+T}{\sqrt{NT}} = \kappa + \kappa^{-1}$.

Assumption B.1(ii) is satisfied because $\ell_{it}(\beta, \pi)$ and $(v'\phi)^2$ are four times continuously differentiable, and the same is true for $L(\beta, \phi)$.

Let $D = \mathrm{diag}\big(\bar H^*_{(\alpha\alpha)}, \bar H^*_{(\gamma\gamma)}\big)$. Then, $\|D^{-1}\|_{\max} = O_P(1)$ by Assumption 4.1(v). By the properties of the matrix norms and Lemma D.1, $\|\bar H^{-1} - D^{-1}\|_\infty \le (N+T)\,\|\bar H^{-1} - D^{-1}\|_{\max} = O_P(1)$. Thus, $\|\bar H^{-1}\|_q \le \|\bar H^{-1}\|_\infty \le \|D^{-1}\|_\infty + \|\bar H^{-1} - D^{-1}\|_\infty = O_P(1)$ by Lemma S.4 and the triangle inequality. We conclude that Assumption B.1(iv) holds.
We now show that the assumptions of Lemma S.7 are satisfied:
(i) By Lemma S.2, $\chi_i = \frac{1}{\sqrt T}\sum_t \partial_{\beta_k}\ell_{it}$ satisfies $E_\phi(\chi_i^2) \le B$. Thus, by independence across $i$,
\[
E_\phi\Bigg[\Bigg(\frac{1}{\sqrt{NT}}\sum_{i,t}\partial_{\beta_k}\ell_{it}\Bigg)^2\Bigg] = E_\phi\Bigg[\Bigg(\frac{1}{\sqrt N}\sum_i \chi_i\Bigg)^2\Bigg] = \frac{1}{N}\sum_i E_\phi\big(\chi_i^2\big) \le B,
\]
and therefore $\frac{1}{\sqrt{NT}}\sum_{i,t}\partial_{\beta_k}\ell_{it} = O_P(1)$. Analogously, $\frac{1}{NT}\sum_{i,t}\{\partial_{\beta_k\beta_l}\ell_{it} - E_\phi[\partial_{\beta_k\beta_l}\ell_{it}]\} = O_P(1/\sqrt{NT}) = o_P(1)$. Next,
\[
E_\phi\Bigg[\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\Bigg|\frac{1}{NT}\sum_{i,t}\partial_{\beta_k\beta_l\beta_m}\ell_{it}(\beta, \pi_{it})\Bigg|^2\Bigg]
\le E_\phi\Bigg[\Bigg(\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\frac{1}{NT}\sum_{i,t}\big|\partial_{\beta_k\beta_l\beta_m}\ell_{it}(\beta, \pi_{it})\big|\Bigg)^2\Bigg]
\]
\[
\le E_\phi\Bigg[\Bigg(\frac{1}{NT}\sum_{i,t} M(Z_{it})\Bigg)^2\Bigg]
\le E_\phi\Bigg[\frac{1}{NT}\sum_{i,t} M(Z_{it})^2\Bigg] = \frac{1}{NT}\sum_{i,t} E_\phi M(Z_{it})^2 = O_P(1),
\]
and therefore $\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\frac{1}{NT}\sum_{i,t}\partial_{\beta_k\beta_l\beta_m}\ell_{it}(\beta, \pi_{it}) = O_P(1)$. A similar argument gives $\frac{1}{NT}\sum_{i,t}\partial_{\beta_k\beta_l}\ell_{it} = O_P(1)$.
(ii) For $\xi_{it}(\beta, \phi) = \partial_{\beta_k\pi}\ell_{it}(\beta, \pi_{it})$ or $\xi_{it}(\beta, \phi) = \partial_{\beta_k\beta_l\pi}\ell_{it}(\beta, \pi_{it})$,
\[
E_\phi\Bigg[\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\frac{1}{T}\sum_t\Bigg|\frac{1}{N}\sum_i \xi_{it}(\beta, \phi)\Bigg|^q\Bigg]
\le E_\phi\Bigg[\frac{1}{T}\sum_t\Bigg(\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\frac{1}{N}\sum_i|\xi_{it}(\beta, \phi)|\Bigg)^q\Bigg]
\]
\[
\le E_\phi\Bigg[\frac{1}{T}\sum_t\Bigg(\frac{1}{N}\sum_i M(Z_{it})\Bigg)^q\Bigg]
\le E_\phi\Bigg[\frac{1}{T}\sum_t\frac{1}{N}\sum_i M(Z_{it})^q\Bigg]
= \frac{1}{T}\sum_t\frac{1}{N}\sum_i E_\phi M(Z_{it})^q = O_P(1),
\]
i.e. $\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\frac{1}{T}\sum_t\big|\frac{1}{N}\sum_i\xi_{it}(\beta, \phi)\big|^q = O_P(1)$. Analogously, it follows that $\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\frac{1}{N}\sum_i\big|\frac{1}{T}\sum_t\xi_{it}(\beta, \phi)\big|^q = O_P(1)$.
(iii) For $\xi_{it}(\beta, \phi) = \partial_{\pi^r}\ell_{it}(\beta, \pi_{it})$, with $r \in \{3, 4\}$, or $\xi_{it}(\beta, \phi) = \partial_{\beta_k\pi^r}\ell_{it}(\beta, \pi_{it})$, with $r \in \{2, 3\}$, or $\xi_{it}(\beta, \phi) = \partial_{\beta_k\beta_l\pi^2}\ell_{it}(\beta, \pi_{it})$,
\[
E_\phi\Bigg[\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\ \max_i\Bigg(\frac{1}{T}\sum_t|\xi_{it}(\beta, \phi)|\Bigg)^{8+\nu}\Bigg]
= E_\phi\Bigg[\max_i\ \sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\Bigg(\frac{1}{T}\sum_t|\xi_{it}(\beta, \phi)|\Bigg)^{8+\nu}\Bigg]
\]
\[
\le E_\phi\Bigg[\sum_i\Bigg(\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\frac{1}{T}\sum_t|\xi_{it}(\beta, \phi)|\Bigg)^{8+\nu}\Bigg]
\le E_\phi\Bigg[\sum_i\Bigg(\frac{1}{T}\sum_t M(Z_{it})\Bigg)^{8+\nu}\Bigg]
\le E_\phi\Bigg[\sum_i\frac{1}{T}\sum_t M(Z_{it})^{8+\nu}\Bigg]
= \sum_i\frac{1}{T}\sum_t E_\phi M(Z_{it})^{8+\nu} = O_P(N).
\]
Thus, $\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\max_i\frac{1}{T}\sum_t|\xi_{it}(\beta, \phi)| = O_P\big(N^{1/(8+\nu)}\big) = O_P\big(N^{2\epsilon}\big)$. Analogously, it follows that $\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\max_t\frac{1}{N}\sum_i|\xi_{it}(\beta, \phi)| = O_P\big(N^{2\epsilon}\big)$.
(iv) Let $\chi_t = \frac{1}{\sqrt N}\sum_i \partial_\pi\ell_{it}$. By cross-sectional independence and $E_\phi(\partial_\pi\ell_{it})^8 \le E_\phi M(Z_{it})^8 = O_P(1)$, $E_\phi\chi_t^8 = O_P(1)$ uniformly over $t$. Thus, $E_\phi\big[\frac{1}{T}\sum_t\chi_t^8\big] = O_P(1)$ and therefore $\frac{1}{T}\sum_t\big|\frac{1}{\sqrt N}\sum_i\partial_\pi\ell_{it}\big|^q = O_P(1)$, with $q = 8$.

Let $\chi_i = \frac{1}{\sqrt T}\sum_t \partial_\pi\ell_{it}(\beta^0, \pi^0_{it})$. By Lemma S.2 and $E_\phi(\partial_\pi\ell_{it})^{8+\nu} \le E_\phi M(Z_{it})^{8+\nu} = O_P(1)$, $E_\phi\chi_i^8 = O_P(1)$ uniformly over $i$. Here we use $\mu > 4/[1 - 8/(8+\nu)] = 4(8+\nu)/\nu$, which is imposed in Assumption B.1. Thus, $E_\phi\big[\frac{1}{N}\sum_i\chi_i^8\big] = O_P(1)$ and therefore $\frac{1}{N}\sum_i\big|\frac{1}{\sqrt T}\sum_t\partial_\pi\ell_{it}\big|^q = O_P(1)$, with $q = 8$.

The proofs for $\frac{1}{T}\sum_t\big|\frac{1}{\sqrt N}\sum_i\{\partial_{\beta_k\pi}\ell_{it} - E_\phi[\partial_{\beta_k\pi}\ell_{it}]\}\big|^2 = O_P(1)$ and $\frac{1}{N}\sum_i\big|\frac{1}{\sqrt T}\sum_t\{\partial_{\beta_k\pi}\ell_{it} - E_\phi[\partial_{\beta_k\pi}\ell_{it}]\}\big|^2 = O_P(1)$ are analogous.
(v) It follows by the independence of {(`i1 , . . . , `iT ) : 1 ≤ i ≤ N } across i, conditional on φ, in
Assumption B.1(ii).
(vi) Let $\xi_{it} = \partial_{\pi^r}\ell_{it}(\beta^0, \pi^0_{it}) - E_\phi[\partial_{\pi^r}\ell_{it}]$, with $r \in \{2, 3\}$, or $\xi_{it} = \partial_{\beta_k\pi^2}\ell_{it}(\beta^0, \pi^0_{it}) - E_\phi[\partial_{\beta_k\pi^2}\ell_{it}]$. For $\tilde\nu = \nu$, $\max_i E_\phi|\xi_{it}|^{8+\tilde\nu} = O_P(1)$ by assumption. By Lemma S.1,
\[
\sum_s\big|E_\phi[\xi_{it}\xi_{is}]\big| = \sum_s|\mathrm{Cov}_\phi(\xi_{it}, \xi_{is})|
\le \sum_s [8\, a(|t-s|)]^{1-2/(8+\nu)}\big[E_\phi|\xi_{it}|^{8+\nu}\big]^{1/(8+\nu)}\big[E_\phi|\xi_{is}|^{8+\nu}\big]^{1/(8+\nu)}
\le \tilde C\sum_{m=1}^\infty m^{-\mu[1-2/(8+\nu)]} \le \tilde C\sum_{m=1}^\infty m^{-4} = \tilde C\,\pi^4/90,
\]
where $\tilde C$ is a constant. Here we use that $\mu > 4(8+\nu)/\nu$ implies $\mu[1 - 2/(8+\nu)] > 4$. We have thus shown $\max_i\max_t\sum_s E_\phi[\xi_{it}\xi_{is}] \le \tilde C\,\pi^4/90 =: C$.

Analogous to the proof of part (iv), we can use Lemma S.2 to obtain $\max_i E_\phi\big[\frac{1}{\sqrt T}\sum_t\xi_{it}\big]^8 \le C$, and independence across $i$ to obtain $\max_t E_\phi\big[\frac{1}{\sqrt N}\sum_i\xi_{it}\big]^8 \le C$. Similarly, by Lemma S.2,
\[
\max_{i,j} E_\phi\Bigg\{\Bigg[\frac{1}{\sqrt T}\sum_t\big(\xi_{it}\xi_{jt} - E_\phi(\xi_{it}\xi_{jt})\big)\Bigg]^4\Bigg\} \le C,
\]
which requires $\mu > 2/[1 - 4/(4+\nu/2)]$, which is implied by the assumption that $\mu > 4(8+\nu)/\nu$.
(vii) We have already shown that $\|\bar H^{-1}\|_q = O_P(1)$.

Therefore, we can apply Lemma S.7, which shows that Assumption B.1(v) and (vi) hold. We have already shown that Assumption B.1(i), (ii), (iv), (v) and (vi) hold. One can also check that $(NT)^{-1/4+1/(2q)} = o_P(r_\phi)$ and $(NT)^{1/(2q)} r_\beta = o_P(r_\phi)$ are satisfied. In addition, $L(\beta, \phi)$ is strictly concave. We can therefore invoke Theorem B.3 to show that Assumption B.1(iii) holds and that $\|\widehat\beta - \beta^0\| = O_P((NT)^{-1/4})$.

Proof of Theorem C.1, Part (ii). For any $N\times T$ matrix $A$ we define the $N\times T$ matrix $\mathcal P A$ as follows:
\[
(\mathcal P A)_{it} = \alpha_i^* + \gamma_t^*, \qquad
(\alpha^*, \gamma^*) \in \operatorname*{argmin}_{\alpha,\gamma}\ \sum_{i,t} E_\phi(-\partial_{\pi^2}\ell_{it})\,(A_{it} - \alpha_i - \gamma_t)^2. \tag{S.1}
\]
Here, the minimization is over $\alpha \in \mathbb R^N$ and $\gamma \in \mathbb R^T$. The operator $\mathcal P$ is a linear projection, i.e. we have $\mathcal P\mathcal P = \mathcal P$. It is also convenient to define
\[
\widetilde{\mathcal P} A = \mathcal P\widetilde A, \qquad \text{where } \widetilde A_{it} = \frac{A_{it}}{E_\phi(-\partial_{\pi^2}\ell_{it})}. \tag{S.2}
\]
$\widetilde{\mathcal P}$ is a linear operator, but not a projection. Note that $\Lambda$ and $\Xi$ defined in (C.1) and (4.3) can be written as $\Lambda = \widetilde{\mathcal P} A$ and $\Xi_k = \widetilde{\mathcal P} B_k$, where $A_{it} = -\partial_\pi\ell_{it}$ and $B_{k,it} = -E_\phi(\partial_{\beta_k\pi}\ell_{it})$, for $k = 1, \ldots, \dim\beta$.⁵

By Lemma S.8(ii),
\[
\overline W = -\frac{1}{\sqrt{NT}}\Big(\partial_{\beta\beta'}\bar L + [\partial_{\beta\phi'}\bar L]\,\bar H^{-1}\,[\partial_{\phi\beta'}\bar L]\Big)
= -\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T\big[E_\phi(\partial_{\beta\beta'}\ell_{it}) + E_\phi(-\partial_{\pi^2}\ell_{it})\,\Xi_{it}\Xi_{it}'\big].
\]

⁵ $B_k$ and $\Xi_k$ are $N\times T$ matrices with entries $B_{k,it}$ and $\Xi_{k,it}$, respectively, while $B_{it}$ and $\Xi_{it}$ are $\dim\beta$-vectors with entries $B_{k,it}$ and $\Xi_{k,it}$.
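The weighted two-way projection $\mathcal P$ in (S.1) is an ordinary weighted least-squares fit on additive individual and time effects. The following small numerical sketch (our own illustration, not part of the proof; all names are hypothetical, and the weights stand in for $E_\phi(-\partial_{\pi^2}\ell_{it})$) computes $\mathcal P A$ and checks the idempotency $\mathcal P\mathcal P = \mathcal P$ claimed in the text.

```python
import numpy as np

def two_way_project(A, w):
    """Weighted two-way projection (S.1): (PA)_it = a_i* + g_t*, where
    (a*, g*) minimize sum_{i,t} w_it (A_it - a_i - g_t)^2."""
    N, T = A.shape
    # Design matrix: observation (i, t) loads on a_i and g_t.
    X = np.zeros((N * T, N + T))
    rows = np.repeat(np.arange(N), T)
    cols = np.tile(np.arange(T), N)
    X[np.arange(N * T), rows] = 1.0       # individual effect a_i
    X[np.arange(N * T), N + cols] = 1.0   # time effect g_t
    sw = np.sqrt(w.ravel())
    coef, *_ = np.linalg.lstsq(sw[:, None] * X, sw * A.ravel(), rcond=None)
    a, g = coef[:N], coef[N:]
    return a[:, None] + g[None, :]

rng = np.random.default_rng(0)
N, T = 6, 5
A = rng.normal(size=(N, T))
w = rng.uniform(0.5, 2.0, size=(N, T))    # stand-in for E_phi(-d^2_pi l_it) > 0
PA = two_way_project(A, w)
PPA = two_way_project(PA, w)
idempotent_err = np.max(np.abs(PPA - PA))  # should be ~0: PP = P
```

Since $\mathcal P A$ already lies in the span of additive two-way effects, projecting it again reproduces it exactly, up to floating-point error.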
By Lemma S.8(i),
\[
U^{(0)} = \partial_\beta L + [\partial_{\beta\phi'}\bar L]\,\bar H^{-1} S
= \frac{1}{\sqrt{NT}}\sum_{i,t}\big(\partial_\beta\ell_{it} - \Xi_{it}\,\partial_\pi\ell_{it}\big)
= \frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T D_\beta\ell_{it}.
\]
We decompose $U^{(1)} = U^{(1a)} + U^{(1b)}$, with
\[
U^{(1a)} = [\partial_{\beta\phi'}\widetilde L]\,\bar H^{-1} S - [\partial_{\beta\phi'}\bar L]\,\bar H^{-1}\widetilde H\bar H^{-1} S,
\qquad
U^{(1b)} = \sum_{g=1}^{\dim\phi}\Big\{\partial_{\beta\phi'\phi_g}\bar L + [\partial_{\beta\phi'}\bar L]\,\bar H^{-1}\,[\partial_{\phi\phi'\phi_g}\bar L]\Big\}\,\bar H^{-1} S\,[\bar H^{-1} S]_g\,/\,2.
\]
By Lemma S.8(i) and (iii),
\[
U^{(1a)} = -\frac{1}{\sqrt{NT}}\sum_{i,t}\Lambda_{it}\big(\partial_{\beta\pi}\tilde\ell_{it} + \Xi_{it}\,\partial_{\pi^2}\tilde\ell_{it}\big)
= -\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T\Lambda_{it}\big[D_{\beta\pi}\ell_{it} - E_\phi(D_{\beta\pi}\ell_{it})\big],
\]
and
\[
U^{(1b)} = \frac{1}{2\sqrt{NT}}\sum_{i,t}\Lambda_{it}^2\Big[E_\phi(\partial_{\beta\pi^2}\ell_{it}) + [\partial_{\beta\phi'}\bar L]\,\bar H^{-1} E_\phi(\partial_\phi\partial_{\pi^2}\ell_{it})\Big],
\]
where for each $i, t$, $\partial_\phi\partial_{\pi^2}\ell_{it}$ is a $\dim\phi$-vector, which can be written as $\partial_\phi\partial_{\pi^2}\ell_{it} = \binom{A 1_T}{A' 1_N}$ for an $N\times T$ matrix $A$ with elements $A_{j\tau} = \partial_{\pi^3}\ell_{j\tau}$ if $j = i$ and $\tau = t$, and $A_{j\tau} = 0$ otherwise. Thus, Lemma S.8(i) gives $[\partial_{\beta\phi'}\bar L]\,\bar H^{-1}\,\partial_\phi\partial_{\pi^2}\ell_{it} = -\sum_{j,\tau}\Xi_{j\tau}\,1(i=j)\,1(t=\tau)\,\partial_{\pi^3}\ell_{it} = -\Xi_{it}\,\partial_{\pi^3}\ell_{it}$. Therefore
\[
U^{(1b)} = \frac{1}{2\sqrt{NT}}\sum_{i,t}\Lambda_{it}^2\, E_\phi\big(\partial_{\beta\pi^2}\ell_{it} - \Xi_{it}\,\partial_{\pi^3}\ell_{it}\big)
= \frac{1}{2\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T\Lambda_{it}^2\, E_\phi(D_{\beta\pi^2}\ell_{it}).
\]
Proof of Theorem C.1, Part (iii). Showing that Assumption B.2 is satisfied is analogous to the proof of Lemma S.7 and of part (ii) of this theorem.

In the proof of Theorem 4.1 we show that Assumption 4.1 implies that $U = O_P(1)$. This fact together with part (i) of this theorem shows that Corollary B.2 is applicable, so that $\sqrt{NT}(\widehat\beta - \beta^0) = \overline W_\infty^{-1} U + o_P(1) = O_P(1)$, and we can apply Theorem B.4.

By Lemma S.8 and the result for $\sqrt{NT}(\widehat\beta - \beta^0)$,
\[
\sqrt{NT}\,\Big[\partial_{\beta'}\Delta + (\partial_{\phi'}\Delta)\,\bar H^{-1}(\partial_{\phi\beta'}\bar L)\Big](\widehat\beta - \beta^0)
= \Bigg[\frac{1}{NT}\sum_{i,t} E_\phi(D_\beta\Delta_{it})\Bigg]'\,\overline W_\infty^{-1}\big(U^{(0)} + U^{(1)}\big) + o_P(1). \tag{S.3}
\]
We apply Lemma S.8 to $U_\Delta^{(0)}$ and $U_\Delta^{(1)}$ defined in Theorem B.4 to give
\[
\sqrt{NT}\, U_\Delta^{(0)} = -\frac{1}{\sqrt{NT}}\sum_{i,t} E_\phi(\Psi_{it})\,\partial_\pi\ell_{it},
\]
\[
\sqrt{NT}\, U_\Delta^{(1)} = \frac{1}{\sqrt{NT}}\sum_{i,t}\Lambda_{it}\big[\Psi_{it}\,\partial_{\pi^2}\ell_{it} - E_\phi(\Psi_{it})\, E_\phi(\partial_{\pi^2}\ell_{it})\big]
+ \frac{1}{2\sqrt{NT}}\sum_{i,t}\Lambda_{it}^2\big[E_\phi(\partial_{\pi^2}\Delta_{it}) - E_\phi(\partial_{\pi^3}\ell_{it})\, E_\phi(\Psi_{it})\big]. \tag{S.4}
\]
The derivation of (S.3) and (S.4) is analogous to the proof of part (ii) of the theorem. Combining Theorem B.4 with equations (S.3) and (S.4) gives the result.
S.5 Proofs of Appendix D (Lemma D.1)

The following lemmas are useful to prove Lemma D.1. Let $L^*(\beta, \phi) = (NT)^{-1/2}\sum_{i,t}\ell_{it}(\beta, \alpha_i + \gamma_t)$.

Lemma S.1. If the statement of Lemma D.1 holds for some constant $b > 0$, then it holds for any constant $b > 0$.

Proof of Lemma S.1. Write $\bar H = \bar H^* + \frac{b}{\sqrt{NT}}\,vv'$, where $\bar H^* = E_\phi\big[-\frac{\partial^2}{\partial\phi\,\partial\phi'} L^*\big]$. Since $\bar H^* v = 0$,
\[
\bar H^{-1} = \Big(\bar H^* + \frac{b}{\sqrt{NT}}\,vv'\Big)^{-1}
= \bar H^{*\dagger} + \frac{\sqrt{NT}}{b}\,(vv')^\dagger
= \bar H^{*\dagger} + \frac{\sqrt{NT}}{b\,(N+T)^2}\,vv',
\]
where $\dagger$ refers to the Moore--Penrose pseudo-inverse. Thus, if $\bar H_1$ is the expected Hessian for $b = b_1 > 0$ and $\bar H_2$ is the expected Hessian for $b = b_2 > 0$, then
\[
\big\|\bar H_1^{-1} - \bar H_2^{-1}\big\|_{\max}
= \Big|\frac{1}{b_1} - \frac{1}{b_2}\Big|\,\frac{\sqrt{NT}}{(N+T)^2}\,\|vv'\|_{\max}
= O\big((NT)^{-1/2}\big).
\]
Lemma S.2. Let Assumption 4.1 hold and let $0 < b \le b_{\min}\Big(1 + \frac{\max(N,T)}{\min(N,T)}\,\frac{b_{\max}}{b_{\min}}\Big)^{-1}$. Then,
\[
\big\|\bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)}\big\|_\infty < 1 - \frac{b}{b_{\max}},
\qquad\text{and}\qquad
\big\|\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)}\big\|_\infty < 1 - \frac{b}{b_{\max}}.
\]

Proof of Lemma S.2. Let $h_{it} = E_\phi(-\partial_{\pi^2}\ell_{it})$, and define
\[
\tilde h_{it} = h_{it} - b - \frac{1}{b^{-1} + \sum_j\big(\sum_\tau h_{j\tau}\big)^{-1}}\,\sum_j\frac{h_{jt} - b}{\sum_\tau h_{j\tau}}.
\]
By definition, $\bar H_{(\alpha\alpha)} = \bar H^*_{(\alpha\alpha)} + b\,1_N 1_N'/\sqrt{NT}$ and $\bar H_{(\alpha\gamma)} = \bar H^*_{(\alpha\gamma)} - b\,1_N 1_T'/\sqrt{NT}$. The matrix $\bar H^*_{(\alpha\alpha)}$ is diagonal with elements $\sum_t h_{it}/\sqrt{NT}$. The matrix $\bar H^*_{(\alpha\gamma)}$ has elements $h_{it}/\sqrt{NT}$. The Woodbury identity states that
\[
\bar H_{(\alpha\alpha)}^{-1} = \bar H^{*-1}_{(\alpha\alpha)} - \bar H^{*-1}_{(\alpha\alpha)} 1_N\Big(\sqrt{NT}\, b^{-1} + 1_N'\bar H^{*-1}_{(\alpha\alpha)} 1_N\Big)^{-1} 1_N'\bar H^{*-1}_{(\alpha\alpha)}.
\]
Then, $\bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)} = \bar H^{*-1}_{(\alpha\alpha)}\widetilde H/\sqrt{NT}$, where $\widetilde H$ is the $N\times T$ matrix with elements $\tilde h_{it}$. Therefore
\[
\big\|\bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)}\big\|_\infty = \max_i\frac{\sum_t|\tilde h_{it}|}{\sum_t h_{it}}.
\]
Assumption 4.1(iv) guarantees that $b_{\max} \ge h_{it} \ge b_{\min}$, which implies $h_{jt} - b \ge b_{\min} - b > 0$, and
\[
\tilde h_{it} > h_{it} - b - b\sum_j\frac{h_{jt} - b}{\sum_\tau h_{j\tau}}
\ge b_{\min} - b\Big(1 + \frac{N\, b_{\max}}{T\, b_{\min}}\Big) \ge 0.
\]
We conclude that
\[
\big\|\bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)}\big\|_\infty = \max_i\frac{\sum_t\tilde h_{it}}{\sum_t h_{it}}
= 1 - \min_i\frac{1}{\sum_t h_{it}}\Bigg[T b + \frac{1}{b^{-1} + \sum_j(\sum_\tau h_{j\tau})^{-1}}\,\sum_t\sum_j\frac{h_{jt} - b}{\sum_\tau h_{j\tau}}\Bigg]
< 1 - \frac{b}{b_{\max}}.
\]
Analogously, $\big\|\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)}\big\|_\infty < 1 - \frac{b}{b_{\max}}$.
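The bound of Lemma S.2 can be illustrated numerically. In the sketch below (our own check, not part of the proof; the names and the uniform design are arbitrary) we draw $h_{it} \in [b_{\min}, b_{\max}]$, build the two Hessian blocks (the common $1/\sqrt{NT}$ scale cancels in the product), and confirm the strict contraction bound.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 8, 12
b_min, b_max = 1.0, 3.0
h = rng.uniform(b_min, b_max, size=(N, T))   # h_it = E_phi(-d^2_pi l_it)
# Largest penalty weight allowed by Lemma S.2.
b = b_min / (1.0 + (max(N, T) / min(N, T)) * (b_max / b_min))
# Blocks of the penalized expected Hessian (common 1/sqrt(NT) factor omitted).
H_aa = np.diag(h.sum(axis=1)) + b * np.ones((N, N))
H_ag = h - b                                  # element (i, t): h_it - b
ratio = np.linalg.solve(H_aa, H_ag)           # H_aa^{-1} H_ag
norm_inf = np.max(np.abs(ratio).sum(axis=1))  # infinity norm (max row abs sum)
bound = 1.0 - b / b_max
```

The lemma predicts `norm_inf < bound` for any admissible draw of `h`, which is what the check below asserts.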
Proof of Lemma D.1. We choose $b < b_{\min}\big(1 + \max(\kappa^2, \kappa^{-2})\,b_{\max}/b_{\min}\big)^{-1} \le b_{\min}\Big(1 + \frac{\max(N,T)}{\min(N,T)}\,\frac{b_{\max}}{b_{\min}}\Big)^{-1}$ for large enough $N$ and $T$, so that Lemma S.2 becomes applicable. The choice of $b$ has no effect on the general validity of the lemma for all $b > 0$ by Lemma S.1.

By the inversion formula for partitioned matrices,
\[
\bar H^{-1} = \begin{pmatrix} A & -A\,\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1} \\[2pt] -\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)} A & \ \bar H_{(\gamma\gamma)}^{-1} + \bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)} A\,\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1} \end{pmatrix},
\]
with $A := \big(\bar H_{(\alpha\alpha)} - \bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)}\big)^{-1}$. The Woodbury identity states that
\[
\bar H_{(\alpha\alpha)}^{-1} = \bar H^{*-1}_{(\alpha\alpha)} - \underbrace{\bar H^{*-1}_{(\alpha\alpha)} 1_N\Big(\sqrt{NT}/b + 1_N'\bar H^{*-1}_{(\alpha\alpha)} 1_N\Big)^{-1} 1_N'\bar H^{*-1}_{(\alpha\alpha)}}_{=:\,C_{(\alpha\alpha)}},
\qquad
\bar H_{(\gamma\gamma)}^{-1} = \bar H^{*-1}_{(\gamma\gamma)} - \underbrace{\bar H^{*-1}_{(\gamma\gamma)} 1_T\Big(\sqrt{NT}/b + 1_T'\bar H^{*-1}_{(\gamma\gamma)} 1_T\Big)^{-1} 1_T'\bar H^{*-1}_{(\gamma\gamma)}}_{=:\,C_{(\gamma\gamma)}}.
\]
By Assumption 4.1(v), $\|\bar H^{*-1}_{(\alpha\alpha)}\|_\infty = O_P(1)$, $\|\bar H^{*-1}_{(\gamma\gamma)}\|_\infty = O_P(1)$, $\|\bar H^*_{(\alpha\gamma)}\|_{\max} = O_P(1/\sqrt{NT})$. Therefore⁶
\[
\|C_{(\alpha\alpha)}\|_{\max} \le \|\bar H^{*-1}_{(\alpha\alpha)}\|_\infty^2\,\|1_N 1_N'\|_{\max}\Big(\sqrt{NT}/b + 1_N'\bar H^{*-1}_{(\alpha\alpha)} 1_N\Big)^{-1} = O_P(1/\sqrt{NT}),
\qquad
\|\bar H_{(\alpha\alpha)}^{-1}\|_\infty \le \|\bar H^{*-1}_{(\alpha\alpha)}\|_\infty + N\,\|C_{(\alpha\alpha)}\|_{\max} = O_P(1).
\]
Analogously, $\|C_{(\gamma\gamma)}\|_{\max} = O_P(1/\sqrt{NT})$ and $\|\bar H_{(\gamma\gamma)}^{-1}\|_\infty = O_P(1)$. Furthermore, $\|\bar H_{(\alpha\gamma)}\|_{\max} \le \|\bar H^*_{(\alpha\gamma)}\|_{\max} + b/\sqrt{NT} = O_P(1/\sqrt{NT})$. Define
\[
B := \Big(\mathbb 1_N - \bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)}\Big)^{-1} - \mathbb 1_N
= \sum_{n=1}^\infty\Big(\bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)}\Big)^n.
\]
Then, $A = \bar H_{(\alpha\alpha)}^{-1} + \bar H_{(\alpha\alpha)}^{-1} B = \bar H^{*-1}_{(\alpha\alpha)} - C_{(\alpha\alpha)} + \bar H_{(\alpha\alpha)}^{-1} B$. By Lemma S.2, $\|\bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)}\|_\infty \le \|\bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)}\|_\infty\,\|\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)}\|_\infty < \big(1 - \frac{b}{b_{\max}}\big)^2 < 1$, and
\[
\|B\|_{\max} \le \sum_{n=0}^\infty\big\|\bar H_{(\alpha\alpha)}^{-1}\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)}\big\|_\infty^n\,\|\bar H_{(\alpha\alpha)}^{-1}\|_\infty\,\|\bar H_{(\alpha\gamma)}\|_\infty\,\|\bar H_{(\gamma\gamma)}^{-1}\|_\infty\,\|\bar H_{(\gamma\alpha)}\|_{\max}
\le \Bigg[\sum_{n=0}^\infty\Big(1 - \frac{b}{b_{\max}}\Big)^{2n}\Bigg]\, T\,\|\bar H_{(\alpha\alpha)}^{-1}\|_\infty\,\|\bar H_{(\gamma\gamma)}^{-1}\|_\infty\,\|\bar H_{(\gamma\alpha)}\|_{\max}^2 = O_P(1/\sqrt{NT}).
\]
By the triangle inequality,
\[
\|A\|_\infty \le \|\bar H_{(\alpha\alpha)}^{-1}\|_\infty + N\,\|\bar H_{(\alpha\alpha)}^{-1}\|_\infty\,\|B\|_{\max} = O_P(1).
\]
Thus, for the different blocks of
\[
\bar H^{-1} - \begin{pmatrix} \bar H^*_{(\alpha\alpha)} & 0 \\ 0 & \bar H^*_{(\gamma\gamma)} \end{pmatrix}^{-1}
= \begin{pmatrix} A - \bar H^{*-1}_{(\alpha\alpha)} & -A\,\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1} \\[2pt] -\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)} A & \ \bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)} A\,\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1} - C_{(\gamma\gamma)} \end{pmatrix},
\]
we find
\[
\big\|A - \bar H^{*-1}_{(\alpha\alpha)}\big\|_{\max} = \big\|\bar H_{(\alpha\alpha)}^{-1} B - C_{(\alpha\alpha)}\big\|_{\max} \le \|\bar H_{(\alpha\alpha)}^{-1}\|_\infty\,\|B\|_{\max} + \|C_{(\alpha\alpha)}\|_{\max} = O_P(1/\sqrt{NT}),
\]
\[
\big\|A\,\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1}\big\|_{\max} \le \|A\|_\infty\,\|\bar H_{(\alpha\gamma)}\|_{\max}\,\|\bar H_{(\gamma\gamma)}^{-1}\|_\infty = O_P(1/\sqrt{NT}),
\]
\[
\big\|\bar H_{(\gamma\gamma)}^{-1}\bar H_{(\gamma\alpha)} A\,\bar H_{(\alpha\gamma)}\bar H_{(\gamma\gamma)}^{-1} - C_{(\gamma\gamma)}\big\|_{\max}
\le N\,\|\bar H_{(\gamma\gamma)}^{-1}\|_\infty^2\,\|A\|_\infty\,\|\bar H_{(\alpha\gamma)}\|_{\max}^2 + \|C_{(\gamma\gamma)}\|_{\max} = O_P(1/\sqrt{NT}).
\]
The bound $O_P(1/\sqrt{NT})$ for the max-norm of each block of the matrix yields the same bound for the max-norm of the matrix itself.

⁶ Here and in the following we make use of the inequalities $\|AB\|_{\max} \le \|A\|_\infty\|B\|_{\max}$, $\|AB\|_{\max} \le \|A\|_{\max}\|B'\|_\infty$, and $\|A\|_\infty \le n\|A\|_{\max}$, which hold for any $m\times n$ matrix $A$ and $n\times p$ matrix $B$.
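The Woodbury step for $\bar H_{(\alpha\alpha)}^{-1}$ admits a direct numerical check. The sketch below (ours; the diagonal entries are arbitrary stand-ins for $\sum_t h_{it}/\sqrt{NT}$) verifies the rank-one (Sherman--Morrison) form of the correction $C_{(\alpha\alpha)}$, whose entries are uniformly of order $1/\sqrt{NT}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 6, 9
b = 0.25
s = np.sqrt(N * T)
# Diagonal block H*_(aa) with entries of the right scale, plus the penalty.
Hstar = np.diag(rng.uniform(1.0, 3.0, size=N) * T / s)
H = Hstar + (b / s) * np.ones((N, N))            # H_(aa) = H*_(aa) + b 1 1'/sqrt(NT)
Hstar_inv = np.linalg.inv(Hstar)
one = np.ones(N)
# Woodbury correction C_(aa) as in the text.
C = Hstar_inv @ np.outer(one, one) @ Hstar_inv / (s / b + one @ Hstar_inv @ one)
woodbury_err = np.max(np.abs(np.linalg.inv(H) - (Hstar_inv - C)))
max_correction = np.max(np.abs(C))               # O(1/sqrt(NT)) in the proof
```
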
S.6 Useful Lemmas

S.6.1 Some Properties of Stochastic Processes
Here we collect some known properties of α-mixing processes, which are useful for our proofs.
Lemma S.1. Let $\{\xi_t\}$ be an $\alpha$-mixing process with mixing coefficients $a(m)$. Let $E|\xi_t|^p < \infty$ and $E|\xi_{t+m}|^q < \infty$ for some $p, q \ge 1$ and $1/p + 1/q < 1$. Then,
\[
|\mathrm{Cov}(\xi_t, \xi_{t+m})| \le 8\, a(m)^{1/r}\,\big[E|\xi_t|^p\big]^{1/p}\big[E|\xi_{t+m}|^q\big]^{1/q},
\]
where $r = (1 - 1/p - 1/q)^{-1}$.
Proof of Lemma S.1. See, for example, Proposition 2.5 in Fan and Yao (2003).
The following result is a simple modification of Theorem 1 in Cox and Kim (1995).
Lemma S.2. Let $\{\xi_t\}$ be an $\alpha$-mixing process with mixing coefficients $a(m)$. Let $r \ge 1$ be an integer, and let $\delta > 2r$, $\mu > r/(1 - 2r/\delta)$, $c > 0$ and $C > 0$. Assume that $\sup_t E|\xi_t|^\delta \le C$ and that $a(m) \le c\, m^{-\mu}$ for all $m \in \{1, 2, 3, \ldots\}$. Then there exists a constant $B > 0$ depending on $r$, $\delta$, $\mu$, $c$ and $C$, but not depending on $T$ or any other distributional characteristics of $\xi_t$, such that for any $T > 0$,
\[
E\Bigg[\Bigg(\frac{1}{\sqrt T}\sum_{t=1}^T\xi_t\Bigg)^{2r}\Bigg] \le B.
\]
The following is a central limit theorem for martingale difference sequences.
Lemma S.3. Consider the scalar process $\xi_{it} = \xi_{NT,it}$, $i = 1, \ldots, N$, $t = 1, \ldots, T$. Let $\{(\xi_{i1}, \ldots, \xi_{iT}) : 1 \le i \le N\}$ be independent across $i$, and be a martingale difference sequence for each $i$, $N$, $T$. Let $E|\xi_{it}|^{2+\delta}$ be uniformly bounded across $i, t, N, T$ for some $\delta > 0$. Let $\sigma = \sigma_{NT} > \Delta > 0$ for all sufficiently large $NT$, and let $\frac{1}{NT}\sum_{i,t}\xi_{it}^2 - \sigma^2 \to_P 0$ as $NT \to \infty$.⁷ Then,
\[
\frac{1}{\sigma\sqrt{NT}}\sum_{i,t}\xi_{it} \to_d \mathcal N(0, 1).
\]
⁷ Here we can allow for an arbitrary sequence of $(N, T)$ with $NT \to \infty$.

Proof of Lemma S.3. Define $\xi_m = \xi_{M,m} = \xi_{NT,it}$, with $M = NT$ and $m = T(i-1) + t \in \{1, \ldots, M\}$. Then $\{\xi_m,\ m = 1, \ldots, M\}$ is a martingale difference sequence. With this redefinition the statement of the lemma is equal to Corollary 5.26 in White (2001), which is based on Theorem 2.3 in McLeish (1974), and which shows that $\frac{1}{\sigma\sqrt M}\sum_{m=1}^M\xi_m \to_d \mathcal N(0, 1)$.
S.6.2 Some Bounds for the Norms of Matrices and Tensors

The following lemma provides bounds for the matrix norm $\|\cdot\|_q$ in terms of the matrix norms $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$, and a bound for $\|\cdot\|_2$ in terms of $\|\cdot\|_q$ and $\|\cdot\|_{q/(q-1)}$. For the sake of clarity we use the notation $\|\cdot\|_2$ for the spectral norm in this lemma, which everywhere else is denoted by $\|\cdot\|$, without any index. Recall that $\|A\|_\infty = \max_i\sum_j|A_{ij}|$ and $\|A\|_1 = \|A'\|_\infty$.

Lemma S.4. For any matrix $A$ we have
\[
\|A\|_q \le \|A\|_1^{1/q}\,\|A\|_\infty^{1-1/q}, \quad \text{for } q \ge 1,
\]
\[
\|A\|_q \le \|A\|_2^{2/q}\,\|A\|_\infty^{1-2/q}, \quad \text{for } q \ge 2,
\]
\[
\|A\|_2 \le \sqrt{\|A\|_q\,\|A\|_{q/(q-1)}}, \quad \text{for } q \ge 1.
\]
Note also that $\|A\|_{q/(q-1)} = \|A'\|_q$ for $q \ge 1$. Thus, for a symmetric matrix $A$, we have $\|A\|_2 \le \|A\|_q \le \|A\|_\infty$ for any $q \ge 1$.

Proof of Lemma S.4. The statements follow from the fact that $\log\|A\|_q$ is a convex function of $1/q$, which is a consequence of the Riesz--Thorin theorem. For more details and references see e.g. Higham (1992).
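These interpolation bounds are straightforward to check numerically for the computable cases. The sketch below (our own illustration) verifies the $q = 2$ instance $\|A\|_2 \le \sqrt{\|A\|_1\,\|A\|_\infty}$ and, for a symmetric matrix, $\|A\|_2 \le \|A\|_\infty$, using standard numpy norm routines.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(10, 7))
norm_1 = np.abs(A).sum(axis=0).max()     # max column sum = ||A||_1
norm_inf = np.abs(A).sum(axis=1).max()   # max row sum    = ||A||_inf
norm_2 = np.linalg.norm(A, 2)            # spectral norm  = ||A||_2
interp_ok = norm_2 <= np.sqrt(norm_1 * norm_inf) + 1e-12

S = A[:7, :7] + A[:7, :7].T              # symmetric matrix
sym_ok = np.linalg.norm(S, 2) <= np.abs(S).sum(axis=1).max() + 1e-12
```
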
The following lemma shows that the norm $\|\cdot\|_q$ applied to higher-dimensional tensors with a special structure can be expressed in terms of matrix norms $\|\cdot\|_q$. In our panel application all higher-dimensional tensors have such a special structure, since they are obtained as partial derivatives wrt $\alpha$ and $\gamma$ from the likelihood function.

Lemma S.5. Let $a$ be an $N$-vector with entries $a_i$, let $b$ be a $T$-vector with entries $b_t$, and let $c$ be an $N\times T$ matrix with entries $c_{it}$. Let $A$ be an $N\times N\times\ldots\times N$ ($p$ times) tensor with entries
\[
A_{i_1 i_2 \ldots i_p} = \begin{cases} a_{i_1} & \text{if } i_1 = i_2 = \ldots = i_p, \\ 0 & \text{otherwise.} \end{cases}
\]
Let $B$ be a $T\times T\times\ldots\times T$ ($r$ times) tensor with entries
\[
B_{t_1 t_2 \ldots t_r} = \begin{cases} b_{t_1} & \text{if } t_1 = t_2 = \ldots = t_r, \\ 0 & \text{otherwise.} \end{cases}
\]
Let $C$ be an $N\times\ldots\times N\times T\times\ldots\times T$ ($p$ times, then $r$ times) tensor with entries
\[
C_{i_1 \ldots i_p t_1 \ldots t_r} = \begin{cases} c_{i_1 t_1} & \text{if } i_1 = \ldots = i_p \text{ and } t_1 = \ldots = t_r, \\ 0 & \text{otherwise,} \end{cases}
\]
and let $\widetilde C$ be a $T\times\ldots\times T\times N\times\ldots\times N$ ($r$ times, then $p$ times) tensor with entries
\[
\widetilde C_{t_1 \ldots t_r i_1 \ldots i_p} = \begin{cases} c_{i_1 t_1} & \text{if } i_1 = \ldots = i_p \text{ and } t_1 = \ldots = t_r, \\ 0 & \text{otherwise.} \end{cases}
\]
Then,
\[
\|A\|_q = \max_i|a_i| \ \text{ for } p \ge 2, \qquad
\|B\|_q = \max_t|b_t| \ \text{ for } r \ge 2, \qquad
\|C\|_q \le \|c\|_q \ \text{ and } \ \|\widetilde C\|_q \le \|c'\|_q \ \text{ for } p \ge 1,\ r \ge 1,
\]
where $\|\cdot\|_q$ refers to the $q$-norm defined in (A.1) with $q \ge 1$.
Proof of Lemma S.5. Since the vector norm $\|\cdot\|_{q/(q-1)}$ is dual to the vector norm $\|\cdot\|_q$ we can rewrite the definition of the tensor norm $\|C\|_q$ as follows:
\[
\|C\|_q = \max_{\|u^{(1)}\|_{q/(q-1)}=1}\ \max_{\substack{\|u^{(k)}\|_q = 1 \\ k = 2,\ldots,p}}\ \max_{\substack{\|v^{(l)}\|_q = 1 \\ l = 1,\ldots,r}}\ \sum_{i_1 \ldots i_p = 1}^N\ \sum_{t_1 \ldots t_r = 1}^T u^{(1)}_{i_1} u^{(2)}_{i_2}\cdots u^{(p)}_{i_p}\, v^{(1)}_{t_1} v^{(2)}_{t_2}\cdots v^{(r)}_{t_r}\, C_{i_1 \ldots i_p t_1 \ldots t_r}.
\]
The specific structure of $C$ yields
\[
\|C\|_q = \max_{\|u^{(1)}\|_{q/(q-1)}=1}\ \max_{\substack{\|u^{(k)}\|_q = 1 \\ k = 2,\ldots,p}}\ \max_{\substack{\|v^{(l)}\|_q = 1 \\ l = 1,\ldots,r}}\ \sum_{i=1}^N\sum_{t=1}^T u^{(1)}_i u^{(2)}_i\cdots u^{(p)}_i\, v^{(1)}_t v^{(2)}_t\cdots v^{(r)}_t\, c_{it}
\le \max_{\|u\|_{q/(q-1)}\le 1}\ \max_{\|v\|_q\le 1}\ \sum_{i=1}^N\sum_{t=1}^T u_i\, v_t\, c_{it} = \|c\|_q,
\]
where we define $u \in \mathbb R^N$ with elements $u_i = u^{(1)}_i u^{(2)}_i\cdots u^{(p)}_i$ and $v \in \mathbb R^T$ with elements $v_t = v^{(1)}_t v^{(2)}_t\cdots v^{(r)}_t$, and we use that $\|u^{(k)}\|_q = 1$, for $k = 2,\ldots,p$, and $\|v^{(l)}\|_q = 1$, for $l = 2,\ldots,r$, implies $|u_i| \le |u^{(1)}_i|$ and $|v_t| \le |v^{(1)}_t|$, and therefore $\|u\|_{q/(q-1)} \le \|u^{(1)}\|_{q/(q-1)} = 1$ and $\|v\|_q \le \|v^{(1)}\|_q = 1$. The proof of $\|\widetilde C\|_q \le \|c'\|_q$ is analogous.

Let $A^{(p)} = A$, as defined above, for a particular value of $p$. For $p = 2$, $A^{(2)}$ is a diagonal $N\times N$ matrix with diagonal elements $a_i$, so that $\|A^{(2)}\|_q \le \|A^{(2)}\|_1^{1/q}\,\|A^{(2)}\|_\infty^{1-1/q} = \max_i|a_i|$. For $p > 2$,
\[
\big\|A^{(p)}\big\|_q = \max_{\|u^{(1)}\|_{q/(q-1)}=1}\ \max_{\substack{\|u^{(k)}\|_q = 1 \\ k = 2,\ldots,p}}\ \sum_{i_1 \ldots i_p = 1}^N u^{(1)}_{i_1} u^{(2)}_{i_2}\cdots u^{(p)}_{i_p}\, A_{i_1 i_2 \ldots i_p}
= \max_{\|u^{(1)}\|_{q/(q-1)}=1}\ \max_{\substack{\|u^{(k)}\|_q = 1 \\ k = 2,\ldots,p}}\ \sum_{i,j=1}^N u^{(1)}_i u^{(2)}_i\cdots u^{(p-1)}_i\, u^{(p)}_j\, A^{(2)}_{ij}
\le \max_{\|u\|_{q/(q-1)}\le 1}\ \max_{\|v\|_q = 1}\ \sum_{i,j=1}^N u_i\, v_j\, A^{(2)}_{ij} = \big\|A^{(2)}\big\|_q \le \max_i|a_i|,
\]
where we define $u \in \mathbb R^N$ with elements $u_i = u^{(1)}_i u^{(2)}_i\cdots u^{(p-1)}_i$ and $v = u^{(p)}$, and we use that $\|u^{(k)}\|_q = 1$, for $k = 2,\ldots,p-1$, implies $|u_i| \le |u^{(1)}_i|$ and therefore $\|u\|_{q/(q-1)} \le \|u^{(1)}\|_{q/(q-1)} = 1$. We have thus shown $\|A^{(p)}\|_q \le \max_i|a_i|$. From the definition of $\|A^{(p)}\|_q$ above, we obtain $\|A^{(p)}\|_q \ge \max_i|a_i|$ by choosing all $u^{(k)}$ equal to the standard basis vector whose $i^*$'th component equals one, where $i^* \in \operatorname{argmax}_i|a_i|$. Thus, $\|A^{(p)}\|_q = \max_i|a_i|$ for $p \ge 2$. The proof for $\|B\|_q = \max_t|b_t|$ is analogous.
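For the matrix case $p = 2$ the equality $\|A^{(2)}\|_q = \max_i|a_i|$ can be illustrated numerically (our own sketch, not part of the proof): random unit-$q$ vectors never push the induced norm above $\max_i|a_i|$, and the standard basis vector at the argmax attains it.

```python
import numpy as np

rng = np.random.default_rng(5)
q = 3.0
a = rng.normal(size=8)
A = np.diag(a)                                       # diagonal matrix A^{(2)}
amax = np.abs(a).max()
# Random unit-q vectors: ||A x||_q never exceeds max_i |a_i|.
vals = []
for _ in range(200):
    x = rng.normal(size=8)
    x /= np.sum(np.abs(x) ** q) ** (1 / q)           # normalize so ||x||_q = 1
    vals.append(np.sum(np.abs(A @ x) ** q) ** (1 / q))
upper_ok = max(vals) <= amax + 1e-9
# The basis vector at argmax_i |a_i| attains the bound.
e = np.zeros(8); e[np.argmax(np.abs(a))] = 1.0
attained = np.sum(np.abs(A @ e) ** q) ** (1 / q)
```
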
The following lemma provides an asymptotic bound for the spectral norm of $N\times T$ matrices whose entries are mean zero, cross-sectionally independent, and weakly time-serially dependent conditional on $\phi$.

Lemma S.6. Let $e$ be an $N\times T$ matrix with entries $e_{it}$. Let $\bar\sigma_i^2 = \frac{1}{T}\sum_{t=1}^T E_\phi(e_{it}^2)$, let $\Omega$ be the $T\times T$ matrix with entries $\Omega_{ts} = \frac{1}{N}\sum_{i=1}^N E_\phi(e_{it} e_{is})$, and let $\eta_{ij} = \frac{1}{\sqrt T}\sum_{t=1}^T[e_{it} e_{jt} - E_\phi(e_{it} e_{jt})]$. Consider asymptotic sequences where $N, T \to \infty$ such that $N/T$ converges to a finite positive constant. Assume that

(i) The distribution of $e_{it}$ is independent across $i$, conditional on $\phi$, and satisfies $E_\phi(e_{it}) = 0$.

(ii) $\frac{1}{N}\sum_{i=1}^N\bar\sigma_i^8 = O_P(1)$, $\frac{1}{T}\mathrm{Tr}(\Omega^4) = O_P(1)$, $\frac{1}{N}\sum_{i=1}^N E_\phi\big(\eta_{ii}^4\big) = O_P(1)$, $\frac{1}{N^2}\sum_{i,j=1}^N E_\phi\big(\eta_{ij}^4\big) = O_P(1)$.

Then, $E_\phi\|e\|^8 = O_P(N^5)$, and therefore $\|e\| = O_P(N^{5/8})$.
Proof of Lemma S.6. Let $\|\cdot\|_F$ be the Frobenius norm of a matrix, i.e. $\|A\|_F = \sqrt{\mathrm{Tr}(AA')}$. For $\bar\sigma_i^4 = (\bar\sigma_i^2)^2$, $\bar\sigma_i^8 = (\bar\sigma_i^2)^4$ and $\delta_{jk} = 1(j = k)$,
\[
\|e\|^8 = \|ee'ee'\|^2 \le \|ee'ee'\|_F^2
= \sum_{i,j=1}^N\Bigg(\sum_{k=1}^N\sum_{t,\tau=1}^T e_{it} e_{kt} e_{k\tau} e_{j\tau}\Bigg)^2
= \sum_{i,j=1}^N\Bigg[\sum_{k=1}^N\Big(T^{1/2}\eta_{ik} + T\,\delta_{ik}\bar\sigma_i^2\Big)\Big(T^{1/2}\eta_{jk} + T\,\delta_{jk}\bar\sigma_j^2\Big)\Bigg]^2
\]
\[
= \sum_{i,j=1}^N\Big(T\sum_k\eta_{ik}\eta_{jk} + T^{3/2}\eta_{ij}\big(\bar\sigma_i^2 + \bar\sigma_j^2\big) + T^2\delta_{ij}\bar\sigma_i^4\Big)^2
\le 3\,T^2\sum_{i,j=1}^N\Big(\sum_k\eta_{ik}\eta_{jk}\Big)^2 + 12\,T^3\sum_{i,j=1}^N\bar\sigma_i^4\,\eta_{ij}^2 + 3\,T^4\sum_{i=1}^N\bar\sigma_i^8,
\]
where we used that $(a + b + c)^2 \le 3(a^2 + b^2 + c^2)$, that $(\bar\sigma_i^2 + \bar\sigma_j^2)^2 \le 2(\bar\sigma_i^4 + \bar\sigma_j^4)$, and the symmetry of $\eta_{ij}$ in $i, j$. By the Cauchy--Schwarz inequality,
\[
E_\phi\|e\|^8 \le 3T^2\, E_\phi\Bigg[\sum_{i,j=1}^N\Big(\sum_{k=1}^N\eta_{ik}\eta_{jk}\Big)^2\Bigg]
+ 12\,T^3\sqrt{N\sum_{i=1}^N\bar\sigma_i^8}\ \sqrt{\sum_{i,j=1}^N E_\phi(\eta_{ij}^4)}
+ 3\,T^4\sum_{i=1}^N\bar\sigma_i^8
= 3T^2\, E_\phi\Bigg[\sum_{i,j=1}^N\Big(\sum_{k=1}^N\eta_{ik}\eta_{jk}\Big)^2\Bigg] + O_P(T^3 N^2) + O_P(T^4 N).
\]
Moreover,
\[
E_\phi\Bigg[\sum_{i,j=1}^N\Big(\sum_{k=1}^N\eta_{ik}\eta_{jk}\Big)^2\Bigg]
= \sum_{i,j,k,l=1}^N E_\phi(\eta_{ik}\eta_{jk}\eta_{il}\eta_{jl})
= \sum_{i,j,k,l=1}^N E_\phi(\eta_{ij}\eta_{jk}\eta_{kl}\eta_{li})
\]
\[
\le \sum_{\substack{i,j,k,l \\ \text{mutually different}}} E_\phi(\eta_{ij}\eta_{jk}\eta_{kl}\eta_{li})
+ 4\sum_{i,j,k=1}^N a_{ijk}\,\big|E_\phi(\eta_{ii}\eta_{ij}\eta_{jk}\eta_{ki})\big|
\]
\[
\le \sum_{\substack{i,j,k,l \\ \text{mutually different}}} E_\phi(\eta_{ij}\eta_{jk}\eta_{kl}\eta_{li})
+ 4\Bigg[\sum_{i,j,k=1}^N E_\phi(\eta_{ii}^4)\Bigg]^{1/4}\Bigg[\sum_{i,j,k=1}^N E_\phi(\eta_{ij}^4)\Bigg]^{3/4}
\]
\[
= \sum_{\substack{i,j,k,l \\ \text{mutually different}}} E_\phi(\eta_{ij}\eta_{jk}\eta_{kl}\eta_{li})
+ 4 N^3\Bigg[\frac{1}{N}\sum_{i=1}^N E_\phi(\eta_{ii}^4)\Bigg]^{1/4}\Bigg[\frac{1}{N^2}\sum_{i,j=1}^N E_\phi(\eta_{ij}^4)\Bigg]^{3/4}
= \sum_{\substack{i,j,k,l \\ \text{mutually different}}} E_\phi(\eta_{ij}\eta_{jk}\eta_{kl}\eta_{li}) + O_P(N^3),
\]
where in the second step we just renamed the indices and used that $\eta_{ij}$ is symmetric in $i, j$; $a_{ijk} \in [0, 1]$ in the second line is a combinatorial pre-factor; and in the third step we applied the Cauchy--Schwarz inequality.
Let $\Omega_i$ be the $T\times T$ matrix with entries $\Omega_{i,ts} = E_\phi(e_{it} e_{is})$, such that $\Omega = \frac{1}{N}\sum_{i=1}^N\Omega_i$. For $i, j, k, l$ mutually different,
\[
E_\phi(\eta_{ij}\eta_{jk}\eta_{kl}\eta_{li})
= \frac{1}{T^2}\sum_{t,s,u,v=1}^T E_\phi(e_{it} e_{jt} e_{js} e_{ks} e_{ku} e_{lu} e_{lv} e_{iv})
= \frac{1}{T^2}\sum_{t,s,u,v=1}^T E_\phi(e_{iv} e_{it})\, E_\phi(e_{jt} e_{js})\, E_\phi(e_{ks} e_{ku})\, E_\phi(e_{lu} e_{lv})
= \frac{1}{T^2}\,\mathrm{Tr}(\Omega_i\Omega_j\Omega_k\Omega_l) \ge 0,
\]
because $\Omega_i \ge 0$ for all $i$. Thus,
\[
\sum_{\substack{i,j,k,l \\ \text{mutually different}}} E_\phi(\eta_{ij}\eta_{jk}\eta_{kl}\eta_{li})
= \frac{1}{T^2}\sum_{\substack{i,j,k,l \\ \text{mut. different}}}\mathrm{Tr}(\Omega_i\Omega_j\Omega_k\Omega_l)
\le \frac{1}{T^2}\sum_{i,j,k,l=1}^N\mathrm{Tr}(\Omega_i\Omega_j\Omega_k\Omega_l)
= \frac{N^4}{T^2}\,\mathrm{Tr}(\Omega^4) = O_P(N^4/T).
\]
Combining all the above results gives $E_\phi\|e\|^8 = O_P(N^5)$, since $N$ and $T$ are assumed to grow at the same rate.
S.6.3 Verifying the Basic Regularity Conditions in Panel Models

The following lemma provides sufficient conditions under which the panel fixed effects estimators in the main text satisfy the high-level regularity conditions in Assumptions B.1(v) and (vi).

Lemma S.7. Let $L(\beta, \phi) = \frac{1}{\sqrt{NT}}\big[\sum_{i,t}\ell_{it}(\beta, \pi_{it}) - \frac{b}{2}(v'\phi)^2\big]$, where $\pi_{it} = \alpha_i + \gamma_t$, $\alpha = (\alpha_1, \ldots, \alpha_N)'$, $\gamma = (\gamma_1, \ldots, \gamma_T)'$, $\phi = (\alpha', \gamma')'$, and $v = (1_N', 1_T')'$. Assume that $\ell_{it}(\cdot, \cdot)$ is four times continuously differentiable in an appropriate neighborhood of the true parameter values $(\beta^0, \phi^0)$. Consider limits as $N, T \to \infty$ with $N/T \to \kappa^2 > 0$. Let $4 < q \le 8$ and $0 \le \epsilon < 1/8 - 1/(2q)$. Let $r_\beta = r_{\beta,NT} > 0$, $r_\phi = r_{\phi,NT} > 0$, with $r_\beta = o\big((NT)^{-1/(2q)-\epsilon}\big)$ and $r_\phi = o\big((NT)^{-\epsilon}\big)$. Assume that
(i) For $k, l, m \in \{1, 2, \ldots, \dim\beta\}$,
\[
\frac{1}{\sqrt{NT}}\sum_{i,t}\partial_{\beta_k}\ell_{it} = O_P(1), \qquad
\frac{1}{NT}\sum_{i,t}\partial_{\beta_k\beta_l}\ell_{it} = O_P(1), \qquad
\frac{1}{NT}\sum_{i,t}\{\partial_{\beta_k\beta_l}\ell_{it} - E_\phi[\partial_{\beta_k\beta_l}\ell_{it}]\} = o_P(1),
\]
\[
\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\ \frac{1}{NT}\sum_{i,t}\partial_{\beta_k\beta_l\beta_m}\ell_{it}(\beta, \pi_{it}) = O_P(1).
\]
(ii) Let $k, l \in \{1, 2, \ldots, \dim\beta\}$. For $\xi_{it}(\beta, \phi) = \partial_{\beta_k\pi}\ell_{it}(\beta, \pi_{it})$ or $\xi_{it}(\beta, \phi) = \partial_{\beta_k\beta_l\pi}\ell_{it}(\beta, \pi_{it})$,
\[
\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\ \frac{1}{T}\sum_t\Bigg|\frac{1}{N}\sum_i\xi_{it}(\beta, \phi)\Bigg|^q = O_P(1), \qquad
\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\ \frac{1}{N}\sum_i\Bigg|\frac{1}{T}\sum_t\xi_{it}(\beta, \phi)\Bigg|^q = O_P(1).
\]
(iii) Let $k, l \in \{1, 2, \ldots, \dim\beta\}$. For $\xi_{it}(\beta, \phi) = \partial_{\pi^r}\ell_{it}(\beta, \pi_{it})$, with $r \in \{3, 4\}$, or $\xi_{it}(\beta, \phi) = \partial_{\beta_k\pi^r}\ell_{it}(\beta, \pi_{it})$, with $r \in \{2, 3\}$, or $\xi_{it}(\beta, \phi) = \partial_{\beta_k\beta_l\pi^2}\ell_{it}(\beta, \pi_{it})$,
\[
\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\ \max_i\frac{1}{T}\sum_t|\xi_{it}(\beta, \phi)| = O_P\big(N^{2\epsilon}\big), \qquad
\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\ \sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\ \max_t\frac{1}{N}\sum_i|\xi_{it}(\beta, \phi)| = O_P\big(N^{2\epsilon}\big).
\]
(iv) Moreover,
\[
\frac{1}{T}\sum_t\Bigg|\frac{1}{\sqrt N}\sum_i\partial_\pi\ell_{it}\Bigg|^q = O_P(1), \qquad
\frac{1}{N}\sum_i\Bigg|\frac{1}{\sqrt T}\sum_t\partial_\pi\ell_{it}\Bigg|^q = O_P(1),
\]
\[
\frac{1}{T}\sum_t\Bigg|\frac{1}{\sqrt N}\sum_i\{\partial_{\beta_k\pi}\ell_{it} - E_\phi[\partial_{\beta_k\pi}\ell_{it}]\}\Bigg|^2 = O_P(1), \qquad
\frac{1}{N}\sum_i\Bigg|\frac{1}{\sqrt T}\sum_t\{\partial_{\beta_k\pi}\ell_{it} - E_\phi[\partial_{\beta_k\pi}\ell_{it}]\}\Bigg|^2 = O_P(1).
\]
(v) The sequence {(`i1 , . . . , `iT ) : 1 ≤ i ≤ N } is independent across i conditional on φ.
(vi) Let $k \in \{1, 2, \ldots, \dim\beta\}$. For $\xi_{it} = \partial_{\pi^r}\ell_{it} - E_\phi[\partial_{\pi^r}\ell_{it}]$, with $r \in \{2, 3\}$, or $\xi_{it} = \partial_{\beta_k\pi^2}\ell_{it} - E_\phi[\partial_{\beta_k\pi^2}\ell_{it}]$, and some $\tilde\nu > 0$,
\[
\max_i E_\phi\big|\xi_{it}\big|^{8+\tilde\nu} \le C, \qquad
\max_i\max_t\sum_s\big|E_\phi[\xi_{it}\xi_{is}]\big| \le C, \qquad
\max_i E_\phi\Bigg[\frac{1}{\sqrt T}\sum_t\xi_{it}\Bigg]^8 \le C,
\]
\[
\max_t E_\phi\Bigg[\frac{1}{\sqrt N}\sum_i\xi_{it}\Bigg]^8 \le C, \qquad
\max_{i,j} E_\phi\Bigg[\frac{1}{\sqrt T}\sum_t\big(\xi_{it}\xi_{jt} - E_\phi(\xi_{it}\xi_{jt})\big)\Bigg]^4 \le C,
\]
uniformly in $N, T$, where $C > 0$ is a constant.

(vii) $\|\bar H^{-1}\|_q = O_P(1)$.

Then, Assumptions B.1(v) and (vi) are satisfied with the same parameters $q$, $\epsilon$, $r_\beta = r_{\beta,NT}$ and $r_\phi = r_{\phi,NT}$ used here.
Proof of Lemma S.7. The penalty term $\frac{b}{2}(v'\phi)^2$ is quadratic in $\phi$ and does not depend on $\beta$. This term thus only enters $\partial_\phi L(\beta, \phi)$ and $\partial_{\phi\phi'} L(\beta, \phi)$, but it does not affect any other partial derivative of $L(\beta, \phi)$. Furthermore, the contribution of the penalty drops out of $S = \partial_\phi L(\beta^0, \phi^0)$, because we impose the normalization $v'\phi^0 = 0$. It also drops out of $\widetilde H$, because it contributes the same to $H$ and $\bar H$. We can therefore ignore the penalty term for the purpose of proving the lemma (but it is necessary to satisfy the assumption $\|\bar H^{-1}\|_q = O_P(1)$).

\# Assumption (i) implies that $\|\partial_\beta L\| = O_P(1)$, $\|\partial_{\beta\beta'} L\| = O_P(\sqrt{NT})$, $\|\partial_{\beta\beta'}\widetilde L\| = o_P(\sqrt{NT})$, and $\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\|\partial_{\beta\beta\beta} L(\beta, \phi)\| = O_P(\sqrt{NT})$. Note that it does not matter which norms we use here because $\dim\beta$ is fixed.
\# By Assumption (ii), $\|\partial_{\beta\phi'} L\|_q = O_P\big((NT)^{1/(2q)}\big)$ and $\sup_{\beta\in\mathcal B(r_\beta,\beta^0)}\sup_{\phi\in\mathcal B_q(r_\phi,\phi^0)}\|\partial_{\beta\beta\phi} L(\beta, \phi)\|_q = O_P\big((NT)^{1/(2q)}\big)$. For example, $\partial_{\beta_k\alpha_i} L = \frac{1}{\sqrt{NT}}\sum_t\partial_{\beta_k\pi}\ell_{it}$ and therefore
\[
\|\partial_{\beta_k\alpha} L\|_q = \Bigg(\sum_i\Bigg|\frac{1}{\sqrt{NT}}\sum_t\partial_{\beta_k\pi}\ell_{it}\Bigg|^q\Bigg)^{1/q} = O_P\big(N^{1/q}\big) = O_P\big((NT)^{1/(2q)}\big).
\]
Analogously, $\|\partial_{\beta_k\gamma} L\|_q = O_P\big((NT)^{1/(2q)}\big)$, and therefore $\|\partial_{\beta_k\phi} L\|_q \le \|\partial_{\beta_k\alpha} L\|_q + \|\partial_{\beta_k\gamma} L\|_q = O_P\big((NT)^{1/(2q)}\big)$. This also implies that $\|\partial_{\beta\phi'} L\|_q = O_P\big((NT)^{1/(2q)}\big)$ because $\dim\beta$ is fixed.
\# By Assumption (iii), $\|\partial_{\phi\phi\phi} L\|_q = O_P((NT)^\epsilon)$, $\|\partial_{\beta\phi\phi} L\|_q = O_P((NT)^\epsilon)$, $\sup\sup\|\partial_{\beta\beta\phi\phi} L(\beta, \phi)\|_q = O_P((NT)^\epsilon)$, $\sup\sup\|\partial_{\beta\phi\phi\phi} L(\beta, \phi)\|_q = O_P((NT)^\epsilon)$, and $\sup\sup\|\partial_{\phi\phi\phi\phi} L(\beta, \phi)\|_q = O_P((NT)^\epsilon)$, where the suprema are over $\beta\in\mathcal B(r_\beta,\beta^0)$ and $\phi\in\mathcal B_q(r_\phi,\phi^0)$. For example,
\[
\|\partial_{\phi\phi\phi} L\|_q \le \|\partial_{\alpha\alpha\alpha} L\|_q + \|\partial_{\alpha\alpha\gamma} L\|_q + \|\partial_{\alpha\gamma\alpha} L\|_q + \|\partial_{\alpha\gamma\gamma} L\|_q + \|\partial_{\gamma\alpha\alpha} L\|_q + \|\partial_{\gamma\alpha\gamma} L\|_q + \|\partial_{\gamma\gamma\alpha} L\|_q + \|\partial_{\gamma\gamma\gamma} L\|_q
\]
\[
\le \|\partial_{\pi\alpha\alpha} L\|_q + \|\partial_{\pi\gamma\gamma} L\|_q + 3\|\partial_{\pi\alpha\gamma} L\|_q + 3\|\partial_{\pi\gamma\alpha} L\|_q
\le \|\partial_{\pi\alpha\alpha} L\|_\infty + \|\partial_{\pi\gamma\gamma} L\|_\infty + 3\|\partial_{\pi\gamma\alpha} L\|_\infty^{1/q}\,\|\partial_{\pi\alpha\gamma} L\|_\infty^{1-1/q} + 3\|\partial_{\pi\alpha\gamma} L\|_\infty^{1/q}\,\|\partial_{\pi\gamma\alpha} L\|_\infty^{1-1/q}
\]
\[
= \frac{1}{\sqrt{NT}}\Bigg[\max_i\Big|\sum_t\partial_{\pi^3}\ell_{it}\Big| + \max_t\Big|\sum_i\partial_{\pi^3}\ell_{it}\Big|
+ 3\Big(\max_t\sum_i|\partial_{\pi^3}\ell_{it}|\Big)^{1/q}\Big(\max_i\sum_t|\partial_{\pi^3}\ell_{it}|\Big)^{1-1/q}
+ 3\Big(\max_i\sum_t|\partial_{\pi^3}\ell_{it}|\Big)^{1/q}\Big(\max_t\sum_i|\partial_{\pi^3}\ell_{it}|\Big)^{1-1/q}\Bigg]
\]
\[
\le \frac{1}{\sqrt{NT}}\Bigg[\max_i\sum_t|\partial_{\pi^3}\ell_{it}| + \max_t\sum_i|\partial_{\pi^3}\ell_{it}|
+ 3\Big(\max_t\sum_i|\partial_{\pi^3}\ell_{it}|\Big)^{1/q}\Big(\max_i\sum_t|\partial_{\pi^3}\ell_{it}|\Big)^{1-1/q}
+ 3\Big(\max_i\sum_t|\partial_{\pi^3}\ell_{it}|\Big)^{1/q}\Big(\max_t\sum_i|\partial_{\pi^3}\ell_{it}|\Big)^{1-1/q}\Bigg]
= O_P\big(N^{2\epsilon}\big) = O_P\big((NT)^\epsilon\big).
\]
Here, we use Lemma S.5 to bound the norms of the 3-tensors in terms of the norms of matrices, e.g. $\|\partial_{\alpha\alpha\gamma} L\|_q \le \|\partial_{\pi\alpha\gamma} L\|_q$, because $\partial_{\alpha_i\alpha_j\gamma_t} L = 0$ if $i \ne j$ and $\partial_{\alpha_i\alpha_i\gamma_t} L = (NT)^{-1/2}\,\partial_{\pi^3}\ell_{it}$.⁸ Then, we use Lemma S.4 to bound $q$-norms in terms of $\infty$-norms, and then explicitly express those $\infty$-norms in terms of the elements of the matrices. Finally, we use that $|\sum_i\partial_{\pi^3}\ell_{it}| \le \sum_i|\partial_{\pi^3}\ell_{it}|$ and $|\sum_t\partial_{\pi^3}\ell_{it}| \le \sum_t|\partial_{\pi^3}\ell_{it}|$, and apply Assumption (iii).
# By Assumption (iv), ‖S‖q = OP((NT)^{−1/4+1/(2q)}) and ‖∂βφ′L̃‖ = OP(1). For example,

‖S‖q = (NT)^{−1/2} [ Σi |Σt ∂πℓit|^q + Σt |Σi ∂πℓit|^q ]^{1/q} = OP(N^{−1/2+1/q}) = OP((NT)^{−1/4+1/(2q)}).
# By Assumptions (v) and (vi), ‖H̃‖ = OP((NT)^{−3/16}) = oP((NT)^{−1/8}) and ‖∂βφφL̃‖ = OP((NT)^{−3/16}) = oP((NT)^{−1/8}). We now show this for ‖H̃‖; the proof for ‖∂βφφL̃‖ is analogous. By the triangle inequality,

‖H̃‖ = ‖∂φφ′L − Eφ[∂φφ′L]‖ ≤ ‖∂αα′L − Eφ[∂αα′L]‖ + ‖∂γγ′L − Eφ[∂γγ′L]‖ + 2 ‖∂αγ′L − Eφ[∂αγ′L]‖.

Let ξit = ∂π²ℓit − Eφ[∂π²ℓit]. Since ∂αα′L is a diagonal matrix with diagonal entries (NT)^{−1/2} Σt ∂π²ℓit, we have ‖∂αα′L − Eφ[∂αα′L]‖ = max_i |(NT)^{−1/2} Σt ξit|, and therefore

Eφ ‖∂αα′L − Eφ[∂αα′L]‖⁸ = Eφ [ max_i ( (NT)^{−1/2} Σt ξit )⁸ ] ≤ Σi Eφ ( (NT)^{−1/2} Σt ξit )⁸ ≤ C N (NT)^{−4} T⁴ = O(N^{−3}),

using that Eφ(Σt ξit)⁸ ≤ C T⁴. Thus, ‖∂αα′L − Eφ[∂αα′L]‖ = OP(N^{−3/8}). Analogously, ‖∂γγ′L − Eφ[∂γγ′L]‖ = OP(N^{−3/8}).

Let ξ be the N × T matrix with entries ξit. We now show that ξ satisfies all the regularity conditions of Lemma S.6 with eit = ξit. Independence across i is assumed. Furthermore, σ̄i² = T^{−1} Σ_{t=1}^T Eφ(ξit²) ≤ C^{1/4}, so that N^{−1} Σ_{i=1}^N σ̄i⁴ = OP(1). For Ωts = N^{−1} Σ_{i=1}^N Eφ(ξit ξis),

T^{−1} Tr(Ω⁴) ≤ ‖Ω‖⁴ ≤ ‖Ω‖∞⁴ = ( max_t Σs |Ωts| )⁴ ≤ C = OP(1).

For ηij = T^{−1/2} Σ_{t=1}^T [ξit ξjt − Eφ(ξit ξjt)] we assume Eφ(ηij⁴) ≤ C, which implies N^{−1} Σ_{i=1}^N Eφ(ηii⁴) = OP(1) and N^{−2} Σ_{i,j=1}^N Eφ(ηij⁴) = OP(1). Then, Lemma S.6 gives ‖ξ‖ = OP(N^{5/8}). Note that (NT)^{−1/2} ξ = ∂αγ′L − Eφ[∂αγ′L], and therefore ‖∂αγ′L − Eφ[∂αγ′L]‖ = OP(N^{−3/8}). We conclude that ‖H̃‖ = OP(N^{−3/8}) = OP((NT)^{−3/16}).

⁸ With a slight abuse of notation, we write ∂παγL for the N × T matrix with entries (NT)^{−1/2} ∂π³ℓit, and analogously for ∂πααL, ∂πγγL, and ∂πγαL.
# Moreover, for ξit = ∂π²ℓit − Eφ[∂π²ℓit],

Eφ ‖H̃‖∞^{8+ν̃} = Eφ [ max_i ( (NT)^{−1/2} Σt |ξit| )^{8+ν̃} ] ≤ Eφ [ Σi ( (NT)^{−1/2} Σt |ξit| )^{8+ν̃} ]
   ≤ (T/N)^{(8+ν̃)/2} Σi T^{−1} Σt Eφ |ξit|^{8+ν̃} = OP(N),

and therefore ‖H̃‖∞ = oP(N^{1/8}). Thus, by Lemma S.4,

‖H̃‖q ≤ ‖H̃‖^{2/q} ‖H̃‖∞^{1−2/q} = oP( N^{(1/8)[−6/q+(1−2/q)]} ) = oP( N^{−1/q+1/8} ) = oP(1),

where we use that q ≤ 8.
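The interpolation bound ‖M‖q ≤ ‖M‖₂^{2/q} ‖M‖∞^{1−2/q} used in the last display holds for any matrix and any 2 ≤ q ≤ ∞, and can be checked numerically. Since the ℓq operator norm has no closed form for general q (cf. Higham, 1992), the sketch below uses random trial vectors, which only give a lower bound on ‖M‖q, so the check is conservative; the matrix size and the values of q are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(15, 25))

norm_2 = np.linalg.norm(M, 2)          # spectral norm ||M||_2
norm_inf = np.linalg.norm(M, np.inf)   # maximum absolute row sum ||M||_inf

checks = []
for q in (3.0, 4.0, 6.0, 8.0):
    # Random trial vectors x give the lower bound max_x ||M x||_q / ||x||_q.
    x = rng.normal(size=(25, 500))
    lower = (np.linalg.norm(M @ x, q, axis=0)
             / np.linalg.norm(x, q, axis=0)).max()
    bound = norm_2 ** (2 / q) * norm_inf ** (1 - 2 / q)
    checks.append(lower <= bound)
```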
# Finally, we show that Σ_{g,h=1}^{dim φ} ∂φφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h = oP((NT)^{−1/4}). First,

‖ Σ_{g,h=1}^{dim φ} ∂φφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h ‖ ≤ ‖ Σ_{g,h=1}^{dim φ} ∂αφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h ‖ + ‖ Σ_{g,h=1}^{dim φ} ∂γφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h ‖.

Let (v′, w′)′ := H̄⁻¹S, where v is an N-vector and w is a T-vector. We assume ‖H̄⁻¹‖q = OP(1). By Lemma S.1 this also implies ‖H̄⁻¹‖ = OP(1), and ‖S‖ = OP(1). Thus, ‖v‖ ≤ ‖H̄⁻¹‖ ‖S‖ = OP(1), ‖w‖ ≤ ‖H̄⁻¹‖ ‖S‖ = OP(1), ‖v‖∞ ≤ ‖v‖q ≤ ‖H̄⁻¹‖q ‖S‖q = OP((NT)^{−1/4+1/(2q)}), and ‖w‖∞ ≤ ‖w‖q ≤ ‖H̄⁻¹‖q ‖S‖q = OP((NT)^{−1/4+1/(2q)}). Furthermore, by an argument analogous to the above proof for ‖H̃‖, Assumptions (v) and (vi) imply that ‖∂παα′L̃‖ = OP(N^{−3/8}), ‖∂παγ′L̃‖ = OP(N^{−3/8}), and ‖∂πγγ′L̃‖ = OP(N^{−3/8}). Then,

Σ_{g,h=1}^{dim φ} ∂αiφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h
  = Σ_{j,k=1}^N (∂αiαjαkL̃) vj vk + 2 Σ_{j=1}^N Σ_{t=1}^T (∂αiαjγtL̃) vj wt + Σ_{t,s=1}^T (∂αiγtγsL̃) wt ws
  = ( Σ_{t=1}^T (∂πααL̃)it ) vi² + 2 Σ_{t=1}^T (∂παγL̃)it vi wt + Σ_{t=1}^T (∂πγγL̃)it wt²,

and therefore

‖ Σ_{g,h=1}^{dim φ} ∂αφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h ‖ ≤ ‖∂παα′L̃‖ ‖v‖ ‖v‖∞ + 2 ‖∂παγ′L̃‖ ‖w‖ ‖v‖∞ + ‖∂πγγ′L̃‖ ‖w‖ ‖w‖∞
  = OP(N^{−3/8}) OP((NT)^{−1/4+1/(2q)}) = OP((NT)^{−1/4−3/16+1/(2q)}) = oP((NT)^{−1/4}),

where we use that q > 4. Analogously, ‖ Σ_{g,h=1}^{dim φ} ∂γφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h ‖ = oP((NT)^{−1/4}), and thus also Σ_{g,h=1}^{dim φ} ∂φφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h = oP((NT)^{−1/4}).⁹
S.6.4
A Useful Algebraic Result

Let P̃ be the linear operator defined in equation (S.2), and let P be the related projection operator defined in (S.1). Lemma S.8 shows how, in the context of panel data models, some expressions that appear in the general expansion of Appendix B can be conveniently expressed using the operator P̃. This lemma is used extensively in the proof of part (ii) of Theorem C.1.
Lemma S.8. Let A, B and C be N × T matrices, and let the expected incidental parameter Hessian H̄ be invertible. Define the (N+T)-vectors 𝒜 and ℬ and the (N+T) × (N+T) matrix 𝒞 as follows:¹⁰

𝒜 = (1/(NT)) (A 1_T ; A′ 1_N),    ℬ = (1/(NT)) (B 1_T ; B′ 1_N),    𝒞 = (1/(NT)) [ diag(C 1_T), C ; C′, diag(C′ 1_N) ].

Then,

(i)   𝒜′ H̄⁻¹ ℬ = (NT)^{−3/2} Σ_{i,t} (P̃A)it Bit = (NT)^{−3/2} Σ_{i,t} (P̃B)it Ait,

(ii)  𝒜′ H̄⁻¹ ℬ = (NT)^{−3/2} Σ_{i,t} Eφ(−∂π²ℓit) (P̃A)it (P̃B)it,

(iii) 𝒜′ H̄⁻¹ 𝒞 H̄⁻¹ ℬ = (NT)^{−2} Σ_{i,t} (P̃A)it Cit (P̃B)it.

⁹ Given the structure of this last part of the proof of Lemma S.7, one might wonder why, instead of ‖Σ_{g,h=1}^{dim φ} ∂φφgφhL̃ [H̄⁻¹S]g [H̄⁻¹S]h‖ = oP((NT)^{−1/4}), we did not directly impose ‖Σg ∂φgφφ′L̃‖q = oP((NT)^{−1/(2q)}) as a high-level condition in Assumption B.1(vi). While this alternative high-level assumption would indeed be more elegant and sufficient to derive our results, it would not be satisfied for panel models, because it involves bounding ‖Σi ∂αiγγ′L̃‖ and ‖Σt ∂γtαα′L̃‖, which was avoided in the proof of Lemma S.7.

¹⁰ Note that A 1_T is simply the N-vector with entries Σt Ait, and A′ 1_N is simply the T-vector with entries Σi Ait, and analogously for B and C.
Proof. Let α̃i* + γ̃t* = (PÃ)it = (P̃A)it, with Ã as defined in equation (S.2). The first order condition of the minimization problem in the definition of (PÃ)it can be written as (NT)^{−1/2} H̄ (α̃*′, γ̃*′)′ = 𝒜. One solution to this equation is (α̃*′, γ̃*′)′ = √(NT) H̄⁻¹ 𝒜 (this is the solution that imposes the normalization Σi α̃i* = Σt γ̃t*, but this is of no importance in the following). Thus,

√(NT) 𝒜′ H̄⁻¹ ℬ = (α̃*′, γ̃*′) ℬ = (1/(NT)) [ Σ_{i,t} α̃i* Bit + Σ_{i,t} γ̃t* Bit ] = (1/(NT)) Σ_{i,t} (P̃A)it Bit.

This gives the first equality of Statement (i). The second equality of Statement (i) follows by symmetry. Statement (ii) is a special case of Statement (iii), obtained by choosing the matrix C with entries Cit = Eφ(−∂π²ℓit), for which 𝒞 = (NT)^{−1/2} H̄ (up to the penalty term), so we only need to prove Statement (iii).

Let αi* + γt* = (PB̃)it = (P̃B)it, where B̃it = Bit / Eφ(−∂π²ℓit). By an argument analogous to the one given above, we can choose (α*′, γ*′)′ = √(NT) H̄⁻¹ ℬ as one solution to the corresponding minimization problem. Then,

NT 𝒜′ H̄⁻¹ 𝒞 H̄⁻¹ ℬ = (1/(NT)) Σ_{i,t} [ α̃i* Cit αi* + α̃i* Cit γt* + γ̃t* Cit αi* + γ̃t* Cit γt* ] = (1/(NT)) Σ_{i,t} (P̃A)it Cit (P̃B)it.
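Statement (i) can be verified numerically. The sketch below uses hypothetical positive weights wit in place of Eφ(−∂π²ℓit), builds the expected incidental parameter Hessian from them, and follows the construction in the proof; since the vector built from the row and column sums of a matrix is orthogonal to (1_N′, −1_T′)′, which spans the null space of the unpenalized Hessian, the Moore–Penrose pseudoinverse can stand in for the penalized inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 7
NT = N * T
w = rng.uniform(0.5, 2.0, size=(N, T))    # hypothetical weights for E_phi(-d_pi^2 l_it)
A_mat = rng.normal(size=(N, T))
B_mat = rng.normal(size=(N, T))

# Expected incidental parameter Hessian; its null space is spanned by (1_N', -1_T')'.
H = np.block([[np.diag(w.sum(axis=1)), w],
              [w.T, np.diag(w.sum(axis=0))]]) / np.sqrt(NT)

A_vec = np.concatenate([A_mat.sum(axis=1), A_mat.sum(axis=0)]) / NT
B_vec = np.concatenate([B_mat.sum(axis=1), B_mat.sum(axis=0)]) / NT

H_inv = np.linalg.pinv(H)                 # valid: A_vec, B_vec are orthogonal to the null space
lhs = A_vec @ H_inv @ B_vec

# One solution of the first order condition H phi / sqrt(NT) = A_vec.
phi = np.sqrt(NT) * H_inv @ A_vec
P_A = phi[:N, None] + phi[None, N:]       # (P~A)_it = alpha~_i* + gamma~_t*
rhs = (P_A * B_mat).sum() / NT ** 1.5     # right-hand side of statement (i)
```

The same construction applied to B̃it = Bit / wit reproduces statements (ii) and (iii).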
References

Aghion, P., Bloom, N., Blundell, R., Griffith, R., and Howitt, P. (2005). Competition and innovation: an inverted-U relationship. The Quarterly Journal of Economics, 120(2):701–728.

Cox, D. D. and Kim, T. Y. (1995). Moment bounds for mixing random variables useful in nonparametric function estimation. Stochastic Processes and their Applications, 56(1):151–158.

Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York.

Hahn, J. and Kuersteiner, G. (2011). Bias reduction for dynamic nonlinear panel models with fixed effects. Econometric Theory, 27(6):1152–1191.

Higham, N. J. (1992). Estimating the matrix p-norm. Numerische Mathematik, 62(1):539–555.

Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge University Press.

McLeish, D. (1974). Dependent central limit theorems and invariance principles. The Annals of Probability, 2(4):620–628.

White, H. (2001). Asymptotic Theory for Econometricians. Academic Press, New York.
Table S1: Poisson model for patents

Dependent variable:
citation-weighted patents      (1)        (2)        (3)        (4)        (5)        (6)

Static model
Competition                 165.12     152.81     387.46     389.99     401.88     401.51
                            (54.77)    (55.74)    (67.74)
  APE                       -20.00      -6.43      -5.98      -5.49      -6.25      -4.74
                             (7.74)     (8.61)    (19.68)
Competition squared         -88.55     -80.99    -204.55    -205.84    -212.15    -214.03
                            (29.08)    (29.61)    (36.17)

Dynamic model
Lag-patents                   1.05       1.07       0.46       0.48       0.50       0.70
                             (0.02)     (0.03)     (0.05)
  APE                         0.86       0.87       0.36       0.38       0.39       0.56
                             (0.02)     (0.03)     (0.07)
Competition                  62.95      95.70     199.68     184.70     184.64     255.44
                            (62.68)    (65.08)    (76.66)
  APE                       -12.78      -9.03      -1.68      -0.15      -0.43     -18.45
                             (7.54)     (8.18)    (15.53)
Competition squared         -34.15     -51.09    -105.24     -97.23     -97.22    -136.97
                            (33.21)    (34.48)    (40.87)

Year effects                            Yes        Yes        Yes        Yes        Yes
Industry effects                                   Yes        Yes        Yes        Yes
Bias correction                                                 A          A          J
(number of lags)                                                1          2

Notes: Data set obtained from ABBGH. Competition is measured by (1 − Lerner index) in the industry-year. All columns are estimated using an unbalanced panel of seventeen industries over the period 1973 to 1994. The first year available is used as the initial condition in the dynamic model. The estimates of the coefficients for the static model in columns (2) and (3) replicate the results in ABBGH. A is the bias corrected estimator that uses an analytical correction, with the number of lags used to estimate the spectral expectations given in the bottom row. J is the jackknife bias corrected estimator that uses split panel jackknife in both the individual and time dimensions. Standard errors in parentheses; rows labeled APE report average partial effects.
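The split panel jackknife used for the J estimator can be sketched in a few lines. The function below is a minimal illustration for a scalar estimator on a balanced panel with even dimensions (the application uses Poisson MLE on an unbalanced panel); the toy estimator with additive O(1/N) + O(1/T) bias is a hypothetical stand-in, chosen because the correction removes such bias exactly.

```python
import numpy as np

def split_panel_jackknife(estimate, Y):
    """3*(full-panel estimate) minus the averages of the two half-panel
    estimates along the individual and along the time dimension."""
    N, T = Y.shape
    full = estimate(Y)
    half_N = (estimate(Y[: N // 2]) + estimate(Y[N // 2:])) / 2
    half_T = (estimate(Y[:, : T // 2]) + estimate(Y[:, T // 2:])) / 2
    return 3 * full - half_N - half_T

# Toy estimator: true value (the mean) plus additive 1/N + 1/T bias.
biased = lambda Y: Y.mean() + 1 / Y.shape[0] + 1 / Y.shape[1]
corrected = split_panel_jackknife(biased, np.ones((10, 20)))   # bias removed: 1.0
```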
Table S.2: Homogeneity test for the jackknife

                   Static Model    Dynamic Model
Cross section          10.49            13.37
                      (0.01)           (0.00)
Time series             1.87            12.41
                      (0.60)           (0.01)

Notes: Wald test for equality of common parameters across sub-panels. P-values in parentheses.
Table S3: Finite-sample properties in static Poisson model

                       Coefficient of Zit                  Coefficient of Zit²                 APE of Zit
                  Bias  SD  RMSE  SE/SD  p;.95      Bias  SD  RMSE  SE/SD  p;.95      Bias   SD  RMSE  SE/SD  p;.95

N = 17, T = 22, unbalanced
MLE                -58  14   60   1.04   0.01        -59  14   60   1.03   0.01        222  113   248   1.15   0.60
MLE-TE             -62  14   64   1.01   0.01        -62  14   64   1.01   0.01         -9  139   139   1.04   0.94
MLE-FETE            -2  17   17   1.02   0.96         -2  17   17   1.02   0.96        -15  226   226   1.49   1.00
Analytical (L=1)    -1  17   17   1.02   0.96         -1  17   17   1.02   0.96         -9  225   225   1.50   1.00
Analytical (L=2)    -1  17   17   1.02   0.96         -1  17   17   1.02   0.96         -6  225   225   1.50   1.00
Jackknife           -3  25   25   0.69   0.83         -3  25   25   0.70   0.83        -15  333   333   1.01   0.95

N = 34, T = 22, unbalanced
MLE                -58  10   59   1.03   0.00        -57  10   58   1.03   0.00        226   81   240   0.98   0.20
MLE-TE             -61  10   62   1.00   0.00        -61  10   62   1.00   0.00         -3   97    97   0.95   0.94
MLE-FETE             0  12   12   0.99   0.96          0  13   13   0.99   0.94         -6  158   158   1.12   0.98
Analytical (L=1)     0  12   12   0.99   0.96          0  13   13   0.99   0.94          0  159   158   1.11   0.98
Analytical (L=2)     1  13   13   0.99   0.96          1  13   13   0.99   0.94          3  159   159   1.11   0.98
Jackknife           -1  14   14   0.90   0.93          0  14   14   0.90   0.94        -15  208   208   0.85   0.90

N = 51, T = 22, unbalanced
MLE                -57   8   58   1.00   0.00        -58   8   57   1.00   0.00        228   66   238   0.96   0.06
MLE-TE             -61   8   61   1.00   0.00        -61   8   61   1.00   0.00         -1   77    77   0.95   0.94
MLE-FETE             0  10   10   0.97   0.94          0  11   11   0.97   0.96         -4  128   128   1.04   0.96
Analytical (L=1)     0  10   10   0.97   0.94          0  11   11   0.97   0.96          2  129   128   1.04   0.96
Analytical (L=2)     1  10   11   0.96   0.94          1  11   11   0.96   0.96          5  129   129   1.04   0.96
Jackknife           -1  11   11   0.90   0.93          0  11   11   0.90   0.93        -12  169   170   0.79   0.88

Notes: All the entries are in percentage of the true parameter value. 500 repetitions. The data generating process is: Yit ∼ Poisson(exp{β1 Zit + β2 Zit² + αi + γt}) with all the variables and coefficients calibrated to the dataset of ABBGH. Average effect is E[(β1 + 2 β2 Zit) exp(β1 Zit + β2 Zit² + αi + γt)]. MLE is the Poisson maximum likelihood estimator without individual and time fixed effects; MLE-TE is the Poisson maximum likelihood estimator with time fixed effects; MLE-FETE is the Poisson maximum likelihood estimator with individual and time fixed effects; Analytical (L = l) is the bias corrected estimator that uses an analytical correction with l lags to estimate the spectral expectations; and Jackknife is the bias corrected estimator that uses split panel jackknife in both the individual and time dimensions.
Table S4: Finite-sample properties in dynamic Poisson model: lagged dependent variable

                       Coefficient of Yi,t-1                    APE of Yi,t-1
                  Bias  SD  RMSE  SE/SD  p;.95      Bias  SD  RMSE  SE/SD  p;.95

N = 17, T = 21, unbalanced
MLE                135   3   135   1.82   0.00       158   2   158   3.75   0.00
MLE-TE             142   3   142   1.95   0.00       163   3   163   4.17   0.00
MLE-FETE           -17  15    23   0.96   0.78       -17  15    22   1.38   0.89
Analytical (L=1)    -7  15    17   0.98   0.91        -8  14    16   1.41   0.97
Analytical (L=2)    -5  15    16   0.96   0.92        -5  15    16   1.38   0.98
Jackknife            4  20    21   0.73   0.85         4  20    20   1.03   0.95

N = 34, T = 21, unbalanced
MLE                135   2   135   1.76   0.00       158   2   158   2.82   0.00
MLE-TE             141   2   141   1.77   0.00       162   2   162   2.69   0.00
MLE-FETE           -16  11    19   0.93   0.65       -16  10    19   1.05   0.71
Analytical (L=1)    -7  11    13   0.95   0.89        -7  10    12   1.08   0.92
Analytical (L=2)    -4  11    12   0.93   0.91        -4  10    11   1.05   0.94
Jackknife            3  13    14   0.77   0.85         3  13    13   0.86   0.89

N = 51, T = 21, unbalanced
MLE                135   2   135   1.81   0.00       158   1   158   2.58   0.00
MLE-TE             141   2   141   1.79   0.00       162   2   162   2.41   0.00
MLE-FETE           -15   8    17   0.97   0.55       -15   8    17   1.03   0.55
Analytical (L=1)    -6   8    10   0.99   0.90        -6   8    10   1.05   0.91
Analytical (L=2)    -3   8     9   0.97   0.93        -4   8     9   1.03   0.93
Jackknife            3  11    11   0.77   0.87         3  10    11   0.80   0.88

Notes: All the entries are in percentage of the true parameter value. 500 repetitions. The data generating process is: Yit ∼ Poisson(exp{βY log(1 + Yi,t−1) + β1 Zit + β2 Zit² + αi + γt}), where all the exogenous variables, initial condition and coefficients are calibrated to the application of ABBGH. Average effect is βY E[exp{(βY − 1) log(1 + Yi,t−1) + β1 Zit + β2 Zit² + αi + γt}]. MLE is the Poisson maximum likelihood estimator without individual and time fixed effects; MLE-TE is the Poisson maximum likelihood estimator with time fixed effects; MLE-FETE is the Poisson maximum likelihood estimator with individual and time fixed effects; Analytical (L = l) is the bias corrected estimator that uses an analytical correction with l lags to estimate the spectral expectations; and Jackknife is the bias corrected estimator that uses split panel jackknife in both the individual and time dimensions.
Table S5: Finite-sample properties in dynamic Poisson model: exogenous regressor

[Table: bias, standard deviation, RMSE, SE/SD ratio, and p; .95 coverage of MLE, MLE-TE, MLE-FETE, Analytical (L = 1), Analytical (L = 2), and Jackknife, for the coefficient of Zit, the coefficient of Zit², and the APE of Zit, in unbalanced panels with N = 17, 34, 51 and T = 21.]

Notes: All the entries are in percentage of the true parameter value. 500 repetitions. The data generating process is: Yit ∼ Poisson(exp{βY log(1 + Yi,t−1) + β1 Zit + β2 Zit² + αi + γt}), where all the exogenous variables, initial condition and coefficients are calibrated to the application of ABBGH. Average effect is E[(β1 + 2 β2 Zit) exp{βY log(1 + Yi,t−1) + β1 Zit + β2 Zit² + αi + γt}]. MLE is the Poisson maximum likelihood estimator without individual and time fixed effects; MLE-TE is the Poisson maximum likelihood estimator with time fixed effects; MLE-FETE is the Poisson maximum likelihood estimator with individual and time fixed effects; Analytical (L = l) is the bias corrected estimator that uses an analytical correction with l lags to estimate the spectral expectations; and Jackknife is the bias corrected estimator that uses split panel jackknife in both the individual and time dimensions.