NONPARAMETRIC INFERENCE ON THE NUMBER OF EQUILIBRIA
Maximilian Kasy
May 5, 2012
This paper proposes an estimator and develops an inference procedure for the number of roots of functions which are nonparametrically identified by conditional moment restrictions. It is shown that a smoothed plug-in estimator of the number of roots is superconsistent under i.i.d. asymptotics, but asymptotically normal under non-standard asymptotics. The smoothed estimator is furthermore asymptotically efficient relative to a simple plug-in estimator. The procedure proposed is used to construct confidence sets for the number of equilibria of static games of incomplete information and of stochastic difference equations. In an application to panel data on neighborhood composition in the United States, no evidence of multiple equilibria is found.
Keywords: Nonparametric Testing, Multiple Equilibria.
1. INTRODUCTION
Some economic systems[1] show large and persistent differences in outcomes even though the observable exogenous factors influencing these systems differ little. One explanation for such persistent differences in outcomes is multiplicity of equilibria. If a system indeed has multiple equilibria, temporary, large interventions might have a permanent effect, by shifting the equilibrium attained, while long-lasting, small interventions might not have a permanent effect.
Knowing the number of equilibria, and in particular whether there are multiple equilibria, is of interest in many economic contexts. Multiple equilibria and poverty traps are discussed by Dasgupta and Ray (1986), Azariadis and Stachurski (2005), and Bowles, Durlauf, and Hoff (2006). Poverty traps can arise, for instance, if an individual's productivity is a function of her income and if wage income reflects productivity, as in models of efficiency wages. Productivity might depend on wages because nutrition and health improve with income. If this feedback mechanism is strong enough, there might be multiple equilibria, and extreme poverty might be self-perpetuating. In that case, public investments in nutrition and health can permanently lift families out of poverty. Multiple equilibria and urban segregation are discussed by Becker and Murphy (2000) and Card, Mas, and Rothstein (2008). Urban segregation, along ethnic or sociodemographic dimensions, might arise because households' location choices reflect a preference over neighborhood composition. If this preference is strong enough, different compositions of a neighborhood can be stable, given constant exogenous neighborhood properties. Transition between different stable compositions might lead to rapid composition change, or "tipping," as in the case of gentrification of a neighborhood. Interest in such tipping behavior motivated Card, Mas, and Rothstein (2008), and is the focus of the application discussed in section 4 of this paper. Multiple equilibria and the market entry of firms are discussed by Bresnahan and Reiss (1991) and Berry (1992). Entering a market might only be profitable for a firm if its competitors do not enter that same market. As a consequence, different configurations of which firms serve which markets might be stable. In sociology, finally, multiple equilibria are of interest in the context of social norms. If the incentives to conform to prevailing behaviors are strong enough, different behavioral patterns might be stable norms, i.e., equilibria; see Young (2008). Transitions between such stable norms correspond to social change. One instance where this has been discussed is the assimilation of immigrant communities into the mainstream culture of a country.

* Assistant Professor, Department of Economics, UCLA, and junior associate faculty, IHS Vienna. Address: 8283 Bunche Hall, Mail Stop: 147703, Los Angeles, CA 90095. E-Mail: maxkasy@econ.ucla.edu. I thank seminar participants at UC Berkeley, UCLA, USC, Brown, NYU, UPenn, LSE, UCL, Sciences Po, TSE, Mannheim and IHS Vienna for their helpful comments and suggestions. I particularly thank David Card, Kiril Datchev, Jinyong Hahn, Michael Jansson, Bryan Graham, Susanne Kimm, Patrick Kline, Rosa Matzkin, Enrico Moretti, Denis Nekipelov, James Powell, Alexander Rothenberg, Jesse Rothstein, James Stock and Mark van der Laan for many valuable discussions and David Card, Alexander Mas and Jesse Rothstein for the access provided to their data. This work was supported by a DOC fellowship from the Austrian Academy of Sciences at the Department of Economics, UC Berkeley.

[1] "System" might refer to households, firms, urban neighborhoods, national economies, etc.
This paper develops an estimator and an inference procedure for the number of equilibria of economic systems. It will be assumed that the equilibria of a system can be represented as solutions to the equation $g(x) = 0$. It will furthermore be assumed that $g$ can be identified by some conditional moment restriction. The procedure proposed here provides confidence sets for the number $Z(g)$ of solutions to the equation $g(x) = 0$.
This procedure can be summarized as follows. In a first stage, $g$ and its derivative $g'$ are nonparametrically estimated. These first stage estimates of $g$ and $g'$ are then plugged into a smooth functional $Z_\varepsilon$, as defined in equation (4) below. We show that under standard i.i.d. asymptotics, and for $\varepsilon$ small enough, the continuously distributed $Z_\varepsilon(\hat g)$ converges to the integer valued $Z(g)$ at an infinite rate. A superconsistent estimator[2] of $Z(g)$ can thus be formed by projecting $Z_\varepsilon(\hat g)$ on the closest integer. We then show that a rescaled version of $Z_\varepsilon(\hat g)$ converges to a normal distribution under a non-standard sequence of experiments. This non-standard sequence of experiments is constructed using increasing levels of noise and shrinking bandwidth as sample size increases. Under this same sequence of experiments, the bootstrap provides consistent estimates of the bias and standard deviation of $Z_\varepsilon(\hat g)$ relative to $Z(g)$. We can thus construct confidence sets for $Z(g)$ using t-tests. These confidence sets are sets of integers containing the true number of roots with a pre-specified asymptotic probability of $1 - \alpha$. An alternative to the procedure proposed here would be to use the simple plug-in estimator $Z(\hat g)$. This estimator just counts the roots of the first stage estimate of $g$. We show, however, that the simple plug-in estimator is asymptotically inefficient relative to the smoothed estimator $Z_\varepsilon(\hat g)$ under the non-standard sequence of experiments considered.[3]

[2] An estimator is called superconsistent if it converges at a rate faster than the usual parametric rate, which equals the square root of the sample size.

Sections 3.4 and 3.5 discuss two general setups that allow us to translate the hypothesis of multiple equilibria into a hypothesis on the number of roots of some identifiable function $g$; these setups are static games of incomplete information and stochastic difference equations. Section 3.4 discusses a nonparametric model of static games of incomplete information, similar to the one analyzed in Bajari, Hong, Krainer, and Nekipelov (2006). Under the assumptions detailed in section 3.4, we can nonparametrically identify the average best response functions of the players in a static incomplete information game. This allows us to represent the Bayesian Nash equilibria of this game as roots of an estimable function. Section 3.4 discusses how to perform inference on the number of such Bayesian Nash equilibria.
Section 3.5 considers panel data of observations of some variable $X$, where $X$ is generated by a general nonlinear stochastic difference equation. This is motivated by the study of neighborhood composition dynamics in Card, Mas, and Rothstein (2008). Section 3.5 argues that we can construct tests for the null hypothesis of equilibrium multiplicity of such nonlinear difference equations by testing whether nonparametric quantile regressions of $\Delta X$ on $X$ have multiple roots.
The rest of this paper is structured as follows. Section 2 presents the inference procedure and its asymptotic justification for the baseline case. Section 3 discusses generalizations, as well as identification and inference in static games of incomplete information and in stochastic difference equations. Section 4 applies the inference procedure to the data on neighborhood composition studied by Card, Mas, and Rothstein (2008). In contrast to their results, no evidence of "tipping" (equilibrium multiplicity) is found here. Section 5 concludes. Appendix A presents some Monte Carlo evidence. All proofs are relegated to appendix B. Additional figures and tables are in the web appendix, Kasy (2010). This web appendix also contains a second application of the inference procedure to data on economic growth, similar to those discussed by Azariadis and Stachurski (2005), section 4.1, and by Quah (1996).
[3] Note that this paper does not contribute to the literature discussing identification and estimation problems in games of complete information with multiple equilibria.
2. INFERENCE IN THE BASELINE CASE
2.1. Setup
Throughout this paper, the parameter of interest is the number of roots $Z$ of some function $g$ on a subset $\mathcal{X}$ of its support:
$$Z(g) := \bigl|\{x \in \mathcal{X} : g(x) = 0\}\bigr|. \tag{1}$$
Interest in this parameter is motivated by economic models in which the equilibria can be represented as roots of such a function $g$. Identification of the parameter $Z(g)$ follows from identification of $g$ on $\mathcal{X}$. In this section, inference on $Z(g)$ is discussed for functions $g$ with one-dimensional and compact domain and range. Throughout, the following assumption will be maintained.
Assumption 1 The observable data are i.i.d. draws of $(Y_i, X_i)$. The set $\mathcal{X}$ is compact, and the density of $X$ is bounded away from 0 on $\mathcal{X}$. The function $g$ is identified by a conditional moment restriction of the form
$$g(x) = \operatorname*{argmin}_y E_{Y|X}\bigl[m(Y - y) \mid X = x\bigr]. \tag{2}$$
The function $g$ is continuously differentiable and generic in the sense of definition 1 below.
Examples of functions characterized by conditional moment restrictions as in equation (2) are conditional mean regressions, for which $m(u) = u^2$, and conditional $q$th quantile regressions, for which $m_q(u) = u\,\bigl(q - \mathbf{1}(u < 0)\bigr)$.
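To make the two criteria concrete, here is a small Python sketch (NumPy assumed; `squared_loss`, `check_loss` and `grid_argmin` are illustrative names, not from the paper). It minimizes the sample analog of $E[m(Y - y)]$ over a grid of candidate values $y$: with $m(u) = u^2$ the minimizer is the sample mean, and with $m_q$ it is a sample $q$-quantile.

```python
import numpy as np

def squared_loss(u):
    """m(u) = u**2, the criterion behind conditional mean regression."""
    return u ** 2

def check_loss(u, q):
    """m_q(u) = u * (q - 1(u < 0)), the criterion behind q-th quantile regression."""
    return u * (q - (u < 0).astype(float))

def grid_argmin(m, Y, grid):
    """Grid value y minimizing the sample average of m(Y_i - y)."""
    # Broadcasting gives a (len(grid), len(Y)) array of m(Y_i - y_j).
    risk = m(Y[None, :] - grid[:, None]).mean(axis=1)
    return grid[np.argmin(risk)]

Y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
grid = np.linspace(0.0, 100.0, 100_001)  # step 0.001

y_mean = grid_argmin(squared_loss, Y, grid)                    # sample mean, 22
y_median = grid_argmin(lambda u: check_loss(u, 0.5), Y, grid)  # sample median, 3
y_q30 = grid_argmin(lambda u: check_loss(u, 0.3), Y, grid)     # 0.3-quantile, 2
print(y_mean, y_median, y_q30)
```

The outlier at 100 pulls the squared-loss minimizer far away while leaving the check-loss minimizers unchanged, which is one reason quantile criteria are attractive in applications.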
Definition 1 (Genericity) A continuously differentiable function $g$ is called generic if $\{x : g(x) = 0 \text{ and } g'(x) = 0\} = \emptyset$, and if all roots of $g$ are in the interior of $\mathcal{X}$.
Genericity of $g$ implies that $g$ has only a finite number of roots.[4]
We propose the following inference procedure for the number of roots of $g$, $Z(g)$. First, estimate $g(\cdot)$ and $g'(\cdot)$ using local linear $m$-regression:
$$\bigl(\hat g(x), \hat g'(x)\bigr) = \operatorname*{argmin}_{a,b} \sum_i K_h(X_i - x)\, m\bigl(Y_i - a - b(X_i - x)\bigr), \tag{3}$$
where $K_h(x) = \frac{1}{h} K(x/h)$ for some (symmetric, positive) kernel function $K$ integrating to one, with bandwidth $h$. Equation (3) is a sample analog of equation (2), where a kernel weighted local average replaces the conditional expectation. Next, calculate
$$\hat Z := Z_\varepsilon\bigl(\hat g(\cdot), \hat g'(\cdot)\bigr), \tag{4}$$
where $Z_\varepsilon$ is defined as
$$Z_\varepsilon\bigl(g(\cdot), g'(\cdot)\bigr) := \int_{\mathcal{X}} L_\varepsilon\bigl(g(x)\bigr)\, \bigl|g'(x)\bigr|\, dx,$$
and $L_\varepsilon$ is a Lipschitz continuous, positive symmetric kernel integrating to 1 with bandwidth $\varepsilon$ and support $[-\varepsilon, \varepsilon]$. Estimate the variance and bias of $\hat Z$ relative to $Z$ using the bootstrap. Finally, construct integer valued confidence sets for $Z$ using t-statistics based on $\hat Z$ and the bootstrapped variance and bias.

[4] Suppose that $g$ has an infinite number of roots in the compact set $\mathcal{X}$. Then the set of $x$ such that $g(x) = 0$ has an accumulation point in $\mathcal{X}$. At this accumulation point genericity is violated.
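A minimal Python sketch of the two computational steps, under assumptions the text does not fix: mean regression ($m(u) = u^2$, so that (3) has a closed-form weighted least squares solution), a Gaussian first-stage kernel $K$, an Epanechnikov kernel for $L_\varepsilon$, and a trapezoid-rule approximation of the integral in (4) on a grid inside $\mathcal{X}$. All function names, the simulated design and the tuning values are hypothetical, and the bootstrap step is omitted.

```python
import numpy as np

def local_linear_ls(x_grid, X, Y, h):
    """Local linear least-squares estimates of g and g' on x_grid (equation (3), m(u) = u**2).

    The Gaussian kernel is an assumption; its normalizing constant is dropped
    because it does not change the argmin.
    """
    D = X[None, :] - x_grid[:, None]           # X_i - x, shape (len(x_grid), n)
    W = np.exp(-0.5 * (D / h) ** 2)             # kernel weights K_h(X_i - x), up to a constant
    S0, S1, S2 = W.sum(axis=1), (W * D).sum(axis=1), (W * D**2).sum(axis=1)
    T0, T1 = (W * Y).sum(axis=1), (W * D * Y).sum(axis=1)
    det = S0 * S2 - S1**2
    g_hat = (S2 * T0 - S1 * T1) / det           # local intercept a
    dg_hat = (S0 * T1 - S1 * T0) / det          # local slope b
    return g_hat, dg_hat

def L_eps(u, eps):
    """Epanechnikov kernel rescaled to support [-eps, eps]; Lipschitz and integrates to one."""
    v = u / eps
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v**2), 0.0) / eps

def trapezoid(f, x):
    """Trapezoid rule for sampled values f on the grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def Z_eps(x_grid, g_vals, dg_vals, eps):
    """Equation (4): integral over the grid of L_eps(g(x)) * |g'(x)|."""
    return trapezoid(L_eps(g_vals, eps) * np.abs(dg_vals), x_grid)

# Illustrative data: g(x) = x**3 - x has three roots (-1, 0, 1) inside [-1.4, 1.4].
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1.5, 1.5, n)
Y = X**3 - X + 0.1 * rng.standard_normal(n)

x_grid = np.linspace(-1.4, 1.4, 400)
g_hat, dg_hat = local_linear_ls(x_grid, X, Y, h=0.15)
Z_hat = Z_eps(x_grid, g_hat, dg_hat, eps=0.2)
print(round(Z_hat, 3))
```

With a criterion other than squared loss, such as quantile regression, the closed form in `local_linear_ls` would have to be replaced by a numerical minimizer of (3); the second step is unchanged.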
2.2. Basic properties and consistency
The rest of this section will motivate and justify this procedure. First, we will see that $\hat Z$ is a superconsistent estimator of $Z$, in the sense that $\rho_n(\hat Z - Z) \to_p 0$ for any diverging sequence $\rho_n \to \infty$, under i.i.d. sampling and conditions to be stated. Then we will present the central result of this paper, which establishes asymptotic normality of $\hat Z$ under a non-standard sequence of experiments. From this result it follows that inference based on t-statistics, using bootstrapped standard errors and bias corrections, provides asymptotically valid confidence sets for $Z$. We also show that $\hat Z$ is an efficient estimator relative to the simple plug-in estimator $Z(\hat g)$ under the non-standard asymptotic sequence. We are mainly concerned with constructing confidence sets for $Z$, rather than a point estimator. A point estimator could be formed by projecting $\hat Z$ on the closest integer. While $\hat Z$ will be called an estimator of $Z(g)$, it should be kept in mind that its primary role is as an intermediate statistic in the construction of confidence sets.

The following proposition states that $Z(g) = Z_\varepsilon(g, g')$ for generic $g$ and $\varepsilon$ small enough. The two functionals only differ around non-generic $g$, or "bifurcation points," that is, $g$ where $Z$ jumps. The functional $Z_\varepsilon$ is a smooth approximation of $Z$ which varies continuously around such jumps.

Proposition 1 For $g$ continuously differentiable and generic, if $\varepsilon > 0$ is small enough, then
$$Z_\varepsilon\bigl(g(\cdot), g'(\cdot)\bigr) = Z\bigl(g(\cdot)\bigr).$$

All proofs are relegated to appendix B. The intuition underlying proposition 1 is as follows. Given a generic function $g$, consider the subset of $\mathcal{X}$ where $L_\varepsilon(g)$ is not zero. If $\varepsilon$ is small enough, this subset is partitioned into disjoint neighborhoods of the roots of $g$, and $g$ is monotonic in each of these neighborhoods. A change of variables, setting $y = g(x)$, shows that the integral over each of these neighborhoods equals one. Figure 1 illustrates the relationship between $Z$ and $Z_\varepsilon$. The two functionals are equal if $g$ does not peak within the range $[-\varepsilon, \varepsilon]$. If $g$ does peak within the range $[-\varepsilon, \varepsilon]$, they are different and $Z_\varepsilon$ is not integer valued.

[Figure 1: $Z$ and $Z_\varepsilon$; four functions $g_1, \dots, g_4$ plotted against $x$.]
Notes: This figure illustrates the relationship between $Z$ and $Z_\varepsilon$. For the functions $g$ depicted, $Z(g_1) = Z_\varepsilon(g_1) = 0$, $Z(g_2) = 0 < Z_\varepsilon(g_2) < 1$, $Z(g_3) = 2 > Z_\varepsilon(g_3) > 1$, and $Z(g_4) = Z_\varepsilon(g_4) = 2$.

It is useful to equip the space of continuously differentiable functions on the compact set $\mathcal{X}$, $C^1(\mathcal{X})$, with the following norm:
$$\|g\| := \sup_{x \in \mathcal{X}} |g(x)| + \sup_{x \in \mathcal{X}} |g'(x)|. \tag{5}$$
This is the uniform first order Sobolev norm on $C^1(\mathcal{X})$. Given this norm, we have the following proposition:

Proposition 2 (Local constancy) $Z(\cdot)$ is constant in a neighborhood, with respect to the norm $\|\cdot\|$, of any generic function $g \in C^1(\mathcal{X})$, and so is $Z_\varepsilon$ if $\varepsilon$ is small enough.

Using a neighborhood of $g$ with respect to the sup norm in levels only, instead of $\|\cdot\|$, is not enough for the assertion of proposition 2 to hold. For any function $g_1$ that has at least one root we can find a function $g_2$ arbitrarily close to $g_1$ in the uniform sense which has more roots than $g_1$, by adding a "wiggle" around a root of $g_1$. Figure 2 illustrates this by showing two functions which are uniformly close in levels but not in derivatives, and which have different numbers of roots. If one, however, additionally restricts the first derivative of $g_2$ to be uniformly close to that of $g_1$, […]

[Figure 2: two functions which are uniformly close in levels but not in derivatives, and which have different numbers of roots.]

Proposition 2 immediately implies the following theorem, which states that the plug-in estimator $\hat Z = Z_\varepsilon\bigl(\hat g(\cdot), \hat g'(\cdot)\bigr)$ converges to a degenerate limiting distribution at an "infinite" rate if $\hat g$ converges with respect to the norm $\|\cdot\|$.

Theorem 1 (Superconsistency) If $(\hat g, \hat g')$ converges uniformly in probability to $(g, g')$,[5] if $g$ is generic, and if $\rho_n \to \infty$ is some arbitrary diverging sequence, then
$$\rho_n\bigl(Z(\hat g) - Z(g)\bigr) \to_p 0.$$
Furthermore, if $\varepsilon$ is small enough so that $Z_\varepsilon(g, g') = Z(g)$ holds, then
$$\rho_n\bigl(Z_\varepsilon(\hat g, \hat g') - Z(g)\bigr) \to_p 0,$$
and this continues to hold if $\varepsilon \to 0$ as $n \to \infty$.

This result implies that $\hat Z$ is superconsistent given uniform convergence of $(\hat g, \hat g')$.

[5] Note that this is a slightly different condition from convergence of $\hat g$ with respect to the norm $\|\cdot\|$, since $\hat g'$ need not equal the derivative of $\hat g$.
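The following Python sketch (NumPy assumed; function names hypothetical, Epanechnikov $L_\varepsilon$ an assumption) evaluates $Z_\varepsilon$ at known functions $g(x) = x^2 + c$ to reproduce the cases of proposition 1 and figure 1, and then the point of figure 2: adding a wiggle of sup norm 0.01 to $g_1(x) = x^3 - x$ increases the number of sign changes, so $Z$ is not locally constant with respect to the sup norm in levels.

```python
import numpy as np

def L_eps(u, eps):
    """Epanechnikov kernel on [-eps, eps] (an illustrative choice of L)."""
    v = u / eps
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v**2), 0.0) / eps

def trapezoid(f, x):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def Z_eps(g, dg, x, eps):
    """Smoothed root count Z_eps(g, g') on the grid x, for a known function g."""
    return trapezoid(L_eps(g(x), eps) * np.abs(dg(x)), x)

def Z_count(values):
    """Plug-in count Z(g): number of sign changes of g along the grid."""
    return int(np.sum(values[:-1] * values[1:] < 0))

x = np.linspace(-2.0, 2.0, 200_000)   # even number of points, so x = 0 is not a grid point
eps = 0.2

# Proposition 1 and figure 1, with g(x) = x**2 + c and g'(x) = 2x.
z_generic = Z_eps(lambda t: t**2 - 1.0, lambda t: 2 * t, x, eps)     # minimum outside the band: Z_eps = Z = 2
z_no_root = Z_eps(lambda t: t**2 + 0.05, lambda t: 2 * t, x, eps)    # like g_2: Z = 0 < Z_eps < 1
z_two_roots = Z_eps(lambda t: t**2 - 0.05, lambda t: 2 * t, x, eps)  # like g_3: Z = 2 > Z_eps > 1

# Figure 2: a wiggle that is small in levels but large in derivatives.
g1 = x**3 - x
g2 = g1 + 0.01 * np.sin(400 * x)   # sup |g2 - g1| = 0.01, sup |g2' - g1'| = 4
print(z_generic, z_no_root, z_two_roots, Z_count(g1), Z_count(g2))
```

For $|c| < \varepsilon$, the change of variables in the intuition for proposition 1 gives $Z_\varepsilon = 2\,(1 - F_\varepsilon(c))$, where $F_\varepsilon$ is the CDF of $L_\varepsilon$; with $c = 0.05$ and $\varepsilon = 0.2$ this is $0.6328125$, which the grid approximation reproduces.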
2.3. Asymptotic normality and relative efficiency
We have shown our first claim, superconsistency of $\hat Z$ given uniform convergence of $(\hat g, \hat g')$. We will show next our second claim, asymptotic normality of $\hat Z$ under a non-standard sequence of experiments. This section will then conclude by formally stating the efficiency of $\hat Z$ relative to the simple plug-in estimator $Z(\hat g)$. To further characterize the asymptotic distribution of $\hat Z$, we need a suitable approximation for the distribution of the first stage estimator $(\hat g(\cdot), \hat g'(\cdot))$. Kong, Linton, and Xia (2010) provide uniform Bahadur representations for local polynomial estimators of m-regressions. We state their result, for the special case of local linear m-regression, as an assumption.

Assumption 2 (Bahadur expansion) The estimation error of the estimator $(\hat g(x), \hat g'(x))$ defined by equation (3) can be approximated by a local average as follows:
$$\begin{pmatrix} \hat g(x) - g(x) \\ \hat g'(x) - g'(x) \end{pmatrix} = \frac{I_n(x)}{f_x(x)\, s(x)} \cdot \frac{1}{n} \sum_i K_h(X_i - x)\, \psi\bigl(Y_i - g(x) - g'(x)(X_i - x)\bigr) \begin{pmatrix} 1 \\ \frac{X_i - x}{\kappa h^2} \end{pmatrix} + R, \tag{6}$$
where $\psi := m'$ (in a piecewise derivative sense), $s(x) = E[\psi'(Y - g(x)) \mid X = x]$, $\kappa := \int K(x)\, x^2\, dx$, $f_x$ is the density of $X$, $I_n(x)$ is a non-random matrix converging uniformly to the identity matrix, and where
$$R = o_p\Bigl(\bigl(\hat g(x), \hat g'(x)\bigr) - \bigl(g(x), g'(x)\bigr)\Bigr)$$
uniformly in $x$.[6]

[6] Kong, Linton, and Xia (2010) provide regularity conditions, for stationary mixing processes, under which $R$ is of order $O_p(\ldots)$ uniformly in $\mathcal{X}$ as $n \to \infty$, with a rate involving $\log(n)/n$ and a constant in $(0, 1)$.

This assumption is only well defined in the context of a sequence of experiments. In theorem 2 below, this assumption will be understood to hold relative to the sequence of experiments defined in assumption 3. In the case of $q$th quantile regression, $\psi(u) = q - \mathbf{1}(u < 0)$ and $s(x) = f_{Y|X}(g(x) \mid x)$. In the case of mean regression, $\psi(u) = 2u$ and $s(x) = 2$. The asymptotic results in the remainder of this section depend on the availability of an expansion in the form of expansion (6) and the relative negligibility of the remainder, but not on any other specifics of local linear m-regression. This will allow for fairly straightforward generalizations of the baseline case considered here to the cases discussed in section 3, as well as to other cases which are beyond the scope of this paper, once we have appropriate expansions for the first stage estimators.
By proposition 2, consistency of any plug-in estimator follows from uniform convergence of $(\hat g(\cdot), \hat g'(\cdot))$. Such uniform convergence follows from assumption 2, combined with a Glivenko-Cantelli theorem on uniform convergence of averages, assuming i.i.d. draws from the joint distribution of $(Y, X)$ as $n \to \infty$; see van der Vaart (1998), chapter 19. Superconsistency of $\hat Z$ therefore follows, which implies that standard i.i.d. asymptotics with rescaling of the estimator yield only degenerate distributional approximations. This is because $Z$ and $Z_\varepsilon$ are constant in a $C^1$ neighborhood of any generic $g$, even though they jump at "bifurcation points," i.e., non-generic $g$. As a consequence, all terms in a functional Taylor expansion of $Z_\varepsilon$, as a function of $g$, vanish, except for the remainder. The application of "delta method" type arguments, as in Newey (1994), gives only the degenerate limit distribution.

In finite samples, however, the sampling variation of $\hat Z$ is in general not negligible, as the simulations of appendix A confirm, which makes the distributional approximation of the degenerate limit useless for inference. Asymptotic statistical theory approximates the finite sample distribution of interest by a limiting distribution of a sequence of experiments, of which our actual experiment is an element. The choice of sequence, such as i.i.d. sampling, is to some extent arbitrary. In econometrics, non-standard asymptotics are used for instance in the literature on weak instruments (e.g., Staiger and Stock (1997), Imbens and Wooldridge (2007), Andrews and Cheng (2010)). In the present setup, a non-degenerate distributional limit of $\hat Z$ can only be obtained under a sequence of experiments which yields a non-degenerate limiting distribution of the first stage estimator $(\hat g(\cdot), \hat g'(\cdot))$.[7] We will now consider asymptotics under such a sequence of experiments. The sequence we consider has increasing amounts of "noise" relative to "signal" as sample size increases.
[7] The approach of this paper, using local asymptotics, contrasts with the approach taken by most of the literature discussing inference on discrete valued parameters, testing and model selection. As argued by Choirat and Seri (2012), this literature has mostly focused on the use of large deviations asymptotics. The reason is that consistent estimators for discrete objects tend to converge at an exponential rate. Which type of asymptotics provides a more accurate approximation of finite sample distributions ultimately depends on the specific data generating process; cf. Andrews and Cheng (2010).

Assumption 3 Experiments are indexed by $n$, and for the $n$th experiment we observe $(Y_{i,n}, X_{i,n})$ for $i = 1, \dots, n$. The observations $(X_{i,n}, Y_{i,n})$ are i.i.d. given $n$, and
$$X_{i,n} \sim f_x, \tag{7}$$
$$\eta_{i,n} \mid X_{i,n} \sim f_{\eta|X}, \tag{8}$$
$$Y_{i,n} = g(X_{i,n}) + r_n\, \eta_{i,n}, \tag{9}$$
where $\{r_n\}$ is a real-valued sequence and
$$0 = \operatorname*{argmin}_a E\bigl[m(\eta - a) \mid X\bigr] = \operatorname*{argmin}_a E\bigl[m(r_n \eta - a) \mid X\bigr].$$
The last equality requires the criterion function $m$ to be "scale neutral." For a given sample size $n$, this is the same model as before. As $n$ changes, the function $g$ identified by equation (2) is held constant. If $r_n$ grows in $n$, the estimation problem in this sequence of models becomes increasingly difficult relative to i.i.d. sampling. Note that equation (9) does not describe an additive structural model, which would allow us to predict counterfactual outcomes. Instead, $r_n \eta_{i,n}$ is simply the statistical residual, given by the difference of $Y$ and $g(X)$, which is also well-defined for non-additive structural models.
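As an illustration of assumption 3, here is a Python sketch that draws samples from such a sequence of experiments. The uniform design for $X$, the standard normal residual $\eta$ independent of $X$ (for which mean regression with $m(u) = u^2$ is scale neutral), and the bandwidth rule $h = n^{-1/6}$ are illustrative assumptions; the noise scale uses $r_n = (n h^5)^{1/2}$, the rate considered in the next paragraph. The function name is hypothetical.

```python
import numpy as np

def simulate_experiment(n, h, g, rng):
    """Draw one sample of size n from the n-th experiment (assumption 3).

    Assumed design: X ~ Uniform(-1.5, 1.5), eta ~ N(0, 1) independent of X,
    and noise scale r_n = (n * h**5) ** 0.5.
    """
    r_n = np.sqrt(n * h**5)
    X = rng.uniform(-1.5, 1.5, n)
    eta = rng.standard_normal(n)
    Y = g(X) + r_n * eta
    return X, Y, r_n

rng = np.random.default_rng(1)
g = lambda x: x**3 - x           # held fixed across experiments
scales = []
for n in (10_000, 100_000, 1_000_000):
    h = n ** (-1 / 6)            # h -> 0 while n * h**5 -> infinity, so r_n grows
    X, Y, r_n = simulate_experiment(n, h, g, rng)
    scales.append((n, r_n, float(np.std(Y - g(X)))))
print(scales)
```

Along this sequence the residual scale grows like $n^{1/12}$, so each experiment is noisier than the previous one even though $g$ is unchanged.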
By corollary 1, a necessary condition for a non-degenerate limit of $\hat Z$ is that $(\hat g, \hat g')$ converges to a non-degenerate limiting distribution. As is well known, and as also follows from assumption 2, $\hat g'$ converges at a slower rate than $\hat g$, so that asymptotically variation in $\hat g'$ will dominate, namely by adding "wiggles" around the actual roots. If $r_n = (n h^5)^{1/2}$ in the sequence of experiments just defined, $\hat g$ converges uniformly in probability to $g$, whereas $\hat g'$ converges point-wise (and indeed functionally) to a non-degenerate limit. This is the basis for the following theorem.[8]

Theorem 2 (Asymptotic normality) Under assumptions 1, 2, and 3, and if $r_n = (n h^5)^{1/2}$, $nh \to \infty$, $h \to 0$ and $h/\varepsilon^2 \to 0$, then there exist $\mu > 0$ and $V$ such that
$$\sqrt{\varepsilon/h}\,\bigl(\hat Z - Z - \mu\bigr) \to N(0, V)$$
for $\hat Z = Z_\varepsilon(\hat g, \hat g')$. Both $\mu$ and $V$ depend on the data generating process only via the asymptotic mean and variance of $\hat g'$ at the roots of $g$, which in turn depend upon $f_x$, $g'$, $s$ and $\mathrm{Var}(\eta \mid X)$ evaluated at the roots of $g$.
This theorem justifies the use of t-tests based on $\hat Z$ for null hypotheses of the form $Z(g) = z_0$. The construction of a t-statistic requires a consistent estimator of $V$ and an estimator of $\mu$ converging at a rate faster than $\sqrt{\varepsilon/h}$. The last part of theorem 2 suggests a way to obtain those. Any plug-in estimator that consistently estimates the (co)variances of $\hat g'$ under the given sequence of experiments consistently estimates $\mu$ and $V$. One such plug-in estimator is the standard bootstrap, that is, resampling from the empirical distribution function. The Bahadur expansion in assumption 2, which approximates $\hat g'$ by sample averages, implies that the bootstrap gives a resampling distribution with the asymptotically correct covariance structure for $\hat g'$. From this and theorem 2 it then follows that the bootstrap gives consistent variance and bias estimates for $\hat Z$, where the bias is estimated from the difference of the resampling estimates relative to $Z_\varepsilon(\hat g, \hat g')$. If sample size grows fast enough relative to $\sqrt{\varepsilon/h}$ and […], the asymptotic validity of a standard normal approximation for the pivot follows. It would be interesting to develop distributional refinements for this statistic using higher order bootstrapping, along the lines discussed by Horowitz (2001). However, higher order bootstrapping might be very computationally demanding in the present case, in particular if criteria like quantile regression are used to identify $g$.

[8] The proof of theorem 2 uses somewhat similar arguments as Horváth (1991) and Giné, Mason, and Zaitsev (2003), who discuss the asymptotic distribution of the $L_1$ norm ($L_p$ norm) of kernel density estimators.
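The bootstrap step can be sketched in Python with NumPy and the standard library's `statistics.NormalDist`. As in the earlier sketch, mean regression, a Gaussian first-stage kernel and an Epanechnikov $L_\varepsilon$ are assumptions, and all names and tuning constants are hypothetical. The bias is estimated as the average bootstrap estimate minus $Z_\varepsilon(\hat g, \hat g')$, the standard error as the bootstrap standard deviation, and the confidence set collects the integers $z_0$ not rejected by a two-sided t-test at level $\alpha$.

```python
import numpy as np
from statistics import NormalDist

def local_linear_ls(x_grid, X, Y, h):
    """Local linear least squares (m(u) = u**2), Gaussian kernel; returns (g_hat, g_hat')."""
    D = X[None, :] - x_grid[:, None]
    W = np.exp(-0.5 * (D / h) ** 2)
    S0, S1, S2 = W.sum(axis=1), (W * D).sum(axis=1), (W * D**2).sum(axis=1)
    T0, T1 = (W * Y).sum(axis=1), (W * D * Y).sum(axis=1)
    det = S0 * S2 - S1**2
    return (S2 * T0 - S1 * T1) / det, (S0 * T1 - S1 * T0) / det

def z_hat(X, Y, x_grid, h, eps):
    """Z_eps(g_hat, g_hat') with an Epanechnikov L_eps and the trapezoid rule."""
    g, dg = local_linear_ls(x_grid, X, Y, h)
    v = g / eps
    f = np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v**2), 0.0) / eps * np.abs(dg)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x_grid)))

def bootstrap_bias_se(X, Y, x_grid, h, eps, B, rng):
    """Nonparametric bootstrap: resample (X_i, Y_i) pairs and recompute Z_eps."""
    z0 = z_hat(X, Y, x_grid, h, eps)
    n = len(X)
    draws = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)
        draws[b] = z_hat(X[idx], Y[idx], x_grid, h, eps)
    bias = float(draws.mean() - z0)       # resampling estimates relative to Z_eps(g_hat)
    se = float(draws.std(ddof=1))
    return z0, bias, se

def integer_confidence_set(z0, bias, se, z_max, alpha=0.05):
    """Integers z in {0, ..., z_max} with |z0 - bias - z| <= c * se, c the normal critical value."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    return [z for z in range(z_max + 1) if abs(z0 - bias - z) <= c * se]

# Illustrative use on simulated data with three roots.
rng = np.random.default_rng(2)
X = rng.uniform(-1.5, 1.5, 500)
Y = X**3 - X + 0.2 * rng.standard_normal(500)
x_grid = np.linspace(-1.4, 1.4, 300)
z0, bias, se = bootstrap_bias_se(X, Y, x_grid, h=0.2, eps=0.25, B=100, rng=rng)
print(z0, bias, se, integer_confidence_set(z0, bias, se, z_max=6))
```

This sketch makes no claim about finite-sample coverage. It only illustrates how the bias-corrected t-statistic is inverted into a set of integers.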
Theorem 2 also implies that increasing the bandwidth parameter $\varepsilon$ reduces the variance without affecting the bias in the limiting normal distribution. Asymptotically, the difficulty in estimating $Z$ is driven entirely by fluctuations in $\hat g'$. These fluctuations lead both to upward bias and to variance in plug-in estimators. When $\varepsilon$ is larger, these fluctuations are averaged over a larger range of $X$, thereby reducing variance. Theorem 2 implies that $Z_{\varepsilon_1}$ is asymptotically inefficient relative to $Z_{\varepsilon_2}$ for $\varepsilon_1 < \varepsilon_2 \le 1$. Furthermore, by proposition 1, $Z(g) = \lim_{\varepsilon \to 0} Z_\varepsilon(g)$ for all generic $g$. If the relative inefficiency carries over to the limit as $\varepsilon \to 0$, it follows that the simple plug-in estimator $Z(\hat g)$ is asymptotically inefficient relative to $\hat Z$. Note, however, that this is only a heuristic argument: we cannot exchange the limits with respect to $\varepsilon$ and with respect to $n$ to obtain the limit distribution of $Z(\hat g)$. The following theorem, which is fairly easy to show, states a formally correct version of this argument.
Theorem 3 (Asymptotic inefficiency of the naive plug-in estimator) Consider the setup of theorem 2, and assume $Z(g) > 0$. Then, as $n \to \infty$,
$$\liminf P\bigl(Z(\hat g) > Z(g)\bigr) > 0 \quad \text{and} \quad \mathrm{Var}\Bigl(\sqrt{\varepsilon/h}\; Z(\hat g)\Bigr) \to \infty.$$
From this theorem it follows in particular that tests based on $Z(\hat g)$ will in general not be consistent under the sequence of experiments considered, i.e., the probability of false acceptances does not go to zero. This stands in contrast to tests based on $\hat Z = Z_\varepsilon(\hat g, \hat g')$.
3. EXTENSIONS AND APPLICATIONS
In this section, several extensions and applications of the results of section 2 are presented. Subsections 3.1 through 3.3 discuss, in turn, inference on $Z$ if $g$ is identified by more general moment conditions, inference on $Z$ if the domain and range of $g$ are multidimensional, and inference on the number of stable and unstable roots. Subsections 3.4 and 3.5 discuss identification and inference for the two applications mentioned in the introduction, static games of incomplete information and stochastic difference equations.
3.1. Conditioning on covariates
In the previous section, inference on $Z(g)$ was discussed for functions $g$ identified by the moment condition $g(x) = \operatorname*{argmin}_y E_{Y|X}[m(Y - y) \mid X = x]$. This subsection generalizes to functions $g$ identified by
$$g(x, w_1) = \operatorname*{argmin}_y E_{W_2}\Bigl[E_{Y|X,W}\bigl[m(Y - y) \mid X = x, W_1 = w_1, W_2\bigr]\Bigr], \tag{10}$$
where the parameter of interest now is $Z(g(\cdot, w_1))$, the number of roots of $g$ in $x$ given $w_1$. The conditional moment restriction (10) can be rationalized by a structural model of the form $Y = h(X, W, \eta)$, where $\eta \perp (X, W_1) \mid W_2$ and $g$ is defined by
$$g(x, w_1) := \operatorname*{argmin}_y E_{W_2}\Bigl[E_{\eta}\bigl[m\bigl(h(x, w_1, W_2, \eta) - y\bigr) \mid W_2\bigr]\Bigr].$$
We will assume that the joint density of $(X, W)$ is bounded away from zero on the set $\operatorname{supp}(X, W_1) \times \operatorname{supp}(W_2)$, where supp denotes the compact support of either random vector. The vector $W_2$ serves as a vector of control variables. The conditional independence assumption $\eta \perp (X, W_1) \mid W_2$ is also known as "selection on observables." The function $g$ is equal to the average structural function if $m(u) = u^2$, and equal to a quantile structural function if $m_q(u) = u\,(q - \mathbf{1}(u < 0))$. The average structural function will be of importance in the context of games of incomplete information, as discussed in section 3.4; quantile structural functions will be used to characterize stochastic difference equations in section 3.5. When games of incomplete information are discussed in section 3.4, $W = W_1$ will correspond to the component of public information which is not excluded from either player's response function.

The inference procedure proposed in the previous section is based upon two steps. First, the function $g$ and its derivative are estimated using local linear m-regression. In the second step, the estimator $(\hat g, \hat g')$ is plugged into the functional $Z_\varepsilon(\cdot, \cdot)$, which is a smooth approximation of the functional $Z(\cdot)$. We can generalize this approach by maintaining the same second step while using more general first stage estimators $(\hat g, \hat g')$. Equation (10) suggests estimating $g$ by a nonparametric sample analog, replacing the conditional expectation with a local linear kernel estimator of it, and the expectation over $W_2$ with a sample average. Formally, let
$$\bigl(\hat g(x, w_1), \hat g'(x, w_1)\bigr) = \frac{1}{n} \sum_j \operatorname*{argmin}_{a,b} M(a, b, x, w_1, W_{2j}), \tag{11}$$
where
$$M(a, b, x, w_1, W_{2j}) = \sum_i K_h(X_i - x, W_{1i} - w_1, W_{2i} - W_{2j})\, m\bigl(Y_i - a - b(X_i - x)\bigr).$$
An asymptotic normality result can be shown in this context which generalizes theorem 2. In light of the proof of theorem 2, the crucial step is to obtain a sequence of experiments such that $\hat g$ converges uniformly to $g$ while $\hat g'$ has a non-degenerate limiting distribution. If we obtain an approximation of $\hat g'$ equivalent to the approximation in assumption 2, all further steps of the proof apply immediately. This can be done, using the results of Newey (1994), for the following sequence of experiments.

Assumption 4 Experiments are indexed by $n$, and for the $n$th experiment we observe $(Y_{i,n}, X_{i,n}, W_{i,n})$ for $i = 1, \dots, n$. The observations $(X_{i,n}, Y_{i,n}, W_{i,n})$ are i.i.d. given $n$, and
$$(X_{i,n}, W_{i,n}) \sim f_{x,w}, \tag{12}$$
$$\eta_{i,n} \mid (X_{i,n}, W_{i,n}) \sim f_{\eta|X,W}, \tag{13}$$
$$Y_{i,n} = g(X_{i,n}, W_{1i,n}) + r_n\, \eta_{i,n}. \tag{14}$$

Theorem 4 (Asymptotic normality, with control variables) Under the assumptions of section 2, but with $g$ identified by equation (10) and the data generated by the model given by assumption 4, if $r_n = (n h^{4+d})^{1/2}$, where $d = \dim(X) + \dim(W)$, $n h^d \to \infty$, $h \to 0$ and $h/\varepsilon \to 0$, then there exist $\mu > 0$ and $V$ such that
$$\sqrt{\varepsilon/h}\,\bigl(\hat Z - Z - \mu\bigr) \to N(0, V).$$
3.2. Higher dimensional systems Thus far, only
one-dimensional arguments xand one-dimensional ranges for
the function gwere considered, where xis the argument over which Zintegrates.
All results of section 2 are easily extended to a higher dimensional setup. In
particular, assume we are interested in the number of roots of a function gfrom R.
Generalizing equation (4), we can de ne b Zas
are again estimated by local linear m regression, Lis a kernel with support [ ; ],
and the integral is taken over the set X R
14 MAXIMILIAN KASY
if rn = (n
b g0support of g. As in the one dimensional case, superconsistency follows from
uniform convergence of (bg;). The following theorem, generalizing theorem 2,
holds for arbitrary d:
dTheorem 5 (Asymptotic normality, multidimensional systems) Under the
assumptions of section 2, but with g: R
d=2 bZ Z
!N(0;V):
3.3. Stable and unstable roots

Instead of testing for the total number of roots, one might be interested in the number of "stable" and "unstable" roots, $Z^s$ and $Z^u$. Stable roots are those where $g'$ is negative, unstable roots those where $g'$ is positive:

$$Z^s(g) := \big|\{x\in\mathcal X: g(x)=0 \text{ and } g'(x)<0\}\big|, \qquad Z^u(g) := \big|\{x\in\mathcal X: g(x)=0 \text{ and } g'(x)>0\}\big|. \tag{16}$$

In the multidimensional case, we could more generally consider roots with a given number of positive and negative eigenvalues of $g'$. We can define smooth approximations of the parameters $Z^s$ and $Z^u$ as follows:

$$Z^s\big(g(\cdot), g'(\cdot)\big) := \int_{\mathcal X} L\big(g(x)\big)\,|g'(x)|\,\mathbf 1\big(g'(x)<0\big)\,dx, \qquad Z^u\big(g(\cdot), g'(\cdot)\big) := \int_{\mathcal X} L\big(g(x)\big)\,|g'(x)|\,\mathbf 1\big(g'(x)>0\big)\,dx. \tag{17}$$

Again, all arguments of section 2 go through essentially unchanged for these parameters. In particular, theorem 2 applies literally, replacing $Z$ with $Z^s$ or $Z^u$. More generally, functionals which are smooth approximations of the number of roots with various stability properties can be constructed in the multidimensional case by multiplying the integrand with an indicator function depending on the signs of the eigenvalues of $g'$.
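As an illustration of (16) and (17), the following sketch, in Python with NumPy, uses the three-root function $g^2(x) = 0.5 - 5x + 12x^2 - 8x^3$ from the Monte Carlo design of appendix A. Its outer roots have $g' = -2$ (stable) and its middle root $x = 0.5$ has $g' = 1$ (unstable), so $Z = 3$, $Z^s = 2$ and $Z^u = 1$. The sketch computes the smoothed counts once from the true $g$ and $g'$, and once from local linear estimates on simulated data. The Epanechnikov kernels, the noise level 0.02, the sample size, and the bandwidths $\tau = 0.08$ and $\epsilon = 0.04$ are illustrative assumptions.

```python
import numpy as np


def kernel_L(u, eps):
    """Epanechnikov kernel with support [-eps, eps], integrating to one."""
    return np.where(np.abs(u) <= eps, 0.75 * (1.0 - (u / eps) ** 2) / eps, 0.0)


def trapezoid(y, x):
    """Trapezoid rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def smoothed_counts(g, dg, grid, eps):
    """Smoothed (Z, Z^s, Z^u): integrate L(g)|g'| over all x, over g' < 0, over g' > 0."""
    base = kernel_L(g, eps) * np.abs(dg)
    return (trapezoid(base, grid),
            trapezoid(base * (dg < 0), grid),
            trapezoid(base * (dg > 0), grid))


def local_linear(X, Y, grid, tau):
    """Local linear mean regression with Epanechnikov weights of bandwidth tau.

    Returns the fitted level (estimate of g) and slope (estimate of g') at each grid point.
    """
    d = X[None, :] - grid[:, None]                      # shape (len(grid), n)
    w = np.where(np.abs(d) <= tau, 1.0 - (d / tau) ** 2, 0.0)
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d**2).sum(1)
    t0, t1 = (w * Y).sum(1), (w * d * Y).sum(1)
    det = s0 * s2 - s1**2
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det


def g_true(x):
    return 0.5 - 5 * x + 12 * x**2 - 8 * x**3


def dg_true(x):
    return -5 + 24 * x - 24 * x**2


grid = np.linspace(0.0, 1.0, 801)
eps = 0.04
z_true = smoothed_counts(g_true(grid), dg_true(grid), grid, eps)

rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(0.0, 1.0, n)
Y = g_true(X) + rng.normal(0.0, 0.02, n)
g_hat, dg_hat = local_linear(X, Y, grid, tau=0.08)
z_est = smoothed_counts(g_hat, dg_hat, grid, eps)

print("true g:    Z, Z^s, Z^u =", np.round(z_true, 3))
print("estimated: Z, Z^s, Z^u =", np.round(z_est, 3))
```

Both rows should be close to (3, 2, 1). The estimated slope enters only through its absolute value and its sign, which is why the stability classification is insensitive to moderate estimation noise in $\widehat{g'}$ away from points where $g'$ is near zero.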
3.4. Static games of incomplete information

This section and section 3.5 discuss how to apply the inference procedure proposed to test for equilibrium multiplicity in economic models. The discussion in this subsection builds on Bajari, Hong, Krainer, and Nekipelov (2006).

Consider the following static game of incomplete information. Assume there are two players $i = 1,2$, who both have to choose between one of two actions, $a = 0,1$. Player $i$ makes her choice based on public information $s$, as well as private information $\eta_i$. The public information $s$ is observed by the econometrician, and $\eta_i$ is independent of $s$. It is assumed that $\eta_i$ does not enter player $-i$'s utility.⁹ Denote the probability that player $i$ plays strategy $a=1$ given the public information $s$ by $\sigma_i(s)$. Player $i$'s expected utility given her information, and hence her optimal action $a_i$, as well as player $i$'s probability of choosing $a=1$, $\sigma_i$, depend on $s$ and $\sigma_{-i}(s)$. Let us denote the average best response of player $i$, integrating over the distribution of $\eta_i$, by

$$g_i(\sigma_{-i}; s) = E[a_i \mid \sigma_{-i}, s]. \tag{18}$$

Figure 3 illustrates this by plotting the response functions $g_i$ for given $s$. In Bayesian Nash Equilibrium, the probability of player $i$ choosing $a=1$, $\sigma_i$, equals the average best response of player $i$, $g_i(\sigma_{-i}; s)$. This implies the two equilibrium conditions

$$\sigma_i(s) = g_i\big(\sigma_{-i}(s); s\big), \qquad i = 1,2.$$

In figure 3, the Bayesian Nash Equilibria correspond to the intersections of the graphs of the two $g_i$. The condition for Bayesian Nash Equilibrium in this game can be restated as $g(\sigma_1(s); s) = 0$, where

$$g(\sigma_1; s) = g_1\big(g_2(\sigma_1; s); s\big) - \sigma_1. \tag{19}$$

The number of roots of $g(\sigma_1; s)$ in $\sigma_1$ is the number of Bayesian Nash Equilibria in this game, given $s$.

⁹ This is an important restriction. It precludes in particular application of this setup to correlated value auctions.

Figure 3. Response functions and Bayesian Nash Equilibria
[Figure 3: response functions $g_1(\sigma_2, s_1)$ and $g_2(\sigma_1, s_2)$ in the $(\sigma_1, \sigma_2)$ plane; the observed equilibrium $(\sigma_1(s), \sigma_2(s))$ is marked.]
Notes: This figure illustrates the two player, two action static game of incomplete information discussed in section 3.4. The functions $g_i$ are the (average) best response functions, Bayesian Nash Equilibrium requires $g(\sigma_1; s) := g_1(g_2(\sigma_1; s_2); s_1) - \sigma_1 = 0$, and we observe one equilibrium $(\sigma_1(s), \sigma_2(s))$ in the data. In this graph, there are two further equilibria which are not directly observable.
We will now discuss identification and inference on the number of Bayesian Nash Equilibria of this game, given the public information $s$. Assume we observe an i.i.d. sample of $(a_{1,j}, a_{2,j}, s_j)$, the players' realized actions and the public information of the game, where $a_{i,j}\in\{0,1\}$ for $i=1,2$ and $s\in\mathbb{R}^k$. In this subsection, $i$ indexes players and $j$ indexes observations. Rational expectation beliefs of player $-i$ about the expected action of player $i$ are given by $\sigma_i(s) = E[a_i\mid s]$. The following two-stage estimation procedure is a nonparametric variant of the procedure proposed by Bajari, Hong, Krainer, and Nekipelov (2006). We can get an estimate of the beliefs, $\hat\sigma_i(s) = \hat E[a_i\mid s]$, by local linear mean regression:

$$\big(\hat\sigma_i(s), \widehat{\sigma_i'}(s)\big) = \arg\min_{c,b} \sum_j K(s_j - s)\big(a_{i,j} - c - b^\top(s_j - s)\big)^2. \tag{20}$$

Average best responses of players are given by $g_i(\sigma_{-i}; s) = E[a_i\mid\sigma_{-i}, s]$. Without further restrictions, $g_i$ is not identified, since by definition $\sigma_{-i}$ is functionally dependent on $s$. If, however, exclusion restrictions of the form

$$g_i(\sigma_{-i}; s) = g_i(\sigma_{-i}; s^i) \tag{21}$$

are imposed, the $g_i$ can be identified. In particular, assume that exclusion restriction (21) holds, with $\dim(s^i) = \dim(s) - 1 = k-1$. There is one excluded component of $s$ for each player; the remaining $k-2$ components are not excluded from either response function. Assume furthermore that $\sigma_{-i}(s)$ has full support $[0,1]$ given $s^i$, for $i=1,2$. Under these assumptions, we can estimate the best response functions, $\hat g_i(\sigma_{-i}; s^i) = \hat E[a_i\mid\hat\sigma_{-i}, s^i]$, again using local linear mean regression:

$$\big(\hat g_i(\sigma_{-i}; s^i), \hat b_i\big) = \arg\min_{c,b}\sum_j K(z_{i,j} - z)\big(a_{i,j} - c - b^\top(z_{i,j} - z)\big)^2, \qquad z_{i,j} = \big(\hat\sigma_{-i}(s_j), s^i_j\big),\ z = (\sigma_{-i}, s^i), \tag{22}$$

where the component of $\hat b_i$ corresponding to $\sigma_{-i}$ estimates the derivative $g_i'$ of $g_i$ with respect to $\sigma_{-i}$. Note that no functional form restrictions are needed for identification of the choice functions $g_i$. This stands in contrast to Bajari, Hong, Krainer, and Nekipelov (2006), who need to impose such restrictions in order to be able to identify the underlying preferences. Recall that the condition for Bayesian Nash Equilibrium in this game is given by $g(\sigma_1; s) = 0$. Inserting $\hat g_2$ into $\hat g_1$, both estimated by (22), yields an estimator of $g$ which can be written as

$$\hat g(\sigma_1; s) = \hat g_1\big(\hat g_2(\sigma_1; s^2); s^1\big) - \sigma_1 = \hat E\Big[a_1\,\Big|\,\hat\sigma_2 = \hat E\big[a_2\mid\hat\sigma_1 = \sigma_1, s^2\big], s^1\Big] - \sigma_1. \tag{23}$$
Based on this estimator, we can perform inference on the number of Bayesian Nash Equilibria given $s$, $Z(g(\cdot; s))$. In particular, let

$$\hat Z = Z\big(\hat g(\cdot; s), \widehat{g'}(\cdot; s)\big), \tag{24}$$

where $\hat g(\cdot;s)$ is given by (23). The term $\widehat{g_1'}$ refers to the estimated derivative of $g_1$ w.r.t. $\sigma_2$, and similarly for $\widehat{g_2'}$, so that

$$\widehat{g'}(\sigma_1; s) = \widehat{g_1'}\big(\hat g_2(\sigma_1; s^2); s^1\big)\cdot\widehat{g_2'}(\sigma_1; s^2) - 1. \tag{25}$$

Inference on $Z(g(\cdot;s))$ can now proceed as before, if an asymptotic normality result similar to theorem 2 can be shown. In the proof of theorem 2, three properties of $(\hat g(\cdot), \widehat{g'}(\cdot))$ needed to be proven for the statement of the theorem to follow: First, under the given sequence of experiments, $\hat g(\cdot)$ converges uniformly in probability to a degenerate limit. Second, $\widehat{g'}(\cdot)$ converges in distribution to a non-degenerate limit. Third, $\widehat{g'}(x_1)$ and $\widehat{g'}(x_2)$ are asymptotically independent for $|x_1 - x_2| > \text{const}$. These properties can be shown in the present case, with $\sigma_1$ replacing $x$, for an appropriate choice of the sequence of experiments, where $r_n$ is a scale parameter as before.

The choice of the sequence of experiments may seem to be more complicated here than in the baseline case, since the dependent variable $a_i$ is naturally bounded by $[0,1]$, so that increasing the residual variance would be inconsistent with the structural model. This is not a problem, however, if we note that the distribution of $\hat Z$, in the baseline model, is invariant to a proportional rescaling of $Y$, $g$ and the bandwidth $\epsilon$. We can therefore define a sequence of experiments which is equivalent to the one defined by equations (7) through (9) if we replace equation (9) by

$$Y_{i,n} = \frac{1}{r_n}\,g(X_{i,n}) + \nu_{i,n}. \tag{9'}$$

Intuitively, shrinking the "signal" $g$ by the factor $1/r_n$ is equivalent to increasing the "noise" to $r_n\nu_{i,n}$.
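The invariance used here can be checked directly on the smoothed functional. If the kernel is a rescaled density, $L(u) = K(u/\epsilon)/\epsilon$ (an assumption of this sketch), then dividing $g$, $g'$ and $\epsilon$ by the same factor $r_n$ leaves the integrand $L(g(x))\,|g'(x)|$ unchanged. The Python sketch below verifies this numerically for the hypothetical one-root function $g(x) = 0.5 - x$; the Epanechnikov form of $K$ and the value $r_n = 25$ are for illustration only.

```python
import numpy as np


def smoothed_count(g, dg, grid, eps):
    """Trapezoid approximation of the integral of L(g(x)) |g'(x)| over the grid.

    L(u) = 0.75 * (1 - (u/eps)^2) / eps on [-eps, eps] (Epanechnikov, integrates to one).
    """
    L = np.where(np.abs(g) <= eps, 0.75 * (1.0 - (g / eps) ** 2) / eps, 0.0)
    y = L * np.abs(dg)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))


grid = np.linspace(0.0, 1.0, 4001)
g = 0.5 - grid                 # one root at x = 0.5
dg = -np.ones_like(grid)
r_n = 25.0                     # hypothetical scale parameter

z_original = smoothed_count(g, dg, grid, eps=0.04)
z_rescaled = smoothed_count(g / r_n, dg / r_n, grid, eps=0.04 / r_n)
print(z_original, z_rescaled)  # equal up to floating point rounding, both close to 1
```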
Returning to games of incomplete information, consider the following sequence of experiments.

Assumption 5 For $i=1,2$, $g_{i,0}$ is continuously differentiable and monotonic in $\sigma_{-i}$, and $g^{-1}_{i,n}$ denotes the inverse of $g_{i,n}$ with respect to the argument $\sigma_{-i}$, given $s^i$. Experiments are indexed by $n$, and for the $n$th experiment we observe $(s_j, a_{1,j,n}, a_{2,j,n})$ for $j = 1,\ldots,n$. The observations $(s_j, a_{1,j,n}, a_{2,j,n})$ are i.i.d. given $n$ and

$$s_j \sim f_s(\cdot) \tag{26}$$
$$a_{i,j,n}\mid s_j \sim \mathrm{Bin}\big(1, \sigma_{i,n}(s_j)\big) \tag{27}$$
$$\sigma_{i,n}(s) = g_{i,n}\big(\sigma_{-i,n}(s); s^i\big) \tag{28}$$
$$g_{1,n}(\sigma_2; s^1) = \sigma_2 + \frac{1}{r_n}\big(g_{1,0}(\sigma_2; s^1) - \sigma_2\big) \tag{29}$$
$$g^{-1}_{2,n}(\sigma_2; s^2) = \sigma_2 + \frac{1}{r_n}\big(g^{-1}_{2,0}(\sigma_2; s^2) - \sigma_2\big) \tag{30}$$
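Equations (26) to (28) are the same as in the model we have been discussing so far. Equations (29) and (30) shrink the graphs of the best response functions $g_{i,n}$ towards the $\sigma_1 = \sigma_2$ line (compare figure 3), parallel to the $\sigma_1$ axis. Denote $\sigma_{2,n} = g_{2,n}(\sigma_1; s^2)$. We get

$$g_n(\sigma_1; s) = g_{1,n}\big(g_{2,n}(\sigma_1; s^2); s^1\big) - \sigma_1 = \frac{1}{r_n}\Big(g_{1,0}(\sigma_{2,n}; s^1) - g^{-1}_{2,0}(\sigma_{2,n}; s^2)\Big).$$

By equation (30), if $r_n\to\infty$, then $\sigma_{2,n}\to\sigma_1$, and hence

$$r_n\,g_n(\sigma_1; s)\to g_{1,0}(\sigma_1; s^1) - g^{-1}_{2,0}(\sigma_1; s^2). \tag{31}$$

The rates of convergence are slower than in the baseline case. In particular, $r_n\widehat{g'}(\cdot; s)$ converges to a non-degenerate limit if $r_n = O\big((n\tau^{4+k})^{1/2}\big)$, where $k$ is the dimensionality of the support of the response functions $g_i$, $k = \dim(s)$.

Theorem 6 (Asymptotic normality, static games of incomplete information) Under the sequence of experiments defined by assumption 5, if the remainders $R$ in the Bahadur expansions are asymptotically negligible uniformly as $n\to\infty$, and if $r_n = O\big((n\tau^{4+k})^{1/2}\big)$ and $r_n\to\infty$, then there exist $\kappa>0$ and $V$ such that $\hat Z - Z$, after subtracting an asymptotic bias governed by $\kappa$ and suitable normalization, converges in distribution to $N(0,V)$.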
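To see how (19) and (25) turn the equilibrium conditions into a root-counting problem, the following Python sketch uses hypothetical logistic average best responses $g_1(\sigma) = g_2(\sigma) = 1/(1+\exp(-10(\sigma - 0.5)))$ for a fixed value of $s$, with known derivatives in place of the estimates from (22). It evaluates the smoothed total, stable and unstable counts of the roots of $g(\sigma_1) = g_1(g_2(\sigma_1)) - \sigma_1$ on $[0,1]$, using the chain-rule derivative of (25). The logistic form, its slope parameter and $\epsilon = 0.002$ are illustrative assumptions; with this degree of strategic complementarity the game has three Bayesian Nash Equilibria, two with $g' < 0$ and one with $g' > 0$.

```python
import numpy as np

SLOPE = 10.0  # hypothetical strength of strategic complementarity


def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))


def best_response(sigma_other):
    """Hypothetical average best response g_i(sigma_{-i}; s) for a fixed s."""
    return logistic(SLOPE * (sigma_other - 0.5))


def best_response_slope(sigma_other):
    """Derivative of best_response with respect to sigma_{-i}."""
    p = best_response(sigma_other)
    return SLOPE * p * (1.0 - p)


def g(sigma1):
    # equation (19): g(sigma_1; s) = g_1(g_2(sigma_1; s); s) - sigma_1
    return best_response(best_response(sigma1)) - sigma1


def dg(sigma1):
    # chain rule as in (25): g'(sigma_1) = g_1'(g_2(sigma_1)) * g_2'(sigma_1) - 1
    return best_response_slope(best_response(sigma1)) * best_response_slope(sigma1) - 1.0


def smoothed_counts(grid, eps):
    """Smoothed total, stable (g' < 0) and unstable (g' > 0) root counts on the grid."""
    gv, dv = g(grid), dg(grid)
    L = np.where(np.abs(gv) <= eps, 0.75 * (1.0 - (gv / eps) ** 2) / eps, 0.0)
    base = L * np.abs(dv)

    def integrate(y):
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))

    return integrate(base), integrate(base * (dv < 0)), integrate(base * (dv > 0))


grid = np.linspace(0.0, 1.0, 200_001)
z_total, z_stable, z_unstable = smoothed_counts(grid, eps=0.002)
print(round(z_total, 3), round(z_stable, 3), round(z_unstable, 3))
```

The printed counts should be close to 3, 2 and 1. The outer equilibria lie very close to 0 and 1 ($g(0) \approx 0.007$), which is why $\epsilon$ is chosen small here: a larger bandwidth would cut the kernel window at the boundary of $[0,1]$ and count those roots only partially.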
Figure 4. Qualitative dynamics of stochastic difference equations
[Figure 4: $g(X, \cdot)$ plotted against $X$, with upper and lower envelopes $g^U(X)$ and $g^L(X)$; points $x_1$ and $x_2$ marked on the $X$ axis.]
Notes: This figure illustrates the characterization of the dynamics of nonlinear stochastic difference equations discussed in section 3.5, where $g^U$ and $g^L$ are upper and lower envelopes of $g$ for a sequence of realizations of $\eta$. In this graph, equilibrium regions correspond to the dashed segments of the $X$ axis, the basin of attraction of the lower equilibrium region is given by $(-\infty, x_1]$, and the basin of attraction of the upper equilibrium region is $[x_2, \infty)$.
3.5. Stochastic difference equations

In this subsection, identification and interpretation of the number of roots of $g$ for stochastic difference equations of the form

$$\Delta X_{i,t+1} = X_{i,t+1} - X_{i,t} = g(X_{i,t}, \eta_{i,t}) \tag{32}$$

is discussed. Interest in such difference equations is motivated by the study of neighborhood composition dynamics in Card, Mas, and Rothstein (2008). This discussion will form the basis of the empirical application in section 4. First, it will be shown that, under plausible assumptions, finding only one root in cross-sectional quantile regressions of $\Delta X$ on $X$ implies that there is only one stable root for every member of a family of conditional average structural functions. Second, it will be argued that the number of roots of $g$ allows us to characterize the qualitative dynamics of the stochastic difference equation in terms of equilibrium regions.

Before the formal results are stated, let us discuss the intuition behind this latter claim. Holding $\eta$ constant, the number of roots of $g$ in $X$ is the number of equilibria of the difference equation (32). If $\eta$ is stochastic, the number of roots can still serve to characterize qualitative dynamics in terms of "equilibrium regions"; this is illustrated in figure 4. In this figure there are ranges of $X$ in which the sign of $\Delta X$ does not depend on $\eta$. This implies that in these ranges $X$ moves towards the equilibrium regions, which are the regions in which the roots of $g(\cdot, \eta)$ lie.
How is the joint distribution of $(X_t, \Delta X_{t+1})$ related to the transition function $g$? Unobserved heterogeneity which is positively related over time leads to an upward bias in quantile regression slopes relative to the corresponding structural slopes. To show this, denote the $q$th conditional quantile of $\Delta X$ given $X$ by $Q_{\Delta X|X}(q\mid X)$, the conditional cumulative distribution function at $Q$ by $F_{\Delta X|X}(Q\mid X)$, and the conditional probability density by $f_{\Delta X|X}(Q\mid X)$. The following lemma shows that quantile regressions of $\Delta X$ on $X$ yield biased slopes relative to the structural slope $\frac{\partial}{\partial X} g$ if $X$ is not exogenous. The second term in equation (33) reflects the bias due to statistical dependence between $X$ and $\eta$.
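Lemma 1 (Bias in quantile regression slopes) If $\Delta X = g(X, \eta)$, and if $Q$ and $F$ are differentiable with respect to the conditioning argument $X$, then

$$\frac{\partial}{\partial X} Q_{\Delta X|X}(q\mid X) = E\left[\frac{\partial}{\partial X} g(X,\eta)\,\middle|\,\Delta X = Q, X\right] - \frac{1}{f_{\Delta X|X}(Q\mid X)}\,\frac{\partial}{\partial X} P\big(g(X_0,\eta)\le Q\mid X\big)\Big|_{X_0 = X}, \tag{33}$$

where $Q = Q_{\Delta X|X}(q\mid X)$ and the derivative in the second term is taken with respect to the conditioning argument $X$, holding $X_0$ fixed.

The following assumption of first order stochastic dominance states that there is no negative dependence between current $g(x_0, \eta)$, evaluated at fixed $x_0$, and current $X$:

Assumption 6 (First order stochastic dominance) $P\big(g(x_0,\eta)\le Q\mid X\big)$ is non-increasing as a function of $X$, holding $x_0$ constant.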
Violation of this assumption would require some underlying cyclical dynamics, in continuous time, with a frequency close enough to half the frequency of observation, or more generally with a ratio of frequencies that is an odd number divided by two. It seems safe to discard this possibility in most applications. This assumption might not hold, for instance, if outcomes were influenced by seasonal factors and observations were semi-annual.
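A linear-Gaussian special case makes the upward bias in (33) concrete. In the hypothetical design below, persistent heterogeneity $\eta$ raises both the current state $X$ and the structural change $g(X,\eta) = -0.5(X-\eta)$, so assumption 6 holds. Because $(X, \Delta X)$ is jointly normal and $\Delta X$ given $X$ is homoskedastic, every conditional quantile of $\Delta X$ given $X$ is linear with the same slope as the least squares fit; the sketch therefore uses `numpy.polyfit` in place of quantile regression. The population slope is $-0.5 + 0.5\,\mathrm{Cov}(\eta,X)/\mathrm{Var}(X) = -0.25$, above the structural slope $-0.5$, which matches the sign of the second term in (33).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical design: persistent heterogeneity eta shifts both X and g(X, eta)
eta = rng.normal(0.0, 1.0, n)
X = eta + rng.normal(0.0, 1.0, n)      # current state, positively dependent on eta
dX = -0.5 * (X - eta)                   # Delta X = g(X, eta); structural slope is -0.5

# With (X, dX) jointly normal and dX | X homoskedastic, all conditional
# quantiles of dX given X share the least squares slope.
slope = np.polyfit(X, dX, 1)[0]
print("structural slope: -0.5, regression slope:", round(slope, 3))
```

The regression slope should come out close to $-0.25$: the cross-sectional fit overstates the slope of the structural function, so a structurally unstable root would, if anything, show up more clearly in the quantile regressions.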
We can now formally state the claim that, if there are unstable equilibria structurally, then quantile regressions should exhibit multiple roots.

Proposition 3 (Unstable equilibria in dynamics and quantile regressions) Assume that $\Delta X = g(X,\eta)$ and that $g(\inf X, \eta) > 0$ and $g(\sup X, \eta) < 0$ for all $\eta$. If assumption 6 holds and $Q_{\Delta X|X}(q\mid X)$ has only one root in $X$ for all $q$, then the conditional average structural functions $E[g(x_0,\eta)\mid g(X,\eta) = 0, X]$, as functions of $x_0$, are "stable" at the roots:

$$E\left[\frac{\partial}{\partial X} g(X,\eta)\,\middle|\,\Delta X = 0, X\right]\le 0$$

for all $X$ such that $(0, X)$ is in the support of $(\Delta X, X)$.

This proposition assumes "global stability" of $g$, i.e., $X$ does not diverge to infinity. Under such global stability, if there is only one root of $g$, then this root is stable. According to this proposition, if quantile regressions only have one stable root, then the same is true for the conditional average structural functions. This is not conclusive, but it suggests that the $g(\cdot,\eta)$ themselves have only one root.
Let us now turn to the implications of the number of roots of $g$ for the qualitative dynamics of the stochastic difference equation (32). Let $\tilde g(x,\eta) := g(x,\eta) + x$. If $g$ describes a structural relationship, the counterfactual time path under the "manipulated" initial condition $X_{i,0} = x_0$ and shocks $\eta_{i,0},\ldots,\eta_{i,t-1}$ is given by

$$X_{i,1} = \tilde g(x_0, \eta_{i,0}),\quad X_{i,2} = \tilde g(X_{i,1}, \eta_{i,1}),\quad \ldots,\quad X_{i,t} = \tilde g(X_{i,t-1}, \eta_{i,t-1}). \tag{34}$$

Given the initial condition and shocks, equation (32) describes a time inhomogeneous deterministic difference equation. The following argument makes statements about the qualitative behavior of this difference equation based on properties of the function $g$, in particular based on the number of roots in $x$ of $g(x,\eta)$ for given unobservables $\eta$. Consider figure 4, which shows $g^U_{i,t}$ and $g^L_{i,t}$, defined by

$$g^U_{i,t}(x) = \max_{0\le s<t} g(x, \eta_{i,s}), \tag{35}$$
$$g^L_{i,t}(x) = \min_{0\le s<t} g(x, \eta_{i,s}). \tag{36}$$

The functions $g^U_{i,t}$ and $g^L_{i,t}$ are the upper and lower envelopes of the family of functions $g(x, \eta_{i,s})$ for $s = 0,\ldots,t-1$. The direction of movement of $X$ over time does not depend on $s$ in the ranges where $g^U_{i,t} < 0$ or $g^L_{i,t} > 0$ (which is where the horizontal axis is drawn solid in figure 4), since the sign of $g(x, \eta_{i,s})$ does not depend on $s$ in these ranges. In other words, suppose we start off with an initial value below $x_1$ in the picture. If that is the case, $X$ will converge monotonically toward the left-hand dashed range and then remain within that range for all $s \le t$. Similarly, for $X_{i,0}$ in the upper "basin of attraction" beyond $x_2$, $X$ will converge to the upper "equilibrium range" given by the right-hand dashed range. Hence small changes of initial conditions (from $x_1$ to $x_2$) can have large and persistent effects on $X$ in this case, in contrast to the case where $g(\cdot,\eta)$ has only one stable root for all $\eta$. These arguments are summarized in the following proposition.
Proposition 4 (Characterizing dynamics of stochastic difference equations) Assume that $g^U_{i,t}$ and $g^L_{i,t}$, defined by equations (35) and (36), are smooth and generic, positive for sufficiently small $x$ and negative for sufficiently large $x$, and have the same number $z$ of roots, $x^U_1 < \ldots < x^U_z$ and $x^L_1 < \ldots < x^L_z$, and let $x^L_0 = x^U_0 = -\infty$ and $x^L_{z+1} = x^U_{z+1} = \infty$. Define the following mutually disjoint ranges:

$$N_c = (x^U_c, x^U_{c+1}) \text{ for } c = 1,3,\ldots,z, \qquad P_c = (x^L_c, x^L_{c+1}) \text{ for } c = 0,2,\ldots,z-1,$$
$$S_c = [x^L_c, x^U_c] \text{ for } c = 1,3,\ldots,z, \qquad U_c = [x^U_c, x^L_c] \text{ for } c = 2,4,\ldots,z-1.$$

Then all $g(x, \eta_{i,s})$ are negative on the $N_c$, and positive on the $P_c$. Furthermore, all $g(x,\eta_{i,s})$ are negative in a neighborhood to the right of the maximum of the $S_c$ and positive to the left of the minimum, and the reverse holds for the $U_c$. Therefore, if $X_{i,0}\in P_c$ and $S_{c+1}\neq\emptyset$, then $X_{i,s}$ will converge monotonically toward $S_{c+1}$ and then remain within $S_{c+1}$. If $X_{i,0}\in N_c$ and $S_c\neq\emptyset$, then $X_{i,s}$ will converge monotonically toward $S_c$ and then remain within $S_c$. Assuming nonemptiness of these ranges, the interval $P_{c-1}\cup S_c\cup N_c$ is a "basin of attraction" for $S_c$, i.e., $X_{i,s}$ in this interval converges monotonically to $S_c$ and then remains there.

The main difference relative to the deterministic, time homogeneous case is the "blurring" of the stable equilibrium to a stable set $S_c$. We did not make any assumptions on the joint distribution of the unobserved factors $\eta_{i,0},\ldots,\eta_{i,t-1}$. The whole argument of the preceding proposition is conditional on these factors. However, the predictions of the proposition will be sharper (given $g$) if serial dependence of the unobserved factors is stronger: this increases the number of units $i$ to which the assertion is applicable, and the intervals $S_c$ and $U_c$ are going to be smaller on average.

In summary, proposition 3 implies that, if we do not find multiple roots in quantile regressions, then the conditional average structural functions $E[g(x_0,\eta)\mid g(X,\eta) = 0, X]$ do not have multiple roots. Proposition 4 implies that, if upper and lower envelopes of $g(\cdot,\eta)$ do not have multiple roots, then the dynamics of the system are stable and initial conditions do not matter in the long run.
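Proposition 4 can be illustrated by simulation. The Python sketch below assumes a hypothetical additive specification $g(x,\eta) = -2(x-0.2)(x-0.5)(x-0.8) + \eta$ with bounded shocks $\eta \sim \mathrm{Uni}[-0.005, 0.005]$, so that the envelopes (35) and (36) are vertical shifts of the deterministic part and each has three roots. It iterates (34) from two initial conditions under the same shock sequence, computes the stable sets $S_1 = [x^L_1, x^U_1]$ and $S_3 = [x^L_3, x^U_3]$ from the realized envelopes, and compares them with the final states.

```python
import numpy as np


def g0(x):
    """Hypothetical deterministic part: stable roots 0.2 and 0.8, unstable root 0.5."""
    return -2.0 * (x - 0.2) * (x - 0.5) * (x - 0.8)


def roots_on_grid(values, grid):
    """Roots located by sign changes on the grid, refined by linear interpolation."""
    idx = np.nonzero(np.sign(values[:-1]) * np.sign(values[1:]) < 0)[0]
    x0, x1 = grid[idx], grid[idx + 1]
    v0, v1 = values[idx], values[idx + 1]
    return x0 - v0 * (x1 - x0) / (v1 - v0)


rng = np.random.default_rng(0)
T = 300
eta = rng.uniform(-0.005, 0.005, size=T)   # shocks eta_{i,0}, ..., eta_{i,T-1}


def simulate(x0):
    """Counterfactual path (34): X_{t+1} = X_t + g(X_t, eta_t), same shocks for every x0."""
    x = x0
    for t in range(T):
        x = x + g0(x) + eta[t]
    return x


grid = np.linspace(0.0, 1.0, 100_001)
x_upper = roots_on_grid(g0(grid) + eta.max(), grid)   # roots of g^U, equation (35)
x_lower = roots_on_grid(g0(grid) + eta.min(), grid)   # roots of g^L, equation (36)
S1 = (x_lower[0], x_upper[0])
S3 = (x_lower[2], x_upper[2])

x_from_low = simulate(0.3)    # 0.3 lies in N_1, between x^U_1 and x^U_2
x_from_high = simulate(0.7)   # 0.7 lies in P_2, between x^L_2 and x^L_3
print("S_1 =", np.round(S1, 4), "final X from 0.3:", round(x_from_low, 4))
print("S_3 =", np.round(S3, 4), "final X from 0.7:", round(x_from_high, 4))
```

The path starting at 0.3 ends inside $S_1$ (around 0.2) and the path starting at 0.7 ends inside $S_3$ (around 0.8), although both face exactly the same shocks: the initial condition alone determines which stable set is reached.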
4. APPLICATION TO THE DYNAMICS OF NEIGHBORHOOD COMPOSITION

This section analyzes the dynamics of minority share in a neighborhood, applying the methods developed in the last two sections to the data used for analysis of neighborhood composition dynamics by Card, Mas, and Rothstein (2008). Card, Mas, and Rothstein (2008) study whether preferences over neighborhood composition lead to "white flight" once the minority share in a neighborhood exceeds a certain level. They argue that such "tipping" behavior implies discontinuities in the change of neighborhood composition over time as a function of initial composition, and test for the presence of such discontinuities in cross-sectional regressions over different neighborhoods in a given city. The authors provided full access to their datasets, which allows us to use identical samples and variable definitions as in their work.

The data set is an extract from the Neighborhood Change Database, or NCDB, which aggregates US census variables to the level of census tracts. Tract definitions change between census waves, but the NCDB matches observations from the same geographic area over time, thus allowing observation of the development over several decades of the universe of US neighborhoods. In the dataset used by Card, Mas, and Rothstein (2008), all rural tracts are dropped, as well as all tracts with population below 200 and tracts that grew by more than 5 standard deviations above the MSA mean. The definition of MSA used is the MSAPMA from the NCDB, which is equal to the Primary Metropolitan Statistical Area if the tract lies in one of those, and equal to the MSA it lies in otherwise. For further details on sample selection and variable definition, see Card, Mas, and Rothstein (2008).
The graphs and tables to be discussed are constructed as follows. For each of the MSAs and each of the decades separately, we run local linear quantile regressions of the change in minority share of a neighborhood (tract) on minority share at the beginning of the decade. This is done for the quantiles 0.2, 0.5 and 0.8, with a bandwidth of $n^{-.2}$, where $n$ is the sample size.¹⁰ The left column of graphs in figure 5 shows these quantile regressions for the three largest MSAs. For each of the regressions, $\hat Z$ is calculated, where $\epsilon$ is chosen as 0.04. The integral in the expression for $\hat Z$ is taken over the interval $[0,1]$, intersected with the support of initial minority share if the latter is smaller. Note that it is possible to find no (stable) equilibrium for an MSA, i.e., $\hat Z < 1$, if high initial minority shares do not occur in that MSA and most neighborhoods experienced growing minority shares. Figure 6 shows kernel density plots of the regressor, initial minority share, which suggest that support problems are not an issue, at least for the largest MSAs. For each $\hat Z$, bootstrap standard errors and bias are calculated, as well as the corresponding t-test statistics for the null hypotheses $Z = 0, 1, 2, 3, \ldots$, implying an integer-valued confidence set for $Z$ (based on tests of level .05). By the results of section 2, these confidence sets have an asymptotic coverage probability of 95%. By the Monte Carlo evidence of appendix A, they are likely to be conservative, i.e., to have a larger coverage probability. If the confidence sets thus obtained are empty, the two neighboring integers of $\hat Z$ are included in the intervals shown.¹¹ This makes inference even more conservative. Table I shows the resulting confidence sets for the twelve largest MSAs in the United States (by 2009 population), for all quantiles and decades under consideration.
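The construction of the integer-valued confidence sets can be sketched as follows. The Python function below is one plausible reading of the procedure just described, not the paper's own code: it takes a point estimate $\hat Z$ and bootstrap replications of it, estimates bias and standard error from the bootstrap, keeps every integer $z$ whose bias-corrected t-statistic satisfies $|t| \le 1.96$, and falls back on the two integers neighboring $\hat Z$ if no integer survives. The function name, the upper limit on candidate values and the example bootstrap draws are hypothetical.

```python
import math

import numpy as np


def integer_confidence_set(z_hat, z_boot, crit=1.96, z_max=10):
    """Return (lower, upper) for an integer-valued confidence set for Z.

    z_hat  : smoothed estimate from the full sample
    z_boot : bootstrap replications of the smoothed estimate
    """
    z_boot = np.asarray(z_boot, dtype=float)
    bias = z_boot.mean() - z_hat           # bootstrap estimate of E[Z_hat] - Z
    se = z_boot.std(ddof=1)                # bootstrap standard error (assumed > 0)
    corrected = z_hat - bias
    kept = [z for z in range(z_max + 1) if abs((corrected - z) / se) <= crit]
    if not kept:
        # empty set: report the two integers neighboring z_hat, as in the text
        lower = math.floor(z_hat)
        return (lower, lower + 1)
    return (min(kept), max(kept))


print(integer_confidence_set(1.1, [0.9, 1.2, 1.5]))    # (1, 1)
print(integer_confidence_set(1.5, [1.49, 1.50, 1.51]))  # (1, 2)
```

In the first call the bias-corrected estimate is 1.0 with standard error 0.3, so only $Z = 1$ survives; in the second call the bootstrap distribution is so tight that every integer is rejected, and the fallback reports the neighboring integers 1 and 2.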
¹⁰ The implementation of local linear quantile regression uses code downloaded from Koenker (2009).
¹¹ The full set of results for all 115 MSAs in the dataset can be found in the web-appendix, Kasy (2010).

Figure 5. Quantile regressions of the change in minority share and of the change in white population on initial minority share
[Figure 5: panels for New York, 1980-1990; Los Angeles, 1970-1980; Chicago, 1970-1980; quantile curves .2, .5 and .8 plotted against initial minority share on [0, 1].]
Notes: These graphs show local linear quantile regressions of the change in minority share (left column) and of the change in white population relative to initial population (right column) on initial minority share for the quantiles .2, .5 and .8. The graphs do not show confidence bands.

Figure 6. Density of minority share across neighborhoods
[Figure 6: panels for New York 1980, Los Angeles 1970 and Chicago 1970; density plotted against minority share on [0, 1].]
Notes: These graphs show kernel density estimates of the distribution of minority share across neighborhoods.

As can be seen from the table, in very few cases is there evidence of $Z$ exceeding 1. In all cases shown, except for the .2 quantile for Atlanta in the 1980s, we can reject the null $Z \ge 3$. Similar patterns hold for almost all of the 118 cities in the dataset. Rather than exhibiting multiple equilibria, the data indicate a general rise in
minority share that is largest for neighborhoods with intermediate initial share, but not to the extent of leading to tipping behavior. Proposition 3 in section 3.5 suggests that, if we do not find multiple roots in quantile regressions, we can reject multiple equilibria in the underlying structural relationship. I take these results as indicative that tipping is not a widespread phenomenon in US ethnic neighborhood composition over the decades under consideration. This stands in contrast to the conclusion of Card, Mas, and Rothstein (2008), who do find evidence of tipping.

The approach used here differs from the main analysis in Card, Mas, and Rothstein (2008) in a number of ways. Card, Mas, and Rothstein (2008) (i) use polynomial least squares regression with a discontinuity. They (ii) use a split sample method to test for the presence of a discontinuity, and they (iii) regress the change in the non-Hispanic white population, divided by initial neighborhood population, on initial minority share. We (i) use local linear quantile regression without a discontinuity, we (ii) run the regressions on full samples for each MSA and test for the number of roots, and we (iii) regress the change in minority share on initial minority share.

To check whether the differing results are due to variable choice (iii) rather than the testing procedure, the figures and tables that were just discussed are replicated using the change in the non-Hispanic white population relative to initial population as the dependent variable, as in Card, Mas, and Rothstein (2008). The right column of figure 5 shows such quantile regressions. These figures correspond to the ones in Card, Mas, and Rothstein (2008), p. 190, using the same variables but a different regression method and the full samples. Table II shows confidence sets for the number of roots of these regressions for the 12 largest MSAs. In comparing tables I and II, note that there is a correspondence between the lower quantiles of the former (low increase in minority share) and the upper quantiles of the latter (higher increase or lower decrease of white population). The two tables show fairly similar results. Again, no systematic evidence of multiple roots is found.

Some factors might lead to a bias in the estimated number of equilibria using the methods developed here. First, the test might be sensitive to the chosen range of integration if there are roots near the boundary. If a root lies right on the boundary of the chosen range of integration, it enters $\hat Z$ as 1/2 only. Extending the range of integration beyond the unit interval, however, might also lead to an upward bias in the estimated number of roots, if extrapolated regression functions intersect with the horizontal axis. Second, choosing a bandwidth parameter $\epsilon$ that is too large might bias the estimated number of equilibria downwards, if the function $g$ peaks within the range $[-\epsilon,\epsilon]$. Third, there might be roots of $g$ in the unit interval but beyond the support of the data.
Table I. .95 confidence sets for Z(g) for the 12 largest MSAs of the United States by decade and quantile, change in minority share

MSA                              1970s               1980s               1990s
                                 q=.2  q=.5  q=.8    q=.2  q=.5  q=.8    q=.2  q=.5  q=.8
New York, NY PMSA                [0,1] [0,1] [0,0]   [0,0] [0,0] [0,0]   [0,0] [0,0] [0,0]
Los Angeles-Long Beach, CA PMSA  [1,1] [1,1] [0,1]   [0,1] [0,1] [0,1]   [1,1] [1,1] [0,0]
Chicago, IL PMSA                 [0,1] [0,1] [0,1]   [2,2] [0,1] [0,1]   [1,1] [0,1] [0,0]
Dallas, TX PMSA                  [1,2] [1,1] [0,0]   [0,1] [0,0] [0,0]   [0,1] [0,1] [0,0]
Philadelphia, PA-NJ PMSA         [1,2] [0,1] [0,1]   [1,1] [0,1] [0,1]   [1,1] [0,1] [0,0]
Houston, TX PMSA                 [1,1] [0,0] [0,0]   [1,2] [0,1] [0,0]   [0,1] [0,0] [0,0]
Miami, FL PMSA                   [0,1] [0,0] [0,0]   [0,0] [0,0] [0,0]   [0,0] [0,0] [0,0]
Washington, DC-MD-VA-WV PMSA     [0,1] [0,0] [0,0]   [1,1] [0,1] [0,0]   [1,1] [0,1] [0,0]
Atlanta, GA MSA                  [1,1] [1,1] [0,0]   [2,3] [0,0] [0,0]   [0,0] [0,0] [0,0]
Boston, MA-NH PMSA               [0,1] [0,1] [0,1]   [0,1] [0,1] [0,0]   [1,1] [0,0] [0,1]
Detroit, MI PMSA                 [1,2] [0,1] [0,1]   [0,1] [0,1] [0,1]   [0,1] [0,1] [0,0]
Phoenix-Mesa, AZ MSA             [1,1] [0,0] [0,0]   [1,1] [0,1] [0,0]   [1,1] [0,1] [0,0]
San Francisco, CA PMSA           [1,1] [0,1] [0,1]   [0,0] [0,1] [0,0]   [1,1] [0,0] [0,0]

Notes: The table shows confidence intervals in the integers for Z(g) for the 12 largest MSAs of the United States, ordered by population, where g is estimated by quantile regression of the change in minority share over a decade on the initial minority share for the quantiles .2, .5 and .8. Regression bandwidth is $n^{-.2}$, $\epsilon$ is chosen as .04. Confidence sets are based on t-statistics using bootstrapped bias and standard errors.
Table II. .95 confidence sets for Z(g) for the 12 largest MSAs of the United States by decade and quantile, change in white population

MSA                              1970s               1980s               1990s
                                 q=.2  q=.5  q=.8    q=.2  q=.5  q=.8    q=.2  q=.5  q=.8
New York, NY PMSA                [0,1] [0,1] [0,1]   [0,1] [0,1] [0,1]   [0,1] [0,1] [0,1]
Los Angeles-Long Beach, CA PMSA  [0,1] [0,1] [0,1]   [0,1] [0,1] [0,1]   [0,1] [0,1] [0,1]
Chicago, IL PMSA                 [0,1] [0,1] [0,1]   [0,0] [0,1] [1,1]   [0,1] [0,1] [0,1]
Dallas, TX PMSA                  [0,1] [0,1] [0,1]   [0,0] [1,1] [0,2]   [0,1] [1,1] [0,1]
Philadelphia, PA-NJ PMSA         [0,1] [0,1] [0,1]   [0,1] [0,1] [0,1]   [0,1] [0,1] [1,1]
Houston, TX PMSA                 [0,1] [0,1] [0,1]   [1,1] [1,1] [1,1]   [0,1] [0,1] [0,1]
Miami, FL PMSA                   [0,1] [0,1] [0,1]   [0,0] [0,0] [1,1]   [1,1] [1,1] [1,1]
Washington, DC-MD-VA-WV PMSA     [0,1] [0,0] [0,1]   [0,0] [1,1] [0,0]   [0,1] [0,1] [0,1]
Atlanta, GA MSA                  [0,1] [1,1] [0,1]   [1,1] [1,1] [1,1]   [1,1] [1,2] [0,1]
Boston, MA-NH PMSA               [0,1] [0,1] [0,1]   [0,0] [0,0] [1,1]   [0,0] [0,1] [0,1]
Detroit, MI PMSA                 [0,1] [0,1] [0,1]   [0,0] [0,0] [1,1]   [0,1] [0,1] [0,1]
Phoenix-Mesa, AZ MSA             [0,1] [0,1] [0,1]   [0,0] [1,1] [0,0]   [0,1] [0,1] [0,1]
San Francisco, CA PMSA           [0,1] [0,1] [0,1]   [0,0] [0,0] [0,0]   [0,0] [1,1] [0,0]

Notes: The table shows confidence intervals in the integers for Z(g) for the 12 largest MSAs of the United States, ordered by population, where g is estimated by quantile regression of the change in the non-Hispanic white population over a decade, divided by initial total population, on the initial minority share for the quantiles .2, .5 and .8. Regression bandwidth is $n^{-.2}$, $\epsilon$ is chosen as .05 times the maximal change. Confidence sets are based on t-statistics using bootstrapped bias and standard errors.
5. SUMMARY AND CONCLUSION

This paper proposes an inference procedure for the number of roots of functions nonparametrically identified using conditional moment restrictions, and develops the corresponding asymptotic theory. In particular, it is shown that a smoothed plug-in estimator of the number of roots is super-consistent under i.i.d. asymptotics, but asymptotically normal under non-standard asymptotics, and asymptotically efficient relative to a simple plug-in estimator. In section 3, these results are extended to cover various more general cases, allowing for covariates as controls, higher dimensional domain and range, and for inference on the number of equilibria with various stability properties. This section also discusses how to apply the results to static games of incomplete information and to stochastic difference equations. In an application of the methods developed here to data on neighborhood composition dynamics in the United States, no evidence of multiple equilibria is found.

The inference procedure can also be used to test for bifurcations, i.e., (dis)appearing equilibria as a function of changing exogenous covariates. It is easy to test the hypothesis $Z(g(\cdot; W_1)) = Z(g(\cdot; W_2))$, since the corresponding estimators $\hat Z(g(\cdot; W_i))$ are independent for $W_1$ and $W_2$ further apart than twice the bandwidth $\tau$. If there are bifurcations, small exogenous shifts might have a large (discontinuous) effect on the equilibrium attained, if the "old" equilibrium disappears.
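Because the two estimators are independent when $W_1$ and $W_2$ are far enough apart, a test of $Z(g(\cdot;W_1)) = Z(g(\cdot;W_2))$ can combine their standard errors as for independent, approximately normal estimates. The Python sketch below assumes that bias-corrected estimates and their standard errors (for instance from the bootstrap, as in the application) are already available; the function name and the example inputs are hypothetical.

```python
import math


def equal_root_count_test(z1, se1, z2, se2, crit=1.96):
    """Two-sided test of Z(g(.; W1)) = Z(g(.; W2)) from independent estimates.

    z1, z2   : bias-corrected smoothed estimates at covariate values W1 and W2
    se1, se2 : their standard errors
    Returns the t-statistic and whether equality is rejected at level .05.
    """
    t = (z1 - z2) / math.sqrt(se1**2 + se2**2)
    return t, abs(t) > crit


print(equal_root_count_test(3.0, 0.2, 1.0, 0.2))  # large difference: equality rejected
print(equal_root_count_test(1.1, 0.3, 1.0, 0.3))  # no evidence of a bifurcation
```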
In the dynamic setup, one might furthermore consider applying the procedure to detrended data, for instance by demeaning Y. It seems likely that regressions of detrended data have a higher number of roots. The rationale of such an approach could be found in underlying models in which the dynamics of a detrended variable are stationary. This is in particular the case in Solow-type growth models, in which GDP or capital stock is stationary after normalizing by a technological growth factor.
Finally, it might also be interesting to extend the results obtained here to cover further cases where $g$ cannot be directly estimated using conditional moment restrictions. The crucial step for such extensions, as illustrated by the various cases discussed in section 3, is to find a sequence of experiments such that the first stage estimator $\hat g$ converges in probability to a degenerate limit, whereas $\widehat{g'}$ converges in distribution to a non-degenerate limit. Furthermore, $\widehat{g'}(x_1)$ needs to be asymptotically independent of $\widehat{g'}(x_2)$ for all $|x_1 - x_2| > \text{const}$. There are many potential applications of the results obtained here, where it might be interesting to know whether the underlying dynamics or strategic interactions imply multiple equilibria.¹² Examples include household level poverty traps, intergenerational mobility, efficiency wages, macro models of economic growth (as analyzed in the web appendix), financial market bubbles (herding), market entry, and social norms.

¹² The Matlab/Octave code written for this paper is available upon request.
APPENDIX A: MONTE CARLO EVIDENCE

This section presents simulation results that check the finite-sample accuracy of the
asymptotic approximations obtained in theorem 2. In all simulations, the $X_i$ are i.i.d.
draws of Uni[0,1] random variables, and the additive errors are either uniformly or
normally distributed:

$$X_i \sim_{iid} \mathrm{Uni}[0,1], \qquad \epsilon_i \mid X_i \sim_{iid} f_\epsilon, \qquad Y_i = g_j(X_i) + \epsilon_i, \tag{37}$$

where $f_\epsilon$ is an appropriately centered and scaled uniform or normal distribution.
Two functions $g$ are considered, the first with one root and the second with three roots:

$$g_1(x) = 0.5 - x, \qquad g_2(x) = 0.5 - 5x + 12x^2 - 8x^3.$$

The function $g$ is estimated by median regression, mean regression, and .9 quantile
regression, where the errors in the simulations are shifted so that the median, mean, or
.9 quantile of $Y_i$ given $X_i$ equals $g_j(X_i)$, respectively. The figures and tables show
sequences of four experiments with 400, 800, 1600, and 3200 observations. These models are
chosen to be comparable to the empirical application discussed in section 4. The variance
of the errors in each experiment is chosen to yield the same variance for $\hat g$, as implied
by the asymptotic approximation of the Bahadur expansion, in all experiments for a given $g$.
By the proof of theorem 2, we should therefore get similar simulation results across all
setups. Furthermore, the variance of $\hat Z$ should be constant up to a factor given by the
ratio of the two bandwidths. The parameters of these simulations are chosen to lie in an
intermediate range where variation in the estimates is present but moderate.

Figure 7 shows density plots for $\hat Z$ for the sequences of experiments with uniform errors
and median regressions; similar figures for the other experiments are presented in the web
appendix, Kasy (2010). As predicted by theorem 2, biases are positive, and both bias and
variance are decreasing in $n$. Figure 8 shows the distribution of the "naive" plug-in
estimator $Z(\hat g)$. It was shown in section 2 that this estimator is asymptotically
inefficient relative to the smoothed plug-in estimator. This relative inefficiency is
reflected in a larger dispersion in the simulations, as can be seen by comparing figures 7
and 8. Figure 9 shows density plots for $\hat Z$, normalized by its sample mean and standard
deviation, from the same simulations. These plots suggest that the sample distribution of
$\hat Z$ is somewhat right-skewed relative to a normal distribution.

Table III shows the results of simulations using bootstrapped standard deviations and biases,
for mean regression with uniform errors. For the range of experiments considered, rejection
frequencies are lower than the nominal level of 0.05 implied by asymptotic theory. If this
pattern generalizes, inference based on the t-statistic proposed in this paper is conservative
in finite samples. In particular, it seems that the bootstrapped standard errors are too large.
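A minimal Python/NumPy sketch of a single Monte Carlo draw is given below. It is not the paper's Matlab/Octave code and differs from the design above in several assumed ways: $g$ and $g'$ are estimated by local linear mean regression only (no median or .9 quantile regression), both kernels are Epanechnikov, the errors are normal, and the bandwidths `tau` (regression) and `kappa` (of $L$) are hypothetical values. The smoothed plug-in estimator is approximated as the smoothed count $\int L_\kappa(\hat g(x))\,|\hat g'(x)|\,dx$ from the proof of proposition 1, assuming $L_\kappa(y) = L(y/\kappa)/\kappa$. The naive estimator $Z(\hat g)$ counts sign changes of $\hat g$ on a grid.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: supported on [-1, 1] and integrates to one."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def g1(x):
    return 0.5 - x  # one root, at x = 0.5


def g2(x):
    return 0.5 - 5 * x + 12 * x**2 - 8 * x**3  # three roots in [0, 1]


def simulate(g, n, sigma, rng):
    """X_i ~ Uni[0, 1] and Y_i = g(X_i) + e_i with e_i ~ N(0, sigma^2)."""
    x = rng.uniform(0.0, 1.0, size=n)
    return x, g(x) + sigma * rng.standard_normal(n)


def local_linear(x, y, grid, tau):
    """Local linear mean regression; returns (g_hat, g_hat') evaluated on `grid`."""
    d = x[None, :] - grid[:, None]  # X_i - x0, shape (len(grid), n)
    w = epanechnikov(d / tau)
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d**2).sum(1)
    t0, t1 = (w * y).sum(1), (w * d * y).sum(1)
    det = s0 * s2 - s1**2
    # Intercept estimates g(x0), slope estimates g'(x0).
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det


def smoothed_count(g_vals, dg_vals, grid, kappa):
    """Trapezoid approximation of int L_kappa(g(x)) |g'(x)| dx."""
    f = epanechnikov(g_vals / kappa) / kappa * np.abs(dg_vals)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid)))


def naive_count(g_vals):
    """Plug-in count Z(g): number of sign changes of g along the grid."""
    s = np.sign(g_vals)
    s = s[s != 0]  # drop exact zeros so a root on the grid is counted once
    return int(np.sum(s[1:] != s[:-1]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 1001)
    x, y = simulate(g2, 3200, 0.1, rng)
    g_hat, dg_hat = local_linear(x, y, grid, tau=0.05)
    print("smoothed:", smoothed_count(g_hat, dg_hat, grid, kappa=0.1))
    print("naive:   ", naive_count(g_hat))
```

Repeating the draw many times and plotting the two estimators would give analogues of figures 7 and 8 under these assumptions.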
Figure 7. Density of $\hat Z$ in Monte Carlo experiments
[Figure: density curves for n = 400, 800, 1600, 3200. Upper panel: $g_1(x) = 0.5 - x$, $Z(g_1) = 1$. Lower panel: $g_2(x) = 0.5 - 5x + 12x^2 - 8x^3$, $Z(g_2) = 3$.]
Notes: This figure shows density plots of $\hat Z$ from Monte Carlo experiments with uniform
errors and $g$ identified by median regression, as described in appendix A. The upper graph
shows the distribution from four experiments with increasing sample size $n$ and
correspondingly growing variance of the residual, where the true parameter $Z$ equals one.
The same holds for the lower graph, except that $Z = 3$.
Figure 8. Distribution of the simple plug-in estimator $Z(\hat g)$ in Monte Carlo experiments
[Figure: distributions for n = 400, 800, 1600, 3200. Upper panel: $g_1(x) = 0.5 - x$, $Z(g_1) = 1$. Lower panel: $g_2(x) = 0.5 - 5x + 12x^2 - 8x^3$, $Z(g_2) = 3$.]
Notes: This figure shows the distribution of $Z(\hat g)$, the "naive" plug-in estimator, from
the same simulations as figure 7.
Figure 9. Density of normalized $\hat Z$ in Monte Carlo experiments
[Figure: density curves for n = 400, 800, 1600, 3200 and the standard normal density. Upper panel: $g_1(x) = 0.5 - x$, $Z(g_1) = 1$. Lower panel: $g_2(x) = 0.5 - 5x + 12x^2 - 8x^3$, $Z(g_2) = 3$.]
Notes: This figure shows density plots of $\hat Z$, normalized by its sample mean and standard
deviation, from the same simulations as figure 7. It also shows, as a reference, the density
of a standard normal.
TABLE III
Monte Carlo rejection probabilities

   n    Bandwidth   Error s.d.   Rejection freq. (upper)   Rejection freq. (lower)
  400     0.065       0.179              0.05                      0.01
  800     0.059       0.194              0.03                      0.02
 1600     0.055       0.231              0.02                      0.01
 3200     0.052       0.290              0.02                      0.01
  400     0.065       0.268              0.03                      0.02
  800     0.059       0.292              0.01                      0.02
 1600     0.055       0.347              0.01                      0.01
 3200     0.052       0.434              0.01                      0.02

Notes: This table shows the frequency of rejection of the null under a test of asymptotic
level 5%, for the sequences of Monte Carlo experiments described in appendix A. The functions
$g_j$ are estimated by mean regression, the errors are uniformly distributed, and the first
four experiments are generated using $g_1$ with one root, the next four using $g_2$ with three
roots. The columns show in turn sample size, regression bandwidth, error standard deviation,
and the rejection probabilities of one-sided tests.
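Table III is based on a t-statistic with bootstrapped bias and standard deviation. The exact bootstrap scheme is not spelled out in this appendix, so the following is only a sketch under our own assumptions: a nonparametric pairs bootstrap, the bias estimated as the mean of the bootstrap estimates minus the original estimate, $t = (\hat Z - \widehat{\text{bias}} - z_0)/\widehat{\text{sd}}$, and one-sided tests at the 5% level. The argument `estimator` is a hypothetical callable mapping a sample to a scalar estimate of $Z$, for instance a smoothed plug-in estimate.

```python
import numpy as np

Z_CRIT = 1.6448536269514722  # 0.95 quantile of N(0, 1): one-sided tests at level 5%


def bootstrap_t(x, y, estimator, z0, n_boot, rng):
    """Bias-corrected t-statistic for H0: Z = z0, using a pairs bootstrap."""
    z_hat = estimator(x, y)
    n = len(x)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (X_i, Y_i) pairs with replacement
        boot[b] = estimator(x[idx], y[idx])
    bias = boot.mean() - z_hat  # bootstrap estimate of the bias
    sd = boot.std(ddof=1)  # bootstrap estimate of the standard deviation
    return (z_hat - bias - z0) / sd


def one_sided_rejections(t):
    """(reject H0 in favor of Z > z0, reject H0 in favor of Z < z0) at level 5%."""
    return t > Z_CRIT, t < -Z_CRIT
```

Rejection frequencies as in Table III would be the shares of Monte Carlo draws for which either flag is true when `z0` equals the true number of roots.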
APPENDIX B: PROOFS

Proof of proposition 1: By continuity of $g'$ as well as genericity of $g$, we can choose $\kappa$
small enough such that $\mathrm{sgn}(g'(x))$ is constantly equal to $\mathrm{sgn}(g'(x_c)) \neq 0$ in each of
the neighborhoods of the $c = 1, \ldots, z$ roots $x_c$ of $g$, defined by $\{x : L_\kappa(g(x)) \neq 0\}$. Hence
we can write the integral $\int_{\mathcal X} L_\kappa(g(x))\,|g'(x)|\,dx$ as a sum of integrals over these
neighborhoods, in each of which there is exactly one root. Assume w.l.o.g. that $z = 1$ and that
the sign of $g'$ is constant on the range of $x$ where $L_\kappa(g(x)) \neq 0$. Then, by a change of
variables setting $y = g(x)$,
$$\int_{\mathcal X} L_\kappa(g(x))\,|g'(x)|\,dx = \int_{g(\mathcal X)} L_\kappa(y)\,\frac{|g'(g^{-1}(y))|}{|g'(g^{-1}(y))|}\,dy = 1.$$
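Proof of proposition 2: We need to find $\delta$ such that $\|\tilde g - g\| < \delta$ implies
$Z(\tilde g) = Z(g)$. By genericity of $g$, each root $x_c$ of $g$ is such that $\mathrm{sgn}(g'(x_c)) \neq 0$. By
continuous first derivatives we can then find $\nu$ such that $\mathrm{sgn}(g'(\cdot))$ is constant in the
neighbourhood $NH_c := (x_c - \nu, x_c + \nu)$ of each of the finitely many roots $x_c$, and the $NH_c$
are mutually disjoint. By continuity of $g$,
$$\delta_1 := \inf_{x \notin \bigcup_c NH_c} |g(x)| > 0 \tag{38}$$
and
$$\delta_2 := \inf_{x \in \bigcup_c \overline{NH}_c} |g'(x)| > 0, \tag{39}$$
where $\overline{NH}_c$ is the closure of $NH_c$. Choosing $\delta = \min(\delta_1, \delta_2)$ fulfills our purpose.
To see this, choose a $\tilde g$ such that $\|\tilde g - g\| < \delta$. For $x \notin \bigcup_c NH_c$, $\tilde g$ is bounded
away from zero by equation (38). In $NH_c$ there must be exactly one $x$ such that $\tilde g(x) = 0$:
since the $NH_c$ are mutually disjoint and $g$ is strictly monotonic on each of them,
$\mathrm{sgn}(g(x_c - \nu)) \neq \mathrm{sgn}(g(x_c + \nu))$; by (38) again, $\mathrm{sgn}(g(x_c - \nu)) = \mathrm{sgn}(\tilde g(x_c - \nu))$ and
$\mathrm{sgn}(g(x_c + \nu)) = \mathrm{sgn}(\tilde g(x_c + \nu))$; and finally the sign of $\tilde g'$ is constantly equal to
$\mathrm{sgn}(g'(x_c))$ in $NH_c$ by equation (39). The assertion for the smoothed count
$\int L_\kappa(\tilde g(x))\,|\tilde g'(x)|\,dx$ now follows from the first part of this proof, combined with
proposition 1, if we can choose a $\kappa$ independent of $\tilde g$ such that proposition 1 applies.
Sufficient for this is a $\kappa$ that separates roots. Choosing $\kappa$ small enough relative to $\delta_1$
accomplishes this: by equation (38), $L_\kappa$ will then separate the $NH_c$, and by the previous
argument each of the $NH_c$ will contain exactly one root of $\tilde g$.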
Proof of theorem 2: We will use $Z_1$, $Z_2$, $Z_3$ to denote a sequence of approximations to $\hat Z$.
Write $Z =_A Z'$ if $a_n(Z - b_n)$ and $a_n(Z' - b_n)$ have the same non-degenerate distributional
limit for some non-random sequences $a_n$ and $b_n$. In particular, as long as such sequences
exist that guarantee convergence to a non-degenerate limit, this is implied by equality up to a
remainder which is asymptotically negligible under the given sequence of experiments, i.e.,
$Z =_A Z'$ if $a_n(Z - Z') \to_p 0$.

1) Approximation of $\hat g$ with $g$:
$$\hat Z =_A \int L_\kappa(g(x))\,|\hat g'(x)|\,dx.$$
The remainder of this approximation is given by
$$\int \bigl(L_\kappa(\hat g(x)) - L_\kappa(g(x))\bigr)\,|\hat g'(x)|\,dx.$$
Negligibility of this remainder follows from uniform convergence of $\hat g$ under our sequence of
experiments at a rate faster than $\kappa$, which is a consequence of Bahadur expansion (6) and of
$\kappa/\tau \to \infty$. Assuming that $L_\kappa$ is Lipschitz with constant $C/\kappa^2$, this in turn implies
$\sup_x |L_\kappa(\hat g(x)) - L_\kappa(g(x))| = o_p(1/\kappa)$. This, combined with the arguments proving
distributional convergence of the integral over neighborhoods of the roots of $g$ (whose width is
of order $\kappa$), given below, proves that the remainder is $o_p(\hat Z)$.

2) Approximation of $\hat g'$ by the Bahadur expansion: Replacing $\hat g'(x)$ by $g'(x)$ plus the
leading term of Bahadur expansion (6) yields
$$\int L_\kappa(g(x))\,|\hat g'(x)|\,dx =_A Z_1,$$
where $Z_1$ denotes the same integral with $\hat g'(x)$ replaced by this linear approximation. The
absolute value of the remainder of this approximation is less than or equal to
$\int L_\kappa(g(x))\,|R(x)|\,dx$, where $R$ is the remainder of the Bahadur expansion. Negligibility of
the remainder of the approximation is a consequence of the assumption that the remainder of the
Bahadur expansion of $(\hat g, \hat g')$ around $(g, g')$ is negligible uniformly in $x$.

3) Restriction to one root at 0 and Taylor approximations: Assume that $g(0) = 0$ and $g(x) \neq 0$
for $x \neq 0$ (i.e., $Z = 1$). This is without loss of generality, since the integral for the general
case is simply a sum of the independent integrals in a neighborhood of each root. Now define
$c = g'(0)$, the constant $w$ (determined by $f_1(0)$ and $s_1(0)$), the terms $\eta_i$, and the rescaled
kernel $\tilde K_\tau$ […]. By replacing $g$ with $g'(0)x$ in $L_\kappa$ and replacing $f_1(x)$ and $s_1(x)$ by their
values at 0, both justified by smoothness and $\kappa \to 0$, as well as $I_n(x) \to 1$ uniformly, we get
$$Z_1 =_A \int L_\kappa(cx)\,\Bigl|c + w\,r_n\,E_n\bigl[\tilde K_\tau(X_i - x)\,\eta_i\bigr]\Bigr|\,dx =: Z_2,$$
where $E_n$ denotes the sample average over $i$. The absolute value of the remainder of this
approximation is bounded by the sum of two terms, one involving $|L_\kappa(g(x)) - L_\kappa(cx)|$ and one
involving $L_\kappa(cx)$ times the deviation of $f_1(x)$, $s_1(x)$ and $I_n(x)$ from their limits. Both terms
in this expression go to 0 as $\kappa \to 0$. We can assume furthermore that
$X_i \sim_{iid} \mathrm{Uni}([-\kappa/c, \kappa/c])$ conditional on falling in this interval, and that
$\eta_i \mid X_i \sim_{iid} F_{\eta|X}(\cdot \mid 0)$. These assumptions are justified by another Taylor
approximation, this time of the distribution functions, $F_X(x) = F_X(0) + f_X(0)\,x + o(x)$ and
$F_{\eta|X}(\eta \mid X) = F_{\eta|X}(\eta \mid 0) + O(X)$, assuming both distribution functions to be $C^1$. To
see that this approximation is justified, note that distributional convergence to the same limit is
equivalent to convergence of the expectations of any Lipschitz continuous bounded function of the
statistics to the same limit. The difference in expectations between a function $h$ of $Z_2$ and of
its approximation using conditionally uniform $X_i$ and i.i.d. $\eta_i$ is given by an integral of $h(Z_2)$
against the difference between the two joint distributions of the data. This integral goes to 0
because the support of $h(Z_2)$ in $X$ is a neighborhood of 0 shrinking to 0.
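4) Partitioning the range of integration: Partition $[-\kappa/c, \kappa/c]$ into subintervals
$[t_j, t_{j+1}]$, $j = 1, \ldots, \kappa/(c\tau)$, with $t_{j+1} - t_j = 2\tau$. Then
$$Z_2 =_A \sum_{j=1}^{\kappa/(c\tau)} L_\kappa(ct_j)\,\zeta_j =: Z_3, \qquad \zeta_j := \int_{t_j}^{t_{j+1}} \Bigl|c + w\,r_n\,E_n\bigl[\tilde K_\tau(X_i - x)\,\eta_i\bigr]\Bigr|\,dx.$$
The remainder of this approximation is bounded in absolute value by
$$\max_j \max_{t_j \le x \le t_{j+1}} |L_\kappa(cx) - L_\kappa(ct_j)| \cdot \sum_j \zeta_j.$$
This approximation is warranted by Lipschitz continuity of $L_\kappa$ with a Lipschitz constant of
order $1/\kappa^2$, and by $\tau/\kappa \to 0$: since $\sum_j \zeta_j = O_p(\kappa)$, the bound is of order
$(\tau/\kappa^2)\cdot\kappa = \tau/\kappa$.

5) Poisson approximation: The following argument essentially replaces the number of $X_i$ falling
into the interval $[-\kappa/c, \kappa/c]$, which is approximately distributed $\mathrm{Bin}(n, 2f(0)\kappa/c)$, with a
Poisson random variable with parameter $2nf(0)\kappa/c$; the distribution of everything else
conditional on this number remains the same.

Let $n_j$ be distributed i.i.d. $\mathrm{Poisson}(2n\tau f(0))$ for $j = 1, \ldots, \kappa/(c\tau)$. This is an
approximation to the number of $X_i$ falling into the bin $[t_j, t_{j+1}]$. Draw
$X_{jl} \sim_{iid} \mathrm{Uni}([t_j, t_{j+1}])$ and $\eta_{jl} \sim_{iid} F_{\eta|X}(\cdot \mid 0)$ for $j = 1, \ldots, \kappa/(c\tau)$ and
$l = 1, \ldots, n_j$. Now define
$$\tilde\zeta_j := \int_{t_j}^{t_{j+1}} \Bigl|c + w\,\frac{r_n}{n}\sum_{k=j-1}^{j+1}\sum_{l=1}^{n_k} \tilde K_\tau(X_{kl} - x)\,\eta_{kl}\Bigr|\,dx.$$
Then
$$Z_3 =_A \sum_{j=1}^{\kappa/(c\tau)} L_\kappa(ct_j)\,\tilde\zeta_j,$$
where the $\tilde\zeta_j$ are identically distributed and $\tilde\zeta_j$ is independent of $\tilde\zeta_k$ for $|j - k| \geq 2$.
Conditional on $\tilde n := \sum_j n_j$, the equality is exact. The exact distribution of the number of
observations falling in the interval $[-\kappa/c, \kappa/c]$, corresponding to $\tilde n$, would be given by
$$P(\tilde n) = \frac{\bigl(2n(\kappa/c)f(0)\bigr)^{\tilde n}}{\tilde n!}\,\frac{n!}{n^{\tilde n}\,(n - \tilde n)!}\,\bigl(1 - 2(\kappa/c)f(0)\bigr)^{n - \tilde n}.$$
The Poisson approximation sets the latter part of this expression to a constant in $\tilde n$. This is
justified by the usual arguments deriving the Poisson distribution as a limit of Binomial
distributions. The approximation of $Z_3$ follows by an argument similar to the one of point 3,
second part, once we note that the multinomial p.m.f. converges uniformly.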
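Step 5 relies on the classical Poisson limit of the binomial distribution. As a numerical illustration (not part of the paper), the sketch below computes the total variation distance between $\mathrm{Bin}(n, \lambda/n)$ and $\mathrm{Poisson}(\lambda)$ for a fixed mean $\lambda$ and growing $n$; by Le Cam's inequality this distance is at most $\lambda^2/n$. The value $\lambda = 5$ and the truncation point `kmax` are arbitrary choices.

```python
from math import comb, exp, factorial


def binom_pmf(k, n, p):
    """Binomial p.m.f.; math.comb returns 0 for k > n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)


def total_variation(n, lam, kmax=60):
    """TV distance between Bin(n, lam/n) and Poisson(lam), summed over k <= kmax."""
    p = lam / n
    return 0.5 * sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(kmax + 1))


if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, total_variation(n, 5.0), "Le Cam bound:", 5.0**2 / n)
```

The truncation at `kmax` only drops Poisson and binomial mass that is negligible for $\lambda = 5$.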
6) Moments of the integrals over the subintervals: For constants $\mu_1$, $\mu_2$, $\mu_{11}$ and $\mu_3$,
$$E[\tilde\zeta_j] = 2\tau\mu_1 + o(\tau), \quad E[\tilde\zeta_j^2] = 4\tau^2\mu_2 + o(\tau^2), \quad E[\tilde\zeta_j\tilde\zeta_{j+1}] = 4\tau^2\mu_{11} + o(\tau^2), \quad E[\tilde\zeta_j^3] = 8\tau^3\mu_3 + o(\tau^3).$$
Write $\xi_n(x) := c + w\,\frac{r_n}{n}\sum_{k=j-1}^{j+1}\sum_{l=1}^{n_k}\tilde K_\tau(X_{kl} - x)\,\eta_{kl}$ for the integrand of $\tilde\zeta_j$.
These equations follow from noting, first, pointwise convergence to normality,
$$\xi_n(x) - c \to_d N(0, v),$$
under our sequence of experiments. This is the point where the rate $r_n$ matters: it is chosen
such that this normalized sum of $n_{j-1} + n_j + n_{j+1}$ mean-zero terms has a non-degenerate
limiting variance $v$ […]. Second, a change of the order of integration and the limit in $n$
delivers the claims, where this change of order is justifiable by the dominated convergence
theorem. For instance,
$$\lim_n \frac{E[\tilde\zeta_j^2]}{\tau^2} = 4\lim_n E\Bigl[\int_{[0,1]^2} \bigl|\xi_n(t_j + 2\tau\lambda_1)\,\xi_n(t_j + 2\tau\lambda_2)\bigr|\,d\lambda_1\,d\lambda_2\Bigr] = 4\int_{[0,1]^2} \lim_n E\bigl[\bigl|\xi_n(t_j + 2\tau\lambda_1)\,\xi_n(t_j + 2\tau\lambda_2)\bigr|\bigr]\,d\lambda_1\,d\lambda_2.$$
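7) Central limit theorem applied to the sum of integrals over the subintervals: Now apply a central
limit theorem for m-dependent sequences to the sum of integrals. For a definition of
m-dependence, see Hoeffding and Robbins (1994). Note that $L_\kappa(ct_j)\,\tilde\zeta_j$ is an m-dependent
sequence with $m = 1$. We have
$$\mathrm{Var}\Bigl(\sum_{j=1}^{\kappa/(c\tau)} L_\kappa(ct_j)\,\tilde\zeta_j\Bigr) = \sum_j L_\kappa^2(ct_j)\,\mathrm{Var}(\tilde\zeta_j) + \sum_j L_\kappa(ct_j)\bigl(L_\kappa(ct_{j-1}) + L_\kappa(ct_{j+1})\bigr)\,\mathrm{Cov}(\tilde\zeta_j, \tilde\zeta_{j+1})$$
$$\approx 2\tau\,(\mu_2 + 2\mu_{11} - 3\mu_1^2)\int_{-\kappa/c}^{\kappa/c} L_\kappa^2(cu)\,du,$$
where $\mu_1$, $\mu_2$ and $\mu_{11}$ are the limits of $E[\tilde\zeta_j]/(2\tau)$, $E[\tilde\zeta_j^2]/(4\tau^2)$ and
$E[\tilde\zeta_j\tilde\zeta_{j+1}]/(4\tau^2)$ from step 6, and $\sum_j L_\kappa^2(ct_j) \approx (2\tau)^{-1}\int_{-\kappa/c}^{\kappa/c} L_\kappa^2(cu)\,du$.
Asymptotic normality for $(Z_3 - E[Z_3])/\sqrt{\mathrm{Var}(Z_3)}$ follows, and, by $\hat Z =_A Z_3$, the same
holds for $(\hat Z - E[\hat Z])/\sqrt{\mathrm{Var}(\hat Z)}$. Furthermore, $E[Z_3] = O(1)$, and hence so is $E[\hat Z]$.

Proof of theorem 3: Fix one of the roots $x_0$ of $g$. By the arguments of the proof of theorem 2,
$\partial/\partial x\,\hat g(x)$ (not to be confused with $\hat g'(x)$) converges in distribution to a non-degenerate
normal distribution for all $x$. In particular,
$$\liminf_n P\bigl(\mathrm{sign}(\partial/\partial x\,\hat g(x_0)) \neq \mathrm{sign}(g'(x_0))\bigr) > 0.$$
By uniform convergence in levels of $\hat g$ and the intermediate value theorem (compare also
figure 2),
$$P\bigl(Z(\hat g) > Z(g)\bigr) \geq P\bigl(\mathrm{sign}(\partial/\partial x\,\hat g(x_0)) \neq \mathrm{sign}(g'(x_0))\bigr).$$
This proves the first claim. The second claim now follows immediately from … $\to \infty$.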
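Step 7 of the proof of theorem 2 uses the variance of a weighted sum of a 1-dependent sequence, $\mathrm{Var}(\sum_j a_j\zeta_j) = \sum_j a_j^2\,\mathrm{Var}(\zeta_j) + 2\sum_j a_j a_{j+1}\,\mathrm{Cov}(\zeta_j, \zeta_{j+1})$ for a stationary sequence. The sketch below checks this identity by simulation for a toy 1-dependent sequence $\zeta_j = u_j + u_{j+1}$ with i.i.d. standard normal $u_j$ (so the variance is 2 and the adjacent covariance is 1), with Epanechnikov-shaped weights standing in for $L_\kappa(ct_j)$. These choices are illustrative assumptions, not part of the paper.

```python
import numpy as np


def one_dependent_variance(a, var_zeta, cov_adjacent):
    """Var(sum_j a_j * zeta_j) for a stationary 1-dependent sequence zeta_j."""
    a = np.asarray(a, dtype=float)
    return var_zeta * np.sum(a**2) + 2.0 * cov_adjacent * np.sum(a[:-1] * a[1:])


def simulate_weighted_sum(a, reps, rng):
    """Draw sum_j a_j * zeta_j with zeta_j = u_j + u_{j+1}, u iid N(0, 1)."""
    a = np.asarray(a, dtype=float)
    u = rng.standard_normal((reps, len(a) + 1))
    zeta = u[:, :-1] + u[:, 1:]  # neighbors share one u, so the sequence is 1-dependent
    return zeta @ a


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    t = np.linspace(-0.95, 0.95, 20)
    weights = 0.75 * (1 - t**2)  # stand-in for L_kappa(c t_j)
    print(one_dependent_variance(weights, 2.0, 1.0))
    print(simulate_weighted_sum(weights, 20000, rng).var())
```

Dropping the covariance term would understate the variance here by roughly a half, which is why the adjacent covariances appear in step 7.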
Proof of theorem 4 (Sketch): We will approximate $M(a,b;x,w_1)$ by a criterion function that has
the form of equation (3), i.e., a local weighted average over the empirical distribution of some
objective function. Based on this approximation we can then again apply the results of Kong,
Linton, and Xia (2010). Newey (1994) provides a set of results that facilitate such
approximations of partial means. In particular, lemma 5.4 in Newey (1994) allows derivation of
the required approximation by replacing the outer sum over $j$ in equation (11) with an
expectation, and by linearizing the fraction inside. The first replacement is asymptotically
warranted since the variation created by averaging over the empirical distribution is of order
$1/\sqrt n$ and hence dominated by the variation in the nonparametric component. The second
replacement follows from differentiability and requires in particular that the denominator of
the fraction be asymptotically bounded away from zero. This is guaranteed by the requirement
that $W_2$ has full conditional support given $(X, W_1)$. Formally, lemma 5.4 in Newey (1994) gives
$$M(a,b;x,w_1) - E_{W_2}\bigl[E[m \mid X = x, W_1 = w_1, W_2]\bigr] = \tilde M(a,b;x,w_1) + o_p\bigl(\tilde M(a,b;x,w_1)\bigr),$$
where
$$\tilde M(a,b;x,w_1) := \frac{1}{n}\sum_j K(\cdot)\,\frac{m\bigl(Y_j - a - b(X_j - x)\bigr) - E\bigl[m\bigl(Y_j - a - b(X_j - x)\bigr) \mid X_j, W_j\bigr]}{f_{X,W_1|W_2}(X_j, W_{1j} \mid W_{2j})}, \tag{40}$$
with kernel weights $K(\cdot)$ […]. This approximating objective function has the general form
assumed in Kong, Linton, and Xia (2010) if we set
$$\tilde m(Y, X, W; a, b, x) := m\bigl(Y - a - b(X - x)\bigr) - E\bigl[m\bigl(Y - a - b(X - x)\bigr) \mid X, W\bigr] \;[\ldots], \tag{41}$$
providing us with the desired Bahadur expansion. Choosing the appropriate sequence of
experiments, from here on the entire proof and result of theorem 2 go through unchanged.

Proof of theorem 5: The proof requires the following modifications relative to the
one-dimensional case. Assumption 2 is still applicable, where the only difference in the
$d$-dimensional case is that (6) has to be multiplied by $1/\tau^{d-1}$. For $\hat g'$ to have a pointwise
non-degenerate distributional limit, we have to choose the rate $r_n$ […], which is slower for
higher $d$; to see this, note that $\mathrm{Var}(\hat g')$ […]. $L_\kappa$ is Lipschitz continuous of order
$\kappa^{-(d+1)}$, so that we require $\tau/\kappa \to 0$ for step 4 of the proof. The range of integration has
to be partitioned into rectangular subranges of volume $\tau^d$ instead of intervals of length $\tau$.
There will be approximately $\mathrm{const}\cdot(\kappa/\tau)^d$ such subintegrals. The variance of the integral of
$|\hat g'|$ over each of these subranges will be of order $\tau^{2d}$, and similarly for expectations and
covariances. This yields a variance of $\hat Z$ of order $(\tau/\kappa)^d$; see point 7 of the proof. Finally,
if […] and the variance of […] $\neq$ const., the rates have to be adapted as follows. The number of
observations within each rectangle of size […]

Proof of theorem 6: By equations (23) and (25), it is sufficient to show that
$r_n(\hat g'_{11} - g'_{11})$ and $r_n(\hat g'_{12} - g'_{12})$, evaluated at the relevant points […], converge
jointly in distribution, while […], as well as […], converge in probability. These claims follow
as before if we combine the convergence of $r_n(\ldots)$ from display (31) with Bahadur expansion (6)
for $\hat g'_{12}$ and $\hat g'_{11}$, where the latter are evaluated at $x_{2,n}$, which is not constant but
converges.
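Proof of lemma 1: By definition of conditional quantiles,
$F_{\Delta X|X}\bigl(Q_{\Delta X|X}(q|X) \mid X\bigr) = q$. Differentiating this with respect to $X$ gives
$$\frac{\partial}{\partial X} Q_{\Delta X|X}(q|X) = -\frac{\frac{\partial}{\partial X} F_{\Delta X|X}(Q|X)}{f_{\Delta X|X}(Q|X)}. \tag{42}$$
The differential in the numerator has two components, one due to the structural relation between
$\Delta X$ and $X$, i.e., the derivative with respect to the argument $X$ of $g(X, \eta)$, and one due to the
stochastic dependence of $X$ and $\eta$:
$$\frac{\partial}{\partial X} Q_{\Delta X|X}(q|X) = E\bigl[g_X \mid \Delta X = Q, X\bigr] - \frac{\frac{\partial}{\partial X} P\bigl(g(X', \eta) \le Q \mid X\bigr)\big|_{X'=X}}{f_{\Delta X|X}(Q|X)}.$$
This can be seen as follows: We can decompose the derivative according to
$$\frac{\partial}{\partial X} F_{\Delta X|X}(Q|X) = \frac{\partial}{\partial X'} P\bigl(g(X', \eta) \le Q \mid X\bigr)\Big|_{X'=X} + \frac{\partial}{\partial X} P\bigl(g(X', \eta) \le Q \mid X\bigr)\Big|_{X'=X}.$$
To simplify the first derivative, note that by iterated expectations
$P(g(X', \eta) \le Q \mid X) = E\bigl[P(g(X', \eta) \le Q \mid X, g_X) \mid X\bigr]$. Differentiating this with
respect to $X'$ gives
$$\frac{\partial}{\partial X'} P\bigl(g(X', \eta) \le Q \mid X\bigr)\Big|_{X'=X} = -E\bigl[g_X\,f_{\Delta X|g_X,X}(Q \mid g_X, X) \mid X\bigr] = -f_{\Delta X|X}(Q|X)\,E\bigl[g_X \mid \Delta X = Q, X\bigr].$$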
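The decomposition of the derivative of a conditional quantile can be checked numerically in a simple example. The sketch below assumes a hypothetical model $\Delta X = g(X, \eta) = AX + \eta$ with $\eta \mid X \sim N(BX, s(X)^2)$ and $s(X) = 1 + 0.5X^2$, for which $g_X = A$ and the closed form is $\partial Q/\partial X = A + B + s'(X)\,\Phi^{-1}(q)$. It compares a finite-difference derivative of the conditional quantile with $E[g_X \mid \Delta X = Q, X] - \partial_X P(g(X', \eta) \le Q \mid X)|_{X'=X} / f_{\Delta X|X}(Q|X)$, using only Python's standard library (`statistics.NormalDist`).

```python
from statistics import NormalDist

A, B = -0.4, 0.3  # hypothetical: Delta X = A*X + eta, eta | X ~ N(B*X, s(X)^2)


def s(x):
    """Conditional standard deviation of eta given X = x (hypothetical)."""
    return 1.0 + 0.5 * x * x


def prob_below(q_val, x_struct, x_cond):
    """P(g(x_struct, eta) <= q_val | X = x_cond), with g(x, eta) = A*x + eta."""
    return NormalDist(B * x_cond, s(x_cond)).cdf(q_val - A * x_struct)


def quantile(q, x):
    """Conditional quantile Q_{Delta X|X}(q | x)."""
    return A * x + NormalDist(B * x, s(x)).inv_cdf(q)


def dq_dx_numeric(q, x, h=1e-5):
    """Central finite difference of the conditional quantile in x."""
    return (quantile(q, x + h) - quantile(q, x - h)) / (2 * h)


def dq_dx_decomposition(q, x, h=1e-5):
    """Structural term minus the stochastic-dependence term over the density."""
    qv = quantile(q, x)
    structural = A  # g_X = A is constant here, so E[g_X | Delta X = Q, X] = A
    dp = (prob_below(qv, x, x + h) - prob_below(qv, x, x - h)) / (2 * h)
    density = NormalDist(B * x, s(x)).pdf(qv - A * x)
    return structural - dp / density


if __name__ == "__main__":
    q, x = 0.9, 2.0
    print(dq_dx_numeric(q, x), dq_dx_decomposition(q, x), A + B + x * NormalDist().inv_cdf(q))
```

Only the conditioning argument of `prob_below` is perturbed in the second term, which isolates the stochastic dependence of $X$ and $\eta$ from the structural effect $g_X$.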
Proof of proposition 3: Since $X$ and $X + \Delta X$ have their support in the interval $[0,1]$,
$Q_{\Delta X|X}(q|0) \geq 0$ and $Q_{\Delta X|X}(q|1) \leq 0$. Therefore the unique root $X$ of $Q_{\Delta X|X}(q|\cdot)$ must
be stable […]. By lemma 1 and assumption, […] $E[g_X \mid \Delta X = Q, X]$ […]. Finally, note that for
all $X$ where $(0, X)$ is in the support of […], there is a $q$ such that $Q_{\Delta X|X}(q|X) = 0$.

Proof of proposition 4: The claims are immediate, noting that $N_c = \bigcap_s [x^s_c, x^s_{c+1}]$ […] for
all $s$, $c = 1, 3, \ldots$, and similarly for $P$, from which negativity on $N_c$ for all $s$, $c = 2, 4, \ldots$,
follows, similarly for $P$. Finally, under monotonicity of potential outcomes, assuming for
simplicity differentiability of $g(\cdot, e)$, the implicit function theorem gives, for the roots $x_c$
of $g$,
$$\frac{\partial x_c}{\partial e} = -\frac{\partial g/\partial e}{\partial g/\partial x}\bigg|_{x = x_c}.$$
The numerator is always positive by assumption, while the denominator is negative for
$c = 1, 3, \ldots$ and positive for $c = 2, 4, \ldots$, since we had assumed $g$ positive for sufficiently
small $x$. Hence $\partial x_c/\partial e$ is positive for $c = 1, 3, \ldots$ and negative for $c = 2, 4, \ldots$. The claim
now is immediate.
REFERENCES

Andrews, D., and X. Cheng (2010): "Estimation and inference with weak, semi-strong, and strong identification."
Azariadis, C., and J. Stachurski (2005): "Poverty traps," Handbook of Economic Growth, 1, 295–384.
Bajari, P., H. Hong, J. Krainer, and D. Nekipelov (2006): "Estimating static models of strategic interaction," NBER working paper.
Becker, G., and K. Murphy (2000): Social economics: Market behavior in a social environment. Harvard University Press.
Berry, S. (1992): "Estimation of a model of entry in the airline industry," Econometrica, 60(4), 889–917.
Bowles, S., S. Durlauf, and K. Hoff (2006): Poverty traps. Princeton University Press.
Bresnahan, T., and P. Reiss (1991): "Entry and competition in concentrated markets," Journal of Political Economy, 99(5), 977–1009.
Card, D., A. Mas, and J. Rothstein (2008): "Tipping and the dynamics of segregation," Quarterly Journal of Economics, 123(1), 177–218.
Choirat, C., and R. Seri (2012): "Estimation in Discrete Parameter Model," forthcoming in Statistical Science.
Dasgupta, P., and D. Ray (1986): "Inequality as a determinant of malnutrition and unemployment: Theory," The Economic Journal, 96(384), 1011–1034.
Giné, E., D. Mason, and A. Zaitsev (2003): "The L1-norm density estimator process," The Annals of Probability, 31(2), 719–768.
Hoeffding, W., and H. Robbins (1994): "The central limit theorem for dependent random variables," in The collected works of Wassily Hoeffding. Springer.
Horowitz, J. (2001): "The Bootstrap," Handbook of Econometrics, 5, 3159–3228.
Horváth, L. (1991): "On Lp-norms of multivariate density estimators," The Annals of Statistics, 19(4), 1933–1949.
Imbens, G., and J. Wooldridge (2007): "What's new in econometrics? Weak instruments and many instruments," NBER Lecture Notes 13, Summer 2007.
Kasy, M. (2010): "Nonparametric inference on the number of equilibria, web appendix," https://sites.google.com/site/maxkasywp/Home/wps/Appendixtestingmulteq.pdf.
Koenker, R. (2009): "Quantile Regression," http://www.econ.uiuc.edu/~roger/research/rq/rq.html, accessed January 30, 2009.
Kong, E., O. Linton, and Y. Xia (2010): "Uniform Bahadur representation for local polynomial estimates of M-regression and its application to the additive model," Econometric Theory, 26, 1–36.
Newey, W. K. (1994): "Kernel estimation of partial means and a general variance estimator," Econometric Theory, 10(2), 233–253.
Quah, D. (1996): "Empirics for economic growth and convergence," European Economic Review, 40(6), 1353–1375.
Staiger, D., and J. H. Stock (1997): "Instrumental Variables Regression with Weak Instruments," Econometrica, 65(3), 557–586.
van der Vaart, A. (1998): Asymptotic statistics. Cambridge University Press.
Young, H. (2008): "Social norms," in The New Palgrave Dictionary of Economics, ed. by S. Durlauf and L. Blume, vol. 2.