NONPARAMETRIC INFERENCE ON THE NUMBER OF EQUILIBRIA

Maximilian Kasy

May 5, 2012

This paper proposes an estimator and develops an inference procedure for the number of roots of functions which are nonparametrically identified by conditional moment restrictions. It is shown that a smoothed plug-in estimator of the number of roots is super-consistent under i.i.d. asymptotics, but asymptotically normal under non-standard asymptotics. The smoothed estimator is furthermore asymptotically efficient relative to a simple plug-in estimator. The procedure proposed is used to construct confidence sets for the number of equilibria of static games of incomplete information and of stochastic difference equations. In an application to panel data on neighborhood composition in the United States, no evidence of multiple equilibria is found.

Keywords: Nonparametric Testing, Multiple Equilibria.

1. INTRODUCTION

Some economic systems¹ show large and persistent differences in outcomes even though the observable exogenous factors influencing these systems differ little. One explanation for such persistent differences in outcomes is multiplicity of equilibria. If a system indeed has multiple equilibria, temporary, large interventions might have a permanent effect, by shifting the equilibrium attained, while long-lasting, small interventions might not have a permanent effect. Knowing the number of equilibria, and in particular whether there are multiple equilibria, is of interest in many economic contexts.

Multiple equilibria and poverty traps are discussed by Dasgupta and Ray (1986), Azariadis and Stachurski (2005), and Bowles, Durlauf, and Hoff (2006). Poverty traps can arise, for instance, if an individual's productivity is a function of her income and if wage income reflects productivity, as in models of efficiency wages. Productivity might depend on wages because nutrition and health are improving with income.
If this feedback mechanism is strong enough, there might be multiple equilibria, and extreme poverty might be self-perpetuating. In that case, public investments in nutrition and health can permanently lift families out of poverty.

Assistant Professor, Department of Economics, UCLA, and junior associate faculty, IHS Vienna. Address: 8283 Bunche Hall, Mail Stop: 147703, Los Angeles, CA 90095. E-Mail: maxkasy@econ.ucla.edu. I thank seminar participants at UC Berkeley, UCLA, USC, Brown, NYU, UPenn, LSE, UCL, Sciences Po, TSE, Mannheim and IHS Vienna for their helpful comments and suggestions. I particularly thank David Card, Kiril Datchev, Jinyong Hahn, Michael Jansson, Bryan Graham, Susanne Kimm, Patrick Kline, Rosa Matzkin, Enrico Moretti, Denis Nekipelov, James Powell, Alexander Rothenberg, Jesse Rothstein, James Stock and Mark van der Laan for many valuable discussions and David Card, Alexander Mas and Jesse Rothstein for the access provided to their data. This work was supported by a DOC fellowship from the Austrian Academy of Sciences at the Department of Economics, UC Berkeley.

¹ "System" might refer to households, firms, urban neighborhoods, national economies, etc.

Multiple equilibria and urban segregation are discussed by Becker and Murphy (2000) and Card, Mas, and Rothstein (2008). Urban segregation, along ethnic or sociodemographic dimensions, might arise because households' location choices reflect a preference over neighborhood composition. If this preference is strong enough, different compositions of a neighborhood can be stable, given constant exogenous neighborhood properties. Transition between different stable compositions might lead to rapid composition change, or "tipping," as in the case of gentrification of a neighborhood. Interest in such tipping behavior motivated Card, Mas, and Rothstein (2008), and is the focus of the application discussed in section 4 of this paper.
Multiple equilibria and the market entry of firms are discussed by Bresnahan and Reiss (1991) and Berry (1992). Entering a market might only be profitable for a firm if its competitors do not enter that same market. As a consequence, different configurations of which firms serve which markets might be stable.

In sociology, finally, multiple equilibria are of interest in the context of social norms. If the incentives to conform to prevailing behaviors are strong enough, different behavioral patterns might be stable norms, i.e., equilibria; see Young (2008). Transitions between such stable norms correspond to social change. One instance where this has been discussed is the assimilation of immigrant communities into the mainstream culture of a country.

This paper develops an estimator and an inference procedure for the number of equilibria of economic systems. It will be assumed that the equilibria of a system can be represented as solutions to the equation $g(x) = 0$. It will furthermore be assumed that $g$ can be identified by some conditional moment restriction. The procedure proposed here provides confidence sets for the number $Z(g)$ of solutions to the equation $g(x) = 0$.

This procedure can be summarized as follows. In a first stage, $g$ and its derivative $g'$ are nonparametrically estimated. These first stage estimates of $g$ and $g'$ are then plugged into a smooth functional $Z_\rho$, as defined in equation (4) below. We show that under standard i.i.d. asymptotics, and for $\rho$ small enough, the continuously distributed $Z_\rho(\hat g)$ converges to the integer valued $Z(g)$ at an infinite rate. A superconsistent estimator² of $Z(g)$ can thus be formed by projecting $Z_\rho(\hat g)$ on the closest integer. We then show that a rescaled version of $Z_\rho(\hat g)$ converges to a normal distribution under a non-standard sequence of experiments. This non-standard sequence of experiments is constructed using increasing levels of noise and shrinking bandwidth as sample size increases.
Under this same sequence of experiments, the bootstrap provides consistent estimates of the bias and standard deviation of $Z_\rho(\hat g)$ relative to $Z(g)$. We can thus construct confidence sets for $Z(g)$ using t-tests. These confidence sets are sets of integers containing the true number of roots with a pre-specified asymptotic probability of $1 - \alpha$.

² An estimator is called superconsistent if it converges at a rate faster than the usual parametric rate, which equals the square root of the sample size.

An alternative to the procedure proposed here would be to use the simple plug-in estimator $Z(\hat g)$. This estimator just counts the roots of the first stage estimate of $g$. We show, however, that the simple plug-in estimator is asymptotically inefficient relative to the smoothed estimator $Z_\rho(\hat g)$ under the non-standard sequence of experiments considered.

Sections 3.4 and 3.5 discuss two general setups that allow us to translate the hypothesis of multiple equilibria into a hypothesis on the number of roots of some identifiable function $g$; these setups are static games of incomplete information and stochastic difference equations. Section 3.4 discusses a nonparametric model of static games of incomplete information, similar to the one analyzed in Bajari, Hong, Krainer, and Nekipelov (2006).³ Under the assumptions detailed in section 3.4, we can nonparametrically identify the average best response functions of the players in a static incomplete information game. This allows us to represent the Bayesian Nash equilibria of this game as roots of an estimable function. Section 3.4 discusses how to perform inference on the number of such Bayesian Nash equilibria.

Section 3.5 considers panel data of observations of some variable $X$, where $X$ is generated by a general nonlinear stochastic difference equation. This is motivated by the study of neighborhood composition dynamics in Card, Mas, and Rothstein (2008).
Section 3.5 argues that we can construct tests for the null hypothesis of equilibrium multiplicity of such nonlinear difference equations by testing whether nonparametric quantile regressions of $\Delta X$ on $X$ have multiple roots.

The rest of this paper is structured as follows. Section 2 presents the inference procedure and its asymptotic justification for the baseline case. Section 3 discusses generalizations, as well as identification and inference in static games of incomplete information and in stochastic difference equations. Section 4 applies the inference procedure to the data on neighborhood composition studied by Card, Mas, and Rothstein (2008). In contrast to their results, no evidence of "tipping" (equilibrium multiplicity) is found here. Section 5 concludes. Appendix A presents some Monte Carlo evidence. All proofs are relegated to appendix B. Additional figures and tables are in the web appendix, Kasy (2010). This web appendix also contains a second application of the inference procedure to data on economic growth, similar to those discussed by Azariadis and Stachurski (2005), section 4.1, and by Quah (1996).

³ Note that this paper does not contribute to the literature discussing identification and estimation problems in games of complete information with multiple equilibria.

2. INFERENCE IN THE BASELINE CASE

2.1. Setup

Throughout this paper, the parameter of interest is the number of roots $Z$ of some function $g$ on a subset $\mathcal{X}$ of its support:
$$Z(g) := |\{x \in \mathcal{X} : g(x) = 0\}|. \tag{1}$$
Interest in this parameter is motivated by economic models in which the equilibria can be represented as roots of such a function $g$. Identification of the parameter $Z(g)$ follows from identification of $g$ on $\mathcal{X}$. In this section, inference on $Z(g)$ is discussed for functions $g$ with one dimensional and compact domain and range. Throughout, the following assumption will be maintained.
Assumption 1. The observable data are i.i.d. draws of $(Y_i, X_i)$. The set $\mathcal{X}$ is compact, and the density of $X$ is bounded away from 0 on $\mathcal{X}$. The function $g$ is identified by a conditional moment restriction of the form
$$g(x) = \operatorname{argmin}_y E_{Y|X}[m(Y - y) \mid X = x]. \tag{2}$$
The function $g$ is continuously differentiable and generic in the sense of definition 1 below.

Examples of functions characterized by conditional moment restrictions as in equation (2) are conditional mean regressions, for which $m(\delta) = \delta^2$, and conditional $q$th quantile regressions, for which $m_q(\delta) = \delta\,(q - \mathbf{1}(\delta < 0))$.

Definition 1 (Genericity). A continuously differentiable function $g$ is called generic if $\{x : g(x) = 0 \text{ and } g'(x) = 0\} = \emptyset$, and if all roots of $g$ are in the interior of $\mathcal{X}$.

Genericity of $g$ implies that $g$ has only a finite number of roots.⁴

⁴ Suppose that $g$ has an infinite number of roots in the compact set $\mathcal{X}$. Then the set of $x$ such that $g(x) = 0$ has an accumulation point in $\mathcal{X}$. At this accumulation point genericity is violated.

We propose the following inference procedure for the number of roots of $g$, $Z(g)$: First, estimate $g(\cdot)$ and $g'(\cdot)$ using local linear m-regression:
$$(\hat g(x), \hat g'(x)) = \operatorname{argmin}_{a,b} \sum_i K_\tau(X_i - x)\, m(Y_i - a - b(X_i - x)), \tag{3}$$
where $K_\tau(\delta) = \frac{1}{\tau} K(\delta/\tau)$ for some (symmetric, positive) kernel function $K$ integrating to one with bandwidth $\tau$. Equation (3) is a sample analog of equation (2), where a kernel weighted local average is replacing the conditional expectation. Next, calculate $\hat Z = Z_\rho(\hat g(\cdot), \hat g'(\cdot))$, where $Z_\rho$ is defined as
$$Z_\rho(g(\cdot), g'(\cdot)) := \int_{\mathcal{X}} L_\rho(g(x))\, |g'(x)|\, dx, \tag{4}$$
where $L_\rho$ is a Lipschitz continuous, positive symmetric kernel integrating to 1 with bandwidth $\rho$ and support $[-\rho, \rho]$. Estimate the variance and bias of $\hat Z$ relative to $Z$ using bootstrap.
Finally, construct integer valued confidence sets for $Z$ using t-statistics based on $\hat Z$ and the bootstrapped variance and bias.

2.2. Basic properties and consistency

The rest of this section will motivate and justify this procedure. First, we will see that $\hat Z$ is a superconsistent estimator of $Z$, in the sense that $\alpha_n(\hat Z - Z) \xrightarrow{p} 0$ for any diverging sequence $\alpha_n \to \infty$, under i.i.d. sampling and conditions to be stated. Then we will present the central result of this paper, which establishes asymptotic normality of $\hat Z$ under a non-standard sequence of experiments. From this result it follows that inference based on t-statistics, using bootstrapped standard errors and bias corrections, provides asymptotically valid confidence sets for $Z$. We also show that $\hat Z$ is an efficient estimator relative to the simple plug-in estimator $Z(\hat g)$ under the non-standard asymptotic sequence.

We are mainly concerned with constructing confidence sets for $Z$, rather than a point estimator. A point estimator could be formed by projecting $\hat Z$ on the closest integer. While $\hat Z$ will be called an estimator of $Z(g)$, it should be kept in mind that its primary role is as an intermediate statistic in the construction of confidence sets.

The following proposition states that $Z(g) = Z_\rho(g)$ for generic $g$ and $\rho$ small enough. The two functionals only differ around non-generic $g$, or "bifurcation points," that is $g$ where $Z$ jumps. The functional $Z_\rho$ is a smooth approximation of $Z$ which varies continuously around such jumps.

Proposition 1. For $g$ continuously differentiable and generic, if $\rho > 0$ is small enough, then
$$Z_\rho(g(\cdot), g'(\cdot)) = Z(g(\cdot)).$$

All proofs are relegated to appendix B. The intuition underlying proposition 1 is as follows: Given a generic function $g$, consider the subset of $\mathcal{X}$ where $L_\rho(g)$ is not zero. If $\rho$ is small enough, this subset is partitioned into disjoint neighborhoods of the roots of $g$, and $g$ is monotonic in each of these neighborhoods.
A change of variables, setting $y = g(x)$, shows that the integral over each of these neighborhoods equals one.

Figure 1 illustrates the relationship between $Z$ and $Z_\rho$. The two functionals are equal if $g$ does not peak within the range $[-\rho, \rho]$. If $g$ does peak within the range $[-\rho, \rho]$, they are different and $Z_\rho$ is not integer valued.

[Figure 1. $Z$ and $Z_\rho$. Notes: This figure illustrates the relationship between $Z$ and $Z_\rho$. For the functions $g$ depicted, $Z(g_1) = Z_\rho(g_1) = 0$, $Z(g_2) = 0 < Z_\rho(g_2) < 1$, $Z(g_3) = 2 > Z_\rho(g_3) > 1$, and $Z(g_4) = Z_\rho(g_4) = 2$.]

It is useful to equip the space of continuously differentiable functions on the compact set $\mathcal{X}$, $C^1(\mathcal{X})$, with the following norm:
$$\|g\| := \sup_{x \in \mathcal{X}} |g(x)| + \sup_{x \in \mathcal{X}} |g'(x)|. \tag{5}$$
This is the uniform first order Sobolev norm on $C^1(\mathcal{X})$. Given this norm, we have the following proposition:

Proposition 2 (Local constancy). $Z(\cdot)$ is constant in a neighborhood, with respect to the norm $\|\cdot\|$, of any generic function $g \in C^1$, and so is $Z_\rho$ if $\rho$ is small enough.

Using a neighborhood of $g$ with respect to the sup norm in levels only, instead of $\|\cdot\|$, is not enough for the assertion of proposition 2 to hold. For any function $g_1$ that has at least one root we can find a function $g_2$ arbitrarily close to $g_1$ in the uniform sense which has more roots than $g_1$, by adding a "wiggle" around a root of $g_1$. Figure 2 illustrates by showing two functions which are uniformly close in levels but not in derivatives, and which have different numbers of roots. If one, however, additionally restricts the first derivative of $g_2$ to be uniformly close to the derivative of $g_1$, such additional roots are ruled out. This is also why the derivative, which is "harder" to estimate than the level of $g$, dominates the asymptotic distribution.

[Figure 2. Two functions that are uniformly close in levels but not in derivatives, and that have different numbers of roots.]

Proposition 2 immediately implies that the plug-in estimator $\hat Z = Z_\rho(\hat g(\cdot), \hat g'(\cdot))$ converges to a degenerate limiting distribution at an "infinite" rate, if $\hat g$ converges with respect to the norm $\|\cdot\|$. The following theorem states this formally.

Theorem 1 (Superconsistency). If $(\hat g, \hat g')$ converges uniformly in probability to $(g, g')$,⁵ if $g$ is generic and if $\alpha_n \to \infty$ is some arbitrary diverging sequence, then
$$\alpha_n \big(Z_\rho(\hat g, \hat g') - Z_\rho(g, g')\big) \xrightarrow{p} 0.$$
Furthermore, if $\rho$ is small enough so that $Z_\rho(g, g') = Z(g)$ holds, then
$$\alpha_n \big(Z_\rho(\hat g, \hat g') - Z(g)\big) \xrightarrow{p} 0.$$

This result implies that $\alpha_n (Z(\hat g) - Z(g)) \xrightarrow{p} 0$, and that $\alpha_n (\hat Z - Z(g)) \xrightarrow{p} 0$ if $\rho \to 0$ as $n \to \infty$.

⁵ Note that this is a slightly different condition from convergence of $\hat g$ w.r.t. the norm $\|\cdot\|$ since $\hat g'$ need not equal the derivative of $\hat g$.

2.3. Asymptotic normality and relative efficiency

We have shown our first claim, superconsistency of $\hat Z$ given uniform convergence of $(\hat g, \hat g')$. We will show next our second claim, asymptotic normality of $\hat Z$ under a non-standard sequence of experiments. This section will then conclude by formally stating the efficiency of $\hat Z$ relative to the simple plug-in estimator $Z(\hat g)$.

To further characterize the asymptotic distribution of $\hat Z$, we need a suitable approximation for the distribution of the first stage estimator $(\hat g(\cdot), \hat g'(\cdot))$. Kong, Linton, and Xia (2010) provide uniform Bahadur representations for local polynomial estimators of m-regressions. We state their result, for the special case of local linear m-regression, as an assumption.

Assumption 2 (Bahadur expansion). The estimation error of the estimator $(\hat g(x), \hat g'(x))$ defined by equation (3) can be approximated by a local average as follows:
$$\begin{pmatrix} \hat g(x) \\ \hat g'(x) \end{pmatrix} - \begin{pmatrix} g(x) \\ g'(x) \end{pmatrix} = -I_n(x)\, \frac{s^{-1}(x)}{n f_x(x)} \sum_i K_\tau(X_i - x)\, \varphi\big(Y_i - g(x) - g'(x)(X_i - x)\big) \begin{pmatrix} 1 \\ \dfrac{X_i - x}{\tau^2 \kappa_2} \end{pmatrix} + R, \tag{6}$$
where $\varphi := m'$ (in a piecewise derivative sense), $\kappa_2 := \int K(x) x^2\, dx$, $f_x$ is the density of $x$, $s(x) = \frac{\partial}{\partial g(x)} E[\varphi(Y - g(x)) \mid X = x]$, and $I_n(x)$ is a non-random matrix converging uniformly to the identity matrix, and where
$$R = o_p\big((\hat g(x), \hat g'(x)) - (g(x), g'(x))\big)$$
uniformly in $x$.⁶
This assumption is only well defined in the context of a sequence of experiments. In theorem 2 below, this assumption will be understood to hold relative to the sequence of experiments defined in assumption 3. In the case of $q$th quantile regression, $\varphi(\delta) = q - \mathbf{1}(\delta < 0)$ and $s(x) = -f_{y|x}(g(x) \mid x)$. In the case of mean regression, $\varphi(\delta) = 2\delta$ and $s(x) = -2$. The results in the remainder of this section depend on the availability of an expansion in the form of expansion (6) and the relative negligibility of the remainder, but not on any other specifics of local linear m-regression. This will allow for fairly straightforward generalizations of the baseline case considered here to the cases discussed in section 3 as well as to other cases which are beyond the scope of this paper, once we have appropriate expansions for the first stage estimators.

⁶ Kong, Linton, and Xia (2010) provide regularity conditions under which $R = O_p\big((\log(n)/(n\tau))^{\lambda}\big)$ uniformly in $\mathcal{X}$, for some $\lambda \in (0, 1)$ as $n \to \infty$ for stationary mixing processes.

By proposition 2, consistency of any plug-in estimator follows from uniform convergence of $(\hat g(\cdot), \hat g'(\cdot))$. Such uniform convergence follows from assumption 2, combined with a Glivenko-Cantelli theorem on uniform convergence of averages, assuming i.i.d. draws from the joint distribution of $(Y, X)$ as $n \to \infty$; see van der Vaart (1998), chapter 19. Superconsistency of $\hat Z$ therefore follows, which implies that standard i.i.d. asymptotics with rescaling of the estimator yield only degenerate distributional approximations. This is because $Z_\rho$ and $Z$ are constant in a $C^1$ neighborhood of any generic $g$, even though they jump at "bifurcation points", i.e., non-generic $g$. As a consequence, all terms in a functional Taylor expansion of $Z_\rho$, as a function of $g$, vanish, except for the remainder. The application of "delta method" type arguments, as in Newey (1994), gives only the degenerate limit distribution.
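The local constancy behind this degeneracy can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes $\mathcal{X} = [0, 1]$, a triangular kernel for $L_\rho$, and three illustrative functions (`g0`, `g1`, `g2`) that do not appear in the paper. It evaluates the integral in equation (4) by the trapezoid rule and counts sign changes on a fine grid as a proxy for $Z$.

```python
import numpy as np


def smoothing_kernel(u, rho):
    """Triangular kernel L_rho: Lipschitz, symmetric, support [-rho, rho], integral 1."""
    return np.maximum(1.0 - np.abs(u) / rho, 0.0) / rho


def z_rho(g, dg, rho, n_grid=200_001):
    """Trapezoid approximation of Z_rho = int_0^1 L_rho(g(x)) |g'(x)| dx."""
    x = np.linspace(0.0, 1.0, n_grid)
    f = smoothing_kernel(g(x), rho) * np.abs(dg(x))
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))


def count_sign_changes(g, n_grid=200_001):
    """Number of sign changes of g on a fine grid of [0, 1], a proxy for Z(g)."""
    s = np.sign(g(np.linspace(0.0, 1.0, n_grid)))
    return int(np.sum(s[1:] * s[:-1] < 0))


# Illustrative generic function: interior roots at 1/6, 1/2, 5/6, peaks at +-1.
def g0(x):
    return np.cos(3 * np.pi * x)


def dg0(x):
    return -3 * np.pi * np.sin(3 * np.pi * x)


# Perturbation that is small in levels AND in derivatives (close in the norm (5)).
def g1(x):
    return g0(x) + 0.02 * np.sin(5 * x)


def dg1(x):
    return dg0(x) + 0.10 * np.cos(5 * x)


# "Wiggle": small in levels (0.05) but large in derivatives (0.05 * 2000 = 100).
def g2(x):
    return g0(x) + 0.05 * np.sin(2000 * (x - 0.5))


z0 = z_rho(g0, dg0, rho=0.2)
z1 = z_rho(g1, dg1, rho=0.2)
roots_wiggle = count_sign_changes(g2)
print(z0, z1, roots_wiggle)
```

In this sketch the perturbation that is small in both levels and derivatives leaves $Z_\rho$ at the root count of `g0`, while the wiggle, which is small only in levels, adds roots.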
In finite samples, however, the sampling variation of $\hat Z$ is in general not negligible, as the simulations of appendix A confirm, which makes the distributional approximation of the degenerate limit useless for inference. Asymptotic statistical theory approximates the finite sample distribution of interest by a limiting distribution of a sequence of experiments, of which our actual experiment is an element. The choice of sequence, such as i.i.d. sampling, is to some extent arbitrary. In econometrics, non-standard asymptotics are used for instance in the literature on weak instruments (e.g., Staiger and Stock (1997), Imbens and Wooldridge (2007), Andrews and Cheng (2010)).

In the present setup, a non-degenerate distributional limit of $\hat Z$ can only be obtained under a sequence of experiments which yields a non-degenerate limiting distribution of the first stage estimator $(\hat g(\cdot), \hat g'(\cdot))$.⁷ We will now consider asymptotics under such a sequence of experiments. The sequence we consider has increasing amounts of "noise" relative to "signal" as sample size increases.

⁷ The approach of this paper, using local asymptotics, contrasts with the approach taken by most of the literature discussing inference on discrete valued parameters, testing and model selection. As argued by Choirat and Seri (2012), this literature has mostly focused on the use of large deviations asymptotics. The reason is that consistent estimators for discrete objects tend to converge at an exponential rate. Which type of asymptotics provides a more accurate approximation of finite sample distributions ultimately depends on the specific data generating process, cf. Andrews and Cheng (2010).

Assumption 3. Experiments are indexed by $n$, and for the $n$th experiment we observe $(Y_{i,n}, X_{i,n})$ for $i = 1, \ldots, n$. The observations $(X_{i,n}, Y_{i,n})$ are i.i.d. given
$n$, and
$$X_{i,n} \sim f_x \tag{7}$$
$$\epsilon_{i,n} \mid X_{i,n} \sim f_{\epsilon|X} \tag{8}$$
$$Y_{i,n} = g(X_{i,n}) + r_n \epsilon_{i,n}, \tag{9}$$
where $\{r_n\}$ is a real-valued sequence and
$$0 = \operatorname{argmin}_a E[m(\epsilon - a) \mid X] = \operatorname{argmin}_a E[m(r_n \epsilon - a) \mid X].$$
The last equality requires the criterion function $m$ to be "scale neutral".

For a given sample size $n$, this is the same model as before. As $n$ changes, the function $g$ identified by equation (2) is held constant. If $r_n$ grows in $n$, the estimation problem in this sequence of models becomes increasingly difficult relative to i.i.d. sampling. Note that equation (9) does not describe an additive structural model, which would allow us to predict counterfactual outcomes. Instead, $r_n \epsilon_{i,n}$ is simply the statistical residual, given by the difference of $Y$ and $g(X)$, which is also well-defined for non-additive structural models.

By corollary 1, a necessary condition for a non-degenerate limit of $\hat Z$ is that $(\hat g, \hat g')$ converges to a non-degenerate limiting distribution. As is well known, and also follows from assumption 2, $\hat g'$ converges at a slower rate than $\hat g$, so that asymptotically variation in $\hat g'$ will dominate, namely by adding "wiggles" around the actual roots. If $r_n = (n\tau^5)^{1/2}$ in the sequence of experiments just defined, $\hat g$ converges uniformly in probability to $g$, whereas $\hat g'$ converges point-wise (and indeed functionally) to a non-degenerate limit. This is the basis for the following theorem.⁸

Theorem 2 (Asymptotic normality). Under assumptions 1, 2, and 3, and if $r_n = (n\tau^5)^{1/2}$, $n\tau \to \infty$, $\rho \to 0$ and $\tau/\rho^2 \to 0$, then there exist $\mu > 0$ and $V$ such that
$$\sqrt{\rho/\tau}\,\big(\hat Z - \mu - Z\big) \to N(0, V)$$
for $\hat Z = Z_\rho(\hat g, \hat g')$. Both $\mu$ and $V$ depend on the data generating process only via the asymptotic mean and variance of $\hat g'$ at the roots of $g$, which in turn depend upon $f_X$, $g'$, $s$ and $\operatorname{Var}(\varphi \mid X)$ evaluated at the roots of $g$.

This theorem justifies the use of t-tests based on $\hat Z$ for null hypotheses of the form $Z(g) = Z_\rho(g) = z_0$.
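To make the setup of theorem 2 concrete, the following sketch simulates one experiment from the sequence in assumption 3 and computes $\hat Z$ from equations (3) and (4). It is an illustration under stated assumptions, not the paper's code: the regression function $\cos(3\pi x)$, the uniform design on $[0, 1]$, the standard normal residuals, the Gaussian kernel for $K$, the triangular kernel for $L_\rho$, and the values of `n`, `tau` and `rho` are all hypothetical choices. Only the mean regression case $m(\delta) = \delta^2$ is covered, for which the minimization in (3) has a closed-form weighted least squares solution.

```python
import numpy as np


def local_linear(x_grid, X, Y, tau):
    """Local linear mean regression (m(d) = d^2) with a Gaussian kernel.

    Returns fitted levels g_hat and slopes dg_hat at each point of x_grid.
    """
    d = X[None, :] - x_grid[:, None]            # d[k, i] = X_i - x_k
    w = np.exp(-0.5 * (d / tau) ** 2)           # kernel weights; constants cancel
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d * d).sum(1)
    t0, t1 = (w * Y).sum(1), (w * d * Y).sum(1)
    det = s0 * s2 - s1 ** 2
    # Solution of the weighted least squares normal equations for (a, b).
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det


def z_rho(g_vals, dg_vals, x_grid, rho):
    """Trapezoid approximation of equation (4) with a triangular kernel L_rho."""
    f = np.maximum(1.0 - np.abs(g_vals) / rho, 0.0) / rho * np.abs(dg_vals)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x_grid)))


def simulate(n, tau, rng):
    """One draw from the nth experiment, with noise scale r_n = (n tau^5)^(1/2)."""
    r_n = np.sqrt(n * tau ** 5)
    X = rng.uniform(0.0, 1.0, n)
    eps = rng.standard_normal(n)
    return X, np.cos(3 * np.pi * X) + r_n * eps   # true g has three roots


rng = np.random.default_rng(0)
n, tau, rho = 2000, 0.05, 0.3
X, Y = simulate(n, tau, rng)
x_grid = np.linspace(0.0, 1.0, 401)
g_hat, dg_hat = local_linear(x_grid, X, Y, tau)
z_hat = z_rho(g_hat, dg_hat, x_grid, rho)
print(f"Z_hat = {z_hat:.3f}, nearest integer = {round(z_hat)}")
```

The statistic is continuously distributed; projecting it on the closest integer gives the point estimate discussed in section 2.2.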
The construction of a t-statistic requires a consistent estimator of $V$ and an estimator of $\mu$ converging at a rate faster than $\sqrt{\tau/\rho}$. The last part of theorem 2 suggests a way to obtain those. Any plug-in estimator that consistently estimates the (co)variances of $\hat g'$ under the given sequence of experiments consistently estimates $\mu$ and $V$. One such plug-in estimator is standard bootstrap, that is resampling from the empirical distribution function. The Bahadur expansion in assumption 2, which approximates $\hat g'$ by sample averages, implies that the bootstrap gives a resampling distribution with the asymptotically correct covariance structure for $\hat g'$. From this and theorem 2 it then follows that the bootstrap gives consistent variance and bias estimates for $\hat Z$, where the bias is estimated from the difference of the resampling estimates relative to $Z_\rho(\hat g)$. If sample size grows fast enough relative to the bandwidths, the asymptotic validity of a standard normal approximation for the pivot follows. It would be interesting to develop distributional refinements for this statistic using higher order bootstrapping, along the lines discussed by Horowitz (2001). However, higher order bootstrapping might be very computationally demanding in the present case, in particular if criteria like quantile regression are used to identify $g$.

⁸ The proof of theorem 2 uses somewhat similar arguments as Horváth (1991) and Giné, Mason, and Zaitsev (2003), who discuss the asymptotic distribution of the $L^1$ norm ($L^p$ norm) of kernel density estimators.

Theorem 2 also implies that increasing the bandwidth parameter $\rho$ reduces the variance without affecting the bias in the limiting normal distribution. Asymptotically, the difficulty in estimating $Z$ is driven entirely by fluctuations in $\hat g'$. These fluctuations lead both to upward bias and to variance in plug-in estimators.
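The bootstrap construction described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: it uses mean regression with a Gaussian kernel, a triangular $L_\rho$, simulated data from a hypothetical $g(x) = \cos(3\pi x)$, a normal critical value of 1.96, and the function names `z_hat_stat` and `bootstrap_confidence_set`, none of which come from the paper. The bias is estimated as the mean of the resampled statistics minus $\hat Z$, and the confidence set collects the integers not rejected by the bias-corrected t-test.

```python
import numpy as np


def z_hat_stat(X, Y, x_grid, tau, rho):
    """Smoothed plug-in Z_rho(g_hat, g_hat') from local linear mean regression."""
    d = X[None, :] - x_grid[:, None]
    w = np.exp(-0.5 * (d / tau) ** 2)
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d * d).sum(1)
    t0, t1 = (w * Y).sum(1), (w * d * Y).sum(1)
    det = s0 * s2 - s1 ** 2
    g_hat, dg_hat = (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det
    f = np.maximum(1.0 - np.abs(g_hat) / rho, 0.0) / rho * np.abs(dg_hat)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x_grid)))


def bootstrap_confidence_set(X, Y, x_grid, tau, rho,
                             n_boot=200, crit=1.96, seed=0):
    """Integer confidence set from a bias-corrected bootstrap t-statistic."""
    rng = np.random.default_rng(seed)
    n = len(X)
    z_hat = z_hat_stat(X, Y, x_grid, tau, rho)
    z_star = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample pairs with replacement
        z_star[b] = z_hat_stat(X[idx], Y[idx], x_grid, tau, rho)
    bias = z_star.mean() - z_hat                # resampling estimates relative to Z_hat
    se = z_star.std(ddof=1)
    center = z_hat - bias
    upper = int(np.floor(center + crit * se))
    conf_set = [z for z in range(0, upper + 1) if abs(center - z) <= crit * se]
    return z_hat, bias, se, conf_set


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 1000)
Y = np.cos(3 * np.pi * X) + 0.5 * rng.standard_normal(1000)
x_grid = np.linspace(0.0, 1.0, 201)
z_hat, bias, se, conf_set = bootstrap_confidence_set(X, Y, x_grid, tau=0.08, rho=0.3)
print(z_hat, bias, se, conf_set)
```

The returned set may be empty or contain several integers, depending on the bootstrap standard error; nothing in this sketch guarantees the finite sample coverage of the set.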
When $\rho$ is larger, these fluctuations are averaged over a larger range of $X$, thereby reducing variance.

Theorem 2 implies that $Z_{\rho_1}$ is asymptotically inefficient relative to $Z_{\rho_2}$ for $\rho_1 < \rho_2$. Furthermore, by proposition 1, $Z(g) = \lim_{\rho \to 0} Z_\rho(g)$ for all generic $g$. If the relative inefficiency carries over to the limit as $\rho \to 0$, it follows that the simple plug-in estimator $Z(\hat g)$ is asymptotically inefficient relative to $\hat Z$. Note, however, that this is only a heuristic argument. We cannot exchange the limits with respect to $\rho$ and with respect to $n$ to obtain the limit distribution of $Z(\hat g)$. The following theorem, which is fairly easy to show, states a formally correct version of this argument.

Theorem 3 (Asymptotic inefficiency of the naive plug-in estimator). Consider the setup of theorem 2, and assume $Z(g) > 0$. Then, as $n \to \infty$,
$$\liminf P\big(Z(\hat g) > Z(g)\big) > 0$$
and
$$\operatorname{Var}\big(\sqrt{\rho/\tau}\, Z(\hat g)\big) \to \infty.$$

From this theorem it follows in particular that tests based on $Z(\hat g)$ will in general not be consistent under the sequence of experiments considered, i.e., the probability of false acceptances does not go to zero. This stands in contrast to tests based on $\hat Z = Z_\rho(\hat g, \hat g')$.

3. EXTENSIONS AND APPLICATIONS

In this section, several extensions and applications of the results of section 2 are presented. Subsections 3.1 through 3.3 discuss, in turn, inference on $Z$ if $g$ is identified by more general moment conditions, inference on $Z$ if the domain and range of $g$ are multidimensional, and inference on the number of stable and unstable roots. Subsections 3.4 and 3.5 discuss identification and inference for the two applications mentioned in the introduction, static games of incomplete information and stochastic difference equations.

3.1. Conditioning on covariates

In the previous section, inference on $Z(g)$ was discussed for functions $g$ identified by the moment condition
$$g(x) = \operatorname{argmin}_y E_{Y|X}[m(Y - y) \mid X = x].$$
This subsection generalizes to functions $g$ identified by
$$g(x, w_1) = \operatorname{argmin}_y E_{W_2}\big[E_{Y|X,W}[m(Y - y) \mid X = x, W_1 = w_1, W_2]\big], \tag{10}$$
where the parameter of interest now is $Z(g(\cdot, w_1))$, the number of roots of $g$ in $x$ given $w_1$. The conditional moment restriction (10) can be rationalized by a structural model of the form $Y = h(X, W_1, \epsilon)$, where $\epsilon \perp (X, W_1) \mid W_2$ and $g$ is defined by
$$g(x, w_1) := \operatorname{argmin}_y E_\epsilon[m(h(x, w_1, \epsilon) - y)].$$
We will assume that the joint density of $X, W$ is bounded away from zero on the set $\operatorname{supp}(X, W_1) \times \operatorname{supp}(W_2)$, where $\operatorname{supp}$ denotes the compact support of either random vector. The vector $W_2$ serves as a vector of control variables. The conditional independence assumption $\epsilon \perp (X, W_1) \mid W_2$ is also known as "selection on observables." The function $g$ is equal to the average structural function if $m(\delta) = \delta^2$, and equal to a quantile structural function if $m_q(\delta) = \delta\,(q - \mathbf{1}(\delta < 0))$. The average structural function will be of importance in the context of games of incomplete information, as discussed in section 3.4; quantile structural functions will be used to characterize stochastic difference equations in section 3.5. When games of incomplete information are discussed in section 3.4, $W_1$ will correspond to the component of public information which is not excluded from either player's response function.

The inference procedure proposed in the previous section is based upon two steps. First, the function $g$ and its derivative are estimated using local linear m-regression. In the second step, the estimator $(\hat g(\cdot), \hat g'(\cdot))$ is plugged into the functional $Z_\rho(\cdot, \cdot)$, which is a smooth approximation of the functional $Z(\cdot)$. We can generalize this approach by maintaining the same second step while using more
Equation (10) suggests estimating gby anonparametric sample analog, replacing the conditional expectation with a local linear kernel estimator of it, and the expectation over Wwith a sample average. Formally, let(11) M(a;b;x;w1 bg(x;w1) = 1); b g0(x;wjn XPi1) K = argmin(Xix;W a;b1i2M(a;b;x;ww1;W2iW 1), where2j)m(Yiab(XiK (Xix;W1iw1;W2iW2jix)) P) :b g0has a non degenerate limiting distribution. If we obtain an approximation ofAn asymptotic normality result can be shown in this context which generalizes theorem 2. In light of the proof of theorem 2, the crucial step is to obtain a sequence of experiments such that bgconverges uniformly to gwhileb g0equivalent to the approximation in assumption 2, all further steps of the proof apply immediately. This can be done, using the results of Newey (1994), for the following sequence of experiments. Assumption 4 Experiments are indexed by n, and for the nth experiment we observe (Yi;n;Xi;n;W i;n) for i= 1;:::;n. The observations (Xi;n;Yi;n;W i;n) are i.i.d. given n, and iid) fx;w ) + rn = n 4+d 1=2 (Xi;n;W i;n ) f i;nj(Xi;n;W(13) Yi;n n i;n;W ), n d 2 to Rd Z L(:) bg detd b g0 ; d in the (15) b Z:=where bg (:); b g0 (:)(12) jX;W i;n 1 = g(X 1i;n i;n r d :(14) Theorem 4 (Asymptotic normality, with control variables) Under the assumptions of section 2, but with gidenti ed by equation 10 and the data generated by the model given by assumption 4, if r, where d = dim(X)+dim(W!1, !0 and = !0, then there exist >0 and V such that b Z Z !N(0;V): 3.2. Higher dimensional systems Thus far, only one-dimensional arguments xand one-dimensional ranges for the function gwere considered, where xis the argument over which Zintegrates. All results of section 2 are easily extended to a higher dimensional setup. In particular, assume we are interested in the number of roots of a function gfrom R. 
Generalizing equation (4), we can de ne b Zas are again estimated by local linear m regression, Lis a kernel with support [ ; ], and the integral is taken over the set X R 14 MAXIMILIAN KASY if rn = (n b g0support of g. As in the one dimensional case, superconsistency follows from uniform convergence of (bg;). The following theorem, generalizing theorem 2, holds for arbitrary d: dTheorem 5 (Asymptotic normality, multidimensional systems) Under the assumptions of section 2, but with g: R d=2 bZ Z !N(0;V): dx 0 s and Z s u (x) <0 or Zu s 0 0 g0 s 0 u 1 g 10 0X(:)) := ZX 0 Zs Zu 3. 3. St ab le an d un st ab le ro ot s In st ea d L 4+d 1=2), n d !R!1, !0 and = d d+1!0, then t of te sti ng for th e tot al nu m be r of ro ot s, on e mi gh t be int er es te d in the number of \stable" and \unstable" roots, Z and Z 0 (g) := jfx2X : g(x) = 0 andg(g) := jfx2X : g(x) = 0 andg u (g(:);g (:)) := Z b g0 L g(x) g(x g0 )g0(x) (x) (g(:);g Stab le root s are thos e wher e gis neg ative , unst . able root s thos e wher e gis posit ive: Z(x) <0gj Z(x) >0gj: (16) In the multi dime nsio nal case , we coul d mor e gen erall y cons ider root s with a gi ve n nu m be r of po sit iv e an d ne ga tiv e ei ge nv al ue s of g. W e ca n de n e s m oo th ap pr ox im ati on s of th e pa ra m et er s Za s fol lo w s: x) >0 du:( 17)A gain, all argu men ts of secti on 2 go thro ugh esse ntiall y unch ang ed for thes e para met ers. In ( parti cular , theo rem 2 appli es litera lly, repl acin g Zwit h Z. Mor e gen erall y, funct ional s whic h are smo oth appr oxim ation s of the num ber of root s with vario us stabi lity prop ertie s can be cons truct ed in the multi dime nsio nal case by multi plyin g the integ rand with an indic ator funct ion dep endi ng on the sign s of the eige nval ues of. 3 .4. St ati c ga m es of in co m pl et e inf or m ati on Th is se cti on an d se cti on 3. 
5 discuss how to apply the inference procedure proposed to test for equilibrium multiplicity in economic models. The discussion in this subsection builds on Bajari, Hong, Krainer, and Nekipelov (2006).

Consider the following static game of incomplete information. Assume there are two players $i = 1, 2$, who both have to choose between one of two actions, $a = 0, 1$. Player $i$ makes her choice based on public information $s$, as well as private information $\epsilon_i$. The public information $s$ is observed by the econometrician, and $\epsilon_i$ is independent of $s$. It is assumed that $\epsilon_{-i}$ does not enter player $i$'s utility.$^{9}$ Denote the probability that player $i$ plays strategy $a = 1$ given the public information $s$ by $\sigma_i(s)$. Player $i$'s expected utility given her information, and hence her optimal action $a_i$, as well as player $i$'s probability of choosing $a = 1$, $\sigma_i$, depend on $s$ and $\sigma_{-i}(s)$. Let us denote the average best response of player $i$, integrating over the distribution of $\epsilon_i$, by

$$g_i(\sigma_{-i}, s) = E[a_i \mid \sigma_{-i}, s]. \tag{18}$$

Figure 3. Response functions and Bayesian Nash Equilibria. [Axes: $\sigma_1$ and $\sigma_2$; curves: $g_1(\sigma_2, s_1)$ and $g_2(\sigma_1, s_2)$; marked point: the observed equilibrium $(\sigma_1(s), \sigma_2(s))$.]
Notes: This figure illustrates the two player, two action static game of incomplete information discussed in section 3.4. The functions $g_i$ are the (average) best response functions, Bayesian Nash Equilibrium requires $g(\sigma_1, s) := g_1(g_2(\sigma_1, s_2), s_1) - \sigma_1 = 0$, and we observe one equilibrium $(\sigma_1(s), \sigma_2(s))$ in the data. In this graph, there are two further equilibria which are not directly observable.

Figure 3 illustrates, by plotting the response functions $g_i$ for given $s$. In Bayesian Nash Equilibrium, the probability of player $i$ choosing $a = 1$, $\sigma_i$, equals the average best response of player $i$, $g_i$. This implies the two equilibrium conditions $\sigma_i(s) = g_i(\sigma_{-i}(s), s)$, for $i = 1, 2$. In figure 3, the Bayesian Nash Equilibria correspond to the intersections of the graphs of the two $g_i$.
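As a rough illustration of how equilibria correspond to roots, the sketch below counts the sign changes of $g(\sigma_1) = g_1(g_2(\sigma_1)) - \sigma_1$ on $[0, 1]$ for a fixed $s$. The logistic best responses, their coefficients, and the helper names `make_response` and `count_equilibria` are hypothetical choices made for this example only; they are not taken from the paper.

```python
import math


def logistic(u):
    """Standard logistic function, used here as a stand-in best response."""
    return 1.0 / (1.0 + math.exp(-u))


def make_response(intercept, slope):
    """Hypothetical average best response g_i(sigma_other) for a fixed s."""
    return lambda sigma_other: logistic(intercept + slope * sigma_other)


def count_equilibria(g1, g2, grid_size=2001):
    """Count sign changes of g(sigma_1) = g1(g2(sigma_1)) - sigma_1 on [0, 1].

    Exact zeros on the grid are skipped, so a root that falls on a grid
    point is still detected through the sign flip around it.
    """
    changes = 0
    previous_sign = 0
    for k in range(grid_size):
        sigma_1 = k / (grid_size - 1)
        value = g1(g2(sigma_1)) - sigma_1
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue
        if previous_sign != 0 and sign != previous_sign:
            changes += 1
        previous_sign = sign
    return changes


# Strong strategic complementarity (assumed parameters): several equilibria.
strong = make_response(-4.0, 8.0)
# Weak strategic complementarity (assumed parameters): a unique equilibrium.
weak = make_response(-1.0, 2.0)

n_strong = count_equilibria(strong, strong)
n_weak = count_equilibria(weak, weak)
```

Counting sign changes on a grid works only for known, noise-free response functions; it says nothing about sampling uncertainty, which is what the smoothed estimator in the paper is designed to handle.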
The condition for Bayesian Nash Equilibrium in this game can be restated as $g(\sigma_1, s) = 0$, where

$$g(\sigma_1, s) = g_1(g_2(\sigma_1, s), s) - \sigma_1. \tag{19}$$

The number of roots of $g(\sigma_1, s)$ in $\sigma_1$ is the number of Bayesian Nash Equilibria in this game, given $s$.

9 This is an important restriction. It precludes in particular application of this setup to correlated value auctions.

We will now discuss identification and inference on the number of Bayesian Nash Equilibria of this game, given the public information $s$. Assume we observe an i.i.d. sample of $(a_{1,j}, a_{2,j}, s_j)$, the players' realized actions and the public information of the game, where $a_{i,j} \in \{0, 1\}$ for $i = 1, 2$ and $s \in \mathbb{R}^k$. In this subsection, $i$ indexes players and $j$ indexes observations. Rational expectation beliefs of player $i$ about the expected action of player $-i$ are given by $\sigma_{-i}(s) = E[a_{-i} \mid s]$. The following two-stage estimation procedure is a nonparametric variant of the procedure proposed by Bajari, Hong, Krainer, and Nekipelov (2006). We can get an estimate of the beliefs, $\hat\sigma_{-i}(s) = \hat E[a_{-i} \mid s]$, by local linear mean regression:

$$\left(\hat\sigma_{-i}(s), \hat\sigma'_{-i}(s)\right) = \operatorname*{argmin}_{b,c} \sum_j K_\tau(s_j - s) \big(a_{-i,j} - b - c \cdot (s_j - s)\big)^2. \tag{20}$$

Average best responses of players are given by $g_i(\sigma_{-i}, s) = E[a_i \mid \sigma_{-i}, s]$. Without further restrictions, $g_i$ is not identified, since by definition $\sigma_{-i}$ is functionally dependent on $s$. If, however, exclusion restrictions of the form

$$g_i(\sigma_{-i}, s) = g_i(\sigma_{-i}, s_i) \tag{21}$$

are imposed, the $g_i$ can be identified. In particular, assume that exclusion restriction (21) holds, with $\dim(s_1) = \dim(s_2) = k - 1$. There is one excluded component of $s$ for each player, the remaining $k - 2$ components are not excluded from either response function $g_i$. Assume furthermore that $\sigma_{-i}(s)$ has full support $[0, 1]$ given $s_i$, for $i = 1, 2$.
Under these assumptions, we can estimate the best response functions, $\hat g_i(\sigma_{-i}, s_i) = \hat E[a_i \mid \hat\sigma_{-i} = \sigma_{-i}, s_i]$, again using local linear mean regression:

$$\left(\hat g_i(\sigma_{-i}, s_i), \hat g'_i(\sigma_{-i}, s_i)\right) = \operatorname*{argmin}_{b,c} \sum_j K_\tau\big(\hat\sigma_{-i}(s_j) - \sigma_{-i},\, s_{i,j} - s_i\big) \big(a_{i,j} - b - c\,(\hat\sigma_{-i}(s_j) - \sigma_{-i})\big)^2. \tag{22}$$

Note that no functional form restrictions are needed for identification of the choice functions $g_i$. This stands in contrast to Bajari, Hong, Krainer, and Nekipelov (2006), who need to impose such restrictions in order to be able to identify the underlying preferences.

Recall that the condition for Bayesian Nash Equilibrium in this game is given by $g(\sigma_1, s) = 0$. Inserting $\hat g_2$ into $\hat g_1$, both estimated by (22), yields an estimator of $g$ which can be written as

$$\hat g(\sigma_1, s) = \hat g_1(\hat g_2(\sigma_1, s_2), s_1) - \sigma_1 = \hat E\Big[a_1 \,\Big|\, \hat\sigma_2 = \hat E[a_2 \mid \hat\sigma_1 = \sigma_1, s_2],\, s_1\Big] - \sigma_1. \tag{23}$$

Based on this estimator, we can perform inference on the number of Bayesian Nash Equilibria given $s$, $Z(g(\cdot, s))$. In particular, let

$$\hat Z = Z_\rho\big(\hat g(\cdot, s), \hat g'(\cdot, s)\big), \tag{24}$$

where $\hat g(\cdot, s)$ is given by (23). The term $\hat g'(\cdot, s)$ refers to the estimated derivative of $g$ w.r.t. $\sigma_1$, and similarly for $\hat g'_1$ and $\hat g'_2$, so that

$$\hat g'(\sigma_1, s) = \hat g'_1(\hat g_2(\sigma_1, s_2), s_1) \cdot \hat g'_2(\sigma_1, s_2) - 1. \tag{25}$$

Inference on $Z(g(\cdot, s))$ can now proceed as before, if an asymptotic normality result similar to theorem 2 can be shown. In the proof of theorem 2, three properties of $\hat g(\cdot)$ and $\hat g'(\cdot)$ needed to be proven for the statement of the theorem to follow: First, under the given sequence of experiments, $\hat g(\cdot)$ converges uniformly in probability to a degenerate limit. Second, $\hat g'(\cdot)$ converges in distribution to a non-degenerate limit. Third, $\hat g'(x_1)$ and $\hat g'(x_2)$ are asymptotically independent for $|x_1 - x_2| > \text{const} \cdot \tau$. These properties can be shown in the present case, with $\sigma_1$ replacing $x$, for an appropriate choice of sequence of experiments, where $r_n$ is a scale parameter as before.
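To make the smoothed count concrete, here is a minimal sketch that evaluates $\int L_\rho(g(x))\,|g'(x)|\,dx$ numerically for known functions, assuming this integral is the one-dimensional form of the functional $Z_\rho$. The Epanechnikov kernel, the midpoint rule, and the function names are assumptions of this example; the test functions are the two polynomials from appendix A, and $\rho = 0.04$ is the value used in section 4.

```python
def epanechnikov(u, rho):
    """Kernel L_rho with support [-rho, rho] that integrates to one."""
    z = u / rho
    return 0.75 * (1.0 - z * z) / rho if abs(z) < 1.0 else 0.0


def smoothed_root_count(g, dg, lower, upper, rho, steps=20000):
    """Midpoint-rule approximation of the integral of L_rho(g(x)) * |g'(x)|."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        total += epanechnikov(g(x), rho) * abs(dg(x))
    return total * width


def g1(x):
    """First Monte Carlo function of appendix A: one root at x = 0.5."""
    return 0.5 - x


def dg1(x):
    return -1.0


def g2(x):
    """Second Monte Carlo function of appendix A: three roots in (0, 1)."""
    return 0.5 - 5.0 * x + 12.0 * x ** 2 - 8.0 * x ** 3


def dg2(x):
    return -5.0 + 24.0 * x - 24.0 * x ** 2


rho = 0.04
z_one = smoothed_root_count(g1, dg1, 0.0, 1.0, rho)
z_three = smoothed_root_count(g2, dg2, 0.0, 1.0, rho)
# Integrating only up to the root at 0.5 puts that root on the boundary.
z_boundary = smoothed_root_count(g1, dg1, 0.0, 0.5, rho)
```

With a kernel that integrates to one and $\rho$ smaller than the distance of $|g|$ from zero at its local extrema, each interior root contributes one to the integral by a change of variables, while a root on the boundary of the integration range contributes one half.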
The choice of sequence of experiments may seem to be more complicated here than in the baseline case, since the dependent variable $a$ is naturally bounded by $[0, 1]$, so that increasing the residual variance would be inconsistent with the structural model. This is not a problem, however, if we note that the distribution of $\hat Z$, in the baseline model, is invariant to a proportional rescaling of $Y$, $g$ and $\rho$. We can therefore define a sequence of experiments which is equivalent to the one defined by equations (7) through (9) if we replace equation (9) by

$$Y_{i,n} = \frac{1}{r_n} g(X_{i,n}) + \epsilon_{i,n} \tag{9'}$$

and $\rho$ by $\rho / r_n$. Intuitively, shrinking the "signal" $g$ is equivalent to increasing the "noise" $r_n \epsilon_{i,n}$. Returning to games of incomplete information, consider the following sequence of experiments.

Assumption 5 For $i = 1, 2$, $g_{i,0}$ is continuously differentiable and monotonic in $\sigma_{-i}$, and $g^{-1}_{i,0}$ denotes the inverse of $g_{i,0}$ with respect to the argument $\sigma_{-i}$, given $s_i$. Experiments are indexed by $n$, and for the $n$th experiment we observe $(s_j, a_{1,j,n}, a_{2,j,n})$ for $j = 1, \ldots, n$. The observations $(s_j, a_{1,j,n}, a_{2,j,n})$ are i.i.d. given $n$ and

$$s_j \sim f_s(\cdot) \tag{26}$$
$$a_{i,j,n} \mid s_j \sim \text{Bin}(\sigma_{i,n}(s_j)) \tag{27}$$
$$\sigma_{i,n}(s) = g_{i,n}(\sigma_{-i,n}(s), s_i), \tag{28}$$

together with equations (29) and (30), which define $g_{1,n}(\sigma_2, s_1)$ and $g^{-1}_{2,n}(\sigma_2, s_2)$ by scaling $g_{1,0}(\sigma_2, s_1)$ and $g^{-1}_{2,0}(\sigma_2, s_2)$ with the factor $1/r_n$.

Equations (26) to (28) are the same as in the model we have been discussing so far. Equations (29) and (30) shrink the graphs of the best response functions $g_{i,n}(\cdot, s_i)$ towards a line parallel to the $\sigma_2$ axis (compare figure 3). Denote $\sigma_{2,n} = g_{2,n}(\sigma_1, s_2)$. We get

$$g_n(\sigma_1, s) = g_{1,n}(g_{2,n}(\sigma_1, s_2), s_1) - \sigma_1 = g_{1,n}(\sigma_{2,n}, s_1) - g^{-1}_{2,n}(\sigma_{2,n}, s_2) = \frac{1}{r_n}\left[g_{1,0}(\sigma_{2,n}, s_1) - g^{-1}_{2,0}(\sigma_{2,n}, s_2)\right].$$

By equation (30), as $r_n \to \infty$ the argument $\sigma_{2,n}$ converges, and hence $r_n g_n(\sigma_1, s)$ converges to a limit determined by $g_{1,0}$ and $g^{-1}_{2,0}$ (equation (31)). The derivative estimators $\hat g'_i$ converge to a non-degenerate limit iff $r_n = O((n \tau^{4+k})^{1/2})$, where $k$ is the dimensionality of the support of the response functions $g_i$, $k = \dim(s)$.

Theorem 6 (Asymptotic normality, static games of incomplete information) Under the sequence of experiments defined by assumption 5, if the remainder $R$ is asymptotically negligible uniformly in the Bahadur expansions as $n \to \infty$, and if $r_n$ grows at the rate stated above, then $\hat Z$, suitably centered and normalized, converges in distribution to $N(0, V)$.

Figure 4. Qualitative dynamics of stochastic difference equations. [Horizontal axis: $X$; vertical axis: $\Delta X$; curves: $g(X, \cdot)$, $g^U(X)$ and $g^L(X)$; marked points: $x_1$ and $x_2$.]
Notes: This figure illustrates the characterization of the dynamics of nonlinear stochastic difference equations discussed in section 3.5, where $g^U$ and $g^L$ are upper and lower envelopes of $g$ for a sequence of realizations of $\epsilon$. In this graph, equilibrium regions correspond to the dashed segments of the $X$ axis, the basin of attraction of the lower equilibrium region is given by $(-\infty, x_1]$, and the basin of attraction of the upper equilibrium region is $[x_2, \infty)$.

3.5. Stochastic difference equations

In this subsection, identification and interpretation of the number of roots of $g$ for stochastic difference equations of the form

$$\Delta X_{i,t+1} = X_{i,t+1} - X_{i,t} = g(X_{i,t}, \epsilon_{i,t}) \tag{32}$$

is discussed. Interest in such difference equations is motivated by the study of neighborhood composition dynamics in Card, Mas, and Rothstein (2008). This discussion will form the basis of the empirical application in section 4. First, it will be shown that, under plausible assumptions, finding only one root in cross-sectional quantile regressions of $\Delta X$ on $X$ implies that there is only one stable root for every member of a family of conditional average structural functions. Second, it will be argued that the number of roots of $g$ allows us to characterize the qualitative dynamics of the stochastic difference equation in terms of equilibrium regions. Before the formal results are stated, let us discuss the intuition behind this latter claim.
Holding $\epsilon$ constant, the number of roots of $g$ in $X$ is the number of equilibria of the difference equation (32). If $\epsilon$ is stochastic, the number of roots can still serve to characterize qualitative dynamics in terms of "equilibrium regions"; this is illustrated in figure 4. In this figure there are ranges of $X$ in which the sign of $\Delta X$ does not depend on $\epsilon$. This implies that in these ranges $X$ moves towards the equilibrium regions, which are the regions in which the roots of $g(\cdot, \epsilon)$ lie.

How is the joint distribution of $(X, \Delta X)$ related to the transition function $g$? Unobserved heterogeneity which is positively related over time leads to an upward bias in quantile regression slopes relative to the corresponding structural slopes. To show this, denote the $q$th conditional quantile of $\Delta X$ given $X$ by $Q_{\Delta X \mid X}(q \mid X)$, the conditional cumulative distribution function at $Q$ by $F_{\Delta X \mid X}(Q \mid X)$, and the conditional probability density by $f_{\Delta X \mid X}(Q \mid X)$. The following lemma shows that quantile regressions of $\Delta X$ on $X$ yield biased slopes relative to the structural slope $\frac{\partial}{\partial X} g$, if $X$ is not exogenous. The second term in equation 33 reflects the bias due to statistical dependence between $X$ and $\epsilon$.

Lemma 1 (Bias in quantile regression slopes) If $\Delta X = g(X, \epsilon)$, and if $Q$ and $F$ are differentiable with respect to the conditioning argument $X$, then

$$\frac{\partial}{\partial X} Q_{\Delta X \mid X}(q \mid X) = E\left[\frac{\partial}{\partial X} g(X, \epsilon) \,\Big|\, \Delta X = Q, X\right] - \frac{1}{f_{\Delta X \mid X}(Q \mid X)} \cdot \frac{\partial}{\partial X} P\big(g(X', \epsilon) \le Q \mid X\big)\Big|_{X' = X}. \tag{33}$$

The following assumption of first order stochastic dominance states that there is no negative dependence between current $g(x', \epsilon)$, evaluated at fixed $x'$, and current $X$:

Assumption 6 (First order stochastic dominance) $P(g(x', \epsilon) \le Q \mid X)$ is non-increasing as a function of $X$, holding $x'$ constant.

Violation of this assumption would require some underlying cyclical dynamics, in continuous time, with a frequency close enough to half the frequency of observation, or more generally with a ratio of frequencies that is an odd number divided by two. It seems safe to discard this possibility in most applications.
This assumption might not hold, for instance, if outcomes were influenced by seasonal factors and observations were semi-annual. We can now formally state the claim that, if there are unstable equilibria structurally, then quantile regressions should exhibit multiple roots.

Proposition 3 (Unstable equilibria in dynamics and quantile regressions) Assume that $\Delta X = g(X, \epsilon)$ and that $g(\inf \mathcal{X}, \epsilon) > 0$, and $g(\sup \mathcal{X}, \epsilon) < 0$ for all $\epsilon$. If assumption 6 holds and $Q_{\Delta X \mid X}(q \mid X)$ has only one root $X$ for all $q$, then the conditional average structural functions $E[g(x', \epsilon) \mid g(X, \epsilon) = 0, X]$, as functions of $x'$, are "stable" at the roots $x' = X$:

$$E\left[\frac{\partial}{\partial X} g(X, \epsilon) \,\Big|\, \Delta X = 0, X\right] \le 0$$

for all $X$, where $(0, X)$ is in the support of $(\Delta X, X)$.

This proposition assumes "global stability" of $g$, i.e., $X$ does not diverge to infinity. Under such global stability, if there is only one root of $g$, then this root is stable. According to this proposition, if quantile regressions only have one stable root, then the same is true for the conditional average structural functions. This is not conclusive, but it is suggestive that the $g(\cdot, \epsilon)$ themselves have only one root.

Let us now turn to the implications of the number of roots of $g$ for the qualitative dynamics of the stochastic difference equation (32). Let $\tilde g(x, \epsilon) := g(x, \epsilon) + x$. If $g$ describes a structural relationship, the counterfactual time path under "manipulated" initial condition $X_{i,0} = x_0$ is given by

$$X_{i,1} = \tilde g(x_0, \epsilon_{i,0}), \quad X_{i,2} = \tilde g(X_{i,1}, \epsilon_{i,1}), \quad \ldots, \quad X_{i,t} = \tilde g(X_{i,t-1}, \epsilon_{i,t-1}). \tag{34}$$

Given the initial condition $X_{i,0}$ and shocks $\epsilon_{i,0}, \ldots, \epsilon_{i,t-1}$, equation (32) describes a time inhomogeneous deterministic difference equation.
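The recursion for the counterfactual path can be sketched in a few lines. The transition function below (a damped version of the cubic from appendix A plus a small additive shock), the shock range, and the function names are hypothetical choices for illustration; the paper does not specify a functional form for $g$.

```python
import random


def g(x, eps):
    """Hypothetical transition function: a damped cubic plus an additive shock.

    For eps = 0 it has stable roots near 0.146 and 0.854 and an unstable
    root at 0.5.
    """
    return 0.3 * (0.5 - 5.0 * x + 12.0 * x ** 2 - 8.0 * x ** 3) + eps


def counterfactual_path(x0, shocks):
    """Iterate X_{t+1} = g_tilde(X_t, eps_t), with g_tilde(x, eps) = g(x, eps) + x."""
    path = [x0]
    for eps in shocks:
        x = path[-1]
        path.append(x + g(x, eps))
    return path


# The same shock sequence is applied to both manipulated initial conditions.
rng = random.Random(0)
shocks = [rng.uniform(-0.01, 0.01) for _ in range(200)]

low_path = counterfactual_path(0.45, shocks)   # starts just below the unstable root
high_path = counterfactual_path(0.55, shocks)  # starts just above the unstable root
```

Because the shocks here are small relative to the drift away from the unstable root, the two paths settle in different regions even though they share every shock, which is the sense in which initial conditions can have persistent effects.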
The following argument makes statements about the qualitative behavior of this difference equation based on properties of the function $g$, in particular based on the number of roots in $x$ of $g(x, \epsilon)$ for given unobservables $\epsilon$. Consider figure 4, which shows $g^U_{i,t}$ and $g^L_{i,t}$, defined by

$$g^U_{i,t}(x) = \max_{0 \le s < t} g(x, \epsilon_{i,s}) \tag{35}$$
$$g^L_{i,t}(x) = \min_{0 \le s < t} g(x, \epsilon_{i,s}). \tag{36}$$

The functions $g^U_{i,t}$ and $g^L_{i,t}$ are the upper and lower envelope of the family of functions $g(x, \epsilon_{i,s})$ for $0 \le s < t$. The direction of movement of $X$ over time does not depend on $s$ in the ranges where $g^U < 0$ or $g^L > 0$ (which is where the horizontal axis is drawn solid in figure 4), since the sign of $g(x, \epsilon_{i,s})$ does not depend on $s$ in these ranges. In other words, suppose we start off with an initial value below $x_1$ in the picture. If that is the case, $X_{i,s}$ will converge monotonically toward the left-hand dashed range and then remain within that range for all $s \le t$. Similarly, for $X_{i,0}$ in the upper "basin of attraction" beyond $x_2$, $X_{i,s}$ will converge to the upper "equilibrium range" given by the right-hand dashed range. Hence small changes of initial conditions (from $x_1$ to $x_2$) can have large and persistent effects on $X$ in this case, in contrast to the case where $g(\cdot, \epsilon)$ has only one stable root for all $\epsilon$. These arguments are summarized in the following proposition.

Proposition 4 (Characterizing dynamics of stochastic difference equations) Assume that $g^U_{i,t}$ and $g^L_{i,t}$, defined by equations (35) and (36), are smooth and generic, positive for sufficiently small $x$ and negative for sufficiently large $x$, and have the same number $z$ of roots, $x^U_1 < \ldots < x^U_z$ and $x^L_1 < \ldots < x^L_z$, and let $x^L_0 = -\infty$, $x^U_{z+1} = \infty$. Define the following mutually disjoint ranges:

$N_c = [x^U_c, x^U_{c+1}]$ for $c = 1, 3, \ldots, z$
$P_c = [x^L_c, x^L_{c+1}]$ for $c = 0, 2, \ldots, z-1$
$S_c = [x^L_c, x^U_c]$ for $c = 1, 3, \ldots, z$
$U_c = [x^U_c, x^L_c]$ for $c = 2, 4, \ldots, z-1$

Then all $g(x, \epsilon_{i,s})$ are negative on the $N_c$, and positive on the $P_c$. Furthermore, all $g(x, \epsilon_{i,s})$ are negative in a neighborhood to the right of the maximum of the $S_c$ and positive to the left of the minimum, and the reverse holds for the $U_c$. Therefore, if $X_{i,0} \in N_c$ and $S_c \neq \emptyset$, then $X_{i,s}$ will converge monotonically toward $S_c$ and then remain within $S_c$.
If $X_{i,0} \in P_c$ and $S_{c+1} \neq \emptyset$, then $X_{i,s}$ will converge monotonically toward $S_{c+1}$ and then remain within $S_{c+1}$.

Assuming nonemptiness of these ranges, the interval $P_{c-1} \cup S_c \cup N_c$ is a "basin of attraction" for $S_c$, i.e., $X_{i,s}$ in this interval converges monotonically to $S_c$ and then remains there. The main difference relative to the deterministic, time homogeneous case is the "blurring" of the stable equilibrium to a stable set $S_c$. We did not make any assumptions on the joint distribution of the unobserved factors $\epsilon_{i,0}, \ldots, \epsilon_{i,t-1}$. The whole argument of the preceding proposition is conditional on these factors. However, the predictions of the proposition will be sharper (given $g$) if serial dependence of unobserved factors is stronger, increasing the number of units $i$ to which the assertion is applicable and reducing the size of the intervals $S_c$ and $U_c$, since $g^U_{i,t} - g^L_{i,t}$ is going to be smaller on average.

In summary, proposition 3 implies that, if we do not find multiple roots in quantile regressions, then the conditional average structural functions $E[g(x', \epsilon) \mid g(X, \epsilon) = 0, X]$ do not have multiple roots. Proposition 4 implies that, if upper and lower envelopes of $g(\cdot, \epsilon)$ do not have multiple roots, then the dynamics of the system are stable and initial conditions do not matter in the long run.

4. APPLICATION TO THE DYNAMICS OF NEIGHBORHOOD COMPOSITION

This section analyzes the dynamics of minority share in a neighborhood, applying the methods developed in the last two sections to the data used for analysis of neighborhood composition dynamics by Card, Mas, and Rothstein (2008). Card, Mas, and Rothstein (2008) study whether preferences over neighborhood composition lead to "white flight" once the minority share in a neighborhood exceeds a certain level.
They argue that such "tipping" behavior implies discontinuities in the change of neighborhood composition over time as a function of initial composition, and test for the presence of such discontinuities in cross-sectional regressions over different neighborhoods in a given city. The authors provided full access to their datasets, which allows us to use identical samples and variable definitions as in their work.

The data set is an extract from the Neighborhood Change Database, or NCDB, which aggregates US census variables to the level of census tracts. Tract definitions change between census waves, but the NCDB matches observations from the same geographic area over time, thus allowing observation of the development over several decades of the universe of US neighborhoods. In the dataset used by Card, Mas, and Rothstein (2008), all rural tracts are dropped, as well as all tracts with population below 200 and tracts that grew by more than 5 standard deviations above the MSA mean. The definition of MSA used is the MSAPMA from the NCDB, which is equal to Primary Metropolitan Statistical Area if the tract lies in one of those, and equal to the MSA it lies in otherwise. For further details on sample selection and variable definition, see Card, Mas, and Rothstein (2008).

The graphs and tables to be discussed are constructed as follows. For each of the MSAs and each of the decades separately, we run local linear quantile regressions of the change in minority share of a neighborhood (tract) on minority share at the beginning of the decade. This is done for the quantiles 0.2, 0.5 and 0.8, with a bandwidth $\tau$ of $n^{-.2}$, where $n$ is the sample size.$^{10}$ The left column of graphs in figure 5 shows these quantile regressions for the three largest MSAs. For each of the regressions, $Z_\rho$ is calculated, where $\rho$ is chosen as 0.04.
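The two steps just described, a local linear fit followed by the smoothed root count, can be sketched as below. This is a simplified stand-in rather than the paper's implementation: it uses local linear mean regression instead of quantile regression, an Epanechnikov kernel for both steps, and noise-free toy data; the function names are invented for the example. Only the bandwidth rule $n^{-.2}$ and the value $\rho = 0.04$ come from the text.

```python
def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear(xs, ys, x0, tau):
    """Kernel-weighted least squares fit of y on (1, x - x0).

    Returns (level, slope), the estimates of g(x0) and g'(x0).
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = epanechnikov(d / tau)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    slope = (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)
    level = (t0 - slope * s1) / s0
    return level, slope


def z_hat(xs, ys, rho, lower=0.0, upper=1.0, steps=400):
    """Plug the local linear fit into the smoothed root count on [lower, upper]."""
    tau = len(xs) ** -0.2  # bandwidth rule stated in the text
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x0 = lower + (k + 0.5) * width
        level, slope = local_linear(xs, ys, x0, tau)
        total += epanechnikov(level / rho) / rho * abs(slope)
    return total * width


# Noise-free toy data with a single root at x = 0.5 (assumed for illustration).
n = 400
xs = [(i + 0.5) / n for i in range(n)]
ys = [0.5 - x for x in xs]
estimate = z_hat(xs, ys, rho=0.04)
```

On real data the fitted level and slope are noisy, so the estimate is generally not an integer; that is why the paper pairs it with bootstrapped bias and standard errors.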
The integral in the expression for $Z_\rho$ is taken over the interval $[0, 1]$, intersected with the support of initial minority share if the latter is smaller. Note that it is possible to find no (stable) equilibrium for an MSA, i.e. $Z < 1$, if high initial minority shares do not occur in that MSA and most neighborhoods experienced growing minority shares. Figure 6 shows kernel density plots of the regressor, initial minority share, which suggest that support problems are not an issue, at least for the largest MSAs. For each $Z_\rho$, bootstrap standard errors and bias are calculated, as well as the corresponding t-test statistics for the null hypothesis $Z = 0, 1, 2, 3, \ldots$, implying an integer-valued confidence set (of level .05) for $z$. By the results of section 2, these confidence sets have an asymptotic coverage probability of 95%. By the Monte Carlo evidence of appendix A, they are likely to be conservative, i.e., have a larger coverage probability. If the confidence sets thus obtained are empty, the two neighboring integers of $\hat Z$ are included in the intervals shown.$^{11}$ This makes inference even more conservative.

Table I shows the resulting confidence sets for the twelve largest MSAs in the United States (by 2009 population), for all quantiles and decades under consideration. As can be seen from the table, in very few cases is there evidence of $Z$ exceeding 1. In all cases shown, except for the .2 quantile for Atlanta in the 1980s, we can reject the null $Z \ge 3$. Similar patterns hold for almost all of the 118 cities in the dataset. Rather than exhibiting multiple equilibria, the data indicate a general rise in minority share that is largest for neighborhoods with intermediate initial share, but not to the extent of leading to tipping behavior. Proposition 3 in section 3.5 suggests that, if we do not find multiple roots in quantile regressions, we can reject multiple equilibria in the underlying structural relationship. I take these results as indicative that tipping is not a widespread phenomenon in US ethnic neighborhood composition over the decades under consideration. This stands in contrast to the conclusion of Card, Mas, and Rothstein (2008), who do find evidence of tipping.

10 The implementation of local linear quantile regression uses code downloaded from Koenker (2009).
11 The full set of results for all 115 MSAs in the dataset can be found in the web-appendix, Kasy (2010).

Figure 5. Quantile regressions of the change in minority share and of the change in white population on initial minority share. [Panels: New York, 1980-1990; Los Angeles, 1970-1980; Chicago, 1970-1980; one curve each for the quantiles .2, .5 and .8; horizontal axis: initial minority share from 0 to 1.]
Notes: These graphs show local linear quantile regressions of the change in minority share (left column) and of the change in white population relative to initial population (right column) on initial minority share for the quantiles .2, .5 and .8. The graphs do not show confidence bands.

Figure 6. Density of minority share across neighborhoods. [Panels: New York 1980; Los Angeles 1970; Chicago 1970; horizontal axis: minority share from 0 to 1.]
Notes: These graphs show kernel density estimates of the distribution of minority share across neighborhoods.

The approach used here differs from the main analysis in Card, Mas, and Rothstein (2008) in a number of ways. Card, Mas, and Rothstein (2008) (i) use polynomial least squares regression with a discontinuity.
They (ii) use a split sample method to test for the presence of a discontinuity, and they (iii) regress the change in the non-Hispanic, white population, divided by initial neighborhood population, on initial minority share. We (i) use local linear quantile regression without a discontinuity, we (ii) run the regressions on full samples for each MSA and test for the number of roots, and we (iii) regress the change in minority share on initial minority share.

To check whether the differing results are due to variable choice (iii) rather than testing procedure, the figures and tables that were just discussed are replicated using the change in the non-Hispanic, white population relative to initial population as the dependent variable, as did Card, Mas, and Rothstein (2008). The right column of figure 5 shows such quantile regressions. These figures correspond to the ones in Card, Mas, and Rothstein (2008), p. 190, using the same variables but a different regression method and the full samples. Table II shows confidence sets for the number of roots of these regressions for the 12 largest MSAs. In comparing tables I and II, note that there is a correspondence between the lower quantiles of the first (low increase in minority share) and the upper quantiles of the latter (higher increase/lower decrease of white population). The two tables show fairly similar results. Again, no systematic evidence of multiple roots is found.

Some factors might lead to a bias in the estimated number of equilibria, using the methods developed here. First, the test might be sensitive to the chosen range of integration if there are roots near the boundary. If a root lies right on the boundary of the chosen range of integration, it enters $Z_\rho$ as 1/2 only. Extending the range of integration beyond the unit interval, however, might also lead to an upward bias in the estimated number of roots, if extrapolated regression functions intersect with the horizontal axis.
Second, choosing a bandwidth parameter $\rho$ that is too large might bias the estimated number of equilibria downwards, if the function $g$ peaks within the range $[-\rho, \rho]$. Third, there might be roots of $g$ in the unit interval but beyond the support of the data.

Table I. .95 confidence sets for Z(g) for the 12 largest MSAs of the United States by decade and quantile, change in minority share

MSA                                70s                     80s                     90s
                                   q=.2   q=.5   q=.8      q=.2   q=.5   q=.8      q=.2   q=.5   q=.8
New York, NY PMSA                  [0,1]  [0,1]  [0,0]     [0,0]  [0,0]  [0,0]     [0,0]  [0,0]  [0,0]
Los Angeles-Long Beach, CA PMSA    [1,1]  [1,1]  [0,1]     [0,1]  [0,1]  [0,1]     [1,1]  [1,1]  [0,0]
Chicago, IL PMSA                   [0,1]  [0,1]  [0,1]     [2,2]  [0,1]  [0,1]     [1,1]  [0,1]  [0,0]
Dallas, TX PMSA                    [1,2]  [1,1]  [0,0]     [0,1]  [0,0]  [0,0]     [0,1]  [0,1]  [0,0]
Philadelphia, PA-NJ PMSA           [1,2]  [0,1]  [0,1]     [1,1]  [0,1]  [0,1]     [1,1]  [0,1]  [0,0]
Houston, TX PMSA                   [1,1]  [0,0]  [0,0]     [1,2]  [0,1]  [0,0]     [0,1]  [0,0]  [0,0]
Miami, FL PMSA                     [0,1]  [0,0]  [0,0]     [0,0]  [0,0]  [0,0]     [0,0]  [0,0]  [0,0]
Washington, DC-MD-VA-WV PMSA       [0,1]  [0,0]  [0,0]     [1,1]  [0,1]  [0,0]     [1,1]  [0,1]  [0,0]
Atlanta, GA MSA                    [1,1]  [1,1]  [0,0]     [2,3]  [0,0]  [0,0]     [0,0]  [0,0]  [0,0]
Boston, MA-NH PMSA                 [0,1]  [0,1]  [0,1]     [0,1]  [0,1]  [0,0]     [1,1]  [0,0]  [0,1]
Detroit, MI PMSA                   [1,2]  [0,1]  [0,1]     [0,1]  [0,1]  [0,1]     [0,1]  [0,1]  [0,0]
Phoenix-Mesa, AZ MSA               [1,1]  [0,0]  [0,0]     [1,1]  [0,1]  [0,0]     [1,1]  [0,1]  [0,0]
San Francisco, CA PMSA             [1,1]  [0,1]  [0,1]     [0,0]  [0,1]  [0,0]     [1,1]  [0,0]  [0,0]

Notes: The table shows confidence intervals in the integers for $Z(g)$ for the 12 largest MSAs of the United States, ordered by population, where $g$ is estimated by quantile regression of the change in minority share over a decade on the initial minority share for the quantiles .2, .5 and .8. Regression bandwidth $\tau$ is $n^{-.2}$, $\rho$ is chosen as .04. Confidence sets are based on t-statistics using bootstrapped bias and standard errors.

Table II. .95 confidence sets for Z(g) for the 12 largest MSAs of the United States by decade and quantile, change in white population

MSA                                70s                     80s                     90s
                                   q=.2   q=.5   q=.8      q=.2   q=.5   q=.8      q=.2   q=.5   q=.8
New York, NY PMSA                  [0,1]  [0,1]  [0,1]     [0,1]  [0,1]  [0,1]     [0,1]  [0,1]  [0,1]
Los Angeles-Long Beach, CA PMSA    [0,1]  [0,1]  [0,1]     [0,1]  [0,1]  [0,1]     [0,1]  [0,1]  [0,1]
Chicago, IL PMSA                   [0,1]  [0,1]  [0,1]     [0,0]  [0,1]  [1,1]     [0,1]  [0,1]  [0,1]
Dallas, TX PMSA                    [0,1]  [0,1]  [0,1]     [0,0]  [1,1]  [0,2]     [0,1]  [1,1]  [0,1]
Philadelphia, PA-NJ PMSA           [0,1]  [0,1]  [0,1]     [0,1]  [0,1]  [0,1]     [0,1]  [0,1]  [1,1]
Houston, TX PMSA                   [0,1]  [0,1]  [0,1]     [1,1]  [1,1]  [1,1]     [0,1]  [0,1]  [0,1]
Miami, FL PMSA                     [0,1]  [0,1]  [0,1]     [0,0]  [0,0]  [1,1]     [1,1]  [1,1]  [1,1]
Washington, DC-MD-VA-WV PMSA       [0,1]  [0,0]  [0,1]     [0,0]  [1,1]  [0,0]     [0,1]  [0,1]  [0,1]
Atlanta, GA MSA                    [0,1]  [1,1]  [0,1]     [1,1]  [1,1]  [1,1]     [1,1]  [1,2]  [0,1]
Boston, MA-NH PMSA                 [0,1]  [0,1]  [0,1]     [0,0]  [0,0]  [1,1]     [0,0]  [0,1]  [0,1]
Detroit, MI PMSA                   [0,1]  [0,1]  [0,1]     [0,0]  [0,0]  [1,1]     [0,1]  [0,1]  [0,1]
Phoenix-Mesa, AZ MSA               [0,1]  [0,1]  [0,1]     [0,0]  [1,1]  [0,0]     [0,1]  [0,1]  [0,1]
San Francisco, CA PMSA             [0,1]  [0,1]  [0,1]     [0,0]  [0,0]  [0,0]     [0,0]  [1,1]  [0,0]

Notes: The table shows confidence intervals in the integers for $Z(g)$ for the 12 largest MSAs of the United States, ordered by population, where $g$ is estimated by quantile regression of the change in the non-Hispanic, white population over a decade, divided by initial total population, on the initial minority share for the quantiles .2, .5 and .8. Regression bandwidth $\tau$ is $n^{-.2}$, $\rho$ is chosen as .05 times the maximal change. Confidence sets are based on t-statistics using bootstrapped bias and standard errors.

5. SUMMARY AND CONCLUSION

This paper proposes an inference procedure for the number of roots of functions nonparametrically identified using conditional moment restrictions, and develops the corresponding asymptotic theory. In particular, it is shown that a smoothed plug-in estimator of the number of roots is super-consistent under i.i.d. asymptotics, but asymptotically normal under non-standard asymptotics, and asymptotically efficient relative to a simple plug-in estimator.
In section 3, these results are extended to cover various more general cases, allowing for covariates as controls, higher dimensional domain and range, and for inference on the number of equilibria with various stability properties. This section also discusses how to apply the results to static games of incomplete information and to stochastic difference equations. In an application of the methods developed here to data on neighborhood composition dynamics in the United States, no evidence of multiple equilibria is found.

The inference procedure can also be used to test for bifurcations, i.e., (dis)appearing equilibria as a function of changing exogenous covariates. It is easy to test the hypothesis $Z(g(\cdot, W_1)) = Z(g(\cdot, W_2))$, since the corresponding estimators $\hat Z(g(\cdot, W_i))$ are independent for $W_1$ and $W_2$ further apart than twice the bandwidth $\tau$. If there are bifurcations, small exogenous shifts might have a large (discontinuous) effect on the equilibrium attained, if the "old" equilibrium disappears.

In the dynamic setup, one might furthermore consider applying the procedure to detrended data, for instance by demeaning $Y$. It seems likely that regressions of detrended data have a higher number of roots. The rationale of such an approach could be found in underlying models in which the dynamics of a detrended variable are stationary. This is in particular the case in Solow-type growth models, in which GDP or capital stock is stationary after normalizing by a technological growth factor.

Finally, it might also be interesting to extend the results obtained here to cover further cases where $g$ cannot be directly estimated using conditional moment restrictions. The crucial step for such extensions, as illustrated by the various cases discussed in section 3, is to find a sequence of experiments such that the first stage estimator $\hat g$ converges in probability to a degenerate limit whereas $\hat g'$ converges in distribution to a non-degenerate limit.
Furthermore, $\hat g'(x_1)$ needs to be asymptotically independent of $\hat g'(x_2)$ for all $|x_1 - x_2| > \text{const} \cdot \tau$.

There are many potential applications of the results obtained here, where it might be interesting to know whether the underlying dynamics or strategic interactions imply multiple equilibria.$^{12}$ Examples include household level poverty traps, intergenerational mobility, efficiency wages, macro models of economic growth (as analyzed in the web appendix), financial market bubbles (herding), market entry, and social norms.

12 The Matlab/Octave code written for this paper is available upon request.

APPENDIX A: MONTE CARLO EVIDENCE

This section presents simulation results to check the accuracy in finite samples of the asymptotic approximations obtained in theorem 2. In all simulations, the $X$ are i.i.d. draws of Uni[0,1] random variables, and the additive errors $\epsilon$ are either uniformly or normally distributed:

$$X_i \overset{iid}{\sim} \text{Uni}[0, 1], \qquad \epsilon_i \mid X_i \sim f_{\epsilon \mid X}, \qquad Y_i \mid X_i = g_j(X_i) + \epsilon_i, \tag{37}$$

where $f_{\epsilon \mid X}$ is an appropriately centered and scaled uniform or normal distribution. Two functions $g_j$ are considered, the first with one root and the second with three roots:

$$g_1(x) = 0.5 - x, \qquad g_2(x) = 0.5 - 5x + 12x^2 - 8x^3.$$

The function $g$ is estimated by median regression, mean regression and .9 quantile regression, where the $\epsilon$ in the simulations are shifted appropriately to have median, mean or .9 quantile at the respective $g$. The figures and tables show sequences of four experiments with 400, 800, 1600 and 3200 observations. These models are chosen to be comparable to the empirical application discussed in section 4. The variance of $\epsilon$ in each experiment is chosen to yield the same variance for $\hat g'$, as implied by the asymptotic approximation of the Bahadur expansion, in all experiments for a given $g$. By the proof of theorem 2, we should therefore get similar simulation results across all setups. Furthermore, the variance of $\hat Z$ should be constant up to a factor $\tau / \rho$.
The parameters of these simulations are chosen to lie in an intermediate range where variation in $\hat{Z}$ is present but moderate. Figure 7 shows density plots for $\hat{Z}$ for the sequences of experiments with uniform errors and median regressions; in the web appendix, Kasy (2010), similar figures are presented for the other experiments. As predicted by theorem 2, biases are positive, and both bias and variance are decreasing in $n$. Figure 8 shows the distribution of the "naive" plug-in estimator $Z(\hat{g})$. It was shown in section 2 that this estimator is asymptotically inefficient relative to the smoothed plug-in estimator. This relative inefficiency is reflected in a larger dispersion in the simulations, as can be seen by comparing figures 7 and 8. Figure 9 shows density plots for $\hat{Z}$, normalized by its sample mean and standard deviation, from the same simulations. These plots suggest that the sample distribution of $\hat{Z}$ is somewhat right-skewed relative to a normal distribution.

Table III shows the results of simulations using bootstrapped standard deviations and biases, for mean regression with uniform errors. The results show, for the range of experiments considered, that rejection frequencies are lower than the 0.05 value implied by asymptotic theory. If this pattern generalizes, inference based upon the t-statistic proposed in this paper is conservative in finite samples. In particular, it seems that bootstrapped standard errors are too large.

Figure 7. Density of $\hat{Z}$ in Monte Carlo experiments
[Two panels, each with density curves for n = 400, 800, 1600, 3200. Upper panel: $g_1(x) = 0.5 - x$, $Z(g_1) = 1$. Lower panel: $g_2(x) = 0.5 - 5x + 12x^2 - 8x^3$, $Z(g_2) = 3$.]
Notes: This figure shows density plots of $\hat{Z}$ from Monte Carlo experiments with uniform errors and $g$ identified by median regression, as described in appendix A.
The upper graph shows the distribution from four experiments with increasing sample size $n$ and correspondingly growing variance of the residual $\epsilon$, where the true parameter $Z$ equals one. The same holds for the lower graph, except that $Z = 3$.

Figure 8. Distribution of simple plug-in estimator $Z(\hat{g})$ in Monte Carlo experiments
[Two panels, each showing the distribution for n = 400, 800, 1600, 3200. Upper panel: $g_1(x) = 0.5 - x$, $Z(g_1) = 1$. Lower panel: $g_2(x) = 0.5 - 5x + 12x^2 - 8x^3$, $Z(g_2) = 3$.]
Notes: This figure shows the distribution of $Z(\hat{g})$, the "naive" plug-in estimator, from the same simulations as figure 7.

Figure 9. Density of normalized $\hat{Z}$ in Monte Carlo experiments
[Two panels, each with density curves for n = 400, 800, 1600, 3200 and a standard normal reference density. Upper panel: $g_1(x) = 0.5 - x$, $Z(g_1) = 1$. Lower panel: $g_2(x) = 0.5 - 5x + 12x^2 - 8x^3$, $Z(g_2) = 3$.]
Notes: This figure shows density plots of $\hat{Z}$, normalized by its sample mean and standard deviation, from the same simulations as figure 7. It also shows, as a reference, the density of a standard normal.

TABLE III
Monte Carlo rejection probabilities

    n     regression bandwidth   error s.d.   $\hat{P}(\cdot > z)$   $\hat{P}(\cdot < z)$
   400          0.065              0.179             0.05                   0.01
   800          0.059              0.194             0.03                   0.02
  1600          0.055              0.231             0.02                   0.01
  3200          0.052              0.290             0.02                   0.01
   400          0.065              0.268             0.03                   0.02
   800          0.059              0.292             0.01                   0.02
  1600          0.055              0.347             0.01                   0.01
  3200          0.052              0.434             0.01                   0.02

Notes: This table shows the frequency of rejection of the null under a test of asymptotic level 5%, for the sequences of Monte Carlo experiments described in appendix A. The $g$ are estimated by mean regression, the errors are uniformly distributed, and the first four experiments are generated using $g_1$ with one root, the next four using $g_2$ with three roots.
The columns show in turn sample size, regression bandwidth, error standard deviation, and the rejection probabilities of one-sided tests.

APPENDIX B: PROOFS

Proof of proposition 1: By continuity of $g'$ as well as genericity of $g$ we can choose $\rho$ small enough such that $\mathrm{sgn}(g'(x))$ is constantly equal to $\mathrm{sgn}(g'(x_c)) \neq 0$ in each of the neighborhoods of the $c = 1, \ldots, z$ roots of $g$, $\{x_c\}$, defined by $L_\rho(g(x)) \neq 0$. Hence we can write the integral $\int_{\mathcal{X}} L_\rho(g(x)) |g'(x)| dx$ as a sum of integrals over these neighborhoods, in each of which there is exactly one root. Assume w.l.o.g. that $z = 1$ and that the sign of $g'$ is constant in the range of $x$ where $L_\rho(g(x)) \neq 0$. Then, by a change of variables setting $y = g(x)$,

$$\int_{\mathcal{X}} L_\rho(g(x)) |g'(x)| dx = \int_{g(\mathcal{X})} L_\rho(y)\, |g'(g^{-1}(y))|\, \frac{1}{|g'(g^{-1}(y))|}\, dy = 1.$$

Proof of proposition 2: We need to find $\varepsilon$ such that $\|g - \tilde{g}\| < \varepsilon$ implies $Z(\tilde{g}) = Z(g)$. By genericity of $g$, each root $x_c$ of $g$ is such that $\mathrm{sgn}(g'(x_c)) \neq 0$. By continuous first derivatives we can then find $\delta$ such that $\mathrm{sgn}(g'(\cdot))$ is constant in the neighbourhood $NH_c := (x_c - \delta, x_c + \delta)$ of each of the finitely many roots $x_c$ and the $NH_c$ are mutually disjoint. By continuity of $g$,

$$\varepsilon_1 := \inf_{x \notin \bigcup_c NH_c} |g(x)| > 0 \tag{38}$$

and

$$\varepsilon_2 := \inf_{x \in \bigcup_c \overline{NH}_c} |g'(x)| > 0, \tag{39}$$

where $\overline{NH}_c$ is the closure of $NH_c$. Choosing $\varepsilon$ small enough relative to $\min(\varepsilon_1, \varepsilon_2)$ fulfills our purpose. To see this choose a $\tilde{g}$ such that $\|g - \tilde{g}\| < \varepsilon$. For $x \notin \bigcup_c NH_c$, $\tilde{g}$ is bounded away from zero by equation (38). In $NH_c$ there must be exactly one $x$ such that $\tilde{g}(x) = 0$: Since the $NH_c$ are mutually disjoint, $\mathrm{sgn}(g(x_c - \delta)) \neq \mathrm{sgn}(g(x_c + \delta))$, by (38) again $\mathrm{sgn}(g(x_c - \delta)) = \mathrm{sgn}(\tilde{g}(x_c - \delta))$ and $\mathrm{sgn}(g(x_c + \delta)) = \mathrm{sgn}(\tilde{g}(x_c + \delta))$, and finally the sign of $\tilde{g}'$ is constantly equal to $\mathrm{sgn}(g'(x_c))$ in $NH_c$ by equation (39).

The assertion for $Z_\rho$ follows now from the first part of this proof, combined with proposition 1, if we can choose a $\rho$ independent of $\tilde{g}$ such that proposition 1 applies. Sufficient for this is a $\rho$ that separates roots. Choosing $\rho$ small enough relative to $\varepsilon_1$ accomplishes this.
By equation (38), $L_\rho$ will separate the $NH_c$, and by the previous argument each of the $NH_c$ will contain exactly one root of $\tilde{g}$.

Proof of theorem 2: We will use $Z_1, Z_2, Z_3$ to denote a sequence of approximations to $\hat{Z}$. We write $Z =^A Z'$ if, after centering and scaling by some non-random sequences $a_n$ and $b_n$, $Z$ and $Z'$ have the same non-degenerate distributional limit. In particular, as long as such sequences exist that guarantee convergence to a non-degenerate limit, this is implied by equality up to a remainder which is asymptotically negligible under the given sequence of experiments, i.e., $Z =^A Z'$ if $Z = Z' + o_p(Z')$.

1) Approximation of $\hat{g}$ with $g$:

$$\hat{Z} =^A Z_1 := \int L_\rho(g(x))\, |\hat{g}'(x)|\, dx.$$

The remainder of this approximation is given by

$$\int \left( L_\rho(\hat{g}) - L_\rho(g) \right) |\hat{g}'|.$$

Negligibility of this remainder follows from uniform convergence of $\hat{g}$ under our sequence of experiments at a rate faster than $\rho$, which is a consequence of Bahadur expansion (6) and of $\rho/\tau \to \infty$. Assuming that $L_\rho$ is Lipschitz with constant $C/\rho^2$, this in turn implies uniform convergence of $(L_\rho(\hat{g}) - L_\rho(g))$ to 0. This, combined with the arguments proving distributional convergence of $\int L_\rho(g) |\hat{g}'|$ over neighborhoods of the roots of $g$, given below, proves that the remainder is $o_p(\hat{Z})$.

2) Approximation of $\hat{g}'$ in $Z_1$ by the Bahadur expansion: $Z_1 =^A Z_2$, where $Z_2$ is obtained from $Z_1$ by replacing $\hat{g}'(x)$ with $g'(x)$ plus the leading term of Bahadur expansion (6). The absolute value of the remainder of this approximation is less than or equal to

$$\int L_\rho(g)\, |R|,$$

where $R$ is the remainder of the Bahadur expansion. Negligibility of the remainder of the approximation is a consequence of the assumption that the remainder of the Bahadur expansion is negligible, i.e., $R = o_p\left((\hat{g}, \hat{g}') - (g, g')\right)$ uniformly in $x$.

3) Restriction to one root at 0 and Taylor approximations: Assume that $g(0) = 0$ and $g(x) \neq 0$ for $x \neq 0$ (i.e., $Z = 1$). This is without loss of generality, since the integral for the general case is simply a sum of the independent integrals in a neighborhood of each root. Now define $c = g'(0)$, let $w$ denote the value at $x = 0$ of the scaling factor in the Bahadur expansion (which involves $f^{-1}(0) s^{-1}(0)$), let $\tilde{\epsilon}_i$ denote the transformed residuals entering the expansion, and let $\tilde{K}(d) = K(d)\, d$. By replacing $g$ with $g'(0)x$ in $L_\rho$ and replacing $f^{-1}(x) s^{-1}(x)$ with its value at 0, both justified by smoothness and $\rho \to 0$, as well as $I_n(x) \to 1$ uniformly, we obtain an approximation of $Z_2$ whose integrand is $L_\rho(cx)$ times the absolute value of $c$ plus a kernel-weighted average of the $\tilde{\epsilon}_i$ scaled by $w$. The absolute value of the remainder of this approximation is bounded by the sum of two terms, one involving $|L_\rho(g) - L_\rho(cx)|$ and one involving $L_\rho(cx)$ times the deviation of $f^{-1}(x) s^{-1}(x) I_n(x)$ from its limit. Both terms in this expression go to 0 as $\rho \to 0$.

We can assume furthermore that $X_i \overset{iid}{\sim} \mathrm{Uni}([-\rho/c, \rho/c])$ conditional on falling in this interval and that $\epsilon_i \overset{iid}{\sim} f_{\epsilon|X}(e \mid X = 0)$. These assumptions are justified by another Taylor approximation, this time of the distribution functions, $F_X(X) = F_X(0) + f_X(0) X + o(X)$ and $F_{\epsilon|X}(\epsilon|X) = F_{\epsilon|X}(\epsilon|0) + O(X)$, assuming both distribution functions to be $C^1$. To see that this approximation is justified, note that distributional convergence to the same limit is equivalent to convergence of the expectations of any Lipschitz continuous bounded function of the statistics to the same limit. The difference in expectations between a function $h$ of $Z$ and of its approximation using conditionally uniform $X$ and i.i.d. $\epsilon$ is given by an integral of $h(Z)$ against the difference between the densities $f_X(X_i) f_{\epsilon|X}(\epsilon_i|X_i)$ and $f_X(0) f_{\epsilon|X}(\epsilon_i|0)$. This integral goes to 0 because the support of $h(Z)$ in $X$ is a neighborhood of 0 shrinking to 0.

4) Partitioning the range of integration: Partition $[-\rho/c, \rho/c]$ into subintervals $[t_j, t_{j+1}]$, $j = 1, \ldots, \lfloor \rho/(\tau c) \rfloor$, with $t_{j+1} - t_j = 2\tau$. Then

$$Z_2 =^A Z_3 := \sum_{j=1}^{\lfloor \rho/(\tau c) \rfloor} L_\rho(c t_j)\, \psi_j,$$

where $\psi_j$ is the integral over $[t_j, t_{j+1}]$ of the absolute value appearing in the integrand of step 3. The remainder of this approximation is controlled by $\max_{t_j < x < t_{j+1}} |L_\rho(cx) - L_\rho(c t_j)|$. This approximation is warranted by Lipschitz continuity of $L_\rho$ with a Lipschitz constant of order $1/\rho^2$, and by $\tau/\rho \to 0$.

5) Poisson approximation: The following argument essentially replaces the number of $X$ falling into the interval $[-\rho/c, \rho/c]$, which is approximately distributed $\mathrm{Bin}(n, 2 f(0) \rho/c)$, with a Poisson random variable with parameter $2 n f(0) \rho/c$; the distribution of everything else conditional on this number remains the same. Let $n_j$ be distributed i.i.d. $\mathrm{Poisson}(2 n \tau f(0))$ for $j = 1, \ldots, \lfloor \rho/(\tau c) \rfloor$. This is an approximation to the number of $X$ falling into the bin $[t_j, t_{j+1}]$. Draw $X_{jl} \overset{iid}{\sim} \mathrm{Uni}([t_j, t_{j+1}])$ and $\epsilon_{jl} \overset{iid}{\sim} f_{\epsilon|X}(e \mid X = 0)$ for $j = 1, \ldots, \lfloor \rho/(\tau c) \rfloor$ and $l = 1, \ldots, n_j$. Now define

$$\psi_j = \int_{t_j}^{t_{j+1}} \left| c + w \frac{r_n}{n \tau^2} \sum_{k=j-1}^{j+1} \sum_{l=1}^{n_k} \tilde{K}\left( \frac{X_{kl} - x}{\tau} \right) \tilde{\epsilon}_{kl} \right| dx.$$

Then

$$Z_3 =^A \sum_{j=1}^{\lfloor \rho/(\tau c) \rfloor} L_\rho(c t_j)\, \psi_j,$$

where the $\psi_j$ are identically distributed and $\psi_j$ is independent of $\psi_k$ for $|j - k| \geq 2$. Conditional on $\tilde{n} := \sum_j n_j$, the equality is exact. The exact distribution of the number of observations falling in the interval $[-\rho/c, \rho/c]$, corresponding to $\tilde{n}$, would be given by

$$\frac{(2 n (\rho/c) f(0))^{\tilde{n}}}{\tilde{n}!} \cdot \frac{n!}{n^{\tilde{n}} (n - \tilde{n})!} \left( 1 - 2 (\rho/c) f(0) \right)^{(n - \tilde{n})}.$$

The Poisson approximation sets the latter part of this expression to a constant in $\tilde{n}$. This is justified by the usual arguments deriving the Poisson distribution as a limit of Binomial distributions. The approximation of $Z_3$ follows by an argument similar to the one of point 3, second part, once we note that the multinomial p.m.f. converges uniformly.

6) Moments of the integrals over the subintervals:

$$E[\psi_j] = \tau \mu_1 + o(\tau)$$
$$E[\psi_j^2] = \tau^2 \mu_2 + o(\tau^2)$$
$$E[\psi_j \psi_{j+1}] = \tau^2 \mu_{11} + o(\tau^2)$$
$$E[\psi_j^3] = \tau^3 \mu_3 + o(\tau^3)$$

These equations follow from noting first pointwise convergence to normality of

$$\phi_j(x) = w \frac{r_n}{n \tau^2} \sum_{k=j-1}^{j+1} \sum_{l=1}^{n_k} \tilde{K}\left( \frac{X_{kl} - x}{\tau} \right) \tilde{\epsilon}_{kl} \to N(0, v)$$

under our sequence of experiments. This is the point where the rate $r_n$ matters: $\phi_j(x)$ can be rewritten as $w$ times a factor depending on $(n_{j-1} + n_j + n_{j+1})/(n\tau)$ times the normalized sum $(n_{j-1} + n_j + n_{j+1})^{-1/2} \sum_l K(\delta_l)\, \delta_l\, \tilde{\epsilon}_l$, where the $\delta_l$ are i.i.d. $\mathrm{Uni}[-3, 3]$. Now asymptotic normality follows by noting $(n_{j-1} + n_j + n_{j+1})/(n\tau) \to 6 f(0)$ and $E[K(\delta)\, \delta\, \tilde{\epsilon}] = 0$. Similarly, $(\phi_j(x_1), \phi_j(x_2))$ converges jointly to a bivariate normal with variances $v$ and covariance $v \cdot \mathrm{corr}(\phi_j(x_1), \phi_j(x_2))$.

Second, a change of the order of integration and the limit in $n$ delivers the claims, where this change of order is justifiable by the dominated convergence theorem. For instance,

$$\lim \left( E[\psi_j^2] / \tau^2 \right) = 4 \lim E\left[ \int_{[0,1]^2} \left| (c + \phi_j(t_j + 2\tau\theta_1)) (c + \phi_j(t_j + 2\tau\theta_2)) \right| d\theta_1 d\theta_2 \right] = 4 \int_{[0,1]^2} \lim E\left[ \left| (c + \phi_j(t_j + 2\tau\theta_1)) (c + \phi_j(t_j + 2\tau\theta_2)) \right| \right] d\theta_1 d\theta_2.$$

7) Central limit theorem applied to the sum of integrals over the subintervals: Now apply a central limit theorem for m-dependent sequences to the sum of integrals. For a definition of m-dependence, see Hoeffding and Robbins (1994). Note that $L_\rho(c t_j) \psi_j$ is an m-dependent sequence with $m = 1$. We have

$$\mathrm{Var}\left( \sum_j L_\rho(c t_j) \psi_j \right) = \sum_j \left[ L_\rho^2(c t_j) \mathrm{Var}(\psi_j) + L_\rho(c t_j) \left( L_\rho(c t_{j-1}) + L_\rho(c t_{j+1}) \right) \mathrm{Cov}(\psi_j, \psi_{j+1}) \right] \approx \frac{\tau}{2} \left( \mu_2 + 2\mu_{11} - 3\mu_1^2 \right) \int_{-\rho/c}^{\rho/c} L_\rho^2(cu)\, du.$$

Asymptotic normality of the standardized $Z_3$ follows, and by $\hat{Z} =^A Z_3$ the same holds for the standardized $\hat{Z}$. Furthermore, $E[Z_3] = O(1)$, and hence so is $E[\hat{Z}]$.

Proof of theorem 3: Fix one of the roots $x_0$ of $g$. By the arguments of the proof of theorem 2, $\partial/\partial x\, \hat{g}(x)$ (not to be confused with $\hat{g}'$) converges to a non-degenerate normal distribution for all $x$. In particular,

$$\liminf P\left( \mathrm{sign}(\partial/\partial x\, \hat{g}(x_0)) \neq \mathrm{sign}(g'(x_0)) \right) > 0.$$

By uniform convergence in levels of $\hat{g}$ and the intermediate value theorem (compare also figure 2),

$$P(Z(\hat{g}) > Z(g)) \geq P\left( \mathrm{sign}(\partial/\partial x\, \hat{g}(x_0)) \neq \mathrm{sign}(g'(x_0)) \right).$$

This proves the first claim. The second claim now immediately follows from $\rho/\tau \to \infty$.
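The contrast between the "naive" plug-in count $Z(\hat{g})$ and the smoothed estimator in theorem 3 can be explored numerically. The following is a hedged sketch under assumptions that are not taken from the paper's own code: local linear mean regression with an Epanechnikov kernel, a triangular kernel for $L_\rho$, the hypothetical value `rho = 0.15`, and the helper names `local_linear`, `naive_count`, `smoothed_count` and `run`. Sample size, regression bandwidth and error standard deviation follow the first row of Table III, with $g_1(x) = 0.5 - x$.

```python
import numpy as np

def g1(x):
    return 0.5 - x

def local_linear(x, y, grid, tau):
    """Local linear mean regression; returns level and slope estimates on grid."""
    d = x[None, :] - grid[:, None]                   # (m, n) array of X_i - x0
    u = d / tau
    w = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov weights
    s0, s1, s2 = w.sum(axis=1), (w * d).sum(axis=1), (w * d**2).sum(axis=1)
    t0 = (w * y[None, :]).sum(axis=1)
    t1 = (w * d * y[None, :]).sum(axis=1)
    det = s0 * s2 - s1**2
    g_hat = (s2 * t0 - s1 * t1) / det                # weighted least squares intercept
    dg_hat = (s0 * t1 - s1 * t0) / det               # weighted least squares slope
    return g_hat, dg_hat

def naive_count(g_hat):
    """Number of sign changes of the estimated function on the grid."""
    s = np.sign(g_hat)
    s = s[s != 0]
    return int(np.sum(s[1:] != s[:-1]))

def smoothed_count(g_hat, dg_hat, grid, rho):
    """Trapezoid approximation of the integral of L_rho(g_hat) |dg_hat|."""
    L = np.maximum(0.0, 1.0 - np.abs(g_hat) / rho) / rho   # triangular L_rho
    f = L * np.abs(dg_hat)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid)))

def run(reps=200, n=400, sigma=0.179, tau=0.065, rho=0.15, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    half = np.sqrt(3.0) * sigma                      # uniform errors with s.d. sigma
    naive = np.empty(reps, dtype=int)
    smooth = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0.0, 1.0, n)
        y = g1(x) + rng.uniform(-half, half, n)
        g_hat, dg_hat = local_linear(x, y, grid, tau)
        naive[r] = naive_count(g_hat)
        smooth[r] = smoothed_count(g_hat, dg_hat, grid, rho)
    return naive, smooth

naive, smooth = run()
print("naive:    mean %.3f, s.d. %.3f" % (naive.mean(), naive.std()))
print("smoothed: mean %.3f, s.d. %.3f" % (smooth.mean(), smooth.std()))
```

Comparing the two printed standard deviations gives a rough finite-sample impression of the relative dispersion; the outcome depends on the assumed `rho` and kernels, so it should not be read as a replication of figures 7 and 8.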
Proof of theorem 4 (Sketch): We will approximate $M(a, b, x, w_1)$ by a criterion function that has the form of equation (3), i.e., a local weighted average over the empirical distribution of some objective function. Based on this approximation we can then again apply the results of Kong, Linton, and Xia (2010). Newey (1994) provides a set of results that facilitate such approximations of partial means. In particular, lemma 5.4 in Newey (1994) allows derivation of the required approximation by replacing the outer sum over $j$ in equation (11) with an expectation, and by linearizing the fraction inside. The first replacement is asymptotically warranted since the variation created by averaging over the empirical distribution is of order $1/\sqrt{n}$ and hence dominated by the variation in the nonparametric component. The second replacement follows from differentiability and requires in particular that the denominator of the fraction be asymptotically bounded away from zero. This is guaranteed by the requirement that $W_2$ has full conditional support given $(X, W_1)$. Formally, lemma 5.4 in Newey (1994) gives

$$M(a, b, x, w_1) - E_{W_2}\left[ E[m \mid X = x, W_1 = w_1, W_2] \right] = \tilde{M}(a, b, x, w_1) + o_p(\tilde{M}(a, b, x, w_1)),$$

where

$$\tilde{M}(a, b, x, w_1) := \frac{1}{n} \sum_j K_\tau(X_j - x, W_{1j} - w_1)\, \frac{m(Y_j - a - b(X_j - x)) - E[m(Y_j - a - b(X_j - x)) \mid X_j, W_j]}{f_{X, W_1 | W_2}(X_j, W_{1j} \mid W_{2j})}. \tag{40}$$

This approximation of the objective function has the general form assumed in Kong, Linton, and Xia (2010) if we set

$$\tilde{m}(Y, X, W; a, b, x) := \frac{m(Y - a - b(X - x)) - E[m(Y - a - b(X - x)) \mid X, W]}{f_{X, W_1 | W_2}(X, W_1 \mid W_2)}, \tag{41}$$

providing us with the desired Bahadur expansion. Choosing the appropriate sequence of experiments, from here on the entire proof and result of theorem 2 go through unchanged. If $W_1 \neq \text{const.}$, the rates have to be adapted such that the number of observations within each rectangle of the partition goes to infinity.

Proof of theorem 5: The proof requires the following modifications relative to the one-dimensional case. Assumption 2 is still applicable, where the only difference in the $d$ dimensional case is that (6) has to be rescaled by a power of the bandwidth that depends on $d$. For $\hat{g}'$ to have a point-wise non-degenerate distributional limit, we have to choose the rate $r_n$ to equal $(n \tau^{4+d})^{1/2}$, which is slower for higher $d$. To see this note that $\mathrm{Var}(\hat{g}') = O\left( r_n^2 / (n \tau^{4+d}) \right)$. $L_\rho$ is Lipschitz continuous of order $\rho^{-(1+d)}$, so that a correspondingly adapted rate condition on $\tau$ relative to $\rho$ is required for step 4 of the proof. The range of integration has to be partitioned into rectangular subranges of area $\tau^d$ instead of intervals of length $\tau$. There will be approximately $\text{const} \cdot (\rho/\tau)^d$ such subintegrals. The variance of the integral of $|\hat{g}'|$ over each of these subranges will be of order $O(\tau^{2d})$, similarly for expectations and covariances. This yields a variance of $\hat{Z}$ of order $(\tau/\rho)^d$; see point 7 of the proof.

Proof of theorem 6: By equations (23) and (25), it is sufficient to show that the scaled derivative estimators $r_n \hat{g}_1'$ and $r_n \hat{g}_2'$, evaluated at the relevant arguments, converge jointly in distribution, while the level estimators converge in probability. These claims follow as before if we combine the convergence of $r_n$ from display (31) with Bahadur expansion (6) for $\hat{g}_2'$ and $\hat{g}_1'$, where the latter is evaluated at $\hat{g}_{2,n}$, which is not constant but converges.

Proof of lemma 1: By definition of conditional quantiles, $F_{\Delta X | X}(Q_{\Delta X | X}(q|X) \mid X) = q$. Differentiating this with respect to $X$ gives

$$\frac{\partial}{\partial X} Q_{\Delta X | X}(q|X) = - \frac{\frac{\partial}{\partial X} F_{\Delta X | X}(Q|X)}{f_{\Delta X | X}(Q|X)}. \tag{42}$$

The differential in the numerator has two components, one due to the structural relation between $X$ and $\Delta X$, i.e., the derivative with respect to the argument $X$ of $g(X, \epsilon)$, and one due to the stochastic dependence of $X$ and $\epsilon$:

$$\frac{\partial}{\partial X} F_{\Delta X | X}(Q|X) = - E\left[ g_X\, f_{\Delta X | g_X, X}(Q \mid g_X, X) \mid X \right] + \left. \frac{\partial}{\partial X} P\left( g(X', \epsilon) \leq Q \mid X \right) \right|_{X' = X}.$$

This can be seen as follows: We can decompose the derivative according to

$$\frac{\partial}{\partial X} F_{\Delta X | X}(Q|X) = \left. \frac{\partial}{\partial X'} P\left( g(X', \epsilon) \leq Q \mid X \right) \right|_{X' = X} + \left. \frac{\partial}{\partial X} P\left( g(X', \epsilon) \leq Q \mid X \right) \right|_{X' = X}.$$

To simplify the first derivative, note that by iterated expectations $P(g(X', \epsilon) \leq Q \mid X)$ can be written as the expectation, given $X$, of the corresponding conditional probability given $X$ and $g_X$. Differentiating this with respect to $X'$ gives

$$- E\left[ g_X\, f_{\Delta X | g_X, X}(Q \mid g_X, X) \mid X \right].$$

Proof of proposition 3: Since $X$ and $X + \Delta X$ have their support in the interval $[0, 1]$, $Q_{\Delta X | X}(q|0) \geq 0$ and $Q_{\Delta X | X}(q|1) \leq 0$. Therefore the unique root $X$ of $Q_{\Delta X | X}(q|X)$ must be stable, $\frac{\partial}{\partial X} Q_{\Delta X | X}(q|X) \leq 0$. By lemma 1 and the maintained assumption, $E[g_X \mid \Delta X = Q, X] \leq 0$. Finally, note that for all $X$ where $(0, X)$ is in the support of $(\Delta X, X)$ there is a $q$ such that $Q_{\Delta X | X}(q|X) = 0$.

Proof of proposition 4: The claims are immediate, noting that $N_c = \bigcap_s [x_c^s, x_{c+1}^s]$ for $c = 1, 3, \ldots$ and similarly for $P_c$, $c = 2, 4, \ldots$. Furthermore, $g(\cdot, \epsilon_i, s) < 0$ on $[x_c^s, x_{c+1}^s]$ for all $s$, $c = 1, 3, \ldots$, from which negativity on $N_c$ follows, similarly for $P_c$. Finally, under monotonicity of potential outcomes, assuming for simplicity differentiability of $g$,

$$\frac{\partial x_c}{\partial \epsilon} = - \frac{\partial g / \partial \epsilon}{\partial g / \partial x}.$$

The numerator is always positive by assumption, the denominator is negative for $c = 1, 3, \ldots$ and positive for $c = 2, 4, \ldots$ since we had assumed $g$ positive for sufficiently small $x$, hence $\partial x_c / \partial \epsilon$ is positive for $c = 1, 3, \ldots$ and negative for $c = 2, 4, \ldots$. The claim now is immediate.

REFERENCES

Andrews, D., and X. Cheng (2010): "Estimation and inference with weak, semi-strong, and strong identification."
Azariadis, C., and J.
Stachurski (2005): "Poverty traps," Handbook of Economic Growth, 1, 295–384.
Bajari, P., H. Hong, J. Krainer, and D. Nekipelov (2006): "Estimating static models of strategic interaction," NBER working paper.
Becker, G., and K. Murphy (2000): Social economics: Market behavior in a social environment. Harvard University Press.
Berry, S. (1992): "Estimation of a model of entry in the airline industry," Econometrica, 60(4), 889–917.
Bowles, S., S. Durlauf, and K. Hoff (2006): Poverty traps. Princeton University Press.
Bresnahan, T., and P. Reiss (1991): "Entry and competition in concentrated markets," Journal of Political Economy, 99(5), 977–1009.
Card, D., A. Mas, and J. Rothstein (2008): "Tipping and the dynamics of segregation," Quarterly Journal of Economics, 123(1), 177–218.
Choirat, C., and R. Seri (2012): "Estimation in Discrete Parameter Model," forthcoming in Statistical Science.
Dasgupta, P., and D. Ray (1986): "Inequality as a determinant of malnutrition and unemployment: Theory," The Economic Journal, 96(384), 1011–1034.
Giné, E., D. Mason, and A. Zaitsev (2003): "The L1-norm density estimator process," The Annals of Probability, 31(2), 719–768.
Hoeffding, W., and H. Robbins (1994): "The central limit theorem for dependent random variables," in The collected works of Wassily Hoeffding. Springer.
Horowitz, J. (2001): "The Bootstrap," Handbook of Econometrics, 5, 3159–3228.
Horváth, L. (1991): "On Lp-norms of multivariate density estimators," The Annals of Statistics, 19(4), 1933–1949.
Imbens, G., and J. Wooldridge (2007): "What's new in econometrics? Weak instruments and many instruments," NBER Lecture Notes 13, Summer 2007.
Kasy, M. (2010): "Nonparametric inference on the number of equilibria, web appendix," https://sites.google.com/site/maxkasywp/Home/wps/Appendixtestingmulteq.pdf.
Koenker, R. (2009): "Quantile Regression," http://www.econ.uiuc.edu/~roger/research/rq/rq.html, accessed January 30, 2009.
Kong, E., O. Linton, and Y.
Xia (2010): "Uniform Bahadur representation for local polynomial estimates of m-regression and its application to the additive model," Econometric Theory, 26, 1–36.
Newey, W. K. (1994): "Kernel estimation of partial means and a general variance estimator," Econometric Theory, 10(2), 233–253.
Quah, D. (1996): "Empirics for economic growth and convergence," European Economic Review, 40(6), 1353–1375.
Staiger, D., and J. H. Stock (1997): "Instrumental Variables Regression with Weak Instruments," Econometrica, 65(3), 557–586.
van der Vaart, A. (1998): Asymptotic statistics. Cambridge University Press.
Young, H. (2008): "Social norms," in The New Palgrave Dictionary of Economics, ed. by S. Durlauf, and L. Blume, vol. 2.