Massachusetts Institute of Technology
Department of Economics Working Paper Series

LIKELIHOOD INFERENCE FOR SOME NON-REGULAR ECONOMETRIC MODELS

Victor Chernozhukov, MIT
Han Hong, Princeton

Working Paper 02-05
February 2002

Room E52-251, 50 Memorial Drive, Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://papers.ssrn.com/paper.taf?abstract_id=301486

Likelihood Inference in a Class of Nonregular Econometric Models*

Victor Chernozhukov, Department of Economics, Massachusetts Institute of Technology
Han Hong, Department of Economics, Princeton University

First Version: August 2000. This version: February 11, 2002.

Abstract

In this paper we study inference for a conditional model with a jump in the conditional density, where the location and size of the jump are described by regression lines. This interesting structure is shared by several structural econometric models. Two prominent examples are the standard auction model, where the density jumps from zero to a positive value, and the equilibrium job search model, where the density jumps from one level to another, inducing kinks in the cumulative distribution function. This paper develops the asymptotic inference theory for likelihood-based estimators of these models - the Bayes and maximum likelihood estimators. Bayes and ML estimators are useful classical procedures. While the MLE is transformation invariant, Bayes estimators offer some theoretic and computational advantages. They also have desirable efficiency properties. We characterize the limit likelihood as a function of a Poisson process that tracks the near-to-jump events and depends on regressors. The approach is applied to an empirical model of a highway procurement auction. We estimate a Pareto model of Paarsch (1992) and a flexible alternative parametric model.

Key Words: Structural Econometric Model, Auctions, Job Search, Highway Procurement Auction, Likelihood, Point Process, Stochastic Equisemicontinuity

* We would like to thank Joe Altonji, Stephen Donald, Jerry Hausman, Bo Honore, Joel Horowitz, Shakeeb Khan, Yuichi Kitamura, Rosa Matzkin, Whitney Newey, George Neumann, Harry Paarsch, Frank Schorfheide, Robin Sickles, Richard Spady, and Max Stinchcombe, as well as participants in the CEME conference, the EC2 meeting, and various seminars, for valuable suggestions. The problem was suggested to us by our advisor Takeshi Amemiya. We are grateful to him for enlightening conversations and continued support.
Contents

1 Introduction
2 The Basic Model
  2.1 Assumptions
  2.2 Definition and Motivation of Bayes and ML Estimators
3 Asymptotic Theory for the Basic Model
  3.1 Likelihood Limit
  3.2 Large Sample Properties of Bayes Estimators
  3.3 Maximum Likelihood
4 Nonlinear Model with Nuisance Parameters
  4.1 Limits of Likelihood
  4.2 Asymptotic Behavior of Bayesian and ML Estimators
5 Efficiency
6 Confidence Intervals and Practical Questions
7 Empirical Illustration
A Useful Background Definitions
  A.1 Point Processes
  A.2 Convex Objectives
  A.3 Stochastic Equisemicontinuity
B Proofs for the Linear Model
  B.1 Proof of Theorem 1
  B.2 Proof of Theorem 2
  B.3 Proof of Theorem 3
  B.4 Lemmas 4-6
C Proofs for the Non-Linear Model
  C.1 Proof of Theorem 4
  C.2 Proof of Theorem 5
  C.3 Proof of Theorem 6
  C.4 Proof of Theorems 7 and 8
  C.5 Lemmas 7-8

1 Introduction

In this paper we consider a conditional model with a jump in the conditional density, whose location and size are described by regression lines. This model was first proposed by Aigner, Amemiya, and Poirier (1976) in the context of production analysis. Many recent econometric models also share this interesting structure. For example, in standard auction models, cf. Donald and Paarsch (1993a), the conditional density jumps from zero to a positive value; in equilibrium job search models (Bowlus, Neumann, and Kiefer (2001)), the density jumps from one level to another, inducing kinks in the distribution function. In what follows, the former model is referred to as the one-sided or boundary model, while the latter model is the two-sided model. It is typical in these models that the location of the jump is indispensably related to the parameters of the underlying structural economic model. Learning the parameters of the location is thus crucial for learning the parameters of the underlying economic model.

Several important, fundamental papers developed inference methods for such models, including Aigner, Amemiya, and Poirier (1976), Ibragimov and Has'minskii (1981), Flinn and Heckman (1982), Christensen and Kiefer (1991), Donald and Paarsch (1996), Donald and Paarsch (1993b), Donald and Paarsch (1993a), and Bowlus, Neumann, and Kiefer (2001). Ibragimov and Has'minskii (1981) (IH afterwards) obtained the limit distributions of Bayes and maximum likelihood estimators (MLE) without covariates. Donald and Paarsch (1996) dealt with the MLE in one-sided (boundary) models with discrete covariates. Nevertheless, the general inference problem posed by Aigner, Amemiya, and Poirier (1976) has remained unresolved. The basic asymptotic properties of Bayes and ML estimators in the general two-sided regression model are still unknown. The properties of Bayes estimators in the one-sided model and the properties of the MLE in the one-sided model with general regressors are also open questions. Without understanding these basic properties, using classical estimation principles in these econometric applications may be questionable.

In this paper, we develop the asymptotic theory of Bayes and Maximum Likelihood estimators for a general conditional model of a density jump, including one-sided and two-sided models with arbitrary covariates. Bayes estimators and the MLE are attractive estimation procedures. While the MLE is transformation invariant, the Bayes estimators offer some theoretic and computational advantages, and are convenient in practice. They also have desirable efficiency properties.
Further details may be summarized as follows. We will show that the limit of the likelihood process is a stochastic integral of a Poisson point process that tracks the conditional near-to-jump events. The result is analogous in spirit to that of Chernozhukov (2000), obtained for the extremal quantile regression. It will be shown that Bayes estimators behave asymptotically as functions of the likelihood limit. Unlike the usual case of regular parametric models, Bayes estimators are not asymptotically equivalent to ML. In fact, Bayes estimators are efficient in terms of finite-sample and asymptotic average-risk optimality, which strongly justifies their use. [Footnote: The MLE is not optimal for loss functions in the conventional sense, but it may be shown to be optimal for a generalized Dirac loss function.] We do not study the minimax criteria. A recent contribution by Hirano and Porter (2001) offers a substantive analysis in this interesting direction in the context of one-sided discrete covariate models.

We will also demonstrate that the MLE behaves asymptotically as a function of the likelihood limit. Our proof uses the concept of stochastic equi-semicontinuity of Knight (1999). In our opinion, the result makes a convincing case for its further applications in econometrics and statistics.

Finally, we will study these methods in simulations and apply them to estimate models of a highway procurement auction. The first model we estimate is a stylized Pareto model of Paarsch (1992). The second one is a flexible parametric alternative to the non-parametric model of Guerre, Perrigne, and Vuong (2000). We also implemented computer programs with Markov Chain Monte Carlo methods for the estimators. These programs are available from the authors.

The paper is organized as follows. Section 2 describes a basic linear model. Section 3 develops the asymptotic theory for this model. Section 4 considers a more general nonlinear model with nuisance parameters. Section 5 discusses efficiency issues. Section 6 discusses practical aspects of inference and estimation. Throughout the paper, c and C denote generic positive constants; ->_p and ->_d denote convergence in probability and in distribution, respectively; and |.| denotes the supremum norm of a vector.

2 The Basic Model

This section begins with a basic linear model, which helps establish the main results clearly. Extensions to general nonlinear models are given in Section 4.

2.1 Assumptions

The basic model, denoted R, takes the following form:

  Y_i = X_i'β + ε_i,    (1)

where the error ε_i has the conditional density f(ε|X_i, β), parameterized by β belonging to the set B, a compact, convex subset of R^d. We denote the reference parameter as β_0, and assume β_0 ∈ interior B. The conditional density has a jump at zero:

  lim_{ε↓0} f(ε|x, β) = p(x, β),   lim_{ε↑0} f(ε|x, β) = q(x, β),    (2)
  p(x, β) > q(x, β) + δ,  δ > 0,  for all x ∈ X and all β ∈ B.

In other words, the conditional density of Y given X jumps at the location X'β, which depends on the parameter β and the covariate X. The shape of the density may also depend on the parameter β. In Section 4, it will be made dependent on other parameters as well, and X'β will be generalized to a nonlinear function.

We have two models to consider: the one-sided model and the two-sided model. The one-sided model has its conditional density jumping from zero to a positive constant. The two-sided model has its conditional density jumping from one positive value to another positive value.
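For concreteness, model (1)-(2) can be simulated directly. The sketch below is illustrative only: the exponential tail shapes, the density levels p and q at zero, the uniform covariate distribution, and the parameter values are assumptions made for the example and are not part of the model description above.

```python
# A minimal sketch (not from the paper): simulating the linear model (1)-(2).
# The tail shapes, density levels p and q, covariate law, and sample size are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def simulate_jump_model(n, beta, p=1.0, q=0.3, two_sided=True):
    """Draw (Y_i, X_i) with the conditional density of e_i = Y_i - X_i'beta jumping at 0.

    Two-sided case: f(e|x) = p*exp(-2p*e) for e > 0 and q*exp(2q*e) for e < 0,
    so f(0+) = p > q = f(0-) and the density integrates to one.
    One-sided case: f(e|x) = p*exp(-p*e) for e > 0 and 0 for e < 0 (boundary model).
    """
    d = len(beta)
    X = rng.uniform(1.0, 2.0, size=(n, d))            # compactly supported covariates
    if two_sided:
        pos = rng.random(n) < 0.5                     # each tail carries mass 1/2
        eps = np.where(pos,
                       rng.exponential(1 / (2 * p), n),
                       -rng.exponential(1 / (2 * q), n))
    else:
        eps = rng.exponential(1 / p, n)               # support (0, inf)
    return X @ beta + eps, X

Y, X = simulate_jump_model(n=200, beta=np.array([0.5, 0.5]))
```

Any density with the stated one-sided or two-sided jump at zero could replace the exponential tails used here.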
Figure 1 illustrates the two models.

[Figure 1 about here. Panel A: density and cdf of a two-sided model; Panel B: density and cdf of a one-sided model.]

Figure 1: Panel A corresponds to the two-sided model, which arises in equilibrium job search models and translates into kinks of the cumulative distribution function, see Bowlus et al. (2001). The two-sided model also arises from one-sided models when, with a low probability, we draw an outlier which ends up below the boundary of the support. This model was initially proposed by Aigner et al. (1976). See also Hajari (1999) for a robustness critique of the one-sided model in the auction context. Panel B corresponds to the one-sided model. Note that the location of the density jump depends on parameters and regressors. Additional shape parameters will be introduced in Section 4.

The one-sided model is a special case of the two-sided model. In addition, as suggested by Aigner, Amemiya, and Poirier (1976), the two-sided model may be applied to one-sided models, using the lower density to account for outliers. More generally, the two-sided model approximates models with a sharp increase of the density in a finite region, whose location depends on parameters and regressors. The finite-sample distributions of parameter estimates in such a model are approximated by those in the model with a density jump. It is typical in these models that the location of the jump is indispensably related to the parameters of the underlying structural economic model. Learning the parameters of the location is thus crucial for learning the parameters of the underlying economic model.

We maintain the following additional assumptions for model R.

Assumption 1. The following statements apply to all x in X and β in B:

(C.1) {Y_i, X_i} is an i.i.d. sequence of vectors in R x R^d, defined on (Ω, F, P_β). X_i has c.d.f. F_X, with compact support X, which does not depend on β, and Var(X) > 0.

(C.2) In addition to (1)-(2), uniformly in β and x, either
  i. f(u|x, β) = q(x, β) = 0 for u < 0, or
  ii. q(x, β) ≥ c > 0.

(C.3) Except at ε = 0, f(ε|x, β) has continuous derivatives in ε and β, which are upper-semicontinuous at ε = 0 and dominated: sup_{β∈B} |∂f(ε|x, β)/∂ε| ≤ K uniformly in ε, x, β. W.l.o.g. f(ε|x, β) ≤ K. Moreover, E_X ∫ |∂f(y − X'β|X; β)/∂y| dy < ∞.

(C.4) There exist δ > 0, C > 0, and K > 0 such that, uniformly in x and β and for |c| ≤ K, |ln f(ε + c|x, β)| ≤ C(ε, x) ≡ C |ln f(ε|x, β)|^{1+δ}; in case C.2.i this bound applies only to c, ε such that ε + c > 0. Moreover, sup_t E_{β_0} C(ε_t, X_t) < ∞.

Assumption C.2 allows for the boundary case, where the density is zero to the left side of the jump and is positive on the right side. It also allows for the two-sided case, where the density is positive on both sides. We distinguish these two cases to organize the proofs better. C.3 and C.4 are needed for uniform convergence of the continuous part of the likelihood ratio. They will be satisfied as long as the derivative of the density is not ill-behaved in the tails. Finally, the linearity of the regression function eases the exposition. Section 4 will consider a more general non-linear model with nuisance parameters.

2.2 Definition and Motivation of Bayes and ML Estimators

The likelihood function for the model is given by

  L_n(β) = Π_{i≤n} f(Y_i − X_i'β | X_i; β) dF_X(X_i),

and the ML estimator is defined as

  β̂_ML = argmin_{β∈B} − L_n(β).
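Continuing the illustrative simulation above, the log-likelihood ln L_n(β) can be coded directly; dF_X factors out. The densities below are the illustrative ones used in the previous sketch, not part of the model itself. In the boundary case the log-likelihood equals minus infinity at any β that places an observation below the implied support, which is the source of the non-regular behavior studied in Section 3.

```python
# A minimal sketch continuing the simulated example above; the densities are the
# illustrative exponential-tail ones, not the paper's empirical specification.
import numpy as np

def log_density(e, p=1.0, q=0.3, two_sided=True):
    """log f(e|x) for the illustrative jump densities; -inf outside the support."""
    if two_sided:
        return np.where(e > 0, np.log(p) - 2 * p * e, np.log(q) + 2 * q * e)
    return np.where(e > 0, np.log(p) - p * e, -np.inf)

def loglik(beta, Y, X, **kw):
    """log L_n(beta) = sum_i log f(Y_i - X_i'beta | X_i); dF_X factors out."""
    e = Y - X @ beta
    return np.sum(log_density(e, **kw))

# The MLE maximizes loglik over B; in the boundary model any beta with some
# X_i'beta > Y_i receives log-likelihood -inf, producing the sharp, non-smooth
# objective discussed in Section 3.
```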
On the other hand, the Bayes estimator minimizes the posterior expected loss:

  β̂ = arginf_{b∈B} ∫_B ρ_n(b − β) [ L_n(β) q(β) / ∫_B L_n(β') q(β') dβ' ] dβ,

where ρ_n(x) = ρ(n x) is a loss function and q(·) is the prior density or weight function on B. In the above expression, L_n(β) q(β) / ∫_B L_n(β) q(β) dβ is the posterior density conditional on the data (Y_t, X_t, t ≤ n). It does not depend on dF_X(X_i).

We impose the standard assumptions on ρ(·) and q(·), cf. IH (1981):

Assumption 2.
(D.1) q(·) > 0 and is continuous on B.
(D.2) ρ(·) ≥ 0 is convex, and is majorized by a polynomial of |u| as |u| → ∞.

Examples of conventional loss functions satisfying condition (D.2) include the quadratic loss ρ(z) = z'Wz, where W is positive definite, and the absolute deviation loss ρ(z) = λ' abs(z), where λ > 0 and abs(z) = {|z_j|}.

For symmetric and bowl-shaped loss functions, the ML and Bayes estimators are efficient and asymptotically equivalent under asymptotic normality. If normality does not apply, as in our case, the Bayes estimators and the MLE are typically not asymptotically equivalent. Therefore, the MLE does not inherit the efficiency of Bayes estimators; Bayes estimators are average-risk efficient under conventional loss functions, while the MLE is not. [Footnote: See for example van der Vaart (1999). Efficiency for the MLE could be claimed for the "delta loss," but not for the quadratic or absolute deviation loss.]

3 Asymptotic Theory for the Basic Model

We begin with the asymptotic behavior of the likelihood process, and then proceed to the asymptotic distributions of the Bayes and ML estimators.

3.1 Likelihood Limit

In modern asymptotic analysis, a common first step is to find the finite-dimensional or marginal limit of the likelihood ratio process. The limit eventually serves to describe the asymptotic distribution of Bayes and ML estimators. Such an initial step is called the convergence of experiments, see van der Vaart and Wellner (1996).

Consider the local likelihood ratio function ℓ_n(z) = L_n(β_n + z/n) / L_n(β_n), where β_n = β_0 + δ/n, for any finite δ ∈ R^d, denotes the true parameter sequence. [Footnote: dF_X(X_i) does not depend on β and is hence irrelevant.] Finite-dimensional weak convergence,

  (ℓ_n(z_j), j ≤ J) →_fi-di (ℓ_∞(z_j), j ≤ J),    (3)

means that (ℓ_n(z_j), j ≤ J) →_d (ℓ_∞(z_j), j ≤ J), where →_d denotes convergence under P_{β_n}; ℓ_∞(·) is called a fi-di limit. This is needed to study asymptotic efficiency later. In this section, define p(X) = p(X, β_0) and q(X) = q(X, β_0).

Theorem 1. Given Assumption 1, the fi-di weak limit of the likelihood ratio ℓ_n(z) equals

  ℓ_∞(z) = exp{ z' E X [p(X) − q(X)] } × exp{ ∫_E l_z(j, x) dN(j, x) },    (4)

where

  l_z(j, x) = ln(q(x)/p(x)) 1[0 < j ≤ x'z] + ln(p(x)/q(x)) 1[0 > j ≥ x'z],

N is a Poisson random measure, N(·) = Σ_{i≥1} 1[(J_i, X_i) ∈ ·] + Σ_{i≥1} 1[(J'_i, X'_i) ∈ ·], with

  J_i = Γ_i / p(X_i),  J'_i = −Γ'_i / q(X'_i),  Γ_i = E_1 + ... + E_i,  Γ'_i = E'_1 + ... + E'_i,

where {X_i} are i.i.d. with d.f. F_X, {X'_i} is an independent copy of {X_i}, and {E_i} and {E'_i} are two i.i.d., mutually independent sequences of standard exponential random variables that are also independent of {X_i} and {X'_i}.

The result has an intuitive appeal. The limit likelihood ℓ_∞(z) has two informative parts. The deterministic "outer" part,

  ℓ_1(z) = exp{ z' E X [p(X) − q(X)] },

can be regarded as information created by the far-from-jump data. The "inner" stochastic part,

  ℓ_2(z) = exp{ ∫_E l_z(j, x) dN(j, x) } = exp{ Σ_{i≥1} l_z(J_i, X_i) + Σ_{i≥1} l_z(J'_i, X'_i) },

can be interpreted as information created by the near-to-jump data. N is an asymptotic model of the near-to-jump data. As equation (4) shows, the points of N, (J_i, X_i) and (J'_i, X'_i), depend on the regressors X_i in a complex way. N is the limit of the point process

  N_n(·) = Σ_{i: ε_i > 0} 1[(nε_i, X_i) ∈ ·] + Σ_{i: ε_i < 0} 1[(nε_i, X_i) ∈ ·].
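The limit in Theorem 1 can be simulated directly from its Poisson-process representation. The sketch below does so for a scalar covariate with constant density levels p(x) = p and q(x) = q; the truncation of the infinite point sequences and all numerical values are illustrative choices, and for bounded z only finitely many points matter.

```python
# A minimal sketch (not from the paper): simulating the limit likelihood of
# Theorem 1 with a scalar covariate and constant levels p(x) = p, q(x) = q.
# The truncation length n_pts and all numerical values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
p, q = 1.0, 0.3
EX_times_pq = 1.5 * (p - q)          # E[X (p(X) - q(X))] for X ~ Uniform(1, 2)

def draw_limit_process(n_pts=500):
    """Points (J_i, X_i) and (J'_i, X'_i) of the Poisson random measure N."""
    G  = np.cumsum(rng.exponential(1.0, n_pts))   # Gamma_i  = E_1 + ... + E_i
    Gp = np.cumsum(rng.exponential(1.0, n_pts))   # Gamma'_i = E'_1 + ... + E'_i
    X  = rng.uniform(1.0, 2.0, n_pts)             # marks, i.i.d. F_X
    Xp = rng.uniform(1.0, 2.0, n_pts)             # independent copy
    return G / p, X, -Gp / q, Xp                  # J_i = Gamma_i/p, J'_i = -Gamma'_i/q

def ell_infinity(z, J, X, Jp, Xp):
    """Evaluate the limit likelihood at a scalar z."""
    inner = (np.log(q / p) * np.sum((J > 0) & (J <= X * z)) +
             np.log(p / q) * np.sum((Jp < 0) & (Jp >= Xp * z)))
    return np.exp(z * EX_times_pq + inner)

J, X, Jp, Xp = draw_limit_process()
values = [ell_infinity(z, J, X, Jp, Xp) for z in np.linspace(-3, 3, 13)]
```

Evaluating ℓ_∞(z) on a grid in this way displays the piecewise-constant jumps contributed by the inner part, multiplied by the smooth exponential drift of the outer part.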
The measure N_n(A) counts the number of points in any given set A. For any bounded set A, the limit behavior of N_n(A) depends only on the near-to-jump errors nε_i and the corresponding covariate values. The smallest |ε_i|'s are the ones that matter, and they converge in law to mutually dependent gamma variables. Furthermore, in large samples the likelihood is driven mainly by the near-to-jump data, revealing that β̂ converges at the rate O_p(n^{-1}). The fast convergence rate is not surprising. In the simplest one-sided case with no covariates, the MLE is the minimal order statistic, which converges to the end-point of the support at the O_p(n^{-1}) rate.

Note also an important simplification of the formulae in the one-sided case. Since q(x) = 0,

  ℓ_1(z) = exp{ z' E X p(X) },  and  ℓ_2(z) = 1 if J_i > X_i'z for all i ≥ 1, and ℓ_2(z) = 0 otherwise.

The inner part ℓ_2 is very informative in assigning zero likelihood to certain values of z; otherwise, ℓ_2(z) is flat. Once z is assigned a zero inner likelihood, ℓ_∞(z) equals zero. In the two-sided model, when q(x) > 0, the outer part ℓ_1(z) further shapes the likelihood. Both ℓ_1 and ℓ_2 shape the limit likelihood.

3.2 Large Sample Properties of Bayes Estimators

The normalized Bayes estimator Z_n = n(β̂ − β_n) is obtained by minimizing the posterior loss:

  Z_n = argmin_{z∈U_n} ∫_{U_n} ρ(z − u) π_n(u) du,

where

  π_n(u) = ℓ_n(u) q(β_n + u/n) / ∫_{U_n} ℓ_n(u') q(β_n + u'/n) du'

is the posterior density on the rescaled parameter space U_n = n(B − β_n), ρ is the loss function, q is the prior density function, and ℓ_n(·) is the likelihood ratio process defined in the previous section. As n → ∞, ℓ_n(z) approaches ℓ_∞(z), and π_n(z) approaches the limit posterior π_∞, which is free from prior information. The limit is thus simple to conjecture.

Theorem 2. Suppose Assumptions 1 and 2 hold. For ℓ_∞(·) specified in Theorem 1, define

  Γ_∞(z) = ∫_{R^d} ρ(z − u) π_∞(u) du,  π_∞(u) = ℓ_∞(u) / ∫_{R^d} ℓ_∞(u') du'.

Suppose that Z_∞ = argmin_{z∈R^d} Γ_∞(z) is uniquely defined a.s. (*). Then

  Z_n →_d Z_∞.

Remark 3.1. The condition (*) is automatic for strictly convex functions ρ(z) with a unique minimum at z = 0, since ℓ_∞(z) is positive a.s. on a subset of R^d with positive Lebesgue measure, by the assumed non-degeneracy of X.

3.3 Maximum Likelihood

The MLE Z_n = n(β̂_ML − β_n) maximizes the local likelihood ratio process:

  Z_n = argmin_{z∈U_n} − ℓ_n(z).

[Footnote: If Z_n is set-valued, we may choose any measurable solution. Alternatively, define Z_n as any measurable ε_n-approximate argmin, i.e. such that −ℓ_n(Z_n) ≤ inf_{z∈U_n} −ℓ_n(z) + ε_n, ε_n ↓ 0. Allowing approximate solutions is useful for situations in which it may be difficult to find the exact optimum.]
[Footnote: The MLE is a special Bayes estimator minimizing the posterior loss Γ_n(z) = −∫ δ_z(u) ℓ_n(u) du = −ℓ_n(z), where δ_z(·) is the delta function, which is too irregular to be a subject of the previous section.]

Because ℓ_n(·) is a highly non-regular function, the standard uniform convergence arguments are not applicable. One approach, taken by IH (1981), treats ℓ_n(·) as an element of a Skorohod space. There are substantive difficulties with this approach in the regression case, where there is more than one parameter. Instead, we employ Knight's stochastic equisemicontinuity, which converts the finite-dimensional convergence of discontinuous objective functions into convergence of argmins. Appendix A provides a brief discussion of this new concept.

Theorem 3. Suppose Assumption 1 holds and that −ℓ_∞(z) attains a unique minimum a.s. Then

  Z_n →_d Z_∞ = argmin_{z∈R^d} − ℓ_∞(z).

Remark 3.2. The condition that −ℓ_∞(z) attains a unique minimum a.s. is needed; otherwise the limit distribution may fail to exist.

The important special case of the MLE with discrete covariates in the one-sided model has been studied in the remarkable pioneering work of Donald and Paarsch (1996) and Donald and Paarsch (1993a). The results obtained here extend to continuous covariates and, importantly, also to two-sided cases.
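The no-covariate remark above is easy to verify numerically: in the one-sided model without covariates the MLE of the boundary is min_i Y_i, and n(min_i Y_i − β_0) converges in law to an exponential variable with rate p = f(0+), since P(n(min_i Y_i − β_0) > t) = (1 − F(t/n))^n → exp(−p t). The error density used in the sketch below is an illustrative assumption.

```python
# A small numerical check (not from the paper) of the remark above: in the
# one-sided model without covariates the MLE of the boundary is the minimal
# order statistic, and n(min Y_i - beta_0) has an exponential limit with rate
# p = f(0+). The error density Exp(p) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(2)
beta0, p, n, n_sims = 1.0, 2.0, 500, 20_000

draws = np.empty(n_sims)
for s in range(n_sims):
    Y = beta0 + rng.exponential(1 / p, n)      # Y_i = beta_0 + e_i with f(0+) = p
    draws[s] = n * (Y.min() - beta0)           # normalized MLE error

# Compare simulated tail probabilities with the exponential limit exp(-p * t).
for t in (0.25, 0.5, 1.0):
    print(t, (draws > t).mean(), np.exp(-p * t))
```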
4 Nonlinear Model with Nuisance Parameters

In this section we consider a more general nonlinear model and also introduce nuisance parameters. While the linear model conveys the basic flavor and allows for a better explanation of the proofs, the nonlinear setup conforms with the economic models described in the introduction.

The generalized model, denoted R̄, is given by

  Y_i = g(X_i, β) + ε_i,

where the error ε_i has the conditional density f(ε|X_i, β, α), parameterized by β ∈ B ⊂ R^{d_1} and α ∈ A ⊂ R^{d_2}. We assume that the set Θ = B × A is compact and convex and that the reference parameter γ_0 = (β_0, α_0) belongs to the interior of this set.

Both the location and the size of the jump of the conditional density of ε at 0, given X_i, may depend on β and α:

  lim_{ε↓0} f(ε|x, β, α) = p(x, β, α),  lim_{ε↑0} f(ε|x, β, α) = q(x, β, α),    (5)
  p(x, β, α) > q(x, β, α) + δ,  δ > 0,  for all x ∈ X, (β, α) ∈ R^d, d = d_1 + d_2.

In other words, the conditional density of Y given X jumps at the location g(X, β), which depends nonlinearly on the parameter β and the covariate X. The shape of the density depends on the parameters β, α and the covariates X. The additional shape parameter α is not related to the parameter of the location function. This model is therefore considerably more flexible than the basic linear model.

The assumptions and results for the non-linear model are very similar to those for the linear model. However, the presence of nuisance parameters adds to the complexity of the exposition. We make the following additional technical assumptions:

Assumption 3. The following statements apply to all x in X and γ = (β, α) in Θ:

(E.1) {Y_i, X_i} is an i.i.d. sequence of vectors in R x R^d, defined on (Ω, F, P_γ). X_i has c.d.f. F_X, with compact support X. Condition (5) holds, and uniformly in β, α, and x, either
  i. f(ε|x, β, α) = q(x, β, α) = 0 for ε < 0, or
  ii. q(x, β, α) ≥ c > 0.

(E.2) The density f(ε|x, γ) has continuous derivatives in ε and γ, except at ε = 0, which are upper-semicontinuous at ε = 0 and bounded uniformly in ε, x, γ; it has a continuous and bounded second derivative in α, uniformly in ε, x, γ. W.l.o.g. f(ε|x, γ) ≤ K and E_X ∫ |∂f(y − g(X, β)|X; γ)/∂y| dy < ∞.

(E.3) g(x, β) has two continuous and bounded derivatives in β, for each x and β.

(E.4) Let f_i(γ') = f(y − g(x, t); x, γ') for γ' = (t, s) in an open ball at γ = (β, α). For each γ, either (a) E_{β,α} [∂ ln f_i(γ)/∂γ'][∂ ln f_i(γ)/∂γ']' (which includes the location parameters) is nonsingular and bounded, uniformly in γ, or (b) E_{β,α} [∂ ln f_i(γ)/∂s][∂ ln f_i(γ)/∂s]' is nonsingular and bounded, uniformly in γ. In either case the information matrix for the shape parameter α is positive definite uniformly in γ.

(E.5) The score and its derivative are dominated: |∂ ln f(ε + c|x, γ)/∂γ| ≤ C(ε, x) and |∂² ln f(ε + c|x, γ)/∂γ∂γ'| ≤ C''(ε, x) for |c| ≤ K, uniformly in x and γ, where in case E.1.i these bounds apply only to c, ε such that ε + c > 0. Moreover, sup_i E_{β_0} C(ε_i, X_i) < ∞ and sup_i E_{β_0} C''(ε_i, X_i) < ∞.

(E.6) The parameters γ = (β, α) consist of the location parameters β and the shape parameters α: if β is known, f(ε|x, γ) is a regular parametric density in α.
Thus, assumptions E.1-E.6 reflect a mixture of non-regular assumptions like those in Section 3 and in IH (1981), and regular ones like those in van der Vaart and Wellner (1996), chapter 7 (mean-square differentiability). Conditions E.2-E.3 impose reasonable smoothness on the location function and the density function. E.4 imposes a standard mean-square differentiability and finite information matrix for the shape parameter α. E.5 and E.6 impose a standard domination condition on the score function and its derivative.

We next derive the limit likelihood process, followed by the asymptotics of Bayes and ML estimators.

4.1 Limits of Likelihood

The likelihood for this model is of the form

  L_n(γ) = Π_{i≤n} f(Y_i − g(X_i, β) | X_i; γ) dF_X(X_i),

where dF_X(X_i) does not depend on γ and factors out. The ML and Bayes estimators, γ̂_ML and γ̂_Bayes, are defined as in Section 2. The convergence rates for β and α are n and √n, respectively. Define H_n as a diagonal matrix with 1/n in the first dim(β) diagonal entries and 1/√n in the remaining diagonal entries. Let γ_n(z) = γ_n + H_n z for z = (u, v) ∈ R^d, where γ_n = (β_n, α_n) = γ_0 + H_n δ, for a given δ ∈ R^d, denotes the true parameter sequence, chosen to obtain a nondegenerate limit. The local likelihood ratio process is given by ℓ_n(z) = L_n(γ_n(z)) / L_n(γ_n). In this section, →_d denotes convergence in distribution under P_{γ_n}.

Theorem 4. In model R̄, given Assumption 3, the finite-dimensional weak limit of the local likelihood ratio ℓ_n(z), z = (u, v), takes the following form: for Λ(x) ≡ ∂g(x, β_0)/∂β,

  ℓ_∞(z) = ℓ_{1∞}(v) × ℓ_{2∞}(u),
  ℓ_{1∞}(v) = exp{ W'v − (1/2) v' J_{γ_0} v },
  ℓ_{2∞}(u) = exp{ u' E Λ(X)[p(X) − q(X)] } × exp{ ∫_E l_u(j, x) dN(j, x) },

where

  l_u(j, x) = ln(q(x)/p(x)) 1[0 < j ≤ Λ(x)'u] + ln(p(x)/q(x)) 1[0 > j ≥ Λ(x)'u],

N is the Poisson process defined in Theorem 1, J_{γ_0} = E_{γ_0} [∂ ln f(ε|X, γ_0)/∂α][∂ ln f(ε|X, γ_0)/∂α]' is the information matrix for the shape parameter, and W is a normally distributed random variable, W ~ N(0, J_{γ_0}), independent of N.

The result differs from that in Section 3, as expected. First, Λ(X) = ∂g(X, β_0)/∂β replaces X. Second, we have a new term, ℓ_{1∞}(v), the log of which is the standard term for regular likelihood inference, e.g. van der Vaart and Wellner (1996), ch. 7; it generates a normal limit for the shape parameter, with variance inversely related to the information matrix. Note that if β were known, we would end up only with the standard term ℓ_{1∞}(v). Since β needs to be estimated, we have a mixture of the "regular" information about the shape parameters α and the "non-regular" information about the location parameters β. Moreover, these information components are asymptotically independent.

4.2 Asymptotic Behavior of Bayesian and ML Estimators

Next consider the normalized Bayes estimator Z_n = H_n^{-1}(γ̂ − γ_n) = (n(β̂ − β_n), √n(α̂ − α_n)).

Theorem 5 (Bayesian Asymptotics for Nonlinear Models). Assume model R̄, Assumptions 2 and 3, and that ρ(z) = ρ_1(u) + ρ_2(v). For ℓ_∞(·) in Theorem 4, define

  Z^1_∞ = argmin_{u ∈ R^{d_1}} ∫_{R^{d_1}} ρ_1(u − u') ℓ_{2∞}(u') du',   Z^2_∞ = argmin_{v ∈ R^{d_2}} ∫_{R^{d_2}} ρ_2(v − v') ℓ_{1∞}(v') dv',

and suppose that Z^1_∞ and Z^2_∞ are uniquely defined a.s. (*). Then, for Z_n = (Z^1_n, Z^2_n) = (n(β̂ − β_n), √n(α̂ − α_n)):

  1. Z^1_n →_d Z^1_∞ and Z^2_n →_d Z^2_∞;
  2. Z^1_∞ and Z^2_∞ are independent.
Theorem 6 (MLE Asymptotics for Nonlinear Models). Under model R̄, Assumption 3, and assuming that −ℓ_∞(z) attains a unique minimum a.s., we have

  Z_n →_d Z_∞ = (Z^1_∞, Z^2_∞),  Z^1_∞ = argmin_{u ∈ R^{d_1}} − ℓ_{2∞}(u),  Z^2_∞ = argmin_{v ∈ R^{d_2}} − ℓ_{1∞}(v) = J_{γ_0}^{-1} W ~ N(0, J_{γ_0}^{-1}),

and Z^1_∞ and Z^2_∞ are independent. These results generalize Theorems 2 and 3.

Note that the independence of Z^1_∞ and Z^2_∞ is due to the multiplicative separability of ℓ_∞(z) and, for the Bayes estimator, the additive separability of the loss ρ(z). If the additive separability does not hold, part 1 still applies, due to the multiplicative separability of ℓ_∞(z), while part 2 does not. In view of the asymptotic independence between the shape information and the location information, the estimators for these parameters are asymptotically independent. Also, the limit distribution of the Bayes estimator of the shape parameter α coincides with that of the MLE, if the loss function ρ_2 is symmetric (by Anderson's lemma). This is not the case for the estimators of the location parameter β.

5 Efficiency

The Bayes estimators are finite-sample average-risk efficient (ARE) under particular loss functions. This is an instance of a well-known result, formally stated in Theorem 7. Theorem 8 makes this statement an asymptotic one. These results justify one of the main efforts of this paper - the study of Bayes estimators. The ML estimators of the location parameters are not equivalent to Bayes estimators even asymptotically and, unlike the usual case, do not share the optimality of Bayes estimators in large samples.

Average-risk efficiency is one of the classic efficiency concepts developed by Wald, Lehmann, and others. Before writing it down formally for our case, it is helpful to review the basic idea. Given a parameter γ, an estimator γ̂, and a loss function ρ_n(x) = ρ(H_n^{-1} x), we can compute the expected risk as E_γ ρ_n(γ̂ − γ). The average risk takes the form ∫ E_γ ρ_n(γ̂ − γ) q(γ) dγ, where q is a weight function (e.g. q(γ) = 1).

To address the asymptotic results, consider the following notation. Define H_n as in Section 4, and let γ_n(δ) = γ_0 + H_n δ. Consider all statistics (measurable mappings of data) f_n = f_n({Y_i, X_i}_{i=1}^n), and denote the set of all such mappings by F_n. Define the (exact) average risk criterion (ARC)

  R_{ρ,q}(f_n, K) = ∫_K E_{γ_n(δ)} [ ρ( H_n^{-1} [f_n − γ_n(δ)] ) ] q(γ_n(δ)) dδ / Leb(K),

where q is the weight or prior measure (e.g. uniform) and ρ is the loss function, defined earlier. Division by Leb(K) is immaterial at this point.

Theorem 7 (Finite-Sample ARE). Suppose model R̄ and conditions E.1-E.6 hold. For γ̂_Bayes ∈ F_n, defined by the loss function ρ and the prior weight q,

  γ̂_Bayes ∈ arg inf_{f ∈ F_n} R_{ρ,q}(f, U_n).

We next define the asymptotic average risk (AARC) as

  R_ρ({f_n}, K) = limsup_{n→∞} R_{ρ,Leb}(f_n, K),

for K a compact cube of R^d with center 0, and a sequence of estimators {f_n} in {F_n}. To extend this definition to the entire R^d, define

  R_ρ({f_n}, R^d) = lim_{K↑R^d} limsup_{n→∞} [ R_ρ({f_n}, K) ],

where K ↑ R^d denotes an increasing sequence of cubes converging to R^d.

Theorem 8 (Asymptotic ARE). Suppose model R̄ and conditions E.1-E.6 hold. For {γ̂_Bayes} ∈ {F_n}, defined by the loss ρ and the prior (weight) q,

  inf_{{f_n}∈{F_n}} R_ρ({f_n}, R^d) = R_ρ({γ̂_Bayes}, R^d).

Because Bayes and ML estimators of the location parameters are not asymptotically equivalent (equivalence holds for the shape parameter α under a symmetric loss function ρ_2), the ML estimators are not optimal under the convex loss functions considered here. This is in contrast to the usual case where, for a large class of loss functions, the MLE is asymptotically equivalent to the Bayes estimator and shares its optimality.

We next examine whether the theoretical efficiency translates into actual efficiency gains with a simple Monte Carlo example. We consider simple one- and two-sided models with two covariates.

In the one-sided case we generate data as

  Y_0 ~ Uniform[ c̄/m + (1 − 1/m) c̲, c̄ ],   c̲ = c̄ − β_1 X_1 − β_2 X_2,   X_j ~ Uniform(a, b), j = 1, 2.

The distribution in this model can be rationalized by a simple procurement auction example, in which there are n auctions. In each auction there are m bidders. Each bidder draws a random cost C from the uniform distribution on (c̲, c̄). We only observe the submitted bids, which depend on C through the Bayesian Nash equilibrium bid function

  Y = c̄/m + (1 − 1/m) C,

resulting in the model above.

The two-sided example is a contaminated version of the first example. In particular, the data are generated from

  Y = Y_0 with probability λ,  Y = L with probability 1 − λ,  L ~ Uniform( L_0, c̄/m + (1 − 1/m) c̲ ),

where we chose λ = 0.9.
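A minimal sketch of these two designs follows. The values λ = 0.9, β_1 = β_2 = 0.5, and n = 200 are those used in the experiment reported below; the cost bounds c̄ and L_0, the covariate range (a, b), and the number of bidders m are illustrative assumptions, since their values are not reported here.

```python
# A minimal sketch of the two simulation designs above. lambda, beta, and n follow
# the text; cbar, Llow, the covariate range (a, b), and m are assumed values.
import numpy as np

rng = np.random.default_rng(3)
n, m, lam = 200, 5, 0.9
beta = np.array([0.5, 0.5])
cbar, Llow, a, b = 10.0, 0.0, 1.0, 2.0

def simulate(two_sided=False):
    X = rng.uniform(a, b, size=(n, 2))
    clow = cbar - X @ beta                      # lower cost bound, covariate dependent
    lower = cbar / m + (1 - 1 / m) * clow       # lower support point of the bids
    Y = rng.uniform(lower, cbar)                # one-sided design: Y_0
    if two_sided:
        outlier = rng.random(n) > lam           # with prob. 1 - lambda draw an outlier
        Y = np.where(outlier, rng.uniform(Llow, lower), Y)
    return Y, X

Y1, X1 = simulate(two_sided=False)              # one-sided design
Y2, X2 = simulate(two_sided=True)               # contaminated two-sided design
```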
We simulate the above two models, using β_1 = β_2 = 0.5 and n = 200. Table 1 reports the mean squared errors across 300 simulations. The Bayes estimator has a substantively smaller mean squared error than the MLE.

Table 1: Mean Squared Error for Two Simulation Experiments

                           MLE                   Bayes
                      β_1       β_2         β_1       β_2
  One-sided Model     0.8769    2.2030      0.0473    0.0395
  Two-sided Model     0.1761    0.1244      0.0587    0.0502

6 Confidence Intervals and Some Practical Questions

The obtained results enable the construction of confidence sets.

Confidence Intervals. We must distinguish between the estimates of the shape parameter α and the estimates of the location parameters β. Inference about α is fully regular. The limit distribution of either the MLE or the Bayes estimator (for a symmetric loss function ρ_2) is given by N(0, J_{γ_0}^{-1}). To facilitate inference we need to estimate J_{γ_0}. This can be done by conventional methods, taking the parameter estimate β̂ as given.

Corollary 1. Under Assumption 3, for β̂ and α̂ equal to either the Bayes or the ML estimates,

  Ĵ = − (1/n) Σ_{i=1}^n ∂²/∂α∂α' ln f( Y_i − g(X_i, β̂) | X_i, β̂, α̂ ) →_p J_{γ_0}.

An alternative is the familiar outer product of scores in α, which we do not state for brevity. The resampling methods available for inference about α include subsampling and the bootstrap. Although the bootstrap is not investigated formally, we can conjecture that it works, due to Mammen's theorem and asymptotic normality (see Horowitz (2000)).

Inference about the parameter β poses more difficulties. Neither Bayes nor ML estimators have a standard limit distribution. The nonparametric bootstrap is not consistent in the present setting. A simple counterexample is the boundary model without covariates, in which case both the Bayes estimator and the MLE are functions of the minimum order statistic. The nonparametric bootstrap is known to fail in this case (e.g. Horowitz (2000)).
Corollary 2 The subsampling method of estmating the lim,it distribution of t„{0 — 0), is Bayes or MLE, is consistent in the sense of Politis, Romano, and Wolf (1999) where (Theorem 2.2.1 (i)-(iii)), and the asymptotic coverage probability of the confidence intervals achieves the correct nominal value, as long as b —^ oo, b/n -4 0, and B —> oo, as n —> oo. is discussed in detailed in chapter 9 of Politis, Romano, and They provide the calibration and the minimum volatility methods. In the empirical section, we use 1/10 of the sample size. The confidence intervals are not sensitive The choice of the block size b Wolf (1999). to block size variation. This The distribution. is probably due to the fast rate of convergence to the limiting insensitivity principle underlies the An alternative is an analytical method, process N, taking the estimated parameters the solutions Z^o. This method subsampling is minimum method. volatility based on simulating the distribution of a poisson detailed in as given, then obtaining i^o and computing Chernozhukov (1999). In the present context, preferable on computational grounds. is Computational Methods. Modern computational methods are important for the inference and estimation methods available to practitioners. It making used to be that the Bayes computations were cumbersome and hampered the applicability for many years. Markov Chain MonteCarlo (MCMC). This technique allows the simulation of a markov chain Zi, ...Zj whose marginal distributions are approximately the posterior distribution. The method allows efficient and numerically stable computation of the Bayes estimates. Detailed discussion Since approximately 1990, this problem hsis been overcome by can be found We e.g. in Robert and Casella (1998). implement these computational methods The programs subroutines pute the cU'e cire available from the authors. coded MLE, we in C. Our program for the models considered Our implementation uses uninformative (flat is feist in this paper. over R"*) prior. use simulated annealing, an algorithm that handles general 14 main To comnonsmooth since the Table Worktype 2: Summary Statistics Avg. Stdev., Avg #. winning bid winning bid bidders #auctions (1989$, mil) 2 141 1.006 1.149 5.91 3 181 1.500 1.870 8.59 4 405 5.015 9.497 7.46 objective functions. Therefore, computer programs needed 7 for we provide not only Empirical Illustration We consider a data set of bids submitted the the theory, but also the tools and implementation. New in a procurement contract auctions conducted by (NJDOT) Jersey department of transportation period, the NJDOT in the years 1989-1997. Over this conducted 1025 low-price, sealed-bid auctions of contracts to procure various types of services such as highway work, bridge construction and maintenance, and road paving. Most of the services procured had few auctions conducted. In the following, we consider only three types of services. See table (2) for the summary statistics. Hong and Shum (2000) give a detailed description of the data. We focus on the independent ticular private value we assume that the construction model formulated Paarsch (1992). In par- in cost follows independent pareto distributions, as studied by Paarsch (1992) and Donald and Paarsch (2000). Precisely, the cost distribution for construction companies is given by, for 9 h{c) = ^^ = (Oi, 9-2): 0<9i<c,0<92. 
Paarsch (1992) and Donald and Paarsch (2000) showed that this implies the following density function of the winning bids, conditional on the number of bidders and other m covaxiates that affect the distribution parameters 9: g2m{gig2(m-l)/|g2(m-l)-ll}''a"' ( f{y\m,9) = ML > »,(™_i)_i otherwise I Table 3 reports the giffaCm-l) -r II 2/ »"="-+' \ estimates and the Bayesian posterior model. The variation in the summary statistics of mean estimates indicate that the jobs defined in these contracts are very different. separate parameter estimates for each type of contract. Also, and 95% symmetric confidence we 15 Hence we present give the intervals constructed using subsampling earlier. for this the winning bid across types of contracts 95% equal tailed method described Table 3: One sided Pareto Estimates MLE 6-2 logL 01 0-2 0.0160483 0.57543 -1341.05 0.0789063 0.706494 -0.0140822 0.562586 0.0455028 0.541959 2 equal tail symmetric 0.0154688 0.598813 0.0839134 0.73919 -0.00578062 0.552359 0.0620981 0.669373 0.0378773 0.598501 0.0957144 0.743614 0.0593045 0.568535 0.0465869 0.356914 0.0377357 0.56294 0.017097 0.268854 0.060384 0.589204 0.0493621 0.357122 0.0396208 0.55182 0.0264918 0.291758 0.0789883 0.58525 0.066682 0.42207 0.0222681 0.646935 0.146176 0.555795 0.00208421 0.627295 0.131181 0.329598 0.0223695 0.666575 0.157989 0.566001 0.00392716 0.627295 0.13406 0.368623 0.040609 0.666575 0.158291 0.742967 3 equal tail symmetric 4 equal Bayes ^1 worktype tail symmetric -1955.05 -7975.9 Table 4 reports the parameter estimates from an alternative two sided model: \\ e^m{e-i,0-i(m-l)l\e2(m-\)-l\\ ySj-^ + l 1-, \^ = f[v\m,e) "^1 " 2/ ^ e2{m-l)-l ifO<y<^iMiri:iil S2(m-1)-1 and otherwise. We chose A = 0.02, which accommodates outliers that do not conform to the theoretical model. In table 5 we introduce a continuous only available for work type where X denotes 4. volume. traffic covariate for the traffic volume. This covariate is exp (qi + as x X) and 62 — exp (02), appears to be significant, although there We parameterize 6i = The coefficient are large discrepancies depending on the estimation method and the model. This is in- dicative of misspecification. An alternative approach to direct parametric inference for independent private value auction model Their insight is is the indirect inference approach of Guerre, Perrigne, and based on examing the a representative bidder max 6 f i first Vuong (2000). order condition of the optimization problem of in the equilibrium: g-i [x) (6 — c) dx -g^i (b) {b-c)+ G-i (b) = Jb where G-i and g-i denotes the survival function and the density function of the distribution of the minimum bid among bidder i's competitors. Therefore, a two step procedure can be used. In the first step, G-i and g-i are estimated using the bid 16 data. 
In the second Table 4: Two sided Pareto Estimates MLE 2 equal tail symmetric 3 equal tail symmetric 4 equal tail symmetric Bayes 9i 0-2 logL 0, 02 0.259243 0.55482 -372.445 0.22572 0.810755 0.180359 0.498768 0.149349 0.776126 0.272495 0.569737 0.236047 0.85172 0.188709 0.498771 0.154353 0.297086 0.329778 0.61087 0.770756 0.850753 0.418519 0.512854 0.3074 0.361401 0.316776 0.453844 0.203662 0.336442 0.439423 0.529476 0.31702 0.36304 0.320138 0.491592 0.208184 0.344999 0.516895 0.534106 0.406617 0.377803 2.87942 0.535876 1.84626 0.549167 2.46807 0.529865 1.35817 0.53917 3.1199 0.553112 1.98644 0.569452 2.55317 0.518958 1.42778 0.530891 3.20566 0.552793 2.26474 0.567443 worktype Table 5: Model with -656.18 -2375.17 Traffic Volume (Type MLE 4) Bayes ai 02 as logL ai 02 as one-side -3.812 -0.436 -0.000 -6996.4 -1.096 -0.488 -0.697 equal -3.833 -0.440 -0.001 -1.637 -1.092 -0.941 -3.810 -0.409 0.006 0.137 -0.453 -0.557 -3.830 -0.453 -0.006 -2.160 -0.615 -0.911 -3.793 -0.419 0.005 -0.032 -0.362 -0.484 two- side -1.131 -0.652 0.547 -0.540 -0.639 0.091 equal -2.150 -0.655 0.544 -0.842 -0.670 0.058 -1.114 -0.634 0.645 -0.461 -0.588 0.138 -1.820 -0.670 0.451 -0.812 -0.681 0.049 -0.441 -0.634 0.642 -0.269 -0.596 0.134 tail symmetric tail symmetric 17 -1984 step, the pseudo-cost values .9-1 (b) are constructed for each bid in each auction. Then the distribution of these pseudo-values can be used to infer the latent distribution of the cost parameter. G_, and g-i can be easily inferred from the data for any symmetric affiliated private value model. It suffices to observe the winning bid. since n-l G.dx)={F{x)) where F The (•) and / (•) / - and g_,(x) 777 — - 1 f {x) and Vuong (2000) uses a non- F (x) and f (x). Here we consider a flexible parametric used the truncated normal distribution parameterized as: parametric method to estimate We \ are the survival and density function of the whining bid, respectively. indirect inference approach of Guerre, Perrigne, approach. 1 = i^-^^j F {x)-- /<''->(^)/*(^) where we take In 6 = ai -I- the results for worktype a4 x m, In p = we notice that the new model First, 02 -I- as x m, In a = 03 Qe x m. Table 6 reports -I- 4. is an improvement over the pareto model of the log Hkelihood value, suggesting a significant improvement in the fit. in terms Although we did not develop a formal testing procedure for the nonstandard likelihood, the principle of Vuong in the information-theoretic sense. (1989) suggests that the model with the higher likelihood Second, we note that the Bayes posterior likelihood estimates, although none of the slope surprising, since the likelihood surface hcis ters is greater. and the mean is closer to the estimates differ from the coefficient reverse its sign. many more modes when data maximum This is not the number of parame- Based on the computational experiments, theoretical efficiency properties, Bayes estimators (posterior means) minimize the globally convex fact that the objective function, they We did not report may be preferred. the results for the case with the traffic volume covariate. Adding this covariate produced a tiny improvement of the log likelihood and did not change the original coefficients. The estimates for the traffic volume coefficients were highly methods yielded agreeable insignificant. Both results concerning that covariate. Overall, the two-sided models fit data better that one-sided ones, indicating that con- do not conform the model is important. 
We also observe that the parametric variant of the indirect inference approach of Guerre, Perrigne, and Vuong (2000) is quite valuable and allows to fit the particulcir data better than the Pareto model. trolling outliers that Conclusion We studied a general model in which the conditional density of the dependent variable jumps at a location that is parameter dependent. This includes a variety of the boundary- dependent model discussed in the recent literature of structural estimation. 18 We derive Table 6: Truncated Normal Model for Worktype 4 MLE woiktype 4 Q2 a-i as logL equal tail symmetric ai = Os oe -694.854 -2.38883 0.404125 0.628959 -0.0702671 -2.58986 0.207458 -2.56234 -2.39153 -0.23598 -0.14270 -2.61471 0.167639 -2.3802 0.54492 0.80567 -0.04021 -2.5829 0.23593 -2.55203 -2.07808 0.15484 -0.12891 -2.60658 0.168674 -2.22562 2.88633 1.10308 -0.01162 -2.57313 0.24624 Bayes equal tail symmetric Qi 02 as Oi as 06 4.40077 -2.41695 0.643259 -2.38707 -2.26628 0.211352 3.84919 -2.93896 0.462919 -2.60164 -2.56957 0.172165 4.87133 -1.88786 1.047 -1.8852 -1.77164 0.23446 3.88809 -2.94462 0.347846 -2.8686 -2.72846 0.182684 4.91346 -1.88929 0.938672 -1.90554 -1.80411 0.24002 asymptotic distributions of Bayes and ML estimators under general conditions, and offer and inference methods. The practical computation results provide a solution to a long- standing econometric problem. Our models; results extend previous (2) inclusion ciency; (3) considering the and (4) in several directions: (1) handling general regression effi- two sided model as a robust alternative to the one-sided model; using the point process methods to give a precise characterization of large sample distributions. erties work of Bayes estimators, which enjoy the small and large sample Bayes estimators are important alternatives to which the MLE does not share. The methodology MLE in this due to efficiency prop- paper also provides new insights into the analysis of asymptotic distributions for models that cannot be studied using the conventional tools of locally asymptotically normal models. The empirical application presented illustrates the usefulness of the results. We estimated some key auction models. The first model was a stylized pareto model of Paarsch (1992). The second auction model represented a flexible parametric alternative to the nonparametric approach of Guerre, Perrigne, and Vuong (2000). We find that the two-sided models fit data better than one-sided ones, indicating that controlling for outliers that do not conform the model is important. We also find that a parametric variant of the indirect inference approach of Guerre, Perrigne, cind to fit Vuong (2000) data better than the direct parametric approach. 19 is quite valuable and may allow A Useful Background Definitions A.l Point Processes cf. Resnick (1987)) Let E be a locally compact topowith a countable basis, and £ to be the Borel cr-algebra of subsets of E. A point Definition 1 (Point Measures, Mp{E), logical space measure (pm) p on {E,£) is a measure of the following form: for {xi,i lection of points (called points of p), and any set A ^ p{A) €: = Yli > 1}, a countable col- p{K) < oo, p{x) < 1 Vx e £, G A). ^(^^t If K C E compact, then p is said to be Radon. A p.m. p is simple if compound otherwise. Let Mp(E) be the collection of all Radon point measures. Sequence {pn} C Mp{E) converges vaguely top, if/ /dp„ —> / fdpioi all functions / e Ck{E) [continuous, any for and is and vanishing outside a compact set] (cf. Leadbetter, Lindgren, and Rootzen (1983)). 
Vague convergence induces vague topology on Mp{E). Topological space Mp{E) is metrizable as complete separable metric space. Mp{E) denotes such metric space hereafter. Define Aip{E) to be tr-algebra generated by open sets. real-valued, Definition 2 (Point Processes: Convergence in Distribution.) is a measurable map N : (ft, —> {Mp T, P) the realization of the point process gence of the point process N„ (E) Nniw) , Mp(E)) is taking values in we shall write N„ => N in Mp{E) ous and bounded functions h mapping Mp{E) to R. Resnick (1987): Je — f{x:)<I^n{x) J^ f(x)dN{x) > for with mean intensity measure (a) for F any e £, m i.e. if {Fi,i Fp/i(N„) -^ Eph{N) Note that if Nn => N for all continuin Mp(E), then Random Measure (PRM)) (defined on {E,S)), Point process N is a if and any non-negative integer k k) are disjoint sets in £, ifm(F) then (N(F,),i < fc) cire = oo, independent random variables. Convex Objectives A. 2 The < A point process in Mp(E)'' if ^ '^"^"1 (b) for any / e Ck(E) by continuous mapping theorem. Definition 3 (Poisson Point Process or PRM ^' every elementary event ui G fi, some point measure in Mp{E). Weak converMp(E) is the same as for any metric space, cf. , result can be found in Knight (1999). Knight and Pollaid. It a generalization of earlier convexity lemmas by It is allows discontinuities and R - valued objective functions. Lemma 1 (Guyer) Suppose {Qt} is a sequence of lower-semi-continuous (Isc) convex ^-valued random functions, defined on R**, and let V be a countable dense subset o/R'*. IfQx fidi-converges to Qoo in R on P where Qoo is Isc convex and argmin Qt{z) provided the latter A. 3 is uniquely defined a.s. in finite — > R on an open non-empty set a.s., then argmin Qoo{z), . Stochastic Equisemicontinuity The remarkable concepts of this section were recently developed by Knight summarizes some essential elements we need. 20 (1999). The following Epi- Convergence. Suppose the sequence of objectives {Qn} are random lower semi-continuous (I-sc) functions (i.e. Q„(x) < liminf^.-n Q„(j;j), V2;,Vij —> x). Let C be the space of 1-sc functions / : R"^ —> R, s.t. ^ / C can oo. be made sidering a special metric, convergence in which Rockafellar and Wets R°, ..., Rl, and any real P{ Dj^i { inf x£Rj Q{x) > ri, ..., < rj}) Lemma i. ii. Hi. Zn Zoo by con- Knight(2000), in C: with open interiors if for any closed rectangles Ri,...,Rk in R'' Qn said to is rjt: liminf P( n > n^^^i { inf Q„(a:) x£Rj { inf n*=i Q„(x) rj}) > r-,}) < P( n*^=i { inf Q(x) > rj}). a weak condition that leads to the convergence of axgmins. Theorem 1) Suppose that Q„{Zn) < ini^^jidQn{z) + (n, En 2 (Knight, is s.t. = Qn{) Zji is (cf. Hence one can metrize the weak convergence < limsupP( Epi-convergence into a complete separable metric space equivalent to epi-convergence Q (1998)). epi-converge in distribution to is dTgmin^^^dQooiz) \ 0; Zn = Op(l) uniquely defined in is R a.s. epi-converges in distribution to Qoo{), then ——> Zoo Epi-convergence is discontinuities. In more general than uniform convergence, because our case, (lots of) it allows for rather generjil non- vanishing discontinuities make the uniform convergence of the likelihood function impossible. Provided the finite dimensional distributional condition for epi-convergence in distribution is (fidi) limit exists, the necessary and sufficient stochastic equi-lower-semi-continuity (s. e-l-sc), developed by Knight (1999). Stochastic equi-semi-continuity. Sequence {Qn} £ £ is s. e-sc. 
if for each bounded > 0, and (5 > 0, there exist ui,...,Uk G B and some open sets V(ui),...,V{uk) covering e containing ui,...,Uk s.t. limsupP(Uj=i Lemma 3 (Knight, Thm { s-esc condition amounts (and hence of argmin) of Q„ will Q„(x) < min(e"\(5„(uj) Qn and only is s.e-lsc. if {Qn} - e)}) < <5. Then {Qn} converges to Qoo in distribu- epi-converges in distribution to Qoo(). to the possibility of approximating the distribution of the infimum B by an approximate minimum of Q„ over a with given precision (e, 5). It is worth emphasizing that the naive over bounded set carefully chosen grid {ui, ...«*}, uniform grids inf 2) Suppose tion in finite- dimensional sense if The B, and set B not do the job in our case. The application of stochastic equisemi-continuity leads to a simple proof even in the no-covariate case, which could improve (in terms of length) over the arguments of Ibragimov and Hasminsldi. B Proofs for the Linear Model In the proof we set the local parameter sequence j3„ = Po- sequence Pn does not change the arguments but introduces a 21 Putting through the general IoceJ lot of notational complexity. — B.l Proof of Theorem 1. Q„ Consider the local log likelihood ratio process (z) = lnL„(/3o + z/n)/Ln{fio) n <?„(2) = ^g,„{^) X [l(e, >X',z/nVO) + l(€, <X',z/nA.O)] + 1(0 > 1=1 n + ^9.n(^) X [1(0 < e, < X'.zfn) > £, X',z/n)] 1=1 = q,n{z) = Q\n (z) - In [fiYi Qi„{z) and ^211(2) behave very I. Limit of Qin{z) By like statistic. + Qin X.'(/3o (z) where , + z/n))\X,,l3o + z/n)/f{Y, - X,'/3o|X„ ^o)] • differently. analyzed by techniques similar to those in IH (1982). {Q\n{z)} is CI, C4 and LLN, Qin{z) converges in probability for each it — > is an average z: = -z EX Qloo(z) (6) f(€t\X,Po) For a density function / that has a dominated derivative everywhere except at 0; fj^f'(u)du — /(O""") + /(0~). Thus applying this to the conditional density / (e|X,/9o), we have: Qi^{z) = = z'EX\p{X)-q{X)]. Next we use stochastic equi-continuity to convert pointwise convergence to uniform convergence show that for any \zi — 22! > 0, the term — in z. It suffices to n n |^gi„(2i) X l(ei >X,'zi/7jV0)-^gi>i(22) x >X,'z2/nV0)| l(ei ^0, (7) 1=1 1=1 as well as other similar terms in Q„ (2). The left hand side of (7) bounded by is 71 ^l(e,> 4^ v4^V0) ln/(6i-4^|/3o + x zi/7i) - In/ (e. - 4^|^o + 22/n) 1=1 , By C.4, the sum first > jr{ l(e, , in (8) is , by, for 2' in the bounded > XiZi/n V XiZ2 n V 0) (8) fU-^\0o + z,/n) , — -— x \ convex hull of — 21 and -— 22 x f(ei-X[z'ln\Po-\rz-ln)\ n I I (9) <-y;i(ei>0)x n -^ The second sum sup I [21 - some C, C, C" > {}''|f°,'' T' /(cil^o) by, for x 22I = Op(l) x \zx - 22I I f:i(e,e(0,^V^])xCx|n!l^rx&N^ sup .zoPZ l.zjSZ z-i x | bounded in (8) is 1 < const '^ " E 1 (e, e (0, C'/n]) x C" = ._j , 22 Op(l) since f'{t\x) X has a compact support. Thus Limit of ^271(2)- Qzn II. and can be modeled We as to the point process that measures these occurrences. w.r.t. constructs the key point process and derives 1 < Step 2 shows that {Q2n{zj),j representation. (7) follows. driven by the "rare" occurrences of the near-to-jump observations is an integral the proof into two parts: step split when € > u > Other terms are similarly checked. uniformly bounded above and f{u\x) uniformly bounded below is by C.2-C.3, and J) its limit a continuous transformation of the point is process. Step The Key 1: E is taken to is a compact subset of E. £ The key E = — oo,0) U (0,+oo) x X. The topological space on R \ {0} and R"* n X. So that, e.g., [a, b] x X Point Process. 
Define ( be a product of standard topologies of point process is is the Borel u-algebra of subsets of E. a random measure taking the following form; n = J2l[{ne„Xi)eA], f<!{A) 1=1 any for set A N in £. a random element of Mp(E), the metric space of nonuegative point is measures on E, with the metric generated by the topology of vague convergence. Appendix gives definitions. We will A show that N =» N in Mp(E) where (a) N By is defined in the proposition F 6 T, open bounded rectangles C.l and C.3, for any intersections of across i m is and in steps (a) compact open in E), limn-+oo BN(F) = < dFx(x) (b). sets in limn->oo E (finite unions and nP({ne,X} £ F) = m{F) < 00 defined in the proposition. Since the events {{nei,Xi) £ F] > by C.l, by the Meyer's + 0)du Lemma q{x)l{u ( lim P{SS{F) which by Kallenberg's Theorem^ [ Appendix A) implies that the mean measure 7n(). ( 0)du] x Meyer (1973) n—voo process done is the basis of relatively \p{x)l{u / where measure This 1. = = 0) ) we are independent also have: e-"'<^*, and N clearly simple N => N ia Mp(E), where iV is = a.s.] is the definition of the Poisson a Poisson point process with N Next we show that has the same distribution as the process N, stated in the proposition. homogeneous PRMs No and Nq with points {Pj} aind {P;}, defined in the proposition. No has the mean measure mo(du) = du on (0, 00), and No has the mean measure 77io(d'u) = du on (— oo,0). Now because No and No are independent, (b). First, define canonical Ni(-)=No(-)+No(-)' is the Poisson point process with the mi{du) Because {A",, A"/} are i.i.d. mean measure = \(u> Q)du + \(u < Q)du on R\ {0}. and independent of {FijPj}, by the Composition Lemma(cf. Proposi- tion 3.8 in Resnick (1987)), the composed PRM N2 with points {{Ti,Xi},{T'i,Xl},i>l,j>\) inR"* ^For example, Resnick (1987), Prop. 3.22 23 mean has the product intensity = m2{du,dx) Finally, N with measure given by > [l(u + 0)du < l(ii 0)du] x Fx{dx) on R\ {0} x X. the transformed points {T{r,,X,),T{r',,X-)}, where T:{u,x)>-^(l{u>0)-^ + l{u<0)^, x) = m2oT~\dj,dx) = m{dj,dx) by the Transformation Theorem Step X : 2: The Functional > q{x) compact > i5 E mean measure on has the desired Xs 0} and (b) set Z, as > + 0) q{x)l{j <0)]dj x Fx{dx), Poisson Processes( Proposition 3.7 in Resnick (1987)). for Key of the \p{x)l{j Point Process. Here X {r G = q{x) : 0}. we distinguish two cases (a) f(5 — x'z/n\x,l3o + z/n) q(x) uniformly in {<5,^,x 6 R+ x f(S — In uniformly in {S,z,x X Z x p{x) > x'z : x'z/n\x, Po < 0, + 5 < {x G any x X Zx p(x) In {l Mx). < x' z : > 0, 5 > [0 + 0(n-')) (11) x'z/n}. Note that in case (a) this holds by C.3, while in case (b) equation (10) holds identically equal to [Dn^l (10) and x'z/n}, z/n) mx,po) eR- a + Oin-')) In f{s\x,M = for ^ oo, n In Q2n(z) X= Note that by assumption C.3, — oo. Hence < nu < X.'z]+f^ln^l p >nu >XU]]^{l + o,(\)) (12) sQ2„(z)x(H-0p(l)) may (Again this expression uniformly in z over Z. equal — oo in ceise (b), create a problem for the proof.) Write Q-2n{z) as an integral w.r.t. but that does not N: Q2u{z)= f h{j,x)d^U,x), JE where lz{j,x) (a): By Kz = is We defined in the theorem. consider cases (a) and (b) sepsurately: conditions C.1-2, the function (j,x) X X, [—T], +7)] T] = sup3,gx l^'-^l, i-> where 77 lz{j,x) < = x'z. Next define the map T : Mp{E) i-^ bounded and vanishes outside the set Kj is a compact set in E (by the and at lz{j,x) is discontinuous at j = is oo by C.l. standard compactification of E). 
Step 2: The Functional of the Key Point Process. Recall that p : X → R_+ and q : X → R_+, and distinguish two cases: (a) the two-sided model, in which q(x) > 0 on X, and (b) the one-sided (boundary) model, in which q(x) = 0 on X. Fix a compact set Z. As n → ∞,

  ln [ f(δ − x'z/n | x, β_0 + z/n) / f(δ | x, β_0) ] = ln [ q(x)/p(x) ] · (1 + O(n^{-1}))    (10)

uniformly in {(δ, z, x) ∈ R_+ × Z × X : x'z > 0, 0 < δ ≤ x'z/n}, and

  ln [ f(δ − x'z/n | x, β_0 + z/n) / f(δ | x, β_0) ] = ln [ p(x)/q(x) ] · (1 + O(n^{-1}))    (11)

uniformly in {(δ, z, x) ∈ R_− × Z × X : x'z < 0, 0 > δ ≥ x'z/n}. In case (a) this holds by C.3; in case (b), where q(x) = 0, the right-hand side of (10) is identically −∞. Hence, uniformly in z over Z,

  Q_{2n}(z) = Q*_{2n}(z) · (1 + o_p(1)),
  Q*_{2n}(z) ≡ Σ_{i=1}^n [ ln( q(X_i)/p(X_i) ) 1(0 < nε_i ≤ X_i'z) + ln( p(X_i)/q(X_i) ) 1(0 > nε_i ≥ X_i'z) ].    (12)

(This expression may equal −∞ in case (b), but that does not create a problem for the proof.) Write Q*_{2n}(z) as an integral with respect to N:

  Q*_{2n}(z) = ∫_E l_z(j, x) dN(j, x),

where l_z(j, x) is defined in the theorem. We treat cases (a) and (b) separately.

(a) By conditions C.1–C.2, the function (j, x) ↦ l_z(j, x) is bounded and vanishes outside the set K_z = [−η, +η] × X, η = sup_{x ∈ X, z ∈ Z} |x'z|; K_z is a compact set in E (by the standard compactification of E), and l_z(j, x) is discontinuous only at j = 0 and at j = x'z. Define the map T : M_p(E) → R^J by

  N ↦ ( ∫_E l_{z_k}(j, x) dN(j, x),  k ≤ J ).    (13)

The mapping T is discontinuous only on the set

  D(T) = { N ∈ M_p(E) : j_i^N = 0 or j_i^N = z_k' x_i^N for some i ≥ 1, k ≤ J },

where (j_i^N, x_i^N, i ≥ 1) denote the points of N. By C.3 and the construction of N and N_∞,

  P[ N ∈ D(T) for some n ≥ 1 ] = 0,   P[ N_∞ ∈ D(T) ] = 0.    (14)

Therefore N ⇒ N_∞ in M_p(E) implies, by the continuous mapping theorem,

  ( Q*_{2n}(z_k), k ≤ J ) = T(N) →_d T(N_∞) = ( Q_{2∞}(z_k), k ≤ J ),   where Q_{2∞}(z) = ∫_E l_z(j, x) dN_∞(j, x).    (15)

(b) Now consider the one-sided case, and set ℓ_{2n}(z) ≡ exp{ Q*_{2n}(z) }. Note that ℓ_{2n}(z) = 0 if N(A(z)) > 0 and ℓ_{2n}(z) = 1 if N(A(z)) = 0, where A(z) = { (j, x) ∈ R_+ × X : j ≤ x'z }. Likewise ℓ_{2∞}(z) = 0 if N_∞(A(z)) > 0 and ℓ_{2∞}(z) = 1 if N_∞(A(z)) = 0. For finite-dimensional convergence it therefore suffices to show that

  P( ℓ_{2n}(z_k) = a_k, k ≤ J ) → P( ℓ_{2∞}(z_k) = a_k, k ≤ J ),

which follows from ( N(A(z_k)), k ≤ J ) →_d ( N_∞(A(z_k)), k ≤ J ). The latter holds by the continuous mapping theorem, since N ⇒ N_∞ in M_p(E), each A(z_k) is relatively compact (A(z_k) ⊂ K_{z_k}), and N_∞(∂A(z_k)) = 0 almost surely.

B.2  Proof of Theorem 2

Recall that the rescaled Bayes estimator minimizes the normalized posterior expected loss Γ_n(·). By the convexity lemma (Appendix A.2), it suffices to show the finite-dimensional (fidi) convergence of Γ_n(·) to Γ_∞(·):

  ( Γ_n(z_k), k ≤ K ) →_d ( Γ_∞(z_k), k ≤ K ).

There are two steps.

1. Approximate Γ_n(z), an integral over R^d, by an integral over a compact subset of R^d. For large M ∈ R_+, approximate Γ_n(z) by Γ_n^M(z) = Γ_{n1}^M(z) / Γ_{n2}^M, where

  Γ_{n1}^M(z) = ∫_{|u| ≤ M} ρ(z − u) ℓ_n(u) q(β_0 + u/n) du,   Γ_{n2}^M = ∫_{|u| ≤ M} ℓ_n(u) q(β_0 + u/n) du,

and define the corresponding limit quantities

  Γ_{∞1}^M(z) = ∫_{|u| ≤ M} ρ(z − u) ℓ_∞(u) q(β_0) du,   Γ_{∞2}^M = ∫_{|u| ≤ M} ℓ_∞(u) q(β_0) du.

The approximation step is a tail-smallness property: it suffices to show that for each z and each ε > 0, δ > 0 there is M ∈ R_+ such that

  limsup_n P( | Γ_n(z) − Γ_n^M(z) | > ε ) < δ,    (16)

which follows from the exponential smallness of the tails of the likelihood ratio, established in Lemmas 4 and 5 below.

2. In view of (16), it suffices to show

  ( Γ_{n1}^M(z_k), Γ_{n2}^M, k ≤ K ) →_d ( Γ_{∞1}^M(z_k), Γ_{∞2}^M, k ≤ K ).

This follows from two facts. Fact i: the finite-dimensional convergence of ℓ_n(·) (and of q(β_0 + ·/n)) to ℓ_∞(·) (and q(β_0)), established in Theorem 1. Fact ii: E ℓ_n(z) = 1, by the definition of the likelihood ratio, and E | ℓ_n^{1/2}(z) − ℓ_n^{1/2}(z') |^2 ≤ C |z − z'|, by Lemmas 5–6. These facts verify the conditions of a limit theorem for integrals of random functions, Theorem 22 in Appendix I of IH (1981).
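Because the Bayes estimator minimizes a posterior expected loss, it can be computed by direct numerical integration, which is one of the computational advantages noted in the introduction. The sketch below is illustrative only: it assumes a one-sided location model with standard exponential errors and a flat prior (neither being the paper's empirical specification), computes the posterior mean (the Bayes estimate under squared loss) on a grid, and compares it with the MLE and the known closed form min(Y) − 1/n for this toy model.

```python
import numpy as np

# Minimal sketch, assuming a one-sided location model Y_i = b0 + e_i with e_i ~ Exp(1),
# so the conditional density jumps from 0 to 1 at the boundary.  Under a flat prior the
# posterior for b is proportional to exp(n*b) * 1(b <= min Y_i); the posterior mean
# (Bayes estimator under squared loss) equals min(Y) - 1/n.
rng = np.random.default_rng(2)
b0, n = 1.0, 200
y = b0 + rng.exponential(1.0, n)

def log_lik(b):
    # log likelihood of the boundary model; -inf if the support constraint is violated
    return np.where(b <= y.min(), -np.sum(y) + n * b, -np.inf)

grid = np.linspace(y.min() - 10.0 / n, y.min(), 4001)     # posterior mass lives near min(Y)
logpost = np.array([log_lik(b) for b in grid])            # flat prior: posterior ∝ likelihood
w = np.exp(logpost - logpost.max())
w /= w.sum()

bayes_grid = np.sum(w * grid)            # posterior mean by numerical integration
print("MLE (min Y):        ", y.min())
print("Bayes, grid:        ", bayes_grid)
print("Bayes, closed form: ", y.min() - 1.0 / n)
```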
B.3  Proof of Theorem 3

By condition C.3, −ℓ_n is lower-semicontinuous. Two steps are needed:

• show the finite-dimensional convergence of ℓ_n to the stochastic limit ℓ_∞;
• show the stochastic equi-lower-semicontinuity (e-lsc) of {−ℓ_n}, or equivalently of {−Q_n = −log ℓ_n}.

The first step was established in Theorem 1. Combining these two steps with Z_n = O_p(1), which follows from the tail-smallness Lemmas 5–6 and Theorem 1.5.1 in IH (1981), the conclusion follows from Lemma 3. We demonstrate the second step next.

We know from the proof of Theorem 1 that Q_{1n}(z) − Q_{1∞}(z) → 0 uniformly in z over any fixed compact set, and that Q_{1∞}(z) is a linear function of z. It therefore suffices to show s-e-lsc of {−Q*_{2n}(z)} only, where Q*_{2n} is the piecewise-constant approximation (12). That is, we need to show that for any fixed bounded set B ⊂ R^d and δ > 0 there are z_1, ..., z_m and open neighborhoods V(z_1), ..., V(z_m) such that B ⊂ ∪_{k=1}^m V(z_k) and

  limsup_n P( ∪_{k=1}^m { inf_{z ∈ V(z_k)} −Q*_{2n}(z) < −Q*_{2n}(z_k) − ε } ) ≤ δ.

This is done in several steps.

(a) [An auxiliary point process] Represent the points {(nε_i p(X_i), X_i) : ε_i > 0} through the order statistics Γ_{1n} < Γ_{2n} < ··· of {nε_i p(X_i) : ε_i > 0, i ≤ n}, and the points {(nε_i q(X_i), X_i) : ε_i < 0} through the order statistics Γ'_{1n} > Γ'_{2n} > ··· of {nε_i q(X_i) : ε_i < 0, i ≤ n}; denote by X_{in} and X'_{in} the corresponding realizations of the covariate. The point process

  N̂(·) = Σ_{i ≥ 1} 1[ (Γ_{in}, X_{in}) ∈ · ] + Σ_{i ≥ 1} 1[ (Γ'_{in}, X'_{in}) ∈ · ]

is a continuous transform of the process N of Theorem 1, say N̂ = T(N). Therefore

  N̂ ⇒ N̂_∞(·) = Σ_{i ≥ 1} 1[ (Γ_i, X_i) ∈ · ] + Σ_{i ≥ 1} 1[ (Γ'_i, X'_i) ∈ · ]   in M_p(E),

where the distribution of the points is as defined in the statement of Theorem 1; moreover, by the continuous mapping theorem, (Γ_{1n}, ..., Γ_{kn}) →_d (Γ_1, ..., Γ_k) in R^k. We need all of this to characterize the equisemicontinuity of Q*_{2n}. Write

  Q*_{2n}(z) = ∫_E l̂_z(j, x) dN̂(j, x) = ∫_{E: j>0} l̂_z(j, x) dN̂(j, x) + ∫_{E: j<0} l̂_z(j, x) dN̂(j, x) ≡ Q^+_{2n}(z) + Q^−_{2n}(z),

where l̂_z is the function l_z written in the rescaled coordinates (so that, e.g., the indicator 1(0 < j ≤ x'z) becomes 1(0 < j ≤ p(x)x'z)). We examine the discontinuities of Q*_{2n} by examining those of Q^+_{2n}(z) and Q^−_{2n}(z); because the arguments are identical for either part, consider only Q^+_{2n}(z).

(b) [Picking the cover] Fix a bounded set B ⊂ R^d. Cover X by the minimal number of closed equal-sized cubes {X^φ(x_j), j ≤ J(φ)} with side length φ and centers x_j. Construct the sets {V_kj, k = −m, ..., m, j ≤ J(φ)} ⊂ R^d,

  V_kj ≡ { z ∈ R^d : v_k − φ ≤ p(x) x'z ≤ v_k + φ for all x ∈ X^φ(x_j) },   v_k = kφ,  k ∈ {−m, ..., m}.

Since B is a bounded (nondegenerate) set, X is compact, and p(x) is bounded away from zero and from above by assumption, the collection {∪_j V_kj} covers any given bounded set B once m is selected sufficiently large.

(c) [The number of breakpoints is O_p(1), and breakpoints are separated for small φ] Consider an argument z in ∪_j V_kj. A discontinuity of Q^+_{2n}(z) can occur in ∪_j V_kj only if there exist z* ∈ ∪_j V_kj and a point (Γ_{in}, X_{in}) such that

  Γ_{in} = p(X_{in}) X_{in}' z*,   and hence   v_k − φ ≤ Γ_{in} ≤ v_k + φ.    (17)

If there is such a (Γ_{in}, X_{in}), we say that Q^+_{2n} has a breakpoint in ∪_j V_kj. Note that Q^+_{2n}(z) cannot have breakpoints in V_kj with k < 0, because Γ_{in} > 0. Define 𝒩_n = #{ i : Γ_{in} ≤ k̄ } and 𝒩 = #{ i : Γ_i ≤ k̄ }, where k̄ = sup_{z ∈ B, x ∈ X} p(x) x'z; 𝒩_n is an upper bound on the number of breakpoints of Q^+_{2n}(z) in the set B. By the continuous mapping theorem, 𝒩_n →_d 𝒩 in R, so the number of breakpoints is stochastically bounded, O_p(1). Furthermore, the breakpoints are separated: no more than one breakpoint occurs in ∪_j V_kj, with probability arbitrarily close to one, once φ is sufficiently small. Indeed, let A_k be the event that Q^+_{2n}(z) has more than one breakpoint (i.e., at least two) in ∪_j V_kj. Then

  limsup_n P[ ∪_k A_k ] ≤ limsup_n { P[ min_{i ≤ K} | Γ_{in} − Γ_{(i−1)n} | ≤ 2φ ] + P[ 𝒩_n > K ] }
                        ≤ P[ min_{i ≤ K} | Γ_i − Γ_{i−1} | ≤ 2φ ] + P[ 𝒩 > K ] ≤ δ/2,

which is achieved by first setting K sufficiently large so that P[𝒩 > K] ≤ δ/4, and then setting φ sufficiently small so that P[ min_{i ≤ K} |Γ_i − Γ_{i−1}| ≤ 2φ ] ≤ δ/4; the latter is possible since min_{i ≤ K} |Γ_i − Γ_{i−1}| > 0 almost surely. From now on K and φ are fixed.
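The counting argument in step (c) is easy to check numerically. The sketch below (illustrative assumptions: the one-sided design from the earlier snippets with p(x) ≡ 1, so the rescaled points are simply nε_i) tabulates, for increasing n, the number of points below a fixed level k̄, i.e. the potential breakpoints of the jump part over a bounded local parameter set; the count stays bounded as n grows.

```python
import numpy as np

# Sketch under illustrative assumptions: one-sided model with e_i ~ Exp(1), so p(x) = 1 and the
# rescaled points are Gamma_in = n*e_i.  The jump part can change value on a bounded local set B
# only at points with Gamma_in <= kbar; the number of such breakpoints should stay O_p(1) in n.
rng = np.random.default_rng(3)
kbar, reps = 5.0, 1000

for n in (200, 2000, 20000):
    counts = [int(np.sum(n * rng.exponential(1.0, n) <= kbar)) for _ in range(reps)]
    print(f"n = {n:6d}:  mean breakpoints = {np.mean(counts):.2f},  "
          f"95% quantile = {np.quantile(counts, 0.95):.0f}")
```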
(c') ["Smart" grid points; η small] Next construct "centers" z_kj ∈ V_kj such that

  v_k − φ ≤ p(x) x'z_kj ≤ v_k − φ + η   for all x ∈ X^φ(x_j),

where η > 0 will be set sufficiently small in the next step; depending on η, φ may have to be taken sufficiently small as well (the constraint is η ≪ φ).

(d) [Stochastic equisemicontinuity] Now follows the verification of stochastic equisemicontinuity:

  limsup_n P[ ∪_{k,j} { inf_{z ∈ V_kj} −Q^+_{2n}(z) < −Q^+_{2n}(z_kj) − ε } ] ≤ limsup_n P[ B(η) ] + limsup_n P[ ∪_k A_k ],

where B(η) is the event that the points {Γ_{in}, i ≤ K} are separated and at least one of them falls into one of the "small" disjoint sets of the form [v_k − φ, v_k − φ + η], k ≤ K. The bound holds because, on the complement of ∪_k A_k, the event in question contains only realizations in which some point Γ_{in}, i ≤ K, falls into a set of the form [v_k − φ, p(X_{in}) X_{in}'z_kj] ⊂ [v_k − φ, v_k − φ + η]; this is so because −Q^+_{2n}(z) is piecewise constant and can only jump up as the index p(x)x'z increases. Since the limit variables (Γ_i, i ≤ K) have bounded densities, limsup_n P[B(η)] = O(Kη), which can be made smaller than δ/2 by picking η sufficiently small (note that K stays fixed and its choice does not depend on η). Together with limsup_n P[∪_k A_k] ≤ δ/2 from step (c), this delivers the required bound δ and completes the proof.

B.4  Lemmas 4–6

A critical ingredient in the proofs of Theorems 2 and 3 is the requirement of tail smallness. In this section such properties of ℓ_n(z) ≡ L_n(β + z/n)/L_n(β) are established, for z ∈ R^d and β in an open ball at β_0.

Lemma 4  Suppose there are b > 0, B > 0 and n_0 such that, uniformly in β in an open ball at β_0 and for all n ≥ n_0,

  (i)  E_{P_β} ℓ_n(z)^{1/2} ≤ e^{−b|z|}   for all z ∈ R^d;
  (ii) E_{P_β} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ B |z − z'|   for all z, z' ∈ R^d.

Then, under P = P_{β_n}, it is true that

  lim_{M→∞} limsup_{n→∞} E [ ∫_{|u|>M} ℓ_n(u) q(β_0 + u/n) du / ∫_{R^d} ℓ_n(u) q(β_0 + u/n) du ] = 0,
  lim_{M→∞} limsup_{n→∞} E [ ∫_{|u|>M} ρ(z − u) ℓ_n(u) q(β_0 + u/n) du / ∫_{R^d} ℓ_n(u) q(β_0 + u/n) du ] = 0,    (18)

so that

  lim_{M→∞} limsup_{n→∞} P( | Γ_n(z) − Γ_n^M(z) | > ε ) = 0.    (19)

IH (1981) studied the non-regression model and verified the conditions of this lemma by bounding a Hellinger distance. Our approach requires a modification: we bound a conditional Hellinger distance for the regression model R. The conditional Hellinger distance, denoted r_2(β; β + h), is defined by

  r_2^2(β; β + h) = ∫∫ [ f^{1/2}( y − x'(β + h) | x; β + h ) − f^{1/2}( y − x'β | x; β ) ]^2 dy F_X(dx).

Upper and lower bounds on r_2(·) are used to verify the conditions of Lemma 4.

Lemma 5  If there are a > 0, A > 0 such that for each h small,

  inf_β r_2^2(β; β + h) ≥ a |h| / (1 + |h|)   and   sup_β r_2^2(β; β + h) ≤ A |h|,

where sup/inf are computed over β in an open ball at β_0, then there are b > 0, B > 0 and n_0 such that for all n ≥ n_0 and all z, z' ∈ R^d,

  sup_β E_{P_β} ℓ_n(z)^{1/2} ≤ e^{−b|z|}   and   sup_β E_{P_β} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ B |z − z'|.

The next lemma verifies the conditions of Lemma 5, and hence the bounds on the conditional Hellinger distance required by Lemma 4, for model R.

Lemma 6  In model R, suppose (C.1)–(C.3) hold. Then there are a > 0, A > 0 such that for all h small enough,

  inf_β r_2^2(β; β + h) ≥ a |h|   and   sup_β r_2^2(β; β + h) ≤ A |h|,

where sup/inf are computed over β in an open ball at β_0.

Proofs of Lemmas 4–6.

Proof of Lemma 4. Conclusion (18) is a special case of Lemma 1.5.2 and Theorem 1.10.2 of IH (1981), and (19) follows from (18).

Proof of Lemma 5. It follows from the definition of the conditional Hellinger distance that

  E_{P_β} ℓ_n(z)^{1/2} = [ ∫∫ f^{1/2}( y − x'(β + z/n) | x; β + z/n ) f^{1/2}( y − x'β | x; β ) dy F_X(dx) ]^n
                      = [ 1 − (1/2) r_2^2(β; β + z/n) ]^n ≤ e^{ −(n/2) r_2^2(β; β + z/n) } ≤ e^{−b|z|},

uniformly in β, where the last step uses the assumed lower bound on r_2^2 with h = z/n. Similarly,

  E_{P_β} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 = 2 − 2 E_{P_β} [ ℓ_n(z)^{1/2} ℓ_n(z')^{1/2} ]
    = 2 − 2 [ 1 − (1/2) r_2^2( β + z/n; β + z'/n ) ]^n ≤ n r_2^2( β + z/n; β + z'/n ) ≤ A |z − z'|,

using the assumed upper bound on r_2^2 with h = (z' − z)/n.
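Lemma 6, proved next, is the source of the non-standard n rate: in a regular model the squared Hellinger distance shrinks like |h|^2, while the density jump makes it shrink only like |h|. The sketch below is illustrative only (a one-dimensional location model with the two-piece exponential density used in the earlier snippets, no covariates); it evaluates r_2^2(h) by numerical integration and shows that r_2^2(h)/|h| approaches a constant as h → 0, while r_2^2(h)/h^2 blows up.

```python
import numpy as np

# Sketch under illustrative assumptions: location model Y = b + e with the two-piece
# exponential density f(e) = p*exp(-e) (e >= 0), q*exp(e) (e < 0), p + q = 1.
# Squared Hellinger distance: r2sq(h) = integral of (sqrt(f(y - h)) - sqrt(f(y)))^2 dy.
p, q = 0.7, 0.3

def f(e):
    return np.where(e >= 0, p * np.exp(-e), q * np.exp(e))

def r2sq(h, m=200000):
    # midpoint rule on the three regions where the integrand is smooth: (-30,0), (0,h), (h,30)
    total = 0.0
    for a, b in ((-30.0, 0.0), (0.0, h), (h, 30.0)):
        y = np.linspace(a, b, m, endpoint=False) + (b - a) / (2 * m)
        total += np.sum((np.sqrt(f(y - h)) - np.sqrt(f(y))) ** 2) * (b - a) / m
    return total

for h in (0.5, 0.1, 0.02, 0.004):
    v = r2sq(h)
    print(f"h = {h:7.3f}   r2sq = {v:.6f}   r2sq/h = {v / h:.4f}   r2sq/h^2 = {v / h**2:9.2f}")
# r2sq/h stabilizes near (sqrt(p) - sqrt(q))^2, the contribution of the jump interval,
# while r2sq/h^2 diverges as h -> 0 -- the linear-in-|h| behavior asserted by Lemma 6.
```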
Proof of Lemma 6. To obtain the uniform upper bound, for |h| small enough,

  r_2^2(β; β + h) = E_X ∫ ( f^{1/2}( y − X'(β + h) | X; β + h ) − f^{1/2}( y − X'β | X; β ) )^2 dy
    ≤ E_X ∫ | f( y − X'(β + h) | X; β + h ) − f( y − X'β | X; β ) | dy
    = E_X ∫_{[X'β, X'(β+h)]} | f( y − X'(β + h) | X; β + h ) − f( y − X'β | X; β ) | dy
      + E_X ∫_{[X'β, X'(β+h)]^c} | f( y − X'(β + h) | X; β + h ) − f( y − X'β | X; β ) | dy,

where [a, b] denotes [a, b] if a ≤ b and [b, a] otherwise, and the first inequality uses |a − b|^2 ≤ |a^2 − b^2| for a ≥ 0, b ≥ 0. The first term is bounded by 2 E_X |X'h| ( p(X, β) + q(X, β) ) ≤ const × |h|, since for y in the interval [X'β, X'(β + h)] and |h| small the integrand is at most 2( p(X, β) + q(X, β) ), and the interval has length |X'h|. The second term is bounded, by a mean-value expansion and Fubini's theorem, by

  E_X ∫ ∫_0^1 | h' (∂/∂β) f( y − X'(β + uh) | X; β + uh ) | du dy ≤ const × |h| × E_X ∫_0^1 ∫ | ḟ( y − X'(β + uh) | X; β + uh ) | dy du ≤ const × |h|,

using the compactness of X, the Cauchy–Schwarz inequality, a change of the order of integration, and condition C.3. The upper bound follows.

Now bound the conditional Hellinger distance from below:

  r_2^2(β; β + h) ≥ E_X ∫_{[X'β, X'(β+h)]} ( f^{1/2}( y − X'(β + h) | X; β + h ) − f^{1/2}( y − X'β | X; β ) )^2 dy
    = E_X [ |X'h| ( p^{1/2}(X, β) − q^{1/2}(X, β) )^2 ] (1 + o(1)) ≥ const × |h|,

where the last inequality follows because p^{1/2}(x, β) − q^{1/2}(x, β) is bounded away from zero by C.3 and inf_{c ∈ R^d, |c|=1} E|X'c| > 0, since Var(X) > 0.

C  Proofs for the Non-Linear Model

In the proofs we set the local parameter sequence γ_n = γ_0. Carrying the general local sequence γ_n through the arguments changes nothing of substance but introduces a lot of notational complexity.

C.1  Proof of Theorem 4

As in Theorem 1, we split the local log-likelihood ratio process Q_n(z) = ln L_n(γ_0 + H_n z)/L_n(γ_0), z = (u, v), into the "jump" part and the "smooth" part and analyze each part separately:

  Q_n(z) = Σ_{i=1}^n q_{in}(z) [ 1( ε_i > Δ_n(X_i, u)/n ∨ 0 ) + 1( ε_i < Δ_n(X_i, u)/n ∧ 0 ) ]
         + Σ_{i=1}^n q_{in}(z) [ 1( 0 < ε_i ≤ Δ_n(X_i, u)/n ) + 1( 0 > ε_i ≥ Δ_n(X_i, u)/n ) ]
        ≡ Q_{1n}(z) + Q_{2n}(z),

where

  q_{in}(z) = ln [ f( Y_i − g(X_i, β_0 + u/n) | X_i, β_0 + u/n, α_0 + v/√n ) / f( Y_i − g(X_i, β_0) | X_i, β_0, α_0 ) ],
  Δ_n(x, u) = n ( g(x, β_0 + u/n) − g(x, β_0) ).

I. We first find the uniform limit of Q_{1n}(z). In this part only, we apply arguments similar to those of IH (1981), using assumptions (E.1)–(E.3) and a second-order Taylor expansion:

  Q_{1n}(z) = − u' (1/n) Σ_{i=1}^n [ ∂g(X_i, β_0)/∂β ] f'(ε_i | X_i, γ_0)/f(ε_i | X_i, γ_0)
            + v' (1/√n) Σ_{i=1}^n ∂ ln f(ε_i | X_i, γ_0)/∂α
            + (1/2) v' [ (1/n) Σ_{i=1}^n ∂^2 ln f(ε_i | X_i, γ_0)/∂α ∂α' ] v + o_p(1).

By the LLN and the same jump argument as in (6), the first term converges in probability to u' E[ (∂g(X, β_0)/∂β)( p(X) − q(X) ) ]; by the CLT, the second term converges in distribution to v'W with W ~ N(0, J); and the information-matrix equality gives (1/n) Σ_i ∂^2 ln f/∂α∂α' →_p −J. Hence, for each z,

  Q_{1n}(z) →_d Q_{1∞}(z) = u' E[ (∂g(X, β_0)/∂β)( p(X) − q(X) ) ] + v'W − (1/2) v'Jv,

a continuous Gaussian process in v. Convergence in ℓ^∞(Z), i.e. uniformly over compact sets Z, follows once we demonstrate stochastic equicontinuity of Q_{1n}(·), which we do next.
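The normalization H_n z = (u/n, v/√n) above reflects the two convergence rates: the jump-location parameters are estimable at rate n, the regular shape parameters at rate √n. Before turning to the equicontinuity details, here is a quick Monte Carlo illustration under illustrative assumptions (a one-sided location-scale exponential model, not one of the paper's empirical specifications) confirming the two rates for the MLE.

```python
import numpy as np

# Sketch, assuming Y_i = b0 + e_i with e_i ~ Exponential(rate a0): the density jumps from 0
# to a0 at the boundary b0 (jump parameter, rate n), while a0 is a regular parameter (rate sqrt(n)).
rng = np.random.default_rng(4)
b0, a0, reps = 1.0, 2.0, 5000

for n in (100, 400, 1600):
    zb, za = [], []
    for _ in range(reps):
        y = b0 + rng.exponential(1.0 / a0, n)
        b_hat = y.min()                      # MLE of the boundary
        a_hat = 1.0 / np.mean(y - b_hat)     # MLE of the rate given b_hat
        zb.append(n * (b_hat - b0))          # n-rate local coordinate
        za.append(np.sqrt(n) * (a_hat - a0)) # sqrt(n)-rate local coordinate
    print(f"n = {n:5d}:  sd[n*(b_hat-b0)] = {np.std(zb):.3f}  (approx 1/a0 = {1/a0:.3f}),  "
          f"sd[sqrt(n)*(a_hat-a0)] = {np.std(za):.3f}  (approx a0 = {a0:.3f})")
```

The standard deviations of both rescaled coordinates stabilize across n, which is exactly what the H_n normalization encodes.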
In particular, for any |z_1 − z_2| → 0, with z_j = (u_j, v_j), we must show that terms like

  Σ_{i=1}^n [ q_{in}(z_1) 1( ε_i > Δ_n(X_i, u_1)/n ∨ 0 ) − q_{in}(z_2) 1( ε_i > Δ_n(X_i, u_2)/n ∨ 0 ) ]

vanish in probability. Split such a term into

  Σ_{i=1}^n 1( ε_i > Δ_n(X_i, u_1)/n ∨ Δ_n(X_i, u_2)/n ∨ 0 )
     × | ln f( ε_i − Δ_n(X_i, u_1)/n | X_i; β_0 + u_1/n, α_0 + v_1/√n ) − ln f( ε_i − Δ_n(X_i, u_2)/n | X_i; β_0 + u_2/n, α_0 + v_2/√n ) |
  + Σ_{i=1}^n 1( ε_i ∈ (0, Δ_n(X_i, u_1)/n ∨ Δ_n(X_i, u_2)/n ∨ 0] )
     × max_{j=1,2} | ln [ f( ε_i − Δ_n(X_i, u_j)/n | X_i; β_0 + u_j/n, α_0 + v_j/√n ) / f( ε_i | X_i; β_0, α_0 ) ] |.

By the argument used in the proof of Theorem 1, the second sum is bounded by a bounded factor times Σ_{i=1}^n 1( ε_i ∈ (0, C/n ∨ 0] ), which is o_p(1) as |z_1 − z_2| → 0. Each summand of the first sum is bounded, for some z' = (u', v') between z_1 and z_2, by

  1( ε_i > Δ_n(X_i, u_1)/n ∨ Δ_n(X_i, u_2)/n ∨ 0 )
    × [ (1/n) | ∂/∂u ln f( ε_i − Δ_n(X_i, u')/n | X_i; β_0 + u'/n, α_0 + v'/√n ) | |u_1 − u_2|
      + (1/√n) | ∂/∂v ln f( ε_i − Δ_n(X_i, u')/n | X_i; β_0 + u'/n, α_0 + v'/√n ) | |v_1 − v_2| ].

Using assumption (E.5), the sum of the first components is bounded by a term of the form C (1/n) Σ_{i=1}^n (envelope terms) × |u_1 − u_2| = O_p(1) |u_1 − u_2|. The sum of the second components can be expanded further around γ_0, for some z'' between z_1 and z_2: its leading part is

  | (1/√n) Σ_{i=1}^n ∂ ln f(ε_i | X_i, γ_0)/∂α · (v_1 − v_2) | = O_p(1) |v_1 − v_2|

by the CLT, while assumption (E.6) implies that the remainder is of the form (1/n) Σ_{i=1}^n C''(u, X_i, ε_i) |v_1 − v_2| = O_p(1) |v_1 − v_2|; by the Chebyshev inequality, the indicator in the summation can be replaced by 1(ε_i > 0) at the cost of a negligible term, and assumption (E.4) ensures that these bounds hold uniformly over the relevant neighborhoods. Thus, as |z_1 − z_2| → 0, the entire expression goes to 0 in probability. This completes the proof of stochastic equicontinuity and of the uniform convergence of Q_{1n}(z) for the nonlinear model.

II. Limit of Q_{2n}(z). As in the proof of Theorem 1, in the expression for Q_{2n}(z) we can replace q_{in}(z), uniformly in z ∈ Z, by either ln p(X_i)/q(X_i) or ln q(X_i)/p(X_i), depending on the sign of ε_i (the resulting expression may equal −∞ in the one-sided case). So, uniformly in z over Z,

  Q_{2n}(z) = Q*_{2n}(z) + o_p(1),
  Q*_{2n}(z) = Σ_{i=1}^n [ ln( q(X_i)/p(X_i) ) 1( 0 < nε_i ≤ Δ_n(X_i, u) ) + ln( p(X_i)/q(X_i) ) 1( 0 > nε_i ≥ Δ_n(X_i, u) ) ].

Next we note that, by straightforward calculations,

  E Σ_{i=1}^n [ | 1( 0 < nε_i ≤ Δ_n(X_i, u) ) − 1( 0 < nε_i ≤ Δ(X_i, u) ) | + | 1( 0 > nε_i ≥ Δ_n(X_i, u) ) − 1( 0 > nε_i ≥ Δ(X_i, u) ) | ] = o(1),

where Δ(X_i, u) = (∂g(X_i, β_0)/∂β)' u, which yields, for any fixed z,

  Q*_{2n}(z) = Σ_{i=1}^n [ ln( q(X_i)/p(X_i) ) 1( 0 < nε_i ≤ Δ(X_i, u) ) + ln( p(X_i)/q(X_i) ) 1( 0 > nε_i ≥ Δ(X_i, u) ) ] + o_p(1).

Having obtained this "linearized" expression, the finite-dimensional convergence of Q_{2n}(z) follows by arguments identical to those in Theorem 1. It remains to show that ( Q_{2n}(z_i), i ≤ J ) and ( Q_{1n}(z_i), i ≤ J ), for any finite J, are asymptotically independent. This follows by noting that the dependence between the two parts is realized only through sums that vanish in probability: for any fixed z = (u, v),

  (1/√n) Σ_{i=1}^n (∂/∂α) ln f(ε_i | X_i, γ_0) 1( nε_i ∈ (0, Δ_n(X_i, u)) ) = o_p(1),
  (1/√n) Σ_{i=1}^n (∂/∂α) ln f(ε_i | X_i, γ_0) 1( nε_i ∈ (Δ_n(X_i, u), 0) ) = o_p(1).
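The asymptotic independence of the jump and smooth parts translates into asymptotic independence of the two blocks of the estimator. Continuing the illustrative exponential location-scale sketch from above (again an assumption for illustration only), the correlation between the n-rate boundary coordinate and the √n-rate shape coordinate of the MLE should be close to zero.

```python
import numpy as np

# Continuation of the illustrative sketch: check that the n-rate boundary coordinate and the
# sqrt(n)-rate shape coordinate of the MLE are approximately uncorrelated.
rng = np.random.default_rng(5)
b0, a0, n, reps = 1.0, 2.0, 2000, 20000

zb = np.empty(reps)
za = np.empty(reps)
for r in range(reps):
    y = b0 + rng.exponential(1.0 / a0, n)
    b_hat = y.min()
    a_hat = 1.0 / np.mean(y - b_hat)
    zb[r] = n * (b_hat - b0)
    za[r] = np.sqrt(n) * (a_hat - a0)

print("corr(n*(b_hat-b0), sqrt(n)*(a_hat-a0)) =", np.corrcoef(zb, za)[0, 1])   # close to 0
```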
C.2  Proof of Theorem 5

As in Theorem 2, the proof is a straightforward application of the convexity lemma, by which it suffices to show the finite-dimensional (fidi) convergence of Γ_n(·) to Γ_∞(·):

  ( Γ_n(z_k), k ≤ K ) →_d ( Γ_∞(z_k), k ≤ K ).

As before, we follow two steps.

1. First we approximate Γ_n(z), an integral over R^d, by an integral over a compact subset of R^d. For large M ∈ R_+, the approximation of Γ_n(z) is Γ_n^M(z) = Γ_{n1}^M(z)/Γ_{n2}^M, where

  Γ_{n1}^M(z) = ∫_{|z'| ≤ M} ρ(z − z') ℓ_n(z') q(γ_0 + H_n z') dz',   Γ_{n2}^M = ∫_{|z'| ≤ M} ℓ_n(z') q(γ_0 + H_n z') dz'.

Define the corresponding limit quantities

  Γ_{∞1}^M(z) = ∫_{|z'| ≤ M} ρ(z − z') ℓ_∞(z') q(γ_0) dz',   Γ_{∞2}^M = ∫_{|z'| ≤ M} ℓ_∞(z') q(γ_0) dz'.

Similarly to (19), Lemmas 7 and 8 in Section C.5 demonstrate that the tail of the likelihood ratio process is exponentially small: for each ε > 0 and δ > 0 there is M ∈ R_+ such that

  limsup_n P( | Γ_n(z) − Γ_n^M(z) | > ε ) < δ.    (20)

2. In view of (20), it suffices to show ( Γ_{n1}^M(z_k), Γ_{n2}^M, k ≤ K ) →_d ( Γ_{∞1}^M(z_k), Γ_{∞2}^M, k ≤ K ). This follows from two facts: Fact i, the finite-dimensional convergence of ℓ_n to ℓ_∞, established in Theorem 4; and Fact ii, E ℓ_n(z) = 1, by the definition of the likelihood ratio, and E | ℓ_n^{1/2}(z) − ℓ_n^{1/2}(z') |^2 ≤ c |z − z'|, by Lemma 7. These facts verify the conditions of a limit theorem for integrals of random functions, Theorem 22 in Appendix I of IH (1981).

C.3  Proof of Theorem 6

By assumption, −ℓ_n is lower-semicontinuous. Two steps are needed:

i. show the finite-dimensional convergence of ℓ_n to the stochastic limit;
ii. show the stochastic equi-lower-semicontinuity (e-lsc) of {−ℓ_n}, or equivalently of {−Q_n = −log ℓ_n}.

Theorem 4 handled step i. Given these two steps and Z_n = O_p(1), which follows from the tail-smallness Lemma 8 and Theorem 1.5.1 in IH (1981), the conclusion follows as in Theorem 3. It therefore remains to demonstrate step ii. From the proof of Theorem 4,

  Q_{1n}(z) ⇒ Q_{1∞}(z)   in ℓ^∞(Z),    (21)

where Q_{1∞} is the limit process defined in the proof of Theorem 4 and Z is any compact subset of R^d. Also Q_{2n}(z) − Q*_{2n}(z) → 0 uniformly in z over Z, where

  Q*_{2n}(z) = ∫_E l_{un}(j, x) dN(j, x),
  l_{un}(j, x) = ln( q(x)/p(x) ) 1( 0 < j ≤ Δ_n(x, u) ) + ln( p(x)/q(x) ) 1( 0 > j ≥ Δ_n(x, u) ).

Because of (21) and the continuity of the limit of Q_{1n}, it suffices to show s-e-lsc of {−Q*_{2n}(z)} only; and because Q*_{2n}(z) depends on z only through u, we may write Q*_{2n}(u) instead. Because Q*_{2n}(u) is a piecewise-constant function, it suffices to show that for any bounded set B and δ > 0 there are u_1, ..., u_m and open neighborhoods V(u_1), ..., V(u_m) such that B ⊂ ∪_{k=1}^m V(u_k) and

  limsup_n P( ∪_{k=1}^m { inf_{u ∈ V(u_k)} −Q*_{2n}(u) < −Q*_{2n}(u_k) − ε } ) ≤ δ.

This is done in several steps; the proof is nearly identical to the proof of Theorem 3, and only the minor differences are highlighted below.

(a) This step is identical to step (a) in the proof of Theorem 3, where the auxiliary point process N̂ is constructed and its limit is found. This process allows us to demonstrate the equisemicontinuity of Q*_{2n}. Write

  Q*_{2n}(u) = ∫_{E: j>0} l_{un}(j, x) dN̂(j, x) + ∫_{E: j<0} l_{un}(j, x) dN̂(j, x) ≡ Q^+_{2n}(u) + Q^−_{2n}(u),

with l_{un} written in the rescaled coordinates as before. We examine the discontinuities of Q*_{2n} by examining those of Q^+_{2n} and Q^−_{2n}; because the arguments are identical for either part, consider only Q^+_{2n}.

(b) This step is identical to step (b) in the proof of Theorem 3, except for the construction of {V_kj}. Here, for sufficiently large n,

  V_kj ≡ { u : v_k − φ ≤ p(x) Δ_n(x, u) ≤ v_k + φ for all x ∈ X^φ(x_j) },   v_k = kφ,  k ∈ {−m, ..., m}.

Observe that Δ_n(X, u) → (∂g(X, β_0)/∂β)'u uniformly over compact sets as n → ∞, by assumption, and ∂g(X, β_0)/∂β has a positive-definite variance matrix, so the support of this vector is non-degenerate. Thus, as in Theorem 3, since X is compact and p(x) is bounded away from zero and from above by assumption, {∪_j V_kj} can cover any given bounded set B by selecting m sufficiently large, for large n.

(c) [The number of breakpoints is O_p(1) and breakpoints are separated for small φ] Consider an argument u in ∪_j V_kj. A discontinuity of Q^+_{2n}(u) (which depends on u only through the index Δ_n(x, u)) can occur in ∪_j V_kj only if there exist u* ∈ ∪_j V_kj and (Γ_{in}, X_{in}) such that

  Γ_{in} = p(X_{in}) Δ_n(X_{in}, u*),   and hence   v_k − φ ≤ Γ_{in} ≤ v_k + φ.    (22)

If there is such a (Γ_{in}, X_{in}), we say that Q^+_{2n} has a breakpoint in ∪_j V_kj. For sufficiently large n, define 𝒩_n = #{ i : Γ_{in} ≤ k̄ } and 𝒩 = #{ i : Γ_i ≤ k̄ }, where k̄ is a bound on sup_{u ∈ B, x ∈ X} p(x) Δ_n(x, u) that is finite for large n; 𝒩_n is an upper bound on the number of breakpoints of Q^+_{2n}(u) in the set B. By the continuous mapping theorem, 𝒩_n →_d 𝒩 in R, so the number of breakpoints is stochastically bounded, O_p(1). Furthermore, breakpoints are separated in the same sense as in the proof of Theorem 3: defining A_k as the event that Q^+_{2n}(u) has more than one breakpoint (i.e., at least two) in ∪_j V_kj, the arguments of the proof of Theorem 3 show that for any δ > 0, φ can be picked small enough that limsup_n P[ ∪_k A_k ] ≤ δ/2.

(c') [Grid points; η small] Pick centers u_kj ∈ V_kj so that, for large n,

  v_k − φ ≤ p(x) Δ_n(x, u_kj) ≤ v_k − φ + η   for all x ∈ X^φ(x_j),

where η will be set sufficiently small in the next step; depending on η, φ will be set sufficiently small as well.

(d) [Stochastic equisemicontinuity] This step is identical to step (d) in the proof of Theorem 3, except that we replace X'z with Δ_n(X, u).

C.4  Proofs of Theorems 7 and 8

Proof of Theorem 7. The result is well known: under the stated conditions the posterior risk is finite and minimized by the Bayes estimator, so Theorem 1.1 in Ch. 4 of Lehmann applies.
Proof of Theorem 8. Z_n denotes the rescaled Bayes estimator H_n^{-1}( γ̂_Bayes − γ ). As a preliminary step we have that, for large n, for some c_1, c_2 > 0 and any H > 0,

  P{ |Z_n| > H } ≤ c_1 exp{ −c_2 H },    (23)

which follows from Theorem 1.5.3 in Ibragimov and Has'minskii (1981); the conditions required there (the tail-smallness bounds) are verified in Lemmas 7 and 8. The law of Z_n under P_γ depends on γ, which we emphasize by writing Z_n^γ. Display (23) and the majorization of ρ by a polynomial imply that, for some n_0, the collection

  { ρ(Z_n^γ), n ≥ n_0, γ in an open ball at γ_0 }   is uniformly integrable.    (24)

Define Z_n^γ(K) = H_n^{-1}( γ̂_{Bayes, λ_K} − γ ), where γ̂_{Bayes, λ_K} is the Bayes estimator defined with respect to the prior weight λ_K(x) = 1{ H_n^{-1}(x − γ_0) ∈ K } for a compact set K. By construction, for large n and any compact K the analogue of (23) holds,

  P_γ{ |Z_n^γ(K)| > H } ≤ c_1 exp{ −c_2 H },    (25)

so that, for some n_0, the collection

  { ρ( Z_n^γ(K) − δ ), n ≥ n_0, γ in an open ball at γ_0 }   is uniformly integrable.    (26)

Next we (a) evaluate I = R_ρ({γ̂_Bayes}, R^d) and II(K) = R_ρ({γ̂_{Bayes, λ_K}}, K), and (b) show that II(K) approaches I from below as K ↑ R^d.

Step (a).

  R_ρ({γ̂_Bayes}, K) = lim_n ∫_K E_{P_{γ(δ)}} ρ( Z_n^{γ(δ)} ) dδ / Leb(K) = ∫_K E ρ(Z_∞) dδ / Leb(K) = E ρ(Z_∞),

by Theorem 5, displays (23)–(24), and the dominated convergence theorem; hence R_ρ({γ̂_Bayes}, R^d) = E ρ(Z_∞). Analogously,

  R_ρ({γ̂_{Bayes, λ_K}}, K) = lim_n ∫_K E_{P_{γ(δ)}} ρ( Z_n^{γ(δ)}(K) ) dδ / Leb(K) = ∫_K E ρ( Z_∞^δ(K) ) dδ / Leb(K),

by Theorem 5 (which applies to Z_n^γ(K) by the same argument as to Z_n^γ), displays (25)–(26), and dominated convergence, where Z_∞^δ(K) = arg inf_ζ ∫ ρ( ζ − h − δ ) ℓ_∞(h) 1_K(h + δ) dh is the limit version of the Bayes estimator corresponding to the restricted prior weight.

Step (b). For any δ, II(K) = ∫_K E ρ( Z_∞^δ(K) ) dδ / Leb(K) is the lower bound on the risk R_ρ({F_n}, K) of any estimator sequence {F_n}, so no estimator sequence that differs from {γ̂_{Bayes, λ_K}} achieves a lower risk value over K. In particular, ∫_K E[ ρ(Z_∞^δ(K)) − ρ(Z_∞) ] dδ / Leb(K) ≤ 0, which we rewrite as

  ∫_K E[ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^+ dδ / Leb(K) − ∫_K E[ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^- dδ / Leb(K) ≤ 0.    (27)

As K ↑ R^d, equivalently as r(K) → ∞ where r(K) denotes the width of the cube K (centered at zero by definition),

  ∫_K E[ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^- dδ / Leb(K) → 0,    (28)

because (i) pointwise convergence: the objective function defining Z_∞^δ(K) converges, in the finite-dimensional sense, to the objective function that Z_∞ minimizes, so that by convexity (Lemma 1) ρ(Z_∞^δ(K)) − ρ(Z_∞) → 0 in probability as r(K) → ∞ (29); and (ii) domination: [ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^- ≤ ρ(Z_∞), so this collection of variables is uniformly integrable (30). Given (27)–(30), it must be that ∫_K E[ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^+ dδ / Leb(K) → 0 as r(K) → ∞, so that II(K) ↑ I = E ρ(Z_∞) as K ↑ R^d. This shows that {γ̂_Bayes} minimizes R_ρ({·}, R^d): if some sequence {F_n} achieved a risk lower than E ρ(Z_∞), it would have to achieve a value lower than II(K) for some large K, which is impossible by the comment at the beginning of step (b).
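The average-risk optimality of the Bayes estimator has a simple finite-sample echo in the illustrative one-sided exponential model used in the earlier sketches. The comparison below is an assumption-laden illustration (flat prior, squared loss, toy model), not a literal application of Theorem 8: under squared loss the posterior mean min(Y) − 1/n beats the MLE min(Y) by a factor of two in rescaled risk.

```python
import numpy as np

# Sketch, illustrative assumptions: Y_i = b0 + e_i, e_i ~ Exp(1), flat prior on b0.
# MLE = min(Y); Bayes posterior mean under squared loss = min(Y) - 1/n.
# Rescaled risks: E[n*(MLE - b0)]^2 = 2,  E[n*(Bayes - b0)]^2 = 1.
rng = np.random.default_rng(6)
b0, n, reps = 1.0, 500, 50000

mle_loss, bayes_loss = [], []
for _ in range(reps):
    y = b0 + rng.exponential(1.0, n)
    mle = y.min()
    bayes = y.min() - 1.0 / n
    mle_loss.append((n * (mle - b0)) ** 2)
    bayes_loss.append((n * (bayes - b0)) ** 2)

print("rescaled MSE, MLE  :", np.mean(mle_loss), "(theory: 2)")
print("rescaled MSE, Bayes:", np.mean(bayes_loss), "(theory: 1)")
```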
C.5  Lemmas 7–8

In this section some important properties of ℓ_n(z) ≡ L_n(γ(z)) / L_n(γ) are established, where γ(z) ≡ γ + H_n z for z = (u, v) ∈ R^d. First we note that the conditions of Lemma 4, namely

  E_{P_γ} ℓ_n(z)^{1/2} ≤ e^{−b|z|}   and   E_{P_γ} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ B |z − z'|,

uniformly in γ in a ball at γ_0, need to hold only in a restricted form: by the proof of Lemma 1.5.2 of IH (1981), the first inequality only needs to hold for |z| large, and the second inequality only needs to hold for |z − z'| ≤ 1. For the nonlinear model R, with γ = (t, s) and h = (h_1, h_2), the conditional Hellinger distance r_2(γ; γ + h) is defined by

  r_2^2(γ; γ + h) = ∫∫ [ f^{1/2}( y − g(x, t + h_1) | x; γ + h ) − f^{1/2}( y − g(x, t) | x; γ ) ]^2 dy F_X(dx).

Next we obtain the nonlinear versions of Lemmas 5 and 6.

Lemma 7  If there are a > 0, A > 0 such that for all h small enough and γ in a ball at γ_0,

  inf_γ r_2^2(γ; γ + h) ≥ a [ max( |h_1|, |h_2|^2 ) ∧ 1 ]   and   sup_γ r_2^2(γ; γ + h) ≤ A ( |h_1| + |h_2|^2 ),

then there are b > 0, B > 0 and n_0 such that for all n ≥ n_0,

  sup_γ E_{P_γ} ℓ_n(z)^{1/2} ≤ e^{−b|z|}  for |z| large,   and   sup_γ E_{P_γ} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ B |z − z'|  for |z − z'| ≤ 1.

Proof. For |z| large, as in the proof of Lemma 5,

  E_{P_γ} ℓ_n(z)^{1/2} = [ 1 − (1/2) r_2^2( γ; γ + (u/n, v/√n) ) ]^n ≤ exp{ −(n/2) r_2^2( γ; γ + (u/n, v/√n) ) }
                       ≤ exp{ −(1/2) [ a max( |u|, |v|^2 ) ∧ n ] } ≤ e^{−b|z|}.

On the other hand, for |z − z'| ≤ 1, the same calculation as in Lemma 5 leads to

  E_{P_γ} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ n r_2^2( γ + (u/n, v/√n); γ + (u'/n, v'/√n) )
    ≤ A ( |u − u'| + |v − v'|^2 ) ≤ A ( |z − z'| + |z − z'|^2 ) ≤ B |z − z'|.

Lemma 8  In model R, suppose conditions (E.1)–(E.6) hold. Then there are a > 0, A > 0 such that for all h small enough and γ in a ball at γ_0,

  inf_γ r_2^2(γ; γ + h) ≥ a max( |h_1|, |h_2|^2 )   and   sup_γ r_2^2(γ; γ + h) ≤ A ( |h_1| + |h_2|^2 ).

Proof of Lemma 8. For ε and η small enough, let γ = (t, s) and h = (ε, η). For the upper bound,

  r_2^2(γ; γ + h) = E_X ∫ ( f^{1/2}( y − g(X, t + ε) | X; t + ε, s + η ) − f^{1/2}( y − g(X, t) | X; t, s ) )^2 dy
   ≤ E_X ∫_{[g(X,t), g(X,t+ε)]} | f( y − g(X, t + ε) | X; t + ε, s + η ) − f( y − g(X, t) | X; t, s ) | dy
   + E_X ∫_{[g(X,t), g(X,t+ε)]^c} ( f^{1/2}( y − g(X, t + ε) | X; γ + h ) − f^{1/2}( y − g(X, t + ε) | X; t + ε, s ) )^2 dy
   + E_X ∫ ( f^{1/2}( y − g(X, t + ε) | X; t + ε, s ) − f^{1/2}( y − g(X, t) | X; t, s ) )^2 dy
   ≤ 2 E_X | g(X, t + ε) − g(X, t) | ( p(X, γ) + q(X, γ) )
   + |η|^2 E_X ∫ ∫_0^1 | ∂ f^{1/2}( y − g(X, t + ε) | X; t + ε, s + wη ) / ∂s |^2 dy dw
   + E_X ∫ ∫_0^1 | ε' ∂ f( y − g(X, t + wε) | X; t + wε, s ) / ∂t | dy dw
   ≤ 2 |ε| E_X [ ( p(X, γ) + q(X, γ) ) ∫_0^1 | ∂g(X, t + wε)/∂t | dw ] + O(|η|^2) + O(|ε|) = O(|ε|) + O(|η|^2),

and the bound is uniform in γ by (E.4). Here [a, b] denotes [a, b] if a ≤ b and [b, a] otherwise. The first inequality follows by the triangle inequality and from |a − b|^2 ≤ |a^2 − b^2| for a ≥ 0, b ≥ 0, applied on the interval containing the jump. The first term in the second bound uses the fact that, for y ∈ [g(X, t), g(X, t + ε)] and ε small enough, the integrand is at most 2( p(X, γ) + q(X, γ) ), integrated over an interval of length |g(X, t + ε) − g(X, t)|. The second and third terms are the usual multivariate first-order Taylor expansions combined with Fubini's theorem: the s-expansion is controlled by assumption (E.4) and contributes O(|η|^2), and the t-expansion is controlled by assumption (E.2) and contributes O(|ε|).

To obtain a bound from below, write

  r_2^2(γ; γ + h) ≥ E_X ∫_{[g(X,t), g(X,t+ε)]} ( f^{1/2}( y − g(X, t + ε) | X; γ + h ) − f^{1/2}( y − g(X, t) | X; γ ) )^2 dy
   + E_X ∫_{[g(X,t), g(X,t+ε)]^c} ( f^{1/2}( y − g(X, t + ε) | X; γ + h ) − f^{1/2}( y − g(X, t) | X; γ ) )^2 dy.

The first term is bounded from below, uniformly in γ, by

  E_X [ | g(X, t + ε) − g(X, t) | ( p^{1/2}(X, γ) − q^{1/2}(X, γ) )^2 ] (1 + o(1)) ≥ c |ε| E_X | (∂g(X, t)/∂t)' (ε/|ε|) | ≥ c |ε|,

using assumption (E.3), a Taylor expansion, and the Cauchy–Schwarz inequality. For the second term, under assumption (E.4)(a) a lower bound is

  |η|^2 inf_{|c|=1} E_X ∫ ( c' ∂ f^{1/2}( y − g(X, t) | X; γ ) / ∂s )^2 dy ≥ c |η|^2,

and if instead assumption (E.4)(b) holds, the same uniform lower bound c |η|^2 obtains. We conclude that inf_γ r_2^2(γ; γ + h) ≥ c max( |ε|, |η|^2 ).
References

Aigner, D., T. Amemiya, and D. Poirier (1976): "On the Estimation of Production Frontiers: Maximum Likelihood Estimation of the Parameters of a Discontinuous Density Function," International Economic Review, 17, 377-396.

Bajari, P. (1999): "Auction Models Are Not Robust When Bidders Make Small Mistakes," Stanford University.

Bowlus, A., G. Neumann, and N. Kiefer (2001): "Equilibrium Search Models and the Transition from School to Work," International Economic Review, 42(2), 317-343.

Chernozhukov, V. (2000): "Conditional Extremes and Near Extremes," mimeo, MIT and Stanford University.

Christensen, B., and N. Kiefer (1991): "The Exact Likelihood Function for an Empirical Job Search Model," Econometric Theory, 7, 464-486.

Donald, S., and H. Paarsch (2000): "Superconsistent Estimation and Inferences in Structural Econometric Models using Extreme Order Statistics," University of Texas and University of Iowa.

Donald, S. G., and H. J. Paarsch (1993a): "Maximum Likelihood Estimation when the Support of the Distribution depends upon Some or All of the Unknown Parameters," Typescript, Department of Economics, University of Western Ontario, London, Canada.

Donald, S. G., and H. J. Paarsch (1993b): "Piecewise Pseudo-Maximum Likelihood Estimation in Empirical Models of Auctions," International Economic Review, 34, 121-148.

Donald, S. G., and H. J. Paarsch (1996): "Identification, Estimation, and Testing in Parametric Empirical Models of Auctions within the Independent Private Values Paradigm," Econometric Theory, 12, 517-567.

Flinn, C., and J. Heckman (1982): "New Methods for Analyzing Structural Models of Labor Force Dynamics," Journal of Econometrics, 18, 115-190.

Guerre, E., I. Perrigne, and Q. Vuong (2000): "Optimal Nonparametric Estimation of First-Price Auctions," Econometrica, 68(3), 525-574.

Hirano, K., and J. Porter (2001): "Asymptotic Efficiency in Parametric Structural Models with Parameter-Dependent Support," UCLA and Harvard University.

Hong, H., and M. Shum (2000): "Increasing Competition and the Winner's Curse: Evidence from Procurement," manuscript.

Horowitz, J. (2000): "The Bootstrap," University of Iowa; Handbook of Econometrics, vol. 5, to appear.

Ibragimov, I., and R. Has'minskii (1981): Statistical Estimation: Asymptotic Theory. Springer Verlag.

Knight, K. (1999): "Epi-convergence and Stochastic Equisemicontinuity," Preprint.

Leadbetter, M., G. Lindgren, and H. Rootzen (1983): Extremes and Related Properties of Random Sequences and Processes. Springer Verlag.

Meyer, R. M. (1973): "A Poisson-type limit theorem for mixing sequences of dependent 'rare' events," Annals of Probability, 1.

Paarsch, H. (1992): "Deciding between the common and private value paradigms in empirical models of auctions," Journal of Econometrics, 51, 191-215.

Politis, D., J. Romano, and M. Wolf (1999): Subsampling. Springer Series in Statistics.

Resnick, S. I. (1987): Extreme Values, Regular Variation, and Point Processes. Springer-Verlag.

Robert, C. P., and G. Casella (1998): Monte Carlo Statistical Methods. Springer-Verlag.

Rockafellar, R. T., and R. B. Wets (1998): Variational Analysis. Springer-Verlag, Berlin.

van der Vaart, A. W. (1999): Asymptotic Statistics. Cambridge University Press.

van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer-Verlag, New York.

Vuong, Q. (1989): "Likelihood-ratio tests for model selection and non-nested hypotheses," Econometrica, pp. 307-333.