Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries. http://www.archive.org/details/likelihoodestimaOOcher

Massachusetts Institute of Technology
Department of Economics
Working Paper Series

A CLASS OF NONREGULAR ECONOMETRIC MODELS: LIKELIHOOD ESTIMATION & INFERENCE

Victor Chernozhukov
Han Hong

Working Paper 03-1
January 2003
Revised: May 9, 2003

Room E52-251, 50 Memorial Drive, Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://ssrn.com/abstract=386680

LIKELIHOOD ESTIMATION AND INFERENCE IN A CLASS OF NONREGULAR ECONOMETRIC MODELS

VICTOR CHERNOZHUKOV AND HAN HONG

Abstract. In this paper we study estimation and inference in structural models with a jump in the conditional density, where the location and size of the jump are described by regression lines. Two prominent examples are auction models, where the density jumps from zero to a positive value, and the equilibrium job search model, where the density jumps from one level to another, inducing kinks in the cumulative distribution function. An early model of this kind was introduced by Aigner, Amemiya, and Poirier (1976), but estimation and inference in such models remained an unresolved problem, with the important exception of the specific cases studied by Donald and Paarsch (1993a) and the univariate case in Ibragimov and Has'minskii (1981a). The main difficulty is the statistical non-regularity of the problem caused by discontinuities in the likelihood function. This difficulty also makes the problem computationally challenging. This paper develops estimation and inference theory and methods for such models based on likelihood procedures, focusing on the optimal (Bayes) procedures, including the MLEs.
We obtain results on convergence rates and distribution theory, and develop Wald and Bayes type inference and confidence intervals. The Bayes procedures are attractive both theoretically and computationally. The Bayes confidence intervals, based on the posterior quantiles, are shown to provide a valid large sample inference method with good small sample properties. This inference result is of independent practical interest due to the highly non-regular nature of the likelihood in these models, in which the maximum likelihood statistic, or any other finite dimensional statistic, is not asymptotically sufficient.

Key Words: Likelihood Principle, Point Process, Frequentist Validity, Posterior, Structural Econometric Model, Auctions, Equilibrium Search, Production Frontier

The original draft of this paper was presented at the December 2000 EC2 meetings in Dublin. This version is March 2003. We would like to thank Joe Altonji, Stephen Donald, Christian Hansen, Jerry Hausman, Joel Horowitz, Ivan Fernandez, Shakeeb Khan, Yuichi Kitamura, Rosa Matzkin, Whitney Newey, George Neumann, Harry Paarsch, Frank Schorfheide, Robin Sickles, Richard Spady, and Max Stinchcombe, as well as participants in the CEME conference, the EC2 meeting, MIT, Northwestern, Princeton, Rice, University of Texas at Austin, and Duke, for valuable suggestions. We thank three referees and the co-editor for numerous useful suggestions that improved the paper considerably. We are grateful to our advisor Takeshi Amemiya for suggesting this problem to us.

1. Introduction

This paper develops theory for estimation and inference methods in structural models with jumps in the conditional density, where the locations of the jumps are described by parametric regression curves.
The jumps in the density are very informative about the parameters of these curves, but they result in a non-regular and difficult inference theory, implying highly discontinuous likelihoods, non-standard rates of convergence and inference, and considerable implementation difficulties. Aigner, Amemiya, and Poirier (1976) proposed early models of this type in the context of production analysis. Many recent econometric models also share this interesting structure. For example, in structural procurement auction models, cf. Donald and Paarsch (1993a), the conditional density jumps from zero to a positive value at the lowest cost; in equilibrium job search models (Bowlus, Neumann, and Kiefer (2001)), the density jumps from one positive level to another at the reservation wage, inducing kinks in the wage distribution function. In what follows, we refer to the former model as the one-sided or boundary model, and to the latter model as the two-sided model. In these models, the locations of the jumps are linked to the parameters of the underlying structural economic model. Learning the parameters of the location of the jumps is thus crucial for learning the parameters of the underlying economic model.

Several early fundamental papers have developed inference methods for several cases of such models, including Aigner, Amemiya, and Poirier (1976), Ibragimov and Has'minskii (1981a), Flinn and Heckman (1982), Christensen and Kiefer (1991), Donald and Paarsch (1993a, 1993b, 1996, 2002), and Bowlus, Neumann, and Kiefer (2001).
Ibragimov and Has'minskii (1981a) (Chapter V) obtained the limit theory of the likelihood-based optimal (Bayes) estimators in the general univariate non-regression case, and obtained the properties of the MLE in the case of a one-dimensional parameter. van der Vaart (1999) (Chapters 9.4-9.5) discussed the limit theory for the likelihood in the univariate Uniform and Pareto models, including Pareto models with parameter-dependent support and additional shape parameters. Paarsch (1992) and Donald and Paarsch (1993a, 1993b, 1996, 2002) introduced and developed the theory of likelihood (MLE) and related procedures in the one-sided regression models with discrete regressors, demonstrated the wide prevalence of such models in structural econometric modeling, and stimulated further research in this area.

Nevertheless, the general inference problem posed by Aigner, Amemiya, and Poirier (1976) has remained unsolved previously. Very little is known about likelihood-based estimation and inference in the general two-sided regression model. In the general one-sided regression model, the problem of likelihood-based estimation and inference also remains an important unresolved question, an important exception being the MLE theory for discrete regressors developed by Donald and Paarsch (1993a).1 The general theory of such regression models is more involved and has a substantively different structure than the corresponding theory for the univariate (non-regression) or dummy regressor case. Moreover, there is a considerable implementation problem caused by the inherent computational difficulty of the classical (maximum likelihood) estimates. This paper offers solutions to these

1 There is also a literature on ad hoc "linear programming" estimators of linear boundary functions covering the continuous covariate case, see e.g. Smith (1994); Chernozhukov (2001) provides a detailed review and other related methods.
open questions by providing estimation and inference theory for both one- and two-sided models with general regressors. These methods rely on the likelihood-based optimal (Bayes) procedures3 and also on the MLE.2 This paper demonstrates that these are tractable, computationally and theoretically attractive ways to obtain parameter estimates, construct confidence intervals, and carry out statistical inference. These results cover Bayes type inference as well as Wald type inference.

We show that Bayes inference methods, based on the posterior quantiles, are valid in large samples and also perform well in small samples. Moreover, these inference methods are tractable and require no knowledge of asymptotic theory on the practitioner's part. These estimation methods are also attractive due to their well-known finite-sample average risk optimality. They are computationally attractive when carried out through the Markov Chain Monte Carlo procedure (MCMC), see e.g. Robert and Casella (1998), which helps avoid the inherent curse of dimensionality in the computation of the classical (maximum likelihood) estimates.

All of these results are preceded by a complete large sample theory of the likelihood for these models, which is useful not only for the present analysis but also for any kind of inference based on the likelihood principle. Importantly, we show that the MLE is generally not an asymptotically sufficient statistic in these models (in contrast to the non-regression case or the dummy regressor case), and the likelihood contains more information than the MLE does. Therefore, the totality of likelihood-based procedures are generally not functions of the MLE asymptotically, as they are in the non-regression or dummy regressor case (or in regular models). This motivates the study of the entire likelihood and of the wide class of likelihood-based procedures.
2 In some special cases, such as homoscedastic exponential linear regression models, these estimators coincide with the MLE asymptotically. In fact, we show in this paper that, unlike in the univariate models such as the Uniform and Pareto models discussed in detail by van der Vaart (1999), or in the dummy regression case, there are no finite-dimensional sufficient statistics. The MLE is not asymptotically sufficient either, making MLE inference theory difficult to analyze. We show that the limit likelihoods depend on multivariate Poisson point processes with a complex correlation structure.

3 This terminology follows that of Berger (1993), p. 17.

Our work is also related to a recent important contribution by Hirano and Porter (2002). They provide a detailed analysis of asymptotic minimax efficiency in a class of boundary models.4 They employ an exponential-shift experiment framework along with group analysis to generate new results and insights on the efficiency structure of Bayesian estimators (which also motivate the present research) and prove the inefficiency (sub-optimality) of the MLE under the common mean squared and absolute deviation criteria. We study a different set of questions, focusing on the estimation and inference problem in the general two-sided and boundary models.

We briefly summarize the contributions of this paper as follows. First, we derive the large sample behavior of the likelihood ratio process and show that it approaches a simple, explicit function of a Poisson process that tracks the extreme (near-to-jump) events and depends on the regressors in an interesting way. This limit result is useful for any inference that relies on the likelihood principle. The limit is also useful since it can be easily simulated in order to evaluate the limit distributions of derived estimators and various likelihood-based statistics. To our knowledge, these results are new. Second, we prove the
consistency, derive the rates of convergence, and provide the limit distribution of the likelihood-based optimal estimators (BE) and the MLE. These results are basic prerequisites for using these estimators in empirical work. More importantly, these results justify general Wald type inference based on the limit distribution, subsampling, and the parametric bootstrap.

Third, we show that posterior τ-quantiles are asymptotically (1 − τ)-quantile unbiased estimators of the true parameters. This property implies the validity of Bayes type confidence intervals based on the posterior quantiles. These confidence intervals provide valuable practical inference methods since they are simple to implement and require no detailed knowledge of asymptotic theory. This frequentist validity result is also of general theoretical interest because it covers models with complicated likelihoods where no finite dimensional sufficient statistics exist asymptotically, and its proof applies more generally to other problems. We further generalize this result to cover Bayes inference about general smooth functions of the parameters, and show that it provides valid inference asymptotically.

Fourth, we briefly discuss how the well-known finite-sample (average-risk) optimality of Bayes procedures carries over to the limit.5 The discussion is auxiliary and is given here to prove some of the previously stated results and for the justification of

4 Hirano and Porter (2002) also derive the limit distributions of BE's for continuous covariate boundary models in a different form. (As Hirano and Porter (2002) note, the present treatment of continuous covariates appears to precede theirs somewhat.) An important difference is that our limit likelihood is stated in terms of a simple transform of a Poisson process, hence it is quite simple to simulate for inference purposes (which is our focus). Hirano and Porter (2002)'s limit is implicit, given as a process indexed by continuous covariate values.
Consequently, their result is more suited for efficiency analysis (which is their focus), and its use for classical inference without the (equivalent) Poisson type representations obtained in this paper may be infeasible in practice. Also, the present paper focuses on inference in both one- and two-sided models.

5 The optimality of Bayes estimates is treated in considerable detail elsewhere in the literature. The recent contribution by Hirano and Porter (2002) provides a detailed limit-of-experiments analysis for a class of boundary models. Lehmann and Casella (1998) provides a basic discussion, and van der Vaart (1999), Chapters 9.3-9.4, treats the Uniform and Pareto models in the non-regression case.

BE's, including the MLE. The exact MLE generally does not coincide with the approximate MLE, defined as a limit of BE's under a sequence of loss functions that approximates the delta function, such as the 0-1 loss 1[|u| > ε]/ε or (truncated) p-th power losses.6 Such loss functions penalize mistakes differently than the squared loss function. In that sense, the exact MLE is approximately optimal under any loss function that approximates the delta function, may perform better than other likelihood procedures, such as posterior means or posterior medians, under these alternative loss functions, and generally cannot be dominated by any other procedure when comparisons are made across different loss functions. Such comparisons are relevant when the empirical investigator does not know the loss function of the end user of her results. Thus, the MLE method, advocated by Donald and Paarsch (1993a) in the context of the discrete-covariate boundary models, provides a valuable method for estimation and inference in both the two-sided and one-sided regression models.7

Fifth, we show through simulation examples based on an empirical auction model from Paarsch (1992) that (1) particular BE's and MLEs work quite well, and their relative performance depends critically on the measure of risk, and (2) the Bayes confidence intervals and the Wald confidence
intervals perform comparably; the Bayes confidence intervals, based on the posterior quantiles, are as accurate as the Wald confidence intervals based on the parametric bootstrap, but are much less expensive computationally, and they produce the shortest confidence intervals among the methods compared. Importantly, this also implies that the exact MLE, even when bias-corrected, generally does not coincide with the optimal procedures, such as the posterior means or posterior medians. Thus, this paper justifies a whole array of useful and practical inference techniques, ranging from Wald type to Bayes type inference methods.

6 Ibragimov and Has'minskii (1981b), p. 93, prove a fundamental result on the generic asymptotic efficiency of Bayes procedures under general, non-primitive (hard-to-verify) conditions.

7 The approximate MLEs are exact Bayes procedures and are optimal in that regard, and hence can be a good substitute for the exact MLE. Additional important and distinct properties of the MLE include (i) invariance to reparameterization and (ii) independence from prior information. Arguably, both properties are very useful.

2. The Model, Examples, Assumptions, Procedures

This section describes the model and provides an informal discussion of the assumptions, results, and inference procedures developed in the later sections of the paper.

2.1. The Model. It is convenient to describe the class of models we consider in terms of a regression model where the errors have a discontinuous conditional density. Let (Y_i, X_i), i = 1, ..., n, denote the random iid sample of size n generated by the model

Y_i = g(X_i, β) + ε_i,    (2.1)

where Y_i is the dependent variable, X_i is a vector of covariates that has distribution function F_X, and the error ε_i has conditional density f(ε|X_i, β, α).
The central assumption of the model is that the conditional density of the error, f(ε|X_i, β, α), has a jump (or discontinuity), normalized to be at 0, the size of which may depend on the parameters β and α:

lim_{ε↓0} f(ε|x, β, α) = p(x, β, α),
lim_{ε↑0} f(ε|x, β, α) = q(x, β, α),    (2.2)
p(x, β, α) > q(x, β, α) + δ,  δ > 0,  ∀x ∈ X = support(X), ∀(β, α) ∈ B × A.

Hence, in this model the location of the discontinuity in the conditional density of Y given X is given by the regression function g(X, β), which is described by the parameter β. Thus, there are two sets of parameters, collected into a vector γ = (β', α')', where β ∈ B ⊂ R^{d_β} affects both the regression curve and possibly the shape of the error distribution, while α ∈ A ⊂ R^{d_α} affects the shape of the error distribution only. We assume that the parameter space Θ = B × A is compact and convex, and that the true parameter belongs to the interior of this set.

We consider two models: the one-sided model and the two-sided model. In the one-sided model, the conditional density jumps from zero to a positive constant. In the two-sided model, the conditional density jumps from one positive value to another. The one-sided model is a special case of the two-sided model. In addition, Aigner, Amemiya, and Poirier (1976) suggested that the two-sided model may be applied to one-sided models in the presence of outliers, using the additional side in the density to model the outliers. More generally, the two-sided model approximates models with a sharp change in the density, where the location of the change depends on parameters and regressors; the finite sample distribution of parameter estimates in such models is approximated by that in the model with a density jump. The two-sided models also naturally arise in equilibrium search models, see Bowlus, Neumann, and Kiefer (2001).

The key feature of the regression model is that the conditional density of Y given X jumps at the location g(X, β), which depends on the parameter β and the covariates X.
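The jump condition (2.2) can be illustrated numerically. The following sketch (our own illustration, not taken from the paper) simulates errors from a simple two-sided density with right limit p and left limit q at zero, and recovers both one-sided limits from small bins next to the origin; the density form and the values p = 0.7, q = 0.3 are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(7)

# A simple two-sided density satisfying (2.2): f(e) = p * exp(-e) for e >= 0 and
# f(e) = q * exp(e) for e < 0, with p + q = 1 and p > q. The values p = 0.7,
# q = 0.3 are hypothetical choices for illustration.
p, q, n = 0.7, 0.3, 200_000
positive_side = rng.uniform(size=n) < p
magnitude = rng.exponential(size=n)
eps = np.where(positive_side, magnitude, -magnitude)

# Estimate the one-sided density limits at zero from small bins of width h:
h = 0.05
right = np.mean((eps >= 0) & (eps < h)) / h   # estimates p, the right limit
left = np.mean((eps < 0) & (eps > -h)) / h    # estimates q, the left limit
print(right, left)
```

The estimated right limit visibly exceeds the left one; this gap at the regression line is exactly the discontinuity that makes the likelihood non-regular.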
This feature generates sharp discontinuities in the likelihood, which create statistical non-regularities and computational difficulties. On the other hand, the discontinuities are highly informative about the location g(X, β) and imply estimability of β at rate n. (The simplest univariate example is the uniform model U(0, β), where β is estimated at the rate n.) Inference about α, in contrast, is standard in many regards.

Note that the classification of the model's parameters into α and β is motivated statistically, as in Donald and Paarsch (1993a) and van der Vaart (1999) (who considered univariate Pareto models). The boundary parameters β usually coincide with the main economic parameters, as indicated earlier. If they do not, and Wald type inference is to be used, then one needs to reparameterize the economic parameters into α and β, see e.g. Donald and Paarsch (1993a).8 However, the practical use of Bayes type inference or parametric bootstrap methods does not require such reparameterization.

In the following, we briefly review our regularity conditions and explain the results with the help of a structural example, which will serve to illustrate the plausibility of the conditions. It also provides an example for the Monte Carlo work.

Example: Independent Private Value Procurement Auction. Consider the following econometric model of an independent private value procurement auction, formulated in Paarsch (1992) and Donald and Paarsch (2002). Here, Y_i is the winning bid for auction i, m_i denotes the number of bidders in the i-th auction, and Z_i denotes other observed characteristics of the auctions; X_i = (Z_i, m_i) describes variation across auctions. The bidders' privately observed costs V follow an iid Pareto distribution given X, i.e. the density of V is described by

f_V(v|x) = θ_2 θ_1^{θ_2} / v^{θ_2 + 1},  v ≥ θ_1,

where θ_1 > 0 and θ_2 > 0 are parameterized as functions of (X, β) (this dependence is suppressed for notational convenience), e.g. θ_1(X, β) = exp(β_1'Z) and θ_2(X, β) = exp(β_2'Z).
Assuming the Bayesian Nash Equilibrium solution concept, the equilibrium bidding function satisfies

σ(v) = v + ∫_v^∞ (1 − F_V(u|x))^{m−1} du / (1 − F_V(v|x))^{m−1},

which is the cost plus the expected net revenue conditional on winning the auction. Evaluating σ(v) at v = θ_1 gives the lower end of the conditional support of the winning bid. As shown in Paarsch (1992), this implies the following density function of the winning bid Y, which is the first order statistic generated by the specified bidding rule, conditional on covariates X:

f_Y(y|x, θ_1, θ_2) = θ_2 m [θ_1 θ_2(m − 1)/(θ_2(m − 1) − 1)]^{θ_2 m} / y^{θ_2 m + 1},  y ≥ θ_1 θ_2(m − 1)/(θ_2(m − 1) − 1),

where θ_2(m − 1) > 1 is assumed.

8 A referee pointed out the following example. Suppose g(θ) = θ_1 θ_2, where θ_2 also affects the shape of the error distribution. Although this example does not correspond to the economic model we used in the simulations, it highlights the important issue of reparameterization. In this case the asymptotic theory requires reparameterization into β = θ_1 θ_2 and α = θ_1; the estimates of θ_1 and θ_2 are then deduced from the estimates of α and β, and Wald type inference may be carried out using the Delta method, preferably based on a second order expansion, so that the finite-sample estimation uncertainty is not neglected. E.g., writing β̂ − β ≈ n^{−1} Z_β and α̂ − α ≈ n^{−1/2} Z_α, where Z_β and Z_α are the limit distributions of β̂ and α̂, for θ_2 = β/α one has

θ̂_2 − θ_2 ≈ (β̂ − β)/α − β(α̂ − α)/α² + β(α̂ − α)²/α³.

This expansion can be important because in finite samples the variability of β̂ may be of comparable or larger order than that of the second order term in α̂. Of course, one could use the first order Taylor expansion θ̂_2 − θ_2 ≈ −β(α̂ − α)/α² + o_p(n^{−1/2}) too, but this approximation is less accurate.

Therefore, this is an example of the one-sided regression model (2.1), Y_i = g(X_i, β) + ε_i, with

g(X, β) = θ_1(X, β) θ_2(X, β)(m − 1) / (θ_2(X, β)(m − 1) − 1),

and where ε_i has conditional density f(ε|X, β) = f_Y(g(X, β) + ε | X, β) conditional on X.
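The mechanics of this example can be checked by simulation. The sketch below is our own illustration; the parameter values θ_1 = 1, θ_2 = 1.5, and m = 4 are hypothetical. It draws Pareto costs, applies the equilibrium markup implied by the bidding rule (which has a closed form for Pareto costs), and verifies that the winning bid never falls below the support boundary g = σ(θ_1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values for illustration: support parameter theta1,
# shape theta2, and m bidders per auction (so m - 1 opponents per bidder).
theta1, theta2, m = 1.0, 1.5, 4

def draw_costs(size):
    # Pareto(theta1, theta2) draws by inverse-CDF: V = theta1 * U**(-1/theta2).
    return theta1 * rng.uniform(size=size) ** (-1.0 / theta2)

# For Pareto costs the integral in the bidding rule has the closed form
# sigma(v) = v * theta2*(m-1) / (theta2*(m-1) - 1), valid when theta2*(m-1) > 1.
markup = theta2 * (m - 1) / (theta2 * (m - 1) - 1.0)

# Winning bid = sigma(lowest cost) in each of 5000 simulated auctions.
winning_bids = np.array([markup * draw_costs(m).min() for _ in range(5_000)])

g = theta1 * markup   # conditional support boundary g(X, beta) = sigma(theta1)
print(winning_bids.min(), g)   # no simulated bid falls below g
```

The boundary g is exactly where the conditional density of the winning bid jumps from zero to a positive value, which is the one-sided structure that the rest of the paper exploits.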
2.2. Regularity Conditions. The main regularity conditions C0-C5 are collected in Appendix A. They serve to impose five basic types of assumptions: (a) identification and a compact, convex parameter space (with the true parameters in its interior); (b) continuous differentiability of the regression function g(x; β) in β; (c) nondegeneracy and boundedness of the vector ∂g(X, β)/∂β; (d) continuous differentiability and boundedness of the density function f(ε|x, γ), of its first partial derivatives in γ and ε, and of its second partial derivatives in γ (except at ε = 0); and (e) continuous differentiability and integrability of the first and second partial derivatives of ln f(ε|x, γ) in γ.

Conditions of types (a)-(c) are standard in nonlinear likelihood analysis. Smoothness conditions of type (d) represent a generalization of the conditions of Ibragimov and Has'minskii (1981a). Conditions of type (e) are the standard conditions for regular likelihood models, e.g. as in van der Vaart (1999), Chapter 7, and reflect that inference about α is standard if β is known. These conditions are flexible enough to cover various auction models, frontier production function models, and equilibrium search models.9

2.3. Definitions of Estimation Procedures and Informal Overview of Results. Define the likelihood function as10

L_n(γ) = Π_{i ≤ n} f(Y_i − g(X_i, β) | X_i; γ).    (2.3)

The optimal Bayes estimators are the likelihood-based estimators that minimize the average expected risk, where the risk is computed under different parameter values and then averaged over these parameter values.

9 Appendix A also gives an example of verification of these conditions in the auction model that underlies our Monte-Carlo simulations.

10 The likelihood can be made unconditional by multiplying through with the density (probability mass) function of {X_i, i ≤ n}. This term is omitted because it does not affect the definition of the likelihood ratio and can be otherwise canceled out.
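To see concretely why a likelihood of the form (2.3) is hard to maximize directly, consider a minimal one-sided toy model (our own illustration, unrelated to the auction design above) with unit exponential errors; its log-likelihood drops to minus infinity as soon as any observation falls below the fitted regression line:

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal one-sided toy model (our illustration, not the auction design):
# Y = b0 * X + E with unit exponential E, so f(e|x) jumps from 0 to 1 at e = 0
# and the conditional density of Y jumps at the line g(X, b) = b * X.
n, b0 = 200, 1.0
x = rng.uniform(0.5, 1.5, size=n)
y = b0 * x + rng.exponential(size=n)

def loglik(b):
    # Log of (2.3) for this model: minus the sum of residuals, but -inf as soon
    # as a single observation falls below the fitted line.
    e = y - b * x
    return -np.inf if np.any(e < 0) else -e.sum()

# Finite just below the true b0, typically -inf just above it: the criterion is
# discontinuous in b, so gradient-based optimization is unreliable here.
print(loglik(b0 - 0.05), loglik(b0 + 0.05))
```

Any b that places a single observation below the line has zero likelihood, which is the computational difficulty that the Bayes/MCMC approach described next sidesteps.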
The procedures γ̂ are generally of the following form:

γ̂ = arg inf_{γ ∈ Θ} ∫_Θ ρ_n(γ − γ') [ L_n(γ') μ(γ') / ∫_Θ L_n(γ'') μ(γ'') dγ'' ] dγ',    (2.4)

where ρ_n(γ) = ρ(nβ, √n α) is the loss function, μ(·) is the weight density function (prior density) on Θ, and the term in brackets is the posterior density. The loss function is made explicitly dependent on the sample size n for purposes of the asymptotic analysis, as in Ibragimov and Has'minskii (1981b), but this may be ignored in practice.11 Convexity and standard conditions, collected as D1-D3 in Appendix A, are imposed on the loss function ρ and the prior μ. The optimality properties of the Bayes procedures carry over to the limit. Examples of such loss functions include

(A) ρ(z) = z'z, a quadratic loss function;
(B) ρ(z) = Σ_{j=1}^d |z_j|, an absolute deviation loss function;
(C) ρ(z; τ) = Σ_{j=1}^d (1(z_j > 0) − τ) z_j, τ ∈ (0, 1), a variant of the Koenker and Bassett (1978) check (deviation) loss function.

Solutions of (2.4) with loss functions (A), (B), (C) generate BEs γ̂ that are, respectively, (a) a vector of posterior means, (b) a vector of posterior medians (for each parameter component), and (c) a vector of posterior τ-th quantiles. Since BEs become very difficult to compute when ρ is not convex, we focus on convex loss functions for pragmatic reasons. However, the proofs of the main results apply more generally to other loss functions specified in Ibragimov and Has'minskii (1981b). In practice, γ̂ can be computed using Markov Chain Monte Carlo methods, which produce a sequence of draws

(γ^(1), ..., γ^(b)),    (2.5)

whose marginal distribution is given by the posterior. Appropriate statistics of that sequence can be taken depending on the choice of ρ (e.g., the component-wise means or medians for cases (A) and (B) above). More generally, estimators γ̂ are solutions of well defined globally convex differentiable optimization problems.
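A minimal sketch of such a computation follows. It is our own illustration, using the toy one-sided exponential model and a flat prior rather than the paper's auction specification; it runs a random-walk Metropolis sampler and reads off the estimators corresponding to losses (A)-(C) from the draws:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy one-sided model for illustration (not the paper's auction design):
# Y = b0 * X + E with E unit exponential; with a flat prior the posterior is
# proportional to the discontinuous likelihood (2.3).
n, b0 = 200, 1.0
x = rng.uniform(0.5, 1.5, size=n)
y = b0 * x + rng.exponential(size=n)

def logpost(b):
    e = y - b * x
    return -np.inf if np.any(e < 0) else -e.sum()

# Random-walk Metropolis: MCMC only evaluates, never differentiates, the
# likelihood, so the discontinuity at the boundary poses no difficulty.
b = 0.5
lp = logpost(b)
draws = []
for _ in range(20_000):
    prop = b + rng.normal(scale=0.01)
    lp_prop = logpost(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        b, lp = prop, lp_prop
    draws.append(b)
draws = np.array(draws[5_000:])   # discard burn-in

# Losses (A), (B), (C) map to posterior mean, median, and tau-th quantile:
print(draws.mean(), np.median(draws), np.quantile(draws, 0.95))
```

Every accepted draw automatically satisfies the support restriction, so the sampler never visits parameter values with zero likelihood; the desired point estimates are then simple summaries of the chain.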
The computational attractiveness of estimation and inference based on the Bayes procedures stems from the use of Markov Chain Monte Carlo (MCMC) and from the statistical motivation of the definition of the Bayes procedures. Since the Bayes point estimates and the interval estimates are typically means, medians, or quantiles of the posterior distribution, by drawing the posterior distribution we can compute these quantities with an accuracy of order 1/√b, given an MCMC sample of size b from the posterior. Furthermore, another motivation for (2.4) is that any optimal (admissible) estimation procedure is a Bayes procedure or a Bayes procedure with improper priors, cf. Wald (1950).12 In contrast, the computation of the exact MLE requires optimization of a highly non-convex, discontinuous, and otherwise highly nonlinear likelihood. The MLE can be estimated by grid-based algorithms or MCMC only with an accuracy that worsens exponentially in the parameter dimension.

12 Given the MCMC series (2.5), γ̂ solves arg inf_{γ ∈ Θ} (1/b) Σ_{t ≤ b} ρ_n(γ − γ^(t)), which is a globally convex and smooth (if ρ is smooth) optimization problem.

The BEs and the MLE are consistent, and

β̂ − β_0 = O_p(n^{−1}) and α̂ − α_0 = O_p(n^{−1/2}).    (2.6)

The BE's are shown to converge in distribution to Pitman13 functionals of the limit likelihood ratio process. We first develop a complete large sample theory of the likelihood for these models, which is a prerequisite for any inference based on the likelihood principle. In particular, we obtain an explicit form of the limit likelihood ratio process as a function of a Poisson process that can be easily simulated. This result implies that the limit distributions of the estimators can be simulated for purposes of Wald type inference through either (a) simulation of the limit likelihood process, or (b) resampling techniques, including subsampling and the parametric bootstrap. Subsampling may be more robust than the other methods under local misspecification of the parametric assumptions.
However, the resampling methods are much more computationally expensive and require much more computational time than Bayes type inference. Simulating the limit distribution is comparable to Bayes inference in terms of the computational expense, due to the linearity of the limit process.

An attractive practical alternative is the Bayes inference based on the posterior quantiles. Our results establish its large sample frequentist validity. Consider constructing a (1 − τ) × 100% confidence interval for r_n(γ), where r_n is a smooth real function that possibly depends on n. Define the τ-th posterior quantile of the posterior distribution of {r_n(γ), γ ∈ Θ} as

c(τ) = arg inf_c ∫_Θ ρ(r_n(γ) − c; τ) [ L_n(γ) μ(γ) / ∫_Θ L_n(γ') μ(γ') dγ' ] dγ,    (2.7)

where ρ(z; τ) is the check function defined above. In practice, c(τ) is computed by taking the τ-th quantile of the MCMC sequence evaluated at r_n, i.e. of (r_n(γ^(1)), ..., r_n(γ^(b))). The resulting (1 − τ) × 100%-confidence intervals are given by

[c(τ/2), c(1 − τ/2)],    (2.8)

and, under mild conditions on r_n,

lim_{n→∞} P{c(τ/2) ≤ r_n(γ_0) ≤ c(1 − τ/2)} = 1 − τ,    (2.9)

which is one of the main results of this paper. A pragmatic motivation for the Bayesian intervals is that the empirical researcher does not need to have detailed knowledge of complex asymptotic limit theory to apply them. She can simply compute the intervals through generic MCMC methods, and then rely upon the present results that establish the large sample frequentist validity of these intervals.

13 We follow the terminology of Ibragimov and Has'minskii (1981b), p. 21.

Another classical procedure is the MLE γ̂ = (β̂', α̂')', defined by maximizing the likelihood function:

γ̂ = arg sup_{γ ∈ Θ} L_n(γ).

The MLE is a limit of BEs under any sequence of loss functions that approximates the delta function. We shall only briefly discuss the limit distribution of the exact MLE for editorial reasons. A detailed analysis of the MLE is given in the technical report, cf. Chernozhukov and Hong (2003).
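The posterior-quantile interval construction (2.7)-(2.8) can be sketched in a few lines. In the sketch below (our own illustration) a gamma-distributed array stands in for the MCMC output (2.5) evaluated at r_n, since any posterior sampler could have produced it:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for the MCMC output (2.5) evaluated at a smooth function r_n(gamma);
# in an application these draws come from the posterior sampler.
r_draws = rng.gamma(shape=2.0, scale=1.0, size=10_000)

def check_loss(z, tau):
    # The Koenker-Bassett check function used in loss (C) and in (2.7).
    return ((z > 0) - tau) * z

def posterior_quantile(draws, tau):
    # c(tau) minimizes the posterior expected check loss (2.7); on a finite
    # sample the minimizer is the empirical tau-quantile, recovered here by a
    # coarse grid search to make the argmin characterization explicit.
    grid = np.sort(draws)[::100]
    losses = np.array([check_loss(draws - c, tau).mean() for c in grid])
    return grid[np.argmin(losses)]

tau = 0.10
lo, hi = np.quantile(r_draws, tau / 2), np.quantile(r_draws, 1 - tau / 2)
print([lo, hi])   # the 90% Bayes interval [c(tau/2), c(1 - tau/2)] from (2.8)
```

On a finite MCMC sample the minimizer of the average check loss is simply the empirical τ-quantile, so in practice one takes sample quantiles of the draws, exactly as the text indicates.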
The MLE converges in distribution to a random variable that maximizes the limit likelihood ratio.

3. Large Sample Theory

This section contains the main formal results of the paper. Section 3.1 examines the large sample properties of the likelihood ratio function. Characterization of the limiting behavior of the likelihood is necessary for obtaining all of the main results and is useful for any likelihood based inference methods. Section 3.2 provides an intuitive discussion of this result and of the subsequent results through an example. Section 3.3 describes the large sample properties of the optimal Bayes estimators and both Wald and Bayes type inference procedures. Section 3.5 briefly discusses the limit theory of the exact MLE.

3.1. Large Sample Theory for the Likelihood. A common first step in modern asymptotic analysis is to find the finite-dimensional marginal limit of the likelihood ratio process or of other criterion functions, e.g. van der Vaart (1999) and Knight (2000). After appropriate strengthening, the limit serves to describe the asymptotic distribution of all likelihood based estimators. Such an initial step is sometimes called the convergence of experiments, see van der Vaart (1999). Consider the local likelihood ratio function

ℓ_n(z) = L_n(γ_n(δ) + H_n z) / L_n(γ_n(δ)),    (3.1)

where γ_n(δ) = γ_0 + H_n δ, for any δ, denotes the true local parameter sequence, and H_n is a diagonal matrix with 1/n in the first d_β = dim(β) diagonal entries and 1/√n in the remaining d_α = dim(α) diagonal entries. Consideration of the local parameter sequence is necessary for the subsequent results. The scaling by H_n corresponds to the convergence rates: n for β and √n for α.14 The function ℓ_n(z) is said to converge in distribution to ℓ_∞(z) in the finite-dimensional sense if for any finite k and any points z_1, ..., z_k,

(ℓ_n(z_j), j ≤ k) →_d (ℓ_∞(z_j), j ≤ k),

where ℓ_∞(·) is called the finite-dimensional limit.
In this section, →_d denotes convergence in distribution. We partition the localized parameter z accordingly into z = (u', v')', where u ∈ R^{d_β} corresponds to the localized location parameters and v ∈ R^{d_α} corresponds to the localized shape parameters.

[¹⁴ The convergence rates are established as parts of the proof of the subsequent theorems, and follow from the exponential decay of the likelihood tails, E ℓ_n(z)^{1/2} ≤ const · e^{−c|z|} as |z| → ∞; see the proof of Theorem 3.2.]

Theorem 3.1 (Limits of the Likelihood Function). Given conditions C0-C5 collected in Appendix A, the finite-dimensional weak limit of the likelihood ratio process ℓ_n(z) takes the following form:

  ℓ_∞(z) = ℓ_{1∞}(v) × ℓ_{2∞}(u),   (3.2)

where

  ℓ_{1∞}(v) = exp( W'v − v'Jv/2 ),   W = N(0, J),   J = E_{P_{γ0}}[ ℓ_i(γ_0) ℓ_i(γ_0)' ],   ℓ_i(γ) = (∂/∂α) ln f(ε_i | X_i, γ),

  ℓ_{2∞}(u) = exp( u'm + ∫_{R×X} l_u(j, x) dN(j, x) ),   m = E_{P_{γ0}} A(X)[ p(X) − q(X) ],   A(x) = ∂g(x, β_0)/∂β,

  l_u(j, x) = ln( q(x)/p(x) ) · 1[ 0 < j ≤ A(x)'u ] + ln( p(x)/q(x) ) · 1[ 0 > j > A(x)'u ],

where p(x) = p(x, γ_0) and q(x) = q(x, γ_0) [the convention of Ibragimov and Has'minskii (1981b) applies to the case when q(x) = 0: ln 0 = −∞, 1/0 = ∞, ∞ · 0 = 0, see equation (3.6) below]; N(·) is a Poisson random measure,

  N(·) = Σ_{i≥1} 1[ (J_i, X_i) ∈ · ] + Σ_{i≥1} 1[ (−J'_i, X'_i) ∈ · ],

with points

  J_i = Γ_i / p(X_i),   Γ_i = E_1 + ... + E_i,   i ≥ 1,   (3.3)
  J'_i = Γ'_i / q(X'_i),   Γ'_i = E'_1 + ... + E'_i,   i ≥ 1,   (3.4)

where {X_i, E_i, i ≥ 1} is an iid sequence of variables, X_i follows the law F_X, and E_i is a unit exponential variable; {X'_i, E'_i, i ≥ 1} is an independent copy of {X_i, E_i, i ≥ 1}; and both sequences are independent of W.

Remark 3.1 (Alternative Form). To analyze ℓ_{2∞}(u) further, write the Lebesgue integral ∫_{R×X} l_u(j, x) dN(j, x) appearing in the statement of Theorem 3.1 as

  Σ_{i=1}^∞ l_u(J_i, X_i) + Σ_{i=1}^∞ l_u(−J'_i, X'_i)
   = Σ_{i=1}^∞ ln( q(X_i)/p(X_i) ) · 1[ 0 < J_i ≤ A(X_i)'u ] + Σ_{i=1}^∞ ln( p(X'_i)/q(X'_i) ) · 1[ 0 > −J'_i > A(X'_i)'u ],   (3.5)

which is a simple function of the variables {X_i, X'_i, J_i, J'_i}.
This suggests that the limit likelihood function can be simulated simply by generating sequences {X_i, J_i, X'_i, J'_i, i ≤ b} according to the distributions specified in Theorem 3.1, for large b, and then evaluating the corresponding expressions. In practice, the quantities p(X_i) and q(X_i) are replaced by their estimates, and F_X is replaced by the empirical distribution function. This replacement is permissible for purposes of large sample inference, see e.g. Chernozhukov (2001).

Remark 3.2 (Boundary Case). There is a drastic simplification of ℓ_{2∞}(u) in the one-sided (boundary) model. Since q(X) = 0 a.s., using the rules stated in Theorem 3.1,

  Σ_{i=1}^∞ l_u(J_i, X_i) + Σ_{i=1}^∞ l_u(−J'_i, X'_i) = Σ_{i=1}^∞ (−∞) · 1[ 0 < J_i ≤ A(X_i)'u ]
   = { 0 if J_i > A(X_i)'u, for all i ≥ 1;  −∞ otherwise }.   (3.6)

Hence

  ℓ_{2∞}(u) = { exp(u'm) if J_i > A(X_i)'u, for all i ≥ 1;  0 otherwise },   m = E A(X) p(X).   (3.7)

Thus in the one-sided models, the limit depends only on the set of variables in (3.3) and does not depend on the variables in (3.4).

Remark 3.3 (Robustness to Misspecification). It is not difficult to observe from the proof of Theorem 3.1 (hence Theorems 3.2-3.4) that the limit theory for β is robust under local misspecification of the regression function g(x, β) of order o(1/n), and local misspecification of the heights of the density at the jump points, p(x, γ) and q(x, γ), of order o(1). It also appears that the qualitative nature of the limit theory would be preserved under local O(1/n)-misspecification of the regression function g(x, β), and possibly under O(1) misspecification of p(x, γ) and q(x, γ), as long as p(x, γ) > q(x, γ) for all x. A formal development is beyond the scope of this paper. Given that the mentioned conditions hold, the inference about α appears to be robust up to local o(1)-violations of the information matrix equality for α.
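The boundary-case expression (3.7) makes the simulation just described particularly transparent. The sketch below (our own illustration, not the authors' code; the samplers A, p, sample_x and the truncation level b are hypothetical user-supplied choices) draws one realization of ℓ_{2∞}(u) for a scalar u:

```python
import math
import random


def draw_l2inf(u, m, A, p, sample_x, b=1000, seed=0):
    """One draw of the boundary-case limit likelihood (3.7):
    l_2inf(u) = exp(u*m) if J_i > A(X_i)*u for all i, and 0 otherwise,
    where J_i = (E_1 + ... + E_i) / p(X_i) as in (3.3).
    The Poisson point process is truncated at b points."""
    rng = random.Random(seed)
    gamma = 0.0
    for _ in range(b):
        gamma += rng.expovariate(1.0)      # Gamma_i = E_1 + ... + E_i
        x = sample_x(rng)
        if gamma / p(x) <= A(x) * u:       # constraint J_i > A(X_i) u fails
            return 0.0
    return math.exp(u * m)
```

Since the Γ_i are increasing, only the first few points can ever bind for moderate u, which is why a finite truncation b is adequate in practice.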
Theorem 3.1 extends the results of Donald and Paarsch (1993a) on the boundary models with discrete regressors, and the results of Ibragimov and Has'minskii (1981a) on the univariate models.

Despite its unusual form, the limit likelihood has a simple structure. The term ℓ_{1∞}(v) is a standard expression for the limit likelihood ratio in regular models, and inference about the shape parameter α is thus asymptotically regular. The limit log-likelihood has a standard linear-quadratic expression, W'v − v'Jv/2, for a normal vector W = N(0, J) and the information matrix J. This implies, for example, that conventional estimators of α, such as the posterior mean and the MLE, have the standard limit distribution J^{−1}W = N(0, J^{−1}).¹²

Because β is unknown, the limit likelihood also includes a nonstandard term ℓ_{2∞}(u). The discontinuities in the density are highly informative about β and are of a local nature. A lot of the information about β is contained in the observations Y_i that are near the location of the discontinuity g(X_i, β), that is, those Y_i such that ε_i = Y_i − g(X_i, β) is close to zero. Thus the behavior of the extreme (closest to zero) ε_i's determines the behavior of ℓ_{2∞}(u), as further explained in Section 3.2. Consequently, one expects that the rate of convergence will be n (in contrast to √n for α) and that the limit distribution of likelihood based estimators of β will be determined by ℓ_{2∞}(u).

3.2. Informal Explanation Through an Example. Consider a simple model¹⁵

  Y_i = X_i'β_0 + ε_i,   (3.8)

where ε_i is a standard unit exponential variable. This is a boundary model with the density at the jump equal to p(X) = 1. Assume that there are no shape parameters α. (We do not discuss the inference about α, as it is regular, as stated earlier.) The model is a linearized, homoscedastic version of more realistic nonlinear models. Intuitively, the smallest values of ε_i will be most informative about β_0, as the likelihood L_n(β) = Π_{i≤n} e^{−(Y_i − X_i'β)} 1(Y_i − X_i'β > 0) can be positive only if Y_i − X_i'β > 0, i.e. nε_i > X_i' n(β − β_0), for all i. What we can learn about the parameter β_0 depends on these constraints.

[¹⁵ We thank a referee for suggesting using a similar example.]
Letting z = n(β − β_0), the likelihood ratio ℓ_n(z) = L_n(β_0 + z/n)/L_n(β_0) takes the form

  ℓ_n(z) = Π_{i≤n} ( e^{−ε_i + X_i'z/n} / e^{−ε_i} ) · 1( nε_i > X_i'z ),

which further reduces to

  ℓ_n(z) = e^{X̄'z} · 1( nε_i > X_i'z, for all i ),   X̄ = Σ_{i≤n} X_i / n.   (3.9)

Since X̄ →_p EX, the behavior of ℓ_n(z) for fixed z is determined by the lowest order statistics

  nε_(1), nε_(2), nε_(3), ...

The Rényi representation, see e.g. Embrechts, Klüppelberg, and Mikosch (1997), p. 189, allows these re-scaled order statistics to be represented almost surely as

  nε_(1) = (n/n) E_1,   nε_(2) = (n/n) E_1 + (n/(n−1)) E_2,   nε_(3) = (n/n) E_1 + (n/(n−1)) E_2 + (n/(n−2)) E_3, ...,

where {E_1, E_2, ..., E_n} is an iid sequence of unit-exponential variables. For given z, essentially only a bounded number of order statistics, say k, matters in the constraints in (3.9). Hence as n → ∞, for any finite k,

  ( nε_(1), nε_(2), ..., nε_(k) ) →_d ( E_1, E_1 + E_2, ..., Σ_{j=1}^k E_j ) = ( Γ_1, Γ_2, ..., Γ_k ).

Hence the marginal limit of ℓ_n(z) may be seen as

  ℓ_∞(z) = e^{(EX)'z} · 1( Γ_i > X_i'z, for all i ≥ 1 ),

where {Γ_i} is the sequence of gamma variables defined above, and {X_i} is the iid sequence of regressors with distribution F_X. Note that this is just a special case of the limit stated in equation (3.3), and the use of the point process methods in Theorem 3.1 formalizes the intuition described above and extends it to the more general heteroscedastic two-sided models.
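The Rényi representation implies that the k smallest rescaled errors (nε_(1), ..., nε_(k)) behave like the partial sums (Γ_1, ..., Γ_k), so that E[nε_(j)] ≈ j. A small Monte Carlo check of this convergence (our own illustration; the sample sizes are arbitrary):

```python
import random


def scaled_order_stat_means(n, k, reps, seed=0):
    """Monte Carlo means of the k smallest order statistics of n unit
    exponentials, rescaled by n. By the Renyi representation these means
    approach E[Gamma_j] = j for j = 1, ..., k as n grows."""
    rng = random.Random(seed)
    sums = [0.0] * k
    for _ in range(reps):
        draws = sorted(rng.expovariate(1.0) for _ in range(n))
        for j in range(k):
            sums[j] += n * draws[j]
    return [s / reps for s in sums]
```

For instance, scaled_order_stat_means(200, 2, 2000) returns values close to [1.0, 2.0], matching the limits Γ_1 and Γ_2.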
The limit in Theorem 3.1 is more complicated than the one just stated, for the following reasons:

1. In the more general two-sided models, there is also information about β in the ε_i's closest to zero from below. This explains the presence of the additional set of points (J'_i, X'_i) in equation (3.4), which arise as the limit distributions of the "extremes from below". (Also, there are no nuisance parameters α in this example, so that ℓ_∞(z) is a special case of ℓ_{2∞}(u).)

2. The density of ε_i may vary near zero, which changes the hazard rates of the limit gamma variables, resulting in their division by the hazard functions p(X_i) or q(X'_i) in the definition of the points (J_i, X_i) and (J'_i, X'_i) stated in Theorem 3.1.

Uncertainty about the additional shape parameter α leads to the presence of the additional term ℓ_{1∞}(v). The form of this term reflects that the information about α is given by the limit average score W, where the average score comes from the entire sample, and is independent of the information about β, which comes from a small portion of the sample and is based on the extreme statistics asymptotically. This follows from the standard proof of asymptotic independence of sample averages of general form and sample minimal order statistics, see e.g. Resnick (1986) for a general treatment and van der Vaart (1999), Lemma 21.19, for a simple example.

3.3. Large Sample Properties of Bayes Procedures. Given the above discussion, the following Theorem 3.2 can be easily conjectured. The Bayes estimator, centered at the true parameter γ_n(δ) and normalized by the convergence rates,

  Z_n = ( n(β̂ − β_n(δ))', √n(α̂ − α_n(δ))' )',

is related to the localized likelihood ratio ℓ_n(z) as follows: it minimizes the posterior loss redefined in terms of the local deviation from the true parameter:

  Γ̂_n(z) = ∫_{R^d} ρ(z − z') π_n(z') dz'.
Here π_n(z) is the posterior density for the local deviation z from the true parameter:

  π_n(z) = ℓ_n(z) μ( γ_n(δ) + H_n z ) / ∫_{R^d} ℓ_n(z') μ( γ_n(δ) + H_n z' ) dz',

where ℓ_n(z) is the local likelihood ratio process and μ is the prior density. As n → ∞, it can be conjectured from the discussion in the previous section that the local posterior density π_n(z) approaches the limit local posterior density

  π_∞(z) = ℓ_∞(z) / ∫_{R^d} ℓ_∞(z') dz',

which is a function of the limit likelihood only and does not depend on the prior information.

Theorem 3.2 (Properties of BEs). Suppose that the conditions of Theorem 3.1 and D1-D3 hold. Then:

1. The convergence rate is n for estimating β and √n for estimating α, i.e. Z_n = O_p(1).

2. Z_n →_d Z, where

  Z = arg inf_{z ∈ R^d} ∫_{R^d} ρ(z − z') ℓ_∞(z') dz'.   (3.10)

3. If ρ(z) = ρ_β(u) + ρ_α(v), then Z = (Z_β', Z_α')', where

  Z_β = arg inf_{u ∈ R^{d_β}} ∫ ρ_β(u − u') ℓ_{2∞}(u') du',   Z_α = arg inf_{v ∈ R^{d_α}} ∫ ρ_α(v − v') ℓ_{1∞}(v') dv',

and Z_β and Z_α are independent; in particular, n(β̂ − β_n(δ)) →_d Z_β and √n(α̂ − α_n(δ)) →_d Z_α.

Theorem 3.2 obtains the consistency and establishes the rates of convergence and the limit distributions of the BEs. The limit distributions are given in the form of a Pitman functional of the limit likelihood and are not difficult to simulate using MCMC methods, according to Remark 3.1. This result also justifies the use of the parametric bootstrap, cf. Remark 3.5. Z_β and Z_α are independent due to the factorization of ℓ_∞(z) into the independent terms ℓ_{1∞}(v) and ℓ_{2∞}(u).

In the stated result, the limit distribution of the Bayes estimator of the shape parameter α coincides with that of the MLE if the loss function ρ_α is symmetric (by Anderson's lemma, see van der Vaart (1999)), and is given by N(0, J^{−1}). This is not the case for the estimators of the location parameter β. If the separability ρ(z) = ρ_β(u) + ρ_α(v) does not hold, part 3 of Theorem 3.2 does not apply.
Furthermore, as shown below, the optimal estimators generally are not transformations of the MLE asymptotically, contrary to the non-regression or dummy-regression cases.

Remark 3.4 (Wald Inference with Subsampling). Theorem 3.2 immediately justifies the validity of subsampling for Wald type inference. Subsampling approximates the distribution of the estimator in the full sample based on the values of this estimator in the many smaller subsets of data. Theorem 2.2.1 in Politis, Romano, and Wolf (1999) applies provided (i) the estimates are consistent at polynomial in n rates, and (ii) the estimates possess a limit distribution. Both of these conditions are proven in Theorem 3.2. Thus, Theorem 3.2 immediately implies the validity of inference based on subsampling. Implementation protocols are standard and can be found in Politis, Romano, and Wolf (1999). However, subsampling (a) may not be of as high quality as the parametric bootstrap or simulation of the limit, but (b) is computationally less demanding than the parametric bootstrap and is likely to be more robust than other methods to local misspecification of the parametric models that changes the parameters of the limit distribution but does not affect the rates of convergence.

Remark 3.5 (Wald Inference with Parametric Bootstrap). As in Ibragimov and Has'minskii (1981a), the weak convergence results and the proofs can be stated uniformly in the parameter γ, and conditional on almost every realization of the covariate sequence {X_i, i ≤ n} (n → ∞). In order to do so, the notation must be made more complicated, in a manner similar to Ibragimov and Has'minskii (1981a), to denote the dependence of the limit on the parameter γ and on the realization of the covariate sequence. The uniform convergence in distribution is defined as the convergence of distributions under the Lévy metric, uniformly in the parameter γ and conditional on the covariate sequences.
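The subsampling protocol of Remark 3.4 can be sketched generically as follows (our own illustration following the generic scheme in Politis, Romano, and Wolf (1999); the function name, the use of random subsets, and the example below are assumptions, not the paper's implementation):

```python
import random


def subsample_distribution(data, estimator, b, rate, num_sub, seed=0):
    """Approximate the law of rate(n) * (estimate - truth) by recomputing
    the estimator on num_sub random subsets of size b, centering at the
    full-sample estimate, and rescaling by the subsample rate rate(b).
    Empirical quantiles of the returned sorted list give Wald-type CIs."""
    rng = random.Random(seed)
    full = estimator(data)
    return sorted(rate(b) * (estimator(rng.sample(data, b)) - full)
                  for _ in range(num_sub))
```

In the boundary example of Section 3.2 with no covariates, the natural choices would be estimator = min (the MLE of the location) and rate(b) = b, reflecting the n-rate rather than the usual √n-rate.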
This immediately implies that the parametric bootstrap is valid, in the usual sense that the bootstrap distribution converges to the limit distribution in probability under the Lévy metric, as long as the preliminary estimate is consistent, γ̂ →_p γ, conditional on the covariate samples. Any initial consistent estimator γ̂ may be used. Hence the parametric bootstrap can be used for Wald type inference based on the point estimates (see, for example, Horowitz (2000)). Although the Bayes estimates are not difficult to recompute (especially with a good starting value such as the full-sample estimate γ̂), the parametric bootstrap appears to be very expensive computationally. As discussed in Section 4, it is much more computationally demanding than any other method. The parametric bootstrap may also not be robust against the local misspecification of the parametric models.

Next consider the posterior mean γ̄ and the posterior quantiles γ̂(τ), defined as the solutions to the problem (2.4) under the squared and the check loss functions, respectively (each defined in Section 2). Also define Z̄ and Z(τ) as the solutions of the limit problem (3.10) under the squared and the check loss functions, respectively.

Theorem 3.3 (Mean-Unbiasedness, Quantile-Unbiasedness, Posterior Confidence Intervals). Under the conditions of Theorem 3.2:

1. Posterior mean estimators are asymptotically mean-unbiased:

  E Z̄ = 0.   (3.11)

2. Consider any 0 < τ < 1 and 0 < τ' < τ'' < 1. If Z(τ) has a positive density in the neighborhood of 0, then the posterior τ-quantiles are asymptotically τ-quantile unbiased,

  lim_n P_{γ_n(δ)}{ (γ̂(τ))_j < (γ_n(δ))_j } = P_{γ_0}{ (Z(τ))_j < 0 } = 1 − τ,

and the posterior quantile confidence intervals are asymptotically valid,

  lim_n P_{γ_n(δ)}{ (γ̂(τ'))_j ≤ (γ_n(δ))_j ≤ (γ̂(τ''))_j } = τ'' − τ',   (3.12)

where (A)_j denotes the j-th component of a vector A.
Results 1 and 2 follow from the asymptotic optimality of the posterior means and quantiles under the squared and check loss functions, respectively. For example, if Z̄ had a mean E Z̄ = c ≠ 0, then the estimator γ̄ − H_n c would have a strictly lower asymptotic risk under the squared loss, regardless of the local parameter sequence. Hence it must be that E Z̄ = 0. A similar argument applies to the τ-th posterior quantile: the τ-posterior quantile is τ-quantile unbiased because it is asymptotically optimal under the τ-check loss function.

A very useful implication of the quantile unbiasedness result is the validity of the confidence intervals, which are defined in Section 2.3 and established in (3.12). The requirement that (Z(τ))_j has a positive density around 0 is technical.

The next result concerns large sample inference about smooth functions of the parameters. Consider inference about a function r_n(γ) such that, for all γ,

  r_n(γ) = r_n(γ_0) + R_β'(β − β_0) n + R_α'(α − α_0) √n + O( n|β − β_0|^{1+d̄} + √n|α − α_0|^{1+d̄} ),   (3.13)

for some d̄ > 0, where R = [R_β', R_α']' ≠ 0. For purposes of theoretical analysis, the function r_n is made dependent on n specifically to achieve a better finite-sample approximation, through the avoidance of the trivial case where the asymptotic inference is determined by either parameter α or β alone due to the difference in the rates of convergence. If a smooth function m(α) of α is of interest, then taking r_n(γ) = √n m(α) fulfills condition (3.13); if a smooth function m(β) of β is of prime interest, then taking r_n(γ) = n m(β) fulfills condition (3.13). Note that these transformations by √n or n do not affect the practical formulations (2.7)-(2.9) in Section 2.3, by the linearity of the transformations and the equivariance of quantiles to monotone transformations. The next result establishes the asymptotic validity of the posterior quantiles c(τ) for inference about r_n(γ).
Theorem 3.4 (Inference with Posterior Quantiles). Under the conditions of Theorem 3.2:

1. For any 0 < τ < 1,

  ( c(τ) − r_n(γ_n(δ)) ) →_d Z_R(τ),

where Z_R(τ) is the τ-quantile of the limit posterior distribution of R'z, i.e. the solution of the limit analogue of the problem (2.7).

2. Provided Z_R(τ) has a positive density over a neighborhood of 0,

  lim_n P_{γ_n(δ)}{ c(τ) ≤ r_n(γ_n(δ)) } = P_{γ_0}{ Z_R(τ) ≤ 0 } = 1 − τ,   (3.14)

and, for any 0 < τ' < τ'' < 1,

  lim_n P_{γ_n(δ)}{ c(τ') ≤ r_n(γ_n(δ)) ≤ c(τ'') } = τ'' − τ'.

Theorem 3.4 generalizes Theorem 3.3 to inference about more general functions than the parameter components, e.g. the construction of parameter confidence intervals. For example, a τ-level asymptotic test of the null hypothesis r_n(γ_0) = r̄_n is given by the decision rule that rejects the null if

  r̄_n ∉ [ c(τ/2), c(1 − τ/2) ].   (3.15)

The result (3.15) can be used to deduce the local power and consistency of the test.

3.4. Optimality. Lemma 3.1 below briefly records the finite-sample and asymptotic average risk optimality properties of BEs, which is needed only for the auxiliary purposes of proving Theorems 3.3 and 3.4. A detailed and valuable analysis of (minimax) optimality in non-regular models can be found in Hirano and Porter (2002).

Define the normalization matrix H_n as in Section 3.1, and let γ_n(δ) = γ_0 + H_n δ, δ ∈ K, denote the local parameter sequence. Consider the set F_n of all statistics γ̂_n (measurable mappings of the data). Define the expected risk associated with a loss function ρ and an estimator γ̂_n as E_{P_{γ_n(δ)}} ρ(Z_n), where Z_n = H_n^{−1}[γ̂_n − γ_n(δ)] and the expectation is computed under P_{γ_n(δ)}. Consider the following measures of risk. The finite sample average risk (AR) of γ̂_n is given by

  AR = (1/μ(K)) ∫_K E_{P_{γ_n(δ)}} ρ(Z_n) dμ(δ),   (3.16)

where μ is the weight or prior measure over K. The asymptotic average risk (AAR) of an estimator sequence {γ̂_n} is given by

  AAR = limsup_{d→∞} limsup_{n→∞} (1/λ(K_d)) ∫_{K_d} E_{P_{γ_n(δ)}} ρ(Z_n) dδ,   (3.17)

where K_d denotes an increasing sequence of cubes centered at the origin and converging to R^d, and λ is the Lebesgue measure. Compared to the previous formula, the weight μ is replaced by the objective (uninformative) weight λ over R^d.
Suppose the conditions of Theorem 3.2 hold. Let γ̂_{ρ,μ,n} denote the Bayes estimator under the loss ρ and the prior weight μ, and let Z_n = H_n^{−1}[ γ̂_{ρ,μ,n} − γ_n(δ) ]. Then:

1. For each n ≥ 1, the infimum of the finite sample average risk (3.16) over F_n is achieved by the Bayes estimator γ̂_{ρ,μ,n}.

2. The infimum of the asymptotic average risk (3.17) over the estimator sequences in F_n is attained by the sequence of the Bayes estimators {Z_n} and equals E_{P_{γ_0}} ρ(Z), where Z denotes the weak limit of Z_n.

Statement 1 is a basic result of statistics: the optimal estimator under the loss ρ and a risk-weighting measure μ is a Bayes estimator defined by the risk-weighting function μ and the loss function ρ, cf. Wald (1950) and Lehmann and Casella (1998), Chapter 5. Statement 2 translates the finite-sample average risk efficiency into an alternate definition of asymptotic efficiency (this result essentially follows from Ibragimov and Has'minskii (1981b), p. 93). It is critical that, unlike in the regular case, the efficiency rankings are largely determined by the loss function ρ. For example, the MLE may be worse than the posterior mean under the squared loss, but may perform better under other loss functions, cf. Section 4. The posterior mean under the squared loss is often simply used as the Bayes (optimal) procedure.

3.5. Large Sample Theory of MLE. We provide only a brief discussion of the maximum likelihood procedures. Consider the MLE, centered at the true parameter and normalized by the convergence rates:

  Z_n = ( Z_n^{β}', Z_n^{α}' )' = ( n(β̂ − β_n(δ))', √n(α̂ − α_n(δ))' )'.

Theorem 3.5 (Properties of MLE). Under C0-C5, Z_n = O_p(1), and, supposing that −ℓ_∞(z) attains a unique argmin in R^d a.s., Z_n →_d Z = (Z^{β}', Z^{α}')', where

  Z^β = arg sup_{u ∈ R^{d_β}} ℓ_{2∞}(u),   Z^α = J^{−1}W = N(0, J^{−1}),

and Z^β and Z^α are independent.

The proof is given in the technical report (Chernozhukov and Hong (2003)).
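In the boundary model, Theorem 3.5's limit for β reduces to maximizing u'm subject to the constraints J_i ≥ A(X_i)'u; with a scalar β, m > 0, and A(·) > 0, this linear program has the closed-form solution min_i J_i / A(X_i). A minimal sketch of simulating one draw (our own illustration; the samplers and the truncation level b are assumptions):

```python
import random


def draw_limit_mle(A, p, sample_x, b=1000, seed=0):
    """One draw of Z_beta = argsup { u*m : J_i >= A(X_i)*u for all i }
    for a scalar beta with m > 0 and A > 0 (the value of m does not
    affect the argmax, only its sign matters): the optimum is the
    smallest ratio J_i / A(X_i), with J_i = (E_1 + ... + E_i)/p(X_i)."""
    rng = random.Random(seed)
    gamma, best = 0.0, float("inf")
    for _ in range(b):
        gamma += rng.expovariate(1.0)   # Gamma_i = E_1 + ... + E_i
        x = sample_x(rng)
        best = min(best, (gamma / p(x)) / A(x))
    return best
```

Repeating this draw many times approximates the limit law of n(β̂ − β_n(δ)); in the homogeneous case A ≡ 1, p ≡ 1 the draw is simply Γ_1, the first unit exponential.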
The limit variable is an argmax of a limit likelihood, which inherits the discontinuities of the finite sample likelihood. Due to the asymptotic independence of the information about the shape parameter from the information about the location parameter, the MLEs for these parameters are asymptotically independent.

In the boundary models, the limit result can be stated more explicitly for β as follows:

  n(β̂ − β_n(δ)) →_d Z^β = arg sup { u'm  such that  J_i > A(X_i)'u, for all i ≥ 1 }.

This result generalizes the results of Donald and Paarsch (1993a) and Smith (1994). Note that the solutions of linear programs like these are unique under fairly weak conditions. For example, a very simple sufficient condition for the almost sure uniqueness is that there is one continuously distributed element in A(X_i) and that A(X_i) has a nondegenerate distribution, cf. Portnoy (1991) for a related problem. When X has discrete support, the stated limit result coincides with the result of Donald and Paarsch (1993a); the uniqueness holds if A(X_i) has a nondegenerate distribution (assumed in C3).

Remark 3.6 (Asymptotic Non-Sufficiency of MLEs). It is important to note here that the posterior means and medians are generally not equal to the bias-corrected MLE. Consider the example of Section 3.2. The limit maximum likelihood variable Z maximizes ℓ_∞(z) = e^{E(X)'z} 1[ Γ_i > X_i'z, for all i ≥ 1 ], which is equivalent to maximizing E(X)'z subject to the constraints Γ_i > X_i'z for all i ≥ 1.
In the no covariate case, where X_i = 1 for all i, ℓ_∞(z) = e^z 1(z < Z) with Z = min{Γ_i, i ≥ 1}, so that ℓ_∞(z) is a nonrandom function of the MLE Z, implying the sufficiency of Z. Then the limit optimal Bayes estimators are shift transformations of Z, by the well-known Rao-Blackwell argument. This raises the question of whether Z is a sufficient statistic for ℓ_∞(z) in the general regression case. Taking the example with X = (1, X̃)', where X̃ is continuously distributed, it is easy to see that ℓ_∞(z) is not a nonrandom function of the MLE Z, so that Z is not sufficient, even conditional on the covariates. Thus, the limit likelihood-based Bayes estimators differ from the shift transformations of the MLE with strictly positive probability.

4. Monte Carlo

4.1. Design. We used a simple procurement auction model similar to that in Section 2.1, with the regression function g(x) = exp(β_0 + β_1 x), X ~ U(0, 1), β_0 = 1, β_1 = 1, and m = 3. We set n = 100 and n = 400, which are close to the practical sample sizes encountered in empirical work on auctions. We used the parameter space [β_0 ± 5] × [β_1 ± 5] and a flat prior. The MCMC computations were performed using the canonical random-walk algorithm described on p. 245 in Robert and Casella (1998). Some 20,000 draws were made for a "burn-in stage", with the variance of the transition kernel adjusted every 200 draws in order to keep the rejection rate near .5, and an additional 20,000 draws were used in the computation of the estimates. The MLE is computed by taking the argmax of the likelihood over the MCMC sequence. The C++ implementation is available from the authors.

4.2. Quality of Estimation Procedures. We compare the performance of (1) the posterior median, (2) the posterior mean, and (3) the posterior mode (MLE), across different risk measures. The results given in Table 1 and Table 2 show that: (a) the posterior mean is the best under the mean squared loss, (b) the posterior median is the best under the mean absolute deviation loss, and (c) the MLE is the best under the 10-th power loss function. Thus, it appears that all of the likelihood procedures perform quite well.
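The random-walk MCMC computations just described can be sketched generically as follows (a minimal sketch, not the authors' C++ implementation; a fixed proposal step is used here in place of the adaptive variance tuning described above):

```python
import math
import random


def random_walk_mh(log_post, start, step, draws, burn, seed=0):
    """Generic random-walk Metropolis sampler: propose cur + N(0, step^2),
    accept with probability min(1, exp(log_post(prop) - log_post(cur))),
    discard the burn-in draws, and return the remaining chain."""
    rng = random.Random(seed)
    cur, cur_lp = start, log_post(start)
    out = []
    for t in range(burn + draws):
        prop = cur + rng.gauss(0.0, step)
        lp = log_post(prop)
        if lp - cur_lp >= 0 or rng.random() < math.exp(lp - cur_lp):
            cur, cur_lp = prop, lp
        if t >= burn:
            out.append(cur)
    return out
```

Posterior means, medians, and quantiles (hence the intervals of Section 2.3) are then read off the returned draws; the argmax of log_post over the chain gives the MLE approximation used above.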
[¹⁸ We also tried the approximate MLE, defined as a BE under a loss function that approximates the delta function; the performance of the approximate and exact MLEs coincided up to many digits under the (truncated) 10-th power loss function and is not reported separately. Other approximate delta functions can also be used to approximate the MLE by some BE.]

4.3. Quality of Inferential Procedures. In the next step, we compare the performance of several inference methods, focusing on the coverage properties of the confidence intervals. We compare: (1) the confidence intervals based on the posterior quantiles, (2) the percentile confidence intervals based on the parametric bootstrap of the posterior mean, (3) the percentile confidence intervals based on the parametric bootstrap of the MLE, (4) the percentile confidence intervals based on the subsampling of the posterior mean and other estimators (using 1/4 × n as the subsample size), and (5) the simulation of the limit distribution of the MLE and other estimators as described in Remark 3.1 (using b = n).

Results reported in Table 3 indicate that the intervals based on the posterior quantiles and the simulation of the limit distribution perform nearly as well as the parametric bootstrap, while subsampling performs worse than any of them. The confidence intervals based on the posterior quantiles also appear to be the shortest on average. Given that the posterior intervals are the least expensive to compute, they should be preferred. The subsampling is less expensive than the parametric bootstrap and is probably more robust for inference purposes under local misspecification.

In terms of computational expense, the computation of the posterior quantiles takes less than 1 minute on a Pentium III PC. Simulation of the limit distribution is roughly twice as expensive (the limit expressions are simple transformations of linear functions and do not contain nonlinear expressions).
We used 200 bootstrap draws, and used the full sample estimate as the starting value in the MCMC algorithm (which reduces the number of MCMC draws needed in the re-computation of the estimates). Using this implementation, 200 bootstrap draws take between 7 and 30 minutes for the samples n = 100 and n = 400. (Thus, 1000 bootstrap draws take up to 150 minutes for n = 400.) The subsampling takes about 1/5-th of the time of the parametric bootstrap, using the same number of draws. The entire Monte Carlo work took weeks of computer time.

5. Conclusion

We studied estimation and inference in a general model in which the conditional density of the dependent variable jumps at a location that is parameter dependent. This model includes a number of interesting economic models discussed in the recent literature on structural estimation. We derived the large sample theory of a variety of likelihood based procedures, and offered an array of useful and practical inference techniques, including Wald type and Bayes type inference methods. The results provide a theoretical and practical solution to an important econometric problem.

Appendix A. Regularity Conditions C0-C5 and D1-D3

Notation. Throughout the paper, →_p and →_d denote convergence in probability and in distribution, respectively; c and const denote generic positive constants, unless stated otherwise; ‖x‖ is the usual Euclidean norm √(x'x); and |x| is the supremum norm, |x| = max_{j≤k} |x_j| for x = (x_1, ..., x_k). Note the densities of interest f(ε|x, γ) are discontinuous at ε = 0 and are not differentiable in ε at ε = 0. To simplify notation, we use ∂f(ε|x, γ)/∂ε to denote the usual partial derivative when ε ≠ 0,
and also use it to denote the directional partial derivative ∂f(0 + tλ | x, γ)/∂t at t = 0, λ ∈ R, when ε = 0. Also, B_δ(γ) denotes a closed ball at γ with radius δ, as measured by | · |.

Conditions C0-C5. The following conditions apply to any γ = (β, α) in Γ: Conditions C0-C3 apply to any γ in Γ, and Conditions C4 and C5 apply to any γ in some B_δ(γ_0) for δ > 0.

C0 (Y_i, X_i) is an iid sequence of vectors in R × R^k, defined on the probability space (Ω, F, P_γ); Γ is a compact convex set in R^d such that γ_0 is in the interior of Γ; and for any γ ≠ γ' in Γ, P_γ{ f(Y_i − g(X_i, β) | X_i, γ) ≠ f(Y_i − g(X_i, β') | X_i, γ') } > 0.

C1 X_i has cdf F_X, which does not depend on γ and has compact support X. In addition to (2.2), uniformly in γ and x, we have either (i) the two-sided model: p(x, γ) > c > 0 and q(x, γ) > c > 0, or (ii) the one-sided model: p(x, γ) > c > 0 and q(x, γ) = 0.

C2 Without loss of generality, the density f(ε|x, γ) is bounded from above uniformly in (ε, x, γ). Except at ε = 0, f(ε|x, γ) has continuous first and second partial derivatives in (ε, γ), bounded uniformly in (ε, x, γ); the functions p(x, γ) and q(x, γ) specified above are continuous in (x, γ); and f(ε|x, γ) is upper-semicontinuous at ε = 0, for each x and γ.

C3 g(x, β) has two continuous derivatives in β, bounded uniformly in x ∈ X and β.

C4 When the nuisance parameter α is present, J(γ) = E_{P_γ}[ ℓ_i(γ) ℓ_i(γ)' ] is positive definite, bounded, and continuous in γ, where ℓ_i(γ) = (∂/∂α) ln f( Y_i − g(X_i, β) | X_i, γ ).

C5 In the two-sided model C1(i), the first three partial derivatives of ln f( Y_i − g(X_i, β) | X_i, γ ) in α   (A.1)

are bounded in norm respectively by c_j(ε_i, X_i), j = 1, 2, 3, for all Y_i − g(X_i, β) ∈ R \ {0}, uniformly in γ ∈ B_δ(γ_0), where sup_γ E_{P_γ} c_j(ε_i, X_i) < ∞ for j = 1, 2, 3.
Similarly, for the one-sided model C1(ii), the terms in (A.1) are bounded respectively by c_j(ε_i, X_i), j = 1, 2, 3, for all Y_i − g(X_i, β) > 0, uniformly in γ ∈ B_δ(γ_0), where sup_γ E_{P_γ} c_j(ε_i, X_i) < ∞ for j = 1, 2, 3.

Lemma A.1 (Important Constants). The conditions C0-C3 imply that there are finite constants f̄, f', f'', ḡ, g', g'' and f_ > 0 such that

  sup f(ε|x, γ) ≤ f̄,   sup_{β∈B, x∈X} |g(x, β)| ≤ ḡ,   sup_{β∈B, x∈X} ‖ (∂/∂β) g(x, β) ‖ ≤ g',   sup_{β∈B, x∈X} ‖ (∂²/∂β∂β') g(x, β) ‖ ≤ g'',
  sup ‖ (∂/∂ε) f(ε|x, γ) ‖ ≤ f',   sup ‖ (∂²/∂ε²) f(ε|x, γ) ‖ ≤ f'',

and also that, for some δ > 0,

  inf_{ε ∈ V(δ), x ∈ X, γ ∈ Γ} f(ε|x, γ) ≥ f_ > 0,

where V(δ) = [−δ, δ] \ {0} in the case of C1(i) and V(δ) = (0, δ] in the case of C1(ii).

Remark A.1. The regularity conditions C0-C5 are a mixture of the non-regular assumptions on the location parameters β, as in Ibragimov and Has'minskii (1981a), and the regular assumptions on the shape parameters α, as in van der Vaart (1999): if β is known, the inference about α is regular. The parameters γ and the model are summarized in Section 2.2. Condition C1(ii) allows for the boundary model, where the density is zero to the left side of the jump and positive on the right side. Condition C1(i) allows for the two-sided model, where the density is positive on both sides. Conditions C3-C5 are common in nonlinear analysis.

Conditions D1-D3. The prior μ : Γ → R_+ and the loss function ρ : R^{d_β + d_α} → R_+ have the following properties:

D1 μ(·) > 0 is continuous on Γ;
D2 ρ(z) ≥ 0 is continuous, and ρ(z) = 0 iff z = 0;
D3 ρ is convex and is dominated by a polynomial of |z| as |z| → ∞.

Remark A.2. These are standard assumptions on the loss function ρ and the prior μ, see for example Ibragimov and Has'minskii (1981b).
Since BE's become essentially uncomputable when ρ is not convex, we do not consider non-convex loss functions, for pragmatic reasons. However, the proofs of Theorems 3.1–3.4 do not rely upon the convexity assumption, and the results apply more generally to other loss functions specified in Ibragimov and Has'minskii (1981b).

Appendix B. Proofs for Section 3

[N.B. In the proofs we extensively use the constants defined in Lemma A.1.]

B.1. Proof of Theorem 3.1. In the proof we set the local parameter sequence γ_n = γ₀. Considering a general sequence does not change the argument but complicates notation. Following Ibragimov and Has'minskii (1981a), we split the log likelihood ratio process

Q_n(z) = ln ℓ_n(z) = ln L_n(γ₀ + H_n z)/L_n(γ₀)

into the continuous part Q_n^c(z) and the piece-wise constant part Q_n^d(z), and analyze each part separately. Our goal is to show that Q_n(z) = Q_n^c(z) + Q_n^d(z) converges in distribution, in the finite-dimensional sense, to Q_∞(z) = Q_∞^c(z) + Q_∞^d(z), where

Q_∞^c(z) = m'u + W'v − (1/2) v'Jv,  Q_∞^d(z) = ∫_{R×X} l_u(j,x) dN(j,x),

where each term is defined in Theorem 3.1. Given this result, the finite-dimensional limit of ℓ_n(z) is exp(Q_∞(z)).

For z = (u', v')', using that e_i = Y_i − g(X_i, β₀),

Q_n(z) = Σ_{i=1}^n r_in(z) × [1(e_i > {Δ_n(X_i,u)/n} ∨ 0) + 1(e_i < {Δ_n(X_i,u)/n} ∧ 0)]
 + Σ_{i=1}^n (r_in(z) − r̄_in(z)) × [1(0 < e_i < Δ_n(X_i,u)/n) + 1(0 > e_i > Δ_n(X_i,u)/n)]
 + Σ_{i=1}^n r̄_in(z) × [1(0 < e_i < Δ_n(X_i,u)/n) + 1(0 > e_i > Δ_n(X_i,u)/n)],

where

r_in(z) = ln f(Y_i − g(X_i, β₀ + u/n) | X_i, β₀ + u/n, α₀ + v/√n) − ln f(e_i | X_i, γ₀)
 = ln f(e_i − Δ_n(X_i,u)/n | X_i, β₀ + u/n, α₀ + v/√n) − ln f(e_i | X_i, γ₀),
r̄_in(z) = ln [q(X_i)/p(X_i)] 1(0 < e_i) + ln [p(X_i)/q(X_i)] 1(0 > e_i),
Δ_n(x,u) = n (g(x, β₀ + u/n) − g(x, β₀)),

and Q_n^c(z) = Q_{1n}^c(z) + Q_{2n}^c(z), with Q_{1n}^c(z) the first sum and Q_{2n}^c(z) the second sum, while Q_n^d(z) is the third sum. The convergence of the continuous part Q_n^c(z) is standard. In sharp contrast, the behavior of the discontinuous part Q_n^d(z) is analyzed using the point process methods.
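Because the likelihood is discontinuous in β, posterior quantities are conveniently computed by simulation rather than by optimization. The sketch below is a minimal illustration of this point, not the paper's implementation: a random-walk Metropolis chain for a one-sided model Y_i = β₀ + β₁X_i + e_i with Exp(1) errors and a flat prior, where the sample size, starting point, and step size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 1.0, n)
beta_true = np.array([1.0, 0.5])
y = beta_true[0] + beta_true[1] * x + rng.exponential(1.0, n)

def loglik(b):
    # One-sided (boundary) model: e ~ Exp(1), so the density jumps from 0 to 1
    # at e = 0 and the likelihood vanishes if any residual is negative.
    e = y - b[0] - b[1] * x
    return -np.inf if np.any(e < 0) else -e.sum()

# Random-walk Metropolis with a flat prior: proposals that cross the boundary
# are simply rejected, so the discontinuity poses no computational problem.
b = np.array([0.5, 0.0])            # feasible starting point (illustrative)
ll = loglik(b)
draws = np.empty((20_000, 2))
for t in range(draws.shape[0]):
    prop = b + 0.02 * rng.standard_normal(2)
    llp = loglik(prop)
    if np.log(rng.uniform()) < llp - ll:
        b, ll = prop, llp
    draws[t] = b
post = draws[5_000:]                # discard burn-in
print(post.mean(axis=0))            # posterior mean, a Bayes estimator
```

The posterior mean and the posterior quantiles of `post` correspond to the Bayes estimators and posterior confidence bounds studied in Section 3.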
In all expressions and proofs we use the algebraic rules for operations with ∞'s defined in Ibragimov and Has'minskii (1981a). This is done to include the proof of Theorem 3.1 for the boundary model as a special case. In particular, in the boundary model the expressions involving ln[p(X_i)/q(X_i)] 1(0 > e_i > Δ_n(X_i,u)/n) cancel, since in the boundary model e_i > 0; also r̄_in(z) = ln[q(X_i)/p(X_i)] 1(0 < e_i) = −∞ when 0 < e_i < Δ_n(X_i,u)/n, so that Q_n^d(z) = −∞ in that case. Thus, the term Σ_i (r_in(z) − r̄_in(z))[·] is only non-zero for the two-sided model. Further details follow.

Part I obtains the finite-dimensional limit of Q_n^c(z). The proof method is standard for the smooth likelihood analysis. Application of the Taylor expansion to each r_in(z), i = 1, ..., n, so that the expanded terms are iid, followed by application of the Markov LLN and the Chebyshev inequality, yields, for a given z [see Addendum],

Q_{1n}^c(z) = −u' (1/n) Σ_{i=1}^n Λ(X_i) (∂/∂e) ln f(e_i|X_i,γ₀) + v'W_n − (1/2) v'Jv + o_p(1),

where Λ(X_i) = (∂/∂β) g(X_i, β₀) and W_n = (1/√n) Σ_{i=1}^n (∂/∂α) ln f(e_i|X_i,γ₀). The equality

E[(∂/∂e) ln f(e_i|X_i,γ₀) | X_i] = ∫_R (∂/∂e) f(e|X_i,γ₀) de = q(X_i) − p(X_i),

which follows by C2,¹⁹ implies that the drift term converges to u'm, where m = E[Λ(X_i)(p(X_i) − q(X_i))]. The information matrix equality J = E[(∂/∂α) ln f(e_i|X_i,γ₀) (∂/∂α) ln f(e_i|X_i,γ₀)'] = −E[(∂²/∂α∂α') ln f(e_i|X_i,γ₀)] follows by C2 and C4, and the CLT gives W_n →_d W = N(0, J). Therefore, the finite-dimensional limit of Q_{1n}^c(z) is given by

Q_∞^c(z) = u'm + W'v − (1/2) v'Jv.

It remains to show that Q_{2n}^c(z) = o_p(1) uniformly in z over any compact set Z. In the one-sided case Q_{2n}^c(z) = 0, hence consider the two-sided case. Note that by assumptions C1–C3,

|r_in(z) − r̄_in(z)| ≤ 2 × (f̄'/f_) × ḡ' × ||z||/√n   (B.1)

uniformly in {(e_i, X_i, z) ∈ R₊ × X × Z : 0 < e_i < Δ_n(X_i,u)/n}, and likewise

|r_in(z) − r̄_in(z)| ≤ 2 × (f̄'/f_) × ḡ' × ||z||/√n   (B.2)

uniformly in {(e_i, X_i, z) ∈ R₋ × X × Z : 0 > e_i > Δ_n(X_i,u)/n}. Thus

sup_{z∈Z} |Q_{2n}^c(z)| ≤ 2 Σ_{i=1}^n 1(|e_i| ≤ K/n) × (f̄'/f_) × ḡ' × ||Z||/√n = O_p(1/√n),   (B.3)

for K = ||Z|| × ḡ', where ||Z|| = sup{||z|| : z ∈ Z}; K is finite by C3. The term Σ_{i=1}^n 1(|e_i| ≤ K/n) is O_p(1) by C2:

Σ_{i=1}^n E 1(|e_i| ≤ K/n) ≤ 2 f̄ K < ∞.   (B.4)

Part II obtains the finite-dimensional limit of Q_n^d(z). Recall

Q_n^d(z) = Σ_{i=1}^n [ ln (q(X_i)/p(X_i)) 1(0 < ne_i < Δ_n(X_i,u)) + ln (p(X_i)/q(X_i)) 1(0 > ne_i > Δ_n(X_i,u)) ].

By C2 and C3,

Σ_{i=1}^n E|1(0 < ne_i < Δ_n(X_i,u)) − 1(0 < ne_i < Λ(X_i)'u)| + E|1(0 > ne_i > Δ_n(X_i,u)) − 1(0 > ne_i > Λ(X_i)'u)| ≤ 2 f̄ ḡ'' ||u||²/n = o(1),

which implies that

Q_n^d(z) = Σ_{i=1}^n [ ln (q(X_i)/p(X_i)) 1(0 < ne_i < Λ(X_i)'u) + ln (p(X_i)/q(X_i)) 1(0 > ne_i > Λ(X_i)'u) ] + o_p(1).

Now note that for any finite l and given (z_j, j ≤ l), (Q_{1n}^c(z_j), j ≤ l) and (Q_n^d(z_j), j ≤ l) are asymptotically independent. This follows by applying a standard argument concerning the independence of minimal order statistics and sample averages of general form, see for example Resnick (1986) or Lemma 21.19 in van der Vaart (1999) [the Addendum provides the proof for completeness].

¹⁹ Recall that for a density function f which is everywhere continuously differentiable, except at 0, with an integrable derivative, ∫_R f'(u) du = −f(0⁺) + f(0⁻).

The next step is to obtain the finite-dimensional limit of Q_n^d, whose behavior is determined by the near-to-jump observations and is described using a point process. We split the argument in two steps. Step 1 constructs the required point process and derives its limit. Step 2 applies Step 1 to obtain the finite-dimensional limit of Q_n^d.

Step 1: Intuition for Step 1 is provided in Section 3.2 of the main text. Define E = R × X. The topology on E is standard, e.g. [a,b] × X is a compact subset relative to E. The point process of interest is a random measure taking the following form: for any Borel subset A,

N_n(A) = Σ_{i=1}^n 1[(ne_i, X_i) ∈ A].

N_n is a random element of M_p(E), the metric space of nonnegative point measures on E, with the metric generated by the usual topology of vague convergence, Resnick (1987) Ch. 3 [the Technical
Addendum provides a review]. We show that N_n ⇒ N in M_p(E), for N given in Theorem 3.1. This is done in steps (a) and (b).

(a): By C1 and C2, for any F ∈ T_p, the basis of relatively compact open sets in E (finite unions and intersections of open bounded rectangles in E),

lim_{n→∞} E N_n(F) = lim_{n→∞} n P((ne_i, X_i) ∈ F) = m(F) < ∞,   (B.5)

where the measure m is defined as

m(du, dx) = [p(x) 1(u > 0) du + q(x) 1(u < 0) du] × dF_X(x).

Since the events {(ne_i, X_i) ∈ F} are independent across i by C0, by Meyer's Theorem, Meyer (1973),

lim_{n→∞} P(N_n(F) = 0) = e^{−m(F)}.   (B.6)

Statements (B.5) and (B.6) imply, by Kallenberg's Theorem — Resnick (1987), Proposition 3.22 — that N_n ⇒ N in M_p(E), where N is a Poisson point process with the mean intensity measure m(·).

(b): Next we show that N has the same distribution as the process given in Theorem 3.1. First, consider the canonical Poisson processes N₀ and N₀' with points {Γ_i} and {Γ_i'} defined in Theorem 3.1. N₀ has the mean measure m₀(du) = du on (0, ∞), see Resnick (1987), p. 138, and N₀' has the mean measure m₀'(du) = du on (−∞, 0). Because N₀ and N₀' are independent, N₁(·) = N₀(·) + N₀'(·) is a Poisson point process with the mean measure m₁(du) = du on R, by definition of the Poisson process, see Resnick (1987), p. 130. Because {X_i, X_i'} are i.i.d. and independent of {Γ_i, Γ_i'}, by Proposition 3.8 in Resnick (1987) the composed process N₂ with points ({Γ_i, X_i}, {Γ_i', X_i'}, i ≥ 1) is a Poisson process with the mean measure m₂(du, dx) = [1(u > 0) du + 1(u < 0) du] × F_X(dx) on R × X. Finally, N with the points {T(Γ_i, X_i), T(Γ_i', X_i')}, where

T : (u, x) ↦ (1(u > 0) u/p(x) + 1(u < 0) u/q(x), x),

is a Poisson process with the desired mean measure m₂ ∘ T^{−1}(du, dx) = [p(x) 1(u > 0) + q(x) 1(u < 0)] du × F_X(dx), by Proposition 3.7 in Resnick (1987).

Step 2: We have, for z = (u, v),

Q_n^d(z) = Q_n^d(u) = Σ_{i=1}^n ln (q(X_i)/p(X_i)) 1[0 < ne_i < Λ(X_i)'u] + Σ_{i=1}^n ln (p(X_i)/q(X_i)) 1[0 > ne_i > Λ(X_i)'u] + o_p(1).

Ignoring the o_p(1) term, write Q_n^d(u) as a Lebesgue integral with respect to N_n:

Q_n^d(u) = ∫_E l_u(j, x) dN_n(j, x),

where l_u(j, x) is defined in Theorem 3.1. The convergence of this integral is implied by N_n ⇒ N, in both the two-sided and the one-sided model:

(a) In the two-sided model: by conditions C1–C3, the function (j, x) ↦ l_u(j, x) is bounded and vanishes outside the compact set [−η, +η] × X, where η = sup_{x∈X} |Λ(x)'u| < ∞ by C3; it is discontinuous at (j, x) only when j = 0 or j = Λ(x)'u. Define the map T : M_p(E) → R^l, T : N ↦ (∫ l_{u_k}(j, x) dN(j, x), k ≤ l), and let V(T) = {N ∈ M_p(E) : T is discontinuous at N}. Since the coordinates (j_i^∞, x_i^∞, i ≥ 1) of the points of N are absolutely continuous, P[N ∈ V(T)] ≤ P[j_i^∞ = 0 or j_i^∞ = Λ(x_i^∞)'u_k for some i ≥ 1, k ≤ l] = 0. Hence, by Proposition 3.13 in Resnick (1987) and the continuous mapping theorem, T(N_n) →_d T(N), i.e. (Q_n^d(u_k), k ≤ l) →_d (Q_∞^d(u_k), k ≤ l), where

Q_∞^d(u) = ∫_E l_u(j, x) dN(j, x).

(b) In the one-sided model: using the Ibragimov and Has'minskii (1981a) rules for algebraic operations with ∞'s, note that Q_n^d(u) = ∫_E l_u(j, x) dN_n(j, x) behaves as a binomial random variable: Q_n^d(u) = −∞ if N_n(A(u)) > 0, and Q_n^d(u) = 0 if N_n(A(u)) = 0, where A(u) = {(j, x) ∈ R₊ × X : j ≤ Λ(x)'u}. Also define Q_∞^d(u) = −∞ if N(A(u)) > 0, and Q_∞^d(u) = 0 if N(A(u)) = 0. Thus, to show the finite-dimensional convergence stated in Theorem 3.1, it suffices to show that, for any finite l and γ_k ∈ {−∞, 0}, k ≤ l,

lim_{n→∞} P(Q_n^d(u_k) = γ_k, k ≤ l) = P(Q_∞^d(u_k) = γ_k, k ≤ l),

i.e. (N_n(A(u_k)), k ≤ l) →_d (N(A(u_k)), k ≤ l). This is immediate from N_n ⇒ N by the definition of weak convergence of point processes, since by construction of N, N(∂A(u_k)) = 0 a.s., cf. Embrechts, Klüppelberg, and Mikosch (1997), p.
232.

B.2. Proof of Theorem 3.2. The proof applies Theorem 1.10.2, p. 107, of Ibragimov and Has'minskii (1981b), which states the limit distribution of BE's provided that general conditions hold.

First, BE's are measurable by Jennrich's (1969) measurability theorem, since they minimize objective functions that are continuous in data and parameters. Second, the following conditions (1)–(3) verify the conditions of Theorem 1.10.2, p. 107, of Ibragimov and Has'minskii (1981b):

(1) (a) Hölder continuity of ℓ_n(z) in expectation, proved in Lemma C.2; (b) an exponential bound on the expected likelihood tails, proved in Lemma C.2:

E_{P_γ} ℓ_n^{1/2}(z) ≤ e^{−a'(|z|−1)},

where the function a'(|z| − 1) belongs to the class G defined on p. 41 in Ibragimov and Has'minskii (1981b), i.e. (i) a'(|z| − 1) is monotonically increasing to ∞ in |z| on [0, ∞), and (ii) for any N > 0, lim_{|z|→∞} |z|^N e^{−a'(|z|−1)} = 0.

(2) Finite-dimensional convergence of ℓ_n(z) = exp(Q_n(z)) to ℓ_∞(z) = exp(Q_∞(z)): for any finite collection (z_j, j ≤ l),

(ℓ_n(z_j), j ≤ l) →_d (ℓ_∞(z_j), j ≤ l)

(all of the terms are defined in the proof of Theorem 3.1).

(3) The limit Bayes problem

Z = arg inf_{z∈R^d} ∫_{R^d} ρ(z − z̃) [ ℓ_∞(z̃) / ∫_{R^d} ℓ_∞(z') dz' ] dz̃

is uniquely solved by a random vector Z (since by D2 the objective is convex with a unique minimum; cf. conditions D1–D3 on the loss function ρ and the prior μ).

It must be noted that Ibragimov and Has'minskii (1981b) impose the symmetry of ρ throughout their book. However, the inspection of the proof of Theorem 1.10.2 (and Theorem 1.5.2) reveals that the proof does not require the symmetry and applies to the loss functions that satisfy D1–D3. Thus, conditions (1)–(3) imply, by Theorem 1.10.2 of Ibragimov and Has'minskii (1981b), our result:

Z_n →_d Z.

Furthermore, conditions (1)–(3) imply, by Theorems 1.10.2 and 1.5.2 of Ibragimov and Has'minskii (1981b), that for any local sequence γ_n(δ) = γ₀ + H_n δ, δ ∈ R^d, and any N > 0,

lim_{H→∞} lim_{n→∞} H^N P_{γ_n(δ)} {|Z_n| > H} = 0,  and  lim_{n→∞} E_{P_{γ_n(δ)}} ρ(Z_n) = E_{P_γ₀} ρ(Z) < ∞.   (B.7)

The last result is not needed to prove Theorem 3.2, but will be used later.

B.3. Proof of Theorem 3.3. To show claim 1, where ρ(z) = z'z, note that under the conditions of Theorem 3.2, by (B.7),

lim_{n→∞} E_{P_γ₀} ρ(Z_n) = E_{P_γ₀} ρ(Z) < ∞.

Consider the problem min_c E_{P_γ₀} ρ(Z + c). The solution of this minimization problem is c = −E_{P_γ₀} Z. Suppose E_{P_γ₀} Z ≠ 0, and set c = −E_{P_γ₀} Z; then

E_{P_γ₀} ρ(Z + c) < E_{P_γ₀} ρ(Z),   (B.8)

where by Lemma 3.1 the lhs of (B.8) is the asymptotic average risk of the sequence of estimators γ̂ + H_n c, and the rhs of (B.8) is the asymptotic average risk of the sequence of posterior means γ̂. Then (B.8) contradicts the asymptotic average risk efficiency of the posterior mean established in Lemma 3.1. Thus it must be that E_{P_γ₀} Z = 0.

To show claim 2, note that by Theorem 3.2 and the definition of weak convergence,

lim_{n→∞} P_{γ₀}{(Z_n(τ))_j ≤ 0} = P_{γ₀}{(Z(τ))_j ≤ 0},

where 0 is assumed to be a continuity point of the distribution of (Z(τ))_j. Consider the problem min_c E_{P_γ₀} ρ((Z(τ))_j − c; τ), where ρ(z; τ) = (τ − 1(z < 0)) z is the check function. Note that this quantity is finite for any c by (B.7). The solution of this problem is given by the root of the first order condition

P_{γ₀}{(Z(τ))_j > c} = τ, or P_{γ₀}{(Z(τ))_j ≤ c} = 1 − τ,   (B.9)

i.e. c = (1 − τ)-th quantile of (Z(τ))_j (under the condition that (Z(τ))_j has positive density in any small neighborhood of 0). Suppose c ≠ 0; then

E_{P_γ₀} ρ((Z(τ))_j − c; τ) < E_{P_γ₀} ρ((Z(τ))_j; τ),   (B.10)

where by Lemma 3.1 the lhs of (B.10) is the asymptotic average risk of the sequence of estimators (γ̂(τ))_j + H_n c, and the rhs is defined as the asymptotic average risk of the posterior τ-th quantile (γ̂(τ))_j. Then (B.10) contradicts the asymptotic average risk efficiency of the
posterior quantiles under the check function loss established in Lemma 3.1 (see Section 3.1 for the definitions). Thus it must be that c = 0 in (B.9), so that the first part of equation (3.12) is proven. To show the second part of claim 2, note that, by the first part,

lim_{n→∞} P_{γ_n(δ)}{(γ̂(τ'))_j ≤ (γ_n(δ))_j ≤ (γ̂(τ''))_j} = P_{γ₀}{(Z(τ''))_j ≥ 0} − P_{γ₀}{(Z(τ'))_j > 0} = τ'' − τ',

using that P_{γ₀}{(Z(τ))_j ≤ 0} = 1 − τ and the continuity of the distribution of (Z(τ))_j at 0.

B.4. Proof of Theorem 3.4. Consider at first two useful lemmas.

Lemma B.1 (Integral Convergence). Suppose that (1) ℓ_n(z) converges in distribution marginally to ℓ_∞(z) under a parameter sequence γ_n(δ) = γ₀ + H_n δ, and (2) ℓ_n(z) has the properties specified in Lemma C.2. Then, for any vector-valued continuous function g(z) dominated by a polynomial as |z| → ∞,

∫_{K_n} g(z) ℓ_n(z) μ(γ₀ + H_n z) dz / ∫_{K_n} ℓ_n(z') μ(γ₀ + H_n z') dz' →_d ∫_K g(z) ℓ_∞(z) dz / ∫_K ℓ_∞(z') dz',   (B.11)

where either (a) K_n = {z : γ_n(δ) + H_n z ∈ Γ} and K = R^d, or (b) K_n = K is a fixed cube centered at the origin. [Convergence in distribution here is taken under the sequence γ_n(δ).]

Lemma B.2 (Convexity Lemma). Suppose {R_n} is a sequence of R-valued random convex functions defined on R^d, i.e. R_n(z) is convex in z a.s., and R_n converges in distribution to R_∞ in the finite-dimensional sense: for any (z_j, j ≤ l), (R_n(z_j), j ≤ l) →_d (R_∞(z_j), j ≤ l), where R_∞(z) is finite on an open non-empty set and has a unique minimizer a.s. Then

arg inf_{z∈R^d} R_n(z) →_d arg inf_{z∈R^d} R_∞(z).

Proof of Lemma B.1. Assertion (a) is a special case of Lemma 1.5.1 in Ibragimov and Has'minskii (1981b). Assertion (b) is proven on p. 106–109 of Ibragimov and Has'minskii (1981b) under conditions more general than conditions (1) and (2).

Proof of Lemma B.2. See Davis, Knight, and Liu (1992) and Pollard (1991).

The first part of the proof of Theorem 3.4 is done by setting the true parameter sequence γ_n(δ) = γ₀. Considering a general sequence does not change the argument but significantly complicates notation. Write

Z_n(τ) = c_n(τ) − r_n(γ₀),  c_n(τ) = arg inf_{z∈R^d} Γ_n(z),

where

Γ_n(z) = ∫_{R^d} ρ(z − r_n(γ₀ + H_n z̃) + r_n(γ₀); τ) ℓ_n(z̃) μ(γ₀ + H_n z̃) dz̃ / ∫_{R^d} ℓ_n(z') μ(γ₀ + H_n z') dz'.

Since |r_n(γ₀ + H_n z) − r_n(γ₀) − R'z| = O(|H_n z|^a)|z| for some a > 0, it is the case, by the properties of the check function, that

|ρ(z − r_n(γ₀ + H_n z̃) + r_n(γ₀); τ) − ρ(z − R'z̃; τ)| ≤ O(|H_n z̃|^a)(1 + |z̃|).

Hence, by Lemma B.1,

Γ_n(z) →_d Γ_∞(z) = ∫_{R^d} ρ(z − R'z̃; τ) ℓ_∞(z̃) dz̃ / ∫_{R^d} ℓ_∞(z') dz'

in the finite-dimensional sense. Since Γ_n(z) is convex in z, applying Lemma B.2 gives

Z_n(τ) →_d Z(τ) = arg inf_{z∈R^d} Γ_∞(z),

where Z(τ) denotes the minimizer of Γ_∞. Next we need to establish the uniform integrability of {Z_n(τ)}. Consider the linear transformation ξ = M'z defined by a nonsingular matrix M; then the likelihood ratio, viewed as a function of ξ, satisfies, for some c' > 0 and c'' > 0, E_{P_γ} ℓ_n^{1/2}(M^{−1}ξ) ≤ e^{−c'(|M^{−1}ξ|−1)} ≤ e^{−c''(|ξ|−1)}, by the nonsingularity of M. In what follows it suffices to consider
only linear functions such that r_n(γ₀ + H_n z) − r_n(γ₀) = R'z, with R' = M' as above; the general case follows identically. Then, by Lemma C.2 and the nonsingularity of M, for some c' > 0: (a) E_{P_{γ_n(δ)}} ℓ_n^{1/2}(M^{−1}ξ) ≤ e^{−c'(|ξ|−1)}, and (b) E_{P_{γ_n(δ)}} |ℓ_n^{1/2}(M^{−1}ξ) − ℓ_n^{1/2}(M^{−1}ξ')|² ≤ A(|ξ − ξ'|)(1 + 2(|ξ| ∨ |ξ'|)). Hence, by Theorem 1.5.2 of Ibragimov and Has'minskii (1981b), for any local sequence γ_n(δ) = γ₀ + H_n δ, δ ∈ R^d, and any N > 0,

lim_{H→∞} lim sup_{n→∞} H^N P_{γ_n(δ)} {|Z_n(τ)| > H} = 0,   (B.12)

hence

lim sup_{n→∞} E_{P_{γ_n(δ)}} ρ(Z_n(τ); τ) = E_{P_γ₀} ρ(Z(τ); τ) < ∞.   (B.13)

By (B.12)–(B.13) and an argument identical to the steps in the proof of the asymptotic average risk efficiency in Lemma 3.1, it can be concluded that {c_n(τ)} minimizes the asymptotic average risk

lim_{n→∞} (1/λ(K)) ∫_K E_{P_{γ_n(δ)}} ρ(Z_n(τ); τ) dδ

over all statistic sequences {ξ_n} which are measurable functions of the sample (Y_i, X_i, i ≤ n), in the sense of achieving its infimum. The rest of the argument, which establishes the unbiasedness and the resulting coverage properties of the τ-posterior quantiles, is then identical to the steps in the proof of Theorem 3.3.

B.5. Proof of Lemma 3.1. Claim 1 is just a special case of Theorem 1.1, Chapter 5, of Lehmann and Casella (1998). Claim 2 follows by an argument similar to that given by Ibragimov and Has'minskii (1981b), p. 93. Details are omitted for brevity; see Chernozhukov and Hong (2003).

Appendix C. Important Lemmas

Let γ = (β, α) and h = (h_β, h_α). Define the standard Hellinger distance r₂(γ; γ + h) by

r₂²(γ; γ + h) = E_X ∫_R ( f^{1/2}(y − g(X, β + h_β)|X; γ + h) − f^{1/2}(y − g(X, β)|X; γ) )² dy

(note that E_X, i.e. integration with respect to F_X(dx), can be taken outside the brackets, since F_X(dx) does not depend on the parameters).

Lemma C.1 (Hellinger Distance Properties). Under C0–C5, there are constants a > 0 and A < ∞, where A depends only on the diameter of the parameter space Γ, such that for all γ and γ + h in Γ:

(a) r₂²(γ; γ + h) ≥ a min{ max(|h_β|, |h_α|²), 1 },  (b) r₂²(γ; γ + h) ≤ A (|h_β| + |h_α|²).   (C.1)

Lemma C.2 (Exponential Tails and Hölder Continuity). Given (C.1), for some a' > 0 and A < ∞, there is n₀ such that for all n > n₀, uniformly in γ such that γ + H_n z ∈ Γ and γ + H_n z' ∈ Γ (with | · | denoting the sup norm),

E_{P_γ} ℓ_n^{1/2}(z) ≤ e^{−a'(|z|−1)}  and  E_{P_γ} |ℓ_n^{1/2}(z) − ℓ_n^{1/2}(z')|² ≤ A (|z − z'|)(1 + 2(|z| ∨ |z'|)).   (C.2)

Proof of Lemma C.2. The first bound follows by the standard manipulation of the Hellinger distance, as on p. 260 in Ibragimov and Has'minskii (1981a): writing h_n = (u'/n, v'/√n)' for z = (u', v')',

E_{P_γ} ℓ_n^{1/2}(z) = (1 − (1/2) r₂²(γ; γ + h_n))^n ≤ e^{−(n/2) r₂²(γ; γ + h_n)} ≤ e^{−(a/2) min{max(|u|, |v|²), n}} ≤ e^{−a'(|z|−1)},

where the third inequality is by (C.1)(a). The second bound also follows by the standard manipulation of the Hellinger distance, as on p. 260 in Ibragimov and Has'minskii (1981a), and by (C.1)(b):

E_{P_γ} |ℓ_n^{1/2}(z) − ℓ_n^{1/2}(z')|² ≤ n r₂²(γ + H_n z; γ + H_n z') ≤ n A (|u − u'|/n + |v − v'|²/n) ≤ A (|z − z'| + |z − z'|²) ≤ A (|z − z'|)(1 + 2(|z| ∨ |z'|)),

where the remaining steps are elementary.

Proof of Lemma C.1. In order to establish (C.1)(b), let h = (h_β, h_α) and write

r₂²(γ; γ + h) ≤ E_X ∫_{[g(X,β), g(X,β+h_β)]} | f(y − g(X,β+h_β)|X; β+h_β, α+h_α) − f(y − g(X,β)|X; γ) | dy
 + 2 E_X ∫_R ( f^{1/2}(y − g(X,β+h_β)|X; β+h_β, α+h_α) − f^{1/2}(y − g(X,β+h_β)|X; β+h_β, α) )² dy
 + 2 E_X ∫_{R∖[g(X,β), g(X,β+h_β)]} ( f^{1/2}(y − g(X,β+h_β)|X; β+h_β, α) − f^{1/2}(y − g(X,β)|X; γ) )² dy
 ≤ 2 f̄ E_X |g(X,β+h_β) − g(X,β)| + O(|h_α|²) + O(|h_β|²)
 ≤ 2 f̄ ḡ' |h_β| + O(|h_α|²) + O(|h_β|²),

where [a, b] denotes [b, a] if b < a, and |[a, b]| = |a − b|. The first inequality follows from the triangle inequality and from the bound (f^{1/2}(·) − f^{1/2}(·))² ≤ |f(·) − f(·)| on the strip [g(X,β), g(X,β+h_β)], where the two square roots may be evaluated on the opposite sides of the jump. The first term in the second inequality follows since f ≤ f̄ uniformly in γ; the second and the third terms in the second inequality follow by the Taylor expansion and Fubini, using the dominance conditions C4–C5 for the expansion in α, and the smoothness of f away from the jump for the expansion in β. The third inequality follows by the Taylor expansion of g(X, β + h_β); the third term in that inequality is controlled by C2.
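The contrast in Lemma C.1 between the linear |h_β| term and the quadratic |h_α|² term can be seen numerically: for a density with a jump, the squared Hellinger distance grows linearly in a location shift, while for a smooth density it grows quadratically. The sketch below is a numerical illustration only; the Exp(1) and N(0,1) densities are convenient stand-ins, not the models of the paper, and the integrals are approximated by midpoint Riemann sums.

```python
import numpy as np

def hellinger_sq(f1, f2, grid, dx):
    # squared Hellinger distance: integral of (sqrt(f1) - sqrt(f2))^2
    a, b = f1(grid), f2(grid)
    return float(((np.sqrt(a) - np.sqrt(b)) ** 2).sum() * dx)

dx = 1e-4
grid = np.arange(-10.0, 30.0, dx) + dx / 2   # midpoint grid

f_exp = lambda e: np.where(e >= 0, np.exp(-np.clip(e, 0, None)), 0.0)  # jump at 0
f_norm = lambda e: np.exp(-e ** 2 / 2) / np.sqrt(2 * np.pi)            # smooth

h = 0.01  # small location shift
r2_jump = hellinger_sq(lambda e: f_exp(e - h), f_exp, grid, dx)
r2_smooth = hellinger_sq(lambda e: f_norm(e - h), f_norm, grid, dx)
print(r2_jump, r2_smooth)  # order h for the jump density, order h^2 for the smooth one
```

The linear behavior at the jump is exactly what produces the n (rather than √n) convergence rate for the location parameters β.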
The lower bound, equation (C.1)(a), is established by considering separately the cases |h| ≥ δ and |h| < δ, for a sufficiently small δ > 0. Indeed, for |h| ≥ δ, by the identification condition C0 and compactness, there is ε_δ > 0 such that

r₂²(γ; γ + h) ≥ ε_δ > 0 for all γ, γ + h ∈ Γ.   (C.4)

On the other hand, it is shown below that for |h| < δ, for a sufficiently small δ,

r₂²(γ; γ + h) ≥ const · max(|h_β|, |h_α|²).   (C.3)

The bound in (C.1)(a) is immediate from (C.3)–(C.4). Write

r₂²(γ; γ + h) ≥ E_X ∫_{[g(X,β), g(X,β+h_β)]} ( f^{1/2}(y − g(X,β+h_β)|X; γ + h) − f^{1/2}(y − g(X,β)|X; γ) )² dy.

For small |h_β|, on the area [g(X,β), g(X,β+h_β)] the two densities are evaluated on the opposite sides of their jump points, so that by (2.2), C1, and C2 the integrand is bounded below by a constant; hence, using C3 and a Taylor expansion,

E_X ∫_{[g(X,β), g(X,β+h_β)]} (·)² dy ≥ const · E_X |g(X,β+h_β) − g(X,β)| − O(|h_β|²) ≥ const · |h_β|,

where the remainder term O(|h_β|²) arises from neglecting the integrand over a small part of the area [g(X,β), g(X,β+h_β)]. Under C2, C3, and C4(a), this already yields (C.3). On the other hand, if assumption C4(b) holds, the uniform lower bound

E_X ∫_R ( (∂/∂α) f^{1/2}(y − g(X,β)|X; γ)' h_α )² dy ≥ const · |h_α|²,

which follows from the positive definiteness in C4(b), yields, together with the preceding display, the conclusion r₂²(γ; γ + h) ≥ const · max(|h_β|, |h_α|²).

References

Aigner, D., T. Amemiya, and D. Poirier (1976): "On the Estimation of Production Frontiers: Maximum Likelihood Estimation of the Parameters of a Discontinuous Density Function," International Economic Review, 17, 377–396.

Andrews, D.
(1994): "Empirical Process Methods in Econometrics," in Handbook of Econometrics, Vol. 4, ed. by R. Engle and D. McFadden. North Holland.
Andrews, D. W. (1997): "A stopping rule for the computation of generalized method of moments estimators," Econometrica, 65(4), 913–931.
Artstein, Z., and R. J.-B. Wets (1995): "Consistency of minimizers and the SLLN for stochastic programs," J. Convex Anal., 2(1-2), 1–17.
Berger, J. O. (1993): Statistical Decision Theory and Bayesian Analysis, Springer Series in Statistics. Springer-Verlag, New York. Corrected reprint of the second (1985) edition.
Bowlus, A., G. Neumann, and N. Kiefer (2001): "Equilibrium Search Models and the Transition from School to Work," International Economic Review, 42(2), 317–343.
Chernozhukov, V. (2001): "Extremal Quantile Regression," MIT Working Paper.
Chernozhukov, V., and H. Hong (2003): "Likelihood Estimation and Inference in Some Nonregular Econometric Models," Working Paper, Department of Economics, MIT, to be published by the Social Science Research Network at www.ssrn.com.
Christensen, B., and N. Kiefer (1991): "The Exact Likelihood Function for an Empirical Job Search Model," Econometric Theory, 7, 464–486.
Davis, R. A., K. Knight, and J. Liu (1992): "M-estimation for autoregressions with infinite variance," Stochastic Process. Appl., 40(1), 145–180.
Donald, S. G., and H. J. Paarsch (1993a): "Maximum Likelihood Estimation when the Support of the Distribution Depends upon Some or All of the Unknown Parameters," Typescript, Department of Economics, University of Western Ontario, London, Canada.
Donald, S. G., and H. J. Paarsch (1993b): "Piecewise Pseudo-Maximum Likelihood Estimation in Empirical Models of Auctions," International Economic Review, 34, 121–148.
Donald, S. G., and H. J. Paarsch (1996): "Identification, Estimation, and Testing in Parametric Empirical Models of Auctions within the Independent Private Values Paradigm," Econometric Theory, 12, 517–567.
Donald, S. G., and H. J. Paarsch (2002): "Superconsistent estimation and inference in structural econometric models using extreme order statistics," Journal of Econometrics, 109(2), 305–340.
Dupacova, J., and R. Wets (1988): "Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems," Ann. Statist., 16(4), 1517–1549.
Embrechts, P., C. Klüppelberg, and T. Mikosch (1997): Modelling Extremal Events, vol. 33 of Applications of Mathematics. Springer-Verlag, Berlin. For insurance and finance.
Flinn, C., and J. Heckman (1982): "New Methods for Analyzing Structural Models of Labor Force Dynamics," Journal of Econometrics, 18, 115–190.
Hirano, K., and J. Porter (2002): "Asymptotic Efficiency in Parametric Structural Models with Parameter-Dependent Support," October 2002, forthcoming in Econometrica.
Horowitz, J. (2000): "The Bootstrap," University of Iowa; to appear, Handbook of Econometrics, vol. 5.
Ibragimov, I., and R. Has'minskii (1981a): Statistical Estimation: Asymptotic Theory. Springer-Verlag, Chapter V: Independent Identically Distributed Observations. Densities with Jumps.
——— (1981b): Statistical Estimation: Asymptotic Theory. Springer-Verlag, Chapter I: The Problem of Statistical Estimation.
Jennrich, R. I. (1969): "Asymptotic properties of non-linear least squares estimators," Ann. Math. Statist., 40, 633–643.
Knight, K. (2000): "Epi-convergence and Stochastic Equisemicontinuity," Preprint.
Koenker, R., and G. S. Bassett (1978): "Regression Quantiles," Econometrica, 46, 33–50.
Lehmann, E., and G. Casella (1998): Theory of Point Estimation. Springer.
Meyer, R. M. (1973): "A poisson-type limit theorem for mixing sequences of dependent 'rare' events," Ann. Probability, 1.
Paarsch, H. (1992): "Deciding between the common and private value paradigms in empirical models of auctions," Journal of Econometrics, 51, 191–215.
Pflug, G. C. (1995): "Asymptotic stochastic programs," Math. Oper. Res., 20(4), 769–789.
Politis, D., J. Romano, and M.
Wolf (1999): Subsampling. Springer Series in Statistics. Springer-Verlag, New York.
Pollard, D. (1991): "Asymptotics for Least Absolute Deviation Regression Estimators," Econometric Theory, 7, 186–199.
Portnoy, S. (1991): "Asymptotic behavior of the number of regression quantile breakpoints," SIAM J. Sci. Statist. Comput., 12(4), 867–883.
Portnoy, S., and R. Koenker (1997): "The Gaussian Hare and the Laplacian Tortoise," Statistical Science, 12, 279–300.
Resnick, S. I. (1986): "Point processes, regular variation and weak convergence," Adv. in Appl. Probab., 18(1), 66–138.
——— (1987): Extreme Values, Regular Variation, and Point Processes. Springer-Verlag.
Robert, C. P., and G. Casella (1998): Monte Carlo Statistical Methods. Springer-Verlag.
Rockafellar, R. T., and R. J.-B. Wets (1998): Variational Analysis. Springer-Verlag, Berlin.
Salinetti, G., and R. J.-B. Wets (1986): "On the convergence in distribution of measurable multifunctions (random sets), normal integrands, stochastic processes and stochastic infima," Math. Oper. Res., 11(3), 385–419.
Smith, R. L. (1994): "Nonregular regression," Biometrika, 81(1), 173–183.
van der Vaart, A. (1999): Asymptotic Statistics. Cambridge University Press.
——— (2000): "Lecture notes on Bayesian Estimation," preprint, Aarhus University.
van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer-Verlag, New York.
Wald, A. (1950): Statistical Decision Functions. John Wiley & Sons Inc., New York, N.Y.

Table 1. Estimator Performance for Intercept β₀ (based on 500 repetitions).

Estimator          RMSE     MAD      Median AD   pth loss (p=10)
n=100
Posterior mean     0.0114   0.0081   0.0059      0.0401
Posterior median   0.0115   0.0078   0.0054      0.0433
MLE                0.0127   0.0088   0.0063      0.0369
n=400
Posterior mean     0.0029   0.0021   0.0015      0.0104
Posterior median   0.0030   0.0020   0.0014      0.0106
MLE                0.0034   0.0023   0.0015      0.0103

Table 2. Estimator Performance for Slope β₁ (based on 500 repetitions).

Estimator          RMSE     MAD      Median AD   pth loss (p=10)
n=100
Posterior mean     0.0212   0.0147   0.0105      0.0983
Posterior median   0.0214   0.0144   0.0096      0.1053
MLE                0.0216   0.0145   0.0096      0.0693
n=400
Posterior mean     0.0053   0.0037   0.0026      0.0227
Posterior median   0.0054   0.0037   0.0025      0.0228
MLE                0.0057   0.0038   0.0024      0.0201

Table 3. Comparison of Inference Methods: Coverage and Average Length of the Nominal 90% Confidence Intervals (based on 500 repetitions).

Confidence Interval        coverage: intercept  length: intercept  coverage: slope  length: slope
n=100
Posterior Interval         0.87                 0.0298             0.85             0.0555
Bootstrap: P-mean          0.88                 0.0392             0.86             0.0720
Subsampling: P-mean        0.83                 0.0416             0.82             0.0770
Limit process: P-mean      0.86                 0.0346             0.84             0.0587
Limit process: P-median    0.85                 0.0352             0.86             0.0610
Limit process: MLE         0.93                 0.0347             0.95             0.0653
n=400
Posterior Interval         0.87                 0.0075             0.86             0.0140
Bootstrap: P-mean          0.84                 0.0089             0.88             0.0167
Subsampling: P-mean        0.82                 0.0085             0.83             0.0158
Limit process: P-mean      0.89                 0.0085             0.82             0.0145
Limit process: P-median    0.86                 0.0087             0.86             0.0150
Limit process: MLE         0.89                 0.0084             0.92             0.0157

Technical Addendum, Part I

This addendum includes the excluded material, such as omitted simpler calculations and simpler proofs, and some background material on point processes. The addendum will be made available as a part of an MIT Economics Department Technical Report published by the Social Science Research Network.

Appendix D. Point Processes

The following definitions are collected from Resnick (1987).

Definition D.1 (M_p(E)). Let E be a locally compact topological space with a countable basis, and let ℰ be the Borel σ-algebra of subsets of E. A point measure (p.m.) p on (E, ℰ) is a measure of the following form: for {x_i, i ≥ 1} a countable collection of points of E (called the points of p) and any A ∈ ℰ,

p(A) = Σ_i 1(x_i ∈ A).

If p(K) < ∞ for any compact K ⊂ E, then p is said to be Radon. Let M_p(E) be the collection of all Radon point measures. A p.m. p is simple
if p({x}) ≤ 1 for every x ∈ E. A sequence {p_n} ⊂ M_p(E) converges vaguely to p if ∫ f dp_n → ∫ f dp for all functions f ∈ C_K(E) [continuous, real-valued, and vanishing outside a compact set]. Vague convergence induces the vague topology on M_p(E); the topological space M_p(E) is metrizable as a complete separable metric space, and M_p(E) denotes such a metric space hereafter. Define ℳ_p(E) to be the σ-algebra generated by the open sets.

Definition D.2 (Point Processes; Convergence in Distribution). A point process is a measurable map N : (Ω, F, P) → (M_p(E), ℳ_p(E)), i.e. for every elementary event ω ∈ Ω, the realization N(ω) of the point process is some point measure in M_p(E). Weak convergence of the point processes taking values in M_p(E) is the same as in any metric space, cf. Resnick (1987): we shall write N_n ⇒ N in M_p(E) if E_P h(N_n) → E_P h(N) for all continuous and bounded functions h mapping M_p(E) to R. Note that if N_n ⇒ N in M_p(E), then ∫_E f(x) dN_n(x) →_d ∫_E f(x) dN(x) for any f ∈ C_K(E), by the continuous mapping theorem.

Definition D.3 (Poisson Point Process). The point process N is a Poisson point process with mean intensity measure m on (E, ℰ) if:

(a) for any F ∈ ℰ and any non-negative integer k,

P(N(F) = k) = e^{−m(F)} m(F)^k / k!  if m(F) < ∞,  and  P(N(F) = k) = 0  if m(F) = ∞;

(b) if (F_i, i ≤ k) are disjoint sets in ℰ, then (N(F_i), i ≤ k) are independent random variables.

Appendix E. Plausibility of Conditions C0–C5

This section illustrates the plausibility of conditions C0–C5 using Paarsch's (1992) independent private value auction model described in Section 2.1. For clarity, consider the example used in the Monte-Carlo section, with θ₁ = exp(β'Z) and the following assumptions. More general examples can be verified similarly; for example, θ₂ can be made a function of the regressors too, and the verification proceeds similarly but is notationally much more burdensome.

Assumptions (in addition to those listed in Section 2.1):

A1 (Z_i, m_i) are iid; β ∈ B, a compact convex subset of R^{d_z};
A1. (Z_i, m_i) are iid; β ∈ B, a compact convex subset of Euclidean space; X ∈ 𝒳, a compact subset of Euclidean space.
A2. EZZ′ is positive definite.
A3. 3 ≤ m_i ≤ M < ∞, where m_i is the (non-degenerate) number of bidders (minus 1), and F_X(x) does not depend on β.

Verification of C0: A1 implies iid sampling and the compactness and convexity of the parameter space stated in C0. Note that for β ≠ β′, g(X, β) ≠ g(X, β′) with positive probability, since g(X, β) = exp(β′Z) is strictly monotone in β′Z and EZZ′ is positive definite by A2. Since the density function f(y − g(X, β)|X, β) is strictly monotone in g(X, β), identification holds.

Verification of C1: Clearly the model is a one-sided model in which

  f(e|X, β) = m g(X, β)^m / [e + g(X, β)]^{m+1} · 1(e ≥ 0),

and f(0|X, β) = m / g(X, β) is strictly positive and bounded away from 0 and from above by A1 and A3, so that C1.(ii) holds for each X.

Verification of C2: f(e|X, β), as defined, is obviously upper semi-continuous in e, is maximized at e = 0, is clearly continuous in (e, X, β) except at e = 0, and is uniformly bounded by A1 and A3. Its first partial derivative in e exists except at e = 0 and is bounded uniformly in (e, X, β) by A1 and A3. The first derivative of f(e|X, β) in β is

  ∂f(e|X, β)/∂β = Z f(e|X, β) [ m − (m + 1) g(X, β)/(e + g(X, β)) ] 1(e ≥ 0),

which is continuous in β for each e and bounded uniformly in (e, X, β) by A1 and A3. This holds similarly for

  ∂²f(e|X, β)/∂β∂β′ = ZZ′ f(e|X, β) { [ m − (m + 1) g(X, β)/(e + g(X, β)) ]² − (m + 1) e g(X, β)/(e + g(X, β))² } 1(e ≥ 0).

All of the above quantities are continuous in X and β. Finally, for some constant C > 0,

  sup_{β∈B} | ∂f(y − g(X, β)|X, β)/∂y | ≤ C · 1(y ≥ C^{-1}) / y^{m+2},

which implies E_X ∫ sup_{β∈B} | ∂f(y − g(X, β)|X, β)/∂y | dy < ∞.

Verification of C3: This is verified immediately by noting that

  ∂g(X, β)/∂β = g(X, β) Z  and  ∂²g(X, β)/∂β∂β′ = g(X, β) ZZ′,

which are continuous and bounded due to A1. Also note that

  (∂g(X, β)/∂β)(∂g(X, β)/∂β)′ = g(X, β)² ZZ′ ≥ c ZZ′  for some c > 0,

due to A1 and A3, so that by A2, E[g(X, β)² ZZ′] ≥ c EZZ′ is positive definite.

Verification of C4: C4 is not needed, since the parameter α is not present.
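The one-sided density verified under C1 can be sampled exactly by inversion, since F(e|x) = 1 − [g/(e + g)]^m on e ≥ 0. The sketch below uses illustrative values of m and β (not the paper's Monte-Carlo design) and checks the defining feature of the model: the bid density is supported on [g(x, β), ∞), jumping from zero at the regression boundary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters (assumptions for this sketch, not estimates):
# m = bidders minus one, scalar regressor x, boundary g(x, beta) = exp(beta * x).
m, beta = 3, 0.5
x = rng.uniform(0.0, 1.0, 10_000)
g = np.exp(beta * x)

# Inverse-CDF draws from f(e|x) = m g^m / (e + g)^(m+1) on e >= 0:
# F(e|x) = 1 - (g / (e + g))^m  =>  e = g * (U^(-1/m) - 1), U ~ Uniform(0, 1].
u = 1.0 - rng.random(x.size)               # in (0, 1], avoids division by zero
e = g * (u ** (-1.0 / m) - 1.0)
y = g + e      # observed bid: density jumps from 0 to m / g(x) at y = g(x)

print(e.min(), np.mean(y >= g))
```

Every simulated bid lies weakly above the boundary g(x, β), which is the discontinuity that drives the non-regular, rate-n asymptotics for β.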
Verification of C5: With e_i = Y_i − g(X_i, β) > 0, we have that

  f(Y_i − g(X_i, β)|X_i, β) = m g(X_i, β)^m / [e_i + g(X_i, β)]^{m+1} · 1(Y_i − g(X_i, β) > 0).

Therefore, by A1 and A3,

  sup_B | (∂/∂β) ln f(Y_i − g(X_i, β)|X_i, β) |
    = sup_B 1(Y_i − g(X_i, β) > 0) | Z_i | | m − (m + 1) g(X_i, β)/[e_i + g(X_i, β)] | ≤ C₁ |Z_i| < ∞,

and

  sup_B | (∂²/∂β∂β′) ln f(Y_i − g(X_i, β)|X_i, β) |
    = sup_B 1(Y_i − g(X_i, β) > 0) (m + 1) |Z_i Z_i′| e_i g(X_i, β)/[e_i + g(X_i, β)]² ≤ C₂ |Z_i Z_i′| < ∞.

Appendix F. Excluded Simple Derivations

F.1. Proof of Theorem 3.1. These additional derivations are added at the request of a referee. Write

  Q₁ₙ(z) = Σ_{i=1}^n ℓ_{in}(z) × l_{in}(u),   (F.1)

where

  l_{in}(u) = 1(e_i > {Δ_n(X_i, u)/n ∨ 0}) + 1(e_i < {Δ_n(X_i, u)/n ∧ 0}).

First, for a fixed z, we use a Taylor expansion of ℓ_{in}(z) for each i; plugging this back into the expression for Q₁ₙ(z) gives

  Q₁ₙ(z) = −(1/n) Σ_{i=1}^n (∂g(X_i, β₀ + ū_{in}/n)/∂β)′ u · (∂/∂e) ln f(e_i − Δ_n(X_i, ū_{in})/n | X_i; β₀ + ū_{in}/n, α₀ + v̄_{in}/√n) l_{in}(u)
      + v′ (1/√n) Σ_{i=1}^n (∂/∂α) ln f(e_i|X_i, γ₀) l_{in}(u)
      + v′ (1/(2n)) Σ_{i=1}^n (∂²/∂α∂α′) ln f(e_i − Δ_n(X_i, ū_{in})/n | X_i; β₀ + ū_{in}/n, α₀ + v̄_{in}/√n) v · l_{in}(u)
    ≡ I + II + III,

where ū_{in} is a point on the line between 0 and u that depends on (e_i, X_i) (but not on the other observations), and v̄_{in} is a point on the line between 0 and v that depends on (e_i, X_i) (but not on the other observations). Thus, the terms in the above summations are iid. Application of the Chebyshev inequality, the bounded density condition C1, and the bounded derivative condition C3 removes the indicator l_{in}(u) from the term II and adds o_p(1) to the above expression. For example, consider

  Var( (1/√n) Σ_{i=1}^n (∂/∂α) ln f(e_i|X_i, γ₀)(l_{in}(u) − 1) ) ≤ const · (1/n) Σ_{i=1}^n E|l_{in}(u) − 1| ≤ const · f̄ |u| / n = o(1).   (F.2)

Similarly,

  E (1/√n) Σ_{i=1}^n (∂/∂α) ln f(e_i|X_i, γ₀)(l_{in}(u) − 1) = o(1).

Hence II = v′(1/√n) Σ_{i=1}^n (∂/∂α) ln f(e_i|X_i, γ₀) + o_p(1). Also, elementary calculations using the bounded density conditions C2–C5 and the Chebyshev inequality show that replacing the intermediate points ū_{in} and v̄_{in} by 0 adds a term o_p(1) to the whole expression.
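The o_p(1) claims in (F.2) rest on the fact that only an O_p(1) number of observations ever fall within O(1/n) of the jump, so the trimming indicators l_{in}(u) − 1 are almost always zero. A quick simulation check under a stand-in error distribution (exponential errors chosen only for illustration; the argument uses nothing beyond boundedness of the density near zero):

```python
import numpy as np

rng = np.random.default_rng(5)

# The terms l_in(u) - 1 are nonzero only when e_i lies between 0 and
# Delta_n(X_i, u)/n = O(1/n).  With a bounded density f, the expected number
# of such observations is about n * f(0) * (c/n) = c * f(0), bounded in n.
def count_near_jump(n, c=2.0, reps=300):
    e = rng.exponential(1.0, size=(reps, n))
    return (e < c / n).sum(axis=1).mean()   # average count of e_i in (0, c/n)

counts = [count_near_jump(n) for n in (100, 1_000, 10_000)]
print(counts)
```

The average count stays near c · f(0) = 2 at every sample size, so the n^{-1/2}-normalized trimmed sums in (F.2) indeed vanish.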
For the terms I and III, the Markov LLN, along with application of (F.2) and the bounded derivative condition C3, allows one to remove l_{in}(u) from the expression and to replace each of the terms with its limit expectation, which gives the required conclusion. □

F.2. Proof of Independence needed in the Proof of Theorem 3.1. These additional derivations are added at the request of a referee. The proof was omitted from the main text because it follows essentially from the standard proof concerning the independence of minimal order statistics (extremal processes) and sample averages of general form (partial sum processes); see e.g. Resnick (1986) and Lemma 21.19 in van der Vaart (1999). It is presented here for completeness.

Since, up to o_p(1) terms, the only stochastic elements in (Q_n(u_j), j ≤ l) are the indicator sums defined below, we need to show that (Q_n(u_j), j ≤ l) are asymptotically independent of W_n, where, ignoring the o_p(1) term,

  W_n = (1/√n) Σ_{i=1}^n (∂/∂α) ln f(e_i|X_i, γ₀).

For clarity, we first present the proof for the case l = 1, u₁ = u, and then discuss the changes needed to accommodate l > 1. For notational sake, we do not index u by j when l = 1.

Case I: l = 1. By an argument similar to that in (F.2), W_n − W̄_n = o_p(1), where

  W̄_n = (1/√n) Σ_{i=1}^n (∂/∂α) ln f(e_i|X_i, γ₀) 1( e_i > {Δ(X_i)′u/n} ∨ 0 or e_i < {Δ(X_i)′u/n} ∧ 0 ).

Therefore it suffices to show asymptotic independence between W̄_n and Q_n^d(u). Define

  Q_n^p(u) = Σ_{i=1}^n [ 1(0 < ne_i ≤ Δ(X_i)′u) + 1(0 > ne_i > Δ(X_i)′u) ],

the number of observations entering the discontinuous component Q_n^d(u) of the objective (defined in the proof of Theorem 3.1). Asymptotic independence between (Q_n^p(u), Q_n^d(u)) and W̄_n follows, by the Portmanteau Lemma, from proving that for any real x, y and integer k:

  limsup P{ Q_n^p(u) = k, Q_n^d(u) ≤ x, W̄_n ≤ y } ≤ P{ Q^p(u) = k, Q^d(u) ≤ x } P{ W ≤ y }.   (F.3)

To show (F.3), proceed in two steps.
In Step 1 below, invoking iid sampling, it is shown that

  P{ Q_n^p(u) = k, Q_n^d(u) ≤ x, W̄_n ≤ y } = P{ Q_n^p(u) = k, Q_n^d(u) ≤ x } · P{ W̃_{n−k} ≤ y },   (F.4)

where

  W̃_{n−k} = (1/√(n−k)) Σ_{i=1}^{n−k} (∂/∂α) ln f(ẽ_i | X̃_i, γ₀),   (F.5)

and (ẽ_i, X̃_i), for i ≤ n − k, are i.i.d. draws from the distribution of (e_i, X_i) conditional on the event A(e_i, X_i) = { e_i > {Δ(X_i)′u/n} ∨ 0 or e_i < {Δ(X_i)′u/n} ∧ 0 }; define p_n = P{A(e_i, X_i)^c}. Step 2 applies a CLT to show W̃_{n−k} →_d W. Finally, the convergence N_n ⇒ N implies, similarly to the proof of Theorem 3.1 and by the Portmanteau Lemma, that

  limsup P{ Q_n^p(u) = k, Q_n^d(u) ≤ x } ≤ P{ Q^p(u) = k, Q^d(u) ≤ x }.

Thus the proof is complete given Steps 1 and 2.

Step 1. By iid sampling in C0, the left-hand side of (F.4) can be written as

  P{ Q_n^p(u) = k, Q_n^d(u) ≤ x;  e_i > {Δ(X_i)′u/n} ∨ 0 or e_i < {Δ(X_i)′u/n} ∧ 0 for all i ≤ n − k;
     1(0 < ne_i ≤ Δ(X_i)′u) + 1(0 > ne_i > Δ(X_i)′u) = 1 for i = n − k + 1, ..., n }
   = P{ Q_n^p(u) = k, Q_n^d(u) ≤ x } × P{ (1/√(n−k)) Σ_{i=1}^{n−k} (∂/∂α) ln f(ẽ_i|X̃_i, γ₀) ≤ y },

where W̃_{n−k} is the function defined in (F.5).

Step 2. W̃_{n−k} →_d W follows by checking three conditions: (a) E W̃_{n−k} → 0, (b) Var(W̃_{n−k}) → Var(W), and (c) Lindeberg's condition is satisfied. Condition (a) requires E W̃_{n−k} = √(n−k) E (∂/∂α) ln f(ẽ_i|X̃_i, γ₀) → 0. This is true because

  E (∂/∂α) ln f(ẽ_i|X̃_i, γ₀) = E_X ∫_{A(e, X)} (∂/∂α) ln f(e|X, γ₀) f(e|X, γ₀) de / P{A(e_i, X_i)},

lim_{n→∞} P{A(e_i, X_i)} = 1, and E_X ∫ (∂/∂α) ln f(e|X, γ₀) f(e|X, γ₀) de = 0, while for large enough n the complementary integral satisfies

  | E_X ∫_{A(e, X)^c} (∂/∂α) ln f(e|X, γ₀) f(e|X, γ₀) de | ≤ const / n,

so that √(n−k) · E (∂/∂α) ln f(ẽ_i|X̃_i, γ₀) → 0. Hence E W̃_{n−k} → 0. A similar calculation verifies that

  Var(W̃_{n−k}) = E[ (∂/∂α) ln f(ẽ_i|X̃_i, γ₀) (∂/∂α) ln f(ẽ_i|X̃_i, γ₀)′ ]
    → E[ (∂/∂α) ln f(e_i|X_i, γ₀) (∂/∂α) ln f(e_i|X_i, γ₀)′ ] = Var(W).

The final step is to verify the Lindeberg condition: for all λ and all ε > 0,

  (1/σ²_{n−k}) E[ (λ′(∂/∂α) ln f(ẽ_i|X̃_i, γ₀))² 1( |λ′(∂/∂α) ln f(ẽ_i|X̃_i, γ₀)| > ε σ_{n−k} √(n−k) ) ] → 0,   (F.6)

where we define

  σ²_{n−k} = λ′ Var( (∂/∂α) ln f(ẽ_i|X̃_i, γ₀) ) λ.   (F.7)

Since σ²_{n−k} converges to a positive constant by (b), the conclusion (F.6) follows from E (λ′(∂/∂α) ln f(ẽ_i|X̃_i, γ₀))⁴ < ∞, which, in turn, follows from lim_{n→∞} P{A(e_i, X_i)} = 1 and E (λ′(∂/∂α) ln f(e_i|X_i, γ₀))⁴ < ∞, by C4.
Case II: l > 1. This involves more tedious notation but follows the same logic. Note that W_n − W̄_n = o_p(1), where

  W̄_n = (1/√n) Σ_{i=1}^n (∂/∂α) ln f(e_i|X_i, γ₀) 1( e_i > {Δ(X_i)′u_j/n} ∨ 0 or e_i < {Δ(X_i)′u_j/n} ∧ 0, for all j ≤ l ).

Hence it suffices to show asymptotic independence between (Q_n^p(u_j), Q_n^d(u_j), j ≤ l) and W̄_n. By the Portmanteau Lemma, it suffices to show that for any real x_j and y and any integers k_j and k, for j = 1, ..., l:

  limsup P{ Q_n^p(u₁) = k₁, Q_n^d(u₁) ≤ x₁, ..., Q_n^p(u_l) = k_l, Q_n^d(u_l) ≤ x_l, Q̄_n = k, W_n ≤ y }
    ≤ P{ Q^p(u₁) = k₁, Q^d(u₁) ≤ x₁, ..., Q^p(u_l) = k_l, Q^d(u_l) ≤ x_l, Q̄ = k } · P{ W ≤ y },   (F.8)

where

  Q̄_n = Σ_{i=1}^n ( 1[0 < ne_i ≤ Δ(X_i)′u_j for some j ≤ l] + 1[0 > ne_i > Δ(X_i)′u_j for some j ≤ l] ).

By iid sampling in C0, the left-hand side of (F.8) (without the limsup) equals

  P{ Q_n^p(u₁) = k₁, Q_n^d(u₁) ≤ x₁, ..., Q_n^p(u_l) = k_l, Q_n^d(u_l) ≤ x_l, Q̄_n = k }
    × P{ (1/√(n−k)) Σ_{i=1}^{n−k} (∂/∂α) ln f(ẽ_i|X̃_i, γ₀) ≤ y },

where (ẽ_i, X̃_i) are i.i.d. draws from the distribution of (e_i, X_i) conditional on

  A(e_i, X_i) = { e_i > {Δ(X_i)′u_j/n} ∨ 0 or e_i < {Δ(X_i)′u_j/n} ∧ 0, for all j ≤ l }.

Similarly to the proof of Case I, the convergence N_n ⇒ N implies, by the Portmanteau Lemma, that

  limsup P{ Q_n^p(u₁) = k₁, Q_n^d(u₁) ≤ x₁, ..., Q_n^p(u_l) = k_l, Q_n^d(u_l) ≤ x_l, Q̄_n = k }
    ≤ P{ Q^p(u₁) = k₁, Q^d(u₁) ≤ x₁, ..., Q^p(u_l) = k_l, Q^d(u_l) ≤ x_l, Q̄ = k }.

Finally, P{W̃_{n−k} ≤ y} → P{W ≤ y} follows similarly to the proof of conditions (a)–(c) in Step 2 of Case I (when l = 1). □

F.3. Proofs for Chapter 5. Claim 1 is just a special case of Theorem 1.1 of Lehmann and Casella (1998). Claim 2 follows by an argument similar to that given by Ibragimov and Has'minskii (1981b), p. 93. Let Z_n^δ = H_n^{-1}(γ̂_{ρ,Λ,n} − γ_n(δ)), where the index δ emphasizes the dependence of the distribution of Z_n^δ on the local parameter sequence γ_n(δ).
Define

  I_n(K) = (1/λ(K)) ∫_K E_{P_{γ_n(δ)}} ρ(Z_n^δ) dδ,  I(K) = limsup_n I_n(K).

Next, let Z_n^δ(K) = H_n^{-1}(γ̂_{ρ,Λ_K,n} − γ_n(δ)), where γ̂_{ρ,Λ_K,n} is the Bayes estimator defined with respect to the loss function ρ and the prior Λ_K supported on {γ_n(δ) : δ ∈ K}, and define

  II_n(K) = (1/λ(K)) ∫_K E_{P_{γ_n(δ)}} ρ(Z_n^δ(K)) dδ,  II(K) = limsup_n II_n(K).

The finite-sample average-risk efficiency of the Bayes estimator implies II_n(K) ≤ I_n(K) for each n, hence II(K) ≤ I(K) for any K with |K| < ∞. By Lemma B.2 and Lemma B.1, Z_n^δ(K) →_d Z^δ(K), and this convergence, together with conclusion (B.7), provides the necessary uniform integrability to conclude by Fatou's lemma that

  lim_n II_n(K) = (1/λ(K)) ∫_K E_{P_{γ₀}} ρ(Z^δ(K)) dδ = II(K).

Define I = E_{P_{γ₀}} ρ(Z). Then limsup_{K↑ℝ^d} II(K) = I follows from (a) II(K) ≤ I(K) for each K, (b) noting that Z^δ(K) →_p Z as K ↑ ℝ^d, and (c) the dominated convergence theorem, as shown below. The claim now follows. Indeed, suppose there exists an estimator sequence {γ̃_n} that achieves a strictly lower asymptotic average risk; then it must be that for some K and infinitely many n ≥ n₀,

  (1/λ(K)) ∫_K E_{P_{γ_n(δ)}} ρ( H_n^{-1}(γ̃_n − γ_n(δ)) ) dδ < II_n(K),

which contradicts the finite-sample average-risk efficiency of the Bayes estimator γ̂_{ρ,Λ_K,n} for each such n. Rewrite II(K) − I as

  II(K) − I = (1/λ(K)) ∫_K E_{P_{γ₀}} [ρ(Z^δ(K)) − ρ(Z)]⁺ dδ − (1/λ(K)) ∫_K E_{P_{γ₀}} [ρ(Z^δ(K)) − ρ(Z)]⁻ dδ.   (F.9)
Next, as r(K) → ∞,

  (1/λ(K)) ∫_K E_{P_{γ₀}} [ρ(Z^δ(K)) − ρ(Z)]⁻ dδ → 0,   (F.10)

where r(K) denotes the width of the cube K (which is assumed to be centered at zero). Conclusion (F.10) follows by (b), (c), and the domination (uniform integrability) condition: for any η ∈ (−1, 1)^d and any K,

  [ρ(Z^{η r(K)}(K)) − ρ(Z)]⁻ ≤ ρ(Z),  where E_{P_{γ₀}} ρ(Z) < ∞.   (F.11)

Thus (F.9)–(F.11) imply that II(K) − I → 0 as r(K) → ∞. □

Technical Addendum, Part II: Maximum Likelihood Estimation

This addendum includes the material on the maximum likelihood estimation; the main text only contains the statement of the result. The addendum will be made available as part of a Working Paper published by the MIT Economics Department on the Social Science Research Network.

Appendix G. Maximum Likelihood Procedures

The Maximum Likelihood Estimator (MLE) is given by

  γ̂ = (α̂′, β̂′)′ = arginf_{γ∈G} −L_n(γ).

We obtain various properties of the MLE, such as consistency, rates of convergence (n for the parameter β and √n for the parameter α), and its asymptotic distribution. The limit distribution for β is an extreme-type distribution, which may be used for inference in the same way as any standard distribution. For example, denoting the limit distribution under the parameter sequence γ_n(δ) = γ₀ + H_n δ by n(β̂ − β_n(δ)) →_d Z_β^δ, we have for any continuously differentiable function r of β:

  n( r(β̂) − r(β₀) ) →_d (∂r(β₀)/∂β)′ Z_β.

Then quantiles of the distribution of (∂r(β₀)/∂β)′ Z_β can be estimated by simulating a series of draws of Z_β according to the formulas given in this paper, and then used for classical inference and hypothesis testing. (Alternatively, parametric bootstrap may be used.) The limit distribution is also useful for bias correction.
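The simulation-based inference just described can be sketched in a few lines. The law used for the draws of Z_β below is a placeholder (a centered exponential-type draw, chosen only so the sketch runs); the actual limit process is defined by the paper's formulas, and the gradient value is likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulated_quantiles(draw_Z, grad_r, probs, n_draws=20_000):
    """Estimate quantiles of (d r(beta_0)/d beta)' Z_beta by simulation:
    draw Z_beta from its limit law, map through the delta method, and read
    off the empirical quantiles of the simulated series."""
    z = draw_Z(n_draws)          # draws of the limit variable Z_beta, shape (n, d)
    stats = z @ grad_r           # simulated values of grad_r' Z_beta
    return np.quantile(stats, probs)

# Placeholder limit law and gradient (illustrative assumptions only).
draw_Z = lambda n: rng.exponential(1.0, size=(n, 2)) - 1.0
grad_r = np.array([1.0, 0.5])
q05, q95 = simulated_quantiles(draw_Z, grad_r, [0.05, 0.95])
print(q05, q95)
```

With c(τ) denoting these simulated quantiles, the resulting asymptotic 90% interval for r(β₀) is [r(β̂) − c(.95)/n, r(β̂) − c(.05)/n], exactly the construction used in the text.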
For example, to remove the (first-order) asymptotic median bias of the MLE, simply take r(β) = β and recenter β̂ by the simulated median of Z_β/n. For example, denoting by c(τ) the τ-quantile of (∂r(β̂)/∂β)′ Z_β, an asymptotic 90% confidence interval for r(β₀) is given by [r(β̂) − c(.95)/n, r(β̂) − c(.05)/n].

On the other hand, inference about α is standard. The limit distribution of either the MLE or the BE's (for symmetric loss functions) under the parameter sequence γ_n(δ) = γ₀ + H_n δ is given by

  √n( α̂ − α_n(δ) ) →_d Z_α = N(0, J^{-1}),

where

  J = −E (∂²/∂α∂α′) ln f(Y_i − g(X_i, β₀)|X_i, β₀, α₀)

is in fact the usual information matrix for α. The limit variable Z_α is independent from Z_β, which follows from the information about β coming from a small fraction of the sample located near the jump points, and the information about α coming from the whole sample and averaged over (so that the impact of the small fraction is negligible).²⁰ The MLE α̂, combined with the usual estimates of the information matrix, can be used for inference. This result, combined with the one above, can also be used for inference about functions r(β, α) of both β and α using the delta method:

  √n( r(β̂, α̂) − r(β₀, α₀) ) = (∂r(β₀, α₀)/∂β)′ √n(β̂ − β₀) + (∂r(β₀, α₀)/∂α)′ √n(α̂ − α₀) + o_p(1/√n)
    →_d (∂r(β₀, α₀)/∂α)′ Z_α.

Also, in this case it is possible to use the second-order expansion, to better capture the estimation uncertainty, i.e.

  √n( r(β̂, α̂) − r(β₀, α₀) ) ≈ (∂r(β₀, α₀)/∂α)′ Z_α^n + (1/√n)(∂r(β₀, α₀)/∂β)′ Z_β^n + o_p(1/√n).

In many situations, such as the previous auction example, the functions of prime interest depend only on β, and the regular parameter α is not present.

²⁰ This intuition is based on Resnick (1986) and van der Vaart (1999)'s Lemma 21.19 concerning the independence of minimal order statistics and sample averages.
Quantiles of the above distributional approximation can be obtained by simulation, which consists of making draws of the variables Z_β and Z_α independently of each other, evaluating the above expressions (with derivatives of the function r evaluated at the estimates), and taking the appropriate quantiles of the simulated series. The resulting quantiles can be used for classical Wald intervals and hypothesis testing.

Theorem G.1 (Properties of MLE). Under C0–C5, and supposing that −ℓ_∞(z) attains a unique minimum in ℝ^d a.s., then Z_n = O_p(1), and

  Z_n^α →_d Z_α = J^{-1}W = N(0, J^{-1}),  Z_n^β →_d Z_β = argmin_{u} −ℓ_∞(u),

and Z_β and Z_α are independent.

This result states the consistency, the rates of convergence, and the limit distribution of the MLE. The limit distribution is given in the form of an argmin of a limit likelihood. Due to asymptotic independence of the information about the shape parameter from the information about the location parameter, the MLE's for these parameters are asymptotically independent.

Remark G.1 (Boundary Models). In the boundary models the limit result can be made more explicit:

  n( β̂ − β_n(δ) ) →_d arginf over {u : J_i ≥ Δ(X_i)′u for all i ≥ 1} of −exp(u′m)
    = argsup { u′m : J_i ≥ Δ(X_i)′u, for all i ≥ 1 },   (3.7)

and

  √n( α̂ − α_n(δ) ) →_d argsup_v ( W′v − v′Jv/2 ) = J^{-1}W = N(0, J^{-1}),  W = N(0, J).

The limit distribution of the MLE is thus convenient in the boundary models and can be simulated by solving an L₁-type linear programming problem, which can be done at the speeds of OLS through the use of interior point algorithms, cf. Portnoy and Koenker (1997). In contrast, bootstrapping requires repeated solutions of a nonlinear programming problem and is much less practical.
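Remark G.1's simulation strategy can be sketched in the scalar-parameter case, where the linear program argsup{u·m : J_i ≥ a_i u for all i} with m > 0 and a_i > 0 has the closed-form solution min_i J_i/a_i. The distributions below for the jump points J_i and the coefficients a_i = Δ(X_i) are stand-ins chosen only to make the sketch self-contained; the paper's conditions pin down the real ones.

```python
import numpy as np

rng = np.random.default_rng(4)

def draw_limit_mle(n_points=200):
    """One draw from the boundary-model limit of Remark G.1, specialized to a
    scalar parameter: argsup_u {u * m} subject to J_i >= a_i * u for all i.
    With m > 0 and a_i > 0 this is min_i J_i / a_i (a trivial linear program).
    Stand-in laws: J_i = Poisson-process arrival times, a_i ~ Uniform(0.5, 1.5)."""
    J = rng.exponential(1.0, n_points).cumsum()   # increasing jump points J_1 < J_2 < ...
    a = rng.uniform(0.5, 1.5, n_points)           # a_i = Delta(X_i) > 0
    return np.min(J / a)

draws = np.array([draw_limit_mle() for _ in range(500)])
print(draws.mean(), np.quantile(draws, [0.05, 0.95]))
```

In higher dimensions the same draws feed a genuine linear program, which, as the remark notes, interior point solvers handle at near-OLS speed; the simulated quantiles then serve for inference on β exactly as in the text.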
Remark G.2 (Uniqueness). The condition that −ℓ_∞(z) attains a unique minimum a.s. does not appear to be problematic. The invertibility of the information matrix J is necessary and guarantees the uniqueness of Z_α, and the solutions of linear programs like those above are unique under mild conditions, which guarantees the uniqueness of Z_β a.s.; see e.g. Portnoy (1991). E.g., uniqueness of Z_β holds when the support of Δ(X_i) does not concentrate on a proper linear subspace of lower dimension and the variables Δ(X_i) have an absolutely continuous component. This corresponds to the result of Donald and Paarsch (1993a) in a related problem. Also, when the Δ(X_i)'s have discrete support, the limit result applies with uniqueness holding in that case too.

G.1. Epi-graphical Convergence. Epi-convergence has been developed in the work on the stochastic approximation of optimization problems, cf. Knight (2000), Pflug (1995), Salinetti and Wets (1986), Rockafellar and Wets (1998), among others. Suppose that the sequence of objectives {Q_n} are random lower semi-continuous (l-sc) functions Q_n : ℝ^d → [−∞, +∞]; a function f is l-sc if f(x) ≤ liminf_{x_j → x} f(x_j), for all x and all sequences x_j → x. {Q_n} is said to epi-converge in distribution to a random l-sc function Q if, for any closed rectangles R₁, ..., R_k in ℝ^d with open interiors R₁°, ..., R_k° and any reals r₁, ..., r_k, the following three conditions hold:

  (1)  liminf_n P{ inf_{x∈R₁} Q_n(x) > r₁, ..., inf_{x∈R_k} Q_n(x) > r_k }
         ≥ P{ inf_{x∈R₁} Q(x) > r₁, ..., inf_{x∈R_k} Q(x) > r_k },

  (2)  the analogous inequality over the open interiors R₁°, ..., R_k°,   (G.1)

  (3)  the weak condition that P[ inf_{x∈K} Q_n(x) ≤ a ] behaves continuously in a for each rectangle K.
Note that the inequality in (2) holds simply by lower semi-continuity, and (2) is an evident condition; (3) leads to the weak convergence of the argmins, since by characterizing the joint distribution of the infima one obtains the limit distribution of the argmin. The following lemma is given in Knight (2000), Pflug (1995), Salinetti and Wets (1986), among others.

Lemma G.1. Let Z_n ∈ arginf_{z∈ℝ^d} Q_n(z) in the sense that Q_n(Z_n) ≤ inf_{z∈ℝ^d} Q_n(z) + ε_n, ε_n ↘ 0, and let Z_∞ = arginf_{z∈ℝ^d} Q_∞(z). Suppose that
  i.  Z_n = O_p(1),
  ii.  Z_∞ is uniquely defined in ℝ^d a.s., and
  iii.  Q_n(·) epi-converges in distribution to Q_∞(·);
then Z_n →_d Z_∞.

Epi-convergence is more general than uniform convergence, because it allows for non-vanishing discontinuities. In our case, the non-vanishing discontinuities of the likelihood function make the uniform convergence impossible. The recent remarkable work of Knight (2000) provides convenient sufficient conditions for verifying epi-convergence, which amount to converting the finite-dimensional limits to epi-limits via a device called stochastic equisemicontinuity; the work extends Salinetti and Wets (1986) by defining an "in probability" version of stochastic equisemicontinuity. We shall prove epi-convergence directly, albeit borrowing the general structure of the proof from Knight (2000). In fact, part of the proof replicates the proof of Theorem 2 of Knight (2000).

G.2. Proof of Theorem G.1. First, note that the MLE is measurable by Proposition 3.2 in Dupacova and Wets (1988). Second, we use Lemma G.1 on epi-convergence to prove the result. It will be proved that, in the rescaled parameter space,

  Z_n = ( √n(α̂ − α_n(δ)), n(β̂ − β_n(δ)) ) = argsup_z ℓ_n(z) = arginf_z −Q_n(z) →_d Z = arginf_z −Q_∞(z),

where Q_n(z), Q_∞(z), and the other variables are defined in the proof of Theorem 3.1. By Lemma G.1, this may be verified by checking three conditions: (a) epi-convergence in distribution of −Q_n to its finite-dimensional limit −Q_∞, (b) Z_n = O_p(1), and (c) uniqueness of Z, which is assumed. Condition (b) is shown below. The strategy of the proof of (a) is borrowed from Knight (2000)'s proof of his Theorem 2; it is the more difficult condition to prove. The general idea is based on bounding two types of modulus of continuity.
The specifics are similar to those in Ibragimov and Has'minskii (1981a). The definition of epi-convergence in (G.1) consists of parts (1)–(3). We verify part (1) only; part (2) holds trivially, and part (3) follows almost identically (by definition of lower-semi-continuity). For notational sake, in what follows we do not index P by γ_n(δ). Given a collection of closed rectangles R₁, ..., R_k and reals r₁, ..., r_k, note that

  1 − P{ ∩_{j≤k} { inf_{z∈R_j} −Q_n(z) > r_j } } = P{ ∪_{j≤k} { inf_{z∈R_j} −Q_n(z) ≤ r_j } }.

Thus, to verify part (1) in (G.1), it suffices to show

  limsup P{ ∪_{j≤k} { inf_{z∈R_j} −Q_n(z) < r_j } } ≤ P{ ∪_{j≤k} { inf_{z∈R_j} −Q_∞(z) < r_j } }.   (G.2)

To explain the result clearly, first bound the probability of the event

  { inf_{z∈R} −Q_n(z) < r } = { inf_{z∈R} ( −Q_n^c(z) − Q_n^d(u) ) < r },

where R denotes a generic rectangle R = R^α × R^β. Define two sets of grid points as follows. Consider grids of equidistant points {v_s} inside R^α and {u_m} inside R^β, with distance between adjacent points at most φ. Also cover R^β by the sets V_kj, and let u_kj denote a carefully chosen point inside the cover V_kj, both as defined in the proof of Lemma G.3. Next, define the collection of points {z_l} as the Cartesian product {v_s} × ({u_m} ∪ {u_kj}). This collection of grid points has the property that the nearest grid points are at most φ apart from each other. The collection {z_l} will be used to approximate inf_{z∈R} −Q_n^c(z) by inf_{{z_l}} −Q_n^c(z_l); the collection {u_kj} will be used to approximate the behavior of inf_{u∈R^β} −Q_n^d(u) by inf_{{u_kj}} −Q_n^d(u_kj). Then

  { inf_{z∈R} −Q_n(z) < r } ⊆ { inf_{{z_l}} −Q_n^c(z) + inf_{{u_kj}} −Q_n^d(u) < r + ε } ∪ A^c,   (G.3)

where A is the event that the finite-dimensional approximation "works",

  A = { ω_{Q_n^c}(R, φ) ≤ ε } ∩ { ζ_{Q_n^d}(R, φ) = 0 },

and ω_{Q_n^c}(R, φ) and ζ_{Q_n^d}(R, φ) are the moduli of continuity of the continuous part Q_n^c and the discontinuous part Q_n^d, respectively:

  ω_{Q_n^c}(R, φ) = sup_{z₁,z₂∈R : |z₁−z₂|≤φ} | Q_n^c(z₁) − Q_n^c(z₂) |,
  ζ_{Q_n^d}(R, φ) = | inf_{{u_kj}} −Q_n^d(u_kj) − inf_{u∈R^β} −Q_n^d(u) |.   (G.4)
The modulus ζ_{Q_n^d}(R, φ) is a Skorohod-type modulus: it tells whether the infimum of the step function −Q_n^d(u) coincides with the minimum of −Q_n^d over the finite set of grid points {u_kj}. The modulus ω_{Q_n^c}(R, φ) is a standard measure of equicontinuity of −Q_n^c(z). Lemma G.3 bounds the probability of the event A^c:

  P{ ω_{Q_n^c}(R, φ) > ε } ≤ const · |R| · φ/ε,  P{ ζ_{Q_n^d}(R, φ) > 0 } ≤ const · |R| · φ,   (G.5)

where |R| = sup{|z| : z ∈ R}. For any ε > 0 and given R, we can pick φ(ε) small enough such that the right-hand side of (G.5) is smaller than ε/2. Hence

  P{ inf_{z∈R} −Q_n(z) < r } ≤ P{ inf_{{z_l}} −Q_n^c(z) + inf_{{u_kj}} −Q_n^d(u) < r + ε } + ε,

and by Theorem 3.1 and the Portmanteau Lemma,

  limsup P{ inf_{z∈R} −Q_n(z) < r } ≤ P{ inf −Q_∞^c(z) + inf −Q_∞^d(u) < r + ε } + ε ≤ P{ inf_{z∈R} −Q_∞(z) < r + ε } + ε.

Since ε is arbitrary, it follows that limsup P{ inf_{z∈R} −Q_n(z) < r } ≤ P{ inf_{z∈R} −Q_∞(z) < r }. Thus,

  limsup P{ ∪_{j≤k} { inf_{z∈R_j} −Q_n(z) < r_j } } ≤ P{ ∪_{j≤k} { inf_{z∈R_j} −Q_∞(z) < r_j + ε } } + ε,

and, since ε is arbitrary, the required conclusion (G.2) follows.

Finally, it remains to establish Z_n = O_p(1). First, the MLE γ̂ is consistent by a generalization of Wald's Theorem on Consistency of MLE — Theorem 3.3 of Artstein and Wets (1995) — which requires: (a) γ ↦ −ln f(Y_i − g(X_i, β)|X_i, γ) is a.s. lower semi-continuous (which is true by C2 and C3); (b) the domination E sup_{γ′} [ln f(Y_i − g(X_i, β′)|X_i, γ′)]⁺ < +∞ (which is true by C2); (c) identification (by C0); (d) compact parameter space (by C0).
Consistency implies that, wp → 1, Z_n ∈ S_n = {z : |v|/√n ≤ ε_n, |u|/n ≤ ε_n} for some ε_n → 0. By a standard argument, the use of inequalities (G.6) and (G.7), proved in Lemmas G.2 and G.3, along with the exponential inequality for E ℓ_n^N(z) [Lemma G.4], yields that for sufficiently large A > 0 and any N > 0,

  P{ sup_{z∈S_n : |z|>A} ℓ_n(z) > A^{−N} } ≤ C_N A^{−N},  where C_N > 0,

so that, as in the Lemma on p. 265 in Ibragimov and Has'minskii (1981a), P{|Z_n| > A} ≤ C_N A^{−N}. Hence Z_n = O_p(1).

Lemma G.2 (Lipschitz Continuity of Q_n^c). Under C0–C5, for a small δ > 0, there is a random variable C_n such that for all n ≥ n₀ and some large n₀,

  | Q_n^c(z₁) − Q_n^c(z₂) | ≤ C_n |z₁ − z₂| (|z₁| + 1),  sup_{n≥n₀, γ∈B_δ(γ₀)} E_{P_γ} C_n < ∞,

uniformly over all |z₁ − z₂| ≤ 1 in the set S_n = {z : |v|/√n ≤ ε_n, |u|/n ≤ ε_n}, where ε_n → 0.

Lemma G.3 (Bounding Moduli of Continuity). Under C0–C5, for n sufficiently large, for some δ > 0, and any bounded rectangle R ⊂ S_n:
  1. For all sufficiently small φ > 0 and all ε > 0:  sup_{γ∈B_δ(γ₀)} P_γ( ω_{Q_n^c}(R, φ) > ε ) ≤ const · |R| · φ/ε.
  2. For all sufficiently small φ:  P_γ( ζ_{Q_n^d}(R, φ) > 0 ) ≤ const · |R| · φ,  uniformly in γ ∈ B_δ(γ₀),
where |R| = sup{|z| : z ∈ R}, and ω_{Q_n^c}(·) and ζ_{Q_n^d}(·) are defined in the proof of Theorem G.1.

Lemma G.4 (Tail Bound). Under C0–C5, for sufficiently large A > 0 and any N > 0,

  P_γ{ sup_{z∈S_n : |z|≥A} ℓ_n(z) > A^{−N} } ≤ C_N A^{−N},  uniformly in γ ∈ B_δ(γ₀),  where C_N > 0.
G.3. Proof of Lemma G.2 (short proof). The result follows by a standard empirical process argument, noting that the object of interest is an average and a spline-type object. The result then follows by a Taylor-like expansion, obtaining coefficients of the form C_n(e_i, z₁, z₂)|z₁ − z₂|(|z₁| + 1), and finally applying a maximal moment inequality to the coefficients C_n(e_i, z₁, z₂), specifically Lemma 19.34 in van der Vaart (1999). [Details are given in the last section of this document.]

G.4. Proof of Lemma G.3. The first part follows by the Markov inequality and Lemma G.2; the second part is proven below. In the one-dimensional case with no covariates, the argument essentially reduces to the proof given on p. 262 in Ibragimov and Has'minskii (1981a).

(a) [Covering Sets.] For a hyper-cube R = R^α × R^β, where R^α ⊂ ℝ^{d_α} and R^β ⊂ ℝ^{d_β}, construct a collection of (possibly overlapping) subsets {V_kj} of R^β as follows. First cover the support of the vector Δ(X) by the minimal number of closed equal-sized cubes {X_{φ,j}, j ≤ J(φ)}, with the side-length of the cube equal to φ. There are J(φ) ≤ const (1/φ)^{d_β} such cubes. Recall that

  ∂Δ_n(X, u)/∂u = ∂g(x, β₀ + u/n)/∂β.

Note also that, uniformly in u ∈ R^β and uniformly in x, as ε_n → 0,

  ∂Δ_n(X, u)/∂u = Δ(X) + o(1).   (G.8)

In particular, choose n₀ such that for all n ≥ n₀, any given x, and all u ∈ R^β, the difference |∂Δ_n(x, u)/∂u − Δ(x)| is sufficiently small. Thus, in what follows it is helpful to think of ∂Δ_n(X, u)/∂u as being equal to Δ(X) and independent of u. Construct the (overlapping) sets²¹

  { V_kj, k = −m, ..., m, j = 1, ..., J(φ) } ⊆ R^β

such that for u ∈ V_kj,

  v_k − φ ≤ Δ_n(x, u) ≤ v_k + φ,  for all x s.t. Δ(x) ∈ X_{φ,j}.   (G.9)

Since the range of |Δ_n(X, u)| is bounded by const ||R^β||, where ||R^β|| = sup{|u| : u ∈ R^β}, we can cover the range by 2m + 1 brackets of the form [v_k − φ, v_k + φ], where m ≤ const |R^β|/φ. Hence the total number L of covering sets V_kj is bounded as L ≤ (2m + 1)·J(φ) and grows at most polynomially in |R| and 1/φ.²² Next, construct the "centers" u_kj in V_kj ∩ R^β so that for all n ≥ n₀,²³

  δ_kj ≤ Δ_n(x, u_kj) ≤ δ_kj + η,  for all x s.t. Δ(x) ∈ X_{φ,j},   (G.10)

where δ_kj denotes the infimum of Δ_n(x, u) taken over u ∈ V_kj and x such that Δ(x) ∈ X_{φ,j}.
Moreover, η must be set sufficiently small as well: to have the infimum in (G.10) within η of Δ_n(x, u_kj), we need the variation of Δ_n(x, u) over u ∈ V_kj and over x with Δ(x) ∈ X_{φ,j} to be small relative to φ; setting η ∝ φ² as n → ∞ suffices for all n ≥ n₀. Note that requiring ∂Δ_n(x, u_kj)/∂u to be close to Δ(x) in order to control this variation restricts φ to be sufficiently small, and we choose φ as stated in (G.9).

²¹ The covering sets V_kj can be thought of as "approximate linear subspaces" of R^β.
²² Note also that the brackets of the form [v_k − φ, v_k + φ] clearly cover the interior of the range of Δ_n(x, u) for n ≥ n₀, because, given u, Δ_n(x, u) belongs to at least two different brackets for all n ≥ n₀ and Δ(x) ∈ X_{φ,j}.
²³ For some k and j it is the case that δ_kj = v_k − φ, namely when V_kj abuts the edge of a bracket; but otherwise this is not the case.
Formally, conditionally on the complement of A\(R), of such brackets, given that more This • all + i.e. rj\ on A\(R) define the event A2 (R) as the union of A 2i ,kj(R) = across i < n, \k\ < m,j < [nei 6 [S kj ,S kj + f?]|ne. 6 [v k - f,v k + ip],A{Xi) 6 X^-.u 6 Vk} j J(4>). To begin, n 71 p{a 1 (r)] < J2 J2 x^ p { ne " ne e ' fc* - *>** + fi} ^ 2m + ( *) <2 2 -^ ^ const fl ivi |k|<m i'=l:i'£i i=l Denote the bracket at [v k most 3 in (G.8)). • number total of rce, that fall into brackets of the form [v k — ip, v k + tp] by Afn — <p, v k + <p] overlaps with at most two other brackets and (ii) (G.8) K" jV„ neighborhoods of the form Vkj in which the break-point may holds for . Because n> occur (where , Al(R)} < 3 N K' n p{A 2i sup ,j, k (R)\ < 3 • K' M n (///) i<»,l*l<mj'<J») F{M,} < nEl{\nu\ < g'\\R\\) < 2g'\\R\\f, p{a 2 (R)\AUR)} Hence K" is (since n oc P{ < const \R\(nM. 2 <p ) Ukj {Jnf -Qt+(u) # -Q* + («,-»)}} < P{A2 (J?)|A;(iJ)} + p{^(fi)} < The terminology "break-points" is borrowed from the 16 const linear \R\(r)/tp programming + ip) < const literature. \R\<p. any defined Then p\A2 (R)\tfn Since (i) no, there are fo/(2p)). Therefore, conclude that J Likewise, it = p\ inf -Q dn + {u) < UtH" + {u < const \R\<p. -Q dn {u kj )\< const \R\<p. -Qi inf Ukj} kj ) } > follows that for a finite collection of grid points {Ukj} II = P{ -Qi~(u)< inf inf Finally, P\ inf '•u£R0 G.5. -Q dn {u) < Proof of Lemma G.4. This (1981a). We need to establish for uses the Cn suffice to method sup en of the proof on p. 265-266 of A> and any 1 (z)>A-"\ N for sufficiently large <CN A~ N {GU) . = {z : < t \z\ < t + 1}. It will >A t PJ Ibragimov and Has'minskii > denotes a generic constant that only depends on N. Let R(t) show that |#|¥>. 
for sufficiently large $t$,
\[
P_\gamma\Big\{\sup_{z\in R(t)\cap S_n}\ell_n(z)>t^{-N}\Big\}\le C_N\,t^{-N}, \tag{G.12}
\]
since then
\[
P_\gamma\Big\{\sup_{z\in S_n:\,|z|>A}\ell_n(z)>A^{-N}\Big\}\le \sum_{t=A}^{\infty}P_\gamma\Big\{\sup_{z\in R(t)\cap S_n}\ell_n(z)>t^{-N}\Big\}\le \sum_{t=A}^{\infty}C_N\,t^{-N}\le C_N'\,A^{-(N-1)},
\]
which yields (G.11) since $N$ is arbitrary. Next cover $R(t)$ by grid-points $\{z_l\}$ in the way defined in the proof of Theorem G.1. It follows that
\[
P_\gamma\Big\{\sup_{z\in R(t)\cap S_n}\ell_n(z)>t^{-N}\Big\}\le \underbrace{\sum_{z_l\in\{z_l\}}P_\gamma\big\{\ell_n(z_l)>t^{-N}/2\big\}}_{I}+\underbrace{P_\gamma\big\{\omega_{Q}\big(R(t),\varphi\big)\ge\big|\ln(t^{-N}/2)\big|/2\big\}}_{II},
\]
where $\omega_{Q}$ denotes the modulus of continuity defined in (G.4). The number of partition points $\{z_l\}$ is bounded by $L\le \mathrm{const}\,(|R(t)|/\varphi)^{\kappa}\le \mathrm{const}\,(2(t+1))^{\kappa}\varphi^{-\kappa}$, where $1\le\kappa<\infty$ is given in the proof of Lemma G.2, noting that $\sup\{|z|:z\in R(t)\}=t+1$. By Lemma G.3, $P_\gamma\{\ell_n(z_l)>t^{-N}/2\}\le \mathrm{const}\,\exp(-b|t|)$ for some $b>0$; hence
\[
I\le \mathrm{const}\,(t+1)^{\kappa}\,\varphi^{-\kappa}\,e^{-bt}. \tag{G.13}
\]
By Lemma C.2 and the Markov inequality,
\[
II\le \mathrm{const}\,(t+1)\,\big|\ln(t^{-N}/2)\big|^{-1}\varphi+\mathrm{const}\,(t+1)^{N}\varphi. \tag{G.14}
\]
Selecting $\varphi$ as a sufficiently large negative power of $t$, (G.12) immediately follows from (G.13) and (G.14).

G.6. Proof of Lemma G.2 (Detailed proof). We have
\[
Q_n^c(z)=\sum_{i=1}^n \ell_{in}(z)\Big[1\big(\epsilon_i>\{\Delta_n(X_i,u)/n\}\vee 0\big)+1\big(\epsilon_i<\{\Delta_n(X_i,u)/n\}\wedge 0\big)\Big]
+\sum_{i=1}^n\big(\bar\ell_{in}(z)-\ell_{in}(z)\big)\Big[1\big(0<\epsilon_i\le \Delta_n(X_i,u)/n\big)+1\big(0>\epsilon_i>\Delta_n(X_i,u)/n\big)\Big].
\]
Intuitively, note that the functions of interest are all Lipschitz-smooth (spline-type) objects by construction, given the differentiability assumptions in C1-C3. Thus, it is reasonable to expect the final result: under C1-C5, for a small $\delta>0$, there is a random variable $C_n$ such that, for all $n>n_0$ and some large $n_0$,
\[
|Q_n^c(z_1)-Q_n^c(z_2)|\le C_n\,|z_1-z_2|\,(|z_1|+1),\qquad \sup_{n>n_0,\ \gamma\in B_\delta(\gamma_0)}E_{P_\gamma}C_n<\infty,
\]
uniformly over all $|z_1-z_2|\le 1$ in the set $S_n=\{z:\ |v|/\sqrt n\le e_n,\ |u|/n\le e_n\}$, where $e_n\to 0$. The proof is tedious, but it does have a very simple structure: given some careful Taylor-type expansions, maximal
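The exponential tail behavior driving (G.11) can be illustrated in the simplest one-sided model. The sketch below uses a shifted-exponential toy model (an assumption of the illustration, not the paper's general setup), for which the local likelihood ratio has an explicit break-point and an explicit exponential envelope, so the tail sup beyond a level $A$ escapes the envelope $e^{-A}$ only on an exponentially rare event.

```python
import numpy as np

# Toy one-sided model: Y_i = beta + Exp(1) error, true beta0 = 0.
# At beta = u/n the local likelihood ratio is
#   ell_n(u) = exp(u) * 1(u <= n * min(Y_i)),
# so ell_n vanishes past the break-point n*min(Y_i) and decays like e^u for u < 0.
rng = np.random.default_rng(3)
n, reps, A = 200, 2000, 8.0

exceed = 0
for _ in range(reps):
    y = rng.exponential(1.0, n)
    m = n * y.min()                          # break-point of the likelihood
    # sup of ell_n over {|u| > A}: e^{-A} from the left tail, plus the
    # right-tail spike e^{m} if the break-point lies beyond A
    sup_tail = max(np.exp(-A), np.exp(m) if m > A else 0.0)
    exceed += sup_tail > np.exp(-A)          # tail sup escapes the envelope
p_hat = exceed / reps
# analytically P(escape) = P(n*min(Y_i) > A) = e^{-A}, i.e. exponentially small in A
```

Since $n\min_i Y_i$ is exactly Exp(1) in this toy model, the escape probability is $e^{-A}$, matching the polynomial-beats-exponential trade-off used to sum (G.13) over shells.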
inequalities will be applied to the coefficients of those expansions to obtain the required result. Split $Q_n^c(z_1)-Q_n^c(z_2)$ into two terms:
\[
I=\sum_{i=1}^n\Big[\ell_{in}(z_1)\,1\big(\epsilon_i>\{\Delta_n(X_i,u_1)/n\}\vee 0\big)-\ell_{in}(z_2)\,1\big(\epsilon_i>\{\Delta_n(X_i,u_2)/n\}\vee 0\big)\Big],
\]
\[
II=\sum_{i=1}^n\Big[\ell_{in}(z_1)\,1\big(\epsilon_i<\{\Delta_n(X_i,u_1)/n\}\wedge 0\big)-\ell_{in}(z_2)\,1\big(\epsilon_i<\{\Delta_n(X_i,u_2)/n\}\wedge 0\big)\Big].
\]
We focus on term $I$, and only briefly indicate the differences for term $II$. The term $I$ can be bounded as $I_1-I_2\le I\le I_1+I_2$, where
\[
I_1=\sum_{i=1}^n 1\big(\epsilon_i>\{\Delta_n(X_i,u_1)/n\}\vee\{\Delta_n(X_i,u_2)/n\}\vee 0\big)\times\Big(\ln f\big(\epsilon_i-\Delta_n(X_i,u_1)/n\,\big|\,X_i;\beta_0+u_1/n,\alpha_0+v_1/\sqrt n\big)-\ln f\big(\epsilon_i-\Delta_n(X_i,u_2)/n\,\big|\,X_i;\beta_0+u_2/n,\alpha_0+v_2/\sqrt n\big)\Big),
\]
\[
I_2=\sum_{i=1}^n 1\big(0<\epsilon_i\in[\Delta_n(X_i,u_1)/n,\Delta_n(X_i,u_2)/n]\big)\times\max_{j=1,2}\bigg|\ln\frac{f\big(\epsilon_i-\Delta_n(X_i,u_j)/n\,\big|\,X_i;\beta_0+u_j/n,\alpha_0+v_j/\sqrt n\big)}{f(\epsilon_i|X_i;\beta_0,\alpha_0)}\bigg|,
\]
where $[a,b]$ denotes $\{x:\ a\le x\le b\ \text{or}\ b\le x\le a\}$. Analogously, approximate the term $II$ as $II_1-II_2\le II\le II_1+II_2$, where
\[
II_1=\sum_{i=1}^n 1\big(\epsilon_i<\{\Delta_n(X_i,u_1)/n\}\wedge\{\Delta_n(X_i,u_2)/n\}\wedge 0\big)\times\Big(\ln f\big(\epsilon_i-\Delta_n(X_i,u_1)/n\,\big|\,X_i;\beta_0+u_1/n,\alpha_0+v_1/\sqrt n\big)-\ln f\big(\epsilon_i-\Delta_n(X_i,u_2)/n\,\big|\,X_i;\beta_0+u_2/n,\alpha_0+v_2/\sqrt n\big)\Big),
\]
\[
II_2=\sum_{i=1}^n 1\big(0>\epsilon_i\in[\Delta_n(X_i,u_1)/n,\Delta_n(X_i,u_2)/n]\big)\times\max_{j=1,2}\bigg|\ln\frac{f\big(\epsilon_i-\Delta_n(X_i,u_j)/n\,\big|\,X_i;\beta_0+u_j/n,\alpha_0+v_j/\sqrt n\big)}{f(\epsilon_i|X_i;\beta_0,\alpha_0)}\bigg|.
\]

Part I. Bounds on Terms $I_1$ and $II_1$. By a Taylor expansion, the term $I_1$ equals
\[
\sum_{i=1}^n 1\big(\epsilon_i>\{\Delta_n(X_i,u_1)/n\}\vee\{\Delta_n(X_i,u_2)/n\}\vee 0\big)\,\frac{\partial}{\partial u}\ln f\big(\epsilon_i-\Delta_n(X_i,u^*)/n\,\big|\,X_i;\beta_0+u^*/n,\alpha_0+v^*/\sqrt n\big)\,\frac{u_1-u_2}{n}
\]
\[
+\ \sum_{i=1}^n 1\big(\epsilon_i>\{\Delta_n(X_i,u_1)/n\}\vee\{\Delta_n(X_i,u_2)/n\}\vee 0\big)\,\frac{\partial}{\partial\alpha}\ln f(\epsilon_i|X_i,\gamma_0)\,\frac{v_1-v_2}{\sqrt n}
\]
\[
+\ \sum_{i=1}^n 1\big(\epsilon_i>\{\Delta_n(X_i,u_1)/n\}\vee\{\Delta_n(X_i,u_2)/n\}\vee 0\big)\,(H_nz^*)'\,\frac{\partial^2}{\partial\gamma\,\partial\alpha}\ln f\big(\epsilon_i-\Delta_n(X_i,u^*)/n\,\big|\,X_i;\beta_0+u^*/n,\alpha_0+v^*/\sqrt n\big)\,\sqrt n\,\frac{v_1-v_2}{\sqrt n},
\]
and we denote the three summands by $I_{11}$, $I_{12}$, and $I_{13}$, respectively. Analogously decompose the term $II_1$ into $II_{11}$, $II_{12}$, and $II_{13}$, with the indicator replaced by $1\big(\epsilon_i<\{\Delta_n(X_i,u_1)/n\}\wedge\{\Delta_n(X_i,u_2)/n\}\wedge 0\big)$.
By C5, the term $|I_{11}|+|II_{11}|$ is bounded by
\[
\Big(\frac1n\sum_{i=1}^n C_2(\epsilon_i,X_i)\Big)\,|u_1-u_2|, \tag{G.15}
\]
where the expectation of the random term is finite, and constant across $n$ by i.i.d. sampling. By C5, the term $|I_{13}|+|II_{13}|$ is bounded by
\[
\Big(\frac1n\sum_{i=1}^n C_3(\epsilon_i,X_i)\Big)\,|z_1-z_2|\,(|z_1|+1), \tag{G.16}
\]
where the expectation of the random term is constant for all $n$. Write
\[
III_1=\frac{1}{\sqrt n}\sum_{i=1}^n 1\big(0<\epsilon_i\in[\Delta_n(X_i,u_1)/n,\Delta_n(X_i,u_2)/n]\big)\,\frac{\partial}{\partial\alpha}\ln f(\epsilon_i|X_i,\gamma_0)\,(v_1-v_2), \tag{G.17}
\]
\[
III_2=\frac{1}{\sqrt n}\sum_{i=1}^n 1\big(0>\epsilon_i\in[\Delta_n(X_i,u_1)/n,\Delta_n(X_i,u_2)/n]\big)\,\frac{\partial}{\partial\alpha}\ln f(\epsilon_i|X_i,\gamma_0)\,(v_1-v_2).
\]
By C5, $III_1$ has two finite moments, which remain constant for all $n$, and the same bound follows for $III_2$. Next we show a bound for $|III_2-E\,III_2|$. By Lemma 19.34 in van der Vaart (1999),
\[
E\sup_{|u_1|/n+|u_2|/n\le 2e_n}\big|III_2-E\,III_2\big|\le \mathrm{const}\cdot J_{[\,]}\big(1,\mathcal F,L_2(P)\big)<\infty, \tag{G.18}
\]
where $J_{[\,]}(1,\mathcal F,L_2(P))$ is the $L_2(P)$ bracketing entropy integral of the function class
\[
\mathcal F=\Big\{\,1\big(g(X_i,\beta_0)<Y_i\in[g(X_i,\beta_0),g(X_i,\beta_2)]\big)\,\frac{\partial}{\partial\alpha}\ln f(\epsilon_i|X_i,\gamma_0):\ |\beta_2-\beta_0|\le 2e_n\Big\},
\]
which we rewrite in terms of the original parameter $\beta_2=\beta_0+u_2\cdot 1/n$, with $1$ the vector of ones, and with the envelope constant. Note that the bound $|u_1-u_2|/n\le 2e_n$ eventually puts $\beta_2$ in a small fixed neighborhood of $\beta_0$ for $n>n_0$, and also puts the $\Delta_n(X_i,u_j)/n$'s in any small fixed neighborhood of $0$. The entropy in (G.18) is finite uniformly in $n$ by a standard argument, because $\mathcal F$ is formed as a product of the class $\{1(g(X_i,\beta_0)<Y_i\in[g(X_i,\beta_0),g(X_i,\beta_2)]):\ |\beta_2-\beta_0|\le 2e_n\}$, which is of Donsker (VC) type (see Andrews (1994)), and a bounded random variable (by C3), which by Theorem 2.10.6 in van der Vaart and Wellner (1996) implies that $\mathcal F$ is Donsker, and thus $J_{[\,]}(1,\mathcal F,L_2(P))<\infty$.
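The bracketing-entropy argument can be made concrete for the simplest indicator class. The sketch below (with $P=\mathrm{Uniform}(0,1)$, an assumption of the illustration) counts brackets for the class $\{y\mapsto 1(y\le t)\}$ and confirms the familiar $N_{[\,]}(\varepsilon)\asymp \varepsilon^{-2}$ rate; since $\int_0^1\sqrt{\log N_{[\,]}(\varepsilon)}\,d\varepsilon<\infty$ at that rate, such a class is Donsker, which is the structural fact exploited for the product class $\mathcal F$ above.

```python
import numpy as np

# Bracketing for F = {y -> 1(y <= t)} under P = Uniform(0,1).
# The bracket [1(y <= t_{j-1}), 1(y <= t_j)] has L2(P)-size
#   sqrt(P(t_{j-1} < Y <= t_j)) = sqrt(t_j - t_{j-1}),
# so a grid with spacing eps^2 gives eps-brackets.
def bracketing_number(eps):
    grid = np.arange(0.0, 1.0, eps**2)      # t_j = j * eps^2
    grid = np.append(grid, 1.0)
    sizes = np.sqrt(np.diff(grid))          # L2(P) size of each bracket
    assert (sizes <= eps + 1e-12).all()
    return len(sizes)

counts = {eps: bracketing_number(eps) for eps in (0.5, 0.1, 0.05)}
# counts grows like 1/eps^2: halving eps quadruples the bracket count
```

Halving $\varepsilon$ quadruples the bracket count, exactly the $\varepsilon^{-2}$ polynomial growth whose entropy integral converges.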
Also note that by C2 and C3:
\[
|E\,III_j|\le E\,\sqrt n\,\bigg|\int_{\Delta_n(X_i,u_1)/n}^{\Delta_n(X_i,u_2)/n} f(e|X_i,\gamma_0)\,1(e\gtrless 0)\,de\bigg|\cdot|v_1-v_2|\le \bar f\,\bar g\,\|u_1-u_2\|\,|v_1-v_2|/\sqrt n,\qquad j=1,2,
\]
where the constants are defined in Lemma A.1. Thus, for some random variable $C_n$ with bounded expectation uniformly in $n$,
\[
|I_{12}+II_{12}|\le C_n\,|v_1-v_2|.
\]
Now collecting all the bounds established so far, we have, for some random variable $C_n$ with bounded expectation uniformly in $n$,
\[
|I_1|+|II_1|\le C_n\,|z_1-z_2|\,(|z_1|+1). \tag{G.19}
\]

Part II. Bounds on Terms $I_2$ and $II_2$. Let us now get back to the terms $I_2$ and $II_2$. Recall that
\[
I_2=\sum_{i=1}^n 1\big(0<\epsilon_i\in[\Delta_n(X_i,u_1)/n,\Delta_n(X_i,u_2)/n]\big)\,\max_{j=1,2}\bigg|\ln\frac{f\big(\epsilon_i-\Delta_n(X_i,u_j)/n\,\big|\,X_i;\beta_0+u_j/n,\alpha_0+v_j/\sqrt n\big)}{f(\epsilon_i|X_i;\beta_0,\alpha_0)}\bigg|,
\]
and that
\[
II_2=\sum_{i=1}^n 1\big(0>\epsilon_i\in[\Delta_n(X_i,u_1)/n,\Delta_n(X_i,u_2)/n]\big)\,\max_{j=1,2}\bigg|\ln\frac{f\big(\epsilon_i-\Delta_n(X_i,u_j)/n\,\big|\,X_i;\beta_0+u_j/n,\alpha_0+v_j/\sqrt n\big)}{f(\epsilon_i|X_i;\beta_0,\alpha_0)}\bigg|.
\]
By C2-C3, $I_2$ is bounded by
\[
I_{21}=\frac1n\sum_{i=1}^n 1\big(0<\epsilon_i\in[\Delta_n(X_i,u_1)/n,\Delta_n(X_i,u_2)/n]\big)\,(\bar f'/\underline f)\times\|u_1-u_2\|,
\]
where the constants are defined in Lemma A.1. By C2-C3, $II_2$ is bounded by
\[
II_{21}=\frac1n\sum_{i=1}^n 1\big(0>\epsilon_i\in[\Delta_n(X_i,u_1)/n,\Delta_n(X_i,u_2)/n]\big)\,(\bar f'/\underline f)\times\|u_1-u_2\|,
\]
where the constants are defined in Lemma A.1. Then, by an argument that is identical to the proof of inequality (G.18), we obtain that, uniformly in $n$,
\[
E\sup_{u_1,u_2}\big|\sqrt n\,(I_{21}-E\,I_{21})\big|<\infty,
\]
and an identical bound follows for the term $II_{21}$:
\[
E\sup_{u_1,u_2}\big|\sqrt n\,(II_{21}-E\,II_{21})\big|<\infty.
\]
Furthermore, by C2-C3, $E(I_{21}+II_{21})\le \mathrm{const}\cdot|u_1-u_2|$. Hence, for a random variable $C_n$ with uniformly bounded expectation (uniformly in $n$),
\[
|I_2|+|II_2|\le |I_{21}+II_{21}|\le \mathrm{const}\,(1+C_n/\sqrt n)\,|u_1-u_2|.
\]
Combining this inequality with the bound in (G.19): for a small $\delta>0$, there is a random variable $C_n$ such that
\[
|Q_n^c(z_1)-Q_n^c(z_2)|\le C_n\,|z_1-z_2|\,(|z_1|+1),
\]
and
the statement of the lemma follows: for all $n>n_0$ and some large $n_0$,
\[
\sup_{n>n_0,\ \gamma\in B_\delta(\gamma_0)}E_{P_\gamma}C_n<\infty,
\]
uniformly over all $|z_1-z_2|\le 1$ in the set $S_n=\{z:\ |v|/\sqrt n\le e_n,\ |u|/n\le e_n\}$, where $e_n\to 0$. Similarly, it follows that, for a small $\delta>0$, there is a random variable $C_n$ such that, for all $n>n_0$ and some large $n_0$, the same bounds
\[
|Q_n^c(z_1)-Q_n^c(z_2)|\le C_n\,|z_1-z_2|\,(|z_1|+1),\qquad \sup_{n>n_0,\ \gamma\in B_\delta(\gamma_0)}E_{P_\gamma}C_n<\infty,
\]
hold uniformly over all $|z_1-z_2|\le 1$ in the set $S_n=\{z:\ |v|/\sqrt n\le e_n,\ |u|/n\le e_n\}$, where $e_n\to 0$, in the one-sided models. $\square$
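The Lipschitz property asserted by the lemma can be checked numerically for the smooth (regular) part in a toy model. In the sketch below, the exponential likelihood, the symmetric weight $(|v_1|+|v_2|+1)$, and all tuning constants are assumptions of the illustration, not the paper's model; the point is only that the centered local log-likelihood is Lipschitz in the local parameter with a stochastically bounded constant, as in the statement of the lemma.

```python
import numpy as np

rng = np.random.default_rng(2)
n, a0 = 400, 1.0
eps = rng.exponential(1 / a0, n)

def Qc(v):
    # centered local log-likelihood of the smooth (regular) part:
    # Exp(a) density f(e|a) = a*exp(-a*e), localized as a = a0 + v/sqrt(n)
    a = a0 + v / np.sqrt(n)
    return np.sum(np.log(a) - a * eps) - np.sum(np.log(a0) - a0 * eps)

# largest ratio |Qc(v1)-Qc(v2)| / (|v1-v2| * (|v1|+|v2|+1)) over a grid:
# a finite value plays the role of the random Lipschitz constant C_n
grid = np.linspace(-3, 3, 61)
ratios = [abs(Qc(v1) - Qc(v2)) / (abs(v1 - v2) * (abs(v1) + abs(v2) + 1))
          for v1 in grid for v2 in grid if v1 != v2]
C_hat = max(ratios)
```

The empirical constant `C_hat` stays moderate (it is driven by the score $(n/a - \sum_i\epsilon_i)/\sqrt n$, which is $O_p(1)$), illustrating why $E_{P_\gamma}C_n$ can be bounded uniformly in $n$.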