Massachusetts Institute of Technology
Department of Economics
Working Paper Series

AN MCMC APPROACH TO CLASSICAL ESTIMATION

Victor Chernozhukov
Han Hong

Working Paper 03-21
December 2002

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://ssrn.com/abstract=42037

Victor Chernozhukov, Department of Economics, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
Han Hong, Department of Economics, Princeton University, Princeton, NJ 08544, USA

First Version: October 2000. This Version: December 2002.
Project funded by the National Science Foundation.

Abstract

This paper studies computationally and theoretically attractive estimators, referred to here as the Laplace type estimators (LTE). The LTE include means and quantiles of Quasi-posterior distributions defined as transformations of general (non-likelihood-based) statistical criterion functions, such as those in GMM, nonlinear IV, empirical likelihood, and minimum distance methods. The approach generates an alternative to classical extremum estimation and also falls outside the parametric Bayesian approach. For example, it offers a new attractive estimation method for such important semi-parametric problems as censored and instrumental quantile regression, nonlinear IV, GMM, and value-at-risk models. The LTE's are computed using Markov Chain Monte Carlo methods, which help circumvent the computational curse of dimensionality. A large sample theory is obtained and illustrated for regular cases.

JEL Classification:
C10, C11, C13, C15

Keywords: Laplace, Bayes, Markov Chain Monte Carlo, GMM, instrumental regression, censored quantile regression, instrumental quantile regression, empirical likelihood, value-at-risk

1 Introduction

A variety of important econometric problems pose not only a theoretical but also a serious computational challenge, cf. Andrews (1997). A small (and by no means exhaustive) set of such examples includes (1) Powell's censored median regression for linear and nonlinear problems, (2) nonlinear IV estimation, e.g. in the Berry et al. (1995) model, (3) the instrumental quantile regression, and (4) the continuous-updating GMM estimator of Hansen et al. (1996) and related empirical likelihood problems. These problems represent a formidable practical challenge, as the extremum estimators are known to be difficult to compute due to highly nonconvex criterion functions with many local optima (but a well-pronounced global optimum). Despite extensive efforts, see notably Andrews (1997), the problem of extremum computation remains a formidable impediment in these applications.

[1] A shorter version of this paper is forthcoming in Journal of Econometrics 115 (August 2003), p. 293-346.

This paper develops a class of estimators, which we call the Laplace type estimators (LTE) or Quasi-Bayesian estimators (QBE),[2] which are defined similarly to Bayesian estimators but use general statistical criterion functions in place of the parametric likelihood function. This formulation circumvents the curse of dimensionality inherent in the computation of the classical extremum estimators by instead focusing on LTE's, which are functions of integral transformations of the criterion functions and can be computed using Markov Chain Monte Carlo methods (MCMC), a class of simulation techniques from Bayesian statistics. This formulation will be shown to yield both computable and theoretically attractive new estimators for such important problems as (1)-(4) listed above. Although the aforementioned applications are mostly microeconometric, the obtained results extend to many other models, including GMM and quasi-likelihoods in the nonlinear dynamic framework of Gallant and White (1988).

The class of LTE's or QBE's aims to explore the use of the Laplace approximation (developed by Laplace to study large sample approximations of Bayesian estimators and for use in other nonstatistical problems) outside of the canonical Bayesian framework, that is, outside of parametric likelihood settings when the likelihood function is not known. Instead, the approach relies upon other statistical criterion functions of interest in place of the likelihood, transforms them into proper distributions (Quasi-posteriors) over a parameter of interest, and defines various moments and quantiles of that distribution as the point estimates and confidence intervals, respectively. It is important to emphasize that the underlying criterion functions are mainly motivated by the analogy principle in place of the likelihood principle, are not the likelihoods (densities) of the data, and are most often semi-parametric.[3]

The resulting estimators and inference procedures possess a number of good theoretical and computational properties and yield new, alternative approaches for the important problems mentioned earlier. The estimates are as efficient as the extremum estimates; and, in many cases, the inference procedures based on the quantiles of the Quasi-posterior distribution or other posterior quantities yield asymptotically valid confidence intervals, which also perform notably well in finite samples. For example, in the quantile regression setting, those intervals provide valid large sample and excellent small sample inference without requiring nonparametric estimation of the conditional density function (needed in the standard approach).
The obtained results are general and useful: they cover the examples listed above under general, non-likelihood-based conditions that allow discontinuous, non-smooth semi-parametric criterion functions, and data generating processes that range from iid settings to the nonlinear dynamic framework of Gallant and White (1988). The results thus extend the theoretical work on large sample theory of Bayesian procedures in econometrics and statistics, e.g. Bickel and Yahav (1969), Ibragimov and Has'minskii (1981), Andrews (1994b), Kim (1998).

The LTE's are computed using MCMC, which simulates a series of parameter draws such that the marginal distribution of the series is (approximately) the Quasi-posterior distribution of the parameters. The estimator is therefore a function of this series, and may be given explicitly as the mean or a quantile of the series, or implicitly as the minimizer of a smooth globally convex function. As stated above, the LTE approach is motivated by estimation and inference efficiency as well as by computational attractiveness. Indeed, the LTE's are typically means or quantiles of a Quasi-posterior distribution, and hence can be estimated (computed) at the parametric rate $1/\sqrt{B}$, where $B$ is the number of draws from that distribution (functional evaluations). In contrast, the mode (extremum estimator) is estimated (computed) by MCMC and similar grid-based algorithms at the nonparametric rate $(1/B)^{p/(d+2p)}$, where $d$ is the parameter dimension and $p$ is the smoothness order of the objective function. The reason the LTE approach does not suffer from the computational curse of dimensionality (through the use of MCMC) is that the computation of LTE's is itself statistically motivated.

Another useful feature of LT estimation is that, by using information about the shape of the overall objective function, point estimates and confidence intervals may be calculated simultaneously. It also allows incorporation of prior information and a simple imposition of constraints in the estimation procedure.

[2] The preferred terminology is taken here to be 'Laplace Type Estimators', since the term 'Quasi-Bayesian Estimators' is already used to name Bayesian procedures that use either 'vague' or 'data-dependent' priors or multiple priors, cf. Berger (2002).

[3] In this paper, the term 'semi-parametric' refers to the cases where the parameters of interest are finite-dimensional but there are nonparametric nuisance parameters such as unspecified distributions.

The remainder of the paper proceeds as follows. Section 2 formally defines and further motivates the Laplace type estimators with several examples, reviews the literature, and explains other connections. The motivating examples, which are all semi-parametric and involve no parametric likelihoods, justify the pursuit of a more general theory than is currently available. Section 3 develops the large sample theory, and Sections 3 and 4 further explore it within the context of the econometric examples mentioned earlier. Section 4 briefly reviews important computational aspects and illustrates the use of the estimator through simulation examples. Section 5 contains a brief empirical example, and Section 6 concludes.

Notation. Standard notation is used throughout. Given a probability measure $P$, $\to_{P^*}$ denotes convergence in (outer) probability with respect to the outer probability $P^*$; $\Rightarrow$ denotes convergence in distribution under $P^*$, etc. See e.g. van der Vaart and Wellner (1996) for definitions. $|x|$ denotes the Euclidean norm $\sqrt{x'x}$; $B_\delta(x)$ denotes the ball of radius $\delta$ centered at $x$. A notation table is given in the appendix.

2 Laplacian or Quasi-Bayesian Estimation: Definition and Motivation

2.1 Motivation

Extremum estimators are usually motivated by the analogy principle and defined as maximizers of random criterion functions $L_n(\theta)$, where $n$ denotes the sample size. The average-like criterion functions $n^{-1}L_n(\theta)$ are typically viewed as transformations of sample averages that converge to criterion functions $M(\theta)$ that are maximized uniquely at some $\theta_0$. Extremum estimators are usually consistent and asymptotically normal, cf. Amemiya (1985), Gallant and White (1988), Newey and McFadden (1994), Potscher and Prucha (1997). However, in many important cases, actually computing the extremum estimates remains a large problem, as discussed by Andrews (1997).

Example 1: Censored and Nonlinear Quantile Regression. A prominent model in econometrics is the censored median regression model of Powell (1984). Powell's censored quantile regression estimator is defined to maximize the following nonlinear objective function:

$$L_n(\theta) = -\sum_{i=1}^{n} w_i\,\rho_\tau\big(Y_i - q(X_i,\theta)\big), \qquad q(X_i,\theta) = \max\big(0, g(X_i,\theta)\big),$$

where $\rho_\tau(u) = (\tau - 1(u < 0))\,u$ is the check function of Koenker and Bassett (1978), $w_i$ is a weight, and $Y_i$ is either positive or zero. Its conditional quantile $q(X_i,\theta)$ is specified as $\max(0, g(X_i,\theta))$. The censored quantile regression model was first formulated by Powell (1984) as a way to provide valid inference in Tobin-Amemiya models without distributional assumptions and with heteroscedasticity of unknown form. The extremum estimator based on Powell's criterion function, while theoretically elegant, has a well-known computational difficulty. The objective function is similar to that plotted in Figure 1: it is nonsmooth and highly nonconvex, with numerous local optima, posing a formidable obstacle to the practical use of this extremum estimator; see Buchinsky (1991), Fitzenberger (1997), Buchinsky and Hahn (1998), and Khan and Powell (2001) for related discussions. In this paper, we shall explore the use of LT estimators based on Powell's criterion function and show that this alternative is attractive both theoretically and computationally.

Example 2: Nonlinear IV and GMM. Amemiya (1977), Hansen (1982), and Hansen et al. (1996) introduced nonlinear IV and GMM estimators that maximize

$$L_n(\theta) = -\frac{n}{2}\Big(\frac{1}{n}\sum_{i=1}^{n} m_i(\theta)\Big)' W_n(\theta)\Big(\frac{1}{n}\sum_{i=1}^{n} m_i(\theta)\Big),$$

where $m_i(\theta)$ is a moment function defined such that the economic parameter of interest solves $E\,m_i(\theta_0) = 0$. The weighting matrix may be given by

$$W_n(\theta) = \Big[\frac{1}{n}\sum_{i=1}^{n} m_i(\theta)\,m_i(\theta)'\Big]^{-1} + o_p(1)$$

or other sensible choices. Note that the term "$o_p(1)$" in $W_n(\theta)$ implicitly incorporates generalized empirical likelihood estimators, which will be discussed in Section 4. Up to the first order, objective functions of empirical likelihood estimators for $\theta$ (with the Lagrange multiplier concentrated out) locally coincide with $L_n$. Applications of these estimators are numerous and important (e.g. Berry et al. (1995), Hansen et al. (1996), Imbens (1997)), but while global maxima are typically well-defined, it is also typical to see many local optima in applications. This leads to serious difficulties with applying the extremum approach in applications where the parameter dimension is high. As in the previous example, LTE's provide a computable and theoretically attractive alternative to extremum estimators. Furthermore, Quasi-posterior quantiles provide a valid and effective way to construct confidence intervals and explore the shape of the objective function.

Example 3: Instrumental and Robust Quantile Regression. Instrumental quantile regression may be defined by maximizing a standard nonlinear IV or GMM objective function[4]

$$L_n(\theta) = -\frac{n}{2}\Big(\frac{1}{n}\sum_{i=1}^{n} m_i(\theta)\Big)' W_n(\theta)\Big(\frac{1}{n}\sum_{i=1}^{n} m_i(\theta)\Big), \qquad m_i(\theta) = \big(\tau - 1(Y_i < q(D_i, X_i, \theta))\big)Z_i,$$

where $Y_i$ is the dependent variable, $D_i$ is a vector of possibly endogenous variables, $X_i$ is a vector of regressors, $Z_i$ is a vector of instruments, and $W_n(\theta)$ is a positive definite weighting matrix, e.g.

$$W_n(\theta) = \frac{1}{\tau(1-\tau)}\Big[\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i'\Big]^{-1} + o_p(1),$$

or other sensible versions. Motivations for estimating equations of this sort arise from traditional separable simultaneous equations, cf. Amemiya (1985), and also from more general nonseparable simultaneous equation models and heterogeneous treatment effect models.[5]

Clearly, a variety of Huber (1973) type robust estimators can be defined in this way. For example, suppose that in the absence of endogeneity $q(X,\theta) = X'\beta(\tau)$; then the instruments $Z = f(X)$ can be constructed to preclude the influence of outliers in $X$ on the inference. For example, choosing $Z_{ij} = 1(X_{ij} < \hat{X}_j)$, $j = 1, \ldots, \dim(X)$, where $\hat{X}_j$ denotes the median of $\{X_{ij}, i \le n\}$, produces an approach that is similar in spirit to the maximal regression depth estimator of Rousseeuw and Hubert (1999). In fact, the resulting objective function $L_n(\beta)$ is highly robust to both outliers in $X_{ij}$ and $Y_i$.[6]

Despite a clear appeal, the computational problem is daunting. The function $L_n(\beta)$ is highly nonconvex, almost everywhere flat, and has numerous discontinuities and local optima. (Note that the global optimum is well pronounced.) Figure 1 illustrates the situation. Again, in this case the LTE will yield a computable and theoretically attractive alternative to extremum-based estimation and inference.[7] Furthermore, we will show that the Quasi-posterior confidence intervals provide a valid and effective way to construct confidence intervals for parameters and their smooth functions without nonparametric estimation of the conditional density function evaluated at quantiles (needed in the standard approach).

[4] Early variants based on the Wald instruments go back to Mood (1950) and Hogg (1975), cf. Koenker (1998).

[5] See Chernozhukov and Hansen (2001) for the development of this direction.

[6] It appears that the breakdown properties of this objective function are similar to those of the maximal regression depth estimator of Rousseeuw and Hubert (1999), whose computational difficulty is well known, as discussed in van Aelst et al. (2002).

[7] Macurdy and Timmins (2001) propose to smooth out the edges using kernels; however, this does not eliminate non-convexities and local optima, see also Abadie (1995). Another computationally attractive approach, based on an extension of the Koenker and Bassett (1978) quantile regression estimator to instrumental quantile problems like these, is given in Chernozhukov and Hansen (2001).

The LTE's studied in this paper can be easily computed through Markov Chain Monte Carlo and other posterior simulation methods. To describe these estimators, note that although the objective function $L_n(\theta)$ is generally not a log-likelihood function, the transformation

$$p_n(\theta) = \frac{e^{L_n(\theta)}\pi(\theta)}{\int_\Theta e^{L_n(\theta)}\pi(\theta)\,d\theta} \qquad (2.1)$$

is a proper distribution density over the parameter of interest, called here the Quasi-posterior. Here $\pi(\theta)$ is a weight or prior probability density that is strictly positive and continuous over $\Theta$; for example, it can be constant over the parameter space. Note that $p_n$ is generally not a true posterior in the Bayesian sense, since it may not involve the conditional data density or likelihood, and is thus generally created through non-Bayesian statistical learning.

The Quasi-posterior mean is then defined as

$$\hat\theta = \int_\Theta \theta\,p_n(\theta)\,d\theta = \int_\Theta \theta\left(\frac{e^{L_n(\theta)}\pi(\theta)}{\int_\Theta e^{L_n(\theta)}\pi(\theta)\,d\theta}\right)d\theta, \qquad (2.2)$$

where $\Theta$ is the parameter space. Other quantities such as medians and quantiles will also be considered. A formal definition of LTE's is given in Definition 1.

In order to compute these estimators using Markov Chain Monte Carlo methods, we can draw a Markov chain (see Figure 1)

$$S = \big(\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(B)}\big),$$

whose marginal density is approximately given by $p_n(\theta)$, the Quasi-posterior distribution. Then the estimate $\hat\theta$, e.g. the Quasi-posterior mean, is computed as

$$\hat\theta = \frac{1}{B}\sum_{i=1}^{B}\theta^{(i)}.$$

Analogously, for a given continuously differentiable function $g: \Theta \to \mathbb{R}$, the 90%-confidence intervals are constructed simply by taking the .05-th and .95-th quantiles of the sequence

$$g(S) = \big(g(\theta^{(1)}), \ldots, g(\theta^{(B)})\big);$$

see Figure 1. Under the information equality restrictions discussed later, such confidence regions are asymptotically valid. Under other conditions, it is possible to use other Quasi-posterior quantities, such as the variance-covariance matrix of the series $S$, to define asymptotically valid confidence regions; see Section 3. It shall be emphasized repeatedly that the validity of this approach does not depend on the likelihood formulation.

2.2 Formal Definitions

Let $\rho_n(u)$ be a penalty or loss function associated with making an incorrect decision. Examples of $\rho_n(u)$ include

i. $\rho_n(u) = |\sqrt{n}\,u|^2$, the squared loss function,
ii. $\rho_n(u) = \sqrt{n}\sum_{j=1}^{d}|u_j|$, the absolute deviation loss function,
iii. $\rho_n(u) = \sqrt{n}\sum_{j=1}^{d}(\tau_j - 1(u_j < 0))\,u_j$, for $\tau_j \in (0,1)$ for each $j$, the check loss function of Koenker and Bassett (1978).

The parameter $\theta$ is assumed to belong to a subset $\Theta$ of Euclidean space. Using the Quasi-posterior density $p_n$ in (2.1), define the Quasi-posterior risk function as

$$Q_n(\zeta) = \int_\Theta \rho_n(\theta - \zeta)\,p_n(\theta)\,d\theta = \int_\Theta \rho_n(\theta - \zeta)\left(\frac{e^{L_n(\theta)}\pi(\theta)}{\int_\Theta e^{L_n(\theta)}\pi(\theta)\,d\theta}\right)d\theta. \qquad (2.3)$$

Definition 1. The class of LTE minimize the function $Q_n(\zeta)$ in (2.3) for various choices of $\rho_n$:

$$\hat\theta = \arg\inf_{\zeta\in\Theta}\big[Q_n(\zeta)\big]. \qquad (2.4)$$

The estimator $\hat\theta$ is a decision rule that is least unfavorable given the statistical (non-likelihood) information provided by the probability measure $p_n$, using the loss function $\rho_n$. In particular, the loss function $\rho_n$ may asymmetrically penalize deviations from the truth, and $\pi$ may give differential weights to different values of $\theta$.
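The computation described in Section 2.1 (draw a chain whose marginal density approximates $p_n$, then read off means, medians, and quantiles) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a random-walk Metropolis sampler (one particular MCMC scheme, chosen for simplicity), a flat weight $\pi$ on a hypothetical box $[-10, 10]^2$ standing in for the compact parameter space, Powell's criterion from Example 1 with unit weights $w_i = 1$, $\tau = 0.5$ and a linear index $g(X, \theta) = a + bX$, and simulated censored data. The function names, step size, starting value, and chain lengths are all hypothetical choices.

```python
import numpy as np


def powell_criterion(theta, y, x, tau=0.5):
    """Powell's criterion L_n(theta) = -sum_i rho_tau(Y_i - max(0, a + b * X_i)),
    with unit weights w_i = 1 and theta = (a, b) (illustrative assumptions)."""
    q = np.maximum(0.0, theta[0] + theta[1] * x)        # censored conditional quantile
    u = y - q
    return -np.sum((tau - (u < 0).astype(float)) * u)  # minus the check loss


def quasi_posterior_draws(log_crit, theta_init, n_draws=30_000, burn_in=5_000,
                          step=0.1, bounds=(-10.0, 10.0), seed=0):
    """Random-walk Metropolis chain targeting p_n(theta), proportional to
    exp(L_n(theta)) * pi(theta), with pi flat on the box bounds^d."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta_init, dtype=float)
    current = log_crit(theta)
    draws = np.empty((n_draws, theta.size))
    for b in range(n_draws):
        proposal = theta + step * rng.standard_normal(theta.size)
        # pi is zero outside the box, so such proposals are always rejected
        if np.all((proposal > bounds[0]) & (proposal < bounds[1])):
            candidate = log_crit(proposal)
            # accept with probability min(1, exp(L_n(proposal) - L_n(theta)))
            if np.log(rng.uniform()) < candidate - current:
                theta, current = proposal, candidate
        draws[b] = theta
    return draws[burn_in:]


# Simulated censored data: Y = max(0, 1 + 2 X + e), so the true theta is (1, 2).
data_rng = np.random.default_rng(42)
n = 500
x = data_rng.standard_normal(n)
y = np.maximum(0.0, 1.0 + 2.0 * x + data_rng.standard_normal(n))

# Rough starting value (assumption): Powell's criterion is flat wherever
# q(X_i, theta) = 0 for all i, so starting inside that region would slow the chain.
draws = quasi_posterior_draws(lambda t: powell_criterion(t, y, x), theta_init=[0.5, 0.5])
theta_mean = draws.mean(axis=0)                          # LTE under squared loss (i)
theta_median = np.median(draws, axis=0)                  # LTE under absolute loss (ii)
interval_90 = np.quantile(draws, [0.05, 0.95], axis=0)   # .05-th and .95-th quantiles
print(theta_mean, theta_median, interval_90)
```

The mean and median correspond to loss functions i and ii above. The quantile interval is only a mechanical illustration here: as discussed in Section 3, its asymptotic validity requires a generalized information equality, which this unscaled check-loss criterion need not satisfy.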
The solutions to the problem (2.4) for loss functions i-iii include the Quasi-posterior means, medians, and marginal $\tau_j$-th quantiles, respectively.[8]

[8] For example, the formulation implies that, conditional on the data, the decision $\hat\theta$ satisfies Savage's axioms of choice under uncertainty with subjective probabilities given by $p_n$ (these include the usual asymmetry and negative transitivity of the strict preference relationship, independence, and some other standard axioms).

2.3 Related Literature

Our analysis will rely heavily on the previous work on Bayesian estimators in the likelihood setting. The initial large sample work on Bayesian estimators was done by Laplace (see Stigler (1975) for a detailed review). Further early work of Bernstein (1917) and von Mises (1931) has been considerably extended in both econometric and statistical research, cf. Bickel and Yahav (1969), Ibragimov and Has'minskii (1981), Andrews (1994b), Phillips and Ploberger (1996), and Kim (1998), among others. In general, Bayesian asymptotics require very delicate control of the tail of the posterior distribution, and were developed in useful generality much later than the asymptotics of extremum estimators. The treatments of Bickel and Yahav (1969) and Ibragimov and Has'minskii (1981) are most useful for the present setting, but are inevitably tied down to the likelihood setting. For example, the latter treatment relies heavily on Hellinger bounds that are firmly rooted in the objective function being a likelihood of iid data. However, the general flavor of the approach is suited for the present purposes. The treatment of Bickel and Yahav (1969) can be easily extended to smooth, possibly incorrect iid likelihoods,[9] but does not apply to censored median regression or any of the GMM type objective functions. Andrews (1994b) and Phillips and Ploberger (1996) study the large sample approximation of posteriors and posterior odds ratio tests in relation to the classical Wald tests in the context of smooth, correctly specified likelihoods. Kim (1998) derives the limit behavior of posteriors in likelihood models over shrinking neighborhood systems. Kim's approach and related approaches have been important in describing the essence of posterior behavior, but the limit behavior of point estimates like ours does not follow from it.[10]

[9] See Bunke and Milhaud (1998) for an extension to the smooth misspecified iid likelihood case. The conditions require the criterion function to be more than three times differentiable, so they do not apply to Example 1 or even to GMM.

[10] E.g., to describe the behavior of the posterior median one needs to know $\int_{-\infty}^{t} p_n(\theta)\,d\theta$, which requires the study of $L_n(\theta)$ beyond the compact $1/\sqrt{n}$ neighborhoods of $\theta_0$. Similarly, the posterior mean is $\int_{-\infty}^{\infty}\theta\,p_n(\theta)\,d\theta$, which also requires the study of the complete $L_n(\theta)$.

Formally and substantively, none of the above treatments apply to our motivating examples and the estimators given in Definition 1. These examples do not involve likelihoods, deal mostly with GMM type objective functions, and often involve discontinuous and non-smooth criterion functions to which the above mentioned results do not apply. In order to develop the theory of LTE's for such examples, we extend the previous arguments. The results obtained here enable the use of Bayesian tools outside of the Bayesian framework, covering models with non-likelihood-based criterion functions, such as the examples listed earlier and other semi-parametric objective functions that may, for example, depend on preliminary estimates of infinite-dimensional nuisance parameters. Moreover, our results apply to general forms of data generating processes, from the cross-sectional framework of Amemiya (1985) to the nonlinear dynamic framework of Gallant and White (1988) and Potscher and Prucha (1997).

Our motivating problems are all semi-parametric, and there are several pure Bayesian approaches to such problems, see notably Doksum and Lo (1990), Diaconis and Freedman (1986), Hahn (1997), Chamberlain and Imbens (1997), Kottas and Gelfand (2001). Semi-parametric models have some parametric and nonparametric components, e.g. the unspecified nonparametric distribution of data in Examples 1-3. The mentioned papers proceed with the pure Bayesian approach to such problems, which involves Bayesian learning about these two components via a two-step process. In the first step, Bayesian nonparametric learning with Dirichlet priors is used to form beliefs about the joint nonparametric density of data; then draws of the nonparametric density ("Bayesian bootstrap") are made repeatedly to compute the extremum parameter of interest. This approach is purely Bayesian, as it fully conforms to the Bayes learning model. It is clear that this approach is generally quite different from the LTE's or QBE's studied in this paper, and in applications it still requires numerous re-computations of the extremum estimates in order to construct the posterior distribution over the parameter of interest. In sharp contrast, LT estimation takes a "shortcut" by essentially using the common criterion functions as posteriors, and thus entirely avoids both the estimation of the nonparametric distribution of the data and the repeated computation of extremum estimates.

Finally, note that the LTE approach has a limited-information or semi-parametric nature, in the sense that we do not know, or are not willing to specify, the complete data density. The limited-information principle is powerfully elaborated in the recent work of Zellner (1998), who starts with a set of moment conditions, calculates the maximum entropy densities consistent with the moment equations, and uses those as formal (misspecified) likelihoods. While the calculation of the maximum entropy densities is not needed in the present framework, the large sample theory obtained here does cover Zellner's (1998) estimators as one fundamental case. Related work by Kim (2002) derives a limited information likelihood interpretation for certain smooth GMM settings.[11] In addition, the LTE's based on the empirical likelihood are introduced in Section 4 and motivated there as respecting the limited information principle.

[11] Kim (2002) also provides some useful asymptotic results for $\exp(L_n(\theta))$ using the shrinking neighborhood approach. However, Kim's (2002) approach does not cover the estimators and procedures considered here; see the previous footnote.

3 Large Sample Properties

This section shows that under general regularity conditions the Quasi-posterior distribution concentrates at the speed $1/\sqrt{n}$ around the true parameter $\theta_0$ as measured by the "total variation of moments" norm (and the total variation norm as a special case), that the LT estimators are consistent and asymptotically normal, and that Quasi-posterior quantiles and other relevant quantities provide asymptotically valid confidence intervals.

3.1 Assumptions

We begin by stating the main assumptions. In addition, it is assumed without further notice that the criterion functions $L_n(\theta)$ and other primitive objects have the standard measurability properties. For example, given the underlying probability space $(\Omega, \mathcal{F}, P)$, for any $\theta \in \Theta$, $L_n(\theta)$ is a random variable, that is, a measurable function of $\omega \in \Omega$.

ASSUMPTION 1 (Parameter) The true parameter $\theta_0$ belongs to the interior of a compact convex subset $\Theta$ of Euclidean space $\mathbb{R}^d$.

ASSUMPTION 2 (Penalty Function) The loss function $\rho_n: \mathbb{R}^d \to \mathbb{R}_+$ satisfies:
i. $\rho_n(u) = \rho(\sqrt{n}\,u)$, where $\rho(u) \ge 0$ and $\rho(u) = 0$ iff $u = 0$,
ii. $\rho$ is convex and $\rho(h) \le 1 + |h|^p$ for some $p \ge 1$,
iii. $\varphi(\xi) = \int_{\mathbb{R}^d}\rho(u - \xi)\,e^{-u'au}\,du$ is minimized uniquely at some $\xi^* \in \mathbb{R}^d$ for any finite $a > 0$,
iv. the weighting function $\pi: \mathbb{R}^d \to \mathbb{R}_+$ is a continuous, uniformly positive density function.

ASSUMPTION 3 (Identifiability) For any $\delta > 0$, there exists $\epsilon > 0$ such that

$$\lim_{n\to\infty} P^*\Big\{\sup_{|\theta-\theta_0|\ge\delta}\frac{1}{n}\big(L_n(\theta) - L_n(\theta_0)\big) \le -\epsilon\Big\} = 1.$$

ASSUMPTION 4 (Expansion) For $\theta$ in an open neighborhood of $\theta_0$,
i. $L_n(\theta) - L_n(\theta_0) = (\theta - \theta_0)'\Delta_n(\theta_0) - \frac{1}{2}(\theta - \theta_0)'[nJ_n(\theta_0)](\theta - \theta_0) + R_n(\theta)$,
ii. $\Omega_n^{-1/2}(\theta_0)\,\Delta_n(\theta_0)/\sqrt{n} \xrightarrow{d} N(0, I)$,
iii. $\Omega_n(\theta_0) = O(1)$ and $J_n(\theta_0) = O(1)$ are uniformly in $n$ positive definite constant matrices,
iv. for each $\epsilon > 0$ there is a sufficiently small $\delta > 0$ and a large $M > 0$ such that
(a) $\limsup_n P^*\Big\{\sup_{M/\sqrt{n} \le |\theta - \theta_0| \le \delta}\dfrac{|R_n(\theta)|}{n|\theta - \theta_0|^2} > \epsilon\Big\} < \epsilon$,
(b) $\limsup_n P^*\Big\{\sup_{|\theta - \theta_0| \le M/\sqrt{n}}|R_n(\theta)| > \epsilon\Big\} = 0$.

3.2 Discussion of Assumptions

In the following we discuss the stated assumptions under which Theorems 1-4 stated below will be true. We argue that these assumptions are simple but encompass a wide variety of econometric models, from cross-sectional models to nonlinear dynamic models. This means that Theorems 1-4 are of wide interest and applicability.

In general, Assumptions 1-4 are related to but different from those in Bickel and Yahav (1969) and Ibragimov and Has'minskii (1981). The most substantial differences appear in Assumption 4, and are due to the general non-likelihood setting. Also, in Assumption 4 we introduce Huber type conditions to handle the tail behavior of discontinuous and non-smooth criterion functions. In general, the early approaches are inevitably tied to the iid likelihood formulation, which is not suited for the present purposes.

The compactness Assumption 1 is conventional. It is shown in the proof of Theorem 1 that it is not difficult to drop compactness. For example, in the case of Quasi-posterior quantiles in Theorem 3, it is only required that $\pi$ is a proper density; in the case of Quasi-posterior variances in Theorem 4, it is only required that $\int_\Theta |\theta|^2\pi(\theta)\,d\theta < \infty$. Also, for the general loss functions considered in Assumption 2, it is only required that $\int_\Theta |\theta|^p\pi(\theta)\,d\theta < \infty$. Of course, the requirement that the true parameter lies in the interior of the parameter space rules out some non-regular cases; see for example Andrews (1999).

Assumption 2 imposes convexity on the penalty function. We do not consider non-convex penalty functions for pragmatic reasons: one of the main motivations of this paper is the generic computability of the estimates, given that they solve well-defined convex optimization problems. The domination condition, $\rho(h) \le 1 + |h|^p$ for some $1 \le p < \infty$, is conventional and is satisfied in all examples of $\rho$ we gave. The assumption that $\varphi(\xi) = \int\rho(u - \xi)e^{-u'au}\,du \propto E\rho\big(N(0, (2a)^{-1}) - \xi\big)$ attains a unique minimum at some finite $\xi^*$ for any positive definite $a$ is required, and it clearly holds for all examples of $\rho$ we mentioned. In fact, when $\rho$ is symmetric, $\xi^* = 0$ by Anderson's (1955) lemma.

Assumption 3 is implied by the usual uniform convergence and unique identification conditions, as in Amemiya (1985).

LEMMA 1 Given Assumption 1, Assumption 3 holds if there is a function $M_n(\theta)$ that is nonstochastic and continuous on $\Theta$ such that
i. for any $\delta > 0$, $\limsup_n\big(\sup_{|\theta - \theta_0| \ge \delta}M_n(\theta) - M_n(\theta_0)\big) < 0$,
ii. $L_n(\theta)/n - M_n(\theta)$ converges to zero in (outer) probability uniformly over $\Theta$.

The proof of Lemma 1 can be found in Amemiya (1985) and White (1994). Assumption 4.i is satisfied under the conditions of Lemma 2 below. Assumption 4.ii requires asymptotic normality to hold, and is satisfied under conditions that are known to be mild for cross-sectional and many time-series applications.
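Assumption 2.iii and the remark on Anderson's lemma can be checked numerically in the scalar case. The sketch below (NumPy plus the standard-library `statistics.NormalDist`) approximates $\varphi(\xi) = \int\rho(u - \xi)e^{-au^2}du$ by a Riemann sum on a truncated grid and minimizes it by golden-section search; the helper names, the grid, and the choice $a = 1/2$ are assumptions made for illustration. With $a = 1/2$ the weight is proportional to the $N(0,1)$ density, so the check loss with $\tau = 0.25$ should give $\xi^*$ close to the 0.25-quantile of $N(0,1)$, while the symmetric absolute and squared losses should give $\xi^*$ close to 0.

```python
import numpy as np
from statistics import NormalDist


def check_loss(v, tau):
    """Koenker-Bassett check loss rho_tau(v) = (tau - 1(v < 0)) v."""
    return (tau - (v < 0).astype(float)) * v


def phi(xi, loss, a=0.5, half_width=10.0, m=20_001):
    """Riemann-sum approximation of phi(xi) = int loss(u - xi) exp(-a u^2) du."""
    u = np.linspace(-half_width, half_width, m)
    du = u[1] - u[0]
    return np.sum(loss(u - xi) * np.exp(-a * u ** 2)) * du


def golden_section_min(f, lo, hi, tol=1e-7):
    """Minimise a convex scalar function on [lo, hi] by golden-section search."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c              # minimiser lies in [lo, d]
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d              # minimiser lies in [c, hi]
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)


tau = 0.25
xi_check = golden_section_min(lambda xi: phi(xi, lambda v: check_loss(v, tau)), -5.0, 5.0)
xi_abs = golden_section_min(lambda xi: phi(xi, np.abs), -5.0, 5.0)
xi_sq = golden_section_min(lambda xi: phi(xi, np.square), -5.0, 5.0)

# Expected: xi_check near NormalDist().inv_cdf(0.25); xi_abs and xi_sq near 0.
print(xi_check, NormalDist().inv_cdf(tau), xi_abs, xi_sq)
```

Golden-section search is used only because $\varphi$ is convex (hence unimodal) under Assumption 2.ii; any scalar convex minimizer would do.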
Assumption 4.iii is generally a weak assumption in nonlinear models; it rules out the cases of mixed asymptotic normality for some non-stationary time series models (which can be incorporated at a notational cost with different scaling rates). Assumption 4.iv easily holds when there is enough smoothness.

LEMMA 2 Given Assumptions 1 and 3, Assumption 4 holds with $\Delta_n(\theta_0) = \nabla_\theta L_n(\theta_0)$ and $J_n(\theta_0) = -\nabla_{\theta\theta'}M_n(\theta_0)$ if
i. for some $\delta > 0$, $L_n(\theta)$ and $M_n(\theta)$ are twice continuously differentiable in $\theta$ when $|\theta - \theta_0| < \delta$,
ii. there is $\Omega_n(\theta_0) = O(1)$ and $J_n(\theta_0) = O(1)$, both uniformly positive definite, such that $\Omega_n^{-1/2}(\theta_0)\nabla_\theta L_n(\theta_0)/\sqrt{n} \xrightarrow{d} N(0, I)$,
iii. for some $\delta > 0$ and each $\epsilon > 0$,
$$\limsup_n P^*\Big\{\sup_{|\theta - \theta_0| \le \delta}\big|\nabla_{\theta\theta'}L_n(\theta)/n - \nabla_{\theta\theta'}M_n(\theta)\big| > \epsilon\Big\} = 0.$$

Lemma 2 is immediate, hence its proof is omitted.

Both Lemmas 1 and 2 are simple but useful conditions that can be easily verified using standard uniform laws of large numbers and central limit theorems. In particular, they have been proven to hold for criterion functions corresponding to
1. most smooth cross-sectional models described in Amemiya (1985);
2. the smooth GMM and dynamic nonlinear stationary Quasi-likelihood models of Hansen (1982), Gallant and White (1988), and Potscher and Prucha (1997), covering Gordin (mixingale type) conditions and near-epoch dependent processes such as ARMA, GARCH, ARCH, and other models alike;
3. general empirical likelihood models for smooth moment equation models studied by Imbens (1997), Kitamura and Stutzer (1997), Newey and Smith (2001), Owen (1989, 1990, 1991, 2001), Qin and Lawless (1994), and the recent extensions to the conditional moment equations.

Hence the main statistical results of this paper, Theorems 1-4, apply to these fundamental econometric models. Moreover, Assumption 4 does not require differentiability of the criterion function and thus holds even more generally.
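For criterion functions that are exactly quadratic in $\theta$, such as the linear-IV special case of Example 2 with a fixed weighting matrix, the expansion in Assumption 4.i holds with $R_n(\theta) = 0$, $\Delta_n(\theta_0) = \nabla_\theta L_n(\theta_0)$ and $J_n(\theta_0) = A'WA$, in line with Lemma 2. The NumPy sketch below checks this on simulated data. The data-generating process, the linear moment function $m_i(\theta) = Z_i(Y_i - X_i'\theta)$, and the weighting matrix $W = (Z'Z/n)^{-1}$ are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, d = 400, 3, 2                                 # observations, instruments, parameters
Z = rng.standard_normal((n, k))
X = Z[:, :d] + 0.5 * rng.standard_normal((n, d))    # regressors correlated with Z
theta_0 = np.array([1.0, -0.5])                     # true parameter
Y = X @ theta_0 + rng.standard_normal(n)

A = Z.T @ X / n                    # so that g_n(theta) = b - A theta
b = Z.T @ Y / n
W = np.linalg.inv(Z.T @ Z / n)     # hypothetical fixed weighting matrix


def g_n(theta):
    """Sample moment (1/n) sum_i Z_i (Y_i - X_i' theta)."""
    return b - A @ theta


def L_n(theta):
    """GMM criterion L_n(theta) = -(n/2) g_n(theta)' W g_n(theta)."""
    g = g_n(theta)
    return -0.5 * n * g @ W @ g


Delta_n = n * A.T @ W @ g_n(theta_0)   # gradient of L_n at theta_0
J_n = A.T @ W @ A                      # minus the Hessian of L_n, divided by n

theta = theta_0 + np.array([0.3, 0.2])
delta = theta - theta_0
exact = L_n(theta) - L_n(theta_0)
quadratic = delta @ Delta_n - 0.5 * delta @ (n * J_n) @ delta
R_n = exact - quadratic                # zero up to rounding for a quadratic criterion
print(exact, quadratic, R_n)
```

For nonlinear moments or data-dependent weighting the remainder $R_n(\theta)$ is no longer zero, and Assumption 4.iv is what controls it.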
Assumption 4.iv is a Huber-like stochastic equicontinuity condition, which requires that the remainder term of the expansion can be controlled in a particular way over a neighborhood 4 are given in empirical of 6q. In addition to Lemma 2, many sufficient process literature, e.g. Amemiya (1985), conditions for Assumption Andrews (1994a), Newey Pakes and Pollard (1989), and van der Vaart and Wellner (1996). Section 4 for the leading verifies (1991), Assumption 4 models with nonsmooth criterion functions, including the examples discussed in the previous section. Convergence 3.3 in the Total Variation of Moments Norm we show that the Quasi-posterior density concentrates around #o at the speed 1/y/n as measured by the total variation of moments norm, and then use this preliminary Under Assumptions 1-4, result to prove all other main results. Define the local parameter h as a normalized deviation from 9q and centered at the normalized random "score function" h = y/n~ (9 - 0„) 1 - Jn (Bo)' A„ (9 ) lyfc. Define by the Jacobi rule the localized Quasi-posterior density for h as p' (h) n Define the total variation of = -)=p n (hl^fc + 9 + Jn (Oo)- moments norm \\f\\ THEOREM 1 - 4, for any 1 ( for a real-valued TVMM = J 1 A n (0O ) /») measurable function / on S as (l+\h\ a )\f(h)\dh. Convergence in Total Variation of Moments Norm) Under Assumptions < a < 00, \\PnW ~ Plo(h)\\ TVMM = f (1 + 12 \h\ a )\ * * Pn (h) - Px (h)\dh^ p 0, Hn = {y/n{9 - 9 where Theorem 1 ) - Jn -1 A n (0O (0 O ) ) /y/K 0} and 9 e : shows that p n (9) is concentrated at a l/y/n neighborhood of 9 as measured by the total moments norm. 
For large $n$, $p_n(\theta)$ is approximately a random normal density with random mean parameter $\theta_0 + J_n(\theta_0)^{-1}\Delta_n(\theta_0)/n$ and constant variance parameter $J_n(\theta_0)^{-1}/n$. Theorem 1 applies to general statistical criterion functions $L_n(\theta)$; hence it covers the parametric likelihood setting as a fundamental case, in particular implying the Bernstein-von Mises theorems, which state the convergence of the likelihood posterior to the limit random density in the total variation norm. Note also that the total variation norm results from setting $\alpha = 0$ in the total variation of moments norm. The use of the latter is needed to deduce the convergence of LTE's such as the posterior means or variances in Theorems 2-4.

3.4 Limit Results for Point Estimates and Confidence Intervals

As a consequence of Theorem 1, Theorem 2 establishes $\sqrt{n}$-consistency and asymptotic normality of LTE's. When the loss function $\rho(\cdot)$ is symmetric, LTE's are asymptotically equivalent to the extremum estimators. Recall that the extremum estimator $\sqrt{n}(\theta_{ex} - \theta_0)$, where $\theta_{ex} = \arg\sup_{\theta\in\Theta}L_n(\theta)$, is first-order equivalent to
$$U_n \equiv J_n(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt{n}.$$
Given that $p_n^*$ approaches $p_\infty^*$, it may be expected that the LTE $\sqrt{n}(\hat\theta - \theta_0)$ is asymptotically equivalent to
$$Z_n \equiv \arg\inf_{z\in\mathbb{R}^d}\int_{\mathbb{R}^d}\rho(z - u)\,p_\infty^*(u - U_n)\,du.$$
To see the relationship between $Z_n$ and $U_n$, define
$$\xi_n \equiv \arg\inf_{z\in\mathbb{R}^d}\int_{\mathbb{R}^d}\rho(z - u)\,p_\infty^*(u)\,du,$$
which exists by Assumption 2. The change of variables $u \mapsto u + U_n$ shows that $Z_n = \xi_n + U_n$. If $\rho$ is symmetric, i.e. $\rho(h) = \rho(-h)$, then $\xi_n = 0$ by Anderson's lemma. For example, in the scalar parameter case, if $\rho(h) = (\alpha - 1(h < 0))h$, the constant $\xi_n = q_\alpha J_n(\theta_0)^{-1/2}$, where $q_\alpha$ is the $\alpha$-quantile of $N(0, 1)$. We are now prepared to state the result.

THEOREM 2 (LTE in Large Samples) Under Assumptions 1-4,
$$\sqrt{n}(\hat\theta - \theta_0) = \xi_n + U_n + o_p(1), \qquad \Omega_n(\theta_0)^{-1/2}J_n(\theta_0)\,U_n \to_d N(0, I).$$
Hence, if the loss function $\rho_n$ is symmetric, i.e. $\rho_n(h) = \rho_n(-h)$ for all $h$, then $\xi_n = 0$ for each $n$.
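The identity $Z_n = \xi_n + U_n$ and the Anderson's-lemma conclusion $\xi_n = 0$ for symmetric losses can be checked numerically in the scalar case, where $p_\infty^*(u - U_n)$ is the $N(U_n, J_n(\theta_0)^{-1})$ density. The sketch below is only an illustration: the values of $J_n(\theta_0)$ and $U_n$ are hypothetical, the integral is approximated by the trapezoid rule, and the minimization uses golden-section search.

```python
import math
from statistics import NormalDist


def expected_loss(z, loss, mean, sd, n_grid=2001, width=8.0):
    """Trapezoid approximation of the integral of loss(z - u) * phi(u; mean, sd) du."""
    dist = NormalDist(mean, sd)
    lo = mean - width * sd
    step = 2.0 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * loss(z - u) * dist.pdf(u)
    return total * step


def golden_section_min(f, a, b, tol=1e-5):
    """Minimise a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # the minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # the minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


J = 2.5    # hypothetical value of J_n(theta_0)
U_n = 0.7  # hypothetical realisation of the centring term U_n
sd = 1.0 / math.sqrt(J)

losses = {"absolute": abs, "squared": lambda h: h * h}
xi = {}
for name, rho in losses.items():
    z_n = golden_section_min(lambda z: expected_loss(z, rho, U_n, sd), U_n - 3.0, U_n + 3.0)
    xi[name] = z_n - U_n  # xi_n = Z_n - U_n
```

Both entries of `xi` should be numerically zero, i.e. the minimizer sits at $U_n$, which is what Theorem 2 predicts for symmetric $\rho$.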
In order for the Quasi-posterior distribution to provide valid large sample confidence intervals, the density of $W_n \sim N(0, J_n(\theta_0)^{-1}\Omega_n(\theta_0)J_n(\theta_0)^{-1})$ should coincide with that of $p_\infty^*(h)$. This requires
$$\int hh'\,p_\infty^*(h)\,dh = J_n(\theta_0)^{-1} \approx \operatorname{Var}(W_n) = J_n(\theta_0)^{-1}\Omega_n(\theta_0)J_n(\theta_0)^{-1}, \quad\text{or equivalently}\quad \Omega_n(\theta_0) \approx J_n(\theta_0),$$
which is a generalized information equality. The information equality is known to hold for regular, correctly specified likelihoods. It is also known to hold for appropriately constructed criterion functions of generalized method of moments, minimum distance estimators, generalized empirical likelihood estimators, and properly weighted extremum estimators; see Section 4.

Consider the construction of confidence intervals for the quantity $g(\theta_0)$, and suppose $g$ is continuously differentiable. Define
$$F_{g,n}(x) = \int_{\theta\in\Theta:\,g(\theta)\le x}p_n(\theta)\,d\theta \quad\text{and}\quad c_{g,n}(\alpha) = \inf\{x : F_{g,n}(x) \ge \alpha\}.$$
Then an LT confidence interval is given by $[c_{g,n}(\alpha/2),\ c_{g,n}(1 - \alpha/2)]$. As previously mentioned, these confidence intervals can be constructed by using the $\alpha/2$ and $1 - \alpha/2$ quantiles of the MCMC sequence $(g(\theta^{(1)}), \ldots, g(\theta^{(B)}))$, and thus are quite simple in practice. In order for the intervals to be valid in large samples, one needs to ensure the generalized information equality, which can be done easily through the use of optimal weighting in GMM and minimum-distance criterion functions or the use of generalized empirical likelihood functions; see Section 4.

Consider now the usual asymptotic intervals based on the $\Delta$-method and any estimator with the property $\sqrt{n}(\hat\theta - \theta_0) = J_n(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt{n} + o_p(1)$. Such intervals are usually given by
$$\left[g(\hat\theta) + q_{\alpha/2}\sqrt{\nabla_\theta g(\hat\theta)'J_n(\theta_0)^{-1}\Omega_n(\theta_0)J_n(\theta_0)^{-1}\nabla_\theta g(\hat\theta)/n},\ \ g(\hat\theta) + q_{1-\alpha/2}\sqrt{\nabla_\theta g(\hat\theta)'J_n(\theta_0)^{-1}\Omega_n(\theta_0)J_n(\theta_0)^{-1}\nabla_\theta g(\hat\theta)/n}\right],$$
where $q_\alpha$ is the $\alpha$-quantile of the standard normal distribution. The following theorem establishes the large sample correspondence of the Quasi-posterior confidence intervals to the above intervals.

THEOREM 3 (Large Sample Inference I) Suppose Assumptions 1-4 hold. In addition, suppose that the generalized information equality holds:
$$\lim_{n\to\infty}J_n(\theta_0)\Omega_n(\theta_0)^{-1} = I.$$
Then, for any $\alpha \in (0, 1)$,
$$c_{g,n}(\alpha) - g(\hat\theta) - q_\alpha\sqrt{\nabla_\theta g(\theta_0)'J_n(\theta_0)^{-1}\nabla_\theta g(\theta_0)/n} = o_p\!\left(\frac{1}{\sqrt{n}}\right),$$
and
$$\lim_{n\to\infty}P^*\{c_{g,n}(\alpha/2) \le g(\theta_0) \le c_{g,n}(1 - \alpha/2)\} = 1 - \alpha.$$

One practical limitation of this result arises in the case of regression criterion functions (M-estimators), where achieving the information equality may require nonparametric estimation of appropriate weights, as in the censored quantile regression discussed in Section 4. This may be avoided entirely by using a different method for the construction of confidence intervals. Instead of the Quasi-posterior quantiles, we can use the Quasi-posterior variance as an estimate of the inverse of the population Hessian matrix, $J_n^{-1}(\theta_0)$, and combine it with any available estimate of $\Omega_n(\theta_0)$ (which typically is easier to obtain) in order to obtain $\Delta$-method style intervals. The usefulness of this method is particularly evident in the censored quantile regression, where direct estimation of $J_n(\theta_0)$ requires the use of nonparametric methods.

THEOREM 4 (Large Sample Inference II) Suppose Assumptions 1-4 hold. Define, for $\hat\theta = \int_\Theta\theta\,p_n(\theta)\,d\theta$,
$$\hat J_n^{-1}(\theta_0) = \int_\Theta n(\theta - \hat\theta)(\theta - \hat\theta)'\,p_n(\theta)\,d\theta,$$
and
$$c_{g,n}(\alpha) = g(\hat\theta) + q_\alpha\sqrt{\nabla_\theta g(\hat\theta)'\hat J_n^{-1}(\theta_0)\hat\Omega_n(\theta_0)\hat J_n^{-1}(\theta_0)\nabla_\theta g(\hat\theta)/n},$$
where $\hat\Omega_n(\theta_0)\Omega_n(\theta_0)^{-1} \to_p I$. Then $\hat J_n^{-1}(\theta_0)J_n(\theta_0) \to_p I$, and
$$\lim_{n\to\infty}P^*\{c_{g,n}(\alpha/2) \le g(\theta_0) \le c_{g,n}(1 - \alpha/2)\} = 1 - \alpha.$$
In practice, $\hat J_n^{-1}(\theta_0)$ is computed by multiplying by $n$ the variance-covariance matrix of the MCMC sequence $S = (\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(B)})$.

4 Applications to Selected Problems

This section further elaborates the approach through several examples. Assumptions 1-4 cover a wide variety of smooth econometric models (by virtue of Lemma 1 and Lemma 2). Thus, what follows is mainly motivated by models with non-smooth moment equations, such as those occurring in Examples 1-3. Verification of the key Assumption 4 is not immediate in these examples, and Propositions 1-3 and the forthcoming examples show how to do this in a class of models that are of prime interest to us.
4.1 Generalized Method of Moments and Nonlinear Instrumental Variables

Going back to Example 2, recall that a typical model that underlies the applications of the method of moments is a set of population moment equations:
$$E m_i(\theta) = 0 \ \text{ if and only if } \ \theta = \theta_0. \tag{4.1}$$
Method of moments estimators involve maximizing an objective function of the form
$$L_n(\theta) = -n\,(g_n(\theta))'W_n(\theta)(g_n(\theta))/2, \tag{4.2}$$
$$g_n(\theta) = \frac1n\sum_{i=1}^n m_i(\theta), \tag{4.3}$$
$$W_n(\theta) = W(\theta) + o_p(1) \ \text{ uniformly in } \theta\in\Theta, \tag{4.4}$$
$$W(\theta) > 0 \ \text{ and continuous uniformly in } \theta\in\Theta, \tag{4.5}$$
$$W(\theta_0) = \left[\lim_{n\to\infty}\operatorname{Var}\left(\sqrt{n}\,g_n(\theta_0)\right)\right]^{-1}. \tag{4.6}$$
The choice (4.6) of the weighting matrix implies the generalized information equality under standard regularity conditions. Generally, by a Central Limit Theorem, $\sqrt{n}\,g_n(\theta_0) \to_d N(0, W^{-1}(\theta_0))$, so that the objective function can be interpreted as the approximate log-likelihood for the sample moments of the data $g_n(\theta)$. Thus we can think of GMM as an approach that specifies an approximate likelihood for selected moments of the data without specifying the likelihood of the entire data.[13]

[13] This does not help much in terms of providing formal asymptotic results for the GMM model.

We may impose Assumptions 1-4 directly on the GMM objective function. However, to highlight their plausibility and to elaborate on some examples that satisfy Assumption 4, consider the following proposition.

PROPOSITION 1 (Method-of-Moments and Nonlinear IV) Suppose that Assumptions 1-2 hold, that conditions (4.1)-(4.5) hold, and that for all $\theta$ in $\Theta$, $m_i(\theta)$ is stationary and ergodic, and

i. $G(\theta) = \nabla_\theta E m_i(\theta)$ is continuous;
ii. $\Delta_n(\theta_0)/\sqrt{n} = -G(\theta_0)'W(\theta_0)\sqrt{n}\,g_n(\theta_0) \to_d N(0, \Omega(\theta_0))$, where $\Omega(\theta_0) = G(\theta_0)'W(\theta_0)G(\theta_0)$;
iii. $J(\theta) = G(\theta)'W(\theta)G(\theta)$ is continuous;
iv. for any $\epsilon > 0$ there is $\delta > 0$ such that
$$\limsup_{n\to\infty}P^*\left\{\sup_{|\theta-\theta'|\le\delta}\frac{\sqrt{n}\left|(g_n(\theta) - Eg_n(\theta)) - (g_n(\theta') - Eg_n(\theta'))\right|}{1 + \sqrt{n}\,|\theta - \theta'|} > \epsilon\right\} < \epsilon. \tag{4.7}$$

Then Assumption 4 holds with $\Delta_n(\theta_0)$, $\Omega_n(\theta_0) = \Omega(\theta_0)$, and $J_n(\theta_0) = J(\theta_0)$ defined above. Therefore, the conclusions of Theorems 1-4 hold, where condition (4.6) is only needed for the conclusions of Theorem 3 to hold. In addition, the information equality holds by construction.
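Equations (4.2)-(4.3) translate directly into code. The sketch below assumes the moments are supplied as a Python function `moment_fn(theta, obs)` returning a list (a hypothetical interface, not from the paper). It evaluates $L_n(\theta)$ and builds a weighting matrix in the spirit of (4.6) as the inverse of the uncentered sample second-moment matrix of $m_i(\tilde\theta)$ at a preliminary estimate $\tilde\theta$; only two moments are handled so that the inverse can be written in closed form, and the data are made up.

```python
from typing import Callable, List, Sequence, Tuple

Obs = Tuple[float, float]
MomentFn = Callable[[float, Obs], List[float]]
Matrix = List[List[float]]


def sample_moments(theta: float, data: Sequence[Obs], moment_fn: MomentFn) -> List[float]:
    """g_n(theta) = (1/n) * sum_i m_i(theta), equation (4.3)."""
    rows = [moment_fn(theta, obs) for obs in data]
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def gmm_criterion(theta: float, data: Sequence[Obs], moment_fn: MomentFn, W: Matrix) -> float:
    """L_n(theta) = -n * g_n(theta)' W g_n(theta) / 2, equation (4.2)."""
    g = sample_moments(theta, data, moment_fn)
    p = len(g)
    quad = sum(g[j] * W[j][k] * g[k] for j in range(p) for k in range(p))
    return -len(data) * quad / 2.0


def inverse_2x2(a: Matrix) -> Matrix:
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def optimal_weight(theta_tilde: float, data: Sequence[Obs], moment_fn: MomentFn) -> Matrix:
    """Inverse of (1/n) sum_i m_i m_i' at a preliminary estimate (two moments only)."""
    rows = [moment_fn(theta_tilde, obs) for obs in data]
    n = len(rows)
    S = [[sum(r[j] * r[k] for r in rows) / n for k in range(2)] for j in range(2)]
    return inverse_2x2(S)


def moments(theta: float, obs: Obs) -> List[float]:
    """Two noisy measurements of a common mean: m_i(theta) = (x_i - theta, y_i - theta)'."""
    x, y = obs
    return [x - theta, y - theta]


data = [(1.0, 1.2), (0.8, 1.1), (1.3, 0.9), (0.9, 1.0)]
identity = [[1.0, 0.0], [0.0, 1.0]]
L_at_1025 = gmm_criterion(1.025, data, moments, identity)  # -0.0025 up to rounding
W_opt = optimal_weight(1.025, data, moments)
```

With the identity weight the criterion is maximized at the average of the two sample means, here 1.025; swapping in `W_opt` gives the efficiently weighted version.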
Therefore, for symmetric loss functions $\rho_n$, the LTE is asymptotically equivalent to the GMM extremum estimator. Furthermore, the generalized information equality holds by construction; hence Quasi-posterior quantiles provide a computationally attractive method of "inverting" the objective function to obtain confidence intervals.

For smooth moment conditions, the smoothness conditions on $\nabla_\theta L_n(\theta)$ and $\nabla_{\theta\theta'}L_n(\theta)$ stated in Lemma 2 trivially imply condition iv in Proposition 1. More generally, Andrews (1994a), Pakes and Pollard (1989), and van der Vaart and Wellner (1996) provide many methods to verify that condition in a wide variety of method-of-moments models.

Example 3 Continued. Instrumental median regression falls outside of both the classical Bayesian approach and the classical smooth nonlinear IV approach of Amemiya (1977). Yet the conditions of Proposition 1 are satisfied under mild conditions:

i. $(Y_i, D_i, X_i, Z_i)$ is an iid data sequence, $m_i(\theta) = (\tau - 1(Y_i < q(D_i, X_i, \theta)))Z_i$, $E m_i(\theta_0) = 0$, and $\theta_0$ is identifiable;
ii. $\{m_i(\theta), \theta\in\Theta\}$ is a Donsker class,[14] and $E\sup_\theta|m_i(\theta)|^2 < \infty$;
iii. $G(\theta) = \nabla_\theta E m_i(\theta) = -E f_{Y|D,X,Z}(q(D, X, \theta))\,Z\,\nabla_\theta q(D, X, \theta)'$ is continuous;
iv. $J(\theta) = G(\theta)'W(\theta)G(\theta)$ is continuous in an open ball at $\theta_0$.

In this case the weighting matrix can be taken as
$$W_n(\theta) = \frac{1}{\tau(1-\tau)}\left[\frac1n\sum_{i=1}^n Z_iZ_i'\right]^{-1},$$
so that the information equality holds. Indeed, in this case $\Omega(\theta_0) = G(\theta_0)'W(\theta_0)G(\theta_0) = J(\theta_0)$, where $W(\theta_0) = \operatorname{plim}W_n(\theta_0) = [\operatorname{Var}m_i(\theta_0)]^{-1}$ and $\operatorname{Var}m_i(\theta_0) = \tau(1-\tau)E Z_iZ_i'$.
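The moment function and weighting matrix of this example can be sketched as follows. This is a minimal illustration under assumptions that are not part of the paper: a scalar regressor with $q(D, X, \theta) = \alpha + \beta D$, instruments $Z_i = (1, z_i)'$, and a tiny hand-made data set whose residual signs are arranged so that the sample moments vanish exactly at $(\alpha, \beta) = (0, 1)$. It shows that $W_n$ is the inverse of $\tau(1-\tau)\frac1n\sum_i Z_iZ_i'$ and that the GMM criterion (4.2) equals zero where $g_n(\theta) = 0$.

```python
from typing import List, Sequence, Tuple


def iv_quantile_moments(theta: Tuple[float, float], y: Sequence[float], d: Sequence[float],
                        z: Sequence[float], tau: float) -> List[float]:
    """g_n(theta) with m_i(theta) = (tau - 1(Y_i < alpha + beta * D_i)) * (1, z_i)'."""
    alpha, beta = theta
    n = len(y)
    s0 = s1 = 0.0
    for yi, di, zi in zip(y, d, z):
        score = tau - (1.0 if yi < alpha + beta * di else 0.0)
        s0 += score
        s1 += score * zi
    return [s0 / n, s1 / n]


def weighting_matrix(z: Sequence[float], tau: float) -> List[List[float]]:
    """W_n = [tau (1 - tau) * (1/n) sum_i Z_i Z_i']^{-1} for Z_i = (1, z_i)'."""
    n = len(z)
    s11, s12, s22 = 1.0, sum(z) / n, sum(v * v for v in z) / n
    scale = tau * (1.0 - tau)
    det = scale * scale * (s11 * s22 - s12 * s12)
    return [[scale * s22 / det, -scale * s12 / det],
            [-scale * s12 / det, scale * s11 / det]]


def iv_gmm_criterion(theta, y, d, z, tau) -> float:
    """L_n(theta) = -n g_n' W_n g_n / 2 with the weighting matrix above."""
    g = iv_quantile_moments(theta, y, d, z, tau)
    W = weighting_matrix(z, tau)
    quad = sum(g[j] * W[j][k] * g[k] for j in range(2) for k in range(2))
    return -len(y) * quad / 2.0


# Hypothetical data: D_i = z_i and Y_i = D_i + e_i.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
d = list(z)
e = [-0.5, 0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5]
y = [di + ei for di, ei in zip(d, e)]
tau = 0.5

L_true = iv_gmm_criterion((0.0, 1.0), y, d, z, tau)  # zero: moments vanish exactly
L_far = iv_gmm_criterion((3.0, 0.0), y, d, z, tau)   # strictly negative
```

In a real application $D_i$ would be endogenous and $Z_i$ a genuine instrument; the sketch only fixes the bookkeeping.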
When the model $q$ is linear and the dimension of $D$ is small, there are computable estimators in the literature.[15] In more general models, the extremum estimates are quite difficult to compute, and inference faces the well-known practical difficulty of estimating sparsity parameters. On the other hand, the Quasi-posterior median and quantiles are easy to compute and provide asymptotically valid confidence intervals. Note that the inference does not require the estimation of the density function. The simulation example given in Section 5 strongly supports this alternative approach.

[14] This is a very weak restriction on the function class, and is known to hold for all practically relevant functional forms; see van der Vaart (1999).
[15] These include, e.g., the "inverse" quantile regression approach in Chernozhukov and Hansen (2001), which is an extension of Koenker and Bassett (1978)'s quantile regression to endogenous settings.

Another important example that poses a computational challenge is the estimation problem of Berry et al. (1995). This example is similar in nature to the instrumental quantile regression, and the application of LT methods may be fruitful there.

4.2 Generalized Empirical Likelihood

A class of objective functions that are first-order equivalent to optimally weighted GMM (after recentering) can be formulated using the generalized empirical likelihood framework. A class of generalized empirical likelihood (GEL) functions is studied in Imbens et al. (1998), Kitamura and Stutzer (1997), and Newey and Smith (2001). For a set of moment equations $E m_i(\theta_0) = 0$ that satisfy the conditions of Section 4.1, define
$$L_n(\theta, \gamma) = \sum_{i=1}^n\left(s(m_i(\theta)'\gamma) - s(0)\right). \tag{4.8}$$
Then set
$$L_n(\theta) = L_n(\theta, \hat\gamma(\theta)), \tag{4.9}$$
where $\hat\gamma(\theta)$ solves
$$\hat\gamma(\theta) = \arg\inf_{\gamma\in\mathbb{R}^p}L_n(\theta, \gamma), \tag{4.10}$$
and $p = \dim(m_i)$. The scalar function $s(\cdot)$ is a strictly convex, finite, and three times differentiable function on an open interval of $\mathbb{R}$ containing 0, denoted $\mathcal{V}$, and is equal to $+\infty$ outside such an interval. $s(\cdot)$ is normalized so that both $\nabla s(0) = 1$ and $\nabla^2 s(0) = 1$.
The choices of the function $s(v) = -\ln(1 - v)$, $\exp(v)$, and $(1 + v)^2/2$ lead to the well-known empirical likelihood, exponential tilting, and continuous-updating GMM criterion functions. Simple and practical sufficient conditions for Lemma 2 are given in Qin and Lawless (1994), Imbens et al. (1998), Kitamura (1997), Kitamura and Stutzer (1997), Newey and Smith (2001), and Christoffersen et al. (1999), including for stationary weakly dependent data. Thus, the application of LTE's to these problems is immediate.

To illustrate a further use of LTE's, we state a set of simple conditions geared towards non-smooth microeconometric applications such as the instrumental quantile regression problem. These regularity conditions imply the first-order equivalence of the GEL and GMM objective functions. The Donskerness condition below is a weak assumption that is known to hold for reasonable linear and nonlinear functional forms encountered in practice, as discussed in van der Vaart (1999).

PROPOSITION 2 (Empirical Likelihood Problems) Suppose that Assumptions 1-2 hold, that condition (4.1) holds, and that, for some $\delta > 0$ and all $\theta\in\Theta$:

i. $m_i(\theta)$ is iid, and $dP[m_i(\theta) < x]/d\theta$ is continuous in $\theta$ uniformly in $x$;
ii. $\sup_{|\theta-\theta_0|<\delta}|m_i(\theta)| < K$ a.s., for some constant $K$;
iii. $\{m_i(\theta), \theta\in\Theta\}$ is a Donsker class;
iv. $V(\theta_0) = E[m_i(\theta_0)m_i(\theta_0)'] > 0$ and $G(\theta_0)'V(\theta_0)^{-1}\sqrt{n}\,g_n(\theta_0) \to_d N(0, \Omega(\theta_0))$.

Then Assumptions 3 and 4 hold, and thus the conclusions of Theorems 1-4 are true, with
$$\Delta_n(\theta_0)/\sqrt{n} = -G(\theta_0)'V(\theta_0)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n m_i(\theta_0), \qquad \Omega(\theta_0) = J(\theta_0) = G(\theta_0)'V(\theta_0)^{-1}G(\theta_0), \qquad G(\theta_0) = \nabla_\theta E m_i(\theta_0).$$
The information equality holds in this case.

Another (equivalent) way to proceed is through the dual formulation. Consider the following criterion function:
$$L_n(\theta) = \sup_{\pi_1,\ldots,\pi_n\in[0,1]}\sum_{i=1}^n h(\pi_i) \quad\text{subject to}\quad \sum_{i=1}^n m_i(\theta)\pi_i = 0, \quad \sum_{i=1}^n\pi_i = 1, \tag{4.11}$$
where $h(\cdot)$ is the Cressie-Read divergence criterion indexed by a scalar parameter $\gamma$; cf. Newey and Smith (2001).
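The primal construction (4.8)-(4.10) only requires a convex inner minimization over $\gamma$, which makes it easy to sketch. The block below handles a scalar moment ($p = 1$), solves (4.10) by Newton steps with step halving, and returns $L_n(\theta)$ together with the implied probabilities $\hat\pi_i \propto \nabla s(\hat\gamma m_i)$ of Qin and Lawless (1994) and Newey and Smith (2001). The scalar restriction, the step-halving rule, and the illustrative data are assumptions made for brevity, not part of the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Fn = Callable[[float], float]

# (s, s', s'') for the three leading choices; each satisfies s'(0) = s''(0) = 1.
S_FUNCTIONS: Dict[str, Tuple[Fn, Fn, Fn]] = {
    "empirical_likelihood": (
        lambda v: -math.log(1.0 - v) if v < 1.0 else math.inf,  # +inf outside the domain
        lambda v: 1.0 / (1.0 - v),
        lambda v: 1.0 / (1.0 - v) ** 2,
    ),
    "exponential_tilting": (math.exp, math.exp, math.exp),
    "continuous_updating": (
        lambda v: 0.5 * (1.0 + v) ** 2,
        lambda v: 1.0 + v,
        lambda v: 1.0,
    ),
}


def gel_profile(m: Sequence[float], kind: str, tol: float = 1e-12,
                max_iter: int = 100) -> Tuple[float, float, List[float]]:
    """Return (L_n(theta), gamma_hat, implied probabilities) for scalar moments m_i(theta)."""
    s, ds, d2s = S_FUNCTIONS[kind]

    def objective(gamma: float) -> float:  # L_n(theta, gamma), equation (4.8)
        return sum(s(mi * gamma) - s(0.0) for mi in m)

    gamma = 0.0
    for _ in range(max_iter):
        grad = sum(ds(mi * gamma) * mi for mi in m)
        hess = sum(d2s(mi * gamma) * mi * mi for mi in m)
        step = grad / hess  # Newton direction for the convex problem (4.10)
        t, current = 1.0, objective(gamma)
        while objective(gamma - t * step) > current:  # halve; also keeps EL finite
            t *= 0.5
        gamma -= t * step
        if abs(t * step) < tol:
            break
    slopes = [ds(mi * gamma) for mi in m]
    total = sum(slopes)
    return objective(gamma), gamma, [w / total for w in slopes]


x = [0.3, -1.2, 0.8, 1.5, -0.4, 0.9, 0.1, -0.6]  # illustrative data
m = [xi - 0.2 for xi in x]                       # m_i(theta) = x_i - theta at theta = 0.2
results = {kind: gel_profile(m, kind) for kind in S_FUNCTIONS}
```

For the continuous-updating choice the inner problem is quadratic, so $L_n(\theta) = -(\sum_i m_i)^2/(2\sum_i m_i^2)$ in closed form, which gives a direct check on the Newton iterations.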
The function $L_n(\theta)$ in (4.11) is the generalized empirical likelihood function for $\theta$ with the probabilities concentrated out. In fact, (4.11) corresponds to (4.9) by the argument given in Qin and Lawless (1994, pp. 303-304), so that Proposition 2 covers (4.11) as a special case up to renormalization. The empirical probabilities $\hat\pi_i(\theta)$ are obtained in (4.11) using the extremum method. Different values of $\gamma$ yield the three leading cases: the empirical likelihood case, where the $\hat\pi_i(\theta)$'s are obtained through the maximum likelihood method; the exponential tilting case, where the $\hat\pi_i(\theta)$'s are obtained through minimization of the Kullback-Leibler distance from the empirical distribution; and the continuous-updating case, where the $\hat\pi_i(\theta)$'s are obtained through minimization of the Euclidean distance from the empirical distribution.

Each approach generates the implied probabilities $\hat\pi_i(\theta)$ given $\theta$. Qin and Lawless (1994) and Newey and Smith (2001) provide the formulas
$$\hat\pi_i(\theta) = \frac{\nabla s(\hat\gamma(\theta)'m_i(\theta))}{\sum_{j=1}^n\nabla s(\hat\gamma(\theta)'m_j(\theta))}.$$
The Quasi-posterior for $\theta$ can be used for predictive inference. Suppose $m_i(\theta) = m(X_i, \theta)$ for some random vector $X_i$. Then the Quasi-posterior predictive probability is given by
$$\hat P\{X_i\in A\} = \int_\Theta\sum_{i=1}^n\hat\pi_i(\theta)1[X_i\in A]\,p_n(\theta)\,d\theta = \int_\Theta\hat h_n(\theta)\,p_n(\theta)\,d\theta, \qquad \hat h_n(\theta) = \sum_{i=1}^n\hat\pi_i(\theta)1[X_i\in A],$$
which can be computed by averaging $\hat h_n$ over the MCMC sequence, $(\hat h_n(\theta^{(1)}), \ldots, \hat h_n(\theta^{(B)}))$. It follows, similarly to the proof of Theorem 1 in Qin and Lawless (1994), that
$$\sqrt{n}\left(\hat P\{X_i\in A\} - P\{X_i\in A\}\right) \to_d N(0, \Omega_h),$$
where
$$\Omega_h = P\{X_i\in A\}(1 - P\{X_i\in A\}) - E[m_i(\theta_0)1\{X_i\in A\}]'\,U\,E[m_i(\theta_0)1\{X_i\in A\}], \qquad U = V(\theta_0)^{-1}\{I - G(\theta_0)J(\theta_0)^{-1}G(\theta_0)'V(\theta_0)^{-1}\}.$$

4.3 M-estimation

M-estimators, which include many linear and nonlinear regressions as special cases, typically maximize objective functions of the form
$$L_n(\theta) = \sum_{i=1}^n m_i(\theta).$$
i i=l rtii (9) need not be the log likelihood function of observation and i, may depend on preliminary non-parametric estimation. Assumptions 1-3 usually are satisfied by uniform laws of large numbers and by unique McFadden identification of the parameter; see for (1994). The next example Amemiya (1985) and Newey and proposition gives a simple set of sufficient conditions for Assumption Proposition 3 (M-problems) Suppose Assumptions 1-3 hold for the criterion function specified above with the following additional conditions: Uniformly in 9 in an open neighborhood of 9q, is stationary and ergodic, and for fn n (9) i. = there exists m,(9o) such that Erhi(9o) m.i{9) 4. rrii{9) Yl7=i "i,-(0)/n, — - mi{9 ) - m;(0o )'(0 \9-9 for each 9 i ) : , \9 and, for some 5 — 0q\ < 6>> is > 0, a Donsker class, \ E[m n {9) - mn (9 ) - m n (0o )'(O - 9 2 )] = o{\9 - 2 O | ), n ii. J(9) — — Vee'E[m,i{9)] Then Assumption 4 J{9o) = fi(# ), holds. is continuous and nonsingular in a Therefore, the conclusions of Theorems then the conclusions of Theorem 3 also hold. 20 ball at 9q. 1, 2, and 4 hold. If in addition The above conditions apply to many well known examples such as LAD, see for example van der Vaart and Wellner (1996). Therefore, for many nonlinear regressions, Quasi-posterior means, modes, and medians are asymptotically equivalent, and Quasi-posterior quantiles provide asymptotically confidence statements fails if the generalized information equality holds. Theorem 4 provides to hold, the method of Example 1 Continued. is not difficult Newey and Powell Assumption (1990) imply /, = fYi \Xi(qi), Qi — (1990) Furthermore, it are nonparametrically estimated, the conditions of ui* 4. Newey and Powell satisfied. Under iid sampling, the use of efficient weighting = RT^r) *' "> where in Powell (1984) or assumptions of Proposition 3 are when the weights to show that valid the information equality valid confidence intervals. 
Indeed, since
$$J(\theta_0) = \frac{1}{\tau(1-\tau)}E f_i^2\,\nabla q_i\nabla q_i', \quad\text{for } \nabla q_i = \partial q(X_i; \theta_0)/\partial\theta, \tag{4.13}$$
and
$$\frac{\Delta_n(\theta_0)}{\sqrt{n}} = \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{f_i}{\tau(1-\tau)}\left(\tau - 1(Y_i < q_i)\right)\nabla q_i \to_d N(0, \Omega(\theta_0)), \tag{4.14}$$
with
$$\Omega(\theta_0) = \frac{1}{\tau(1-\tau)}E f_i^2\,\nabla q_i\nabla q_i', \tag{4.15}$$
we have
$$\Omega(\theta_0) = J(\theta_0). \tag{4.16}$$
For this class of problems, the Quasi-posterior means and medians are asymptotically equivalent to the extremum estimators. The Quasi-posterior quantiles provide asymptotically valid confidence intervals when the efficient weights are used. However, estimation of the efficient weights requires preliminary estimation of the parameter $\theta_0$. When other weights are used, the method of Theorem 4 provides valid confidence intervals.

5 Computation and Simulation Examples

In this section we briefly discuss the MCMC method and present simulation examples.

5.1 Markov Chain Monte Carlo

The Quasi-posterior density is proportional to
$$p_n(\theta) \propto e^{L_n(\theta)}\pi(\theta).$$
In most cases we can easily compute $e^{L_n(\theta)}\pi(\theta)$. However, computation of the point estimates and confidence intervals typically requires evaluation of integrals like
$$\frac{\int_\Theta g(\theta)e^{L_n(\theta)}\pi(\theta)\,d\theta}{\int_\Theta e^{L_n(\theta)}\pi(\theta)\,d\theta} \tag{5.1}$$
for various functions $g$. For problems for which no analytic solution exists for (5.1), especially in high dimensions, MCMC methods provide powerful tools for evaluating integrals like the one above. See, for example, Chib (2001), Geweke and Keane (2001), and Robert and Casella (1999) for excellent treatments.

MCMC is a collection of methods that produce an ergodic Markov chain with the stationary distribution $p_n$.
Given a starting value $\theta^{(0)}$, a chain $(\theta^{(t)}, 1 \le t \le B)$ is generated using a transition kernel with stationary distribution $p_n$, which ensures the convergence of the marginal distribution of $\theta^{(B)}$ to $p_n$. For sufficiently large $B$, the MCMC methods produce a dependent sample $(\theta^{(0)}, \theta^{(1)}, \ldots, \theta^{(B)})$ whose empirical distribution approaches $p_n$. The ergodicity and construction of the chains usually imply that, as $B \to \infty$,
$$\frac1B\sum_{t=1}^B g(\theta^{(t)}) \to_p \int_\Theta g(\theta)\,p_n(\theta)\,d\theta.$$
We stress that this technique does not rely on the likelihood principle and can be fruitfully used for the computation of LTE's. (The appendix provides the formal details.) One of the most important MCMC methods is the Metropolis-Hastings algorithm.

Metropolis-Hastings (MH) algorithm with Quasi-posteriors. Given the Quasi-posterior density $p_n(\theta) \propto e^{L_n(\theta)}\pi(\theta)$, known up to a constant, and a prespecified conditional density $q(\theta'|\theta)$, generate $(\theta^{(0)}, \ldots, \theta^{(B)})$ in the following way:

1. Choose a starting value $\theta^{(0)}$.
2. Generate $\xi$ from $q(\xi|\theta^{(j)})$.
3. Update $\theta^{(j+1)}$ from $\theta^{(j)}$, for $j = 0, 1, \ldots, B-1$, using
$$\theta^{(j+1)} = \begin{cases}\xi & \text{with probability } \rho(\theta^{(j)}, \xi),\\ \theta^{(j)} & \text{with probability } 1 - \rho(\theta^{(j)}, \xi),\end{cases}$$
where
$$\rho(x, y) = \min\left(\frac{e^{L_n(y)}\pi(y)\,q(x|y)}{e^{L_n(x)}\pi(x)\,q(y|x)},\ 1\right).$$

Note that the most important quantity in the algorithm is the probability $\rho(x, y)$ of the move from an "old" point $x$ to the "new" point $y$, which depends on how much of an improvement in $e^{L_n(y)}\pi(y)$ a possible "new" value $y$ yields relative to $e^{L_n(x)}\pi(x)$ at the "old" value $x$. Thus, the generated chain of draws spends a relatively high proportion of time in the higher density regions and a lower proportion in the lower density regions. Because such proportions of time are balanced in the right way, the generated sequence of parameter draws has the requisite marginal distribution, which we then use for the computation of means, medians, and quantiles. (How closely the sequence travels near the mode is not relevant.) Another important choice is the transition kernel $q$, also called the instrumental density.
It turns out that a wide variety of kernels yield Markov chains that converge to the distribution of interest. One canonical implementation of the MH algorithm is to take
$$q(x|y) = f(|x - y|),$$
where $f$ is a density symmetric around 0, such as the Gaussian or the Cauchy density. This implies that the chain $(\theta^{(j)})$ is a random walk. This is the implementation we used in this paper. Chib (2001), Geweke and Keane (2001), and Robert and Casella (1999) can be consulted for important details concerning the implementation and convergence monitoring of the algorithm.

It is now worth repeating that the main motivation behind the LTE approach is its efficiency properties (stated in Sections 3 and 4) as well as its computational attractiveness. Indeed, the LTE approach is as efficient as the extremum approach, but may avoid the computational curse of dimensionality through the use of MCMC. LTE's are typically means or quantiles of a Quasi-posterior distribution, hence can be computed (estimated) at the parametric rate $1/\sqrt{B}$,[16] where $B$ is the number of MCMC draws (functional evaluations). Indeed, under canonical implementations, the MCMC chains are geometrically mixing, so the rates of convergence are the same as under independent sampling. In contrast, the extremum estimator (mode) is computed (estimated) by similar grid-based algorithms at the nonparametric rate $(1/B)^{p/(d+2p)}$, where $d$ is the parameter dimension and $p$ is the smoothness order of the objective function.

We have used an optimistic tone regarding the performance of MCMC. Indeed, in the problems we study, the objective functions have numerous local optima, but all exhibit a well-pronounced global optimum. These problems are important, and therefore the good performance of MCMC and the derived estimators is encouraging. However, various pathological cases can be constructed; see Robert and Casella (1999).
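The random-walk Metropolis-Hastings scheme described in this subsection is short enough to sketch in full. The block below is a minimal illustration, not the authors' implementation: it samples from the Quasi-posterior $p_n(\beta) \propto e^{L_n(\beta)}$ for a scaled-down Powell censored LAD problem (one regressor, $n = 200$, Gaussian errors). The design, the proposal scale 0.1, the burn-in of 1000 draws, and the box $[-10, 10]^2$ standing in for a compact $\Theta$ with a flat prior are all assumptions. It then reports the Quasi-posterior mean and median, the 5% and 95% Quasi-posterior quantiles, and $n$ times the variance of the draws, which is the estimate of the diagonal of $J_n^{-1}(\theta_0)$ used in Theorem 4.

```python
import math
import random
import statistics


def powell_criterion(beta, data, bound=10.0):
    """L_n(beta) = -sum_i |Y_i - max(0, b0 + b1 * x_i)|, restricted to |b_j| <= bound.

    Returning -inf outside the box mimics a flat prior on a compact parameter set.
    """
    if max(abs(b) for b in beta) > bound:
        return -math.inf
    b0, b1 = beta
    return -sum(abs(y - max(0.0, b0 + b1 * x)) for x, y in data)


def random_walk_mh(log_target, start, n_draws, scale, rng):
    """Random-walk MH with Gaussian increments, i.e. a symmetric kernel q(x|y) = f(|x - y|).

    With a symmetric kernel and a flat prior, rho(x, y) = min(1, exp(L_n(y) - L_n(x))).
    """
    theta = list(start)
    current = log_target(theta)
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = [t + rng.gauss(0.0, scale) for t in theta]
        candidate = log_target(proposal)
        # Accept if log(U) < L_n(proposal) - L_n(theta); U lies in (0, 1], so log(U) is finite.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        draws.append(list(theta))
    return draws, accepted / n_draws


# Simulated censored data (hypothetical design).
rng = random.Random(2002)
n, true_beta = 200, (0.5, 1.0)
data = []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    data.append((x, max(0.0, true_beta[0] + true_beta[1] * x + rng.gauss(0.0, 1.0))))

# Start the chain at the OLS estimate.
xs = [x for x, _ in data]
ys = [y for _, y in data]
xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
b1_ols = sum((x - xbar) * (y - ybar) for x, y in data) / sum((x - xbar) ** 2 for x in xs)
start = (ybar - b1_ols * xbar, b1_ols)

draws, acceptance_rate = random_walk_mh(lambda b: powell_criterion(b, data),
                                        start, n_draws=6000, scale=0.1, rng=rng)
kept = draws[1000:]  # discard burn-in

summary = []
for j in range(2):
    col = [draw[j] for draw in kept]
    cuts = statistics.quantiles(col, n=20)  # 5%, 10%, ..., 95% cut points
    summary.append({
        "mean": statistics.fmean(col),          # LTE under squared loss
        "median": statistics.median(col),       # LTE under absolute loss
        "ci90": (cuts[0], cuts[-1]),            # Quasi-posterior 5% and 95% quantiles
        "n_var": n * statistics.variance(col),  # n * Var(draws), Theorem 4 estimate of [J_n^{-1}]_jj
    })
```

Because this unweighted LAD criterion does not satisfy the generalized information equality, the interval in `ci90` is not covered by Theorem 3; the `n_var` entries are the ingredient that Theorem 4 combines with an estimate of $\Omega_n(\theta_0)$ instead.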
Objective functions may have multiple separated global modes (or approximate modes), in which case MCMC may require extended time for convergence. Another potential problem is that the initial draw $\theta^{(0)}$ may be very far in the tails of the posterior $p_n(\theta)$. In this case, MCMC may also take extended time to converge to the stationary distribution. In the problems we looked at, this may be avoided by choosing a starting value based on economic or other simple considerations. For example, in the censored median regression example, we may use starting values based on an initial Tobit regression. In the instrumental median regression, we may use the two-stage least squares estimates as the starting values.

[16] Note that the rates are used for informal motivation. We fix $d$ in the discussion, but the rate may typically increase linearly or polynomially in $d$ if $d$ is allowed to grow.

5.2 Monte Carlo Example 1: Censored Median Regression

As discussed in Section 2, a large literature has been devoted to the computation of Powell's censored median regression estimator. In the simulation example reported below, we find that in both small and large samples with a high degree of censoring, LT estimation may be a useful alternative to the popular iterated linear programming algorithm of Buchinsky (1991). The model we consider is
$$Y^* = \beta_0 + X'\beta + u, \qquad X \sim N(0, I_3), \qquad u = X_2^2\cdot N(0, 1), \qquad Y = \max(0, Y^*).$$
The true parameter $(\beta_0, \beta_1, \beta_2, \beta_3)$ is $(-6, 3, 3, 3)$, which produces about 40% censoring. The LTE is based on Powell's objective function
$$L_n(\beta) = -\sum_{i=1}^n\left|Y_i - \max(0, \beta_0 + X_i'\beta)\right|.$$
The initial draw of the MCMC series is taken to be the ordinary least squares estimate, and other details are summarized in Appendix B. Table 1 reports the results.
The first row for the iterated linear programming (ILP) algorithm reports the performance of the algorithm among the subset of simulations for which the algorithm does not converge to the local minimum at 0. The number in parentheses indicates the number of times that this algorithm converges to the local minimum of 0. The second row reports the results for all simulation runs, including those for which the ILP algorithm does not move away from the local minimum of 0. The LTE's (Quasi-posterior mean and median) never converge to the local minimum of 0, and they compare favorably to the ILP even when the local minima are excluded from the ILP results, as can be seen from Table 1. When the local minima are included in the ILP results, the LTE's do markedly better.

[Table 1 goes here.]

5.3 Monte Carlo Example 2: Instrumental Quantile Regression

We consider a simulation example similar to that in Koenker (1994). The model is
$$Y = \alpha_0 + D'\beta_0 + u, \qquad u = \sigma(D)\epsilon, \qquad D = \exp N(0, I_3), \qquad \epsilon \sim N(0, 1), \qquad \sigma(D) = \left(1 + \sum_{i=1}^3 D^{(i)}\right)/5.$$
The true parameter $(\alpha_0, \beta_0)$ equals 0, and we consider the instrumental moment conditions
$$g_n(\theta) = \frac1n\sum_{i=1}^n\left(\tfrac12 - 1(Y_i \le \alpha + D_i'\beta)\right)\begin{pmatrix}1\\ D_i\end{pmatrix}, \qquad W_n = \left[\frac14\cdot\frac1n\sum_{i=1}^n\begin{pmatrix}1\\ D_i\end{pmatrix}\begin{pmatrix}1\\ D_i\end{pmatrix}'\right]^{-1}.$$
In simulations, the initial draw of the MCMC series is taken to be the ordinary least squares estimate, and other details are summarized in Appendix B.

While instrumental median regression is designed specifically for endogenous or nonlinear models, we use a classical exogenous example in order to provide a contrast with a clear, undisputed benchmark: the standard linear quantile regression. The benchmark provides a reliable and high-quality estimation method for the exogenous model. In this regard, the performance of LT estimation and inference, reported in Table 2 and Table 3, is encouraging. Table 2 summarizes the performance of the LTE's and the standard quantile regression estimator.
Table 3 compares the performance of the LT confidence intervals to the standard inference method for quantile regression implemented in S-plus 4.0. The reported results are averaged across the slope parameters. The root mean square errors of the LTE's are no larger than those of quantile regression. Other criteria demonstrate similar performance of the two methods, as predicted by the asymptotic theory. The coverage of the Quasi-posterior quantile confidence intervals is also close to the nominal level of 90% in both small and large samples. It is also noteworthy that the intervals do not require nonparametric density estimation, as the standard method does.

[Tables 2 and 3 go here.]

6 An Illustrative Empirical Application

The following illustrates the use of LT estimation in practice. We consider the problem of forecasting the conditional quantiles, or value-at-risk (VaR), of the Occidental Petroleum (NYSE:OXY) security returns. The problem of forecasting quantiles of return distributions is not only important for economic analysis, but is fundamental to the real-life activities of financial firms. We offer an econometric analysis of a dynamic conditional quantile forecasting model, and show that the LTE approach provides a simple and effective method of estimating such models (despite the difficulties inherent in the estimation).

The dataset consists of 2527 daily observations (September 1986 - November 1998) on $Y_t$, the one-day returns of the Occidental Petroleum (NYSE:OXY) security, and $X_t$, a vector of returns and prices of other securities that affect the distribution of $Y_t$: a constant, the lagged one-day return of the Dow Jones Industrials (DJI), the lagged return on the spot price of oil (NCL, front-month contract on crude oil on NYMEX), and the lagged return $Y_{t-1}$.
The choice of variables follows a general principle in which the relevant conditioning information for estimating the value-at-risk of a stock return, $X_t$, may contain such variables as a market index of corresponding capitalization and type (for instance, the S&P500 returns for a large-cap value stock), the industry index, the price of a commodity or some other traded risk that the firm is exposed to, and lagged values of its stock price.

Two functional forms of predictive $\tau$-th quantile regressions were estimated:
$$\text{Linear Model:}\quad Q_{Y_{t+1}}(\tau|I_t, \theta(\tau)) = X_t'\theta(\tau),$$
$$\text{Dynamic Model:}\quad Q_{Y_{t+1}}(\tau|I_t, \theta(\tau), \varrho(\tau)) = X_t'\theta(\tau) + \varrho(\tau)\,Q_{Y_t}(\tau|I_{t-1}, \theta(\tau), \varrho(\tau)),$$
where $Q_{Y_{t+1}}(\tau|I_t, \theta(\tau))$ denotes the $\tau$-th quantile of $Y_{t+1}$ conditional on the information $I_t$ available at time $t$. In other words, $Q_{Y_{t+1}}(\tau|I_t, \theta(\tau))$ is the value-at-risk at the probability level $\tau$. The idea behind the dynamic models is to better incorporate the entire past information and better predict risk clustering, as introduced by Engle and Manganelli (2001). The nonlinear dynamic models described by Engle and Manganelli (2001) are appealing, but appear to be difficult to estimate using conventional extremum methods; see Engle and Manganelli (2001) for discussion. An extended empirical analysis of the linear model is given in Chernozhukov and Umantsev (2001).

The LT estimation and inference strategy is based on the Koenker and Bassett (1978) criterion function,
$$L_n(\theta, \varrho) = -\sum_{t=s}^T w_t(\tau)\,\rho_\tau\!\left(Y_t - Q_{Y_t}(\tau|I_{t-1}, \theta, \varrho)\right), \tag{6.1}$$
where $\rho_\tau(u) = (\tau - 1(u < 0))u$. This criterion function is similar to that described in Example 1, with the exception that there is no censoring. The starting value $s = 100$ initializes the recursive specification so that the imputed initial conditions (taken to be the marginal quantiles) have a numerically negligible effect.

In the first step, we constructed the LT estimates using the flat weights $w_t(\tau) = 1/(\tau(1-\tau))$ for each $t = s, \ldots, T$.
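The recursion in the dynamic model and the criterion (6.1) can be sketched as follows. This is an illustrative implementation under assumptions not taken from the paper: the recursion is started at the empirical $\tau$-quantile of the whole sample (one reading of "the marginal quantiles"), the regressors are passed as plain tuples, and the simulated returns in the usage lines are hypothetical. Setting `varrho = 0` recovers the linear model.

```python
import math
import random
from typing import List, Optional, Sequence


def check_loss(u: float, tau: float) -> float:
    """Koenker-Bassett check function rho_tau(u) = (tau - 1(u < 0)) * u."""
    return (tau - (1.0 if u < 0.0 else 0.0)) * u


def empirical_quantile(values: Sequence[float], tau: float) -> float:
    """Smallest order statistic whose empirical CDF is at least tau."""
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[min(k, len(ordered) - 1)]


def dynamic_quantiles(y: Sequence[float], x: Sequence[Sequence[float]],
                      theta: Sequence[float], varrho: float, tau: float) -> List[float]:
    """Q_t = x_{t-1}' theta + varrho * Q_{t-1}, with Q_0 set to the marginal tau-quantile."""
    q = [empirical_quantile(y, tau)]
    for t in range(1, len(y)):
        q.append(sum(a * b for a, b in zip(x[t - 1], theta)) + varrho * q[t - 1])
    return q


def lt_criterion(y, x, theta, varrho, tau, s, weights: Optional[Sequence[float]] = None) -> float:
    """L_n(theta, varrho) = -sum_{t=s}^{T} w_t rho_tau(Y_t - Q_t), criterion (6.1)."""
    q = dynamic_quantiles(y, x, theta, varrho, tau)
    if weights is None:  # flat first-step weights 1 / (tau (1 - tau))
        weights = [1.0 / (tau * (1.0 - tau))] * len(y)
    return -sum(weights[t] * check_loss(y[t] - q[t], tau) for t in range(s, len(y)))


# Usage with simulated returns (hypothetical): X_t = (1, Y_t) is used to predict Y_{t+1}.
rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(300)]
x = [(1.0, yt) for yt in y]
value = lt_criterion(y, x, theta=(-0.8, 0.0), varrho=0.2, tau=0.2, s=100)
```

In an LT implementation this function would play the role of $L_n$ inside the Metropolis-Hastings acceptance ratio, with $(\theta, \varrho)$ as the sampled parameter.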
The results of the first step are not presented here, but they are very similar to those reported below. Because the weights are not optimal, the information equality does not hold; hence Quasi-posterior quantiles are not valid for confidence intervals. However, the confidence intervals suggested in Theorem 4 lead to asymptotically valid inference. Under the assumption of correct dynamic specification, stationary sampling, and the conditions specified in Proposition 3, the LTE's are consistent and asymptotically normal:
$$\sqrt{n}\left((\hat\varrho(\tau), \hat\theta(\tau)')' - (\varrho(\tau), \theta(\tau)')'\right) \to_d N\!\left(0,\ J(\theta_0)^{-1}\Omega(\theta_0)J(\theta_0)^{-1}\right), \tag{6.2}$$
where, for $\nabla q_t(\tau) = \partial Q_{Y_t}(\tau|I_{t-1}, \varrho(\tau), \theta(\tau))/\partial(\varrho, \theta')'$ and $q_t(\tau) = Q_{Y_t}(\tau|I_{t-1}, \varrho(\tau), \theta(\tau))$,
$$J(\theta_0) = E f_{Y_t|I_{t-1}}(q_t(\tau))\,\nabla q_t(\tau)\nabla q_t(\tau)', \qquad \Omega(\theta_0) = \lim_{T\to\infty}E\,\frac{\Delta_n(\theta_0)\Delta_n(\theta_0)'}{n} = \tau(1-\tau)\,E\,\nabla q_t(\tau)\nabla q_t(\tau)',$$
and
$$\frac{\Delta_n(\theta_0)}{\sqrt{n}} = \frac{1}{\sqrt{n}}\sum_{t=s}^T\left(\tau - 1(Y_t < q_t(\tau))\right)\nabla q_t(\tau).$$
If the model is not correctly specified, then, for example, the Newey and West (1987) estimator provides a consistent and robust procedure for estimation of the limit variance $\Omega(\theta_0)$. The estimation of the matrix $J(\theta_0)^{-1}$ can be done through the use of nonparametric methods, as in Powell (1984). Alternatively, as suggested in Theorem 4, we can use the variance-covariance matrix of the MCMC sequence of parameter draws multiplied by $n = T - s$ as a consistent estimate of $J(\theta_0)^{-1}$. Plugging the estimates into the variance expression (6.2), we obtain standard errors and confidence intervals that are qualitatively similar to those reported in Figures 4-7.

In order to illustrate the use of Quasi-posterior quantiles (Theorem 3) and to improve estimation efficiency, we also carried out the second-step estimation using the Koenker-Bassett criterion function (6.1) with the weights
$$\hat w_t(\tau) = \frac{1}{\tau(1-\tau)}\cdot\frac{h}{Q_{Y_t}(\tau + h/2|I_{t-1}, \hat\theta(\tau), \hat\varrho(\tau)) - Q_{Y_t}(\tau - h/2|I_{t-1}, \hat\theta(\tau), \hat\varrho(\tau))},$$
where $h \propto Cn^{-1/3}$ and $C > 0$ is chosen using the rule given in Koenker (1994). Under the assumption of correct dynamic specification, these weights imply the generalized information equality, which validates Quasi-posterior quantiles for inference purposes, as in (4.12)-(4.16).
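The second-step weights are a difference-quotient estimate of the conditional density at the fitted quantile, scaled by $1/(\tau(1-\tau))$. A minimal sketch, assuming the fitted quantile paths at $\tau \pm h/2$ are already available as lists; how those paths are produced and the value of the constant $C$ are left open here (the paper chooses $C$ by the rule in Koenker (1994)), and raising an error on crossing quantiles is a choice of this sketch. The usage lines check the weights against a Gaussian case where the density is known, with $n = 2427$ taken as roughly the effective sample size $T - s$ of the application.

```python
from statistics import NormalDist
from typing import List, Sequence


def bandwidth(n: int, C: float) -> float:
    """h = C * n^(-1/3); the constant C is user-supplied (hypothetical here)."""
    return C * n ** (-1.0 / 3.0)


def second_step_weights(q_upper: Sequence[float], q_lower: Sequence[float],
                        tau: float, h: float) -> List[float]:
    """w_t = [h / (Q_t(tau + h/2) - Q_t(tau - h/2))] / (tau (1 - tau))."""
    scale = 1.0 / (tau * (1.0 - tau))
    weights = []
    for hi, lo in zip(q_upper, q_lower):
        spread = hi - lo
        if spread <= 0.0:
            raise ValueError("fitted quantiles cross; the density estimate is undefined")
        weights.append(scale * h / spread)
    return weights


# Known-answer check: Y_t | I_{t-1} ~ N(0, sigma_t^2), so the difference quotient should
# approximate the normal density at the tau-quantile.
tau, h = 0.2, bandwidth(2427, C=0.5)
sigmas = [0.8, 1.0, 1.5]
upper = [NormalDist(0.0, s).inv_cdf(tau + h / 2) for s in sigmas]
lower = [NormalDist(0.0, s).inv_cdf(tau - h / 2) for s in sigmas]
w_hat = second_step_weights(upper, lower, tau, h)
w_exact = [NormalDist(0.0, s).pdf(NormalDist(0.0, s).inv_cdf(tau)) / (tau * (1 - tau))
           for s in sigmas]
```

The approximation error of the difference quotient is of order $h^2$, so `w_hat` and `w_exact` agree closely for bandwidths of this size.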
The computed for each coefficient 0j{t) and the 90%-confidence imply the generalized information equality, which intervals, = (j .05-th, .5-th, 1, ...,4) and £>(t), following analysis and then used to form the point estimates which are reported Figures 4-7 for r = .2, .4, ..., .8. VaR functions of the dynamic model linear models, respectively, plotted in the time-probability level coordinates, (i,p), (p We quantile index.) report VaR many for represents a more complete depiction The dynamics risk tends to model is The VaR is the surface formed by varying r of conditional risk. usual its level. The risk surface generated difference between the linear and the recursive by the recursive model is much smoother and is persistent. Furthermore, this difference is statistically significant, as Figure 7 shows. Focusing on the recursive model, slope coefficients coefficient ^(0> let us examine the economic and statistical interpretation of the ^3(')> #4(')> ?(')> plotted in on the lagged oil Figures 4-7. price return, #2(-)> tails of the conditional return distribution. coefficient Clearly, the whole t typically depicted in Figures 2 and 3 unambiguously indicate certain dates on which market be much higher than also striking. much more — The conventional VaR reporting values of r. involves the probability levels at a given r. The The and .95-th Quasi-posterior quantiles are Figures 2 and 3 present the estimated surfaces of the conditional and t(i-t)' chosen using the rule given in Koenker (1994). Under the assumption specification, these weights validates Quasi-posterior quantiles for inference purposes, as in (4.12)-(4.16). is ' on the lagged DJI return, is and right negative in the middle part. The insignificantly positive in the left It is insignificantly 03(-), in contrast, is significantly positive for all values of r. We also notice a sharp increase in the middle range. 
Thus, in addition to the strong positive relation between the individual stock return and the market return (DJI) (dictated by the fact that $\theta_3(\cdot) > 0$ on $(0.2, 0.8)$), there is also additional sensitivity of the median of the security return to the market movements. The coefficient on the own lagged return, $\theta_4(\cdot)$, on the other hand, is significantly negative, except for values of $\tau$ close to 0. This may be interpreted as a reversion effect in the central part of the distribution. However, the lagged return does not appear to significantly shift the quantile function in the tails. Thus, the lagged return is more important for the determination of intermediate risks.

Most importantly, the coefficient $\varrho(\cdot)$ on the lagged VaR is significantly negative in the low and high quantiles, but is insignificant in the middle range. The significance of $\varrho(\cdot)$ is strong evidence in favor of the recursive specification. The magnitude and sign of $\varrho(\cdot)$ indicate both the reversion and significant risk clustering effects in the tails of the distribution (see Figure 7). As expected, there is zero effect over the middle range, which is consistent with the random walk properties of the stock price. Thus, the dynamic effect of lagged VaR is much more important for the tails of the quantile function, that is, for risk management purposes.

7 Conclusion

In this paper, we study the Laplace-type Estimators or Quasi-Bayesian Estimators that we define using common statistical, non-likelihood based criterion functions. Under mild regularity conditions, these estimators are $\sqrt n$-consistent and asymptotically normal, and Quasi-posterior quantiles provide asymptotically valid confidence intervals. A simulation study and an empirical example illustrate the properties of the proposed estimation and inference methods. These results show that in many important cases the Quasi-Bayesian estimators provide useful alternatives to the usual extremum estimators.
In ongoing work, we are extending the results to models in which the $\sqrt n$-convergence rate and asymptotic normality do not hold, including the maximum score problem.

Acknowledgments: We thank the editor for the invitation of this paper to the Journal of Econometrics and an anonymous referee for prompt and highest quality feedback. We thank Gary Chamberlain, Xiahong Chen, Shawn Cole, Ivan Fernandez, Ronald Gallant, Jinyong Hahn, Bruce Hansen, Chris Hansen, Jerry Hausman, James Heckman, Bo Honore, Guido Imbens, Roger Koenker, Shakeeb Khan, Sergei Morozov, Whitney Newey, Ziad Nejmeldeen, Stavros Panageas, Chris Sims, George Tauchen, and seminar participants at Brown University, Duke-UNC Triangle Seminar, MIT, MIT-Harvard, University of Chicago, Princeton University, University of Wisconsin at Madison, University of Michigan, Michigan State University, Texas A&M University, the Winter meeting of the Econometric Society, and the 2002 European Econometric Society Meeting in Venice for insightful comments. We gratefully acknowledge the financial support provided by the U.S. National Science Foundation grants SES-0214047 and SES-0079495.

A Appendix of Proofs

A.1 Proof of Theorem 1

It suffices to show
$$\int_{H_n} |h|^\alpha\, |p_n^*(h) - p_\infty^*(h)|\, dh \to_p 0 \tag{A.1}$$
for all $\alpha \ge 0$. Our arguments follow those in Bickel and Yahav (1969) and Ibragimov and Has'minskii (1981), as presented by Lehmann and Casella (1998). As indicated in the text, the main differences are in part 2, and are due to (i) the non-likelihood setting, (ii) the use of Huber-like conditions in Assumption 4 to handle discontinuous criterion functions, and (iii) allowing more general loss functions, which are needed for the construction of confidence intervals. Throughout the proof, the range of integration for $h$ is implicitly understood to be $H_n$. For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly.

Part 1.
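Define
$$h = \sqrt n(\theta - T_n), \qquad T_n = \theta_0 + \frac1n J(\theta_0)^{-1}\Delta_n(\theta_0), \qquad U_n = \frac{1}{\sqrt n}J(\theta_0)^{-1}\Delta_n(\theta_0), \tag{A.2}$$
then
$$p_n^*(h) = \frac{1}{\sqrt n^{\,d}}\, p_n\!\left(\frac{h}{\sqrt n} + \theta_0 + \frac{U_n}{\sqrt n}\right) = \frac{\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp\!\left(L_n\!\left(\frac{h}{\sqrt n} + T_n\right)\right)}{\int_{H_n}\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp\!\left(L_n\!\left(\frac{h}{\sqrt n} + T_n\right)\right)dh} = \frac{\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp(\omega(h))}{C_n},$$
where
$$\omega(h) = L_n\!\left(T_n + \frac{h}{\sqrt n}\right) - L_n(\theta_0) - \frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0) \tag{A.3}$$
and $C_n = \int_{H_n}\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp(\omega(h))\,dh$.

Part 2 shows that for each $\alpha \ge 0$,
$$A_{1n} \equiv \int_{H_n}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0)\right|dh \to_p 0. \tag{A.4}$$
Given (A.4), taking $\alpha = 0$ we have
$$C_n \to_p \int_{\mathbb R^d} e^{-\frac12 h'J(\theta_0)h}\,\pi(\theta_0)\,dh = \pi(\theta_0)(2\pi)^{d/2}|\det J(\theta_0)|^{-1/2}, \tag{A.5}$$
hence $C_n = O_p(1)$. Next, note that the left side of (A.1) equals
$$\int_{H_n}|h|^\alpha\,|p_n^*(h) - p_\infty^*(h)|\,dh = A_n C_n^{-1},$$
where
$$A_n = \int_{H_n}|h|^\alpha\left|e^{\omega(h)}\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - (2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}\exp\!\left(-\tfrac12 h'J(\theta_0)h\right)C_n\right|dh.$$
Using (A.5), to show (A.1) it suffices to show that $A_n \to_p 0$. But $A_n \le A_{1n} + A_{2n}$, where
$$A_{2n} = \int_{H_n}|h|^\alpha\left|C_n(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}\exp\!\left(-\tfrac12 h'J(\theta_0)h\right) - \pi(\theta_0)\exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\right|dh.$$
Then by (A.4), $A_{1n} \to_p 0$, and by (A.5),
$$A_{2n} = \left|C_n(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2} - \pi(\theta_0)\right|\int_{H_n}|h|^\alpha\exp\!\left(-\tfrac12 h'J(\theta_0)h\right)dh \to_p 0.$$

Part 2. It remains only to show (A.4). Given Assumption 4 and the definitions in (A.2) and (A.3), write
$$\omega(h) = -\tfrac12 h'J(\theta_0)h + R_n\!\left(T_n + \frac{h}{\sqrt n}\right).$$
Split the integral $A_{1n}$ in (A.4) over three separate areas:
• Area (i): $|h| \le M$,
• Area (ii): $M < |h| \le \delta\sqrt n$,
• Area (iii): $|h| > \delta\sqrt n$.
Each of these areas is implicitly understood to intersect with the range of integration for $h$, which is $H_n$.

Area (i): We will show that for each $0 < M < \infty$ and each $\varepsilon > 0$,
$$\liminf_n P^*\left\{\int_{|h|\le M}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0)\right|dh < \varepsilon\right\} \ge 1 - \varepsilon. \tag{A.6}$$
This is proved by showing that
$$\sup_{|h|\le M}\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0)\right| \to_p 0. \tag{A.7}$$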
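Using the definition of $\omega(h)$, (A.7) follows from:
$$\text{(a)}\ \sup_{|h|\le M}\left|\pi\!\left(\frac{h}{\sqrt n} + T_n\right) - \pi(\theta_0)\right| \to_p 0, \qquad \text{(b)}\ \sup_{|h|\le M}\left|R_n\!\left(\frac{h}{\sqrt n} + T_n\right)\right| \to_p 0, \tag{A.8}$$
where (a) follows from the continuity of $\pi(\cdot)$ and because, by Assumption 4.ii-4.iii, $J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n = O_p(1)$. Given (A.8), (b) follows from Assumption 4.iv, since
$$\sup_{|h|\le M}\left|\frac{h}{\sqrt n} + T_n - \theta_0\right| = O_p(1/\sqrt n).$$

Area (ii): We show that for each $\varepsilon > 0$ there exist large $M$ and small $\delta > 0$ such that
$$\liminf_n P^*\left\{\int_{M < |h| \le \delta\sqrt n}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0)\right|dh < \varepsilon\right\} \ge 1 - \varepsilon. \tag{A.9}$$
Since the integral of the second term is finite and can be made arbitrarily small by setting $M$ large, it suffices to show that for each $\varepsilon > 0$ there exist large $M$ and small $\delta > 0$ such that
$$\liminf_n P^*\left\{\int_{M < |h| \le \delta\sqrt n}|h|^\alpha\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right)dh < \varepsilon\right\} \ge 1 - \varepsilon. \tag{A.10}$$
In order to do so, it suffices to show that for sufficiently large $M$, as $n \to \infty$, with probability at least $1 - \varepsilon$,
$$\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) \le C\exp\!\left(-\tfrac14 h'J(\theta_0)h\right) \quad \text{for all } M < |h| \le \delta\sqrt n. \tag{A.11}$$
By assumption $\pi(\cdot) \le K$, so we can drop it from consideration. By the definition of $\omega(h)$,
$$\exp(\omega(h)) \le \exp\!\left(-\tfrac12 h'J(\theta_0)h + R_n\!\left(T_n + \frac{h}{\sqrt n}\right)\right).$$
Since $|T_n - \theta_0| = o_p(1)$, for any $\delta > 0$, wp $\to 1$, $|T_n + h/\sqrt n - \theta_0| < 2\delta$ for all $|h| \le \delta\sqrt n$. Thus, by Assumption 4.iv(a), there exist some small $\delta > 0$ and large $M$ such that
$$\liminf_n P^*\left\{\sup_{M < |h| \le \delta\sqrt n}\frac{\left|R_n(T_n + h/\sqrt n)\right|}{\left|h + J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n\right|^2} \le \tfrac14\,\mathrm{mineig}(J(\theta_0))\right\} \ge 1 - \varepsilon.$$
Since $|J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n|^2 = O_p(1)$, for some $C > 0$,
$$\liminf_n P^*\left\{e^{\omega(h)} \le C\exp\!\left(-\tfrac12 h'J(\theta_0)h + \tfrac14\,\mathrm{mineig}(J(\theta_0))|h|^2\right) \text{ for all } M < |h| \le \delta\sqrt n\right\} \ge 1 - \varepsilon. \tag{A.12}$$
(A.12) implies (A.11), which in turn implies (A.9).

Area (iii): We will show that for each $\varepsilon > 0$ and each $\delta > 0$,
$$\liminf_n P^*\left\{\int_{|h| > \delta\sqrt n}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0)\right|dh < \varepsilon\right\} \ge 1 - \varepsilon. \tag{A.13}$$
The integral of the second term clearly goes to 0 as $n \to \infty$. Therefore, we only need to show that
$$\int_{|h| > \delta\sqrt n}|h|^\alpha\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right)dh \to_p 0.$$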
Recalling the definition of $h$, this term is bounded by
$$\sqrt n^{\,\alpha + d}\int_{|\theta - T_n| > \delta}|\theta - T_n|^\alpha\,\pi(\theta)\exp\!\left(L_n(\theta) - L_n(\theta_0) - \frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0)\right)d\theta.$$
Since $T_n - \theta_0 \to_p 0$, wp $\to 1$ this is bounded by
$$K_n\, C\, \sqrt n^{\,\alpha + d}\int_{|\theta - \theta_0| \ge \delta/2}(1 + |\theta|^\alpha)\,\pi(\theta)\exp\big(L_n(\theta) - L_n(\theta_0)\big)\,d\theta,$$
where $C$ is a constant and $K_n = \exp\!\left(-\frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0)\right) = O_p(1)$. By Assumption 3, there exists $\varepsilon > 0$ such that
$$\liminf_n P^*\left\{\sup_{|\theta - \theta_0| \ge \delta/2}e^{L_n(\theta) - L_n(\theta_0)} \le e^{-n\varepsilon}\right\} = 1.$$
Thus, wp $\to 1$ the entire term is bounded by
$$K_n\, C\, \sqrt n^{\,\alpha + d}\, e^{-n\varepsilon}\int_\Theta(1 + |\theta|^\alpha)\,\pi(\theta)\,d\theta = o_p(1). \tag{A.14}$$
Here observe that compactness is only used to ensure that
$$\int_\Theta|\theta|^\alpha\,\pi(\theta)\,d\theta < \infty. \tag{A.15}$$
Hence, by replacing compactness with the condition (A.15), the conclusion (A.14) is not affected for the given $\alpha$. The entire proof is now completed by combining (A.6), (A.9), and (A.13).

A.2 Proof of Theorem 2

For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly. Recall that $h = \sqrt n(\theta - \theta_0) - J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n$. Define $U_n = J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n$. Consider the objective function
$$Q_n(z) = \int_{H_n}\rho(z - h - U_n)\,p_n^*(h)\,dh,$$
which is minimized at $\sqrt n(\hat\theta - \theta_0)$. Also define
$$Q_\infty(z) = \int_{\mathbb R^d}\rho(z - h - U_n)\,p_\infty^*(h)\,dh,$$
which is minimized at a random vector denoted $Z_n$. Define
$$\xi = \arg\inf_{z \in \mathbb R^d}\left\{\int_{\mathbb R^d}\rho(z - h)\,p_\infty^*(h)\,dh\right\}.$$
Note that the solution is unique and finite by Assumption 2, parts (ii) and (iii), on the loss function $\rho$. When $\rho$ is symmetric, $\xi = 0$ by Anderson's lemma. Therefore, $\arg\inf_{z \in \mathbb R^d}Q_\infty(z)$ equals $Z_n = \xi + U_n = O_p(1)$.

Next, for any fixed $z$, we have, by Assumption 2.ii ($\rho(h) \le 1 + |h|^p$) and by convexity ($|a + b|^p \le 2^{p-1}|a|^p + 2^{p-1}|b|^p$ for $p \ge 1$):
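$$\begin{aligned}
|Q_n(z) - Q_\infty(z)| &\le \int_{H_n}\big(1 + |z - h - U_n|^p\big)\,|p_n^*(h) - p_\infty^*(h)|\,dh + \int_{H_n^c}\big(1 + |z - h - U_n|^p\big)\,p_\infty^*(h)\,dh \\
&\le \int_{H_n}\big(1 + 2^{p-1}|h|^p + 2^{p-1}|z - U_n|^p\big)\,|p_n^*(h) - p_\infty^*(h)|\,dh + \int_{H_n^c}\big(1 + 2^{p-1}|h|^p + 2^{p-1}|z - U_n|^p\big)\,p_\infty^*(h)\,dh \\
&= \int_{H_n}\big(1 + 2^{p-1}|h|^p + O_p(1)\big)\,|p_n^*(h) - p_\infty^*(h)|\,dh + \int_{H_n^c}\big(1 + 2^{p-1}|h|^p + O_p(1)\big)\,p_\infty^*(h)\,dh = o_p(1),
\end{aligned}$$
where the $o_p(1)$ conclusion follows by Theorem 1 and the exponentially small tails of the normal density (so the integral over $H_n^c$ converges to zero).

Now note that $Q_n(z)$ and $Q_\infty(z)$ are convex and finite, and $Z_n = \arg\inf_{z \in \mathbb R^d}Q_\infty(z) = O_p(1)$. By the convexity lemma of Pollard (1991), pointwise convergence entails uniform convergence over compact sets $K$:
$$\sup_{z \in K}|Q_n(z) - Q_\infty(z)| \to_p 0.$$
Since $Z_n = O_p(1)$, uniform convergence and convexity arguments like those in Jureckova (1977) imply that $\sqrt n(\hat\theta - \theta_0) - Z_n \to_p 0$, as shown below.

Proof of $Z_n - \sqrt n(\hat\theta - \theta_0) = o_p(1)$. The proof follows by extending slightly the convexity argument of Jureckova (1977) and Pollard (1991) to the present context. Consider a ball $B_\delta(Z_n)$ with radius $\delta > 0$, centered at $Z_n$, and let $z = Z_n + dv$, where $v$ is a unit direction vector such that $|v| = 1$ and $d > \delta$. Because $Z_n = O_p(1)$, for any $\delta > 0$ and $\varepsilon > 0$, there exists $K > 0$ such that
$$\liminf_n P^*\{E_n\} \ge 1 - \varepsilon, \qquad E_n = \{B_\delta(Z_n) \subset B_K(0)\}.$$
By convexity, for any $z = Z_n + dv$ constructed so, it follows that
$$\frac{\delta}{d}\big(Q_n(z) - Q_n(Z_n)\big) \ge Q_n(z^*) - Q_n(Z_n),$$
where $z^*$ is a point of the boundary of $B_\delta(Z_n)$ on the line connecting $z$ and $Z_n$. By the uniform convergence of $Q_n(z)$ to $Q_\infty(z)$ over any compact set $B_K(0)$, whenever $E_n$ occurs,
$$\frac{\delta}{d}\big(Q_n(z) - Q_n(Z_n)\big) \ge Q_n(z^*) - Q_n(Z_n) \ge Q_\infty(z^*) - Q_\infty(Z_n) + o_p(1) \ge V_n + o_p(1), \tag{A.16}$$
where $V_n$ is a uniformly positive variable in $E_n$, because $Z_n$ is the unique optimizer of $Q_\infty$. That is, there exists $\eta > 0$ such that $\liminf_n P(V_n > \eta) \ge 1 - \varepsilon$. Hence we have, with probability at least $1 - 3\varepsilon$ for large $n$,
$$\frac{\delta}{d}\big(Q_n(z) - Q_n(Z_n)\big) > \eta > 0.$$
Thus, $\sqrt n(\hat\theta - \theta_0)$ eventually belongs to the complement of $B_\delta(Z_n)$ with probability at most $3\varepsilon$. Since we can set $\varepsilon$ as small as we like by picking (a) sufficiently large $K$, (b) sufficiently large $n$, and (c) sufficiently small $\eta > 0$, it follows that
$$\limsup_n P^*\{|Z_n - \sqrt n(\hat\theta - \theta_0)| > \delta\} = 0.$$
Since this is true for any $\delta > 0$, it follows that $Z_n - \sqrt n(\hat\theta - \theta_0) = o_p(1)$.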
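Theorem 2 characterizes the LTE as the minimizer of a quasi-posterior risk. With MCMC draws in hand, the integral over the quasi-posterior is replaced by an average over the draws (compare Lemma 4 in Appendix B). The sketch below is an illustration under assumptions, not the paper's implementation: the helper name is hypothetical, the parameter is scalar, standard normal numbers stand in for MCMC draws, and the minimization is restricted to the draws themselves to keep the example dependency-free. Under absolute loss the result is the sample median of the draws; under squared loss it is the draw closest to their mean.

```python
import numpy as np


def quasi_bayes_estimate(draws, loss):
    """Minimize the simulated quasi-posterior risk z -> mean_j loss(z - draws[j]).

    Hypothetical helper for a scalar parameter. The candidate set is the draws
    themselves: exact for absolute loss with an odd number of draws, and the
    draw closest to the sample mean for squared loss.
    """
    draws = np.asarray(draws, dtype=float)
    risks = np.array([np.mean(loss(z - draws)) for z in draws])
    return draws[np.argmin(risks)]


rng = np.random.default_rng(0)
draws = rng.standard_normal(1001)               # stand-in for MCMC draws
lte_abs = quasi_bayes_estimate(draws, np.abs)   # quasi-posterior median
lte_sq = quasi_bayes_estimate(draws, np.square) # approximately the mean
```

Swapping in an asymmetric check loss in place of `np.abs` would instead return a quasi-posterior quantile, which is the object used for the confidence intervals of Theorem 3.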
A.3 Proof of Theorem 3

For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly. We define
$$F_{g,n}(x) = \int_{\theta \in \Theta:\ g(\theta) \le x} p_n(\theta)\,d\theta.$$
Evaluate it at $x = g(\theta_0) + s/\sqrt n$ and change the variable of integration:
$$H_{g,n}(s) = F_{g,n}\!\left(g(\theta_0) + \frac{s}{\sqrt n}\right) = \int_{h \in H_n:\ g(\theta_0 + h/\sqrt n + U_n/\sqrt n) \le g(\theta_0) + s/\sqrt n} p_n^*(h)\,dh.$$
Define also
$$\bar H_{g,n}(s) = \int_{h \in \mathbb R^d:\ \sqrt n\left(g(\theta_0 + h/\sqrt n + U_n/\sqrt n) - g(\theta_0)\right) \le s} p_\infty^*(h)\,dh$$
and
$$H_{g,\infty}(s) = \int_{h \in \mathbb R^d:\ \nabla_\theta g(\theta_0)'(h + U_n) \le s} p_\infty^*(h)\,dh.$$
By Theorem 1 and the definition of the total variation of moments norm,
$$\sup_s|H_{g,n}(s) - \bar H_{g,n}(s)| \to_p 0,$$
where the sup is taken over the support of $H_{g,n}(s)$. By the uniform continuity of the integral of the normal density with respect to the boundary of integration,
$$\sup_s|\bar H_{g,n}(s) - H_{g,\infty}(s)| \to_p 0,$$
which implies
$$\sup_s|H_{g,n}(s) - H_{g,\infty}(s)| \to_p 0,$$
where the sup is taken over the support of $H_{g,n}(s)$. The convergence of distribution functions implies the convergence of quantiles at continuity points of distribution functions, see e.g. Billingsley (1994), so
$$H_{g,n}^{-1}(\alpha) - H_{g,\infty}^{-1}(\alpha) \to_p 0.$$
Next observe
$$H_{g,\infty}(s) = P\left\{\nabla_\theta g(\theta_0)'\big(N(0, J^{-1}(\theta_0)) + U_n\big) \le s \,\middle|\, U_n\right\}, \qquad H_{g,\infty}^{-1}(\alpha) = \nabla_\theta g(\theta_0)'U_n + q_\alpha\sqrt{\nabla_\theta g(\theta_0)'J^{-1}(\theta_0)\nabla_\theta g(\theta_0)},$$
where $q_\alpha$ is the $\alpha$-quantile of $N(0, 1)$. Recalling that we defined $c_{g,n}(\alpha) = F_{g,n}^{-1}(\alpha)$, by quantile equivariance with respect to monotone transformations,
$$H_{g,n}^{-1}(\alpha) = \sqrt n\big(c_{g,n}(\alpha) - g(\theta_0)\big),$$
so that
$$\sqrt n\big(c_{g,n}(\alpha) - g(\theta_0)\big) = \nabla_\theta g(\theta_0)'U_n + q_\alpha\sqrt{\nabla_\theta g(\theta_0)'J^{-1}(\theta_0)\nabla_\theta g(\theta_0)} + o_p(1).$$
The rest of the result follows by the $\Delta$-method.

A.4 Proof of Theorem 4

In view of Assumption 4, it suffices to show that
$$\hat J_n^{-1}(\theta_0) - J_n^{-1}(\theta_0) \to_p 0, \tag{A.17}$$
and then conclude by the $\Delta$-method. Recall that $h = \sqrt n(\theta - \theta_0) - J_n^{-1}(\theta_0)\Delta_n(\theta_0)/\sqrt n$, and the localized Quasi-posterior density for $h$ is $p_n^*(h) = \frac{1}{\sqrt n^{\,d}}\,p_n(h/\sqrt n + \theta_0 + U_n/\sqrt n)$.
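Note also that
$$\hat J_n^{-1}(\theta_0) = \int_\Theta n(\theta - \hat\theta)(\theta - \hat\theta)'p_n(\theta)\,d\theta = \int_{H_n}\big(h - \sqrt n(\hat\theta - \theta_0) + U_n\big)\big(h - \sqrt n(\hat\theta - \theta_0) + U_n\big)'p_n^*(h)\,dh,$$
and
$$J^{-1}(\theta_0) = \int_{\mathbb R^d} hh'\,p_\infty^*(h)\,dh.$$
Denote $h = (h_1, \ldots, h_d)$ and $\hat\tau_n = (\hat\tau_{n1}, \ldots, \hat\tau_{nd}) = \sqrt n(\hat\theta - \theta_0) - U_n$, which is $o_p(1)$ by Theorem 2. We have, for all $i, j \le d$:
(a) $\int_{H_n}h_ih_j\big(p_n^*(h) - p_\infty^*(h)\big)dh = o_p(1)$ by Theorem 1,
(b) $\int_{H_n^c}h_ih_j\,p_\infty^*(h)\,dh = o_p(1)$ by the definition of $p_\infty^*$ and $J_n(\theta_0)$ being uniformly nonsingular,
(c) $\hat\tau_{ni}\hat\tau_{nj}\int_{H_n}\big(p_n^*(h) - p_\infty^*(h)\big)dh = o_p(1)$ by Theorems 1 and 2,
(d) $\hat\tau_{ni}\hat\tau_{nj}\int_{\mathbb R^d}p_\infty^*(h)\,dh = o_p(1)$ by Theorem 2 and the definition of $p_\infty^*$,
(e) $\hat\tau_{ni}\int_{H_n}h_j\big(p_n^*(h) - p_\infty^*(h)\big)dh = o_p(1)$ by Theorems 1 and 2,
(f) $\hat\tau_{ni}\int_{\mathbb R^d}h_j\,p_\infty^*(h)\,dh = o_p(1)$ by Theorem 2, the definition of $p_\infty^*$, and $J_n(\theta_0)$ being uniformly nonsingular,
from which the required conclusion follows.

A.5 Proof of Proposition 1

Assumption 3 is directly implied by (4.1)-(4.4) and the uniform continuity of $Em_i(\theta)$, as shown in Lemma 1. It remains only to verify Assumption 4. Define the identity
$$L_n(\theta) - L_n(\theta_0) = \underbrace{-n\,g_n(\theta_0)'W(\theta_0)G(\theta_0)}_{\Delta_n(\theta_0)'}(\theta - \theta_0) - \frac12(\theta - \theta_0)'\,n\underbrace{G(\theta_0)'W(\theta_0)G(\theta_0)}_{J(\theta_0)}(\theta - \theta_0) + R_n(\theta). \tag{A.18}$$
Next, given the definition of $\Delta_n(\theta_0)$ and $J(\theta_0)$, conditions i, ii, and iii of Assumption 4 are immediate from conditions i-iii of Proposition 1. Condition iv is verified as follows. Condition iv of Assumption 4 can be succinctly stated as: for each $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\limsup_n P^*\left\{\sup_{|\theta - \theta_0| \le \delta}\frac{|R_n(\theta)|}{1 + n|\theta - \theta_0|^2} > \varepsilon\right\} < \varepsilon.$$
This is equivalent to the following stochastic equicontinuity condition, see e.g. Andrews (1994a): for any $\delta_n \to 0$,
$$\sup_{|\theta - \theta_0| \le \delta_n}\frac{|R_n(\theta)|}{1 + n|\theta - \theta_0|^2} = o_p(1). \tag{A.19}$$
This is weaker than condition (v) of Theorem 7.1 in Newey and McFadden (1994), which requires
$$\sup_{|\theta - \theta_0| \le \delta_n}\frac{|R_n(\theta)|}{\sqrt n|\theta - \theta_0| + n|\theta - \theta_0|^2} = o_p(1); \tag{A.20}$$
indeed, since $\sqrt n|\theta - \theta_0| + n|\theta - \theta_0|^2 \le 2(1 + n|\theta - \theta_0|^2)$, (A.20) implies (A.19). Hence the arguments of the proof, except for several important differences, follow those of Theorem 7.2 in Newey and McFadden (1994).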
At first, note that condition iv of Proposition 1 implies that, for any $\delta_n \to 0$,
$$\sup_{\theta \in B_{\delta_n}(\theta_0)}|e(\theta)| = o_p\!\left(\frac{1}{\sqrt n}\right), \qquad e(\theta) = \frac{g_n(\theta) - g_n(\theta_0) - g(\theta)}{1 + \sqrt n|\theta - \theta_0|}, \tag{A.21}$$
where we let $g(\theta) = Eg_n(\theta)$, so that $g(\theta_0) = 0$. Recalling that $L_n(\theta) = -\frac n2 g_n(\theta)'W_n(\theta)g_n(\theta)$, decompose the remainder in (A.18) as
$$R_n(\theta) = R_{1n}(\theta) + R_{2n}(\theta) + R_{3n}(\theta),$$
where, writing $G = G(\theta_0)$,
$$R_{1n}(\theta) = n\Big(g_n(\theta_0)'W(\theta)G(\theta - \theta_0) + \tfrac12(\theta - \theta_0)'G'W(\theta)G(\theta - \theta_0) - \tfrac12 g_n(\theta)'W(\theta)g_n(\theta) + \tfrac12 g_n(\theta_0)'W(\theta)g_n(\theta_0)\Big),$$
$$R_{2n}(\theta) = \tfrac n2\,g_n(\theta)'\big(W(\theta) - W_n(\theta)\big)g_n(\theta) + \tfrac n2\,g_n(\theta_0)'\big(W_n(\theta_0) - W(\theta)\big)g_n(\theta_0),$$
$$R_{3n}(\theta) = n\,g_n(\theta_0)'\big(W(\theta_0) - W(\theta)\big)G(\theta - \theta_0) + \tfrac n2(\theta - \theta_0)'G'\big(W(\theta_0) - W(\theta)\big)G(\theta - \theta_0).$$
Verification of (A.19) for the terms $R_{2n}(\theta)$ and $R_{3n}(\theta)$ immediately follows from $\sqrt n g_n(\theta_0) = O_p(1)$ (condition i), $g(\theta) = G(\theta_0)(\theta - \theta_0) + o(|\theta - \theta_0|)$ (condition ii), the uniform consistency of $W_n(\theta)$ for $W(\theta)$ in condition iii, and the continuity of $W(\theta)$ at $\theta_0$ in condition iii. It remains to check condition (A.19) for the term $R_{1n}(\theta)$. Note that
$$g_n(\theta) = (1 + \sqrt n|\theta - \theta_0|)\,e(\theta) + g(\theta) + g_n(\theta_0).$$
Substitute this into $R_{1n}(\theta)$ and decompose:
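$$\begin{aligned}
-R_{1n}(\theta) ={}& \tfrac n2(1 + \sqrt n|\theta - \theta_0|)^2\,e(\theta)'W(\theta)e(\theta) + n(1 + \sqrt n|\theta - \theta_0|)\,e(\theta)'W(\theta)g_n(\theta_0) \\
&+ n(1 + \sqrt n|\theta - \theta_0|)\,e(\theta)'W(\theta)g(\theta) + n\,g_n(\theta_0)'W(\theta)\big(g(\theta) - G(\theta_0)(\theta - \theta_0)\big) \\
&+ \tfrac n2\Big(g(\theta)'W(\theta)g(\theta) - (\theta - \theta_0)'G(\theta_0)'W(\theta)G(\theta_0)(\theta - \theta_0)\Big).
\end{aligned}$$
Using the inequalities, for $x \ge 0$,
$$\frac{(1 + \sqrt n x)^2}{1 + nx^2} \le 2, \qquad \frac{1 + \sqrt n x}{1 + nx^2} \le 2, \qquad \frac{\sqrt n x(1 + \sqrt n x)}{1 + nx^2} \le 2, \qquad \frac{\sqrt n x}{1 + nx^2} \le \frac12, \tag{A.22}$$
each of these terms can be dealt with separately, by applying conditions i-iii and (A.21):
(a) $\sup_{\theta \in B_{\delta_n}(\theta_0)}\frac{n(1 + \sqrt n|\theta - \theta_0|)^2|e(\theta)'W(\theta)e(\theta)|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(b) $\sup_{\theta \in B_{\delta_n}(\theta_0)}\frac{n(1 + \sqrt n|\theta - \theta_0|)|e(\theta)'W(\theta)g_n(\theta_0)|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(c) $\sup_{\theta \in B_{\delta_n}(\theta_0)}\frac{n(1 + \sqrt n|\theta - \theta_0|)|e(\theta)'W(\theta)g(\theta)|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(d) $\sup_{\theta \in B_{\delta_n}(\theta_0)}\frac{n|g_n(\theta_0)'W(\theta)(g(\theta) - G(\theta_0)(\theta - \theta_0))|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(e) $\sup_{\theta \in B_{\delta_n}(\theta_0)}\frac{n|g(\theta)'W(\theta)g(\theta) - (\theta - \theta_0)'G(\theta_0)'W(\theta)G(\theta_0)(\theta - \theta_0)|}{1 + n|\theta - \theta_0|^2} = o(1)$,
where (a) follows from (A.22), (A.21), and the boundedness of $W(\theta)$ near $\theta_0$ (condition iii); (b) follows from (A.22), (A.21), and $\sqrt n g_n(\theta_0) = O_p(1)$ (condition i); (c) follows by replacing, by condition ii, $g(\theta)$ with $G(\theta_0)(\theta - \theta_0) + o(|\theta - \theta_0|)$, followed by applying (A.22) and (A.21); (d) follows from the same replacement, followed by applying (A.22) and condition i; and (e) follows from the same Taylor expansion, since the numerator is $n \cdot o(|\theta - \theta_0|^2)$. Verification of (A.19) for the term $R_{1n}(\theta)$ now follows by putting these terms together.

A.6 Proof of Proposition 2

Verification of Assumption 3 is standard given the stated conditions and is subsumed as a step in the consistency proofs of extremum estimators based on GEL in Kitamura and Stutzer (1997) for cases when $s$ is finite and Kitamura (1997) for cases when $s$ takes on infinite values. We shall not repeat it here. Next, we will verify Assumption 4. Define
$$\gamma(\theta) = \arg\inf_\gamma L_n(\theta, \gamma).$$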
It will suffice to show that, uniformly in $\theta_n \in B_{\delta_n}(\theta_0)$ for any $\delta_n \to 0$, we have
$$L_n(\theta_n) = L_n(\theta_n, \gamma(\theta_n)) = -\frac12\left(\frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta_n)\right)'\big(V(\theta_0)^{-1} + o_p(1)\big)\left(\frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta_n)\right), \tag{A.23}$$
where $V(\theta_0) = Em_i(\theta_0)m_i(\theta_0)'$. Assumptions 4.i-iii follow immediately from the conditions of Proposition 2, and Assumption 4.iv is verified exactly as in the proof of Proposition 1, given the reduction to the GMM case. Indeed, defining $g_n(\theta) = \frac1n\sum_{i=1}^n m_i(\theta)$, the Donsker property assumed in condition iv implies that for any $\varepsilon > 0$ there is $\delta > 0$ such that
$$\limsup_{n\to\infty}P^*\left\{\sup_{\theta \in B_\delta(\theta_0)}\sqrt n\left|g_n(\theta) - g_n(\theta_0) - \big(Eg_n(\theta) - Eg_n(\theta_0)\big)\right| > \varepsilon\right\} < \varepsilon,$$
which implies
$$\limsup_{n\to\infty}P^*\left\{\sup_{\theta \in B_\delta(\theta_0)}\frac{\sqrt n\left|g_n(\theta) - g_n(\theta_0) - \big(Eg_n(\theta) - Eg_n(\theta_0)\big)\right|}{1 + \sqrt n|\theta - \theta_0|} > \varepsilon\right\} < \varepsilon,$$
which is condition iv in Proposition 1. The rest of the arguments follow those in the proof of Proposition 1.

It only remains to show the requisite expansion (A.23). For that purpose we use the convexity lemma, which was obtained by C. Geyer and can be found in Knight (1999).

Convexity Lemma. Suppose $Q_n$ is a sequence of lower-semi-continuous convex $\bar{\mathbb R}$-valued random functions, defined on $\mathbb R^d$, and let $\mathcal D$ be a countable dense subset of $\mathbb R^d$. If $Q_n$ weakly converges to $Q_\infty$ in $\bar{\mathbb R}$ marginally (in the finite-dimensional sense) on $\mathcal D$, where $Q_\infty$ is lower-semi-continuous, convex, and finite on an open non-empty set a.s., then
$$\arg\inf_{z \in \mathbb R^d}Q_n(z) \to_d \arg\inf_{z \in \mathbb R^d}Q_\infty(z),$$
provided the latter is uniquely defined in $\mathbb R^d$ a.s.

Next, we show that $\gamma(\theta_n) \to_p 0$. Define $\mathcal F = \{\gamma : Es[m_i(\theta_0)'\gamma] < \infty\}$ and $\mathcal F^c = \{\gamma : Es[m_i(\theta_0)'\gamma] = \infty\}$. By convexity and lower-semicontinuity of $s$, $\mathcal F$ is convex and open, and its boundary is nowhere dense in $\mathbb R^p$. Thus, for a given $\gamma \in \mathcal F$, $Es[m_i(\theta)'\gamma] < \infty$ for all $\theta \in B_\delta(\theta_0)$ and some $\delta > 0$, which follows by continuity of $\theta \mapsto Es[m_i(\theta)'\gamma]$ over $B_\delta(\theta_0)$ implied by conditions ii and iii. Thus, for a given $\gamma \in \mathcal F$ and any $\theta_n \to_p \theta_0$,
$$\frac1n\sum_{i=1}^n s[m_i(\theta_n)'\gamma] \to_p Es[m_i(\theta_0)'\gamma].$$
This follows from the uniform law of large numbers implied by
1. $\{s[m_i(\theta)'\gamma],\ \theta \in B_\delta(\theta_0)\}$, where $\delta > 0$ is sufficiently small, being a Donsker class,
2. $Em_i(\theta) = \int x\,dP[m_i(\theta) \le x]$ being continuously differentiable in $\theta$ by condition ii.
The above function set is Donsker by (a) $m_i(\theta)'\gamma \in M$ for all $\theta \in B_\delta(\theta_0)$, some $\delta > 0$, some compact $M \subset \mathcal V$ (recall that $\mathcal V$ is defined as the open convex set on which $s$ is finite), and a given $\gamma \in \mathcal F$, by construction of $\mathcal F$ and conditions ii and iii, (b) $\{m_i(\theta),\ \theta \in B_\delta(\theta_0)\}$ being a Donsker class by condition iv, (c) $s$ being a uniform Lipschitz function over $M$, by assumption on $s$, (d) $\{m_i(\theta)'\gamma,\ \theta \in B_\delta(\theta_0)\}$ being a Donsker class by (b), and (e) Theorem 2.10.6 in van der Vaart and Wellner (1996), which says that a uniform Lipschitz transform of a Donsker class is itself a Donsker class.

Now take $\gamma \in \mathcal F^c \setminus \partial\mathcal F$, where $\partial\mathcal F$ denotes the boundary of $\mathcal F$. Then
$$\frac1n\sum_{i=1}^n s[m_i(\theta_n)'\gamma] \to_p Es[m_i(\theta_0)'\gamma] = \infty.$$
Now take all the rational vectors $\gamma \in \mathbb R^p \setminus \partial\mathcal F$ as the set $\mathcal D$ appearing in the statement of the Convexity Lemma and conclude that
$$\gamma(\theta_n) \to_p \arg\inf_\gamma Es[m_i(\theta_0)'\gamma] = 0.$$
Given this result, we can expand the first order condition for $\gamma(\theta_n)$ in order to obtain the expression for its form. Note that wp $\to 1$,
$$0 = \frac1n\sum_{i=1}^n\nabla s\big(\gamma(\theta_n)'m_i(\theta_n)\big)m_i(\theta_n) = \frac1n\sum_{i=1}^n m_i(\theta_n) + \bar V_n\gamma(\theta_n), \tag{A.24}$$
where
$$\bar V_n = \frac1n\sum_{i=1}^n\nabla^2 s\big(\bar\gamma(\theta_n)'m_i(\theta_n)\big)m_i(\theta_n)m_i(\theta_n)'$$
for some $\bar\gamma(\theta_n)$ between 0 and $\gamma(\theta_n)$, which is different from row to row of the matrix $\bar V_n$. Then
$$\bar V_n \to_p V(\theta_0) = Em_i(\theta_0)m_i(\theta_0)'.$$
This follows from the uniform law of large numbers implied by
1. $\{\nabla^2 s(\gamma'm_i(\theta^*))m_i(\theta)m_i(\theta)',\ (\theta^*, \gamma, \theta) \in B_{\delta_1}(\theta_0) \times B_{\delta_2}(0) \times B_{\delta_3}(\theta_0)\}$, where $\delta_j > 0$ are sufficiently small, being a Donsker class wp $\to 1$,
2. $Em_i(\theta)m_i(\theta)' = \int xx'\,dP[m_i(\theta) \le x]$ being a continuous function of $\theta$ by condition ii,
3. $E\nabla^2 s(\gamma'm_i(\theta^*))m_i(\theta)m_i(\theta)' = E\nabla^2 s(0)m_i(\theta)m_i(\theta)' + o(1)$ uniformly in $(\theta, \theta^*) \in B_\delta(\theta_0) \times B_\delta(\theta_0)$ for sufficiently small $\delta > 0$, for any $\gamma \to 0$, by assumptions on $s$ and condition iii.
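Claim 1 is verified by applying exactly the same logic as in the previously stated steps (a)-(e). For the sake of brevity, this will not be repeated. Therefore, wp $\to 1$,
$$\gamma(\theta_n) = -(\bar V_n)^{-1}\frac1n\sum_{i=1}^n m_i(\theta_n) = -\big(V(\theta_0) + o_p(1)\big)^{-1}\frac1n\sum_{i=1}^n m_i(\theta_n). \tag{A.25}$$
Consider the second order expansion,
$$L_n(\theta_n, \gamma(\theta_n)) = \sqrt n\,\gamma(\theta_n)'\frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta_n) + \frac12\sqrt n\,\gamma(\theta_n)'\,\tilde V_n\,\sqrt n\,\gamma(\theta_n), \tag{A.26}$$
where
$$\tilde V_n = \frac1n\sum_{i=1}^n\nabla^2 s\big(\tilde\gamma(\theta_n)'m_i(\theta_n)\big)m_i(\theta_n)m_i(\theta_n)'$$
for some $\tilde\gamma(\theta_n)$ between 0 and $\gamma(\theta_n)$, which is different from row to row of the matrix $\tilde V_n$. By a preceding argument, $\tilde V_n \to_p V(\theta_0)$. Inserting (A.25) and $\tilde V_n = V(\theta_0) + o_p(1)$ into (A.26), we obtain the required expansion (A.23).

A.7 Proof of Proposition 3

Assumption 3 is assumed. We need to verify Assumption 4. Define the identity
$$L_n(\theta) - L_n(\theta_0) = \underbrace{\sum_{i=1}^n\dot m_i(\theta_0)'}_{\Delta_n(\theta_0)'}(\theta - \theta_0) + \frac12(\theta - \theta_0)'\,n\underbrace{\nabla_{\theta\theta'}Em_i(\theta_0)}_{-J(\theta_0)}(\theta - \theta_0) + R_n(\theta), \tag{A.27}$$
where $\dot m_i(\theta_0)$ denotes the gradient of $m_i$ at $\theta_0$. Assumption 4.i-iii then follows immediately from conditions i and ii. Assumption 4.iv is verified given the following decomposition of the remainder term:
$$R_n(\theta) = \underbrace{\sum_{i=1}^n\big[m_i(\theta) - m_i(\theta_0) - Em_i(\theta) + Em_i(\theta_0) - \dot m_i(\theta_0)'(\theta - \theta_0)\big]}_{R_{1n}(\theta)} + \underbrace{n\Big(Em_i(\theta) - Em_i(\theta_0) + \tfrac12(\theta - \theta_0)'J(\theta_0)(\theta - \theta_0)\Big)}_{R_{2n}(\theta)}.$$
It suffices to verify Assumption 4.iv separately for $R_{1n}(\theta)$ and $R_{2n}(\theta)$. Since
$$R_{2n}(\theta) = -\frac12 n(\theta - \theta_0)'\big[J(\theta^*) - J(\theta_0)\big](\theta - \theta_0)$$
for some $\theta^*$ on the line connecting $\theta$ and $\theta_0$, verification of Assumption 4 for $R_{2n}(\theta)$ follows immediately from the continuity of $J(\theta)$ over a ball at $\theta_0$. To show Assumption 4.iv-(b) for $R_{1n}(\theta)$, we note that for any given $M > 0$,
$$\limsup_n P^*\left\{\sup_{|\theta - \theta_0| \le M/\sqrt n}|R_{1n}(\theta)| > \varepsilon\right\} \le \limsup_n P^*\left\{\sup_{|\theta - \theta_0| \le M/\sqrt n}\frac{|R_{1n}(\theta)|}{\sqrt n|\theta - \theta_0|} > \frac{\varepsilon}{M}\right\} = 0, \tag{A.28}$$
where the last conclusion follows from two observations. First, note that
$$Z_n(\theta) = \frac{\sum_{i=1}^n\big(m_i(\theta) - m_i(\theta_0) - (Em_i(\theta) - Em_i(\theta_0)) - \dot m_i(\theta_0)'(\theta - \theta_0)\big)}{\sqrt n|\theta - \theta_0|}$$
is Donsker by assumption, that is, it converges in $\ell^\infty(B_\delta(\theta_0))$ to a tight Gaussian process $Z$.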
The process has uniformly continuous paths with respect to the semimetric $\rho$ given by
$$\rho^2(\theta_1, \theta_2) = E\big(Z(\theta_1) - Z(\theta_2)\big)^2,$$
so that $\rho(\theta, \theta_0) \to 0$ if $\theta \to \theta_0$. Thus almost all sample paths of $Z$ are continuous at $\theta_0$. Second, since by assumption
$$E\big[m_i(\theta) - m_i(\theta_0) - \dot m_i(\theta_0)'(\theta - \theta_0)\big]^2 = o(|\theta - \theta_0|^2),$$
we have for any $\theta_n \to \theta_0$
$$E\,Z_n(\theta_n)^2 \le \frac{E\big[m_i(\theta_n) - m_i(\theta_0) - \dot m_i(\theta_0)'(\theta_n - \theta_0)\big]^2}{|\theta_n - \theta_0|^2} = o(1),$$
therefore $Z(\theta_0) = 0$. Therefore, for any $\theta^* \to_p \theta_0$, we have by the extended continuous mapping theorem
$$Z_n(\theta^*) \to_d Z(\theta_0) = 0, \quad \text{that is,} \quad Z_n(\theta^*) \to_p 0. \tag{A.29}$$
This shows (A.28).

To prove Assumption 4.iv-(a) for $R_{1n}(\theta)$, we need to show that for some $\delta > 0$ and constant $M$,
$$\limsup_n P^*\left\{\sup_{M/\sqrt n \le |\theta - \theta_0| \le \delta}\frac{|R_{1n}(\theta)|}{n|\theta - \theta_0|^2} > \varepsilon\right\} < \varepsilon. \tag{A.30}$$
Using that $M/\sqrt n \le |\theta - \theta_0|$, bound the left-hand side by
$$\limsup_n P^*\left\{\sup_{M/\sqrt n \le |\theta - \theta_0| \le \delta}\frac{|R_{1n}(\theta)|}{M\sqrt n|\theta - \theta_0|} > \varepsilon\right\} \le \limsup_n P^*\left\{\sup_{|\theta - \theta_0| \le \delta}|Z_n(\theta)| > M\varepsilon\right\} < \varepsilon \tag{A.31}$$
for any given $\varepsilon > 0$, where, in order to make the last inequality true, we can make $\delta$ sufficiently small by the property (A.29) of $Z_n$, or make $M$ sufficiently large by the property $Z_n = O_p(1)$.

B Appendix on Computation

B.1 A computational lemma

In this section we record some formal results on MCMC computation of the quasi-posterior quantities.

LEMMA 3. Suppose the chain $(\theta^{(j)}, j \le B)$ is produced by the Metropolis-Hastings (MH) algorithm with $q$ such that $q(\theta \mid \theta') > 0$ for each $(\theta, \theta')$. Suppose also that $P\{\rho(\theta^{(j)}, \xi) > 0\} = 1$ for all $j$. Then
1. $p_n(\cdot)$ is the stationary density of the chain,
2. the chain is ergodic with the limit marginal distribution given by $p_n(\cdot)$:
$$\lim_{B\to\infty}\sup_A\left|P\big(\theta^{(B)} \in A \mid \theta^{(0)}\big) - \int_A p_n(\theta)\,d\theta\right| = 0,$$
where the supremum is taken over the Borel sets,
3. for any $p_n$-integrable function $g$:
$$\frac1B\sum_{j=1}^B g(\theta^{(j)}) \to_p \int_\Theta g(\theta)\,p_n(\theta)\,d\theta.$$

Proof. The result is immediate from Theorem 6.2.5 in Robert and Casella (1999).
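Lemma 3 is what justifies replacing quasi-posterior integrals by averages along a Metropolis-Hastings chain. The sketch below runs a random-walk MH chain on $p_n(\theta) \propto e^{L_n(\theta)}\pi(\theta)$ with a flat prior and a GMM-type criterion $L_n(\theta) = -\frac n2 g_n(\theta)'Wg_n(\theta)$, in the spirit of Proposition 1. It is an illustration under assumptions, not the paper's setup: the function names, the moment functions (valid if $y_i \sim N(\theta, 1)$), the weight matrix, the step size, and the burn-in length are all hypothetical choices. With a symmetric proposal and a flat prior, the MH acceptance probability reduces to $\min\{1, e^{L_n(\xi) - L_n(\theta^{(j)})}\}$.

```python
import numpy as np


def gmm_quasi_loglik(theta, y):
    """L_n(theta) = -(n/2) g_n' W g_n for the illustrative moments
    m_i(theta) = (y_i - theta, y_i**2 - theta**2 - 1), valid if y_i ~ N(theta, 1).
    W is the inverse sample covariance of the moments; for these moments it
    does not depend on theta, because theta only shifts their means."""
    m = np.column_stack([y - theta, y**2 - theta**2 - 1.0])
    g = m.mean(axis=0)
    W = np.linalg.inv(np.cov(m, rowvar=False))
    return -0.5 * len(y) * g @ W @ g


def random_walk_mh(log_target, start, n_draws, step, rng):
    """Random-walk Metropolis-Hastings for a scalar parameter, flat prior."""
    draws = np.empty(n_draws)
    current, current_lp = start, log_target(start)
    accepted = 0
    for j in range(n_draws):
        proposal = current + step * rng.standard_normal()
        proposal_lp = log_target(proposal)
        # symmetric proposal: accept with probability min(1, exp(difference))
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws[j] = current
    return draws, accepted / n_draws


rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=500)
draws, acc_rate = random_walk_mh(lambda t: gmm_quasi_loglik(t, y),
                                 float(y.mean()), 20_000, 0.1, rng)
kept = draws[5_000:]                        # discard burn-in
lte_mean, lte_median = kept.mean(), np.median(kept)
j_inv_hat = len(y) * kept.var()             # n * Var(draws), cf. Theorem 4
```

Ergodic averages of the retained draws give the quasi-posterior mean and median (part 3 of Lemma 3). Scaling their variance by $n$ gives the estimate of $J^{-1}$ used in Theorem 4. For this particular moment design $J = 1$, so `j_inv_hat` should be close to one.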
An immediate consequence of this lemma is the following result.

LEMMA 4. Suppose Assumptions 1 and 2 hold. Suppose the chain $(\theta^{(j)}, j \le B)$ satisfies the conditions of Lemma 3. Then, for any convex and $p_n$-integrable loss function $\rho_n(\cdot)$,
$$\arg\inf_{\vartheta \in \Theta}\left[\frac1B\sum_{j=1}^B\rho_n(\theta^{(j)} - \vartheta)\right] \to_p \hat\theta = \arg\inf_{\vartheta \in \Theta}\left[\int_\Theta\rho_n(\theta - \vartheta)\,p_n(\theta)\,d\theta\right],$$
provided that $\hat\theta$ is uniquely defined.

Proof. By Lemma 3 we have the pointwise convergence of the objective function: for any $\vartheta$,
$$\frac1B\sum_{j=1}^B\rho_n(\theta^{(j)} - \vartheta) \to_p \int_\Theta\rho_n(\theta - \vartheta)\,p_n(\theta)\,d\theta,$$
which implies the result by the Convexity Lemma, since $\vartheta \mapsto \int_\Theta\rho_n(\theta - \vartheta)\,p_n(\theta)\,d\theta$ is convex by convexity of $\rho_n$.

B.2 Quasi-Bayes Estimation and Simulated Annealing

The relation between drawing from the shape of a likelihood surface and optimizing to find the mode of the likelihood function is well known. It is well established, see e.g. Robert and Casella (1999), that
$$\lim_{\lambda\to\infty}\frac{\int_\Theta\theta\,e^{\lambda L_n(\theta)}\pi(\theta)\,d\theta}{\int_\Theta e^{\lambda L_n(\theta)}\pi(\theta)\,d\theta} = \arg\max_{\theta \in \Theta}L_n(\theta). \tag{B.1}$$
Essentially, as $\lambda \to \infty$, the sequence of probability measures
$$\frac{e^{\lambda L_n(\theta)}\pi(\theta)}{\int_\Theta e^{\lambda L_n(\theta)}\pi(\theta)\,d\theta} \tag{B.2}$$
converges to the generalized Dirac probability measure concentrated at $\arg\max_{\theta \in \Theta}L_n(\theta)$.

The difficulty of nonlinear optimization has been an important issue in econometrics (Berndt et al. (1974), Sims (1999)). The simulated annealing algorithm (see e.g. Press et al. (1992), Goffe et al. (1994)) is usually considered a generic optimization method. It is an implementation of the simulation based optimization (B.1) with a uniform prior $\pi(\theta) = c$ on the parameter space $\Theta$. At each temperature level $1/\lambda$, the simulated annealing routine uses a large number of Metropolis-Hastings steps to draw from the quasi distribution (B.2). The temperature parameter is then decreased slowly while the Metropolis steps are repeated, until convergence criteria for the optimum are achieved. Interestingly, the simulated annealing algorithm has been widely used in optimization of non-likelihood-based semiparametric objective functions.
In principle, if the temperature parameter is decreased at an arbitrarily slow rate (that depends on the criterion function), simulated annealing can find the global optimum of non-smooth objective functions that may have many local extrema. Controlling the temperature reduction parameter is a very delicate matter and is certainly crucial to the performance of the algorithm with highly nonsmooth objective functions. On the other hand, as Theorems 1 and 2 of this paper show, we may fix the temperature parameter $1/\lambda$ at a positive constant and then compute the quasi-posterior medians or means for (B.2) using Metropolis steps. These estimates can be used in place of the exact maximum. They are consistent and asymptotically normal, and possess the same limiting distribution as the exact maximum. The interpretation of the simulated annealing algorithm as an implementation of (B.2) also suggests that for some problems with special structures, other MCMC methods, such as the Gibbs sampler, may be used to replace the Metropolis-Hastings step in the simulated annealing algorithm.

B.3 Details of Computation in Monte-Carlo Examples

The parameter space is taken to be $\Theta = [\theta_0 \pm 10]$. The transition kernel is a Normal density, and the flat prior is truncated to $\Theta$. Each parameter is updated via a Gibbs-Metropolis procedure, which modifies slightly the basic Metropolis-Hastings algorithm: for $k = 1, \ldots, d$, a draw of $\xi_k$ from the univariate normal density $q(\xi_k \mid \theta_k^{(j)}, \phi)$ is made, then the candidate value $\xi$, consisting of $\xi_k$ and $\theta_{-k}^{(j)}$, replaces $\theta^{(j)}$ with probability $\rho(\theta^{(j)}, \xi)$ specified in the text. The variance parameter $\phi$ is adjusted every 100 draws (in the second simulation example and the empirical example) or 200 draws (in the first simulation example) so that the rejection probability is roughly 50%. The first $N \times d$ draws (the burn-in stage) are discarded, and the remaining $N \times d$ draws are used in the computation of estimates and intervals. We use $N = 10{,}000$ in the second simulation example and the empirical example, and $N = 5{,}000$ in the first simulation example. The starting value is the OLS estimate in all examples. To give an idea of computational expense, computing one set of estimates takes 20-40 seconds, depending on the example. All of the codes that we used to produce figures, simulation, and empirical results are available from the authors.
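The Gibbs-Metropolis scheme just described can be sketched for a median-regression ($\tau = 0.5$) Laplace-type criterion. This is a hedged illustration, not the authors' code. The following are our own assumptions:
- the helper names and the data-generating process;
- the multiplicative tuning rule `scale *= exp(target - rejection_rate)`;
- adapting only during burn-in, so that the retained draws come from a fixed Metropolis kernel;
- counting one "draw" as one single-coordinate update.

Following the description, the prior is flat on a box of half-width 10. Here the box is centered at the OLS starting value, because $\theta_0$ is unknown in the sketch. The first $N \times d$ draws are discarded.

```python
import numpy as np


def gibbs_metropolis(log_target, start, N, rng, half_width=10.0,
                     adapt_every=100, init_scale=0.5, target_reject=0.5):
    """Componentwise random-walk Metropolis with a flat prior on a box.

    Runs 2 * N * d single-coordinate updates, tunes each proposal scale every
    `adapt_every` updates during burn-in, and returns the last N * d states.
    """
    theta = np.array(start, dtype=float)
    d = theta.size
    lower, upper = theta - half_width, theta + half_width
    lp = log_target(theta)
    scales = np.full(d, init_scale)
    tries, rejects = np.zeros(d), np.zeros(d)
    burn_in, total = N * d, 2 * N * d
    out = np.empty((total, d))
    for j in range(total):
        k = j % d                                   # coordinate updated now
        xi = theta.copy()
        xi[k] += scales[k] * rng.standard_normal()  # univariate normal proposal
        accept = False
        if lower[k] <= xi[k] <= upper[k]:           # flat prior is zero outside
            xi_lp = log_target(xi)
            accept = np.log(rng.uniform()) < xi_lp - lp
        tries[k] += 1
        if accept:
            theta, lp = xi, xi_lp
        else:
            rejects[k] += 1
        out[j] = theta
        if j < burn_in and (j + 1) % adapt_every == 0:
            rate = rejects / np.maximum(tries, 1.0)
            scales *= np.exp(target_reject - rate)  # shrink if rejecting too often
            tries[:] = 0.0
            rejects[:] = 0.0
    return out[burn_in:], scales


rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)


def median_criterion(theta):
    # L_n(theta) = -sum rho_{0.5}(y - X theta) = -0.5 * sum |y - X theta|
    return -0.5 * np.sum(np.abs(y - X @ theta))


start = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS starting value
kept, final_scales = gibbs_metropolis(median_criterion, start, 2000, rng)
estimate = np.median(kept, axis=0)                  # quasi-posterior median
```

Updating one coordinate at a time keeps each proposal one-dimensional. That makes it easy to tune a separate step size per parameter toward the roughly 50% rejection rate mentioned above.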
Notation and Terms

→p            convergence in (outer) probability P*
→d            convergence in distribution under P*
wp → 1        with inner probability P_* converging to one
A ~ B         asymptotic equivalence, $\lim AB^{-1} = I$
$B_\delta(x)$   ball centered at x of radius $\delta > 0$
I             identity matrix
A > 0         A is positive definite when A is a matrix
N(0, a)       normal random vector with mean 0 and variance matrix a
Donsker class here this means that the empirical process $f \mapsto n^{-1/2}\sum_{i=1}^{n}(f(W_i) - Ef(W_i))$ is asymptotically Gaussian in $\ell^\infty(\mathcal F)$, see van der Vaart (1999)
$\ell^\infty(\mathcal F)$  metric space of bounded functions over $\mathcal F$, see van der Vaart (1999)
mineig(A)     minimum eigenvalue of matrix A

References

Abadie, A., 1995. Changes in Spanish labor income structure during the 1980s: a quantile regression approach. CEMFI Working Paper.
Amemiya, T., 1977. The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equation model. Econometrica 45 (4), 955-968.
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press.
Anderson, T. W., 1955. The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities. Proc. Amer. Math. Soc. 6, 170-176.
Andrews, D. W. K., 1994a. Empirical process methods in econometrics. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North Holland, pp. 2248-2292.
Andrews, D. W. K., 1994b. The large sample correspondence between classical hypothesis tests and Bayesian posterior odds tests. Econometrica 62 (5), 1207-1232.
Andrews, D. W. K., 1997. A stopping rule for the computation of generalized method of moments estimators. Econometrica 65 (4), 913-931.
Andrews, D. W. K., 1999. Estimation when a parameter is on a boundary. Econometrica 67, 1341-1383.
Berger, J. O., 2002. Bayesian analysis: a look at today and thoughts of tomorrow. In: Statistics in the 21st Century. Chapman and Hall, New York, pp. 275-290.
Berndt, E., Hall, B., Hall, R., Hausman, J., 1974. Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3 (4), 653-665.
Bernstein, S., 1917. Theory of Probability. (Russian) Fourth Edition (1946), Gostekhizdat, Moscow-Leningrad.
Berry, S., Levinsohn, J., Pakes, A., July 1995. Automobile prices in market equilibrium. Econometrica 63, 841-890.
Bickel, P. J., Yahav, J. A., 1969. Some contributions to the asymptotic theory of Bayes solutions. Z. Wahrsch. Verw. Geb. 11, 257-276.
Billingsley, P., 1994. Probability and Measure, 3rd Ed. John Wiley and Sons.
Buchinsky, M., 1991. Theory of and practice of quantile regression. Ph.D. dissertation, Department of Economics, Harvard University.
Buchinsky, M., Hahn, J., 1998. An alternative estimator for the censored quantile regression model. Econometrica 66, 653-671.
Bunke, O., Milhaud, X., 1998. Asymptotic behavior of Bayes estimates under possibly incorrect models. The Annals of Statistics 26 (2), 617-644.
Chamberlain, G., Imbens, G., 1997. Nonparametric applications of Bayesian inference. NBER Working Paper.
Chernozhukov, V., Hansen, C., 2001. An IV model of quantile treatment effects. MIT Department of Economics Working Paper.
Chernozhukov, V., Umantsev, L., 2001. Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics 26, 271-292.
Chib, S., 2001. Markov chain Monte Carlo methods: computation and inference. In: Heckman, J. J., Leamer, E. (Eds.), Handbook of Econometrics, Vol. 5, Chapter 5. North Holland, pp. 3564-3634.
Christoffersen, P., Hahn, J., Inoue, A., 1999. Testing, comparing and combining value at risk measures. Working Paper, Wharton School, University of Pennsylvania.
Diaconis, P., Freedman, D., 1986. On the consistency of Bayes estimates. Annals of Statistics 14, 1-26.
Doksum, K. A., Lo, A. Y., 1990. Consistent and robust Bayes procedures for location based on partial information. Ann. Statist. 18 (1), 443-453.
Engle, R., Manganelli, S., 2001. CAViaR: Conditional value at risk by regression quantiles. Working Paper, Department of Economics, UC San Diego.
Fitzenberger, B., 1997. A guide to censored quantile regressions. In: Robust Inference, Handbook of Statistics, Vol. 14. North-Holland, Amsterdam, pp. 405-437.
Gallant, A. R., White, H., 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Oxford: Basil Blackwell.
Geweke, J., Keane, M., 2001. Computationally intensive methods for integration in econometrics. In: Heckman, J. J., Leamer, E. (Eds.), Handbook of Econometrics, Vol. 5, Chapter 5. North Holland, pp. 3465-3564.
Goffe, W. L., Ferrier, G. D., Rogers, J., 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60, 65-99.
Hahn, J., 1997. Bayesian bootstrap of the quantile regression estimator: a large sample study. Internat. Econom. Rev. 38 (4), 795-808.
Hansen, L., Heaton, J., Yaron, A., 1996. Finite-sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262-280.
Hansen, L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50 (4), 1029-1054.
Hogg, R. V., 1975. Estimates of percentile regression lines using salary data. Journal of the American Statistical Association 70, 56-59.
Huber, P. J., 1973. Robust regression: Asymptotics, conjectures, and Monte Carlo. Annals of Statistics 1, 799-821.
Ibragimov, I., Has'minskii, R., 1981. Statistical Estimation: Asymptotic Theory. Springer Verlag.
Imbens, G., 1997. One-step estimators for over-identified generalized method of moments models. Rev. Econom. Stud. 64, 359-383.
Imbens, G., Spady, R., Johnson, P., 1998. Information theoretic approaches to inference in moment condition models. Econometrica 66, 333-357.
Jureckova, J., 1977. Asymptotic relations of M-estimators and R-estimators in linear regression models. Annals of Statistics 5, 464-472.
Khan, S., Powell, J. L., 2001. Two-step estimation of semiparametric censored regression models. Journal of Econometrics 103, 73-110.
Kim, J.-Y., 1998. Large sample properties of posterior densities, Bayesian information criterion and the likelihood principle in nonstationary time series models. Econometrica 66 (2), 359-380.
Kim, J.-Y., 2002. Limited information likelihood and Bayesian analysis. Journal of Econometrics, 175-193.
Kitamura, Y., 1997. Empirical likelihood methods with weakly dependent processes. Ann. Statist. 25 (5), 2084-2102.
Kitamura, Y., Stutzer, M., 1997. An information-theoretic alternative to generalized method of moments estimation. Econometrica 65, 861-874.
Knight, K., 1999. Epi-convergence and stochastic equisemicontinuity. Working Paper, Department of Statistics, University of Toronto.
Koenker, R., 1994. Confidence intervals for quantile regression. In: Asymptotic Statistics: Proceedings of the 5th Prague Symposium on Statistics. Heidelberg: Physica-Verlag, pp. 10-20.
Koenker, R., 1998. Treating the treated, varieties of causal analysis. Lecture Note, Department of Economics, University of Illinois.
Koenker, R., Bassett, G. S., 1978. Regression quantiles. Econometrica 46, 33-50.
Kottas, A., Gelfand, A., 2001. Bayesian semiparametric median regression modeling. Journal of the American Statistical Association 96, 1458-1468.
Lehmann, E., Casella, G., 1998. Theory of Point Estimation. Springer.
MaCurdy, T., Timmins, C., 2001. Bounding the influence of attrition on the intertemporal wage variation in the NLSY. Working Paper, Department of Economics, Yale University.
Mood, A. M., 1950. Introduction to the Theory of Statistics. McGraw-Hill Book Company, Inc.
Newey, W. K., 1991. Uniform convergence in probability and stochastic equicontinuity. Econometrica 59 (4), 1161-1167.
Newey, W. K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North Holland, pp. 2113-2241.
Newey, W. K., Powell, J. L., 1990. Efficient estimation of linear and type I censored regression models under conditional quantile restrictions. Econometric Theory 6, 295-317.
Newey, W. K., Smith, R., 2001. Higher order properties of GMM and generalized empirical likelihood estimators. Working Paper, Department of Economics, MIT.
Newey, W. K., West, K. D., 1987. A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55 (3), 703-708.
Owen, A., 1989. Empirical likelihood ratio confidence regions. In: Proceedings of the 47th Session of the International Statistical Institute, Book 3 (Paris, 1989). Vol. 53. pp. 373-393.
Owen, A., 1990. Empirical likelihood ratio confidence regions. Ann. Statist. 18 (1), 90-120.
Owen, A., 1991. Empirical likelihood for linear models. Ann. Statist. 19 (4), 1725-1747.
Owen, A., 2001. Empirical Likelihood. Chapman and Hall/CRC.
Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of optimization estimators. Econometrica 57 (5), 1027-1057.
Phillips, P. C. B., Ploberger, W., 1996. An asymptotic theory of Bayesian inference for time series. Econometrica 64 (2), 381-412.
Pollard, D., 1991. Asymptotics for least absolute deviation regression estimator. Econometric Theory 7, 186-199.
Pötscher, B. M., Prucha, I. R., 1997. Dynamic Nonlinear Econometric Models. Springer-Verlag, Berlin.
Powell, J. L., 1984. Least absolute deviations estimation for the censored regression model. Journal of Econometrics 25, 303-325.
Press, W., Teukolsky, S. A., Vetterling, W., Flannery, B., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge.
Qin, J., Lawless, J., 1994. Empirical likelihood and general estimating equations. Annals of Statistics 22, 300-325.
Robert, C. P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer.
Rousseeuw, P. J., Hubert, M., 1999. Regression depth. J. Amer. Statist. Assoc. 94 (446), 388-433.
Sims, C., 1999. Adaptive Metropolis-Hastings, or Monte Carlo kernel estimation. Working Paper, Department of Economics, Princeton University.
Stigler, S. M., 1975. Studies in the history of probability and statistics. XXXIV. Napoleonic statistics: the work of Laplace. Biometrika 62 (2), 503-517.
van Aelst, S., Rousseeuw, P. J., Hubert, M., Struyf, A., 2002. The deepest regression method. J. Multivariate Anal. 81 (1), 138-166.
van der Vaart, A., 1999. Asymptotic Statistics. Cambridge University Press.
van der Vaart, A. W., Wellner, J. A., 1996. Weak Convergence and Empirical Processes. Springer-Verlag, New York.
von Mises, R., 1931. Wahrscheinlichkeitsrechnung. Berlin: Springer.
White, H., 1994. Estimation, Inference and Specification Analysis. Vol. 22 of Econometric Society Monographs. Cambridge University Press, Cambridge.
Zellner, A., 1998. Past and recent results on maximal data information priors. J. Statist. Res. 32 (1), 1-22.

Table 1: Monte Carlo Comparison of LTE's with Censored Quantile Regression Estimates Obtained using Iterated Linear Programming (Based on 100 repetitions).

Estimator            RMSE    MAD     Mean Bias   Median Bias   Median Abs. Dev.
n = 400
Q-posterior-mean     0.473   0.378    0.138       0.134         0.340
Q-posterior-median   0.465   0.372    0.131       0.137         0.344
Iterated LP(10)      0.518   0.284    0.040       0.016         0.171
                     3.798   0.827   -0.568      -0.035         0.240
n = 1600
Q-posterior-mean     0.155   0.121   -0.018       0.009         0.089
Q-posterior-median   0.155   0.121   -0.020       0.002         0.092
Iterated LP(7)       0.134   0.106    0.040       0.067         0.085
                     3.547   0.511    0.023      -0.384         0.087

Table 2: Monte Carlo Comparison of the LTE's with Standard Estimation for a Linear Quantile Regression Model (Based on 500 repetitions).

Estimator                      RMSE    MAD     Mean Bias   Median Bias   Median AD
n = 200
Q-posterior-mean               .0747   .0587    .0174       .0204         .0478
Q-posterior-median             .0779   .0608    .0192       .0136         .0519
Standard Quantile Regression   .0787   .0628    .0067       .0092         .0510
n = 800
Q-posterior-mean               .0425   .0323   -.0018      -.0003         .0280
Q-posterior-median             .0445   .0339   -.0023       .0001         .0295
Standard Quantile Regression   .0498   .0398    .0007       .0025         .0356

Table 3: Monte Carlo Comparison of the LT Inference with Standard Inference for a Linear Quantile Regression Model (Based on 500 repetitions).

Inference Method                                               coverage   length
n = 200
Quasi-posterior confidence interval, equal tailed              .943       .377
Quasi-posterior confidence interval, symmetric (around mean)   .941       .375
Quantile Regression: Hall-Sheather Interval                    .659       .177
n = 800
Quasi-posterior confidence interval, equal tailed              .920       .159
Quasi-posterior confidence interval, symmetric (around mean)   .917       .158
Quantile Regression: Hall-Sheather Interval                    .602       .082

(Figure 1 panels: Criterion for IV-QR; Criterion for QB-Estimation; Markov Chain Sequence; Q-Posterior for Theta.)
Figure 1: A Nonlinear IV Example involving Instrumental Quantile Regression. In the top-left panel the discontinuous objective function $L_n(\theta)$ is depicted (one-dimensional case). The true parameter is $\theta_0 = 0$. In the bottom-left panel, a Markov chain sequence of draws $(\theta^{(1)}, \ldots, \theta^{(J)})$ is depicted. The marginal distribution of this sequence is $p_n(\theta) = e^{L_n(\theta)}/\int_\Theta e^{L_n(\theta)}\,d\theta$, see the bottom-right panel. The point estimate, the sample mean $\hat\theta$, is given by the vertical line with the rhomboid root. Two other vertical lines are the 10-th and the 90-th percentiles of the quasi-posterior distribution. The upper-right panel depicts the expected loss function that the LTE minimizes.

(Figure 2 panel: VaR(p) for dynamic model.)
Figure 2: Recursive VaR Surface in time-probability space.

(Figure 3 panel: VaR(p) for static model.)
Figure 3: Non-recursive VaR Surface in time-probability space.

Figure 4: $\hat\theta_2(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 5: $\hat\theta_3(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 6: $\hat\theta_4(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 7: $\hat\rho(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.

Throughout this proof the range of integration for $h$ is implicitly understood to be $H_n$. For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\pi_n(\theta)$ do not depend on $n$. The more general case follows similarly.

Part 1. Define
$$h = \sqrt n(\theta - T_n), \qquad T_n = \theta_0 + \frac{1}{n}J(\theta_0)^{-1}\Delta_n(\theta_0), \qquad U_n = \frac{1}{\sqrt n}J(\theta_0)^{-1}\Delta_n(\theta_0), \qquad (A.2)$$
then
$$p_n^*(h) = n^{-d/2}\,p_n\!\left(\frac{h}{\sqrt n} + T_n\right) = \frac{\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp\!\left(L_n\!\left(\frac{h}{\sqrt n} + T_n\right)\right)}{\int_{H_n}\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp\!\left(L_n\!\left(\frac{h}{\sqrt n} + T_n\right)\right)dh} = \frac{\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp(\omega(h))}{\int_{H_n}\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp(\omega(h))\,dh},$$
where
$$\omega(h) = L_n\!\left(T_n + \frac{h}{\sqrt n}\right) - L_n(\theta_0) - \frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0), \qquad (A.3)$$
and $C_n = \int_{H_n}\pi\!\left(T_n + \frac{h}{\sqrt n}\right)\exp(\omega(h))\,dh$.

Part 2 shows that for each $\alpha \ge 0$,
$$A_{1n} \equiv \int_{H_n}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\frac12 h'J(\theta_0)h\right)\pi(\theta_0)\right|dh \to_p 0. \qquad (A.4)$$
Given (A.4), taking $\alpha = 0$ we have
$$C_n \to_p \int_{\mathbb R^d} e^{-\frac12 h'J(\theta_0)h}\,\pi(\theta_0)\,dh = \pi(\theta_0)(2\pi)^{d/2}|\det J(\theta_0)|^{-1/2}, \qquad (A.5)$$
hence $C_n = O_p(1)$.
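Next note that
$$\text{left side of (A.1)} = \int_{H_n}|h|^\alpha\left|p_n^*(h) - p_\infty^*(h)\right|dh = A_n C_n^{-1},$$
where
$$A_n = \int_{H_n}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - (2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}\exp\!\left(-\frac12 h'J(\theta_0)h\right)C_n\right|dh.$$
Using (A.5), to show (A.1) it suffices to show that $A_n \to_p 0$. But $A_n \le A_{1n} + A_{2n}$, where
$$A_{2n} = \int_{H_n}|h|^\alpha\left|C_n(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}\exp\!\left(-\frac12 h'J(\theta_0)h\right) - \pi(\theta_0)\exp\!\left(-\frac12 h'J(\theta_0)h\right)\right|dh.$$
Then $A_{1n} \to_p 0$ by (A.4), and $A_{2n} \to_p 0$ because $C_n(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2} \to_p \pi(\theta_0)$ by (A.5).

Part 2. It remains only to show (A.4). Given Assumption 4 and the definitions in (A.2) and (A.3), write
$$\omega(h) = -\frac12 h'J(\theta_0)h + R_n\!\left(T_n + \frac{h}{\sqrt n}\right).$$
Split the integral $A_{1n}$ in (A.4) over three separate areas:
Area (i): $|h| \le M$,
Area (ii): $M < |h| \le \delta\sqrt n$,
Area (iii): $|h| > \delta\sqrt n$.
Each of these areas is implicitly understood to intersect with the range of integration for $h$, which is $H_n$.

Area (i): We will show that for each $0 < M < \infty$ and each $\epsilon > 0$,
$$\liminf_n P_*\left\{\int_{|h|\le M}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\frac12 h'J(\theta_0)h\right)\pi(\theta_0)\right|dh < \epsilon\right\} \ge 1 - \epsilon. \qquad (A.6)$$
This is proved by showing that
$$\sup_{|h|\le M}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\frac12 h'J(\theta_0)h\right)\pi(\theta_0)\right| \to_p 0. \qquad (A.7)$$
Using the definition of $\omega(h)$, (A.7) follows from:
$$\text{(a)}\ \sup_{|h|\le M}\left|\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \pi(\theta_0)\right| \to_p 0, \qquad \text{(b)}\ \sup_{|h|\le M}\left|R_n\!\left(T_n + \frac{h}{\sqrt n}\right)\right| \to_p 0, \qquad (A.8)$$
where (a) follows from the continuity of $\pi(\cdot)$ and because, by Assumptions 4.ii-4.iii, $\frac{1}{\sqrt n}J(\theta_0)^{-1}\Delta_n(\theta_0) = O_p(1)$; and (b) follows from Assumption 4.iv, since $\sup_{|h|\le M}\left|T_n + \frac{h}{\sqrt n} - \theta_0\right| = O_p(1/\sqrt n)$.

Area (ii): We show that for each $\epsilon > 0$ there exist large $M$ and small $\delta > 0$ such that
$$\liminf_n P_*\left\{\int_{M<|h|\le\delta\sqrt n}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\frac12 h'J(\theta_0)h\right)\pi(\theta_0)\right|dh < \epsilon\right\} \ge 1 - \epsilon. \qquad (A.9)$$
Since the integral of the second term is finite and can be made arbitrarily small by setting $M$ large, it suffices to show that for each $\epsilon > 0$ there exist large $M$ and small $\delta > 0$ such that
$$\liminf_n P_*\left\{\int_{M<|h|\le\delta\sqrt n}|h|^\alpha\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right)dh < \epsilon\right\} \ge 1 - \epsilon. \qquad (A.10)$$
In order to do so, it suffices to show that for sufficiently large $M$ and some constant $C > 0$,
$$\liminf_n P_*\left\{\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) \le C\exp\!\left(-\frac14 h'J(\theta_0)h\right)\ \text{for all } M < |h| \le \delta\sqrt n\right\} \ge 1 - \epsilon. \qquad (A.11)$$
By assumption $\pi(\cdot) \le K$, so we can drop it from consideration.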
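By definition of $\omega(h)$,
$$\exp(\omega(h)) \le \exp\!\left(-\frac12 h'J(\theta_0)h + \left|R_n\!\left(T_n + \frac{h}{\sqrt n}\right)\right|\right).$$
Since $|T_n - \theta_0| = o_p(1)$, for any $\delta > 0$, wp $\to 1$, $\left|T_n + \frac{h}{\sqrt n} - \theta_0\right| < 2\delta$ for all $|h| \le \delta\sqrt n$. Note also that $n\left|T_n + \frac{h}{\sqrt n} - \theta_0\right|^2 = |h + U_n|^2$. Thus, by Assumption 4.iv(a) there exist some small $\delta > 0$ and large $M$ such that
$$\liminf_n P_*\left\{\sup_{M<|h|\le\delta\sqrt n}\frac{\left|R_n\!\left(T_n + \frac{h}{\sqrt n}\right)\right|}{1 + |h + U_n|^2} \le \frac18\,\mathrm{mineig}(J(\theta_0))\right\} \ge 1 - \epsilon.$$
Since $|U_n|^2 = \frac1n\left|J(\theta_0)^{-1}\Delta_n(\theta_0)\right|^2 = O_p(1)$, for $M$ large we have $1 + |h + U_n|^2 \le 2|h|^2$ for all $|h| > M$ with probability at least $1 - \epsilon$. Therefore
$$\liminf_n P_*\left\{\exp(\omega(h)) \le \exp\!\left(-\frac12 h'J(\theta_0)h + \frac14\,\mathrm{mineig}(J(\theta_0))|h|^2\right) \le \exp\!\left(-\frac14 h'J(\theta_0)h\right)\ \text{for all } M < |h| \le \delta\sqrt n\right\} \ge 1 - 2\epsilon. \qquad (A.12)$$
Since $\epsilon$ is arbitrary, (A.12) implies (A.11), which in turn implies (A.9).

Area (iii): We show that for each $\epsilon > 0$, each $\alpha \ge 0$, and each $\delta > 0$,
$$\liminf_n P_*\left\{\int_{|h|>\delta\sqrt n}|h|^\alpha\left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\frac12 h'J(\theta_0)h\right)\pi(\theta_0)\right|dh < \epsilon\right\} \ge 1 - \epsilon. \qquad (A.13)$$
The integral of the second term clearly goes to 0 as $n \to \infty$. Therefore we only need to show
$$\int_{|h|>\delta\sqrt n}|h|^\alpha\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right)dh \to_p 0.$$
Recalling the definition of $h$, the term is bounded by
$$n^{(\alpha+d)/2}\int_{|\theta - T_n|>\delta}|\theta - T_n|^\alpha\,\pi(\theta)\exp\!\left(L_n(\theta) - L_n(\theta_0) - \frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0)\right)d\theta.$$
Since $T_n - \theta_0 \to_p 0$, wp $\to 1$ this is bounded by
$$K_n\,n^{(\alpha+d)/2}\int_{|\theta - \theta_0|>\delta/2}\left(1 + |\theta|^\alpha\right)\pi(\theta)\exp\!\left(L_n(\theta) - L_n(\theta_0)\right)d\theta,$$
where $K_n = C\exp\!\left(-\frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0)\right) = O_p(1)$ for some constant $C$. By Assumption 3 there exists $\epsilon > 0$ such that
$$\liminf_n P_*\left\{\sup_{|\theta - \theta_0|\ge\delta/2} e^{L_n(\theta) - L_n(\theta_0)} \le e^{-n\epsilon}\right\} = 1.$$
Thus, wp $\to 1$ the entire term is bounded by
$$K_n\,n^{(\alpha+d)/2}\,e^{-n\epsilon}\int_\Theta\left(1 + |\theta|^\alpha\right)\pi(\theta)\,d\theta = o_p(1). \qquad (A.14)$$
Here observe that compactness is only used to ensure that
$$\int_\Theta |\theta|^\alpha\,\pi(\theta)\,d\theta < \infty. \qquad (A.15)$$
Hence by replacing compactness with the condition (A.15), the conclusion (A.14) is not affected for the given $\alpha$. The entire proof is now completed by combining (A.6), (A.9), and (A.13).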
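The convergence proved above, the localized quasi-posterior $p_n^*(h)$ approaching the normal density $p_\infty^*(h)$ in the $|h|^\alpha$-weighted $L_1$ norm together with $C_n$ approaching the constant in (A.5), can be checked numerically on a toy case. The sketch below assumes a deterministic criterion $L_n(\theta) = -n(\theta^2/2 + \theta^4/4)$ with $\theta_0 = 0$, $J(\theta_0) = 1$, $\Delta_n(\theta_0) = 0$ (so $T_n = \theta_0$), and a flat prior $\pi = 1$. In that case $\omega(h) = -h^2/2 - h^4/(4n)$ and $R_n$ is the quartic term. The integrals use a midpoint rule on $[-10, 10]$; the grid and range are illustrative choices.

```python
import math


def localized_distance(n, alpha=0.0, lo=-10.0, hi=10.0, steps=20000):
    """Midpoint-rule approximation of C_n and of
        int |h|^alpha |p*_n(h) - p*_inf(h)| dh
    for the toy criterion L_n(theta) = -n (theta^2/2 + theta^4/4), with theta_0 = 0,
    J(theta_0) = 1, Delta_n(theta_0) = 0 (so T_n = theta_0) and a flat prior pi = 1."""
    dh = (hi - lo) / steps
    grid = [lo + (i + 0.5) * dh for i in range(steps)]
    # omega(h) = L_n(T_n + h/sqrt(n)) - L_n(theta_0) = -h^2/2 - h^4/(4n)
    w = [math.exp(-h * h / 2 - h ** 4 / (4 * n)) for h in grid]
    c_n = sum(w) * dh                                    # normalizing constant C_n
    phi = [math.exp(-h * h / 2) / math.sqrt(2 * math.pi) for h in grid]  # p*_inf = N(0, 1)
    dist = sum(abs(h) ** alpha * abs(wi / c_n - fi)
               for h, wi, fi in zip(grid, w, phi)) * dh
    return c_n, dist


for n in (10, 100, 1000, 10000):
    c_n, dist = localized_distance(n, alpha=2.0)
    print(n, round(c_n, 5), round(dist, 6))
```

With $\pi(\theta_0) = 1$ and $d = 1$, (A.5) predicts $C_n \to \sqrt{2\pi}$. The weighted distance should shrink roughly like $1/n$ here, because the remainder $h^4/(4n)$ is of that order for fixed $h$.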
A.2 Proof of Theorem 2

For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\pi_n(\theta)$ do not depend on $n$. The more general case follows similarly. Recall that $h = \sqrt n(\theta - \theta_0) - J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n$, and define $U_n = J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n$. Consider the objective function
$$Q_n(z) = \int_{H_n}\rho(z - h - U_n)\,p_n^*(h)\,dh,$$
which is minimized at $\sqrt n(\hat\theta - \theta_0)$. Also define
$$Q_\infty(z) = \int_{\mathbb R^d}\rho(z - h - U_n)\,p_\infty^*(h)\,dh,$$
which is minimized at a random vector denoted $Z_n$. Define
$$\xi = \arg\inf_{z\in\mathbb R^d}\left\{\int_{\mathbb R^d}\rho(z - h)\,p_\infty^*(h)\,dh\right\}.$$
Note that the solution $\xi$ is unique and finite by Assumption 2, and $\xi$ equals zero when $\rho$ is symmetric, by Anderson's lemma. Therefore $Z_n = \xi + U_n$, and $Z_n = \arg\inf_{z\in\mathbb R^d} Q_\infty(z) = O_p(1)$.

Next, for any fixed $z$, using $\rho(h) \le 1 + |h|^p$ (Assumption 2, parts ii-iii) and the inequality $|a + b|^p \le 2^{p-1}|a|^p + 2^{p-1}|b|^p$ for $p \ge 1$, we have
$$|Q_n(z) - Q_\infty(z)| \le \int\left(1 + |z - h - U_n|^p\right)\left|p_n^*(h) - p_\infty^*(h)\right|dh \le \int\left(1 + 2^{p-1}|h|^p + 2^{p-1}|z - U_n|^p\right)\left|p_n^*(h) - p_\infty^*(h)\right|dh$$
$$= \int\left(1 + 2^{p-1}|h|^p + O_p(1)\right)\left|p_n^*(h) - p_\infty^*(h)\right|dh = o_p(1),$$
where the $o_p(1)$ conclusion follows by Theorem 1, together with the exponentially small tails of the normal density $p_\infty^*$ outside $H_n$. Now $Q_n(z)$ and $Q_\infty(z)$ are convex and finite, and $Z_n = \arg\inf_{z\in\mathbb R^d} Q_\infty(z) = O_p(1)$. By the convexity lemma of Pollard (1991), pointwise convergence entails uniform convergence over compact sets $K$:
$$\sup_{z\in K}|Q_n(z) - Q_\infty(z)| \to_p 0.$$
Since $Z_n = O_p(1)$, uniform convergence and convexity arguments like those in Jureckova (1977) imply that $\sqrt n(\hat\theta - \theta_0) - Z_n \to_p 0$, as shown below.

Proof of $\sqrt n(\hat\theta - \theta_0) - Z_n \to_p 0$. The proof follows by extending slightly the convexity argument of Jureckova (1977) and Pollard (1991) to the present context.
Consider a ball $B_\delta(Z_n)$ with radius $\delta > 0$, centered at $Z_n$, and let $z = Z_n + dv$, where $v$ is a unit direction vector, $|v| = 1$, and $d > \delta$. Because $Z_n = O_p(1)$, for any $\delta > 0$ and $\epsilon > 0$, there exists $K > 0$ such that
$$\liminf_n P_*\{E_n\} \ge 1 - \epsilon, \qquad E_n = \{B_\delta(Z_n) \subset B_K(0)\}. \qquad (A.16)$$
By convexity, for any $z = Z_n + dv$ constructed so, whenever $E_n$ occurs,
$$\frac{\delta}{d}\left(Q_n(z) - Q_n(Z_n)\right) \ge Q_n(z^*) - Q_n(Z_n),$$
where $z^*$ is the point of the boundary of $B_\delta(Z_n)$ on the line connecting $z$ and $Z_n$. By the uniform convergence of $Q_n(z)$ to $Q_\infty(z)$ over any compact set $B_K(0)$, whenever $E_n$ occurs,
$$\frac{\delta}{d}\left(Q_n(z) - Q_n(Z_n)\right) \ge Q_\infty(z^*) - Q_\infty(Z_n) + o_p(1) \ge V_n + o_p(1),$$
where $V_n > 0$ is a positive variable, because $Z_n$ is the unique optimizer of $Q_\infty$. That is, there exists an $\eta > 0$ such that $\liminf_n P(V_n > \eta) \ge 1 - \epsilon$. Hence we have, with probability at least $1 - 3\epsilon$ for large $n$,
$$\frac{\delta}{d}\left(Q_n(z) - Q_n(Z_n)\right) > \eta/2 \quad \text{for all } z \notin B_\delta(Z_n).$$
Thus, $\sqrt n(\hat\theta - \theta_0)$ eventually belongs to the complement of $B_\delta(Z_n)$ with probability at most $3\epsilon$. Since we can make $\epsilon$ as small as we like by picking (a) sufficiently large $K$, (b) sufficiently large $n$, and (c) sufficiently small $\eta > 0$, it follows that
$$\limsup_n P^*\left\{\left|Z_n - \sqrt n(\hat\theta - \theta_0)\right| > \delta\right\} = 0.$$
Since this is true for any $\delta > 0$, it follows that $\sqrt n(\hat\theta - \theta_0) - Z_n = o_p(1)$.

A.3 Proof of Theorem 3

For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly. Consider
$$F_{g,n}(x) = \int_{\theta\in\Theta:\ g(\theta)\le x} p_n(\theta)\,d\theta.$$
Evaluate it at $x = g(\theta_0) + s/\sqrt n$ and change the variable of integration:
$$H_{g,n}(s) = F_{g,n}\!\left(g(\theta_0) + s/\sqrt n\right) = \int_{h\in H_n:\ g(\theta_0 + h/\sqrt n + U_n/\sqrt n)\le g(\theta_0) + s/\sqrt n} p_n^*(h)\,dh.$$
Define also
$$\bar H_{g,n}(s) = \int_{h\in\mathbb R^d:\ g(\theta_0 + h/\sqrt n + U_n/\sqrt n)\le g(\theta_0) + s/\sqrt n} p_\infty^*(h)\,dh, \qquad H_{g,\infty}(s) = \int_{h\in\mathbb R^d:\ \nabla_\theta g(\theta_0)'(h + U_n)\le s} p_\infty^*(h)\,dh.$$
By definition of the total variation of moments norm and Theorem 1,
$$\sup_s\left|H_{g,n}(s) - \bar H_{g,n}(s)\right| \to_p 0,$$
where the sup is taken over the support of $H_{g,n}(s)$.
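By the uniform continuity of the integral of the normal density with respect to the boundary of integration,
$$\sup_s\left|\bar H_{g,n}(s) - H_{g,\infty}(s)\right| \to_p 0, \qquad \text{which implies} \qquad \sup_s\left|H_{g,n}(s) - H_{g,\infty}(s)\right| \to_p 0,$$
where the sup is taken over the support of $H_{g,n}(s)$. The convergence of distribution functions implies the convergence of quantiles at continuity points of distribution functions, see e.g. Billingsley (1994), so
$$H_{g,n}^{-1}(\alpha) - H_{g,\infty}^{-1}(\alpha) \to_p 0.$$
Next observe that
$$H_{g,\infty}(s) = P\left\{\nabla_\theta g(\theta_0)'\,\mathcal N\!\left(U_n, J^{-1}(\theta_0)\right) \le s \;\middle|\; U_n\right\}, \qquad H_{g,\infty}^{-1}(\alpha) = \nabla_\theta g(\theta_0)'U_n + q_\alpha\sqrt{\nabla_\theta g(\theta_0)'J^{-1}(\theta_0)\nabla_\theta g(\theta_0)},$$
where $q_\alpha$ is the $\alpha$-quantile of $\mathcal N(0, 1)$. Recalling that we defined $c_{g,n}(\alpha) = F_{g,n}^{-1}(\alpha)$, by quantile equivariance with respect to monotone transformations, $H_{g,n}^{-1}(\alpha) = \sqrt n\left(c_{g,n}(\alpha) - g(\theta_0)\right)$, so that
$$\sqrt n\left(c_{g,n}(\alpha) - g(\theta_0)\right) = \nabla_\theta g(\theta_0)'U_n + q_\alpha\sqrt{\nabla_\theta g(\theta_0)'J^{-1}(\theta_0)\nabla_\theta g(\theta_0)} + o_p(1).$$
The rest of the result follows by the $\Delta$-method.

A.4 Proof of Theorem 4

In view of Assumption 4, it suffices to show that
$$\hat J^{-1} - J^{-1}(\theta_0) \to_p 0, \qquad (A.17)$$
and then conclude by the $\Delta$-method. Recall that $h = \sqrt n(\theta - \theta_0) - J_n^{-1}(\theta_0)\Delta_n(\theta_0)/\sqrt n$, and the localized quasi-posterior density for $h$ is
$$p_n^*(h) = n^{-d/2}\,p_n\!\left(h/\sqrt n + \theta_0 + U_n/\sqrt n\right).$$
Note also that
$$\hat J^{-1} = \int_\Theta n(\theta - \hat\theta)(\theta - \hat\theta)'\,p_n(\theta)\,d\theta = \int_{H_n}\left(h - \sqrt n(\hat\theta - \theta_0) + U_n\right)\left(h - \sqrt n(\hat\theta - \theta_0) + U_n\right)'p_n^*(h)\,dh,$$
and
$$J^{-1}(\theta_0) = \int_{\mathbb R^d} hh'\,p_\infty^*(h)\,dh.$$
We have, denoting $h = (h_1, \ldots, h_d)$ and $\tilde T_n = (\tilde T_{n1}, \ldots, \tilde T_{nd})$, where $\tilde T_n = \sqrt n(\hat\theta - \theta_0) - U_n$, for all $i, j \le d$:
(a) $\int_{H_n} h_ih_j\left(p_n^*(h) - p_\infty^*(h)\right)dh = o_p(1)$ by Theorem 1,
(b) $\int_{H_n^c} h_ih_j\,p_\infty^*(h)\,dh = o_p(1)$ by definition of $p_\infty^*$ and $J_n(\theta_0)$ being uniformly nonsingular,
(c) $\int_{H_n}|\tilde T_n|^2\left(p_n^*(h) - p_\infty^*(h)\right)dh = o_p(1)$ by Theorem 2,
(d) $\int_{H_n}|\tilde T_n|^2\,p_\infty^*(h)\,dh = o_p(1)$ by Theorem 2, definition of $p_\infty^*$, and $J_n(\theta_0)$ being nonsingular,
(e) $\int_{H_n}\tilde T_{ni}h_j\left(p_n^*(h) - p_\infty^*(h)\right)dh = o_p(1)$ by Theorems 1 and 2,
(f) $\int_{H_n}\tilde T_{ni}h_j\,p_\infty^*(h)\,dh = \tilde T_{ni}\cdot O_p(1) = o_p(1)$ by Theorems 1 and 2, definition of $p_\infty^*$, and $J_n(\theta_0)$ being uniformly nonsingular,
from which the required conclusion follows.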
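Theorems 3 and 4 suggest two simple summaries of the draws. The first is an equal-tailed interval for $g(\theta)$ built from the quasi-posterior quantiles $c_{g,n}(\alpha)$. The second is $n$ times the quasi-posterior covariance around $\hat\theta$, taken as the estimate $\hat J^{-1}$ in (A.17). The sketch below computes both. Several pieces are assumptions for illustration only: the i.i.d. normal draws that stand in for MCMC output, the linear-interpolation quantile rule, the 90% level, and the choice $g(\theta) = \theta_1 + \theta_2$.

```python
import math
import random


def scaled_covariance(draws, n):
    """n times the covariance of the draws around their mean (the quasi-posterior
    mean LTE): a sample analogue of n * int (theta - theta_hat)(theta - theta_hat)' p_n."""
    B, d = len(draws), len(draws[0])
    mean = [sum(t[k] for t in draws) / B for k in range(d)]
    return [[n * sum((t[i] - mean[i]) * (t[j] - mean[j]) for t in draws) / B
             for j in range(d)] for i in range(d)]


def empirical_quantile(values, a):
    """Sample a-quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = a * (len(s) - 1)
    i = int(math.floor(pos))
    if i + 1 >= len(s):
        return s[-1]
    return s[i] + (pos - i) * (s[i + 1] - s[i])


def equal_tailed_interval(draws, g, level=0.90):
    """[c_g(a/2), c_g(1 - a/2)] with a = 1 - level, from the draws of g(theta)."""
    values = [g(t) for t in draws]
    a = 1.0 - level
    return empirical_quantile(values, a / 2), empirical_quantile(values, 1 - a / 2)


# Stand-in for MCMC output (an assumption for illustration): i.i.d. draws from
# N(0, J^{-1}/n) with J^{-1} = diag(1, 4) and n = 400.
rng = random.Random(1)
n = 400
draws = [[rng.gauss(0.0, 1.0 / math.sqrt(n)), rng.gauss(0.0, 2.0 / math.sqrt(n))]
         for _ in range(20000)]
J_inv_hat = scaled_covariance(draws, n)                    # roughly diag(1, 4)
ci = equal_tailed_interval(draws, lambda t: t[0] + t[1])   # g(theta) = theta_1 + theta_2
```

In this stand-in, $g(\theta) \sim N(0, 5/n)$, so the interval should be close to $\pm 1.645\sqrt{5/400} \approx \pm 0.184$.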
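A.5 Proof of Proposition 1

Assumption 3 is directly implied by (4.1)-(4.4) and the uniform continuity of $Em_i(\theta)$, as shown in Lemma 1. It remains only to verify Assumption 4. Define the identity
$$L_n(\theta) - L_n(\theta_0) = \underbrace{-n\,g_n(\theta_0)'W(\theta_0)G(\theta_0)}_{\Delta_n(\theta_0)'}(\theta - \theta_0) - \frac12(\theta - \theta_0)'\underbrace{nG(\theta_0)'W(\theta_0)G(\theta_0)}_{nJ(\theta_0)}(\theta - \theta_0) + R_n(\theta). \qquad (A.18)$$
Next, given the definition of $\Delta_n(\theta_0)$ and $J(\theta_0)$, conditions i, ii, iii of Assumption 4 are immediate from conditions i-iii of Proposition 1. Condition iv is verified as follows. Condition iv of Assumption 4 can be succinctly stated as: for each $\epsilon > 0$ there exists a $\delta > 0$ such that
$$\limsup_n P^*\left\{\sup_{|\theta - \theta_0|\le\delta}\frac{|R_n(\theta)|}{1 + n|\theta - \theta_0|^2} > \epsilon\right\} < \epsilon.$$
This stochastic equicontinuity condition is equivalent to the following stochastic equicontinuity condition, see e.g. Andrews (1994a): for any $\delta_n \to 0$,
$$\sup_{|\theta - \theta_0|\le\delta_n}\frac{|R_n(\theta)|}{1 + n|\theta - \theta_0|^2} = o_p(1). \qquad (A.19)$$
This is weaker than condition (v) of Theorem 7.1 in Newey and McFadden (1994), which requires
$$\sup_{|\theta - \theta_0|\le\delta_n}\frac{|R_n(\theta)|}{\sqrt n|\theta - \theta_0| + n|\theta - \theta_0|^2} = o_p(1),$$
because
$$\frac{|R_n(\theta)|}{1 + n|\theta - \theta_0|^2} = \frac{|R_n(\theta)|}{\sqrt n|\theta - \theta_0| + n|\theta - \theta_0|^2}\left[\frac{\sqrt n|\theta - \theta_0| + n|\theta - \theta_0|^2}{1 + n|\theta - \theta_0|^2}\right], \qquad (A.20)$$
where the term in brackets is bounded by 2. Hence the arguments of the proof, except for several important differences, follow those of Theorem 7.2 in Newey and McFadden (1994). At first note that condition iv of Proposition 1 gives (where we let $g(\theta) = Eg_n(\theta)$), for any $\delta_n \to 0$,
$$\sup_{\theta\in B_{\delta_n}(\theta_0)}|\epsilon(\theta)| = o_p\!\left(\frac{1}{\sqrt n}\right), \qquad \text{where } \epsilon(\theta) = \frac{g_n(\theta) - g_n(\theta_0) - g(\theta)}{1 + \sqrt n|\theta - \theta_0|}. \qquad (A.21)$$
From (A.18), $R_n(\theta) = R_{1n}(\theta) + R_{2n}(\theta) + R_{3n}(\theta)$, where
$$R_{1n}(\theta) = n\left(-\frac12 g_n(\theta)'W_n(\theta)g_n(\theta) + \frac12 g_n(\theta_0)'W_n(\theta)g_n(\theta_0) + g_n(\theta_0)'W_n(\theta)G(\theta_0)(\theta - \theta_0) + \frac12(\theta - \theta_0)'G(\theta_0)'W_n(\theta)G(\theta_0)(\theta - \theta_0)\right),$$
$$R_{2n}(\theta) = n\left(\frac12 g_n(\theta_0)'\left(W_n(\theta_0) - W_n(\theta)\right)g_n(\theta_0)\right),$$
$$R_{3n}(\theta) = n\left(g_n(\theta_0)'\left(W(\theta_0) - W_n(\theta)\right)G(\theta_0)(\theta - \theta_0) + \frac12(\theta - \theta_0)'G(\theta_0)'\left(W(\theta_0) - W_n(\theta)\right)G(\theta_0)(\theta - \theta_0)\right).$$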
Verification of (A.19) for the terms $R_{2n}(\theta)$ and $R_{3n}(\theta)$ immediately follows from $\sqrt n\,g_n(\theta_0) = O_p(1)$ and from the continuity of $W(\theta)$ in $\theta$ and the uniform consistency of $W_n(\theta)$ in $\theta$ as assumed in condition i of Proposition 1, so that $W_n(\theta) - W(\theta_0) = o_p(1)$ uniformly in $\theta$ as $|\theta - \theta_0| \to 0$. It remains to check condition (A.19) for the term $R_{1n}(\theta)$. Note that
$$g_n(\theta) = \left(1 + \sqrt n|\theta - \theta_0|\right)\epsilon(\theta) + g(\theta) + g_n(\theta_0),$$
and substitute this into $R_{1n}(\theta)$ to decompose, with $x = |\theta - \theta_0|$,
$$\frac{R_{1n}(\theta)}{n} = \underbrace{-\frac12(1 + \sqrt n x)^2\,\epsilon(\theta)'W_n(\theta)\epsilon(\theta)}_{(a)}\ \underbrace{-\ (1 + \sqrt n x)\,\epsilon(\theta)'W_n(\theta)g_n(\theta_0)}_{(b)}\ \underbrace{-\ (1 + \sqrt n x)\,\epsilon(\theta)'W_n(\theta)g(\theta)}_{(c)}$$
$$\underbrace{-\ g_n(\theta_0)'W_n(\theta)\left(g(\theta) - G(\theta_0)(\theta - \theta_0)\right)}_{(d)}\ \underbrace{-\ \frac12 g(\theta)'\left(W_n(\theta) - W(\theta)\right)g(\theta) + \frac12(\theta - \theta_0)'G(\theta_0)'\left(W_n(\theta) - W(\theta)\right)G(\theta_0)(\theta - \theta_0)}_{(e)}$$
$$\underbrace{-\ \frac12\left(g(\theta)'W(\theta)g(\theta) - (\theta - \theta_0)'G(\theta_0)'W(\theta)G(\theta_0)(\theta - \theta_0)\right)}_{(f)}.$$
Using the inequalities, for $x \ge 0$,
$$\frac{(1 + \sqrt n x)^2}{1 + nx^2} \le 2, \qquad \frac{1 + \sqrt n x}{1 + nx^2} \le 2, \qquad \frac{\sqrt n x(1 + \sqrt n x)}{1 + nx^2} \le 2, \qquad \frac{\sqrt n x}{1 + nx^2} \le 1, \qquad \frac{nx^2}{1 + nx^2} \le 1,$$
and condition ii, which states that
$$g(\theta) = G(\theta_0)(\theta - \theta_0) + o(|\theta - \theta_0|), \qquad (A.22)$$
each of these terms can be dealt with separately. With all suprema taken over $\theta \in B_{\delta_n}(\theta_0)$, each term multiplied by $n/(1 + nx^2)$ satisfies:
(a) it is bounded by $\sup n|\epsilon(\theta)|^2|W_n(\theta)| = o_p(1)$, by (A.21) and condition i;
(b) it is bounded by $2\sup\sqrt n|\epsilon(\theta)|\,|W_n(\theta)|\,\sqrt n|g_n(\theta_0)| = o_p(1)$, by (A.21), condition i, and $\sqrt n\,g_n(\theta_0) = O_p(1)$;
(c) it is bounded by $2\sup\sqrt n|\epsilon(\theta)|\,|W_n(\theta)|\,|g(\theta)|/x = o_p(1)$, by (A.22), (A.21), and condition i;
(d) it is bounded by $\sup\sqrt n|g_n(\theta_0)|\,|W_n(\theta)|\,o(x)/x = o_p(1)$, by (A.22), condition i, and $\sqrt n\,g_n(\theta_0) = O_p(1)$;
(e) it is bounded by $\sup\frac12\left(|g(\theta)|^2/x^2 + |G(\theta_0)|^2\right)|W_n(\theta) - W(\theta)| = o_p(1)$, by (A.22) and condition i;
(f) it is bounded by $\sup\frac12\left|g(\theta)'W(\theta)g(\theta) - (\theta - \theta_0)'G(\theta_0)'W(\theta)G(\theta_0)(\theta - \theta_0)\right|/x^2 = o(1)$, by replacing $g(\theta)$ with $G(\theta_0)(\theta - \theta_0) + o(|\theta - \theta_0|)$ via (A.22), and condition i.
Verification of (A.19) for the term $R_{1n}(\theta)$ now follows by putting these terms together.

A.6 Proof of Proposition 2

Verification of Assumption 3 is standard given the stated conditions and is subsumed as a step in the consistency proofs of extremum estimators based on GEL in Kitamura and Stutzer (1997) for cases when $s$ is finite and Kitamura (1997) for cases when $s$ takes on infinite values. We shall not repeat it here. Next, we will verify Assumption 4. Define
$$\hat\gamma(\theta) = \arg\inf_{\gamma\in\mathbb R^p} L_n(\theta, \gamma).$$
It will suffice to show that, uniformly in $\theta \in B_{\delta_n}(\theta_0)$ for any $\delta_n \to 0$, we have the GMM set-up:
$$L_n(\theta) = -\frac12\left(\frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta)\right)'\left(V(\theta_0) + o_p(1)\right)^{-1}\left(\frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta)\right), \qquad (A.23)$$
where $V(\theta_0) = Em_i(\theta_0)m_i(\theta_0)'$. The Assumptions 4.i-iii follow immediately from the conditions of Proposition 2, and Assumption 4.iv is verified exactly as in the proof of Proposition 1, given the reduction to the GMM case. Indeed, defining $g_n(\theta) = \frac1n\sum_{i=1}^n m_i(\theta)$, the Donsker property assumed in condition iv implies that for any $\epsilon > 0$ there is $\delta > 0$ such that
$$\limsup_n P^*\left\{\sup_{\theta\in B_\delta(\theta_0)}\sqrt n\left|g_n(\theta) - g_n(\theta_0) - \left(Eg_n(\theta) - Eg_n(\theta_0)\right)\right| > \epsilon\right\} < \epsilon,$$
which implies
$$\limsup_n P^*\left\{\sup_{\theta\in B_\delta(\theta_0)}\frac{\sqrt n\left|g_n(\theta) - g_n(\theta_0) - \left(Eg_n(\theta) - Eg_n(\theta_0)\right)\right|}{1 + \sqrt n|\theta - \theta_0|} > \epsilon\right\} < \epsilon,$$
which is condition iv in Proposition 1. The rest of the arguments follow that in the proof of Proposition 1. It only remains to show the requisite expansion (A.23). We first show that $\hat\gamma(\theta_n) \to_p 0$ for any $\theta_n \to_p \theta_0$. For that purpose we use the convexity lemma, which was obtained by C. Geyer, and can be found in Knight (1999).
Convexity Lemma. Suppose $Q_n$ is a sequence of lower-semi-continuous convex $\bar{\mathbb R}$-valued random functions, defined on $\mathbb R^d$, and let $\mathcal D$ be a countable dense subset of $\mathbb R^d$. If $Q_n$ weakly converges to $Q_\infty$ in $\bar{\mathbb R}$ marginally (in the finite-dimensional sense) on $\mathcal D$, where $Q_\infty$ is lower-semi-continuous convex and finite on an open non-empty set a.s., then
$$\arg\inf_{z\in\mathbb R^d} Q_n(z) \to_d \arg\inf_{z\in\mathbb R^d} Q_\infty(z),$$
provided the latter is uniquely defined a.s. in $\mathbb R^d$.

Next, we show that $\hat\gamma(\theta_n) \to_p 0$. Define $F = \{\gamma : Es[m_i(\theta_0)'\gamma] < \infty\}$ and $F^c = \{\gamma : Es[m_i(\theta_0)'\gamma] = \infty\}$. By convexity and lower-semicontinuity of $s$, $F$ is convex, open, and its boundary $\partial F$ is nowhere dense in $\mathbb R^p$. Thus, for $\gamma \in F$, $Es[m_i(\theta)'\gamma] < \infty$ for all $\theta \in B_\delta(\theta_0)$ and some $\delta > 0$; this is implied by condition iii. Thus, for a given $\gamma \in F$ and any $\theta_n \to_p \theta_0$,
$$\frac1n\sum_{i=1}^n s[m_i(\theta_n)'\gamma] \to_p Es[m_i(\theta_0)'\gamma].$$
This follows from the uniform law of large numbers implied by
1. $\{s[m_i(\theta)'\gamma], \theta \in B_\delta(\theta_0)\}$ being a Donsker class for some sufficiently small $\delta > 0$,
2. $Em_i(\theta) = \int x\,dP[m_i(\theta) \le x]$ being continuously differentiable in $\theta$ for all $\theta \in B_\delta(\theta_0)$, some $\delta > 0$, by condition iii.
The function set in 1 is Donsker because (a) $\{m_i(\theta), \theta \in B_\delta(\theta_0)\}$ is a Donsker class by condition iv; (b) $m_i(\theta)'\gamma$, for a given $\gamma$, is a uniform Lipschitz transform of that class; (c) $s$ is a uniform Lipschitz function over any compact subset of $\mathcal V$, by assumption on $s$ (recall that $\mathcal V$ is defined as the open convex set on which $s$ is finite); (d) $m_i(\theta)'\gamma \in M$ for some compact $M \subset \mathcal V$ and a given $\gamma \in F$, by condition iii and the construction of $F$; and (e) by Theorem 2.10.6 in van der Vaart and Wellner (1996), a uniform Lipschitz transform of a Donsker class is itself Donsker. Now take $\gamma \in F^c \setminus \partial F$. Then, wp $\to 1$,
$$\frac1n\sum_{i=1}^n s[m_i(\theta_n)'\gamma] = \infty \to_p Es[m_i(\theta_0)'\gamma] = \infty.$$
Now take all the rational vectors $\gamma \in \mathbb R^p \setminus \partial F$ as the set $\mathcal D$ appearing in the statement of the Convexity Lemma, and conclude that
$$\hat\gamma(\theta_n) \to_p \arg\inf_{\gamma} Es[m_i(\theta_0)'\gamma] = 0.$$
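order condition for $\gamma(\theta_n)$ in order to obtain the expression for its form. Note that
$$0 = \frac{1}{\sqrt{n}}\sum_{i=1}^n \nabla s\big(\gamma(\theta_n)'m_i(\theta_n)\big)\, m_i(\theta_n) = \frac{1}{\sqrt{n}}\sum_{i=1}^n m_i(\theta_n) + V_n \sqrt{n}\,\gamma(\theta_n), \tag{A.24}$$
where
$$V_n = \frac{1}{n}\sum_{i=1}^n \nabla^2 s\big(\bar\gamma(\theta_n)'m_i(\theta_n)\big)\, m_i(\theta_n)\, m_i(\theta_n)'$$
for some $\bar\gamma(\theta_n)$ between $0$ and $\gamma(\theta_n)$, which is different from row to row of the matrix $V_n$. Then
$$V_n \to_p V(\theta_0) = E\, m_i(\theta_0)\, m_i(\theta_0)'.$$
This follows from the uniform law of large numbers implied by
1. $\{\nabla^2 s(\gamma'm_i(\theta'))\, m_i(\theta)m_i(\theta)',\ (\theta',\gamma,\theta) \in B_{\delta_1}(\theta_0) \times B_{\delta_2}(0) \times B_{\delta_3}(\theta_0)\}$, where the $\delta_j > 0$ are sufficiently small, being a Donsker class wp $\to 1$;$^{17}$
2. $E\, m_i(\theta)m_i(\theta)' = \int xx'\, dP[m_i(\theta) \le x]$ being a continuous function of $\theta$, by condition i;
3. $E\,\nabla^2 s(\gamma'm_i(\theta^*))\, m_i(\theta)m_i(\theta)' = E\,\nabla^2 s(0)\, m_i(\theta)m_i(\theta)' + o(1)$ uniformly in $(\theta,\theta^*) \in B_\delta(\theta_0) \times B_\delta(\theta_0)$, for $\delta > 0$ sufficiently small and any $\gamma \to 0$, by the assumptions on $s$ and condition iii.
$^{17}$ Recall that $\mathcal{V}$ is defined as the open convex set on which $s$ is finite.

Claim 1 is verified by applying exactly the same logic as in steps (a)-(e) above; for the sake of brevity, this will not be repeated. Therefore, wp $\to 1$,
$$\gamma(\theta_n) = -(V_n)^{-1}\frac{1}{n}\sum_{i=1}^n m_i(\theta_n) = -\big(V(\theta_0) + o_p(1)\big)^{-1}\frac{1}{n}\sum_{i=1}^n m_i(\theta_n). \tag{A.25}$$
Consider the second order expansion
$$L_n(\theta_n, \gamma(\theta_n)) = \sqrt{n}\,\gamma(\theta_n)'\,\frac{1}{\sqrt{n}}\sum_{i=1}^n m_i(\theta_n) + \frac{1}{2}\sqrt{n}\,\gamma(\theta_n)'\,\tilde V_n\,\sqrt{n}\,\gamma(\theta_n), \tag{A.26}$$
where $\tilde V_n$ is defined as $V_n$ with $\bar\gamma(\theta_n)$ replaced by some $\tilde\gamma(\theta_n)$ between $0$ and $\gamma(\theta_n)$, which is different from row to row of the matrix $\tilde V_n$. By the preceding argument, $\tilde V_n \to_p V(\theta_0)$. Inserting (A.25) and $\tilde V_n = V(\theta_0) + o_p(1)$ into (A.26), we obtain the required expansion (A.23).

Proof of Proposition 3

Assumption 3 is assumed, so we need to verify Assumption 4. Define the identity
$$L_n(\theta) - L_n(\theta_0) = \underbrace{\sum_{i=1}^n \dot m_i(\theta_0)'}_{\Delta_n(\theta_0)'}(\theta-\theta_0) + \frac{1}{2}(\theta-\theta_0)'\, n\, \underbrace{\nabla_{\theta\theta'} E\, m_i(\theta_0)}_{-J(\theta_0)}\,(\theta-\theta_0) + R_n(\theta). \tag{A.27}$$
Assumptions 4.i-iii then follow immediately from conditions i and ii. Assumption 4.iv is verified as follows. The remainder term $R_n(\theta)$ is given by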
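the following decomposition:
$$R_n(\theta) = \underbrace{\sum_{i=1}^n \Big(m_i(\theta) - m_i(\theta_0) - E m_i(\theta) + E m_i(\theta_0) - \dot m_i(\theta_0)'(\theta-\theta_0)\Big)}_{R_{1n}(\theta)} + \underbrace{n\big(E m_i(\theta) - E m_i(\theta_0)\big) + \frac{1}{2}(\theta-\theta_0)'\, nJ(\theta_0)(\theta-\theta_0)}_{R_{2n}(\theta)}.$$
It suffices to verify Assumption 4.iv separately for $R_{1n}(\theta)$ and $R_{2n}(\theta)$. Since
$$R_{2n}(\theta) = -\frac{1}{2}\, n(\theta-\theta_0)'\big[J(\theta') - J(\theta_0)\big](\theta-\theta_0)$$
for some $\theta'$ on the line connecting $\theta$ and $\theta_0$, verification of Assumption 4.iv for $R_{2n}(\theta)$ follows immediately from continuity of $J(\theta)$ over a ball at $\theta_0$.

To show Assumption 4.iv-(b) for $R_{1n}(\theta)$, we note that for any given $M > 0$
$$\limsup_n P^*\Big\{\sup_{|\theta-\theta_0|\le M/\sqrt{n}} |R_{1n}(\theta)| > \varepsilon\Big\} \le \limsup_n P^*\Big\{\sup_{|\theta-\theta_0|\le M/\sqrt{n}} \frac{|R_{1n}(\theta)|}{\sqrt{n}\,|\theta-\theta_0|} > \frac{\varepsilon}{M}\Big\} = 0, \tag{A.28}$$
where the last conclusion follows from two observations. First, note that
$$Z_n(\theta) \equiv \frac{R_{1n}(\theta)}{\sqrt{n}\,|\theta-\theta_0|} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{m_i(\theta) - m_i(\theta_0) - \big(E m_i(\theta) - E m_i(\theta_0)\big) - \dot m_i(\theta_0)'(\theta-\theta_0)}{|\theta-\theta_0|}$$
is Donsker by assumption, that is, it converges in $\ell^\infty(B_\delta(\theta_0))$ to a tight Gaussian process $Z$. The process $Z$ has uniformly continuous paths with respect to the semimetric $\rho$ given by
$$\rho^2(\theta_1,\theta_2) = E\big(Z(\theta_1) - Z(\theta_2)\big)^2,$$
so that $\rho(\theta,\theta_0) \to 0$ if $\theta \to \theta_0$. Thus almost all sample paths of $Z$ are continuous at $\theta_0$. Second, since by assumption
$$E\big[m_i(\theta) - m_i(\theta_0) - \dot m_i(\theta_0)'(\theta-\theta_0)\big]^2 = o(|\theta-\theta_0|^2),$$
we have for any $\theta_n \to \theta_0$
$$E\, Z_n(\theta_n)^2 = \frac{o(|\theta_n-\theta_0|^2)}{|\theta_n-\theta_0|^2} \to 0,$$
and therefore $Z(\theta_0) = 0$. Hence, for any $\theta' \to_p \theta_0$, we have by the extended continuous mapping theorem
$$Z_n(\theta') \to_d Z(\theta_0) = 0, \quad \text{that is,} \quad Z_n(\theta') \to_p 0. \tag{A.29}$$
This shows (A.28).

To prove Assumption 4.iv-(a) for $R_{1n}(\theta)$, we need to show that for some $\delta > 0$ and constant $M$
$$\limsup_n P^*\Big\{\sup_{M/\sqrt{n}\le|\theta-\theta_0|\le\delta} \frac{|R_{1n}(\theta)|}{n\,|\theta-\theta_0|^2} > \varepsilon\Big\} < \varepsilon. \tag{A.30}$$
Using that $M/\sqrt{n} \le |\theta-\theta_0|$, bound the left-hand side by
$$\limsup_n P^*\Big\{\sup_{M/\sqrt{n}\le|\theta-\theta_0|\le\delta} \frac{|R_{1n}(\theta)|}{\sqrt{n}\,|\theta-\theta_0|\, M} > \varepsilon\Big\} = \limsup_n P^*\Big\{\sup_{M/\sqrt{n}\le|\theta-\theta_0|\le\delta} |Z_n(\theta)|\,\frac{1}{M} > \varepsilon\Big\}, \tag{A.31}$$
where, for any given $\varepsilon > 0$, in order to make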
the last inequality true, we can either make $\delta$ sufficiently small, by the property (A.29) of $Z_n$, or make $M$ sufficiently large, by the property $Z_n = O_p(1)$.

Appendix on Computation

B.1 A computational lemma

In this section we record some formal results on MCMC computation of the quasi-posterior quantities.

LEMMA 3. Suppose the chain $(\theta^{(j)}, j \le B)$ is produced by the Metropolis-Hastings (MH) algorithm with $q$ such that $q(\theta|\theta') > 0$ for each $(\theta,\theta')$. Suppose also that $P\{\rho(\theta^{(j)},\xi) = 1\} > 0$ for all $j > t_0$. Then
1. $p_n(\cdot)$ is the stationary density of the chain;
2. the chain is ergodic with the limit marginal distribution given by $p_n(\cdot)$:
$$\lim_{B\to\infty} \sup_A \Big| P\big(\theta^{(B)} \in A \mid \theta^{(0)}\big) - \int_A p_n(\theta)\, d\theta \Big| = 0,$$
where the supremum is taken over the Borel sets;
3. for any $p_n$-integrable function $g$,
$$\frac{1}{B}\sum_{j=1}^B g(\theta^{(j)}) \to_p \int_\Theta g(\theta)\, p_n(\theta)\, d\theta.$$

Proof. The result is immediate from Theorem 6.2.5 in Robert and Casella (1999).

An immediate consequence of this lemma is the following result.

LEMMA 4. Suppose Assumptions 1 and 2 hold, and suppose the chain $(\theta^{(j)}, j \le B)$ satisfies the conditions of Lemma 3. Then, for any convex and $p_n$-integrable loss function $\rho_n(\cdot)$,
$$\arg\inf_{\zeta\in\Theta} \frac{1}{B}\sum_{j=1}^B \rho_n(\theta^{(j)} - \zeta) \to_p \hat\theta = \arg\inf_{\zeta\in\Theta} \int_\Theta \rho_n(\theta - \zeta)\, p_n(\theta)\, d\theta,$$
provided that $\hat\theta$ is uniquely defined.

Proof. By Lemma 3 we have the pointwise convergence of the objective function: for any $\zeta$,
$$\frac{1}{B}\sum_{j=1}^B \rho_n(\theta^{(j)} - \zeta) \to_p \int_\Theta \rho_n(\theta - \zeta)\, p_n(\theta)\, d\theta,$$
which implies the result by the Convexity Lemma, since $\zeta \mapsto \int_\Theta \rho_n(\theta-\zeta)\, p_n(\theta)\, d\theta$ is convex by convexity of $\rho_n$.

B.2 Quasi-Bayes Estimation and Simulated Annealing

The relation between drawing from the shape of a likelihood surface and optimizing to find the mode of the likelihood function is well known. It is well established (see, e.g., Robert and Casella (1999)) that
$$\lim_{\lambda\to\infty} \frac{\int_\Theta \theta\, e^{\lambda L_n(\theta)}\pi(\theta)\, d\theta}{\int_\Theta e^{\lambda L_n(\theta)}\pi(\theta)\, d\theta} = \arg\max_{\theta\in\Theta} L_n(\theta). \tag{B.1}$$
Essentially, as $\lambda \to \infty$, the sequence of probability measures
$$\frac{e^{\lambda L_n(\theta)}\pi(\theta)}{\int_\Theta e^{\lambda L_n(\theta)}\pi(\theta)\, d\theta} \tag{B.2}$$
converges to the generalized Dirac probability measure concentrated at $\arg\max_{\theta\in\Theta} L_n(\theta)$.

The difficulty of nonlinear
optimization has been an important issue in econometrics (Berndt et al. (1974), Sims (1999)). The simulated annealing algorithm (see, e.g., Press et al. (1992), Goffe et al. (1994)) is usually considered a generic optimization method. It is an implementation of the simulation-based optimization (B.1) with a uniform prior $\pi(\theta) = c$ on the parameter space $\Theta$. At each temperature level $1/\lambda$, the simulated annealing routine uses a large number of Metropolis-Hastings steps to draw from the quasi-distribution (B.2). The temperature parameter is then decreased slowly while the Metropolis steps are repeated, until convergence criteria for the optimum are achieved. Interestingly, the simulated annealing algorithm has been widely used in optimization of non-likelihood-based semiparametric objective functions. In principle, if the temperature parameter is decreased at an arbitrarily slow rate (that depends on the criterion function), simulated annealing can find the global optimum of non-smooth objective functions that may have many local extrema. Controlling the rate of temperature reduction is a very delicate matter and is certainly crucial to the performance of the algorithm with highly nonsmooth objective functions. On the other hand, as Theorems 1 and 2 apply equally to (B.2), the results of this paper show that we may fix the temperature parameter $1/\lambda$ at a positive constant and then compute the quasi-posterior medians or means for (B.2) using Metropolis steps. These estimates can be used in place of the exact maximum: they are consistent and asymptotically normal, and possess the same limiting distribution as the exact maximum. The interpretation of the simulated annealing algorithm as an implementation of (B.2) also suggests that, for some problems with special structures, other MCMC methods, such as the Gibbs sampler, may be used to replace the Metropolis-Hastings step in the simulated annealing algorithm.
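As a rough illustration of Lemmas 3 and 4 and of the fixed-temperature remark above, the Python sketch below (standard library only) runs a componentwise random-walk Metropolis chain whose target is proportional to $e^{\lambda L_n(\theta)}$ under a flat prior truncated to a box, and reads off the quasi-posterior mean and median. The componentwise updating, the box of half-width 10 around the starting value, and the retuning of the proposal scale toward a 50% rejection rate loosely follow the description in B.3; the function name `quasi_posterior_draws`, the LAD criterion $L_n(\theta) = -\sum_i |y_i - \theta|$, the simulated data, and every tuning constant are hypothetical choices made for this sketch, not the authors' code. The proposal scale is retuned only during burn-in, so the retained draws come from a fixed Metropolis kernel rather than an adaptive one.

```python
import math
import random
import statistics


def quasi_posterior_draws(L_n, theta_start, half_width, n_keep, n_burn,
                          lam=1.0, init_scale=1.0, adapt_every=100, seed=0):
    """Componentwise random-walk Metropolis chain whose stationary density is
    proportional to exp(lam * L_n(theta)) under a flat prior truncated to the
    box theta_start +/- half_width. Returns the n_keep post-burn-in draws."""
    rng = random.Random(seed)
    d = len(theta_start)
    lower = [t - half_width for t in theta_start]
    upper = [t + half_width for t in theta_start]
    theta = list(theta_start)
    current = L_n(theta)
    sigma = [init_scale] * d            # proposal standard deviation per coordinate
    accepted = [0] * d
    kept = []
    for sweep in range(n_burn + n_keep):
        for k in range(d):              # update one coordinate at a time
            xi = list(theta)
            xi[k] = rng.gauss(theta[k], sigma[k])
            if not lower[k] <= xi[k] <= upper[k]:
                continue                # zero prior mass outside the box: reject
            proposed = L_n(xi)
            # Symmetric proposal, so the acceptance probability is
            # min(1, exp(lam * (L_n(xi) - L_n(theta)))).
            if rng.random() < math.exp(min(0.0, lam * (proposed - current))):
                theta, current = xi, proposed
                accepted[k] += 1
        # Burn-in only: rescale each proposal toward roughly 50% acceptance.
        if sweep < n_burn and (sweep + 1) % adapt_every == 0:
            for k in range(d):
                rate = accepted[k] / adapt_every
                sigma[k] *= 1.2 if rate > 0.5 else 0.8
                accepted[k] = 0
        if sweep >= n_burn:
            kept.append(list(theta))
    return kept


# Hypothetical example: the LAD criterion L_n(theta) = -sum_i |y_i - theta|,
# whose exact maximizer is the sample median.
data_rng = random.Random(1)
y = [data_rng.gauss(1.0, 1.0) for _ in range(200)]


def L_n(theta):
    return -sum(abs(yi - theta[0]) for yi in y)


start = [statistics.fmean(y)]           # sample mean as the starting value
draws = quasi_posterior_draws(L_n, start, half_width=10.0,
                              n_keep=5000, n_burn=2000)
chain = [t[0] for t in draws]
post_mean = statistics.fmean(chain)     # LTE under squared loss
post_median = statistics.median(chain)  # LTE under absolute loss
sample_median = statistics.median(y)    # exact LAD maximizer
print(post_mean, post_median, sample_median)
```

For this criterion the exact maximizer is the sample median, so the quasi-posterior mean and median printed at the end should fall close to it; this is the sense in which, with $\lambda$ held fixed, the Laplace-type estimates can stand in for the exact maximum. Setting `lam` above 1 mimics a lower annealing temperature and concentrates the draws more tightly around the maximizer.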
B.3 Details of Computation in Monte-Carlo Examples

The parameter space is taken to be $\Theta = [\theta_0 \pm 10]$. The transition kernel is a Normal density, and the flat prior is truncated to $\Theta$. Each parameter is updated via a Gibbs-Metropolis procedure, which slightly modifies the basic Metropolis-Hastings algorithm: for $k = 1, \ldots, d$, a draw of $\xi_k$ is made from the univariate normal density $q(\xi_k \mid \theta_k^{(j)})$, and then the candidate value $\xi$, consisting of $\xi_k$ and $\theta_{-k}^{(j)}$, replaces $\theta^{(j)}$ with probability $\rho(\theta^{(j)}, \xi)$ specified in the text. The variance parameter of $q$ is adjusted every 100 draws (in the second simulation example and the empirical example) or every 200 draws (in the first simulation example), so that the rejection probability is roughly 50%. The first $N \times d$ draws (the burn-in stage) are discarded, and the remaining $N \times d$ draws are used in the computation of estimates and intervals. We use $N = 5{,}000$ in the first simulation example and the empirical example, and $N = 10{,}000$ in the second simulation example. The starting value is the OLS estimate in all examples. To give an idea of computational expense, computing one set of estimates takes 20-40 seconds, depending on the example. All of the code that we used to produce the figures, simulation, and empirical results is available from the authors.

Notation and Terms

$\to_p$: convergence in (outer) probability $P^*$
$\to_d$: convergence in distribution under $P^*$
wp $\to 1$: with inner probability $P_*$ converging to one
$A \sim B$: asymptotic equivalence, $\lim AB^{-1} = I$
$B_\delta(x)$: ball centered at $x$ of radius $\delta$
$I$: identity matrix
$N(0, a)$: normal random vector with mean $0$ and variance matrix $a$
Donsker class: here this means that the empirical process $f \mapsto \frac{1}{\sqrt{n}}\sum_{i=1}^n \big(f(W_i) - Ef(W_i)\big)$ is asymptotically Gaussian in $\ell^\infty(\mathcal{F})$; see van der Vaart (1999)
$\ell^\infty(\mathcal{F})$: metric space of bounded functions over $\mathcal{F}$; see van der Vaart (1999)
$A > 0$: $A$ is positive definite, when $A$ is a matrix
mineig$(A)$: minimum eigenvalue of matrix $A$

References

Abadie, A., 1995.
Changes in Spanish labor income structure during the 1980s: a quantile regression approach. CEMFI Working Paper.
Amemiya, T., 1977. The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equation model. Econometrica 45 (4), 955-968.
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press.
Anderson, T. W., 1955. The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities. Proc. Amer. Math. Soc. 6, 170-176.
Andrews, D. W. K., 1994a. Empirical process methods in econometrics. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North Holland, pp. 2248-2292.
Andrews, D. W. K., 1994b. The large sample correspondence between classical hypothesis tests and Bayesian posterior odds tests. Econometrica 62 (5), 1207-1232.
Andrews, D. W. K., 1997. A stopping rule for the computation of generalized method of moments estimators. Econometrica 65 (4), 913-931.
Andrews, D. W. K., 1999. Estimation when a parameter is on a boundary. Econometrica 67, 1341-1383.
Berger, J. O., 2002. Bayesian analysis: a look at today and thoughts of tomorrow. In: Statistics in the 21st Century. Chapman and Hall, New York, pp. 275-290.
Berndt, E., Hall, B., Hall, R., Hausman, J., 1974. Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3 (4), 653-665.
Bernstein, S., 1917. Theory of Probability (Russian). Fourth Edition (1946), Gostekhizdat, Moscow-Leningrad.
Berry, S., Levinsohn, J., Pakes, A., 1995. Automobile prices in market equilibrium. Econometrica 63, 841-890.
Bickel, P. J., Yahav, J. A., 1969. Some contributions to the asymptotic theory of Bayes solutions. Z. Wahrsch. Verw. Geb. 11, 257-276.
Billingsley, P., 1994. Probability and Measure, 3rd Ed. John Wiley and Sons.
Buchinsky, M., 1991. Theory and practice of quantile regression. Ph.D. dissertation, Department of Economics, Harvard University.
Buchinsky, M., Hahn, J., 1998. An alternative estimator for the censored regression model. Econometrica 66, 653-671.
Bunke, O., Milhaud, X., 1998. Asymptotic behavior of Bayes estimates under possibly incorrect models. The Annals of Statistics 26 (2), 617-644.
Chamberlain, G., Imbens, G., 1997. Nonparametric applications of Bayesian inference. NBER Working Paper.
Chernozhukov, V., Hansen, C., 2001. An IV model of quantile treatment effects. MIT Department of Economics Working Paper.
Chernozhukov, V., Umantsev, L., 2001. Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics 26, 271-292.
Chib, S., 2001. Markov chain Monte Carlo methods: Computation and inference. In: Heckman, J. J., Leamer, E. (Eds.), Handbook of Econometrics, Vol. 5, Chapter 5. North Holland, pp. 3564-3634.
Christoffersen, P., Hahn, J., Inoue, A., 1999. Testing, comparing and combining value at risk measures. Working Paper, Wharton School, University of Pennsylvania.
Diaconis, P., Freedman, D., 1986. On the consistency of Bayes estimates. Annals of Statistics 14, 1-26.
Doksum, K. A., Lo, A. Y., 1990. Consistent and robust Bayes procedures for location based on partial information. Ann. Statist. 18 (1), 443-453.
Engle, R., Manganelli, S., 2001. CAViaR: Conditional value at risk by regression quantiles. Working Paper, Department of Economics, UC San Diego.
Fitzenberger, B., 1997. A guide to censored quantile regressions. In: Robust Inference, Handbook of Statistics, Vol. 14. North-Holland, Amsterdam, pp. 405-437.
Gallant, A. R., White, H., 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, Oxford.
Geweke, J., Keane, M., 2001. Computationally intensive methods for integration in econometrics. In: Heckman, J. J., Leamer, E. (Eds.), Handbook of Econometrics, Vol. 5, Chapter 5. North Holland, pp. 3465-3564.
Goffe, W. L., Ferrier, G. D., Rogers, J., 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60, 65-99.
Hahn, J., 1997. Bayesian bootstrap of the quantile regression estimator: a large sample study. Internat. Econom. Rev. 38 (4), 795-808.
Hansen, L., Heaton, J., Yaron, A., 1996. Finite-sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262-280.
Hansen, L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50 (4), 1029-1054.
Hogg, R. V., 1975. Estimates of percentile regression lines using salary data. Journal of the American Statistical Association 70, 56-59.
Huber, P. J., 1973. Robust regression: Asymptotics, conjectures, and Monte Carlo. Annals of Statistics 1, 799-821.
Ibragimov, I., Has'minskii, R., 1981. Statistical Estimation: Asymptotic Theory. Springer Verlag.
Imbens, G., 1997. One-step estimators for over-identified generalized method of moments models. Rev. Econom. Stud. 64 (3), 359-383.
Imbens, G., Spady, R., Johnson, P., 1998. Information theoretic approaches to inference in moment condition models. Econometrica 66, 333-357.
Jureckova, J., 1977. Asymptotic relations of M-estimators and R-estimators in linear regression models. Annals of Statistics 5, 464-472.
Khan, S., Powell, J. L., 2001. Two step estimation of semiparametric censored regression models. Journal of Econometrics 103, 73-110.
Kim, J.-Y., 1998. Large sample properties of posterior densities, Bayesian information criterion and the likelihood principle in nonstationary time series models. Econometrica 66 (2), 359-380.
Kim, J.-Y., 2002. Limited information likelihood and Bayesian analysis. Journal of Econometrics, 175-193.
Kitamura, Y., 1997. Empirical likelihood methods with weakly dependent processes. Ann. Statist. 25 (5), 2084-2102.
Kitamura, Y., Stutzer, M., 1997. An information-theoretic alternative to generalized method of moments estimation. Econometrica 65, 861-874.
Knight, K., 1999. Epi-convergence and stochastic equisemicontinuity. Working Paper, Department of Statistics, University of Toronto.
Koenker, R., 1994. Confidence intervals for quantile regression. In: Proceedings of the 5th Prague Symposium on Asymptotic Statistics. Physica-Verlag, Heidelberg, pp. 10-20.
Koenker, R., 1998. Treating the treated, varieties of causal analysis. Lecture Note, Department of Economics, University of Illinois.
Koenker, R., Bassett, G. S., 1978. Regression quantiles. Econometrica 46, 33-50.
Kottas, A., Gelfand, A., 2001. Bayesian semiparametric median regression modeling. Journal of the American Statistical Association 96, 1458-1468.
Lehmann, E., Casella, G., 1998. Theory of Point Estimation. Springer.
Macurdy, T., Timmins, C., 2001. Bounding the influence of attrition on the intertemporal wage variation in the NLSY. Working Paper, Department of Economics, Yale University.
Mood, A. M., 1950. Introduction to the Theory of Statistics. McGraw-Hill Book Company, Inc.
Newey, W. K., 1991. Uniform convergence in probability and stochastic equicontinuity. Econometrica 59 (4), 1161-1167.
Newey, W. K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North Holland, pp. 2113-2241.
Newey, W. K., Powell, J. L., 1990. Efficient estimation of linear and type I censored regression models under conditional quantile restrictions. Econometric Theory 6, 295-317.
Newey, W. K., Smith, R., 2001. Higher order properties of GMM and generalized empirical likelihood estimators. Working Paper, Department of Economics, MIT.
Newey, W. K., West, K. D., 1987. A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55 (3), 703-708.
Owen, A., 1989. Empirical likelihood ratio confidence regions. In: Proceedings of the 47th Session of the International Statistical Institute, Book 3 (Paris, 1989). Vol. 53. pp. 373-393.
Owen, A., 1990. Empirical likelihood ratio confidence regions. Ann. Statist. 18 (1), 90-120.
Owen, A., 1991. Empirical likelihood for linear models. Ann. Statist. 19 (4), 1725-1747.
Owen, A., 2001. Empirical Likelihood. Chapman and Hall/CRC.
Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of optimization estimators. Econometrica 57 (5), 1027-1057.
Phillips, P. C. B., Ploberger, W., 1996. An asymptotic theory of Bayesian inference for time series. Econometrica 64 (2), 381-412.
Pollard, D., 1991. Asymptotics for least absolute deviation regression estimators. Econometric Theory 7, 186-199.
Pötscher, B. M., Prucha, I. R., 1997. Dynamic Nonlinear Econometric Models. Springer-Verlag, Berlin.
Powell, J. L., 1984. Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 303-325.
Press, W., Teukolsky, S. A., Vetterling, W., Flannery, B., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge.
Qin, J., Lawless, J., 1994. Empirical likelihood and general estimating equations. Annals of Statistics 22, 300-325.
Robert, C. P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer.
Rousseeuw, P. J., Hubert, M., 1999. Regression depth. J. Amer. Statist. Assoc. 94 (446), 388-433.
Sims, C., 1999. Adaptive Metropolis-Hastings, or Monte Carlo kernel estimation. Working Paper, Department of Economics, Princeton University.
Stigler, S. M., 1975. Studies in the history of probability and statistics. XXXIV. Napoleonic statistics: the work of Laplace. Biometrika 62 (2), 503-517.
van Aelst, S., Rousseeuw, P. J., Hubert, M., Struyf, A., 2002. The deepest regression method. J. Multivariate Anal. 81 (1), 138-166.
van der Vaart, A., 1999. Asymptotic Statistics. Cambridge University Press.
van der Vaart, A. W., Wellner, J. A., 1996. Weak Convergence and Empirical Processes. Springer-Verlag, New York.
von Mises, R., 1931. Wahrscheinlichkeitsrechnung. Springer, Berlin.
White, H., 1994. Estimation, Inference and Specification Analysis. Vol.
22 of Econometric Society Monographs. Cambridge University Press, Cambridge.
Zellner, A., 1998. Past and recent results on maximal data information priors. J. Statist. Res. 32 (1), 1-22.

Table 1: Monte Carlo Comparison of LTE's with Censored Quantile Regression Estimates Obtained using Iterated Linear Programming (Based on 100 repetitions).

Estimator             RMSE    MAD     Mean Bias   Median Bias   Median Abs. Dev.
n = 400
Q-posterior-mean      0.473   0.378    0.138       0.134         0.340
Q-posterior-median    0.465   0.372    0.131       0.137         0.344
Iterated LP(10)       0.518   0.284    0.040       0.016         0.171
                      3.798   0.827   -0.568      -0.035         0.240
n = 1600
Q-posterior-mean      0.155   0.121   -0.018       0.009         0.089
Q-posterior-median    0.155   0.121   -0.020       0.002         0.092
Iterated LP(7)        0.134   0.106    0.040       0.067         0.085
                      3.547   0.511    0.023      -0.384         0.087

Table 2: Monte Carlo Comparison of the LTE's with Standard Estimation for a Linear Quantile Regression Model (Based on 500 repetitions).

Estimator                      RMSE    MAD     Mean Bias   Median Bias   Median AD
n = 200
Q-posterior-mean               .0747   .0587    .0174       .0204         .0478
Q-posterior-median             .0779   .0608    .0192       .0136         .0519
Standard Quantile Regression   .0787   .0628    .0067       .0092         .0510
n = 800
Q-posterior-mean               .0425   .0323   -.0018      -.0003         .0280
Q-posterior-median             .0445   .0339   -.0023       .0001         .0295
Standard Quantile Regression   .0498   .0398    .0007       .0025         .0356

Table 3: Monte Carlo Comparison of the LT Inference with Standard Inference for a Linear Quantile Regression Model (Based on 500 repetitions).

Inference Method                                               Coverage   Length
n = 200
Quasi-posterior confidence interval, equal tailed              .943       .377
Quasi-posterior confidence interval, symmetric (around mean)   .941       .375
Quantile Regression: Hall-Sheather Interval                    .659       .177
n = 800
Quasi-posterior confidence interval, equal tailed              .920       .159
Quasi-posterior confidence interval, symmetric (around mean)   .917       .158
Quantile Regression: Hall-Sheather Interval                    .602       .082

[Figure 1 panels: Criterion for IV-QR; Criterion for QB-Estimation; Markov Chain Sequence; Q-Posterior for Theta]
Figure 1: A Nonlinear IV Example involving Instrumental Quantile Regression. In the top-left panel the discontinuous objective function $L_n(\theta)$ is depicted (one-dimensional case). The true parameter is $\theta_0 = 0$. In the bottom-left panel, a Markov Chain sequence of draws $(\theta^{(1)}, \ldots, \theta^{(B)})$ is depicted. The marginal distribution of this sequence is $p_n(\theta) = e^{L_n(\theta)} / \int_\Theta e^{L_n(\theta)}\, d\theta$; see the bottom-right panel. The point estimate, the sample mean $\hat\theta$, is given by the vertical line with the rhomboid root. The two other vertical lines are the 10th and the 90th percentiles of the quasi-posterior distribution. The upper-right panel depicts the expected loss function that the LTE minimizes.

[Figure 2 panel: VaR(p) for dynamic model]
Figure 2: Recursive VaR Surface in time-probability space.

[Figure 3 panel: VaR(p) for static model]
Figure 3: Non-recursive VaR Surface in time-probability space.

Figure 4: $\theta_2(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 5: $\theta_3(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 6: $\theta_1(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 7: $q(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.