Massachusetts Institute of Technology
Department of Economics Working Paper Series

IMPROVING POINT AND INTERVAL ESTIMATES OF MONOTONE FUNCTIONS BY REARRANGEMENT

Victor Chernozhukov, Ivan Fernandez-Val, Alfred Galichon

Working Paper 08-1
July 2, 2008

Room E52-251, 50 Memorial Drive, Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://ssrn.com/abstract=1159965

IMPROVING POINT AND INTERVAL ESTIMATES OF MONOTONE FUNCTIONS BY REARRANGEMENT

VICTOR CHERNOZHUKOV†, IVAN FERNANDEZ-VAL§, ALFRED GALICHON‡

Abstract. Suppose that a target function $f_0 : \mathbb{R}^d \to \mathbb{R}$ is monotonic, namely weakly increasing, and an original estimate $f$ of this target function is available, which is not weakly increasing. Many common estimation methods used in statistics produce such estimates $f$. We show that these estimates can always be improved with no harm by using rearrangement techniques: The rearrangement methods, univariate and multivariate, transform the original estimate to a monotonic estimate $f^*$, and the resulting estimate is closer to the true curve $f_0$ in common metrics than the original estimate $f$. The improvement property also extends to the construction of confidence bands for monotone functions. Let $\ell$ and $u$ be the lower and upper end-point functions of a simultaneous confidence interval $[\ell, u]$ that covers $f_0$ with probability $1-\alpha$; then the rearranged confidence interval $[\ell^*, u^*]$, defined by the rearranged lower and upper end-point functions $\ell^*$ and $u^*$, is shorter in length in common norms than the original interval and covers $f_0$ with probability greater or equal to $1-\alpha$. We illustrate the results with a computational example and an empirical example dealing with age-height growth charts.

Key words. Monotone function, improved estimation, improved inference, multivariate rearrangement, univariate rearrangement, Lorentz inequalities, growth chart, quantile regression, mean regression, series, locally linear, kernel methods

AMS Subject Classification. Primary 62G08; Secondary 46F10, 62F35, 62P10

Date: First version is of December, 2006. Two ArXiv versions of the paper are available, ArXiv 0704.3686 (April, 2007) and ArXiv 0806.4730 (June, 2008). This version is of July 2, 2008. The previous title of this paper was "Improving Estimates of Monotone Functions by Rearrangement". We would like to thank the editor, the associate editor, the anonymous referees of the journal, Richard Blundell, Eric Cator, Andrew Chesher, Moshe Cohen, Holger Dette, Emily Gallagher, Piet Groeneboom, Raymond Guiteras, Xuming He, Roger Koenker, Charles Manski, Costas Meghir, Ilya Molchanov, Whitney Newey, Steve Portnoy, Alp Simsek, Jon Wellner, and seminar participants at BU, CEMFI, CEMMAP Measurement Matters Conference, Columbia, Columbia Conference on Optimal Transportation, Conference on Frontiers of Microeconometrics in Tokyo, Cornell, Cowles Foundation 75th Anniversary Conference, Duke-Triangle, Georgetown, MIT, MIT-Harvard, Northwestern, UBC, UCL, UIUC, University of Alicante, University of Gothenburg Conference "Nonsmooth Inference, Analysis, and Dependence," and Yale University for very useful comments that helped to considerably improve the paper.
† Massachusetts Institute of Technology, Department of Economics & Operations Research Center, and University College London, CEMMAP. E-mail: vchern@mit.edu. Research support from the Castle Krob Chair, National Science Foundation, the Sloan Foundation, and CEMMAP is gratefully acknowledged.
§ Boston University, Department of Economics. E-mail: ivanf@bu.edu. Research support from the National Science Foundation is gratefully acknowledged.
‡ Ecole Polytechnique, Departement d'Economie. E-mail: alfred.galichon@polytechnique.edu.

1. Introduction

A common problem in statistics is the approximation of an unknown monotonic function using an available sample. There are many examples of monotonically increasing functions, including biometric age-height charts, which should be monotonic in age; econometric demand functions, which should be monotonic in price; and quantile functions, which should be monotonic in the probability index. Suppose an original, potentially non-monotonic, estimate is available. Then, the rearrangement operation from variational analysis (Hardy, Littlewood, and Polya 1952, Lorentz 1953, Villani 2003) can be used to monotonize the original estimate. The rearrangement has been shown to be useful in producing monotonized estimates of density functions (Fougeres 1997), conditional mean functions (Davydov and Zitikis 2005, Dette, Neumeyer, and Pilz 2006, Dette and Pilz 2006, Dette and Scheder 2006), and various conditional quantile and distribution functions (Chernozhukov, Fernandez-Val, and Galichon 2006b, 2006c).

In this paper, we show, using Lorentz inequalities and their appropriate generalizations, that the rearrangement of the original estimate is not only useful for producing monotonicity, but also has the following important property: The rearrangement always improves upon the original estimate, whenever the latter is not monotonic. Namely, the rearranged curves are always closer (often considerably closer) to the target curve being estimated. Furthermore, this improvement property is generic, i.e., it does not depend on the underlying specifics of the original estimate, and it applies to both univariate and multivariate cases. The improvement property of the rearrangement also extends to the construction of confidence bands for monotone functions. We show that we can increase the coverage probabilities and reduce the lengths of the confidence bands for monotone functions by rearranging the upper and lower bounds of the confidence bands.

Monotonization procedures have a long history in the statistical literature, mostly in relation to isotone regression. While we will not provide an extensive literature review, we reference the methods other than rearrangement that are most related to the present paper. Mammen (1991) studies the effect of smoothing on monotonization by isotone regression. He considers two alternative two-step procedures that differ in the ordering of the smoothing and monotonization steps. The resulting estimators are asymptotically equivalent up to first order if an optimal bandwidth choice is used in the smoothing step. Mammen, Marron, Turlach, and Wand (2001) show that most smoothing problems, notably including smoothed isotone regression problems, can be recast as a projection problem with respect to a given norm. Another approach is the one-step procedure of Ramsay (1988), which projects on a class of monotone spline functions called I-splines. Later in the paper we will make both analytical and numerical comparisons of these procedures with the rearrangement.

2. Improving Point Estimates of Monotone Functions by Rearrangement

2.1. Common Estimates of Monotonic Functions. A basic problem in many areas of statistics is the estimation of an unknown function $f_0 : \mathbb{R}^d \to \mathbb{R}$ using the available information. Suppose we know that the target function $f_0$ is monotonic, namely weakly increasing. Suppose further that an original estimate $f$ is available, which is not necessarily monotonic. Many common estimation methods do indeed produce such estimates. Can these estimates always be improved with no harm? The answer provided by this paper is yes: the rearrangement method transforms the original estimate to a monotonic estimate $f^*$, and this estimate is in fact closer to the true curve $f_0$ than the original estimate $f$ in common metrics. Furthermore, the rearrangement is computationally tractable, and thus preserves the computational appeal of the original estimates.

Estimation methods, specifically the ones used in regression analysis, can be grouped into global methods and local methods. An example of a global method is the series estimator of $f_0$ taking the form
\[ f(x) = P_{k_n}(x)'b, \]
where $P_{k_n}(x)$ is a $k_n$-vector of suitable transformations of the variable $x$, such as B-splines, polynomials, and trigonometric functions. Section 4 lists specific examples in the context of an empirical example. The estimate $b$ is obtained by solving the regression problem
\[ b = \arg\min_{b} \sum_{i=1}^{n} \rho\big(Y_i - P_{k_n}(X_i)'b\big), \]
where $(Y_i, X_i)$, $i = 1, \ldots, n$, denotes the data. In particular, using the square loss $\rho(v) = v^2$ produces estimates of the conditional mean of $Y_i$ given $X_i$ (Gallant 1981, Andrews 1991, Stone 1994, Newey 1997), while using the asymmetric absolute deviation loss $\rho(v) = (u - 1(v < 0))v$, where $u$ is the quantile index, produces estimates of the conditional $u$-quantile of $Y_i$ given $X_i$ (Koenker and Bassett 1978, Portnoy 1997, He and Shao 2000). The series estimates $x \mapsto f(x) = P_{k_n}(x)'b$ are widely used in data analysis due to their desirable approximation properties and computational tractability. However, these estimates need not be naturally monotone, unless explicit constraints are added to the optimization program (see, for example, Matzkin (1994), Silvapulle and Sen (2005), and Koenker and Ng (2005)).

Examples of local methods include kernel and locally polynomial estimators. A kernel estimator takes the form
\[ f(x) = \arg\min_{b \in \mathbb{R}} \sum_{i=1}^{n} w_i\, \rho(Y_i - b), \qquad w_i = K\Big(\frac{X_i - x}{h}\Big), \]
where the loss function $\rho$ plays the same role as above, $K(u) \ge 0$ is a standard, possibly high-order, kernel function, and $h$ is a vector of bandwidths (see, for example, Wand and Jones (1995) and Ramsay and Silverman (2005)). The resulting estimate $x \mapsto f(x)$ need not be naturally monotone. Dette, Neumeyer, and Pilz (2006) show that the rearrangement transforms the kernel estimate into a monotonic one. We further show here that the rearranged estimate necessarily improves upon the original estimate whenever the original estimate is not monotonic.
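For concreteness, the following R sketch computes such a kernel (local constant) fit with a box kernel under the square loss; the data are simulated and all names and tuning choices are illustrative placeholders rather than the implementation used later in the paper.

```r
# Minimal sketch of a kernel (local constant) estimate with a box kernel.
# Under the square loss, the minimizer at each point is the weighted mean of
# the responses whose regressors fall within one bandwidth of x.
kernel_fit <- function(x0, x, y, h) {
  sapply(x0, function(xx) {
    w <- as.numeric(abs(x - xx) <= h)   # box kernel weights K((X_i - x)/h)
    sum(w * y) / sum(w)                  # argmin_b sum_i w_i (Y_i - b)^2
  })
}

set.seed(1)
x <- runif(500, 2, 20)                              # ages, as in the growth-chart example
y <- 80 + 5 * x - 0.1 * x^2 + rnorm(500, sd = 4)    # an increasing CEF plus noise
grid  <- seq(2, 20, by = 0.1)
f_hat <- kernel_fit(grid, x, y, h = 1)
any(diff(f_hat) < 0)                                # the raw fit is typically not monotone
```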
A related local method is locally polynomial regression (Chaudhuri 1991, Fan and Gijbels 1996). In particular, the locally linear estimator takes the form
\[ (f(x), d(x)) = \arg\min_{b \in \mathbb{R},\, d \in \mathbb{R}} \sum_{i=1}^{n} w_i\, \rho\big(Y_i - b - d(X_i - x)\big), \qquad w_i = K\Big(\frac{X_i - x}{h}\Big). \]
The resulting estimate $x \mapsto f(x)$ may also be non-monotonic, unless explicit constraints are added to the optimization problem. Section 4 illustrates the non-monotonicity of the locally linear estimate in an empirical example.

In summary, there are many attractive estimation and approximation methods in statistics that do not necessarily produce monotonic estimates. These estimates do have other attractive features though, such as good approximation properties and computational tractability. Below we show that the rearrangement operation applied to these estimates produces (monotonic) estimates that improve the approximation properties of the original estimates by bringing them closer to the target curve. Furthermore, the rearrangement is computationally tractable, and thus preserves the computational appeal of the original estimates.

2.2. The Rearrangement and Its Approximation Property: The Univariate Case. In what follows, let $\mathcal{X}$ be a compact interval. Without loss of generality, it is convenient to take this interval to be $\mathcal{X} = [0,1]$. Let $f(x)$ be a measurable function mapping $\mathcal{X}$ to $K$, a bounded subset of $\mathbb{R}$. Let
\[ F_f(y) = \int_{\mathcal{X}} 1\{f(u) \le y\}\, du \]
denote the distribution function of $f(X)$ when $X$ follows the uniform distribution on $[0,1]$. Let
\[ f^*(x) := Q_f(x) := \inf\{ y \in \mathbb{R} : F_f(y) \ge x \} \]
be the quantile function of $F_f(y)$. Thus,
\[ f^*(x) := \inf\Big\{ y \in \mathbb{R} : \Big[ \int_{\mathcal{X}} 1\{f(u) \le y\}\, du \Big] \ge x \Big\}. \]
This function $f^*$ is called the increasing rearrangement of the function $f$. Thus, the rearrangement operator simply transforms a function $f$ to its quantile function $f^*$. That is, $x \mapsto f^*(x)$ is the quantile function of the random variable $f(X)$ when $X \sim U(0,1)$. It is also convenient to think of the rearrangement as a sorting operation: given values of the function $f(x)$ evaluated at $x$ in a fine enough net of equidistant points, we simply sort the values in an increasing order. The function created in this way is the rearrangement of $f$.
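The sorting interpretation can be sketched in R as follows, continuing the simulated kernel example above (the grid and fitted values are the illustrative objects defined there).

```r
# Increasing rearrangement of a function given its values on an equidistant grid:
# the rearranged values are the original values sorted in increasing order, i.e.,
# the empirical quantile function of f(X) with X uniform on the grid.
rearrange <- function(f_values) sort(f_values)

f_star <- rearrange(f_hat)               # monotone by construction
all(diff(f_star) >= 0)                   # TRUE

# Proposition 1 below implies the rearranged fit is weakly closer to the true
# curve in any L^p norm; here we check the squared L^2 distance on the grid.
f0 <- 80 + 5 * grid - 0.1 * grid^2       # true CEF from the simulated example
c(original   = mean((f_hat  - f0)^2),
  rearranged = mean((f_star - f0)^2))
```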
The first main point of this paper is the following:

Proposition 1. Let $f_0 : \mathcal{X} \to K$ be a weakly increasing measurable function in $x$, the target function. Let $f : \mathcal{X} \to K$ be another measurable function, an initial estimate of the target function $f_0$.
1. For any $p \in [1,\infty]$, the rearrangement of $f$, denoted $f^*$, weakly reduces the estimation error:
\[ \Big[ \int_{\mathcal{X}} |f^*(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}} |f(x) - f_0(x)|^p\, dx \Big]^{1/p}. \tag{2.1} \]
2. Suppose that there exist regions $\mathcal{X}_0$ and $\mathcal{X}_0'$, each of measure greater than $\delta > 0$, such that for all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}_0'$ we have that (i) $x' > x$, (ii) $f(x) > f(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$, for some $\epsilon > 0$. Then the gain in the quality of approximation is strict for $p \in (1,\infty)$. Namely, for any $p \in (1,\infty)$,
\[ \Big[ \int_{\mathcal{X}} |f^*(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}} |f(x) - f_0(x)|^p\, dx - \delta \eta_p \Big]^{1/p}, \tag{2.2} \]
where $\eta_p = \inf\{ |v'-t|^p + |v-t'|^p - |v-t|^p - |v'-t'|^p \}$, with the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$, and $\eta_p > 0$.

This proposition establishes that the rearranged estimate $f^*$ has a smaller (and often strictly smaller) estimation error in the $L^p$ norm than the original estimate whenever the latter is not monotone. This is a very useful and generally applicable property that is independent of the sample size and of the way the original estimate $f$ is obtained. The first part of the proposition states the weak inequality (2.1), and the second part states the strict inequality (2.2). For example, the inequality is strict for $p \in (1,\infty)$ if the original estimate $f(x)$ is decreasing on a subset of $\mathcal{X}$ having positive measure, while the target function $f_0(x)$ is increasing on $\mathcal{X}$ (by increasing, we mean strictly increasing throughout). Of course, if $f_0(x)$ is constant, then the inequality (2.1) becomes an equality, as the distribution of the rearranged function $f^*$ is the same as the distribution of the original function $f$, that is, $F_{f^*} = F_f$.

The weak inequality (2.1) is a direct (yet important) consequence of the classical rearrangement inequality due to Lorentz (1953): Let $q$ and $g$ be two functions mapping $\mathcal{X}$ to $K$, and let $q^*$ and $g^*$ denote their corresponding increasing rearrangements. Then, for any submodular discrepancy function $L : \mathbb{R}^2 \to \mathbb{R}_+$,
\[ \int_{\mathcal{X}} L(q^*(x), g^*(x))\, dx \le \int_{\mathcal{X}} L(q(x), g(x))\, dx. \]
In our case, we set $q(x) = f(x)$ and $g(x) = f_0(x)$; then $q^*(x) = f^*(x)$ and $g^*(x) = f_0(x)$ almost everywhere, since the target function $f_0$ is its own rearrangement. Let us recall that a function $L$ is submodular if for each pair of vectors $(v,t)$ and $(v',t')$ in $\mathbb{R}^2$,
\[ L(v \wedge v', t \wedge t') + L(v \vee v', t \vee t') \le L(v, t) + L(v', t'). \tag{2.3} \]
In other words, a function $L$ measuring the discrepancy between vectors is submodular if co-monotonization of the vectors reduces the discrepancy. When $L$ is smooth, submodularity is equivalent to the condition $\partial_v \partial_t L(v,t) \le 0$ holding for each $(v,t)$ in $\mathbb{R}^2$. For example, power functions $L(v,t) = |v - t|^p$ for $p \in [1,\infty)$ and many other loss functions are submodular.

In the Appendix, we provide a proof of the strong inequality (2.2) as well as a direct proof of the weak inequality (2.1). The direct proof illustrates how reductions of the estimation error arise from even a partial sorting of the values of the estimate $f$. Moreover, the direct proof characterizes the conditions for the strict reduction of the estimation error.

It is also worth emphasizing the following immediate asymptotic implication of the above finite-sample result: The rearranged estimate $f^*$ inherits the $L^p$ rates of convergence from the original estimate $f$. For $p \in [1,\infty]$, if $\lambda_n = [\int_{\mathcal{X}} |f(x) - f_0(x)|^p\, dx]^{1/p} = O_p(a_n)$ for some sequence of constants $a_n$, then $\lambda_n^* = [\int_{\mathcal{X}} |f^*(x) - f_0(x)|^p\, dx]^{1/p} \le \lambda_n = O_p(a_n)$.

2.3. Computation of the Rearranged Estimate. One of the following methods can be used for computing the rearrangement. Let $\{X_j, j = 1, \ldots, B\}$ be either (1) a set of equidistant points in $[0,1]$ or (2) a sample of i.i.d. draws from the uniform distribution on $[0,1]$. Then the rearranged estimate $f^*(u)$ at a point $u \in \mathcal{X}$ can be approximately computed as the $u$-quantile of the sample $\{f(X_j), j = 1, \ldots, B\}$. The first method of computing the rearrangement is deterministic, and the second is stochastic. Thus, for a given number of draws $B$, the complexity of computing the rearranged estimate $f^*(u)$ in this way is equivalent to the complexity of computing the sample $u$-quantile in a sample of size $B$. The number of evaluations $B$ can depend on the problem. Suppose that the density function of the random variable $f(X)$, when $X \sim U(0,1)$, is bounded away from zero over a neighborhood of $f^*(x)$. Then $f^*(x)$ can be computed with the accuracy of $O_p(1/\sqrt{B})$, as $B \to \infty$, where the rate follows from the results of Knight (2002).
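A minimal sketch of the stochastic computation just described, assuming the fitted function can be evaluated at arbitrary points (the kernel fit from the earlier illustrative example is reused, and the unit interval is replaced by the age range):

```r
# Stochastic computation of the rearranged estimate: f*(u) is approximated by
# the u-quantile of f evaluated at B i.i.d. uniform draws (Section 2.3).
rearrange_mc <- function(f, u, B = 10000, lower = 0, upper = 1) {
  draws <- runif(B, lower, upper)                 # X_j ~ U(lower, upper)
  quantile(f(draws), probs = u, names = FALSE)
}

# Example: rearranged kernel fit at positions u = 0.25, 0.5, 0.75 of the domain.
f_fun <- function(z) kernel_fit(z, x, y, h = 1)   # from the earlier sketch
rearrange_mc(f_fun, u = c(0.25, 0.5, 0.75), lower = 2, upper = 20)
```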
2.4. The Rearrangement and Its Approximation Property: The Multivariate Case. In this section we consider multivariate functions $f : \mathcal{X}^d \to K$, where $\mathcal{X}^d = [0,1]^d$ and $K$ is a bounded subset of $\mathbb{R}$. The notion of monotonicity we seek to impose on $f$ is the following: We say that the function $f$ is weakly increasing in $x$ if $f(x') \ge f(x)$ whenever $x' \ge x$. The notation $x' = (x_1', \ldots, x_d') \ge x = (x_1, \ldots, x_d)$ means that one vector is weakly larger than the other in each of the components, that is, $x_j' \ge x_j$ for each $j = 1, \ldots, d$. In what follows, we use the notation $f(x_j, x_{-j})$ to denote the dependence of $f$ on its $j$-th argument, $x_j$, and all other arguments, $x_{-j}$, that exclude $x_j$. The notion of monotonicity above is equivalent to the requirement that for each $j$ in $1, \ldots, d$ the mapping $x_j \mapsto f(x_j, x_{-j})$ is weakly increasing in $x_j$, for each $x_{-j}$ in $\mathcal{X}^{d-1}$.

Define the rearrangement operator $R_j$ and the rearranged function $f_j^*(x)$ with respect to the $j$-th argument as follows:
\[ f_j^*(x) := R_j \circ f(x) := \inf\Big\{ y : \Big[ \int_{\mathcal{X}} 1\{ f(x_j', x_{-j}) \le y \}\, dx_j' \Big] \ge x_j \Big\}. \]
This is the one-dimensional increasing rearrangement applied to the one-dimensional function $x_j \mapsto f(x_j, x_{-j})$, holding the other arguments $x_{-j}$ fixed. The rearrangement is applied for every value of the other arguments $x_{-j}$.

Let $\pi = (\pi_1, \ldots, \pi_d)$ be an ordering, i.e., a permutation, of the integers $1, \ldots, d$. Let us define the $\pi$-rearrangement operator $R_\pi$ and the $\pi$-rearranged function $f_\pi^*(x)$ as follows:
\[ f_\pi^*(x) := R_\pi \circ f(x) := R_{\pi_1} \circ \cdots \circ R_{\pi_d} \circ f(x). \]
For any ordering $\pi$, the $\pi$-rearrangement operator rearranges the function with respect to all of its arguments. As shown below, the resulting function $f_\pi^*(x)$ is weakly increasing in $x$. In general, two different orderings $\pi$ and $\pi'$ can yield different rearranged functions $f_\pi^*(x)$ and $f_{\pi'}^*(x)$. Therefore, to resolve the conflict among orderings, we may consider averaging among rearrangements done with different orderings: letting $\Pi$ be any collection of distinct orderings $\pi$, we define the average rearrangement as
\[ f^*(x) := \frac{1}{|\Pi|} \sum_{\pi \in \Pi} f_\pi^*(x), \]
where $|\Pi|$ denotes the number of elements in the set of orderings $\Pi$. Dette and Scheder (2006) also proposed averaging all the possible orderings of the smoothed rearrangement in the context of monotone conditional mean estimation. As shown below, the approximation error of the average rearrangement is weakly smaller than the average of the approximation errors of the individual $\pi$-rearrangements.
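For a bivariate function stored as a matrix of values on a grid, the two possible orderings and their average can be sketched in R as follows; the grid and the example function are hypothetical placeholders.

```r
# Bivariate rearrangement on a grid: sort along the first argument (each column)
# or along the second argument (each row), holding the other argument fixed.
R1 <- function(M) apply(M, 2, sort)        # rearrange in the first argument
R2 <- function(M) t(apply(M, 1, sort))     # rearrange in the second argument

# The two sequential rearrangements (one per ordering) and their average.
rearrange_a   <- function(M) R2(R1(M))
rearrange_b   <- function(M) R1(R2(M))
rearrange_avg <- function(M) (rearrange_a(M) + rearrange_b(M)) / 2

# Example: a non-monotone estimate of an increasing bivariate target.
g1 <- seq(0, 1, length.out = 50)
M  <- outer(g1, g1, function(a, b) a + b + 0.3 * sin(8 * a) * cos(8 * b))
M_star <- rearrange_avg(M)                 # weakly increasing in both arguments
```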
The following proposition describes the properties of the multivariate $\pi$-rearrangements.

Proposition 2. Let the target function $f_0 : \mathcal{X}^d \to K$ be weakly increasing and measurable in $x$. Let $f : \mathcal{X}^d \to K$ be another measurable function, an initial estimate of the target function $f_0$. Then,
1. For each ordering $\pi$ of $1, \ldots, d$, the $\pi$-rearranged estimate $f_\pi^*(x)$ is weakly increasing in $x$. Moreover, $f^*(x)$, an average of $\pi$-rearranged estimates, is weakly increasing in $x$, for any collection of distinct orderings $\Pi$, including, for example, the collection of all possible orderings of the arguments.
2. (a) For any $j$ in $1, \ldots, d$ and any $p$ in $[1,\infty]$, the rearrangement of $f$ with respect to the $j$-th argument produces a weak reduction in the approximation error:
\[ \Big[ \int_{\mathcal{X}^d} |f_j^*(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}^d} |f(x) - f_0(x)|^p\, dx \Big]^{1/p}. \tag{2.5} \]
(b) Consequently, a $\pi$-rearranged estimate $f_\pi^*(x)$ weakly reduces the approximation error of the original estimate:
\[ \Big[ \int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}^d} |f(x) - f_0(x)|^p\, dx \Big]^{1/p}. \tag{2.6} \]
3. Suppose that $f(x)$ and $f_0(x)$ have the following properties: there exist subsets $\mathcal{X}_j \subset \mathcal{X}$ and $\mathcal{X}_j' \subset \mathcal{X}$, each of measure $\delta > 0$, and a subset $\mathcal{X}_{-j} \subset \mathcal{X}^{d-1}$, of measure $\nu > 0$, such that for all $x = (x_j, x_{-j})$ and $x' = (x_j', x_{-j})$, with $x_j \in \mathcal{X}_j$, $x_j' \in \mathcal{X}_j'$, $x_{-j} \in \mathcal{X}_{-j}$, we have that (i) $x_j' > x_j$, (ii) $f(x) > f(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$, for some $\epsilon > 0$.
(a) Then, for any $p \in (1,\infty)$,
\[ \Big[ \int_{\mathcal{X}^d} |f_j^*(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}^d} |f(x) - f_0(x)|^p\, dx - \eta_p \delta \nu \Big]^{1/p}, \tag{2.7} \]
where $\eta_p = \inf\{ |v'-t|^p + |v-t'|^p - |v-t|^p - |v'-t'|^p \}$, with the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$, and $\eta_p > 0$.
(b) Further, for an ordering $\pi = (\pi_1, \ldots, \pi_k, \ldots, \pi_d)$ with $\pi_k = j$, let $\tilde f(x) = R_{\pi_{k+1}} \circ \cdots \circ R_{\pi_d} \circ f(x)$ denote the partially rearranged function (for $k = d$ we set $\tilde f(x) = f(x)$). If the function $\tilde f(x)$ and the target function $f_0(x)$ satisfy the condition stated above, then, for any $p \in (1,\infty)$,
\[ \Big[ \int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}^d} |f(x) - f_0(x)|^p\, dx - \eta_p \delta \nu \Big]^{1/p}. \tag{2.8} \]
4. The approximation error of an average rearrangement is weakly smaller than the average of the approximation errors of the individual $\pi$-rearrangements: for any $p \in [1,\infty]$,
\[ \Big[ \int_{\mathcal{X}^d} |f^*(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \Big[ \int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p\, dx \Big]^{1/p}. \tag{2.9} \]

This proposition generalizes the results of Proposition 1 to the multivariate case, also demonstrating several features unique to the multivariate case. We see that the $\pi$-rearranged functions are monotonic in all of their arguments. Dette and Scheder (2006), using a different smoothed rearrangement for conditional mean functions, demonstrated that their rearrangement is monotonic in both arguments for the bivariate case in large samples. The rearrangement along any of the arguments improves the approximation properties of the estimate. Moreover, the improvement is strict when the rearrangement with respect to a $j$-th argument is performed on an estimate that is decreasing in the $j$-th argument, while the target function is increasing in the same $j$-th argument, in the sense precisely defined in the proposition. Moreover, averaging different $\pi$-rearrangements is better (on average) than using a single $\pi$-rearrangement chosen at random. All other basic implications of the proposition are similar to those discussed for the univariate case.

Figure 1. Graphical illustration for the proof of Proposition 1 (left panel) and comparison to isotonic regression (right panel). In the figure, $f_0$ represents the target function, $f$ the original estimate, $f^*$ the rearranged estimate, $f^I$ the isotonized estimate, and $f^{(R+I)/2}$ the average of the rearranged and isotonized estimates. In the left panel, $L(v,t)$, $L(v',t)$, $L(v,t')$, and $L(v',t')$ are the $p$-th powers of the indicated distances.

2.5. Discussion and Comparisons. In what follows, we informally explain why the rearrangement provides the improvement property, and we compare the rearrangement to isotonization. Let us begin by noting that the proof of the improvement property can be reduced first to the case of simple functions or, equivalently, functions with a finite domain, and then to the case of "very simple" functions with a two-point domain. The improvement property for these very simple functions then follows from the submodularity property (2.3). In the left panel of Figure 1 we illustrate this property geometrically by plotting the original estimate $f$, the rearranged estimate $f^*$, and the true function $f_0$. In this example, the original estimate is decreasing and hence violates the monotonicity requirement. We see that the rearrangement co-monotonizes $f^*$ with $f_0$ and thus brings $f^*$ closer to $f_0$. Also, we can view the rearrangement as a projection on the set of weakly increasing functions that have the same distribution as the original estimate $f$.

Next, in the right panel of Figure 1, we plot both the rearranged and isotonized estimates. The isotonized estimate $f^I$ is a projection of the original estimate $f$ on the set of weakly increasing functions (that only preserves the mean of the original estimate). We can compute the two values of the isotonized estimate $f^I$ by assigning both of them the average of the two values of the original estimate $f$, whenever the latter violate the monotonicity requirement, and leaving the original values unchanged otherwise. We see from Figure 1 that in our example this produces a flat function $f^I$.
This computational procedure, known as "pool adjacent violators," naturally extends to domains with more than two points by simply applying the procedure iteratively to any pair of points at which the monotonicity requirement remains violated (Ayer, Brunk, Ewing, Reid, and Silverman 1955). Using the computational definition of isotonization, one can show that, like the rearrangement, isotonization also improves upon the original estimate, for any $p \in [1,\infty]$:
\[ \Big[ \int_{\mathcal{X}} |f^I(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}} |f(x) - f_0(x)|^p\, dx \Big]^{1/p}, \tag{2.10} \]
see, e.g., Barlow, Bartholomew, Bremner, and Brunk (1972). Therefore, it follows that any function $f^\lambda$ in the convex hull of the rearranged and the isotonized estimates both monotonizes and improves upon the original estimate. The first property is obvious, and the second follows from homogeneity and subadditivity of norms: for any $p \in [1,\infty]$,
\[ \Big[ \int_{\mathcal{X}} |f^\lambda(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \lambda \Big[ \int_{\mathcal{X}} |f^*(x) - f_0(x)|^p\, dx \Big]^{1/p} + (1-\lambda) \Big[ \int_{\mathcal{X}} |f^I(x) - f_0(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}} |f(x) - f_0(x)|^p\, dx \Big]^{1/p}, \tag{2.11} \]
where $f^\lambda(x) = \lambda f^*(x) + (1-\lambda) f^I(x)$ for any $\lambda \in [0,1]$.

Before proceeding further, let us also note that, by an induction argument similar to that presented in the previous section, the improvement property listed above extends to the sequential multivariate isotonization and to its convex hull with the sequential multivariate rearrangement.

Thus, we see that a rather rich class of procedures (or operators) both monotonizes the original estimate and reduces the distance to the true target function. It is also important to note that there is no single best distance-reducing monotonizing procedure. Indeed, whether the rearranged estimate $f^*$ approximates the target function better than the isotonized estimate $f^I$ depends on how steep or flat the target function is. We illustrate this point via a simple example plotted in the right panel of Figure 1: Consider any increasing target function taking values in the shaded area between $f^*$ and $f^I$, and also the function $f^{(R+I)/2}$, the average of the isotonized and the rearranged estimates, that passes through the middle of the shaded area. Suppose first that the target function is steeper than $f^{(R+I)/2}$; then $f^*$ has a smaller approximation error than $f^I$. Now suppose instead that the target function is flatter than $f^{(R+I)/2}$; then $f^I$ has a smaller approximation error than $f^*$. It is also clear that, if the target function is neither very steep nor very flat, $f^{(R+I)/2}$ can outperform either $f^*$ or $f^I$. Thus, in practice we can choose rearrangement, isotonization, or some combination of the two, depending on our beliefs about how steep or flat the target function is in a particular application.
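As a concrete comparison, the following R sketch isotonizes a fitted curve on a grid with the pool-adjacent-violators algorithm (via R's isoreg) and forms the rearrangement-isotonization average discussed above; it continues the hypothetical univariate example from Section 2.2.

```r
# Isotonization of the fitted values on an increasing grid via the
# pool-adjacent-violators algorithm, and the convex combination with the
# rearrangement (lambda = 1/2, the average used later in the paper).
isotonize <- function(grid, f_values) isoreg(grid, f_values)$yf

f_iso <- isotonize(grid, f_hat)            # f^I: monotone L2 projection of f
f_avg <- 0.5 * sort(f_hat) + 0.5 * f_iso   # f^{(R+I)/2}

# All three monotonizations weakly reduce the squared L^2 distance to f0.
sapply(list(original = f_hat, rearranged = sort(f_hat),
            isotonized = f_iso, average = f_avg),
       function(g) mean((g - f0)^2))
```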
3. Improving Interval Estimates of Monotone Functions by Rearrangement

In this section we propose to directly apply the rearrangement, univariate and multivariate, to simultaneous confidence intervals for functions. We show that our proposal will necessarily improve the original intervals by decreasing their length and increasing their coverage.

Suppose that we are given an initial simultaneous confidence interval
\[ [\ell, u] = \big( [\ell(x), u(x)],\ x \in \mathcal{X}^d \big), \tag{3.1} \]
where $\ell(x)$ and $u(x)$ are the lower and upper end-point functions such that $\ell(x) \le u(x)$ for all $x \in \mathcal{X}^d$. We further suppose that the confidence interval $[\ell, u]$ has either the exact or the asymptotic confidence property for the estimand function $f$, namely, for a given $\alpha \in (0,1)$,
\[ \mathrm{Prob}_P\{ f \in [\ell, u] \} = \mathrm{Prob}_P\{ \ell(x) \le f(x) \le u(x) \ \text{for all} \ x \in \mathcal{X}^d \} \ge 1 - \alpha, \tag{3.2} \]
for all probability measures $P$ in $\mathcal{P}_n$, a set of probability measures containing the true probability measure $P_0$. We assume that property (3.2) holds either in the finite sample sense, that is, for the given sample size $n$, or in the asymptotic sense, that is, as the sample size $n$ grows (Lehmann and Romano 2005).

A common type of confidence interval for functions is one where
\[ \ell(x) = f(x) - s(x)c \quad \text{and} \quad u(x) = f(x) + s(x)c, \tag{3.3} \]
where $f(x)$ is a point estimate, $s(x)$ is the standard error of the point estimate, and $c$ is a critical value chosen so that the confidence interval covers the function $f$ with the specified probability, as stated in (3.2). There are many well-established methods for the construction of the critical value, ranging from analytical tube methods to the bootstrap, both for parametric and non-parametric estimators (see, e.g., Johansen and Johnstone (1990) and Hall (1993)). The Wasserman (2006) book provides an excellent overview of the existing methods for inference on functions.

The problem with such confidence intervals, similar to the point estimates themselves, is that these intervals need not be monotonic. Indeed, typical inferential procedures do not guarantee that the end-point functions $f(x) \pm s(x)c$ of the confidence interval are monotonic. This means that such a confidence interval contains non-monotone functions that can be excluded from it. In some cases the confidence intervals mentioned above may not contain any monotone functions at all, for example, due to a small sample size or misspecification. We define misspecification as any case where the estimand $f$ being covered by $[\ell, u]$ is not equal to the weakly increasing target function $f_0$, so that $f$ may not be monotone. Misspecification is a rather common occurrence both in parametric and non-parametric estimation. Indeed, correct centering of confidence intervals in parametric estimation requires perfect specification of functional forms and is generally hard to achieve. On the other hand, correct centering of confidence intervals in nonparametric estimation requires the so-called undersmoothing, a delicate requirement, which amounts to using a relatively large number of terms in series estimation or a relatively small bandwidth in kernel-based estimation. In real applications with many regressors, researchers tend to use oversmoothing rather than undersmoothing. In a recent development, Genovese and Wasserman (2008) provide, in our interpretation, some formal justification for oversmoothing: targeting inference on functions $f$ that represent various smoothed versions of $f_0$, and thus summarize features of $f_0$, may be desirable to make inference more robust, or, equivalently, to enlarge the class of data-generating processes $\mathcal{P}_n$ for which the confidence interval property (3.2) holds. In any case, regardless of the reasons why confidence intervals may target $f$ instead of $f_0$, our procedures will work for inference on the monotonized version $f^*$ of $f$.

Our proposal for improved interval estimates is to rearrange the entire simultaneous confidence interval into a monotonic interval given by
\[ [\ell^*, u^*] = \big( [\ell^*(x), u^*(x)],\ x \in \mathcal{X}^d \big), \tag{3.4} \]
where the lower and upper end-point functions $\ell^*$ and $u^*$ are the increasing rearrangements of the original end-point functions $\ell$ and $u$. In the multivariate case, we use the symbols $\ell^*$ and $u^*$ to denote either the $\pi$-multivariate rearrangements $\ell_\pi^*$ and $u_\pi^*$ or the average multivariate rearrangements, whenever we do not need to emphasize the dependence on $\pi$.
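A minimal sketch of the proposal, assuming a band of the form (3.3) has already been computed on a grid; the standard errors and critical value below are placeholders, and the fitted values continue the univariate example of Section 2.2.

```r
# Rearranged simultaneous band: sort the lower and upper end-point functions
# separately. By Proposition 3 below, the result is monotone, has weakly higher
# coverage for the monotone target, and is weakly shorter in L^p length.
rearrange_band <- function(lower, upper) list(lower = sort(lower), upper = sort(upper))

se   <- 1 + 0.5 * abs(sin(grid))           # placeholder standard errors s(x)
crit <- 2.5                                 # placeholder simultaneous critical value
band      <- list(lower = f_hat - crit * se, upper = f_hat + crit * se)
band_star <- rearrange_band(band$lower, band$upper)

# Weak shortening in L^2 length (for p = 1 the lengths are exactly equal,
# since sorting preserves the integral of each end-point function):
sqrt(mean((band_star$upper - band_star$lower)^2)) <=
  sqrt(mean((band$upper - band$lower)^2))
```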
The following proposition describes the formal properties of the rearranged confidence intervals.

Proposition 3. Let $[\ell, u]$ in (3.1) be the original confidence interval that has the confidence interval property (3.2) for the estimand function $f : \mathcal{X}^d \to K$, and let the rearranged confidence interval $[\ell^*, u^*]$ be defined as in (3.4).
1. The rearranged confidence interval $[\ell^*, u^*]$ is weakly increasing and non-empty, in the sense that the end-point functions $\ell^*$ and $u^*$ are weakly increasing on $\mathcal{X}^d$ and satisfy $\ell^* \le u^*$ on $\mathcal{X}^d$. Moreover, the event that $[\ell, u]$ contains the estimand $f$ implies the event that $[\ell^*, u^*]$ contains the rearranged, hence monotonized, version $f^*$ of the estimand $f$:
\[ f \in [\ell, u] \ \text{implies} \ f^* \in [\ell^*, u^*]. \tag{3.5} \]
In particular, under correct specification, when $f$ equals a weakly increasing target function $f_0$, we have that $f = f^* = f_0$, so that
\[ f_0 \in [\ell, u] \ \text{implies} \ f_0 \in [\ell^*, u^*]. \tag{3.6} \]
Therefore, $[\ell^*, u^*]$ covers $f^*$, which is equal to $f_0$ under correct specification, with a probability that is greater or equal to the probability that $[\ell, u]$ covers $f$.
2. The rearranged confidence interval $[\ell^*, u^*]$ is weakly shorter than the initial confidence interval $[\ell, u]$ in the average $L^p$ length: for each $p \in [1,\infty]$,
\[ \Big[ \int_{\mathcal{X}^d} |u^*(x) - \ell^*(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}^d} |u(x) - \ell(x)|^p\, dx \Big]^{1/p}. \tag{3.7} \]
3. In the univariate case, suppose that $\ell(x)$ and $u(x)$ have the following properties: there exist subsets $\mathcal{X}_0$ and $\mathcal{X}_0'$ of $\mathcal{X}$, each of measure greater than $\delta > 0$, such that for all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}_0'$, we have that (i) $x' > x$, and (ii) either $\ell(x) > \ell(x') + \epsilon$ and $u(x') > u(x) + \epsilon$, or $\ell(x') > \ell(x) + \epsilon$ and $u(x) > u(x') + \epsilon$, for some $\epsilon > 0$. Then, for any $p \in (1,\infty)$,
\[ \Big[ \int_{\mathcal{X}} |u^*(x) - \ell^*(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}} |u(x) - \ell(x)|^p\, dx - \eta_p \delta \Big]^{1/p}, \tag{3.8} \]
where $\eta_p = \inf\{ |v'-t|^p + |v-t'|^p - |v-t|^p - |v'-t'|^p \}$, with the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$, and $\eta_p > 0$.
In the multivariate case with $d \ge 2$, for an ordering $\pi = (\pi_1, \ldots, \pi_k, \ldots, \pi_d)$ of the integers $\{1, \ldots, d\}$ with $\pi_k = j$, let $\tilde g(x) = R_{\pi_{k+1}} \circ \cdots \circ R_{\pi_d} \circ g(x)$ denote the partially rearranged function (for $k = d$ we set $\tilde g(x) = g(x)$). Suppose that $\tilde\ell(x)$ and $\tilde u(x)$ have the following properties: there exist subsets $\mathcal{X}_j$ and $\mathcal{X}_j'$ of $\mathcal{X}$, each of measure greater than $\delta > 0$, and a subset $\mathcal{X}_{-j} \subset \mathcal{X}^{d-1}$ of measure $\nu > 0$, such that for all $x = (x_j, x_{-j})$ and $x' = (x_j', x_{-j})$ with $x_j \in \mathcal{X}_j$, $x_j' \in \mathcal{X}_j'$, $x_{-j} \in \mathcal{X}_{-j}$, we have that (i) $x_j' > x_j$, and (ii) either $\tilde\ell(x) > \tilde\ell(x') + \epsilon$ and $\tilde u(x') > \tilde u(x) + \epsilon$, or $\tilde\ell(x') > \tilde\ell(x) + \epsilon$ and $\tilde u(x) > \tilde u(x') + \epsilon$, for some $\epsilon > 0$. Then, for any $p \in (1,\infty)$,
\[ \Big[ \int_{\mathcal{X}^d} |u_\pi^*(x) - \ell_\pi^*(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}^d} |u(x) - \ell(x)|^p\, dx - \eta_p \delta \nu \Big]^{1/p}, \tag{3.9} \]
with $\eta_p$ defined as above.

This proposition shows that the rearranged confidence intervals are weakly shorter than the original confidence intervals, and it also qualifies when the rearranged confidence intervals are necessarily strictly shorter. In particular, the inequality (3.7) is necessarily strict for $p \in (1,\infty)$ in the univariate case if there is a region of positive measure in $\mathcal{X}$ over which the end-point functions $x \mapsto \ell(x)$ and $x \mapsto u(x)$ are not comonotonic. The weak shortening result follows for the univariate case directly from the rearrangement inequality of Lorentz (1953), and the strong shortening follows from a simple strengthening of the Lorentz inequality, as argued in the proof.
The shortening results for the multivariate case follow by an induction argument. Moreover, the order-preservation property of the univariate and multivariate rearrangements, demonstrated in the proof, implies that the rearranged confidence interval $[\ell^*, u^*]$ has a weakly higher coverage than the original confidence interval $[\ell, u]$. We do not qualify when there are strict improvements in coverage, but we demonstrate them through the examples in the next section.

Our idea of directly monotonizing the interval estimates also applies to other monotonization procedures. Indeed, the proof of Proposition 3 reveals that part 1 of Proposition 3 applies to any order-preserving monotonization operator $T$, such that
\[ g(x) \le m(x) \ \text{for all} \ x \in \mathcal{X}^d \ \text{implies} \ T \circ g(x) \le T \circ m(x) \ \text{for all} \ x \in \mathcal{X}^d. \tag{3.10} \]
Furthermore, part 2 of Proposition 3 on the weak shortening of the confidence intervals applies to any distance-reducing operator $T$ such that
\[ \Big[ \int_{\mathcal{X}^d} |T \circ \ell(x) - T \circ u(x)|^p\, dx \Big]^{1/p} \le \Big[ \int_{\mathcal{X}^d} |\ell(x) - u(x)|^p\, dx \Big]^{1/p}. \tag{3.11} \]
Rearrangements, univariate and multivariate, are one instance of order-preserving and distance-reducing operators. Isotonization, univariate and multivariate, is another important instance (Robertson, Wright, and Dykstra 1988). Moreover, convex combinations of order-preserving and distance-reducing operators, such as the average of rearrangement and isotonization, are also order-preserving and distance-reducing. We demonstrate the inferential implications of these properties further in the computational experiments reported in Section 4.

4. Illustrations

In this section we provide an empirical application to biometric age-height charts. We show how the rearrangement monotonizes and improves various nonparametric point and interval estimates for functions, and then we quantify the improvement in a simulation example that mimics the empirical application. We carried out all the computations using the software R (R Development Core Team 2008), the quantile regression package quantreg (Koenker 2008), and the functional data analysis package fda (Ramsay, Wickham, and Graves 2007).

4.1. An Empirical Illustration with Age-Height Reference Charts. Since their introduction by Quetelet in the 19th century, reference growth charts have become common tools to assess an individual's health status. These charts describe the evolution of individual anthropometric measures, such as height, weight, and body mass index, across different ages. See Cole (1988) for a classical work on the subject, and Wei, Pere, Koenker, and He (2006) for a recent analysis from a quantile regression perspective and additional references.

To illustrate the properties of the rearrangement method, we consider the estimation of growth charts for height. It is clear that height should naturally follow an increasing relationship with age. Our data consist of repeated cross sectional measurements of height and age from the 2003-2004 National Health and Nutrition Survey collected by the National Center for Health Statistics. Height is measured as standing height in centimeters, and age is recorded in months and expressed in years.
The population functions and 95%), and (3) denote the conditional x] x is x i—> i—» fo{x) is Qy[u\X = 95%; and, in the third case, the target function {u,x) i—» (CQF) for several x i— x], fo{u,x) is > (CQP) = ^[yiJf for u {u,x) H-> We x] Qy[u|X and the = x] CQF x height in the Qy\u\X = X (u,x) for x]; 5%, 50%, and SfyiX = Qy{u\X = quantile = natural monotonicity requirements for the target functions are the following: i—> x, ^-* The 1-^ X= given of interests are (1) the conditional the entire conditional quantile process In the first case, the target function second case, the target function x ^^ fo{x) Y x) denote the u-th quantile of expectation function (CEF), (2) the conditional quantile functions given age. affect the relationship sample consists of 533 subjects almost evenly distributed across these ages. denote height and age, respectively. Let Y might The CEF x) should be increasing in age x, and the should be increasing in both age x and the quantile index x]. CQP u. estimate the target functions using non-parametric ordinary least squares or quantile regression techniques and then rearrange the estimates to satisfy the monotonicity require- ments. We consider (a) kernel, (b) locally linear, (c) regression splines, and (d) Fourier series methods. For the kernel and locally linear methods, we choose a bandwidth of one year and a box kernel. For the regression splines method, we use cubic B-splines with a knot sequence {3,5,8,10,11.5,13,14.5,16,18}, following Wei, Pere, Koenker, and He (2006). For the Fourier method, we employ eight trigonometric terms, with four the estimation of the conditional quantile process, we sines and four cosines. Finally, for use a net of two hundred quantile indices In the choice of the parameters for the different methods, {0.005,0.010, ...,0.995}. we select values that either have been used in the previous empirical work or give rise to specifications with similar complexities The panels A-D for the different of Figure 2 show the methods. original ~ ., :'' ' and rearranged estimates ' of the conditional expectation function for the different methods. All the estimated curves have trouble capturing the slowdown in the growth of height after age fifteen and yield non-monotonic curves for the highest values of age. The Fourier series performs particularly poorly in approximating the aperiodic age-height relationship and has many non-monotonicities. The rearranged estimates correct the non-monotonicity of the original estimates, providing weakly increasing curves that coincide with the original estimates in the parts where the latter are monotonic. Figure 3 displays similar but more pronounced non-monotonicity patterns conditional quantile functions. In all cases, the for the estimates of the rearrangement again performs well in delivering curves that improve upon the original estimates and that satisfy the natural monotonicity requirement. We quantify this improvement, in the next subsection. 16 A. CEF (Kernel, h = 1) B. CEF(Loc. Linear, h = 15 CEF (Regression 20 Age Age C. 1) D CEF Splines) (Fourier) 15 10 Age Figure Nonparamctric estimates of the Conditional Expectation Function (CEF) of 2. heiglit given age and their increasing rearrangements. Nonparamctric estimates are obtained using kerne! regression (A), locally linear regression (B), cubic regression B-splines series (C), and Fourier series (D). Figure 4 illustrates the multivariate rearrangement of the conditional quantile process along both the age and the quantile index arguments. 
Figure 4 illustrates the multivariate rearrangement of the conditional quantile process (CQP) along both the age and the quantile index arguments. We plot, in three dimensions, the original estimate, its age rearrangement, its quantile rearrangement, and its average multivariate rearrangement (the average of the age-quantile and quantile-age rearrangements). We also plot the corresponding contour surfaces. Here, for brevity, we focus on the Fourier series estimates, which have the most severe non-monotonicity problems. (Analogous figures for the other estimation methods considered can be found in the working paper version, Chernozhukov, Fernandez-Val, and Galichon (2006a).) Moreover, we do not show the multivariate age-quantile and quantile-age rearrangements separately, because they are very similar to the average multivariate rearrangement. We see from the contour plots that the estimated CQP is non-monotone in age and non-monotone in the quantile index at extremal values of this index. The average multivariate rearrangement fixes the non-monotonicity problem, delivering an estimate of the CQP that is monotone in both the age and the quantile index arguments. Furthermore, by the theoretical results of the paper, the multivariate rearranged estimates necessarily improve upon the original estimates.

Figure 4. Fourier series estimates of the Conditional Quantile Process (CQP) of height given age and their increasing rearrangements. Panels C and E plot the one-dimensional increasing rearrangement along the age and the quantile dimension, respectively; panel G shows the average multivariate rearrangement; panels B, D, F, and H show the corresponding contour plots.

In Figures 5 and 6, we illustrate the inference properties of the rearranged confidence intervals. Figure 5 shows 90% uniform confidence intervals for the conditional expectation function and three conditional quantile functions for the 5%, 50%, and 95% quantiles based on Fourier series estimates.

Figure 5. 90% confidence intervals for the Conditional Expectation Function (CEF), and the 5%, 50%, and 95% Conditional Quantile Functions (CQF) of height given age, and their increasing rearrangements. Nonparametric estimates are based on Fourier series and confidence bands are obtained by bootstrap with 200 repetitions.

We obtain the initial confidence intervals of the form (3.3) using the bootstrap with 200 repetitions to estimate the critical values (Hall 1993). We then obtain the rearranged confidence intervals by rearranging the lower and upper end-point functions of the initial confidence intervals, following the procedure defined in Section 3.
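A hedged sketch of one standard way to obtain such a simultaneous critical value by bootstrap is given below; the estimator function `fit_curve(data, grid)` and the pairs-bootstrap scheme are illustrative assumptions rather than the paper's exact implementation.

```r
# Simultaneous band of the form (3.3): bootstrap the maximal studentized
# deviation of the fitted curve over the grid and use its 90% quantile as the
# critical value c.
simultaneous_band <- function(data, grid, fit_curve, B = 200, level = 0.90) {
  fhat <- fit_curve(data, grid)
  boot <- replicate(B, fit_curve(data[sample(nrow(data), replace = TRUE), ], grid))
  se   <- apply(boot, 1, sd)                            # pointwise standard errors s(x)
  crit <- quantile(apply(abs(boot - fhat) / se, 2, max), level, names = FALSE)
  list(lower = fhat - crit * se, upper = fhat + crit * se)
}
# The rearranged band is then obtained by sorting each end-point function,
# as in Section 3, e.g., lapply(simultaneous_band(d, age_grid, my_fit), sort).
```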
In Figure 6, we illustrate the construction of the confidence intervals in the multidimensional case by plotting the rearranged and initial 90% uniform confidence bands for the entire conditional quantile process based on the Fourier series estimates. We see from the figures that the rearranged confidence intervals correct the non-monotonicity of the original confidence intervals and reduce their average length, as we shall verify numerically in the next section.

Figure 6. 90% confidence bands for the Conditional Quantile Process (CQP) of height given age, based on Fourier series estimates, and their increasing rearrangements.

4.2. Monte-Carlo Illustration. The following Monte Carlo experiment quantifies the improvement in the point and interval estimation that rearrangement provides relative to the original estimates. We also compare rearrangement to isotonization and to convex combinations of rearrangement and isotonization. Our experiment closely matches the empirical application presented above. Specifically, we consider a design where the outcome variable $Y$ equals a location function plus a disturbance $\varepsilon$,
\[ Y = Z(X)'\beta + \varepsilon, \]
and the disturbance is independent of the regressor $X$. The vector $Z(X)$ includes a constant and a piecewise linear transformation of the regressor $X$ with three changes of slope, namely
\[ Z(X) = \big( 1,\ X,\ 1\{X > 5\}(X - 5),\ 1\{X > 10\}(X - 10),\ 1\{X > 15\}(X - 15) \big). \]
This design implies the conditional expectation function
\[ E[Y \mid X] = Z(X)'\beta, \tag{4.1} \]
and the conditional quantile function
\[ Q_Y(u \mid X) = Z(X)'\beta + Q_\varepsilon(u). \tag{4.2} \]

We select the parameters of the design to match the empirical example of growth charts in the previous subsection. Thus, we set the parameter $\beta$ equal to the ordinary least squares estimate obtained in the growth chart data, namely $(71.25,\ 8.13,\ -2.72,\ 1.78,\ -6.43)$. This value of the parameter and the location specification (4.2) imply a model for the CEF and CQP that is monotone in age over the range of ages 2-20. To generate the values of the dependent variable, we draw disturbances from a normal distribution with mean and variance equal to the mean and variance of the estimated least squares residuals, $\varepsilon = Y - Z(X)'\beta$, in the growth chart data. We fix the values of the regressor $X$ in all of the replications to be the observed values of age in the data set. In each replication, we estimate the CEF and CQP using the nonparametric methods described in the previous section, along with a global polynomial and a flexible Fourier method. For the global polynomial method, we fit a quartic polynomial. For the flexible Fourier method, we use a quadratic polynomial and four trigonometric terms, with two sines and two cosines.
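The design just described can be sketched as follows; the residual standard deviation and the sample of ages are placeholders for the values taken from the growth chart data.

```r
# One Monte Carlo replication from the design of Section 4.2:
# Y = Z(X)'beta + eps, with Z(X) piecewise linear in age and slope changes
# at 5, 10, and 15 years.
Z <- function(x) cbind(1, x, pmax(x - 5, 0), pmax(x - 10, 0), pmax(x - 15, 0))

beta  <- c(71.25, 8.13, -2.72, 1.78, -6.43)  # OLS values from the growth data
sigma <- 5                                    # placeholder residual sd
ages  <- runif(533, 2, 20)                    # placeholder for the observed ages

y_sim <- as.vector(Z(ages) %*% beta) + rnorm(length(ages), sd = sigma)
# The implied CEF, E[Y | X = x] = Z(x)'beta, is weakly increasing on [2, 20].
```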
In Table 1 we report the average $L^p$ errors (for $p = 1, 2,$ and $\infty$) for the original estimates of the CEF. We also report the relative efficiency of the rearranged estimates, measured as the ratio of the average error of the rearranged estimate to the average error of the original estimate, together with the relative efficiencies for an alternative approach based on isotonization of the original estimates, and for an approach consisting of averaging the rearranged and isotonized estimates. The two-step approach based on isotonization corresponds to the SI estimator in Mammen (1991), where the isotonization step is carried out using the pool-adjacent-violators algorithm (PAVA). For regression splines, we also consider the one-step monotone regression splines of Ramsay (1998).

Table 1. $L^p$ Estimation Errors of the Original, Rearranged, Isotonized, Average Rearranged-Isotonized, and Monotone Regression Splines Estimates of the Conditional Expectation Function, for $p = 1, 2, \infty$. Univariate Case.

                 A. Kernel                                B. Locally Linear
  p    L^p_O  L^p_R/L^p_O  L^p_I/L^p_O  L^p_(R+I)/2/L^p_O  L^p_O  L^p_R/L^p_O  L^p_I/L^p_O  L^p_(R+I)/2/L^p_O
  1     1.00     0.97         0.98           0.98           0.79     0.96         0.97           0.96
  2     1.30     0.98         0.99           0.98           0.99     0.96         0.97           0.97
  inf   4.54     0.99         1.00           1.00           2.93     0.95         0.95           0.95

                 C. Regression Splines                                 D. Quartic
  p    L^p_O  L^p_R/L^p_O  L^p_I/L^p_O  L^p_(R+I)/2/L^p_O  L^p_M/L^p_O  L^p_O  L^p_R/L^p_O  L^p_I/L^p_O  L^p_(R+I)/2/L^p_O
  1     0.87     0.93         0.95           0.94            0.99       1.33     0.89         0.87           0.87
  2     1.09     0.93         0.95           0.94            0.99       1.64     0.89         0.88           0.87
  inf   3.68     0.85         0.88           0.86            0.84       4.38     0.86         0.86           0.86

                 E. Fourier                               F. Flexible Fourier
  p    L^p_O  L^p_R/L^p_O  L^p_I/L^p_O  L^p_(R+I)/2/L^p_O  L^p_O  L^p_R/L^p_O  L^p_I/L^p_O  L^p_(R+I)/2/L^p_O
  1     6.57     0.49         0.59           0.40           0.73     0.97         0.99           0.98
  2     10.8     0.35         0.45           0.30           0.91     0.98         0.99           0.98
  inf   48.9     0.16         0.34           0.20           2.40     0.98         0.98           0.98

Notes: The table is based on 1,000 replications. $L^p_O$ is the $L^p$ error of the original estimate; $L^p_R$ is the $L^p$ error of the rearranged estimate; $L^p_I$ is the $L^p$ error of the isotonized estimate; $L^p_{(R+I)/2}$ is the $L^p$ error of the average of the rearranged and isotonized estimates; $L^p_M$ is the $L^p$ error of the monotone regression splines estimate. The algorithm for the monotone regression splines stopped with an error message in 6 cases; these cases were discarded for all the estimators.

We calculate the average $L^p$ error as the Monte Carlo average of
\[ L^p = \Big[ \int_{\mathcal{X}} |f(x) - f_0(x)|^p\, dx \Big]^{1/p}, \]
where the target function $f_0(x)$ is the CEF $E[Y \mid X = x]$, and the estimate $f(x)$ denotes either the original nonparametric estimate of the CEF or its increasing transformation.

For all the methods considered, we find that the rearranged curves estimate the true CEF more accurately than the original curves, providing a 1% to 84% reduction in the average error, depending on the method and the norm (i.e., the value of $p$). In this example, there is no uniform winner between rearrangement and isotonic regression. The rearrangement works better than isotonization for the kernel, locally linear, splines, Fourier, and flexible Fourier estimates, but worse than isotonization for quartic polynomials in some norms. Averaging the two procedures seems to be a good compromise for all the estimation methods considered. For regression splines, the performance of the rearrangement is comparable to the more computationally intensive one-step monotone splines procedure.

In Table 2 we report the average $L^p$ errors for the original estimates of the conditional quantile process. We also report the ratio of the average error of the multivariate rearranged estimate, with respect to the age and quantile index arguments, to the average error of the original estimate, together with the same ratios for the multivariate isotonized and the average rearranged-isotonized estimates. The multivariate isotonized estimates are obtained by sequentially applying the PAVA to the two arguments, and then averaging over the two possible orderings, age-quantile and quantile-age.
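Before turning to the multivariate results, the error measure above can be computed on a grid as in the following sketch, which reuses the univariate objects from the earlier illustrative examples.

```r
# L^p distance between an estimate and the target on a grid (up to the
# normalization by the domain length, which cancels in the reported ratios).
# p = Inf gives the sup-norm error.
lp_error <- function(f, f0, p) {
  if (is.infinite(p)) return(max(abs(f - f0)))
  mean(abs(f - f0)^p)^(1 / p)
}

sapply(c(1, 2, Inf), function(p)
  c(original = lp_error(f_hat, f0, p), rearranged = lp_error(sort(f_hat), f0, p)))
```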
Table 2. $L^p$ Estimation Errors of the Original, Rearranged, Isotonized, and Average Rearranged-Isotonized Estimates of the Conditional Quantile Process, for $p = 1, 2, \infty$. Multivariate Case.

                 A. Kernel                                B. Locally Linear
  p    L^p_O  L^p_R/L^p_O  L^p_I/L^p_O  L^p_(R+I)/2/L^p_O  L^p_O  L^p_R/L^p_O  L^p_I/L^p_O  L^p_(R+I)/2/L^p_O
  1     1.49     0.95         0.97           0.96           1.21     0.91         0.93           0.92
  2     1.99     0.96         0.98           0.97           1.61     0.91         0.93           0.92
  inf   13.7     0.92         0.97           0.94           12.3     0.84         0.87           0.85

                 C. Splines                               D. Quartic
  1     1.33     0.90         0.93           0.91           1.49     0.90         0.89           0.89
  2     1.78     0.90         0.92           0.90           1.87     0.90         0.89           0.89
  inf   16.9     0.72         0.76           0.73           12.6     0.68         0.69           0.68

                 E. Fourier                               F. Flexible Fourier
  1     6.72     0.62         0.77           0.64           1.05     0.96         0.97           0.96
  2     13.7     0.39         0.58           0.44           1.38     0.95         0.97           0.96
  inf   84.9     0.26         0.47           0.36           10.9     0.84         0.86           0.85

Notes: The table is based on 1,000 replications. $L^p_O$ is the $L^p$ error of the original estimate; $L^p_R$ is the $L^p$ error of the average multivariate rearranged estimate; $L^p_I$ is the $L^p$ error of the average multivariate isotonized estimate; $L^p_{(R+I)/2}$ is the $L^p$ error of the average of the average multivariate rearranged and isotonized estimates.

The average $L^p$ error is the Monte Carlo average of
\[ L^p = \Big[ \int_{\mathcal{U}} \int_{\mathcal{X}} |f(u,x) - f_0(u,x)|^p\, dx\, du \Big]^{1/p}, \]
where the target function $f_0(u,x)$ is the conditional quantile process $Q_Y(u \mid X = x)$, and the estimate $f(u,x)$ denotes either the original nonparametric estimate of the conditional quantile process or its monotone transformation. We present the results for the average multivariate rearrangement only. The age-quantile and quantile-age multivariate rearrangements give errors that are very similar to their average multivariate rearrangement, and we therefore do not report them separately.

For all the methods considered, we find that the multivariate rearranged curves estimate the true CQP more accurately than the original curves, providing a 4% to 74% reduction in the approximation error, depending on the method and the norm. As in the univariate case, there is no uniform winner between rearrangement and isotonic regression, and their average gives a good balance.

Table 3 reports Monte Carlo coverage frequencies and integrated lengths for the original and monotonized 90% confidence bands for the CEF. For a measure of length, we used the integrated $L^p$ length, as defined in Proposition 3, with $p = 1, 2,$ and $\infty$. We constructed the original confidence intervals of the form in equation (3.3) by obtaining the pointwise standard errors of the original estimates using the bootstrap with 200 repetitions, and we calibrated the critical value so that the original confidence bands cover the entire true function with the exact frequency of 90%. We constructed the monotonized confidence intervals by applying rearrangement, isotonization, and the rearrangement-isotonization average to the end-point functions of the original confidence intervals, as suggested in Section 3. Here, we find that in all cases the rearrangement and the other monotonization methods increase the coverage of the confidence intervals while reducing their length. In particular, we see that monotonization increases coverage especially for the local estimation methods, whereas it reduces length most noticeably for the global estimation methods. For the most problematic Fourier estimates, there are both important increases in coverage and reductions in length.
Table 3. Coverage Probabilities and Integrated Lengths of the Original, Rearranged, Isotonized, and Average Rearranged-Isotonized 90% Confidence Intervals for the Conditional Expectation Function.

                 A. Kernel                                       B. Locally Linear
  Interval   Cover  Length  L^1/L^1_O  L^2/L^2_O  L^inf/L^inf_O  Cover  Length  L^1/L^1_O  L^2/L^2_O  L^inf/L^inf_O
  O           .90    8.80                                         .90    8.63
  R           .96    8.79      1          1           .99         .96    8.63      1          1           .97
  I           .94    8.80      1          1           .99         .94    8.63      1          1           .98
  (R+I)/2     .95    8.80      1          1           .99         .95    8.63      1          1           .97

                 C. Splines                                      D. Quartic
  O           .90    6.32                                         .90   10.43
  R           .91    6.32      1          1           1           .90   10.41      1          .99         .93
  I           .91    6.32      1          1           1           .90   10.43      1          1           .93
  (R+I)/2     .91    6.32      1          1           1           .90   10.42      1          1           .93

                 E. Fourier                                      F. Flexible Fourier
  O           .90   24.91                                         .90    6.45
  R          1.00   24.52      .98        .94         .63         .90    6.45      1          1           .97
  I          1.00   24.91      1          .97         .69         .90    6.45      1          1           .97
  (R+I)/2    1.00   24.71      .99        .95         .65         .90    6.45      1          1           .97

Notes: The table is based on 1,000 replications. O, R, I, and (R+I)/2 refer to the original, rearranged, isotonized, and average rearranged-isotonized confidence intervals. Coverage probabilities (Cover) are for the entire function. Original confidence intervals are calibrated to have 90% coverage probabilities. Length is the integrated $L^1$ length, and $L^p/L^p_O$ is the ratio of the integrated $L^p$ length of the monotonized interval to that of the original interval.

Appendix A. Proofs of Propositions

A.1. Proof of Proposition 1. The first part establishes the weak inequality, following in part the strategy in Lorentz's (1953) proof. The second part establishes the strong inequality; its proof focuses directly on obtaining the result stated in the proposition.

Proof of Part 1. We assume at first that the functions $f(\cdot)$ and $f_0(\cdot)$ are simple functions, constant on the intervals $((s-1)/r, s/r]$, $s = 1, \ldots, r$. Let us define the sorting operator $S(f)$ as follows: for any simple function $f(\cdot)$ with $r$ steps, let $f$ denote the $r$-vector with the $s$-th element, denoted $f_s$, equal to the value of $f(\cdot)$ on the $s$-th interval. Let $\ell$ be an integer in $1, \ldots, r$ such that $f_\ell > f_m$ for some $m > \ell$. If $\ell$ does not exist, set $S(f) = f$. If $\ell$ exists, set $S(f)$ to be the $r$-vector with the $\ell$-th element equal to $f_m$, the $m$-th element equal to $f_\ell$, and all other elements equal to the corresponding elements of $f$. For any submodular function $L : \mathbb{R}^2 \to \mathbb{R}_+$, since $f_{0m} \ge f_{0\ell}$ and $f_\ell > f_m$, we have by the definition of submodularity that
\[ L(f_m, f_{0\ell}) + L(f_\ell, f_{0m}) \le L(f_\ell, f_{0\ell}) + L(f_m, f_{0m}). \]
Therefore, we conclude that
\[ \int_{\mathcal{X}} L(S(f)(x), f_0(x))\, dx \le \int_{\mathcal{X}} L(f(x), f_0(x))\, dx. \]
Applying the sorting operation a sufficient finite number of times to $f$, we obtain a completely sorted, that is, rearranged, vector $f^*$. Thus, we can express $f^*$ as a finite composition $f^* = S \circ \cdots \circ S(f)$. By repeating the argument above, each composition weakly reduces the approximation error, so that
\[ \int_{\mathcal{X}} L(f^*(x), f_0(x))\, dx \le \int_{\mathcal{X}} L(S \circ \cdots \circ S(f)(x), f_0(x))\, dx \le \int_{\mathcal{X}} L(f(x), f_0(x))\, dx. \tag{A.1} \]
Furthermore, this inequality is extended to general measurable functions $f(\cdot)$ and $f_0(\cdot)$ mapping $\mathcal{X}$ to $K$ by taking a sequence of bounded simple functions $f^{(r)}(\cdot)$ and $f_0^{(r)}(\cdot)$ converging to $f(\cdot)$ and $f_0(\cdot)$ almost everywhere as $r \to \infty$. The almost everywhere convergence of $f^{(r)}(\cdot)$ to $f(\cdot)$ implies the almost everywhere convergence of its quantile function $f^{(r)*}(\cdot)$ to the quantile function of the limit, $f^*(\cdot)$. Since inequality (A.1) holds along the sequence, the dominated convergence theorem implies that (A.1) also holds for the general case. □

Proof of Part 2. We first consider the case of simple functions, as defined in the proof of Part 1. We take the functions to satisfy the hypotheses stated in the proposition: there exist regions $\mathcal{X}_0$ and $\mathcal{X}_0'$, each of measure greater than $\delta > 0$, such that for all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}_0'$, we have that (i) $x' > x$, (ii) $f(x) > f(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$, for some $\epsilon > 0$. For any strictly submodular function $L : \mathbb{R}^2 \to \mathbb{R}_+$ we have that
\[ \eta = \inf\{ L(v', t) + L(v, t') - L(v, t) - L(v', t') \} > 0, \]
where the infimum is taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$. A simple graphical illustration for this property is given in Figure 1.
A.2. Proof of Proposition 2. The proof consists of the following four parts.

Proof of Part 1. We prove the claim by induction. The claim is true for d = 1 by f^*(x) being a quantile function. Suppose the claim is true in d − 1 dimensions; then consider any d ≥ 2. Suppose that the estimate f(x_j, x_{-j}) is weakly increasing in x_{-j} for each x_j (by the induction hypothesis, this is the case for the estimate produced by the first d − 1 rearrangements). Then the estimate f_j^*(x_j, x_{-j}), obtained from the original estimate f(x) after applying the rearrangement to x_j ↦ f(x_j, x_{-j}), holding x_{-j} fixed, is weakly increasing in all of its arguments. Indeed, for each x_{-j}, the function x_j ↦ f_j^*(x_j, x_{-j}) is weakly increasing by virtue of being a quantile function. Moreover, for any arguments x'_{-j} ≥ x_{-j}, we have that

f(X_j, x'_{-j}) \ge f(X_j, x_{-j}) \quad \text{for } X_j \sim U[0, 1],   (A.2)

so the random variable on the left of (A.2) dominates the random variable on the right of (A.2) in the stochastic sense. Therefore, the quantile function of the random variable on the left of (A.2) dominates the quantile function of the random variable on the right, namely

f_j^*(x_j, x'_{-j}) \ge f_j^*(x_j, x_{-j}) \quad \text{for all points } x \in \mathcal{X}^d.   (A.3)

We conclude therefore that x ↦ f_j^*(x) is weakly increasing in all arguments. The claim of Part 1 of the proposition now follows by induction. □

Proof of Part 2 (a). By Proposition 1, we have that, for each x_{-j},

\int_{\mathcal{X}} |f_j^*(x_j, x_{-j}) - f_0(x_j, x_{-j})|^p \, dx_j \le \int_{\mathcal{X}} |f(x_j, x_{-j}) - f_0(x_j, x_{-j})|^p \, dx_j.   (A.4)

Now the claim follows by integrating (A.4) with respect to x_{-j} and taking the p-th root of both sides. For p = ∞, the claim follows by taking the limit as p → ∞. □

Proof of Part 2 (b). We apply the inequality of Part 2(a) first to f(x), then to R_{π_1} ∘ f(x), then to R_{π_2} ∘ R_{π_1} ∘ f(x), and so on. In doing so, we recursively generate a sequence of weak inequalities that imply the inequality (2.6) stated in the proposition. □

Proof of Part 3 (a). For each x_{-j} ∈ X^{d-1} \ X_{-j}, by Part 2(a), we have the weak inequality (A.4). For each x_{-j} ∈ X_{-j}, of measure ν, by the inequality for the univariate case stated in Proposition 1, we have the strong inequality

\int_{\mathcal{X}} |f_j^*(x_j, x_{-j}) - f_0(x_j, x_{-j})|^p \, dx_j \le \int_{\mathcal{X}} |f(x_j, x_{-j}) - f_0(x_j, x_{-j})|^p \, dx_j - \eta_p \delta,   (A.5)

where η_p is defined in the same way as in Proposition 1. Integrating the weak inequality (A.4) over X^{d-1} \ X_{-j} and the strong inequality (A.5) over X_{-j}, of measure ν, we obtain

\int_{\mathcal{X}^d} |f_j^*(x) - f_0(x)|^p \, dx \le \int_{\mathcal{X}^d} |f(x) - f_0(x)|^p \, dx - \nu \eta_p \delta.   (A.6)

The claim now follows. □

Proof of Part 3 (b). As in Part 2(a), we can recursively obtain a sequence of weak inequalities describing the improvements in approximation error from rearranging sequentially with respect to the individual arguments. Moreover, at least one of these inequalities can be strengthened to be of the form stated in (A.6), under the assumption of the claim. The resulting system of inequalities yields the inequality (2.8) stated in the proposition. □

Proof of Part 4. We can write

\left[ \int_{\mathcal{X}^d} \big| \bar{f}^*(x) - f_0(x) \big|^p dx \right]^{1/p} = \left[ \int_{\mathcal{X}^d} \Big| \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \big( f_\pi^*(x) - f_0(x) \big) \Big|^p dx \right]^{1/p} \le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[ \int_{\mathcal{X}^d} \big| f_\pi^*(x) - f_0(x) \big|^p dx \right]^{1/p},   (A.7)

where the last inequality follows by pulling out 1/|Π| and then applying the triangle inequality for the L^p norm. □
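As a concrete illustration of the objects appearing in this proof, the following R sketch (a toy bivariate example; the matrix, its grids, and the helper names are ours, not the paper's code) computes the two π-rearrangements of a non-monotone estimate f(x_1, x_2) stored on a grid, together with their average, and checks that the results are weakly increasing in both arguments.

    ## Toy sketch of the multivariate rearrangement for d = 2, with the estimate
    ## stored as a matrix Fm[i, j] = fhat(x1[i], x2[j]) on increasing grids x1, x2.
    sort_arg1 <- function(A) apply(A, 2, sort)       # rearrange in x1: sort within each column
    sort_arg2 <- function(A) t(apply(A, 1, sort))    # rearrange in x2: sort within each row

    set.seed(3)
    Fm <- matrix(rnorm(25), 5, 5)                    # toy non-monotone estimate on a 5 x 5 grid

    F_12  <- sort_arg2(sort_arg1(Fm))                # pi-rearrangement with ordering (1, 2)
    F_21  <- sort_arg1(sort_arg2(Fm))                # pi-rearrangement with ordering (2, 1)
    F_avg <- (F_12 + F_21) / 2                       # average multivariate rearrangement

    ## Each of F_12, F_21, and F_avg is weakly increasing in both arguments:
    all(diff(F_avg) >= 0) && all(diff(t(F_avg)) >= 0)   # TRUE

For d = 2 the set Π contains only the two orderings used here, so the average multivariate rearrangement is simply the mean of F_12 and F_21; its approximation-error bound is the content of Part 4.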
A.3. Proof of Proposition 3.

Proof of Part 1. The monotonicity follows from Proposition 2. The rest of the proof relies on establishing the order-preserving property of the π-rearrangement operator: for any measurable functions g, m : X^d → K,

g(x) \le m(x) \ \text{for all } x \in \mathcal{X}^d \quad \text{implies} \quad g^*(x) \le m^*(x) \ \text{for all } x \in \mathcal{X}^d.   (A.8)

Given this property, we have that ℓ(x) ≤ f(x) ≤ u(x) for all x ∈ X^d implies ℓ^*(x) ≤ f^*(x) ≤ u^*(x) for all x ∈ X^d, which verifies the claim of the first part. The claim also extends to the average multivariate rearrangement, since averaging preserves the order-preserving property.

It remains to establish the order-preserving property (A.8) for the π-rearrangement, which we do by induction. We first note that in the univariate case, when d = 1, order preservation is obvious from the rearrangement being a quantile function: the random variable m(X), where X ~ U[0, 1], dominates the random variable g(X) in the stochastic sense, hence the quantile function m^*(x) of m(X) must be greater than the quantile function g^*(x) of g(X) for each x ∈ X. We then extend this to the multivariate case by induction. Suppose the order-preserving property is true in d − 1 dimensions, and consider any d ≥ 2. For each x_{-j}, the relation g(x_j, x_{-j}) ≤ m(x_j, x_{-j}) for all x_j implies, by the order-preserving property of the univariate rearrangement applied to the univariate functions x_j ↦ g(x_j, x_{-j}) and x_j ↦ m(x_j, x_{-j}), holding x_{-j} fixed, that the corresponding rearranged functions remain ordered. Then, for each x_j ∈ X, we apply the rearrangements with respect to x_{-j}, holding x_j fixed, to the resulting functions x_{-j} ↦ g(x_j, x_{-j}) and x_{-j} ↦ m(x_j, x_{-j}), and use the induction hypothesis to conclude that (A.8) holds. □

Proof of Part 2. As stated in the text, the weak inequality is due to Lorentz (1953). For completeness, we only briefly note that the proof is similar to the proof of Proposition 1. Indeed, we start with simple functions ℓ(·) and u(·) and work with their equivalent vector representations (ℓ, u). By the definition of submodularity (2.3), each application of the two-point sort operation S reduces submodular discrepancies between vectors, so that the pairs of vectors in the sequence {(ℓ, u), (S(ℓ), S(u)), ..., (S ∘ ... ∘ S(ℓ), S ∘ ... ∘ S(u)), (ℓ^*, u^*)} become progressively weakly closer to each other, where the last pair (ℓ^*, u^*) is the rearrangement of the vectors (ℓ, u). The sequence can be taken to be finite, and the weak inequality extends to general bounded measurable functions by passing to the limit and using the dominated convergence theorem. To extend the proof to the multivariate case, we apply exactly the same induction strategy as in the proof of Proposition 2. □

Proof of Part 3. Finally, the proof of the strict inequality in the univariate case is similar to the proof of Proposition 2, using the fact that for strictly submodular functions L : K^2 → R_+ we have that

\eta = \inf \big\{ L(v', t) + L(v, t') - L(v, t) - L(v', t') \big\} > 0,

where the infimum is taken over all v, v', t, t' in the set K such that v' ≥ v + ε and t' ≥ t + ε, or such that v ≥ v' + ε and t ≥ t' + ε. The extension of the strict inequality to the multivariate case follows exactly as in the proof of Proposition 2. □
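The order-preserving property (A.8) that drives Part 1 is easy to check numerically in the univariate case. The short R sketch below (toy vectors of grid values, not from the paper) confirms that if one function lies weakly below another at every grid point, the same ordering holds after rearranging both.

    ## Toy check of the order-preserving property (A.8) in the univariate case:
    ## pointwise ordering of two functions on a grid is preserved by sorting.
    set.seed(2)
    f  <- rnorm(50)                    # grid values of an arbitrary function
    lo <- f - abs(rnorm(50))           # a function lying weakly below f
    up <- f + abs(rnorm(50))           # a function lying weakly above f
    all(sort(lo) <= sort(f)) && all(sort(f) <= sort(up))   # TRUE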
References

Andrews, D. W. K. (1991): "Asymptotic normality of series estimators for nonparametric and semiparametric regression models," Econometrica, 59(2), 307-345.

Ayer, M., H. D. Brunk, G. M. Ewing, W. T. Reid, and E. Silverman (1955): "An empirical distribution function for sampling with incomplete information," Ann. Math. Statist., 26, 641-647.

Barlow, R. E., D. J. Bartholomew, J. M. Bremner, and H. D. Brunk (1972): Statistical inference under order restrictions. The theory and application of isotonic regression. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, London-New York-Sydney.

Chaudhuri, P. (1991): "Nonparametric estimates of regression quantiles and their local Bahadur representation," Ann. Statist., 19, 760-777.

Chernozhukov, V., I. Fernandez-Val, and A. Galichon (2006a): "Improving Approximations of Monotone Functions by Rearrangement," MIT and UCL CEMMAP Working Paper, available at www.arxiv.org, cemmap.ifs.org.uk, www.ssrn.com.

——— (2006b): "Quantile and Probability Curves without Crossing," MIT and UCL CEMMAP Working Paper, available at www.arxiv.org, cemmap.ifs.org.uk, www.ssrn.com.

——— (2006c): "Rearranging Edgeworth-Cornish-Fisher Expansions," MIT and UCL CEMMAP Working Paper, available at www.arxiv.org, cemmap.ifs.org.uk, www.ssrn.com.

Cole, T. J. (1988): "Fitting smoothed centile curves to reference data," J. Royal Stat. Soc., 151, 385-418.

Davydov, Y., and R. Zitikis (2005): "An index of monotonicity and its estimation: a step beyond econometric applications of the Gini index," Metron, 63(3), 351-372.

Dette, H., N. Neumeyer, and K. F. Pilz (2006): "A simple nonparametric estimator of a strictly monotone regression function," Bernoulli, 12(3), 469-490.

Dette, H., and K. F. Pilz (2006): "A comparative study of monotone nonparametric kernel estimates," J. Stat. Comput. Simul., 76(1), 41-56.

Dette, H., and R. Scheder (2006): "Strictly monotone and smooth nonparametric regression for two or more variables," The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 34(4), 535-561.

Fan, J., and I. Gijbels (1996): Local polynomial modelling and its applications, vol. 66 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.

Fougeres, A.-L. (1997): "Estimation de densites unimodales," The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 25(3), 375-387.

Gallant, A. R. (1981): "On the bias in flexible functional forms and an essentially unbiased form: the Fourier flexible form," J. Econometrics, 15(2), 211-245.

Genovese, C., and L. Wasserman (2008): "Adaptive confidence bands," Ann. Statist., 36(2), 875-905.

Hall, P. (1993): "On Edgeworth Expansion and Bootstrap Confidence Bands in Nonparametric Curve Estimation," Journal of the Royal Statistical Society, Series B (Methodological), 55(1), 291-304.

Hardy, G. H., J. E. Littlewood, and G. Polya (1952): Inequalities. Cambridge University Press, 2d ed.

He, X., and Q.-M. Shao (2000): "On parameters of increasing dimensions," J. Multivariate Anal., 73(1), 120-135.

Johansen, S., and I. M. Johnstone (1990): "Hotelling's theorem on the volume of tubes: some illustrations in simultaneous inference and data analysis," Ann. Statist., 18(2), 652-684.

Knight, K. (2002): "What are the limiting distributions of quantile estimators?," in Statistical data analysis based on the L1-norm and related methods (Neuchatel, 2002), Stat. Ind. Technol., pp. 47-65. Birkhauser, Basel.

Koenker, R. (2008): quantreg: Quantile Regression. R package version 4.17.

Koenker, R., and G. S. Bassett (1978): "Regression quantiles," Econometrica, 46, 33-50.

Koenker, R., and P. Ng (2005): "Inequality constrained quantile regression," Sankhya, 67(2), 418-440.

Lehmann, E. L., and J. P. Romano (2005): Testing statistical hypotheses, Springer Texts in Statistics. Springer, New York, third edn.

Lorentz, G. G. (1953): "An inequality for rearrangements," Amer. Math. Monthly, 60, 176-179.

Mammen, E. (1991): "Nonparametric Regression Under Qualitative Smoothness Assumptions," The Annals of Statistics, 19(2), 741-759.

Mammen, E., J. S. Marron, B. A. Turlach, and M. P. Wand (2001): "A general projection framework for constrained smoothing," Statist. Sci., 16(3), 232-248.

Matzkin, R. L. (1994): "Restrictions of economic theory in nonparametric methods," in Handbook of econometrics, Vol. IV, vol. 2 of Handbooks in Econom., pp. 2523-2558. North-Holland, Amsterdam.

Newey, W. K. (1997): "Convergence rates and asymptotic normality for series estimators," J. Econometrics, 79(1), 147-168.

Portnoy, S. (1997): "Local asymptotics for quantile smoothing splines," Ann. Statist., 25(1), 414-434.

R Development Core Team (2008): R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

Ramsay, J. O. (1988): "Monotone Regression Splines in Action," Statistical Science, 3(4), 425-441.

Ramsay, J. O., and B. W. Silverman (2005): Functional data analysis. Springer, New York, second edn.

Ramsay, J. O., H. Wickham, and S. Graves (2007): fda: Functional Data Analysis. R package version 1.2.3.

Robertson, T., F. T. Wright, and R. L. Dykstra (1988): Order restricted statistical inference, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Ltd., Chichester.

Silvapulle, M. J., and P. K. Sen (2005): Constrained statistical inference, Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ.

Stone, C. J. (1994): "The use of polynomial splines and their tensor products in multivariate function estimation," Ann. Statist., 22(1), 118-184. With discussion by Andreas Buja and Trevor Hastie and a rejoinder by the author.

Villani, C. (2003): Topics in optimal transportation, vol. 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI.

Wand, M. P., and M. C. Jones (1995): Kernel smoothing, vol. 60 of Monographs on Statistics and Applied Probability. Chapman and Hall Ltd., London.

Wasserman, L. (2006): All of nonparametric statistics. Springer Texts in Statistics. Springer, New York.

Wei, Y., A. Pere, R. Koenker, and X. He (2006): "Quantile regression methods for reference growth charts," Stat. Med., 25(8), 1369-1382.