AN ASYMPTOTIC REPRESENTATION FOR M-ESTIMATORS AND LINEAR FUNCTIONS OF ORDER STATISTICS

by Raymond J. Carroll*

Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series #1006

May, 1975

* This work was supported in part by the Air Force Office of Scientific Research under Grant No. AFOSR-75-2840.

Let X_1, X_2, ... be independent observations from a distribution function F(x - θ), and let T_n be an M-estimator or a linear function of order statistics. Conditions are given for the existence of i.i.d. mean zero random variables Y_1, Y_2, ... such that

    T_n - θ = n^{-1} Σ_{i=1}^n Y_i + O(n^{-1}(log n)^2)    a.s. as n → ∞;

this representation is shown to hold for Huber and Hampel M-estimators as well as the trimmed mean. It can also be extended to a scale invariant version of the M-estimators; if F is symmetric there is no change in the representation, but the effect of asymmetry is to add in an extra random term. The results are applied to show (i) an almost sure equivalence between M-estimators and linear functions of order statistics and (ii) that a similar representation holds for adaptive estimators in the spirit of Jaeckel (1971).

AMS 1970 Subject Classifications: Primary 62G35; Secondary 62E20.

Key Words and Phrases: M-estimators, Linear functions of order statistics, Robustness, Adaptive estimators.

INTRODUCTION

Let X_1, X_2, ... be a sample of i.i.d. random variables with distribution function F(x - θ). Two basic types of robust estimators T_n of the location parameter θ are the M-estimators (Huber (1964)) and linear functions of order statistics (LFO's).
It is known (Huber (1972), Filippova (1962)) that in many cases the M-estimators satisfy

    n^{1/2}{T_n - ∫ I(x,F) dF_n(x)} → 0    in probability,

where I(x,F) is the influence curve (Hampel (1974)). de Wet and Venter (1974) show that the trimmed mean can be represented as the mean of i.i.d. random variables with an error term O(n^{-1} log_2 n) almost surely, where log_2 n denotes log log n. Jaeckel (1971a) has shown under some conditions that for a large class of M-estimators M_n there is an LFO L_n for which M_n - L_n = o(n^{-1/2}) in probability; use of the de Wet and Venter result then shows that the M-estimators corresponding to the trimmed mean can be represented as the mean of i.i.d. random variables with error term O(n^{-1} log_2 n) in probability.

Based on the above facts, it is reasonable to hypothesize almost sure linear representations for, and almost sure equivalence between, M-estimators and LFO's. In this paper we verify these hypotheses for wide classes of these estimators; the latter equivalence follows as a corollary to the representations, as does the Law of the Iterated Logarithm for these estimators. The results are given for both the symmetric and asymmetric cases. We also study (Section 3) the scale invariant versions of M-estimators and obtain a similar linearization. However, interesting things occur if F is asymmetric, and our results show precisely what is happening. The representations, besides being of interest in themselves and establishing the above equivalence, also have other applications; for example, we use them to derive in a straightforward fashion almost sure linear representations for adaptive estimators (Jaeckel (1971b)) and certain random means (Shorack (1974)).

In Section 2, the almost sure linearization of M-estimators is given. Not only does the approach work for the monotone ψ functions (Huber (1964)), but it provides results for non-monotone ψ functions such as the important Hampel ψ function (Huber (1972), Hampel (1974)).
The approach does not depend on the symmetry of F.

In Section 3, the results are extended to scale invariant M-estimators. Andrews, et al. (1972, page 40) indicate that in this case the influence curve can become quite complicated unless F is symmetric. Our results show quite clearly what happens; the effect of estimating the scale parameter ξ by s_n is to introduce an extra term into the representation, of the order

    (s_n - ξ) n^{-1} Σ_{i=1}^n X_i ψ'(X_i).

If F is symmetric, this term will be of sufficiently small order to drop out of the representation. Interestingly enough, if F is asymmetric, this term plays a definite role in the asymptotic distribution.

In Section 4, we extend the de Wet and Venter (1974) representation to a fairly large class of LFO's. The proof relies heavily on a result of Moore (1968).

In Section 5, three applications of the representations are given. First of all, an almost sure equivalence result between M-estimators and LFO's is given. Then the representation is applied to the problem of adaptive robust estimation. Jaeckel (1971b) considers estimating the optimal trimming proportion for the trimmed mean. We use the same idea to estimate the best value of k for the Huber estimates. Under conditions analogous to those in Jaeckel (1971b), the representation theorem yields a simple proof that these adaptive M-estimators (as well as adaptive trimmed means) also satisfy the representation. Finally, the representation is applied to the problem of random means (Shorack (1974)).

The results of this paper can be extended to the one-step versions of M-estimators (see Bickel (1975), Andrews, et al. (1972)). This will be the subject of a later paper. Finally, it should be remarked that Carroll (1974), in connection with sequential selection procedures, has obtained in a different manner expansions of a somewhat allied nature to those given here for a specific subclass of M-estimators and LFO's.
These expansions are uniform over neighborhoods of a specific distribution function F; however, the results are not of the generality or precision given here. Although the conditions on F in the above paper are quite weak, the class of estimators is fairly restrictive; for example, the Huber and Hampel M-estimators are not covered by the results. In addition, it is not possible to obtain from Carroll (1974) the equivalence theorem and representation for adaptive estimators shown in Section 5. The uniformity mentioned above can also be obtained from our approach.

M-ESTIMATORS

Suppose X_1, X_2, ... are i.i.d. with a distribution function F(x - θ). The M-estimators were proposed by Huber (1964) as robust estimates of the location parameter θ (see also Huber (1972)); in Andrews et al. (1972), Hampel (see also Hampel (1974)) proposed the so-called descending three-step estimators, which were shown to have very good performance. The basic version of the M-estimators (see the next section for the scale-invariant version) is defined as a solution T_n to

(2.1)    0 = n^{-1} Σ_{i=1}^n ψ(X_i - T_n).

The minimax estimator proposed by Huber (1964) has the ψ function

(2.2)    ψ_0(x) = x           for |x| ≤ k,
         ψ_0(x) = k sign(x)   for |x| > k.

The Hampel estimates are defined for x ≥ 0 by

(2.3)    ψ_1(x) = x                   for 0 ≤ x < a,
         ψ_1(x) = a                   for a ≤ x < b,
         ψ_1(x) = a(c - x)/(c - b)    for b ≤ x < c,
         ψ_1(x) = 0                   for c ≤ x,

with ψ_1(-x) = -ψ_1(x).

In this section, under fairly weak conditions on ψ and F (which basically involve F being smooth at points where ψ' does not exist, and which are satisfied by ψ_0, ψ_1 above), conditions are given under which, with a = E_F ψ'(X),

(2.4)    T_n - θ = a^{-1} n^{-1} Σ_{i=1}^n ψ(X_i - θ) + O(n^{-1}(log n)^2)    a.s.,

where a_n = O(b_n) a.s. means that a_n/b_n is almost surely bounded as n → ∞, and a_n = o(b_n) a.s. means that a_n/b_n → 0 almost surely as n → ∞.

The result (2.4) naturally yields (if E_F ψ(X) = 0) asymptotic normality and a Law of the Iterated Logarithm for T_n. It also shows why the M-estimators are robust: they are basically a mean of bounded random variables, weighted and truncated by the ψ function.
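As a concrete illustration (ours, not part of the original paper), the defining equation (2.1) with the Huber ψ of (2.2) and the Hampel ψ of (2.3) can be sketched as follows; the cutoffs k, a, b, c and the bisection scheme are illustrative choices only.

```python
import numpy as np

def psi_huber(x, k=1.5):
    """Huber's psi (2.2): psi(x) = x on [-k, k], = k sign(x) outside."""
    return np.clip(x, -k, k)

def psi_hampel(x, a=2.0, b=4.0, c=8.0):
    """Hampel's three-part descending psi (2.3), extended by psi(-x) = -psi(x)."""
    ax = np.abs(x)
    out = np.where(ax < a, ax,
          np.where(ax < b, a,
          np.where(ax < c, a * (c - ax) / (c - b), 0.0)))
    return np.sign(x) * out

def m_estimate(x, psi, tol=1e-10):
    """Solve 0 = n^{-1} sum_i psi(X_i - T) for T by bisection, as in (2.1).

    For the non-monotone Hampel psi this finds only *a* root; following
    Corollary 2.2 one would take the root closest to the sample median.
    """
    lo, hi = np.min(x), np.max(x)   # mean psi is >= 0 at lo and <= 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(psi(x - mid)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=500)   # location theta = 1
t_huber = m_estimate(x, psi_huber)
t_hampel = m_estimate(x, psi_hampel)
```

Both estimates should fall close to the true location; the defining equation is satisfied to within the bisection tolerance.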
Since the first term on the right-hand side of (2.4) is ∫ IF(x - θ) dF_n(x) (where IF(x) is the influence function), this verifies again the value of the influence function. This also verifies that the M-estimator formed by ψ_0 in (2.2) acts like the Winsorized mean, as it was designed to act. It also shows that ψ_1, given by (2.3), is actually not particularly influenced by outliers.

There are two basic steps in the proof of (2.4). First, it is shown that n^{1/2}(log n)^{-1}(T_n - θ) → 0 (a.s.). Then, (2.1) is essentially expanded in a Taylor series, although the details are somewhat messy because (2.2) and (2.3) are not differentiable. Assuming (without loss of generality) that ∫ ψ(x) dF(x) = 0, consider the following possible conditions:

(A1) ψ satisfies a uniform Lipschitz condition of order one and, except at a finite number of points {a_1, ..., a_k}, has continuous, bounded derivatives.

(A2) F satisfies a Lipschitz condition of order one in neighborhoods of {a_1, ..., a_k} and θ = 0.

(A3) If λ(θ) = ∫ ψ(x - θ) dF(x), then λ(θ) has a unique zero at θ = 0. In addition, E_F ψ'(X) = ∫ ψ'(x) dF(x) ≠ 0, and |λ(θ)| increases in some neighborhood about 0.

(A4) There is a closed interval [-b, b] such that T_n is in this interval almost surely as n → ∞ if θ = 0.

(B1) λ(θ) has only a finite number of zeros and E_F ψ'(X) ≠ 0.

(B2) The set of points of increase of F is an interval, λ(θ) has only a finite number of zeros on this interval, and E_F ψ' ≠ 0.

The conditions (B1) or (B2) will be used in place of (A3) if ψ is non-monotone. The condition (A1) is slightly stronger than that of Jaeckel (1971a) but not significantly so, and (A3) is slightly weaker. The condition (A4) must be verified in each case.

Theorem 2.1.
If X_1, X_2, ... have distribution function F(x - θ), then under (A1) - (A4),

(2.5)    n^{1/2}(log n)^{-1}(T_n - θ) → 0    a.s.

The result (2.5) still holds if, instead of (A3), the following is true:

(A3)* λ(θ) is continuously differentiable, and either (B1) or (B2) holds. In addition, there exists η arbitrarily close to 0 such that λ(±η) ≠ 0 and sign(λ(η)) = -sign(λ(-η)), and the sample median is strongly consistent (i.e., almost surely converges to θ).

The proof is delayed until the end of the section. The main result is:

Theorem 2.2. If (A1), (A2) and the conclusion to Theorem 2.1 hold, then

(2.6)    T_n - θ = {E_F ψ'(X)}^{-1} n^{-1} Σ_{i=1}^n ψ(X_i - θ) + O(n^{-1}(log n)^2)    a.s. as n → ∞.

The following Corollaries investigate the M-estimators formed by (2.2) and (2.3).

Corollary 2.1. Suppose (A1) holds, that ψ is increasing (strictly so in a neighborhood of zero) and skew-symmetric (ψ(x) = -ψ(-x)), and that F is Lipschitz in neighborhoods of {a_1, ..., a_k} and strictly increasing in a neighborhood of zero. Then (2.6) holds.

Corollary 2.2. Suppose F is symmetric, unimodal and sufficiently smooth at the points 0, ±a, ±b, ±c so that (i) the sample median is strongly consistent and (ii) λ(θ) is continuously differentiable. Define T_n as the solution to (2.1) using ψ_1 (given by (2.3)) which is closest to the sample median. Then the representation (2.6) holds if either (B1) or (B2) holds.

Remarks: First of all, note that in Corollary 2.1 it is not assumed that F is symmetric; this will be important in the equivalence theorem of Section 5. Secondly, it seems that the conditions on F in Corollary 2.2 are much stronger than necessary.

From now on, assume without loss of generality that X_1, X_2, ... have distribution F and E_F ψ(X) = 0. Let β(n) = n^{-1/2}(log n). Before proving the above results, it is useful to have the following propositions.

Proposition 2.1. Let ε > 0. Then

    λ(±εβ(n)) = ∓ E_F ψ'(X) εβ(n) + O(β(n)^2).

Proof: Since λ(0) = 0,

    λ(-εβ(n)) = λ(-εβ(n)) - λ(0)
              = Σ_{j=1}^{k+1} ∫_{a_{j-1}}^{a_j - εβ(n)} {ψ(x + εβ(n)) - ψ(x)} dF(x)
              + Σ_{j=1}^{k} ∫_{a_j - εβ(n)}^{a_j} {ψ(x + εβ(n)) - ψ(x)} dF(x).

Now, apply Taylor's Theorem to the integrands in the first sum, and apply the Lipschitz condition on F to the integrands in the second sum, to complete the proof.

Proposition 2.2. For any b > 0,

    sup_{|θ|≤b} |n^{-1} Σ_{i=1}^n {ψ(X_i - θ) - λ(θ)}| = O(n^{-1/2}(log n)^{1/2})    a.s.

Proof: We may assume b = 1. Because ψ satisfies a Lipschitz condition, by the Borel-Cantelli Lemma it suffices to show that there is a constant M > 0 such that

(2.8)    Σ_{n=1}^∞ Σ_{j=-n}^{n} Pr{(log n)^{-1/2} n^{1/2} |n^{-1} Σ_{i=1}^n ψ(X_i - j/n) - λ(j/n)| > M} < ∞.

This is accomplished by using the exponential bounds given in Loeve (1963), page 254.

Proposition 2.3. Let S_1, S_2, ... be binomial random variables (not necessarily independent) with E S_n = n p_n, where the p_n may themselves be random variables, and write A_n = S_n. If p_n = O(n^{-1/2}(log n)), then n^{-1}A_n = O(n^{-1}(log n)^2) a.s. and n^{-1}A_n = o(n^{-1+β}) a.s. for all β > 0. If p_n = o(n^{-1/2+β}), then n^{-1}A_n = o(n^{-1+2β}) in probability.

Proof: Simple calculations show that there are constants a > 0 and M > 0 large such that, if p_n = O(n^{-1/2}(log n)),

    Pr{n^{-1}|A_n| > aMn^{-1}(log n)^2} ≤ Pr{n^{-1}|S_n - np_n| > Mn^{-1/2}(log n)}.

The exponential bounds and the Borel-Cantelli Lemma yield the first result. From Johnson and Kotz (1969), page 54, we know that

    E|S_n|^k = Σ_{j=0}^k c_j B_j(n) p_n^j,

where B_j(n) is a jth degree polynomial in n. Thus, by the Markov inequality for kth moments, the second and third results follow.

Proposition 2.4. Suppose H is continuously differentiable and has only a finite number of zeros on C = [-b, b], say {a_1, ..., a_m}. Let U_ni = (a_i - εβ(n), a_i + εβ(n)), i = 1, 2, ..., m, and let |H(x)| attain its minimum on K_n = C - ∪_{i=1}^m U_ni at ξ_n. Then

    min_{1≤i≤m} |ξ_n - a_i| = O(β(n)).

Proof: We may assume the minimum is attained near a_1. Because H is continuous and C is compact, ξ_n - a_1 → 0.
Expanding H about a_1 and dividing by β(n) yields

    |β(n)^{-1}(ξ_n - a_1) H'(η*_n)| ≥ |ε H'(η_n)|,

where η*_n and η_n converge to a_1, so that H'(η*_n) and H'(η_n) converge to H'(a_1). This yields the result.

The proof of Theorem 2.1 given below is an adaptation of the proof for maximum likelihood estimation given in Huber (1967). Basically, it involves showing something like

    sup_{θ∈K_n} |n^{-1} Σ_{i=1}^n ψ(X_i - θ) - λ(θ)| → 0    a.s.

If ψ were monotone (see Corollary 2.1), this would indeed yield Theorem 2.1. However, ψ is not in general monotone.

Proof of Theorem 2.1. Let C = [-b, b] and U_n = (-εβ(n), εβ(n)), and set K_n = C - U_n. Then K_n is compact, so that |λ(θ)| attains a positive minimum on K_n (say at θ_n). Now, |λ(θ)| is increasing in a neighborhood of 0, and θ_n → 0 as n → ∞. By Proposition 2.1, |λ(±εβ(n))| ≥ cεβ(n) for n sufficiently large and some c > 0. By Proposition 2.2, for small η > 0,

    sup_{θ∈K_n} |n^{-1} Σ_{i=1}^n ψ(X_i - θ) - λ(θ)| ≤ ηεβ(n)

almost surely as n → ∞. Hence, almost surely as n → ∞,

    inf_{θ∈K_n} |n^{-1} Σ_{i=1}^n ψ(X_i - θ)| ≥ (c - η)εβ(n) > 0.

Since n^{-1} Σ_{i=1}^n ψ(X_i - T_n) = 0, T_n ∉ K_n almost surely as n → ∞, completing the first part of the proof.

Suppose now that (A3)* holds in place of (A3). Suppose that the zeros of λ(θ) allowable under (B1) or (B2) are {a_1, ..., a_m}, and form a disjoint cover of these zeros by the intervals U_ni = (a_i - εβ(n), a_i + εβ(n)), i = 1, 2, ..., m. Suppose |λ(θ)| attains its minimum on K_n = C - ∪_{i=1}^m U_ni at ξ_n. By Proposition 2.4, min_{1≤i≤m} |ξ_n - a_i| ≤ cβ(n) and |λ(ξ_n)| ≥ c*β(n) for n large and some c* > 0. This, together with Proposition 2.2, means that T_n ∉ K_n almost surely as n → ∞, so that min_i |T_n - a_i| ≤ cβ(n). Now, since

    n^{-1} Σ_{i=1}^n ψ(X_i - (±η)) → λ(±η)    a.s.,

and sign(λ(η)) = -sign(λ(-η)), eventually there is a zero of (2.1) in the interval (-η, η); since the sample median is strongly consistent, this completes the proof.

Proof of Theorem 2.2. Assume for convenience that ψ has at most two points of non-differentiability, say {±k}. Define

    A_i = {|X_i - T_n| ≤ k, |X_i| > k}    and    B_i = {|X_i - T_n| > k, |X_i| ≤ k}.
Now, by expanding ψ(X_i - T_n) about X_i off the events A_i ∪ B_i, since ψ'' is bounded and T_n^2 = O(n^{-1}(log n)^2) a.s.,

(2.9)    0 = n^{-1} Σ_{i=1}^n ψ(X_i - T_n)
           = n^{-1} Σ_{i=1}^n ψ(X_i - T_n) I_{A_i∪B_i} + n^{-1} Σ_{i=1}^n {ψ(X_i) - T_n ψ'(X_i)}(1 - I_{A_i∪B_i}) + O(n^{-1}(log n)^2)    a.s.,

where I_{A_i∪B_i} is the indicator function of the event A_i ∪ B_i. Because of Proposition 2.3, since F satisfies a Lipschitz condition, the middle term of equation (2.9) (call it B_n) becomes

(2.10)    B_n = n^{-1} Σ_{i=1}^n {ψ(X_i) - T_n ψ'(X_i)} + O(n^{-1}(log n)^2)    a.s.

Thus,

(2.11)    T_n {n^{-1} Σ_{i=1}^n ψ'(X_i)} = n^{-1} Σ_{i=1}^n ψ(X_i) + O(n^{-1}(log n)^2)    a.s.,

which completes the proof, since n^{-1} Σ_{i=1}^n ψ'(X_i) → E_F ψ'(X) a.s.

Proof of Corollary 2.1. Since ψ is increasing, ψ' ≥ 0, so that (A2) holds. Since ψ and F are strictly increasing in a neighborhood of zero, (A3) holds. Huber (1964) has shown (A4) by proving that T_n is almost surely consistent.

Proof of Corollary 2.2. Because F is symmetric and unimodal, the conditions guarantee that (A1) and (A2) hold. For the same reason, since ψ_1 is skew-symmetric and F is increasing at 0, there exists η > 0 arbitrarily close to 0 such that λ(η) ≠ 0 and sign(λ(η)) = -sign(λ(-η)). Thus, (A3)* is verified, and it suffices by Theorem 2.1 merely to show that (A4) holds. Because

    n^{-1} Σ_{i=1}^n ψ_1(X_i - (±η)) → λ(±η) ≠ 0    a.s.,

there is eventually a solution to (2.1) in the interval (-η, η); because the sample median is strongly consistent, this gives (A4).

SCALE INVARIANT M-ESTIMATORS

An estimator T_n = T(X_1, ..., X_n) is called scale invariant if T(X_1/c, ..., X_n/c) = T_n/c for any constant c ≠ 0. The estimators in the previous section do not satisfy this property, and it is the purpose of this section to modify the criterion (2.1) to get scale invariant versions. Andrews, et al. (1972, page 40) indicate that even if the ψ functions are monotone as in Corollary 2.1, the influence function is complicated unless F is symmetric.
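A numerical sketch (ours, not the paper's) of this modification, anticipating criterion (3.2) below: the residuals are standardized by a scale invariant estimate s_n before applying ψ. Here the interquartile range plays the role of s_n, and the Huber ψ with an arbitrary tuning constant is used for illustration.

```python
import numpy as np

def psi_huber(x, k=1.5):
    """Huber's psi (2.2)."""
    return np.clip(x, -k, k)

def scale_invariant_m_estimate(x, psi=psi_huber, tol=1e-10):
    """Solve 0 = n^{-1} sum_i psi((X_i - T)/s_n) for T by bisection.

    s_n is taken to be the interquartile range, a scale invariant
    estimator (illustrative choice; see the remarks around (3.1)).
    """
    q1, q3 = np.quantile(x, [0.25, 0.75])
    s = q3 - q1
    lo, hi = np.min(x), np.max(x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(psi((x - mid) / s)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
x = rng.normal(2.0, 3.0, size=1000)
t1 = scale_invariant_m_estimate(x)
t2 = scale_invariant_m_estimate(x / 10.0)   # scale invariance: T(X/c) = T_n/c
```

Rescaling the data rescales the estimate by the same factor, which the fixed-scale criterion (2.1) does not guarantee.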
The representation given here shows clearly the behavior of the estimators in both symmetric and asymmetric cases; there is basically no change in the symmetric case, but if F is not symmetric, an additional term of order O(n^{-1/2}) in probability is added.

Suppose that s_n is a scale invariant estimator which, for some constant ξ, satisfies

(3.1)    s_n - ξ = O(n^{-1/2}(log n))    a.s.

Under the conditions given in Bahadur (1966), the interquartile range is a scale invariant estimator satisfying (3.1). In the following, if the rate term in (3.1) is O(n^{-B}) for some B > 0, the order term appearing in (3.3) is O(n^{-2B}) only. In analogy with (2.1), define T_n as the solution to

(3.2)    0 = n^{-1} Σ_{i=1}^n ψ(s_n^{-1}(X_i - T_n)).

Clearly, T_n is scale invariant. The result similar to Theorem 2.2 is the following:

Theorem 3.1. Suppose X_1, X_2, ... have a distribution function F(ξ^{-1}(x - θ)) and that the conditions of Theorem 2.2 hold. Then,

(3.3)    T_n - θ = ξ{E_F ψ'(X)}^{-1} [n^{-1} Σ_{i=1}^n ψ(ξ^{-1}(X_i - θ))]
                 + (1 - s_n/ξ){E_F ψ'(X)}^{-1} [n^{-1} Σ_{i=1}^n (X_i - θ) ψ'(ξ^{-1}(X_i - θ))] + O(n^{-1}(log n)^2)    a.s.

If, in addition, E_F Xψ'(X) = 0,

(3.4)    T_n - θ = ξ{E_F ψ'(X)}^{-1} n^{-1} Σ_{i=1}^n ψ(ξ^{-1}(X_i - θ)) + O(n^{-1}(log n)^2)    a.s.

Note that the condition E_F Xψ'(X) = 0 is satisfied if F is symmetric about 0 and ψ(x) = -ψ(-x), the latter being true for (2.2) and (2.3). Note that if E_F Xψ'(X) ≠ 0, the middle term of (3.3) is only O(n^{-1/2}) in probability in many cases, so that there is a significant effect on the asymptotic distribution.

Proof: The proof is quite similar to that of Theorems 2.1 and 2.2, so only an outline will be given. Assume that θ = 0 and ξ = 1. By the Lipschitz condition on ψ,

(3.5)    n^{-1} Σ_{i=1}^n {ψ(X_i/s_n) - ψ(X_i)} = O(n^{-1/2}(log n))    a.s.,

so that T_n = O(n^{-1/2}(log n)) a.s., by the proof of Theorem 2.1. By the same method as in the proof of Theorem 2.2, one shows that

    n^{-1} Σ_{i=1}^n ψ(X_i/s_n) = n^{-1} Σ_{i=1}^n ψ(X_i) + (s_n^{-1} - 1) n^{-1} Σ_{i=1}^n X_i ψ'(X_i) + O(n^{-1}(log n)^2)    a.s.
and that

(3.6)    0 = n^{-1} Σ_{i=1}^n ψ(X_i/s_n) - (T_n/s_n) n^{-1} Σ_{i=1}^n ψ'(X_i/s_n) + O(n^{-1}(log n)^2)    a.s.,

    T_n {n^{-1} Σ_{i=1}^n ψ'(X_i/s_n) - n^{-1} Σ_{i=1}^n ψ'(X_i)} = O(n^{-1}(log n)^2)    a.s.

This completes the proof, since (s_n - 1) n^{-1} Σ_{i=1}^n ψ(X_i) = O(n^{-1}(log n)^2) a.s.

It is clear that if E_F Xψ'(X) = 0, then results similar to Corollary 2.1 and Corollary 2.2 hold.

LINEAR FUNCTIONS OF ORDER STATISTICS

Let X_1, X_2, ... be a sample from a distribution function F(x). A linear function of the order statistics X_(1) ≤ X_(2) ≤ ... ≤ X_(n) is

(4.1)    T_n = n^{-1} Σ_{i=1}^n J(i/n) X_(i),

where J is some weighting function. The trimmed mean is given by

(4.2)    J(t) = (1 - 2α)^{-1}    if α ≤ t ≤ 1 - α,
         J(t) = 0                otherwise.

de Wet and Venter (1974) show that the trimmed mean is the sum of i.i.d. random variables plus an almost sure error term of order O(n^{-1} log_2 n). This section shows that this almost sure representation holds for a wide class of weight functions J. The method is to extend the proof of Moore (1968).

Theorem 4.1. Suppose J' exists and satisfies a uniform Lipschitz condition of order one on [0,1]. Let θ = E XJ(F(X)). Then there are mean 0 i.i.d. random variables Y_1, Y_2, ... such that, for each a > 0,

(4.3)    T_n - θ = n^{-1} Σ_{i=1}^n Y_i + o(n^{-3/4+a})    a.s.

If, in addition, ∫_0^1 |d(G(u)J'(u))| < ∞, where G = F^{-1}, then

(4.4)    T_n - θ = n^{-1} Σ_{i=1}^n Y_i + O(n^{-1} log_2 n)    a.s.

Proof: Following Moore (1968), let G(x) = F^{-1}(x), R_i = F(X_i), let U_n(u) be the empirical distribution function of the R_i, and set W_n(u) = n^{1/2}(U_n(u) - u). Write

    J(U_n(u)) - J(u) = J'(V*_n(u))(U_n(u) - u),    V*_n(u) = aU_n(u) + (1 - a)u, |a| ≤ 1.

Then, with U_i(u) the empirical distribution function of the single observation R_i and

    H(R_i) = -∫_0^1 J(u)[U_i(u) - u] dG(u),

it follows that

(4.5)    T_n - θ = I_n1 + I_n2 + I_n3,

where

(4.6)    I_n1 = n^{-1} Σ_{i=1}^n H(R_i),
         I_n2 = n^{-1/2} ∫_0^1 G(u)[J'(V*_n(u)) - J'(u)] W_n(u) dU_n(u),
         I_n3 = n^{-1} ∫_0^1 G(u)J'(u) W_n(u) dW_n(u).

Now,

    |I_n2| ≤ sup_u |J'(V*_n(u)) - J'(u)| · sup_u |U_n(u) - u| · ∫_0^1 |G(u)| dU_n(u),

and ∫_0^1 |G(u)| dU_n(u) → ∫_0^1 |G(u)| du a.s. By the Law of the Iterated Logarithm for the Kolmogorov-Smirnov statistic and the Lipschitz condition on J', it follows that |I_n2| = O(n^{-1} log_2 n) a.s. as n → ∞.
As in Moore's proof,

(4.7)    2I_n3 = n^{-1} ∫_0^1 G(u)J'(u) d[W_n(u)^2],    so that    2|I_n3| ≤ ∫_0^1 (U_n(u) - u)^2 |d(G(u)J'(u))|.

If ∫_0^1 |d(G(u)J'(u))| < ∞, this gives |I_n3| = O(n^{-1} log_2 n) a.s., so that (4.4) holds. Otherwise, denote the first term on the right-hand side of (4.7) by B_n. By the moment relation stated in Proposition 2.5, one sees that E|n^{1/2}(U_n(u) - u)|^8 ≤ c H(u), where H(u) ≥ 0 is integrable with respect to G(u)J'(u). Thus, two applications of Schwarz's Inequality yield

    E B_n^4 ≤ c n^4/n^8 = c n^{-4}.

By Chebyshev's Inequality and the Borel-Cantelli Lemma, n^{3/4-a} B_n → 0 a.s., completing the proof.

Theorem 4.2. Suppose J, J', J'' are continuous and bounded except possibly at {a_1, ..., a_k}, and possess left and right limits. If F satisfies the conditions of Bahadur (1966) at {a_1, ..., a_k}, the conclusions (4.3) and (4.4) hold, except that the order term in (4.4) is O(n^{-1}(log n)^2).

Proof: Following Moore (1968) again, we have that

(4.8)    T_n = ∫_0^1 G(u)J'(u)V_n(u) du + ∫_0^1 G(u)J(u) dV_n(u)
             + ∫_0^1 G(u){J(U_n(u)) - J(u) - J'(u)V_n(u)} dU_n(u) + ∫_0^1 G(u)J'(u)V_n(u) dV_n(u),

where V_n(u) = U_n(u) - u = n^{-1/2}W_n(u). The last term is the I_n3 appearing in Theorem 4.1. The first two terms of (4.8) can be decomposed (via integration by parts) into

(4.9)    Σ_{j=1}^k {V_n(a_j)G(a_j)J(a_j-) - V_n(a_j)G(a_j)J(a_j+)} + ∫_0^1 J(u)V_n(u) dG(u).

As in Theorem 4.1, the second term in (4.9) is the sum of i.i.d. mean 0 random variables. Call the first term A_n. The third term of (4.8) is

(4.10)    B_n = Σ_{j=1}^{k+1} ∫_{a_{j-1}+β(n)}^{a_j-β(n)} G(u){J(U_n(u)) - J(u) - J'(u)V_n(u)} dU_n(u)
              + Σ_{j=1}^{k} ∫_{a_j-β(n)}^{a_j+β(n)} G(u){J(U_n(u)) - J(u) - J'(u)V_n(u)} dU_n(u),

where β(n) converges to zero. By choosing β(n) = O(n^{-1/2}(log_2 n)^{1/2}) appropriately, the first term on the right-hand side of (4.10) is O(n^{-1}(log n)^2) a.s., since U_n(u) and u eventually lie (a.s.)
in the intervals (a_{j-1} + β(n), a_j - β(n)), where J' is differentiable. Choosing β(n) = n^{-1/2}(log n)^{1/2}, so that V_n(u) = o(β(n)) a.s., Proposition 2.3 can be used to show that

    ∫_{a_j-β(n)}^{a_j+β(n)} G(u)J'(u)V_n(u) dU_n(u) = O(n^{-1}(log n)^2)    a.s.

Thus, we need only consider

(4.10b)    Σ_{j=1}^k ∫_{a_j-β(n)}^{a_j+β(n)} G(u){J(U_n(u)) - J(u)} dU_n(u).

Assume for the moment that R_{[na_j]} ≤ a_j, where R_(k) denotes the kth order statistic in R_1, ..., R_n. For a given j, define

    B_jn1 = {k: a_j - β(n) < R_(k) ≤ R_{[na_j]}},
    B_jn2 = {k: R_{[na_j]} < R_(k) ≤ a_j},
    B_jn3 = {k: a_j < R_(k) < a_j + β(n)}.

Then (4.10b) becomes

(4.11)    n^{-1} (Σ_{B_jn1} + Σ_{B_jn2} + Σ_{B_jn3}) G(R_(k)) {J(k/n) - J(R_(k))}.

Now, de Wet and Venter (1974) show that the number of terms between R_{[na_j]} and a_j is O(n^{1/2}(log_2 n)^{1/2}) a.s.; because J' is bounded, one can use Proposition 2.3 to show that the first and last sums of (4.11) are O(n^{-1}(log n)^2) a.s. By a similar argument, the middle sum of (4.11) is

(4.12)    n^{-1} Σ_{B_jn2} G(R_(k)) {J(a_j+) - J(a_j-)} + O(n^{-1}(log n)^2)
        = n^{-1} {J(a_j+) - J(a_j-)} {sum of observations between F^{-1}(a_j) and X_{[na_j]}} + O(n^{-1}(log n)^2)    a.s.

Now, the first term of (4.9) is -Σ_{j=1}^k V_n(a_j)G(a_j)(J(a_j+) - J(a_j-)), so that this term added to the first term on the right-hand side of (4.12) becomes, for c_j = J(a_j+) - J(a_j-),

(4.13)    c_j {n^{-1} Σ_{B_jn2} G(R_(k)) - G(a_j)V_n(a_j)}
        = c_j n^{-1} {n(U_n(a_j) - U_n(R_{[na_j]})) - nV_n(a_j)} G(a_j) + O(n^{-1}(log n)^2)
        = c_j n^{-1} {na_j - [na_j]} G(a_j) + O(n^{-1}(log n)^2)
        = O(n^{-1}(log n)^2)    a.s.

The second equality in (4.13) follows by using Proposition 2.3, since G(R_{[na_j]}) - G(a_j) = O(n^{-1/2}(log_2 n)^{1/2}) a.s. This completes the proof.

APPLICATIONS

We now apply the results of Sections 2 and 4 to get the asymptotic equivalence of M-estimators and LFO's and to obtain an expansion for flexible estimates of location.
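To fix ideas before the formal statements, the LFO (4.1) with the trimming weights (4.2) can be computed directly. This sketch (ours, not the paper's) checks that (4.1) with weights (4.2) essentially reproduces the ordinary trimmed mean; the sample and trimming proportion are arbitrary illustrative choices.

```python
import numpy as np

def lfo(x, J):
    """Linear function of order statistics (4.1): n^{-1} sum_i J(i/n) X_(i)."""
    xs = np.sort(x)
    n = len(xs)
    i = np.arange(1, n + 1)
    return np.mean(J(i / n) * xs)

def J_trim(alpha):
    """Trimmed-mean weight function (4.2)."""
    def J(t):
        return np.where((t >= alpha) & (t <= 1 - alpha),
                        1.0 / (1 - 2 * alpha), 0.0)
    return J

rng = np.random.default_rng(2)
x = rng.standard_normal(2000)
t_trim = lfo(x, J_trim(0.10))   # 10%-trimmed mean via (4.1)-(4.2)
```

Up to O(1/n) edge effects from the discrete weights J(i/n), this agrees with the mean of the central order statistics.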
We first consider the equivalence, showing that if Jaeckel's (1971a) conditions are strengthened, the result follows easily from the representations and is much stronger than any result appearing in the literature.

Theorem 5.1. Suppose the following hold:

(5.1a)    J(u) = Aψ'(G(u)), where A = {E_F ψ'(X)}^{-1} and G = F^{-1}.

(5.1b)    ψ, J satisfy the conditions of Theorems 2.1 and 4.2 respectively, and F is continuous.

(5.1c)    J(t) = 0 for t ∉ [a,b], 0 < a < b < 1.

(5.1d)    M_n is the M-estimator formed by (2.1) and L_n is the LFO formed by J.

Then,

(5.2)    M_n - L_n = O(n^{-1}(log n)^2)    a.s.

Proof: Because of Theorems 2.2 and 4.2, it suffices to show that H(R_1) = Aψ(X_1). Now,

    H(R_1) = -∫ J(u)[U_1(u) - u] dG(u)
           = -A ∫ ψ'(G(u))[U_1(u) - u] dG(u)
           = -A ∫ [F_1(y) - F(y)] dψ(y)
           = -A ∫ F_1(y) dψ(y) + A ∫ F(y) dψ(y)
           = A ∫ ψ(y) dF_1(y) = Aψ(X_1),

where F_1 is the empirical distribution function of the single observation X_1; the last equalities use integration by parts and E_F ψ(X) = 0.

Corollary 5.1. The conditions of Theorem 5.1 hold, for example, if:

(5.3a)    F', F'' exist and are bounded away from zero on an interval.

(5.3b)    ψ has 3 continuous bounded derivatives at all but a finite number of points.

(5.3c)    ...

We now turn to flexible estimates of location. We consider the M-estimators defined by ψ_k in (2.2) and the trimmed means given by J_α(t) = (1 - 2α)^{-1} if α ≤ t ≤ 1 - α and zero otherwise. We will assume F is symmetric for convenience, although some similar results hold if F is asymmetric. Further, some similar results hold for the scale invariant version of the M-estimators. The asymptotic variances of these estimators are

(5.4)    σ_M^2(k) = {F(k) - F(-k)}^{-2} ∫ ψ_k^2(x) dF(x),
         σ_L^2(α) = (1 - 2α)^{-2} { ∫_{x_α}^{x_{1-α}} x^2 dF(x) + 2αx_α^2 },

where x_α = G(α) = F^{-1}(α). Denoting the M-estimator by M_nk and the α-trimmed mean by L_nα, we estimate these asymptotic variances by

(5.5)    s_M^2(k) = {F_n(k - M_nk) - F_n(-k - M_nk)}^{-2} ∫ ψ_k^2(x - M_nk) dF_n(x),
         s_L^2(α) = (1 - 2α)^{-2} { ∫_{X_([αn]+1)}^{X_(n-[αn])} (x - L_nα)^2 dF_n(x) + 2α(X_([αn]) - L_nα)^2 }
                  = (1 - 2α)^{-2} { n^{-1} Σ_{i=[αn]+1}^{n-[αn]} (X_(i) - L_nα)^2 + 2α(X_([αn]) - L_nα)^2 }.
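A sketch (ours, not the paper's) of the variance estimate s_L^2(α) in (5.5), together with its minimization over a grid of trimming proportions as in the adaptive procedure defined next; the contamination model and grid are illustrative assumptions.

```python
import numpy as np

def s_L2(x, alpha):
    """Sample analogue (5.5) of the trimmed-mean asymptotic variance (5.4)."""
    xs = np.sort(x)
    n = len(xs)
    k = int(alpha * n)                              # [alpha n]
    L = xs[k:n - k].mean()                          # alpha-trimmed mean L_{n,alpha}
    inner = np.sum((xs[k:n - k] - L) ** 2) / n      # n^{-1} sum over the middle order statistics
    wins = 2 * alpha * (xs[k - 1] - L) ** 2         # 2 alpha (X_([alpha n]) - L)^2
    return (inner + wins) / (1 - 2 * alpha) ** 2

def adaptive_alpha(x, alphas):
    """Choose the trimming proportion minimizing s_L^2 on a grid (hat-alpha_n)."""
    vals = [s_L2(x, a) for a in alphas]
    return alphas[int(np.argmin(vals))]

rng = np.random.default_rng(3)
# illustrative heavy-tailed sample: N(0,1) contaminated 10% of the time by N(0, 9^2)
n = 5000
x = np.where(rng.random(n) < 0.9, rng.normal(0, 1, n), rng.normal(0, 9, n))
alphas = np.linspace(0.05, 0.45, 9)
a_hat = adaptive_alpha(x, alphas)
```

The analogous estimate s_M^2(k) for the Huber M-estimators is handled the same way, with a grid over k.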
We define the optimal estimators as follows: assume ranges of permissible values 0 < k_0 ≤ k ≤ k_1 and 0 < α_0 ≤ α ≤ α_1 < 1/2 fixed in advance. Compute the values k̂_n and α̂_n which minimize s_M^2(k) and s_L^2(α) on these ranges. We will assume that σ_M^2(k) and σ_L^2(α) are uniquely defined and that the values k*, α* minimizing them on the ranges are unique. If this latter assumption is not true, modifications along the lines suggested by Jaeckel (1971b) are possible.

We first consider the M-estimators. The following Lemma is an easy consequence of an extension of Theorems 2.1 and 2.2, along with the Law of the Iterated Logarithm for the empirical distribution function.

Lemma 5.1. Let k' be any point in C_M = [k_0, k_1]. Suppose that F is continuously differentiable in neighborhoods of ±k' and that inf_{k∈C_M} |F(k) - F(-k)| > 0. Then

    sup_{k∈C_M} |M_nk - {E_F ψ'_k(X)}^{-1} n^{-1} Σ_{i=1}^n ψ_k(X_i)| = O(n^{-1}(log n)^2)    a.s.

If in addition F is continuously differentiable on C_M, then

    sup_{k∈C_M} |s_M^2(k) - σ_M^2(k)| = O(n^{-1/2}(log n))    a.s.

Lemma 5.2. Under the conditions of Lemma 5.1, if σ_M^2(k) is continuously differentiable in k on C_M, then

    k̂_n - k* = O(n^{-1/2}(log n))    a.s.

Proof: Define δ(n) = M_1 n^{-1/2}(log n), where M_1 will be large, and let K_n = {k ∈ C_M: |k - k*| ≥ δ(n)}. Suppose inf_{k∈K_n} σ_M^2(k) is attained at ā_n, so that |ā_n - k*| ≤ M_2 δ(n) by Proposition 2.4. By Lemma 5.1 we can find a sequence of constants D(n) = O(n^{-1/2}(log n)) such that sup_{C_M} |s_M^2(k) - σ_M^2(k)| ≤ D(n) with probability approaching one. Thus, by choosing M_1 sufficiently large, with probability approaching one,

    inf_{k∈K_n} s_M^2(k) ≥ σ_M^2(ā_n) - D(n) > σ_M^2(k*) + D(n) ≥ s_M^2(k*),

so that with probability approaching one k̂_n ∉ K_n, yielding the Lemma.

Thus, by bringing Lemmas 5.1 and 5.2 together and by following the proof of Theorem 2.2, we obtain the following:

Theorem 5.2. Under the conditions of Lemmas 5.1 and 5.2,

    M_nk̂_n = {E_F ψ'_{k*}(X)}^{-1} n^{-1} Σ_{i=1}^n ψ_{k*}(X_i) + O(n^{-1}(log n)^2)    a.s.

We now show that a result similar to Theorem 5.2 holds for the trimmed mean.

Lemma 5.3. Let α' be a fixed point and suppose that for neighborhoods C_1, C_2 of α', 1 - α', F satisfies the conditions of Bahadur (1966). Then

(5.6)    sup_{α∈C_1} |L_nα - n^{-1} Σ_{i=1}^n Y_i(α)| = O(n^{-1}(log n)^2)    a.s.,

where

    Y_i(α) = G(α)(1 - 2α)^{-1}        if X_i ≤ G(α),
    Y_i(α) = X_i(1 - 2α)^{-1}         if G(α) < X_i ≤ G(1 - α),
    Y_i(α) = G(1 - α)(1 - 2α)^{-1}    if X_i > G(1 - α).
If the Bahadur conditions hold on the set {x: α_0 - ε_0 ≤ F(x) ≤ 1 - (α_0 - ε_0)} for some ε_0 > 0, then the sup in (5.6) may be taken over C_L and

(5.7)    sup_{α∈C_L} |s_L^2(α) - σ_L^2(α)| = O(n^{-1/2}(log n))    a.s.

Proof: From Sen and Ghosh (1971), if g(n) = n^{-1/2}(log n), then

    sup_{0<t<1} sup_{|a|≤g(n)} |U_n(t + a) - U_n(t) - a| = O(n^{-3/4} log n)    a.s.

This, together with the Law of the Iterated Logarithm for the empirical distribution function, implies that

    sup_{α_0≤α≤1-α_0} |F(X_([nα])) - F(G(α))| = O(n^{-1/2} log n)    a.s.,
    sup_{α_0≤α≤1-α_0} |F_n(X_([nα])) - F_n(G(α))| = O(n^{-3/4} log n)    a.s.

The conditions of this Lemma, together with these bounds, show, following the proof of de Wet and Venter (1974a), that (5.6) holds. Also, applying the exponential bounds to n^{-1} Σ_{i=1}^n Y_i(α), together with (5.6), yields

    sup_{C_L} |L_nα| = O(n^{-1/2}(log n))    a.s.

The proof of (5.7) now follows easily.

Theorem 5.3. Under the conditions of Lemma 5.3, if σ_L^2(α) is continuously differentiable on C_L, then

    α̂_n - α* = O(n^{-1/2}(log n))    a.s.    and    L_nα̂_n = n^{-1} Σ_{i=1}^n Y_i(α*) + O(n^{-1}(log n)^2)    a.s.

The proof is as in Lemma 5.2 and Theorem 5.2.

Finally, if F satisfies the conditions of Bahadur (1966) at G(α) and G(1 - α), one can obtain some of the results of Shorack (1974) directly from Lemma 5.3. To be specific, if we estimate α by α̂_n with α̂_n - α = o(n^{-1/2}) in probability, then under the initial conditions of Lemma 5.3, equation (5.6) tells us that

(5.8)    L_nα̂_n - n^{-1} Σ_{i=1}^n Y_i(α) = n^{-1} Σ_{i=1}^n {Y_i(α̂_n) - Y_i(α)} + O(n^{-1}(log n)^2)    a.s.

Now, one can use the last part of Proposition 2.3 to show that the first term on the right-hand side of (5.8) is o(n^{-1/2}) in probability.

BIBLIOGRAPHY

ANDREWS, D.F., BICKEL, P.J., HAMPEL, F.R., HUBER, P.J., ROGERS, W.H., and TUKEY, J.W. (1972). Robust Estimates of Location: Survey and Advances. Princeton University Press.

BAHADUR, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. (37) 577-580.

BICKEL, P.J. (1975). One step estimates in the linear model. To appear in J. Amer. Statist. Assoc.

CARROLL, R.J. (1974).
Asymptotically nonparametric sequential selection procedures II - robust estimators. Institute of Statistics Mimeo Series #953, University of North Carolina at Chapel Hill.

DE WET, T. and VENTER, J.H. (1974). An asymptotic representation of trimmed means with applications. S. Afr. Statist. J. (8) 127-134.

DE WET, T. (1974). Rates of convergence of linear combinations of order statistics. S. Afr. Statist. J. (8) 35-44.

FILIPPOVA, A.A. (1962). Mises' theorem on the asymptotic behavior of functionals of empirical distribution functions and its statistical applications. Theory of Probability and its Applications (7) 24-57.

HAMPEL, F.R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. (69) 383-393.

HUBER, P.J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. (35) 73-101.

HUBER, P.J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. (1) 221-233.

HUBER, P.J. (1972). Robust statistics: a review. Ann. Math. Statist. (43) 1041-1067.

JAECKEL, L.B. (1971a). Robust estimates of location: symmetry and asymmetric contamination. Ann. Math. Statist. (42) 1020-1034.

JAECKEL, L.B. (1971b). Some flexible estimates of location. Ann. Math. Statist. (42) 1540-1552.

JOHNSON, N.L., and KOTZ, S. (1969). Discrete Distributions. Houghton Mifflin, Boston.

LOEVE, M. (1963). Probability Theory. D. Van Nostrand, Princeton.

MOORE, D.S. (1968). An elementary proof of asymptotic normality of linear functions of order statistics. Ann. Math. Statist. (39) 263-265.

SEN, P.K., and GHOSH, M. (1971). On bounded length sequential confidence intervals based on one-sample rank order statistics. Ann. Math. Statist. (42) 189-203.

SHORACK, G.R. (1974). Random means. Ann. Statist. (2) 661-675.