AN ASYMPTOTIC REPRESENTATION FOR M-ESTIMATORS AND
LINEAR FUNCTIONS OF ORDER STATISTICS

by

Raymond J. Carroll*

Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series #1006

May, 1975

* This work was supported in part by the Air Force Office of Scientific Research under Grant No. AFOSR-75-2840.
Let $X_1, X_2, \ldots$ be independent observations from a distribution function $F(x - \theta)$. Let $T_n$ be an M-estimator or a linear function of order statistics. Conditions are given for the existence of i.i.d. mean zero random variables $Y_1, Y_2, \ldots$ such that $T_n - \theta = n^{-1}\sum_{i=1}^{n} Y_i + O(n^{-1}(\log n)^2)$ a.s. as $n \to \infty$; this representation is shown to hold for Huber and Hampel M-estimators as well as the trimmed mean. It can also be extended to a scale invariant version of the M-estimators; if $F$ is symmetric there is no change in the representation, but the effect of asymmetry is to add in an extra random term. The results are applied to show (i) an almost sure equivalence between M-estimators and linear functions of order statistics, and (ii) that a similar representation holds for adaptive estimators in the spirit of Jaeckel (1971b).

AMS 1970 Subject Classifications: Primary 62G35; Secondary 62E20

Key Words and Phrases: M-estimators, Linear functions of order statistics, Robustness, Adaptive estimators
INTRODUCTION

Let $X_1, X_2, \ldots$ be a sample of i.i.d. random variables with distribution function $F(x - \theta)$. Two basic types of robust estimators $T_n$ of the location parameter $\theta$ are the M-estimators (Huber (1964)) and linear functions of order statistics (LFO's).

It is known (Huber (1972), Filippova (1962)) that in many cases the M-estimators satisfy
$$n^{1/2}\Big\{T_n - \theta - \int I(x,F)\,dF_n(x)\Big\} \to 0 \quad\text{in probability},$$
where $I(x,F)$ is the influence curve (Hampel (1974)). de Wet and Venter (1974) show that the trimmed mean can be represented as the mean of i.i.d. random variables with an error term $O(n^{-1}\log_2 n)$ almost surely (here and below, $\log_2 n = \log\log n$). Jaeckel (1971a) has shown under some conditions that for a large class of M-estimators $M_n$ there is an LFO $L_n$ for which $M_n - L_n = O(n^{-1})$ in probability; use of the de Wet and Venter result shows that the M-estimators corresponding to the trimmed mean can be represented as the mean of i.i.d. random variables with error term $O(n^{-1}\log_2 n)$ in probability.

Based on the above facts, it is reasonable to hypothesize almost sure linear representations for, and almost sure equivalence between, M-estimators and LFO's. In this paper we verify these hypotheses for wide classes of these estimators; the latter equivalence follows as a corollary to the representations, as does the Law of the Iterated Logarithm for these estimators. The results are given for both the symmetric and asymmetric cases. We also study (Section 3) the scale invariant versions of M-estimators and obtain a similar linearization. However, interesting things occur if $F$ is asymmetric, and our results show precisely what is happening. The representations, besides being of interest in themselves and establishing the above equivalence, also have other applications; for example, we use them to derive in a straightforward fashion almost sure linear representations for adaptive estimators (Jaeckel (1971b)) and certain random means (Shorack (1974)).
In Section 2, the almost sure linearization of M-estimators is given. Not only does the approach work for the monotone $\psi$ functions (Huber (1964)), but it provides results for non-monotone $\psi$ functions such as the important Hampel $\psi$ function (Huber (1972), Hampel (1974)). The approach does not depend on the symmetry of $F$.
In Section 3, the results are extended to scale invariant M-estimators. Andrews et al. (1972, page 40) indicate that in this case the influence curve can become quite complicated unless $F$ is symmetric. Our results show quite clearly what happens; the effect of estimating the scale parameter $\xi$ by $s_n$ is to introduce an extra term into the representation of the order
$$(s_n - \xi)\, n^{-1}\sum_{i=1}^{n} X_i \psi'(X_i).$$
If $F$ is symmetric, this term will be of sufficiently small order to drop out of the representation. Interestingly enough, if $F$ is asymmetric, this term plays a definite role in the asymptotic distribution.

In Section 4, we extend the de Wet and Venter (1974) representation to a fairly large class of LFO's. The proof relies heavily on a result of Moore (1968).
In Section 5, three applications of the representations are given. First of all, an almost sure equivalence result between M-estimators and LFO's is given. Then the representation is applied to the problem of adaptive robust estimation. Jaeckel (1971b) considers estimating the optimal trimming proportion for the trimmed mean. We use the same idea to estimate the best value of $k$ for the Huber estimates. Under conditions analogous to those in Jaeckel (1971b), the representation theorem yields a simple proof that these adaptive M-estimators (as well as adaptive trimmed means) also satisfy the representation. Finally, the representation is applied to the problem of random means (Shorack (1974)).

The results of this paper can be extended to the one-step versions of M-estimators (see Bickel (1975), Andrews et al. (1972)). This will be the subject of a later paper.

Finally, it should be remarked that Carroll (1974), in connection with sequential selection procedures, has obtained in a different manner expansions of a somewhat allied nature to those given here for a specific subclass of M-estimators and LFO's. These expansions are uniform over neighborhoods of a specific distribution function $F$; however, the results are not of the generality or precision given here. Although the conditions on $F$ in the above paper are quite weak, the class of estimators is fairly restrictive; for example, the Huber and Hampel M-estimators are not covered by the results. In addition, it is not possible to obtain from Carroll (1974) the equivalence theorem and representation for adaptive estimators shown in Section 5. The uniformity mentioned above can also be obtained from our approach.
M-ESTIMATORS

Suppose $X_1, X_2, \ldots$ are i.i.d. with a distribution function $F(x - \theta)$. The M-estimators were proposed by Huber (1964) as robust estimates of the location parameter $\theta$ (see also Huber (1972)); in Andrews et al. (1972), Hampel (see also Hampel (1974)) proposed the so-called descending three-step estimators, which were shown to have very good performance. The basic version of the M-estimators (see the next section for the scale-invariant version) is defined as a solution $T_n$ to

(2.1)  $0 = n^{-1}\sum_{i=1}^{n} \psi(X_i - T_n).$

The minimax estimator proposed by Huber (1964) has a $\psi$ function

(2.2)  $\psi_0(x) = x$ for $|x| \le k$; $\psi_0(x) = k\,\mathrm{sign}(x)$ for $|x| > k$.

The Hampel estimates are defined for $x \ge 0$ by

(2.3)  $\psi_1(x) = x$ for $0 \le x < a$; $\psi_1(x) = a$ for $a \le x < b$; $\psi_1(x) = a(c - x)/(c - b)$ for $b \le x < c$; $\psi_1(x) = 0$ for $x \ge c$, with $\psi_1(-x) = -\psi_1(x)$.
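As a concrete illustration (not part of the original mimeo), the two $\psi$ functions and the defining equation (2.1) can be transcribed directly; the tuning constants $k$, $a$, $b$, $c$ below are arbitrary choices for the example:

```python
import numpy as np

def psi_huber(x, k=1.345):
    # psi_0 of (2.2): the identity on [-k, k], clipped to +/- k outside
    return np.clip(x, -k, k)

def psi_hampel(x, a=2.0, b=4.0, c=8.0):
    # psi_1 of (2.3), extended by skew-symmetry psi_1(-x) = -psi_1(x)
    s, y = np.sign(x), np.abs(x)
    out = np.where(y < a, y,
          np.where(y < b, a,
          np.where(y < c, a * (c - y) / (c - b), 0.0)))
    return s * out

def m_estimate(x, psi):
    # solve (2.1) by bisection; t -> mean(psi(x - t)) is nonincreasing
    # in t for a monotone psi such as psi_huber
    lo, hi = x.min(), x.max()
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.mean(psi(x - mid)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the descending Hampel $\psi$, (2.1) can have several roots; Corollary 2.2 below resolves this by taking the solution closest to the sample median.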
In this section, under fairly weak conditions on $F$ (which basically involve $F$ being smooth at points where $\psi'$ does not exist), conditions on $\psi$ (satisfied by $\psi_0$, $\psi_1$ above) are given for which, if $A = E_F\psi'(X)$,

(2.4)  $T_n - \theta = A^{-1}\, n^{-1}\sum_{i=1}^{n}\psi(X_i - \theta) + O(n^{-1}(\log n)^2)$  a.s. as $n \to \infty$.

Here $a_n = o(b_n)$ means $a_n/b_n \to 0$, and $a_n = O(b_n)$ means $a_n/b_n$ remains bounded, as $n \to \infty$; "a.s." indicates that the relation holds almost surely.
The result (2.4) naturally yields (if $E_F\psi(X) = 0$) asymptotic normality and a Law of the Iterated Logarithm for $T_n$. It also shows why the M-estimators are robust: they are basically a mean of bounded random variables weighted and truncated by the $\psi$ function. Since the first term on the right-hand side of (2.4) is $\int IF(x - \theta)\,dF_n(x)$ (where $IF(x)$ is the influence function), this verifies again the value of the influence function. This also verifies that the M-estimator formed by $\psi_0$ in (2.2) acts like the Winsorized mean was designed to act. It also shows that $\psi_1$ given by (2.3) is actually not particularly influenced by outliers.
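The linearization (2.4) is easy to check by simulation. In the sketch below (an illustration, not part of the paper), $F$ is standard normal and $\psi$ is Huber's $\psi_0$, so that $E_F\psi_0'(X) = P(|X| \le k) = 2\Phi(k) - 1$; the sample size and seed are arbitrary:

```python
import numpy as np
from math import erf, sqrt

k, theta, n = 1.345, 0.7, 4000

def psi(x):
    return np.clip(x, -k, k)            # Huber's psi_0 from (2.2)

def m_estimate(x):
    lo, hi = x.min(), x.max()           # bisection for the root of (2.1)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.mean(psi(x - mid)) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
x = rng.normal(theta, 1.0, n)
A = erf(k / sqrt(2.0))                  # E_F psi'(X) = 2*Phi(k) - 1 for standard normal F
linear = theta + np.mean(psi(x - theta)) / A
print(abs(m_estimate(x) - linear))      # remainder of order n^{-1}(log n)^2
```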
There are two basic steps in the proof of (2.4). First, it is shown that $n^{1/2}(\log n)^{-1}(T_n - \theta) \to 0$ (a.s.). Then, (2.1) is essentially expanded in a Taylor series, although the details are somewhat messy because (2.2) and (2.3) are not differentiable. Assuming (without loss of generality) that $\int \psi(x)\,dF(x) = 0$, consider the following possible conditions:

(A1)  $\psi$ satisfies a uniform Lipschitz condition of order one and, except at a finite number of points $\{a_1, \ldots, a_k\}$, has continuous, bounded derivatives.

(A2)  $F$ satisfies a Lipschitz condition of order one in neighborhoods of $\{a_1, \ldots, a_k\}$, and $E_F\psi'(X) = \int \psi'(x)\,dF(x) \ne 0$.

(A3)  If $\lambda(\theta) = \int \psi(x - \theta)\,dF(x)$, then $\lambda(\theta)$ has a unique zero at $\theta = 0$. In addition, $|\lambda(\theta)|$ increases in some neighborhood about $0$.
(A4)  There is a closed interval $[-b, b]$ such that $T_n$ is in this interval almost surely as $n \to \infty$ if $\theta = 0$.

(B1)  $\lambda(\theta)$ has only a finite number of zeros, and $E_F\psi'(X) \ne 0$.

(B2)  The set of points of increase of $F$ is an interval, $\lambda(\theta)$ has only a finite number of zeros on this interval, and $E_F\psi'(X) \ne 0$.

The conditions (B1) or (B2) will be used in place of (A3) if $\psi$ is non-monotone. The condition (A1) is slightly stronger than that of Jaeckel (1971a), but not significantly so, and (A3) is slightly weaker. The condition (A4) must be verified in each case.
Theorem 2.1.  If $X_1, X_2, \ldots$ have distribution function $F(x - \theta)$, then under (A1)-(A4),

(2.5)  $T_n - \theta = O(n^{-1/2}\log n)$  a.s.

The result (2.5) still holds if, instead of (A3), the following is true:

(A3)*  $\lambda(\theta)$ is continuously differentiable, and either (B1) or (B2) holds. In addition, there exists $\eta$ arbitrarily close to $0$ such that $\lambda(\pm\eta) \ne 0$ and $\mathrm{sign}(\lambda(\eta)) = -\mathrm{sign}(\lambda(-\eta))$, and the sample median is strongly consistent (i.e., almost surely converges to $\theta$).

The proof is delayed until the end of the section. The main result is:

Theorem 2.2.  If (A1), (A2) and the conclusion to Theorem 2.1 hold, then

(2.6)  $T_n - \theta = \{E_F\psi'(X)\}^{-1}\, n^{-1}\sum_{i=1}^{n}\psi(X_i - \theta) + O(n^{-1}(\log n)^2)$  a.s. as $n \to \infty$.
The following Corollaries investigate the M-estimators formed by (2.2) and (2.3).

Corollary 2.1.  Suppose (A1) holds, that $\psi$ is increasing (strictly so in a neighborhood of zero) and skew-symmetric ($\psi(x) = -\psi(-x)$), and that $F$ is Lipschitz in neighborhoods of $\{a_1, \ldots, a_k\}$ and strictly increasing in a neighborhood of zero. Then (2.6) holds.

Corollary 2.2.  Suppose $F$ is symmetric, unimodal and sufficiently smooth at the points $0, \pm a, \pm b, \pm c$ so that (i) the sample median is strongly consistent and (ii) $\lambda(\theta)$ is continuously differentiable. Define $T_n$ as the solution to (2.1) using $\psi_1$ (given by (2.3)) which is closest to the sample median. Then the representation (2.6) holds if either (B1) or (B2) holds.

Remarks:  First of all, note that in Corollary 2.1 it is not assumed that $F$ is symmetric; this will be important in the equivalence theorem of Section 5. Secondly, it seems that the conditions on $F$ in Corollary 2.2 are much stronger than necessary.
From now on, assume without loss of generality that $X_1, X_2, \ldots$ have distribution $F$ and $E_F\psi(X) = 0$. Let $\beta(n) = n^{-1/2}\log n$. Before proving the above results, it is useful to have the following propositions.
Proposition 2.1.  Let $\varepsilon > 0$. Then $\lambda(\pm\varepsilon\beta(n)) = \mp\varepsilon\beta(n)\,E_F\psi'(X) + O(\beta(n)^2)$; in particular, under (A2), $|\lambda(\pm\varepsilon\beta(n))| \ge c\varepsilon\beta(n)$ for some $c > 0$ and all $n$ sufficiently large.

Proof:  Since $\lambda(0) = 0$,
$$\lambda(-\varepsilon\beta(n)) = \lambda(-\varepsilon\beta(n)) - \lambda(0) = \sum_{j=1}^{k+1} \int_{a_{j-1}}^{a_j - \varepsilon\beta(n)} \{\psi(x + \varepsilon\beta(n)) - \psi(x)\}\,dF(x) + \sum_{j=1}^{k} \int_{a_j - \varepsilon\beta(n)}^{a_j} \{\psi(x + \varepsilon\beta(n)) - \psi(x)\}\,dF(x),$$
with $a_0 = -\infty$ and $a_{k+1} = \infty$. Now, apply Taylor's Theorem to the integrands in the first sum and apply the Lipschitz condition on $\psi$ to the integrands in the second sum to complete the proof.
Proposition 2.2.  For any $b \ge 0$,
$$\sup_{|\theta| \le b}\Big| n^{-1}\sum_{i=1}^{n}\{\psi(X_i - \theta) - \lambda(\theta)\}\Big| = O(n^{-1/2}(\log n)^{1/2}) \quad\text{a.s.}$$

Proof:  We may assume $b = 1$. Because $\psi$ satisfies a Lipschitz condition, by the Borel-Cantelli Lemma it suffices to show that there is a constant $M > 0$ such that

(2.8)  $\displaystyle\sum_{n=1}^{\infty} \sum_{j=-n}^{n} \Pr\Big\{(\log n)^{-1/2}\, n^{1/2}\,\Big| n^{-1}\sum_{i=1}^{n}\psi(X_i - j/n) - \lambda(j/n)\Big| > M\Big\} < \infty.$

This is accomplished by using the exponential bounds given in Loève (1963), page 254.
Proposition 2.3.  Let $S_1, S_2, \ldots$ be binomial random variables (not necessarily independent) with $E S_n = np_n$, where the $p_n$ may be random, and set $A_n = p_n S_n$. If $p_n = O(n^{-1/2}\log n)$, then $n^{-1}A_n = O(n^{-1}(\log n)^2)$ a.s. If $p_n = o(n^{-1/2+\delta})$ a.s. for all $\delta > 0$, then $n^{-1}A_n = o(n^{-1+2\delta})$ a.s. for all $\delta > 0$. If $p_n = o(n^{-1/2+\beta})$, then $n^{-1}A_n = o(n^{-1+2\beta})$ in probability.

Proof:  Simple calculations show that there are constants $a > 0$, $M > 0$ such that for $n$ large, if $p_n = O(n^{-1/2}\log n)$,
$$\Pr\{n^{-1}|A_n| > aMn^{-1}(\log n)^2\} \le \Pr\{n^{-1/2}|S_n - np_n| > M\log n\}.$$
The exponential bounds and the Borel-Cantelli Lemma yield the first result. From Johnson and Kotz (1969), page 54, we know that
$$E S_n^k = \sum_{j=0}^{k} c_j B_j(n)\, p_n^j,$$
where $B_j(n)$ is a $j$th degree polynomial in $n$. Thus, by the Markov inequality for $k$th moments, the second and third results follow.
Proposition 2.4.  Suppose $H$ is continuously differentiable and has only a finite number of zeros on $C = [-b, b]$, say $\{a_1, \ldots, a_m\}$. Let $U_{ni} = (a_i - \varepsilon\beta(n),\ a_i + \varepsilon\beta(n))$, $i = 1, 2, \ldots, m$, and let $|H(x)|$ attain its minimum on $K_n = C - \bigcup_{i=1}^{m} U_{ni}$ at $\xi_n$. Then
$$\min_{1 \le i \le m} |\xi_n - a_i| = O(\beta(n)).$$

Proof:  Because $H$ is continuous and $C$ is compact, $\min_{1\le i\le m}|\xi_n - a_i| \to 0$. We may assume $\xi_n \to a_1$ and $\xi_n - a_1 \ge \varepsilon\beta(n)$; then $|H(\xi_n)| \le |H(a_1 + \varepsilon\beta(n))|$. Expanding $H$ about $a_1$ and dividing by $\beta(n)$ yields
$$|\beta(n)^{-1}(\xi_n - a_1)\,H'(\eta_n^*)| \le |\varepsilon H'(\eta_n)|,$$
where $\eta_n^*$ and $\eta_n$ converge to $a_1$, so that $H'(\eta_n^*)$ and $H'(\eta_n)$ converge to $H'(a_1)$. This yields the result.
The proof of Theorem 2.1 given below is an adaptation of the proof for maximum likelihood estimation given in Huber (1967). Basically, it involves showing that, almost surely, $n^{-1}\sum_{i=1}^{n}\psi(X_i - \theta)$ is eventually bounded away from zero for $\theta$ outside a shrinking neighborhood of zero. If $\psi$ were monotone (see Corollary 2.1), this would indeed yield Theorem 2.1. However, $\psi$ is not in general monotone.
Proof of Theorem 2.1.  Let $C = [-b, b]$ and $U_n = (-\varepsilon\beta(n),\ \varepsilon\beta(n))$. Then $K_n = C - U_n$ is compact, so that $|\lambda(\theta)|$ attains a positive minimum on $K_n$ (say at $\theta_n$). Now, $|\lambda(\theta)| \to 0$ as $\theta \to 0$, so that $\theta_n \to 0$. Since $|\lambda(\theta)|$ is increasing in a neighborhood of $0$, by Proposition 2.1, $|\lambda(\pm\varepsilon\beta(n))| \ge c\varepsilon\beta(n)$ for $n$ sufficiently large and some $c > 0$. By Proposition 2.2, for small $\eta > 0$,
$$\sup_{\theta \in K_n}\Big| n^{-1}\sum_{i=1}^{n}\psi(X_i - \theta) - \lambda(\theta)\Big| \le \eta\varepsilon\beta(n)$$
almost surely as $n \to \infty$. Hence, almost surely as $n \to \infty$,
$$\inf_{\theta \in K_n}\Big| n^{-1}\sum_{i=1}^{n}\psi(X_i - \theta)\Big| \ge (c - \eta)\varepsilon\beta(n) > 0.$$
Since $n^{-1}\sum_{i=1}^{n}\psi(X_i - T_n) = 0$, $T_n \notin K_n$ almost surely as $n \to \infty$, completing the first part of the proof.

Suppose now that (A3)* holds in place of (A3). Suppose that the zeros of $\lambda(\theta)$ allowable under (B1) or (B2) are $\{a_1, \ldots, a_m\}$, and form a disjoint cover of these zeros by the intervals $U_{ni} = (a_i - \varepsilon\beta(n),\ a_i + \varepsilon\beta(n))$, $i = 1, 2, \ldots, m$. Suppose $|\lambda(\theta)|$ attains its minimum on $K_n = C - \bigcup_{i=1}^{m} U_{ni}$ at $\xi_n$. By Proposition 2.4, $\min_{1\le i\le m}|\xi_n - a_i| \le c\beta(n)$. Thus, $|\lambda(\xi_n)| \ge c^*\varepsilon\beta(n)$ for $n$ large and some $c^* > 0$. This, together with Proposition 2.2, means that $T_n \notin K_n$ almost surely as $n \to \infty$. Now, since $n^{-1}\sum_{i=1}^{n}\psi(X_i - (\pm\eta)) \to \lambda(\pm\eta)$ a.s., this means by Proposition 2.2 that eventually there is a zero of (2.1) in the interval $(-\beta(n),\ \beta(n))$; since the sample median is strongly consistent, this completes the proof.
Proof of Theorem 2.2.  Assume for convenience that $\psi$ has at most two points of non-differentiability, say $\{\pm k\}$. Define
$$A_i = \{|X_i - T_n| > k,\ |X_i| \le k\} \quad\text{and}\quad B_i = \{|X_i - T_n| \le k,\ |X_i| > k\}.$$
Then

(2.9)  $0 = n^{-1}\displaystyle\sum_{i=1}^{n}\psi(X_i - T_n) = n^{-1}\sum_{i=1}^{n}\psi(X_i - T_n)\,I_{A_i\cup B_i} + n^{-1}\sum_{i=1}^{n}\psi(X_i - T_n)\,(1 - I_{A_i\cup B_i}),$

where $I_{A_i\cup B_i}$ is the indicator function of the event $A_i \cup B_i$. Now, by expanding $\psi(X_i - T_n)$ about $X_i$ on $(A_i\cup B_i)^c$, since $\psi''$ is bounded there and $T_n^2 = O(n^{-1}(\log n)^2)$ a.s., the second term of (2.9) (call it $B_n$) becomes

(2.10)  $B_n = n^{-1}\displaystyle\sum_{i=1}^{n}\{\psi(X_i) - T_n\psi'(X_i)\}(1 - I_{A_i\cup B_i}) + O(n^{-1}(\log n)^2)$  a.s.

Because of Proposition 2.3, since $\psi$ satisfies a Lipschitz condition and $\Pr(A_i \cup B_i) = O(\beta(n))$, the terms carrying indicators in (2.9) and (2.10) contribute only $O(n^{-1}(\log n)^2)$ a.s. Thus,

(2.11)  $T_n\Big\{n^{-1}\displaystyle\sum_{i=1}^{n}\psi'(X_i)\Big\} = n^{-1}\sum_{i=1}^{n}\psi(X_i) + O(n^{-1}(\log n)^2)$  a.s.,

which completes the proof, since by the Law of the Iterated Logarithm
$$n^{-1}\sum_{i=1}^{n}\psi'(X_i) - E_F\psi'(X) = O(n^{-1/2}(\log_2 n)^{1/2}) \quad\text{a.s.}$$
and $T_n = O(n^{-1/2}\log n)$ a.s.
Proof of Corollary 2.1.  Since $\psi$ is increasing, $\psi' \ge 0$ and $E_F\psi'(X) > 0$, so that (A2) holds. Since $\psi$ and $F$ are strictly increasing in a neighborhood of zero, (A3) holds. Huber (1964) has shown (A4) by proving that $T_n$ is almost surely consistent.
Proof of Corollary 2.2.  Because $F$ is symmetric and unimodal, the conditions guarantee that (A1) and (A2) hold. For the same reason, since $\psi_1$ is skew-symmetric and increasing at $0$, there exists $\eta > 0$ arbitrarily close to $0$ such that $\lambda(\eta) \ne 0$ and $\mathrm{sign}(\lambda(\eta)) = -\mathrm{sign}(\lambda(-\eta))$. Thus, (A3)* is verified, and it suffices by Theorem 2.1 merely to show that (A4) holds. Because
$$n^{-1}\sum_{i=1}^{n}\psi_1(X_i - (\pm\eta)) \to \lambda(\pm\eta) \ne 0 \quad\text{a.s.},$$
there is eventually a solution to (2.1) in the interval $(-\eta, \eta)$; because the sample median is strongly consistent, this gives (A4).
SCALE INVARIANT M-ESTIMATORS

An estimator $T_n = T(X_1, \ldots, X_n)$ is called scale invariant if $T(X_1/c, \ldots, X_n/c) = T_n/c$ for any constant $c \ne 0$. The estimators in the previous section do not satisfy this property, and it is the purpose of this section to modify the criterion (2.1) to get scale invariant versions. Andrews et al. (1972, page 40) indicate that even if the $\psi$ functions are monotone as in Corollary 2.1, the influence function is complicated unless $F$ is symmetric. The representation given here shows clearly the behavior of the estimators in both symmetric and asymmetric cases; there is basically no change in the symmetric case, but if $F$ is not symmetric, an additional term of order $O(n^{-1/2})$ in probability is added.
Suppose that $s_n$ is a scale invariant estimator which, for some constant $\xi$, satisfies

(3.1)  $s_n - \xi = O(n^{-1/2}\log n)$  a.s.

Under the conditions given in Bahadur (1966), the interquartile range is a scale invariant estimator satisfying (3.1). In the following, if the rate term in (3.1) is only $O(n^{-\delta})$ for some $\delta > 0$, the order term appearing in (3.3) is $O(n^{-2\delta})$.

In analogy with (2.1), define $T_n$ as the solution to

(3.2)  $0 = n^{-1}\sum_{i=1}^{n}\psi\big(s_n^{-1}(X_i - T_n)\big).$

Clearly, $T_n$ is scale invariant.
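A minimal numerical sketch of (3.2), not from the paper, with $s_n$ the interquartile range as suggested above; the Huber $\psi$, the constant $k$, and the $t$-distributed sample are illustrative choices:

```python
import numpy as np

def psi(x, k=1.345):
    return np.clip(x, -k, k)

def m_estimate_scale_inv(x, k=1.345):
    # solve (3.2): 0 = n^{-1} sum psi(s_n^{-1}(X_i - T_n)),
    # with s_n the interquartile range, which satisfies (3.1)
    q1, q3 = np.quantile(x, [0.25, 0.75])
    s = q3 - q1
    lo, hi = x.min(), x.max()
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.mean(psi((x - mid) / s, k)) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# scale invariance: T(x/c) = T(x)/c
rng = np.random.default_rng(2)
x = rng.standard_t(3, 1000)
t1, t2 = m_estimate_scale_inv(x), m_estimate_scale_inv(x / 5.0)
print(abs(t1 / 5.0 - t2))   # ~0 up to bisection and rounding tolerance
```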
The result similar to Theorem 2.2 is the
following:
Theorem 3.1.  Suppose $X_1, X_2, \ldots$ have a distribution function $F(\xi^{-1}(x - \theta))$ and that the conditions of Theorem 2.2 hold. Then,

(3.3)  $T_n - \theta = \{E_F\psi'(X)\}^{-1}\Big[n^{-1}\displaystyle\sum_{i=1}^{n}\psi\big(\xi^{-1}(X_i - \theta)\big)\Big] + (1 - s_n/\xi)\{E_F\psi'(X)\}^{-1}\Big[n^{-1}\displaystyle\sum_{i=1}^{n}\big(\xi^{-1}(X_i - \theta)\big)\psi'\big(\xi^{-1}(X_i - \theta)\big)\Big] + O(n^{-1}(\log n)^2)$  a.s.

If, in addition, $E_F X\psi'(X) = 0$,

(3.4)  $T_n - \theta = \{E_F\psi'(X)\}^{-1}\, n^{-1}\displaystyle\sum_{i=1}^{n}\psi\big(\xi^{-1}(X_i - \theta)\big) + O(n^{-1}(\log n)^2)$  a.s.
Note that the condition $E_F X\psi'(X) = 0$ is satisfied if $F$ is symmetric about $0$ and $\psi(x) = -\psi(-x)$, the latter being true for (2.2) and (2.3). Note that if $E_F X\psi'(X) \ne 0$, the middle term of (3.3) is only $O(n^{-1/2})$ in probability in many cases, so that there is a significant effect on the asymptotic distribution.
Proof:  The proof is quite similar to that of Theorems 2.1 and 2.2, so only an outline will be given. Assuming that $\theta = 0$ and $\xi = 1$, by the Lipschitz condition on $\psi$,

(3.5)  $n^{-1}\displaystyle\sum_{i=1}^{n}\{\psi(X_i/s_n) - \psi(X_i)\} = O(n^{-1/2}\log n)$  a.s.,

so that $T_n = O(n^{-1/2}\log n)$ a.s. by the proof of Theorem 2.1. By the same method as in the proof of Theorem 2.2, one shows that

(3.6)  $0 = n^{-1}\displaystyle\sum_{i=1}^{n}\psi(X_i/s_n) - (T_n/s_n)\, n^{-1}\sum_{i=1}^{n}\psi'(X_i/s_n) + O(n^{-1}(\log n)^2)$  a.s.

This completes the proof, since
$$n^{-1}\sum_{i=1}^{n}\psi(X_i/s_n) = n^{-1}\sum_{i=1}^{n}\psi(X_i) + (s_n^{-1} - 1)\, n^{-1}\sum_{i=1}^{n} X_i\psi'(X_i) + O(n^{-1}(\log n)^2) \quad\text{a.s.},$$
$$T_n\Big\{n^{-1}\sum_{i=1}^{n}\psi'(X_i/s_n) - n^{-1}\sum_{i=1}^{n}\psi'(X_i)\Big\} = O(n^{-1}(\log n)^2) \quad\text{a.s.},$$
and
$$(s_n^{-1} - 1)\, n^{-1}\sum_{i=1}^{n}\psi(X_i) = O(n^{-1}(\log n)^2) \quad\text{a.s.}$$

It is clear that if $E_F X\psi'(X) = 0$, then results similar to Corollary 2.1 and Corollary 2.2 hold.
LINEAR FUNCTIONS OF ORDER STATISTICS

Let $X_1, X_2, \ldots$ be a sample from a distribution function $F(x)$. A linear function of the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ is

(4.1)  $T_n = n^{-1}\sum_{i=1}^{n} J(i/n)\, X_{(i)},$

where $J$ is some weighting function. The trimmed mean is given by

(4.2)  $J(t) = (1 - 2\alpha)^{-1}$ for $\alpha \le t \le 1 - \alpha$; $J(t) = 0$ otherwise.
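A direct transcription of (4.1)-(4.2), as an illustration (the sample values are made up):

```python
import numpy as np

def lfo(x, J):
    # (4.1): T_n = n^{-1} sum_i J(i/n) X_(i)
    n = len(x)
    return np.mean(J(np.arange(1, n + 1) / n) * np.sort(x))

def J_trim(alpha):
    # (4.2): J(t) = (1 - 2*alpha)^{-1} on [alpha, 1 - alpha], zero otherwise
    def J(t):
        return np.where((t >= alpha) & (t <= 1 - alpha), 1.0 / (1 - 2 * alpha), 0.0)
    return J

x = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 55.0])  # one gross outlier
print(lfo(x, J_trim(0.25)))   # close to 10: the outlier receives zero weight
```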
de Wet and Venter (1974) show that the trimmed mean is the sum of i.i.d. random variables plus an almost sure error term of order $O(n^{-1}\log_2 n)$. This section shows that this almost sure representation holds for a wide class of weight functions $J$. The method is to extend the proof of Moore (1968).
Theorem 4.1.  Suppose $J'$ exists and satisfies a uniform Lipschitz condition of order one on $[0,1]$. Let $\theta = E\,XJ(F(X))$. Then there are mean $0$, i.i.d. random variables $Y_1, Y_2, \ldots$ such that for $\alpha > 0$,

(4.3)  $T_n - \theta = n^{-1}\sum_{i=1}^{n} Y_i + o(n^{-3/4+\alpha})$  a.s.

If, in addition, $\int_0^1 |d(F^{-1}(u)J'(u))| < \infty$, then

(4.4)  $T_n - \theta = n^{-1}\sum_{i=1}^{n} Y_i + O(n^{-1}\log_2 n)$  a.s.

Proof:
Following Moore (1968), let $G(x) = F^{-1}(x)$ and $R_i = F(X_i)$, let $U_n$ be the empirical distribution function of the $R_i$, and let $W_n(u) = n^{1/2}(U_n(u) - u)$. Write
$$J(U_n(u)) - J(u) = J'(V_n^*(u))\,(U_n(u) - u),$$
where $V_n^*(u) = aU_n(u) + (1 - a)u$ and $|a| \le 1$. Then, with

(4.5)  $H(R_1) = -\int_0^1 J(u)\,[U_1(u) - u]\,dG(u)$

($U_1$ being the empirical distribution function of $R_1$ alone, so that the $H(R_i)$ are i.i.d. with mean $0$), it follows that $T_n - \theta = I_{n1} + I_{n2} + I_{n3}$, where

(4.6)
$$I_{n1} = n^{-1}\sum_{i=1}^{n} H(R_i), \qquad I_{n2} = n^{-1/2}\int_0^1 G(u)\,[J'(V_n^*(u)) - J'(u)]\,W_n(u)\,dU_n(u), \qquad I_{n3} = n^{-1}\int_0^1 G(u)J'(u)W_n(u)\,dW_n(u).$$

Now,
$$|I_{n2}| \le n^{-1/2}\,\sup_u|J'(V_n^*(u)) - J'(u)|\ \sup_u|W_n(u)|\ \int_0^1 |G(x)|\,dU_n(x),$$
and $\int_0^1 |G(x)|\,dU_n(x) \to \int_0^1 |G(x)|\,dx$ a.s. By the Law of the Iterated Logarithm for the Kolmogorov-Smirnov statistic and the Lipschitz condition on $J'$, it follows that $|I_{n2}| = O(n^{-1}\log_2 n)$ a.s. as $n \to \infty$. As in Moore's proof, integration by parts gives

(4.7)  $2|I_{n3}| = n^{-1}\Big|\int_0^1 [W_n(u)]^2\,d(G(u)J'(u))\Big| = \Big|\int_0^1 [U_n(u) - u]^2\,d(G(u)J'(u))\Big|.$

If $\int_0^1 |d(G(u)J'(u))| < \infty$, this gives $|I_{n3}| = O(n^{-1}\log_2 n)$ a.s. Otherwise, denote the right-hand side of (4.7) by $B_n$. By the moment relation used in the proof of Proposition 2.3, one sees that $E|n(U_n(u) - u)|^8 \le c_0 n^4 h(u)$, where $h(u) \ge 0$ is integrable with respect to $G(u)J'(u)$. Thus, two applications of Schwarz's Inequality yield $E B_n^4 \le cn^4/n^8 = cn^{-4}$. By Chebychev's Inequality and the Borel-Cantelli Lemma, $n^{3/4-\alpha} B_n \to 0$ a.s., completing the proof.
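To make the representation concrete, here is a small numerical check (an illustration, not part of the paper): take $F$ uniform on $(0,1)$, so that $G(u) = u$, and the smooth weights $J(t) = 6t(1-t)$, for which $J'$ is Lipschitz on $[0,1]$. Direct computation from (4.5) then gives $\theta = E\,XJ(F(X)) = 1/2$ and $Y_i = H(R_i) = 3R_i^2 - 2R_i^3 - 1/2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
x = rng.uniform(0.0, 1.0, n)                 # F uniform, so R_i = F(X_i) = X_i

t = np.arange(1, n + 1) / n
Tn = np.mean(6 * t * (1 - t) * np.sort(x))   # (4.1) with J(t) = 6 t (1 - t)

Y = 3 * x**2 - 2 * x**3 - 0.5                # Y_i = H(R_i) from (4.5); mean zero
gap = abs(Tn - 0.5 - Y.mean())
print(gap)   # far smaller than the n^{-1/2} scale of T_n - theta
```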
Theorem 4.2.  Suppose $J$, $J'$, $J''$ are continuous and bounded except possibly at $\{a_1, \ldots, a_k\}$, and possess left and right limits. If $F$ satisfies the conditions of Bahadur (1966) at $\{a_1, \ldots, a_k\}$, the conclusions (4.3) and (4.4) hold, except that the order term in (4.4) is $O(n^{-1}(\log n)^2)$.
Proof:  Following Moore (1968) again, we have that

(4.8)  $T_n - \theta = \displaystyle\int_0^1 G(u)J'(u)V_n(u)\,du + \int_0^1 G(u)J(u)\,dV_n(u) + \int_0^1 G(u)\{J(U_n(u)) - J(u) - J'(u)V_n(u)\}\,dU_n(u) + \int_0^1 G(u)J'(u)V_n(u)\,dV_n(u),$

where $V_n(u) = U_n(u) - u = n^{-1/2}W_n(u)$. The last term is the $I_{n3}$ appearing in Theorem 4.1. The first two terms of (4.8) can be decomposed (via integration by parts) into

(4.9)  $\displaystyle\sum_{j=1}^{k+1}\{V_n(a_j)G(a_j)J(a_j-) - V_n(a_{j-1})G(a_{j-1})J(a_{j-1}+)\} + \int_0^1 J(u)V_n(u)\,dG(u),$

with $a_0 = 0$ and $a_{k+1} = 1$. As in Theorem 4.1, the second term in (4.9) is the sum of i.i.d. mean $0$ random variables. Call the first term $A_n$. The third term of (4.8) is

(4.10)  $\displaystyle\sum_{j=1}^{k+1}\int_{a_{j-1}+\beta(n)}^{a_j-\beta(n)} G(u)\{J(U_n(u)) - J(u) - J'(u)V_n(u)\}\,dU_n(u) + \sum_{j=1}^{k}\int_{a_j-\beta(n)}^{a_j+\beta(n)} G(u)\{J(U_n(u)) - J(u) - J'(u)V_n(u)\}\,dU_n(u),$

where $\beta(n)$ converges to zero. By choosing $\beta(n) = O(n^{-1/2}(\log_2 n)^{1/2})$ appropriately, the first term on the right hand side of (4.10) is $O(n^{-1}\log_2 n)$ a.s., since $U_n(u)$ and $u$ are together (a.s.) in the intervals $(a_{j-1} + \beta(n),\ a_j - \beta(n))$ and $J'$ is differentiable there. By choosing $\beta(n) = n^{-1/2}(\log n)^{1/2}$, since $V_n(u) = o(\beta(n))$, Proposition 2.3 can be used to show that
$$\int_{a_j-\beta(n)}^{a_j+\beta(n)} G(u)J'(u)V_n(u)\,dU_n(u) = O(n^{-1}(\log n)^2) \quad\text{a.s.}$$
Thus, we need only consider

(4.10b)  $\displaystyle\sum_{j=1}^{k}\int_{a_j-\beta(n)}^{a_j+\beta(n)} G(u)\{J(U_n(u)) - J(u)\}\,dU_n(u).$

Assume for the moment that $R_{([na_j])} \le a_j$, where $R_{(k)}$ is the $k$th order statistic of $R_1, \ldots, R_n$. Define
$$B_{jn1} = \{k:\ a_j - \beta(n) < R_{(k)} \le R_{([na_j])}\}, \qquad B_{jn2} = \{k:\ R_{([na_j])} < R_{(k)} \le a_j\}, \qquad B_{jn3} = \{k:\ a_j < R_{(k)} < a_j + \beta(n)\}.$$
Then, for a given $j$, (4.10b) becomes

(4.11)  $n^{-1}\Big(\displaystyle\sum_{B_{jn1}} + \sum_{B_{jn2}} + \sum_{B_{jn3}}\Big)\, G(R_{(k)})\,\{J(k/n) - J(R_{(k)})\}.$

Now, de Wet and Venter (1974) show that the number of terms between $R_{([na_j])}$ and $a_j$ is $O(n^{1/2}(\log_2 n)^{1/2})$ a.s.; because $J'$ is bounded, one can use Proposition 2.3 to show that the first and last sums of (4.11) are $O(n^{-1}(\log n)^2)$ a.s. By a similar argument, the middle sum of (4.11) is

(4.12)  $n^{-1}\displaystyle\sum_{B_{jn2}} G(R_{(k)})\,\{J(a_j+) - J(a_j-)\} + O(n^{-1}(\log n)^2) = n^{-1}\{J(a_j+) - J(a_j-)\}\,\{\text{sum of the observations between } F^{-1}(a_j) \text{ and } X_{([na_j])}\} + O(n^{-1}(\log n)^2)$  a.s.

Now, the first term of (4.9) is $-\sum_{j=1}^{k} V_n(a_j)G(a_j)(J(a_j+) - J(a_j-))$, so that this term added to the first term on the right hand side of (4.12) becomes, for $c_j = J(a_j+) - J(a_j-)$,

(4.13)
$$c_j\Big\{n^{-1}\sum_{B_{jn2}} G(R_{(k)}) - G(a_j)V_n(a_j)\Big\} = c_j n^{-1}\sum_{B_{jn2}}\{G(R_{(k)}) - G(a_j)\} + c_j n^{-1}\big\{n\big(U_n(a_j) - U_n(R_{([na_j])})\big) - nV_n(a_j)\big\}\,G(a_j)$$
$$= c_j n^{-1}\{na_j - [na_j]\}\,G(a_j) + O(n^{-1}(\log n)^2) = O(n^{-1}(\log n)^2) \quad\text{a.s.}$$

The second equality in (4.13) follows by using Proposition 2.3, since $G(R_{([na_j])}) - G(a_j) = O(n^{-1/2}(\log_2 n)^{1/2})$ a.s. This completes the proof.
APPLICATIONS

We now apply the results of Sections 2 and 4 to get the asymptotic equivalence of M-estimators and LFO's and to obtain an expansion for flexible estimates of location. We first consider the equivalence, showing that if Jaeckel's (1971a) conditions are strengthened, the result follows easily from the representations and is much stronger than any result appearing in the literature.
Theorem 5.1.  Suppose the following hold:

(5.1a)  $J(t) = A\psi'(G(t))$, where $A = \{E_F\psi'(X)\}^{-1}$.

(5.1b)  $\psi$, $J$ satisfy the conditions of Theorems 2.1 and 4.2 respectively, and $F$ is continuous.

(5.1c)  $J(t) = \psi'(G(t)) = 0$ for $t \notin [a,b]$, where $0 < a < b < 1$.

(5.1d)  $M_n$ is the M-estimator formed by (2.1) and $L_n$ is the LFO formed by $J$.

Then,

(5.2)  $M_n - L_n = O(n^{-1}(\log n)^2)$  a.s.
Proof:  Because of Theorems 2.2 and 4.2, it suffices to show that $H(R_1) = A\psi(X_1)$. Now, with $F_1$ the empirical distribution function of $X_1$,
$$H(R_1) = -\int J(u)\,[U_1(u) - u]\,dG(u) = -A\int \psi'(G(u))\,[U_1(u) - u]\,dG(u) = -A\int \psi'(y)\,[F_1(y) - F(y)]\,dy$$
$$= -A\int F_1(y)\,d\psi(y) + A\int F(y)\,d\psi(y) = A\int \psi(y)\,dF_1(y) = A\psi(X_1),$$
using $\int \psi\,dF = 0$.
Corollary 5.1.  The conditions of Theorem 5.1 hold, for example, if

(5.3a)  $F'$, $F''$ exist and are bounded away from zero on an interval.

(5.3b)  $\psi$ has 3 continuous bounded derivatives at all but a finite number of points.

(5.3c)  $J$ is given by (5.1a) and vanishes off $[a,b]$.
We now turn to flexible estimates of location. We consider the M-estimators defined by $\psi_k$ in (2.2) and the trimmed means given by $J_\alpha(t) = (1 - 2\alpha)^{-1}$ if $\alpha \le t \le 1 - \alpha$ and zero otherwise. We will assume $F$ is symmetric for convenience, although some similar results hold if $F$ is asymmetric. Further, some similar results hold for the scale invariant version of the M-estimators.
The asymptotic variances of these estimators are

(5.4)
$$\sigma_M^2(k) = \{F(k) - F(-k)\}^{-2}\int \psi_k^2(x)\,dF(x),$$
$$\sigma_L^2(\alpha) = (1 - 2\alpha)^{-2}\Big\{\int_{x_\alpha}^{x_{1-\alpha}} x^2\,dF(x) + 2\alpha x_\alpha^2\Big\},$$
where $x_\alpha = G(\alpha) = F^{-1}(\alpha)$. Denoting the M-estimator by $M_{nk}$ and the $\alpha$-trimmed mean by $L_{n\alpha}$, we estimate these asymptotic variances by

(5.5)
$$s_M^2(k) = \{F_n(k + M_{nk}) - F_n(-k + M_{nk})\}^{-2}\int \psi_k^2(x - M_{nk})\,dF_n(x),$$
$$s_L^2(\alpha) = (1 - 2\alpha)^{-2}\Big\{n^{-1}\sum_{i=[\alpha n]+1}^{n-[\alpha n]} (X_{(i)} - L_{n\alpha})^2 + 2\alpha\,(X_{([\alpha n])} - L_{n\alpha})^2\Big\}.$$
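In the spirit of Jaeckel (1971b), the trimming proportion can be chosen to minimize the estimated variance over a fixed range. The sketch below is an illustration, not the paper's exact recipe: the grid is arbitrary, and the tail term symmetrizes the two extremes rather than exploiting the assumed symmetry of $F$:

```python
import numpy as np

def trimmed_mean(x, alpha):
    xs = np.sort(x)
    m = int(alpha * len(x))
    return xs[m:len(x) - m].mean()          # L_{n alpha}

def s2_L(x, alpha):
    # empirical analogue of sigma_L^2(alpha) in (5.4), following (5.5)
    n, xs = len(x), np.sort(x)
    m = int(alpha * n)
    L = xs[m:n - m].mean()
    inner = np.sum((xs[m:n - m] - L) ** 2) / n
    tails = alpha * ((xs[m] - L) ** 2 + (xs[n - m - 1] - L) ** 2)
    return (inner + tails) / (1 - 2 * alpha) ** 2

def adaptive_trimmed_mean(x, grid=(0.05, 0.10, 0.15, 0.20, 0.25)):
    # alpha_n minimizes s2_L over the fixed range [alpha_0, alpha_1]
    a_n = min(grid, key=lambda a: s2_L(x, a))
    return trimmed_mean(x, a_n), a_n
```

Theorem 5.3 below asserts that, under its conditions, the adaptively chosen $L_{n\alpha_n}$ inherits the linear representation at the optimal value $\alpha^*$.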
We define the optimal estimators as follows: assume ranges of permissible values $k_0 \le k \le k_1$ and $0 < \alpha_0 \le \alpha \le \alpha_1 < \tfrac12$ fixed in advance. Compute $s_M^2(k)$ and $s_L^2(\alpha)$ for these ranges, and choose $k_n$ and $\alpha_n$ which minimize $s_M^2(k)$ and $s_L^2(\alpha)$ on these ranges. We will assume $k_n$ and $\alpha_n$ are uniquely defined, and that the values $k^*$, $\alpha^*$ minimizing $\sigma_M^2(k)$ and $\sigma_L^2(\alpha)$ on the ranges are unique. If this latter assumption is not true, modifications along the lines suggested by Jaeckel (1971b) are possible. Define the adaptive estimators to be $M_{nk_n}$ and $L_{n\alpha_n}$.
We first consider the M-estimators.
The following Lemma is an easy
consequence of an extension of Theorems 2.1 and 2.2, along with the Law of the
Iterated Logarithm for the empirical distribution function.
Lemma 5.1.  Let $k'$ be any point in $C_M = [k_0, k_1]$. Suppose that $F$ is continuously differentiable in neighborhoods of $\pm k'$ and that $\inf_{k \in C_M}|F(k) - F(-k)| > 0$. Then
$$\sup_{C_M}\Big| M_{nk} - \{E_F\psi_k'(X)\}^{-1}\, n^{-1}\sum_{i=1}^{n}\psi_k(X_i)\Big| = O(n^{-1}(\log n)^2) \quad\text{a.s.}$$
If in addition $F$ is continuously differentiable on $C_M$, then
$$\sup_{C_M}\big| s_M^2(k) - \sigma_M^2(k)\big| = O(n^{-1/2}\log n) \quad\text{a.s.}$$
Lemma 5.2.  Under the conditions of Lemma 5.1, if $\sigma_M^2(k)$ is continuously differentiable in $k$ on $C_M$, then $k_n - k^* = O(n^{-1/2}\log n)$ a.s.

Proof:  Define $\delta(n) = M_1 n^{-1/2}\log n$ and $I_n = \{k \in C_M:\ k \notin (k^* - \delta(n),\ k^* + \delta(n))\}$, where $M_1$ will be large. Let $\bar\sigma_n = \sigma_M^2(a_n) = \inf_{I_n}\sigma_M^2(k)$. By Proposition 2.4, we have $\delta(n) \le |a_n - k^*| \le M_2\delta(n)$, so that $a_n \to k^*$. Thus, by choosing $M_1$ sufficiently large, we can find a sequence of constants $D(n) = O(n^{-1/2}\log n)$ such that $\bar\sigma_n - D(n) > \sigma_M^2(k^*) + D(n)$. Thus, with probability approaching one,
$$\inf_{I_n} s_M^2(k) \ge \bar\sigma_n - D(n) > \sigma_M^2(k^*) + D(n) \ge s_M^2(k^*),$$
so that with probability approaching one, $k_n \notin I_n$, yielding the Lemma.
Thus, by bringing Lemmas 5.1 and 5.2 together and by following the proof of Theorem 2.2, we obtain the following:

Theorem 5.2.  Under the conditions of Lemmas 5.1 and 5.2,
$$M_{nk_n} = \{E_F\psi_{k^*}'(X)\}^{-1}\, n^{-1}\sum_{i=1}^{n}\psi_{k^*}(X_i) + O(n^{-1}(\log n)^2) \quad\text{a.s.}$$

We now show that a result similar to Theorem 5.2 holds for the trimmed mean.
Lemma 5.3.  Let $\alpha'$ be a fixed point and suppose that, for neighborhoods $C_1$, $C_2$ of $\alpha'$, $1 - \alpha'$, $F$ satisfies the conditions of Bahadur (1966). Then

(5.6)  $\sup_{C_1}\Big| L_{n\alpha} - n^{-1}\displaystyle\sum_{i=1}^{n} Y_i^*(\alpha)\Big| = O(n^{-1}(\log n)^2)$  a.s.,

where
$$Y_i^*(\alpha) = \begin{cases} G(\alpha)(1 - 2\alpha)^{-1}, & X_i < G(\alpha) \\ X_i(1 - 2\alpha)^{-1}, & G(\alpha) \le X_i \le G(1 - \alpha) \\ G(1 - \alpha)(1 - 2\alpha)^{-1}, & X_i > G(1 - \alpha). \end{cases}$$

If the Bahadur conditions hold on the set $\{x:\ \alpha_0 - \varepsilon_0 \le F(x) \le 1 - (\alpha_0 - \varepsilon_0)\}$ with $\varepsilon_0 > 0$, then the sup in (5.6) is over $C_L$ and

(5.7)  $\sup_{C_L}\big| s_L^2(\alpha) - \sigma_L^2(\alpha)\big| = O(n^{-1/2}\log n)$  a.s.
Proof:  From Sen and Ghosh (1971), if $g(n) = n^{-3/4}\log n$,
$$\sup_{0<t<1}\ \sup_{|a|\le g(n)}\big| U_n(t + a) - U_n(t) - a\big| = O(n^{-3/4}\log n) \quad\text{a.s.}$$
This, together with the Law of the Iterated Logarithm for the empirical distribution function, implies that
$$\sup_{\alpha_0\le\alpha\le 1-\alpha_0}\big| F(X_{([n\alpha])}) - F(G(\alpha))\big| = O(n^{-1/2}\log n) \quad\text{a.s.}$$
and
$$\sup_{\alpha_0\le\alpha\le 1-\alpha_0}\big| F_n(X_{([n\alpha])}) - F_n(G(\alpha))\big| = O(n^{-3/4}\log n) \quad\text{a.s.}$$
Now, following the proof of de Wet and Venter (1974), we see that (5.6) holds. Also, applying the exponential bounds to $n^{-1}\sum_{i=1}^{n} Y_i^*(\alpha)$ together with (5.6) yields
$$\sup_{C_L} |L_{n\alpha}| = O(n^{-1/2}\log n) \quad\text{a.s.}$$
The proof of (5.7) now follows easily.
Theorem 5.3.  Under the conditions of Lemma 5.3, if $\sigma_L^2(\alpha)$ is continuously differentiable on $C_L$, then
$$\alpha_n - \alpha^* = O(n^{-1/2}\log n) \quad\text{a.s.}$$
and
$$L_{n\alpha_n} = n^{-1}\sum_{i=1}^{n} Y_i^*(\alpha^*) + O(n^{-1}(\log n)^2) \quad\text{a.s.}$$
The proof is as in Lemma 5.2 and Theorem 5.2.
Finally, if $F$ satisfies the conditions of Bahadur (1966) at $G(\alpha)$ and $G(1 - \alpha)$, one can obtain some of the results of Shorack (1974) directly from Lemma 5.3. To be specific, if we estimate $\alpha$ by $\hat\alpha_n$ and if $\hat\alpha_n - \alpha = o(n^{-1/2})$ in probability, then under the initial conditions of Lemma 5.3, equation (5.6) tells us that

(5.8)  $L_{n\hat\alpha_n} - L_{n\alpha} = n^{-1}\displaystyle\sum_{i=1}^{n}\{Y_i^*(\hat\alpha_n) - Y_i^*(\alpha)\} + O(n^{-1}(\log n)^2)$  a.s.

Now, use the last part of Proposition 2.3 to show that the first term on the right-hand side of (5.8) is $o(n^{-1/2})$ in probability.
BIBLIOGRAPHY

ANDREWS, D.F., BICKEL, P.J., HAMPEL, F.R., HUBER, P.J., ROGERS, W.H., and TUKEY, J.W. (1972). Robust Estimates of Location: Survey and Advances. Princeton University Press.

BAHADUR, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. (37) 577-580.

BICKEL, P.J. (1975). One-step estimates in the linear model. To appear in J. Amer. Statist. Assoc.

CARROLL, R.J. (1974). Asymptotically nonparametric sequential selection procedures II - robust estimators. Institute of Statistics Mimeo Series #953, University of North Carolina at Chapel Hill.

DE WET, T. and VENTER, J.H. (1974). An asymptotic representation of trimmed means with applications. S. Afr. Statist. J. (8) 127-134.

DE WET, T. (1974). Rates of convergence of linear combinations of order statistics. S. Afr. Statist. J. (8) 35-44.

FILIPPOVA, A.A. (1962). Mises' theorem on the asymptotic behavior of functionals of empirical distribution functions and its statistical applications. Theory of Probability and its Applications (7) 24-57.

HAMPEL, F.R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. (69) 383-393.

HUBER, P.J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. (35) 73-101.

HUBER, P.J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. (1) 221-233.

HUBER, P.J. (1972). Robust statistics: a review. Ann. Math. Statist. (43) 1041-1067.

JAECKEL, L.B. (1971a). Robust estimates of location: symmetry and asymmetric contamination. Ann. Math. Statist. (42) 1020-1034.

JAECKEL, L.B. (1971b). Some flexible estimates of location. Ann. Math. Statist. (42) 1540-1552.

JOHNSON, N.L., and KOTZ, S. (1969). Discrete Distributions. Houghton Mifflin, Boston.

LOÈVE, M. (1963). Probability Theory. D. Van Nostrand, Princeton.

MOORE, D.S. (1968). An elementary proof of asymptotic normality of linear functions of order statistics. Ann. Math. Statist. (39) 263-265.

SEN, P.K., and GHOSH, M. (1971). On bounded length sequential confidence intervals based on one-sample rank order statistics. Ann. Math. Statist. (42) 189-203.

SHORACK, G.R. (1974). Random means. Ann. Statist. (2) 661-675.