ROBUST STATISTICS

INTRODUCTION
• Robust statistics provides an alternative approach to classical
statistical methods. The motivation is to produce estimators that are
not excessively affected by small departures from model
assumptions. These departures may include departures from an
assumed sample distribution or data departing from the rest of the
data (i.e. outliers).
MEAN VS MEDIAN
nm
Tn    
Xj
j m1 n  2m
where m  n Usually,0.1    0.2
mX m1  X nm     X i 
nm
Tn   
i m1
n
where m  n 
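• A quick sketch in R (the sample x and the winsor.mean helper are illustrative, not part of these notes); mean() computes the trimmed mean directly via its trim argument:

# Trimmed vs. Winsorized mean on a sample with one gross outlier
x <- c(2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 3.4, 95.0)
mean(x)              # pulled far upward by the outlier
mean(x, trim = 0.1)  # 10% trimmed mean: drops m = [n*alpha] points per tail

# Winsorized mean computed from the definition above
winsor.mean <- function(x, alpha = 0.1) {
  x <- sort(x)
  n <- length(x)
  m <- floor(n * alpha)
  (m * (x[m + 1] + x[n - m]) + sum(x[(m + 1):(n - m)])) / n
}
winsor.mean(x)       # close to the trimmed mean, far from mean(x)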
ROBUST MEASURE OF VARIABILITY
$$\mathrm{MAD} = \operatorname*{median}_{i}\,\bigl|\, Y_i - \operatorname*{median}_{j}(Y_j) \,\bigr|$$
ORDER STATISTICS AND ROBUSTNESS
• Order statistics and their functions are usually somewhat robust
(e.g. the median, MAD, IQR), but not all of them are robust (e.g.
X(1), X(n), and the range R = X(n) − X(1)).
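• A small R demonstration of this contrast (the sample y is illustrative): the median, MAD, and IQR barely move when one gross outlier is added, while the mean and range react strongly. Note that mad() in R rescales by 1.4826 by default, so that it estimates the standard deviation at the normal; set constant = 1 for the raw definition above.

y  <- c(2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 3.4)
y2 <- c(y, 95.0)                      # add one gross outlier
c(mean(y), mean(y2))                  # mean shifts badly
c(median(y), median(y2))              # median barely moves
median(abs(y2 - median(y2)))          # MAD from the definition
mad(y2, constant = 1)                 # the same value via mad()
c(diff(range(y)), diff(range(y2)))    # range R = X(n) - X(1) explodes
c(IQR(y), IQR(y2))                    # IQR stays stable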
M-ESTIMATORS
• An M-estimator of location solves
$$\min_{\mu}\; \sum_i \bigl[-\log f(Y_i - \mu)\bigr] \;=\; \min_{\mu}\; \sum_i \rho(Y_i - \mu).$$
Differentiating with respect to μ gives the estimating equation
$$\sum_i \psi(Y_i - \hat{\mu}) = 0
\quad\text{or}\quad
\sum_i w_i\,(Y_i - \hat{\mu}) = 0,
\qquad\text{where } w_i = \frac{\psi(Y_i - \hat{\mu})}{Y_i - \hat{\mu}}.$$
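• The second (weighted) form suggests a simple fixed-point algorithm for the location estimate. A minimal R sketch with Huber weights (huber.loc is an illustrative name; the tuning constant c = 1.345 and the MAD scale are conventional choices, not prescribed above):

huber.loc <- function(y, c = 1.345, tol = 1e-8, maxit = 50) {
  mu <- median(y)                 # robust starting value
  s  <- mad(y)                    # robust scale, held fixed
  for (it in seq_len(maxit)) {
    u <- (y - mu) / s
    w <- pmin(1, c / abs(u))      # w_i = psi(u_i)/u_i for Huber's psi
    mu.new <- sum(w * y) / sum(w) # weighted mean solves sum w_i (y_i - mu) = 0
    if (abs(mu.new - mu) < tol * s) break
    mu <- mu.new
  }
  mu
}

• MASS::huber() implements essentially this computation.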
M-ESTIMATORS
• When an estimator is robust, the influence of any single observation
is insufficient to produce a significant offset in the estimate.
There are several constraints that a robust M-estimator should meet:
1. The first is of course to have a bounded influence function.
2. The second is naturally the requirement that the robust estimator
be unique.
Briefly, we give a few indications of these functions:
• L2 (least-squares) estimators are not robust because their influence
function is not bounded.
• L1 (absolute-value) estimators are not stable because the ρ function |x| is not strictly convex in x: the second derivative
at x = 0 is unbounded, and an indeterminate solution may result.
• L1-L2 estimators reduce the influence of large errors, but such errors still
have an influence because the influence function has no cutoff point.
EXAMPLES OF M-ESTIMATORS
• The mean corresponds to ρ(x) = x², and the median to ρ(x) = |x|. (For
even n any median will solve the problem.) The function
$$\psi(x) = \begin{cases} x, & |x| < c \\ 0, & \text{otherwise} \end{cases}$$
corresponds to metric trimming and large outliers have no influence at
all. The function
$$\psi(x) = \begin{cases} -c, & x < -c \\ x, & |x| \le c \\ c, & x > c \end{cases}$$
is known as metric Winsorizing and brings in extreme observations to
μ±c.
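• In R these two ψ functions can be written directly (psi.trim and psi.winsor are illustrative names; c = 2 is an arbitrary choice for the plot):

psi.trim   <- function(x, c = 2) ifelse(abs(x) < c, x, 0)  # metric trimming
psi.winsor <- function(x, c = 2) pmax(-c, pmin(x, c))      # metric Winsorizing
curve(psi.winsor(x), -5, 5, ylab = "psi(x)")
curve(psi.trim(x), -5, 5, add = TRUE, lty = 2)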
EXAMPLES OF M-ESTIMATORS
• The corresponding −log f is
$$\rho(x) = \begin{cases} x^{2}, & \text{if } |x| < c \\ c\,(2|x| - c), & \text{otherwise} \end{cases}$$
and corresponds to a density with a Gaussian center and double-exponential tails. This estimator is due to Huber.
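• A sketch of Huber's ρ in R (rho.huber is an illustrative name; c = 1.345 is a conventional tuning constant, not stated above). Plotted against x², it is quadratic in the center but grows only linearly in the tails:

rho.huber <- function(x, c = 1.345)
  ifelse(abs(x) < c, x^2, c * (2 * abs(x) - c))
curve(x^2, -4, 4, lty = 2, ylab = "rho(x)")  # least squares, for comparison
curve(rho.huber(x), -4, 4, add = TRUE)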
EXAMPLES OF M-ESTIMATORS
• Tukey’s biweight has
$$\psi(t) = t\left[\,1 - \left(\frac{t}{R}\right)^{2}\right]_{+}^{2}$$
where $[\,\cdot\,]_{+}$ denotes the positive part. This implements ‘soft’ trimming.
The value R = 4.685 gives 95% efficiency at the normal.
• Hampel’s ψ has several linear pieces,
$$\psi(x) = \operatorname{sgn}(x)\begin{cases} |x|, & 0 \le |x| < a \\ a, & a \le |x| < b \\ a\,(c - |x|)/(c - b), & b \le |x| < c \\ 0, & c \le |x| \end{cases}$$
for example, with a = 2.2s, b = 3.7s, c = 5.9s.
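• Both ψ functions are easy to code and compare in R (illustrative names; the scale s in Hampel's constants is set to 1 here):

psi.biweight <- function(t, R = 4.685)
  t * pmax(1 - (t / R)^2, 0)^2              # soft trimming: 0 beyond |t| = R
psi.hampel <- function(x, a = 2.2, b = 3.7, c = 5.9) {
  u <- abs(x)
  sign(x) * ifelse(u < a, u,
            ifelse(u < b, a,
            ifelse(u < c, a * (c - u) / (c - b), 0)))
}
curve(psi.biweight(x), -8, 8, ylab = "psi(x)")
curve(psi.hampel(x), -8, 8, add = TRUE, lty = 2)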
ROBUST REGRESSION
• Robust regression procedures dampen the influence of outlying cases, as compared with
ordinary least squares, in an effort to provide a better fit for the majority of cases.
• LEAST ABSOLUTE RESIDUALS (LAR) REGRESSION: estimates the
regression coefficients by minimizing the sum of the absolute deviations of
the Y observations from their fitted values:
$$\min_{\beta}\;\sum_{i=1}^{n}\left|\, Y_i - \beta_0 - \beta_1 X_{i,1} - \cdots - \beta_{p-1} X_{i,p-1} \,\right|$$
Since absolute deviations rather than squared ones are involved, LAR
places less emphasis on outlying observations than does the method of
least squares. The residuals ordinarily will not sum to 0, and the solution
for the estimated coefficients may not be unique.
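• LAR is median (L1) regression, so one way to fit it in R is quantile regression at τ = 0.5 via the quantreg package (dat, y, and x1 are placeholder names):

library(quantreg)
fit.lar <- rq(y ~ x1, tau = 0.5, data = dat)  # minimizes sum |residual|
coef(fit.lar)
sum(residuals(fit.lar))                       # need not be 0, unlike OLS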
ROBUST REGRESSION
• ITERATIVELY REWEIGHTED LEAST SQUARES (IRLS) ROBUST
REGRESSION: uses a weighted least squares procedure:
$$\min_{\beta}\;\sum_{i=1}^{n} w_i \left( Y_i - \beta_0 - \beta_1 X_{i,1} - \cdots - \beta_{p-1} X_{i,p-1} \right)^{2}$$
This regression uses weights based on how far outlying a case is, as
measured by the residual for that case. The weights are revised with
each iteration until a robust fit has been obtained.
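• The mechanics can be sketched by hand in R with Huber weights (x and y are placeholder vectors; MASS::rlm automates this, including the scale estimation):

fit <- lm(y ~ x)                    # start from the OLS fit
for (it in 1:25) {
  r <- residuals(fit)
  s <- mad(r)                       # robust scale of the residuals
  w <- pmin(1, 1.345 / abs(r / s))  # Huber weights from scaled residuals
  fit <- lm(y ~ x, weights = w)     # refit by weighted least squares
}
coef(fit)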
ROBUST REGRESSION
• LEAST MEDIAN OF SQUARES (LMS) REGRESSION: minimizes the median,
rather than the sum, of the squared deviations:
$$\min_{\beta}\;\operatorname*{median}_{i}\left( Y_i - \beta_0 - \beta_1 X_{i,1} - \cdots - \beta_{p-1} X_{i,p-1} \right)^{2}$$
• Other robust regression methods: Some involve trimming extreme
squared deviations before applying LSE, others are based on ranks.
Many of the robust regression procedures require intensive
computing.
EXAMPLE
• This data set gives n = 24 observations of the annual number of
telephone calls made in Belgium (calls, in millions of calls), with the
year recorded by its last two digits (year); see Rousseeuw and Leroy (1987)
and Venables and Ripley (2002). As can be noted in the figure, there are
several outliers in the y-direction in the late 1960s.
• Let us start the analysis with the classical OLS fit.
> library(MASS)   # provides the phones data and rlm()
> data(phones)
> attach(phones)
> plot(year,calls)
> fit.ols <- lm(calls~year)
> summary(fit.ols,cor=F)
..
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -260.059 102.607 -2.535 0.0189 *
year 5.041 1.658 3.041 0.0060 **
Residual standard error: 56.22 on 22 degrees of freedom
Multiple R-Squared: 0.2959, Adjusted R-squared: 0.2639
F-statistic: 9.247 on 1 and 22 DF, p-value: 0.005998
> abline(fit.ols$coef)
> par(mfrow=c(1,4))
> plot(fit.ols,1:2)
> plot(fit.ols,4)
> hmat.p <- model.matrix(fit.ols)
> h.phone <- hat(hmat.p)
> cook.d <- cooks.distance(fit.ols)
> plot(h.phone/(1-h.phone),cook.d,xlab="h/(1-h)",ylab="Cook distance")
• To take into account the observations with large residuals, i.e. the
outliers in the late 1960s, consider a robust regression based on Huber-type estimates:
> fit.hub <- rlm(calls~year,maxit=50)
> fit.hub2 <- rlm(calls~year,scale.est="proposal 2")
> summary(fit.hub,cor=F)
..
Coefficients:
Value Std. Error t value
(Intercept) -102.6222 26.6082 -3.8568
year 2.0414 0.4299 4.7480
Residual standard error: 9.032 on 22 degrees of freedom
> summary(fit.hub2,cor=F)
..
Coefficients:
Value Std. Error t value
(Intercept) -227.9250 101.8740 -2.2373
year 4.4530 1.6461 2.7052
Residual standard error: 57.25 on 22 degrees of freedom
> abline(fit.hub$coef,lty=2)
> abline(fit.hub2$coef,lty=3)
• From these results, and also from the previous plot, we note that there are
some differences from the OLS estimates; this is particularly true for the Huber-type
estimator with MAD scale. Consider again some classic diagnostic plots for the
robust fit: the plot of the observed values versus the fitted values, the plot of the
residuals versus the fitted values, the normal QQ-plot of the residuals, and the fit
weights of the robust estimator. Note that there are some observations with low
Huber-type weights that were not identified by the classical Cook’s statistic.
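• A sketch of those diagnostic plots (this assumes the fit.hub object above; rlm stores the final IRLS weights in the w component):

> par(mfrow=c(1,4))
> plot(fitted(fit.hub),calls); abline(0,1)
> plot(fitted(fit.hub),residuals(fit.hub))
> qqnorm(residuals(fit.hub)); qqline(residuals(fit.hub))
> plot(year,fit.hub$w,ylab="Huber weight")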