A SHORT COURSE ON
ROBUST STATISTICS
David E. Tyler
Rutgers
The State University of New Jersey
Web-site: www.rci.rutgers.edu/~dtyler/ShortCourse.pdf
References
• Huber, P.J. (1981). Robust Statistics. Wiley, New York.
• Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986).
Robust Statistics: The Approach Based on Influence Functions.
Wiley, New York.
• Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006).
Robust Statistics: Theory and Methods. Wiley, New York.
PART 1
CONCEPTS AND BASIC METHODS
MOTIVATION
• Data Set: X1, X2, . . . , Xn
• Parametric Model: F (x1, . . . , xn | θ)
θ: Unknown parameter
F : Known function
• e.g. X1, X2, . . . , Xn i.i.d. Normal(µ, σ²)
Q: Is it realistic to believe we don't know (µ, σ²), but we know, e.g., the shape of the
tails of the distribution?
A: The model is assumed to be approximately true, e.g. symmetric and unimodal
(past experience).
Q: Are statistical methods which are good under the model reasonably good if the
model is only approximately true?
ROBUST STATISTICS
Formally addresses this issue.
CLASSIC EXAMPLE: MEAN vs. MEDIAN



• Symmetric distributions: µ = population mean = population median
• Sample mean: X̄ ≈ Normal(µ, σ²/n)
• Sample median: Median ≈ Normal(µ, 1/(4 n f(µ)²))
• At the normal: Median ≈ Normal(µ, (π/2)(σ²/n))
• Asymptotic Relative Efficiency of the median to the mean:
ARE(Median, X̄) = avar(X̄)/avar(Median) = 2/π = 0.6366
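As a quick numerical check, here is a minimal simulation sketch (assuming NumPy; the setup is illustrative, not from the course) of the 2/π efficiency of the median at the normal:

```python
# Monte Carlo check that ARE(Median, Mean) = 2/pi = 0.6366 at the normal.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 20000
x = rng.normal(0.0, 1.0, size=(reps, n))

var_mean = np.var(x.mean(axis=1))            # approx sigma^2 / n
var_median = np.var(np.median(x, axis=1))    # approx (pi/2) sigma^2 / n

print(var_mean / var_median)                 # approx 2/pi = 0.6366
```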
CAUCHY DISTRIBUTION
X ∼ Cauchy(µ, σ)
f(x; µ, σ) = (1/(πσ)) { 1 + ((x − µ)/σ)² }⁻¹
• Mean: X̄ ∼ Cauchy(µ, σ)
• Median ≈ Normal(µ, π²σ²/(4n))
• ARE(Median, X̄) = ∞, or ARE(X̄, Median) = 0
• For the t-distribution on ν degrees of freedom:
ARE(Median, X̄) = 4 Γ((ν + 1)/2)² / { (ν − 2) π Γ(ν/2)² }

ν                  ≤ 2      3        4        5
ARE(Median, X̄)     ∞      1.621    1.125    0.960
ARE(X̄, Median)     0      0.617    0.888    1.041
MIXTURE OF NORMALS
Theory of errors: the Central Limit Theorem gives plausibility to normality.

X ∼ Normal(µ, σ²) with probability 1 − ε
X ∼ Normal(µ, (3σ)²) with probability ε

i.e. not all measurements are equally precise. For ε = 0.10:
X ∼ (1 − ε) Normal(µ, σ²) + ε Normal(µ, (3σ)²)
• Classic paper: Tukey (1960), A survey of sampling from contaminated distributions.
◦ For ε > 0.10 ⇒ ARE(Median, X̄) > 1.
◦ The mean absolute deviation is more efficient than the sample standard deviation
for ε > 0.01.
PRINCETON ROBUSTNESS STUDIES
Andrews et al. (1972)
Other estimates of location.
• α-trimmed mean: Trim a proportion of α from both ends of the data set and then
take the mean. (Throwing away data?)
• α-Winsorized mean: Replace a proportion α at each end of the data set
by the next closest observation and then take the mean.
• Example: 2, 4, 5, 10, 200
Mean = 44.2
Median = 5
20% trimmed mean = (4 + 5 + 10) / 3 = 6.33
20% Winsorized mean = (4 + 4 + 5 + 10 + 10) / 5 = 6.6
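A minimal sketch of both estimates (assuming NumPy; the function names are illustrative, and scipy.stats.trim_mean offers a library version of the first):

```python
import numpy as np

def trimmed_mean(x, alpha):
    # alpha-trimmed mean: drop a proportion alpha from each end, then average
    x = np.sort(np.asarray(x, dtype=float))
    k = int(alpha * len(x))                  # number trimmed from each end
    return x[k:len(x) - k].mean()

def winsorized_mean(x, alpha):
    # alpha-Winsorized mean: replace each tail by the next closest observation
    x = np.sort(np.asarray(x, dtype=float))
    k = int(alpha * len(x))
    x[:k] = x[k]
    x[len(x) - k:] = x[len(x) - k - 1]
    return x.mean()

data = [2, 4, 5, 10, 200]
print(trimmed_mean(data, 0.2))      # 6.33
print(winsorized_mean(data, 0.2))   # 6.6
```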
Measuring the robustness of a statistic
• Relative Efficiency over a range of distributional models.
– There exist estimates of location which are asymptotically most efficient for
the center of any symmetric distribution. (Adaptive estimation, semi-parametrics).
– Robust?
• Influence Function over a range of distributional models.
• Maximum Bias Function and the Breakdown Point.
Measuring the effect of an outlier
(not modeled)
• Good Data Set: x1, . . . , xn−1
Statistic: Tn−1 = T (x1, . . . , xn−1)
• Contaminated Data Set: x1, . . . , xn−1, x
Contaminated Value: Tn = T (x1, . . . , xn−1, x)
THE SENSITIVITY CURVE (Tukey, 1970)
SCn(x) = n (Tn − Tn−1) ⇔ Tn = Tn−1 + (1/n) SCn(x)
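A small sketch (assuming NumPy; the sample is illustrative) that traces SCn(x) for the mean and the median over a grid of contaminating values x:

```python
import numpy as np

rng = np.random.default_rng(1)
good = rng.normal(0.0, 1.0, 19)            # the "good" data x_1, ..., x_{n-1}
grid = np.linspace(-10.0, 10.0, 201)       # contaminating value x

def sensitivity_curve(stat, x):
    n = len(good) + 1
    return n * (stat(np.append(good, x)) - stat(good))

sc_mean = [sensitivity_curve(np.mean, x) for x in grid]      # linear, unbounded
sc_median = [sensitivity_curve(np.median, x) for x in grid]  # bounded
```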
THE INFLUENCE FUNCTION (Hampel, 1969, 1974)
Population version of the sensitivity curve.
THE INFLUENCE FUNCTION
• Statistic Tn = T(Fn) estimates T(F).
• Consider F and the ε-contaminated distribution,
Fε = (1 − ε) F + ε δx
where δx is the point-mass distribution at x.
• Compare functional values: T(F) vs. T(Fε)
• Given qualitative robustness (continuity):
T(Fε) → T(F) as ε → 0
(e.g. the mode is not qualitatively robust)
• Influence Function (infinitesimal perturbation: Gâteaux derivative)
IF(x; T, F) = lim_{ε→0} { T(Fε) − T(F) } / ε = ∂T(Fε)/∂ε |_{ε=0}
EXAMPLES
• Mean: T(F) = EF[X].
T(Fε) = EFε[X] = (1 − ε) EF[X] + ε E[δx] = (1 − ε) T(F) + ε x
IF(x; T, F) = lim_{ε→0} { (1 − ε) T(F) + ε x − T(F) } / ε = x − T(F)
• Median: T(F) = F⁻¹(1/2)
IF(x; T, F) = { 2 f(T(F)) }⁻¹ sign(x − T(F))
Plots of Influence Functions
Gives insight into the behavior of a statistic.
Tn ≈ Tn−1 + (1/n) IF(x; T, F)
[Plots of influence functions: mean, α-trimmed mean, median, α-Winsorized mean
(the last somewhat unexpected?)]
Desirable robustness properties for the influence function
• SMALL
– Gross Error Sensitivity:
GES(T; F) = sup_x | IF(x; T, F) |
GES < ∞ ⇒ B-robust (bias-robust)
– Asymptotic Variance:
AV(T; F) = EF[ IF(X; T, F)² ]
(Note: EF[ IF(X; T, F) ] = 0.)
• Under general conditions, e.g. Fréchet differentiability,
√n (T(X1, . . . , Xn) − T(F)) → Normal(0, AV(T; F))
• Trade-off at the normal model: Smaller AV ↔ Larger GES
• SMOOTH (local shift sensitivity): protects e.g. against rounding error.
• REDESCENDING to 0.
REDESCENDING INFLUENCE FUNCTION
• Example: Data Set of Male Heights in cm
180, 175, 192, . . ., 185, 2020, 190, . . .
• Redescender = Automatic Outlier Detector
CLASSES OF ESTIMATES
L-statistics: Linear combination of order statistics
• Let X(1) ≤ . . . ≤ X(n) represent the order statistics.
T(X1, . . . , Xn) = Σ_{i=1}^n ai,n X(i)
where the ai,n are constants.
• Examples:
◦ Mean: ai,n = 1/n
◦ Median: for n odd, ai,n = 1 if i = (n + 1)/2 and 0 otherwise;
for n even, ai,n = 1/2 if i = n/2 or i = n/2 + 1 and 0 otherwise.
◦ α-trimmed mean.
◦ α-Winsorized mean.
• General form for the influence function exists
• Can obtain any desirable monotonic shape, but not redescending
• Do not readily generalize to other settings.
M-ESTIMATES
Huber (1964, 1967)
Maximum likelihood type estimates
under non-standard conditions
• One-Parameter Case.
X1, . . . , Xn i.i.d. f (x; θ), θ ∈ Θ
• Maximum likelihood estimates
– Likelihood function.
L(θ | x1, . . . , xn) = Π_{i=1}^n f(xi; θ)
– Minimize the negative log-likelihood:
min_θ Σ_{i=1}^n ρ(xi; θ), where ρ(xi; θ) = − log f(xi; θ).
– Solve the likelihood equations:
Σ_{i=1}^n ψ(xi; θ) = 0, where ψ(xi; θ) = ∂ρ(xi; θ)/∂θ.
DEFINITIONS OF M-ESTIMATES
• Objective function approach:
θ̂ = arg min_θ Σ_{i=1}^n ρ(xi; θ)
• M-estimating equation approach:
Σ_{i=1}^n ψ(xi; θ̂) = 0.
– Note: Unique solution when ψ(x; θ) is strictly monotone in θ.
• Basic examples.
– Mean: MLE for the Normal: f(x) = (2π)^{−1/2} e^{−(x−θ)²/2} for x ∈ ℝ.
ρ(x; θ) = (x − θ)²  or  ψ(x; θ) = x − θ
– Median: MLE for the Double Exponential: f(x) = (1/2) e^{−|x−θ|} for x ∈ ℝ.
ρ(x; θ) = | x − θ |  or  ψ(x; θ) = sign(x − θ)
• ρ and ψ need not be related to any density or to each other.
• Estimates can be evaluated under various distributions.
M-ESTIMATES OF LOCATION
A symmetric and translation equivariant M-estimate.
Translation equivariance
Xi → Xi + a ⇒ Tn → Tn + a
gives
ρ(x; t) = ρ(x − t) and ψ(x; t) = ψ(x − t)
Symmetric
Xi → −Xi ⇒ Tn → −Tn
gives
ρ(−r) = ρ(r) or ψ(−r) = −ψ(r).
Alternative derivation
Generalization of MLE for center of symmetry for a given family of symmetric
distributions.
f (x; θ) = g(|x − θ|)
INFLUENCE FUNCTION OF M-ESTIMATES
M-FUNCTIONAL: T (F ) is the solution to EF [ψ(X; T (F ))] = 0.
IF(x; T, F) = c(T, F) ψ(x; T(F))
where c(T, F) = −1 / EF[ ∂ψ(X; θ)/∂θ ], evaluated at θ = T(F).
Note: EF [ IF (X; T, F ) ] = 0.
One can decide what shape is desired for the Influence Function and then construct
an appropriate M-estimate.
[Plots: an unbounded linear IF ⇒ mean; a bounded sign-function IF ⇒ median.]
EXAMPLE
Choose
ψ(r) = −c for r ≤ −c,  r for |r| < c,  c for r ≥ c,
where c is a tuning constant.
Huber’s M-estimate
Adaptively trimmed mean.
i.e. the proportion trimmed depends upon the data.
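A minimal sketch (assuming NumPy) of Huber's ψ and the implied weight function u(r) = ψ(r)/r; the value c = 1.345, giving roughly 95% efficiency at the normal, is a conventional choice and not taken from the slides:

```python
import numpy as np

def psi_huber(r, c=1.345):
    # -c for r <= -c, r for |r| < c, c for r >= c
    return np.clip(r, -c, c)

def u_huber(r, c=1.345):
    # weight u(r) = psi(r)/r: 1 in the middle, c/|r| in the tails
    a = np.abs(np.asarray(r, dtype=float))
    return np.minimum(1.0, c / np.maximum(a, 1e-12))
```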
DERIVATION OF THE INFLUENCE FUNCTION FOR M-ESTIMATES
Sketch
Let Tε = T(Fε), and so
0 = EFε[ψ(X; Tε)] = (1 − ε) EF[ψ(X; Tε)] + ε ψ(x; Tε).
Taking the derivative with respect to ε ⇒
0 = −EF[ψ(X; Tε)] + (1 − ε) ∂EF[ψ(X; Tε)]/∂ε + ψ(x; Tε) + ε ∂ψ(x; Tε)/∂ε.
Let ψ′(x; θ) = ∂ψ(x; θ)/∂θ. Using the chain rule ⇒
0 = −EF[ψ(X; Tε)] + (1 − ε) EF[ψ′(X; Tε)] ∂Tε/∂ε + ψ(x; Tε) + ε ψ′(x; Tε) ∂Tε/∂ε.
Letting ε → 0 and using qualitative robustness, i.e. T(Fε) → T(F), then gives
0 = EF[ψ′(X; T)] IF(x; T, F) + ψ(x; T(F)) ⇒ RESULT.
ASYMPTOTIC NORMALITY OF M-ESTIMATES
Sketch
Let θ = T(F). Using a Taylor series expansion of ψ(xi; θ̂) about θ gives
0 = Σ_{i=1}^n ψ(xi; θ̂) = Σ_{i=1}^n ψ(xi; θ) + (θ̂ − θ) Σ_{i=1}^n ψ′(xi; θ) + . . . ,
or
0 = √n { (1/n) Σ_{i=1}^n ψ(xi; θ) } + √n (θ̂ − θ) { (1/n) Σ_{i=1}^n ψ′(xi; θ) } + Op(1/√n).
By the CLT, √n { (1/n) Σ_{i=1}^n ψ(xi; θ) } →d Z ∼ Normal(0, EF[ψ(X; θ)²]).
By the WLLN, (1/n) Σ_{i=1}^n ψ′(xi; θ) →p EF[ψ′(X; θ)].
Thus, by Slutsky's theorem,
√n (θ̂ − θ) →d −Z / EF[ψ′(X; θ)] ∼ Normal(0, σ²),
where
σ² = EF[ψ(X; θ)²] / EF[ψ′(X; θ)]² = EF[ IF(X; T, F)² ]
• NOTE: Proving Fréchet differentiability is not necessary.
M-estimates of location
Adaptively weighted means
• Recall translation equivariance and symmetry implies
ψ(x; t) = ψ(x − t) and ψ(−r) = −ψ(r).
• Express ψ(r) = ru(r) and let wi = u(xi − θ), then
0 = Σ_{i=1}^n ψ(xi − θ) = Σ_{i=1}^n (xi − θ) u(xi − θ) = Σ_{i=1}^n wi (xi − θ)
⇒ θ̂ = Σ_{i=1}^n wi xi / Σ_{i=1}^n wi
The weights are determined by the data cloud.
[Illustration: a data cloud with θ̂ near its center; an outlying point is heavily
downweighted.]
SOME COMMON M-ESTIMATES OF LOCATION
Huber’s M-estimate







ψ(r) = −c for r ≤ −c,  r for |r| < c,  c for r ≥ c
• Given a bound on the GES, it has maximum efficiency at the normal model.
• MLE for the “least favorable distribution”, i.e. symmetric unimodal model with
smallest Fisher Information within a “neighborhood” of the normal.
LFD = Normal in the middle and double exponential in the tails.
Tukey’s Bi-weight M-estimate (or bi-square)
u(r) = { (1 − r²/c²)₊ }²,  where a₊ = max{0, a}
• Linear near zero
• Smooth (continuous second derivatives)
• Strongly redescending to 0
• NOT AN MLE.
CAUCHY MLE
ψ(r) = (r/c) / (1 + r²/c²)
NOT A STRONG REDESCENDER.
COMPUTATIONS
• IRLS:
Iterative Re-weighted Least Squares Algorithm.
θ̂_{k+1} = Σ_{i=1}^n wi,k xi / Σ_{i=1}^n wi,k
where wi,k = u(xi − θ̂_k) and θ̂_0 is any initial value.
• Re-weighted mean = One-step M-estimate
θ̂_1 = Σ_{i=1}^n wi,0 xi / Σ_{i=1}^n wi,0,
where θ̂_0 is a preliminary robust estimate of location, such as the median.
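A minimal IRLS sketch (assuming NumPy; the function name is illustrative) for an M-estimate of location, started from the median and usable with any weight function u, e.g. the Huber weights above:

```python
import numpy as np

def m_location(x, u, tol=1e-8, max_iter=100):
    x = np.asarray(x, dtype=float)
    theta = np.median(x)                     # robust starting value
    for _ in range(max_iter):
        w = u(x - theta)                     # weights from current residuals
        theta_new = np.sum(w * x) / np.sum(w)
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta
```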
PROOF OF CONVERGENCE
Sketch
θ̂_{k+1} − θ̂_k = Σ_{i=1}^n wi,k xi / Σ_{i=1}^n wi,k − θ̂_k
= Σ_{i=1}^n wi,k (xi − θ̂_k) / Σ_{i=1}^n wi,k
= Σ_{i=1}^n ψ(xi − θ̂_k) / Σ_{i=1}^n u(xi − θ̂_k)
Note: If θ̂_k > θ̂, then θ̂_k > θ̂_{k+1}; and if θ̂_k < θ̂, then θ̂_k < θ̂_{k+1}.
Decreasing objective function
Let ρ(r) = ρo(r²) and suppose ρo′(s) ≥ 0 and ρo″(s) ≤ 0. Then
Σ_{i=1}^n ρ(xi − θ̂_{k+1}) < Σ_{i=1}^n ρ(xi − θ̂_k).
• Examples:
– Mean ⇒ ρo(s) = s
– Median ⇒ ρo(s) = √s
– Cauchy MLE ⇒ ρo(s) = log(1 + s).
• Includes redescending M-estimates of location.
• Generalization of EM algorithm for mixture of normals.
PROOF OF MONOTONE CONVERGENCE
Sketch
Let ri,k = (xi − θ̂_k). By Taylor series with remainder term,
ρo(r²_{i,k+1}) = ρo(r²_{i,k}) + (r²_{i,k+1} − r²_{i,k}) ρo′(r²_{i,k}) + (1/2)(r²_{i,k+1} − r²_{i,k})² ρo″(r²_{i,∗}).
So, since ρo″ ≤ 0,
Σ_{i=1}^n ρo(r²_{i,k+1}) ≤ Σ_{i=1}^n ρo(r²_{i,k}) + Σ_{i=1}^n (r²_{i,k+1} − r²_{i,k}) ρo′(r²_{i,k}).
Now,
r u(r) = ψ(r) = ρ′(r) = 2r ρo′(r²), and so ρo′(r²) = u(r)/2.
Also,
r²_{i,k+1} − r²_{i,k} = (θ̂_{k+1} − θ̂_k)² − 2 (θ̂_{k+1} − θ̂_k)(xi − θ̂_k).
Thus,
Σ_{i=1}^n ρo(r²_{i,k+1}) ≤ Σ_{i=1}^n ρo(r²_{i,k}) − (1/2)(θ̂_{k+1} − θ̂_k)² Σ_{i=1}^n u(ri,k) < Σ_{i=1}^n ρo(r²_{i,k})
• Slow but Sure ⇒ Switch to Newton-Raphson after a few iterations.
PART 2
MORE ADVANCED CONCEPTS AND METHODS
SCALE EQUIVARIANCE
• M-estimates of location alone are not scale equivariant in general, i.e.
Xi → Xi + a ⇒ θ̂ → θ̂ + a (location equivariant)
Xi → b Xi ⇏ θ̂ → b θ̂ (not scale equivariant)
(Exceptions: the mean and median.)
• Thus, the adaptive weights do not depend on the spread of the data.
[Illustration: with a fixed ψ, points equally far from the center are equally
downweighted whether the data cloud is tight or widely spread.]
SCALE STATISTICS: sn
Xi → bXi + a ⇒ sn → | b |sn
• Sample standard deviation.
sn = √{ (1/(n − 1)) Σ_{i=1}^n (xi − x̄)² }
• MAD (or, more appropriately, MADAM).
Median Absolute Deviation About the Median
s∗n = Median{ | xi − Median(x) | }
sn = 1.4826 s∗n →p σ at Normal(µ, σ²)
Example
2, 4, 5, 10, 12, 14, 200
• Median = 10
• Absolute Deviations: 8, 6, 5, 0, 2, 4, 190
• MADAM = 5 ⇒ sn = 7.413
• (Standard deviation = 72.7661)
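A one-line implementation sketch (assuming NumPy), reproducing the example above:

```python
import numpy as np

def mad(x):
    # MADAM rescaled by 1.4826 to be consistent for sigma at the normal
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))

print(mad([2, 4, 5, 10, 12, 14, 200]))   # 1.4826 * 5 = 7.413
```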
M-estimates of location with auxiliary scale
θ̂ = arg min_θ Σ_{i=1}^n ρ( (xi − θ) / (c sn) )
or
θ̂ solves: Σ_{i=1}^n ψ( (xi − θ) / (c sn) ) = 0
• c: tuning constant
• sn: consistent for σ at the normal
Tuning an M-estimate
• Given a ψ-function, define a class of M-estimates via
ψc(r) = ψ(r/c)
• Tukey’s Bi-weight.
ψ(r) = r{(1 − r2)+}2 ⇒ ψc(r) =
• c → ∞ ⇒ ≈ Mean
• c → 0 ⇒ Locally very unstable
r
c







1−
r
c
 2





2

2
+
Normal distribution. Auxiliary scale.
MORE THAN ONE OUTLIER
• GES: measure of local robustness.
• Measures of global robustness?
Xn = {X1, . . . , Xn}: n "good" data points
Ym = {Y1, . . . , Ym}: m "bad" data points
Zn+m = Xn ∪ Ym: a contaminated sample with contamination fraction εm = m/(n + m)
Bias of a statistic: | T(Xn ∪ Ym) − T(Xn) |
MAX-BIAS and THE BREAKDOWN POINT
• Max-Bias under m-contamination:
B(m; T, Xn) = sup_{Ym} | T(Xn ∪ Ym) − T(Xn) |
• Finite sample contamination breakdown point: Donoho and Huber (1983)
ε∗c(T; Xn) = inf{ m/(n + m) : B(m; T, Xn) = ∞ }
• Other concepts of breakdown (e.g. replacement).
• The definition of BIAS can be modified so that BIAS → ∞ as T (Xn ∪ Ym) goes
to the boundary of the parameter space.
Example: If T represents scale, then choose
BIAS = | log{T (Xn ∪ Ym)} − log{T (Xn)} |.
EXAMPLES
• Mean: ε∗c = 1/(n + 1)
• Median: ε∗c = 1/2
• α-trimmed mean: ε∗c = α
• α-Winsorized mean: ε∗c = α
• M-estimate of location with a monotone and bounded ψ function:
ε∗c = 1/2
PROOF
(sketch of lower bound)
Let K = sup_r |ψ(r)|. Then
0 = Σ_{i=1}^{n+m} ψ(zi − Tn+m) = Σ_{i=1}^n ψ(xi − Tn+m) + Σ_{i=1}^m ψ(yi − Tn+m),
so
| Σ_{i=1}^n ψ(xi − Tn+m) | = | Σ_{i=1}^m ψ(yi − Tn+m) | ≤ mK.
Breakdown occurs ⇒ | Tn+m | → ∞, say Tn+m → −∞
⇒ | Σ_{i=1}^n ψ(xi − Tn+m) | → | Σ_{i=1}^n ψ(∞) | = nK.
Therefore m ≥ n, and so ε∗c ≥ 1/2.  []
Population Version of Breakdown Point
under contamination neighborhoods
• Model Distribution: F
• Contaminating Distribution: H
• ε-contaminated Distribution: Fε = (1 − ε) F + ε H
• Max-Bias under ε-contamination:
B(ε; T, F) = sup_H | T(Fε) − T(F) |
• Breakdown Point: Hampel (1968)
ε∗(T; F) = inf{ ε : B(ε; T, F) = ∞ }
• Examples
– Mean: T(F) = EF(X) ⇒ ε∗(T; F) = 0
– Median: T(F) = F⁻¹(1/2) ⇒ ε∗(T; F) = 1/2
ILLUSTRATION
[Plot of the max-bias curve B(ε; T, F): GES ≈ ∂B(ε; T, F)/∂ε at ε = 0;
ε∗(T; F) is its vertical asymptote.]
Heuristic Interpretation of Breakdown Point
subject to debate
• Proportion of bad data a statistic can tolerate before becoming arbitrary or
meaningless.
• If half the data is bad, then one cannot distinguish between the good data and
the bad data? ⇒ ε∗ ≤ 1/2?
• Discussion: Davies and Gather (2005), Annals of Statistics.
Example: Redescending M-estimates of location with fixed scale


T(x1, . . . , xn) = arg min_t Σ_{i=1}^n ρ( | xi − t | / c )
Breakdown point: Huber (1984). For bounded increasing ρ with sup ρ(r) = 1,
ε∗c(T; Xn) = (1 − A(Xn; c)/n) / (2 − A(Xn; c)/n)
where A(Xn; c) = min_t Σ_{i=1}^n ρ( (xi − t)/c ).
• The breakdown point depends on Xn and c.
• ε∗: 0 → 1/2 as c: 0 → ∞.
• For large c, T(x1, . . . , xn) ≈ mean!!
Explanation: Relationship between redescending M-estimates of location and
kernel density estimates
Chu, Glad, Godtliebsen and Marron (1998)
• Objective function: Σ_{i=1}^n ρ( (xi − t)/c )
• Kernel density estimate: f̂h(t) ∝ Σ_{i=1}^n κ( (xi − t)/h )
• Relationship: κ ∝ 1 − ρ, with c = window width h
• Example: Normal kernel κ(r) = (1/√(2π)) exp(−r²/2)
⇒ ρ(r) = 1 − exp(−r²/2), Welsch's M-estimate
• Example: Epanechnikov kernel ⇒ skipped mean
• “Outliers” less compact than “good” data ⇒ Breakdown will not occur.
• Not true for monotonic M-estimates.
PART 3
ROBUST REGRESSION
AND MULTIVARIATE STATISTICS
REGRESSION SETTING
• Data: (Yi, Xi) i = 1, . . . , n
– Yi ∈ ℝ: response
– Xi ∈ ℝᵖ: predictors
• Predict: Y by X′β
• Residual for a given β: ri(β) = yi − xi′β
• M-estimates of regression
– Generalization of the MLE for a symmetric error term
– Yi = Xi′β + εi, where the εi are i.i.d. symmetric.
M-ESTIMATES OF REGRESSION
• Objective function approach:
min_β Σ_{i=1}^n ρ(ri(β))
where ρ(r) = ρ(−r) and ρ ↑ for r ≥ 0.
• M-estimating equation approach:
Σ_{i=1}^n ψ(ri(β)) xi = 0,  e.g. ψ(r) = ρ′(r)
• Interpretation: adaptively weighted least squares.
◦ Express ψ(r) = r u(r) and wi = u(ri(β̂)); then
β̂ = [ Σ_{i=1}^n wi xi xi′ ]⁻¹ [ Σ_{i=1}^n wi yi xi ]
• Computations via IRLS algorithm.
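A minimal IRLS sketch (assuming NumPy; names are illustrative) of the adaptively weighted least squares interpretation; in practice the residuals would first be divided by a robust scale estimate:

```python
import numpy as np

def m_regression(X, y, u, tol=1e-8, max_iter=100):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares start
    for _ in range(max_iter):
        r = y - X @ beta
        w = u(r)                                    # weights from residuals
        XtW = X.T * w                               # scale columns by weights
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta
```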
INFLUENCE FUNCTIONS FOR
M-ESTIMATES OF REGRESSION
IF(y, x; T, F) ∝ ψ(r) x,  where r = y − x′T(F)
• Due to residual: ψ(r)
• Due to Design: x
• GES = ∞, i.e. unbounded influence.
• Outlier 1: highly influential for any ψ function.
• Outlier 2: highly influential for monotonic but not redescending ψ.
BOUNDED INFLUENCE REGRESSION
• GM-estimates (Generalized M-estimates).
◦ Mallows-type (Mallows, 1975):
Σ_{i=1}^n w(xi) ψ(ri(β)) xi = 0
Downweights outlying design points (leverage points), even if they are good
leverage points.
◦ General form (e.g. Maronna and Yohai, 1981):
Σ_{i=1}^n ψ(xi, ri(β)) xi = 0
• Breakdown points.
– M-estimates: ε∗ = 0
– GM-estimates: ε∗ ≤ 1/(p + 1)
LATE 70’s - EARLY 80’s
• Open problem: Is high breakdown point regression possible?
• Yes. Repeated Median. Siegel (1982).
– Not regression equivariant
• Regression equivariance: for a ∈ ℝ and A nonsingular,
(Yi, Xi) → (a Yi, A′Xi) ⇒ β̂ → a A⁻¹ β̂,
i.e. Ŷi → a Ŷi
• Open problem: Is high breakdown point equivariate regression possible?
• Yes. Least Median of Squares. Hampel (1984), Rousseeuw (1984).
Least Median of Squares (LMS)
Hampel (1984), Rousseeuw (1984)
min_β Median{ ri(β)² : i = 1, . . . , n }
• Alternatively: min_β MAD{ ri(β) }
• Breakdown point: ε∗ = 1/2
• Location version: SHORTH (Princeton Robustness Study), the
midpoint of the SHORTest Half.
• LMS: the mid-line of the shortest strip containing half of the data.
• Problem: not √n-consistent, but only n^{1/3}-consistent:
√n || β̂ − β || →p ∞, while n^{1/3} || β̂ − β || = Op(1)
• Not locally stable (e.g., in the illustrative example the data are pure noise).
S-ESTIMATES OF REGRESSION
Rousseeuw and Yohai (1984)
• For S(·), an estimate of scale (about zero):
min_β S(r1(β), . . . , rn(β))
• S = MAD ⇒ LMS
• S = sample standard deviation about 0 ⇒ least squares
• Bounded monotonic M-estimates of scale (about zero):
n
X
i=1
χ (| ri |/s) = 0
for χ ↑, and bounded above and below. Alternatively,
n
1 X
ρ (| ri |/s) = n i=1
for ρ ↑, and 0 ≤ ρ ≤ 1
• For LMS: ρ : 0 - 1 jump function
• Breakdown point:
and = 1/2
∗ = min(, 1 − )
S-estimates of Regression
• √n-consistent and asymptotically normal.
• Trade-off: a higher breakdown point ⇒ lower efficiency and higher residual
gross error sensitivity for normal errors.
• One Resolution: One-step M-estimate via IRLS. (Note: R, S-plus, Matlab.)
M-ESTIMATES OF REGRESSION WITH GENERAL SCALE
Martin, Yohai, and Zamar (1989)
min_β Σ_{i=1}^n ρ( | yi − xi′β | / (c sn) )
• Parameter of interest: β
• Scale statistic: sn (consistent for σ at normal errors)
• Tuning constant: c
• Monotonic bounded ρ ⇒ redescending M-estimate
• High Breakdown Point Examples
– LMS and S-estimates
– MM-estimates, Yohai (1987).
– CM-estimates, Mendes and Tyler (1994).
[Plot: sn vs. Σ_{i=1}^n ρ( | yi − xi′β | / (c sn) ).]
• Each β gives one curve
• S-estimates minimize horizontally
• M-estimates minimize vertically
MM-ESTIMATES OF REGRESSION
Default robust regression estimate in S-plus (also SAS?)
• Given a preliminary estimate of regression with breakdown point 1/2.
Usually an S-estimate of regression.
• Compute a monotone M-estimate of scale about zero for the residuals.
sn: Usually from the S-estimate.
• Compute an M-estimate of regression using the scale statistics sn and with any
desired tuning constant c.
• Tune to obtain reasonable ARE and residual GES.
• Breakdown point: ε∗ = 1/2
TUNING
• High breakdown point S-estimates are badly tuned M-estimates.
• Tuning an S-estimate affects its breakdown point.
• MM-estimates can be tuned without affecting the breakdown point.
COMPUTATIONAL ISSUES
• All known high breakdown point regression estimates are computationally intensive. Non-convex optimization problem.
• Approximate or stochastic algorithms.
• Random Elemental Subset Selection. Rousseeuw (1984).


– Consider exact fits of lines to p of the data points ⇒ (n choose p) such lines.
– Optimize the criterion over such lines.
– For large n and p, randomly sample from such lines so that there is a good
chance, say a 95% chance, that at least one of the elemental subsets contains
only good data even if half the data is contaminated. (A sketch follows.)
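A sketch of the elemental-subset search for LMS (assuming NumPy; the number of subsets and names are illustrative):

```python
import numpy as np

def lms_elemental(X, y, n_subsets=500, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best, best_crit = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])   # exact fit to p points
        except np.linalg.LinAlgError:
            continue                                 # degenerate subset
        crit = np.median((y - X @ beta) ** 2)        # LMS criterion
        if crit < best_crit:
            best, best_crit = beta, crit
    return best
```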
MULTIVARIATE DATA
Robust Multivariate Location and Covariance Estimates
Parallel developments
• Multivariate M-estimates. Maronna (1976). Huber (1977).
◦ Adaptively weighted mean and covariances.
◦ IRLS algorithms.
◦ Breakdown point: ε∗ ≤ 1/(d + 1), where d is the dimension of
the data.
• Minimum Volume Ellipsoid Estimates. Rousseeuw (1985).
◦ MVE is a multivariate version of LMS.
• Multivariate S-estimates. Davies (1987), Lopuhaä (1989).
• Multivariate CM-estimates. Kent and Tyler (1996).
• Multivariate MM-estimates. Tatsuoka and Tyler (2000). Tyler (2002).
MULTIVARIATE DATA
• d-dimensional data set: X1, . . . Xn
• Classical summary statistics:
– Sample Mean Vector: X̄ = (1/n) Σ_{i=1}^n Xi
– Sample Variance-Covariance Matrix:
Sn = (1/n) Σ_{i=1}^n (Xi − X̄)(Xi − X̄)′ = {sij}
where sii = sample variance, sij = sample covariance.
• Maximum likelihood estimates under normality
[Scatter plot: Hertzsprung–Russell galaxy data (Rousseeuw), with the sample
mean and covariance ellipse.]
Visualization ⇒ Ellipse containing half the data.
(X − X̄)′ Sn⁻¹ (X − X̄) ≤ c
Mahalanobis distances: di² = (Xi − X̄)′ Sn⁻¹ (Xi − X̄)
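A small sketch (assuming NumPy) of the squared Mahalanobis distances; a robust version simply substitutes a robust location/scatter pair (e.g. MVE or MCD) for (X̄, Sn):

```python
import numpy as np

def mahalanobis_sq(X, center=None, cov=None):
    X = np.asarray(X, dtype=float)
    c = X.mean(axis=0) if center is None else center
    S = np.cov(X, rowvar=False) if cov is None else cov
    D = X - c
    # d_i^2 = (X_i - c)' S^{-1} (X_i - c), computed without forming S^{-1}
    return np.einsum('ij,ji->i', D, np.linalg.solve(S, D.T))
```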
[Plot: Mahalanobis distances di vs. observation index.]
Mahalanobis angles: θi,j = cos⁻¹{ (Xi − X̄)′ Sn⁻¹ (Xj − X̄) / (di dj) }
[Plot: the first two principal components, PC1 vs. PC2.]
Principal Components: Do not always reveal outliers.
ENGINEERING APPLICATIONS
• Noisy signal arrays, radar clutter.
Test for signal, i.e. detector, based on Mahalanobis angle between signal and observed data.
• Hyperspectral Data.
Adaptive cosine estimator = Mahalanobis angle.
• BIG DATA
• Cannot clean the data by inspecting individual observations; automatic methods
are needed.
ROBUST MULTIVARIATE ESTIMATES
Location and Scatter (Pseudo-Covariance)
• Trimmed versions: complicated
• Weighted mean and covariance matrices
µ̂ = Σ_{i=1}^n wi Xi / Σ_{i=1}^n wi
Σ̂ = (1/n) Σ_{i=1}^n wi (Xi − µ̂)(Xi − µ̂)′
where wi = u(di,0) and di,0² = (Xi − µ̂0)′ Σ̂0⁻¹ (Xi − µ̂0),
with µ̂0 and Σ̂0 being initial estimates, e.g.
→ the sample mean vector and covariance matrix, or
→ high breakdown point estimates.
MULTIVARIATE M-ESTIMATES
• MLE type for elliptically symmetric distributions
• Adaptively weighted mean and covariances
µ̂ = Σ_{i=1}^n wi Xi / Σ_{i=1}^n wi
Σ̂ = (1/n) Σ_{i=1}^n wi (Xi − µ̂)(Xi − µ̂)′
• wi = u(di) and di² = (Xi − µ̂)′ Σ̂⁻¹ (Xi − µ̂)
• Implicit equations.
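A minimal fixed-point sketch (assuming NumPy; names are illustrative) iterating the two implicit equations; for example, u(d) = (ν + p)/(ν + d²) gives the multivariate t MLE weights, with ν = 1 the Cauchy MLE:

```python
import numpy as np

def m_location_scatter(X, u, tol=1e-8, max_iter=200):
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)   # classical start
    for _ in range(max_iter):
        D = X - mu
        d2 = np.einsum('ij,ji->i', D, np.linalg.solve(Sigma, D.T))
        w = u(np.sqrt(d2))                                # adaptive weights
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        D = X - mu_new
        Sigma_new = (w[:, None] * D).T @ D / n
        if np.abs(mu_new - mu).max() < tol:
            return mu_new, Sigma_new
        mu, Sigma = mu_new, Sigma_new
    return mu, Sigma
```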
d
PROPERTIES OF M-ESTIMATES
• Root-n consistent and asymptotically normal.
• Can be tuned to have desirable properties:
– high efficiency over a broad class of distributions
– smooth and bounded influence function
• Computationally simple (IRLS)
• Breakdown point: ε∗ ≤ 1/(d + 1)
HIGH BREAKDOWN POINT ESTIMATES
• MVE: Minimum Volume Ellipsoid ⇒ ε∗ = 0.5
Center and scatter matrix of the smallest ellipsoid covering at least
half of the data.
• MCD, S-estimates, MM-estimates, CM-estimates ⇒ ε∗ = 0.5
• Reweighted versions based on high breakdown start.
• Computationally intensive:
Approximate/probabilistic algorithms.
Elemental Subset Approach.
[Scatter plot: Hertzsprung–Russell galaxy data (Rousseeuw) with the sample
mean and covariance ellipse.]
[Same plot with the MVE ellipse added.]
[Same plot with the Cauchy MLE ellipse added.]
ELLIPTICAL DISTRIBUTIONS
and
AFFINE EQUIVARIANCE
SPHERICALLY SYMMETRIC DISTRIBUTIONS
Z = R U, where R ⊥ U, R > 0, and U ∼ Uniform on the unit sphere.
Density: f(z) = g(z′z)
[Scatter plot: simulated spherically symmetric data in the (x, y)-plane.]
ELLIPTICALLY SYMMETRIC DISTRIBUTIONS
X = AZ + µ ∼ E(µ, Γ; g), where Γ = AA′
⇒ f(x) = |Γ|^{−1/2} g( (x − µ)′ Γ⁻¹ (x − µ) )
[Scatter plot: simulated elliptically symmetric data.]
AFFINE EQUIVARIANCE
• Parameters of elliptical distributions:
X ∼ E(µ, Γ; g) ⇒ X∗ = BX + b ∼ E(µ∗, Γ∗; g)
where µ∗ = Bµ + b and Γ∗ = BΓB′
• Sample version:
Xi → Xi∗ = BXi + b, i = 1, . . . , n
⇒ µ̂ → µ̂∗ = B µ̂ + b and V̂ → V̂∗ = B V̂ B′
• Examples
– Mean vector and Covariance matrix
– M-estimates
– MVE
ELLIPTICALLY SYMMETRIC DISTRIBUTIONS
f(x; µ, Γ) = |Γ|^{−1/2} g( (x − µ)′ Γ⁻¹ (x − µ) ).
• If second moments exist, shape matrix Γ ∝ covariance matrix.
• For any affine equivariant scatter functional: Σ(F ) ∝ Γ
• That is, any affine equivariant scatter statistic estimates a matrix
proportional to the shape matrix Γ.
NON-ELLIPTICAL DISTRIBUTIONS
• Different scatter matrices estimate different population quantities.
• Analogy: for non-symmetric distributions,
population mean ≠ population median.
OTHER TOPICS
Projection-based multivariate approaches
• Tukey’s data depth. (or half-space depth) Tukey (1974).
• Stahel-Donoho’s Estimate. Stahel (1981). Donoho 1982.
• Projection-estimates. Maronna, Stahel and Yohai (1994). Tyler (1996).
• Computationally Intensive.
TUKEY’S DEPTH
PART 5
ROBUSTNESS AND COMPUTER VISION
• Independent development of robust statistics within the computer vision/image understanding community.
• Hough transform: ρ = x cos(θ) + y sin(θ)
• y = a + bx, where a = ρ/sin(θ) and b = −cos(θ)/sin(θ)
• That is, an intersection in feature space corresponds to the line connecting two
data points.
• Hough: from estimation in data space to clustering in feature space;
find the centers of the clusters.
• Terminology:
– Feature Space = Parameter Space
– Accumulator = Elemental Fit
• Computation: RANSAC (Random sample consensus)
– Randomly choose subsets from the accumulator (random elemental fits).
– Check how many data points lie within a fixed neighborhood of each
elemental fit, and keep the fit with the largest consensus. (A sketch follows.)
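A minimal RANSAC sketch for a line in 2-D (assuming NumPy arrays x and y; the trial count and names are illustrative):

```python
import numpy as np

def ransac_line(x, y, R, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    best, best_count = None, -1
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                           # vertical elemental fit; skip
        b = (y[j] - y[i]) / (x[j] - x[i])      # exact fit through two points
        a = y[i] - b * x[i]
        count = np.sum(np.abs(y - (a + b * x)) <= R)   # consensus set size
        if count > best_count:
            best, best_count = (a, b), count
    return best                                # (intercept, slope)
```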
Alternative Formulation of Hough Transform/RANSAC
Σ ρ(ri) should be small, where
ρ(r) = 1 for | r | > R,  0 for | r | ≤ R.
That is, a redescending M-estimate of regression with known scale.
Note: Relationship to LMS.
Vision: Need e.g. 90% breakdown point, i.e. tolerate 90% bad data
Definition of Residuals?
Hough transform approach does not distinguish between:
• Regression line for regressing Y on X.
• Regression line for regressing X on Y.
• Orthogonal regression line.
(Note: Small stochastic errors ⇒ little difference.)
GENERAL PARADIGM
• Line in 2-D: Exact fit to all pairs
• Quadratic in 2-D: Exact fit to all triples
• Conic Sections: Ellipse Fitting
(x − µ)′ A (x − µ) = 1
◦ Linearizing: let x∗′ = (x1, x2, x1², x2², x1x2, 1)
◦ Ellipse: a′ x∗ = 0
◦ Exact fit to all subsets of size 5. (A sketch follows.)
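A sketch of the elemental conic fit (assuming NumPy): the conic through 5 points is the null vector of the 5 × 6 design matrix built from x∗, obtained here from the SVD:

```python
import numpy as np

def conic_through_5(P):
    P = np.asarray(P, dtype=float)            # 5 x 2 array of points
    x1, x2 = P[:, 0], P[:, 1]
    M = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2, np.ones(5)])
    a = np.linalg.svd(M)[2][-1]               # right singular vector of the null space
    return a                                  # conic coefficients, up to scale
```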
Hyperboloid Fitting in 4-D
• Epipolar Geometry: Exact fit to
◦ 5 pairs: Calibrated cameras (location, rotation, scale).
◦ 8 pairs: Uncalibrated cameras (focal point, image plane).
◦ Hyperboloid surface in 4-D.
• Applications: 2D → 3D, Motion.
• OUTLIERS = MISMATCHES