A SHORT COURSE ON ROBUST STATISTICS

David E. Tyler
Rutgers, The State University of New Jersey
Web-site: www.rci.rutgers.edu/~dtyler/ShortCourse.pdf

References
• Huber, P.J. (1981). Robust Statistics. Wiley, New York.
• Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
• Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006). Robust Statistics: Theory and Methods. Wiley, New York.

PART 1: CONCEPTS AND BASIC METHODS

MOTIVATION
• Data set: $X_1, X_2, \ldots, X_n$
• Parametric model: $F(x_1, \ldots, x_n \mid \theta)$, with $\theta$ an unknown parameter and $F$ a known function.
• e.g. $X_1, X_2, \ldots, X_n$ i.i.d. $Normal(\mu, \sigma^2)$.

Q: Is it realistic to believe we don't know $(\mu, \sigma^2)$, but we know, e.g., the shape of the tails of the distribution?
A: The model is assumed to be approximately true, e.g. symmetric and unimodal (past experience).
Q: Are statistical methods which are good under the model reasonably good if the model is only approximately true?

ROBUST STATISTICS formally addresses this issue.

CLASSIC EXAMPLE: MEAN vs. MEDIAN
• Symmetric distributions: $\mu$ = population mean = population median.
• Sample mean: $\bar{X} \approx Normal(\mu, \sigma^2/n)$.
• Sample median: $Median \approx Normal\left(\mu, \frac{1}{n}\,\frac{1}{4 f(\mu)^2}\right)$.
• At the normal: $Median \approx Normal\left(\mu, \frac{\pi}{2}\,\frac{\sigma^2}{n}\right)$.
• Asymptotic relative efficiency of the median to the mean:
  $ARE(Median, \bar{X}) = \dfrac{avar(\bar{X})}{avar(Median)} = \dfrac{2}{\pi} = 0.6366$

CAUCHY DISTRIBUTION
$X \sim Cauchy(\mu, \sigma^2)$:
  $f(x; \mu, \sigma) = \dfrac{1}{\pi\sigma}\left[1 + \left(\dfrac{x-\mu}{\sigma}\right)^2\right]^{-1}$
• Mean: $\bar{X} \sim Cauchy(\mu, \sigma^2)$.
• $Median \approx Normal\left(\mu, \dfrac{\pi^2\sigma^2}{4n}\right)$.
• $ARE(Median, \bar{X}) = \infty$, or $ARE(\bar{X}, Median) = 0$.
• For the t-distribution on $\nu$ degrees of freedom:
  $ARE(Median, \bar{X}) = \dfrac{4}{(\nu-2)\pi}\left(\dfrac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)}\right)^2$

  ν               | ≤ 2 |   3   |   4   |   5
  ARE(Median, X̄)  |  ∞  | 1.621 | 1.125 | 0.960
  ARE(X̄, Median)  |  0  | 0.617 | 0.888 | 1.041

MIXTURE OF NORMALS
Theory of errors: the Central Limit Theorem gives plausibility to normality.
$X \sim \begin{cases} Normal(\mu, \sigma^2) & \text{with probability } 1-\epsilon \\ Normal(\mu, (3\sigma)^2) & \text{with probability } \epsilon \end{cases}$
i.e. not all measurements are equally precise. For $\epsilon = 0.10$:
$X \sim (1-\epsilon)\,Normal(\mu, \sigma^2) + \epsilon\,Normal(\mu, (3\sigma)^2)$
• Classic paper: Tukey (1960), A survey of sampling from contaminated distributions.
  ◦ For $\epsilon > 0.10$ ⇒ $ARE(Median, \bar{X}) > 1$.
  ◦ The mean absolute deviation is more efficient than the sample standard deviation for $\epsilon > 0.01$.

PRINCETON ROBUSTNESS STUDIES
Andrews et al. (1972). Other estimates of location:
• α-trimmed mean: trim a proportion α from both ends of the data set and then take the mean. (Throwing away data?)
• α-Winsorized mean: replace a proportion α from both ends of the data set by the next closest observation and then take the mean.
• Example: 2, 4, 5, 10, 200
  Mean = 44.2
  Median = 5
  20% trimmed mean = (4 + 5 + 10) / 3 = 6.33
  20% Winsorized mean = (4 + 4 + 5 + 10 + 10) / 5 = 6.6

Measuring the robustness of a statistic
• Relative efficiency over a range of distributional models.
  – There exist estimates of location which are asymptotically most efficient for the center of any symmetric distribution (adaptive estimation, semi-parametrics).
  – Robust?
• Influence function over a range of distributional models.
• Maximum bias function and the breakdown point.

Measuring the effect of an outlier (not modeled)
• Good data set: $x_1, \ldots, x_{n-1}$; statistic $T_{n-1} = T(x_1, \ldots, x_{n-1})$.
• Contaminated data set: $x_1, \ldots, x_{n-1}, x$; contaminated value $T_n = T(x_1, \ldots, x_{n-1}, x)$.

THE SENSITIVITY CURVE (Tukey, 1970)
$SC_n(x) = n\,(T_n - T_{n-1}) \iff T_n = T_{n-1} + \frac{1}{n}\,SC_n(x)$
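As a numerical illustration, here is a minimal sketch (Python with NumPy; the helper name `sensitivity_curve` is mine, not from the course) that evaluates $SC_n(x)$ for the mean and the median on a fixed sample. The mean's sensitivity curve grows without bound in $x$, while the median's remains bounded.

```python
import numpy as np

def sensitivity_curve(stat, good_data, x):
    """SC_n(x) = n * ( T(x_1,...,x_{n-1},x) - T(x_1,...,x_{n-1}) )."""
    n = len(good_data) + 1
    return n * (stat(np.append(good_data, x)) - stat(good_data))

rng = np.random.default_rng(0)
good = rng.normal(size=49)                      # the "good" data x_1,...,x_{n-1}
for x in (0.0, 2.0, 10.0, 1000.0):
    print(x, sensitivity_curve(np.mean, good, x),    # unbounded in x
             sensitivity_curve(np.median, good, x))  # stays bounded
```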
THE INFLUENCE FUNCTION (Hampel, 1969, 1974)
Population version of the sensitivity curve.
• Statistic $T_n = T(F_n)$ estimates $T(F)$.
• Consider $F$ and the $\epsilon$-contaminated distribution $F_\epsilon = (1-\epsilon)F + \epsilon\,\delta_x$, where $\delta_x$ is the point-mass distribution at $x$.
• Compare functional values: $T(F)$ vs. $T(F_\epsilon)$.
• Given qualitative robustness (continuity): $T(F_\epsilon) \to T(F)$ as $\epsilon \to 0$. (E.g. the mode is not qualitatively robust.)
• Influence function (infinitesimal perturbation: Gâteaux derivative):
  $IF(x; T, F) = \lim_{\epsilon \to 0} \dfrac{T(F_\epsilon) - T(F)}{\epsilon} = \left.\dfrac{\partial}{\partial\epsilon}\,T(F_\epsilon)\right|_{\epsilon=0}$

EXAMPLES
• Mean: $T(F) = E_F[X]$.
  $T(F_\epsilon) = E_{F_\epsilon}[X] = (1-\epsilon)E_F[X] + \epsilon\,E_{\delta_x}[X] = (1-\epsilon)T(F) + \epsilon x$
  $IF(x; T, F) = \lim_{\epsilon\to 0} \dfrac{(1-\epsilon)T(F) + \epsilon x - T(F)}{\epsilon} = x - T(F)$
• Median: $T(F) = F^{-1}(1/2)$.
  $IF(x; T, F) = \{2 f(T(F))\}^{-1}\,\mathrm{sign}(x - T(F))$

Plots of influence functions
Gives insight into the behavior of a statistic: $T_n \approx T_{n-1} + \frac{1}{n}\,IF(x; T, F)$.
[Plots: influence functions of the mean, α-trimmed mean, median, and α-Winsorized mean (the last somewhat unexpected?).]

Desirable robustness properties for the influence function
• SMALL
  – Gross error sensitivity: $GES(T; F) = \sup_x |IF(x; T, F)|$.
    $GES < \infty$ ⇒ B-robust (bias-robust).
  – Asymptotic variance: $AV(T; F) = E_F[IF(X; T, F)^2]$.
    Note: $E_F[IF(X; T, F)] = 0$.
• Under general conditions, e.g. Fréchet differentiability,
  $\sqrt{n}\,(T(X_1, \ldots, X_n) - T(F)) \to Normal(0, AV(T; F))$.
• Trade-off at the normal model: smaller AV ↔ larger GES.
• SMOOTH (local shift sensitivity): protects e.g. against rounding error.
• REDESCENDING to 0.

REDESCENDING INFLUENCE FUNCTION
• Example: data set of male heights in cm:
  180, 175, 192, . . ., 185, 2020, 190, . . .
• Redescender = automatic outlier detector.

CLASSES OF ESTIMATES

L-statistics: linear combinations of order statistics
• Let $X_{(1)} \le \ldots \le X_{(n)}$ denote the order statistics.
  $T(X_1, \ldots, X_n) = \sum_{i=1}^n a_{i,n} X_{(i)}$, where the $a_{i,n}$ are constants.
• Examples:
  ◦ Mean: $a_{i,n} = 1/n$.
  ◦ Median: for $n$ odd, $a_{i,n} = 1$ for $i = \frac{n+1}{2}$ and $0$ otherwise; for $n$ even, $a_{i,n} = \frac{1}{2}$ for $i = \frac{n}{2}, \frac{n}{2}+1$ and $0$ otherwise.
  ◦ α-trimmed mean.
  ◦ α-Winsorized mean.
• A general form for the influence function exists.
• Can obtain any desirable monotonic shape, but not redescending.
• Do not readily generalize to other settings.

M-ESTIMATES
Huber (1964, 1967): maximum likelihood type estimates under non-standard conditions.
• One-parameter case: $X_1, \ldots, X_n$ i.i.d. $f(x; \theta)$, $\theta \in \Theta$.
• Maximum likelihood estimates:
  – Likelihood function: $L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i; \theta)$.
  – Minimize the negative log-likelihood: $\min_\theta \sum_{i=1}^n \rho(x_i; \theta)$, where $\rho(x_i; \theta) = -\log f(x_i; \theta)$.
  – Or solve the likelihood equations: $\sum_{i=1}^n \psi(x_i; \theta) = 0$, where $\psi(x_i; \theta) = \dfrac{\partial\rho(x_i; \theta)}{\partial\theta}$.

DEFINITIONS OF M-ESTIMATES
• Objective function approach: $\hat{\theta} = \arg\min_\theta \sum_{i=1}^n \rho(x_i; \theta)$.
• M-estimating equation approach: $\sum_{i=1}^n \psi(x_i; \hat{\theta}) = 0$.
  – Note: unique solution when $\psi(x; \theta)$ is strictly monotone in $\theta$.
• Basic examples:
  – Mean. MLE for the normal: $f(x) = (2\pi)^{-1/2} e^{-\frac{1}{2}(x-\theta)^2}$ for $x \in \mathbb{R}$;
    $\rho(x; \theta) = (x - \theta)^2$ or $\psi(x; \theta) = x - \theta$.
  – Median. MLE for the double exponential: $f(x) = \frac{1}{2} e^{-|x-\theta|}$ for $x \in \mathbb{R}$;
    $\rho(x; \theta) = |x - \theta|$ or $\psi(x; \theta) = \mathrm{sign}(x - \theta)$.
• ρ and ψ need not be related to any density, or to each other.
• The estimates can be evaluated under various distributions.

M-ESTIMATES OF LOCATION
A symmetric and translation equivariant M-estimate.
• Translation equivariance: $X_i \to X_i + a \Rightarrow T_n \to T_n + a$ gives $\rho(x; t) = \rho(x - t)$ and $\psi(x; t) = \psi(x - t)$.
• Symmetry: $X_i \to -X_i \Rightarrow T_n \to -T_n$ gives $\rho(-r) = \rho(r)$ or $\psi(-r) = -\psi(r)$.
• Alternative derivation: generalization of the MLE for the center of symmetry of a given family of symmetric distributions, $f(x; \theta) = g(|x - \theta|)$.
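Since a strictly monotone ψ gives a unique solution, the location M-estimating equation can be solved by simple root-finding. A minimal sketch (Python/SciPy; function names are mine) using the monotone Huber ψ, which is taken up formally on the next slides:

```python
import numpy as np
from scipy.optimize import brentq

def psi_huber(r, c=1.345):
    """Monotone psi: linear in the middle, constant in the tails."""
    return np.clip(r, -c, c)

def m_location(x, c=1.345):
    """Solve sum_i psi(x_i - theta) = 0; monotone psi => unique root."""
    f = lambda theta: psi_huber(x - theta, c).sum()
    return brentq(f, x.min(), x.max())   # the root is bracketed by the data range

x = np.array([2., 4., 5., 10., 200.])
print(m_location(x))   # stays near the bulk of the data; the mean is 44.2
```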
INFLUENCE FUNCTION OF M-ESTIMATES
M-FUNCTIONAL: $T(F)$ is the solution to $E_F[\psi(X; T(F))] = 0$.
$IF(x; T, F) = c(T, F)\,\psi(x; T(F))$, where $c(T, F) = -1\big/E_F\!\left[\dfrac{\partial\psi(X; \theta)}{\partial\theta}\right]$ evaluated at $\theta = T(F)$.
Note: $E_F[IF(X; T, F)] = 0$.
One can decide what shape is desired for the influence function and then construct an appropriate M-estimate: $\psi(r) = r$ ⇒ mean; $\psi(r) = \mathrm{sign}(r)$ ⇒ median.

EXAMPLE
Choose
$\psi(r) = \begin{cases} c & r \ge c \\ r & |r| < c \\ -c & r \le -c \end{cases}$
where $c$ is a tuning constant. This is Huber's M-estimate: an adaptively trimmed mean, i.e. the proportion trimmed depends upon the data.

DERIVATION OF THE INFLUENCE FUNCTION FOR M-ESTIMATES (sketch)
Let $T_\epsilon = T(F_\epsilon)$, and so
$0 = E_{F_\epsilon}[\psi(X; T_\epsilon)] = (1-\epsilon)E_F[\psi(X; T_\epsilon)] + \epsilon\,\psi(x; T_\epsilon)$.
Taking the derivative with respect to $\epsilon$ ⇒
$0 = -E_F[\psi(X; T_\epsilon)] + (1-\epsilon)\dfrac{\partial}{\partial\epsilon}E_F[\psi(X; T_\epsilon)] + \psi(x; T_\epsilon) + \epsilon\,\dfrac{\partial}{\partial\epsilon}\psi(x; T_\epsilon)$.
Let $\psi'(x; \theta) = \partial\psi(x; \theta)/\partial\theta$. Using the chain rule ⇒
$0 = -E_F[\psi(X; T_\epsilon)] + (1-\epsilon)E_F[\psi'(X; T_\epsilon)]\dfrac{\partial T_\epsilon}{\partial\epsilon} + \psi(x; T_\epsilon) + \epsilon\,\psi'(x; T_\epsilon)\dfrac{\partial T_\epsilon}{\partial\epsilon}$.
Letting $\epsilon \to 0$ and using qualitative robustness, i.e. $T(F_\epsilon) \to T(F)$ (so the first term vanishes by the definition of the M-functional), then gives
$0 = E_F[\psi'(X; T)]\,IF(x; T, F) + \psi(x; T(F))$ ⇒ RESULT.

ASYMPTOTIC NORMALITY OF M-ESTIMATES (sketch)
Let $\theta = T(F)$. A Taylor series expansion of $\psi(x_i; \hat{\theta})$ about $\theta$ gives
$0 = \sum_{i=1}^n \psi(x_i; \hat{\theta}) = \sum_{i=1}^n \psi(x_i; \theta) + (\hat{\theta} - \theta)\sum_{i=1}^n \psi'(x_i; \theta) + \ldots$, or
$0 = \sqrt{n}\left\{\tfrac{1}{n}\sum_{i=1}^n \psi(x_i; \theta)\right\} + \sqrt{n}(\hat{\theta} - \theta)\left\{\tfrac{1}{n}\sum_{i=1}^n \psi'(x_i; \theta)\right\} + O_p(1/\sqrt{n})$.
By the CLT, $\sqrt{n}\{\tfrac{1}{n}\sum_{i=1}^n \psi(x_i; \theta)\} \to_d Z \sim Normal(0, E_F[\psi(X; \theta)^2])$.
By the WLLN, $\tfrac{1}{n}\sum_{i=1}^n \psi'(x_i; \theta) \to_p E_F[\psi'(X; \theta)]$.
Thus, by Slutsky's theorem,
$\sqrt{n}(\hat{\theta} - \theta) \to_d -Z/E_F[\psi'(X; \theta)] \sim Normal(0, \sigma^2)$,
where $\sigma^2 = E_F[\psi(X; \theta)^2]/E_F[\psi'(X; \theta)]^2 = E_F[IF(X; T, F)^2]$.
• NOTE: Proving Fréchet differentiability is not necessary.

M-estimates of location: adaptively weighted means
• Recall that translation equivariance and symmetry imply $\psi(x; t) = \psi(x - t)$ and $\psi(-r) = -\psi(r)$.
• Express $\psi(r) = r\,u(r)$ and let $w_i = u(x_i - \theta)$. Then
  $0 = \sum_{i=1}^n \psi(x_i - \theta) = \sum_{i=1}^n (x_i - \theta)\,u(x_i - \theta) = \sum_{i=1}^n w_i\,(x_i - \theta)$
  ⇒ $\hat{\theta} = \dfrac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}$
The weights are determined by the data cloud.
[Figure: a one-dimensional data cloud with $\hat{\theta}$; a far-away point is heavily downweighted.]

SOME COMMON M-ESTIMATES OF LOCATION
Huber's M-estimate:
$\psi(r) = \begin{cases} c & r \ge c \\ r & |r| < c \\ -c & r \le -c \end{cases}$
• Given a bound on the GES, it has maximum efficiency at the normal model.
• MLE for the "least favorable distribution", i.e. the symmetric unimodal model with smallest Fisher information within a "neighborhood" of the normal. LFD = normal in the middle and double exponential in the tails.

Tukey's bi-weight M-estimate (or bi-square):
$u(r) = \left\{\left(1 - \dfrac{r^2}{c^2}\right)_+\right\}^2$, where $a_+ = \max\{0, a\}$
• Linear near zero.
• Smooth (continuous second derivatives).
• Strongly redescending to 0.
• NOT an MLE.

Cauchy MLE:
$\psi(r) = \dfrac{r/c}{1 + r^2/c^2}$
NOT A STRONG REDESCENDER.

COMPUTATIONS
• IRLS: iteratively re-weighted least squares algorithm.
  $\hat{\theta}_{k+1} = \dfrac{\sum_{i=1}^n w_{i,k}\,x_i}{\sum_{i=1}^n w_{i,k}}$, where $w_{i,k} = u(x_i - \hat{\theta}_k)$ and $\hat{\theta}_0$ is any initial value.
• Re-weighted mean = one-step M-estimate:
  $\hat{\theta}_1 = \dfrac{\sum_{i=1}^n w_{i,0}\,x_i}{\sum_{i=1}^n w_{i,0}}$, where $\hat{\theta}_0$ is a preliminary robust estimate of location, such as the median.
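A sketch of the IRLS iteration just described (Python; names mine), using the Tukey bi-weight $u(r)$ from above with residuals standardized by the MADAM — anticipating the auxiliary-scale discussion in Part 2 — and the median as the robust starting value:

```python
import numpy as np

def u_biweight(r, c=4.685):
    """Tukey bi-weight: u(r) = ((1 - (r/c)^2)_+)^2; c = 4.685 is a common tuning."""
    w = 1.0 - (r / c) ** 2
    return np.where(w > 0.0, w ** 2, 0.0)

def irls_location(x, c=4.685, tol=1e-9, max_iter=200):
    s = 1.4826 * np.median(np.abs(x - np.median(x)))  # MADAM scale (see Part 2)
    theta = np.median(x)                              # robust starting value
    for _ in range(max_iter):
        w = u_biweight((x - theta) / s, c)
        new = np.sum(w * x) / np.sum(w)               # re-weighted mean
        if abs(new - theta) < tol:
            break
        theta = new
    return theta

x = np.array([2., 4., 5., 10., 12., 14., 200.])
print(irls_location(x))   # near the bulk of the data; the sample mean is 35.3
```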
PROOF OF CONVERGENCE (sketch)
$\hat{\theta}_{k+1} - \hat{\theta}_k = \dfrac{\sum_{i=1}^n w_{i,k}\,x_i}{\sum_{i=1}^n w_{i,k}} - \hat{\theta}_k = \dfrac{\sum_{i=1}^n w_{i,k}(x_i - \hat{\theta}_k)}{\sum_{i=1}^n w_{i,k}} = \dfrac{\sum_{i=1}^n \psi(x_i - \hat{\theta}_k)}{\sum_{i=1}^n u(x_i - \hat{\theta}_k)}$
Note: if $\hat{\theta}_k > \hat{\theta}$ then $\hat{\theta}_k > \hat{\theta}_{k+1}$, and if $\hat{\theta}_k < \hat{\theta}$ then $\hat{\theta}_k < \hat{\theta}_{k+1}$.

Decreasing objective function
Let $\rho(r) = \rho_0(r^2)$ and suppose $\rho_0'(s) \ge 0$ and $\rho_0''(s) \le 0$. Then
$\sum_{i=1}^n \rho(x_i - \hat{\theta}_{k+1}) < \sum_{i=1}^n \rho(x_i - \hat{\theta}_k)$.
• Examples:
  – Mean ⇒ $\rho_0(s) = s$
  – Median ⇒ $\rho_0(s) = \sqrt{s}$
  – Cauchy MLE ⇒ $\rho_0(s) = \log(1 + s)$
• Includes redescending M-estimates of location.
• Generalization of the EM algorithm for mixtures of normals.

PROOF OF MONOTONE CONVERGENCE (sketch)
Let $r_{i,k} = x_i - \hat{\theta}_k$. By a Taylor series with remainder term,
$\rho_0(r_{i,k+1}^2) = \rho_0(r_{i,k}^2) + (r_{i,k+1}^2 - r_{i,k}^2)\,\rho_0'(r_{i,k}^2) + \tfrac{1}{2}(r_{i,k+1}^2 - r_{i,k}^2)^2\,\rho_0''(r_{i,*}^2)$
So, since $\rho_0'' \le 0$,
$\sum_{i=1}^n \rho_0(r_{i,k+1}^2) < \sum_{i=1}^n \rho_0(r_{i,k}^2) + \sum_{i=1}^n (r_{i,k+1}^2 - r_{i,k}^2)\,\rho_0'(r_{i,k}^2)$
Now $r\,u(r) = \psi(r) = \rho'(r) = 2r\rho_0'(r^2)$, and so $\rho_0'(r^2) = u(r)/2$. Also,
$r_{i,k+1}^2 - r_{i,k}^2 = (\hat{\theta}_{k+1} - \hat{\theta}_k)^2 - 2(\hat{\theta}_{k+1} - \hat{\theta}_k)(x_i - \hat{\theta}_k)$.
Thus,
$\sum_{i=1}^n \rho_0(r_{i,k+1}^2) < \sum_{i=1}^n \rho_0(r_{i,k}^2) - \tfrac{1}{2}(\hat{\theta}_{k+1} - \hat{\theta}_k)^2 \sum_{i=1}^n u(r_{i,k}) < \sum_{i=1}^n \rho_0(r_{i,k}^2)$
• Slow but sure ⇒ switch to Newton-Raphson after a few iterations.

PART 2: MORE ADVANCED CONCEPTS AND METHODS

SCALE EQUIVARIANCE
• M-estimates of location alone are not scale equivariant in general:
  $X_i \to X_i + a \Rightarrow \hat{\theta} \to \hat{\theta} + a$ (location equivariant)
  $X_i \to b\,X_i \not\Rightarrow \hat{\theta} \to b\,\hat{\theta}$ (not scale equivariant)
  (Exceptions: the mean and median.)
• Thus the adaptive weights do not depend on the spread of the data.
[Figure: points at a fixed distance from the center are equally downweighted whether the data cloud is tight or spread out.]

SCALE STATISTICS: $s_n$
$X_i \to bX_i + a \Rightarrow s_n \to |b|\,s_n$
• Sample standard deviation: $s_n = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2}$
• MAD (or, more appropriately, MADAM): the Median Absolute Deviation About the Median,
  $s_n^* = \mathrm{Median}\,|x_i - \mathrm{median}|$, with $s_n = 1.4826\,s_n^* \to_p \sigma$ at $Normal(\mu, \sigma^2)$.

Example: 2, 4, 5, 10, 12, 14, 200
• Median = 10
• Absolute deviations: 8, 6, 5, 0, 2, 4, 190
• MADAM = 5 ⇒ $s_n$ = 7.413
• (Standard deviation = 72.7661)

M-estimates of location with auxiliary scale
$\hat{\theta} = \arg\min_\theta \sum_{i=1}^n \rho\!\left(\dfrac{x_i - \theta}{c\,s_n}\right)$ or $\hat{\theta}$ solves $\sum_{i=1}^n \psi\!\left(\dfrac{x_i - \hat{\theta}}{c\,s_n}\right) = 0$
• $c$: tuning constant.
• $s_n$: consistent for σ at the normal.

Tuning an M-estimate
• Given a ψ-function, define a class of M-estimates via $\psi_c(r) = \psi(r/c)$.
• Tukey's bi-weight: $\psi(r) = r\{(1 - r^2)_+\}^2 \Rightarrow \psi_c(r) = \dfrac{r}{c}\left\{\left(1 - \dfrac{r^2}{c^2}\right)_+\right\}^2$
• $c \to \infty$ ⇒ ≈ mean.
• $c \to 0$ ⇒ locally very unstable.
[Figures: tuned ψ-curves at the normal distribution, with auxiliary scale.]

MORE THAN ONE OUTLIER
• GES: a measure of local robustness.
• Measures of global robustness?
  $\mathcal{X}_n = \{X_1, \ldots, X_n\}$: n "good" data points.
  $\mathcal{Y}_m = \{Y_1, \ldots, Y_m\}$: m "bad" data points.
  $\mathcal{Z}_{n+m} = \mathcal{X}_n \cup \mathcal{Y}_m$: an $\epsilon_m$-contaminated sample, $\epsilon_m = \dfrac{m}{n+m}$.
  Bias of a statistic: $|T(\mathcal{X}_n \cup \mathcal{Y}_m) - T(\mathcal{X}_n)|$.

MAX-BIAS AND THE BREAKDOWN POINT
• Max-bias under m-contamination: $B(m; T, \mathcal{X}_n) = \sup_{\mathcal{Y}_m} |T(\mathcal{X}_n \cup \mathcal{Y}_m) - T(\mathcal{X}_n)|$.
• Finite-sample contamination breakdown point (Donoho and Huber, 1983):
  $\epsilon_c^*(T; \mathcal{X}_n) = \inf\{\epsilon_m \mid B(m; T, \mathcal{X}_n) = \infty\}$
• Other concepts of breakdown exist (e.g. replacement breakdown).
• The definition of BIAS can be modified so that BIAS → ∞ as $T(\mathcal{X}_n \cup \mathcal{Y}_m)$ goes to the boundary of the parameter space. Example: if $T$ represents scale, then choose BIAS $= |\log\{T(\mathcal{X}_n \cup \mathcal{Y}_m)\} - \log\{T(\mathcal{X}_n)\}|$.

EXAMPLES
• Mean: $\epsilon_c^* = \dfrac{1}{n+1}$
• Median: $\epsilon_c^* = 1/2$
• α-trimmed mean: $\epsilon_c^* = \alpha$
• α-Winsorized mean: $\epsilon_c^* = \alpha$
• M-estimate of location with monotone and bounded ψ-function: $\epsilon_c^* = 1/2$

PROOF (sketch of the lower bound)
Let $K = \sup_r |\psi(r)|$. Then
$0 = \sum_{i=1}^{n+m} \psi(z_i - T_{n+m}) = \sum_{i=1}^n \psi(x_i - T_{n+m}) + \sum_{i=1}^m \psi(y_i - T_{n+m})$
$\left|\sum_{i=1}^n \psi(x_i - T_{n+m})\right| = \left|\sum_{i=1}^m \psi(y_i - T_{n+m})\right| \le mK$
Breakdown occurs ⇒ $|T_{n+m}| \to \infty$; say $T_{n+m} \to -\infty$. Then
$\left|\sum_{i=1}^n \psi(x_i - T_{n+m})\right| \to \left|\sum_{i=1}^n \psi(\infty)\right| = nK$
Therefore $m \ge n$, and so $\epsilon_c^* \ge 1/2$. □
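These finite-sample breakdown points can be seen empirically. A small experiment (Python; the sample and the contamination placement are my own illustrative choices): append m bad points to n = 50 good ones and track the bias of the mean and the median.

```python
import numpy as np

rng = np.random.default_rng(1)
good = rng.normal(size=50)                      # X_n: the "good" data

for m in (1, 5, 25, 49):                        # append m "bad" points Y_m
    bad = np.full(m, 1.0e6)                     # placed arbitrarily far away
    z = np.concatenate([good, bad])
    print(m, round(m / (50 + m), 3),
          abs(np.mean(z) - np.mean(good)),      # mean: huge bias already at m = 1
          abs(np.median(z) - np.median(good)))  # median: bounded while m < n
```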
Population version of the breakdown point, under contamination neighborhoods
• Model distribution: $F$
• Contaminating distribution: $H$
• ε-contaminated distribution: $F_\epsilon = (1-\epsilon)F + \epsilon H$
• Max-bias under ε-contamination: $B(\epsilon; T, F) = \sup_H |T(F_\epsilon) - T(F)|$
• Breakdown point (Hampel, 1968): $\epsilon^*(T; F) = \inf\{\epsilon \mid B(\epsilon; T, F) = \infty\}$
• Examples:
  – Mean: $T(F) = E_F(X)$ ⇒ $\epsilon^*(T; F) = 0$
  – Median: $T(F) = F^{-1}(1/2)$ ⇒ $\epsilon^*(T; F) = 1/2$

ILLUSTRATION
[Figure: the max-bias curve $B(\epsilon; T, F)$. GES ≈ $\partial B(\epsilon; T, F)/\partial\epsilon\,|_{\epsilon=0}$; $\epsilon^*(T; F)$ = vertical asymptote.]

Heuristic interpretation of the breakdown point (subject to debate)
• The proportion of bad data a statistic can tolerate before becoming arbitrary or meaningless.
• If half the data is bad, then one cannot distinguish between the good data and the bad data? ⇒ $\epsilon^* \le 1/2$?
• Discussion: Davies and Gather (2005), Annals of Statistics.

Example: redescending M-estimates of location with fixed scale
$T(x_1, \ldots, x_n) = \arg\min_t \sum_{i=1}^n \rho\!\left(\dfrac{|x_i - t|}{c}\right)$
Breakdown point (Huber, 1984): for bounded increasing ρ with $\sup \rho(r) = 1$,
$\epsilon_c^*(T; F) = \dfrac{1 - A(\mathcal{X}_n; c)/n}{2 - A(\mathcal{X}_n; c)/n}$, where $A(\mathcal{X}_n; c) = \min_t \sum_{i=1}^n \rho\!\left(\dfrac{x_i - t}{c}\right)$
• The breakdown point depends on $\mathcal{X}_n$ and $c$.
• $\epsilon^*: 0 \to 1/2$ as $c: 0 \to \infty$.
• For large $c$, $T(x_1, \ldots, x_n) \approx$ MEAN!!

Explanation: relationship between redescending M-estimates of location and kernel density estimates
Chu, Glad, Godtliebsen and Marron (1998)
• Objective function: $\sum_{i=1}^n \rho\!\left(\dfrac{x_i - t}{c}\right)$
• Kernel density estimate: $\hat{f}_h(t) \propto \sum_{i=1}^n \kappa\!\left(\dfrac{x_i - t}{h}\right)$
• Relationship: $\kappa \propto 1 - \rho$, with $c$ = window width $h$.
• Example: normal kernel $\kappa(r) = \frac{1}{\sqrt{2\pi}}\,e^{-r^2/2}$ ⇒ $\rho(r) = 1 - e^{-r^2/2}$, Welsch's M-estimate.
• Example: Epanechnikov kernel ⇒ skipped mean.
• "Outliers" less compact than the "good" data ⇒ breakdown will not occur.
• Not true for monotonic M-estimates.

PART 3: ROBUST REGRESSION AND MULTIVARIATE STATISTICS

REGRESSION SETTING
• Data: $(Y_i, X_i)$, $i = 1, \ldots, n$
  – $Y_i \in \mathbb{R}$: response
  – $X_i \in \mathbb{R}^p$: predictors
• Predict $Y$ by $X'\beta$.
• Residual for a given β: $r_i(\beta) = y_i - x_i'\beta$.
• M-estimates of regression:
  – Generalization of the MLE for a symmetric error term:
  – $Y_i = X_i'\beta + \epsilon_i$, where the $\epsilon_i$ are i.i.d. symmetric.

M-ESTIMATES OF REGRESSION
• Objective function approach: $\min_\beta \sum_{i=1}^n \rho(r_i(\beta))$, where $\rho(r) = \rho(-r)$ and ρ ↑ for $r \ge 0$.
• M-estimating equation approach: $\sum_{i=1}^n \psi(r_i(\beta))\,x_i = 0$, e.g. $\psi(r) = \rho'(r)$.
• Interpretation: adaptively weighted least squares.
  ◦ Express $\psi(r) = r\,u(r)$ and $w_i = u(r_i(\hat{\beta}))$:
    $\hat{\beta} = \left[\sum_{i=1}^n w_i x_i x_i'\right]^{-1}\left[\sum_{i=1}^n w_i y_i x_i\right]$
• Computations via the IRLS algorithm, as sketched below.

INFLUENCE FUNCTIONS FOR M-ESTIMATES OF REGRESSION
$IF(y, x; T, F) \propto \psi(r)\,x$, with $r = y - x'T(F)$
• Due to the residual: $\psi(r)$.
• Due to the design: $x$.
• GES = ∞, i.e. unbounded influence.
• Outlier 1: highly influential for any ψ-function.
• Outlier 2: highly influential for monotonic but not redescending ψ.

BOUNDED INFLUENCE REGRESSION
• GM-estimates (generalized M-estimates).
  ◦ Mallows-type (Mallows, 1975): $\sum_{i=1}^n w(x_i)\,\psi(r_i(\beta))\,x_i = 0$.
    Downweights outlying design points (leverage points), even if they are good leverage points.
  ◦ General form (e.g. Maronna and Yohai, 1981): $\sum_{i=1}^n \psi(x_i, r_i(\beta))\,x_i = 0$.
• Breakdown points:
  – M-estimates: $\epsilon^* = 0$
  – GM-estimates: $\epsilon^* \le 1/(p+1)$
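A sketch of the IRLS computation for M-estimates of regression referred to above (Python; function names are mine; the Huber weights and the MAD residual scale are my illustrative choices):

```python
import numpy as np

def u_huber(r, c=1.345):
    """Huber weights u(r) = psi(r)/r = min(1, c/|r|)."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / a)

def irls_regression(X, y, c=1.345, n_iter=50):
    """M-estimate of regression via iteratively re-weighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]           # LS start (not robust)
    for _ in range(n_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust residual scale
        w = u_huber(r / s, c)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)        # weighted LS step
    return beta

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 40)
X = np.column_stack([np.ones(40), x])
y = 1 + 2 * x + rng.normal(size=40)
y[:4] = 100.0                                # a few gross outliers in the response
print(irls_regression(X, y))                 # near (1, 2) despite the outliers
```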
LATE 70's - EARLY 80's
• Open problem: is high breakdown point regression possible?
• Yes: the repeated median, Siegel (1982).
  – But it is not regression equivariant.
• Regression equivariance: for $a \in \mathbb{R}$ and $A$ nonsingular,
  $(Y_i, X_i) \to (aY_i, A'X_i) \Rightarrow \hat{\beta} \to aA^{-1}\hat{\beta}$, i.e. $\hat{Y}_i \to a\hat{Y}_i$
• Open problem: is high breakdown point equivariant regression possible?
• Yes: least median of squares.

Least Median of Squares (LMS)
Hampel (1984), Rousseeuw (1984)
$\min_\beta\, \mathrm{Median}\{r_i(\beta)^2 \mid i = 1, \ldots, n\}$
• Alternatively: $\min_\beta \mathrm{MAD}\{r_i(\beta)\}$.
• Breakdown point: $\epsilon^* = 1/2$.
• Location version: the SHORTH (Princeton robustness study) — the midpoint of the SHORTest Half.
• LMS: the mid-line of the shortest strip containing half of the data.
• Problem: not $\sqrt{n}$-consistent, but only $\sqrt[3]{n}$-consistent:
  $\sqrt{n}\,\|\hat{\beta} - \beta\| \to_p \infty$, while $\sqrt[3]{n}\,\|\hat{\beta} - \beta\| = O_p(1)$
• Not locally stable. [Figure: e.g. an LMS fit to pure noise.]

S-ESTIMATES OF REGRESSION
Rousseeuw and Yohai (1984)
• For $S(\cdot)$, an estimate of scale (about zero): $\min_\beta S(r_1(\beta), \ldots, r_n(\beta))$.
• S = MAD ⇒ LMS.
• S = sample standard deviation about 0 ⇒ least squares.
• Bounded monotone M-estimates of scale (about zero):
  $\sum_{i=1}^n \chi(|r_i|/s) = 0$ for χ ↑, bounded above and below. Alternatively,
  $\frac{1}{n}\sum_{i=1}^n \rho(|r_i|/s) = \epsilon$ for ρ ↑ and $0 \le \rho \le 1$.
• For LMS: ρ is a 0-1 jump function and $\epsilon = 1/2$.
• Breakdown point: $\epsilon^* = \min(\epsilon, 1 - \epsilon)$, so $\epsilon = 1/2$ gives $\epsilon^* = 1/2$.

S-estimates of regression
• $\sqrt{n}$-consistent and asymptotically normal.
• Trade-off: a higher breakdown point ⇒ lower efficiency and higher residual gross error sensitivity for normal errors.
• One resolution: a one-step M-estimate via IRLS. (Note: R, S-plus, Matlab.)

M-ESTIMATES OF REGRESSION WITH GENERAL SCALE
Martin, Yohai, and Zamar (1989)
$\min_\beta \sum_{i=1}^n \rho\!\left(\dfrac{|y_i - x_i'\beta|}{c\,s_n}\right)$
• Parameter of interest: β.
• Scale statistic: $s_n$ (consistent for σ at normal errors).
• Tuning constant: $c$.
• Monotone bounded ρ ⇒ redescending M-estimate.
• High breakdown point examples:
  – LMS and S-estimates
  – MM-estimates, Yohai (1987)
  – CM-estimates, Mendes and Tyler (1994)
[Figure: $s_n$ vs. $\sum_{i=1}^n \rho(|y_i - x_i'\beta|/(c\,s_n))$. Each β gives one curve; S-estimates minimize horizontally, M-estimates minimize vertically.]

MM-ESTIMATES OF REGRESSION
The default robust regression estimate in S-plus (also SAS?).
• Given a preliminary estimate of regression with breakdown point 1/2 — usually an S-estimate of regression —
• compute a monotone M-estimate of scale about zero for the residuals ($s_n$: usually from the S-estimate), then
• compute an M-estimate of regression using the scale statistic $s_n$ and any desired tuning constant $c$.
• Tune to obtain a reasonable ARE and residual GES.
• Breakdown point: $\epsilon^* = 1/2$.

TUNING
• High breakdown point S-estimates are badly tuned M-estimates.
• Tuning an S-estimate affects its breakdown point.
• MM-estimates can be tuned without affecting the breakdown point.

COMPUTATIONAL ISSUES
• All known high breakdown point regression estimates are computationally intensive: a non-convex optimization problem.
• Approximate or stochastic algorithms.
• Random elemental subset selection, Rousseeuw (1984) — a sketch follows below:
  – Consider exact fits of lines to p of the data points ⇒ $\binom{n}{p}$ such lines.
  – Optimize the criterion over such lines.
  – For large n and p, randomly sample from such lines so that there is a good chance (say, e.g., a 95% chance) that at least one of the elemental subsets contains only good data, even if half the data is contaminated.
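A sketch of the random elemental subset algorithm for LMS in simple regression (Python; names mine; here p = 2, so the elemental fits are exact fits to pairs of points):

```python
import numpy as np

def lms_line(x, y, n_trials=500, rng=None):
    """Approximate LMS: best of many exact fits to random pairs of points."""
    rng = rng if rng is not None else np.random.default_rng()
    n, best, best_crit = len(x), None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(n, size=2, replace=False)   # an elemental subset
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])             # exact fit to the pair
        a = y[i] - b * x[i]
        crit = np.median((y - a - b * x) ** 2)        # the LMS criterion
        if crit < best_crit:
            best, best_crit = (a, b), crit
    return best

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 40)
y = 1 + 2 * x + 0.5 * rng.normal(size=40)
x[:15] = rng.uniform(0, 10, 15)        # contaminate ~37% of the data
y[:15] = rng.uniform(50, 60, 15)       # with grossly bad points
print(lms_line(x, y, rng=rng))         # intercept/slope near (1, 2)
```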
MULTIVARIATE DATA
Robust multivariate location and covariance estimates: parallel developments.
• Multivariate M-estimates. Maronna (1976), Huber (1977).
  ◦ Adaptively weighted means and covariances.
  ◦ IRLS algorithms.
  ◦ Breakdown point: $\epsilon^* \le 1/(d+1)$, where d is the dimension of the data.
• Minimum volume ellipsoid (MVE) estimates. Rousseeuw (1985).
  ◦ The MVE is a multivariate version of LMS.
• Multivariate S-estimates. Davies (1987), Lopuhaä (1989).
• Multivariate CM-estimates. Kent and Tyler (1996).
• Multivariate MM-estimates. Tatsuoka and Tyler (2000), Tyler (2002).

MULTIVARIATE DATA
• d-dimensional data set: $X_1, \ldots, X_n$.
• Classical summary statistics:
  – Sample mean vector: $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$
  – Sample variance-covariance matrix:
    $S_n = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^T = \{s_{ij}\}$
    where $s_{ii}$ = sample variance, $s_{ij}$ = sample covariance.
• Maximum likelihood estimates under normality.
[Figure: Hertzsprung-Russell galaxy data (Rousseeuw) with the sample mean and covariance ellipse.]
• Visualization ⇒ ellipse containing half the data: $(x - \bar{X})^T S_n^{-1}(x - \bar{X}) \le c$.
• Mahalanobis distances: $d_i^2 = (X_i - \bar{X})^T S_n^{-1}(X_i - \bar{X})$.
[Figure: Mahalanobis distances vs. index.]
• Mahalanobis angles: $\theta_{i,j} = \cos^{-1}\left\{(X_i - \bar{X})^T S_n^{-1}(X_j - \bar{X})/(d_i d_j)\right\}$.
[Figure: first two principal components.]
• Principal components: do not always reveal outliers.

ENGINEERING APPLICATIONS
• Noisy signal arrays, radar clutter. A test for signal, i.e. a detector, based on the Mahalanobis angle between the signal and the observed data.
• Hyperspectral data. Adaptive cosine estimator = Mahalanobis angle.
• BIG DATA.
• The data cannot be cleaned by inspecting individual observations; automatic methods are needed.

ROBUST MULTIVARIATE ESTIMATES
Location and scatter (pseudo-covariance).
• Trimmed versions: complicated.
• Weighted mean and covariance matrices:
  $\hat{\mu} = \dfrac{\sum_{i=1}^n w_i X_i}{\sum_{i=1}^n w_i}$, $\quad\hat{\Sigma} = \dfrac{1}{n}\sum_{i=1}^n w_i (X_i - \hat{\mu})(X_i - \hat{\mu})^T$
  where $w_i = u(d_{i,0})$ and $d_{i,0}^2 = (X_i - \hat{\mu}_0)^T\hat{\Sigma}_0^{-1}(X_i - \hat{\mu}_0)$,
  with $\hat{\mu}_0$ and $\hat{\Sigma}_0$ being initial estimates, e.g.
  → the sample mean vector and covariance matrix, or
  → high breakdown point estimates.

MULTIVARIATE M-ESTIMATES
• MLE-type estimates for elliptically symmetric distributions.
• Adaptively weighted means and covariances:
  $\hat{\mu} = \dfrac{\sum_{i=1}^n w_i X_i}{\sum_{i=1}^n w_i}$, $\quad\hat{\Sigma} = \dfrac{1}{n}\sum_{i=1}^n w_i (X_i - \hat{\mu})(X_i - \hat{\mu})^T$
• $w_i = u(d_i)$ with $d_i^2 = (X_i - \hat{\mu})^T\hat{\Sigma}^{-1}(X_i - \hat{\mu})$.
• Implicit equations (a fixed-point iteration is sketched below).

PROPERTIES OF M-ESTIMATES
• Root-n consistent and asymptotically normal.
• Can be tuned to have desirable properties:
  – high efficiency over a broad class of distributions
  – a smooth and bounded influence function
• Computationally simple (IRLS).
• Breakdown point: $\epsilon^* \le 1/(d+1)$.

HIGH BREAKDOWN POINT ESTIMATES
• MVE: minimum volume ellipsoid ⇒ $\epsilon^* = 0.5$.
  The center and scatter matrix of the smallest ellipsoid covering at least half of the data.
• MCD, S-estimates, MM-estimates, CM-estimates ⇒ $\epsilon^* = 0.5$.
• Reweighted versions based on a high breakdown start.
• Computationally intensive: approximate/probabilistic algorithms; the elemental subset approach.
[Figures: Hertzsprung-Russell galaxy data (Rousseeuw) showing the sample mean and covariance ellipse; with the MVE ellipse added; with the Cauchy MLE ellipse added.]
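A sketch of the fixed-point (IRLS) iteration for the implicit multivariate M-estimating equations above (Python; the Huber-type weight function u(d) = min(1, c²/d²) is an illustrative assumption, not the course's prescription), started from the sample mean and covariance:

```python
import numpy as np

def m_location_scatter(X, c=2.5, n_iter=100):
    """Adaptively weighted mean and covariance via fixed-point iteration."""
    n, d = X.shape
    mu, S = X.mean(axis=0), np.cov(X, rowvar=False)            # initial estimates
    for _ in range(n_iter):
        R = X - mu
        d2 = np.einsum('ij,jk,ik->i', R, np.linalg.inv(S), R)  # Mahalanobis d_i^2
        w = np.minimum(1.0, c ** 2 / np.maximum(d2, 1e-12))    # w_i = u(d_i)
        mu = (w[:, None] * X).sum(axis=0) / w.sum()            # weighted mean
        R = X - mu
        S = (w[:, None] * R).T @ R / n                         # weighted covariance
    return mu, S
```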
ELLIPTICAL DISTRIBUTIONS AND AFFINE EQUIVARIANCE

SPHERICALLY SYMMETRIC DISTRIBUTIONS
$Z = R\,U$, where $R \perp U$, $R > 0$, and $U \sim$ Uniform on the unit sphere.
Density: $f(z) = g(z'z)$
[Figure: scatter plot of a spherically symmetric sample.]
ELLIPTICALLY SYMMETRIC DISTRIBUTIONS
$X = AZ + \mu \sim E(\mu, \Gamma; g)$, where $\Gamma = AA'$
$\Rightarrow f(x) = |\Gamma|^{-1/2}\,g\!\left((x - \mu)'\Gamma^{-1}(x - \mu)\right)$
[Figure: scatter plot of an elliptically symmetric sample.]
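A sketch generating data from these representations (Python; the particular radial distribution, chosen to make Z standard normal, is my own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 2

# Spherical: Z = R * U, with U uniform on the unit sphere and R > 0 independent
U = rng.normal(size=(n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
R = np.sqrt(rng.chisquare(d, size=n))    # this radius makes Z ~ Normal(0, I)
Z = R[:, None] * U

# Elliptical: X = A Z + mu ~ E(mu, Gamma; g), with Gamma = A A'
A = np.array([[2.0, 0.0], [1.5, 0.5]])
mu = np.array([1.0, 1.0])
X = Z @ A.T + mu
print(X.mean(axis=0))                    # ~ mu
print(np.cov(X, rowvar=False))           # ~ Gamma = A A' (in this normal case)
```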
AFFINE EQUIVARIANCE
• Parameters of elliptical distributions:
  $X \sim E(\mu, \Gamma; g) \Rightarrow X^* = BX + b \sim E(\mu^*, \Gamma^*; g)$
  where $\mu^* = B\mu + b$ and $\Gamma^* = B\Gamma B'$.
• Sample version: $X_i \to X_i^* = BX_i + b$, $i = 1, \ldots, n$
  ⇒ $\hat{\mu} \to \hat{\mu}^* = B\hat{\mu} + b$ and $\hat{V} \to \hat{V}^* = B\hat{V}B'$.
• Examples:
  – Mean vector and covariance matrix
  – M-estimates
  – MVE

ELLIPTICALLY SYMMETRIC DISTRIBUTIONS
$f(x; \mu, \Gamma) = |\Gamma|^{-1/2}\,g\!\left((x - \mu)'\Gamma^{-1}(x - \mu)\right)$
• If second moments exist, the shape matrix Γ ∝ the covariance matrix.
• For any affine equivariant scatter functional: $\Sigma(F) \propto \Gamma$.
• That is, any affine equivariant scatter statistic estimates a matrix proportional to the shape matrix Γ.

NON-ELLIPTICAL DISTRIBUTIONS
• Different scatter matrices estimate different population quantities.
• Analogy: for non-symmetric distributions, population mean ≠ population median.

OTHER TOPICS
Projection-based multivariate approaches:
• Tukey's data depth (or half-space depth). Tukey (1974).
• The Stahel-Donoho estimate. Stahel (1981), Donoho (1982).
• Projection-estimates. Maronna, Stahel and Yohai (1994). Tyler (1996).
• Computationally intensive.

TUKEY'S DEPTH
[Figure: illustration of Tukey's half-space depth.]

PART 5: ROBUSTNESS AND COMPUTER VISION
• Independent development of robust statistics within the computer vision/image understanding community.
• Hough transform: $\rho = x\cos(\theta) + y\sin(\theta)$
• $y = a + bx$, where $a = \rho/\sin(\theta)$ and $b = -\cos(\theta)/\sin(\theta)$
• That is, an intersection in feature space corresponds to the line connecting two data points.
• Hough: from estimation in data space to clustering in feature space — find the centers of the clusters.
• Terminology:
  – Feature space = parameter space
  – Accumulator = elemental fit
• Computation: RANSAC (random sample consensus) — see the sketch at the end of this section.
  – Randomly choose subsets from the accumulator (random elemental fits).
  – Check how many data points lie within a fixed neighborhood of each fit.

Alternative formulation of the Hough transform/RANSAC
$\sum \rho(r_i)$ should be small, where
$\rho(r) = \begin{cases} 1, & |r| > R \\ 0, & |r| \le R \end{cases}$
That is, a redescending M-estimate of regression with known scale.
Note: the relationship to LMS.
Vision: need e.g. a 90% breakdown point, i.e. tolerate 90% bad data.

Definition of residuals?
The Hough transform approach does not distinguish between:
• the regression line for regressing Y on X,
• the regression line for regressing X on Y,
• the orthogonal regression line.
(Note: small stochastic errors ⇒ little difference.)

GENERAL PARADIGM
• Line in 2-D: exact fit to all pairs.
• Quadratic in 2-D: exact fit to all triples.
• Conic sections: ellipse fitting, $(x - \mu)'A(x - \mu) = 1$.
  ◦ Linearizing: let $x_*' = (x_1, x_2, x_1^2, x_2^2, x_1x_2, 1)$.
  ◦ Ellipse: $a'x_* = 0$.
  ◦ Exact fit to all subsets of size 5.
• Hyperboloid fitting in 4-D.
• Epipolar geometry: exact fit to
  ◦ 5 pairs: calibrated cameras (location, rotation, scale).
  ◦ 8 pairs: uncalibrated cameras (focal point, image plane).
  ◦ A hyperboloid surface in 4-D.
• Applications: 2D → 3D, motion.
• OUTLIERS = MISMATCHES
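Finally, a minimal RANSAC sketch for line fitting (Python; names and the threshold value are mine). It uses vertical residuals, i.e. regressing Y on X — one of the three residual definitions noted above:

```python
import numpy as np

def ransac_line(x, y, R=1.0, n_trials=1000, rng=None):
    """Random elemental fits; keep the fit with the most points inside |r| <= R.
    Approximately minimizes sum_i rho(r_i) for the 0-1 rho given above."""
    rng = rng if rng is not None else np.random.default_rng()
    n, best, most = len(x), None, -1
    for _ in range(n_trials):
        i, j = rng.choice(n, size=2, replace=False)      # an elemental fit (a pair)
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])                # exact fit to the pair
        a = y[i] - b * x[i]
        inliers = int(np.sum(np.abs(y - a - b * x) <= R))  # fixed neighborhood
        if inliers > most:
            most, best = inliers, (a, b)
    return best
```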