Geometric Median: Applications to Robust and Scalable Statistical Estimation

Stanislav Minsker
LDHD SAMSI Workshop
March 31, 2014

Stanislav Minsker (LDHD SAMSI Workshop) · Geometric Median and Robust Estimation · March 31, 2014

Challenges of Contemporary Statistical Science

Resource limitations: massive data sets require computer clusters for storage and efficient processing ⟹ we need algorithms that can be implemented in parallel (a master node coordinating worker nodes 1, 2, …, k).

Presence of outliers of unknown nature ⟹ we need algorithms that are robust and do not rely on preprocessing or outlier detection.

While ad hoc techniques exist for some problems, we would like to develop general methods.

Example: how to estimate the mean?

Assume that X₁, …, Xₙ are i.i.d. N(µ, σ²). Problem: construct CI_norm(t) for µ with coverage probability ≥ 1 − 2e^{−t}.

Solution: compute µ̂∗ := (1/n) Σ_{j=1}^n X_j and take

  CI_norm(t) = [µ̂∗ − σ√(2t/n), µ̂∗ + σ√(2t/n)].

To compute µ̂∗ on a cluster: set m = n/k, split the sample into k blocks X₁, …, X_m | … | X_{n−m+1}, …, X_n, compute the block means µ̂₁ := (1/m) Σ_{j=1}^m X_j, …, µ̂_k := (1/m) Σ_{j=n−m+1}^n X_j, and average: µ̂∗ = (1/k) Σ_{j=1}^k µ̂_j.

Example: how to estimate the mean?

What if X, X₁, …, Xₙ are i.i.d.
from Π with EX = µ, Var(X) = σ²? Problem: construct CI(t) for µ with coverage probability ≥ 1 − e^{−t} such that for any t,

  length(CI(t)) ≤ (absolute constant) · length(CI_norm(t)).

No additional assumptions on Π are imposed.

Remark: the guarantee available for the sample mean µ̂ₙ = (1/n) Σ_{j=1}^n X_j is unsatisfactory — Chebyshev's inequality only gives

  Pr(|µ̂ₙ − µ| ≥ σ √(eᵗ/n)) ≤ e^{−t}.

Does the solution exist?

Answer (somewhat unexpected?): Yes!

Construction [A. Nemirovski, D. Yudin '83; N. Alon, Y. Matias, M. Szegedy '96; R. Oliveira, M. Lerasle '11]: split the sample into k = ⌊t⌋ + 1 groups G₁, …, G_k of size ≈ n/t each,

  X₁, …, X_{|G₁|} | … | X_{n−|G_k|+1}, …, Xₙ,

compute the group means µ̂_j := (1/|G_j|) Σ_{X_i ∈ G_j} X_i, j = 1, …, k, and take the median:

  µ̂∗ = µ̂∗(t) := median(µ̂₁, …, µ̂_k).

Claim:

  Pr(|µ̂∗ − µ| ≥ abs. const × σ √(t/n)) ≤ e^{−t}.

Then take

  CI(t) = [µ̂∗ − Cσ√(t/n), µ̂∗ + Cσ√(t/n)].

Proof of the claim (illustration: estimates µ̂₁, …, µ̂₈ scattered on a line, with the median among them):

1. |µ̂∗ − µ| ≥ s ⟹ at least half of the events {|µ̂_j − µ| ≥ s} occur.
2. Pr(at least half of the events {|µ̂_j − µ| ≥ s} occur) ≤ (k choose k/2) (Pr(|µ̂₁ − µ| ≥ s))^{k/2} ≤ (2e)^{k/2} (Var(X) k / (n s²))^{k/2} ≤ e^{−k} whenever s ≥ (2e³)^{1/2} σ √(t/n).

Since k = ⌊t⌋ + 1, the result follows (with "absolute constant" ≤ 6.5).

Extensions to higher dimensions

A natural question: is it possible to extend the presented method to the multivariate mean?

Naive approach: apply the "median trick" coordinatewise. This makes the bound dimension-dependent.

Can we do better? Yes!
– replace the usual median by the geometric median.

Definition. Let (𝕏, ‖·‖) be a separable, reflexive Banach space and x₁, …, x_k ∈ 𝕏. The geometric median x∗ is defined as

  x∗ = med(x₁, …, x_k) := argmin_{y ∈ 𝕏} Σ_{j=1}^k ‖y − x_j‖.

Remarks:
1. If 𝕏 is a Hilbert space, then x∗ ∈ convex hull(x₁, …, x_k).
2. Other generalizations of the median are possible (e.g., A. Nemirovski and D. Yudin '83, D. Hsu and S. Sabato '14).

Main result

For 0 < p < α < 1/2, define

  ψ(α; p) = (1 − α) log((1 − α)/(1 − p)) + α log(α/p).

Theorem (M., 2013). Fix α ∈ (0, 1/2). Assume that µ ∈ 𝕏 is a parameter of interest, and let µ̂₁, …, µ̂_k ∈ 𝕏 be a collection of independent estimators of µ. Suppose ε > 0 and p < α are such that for all 1 ≤ j ≤ k,

  Pr(‖µ̂_j − µ‖ > ε) ≤ p   ("weak concentration").

Let µ̂ := med(µ̂₁, …, µ̂_k). Then

  Pr(‖µ̂ − µ‖ > C_α ε) ≤ e^{−k ψ(α; p)}   ("strong concentration"),

where C_α = (1 − α)√(1/(1 − 2α)) in the Hilbert-space case and C_α = 2(1 − α)/(1 − 2α) otherwise.

Handling the "outliers"

Theorem (M., 2013). Fix α ∈ (0, 1/2). Assume that µ ∈ 𝕏 is a parameter of interest, and let µ̂₁, …, µ̂_k ∈ 𝕏 be a collection of independent estimators of µ. Suppose ε > 0, p < α and 0 ≤ γ < (α − p)/(1 − p) are such that for all 1 ≤ j ≤ (1 − γ)k + 1,

  Pr(‖µ̂_j − µ‖ > ε) ≤ p   ("weak concentration").

Let µ̂ := med(µ̂₁, …, µ̂_k). Then

  Pr(‖µ̂ − µ‖ > C_α ε) ≤ e^{−k(1−γ) ψ((α−γ)/(1−γ); p)}   ("strong concentration"),

with C_α as above. In other words, up to a γ-fraction of the estimators µ̂_j may behave arbitrarily ("outliers"), and the median still concentrates.

Example: estimation of the mean

(ℍ, ‖·‖) — Hilbert space; X, X₁, …, Xₙ ∈ ℍ — i.i.d. sample from a distribution Π such that EX = µ, E[(X − µ) ⊗ (X − µ)] = Γ is the covariance operator, and E‖X − µ‖² = tr(Γ) < ∞.

Given δ, set k = ⌊log(1/δ)⌋ + 1, {X₁, …, Xₙ} = G₁ ∪ … ∪ G_k, |G_i| ≥ ⌊n/k⌋, i = 1, …, k.
µ̂_j := (1/|G_j|) Σ_{i ∈ G_j} X_i, j = 1, …, k,   µ̂∗ = µ̂∗(δ) := med(µ̂₁, …, µ̂_k).

Result for µ̂∗:

Corollary. Pr(‖µ̂∗ − µ‖ ≥ 15 √(tr(Γ) log(e/δ) / n)) ≤ δ.

Application to "robust PCA": let Y_i ∈ ℝ^D, EY_i = 0, EY_iY_iᵀ = Σ, and set X_i = Y_iY_iᵀ. Then

  Pr(‖Proj_m(Σ̂) − Proj_m(Σ)‖_F ≥ (30/∆_m) √((E‖X‖⁴ − tr(Σ²)) log(e/δ) / n)) ≤ δ,

where Σ̂ := µ̂∗ is the geometric-median-of-means estimator of Σ, Proj_m(Σ) is the orthogonal projector onto the m leading eigenvectors of Σ, and ∆_m = λ_m(Σ) − λ_{m+1}(Σ) is the m-th spectral gap of Σ.

High-dimensional sparse linear regression

Y_j, j = 1, …, n — noisy linear measurements of λ₀ ∈ ℝ^D:

  Y_j = λ₀ᵀ x_j + ξ_j,

where
1. x₁, …, xₙ ∈ ℝ^D is a fixed collection of vectors;
2. ξ_j, j = 1, …, n, are independent zero-mean random variables with Var(ξ_j) ≤ σ².

Problem: estimate λ₀.
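The geometric-median-of-means estimator µ̂∗ of the corollary above is short to implement. A minimal NumPy sketch follows; the function names, the Student-t test distribution, and the tolerances are my own choices rather than anything from the talk, and the geometric median is computed with the Weiszfeld iteration described under "Computational aspects" later.

```python
import numpy as np

def geometric_median(points, n_iter=200, tol=1e-8):
    """Geometric median of the rows of `points` via Weiszfeld's algorithm."""
    z = points.mean(axis=0)              # start inside the convex hull
    for _ in range(n_iter):
        dist = np.linalg.norm(points - z, axis=1)
        if np.any(dist < tol):           # iterate landed on a data point
            break
        w = 1.0 / dist                   # Weiszfeld weights ||x_j - z||^{-1}
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            z = z_new
            break
        z = z_new
    return z

def median_of_means(X, delta=0.01):
    """Geometric-median-of-means estimate of E[X] from the rows of X."""
    k = int(np.log(1.0 / delta)) + 1     # k = floor(log(1/delta)) + 1 blocks
    block_means = np.vstack([b.mean(axis=0) for b in np.array_split(X, k)])
    return geometric_median(block_means)

# heavy-tailed sample with finite variance: Student's t with 2.1 degrees of freedom
rng = np.random.default_rng(0)
X = rng.standard_t(df=2.1, size=(10_000, 5))
mu_hat = median_of_means(X, delta=0.01)  # true mean is 0
```

Despite the heavy tails, the blockwise median typically lands much closer to the true mean than a handful of corrupted blocks would suggest, which is exactly the point of the corollary.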
Interesting case: D ≫ n and λ₀ is sparse, meaning that

  N(λ₀) := |supp(λ₀)| = |{j : λ₀,ⱼ ≠ 0}| = s ≪ D.

In this situation, a (variant of the) famous Lasso estimator [Tibshirani '96]

  λ̂_ε := argmin_{λ ∈ ℝ^D} (1/n) Σ_{j=1}^n (Y_j − λᵀx_j)² + ε‖λ‖₁

provides a good approximation of λ₀.

Sparse recovery: heavy-tailed noise case

t > 0 — fixed, k = ⌊t⌋ + 1, m = ⌊n/k⌋;
{1, …, n} = G₁ ∪ … ∪ G_k, |G_j| ≈ m;
X_l = (x_{j₁} | … | x_{j_m})ᵀ, j_i ∈ G_l, l = 1, …
, k;

  λ̂_ε^l := argmin_{λ ∈ ℝ^D} (1/|G_l|) Σ_{j ∈ G_l} (Y_j − λᵀx_j)² + ε‖λ‖₁,   λ̂_ε^∗ := med(λ̂_ε^1, …, λ̂_ε^k).

Let 1 ≤ s ≤ D and c₀ > 0. Restricted Eigenvalue condition [Bickel, Ritov, Tsybakov '09]:

  κ_l(s, c₀) := min_{J ⊂ {1…D}, |J| ≤ s}  min_{u ∈ ℝ^D, u ≠ 0, ‖u_{Jᶜ}‖₁ ≤ c₀‖u_J‖₁}  ‖X_l u‖ / (√n ‖u_J‖) > 0,   l = 1, …, k.

Assumptions:
1. Var(ξ_j) ≤ σ²;
2. ‖x_j‖_∞ ≤ M, 1 ≤ j ≤ n;
3. κ̄(2N(λ₀), 3) := min_{1 ≤ l ≤ k} κ_l(2N(λ₀), 3) > 0.

Theorem (M., 2013). For any

  ε ≥ 120Mσ √((t + 1) log(2D) / n),

with probability ≥ 1 − e^{−t},

  ‖λ̂_ε^∗ − λ₀‖ ≤ (112√6 / 3) · √(2N(λ₀)) ε / κ̄²(2N(λ₀), 3).

Low-rank matrix recovery

(X, Y) ∈ ℝ^{D×D} × ℝ is generated according to the trace regression model:

  Y = ⟨A₀, X⟩ + ξ,

where
1. ⟨A₁, A₂⟩ := tr(A₁ᵀA₂);
2. A₀ ∈ ℝ^{D×D} is a fixed symmetric matrix;
3. X ∈ ℝ^{D×D} is a random symmetric matrix;
4. ξ is zero-mean, independent of X, and Var(ξ) ≤ σ².

We will concentrate on 2 particular cases with the property that E⟨A, X⟩² = ‖A‖²_F:
1. X is such that {X_{i,j}, 1 ≤ i ≤ j ≤ D} are i.i.d. centered normal random variables with EX_{i,j}² = 1/2 for i < j and EX_{i,i}² = 1, i = 1, …, D.
2. X is such that X_{i,j} = ε_{i,j}/√2 for i < j and X_{i,i} = ε_{i,i}, 1 ≤ i ≤ D, where ε_{i,j} are i.i.d. Rademacher random variables.

Problem: estimate A₀.
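The isometry property E⟨A, X⟩² = ‖A‖²_F of the first (Gaussian) design is easy to check by simulation: for symmetric A, Var⟨A, X⟩ = Σ_i A_{ii}² + 2 Σ_{i<j} A_{ij}² = ‖A‖²_F. A small Monte-Carlo sketch (the dimension, sample size, and names are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6

def gaussian_design(rng, D):
    """Random symmetric X with E X_ij^2 = 1/2 for i < j and E X_ii^2 = 1."""
    U = np.triu(rng.normal(scale=np.sqrt(0.5), size=(D, D)), k=1)
    X = U + U.T                                  # symmetric, zero diagonal
    X[np.diag_indices(D)] = rng.normal(size=D)   # N(0, 1) on the diagonal
    return X

# a fixed symmetric test matrix A
A = rng.normal(size=(D, D))
A = (A + A.T) / 2

samples = np.array([np.sum(A * gaussian_design(rng, D)) ** 2
                    for _ in range(20_000)])
empirical = samples.mean()        # Monte-Carlo estimate of E<A, X>^2
exact = np.sum(A ** 2)            # ||A||_F^2
```

Here `np.sum(A * X)` is the trace inner product ⟨A, X⟩, and `empirical` should agree with `exact` up to Monte-Carlo error of a few percent.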
(X₁, Y₁), …, (Xₙ, Yₙ) — i.i.d. observations with the same distribution as (X, Y). Usually n ≪ D²: impossible to estimate A₀ consistently. If A₀ is (approximately) low-rank, then

  Â_ε := argmin_{A ∈ 𝕃} (1/n) Σ_{j=1}^n (Y_j − ⟨A, X_j⟩)² + ε‖A‖_∗

is a good approximation to A₀. Here 𝕃 is a closed, convex subset of the set of all D × D symmetric matrices and ‖A‖_∗ is the nuclear norm of A.

Low-rank matrix recovery: heavy-tailed noise case

Given δ, set k = ⌊log(1/δ)⌋ + 1, {X₁, …, Xₙ} = G₁ ∪ … ∪ G_k, |G_i| ≥ ⌊n/k⌋, i = 1, …, k. For 1 ≤ i ≤ k,

  Â_ε^{(i)} := argmin_{A ∈ 𝕃} (1/|G_i|) Σ_{j ∈ G_i} (Y_j − ⟨A, X_j⟩)² + ε‖A‖_∗,   and   Â_ε^∗ := med_F(Â^{(1)}, …, Â^{(k)}),

where the geometric median med_F is taken with respect to the Frobenius norm.

Assume that
1. R_𝕃 := sup_{A ∈ 𝕃} ‖A‖₁ < ∞;
2. ‖ξ‖_{2,1} := ∫₀^∞ √(Pr(|ξ| > x)) dx < ∞ (e.g., this holds if E|ξ|^{2+τ} < ∞ for some τ > 0).
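Assumption (2) can be illustrated numerically. The sketch below is my own, not from the talk: it approximates ‖ξ‖_{2,1} for a hypothetical Pareto-type tail Pr(|ξ| > x) = (1 + x)^{−(2+τ)}, for which E|ξ|^{2+s} < ∞ exactly when s < τ, and the integral has the closed form ∫₀^∞ (1 + x)^{−(2+τ)/2} dx = 2/τ (the integral diverges at τ = 0, i.e., with only a finite variance).

```python
import math

def norm_2_1(sf, lo=1e-9, hi=1e9, n=20_000):
    """Trapezoidal approximation of ||xi||_{2,1} = int_0^inf sqrt(P(|xi| > x)) dx."""
    # geometric grid from `lo` to `hi`, plus the left endpoint 0
    xs = [0.0] + [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]
    return sum(
        0.5 * (math.sqrt(sf(a)) + math.sqrt(sf(b))) * (b - a)
        for a, b in zip(xs, xs[1:])
    )

tau = 1.0
val = norm_2_1(lambda x: (1.0 + x) ** -(2.0 + tau))
# analytic value for tau = 1 is 2/tau = 2, up to the truncated tail beyond `hi`
```

The quadrature and grid choices are arbitrary; any tail function with slightly more than two moments gives a finite value, which is all the theorem needs.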
Theorem (M., 2013). There exist constants c, C, B with the following property: let κ := log log₂(DR_𝕃), s_{n,t,D} := (5 + κ) log(n/t) + log(2D), and assume that s_{n,t,D} ≤ c(n/t). Then for all

  ε ≥ B‖ξ‖_{2,1} √(Dt log(2D) / n),

with probability ≥ 1 − 2e^{−t},

  ‖Â_ε^∗ − A₀‖²_F ≤ inf_{A ∈ 𝕃} [ 2‖A − A₀‖²_F + C( ε² rank(A) + R_𝕃² s_{n,t,D} Dt/n + t/n ) ].

Proof. Main idea: the size of the regularization parameter ε is closely related to the operator norm of

  Θ := (1/n) Σ_{j=1}^n ξ_j X_j.

To obtain a "weak bound" on Pr(‖Θ‖_Op ≥ s), it is enough to bound E‖Θ‖_Op.

Key step: replace the heavy-tailed ξ_j by Gaussian random variables η_j [multiplier inequality]:

  E‖(1/√n) Σ_{j=1}^n ξ_j X_j‖_Op ≤ 2√2 ‖ξ‖_{2,1} max_{1 ≤ i ≤ n} E‖(1/√i) Σ_{j=1}^i η_j X_j‖_Op.

Applications to Bayesian Inference
(joint work with S. Srivastava, L. Lin and D. Dunson)

{P_θ, θ ∈ Θ} — a family of probability distributions over ℝ^D indexed by Θ.

X_n = {X₁, …, Xₙ} are i.i.d.
ℝ^D-valued random vectors with distribution P₀ := P_{θ₀} for some θ₀ ∈ Θ. Let Π be the prior distribution over Θ. The posterior distribution given X_n is a random probability measure on Θ defined as

  B ↦ Πₙ(B|X_n) := ∫_B Π_{i=1}^n p_θ(X_i) dΠ(θ) / ∫_Θ Π_{i=1}^n p_θ(X_i) dΠ(θ).

Under general conditions, Πₙ(·|X_n) "contracts" towards δ_{θ₀}.

Distances between probability measures

(𝕏, ρ) — separable metric space.
ℱ = {f : 𝕏 → ℝ} — a collection of real-valued functions. Given two Borel probability measures P, Q on 𝕏, define

  ‖P − Q‖_ℱ := sup_{f ∈ ℱ} |∫_𝕏 f(x) d(P − Q)(x)|.

Important case: ℱ is the unit ball in a Reproducing Kernel Hilbert Space (RKHS) (ℍ, ⟨·,·⟩_ℍ) with a reproducing kernel k : 𝕏 × 𝕏 → ℝ:

  ℱ = ℱ_k := {f : 𝕏 → ℝ, ‖f‖_ℍ := √⟨f, f⟩_ℍ ≤ 1}.

Fact (Sriperumbudur et al., 2010):

  ‖P − Q‖_{ℱ_k} = ‖∫_𝕏 k(x, ·) d(P − Q)(x)‖_ℍ.

P ↦ ∫_𝕏 k(x, ·) dP(x) is an embedding into the Hilbert space ℍ, and ‖P − Q‖_{ℱ_k} is easy to approximate.

M-posterior ("median posterior") distribution

{X₁, …, Xₙ} = G₁ ∪ … ∪ G_m.
Subset posteriors:

  B ↦ Πₙ(B|G_j) := ∫_B Π_{i ∈ G_j} p_θ(X_i) dΠ(θ) / ∫_Θ Π_{i ∈ G_j} p_θ(X_i) dΠ(θ).

Define the M-posterior as the geometric median (computed in the RKHS-embedding metric above):

  Π̂ₙ := med(Πₙ^{(1)}, …, Πₙ^{(m)}).

Computational aspects

Given x₁, …, x_k ∈ ℝ^D, how to compute x∗ = med(x₁, …, x_k)? Famous Weiszfeld's algorithm: starting from some z₀ in the affine hull of {x₁, …, x_k}, iterate

  z_{m+1} = Σ_{j=1}^k α_{m+1}^{(j)} x_j,   where α_{m+1}^{(j)} = ‖x_j − z_m‖^{−1} / Σ_{j=1}^k ‖x_j − z_m‖^{−1}.

It converges for all but countably many initial points; various modifications are possible.

Numerical simulation: PCA

X₁, …
, X₁₅₆ ∈ ℝ¹²⁰;

X_i has the distribution of AY_i, where the coordinates of Y are independent with density p(y) = 3y² / (2(1 + |y|³)²);

A is a full-rank diagonal matrix with 5 "large" eigenvalues {5^{1/2}, 6^{1/2}, 7^{1/2}, 8^{1/2}, 9^{1/2}}; the remaining diagonal elements are equal to 1/√120.

Additionally, the data set contained 4 "outliers" Z₁, …, Z₄ from the uniform distribution on [−20, 20]¹²⁰.

Error is measured by ‖Proj₅(Σ̂) − Proj₅(Σ)‖_Op.

[Figures: error histograms for the "geometric median" estimator, the sample covariance estimator, and the "thresholded geometric median" estimator.]
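The simulation design above can be reproduced with a short inverse-CDF sampler: for y ≥ 0 the density integrates to F(y) = 1 − 1/(2(1 + y³)), which is explicitly invertible. A sketch loosely following the slide's setup (the seed and helper names are my own; this is not the original simulation code):

```python
import numpy as np

def quantile(u):
    """Inverse CDF of p(y) = 3y^2 / (2(1+|y|^3)^2); F(y) = 1 - 1/(2(1+y^3)) for y >= 0."""
    u = np.asarray(u, dtype=float)
    pos = u >= 0.5
    out = np.empty_like(u)
    out[pos] = (1.0 / (2.0 * (1.0 - u[pos])) - 1.0) ** (1.0 / 3.0)
    out[~pos] = -((1.0 / (2.0 * u[~pos]) - 1.0) ** (1.0 / 3.0))
    return out

def cdf(y):
    y = np.asarray(y, dtype=float)
    return np.where(y >= 0, 1.0 - 1.0 / (2.0 * (1.0 + y ** 3)),
                    1.0 / (2.0 * (1.0 + (-y) ** 3)))

# sanity check: F(F^{-1}(u)) = u, deterministically
u_check = np.array([0.05, 0.3, 0.7, 0.99])
roundtrip_err = float(np.max(np.abs(cdf(quantile(u_check)) - u_check)))

rng = np.random.default_rng(2)
Y = quantile(rng.uniform(size=(156, 120)))            # heavy-tailed coordinates
eigs = np.concatenate([np.sqrt(np.arange(5.0, 10.0)), # 5 "large" eigenvalues
                       np.full(115, 1.0 / np.sqrt(120.0))])
X = Y * eigs                                          # rows X_i = A Y_i, A diagonal
Z = rng.uniform(-20.0, 20.0, size=(4, 120))           # 4 uniform "outliers"
data = np.vstack([X, Z])                              # 160 x 120 data set
```

The density has tails of order |y|^{−4}, so the coordinates have a finite variance but no third moment, which is what makes the sample covariance fragile in this experiment.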
References

1. S. Minsker (2013). Geometric median and robust estimation in Banach spaces.
2. S. Minsker, S. Srivastava, L. Lin and D. Dunson (2014). Robust and scalable Bayes via a median of subset posterior measures.
3. A. Nemirovski and D. Yudin (1983). Problem Complexity and Method Efficiency in Optimization.
4. M. Lerasle and R. I. Oliveira (2011). Robust empirical mean estimators.
5. O. Catoni (2012). Challenging the empirical mean and empirical variance: a deviation study.
6. V. Koltchinskii (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems.
7. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes.
8. B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf and G. Lanckriet (2010). Hilbert space embeddings and metrics on probability measures.
9. D. Hsu and S. Sabato (2014). Loss minimization and parameter estimation with heavy tails.

Thank you for your attention!