Geometric Median: Applications to Robust and Scalable Statistical
Estimation
Stanislav Minsker
LDHD SAMSI Workshop
March 31, 2014
Stanislav Minsker (LDHD SAMSI Workshop) · Geometric Median and Robust Estimation · March 31, 2014
Challenges of Contemporary Statistical Science

Resource limitations: massive data sets require computer clusters for storage and efficient processing
=⇒ requires algorithms that can be implemented in parallel.

[Diagram: worker nodes Node 1, Node 2, …, Node k reporting to a Master node.]

Presence of outliers of unknown nature
=⇒ requires algorithms that are robust and do not rely on preprocessing or outlier detection.

While ad hoc techniques exist for some problems, we would like to develop general methods.
Example: how to estimate the mean?

Assume that X_1, …, X_n are i.i.d. N(µ, σ²).
Problem: construct CI_norm(t) for µ with coverage probability ≥ 1 − 2e^{−t}.

Solution: compute µ̂* := (1/n) Σ_{j=1}^n X_j and take

    CI_norm(t) = [ µ̂* − σ√2 · √(t/n),  µ̂* + σ√2 · √(t/n) ].

To find µ̂*: set m = n/k and split the sample into k blocks

    X_1, …, X_m   ……   X_{n−m+1}, …, X_n,

compute the block means µ̂_1 := (1/m) Σ_{j=1}^m X_j, …, µ̂_k := (1/m) Σ_{j=n−m+1}^n X_j, and average them: µ̂* = (1/k) Σ_{j=1}^k µ̂_j.
Example: how to estimate the mean?

What if X, X_1, …, X_n are i.i.d. from Π with EX = µ, Var(X) = σ²?
Problem: construct CI(t) for µ with coverage probability ≥ 1 − e^{−t} such that for any t

    length(CI(t)) ≤ (absolute constant) · length(CI_norm(t)).

No additional assumptions on Π are imposed.

Remark: the guarantee for the sample mean µ̂_n = (1/n) Σ_{j=1}^n X_j is unsatisfactory, since Chebyshev's inequality only gives

    Pr( |µ̂_n − µ| ≥ σ √(e^t / n) ) ≤ e^{−t}.

Does a solution exist?
Answer (somewhat unexpected?): Yes!

Construction [A. Nemirovski, D. Yudin ‘83; N. Alon, Y. Matias, M. Szegedy ‘96; R. Oliveira, M. Lerasle ‘11]:
Split the sample into k = ⌊t⌋ + 1 groups G_1, …, G_k of size ≈ n/t each,

    X_1, …, X_{|G_1|}   ……   X_{n−|G_k|+1}, …, X_n,

compute the group means µ̂_1 := (1/|G_1|) Σ_{X_i∈G_1} X_i, …, µ̂_k := (1/|G_k|) Σ_{X_i∈G_k} X_i, and set

    µ̂* = µ̂*(t) := median(µ̂_1, …, µ̂_k).

Claim:

    Pr( |µ̂* − µ| ≥ (abs. const) × σ √(t/n) ) ≤ e^{−t}.

Then take

    CI(t) = [ µ̂* − Cσ √(t/n),  µ̂* + Cσ √(t/n) ].
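The median-of-means construction above takes only a few lines of code. The sketch below is illustrative, not from the talk: the function name and the Student-t test distribution are my own choices.

```python
import numpy as np

def median_of_means(x, t):
    """Median-of-means estimator of E[X]: split x into k = floor(t) + 1
    blocks of (nearly) equal size, average each block, and return the
    median of the block means.  The claim above gives
    |mu_hat - mu| <= 6.5 * sigma * sqrt(t/n) with probability >= 1 - e^{-t}."""
    x = np.asarray(x, dtype=float)
    k = int(np.floor(t)) + 1
    block_means = [block.mean() for block in np.array_split(x, k)]
    return float(np.median(block_means))

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=10_000)   # heavy-tailed, mean 0, finite variance
print(median_of_means(x, t=6.0))        # robust estimate of the mean
print(x.mean())                         # plain sample mean, for comparison
```

Note that the estimator depends on the target confidence level t through the number of blocks, exactly as on the slide.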
Proof of the claim:

[Figure: block means µ̂_1, …, µ̂_8 plotted as points together with their median µ̂.]

1. |µ̂ − µ| ≥ s =⇒ at least half of the events {|µ̂_j − µ| ≥ s} occur.
2. Pr( at least half of the events {|µ̂_j − µ| ≥ s} occur )

       ≤ (k choose k/2) ( Pr(|µ̂_1 − µ| ≥ s) )^{k/2} ≤ (2e)^{k/2} ( Var(X) · k / (n s²) )^{k/2} ≤ e^{−k}

   whenever s ≥ (2e³)^{1/2} σ √(t/n).

Since k = ⌊t⌋ + 1, the result follows (with “absolute constant” ≤ 6.5).
Extensions to higher dimensions

A natural question: is it possible to extend the presented method to the multivariate mean?
Naive approach: apply the "median trick" coordinatewise. This makes the bound dimension-dependent.
Can we do better? Yes! – replace the usual median by the geometric median.
Definition
Let (X, ‖·‖) be a separable, reflexive Banach space and x_1, …, x_k ∈ X. The geometric median x* is defined as

    x* = med(x_1, …, x_k) := argmin_{y∈X} Σ_{j=1}^k ‖y − x_j‖.

[Figure: points x_1, …, x_k in the plane together with their geometric median.]

Remarks:
- If X is a Hilbert space, then x* ∈ convex hull(x_1, …, x_k).
- Other generalizations of the median are possible (e.g., A. Nemirovski and D. Yudin ‘83, D. Hsu and S. Sabato ‘14).
Main result

For 0 < p < α < 1/2, define

    ψ(α; p) = (1 − α) log( (1 − α)/α ) + α log( (1 − p)/p ).

Theorem (M., 2013)
Fix α ∈ (0, 1/2). Assume that µ ∈ X is a parameter of interest, and let µ̂_1, …, µ̂_k ∈ X be a collection of independent estimators of µ. Suppose ε > 0 and p < α are such that for all 1 ≤ j ≤ k,

    Pr( ‖µ̂_j − µ‖ > ε ) ≤ p    (“weak concentration”).    (1)

Let

    µ̂ := med(µ̂_1, …, µ̂_k).    (2)

Then

    Pr( ‖µ̂ − µ‖ > C_α ε ) ≤ e^{−k ψ(α; p)}    (“strong concentration”),    (3)

where C_α = (1 − α) √(1/(1 − 2α)) for the Hilbert space case and C_α = 2(1 − α)/(1 − 2α) otherwise.
Handling the “outliers”

For 0 < p < α < 1/2, define ψ(α; p) = (1 − α) log( (1 − α)/α ) + α log( (1 − p)/p ) as before.

Theorem (M., 2013)
Fix α ∈ (0, 1/2). Assume that µ ∈ X is a parameter of interest, and let µ̂_1, …, µ̂_k ∈ X be a collection of independent estimators of µ. Suppose ε > 0, p < α and 0 ≤ γ < (α − p)/(1 − p) are such that for all 1 ≤ j ≤ (1 − γ)k + 1,

    Pr( ‖µ̂_j − µ‖ > ε ) ≤ p    (“weak concentration”).    (4)

(In other words, up to a γ-fraction of the estimators µ̂_j may be arbitrary “outliers”.)

Let

    µ̂ := med(µ̂_1, …, µ̂_k).    (5)

Then

    Pr( ‖µ̂ − µ‖ > C_α ε ) ≤ e^{−k(1−γ) ψ( (α−γ)/(1−γ); p )}    (“strong concentration”),    (6)

where C_α = (1 − α) √(1/(1 − 2α)) for the Hilbert space case and C_α = 2(1 − α)/(1 − 2α) otherwise.
Example: estimation of the mean

(H, ‖·‖) – a Hilbert space; X, X_1, …, X_n ∈ H – an i.i.d. sample from a distribution Π such that

    EX = µ,   E[(X − µ) ⊗ (X − µ)] = Γ

is the covariance operator, and E‖X − µ‖² = tr(Γ) < ∞.

Given δ, set k = ⌊log(1/δ)⌋ + 1,

    {X_1, …, X_n} = G_1 ∪ … ∪ G_k,   |G_i| ≳ ⌊n/k⌋, i = 1 … k,

    µ̂_j := (1/|G_j|) Σ_{i∈G_j} X_i,  j = 1 … k,
    µ̂* = µ̂*(δ) := med(µ̂_1, …, µ̂_k).

Result for µ̂*:

Corollary

    Pr( ‖µ̂* − µ‖ ≥ 15 √( tr(Γ) log(e/δ) / n ) ) ≤ δ.
Application to “robust PCA”: Y_i ∈ R^D, EY_i = 0, EY_i Y_iᵀ = Σ; set X_i = Y_i Y_iᵀ. Then the corollary implies

    Pr( ‖Proj_m(Σ̂) − Proj_m(Σ)‖_F ≥ (30/∆_m) √( (E‖X‖⁴ − tr(Σ²)) log(e/δ) / n ) ) ≤ δ,

where ∆_m = λ_m(Σ) − λ_{m+1}(Σ) is the m-th spectral gap of Σ and Proj_m denotes the orthogonal projector onto the top m eigenvectors.
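The estimator on this slide is straightforward to prototype. In the sketch below (function names and the heavy-tailed test distribution are my own choices), the geometric median of the block means is computed with the Weiszfeld iteration discussed on the computational-aspects slide:

```python
import numpy as np

def geometric_median(points, n_iter=100, tol=1e-10):
    """Geometric median of the rows of `points`, via Weiszfeld's iteration."""
    z = points.mean(axis=0)                  # starting point: coordinatewise mean
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - z, axis=1), tol)
        w = 1.0 / d
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

def geomedian_of_means(X, delta):
    """k = floor(log(1/delta)) + 1 blocks, block means, geometric median."""
    k = int(np.floor(np.log(1.0 / delta))) + 1
    block_means = np.array([g.mean(axis=0) for g in np.array_split(X, k)])
    return geometric_median(block_means)

rng = np.random.default_rng(1)
X = rng.standard_t(df=3, size=(5000, 20))    # heavy-tailed sample, true mean 0
mu_hat = geomedian_of_means(X, delta=0.01)
print(np.linalg.norm(mu_hat))                # distance to the true mean
```

Unlike the coordinatewise "median trick", the deviation bound for this estimator depends on the dimension only through tr(Γ).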
High-dimensional sparse linear regression

Y_j, j = 1 … n – noisy linear measurements of λ_0 ∈ R^D:

    Y_j = λ_0ᵀ x_j + ξ_j,    (7)

where
1. x_1, …, x_n ∈ R^D – a fixed collection of vectors;
2. ξ_j, j = 1 … n – independent zero-mean random variables with Var(ξ_j) ≤ σ².

Problem: estimate λ_0.

Interesting case: D ≫ n and λ_0 is sparse, meaning that

    N(λ_0) := |supp(λ_0)| = |{j : λ_{0,j} ≠ 0}| = s ≪ D.

In this situation, a variant of the famous Lasso estimator [Tibshirani ‘96],

    λ̂_ε := argmin_{λ∈R^D} [ (1/n) Σ_{j=1}^n (Y_j − λᵀ x_j)² + ε ‖λ‖_1 ],

provides a good approximation of λ_0.
Sparse recovery: heavy-tailed noise case

t > 0 – fixed; k = ⌊t⌋ + 1, m = ⌊n/k⌋;
{1, …, n} = G_1 ∪ … ∪ G_k, |G_j| ≈ m;
X_l = (x_{j_1} | … | x_{j_m})ᵀ, j_i ∈ G_l, l = 1, …, k;

    λ̂_ε^l := argmin_{λ∈R^D} [ (1/|G_l|) Σ_{j∈G_l} (Y_j − λᵀ x_j)² + ε ‖λ‖_1 ],

    λ̂_ε* := med(λ̂_ε^1, …, λ̂_ε^k).
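A prototype of this block-Lasso scheme is below. It is a sketch under my own implementation choices, none of which are prescribed by the slides: ISTA (proximal gradient) as the Lasso solver and Weiszfeld's iteration for the geometric median.

```python
import numpy as np

def soft_threshold(v, s):
    return np.sign(v) * np.maximum(np.abs(v) - s, 0.0)

def lasso_ista(X, y, eps, n_iter=500):
    """Minimize (1/m)||y - X @ w||^2 + eps * ||w||_1 by proximal gradient."""
    m = X.shape[0]
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / m      # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 / m * X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, eps / L)
    return w

def median_of_lassos(X, y, eps, t):
    """Lasso on k = floor(t) + 1 disjoint blocks, then the geometric median
    (Weiszfeld's iteration) of the k block solutions."""
    k = int(np.floor(t)) + 1
    blocks = np.array_split(np.arange(len(y)), k)
    sols = np.array([lasso_ista(X[b], y[b], eps) for b in blocks])
    z = sols.mean(axis=0)
    for _ in range(200):
        d = np.maximum(np.linalg.norm(sols - z, axis=1), 1e-10)
        z = (sols / d[:, None]).sum(axis=0) / (1.0 / d).sum()
    return z

rng = np.random.default_rng(2)
n, D = 600, 50
lam0 = np.zeros(D); lam0[:3] = [5.0, -4.0, 3.0]   # sparse target
X = rng.standard_normal((n, D))
y = X @ lam0 + rng.standard_t(df=3, size=n)       # heavy-tailed noise
lam_hat = median_of_lassos(X, y, eps=0.3, t=3.0)
print(np.linalg.norm(lam_hat - lam0))
```

Each block solve is independent, so the k Lasso problems can run in parallel, matching the cluster setting from the first slide.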
Sparse recovery: heavy-tailed noise case

Let 1 ≤ s ≤ D and c_0 > 0. Restricted Eigenvalue condition [Bickel, Ritov, Tsybakov ‘09]:

    κ_l(s, c_0) := min_{J⊂{1…D}, |J|≤s}  min_{u∈R^D, u≠0, ‖u_{J^c}‖_1 ≤ c_0 ‖u_J‖_1}  ‖X_l u‖ / (√n ‖u_J‖) > 0,   l = 1, …, k.

Assumptions:
1. Var(ξ_j) ≤ σ²;
2. ‖x_j‖_∞ ≤ M, 1 ≤ j ≤ n;
3. κ̄(2N(λ_0), 3) := min_{1≤l≤k} κ_l(2N(λ_0), 3) > 0.

Theorem (M., 2013)
For any

    ε ≥ 120 M σ √( (t + 1) log(2D) / n ),

with probability ≥ 1 − e^{−t},

    ‖λ̂_ε* − λ_0‖² ≤ (112√6 / 3) · ε² N(λ_0) / κ̄⁴(2N(λ_0), 3).
Low-rank matrix recovery

(X, Y) ∈ R^{D×D} × R is generated according to the trace regression model

    Y = ⟨A_0, X⟩ + ξ,

where
1. ⟨A_1, A_2⟩ := tr(A_1ᵀ A_2);
2. A_0 ∈ R^{D×D} is a fixed symmetric matrix;
3. X ∈ R^{D×D} is a random symmetric matrix;
4. ξ is zero-mean, independent of X, with Var(ξ) ≤ σ².

We will concentrate on 2 particular cases with the property that E⟨A, X⟩² = ‖A‖_F²:
1. X is such that {X_{i,j}, 1 ≤ i ≤ j ≤ D} are i.i.d. centered normal random variables with EX_{i,j}² = 1/2 for i < j and EX_{i,i}² = 1, i = 1 … D;
2. X is such that X_{i,j} = ε_{i,j}/√2 for i < j and X_{i,i} = ε_{i,i}, 1 ≤ i ≤ D, where the ε_{i,j} are i.i.d. Rademacher random variables.

Problem: estimate A_0.
(X_1, Y_1), …, (X_n, Y_n) – i.i.d. observations with the same distribution as (X, Y). Usually n ≪ D², so it is impossible to estimate A_0 consistently in general. If A_0 is (approximately) low-rank, then

    Â_ε := argmin_{A∈L} [ (1/n) Σ_{j=1}^n (Y_j − ⟨A, X_j⟩)² + ε ‖A‖_* ]

is a good approximation to A_0. Here, L is a closed, convex subset of the set of all D × D symmetric matrices and ‖A‖_* is the nuclear norm of A.
Low-rank matrix recovery: heavy-tailed noise case

Given δ, set k = ⌊log(1/δ)⌋ + 1,

    {X_1, …, X_n} = G_1 ∪ … ∪ G_k,   |G_i| ≳ ⌊n/k⌋, i = 1 … k.

For 1 ≤ i ≤ k,

    Â_ε^{(i)} := argmin_{A∈L} [ (1/|G_i|) Σ_{j∈G_i} (Y_j − ⟨A, X_j⟩)² + ε ‖A‖_* ]   and   Â_ε* := med_F(Â^{(1)}, …, Â^{(k)}),

where med_F denotes the geometric median with respect to the Frobenius norm.

Assume that
1. R_L := sup_{A∈L} ‖A‖_1 < ∞;
2. ‖ξ‖_{2,1} := ∫_0^∞ √(Pr(|ξ| > x)) dx < ∞ (this holds, e.g., if E|ξ|^{2+τ} < ∞ for some τ > 0).
Theorem
There exist constants c, C, B with the following property: let κ := log log₂(DR_L), s_{n,t,D} := (5 + κ) log(n/t) + log(2D), and assume that s_{n,t,D} ≤ c(n/t). Then for all

    ε ≥ B ‖ξ‖_{2,1} √( Dt log(2D) / n ),

with probability ≥ 1 − 2e^{−t},

    ‖Â_ε* − A_0‖_F² ≤ inf_{A∈L} [ 2 ‖A − A_0‖_F² + C ( ε² rank(A) + R_L² s_{n,t,D} · Dt/n + t/n ) ].
Theorem (M., 2013)
There exist constants c, C, B with the following property: let κ := log log₂(DR_L), s_{n,t,D} := (log(2/p*(α)) + κ) log(n/t) + log(2D), and assume that s_{n,t,D} ≤ c(n/t). Then for all

    ε ≥ B ‖ξ‖_{2,1} √( Dt log(2D) / n ),

with probability ≥ 1 − 2e^{−t},

    ‖Â_ε* − A_0‖_F² ≤ inf_{A∈L} [ 2 ‖A − A_0‖_F² + C ( ε² rank(A) + R_L² s_{n,t,D} · Dt/n + t/n ) ].

Proof:
Main idea: the size of the regularization parameter ε is closely related to the operator norm of

    Θ := (1/n) Σ_{j=1}^n ξ_j X_j.

To obtain a “weak bound” on Pr( ‖Θ‖_Op ≥ s ), it is enough to bound E‖Θ‖_Op.
Key step: replace the heavy-tailed ξ_j by Gaussian random variables η_j [multiplier inequality]:

    E‖ (1/√n) Σ_{j=1}^n ξ_j X_j ‖_Op ≤ 2√2 ‖ξ‖_{2,1} max_{1≤i≤n} E‖ (1/√i) Σ_{j=1}^i η_j X_j ‖_Op.
Applications to Bayesian Inference
(joint work with S. Srivastava, L. Lin and D. Dunson)

{P_θ, θ ∈ Θ} – a family of probability distributions over R^D indexed by Θ.
X^n = {X_1, …, X_n} are i.i.d. R^D-valued random vectors with distribution P_0 := P_{θ_0} for some θ_0 ∈ Θ.
Let Π be the prior distribution over Θ.
The posterior distribution given X^n is the random probability measure on Θ defined by

    Π_n(B | X^n) := ( ∫_B ∏_{i=1}^n p_θ(X_i) dΠ(θ) ) / ( ∫_Θ ∏_{i=1}^n p_θ(X_i) dΠ(θ) ).

Under general conditions, Π_n(· | X^n) “contracts” towards δ_{θ_0}.
Distances between probability measures

(X, ρ) – a separable metric space.
F = {f : X → R} – a collection of real-valued functions. Given two Borel probability measures P, Q on X, define

    ‖P − Q‖_F := sup_{f∈F} | ∫_X f(x) d(P − Q)(x) |.

Important case: F is the unit ball of a Reproducing Kernel Hilbert Space (RKHS) (H, ⟨·,·⟩_H) with reproducing kernel k : X × X → R:

    F = F_k := { f : X → R, ‖f‖_H := √⟨f, f⟩_H ≤ 1 }.

Fact (Sriperumbudur et al., 2010):

    ‖P − Q‖_{F_k} = ‖ ∫_X k(x, ·) d(P − Q)(x) ‖_H.

P ↦ ∫_X k(x, ·) dP(x) is an embedding into the Hilbert space H, and ‖P − Q‖_{F_k} is easy to approximate.
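For empirical measures and a Gaussian RBF kernel, ‖P − Q‖_{F_k} reduces to a Gram-matrix formula (the maximum mean discrepancy). A sketch, using the biased V-statistic form for simplicity:

```python
import numpy as np

def rkhs_distance(x, y, bandwidth=1.0):
    """||P_x - Q_y||_{F_k} for the empirical measures of samples x, y (rows),
    with k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).  The squared norm is
    mean(Kxx) - 2 * mean(Kxy) + mean(Kyy), a squared RKHS norm, hence >= 0."""
    def gram(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    sq_norm = gram(x, x).mean() - 2.0 * gram(x, y).mean() + gram(y, y).mean()
    return float(np.sqrt(max(sq_norm, 0.0)))

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=(300, 2))
y = rng.normal(0.0, 1.0, size=(300, 2))   # same distribution as x
z = rng.normal(3.0, 1.0, size=(300, 2))   # shifted distribution
print(rkhs_distance(x, y), rkhs_distance(x, z))
```

The distance between the two samples from the same distribution is near zero, while the shifted sample is far away; this is the metric under which posterior measures are compared below.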
M-posterior (“median posterior”) distribution

{X_1, …, X_n} = G_1 ∪ … ∪ G_m.

    Π_n(B | G_j) := ( ∫_B ∏_{i∈G_j} p_θ(X_i) dΠ(θ) ) / ( ∫_Θ ∏_{i∈G_j} p_θ(X_i) dΠ(θ) ).

Define the M-posterior as

    Π̂_n := med( Π_n(· | G_1), …, Π_n(· | G_m) ),

where the geometric median is taken with respect to the RKHS norm ‖·‖_{F_k} of the previous slide.
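Since the geometric median in a Hilbert space lies in the convex hull of the input points, the M-posterior is a convex combination Σ_j w_j Π_n(·|G_j). A sketch of computing the weights w from posterior draws is below; the kernel choice, the draw-based representation of each subset posterior, and the function names are my own assumptions, not prescribed by the slides.

```python
import numpy as np

def mposterior_weights(subset_draws, bandwidth=1.0, n_iter=100):
    """Weiszfeld's iteration in the RKHS: each subset posterior is represented
    by an array of draws, with mean embedding e_j = (1/N) sum_i k(theta_ji, .).
    Returns weights w such that the M-posterior is sum_j w_j * Pi_n(.|G_j)."""
    def inner(a, b):   # <e_a, e_b>_H computed from the Gram matrix of the draws
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2)).mean()
    m = len(subset_draws)
    G = np.array([[inner(a, b) for b in subset_draws] for a in subset_draws])
    w = np.full(m, 1.0 / m)                        # start from the uniform mixture
    for _ in range(n_iter):
        # squared RKHS distances || sum_i w_i e_i - e_j ||_H^2
        d2 = w @ G @ w - 2.0 * G @ w + np.diag(G)
        d = np.sqrt(np.maximum(d2, 1e-12))
        w = (1.0 / d) / (1.0 / d).sum()            # Weiszfeld update
    return w

rng = np.random.default_rng(4)
locs = [0.0, 0.05, -0.05, 0.1, 10.0]               # the last "posterior" is corrupted
draws = [rng.normal(c, 0.1, size=(200, 1)) for c in locs]
print(mposterior_weights(draws))                   # the outlier gets a small weight
```

The corrupted subset posterior receives a near-zero weight, so it barely influences the M-posterior, which is the robustness promised by the theorem for the geometric median.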
Computational aspects

Given x_1, …, x_k ∈ R^D, how do we compute x* = med(x_1, …, x_k)?

The famous Weiszfeld algorithm: starting from some z_0 in the affine hull of {x_1, …, x_k}, iterate

    z_{m+1} = Σ_{j=1}^k α_{m+1}^{(j)} x_j,   where α_{m+1}^{(j)} = ‖x_j − z_m‖^{−1} / Σ_{i=1}^k ‖x_i − z_m‖^{−1}.

The iteration converges for all but countably many initial points; various modifications are possible.
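A direct transcription of the iteration above (a sketch: the stopping rule and the starting point are common choices, not specified on the slide):

```python
import numpy as np

def weiszfeld(points, n_iter=200, tol=1e-10):
    """Geometric median of the rows of `points` via Weiszfeld's algorithm:
    z_{m+1} = sum_j alpha_j x_j with alpha_j proportional to ||x_j - z_m||^{-1}."""
    z = points.mean(axis=0)          # the mean lies in the affine hull of the x_j
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - z, axis=1), tol)
        alpha = (1.0 / d) / (1.0 / d).sum()
        z_new = alpha @ points
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

pts = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [50.0, 50.0]])
z = weiszfeld(pts)                   # pulled toward the cluster, not the outlier
print(z)
```

Note the guard against division by zero when an iterate lands exactly on a data point; this is one of the "various modifications" alluded to above.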
Numerical simulation: PCA

X_1, …, X_156 ∈ R^120;
X_i =_d A Y_i, where the coordinates of Y are independent with density p(y) = 3y² / (2(1 + |y|³)²);
A is a full-rank diagonal matrix with 5 “large” eigenvalues {5^{1/2}, 6^{1/2}, 7^{1/2}, 8^{1/2}, 9^{1/2}}; the remaining diagonal elements are equal to 1/√120;
Additionally, the data set contained 4 “outliers” Z_1, …, Z_4 from the uniform distribution on [−20, 20]^120.
The error is measured by ‖Proj_5(Σ̂) − Proj_5(Σ)‖_Op.
[Figures: histograms of ‖Proj_5(Σ̂) − Proj_5(Σ)‖_Op over repeated simulations: the “geometric median” estimator (errors roughly 0.05–0.3), the sample covariance estimator (errors roughly 0.994–1), and the “thresholded geometric median” estimator (errors roughly 0.05–0.2).]
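The simulation can be reproduced in outline. The sketch below uses my own implementation choices (an inverse-CDF sampler for p(y), 10 blocks, and a Weiszfeld step on flattened covariance matrices); the exact parameters of the talk's experiment are not fully specified on the slide.

```python
import numpy as np

def sample_y(rng, size):
    """Inverse-CDF sampling from p(y) = 3y^2 / (2(1+|y|^3)^2):
    for y > 0 the CDF is F(y) = 1 - 1/(2(1+y^3)), and p is symmetric."""
    u = rng.uniform(size=size)
    mag = (1.0 / (2.0 * np.minimum(u, 1.0 - u)) - 1.0) ** (1.0 / 3.0)
    return np.where(u > 0.5, mag, -mag)

def top_proj(S, m=5):
    """Orthogonal projector onto the span of the top-m eigenvectors of S."""
    vals, vecs = np.linalg.eigh(S)
    V = vecs[:, -m:]
    return V @ V.T

rng = np.random.default_rng(5)
D = 120
diag = np.concatenate([np.sqrt([5.0, 6.0, 7.0, 8.0, 9.0]),
                       np.full(D - 5, 1.0 / np.sqrt(D))])
A = np.diag(diag)
X = sample_y(rng, (156, D)) @ A                 # heavy-tailed data, Sigma prop. to A^2
Z = rng.uniform(-20.0, 20.0, size=(4, D))       # 4 gross outliers
data = np.vstack([X, Z])

P_true = top_proj(np.diag(diag ** 2))           # true top-5 eigenprojector
P_sample = top_proj(data.T @ data / len(data))  # plain sample covariance

# geometric median (Weiszfeld, Frobenius norm) of block covariance matrices
blocks = np.array_split(data, 10)
covs = np.array([b.T @ b / len(b) for b in blocks]).reshape(10, -1)
z = covs.mean(axis=0)
for _ in range(100):
    d = np.maximum(np.linalg.norm(covs - z, axis=1), 1e-10)
    z = (covs / d[:, None]).sum(axis=0) / (1.0 / d).sum()
P_med = top_proj(z.reshape(D, D))

print(np.linalg.norm(P_med - P_true, 2), np.linalg.norm(P_sample - P_true, 2))
```

As in the histograms above, the sample covariance projector is essentially destroyed by the 4 outliers, while the geometric-median estimator recovers the top-5 subspace with a much smaller error.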
References

1. S. Minsker (2013). Geometric median and robust estimation in Banach spaces.
2. S. Minsker, S. Srivastava, L. Lin and D. Dunson (2014). Robust and scalable Bayes via a median of subset posterior measures.
3. A. Nemirovski and D. Yudin (1983). Problem Complexity and Method Efficiency in Optimization.
4. M. Lerasle and R. I. Oliveira (2011). Robust empirical mean estimators.
5. O. Catoni (2012). Challenging the empirical mean and empirical variance: a deviation study.
6. V. Koltchinskii (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems.
7. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes.
8. B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf and G. Lanckriet (2010). Hilbert space embeddings and metrics on probability measures.
9. D. Hsu and S. Sabato (2014). Loss minimization and parameter estimation with heavy tails.
Thank you for your attention!