Regularized Robust Estimation of Mean and
Covariance Matrix under Heavy Tails and Outliers
Daniel P. Palomar
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology
2014 Workshop on Complex Systems Modeling and Estimation
Challenges in Big Data (CSM-2014)
31 July 2014
Acknowledgment
This is a joint work with
Ying Sun, Ph.D. student, Dept. ECE, HKUST.
Prabhu Babu, Postdoc, Dept. ECE, HKUST.
Outline
1. Motivation
2. Robust Covariance Matrix Estimators
   Introduction
   Examples
   Unsolved Problems
3. Robust Mean-Covariance Estimators
   Introduction
   Joint Mean-Covariance Estimation for Elliptical Distributions
4. Small Sample Regime
   Shrinkage Robust Estimator with Known Mean
   Shrinkage Robust Estimator for Unknown Mean
Basic Problem
Task: estimate the mean and covariance matrix from data $\{x_i\}$.
Difficulties: outlier-corrupted observations (heavy-tailed underlying distribution).
Figure: 1-dimensional probability distribution functions of the Student's t and Gaussian distributions.
Sample Average
A straightforward solution:
$\mu = \mathbb{E}_f[x], \qquad R = \mathbb{E}_f\big[(x-\mu)(x-\mu)^T\big]$
Substituting the empirical distribution, $f \leftarrow f_N$, gives
$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{R} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^T.$
Works well for i.i.d. Gaussian distributed data.
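As a baseline, here is a minimal NumPy sketch of these two estimators (illustrative only; the function name is not from the slides):

```python
import numpy as np

def sample_mean_cov(X):
    """Sample mean and sample covariance matrix of an (N, K) data array."""
    mu = X.mean(axis=0)
    Xc = X - mu                     # center the data
    R = Xc.T @ Xc / X.shape[0]      # 1/N normalization, as on the slide
    return mu, R
```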
Influence of Outliers
What if the data is corrupted?
A real-life example: a Kalman filter lost track of the spacecraft during an Apollo mission because of an outlier observation (caused by system noise).
Example 1: Symmetrically Distributed Outliers
$x \sim \mathrm{HeavyTail}(\mathbf{1}, R), \qquad R = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$
Figure: scatter plots of the true location and shape, the location and shape estimated by the sample average, and the location and shape estimated by the Cauchy MLE.
Figure: normalized error of $R$ for the SCM and the Cauchy MLE versus the degrees of freedom $\nu$ of the Student t-distribution.
Influence of Outliers
What if the data is corrupted?
Example 2: Asymmetrically Distributed Outliers
$x \sim 0.9\,\mathcal{N}(\mathbf{1}, R) + 0.1\,\mathcal{N}(\mu, R), \qquad \mu = \begin{bmatrix} 5 \\ -5 \end{bmatrix}, \qquad R = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$
Figure: scatter plots of the true location and shape, the location and shape estimated by the sample average, and the location and shape estimated by the Cauchy MLE.
Figure: normalized error of $R$ for the SCM and the Cauchy MLE versus the outlier percentage (0 to 20).
More Sophisticated Models
Factor model:
$y = \mu + A x + \varepsilon.$
Vector ARMA:
$\Big(1 - \sum_{i=1}^{p} \Phi_i L^i\Big)(y_t - \mu) = \Big(1 - \sum_{i=1}^{q} \Theta_i L^i\Big) u_t.$
VECM:
$\Big(1 - \sum_{i=1}^{p} \Gamma_i L^i\Big)\Delta y_t = \Phi D_t + \Pi y_{t-1} + \varepsilon_t.$
State-space model:
$y_t = C x_t + \varepsilon_t, \qquad x_t = A x_{t-1} + u_t.$
Warm-up
Recall the Gaussian distribution
$f(x) = C \det(\Sigma)^{-1/2} \exp\!\left(-\tfrac{1}{2} x^T \Sigma^{-1} x\right).$
Negative log-likelihood function
$L(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \frac{1}{2}\sum_{i=1}^{N} x_i^T \Sigma^{-1} x_i.$
Sample covariance matrix
$\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T.$
M-estimator
Minimizer of the loss function [Mar-Mar-Yoh'06]:
$L(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \sum_{i=1}^{N} \rho\!\left(x_i^T \Sigma^{-1} x_i\right).$
Solution to the fixed-point equation:
$\Sigma = \frac{1}{N}\sum_{i=1}^{N} w\!\left(x_i^T \Sigma^{-1} x_i\right) x_i x_i^T.$
If $\rho$ is differentiable, the weight function is $w = 2\rho'$.
Sample Covariance Matrix
The SCM can be viewed as
$\hat{\Sigma} = \sum_{i=1}^{N} w_i\, x_i x_i^T \quad \text{with } w_i = \frac{1}{N}, \ \forall i,$
i.e., the MLE of a Gaussian distribution with loss function
$\frac{N}{2}\log\det(\Sigma) + \frac{1}{2}\sum_{i=1}^{N} x_i^T \Sigma^{-1} x_i.$
Why is the SCM sensitive to outliers?
Sample Covariance Matrix
Consider the distance
$d_i = \sqrt{x_i^T \Sigma^{-1} x_i}.$
With $w_i = \frac{1}{N}$, normal samples and outliers contribute to $\hat{\Sigma}$ equally.
Quadratic loss.
Tyler's M-estimator
Given $f(x)$ → use MLE.
But if only $x_i \sim \mathrm{elliptical}(\mathbf{0}, \Sigma)$ is known, what shall we do?
The normalized samples $s_i \triangleq x_i / \|x_i\|_2$ have the pdf
$f(s) = C \det(R)^{-1/2}\left(s^T R^{-1} s\right)^{-K/2}.$
Loss function
$\frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^{N} \log\left(s_i^T \Sigma^{-1} s_i\right),$
where $s_i^T \Sigma^{-1} s_i$ can equivalently be replaced by $x_i^T \Sigma^{-1} x_i$ (the two differ by a factor independent of $\Sigma$).
Tyler [Tyl'J87] proposed the covariance estimator $\hat{\Sigma}$ as the solution to
$\sum_{i=1}^{N} w_i\, x_i x_i^T = \Sigma, \qquad w_i = \frac{K}{N\, x_i^T \Sigma^{-1} x_i}.$
Why is Tyler's estimator robust to outliers?
Tyler's M-estimator
Consider the distance
$d_i = \sqrt{x_i^T \Sigma^{-1} x_i}.$
The weight $w_i \propto 1/d_i^2$: outliers are down-weighted.
Logarithmic loss.
Tyler's M-estimator
Tyler's M-estimator solves the fixed-point equation
$\Sigma = \frac{K}{N}\sum_{i=1}^{N} \frac{x_i x_i^T}{x_i^T \Sigma^{-1} x_i}.$
Existence condition: $N > K$.
No closed-form solution.
Iterative algorithm:
$\tilde{\Sigma}_{t+1} = \frac{K}{N}\sum_{i=1}^{N} \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i}, \qquad \Sigma_{t+1} = \tilde{\Sigma}_{t+1} / \mathrm{Tr}\big(\tilde{\Sigma}_{t+1}\big).$
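A minimal NumPy sketch of this iteration (illustrative only; the function name, initialization, and stopping rule are assumptions, not from the slides):

```python
import numpy as np

def tyler_estimator(X, max_iter=100, tol=1e-8):
    """Tyler's M-estimator of scatter via the fixed-point iteration above.

    X is an (N, K) array of zero-mean samples; existence requires N > K.
    """
    N, K = X.shape
    Sigma = np.eye(K)
    for _ in range(max_iter):
        # d_i^2 = x_i^T Sigma^{-1} x_i for every sample
        d2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
        Sigma_new = (K / N) * (X / d2[:, None]).T @ X
        Sigma_new /= np.trace(Sigma_new)          # trace normalization
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```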
Unsolved Problems
Problem 1
What if the mean value is unknown?
Problem 2
How to deal with the small sample scenario?
Problem 3
How to incorporate prior information?
Robust M-estimators
Maronna's M-estimators [Mar'J76]:
$\frac{1}{N}\sum_{i=1}^{N} u_1\!\left((x_i - \mu)^T R^{-1} (x_i - \mu)\right)(x_i - \mu) = \mathbf{0}$
$\frac{1}{N}\sum_{i=1}^{N} u_2\!\left((x_i - \mu)^T R^{-1} (x_i - \mu)\right)(x_i - \mu)(x_i - \mu)^T = R.$
Special examples:
Huber's loss function.
MLE for Student's t-distribution.
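These estimating equations suggest a simple fixed-point iteration; here is a generic NumPy sketch (the weight functions u1 and u2 are user-supplied placeholders, e.g. a Huber-type choice, and are not the specific functions analyzed in [Mar'J76]):

```python
import numpy as np

def maronna_m_estimator(X, u1, u2, iters=100):
    """Fixed-point iteration for Maronna's M-estimating equations (sketch).

    u1, u2 map the squared distances d_i^2 = (x_i-mu)^T R^{-1} (x_i-mu)
    to per-sample weights.
    """
    N, K = X.shape
    mu, R = X.mean(axis=0), np.cov(X, rowvar=False)
    for _ in range(iters):
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(R), diff)
        w1, w2 = u1(d2), u2(d2)
        mu = (w1[:, None] * X).sum(axis=0) / w1.sum()    # location equation
        diff = X - mu
        R = (w2[:, None] * diff).T @ diff / N            # scatter equation
    return mu, R

# Hypothetical Huber-type weights with clipping constant c:
# u1 = lambda d2: np.minimum(1.0, c / np.sqrt(d2))
# u2 = lambda d2: np.minimum(1.0, c**2 / d2)
```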
MLE of the Student's t-distribution
Student's t-distribution with $\nu$ degrees of freedom:
$f(x) = C \det(R)^{-1/2}\left(1 + \frac{1}{\nu}(x - \mu)^T R^{-1} (x - \mu)\right)^{-\frac{K+\nu}{2}}.$
Negative log-likelihood
$L_\nu(\mu, R) = \frac{N}{2}\log\det(R) + \frac{K+\nu}{2}\sum_{i=1}^{N}\log\left(\nu + (x_i - \mu)^T R^{-1} (x_i - \mu)\right).$
MLE of the Student's t-distribution
Estimating equations
$\sum_{i=1}^{N} \frac{x_i - \mu}{\nu + (x_i - \mu)^T R^{-1} (x_i - \mu)} = \mathbf{0}$
$\frac{K+\nu}{N}\sum_{i=1}^{N} \frac{(x_i - \mu)(x_i - \mu)^T}{\nu + (x_i - \mu)^T R^{-1} (x_i - \mu)} = R.$
The weight $w_i(\nu) = \frac{K+\nu}{N}\cdot\frac{1}{\nu + (x_i - \mu)^T R^{-1} (x_i - \mu)}$ decreases in $\nu$.
Unique solution for $\nu \ge 1$.
Joint Mean-Covariance Estimation
Assumption: $x_i \sim \mathrm{elliptical}(\mu_0, R_0)$.
Goal: jointly estimate mean and covariance
Robust to outliers.
Easy to implement.
Provable convergence.
A natural idea:
MLE of heavy-tailed distributions.
Joint Mean-Covariance Estimation
Method: fitting $\{x_i\}$ to the Cauchy (Student's t-distribution with $\nu = 1$) likelihood function.
Conservative fitting.
Trade-off: robustness ⇔ efficiency.
Tractability.
$\hat{R} \to c\, R_0$, where $c$ depends on the unknown shape of the underlying distribution
=⇒ estimate $R/\mathrm{Tr}(R)$ instead.
Existence condition: $N > K + 1$ [Ken'J91].
Algorithm
No closed-form solution.
Numerical algorithm [Ken-Tyl-Var'J94]:
$\mu_{t+1} = \frac{\sum_{i=1}^{N} w_i(\mu_t, R_t)\, x_i}{\sum_{i=1}^{N} w_i(\mu_t, R_t)}$
$R_{t+1} = \frac{K+1}{N}\sum_{i=1}^{N} w_i(\mu_t, R_t)\,(x_i - \mu_{t+1})(x_i - \mu_{t+1})^T$
with
$w_i(\mu, R) = \frac{1}{1 + (x_i - \mu)^T R^{-1} (x_i - \mu)}.$
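A compact NumPy sketch of this Cauchy-MLE iteration (illustrative only; the function name, the median initialization, and the stopping rule are assumptions):

```python
import numpy as np

def cauchy_mle(X, iters=200, tol=1e-8):
    """Joint mean/scatter estimation via the iteration of [Ken-Tyl-Var'J94].

    X is an (N, K) data array; existence requires N > K + 1.
    """
    N, K = X.shape
    mu, R = np.median(X, axis=0), np.eye(K)
    for _ in range(iters):
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(R), diff)
        w = 1.0 / (1.0 + d2)                            # w_i(mu_t, R_t)
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu_new
        R_new = (K + 1) / N * (w[:, None] * diff).T @ diff
        converged = (np.abs(mu_new - mu).max() < tol and
                     np.abs(R_new - R).max() < tol)
        mu, R = mu_new, R_new
        if converged:
            break
    return mu, R
```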
Regularization-Known Mean
Problem: with insufficient observations, the estimator does not exist and the algorithms fail to converge.
Methods:
Diagonal loading.
Penalized or regularized loss function.
Diagonal Loading
Modified Tyler's iteration [Abr-Spe'C07]:
$\tilde{\Sigma}_{t+1} = \frac{K}{N}\sum_{i=1}^{N} \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i} + \rho I, \qquad \Sigma_{t+1} = \tilde{\Sigma}_{t+1} / \mathrm{Tr}\big(\tilde{\Sigma}_{t+1}\big).$
Provable convergence [Che-Wie-Her'J11].
Systematic way of choosing the parameter $\rho$ [Che-Wie-Her'J11].
But without a clear motivation.
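In code, the loading is a one-line change to the Tyler iteration sketched earlier (rho is the loading factor; the systematic rule of [Che-Wie-Her'J11] for choosing it is not reproduced here):

```python
import numpy as np

def loaded_tyler(X, rho, iters=100):
    """Diagonally loaded Tyler iteration (sketch)."""
    N, K = X.shape
    Sigma = np.eye(K)
    for _ in range(iters):
        d2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
        Sigma = (K / N) * (X / d2[:, None]).T @ X + rho * np.eye(K)
        Sigma /= np.trace(Sigma)        # trace normalization
    return Sigma
```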
Penalized Loss Function I
Wiesel's penalty [Wie'J12]:
$h(\Sigma) = \log\det(\Sigma) + K \log \mathrm{Tr}\left(\Sigma^{-1} T\right),$
where $\Sigma \propto T$ minimizes $h(\Sigma)$.
Penalized loss function
$L_{\mathrm{Wiesel}}(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^{N} \log\left(x_i^T \Sigma^{-1} x_i\right) + \alpha\left(\log\det(\Sigma) + K \log \mathrm{Tr}\left(\Sigma^{-1} T\right)\right).$
Algorithm
$\Sigma_{t+1} = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^{N} \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\,\frac{K\, T}{\mathrm{Tr}\left(\Sigma_t^{-1} T\right)}.$
Penalized Loss Function II
Alternative penalty: KL-divergence
$h(\Sigma) = \log\det(\Sigma) + \mathrm{Tr}\left(\Sigma^{-1} T\right),$
where $\Sigma = T$ minimizes $h(\Sigma)$.
Penalized loss function
$L_{\mathrm{KL}}(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^{N} \log\left(x_i^T \Sigma^{-1} x_i\right) + \alpha\left(\log\det(\Sigma) + \mathrm{Tr}\left(\Sigma^{-1} T\right)\right).$
Algorithm?
Questions
Existence & Uniqueness?
Which one is better?
Algorithm convergence?
Existence and Uniqueness for Wiesel's Shrinkage Estimator
Theorem [Sun-Bab-Pal'J14a]
Wiesel's shrinkage estimator exists a.s., and is also unique up to a positive scale factor, if and only if the underlying distribution is continuous and $N > K - 2\alpha$.
Existence condition for Tyler's estimator: $N > K$.
Regularization relaxes the requirement on the number of samples.
Setting $\alpha = 0$ (no regularization) recovers Tyler's condition.
Stronger confidence in the prior information ⇒ fewer samples required.
Existence and Uniqueness for KL-Shrinkage Estimator
Theorem [Sun-Bab-Pal'J14a]
The KL-shrinkage estimator exists a.s., and is also unique, if and only if the underlying distribution is continuous and $N > K - 2\alpha$.
Compared with Wiesel's shrinkage estimator:
It shares the same existence condition.
It has no scaling ambiguity.
Any connection? Which one is better?
Equivalence
Theorem [Sun-Bab-Pal'J14a]
Wiesel's shrinkage estimator and the KL-shrinkage estimator are equivalent.
Fixed-point equation for the KL-shrinkage estimator:
$\Sigma = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^{N} \frac{x_i x_i^T}{x_i^T \Sigma^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\, T.$
The solution satisfies the equality $\mathrm{Tr}\left(\Sigma^{-1} T\right) = K$.
Fixed-point equation for Wiesel's shrinkage estimator:
$\Sigma = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^{N} \frac{x_i x_i^T}{x_i^T \Sigma^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\,\frac{K\, T}{\mathrm{Tr}\left(\Sigma^{-1} T\right)}.$
Majorization-Minimization
Problem:
$\underset{x}{\text{minimize}} \ f(x) \quad \text{subject to } x \in \mathcal{X}.$
Majorization-minimization:
$x_{t+1} = \arg\min_{x \in \mathcal{X}}\ g(x \,|\, x_t)$
with a surrogate $g$ satisfying
$f(x_t) = g(x_t \,|\, x_t)$
$f(x) \le g(x \,|\, x_t) \quad \forall x \in \mathcal{X}$
$f'(x_t; d) = g'(x_t; d \,|\, x_t) \quad \forall\, x_t + d \in \mathcal{X}.$
Figure: $f(x)$ and its majorizer $g(x|x_t)$ touching at $x_t$.
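As a toy illustration of these three conditions (not from the slides), consider minimizing $f(x) = \log(1+x^2) + (x-a)^2$ in one dimension: since $\log(1+s)$ is concave in $s$, it is majorized at $x_t$ by its tangent, leaving a quadratic surrogate with a closed-form minimizer.

```python
def mm_minimize(a, x0=0.0, iters=50):
    """Toy MM example: minimize f(x) = log(1 + x^2) + (x - a)^2.

    The tangent of log(1 + s) at s_t = x_t^2 gives the quadratic majorizer
    g(x | x_t) = const + x^2 / (1 + x_t^2) + (x - a)^2.
    """
    x = x0
    for _ in range(iters):
        w = 1.0 / (1.0 + x**2)      # slope of the tangent majorizer
        x = a / (1.0 + w)           # argmin of w*x^2 + (x - a)^2
    return x
```

Each update minimizes the surrogate, so $f(x_{t+1}) \le g(x_{t+1}|x_t) \le g(x_t|x_t) = f(x_t)$: the objective never increases.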
Modified Algorithm for Wiesel's Shrinkage Estimator
Surrogate function
$g(\Sigma \,|\, \Sigma_t) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^{N} \frac{x_i^T \Sigma^{-1} x_i}{x_i^T \Sigma_t^{-1} x_i} + \alpha\left(\log\det(\Sigma) + K\,\frac{\mathrm{Tr}\left(\Sigma^{-1} T\right)}{\mathrm{Tr}\left(\Sigma_t^{-1} T\right)}\right)$
Update
$\tilde{\Sigma}_{t+1} = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^{N} \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\,\frac{K\, T}{\mathrm{Tr}\left(\Sigma_t^{-1} T\right)}$
Normalization
$\Sigma_{t+1} = \tilde{\Sigma}_{t+1} / \mathrm{Tr}\big(\tilde{\Sigma}_{t+1}\big)$
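A NumPy sketch of this update/normalization loop (illustrative; the function name, initialization, and iteration count are assumptions):

```python
import numpy as np

def wiesel_shrinkage(X, T, alpha, iters=200):
    """MM iteration for Wiesel's shrinkage estimator (sketch).

    X: (N, K) zero-mean data, T: (K, K) prior target, alpha >= 0.
    """
    N, K = X.shape
    Sigma = np.eye(K)
    for _ in range(iters):
        Sinv = np.linalg.inv(Sigma)
        d2 = np.einsum('ij,jk,ik->i', X, Sinv, X)
        data_term = (K / N) * (X / d2[:, None]).T @ X
        prior_term = K * T / np.trace(Sinv @ T)
        Sigma = (N * data_term + 2 * alpha * prior_term) / (N + 2 * alpha)
        Sigma /= np.trace(Sigma)        # normalization step
    return Sigma
```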
Algorithm Convergence
Theorem [Sun-Bab-Pal'J14a]
Under the existence conditions, the modified algorithm for Wiesel's shrinkage estimator converges to the unique solution.
Proof idea:
Majorization-minimization decreases the value of the objective function.
Normalization does not change the value of the objective function.
There is a unique minimizer of the objective function.
Algorithm for KL-Shrinkage Estimator
Surrogate function
$g(\Sigma \,|\, \Sigma_t) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^{N} \frac{x_i^T \Sigma^{-1} x_i}{x_i^T \Sigma_t^{-1} x_i} + \alpha\left(\log\det(\Sigma) + \mathrm{Tr}\left(\Sigma^{-1} T\right)\right)$
Update
$\Sigma_{t+1} = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^{N} \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\, T$
Theorem [Sun-Bab-Pal'J14a]
Under the existence conditions, the algorithm for the KL-shrinkage estimator converges to the unique solution.
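The corresponding sketch differs from the previous one only in the prior term, which is now the fixed target T, and no trace normalization is needed (again illustrative, with assumed names):

```python
import numpy as np

def kl_shrinkage(X, T, alpha, iters=200):
    """MM iteration for the KL-shrinkage estimator (sketch)."""
    N, K = X.shape
    Sigma = np.eye(K)
    for _ in range(iters):
        d2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
        data_term = (K / N) * (X / d2[:, None]).T @ X
        Sigma = (N * data_term + 2 * alpha * T) / (N + 2 * alpha)
    return Sigma
```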
Algorithm convergence of Wiesel's shrinkage estimator
Parameters: $K = 10$, $N = 8$.
Figure: $\|\Sigma_t - \Sigma_{t-1}\|_F$ and $\lambda_{\min}(\Sigma_t)/\lambda_{\max}(\Sigma_t)$ versus iteration $t$: (a) when the existence conditions are not satisfied, with $\alpha_0 = 0.96$; (b) when the existence conditions are satisfied, with $\alpha_0 = 1.04$.
Algorithm convergence of KL-shrinkage estimator
Parameters: $K = 10$, $N = 8$.
Figure: $\|\Sigma_t - \Sigma_{t-1}\|_F$ and $\lambda_{\min}(\Sigma_t)/\lambda_{\max}(\Sigma_t)$ versus iteration $t$: (a) when the existence conditions are not satisfied, with $\alpha_0 = 0.96$; (b) when the existence conditions are satisfied, with $\alpha_0 = 1.04$.
Regularization-Unknown Mean
Problem: $\mu_0$ is unknown!
A simple solution: plug in an estimate $\hat{\mu}$:
Sample mean
Sample median
But...
Two-step estimation, not jointly optimal.
The estimation error of $\hat{\mu}$ propagates.
To be done: a shrinkage estimator for joint mean-covariance estimation with target $(t, T)$.
Regularization-Unknown Mean
Method: add a shrinkage penalty $h(\mu, R)$ to the loss function (the negative log-likelihood of the Cauchy distribution).
Design criteria:
$h(\mu, R)$ attains its minimum at the prior $(t, T)$.
$h(t, T) = h(t, r\,T), \ \forall r > 0$.
Reason:
$R$ can be estimated only up to an unknown scale factor.
$T$ is a prior for the parameter $R/\mathrm{Tr}(R)$.
Regularization-Unknown Mean
Proposed penalty function
$h(\mu, R) = \alpha\left(K \log \mathrm{Tr}\left(R^{-1} T\right) + \log\det(R)\right) + \gamma \log\left(1 + (\mu - t)^T R^{-1} (\mu - t)\right)$
Proposition [Sun-Bab-Pal'J14b]
$(t, r\,T), \ \forall r > 0$, are the minimizers of $h(\mu, R)$.
Regularization-Unknown Mean
Resulting optimization problem:
$\underset{\mu,\, R \succ 0}{\text{minimize}} \quad \frac{N}{2}\log\det(R) + \frac{K+1}{2}\sum_{i=1}^{N}\log\left(1 + (x_i - \mu)^T R^{-1} (x_i - \mu)\right)$
$\qquad\qquad + \alpha\left(K \log \mathrm{Tr}\left(R^{-1} T\right) + \log\det(R)\right) + \gamma \log\left(1 + (\mu - t)^T R^{-1} (\mu - t)\right).$
A minimum satisfies the stationarity conditions $\partial L_{\mathrm{shrink}}(\mu, R)/\partial \mu = \mathbf{0}$ and $\partial L_{\mathrm{shrink}}(\mu, R)/\partial R = \mathbf{0}$.
Regularization-Unknown Mean
Define
$d_i(\mu, R) = \sqrt{(x_i - \mu)^T R^{-1} (x_i - \mu)}, \qquad d_t(\mu, R) = \sqrt{(t - \mu)^T R^{-1} (t - \mu)},$
$w_i(\mu, R) = \frac{1}{1 + d_i^2(\mu, R)}, \qquad w_t(\mu, R) = \frac{1}{1 + d_t^2(\mu, R)}.$
Stationary condition:
$R = \frac{K+1}{N+2\alpha}\sum_{i=1}^{N} w_i(\mu, R)\,(x_i - \mu)(x_i - \mu)^T + \frac{2\gamma}{N+2\alpha}\, w_t(\mu, R)\,(\mu - t)(\mu - t)^T + \frac{2\alpha K}{N+2\alpha}\,\frac{T}{\mathrm{Tr}\left(R^{-1} T\right)}$
$\mu = \frac{(K+1)\sum_{i=1}^{N} w_i(\mu, R)\, x_i + 2\gamma\, w_t(\mu, R)\, t}{(K+1)\sum_{i=1}^{N} w_i(\mu, R) + 2\gamma\, w_t(\mu, R)}$
Existence and Uniqueness
Theorem [Sun-Bab-Pal'J14b]
Assuming a continuous underlying distribution, the estimator exists under either of the following conditions:
(i) if $\gamma > \gamma_1$, then $\alpha > \alpha_1$;
(ii) if $\gamma_2 < \gamma \le \gamma_1$, then $\alpha > \alpha_2(\gamma)$,
where
$\alpha_1 = \frac{1}{2}(K - N), \qquad \alpha_2(\gamma) = \frac{1}{2}\left(K + 1 - N - \frac{2\gamma + N - K - 1}{N - 1}\right),$
and $\gamma_1 = \frac{1}{2}(K + 1)$, $\gamma_2 = \frac{1}{2}(K + 1 - N)$.
Figure: existence region in the $(\gamma, \alpha)$ plane, delimited by $\alpha_1$ and $\alpha_2(\gamma)$ with breakpoints at $\gamma_2$ and $\gamma_1$.
Existence and Uniqueness
Theorem [Sun-Bab-Pal'J14b]
The shrinkage estimator is unique if $\gamma \ge \alpha$.
Figure: existence region in the $(\gamma, \alpha)$ plane, delimited by $\alpha_1$ and $\alpha_2(\gamma)$ with breakpoints at $\gamma_2$ and $\gamma_1$.
Algorithm in µ and R
Surrogate function
$L(\mu, R \,|\, \mu_t, R_t) = \frac{K+1}{2}\sum_{i=1}^{N} w_i(\mu_t, R_t)\,(x_i - \mu)^T R^{-1} (x_i - \mu) + \gamma\, w_t(\mu_t, R_t)\,(t - \mu)^T R^{-1} (t - \mu) + \left(\frac{N}{2} + \alpha\right)\log\det(R) + \alpha K\,\frac{\mathrm{Tr}\left(R^{-1} T\right)}{\mathrm{Tr}\left(R_t^{-1} T\right)}$
Update
$\mu_{t+1} = \frac{(K+1)\sum_{i=1}^{N} w_i(\mu_t, R_t)\, x_i + 2\gamma\, w_t(\mu_t, R_t)\, t}{(K+1)\sum_{i=1}^{N} w_i(\mu_t, R_t) + 2\gamma\, w_t(\mu_t, R_t)}$
$R_{t+1} = \frac{K+1}{N+2\alpha}\sum_{i=1}^{N} w_i(\mu_t, R_t)\,(x_i - \mu_{t+1})(x_i - \mu_{t+1})^T + \frac{2\gamma}{N+2\alpha}\, w_t(\mu_t, R_t)\,(t - \mu_{t+1})(t - \mu_{t+1})^T + \frac{2\alpha K}{N+2\alpha}\,\frac{T}{\mathrm{Tr}\left(R_t^{-1} T\right)}$
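A NumPy sketch of this joint update (illustrative; the function name, median initialization, and iteration count are assumptions):

```python
import numpy as np

def shrinkage_cauchy(X, t, T, alpha, gamma, iters=300):
    """Joint mean/covariance shrinkage iteration (sketch).

    X: (N, K) data; (t, T): prior mean and shape targets; alpha, gamma >= 0.
    """
    N, K = X.shape
    mu, R = np.median(X, axis=0), np.eye(K)
    for _ in range(iters):
        Rinv = np.linalg.inv(R)
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, Rinv, diff)
        w = 1.0 / (1.0 + d2)                                 # w_i(mu_t, R_t)
        wt = 1.0 / (1.0 + (t - mu) @ Rinv @ (t - mu))        # w_t(mu_t, R_t)
        mu = ((K + 1) * (w[:, None] * X).sum(axis=0) + 2 * gamma * wt * t) \
             / ((K + 1) * w.sum() + 2 * gamma * wt)
        diff = X - mu
        R = ((K + 1) * (w[:, None] * diff).T @ diff
             + 2 * gamma * wt * np.outer(t - mu, t - mu)
             + 2 * alpha * K * T / np.trace(Rinv @ T)) / (N + 2 * alpha)
    return mu, R
```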
Algorithm in µ and R
Theorem [Sun-Bab-Pal'J14b]
Under the existence conditions, the algorithm in µ and R for the
proposed shrinkage estimator converges to the unique solution.
Algorithm in Σ
Consider the case $\alpha = \gamma$ and apply the transform
$\Sigma = \begin{bmatrix} R + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}, \qquad \bar{x}_i = [x_i; 1], \qquad \bar{t} = [t; 1].$
Equivalent loss function
$L_{\mathrm{shrink}}(\Sigma) = \left(\frac{N}{2} + \alpha\right)\log\det(\Sigma) + \frac{K+1}{2}\sum_{i=1}^{N}\log\left(\bar{x}_i^T \Sigma^{-1} \bar{x}_i\right) + \alpha K \log \mathrm{Tr}\left(S^T \Sigma^{-1} S\, T\right) + \alpha \log\left(\bar{t}^T \Sigma^{-1} \bar{t}\right)$
with $S = \begin{bmatrix} I_K \\ 0_{1\times K} \end{bmatrix}$.
$L_{\mathrm{shrink}}(\Sigma)$ is scale-invariant.
Algorithm in Σ
Surrogate function
$L(\Sigma \,|\, \Sigma_t) = \left(\frac{N}{2} + \alpha\right)\log\det(\Sigma) + \frac{K+1}{2}\sum_{i=1}^{N} \frac{\bar{x}_i^T \Sigma^{-1} \bar{x}_i}{\bar{x}_i^T \Sigma_t^{-1} \bar{x}_i} + \alpha\left(K\,\frac{\mathrm{Tr}\left(S^T \Sigma^{-1} S\, T\right)}{\mathrm{Tr}\left(S^T \Sigma_t^{-1} S\, T\right)} + \frac{\bar{t}^T \Sigma^{-1} \bar{t}}{\bar{t}^T \Sigma_t^{-1} \bar{t}}\right)$
Update
$\tilde{\Sigma}_{t+1} = \frac{K+1}{N+2\alpha}\sum_{i=1}^{N} \frac{\bar{x}_i \bar{x}_i^T}{\bar{x}_i^T \Sigma_t^{-1} \bar{x}_i} + \frac{2\alpha}{N+2\alpha}\left(\frac{K\, S\, T\, S^T}{\mathrm{Tr}\left(S^T \Sigma_t^{-1} S\, T\right)} + \frac{\bar{t}\,\bar{t}^T}{\bar{t}^T \Sigma_t^{-1} \bar{t}}\right)$
$\Sigma_{t+1} = \tilde{\Sigma}_{t+1} / \big(\tilde{\Sigma}_{t+1}\big)_{K+1,\,K+1}$
Algorithm in Σ
Theorem [Sun-Bab-Pal'J14b]
Under the existence conditions, which simplify to $N > K + 1 - 2\alpha$ for $\alpha = \gamma$, the algorithm in Σ for the proposed shrinkage estimator converges to the unique solution.
Simulation
Parameters: $K = 10$, $\mu_0 = 0.1 \times \mathbf{1}_{K\times 1}$, $(R_0)_{ij} = 0.8^{|i-j|}$.
Error measurement: KL-distance
$\mathrm{err}\big(\hat{\mu}, \hat{R}\big) = \mathbb{E}\left\{ D_{\mathrm{KL}}\!\left(\mathcal{N}\big(\hat{\mu}, \hat{R}\big)\,\big\|\,\mathcal{N}(\mu_0, R_0)\right) + D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_0, R_0)\,\big\|\,\mathcal{N}\big(\hat{\mu}, \hat{R}\big)\right)\right\}$
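For Gaussians this symmetric KL distance has a closed form (the log-determinant terms of the two directions cancel); a small NumPy helper, with an assumed name:

```python
import numpy as np

def sym_kl_gauss(mu0, R0, mu1, R1):
    """Symmetric KL divergence between N(mu0, R0) and N(mu1, R1)."""
    K = len(mu0)
    R0i, R1i = np.linalg.inv(R0), np.linalg.inv(R1)
    dmu = mu1 - mu0
    return 0.5 * (np.trace(R1i @ R0) + np.trace(R0i @ R1) - 2 * K
                  + dmu @ (R0i + R1i) @ dmu)
```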
Performance Comparison for Gaussian
Figure: KL distance versus number of samples (20 to 200) for Sample Average, Cauchy, Median+Shrinkage Tyler, Mean+Shrinkage Tyler, Diagonal Shrinkage Cauchy, and Shrinkage Cauchy; data drawn from $\mathcal{N}(\mu_0, R_0)$.
Performance Comparison for the t-distribution
Figure: KL distance versus number of samples (20 to 200) for Sample Average, Maximum Likelihood, Cauchy, Median+Shrinkage Tyler, Mean+Shrinkage Tyler, Diagonal Shrinkage Cauchy, and Shrinkage Cauchy; data drawn from $t_\nu(\mu_0, R_0)$ with $\nu = 5$.
Performance Comparison for Corrupted Gaussian
Figure: KL distance versus number of samples (20 to 200) for Sample Average, Cauchy, Median+Shrinkage Tyler, Mean+Shrinkage Tyler, Diagonal Shrinkage Cauchy, and Shrinkage Cauchy; data drawn from $0.9 \times \mathcal{N}(\mu_0, R_0) + 0.1 \times \mathcal{N}(5 \times \mathbf{1}_{K\times 1}, I)$.
Real Data Simulation
Minimum variance portfolio.
Training data: weekly log-returns of S&P 500 index components, $K = 40$.
Estimate $R$.
Construct portfolio weights $w$.
Parameter selection: choose the $\alpha$ that yields the minimum variance on the validation set.
Collect half a year of portfolio returns.
Rolling-window scheme: ··· train | validate | test ···
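The slides do not spell out how $w$ is built from $\hat{R}$; a common choice, assumed here, is the global minimum-variance portfolio $w = \hat{R}^{-1}\mathbf{1} / (\mathbf{1}^T \hat{R}^{-1}\mathbf{1})$:

```python
import numpy as np

def min_variance_weights(R_hat):
    """Global minimum-variance portfolio weights from a covariance estimate."""
    ones = np.ones(R_hat.shape[0])
    w = np.linalg.solve(R_hat, ones)
    return w / w.sum()          # normalize so the weights sum to one
```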
Figure: portfolio variance (×10⁻⁴) versus estimation window length (72 to 124) for Sample Average, Cauchy Estimator, Median+Shrinkage Tyler, Mean+Shrinkage Tyler, and Shrinkage Cauchy.
Summary
In this talk, we have discussed
Robust mean-covariance estimation for heavy-tailed
distributions.
Shrinkage estimation in the small sample regime.
Future work
Parameter tuning.
Structured covariance estimation.
References I
R. Maronna, D. Martin, and V. Yohai.
Robust Statistics: Theory and Methods.
Wiley Series in Probability and Statistics. Wiley, 2006.
D. E. Tyler.
A distribution-free M-estimator of multivariate scatter.
Ann. Statist., volume 15, no. 1, pp. 234–251, 1987.
R. A. Maronna.
Robust M-estimators of multivariate location and scatter.
Ann. Statist., volume 4, no. 1, pp. 51–67, 1976.
J. T. Kent and D. E. Tyler.
Redescending M-estimates of multivariate location and scatter.
Ann. Statist., volume 19, no. 4, pp. 2102–2119, 1991.
References II
J. T. Kent, D. E. Tyler, and Y. Vardi.
A curious likelihood identity for the multivariate t-distribution.
Communications in Statistics - Simulation and Computation, volume 23, no. 2, pp. 441–453, 1994.
Y. Abramovich and N. Spencer.
Diagonally loaded normalised sample matrix inversion (LNSMI) for outlier-resistant adaptive filtering.
In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), volume 3, pp. III-1105–III-1108, Honolulu, HI, 2007.
Y. Chen, A. Wiesel, and A. Hero.
Robust shrinkage estimation of high-dimensional covariance matrices.
IEEE Trans. Signal Process., volume 59, no. 9, pp. 4097–4107, 2011.
References III
A. Wiesel.
Unified framework to regularized covariance estimation in scaled Gaussian models.
IEEE Trans. Signal Process., volume 60, no. 1, pp. 29–38, 2012.
Y. Sun, P. Babu, and D. Palomar.
Regularized Tyler's scatter estimator: Existence, uniqueness, and algorithms.
arXiv preprint arXiv:1407.3079, 2014.
Y. Sun, P. Babu, and D. Palomar.
Regularized robust estimation of mean and covariance matrix under heavy tails and outliers.
Submitted to IEEE Trans. Signal Process., 2014.
Thanks
For more information visit:
http://www.ece.ust.hk/~palomar