Regularized Robust Estimation of Mean and Covariance Matrix under Heavy Tails and Outliers
Daniel P. Palomar
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology
2014 Workshop on Complex Systems Modeling and Estimation Challenges in Big Data (CSM-2014), 31 July 2014

Acknowledgment
This is joint work with Ying Sun (Ph.D. student, Dept. ECE, HKUST) and Prabhu Babu (Postdoc, Dept. ECE, HKUST).

Outline
1 Motivation
2 Robust Covariance Matrix Estimators: Introduction; Examples; Unsolved Problems
3 Robust Mean-Covariance Estimators: Introduction; Joint Mean-Covariance Estimation for Elliptical Distributions
4 Small Sample Regime: Shrinkage Robust Estimator with Known Mean; Shrinkage Robust Estimator for Unknown Mean

Basic Problem
Task: estimate the mean and covariance matrix from data $\{x_i\}$.
Difficulty: outlier-corrupted observations (heavy-tailed underlying distribution).
Figure: one-dimensional probability density functions of the Student's t and the Gaussian distributions; the t-distribution has heavier tails.

Sample Average
A straightforward solution: start from
\[ \mu = E_f[x], \qquad R = E_f\big[(x-\mu)(x-\mu)^T\big], \]
and replace $f$ by the empirical distribution $f_N$:
\[ \hat{\mu} = \frac{1}{N}\sum_{i=1}^N x_i, \qquad \hat{R} = \frac{1}{N}\sum_{i=1}^N (x_i-\hat{\mu})(x_i-\hat{\mu})^T. \]
This works well for i.i.d. Gaussian-distributed data.

Influence of Outliers
What if the data is corrupted? A real-life example: a Kalman filter lost track of the spacecraft during an Apollo mission because of an outlier observation (caused by system noise).

Example 1: Symmetrically Distributed Outliers
\[ x \sim \mathrm{HeavyTail}(\mathbf{1}, R), \qquad R = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}. \]
Figures: true location and shape; location and shape estimated by the sample average; location and shape estimated by the Cauchy MLE; normalized error of $R$ versus the degrees of freedom $\nu$ of the Student t-distribution (SCM vs. Cauchy).

Example 2: Asymmetrically Distributed Outliers
\[ x \sim 0.9\,\mathcal{N}(\mathbf{1}, R) + 0.1\,\mathcal{N}(\mu, R), \qquad \mu = \begin{bmatrix} 5 \\ -5 \end{bmatrix}, \quad R = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}. \]
Figures: true location and shape; location and shape estimated by the sample average; location and shape estimated by the Cauchy MLE; normalized error of $R$ versus the outlier percentage (SCM vs. Cauchy).
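The distortion in Example 2 is easy to reproduce. The following minimal NumPy sketch (not part of the original slides; the sample size and seed are arbitrary choices of mine, while the contamination parameters mirror the example above) draws contaminated Gaussian data and shows how far the sample mean and covariance drift from the true values.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, eps = 2, 1000, 0.1                       # dimension, sample size, outlier fraction
mu0 = np.ones(K)                               # true mean
R0 = np.array([[1.0, 0.5], [0.5, 1.0]])        # true covariance
mu_out = np.array([5.0, -5.0])                 # location of the outlier cluster

n_out = int(eps * N)
clean = rng.multivariate_normal(mu0, R0, size=N - n_out)
outliers = rng.multivariate_normal(mu_out, R0, size=n_out)
X = np.vstack([clean, outliers])

mu_hat = X.mean(axis=0)                        # sample mean, pulled toward the outliers
R_hat = np.cov(X, rowvar=False, bias=True)     # sample covariance matrix (SCM)

print("sample mean:", mu_hat)
print("normalized error of R:", np.linalg.norm(R_hat - R0) / np.linalg.norm(R0))
```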
More Sophisticated Models
- Factor model: $y = \mu + A x + \varepsilon$.
- Vector ARMA: $\big(I - \sum_{i=1}^{p} \Phi_i L^i\big)(y_t - \mu) = \big(I - \sum_{i=1}^{q} \Theta_i L^i\big) u_t$.
- VECM: $\big(I - \sum_{i=1}^{p} \Gamma_i L^i\big)\Delta y_t = \Phi D_t + \Pi y_{t-1} + \varepsilon_t$.
- State-space model: $y_t = C x_t + \varepsilon_t$, $x_t = A x_{t-1} + u_t$.

Warm-up
Recall the Gaussian distribution
\[ f(x) = C \det(\Sigma)^{-1/2} \exp\Big(-\tfrac{1}{2}\, x^T \Sigma^{-1} x\Big). \]
Negative log-likelihood function:
\[ L(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \frac{1}{2}\sum_{i=1}^N x_i^T \Sigma^{-1} x_i. \]
Sample covariance matrix: $\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^N x_i x_i^T$.

M-estimator
Minimizer of the loss function [Mar-Mar-Yoh'06]:
\[ L(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \sum_{i=1}^N \rho\big(x_i^T \Sigma^{-1} x_i\big). \]
Solution to the fixed-point equation
\[ \Sigma = \frac{1}{N}\sum_{i=1}^N w\big(x_i^T \Sigma^{-1} x_i\big)\, x_i x_i^T, \]
where, if $\rho$ is differentiable, $w = 2\rho'$.

Sample Covariance Matrix
The SCM can be viewed as $\hat{\Sigma} = \sum_{i=1}^N w_i x_i x_i^T$ with $w_i = \frac{1}{N}$ for all $i$: the MLE of a Gaussian distribution with loss function $\frac{N}{2}\log\det(\Sigma) + \frac{1}{2}\sum_{i=1}^N x_i^T \Sigma^{-1} x_i$.
Why is the SCM sensitive to outliers? Consider the distance $d_i = \sqrt{x_i^T \Sigma^{-1} x_i}$. With $w_i = \frac{1}{N}$, normal samples and outliers contribute to $\hat{\Sigma}$ equally: a quadratic loss.

Tyler's M-estimator
Given $f(x)$, use the MLE. Given only $x_i \sim \mathrm{elliptical}(0, \Sigma)$, what shall we do?
The normalized sample $s_i \triangleq x_i / \|x_i\|_2$ has pdf $f(s) = C \det(R)^{-1/2} \big(s^T R^{-1} s\big)^{-K/2}$, which leads to the loss function
\[ L(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^N \log\big(x_i^T \Sigma^{-1} x_i\big), \]
where $s_i^T \Sigma^{-1} s_i$ has been replaced by $x_i^T \Sigma^{-1} x_i$ (they differ only by the factor $\|x_i\|_2^2$, a constant in $\Sigma$).
Tyler [Tyl'J87] proposed the covariance estimator $\hat{\Sigma}$ as the solution to
\[ \sum_{i=1}^N w_i\, x_i x_i^T = \Sigma, \qquad w_i = \frac{K}{N\, x_i^T \Sigma^{-1} x_i}. \]
Why is Tyler's estimator robust to outliers? With the distance $d_i = \sqrt{x_i^T \Sigma^{-1} x_i}$, each sample contributes with weight $w_i \propto 1/d_i^2$, so outliers are down-weighted: a logarithmic loss.
Tyler's M-estimator solves the fixed-point equation
\[ \Sigma = \frac{K}{N}\sum_{i=1}^N \frac{x_i x_i^T}{x_i^T \Sigma^{-1} x_i}. \]
Existence condition: $N > K$. There is no closed-form solution; iterative algorithm:
\[ \tilde{\Sigma}_{t+1} = \frac{K}{N}\sum_{i=1}^N \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i}, \qquad \Sigma_{t+1} = \tilde{\Sigma}_{t+1} / \mathrm{Tr}\big(\tilde{\Sigma}_{t+1}\big). \]
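A minimal sketch of Tyler's iteration (the function name, initialization, and stopping rule are my own choices, not prescribed by the slides):

```python
import numpy as np

def tyler_estimator(X, max_iter=200, tol=1e-8):
    """Tyler's M-estimator of scatter, normalized to unit trace.
    X is an (N, K) matrix of zero-mean samples; existence requires N > K."""
    N, K = X.shape
    Sigma = np.eye(K)
    for _ in range(max_iter):
        # d_i^2 = x_i' Sigma^{-1} x_i for every sample
        d2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
        Sigma_new = (K / N) * (X / d2[:, None]).T @ X   # sum_i w_i x_i x_i'
        Sigma_new /= np.trace(Sigma_new)                # remove the scale ambiguity
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```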
Unsolved Problems
- Problem 1: what if the mean value is unknown?
- Problem 2: how to deal with the small-sample scenario?
- Problem 3: how to incorporate prior information?

Robust M-estimators
Maronna's M-estimators [Mar'J76] solve
\[ \frac{1}{N}\sum_{i=1}^N u_1\big((x_i-\mu)^T R^{-1} (x_i-\mu)\big)\,(x_i-\mu) = 0, \]
\[ \frac{1}{N}\sum_{i=1}^N u_2\big((x_i-\mu)^T R^{-1} (x_i-\mu)\big)\,(x_i-\mu)(x_i-\mu)^T = R. \]
Special examples: Huber's loss function; the MLE for the Student's t-distribution.

MLE of the Student's t-distribution
Student's t-distribution with $\nu$ degrees of freedom:
\[ f(x) = C \det(R)^{-1/2}\Big(1 + \tfrac{1}{\nu}(x-\mu)^T R^{-1}(x-\mu)\Big)^{-\frac{K+\nu}{2}}. \]
Negative log-likelihood:
\[ L_\nu(\mu, R) = \frac{N}{2}\log\det(R) + \frac{K+\nu}{2}\sum_{i=1}^N \log\big(\nu + (x_i-\mu)^T R^{-1}(x_i-\mu)\big). \]
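This objective is straightforward to evaluate; a direct transcription (up to the additive constant omitted on the slide; the helper name is mine):

```python
import numpy as np

def student_t_negloglik(X, mu, R, nu):
    """Negative log-likelihood L_nu(mu, R) of the Student's t model,
    up to an additive constant."""
    N, K = X.shape
    diff = X - mu
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(R), diff)  # squared Mahalanobis distances
    return 0.5 * N * np.linalg.slogdet(R)[1] + 0.5 * (K + nu) * np.sum(np.log(nu + d2))
```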
MLE of the Student's t-distribution (cont'd)
Estimating equations:
\[ \frac{K+\nu}{N}\sum_{i=1}^N \frac{x_i-\mu}{\nu + (x_i-\mu)^T R^{-1}(x_i-\mu)} = 0, \qquad \frac{K+\nu}{N}\sum_{i=1}^N \frac{(x_i-\mu)(x_i-\mu)^T}{\nu + (x_i-\mu)^T R^{-1}(x_i-\mu)} = R. \]
The weight $w_i(\nu) = \frac{K+\nu}{N}\cdot\frac{1}{\nu + (x_i-\mu)^T R^{-1}(x_i-\mu)}$ decreases in $\nu$. A unique solution exists for $\nu \ge 1$.

Joint Mean-Covariance Estimation
Assumption: $x_i \sim \mathrm{elliptical}(\mu_0, R_0)$. Goal: jointly estimate the mean and covariance with an estimator that is
- robust to outliers,
- easy to implement,
- provably convergent.
A natural idea: the MLE of a heavy-tailed distribution.

Method: fit $\{x_i\}$ to the Cauchy likelihood (Student's t-distribution with $\nu = 1$).
- Conservative fitting; trade-off: robustness vs. efficiency; tractability.
- $\hat{R} \to c\, R_0$, where $c$ depends on the unknown shape of the underlying distribution, so estimate $R/\mathrm{Tr}(R)$ instead.
- Existence condition: $N > K + 1$ [Ken'J91].

Algorithm
There is no closed-form solution; numerical algorithm [Ken-Tyl-Var'J94]:
\[ \mu_{t+1} = \frac{\sum_{i=1}^N w_i(\mu_t, R_t)\, x_i}{\sum_{i=1}^N w_i(\mu_t, R_t)}, \qquad R_{t+1} = \frac{K+1}{N}\sum_{i=1}^N w_i(\mu_t, R_t)\,\big(x_i - \mu_{t+1}\big)\big(x_i - \mu_{t+1}\big)^T, \]
with $w_i(\mu, R) = \dfrac{1}{1 + (x_i-\mu)^T R^{-1}(x_i-\mu)}$.
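A sketch of this Cauchy-MLE iteration (initialization and stopping rule are my own choices; since $R$ is identified only up to scale, one would typically report R / np.trace(R)):

```python
import numpy as np

def cauchy_mle(X, max_iter=500, tol=1e-8):
    """Joint mean/scatter estimation by fitting a Cauchy model (Student's t, nu = 1),
    following the Kent-Tyler-Vardi iteration. Existence requires N > K + 1."""
    N, K = X.shape
    mu, R = X.mean(axis=0), np.eye(K)
    for _ in range(max_iter):
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(R), diff)
        w = 1.0 / (1.0 + d2)                              # Cauchy weights: small for outliers
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        diff_new = X - mu_new
        R_new = ((K + 1) / N) * (w[:, None] * diff_new).T @ diff_new
        if (np.linalg.norm(R_new - R, 'fro') < tol
                and np.linalg.norm(mu_new - mu) < tol):
            return mu_new, R_new
        mu, R = mu_new, R_new
    return mu, R
```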
Regularization with Known Mean
Problem: with insufficient observations,
- the estimator does not exist,
- the algorithms fail to converge.
Methods: diagonal loading; penalized (regularized) loss functions.

Diagonal Loading
Modified Tyler's iteration [Abr-Spe'C07]:
\[ \tilde{\Sigma}_{t+1} = \frac{K}{N}\sum_{i=1}^N \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i} + \rho I, \qquad \Sigma_{t+1} = \tilde{\Sigma}_{t+1} / \mathrm{Tr}\big(\tilde{\Sigma}_{t+1}\big). \]
Provable convergence [Che-Wie-Her'J11] and a systematic way of choosing the parameter $\rho$ [Che-Wie-Her'J11], but without a clear motivation.
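A sketch of the diagonally loaded iteration above (function name and defaults are mine; zero-mean data assumed):

```python
import numpy as np

def tyler_diagonal_loading(X, rho, max_iter=200, tol=1e-8):
    """Diagonally loaded Tyler iteration: the plain Tyler update plus rho * I,
    followed by trace normalization."""
    N, K = X.shape
    Sigma = np.eye(K)
    for _ in range(max_iter):
        d2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
        Sigma_new = (K / N) * (X / d2[:, None]).T @ X + rho * np.eye(K)
        Sigma_new /= np.trace(Sigma_new)
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```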
Penalized Loss Function I
Wiesel's penalty [Wie'J12]: $h(\Sigma) = \log\det(\Sigma) + K \log \mathrm{Tr}\big(\Sigma^{-1} T\big)$; any $\Sigma \propto T$ minimizes $h(\Sigma)$.
Penalized loss function:
\[ L_{\mathrm{Wiesel}}(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^N \log\big(x_i^T \Sigma^{-1} x_i\big) + \alpha\Big(\log\det(\Sigma) + K \log \mathrm{Tr}\big(\Sigma^{-1} T\big)\Big). \]
Algorithm:
\[ \Sigma_{t+1} = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^N \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\,\frac{K\,T}{\mathrm{Tr}\big(\Sigma_t^{-1} T\big)}. \]

Penalized Loss Function II
Alternative penalty (KL divergence): $h(\Sigma) = \log\det(\Sigma) + \mathrm{Tr}\big(\Sigma^{-1} T\big)$; $\Sigma = T$ minimizes $h(\Sigma)$.
Penalized loss function:
\[ L_{\mathrm{KL}}(\Sigma) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^N \log\big(x_i^T \Sigma^{-1} x_i\big) + \alpha\Big(\log\det(\Sigma) + \mathrm{Tr}\big(\Sigma^{-1} T\big)\Big). \]
Algorithm?

Questions
- Existence and uniqueness?
- Which one is better?
- Algorithm convergence?

Existence and Uniqueness for Wiesel's Shrinkage Estimator
Theorem [Sun-Bab-Pal'J14a]: Wiesel's shrinkage estimator exists a.s., and is unique up to a positive scale factor, if and only if the underlying distribution is continuous and $N > K - 2\alpha$.
- Existence condition for Tyler's estimator: $N > K$; regularization relaxes the requirement on the number of samples.
- Setting $\alpha = 0$ (no regularization) recovers Tyler's condition.
- Stronger confidence in the prior information means fewer samples are required.

Existence and Uniqueness for the KL-Shrinkage Estimator
Theorem [Sun-Bab-Pal'J14a]: the KL-shrinkage estimator exists a.s., and is unique, if and only if the underlying distribution is continuous and $N > K - 2\alpha$.
Compared with Wiesel's shrinkage estimator, it shares the same existence condition and has no scaling ambiguity. Any connection? Which one is better?

Equivalence
Theorem [Sun-Bab-Pal'J14a]: Wiesel's shrinkage estimator and the KL-shrinkage estimator are equivalent.
Fixed-point equation for the KL-shrinkage estimator:
\[ \Sigma = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^N \frac{x_i x_i^T}{x_i^T \Sigma^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\,T; \]
the solution satisfies the equality $\mathrm{Tr}\big(\Sigma^{-1} T\big) = K$.
Fixed-point equation for Wiesel's shrinkage estimator:
\[ \Sigma = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^N \frac{x_i x_i^T}{x_i^T \Sigma^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\,\frac{K\,T}{\mathrm{Tr}\big(\Sigma^{-1} T\big)}. \]
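A small numerical check of this equivalence (the data, target, alpha, and iteration count are arbitrary choices of mine; K = 10 and N = 8 mirror the convergence experiments shown later): iterate the KL-shrinkage map and verify that Tr(Σ⁻¹T) approaches K, at which point the two fixed-point equations coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, alpha = 10, 8, 2.0                     # small-sample setting; existence needs N > K - 2*alpha
T = np.eye(K)                                # shrinkage target
X = rng.standard_normal((N, K))

Sigma = np.eye(K)
for _ in range(1000):
    d2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
    data_term = (K / N) * (X / d2[:, None]).T @ X
    Sigma = (N * data_term + 2 * alpha * T) / (N + 2 * alpha)   # KL-shrinkage map

print(np.trace(np.linalg.inv(Sigma) @ T))    # should be close to K at the fixed point
```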
Majorization-Minimization
Problem: minimize $f(x)$ subject to $x \in \mathcal{X}$.
Majorization-minimization: $x_{t+1} = \arg\min_{x\in\mathcal{X}} g(x|x_t)$, with a surrogate $g$ satisfying
- $f(x_t) = g(x_t|x_t)$,
- $f(x) \le g(x|x_t)$ for all $x \in \mathcal{X}$,
- $f'(x_t; d) = g'(x_t; d|x_t)$ for all $x_t + d \in \mathcal{X}$.
(Figure: the surrogate $g(x|x_t)$ lies above $f(x)$ and touches it at $x_t$.)

Modified Algorithm for Wiesel's Shrinkage Estimator
Surrogate function:
\[ g(\Sigma|\Sigma_t) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^N \frac{x_i^T \Sigma^{-1} x_i}{x_i^T \Sigma_t^{-1} x_i} + \alpha\left(\log\det(\Sigma) + K\,\frac{\mathrm{Tr}\big(\Sigma^{-1} T\big)}{\mathrm{Tr}\big(\Sigma_t^{-1} T\big)}\right). \]
Update and normalization:
\[ \tilde{\Sigma}_{t+1} = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^N \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\,\frac{K\,T}{\mathrm{Tr}\big(\Sigma_t^{-1} T\big)}, \qquad \Sigma_{t+1} = \tilde{\Sigma}_{t+1} / \mathrm{Tr}\big(\tilde{\Sigma}_{t+1}\big). \]

Algorithm Convergence
Theorem [Sun-Bab-Pal'J14a]: under the existence conditions, the modified algorithm for Wiesel's shrinkage estimator converges to the unique solution.
Proof idea: majorization-minimization decreases the value of the objective function; the normalization does not change the objective value; the objective has a unique minimizer.

Algorithm for the KL-Shrinkage Estimator
Surrogate function:
\[ g(\Sigma|\Sigma_t) = \frac{N}{2}\log\det(\Sigma) + \frac{K}{2}\sum_{i=1}^N \frac{x_i^T \Sigma^{-1} x_i}{x_i^T \Sigma_t^{-1} x_i} + \alpha\Big(\log\det(\Sigma) + \mathrm{Tr}\big(\Sigma^{-1} T\big)\Big). \]
Update:
\[ \Sigma_{t+1} = \frac{N}{N+2\alpha}\,\frac{K}{N}\sum_{i=1}^N \frac{x_i x_i^T}{x_i^T \Sigma_t^{-1} x_i} + \frac{2\alpha}{N+2\alpha}\,T. \]
Theorem [Sun-Bab-Pal'J14a]: under the existence conditions, the algorithm for the KL-shrinkage estimator converges to the unique solution.
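The normalized update for Wiesel's estimator translates directly into code; a sketch (function name, initialization, and defaults are mine; zero-mean data assumed):

```python
import numpy as np

def wiesel_shrinkage(X, T, alpha, max_iter=1000, tol=1e-10):
    """Modified MM algorithm for Wiesel's shrinkage estimator:
    penalized Tyler update toward the target T, then trace normalization."""
    N, K = X.shape
    Sigma = np.eye(K) / K
    for _ in range(max_iter):
        Sinv = np.linalg.inv(Sigma)
        d2 = np.einsum('ij,jk,ik->i', X, Sinv, X)
        S_tilde = ((K / (N + 2 * alpha)) * (X / d2[:, None]).T @ X
                   + (2 * alpha / (N + 2 * alpha)) * K * T / np.trace(Sinv @ T))
        Sigma_new = S_tilde / np.trace(S_tilde)
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```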
Algorithm Convergence of Wiesel's Shrinkage Estimator
Parameters: $K = 10$, $N = 8$.
Figure: $\|\Sigma_t - \Sigma_{t-1}\|_F$ and $\lambda_{\min}(\Sigma_t)/\lambda_{\max}(\Sigma_t)$ versus the iteration index $t$: (a) when the existence conditions are not satisfied ($\alpha_0 = 0.96$), (b) when the existence conditions are satisfied ($\alpha_0 = 1.04$).

Algorithm Convergence of the KL-Shrinkage Estimator
Parameters: $K = 10$, $N = 8$.
Figure: $\|\Sigma_t - \Sigma_{t-1}\|_F$ and $\lambda_{\min}(\Sigma_t)/\lambda_{\max}(\Sigma_t)$ versus the iteration index $t$: (a) when the existence conditions are not satisfied ($\alpha_0 = 0.96$), (b) when the existence conditions are satisfied ($\alpha_0 = 1.04$).

Regularization with Unknown Mean
Problem: $\mu_0$ is unknown!
A simple solution: plug in an estimate $\hat{\mu}$ (sample mean, sample median). But this is a two-step estimation, not jointly optimal, and the estimation error of $\hat{\mu}$ propagates.
To be done: a shrinkage estimator for joint mean-covariance estimation with target $(t, T)$.

Method: add a shrinkage penalty $h(\mu, R)$ to the loss function (the negative log-likelihood of the Cauchy distribution).
Design criteria:
- $h(\mu, R)$ attains its minimum at the prior $(t, T)$;
- $h(t, T) = h(t, rT)$ for all $r > 0$.
Reason: $R$ can only be estimated up to an unknown scale factor; $T$ is a prior for the parameter $R/\mathrm{Tr}(R)$.

Proposed penalty function:
\[ h(\mu, R) = \alpha\Big(K \log \mathrm{Tr}\big(R^{-1} T\big) + \log\det(R)\Big) + \gamma \log\Big(1 + (\mu - t)^T R^{-1} (\mu - t)\Big). \]
Proposition [Sun-Bab-Pal'J14b]: $(t, rT)$, for all $r > 0$, are the minimizers of $h(\mu, R)$.
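The penalty and the scale-invariance claim of the proposition are easy to check numerically; a sketch (all concrete values below are arbitrary choices of mine):

```python
import numpy as np

def h_penalty(mu, R, t, T, alpha, gamma):
    """Proposed shrinkage penalty h(mu, R), minimized at mu = t and R proportional to T."""
    K = len(mu)
    Rinv = np.linalg.inv(R)
    cov_part = alpha * (K * np.log(np.trace(Rinv @ T)) + np.linalg.slogdet(R)[1])
    mean_part = gamma * np.log(1.0 + (mu - t) @ Rinv @ (mu - t))
    return cov_part + mean_part

K = 3
t, T, alpha, gamma = np.zeros(K), np.eye(K), 1.0, 1.0
print(h_penalty(t, T, t, T, alpha, gamma))            # minimum value
print(h_penalty(t, 7.5 * T, t, T, alpha, gamma))      # same value: h(t, T) = h(t, r*T)
print(h_penalty(t + 0.3, np.diag([2.0, 4.0, 6.0]), t, T, alpha, gamma))  # strictly larger
```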
Resulting optimization problem:
\[ \underset{\mu,\; R \succ 0}{\text{minimize}} \quad \frac{K+1}{2}\sum_{i=1}^N \log\Big(1 + (x_i-\mu)^T R^{-1}(x_i-\mu)\Big) + \frac{N}{2}\log\det(R) + \alpha\Big(K \log \mathrm{Tr}\big(R^{-1} T\big) + \log\det(R)\Big) + \gamma \log\Big(1 + (\mu-t)^T R^{-1}(\mu-t)\Big). \]
A minimum satisfies the stationarity conditions $\partial L_{\mathrm{shrink}}(\mu,R)/\partial\mu = 0$ and $\partial L_{\mathrm{shrink}}(\mu,R)/\partial R = 0$.

Define $d_i(\mu,R) = \sqrt{(x_i-\mu)^T R^{-1}(x_i-\mu)}$, $d_t(\mu,R) = \sqrt{(t-\mu)^T R^{-1}(t-\mu)}$, $w_i(\mu,R) = \frac{1}{1+d_i^2(\mu,R)}$, and $w_t(\mu,R) = \frac{1}{1+d_t^2(\mu,R)}$.
Stationarity conditions:
\[ R = \frac{K+1}{N+2\alpha}\sum_{i=1}^N w_i(\mu,R)\,(x_i-\mu)(x_i-\mu)^T + \frac{2\gamma}{N+2\alpha}\,w_t(\mu,R)\,(\mu-t)(\mu-t)^T + \frac{2\alpha K}{N+2\alpha}\,\frac{T}{\mathrm{Tr}\big(R^{-1}T\big)}, \]
\[ \mu = \frac{(K+1)\sum_{i=1}^N w_i(\mu,R)\, x_i + 2\gamma\, w_t(\mu,R)\, t}{(K+1)\sum_{i=1}^N w_i(\mu,R) + 2\gamma\, w_t(\mu,R)}. \]

Existence and Uniqueness
Theorem [Sun-Bab-Pal'J14b]: assuming a continuous underlying distribution, the estimator exists under either of the following conditions: (i) if $\gamma > \gamma_1$, then $\alpha > \alpha_1$; (ii) if $\gamma_2 < \gamma \le \gamma_1$, then $\alpha > \alpha_2(\gamma)$, where
\[ \alpha_1 = \frac{1}{2}(K-N), \qquad \alpha_2(\gamma) = \frac{1}{2}\left(K+1-N - \frac{2\gamma + N - K - 1}{N-1}\right), \]
and $\gamma_1 = \frac{1}{2}(K+1)$, $\gamma_2 = \frac{1}{2}(K+1-N)$.
Figure: existence region in the $(\gamma, \alpha)$ plane, delimited by the curve $\alpha_2(\gamma)$ for $\gamma_2 < \gamma \le \gamma_1$ and by the level $\alpha_1$ for $\gamma > \gamma_1$.
Theorem [Sun-Bab-Pal'J14b]: the shrinkage estimator is unique if $\gamma \ge \alpha$.

Algorithm in µ and R
Surrogate function:
\[ L(\mu,R|\mu_t,R_t) = \frac{K+1}{2}\sum_{i=1}^N w_i(\mu_t,R_t)\,(x_i-\mu)^T R^{-1}(x_i-\mu) + \gamma\, w_t(\mu_t,R_t)\,(t-\mu)^T R^{-1}(t-\mu) + \Big(\frac{N}{2}+\alpha\Big)\log\det(R) + \alpha K\,\frac{\mathrm{Tr}\big(R^{-1}T\big)}{\mathrm{Tr}\big(R_t^{-1}T\big)}. \]
Update:
\[ \mu_{t+1} = \frac{(K+1)\sum_{i=1}^N w_i(\mu_t,R_t)\, x_i + 2\gamma\, w_t(\mu_t,R_t)\, t}{(K+1)\sum_{i=1}^N w_i(\mu_t,R_t) + 2\gamma\, w_t(\mu_t,R_t)}, \]
\[ R_{t+1} = \frac{K+1}{N+2\alpha}\sum_{i=1}^N w_i(\mu_t,R_t)\,\big(x_i-\mu_{t+1}\big)\big(x_i-\mu_{t+1}\big)^T + \frac{2\gamma}{N+2\alpha}\,w_t(\mu_t,R_t)\,\big(t-\mu_{t+1}\big)\big(t-\mu_{t+1}\big)^T + \frac{2\alpha K}{N+2\alpha}\,\frac{T}{\mathrm{Tr}\big(R_t^{-1}T\big)}. \]
Theorem [Sun-Bab-Pal'J14b]: under the existence conditions, the algorithm in $\mu$ and $R$ for the proposed shrinkage estimator converges to the unique solution.
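A sketch of this joint shrinkage iteration (the function name, the initialization at the prior, and the stopping rule are my own choices; t and T are NumPy arrays holding the prior mean and target):

```python
import numpy as np

def shrinkage_cauchy(X, t, T, alpha, gamma, max_iter=500, tol=1e-8):
    """Proposed joint mean/covariance shrinkage estimator:
    iterate the (mu, R) updates toward the prior (t, T)."""
    N, K = X.shape
    mu, R = t.copy(), T.copy()
    for _ in range(max_iter):
        Rinv = np.linalg.inv(R)
        diff = X - mu
        w = 1.0 / (1.0 + np.einsum('ij,jk,ik->i', diff, Rinv, diff))   # w_i(mu, R)
        wt = 1.0 / (1.0 + (mu - t) @ Rinv @ (mu - t))                   # w_t(mu, R)
        mu_new = (((K + 1) * (w[:, None] * X).sum(axis=0) + 2 * gamma * wt * t)
                  / ((K + 1) * w.sum() + 2 * gamma * wt))
        d_new = X - mu_new
        R_new = ((K + 1) * (w[:, None] * d_new).T @ d_new
                 + 2 * gamma * wt * np.outer(t - mu_new, t - mu_new)
                 + 2 * alpha * K * T / np.trace(Rinv @ T)) / (N + 2 * alpha)
        if (np.linalg.norm(R_new - R, 'fro') < tol
                and np.linalg.norm(mu_new - mu) < tol):
            return mu_new, R_new
        mu, R = mu_new, R_new
    return mu, R
```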
Algorithm in Σ
Consider the case $\alpha = \gamma$ and apply the transform
\[ \Sigma = \begin{bmatrix} R + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}, \qquad \bar{x}_i = [x_i;\, 1], \qquad \bar{t} = [t;\, 1]. \]
Equivalent loss function:
\[ L_{\mathrm{shrink}}(\Sigma) = \Big(\frac{N}{2}+\alpha\Big)\log\det(\Sigma) + \frac{K+1}{2}\sum_{i=1}^N \log\big(\bar{x}_i^T \Sigma^{-1} \bar{x}_i\big) + \alpha K \log \mathrm{Tr}\big(S^T \Sigma^{-1} S\, T\big) + \alpha \log\big(\bar{t}^T \Sigma^{-1} \bar{t}\big), \]
with $S = \begin{bmatrix} I_K \\ 0_{1\times K} \end{bmatrix}$. $L_{\mathrm{shrink}}(\Sigma)$ is scale-invariant.

Surrogate function:
\[ L(\Sigma|\Sigma_t) = \Big(\frac{N}{2}+\alpha\Big)\log\det(\Sigma) + \frac{K+1}{2}\sum_{i=1}^N \frac{\bar{x}_i^T \Sigma^{-1} \bar{x}_i}{\bar{x}_i^T \Sigma_t^{-1} \bar{x}_i} + \alpha\left(K\,\frac{\mathrm{Tr}\big(S^T \Sigma^{-1} S\,T\big)}{\mathrm{Tr}\big(S^T \Sigma_t^{-1} S\,T\big)} + \frac{\bar{t}^T \Sigma^{-1} \bar{t}}{\bar{t}^T \Sigma_t^{-1} \bar{t}}\right). \]
Update and normalization:
\[ \tilde{\Sigma}_{t+1} = \frac{K+1}{N+2\alpha}\sum_{i=1}^N \frac{\bar{x}_i \bar{x}_i^T}{\bar{x}_i^T \Sigma_t^{-1} \bar{x}_i} + \frac{2\alpha}{N+2\alpha}\left(\frac{K\, S\,T\,S^T}{\mathrm{Tr}\big(S^T \Sigma_t^{-1} S\,T\big)} + \frac{\bar{t}\bar{t}^T}{\bar{t}^T \Sigma_t^{-1} \bar{t}}\right), \qquad \Sigma_{t+1} = \tilde{\Sigma}_{t+1} / \big(\tilde{\Sigma}_{t+1}\big)_{K+1,K+1}. \]
Theorem [Sun-Bab-Pal'J14b]: under the existence conditions, which simplify to $N > K + 1 - 2\alpha$ for $\alpha = \gamma$, the algorithm in $\Sigma$ for the proposed shrinkage estimator converges to the unique solution.

Simulation
Parameters: $K = 10$, $\mu_0 = 0.1 \times \mathbf{1}_{K\times 1}$, $(R_0)_{ij} = 0.8^{|i-j|}$.
Error measurement: KL distance
\[ \mathrm{err}\big(\hat{\mu}, \hat{R}\big) = E\Big\{ D_{\mathrm{KL}}\big(\mathcal{N}(\hat{\mu},\hat{R})\,\|\,\mathcal{N}(\mu_0,R_0)\big) + D_{\mathrm{KL}}\big(\mathcal{N}(\mu_0,R_0)\,\|\,\mathcal{N}(\hat{\mu},\hat{R})\big) \Big\}. \]
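The symmetrized KL distance used as the error measure has a closed form for Gaussians; a small sketch (function names are mine):

```python
import numpy as np

def gauss_kl(mu1, R1, mu2, R2):
    """KL divergence D_KL( N(mu1, R1) || N(mu2, R2) )."""
    K = len(mu1)
    R2inv = np.linalg.inv(R2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(R2inv @ R1) + diff @ R2inv @ diff - K
                  + np.linalg.slogdet(R2)[1] - np.linalg.slogdet(R1)[1])

def symmetrized_kl(mu_hat, R_hat, mu0, R0):
    """Error measure from the slide: KL in both directions, summed."""
    return gauss_kl(mu_hat, R_hat, mu0, R0) + gauss_kl(mu0, R0, mu_hat, R_hat)
```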
Performance Comparison for Gaussian Data
Figure: KL distance versus the number of samples for data drawn from $\mathcal{N}(\mu_0, R_0)$, comparing the sample average, Cauchy, median + shrinkage Tyler, mean + shrinkage Tyler, diagonal shrinkage Cauchy, and shrinkage Cauchy estimators.

Performance Comparison for the t-distribution
Figure: KL distance versus the number of samples for data drawn from $t_\nu(\mu_0, R_0)$ with $\nu = 5$, comparing the sample average, maximum likelihood, Cauchy, median + shrinkage Tyler, mean + shrinkage Tyler, diagonal shrinkage Cauchy, and shrinkage Cauchy estimators.

Performance Comparison for Corrupted Gaussian Data
Figure: KL distance versus the number of samples for data drawn from $0.9 \times \mathcal{N}(\mu_0, R_0) + 0.1 \times \mathcal{N}(5 \times \mathbf{1}_{K\times 1}, I)$, comparing the same estimators.

Real Data Simulation
Minimum-variance portfolio.
- Training: S&P 500 index components, weekly log-returns, $K = 40$; estimate $R$ and construct the portfolio weights $w$.
- Parameter selection: choose the $\alpha$ that yields the minimum variance on the validation set.
- Collect half a year of portfolio returns over rolling train / validate / test windows.
Figure: out-of-sample portfolio variance versus the estimation window length for the sample average, Cauchy estimator, median + shrinkage Tyler, mean + shrinkage Tyler, and shrinkage Cauchy.

Summary
In this talk, we have discussed:
- robust mean-covariance estimation for heavy-tailed distributions;
- shrinkage estimation in the small-sample scenario.
Future work: parameter tuning; structured covariance estimation.

References
- R. Maronna, D. Martin, and V. Yohai, Robust Statistics: Theory and Methods, Wiley Series in Probability and Statistics, Wiley, 2006.
- D. E. Tyler, "A distribution-free M-estimator of multivariate scatter," Ann. Statist., vol. 15, no. 1, pp. 234-251, 1987.
- R. A. Maronna, "Robust M-estimators of multivariate location and scatter," Ann. Statist., vol. 4, no. 1, pp. 51-67, 1976.
- J. T. Kent and D. E. Tyler, "Redescending M-estimates of multivariate location and scatter," Ann. Statist., vol. 19, no. 4, pp. 2102-2119, 1991.
- J. T. Kent, D. E. Tyler, and Y. Vardi, "A curious likelihood identity for the multivariate t-distribution," Communications in Statistics - Simulation and Computation, vol. 23, no. 2, pp. 441-453, 1994.
- Y. Abramovich and N. Spencer, "Diagonally loaded normalised sample matrix inversion (LNSMI) for outlier-resistant adaptive filtering," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Honolulu, HI, 2007, vol. 3, pp. III-1105 to III-1108.
- Y. Chen, A. Wiesel, and A. Hero, "Robust shrinkage estimation of high-dimensional covariance matrices," IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4097-4107, 2011.
- A. Wiesel, "Unified framework to regularized covariance estimation in scaled Gaussian models," IEEE Trans. Signal Process., vol. 60, no. 1, pp. 29-38, 2012.
- Y. Sun, P. Babu, and D. Palomar, "Regularized Tyler's scatter estimator: Existence, uniqueness, and algorithms," arXiv preprint arXiv:1407.3079, 2014.
- Y. Sun, P. Babu, and D. Palomar, "Regularized robust estimation of mean and covariance matrix under heavy tails and outliers," submitted to IEEE Trans. Signal Process., 2014.

Thanks
For more information visit: http://www.ece.ust.hk/~palomar