Robust Nonparametric Regression
by Controlling Sparsity
Gonzalo Mateos and Georgios B. Giannakis
ECE Department, University of Minnesota
Acknowledgments: NSF grants CCF-0830480, CCF-1016605, EECS-0824007, EECS-1002180
May 24, 2011
Nonparametric regression
• Given a new input x, function estimation allows predicting the response y = f(x)
• Estimate the unknown f from a training data set T_N := {(x_i, y_i)}_{i=1}^N
• If one trusts data more than any parametric model
  ⇒ then go nonparametric regression: f lives in a (possibly infinite-dimensional) space of "smooth" functions
• Ill-posed problem
  – Workaround: regularization [Tikhonov'77], [Wahba'90]
    $\min_{f \in \mathcal{H}} \; \sum_{i=1}^{N} [y_i - f(x_i)]^2 + \mu \|f\|_{\mathcal{H}}^2$
  – RKHS $\mathcal{H}$ with reproducing kernel K(x, x') and norm $\|\cdot\|_{\mathcal{H}}$ (see the sketch below)
• Our focus
  – Nonparametric regression robust against outliers
  – Robustness by controlling sparsity
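By the representer theorem, the regularized estimate admits a finite-dimensional solution. Below is a minimal NumPy sketch of this baseline (non-robust) RKHS regression; the Gaussian kernel, its width, and the value of mu are illustrative assumptions rather than choices made on the slides.

```python
import numpy as np

def gaussian_kernel(X1, X2, width=0.5):
    """Gaussian reproducing kernel K(x, x') = exp(-||x - x'||^2 / (2*width^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def rkhs_fit(X, y, mu=0.1, width=0.5):
    """Regularized LS in an RKHS: min_f sum_i [y_i - f(x_i)]^2 + mu*||f||_H^2.
    By the representer theorem f(x) = sum_j beta_j K(x, x_j), with
    beta solving (K + mu*I) beta = y."""
    K = gaussian_kernel(X, X, width)
    return np.linalg.solve(K + mu * np.eye(len(y)), y)

# Usage: with X of shape (N, p) and y of shape (N,),
#   beta = rkhs_fit(X, y)
#   y_new = gaussian_kernel(X_new, X) @ beta   # predictions at new inputs
```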
Our work in context
• Noteworthy applications
  – Load curve data cleansing [Chen et al'10]
  – Spline-based PSD cartography [Bazerque et al'09]
• Robust nonparametric regression
  – Huber's function [Zhu et al'08]; no systematic way to select thresholds
• Robustness and sparsity in linear (parametric) regression
  – Huber's M-type estimator as Lasso [Fuchs'99]; contamination model
  – Bayesian framework [Jin-Rao'10], [Mitra et al'10]; rigid choice of λ
Variational LTS
• Least-trimmed squares (LTS) regression [Rousseeuw'87]
• Variational (V)LTS counterpart
  (VLTS)   $\min_{f \in \mathcal{H}} \; \sum_{i=1}^{s} r_{[i]}^2(f) + \mu \|f\|_{\mathcal{H}}^2$
  – $r_{[i]}^2(f)$ is the i-th order statistic among the squared residuals $r_1^2(f), \ldots, r_N^2(f)$, with $r_i(f) := y_i - f(x_i)$
  – The N - s largest residuals are discarded
• Q: How should we go about minimizing (VLTS)? It is nonconvex; existence of minimizer(s)?
• A: Try all $\binom{N}{s}$ subsamples of size s, solve, and pick the best (brute-force sketch below)
  – Simple but intractable beyond small problems
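To make the combinatorial answer concrete, here is a brute-force VLTS sketch reusing the gaussian_kernel and rkhs_fit helpers from the earlier sketch; the bookkeeping below is my own illustration of the enumeration, and it is only feasible for very small N since there are C(N, s) subsamples.

```python
from itertools import combinations
import numpy as np

def vlts_bruteforce(X, y, s, mu=0.1, width=0.5):
    """Exhaustive VLTS: fit the regularized RKHS estimate on every size-s subsample,
    evaluate the trimmed cost (s smallest squared residuals + RKHS penalty),
    and keep the best candidate."""
    best_cost, best = np.inf, None
    for idx in combinations(range(len(y)), s):
        idx = list(idx)
        beta = rkhs_fit(X[idx], y[idx], mu, width)                  # fit on the subsample
        K_ss = gaussian_kernel(X[idx], X[idx], width)
        r2 = (y - gaussian_kernel(X, X[idx], width) @ beta) ** 2    # squared residuals, all N points
        cost = np.sort(r2)[:s].sum() + mu * beta @ K_ss @ beta      # trimmed LS cost + penalty
        if cost < best_cost:
            best_cost, best = cost, (idx, beta)
    return best_cost, best
```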
Modeling outliers
• Outlier variables o_i s.t. o_i ≠ 0 if y_i is an outlier, and o_i = 0 otherwise
• Nominal data obey y_i = f(x_i) + ε_i; outliers follow something else
• Model accounting for both: y_i = f(x_i) + o_i + ε_i,  i = 1, ..., N
• Remarks
  – Both f and o := [o_1, ..., o_N]' are unknown
  – If outliers are sporadic, then the vector o is sparse!
• Natural (but intractable) nonconvex estimator: penalize the number of nonzero entries ||o||_0 (Lagrangian form (P0) next)
VLTS as sparse regression
• Lagrangian form
  (P0)   $\min_{f \in \mathcal{H},\, o} \; \sum_{i=1}^{N} [y_i - f(x_i) - o_i]^2 + \mu \|f\|_{\mathcal{H}}^2 + \lambda_0 \|o\|_0$
• Tuning parameter λ_0 controls sparsity in o (number of outliers)
• Proposition 1: If {f̂, ô} solves (P0) with λ_0 chosen s.t. ||ô||_0 = N - s, then f̂ solves (VLTS) too.
• The equivalence
  – Formally justifies the regression model y_i = f(x_i) + o_i + ε_i and its estimator (P0)
  – Ties sparse regression with robust estimation
Just relax!
• (P0) is NP-hard ⇒ relax ||o||_0 to the ℓ_1-norm ||o||_1
  (P1)   $\min_{f \in \mathcal{H},\, o} \; \sum_{i=1}^{N} [y_i - f(x_i) - o_i]^2 + \mu \|f\|_{\mathcal{H}}^2 + \lambda_1 \|o\|_1$
• (P1) is convex, and thus efficiently solved
• Role of the sparsity-controlling parameter λ_1 is central
• Q: Does (P1) yield robust estimates f̂?
  A: Yes! Huber's M-type estimator is a special case, where the threshold of Huber's loss equals λ_1/2
Alternating minimization
• (P1) is jointly convex in f and o ⇒ alternating minimization (AM) solver for (P1) converges (sketch below)
  – f-step: regularized LS fit to the outlier-compensated data y - o
  – o-step: soft-thresholding of the residuals y_i - f(x_i)
• Remarks
  – Single Cholesky factorization of (K + μI_N), reused across iterations
  – Soft-thresholding with threshold λ_1/2
  – Reveals the intertwining between outlier identification and function estimation with outlier-compensated data
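A sketch of the AM iterations, assuming the finite-dimensional parameterization f(x_i) = [K beta]_i from the representer theorem; the number of iterations is arbitrary, and the lambda_1/2 threshold matches the soft-thresholding remark above.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def soft(r, t):
    """Soft-thresholding operator S_t(r) = sign(r) * max(|r| - t, 0)."""
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

def am_solver(K, y, mu, lam, n_iter=100):
    """Alternating minimization for (P1):
        min_{beta, o} ||y - o - K beta||^2 + mu * beta' K beta + lam * ||o||_1
    f-step: regularized LS with the outlier-compensated data y - o,
            reusing a single Cholesky factorization of (K + mu*I).
    o-step: entrywise soft-thresholding of the residuals y - K beta."""
    N = len(y)
    c_and_low = cho_factor(K + mu * np.eye(N))      # factor once, reuse every iteration
    o = np.zeros(N)
    for _ in range(n_iter):
        beta = cho_solve(c_and_low, y - o)          # f-step
        o = soft(y - K @ beta, lam / 2.0)           # o-step
    return beta, o
```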
Lassoing outliers
• Alternative to AM
• Proposition 2: Minimizers ô of (P1) are fully determined by solving the Lasso [Tibshirani'94]
    $\min_{o} \; \|\tilde{y} - \tilde{X} o\|_2^2 + \lambda_1 \|o\|_1$
  w/ $\tilde{y} := \tilde{X} y$ and $\tilde{X}$ determined by the kernel matrix K and μ (see the sketch below)
• Enables effective methods to select λ_1
  – Lasso solvers return the entire robustification path (RP)
  – Cross-validation (CV) fails with multiple outliers [Hampel'86]
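One way to arrive at the Lasso in Proposition 2 under the same kernel parameterization: eliminating f for a fixed o leaves the quadratic form (y - o)'(I - S(mu))(y - o), with S(mu) = K(K + mu*I)^{-1} the smoother matrix, so any square root of I - S(mu) can serve as X-tilde. The sketch below follows that derivation with scikit-learn's Lasso; the exact matrices used in the talk may be written differently.

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.linear_model import Lasso

def outliers_via_lasso(K, y, mu, lam):
    """Recast (P1) as a Lasso in the outlier vector o:
        min_o ||y_tilde - X_tilde o||^2 + lam * ||o||_1,
    with X_tilde a square root of I - S(mu) and y_tilde = X_tilde y."""
    N = len(y)
    S = K @ np.linalg.inv(K + mu * np.eye(N))       # smoother (hat) matrix S(mu)
    X_tilde = np.real(sqrtm(np.eye(N) - S))
    y_tilde = X_tilde @ y
    # scikit-learn's Lasso minimizes (1/(2N))||y - Xw||^2 + alpha*||w||_1,
    # so alpha = lam / (2N) reproduces the objective above up to scaling.
    lasso = Lasso(alpha=lam / (2.0 * N), fit_intercept=False)
    return lasso.fit(X_tilde, y_tilde).coef_        # estimated outlier vector o_hat
```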
Robustification paths
• LARS returns the whole RP [Efron'03]
  – Same cost as a single LS fit
  – Lasso path of solutions is piecewise linear
  [Plot: Lasso coefficient paths ("Coeffs.") vs. λ_1]
• Lasso is simple in the scalar case
  – Coordinate descent is fast! [Friedman'07]
  – Exploits warm starts, sparsity
• Other solvers: SpaRSA [Wright et al'09], SPAMS [Mairal et al'10]
• Leverage these solvers: for each μ on a grid of μ values, compute the RP over a grid of λ_1 values ⇒ a 2-D grid of candidate (λ_1, μ) pairs (sketch below)
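A sketch of sweeping the 2-D grid. Scikit-learn's lasso_path (warm-started coordinate descent) stands in for the LARS/SpaRSA/SPAMS solvers named on the slide, it reuses the X-tilde construction from the previous sketch, and the grid values and the mapping of lambda_1 to sklearn's alpha scaling are my own assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.linear_model import lasso_path

def robustification_paths(K, y, mu_grid, lam_grid):
    """For each mu on the grid, sweep lambda_1 and record the entire path of
    outlier estimates (one column of `coefs` per lambda_1 value)."""
    N = len(y)
    alphas = np.sort(np.asarray(lam_grid))[::-1] / (2.0 * N)   # largest penalty first
    paths = {}
    for mu in mu_grid:
        S = K @ np.linalg.inv(K + mu * np.eye(N))
        X_tilde = np.real(sqrtm(np.eye(N) - S))
        _, coefs, _ = lasso_path(X_tilde, X_tilde @ y, alphas=alphas)
        paths[mu] = coefs                                       # shape (N, len(lam_grid))
    return paths
```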
Selecting λ_1 and μ
• Relies on the RP and knowledge of the data model
• Number of outliers known: from the RP, obtain the range of λ_1 s.t. ||ô||_0 equals the (known) number of outliers; discard the identified outliers, and use CV to determine μ
• Variance of the nominal noise known: from the RP, for each (λ_1, μ) pair on the grid, obtain an entry of the sample variance matrix of the residuals; the best (λ_1, μ) are s.t. this sample variance is closest to the known variance
• Variance of the nominal noise unknown: replace it above with a robust estimate, e.g., via the median absolute deviation (MAD) — see the sketch below
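A sketch of the variance-matching rule, reusing the am_solver from the AM sketch: for every (mu, lambda_1) pair on the grid, refit, compute the sample variance of the residuals over the samples not flagged as outliers, and keep the pair whose variance is closest to the nominal sigma^2; when sigma^2 is unknown, fall back to a MAD-based robust estimate. The initialization used for the MAD estimate and the grid handling are illustrative choices.

```python
import numpy as np

def select_lambda_mu(K, y, mu_grid, lam_grid, sigma2=None):
    """Grid search for (mu, lambda_1) guided by the nominal noise variance."""
    if sigma2 is None:
        # Robust variance estimate from an initial fit: sigma_hat = 1.4826 * MAD.
        beta0, o0 = am_solver(K, y, mu_grid[0], lam_grid[len(lam_grid) // 2])
        r0 = y - K @ beta0 - o0
        sigma2 = (1.4826 * np.median(np.abs(r0 - np.median(r0)))) ** 2
    best, best_gap = None, np.inf
    for mu in mu_grid:
        for lam in lam_grid:
            beta, o = am_solver(K, y, mu, lam)
            inliers = np.isclose(o, 0.0)              # samples not flagged as outliers
            if not inliers.any():
                continue
            var_hat = np.var(y[inliers] - (K @ beta)[inliers])
            if abs(var_hat - sigma2) < best_gap:
                best, best_gap = (mu, lam), abs(var_hat - sigma2)
    return best
```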
Nonconvex regularization
• Nonconvex penalty terms approximate ||o||_0 in (P0) better than ||o||_1
• Options: SCAD [Fan-Li'01], or sum-of-logs [Candes et al'08]
• Iterative linearization-minimization of the sum-of-logs penalty around the previous iterate ⇒ iteratively reweighted version of (P1), with per-entry weights 1/(|ô_i| + δ) (sketch below)
• Remarks
  – Initialize with the solution of (P1)
  – Bias reduction (cf. adaptive Lasso [Zou'06])
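A sketch of the refinement as iteratively reweighted L1 (the standard linearization of the sum-of-logs surrogate), reusing soft and am_solver from the AM sketch; delta, the number of rounds, and the inner iteration count are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def refine_reweighted_l1(K, y, mu, lam, n_rounds=3, delta=1e-3, n_inner=100):
    """Linearizing sum_i log(|o_i| + delta) around the previous iterate turns each
    round into a weighted version of (P1), with per-entry weights 1/(|o_i| + delta)."""
    beta, o = am_solver(K, y, mu, lam)              # warm start with the (P1) solution
    N = len(y)
    c_and_low = cho_factor(K + mu * np.eye(N))
    for _ in range(n_rounds):
        w = 1.0 / (np.abs(o) + delta)               # weights from the linearization
        for _ in range(n_inner):                    # AM solver for the weighted (P1)
            beta = cho_solve(c_and_low, y - o)
            o = soft(y - K @ beta, lam * w / 2.0)   # entrywise weighted thresholds
    return beta, o
```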
Robust thin-plate splines
• Specialize to thin-plate splines [Duchon'77], [Wahba'80]
• Smoothing penalty is only a seminorm in the corresponding function space
• Solution: $f(x) = \sum_{j=1}^{N} \beta_j \, \phi(\|x - x_j\|)$ plus a term from the penalty's null space
  – Radial basis function $\phi(r) = r^2 \log r$
  – Augment w/ a member of the null space of the penalty (affine functions)
• Given o, the unknowns (the β_j's and the null-space coefficients) are found in closed form (sketch below)
• Still, Proposition 2 holds for appropriate ỹ and X̃
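A sketch of the closed-form thin-plate spline fit for 2-D inputs: the radial part uses phi(r) = r^2 log r and the null-space part is affine, solved from the standard augmented linear system. For the robust fit described on the slide, y would be replaced by the outlier-compensated y - o; the value of mu below is illustrative.

```python
import numpy as np

def tps_basis(r):
    """Thin-plate radial basis phi(r) = r^2 * log(r), with phi(0) = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

def tps_fit(X, y, mu=1e-3):
    """Closed-form thin-plate spline smoothing:
        f(x) = sum_j beta_j * phi(||x - x_j||) + alpha_0 + alpha' x,
    where the affine part spans the null space of the smoothness penalty."""
    N = X.shape[0]
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = tps_basis(r)
    P = np.hstack([np.ones((N, 1)), X])              # null-space (affine) block
    q = P.shape[1]
    A = np.block([[K + mu * np.eye(N), P],
                  [P.T, np.zeros((q, q))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(q)]))
    return sol[:N], sol[N:]                          # (beta, affine coefficients)
```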
Simulation setup
• Training set T_N: noisy samples of a Gaussian mixture (the true function), with inputs x_i drawn i.i.d.
• Nominal data: y_i = f(x_i) + ε_i, with ε_i i.i.d. (variance known)
• Outliers: o_i ≠ 0 drawn i.i.d. for a subset of the training examples
  [Plots: true function and training data]
Robustification paths
• Grid parameters: a grid of λ_1 values and a grid of μ values
• Paths obtained using SpaRSA [Wright et al'09]
  [Plot: robustification paths, with outlier and inlier coefficients distinguished]
Results
  [Plots: true function, robust predictions, nonrobust predictions, refined predictions]
• Effectiveness in rejecting outliers is apparent
Generalization capability
• In all cases, 100% outlier identification success rate
• Figures of merit
  – Training error (on the training set)
  – Test error (on a held-out test set)
• Nonconvex refinement leads to consistently lower test error
Load curve data cleansing
• Load curve: electric power consumption recorded periodically
• Reliable data: key to realizing the smart grid vision
• B-splines for load curve prediction and denoising [Chen et al'10]
• Deviation from nominal models (outliers)
  – Faulty meters, communication errors
  – Unscheduled maintenance, strikes, sporting events
  [Plot: Uruguay's aggregate power consumption (MW)]
Real data tests
  [Plots: robust, nonrobust, and refined predictions on the load curve data]
Concluding summary
• Robust nonparametric regression
  – VLTS as ℓ_0-(pseudo)norm regularized regression (NP-hard)
  – Convex relaxation ⇒ variational M-type estimator ⇒ Lasso
• Controlling sparsity amounts to controlling the number of outliers
  – Sparsity-controlling role of λ_1 is central
• Selection of λ_1 and μ using the Lasso robustification paths
  – Different options dictated by the available knowledge of the data model
• Refinement via nonconvex penalty terms
  – Bias reduction and improved generalization capability
• Real data tests for load curve cleansing