
Sparsity Control for Robustness and Social Data Analysis
Gonzalo Mateos
ECE Department, University of Minnesota, Minneapolis, MN
December 9, 2011
Acknowledgments: Profs. Georgios B. Giannakis, M. Kaveh, G. Sapiro, N. Sidiropoulos, and N. Waller; MURI (AFOSR FA9550-10-1-0567) grant

Learning from “Big Data”
“Data are widely available, what is scarce is the ability to extract wisdom from them”
Hal Varian, Google’s chief economist
[Figure: BIG data is Fast, Productive, Ubiquitous, Revealing, Smart, Messy]
K. Cukier, ``Harnessing the data deluge,'' Nov. 2011.

Social-Computational Systems (SoCS)
• Complex systems of people and computers
• The vision: preference measurement (PM), analysis, management
  • Understand and engineer SoCS
• The means: leverage dual role of sparsity
  • Complexity control through variable selection
  • Robustness to outliers

Conjoint analysis
• Marketing, healthcare, psychology [Green-Srinivasan’78]
• Optimal design and positioning of new products
• Strategy: describe products by a set of attributes, or ‘parts’
• Goal: learn consumer’s utility function from preference data
• Linear utilities: ‘How much is each part worth?’
• Success story [Wind et al’89]
  • Attributes: room size, TV options, restaurant, transportation

Modeling preliminaries
• Respondents (e.g., consumers) rate profiles; each profile comprises a set of attributes
• Linear utility: estimate vector of partworths $\mathbf{w}$
• Conjoint data collection formats
  (M1) Metric ratings: $y_i = \mathbf{x}_i^\top \mathbf{w} + \varepsilon_i$, with rating $y_i$, profile $\mathbf{x}_i$, and noise $\varepsilon_i$
  (M2) Choice-based conjoint data:
• Online SoCS-based preference data grows exponentially
  • Inconsistent/corrupted/irrelevant data: outliers

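Robustifying PM
• Least-trimmed squares [Rousseeuw’87]
(LTS) $\hat{\mathbf{w}}_{LTS} = \arg\min_{\mathbf{w}} \sum_{i=1}^{s} r^2_{[i]}(\mathbf{w})$
• $r^2_{[i]}(\mathbf{w})$ is the $i$-th order statistic among the squared residuals $r_1^2(\mathbf{w}), \ldots, r_N^2(\mathbf{w})$, with $r_i(\mathbf{w}) = y_i - \mathbf{x}_i^\top \mathbf{w}$
• The $N - s$ largest residuals are discarded
Q: How should we go about minimizing the nonconvex (LTS)?
A: Try all $\binom{N}{s}$ subsets of size $s$, solve least squares on each, and pick the best
• Simple but intractable beyond small problems
• Near-optimal solvers [Rousseeuw’06], RANSAC [Fischler-Bolles’81]
G. Mateos, V. Kekatos, and G. B. Giannakis, ``Exploiting sparsity in model residuals for robust conjoint analysis,'' Marketing Sci., Dec. 2011 (submitted).
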
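The exhaustive answer to the question above can be sketched in a few lines. This is a minimal illustration, assuming the linear model of (M1) and a least-squares fit on each subset; the function name `lts_brute_force` and the toy data are hypothetical, not taken from the talk.

```python
from itertools import combinations

import numpy as np


def lts_brute_force(X, y, s):
    """Exhaustive LTS: least squares on every size-s subset, keep the best fit."""
    best_cost, best_w = np.inf, None
    for subset in combinations(range(len(y)), s):
        idx = list(subset)
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        cost = np.sum((y[idx] - X[idx] @ w) ** 2)  # residuals kept by this subset
        if cost < best_cost:
            best_cost, best_w = cost, w
    return best_w, best_cost


# Toy data (assumed): N = 12 ratings, 3 of them grossly corrupted.
rng = np.random.default_rng(0)
N, p, s = 12, 2, 9
X = rng.normal(size=(N, p))
w_true = np.array([1.0, -2.0])
y = X @ w_true + 0.01 * rng.normal(size=N)
y[:3] += 10.0

w_lts, _ = lts_brute_force(X, y, s)
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
err_lts = np.linalg.norm(w_lts - w_true)
err_ls = np.linalg.norm(w_ls - w_true)
```

Even this toy case visits 220 subsets; the count grows combinatorially with $N$, which is why the slide calls the approach intractable beyond small problems.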
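Modeling outliers
• Outlier variables $o_i$ s.t. $o_i \neq 0$ if $y_i$ is an outlier, $o_i = 0$ otherwise
  $y_i = \mathbf{x}_i^\top \mathbf{w} + o_i + \varepsilon_i$
• Nominal ratings obey (M1); outliers obey something else
  • $\epsilon$-contamination [Fuchs’99], Bayesian model [Jin-Rao’10]
• Both $\mathbf{w}$ and $\mathbf{o}$ unknown, $\mathbf{o}$ typically sparse!
• Natural (but intractable) nonconvex estimator
  $\min_{\mathbf{w}, \mathbf{o}} \sum_{i=1}^{N} (y_i - \mathbf{x}_i^\top \mathbf{w} - o_i)^2$ s.t. $\|\mathbf{o}\|_0 \le \tau$
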
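LTS as sparse regression
• Lagrangian form
(P0) $\min_{\mathbf{w}, \mathbf{o}} \sum_{i=1}^{N} (y_i - \mathbf{x}_i^\top \mathbf{w} - o_i)^2 + \lambda_0 \|\mathbf{o}\|_0$
• Tuning parameter $\lambda_0 \ge 0$ controls sparsity in $\mathbf{o}$, i.e., the number of outliers
Proposition 1: If $\{\hat{\mathbf{w}}, \hat{\mathbf{o}}\}$ solves (P0) with $\lambda_0$ chosen s.t. $\|\hat{\mathbf{o}}\|_0 = N - s$, then $\hat{\mathbf{w}} = \hat{\mathbf{w}}_{LTS}$ in (LTS).
• Formally justifies the preference model and its estimator (P0)
• Ties sparse regression with robust estimation
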
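Just relax!
• (P0) is NP-hard: relax $\|\mathbf{o}\|_0$ to $\|\mathbf{o}\|_1$, e.g., [Tropp’06]
(P1) $\min_{\mathbf{w}, \mathbf{o}} \sum_{i=1}^{N} (y_i - \mathbf{x}_i^\top \mathbf{w} - o_i)^2 + \lambda_1 \|\mathbf{o}\|_1$
• (P1) convex, and thus efficiently solved
• Role of sparsity-controlling $\lambda_1$ is central
Q: Does (P1) yield robust estimates $\hat{\mathbf{w}}$?
A: Yes! Huber estimator is a special case
  $\min_{\mathbf{w}} \sum_{i=1}^{N} \rho(y_i - \mathbf{x}_i^\top \mathbf{w})$, where $\rho(r) = r^2$ if $|r| \le \lambda_1/2$ and $\rho(r) = \lambda_1 |r| - \lambda_1^2/4$ otherwise
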
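The link between (P1) and the Huber estimator can be checked numerically. The sketch below assumes (P1) is the least-squares cost in $(\mathbf{w}, \mathbf{o})$ plus $\lambda_1 \|\mathbf{o}\|_1$, and solves it by block coordinate descent (least squares in $\mathbf{w}$, soft-thresholding in $\mathbf{o}$); the solver choice, function names, and toy data are illustrative assumptions rather than the method used in the talk.

```python
import numpy as np


def soft_threshold(r, tau):
    """Entrywise soft-thresholding operator."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)


def solve_p1(X, y, lam, n_iter=500):
    """Block coordinate descent on sum_i (y_i - x_i'w - o_i)^2 + lam * ||o||_1."""
    o = np.zeros_like(y)
    for _ in range(n_iter):
        w, *_ = np.linalg.lstsq(X, y - o, rcond=None)  # w-step: least squares
        o = soft_threshold(y - X @ w, lam / 2.0)        # o-step: closed form
    return w, o


def p1_cost(X, y, w, o, lam):
    return np.sum((y - X @ w - o) ** 2) + lam * np.sum(np.abs(o))


def huber_cost(X, y, w, lam):
    r = np.abs(y - X @ w)
    return np.sum(np.where(r <= lam / 2, r**2, lam * r - lam**2 / 4))


# Toy data (assumed): 100 ratings, the first 10 shifted by +8.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)
y[:10] += 8.0

lam = 1.0
w_hat, o_hat = solve_p1(X, y, lam)
cost_p1 = p1_cost(X, y, w_hat, o_hat, lam)
cost_huber = huber_cost(X, y, w_hat, lam)
```

Because the $\mathbf{o}$-step is the exact minimizer for a given $\mathbf{w}$, the (P1) cost at the returned pair coincides with the Huber cost at `w_hat`, and the nonzero entries of `o_hat` flag the suspected outliers.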
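Lassoing outliers
• Suffices to solve Lasso [Tibshirani’94]
Proposition 2: Minimizers $\{\hat{\mathbf{w}}, \hat{\mathbf{o}}\}$ of (P1) are $\hat{\mathbf{o}} = \arg\min_{\mathbf{o}} \|(\mathbf{I} - \mathbf{P}_X)(\mathbf{y} - \mathbf{o})\|_2^2 + \lambda_1 \|\mathbf{o}\|_1$, $\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top (\mathbf{y} - \hat{\mathbf{o}})$, where $\mathbf{X}$ stacks the profiles $\mathbf{x}_i^\top$ and $\mathbf{P}_X$ projects onto its column space
• Data-driven methods to select $\lambda_1$
• Lasso solvers return entire robustification path (RP)
[Figure: robustification path; outlier coefficients versus decreasing $\lambda_1$]
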
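A robustification path can be traced with any Lasso solver. The sketch below assumes the outlier Lasso obtained by eliminating $\mathbf{w}$ from an $\ell_1$-penalized least-squares cost, i.e., $\min_{\mathbf{o}} \|(\mathbf{I} - \mathbf{P}_X)(\mathbf{y} - \mathbf{o})\|_2^2 + \lambda \|\mathbf{o}\|_1$ with $\mathbf{X}$ of full column rank, and uses a plain proximal-gradient (ISTA) loop with warm starts; the helper names and the grid of $\lambda$ values are hypothetical.

```python
import numpy as np


def soft_threshold(r, tau):
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)


def residual_projector(X):
    """I - P_X: projector onto the orthogonal complement of range(X)."""
    return np.eye(X.shape[0]) - X @ np.linalg.pinv(X)


def robustification_path(X, y, lams, n_iter=300):
    """Outlier estimates for each lam in lams (decreasing), warm-started."""
    A = residual_projector(X)
    o = np.zeros(len(y))
    path = []
    for lam in lams:
        for _ in range(n_iter):
            # ISTA step: the gradient is -2A(y - o) with Lipschitz constant 2,
            # so a step of size 1/2 followed by thresholding at lam/2.
            o = soft_threshold(o + A @ (y - o), lam / 2.0)
        path.append(o.copy())
    return np.array(path)


# Toy data (assumed): 60 ratings, the first 5 shifted by +10.
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=60)
y[:5] += 10.0

lam_max = 2.0 * np.max(np.abs(residual_projector(X) @ y))  # all-zero solution
lams = lam_max * np.array([1.0, 0.5, 0.2, 0.05])
path = robustification_path(X, y, lams)
top5 = set(np.argsort(-np.abs(path[2]))[:5].tolist())
```

Each row of `path` is one point on the path: at `lam_max` no datum is declared an outlier, and entries enter the support as $\lambda$ decreases.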
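Nonconvex regularization
• Nonconvex penalty terms approximate $\|\mathbf{o}\|_0$ better in (P0)
• Options: SCAD [Fan-Li’01], or sum-of-logs [Candes et al’08]
• Iterative linearization-minimization of $\sum_i \log(|o_i| + \delta)$ around the current iterate $\mathbf{o}^{(k)}$
• Initialize with the Lasso estimate $\hat{\mathbf{o}}$ from (P1), use weights $1/(|\hat{o}_i| + \delta)$
• Bias reduction (cf. adaptive Lasso [Zou’06])
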
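The iterative linearization-minimization idea amounts to a sequence of weighted Lasso problems. The following sketch assumes the sum-of-logs surrogate $\sum_i \log(|o_i| + \delta)$, whose linearization yields weights $1/(|o_i| + \delta)$, and reuses a simple alternating solver; the values of `delta`, `lam`, and the iteration counts are illustrative assumptions.

```python
import numpy as np


def soft_threshold(r, tau):
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)


def reweighted_outlier_fit(X, y, lam, delta=0.1, n_outer=5, n_inner=200):
    """Reweighted l1: the first outer pass (unit weights) is the plain Lasso."""
    weights = np.ones_like(y)
    o = np.zeros_like(y)
    for _ in range(n_outer):
        for _ in range(n_inner):
            w, *_ = np.linalg.lstsq(X, y - o, rcond=None)
            o = soft_threshold(y - X @ w, lam * weights / 2.0)
        # Linearizing sum_i log(|o_i| + delta) around o gives these weights.
        weights = 1.0 / (np.abs(o) + delta)
    return w, o


# Toy data (assumed): 100 ratings, the first 10 shifted by +8.
rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)
y[:10] += 8.0

_, o_l1 = reweighted_outlier_fit(X, y, lam=1.0, n_outer=1)  # Lasso only
_, o_rw = reweighted_outlier_fit(X, y, lam=1.0)             # refined
bias_l1 = np.mean(np.abs(o_l1[:10] - 8.0))
bias_rw = np.mean(np.abs(o_rw[:10] - 8.0))
```

Large outlier estimates receive small weights in later passes, so they are shrunk less, which is the bias reduction the slide refers to.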
Comparison with RANSAC
[Figure: comparison with RANSAC on i.i.d. data; nominal and outlier models]

Nonparametric regression
• Interactions among attributes?
  • Not captured by the linear utility $\mathbf{x}^\top \mathbf{w}$
  • Driven by complex mechanisms that are hard to model
• If one trusts data more than any parametric model
  • Go nonparametric regression: the utility $f$ lives in a space $\mathcal{H}$ of “smooth” functions
• Ill-posed problem
  • Workaround: regularization [Tikhonov’77], [Wahba’90]
  • RKHS $\mathcal{H}$ with kernel $K(\cdot, \cdot)$ and norm $\|\cdot\|_{\mathcal{H}}$

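A kernel-based counterpart of the outlier-aware estimator can be sketched as follows. The code assumes the cost $\sum_i (y_i - f(x_i) - o_i)^2 + \mu \|f\|_{\mathcal{H}}^2 + \lambda \|\mathbf{o}\|_1$ with a Gaussian kernel, and alternates kernel ridge regression on outlier-compensated data with soft-thresholding of the residuals; the cost, the kernel, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(r, tau):
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)


def gaussian_kernel(a, b, width):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * width**2))


def robust_kernel_fit(x, y, mu, lam, width, n_iter=200):
    """Alternate kernel ridge regression (f-step) and soft-thresholding (o-step)."""
    K = gaussian_kernel(x, x, width)
    M = np.linalg.inv(K + mu * np.eye(len(x)))  # reused in every f-step
    o = np.zeros_like(y)
    for _ in range(n_iter):
        alpha = M @ (y - o)                       # f = sum_j alpha_j K(., x_j)
        o = soft_threshold(y - K @ alpha, lam / 2.0)
    return alpha, o


# Toy data (assumed): noisy sine with 8 samples shifted by +3.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 80)
f_true = np.sin(2.0 * np.pi * x)
y = f_true + 0.05 * rng.normal(size=80)
out_idx = np.arange(5, 80, 10)
y[out_idx] += 3.0

K = gaussian_kernel(x, x, 0.1)
alpha_r, o_hat = robust_kernel_fit(x, y, mu=0.1, lam=1.0, width=0.1)
alpha_n = np.linalg.solve(K + 0.1 * np.eye(80), y)  # nonrobust kernel ridge
mse_robust = np.mean((K @ alpha_r - f_true) ** 2)
mse_plain = np.mean((K @ alpha_n - f_true) ** 2)
```

The representer theorem reduces the search over $\mathcal{H}$ to the coefficient vector `alpha`, so each iteration costs one matrix-vector product once the regularized kernel matrix has been inverted.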
Function approximation
[Figure panels: true function; nonrobust predictions; robust predictions; refined predictions]
• Effectiveness in rejecting outliers is apparent
G. Mateos and G. B. Giannakis, ``Robust nonparametric regression via sparsity control with application to load curve data cleansing,'' IEEE Trans. Signal Process., 2012.

Load curve data cleansing
• Load curve: electric power consumption recorded periodically
• Reliable data: key to realize smart grid vision [Hauser’09]
• Faulty meters, communication errors
• Unscheduled maintenance, strikes, sport events
• B-splines for load curve prediction and denoising [Chen et al’10]
[Figure: Uruguay’s power consumption (MW)]

NorthWrite data
• Energy consumption of a government building (’05-’10)
• Robust smoothing spline estimator, … hours
• Outliers: “Building operational transition shoulder periods”
• No manual labeling of outliers [Chen et al’10]
Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky

Principal Component Analysis
• Motivation: (statistical) learning from high-dimensional data
  • Examples: DNA microarray, traffic surveillance
• Principal component analysis (PCA) [Pearson’1901]
  • Extraction of low-dimensional data structure
  • Data compression and reconstruction
• PCA is non-robust to outliers [Jolliffe’86]
• Our goal: robustify PCA by controlling outlier sparsity

Our work in context
• Contemporary applications tied to SoCS
  • Anomaly detection in IP networks [Huang et al’07], [Kim et al’09]
  • Video surveillance, e.g., [Oliver et al’99]
  • Matrix completion for collaborative filtering, e.g., [Candes et al’09]
• Robust PCA
  • Robust covariance matrix estimators [Campbell’80], [Huber’81]
  • Computer vision [Xu-Yuille’95], [De la Torre-Black’03]
  • Low-rank matrix recovery from sparse errors, e.g., [Wright et al’09]

PCA formulations
• Training data
• Minimum reconstruction error
  • Compression operator
  • Reconstruction operator
• Maximum variance
• Component analysis model
Solution: dominant eigenvectors of the sample covariance matrix

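The compression and reconstruction operators of standard PCA can be illustrated with an SVD. This is a minimal sketch on synthetic data; the function name `pca` and the data dimensions are assumptions, and the final check only confirms the textbook fact that the SVD-based subspace matches the dominant eigenvectors of the sample covariance.

```python
import numpy as np


def pca(Y, q):
    """Return the sample mean and the q dominant principal directions."""
    m = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - m, full_matrices=False)
    return m, Vt[:q].T


# Toy data (assumed): 200 samples near a 2-dimensional subspace of R^6.
rng = np.random.default_rng(6)
Y = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
Y += 0.01 * rng.normal(size=Y.shape)

m, U = pca(Y, q=2)
S = (Y - m) @ U          # compression: project onto the principal subspace
Y_hat = m + S @ U.T      # reconstruction from the compressed coordinates
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y - m)

# Cross-check against the eigenvectors of the sample covariance matrix.
cov = (Y - m).T @ (Y - m) / len(Y)
_, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
V = evecs[:, -2:]
subspace_gap = np.linalg.norm(U @ U.T - V @ V.T)
```

Here `S` plays the role of the compressed data and `Y_hat` of its reconstruction; both use the same orthonormal basis `U`.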
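Robustifying PCA
• Outlier-aware model: $\mathbf{y}_n = \mathbf{m} + \mathbf{U}\mathbf{s}_n + \mathbf{o}_n + \mathbf{e}_n$, with $\mathbf{o}_n \neq \mathbf{0}$ only if $\mathbf{y}_n$ is an outlier
  • Interpret: blind preference model with latent profiles
(P2) $\min_{\mathbf{m}, \mathbf{U}, \mathbf{S}, \mathbf{O}} \|\mathbf{Y} - \mathbf{1}\mathbf{m}^\top - \mathbf{S}\mathbf{U}^\top - \mathbf{O}\|_F^2 + \lambda_2 \sum_{n=1}^{N} \|\mathbf{o}_n\|_2$, where the rows of $\mathbf{Y}$, $\mathbf{S}$, and $\mathbf{O}$ are $\mathbf{y}_n^\top$, $\mathbf{s}_n^\top$, and $\mathbf{o}_n^\top$
• $\ell_0$-norm counterpart tied to (LTS PCA)
• (P2) subsumes optimal (vector) Huber
• $\ell_1$-norm regularization for entry-wise outliers
G. Mateos and G. B. Giannakis, ``Robust PCA as bilinear decomposition with outlier sparsity regularization,'' IEEE Trans. Signal Process., Nov. 2011 (submitted).
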
Alternating minimization
Alg. 1: alternating minimization of (P2)
• Low-rank factor update: SVD of outlier-compensated data
• Outlier matrix update: row-wise vector soft-thresholding
Proposition 3: Alg. 1’s iterates converge to a stationary point of (P2).

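The two updates named above can be sketched in NumPy. The code assumes (P2) has the form $\|\mathbf{Y} - \mathbf{1}\mathbf{m}^\top - \mathbf{S}\mathbf{U}^\top - \mathbf{O}\|_F^2 + \lambda \sum_n \|\mathbf{o}_n\|_2$, with one outlier vector $\mathbf{o}_n$ per data row; this form, the function name `robust_pca`, and the toy data are assumptions for illustration, not a transcription of Alg. 1.

```python
import numpy as np


def robust_pca(Y, q, lam, n_iter=100):
    """Alternate a rank-q SVD fit with row-wise vector soft-thresholding."""
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        # Low-rank update: PCA (mean + truncated SVD) of outlier-compensated data.
        Z = Y - O
        m = Z.mean(axis=0)
        Uz, sv, Vt = np.linalg.svd(Z - m, full_matrices=False)
        L = m + (Uz[:, :q] * sv[:q]) @ Vt[:q]
        # Outlier update: shrink each residual row toward zero by lam/2 in norm.
        R = Y - L
        norms = np.maximum(np.linalg.norm(R, axis=1, keepdims=True), 1e-12)
        O = R * np.maximum(1.0 - lam / (2.0 * norms), 0.0)
    return m, Vt[:q].T, O


# Toy data (assumed): 100 samples near a 2-dim subspace, first 5 rows corrupted.
rng = np.random.default_rng(3)
N, p, q = 100, 10, 2
Y = rng.normal(size=(N, q)) @ rng.normal(size=(q, p))
Y += 0.05 * rng.normal(size=(N, p))
Y[:5] += 2.0 * rng.normal(size=(5, p))

m_hat, U_hat, O_hat = robust_pca(Y, q, lam=2.0)
row_norms = np.linalg.norm(O_hat, axis=1)
flagged = set(np.argsort(-row_norms)[:5].tolist())
```

Rows of `O_hat` with nonzero norm mark the data vectors treated as outliers; the threshold `lam / 2` sets how large a residual must be before a row is flagged.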
Video surveillance
[Figure panels: original; PCA; robust PCA; ‘outliers’]
Data: http://www.cs.cmu.edu/~ftorre/

Big Five personality factors
• Five dimensions of personality traits [Goldberg’93], [Costa-McCrae’92]
  • Discovered through factor analysis
  • WEIRD subjects
• Big Five Inventory (BFI)
  • Measure the Big Five
  • Short questionnaire (44 items)
  • Rate 1-5, e.g., ‘I see myself as someone who… is talkative’, ‘…is full of energy’
Handbook of personality: Theory and research, O. P. John, R. W. Robins, and L. A. Pervin, Eds. New York, NY: Guilford Press, 2008.

BFI data
• Eugene-Springfield community sample [Goldberg’08]
  • … subjects, 44 item responses, 5 factors
• Robust PCA identifies 8 outlying subjects
  • Validated via ‘inconsistency’ scores, e.g., VRIN [Tellegen’88]
Data: courtesy of Prof. L. Goldberg, provided by Prof. N. Waller

Online robust PCA
• Motivation: real-time data and memory limitations
• Exponentially-weighted robust PCA
• At time $n$, do not re-estimate past outliers

Online PCA in action
[Figure: online robust PCA test; nominal and outlier data models]

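Robust kernel PCA
• Kernel (K)PCA [Schölkopf’97]
[Figure: nonlinear map from input space to feature space]
• Challenge: the feature space is high- (possibly infinite-) dimensional
  • Kernel trick: $\langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle = K(\mathbf{x}_i, \mathbf{x}_j)$
• Related to spectral clustering
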
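The kernel trick lets PCA run on the $N \times N$ kernel matrix instead of the feature-space covariance. The sketch below shows standard (nonrobust) kernel PCA only, with centering done in feature space; the function name and data are hypothetical. With a linear kernel the scores coincide with ordinary PCA scores up to sign, which gives a simple sanity check.

```python
import numpy as np


def kernel_pca(K, q):
    """Top-q kernel PCA scores from an N x N kernel (Gram) matrix K."""
    N = K.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering in feature space
    Kc = J @ K @ J
    evals, evecs = np.linalg.eigh(Kc)           # ascending eigenvalues
    evals, evecs = evals[::-1][:q], evecs[:, ::-1][:, :q]
    return evecs * np.sqrt(np.maximum(evals, 0.0))


# Toy data (assumed): 30 samples in R^4.
rng = np.random.default_rng(7)
Y = rng.normal(size=(30, 4))

# Linear kernel: inner products in the input space itself.
scores_kernel = kernel_pca(Y @ Y.T, q=2)

# Ordinary PCA scores for comparison.
Yc = Y - Y.mean(axis=0)
Uz, sv, _ = np.linalg.svd(Yc, full_matrices=False)
scores_linear = Uz[:, :2] * sv[:2]

# A nonlinear choice: Gaussian kernel with unit width (an arbitrary setting).
sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
scores_rbf = kernel_pca(np.exp(-sq_dists / 2.0), q=2)
```

Only inner products enter the computation, so swapping the linear kernel for a nonlinear one changes nothing else in the code.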
Unveiling communities
• Network: NCAA football teams (nodes), Fall ’00 games (edges)
• … teams, kernel …
• Adjusted Rand index: ARI = 0.8967
• Identified exactly: Big 10, Big 12, ACC, SEC, Big East
• Outliers: Independent teams
Data: http://www-personal.umich.edu/~mejn/netdata/

Spectrum cartography
Idea: collaborate to form a spatial map of the spectrum
Goal: find a map whose value at each position is the spectrum at that position
Approach: basis expansion model for the map, nonparametric basis pursuit
[Figure: original and estimated spectrum maps]
J. A. Bazerque, G. Mateos, and G. B. Giannakis, ``Group-Lasso on splines for spectrum cartography,'' IEEE Trans. Signal Process., Oct. 2011.

Distributed adaptive algorithms
[Figure: wireless sensor network; improved learning through cooperation]
[Figure: learning curves versus time t (log scale) for Local-LMS, D-LMS w/ noisy links, D-LMS, Diffusion LMS, and Centralized-LMS; Jmin shown for reference]
Issues and Significance:
• Fast varying (non-)stationary processes
• Unavailability of statistical information
• Online incorporation of sensor data
• Noisy communication links
Technical Approaches:
• Consensus-based in-network operation in ad hoc WSNs
• Distributed optimization using alternating-direction methods
• Online learning of statistics using stochastic approximation
• Performance analysis via stochastic averaging
G. Mateos, I. D. Schizas, and G. B. Giannakis, ``Distributed recursive least-squares for consensus-based in-network adaptive estimation,'' IEEE Trans. Signal Process., Nov. 2009.

Unveiling network anomalies
Approach: Flag anomalies across flows and time via sparsity and low rank
Payoff: Ensure high performance, QoS, and security in IP networks
[Figure: enhanced detection capabilities; anomalies across flows and time]
M. Mardani, G. Mateos, and G. B. Giannakis, ``Unveiling network anomalies across flows and time via sparsity and low rank,'' IEEE Trans. Inf. Theory, Dec. 2011 (submitted).

Concluding summary
• Control sparsity in model residuals for robust learning
• Research issues addressed
  • Sparsity control for robust metric and choice-based PM
  • Kernel-based nonparametric utility estimation
  • Robust (kernel) principal component analysis
  • Scalable distributed real-time implementations
• Application domains
  • Preference measurement and conjoint analysis
  • Psychometrics, personality assessment
  • Video surveillance
  • Social and power networks
[Figure: outlier-resilient estimation linked to Lasso]
• Experimental validation with GPIPP personality ratings (~6M)
Gosling-Potter Internet Personality Project (GPIPP) - http://www.outofservice.com