Sparse Representation and Compressed Sensing:
Theory and Algorithms
Yi Ma (University of Illinois at Urbana-Champaign; Microsoft Research Asia)
Allen Yang (University of California, Berkeley)
John Wright (University of Illinois at Urbana-Champaign)
CVPR Tutorial, June 20, 2009
MOTIVATION – Applications to a variety of vision problems
• Face Recognition:
Wright et al PAMI ’09, Huang CVPR ’08, Wagner CVPR ’09 …
• Image Enhancement and Superresolution:
Elad TIP ’06, Huang CVPR ‘08, …
• Image Classification:
Mairal CVPR ‘08, Rodriguez ‘07, many others …
• Multiple Motion Segmentation:
Rao CVPR ‘08, Elhamifar CVPR ’09 …
• … and many others, including this conference
MOTIVATION – A closer look at the theory
When and why can we expect such good performance?
SPARSE REPRESENTATION – Model problem
Underdetermined system of linear equations, y = Ax
Observation $y \in \mathbb{R}^m$; matrix $A \in \mathbb{R}^{m \times n}$ with $m \ll n$; unknown $x \in \mathbb{R}^n$.
Two interpretations:
• Compressed sensing: A as sensing matrix
• Sparse representation: A as overcomplete dictionary
Many more unknowns than observations → no unique solution.
• Classical answer: minimum $\ell^2$-norm solution
• Emerging applications: instead desire sparse solutions
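A minimal numpy sketch of this contrast on a synthetic instance (the sizes, seed, and sparsity level are arbitrary illustration choices): the minimum $\ell^2$-norm solution is dense even though the data were generated by a sparse vector.

```python
import numpy as np

# The minimum l2-norm solution of an underdetermined system is generally dense,
# even when the observations were generated by a sparse x0.
rng = np.random.default_rng(0)
m, n, k = 20, 100, 3                                   # arbitrary illustration sizes
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)   # k-sparse ground truth
y = A @ x0

x_l2 = np.linalg.pinv(A) @ y                           # minimum l2-norm solution x = A^+ y
print("nonzeros in x0:  ", np.count_nonzero(x0))
print("nonzeros in x_l2:", np.count_nonzero(np.abs(x_l2) > 1e-8))   # typically all n
print("residual ||y - A x_l2|| =", np.linalg.norm(y - A @ x_l2))
```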
SPARSE SOLUTIONS – Uniqueness
Look for the sparsest solution:
$\min \|x\|_0$  subject to  $y = Ax$.
$\|\cdot\|_0$ – number of nonzero entries.
Is the sparsest solution unique?
spark(A) – size of the smallest set of linearly dependent columns of A.
If $y = A_1 x_1 = A_2 x_2$ (two representations, with $A_1, A_2$ the columns on the supports of $x_1, x_2$), then $A_1 x_1 - A_2 x_2 = 0$, i.e. the combined columns are linearly dependent.
Proposition [Gorodnitsky & Rao ‘97]:
If $y = A x_0$ with $\|x_0\|_0 < \mathrm{spark}(A)/2$, then $x_0$ is the unique solution to $\min \|x\|_0$ subject to $y = Ax$.
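Since spark(A) is defined by a search over column subsets, it is itself combinatorial to compute; a brute-force sketch for tiny matrices (the example matrix below is an arbitrary illustration) makes the definition and the proposition concrete.

```python
import itertools
import numpy as np

def spark(A, tol=None):
    """Brute-force spark: size of the smallest linearly dependent column subset.
    Exponential cost, so this is only sensible for very small matrices."""
    m, n = A.shape
    for size in range(1, n + 1):
        for cols in itertools.combinations(range(n), size):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < size:
                return size
    return np.inf   # all n columns linearly independent (only possible if n <= m)

# Toy example: columns 0, 1, 2 satisfy a0 + a1 = a2, so spark(A) = 3.
A = np.array([[1., 0., 1., 0.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.]])
print("spark(A) =", spark(A))
# By the proposition, any x0 with ||x0||_0 < 3/2 (i.e. 1-sparse) is the unique sparsest solution.
```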
SPARSE SOLUTIONS – So How Do We Compute It?
Looking for the sparsest solution:
(P0)  $\min \|x\|_0$  subject to  $y = Ax$.
Bad news: (P0) is NP-hard in the worst case, and hard to approximate within certain constants [Amaldi & Kann ’95].
Maybe we can still solve important cases?
• Greedy algorithms:
Matching Pursuit, Orthogonal Matching Pursuit [Mallat & Zhang ‘93],
CoSaMP [Needell & Tropp ‘08]
• Convex programming [Chen, Donoho & Saunders ‘94]
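A compact numpy sketch of Orthogonal Matching Pursuit, one of the greedy methods listed above; this is a generic textbook-style implementation, not code from the tutorial, and the toy problem sizes are arbitrary.

```python
import numpy as np

def omp(A, y, k, tol=1e-10):
    """Orthogonal Matching Pursuit: greedily pick the column most correlated
    with the residual, then re-fit by least squares on the selected support."""
    m, n = A.shape
    support, residual = [], y.copy()
    x = np.zeros(n)
    for _ in range(k):
        if np.linalg.norm(residual) < tol:
            break
        j = int(np.argmax(np.abs(A.T @ residual)))      # most correlated column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(n)
        x[support] = coef
        residual = y - A @ x
    return x

# Toy usage: recover a 3-sparse vector from 20 Gaussian measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 100))
A /= np.linalg.norm(A, axis=0)                          # unit-norm columns
x0 = np.zeros(100)
x0[[5, 17, 60]] = [1.0, -2.0, 0.5]
x_hat = omp(A, A @ x0, k=3)
print("recovered support:", np.flatnonzero(x_hat))      # typically [5, 17, 60]
```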
SPARSE SOLUTIONS – The $\ell^1$ Heuristic
Looking for the sparsest solution:
(P0)  $\min \|x\|_0$  subject to  $y = Ax$.   Intractable.
Convex relaxation:
(P1)  $\min \|x\|_1$  subject to  $y = Ax$.   A linear program, solvable in polynomial time.
Why $\ell^1$? It is the convex envelope of $\ell^0$ over the unit cube. [Figure: $\ell^0$ and $\ell^1$ on the unit cube.]
Rich applied history – geosciences, sparse coding in vision, statistics.
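A sketch of (P1) as an explicit linear program using the standard split $x = u - v$ with $u, v \ge 0$; it relies on scipy's HiGHS LP solver (scipy >= 1.6), and the synthetic problem sizes are arbitrary illustration choices.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    """Solve (P1): min ||x||_1 s.t. Ax = y, as a linear program.
    Split x = u - v with u, v >= 0; at the optimum, sum(u + v) = ||x||_1."""
    m, n = A.shape
    c = np.ones(2 * n)                          # minimize sum(u) + sum(v)
    A_eq = np.hstack([A, -A])                   # A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:n] - res.x[n:]

# Toy usage: (P1) typically recovers the sparse generator exactly in this regime.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 100))
x0 = np.zeros(100)
x0[[3, 40, 77]] = [2.0, -1.0, 0.7]
x1 = l1_min(A, A @ x0)
print("max |x1 - x0| =", np.max(np.abs(x1 - x0)))   # small when (P1) recovers x0
```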
EQUIVALENCE – A stronger motivation
In many cases, the solutions to (P0) and (P1) are exactly the same:
Theorem [Candes & Tao ’04, Donoho ‘04]:
For Gaussian A, with overwhelming probability, whenever $\|x_0\|_0 < \rho^\star m$,
$x_0 = \arg\min \|x\|_1$  subject to  $Ax = A x_0$.
“$\ell^1$-minimization recovers any sufficiently sparse solution.”
GUARANTEES – “Well-Spread” A
Mutual coherence: $\mu(A) = \max_{i \neq j} |\langle a_i, a_j \rangle|$, the largest absolute inner product between distinct (unit-normalized) columns of A.
Low mutual coherence: the columns are well-spread in the space.
Theorem [Elad & Donoho ’03, Gribonval & Nielsen ‘03]:
$\ell^1$ minimization uniquely recovers any $x_0$ with $\|x_0\|_0 < \tfrac{1}{2}\left(1 + \tfrac{1}{\mu(A)}\right)$.
Strong point: a checkable condition.
Weakness: low coherence can only guarantee recovery up to $O(\sqrt{m})$ nonzeros.
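A short sketch that evaluates the checkable condition: compute $\mu(A)$ from the Gram matrix of the normalized columns and print the sparsity level the coherence bound guarantees (the matrix sizes are arbitrary).

```python
import numpy as np

def mutual_coherence(A):
    """mu(A): largest absolute inner product between distinct unit-normalized columns."""
    G = A / np.linalg.norm(A, axis=0)           # normalize columns
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

rng = np.random.default_rng(3)
A = rng.standard_normal((64, 256))
mu = mutual_coherence(A)
# Coherence-based guarantee: l1 recovers any x0 with ||x0||_0 < (1 + 1/mu) / 2.
print(f"mu(A) = {mu:.3f}, guaranteed sparsity level < {(1 + 1/mu) / 2:.2f}")
```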
GUARANTEES – Beyond Coherence
Low coherence:
“any submatrix consisting of two columns of A is well-conditioned”
Stronger bounds by looking at larger submatrices?
Restricted Isometry Constants: $\delta_k$ is the smallest $\delta$ such that for all $k$-sparse $x$,
$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2.$
Low RIC: “column submatrices of A are uniformly well-conditioned.”
Theorem [Candes & Tao ’04, Candes ‘07]:
If the restricted isometry constant $\delta_{2k}$ is sufficiently small (e.g. $\delta_{2k} < \sqrt{2} - 1$), then $\ell^1$-minimization recovers any $k$-sparse $x_0$.
For random A, this guarantees recovery up to linear sparsity: $\|x_0\|_0 < \rho^\star m$.
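Unlike coherence, the RIC is not efficiently computable; a rough Monte Carlo probe (arbitrary sizes and trial count) samples random $k$-column submatrices of a normalized Gaussian matrix and reports how far their squared singular values stray from 1, which only lower-bounds the true $\delta_k$.

```python
import numpy as np

# Monte Carlo probe of restricted isometry: sample random k-column submatrices
# and record the worst deviation of their squared singular values from 1.
# The true delta_k is a maximum over ALL submatrices, so this is only a lower bound.
rng = np.random.default_rng(4)
m, n, k, trials = 128, 512, 10, 2000
A = rng.standard_normal((m, n)) / np.sqrt(m)    # columns have unit norm in expectation

delta_est = 0.0
for _ in range(trials):
    cols = rng.choice(n, size=k, replace=False)
    s = np.linalg.svd(A[:, cols], compute_uv=False)
    delta_est = max(delta_est, abs(s.max() ** 2 - 1), abs(1 - s.min() ** 2))
print(f"empirical lower bound on delta_{k}: {delta_est:.3f}")
```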
GUARANTEES – Sharp Conditions?
Necessary and sufficient condition: whether $x_0$ solves (P1) depends on the geometry of the polytope $P$ spanned by the columns of A and their negatives.
GUARANTEES – Geometric Interpretation
Necessary and sufficient condition [Donoho ’06]: $\ell^1$ uniquely recovers $x_0$ with support $S$ and signs $\sigma$ iff $\mathrm{conv}\{\sigma_i a_i : i \in S\}$ is a simplicial face of $P$.
Uniform guarantees [Donoho + Tanner ’08]: $\ell^1$ recovers every $k$-sparse $x_0$ iff $P$ is centrally $k$-neighborly.
GUARANTEES – Geometric Interpretation
Geometric understanding gives sharp thresholds for sparse
recovery with Gaussian A [Donoho & Tanner ‘08]:
[Figure: phase diagram in the (aspect ratio of A, sparsity) plane: below the weak threshold, success almost always; above it, failure almost always; below the strong threshold, success always.]
GUARANTEES – Geometric Interpretation
Explicit formulas in the wide-matrix limit [Donoho & Tanner ‘08]:
Weak threshold:
Strong threshold:
GUARANTEES – Noisy Measurements
What if there is noise in the observation? $y = Ax + z$, with $z$ Gaussian or of bounded 2-norm.
Natural approach: relax the constraint:
$\min \|x\|_1$  subject to  $\|y - Ax\|_2^2 \le \varepsilon^2.$
Studied in several literatures:
• Statistics – LASSO
• Signal processing – BPDN
Theorem [Donoho, Elad & Temlyakov ‘06]:
Recovery is stable:
$\|\hat{x} - x_0\|_2^2 \le \dfrac{4\|z\|_2^2}{1 - \mu(A)\,(4\|x_0\|_0 - 1)}.$
See also [Candes-Romberg-Tao ‘06], [Wainwright ‘06], [Meinshausen & Yu ’06], [Zhao & Yu ‘06], …
Theorem [Candes-Romberg-Tao ‘06]:
Recovery is stable – for A satisfying an appropriate RIP condition at sparsity level 4S,
$\|\hat{x} - x_0\|_2 \le C_1 \|z\|_2 + C_2 \dfrac{\|x_0 - x_{0,S}\|_1}{\sqrt{S}},$
where $x_{0,S}$ is the best S-term approximation of $x_0$.
See also [Donoho ‘06], [Wainwright ‘06], [Meinshausen & Yu ’06], [Zhao & Yu ‘06], …
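In practice the constrained program above is often attacked through its penalized (Lagrangian) counterpart $\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$; below is a minimal iterative soft-thresholding (ISTA) sketch for that form, with arbitrary $\lambda$, iteration count, and problem sizes (algorithms are covered properly after the break).

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||y - Ax||_2^2 + lam*||x||_1,
    the penalized counterpart of the constrained program above."""
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - (A.T @ (A @ x - y)) / L         # gradient step on the smooth part
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # soft threshold
    return x

# Toy usage: noisy measurements of a 3-sparse signal.
rng = np.random.default_rng(5)
A = rng.standard_normal((50, 200)) / np.sqrt(50)
x0 = np.zeros(200)
x0[[7, 90, 150]] = [1.5, -1.0, 0.8]
y = A @ x0 + 0.01 * rng.standard_normal(50)
x_hat = ista(A, y, lam=0.01)
print("recovery error:", np.linalg.norm(x_hat - x0))
```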
CONNECTIONS – Sketching and Expanders
Similar sparse recovery problems are explored in the data streaming community:
a sketch $y \in \mathbb{R}^m$ of a data stream $x \in \mathbb{R}^n$ is computed as $y = Ax$, with $A \in \mathbb{R}^{m \times n}$, $m \ll n$.
Combinatorial algorithms → fast encoding/decoding at the expense of a suboptimal number of measurements.
Based on ideas from group testing, expander graphs.
[Gilbert et al ‘06], [Indyk ‘08], [Xu & Hassibi ‘08]
CONNECTIONS – High dimensional geometry
Sparse recovery guarantees can also be derived via probabilistic
constructions from high-dimensional geometry:
• The Johnson-Lindenstrauss lemma
Given $n$ points $x_1, \ldots, x_n \subset \mathbb{R}^m$, a random projection $P$ into $C \log(n)/\varepsilon^2$ dimensions preserves pairwise distances:
$(1 - \varepsilon)\|x_i - x_j\|_2 \le \|P x_i - P x_j\|_2 \le \|x_i - x_j\|_2.$
• Dvoretsky’s almost-spherical section theorem:
There exist subspaces $\Gamma \subset \mathbb{R}^m$ of dimension as high as $c \cdot m$ on which the $\ell^1$ and $\ell^2$ norms are comparable:
$\forall x \in \Gamma, \quad C\sqrt{m}\,\|x\|_2 \le \|x\|_1 \le \sqrt{m}\,\|x\|_2.$
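A small empirical check of the Johnson-Lindenstrauss statement (the point count, dimensions, and seed are arbitrary choices): project a random point cloud with a scaled Gaussian matrix and measure the worst pairwise distortion.

```python
import numpy as np

# Project a cloud of points with a scaled Gaussian matrix and look at the
# worst relative distortion of pairwise distances.
rng = np.random.default_rng(6)
n_points, m, d = 200, 2000, 300
X = rng.standard_normal((n_points, m))
P = rng.standard_normal((d, m)) / np.sqrt(d)    # scaled so that E||Px||^2 = ||x||^2
PX = X @ P.T

worst = 0.0
for i in range(n_points):
    for j in range(i + 1, n_points):
        ratio = np.linalg.norm(PX[i] - PX[j]) / np.linalg.norm(X[i] - X[j])
        worst = max(worst, abs(ratio - 1.0))
print(f"worst relative distortion of pairwise distances: {worst:.3f}")
```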
THE STORY SO FAR – Sparse recovery guarantees
Sparse solutions can often be recovered by linear programming.
Performance guarantees for arbitrary matrices with
“uniformly well-spread columns”:
• (In)coherence
• Restricted Isometry
Sharp conditions via polytope geometry
Very well-understood performance for random matrices
What about matrices arising in vision… ?
PRIOR WORK - Face Recognition as Sparse Representation
Linear subspace model for images of the same face under varying illumination:
the training images of subject $i$ are stacked as the columns of a matrix $A_i$.
If the test image $y$ is also of subject $i$, then $y \approx A_i x_i$ for some coefficients $x_i$.
Can represent any test image w.r.t. the entire training set as
$y = A x + e,$
with $A = [A_1, \ldots, A_K]$ the combined training dictionary, $x$ the coefficients, and $e$ accounting for corruption and occlusion.
PRIOR WORK - Face Recognition as Sparse Representation
Underdetermined system of linear equations in the unknowns $(x, e)$:
$y = A x + e = [A \;\, I]\begin{bmatrix} x \\ e \end{bmatrix}.$
The solution is not unique … but
• $x$ should be sparse: ideally, supported only on images of the same subject;
• $e$ is expected to be sparse: occlusion only affects a subset of the pixels.
Seek the sparsest solution, via the convex relaxation
$\min \|x\|_1 + \|e\|_1$  subject to  $y = Ax + e.$
Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009.
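A toy, synthetic stand-in for the classification-by-sparse-representation idea sketched above; it is not the PAMI paper's pipeline (the occlusion term $e$ is omitted, the "subjects" are random low-dimensional subspaces, and classification is simply by per-class residual), but it shows the mechanics end to end.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    # min ||x||_1 s.t. Ax = y, via the split x = u - v with u, v >= 0 (same LP as earlier).
    n = A.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=(0, None), method="highs")
    return res.x[:n] - res.x[n:]

# Each synthetic "subject" contributes a few training columns drawn from its own
# low-dimensional subspace; a test sample is coded against the combined dictionary
# and assigned to the class whose columns explain it with the smallest residual.
rng = np.random.default_rng(7)
m, n_classes, per_class, subdim = 30, 5, 8, 3
bases = [np.linalg.qr(rng.standard_normal((m, subdim)))[0] for _ in range(n_classes)]
A = np.hstack([B @ rng.standard_normal((subdim, per_class)) for B in bases])
A += 0.01 * rng.standard_normal(A.shape)     # small perturbation so A has full row rank
A /= np.linalg.norm(A, axis=0)

true_class = 2
y = bases[true_class] @ rng.standard_normal(subdim)   # test sample from class 2's subspace

x = l1_min(A, y)
residuals = []
for c in range(n_classes):
    xc = np.zeros_like(x)
    block = slice(c * per_class, (c + 1) * per_class)
    xc[block] = x[block]                     # keep only class-c coefficients
    residuals.append(np.linalg.norm(y - A @ xc))
print("predicted class:", int(np.argmin(residuals)), " true class:", true_class)
```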
GUARANTEES – What About Vision Problems?
Behavior under varying levels of random pixel corruption:
[Figure: recognition rate under increasing corruption – 99.3%, 90.7%, 37.5%.]
Can existing theory explain this phenomenon?
PRIOR WORK - Error Correction by $\ell^1$ minimization
Candes and Tao [IT ‘05]:
• Apply a parity check matrix $B$ with $BA = 0$, yielding $By = B(Ax + e) = Be$:
an underdetermined system in the sparse $e$ only.
• Set $\hat{e} = \arg\min \|e\|_1$ subject to $Be = By$.
• Recover $x$ from the clean system $y - \hat{e} = Ax$.
Succeeds whenever $\ell^1$-minimization recovers $e$ in the reduced system $By = Be$.
This work:
• Instead solve $\min \|x\|_1 + \|e\|_1$ subject to $y = Ax + e$.
Can be applied when A is wide (no parity check).
Succeeds whenever $\ell^1$-minimization recovers $(x, e)$ in the expanded system $y = [A \;\, I]\begin{bmatrix} x \\ e \end{bmatrix}$.
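A sketch of the expanded-system approach as a single linear program over $[A\; I]$, again via the positive/negative split and scipy's HiGHS solver; the Gaussian test matrix here is the generic (incoherent) case, not yet the coherent, face-like regime discussed next.

```python
import numpy as np
from scipy.optimize import linprog

def l1_expanded(A, y):
    """min ||x||_1 + ||e||_1  s.t.  y = Ax + e, as one LP over the expanded
    system [A I] (each variable split into positive and negative parts)."""
    m, n = A.shape
    B = np.hstack([A, np.eye(m)])            # expanded dictionary [A I]
    N = n + m
    res = linprog(np.ones(2 * N), A_eq=np.hstack([B, -B]), b_eq=y,
                  bounds=(0, None), method="highs")
    w = res.x[:N] - res.x[N:]
    return w[:n], w[n:]                      # (x_hat, e_hat)

# Toy usage with a generic Gaussian A: 3-sparse signal, 20% of entries corrupted.
rng = np.random.default_rng(8)
m, n = 100, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[[1, 7, 33]] = [1.0, -0.5, 2.0]
e0 = np.zeros(m)
e0[rng.choice(m, 20, replace=False)] = rng.standard_normal(20)
x_hat, e_hat = l1_expanded(A, A @ x0 + e0)
print("signal error:", np.linalg.norm(x_hat - x0),
      " error-term error:", np.linalg.norm(e_hat - e0))
```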
GUARANTEES – What About Vision Problems?
• A: highly coherent – the training images are tightly clustered, occupying a very small volume of the image space.
• $x$: very sparse – $\|x\|_0$ = # images per subject, often nonnegative (illumination cone models).
• $e$: as dense as possible – robust to the highest possible corruption.
Results so far: $\ell^1$ should not succeed.
SIMULATION - Dense Error Correction?
As the dimension $m$ grows, an even more striking phenomenon emerges:
Conjecture: if the matrices are sufficiently coherent, then for any error fraction $\rho < 1$, as $m \to \infty$, solving $\min \|x\|_1 + \|e\|_1$ subject to $y = Ax + e$ corrects almost any error $e$ with $\|e\|_0 \le \rho m$.
DATA MODEL - Cross-and-Bouquet
Our model for A should capture the fact that its columns are tightly clustered around a common mean $\mu$:
• the deviations of the columns from $\mu$ are well-controlled in norm;
• the mean $\mu$ is mostly incoherent with the standard (error) basis.
We call this the “Cross-and-Bouquet” (CAB) model.
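A small sketch of generating a cross-and-bouquet-style dictionary in the spirit just described (the mean direction, noise level, and sizes are arbitrary choices): the columns are tiny perturbations of a common unit vector, so they are mutually very coherent yet nearly incoherent with the standard basis.

```python
import numpy as np

# Columns of A = small random perturbations of a common mean direction mu
# (the tight "bouquet"); the error basis is the standard basis (the "cross").
rng = np.random.default_rng(9)
m, n, nu = 200, 400, 0.1
mu = rng.standard_normal(m)
mu /= np.linalg.norm(mu)                     # common mean direction
A = mu[:, None] + (nu / np.sqrt(m)) * rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)               # unit-norm, tightly clustered columns

# The bouquet is extremely coherent ...
gram = np.abs(A.T @ A)
np.fill_diagonal(gram, 0.0)
print("typical inner product between columns:", round(float(gram.mean()), 3))
# ... yet mostly incoherent with the standard (error) basis:
print("max |<a_j, e_i>| =", round(float(np.abs(A).max()), 3))
```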
ASYMPTOTIC SETTING - Weak Proportional Growth
• Observation dimension $m \to \infty$.
• Problem size grows proportionally: $n = \delta m$.
• Error support grows proportionally: $\|e_0\|_0 = \rho m$.
• Support size of $x_0$ is sublinear in $m$.
Sublinear growth of $\|x_0\|_0$ is necessary to correct arbitrary fractions of errors:
we need at least $\|x_0\|_0$ “clean” equations.
Empirical observation: if $\|x_0\|_0$ grows linearly in $m$, there is a sharp phase transition at some error fraction $\rho^\star < 1$.
NOTATION - Correct Recovery of Solutions
Whether $(x_0, e_0)$ is recovered depends only on the signs and support of the error.
Call a sign-and-support pattern recoverable if $(x_0, e_0)$ with these signs and support is correctly recovered and the minimizer is unique.
MAIN RESULT - Correction of Arbitrary Error Fractions
“$\ell^1$-minimization recovers any sparse signal from almost any error with density less than 1.”
SIMULATION - Arbitrary Errors in WPG
Fraction of successful recoveries for increasing $m$.
SIMULATION - Phase Transition in Proportional Growth
What if $\|x_0\|_0$ grows linearly with $m$?
Asymptotically sharp phase transition, similar to that observed by Donoho and Tanner for homogeneous Gaussian matrices.
SIMULATION - Comparison to Alternative Approaches
• “L1 - [A I]”: $\ell^1$-minimization on the expanded system $[A \;\, I]$
• “L1 - comp”: error correction via a parity check matrix [Candes + Tao ‘05]
• “ROMP”: Regularized Orthogonal Matching Pursuit [Needell + Vershynin ‘08]
SIMULATION - Error Correction with Real Faces
For real face images, weak proportional growth corresponds to the setting where
the total image resolution grows proportionally to the size of the database.
[Figure: fraction of correct recoveries vs. corruption level, with the point of 50% probability of correct recovery marked. Above: corrupted images. Below: reconstructions.]
SUMMARY – Sparse Representation in Theory and Practice
So far:
Face recognition as a motivating example
Sparse recovery guarantees for generic systems
New theory and new phenomena from face data
After the break:
Algorithms for sparse recovery
Many more applications in vision and sensor networks
Matrix extensions: missing data imputation and robust PCA