
Analysis of the Basis Pursuit
Algorithm and Applications*
Michael Elad
The Computer Science Department – (SCCM) program
Stanford University
Multiscale Geometric Analysis Meeting
IPAM - UCLA
January 2003
* Joint work with: Alfred M. Bruckstein – CS, Technion
David L. Donoho – Statistics, Stanford
Collaborators
Freddy Bruckstein, CS Department, Technion
M. Elad and A.M. Bruckstein, "A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases", IEEE Trans. on Information Theory, Vol. 48, pp. 2558-2567, September 2002.
Sparse representation and
the Basis Pursuit Algorithm
Dave Donoho, Statistics Department, Stanford
D.L. Donoho and M. Elad, "Maximal Sparsity Representation via l1 Minimization", to appear in Proceedings of the National Academy of Sciences.
General
• Basis Pursuit algorithm [Chen, Donoho and Saunders, 1995]:
  – effective for finding sparse over-complete representations,
  – effective for non-linear filtering of signals.
• Our work (in progress) – a better understanding of BP, and deploying it in signal/image processing and computer vision applications.
• We believe that over-completeness has an important role!
• Today we discuss the analysis of the Basis Pursuit
algorithm, giving conditions for its success. We then show
some stylized applications exploiting this analysis.
Agenda
1. Introduction
   Previous and current work
2. Two Ortho-Bases (understanding the BP)
   Uncertainty → Uniqueness → Equivalence
3. Arbitrary dictionary (understanding the BP)
   Uniqueness → Equivalence
4. Stylized Applications (using the BP)
   Separation of point, line, and plane clusters
5. Discussion
Transforms
• Define the forward and backward transforms (assume a one-to-one mapping):
    Forward:   α = T{s}
    Backward:  s = T⁻¹{α}
  s – the signal (in the signal space ℂᴺ)
  α – the representation (in the transform domain ℂᴸ, L ≥ N)
• Transforms T are used in signal and image processing for coding, analysis, speeding up processing, feature extraction, filtering, …
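To make the pair concrete, here is a minimal Python sketch (assuming NumPy and SciPy are available; the orthonormal DCT stands in for a generic one-to-one transform T):

```python
import numpy as np
from scipy.fft import dct, idct

# Forward transform: alpha = T{s}; backward transform: s = T^{-1}{alpha}.
# With an orthonormal DCT the mapping is one-to-one, so the backward
# transform recovers the signal exactly.
rng = np.random.default_rng(0)
s = rng.standard_normal(128)        # a signal in the signal space (N = 128)

alpha = dct(s, norm="ortho")        # forward:  alpha = T{s}
s_back = idct(alpha, norm="ortho")  # backward: s = T^{-1}{alpha}

print(np.allclose(s, s_back))       # True -- perfect reconstruction
```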
The Linear Transforms
• Of special interest – linear transforms (inverse): s = Φα
  [Diagram: a general transform maps the length-L representation α to the length-N signal s; the columns of Φ are atoms from a dictionary. In the square, unitary case the picture collapses to s = Φα with Φ of size N-by-N.]
• In square linear transforms, Φ is N-by-N and non-singular.
Lack Of Universality
• Many square linear transforms are available – sinusoids, wavelets, packets, …
• A successful transform is one that leads to sparse representations.
• Observation: lack of universality – different bases are good for different purposes.
  – Sound = harmonic music (Fourier) + click noise (Wavelet),
  – Image = lines (Ridgelets) + points (Wavelets).
• Proposed solution: over-complete dictionaries, possibly combining several bases.
Example – Composed Signal
[Figure: a test signal composed as φ₁ + 0.3·φ₂ + 0.5·φ₃ + 0.05·φ₄, shown together with the DCT-coefficient magnitudes (log scale) of the partial sums |T{φ₁ + 0.3φ₂}| and |T{φ₁ + 0.3φ₂ + 0.5φ₃ + 0.05φ₄}|, over 128 DCT coefficients.]
Example – Desired Decomposition
[Figure: the desired decomposition – only a handful of non-zero coefficients (log scale) over the combined dictionary, DCT coefficients followed by Spike (Identity) coefficients.]
Matching Pursuit
• Given d unitary matrices {Φₖ, 1 ≤ k ≤ d}, define a dictionary Φ = [Φ₁, Φ₂, …, Φ_d] [Mallat & Zhang (1993)].
• Combined representation of a signal s:
    s = Φα
• The solution α is non-unique – solve for maximal sparsity:
    (P₀):  min_α ‖α‖₀  s.t.  s = Φα
• Hard to solve – a sub-optimal greedy sequential solver: the "Matching Pursuit" algorithm.
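A minimal sketch of the greedy idea (not Mallat & Zhang's exact implementation; the two-ortho dictionary and two-atom signal below are illustrative assumptions):

```python
import numpy as np
from scipy.fft import dct

def matching_pursuit(s, Phi, n_iter=500, tol=1e-8):
    """Greedy MP: at each step pick the atom most correlated with the
    current residual, add its contribution, and repeat. Columns of Phi
    are assumed to have unit l2 norm."""
    residual = s.astype(float).copy()
    gamma = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        corr = Phi.T @ residual            # correlations with all atoms
        k = int(np.argmax(np.abs(corr)))   # best-matching atom
        gamma[k] += corr[k]
        residual -= corr[k] * Phi[:, k]
        if np.linalg.norm(residual) < tol:
            break
    return gamma

# Two-ortho dictionary [I, C] (C = orthonormal DCT) and a 2-atom signal.
N = 8
C = dct(np.eye(N), axis=0, norm="ortho")
Phi = np.hstack([np.eye(N), C])
s = 2.0 * Phi[:, 1] + 0.5 * Phi[:, N + 3]

gamma = matching_pursuit(s, Phi)
print(np.linalg.norm(s - Phi @ gamma) < 1e-6)   # True
```

Note the greediness: each step is locally optimal, which is exactly why MP is only a sub-optimal solver of P₀.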
Example – Matching Pursuit
[Figure: the Matching Pursuit result – coefficient magnitudes (log scale) over the dictionary coefficients.]
Basis Pursuit (BP)
• Facing the same problem, and the same optimization task [Chen, Donoho, Saunders (1995)]:
    (P₀):  min_α ‖α‖₀  s.t.  s = Φα
• Hard to solve – replace the ℓ₀ norm by an ℓ₁ norm: the "Basis Pursuit" algorithm
    (P₁):  min_α ‖α‖₁  s.t.  s = Φα
• Interesting observation: in many cases it successfully finds the sparsest representation.
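P₁ is a linear program; here is a minimal sketch using SciPy's `linprog` with the standard split γ = u − v, u, v ≥ 0 (the toy two-ortho dictionary and 2-sparse signal are illustrative assumptions, chosen sparse enough that exact recovery is guaranteed by the equivalence bounds discussed later):

```python
import numpy as np
from scipy.fft import dct
from scipy.optimize import linprog

def basis_pursuit(s, Phi):
    """Solve (P1): min ||gamma||_1 s.t. Phi gamma = s, via the standard
    LP reformulation gamma = u - v with u, v >= 0."""
    L = Phi.shape[1]
    c = np.ones(2 * L)                 # objective: sum(u) + sum(v)
    A_eq = np.hstack([Phi, -Phi])      # constraint: Phi (u - v) = s
    res = linprog(c, A_eq=A_eq, b_eq=s, bounds=(0, None), method="highs")
    return res.x[:L] - res.x[L:]

# Two-ortho dictionary [I, C] and a 2-sparse representation.
N = 16
C = dct(np.eye(N), axis=0, norm="ortho")
Phi = np.hstack([np.eye(N), C])
gamma_true = np.zeros(2 * N)
gamma_true[3], gamma_true[N + 5] = 1.0, -0.5
s = Phi @ gamma_true

gamma_bp = basis_pursuit(s, Phi)   # recovers gamma_true exactly here
```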
Example – Basis Pursuit
[Figure: the Basis Pursuit result – coefficient magnitudes (log scale) over the dictionary coefficients.]
Why ℓ₁? A 2D Example
    min_{α₁,α₂} |α₁|ᵖ + |α₂|ᵖ  s.t.  s = α₁φ₁ + α₂φ₂
[Figure: the constraint line s = α₁φ₁ + α₂φ₂ in the (α₁, α₂) plane, intersected with the ℓᵖ balls for 0 < p < 1, p = 1, and p > 1. For p ≤ 1 the touching point lands on an axis – a sparse solution – while for p > 1 it does not; p = 1 is the smallest p for which the problem is still convex.]
Example – Lines and Points*
[Figure: the original image, the wavelet part of the noisy image, and the ridgelets part of the image.]
* Experiments from Starck, Donoho, and Candes - Astronomy & Astrophysics 2002.
Example – Galaxy SBS 0335-052*
[Figure: Original = Ridgelets + Wavelet + Curvelets + Residual.]
* Experiments from Starck, Donoho, and Candes - Astronomy & Astrophysics 2002.
Non-Linear Filtering via BP
• As the previous example shows, Basis Pursuit can be used for non-linear filtering.
• From transforming to filtering:
    min_α ‖α‖₁  s.t.  s = Φα    →    min_α ‖α‖₁ + λ‖s − Φα‖₂²
• What is the relation to alternative non-linear filtering methods, such as PDE-based methods (TV, anisotropic diffusion, …) and wavelet denoising?
• What is the role of over-completeness in inverse problems?
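For intuition about the wavelet-denoising connection: in the special case of a *unitary* Φ, the penalized form separates per coefficient and is solved in closed form by entry-wise soft shrinkage. A sketch (the random unitary matrix, noise level, and λ are illustrative assumptions):

```python
import numpy as np

def soft_threshold(beta, t):
    """Entry-wise shrinkage: minimizer of |a| per-coefficient penalty."""
    return np.sign(beta) * np.maximum(np.abs(beta) - t, 0.0)

# For unitary Phi, ||s - Phi a||^2 = ||Phi^T s - a||^2, so
#   min_a ||a||_1 + lam * ||s - Phi a||_2^2
# separates per coefficient; the minimizer is a = soft(Phi^T s, 1/(2*lam)).
rng = np.random.default_rng(1)
Phi = np.linalg.qr(rng.standard_normal((64, 64)))[0]   # random unitary
a_true = np.zeros(64)
a_true[[3, 17]] = [2.0, -1.5]                          # sparse representation
s = Phi @ a_true + 0.05 * rng.standard_normal(64)      # noisy signal

lam = 2.5
a_hat = soft_threshold(Phi.T @ s, 1.0 / (2.0 * lam))   # non-linear filtering
```

The threshold 1/(2λ) = 0.2 sits well above the noise level, so almost all noise-only coefficients are zeroed while the two true atoms survive.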
(Our) Recent Work
• 1998 – Proven equivalence between P₀ and P₁ under some conditions on the sparsity of the representation, for dictionaries built of two ortho-bases [Donoho and Huo].
• 2000 – Improving the previous results – tightening the bounds [Elad and Bruckstein].
• 2001 – Relaxing the notion of sparsity from the ℓ₀ to the ℓᵖ norm [Donoho and Elad].
• 2002 – Proving tightness of the E-B bounds [Feuer & Nemirovski].
• 2002 – Generalizing all previous results to any dictionary, with applications [Donoho and Elad].
Before we dive …
• Given a dictionary Φ and a signal s, we want to find the sparse "atom decomposition" of the signal.
• Our goal is the solution of
    min_γ ‖γ‖₀  s.t.  s = Φγ
• The Basis Pursuit alternative is to solve instead
    min_γ ‖γ‖₁  s.t.  s = Φγ
• Our focus for now: why should this work?
Agenda
1. Introduction
   Previous and current work
2. Two Ortho-Bases
   Uncertainty → Uniqueness → Equivalence
   [Diagram: the two-ortho dictionary – two N-by-N ortho-bases side by side, an N-by-2N matrix.]
3. Arbitrary dictionary
   Uniqueness → Equivalence
4. Stylized Applications
   Separation of point, line, and plane clusters
5. Discussion
Our Objective
Given a signal s and its two representations using Φ and Ψ, what is the lower bound on the joint sparsity?
    s = Φα,   s = Ψβ
Our objective is a rule of the form
    ‖α‖₀ + ‖β‖₀ ≥ Thr(Φ, Ψ)
We will show that such a rule immediately leads to a practical result regarding the solution of the P₀ problem.
Mutual Incoherence
Define
    M = max_{1≤k,j≤N} |φₖᴴψⱼ|
• M – the mutual incoherence between Φ and Ψ.
• M plays an important role in the desired uncertainty rule.
• Properties:
  – Generally, 1/√N ≤ M ≤ 1.
  – For the Fourier + Trivial (identity) pair of matrices, M = 1/√N.
  – For random pairs of ortho-matrices, M ≈ √(2·logₑN)/√N.
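The Fourier + identity value can be checked numerically; a small sketch (N = 64 is an arbitrary choice):

```python
import numpy as np

# Mutual incoherence M = max_{k,j} |phi_k^H psi_j| for Phi = I (spikes)
# and Psi = the unitary DFT. Theory predicts M = 1/sqrt(N).
N = 64
Psi = np.fft.fft(np.eye(N)) / np.sqrt(N)   # unitary DFT matrix
Phi = np.eye(N)
M = np.max(np.abs(Phi.conj().T @ Psi))     # every entry has magnitude 1/sqrt(N)
print(M)                                   # ~0.125 = 1/sqrt(64)
```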
Uncertainty Rule
Theorem 1:   ‖α‖₀ + ‖β‖₀ ≥ 2/M *
Examples:
  – Φ = Ψ: M = 1, leading to ‖α‖₀ + ‖β‖₀ ≥ 2.
  – Φ = I, Ψ = F_N (DFT): M = 1/√N, leading to ‖α‖₀ + ‖β‖₀ ≥ 2√N.
* Donoho & Huo obtained a weaker bound, ‖α‖₀ + ‖β‖₀ ≥ 1 + 1/M.
Example
Φ = I, Ψ = F_N (DFT):   M = 1/√N   →   ‖α‖₀ + ‖β‖₀ ≥ 2√N
• For N = 1024:  ‖s‖₀ + ‖Fs‖₀ ≥ 64.
• The signal satisfying this bound with equality: the picket-fence.
[Figure: the picket-fence signal of length 1024 – unit spikes at regular intervals.]
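A quick numerical check of the picket-fence example (the spacing 32 = √1024 is what makes both domains equally sparse):

```python
import numpy as np

# Picket-fence signal: N = 1024 with a spike every 32 samples. Its DFT is
# again a picket fence, so the uncertainty bound is met with equality:
# ||s||_0 + ||Fs||_0 = 32 + 32 = 64 = 2*sqrt(1024).
N = 1024
s = np.zeros(N)
s[::32] = 1.0                                # 32 spikes in time
S = np.fft.fft(s)                            # 32 spikes in frequency
nnz_time = np.count_nonzero(s)
nnz_freq = np.count_nonzero(np.abs(S) > 1e-8)
print(nnz_time, nnz_freq, nnz_time + nnz_freq)   # 32 32 64
```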
Towards Uniqueness
• Given a unit-norm signal s, assume we hold two different representations for it using [Φ, Ψ]:
    s = [Φ, Ψ]γ₁ = [Φ, Ψ]γ₂
• Thus
    0 = [Φ, Ψ](γ₁ − γ₂) = [Φ, Ψ]·[x₁; x₂]   →   Φx₁ = −Ψx₂ = q
• Based on the uncertainty theorem we just obtained:
    2/M ≤ ‖x₁‖₀ + ‖x₂‖₀ = ‖γ₁ − γ₂‖₀ ≤ ‖γ₁‖₀ + ‖γ₂‖₀
Uniqueness Rule
Theorem 2:   ‖γ₁‖₀ + ‖γ₂‖₀ ≥ 2/M
In words: any two different representations of the same signal CANNOT BE JOINTLY TOO SPARSE.
If we found a representation γ that satisfies
    ‖γ‖₀ < 1/M *
then necessarily it is unique (the sparsest).
* Donoho & Huo obtained a weaker bound, ‖γ‖₀ < 0.5·(1 + 1/M).
Uniqueness Implication
• We are interested in solving
    (P₀):  min_γ ‖γ‖₀  s.t.  s = [Φ, Ψ]γ
• Somehow we obtain a candidate solution γ̂.
• The uniqueness theorem tells us that a simple test on γ̂ (is M·‖γ̂‖₀ < 1?) could tell us whether it is the solution of P₀.
• However:
  – if the test is negative, it says nothing;
  – this does not help in solving P₀;
  – this does not explain why P₁ may be a good replacement.
Equivalence - Goal
• We are going to solve the following problem:
    (P₁):  min_γ ‖γ‖₁  s.t.  s = [Φ, Ψ]γ
• The questions we ask are:
  – Will the P₁ solution coincide with the P₀ one?
  – What are the conditions for such success?
• We show that if the P₀ solution is indeed sparse enough, then the P₁ solver finds it exactly.
Equivalence - Result
Given a signal s with a representation s = [Φ, Ψ]γ, assume γ has k₁ non-zeros over the Φ part and k₂ non-zeros over the Ψ part (assume k₁ < k₂).
Theorem 3:  If k₁ and k₂ satisfy
    2M²k₁k₂ + Mk₂ − 1 < 0
then P₁ will find the correct solution.
A weaker (simpler) requirement is
    k₁ + k₂ < (√2 − 0.5)/M *
* Donoho & Huo obtained a weaker bound, ‖γ‖₀ < 0.5·(1 + 1/M).
The Various Bounds
Setting: signal dimension N = 1024; dictionary Φ = I, Ψ = F_N; mutual incoherence M = 1/32.
[Figure: the (K₁, K₂) plane (K₁ > K₂) with the bound curves K₁ + K₂ = 1/M, 2M²K₁K₂ + MK₂ − 1 = 0, K₁ + K₂ = 0.9142/M, and K₁ + K₂ = 0.5·(1 + 1/M).]
Results:
• Uniqueness: 31 entries and below.
• Equivalence:
  – 16 entries and below (D-H),
  – 29 entries and below (E-B).
Equivalence – Uniqueness Gap
• For uniqueness we got the requirement  ‖γ‖₀ < 1/M.
• For equivalence we got the requirement  ‖γ‖₀ < (√2 − 0.5)/M.
• Is this gap due to careless bounding?
• Answer [by Feuer and Nemirovski, to appear in IEEE Transactions on Information Theory]: No, both bounds are indeed tight.
Agenda
1. Introduction
   Previous and current work
2. Two Ortho-Bases
   Uncertainty → Uniqueness → Equivalence
3. Arbitrary dictionary
   Uniqueness → Equivalence
   [Diagram: an N-by-L dictionary Φ, L ≥ N; every column is normalized to have an ℓ₂ unit norm.]
4. Stylized Applications
   Separation of point, line, and plane clusters
5. Discussion
Why General Dictionaries?
• In many situations:
  – we would like to use more than just two ortho-bases (e.g. Wavelet, Fourier, and ridgelets);
  – we would like to use non-ortho bases (pseudo-polar FFT, Gabor transform, …);
  – we would like to use non-square transforms as our building blocks (Laplacian pyramid, shift-invariant Wavelet, …).
• In the following analysis we assume an ARBITRARY DICTIONARY (frame). We show that BP is successful over such dictionaries as well.
Uniqueness - Basics
• Given a unit-norm signal s, assume we hold two different representations for it using Φ:
    s = Φγ₁ = Φγ₂   →   Φ(γ₁ − γ₂) = 0
• In the two-ortho case we used a simple splitting and the uncertainty rule – here there is no such splitting!
• The equation Φv = 0 implies a linear combination of columns from Φ that is linearly dependent. What is the smallest such group?
Uniqueness – Matrix “Spark”
Definition: Given a matrix Φ, define σ = Spark{Φ} as the smallest integer such that there exists at least one group of σ columns from Φ that is linearly dependent.
Examples:
  – Spark{[I_N | 1]} = N + 1  (the N-by-N identity with the all-ones column appended: only all N + 1 columns together are dependent);
  – Spark{[I_N | e₁]} = 2  (the identity with its first column duplicated).
“Spark” versus “Rank”
The notion of spark can be confusing – here is an attempt to compare it to the notion of rank.

Rank – Definition: the maximal number of columns that are linearly independent. Computation: sequential – take the first column and add one column at a time, performing Gram-Schmidt orthogonalization; after L steps, count the number of non-zero vectors – this is the rank.

Spark – Definition: the minimal number of columns that are linearly dependent. Computation: combinatorial – sweep through the 2ᴸ combinations of columns to check linear dependence; the smallest group of linearly dependent columns is the Spark.

Generally:  2 ≤ σ = Spark{Φ} ≤ Rank{Φ} + 1.
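The combinatorial computation can be sketched directly (feasible only for tiny matrices; the two test matrices echo the examples from the Spark slide):

```python
import numpy as np
from itertools import combinations

def spark(Phi, tol=1e-10):
    """Combinatorial sweep: size of the smallest linearly dependent
    group of columns. Exponential in L -- tiny matrices only."""
    L = Phi.shape[1]
    for size in range(1, L + 1):
        for cols in combinations(range(L), size):
            # the chosen columns are dependent iff the submatrix
            # is rank-deficient
            if np.linalg.matrix_rank(Phi[:, list(cols)], tol=tol) < size:
                return size
    return L + 1   # full column rank: no dependent subset exists

I4 = np.eye(4)
print(spark(np.hstack([I4, np.ones((4, 1))])))  # identity + all-ones: 5
print(spark(np.hstack([I4, I4[:, :1]])))        # duplicated column:   2
```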
Uniqueness – Using the “Spark”
• Assume that we know the spark of Φ, denoted by σ.
• For any pair of representations of s we have
    s = Φγ₁ = Φγ₂   →   Φ(γ₁ − γ₂) = 0
• By the definition of the spark we know that if Φv = 0 then ‖v‖₀ ≥ σ. Thus
    ‖γ₁ − γ₂‖₀ ≥ σ
• From here we obtain the relationship
    σ ≤ ‖γ₁ − γ₂‖₀ ≤ ‖γ₁‖₀ + ‖γ₂‖₀
Uniqueness Rule – 1
Theorem 4:   σ ≤ ‖γ₁‖₀ + ‖γ₂‖₀
Any two different representations of the same signal using an arbitrary dictionary cannot be jointly sparse.
If we found a representation γ that satisfies
    ‖γ‖₀ < σ/2
then necessarily it is unique (the sparsest).
Lower bound on the “Spark”
• Define
    M = max_{1≤k,j≤L, k≠j} |φₖᴴφⱼ|
  (notice the resemblance to the previous definition of M).
• We can show (based on Geršgorin's disk theorem) that a lower bound on the spark is obtained by
    σ ≥ 1 + 1/M.
• Since the Geršgorin bound is not tight, this lower bound on the Spark is too pessimistic.
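A small numerical illustration of the bound and its pessimism (the spikes + unitary-DFT pair with N = 16 is an illustrative choice; the true spark of this pair is 2√N = 8, attained by the picket-fence null vector):

```python
import numpy as np

def spark_lower_bound(Phi):
    """Gersgorin-based bound: sigma >= 1 + 1/M, where M is the largest
    off-diagonal magnitude of the Gram matrix (unit-norm columns)."""
    G = np.abs(Phi.conj().T @ Phi)
    np.fill_diagonal(G, 0.0)   # ignore the unit diagonal
    return 1.0 + 1.0 / G.max()

# Spikes + unitary DFT, N = 16: M = 1/4, so the bound gives sigma >= 5,
# while the true spark is 8 -- pessimistic indeed.
N = 16
Psi = np.fft.fft(np.eye(N)) / np.sqrt(N)
Phi = np.hstack([np.eye(N), Psi])
print(spark_lower_bound(Phi))   # ~5.0
```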
Uniqueness Rule – 2
Theorem 5:   1 + 1/M ≤ σ ≤ ‖γ₁‖₀ + ‖γ₂‖₀
Any two different representations of the same signal using an arbitrary dictionary cannot be jointly sparse.
If we found a representation γ that satisfies
    ‖γ‖₀ < 0.5·(1 + 1/M) *
then necessarily it is unique (the sparsest).
* This is the same as Donoho and Huo's bound! Have we lost tightness?
“Spark” Upper bound
• The Spark can be found by solving, for k = 1, …, L,
    (Sₖ):  γₖ^S = argmin_γ ‖γ‖₀  s.t.  Φγ = 0  and  γ(k) = 1.
  Thus  σ = min_{1≤k≤L} ‖γₖ^S‖₀.
• Use Basis Pursuit instead:
    (Qₖ):  γₖ^Q = argmin_γ ‖γ‖₁  s.t.  Φγ = 0  and  γ(k) = 1.
• Clearly ‖γₖ^Q‖₀ ≥ ‖γₖ^S‖₀, and thus
    σ ≤ min_{1≤k≤L} ‖γₖ^Q‖₀.
Equivalence – The Result
Following the same path as shown before for the equivalence theorem in the two-ortho case, and adopting the new definition of M, we obtain the following result:
Theorem 6:  Given a signal s with a representation s = Φγ, if ‖γ‖₀ < 0.5·(1 + 1/M), then P₁ (BP) is guaranteed to find the sparsest solution. *
* This is the same as Donoho and Huo's bound! Is it non-tight?
To Summarize so far …
• Over-complete linear transforms are great for sparse representations – but how do we compute the forward transform?
• The Basis Pursuit algorithm: why does it work so well, and what are its practical implications?
• We give explanations (uniqueness and equivalence) that are true for any dictionary.
• Practical implications: (a) design of dictionaries, (b) testing a candidate solution for optimality, (c) applications – scrambling, signal separation, inverse problems, …
Agenda
1. Introduction
Previous and current work
2. Two Ortho-Bases
Uncertainty → Uniqueness → Equivalence
3. Arbitrary dictionary
Uniqueness → Equivalence
4. Stylized Applications
Separation of point, line, and plane clusters
5. Discussion
Problem Definition
• Assume a 3D volume S(n₁, n₂, n₃) = S(n) of size p × p × p voxels.
• S contains digital points, lines, and planes, defined algebraically (over ℤₚ³):
  – Point – one voxel;
  – Line – p voxels (defined by a starting voxel on the cube's face and a slope – wrap-around is permitted);
  – Plane – p² voxels (defined by a height voxel in the cube and two slope parameters – wrap-around is permitted).
• Task: decompose S into the point, line, and plane atoms – a sparse decomposition is desired.
Properties To Use
Digital lines and planes are constructed such that:
  – any two lines are disjoint, or intersect in a single voxel;
  – any two planes are disjoint, or intersect in a line of p voxels;
  – any digital line intersects a digital plane either not at all, in a single voxel, or along the p voxels of the line itself.
Our Dictionary
• We construct a dictionary containing all occurrences of points, lines, and planes (O(p⁴) columns).
• Every atom in our set is reorganized lexicographically as a column vector of size p³-by-1.
• The spark of this dictionary is σ = p + 1 (p points and one line, or p lines and one plane).
• The mutual incoherence is M = p^(−0.5) = 1/√p (the normalized inner product of a plane with a line, or of a line with a point: ⟨Line, Point⟩ = 1/√p).
Thus we can say that …
Due to Theorem 4: if we found a representation γ that satisfies
    ‖γ‖₀ < (p + 1)/2
then necessarily it is the sparsest (the P₀ solution).
Due to Theorem 6: given a volume s with a representation s = Φγ such that
    ‖γ‖₀ < 0.5·(1 + √p),
BP is guaranteed to find it.
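For p = 1024 these two bounds evaluate to 512.5 and 16.5 atoms; a trivial arithmetic check:

```python
# Bounds for the point/line/plane dictionary with p = 1024
# (spark sigma = p + 1, mutual incoherence M = 1/sqrt(p)):
p = 1024
uniqueness_bound = (p + 1) / 2        # Theorem 4: ||gamma||_0 < sigma / 2
bp_bound = 0.5 * (1 + p ** 0.5)       # Theorem 6: ||gamma||_0 < 0.5*(1 + 1/M)
print(uniqueness_bound, bp_bound)     # 512.5 16.5
```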
Implications
• Assume p = 1024:
  – a representation with fewer than 513 atoms IS KNOWN TO BE THE SPARSEST;
  – a representation with fewer than 17 atoms WILL BE FOUND BY THE BP.
• What happens in between? We know that the
second bound is very loose! BP will succeed far
above 17 atoms. Further work is needed to
establish this fact.
• What happens above 513 atoms? No theoretical
answer yet!
Towards Practical Application
• The algebraic definition of the point, line, and plane atoms was chosen to enable the computation of σ and M.
• Although the wrap-around makes this definition non-intuitive, these results are suggestive of what may hold for geometrically intuitive definitions of points, lines, and planes.
• Further work is required to establish this belief.
Agenda
1. Introduction
Previous and current work
2. Two Ortho-Bases
Uncertainty → Uniqueness → Equivalence
3. Arbitrary dictionary
Uniqueness → Equivalence
4. Stylized Applications
Separation of point, line, and plane clusters
5. Discussion
Summary
• The Basis Pursuit can be used as
  – a forward transform, leading to sparse representations;
  – a way to achieve non-linear filtering.
• The dream: the over-completeness idea is highly effective, and should be used in modern methods of representation and inverse problems.
• We would like to contribute to this development by
  – supplying clear(er) explanations of the BP behavior,
  – improving the involved numerical tools, and then
  – deploying them in applications.
Future Work
• What dictionary to use? What is the relation to learning?
• BP beyond the bounds – can we say more?
• A relaxed notion of sparsity? When is zero really zero?
• How to speed up the BP solver (both accurate and approximate)?
• What is the theory behind approximate BP?
• Applications – demonstrating the concept for practical problems, such as denoising, coding, restoration, signal separation, …