Eric Price
MIT
David Woodruff
IBM Almaden
• Choose an r x n matrix A
• Given x ∈ R^n
• Compute Ax
• Output a vector y so that, with constant probability over A, |x − y|_p ≤ (1+ε) |x − x_{top k}|_p
• x_{top k} is the k-sparse vector of the largest-magnitude coefficients of x
• p = 1 or p = 2
• Minimize number r = r(n, k, ε) of “measurements”
• p = 1: [IR, …] r = O(k log(n/k) / ε) (deterministic A)
• p = 2: [GLPS] r = O(k log(n/k) / ε)
In both cases, r = Ω(k log(n/k)) [DIPW]
What is the dependence on ε?
• Suppose x = e_i + u, where e_i = (0, 0, …, 0, 1, 0, …, 0) (say, 1/ε = 100, log n = 32)
• Consider y = 0^n
  – |x − y|_2 = |x|_2 ≤ 2^{1/2} · |x − e_i|_2
  – It’s a trivial solution!
• (1+ε)-approximate recovery fixes this
• p = 1:
  [IR, …] r = O(k log(n/k) / ε)
  r = O(k log(n/k) · log^2(1/ε) / ε^{1/2}) (randomized)
  r = Ω(k log(1/ε) / ε^{1/2})
• p = 2:
  [GLPS] r = O(k log(n/k) / ε)
  r = Ω(k log(n/k) / ε)
Previous lower bounds: Ω(k log(n/k))
The new lower bounds hold for randomized schemes with constant success probability
• We get a randomized upper bound of r = O~(k/ε^{1/2}) for p = 1
• We show Ω(k log(n/k) / ε) for p = 1 for deterministic schemes
• So randomized schemes are provably easier than deterministic ones
• Output a vector y from Ax so that |x − y|_p ≤ (1+ε) |x − x_{top k}|_p
• Sometimes want y to be k-sparse: then r = Θ~(k/ε^p)
• Both results tight up to logarithmic factors
• Recall that for non-sparse output r = Θ~(k/ε^{p/2})
1. O~(k/ε^{1/2}) upper bound for p = 1
2. Lower bounds
• Want O~(k/ε^{1/2}) for p = 1
• Replace k with 1 (see the code sketch after this list):
  – Sample a 1/k fraction of the coordinates
  – Solve the problem for k = 1 on the sample
  – Repeat O~(k) times independently
  – Combine the solutions found
[Example: x = (ε/k, ε/k, …, ε/k, 1/n, 1/n, …, 1/n); after sampling a 1/k fraction: (ε/k, 1/n, …, 1/n)]
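The reduction above can be summarized in a few lines of Python. This is only a schematic sketch of the sample/solve/repeat/combine structure, not the paper's algorithm; `solve_k1` is a hypothetical stand-in for any 1-sparse recovery routine.

```python
import numpy as np

def reduce_k_to_1(x, k, solve_k1, reps):
    """Schematic k -> 1 reduction: sample a 1/k fraction of coordinates,
    run a 1-sparse solver on the sample, repeat, and combine the results."""
    n = len(x)
    rng = np.random.default_rng()
    found = {}                              # coordinate -> estimated value
    for _ in range(reps):                   # reps ~ O~(k) independent repetitions
        mask = rng.random(n) < 1.0 / k      # keep each coordinate with prob. 1/k
        sample = np.where(mask, x, 0.0)
        i, val = solve_k1(sample)           # hypothetical 1-sparse recovery routine
        if i is not None:
            found[i] = val
    y = np.zeros(n)
    for i, val in found.items():            # combine the per-repetition solutions
        y[i] = val
    return y
```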
• Assume |x − x_top|_1 = 1, and x_top = ε
• First attempt:
  – Use CountMin [CM]
  – Randomly partition coordinates into B buckets; maintain the sum Σ_{i : h(i) = j} x_i in the j-th bucket
• The expected ℓ1 mass of “noise” in a bucket is 1/B
• If B = Θ(1/ε), most buckets have count < ε/2, but the bucket that contains x_top has count > ε/2
• Repeat O(log n) times
• But we wanted O~(1/ε^{1/2}) measurements
• Error in a bucket is 1/B, so we need B ≈ 1/ε
• What about CountSketch? [CCF-C]
  – Give each coordinate i a random sign σ(i) ∈ {−1, +1}
  – Randomly partition coordinates into B buckets; maintain Σ_{i : h(i) = j} σ(i) · x_i in the j-th bucket
  – Bucket error is (Σ_{i ≠ top} x_i^2 / B)^{1/2}
  – Is this better?
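For concreteness, here is a minimal Python sketch of one repetition of each of the two bucket schemes just described. Function names are mine, and real CountMin/CountSketch use several independent repetitions.

```python
import numpy as np

def countmin_buckets(x, B, rng):
    """One repetition of CountMin: hash each coordinate to one of B buckets
    and keep the plain sum in each bucket."""
    n = len(x)
    h = rng.integers(0, B, size=n)        # random bucket h(i) for each coordinate
    counts = np.zeros(B)
    np.add.at(counts, h, x)               # counts[j] = sum_{i : h(i) = j} x_i
    return h, counts

def countsketch_buckets(x, B, rng):
    """One repetition of CountSketch: additionally give each coordinate a
    random sign, so bucket noise is ~ (sum_i x_i^2 / B)^{1/2}."""
    n = len(x)
    h = rng.integers(0, B, size=n)
    sigma = rng.choice([-1.0, 1.0], size=n)
    counts = np.zeros(B)
    np.add.at(counts, h, sigma * x)       # counts[j] = sum_{i : h(i) = j} sigma(i) x_i
    return h, sigma, counts

# Estimate of x_i: counts[h[i]] for CountMin, sigma[i] * counts[h[i]] for CountSketch.
```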
• Bucket error Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2}
• All |x_i| ≤ ε and |x − x_top|_1 = 1
• So Σ_{i ≠ top} x_i^2 ≤ (1/ε) · ε^2 = ε
• Hence Err ≤ (ε/B)^{1/2}, which needs to be at most ε
• Solving, B ≥ 1/ε
• CountSketch isn’t better than CountMin
• We insist on using CountSketch with B = 1/ε^{1/2}
• Suppose Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2} = ε
• This means Σ_{i ≠ top} x_i^2 = ε^{3/2}
• Forget about x_top!
• Let’s make up the mass another way
• We have: Σ_{i ≠ top} x_i^2 = ε^{3/2}
• Intuition: suppose all x_i, i ≠ top, are the same or 0
• Then: (# non-zero) · value = 1 and (# non-zero) · value^2 = ε^{3/2}
• Hence, value = ε^{3/2} and # non-zero = 1/ε^{3/2}
• Sample an ε-fraction of coordinates uniformly at random!
  – value = ε^{3/2} and # non-zero sampled = 1/ε^{1/2}, so their ℓ1-contribution = ε
  – Find all the non-zeros with O~(1/ε^{1/2}) measurements
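A quick numeric sanity check of this counting, for an illustrative ε = 0.01 (my own example value):

```python
eps = 0.01                     # illustrative value of epsilon

value = eps ** 1.5             # each tail coordinate has magnitude eps^{3/2}
nonzeros = 1 / eps ** 1.5      # so (# non-zero) * value = 1

print(nonzeros * value)        # l1 mass of the tail: 1.0
print(nonzeros * value ** 2)   # l2^2 of the tail: eps^{3/2} = 0.001

sampled = eps * nonzeros       # survivors after sampling an eps-fraction
print(sampled)                 # 1/eps^{1/2} = 10 non-zeros remain
print(sampled * value)         # their l1 contribution: eps = 0.01
```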
• Σ_{i ≠ top} x_i^2 = ε^{3/2}
• S_j = {i | 1/4^j < x_i^2 ≤ 1/4^{j−1}}
• Σ_{i ≠ top} x_i^2 = ε^{3/2} implies there is a j for which |S_j|/4^j = Ω~(ε^{3/2})
[Figure: the level sets of x, grouped by value (ε^{3/4}; 16ε^{3/2}, …, 16ε^{3/2}; 4ε^{3/2}, …, 4ε^{3/2}; ε^{3/2}, …, ε^{3/2})]
• If |S_j| < 1/ε^{1/2}, then 1/4^j > ε^2, so 1/2^j > ε — can’t happen
• Else, sample at rate 1/(|S_j| ε^{1/2}) to get 1/ε^{1/2} elements of S_j
• The ℓ1-mass of S_j in the sample is > ε
• Can we find the sampled elements of S_j? Use Σ_{i ≠ top} x_i^2 = ε^{3/2}
• The ℓ2^2 of the sample is about ε^{3/2} · 1/(|S_j| ε^{1/2}) = ε/|S_j|
• Using CountSketch with 1/ε^{1/2} buckets:
  Bucket error = sqrt{ε^{1/2} · ε^{3/2} · 1/(|S_j| ε^{1/2})} = sqrt{ε^{3/2}/|S_j|} < 1/2^j, since |S_j|/4^j > ε^{3/2}
• Sub-sample O(log 1/ε) times in powers of 2
• In each level of sub-sampling, maintain a CountSketch with O~(1/ε^{1/2}) buckets (see the skeleton after this list)
• Find as many heavy coordinates as you can!
• Intuition: if CountSketch fails, there are many heavy elements that can be found by sub-sampling
• Wouldn’t work for CountMin as bucket error could be ε because of n-1 items each of value ε/(n-1)
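Putting the pieces together, the k = 1 scheme has roughly the following shape. This is a simplified skeleton under my own choices (one hash function per level, a fixed reporting threshold, no failure-probability amplification), not the exact algorithm from the paper; the total number of buckets is about (log 1/ε) · (1/ε^{1/2}) = O~(1/ε^{1/2}).

```python
import numpy as np

def build_sketch(x, eps, rng):
    """Skeleton scheme: sub-sample at rates 1, 1/2, 1/4, ... for O(log 1/eps)
    levels and keep a CountSketch with ~1/eps^{1/2} buckets at each level."""
    n = len(x)
    B = int(np.ceil(1 / np.sqrt(eps)))            # ~1/eps^{1/2} buckets
    levels = int(np.ceil(np.log2(1 / eps)))       # O(log 1/eps) sub-sampling levels
    sketch = []
    for lvl in range(levels + 1):
        keep = rng.random(n) < 2.0 ** (-lvl)      # sub-sample at rate 2^{-lvl}
        h = rng.integers(0, B, size=n)            # bucket for each coordinate
        sigma = rng.choice([-1.0, 1.0], size=n)   # random sign for each coordinate
        counts = np.zeros(B)
        np.add.at(counts, h[keep], sigma[keep] * x[keep])
        sketch.append((keep, h, sigma, counts))
    return sketch

def heavy_candidates(sketch, threshold):
    """Report every coordinate whose CountSketch estimate exceeds the
    threshold at some sub-sampling level."""
    found = {}
    for keep, h, sigma, counts in sketch:
        est = sigma * counts[h]                   # per-coordinate estimate at this level
        hits = np.nonzero(keep & (np.abs(est) > threshold))[0]
        for i in hits:
            found[int(i)] = float(est[i])
    return found
```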
1. O~(k/ε^{1/2}) upper bound for p = 1
2. Lower bounds
• General results:
  – Ω~(k/ε^{1/2}) for p = 1
  – Ω(k log(n/k)/ε) for p = 2
• Sparse output:
  – Ω~(k/ε) for p = 1
  – Ω~(k/ε^2) for p = 2
• Deterministic:
  – Ω(k log(n/k)/ε) for p = 1
[Figure: simultaneous message passing — Alice holds x and sends M_A(x), Bob holds y and sends M_B(y), and the referee must answer “What is f(x,y)?”]
• Alice and Bob each send a single message to the referee, who outputs f(x,y) with constant probability
• Communication cost CC(f) is maximum message length, over randomness of protocol and all possible inputs
• Parties share randomness
• Shared randomness decides matrix A
• Alice sends Ax to referee
• Bob sends Ay to referee
• Referee computes A(x+y) and runs the compressed sensing recovery algorithm on it
• If the output of the algorithm solves f(x,y), then (# rows of A) · (# bits per measurement) ≥ CC(f)
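The reduction rests on two facts: shared randomness lets both parties use the same matrix A, and linearity lets the referee combine the two sketches. The tiny demo below (my own parameter values) just checks that Ax + Ay equals A(x+y), which is what makes the one-message protocol work.

```python
import numpy as np

# Shared randomness fixes the same measurement matrix A for both parties.
seed, r, n = 1234, 20, 1000
A = np.random.default_rng(seed).standard_normal((r, n))

rng = np.random.default_rng(0)
x = rng.standard_normal(n)        # Alice's input
y = rng.standard_normal(n)        # Bob's input

msg_alice = A @ x                 # Alice's single message: Ax
msg_bob = A @ y                   # Bob's single message: Ay

# By linearity, the referee obtains the measurements of x + y for free
# and can run any recovery algorithm on them to decide f(x, y).
assert np.allclose(msg_alice + msg_bob, A @ (x + y))
```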
• General results: Direct-Sum Gap-ℓ∞
  – Ω~(k/ε^{1/2}) for p = 1
  – Ω~(k/ε) for p = 2
• Sparse output: Indexing
  – Ω~(k/ε) for p = 1
  – Ω~(k/ε^2) for p = 2
• Deterministic: Equality
  – Ω(k log(n/k)/ε) for p = 1
Tighter log factors achievable by looking at Gaussian channels
• Alice and Bob have x and y, respectively, in R^m
• There is a unique i* for which (x+y)_{i*} = d
• For all j ≠ i*, (x+y)_j ∈ {0, c, −c}, where |c| < |d|
• Finding i* requires Ω(m/(d/c)^2) communication [SS, BJKS]
• m = 1/ε^{3/2}, c = ε^{3/2}, d = ε
• Need Ω(1/ε^{1/2}) communication
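Plugging the stated parameters into the communication bound gives the claimed quantity:

```latex
\[
  \frac{m}{(d/c)^2}
  = \frac{1/\varepsilon^{3/2}}{\left(\varepsilon/\varepsilon^{3/2}\right)^2}
  = \frac{\varepsilon^{-3/2}}{\varepsilon^{-1}}
  = \frac{1}{\varepsilon^{1/2}} .
\]
```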
• But the compressed sensing algorithm doesn’t need to find i*
• If it doesn’t, then it needs to transmit a lot of information about the tail
  – The tail is a random low-weight vector in {0, ε^{3/2}, −ε^{3/2}}^{1/ε^{3/2}}
  – Uses a distributional lower bound and Reed–Solomon codes
• Sending a vector y within 1 − ε of the tail in ℓ1-norm needs Ω(1/ε^{1/2}) communication
• Same argument, different parameters
• Ω(1/ε) communication
• What about general k?
• Bounded Round Direct Sum Theorem [BR] (with slight modification): given k copies of a function f, with input pairs drawn independently from μ, solving a 2/3 fraction of the copies needs communication Ω(k · CC_μ(f))
[Figure: instance for p = 1 — k blocks, each of the form (ε^{1/2}, ε^{3/2}, …, ε^{3/2})]
• A is a matrix of real numbers
• Can assume orthonormal rows
• Round the entries of A to O(log n) bits, obtaining matrix A’
• Careful
– A’x = A(x+s) for “small” s
– But s depends on A, no guarantee recovery works
– Can be fixed by looking at A(x+s+u) for random u
• (# rows of A) · (# bits per measurement) ≥ CC(f)
• By rounding, # bits per measurement = O(log n)
• In our hard instances, the universe size is poly(k/ε)
• So (# rows of A) · O(log(k/ε)) ≥ CC(f)
• # rows of A = Ω~(k/ε^{1/2}) for p = 1
• # rows of A = Ω~(k/ε) for p = 2
Sparse output: Indexing
  – Ω~(k/ε) for p = 1
  – Ω~(k/ε^2) for p = 2
What is x_i?
x ∈ {0,1}^n, i ∈ {1, 2, …, n}
CC(Indexing) = Ω(n)
• Reduction: x ∈ {−ε, ε}^{1/ε}, y = e_i
• Consider x + y
• If the output is required to be 1-sparse, it must place mass on the i-th coordinate
• The mass must be 1+ε if x_i = ε, otherwise 1−ε
• Generalizes to k > 1 to give Ω~(k/ε), and to p = 2 to give Ω~(k/ε^2)
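A toy version of this decoding step for k = 1 (parameter and variable names are mine): Alice encodes her Indexing input as x ∈ {−ε, ε}^{1/ε}, Bob sets y = e_i, and reading the mass placed on coordinate i recovers Alice's bit.

```python
import numpy as np

eps = 0.1
m = int(round(1 / eps))               # universe size 1/eps
rng = np.random.default_rng(0)

bits = rng.integers(0, 2, size=m)     # Alice's Indexing input (bit string)
x = np.where(bits == 1, eps, -eps)    # encode bits as +/- eps
i = int(rng.integers(0, m))           # Bob's index
y = np.zeros(m)
y[i] = 1.0                            # y = e_i

z = x + y
# A 1-sparse output meeting the (1+eps) guarantee must put its single
# non-zero on coordinate i, with mass close to 1+eps or 1-eps, which
# lets the referee read off Alice's bit x_i.
recovered_bit = 1 if z[i] > 1 else 0
assert recovered_bit == bits[i]
```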
Deterministic: Equality
  – Ω(k log(n/k)/ε) for p = 1
Is x = y?
x ∈ {0,1}^n, y ∈ {0,1}^n
Deterministic CC(Equality) = Ω(n)
Choose log n signals x_1, …, x_{log n}, each with k/ε values equal to ε/k
Choose log n signals y_1, …, y_{log n}, each with k/ε values equal to ε/k
x = Σ_{i=1}^{log n} 10^i x_i, y = Σ_{i=1}^{log n} 10^i y_i
Consider x − y
The compressed sensing output is 0^n iff x = y
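A tiny numeric illustration of this encoding (my own toy parameters): the base-10 weights keep the levels from cancelling, so x − y = 0^n exactly when every pair of signals agrees.

```python
import numpy as np

rng = np.random.default_rng(0)
k, eps, n = 2, 0.25, 64
levels = int(np.log2(n))                  # log n signals
support = int(k / eps)                    # k/eps non-zero entries per signal

def random_signal():
    s = np.zeros(n)
    idx = rng.choice(n, size=support, replace=False)
    s[idx] = eps / k                      # k/eps values equal to eps/k
    return s

xs = [random_signal() for _ in range(levels)]
ys = [random_signal() for _ in range(levels)]

x = sum(10.0 ** i * xs[i] for i in range(levels))
y = sum(10.0 ** i * ys[i] for i in range(levels))

# In each coordinate the difference is a base-10 number with digits in
# {-1, 0, 1} (times eps/k), so it is zero iff every digit is zero.
equal_signals = all(np.array_equal(xs[i], ys[i]) for i in range(levels))
assert np.allclose(x - y, 0) == equal_signals
```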
• Alice has a signal x = ε^{1/2} e_i for a random i ∈ [n]
• Alice transmits x over a noisy channel with independent N(0, 1/n) noise on each coordinate
• Consider any row vector a of A
• Channel output = <a,x> + <a,y>, where <a,y> is N(0, |a|_2^2/n)
• E_i[<a,x>^2] = ε |a|_2^2/n
• Shannon–Hartley Theorem: I(i; <a,x> + <a,y>) = I(<a,x>; <a,x> + <a,y>) ≤ ½ log(1 + ε) = O(ε)
• So r measurements reveal only O(rε) bits about i, while identifying i needs Ω(log n) bits, giving r = Ω(log(n)/ε)
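Spelling out the signal-to-noise calculation behind the last bullet, using the two quantities computed above:

```latex
\[
  \mathrm{SNR}
  = \frac{\mathbb{E}_i\big[\langle a, x\rangle^2\big]}{\mathrm{Var}\big[\langle a, y\rangle\big]}
  = \frac{\varepsilon \,\|a\|_2^2 / n}{\|a\|_2^2 / n}
  = \varepsilon ,
  \qquad
  I\big(\langle a,x\rangle ;\, \langle a,x\rangle + \langle a,y\rangle\big)
  \le \tfrac{1}{2}\log(1+\mathrm{SNR}) = O(\varepsilon).
\]
```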
• General results
  – Θ~(k/ε^{p/2})
• Sparse output
  – Θ~(k/ε^p)
• Deterministic
  – Θ(k log(n/k)/ε) for p = 1