Information complexity and exact communication bounds

Information complexity and exact communication bounds
Mark Braverman
Princeton University
April 26, 2013
Based on joint work with Ankit Garg, Denis Pankratov, and Omri Weinstein
Overview: information complexity
• Information complexity is to communication complexity as Shannon’s entropy is to transmission cost.
Background – information theory
• Shannon (1948) introduced information
theory as a tool for studying the
communication cost of transmission tasks.
[Figure: Alice sends a message to Bob over a communication channel]
Shannon’s entropy
• Assume a lossless binary channel.
• A message 𝑋 is distributed according to
some prior πœ‡.
• The inherent number of bits it takes to transmit X is given by its entropy
  H(X) = Σ_x μ[X = x] · log₂(1/μ[X = x]).
[Figure: the message X is sent over the communication channel]
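A minimal Python sketch of this formula (assuming the distribution μ is given as a dictionary mapping values to probabilities):

```python
import math

def entropy(mu):
    """Shannon entropy H(X) in bits, for a distribution mu: value -> probability."""
    return sum(p * math.log2(1.0 / p) for p in mu.values() if p > 0)

# A fair coin carries one bit of entropy; a biased coin carries less.
print(entropy({"heads": 0.5, "tails": 0.5}))   # 1.0
print(entropy({"heads": 0.9, "tails": 0.1}))   # ~0.469
```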
Shannon’s noiseless coding
• The cost of communicating many copies of
𝑋 scales as 𝐻(𝑋).
• Shannon’s source coding theorem:
  – Let C_n(X) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is
    lim_{n→∞} C_n(X)/n = H(X).
Shannon’s entropy – cont’d
• Therefore, understanding the cost of
transmitting a sequence of 𝑋’s is equivalent
to understanding Shannon’s entropy of 𝑋.
• What about more complicated scenarios?
[Figure: X is transmitted over the channel; the receiver already knows Y]
• Amortized transmission cost = conditional entropy H(X|Y) ≔ H(XY) − H(Y).
A simple example (easy and complete!)
• Alice has n uniform t_1, t_2, …, t_n ∈ {1,2,3,4,5}.
• The cost of transmitting them to Bob is ≈ log₂ 5 · n ≈ 2.32 n.
• Suppose that for each t_i Bob is given a uniformly random s_i ∈ {1,2,3,4,5} such that s_i ≠ t_i; then the cost of transmitting the t_i’s to Bob is ≈ log₂ 4 · n = 2 n.
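A quick numeric check of this example (a sketch, assuming the uniform t_i and s_i described above):

```python
import math

alphabet = 5                      # each t_i is uniform on {1, ..., 5}
print(math.log2(alphabet))        # ~2.32 bits per symbol without side information

# Given s_i != t_i, only 4 equally likely candidates remain for t_i,
# so the conditional entropy H(t_i | s_i) is log2(4) = 2 bits per symbol.
print(math.log2(alphabet - 1))    # 2.0
```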
Meanwhile, in a galaxy far far away…
Communication complexity [Yao]
• Focus on the two-party randomized setting.
[Figure: Alice (input X) and Bob (input Y) use shared randomness R; A & B implement a functionality F(X, Y), e.g. F(X, Y) = “X = Y?”]
Communication complexity
Goal: implement a functionality 𝐹(𝑋, π‘Œ).
A protocol πœ‹(𝑋, π‘Œ) computing 𝐹(𝑋, π‘Œ):
[Figure: Alice (input X) and Bob (input Y), with shared randomness R, alternate messages m1(X, R), m2(Y, m1, R), m3(X, m1, m2, R), … until F(X, Y) is determined.]
Communication cost = # of bits exchanged.
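For a feel of this message structure, here is a sketch of a one-way randomized protocol for “X = Y?” built from random inner products over the shared randomness (a standard textbook construction, not a protocol from this talk):

```python
import random

def equality_protocol(x, y, reps=20, seed=None):
    """Randomized protocol for F(X, Y) = 'X = Y?' using shared randomness.

    Per repetition, Alice sends one bit: the inner product (mod 2) of her input
    with a shared random string; Bob compares it with his own inner product.
    If X != Y, each repetition catches the difference with probability 1/2.
    """
    shared = random.Random(seed)   # stands in for the shared random tape R
    for _ in range(reps):
        r = [shared.randint(0, 1) for _ in range(len(x))]
        m = sum(xi * ri for xi, ri in zip(x, r)) % 2       # Alice's 1-bit message
        if m != sum(yi * ri for yi, ri in zip(y, r)) % 2:  # Bob's check
            return 0   # definitely X != Y
    return 1           # X = Y, except with probability <= 2**-reps

print(equality_protocol([0, 1, 1, 0], [0, 1, 1, 0], seed=7))  # 1
print(equality_protocol([0, 1, 1, 0], [1, 1, 0, 0], seed=7))  # 0 with high probability
```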
Communication complexity
• Numerous applications/potential applications (streaming, data structures, circuit lower bounds…).
• Considerably more difficult to obtain lower bounds than for transmission (still much easier than for other models of computation).
• Many lower-bound techniques exist.
• Exact bounds??
Communication complexity
• (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε). Error ≤ ε w.r.t. μ.
• (Randomized/worst-case) communication complexity: CC(F, ε). Error ≤ ε on all inputs.
• Yao’s minimax: CC(F, ε) = max_μ CC(F, μ, ε).
Set disjointness and intersection
Alice and Bob are each given a set X ⊆ {1, …, n}, Y ⊆ {1, …, n} (the sets can be viewed as vectors in {0,1}^n).
• Intersection: Int_n(X, Y) = X ∩ Y.
• Disjointness: Disj_n(X, Y) = 1 if X ∩ Y = ∅, and 0 otherwise.
• Int_n is just n 1-bit ANDs in parallel.
• ¬Disj_n is an OR of n 1-bit ANDs.
• We need to understand the amortized communication complexity (of the 1-bit AND).
Information complexity
• The smallest amount of information Alice
and Bob need to exchange to solve 𝐹.
• How is information measured?
• Communication cost of a protocol?
– Number of bits exchanged.
• Information cost of a protocol?
– Amount of information revealed.
Basic definition 1: The information cost of a protocol
• Prior distribution: (X, Y) ∼ μ.
[Figure: Alice (input X) and Bob (input Y) run a protocol π, producing the transcript Π]
𝐼𝐢(πœ‹, πœ‡) = 𝐼(Π; π‘Œ|𝑋) + 𝐼(Π; 𝑋|π‘Œ)
what Alice learns about Y + what Bob learns about X
Mutual information
• The mutual information of two random
variables is the amount of information
knowing one reveals about the other:
𝐼(𝐴; 𝐡) = 𝐻(𝐴) − 𝐻(𝐴|𝐡)
• If 𝐴, 𝐡 are independent, 𝐼(𝐴; 𝐡) = 0.
• 𝐼(𝐴; 𝐴) = 𝐻(𝐴).
[Venn diagram: H(A) and H(B) overlapping in I(A;B)]
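A small sketch computing mutual information from a joint distribution (the dict-based representation is an assumption made for illustration):

```python
import math

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """I(A;B) = H(A) + H(B) - H(A,B), for a joint distribution {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return H(pa) + H(pb) - H(joint)

# Independent fair bits: I(A;B) = 0.  A copy of A: I(A;A) = H(A) = 1.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
copy = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(independent))  # 0.0
print(mutual_information(copy))         # 1.0
```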
Example
•πΉ is “𝑋 = π‘Œ? ”.
•πœ‡ is a distribution where w.p. ½ 𝑋 = π‘Œ and w.p.
½ (𝑋, π‘Œ) are random.
Y
X
MD5(X) [128 bits]
X=Y? [1 bit]
A
B
𝐼(πœ‹, πœ‡) = 𝐼(Π; π‘Œ|𝑋) + 𝐼(Π; 𝑋|π‘Œ) ≈ 1 + 64.5 = 65.5 bits
what Alice learns about Y + what Bob learns about X
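A hedged back-of-the-envelope for the 64.5 figure (my reading; the slide does not spell the calculation out): conditioned on Y, with probability ½ we have X = Y and the transcript essentially just confirms this (≈ 1 bit), while with probability ½ the 128-bit hash reveals essentially all of an independent random X:

$$ I(\Pi; X \mid Y) \approx \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot 128 = 64.5 \text{ bits}, \qquad I(\Pi; Y \mid X) \approx 1 \text{ bit}. $$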
Information complexity
• Communication complexity:
  CC(F, μ, ε) ≔ min_{π computes F with error ≤ ε} ‖π‖,
  where ‖π‖ is the communication cost of π.
• Analogously:
  IC(F, μ, ε) ≔ inf_{π computes F with error ≤ ε} IC(π, μ).
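A small sketch evaluating IC(π, μ) = I(Π; Y|X) + I(Π; X|Y) for a toy one-message protocol (the function names and the dict-based representation are my own, chosen only for illustration):

```python
import math
from collections import defaultdict

def H(dist):
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def cond_mi(joint, i, j, k):
    """I(A_i; A_j | A_k) = H(A_i,A_k) + H(A_j,A_k) - H(A_i,A_j,A_k) - H(A_k)."""
    def marg(idx):
        m = defaultdict(float)
        for t, p in joint.items():
            m[tuple(t[c] for c in idx)] += p
        return m
    return H(marg((i, k))) + H(marg((j, k))) - H(marg((i, j, k))) - H(marg((k,)))

def info_cost(mu, msg_given_x):
    """IC(pi, mu) when Alice sends a single bit M with Pr[M=1 | X=x] = msg_given_x[x]."""
    joint = defaultdict(float)                 # distribution over (x, y, m)
    for (x, y), p in mu.items():
        joint[(x, y, 1)] += p * msg_given_x[x]
        joint[(x, y, 0)] += p * (1 - msg_given_x[x])
    return cond_mi(joint, 2, 1, 0) + cond_mi(joint, 2, 0, 1)   # I(M;Y|X) + I(M;X|Y)

uniform = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(info_cost(uniform, {0: 0.0, 1: 1.0}))  # Alice sends X: reveals H(X|Y) = 1 bit
print(info_cost(uniform, {0: 0.5, 1: 0.5}))  # a purely random bit reveals 0 bits
```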
Prior-free information complexity
• Using minimax we can get rid of the prior.
• For communication, we had:
  CC(F, ε) = max_μ CC(F, μ, ε).
• For information:
  IC(F, ε) ≔ inf_{π computes F with error ≤ ε} max_μ IC(π, μ).
Connection to privacy
• There is a strong connection between information complexity and (information-theoretic) privacy.
• Alice and Bob want to perform a computation without revealing unnecessary information to each other (or to an eavesdropper).
• Negative results through IC arguments.
Information equals amortized communication
• Recall [Shannon]: lim_{n→∞} C_n(X)/n = H(X).
• [BR’11]: lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε), for ε > 0.
• For ε = 0: lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).
• [lim_{n→∞} CC(F^n, μ^n, 0)/n is an interesting open question.]
Without priors
• [BR’11] For ε = 0:
  lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).
• [B’12]: lim_{n→∞} CC(F^n, 0+)/n = IC(F, 0).
Intersection
• Therefore
  CC(Int_n, 0+) = n · IC(AND, 0) ± o(n).
• Need to find the information complexity of the two-bit AND!
The two-bit AND
• [BGPW’12]: IC(AND, 0) ≈ 1.4922 bits.
• Find the value of IC(AND, μ, 0) for all priors μ.
• Find the information-theoretically optimal protocol for computing the AND of two bits.
“Raise your hand when your number is reached”
The optimal protocol for AND
[Figure: Alice holds X ∈ {0,1}, Bob holds Y ∈ {0,1}; a shared counter sweeps from 0 to 1.]
If X=1, Alice sets A=1; if X=0, she picks A ∼ U[0,1].
If Y=1, Bob sets B=1; if Y=0, he picks B ∼ U[0,1].
Each party raises a hand as soon as the counter reaches their number.
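A discretized simulation sketch of this protocol (the discretization into a finite number of steps and the function name are my own; the talk’s protocol is the continuous limit, and only the output value is checked here, not its information cost):

```python
import random

def and_protocol(x, y, rounds=1000, rng=random):
    """Discretized 'raise your hand when your number is reached' protocol for AND(x, y).

    Each party holds a counter: 1 if their bit is 1, uniform in [0, 1) if it is 0.
    A public clock sweeps from 0 to 1 in `rounds` steps; as soon as some counter is
    passed, that party raises a hand (revealing their bit is 0) and the output is 0.
    If no hand is raised, both bits must be 1. In the real protocol, the time at
    which a hand goes up is also observed, which is what the information-cost
    analysis is about.
    """
    a = 1.0 if x == 1 else rng.random()
    b = 1.0 if y == 1 else rng.random()
    for step in range(1, rounds + 1):
        t = step / rounds
        if a < t or b < t:      # someone raises a hand: that party's bit is 0
            return 0
    return 1                    # nobody raised a hand: x = y = 1

assert all(and_protocol(x, y) == (x & y) for x in (0, 1) for y in (0, 1))
print("discretized AND protocol agrees with AND on all inputs")
```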
Analysis
• An additional small step is needed if the prior is not symmetric (Pr[XY = 10] ≠ Pr[XY = 01]).
• The protocol is clearly always correct.
• How do we prove the optimality of a protocol?
• Consider IC(F, μ, 0) as a function of the prior μ.
The analytical view
• A message is just a mapping from the current prior to a distribution of posteriors (new priors). Example: Alice sends her bit.

  Prior μ        Y=0    Y=1
      X=0        0.4    0.2
      X=1        0.3    0.1

  “0” is sent with probability 0.6, giving posterior μ0:
      X=0        2/3    1/3
      X=1        0      0

  “1” is sent with probability 0.4, giving posterior μ1:
      X=0        0      0
      X=1        0.75   0.25
The analytical view
• Now Alice sends her bit w.p. ½ and a uniformly random bit w.p. ½.

  Prior μ        Y=0    Y=1
      X=0        0.4    0.2
      X=1        0.3    0.1

  “0” is sent with probability 0.55, giving posterior μ0:
      X=0        0.545  0.273
      X=1        0.136  0.045

  “1” is sent with probability 0.45, giving posterior μ1:
      X=0        2/9    1/9
      X=1        1/2    1/6
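A sketch of this “a message splits the prior” view (the helper name split_prior and the dict representation are assumptions for illustration); it reproduces the numbers in the two examples above:

```python
def split_prior(mu, msg_given_x):
    """Split a prior mu[(x, y)] according to Alice's randomized one-bit message.

    msg_given_x[x] is the probability that Alice sends '1' on input x.
    Returns [(Pr[M=0], posterior mu_0), (Pr[M=1], posterior mu_1)].
    """
    branches = []
    for m in (0, 1):
        weight = {xy: p * (msg_given_x[xy[0]] if m == 1 else 1 - msg_given_x[xy[0]])
                  for xy, p in mu.items()}
        pm = sum(weight.values())
        posterior = {xy: q / pm for xy, q in weight.items()} if pm > 0 else {}
        branches.append((pm, posterior))
    return branches

mu = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.1}

# First example: Alice sends her bit.  Second: she sends it w.p. 1/2, else a coin flip.
print(split_prior(mu, {0: 0.0, 1: 1.0}))     # branches 0.6 / 0.4, posteriors as above
print(split_prior(mu, {0: 0.25, 1: 0.75}))   # branches 0.55 / 0.45, posteriors as above
```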
Analytical view – cont’d
• Denote Ψ(μ) ≔ IC(F, μ, 0).
• Each potential (one-bit) message M by either party imposes a constraint of the form:
  Ψ(μ) ≤ IC(M, μ) + Pr[M = 0] · Ψ(μ0) + Pr[M = 1] · Ψ(μ1).
• In fact, Ψ(μ) is the point-wise largest function satisfying all such constraints (cf. the construction of harmonic functions).
IC of AND
• We show that for the protocol π_AND described above, Ψ_π(μ) ≔ IC(π_AND, μ) satisfies all the constraints, and therefore represents the information complexity of AND at all priors.
• Theorem: π_AND represents the information-theoretically optimal protocol* for computing the AND of two bits.
*Not a real protocol
• The “protocol” is not a real protocol (this is why IC has an inf in its definition).
• The protocol above can be made into a real protocol by discretizing the counter (e.g. into r equal intervals).
• We show that the r-round IC satisfies
  IC_r(AND, 0) = IC(AND, 0) + Θ(1/r²).
Previous numerical evidence
• [Ma, Ishwar ’09] – numerical calculation results.
Applications: communication complexity of intersection
• Corollary:
  CC(Int_n, 0+) ≈ 1.4922 · n ± o(n).
• Moreover:
  CC_r(Int_n, 0+) ≈ 1.4922 · n + Θ(n/r²).
Applications 2: set disjointness
• Recall: Disj_n(X, Y) = 1_{X ∩ Y = ∅}.
• Extremely well-studied. [Kalyanasundaram and Schnitger ’87, Razborov ’92, Bar-Yossef et al. ’02]: CC(Disj_n, ε) = Θ(n).
• What does a hard distribution for Disj_n look like?
A hard distribution?
[Figure: a typical input pair (X, Y) shown as random 0/1 vectors]
Very easy!

  μ              Y=0    Y=1
      X=0        1/4    1/4
      X=1        1/4    1/4
A hard distribution
[Figure: a typical input pair (X, Y) shown as 0/1 vectors]
At most one (1,1) location!

  μ              Y=0    Y=1
      X=0        1/3    1/3
      X=1        1/3    0+
Communication complexity of Disjointness
• Continuing the line of reasoning of Bar-Yossef et al.
• We now know exactly the communication complexity of Disj under any of the “hard” prior distributions. By maximizing, we get:
• CC(Disj_n, 0+) = C_DISJ · n ± o(n), where
  C_DISJ ≔ max_{μ: μ(1,1)=0} IC(AND, μ) ≈ 0.4827…
• With a bit of work this bound is tight.
Small-set Disjointness
• A variant of set disjointness where we are given X, Y ⊂ {1, …, n} of size k ≪ n.
• A lower bound of Ω(k) is obvious (modulo CC(Disj_n) = Ω(n)).
• A very elegant matching upper bound was known [Hastad-Wigderson ’07]: Θ(k).
Using information complexity
• This setting corresponds to the prior distribution

  μ_{k,n}        Y=0        Y=1
      X=0        1 − 2k/n   k/n
      X=1        k/n        0+

• Gives information complexity (2/ln 2) · (k/n) per copy of AND;
• Communication complexity (2/ln 2) · k ± o(k).
Overview: information complexity
• Information complexity is to communication complexity as Shannon’s entropy is to transmission cost.
Today: focused on exact bounds using IC.
Selected open problems 1
• The interactive compression problem.
• For Shannon’s entropy we have
  C(X^n)/n → H(X).
• E.g. by Huffman coding we also know that
  H(X) ≤ C(X) < H(X) + 1.
• In the interactive setting
  CC(F^n, μ^n, 0+)/n → IC(F, μ, 0).
• But is it true that CC(F, μ, 0+) ≲ IC(F, μ, 0)??
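To ground the one-shot bound H(X) ≤ C(X) < H(X) + 1, here is a compact Huffman-coding sketch (the standard construction, included purely as an illustration):

```python
import heapq, math

def huffman_expected_length(mu):
    """Expected codeword length of a binary Huffman code for mu: symbol -> probability."""
    # Priority queue of (probability, tie-breaker id, {symbol: depth}); repeatedly
    # merge the two lightest trees, pushing every symbol in them one level deeper.
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(mu.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next_id, merged))
        next_id += 1
    depths = heap[0][2]
    return sum(mu[s] * depths[s] for s in mu)

mu = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
H = sum(p * math.log2(1 / p) for p in mu.values())
L = huffman_expected_length(mu)
print(H, L)             # for this dyadic distribution both are 1.75
assert H <= L < H + 1   # the one-shot bound
```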
Interactive compression?
• CC(F, μ, 0+) ≲ IC(F, μ, 0) is equivalent to
  CC(F, μ, 0+) ≲ CC(F^n, μ^n, 0+)/n, the “direct sum” problem for communication complexity.
• Currently the best general compression scheme [BBCR’10]: a protocol of information cost I and communication cost C can be compressed to O(√(I · C)) bits of communication.
Interactive compression?
• CC(F, μ, 0+) ≲ IC(F, μ, 0) is equivalent to
  CC(F, μ, 0+) ≲ CC(F^n, μ^n, 0+)/n, the “direct sum” problem for communication complexity.
• A counterexample would need to separate IC from CC, which would require new lower bound techniques [Kerenidis, Laplante, Lerays, Roland, Xiao ’12].
Selected open problems 2
• Given a truth table for F, a prior μ, and an ε ≥ 0, can we compute IC(F, μ, ε)?
• There is an uncountable number of constraints; we need to understand their structure better.
• Specific F’s with inputs in {1,2,3} × {1,2,3}.
• Going beyond two players.
External information cost
• (𝑋, π‘Œ) ~ πœ‡.
X
C
Y
Protocol
Protocol π
transcript Π
A
B
𝐼𝐢𝑒π‘₯𝑑 πœ‹, πœ‡ = 𝐼 Π; π‘‹π‘Œ ≥ 𝐼 Π; π‘Œ 𝑋 + 𝐼(Π; 𝑋|π‘Œ)
what Charlie learns about (𝑋, π‘Œ)
External information complexity
• 𝐼𝐢𝑒π‘₯𝑑 𝐹, πœ‡, 0 ≔
inf
πœ‹ π‘π‘œπ‘šπ‘π‘’π‘‘π‘’π‘ 
𝐹 π‘π‘œπ‘Ÿπ‘Ÿπ‘’π‘π‘‘π‘™π‘¦
𝐼𝐢𝑒π‘₯𝑑 (πœ‹, πœ‡).
• Conjecture: Zero-error communication scales
like external information:
𝐢𝐢 𝐹 𝑛 ,πœ‡π‘› ,0
lim
𝑛
𝑛→∞
= 𝐼𝐢𝑒π‘₯𝑑 𝐹, πœ‡, 0 ?
• Example: for 𝐼𝑛𝑑𝑛 /𝐴𝑁𝐷 this value is
log 2 3 ≈ 1.585 > 1.492.
47
Thank You!