Information complexity and exact communication bounds
Mark Braverman, Princeton University
April 26, 2013
Based on joint work with Ankit Garg, Denis Pankratov, and Omri Weinstein

Overview: information complexity
• Information complexity is to communication complexity as Shannon's entropy is to transmission cost.

Background – information theory
• Shannon (1948) introduced information theory as a tool for studying the communication cost of transmission tasks.
• [Setting: Alice sends a message to Bob over a communication channel.]

Shannon's entropy
• Assume a lossless binary channel.
• A message X is distributed according to some prior μ.
• The inherent number of bits it takes to transmit X is given by its entropy
  H(X) = Σ_x Pr[X = x] · log2(1 / Pr[X = x]).

Shannon's noiseless coding
• The cost of communicating many copies of X scales as H(X).
• Shannon's source coding theorem: let C_n(X) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is
  lim_{n→∞} C_n(X)/n = H(X).

Shannon's entropy – cont'd
• Therefore, understanding the cost of transmitting a sequence of X's is equivalent to understanding Shannon's entropy of X. Easy and complete!
• What about more complicated scenarios, e.g. when Bob already holds a correlated input Y?
• Amortized transmission cost = conditional entropy H(X|Y) := H(XY) − H(Y).

A simple example
• Alice has n uniform symbols t_1, t_2, …, t_n ∈ {1, 2, 3, 4, 5}.
• The cost of transmitting them to Bob is ≈ log2 5 · n ≈ 2.32n.
• Suppose that for each t_i Bob is given a uniformly random s_i ∈ {1, 2, 3, 4, 5} such that s_i ≠ t_i. Then the cost of transmitting the t_i's to Bob drops to ≈ log2 4 · n = 2n.
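The numbers in this example are easy to verify directly. The following minimal sketch (plain Python, standard library only; the helper names `entropy` and `conditional_entropy` are mine, not from the talk) computes H(T) for a uniform symbol T ∈ {1,…,5} and the conditional entropy H(T|S) = H(TS) − H(S) when Bob additionally holds a uniform S ≠ T.

```python
from math import log2
from itertools import product

def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(T|S) = H(T,S) - H(S) for a joint distribution {(t, s): probability}."""
    marginal_s = {}
    for (t, s), p in joint.items():
        marginal_s[s] = marginal_s.get(s, 0.0) + p
    return entropy(joint) - entropy(marginal_s)

values = range(1, 6)

# Alice's symbol T is uniform on {1,...,5}: about 2.32 bits per symbol.
t_dist = {t: 1 / 5 for t in values}
print(entropy(t_dist))              # 2.3219... = log2(5)

# Bob also holds S, uniform among the 4 values different from T.
joint = {(t, s): (1 / 5) * (1 / 4) for t, s in product(values, values) if s != t}
print(conditional_entropy(joint))   # ≈ 2.0 = log2(4)
```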
Meanwhile, in a galaxy far far away… Communication complexity [Yao]
• Focus on the two-party randomized setting.
• Using shared randomness R, Alice (holding X) and Bob (holding Y) implement a functionality F(X, Y), e.g. F(X, Y) = "X = Y?".

Communication complexity
• Goal: implement a functionality F(X, Y).
• A protocol π(X, Y) computing F(X, Y) exchanges messages m1(X, R), m2(Y, m1, R), m3(X, m1, m2, R), …
• Communication cost = number of bits exchanged.

Communication complexity
• Numerous applications/potential applications (streaming, data structures, circuit lower bounds…).
• Considerably more difficult to obtain lower bounds than for transmission (still much easier than for other models of computation).
• Many lower-bound techniques exist.
• Exact bounds??

Communication complexity
• (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε). Error ≤ ε w.r.t. μ.
• (Randomized/worst-case) communication complexity: CC(F, ε). Error ≤ ε on all inputs.
• Yao's minimax: CC(F, ε) = max_μ CC(F, μ, ε).

Set disjointness and intersection
• Alice and Bob are each given a set X ⊆ {1, …, n}, Y ⊆ {1, …, n} (which can be viewed as vectors in {0,1}^n).
• Intersection: Intersect_n(X, Y) = X ∩ Y.
• Disjointness: Disj_n(X, Y) = 1 if X ∩ Y = ∅, and 0 otherwise.
• Intersect_n is just n 1-bit ANDs in parallel.
• ¬Disj_n is an OR of n 1-bit ANDs.
• Need to understand the amortized communication complexity (of the 1-bit AND).

Information complexity
• The smallest amount of information Alice and Bob need to exchange to solve F.
• How is information measured?
• Communication cost of a protocol? – Number of bits exchanged.
• Information cost of a protocol? – Amount of information revealed.

Basic definition 1: the information cost of a protocol
• Prior distribution: X, Y ∼ μ.
• Alice holds X, Bob holds Y; running the protocol π produces a transcript Π.
• IC(π, μ) = I(Π; Y|X) + I(Π; X|Y)
  = what Alice learns about Y + what Bob learns about X.

Mutual information
• The mutual information of two random variables is the amount of information knowing one reveals about the other: I(A; B) = H(A) − H(A|B).
• If A, B are independent, I(A; B) = 0.
• I(A; A) = H(A).

Example
• F is "X = Y?".
• μ is a distribution where w.p. ½ X = Y and w.p. ½ (X, Y) are random.
• Protocol: Alice sends MD5(X) [128 bits]; Bob replies with "X = Y?" [1 bit].
• IC(π, μ) = I(Π; Y|X) + I(Π; X|Y) ≈ 1 + 64.5 = 65.5 bits
  (what Alice learns about Y + what Bob learns about X).

Information complexity
• Communication complexity: CC(F, μ, ε) := min over protocols π computing F with error ≤ ε of the communication cost of π.
• Analogously: IC(F, μ, ε) := inf over protocols π computing F with error ≤ ε of IC(π, μ).

Prior-free information complexity
• Using a minimax we can get rid of the prior.
• For communication, we had: CC(F, ε) = max_μ CC(F, μ, ε).
• For information: IC(F, ε) := inf over protocols π computing F with error ≤ ε of max_μ IC(π, μ).

Connection to privacy
• There is a strong connection between information complexity and (information-theoretic) privacy.
• Alice and Bob want to perform a computation without revealing unnecessary information to each other (or to an eavesdropper).
• Negative results through IC arguments.

Information equals amortized communication
• Recall [Shannon]: lim_{n→∞} C_n(X)/n = H(X).
• [BR'11]: lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε), for ε > 0.
• For ε = 0: lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).
• [lim_{n→∞} CC(F^n, μ^n, 0)/n is an interesting open question.]

Without priors
• [BR'11] For ε = 0: lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).
• [B'12] lim_{n→∞} CC(F^n, 0+)/n = IC(F, 0).

Intersection
• Therefore CC(Intersect_n, 0+) = n · IC(AND, 0) ± o(n).
• Need to find the information complexity of the two-bit AND!

The two-bit AND
• [BGPW'12] IC(AND, 0) ≈ 1.4922 bits.
• Find the value of IC(AND, μ, 0) for all priors μ.
• Find the information-theoretically optimal protocol for computing the AND of two bits.

"Raise your hand when your number is reached": the optimal protocol for AND
• X ∈ {0,1}, Y ∈ {0,1}.
• If X = 1, Alice sets A = 1; if X = 0, she picks A ∼ U[0,1].
• If Y = 1, Bob sets B = 1; if Y = 0, he picks B ∼ U[0,1].
• A public counter rises continuously from 0 to 1; each player raises their hand when the counter reaches their number, and the protocol stops as soon as a hand goes up. A hand raised before the counter reaches 1 reveals that that player's bit is 0.

Analysis
• An additional small step is needed if the prior is not symmetric (Pr[XY = 10] ≠ Pr[XY = 01]).
• The protocol is clearly always correct.
• How do we prove the optimality of a protocol?
• Consider the function IC(F, μ, 0) as a function of μ.

The analytical view
• A message is just a mapping from the current prior to a distribution of posteriors (new priors).
• Example: Alice sends her bit. Prior μ:
        Y=0   Y=1
  X=0   0.4   0.2
  X=1   0.3   0.1
• Message "0" (prob. 0.6) leads to posterior μ_0:
        Y=0   Y=1
  X=0   2/3   1/3
  X=1   0     0
• Message "1" (prob. 0.4) leads to posterior μ_1:
        Y=0   Y=1
  X=0   0     0
  X=1   0.75  0.25

The analytical view
• Same prior μ; now Alice sends her bit w.p. ½ and a uniformly random bit w.p. ½.
• Message "0" (prob. 0.55) leads to posterior μ_0:
        Y=0    Y=1
  X=0   0.545  0.273
  X=1   0.136  0.045
• Message "1" (prob. 0.45) leads to posterior μ_1:
        Y=0   Y=1
  X=0   2/9   1/9
  X=1   1/2   1/6
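As a sanity check on the tables above, here is a small sketch (my own code and variable names, not anything from the talk) that takes the prior μ and the randomized one-bit message from Alice, and computes the message probabilities, the two posteriors, and I(M; X | Y), the information this single message reveals to Bob about X.

```python
from math import log2

# Prior from the slides: mu[(x, y)] = Pr[X=x, Y=y].
mu = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.1}

# Alice sends her bit w.p. 1/2 and a uniformly random bit w.p. 1/2:
# channel[x][m] = Pr[M=m | X=x].
channel = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}

# Probability of each message value and the posterior it induces.
pm = {m: sum(p * channel[x][m] for (x, y), p in mu.items()) for m in (0, 1)}
posterior = {m: {(x, y): p * channel[x][m] / pm[m] for (x, y), p in mu.items()}
             for m in (0, 1)}
print(pm)          # ≈ {0: 0.55, 1: 0.45}
print(posterior)   # matches the mu_0 / mu_1 tables above

# I(M; X | Y): what Bob learns about X from this single message.
info = 0.0
for (x, y), p in mu.items():
    p_y = sum(q for (x2, y2), q in mu.items() if y2 == y)
    for m in (0, 1):
        p_m_given_y = sum(q / p_y * channel[x2][m]
                          for (x2, y2), q in mu.items() if y2 == y)
        if channel[x][m] > 0:
            info += p * channel[x][m] * log2(channel[x][m] / p_m_given_y)
print(info)        # roughly 0.18 bits
```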
Analytical view – cont'd
• Denote Ψ(μ) := IC(F, μ, 0).
• Each potential (one-bit) message M by either party imposes a constraint of the form
  Ψ(μ) ≤ IC(M, μ) + Pr[M = 0] · Ψ(μ_0) + Pr[M = 1] · Ψ(μ_1),
  where IC(M, μ) is the information revealed by the message M itself.
• In fact, Ψ(μ) is the point-wise largest function satisfying all such constraints (cf. the construction of harmonic functions).

IC of AND
• We show that for the protocol π_AND described above, Ψ_π(μ) := IC(π_AND, μ) satisfies all the constraints, and therefore equals the information complexity of AND at all priors.
• Theorem: π_AND is the information-theoretically optimal protocol* for computing the AND of two bits.

*Not a real protocol
• The "protocol" is not a real protocol (this is why IC has an inf in its definition).
• The protocol above can be made into a real protocol by discretizing the counter (e.g. into r equal intervals).
• We show that the r-round IC satisfies
  IC_r(AND, 0) = IC(AND, 0) + Θ(1/r²).
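To get a feel for the discretization, here is a minimal simulation sketch (my own code; the talk does not spell out an implementation). It only checks correctness of the output: it does not include the extra balancing step mentioned above for asymmetric priors, and it does not measure information cost. The continuous counter is replaced by r equal intervals, and the protocol stops as soon as someone's number has been passed.

```python
import random

def discretized_and(x, y, r=1000):
    """r-round discretization of the 'raise your hand' protocol for AND(x, y).

    A player with input 1 sets their number to 1; a player with input 0 draws
    it uniformly from [0, 1].  The counter sweeps the intervals
    [0, 1/r), [1/r, 2/r), ...; whoever's number falls below the counter raises
    their hand, and a raised hand reveals that that player's input is 0.
    """
    a = 1.0 if x == 1 else random.random()
    b = 1.0 if y == 1 else random.random()
    for i in range(1, r + 1):
        alice_raises = a < i / r
        bob_raises = b < i / r
        if alice_raises or bob_raises:
            return 0      # someone's number was below 1, so some input is 0
    return 1              # neither hand was raised: both inputs are 1

# The protocol is always correct (here we just spot-check it).
for x in (0, 1):
    for y in (0, 1):
        assert all(discretized_and(x, y) == (x & y) for _ in range(1000))
```

Verifying the IC_r(AND, 0) = IC(AND, 0) + Θ(1/r²) tradeoff stated above would require computing the protocol's information cost rather than just its output, which this sketch does not attempt.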
Previous numerical evidence
• [Ma, Ishwar'09] – numerical calculation results.

Applications: communication complexity of intersection
• Corollary: CC(Intersect_n, 0+) ≈ 1.4922 · n ± o(n).
• Moreover, with r rounds: CC_r(Intersect_n, 0+) ≈ 1.4922 · n + Θ(n/r²).

Applications 2: set disjointness
• Recall: Disj_n(X, Y) = 1_{X∩Y=∅}.
• Extremely well studied. [Kalyanasundaram and Schnitger'87, Razborov'92, Bar-Yossef et al.'02]: CC(Disj_n, ε) = Θ(n).
• What does a hard distribution for Disj_n look like?

A hard distribution?
• [Example shown: a pair of uniformly random 0/1 input vectors.] Very easy!
• Per-coordinate prior μ:
        Y=0   Y=1
  X=0   1/4   1/4
  X=1   1/4   1/4

A hard distribution
• [Example shown: a pair of input vectors with at most one (1,1) location.]
• Per-coordinate prior μ:
        Y=0   Y=1
  X=0   1/3   1/3
  X=1   1/3   0+

Communication complexity of Disjointness
• Continuing the line of reasoning of Bar-Yossef et al.
• We now know exactly the communication complexity of Disj under any of the "hard" prior distributions. By maximizing, we get:
  CC(Disj_n, 0+) = C_DISJ · n ± o(n), where
  C_DISJ := max_{μ: μ(1,1)=0} IC(AND, μ, 0) ≈ 0.4827…
• With a bit of work this bound is tight.

Small-set Disjointness
• A variant of set disjointness where we are given X, Y ⊂ {1, …, n} of size k ≪ n.
• A lower bound of Ω(k) is obvious (modulo CC(Disj_n) = Ω(n)).
• A very elegant matching upper bound was known [Håstad-Wigderson'07]: Θ(k).

Using information complexity
• This setting corresponds to the per-coordinate prior distribution μ_{k,n}:
        Y=0      Y=1
  X=0   1−2k/n   k/n
  X=1   k/n      0+
• Gives information complexity (2/ln 2) · (k/n) per coordinate, and hence communication complexity (2/ln 2) · k ± o(k).

Overview: information complexity
• Information complexity is to communication complexity as Shannon's entropy is to transmission cost.
• Today: focused on exact bounds using IC.

Selected open problems 1
• The interactive compression problem.
• For Shannon's entropy we have C_n(X)/n → H(X).
• E.g. by Huffman coding we also know that H(X) ≤ C(X) < H(X) + 1.
• In the interactive setting, CC(F^n, μ^n, 0+)/n → IC(F, μ, 0).
• But is it true that CC(F, μ, 0+) ≲ IC(F, μ, 0)??

Interactive compression?
• CC(F, μ, 0+) ≲ IC(F, μ, 0) is equivalent to CC(F, μ, 0+) ≲ CC(F^n, μ^n, 0+)/n, the "direct sum" problem for communication complexity.
• Currently the best general compression scheme [BBCR'10]: a protocol of information cost I and communication cost C can be compressed to Õ(√(I·C)) bits of communication.
• A counterexample would need to separate IC from CC, which would require new lower-bound techniques [Kerenidis, Laplante, Lerays, Roland, Xiao'12].

Selected open problems 2
• Given a truth table for F, a prior μ, and an ε ≥ 0, can we compute IC(F, μ, ε)?
• An uncountable number of constraints; need to understand the structure better.
• Specific F's with inputs in {1,2,3} × {1,2,3}.
• Going beyond two players.

External information cost
• (X, Y) ∼ μ. Alice holds X, Bob holds Y; an external observer Charlie sees the transcript Π of the protocol π.
• IC^ext(π, μ) = I(Π; XY) ≥ I(Π; Y|X) + I(Π; X|Y)
  = what Charlie learns about (X, Y).

External information complexity
• IC^ext(F, μ, 0) := inf over protocols π computing F correctly of IC^ext(π, μ).
• Conjecture: zero-error communication scales like external information:
  lim_{n→∞} CC(F^n, μ^n, 0)/n = IC^ext(F, μ, 0)?
• Example: for Intersect_n / AND this value is log2 3 ≈ 1.585 > 1.492.

Thank You!