Basics of information theory
and information complexity:
a tutorial
Mark Braverman
Princeton University
June 1, 2013
Part I: Information theory
• Information theory, in its modern format,
was introduced in the 1940s to study the
problem of transmitting data over physical
channels.
[Diagram: Alice → communication channel → Bob.]
Quantifying “information”
• Information is measured in bits.
• The basic notion is Shannon’s entropy.
• The entropy of a random variable is the
(typical) number of bits needed to remove
the uncertainty of the variable.
• For a discrete variable:
𝐻(𝑋) ≔ ∑π‘₯ Pr[𝑋 = π‘₯] log(1/Pr[𝑋 = π‘₯])
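The definition can be checked numerically; a minimal sketch (the function name and the test distributions are mine, not from the slides):

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution {value: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

uniform8 = {i: 1 / 8 for i in range(8)}
print(entropy(uniform8))          # uniform on 8 values: 3.0 bits
print(entropy({"a": 1.0}))        # a constant: 0.0 bits
```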
Shannon’s entropy
• Important examples and properties:
– If 𝑋 = π‘₯ is a constant, then 𝐻 𝑋 = 0.
– If 𝑋 is uniform on a finite set 𝑆 of possible
values, then 𝐻 𝑋 = log 𝑆.
– If 𝑋 is supported on at most 𝑛 values, then
𝐻 X ≤ log 𝑛.
– If π‘Œ is a random variable determined by 𝑋,
then 𝐻 π‘Œ ≤ 𝐻(𝑋).
4
Conditional entropy
• For two (potentially correlated) variables
𝑋, π‘Œ, the conditional entropy of 𝑋 given π‘Œ is
the amount of uncertainty left in 𝑋 given π‘Œ:
𝐻 𝑋 π‘Œ ≔ 𝐸𝑦~π‘Œ H X Y = y .
• One can show 𝐻 π‘‹π‘Œ = 𝐻 π‘Œ + 𝐻(𝑋|π‘Œ).
• This important fact is knows as the chain
rule.
• If 𝑋 ⊥ π‘Œ, then
𝐻 π‘‹π‘Œ = 𝐻 𝑋 + 𝐻 π‘Œ 𝑋 = 𝐻 𝑋 + 𝐻 π‘Œ .
5
Example
• 𝑋 = (𝐡1, 𝐡2, 𝐡3)
• π‘Œ = (𝐡1 ⊕ 𝐡2, 𝐡2 ⊕ 𝐡4, 𝐡3 ⊕ 𝐡4, 𝐡5)
• where 𝐡1, 𝐡2, 𝐡3, 𝐡4, 𝐡5 ∈π‘ˆ {0,1} are uniform independent bits.
• Then
– 𝐻(𝑋) = 3; 𝐻(π‘Œ) = 4; 𝐻(π‘‹π‘Œ) = 5;
– 𝐻(𝑋|π‘Œ) = 1 = 𝐻(π‘‹π‘Œ) − 𝐻(π‘Œ);
– 𝐻(π‘Œ|𝑋) = 2 = 𝐻(π‘‹π‘Œ) − 𝐻(𝑋).
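The claimed values can be verified by enumerating all 32 settings of 𝐡1, … , 𝐡5; a small sketch (helper names are mine):

```python
import itertools
import math
from collections import Counter

def entropy_of(samples):
    """Entropy (bits) of the empirical distribution of a list of outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

outcomes_X, outcomes_Y, outcomes_XY = [], [], []
for b1, b2, b3, b4, b5 in itertools.product([0, 1], repeat=5):
    x = (b1, b2, b3)
    y = (b1 ^ b2, b2 ^ b4, b3 ^ b4, b5)
    outcomes_X.append(x)
    outcomes_Y.append(y)
    outcomes_XY.append((x, y))

H_X, H_Y, H_XY = map(entropy_of, (outcomes_X, outcomes_Y, outcomes_XY))
print(H_X, H_Y, H_XY)           # 3.0 4.0 5.0
print(H_XY - H_Y, H_XY - H_X)   # H(X|Y) = 1.0, H(Y|X) = 2.0
```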
Mutual information
• 𝑋 = (𝐡1, 𝐡2, 𝐡3)
• π‘Œ = (𝐡1 ⊕ 𝐡2, 𝐡2 ⊕ 𝐡4, 𝐡3 ⊕ 𝐡4, 𝐡5)
[Venn diagram: 𝐻(𝑋) splits into 𝐻(𝑋|π‘Œ) and 𝐼(𝑋; π‘Œ); 𝐻(π‘Œ) splits into 𝐻(π‘Œ|𝑋) and 𝐼(𝑋; π‘Œ). In the example, 𝐡1 sits in 𝐻(𝑋|π‘Œ); 𝐡1 ⊕ 𝐡2 and 𝐡2 ⊕ 𝐡3 in 𝐼(𝑋; π‘Œ); 𝐡4 and 𝐡5 in 𝐻(π‘Œ|𝑋).]
Mutual information
• The mutual information is defined as
𝐼(𝑋; π‘Œ) = 𝐻(𝑋) − 𝐻(𝑋|π‘Œ) = 𝐻(π‘Œ) − 𝐻(π‘Œ|𝑋)
• “By how much does knowing 𝑋 reduce the
entropy of π‘Œ?”
• Always non-negative: 𝐼(𝑋; π‘Œ) ≥ 0.
• Conditional mutual information:
𝐼(𝑋; π‘Œ|𝑍) ≔ 𝐻(𝑋|𝑍) − 𝐻(𝑋|π‘Œπ‘)
• Chain rule for mutual information:
𝐼(π‘‹π‘Œ; 𝑍) = 𝐼(𝑋; 𝑍) + 𝐼(π‘Œ; 𝑍|𝑋)
• Simple intuitive interpretation.
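The chain rule 𝐼(π‘‹π‘Œ; 𝑍) = 𝐼(𝑋; 𝑍) + 𝐼(π‘Œ; 𝑍|𝑋) can be sanity-checked on any small joint distribution; a sketch with made-up probabilities:

```python
import math
from collections import defaultdict

def H(pmf):
    """Entropy (bits) of a distribution {outcome: prob}."""
    return sum(p * math.log2(1 / p) for p in pmf.values() if p > 0)

def marginal(joint, idx):
    """Marginal pmf of the coordinates in idx, from a joint pmf over tuples."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out

# An arbitrary correlated joint distribution over (x, y, z);
# the probabilities are made up purely for illustration.
joint = {(0, 0, 0): .2, (0, 1, 1): .1, (1, 0, 1): .25,
         (1, 1, 0): .15, (1, 1, 1): .3}

def I(a, b):  # I(A;B) = H(A) + H(B) - H(AB)
    return H(marginal(joint, a)) + H(marginal(joint, b)) - H(marginal(joint, a + b))

I_XY_Z = I((0, 1), (2,))
I_X_Z = I((0,), (2,))
# I(Y;Z|X) = H(Y|X) - H(Y|XZ) = H(XY) + H(XZ) - H(XYZ) - H(X)
I_Y_Z_given_X = (H(marginal(joint, (0, 1))) + H(marginal(joint, (0, 2)))
                 - H(marginal(joint, (0, 1, 2))) - H(marginal(joint, (0,))))
print(abs(I_XY_Z - (I_X_Z + I_Y_Z_given_X)) < 1e-12)   # True
```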
Example – a biased coin
• A coin with bias πœ€ towards Heads or towards
Tails is tossed several times.
• Let 𝐡 ∈ {𝐻, 𝑇} be the direction of the bias, and
suppose that a priori both options are equally
likely: 𝐻(𝐡) = 1.
• How many tosses are needed to find 𝐡?
• Let 𝑇1, … , π‘‡π‘˜ be a sequence of tosses.
• Start with π‘˜ = 2.
What do we learn about 𝐡?
• 𝐼(𝐡; 𝑇1𝑇2) = 𝐼(𝐡; 𝑇1) + 𝐼(𝐡; 𝑇2|𝑇1)
= 𝐼(𝐡; 𝑇1) + 𝐼(𝐡𝑇1; 𝑇2) − 𝐼(𝑇1; 𝑇2)
≤ 𝐼(𝐡; 𝑇1) + 𝐼(𝐡𝑇1; 𝑇2)
= 𝐼(𝐡; 𝑇1) + 𝐼(𝐡; 𝑇2) + 𝐼(𝑇1; 𝑇2|𝐡)
= 𝐼(𝐡; 𝑇1) + 𝐼(𝐡; 𝑇2) = 2 ⋅ 𝐼(𝐡; 𝑇1),
since the tosses are independent given 𝐡.
• Similarly,
𝐼(𝐡; 𝑇1 … π‘‡π‘˜) ≤ π‘˜ ⋅ 𝐼(𝐡; 𝑇1).
• To determine 𝐡 with constant accuracy, need
0 < 𝑐 < 𝐼(𝐡; 𝑇1 … π‘‡π‘˜) ≤ π‘˜ ⋅ 𝐼(𝐡; 𝑇1).
• Hence π‘˜ = Ω(1/𝐼(𝐡; 𝑇1)).
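The bound 𝐼(𝐡; 𝑇1𝑇2) ≤ 2 ⋅ 𝐼(𝐡; 𝑇1) can be verified by exact enumeration; a sketch (function names are mine):

```python
import itertools
import math

def H(probs):
    """Entropy (bits) of a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def mutual_info_bias(eps, k):
    """I(B; T1...Tk): B is the bias direction (Heads-prob 1/2+eps or 1/2-eps,
    each a priori equally likely), computed by exact enumeration."""
    joint = {}
    for b, p_heads in ((0, 0.5 + eps), (1, 0.5 - eps)):
        for tosses in itertools.product([0, 1], repeat=k):
            pr = 0.5
            for t in tosses:
                pr *= p_heads if t == 0 else 1 - p_heads
            joint[(b,) + tosses] = pr
    toss_marginals = {}
    for outcome, p in joint.items():
        toss_marginals[outcome[1:]] = toss_marginals.get(outcome[1:], 0) + p
    # I(B; T) = H(B) + H(T) - H(BT), with H(B) = 1
    return 1.0 + H(list(toss_marginals.values())) - H(list(joint.values()))

eps = 0.1
i1, i2 = mutual_info_bias(eps, 1), mutual_info_bias(eps, 2)
print(i2 <= 2 * i1 + 1e-12)   # True: matches the slide's bound
```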
Kullback–Leibler (KL)-Divergence
• A measure of distance between distributions
on the same space (not an actual metric: it is
asymmetric).
• Plays a key role in information theory.
𝐷(𝑃 βˆ₯ 𝑄) ≔ ∑π‘₯ 𝑃[π‘₯] log(𝑃[π‘₯]/𝑄[π‘₯]).
• 𝐷(𝑃 βˆ₯ 𝑄) ≥ 0, with equality when 𝑃 = 𝑄.
• Caution: 𝐷(𝑃 βˆ₯ 𝑄) ≠ 𝐷(𝑄 βˆ₯ 𝑃) in general!
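A minimal sketch of the definition, illustrating the asymmetry (the example distributions are chosen arbitrarily):

```python
import math

def kl(P, Q):
    """D(P || Q) in bits; P, Q are {outcome: prob} over the same space."""
    return sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)

P = {"a": 0.5, "b": 0.5}
Q = {"a": 0.9, "b": 0.1}
print(kl(P, Q), kl(Q, P))   # two different values: D is asymmetric
print(kl(P, P))             # 0.0
```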
Properties of KL-divergence
• Connection to mutual information:
𝐼(𝑋; π‘Œ) = 𝐸𝑦∼π‘Œ [𝐷(π‘‹π‘Œ=𝑦 βˆ₯ 𝑋)].
• If 𝑋 ⊥ π‘Œ, then π‘‹π‘Œ=𝑦 = 𝑋, and both sides are 0.
• Pinsker’s inequality:
β€–π‘ƒ − 𝑄‖1 = 𝑂(√𝐷(𝑃 βˆ₯ 𝑄)).
• Tight!
𝐷(𝐡1/2+πœ€ βˆ₯ 𝐡1/2) = Θ(πœ€Β²).
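The Θ(πœ€Β²) behavior is easy to observe numerically; a sketch (computed in bits, so the constant is 2/ln 2 ≈ 2.885):

```python
import math

def kl_bits(p, q):
    """D(B_p || B_q) in bits, for Bernoulli distributions with parameters p, q."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

for eps in (0.1, 0.01, 0.001):
    print(eps, kl_bits(0.5 + eps, 0.5) / eps**2)   # ratio tends to 2/ln 2
```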
Back to the coin example
• 𝐼(𝐡; 𝑇1) = 𝐸𝑏∼𝐡 [𝐷(𝑇1,𝐡=𝑏 βˆ₯ 𝑇1)]
= 𝐷(𝐡1/2±πœ€ βˆ₯ 𝐡1/2) = Θ(πœ€Β²).
• π‘˜ = Ω(1/𝐼(𝐡; 𝑇1)) = Ω(1/πœ€Β²).
• “Follow the information learned from the
coin tosses.”
• Can be done using combinatorics, but the
information-theoretic language is more
natural for expressing what’s going on.
Back to communication
• The reason Information Theory is so
important for communication is because
information-theoretic quantities readily
operationalize.
• Can attach operational meaning to
Shannon’s entropy: 𝐻(𝑋) ≈ “the cost of
transmitting 𝑋”.
• Let 𝐢(𝑋) be the (expected) cost of
transmitting a sample of 𝑋.
𝐻(𝑋) = 𝐢(𝑋)?
• Not quite.
• Let trit 𝑇 ∈π‘ˆ {1,2,3}.
• The code 1 → 0, 2 → 10, 3 → 11 gives
𝐢(𝑇) = 5/3 ≈ 1.67.
• 𝐻(𝑇) = log 3 ≈ 1.58.
• It is always the case that 𝐢(𝑋) ≥ 𝐻(𝑋).
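The trit numbers can be reproduced directly from the code on the slide:

```python
import math

# The prefix code from the slide: 1 -> 0, 2 -> 10, 3 -> 11.
code = {1: "0", 2: "10", 3: "11"}
C = sum(len(w) for w in code.values()) / 3   # expected length, T uniform on {1,2,3}
H = math.log2(3)
print(C, H)   # 5/3 = 1.666... vs log 3 = 1.584...
```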
But 𝐻(𝑋) and 𝐢(𝑋) are close
• Huffman’s coding: 𝐢(𝑋) ≤ 𝐻(𝑋) + 1.
• This is a compression result: “an
uninformative message turned into a short
one”.
• Therefore: 𝐻(𝑋) ≤ 𝐢(𝑋) ≤ 𝐻(𝑋) + 1.
Shannon’s noiseless coding
• The cost of communicating many copies of 𝑋
scales as 𝐻(𝑋).
• Shannon’s source coding theorem:
– Let 𝐢(𝑋^𝑛) be the cost of transmitting 𝑛
independent copies of 𝑋. Then the
amortized transmission cost
lim𝑛→∞ 𝐢(𝑋^𝑛)/𝑛 = 𝐻(𝑋).
• This equation gives 𝐻(𝑋) operational
meaning.
𝐻(𝑋) operationalized
[Diagram: 𝑋1, … , 𝑋𝑛, … sent over a communication channel at 𝐻(𝑋) bits per copy to transmit the 𝑋’s.]
𝐻(𝑋) is nicer than 𝐢(𝑋)
• 𝐻(𝑋) is additive for independent variables.
• Let 𝑇1, 𝑇2 ∈π‘ˆ {1,2,3} be independent trits.
• 𝐻(𝑇1𝑇2) = log 9 = 2 log 3.
• 𝐢(𝑇1𝑇2) = 29/9 < 𝐢(𝑇1) + 𝐢(𝑇2) = 2 × 5/3 = 30/9.
• Works well with concepts such as channel
capacity.
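Both 𝐢(𝑇) = 5/3 and 𝐢(𝑇1𝑇2) = 29/9 can be recovered by running an actual Huffman construction; a sketch (a standard textbook Huffman, not code from the slides):

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Codeword lengths of an optimal (Huffman) prefix code for probs."""
    tie = count()                      # tie-breaker: never compare the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:              # merging deepens every leaf below
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

cost = lambda probs: sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(cost([1 / 3] * 3))   # 5/3: one trit
print(cost([1 / 9] * 9))   # 29/9: a pair of trits, cheaper than 2 * 5/3
```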
“Proof” of Shannon’s noiseless
coding
• 𝑛 ⋅ 𝐻(𝑋) = 𝐻(𝑋^𝑛) ≤ 𝐢(𝑋^𝑛) ≤ 𝐻(𝑋^𝑛) + 1.
(The equality is additivity of entropy; the last
inequality is compression, e.g. Huffman.)
• Therefore lim𝑛→∞ 𝐢(𝑋^𝑛)/𝑛 = 𝐻(𝑋).
Operationalizing other quantities
• Conditional entropy 𝐻(𝑋|π‘Œ)
(cf. Slepian–Wolf theorem).
[Diagram: 𝑋1, … , 𝑋𝑛, … transmitted at 𝐻(𝑋|π‘Œ) bits per copy over a channel to a receiver who already holds π‘Œ1, … , π‘Œπ‘›, … .]
Operationalizing other quantities
• Mutual information 𝐼(𝑋; π‘Œ)
[Diagram: a party holding 𝑋1, … , 𝑋𝑛, … lets the other party sample the π‘Œ’s at 𝐼(𝑋; π‘Œ) bits of communication per copy.]
Information theory and entropy
• Allows us to formalize intuitive notions.
• Operationalized in the context of one-way
transmission and related problems.
• Has nice properties (additivity, chain rule…)
• Next, we discuss extensions to more
interesting communication scenarios.
Communication complexity
• Focus on the two-party randomized setting.
[Diagram: Alice holds X, Bob holds Y; shared
randomness R; A & B implement a functionality
𝐹(𝑋, π‘Œ), e.g. 𝐹(𝑋, π‘Œ) = “𝑋 = π‘Œ?”, and output F(X,Y).]
Communication complexity
Goal: implement a functionality 𝐹(𝑋, π‘Œ).
A protocol πœ‹(𝑋, π‘Œ) computing 𝐹(𝑋, π‘Œ):
[Diagram: with shared randomness R, Alice (holding X) and Bob (holding Y) alternate messages m1(X,R), m2(Y,m1,R), m3(X,m1,m2,R), … and output F(X,Y).]
Communication cost = # of bits exchanged.
Communication complexity
• Numerous applications/potential
applications (some will be discussed later
today).
• Considerably more difficult to obtain lower
bounds for than transmission (though still much
easier than other models of computation!).
Communication complexity
• (Distributional) communication complexity
with input distribution πœ‡ and error πœ€:
𝐢𝐢(𝐹, πœ‡, πœ€). Error ≤ πœ€ w.r.t. πœ‡.
• (Randomized/worst-case) communication
complexity: 𝐢𝐢(𝐹, πœ€). Error ≤ πœ€ on all inputs.
• Yao’s minimax:
𝐢𝐢(𝐹, πœ€) = maxπœ‡ 𝐢𝐢(𝐹, πœ‡, πœ€).
Examples
• 𝑋, π‘Œ ∈ 0,1 𝑛 .
• Equality 𝐸𝑄 𝑋, π‘Œ ≔ 1𝑋=π‘Œ .
• 𝐢𝐢 𝐸𝑄, πœ€ ≈
1
log .
πœ€
• 𝐢𝐢 𝐸𝑄, 0 ≈ 𝑛.
28
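One way to see the log(1/πœ€) upper bound for Equality: compare π‘˜ ≈ log(1/πœ€) shared random parities of 𝑋 and π‘Œ. A sketch (this particular hashing scheme is my illustration, not a protocol from the slides):

```python
import random

def eq_protocol(x, y, k, rng):
    """Compare k shared random parities of x and y (given as n-bit integers).
    Never errs when x == y; errs w.p. 2^-k when x != y."""
    n = max(x.bit_length(), y.bit_length(), 1)
    for _ in range(k):
        r = rng.getrandbits(n)                 # shared random subset of bits
        if bin(r & x).count("1") % 2 != bin(r & y).count("1") % 2:
            return 0                           # found a disagreeing parity
    return 1                                   # declare "equal"

rng = random.Random(0)
k = 7                                          # target error about 2^-7
errors = sum(eq_protocol(12345, 54321, k, rng) for _ in range(10000))
print(errors / 10000)                          # close to 2^-7 ~ 0.008
```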
Equality
•πΉβ‘is “𝑋 = π‘Œ? ”.
•πœ‡β‘is a distribution where w.p. ½β‘𝑋 = π‘Œ and w.p.
½ (𝑋, π‘Œ) are random.
Y
X
MD5(X) [128 bits]
X=Y? [1 bit]
A
Error?
• Shows that 𝐢𝐢 𝐸𝑄, πœ‡, 2−129 ≤ 129.⁑
B
Examples
• 𝑋, π‘Œ ∈ 0,1 𝑛 .
• Inner⁑product⁑𝐼𝑃 𝑋, π‘Œ ≔ ∑𝑖 𝑋𝑖 ⋅ π‘Œπ‘– ⁑(π‘šπ‘œπ‘‘β‘2).
• 𝐢𝐢 𝐼𝑃, 0 = 𝑛 − π‘œ(𝑛).
In fact, using information complexity:
• 𝐢𝐢 𝐼𝑃, πœ€ = 𝑛 − π‘œπœ€ (𝑛).
30
Information complexity
• Information complexity 𝐼𝐢(𝐹, πœ€) is to
communication complexity 𝐢𝐢(𝐹, πœ€)
as Shannon’s entropy 𝐻(𝑋) is to
transmission cost 𝐢(𝑋).
Information complexity
• The smallest amount of information Alice
and Bob need to exchange to solve 𝐹.
• How is information measured?
• Communication cost of a protocol?
– Number of bits exchanged.
• Information cost of a protocol?
– Amount of information revealed.
Basic definition 1: The
information cost of a protocol
• Prior distribution: (𝑋, π‘Œ) ∼ πœ‡.
[Diagram: Alice holds X, Bob holds Y; protocol πœ‹ produces transcript Π.]
𝐼𝐢(πœ‹, πœ‡) = 𝐼(Π; π‘Œ|𝑋) + 𝐼(Π; 𝑋|π‘Œ)
= what Alice learns about Y + what Bob learns about X
Example
• 𝐹 is “𝑋 = π‘Œ?”.
• πœ‡ is a distribution where w.p. ½ 𝑋 = π‘Œ and w.p.
½ (𝑋, π‘Œ) are random.
[Protocol: Alice sends MD5(X) (128 bits); Bob replies with X=Y? (1 bit).]
𝐼𝐢(πœ‹, πœ‡) = 𝐼(Π; π‘Œ|𝑋) + 𝐼(Π; 𝑋|π‘Œ) ≈ 1 + 64.5 = 65.5 bits
= what Alice learns about Y + what Bob learns about X
Prior πœ‡ matters a lot for
information cost!
• If πœ‡ = 1
π‘₯,𝑦
a singleton,
𝐼𝐢 πœ‹, πœ‡ = 0.
35
Example
• 𝐹 is “𝑋 = π‘Œ?”.
• πœ‡ is a distribution where (𝑋, π‘Œ) are just
uniformly random.
[Protocol: Alice sends MD5(X) (128 bits); Bob replies with X=Y? (1 bit).]
𝐼𝐢(πœ‹, πœ‡) = 𝐼(Π; π‘Œ|𝑋) + 𝐼(Π; 𝑋|π‘Œ) ≈ 0 + 128 = 128 bits
= what Alice learns about Y + what Bob learns about X
Basic definition 2: Information
complexity
• Communication complexity:
𝐢𝐢(𝐹, πœ‡, πœ€) ≔ min over protocols πœ‹ computing 𝐹
with error ≤ πœ€ of the communication cost of πœ‹.
• Analogously:
𝐼𝐢(𝐹, πœ‡, πœ€) ≔ inf over protocols πœ‹ computing 𝐹
with error ≤ πœ€ of 𝐼𝐢(πœ‹, πœ‡).
• (The inf is needed here: the infimum may not
be attained.)
Prior-free information complexity
• Using minimax we can get rid of the prior.
• For communication, we had:
𝐢𝐢(𝐹, πœ€) = maxπœ‡ 𝐢𝐢(𝐹, πœ‡, πœ€).
• For information:
𝐼𝐢(𝐹, πœ€) ≔ inf over protocols πœ‹ computing 𝐹
with error ≤ πœ€ of maxπœ‡ 𝐼𝐢(πœ‹, πœ‡).
Ex: The information complexity
of Equality
• What is 𝐼𝐢(𝐸𝑄, 0)?
• Consider the following protocol, based on a
non-singular matrix 𝐴 ∈ 𝒁^{𝒏×𝒏} with rows 𝐴1, 𝐴2, …:
[Diagram: Alice (X in {0,1}^n) sends A1·X; Bob (Y in {0,1}^n) replies A1·Y; Alice sends A2·X; Bob replies A2·Y; … Continue for n steps, or until a disagreement is discovered.]
Analysis (sketch)
• If X≠Y, the protocol will terminate in O(1)
rounds on average, and thus reveal O(1)
information.
• If X=Y… the players only learn the fact that
X=Y (≤1 bit of information).
• Thus the protocol has O(1) information
complexity for any prior πœ‡.
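The O(1) expected number of rounds can be checked by simulation; a sketch that uses fresh random rows in place of the fixed non-singular matrix 𝐴 (an assumption made to keep the code short):

```python
import random

def rounds_until_disagreement(x, y, rng):
    """Rounds of random-linear-form exchange before x != y is detected.
    (Fresh random rows stand in for the fixed non-singular matrix A.)"""
    n = max(x.bit_length(), y.bit_length())
    d = x ^ y                                 # nonzero since x != y
    r = 0
    while True:
        r += 1
        row = rng.getrandbits(n)
        if bin(row & d).count("1") % 2 == 1:  # A_i*x != A_i*y (mod 2)
            return r

rng = random.Random(1)
avg = sum(rounds_until_disagreement(12345, 54321, rng)
          for _ in range(10000)) / 10000
print(avg)   # about 2: each round detects the difference w.p. 1/2
```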
Operationalizing IC: Information
equals amortized communication
• Recall [Shannon]: lim𝑛→∞ 𝐢(𝑋^𝑛)/𝑛 = 𝐻(𝑋).
• Turns out: lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛 = 𝐼𝐢(𝐹, πœ‡, πœ€)
for πœ€ > 0. [Error πœ€ allowed on each copy.]
• For πœ€ = 0: lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, 0⁺)/𝑛 = 𝐼𝐢(𝐹, πœ‡, 0).
• [lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, 0)/𝑛 is an interesting open
problem.]
Information = amortized
communication
• lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛 = 𝐼𝐢(𝐹, πœ‡, πœ€).
• Two directions: “≤” and “≥”.
• Compare: 𝑛 ⋅ 𝐻(𝑋) = 𝐻(𝑋^𝑛) ≤ 𝐢(𝑋^𝑛) ≤ 𝐻(𝑋^𝑛) + 1
(additivity of entropy; compression, e.g. Huffman).
The “≤” direction
• lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛 ≤ 𝐼𝐢(𝐹, πœ‡, πœ€).
• Start with a protocol πœ‹ solving 𝐹, whose
𝐼𝐢(πœ‹, πœ‡) is close to 𝐼𝐢(𝐹, πœ‡, πœ€).
• Show how to compress many copies of πœ‹
into a protocol whose communication cost
is close to its information cost.
• More on compression later.
The “≥” direction
• lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛 ≥ 𝐼𝐢(𝐹, πœ‡, πœ€).
• Use the fact that
𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛 ≥ 𝐼𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛.
• Additivity of information complexity:
𝐼𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛 = 𝐼𝐢(𝐹, πœ‡, πœ€).
Proof: Additivity of information
complexity
• Let 𝑇1(𝑋1, π‘Œ1) and 𝑇2(𝑋2, π‘Œ2) be two two-party
tasks.
• E.g. “Solve 𝐹(𝑋, π‘Œ) with error ≤ πœ€ w.r.t. πœ‡”.
• Then
𝐼𝐢(𝑇1 × π‘‡2, πœ‡1 × πœ‡2) = 𝐼𝐢(𝑇1, πœ‡1) + 𝐼𝐢(𝑇2, πœ‡2).
• “≤” is easy.
• “≥” is the interesting direction:
𝐼𝐢(𝑇1, πœ‡1) + 𝐼𝐢(𝑇2, πœ‡2) ≤ 𝐼𝐢(𝑇1 × π‘‡2, πœ‡1 × πœ‡2).
• Start from a protocol πœ‹ for 𝑇1 × π‘‡2 with
prior πœ‡1 × πœ‡2, whose information cost is 𝐼.
• Show how to construct two protocols πœ‹1
for 𝑇1 with prior πœ‡1 and πœ‹2 for 𝑇2 with prior
πœ‡2, with information costs 𝐼1 and 𝐼2,
respectively, such that 𝐼1 + 𝐼2 = 𝐼.
πœ‹( 𝑋1 , 𝑋2 , π‘Œ1 , π‘Œ2 )
πœ‹1 𝑋1 , π‘Œ1
• Publicly sample 𝑋2 ∼ πœ‡2
• Bob privately samples
π‘Œ2 ∼ πœ‡2 |X2
• Run πœ‹( 𝑋1 , 𝑋2 , π‘Œ1 , π‘Œ2 )
πœ‹2 𝑋2 , π‘Œ2
• Publicly sample π‘Œ1 ∼ πœ‡1
• Alice privately samples
𝑋1 ∼ πœ‡1 |Y1
• Run πœ‹( 𝑋1 , 𝑋2 , π‘Œ1 , π‘Œ2 )
Analysis - πœ‹1
πœ‹1 𝑋1 , π‘Œ1
• Publicly sample 𝑋2 ∼ πœ‡2
• Bob privately samples
π‘Œ2 ∼ πœ‡2 |X2
• Run πœ‹( 𝑋1 , 𝑋2 , π‘Œ1 , π‘Œ2 )
• Alice learns about π‘Œ1 :
𝐼 Π; π‘Œ1 X1 X2 )
• Bob learns about 𝑋1 :
𝐼 Π; 𝑋1 π‘Œ1 π‘Œ2 𝑋2 ).
• 𝐼1 = 𝐼 Π; π‘Œ1 X1 X2 ) + 𝐼 Π; 𝑋1 π‘Œ1 π‘Œ2 𝑋2 ).
48
Analysis - πœ‹2
πœ‹2 𝑋2 , π‘Œ2
• Publicly sample π‘Œ1 ∼ πœ‡1
• Alice privately samples
𝑋1 ∼ πœ‡1 |Y1
• Run πœ‹( 𝑋1 , 𝑋2 , π‘Œ1 , π‘Œ2 )
• Alice learns about π‘Œ2 :
𝐼 Π; π‘Œ2 X1 X2 Y1 )
• Bob learns about 𝑋2 :
𝐼 Π; 𝑋2 π‘Œ1 π‘Œ2 ).
• 𝐼2 = 𝐼 Π; π‘Œ2 X1 X2 Y1 ) + 𝐼 Π; 𝑋2 π‘Œ1 π‘Œ2 ).
49
Adding 𝐼1 and 𝐼2
𝐼1 + 𝐼2
= 𝐼(Π; π‘Œ1|𝑋1𝑋2) + 𝐼(Π; 𝑋1|π‘Œ1π‘Œ2𝑋2)
+ 𝐼(Π; π‘Œ2|𝑋1𝑋2π‘Œ1) + 𝐼(Π; 𝑋2|π‘Œ1π‘Œ2)
= [𝐼(Π; π‘Œ1|𝑋1𝑋2) + 𝐼(Π; π‘Œ2|𝑋1𝑋2π‘Œ1)]
+ [𝐼(Π; 𝑋2|π‘Œ1π‘Œ2) + 𝐼(Π; 𝑋1|π‘Œ1π‘Œ2𝑋2)]
= 𝐼(Π; π‘Œ1π‘Œ2|𝑋1𝑋2) + 𝐼(Π; 𝑋1𝑋2|π‘Œ1π‘Œ2) = 𝐼,
by the chain rule for mutual information.
Summary
• Information complexity is additive.
• Operationalized via “Information =
amortized communication”.
• lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛 = 𝐼𝐢(𝐹, πœ‡, πœ€).
• Seems to be the “right” analogue of
entropy for interactive computation.
Entropy vs. Information Complexity

                 Entropy                    IC
Additive?        Yes                        Yes
Operationalized  lim𝑛→∞ 𝐢(𝑋^𝑛)/𝑛            lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛
Compression?     Huffman: 𝐢(𝑋) ≤ 𝐻(𝑋) + 1   ???!
Can interactive communication
be compressed?
• Is it true that 𝐢𝐢(𝐹, πœ‡, πœ€) ≤ 𝐼𝐢(𝐹, πœ‡, πœ€) + 𝑂(1)?
• Less ambitiously:
𝐢𝐢(𝐹, πœ‡, 𝑂(πœ€)) = 𝑂(𝐼𝐢(𝐹, πœ‡, πœ€))?
• (Almost) equivalently: given a protocol πœ‹ with
𝐼𝐢(πœ‹, πœ‡) = 𝐼, can Alice and Bob simulate πœ‹ using
𝑂(𝐼) communication?
• Not known in general…
Direct sum theorems
• Let 𝐹 be any functionality.
• Let 𝐢(𝐹) be the cost of implementing 𝐹.
• Let 𝐹^𝑛 be the functionality of implementing 𝑛
independent copies of 𝐹.
• The direct sum problem:
“Does 𝐢(𝐹^𝑛) ≈ 𝑛 βˆ™ 𝐢(𝐹)?”
• In most cases it is obvious that 𝐢(𝐹^𝑛) ≤ 𝑛 βˆ™ 𝐢(𝐹).
Direct sum – randomized
communication complexity
• Is it true that
𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€) = Ω(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€))?
• Is it true that 𝐢𝐢(𝐹^𝑛, πœ€) = Ω(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ€))?
Direct product – randomized
communication complexity
• Direct sum:
𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€) = Ω(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€))?
• Direct product (error allowed to grow so that the
success probability is only (1 − πœ€)^𝑛):
𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, 1 − (1 − πœ€)^𝑛) = Ω(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€))?
Direct sum for randomized CC
and interactive compression
Direct sum:
• 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€) = Ω(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€))?
In the limit:
• 𝑛 ⋅ 𝐼𝐢(𝐹, πœ‡, πœ€) = Ω(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€))?
Interactive compression:
• 𝐢𝐢(𝐹, πœ‡, πœ€) = 𝑂(𝐼𝐢(𝐹, πœ‡, πœ€))?
Same question!
The big picture
𝐼𝐢(𝐹, πœ‡, πœ€)
additivity
(=direct sum)
for information
interactive
compression?
𝐢𝐢(𝐹, πœ‡, πœ€)
direct sum for
communication?
𝐼𝐢(𝐹𝑛, πœ‡π‘› , πœ€)/𝑛
information =
amortized
communication
𝐢𝐢(𝐹𝑛, πœ‡π‘› , πœ€)/𝑛
Current results for compression
A protocol πœ‹ that has 𝐢 bits of
communication, conveys 𝐼 bits of information
over prior πœ‡, and works in π‘Ÿ rounds can be
simulated:
• Using 𝑂(𝐼 + π‘Ÿ) bits of communication.
• Using 𝑂(√(𝐼 ⋅ 𝐢)) bits of communication.
• Using 2^{𝑂(𝐼)} bits of communication.
• If πœ‡ = πœ‡π‘‹ × πœ‡π‘Œ, then using 𝑂(𝐼 polylog 𝐢)
bits of communication.
Their direct sum counterparts
• 𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€) ⁑ = ⁑ Ω(𝑛1/2 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€)).
• 𝐢𝐢(𝐹𝑛, πœ€) ⁑ = ⁑ Ω(𝑛1/2 βˆ™ 𝐢𝐢(𝐹, πœ€)).
For product distributions πœ‡ = πœ‡π‘‹ × πœ‡π‘Œ ,
• 𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€) ⁑ = ⁑ Ω(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€)).
When the number of rounds is bounded by
π‘Ÿ β‰ͺ 𝑛, a direct sum theorem holds.
60
Direct product
• The best one can hope for is a statement of the
type:
𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, 1 − 2^{−𝑂(𝑛)}) = Ω(𝑛 βˆ™ 𝐼𝐢(𝐹, πœ‡, 1/3)).
• Can prove:
𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, 1 − 2^{−𝑂(𝑛)}) = Ω(𝑛^{1/2} βˆ™ 𝐢𝐢(𝐹, πœ‡, 1/3)).
Proof 2: Compressing a one-round
protocol
• Say Alice speaks: 𝐼𝐢(πœ‹, πœ‡) = 𝐼(𝑀; 𝑋|π‘Œ).
• Recall KL-divergence:
𝐼(𝑀; 𝑋|π‘Œ) = πΈπ‘Œ [𝐷(π‘€π‘‹π‘Œ βˆ₯ π‘€π‘Œ)] = πΈπ‘Œ [𝐷(𝑀𝑋 βˆ₯ π‘€π‘Œ)].
• Bottom line:
– Alice has 𝑀𝑋; Bob has π‘€π‘Œ;
– Goal: sample from 𝑀𝑋 using ∼ 𝐷(𝑀𝑋 βˆ₯ π‘€π‘Œ)
communication.
The dart board
• Interpret the public randomness as random
points (𝑒, π‘ž) in π‘ˆ × [0,1], where π‘ˆ is the
universe of all possible messages.
• The first point under the histogram of 𝑀 is
distributed ∼ 𝑀.
[Figure: darts (𝑒1, π‘ž1), (𝑒2, π‘ž2), … thrown at π‘ˆ × [0,1]; the ones landing below the histogram of 𝑀 are the accepted samples.]
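The dart-board claim — the first public point under the histogram of 𝑀 is distributed ∼ 𝑀 — can be checked by simulation; a sketch (names and the example distribution are mine):

```python
import random
from collections import Counter

def first_dart_under(M, universe, rng):
    """Scan public darts (u, q) in U x [0,1]; return the first one that
    falls under the histogram of M. Its distribution is exactly M."""
    while True:
        u = rng.choice(universe)   # public random message
        q = rng.random()           # public random height in [0,1]
        if q < M[u]:
            return u

M = {"u1": 0.5, "u2": 0.25, "u3": 0.125, "u4": 0.125}
universe = sorted(M)
rng = random.Random(0)
counts = Counter(first_dart_under(M, universe, rng) for _ in range(20000))
for u in universe:
    print(u, counts[u] / 20000)    # empirical frequency close to M[u]
```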
Proof Idea
• Sample using 𝑂(log 1/πœ€ + 𝐷(𝑀𝑋 βˆ₯ π‘€π‘Œ))
communication with statistical error πœ€.
[Figure: the same public darts, roughly |π‘ˆ| samples, viewed against 𝑀𝑋 (Alice’s histogram) and π‘€π‘Œ (Bob’s histogram); Alice locates the first dart under 𝑀𝑋.]
Proof Idea
[Figure: Alice sends hashes h1(𝑒4), h2(𝑒4) of her sample 𝑒4; Bob checks them against the candidates under π‘€π‘Œ.]
Proof Idea
[Figure: if no candidate matches, Bob doubles his histogram (2π‘€π‘Œ) and Alice sends further hashes h3(𝑒4), h4(𝑒4), …, h_{log 1/πœ€}(𝑒4).]
Analysis
• If 𝑀𝑋(𝑒4) ≈ 2^π‘˜ π‘€π‘Œ(𝑒4), then the protocol
will reach round π‘˜ of doubling.
• There will be ≈ 2^π‘˜ candidates.
• About π‘˜ + log 1/πœ€ hashes to narrow down to one.
• The contribution of 𝑒4 to the cost:
𝑀𝑋(𝑒4) (log(𝑀𝑋(𝑒4)/π‘€π‘Œ(𝑒4)) + log 1/πœ€).
• Summing over 𝑒:
𝐷(𝑀𝑋 βˆ₯ π‘€π‘Œ) ≔ ∑𝑒 𝑀𝑋(𝑒) log(𝑀𝑋(𝑒)/π‘€π‘Œ(𝑒)).
Done!
External information cost
• (𝑋, π‘Œ)⁑~β‘πœ‡.
X
C
Y
Protocol
Protocol π
transcript π
B
A
𝐼𝐢𝑒π‘₯𝑑(πœ‹, πœ‡) ⁑ = ⁑𝐼(Π; π‘‹π‘Œ)⁑
what Charlie learns about (X,Y)
Example
• F is “X=Y?”.
• μ is a distribution where w.p. ½ X=Y and w.p. ½
(X,Y) are random.
[Protocol: Alice sends MD5(X); Bob replies with X=Y?]
𝐼𝐢𝑒π‘₯𝑑(πœ‹, πœ‡) = 𝐼(Π; π‘‹π‘Œ) = 129 bits
= what Charlie learns about (X,Y)
External information cost
• It is always the case that
𝐼𝐢𝑒π‘₯𝑑(πœ‹, πœ‡) ≥ 𝐼𝐢(πœ‹, πœ‡).
• If πœ‡ = πœ‡π‘‹ × πœ‡π‘Œ is a product distribution, then
𝐼𝐢𝑒π‘₯𝑑(πœ‹, πœ‡) = 𝐼𝐢(πœ‹, πœ‡).
External information complexity
• 𝐼𝐢𝑒π‘₯𝑑 𝐹, πœ‡, πœ€ ≔
inf
πœ‹β‘π‘π‘œπ‘šπ‘π‘’π‘‘π‘’π‘ 
πΉβ‘π‘€π‘–π‘‘β„Žβ‘π‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿβ‘≤πœ€
𝐼𝐢𝑒π‘₯𝑑 (πœ‹, πœ‡).
• Can it be operationalized?
71
Operational meaning of 𝐼𝐢𝑒π‘₯𝑑 ?
• Conjecture: zero-error communication
scales like external information:
lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, 0)/𝑛 = 𝐼𝐢𝑒π‘₯𝑑(𝐹, πœ‡, 0)?
• Recall:
lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, 0⁺)/𝑛 = 𝐼𝐢(𝐹, πœ‡, 0).
Example – transmission with a
strong prior
• 𝑋, π‘Œ ∈ 0,1
• πœ‡ is such that 𝑋 ∈π‘ˆ 0,1 , and 𝑋 = π‘Œ with a
very high probability (say 1 − 1/ 𝑛).
• 𝐹 𝑋, π‘Œ = 𝑋 is just the “transmit 𝑋” function.
• Clearly, πœ‹ should just have Alice send 𝑋 to Bob.
• 𝐼𝐢 𝐹, πœ‡, 0 = 𝐼𝐢 πœ‹, πœ‡ = 𝐻
1
𝑛
= o(1).
• 𝐼𝐢𝑒π‘₯𝑑 𝐹, πœ‡, 0 = 𝐼𝐢𝑒π‘₯𝑑 πœ‹, πœ‡ = 1.
73
Example – transmission with a
strong prior
• 𝐼𝐢 𝐹, πœ‡, 0 = 𝐼𝐢 πœ‹, πœ‡ = 𝐻
1
𝑛
= o(1).
• 𝐼𝐢𝑒π‘₯𝑑 𝐹, πœ‡, 0 = 𝐼𝐢𝑒π‘₯𝑑 πœ‹, πœ‡ = 1.
• 𝐢𝐢 𝐹 𝑛 , πœ‡π‘› , 0+ = π‘œ 𝑛 .
• 𝐢𝐢 𝐹 𝑛 , πœ‡π‘› , 0 = Ω(𝑛).
Other examples, e.g. the two-bit AND function fit
into this picture.
74
Additional directions
• Information complexity
• Interactive coding
• Information theory in TCS
Interactive coding theory
• So far we have focused the discussion on
noiseless coding.
• What if the channel has noise?
• [What kind of noise?]
• In the non-interactive case, each channel
has a capacity 𝐢.
Channel capacity
• The amortized number of channel uses
needed to send 𝑋 over a noisy channel of
capacity 𝐢 is 𝐻(𝑋)/𝐢.
• Decouples the task from the channel!
Example: Binary Symmetric
Channel
• Each bit gets independently flipped with
probability πœ€ < 1/2.
• One-way capacity: 1 − 𝐻(πœ€).
[Channel diagram: 0 → 0 and 1 → 1 w.p. 1 − πœ€;
0 → 1 and 1 → 0 w.p. πœ€.]
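The one-way capacity formula is easy to evaluate; a sketch:

```python
import math

def bsc_capacity(eps):
    """One-way capacity 1 - H(eps) of the binary symmetric channel."""
    if eps in (0.0, 1.0):
        return 1.0
    h = -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
    return 1 - h

print(bsc_capacity(0.0))    # noiseless channel: 1.0 bit per use
print(bsc_capacity(0.5))    # pure noise: 0.0
print(bsc_capacity(0.11))   # about 0.5
```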
Interactive channel capacity
• Not clear one can decouple channel from task
in such a clean way.
• Capacity much harder to calculate/reason
about.
• Example: the binary symmetric channel, with
one-way capacity 1 − 𝐻(πœ€).
• Interactive capacity (for simple pointer jumping
[Kol-Raz’13]): 1 − Θ(√𝐻(πœ€)).
Information theory in communication
complexity and beyond
• A natural extension would be to multi-party
communication complexity.
• Some success in the number-in-hand case.
• What about the number-on-forehead case?
• Explicit bounds for ≥ log 𝑛 players would
imply explicit 𝐴𝐢𝐢^0 circuit lower bounds.
Naïve multi-party information cost
[Diagram: three players 𝐴, 𝐡, 𝐢; each sees two of the three inputs 𝑋, π‘Œ, 𝑍.]
𝐼𝐢(πœ‹, πœ‡) = 𝐼(Π; 𝑋|π‘Œπ‘) + 𝐼(Π; π‘Œ|𝑋𝑍) + 𝐼(Π; 𝑍|π‘‹π‘Œ)
Naïve multi-party information cost
𝐼𝐢 πœ‹, πœ‡ = 𝐼(Π; 𝑋|π‘Œπ‘)+ 𝐼(Π; π‘Œ|𝑋𝑍)+ 𝐼(Π; 𝑍|π‘‹π‘Œ)
• Doesn’t seem to work.
• Secure multi-party computation [BenOr,Goldwasser, Wigderson], means that
anything can be computed at near-zero
information cost.
• Although, these construction require the
players to share private channels/randomness.
82
Communication and beyond…
• The rest of today:
– Data structures;
– Streaming;
– Distributed computing;
– Privacy.
• Exact communication complexity bounds.
• Extended formulations lower bounds.
• Parallel repetition?
• …
Thank You!
Open problem: Computability of IC
• Given the truth table of 𝐹(𝑋, π‘Œ), πœ‡ and πœ€,
compute 𝐼𝐢(𝐹, πœ‡, πœ€).
• Via 𝐼𝐢(𝐹, πœ‡, πœ€) = lim𝑛→∞ 𝐢𝐢(𝐹^𝑛, πœ‡^𝑛, πœ€)/𝑛 one
can compute a sequence of upper bounds.
• But the rate of convergence as a function
of 𝑛 is unknown.
Open problem: Computability of IC
• Can compute the π‘Ÿ-round information
complexity πΌπΆπ‘Ÿ(𝐹, πœ‡, πœ€) of 𝐹.
• But the rate of convergence as a function
of π‘Ÿ is unknown.
• Conjecture:
πΌπΆπ‘Ÿ(𝐹, πœ‡, πœ€) − 𝐼𝐢(𝐹, πœ‡, πœ€) = 𝑂𝐹,πœ‡,πœ€(1/π‘ŸΒ²).
• This is the relationship for the two-bit AND.