Basics of information theory and information complexity: a tutorial
Mark Braverman, Princeton University
June 1, 2013

Part I: Information theory
• Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.
• [Figure: Alice sends data to Bob over a communication channel.]

Quantifying "information"
• Information is measured in bits.
• The basic notion is Shannon's entropy.
• The entropy of a random variable is the (typical) number of bits needed to remove the uncertainty of the variable.
• For a discrete variable: H(X) := Σ_x Pr[X = x] · log(1/Pr[X = x]).

Shannon's entropy
• Important examples and properties:
– If X = x is a constant, then H(X) = 0.
– If X is uniform on a set of k possible values, then H(X) = log k.
– If X is supported on at most k values, then H(X) ≤ log k.
– If Y is a random variable determined by X, then H(Y) ≤ H(X).

Conditional entropy
• For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X given Y: H(X|Y) := E_{y∼Y} [H(X | Y = y)].
• One can show H(XY) = H(Y) + H(X|Y).
• This important fact is known as the chain rule.
• If X ⊥ Y, then H(XY) = H(X) + H(Y|X) = H(X) + H(Y).

Example
• X = (B1, B2, B3); Y = (B1⊕B2, B2⊕B4, B3⊕B4, B5), where B1, B2, B3, B4, B5 ∈_R {0,1}.
• Then:
– H(X) = 3; H(Y) = 4; H(XY) = 5;
– H(X|Y) = 1 = H(XY) − H(Y);
– H(Y|X) = 2 = H(XY) − H(X).

Mutual information
• X = (B1, B2, B3); Y = (B1⊕B2, B2⊕B4, B3⊕B4, B5).
• [Venn diagram: the circles H(X) and H(Y) overlap in the region I(X;Y); the remaining regions are H(X|Y) and H(Y|X).]

Mutual information
• The mutual information is defined as I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).
• "By how much does knowing Y reduce the entropy of X?"
• Always non-negative: I(X;Y) ≥ 0.
• Conditional mutual information: I(X;Y|Z) := H(X|Z) − H(X|YZ).
• Chain rule for mutual information: I(XY;Z) = I(X;Z) + I(Y;Z|X).
• Simple intuitive interpretation.

Example: a biased coin
• A coin with an ε bias towards Heads or towards Tails is tossed several times.
• Let B ∈ {H, T} be the direction of the bias, and suppose that a priori both options are equally likely: H(B) = 1.
• How many tosses are needed to determine B?
• Let T1, …, Tn be the sequence of tosses.
• Start with n = 2.

What do we learn about B?
• I(B; T1 T2) = I(B; T1) + I(B; T2 | T1)
  = I(B; T1) + I(B T1; T2) − I(T1; T2)
  ≤ I(B; T1) + I(B T1; T2)
  = I(B; T1) + I(B; T2) + I(T1; T2 | B)
  = I(B; T1) + I(B; T2) = 2 · I(B; T1).
• Similarly, I(B; T1 … Tn) ≤ n · I(B; T1).
• To determine B with constant accuracy, we need 0 < c ≤ I(B; T1 … Tn) ≤ n · I(B; T1).
• Hence n = Ω(1/I(B; T1)).

Kullback–Leibler (KL) divergence
• A (non-symmetric) measure of distance between distributions on the same space.
• Plays a key role in information theory.
• D(p ‖ q) := Σ_x p[x] · log(p[x]/q[x]).
• D(p ‖ q) ≥ 0, with equality iff p = q.
• Caution: D(p ‖ q) ≠ D(q ‖ p)!

Properties of KL divergence
• Connection to mutual information: I(X;Y) = E_{y∼Y} D(X_{Y=y} ‖ X).
• If X ⊥ Y, then X_{Y=y} = X, and both sides are 0.
• Pinsker's inequality: ‖p − q‖₁ = O(√(D(p ‖ q))).
• Tight: D(B_{1/2+ε} ‖ B_{1/2}) = Θ(ε²).

Back to the coin example
• I(B; T1) = E_{b∼B} D(T1|_{B=b} ‖ T1) = D(B_{1/2±ε} ‖ B_{1/2}) = Θ(ε²).
• n = Ω(1/I(B; T1)) = Ω(1/ε²).
• "Follow the information learned from the coin tosses."
• This can be done using combinatorics, but the information-theoretic language is more natural for expressing what is going on.
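As a quick numerical companion to the coin example (a minimal sketch, not part of the original slides), the snippet below computes I(B; T1) = E_b D(T1|B=b ‖ T1) exactly for a few biases ε and compares it with ε², illustrating why about 1/ε² tosses are needed.

```python
import math

def kl(p, q):
    """KL divergence D(Ber(p) || Ber(q)) in bits."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log2(a / b)
    return total

def info_one_toss(eps):
    """I(B; T1) where B in {H, T} is uniform and the coin is 1/2 + eps
    towards Heads if B = H, and 1/2 - eps if B = T."""
    # By symmetry the marginal of a single toss is Ber(1/2), so
    # I(B; T1) = E_b D(T1 | B=b || T1).
    return 0.5 * kl(0.5 + eps, 0.5) + 0.5 * kl(0.5 - eps, 0.5)

for eps in (0.1, 0.05, 0.01):
    i = info_one_toss(eps)
    print(f"eps={eps:5}: I(B;T1) = {i:.6f} bits, "
          f"I/eps^2 = {i / eps**2:.3f}, so n = Omega(1/eps^2) ~ {1 / i:.0f}")
```

For small ε the ratio I(B;T1)/ε² stays near 2/ln 2 ≈ 2.89, matching the Θ(ε²) bound and hence the Ω(1/ε²) lower bound on the number of tosses.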
Back to communication
• The reason information theory is so important for communication is that information-theoretic quantities readily operationalize.
• We can attach operational meaning to Shannon's entropy: H(X) ≈ "the cost of transmitting X".
• Let C(X) be the (expected) cost of transmitting a sample of X.

H(X) = C(X)?
• Not quite.
• Let the trit T ∈_R {1, 2, 3}.
• C(T) = 5/3 ≈ 1.67 (e.g., encode 1 → '0', 2 → '10', 3 → '11').
• H(T) = log 3 ≈ 1.58.
• It is always the case that C(X) ≥ H(X).

But H(X) and C(X) are close
• Huffman coding: C(X) ≤ H(X) + 1.
• This is a compression result: "an uninformative message turned into a short one".
• Therefore: H(X) ≤ C(X) ≤ H(X) + 1.

Shannon's noiseless coding
• The cost of communicating many copies of X scales as H(X).
• Shannon's source coding theorem: let C(X^n) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is lim_{n→∞} C(X^n)/n = H(X).
• This equation gives H(X) operational meaning.

H(X) operationalized
• [Figure: X1, …, Xn, … transmitted over the communication channel at a cost of H(X) per copy.]

H(X) is nicer than C(X)
• H(X) is additive for independent variables.
• Let T1, T2 ∈_R {1, 2, 3} be independent trits.
• H(T1 T2) = log 9 = 2 log 3.
• C(T1 T2) = 29/9 < C(T1) + C(T2) = 2 × 5/3 = 30/9.
• Works well with concepts such as channel capacity.

"Proof" of Shannon's noiseless coding
• n · H(X) = H(X^n) ≤ C(X^n) ≤ H(X^n) + 1.
• [The first equality is additivity of entropy; the last inequality is compression (Huffman).]
• Therefore lim_{n→∞} C(X^n)/n = H(X).

Operationalizing other quantities
• Conditional entropy H(X|Y) (cf. the Slepian-Wolf theorem).
• [Figure: X1, …, Xn, … transmitted over the communication channel at H(X|Y) per copy to a receiver who already holds Y1, …, Yn, ….]

Operationalizing other quantities
• Mutual information I(X;Y).
• [Figure: X1, …, Xn, … on one side; the channel is used at I(X;Y) per copy to sample the correlated Y1, …, Yn, … on the other side.]

Information theory and entropy
• Allows us to formalize intuitive notions.
• Operationalized in the context of one-way transmission and related problems.
• Has nice properties (additivity, chain rule, …).
• Next, we discuss extensions to more interesting communication scenarios.

Communication complexity
• Focus on the two-party randomized setting.
• [Figure: Alice holds X, Bob holds Y; using shared randomness R, A and B implement a functionality F(X,Y), e.g. F(X,Y) = "X = Y?".]

Communication complexity
• Goal: implement a functionality F(X,Y).
• A protocol π(X,Y) computing F(X,Y) exchanges messages m1(X,R), m2(Y,m1,R), m3(X,m1,m2,R), … using the shared randomness R.
• Communication cost = # of bits exchanged.

Communication complexity
• Numerous applications/potential applications (some will be discussed later today).
• Considerably more difficult to obtain lower bounds than for transmission (still much easier than for other models of computation!).

Communication complexity
• (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε). Error ≤ ε w.r.t. μ.
• (Randomized/worst-case) communication complexity: CC(F, ε). Error ≤ ε on all inputs.
• Yao's minimax: CC(F, ε) = max_μ CC(F, μ, ε).

Examples
• X, Y ∈ {0,1}^n.
• Equality: EQ(X,Y) := 1_{X=Y}.
• CC(EQ, ε) ≈ log(1/ε).
• CC(EQ, 0) ≈ n.

Equality
• F is "X = Y?".
• μ is a distribution where w.p. ½ X = Y, and w.p. ½ (X,Y) are random.
• [Protocol: Alice sends MD5(X) (128 bits); Bob replies with "X = Y?" (1 bit).]
• Error? Only when X ≠ Y but the hashes collide.
• Shows that CC(EQ, μ, 2^{−129}) ≤ 129. (A toy simulation of this hashing idea appears after the next slide.)

Examples
• X, Y ∈ {0,1}^n.
• Inner product: IP(X,Y) := Σ_i X_i · Y_i (mod 2).
• CC(IP, 0) = n − o(n).
• In fact, using information complexity: CC(IP, ε) = n − o_ε(n).
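To make the hashing protocol for Equality concrete, here is a toy simulation (a sketch of the idea only: it uses a random linear hash built from shared randomness instead of MD5, and all names are illustrative, not from the slides). Alice sends a k-bit hash of X and Bob answers with one bit, so the communication is k + 1 bits and, when X ≠ Y, the protocol errs with probability 2^{−k}.

```python
import random

def equality_protocol(x, y, k=128, seed=0):
    """Toy one-round protocol for Equality (illustrative, not from the slides).

    Shared randomness: k random n-bit masks; the hash h(z) is the vector of
    inner products <mask_i, z> over GF(2).  Alice sends h(x) (k bits);
    Bob answers with 1 bit: whether h(x) == h(y).
    If x != y, the answer is wrong with probability exactly 2**-k.
    """
    rng = random.Random(seed)  # stands in for the shared random string R
    n = len(x)
    masks = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]

    def h(z):
        return tuple(sum(m[i] & z[i] for i in range(n)) % 2 for m in masks)

    alice_message = h(x)          # k bits of communication
    return alice_message == h(y)  # Bob's 1-bit answer: "equal?"

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(equality_protocol(x, list(x), k=16))                   # True: never errs when X == Y
print(equality_protocol(x, [1, 0, 1, 1, 0, 0, 1, 1], k=16))  # False w.p. 1 - 2**-16
```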
Information complexity
• Information complexity IC(F) is to communication complexity CC(F) as Shannon's entropy H(X) is to transmission cost C(X).

Information complexity
• The smallest amount of information Alice and Bob need to exchange to solve F.
• How is information measured?
• Communication cost of a protocol? Number of bits exchanged.
• Information cost of a protocol? Amount of information revealed.

Basic definition 1: the information cost of a protocol
• Prior distribution: (X, Y) ∼ μ.
• [Figure: Alice holds X, Bob holds Y; the protocol π produces the transcript Π.]
• IC(π, μ) = I(Π; Y | X) + I(Π; X | Y)
  = (what Alice learns about Y) + (what Bob learns about X).

Example
• F is "X = Y?".
• μ is a distribution where w.p. ½ X = Y, and w.p. ½ (X,Y) are random.
• [Protocol: Alice sends MD5(X) (128 bits); Bob replies with "X = Y?" (1 bit).]
• IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) ≈ 1 + 64.5 = 65.5 bits
  = (what Alice learns about Y) + (what Bob learns about X).

The prior μ matters a lot for information cost!
• If μ = 1_{(x,y)} is a singleton, then IC(π, μ) = 0.

Example
• F is "X = Y?".
• μ is a distribution where (X, Y) are just uniformly random.
• [Protocol: Alice sends MD5(X) (128 bits); Bob replies with "X = Y?" (1 bit).]
• IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) ≈ 0 + 128 = 128 bits
  = (what Alice learns about Y) + (what Bob learns about X).

Basic definition 2: information complexity
• Communication complexity: CC(F, μ, ε) := min over protocols π computing F with error ≤ ε of the communication cost of π.
• Analogously: IC(F, μ, ε) := inf over protocols π computing F with error ≤ ε of IC(π, μ).
• [The infimum, rather than a minimum, is needed here!]

Prior-free information complexity
• Using a minimax we can get rid of the prior.
• For communication we had: CC(F, ε) = max_μ CC(F, μ, ε).
• For information: IC(F, ε) := inf over protocols π computing F with error ≤ ε of max_μ IC(π, μ).

Example: the information complexity of Equality
• What is IC(EQ, 0)?
• Consider the following protocol. Alice holds X ∈ {0,1}^n, Bob holds Y ∈ {0,1}^n; a non-singular matrix A ∈ F_2^{n×n} with rows A1, A2, … is chosen using the shared randomness.
• The players exchange A1·X, A1·Y, A2·X, A2·Y, …, continuing for n steps, or until a disagreement is discovered.

Analysis (sketch)
• If X ≠ Y, the protocol will terminate in O(1) rounds on average, and thus reveal O(1) information.
• If X = Y, … the players only learn the fact that X = Y (≤ 1 bit of information).
• Thus the protocol has O(1) information complexity for any prior μ.

Operationalizing IC: information equals amortized communication
• Recall [Shannon]: lim_{n→∞} C(X^n)/n = H(X).
• Turns out: lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε), for ε > 0. [Error ε allowed on each copy.]
• For ε = 0: lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).
• [lim_{n→∞} CC(F^n, μ^n, 0)/n is an interesting open problem.]

Information = amortized communication
• lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε).
• Two directions: "≤" and "≥".
• [Compare with the noiseless-coding proof: n·H(X) = H(X^n) ≤ C(X^n) ≤ H(X^n) + 1, i.e., additivity of entropy plus compression (Huffman).]

The "≤" direction
• lim_{n→∞} CC(F^n, μ^n, ε)/n ≤ IC(F, μ, ε).
• Start with a protocol π solving F whose IC(π, μ) is close to IC(F, μ, ε).
• Show how to compress many copies of π into a protocol whose communication cost is close to its information cost.
• More on compression later.

The "≥" direction
• lim_{n→∞} CC(F^n, μ^n, ε)/n ≥ IC(F, μ, ε).
• Use the fact that CC(F^n, μ^n, ε)/n ≥ IC(F^n, μ^n, ε)/n.
• Additivity of information complexity: IC(F^n, μ^n, ε)/n = IC(F, μ, ε).

Proof: additivity of information complexity
• Let T1(X1, Y1) and T2(X2, Y2) be two two-party tasks.
• E.g. "Solve F(X,Y) with error ≤ ε w.r.t. μ."
• Then IC(T1 × T2, μ1 × μ2) = IC(T1, μ1) + IC(T2, μ2).
• "≤" is easy (a sketch follows below).
• "≥" is the interesting direction.
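For completeness, here is a minimal sketch of the easy "≤" direction (this derivation fills in a step not spelled out on the slides, assuming the two protocols are run independently with independent randomness).

```latex
% Easy direction: IC(T1 x T2, mu1 x mu2) <= IC(T1, mu1) + IC(T2, mu2).
% Run pi1 on (X1,Y1) ~ mu1 and then pi2 on (X2,Y2) ~ mu2, independently;
% the combined transcript is \Pi = (\Pi_1, \Pi_2).
\begin{align*}
I(\Pi_1 \Pi_2; Y_1 Y_2 \mid X_1 X_2)
  &= H(Y_1 Y_2 \mid X_1 X_2) - H(Y_1 Y_2 \mid X_1 X_2 \Pi_1 \Pi_2)\\
  &= \bigl[H(Y_1 \mid X_1) + H(Y_2 \mid X_2)\bigr]
     - \bigl[H(Y_1 \mid X_1 \Pi_1) + H(Y_2 \mid X_2 \Pi_2)\bigr]\\
  &= I(\Pi_1; Y_1 \mid X_1) + I(\Pi_2; Y_2 \mid X_2),
\end{align*}
% using that (X_1, Y_1, \Pi_1) is independent of (X_2, Y_2, \Pi_2).
% The symmetric identity holds for I(\Pi_1 \Pi_2; X_1 X_2 | Y_1 Y_2), so
% IC(pi1 x pi2, mu1 x mu2) = IC(pi1, mu1) + IC(pi2, mu2);
% taking infima over pi1 and pi2 yields the "<=" direction.
```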
IC(T1, μ1) + IC(T2, μ2) ≤ IC(T1 × T2, μ1 × μ2)
• Start from a protocol π for T1 × T2 with prior μ1 × μ2, whose information cost is I.
• Show how to construct two protocols: π1 for T1 with prior μ1, and π2 for T2 with prior μ2, with information costs I1 and I2, respectively, such that I1 + I2 = I.

Constructing π1 and π2 from π((X1, X2), (Y1, Y2))
π1(X1, Y1):
• Publicly sample X2 ∼ μ2.
• Bob privately samples Y2 ∼ μ2 | X2.
• Run π((X1, X2), (Y1, Y2)).
π2(X2, Y2):
• Publicly sample Y1 ∼ μ1.
• Alice privately samples X1 ∼ μ1 | Y1.
• Run π((X1, X2), (Y1, Y2)).

Analysis of π1
• [π1(X1, Y1): publicly sample X2 ∼ μ2; Bob privately samples Y2 ∼ μ2 | X2; run π.]
• Alice learns about Y1: I(Π; Y1 | X1 X2).
• Bob learns about X1: I(Π; X1 | Y1 Y2 X2).
• I1 = I(Π; Y1 | X1 X2) + I(Π; X1 | Y1 Y2 X2).

Analysis of π2
• [π2(X2, Y2): publicly sample Y1 ∼ μ1; Alice privately samples X1 ∼ μ1 | Y1; run π.]
• Alice learns about Y2: I(Π; Y2 | X1 X2 Y1).
• Bob learns about X2: I(Π; X2 | Y1 Y2).
• I2 = I(Π; Y2 | X1 X2 Y1) + I(Π; X2 | Y1 Y2).

Adding I1 and I2
I1 + I2 = I(Π; Y1 | X1 X2) + I(Π; X1 | Y1 Y2 X2) + I(Π; Y2 | X1 X2 Y1) + I(Π; X2 | Y1 Y2)
= [I(Π; Y1 | X1 X2) + I(Π; Y2 | X1 X2 Y1)] + [I(Π; X2 | Y1 Y2) + I(Π; X1 | Y1 Y2 X2)]
= I(Π; Y1 Y2 | X1 X2) + I(Π; X1 X2 | Y1 Y2) = I.

Summary
• Information complexity is additive.
• Operationalized via "information = amortized communication": lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε).
• Seems to be the "right" analogue of entropy for interactive computation.

Entropy vs. Information Complexity

                   Entropy                      IC
  Additive?        Yes                          Yes
  Operationalized  lim_{n→∞} C(X^n)/n           lim_{n→∞} CC(F^n, μ^n, ε)/n
  Compression?     Huffman: C(X) ≤ H(X) + 1     ???!

Can interactive communication be compressed?
• Is it true that CC(F, μ, ε) ≤ IC(F, μ, ε) + O(1)?
• Less ambitiously: CC(F, μ, O(ε)) = O(IC(F, μ, ε))?
• (Almost) equivalently: given a protocol π with IC(π, μ) = I, can Alice and Bob simulate π using O(I) communication?
• Not known in general…

Direct sum theorems
• Let F be any functionality.
• Let C(F) be the cost of implementing F.
• Let F^n be the functionality of implementing n independent copies of F.
• The direct sum problem: "Does C(F^n) ≈ n · C(F)?"
• In most cases it is obvious that C(F^n) ≤ n · C(F).

Direct sum: randomized communication complexity
• Is it true that CC(F^n, μ^n, ε) = Ω(n · CC(F, μ, ε))?
• Is it true that CC(F^n, ε) = Ω(n · CC(F, ε))?

Direct product: randomized communication complexity
• Direct sum: CC(F^n, μ^n, ε) = Ω(n · CC(F, μ, ε))?
• Direct product: CC(F^n, μ^n, 1 − (1 − ε)^n) = Ω(n · CC(F, μ, ε))?

Direct sum for randomized CC and interactive compression
• Direct sum: CC(F^n, μ^n, ε) = Ω(n · CC(F, μ, ε))?
• In the limit: n · IC(F, μ, ε) = Ω(n · CC(F, μ, ε))?
• Interactive compression: CC(F, μ, ε) = O(IC(F, μ, ε))?
• Same question!

The big picture
• [Diagram: IC(F, μ, ε) and CC(F, μ, ε) in the single-copy world; IC(F^n, μ^n, ε)/n and CC(F^n, μ^n, ε)/n in the amortized world. Additivity (= direct sum for information) links IC(F, μ, ε) to IC(F^n, μ^n, ε)/n; "information = amortized communication" links IC(F^n, μ^n, ε)/n to CC(F^n, μ^n, ε)/n; "interactive compression?" links IC(F, μ, ε) to CC(F, μ, ε); "direct sum for communication?" links CC(F, μ, ε) to CC(F^n, μ^n, ε)/n.]

Current results for compression
A protocol π that has C bits of communication, conveys I bits of information over the prior μ, and works in r rounds can be simulated:
• using O(I + r) bits of communication;
• using O(√(I · C)) bits of communication;
• using 2^{O(I)} bits of communication;
• if μ = μ_X × μ_Y, then using O(I · polylog(C)) bits of communication.
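As a bridge to the direct sum counterparts on the next slide, here is the standard way a compression result of this kind converts into a direct sum bound (a sketch under the O(√(I·C)) simulation above; polylogarithmic factors and the exact error bookkeeping are suppressed).

```latex
% From compression to direct sum (sketch).  Let pi_n achieve
% C_n := CC(F^n, mu^n, eps).  Restricting pi_n to a single coordinate
% (publicly/privately sampling the other n-1 inputs, as in the additivity
% proof above) yields a protocol for one copy of F with communication
% at most C_n and information cost at most C_n / n.
% Compressing that protocol with the O(sqrt(I*C)) simulation gives
\[
  CC\bigl(F,\mu,O(\varepsilon)\bigr)
  \;\lesssim\; \sqrt{\tfrac{C_n}{n}\cdot C_n}
  \;=\; \frac{C_n}{\sqrt{n}},
  \qquad\text{hence}\qquad
  CC\bigl(F^n,\mu^n,\varepsilon\bigr) \;=\; C_n
  \;=\; \Omega\!\bigl(\sqrt{n}\cdot CC(F,\mu,O(\varepsilon))\bigr),
\]
% up to the polylogarithmic factors hidden in "\lesssim".
```

Plugging in the product-distribution compression bound O(I · polylog C) instead yields, by roughly the same template, the near-linear Ω(n) counterpart for product priors.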
Their direct sum counterparts
• CC(F^n, μ^n, ε) = Ω(√n · CC(F, μ, ε)).
• CC(F^n, ε) = Ω(√n · CC(F, ε)).
• For product distributions μ = μ_X × μ_Y: CC(F^n, μ^n, ε) = Ω(n · CC(F, μ, ε)).
• When the number of rounds is bounded by r ≪ n, a direct sum theorem holds.

Direct product
• The best one can hope for is a statement of the type: CC(F^n, μ^n, 1 − 2^{−O(n)}) = Ω(n · IC(F, μ, 1/3)).
• Can prove: CC(F^n, μ^n, 1 − 2^{−O(n)}) = Ω(√n · CC(F, μ, 1/3)).

Proof 2: compressing a one-round protocol
• Say Alice speaks: IC(π, μ) = I(M; X | Y).
• Recall the connection to KL-divergence: I(M; X | Y) = E_{XY} D(M_{XY} ‖ M_Y) = E_{XY} D(M_X ‖ M_Y).
• Bottom line:
– Alice has M_X; Bob has M_Y;
– Goal: sample from M_X using ∼ D(M_X ‖ M_Y) communication.

The dart board
• Interpret the public randomness as random points (u, q) in U × [0, 1], where U is the universe of all possible messages.
• [Figure: shared samples (u1, q1), (u2, q2), … scattered over U × [0, 1], with the histogram of a distribution on U drawn underneath.]
• The first point that falls under the histogram of M_X is distributed ∼ M_X.

Proof idea
• Sample using O(log 1/ε + D(M_X ‖ M_Y)) communication, with statistical error ε.
• [Figures over three slides: ~|U| shared random points are drawn once; Alice marks the first point under the histogram of M_X (say u4); Bob keeps as candidates the points under M_Y; Alice sends hashes h1(u4), h2(u4), …; Bob repeatedly doubles M_Y (2·M_Y, 4·M_Y, …), admitting more candidates and receiving further hashes h3(u4), h4(u4), …, h_{log 1/ε}(u4), until a single candidate u4 remains.]

Analysis
• If M_Y(u4) ≈ 2^{−t} · M_X(u4), then the protocol will reach round t of doubling.
• There will be ≈ 2^t candidates.
• About t + log(1/ε) hashes are needed to narrow them down to one.
• The contribution of u4 to the cost: M_X(u4) · (log(M_X(u4)/M_Y(u4)) + log 1/ε).
• Recall D(M_X ‖ M_Y) := Σ_u M_X(u) · log(M_X(u)/M_Y(u)).
• Done!

External information cost
• (X, Y) ∼ μ.
• [Figure: Alice (X) and Bob (Y) run the protocol π, producing the transcript Π; an external observer Charlie sees Π.]
• IC_ext(π, μ) = I(Π; XY) = what Charlie learns about (X, Y).

Example
• F is "X = Y?".
• μ is a distribution where w.p. ½ X = Y and w.p. ½ (X, Y) are random.
• [Protocol: Alice sends MD5(X); Bob replies with "X = Y?".]
• IC_ext(π, μ) = I(Π; XY) = 129 bits = what Charlie learns about (X, Y).

External information cost
• It is always the case that IC_ext(π, μ) ≥ IC(π, μ).
• If μ = μ_X × μ_Y is a product distribution, then IC_ext(π, μ) = IC(π, μ).

External information complexity
• IC_ext(F, μ, ε) := inf over protocols π computing F with error ≤ ε of IC_ext(π, μ).
• Can it be operationalized?

Operational meaning of IC_ext?
• Conjecture: zero-error communication scales like external information: lim_{n→∞} CC(F^n, μ^n, 0)/n = IC_ext(F, μ, 0)?
• Recall: lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).

Example: transmission with a strong prior
• X, Y ∈ {0, 1}.
• μ is such that X ∈_R {0, 1}, and Y = X with a very high probability (say 1 − 1/√n).
• F(X, Y) = X is just the "transmit X" function.
• Clearly, the protocol π should just have Alice send X to Bob.
• IC(F, μ, 0) = IC(π, μ) = H(1/√n) = o(1).
• IC_ext(F, μ, 0) = IC_ext(π, μ) = 1.

Example: transmission with a strong prior
• IC(F, μ, 0) = IC(π, μ) = H(1/√n) = o(1).
• IC_ext(F, μ, 0) = IC_ext(π, μ) = 1.
• CC(F^n, μ^n, 0+) = o(n).
• CC(F^n, μ^n, 0) = Ω(n).
• Other examples, e.g. the two-bit AND function, fit into this picture.
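To see the gap in the strong-prior example numerically (a small check, not from the slides), the snippet below evaluates the binary entropy H(1/√n), which is the internal information cost of the "Alice sends X" protocol, while the external information cost stays at 1 bit.

```python
import math

def binary_entropy(p):
    """H(p) = p log(1/p) + (1-p) log(1/(1-p)) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Internal IC of "Alice sends X" under the strong prior Pr[Y = X] = 1 - 1/sqrt(n):
# Bob learns only H(1/sqrt(n)) bits about X, which tends to 0 as n grows.
# External IC is 1 bit: an outside observer learns X completely.
for n in (100, 10_000, 1_000_000):
    eps = 1 / math.sqrt(n)
    print(f"n={n:>9}:  H(1/sqrt(n)) = {binary_entropy(eps):.4f} bits   (IC_ext = 1 bit)")
```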
Additional directions
• Information complexity
• Interactive coding
• Information theory in TCS

Interactive coding theory
• So far we have focused the discussion on noiseless coding.
• What if the channel has noise? [What kind of noise?]
• In the non-interactive case, each channel has a capacity C.

Channel capacity
• The amortized number of channel uses needed to send X over a noisy channel of capacity C is H(X)/C.
• This decouples the task from the channel!

Example: the binary symmetric channel
• Each bit gets independently flipped with probability ε < 1/2.
• One-way capacity: 1 − H(ε).
• [Figure: 0 → 0 and 1 → 1 with probability 1 − ε; 0 → 1 and 1 → 0 with probability ε.]

Interactive channel capacity
• It is not clear that one can decouple the channel from the task in such a clean way.
• Capacity is much harder to calculate/reason about.
• Example: the binary symmetric channel. One-way capacity: 1 − H(ε).
• Interactive capacity (for simple pointer jumping) [Kol-Raz'13]: 1 − Θ(√(H(ε))).

Information theory in communication complexity and beyond
• A natural extension would be to multi-party communication complexity.
• Some success in the number-in-hand case.
• What about the number-on-forehead?
• Explicit bounds for ≥ log n players would imply explicit ACC^0 circuit lower bounds.

Naïve multi-party information cost
• [Figure: three players A, B, C in the number-on-forehead setting; each player sees the other two inputs (YZ, XZ, XY).]
• IC(π, μ) = I(Π; X | YZ) + I(Π; Y | XZ) + I(Π; Z | XY).

Naïve multi-party information cost
• IC(π, μ) = I(Π; X | YZ) + I(Π; Y | XZ) + I(Π; Z | XY).
• Doesn't seem to work.
• Secure multi-party computation [Ben-Or, Goldwasser, Wigderson] means that anything can be computed at near-zero information cost.
• Although, these constructions require the players to share private channels/randomness.

Communication and beyond…
• The rest of today:
– Data structures;
– Streaming;
– Distributed computing;
– Privacy.
• Exact communication complexity bounds.
• Extended formulations lower bounds.
• Parallel repetition?
• …

Thank You!

Open problem: computability of IC
• Given the truth table of F(X, Y), μ and ε, compute IC(F, μ, ε).
• Via IC(F, μ, ε) = lim_{n→∞} CC(F^n, μ^n, ε)/n one can compute a sequence of upper bounds.
• But the rate of convergence as a function of n is unknown.

Open problem: computability of IC
• One can compute the r-round information complexity IC_r(F, μ, ε) of F.
• But the rate of convergence as a function of r is unknown.
• Conjecture: IC_r(F, μ, ε) − IC(F, μ, ε) = O_{F,μ,ε}(1/r²).
• This is the relationship for the two-bit AND function.