An inequality for the binary skew-symmetric broadcast channel and its implications

Chandra Nair
Department of Information Engineering
The Chinese University of Hong Kong
Sha Tin, N.T., Hong Kong
Email: chandra@ie.cuhk.edu.hk

arXiv:0901.1492v1 [cs.IT] 12 Jan 2009

Abstract— We establish an information-theoretic inequality concerning the binary skew-symmetric channel that was conjectured in [1]. This completes the proof that there is a gap between the inner bound [2] and the outer bound [3] for the 2-receiver broadcast channel, and constitutes the first example of a provable gap between the bounds. The interesting aspect of the proof is the technique employed, which is partly motivated by the paper [4].

I. INTRODUCTION

We consider the broadcast channel where a sender X wishes to communicate independent messages M1, M2 to two receivers Y1, Y2. The capacity region of the broadcast channel is an open problem, and the best known inner bound (achievable region) is Marton's inner bound, presented below.

Bound 1: [2] The union of rate pairs (R1, R2) satisfying

  R1 ≤ I(U,W; Y1)
  R2 ≤ I(V,W; Y2)
  R1 + R2 ≤ min{I(W; Y1), I(W; Y2)} + I(U; Y1|W) + I(V; Y2|W) − I(U; V|W)

over all (U,V,W) ↔ X ↔ (Y1, Y2) constitutes an inner bound to the capacity region.

Capacity regions have been established for a number of special cases, and in every case where the capacity region is known, the following outer bound and Marton's inner bound yield the same region.

Bound 2: ([2], [5], [3]) The union of rate pairs (R1, R2) satisfying

  R1 ≤ I(U; Y1)
  R2 ≤ I(V; Y2)
  R1 + R2 ≤ I(U; Y1) + I(V; Y2|U)
  R1 + R2 ≤ I(V; Y2) + I(U; Y1|V)

over all (U,V) ↔ X ↔ (Y1, Y2) constitutes an outer bound to the capacity region.

There is no example where Bounds 1 and 2 have been explicitly evaluated and shown to be different. The difficulty stems from the need to evaluate the bounds explicitly over all choices of the auxiliary random variables.
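For a fixed choice of (U, V), evaluating the constraints of Bound 2 is mechanical; the hard part is the optimization over all auxiliaries. The following sketch (our own illustration, not from the paper; the helper names and the example distribution are assumptions) evaluates the four rate constraints of Bound 2 for one choice of (U, V) on a generic two-receiver channel.

```python
import math

def mi(joint):
    """I(A;B) in bits for a joint pmf given as {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def cmi(joint):
    """I(A;B|C) in bits for a joint pmf given as {(a, b, c): prob}."""
    pc = {}
    for (a, b, c), p in joint.items():
        pc[c] = pc.get(c, 0.0) + p
    total = 0.0
    for c0, w in pc.items():
        cond = {(a, b): p / w for (a, b, c), p in joint.items() if c == c0}
        total += w * mi(cond)
    return total

def bound2(p_uvx, chan1, chan2):
    """Rate constraints of Bound 2 for one fixed choice of (U, V):
    returns (I(U;Y1), I(V;Y2), I(U;Y1)+I(V;Y2|U), I(V;Y2)+I(U;Y1|V))."""
    p_uy1, p_vy2, p_vy2_u, p_uy1_v = {}, {}, {}, {}
    for (u, v, x), p in p_uvx.items():
        for y, q1 in chan1[x].items():
            p_uy1[(u, y)] = p_uy1.get((u, y), 0.0) + p * q1
            p_uy1_v[(u, y, v)] = p_uy1_v.get((u, y, v), 0.0) + p * q1
        for y, q2 in chan2[x].items():
            p_vy2[(v, y)] = p_vy2.get((v, y), 0.0) + p * q2
            p_vy2_u[(v, y, u)] = p_vy2_u.get((v, y, u), 0.0) + p * q2
    return (mi(p_uy1), mi(p_vy2),
            mi(p_uy1) + cmi(p_vy2_u),
            mi(p_vy2) + cmi(p_uy1_v))

# Example: the binary skew-symmetric channel of Figure 1 with the
# (degenerate) choice U = V = X and P(X = 0) = 1/2.
chan_y1 = {0: {0: 0.5, 1: 0.5}, 1: {1: 1.0}}
chan_y2 = {0: {0: 1.0}, 1: {0: 0.5, 1: 0.5}}
p_uvx = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(bound2(p_uvx, chan_y1, chan_y2))
```

With U = V = X all four constraints collapse to I(X;Y1) = I(X;Y2); exhibiting a provable gap requires reasoning over every admissible (U, V), which is what makes an explicit comparison of the bounds hard.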
In [3] the authors studied Bound 2 for the binary skew-symmetric channel (Figure 1) and showed that a line segment of R1 + R2 = 0.3711... lies on the boundary of the outer bound.

[Fig. 1. Binary skew-symmetric channel. The input X is binary. When X = 0, receiver Y1 observes 0 or 1 with probability 1/2 each; when X = 1, Y1 observes 1 with probability 1. Receiver Y2 sees the mirror image: when X = 1 it observes 0 or 1 with probability 1/2 each; when X = 0 it observes 0 with probability 1.]

Much earlier, in [6] the authors studied the Cover–van der Meulen inner bound (the same as Bound 1 except that (U, V, W) are restricted to be independent) for the same channel and showed that a line segment of R1 + R2 = 0.3616... lies on the boundary of that bound. In [1] the authors studied Marton's inner bound for the binary skew-symmetric broadcast channel (BSSC) and showed that, provided an information inequality (1) holds, a line segment of R1 + R2 = 0.3616... lies on the boundary of Marton's inner bound as well. Therefore, if (1) holds, the maximum sum rate of the region characterized by Marton's inner bound would be 0.3616..., while the maximum sum rate of the region characterized by Bound 2 would be 0.3711... This in particular implies that the bounds differ for the binary skew-symmetric channel.

It was conjectured in [1] that for the BSSC, shown in Figure 1, we have

  I(U; Y1) + I(V; Y2) − I(U; V) ≤ max{I(X; Y1), I(X; Y2)},   (1)

for all (U, V) ↔ X ↔ (Y1, Y2). It should be noted that this inequality was established in [6] when U, V are independent, and in [1] for dependent U, V and P(X = 0) ∈ [0, 1/5] ∪ [4/5, 1].

In this paper we establish the conjectured inequality in totality, i.e. we establish that inequality (1) holds for the binary skew-symmetric channel. Therefore, from [1], this establishes that the regions characterized by Bounds 1 and 2 differ for this broadcast channel. This calculation is straightforward and we do not repeat the arguments from [1].

Remark 1: It may be worthwhile to note that the method used here vastly differs from any of the previous approaches in [6], [1].
It was indeed the inability of the author to easily extend any of the existing methods that necessitated the approach adopted in this paper.

Remark 2: While circulating this manuscript the author was informed that Gohari and Anantharam have also established this inequality (under an assumption that Marton's inner bound matches the capacity region). Their communication suggested that the technique used was the following: they showed that one can restrict the cardinalities of U and V to be binary, and then carried out numerical simulations to verify the inequality. It is not known whether Marton's inner bound matches the capacity region for the BSSC; however, it is very likely that such is the case.

In [4] the authors compared the sizes of the images of a set via two channels and used this to establish the capacity region of a broadcast channel with degraded message sets. In this paper we compare the sizes of the intersections of the images of two codewords via the two channels of the BSSC to obtain a necessary ingredient of the proof. The proof is unusual and may have deeper implications that are yet to be explored.

The outline of the proof is as follows:

(i) Using the random coding strategy developed by Marton, we will show that for any broadcast channel there exists a sequence of (2^{nR1} × 2^{nR2})-codebooks with R1 = I(U; Y1) − ε and R2 = I(V; Y2) − I(U; V) − ε (for any ε > 0), such that each codebook satisfies two properties, A and B (to be defined). Properties A and B are related to the sizes of decoding error events.

(ii) Separately, we will show that for any broadcast channel, if a (2^{nR1} × 2^{nR2})-codebook satisfies Properties A and C, then the codebook can also be considered as a single (2^{n(R1+R2)})-codebook for the channel X → Y1 whose average probability of error tends to zero. That is, receiver Y1 can distinguish both messages M1 and M2 with high probability.
This implies, from the converse of the channel coding theorem (or the maximal code lemma in [4]), that R1 + R2 ≤ I(X; Y1) + δ for any δ > 0 (for n sufficiently large), where the mutual information is computed with respect to p(x), the empirical distribution of the code.

(iii) For the BSSC we will show that Property B implies Property C (when P(X = 0) < 1/2) for n sufficiently large. Hence, combining the earlier two results (and letting ε, δ → 0), we will obtain

  I(U; Y1) + I(V; Y2) − I(U; V) ≤ I(X; Y1).

II. PRELIMINARIES

Consider a codebook {x^n(i,j)}, i = 1,...,M; j = 1,...,N, for the broadcast channel X → (Y1, Y2). We define the following sets of sequences in Y1^n:

• C_{(i,j)}: the set of output sequences y1^n that are typical with respect to x^n(i,j).
• C_i: the set of output sequences y1^n that are typical with at least one of x^n(i,1),...,x^n(i,N), i.e. C_i = ∪_{j=1}^{N} C_{(i,j)}.

Similarly, we define the corresponding sets in Y2^n as follows:

• D_{(i,j)}: the set of output sequences y2^n that are typical with respect to x^n(i,j).
• D_j: the set of output sequences y2^n that are typical with at least one of x^n(1,j),...,x^n(M,j), i.e. D_j = ∪_{i=1}^{M} D_{(i,j)}.

Remark 3: We use the definition of typical output sequences found in [4]. An output sequence y^n is deemed typical with respect to a transmitted sequence x^n over a channel characterized by the transition probability p(y|x) if the following holds:

• |N(a,b) − N(a)p(b|a)| ≤ ω_n for all a ∈ X, b ∈ Y, where N(a,b) counts the number of locations i with x(i) = a, y(i) = b, and N(a) counts the number of locations i with x(i) = a. Further, ω_n is a fixed sequence that satisfies ω_n/√n → ∞ and ω_n/n → 0 as n → ∞. When p(b|a) = 0, one also requires that N(a,b) = 0 for y^n to be considered typical with respect to x^n.

We now define the Properties A, B, and C that we referred to in the outline.

Definition 1: A codebook is said to have Property A if the following condition holds. Let ε > 0 be an arbitrary given number.
Then, we require

  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} Σ_{i1 ≠ i} P(y1^n ∈ C_{i1} ∩ C_i | x^n(i,j) is transmitted) < ε.

Definition 2: A codebook is said to have Property B if the following condition holds. Let ε > 0 be an arbitrary given number. Then, we require

  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} Σ_{j1 ≠ j} P(y2^n ∈ D_{(i,j1)} ∩ D_{(i,j)} | x^n(i,j) is transmitted) < ε.   (2)

Definition 3: A codebook is said to have Property C if the following condition holds. Let ε > 0 be an arbitrary given number. Then, we require

  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} Σ_{j1 ≠ j} P(y1^n ∈ C_{(i,j1)} ∩ C_{(i,j)} | x^n(i,j) is transmitted) < ε.

The motivation for defining these properties will become clear later. However, the following relation is also worth mentioning. Observe that the average probability of error for receiver 1 is related to the size of

  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} P(y1^n ∈ ∪_{i1 ≠ i} C_{i1} ∩ C_i | x^n(i,j) is transmitted).

In traditional random-coding arguments the union bound (i.e. replacing the union by a sum) is used to limit the size of this error event. Property A subsumes the union bound in its very definition, and hence random coding arguments will suffice to show the existence of codes satisfying Properties A and B.

III. MAIN

We restate the main result.

Theorem 1: For the binary skew-symmetric broadcast channel shown in Figure 1, when P(X = 0) ≤ 1/2, we have

  I(U; Y1) + I(V; Y2) − I(U; V) ≤ I(X; Y1),   (3)

for all (U, V) ↔ X ↔ (Y1, Y2).

The result for P(X = 0) ≥ 1/2 follows by symmetry, interchanging the roles of receivers Y1 and Y2. (Note that P(X = 0) ≤ 1/2 corresponds to the region where I(X; Y1) ≥ I(X; Y2).)

Before we proceed to the proof, we present the key lemma that is particular to the binary skew-symmetric broadcast channel. (This corresponds to step (iii) in the outline of the proof.)
Lemma 1: For any two sequences x^n(i,j) and x^n(i1,j1) that belong to the same type with P(X = 0) < 1/2, and for n sufficiently large,

  P(y1^n ∈ C_{(i1,j1)} ∩ C_{(i,j)} | x^n(i,j) is transmitted) ≤ P(y2^n ∈ D_{(i1,j1)} ∩ D_{(i,j)} | x^n(i,j) is transmitted).

Proof: The proof of this comparison lemma is provided in the Appendix.

Corollary 1: If a codebook for the BSSC, with empirical distribution P(X = 0) < 1/2, satisfies Property B, then the codebook also satisfies Property C.

Proof: The proof follows from Lemma 1 and the definitions of Properties B and C.

A. Revisiting Marton's coding scheme

The first step of the proof of Theorem 1 is to show the existence of a codebook with R1 = I(U; Y1) − ε and R2 = I(V; Y2) − I(U; V) − ε that satisfies Properties A and B. Consider the random coding scheme involving random binning proposed by Marton [2], which achieves the corner point R1 = I(U; Y1) − ε, R2 = I(V; Y2) − I(U; V) − ε.

Let us recall the code-generation strategy. Given a p(u,v), one generates 2^{nR1} sequences U^n i.i.d. according to Π_i p(u_i), and independently generates 2^{nT2} sequences V^n i.i.d. according to Π_i p(v_i). Here one can take T2 = I(V; Y2) − 0.5ε. The sequences V^n are randomly and uniformly binned into 2^{nR2} bins. Each bin therefore contains roughly 2^{n(I(U;V)+0.5ε)} sequences V^n. For every pair (i,j) we select a sequence V^n from the bin numbered j that is jointly typical with U^n(i) (declaring an error if no such V^n exists) and generate a sequence X^n(i,j) according to Π_i p(x_i|u_i,v_i). Marton [2] showed that with high probability^1 there will be at least one sequence V^n in the bin numbered j that is jointly typical with U^n(i), and further that the triple (U^n(i), V^n(j), X^n(i,j)) is jointly typical. Here the probability is over the uniformly random choice of the pair (i,j) as well as over the choice of the codebooks.

^1 An event E_n is said to have high probability if P(E_n) ≥ 1 − a_n where a_n → 0 as n → ∞.

Let us now compute the probabilities of the events (averaged over the choice of the codebooks) corresponding to Properties A and B, respectively. Since x^n(i1,j1) is jointly typical with u^n(i1) with high probability, and Y1 ↔ X ↔ U is a Markov chain, it follows (from the strong Markov property) that if y1^n is jointly typical with x^n(i1,j1) then y1^n is also jointly typical with u^n(i1). Therefore (ignoring the error events already considered by Marton [2], such as atypical sequences being generated, or bins j having no V^n jointly typical with a given U^n(i)),

  P(y1^n ∈ C_{i1} ∩ C_i, x^n(i,j) is transmitted)
    ≤ P(y1^n jointly typical with u^n(i1), x^n(i,j) is transmitted)
    ≤ (1/MN) 2^{−n(I(U;Y1)−0.5ε)}.

Since u^n(i1) is generated independently of x^n(i,j) (when i1 ≠ i), we see that

  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} Σ_{i1 ≠ i} P(y1^n ∈ C_{i1} ∩ C_i | x^n(i,j) is transmitted)
    ≤ 2^{nR1} 2^{−n(I(U;Y1)−0.5ε)} ≤ 2^{−0.5nε}.

Therefore Property A is satisfied with high probability over the choice of the codebooks.

Similar reasoning (again ignoring the events already considered by Marton [2]) implies that

  P(y2^n ∈ D_{(i,j1)} ∩ D_{(i,j)}, x^n(i,j) is transmitted) ≤ P(y2^n jointly typical with v^n(k), x^n(i,j) is transmitted).

Here v^n(k) refers to the v^n sequence (belonging to bin j1) that was used in the generation of x^n(i,j1). Let v^n(1) be the v^n sequence used for x^n(i,j). Since v^n(k), k ≠ 1, is generated independently of x^n(i,j), we see that

  Σ_{k ≠ 1} P(y2^n jointly typical with v^n(k), x^n(i,j) is transmitted)
    ≤ (1/MN) 2^{n(I(V;Y2)−0.5ε)} 2^{−n(I(V;Y2)−0.25ε)}
    ≤ (1/MN) 2^{−0.25nε}.

This implies

  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} Σ_{j1 ≠ j} P(y2^n ∈ D_{(i,j1)} ∩ D_{(i,j)} | x^n(i,j) is transmitted) < ε.

Therefore Property B is satisfied with high probability over the choice of the codebooks. Since the intersection of two high-probability events is also a high-probability event, Properties A and B are simultaneously satisfied with high probability over the choice of the codebooks.
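The bin-occupancy bookkeeping in the scheme above can be illustrated with a toy simulation: throwing the V-sequences uniformly into the bins leaves each bin with about 2^{n(T2−R2)} = 2^{n(I(U;V)+0.5ε)} candidates. The sketch below is our own illustration; the concrete sizes are arbitrary stand-ins, not parameters from the paper.

```python
import random

# Toy illustration of random binning: NUM_SEQUENCES stands in for the
# 2^{nT2} V-sequences and NUM_BINS for the 2^{nR2} bins.
random.seed(0)
NUM_SEQUENCES = 1 << 14
NUM_BINS = 1 << 8

bins = [[] for _ in range(NUM_BINS)]
for v_index in range(NUM_SEQUENCES):
    bins[random.randrange(NUM_BINS)].append(v_index)

sizes = [len(b) for b in bins]
expected = NUM_SEQUENCES / NUM_BINS   # 2^{n(T2 - R2)} in the analogy
print(min(sizes), expected, max(sizes))
# Every bin is non-empty, so for each pair (i, j) the encoder has
# candidate V-sequences in bin j to test for joint typicality with U^n(i).
assert min(sizes) > 0
```

With these sizes the probability that some bin is empty is roughly NUM_BINS · e^{−expected}, which is astronomically small; this mirrors why, at the chosen rates, a jointly typical V^n is found in the correct bin with high probability.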
Therefore, there exists at least one codebook with rates R1 = I(U; Y1) − ε and R2 = I(V; Y2) − I(U; V) − ε that satisfies both Properties A and B.

B. Converting a broadcast channel codebook to a single-receiver codebook

The second step of the proof of Theorem 1 is to show that if a codebook satisfies Properties A and C, then receiver Y1 can decode both messages (M1, M2) with very small average probability of error.

Claim 1: Suppose a codebook satisfies Properties A and C, i.e.

  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} Σ_{i1 ≠ i} P(y1^n ∈ C_{i1} ∩ C_i | x^n(i,j) is transmitted) < ε,
  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} Σ_{j1 ≠ j} P(y1^n ∈ C_{(i,j1)} ∩ C_{(i,j)} | x^n(i,j) is transmitted) < ε.

Then there exists a decoding map (M̂1, M̂2) for receiver Y1 such that the average probability of error is smaller than 3ε.

Proof: Consider the following decoding map. If y1^n ∈ F_{(i,j)} := C_i \ (E_i ∪ E_{(i,j)}), where E_i = ∪_{i1 ≠ i} C_i ∩ C_{i1} and E_{(i,j)} = ∪_{j1 ≠ j} C_{(i,j)} ∩ C_{(i,j1)}, then set (M̂1, M̂2) = (i, j). It is straightforward to check that the sets F_{(i,j)} are disjoint.

The average probability of decoding error is

  (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} P(y1^n ∉ F_{(i,j)} | x^n(i,j) is transmitted)
    = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} [ P(y1^n ∉ C_i | x^n(i,j) is transmitted) + P(y1^n ∈ E_i ∪ E_{(i,j)} | x^n(i,j) is transmitted) ]
    ≤ (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} [ ε + P(y1^n ∈ E_i | x^n(i,j) is transmitted) + P(y1^n ∈ E_{(i,j)} | x^n(i,j) is transmitted) ]
    ≤ (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} [ ε + Σ_{i1 ≠ i} P(y1^n ∈ C_{i1} ∩ C_i | x^n(i,j) is transmitted) + Σ_{j1 ≠ j} P(y1^n ∈ C_{(i,j1)} ∩ C_{(i,j)} | x^n(i,j) is transmitted) ]
    ≤ 3ε.

Therefore any (R1, R2)-codebook for the broadcast channel X → (Y1, Y2) satisfying Properties A and C forms a rate-(R1 + R2) codebook for the channel X → Y1.

From Corollary 1 we know that if Property B is satisfied and P(X = 0) < 1/2, then Property C is satisfied. In Section III-A we have shown the existence of a codebook with rates R1 = I(U; Y1) − ε and R2 = I(V; Y2) − I(U; V) − ε that satisfies both Properties A and B. Therefore, from Claim 1 and Corollary 1, we have a rate R = R1 + R2 = I(U; Y1) + I(V; Y2) − I(U; V) − 2ε code for the channel X → Y1. Clearly, from the converse of the channel coding theorem, we require R ≤ I(X; Y1). Combining the above two results, we have the inequality (letting ε → 0)

  I(U; Y1) + I(V; Y2) − I(U; V) ≤ I(X; Y1), when P(X = 0) < 1/2.   (4)

By symmetry in the channel, interchanging the roles of Y1 and Y2, we also obtain the result

  I(U; Y1) + I(V; Y2) − I(U; V) ≤ I(X; Y2), when P(X = 0) > 1/2.   (5)

The result for P(X = 0) = 1/2 follows from continuity. This completes the proof of Theorem 1.

IV. DISCUSSION

The proof of the inequality presented in this paper completes the argument in [1] and shows that the regions obtained from Bounds 1 and 2 are indeed different for the binary skew-symmetric channel. Interestingly, this inequality also establishes that Marton's inner bound for the BSSC without the random variable W reduces to the time-division region.

The capacity region of the BSSC remains unknown. It is natural to believe at first sight that time-division is optimal for the BSSC. However, as was shown in [6], one can improve on the time-division region using the "common information" random variable W. In [3] it was shown that this improved region can be re-interpreted as a randomized time-division strategy, thus reaffirming the belief that some form of time-division is optimal for the BSSC. The comparison property (Lemma 1) seems to give a tool to quantify this intuition in a mathematical framework.

It would be very interesting to see if similar reasoning can be used to derive an outer bound for the BSSC that matches Marton's inner bound (at least in the sum rate). Essentially, this paper proves that the rates of any codebook for the BSSC that satisfies Properties A and B must lie within the time-division region. Marton's random coding scheme with U, V is just one way to obtain such a codebook, while the codebooks obtained by Marton's coding scheme with U, V, W do not satisfy Properties A and B simultaneously.

It would also be interesting to obtain a routine proof of this inequality. Hopefully a routine proof may lead to a better understanding of the types of information inequalities possible under a specific channel coding setting. (See Remark 2 on a new development along these lines.)

ACKNOWLEDGEMENT

The author is grateful to Vincent Wang for the extensive numerical simulations he performed, which formed the basis of the conjecture in [1]. The author is also grateful for several discussions he had with Profs. Bruce Hajek and Abbas El Gamal.

REFERENCES

[1] C. Nair and V. W. Zizhou, "On the inner and outer bounds for 2-receiver discrete memoryless broadcast channels," Proceedings of the ITA Workshop, 2008, cs.IT/0804.3825.
[2] K. Marton, "A coding theorem for the discrete memoryless broadcast channel," IEEE Trans. Info. Theory, vol. IT-25, pp. 306–311, May 1979.
[3] C. Nair and A. El Gamal, "An outer bound to the capacity region of the broadcast channel," IEEE Trans. Info. Theory, vol. IT-53, pp. 350–355, January 2007.
[4] J. Körner and K. Marton, "Images of a set via two channels and their role in multi-user communication," IEEE Trans. Info. Theory, vol. IT-23, pp. 751–761, November 1977.
[5] A. El Gamal, "The capacity of a class of broadcast channels," IEEE Trans. Info. Theory, vol. IT-25, pp. 166–169, March 1979.
[6] B. Hajek and M. Pursley, "Evaluation of an achievable rate region for the broadcast channel," IEEE Trans. Info. Theory, vol. IT-25, pp. 36–46, January 1979.

APPENDIX

Recalling Lemma 1, consider two sequences x^n(1) and x^n(2) that belong to the same type (a, 1−a), where a = m/n for some integer m and 0 ≤ a < 1/2. Let A denote the intersection of the sets of typical sequences of x^n(1) and x^n(2) in the space Y1^n, and let B denote the intersection of the sets of typical sequences of x^n(1) and x^n(2) in the space Y2^n. Then Lemma 1 states that

  P(y1^n ∈ A | x^n(1) is transmitted) ≤ P(y2^n ∈ B | x^n(1) is transmitted).

Proof: Let k = nb be the number of locations i where X_i(1) = 0 and X_i(2) = 1. The typical sequences in A are those that have na/2 ± ω_n zeros in the n(a−b) locations where x_i(1) = x_i(2) = 0, and ones everywhere else. The typical sequences in B are those that have n(1−a)/2 ± ω_n ones in the n(1−a−b) locations where x_i(1) = x_i(2) = 1, and zeros everywhere else. (Recall ω_n from the definition of typical sequences in Remark 3.)

The number of sequences in A is Σ_{t=−ω_n}^{+ω_n} C(n(a−b), na/2 + t), and the number of sequences in B is Σ_{t=−ω_n}^{+ω_n} C(n(1−a−b), n(1−a)/2 + t), where C(·,·) denotes the binomial coefficient.

Note that when b > a/2 the set A is essentially empty (by essentially empty we mean that its size is sub-exponential). Therefore the lemma is trivially valid when b > a/2. The interesting case occurs when 0 ≤ b ≤ a/2.

The probability of obtaining any given sequence in A is 2^{−na+o(n)}, and the probability of obtaining any given sequence in B is 2^{−n(1−a)+o(n)}, when x^n(1) is transmitted. Therefore,

  P(A | x^n(1)) = 2^{n(a−b)H(a/(2(a−b))) − na + o(n)},
  P(B | x^n(1)) = 2^{n(1−a−b)H((1−a)/(2(1−a−b))) − n(1−a) + o(n)}.

Thus, for large n, it suffices to show that for 0 ≤ b ≤ a/2 ≤ 1/4,

  (a−b) H(a/(2(a−b))) − a ≤ (1−a−b) H((1−a)/(2(1−a−b))) − (1−a),

or, rewriting, one needs to establish

  1 − 2a ≤ (1−a−b) H((1−a)/(2(1−a−b))) − (a−b) H(a/(2(a−b))).   (6)

Observe that when b = 0 we have equality; therefore it suffices to show that the right-hand side is an increasing function of b for 0 ≤ b ≤ a/2.

Define the function g(x,b) = (x−b) H(x/(2(x−b))). Then

  ∂g(x,b)/∂b = log2( (x−2b)/(2(x−b)) ) = log2( 1 − x/(2(x−b)) ).

Since ∂g(x,b)/∂b is an increasing function of x, we have

  ∂g(1−a,b)/∂b ≥ ∂g(a,b)/∂b, for 0 ≤ a ≤ 1/2,

and hence the right-hand side of (6) is an increasing function of b. This proves inequality (6) and completes the proof of Lemma 1.
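The two analytic facts the appendix rests on, inequality (6) and the closed form for ∂g/∂b, are easy to check numerically. The following sketch is our own illustration (the grid resolution and finite-difference step are arbitrary choices, not from the paper).

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def g(x, b):
    """g(x, b) = (x - b) * H(x / (2(x - b))) from the appendix."""
    return (x - b) * h2(x / (2 * (x - b)))

def rhs_of_6(a, b):
    """Right-hand side of inequality (6): g(1-a, b) - g(a, b)."""
    return g(1 - a, b) - g(a, b)

# Check (6) on a grid: 1 - 2a <= g(1-a, b) - g(a, b) for 0 <= b <= a/2.
for ia in range(1, 50):
    a = ia / 100.0                 # a in (0, 1/2)
    for ib in range(0, 51):
        b = (ib / 50.0) * (a / 2)  # b in [0, a/2]
        assert 1 - 2 * a <= rhs_of_6(a, b) + 1e-9

# Spot-check the closed form for dg/db against a central finite difference.
a, b, d = 0.3, 0.1, 1e-6
numeric = (g(a, b + d) - g(a, b - d)) / (2 * d)
closed = math.log2((a - 2 * b) / (2 * (a - b)))
assert abs(numeric - closed) < 1e-4
print("inequality (6) and the derivative formula check out numerically")
```

At b = 0 the two sides of (6) agree exactly (both entropy arguments equal 1/2), which matches the equality observation in the proof; the grid check confirms the monotonicity-in-b argument numerically.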