EE376A Information Theory
Lecture 9: Polar Codes
Mert Pilanci, Stanford University
February 5, 2019

Outline
- Channel coding and capacity
- Polar code construction
- Decoding
- Theoretical analysis
- Extensions

Channel coding
- Entropy: $H(U) = E\left[\log \frac{1}{p(U)}\right] = -\sum_u p(u) \log p(u)$
- Conditional entropy: $H(X|Y) = E\left[\log \frac{1}{p(X|Y)}\right] = \sum_y p(y)\, H(X|Y=y)$
- Mutual information: $I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)$

Channel capacity
- The channel capacity $C$ is the maximal rate of reliable communication over a memoryless channel characterized by $P(Y|X)$.
- Theorem: $C = \max_{P_X} I(X;Y)$

Capacity of the binary erasure channel (BEC)
[Figure: transition diagrams of BSC($\epsilon$) and BEC($\epsilon$), two examples of channels with input-output symmetry. In BSC($\epsilon$) each input bit is flipped with probability $\epsilon$; in BEC($\epsilon$) each input bit is received correctly with probability $1-\epsilon$ and erased (output "?") with probability $\epsilon$.]

$$I(X;Y) = H(X) - H(X|Y) = H(X) - \epsilon H(X) - 0 \cdot P(Y=0) - 0 \cdot P(Y=1) = (1-\epsilon) H(X)$$

Picking $X \sim \mathrm{Ber}(\tfrac{1}{2})$, we have $H(X) = 1$. Thus, the capacity of the BEC is $C = 1 - \epsilon$.

Channel coding
- Rate: $\frac{\log M}{n}$ bits/channel use
- Probability of error: $P_e = P(\hat{J} \neq J)$
- If $R < \max_{P_X} I(X;Y)$, then rate $R$ is achievable, i.e., there exist schemes with rate $\geq R$ and $P_e \to 0$.
- If $R > \max_{P_X} I(X;Y)$, then $R$ is not achievable.
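The BEC capacity calculation above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names (`binary_entropy`, `bec_mutual_information`, `bec_capacity_grid`) are hypothetical, and the maximization over the input distribution is done on a simple grid rather than analytically.

```python
import math

def binary_entropy(q):
    """Binary entropy H(q) in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def bec_mutual_information(q, eps):
    """I(X;Y) in bits for X ~ Ber(q) sent over BEC(eps), from the joint pmf."""
    # Joint distribution p(x, y); the output alphabet is {0, 1, '?'}.
    joint = {
        (0, 0): (1 - q) * (1 - eps),
        (0, "?"): (1 - q) * eps,
        (1, 1): q * (1 - eps),
        (1, "?"): q * eps,
    }
    px = {0: 1 - q, 1: q}
    py = {}
    for (_, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:
            mi += p * math.log2(p / (px[x] * py[y]))
    return mi

def bec_capacity_grid(eps, steps=1000):
    """Grid search over q for max I(X;Y); returns (best_q, best_value)."""
    candidates = [k / steps for k in range(steps + 1)]
    best_q = max(candidates, key=lambda q: bec_mutual_information(q, eps))
    return best_q, bec_mutual_information(best_q, eps)

if __name__ == "__main__":
    q_star, capacity = bec_capacity_grid(0.3)
    print(q_star, capacity)
```

For any input bias `q`, the computed value agrees with $(1-\epsilon)H(q)$, and the grid search picks the uniform input, consistent with $C = 1 - \epsilon$.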
Main result: the maximum rate of reliable communication is $C = \max_{P_X} I(X;Y)$.

Today: Polar Codes
- Invented by Erdal Arikan in 2009
- First code with an explicit construction to provably achieve the channel capacity
- Nice structure with efficient encoding/decoding operations
- We will assume that the channel is symmetric, i.e., the uniform input distribution achieves capacity

Basic 2 × 2 transformation
[Figure: inputs $U_1$, $U_2$; an XOR gate produces $X_1$ from $U_1$ and $U_2$, and $U_2$ passes through unchanged as $X_2$.]
$U_1, U_2, X_1, X_2 \in \{0, 1\}$ are binary variables (in GF(2)).

$$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} U_1 \\ U_2 \end{bmatrix} \bmod 2$$

or equivalently $X_1 = U_1 \oplus U_2$ and $X_2 = U_2$.

Properties of $G_2$
Define

$$G_2 := \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix}, \qquad X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix},$$

then we have $X = G_2 U$. With $G_2^2 := G_2 G_2$,

$$G_2 G_2 U = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} U_1 \\ U_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} U_1 \oplus U_2 \\ U_2 \end{bmatrix} = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix},$$

so applying $G_2$ twice returns the original input: $G_2$ is its own inverse over GF(2).

Erasure channel
[Figure: transition diagram of BEC($\epsilon$): inputs 0 and 1 are received correctly with probability $1-\epsilon$ and erased ("?") with probability $\epsilon$.]

Naively combining erasure channels
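As a quick check of the 2 × 2 transform $G_2$ defined above, the sketch below enumerates all four inputs and confirms that the matrix form agrees with $X_1 = U_1 \oplus U_2$, $X_2 = U_2$, and that applying $G_2$ twice recovers the input. The helper names (`matvec_gf2`, `polar_transform_2x2`, `check_g2`) are illustrative and not taken from the slides.

```python
from itertools import product

# The 2 x 2 kernel from the slides, with rows acting on the column vector (U1, U2).
G2 = [[1, 1],
      [0, 1]]

def matvec_gf2(matrix, vector):
    """Matrix-vector product with all arithmetic taken modulo 2."""
    return [sum(a * b for a, b in zip(row, vector)) % 2 for row in matrix]

def polar_transform_2x2(u1, u2):
    """Direct form of the transform: X1 = U1 xor U2, X2 = U2."""
    return [u1 ^ u2, u2]

def check_g2():
    """Verify X = G2 U and G2 G2 U = U for every binary input pair."""
    for u1, u2 in product((0, 1), repeat=2):
        x = matvec_gf2(G2, [u1, u2])
        assert x == polar_transform_2x2(u1, u2)
        assert matvec_gf2(G2, x) == [u1, u2]
    return True

if __name__ == "__main__":
    print(check_g2())
```

Because the transform is invertible, no information is lost in going from $(U_1, U_2)$ to $(X_1, X_2)$, which is the property the capacity-conservation argument relies on.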
- Repetition coding
[Figure: the same bit $U_1$ is sent over both copies of $W$, producing outputs $Y_1$ and $Y_2$.]

Combining two erasure channels
[Figure: $U_1$ and $U_2$ pass through the 2 × 2 transform; $X_1 = U_1 \oplus U_2$ and $X_2 = U_2$ are sent over two copies of $W$, producing $Y_1$ and $Y_2$.]
An invertible transformation does not alter capacity: $I(U; Y) = I(X; Y)$.

Sequential decoding
- First bit-channel $W_1 : U_1 \to (Y_1, Y_2)$, with $U_2$ random: $C(W_1) = I(U_1; Y_1, Y_2)$
- Second bit-channel $W_2 : U_2 \to (Y_1, Y_2, U_1)$

Capacity is conserved
$$C(W_1) + C(W_2) = C(W) + C(W) = 2C(W)$$
$$C(W_1) \leq C(W) \leq C(W_2)$$

Polarization process
[Figure: binary tree of erasure probabilities.]
$$\epsilon \;\to\; \left\{\, 2\epsilon - \epsilon^2,\ \ \epsilon^2 \,\right\}$$
$$2\epsilon - \epsilon^2 \;\to\; \left\{\, 2(2\epsilon - \epsilon^2) - (2\epsilon - \epsilon^2)^2,\ \ (2\epsilon - \epsilon^2)^2 \,\right\}$$
$$\epsilon^2 \;\to\; \left\{\, 2(\epsilon^2) - (\epsilon^2)^2,\ \ (\epsilon^2)^2 \,\right\}$$

A familiar update rule
Let $e_t$ be i.i.d. uniform $\pm 1$ for $t = 1, 2, \ldots$
$$w_{t+1} = w_t + e_t\, w_t (1 - w_t)$$
With $w_t = \epsilon$, the choice $e_t = +1$ gives $2\epsilon - \epsilon^2$ and $e_t = -1$ gives $\epsilon^2$.

Martingales
$$E[w_{t+1} \mid w_t] = w_t$$
- Doob's martingale convergence theorem (informal): bounded martingale processes converge to a limiting random variable $w_\infty$ such that $E[|w_t - w_\infty|] \to 0$.

Non-convergent paths
- Down - Up - Down - Up ...
$$\epsilon \searrow \epsilon^2 \nearrow 2\epsilon^2 - \epsilon^4 = \epsilon \quad \text{if } \epsilon = \frac{\sqrt{5}}{2} - \frac{1}{2} = \frac{1}{\varphi} \approx 0.61803398875$$
Golden ratio: $\varphi := \frac{1 + \sqrt{5}}{2} \approx 1.61803398875$
[Figure: images of the golden ratio in nature.]

Polarization theorem
Theorem: The bit-channel capacities $\{C(W_i)\}$ polarize: for any $\delta \in (0, 1)$, as the construction size $N$ grows, the number of
channels with $C(W_i) > 1 - \delta$, divided by $N$, tends to $C(W)$, and the number of channels with $C(W_i) < \delta$, divided by $N$, tends to $1 - C(W)$.

Freezing noisy channels
[Figure sequence: the noisy bit-channels are frozen and the remaining bit-channels are left free to carry data.]

Encoding
[Figure: encoding example with $N = 8$. The input labels, in order, are: frozen 0, frozen 0, frozen 0, free 1, frozen 0, free 1, free 0, free 1. The inputs pass through the XOR network and the resulting codeword is sent over eight copies of $W$, producing $Y_1, \ldots, Y_8$.]

Polarization of general channels
[Figure: $U_1$ and $U_2$ combined by the 2 × 2 transform and sent over two copies of $W$, producing $Y_1$ and $Y_2$.]
$$W^-(y_1, y_2 \mid u_1) = \frac{1}{2} \sum_{u_2} W_1(y_1 \mid u_1 \oplus u_2)\, W_2(y_2 \mid u_2)$$
$$W^+(y_1, y_2, u_1 \mid u_2) = \frac{1}{2}\, W_1(y_1 \mid u_1 \oplus u_2)\, W_2(y_2 \mid u_2)$$
$$I(W^-) + I(W^+) = I(W) + I(W) = 2I(W)$$
- Mrs. Gerber's Lemma: if $I(W) = 1 - H(p)$, then $I(W^+) - I(W^-) \geq 2\left[H(2p(1-p)) - H(p)\right]$, with equality when $W$ is a BSC($p$).

General Polar Construction
[Figure sequence: recursive construction of the size-$N$ polar transform.]

Successive Cancellation Decoder
[Figure sequence: step-by-step run of the successive cancellation decoder.]

Polar Coding Theorem
Theorem: For any rate $R < I(W)$ and block-length $N$, the probability of frame error for polar codes under successive cancellation decoding is bounded as
$$P_e(N, R) = o\left(2^{-\sqrt{N} + o(\sqrt{N})}\right)$$

Improved decoders
- List decoder (Tal and Vardy, 2011)
  - First produce $L$ candidate decisions
  - Pick the most likely word from the list
  - Complexity $O(LN \log N)$

List decoder
Tal-Vardy list
decoder performance (figure): length $n = 2048$, rate $R = 0.5$, BPSK-AWGN channel, list size $L$.

Polar Coding Summary
Given $W$, $N = 2^n$, and $R < I(W)$, a polar code can be constructed such that it has
- construction complexity $O(N\, \mathrm{poly}(\log N))$,
- encoding complexity $\approx N \log N$,
- successive-cancellation decoding complexity $\approx N \log N$,
- frame error probability $P_e(N, R) = o\left(2^{-\sqrt{N} + o(\sqrt{N})}\right)$.

5G Communications
- The jump from 4G to 5G is far larger than any previous jump (2G to 3G, or 3G to 4G).
- The global 5G market is expected to reach a value of 251 Bn by 2025.
- In 2016, a downlink speed of 27 Gbps was reached using polar codes.
- Current LTE download speed is 5-12 Mbps.
- In November 2016, 3GPP agreed to adopt polar codes for control channels in 5G. LDPC codes will also be used in data channels.

References
- E. Arikan, "Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels," IEEE Transactions on Information Theory, 2009.
- E. Arikan, "Polar Coding Tutorial," Simons Institute, UC Berkeley, 2015.
- B. C. Geiger, "The Fractality of Polar and Reed–Muller Codes," Entropy, 2018.
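As a closing illustration of the construction covered in the lecture, the sketch below applies the BEC polarization recursion ($\epsilon \to 2\epsilon - \epsilon^2$ and $\epsilon^2$) for $n$ levels and then frees the least noisy bit-channels. It is a sketch under stated assumptions rather than the lecture's own code: the function names are hypothetical, indices are 0-based, and each parent channel is assumed to place its worse child before its better child in the index order.

```python
def bec_bit_channel_erasures(eps, n):
    """Erasure probabilities of the N = 2**n bit-channels built from BEC(eps).

    Assumed ordering: each parent contributes its worse ("minus") child
    first and its better ("plus") child second.
    """
    probs = [eps]
    for _ in range(n):
        next_probs = []
        for z in probs:
            next_probs.append(2 * z - z * z)  # worse channel: 2z - z^2
            next_probs.append(z * z)          # better channel: z^2
        probs = next_probs
    return probs

def choose_free_indices(probs, k):
    """0-based indices of the k bit-channels with the smallest erasure probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i])
    return sorted(ranked[:k])

def polarized_fractions(probs, delta):
    """Fractions of bit-channels with capacity > 1 - delta and with capacity < delta."""
    n_total = len(probs)
    good = sum(1 for z in probs if 1 - z > 1 - delta) / n_total
    bad = sum(1 for z in probs if 1 - z < delta) / n_total
    return good, bad

if __name__ == "__main__":
    probs = bec_bit_channel_erasures(0.5, 3)
    print(choose_free_indices(probs, 4))  # → [3, 5, 6, 7]
    print(polarized_fractions(bec_bit_channel_erasures(0.5, 16), 0.01))
```

For $N = 8$ and $\epsilon = 0.5$ the selected 0-based indices 3, 5, 6, 7 correspond to inputs 4, 6, 7, 8, which matches the free positions in the encoding figure. The total capacity $\sum_i (1 - z_i)$ stays equal to $N(1 - \epsilon)$ at every level, in line with the capacity-conservation slide.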