Asymptotic Enumerators of Protograph LDPCC Ensembles
Jeremy Thorpe
Joint work with Bob McEliece and Sarah Fogal

Outline
- Motivation: "bad sets"
  - What causes error floors in codes?
  - How can we predict error floors in advance of simulation?
- Codewords, stopping sets, and other sets
- Protograph ensembles and asymptotic enumerators
- Computation of enumerators

Why error-correcting codes?
Error-correcting codes are designed to transmit information as efficiently as possible, with very low probability of error.

What is an error floor?
An error floor occurs when the probability of error does not improve "fast enough" as the channel improves.

What's going on?
Error floors occur when the typical mode of failure changes. In the "waterfall" region, performance depends on global channel statistics. In the "error floor" region, performance depends on the channel output near:
- low-weight codewords
- low-weight stopping sets [FKV 98]
- low-weight "trapping sets" [R. 04]

Can we predict error floors without simulation?
We would like to predict the "bad sets": low-weight codewords, low-weight stopping sets, and low-weight "trapping sets". This is difficult to do for particular codes. For ensembles of codes, however, the problem becomes feasible, and it has been solved for:
- the codeword weight enumerator of regular ensembles [G 63]
- the codeword weight enumerator of unstructured irregular ensembles [LS 98]
- the codeword and stopping-set weight enumerators of unstructured irregular ensembles [Di 04]

Weight Enumerators Defined

Code basics
Recall that a code is a set of vectors of length n. For example, the (7,4) Hamming code is

C = { 0000000, 0000111, 0011001, 0011110, 0101010, 0101101, 0110011, 0110100,
      1111111, 1111000, 1100110, 1100001, 1010101, 1010010, 1001100, 1001011 }

Linear codes
Linear codes can be represented by their parity-check matrices: C = { x : H xᵀ = 0 }. For the (7,4) Hamming code above,

H = [ 1 0 1 0 1 0 1 ]
    [ 0 0 1 1 1 1 0 ]
    [ 1 1 0 0 1 1 0 ]

Representation by a graph
A parity-check matrix can also be represented by a (Tanner) graph, with a variable node for each column of H, a check node for each row, and an edge wherever H has a 1.
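The definition C = { x : H xᵀ = 0 } can be verified directly on this small example. The following sketch (Python; not part of the talk) enumerates all length-7 binary vectors, keeps those satisfying every parity check of the matrix H above, and tallies the weight enumerator A(w):

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code from the slides.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
]

def is_codeword(H, x):
    # x is a codeword iff H x^T = 0 (mod 2), i.e. every check sees an even number of 1s.
    return all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)

# Enumerate all 2^7 binary vectors and keep the codewords.
codewords = [x for x in product([0, 1], repeat=7) if is_codeword(H, x)]

# Weight enumerator: A[w] = number of codewords of Hamming weight w.
A = [0] * 8
for x in codewords:
    A[sum(x)] += 1

print(len(codewords))  # 16
print(A)               # [1, 0, 0, 7, 7, 0, 0, 1]
```

Since H has rank 3, the null space has 2⁷⁻³ = 16 elements, matching the code listed above.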
[Figure: Tanner graph of the (7,4) Hamming code, shown next to its parity-check matrix H.]

Codeword weight enumerator of the (7,4) Hamming code
The weight enumerator A(w) counts the codewords of Hamming weight w. For the (7,4) Hamming code, A(0) = 1, A(3) = 7, A(4) = 7, A(7) = 1, and A(w) = 0 otherwise.
[Plot: A(w) versus w, 0 ≤ w ≤ 7.]

Protograph ensembles
A protograph is expanded by a factor N to obtain the code graph: every node and every edge of the protograph is copied N times. The randomness of the ensemble comes from a permutation applied to the N copies of each edge type.
[Figure: a protograph and its N = 4 expansion.]

Average codeword weight enumerator of the expanded code
For N = 2, there are (2!)^|E| = 4096 codes in the ensemble (the protograph has |E| = 12 edges). The "ensemble average" weight enumerator A(w) is the average over these codes.
[Plot: ensemble-average A(w) versus w, 0 ≤ w ≤ 14.]

Asymptotic average codeword weight enumerator
To pass to the asymptotic enumerator:
- plot on a log scale: log A(w);
- make the quantities intrinsic: (1/n) log A(ωn) as a function of ω = w/n;
- take the limit: lim_{n→∞} (1/n) log A(ωn).
[Plots: log A(w) versus w; (1/n) log A(ωn) versus ω = w/n; the limiting curve.]

Codewords, Stopping Sets, Trapping Sets

Codewords (on a graph)
A codeword is an assignment x to the variables of a graph such that each check node is adjacent an even number of times to variables assigned 1. Example: x = (1,0,0,1,1,0,0). If half of the variables of a codeword are "flipped" by a BSC, ML decoding fails.

Stopping sets
A stopping set is an assignment x such that each check node is adjacent zero times, or two or more times, to variables assigned 1. Examples: x = (1,0,0,1,1,0,0) and y = (1,0,1,0,1,0,0). If all of a stopping set is erased, BP decoding cannot continue.

Stopping set enumerators
On the right is a "sneak peek" at a stopping-set enumerator. Stopping-set enumerators are uniformly at least as large as codeword enumerators, because every codeword is a stopping set.
[Plot: lim_{n→∞} (1/n) log A(ωn) versus ω = w/n, for codewords and for stopping sets.]

Trapping sets
Trapping sets are sets of variables that cause quantized decoders to get "stuck".
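The codeword and stopping-set conditions defined above can be checked mechanically for any word. Here is a small sketch (Python; not part of the talk) applying both tests to the example words x and y, using the (7,4) Hamming parity-check matrix from the earlier slides:

```python
# Parity-check matrix of the (7,4) Hamming code from the earlier slides.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
]

def check_counts(H, x):
    # For each check c, how many adjacent variables are assigned 1.
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def is_codeword(H, x):
    # Every check sees an even number of 1s.
    return all(d % 2 == 0 for d in check_counts(H, x))

def is_stopping_set(H, x):
    # Every check sees zero 1s, or two or more -- never exactly one.
    return all(d != 1 for d in check_counts(H, x))

x = [1, 0, 0, 1, 1, 0, 0]   # a codeword (hence also a stopping set)
y = [1, 0, 1, 0, 1, 0, 0]   # a stopping set that is not a codeword
print(is_codeword(H, x), is_stopping_set(H, x))  # True True
print(is_codeword(H, y), is_stopping_set(H, y))  # False True
```

Note that y fails the codeword test because one check sees three 1s (odd), yet passes the stopping-set test because no check sees exactly one 1.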
Usually, trapping sets have a small number of checks connected exactly once to variables assigned a 1. Unfortunately, this is not a good combinatorial characterization, so we set trapping sets aside for now.

Trapping set enumerators
Words like "usually" and "small number" usually don't lead to useful combinatorial characterizations...

Formalizing "bad sets"
For a given "word" x, let x_c denote the vector of assignments adjacent to check c. For each check node c, choose a set Ω_c ⊆ {0,1}^{deg(c)}, and define

Ω = { x : x_c ∈ Ω_c for every check c }.

If each Ω_c is the set of even-weight vectors, then Ω is the set of codewords. If each Ω_c is the set of vectors of non-unit weight, then Ω is the set of stopping sets.

Example
Let x = (1,0,1,0,1) with x_{c1} = (1,1,1,0), x_{c2} = (1,0,1,0), and x_{c3} = (0,0). If the Ω_c are the sets of even-weight vectors, then x is not in Ω, because x_{c1} has odd weight and therefore is not in Ω_{c1}. If the Ω_c are the sets of non-unit-weight vectors, then x is in Ω.

Types of words
Consider an arbitrary vector x, and define the type vector θ, where θ_v is the fraction of 1's among the variables of type v (the N copies of protograph variable v). For example, with seven variable types and N = 4,

x = (1111, 0000, 1000, 0000, 0011, 1100, 0100)  →  θ = (1, 0, .25, 0, .5, .5, .25),

and the overall weight fraction of x is 10/28.

A lemma about types
Lemma: with respect to our ensemble, all words of a given type are equally likely to be members of Ω.
Sketch of proof: suppose x and y have the same type. If x is in Ω for a certain set of permutations (i.e., a code in the ensemble), then y is in Ω for some other set of permutations.

Applying the method of types
There is usually a "typical" type θ* among the words of any given weight fraction Θ. Even if all types were equally numerous, there are only polynomially many types, so restricting attention to the dominant type cannot change the exponent:

E(Θ) = lim_{N→∞} (1/N) log A(NΘ) = max { E(θ) : θ has weight fraction Θ }.

Computing E(θ)
A(Nθ) can be broken down as A(Nθ) = #{x of type Nθ} · P(x ∈ Ω). It is easy to count the words of a given type: #{x of type Nθ} ≈ e^{N Σ_v H(θ_v)}, where H is the binary entropy function. Writing the exponent of the probability that all checks of a given type are satisfied as −Σ_c Φ(θ_c), we obtain

E(θ) = lim_{N→∞} (1/N) log A(Nθ) = Σ_v H(θ_v) − Σ_c Φ(θ_c).
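The counting step above — that the number of words of a given type grows like e^{N Σ_v H(θ_v)} — can be sanity-checked numerically for a single variable type, since the exact count is the binomial coefficient C(N, Nθ_v). A quick sketch (Python; the values N = 1000 and θ = 0.3 are illustrative choices, not from the talk):

```python
import math

def binary_entropy(t):
    # H(t) in nats, with H(0) = H(1) = 0.
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log(t) - (1.0 - t) * math.log(1.0 - t)

def log_binomial(n, k):
    # log C(n, k) via log-gamma, avoiding huge integers.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

# Exact (1/N) log #{words of type θ} versus the entropy approximation.
N, theta = 1000, 0.3
exact = log_binomial(N, int(N * theta)) / N
approx = binary_entropy(theta)
print(abs(exact - approx) < 0.01)  # True: they agree up to O(log N / N)
```

The sub-exponential correction is exactly the "polynomially many" factor that the method-of-types argument discards.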
Computing Φ(θ_c)
Sanov's theorem (or something like it) tells us that Φ(θ_c) is the minimum KL divergence to a certain distribution q_{θ_c}:

Φ(θ_c) = min_{p ∈ P_{θ_c}} D(p ‖ q_{θ_c}),

where P_{θ_c} is the set of distributions on Ω_c with marginals equal to θ_c.

Finding p
A constrained minimization shows that the optimal p must be a "Boltzmann distribution" with parameter s:

p_ω = e^{⟨s, ω⟩} / Σ_{ω'} e^{⟨s, ω'⟩}.

It is easy to compute θ_c from s, but difficult in reverse. It is also easy to compute D(p ‖ q) from s.

Optimizing over θ
E(θ) is not a convex function, so in principle we would have to evaluate it everywhere to find its maximum E(Θ). In practice, we still use convex-optimization techniques.

Using Weight Enumerators to Design Codes

Why are enumerators interesting?
The zero crossings of weight enumerators give us lower bounds on the typical size of bad sets. In this example, the zero crossing of the stopping-set enumerator is about 0.1, so we would expect a code of length 10^6 to have a minimum stopping set of size about 10^5 (or possibly bigger).

Optimizing protograph ensembles
Currently, many ensembles are designed only for their density-evolution threshold. Such optimized codes are often ignored by actual implementers, because the desired error rates are around 10^-10. By optimizing simultaneously for threshold and for the asymptotic enumerator, we may be able to find efficient codes that can be used in these applications.

Open questions
- How fast can we compute the weight enumerator (or just its zero crossing)?
- What is the achievable region of threshold and zero crossing for protograph ensembles?
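The "Boltzmann distribution" step can be made concrete for the codeword case, where Ω_c is the set of even-weight vectors. In the sketch below (Python; the degree d = 4, the single symmetric parameter s, and the uniform reference distribution q are my illustrative assumptions, not details from the talk), computing the marginal θ_c from s is a direct sum, while recovering s from θ_c requires a numerical search — here by bisection, exploiting the monotonicity of the exponential family:

```python
import math
from itertools import product

# Ω_c for the codeword case: even-weight binary vectors of length d.
d = 4  # check-node degree -- an illustrative assumption
omega_c = [w for w in product([0, 1], repeat=d) if sum(w) % 2 == 0]

def boltzmann(s):
    # Boltzmann distribution on Ω_c with one symmetric parameter s: p(ω) ∝ e^{s·|ω|}.
    weights = [math.exp(s * sum(w)) for w in omega_c]
    Z = sum(weights)
    return [pw / Z for pw in weights]

def marginal(s):
    # Edge marginal θ_c implied by s: expected fraction of 1s per edge.
    p = boltzmann(s)
    return sum(pw * sum(w) for pw, w in zip(p, omega_c)) / d

def kl_to_uniform(s):
    # D(p_s || q), taking q uniform on Ω_c (an illustrative choice of reference).
    p = boltzmann(s)
    q = 1.0 / len(omega_c)
    return sum(pw * math.log(pw / q) for pw in p if pw > 0)

def solve_s(theta, lo=-30.0, hi=30.0):
    # The "difficult" reverse direction: invert the monotone map s -> θ_c(s) by bisection.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if marginal(mid) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(marginal(0.0))             # 0.5 -- at s = 0, p is uniform on Ω_c
print(abs(solve_s(0.5)) < 1e-6)  # True
```

In the general (non-symmetric) setting, s is a vector with one component per edge type at the check, but the same forward-easy/reverse-hard structure holds.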