Local Optimality in Tanner Codes
Guy Even, Nissim Halabi
June 2011

Error Correcting Codes for Memoryless Binary-Input Output-Symmetric Channels (1)

Channel coding pipeline:
  Channel Encoding -> codeword c ∈ C ⊆ {0,1}^N -> Noisy Channel -> noisy word y ∈ ℝ^N -> Channel Decoding -> ĉ ∈ {0,1}^N

Memoryless Binary-Input Output-Symmetric (MBIOS) channel:
- Binary input, but the output is real-valued.
- Stochastic: characterized by the conditional probability function P(y | c).
- Memoryless: errors are independent from bit to bit.
- Symmetric: errors affect '0's and '1's symmetrically.
- Example: the Binary Symmetric Channel (BSC). [Figure: BSC transition diagram, c_i -> y_i with P(y_i = c_i) = 1 - p and crossover probability p.]

Error Correcting Codes for MBIOS Channels (2)

Log-likelihood ratio (LLR) λ_i for a received observation y_i:
  λ_i(y_i) = ln( P(y_i | x_i = 0) / P(y_i | x_i = 1) )
- λ_i > 0: y_i is more likely to be '0'.
- λ_i < 0: y_i is more likely to be '1'.
- y -> λ is an injective mapping, so we may replace y by λ ∈ ℝ^N.

Maximum-Likelihood (ML) Decoding

For any binary-input memoryless channel, ML decoding can be formulated as
  x̂_ML(λ) = argmin_{x ∈ C} ⟨λ, x⟩,
or, equivalently, as a linear program over the convex hull of the code:
  x̂_ML(λ) = argmin_{x ∈ conv(C)} ⟨λ, x⟩.
Problem: conv(C) ⊆ [0,1]^N has no efficient representation.

Linear Programming (LP) Decoding

LP decoding [Fel03, FWK05] replaces conv(C) by a relaxed polytope P:
  x̂_LP(λ) = argmin_{x ∈ P} ⟨λ, x⟩,
where P ⊆ [0,1]^N satisfies:
(1) All codewords x ∈ C are vertices of P.
(2) All new vertices are fractional (therefore, new vertices are not in C).
(3) P has an efficient representation.
Decoding flow: solve the LP. If x̂_LP is integral, then x̂_LP = x̂_ML, i.e., the LP decoder finds the ML codeword; if x̂_LP is fractional, the LP decoder fails.

Tanner Codes [Tan81]

Factor graph representation of Tanner codes, G = (I ∪ J, E), with variable nodes I and local-code nodes J:
- Every local-code node j ∈ J is associated with a linear code C_j of length deg_G(j).
- Tanner code C(G): x ∈ C(G) iff for every j, the projection of x onto N(j) is a codeword of the local code C_j.
- Minimal local distance: d* = min_j d_min(C_j).
- Extended local code C̃_j ⊆ {0,1}^N: extend C_j by letting the bits outside the local code be arbitrary.
[Figure: Tanner graph with variable nodes x_1, ..., x_{10} and local-code nodes C_1, ..., C_5.]
Example: expander codes [SS'96] — the Tanner graph is an expander, admitting a simple bit-flipping decoding algorithm.

LP Decoding of Tanner Codes

Maximum-likelihood (ML) decoding:
  x̂_ML(λ) = argmin_{x ∈ conv(C)} ⟨λ, x⟩, where conv(C) = conv(∩_j C̃_j).
Linear programming (LP) decoding [following Fel03, FWK05]:
  x̂_LP(λ) = argmin_{x ∈ P} ⟨λ, x⟩, where P = ∩_j conv(C̃_j).
This P satisfies requirements (1)-(3) above.

Why Study (Irregular) Tanner Codes?

- Tighter LP relaxation, because the local codes are not restricted to parity codes.
- Irregularity of the Tanner graph yields codes with better performance in the case of LDPC codes [Luby et al. '98].

                            trivial local codes    nontrivial local codes
  regular Tanner graph      regular LDPC code      regular low-density Tanner code
  irregular Tanner graph    irregular LDPC code    irregular Tanner code — might yield improved designs
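As a concrete illustration of the LLR and ML formulations above, here is a minimal Python sketch (not from the slides) that computes BSC LLRs and brute-forces x̂_ML = argmin_{x ∈ C} ⟨λ, x⟩; the toy code and crossover probability are hypothetical stand-ins.

```python
import numpy as np

def bsc_llr(y, p):
    """LLR for a BSC with crossover probability p:
    lambda_i = ln(P(y_i | x_i=0) / P(y_i | x_i=1)),
    i.e. +ln((1-p)/p) if y_i == 0 and -ln((1-p)/p) if y_i == 1."""
    L = np.log((1 - p) / p)
    return np.where(np.asarray(y) == 0, L, -L)

def ml_decode(llr, code):
    """Brute-force ML decoding: argmin_{x in C} <llr, x>.
    Exponential in the blocklength in general -- exactly why one
    relaxes to an LP over a polytope P with an efficient representation."""
    return min(code, key=lambda x: float(np.dot(llr, x)))

# Toy example: the length-3 repetition code (hypothetical stand-in for C).
code = [np.array([0, 0, 0]), np.array([1, 1, 1])]
y = [1, 0, 1]                  # word received from the BSC
lam = bsc_llr(y, p=0.1)
print(ml_decode(lam, code))    # -> [1 1 1]
```

The brute-force search makes the geometry explicit: the objective ⟨λ, x⟩ is linear, so its minimum over conv(C) is always attained at a vertex, i.e., at a codeword.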
Criteria of Interest

Let λ ∈ ℝ^N denote an LLR vector received from the channel, and let x ∈ C(G) denote a codeword. Consider the following questions:
- Is x = x̂_ML(λ)? (NP-hard in general.) Is x̂_ML(λ) unique?
- Is x = x̂_LP(λ)? Is x̂_LP(λ) unique?
We seek an efficient test with one-sided error for "is x the unique ML codeword for λ?": the answer is either "definitely yes" or "maybe no". For example, an efficient test via local computations — a "local optimality" criterion.

Combinatorial Characterization of Local Optimality (1)

Let x ∈ C(G) ⊆ {0,1}^N, f ∈ [0,1]^N, λ ∈ ℝ^N. Following [Fel03], define the relative point x ⊕ f by (x ⊕ f)_i = |x_i - f_i|. Consider a finite set B ⊆ [0,1]^N of deviations.

Definition: a codeword x is locally optimal for λ ∈ ℝ^N if for all vectors b ∈ B,
  ⟨λ, x ⊕ b⟩ > ⟨λ, x⟩.

Goal: find a set B of deviations such that:
(1) x ∈ LO(λ) ⇒ x = x̂_ML(λ) and x̂_ML(λ) is unique.
(2) x ∈ LO(λ) ⇒ x = x̂_LP(λ) and x̂_LP(λ) is unique (i.e., x̂_LP(λ) is integral).
(3) Pr{ ∀b ∈ B: ⟨λ, b⟩ > 0 | c = 0^N } = 1 - o(1).
By (2), Pr{ LP decoding fails | c = 0^N } ≤ Pr{ 0^N is not locally optimal | c = 0^N } (all-zeros assumption).

Combinatorial Characterization of Local Optimality (2)

Suppose we have properties (1) and (2). Deviations b with large support yield property (3) (e.g., via Chernoff-like bounds). If B is taken to be the set of nonzero codewords themselves, then x ∈ LO(λ) ⇔ x = x̂_ML(λ) and x̂_ML(λ) is unique — but then b is a GLOBAL structure, and the analysis of property (3) is out of reach.

Combinatorial Characterization of Local Optimality (3)

For analysis purposes, consider structures of a local nature:
- B is a set of TREES [following KV'06] (height < ½ girth(G)).
- Strengthen the analysis by introducing layer weights [following ADS'09], giving better bounds on Pr{ ∀b ∈ B: ⟨λ, b⟩ > 0 | c = 0^N }.
- However, height(subtrees of G) < ½ girth(G) = O(log N). Instead, take path-prefix trees — these are not bounded by the girth!

Path-Prefix Tree

The path-prefix tree captures the notion of a computation tree. Consider a graph G = (V, E) and a node r ∈ V:
- V̂ — the set of all backtrackless paths in G emanating from node r with length at most h.
- T_r^h(G) = (V̂, Ê) — the path-prefix tree of G rooted at node r with height h; each path is connected to its one-step extensions.
[Figure: a graph G and its path-prefix tree T_r^4(G).]

d-Tree

Consider a path-prefix tree T_r^h(G) of a Tanner graph G = (I ∪ J, E). A d-tree T[r, h, d] is a subgraph of T_r^h(G) such that:
- its root is r;
- ∀v ∈ T ∩ I: deg_T(v) = deg_G(v) (variable nodes keep full degree);
- ∀c ∈ T ∩ J: deg_T(c) = d (local-code nodes keep d neighbors).
A 2-tree is a skinny tree, i.e., a minimal deviation. [Figures: a 2-tree, a 3-tree, and a 4-tree rooted at v_0.] Note: a d-tree is not necessarily a valid configuration!

Cost of a Projected Weighted Subtree

Consider layer weights ω: {1, ..., h} → ℝ and a subtree T̂_r of a path-prefix tree T_r^{2h}(G). Define the weight function w_{T̂_r}: V̂ → ℝ for the subtree T̂_r induced by ω, and let π_G(w_{T̂_r}) ∈ ℝ^N denote its projection to the Tanner graph G, obtained by summing the weights of all copies of each variable node. [Figure: a weighted subtree of T_r^{2h}(G) and its projection onto G.]

Combinatorial Characterization of Local Optimality

Setting: C(G) ⊆ {0,1}^N is a Tanner code with minimal local distance d*; 2 ≤ d ≤ d*; ω ∈ [0,1]^h \ {0^h}.
B_d^ω — the set of all vectors corresponding to projections to G of ω-weighted d-trees of height 2h rooted at variable nodes:
  B_d^ω = { π_G(w_T) : T is an ω-weighted d-tree of height 2h }.

Definition: a codeword x is (h, ω, d)-locally optimal for λ ∈ ℝ^N if for all vectors b ∈ B_d^ω,
  ⟨λ, x ⊕ b⟩ > ⟨λ, x⟩.
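To make the construction of V̂ underlying these deviations concrete, here is a minimal Python sketch (the adjacency-list encoding and node names are hypothetical) that enumerates the backtrackless paths forming the nodes of T_r^h(G):

```python
def path_prefix_tree(adj, r, h):
    """Enumerate the nodes V^ of the path-prefix tree T_r^h(G):
    all backtrackless paths of length <= h in G starting at r.
    adj: dict mapping each node to its list of neighbours.
    Each returned tuple is a tree node; its parent is the same
    path with the last step removed."""
    paths = [(r,)]
    frontier = [(r,)]
    for _ in range(h):
        nxt = []
        for p in frontier:
            for u in adj[p[-1]]:
                if len(p) >= 2 and u == p[-2]:
                    continue  # would backtrack along the last edge
                nxt.append(p + (u,))
        paths.extend(nxt)
        frontier = nxt
    return paths

# Toy Tanner graph (hypothetical): variables v1..v3, checks c1..c2.
adj = {'v1': ['c1'], 'v2': ['c1', 'c2'], 'v3': ['c2'],
       'c1': ['v1', 'v2'], 'c2': ['v2', 'v3']}
for p in path_prefix_tree(adj, 'v2', 2):
    print(p)
```

Because only immediate backtracking is forbidden, paths may revisit nodes of G once h exceeds the girth — which is exactly why the height of a path-prefix tree, unlike that of an ordinary subtree, is not bounded by the girth.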
Thm: Local Opt ⇒ ML Opt / LP Opt

Theorem: If x ∈ C(G) is (h, ω, d)-locally optimal for λ, then:
(1) x is the unique ML codeword for λ.
(2) x is the unique optimal LP solution given λ.

Proof sketch — two steps.

Step 1: Local Opt ⇒ ML Opt.
Key structural lemma: every nonzero codeword equals a convex combination of projected ω-weighted d-trees in G, up to a scaling factor α.
For every codeword x' ≠ x, let z = x ⊕ x'. Then:
  ⟨λ, x'⟩ = ⟨λ, x ⊕ z⟩
          = (1 - α)⟨λ, x⟩ + α·E_b⟨λ, x ⊕ b⟩    (key structural lemma)
          > ⟨λ, x⟩                               (local optimality of x)
Hence x is the unique ML codeword for λ.

Step 2: Local Opt ⇒ LP Opt.
Uses "Local Opt ⇒ ML Opt" as a black box, via a reduction using graph covers and graph-cover decoding [VK'05].

Interesting outcomes:
- The theorem characterizes the event of LP decoding failure. For example:
  Theorem: Fix ω ∈ ℝ^h. Then
    Pr{ LP decoding fails | c = 0^N } ≤ Pr{ ∃ an ω-weighted d-tree T of height 2h in G with ⟨λ, π_G(w_T)⟩ ≤ 0 | c = 0^N }
  (all-zeros assumption).
- Work in progress: design an iterative message-passing decoding algorithm that computes an (h, ω, d)-locally-optimal codeword after h iterations; i.e., if x is a locally optimal codeword for λ, the weighted message-passing algorithm finds x after h iterations, with a guarantee that x is the ML codeword.

Goals achieved:
(1) x ∈ LO(λ) ⇒ x = x̂_ML(λ) and x̂_ML(λ) unique.
(2) x ∈ LO(λ) ⇒ x = x̂_LP(λ) and x̂_LP(λ) unique (so LO(λ) ⊆ "LP(λ) integral" ⊆ ML(λ)).
Left to show:
(3) Pr{ x ∈ LO(λ) } = 1 - o(1).

Bounds for LP Decoding Based on d-Trees (analysis for regular factor graphs)

For (d_L, d_R)-regular Tanner codes with minimum local distance d*:
- Form of the finite-length bounds: there exist c > 1 and a threshold t such that if the noise is below t, then Pr(LP decoder success) ≥ 1 - exp(-c·girth).
- If girth = Θ(log n), then Pr(LP decoder success) ≥ 1 - exp(-n^γ) for some 0 < γ < 1.
- As n → ∞, t is a lower bound on the threshold of LP decoding.

Results for the BSC using an analysis of uniformly weighted d-trees:

  Code                                 skinny-tree (2-tree)
                                       analysis [ADS'09]      3-trees         4-trees
  (3,6)-regular Tanner code, d* = 4    p_LP > 0.02            p_LP > 0.076    p_LP > 0.4512
  (4,8)-regular Tanner code, d* = 4    p_LP > 0.017           p_LP > 0.057    p_LP > 0.365

Message-Passing Decoding for Irregular LDPC Codes

Dynamic programming on coupled computation trees of the Tanner graph:
- Input: channel observations λ_v for the variable nodes v ∈ V.
- Output: an assignment to the variable nodes v ∈ V such that the assignment to v agrees with the "tree" codeword that minimizes an objective function on the computation tree rooted at v.
- Algorithm principle: messages are sent iteratively, variable nodes → check nodes and check nodes → variable nodes.
- Goal: if a locally optimal codeword x exists, the message-passing algorithm finds x.

ω-Weighted Min-Sum: Message Rules

Variable node → check node message (iteration ℓ):
  μ_{v→c}^{(ℓ)} = (ω_{h-ℓ} / deg_G(v))·λ_v + (1 / (deg_G(v) - 1))·Σ_{c' ∈ N(v)\{c}} μ_{c'→v}^{(ℓ-1)}
Check node → variable node message (iteration ℓ):
  μ_{c→v}^{(ℓ)} = ( Π_{v' ∈ N(c)\{v}} sign(μ_{v'→c}^{(ℓ)}) ) · min_{v' ∈ N(c)\{v}} |μ_{v'→c}^{(ℓ)}|

ω-Weighted Min-Sum: Decision Rule

Decision at variable node v at iteration h:
  μ_v = (ω_0 / deg_G(v))·λ_v + (1 / (deg_G(v) - 1))·Σ_{c ∈ N(v)} μ_{c→v}^{(h-1)}
  x̂_v = 0 if μ_v ≥ 0, and x̂_v = 1 otherwise.

Thm: Local Opt ⇒ ω-Weighted Min-Sum Opt

Theorem: If x ∈ C(G) is (h, ω, 2)-locally optimal for λ, then:
(1) ω-weighted min-sum computes x in h iterations.
(2) x is the unique ML codeword given λ.
The ω-weighted min-sum algorithm generalizes the attenuated max-product algorithm [Frey and Koetter '00], [Jian and Pfister '10].
Open question: probabilistic analysis of local optimality in the irregular case.
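The message and decision rules above translate directly into code. Below is a minimal Python sketch for the parity-check case; the graph encoding is hypothetical, the exact weight indexing and degree normalizations follow one possible reading of the slide and should be treated as assumptions, and all node degrees are assumed to be at least 2.

```python
import numpy as np

def weighted_min_sum(var_adj, chk_adj, lam, omega, h):
    """Sketch of the omega-weighted min-sum decoder on a Tanner graph
    whose local codes are single parity checks. var_adj[v] / chk_adj[c]
    list the neighbours of variable v / check c; lam[v] is the LLR of v;
    omega is a list of h layer weights (indexing is an assumption)."""
    mu_vc = {(v, c): 0.0 for v in var_adj for c in var_adj[v]}
    mu_cv = {(c, v): 0.0 for c in chk_adj for v in chk_adj[c]}
    for l in range(1, h):
        for (v, c) in mu_vc:                    # variable -> check rule
            d = len(var_adj[v])
            s = sum(mu_cv[(c2, v)] for c2 in var_adj[v] if c2 != c)
            mu_vc[(v, c)] = (omega[h - l] / d) * lam[v] + s / (d - 1)
        for (c, v) in mu_cv:                    # check -> variable (min-sum)
            others = [mu_vc[(v2, c)] for v2 in chk_adj[c] if v2 != v]
            sign = float(np.prod(np.sign(others)))
            mu_cv[(c, v)] = sign * min(abs(m) for m in others)
    xhat = {}
    for v in var_adj:                           # decision rule at iteration h
        d = len(var_adj[v])
        mu = (omega[0] / d) * lam[v] \
             + sum(mu_cv[(c, v)] for c in var_adj[v]) / (d - 1)
        xhat[v] = 0 if mu >= 0 else 1
    return xhat
```

With ω ≡ 1 and the degree normalizations removed, these rules collapse to standard min-sum; a geometrically decaying ω is closely related to the attenuated max-product algorithm cited above.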
Local Optimality Certificates for LP Decoding — Previous Results

- Koetter and Vontobel '06 — characterized LP solutions and provided a criterion certifying the optimality of a codeword for LP decoding of regular LDPC codes. The characterization is based on the combinatorial structure of skinny trees of height h < girth(G) in the factor graph.
- Arora, Daskalakis and Steurer '09 — a certificate for LP decoding of regular LDPC codes over the BSC, extended to MBIOS channels in [Halabi-Even '10]. The certificate characterization is based on weighted skinny trees of height h < girth(G). Note: bounds on the word error probability of LP decoding are computed by analyzing the certificates; by introducing layer weights, the ADS certificate occurs with high probability at much larger noise rates.
- Vontobel '10 — implies a certificate for LP decoding of Tanner codes based on weighted skinny subtrees of graph covers.

Local Optimality Certificates for LP Decoding — Summary of Main Techniques
([KV06, ADS09, HE10] versus the current work)

- Girth: previously h < girth(G); in the current work h is unbounded — the characterization uses computation trees and has no dependency on the girth of the factor graph.
- Regularity: previously regular factor graphs; the current work handles irregular factor graphs by adding normalization factors according to node degrees.
- Check nodes / constraints: previously parity codes; the current work handles linear local codes, with a tighter relaxation — the generalized fundamental polytope.
- Deviations: previously "skinny" trees that locally satisfy the parity checks; the current work uses "fat" trees, not necessarily valid configurations — certificates based on "fat" structures are likely to occur with high probability at larger noise rates.
- LP analysis: previously dual/primal LP analysis and polyhedral characterization; the current work uses a reduction to ML via a characterization of graph-cover decoding.

Conclusions

Combinatorial, graph-theoretic, and algorithmic methods for analyzing the decoding of (modern) error-correcting codes over any memoryless channel:
- A new local-optimality combinatorial certificate for LP decoding of irregular Tanner codes, i.e., Local Opt ⇒ LP Opt and ML Opt.
- The certificate is based on weighted "fat" subtrees of computation trees of height h; h is not bounded by the girth, and the degree of local-code nodes in a deviation is not limited to 2.
- Proof techniques: combinatorial decompositions and graph-cover arguments.
- An efficient dynamic-programming algorithm, running in O(|E|·h) time, for computing the local-optimality certificate of a codeword given an LLR vector.
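The O(|E|·h) certificate computation lends itself to a short dynamic-programming sketch. The Python sketch below covers only the simplest case — 2-trees under the all-zeros assumption, with a simplified per-layer weighting (a hypothetical stand-in for the slides' degree-normalized weight function, and omega here has h+1 entries) — and computes, for each root, the minimum cost of a skinny tree of height 2h; 0^N is certified locally optimal iff every minimum is strictly positive.

```python
from functools import lru_cache

def min_skinny_tree_cost(var_adj, chk_adj, lam, omega, h):
    """DP over the computation tree: minimum cost of a skinny tree
    (2-tree) of height 2h rooted at each variable node, assuming a
    simplified weight omega[t] on the variable layer at depth t and
    all check degrees >= 2. States are (node, parent, depth), so the
    total work is O(|E| * h)."""

    @lru_cache(maxsize=None)
    def var_cost(v, parent_chk, t):
        # Variable node at variable-layer depth t: pays omega[t]*lam[v]
        # and keeps ALL its child checks (full degree at variable nodes).
        cost = omega[t] * lam[v]
        if t < h:
            for c in var_adj[v]:
                if c != parent_chk:
                    cost += chk_cost(c, v, t + 1)
        return cost

    @lru_cache(maxsize=None)
    def chk_cost(c, parent_var, t):
        # Check node in a 2-tree keeps exactly one child variable,
        # so take the cheapest continuation.
        return min(var_cost(v, c, t) for v in chk_adj[c] if v != parent_var)

    return {r: var_cost(r, None, 0) for r in var_adj}

# Toy demo (hypothetical graph): is 0^N certified for these LLRs?
adj_v = {'v1': ['c1'], 'v2': ['c1', 'c2'], 'v3': ['c2']}
adj_c = {'c1': ['v1', 'v2'], 'c2': ['v2', 'v3']}
lam = {'v1': 0.9, 'v2': -0.2, 'v3': 1.1}
print(min_skinny_tree_cost(adj_v, adj_c, lam, omega=[1.0, 1.0], h=1))
```

In the demo, all three minima are positive even though one LLR is negative, illustrating how the tree-based certificate can succeed on noisy inputs where a bitwise test would not.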
Future Work

- Work in progress: design an iterative message-passing decoding algorithm for Tanner codes that computes an (h, ω, d)-locally-optimal codeword after h iterations; i.e., if x is a locally optimal codeword for λ, the weighted message-passing algorithm computes x in h iterations, with a guarantee that x is the ML codeword.
- Asymptotic analysis of LP decoding and weighted min-sum decoding for ensembles of irregular Tanner codes: apply density-evolution techniques together with the combinatorial characterization of local optimality.

Thank You!