Error Correction and LDPC Decoding
CMPE 691/491: DSP Hardware Implementation
Tinoosh Mohsenin

Error Correction in Communication Systems
- Binary information travels from the transmitter through a noisy channel to the receiver, which must recover the corrected information.
- Error: given the transmitted frame and the received frame, how many corresponding bits differ? This count is the Hamming distance (Hamming, 1950).
- Example:
    Transmitted frame: 1110011
    Received frame:    1011001
    Number of errors: 3

Error Detection and Correction
- Add extra information to the original data being transmitted: frame = k data bits + m bits for error control, so n = k + m.
- Error detection: enough information to detect an error; requires retransmissions.
- Error correction: enough information to detect and correct an error; forward error correction (FEC).

History of Error-Correcting Codes
- 1950s-1960s: Hamming codes, BCH codes; LDPC codes introduced in 1960.
- 1970s: convolutional codes, Reed-Solomon codes.
- 1990s: Turbo codes.
- 2000s: renewed interest in LDPC; practical implementations show that LDPC codes beat Turbo and convolutional codes.

Modulation
- Phase-shift keying (PSK) is a digital modulation scheme that conveys data by changing, or modulating, the phase of a reference signal.
- Binary phase-shift keying (BPSK, also called phase-reversal keying or 2PSK) is the simplest form of PSK. It uses two phases separated by 180°. In MATLAB: y = 1 - 2*x, where x is the input bit.
- Quadrature phase-shift keying (QPSK, also called 4-PSK or 4-QAM) uses four points equispaced around a circle on the constellation diagram. With four phases, QPSK encodes two bits per symbol, using Gray coding to minimize the bit error rate (BER).

Key Terms
- Encoder: adds redundant bits to the sender's bit stream to create a codeword.
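The Hamming-distance count and the BPSK mapping above are easy to sketch in Python (the function names are illustrative, not from the slides):

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length frames differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def bpsk_modulate(bits):
    """BPSK: map bit 0 -> +1 and bit 1 -> -1 (the 1 - 2x rule)."""
    return [1 - 2 * b for b in bits]

# Slide example: transmitted 1110011, received 1011001 -> 3 bit errors
print(hamming_distance("1110011", "1011001"))  # -> 3
print(bpsk_modulate([1, 0, 1]))                # -> [-1, 1, -1]
```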
- Decoder: uses the redundant bits to detect and/or correct as many bit errors as the particular error-control code allows.
- Communication channel: the part of the communication system that introduces errors. Examples: radio, twisted-pair wire, coaxial cable, fiber-optic cable, magnetic tape, optical discs, or any other noisy medium.
- Additive white Gaussian noise (AWGN): larger noise makes the noise distribution wider.

Important Metrics
- Bit error rate (BER): the probability of bit error.
    BER = number of error bits / total number of transmitted bits
  We want to keep this number small. Example: BER = 10^-4 means that if we have transmitted 10,000 bits, there is 1 bit error. BER is a useful indicator of system performance independent of the error channel.
- Signal-to-noise ratio (SNR): quantifies how much a signal has been corrupted by noise; defined as the ratio of signal power to the noise power corrupting the signal. A ratio higher than 1:1 indicates more signal than noise. SNR is often expressed using the logarithmic decibel scale:
    SNR(dB) = 10 log10(P_signal / P_noise)
  Important number: 3 dB means the signal power is two times the noise power.

Error Correction in Communication Systems
- System: information (k bits) -> encoder (add parity) -> codeword (n bits) -> noisy channel -> receivedword (n bits) -> decoder (check parity, detect error) -> corrected information (k bits).
- Goal: attain lower BER at smaller SNR. Error correction is a key component in communication and storage applications.
- Coding examples: convolutional, Turbo, and Reed-Solomon codes.
- What can 3 dB of coding gain buy?
  - A satellite can send data with half the required transmit power.
  - A cellphone can operate reliably with half the required receive power.
- [Figure: bit error probability vs. SNR (dB), comparing the uncoded system against a convolutional code with 3 dB of coding gain.] Figure courtesy of B.
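The two metrics above can be written down directly; a minimal Python sketch (function names are illustrative):

```python
import math

def bit_error_rate(num_errors: int, num_bits: int) -> float:
    """BER = number of error bits / total number of transmitted bits."""
    return num_errors / num_bits

def snr_db(signal_power: float, noise_power: float) -> float:
    """SNR on the decibel scale: 10*log10(P_signal / P_noise)."""
    return 10 * math.log10(signal_power / noise_power)

# 1 bit error in 10,000 transmitted bits -> BER = 1e-4
print(bit_error_rate(1, 10_000))   # -> 0.0001
# Signal power twice the noise power -> ~3 dB
print(round(snr_db(2.0, 1.0), 2))  # -> 3.01
```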
Nikolic, 2003 (modified).

LDPC Codes and Their Applications
- Low-density parity-check (LDPC) codes have superior error performance: roughly 4 dB of coding gain over convolutional codes.
- Standards and applications:
  - 10 Gigabit Ethernet (10GBASE-T)
  - Digital video broadcasting (DVB-S2, DVB-T2, DVB-C2)
  - Next-generation wired home networking (G.hn)
  - WiMAX (802.16e)
  - WiFi (802.11n)
  - Hard disks
  - Deep-space satellite missions
- [Figure: bit error probability vs. SNR (dB) for an uncoded system and a convolutional code, marking the 4 dB gap. Figure courtesy of B. Nikolic, 2003 (modified).]

Encoding Picture Example
- An image is encoded codeword by codeword using the parity-check matrix H (data bits plus parity bits); every valid codeword V satisfies H·V^T = 0 under binary multiplication, called a syndrome check. [Slide: a large sparse H matrix, example codewords with their parity bits, and the encoded image.]

Decoding Picture Example
- The encoded image is sent from transmitter to receiver over a noisy channel (Ethernet cable, wireless, or hard disk) and recovered by iterative message-passing decoding. [Slide: the corrupted image is progressively cleaned up over iterations 1, 5, 15, and 16.]

LDPC Codes: Parity-Check Matrix
- An LDPC code is defined by a large binary matrix, called a parity-check matrix or H matrix.
- Each row defines a parity equation; the number of columns is the code length.
- Example: a 6 x 12 H matrix for a 12-bit LDPC code:
  - Number of columns = 12 (i.e., receivedword V is 12 bits)
  - Number of rows = 6
  - Number of ones per row = 4 (row weight)
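A syndrome check H·V^T = 0 is just an XOR of codeword bits per parity equation. A minimal Python sketch, using a small hypothetical H rather than the slide's 6 x 12 matrix (which does not survive the slide-to-text conversion):

```python
def syndrome(H, v):
    """Binary multiplication H . v^T over GF(2): XOR of the codeword bits
    at the positions where each row of H has a one."""
    return [sum(h * b for h, b in zip(row, v)) % 2 for row in H]

# Hypothetical example H (NOT the slide's matrix): 2 parity equations, 4 bits.
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]

v_good = [1, 0, 1, 1]  # satisfies both parity equations
v_bad  = [1, 1, 1, 1]  # violates both

print(syndrome(H, v_good))  # -> [0, 0]: valid codeword
print(syndrome(H, v_bad))   # -> [1, 1]: errors detected
```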
  - Number of ones per column = 2 (column weight)

LDPC Codes: Tanner Graph
- A Tanner graph is an interconnect representation of the H matrix with two sets of nodes: check nodes and variable nodes.
- Each row of the matrix is represented by a check node; each column is represented by a variable node; an edge connects check node C_i to variable node V_j wherever h_ij = 1.
- A message-passing method between the nodes corrects errors:
  (1) initialize with the receivedword;
  (2) pass messages until the codeword is correct.
- Example: V3, V4, V8, and V10 send messages to C1; C2 and C5 send messages to V1. The same holds for the other nodes: messages pass along every connection.
- Notation: α is a message from a check node to a variable node; β is a message from a variable node to a check node; λ is the original received information from the channel.

Message Passing: Variable Node Processing
- Each variable node V_j combines its channel value with the incoming check-to-variable messages:
    β_ij = λ_j + Σ_{i'≠i, h_i'j=1} α_i'j    (message to check node C_i)
    Z_j  = λ_j + Σ_{i, h_ij=1} α_ij         (overall soft estimate, so β_ij = Z_j - α_ij)

Message Passing: Check Node Processing (MinSum)
- Each check node C_i returns to each neighbor V_j the product of the signs (Sign) and the minimum magnitude (Magnitude) of the other incoming messages, scaled by a correction factor S_factor:
    α_ij = S_factor · ( Π_{j'≠j, h_ij'=1} sign(β_ij') ) · min_{j'≠j, h_ij'=1} |β_ij'|
- After check node processing, the next variable node processing begins a new iteration.

Code Estimation
- Based on the modulation scheme (here BPSK, where bit 0 maps to +1 and bit 1 to -1), estimate the transmitted bits from the soft values:
    V̂_j = 0 if Z_j ≥ 0, else V̂_j = 1.
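The MinSum check node update can be sketched directly from the equation above. The function name is illustrative; the test values are check node C1's inputs from the worked example's first iteration, where β = λ at the neighbors V3, V4, V8, V10:

```python
import math

def check_node_update(betas, s_factor=1.0):
    """MinSum check node: for each incoming message, return the product of
    the signs and the minimum magnitude of all OTHER incoming messages."""
    alphas = []
    for j in range(len(betas)):
        others = betas[:j] + betas[j + 1:]
        sign = math.prod(1 if b >= 0 else -1 for b in others)
        alphas.append(s_factor * sign * min(abs(b) for b in others))
    return alphas

# Iteration 1 at C1 with S_factor = 1:
# alpha to V3 = (+1)(+1)(-1) * min{3.6, 1.6, 2.5} = -1.6, as on the slide.
betas = [-3.2, 3.6, 1.6, -2.5]
print(check_node_update(betas))  # -> [-1.6, 1.6, 2.5, -1.6]
```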
Syndrome Check
- Compute the syndrome H·V̂^T (binary multiplication).
- If the syndrome = 0, terminate decoding; else, continue with another iteration.

Example
- System: information (k bits) -> encoder -> codeword V (n bits) -> BPSK modulation -> channel -> receivedword λ (n bits) -> decoder (iterative MinSum) -> corrected information.
- Encoded information: V = [1 0 1 0 1 0 1 0 1 1 1 1]
- BPSK modulated: [-1 1 -1 1 -1 1 -1 1 -1 -1 -1 -1]
- λ (received data from the channel) = [-9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8]
- Estimated code: V̂ = [1 0 1 0 1 0 0 0 1 1 1 1]. Note that the noise flipped the sign of bit 7 (transmitted -1, received +0.3).

Ex: Variable Node Processing (Iteration 1)
- All α messages start at 0, so in the first iteration each variable node simply forwards its channel value: β_ij = λ_j.

Ex: Check Node Processing (Iteration 1)
- Check node C1 is connected to V3, V4, V8, and V10, whose messages are β = λ = {-3.2, 3.6, 1.6, -2.5}. Here assume S_factor = 1. For example:
    |α_13| = min{3.6, 1.6, 2.5} = 1.6,  sign(α_13) = (+1)(+1)(-1) = -1  =>  α_13 = -1.6
- |α_14|, |α_18|, |α_110| and their signs are computed the same way, each excluding its own input.

Ex: Code Estimation (Iteration 1)
- Z = λ = [-9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8]
- V̂ = [1 0 1 0 1 0 0 0 1 1 1 1]

Ex: Syndrome Check (Iteration 1)
- Compute H·V̂^T (binary multiplication): Syndrome_i = XOR of the V̂_j for which h_ij = 1; Sum_syndrome = Σ_i Syndrome_i.
- Here Sum_syndrome = 2, which is not zero => error remains, continue decoding.

Second Iteration
- In variable node processing, compute β, α, and Z based on the algorithm, now using the nonzero α messages from the first iteration.
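The hard-decision estimation step used throughout the example can be sketched as follows (illustrative function name; the λ vector is the slide's):

```python
def estimate_bits(z):
    """BPSK hard decision: bit 0 if the soft value is >= 0, else bit 1."""
    return [0 if zj >= 0 else 1 for zj in z]

# Iteration 1 of the slide example, where Z = lambda (raw channel values):
lam = [-9.1, 4.9, -3.2, 3.6, -1.4, 3.1, 0.3, 1.6, -6.1, -2.5, -7.8, -6.8]
print(estimate_bits(lam))
# -> [1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]  (bit 7 still wrong: +0.3 was sent as -1)
```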
- Result of the second iteration: Z = [-12.1 7.1 -4.5 4.4 -4.2 7.7 -7.2 7.2 -10.0 -7.7 -8.9 -8.1], giving V̂ = [1 0 1 0 1 0 1 0 1 1 1 1].

Ex: Syndrome Check (Iteration 2)
- Compute H·V̂^T (binary multiplication) with V̂ = [1 0 1 0 1 0 1 0 1 1 1 1]: every parity check is now satisfied.
- Sum_syndrome = 0 => corrected code; terminate decoding.

Full-Parallel Decoding
- Every check node and variable node is mapped to its own processor; all processors are directly connected based on the Tanner graph, λ arrives from the channel, and all α messages are initialized to 0.
- Advantages: very high throughput; no large memory storage elements (e.g., SRAMs).
- Drawbacks: high routing congestion; large delay, area, and power caused by long global wires.

Full-Parallel LDPC Decoder Examples
- Ex 1: 1024-bit decoder [JSSC 2002]: 52.5 mm², 50% logic utilization, 160 nm CMOS; 512 check and 1024 variable processors.
- Ex 2: 2048-bit decoder [ISCAS 2009]: 18.2 mm², 25% logic utilization, 30 MHz, 65 nm CMOS; 384 check and 2048 variable processors; CPU time for place & route > 10 days.
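Putting the pieces together, the overall iterative loop (variable node update, check node update, estimation, syndrome check, early termination) can be sketched end to end. To keep the sketch self-checking it uses a toy single-parity-check code and a hypothetical received vector, not the lecture's 6 x 12 H:

```python
import math

def min_sum_decode(H, lam, s_factor=1.0, max_iter=10):
    """Iterative MinSum decoding: alternate variable/check node updates
    until the syndrome is all-zero. Returns (bits, iterations used)."""
    m, n = len(H), len(lam)
    alpha = [[0.0] * n for _ in range(m)]  # init: all alpha = 0
    for it in range(1, max_iter + 1):
        # Variable node processing: beta_ij = Z_j - alpha_ij
        z = [lam[j] + sum(alpha[i][j] for i in range(m) if H[i][j]) for j in range(n)]
        beta = [[z[j] - alpha[i][j] if H[i][j] else 0.0 for j in range(n)] for i in range(m)]
        # Check node processing (MinSum): sign product and minimum magnitude
        for i in range(m):
            cols = [j for j in range(n) if H[i][j]]
            for j in cols:
                others = [beta[i][k] for k in cols if k != j]
                sign = math.prod(1 if b >= 0 else -1 for b in others)
                alpha[i][j] = s_factor * sign * min(abs(b) for b in others)
        # Code estimation and syndrome check
        z = [lam[j] + sum(alpha[i][j] for i in range(m) if H[i][j]) for j in range(n)]
        v = [0 if zj >= 0 else 1 for zj in z]
        if all(sum(v[j] for j in range(n) if H[i][j]) % 2 == 0 for i in range(m)):
            return v, it
    return v, max_iter

# Toy example (NOT the lecture's code): one parity check over 4 bits.
H = [[1, 1, 1, 1]]
lam = [2.0, 1.5, -0.5, 1.8]    # all-zero codeword sent; bit 3 weakly flipped
print(min_sum_decode(H, lam))  # -> ([0, 0, 0, 0], 1)
```

One strong check-node message (+1.5) is enough to overpower the weakly negative channel value, so the single error is corrected in one iteration.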
- [Figure: place & route CPU time (hours) vs. number of wire connections. For all data in the plot the same automatic place & route flow is used; CPU: quad-core Intel Xeon, 3.0 GHz.]

Serial Decoder Example
- One check-node processor and one variable-node processor share a memory:
  (1) initialize the memory (clear contents) and store the receivedword;
  (2) compute V1, V2, ..., V12 and store the results;
  (3) then compute C1, C2, ..., C6 and store the results.

Decoding Architectures
- Partial-parallel decoders: multiple processing units with shared memories.
- Throughput: 100 Mbps to Gbps.
- Require large memories (depending on the code size) and efficient control and scheduling.

Reported LDPC Decoder ASICs
- [Figure: throughput (Mbps) of published partial-parallel and full-parallel decoders by year, 2000-2010, marking application targets such as 10GBASE-T, 802.11n, DVB-S2, 802.11a/g, and 802.16e.]

Throughput Across Fabrication Technologies
- Among existing ASIC implementations without early termination, full-parallel decoders have the highest throughput. [Figure: throughput (Mbps) vs. CMOS technology node, 180 nm to 65 nm.]

Energy per Decoded Bit in Different Technologies
- Among existing ASIC implementations without early termination, full-parallel decoders have the lowest energy dissipation. [Figure: energy per bit (nJ/bit) vs. CMOS technology node, 180 nm to 65 nm.]

Circuit Area in Different Technologies
- Full-parallel decoders have the largest area due to their high routing congestion and low logic utilization. [Figure: decoder chip area (mm²) vs. CMOS technology node, 180 nm to 65 nm.]

Key Optimization Factors
- Architectural optimization: parallelism, memory.
- Datapath wordwidth (fixed-point format).
Variable Node Architecture
- The variable node adds the channel value λ to the Wc incoming messages α1...αWc to form Z = λ + Σ_{i=1..Wc} αi, and produces each outgoing message as βi = Z - αi. [Slide: adder/subtractor datapath diagram with registers.]

Architectural Optimization
- Example: Z. Zhang, JSSC 2010.

BER Performance Versus Quantization Format
- [Figure: BER vs. SNR (dB) for different fixed-point quantization formats.]

Check Node Processor
- Magnitude logic: a comparator (Comp) tree over the input magnitudes |β1|...|βWr| finds the smallest value (Min1), its position (IndexMin1), and the second-smallest value (Min2); the tree depth is L = log2(Wr). Each output magnitude is Min1, except at the input that supplied Min1, which receives Min2.
- Sign logic: each output sign sign(αj) is the product of all input signs sign(β) except its own.

Variable Node Processor
- Based on the variable update equation β_ij = λ_j + Σ_{i'≠i, h_i'j=1} α_i'j; the same as in the original MinSum and SPA algorithms.
- Variable node hardware complexity is mainly reduced via wordwidth reduction: seven 5-bit inputs (λ plus the incoming α messages) are converted from sign-magnitude (SM) to 2's complement, summed in an adder tree whose intermediate wordwidths grow to 7 and 8 bits, saturated (SAT) back down, and converted back to sign-magnitude.

Partial-Parallel Decoder Example
- Target: the 802.11ad LDPC code. [The slide repeats the 6 x 12 example H matrix.]

Transmission Scenario
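The wordwidth-reduction datapath above relies on saturating fixed-point arithmetic and sign-magnitude/2's-complement conversion. A small sketch of those two operations; the 5-bit width matches the slide, but the function names and bit packing are illustrative:

```python
def saturate(x: int, bits: int = 5) -> int:
    """Clamp a 2's-complement value to a `bits`-wide word, e.g. [-16, 15]
    for 5 bits, as the SAT blocks in the variable node datapath do."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))

def to_sign_magnitude(x: int, bits: int = 5) -> int:
    """Pack an integer into sign-magnitude: MSB = sign, low bits = |x|."""
    sign = 1 if x < 0 else 0
    return (sign << (bits - 1)) | abs(x)

print(saturate(23))                # -> 15  (clipped to the 5-bit maximum)
print(saturate(-20))               # -> -16 (clipped to the 5-bit minimum)
print(bin(to_sign_magnitude(-5)))  # -> 0b10101 (sign bit 1, magnitude 0101)
```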