Error Correction and LDPC decoding CMPE 691/491: DSP Hardware Implementation Tinoosh Mohsenin 1

advertisement
Error Correction and LDPC decoding
CMPE 691/491: DSP Hardware Implementation
Tinoosh Mohsenin
1
Error Correction in Communication Systems
noise
Binary
information

Transmitter
channel
Receiver
Corrected
information
Error: Given the original frame k and the
received frame k’, how many corresponding bits
differ?


Hamming distance (Hamming, 1950).
Example:



2
frame
Corrupted
frame
Transmitted frame: 1110011
Received frame:
1011001
Number of errors: 3
Error Detection and Correction

Add extra information to the original data being
transmitted.


Error detection: enough info to detect error.


Frame = k data bits + m bits for error control: n = k + m.
Need retransmissions.
Error correction: enough info to detect and
correct error.

Forward error correction (FEC).
3
Error Correction in Communication Systems
Noise
Binary
information
Encoder
(Adding
Redundancy)
Encoded
information
Channel
Corrupted
information
with noise
Renewed interest
in LDPC
LDPC
Introduced
1960
Hamming
codes
BCH codes
1970
Convolutional
codes
Reed Solomon
codes
Corrected
Decoder
information
(Error Detection
and Correction
1980
1990
2000
Turbo
Practical
codes
implementation
LDPC beats
of codes
Turbo and
convolutional
codes
Modulation



5
Phase-shift keying (PSK) is a digital modulation scheme that
conveys data by changing, or modulating, the phase of a reference
signal
Binary Phase Shift-Keying (BPSK) Modulation
 phase reversal keying, or 2PSK) is the
simplest form of phase shift keying (PSK).
It uses two phases which are separated by 180°
 In matlab: 1-2X where X is the input signal
Quadrature phase-shift keying (QPSK)
 4-PSK, or 4-QAM. QPSK uses four points
on the constellation diagram, equispaced
around a circle. With four phases, QPSK
can encode two bits per symbol, shown in
the diagram with gray coding to minimize
the bit error rate (BER).
Key terms



Encoder : adds redundant bits to the sender's bit stream to
create a codeword.
Decoder: uses the redundant bits to detect and/or correct as
many bit errors as the particular error-control code will allow.
Communication Channel: the part of the communication
system that introduces errors.

Ex: radio, twisted wire pair, coaxial cable, fiber optic cable, magnetic
tape, optical discs, or any other noisy medium

Additive white Gaussian noise (AWGN)

Larger noise makes the
distribution wider
6
Important metrics

Bit error rate (BER): The probability of bit error.





Signal to noise ratio (SNR): quantifies how much a signal has
been corrupted by noise.

defined as the ratio of signal power to the noise power corrupting the
signal. A ratio higher than 1:1 indicates more signal than noise
often expressed using the logarithmic decibel scale:

Important number: 3dB means signal power is two times noise power

7
We want to keep this number small
Ex: BER=10-4 means if we have transmitted10,000 bits, there is 1 bit
error.
BER is a useful indicator of system performance independent of error
channel
BER=Number of error bits/ total number of transmitted bits
Error Correction in Communication Systems
Information
k-bit



Goal: Attain lower BER at smaller SNR
Error correction is a key component
100
in communication and storage
applications.
10-1
Coding example: Convolutional,
10-2
Turbo, and Reed-Solomon codes
10-3
What can 3 dB of coding gain buy?


8
channel
Receivedword
n-bit
A satellite can send data with half the
required transmit power
A cellphone can operate reliably with
half the required receive power
Bit Error Probability

Encoder
(add parity)
Codeword
n-bit
noise
10-4
Decoder
(check parity,
detect error)
Corrected
Information
k-bit
Uncoded system
Convolutional
code
0
3 dB
1 2 3 4 5 6 7
Signal to Noise Ratio (dB)
8
Figure courtesy of B. Nikolic, 2003
(modified)
LDPC Codes and Their Applications
Low Density Parity Check (LDPC) codes have superior
error performance


Standards and applications







9
4 dB coding gain over convolutional codes
10 Gigabit Ethernet (10GBASE-T)
Digital Video Broadcasting
(DVB-S2, DVB-T2, DVB-C2)
Next-Gen Wired Home
Networking (G.hn)
WiMAX (802.16e)
WiFi (802.11n)
Hard disks
Deep-space satellite missions
100
Bit Error Probability

10-1
Uncoded
10-2
10-3
4 dB
Conv. code
10-4
0
1 2 3 4 5 6 7
Signal to Noise Ratio (dB)
8
Figure courtesy of B. Nikolic, 2003
(modified)
Encoding Picture Example
H=
1 0 0 0 0 0 0 0 0 0 . . .1 1 0 0 0 1 1 1 1 1 0 0 0 0 0 1 0
0 1 0 0 0 0 0 0 0 0 . . .1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 0 1
0 0 1 0 0 0 0 0 0 0 . . .0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 0
0 0 0 1 0 0 0 0 0 0 . . .0 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1
…
...
H.ViT=0
V=
V=
101110110000
010010001011
010100011111
0 1 0 0Parity
01100100
…
… 111100011111
… 011001000110
…100100111001
…010011100100
Image
...
Binary multiplication called syndrome check
Decoding Picture Example
Transmitter
Receiver
noise
50
100
channel
150
200
Ethernet cable,
Wireless,
or Hard disk
250
20
40
60
80
100 120 140 160 180 200
Iterative message passing decoding
50
100
150
200
250
20
40
60
80
100 120 140 160 180 200
Iteration 1
50
50
100
100
150
150
200
200
250
250
50
100
150
200
Iteration 5
20
40
60
80
100
120
140
160
180
200
Iteration 15
Iteration 16
LDPC Codes—Parity Check Matrix

Defined by a large binary matrix, called a parity check matrix or H matrix



Each row is defined by a parity equation
The number of columns is the code length
Example: 6x 12 H matrix for a12-bit LDPC code




No. of columns=12 (i.e. Receivedword (V) = 12 bit)
No. of rows= 6
No. of ones per row=3 (row weight)
No. of ones per col= 2 (column weight)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
0
1
0
H=
0
1
0
12
0
0
1
0
0
1
1
0
0
1
0
0
0
0
0
1
0
1
1
1
0
0
0
0
0
0
1
0
1
0
0
0
1
1
0
0
1
0
0
0
1
0
0
1
0
0
0
1
1
0
0
0
0
1
0
1
1
0
0
0
0
0
0
1
1
0
C1
C2
C3
C4
C5
C6
LDPC Codes—Tanner Graph

Interconnect representation of H matrix




Two sets of nodes: Check nodes and Variable nodes
Each row of the matrix is represented by a Check node
Each column of matrix is represented by a Variable node
A message passing method is used between nodes to correct errors
(1) Initialization with Receivedword
(2) Messages passing until correct
Example:
V3 to C1, V4 to C1,
V8 to C1, V10 to C1
C2 to V1, C5 to V1
The same for other nodes:
message passing along the
connections
Variable nodes
13
V1
V2
C1
V3
V4
0
1
0
H
0
1
0
C2
V5
C3
V6
0
0
1
0
0
1
1
0
0
1
0
0
1
0
0
0
0
1
C4
V7
0
1
0
1
0
0
0
0
1
0
1
0
0
0
1
1
0
0
C5
V8
Receivedword from channel
1
0
0
0
1
0
C6
V9
0
1
0
0
0
1
1
0
0
0
0
1
0
1
1
0
0
0
0
0
0
1
1
0
Check nodes
V10
V11
V12
Message Passing: Variable node processing
0
1
0
H
0
1
0
0
0
1
0
0
1
1
0
0
1
0
0
C1
V1
V2
V3
V4
1
0
0
0
0
1
0
1
0
1
0
0
C2
V5
Zj 
0
0
1
0
1
0
C3
V6
0
0
1
1
0
0
1
0
0
0
1
0
C4
V7

j ' , hij ' 1
0
1
0
0
0
1
C5
V8
ij '
λ
0
1
1
0
0
0
0
0
0
1
1
0
C6
V9
 j
 ij  Z j   ij
14
1
0
0
0
0
1
V10
V11
V12
α: message from check to
variable node
β: message from variable
is the original received information from the channel
to check node
Message Passing: Check node processing (MinSum)
0
1
0
H
0
1
0
0
0
1
0
0
1
Check nodes
V1
V2
V3
1
0
0
1
0
0
C1
V4
1
0
0
0
0
1
0
1
0
1
0
0
C2
V6
0
0
1
0
1
0
C3
V6
0
0
1
1
0
0
1
0
0
0
1
0
C4
V7
0
1
0
0
0
1
1
0
0
0
0
1
0
1
1
0
0
0
C5
V8
0
0
0
1
1
0
C6
V9
V10
V11
V12
Variable nodes



 ij MS    sign ij '   min  ij '
 j ',hij ' 1, j ' j
  j ',hij ' 1, j ' j



 
Sign
15

  Sfactor


Magnitude
After check node
processing, the next
iteration starts with
another variable
node processing
(begins a new
iteration)
Code Estimation

Based on your modulation
scheme (here BPSK)
estimate the transmitted
bits
^
V
16
Z
^
V
Syndrome Check

Compute syndrome
^ T=0 (Binary multiplication)
H.V
i

Ex:


17
If syndrome =0, terminate
decoding
Else, continue another iteration
Example
Information
k-bit
Encoder
Codeword (V)
n-bit
Receivedword(λ)
n-bit
BPSK
modulation
channel
Corrected
Information
n-bit
Decoder
(iterative
MinSum)
Encoded information V= [1
0
1
0
1
0
1
0
1
1
1
1]
BPSK modulated= [-1
1
-1
1
-1
1
-1
1
-1
-1
-1
-1]
λ (Received data from channel)=
[ -9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8]
Estimated code=
^ 1 0 1
V=
18
0
1
0
0
0
1
1
1
1
Ex: Variable node processing (iteration 1)
0
1
0
H
0
1
0
0
0
1
0
0
1
1
0
0
1
0
0
1
0
0
0
0
1
C1
λ=
V2
0
0
1
0
1
0
C2
C3
V3
V4
V5
-9.1 4.9 -3.2 3.6 -1.4
1
0
0
0
1
0
C4
V6
V7
3.1
0.3
Z1 
19
0
0
1
1
0
0
0
1
0
0
0
1
1
0
0
0
0
1
C5
0
1
1
0
0
0
0
0
0
1
1
0
C6
0
0
V1
0
1
0
1
0
0
12 
V11
V9
1.6
-6.1 -2.5 -7.8 -6.8

j ' , hij ' 1
V10
V8
i1
 1
15 
V12
Ex: Check node processing (Iteration 1)
0
1
0
H
0
1
0
0
0
1
0
0
1
1
0
0
1
0
0
1
0
0
0
0
1
C1
V1

V2
V3
=-9.1 4.9 -3.2
V4
0
1
0
1
0
0
0
0
1
0
1
0
C2
V6
3.6 -1.4
| 13 |  Min {3.6,1.6 , 2.5}
0
0
1
1
0
0
1
0
0
0
1
0
C3
0
1
0
0
0
1
1
0
0
0
0
1
C4
0
1
1
0
0
0
C5
V6
V7
V8
3.1
0.3
1.6
| 14 | 
Sign (13 )  (1)(1)( 1)  1 Sign (14 ) 

20
0
0
0
1
1
0
Here assume Sfactor  1
C6
V9
V10
V11
V12
-6.1 -2.5 -7.8 -6.8
| 18 | 
| 110 | 
Sign (18 ) 
Sign (110 ) 
Ex: Code Estimation (Iteration 1)
Z=λ =
[ -9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8]
^
V
Z
^
V
^
V =1
21
0
1
0
1
0
0
0
1
1
1
1
Ex: Syndrome Check (iteration 1)

Compute syndrome
^
H.ViT=0 (Binary multiplication)
0
1
0
H
0
1
0
0
0
1
0
0
1
1
0
0
1
0
0
1
0
0
0
0
1
0
1
0
1
0
0
0
0
1
0
1
0
0
0
1
1
0
0
1
0
0
0
1
0
0
1
0
0
0
1
1
0
0
0
0
1
0
1
1
0
0
0

Syndromei  XOR (v j )
jhij | hij  0
0
0
0
1
1
0
x

1
0
1
0
1
0
0
0
1
1
1
1
0
0
1
1
1
0
0
Sum Syndrom   Syndromei
i
Sumsyndrome=2 Not ZERO => Error, continue decoding
22
Second iteration

In variable node processing, compute β, α and Z based on the algorithm
0
1
0
H
0
1
0
0
0
1
0
0
1
C1
-1.4
V1
λ=
1
0
0
0
0
1
^
V= [ 1
0
0
1
0
1
0
C3
0
0
1
1
0
0
1
0
0
0
1
0
C4
0
1
0
0
0
1
C5
1
0
0
0
0
1
0
1
1
0
0
0
0
0
0
1
1
0
C6
-1.6
V3
V4
7.1 -4.5
0
0
1
0
1
0
0
C2
V5
-9.1 4.9 -3.2 3.6 -1.4
Z= [-12.1
23
V2
1
0
0
1
0
0
1
V11
V7
V8
V9
3.1
0.3
1.6
-6.1 -2.5 -7.8 -6.8
7.7 -7.2
0
V10
V6
1
4.4 -4.2
0
1
V12
7.2 -10.0 -7.7 -8.9 -8.1]
0
1
1
1
1 ]
Ex: Syndrome Check (iteration 2)

Compute syndrome
^
H.ViT=0 (Binary multiplication)
0
1
0
H
0
1
0
0
0
1
0
0
1
1
0
0
1
0
0
1
0
0
0
0
1
0
1
0
1
0
0
0
0
1
0
1
0
0
0
1
1
0
0
1
0
0
0
1
0
0
1
0
0
0
1
1
0
0
0
0
1
0
1
1
0
0
0

Syndromei  XOR (v j )
jhij | hij  0
0
0
0
1
1
0
x

1
0
1
0
1
0
1
0
1
1
1
1
0
0
0
0
0
0
0
Sum Syndrom   Syndromei
i
Sumsyndrome= ZERO => corrected code Terminate Decoding
24
Full-Parallel Decoding
Checknodes
nodes
Check
C1
C2
C3
C4
C5
C6
Variable nodes
V1


V3
V4
V5
V6
V7
V8
λ from channel
Every check node and variable
node is mapped to a processor
All processors directly connected
based on the Tanner graph




25
V2
Very High throughput
No large memory storage elements
(e.g. SRAMs)
High routing congestion
Large delay, area, and power caused
by long global wires
V9
V10
V11
V12
init: all α = 0
Chk
1
Var
1
Chk
2
Var
2
Var
3
Chk
5
6
Var
12
Full-Parallel LDPC Decoder Examples

Ex 1: 1024-bit decoder, [JSSC 2002]
 52.5 mm2, 50% logic utilization, 160 nm CMOS
Ex 2: 2048 bit decoder, [ISCAS 2009]
 18.2 mm2, 25% logic utilization, 30 MHz,
65 nm CMOS
 CPU time for place & route>10 days
512 Chk &
1024 Var
Proc.
300
CPU time (hours)

384 Chk
& 2048
Var Proc.
200
100
For all data in the plot:
 Same automatic place & route flow is used
7
10  CPU: Quad Core, Intel Xeon 3.0GHz
0 5
6
10
10
Number of wire connections
Serial Decoder Example
Check nodes
V1
V2
V3
C1
V4
C2
V5
C3
V6
C4
V7
C5
V8
C6
V9
V10
V11
V12
Variable nodes
(2) compute V1 V2 V3
(1) initialize memory
and store
(clear contents)
Mem
Row
Col
V4 V5 V6
V7 V8 V9
V10 V11 V12
(3) …now
compute C1 C2 C3
and store C4 C5 C6
Decoding Architectures

0
Partial parallel decoders
1
 Multiple processing units
0
H
and shared memories
0
 Throughput: 100 Mbps-Gbps 1
0
 Requires Large memory
(depending on the size)
 Requires Efficient Control and
scheduling
0
0
1
0
0
1
1
0
0
1
0
0
1
0
0
0
0
1
0
1
0
1
0
0
0
0
1
0
1
0
0
0
1
1
0
0
1
0
0
0
1
0
0
1
0
0
0
1
Mem Mem Mem
Var
Var
Var
1
0
0
0
0
1
0
1
1
0
0
0
Mem
Var
Chk
Mem
Mem Mem Mem
Chk
Mem
Mem Mem Mem
0
0
0
1
1
0
Reported LDPC Decoder ASICs
10
5
Throughput (Mbps)
Partial-parallel Decoder
Full-parallel Decoder
10
10
4
10GBASE-T
3
802.11n
10
DVB-S2
2
802.11a/g
802.16e
1
10
2000
2002
2004
2006
Year
2008
2010
Throughput Across Fabrication Technologies

Existing ASIC implementations without early termination
Full-parallel decoders have the highest throughput
10
Throughput (Mbps)

10
10
10
Partial-parallel Decoder
Full-parallel Decoder
4
3
2
1
180
160
130
90
CMOS Technology (nm)
65
Energy per Decoded Bit in Different Technologies

Existing ASIC implementations without early termination
Full-parallel decoders have the lowest energy dissipation
10
2
Partial-parallel Decoder
Full-parallel Decoder
Energy per bit (nJ/bit)

10
10
10
1
0
-1
180
160
130
90
CMOS Technology (nm)
65
Circuit Area in Different Technologies
Full-parallel decoders have the largest area due to the high
routing congestion and low logic utilization
10
Area of Decoder Chip (mm2)

3
Partial-parallel Decoder
Full-parallel Decoder
10
10
10
2
1
0
180
160
130
90
CMOS Technology (nm)
65
Key optimization factors

Architectural optimization



Parallelism
Memory
Data path wordwidth (fixedpoint format)
Variable Node
Wc
β
α1...αWc
Check α
Node
λ
+
αi
i=1
λ
+
λ+
Wc
i=1
αi
αi
+
_
βi
clk
33
Architectural optimization
34
Z. Zhang JSSC 2010
BER performance versus quantization format
35
SNR(dB)
Check Node Processor
Mag Logic
β1
β2
| β2 |
Comp
IndexMin1
Min1
βn–1
| βn–1 |
βn
| βn |
Comp
Comp
Comp
Comp
βWr/Spn – 1 |βWr/Spn – 1|
βWr/Spn | βWr/Spn|
| α1 |
Min2
| αWr/Spn |
Comp
L = log2(Wr)
Sign Logic
Sign (β1)
Sign (βWr/Spn)
Sign (α1)
Sign (αWr/Spn)
37
Variable Node Processor
 ij 

Based on the variable
update equation


The same as the
original MinSum and
SPA algorithms
Variable node
hardware complexity
is mainly reduced via
wordwidth reduction

 ij'   j
j ' , hij ' 1
+
λi
α1
αwc
seven 5-bit
inputs
2's to
SM
β1
λi + αj
SM
to 2's
+
j=1:wc
SM
to 2's
+
+
7
+
7
+
7
+
8
3 to
2's
SM
-
SAT
-
SAT
6
+
+
8
5
βwc
39
Partial parallel decoder example
40
802.11ad LDPC code
41
0
1
0
H
0
0
1
42
0
0
1
0
1
0
1
0
0
1
0
0
1
0
0
0
0
1
0
1
0
1
0
0
0
0
1
0
1
0
0
0
1
1
0
0
1
0
0
0
1
0
0
1
0
0
0
1
1
0
0
0
0
1
0
1
0
0
1
0
0
0
1
1
0
0
Transmission scenario
43
Download