Uploaded by Noor Alshibani

Algebraic 2010cyc

advertisement
Algebraic Coding Theory
Honors Thesis
Ahsan Ashraf
Expected date of graduation: May 15, 2010
Dr. Arnold Feldman
Franklin & Marshall College
Department of Mathematics and Computer Science
May 13, 2010
Abstract
This project will attempt an in-depth study of algebraic coding
theory. We will study the two basic kinds of codes: Block codes and
trellis codes. Specifically, we will look at linear block codes, cyclic
codes, Hamming codes, and convolutional codes.
1
Contents
1 Introduction
3
2 Algebraic Framework
5
2.1
Overview of Basic Algebra . . . . . . . . . . . . . . . . . . . .
5
2.2
Finite fields based on the integer and polynomial rings . . . .
7
3 Linear Block Codes from Matrices and Group Rings
13
3.1
Matrix Description . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2
Hamming Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 Cyclic Codes
19
4.1
Polynomial Description . . . . . . . . . . . . . . . . . . . . . . 20
4.2
Matrix Description . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Some More Algebra
5.1
25
Group rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6 Convolutional Codes
28
6.1
Trellis Description of Convolutional Codes . . . . . . . . . . . 28
6.2
Construction of Convolutional Codes from Units: Polynomial
Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.3
Group Ring Convolutional Codes: Matrix Description . . . . . 32
6.4
Specific Examples . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.5
Hamming Type Convolutional Codes . . . . . . . . . . . . . . 35
2
1
Introduction
In modern communication systems, there is a massive amount of data that
needs to be stored and communicated every day. It is vital that this data be
transmitted in a reliable way, with as few errors as possible. Error-correcting
codes play a fundamental role in minimizing data corruption due to certain
defects in the system such as noise, interference and attenuation. We will
introduce a variety of such codes that can be used in error-correction. Specifically, we will aim to understand convolutional codes.
Let’s suppose that all data of interest can be represented as binary data.
This means that any data can be broken down to a sequence of zeros and
ones. This binary data is transmitted through a channel that often may encounter errors. The basic purpose of a code is to add a check so that if an
error occurs, the error can be found and corrected at the receiver.
If we have a sequence of symbols of length k, the coding process encodes it
into a sequence of symbols of length n. The initial sequence of k symbols is
called a dataword and the final sequence of n symbols is called a codeword.
The set of all possible codewords is called the code. Once the codewords are
transmitted through the channel, the words received at the other end are
called sensewords.
There are two basic kinds of codes: block codes and trellis codes. In general
block codes are defined over some arbitrary finite alphabet, for example the
alphabet with q symbols {0, 1, 2, . . . , q − 1}. So the dataword is a sequence
of k of these symbols. The code is the set of all codewords. As we have q
different symbols, the code is then of size M = q k . This code is defined as
3
an (n, k) code. Now if q = 2, we get sequences of zeros and ones. These are
called binary codes. We will give several examples of binary codes later.
A trellis code is different from block codes. It takes an unending sequence
of data symbols arranged in k-symbol segments called dataf rames. The code
puts out a continuous stream of code symbols arranged in n-symbol segments
called codef rames. The major differentiation between the two types of codes
is that in a trellis code a k-symbol dataframe can affect all codeword frames
that come after. On the contrary, in a block code, a k-symbol datablock only
determines the next n-symbol codeblock.
We will begin with an introduction to some basic abstract algebra. We
will then outline some further algebraic ideas such as extension fields, finite
fields and Galois fields that we will need in order to understand algebraic
coding. We can then define basic terms in coding and link ideas from abstract algebra to matrix definitions of linear block codes and their generator
and check matrices.
We will introduce cyclic codes and their polynomial and matrix forms. We
will explain how we use Galois fields in their structure.
Further, we will explain Hurley’s construction of unit-derived convolutional
codes [4]. This will require some understanding of group rings and group ring
matrices that we will overview. We will work with some specific examples
and solve for free distances for each of these codes. We will then talk about
how these free distances can be potentially improved giving us better codes.
4
2
Algebraic Framework
All good data-transmission codes rely on the structures of algebra. The tools
of algebra are necessary in order to design encoders and decoders. We will
devote this section to understanding topics in algebra that are significant in
the development of data-transmission codes.
2.1
Overview of Basic Algebra
We will use arithmetic systems, different from real or complex numbers, consisting of sets together with operations defined on the elements of these sets.
In this section we will outline these systems.
A group G is a set, together with an operation on pairs of elements on the
set (∗), satisfying closure, associativity, existence of an identity and the existence on inverse elements. If the operation is commutative, we call this an
abelian group. In this paper, we will only deal with abelian groups. The
operation, (∗), can then be called addition and denoted by (+). The identity
element is called “zero” and the inverse of a is denoted by −a. We can also
call the operation multiplication where the identity element is denoted as (1)
and the inverse of any element a is denoted as a−1 .
Let G be a group and let H be a subset of G. Then H is a subgroup of G if H
is also a group under (∗). One way of obtaining a subgroup of a finite group
G is to take any element h from G and multiply it by itself a number of times.
As G is finite, only a finite number of these will be unique. The first element
to repeat will be h itself. Also, if any hj = h then hj−1 = 1, which is the
identity. So H will be the group composed of the elements 1, h, h2 , h3 , . . . .
5
The set H is called the subgroup generated by h. The number of elements
in H is called the order of the element h. A group that consists of all the
powers of one of its elements is called a cyclic group. Therefore H is a cyclic
subgroup of G.
A ring R is a set with two operations defined: addition and multiplication.
It is an abelian group under addition and it is closed, associative, and distributive under multiplication. Addition in a ring is always commutative,
however, multiplication need not be. A commutative ring is one in which
multiplication is commutative. Addition in a ring has an identity called
“zero”. Multiplication may or may not have an identity element. If there is
an identity under multiplication, it is unique. Every element in a ring has an
inverse under addition. Under multiplication, an inverse is only defined in a
ring with the identity. In such a ring, inverses may exist but need not. They
can be defined as right or left inverses depending on if they multiply from
the right or the left. An integral domain is a commutative ring that contains
no zero-divisors. A zero-divisor is a non-zero element, a, of the ring if there
exists a non-zero b in the ring such that ab = 0.
A field F is a commutative ring with the identity in which every nonzero
element is a unit. Therefore there is an identity for addition, as well as for
multiplication. These are denoted as 0 and 1 respectively. The additive inverse of a is denoted as −a and the multiplicative inverse is denoted as a−1 .
In this paper, we will be interested in finite fields. A field with q elements,
if it exists, is called a finite field or a Galois field, and is denoted by GF (q).
We will talk in much detail about Galois fields later.
We will also need another system called a vector space. A familiar example
6
is the three-dimensional space that we use in physics. We can extend this
mathematically to an n-dimensional vector space. Vector spaces are defined
abstractly with respect to a field. Let F be a field. The elements of F are
called scalars. A set V is called a vector space if we define vector addition
on pairs of elements from V , and scalar multiplication on an element of F
and an element of V . The elements of V are called vectors. V must be an
abelian group under this addition. Scalar multiplication must be distributive
and associative. The zero element of V is called the origin and is denoted by
0. As an example, let V be the set of polynomials in x and F be GF (q). In
this space, the vectors are polynomials in x. A set of vectors {v1 , . . . , vk } is
called linearly dependent if there is a set of scalars {a1 , . . . , ak } not all zero,
such that
a1 v1 + a2 v2 + · · · + ak vk = ~0.
A set of vectors that is not linearly dependent is linearly independent. A
sum such as the one on the left hand side of the above equation is called
a linear combination of vectors. No vector in a linearly independent set of
vectors can be written as a linear combination of other vectors in the set.
2.2
Finite fields based on the integer and polynomial
rings
There is a method to construct a new ring, called a quotient ring, from a
given commutative ring. We will specifically look at the construction of a
quotient ring from the ring of integers. If the ring is an integral domain, this
7
construction will result in a field.[1]
Theorem 2.1. For any pair of integers c and d, where d is nonzero, we can
write c = dQ + s for some Q and s, where 0 ≤ s ≤ d − 1. The remainder
can be written as s = Rd [c]. [1]
Definition 2.1. Let q be a positive integer. The quotient ring called the ring
of integers modulo q, denoted by Z/(q) or Zq , is the set 0, . . . , q − 1 with
addition and multiplication defined by
a + b = Rq [a + b],
a · b = Rq [ab]
Any element a ∈ Z can be mapped into Z/(q) by a′ = Rq [a]. If two
elements, a and b of Z are mapped into the same element of Z/(q) then they
are congruent and a = b + mq for some integer m. Then a is congruent to
b with modulus q, where the modulus is the base with respect to which the
congruence is computed.
To define finite fields based on polynomial rings, we will first define polynomial rings. A polynomial over any field F is a mathematical expression
f (x) = fn−1 xn−1 + fn−2 xn−2 + · · · + f1 x + f0
where the symbol x is an indeterminate, and the coefficients fn−1 , . . . , f0 are
elements of the field F . We will be working with polynomials over a finite
field, GF (q). A monic polynomial is a polynomial with a leading coefficient
8
of one. The degree of a non-zero polynomial, deg f (x), is the index of the
leading coefficient. Also, deg 0 = −∞. The degree of a non-zero polynomial
is always finite.
The set of all polynomials over a finite field GF (q) forms a ring if addition
and multiplication are defined as the usual addition and multiplication of
polynomials. This ring is denoted by GF (q)[x]. A polynomial ring is analogous to the ring of integers. A polynomial s(x) in a polynomial ring R is
divisible by the polynomial r(x) in R, or r(x) is a factor of s(x), if there
exists a polynomial a(x) in R such that r(x)a(x) = s(x). A polynomial p(x)
that is divisible only by αp(x) and by α where α is any non-zero field element
in GF (q) is called an irreducible polynomial.
Theorem 2.2. For any pair of polynomials c(x) and d(x), where d(x) is
nonzero, we can write c(x) = d(x)Q(x) + s(x) for some Q(x) and s(x),
where s(x) is of degree less than d(x). The remainder can be written as
s(x) = Rd(x) [c(x)]. [1]
We can obtain finite fields from polynomial rings by using a similar construction to the one we used to obtain finite fields from the integer ring.
Suppose we have F [x], the ring of polynomials over the field F . We can
choose any p(x) from F [x], and define the quotient ring by using p(x) as a
modulus for polynomial arithmetic.
Definition 2.2. For any monic polynomial p(x) of nonzero degree over the
field F, the ring of polynomials modulo p(x) is the set of all polynomials
with a degree smaller than that of p(x), together with polynomial addition and
9
polynomial multiplication modulo p(x). This ring is conventionally denoted
by F [x]/ < p(x) >.
Any element r(x) of F [x] can be mapped into F [x]/ < p(x) > by r(x) −→
Rp(x) [r(x)]. Two elements of F [x], a(x) and b(x), that map into the same
element of F [x]/ < p(x) > are congruent mod p(x); b(x) = a(x) + Q(x)p(x)
for some polynomial Q(x). [1]
Theorem 2.3. The ring of polynomials modulo a monic polynomial p(x) is
a field if and only if p(x) is an irreducible polynomial. [2]
We will use finite fields extensively later therefore we will need to define
some more terms.
Definition 2.3. Let E be a field. A subset F of E is called a subfield of E if
F itself is a field under the inherited addition and multiplication. The field
E is then called an extension f ield of the subfield F .
We can now state the Fundamental Theorem of Field Theory.
Theorem 2.4. Let F be a field and f(x) be a nonconstant polynomial in F[x].
Then there is an extension field E of F in which f(x) has a zero. [1]
This is a standard result so we will not be proving it here. The Fundamental Theorem of Field Theory allows us to prove another theorem which
will be very useful for us later.
Theorem 2.5. Let F be a field and let p(x) ∈ F [x] be irreducible over F .
If a is a zero of p(x) in some extension E of F , then F(a) is isomorphic to
10
F [x]/ < p(x) >. Furthermore, if deg p(x) = n, then every member of F (a)
can be uniquely expressed in the form
cn−1 an−1 + cn−2 an−2 + · · · + c1 a + c0
where c0 , c1 , . . . , cn−1 ∈ F [1].
Finite fields are very restrictive in their nature. The following theorem is
an example of this. We will be using this property of finite fields later.
Theorem 2.6. For each prime p and each positive integer n, there is up to
isomorphism a unique finite field of order pn , and the nonzero elements of
that field form a cyclic group under multiplication. [1]
Definition 2.4. A primitive element of the field GF(q) is an element α
such that every nonzero field element can be expressed as a power of α. [2]
Let’s look at a specific example of a finite field based on a polynomial ring.
In GF (8), every nonzero element has an order that divides 7. In this field,
every element except one and zero is primitive, that is to say that that every
element except one and zero has order 7, because seven is prime. We can
construct GF (8) using the polynomial p(x) = x3 + x + 1. p(x) is irreducible
over Z2 .[1] Based on the primitive element α = x, we write
α = x
α 2 = x2
α3 = x + 1
11
because x3 + < p(x) >= −(x + 1)+ < p(x) >= x + 1+ < p(x) >. Similarly
we get
α 4 = x2 + x
α 5 = x2 + x + 1
α 6 = x2 + 1
α7 = 1 = α0
Then GF (8) consists of the elements: {0, α, α2 , . . . , α7 }.
12
3
Linear Block Codes from Matrices and Group
Rings
Definition 3.1. A block code of size M over an alphabet with q symbols is
a set of M q-ary sequences of length n called codewords.
The coding process takes a block of k data symbols and encodes it into an
n-symbol codeword. Each codeword is a sequence of n symbols. The set of all
codewords is the code. We will define the rate, R of a block code as R = k/n.
In this section we will use properties of finite fields to construct codes.
Let us recall that the set of n-tuples of elements from GF (q) is the vector
space called GF (q)n under vector addition and multiplication.
Definition 3.2. A linear code C is a subspace of GF (q)n over the field GF (q).
So a linear code C is a non-empty set of n-tuples over GF (q), called
codewords. The sum of two codewords is a codeword and the product of a
codeword with a field element is a codeword. If C is an (n, k) code, then n is
the blocklength and k is the dimension of the code. Notice that C contains
q k elements.
Definition 3.3. The Hamming weight, or weight, w(c) of a vector c is equal
to the number of nonzero components in the vector. The minimum Hamming
weight wmin of a code C is the smallest Hamming weight of any nonzero
codeword of C.
Definition 3.4. The Hamming distance d(x,y) between two q-ary sequences
x, and y of length n is the number of places in which x and y differ.
13
The minimum Hamming distance dmin (or d) of C is the Hamming distance
between the distinct pair of codewords with the smallest Hamming distance.
Let C = {cl |l = 0, . . . , M − 1} be a code. Then dmin = mini6=j d(ci , cj ).
Theorem 3.1. For a linear code, the minimum distance dmin satisfies
dmin = minc6=0 w(c) = wmin
where the minimum is over all codewords except the all-zero codeword
Proof. Let ci , cj ∈ C,
dmin = mini6=j d(ci , cj )
= mini6=j d(ci − ci , cj − ci )
We can do this because C is a linear code so ci − ci , cj − ci ∈ C.
mini6=j d(ci − ci , cj − ci ) = mini6=j d(0, cj − ci )
Now, any cj − ci ∈ C so cj − ci is a codeword. Also any codeword, c, can be
written in the form of cj − ci because c = c − 0 where c = cj and 0 = ci .
Then we can say that mini6=j d(0, cj −ci ) = mini6=j d(0, c) = minc6=0 w(c).
We can use this to find a code that can correct a specific number of errors.
If we want a code that fixes t errors, we need a linear code whose minimum
distance, wmin ≥ 2t + 1. For example, if a codeword is transmitted and a
single error is made the Hamming distance from the transmitted codeword,
called the senseword, to the actual word is 1. If the distance to every other
14
codeword is greater than 1, then the decoder can fix this error. In general
the decoder can fix t errors if the distance from the senseword to every other
codeword is larger than t. So if a senseword is a distance less than t away
from a codeword, it must be at least t + 1 away from every other codeword.
Therefore we say that a code can fix t errors if dmin ≥ 2t + 1. From the
theorem we know that dmin = wmin ; then a code with wmin ≥ 2t + 1 fixes t
errors.
3.1
Matrix Description
We know that a linear code is a subspace of GF (q)n . Any set of basis vectors
for the subspace can be used as rows to form a k by n matrix, G, called the
generator matrix of the code. Then the row space of G is a linear code and
any code word is a linear combination of the rows of G. The rows of G are
linearly independent and the number of rows, k, is the dimension of the code.
This k is also the rank of the generator matrix.
To encode a dataword, a, we use a one-to-one pairing of k-tuples (datawords)
and codewords. Remember that a dataword is one of the initial k-tuples that
we encode to get a codeword. So a ∈ GF (q)k , and
c = a G.
The codeword is an n-tuple.
Let’s look at an example of a simple code. Take the generator matrix,
15


1 0 0 1 0



G=
0
1
0
0
1




0 0 1 1 1
and the data vector,
a= 0 1 1 .
Now we can encode this a into the codeword c


1 0 0 1 0 


c= 0 1 1 
0 1 0 0 1 = 0 1 1 1 0


0 0 1 1 1
Definition 3.5. If W is a subspace of a vector space of n-tuples, the set of
vectors orthogonal to W is called the dual space of W.
Theorem 3.2. If W, a subspace of a vector space of n-tuples, has dimension
k, then W ⊥ , the dual space of W, has dimension n − k. [2]
The code, C, has dimension k, which is the number of rows in G. C has an
orthogonal complement because it is a subspace of GF (q)n . This orthogonal
complement C ⊥ is the set of all vectors orthogonal to C; therefore it is also
a subspace of GF (q)n , and itself is a code. It is often referred to as the dual
code of C.
Using the theorem above, C ⊥ has dimension n − k. So any basis of C ⊥ has
n − k vectors. Now define H to be a matrix with any set of basis vectors of
C ⊥ as rows. We can use this matrix to check if any n-tuple, ~c, is a codeword
16
in C. c must be orthogonal to every row vector of H; thus
cHT = 0
where HT is the transpose of H. This is called the check equation. If an
n-tuple obeys this equation then it is a codeword of C. The matrix, H is
called the check matrix. As the check equation holds if c is any row of G,
then GHT = 0. In the previous example, we can write the check matrix as


1 0 1 1 0
H=
.
0 1 1 0 1
We can use the check matrix to find the minimum Hamming distance of
a code. The minimum Hamming distance gives us a measure of how good a
code is. This is because it is related to the number of errors a code can fix.
Theorem 3.3. If H is any check matrix for a linear code C, then the minimum distance of the code is equal to the smallest number of columns of H
that form a linearly dependent set. [2]
So, for an (n, k) code that can fix t errors, it is sufficient to find an (n − k)
by n matrix H where every set of 2t columns is linearly independent.
3.2
Hamming Codes
There is a specific kind of linear code called a Hamming code. We will talk
about binary Hamming codes here. Later we will connect Hamming codes
to cyclic codes and also talk about Hamming type convolutional codes.
17
If we have a binary code with minimum distance 3, its check matrix must
have distinct and nonzero columns because the smallest number of columns
of H that form a linearly dependent set is 3. Therefore any two columns
need to form a linearly independent set. If the check matrix has m rows,
each column is an m bit binary number. There are 2m − 1 possible columns
because we exclude zero. Let H consist of all 2m − 1 possible columns; then
n = 2m − 1 and k = (2m − 1) − m. So this defines a (2m − 1, 2m − 1 − m)
code. Codes of this form are called Hamming codes.
An example of a binary Hamming code with m = 3 is given by the following
check matrix.


 1 1 0 1 1 0 0



H=
 1 0 1 1 0 1 0


0 1 1 1 0 0 1
Every pair of columns of H is linearly independent. Some sets of three
columns are dependent; therefore the minimum distance is 3 and the code
can correct one error. We can make codes that correct more errors by defining
H over GF (q), q 6= 2.
18
4
Cyclic Codes
Cyclic codes are a specific kind of linear codes. We will look at cyclic codes
over GF (q) that have special structural properties that will prove to be useful
in developing Hamming type convolutional codes.
A linear code is described over GF (q) with a check matrix, H. Any vector, c,
over GF (q) is a codeword if cHT = 0. The check matrix H can be written in
a more compact way if we work in an extension field of GF (q) like GF (q m ).
For example let’s look at the following check matrix


1 0 0 1 0 1 1


.
H=
0
1
0
1
1
1
0




0 0 1 0 1 1 1
We can identify the columns of H with elements of GF (23 ). As we saw in
an example earlier, the polynomial p(x) = x3 +x+1 can be used to construct
GF (8) with α as the primitive element represented by x. Using the first row
as the coefficient of x0 , the second row as the coefficient of x and the last row
as the coefficient of x2 yields H = (1, x, x2 , x + 1, x2 + x, x2 + x + 1, x2 + 1) or
H=
α0 α1 α2 α3 α4 α5 α6
Using this check matrix over GF (8), we can define any codeword as a vector
over GF (2) such that in the extension field GF (8), the product with HT is
zero.
19
4.1
Polynomial Description
In the example above, we said that a vector c is a codeword if cHT = 0.
Thus (c0 , c1 , . . . , c6 )(α0 , α1 , . . . , α6 )T = 0. This product can be expanded to
6
X
get
ci αi = 0. This looks a lot like a polynomial. Generally, replacing α
i=0
with x, a codeword can be written as c(x) =
n−1
X
ci xi = 0. So evaluating the
i=0
polynomial c(x) at α, we get the product of the codeword with the check
matrix. Then for c(x) to be a codeword, we need c(α) = 0.
Now we will describe what it means for a linear code to be cyclic in the polynomial description of cyclic codes. Let C be a linear code over GF (q). Then
C is cyclic if when c = (c0 , c1 , . . . , cn−1 ) is in C, then c′ = (cn−1 , c0 , . . . , cn−2 )
is also in C. We have shifted the components of c by one place to get c′ . So
we can repeat this shift a total of n times.
Every linear code of blocklength n over GF (q) is a subspace of GF (q)n .
We can relate the structure of GF (q)n to the structure of a set of polynomials.
Each vector in GF (q)n is represented as a polynomial with degree less than
or equal to n − 1. The coefficients in the polynomial are the components of
the vector. This set of polynomials has a ring structure, that we saw earlier.
In this case we write the quotient ring as GF (q)[x]/ < xn − 1 >. Within this
ring, a cyclic shift can be written as x · p(x) = Rxn −1 [xp(x)]. Therefore in
this representation, a code is a subset of the ring GF (q)[x]/ < xn − 1 >.
Pick a nonzero codeword polynomial, g(x), of the smallest degree from C,
and denote its degree by n − k. We can make g(x) a monic polynomial by
multiplying it by a field element. g(x) is then the only monic polynomial
20
with degree n − k in C because if we had another monic polynomial with
degree n − k, we could subtract the two to get something with smaller degree. Therefore this is a contradiction because we picked this polynomial
to be of smallest degree. This unique nonzero monic polynomial, g(x), is
defined as the generator polynomial of C. Multiplying this polynomial with
polynomials of degree k − 1 or less, we get all other elements of the cyclic
code. We will now prove this statement. By the division algorithm, we know
that for some polynomials, s(x) and Q(x), c(x) = Q(x)g(x) + s(x) for any
c(x) that is in the code with the generator g(x). The degree of s(x) must
be smaller than g(x). Rearranging, s(x) = c(x) − Q(x)g(x). We know that
x · g(x) = Rxn −1 [xg(x)] is in the code; then any linear combination of powers
of x also gives us something in the code because the code is linear. Also,
adding two code elements yields a code element. Therefore both, c(x) and
Q(x)g(x) are codewords so s(x) must be a codeword. But s(x) is of degree
less than n − k, the degree of g(x). Then s(x) must be the zero polynomial
because g(x) is the polynomial of smallest degree. Then Q(x) can be of degree less than k − 1. So multiplying any polynomial, Q(x) with degree less
than k − 1 gives us a codeword in the code.
Theorem 4.1. A cyclic code of blocklength n with generator g(x) exists if
and only if g(x) divides xn − 1. [2]
Definition 4.1. Every polynomial p(x) =
r
X
pi xi is associated with a recip-
i=0
r
X
rocal polynomial which is defined as p̃(x) =
i=0
21
pr−i xi = xr p(x−1 ).
If g(x) divides xn − 1, the reciprocal polynomial, g̃(x) = xr g(x−1 ), where
r is the degree of g(x), also divides xn − 1. From the last theorem, the reciprocal polynomial must also be the generator for some code. This code is
said to be equivalent to the code generated by g(x).
Now, if C is cyclic code with generator polynomial g(x), there exists a polynomial, h(x) such that
xn − 1 = g(x)h(x).
The polynomial h(x) is called the check polynomial of C. So every codeword
c(x) satisfies, Rxn −1 [h(x)c(x)] = 0 because h(x)c(x) = h(x)g(x)a(x) for some
a(x). Then h(x)g(x)a(x) = (xn − 1)a(x). a(x) is referred to as the data
polynomial.
To study the generator polynomials of cyclic codes in more detail, we will try
to find all possible generator polynomials. One way of doing this is to write
xn − 1 in terms of its monic irreducible factors f1 (x)f2 (x) . . . fs (x). These
factors are distinct polynomials over GF (q). We can have 2s − 2 different
cyclic codes of blocklength n as we can take any subset of these prime polynomials, except all or none, and multiply them together to give a generator
polynomial.
4.2
Matrix Description
We will now come back to the matrix representation at the beginning of
this section. We will relate cyclic codes to their matrix representations by
writing a generator matrix from the generator polynomial of a code. We
22
know that codewords are of the form c(x) = a(x)g(x). We can translate this
equation to matrix form using an extension field of GF (q). If γj ∈ GF (q m )
n−1
X
for j = 1, . . . , r are the zeros of g(x), then
ci γji = 0. This can be written
i=0
in a matrix form as

0
1
γ1 γ1 . . .

γ 0 γ 1 . . .
 2 2
H=.
 ..


γr0 γr1 . . .

γ1n−1 

γ2n−1 

.



γrn−1
This matrix can be written over GF (q) by expanding each matrix element γ
by a column vector based on the coefficients of γ expressed as a polynomial
over GF (q). So we get a check matrix for the code.
Another way to construct the generator matrix of the code is by direct inspection. So if we have a generator polynomial, g(x), we can write the generator
matrix as

0
...


 0


G=
 0

 ..
 .

gn−k . . .
0
gn−k
gn−k
gn−k−1 . . .
gn−k−1 gn−k−2 . . .
gn−k−1 gn−k−2 gn−k−3 . . .
g 2 g1 g0
g 1 g0
g0
0
0



0


0


.. 
.

0
If h(x) is the check polynomial, then the check matrix can then be written
as
23

0 0 0 ...

 ..
.


H=
 0 0 h0


 0 h0 h1

h0 h1 h2 . . .
...
hk−2 hk−1 hk
hk−1
hk
0
hk
0
0
...

hk−1 hk

.. 
.


0
0



0
0

0
0
For example, if we have the generator polynomial, g(x) = x3 + x + 1,
then

0

0

G=
0


1

0 0 1 0 1 1

0 1 0 1 1 0


1 0 1 1 0 0


0 1 1 0 0 0
Now h(x) = x4 + x2 + x + 1. The check matrix therefore can be written
as


 0 0 1 1 1 0 1



H=
0
1
1
1
0
1
0




1 1 1 0 1 0 0
We can check that GHT does indeed give us the zero matrix.
24
5
Some More Algebra
In the next section we will be studying convolutional codes constructed from
cyclic codes. These codes have larger Hamming distances which allow us to
correct more errors. Before moving on to convolutional codes, we will define
group rings and outline some of their properties that will be useful for us in
the next section.
5.1
Group rings
For a group G and a ring R, a group ring is defined as
X
αg g|αg ∈ R},
RG = {
g∈G
where only a finite number of αg are nonzero. Thus the elements of G act
as the basis of the group ring over R. Multiplication in the group ring is
analogous to polynomial multiplication.
In forming convolutional codes, we will use a cyclic group as the group G.
Zero-divisors and units are specifically useful in coding. In forming cyclic
codes in the last section, we used zero-divisors. Group rings are an abundant
source of zero-divisors and units that will help us in forming convolutional
codes.
We can write RG-matrices of elements in RG. This way of looking at a
group ring will be useful for us while looking at convolutional codes. Let
n
X
{g1 , g2 , . . . , gn } be a group G. The RG matrix of an element, w =
α gi g i ∈
i=1
25
RG in Rn×n , the ring of n × n matrices over R, is defined as,

−1
−1
 α g1 g1 α g1 g2 . . .

α −1
 g g αg2−1 g2 . . .
M (RG, w) =  2. 1
..
..
 ..
.
.


αgn−1 g1 αgn−1 g2 . . .

αg1−1 gn 

αg2−1 gn 

.
.. 

. 

αgn−1 gn
To form this matrix, we associate the element in the first column and the
first row to the coefficient of g1 , the second column and the first row to the
coefficient of g2 and so on.
More technically, the group ring RG is isomorphic to a ring of RG-matrices
over R.
Theorem 5.1. Let G be a group of order n. Then there is a bijective ring
homomorphism σ(w) = M (RG, w) so σ : RG → Rn×n .
Proof. We will only prove this for the case of n = 3. Let G = C3 =< β >.
Then σ takes w = a + bβ + cβ 2 to


a b c 


c a b




b c a
This is obviously bijective because for any such matrix, M (Z2 C3 , w) there
exists a w that maps to it. Also any distinct w’s cannot be mapped onto the
same matrix, M (Z2 C3 , w).
Now for this to be a homomorphism, it needs to preserve multiplication and
addition.
26
Let x = a + bβ + cβ 2 and y = p + qβ + rβ 2 be in the group ring. Then we
want σ(x + y) = σ(x) + σ(y). So






a b c  p q r  a + p b + q c + r 

 
 


 
 
σ(x) + σ(y) = 
 c a b  + r p q  =  c + r a + p b + q  .
 
 


b+q c+r a+p
q r p
b c a
But this is σ(x + y) as x + y = (a + p) + (b + q)β + (c + r)β 2 . Therefore σ
preserves addition.
Now we want σ(x · y) = σ(x) · σ(y). We can write this as





a b c  p q r  ap + br + cq aq + bp + cr ar + bq + cp

 


·r p q  = cp + ar + bq cq + ap + br cr + aq + bp .
σ(x)·σ(y) = 
c
a
b

 



 


bp + cr + aq bq + cp + ar br + cq + ap
q r p
b c a
On inspection we see that this is σ(x · y) as x · y = (ap + br + cq) + (aq +
bp + cr)β + (ar + bq + cp)β 2 . Therefore σ preserves multiplication. Then σ
is a bijective homomorphism at least for the case of n = 3.
Let us look at a specific example of this. Consider, Z2 C7 where C7 is the
cyclic group of seven elements. In this group ring, (1+g+g 3 )(1+g+g 2 +g 4 ) =
27
0. This can then be written as

1

0



0


0


1


0


1
6

1 0 1 0 0 0  1


1 1 0 1 0 0
 0


0 1 1 0 1 0  0


0 0 1 1 0 1  1



0 0 0 1 1 0
 0


1 0 0 0 1 1
 1

0 1 0 0 0 1
1

1 1 0 1 0 0

1 1 1 0 1 0



0 1 1 1 0 1


=0
0 0 1 1 1 0


1 0 0 1 1 1


0 1 0 0 1 1


1 0 1 0 0 1
Convolutional Codes
We discussed in the introduction the two basic forms of codes: block codes
and trellis codes. The codes we have discussed till now are all block codes.
Like block codes, trellis codes divide a datastream into blocks of length k,
called dataframes, which are encoded into blocks of length n called codeframes. In both cases the datastream is divided into a sequence of blocks or
frames. However using a block code, a single block of the codestream only
depends on a single block of the datastream. In a trellis code, a single frame
of the codestream depends on several frames of the datastream.
6.1
Trellis Description of Convolutional Codes
A type of trellis code is a convolutional code. For a convolutional code, a
datastream is divided into dataframes of k symbols. Each dataframe might
only be one symbol. These dataframes are fed into an encoder. The encoder
28
can store m dataframes. m is called the memory of the encoder. Every
time a new dataframe is added to the encoder, the m stored dataframes
shift forward and the oldest one is discarded. Therefore this type of an
encoder is called a shift-register encoder. For each added dataframe, the
encoder computes a codeframe of length n using the current dataframe and
the stored m dataframes, each of length k. This codeframe is shifted out as
a new dataframe is shifted in. The following figure explains the basic idea.
The set of code sequences produced by such an encoder is called an (n, k)
convolutional code. The code rate of such a code is k/n.
There is a difference between the encoder and the code itself. The code is the
infinite set of all infinitely long codewords that can be produced by feeding
every possible data sequence. The encoder is device used for the encoding.
A trellis code needs to have two properties in order to be a convolutional code:
time invariance and linearity. Time invariance means that if a dataframe is
delayed by a single dataframe, (a dataframe of zeros is sent in), then the codeframe is also delayed by a single codeframe. The linearity property says that
if we take a linear combination of two datastreams and find the codestream
29
for it, it will be the same as taking a linear combination of the codestreams
of the two datastreams.
Most often practical convolutional codes use very small values of k and n,
such as k = 1. In this case, the m dataframes stored in the encoder are
each of length one. A useful way of describing the input to the encoder is
by using the time invariance of a convolutional code. Each bit or dataframe,
for the case k = 1, can be represented as part of a polynomial over the field
GF (2). A bit with no time delay is represented as a 1. A bit with a delay
of one dataframe is represented as an x. A delay of two dataframes will be
represented as x2 . This way we can represent a datastream as a polynomial.
For example, 1011 will be represented as x3 + x + 1. We will exploit this
further in the next section.
6.2
Construction of Convolutional Codes from Units:
Polynomial Description
We start with a ring R, which is a subring of the ring of matrices Fn×n . So we
have the group ring F G which is a subring of Fn×n using the embedding defined in Section 5.1. First we will outline the general method of construction
of convolutional codes from units, then we will apply this to polynomials.
Suppose f and g are in R[z, z −1 ], where R[z, z −1 ] is the ring of Laurent series
of finite support in z with coefficients from R. Here finite support means that
only a finite number of the coefficients are nonzero. Also, f [z, z −1 ]g[z, z −1 ] =
30
1. We can decompose f and g in the following way.
 
f 1 
f =   g = g1 g2
f2
where f1 is an k × n matrix, f2 is an (n − k) × n matrix, g1 is an n × k
matrix, and g2 is an n × (n − k) matrix. So
 
f 1 
  × g1 g2 = 1.
f2
Then we can write


f 1 g 1 f 1 g 2 
=1

f2 g1 f2 g 2
So f1 g1 = Ik×k , f1 g2 = 0k×(n−k) , f2 g1 = 0(n−k)×k , and f2 g2 = I(n−k)×(n−k) .
Thus we can take f1 as the generator matrix of an (n, k) convolutional code.
g2 is then the check matrix for this code because f1 g2 = 0k×(n−k) .
Generalizing this result, we can take any rows of f to construct a generator
matrix for a convolutional code. The check matrix can then be obtained
using g.
Let us apply this construction to the case of polynomials. Let f and g be
polynomials with f (z)g(z) = 1 in R[z]. f and g can be written in a matrix
form using the homomorphism described earlier. The n × n matrices can
then be written as f (z) = (fi,j (z)) and g(z) = (gi,j (z)). Consider an element
k[z] ∈ F [z]k . This is an k-tuple of polynomials. If we add zeros at the end
of k[z], we can think of k[z] as an element in F [z]n . In F [z]k , k[z] is a 1 × k
31
matrix and in F [z]n , it is a 1 × n matrix. We have added n − k zeros at the
end of the 1 × k matrix. Now the mapping γ : F [z]k → F [z]n can be defined
by γ : k(z) → k(z)f (z). We will define the code C to be the image of γ. We
can define the generator matrix, G(z) of this code by taking the first k rows
of f (z) because k[z] ∈ F [z]n has zeros in the last n − k columns.
6.3
Group Ring Convolutional Codes: Matrix Description
Let R = F G. It is the group ring of some group G over F . Using the explicit
injection from F G to a subring Fn×n , we say that R is a subring of Fn×n
where |G| = n.
Now we can define convolutional codes using the same method as the last
section. We consider R[z, z −1 ] ∼
= RC∞ . So this is the group ring over C∞
with the coefficients from R = F G. To find units in R[z, z −1 ], we need to
find units in F G.
n
t
X
X
Let
αi z i ×
βj z j = 1 where αn 6= 0 and βt 6= 0. This is considered an
i=0
j=0
equation in RC∞ with non-negative powers. Comparing the coefficients of
z 0 on either side of the equation, we see that β0 6= 0 and α0 6= 0. Looking at
the highest and lowest powers, we can see that αn × βt = 0 and α0 × β0 = 1.
So we can say that α0 is a unit with β0 as the inverse.
32
6.4
Specific Examples
Example 1
Let R = Z2 C4 , where C4 is the cyclic group generated by a and has
order 4. Now if we let α0 = a + a2 + a3 , α1 = 1 + a2 and α2 = a + a3 ,
then α02 = 1, and α12 = α22 = 0. So if w = α0 + α1 z + α2 z 2 in RC∞ , then
w2 = α02 +z(α0 α1 +α1 α0 )+z 2 (α0 α2 +α12 +α2 α0 )+z 3 (α1 α2 +α2 α1 )+z 4 (α22 ) = 1.
Now using the injection from R = Z2 C4 into the ring of 4 × 4 matrices over
Z2 , we can write α0 as

0

1


1


1

1 1 1

0 1 1


1 0 1


1 1 0


α1 as
1

0


1


0
0 1 0

1 0 1


0 1 0


1 0 1


and α2 as
0

1


0


1
1 0 1

0 1 0


1 0 1


0 1 0
33
w can then written as

0

1

w=
1


1
1 1
0 1
1 0
1 1


1 1
 

1
 0
+

1
 1
 
0
0


0 1 0
0


1
1 0 1


z + 
0
0 1 0




1 0 1
1

1 0 1

0 1 0
 2
z
1 0 1


0 1 0
If we take the first two rows of w to generate a convolutional code, we get
the following generator matrix




0 1 0 1  2
0 1 1 1 1 0 1 0
G=
z
z + 
+
1 0 1 0
0 1 0 1
1 0 1 1


The last two columns of w give us the check matrix,

1

1

H=
0


1


1 1
 

1
 0
+

1
 1
 
0
0


0
0


1
1


z + 
0
0




1
1

1

0
 2
z
1


0
If C is an (n, k) cyclic code with distance d1 , and Cˆ is the (n, n − k) dual
code of C with distance d2 , the free distance, d is defined as min{d1 , d2 }.
The free distance of this code is 6. The free distance can be calculated by
finding the smallest number of ones that a nonzero codeword can have. In this
case, consider, ((1 0)+(0 1)z)G. This gives (0 1 1 1)+(1 0 1 0)z+(0 1 0 1)z 2 +
(1 0 1 1)z+(0 1 0 1)z 2 +(1 0 1 0)z 3 . So we get (0 1 1 1)+(0 0 0 1)z+(1 0 1 0)z 3 .
So we have 6 ones in this codeword.
34
6.5
Hamming Type Convolutional Codes
We will use the construction as given in [4] of Hamming type convolutional
codes. We will look at a particular example of the code that can be constructed from the Hamming cyclic (7, 4) code that is generated by the polynomial f (g) = 1 + g + g 3 . This is the same cyclic code that we mentioned in
Section [2.2] and Section [5.1].
Let C be the cyclic code generated by f (g) = 1 + g + g 3 . We now define
f (z) = α0 + α1 z + α3 z 3 in RC∞ . Let R = Z2 (C4 × C2 ) where C4 is generated
by a and C2 is generated by h. Then {C4 ×C2 } = {1, a, a2 , a3 , h, ah, a2 h, a3 h}.
Assume that α0 = 1 + h(1 + a2 ) so that α02 = 1. Also let αi , for i > 0, be
either 1 + h(a + a2 + a3 ) or 0. Then αi2 is 0 for all i except for i = 0.
Now f (z)2 = α02 = 1 because αi2 = 0 for i > 0. We will now use this f (z) to
generate a convolutional code by taking the first first rows of αi . Remember
that α0 can be written as a matrix given the homomorphism in [5.1]. The first
row of this matrix is given by the coefficients of (1 a a2 a3 h ha ha2 ha3 ). So
the first row is (1 0 0 0 1 0 1 0). The second row is given by the coefficients
of (a3 1 a a2 a3 h h ha ha2 ). This row is obtained by multiplying a3 , the
inverse of a in this group ring, to the first row. So the second row is given by
(0 1 0 0 0 1 0 1). We see that the matrix is showing a pattern where the first
four columns and the last four columns cycle independently of each other.
35
So we get

1


0


0



0
α0 = 

1


0



1

0
Let

1

0

A=
1


0
We can then write
and
0 0 0 1 0 1 0
1 0 0 0 1 0
0 1 0 1 0 1
0 0 1 0 1 0
0 1 0 1 0 0
1 0 1 0 1 0
0 1 0 0 0 1
1 0 1 0 0 0


0 1 0
0


1
1 0 1


 B=

1
0 1 0



1 0 1
1



1


0



1


0


0



0

1

1 1 1

0 1 1


1 0 1


1 1 0

 I A
α0 = 

A I



 I B
αi = 
 , i 6= 0.
B I
In this formalism we will define the generator matrix of the code by taking
only the first four rows of f . The check matrix is defined by taking the last
four columns of f . So we get the generator matrix to be
G(z) = (I A) + (I B)z + (I B)z 3
36
The control matrix is then
 
   
B 
A B 
H(z) =   +   z +   z 3
I
I
I
To find the free distance of this code, consider w(z) = (
t
X
βi z i )G where
i=0
βi is a 1 × 4 matrix. Let
t
X
βi z i = t(z). The free distance is given by
i=0
the sum of the weights of the coefficients of z in w(z). We assume that β0
is nonzero. We will consider t(z) of different support to see the minimum
number of one’s in the product w(z).
Let us first consider t(z) = β0 . In this case, w(z) has three terms. The
coefficient of 1 = z 0 is β0 (I A) and the coefficient of z 3 and z is β0 (I B). As
two of the rows in A are identical, we can add them together and get zero.
For example let β0 = (1 0 1 0). Then we get (1 0 1 0)G = (1 0 1 0 0 0 0 0) +
(1 0 1 0 1 0 1 0)z + (1 0 1 0 1 0 1 0)z 3 . We get the coefficient of 1 = z 0 to
be (1 0 1 0 0 0 0 0) and the coefficient of z 3 and z is (1 0 1 0 1 0 1 0). So
(I A) gives us a distance of 2 and (I B) gives us a distance of 4 by counting
the number of ones. Therefore we get at least 10 ones.
Now we consider t(z) of support two. Consider (β0 + β1 z)G. Again, the first
term gives us at least two ones and the last term, that is the coefficient of z 4 ,
gives us at least four ones. There is however another term, specifically the
z 2 term whose coefficient also gives us at least four ones. Similarly, we check
(β0 + β1 z 2 )G. We get at least six ones from the first and the last terms. We
also get at least another four ones from the coefficient of z. For (β0 + β1 z 3 )G,
37
we get at least six ones from the first and the last terms. Then we also get at
least four ones from the coefficient of z 4 . For any higher power,(β0 + β1 z t )G
we always get an extra term with at least four ones because the z t+1 does
not cancel with anything.
We move on to t(z) of support three. Consider (β0 + β1 z + β2 z 2 )G. Again,
we get at least six ones from the first and last terms. We also get at least
four ones from the z 4 term. For a highest power of two there are no other
t(z) with support three. Similarly, (β0 + β1 z + β2 z 3 )G gives us at least six
ones from first and last terms as well as at least four from the z 4 term. Also,
(β0 + β1 z 2 + β2 z 3 )G gives us at least six ones from first and last terms and at
least four from the z 5 term. We notice that for any higher orders of t(z), we
always get a term from the highest power of t(z) and the z term in G that
gives us at least four ones. We therefore conclude that we always get at least
ten ones. Therefore the free distance of this code is ten.
Using the same construction but different generator polynomials, we get
different Hamming type convolutional code. We can generally write the
generator matrix as
G(z) = (I A) + δ1 (I B)z + δ2 (I B)z 2 + · · · + δn (I B)z n
where δi = 0, 1. The general check matrix is then
 
 
 
 
B 
B 
B 
A 
H(z) =   + δ1   z + δ2   z 2 + · · · + δn   z n
I
I
I
I
38
For example, n = 1 gives us the generator matrix G(z) = (I A) + (I B)z.
This has a free distance of 6. We find this by considering (1 0 1 0)G. We
again add two of the rows in A which are identical to get zero. So (I A)
gives us a distance of 2 and (I B) gives us a distance of 4. By increasing n,
we can get codes of increasing free distance.
We can also increase the free distance of the code by shifting the coefficient (I A) to any position in the generator matrix and compensating for it
by dividing the check matrix by some power of z. By shifting (I A) to the
coefficient of z t , we have to redefine f so that αt2 = 1 and αi2 = 0 for all i 6= t.
Then f 2 = z 2t . We can write this as f × zf2t . Then the first four rows of f
can be used to form the generator matrix and the last four columns of zf2t
can be used to form the check matrix.
39
References
[1] Gallian, Joseph A. Contemporary Abstract Algebra. Boston, MA:
Houghton Mifflin, 2006.
[2] Blahut, Richard E. Algebraic coding theory for Data Transmission. Cambridge University Press, 1985.
[3] Hurley, Paul, and Hurley, Ted. Block codes from matrix and group rings.
World Scientific Review, Volume July 23 2008.
[4] Hurley, Paul, and Hurley, Ted. LDPC and convolutional codes from matrix and group rings. World Scientific Review, Volume July 23 2008.
[5] Hurley, Ted. Convolutional codes from units in matrix and group rings,
Selected Topics in Information and Coding Theory Galway, UK: National
University of Ireland.
[6] Hurley, Paul, and Hurley, Ted. Codes from zero-divisors and units in
group rings. arXiv:0710.5893.
[7] Hurley, Ted. Self-dual, dual-containing and related quantum codes from
group rings. arXiv:0711.3983.
40
Download