Algebraic Coding Theory

Honors Thesis
Ahsan Ashraf
Expected date of graduation: May 15, 2010
Dr. Arnold Feldman
Franklin & Marshall College
Department of Mathematics and Computer Science
May 13, 2010

Abstract

This project will attempt an in-depth study of algebraic coding theory. We will study the two basic kinds of codes: Block codes and trellis codes. Specifically, we will look at linear block codes, cyclic codes, Hamming codes, and convolutional codes.

Contents

1 Introduction
2 Algebraic Framework
2.1 Overview of Basic Algebra
2.2 Finite fields based on the integer and polynomial rings
3 Linear Block Codes from Matrices and Group Rings
3.1 Matrix Description
3.2 Hamming Codes
4 Cyclic Codes
4.1 Polynomial Description
4.2 Matrix Description
5 Some More Algebra
5.1 Group rings
6 Convolutional Codes
6.1 Trellis Description of Convolutional Codes
6.2 Construction of Convolutional Codes from Units: Polynomial Description
6.3 Group Ring Convolutional Codes: Matrix Description
6.4 Specific Examples
6.5 Hamming Type Convolutional Codes

1 Introduction

In modern communication systems, there is a massive amount of data that needs to be stored and communicated every day. It is vital that this data be transmitted in a reliable way, with as few errors as possible. Error-correcting codes play a fundamental role in minimizing data corruption due to certain defects in the system such as noise, interference and attenuation.
We will introduce a variety of such codes that can be used in error correction. Specifically, we will aim to understand convolutional codes.

Let's suppose that all data of interest can be represented as binary data. This means that any data can be broken down into a sequence of zeros and ones. This binary data is transmitted through a channel that may introduce errors. The basic purpose of a code is to add a check so that if an error occurs, the error can be found and corrected at the receiver. If we have a sequence of symbols of length k, the coding process encodes it into a sequence of symbols of length n. The initial sequence of k symbols is called a dataword and the final sequence of n symbols is called a codeword. The set of all possible codewords is called the code. Once the codewords are transmitted through the channel, the words received at the other end are called sensewords.

There are two basic kinds of codes: block codes and trellis codes. In general, block codes are defined over some arbitrary finite alphabet, for example the alphabet with q symbols $\{0, 1, 2, \ldots, q-1\}$. The dataword is a sequence of k of these symbols, and the code is the set of all codewords. As we have q different symbols, the code is of size $M = q^k$. Such a code is called an (n, k) code. If q = 2, we get sequences of zeros and ones. These are called binary codes. We will give several examples of binary codes later.

A trellis code is different from a block code. It takes an unending sequence of data symbols arranged in k-symbol segments called dataframes. The code puts out a continuous stream of code symbols arranged in n-symbol segments called codeframes. The major difference between the two types of codes is that in a trellis code a k-symbol dataframe can affect all codeframes that come after it. In a block code, by contrast, a k-symbol datablock only determines the next n-symbol codeblock.
We will begin with an introduction to some basic abstract algebra. We will then outline some further algebraic ideas such as extension fields, finite fields and Galois fields that we will need in order to understand algebraic coding. We can then define basic terms in coding and link ideas from abstract algebra to matrix definitions of linear block codes and their generator and check matrices. We will introduce cyclic codes and their polynomial and matrix forms, and explain how Galois fields are used in their structure. Further, we will explain Hurley's construction of unit-derived convolutional codes [4]. This will require some understanding of group rings and group ring matrices, which we will review. We will work with some specific examples and find the free distance of each of these codes. We will then discuss how these free distances can potentially be improved, giving us better codes.

2 Algebraic Framework

All good data-transmission codes rely on the structures of algebra. The tools of algebra are necessary in order to design encoders and decoders. We will devote this section to understanding topics in algebra that are significant in the development of data-transmission codes.

2.1 Overview of Basic Algebra

We will use arithmetic systems, different from the real or complex numbers, consisting of sets together with operations defined on the elements of these sets. In this section we will outline these systems.

A group G is a set, together with an operation (∗) on pairs of elements of the set, satisfying closure, associativity, existence of an identity and the existence of inverse elements. If the operation is commutative, we call this an abelian group. In this paper, we will only deal with abelian groups. The operation (∗) can then be called addition and denoted by (+). The identity element is called “zero” and the inverse of a is denoted by −a.
We can also call the operation multiplication, where the identity element is denoted by 1 and the inverse of any element a is denoted by $a^{-1}$.

Let G be a group and let H be a subset of G. Then H is a subgroup of G if H is also a group under (∗). One way of obtaining a subgroup of a finite group G is to take any element h from G and multiply it by itself repeatedly. As G is finite, only finitely many of the powers $h, h^2, h^3, \ldots$ are distinct. The first element to repeat will be h itself. Also, if $h^j = h$ then $h^{j-1} = 1$, which is the identity. So H will be the group composed of the elements $1, h, h^2, h^3, \ldots$. The set H is called the subgroup generated by h. The number of elements in H is called the order of the element h. A group that consists of all the powers of one of its elements is called a cyclic group. Therefore H is a cyclic subgroup of G.

A ring R is a set with two operations defined: addition and multiplication. It is an abelian group under addition, and it is closed and associative under multiplication, with multiplication distributing over addition. Addition in a ring is always commutative; multiplication need not be. A commutative ring is one in which multiplication is commutative. Addition in a ring has an identity called “zero”. Multiplication may or may not have an identity element. If there is an identity under multiplication, it is unique. Every element in a ring has an inverse under addition. Under multiplication, an inverse is only defined in a ring with identity. In such a ring, inverses may exist but need not. They are called right or left inverses depending on whether they multiply from the right or the left. An element that has a multiplicative inverse is called a unit.

An integral domain is a commutative ring that contains no zero-divisors. A zero-divisor is a nonzero element a of the ring for which there exists a nonzero b in the ring such that ab = 0.

A field F is a commutative ring with identity in which every nonzero element is a unit. Therefore there is an identity for addition, as well as for multiplication.
These are denoted by 0 and 1 respectively. The additive inverse of a is denoted by −a and the multiplicative inverse is denoted by $a^{-1}$. In this paper, we will be interested in finite fields. A field with q elements, if it exists, is called a finite field or a Galois field, and is denoted by GF(q). We will discuss Galois fields in much more detail later.

We will also need another system called a vector space. A familiar example is the three-dimensional space that we use in physics. We can extend this mathematically to an n-dimensional vector space. Vector spaces are defined abstractly with respect to a field. Let F be a field. The elements of F are called scalars. A set V is called a vector space over F if vector addition is defined on pairs of elements of V and scalar multiplication is defined between an element of F and an element of V. The elements of V are called vectors. V must be an abelian group under vector addition. Scalar multiplication must be distributive and associative. The zero element of V is called the origin and is denoted by $\vec{0}$. As an example, let V be the set of polynomials in x with coefficients in F = GF(q). In this space, the vectors are polynomials in x.

A set of vectors $\{v_1, \ldots, v_k\}$ is called linearly dependent if there is a set of scalars $\{a_1, \ldots, a_k\}$, not all zero, such that
\[ a_1 v_1 + a_2 v_2 + \cdots + a_k v_k = \vec{0}. \]
A set of vectors that is not linearly dependent is linearly independent. A sum such as the one on the left-hand side of the above equation is called a linear combination of vectors. No vector in a linearly independent set of vectors can be written as a linear combination of the other vectors in the set.

2.2 Finite fields based on the integer and polynomial rings

There is a method to construct a new ring, called a quotient ring, from a given commutative ring. We will specifically look at the construction of a quotient ring from the ring of integers. When the quotient ring obtained in this way is a finite integral domain, it is a field. [1]

Theorem 2.1.
For any pair of integers c and d, where d is nonzero, we can write
\[ c = dQ + s \]
for some integers Q and s, where $0 \le s < |d|$. The remainder can be written as $s = R_d[c]$. [1]

Definition 2.1. Let q be a positive integer. The quotient ring called the ring of integers modulo q, denoted by $\mathbb{Z}/(q)$ or $\mathbb{Z}_q$, is the set $\{0, \ldots, q-1\}$ with addition and multiplication defined by
\[ a + b = R_q[a + b], \qquad a \cdot b = R_q[ab]. \]

Any element $a \in \mathbb{Z}$ can be mapped into $\mathbb{Z}/(q)$ by $a' = R_q[a]$. If two elements a and b of $\mathbb{Z}$ are mapped into the same element of $\mathbb{Z}/(q)$, then they are congruent and $a = b + mq$ for some integer m. We then say that a is congruent to b modulo q, where the modulus q is the base with respect to which the congruence is computed.

To define finite fields based on polynomial rings, we will first define polynomial rings. A polynomial over any field F is a mathematical expression
\[ f(x) = f_{n-1}x^{n-1} + f_{n-2}x^{n-2} + \cdots + f_1 x + f_0 \]
where the symbol x is an indeterminate and the coefficients $f_{n-1}, \ldots, f_0$ are elements of the field F. We will be working with polynomials over a finite field GF(q). A monic polynomial is a polynomial with a leading coefficient of one. The degree of a nonzero polynomial, deg f(x), is the index of the leading coefficient. Also, deg 0 = −∞. The degree of a nonzero polynomial is always finite.

The set of all polynomials over a finite field GF(q) forms a ring if addition and multiplication are defined as the usual addition and multiplication of polynomials. This ring is denoted by GF(q)[x]. A polynomial ring is analogous to the ring of integers.

A polynomial s(x) in a polynomial ring R is divisible by the polynomial r(x) in R, or r(x) is a factor of s(x), if there exists a polynomial a(x) in R such that r(x)a(x) = s(x). A polynomial p(x) of nonzero degree that is divisible only by αp(x) and by α, where α is any nonzero field element in GF(q), is called an irreducible polynomial.

Theorem 2.2.
For any pair of polynomials c(x) and d(x), where d(x) is nonzero, we can write
\[ c(x) = d(x)Q(x) + s(x) \]
for some Q(x) and s(x), where s(x) is of degree less than that of d(x). The remainder can be written as $s(x) = R_{d(x)}[c(x)]$. [1]

We can obtain finite fields from polynomial rings by using a construction similar to the one we used to obtain finite fields from the integer ring. Suppose we have F[x], the ring of polynomials over the field F. We can choose any p(x) from F[x] and define the quotient ring by using p(x) as a modulus for polynomial arithmetic.

Definition 2.2. For any monic polynomial p(x) of nonzero degree over the field F, the ring of polynomials modulo p(x) is the set of all polynomials with degree smaller than that of p(x), together with polynomial addition and polynomial multiplication modulo p(x). This ring is conventionally denoted by $F[x]/\langle p(x)\rangle$.

Any element r(x) of F[x] can be mapped into $F[x]/\langle p(x)\rangle$ by $r(x) \to R_{p(x)}[r(x)]$. Two elements a(x) and b(x) of F[x] that map into the same element of $F[x]/\langle p(x)\rangle$ are congruent mod p(x), that is, b(x) = a(x) + Q(x)p(x) for some polynomial Q(x). [1]

Theorem 2.3. The ring of polynomials modulo a monic polynomial p(x) is a field if and only if p(x) is an irreducible polynomial. [2]

We will use finite fields extensively later, so we will need to define some more terms.

Definition 2.3. Let E be a field. A subset F of E is called a subfield of E if F itself is a field under the inherited addition and multiplication. The field E is then called an extension field of the subfield F.

We can now state the Fundamental Theorem of Field Theory.

Theorem 2.4. Let F be a field and f(x) be a nonconstant polynomial in F[x]. Then there is an extension field E of F in which f(x) has a zero. [1]

This is a standard result, so we will not prove it here. The Fundamental Theorem of Field Theory allows us to prove another theorem which will be very useful for us later.

Theorem 2.5.
Let F be a field and let $p(x) \in F[x]$ be irreducible over F. If a is a zero of p(x) in some extension E of F, then F(a) is isomorphic to $F[x]/\langle p(x)\rangle$. Furthermore, if deg p(x) = n, then every member of F(a) can be uniquely expressed in the form
\[ c_{n-1}a^{n-1} + c_{n-2}a^{n-2} + \cdots + c_1 a + c_0 \]
where $c_0, c_1, \ldots, c_{n-1} \in F$. [1]

Finite fields are very restrictive in their nature. The following theorem is an example of this. We will be using this property of finite fields later.

Theorem 2.6. For each prime p and each positive integer n, there is up to isomorphism a unique finite field of order $p^n$, and the nonzero elements of that field form a cyclic group under multiplication. [1]

Definition 2.4. A primitive element of the field GF(q) is an element α such that every nonzero field element can be expressed as a power of α. [2]

Let's look at a specific example of a finite field based on a polynomial ring. In GF(8), every nonzero element has an order that divides 7. In this field, every element except one and zero is primitive; that is, every element except one and zero has order 7, because seven is prime. We can construct GF(8) using the polynomial $p(x) = x^3 + x + 1$, which is irreducible over $\mathbb{Z}_2$. [1] Based on the primitive element α = x, we write
\[ \alpha = x, \qquad \alpha^2 = x^2, \qquad \alpha^3 = x + 1 \]
because $x^3 + \langle p(x)\rangle = -(x + 1) + \langle p(x)\rangle = x + 1 + \langle p(x)\rangle$. Similarly we get
\[ \alpha^4 = x^2 + x, \qquad \alpha^5 = x^2 + x + 1, \qquad \alpha^6 = x^2 + 1, \qquad \alpha^7 = 1 = \alpha^0. \]
Then GF(8) consists of the elements $\{0, \alpha, \alpha^2, \ldots, \alpha^7\}$, where $\alpha^7 = 1$.

3 Linear Block Codes from Matrices and Group Rings

Definition 3.1. A block code of size M over an alphabet with q symbols is a set of M q-ary sequences of length n called codewords.

The coding process takes a block of k data symbols and encodes it into an n-symbol codeword. Each codeword is a sequence of n symbols. The set of all codewords is the code. We define the rate R of a block code as R = k/n. In this section we will use properties of finite fields to construct codes.
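Since the codes in this section are built on finite field arithmetic, it may help to see the GF(8) table of powers of α from the end of Section 2.2 reproduced mechanically. The following is a minimal sketch, not part of the original development. It assumes an encoding chosen here for illustration: each field element is stored as a 3-bit integer whose bit i is the coefficient of $x^i$. The names `gf8_mul`, `poly_str` and `powers` are hypothetical.

```python
# Elements of GF(8) = Z_2[x]/<x^3 + x + 1> stored as 3-bit integers
# (an assumed encoding): bit i holds the coefficient of x^i, so 0b011 is x + 1.
P = 0b1011  # p(x) = x^3 + x + 1


def gf8_mul(a, b):
    """Multiply two field elements: carry-less product reduced modulo p(x)."""
    prod = 0
    for i in range(3):
        if (b >> i) & 1:
            prod ^= a << i          # add a(x) * x^i, coefficients in GF(2)
    for deg in (4, 3):              # cancel the x^4 and x^3 terms using p(x)
        if (prod >> deg) & 1:
            prod ^= P << (deg - 3)
    return prod


def poly_str(e):
    """Render an element as a polynomial in x."""
    terms = [t for bit, t in ((2, "x^2"), (1, "x"), (0, "1")) if (e >> bit) & 1]
    return " + ".join(terms) if terms else "0"


ALPHA = 0b010  # alpha = x
powers = [1]   # powers[k] will hold alpha^k
for _ in range(7):
    powers.append(gf8_mul(powers[-1], ALPHA))

for k, e in enumerate(powers):
    print(f"alpha^{k} = {poly_str(e)}")
```

The seven powers $\alpha^0, \ldots, \alpha^6$ come out pairwise distinct and $\alpha^7$ returns to 1, which is what it means for α to be primitive in GF(8).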
Let us recall that the set of n-tuples of elements from GF(q) is the vector space called $GF(q)^n$ under componentwise vector addition and scalar multiplication.

Definition 3.2. A linear code C is a subspace of $GF(q)^n$ over the field GF(q).

So a linear code C is a non-empty set of n-tuples over GF(q), called codewords. The sum of two codewords is a codeword and the product of a codeword with a field element is a codeword. If C is an (n, k) code, then n is the blocklength and k is the dimension of the code. Notice that C contains $q^k$ elements.

Definition 3.3. The Hamming weight, or weight, w(c) of a vector c is equal to the number of nonzero components in the vector. The minimum Hamming weight $w_{min}$ of a code C is the smallest Hamming weight of any nonzero codeword of C.

Definition 3.4. The Hamming distance d(x, y) between two q-ary sequences x and y of length n is the number of places in which x and y differ. The minimum Hamming distance $d_{min}$ (or d) of C is the Hamming distance between the pair of distinct codewords with the smallest Hamming distance. Let $C = \{c_l \mid l = 0, \ldots, M-1\}$ be a code. Then $d_{min} = \min_{i \neq j} d(c_i, c_j)$.

Theorem 3.1. For a linear code, the minimum distance $d_{min}$ satisfies
\[ d_{min} = \min_{c \neq 0} w(c) = w_{min} \]
where the minimum is over all codewords except the all-zero codeword.

Proof. Let $c_i, c_j \in C$. Then
\[ d_{min} = \min_{i \neq j} d(c_i, c_j) = \min_{i \neq j} d(c_i - c_i, c_j - c_i). \]
We can do this because subtracting the same vector from both arguments does not change the Hamming distance, and because C is a linear code, so $c_i - c_i, c_j - c_i \in C$. Then
\[ \min_{i \neq j} d(c_i - c_i, c_j - c_i) = \min_{i \neq j} d(0, c_j - c_i). \]
Now, any $c_j - c_i$ is in C, so $c_j - c_i$ is a codeword. Also, any nonzero codeword c can be written in the form $c_j - c_i$ because c = c − 0, where $c = c_j$ and $0 = c_i$. Then we can say that
\[ \min_{i \neq j} d(0, c_j - c_i) = \min_{c \neq 0} d(0, c) = \min_{c \neq 0} w(c). \]

We can use this to find a code that can correct a specific number of errors. If we want a code that fixes t errors, we need a linear code whose minimum weight satisfies $w_{min} \ge 2t + 1$.
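Theorem 3.1 can be checked by brute force on a small code. The sketch below is an illustration only: it borrows as test data the 3-by-5 binary generator matrix that appears in the example of Section 3.1, and the helper names `encode`, `weight` and `distance` are assumptions introduced here. It computes the minimum distance over all pairs of distinct codewords and, separately, the minimum weight over nonzero codewords.

```python
from itertools import product

# Assumed test data: the 3-by-5 binary generator matrix from Section 3.1.
G = [[1, 0, 0, 1, 0],
     [0, 1, 0, 0, 1],
     [0, 0, 1, 1, 1]]


def encode(a, G):
    """Return the codeword c = aG with arithmetic over GF(2)."""
    return tuple(sum(ai * gi for ai, gi in zip(a, col)) % 2 for col in zip(*G))


def weight(c):
    """Hamming weight: number of nonzero components."""
    return sum(1 for x in c if x != 0)


def distance(x, y):
    """Hamming distance: number of positions where x and y differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


# Enumerate all q^k = 2^3 codewords.
code = [encode(a, G) for a in product((0, 1), repeat=len(G))]

d_min = min(distance(x, y) for i, x in enumerate(code) for y in code[i + 1:])
w_min = min(weight(c) for c in code if any(c))
t = (d_min - 1) // 2   # largest t with d_min >= 2t + 1

print(d_min, w_min, t)
```

For this particular matrix both minima equal 2, so the largest t with $d_{min} \ge 2t + 1$ is 0: the code can detect a single error but cannot correct one. The pairwise computation needs on the order of $M^2$ comparisons, while the weight computation needs only M, which is one practical benefit of linearity.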
For example, if a codeword is transmitted and a single error is made, the Hamming distance from the received word, called the senseword, to the transmitted codeword is 1. If the distance from the senseword to every other codeword is greater than 1, then the decoder can fix this error. In general, the decoder can fix t errors if the distance from the senseword to every other codeword is larger than t. If $d_{min} \ge 2t + 1$ and a senseword is at distance at most t from a codeword, then by the triangle inequality it must be at least t + 1 away from every other codeword. Therefore we say that a code can fix t errors if $d_{min} \ge 2t + 1$. From the theorem we know that $d_{min} = w_{min}$; hence a code with $w_{min} \ge 2t + 1$ fixes t errors.

3.1 Matrix Description

We know that a linear code is a subspace of $GF(q)^n$. Any set of basis vectors for the subspace can be used as rows to form a k by n matrix G, called the generator matrix of the code. Then the row space of G is the linear code and any codeword is a linear combination of the rows of G. The rows of G are linearly independent and the number of rows, k, is the dimension of the code. This k is also the rank of the generator matrix.

To encode a dataword a, we use a one-to-one pairing of k-tuples (datawords) and codewords. Remember that a dataword is one of the initial k-tuples that we encode to get a codeword. So $a \in GF(q)^k$ and c = aG. The codeword is an n-tuple.

Let's look at an example of a simple code. Take the generator matrix
\[ G = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix} \]
and the data vector
\[ a = \begin{pmatrix} 0 & 1 & 1 \end{pmatrix}. \]
Now we can encode this a into the codeword c:
\[ c = \begin{pmatrix} 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \end{pmatrix}. \]

Definition 3.5. If W is a subspace of a vector space of n-tuples, the set of vectors orthogonal to W is called the dual space of W.

Theorem 3.2. If W, a subspace of a vector space of n-tuples, has dimension k, then $W^{\perp}$, the dual space of W, has dimension n − k. [2]

The code C has dimension k, which is the number of rows in G. C has an orthogonal complement because it is a subspace of $GF(q)^n$.
This orthogonal complement $C^{\perp}$ is the set of all vectors orthogonal to C; therefore it is also a subspace of $GF(q)^n$, and is itself a code. It is often referred to as the dual code of C. Using the theorem above, $C^{\perp}$ has dimension n − k, so any basis of $C^{\perp}$ has n − k vectors. Now define H to be a matrix with any set of basis vectors of $C^{\perp}$ as rows. We can use this matrix to check whether any n-tuple c is a codeword in C. A codeword c must be orthogonal to every row vector of H; thus
\[ cH^T = 0 \]
where $H^T$ is the transpose of H. This is called the check equation. If an n-tuple obeys this equation then it is a codeword of C. The matrix H is called the check matrix. As the check equation holds when c is any row of G, we have
\[ GH^T = 0. \]
In the previous example, we can write the check matrix as
\[ H = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \end{pmatrix}. \]

We can use the check matrix to find the minimum Hamming distance of a code. The minimum Hamming distance gives us a measure of how good a code is, because it is related to the number of errors a code can fix.

Theorem 3.3. If H is any check matrix for a linear code C, then the minimum distance of the code is equal to the smallest number of columns of H that form a linearly dependent set. [2]

So, for an (n, k) code that can fix t errors, it is sufficient to find an (n − k) by n matrix H in which every set of 2t columns is linearly independent.

3.2 Hamming Codes

There is a specific kind of linear code called a Hamming code. We will talk about binary Hamming codes here. Later we will connect Hamming codes to cyclic codes and also talk about Hamming type convolutional codes.

If we have a binary code with minimum distance 3, its check matrix must have distinct and nonzero columns, because the smallest number of columns of H that form a linearly dependent set is 3. Therefore any two columns need to form a linearly independent set. If the check matrix has m rows, each column is an m-bit binary number. There are $2^m - 1$ possible columns because we exclude zero.
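The column-counting argument can be made concrete with a short enumeration. This is a sketch for illustration, with m = 3 chosen as an assumption and the helper names `nonzero_columns` and `dependent` introduced here. It lists the nonzero binary columns of height m and checks by brute force that every pair is linearly independent over GF(2) while some triple is dependent, which by Theorem 3.3 corresponds to minimum distance 3.

```python
from itertools import combinations, product


def nonzero_columns(m):
    """All nonzero binary columns of height m, as tuples of bits."""
    return [col for col in product((0, 1), repeat=m) if any(col)]


def dependent(cols):
    """True if some nonempty subset of the binary columns sums to zero mod 2.

    Over GF(2) the only nonzero scalar is 1, so a linear dependence is
    exactly a nonempty subset of columns whose sum is the zero vector.
    """
    for r in range(1, len(cols) + 1):
        for subset in combinations(cols, r):
            if not any(sum(bits) % 2 for bits in zip(*subset)):
                return True
    return False


m = 3  # assumed example height
cols = nonzero_columns(m)
pairs_independent = all(not dependent(pair) for pair in combinations(cols, 2))
some_triple_dependent = any(dependent(tr) for tr in combinations(cols, 3))

print(len(cols), pairs_independent, some_triple_dependent)
```

For m = 3 this finds 7 columns, confirms that no two of them are dependent, and finds at least one dependent triple. The subset search is exponential in the number of columns tested, so it is only meant for checking small cases like this one.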
Let H consist of all $2^m - 1$ possible columns; then $n = 2^m - 1$ and $k = (2^m - 1) - m$. So this defines a $(2^m - 1, 2^m - 1 - m)$ code. Codes of this form are called Hamming codes.

An example of a binary Hamming code with m = 3 is given by the following check matrix.
\[ H = \begin{pmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix} \]
Every pair of columns of H is linearly independent. Some sets of three columns are dependent; therefore the minimum distance is 3 and the code can correct one error. Analogous codes can be defined with H over GF(q), $q \neq 2$.

4 Cyclic Codes

Cyclic codes are a specific kind of linear code. We will look at cyclic codes over GF(q) that have special structural properties that will prove to be useful in developing Hamming type convolutional codes.

A linear code is described over GF(q) by a check matrix H. Any vector c over GF(q) is a codeword if $cH^T = 0$. The check matrix H can be written in a more compact way if we work in an extension field of GF(q) such as $GF(q^m)$. For example, let's look at the following check matrix:
\[ H = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix}. \]
We can identify the columns of H with elements of $GF(2^3)$. As we saw in an example earlier, the polynomial $p(x) = x^3 + x + 1$ can be used to construct GF(8) with α as the primitive element represented by x. Using the first row as the coefficient of $x^0$, the second row as the coefficient of x and the last row as the coefficient of $x^2$ yields
\[ H = \begin{pmatrix} 1 & x & x^2 & x + 1 & x^2 + x & x^2 + x + 1 & x^2 + 1 \end{pmatrix} \]
or
\[ H = \begin{pmatrix} \alpha^0 & \alpha^1 & \alpha^2 & \alpha^3 & \alpha^4 & \alpha^5 & \alpha^6 \end{pmatrix}. \]
Using this check matrix over GF(8), we can define a codeword as any vector over GF(2) whose product with $H^T$, computed in the extension field GF(8), is zero.

4.1 Polynomial Description

In the example above, we said that a vector c is a codeword if $cH^T = 0$. Thus $(c_0, c_1, \ldots, c_6)(\alpha^0, \alpha^1, \ldots, \alpha^6)^T = 0$. This product can be expanded to get
\[ \sum_{i=0}^{6} c_i \alpha^i = 0. \]
This looks a lot like a polynomial.
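The equivalence between the binary check equation and the single equation over GF(8) can be tested exhaustively. The following is a minimal sketch under stated assumptions: field elements are packed as 3-bit integers with bit i holding the coefficient of $x^i$, and the names `alpha_pow`, `syndrome_binary` and `evaluate_at_alpha` are introduced here for illustration. It compares $cH^T$ over GF(2) with $\sum_i c_i \alpha^i$ over GF(8) for every binary vector of length 7.

```python
from itertools import product

# Binary check matrix from the example above; column j is alpha^j written as
# (coefficient of 1, coefficient of x, coefficient of x^2).
H = [[1, 0, 0, 1, 0, 1, 1],
     [0, 1, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 1, 1]]

# Powers of alpha in GF(8) = Z_2[x]/<x^3 + x + 1>, packed as 3-bit integers
# (assumed encoding: bit i = coefficient of x^i). Multiplying by alpha is a
# left shift followed by a reduction using x^3 = x + 1.
alpha_pow = [1]
for _ in range(6):
    e = alpha_pow[-1] << 1
    if e & 0b1000:
        e ^= 0b1011
    alpha_pow.append(e)


def syndrome_binary(c):
    """c H^T over GF(2), packed into an integer the same way as alpha_pow."""
    bits = [sum(ci * hi for ci, hi in zip(c, row)) % 2 for row in H]
    return bits[0] | (bits[1] << 1) | (bits[2] << 2)


def evaluate_at_alpha(c):
    """Sum of c_i * alpha^i in GF(8); field addition is bitwise XOR."""
    acc = 0
    for ci, a in zip(c, alpha_pow):
        if ci:
            acc ^= a
    return acc


vectors = list(product((0, 1), repeat=7))
agree = all(syndrome_binary(c) == evaluate_at_alpha(c) for c in vectors)
codewords = [c for c in vectors if evaluate_at_alpha(c) == 0]

print(agree, len(codewords))
```

The two computations agree on all 128 vectors, and exactly $2^{7-3} = 16$ of them satisfy the check equation, as expected for a (7, 4) code.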
Generally, replacing α with x, a codeword can be written as
\[ c(x) = \sum_{i=0}^{n-1} c_i x^i. \]
So evaluating the polynomial c(x) at α, we get the product of the codeword with the check matrix. Then for c(x) to be a codeword, we need c(α) = 0.

Now we will describe what it means for a linear code to be cyclic in the polynomial description. Let C be a linear code over GF(q). Then C is cyclic if whenever $c = (c_0, c_1, \ldots, c_{n-1})$ is in C, then $c' = (c_{n-1}, c_0, \ldots, c_{n-2})$ is also in C. We have shifted the components of c by one place to get c′. Repeating the shift shows that all n cyclic shifts of c are in C.

Every linear code of blocklength n over GF(q) is a subspace of $GF(q)^n$. We can relate the structure of $GF(q)^n$ to the structure of a set of polynomials. Each vector in $GF(q)^n$ is represented as a polynomial with degree less than or equal to n − 1. The coefficients of the polynomial are the components of the vector. This set of polynomials has the ring structure that we saw earlier. In this case we write the quotient ring as $GF(q)[x]/\langle x^n - 1\rangle$. Within this ring, a cyclic shift can be written as
\[ x \cdot p(x) = R_{x^n - 1}[x\,p(x)]. \]
Therefore in this representation, a code is a subset of the ring $GF(q)[x]/\langle x^n - 1\rangle$.

Pick a nonzero codeword polynomial g(x) of the smallest degree from C, and denote its degree by n − k. We can make g(x) a monic polynomial by multiplying it by a field element. Then g(x) is the only monic polynomial with degree n − k in C, because if we had another monic polynomial with degree n − k in C, we could subtract the two to get a nonzero codeword of smaller degree. This is a contradiction because we picked g(x) to be of smallest degree. This unique nonzero monic polynomial g(x) is defined as the generator polynomial of C. Multiplying this polynomial by polynomials of degree k − 1 or less, we get all other elements of the cyclic code. We will now prove this statement.
By the division algorithm, we know that for some polynomials s(x) and Q(x),
\[ c(x) = Q(x)g(x) + s(x) \]
for any c(x) that is in the code with the generator g(x). The degree of s(x) must be smaller than that of g(x). Rearranging,
\[ s(x) = c(x) - Q(x)g(x). \]
We know that $x \cdot g(x) = R_{x^n - 1}[x\,g(x)]$ is in the code because the code is cyclic; then any linear combination of the shifts $x^i g(x)$ is also in the code because the code is linear. Therefore both c(x) and Q(x)g(x) are codewords, so s(x) must be a codeword. But s(x) is of degree less than n − k, the degree of g(x). Then s(x) must be the zero polynomial, because g(x) is the nonzero codeword polynomial of smallest degree. Hence c(x) = Q(x)g(x), and since c(x) has degree at most n − 1, Q(x) has degree at most k − 1. Conversely, multiplying g(x) by any polynomial Q(x) of degree at most k − 1 gives us a codeword in the code.

Theorem 4.1. A cyclic code of blocklength n with generator g(x) exists if and only if g(x) divides $x^n - 1$. [2]

Definition 4.1. Every polynomial $p(x) = \sum_{i=0}^{r} p_i x^i$ is associated with a reciprocal polynomial, which is defined as
\[ \tilde{p}(x) = \sum_{i=0}^{r} p_{r-i} x^i = x^r p(x^{-1}). \]

If g(x) divides $x^n - 1$, the reciprocal polynomial $\tilde{g}(x) = x^r g(x^{-1})$, where r is the degree of g(x), also divides $x^n - 1$. From the last theorem, the reciprocal polynomial must also be the generator for some code. This code is said to be equivalent to the code generated by g(x).

Now, if C is a cyclic code with generator polynomial g(x), there exists a polynomial h(x) such that
\[ x^n - 1 = g(x)h(x). \]
The polynomial h(x) is called the check polynomial of C. Every codeword c(x) satisfies
\[ R_{x^n - 1}[h(x)c(x)] = 0 \]
because h(x)c(x) = h(x)g(x)a(x) for some a(x), and then $h(x)g(x)a(x) = (x^n - 1)a(x)$. Here a(x) is referred to as the data polynomial.

To study the generator polynomials of cyclic codes in more detail, we will try to find all possible generator polynomials. One way of doing this is to write $x^n - 1$ in terms of its monic irreducible factors $f_1(x)f_2(x) \cdots f_s(x)$.
These factors are distinct polynomials over GF(q). We can have $2^s - 2$ different cyclic codes of blocklength n, as we can take any subset of these prime polynomials, except all or none, and multiply them together to give a generator polynomial.

4.2 Matrix Description

We will now come back to the matrix representation from the beginning of this section. We will relate cyclic codes to their matrix representations by writing a generator matrix from the generator polynomial of a code. We know that codewords are of the form c(x) = a(x)g(x). We can translate this equation to matrix form using an extension field of GF(q). If $\gamma_j \in GF(q^m)$ for j = 1, ..., r are the zeros of g(x), then
\[ \sum_{i=0}^{n-1} c_i \gamma_j^i = 0. \]
This can be written in matrix form as $cH^T = 0$ with
\[ H = \begin{pmatrix} \gamma_1^0 & \gamma_1^1 & \cdots & \gamma_1^{n-1} \\ \gamma_2^0 & \gamma_2^1 & \cdots & \gamma_2^{n-1} \\ \vdots & \vdots & & \vdots \\ \gamma_r^0 & \gamma_r^1 & \cdots & \gamma_r^{n-1} \end{pmatrix}. \]
This matrix can be written over GF(q) by expanding each matrix element γ into a column vector based on the coefficients of γ expressed as a polynomial over GF(q). So we get a check matrix for the code.

Another way to construct the generator matrix of the code is by direct inspection. If we have a generator polynomial g(x), we can write the generator matrix as
\[ G = \begin{pmatrix}
0 & \cdots & 0 & 0 & g_{n-k} & \cdots & g_2 & g_1 & g_0 \\
0 & \cdots & 0 & g_{n-k} & g_{n-k-1} & \cdots & g_1 & g_0 & 0 \\
0 & \cdots & g_{n-k} & g_{n-k-1} & g_{n-k-2} & \cdots & g_0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
g_{n-k} & \cdots & g_1 & g_0 & 0 & \cdots & 0 & 0 & 0
\end{pmatrix}. \]
If h(x) is the check polynomial, then the check matrix can be written as
\[ H = \begin{pmatrix}
0 & \cdots & 0 & 0 & h_0 & \cdots & h_{k-2} & h_{k-1} & h_k \\
0 & \cdots & 0 & h_0 & h_1 & \cdots & h_{k-1} & h_k & 0 \\
0 & \cdots & h_0 & h_1 & h_2 & \cdots & h_k & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
h_0 & \cdots & h_{k-1} & h_k & 0 & \cdots & 0 & 0 & 0
\end{pmatrix}. \]

For example, if we have the generator polynomial $g(x) = x^3 + x + 1$ with n = 7, then
\[ G = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 \end{pmatrix}. \]
Now $h(x) = x^4 + x^2 + x + 1$. The check matrix can therefore be written as
\[ H = \begin{pmatrix} 0 & 0 & 1 & 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 & 0 \end{pmatrix}. \]
We can check that $GH^T$ does indeed give us the zero matrix.

5 Some More Algebra

In the next section we will be studying convolutional codes constructed from cyclic codes.
These codes have larger Hamming distances, which allow us to correct more errors. Before moving on to convolutional codes, we will define group rings and outline some of their properties that will be useful for us in the next section.

5.1 Group rings

For a group G and a ring R, a group ring is defined as
\[ RG = \Big\{ \sum_{g \in G} \alpha_g g \;\Big|\; \alpha_g \in R \Big\}, \]
where only a finite number of the $\alpha_g$ are nonzero. Thus the elements of G act as a basis of the group ring over R. Multiplication in the group ring is analogous to polynomial multiplication. In forming convolutional codes, we will use a cyclic group as the group G.

Zero-divisors and units are specifically useful in coding. In forming cyclic codes in the last section, we used zero-divisors. Group rings are an abundant source of zero-divisors and units that will help us in forming convolutional codes.

We can write RG-matrices of elements in RG. This way of looking at a group ring will be useful for us while looking at convolutional codes. Let $\{g_1, g_2, \ldots, g_n\}$ be a group G. The RG-matrix of an element $w = \sum_{i=1}^{n} \alpha_{g_i} g_i \in RG$, in $R_{n \times n}$, the ring of n × n matrices over R, is defined as
\[ M(RG, w) = \begin{pmatrix}
\alpha_{g_1^{-1} g_1} & \alpha_{g_1^{-1} g_2} & \cdots & \alpha_{g_1^{-1} g_n} \\
\alpha_{g_2^{-1} g_1} & \alpha_{g_2^{-1} g_2} & \cdots & \alpha_{g_2^{-1} g_n} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{g_n^{-1} g_1} & \alpha_{g_n^{-1} g_2} & \cdots & \alpha_{g_n^{-1} g_n}
\end{pmatrix}. \]
Taking $g_1$ to be the identity of G, the entry in the first row and first column is the coefficient of $g_1$, the entry in the first row and second column is the coefficient of $g_2$, and so on. More technically, the group ring RG is isomorphic to a ring of RG-matrices over R.

Theorem 5.1. Let G be a group of order n. Then the map $\sigma : RG \to R_{n \times n}$ given by $\sigma(w) = M(RG, w)$ is a ring homomorphism that is a bijection between RG and the ring of RG-matrices.

Proof. We will only prove this for the case of n = 3 with $R = \mathbb{Z}_2$. Let $G = C_3 = \langle \beta \rangle$. Then σ takes $w = a + b\beta + c\beta^2$ to
\[ \begin{pmatrix} a & b & c \\ c & a & b \\ b & c & a \end{pmatrix}. \]
This is a bijection onto the set of such matrices, because for any such matrix $M(\mathbb{Z}_2 C_3, w)$ there exists a w that maps to it, and distinct w's cannot be mapped onto the same matrix $M(\mathbb{Z}_2 C_3, w)$.
Now for this to be a homomorphism, it needs to preserve addition and multiplication. Let $x = a + b\beta + c\beta^2$ and $y = p + q\beta + r\beta^2$ be in the group ring. Then we want σ(x + y) = σ(x) + σ(y). We have
\[ \sigma(x) + \sigma(y) = \begin{pmatrix} a & b & c \\ c & a & b \\ b & c & a \end{pmatrix} + \begin{pmatrix} p & q & r \\ r & p & q \\ q & r & p \end{pmatrix} = \begin{pmatrix} a+p & b+q & c+r \\ c+r & a+p & b+q \\ b+q & c+r & a+p \end{pmatrix}. \]
But this is σ(x + y), as $x + y = (a + p) + (b + q)\beta + (c + r)\beta^2$. Therefore σ preserves addition.

Now we want σ(x · y) = σ(x) · σ(y). We can write this as
\[ \sigma(x) \cdot \sigma(y) = \begin{pmatrix} a & b & c \\ c & a & b \\ b & c & a \end{pmatrix} \begin{pmatrix} p & q & r \\ r & p & q \\ q & r & p \end{pmatrix} = \begin{pmatrix} ap+br+cq & aq+bp+cr & ar+bq+cp \\ cp+ar+bq & cq+ap+br & cr+aq+bp \\ bp+cr+aq & bq+cp+ar & br+cq+ap \end{pmatrix}. \]
On inspection we see that this is σ(x · y), as $x \cdot y = (ap + br + cq) + (aq + bp + cr)\beta + (ar + bq + cp)\beta^2$. Therefore σ preserves multiplication. Then σ is a bijective homomorphism, at least for the case of n = 3.

Let us look at a specific example of this. Consider $\mathbb{Z}_2 C_7$, where $C_7$ is the cyclic group of seven elements with generator g. In this group ring, $(1 + g + g^3)(1 + g + g^2 + g^4) = 0$. This can then be written as
\[ \begin{pmatrix}
1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix} = 0. \]

6 Convolutional Codes

We discussed in the introduction the two basic forms of codes: block codes and trellis codes. The codes we have discussed so far are all block codes. Like block codes, trellis codes divide a datastream into blocks of length k, called dataframes, which are encoded into blocks of length n called codeframes. In both cases the datastream is divided into a sequence of blocks or frames. However, with a block code, a single block of the codestream only depends on a single block of the datastream. In a trellis code, a single frame of the codestream depends on several frames of the datastream.

6.1 Trellis Description of Convolutional Codes

A type of trellis code is a convolutional code.
For a convolutional code, a datastream is divided into dataframes of $k$ symbols. Each dataframe might be only one symbol. These dataframes are fed into an encoder. The encoder can store $m$ dataframes; $m$ is called the memory of the encoder. Every time a new dataframe is added to the encoder, the $m$ stored dataframes shift forward and the oldest one is discarded. For this reason, this type of encoder is called a shift-register encoder. For each added dataframe, the encoder computes a codeframe of length $n$ using the current dataframe and the $m$ stored dataframes, each of length $k$. This codeframe is shifted out as a new dataframe is shifted in. The following figure explains the basic idea.

The set of code sequences produced by such an encoder is called an $(n, k)$ convolutional code. The code rate of such a code is $k/n$. There is a difference between the encoder and the code itself. The code is the infinite set of all infinitely long codewords that can be produced by feeding in every possible data sequence. The encoder is the device used for the encoding.

A trellis code needs to have two properties in order to be a convolutional code: time invariance and linearity. Time invariance means that if the datastream is delayed by a single dataframe (that is, a dataframe of zeros is sent in first), then the codestream is also delayed by a single codeframe. Linearity means that the codestream of a linear combination of two datastreams is the same linear combination of the codestreams of the two datastreams.

Most often, practical convolutional codes use very small values of $k$ and $n$, such as $k = 1$. In this case, the $m$ dataframes stored in the encoder are each of length one. A useful way of describing the input to the encoder is by using the time invariance of a convolutional code. Each bit or dataframe, for the case $k = 1$, can be represented as part of a polynomial over the field $GF(2)$. A bit with no time delay is represented as a $1$.
A bit with a delay of one dataframe is represented as an $x$. A delay of two dataframes is represented as $x^2$. In this way we can represent a datastream as a polynomial. For example, 1011 is represented as $x^3 + x + 1$ (reading the leftmost bit as the coefficient of the highest power). We will exploit this further in the next section.

6.2 Construction of Convolutional Codes from Units: Polynomial Description

We start with a ring $R$ which is a subring of the ring of matrices $F_{n \times n}$. In particular, the group ring $FG$ is a subring of $F_{n \times n}$ via the embedding defined in Section 5.1. First we outline the general method of constructing convolutional codes from units, and then we apply it to polynomials.

Suppose $f$ and $g$ are in $R[z, z^{-1}]$, where $R[z, z^{-1}]$ is the ring of Laurent series of finite support in $z$ with coefficients from $R$. Here finite support means that only a finite number of the coefficients are nonzero. Suppose also that $f g = 1$ in $R[z, z^{-1}]$. We can decompose $f$ and $g$ in the following way:
\[ f = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}, \qquad g = \begin{pmatrix} g_1 & g_2 \end{pmatrix}, \]
where $f_1$ is a $k \times n$ matrix, $f_2$ is an $(n - k) \times n$ matrix, $g_1$ is an $n \times k$ matrix, and $g_2$ is an $n \times (n - k)$ matrix. So
\[ \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} \begin{pmatrix} g_1 & g_2 \end{pmatrix} = 1. \]
Then we can write
\[ \begin{pmatrix} f_1 g_1 & f_1 g_2 \\ f_2 g_1 & f_2 g_2 \end{pmatrix} = 1. \]
So $f_1 g_1 = I_{k \times k}$, $f_1 g_2 = 0_{k \times (n-k)}$, $f_2 g_1 = 0_{(n-k) \times k}$, and $f_2 g_2 = I_{(n-k) \times (n-k)}$. Thus we can take $f_1$ as the generator matrix of an $(n, k)$ convolutional code. Then $g_2$ is the check matrix for this code, because $f_1 g_2 = 0_{k \times (n-k)}$. Generalizing this result, we can take any rows of $f$ to construct a generator matrix for a convolutional code. The check matrix can then be obtained using $g$.

Let us apply this construction to the case of polynomials. Let $f$ and $g$ be polynomials with $f(z) g(z) = 1$ in $R[z]$. Both $f$ and $g$ can be written in matrix form using the homomorphism described earlier. The $n \times n$ matrices can then be written as $f(z) = (f_{i,j}(z))$ and $g(z) = (g_{i,j}(z))$. Consider an element $k[z] \in F[z]^k$. This is a $k$-tuple of polynomials. If we add zeros at the end of $k[z]$, we can think of $k[z]$ as an element in $F[z]^n$.
In $F[z]^k$, $k[z]$ is a $1 \times k$ matrix, and in $F[z]^n$ it is a $1 \times n$ matrix: we have added $n - k$ zeros at the end of the $1 \times k$ matrix. Now the mapping $\gamma : F[z]^k \to F[z]^n$ can be defined by $\gamma : k(z) \mapsto k(z) f(z)$. We define the code $C$ to be the image of $\gamma$. We can define the generator matrix $G(z)$ of this code by taking the first $k$ rows of $f(z)$, because $k[z] \in F[z]^n$ has zeros in the last $n - k$ columns.

6.3 Group Ring Convolutional Codes: Matrix Description

Let $R = FG$, the group ring of some group $G$ over $F$. Using the explicit injection from $FG$ to a subring of $F_{n \times n}$, we say that $R$ is a subring of $F_{n \times n}$, where $|G| = n$. Now we can define convolutional codes using the same method as in the last section. We consider $R[z, z^{-1}] \cong RC_\infty$. So this is the group ring over $C_\infty$ with coefficients from $R = FG$. To find units in $R[z, z^{-1}]$, we need to find units in $FG$.

Let
\[ \sum_{i=0}^{n} \alpha_i z^i \times \sum_{j=0}^{t} \beta_j z^j = 1, \]
where $\alpha_n \neq 0$ and $\beta_t \neq 0$. This is considered an equation in $RC_\infty$ with non-negative powers. Comparing the coefficients of $z^0$ on either side of the equation, we see that $\beta_0 \neq 0$ and $\alpha_0 \neq 0$. Looking at the highest and lowest powers, we can see that $\alpha_n \times \beta_t = 0$ and $\alpha_0 \times \beta_0 = 1$. So we can say that $\alpha_0$ is a unit with $\beta_0$ as its inverse.

6.4 Specific Examples

Example 1. Let $R = \mathbb{Z}_2 C_4$, where $C_4$ is the cyclic group generated by $a$ and has order 4. Now if we let $\alpha_0 = a + a^2 + a^3$, $\alpha_1 = 1 + a^2$ and $\alpha_2 = a + a^3$, then $\alpha_0^2 = 1$ and $\alpha_1^2 = \alpha_2^2 = 0$. So if $w = \alpha_0 + \alpha_1 z + \alpha_2 z^2$ in $RC_\infty$, then
\[ w^2 = \alpha_0^2 + z(\alpha_0 \alpha_1 + \alpha_1 \alpha_0) + z^2(\alpha_0 \alpha_2 + \alpha_1^2 + \alpha_2 \alpha_0) + z^3(\alpha_1 \alpha_2 + \alpha_2 \alpha_1) + z^4 \alpha_2^2 = 1. \]
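The identity $w^2 = 1$ can be checked mechanically. The following is a minimal sketch rather than part of the construction itself: it assumes an element of $\mathbb{Z}_2 C_4$ is stored as a list of four bits (the coefficients of $1, a, a^2, a^3$) and a polynomial in $z$ as a list of such elements. The helper names `add`, `mul_c4` and `poly_mul` are hypothetical.

```python
# Sketch: verify w^2 = 1 for w = alpha0 + alpha1*z + alpha2*z^2 over Z2C4.
# An element of Z2C4 is a list of 4 bits: coefficients of 1, a, a^2, a^3.

N = 4  # order of the cyclic group C4


def add(x, y):
    """Add two elements of Z2C4 (coefficientwise, mod 2)."""
    return [(s + t) % 2 for s, t in zip(x, y)]


def mul_c4(x, y):
    """Multiply two elements of Z2C4: cyclic convolution mod 2."""
    out = [0] * N
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[(i + j) % N] ^= xi & yj
    return out


def poly_mul(f, g):
    """Multiply two polynomials in z whose coefficients lie in Z2C4."""
    out = [[0] * N for _ in range(len(f) + len(g) - 1)]
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = add(out[i + j], mul_c4(fi, gj))
    return out


alpha0 = [0, 1, 1, 1]  # a + a^2 + a^3
alpha1 = [1, 0, 1, 0]  # 1 + a^2
alpha2 = [0, 1, 0, 1]  # a + a^3

w = [alpha0, alpha1, alpha2]  # coefficients of z^0, z^1, z^2
w_squared = poly_mul(w, w)    # coefficients of z^0 .. z^4
print(w_squared)
```

The constant coefficient of `w_squared` is the identity of $\mathbb{Z}_2 C_4$ and every higher coefficient is zero, which is the statement $w^2 = 1$. The cross terms vanish because the ring is commutative of characteristic 2.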
Now using the injection from $R = \mathbb{Z}_2 C_4$ into the ring of $4 \times 4$ matrices over $\mathbb{Z}_2$, we can write
\[ \alpha_0 = \begin{pmatrix} 0&1&1&1 \\ 1&0&1&1 \\ 1&1&0&1 \\ 1&1&1&0 \end{pmatrix}, \qquad \alpha_1 = \begin{pmatrix} 1&0&1&0 \\ 0&1&0&1 \\ 1&0&1&0 \\ 0&1&0&1 \end{pmatrix}, \qquad \alpha_2 = \begin{pmatrix} 0&1&0&1 \\ 1&0&1&0 \\ 0&1&0&1 \\ 1&0&1&0 \end{pmatrix}. \]
Then $w$ can be written as
\[ w = \begin{pmatrix} 0&1&1&1 \\ 1&0&1&1 \\ 1&1&0&1 \\ 1&1&1&0 \end{pmatrix} + \begin{pmatrix} 1&0&1&0 \\ 0&1&0&1 \\ 1&0&1&0 \\ 0&1&0&1 \end{pmatrix} z + \begin{pmatrix} 0&1&0&1 \\ 1&0&1&0 \\ 0&1&0&1 \\ 1&0&1&0 \end{pmatrix} z^2. \]
If we take the first two rows of $w$ to generate a convolutional code, we get the following generator matrix:
\[ G = \begin{pmatrix} 0&1&1&1 \\ 1&0&1&1 \end{pmatrix} + \begin{pmatrix} 1&0&1&0 \\ 0&1&0&1 \end{pmatrix} z + \begin{pmatrix} 0&1&0&1 \\ 1&0&1&0 \end{pmatrix} z^2. \]
The last two columns of $w$ give us the check matrix,
\[ H = \begin{pmatrix} 1&1 \\ 1&1 \\ 0&1 \\ 1&0 \end{pmatrix} + \begin{pmatrix} 1&0 \\ 0&1 \\ 1&0 \\ 0&1 \end{pmatrix} z + \begin{pmatrix} 0&1 \\ 1&0 \\ 0&1 \\ 1&0 \end{pmatrix} z^2. \]
If $C$ is an $(n, k)$ cyclic code with distance $d_1$, and $\hat{C}$ is the $(n, n - k)$ dual code of $C$ with distance $d_2$, the free distance $d$ is defined as $\min\{d_1, d_2\}$. The free distance of this code is 6. The free distance can be calculated by finding the smallest number of ones that a nonzero codeword can have. In this case, consider $((1\ 0) + (0\ 1)z)G$. This gives
\[ (0\ 1\ 1\ 1) + (1\ 0\ 1\ 0)z + (0\ 1\ 0\ 1)z^2 + (1\ 0\ 1\ 1)z + (0\ 1\ 0\ 1)z^2 + (1\ 0\ 1\ 0)z^3. \]
So we get $(0\ 1\ 1\ 1) + (0\ 0\ 0\ 1)z + (1\ 0\ 1\ 0)z^3$, and we have 6 ones in this codeword.

6.5 Hamming Type Convolutional Codes

We will use the construction of Hamming type convolutional codes given in [4]. We will look at a particular example of a code that can be constructed from the Hamming cyclic $(7, 4)$ code that is generated by the polynomial $f(g) = 1 + g + g^3$. This is the same cyclic code that we mentioned in Section 2.2 and Section 5.1.

Let $C$ be the cyclic code generated by $f(g) = 1 + g + g^3$. We now define $f(z) = \alpha_0 + \alpha_1 z + \alpha_3 z^3$ in $RC_\infty$. Let $R = \mathbb{Z}_2(C_4 \times C_2)$, where $C_4$ is generated by $a$ and $C_2$ is generated by $h$. Then $C_4 \times C_2 = \{1, a, a^2, a^3, h, ah, a^2h, a^3h\}$. Assume that $\alpha_0 = 1 + h(1 + a^2)$, so that $\alpha_0^2 = 1$. Also let $\alpha_i$, for $i > 0$, be either $1 + h(a + a^2 + a^3)$ or $0$. Then $\alpha_i^2$ is $0$ for all $i$ except $i = 0$. Now $f(z)^2 = \alpha_0^2 = 1$, because $\alpha_i^2 = 0$ for $i > 0$.
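The squares claimed above can be verified directly. This is a minimal sketch under stated assumptions: an element of $\mathbb{Z}_2(C_4 \times C_2)$ is stored as a list of eight bits in the order $1, a, a^2, a^3, h, ha, ha^2, ha^3$, and the function names `mul` and `poly_mul` are hypothetical helpers introduced only for this check.

```python
# Sketch: check alpha0^2 = 1, alpha_i^2 = 0 and f(z)^2 = 1 in Z2(C4 x C2).
# An element is a list of 8 bits, the coefficients of
# 1, a, a^2, a^3, h, ha, ha^2, ha^3 (index = i + 4*e for a^i h^e).


def mul(x, y):
    """Multiply two elements of Z2(C4 x C2)."""
    out = [0] * 8
    for s, xs in enumerate(x):
        for t, yt in enumerate(y):
            i = (s % 4 + t % 4) % 4    # exponent of a
            e = (s // 4 + t // 4) % 2  # exponent of h
            out[i + 4 * e] ^= xs & yt
    return out


def poly_mul(f, g):
    """Multiply polynomials in z with coefficients in Z2(C4 x C2)."""
    out = [[0] * 8 for _ in range(len(f) + len(g) - 1)]
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            prod = mul(fi, gj)
            out[i + j] = [(p + q) % 2 for p, q in zip(out[i + j], prod)]
    return out


one = [1, 0, 0, 0, 0, 0, 0, 0]
zero = [0] * 8
alpha0 = [1, 0, 0, 0, 1, 0, 1, 0]   # 1 + h(1 + a^2)
alpha_i = [1, 0, 0, 0, 0, 1, 1, 1]  # 1 + h(a + a^2 + a^3)

# f(z) = alpha0 + alpha1*z + alpha3*z^3, with alpha1 = alpha3 = alpha_i
# and alpha2 = 0.
f = [alpha0, alpha_i, zero, alpha_i]
f_squared = poly_mul(f, f)

print(mul(alpha0, alpha0), mul(alpha_i, alpha_i))
print(f_squared)
```

Since the ring is commutative and has characteristic 2, the cross terms in `f_squared` cancel in pairs, leaving only $\alpha_0^2 = 1$ in the constant position.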
We will now use this $f(z)$ to generate a convolutional code by taking the first four rows of the $\alpha_i$. Remember that $\alpha_0$ can be written as a matrix using the homomorphism in Section 5.1. The first row of this matrix is given by the coefficients of $(1\ \ a\ \ a^2\ \ a^3\ \ h\ \ ha\ \ ha^2\ \ ha^3)$. So the first row is $(1\ 0\ 0\ 0\ 1\ 0\ 1\ 0)$. The second row is given by the coefficients of $(a^3\ \ 1\ \ a\ \ a^2\ \ ha^3\ \ h\ \ ha\ \ ha^2)$. This list is obtained by multiplying each entry of the first list by $a^3$, the inverse of $a$ in this group ring. So the second row is $(0\ 1\ 0\ 0\ 0\ 1\ 0\ 1)$. We see that the matrix shows a pattern in which the first four columns and the last four columns cycle independently of each other. So we get
\[ \alpha_0 = \begin{pmatrix} 1&0&0&0&1&0&1&0 \\ 0&1&0&0&0&1&0&1 \\ 0&0&1&0&1&0&1&0 \\ 0&0&0&1&0&1&0&1 \\ 1&0&1&0&1&0&0&0 \\ 0&1&0&1&0&1&0&0 \\ 1&0&1&0&0&0&1&0 \\ 0&1&0&1&0&0&0&1 \end{pmatrix}. \]
Let
\[ A = \begin{pmatrix} 1&0&1&0 \\ 0&1&0&1 \\ 1&0&1&0 \\ 0&1&0&1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0&1&1&1 \\ 1&0&1&1 \\ 1&1&0&1 \\ 1&1&1&0 \end{pmatrix}. \]
We can then write
\[ \alpha_0 = \begin{pmatrix} I & A \\ A & I \end{pmatrix} \quad \text{and} \quad \alpha_i = \begin{pmatrix} I & B \\ B & I \end{pmatrix}, \quad i \neq 0, \]
for each nonzero $\alpha_i$. In this formalism we define the generator matrix of the code by taking only the first four rows of $f$. The check matrix is defined by taking the last four columns of $f$. So we get the generator matrix
\[ G(z) = (I\ \ A) + (I\ \ B)z + (I\ \ B)z^3. \]
The control matrix is then
\[ H(z) = \begin{pmatrix} A \\ I \end{pmatrix} + \begin{pmatrix} B \\ I \end{pmatrix} z + \begin{pmatrix} B \\ I \end{pmatrix} z^3. \]
To find the free distance of this code, consider $w(z) = \big( \sum_{i=0}^{t} \beta_i z^i \big) G$, where each $\beta_i$ is a $1 \times 4$ matrix. Let $t(z) = \sum_{i=0}^{t} \beta_i z^i$. The weight of $w(z)$ is the sum of the weights of its coefficients, and the free distance is the minimum of this weight over all nonzero $t(z)$. We assume that $\beta_0$ is nonzero. We will consider $t(z)$ of different supports to find the minimum number of ones in the product $w(z)$.

Let us first consider $t(z) = \beta_0$. In this case, $w(z)$ has three terms. The coefficient of $1 = z^0$ is $\beta_0 (I\ \ A)$, and the coefficient of both $z$ and $z^3$ is $\beta_0 (I\ \ B)$. As two of the rows in $A$ are identical, we can add them together and get zero. For example, let $\beta_0 = (1\ 0\ 1\ 0)$. Then we get
\[ (1\ 0\ 1\ 0)G = (1\ 0\ 1\ 0\ 0\ 0\ 0\ 0) + (1\ 0\ 1\ 0\ 1\ 0\ 1\ 0)z + (1\ 0\ 1\ 0\ 1\ 0\ 1\ 0)z^3. \]
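The product $(1\ 0\ 1\ 0)G$ can be reproduced with a few lines of code. The sketch below assumes that $A$ and $B$ are the circulant matrices with first rows $(1\ 0\ 1\ 0)$ and $(0\ 1\ 1\ 1)$, as read off from $\alpha_0 = 1 + h(1 + a^2)$ and $\alpha_i = 1 + h(a + a^2 + a^3)$. The function names and the dictionary representation of $G(z)$ (power of $z$ mapped to a $4 \times 8$ coefficient matrix) are illustrative choices, not notation from the text.

```python
# Sketch: encode t(z) = (1 0 1 0) with G(z) = (I A) + (I B) z + (I B) z^3
# over Z2 and count the ones in the resulting codeword.


def circulant(first_row):
    """Circulant matrix whose row i is first_row shifted right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def hstack(left, right):
    """Place two matrices side by side."""
    return [l + r for l, r in zip(left, right)]


def vec_mat(v, m):
    """Row vector times matrix, mod 2."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) % 2
            for j in range(len(m[0]))]


I4 = circulant([1, 0, 0, 0])
A = circulant([1, 0, 1, 0])  # assumed: block of alpha0 coming from h(1 + a^2)
B = circulant([0, 1, 1, 1])  # assumed: block coming from h(a + a^2 + a^3)

# G(z) stored as {power of z: 4x8 coefficient matrix}.
G = {0: hstack(I4, A), 1: hstack(I4, B), 3: hstack(I4, B)}


def encode(t, G):
    """Multiply t(z) = sum beta_i z^i by G(z); t maps power -> 1x4 bits."""
    out = {}
    for i, beta in t.items():
        for j, Gj in G.items():
            term = vec_mat(beta, Gj)
            acc = out.get(i + j, [0] * 8)
            out[i + j] = [(x + y) % 2 for x, y in zip(acc, term)]
    return out


codeword = encode({0: [1, 0, 1, 0]}, G)
weight = sum(sum(coeff) for coeff in codeword.values())
print(codeword, weight)
```

Passing a dictionary with more than one key to `encode`, for example `{0: beta0, 1: beta1}`, gives the codeword for $t(z)$ of larger support, which is convenient for checking the case analysis by hand.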
We get the coefficient of $1 = z^0$ to be $(1\ 0\ 1\ 0\ 0\ 0\ 0\ 0)$, and the coefficient of both $z$ and $z^3$ is $(1\ 0\ 1\ 0\ 1\ 0\ 1\ 0)$. So $(I\ \ A)$ gives us a distance of 2 and $(I\ \ B)$ gives us a distance of 4, by counting the number of ones. Therefore we get at least 10 ones.

Now we consider $t(z)$ of support two. Consider $(\beta_0 + \beta_1 z)G$. Again, the first term gives us at least two ones, and the last term, that is, the coefficient of $z^4$, gives us at least four ones. There is however another term, specifically the $z^2$ term, whose coefficient also gives us at least four ones. Similarly, we check $(\beta_0 + \beta_1 z^2)G$. We get at least six ones from the first and the last terms. We also get at least another four ones from the coefficient of $z$. For $(\beta_0 + \beta_1 z^3)G$, we get at least six ones from the first and the last terms. Then we also get at least four ones from the coefficient of $z^4$. For any higher power, $(\beta_0 + \beta_1 z^t)G$, we always get an extra term with at least four ones, because the $z^{t+1}$ term does not cancel with anything.

We move on to $t(z)$ of support three. Consider $(\beta_0 + \beta_1 z + \beta_2 z^2)G$. Again, we get at least six ones from the first and last terms. We also get at least four ones from the $z^4$ term. For a highest power of two there are no other $t(z)$ with support three. Similarly, $(\beta_0 + \beta_1 z + \beta_2 z^3)G$ gives us at least six ones from the first and last terms, as well as at least four from the $z^4$ term. Also, $(\beta_0 + \beta_1 z^2 + \beta_2 z^3)G$ gives us at least six ones from the first and last terms and at least four from the $z^5$ term. We notice that for any higher orders of $t(z)$, we always get a term from the highest power of $t(z)$ and the $z$ term in $G$ that gives us at least four ones. We therefore conclude that we always get at least ten ones. Therefore the free distance of this code is ten.

Using the same construction but different generator polynomials, we get different Hamming type convolutional codes.
We can generally write the generator matrix as
\[ G(z) = (I\ \ A) + \delta_1 (I\ \ B)z + \delta_2 (I\ \ B)z^2 + \cdots + \delta_n (I\ \ B)z^n, \]
where each $\delta_i \in \{0, 1\}$. The general check matrix is then
\[ H(z) = \begin{pmatrix} A \\ I \end{pmatrix} + \delta_1 \begin{pmatrix} B \\ I \end{pmatrix} z + \delta_2 \begin{pmatrix} B \\ I \end{pmatrix} z^2 + \cdots + \delta_n \begin{pmatrix} B \\ I \end{pmatrix} z^n. \]
For example, $n = 1$ gives us the generator matrix $G(z) = (I\ \ A) + (I\ \ B)z$. This has a free distance of 6. We find this by considering $(1\ 0\ 1\ 0)G$. We again add two of the rows in $A$ which are identical to get zero. So $(I\ \ A)$ gives us a distance of 2 and $(I\ \ B)$ gives us a distance of 4. By increasing $n$, we can get codes of increasing free distance.

We can also increase the free distance of the code by shifting the coefficient $(I\ \ A)$ to any position in the generator matrix and compensating for it by dividing the check matrix by some power of $z$. When we shift $(I\ \ A)$ to the coefficient of $z^t$, we have to redefine $f$ so that $\alpha_t^2 = 1$ and $\alpha_i^2 = 0$ for all $i \neq t$. Then $f^2 = z^{2t}$. We can write this as $f \times \frac{f}{z^{2t}} = 1$. Then the first four rows of $f$ can be used to form the generator matrix, and the last four columns of $\frac{f}{z^{2t}}$ can be used to form the check matrix.

References

[1] Gallian, Joseph A. Contemporary Abstract Algebra. Boston, MA: Houghton Mifflin, 2006.
[2] Blahut, Richard E. Algebraic coding theory for Data Transmission. Cambridge University Press, 1985.
[3] Hurley, Paul, and Hurley, Ted. Block codes from matrix and group rings. World Scientific Review, Volume July 23 2008.
[4] Hurley, Paul, and Hurley, Ted. LDPC and convolutional codes from matrix and group rings. World Scientific Review, Volume July 23 2008.
[5] Hurley, Ted. Convolutional codes from units in matrix and group rings. Selected Topics in Information and Coding Theory. Galway, UK: National University of Ireland.
[6] Hurley, Paul, and Hurley, Ted. Codes from zero-divisors and units in group rings. arXiv:0710.5893.
[7] Hurley, Ted. Self-dual, dual-containing and related quantum codes from group rings. arXiv:0711.3983.