Concurrent Error Detection in Polynomial Basis Multiplier over $GF(2^m)$ Using Irreducible Trinomial

Hung-Wei Chang1,*, Che-Wun Chiou2, Fu-Hua Chou3, and Wen-Yew Liang1

1 Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 106, Taiwan
{t6599009, wyliang}@ntut.edu.tw
2 Department of Computer Science and Information Engineering, Ching Yun University, Chung-Li 320, Taiwan
cwchiou@cyu.edu.tw
3 Department of Electronic Engineering, Ching Yun University, Chung-Li 320, Taiwan
fhchou@cyu.edu.tw

Received 20 June 2011; Revised 27 July 2011; Accepted 10 September 2011

Abstract. Due to the rapid development of smart-phones, mobile commerce has become very popular and valuable. The communication and information security of mobile commerce depends heavily on public-key cryptosystems such as RSA. However, existing public-key cryptosystems are not well suited to resource-constrained devices such as smart-phones. Therefore, the elliptic curve cryptosystem, which has a very low cost compared to RSA, is useful and suggested for mobile commerce. Polynomial basis multiplication is the most important arithmetic operation in the elliptic curve cryptosystem. Fault-based cryptanalysis is a new and proven-effective form of cryptanalysis. To protect against this type of cryptanalysis, a simple approach is to redesign cryptosystems with concurrent error detection capability so that only error-free computed results are output. Polynomial basis multipliers generated by trinomials have the advantages of low complexity and easy VLSI implementation. However, no existing polynomial basis multiplier generated by a trinomial has concurrent error detection capability. Thus, a new polynomial basis multiplier using a trinomial with concurrent error detection capability is presented.
Compared with other existing polynomial basis multipliers using general polynomials, the proposed polynomial basis multiplier using a trinomial with concurrent error detection capability saves about 40% of the space complexity.

Keywords: Finite field arithmetic, concurrent error detection, polynomial basis multiplier, elliptic curve cryptosystem, fault based cryptanalysis

*Correspondence author
Journal of Computers Vol. 22, No. 3, October 2011

1 Introduction

Recently, finite field arithmetic operations have attracted much attention because of their importance and practical applications in error-correcting codes [1], cryptography [2], digital signal processing [3], [4], switching theory [5], pseudorandom number generation [6], encoding of Reed-Solomon codes [7], and solving the Wiener-Hopf equation [8]. In particular, two public-key cryptography schemes, the elliptic and hyperelliptic curve cryptosystems [9]-[11], require arithmetic operations to be performed in a finite field. The finite field arithmetic operations include addition, multiplication, division, inversion, and exponentiation. Addition is a simple bit-by-bit XOR operation. Complicated operations such as division, inversion, and exponentiation can be performed by repeated multiplications. Moreover, elliptic scalar multiplication is based on multiplication over $GF(2^m)$. Thus, multiplication over $GF(2^m)$ is the most important operation in finite field arithmetic.

There are three popular types of bases over finite fields: polynomial basis (PB) [12]-[17], normal basis (NB) [18]-[21], and dual basis (DB) [22], [23]. Each basis representation has its own distinct advantages. With the advantages of low design complexity, simplicity, regularity, and modularity in architecture, polynomial basis multipliers [12]-[17] are widely used for producing efficient VLSI multiplier implementations.
The major advantage of the normal basis [18]-[21] is that the squaring of an element can be performed simply by cyclically shifting its binary form. Compared with the other two bases, the dual basis multiplier [22], [23] requires less chip area.

Side-channel attacks are a powerful form of analysis, and differential fault analysis, which deliberately injects faults into cryptographic devices, requires only a small amount of side-channel information to break common ciphers. Side-channel attacks have been proven to be a useful cryptanalysis technique against symmetrical and asymmetrical encryption algorithms. Kelsey et al. [24] claimed that differential fault analysis requires only 50 to 200 ciphertext blocks to recover the key of the symmetrical block cipher Data Encryption Standard (DES) by deliberate fault injection. Biham and Shamir [25], and Boneh et al. [26], also developed fault-based cryptanalysis on symmetrical and asymmetrical cryptosystems, respectively. Therefore, the simplest way to protect the encryption/decryption circuit from an attacker is to have the computation device verify the correctness of its result before outputting it.

To output error-free values, several error detection schemes have been presented for symmetrical and asymmetrical cryptosystems [27]-[32]. Fenn et al. [31] proposed an on-line error detection scheme for bit-serial multipliers in $GF(2^m)$ using parity prediction. By applying the same parity checking approach, Reyhani-Masoleh and Hasan [32] provided error detection methods for bit-parallel and bit-serial polynomial basis multipliers in $GF(2^m)$. Although their design costs extra hardware, the probability of error detection of the bit-serial multiplier is about 100%. Moreover, their design can be applied to any irreducible polynomial defining the field. In addition, their design can be easily extended and applied to other $GF(2^m)$ multipliers.
However, the problem with adopting parity checking is that it takes a long time to generate the parity. An XOR tree is utilized for computing the parity, and it requires at least $\lceil \log_2 m \rceil$ XOR-gate delays to determine the output parity. The value of m in practical cryptosystems is typically equal to 512 or more. Therefore, the parity checking method cannot provide on-line error detection capability in systolic array multipliers with bit-parallel output. However, Lee et al. [23] have solved this difficulty using the parity checking method in the case of dual basis representation. In [33], Chiou provided a concurrent error detection (CED) scheme for a special case of polynomial basis representation, termed the all-one polynomial. In [34], Chiou et al. provided a concurrent error detection scheme for the general case of polynomial basis representation. Chiou et al. [35] also developed a finite field Montgomery multiplier with concurrent error detection capability in 2006. These works by Chiou et al. employ the concept of REcomputing with Shifted Operands (RESO) [36], [37].

However, no finite field polynomial multiplier using an irreducible trinomial and having concurrent error detection capability can be found in the literature to date. Therefore, in this study we redesign a polynomial basis multiplier using an irreducible trinomial to have concurrent error detection capability. This work employs the RESO [36], [37] and equality check [38] methods.

The organization of this paper is as follows: Section 2 briefly reviews the mathematical background. Section 3 introduces the proposed PB multiplier with trinomial. The novel PB multiplier with concurrent error detection capability is discussed in Section 4. Finally, results and conclusions are given in Sections 5 and 6, respectively.

2 Preliminaries

Let $GF(2^m)$ be an extension field of the ground field $GF(2)$. $GF(2^m)$ is a vector space over $GF(2)$.
Suppose that the finite field $GF(2^m)$ is generated by the irreducible polynomial $P(x) = p_0 + p_1 x + p_2 x^2 + \dots + p_{m-1} x^{m-1} + p_m x^m$ of degree m over $GF(2)$. Let A(x), B(x) and C(x) be elements of $GF(2^m)$, where C(x) is the product of A(x) and B(x), i.e., $C(x) = A(x)B(x) \bmod P(x)$. Then A(x), B(x) and C(x) can be expressed as follows:

$$A(x) = a_0 + a_1 x + \dots + a_{m-1} x^{m-1},$$
$$B(x) = b_0 + b_1 x + \dots + b_{m-1} x^{m-1},$$
$$C(x) = c_0 + c_1 x + \dots + c_{m-1} x^{m-1}.$$

Using Horner's rule, C(x) can be obtained by

$$C(x) = A(x)B(x) \bmod P(x) = a_0 B(x) \bmod P(x) + a_1 x B(x) \bmod P(x) + \dots + a_{m-1} x^{m-1} B(x) \bmod P(x) \qquad (1)$$

Then,

$$C(x) = A(x)B(x) = \sum_{i=0}^{m-1} a_i \times B^{(i)}(x) = \sum_{i=0}^{m-1} c_i \times x^i,$$

where $B^{(i)}(x) = x^i \times B(x) \bmod P(x)$, $c_i \in \{0, 1\}$ and $0 \le i \le m-1$.

Let the polynomial P(x) have the form $x^m + x^n + 1$ with $m > n > 0$, which is known as a trinomial. Since $P(x) = 0$, the relation $x^m = x^n + 1$ can be used to reduce a higher-order term $x^p$, $p \ge m$, to a polynomial of degree less than m.

The RESO method [36], [37] is based on the time redundancy technique. It is utilized to achieve concurrent error detection capability and is briefly reviewed as follows. Fig. 1 shows the flow chart of the RESO method. Let x be the input to a function unit F and F(x) be the desired output. The RESO method computes F(x) twice within a specified time frame on the same unit F. Suppose that the functions b and $b^{-1}$ have the property $b^{-1}(F(b(x))) = F(x)$, where $b^{-1}$ is the inverse function of b, so that $b^{-1}(b(x)) = x$. During the first step, the input x is applied to the function unit F, and F(x) is computed and stored in the latch.
During the second step, the input b(x) is applied to the function unit F, and the computed result F(b(x)) is converted back to F(x) by $b^{-1}$, since $b^{-1}(F(b(x))) = F(x)$. The results given by both steps are compared. If they are not equal, the error signal outputs a logical one to indicate the presence of an error. Recently, Wang-Lin [39] and Lee et al. [34] proposed methods for designing the systolic and semi-systolic arrays with error detection, respectively.

Fig. 1. RESO method (Step 1: x is applied to F and the result is latched; Step 2: b(x) is applied to F, followed by $b^{-1}$; an equality comparison of the two results drives the Error Signal)

3 Proposed Polynomial Basis Multiplier

In the following paragraphs, the proposed polynomial basis multiplier using a trinomial is described. The multiplication in Eq. (1) can be expressed as follows:

$$C(x) = A(x)B(x) \bmod P(x) = \left( a_0 B(x) \bmod P(x) + a_1 x B(x) \bmod P(x) + \dots + a_{m-1} x^{m-1} B(x) \bmod P(x) \right) \bmod P(x)$$

where $P(x) = x^m + x^n + 1$ and $m > n > 0$. Then B(x) in Eq. (1) can be rewritten as:

$$B(x) = b_n x^n + b_{n+1} x^{n+1} + \dots + b_{m-1} x^{m-1} + b_0 + b_1 x + \dots + b_{n-1} x^{n-1} \qquad (2)$$

Therefore, $xB(x) \bmod P(x)$ can be written as

$$xB(x) \bmod P(x) = \left( b_n x^{n+1} + b_{n+1} x^{n+2} + \dots + b_{m-1} x^m + b_0 x + \dots + b_{n-1} x^n \right) \bmod P(x)$$
$$= (b_{m-1} + b_{n-1}) x^n + b_n x^{n+1} + \dots + b_{m-2} x^{m-1} + b_{m-1} + b_0 x + \dots + b_{n-2} x^{n-1} \qquad (3)$$

As aforementioned,

$$B^{(1)}(x) = xB(x) = b_{n-1+m} x^n + b_n x^{n+1} + \dots + b_{m-2} x^{m-1} + b_{m-1} + b_0 x + \dots + b_{n-2} x^{n-1} \qquad (4)$$

where $b_{n-1+m} = b_{n-1} + b_{m-1}$. Therefore,

$$B^{(i)}(x) = xB^{(i-1)}(x) = b_{(n-i)+m} x^n + b_{(n-i+1)+m} x^{n+1} + \dots + b_{(n-1)+m} x^{n+i-1} + b_n x^{n+i} + \dots + b_{(n-1-i)} x^{n-1} \qquad (5)$$

where

$$b_{(n-i)+m} = \begin{cases} b_{(n-i)} + b_{(m-i)} & \text{if } i \le m-n \\ b_{(m-i)+m} + b_{(n-i)} & \text{if } i > m-n \end{cases} \qquad (6)$$

If $B^{(0)}(x) = B(x)$, then Eq. (1) can be rewritten as:

$$C(x) = a_0 B^{(0)}(x) + a_1 B^{(1)}(x) + \dots + a_{m-1} B^{(m-1)}(x) \qquad (7)$$

As mentioned above, $x^m = x^n + 1$ because $P(x) = 0$. Addition and subtraction are the same in $GF(2^m)$; thus the equation $x^m = x^n + 1$ can be rewritten as $x^n = x^m + 1$.
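As an illustration of Eq. (7), the following Python sketch computes $C(x) = A(x)B(x) \bmod P(x)$ for a trinomial $P(x) = x^m + x^n + 1$ by repeatedly forming $xB(x) \bmod P(x)$ with the substitution $x^m = x^n + 1$. It is a software model of the arithmetic only, not of the systolic array; the function names and the packing of coefficients into an integer (bit i holds the coefficient of $x^i$) are assumptions made for this example.

```python
def mul_x_mod_trinomial(b: int, m: int, n: int) -> int:
    """Return x*B(x) mod (x^m + x^n + 1).

    Assumption for this sketch: bit i of `b` is the coefficient b_i.
    """
    b <<= 1
    if (b >> m) & 1:
        # An x^m term appeared: cancel it and add x^n + 1 (x^m = x^n + 1).
        b ^= (1 << m) | (1 << n) | 1
    return b


def pb_multiply(a: int, b: int, m: int, n: int) -> int:
    """Return A(x)*B(x) mod (x^m + x^n + 1), accumulating a_i * B^(i)(x)."""
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b                          # addition in GF(2^m) is XOR
        b = mul_x_mod_trinomial(b, m, n)    # B^(i+1)(x) = x * B^(i)(x) mod P(x)
    return c
```

For m = 5 and n = 2, `pb_multiply(0b10000, 0b00010, 5, 2)` multiplies $x^4$ by $x$ and gives `0b00101`, that is $x^2 + 1$, in agreement with $x^5 = x^2 + 1$.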
Based on both equations, $x^m = x^n + 1$ and $x^n = x^m + 1$, the architecture of the proposed polynomial basis multiplier, which is suitable for adding concurrent error detection capability, is discussed as follows.

3.1 Employing $x^m = x^n + 1$

By applying Eq. (7),

$$C(x) = a_0 B^{(0)}(x) + a_1 B^{(1)}(x) + \dots + a_{m-1} B^{(m-1)}(x),$$

where $B(x) = b_0 + b_1 x + b_2 x^2 + \dots + b_{n-1} x^{n-1} + b_n x^n + \dots + b_{m-1} x^{m-1}$. According to the above equation, Fig. 2 shows the feedback structure, and the initial value of register D is assigned as follows:

$$D_0 = b_0,\ D_1 = b_1,\ D_2 = b_2,\ \dots,\ D_{n-1} = b_{n-1},\ D_n = b_n,\ \dots,\ D_{m-1} = b_{m-1}.$$

Fig. 2. Feedback structure by using $x^m = x^n + 1$ (registers $D_0, D_1, D_2, \dots, D_{n-1}, D_n, \dots, D_{m-1}$ with one XOR in the feedback path)

3.2 Utilizing $x^n = x^m + 1$

The product $C^{*}(x)$ can be presented as follows:

$$C^{*}(x) = A(x) \times B^{*}(x) = A(x) \times \left( b^{*}_{n+1} x^{n+1} + b^{*}_{n+2} x^{n+2} + \dots + b^{*}_{m-1} x^{m-1} + b^{*}_0 x^0 + b^{*}_1 x^1 + \dots + b^{*}_{n-1} x^{n-1} + b^{*}_n x^n \right) \qquad (8)$$

where

$$b^{*}_n x^n = b^{*}_n (x^m + 1) = b^{*}_n x^m + b^{*}_n x^0.$$

Therefore,

$$C^{*}(x) = A(x) \times \left( b^{*}_{n+1} x^{n+1} + b^{*}_{n+2} x^{n+2} + \dots + b^{*}_{m-1} x^{m-1} + b^{*}_n x^m + (b^{*}_0 + b^{*}_n) x^0 + b^{*}_1 x^1 + \dots + b^{*}_{n-1} x^{n-1} \right) \qquad (9)$$

According to Eq. (9), Fig. 3 depicts the feedback structure, and the initial value of register D is loaded as follows:

$$D_0 = b^{*}_{n+1},\ D_1 = b^{*}_{n+2},\ \dots,\ D_{m-n-1} = b^{*}_{m-1},\ D_{m-n} = b^{*}_n,\ D_{m-n+1} = (b^{*}_0 + b^{*}_n),\ \dots,\ D_{m-1} = b^{*}_{n-1}.$$

Fig. 3. Feedback structure by using $x^n = x^m + 1$ (registers $D_0, D_1, \dots, D_{m-n-1}, \dots, D_{m-1}$ with one XOR in the feedback path)

3.3 Combining $x^m = x^n + 1$ and $x^n = x^m + 1$ for Concurrent Error Detection

According to the relation between m and 2n, three cases for the combined structure are discussed separately below.

Case 1: m = 2n

$$B(x) = b_0 + b_1 x + \dots + b_{n-1} x^{n-1} + b_n x^n + \dots + b_{m-1} x^{m-1}$$
$$B^{*}(x) = b^{*}_{n+1} x^{n+1} + b^{*}_{n+2} x^{n+2} + \dots + b^{*}_{m-1} x^{m-1} + b^{*}_n x^m + \dots + b^{*}_{n-1} x^{n-1} \qquad (10)$$

$b_{m-n-1}$ and $b_{n-1}$ can be in the same bit position.

Case 2: m > 2n

$$B(x) = b_0 + b_1 x + \dots + b_{n-1} x^{n-1} + b_n x^n + \dots + b_{m-1} x^{m-1}$$
$$B^{*}(x) = b^{*}_{n+1} x^{n+1} + b^{*}_{n+2} x^{n+2} + \dots + b^{*}_{n+m} x^{n+m} + b^{*}_{n+m+1} x^{n+m+1} + \dots + b^{*}_{m-1} x^{m-1} + b^{*}_n x^m + \dots + b^{*}_{n-1} x^{n-1} \qquad (11)$$

The bit position of $b_{m-n-1}$ is behind that of $b_{n-1}$.

Case 3: m < 2n

$$B(x) = b_0 + b_1 x + \dots + b_{n-1} x^{n-1} + b_n x^n + \dots + b_{m-1} x^{m-1}$$
$$B^{*}(x) = b^{*}_{n+1} x^{n+1} + b^{*}_{n+2} x^{n+2} + \dots + b^{*}_{m-1} x^{m-1} + b^{*}_n x^m + \dots + b^{*}_{n+m} x^{n+m} + b^{*}_{n+m+1} x^{n+m+1} + \dots + b^{*}_{n-1} x^{n-1} \qquad (12)$$

The bit position of $b_{m-n-1}$ is in front of that of $b_{n-1}$.

An example with $P(x) = 1 + x^2 + x^5$ is shown in Fig. 4. The detailed circuits of the processing cells V and U are drawn in Fig. 5 and Fig. 6, respectively. The select line S in Fig. 4 chooses the left XOR operation of the U cells. In step 1, the systolic PB multiplier computes Eq. (1) by using $x^5 = x^2 + 1$. The left XOR operation of $U_{n-1}$ is active by setting S=0. In step 2, the systolic PB multiplier performs Eq. (2) using $x^2 = x^5 + 1$. The left XOR operation of $U_0$ is active by setting S=1.

Fig. 4. The proposed systolic polynomial basis multiplier which employs $x^5 = x^2 + 1$ and $x^2 = x^5 + 1$ (5×5 array; row i receives $a_i$ and the select line S and consists of cells $U_{i,0}$, $V_{i,1}$, $U_{i,2}$, $V_{i,3}$, $V_{i,4}$; outputs $c_0, \dots, c_4$)

Fig. 5. The detailed circuit of V cell (inputs $a_{in}$, $b_{in}$, $c_{in}$; outputs $a_{out}$, $b_{out}$, $c_{out}$; L: 1-bit latch)

Fig. 6. The detailed circuit of U cell (inputs $a_{in}$, $b_{in}$, $c_{in}$, d, Select; a 2-1 MUX; latches L; outputs $a_{out}$, $b_{out}$, $c_{out}$)

4 Proposed Polynomial Basis Multiplier with Concurrent Error Detection Capability

PB multipliers applied in ECC usually require a very large field. Such a multiplier might need millions of logic gates. Errors may occur in computation due to faults in the field. The importance of eliminating errors in cryptographic computations has been pointed out in some recent articles.
The simplest way to prevent a fault-based attack is to ensure that the computational device verifies values before sending them out. In this section, we introduce how to give the proposed multiplier concurrent error detection capability. The single cell fault model is assumed in this paper. Under this fault model, one of the faulty cell's outputs is fixed to either a logic 0 (stuck-at-0) or a logic 1 (stuck-at-1). As aforementioned, the polynomial basis multiplication can be performed by one of Eqs. (10), (11) or (12), depending on the relation between m and 2n. Therefore, the error detection schemes for these three cases are discussed separately as follows.

(a) When m=2n, the systolic PB multiplier is shown in Fig. 7. Each row can compute either Eq. (1) (using $x^m = x^n + 1$) or Eq. (2) (utilizing $x^n = x^m + 1$) by setting select line S=0 or 1, respectively. The select line S decides the left XOR operation of the U cells. In step 1, the left XOR operation of $U_{n-1}$ is active by setting S=0. In step 2, the left XOR operation of $U_0$ is active by setting S=1.

(b) When m>2n, the systolic PB multiplier is shown in Fig. 8. Each row can compute either Eq. (1) (using $x^m = x^n + 1$) or Eq. (2) (utilizing $x^n = x^m + 1$) by setting select line S=0 or 1, respectively. The select line S decides the left XOR operation of the U cells. In step 1, the left XOR operation of $U_{n-1}$ is active by setting S=0. In step 2, the left XOR operation of $U_0$ is active by setting S=1.

(c) When m<2n, the systolic PB multiplier is shown in Fig. 9. Each row can compute either Eq. (1) (using $x^m = x^n + 1$) or Eq. (2) (utilizing $x^n = x^m + 1$) by setting select line S=0 or 1, respectively. The select line S decides the left XOR operation of the U cells. In step 1, the left XOR operation of $U_{n-1}$ is active by setting S=0. In step 2, the left XOR operation of $U_0$ is active by setting S=1.

Fig. 7. The polynomial basis multiplier for the case m=2n (m rows; row i receives $a_i$ and S and consists of cells $U_{i,0}$, $V_{i,1}$, ..., $U_{i,n-1}$, $V_{i,n}$, ..., $V_{i,m-1}$; the Step 1 outputs $c_0, \dots, c_{m-1}$ and the Step 2 outputs $c_0', \dots, c_{m-1}'$ feed an Equality Checker that produces the Error Signal)

Fig. 8. The polynomial basis multiplier for the case m>2n (as Fig. 7, with row i consisting of cells $U_{i,0}$, $V_{i,1}$, ..., $U_{i,n-1}$, $V_{i,n}$, ..., $V_{i,m-n-1}$, ..., $V_{i,m-1}$)

Fig. 9. The polynomial basis multiplier for the case m<2n (as Fig. 7, with row i consisting of cells $U_{i,0}$, $V_{i,1}$, ..., $V_{i,m-n-1}$, ..., $U_{i,n-1}$, $V_{i,n}$, ..., $V_{i,m-1}$)

The multiplication algorithm based on the proposed PB multiplier (shown in Figs. 7, 8 and 9) is described as follows:

Algorithm-CED: (Polynomial basis multiplication with CED capability)
Input: A=(a0, a1, …, am-2, am-1)
  Step 1: B=(b0, b1, …, bm-2, bm-1)
  Step 2: B’=(b0, b1, …, bm-2, bm-1)
Output: C=(c0, c1, …, cm-2, cm-1)
  C’=(c0’, c1’, …, cm-2’, cm-1’)
Begin
  Step 1: Compute C=A×B
  Step 2: Compute C’=A×B’
  Step 3: Compare C and C’; if equal then output C, else output error signal
End

The relation between C’ and C is as follows: $c_i' = c_{\langle i+n \rangle}$ for $0 \le i \le m-1$, where the symbol $\langle t \rangle$ expresses t mod m.

Theorem 1: Any single cell fault in the proposed PB multiplier in Fig. 7, Fig. 8 or Fig. 9 is detected by Algorithm-CED.

Proof: Suppose that the faulty cell is $H_{i,j}$, where H=U or V, and $0 \le i, j \le m-1$. The behavior of the faulty cell can then be classified into one of the following cases and proven detectable.

(1) Error on $a_{out}$
The output $a_{out}$ of the cell $H_{i,j}$ is a go-through line from the input $a_{in}$; therefore an error on it is easily detected by comparing the input $a_{in}$ of the cell $U_{i,0}$ with the output $a_{out}$ of the cell $V_{i,m-1}$.

(2) Error on $b_{out}$
Suppose an error occurs on the output $b_{out}$ of the faulty cell $H_{i,j}$. In the first computation, the error will affect outputs $c_{j+1}, c_{j+2}, \dots, c_{\langle j+m-1-i \rangle}$. In the second computation, the error will affect outputs $c'_{j+1}, c'_{j+2}, \dots, c'_{\langle j+m-1-i \rangle}$, which are equal to $c_{\langle j+1+n \rangle}, c_{\langle j+2+n \rangle}, \dots, c_{\langle j+m-1-i+n \rangle}$. It is easy to see that at least one bit of $c'_{j+1}, c'_{j+2}, \dots, c'_{\langle j+m-1-i \rangle}$ does not exist in $c_{j+1}, c_{j+2}, \dots, c_{\langle j+m-1-i \rangle}$ because $n \ne 0$. For example, the output bit $c_{j+1}$ is affected in Step 1 but not affected in Step 2. Therefore, comparing the outputs of the first and second computation reveals the presence of the error.
(3) Error on $c_{out}$
Suppose an error occurs on $c_{out}$ of the faulty cell $H_{i,j}$. In the first computation, the error will affect outputs $c_j, c_{j+1}, c_{j+2}, \dots, c_{\langle j+m-1-i \rangle}$. In the second computation, the error will affect outputs $c'_j, c'_{j+1}, c'_{j+2}, \dots, c'_{\langle j+m-1-i \rangle}$, which correspond to $c_{\langle j+n \rangle}, c_{\langle j+1+n \rangle}, c_{\langle j+2+n \rangle}, \dots, c_{\langle j+m-1-i+n \rangle}$ rather than to $c_j, c_{j+1}, c_{j+2}, \dots, c_{\langle j+m-1-i \rangle}$. In other words, at least one bit which belongs to $c_j, c_{j+1}, c_{j+2}, \dots, c_{\langle j+m-1-i \rangle}$ does not exist in $c'_j, c'_{j+1}, c'_{j+2}, \dots, c'_{\langle j+m-1-i \rangle}$ because n is not equal to 0. Therefore, at least one output bit such as $c_j$ is affected in Step 1 but not affected in Step 2, and comparing the outputs of both computations can detect this error.

If the error occurs on the cell $V_{i,m-1}$ ($0 \le i \le m-2$) of Figs. 7, 8 and 9, the faulty cell will influence all cells $U_{i,n}$ ($0 \le i \le m-2$); thus Algorithm-CED cannot detect this error. Future research will address this problem.

5 Results

The proposed PB multiplier with concurrent error detection capability employs the RESO method. As mentioned, no concurrent error detection algorithms for the PB multiplier generated by a trinomial have been developed to date. Thus, existing PB multipliers generated by general polynomials are compared with the proposed one, and the results are given in Table 1. Recently, Wang-Lin [39] and Lee et al. [34] proposed methods for designing the systolic and semi-systolic arrays with error detection, respectively. Wang-Lin [39] proposed an error detection scheme for the PB multiplier using a general polynomial, but their scheme could not provide concurrent error detection capability. However, Lee et al. [34] proposed a concurrent error detection architecture for the PB multiplier utilizing a general polynomial. Space complexity is measured by the transistor count of a standard CMOS VLSI realization.
In CMOS VLSI technology, an inverter, an n-input AND, an n-input OR, an n-input XOR and a 1-bit latch are composed of 2, 2n+2, 2n+2, 2n+2, and 8 transistors, respectively [40]. Some real circuits such as M74HC86 (STMicroelectronics, 2-input XOR gate, tPD=12ns (TYP.)) [41], M74HC08 (STMicroelectronics, AND gate, tPD=7ns (TYP.)) [42], M74HC279 (STMicroelectronics, SR Latch, tPD=13ns (TYP.)) [43], M74HC32 (STMicroelectronics, OR gate, tPD=8ns (TYP.)) [44], M74HC11 (STMicroelectronics, 3-input AND gate, tPD=9ns (TYP.)) [45], and M74AC157 (STMicroelectronics, 2-1 Multiplexer, tPD=4ns (TYP.)) [46] are employed for comparing time complexity. The results of the comparison for various polynomial basis multipliers are listed in Table 1. As shown in Table 1, the proposed PB multiplier with concurrent error detection capability has smaller time and space complexity than the Wang-Lin multiplier [39], which does not have concurrent error detection capability. Although the proposed PB multiplier requires about 22% more time than that of Lee et al. [34], it saves at least 40% of the space complexity.

Table 1. Comparisons of various polynomial basis multipliers with and without concurrent error detection capability

Multipliers                        Lee et al. [34]        Wang-Lin [39]          Proposed design
Generated polynomial               General polynomial     General polynomial     Trinomial
Array type                         Semi-systolic          Systolic               Semi-systolic
Number of cells                    U: m^2+2m, V: m        m^2                    U: 2m, V: m^2
No. of cell types                  2                      1                      2

Space complexity:
  Inverter                         0                      0                      m+2m
  2-input XOR                      2m^2+5m                2m^2                   m^2+2m
  3-input XOR                      0                      m^2                    0
  2-input AND                      2m^2+2m                0                      m^2+4m
  3-input AND                      m                      0                      0
  2-input OR                       0                      0                      2m
  1-bit latch                      2m^2+6m+1              7m^2                   2m^2+m
  Transistor count                 40m^2+74m+8            76m^2                  28m^2+62m

Extra equality checker circuits:
  Inverter                         2m                     -                      m
  2-input XOR                      0                      -                      0
  3-input XOR                      0                      -                      0
  2-input AND                      4m-4                   -                      2m-2
  3-input AND                      0                      -                      0
  4-input AND                      m-1                    -                      (m-1)/2
  2-input OR                       4m-1                   -                      2m
  4-input OR                       m-1                    -                      (m-1)/2
  1-bit latch                      3m-3                   -                      (3m-3)/2
  Transistor count                 48m^2+82m-64           -                      24m^2+41m-32

Total space cost                   88m^2+156m-56          76m^2                  52m^2+103m-32

Time complexity:
  Cell delay                       U: 64ns, V: 34ns       117ns                  U: 80ns, V: 45ns
  Latency                          m                      3m                     m
  Total delay (unit: ns)           97m                    351m                   125m
  Throughput (unit=1/cycle)        1/2                    1                      1/2

Concurrent error detection         Yes                    No                     Yes

6 Conclusions

A novel systolic polynomial basis multiplier using an irreducible trinomial has been proposed. By employing the RESO method [36], [37], the proposed systolic polynomial basis multiplier has been given concurrent error detection capability. The proposed multiplier with concurrent error detection capability increases the space complexity overhead only minimally compared to other existing multipliers. Compared to the multiplier of Lee et al. [34], the proposed architecture increases the time overhead by about 20%, but it saves about 40% of the space complexity.

7 Acknowledgement

The authors would like to thank the anonymous referees and the editor for carefully reading the paper and for their great help in improving it.

References

[1] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, North Holland, Amsterdam, 1988.
[2] R. Lidl and H.
Niederreiter, Introduction to Finite Fields and their Applications, Cambridge University Press, New York, 1994.
[3] R. E. Blahut, Fast Algorithms for Digital Signal Processing, Addison-Wesley, 1985.
[4] I. S. Reed and T. K. Truong, “The Use of Finite Fields to Compute Convolutions,” IEEE Transactions on Information Theory, Vol. 21, No. 2, pp. 208-213, 1975.
[5] B. Benjauthrit and I. S. Reed, “Galois Switching Functions and Their Applications,” IEEE Transactions on Computers, Vol. C-25, No. 1, pp. 78-86, 1976.
[6] C. C. Wang and D. Pei, “A VLSI Design for Computing Exponentiations in GF(2^m) and Its Application to Generate Pseudorandom Number Sequences,” IEEE Transactions on Computers, Vol. 39, No. 2, pp. 258-262, 1990.
[7] E. Berlekamp, “Bit-serial Reed-Solomon Encoders,” IEEE Transactions on Information Theory, Vol. 28, No. 6, pp. 869-874, 1982.
[8] M. Morii, M. Kasahara, D. L. Whiting, “Efficient Bit-serial Multiplication and the Discrete-time Wiener-Hopf Equation Over Finite Fields,” IEEE Transactions on Information Theory, Vol. 35, No. 6, pp. 1177-1183, 1989.
[9] R. Schroeppel, H. Orman, S. O'Malley, O. Spatscheck, “Fast Key Exchange with Elliptic Curve Systems,” in Proceedings of Advances in Cryptology-CRYPTO, Santa Barbara, California, USA, pp. 43-56, 1995.
[10] E. De Win, A. Bosselaers, P. De Gersem, S. Vandenberghe, J. Vandewalle, “A Fast Software Implementation for Arithmetic Operations in GF(2^n),” in Proceedings of Advances in Cryptology - ASIACRYPT'96, Kyongju, Korea, pp. 65-76, 1996.
[11] A. J. Menezes and I. F. Blake, Applications of Finite Fields, Springer, 2010.
[12] C. Y. Lee, “Low Complexity Bit-parallel Systolic Multiplier Over GF(2^m) Using Irreducible Trinomials,” IEE Proceedings-Computers and Digital Techniques, Vol. 150, No. 1, pp. 39-42, 2003.
[13] C. Paar, “A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields,” IEEE Transactions on Computers, Vol. 45, No. 7, pp. 856-861, 1996.
[14] C. W. Chiou, L. C. Lin, F. H. Chou, S. F. Shu, “Low-complexity Finite Field Multiplier Using Irreducible Trinomials,” Electronics Letters, Vol. 39, No. 24, pp. 1709-1711, 2003.
[15] C. W. Chiou, C. Y. Lee, J. M. Lin, “Efficient Systolic Arrays for Power-sum, Inversion, and Division in GF(2^m),” International Journal of Computer Sciences and Engineering Systems, Vol. 1, No. 1, pp. 27-41, 2007.
[16] C. Y. Lee, Y. H. Chen, C. W. Chiou, J. M. Lin, “Unified Parallel Systolic Multiplier Over GF(2^m),” Journal of Computer Science and Technology, Vol. 22, No. 1, pp. 28-38, 2007.
[17] C. Y. Lee, J. M. Lin, C. W. Chiou, “Scalable and Systolic Architecture for Computing Double Exponentiation Over GF(2^m),” Acta Applicandae Mathematicae, Vol. 93, No. 1, pp. 161-178, 2006.
[18] J. L. Massey and J. K. Omura, Computational method and apparatus for finite field arithmetic, in US patent, pp. 627, 1986.
[19] A. Reyhani-Masoleh and M. A. Hasan, “A New Construction of Massey-Omura Parallel Multiplier Over GF(2^m),” IEEE Transactions on Computers, Vol. 51, No. 5, pp. 511-520, 2002.
[20] C. Y. Lee and C. W. Chiou, “Efficient Design of Low-complexity Bit-parallel Systolic Hankel Multipliers to Implement Multiplication in Normal and Dual Bases of GF(2^m),” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. 88, No. 11, pp. 3169-3179, 2005.
[21] C. W. Chiou and C. Y. Lee, “Multiplexer-based Double-exponentiation for Normal Basis of GF(2^m),” Computers & Security, Vol. 24, No. 1, pp. 83-86, 2005.
[22] H. Wu, M. A. Hasan, I. F. Blake, “New Low-complexity Bit-parallel Finite Field Multipliers Using Weakly Dual Bases,” IEEE Transactions on Computers, Vol. 47, No. 11, pp. 1223-1234, 1998.
[23] C. Y. Lee, C. W. Chiou, J. M. Lin, “Concurrent Error Detection in a Bit-parallel Systolic Multiplier for Dual Basis of GF(2^m),” Journal of Electronic Testing, Vol. 21, No. 5, pp. 539-549, 2005.
[24] J. Kelsey, B. Schneier, D. Wagner, H. Hall, “Side Channel Cryptanalysis of Product Ciphers,” Journal of Computer Security, Vol. 8, No. 2, pp. 141-158, 2000.
[25] E. Biham and A. Shamir, “Differential Fault Analysis of Secret Key Cryptosystems,” in Proceedings of Advances in Cryptology - CRYPTO'97, Santa Barbara, California, USA, pp. 513-525, 1997.
[26] D. Boneh, R. DeMillo, R. Lipton, “On the Importance of Checking Cryptographic Protocols for Faults,” in Proceedings of Advances in Cryptology-EUROCRYPT'97, Konstanz, Germany, Vol. 1233, pp. 37-51, 1997.
[27] R. Karri, G. Kuznetsov, M. Goessel, “Parity-based Concurrent Error Detection of Substitution-permutation Network Block Ciphers,” in Proceedings of Cryptographic Hardware and Embedded Systems-CHES 2003, Cologne, Germany, pp. 113-124, 2003.
[28] G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, V. Piuri, “Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard,” IEEE Transactions on Computers, Vol. 52, No. 4, pp. 492-505, 2003.
[29] M. Joye, A. K. Lenstra, J. J. Quisquater, “Chinese Remaindering Based Cryptosystems in the Presence of Faults,” Journal of Cryptology, Vol. 12, No. 4, pp. 241-245, 1999.
[30] D. Boneh, R. DeMillo, R. J. Lipton, “On the Importance of Eliminating Errors in Cryptographic Computations,” Journal of Cryptology, Vol. 14, No. 2, pp. 101-119, 2001.
[31] S. Fenn, M. Gossel, M. Benaissa, D. Taylor, “On-line Error Detection for Bit-serial Multipliers in GF(2^m),” Journal of Electronic Testing, Vol. 13, No. 1, pp. 29-40, 1998.
[32] A. Reyhani-Masoleh and M. A. Hasan, “Error Detection in Polynomial Basis Multipliers Over Binary Extension Fields,” in Proceedings of Cryptographic Hardware and Embedded Systems - CHES 2002, CA, USA, pp. 515-528, 2003.
[33] C. W. Chiou, “Concurrent Error Detection in Array Multipliers for GF(2^m) Fields,” Electronics Letters, Vol. 38, No. 14, pp. 688-689, 2002.
[34] C. Y. Lee, C. W. Chiou, J. L. Lin, “Concurrent Error Detection in a Polynomial Basis Multiplier Over GF(2^m),” Journal of Electronic Testing, Vol. 22, No. 2, pp. 143-150, 2006.
[35] C. W. Chiou, C. Y. Lee, A. W. Deng, J. M. Lin, “Concurrent Error Detection in Montgomery Multiplication Over GF(2^m),” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. 89, No. 2, pp. 566-574, 2006.
[36] J. H. Patel and L. Y. Fung, “Concurrent Error Detection in ALU's by Recomputing with Shifted Operands,” IEEE Transactions on Computers, Vol. C-31, No. 7, pp. 589-595, 1982.
[37] J. H. Patel and L. Y. Fung, “Concurrent Error Detection in Multiply and Divide Arrays,” IEEE Transactions on Computers, Vol. C-32, No. 4, pp. 417-422, 1983.
[38] J. F. Wakerly, Error Detecting Codes, Self-checking Circuits and Applications, Elsevier, 1978.
[39] C. L. Wang and J. L. Lin, “Systolic Array Implementation of Multipliers for Finite Fields GF(2^m),” IEEE Transactions on Circuits and Systems, Vol. 38, No. 7, pp. 796-800, 1991.
[40] N. H. E. Weste, K. Eshraghian, M. J. S. Smith, Principles of CMOS VLSI Design: A Systems Perspective with Verilog/VHDL Manual, Addison Wesley, 2000.
[41] M74HC86, Quad Exclusive OR Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/2006.pdf
[42] M74HC08, Quad 2-input AND Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1885.pdf
[43] M74HC279, Quad S-R Latch, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1937.pdf
[44] M74HC32, Quad 2-input OR Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1944.pdf
[45] M74HC11, Triple 3-input AND Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1890.pdf
[46] M74AC157, Quad 2 Channel Multiplexer, STMicroelectronics, http://www.st.com/stonline/books/pdf/doce/5144.pdf