Lecture Notes in Computer Science:

Concurrent Error Detection in Polynomial Basis Multiplier over GF(2^m) Using Irreducible Trinomial
Hung-Wei Chang1,*, Che-Wun Chiou2, Fu-Hua Chou3, and Wen-Yew Liang1
1 Department of Computer Science and Information Engineering,
National Taipei University of Technology,
Taipei 106, Taiwan
{t6599009, wyliang}@ntut.edu.tw
2 Department of Computer Science and Information Engineering,
Ching Yun University,
Chung-Li 320, Taiwan
cwchiou@cyu.edu.tw
3 Department of Electronic Engineering,
Ching Yun University,
Chung-Li 320, Taiwan
fhchou@cyu.edu.tw
Received 20 June 2011; Revised 27 July 2011; Accepted 10 September 2011
Abstract. Due to the rapid development of smartphones, mobile commerce has become very popular and valuable. The communication and information security of mobile commerce depends heavily on public-key cryptosystems such as RSA. However, existing public-key cryptosystems are not well suited to resource-constrained devices such as smartphones. Therefore, the elliptic curve cryptosystem, which has a very low cost compared with RSA, is useful and recommended for mobile commerce. Polynomial basis multiplication is the most important arithmetic operation in the elliptic curve cryptosystem. Fault-based cryptanalysis is a new cryptanalysis technique that has proved effective. A simple way to protect against this type of cryptanalysis is to redesign cryptosystems with concurrent error detection capability so that they output only error-free computed results. Polynomial basis multipliers generated by trinomials have the advantages of low complexity and easy VLSI implementation. However, no existing polynomial basis multiplier generated by a trinomial has concurrent error detection capability. Thus, a new polynomial basis multiplier using a trinomial with concurrent error detection capability is presented. Compared with other existing polynomial basis multipliers using general polynomials, the proposed multiplier saves about 40% of the space complexity.
Keywords: Finite field arithmetic, concurrent error detection, polynomial basis multiplier, elliptic curve
cryptosystem, fault based cryptanalysis
1 Introduction
Recently, finite field arithmetic operations have attracted much attention because of their importance and practical applications in error-correcting codes [1], cryptography [2], digital signal processing [3], [4], switching theory [5], pseudorandom number generation [6], encoding of Reed-Solomon codes [7], and solving the Wiener-Hopf equation [8]. In particular, two public-key cryptography schemes, elliptic and hyperelliptic curve cryptosystems [9]-[11], require arithmetic operations to be performed in a finite field. The finite field arithmetic operations include addition, multiplication, division, inversion, and exponentiation. Addition is a simple bit-by-bit XOR operation. Complicated operations such as division, inversion, and exponentiation can be performed by repeated multiplications. Moreover, elliptic scalar multiplication is based on multiplication over GF(2^m). Thus, multiplication over GF(2^m) is the most important operation in finite field arithmetic.
There are three popular types of bases over finite fields: polynomial basis (PB) [12]-[17], normal basis (NB) [18]-[21], and dual basis (DB) [22], [23]. Each basis representation has its own distinct advantages. With the advantages of low design complexity, simplicity, regularity, and modularity in architecture, polynomial basis multipliers [12]-[17] are widely used for producing efficient VLSI implementations. The major advantage of the normal basis [18]-[21] is that the squaring of an element can be performed simply by cyclically shifting its binary form. Compared with the other two bases, the dual basis multiplier [22], [23] requires less chip area.
*Corresponding author
Journal of Computers Vol. 22, No. 3, October 2011
Side-channel attacks are a powerful form of analysis, and differential fault analysis, which deliberately injects faults into cryptographic devices, requires only a small amount of side-channel information to break common ciphers. Side-channel attacks have been proven to be a useful cryptanalysis technique against both symmetric and asymmetric encryption algorithms. Kelsey et al. [24] claimed that differential fault analysis requires only 50 to 200 ciphertext blocks to recover the key of the symmetric block cipher Data Encryption Standard (DES) by deliberate fault injection. Biham and Shamir [25] and Boneh et al. [26] also developed fault-based cryptanalysis of symmetric and asymmetric cryptosystems, respectively. Therefore, the simplest way to protect an encryption/decryption circuit from an attacker is to ensure that the computation device verifies the correctness of a result before outputting it.
To output error-free values, several error detection schemes have been presented for symmetric and asymmetric cryptosystems [27]-[32]. Fenn et al. [31] proposed an on-line error detection scheme for bit-serial multipliers in GF(2^m) using parity prediction. By applying the same parity checking approach, Reyhani-Masoleh and Hasan [32] provided error detection methods for bit-parallel and bit-serial polynomial basis multipliers in GF(2^m). Although their design costs extra hardware, the probability of error detection of the bit-serial multiplier is about 100%. Their design can be applied to any irreducible polynomial defining the field. In addition, their design can be easily extended and applied to other GF(2^m) multipliers. However, the problem with adopting parity checking is that it takes a long time to generate the parity. An XOR tree is used for computing the parity, and it requires at least ⌈log2 m⌉ XOR-gate delays to determine the output parity. The value of m in practical cryptosystems is typically equal to 512 or more. Therefore, the parity checking method is not suitable for providing on-line error detection capability in systolic array multipliers with bit-parallel output.
However, Lee et al. [23] solved this difficulty using the parity checking method in the case of dual basis representation. In [33], Chiou provided a concurrent error detection (CED) scheme for a special case of polynomial basis representation, termed the all-one polynomial. In [34], Chiou et al. provided a concurrent error detection scheme for the general case of polynomial basis representation. Chiou et al. [35] also developed a finite field Montgomery multiplier with concurrent error detection capability in 2006. These works by Chiou et al. employ the concept of REcomputing with Shifted Operands (RESO) [36], [37].
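The XOR-tree delay quoted above can be illustrated with a small sketch. It assumes a balanced tree of 2-input XOR gates; the function name `xor_tree_parity` is hypothetical and only counts tree levels, it does not model any real gate timing.

```python
import math

def xor_tree_parity(bits):
    """Compute the parity of `bits` with a balanced tree of 2-input XORs.

    Returns (parity, depth), where depth is the number of XOR levels,
    i.e. the number of XOR-gate delays on the longest path.
    """
    level, depth = list(bits), 0
    while len(level) > 1:
        # Pair up neighbouring signals; an odd leftover passes through.
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth

if __name__ == "__main__":
    for m in (5, 512):
        parity, depth = xor_tree_parity([1] * m)
        print(m, parity, depth, math.ceil(math.log2(m)))
```

For every m the reported depth equals ⌈log2 m⌉, which is the lower bound on XOR-gate delays stated in the text.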
However, no finite field polynomial basis multiplier using an irreducible trinomial with concurrent error detection capability can be found in the literature to date. Therefore, in this study we redesign a polynomial basis multiplier using an irreducible trinomial so that it has concurrent error detection capability. This work employs the RESO [36], [37] and equality check [38] methods.
The organization of this paper is as follows: Section 2 briefly reviews the mathematical background. Section 3 introduces the proposed PB multiplier with a trinomial. The novel PB multiplier with concurrent error detection capability is discussed in Section 4. Finally, results and conclusions are given in Sections 5 and 6, respectively.
2 Preliminaries
Let GF(2^m) be an extension field of the ground field GF(2). GF(2^m) is a vector space over GF(2). Suppose that the finite field GF(2^m) is generated by the irreducible polynomial $P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_{m-1} x^{m-1} + p_m x^m$ of degree m over GF(2).
Let A(x), B(x), and C(x) be elements of GF(2^m), where C(x) is the product of A(x) and B(x), i.e., C(x) = A(x)B(x) mod P(x). Then A(x), B(x), and C(x) can be expressed as follows:
$$A(x) = a_0 + a_1 x + \cdots + a_{m-1} x^{m-1},$$
$$B(x) = b_0 + b_1 x + \cdots + b_{m-1} x^{m-1},$$
$$C(x) = c_0 + c_1 x + \cdots + c_{m-1} x^{m-1}.$$
Using Horner's rule, C(x) can be obtained by
$$C(x) = A(x)B(x) \bmod P(x) = a_0 B(x) \bmod P(x) + a_1 x B(x) \bmod P(x) + \cdots + a_{m-1} x^{m-1} B(x) \bmod P(x) \tag{1}$$
Chang et al: Concurrent Error Detection in Polynomial Basis Multiplier over GF(2 m) Using Irreducible Trinomial
Then,
$$C(x) = A(x)B(x) \bmod P(x) = \sum_{i=0}^{m-1} a_i \times B^{(i)}(x) = \sum_{i=0}^{m-1} c_i \times x^i,$$
where
$$B^{(i)}(x) = x^i \times B(x) \bmod P(x), \quad c_i \in \{0, 1\}, \quad 0 \le i \le m-1.$$
Let the polynomial P(x) be of the form x^m + x^n + 1 with m > n > 0, which is known as a trinomial. Since P(x) = 0 in GF(2^m), the relation x^m = x^n + 1 can be used to reduce any higher-order term x^p, p ≥ m, to a polynomial of degree less than m.
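As a minimal sketch of this reduction, the following Python function represents a polynomial over GF(2) as an integer whose bit i is the coefficient of x^i. The function name `reduce_trinomial` and the integer encoding are assumptions made for illustration; they do not appear in the paper.

```python
def reduce_trinomial(poly: int, m: int, n: int) -> int:
    """Reduce a GF(2) polynomial modulo P(x) = x^m + x^n + 1, with m > n > 0.

    Bit i of `poly` is the coefficient of x^i (an illustrative encoding).
    Each term x^p with p >= m is replaced by x^(p-m+n) + x^(p-m),
    which follows from x^m = x^n + 1.
    """
    p = poly.bit_length() - 1
    while p >= m:
        if (poly >> p) & 1:
            # Clear x^p and add (XOR) the two lower-order replacement terms.
            poly ^= (1 << p) | (1 << (p - m + n)) | (1 << (p - m))
        p -= 1
    return poly

if __name__ == "__main__":
    # P(x) = x^5 + x^2 + 1, the trinomial used in the example of Fig. 4.
    print(bin(reduce_trinomial(1 << 5, 5, 2)))  # x^5 reduces to x^2 + 1
    print(bin(reduce_trinomial(1 << 8, 5, 2)))  # x^8 reduces to x^3 + x^2 + 1
```

Processing the terms from the highest degree downward guarantees that every replacement term has a lower degree than the term it replaces, so the loop terminates with a result of degree less than m.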
The RESO method [36], [37] is based on the time redundancy technique. It is used here to achieve concurrent error detection capability and is briefly reviewed as follows.
Fig. 1 shows the flow chart of the RESO method. Let x be the input to a function unit F and F(x) be the desired output. The RESO method computes F(x) twice on the same unit F within a specified time frame. Suppose that the functions b and b^-1 have the following property: b^-1(F(b(x))) = F(x), where b^-1 is the inverse function of b, so b^-1(b(x)) = x. During the first step, the input x is applied to the function unit F, and F(x) is computed and stored in the latch. During the second step, the input b(x) is applied to the function unit F, and the computed result F(b(x)) is then converted back to F(x) by b^-1, since b^-1(F(b(x))) = F(x).
The results given by both steps are compared. If they are not equal, the error signal outputs a logical one to indicate the presence of an error. Recently, Wang-Lin [39] and Lee et al. [34] proposed methods for designing systolic and semi-systolic arrays with error detection, respectively.
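[Figure: in Step 1 the input x is applied to F and the result is stored in a latch; in Step 2 the input passes through b, F, and b^-1; an equality comparison (=) of the two results produces the error signal.]
Fig. 1. RESO method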
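The two RESO steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function unit F is taken to be an integer adder, b is a one-bit left shift of both operands, b^-1 is a one-bit right shift of the result, and the fault is a single stuck-at output bit. The names `make_adder`, `b`, `b_inv`, and `reso` are hypothetical.

```python
from typing import Callable, Optional, Tuple

Operands = Tuple[int, int]

def make_adder(stuck_bit: Optional[int] = None,
               stuck_val: int = 0) -> Callable[[Operands], int]:
    """Return F(x) = x[0] + x[1]; optionally one output bit is stuck at stuck_val."""
    def F(x: Operands) -> int:
        out = x[0] + x[1]
        if stuck_bit is not None:
            out = (out & ~(1 << stuck_bit)) | (stuck_val << stuck_bit)
        return out
    return F

def b(x: Operands) -> Operands:
    """Shift both operands left by one bit (assumed choice of b)."""
    return (x[0] << 1, x[1] << 1)

def b_inv(y: int) -> int:
    """Undo the shift on the result, so that b_inv(F(b(x))) == F(x) when F is fault-free."""
    return y >> 1

def reso(F: Callable[[Operands], int], x: Operands) -> Tuple[int, bool]:
    first = F(x)              # Step 1: compute F(x) and latch it
    second = b_inv(F(b(x)))   # Step 2: recompute with shifted operands
    return first, first != second  # error signal is True on mismatch

if __name__ == "__main__":
    print(reso(make_adder(), (3, 1)))                          # fault-free unit
    print(reso(make_adder(stuck_bit=2, stuck_val=0), (3, 1)))  # stuck-at-0 on bit 2
```

Because the shifted computation uses different physical bit positions for the same logical bits, a stuck-at fault on one output line corrupts different logical bits in the two steps, and the comparison exposes it.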
3 Proposed Polynomial Basis Multiplier
In the following paragraphs, the proposed polynomial basis multiplier using a trinomial is described. The multiplication in Eq. (1) can be expressed as follows:
$$C(x) = A(x)B(x) \bmod P(x) = \left( a_0 B(x) \bmod P(x) + a_1 x B(x) \bmod P(x) + \cdots + a_{m-1} x^{m-1} B(x) \bmod P(x) \right) \bmod P(x)$$
where $P(x) = x^m + x^n + 1$ and $m > n > 0$. Then B(x) in Eq. (1) can be rewritten as:
$$B(x) = b_n x^n + b_{n+1} x^{n+1} + \cdots + b_{m-1} x^{m-1} + b_0 + b_1 x + \cdots + b_{n-1} x^{n-1} \tag{2}$$
Therefore, $xB(x) \bmod P(x)$ can be written as
$$xB(x) \bmod P(x) = \left( b_n x^{n+1} + b_{n+1} x^{n+2} + \cdots + b_{m-1} x^m + b_0 x + \cdots + b_{n-1} x^n \right) \bmod P(x)$$
$$= (b_{m-1} + b_{n-1}) x^n + b_n x^{n+1} + \cdots + b_{m-2} x^{m-1} + b_{m-1} + b_0 x + \cdots + b_{n-2} x^{n-1} \tag{3}$$
As mentioned above,
$$B^{(1)}(x) = xB(x) \bmod P(x) = b_{n-1+m} x^n + b_n x^{n+1} + \cdots + b_{m-2} x^{m-1} + b_{m-1} + b_0 x + \cdots + b_{n-2} x^{n-1} \tag{4}$$
where
$$b_{n-1+m} = b_{n-1} + b_{m-1}.$$
Therefore,
$$B^{(i)}(x) = xB^{(i-1)}(x) \bmod P(x) = b_{(n-i)+m} x^n + b_{(n-i+1)+m} x^{n+1} + \cdots + b_{(n-1)+m} x^{n+i-1} + b_n x^{n+i} + \cdots + b_{(n-1-i)} x^{n-1} \tag{5}$$
where
$$b_{(n-i)+m} = \begin{cases} b_{(n-i)} + b_{(m-i)} & \text{if } i \le m-n \\ b_{(m-i)+m} + b_{(n-i)} & \text{if } i > m-n \end{cases} \tag{6}$$
If $B^{(0)}(x) = B(x)$, then Eq. (1) can be rewritten as:
$$C(x) = a_0 B^{(0)}(x) + a_1 B^{(1)}(x) + \cdots + a_{m-1} B^{(m-1)}(x) \tag{7}$$
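Eq. (7) describes a shift-and-add multiplication: each B^(i)(x) is obtained from the previous one by one multiplication by x followed by reduction, and the terms selected by the bits of A are accumulated with XOR. A minimal software sketch, assuming polynomials are encoded as integers with bit i holding the coefficient of x^i (the name `mul_trinomial` is hypothetical), is:

```python
def mul_trinomial(a: int, b: int, m: int, n: int) -> int:
    """Multiply a(x) * b(x) modulo P(x) = x^m + x^n + 1 following Eq. (7)."""
    mask = (1 << m) - 1
    c = 0
    b_i = b & mask                      # B^(0)(x) = B(x)
    for i in range(m):
        if (a >> i) & 1:
            c ^= b_i                    # accumulate a_i * B^(i)(x)
        # B^(i+1)(x) = x * B^(i)(x) mod P(x): shift, then apply x^m = x^n + 1
        carry = (b_i >> (m - 1)) & 1
        b_i = (b_i << 1) & mask
        if carry:
            b_i ^= (1 << n) | 1
    return c

if __name__ == "__main__":
    # GF(2^5) with P(x) = x^5 + x^2 + 1
    print(bin(mul_trinomial(0b00010, 0b10000, 5, 2)))  # x * x^4 = x^2 + 1
    print(bin(mul_trinomial(0b00011, 0b00011, 5, 2)))  # (x + 1)^2 = x^2 + 1
```

The loop body corresponds to one row of a bit-parallel array: the conditional XOR is the accumulation of a_i B^(i)(x), and the shift with feedback is the step from B^(i)(x) to B^(i+1)(x).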
As mentioned above, x^m = x^n + 1 because P(x) = 0. Since addition and subtraction are the same operation in GF(2^m), the equation x^m = x^n + 1 can be rewritten as x^n = x^m + 1. Based on the two equations x^m = x^n + 1 and x^n = x^m + 1, the architecture of the proposed polynomial basis multiplier, which is suitable for adding concurrent error detection capability, is discussed as follows.
3.1 Employing x^m = x^n + 1
By applying Eq. (7),
$$C(x) = a_0 B^{(0)}(x) + a_1 B^{(1)}(x) + \cdots + a_{m-1} B^{(m-1)}(x),$$
where
$$B(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_{n-1} x^{n-1} + b_n x^n + \cdots + b_{m-1} x^{m-1}.$$
According to the above equation, Fig. 2 shows the feedback structure, and the initial value of register D is assigned as follows:
$$D_0 = b_0,\ D_1 = b_1,\ D_2 = b_2,\ \ldots,\ D_{n-1} = b_{n-1},\ D_n = b_n,\ \ldots,\ D_{m-1} = b_{m-1}.$$
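[Figure: shift register D_0, D_1, D_2, ..., D_{n-1}, D_n, ..., D_{m-1} with an XOR (+) in the feedback path.]
Fig. 2. Feedback structure by using x^m = x^n + 1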
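One clock of such a feedback register can be sketched as below. The sketch follows Eq. (3): every coefficient moves up by one position, the old top coefficient D_{m-1} wraps around to position 0, and it is also XORed into position n. The list-based register model and the name `shift_once` are illustrative assumptions, not the paper's circuit.

```python
from typing import List

def shift_once(D: List[int], n: int) -> List[int]:
    """One multiplication by x modulo x^m + x^n + 1 on register D = [D_0, ..., D_{m-1}]."""
    m = len(D)
    feedback = D[m - 1]              # coefficient of x^(m-1), about to become x^m
    new = [feedback] + D[:m - 1]     # x^m contributes 1; all other terms shift up
    new[n] ^= feedback               # x^m also contributes x^n
    return new

if __name__ == "__main__":
    # m = 5, n = 2: x^4 becomes x^5 = x^2 + 1
    print(shift_once([0, 0, 0, 0, 1], 2))
```

Loading D with (b_0, ..., b_{m-1}) and applying `shift_once` i times yields the coefficients of B^(i)(x) under these assumptions.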
3.2 Utilizing x^n = x^m + 1
The product $C^*(x)$ can be presented as follows:
$$C^*(x) = A^*(x) \times B^*(x) = A^*(x) \times \left( b^*_{n+1} x^{n+1} + b^*_{n+2} x^{n+2} + \cdots + b^*_{m-1} x^{m-1} + b^*_0 x^0 + b^*_1 x^1 + \cdots + b^*_{n-1} x^{n-1} + b^*_n x^n \right)$$
where
$$b^*_n x^n = b^*_n (x^m + 1) = b^*_n x^m + b^*_n x^0 \tag{8}$$
$$\therefore C^*(x) = A^*(x) \times \left( b^*_{n+1} x^{n+1} + b^*_{n+2} x^{n+2} + \cdots + b^*_{m-1} x^{m-1} + b^*_n x^m + (b^*_0 + b^*_n) x^0 + b^*_1 x^1 + \cdots + b^*_{n-1} x^{n-1} \right) \tag{9}$$
According to Eq. (9), Fig. 3 depicts the feedback structure, and the initial value of register D is loaded as follows:
$$D_0 = b^*_{n+1},\ D_1 = b^*_{n+2},\ \ldots,\ D_{m-n-1} = b^*_{m-1},\ D_{m-n} = b^*_n,\ D_{m-n+1} = (b^*_0 + b^*_n),\ \ldots,\ D_{m-1} = b^*_{n-1}.$$
[Figure: shift register D_0, D_1, ..., D_{m-n-1}, ..., D_{m-1} with an XOR (+) in the feedback path.]
Fig. 3. Feedback structure by using x^n = x^m + 1
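The substitution behind Eq. (9) can be checked with a short sketch. It lists the terms of B(x) in the order x^{n+1}, ..., x^{m-1}, x^m, x^0, ..., x^{n-1} after replacing b_n x^n by b_n x^m + b_n, and then confirms that reducing x^m back with x^m = x^n + 1 recovers the original coefficients. The (exponent, coefficient) pair representation and the names `shifted_form` and `to_standard` are assumptions for illustration; the sketch does not model the register indexing of Fig. 3.

```python
from typing import List, Tuple

def shifted_form(b: List[int], m: int, n: int) -> List[Tuple[int, int]]:
    """Return the terms of B(x) as (exponent, coefficient) pairs in the order of Eq. (9)."""
    terms = [(k, b[k]) for k in range(n + 1, m)]   # x^(n+1) ... x^(m-1)
    terms.append((m, b[n]))                        # b_n x^n rewritten: b_n x^m ...
    terms.append((0, b[0] ^ b[n]))                 # ... plus b_n x^0, merged with b_0
    terms += [(k, b[k]) for k in range(1, n)]      # x^1 ... x^(n-1)
    return terms

def to_standard(terms: List[Tuple[int, int]], m: int, n: int) -> List[int]:
    """Map the shifted form back to coefficients of x^0 ... x^(m-1) via x^m = x^n + 1."""
    out = [0] * m
    for exponent, coeff in terms:
        if exponent == m:
            out[n] ^= coeff
            out[0] ^= coeff
        else:
            out[exponent] ^= coeff
    return out

if __name__ == "__main__":
    b = [1, 0, 1, 1, 0]                 # b_0 ... b_4 for m = 5, n = 2
    terms = shifted_form(b, 5, 2)
    print(terms)
    print(to_standard(terms, 5, 2) == b)
```

The round trip succeeds for every coefficient vector because the extra b_n added to the constant term is cancelled when x^m is expanded again.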
3.3 Combining x^m = x^n + 1 and x^n = x^m + 1 for Concurrent Error Detection
According to the relation between m and 2n, three cases of the combined structure are discussed separately below.
Case 1: m = 2n
$$B(x) = b_0 + b_1 x + \cdots + b_{n-1} x^{n-1} + b_n x^n + \cdots + b_{m-1} x^{m-1}$$
$$B^*(x) = b^*_{n+1} x^{n+1} + b^*_{n+2} x^{n+2} + \cdots + b^*_{m-1} x^{m-1} + b^*_n x^m + \cdots + b^*_{n-1} x^{n-1} \tag{10}$$
b_{m-n-1} and b_{n-1} can be in the same bit position.
Case 2: m > 2n
$$B(x) = b_0 + b_1 x + \cdots + b_{n-1} x^{n-1} + b_n x^n + \cdots + b_{m-1} x^{m-1}$$
$$B^*(x) = b^*_{n+1} x^{n+1} + b^*_{n+2} x^{n+2} + \cdots + b^*_{n+m} x^{n+m} + b^*_{n+m+1} x^{n+m+1} + \cdots + b^*_{m-1} x^{m-1} + b^*_n x^m + \cdots + b^*_{n-1} x^{n-1} \tag{11}$$
The bit position of b_{m-n-1} is behind that of b_{n-1}.
Case 3: m < 2n
$$B(x) = b_0 + b_1 x + \cdots + b_{n-1} x^{n-1} + b_n x^n + \cdots + b_{m-1} x^{m-1}$$
$$B^*(x) = b^*_{n+1} x^{n+1} + b^*_{n+2} x^{n+2} + \cdots + b^*_{m-1} x^{m-1} + b^*_n x^m + \cdots + b^*_{n+m} x^{n+m} + b^*_{n+m+1} x^{n+m+1} + \cdots + b^*_{n-1} x^{n-1} \tag{12}$$
The bit position of b_{m-n-1} is in front of that of b_{n-1}.
An example with P(x) = 1 + x^2 + x^5 is shown in Fig. 4. The detailed circuits of the processing cells V and U are drawn in Fig. 5 and Fig. 6, respectively.
The select line S in Fig. 4 chooses the left XOR operation of the U cells. In step 1, the systolic PB multiplier computes Eq. (1) by using x^5 = x^2 + 1; the left XOR operation of U_{n-1} is activated by setting S = 0. In step 2, the systolic PB multiplier performs Eq. (2) using x^2 = x^5 + 1; the left XOR operation of U_0 is activated by setting S = 1.
[Figure: 5 × 5 array of cells; row i (i = 0, ..., 4) receives a_i and the select line S and consists of cells U_{i,0}, V_{i,1}, U_{i,2}, V_{i,3}, V_{i,4}; the b inputs enter at the top and the outputs c_0, ..., c_4 leave at the bottom.]
Fig. 4. The proposed systolic polynomial basis multiplier which employs x^5 = x^2 + 1 and x^2 = x^5 + 1
[Figure: V cell with inputs a_in, b_in, c_in, outputs a_out, b_out, c_out, and 1-bit latches (L).]
Fig. 5. The detailed circuit of V cell (L: 1-bit latch)
[Figure: U cell with inputs a_in, b_in, c_in, d, and Select, a 2-1 MUX, 1-bit latches (L), and outputs a_out, b_out, c_out.]
Fig. 6. The detailed circuit of U cell
4 Proposed Polynomial Basis Multiplier with Concurrent Error Detection Capability
PB multipliers applied in ECC usually require a very large field. Such a multiplier might need millions of logic gates, and errors may occur in computation due to faults in the field. The importance of eliminating errors in cryptographic computations has been pointed out in some recent articles. The simplest way to prevent a fault-based attack is to ensure that the computational device verifies values before sending them out.
In this section, we describe how to give the proposed multiplier concurrent error detection capability. The single-cell fault model is assumed in this paper. Under this fault model, one of the faulty cell's outputs is fixed at either logic 0 (stuck-at-0) or logic 1 (stuck-at-1).
As mentioned above, the polynomial basis multiplication can be performed by one of Eqs. (10), (11), or (12), depending on the relation between m and 2n. Therefore, the error detection schemes for these three cases are discussed separately as follows.
(a) When m = 2n, the systolic PB multiplier is shown in Fig. 7. Each row can compute either Eq. (1) (using x^m = x^n + 1) or Eq. (2) (using x^n = x^m + 1) by setting the select line S = 0 or 1, respectively. The select line S decides the left XOR operation of the U cells. In step 1, the left XOR operation of U_{n-1} is activated by setting S = 0. In step 2, the left XOR operation of U_0 is activated by setting S = 1.
(b) When m > 2n, the systolic PB multiplier is shown in Fig. 8. Each row can compute either Eq. (1) (using x^m = x^n + 1) or Eq. (2) (using x^n = x^m + 1) by setting the select line S = 0 or 1, respectively. The select line S decides the left XOR operation of the U cells. In step 1, the left XOR operation of U_{n-1} is activated by setting S = 0. In step 2, the left XOR operation of U_0 is activated by setting S = 1.
(c) When m < 2n, the systolic PB multiplier is shown in Fig. 9. Each row can compute either Eq. (1) (using x^m = x^n + 1) or Eq. (2) (using x^n = x^m + 1) by setting the select line S = 0 or 1, respectively. The select line S decides the left XOR operation of the U cells. In step 1, the left XOR operation of U_{n-1} is activated by setting S = 0. In step 2, the left XOR operation of U_0 is activated by setting S = 1.
[Figure: m × m array; row i receives a_i and the select line S and consists of cells U_{i,0}, V_{i,1}, ..., U_{i,n-1}, V_{i,n}, ..., V_{i,m-1}; the Step 1 and Step 2 operands b enter at the top; the Step 1 outputs c_0, ..., c_{m-1} are latched (L) and compared with the Step 2 outputs c_0', ..., c_{m-1}' by an equality checker that drives the error signal.]
Fig. 7. The polynomial basis multiplier for the case m=2n
[Figure: m × m array; row i receives a_i and the select line S and consists of cells U_{i,0}, V_{i,1}, ..., U_{i,n-1}, V_{i,n}, ..., V_{i,m-n-1}, ..., V_{i,m-1}; the Step 1 and Step 2 operands b enter at the top; the Step 1 outputs c_0, ..., c_{m-1} are latched (L) and compared with the Step 2 outputs c_0', ..., c_{m-1}' by an equality checker that drives the error signal.]
Fig. 8. The polynomial basis multiplier for the case m>2n
[Figure: m × m array; row i receives a_i and the select line S and consists of cells U_{i,0}, V_{i,1}, ..., V_{i,m-n-1}, ..., U_{i,n-1}, V_{i,n}, ..., V_{i,m-1}; the Step 1 and Step 2 operands b enter at the top; the Step 1 outputs c_0, ..., c_{m-1} are latched (L) and compared with the Step 2 outputs c_0', ..., c_{m-1}' by an equality checker that drives the error signal.]
Fig. 9. The polynomial basis multiplier for the case m<2n
The multiplication algorithm based on the proposed PB multiplier (shown in Figs. 7, 8, and 9) is described as follows:
Algorithm-CED: (Polynomial basis multiplication with CED capability)
Input:
  A = (a_0, a_1, …, a_{m-2}, a_{m-1})
  Step 1: B = (b_0, b_1, …, b_{m-2}, b_{m-1})
  Step 2: B' = (b_0, b_1, …, b_{m-2}, b_{m-1})
Output:
  C = (c_0, c_1, …, c_{m-2}, c_{m-1})
  C' = (c_0', c_1', …, c_{m-2}', c_{m-1}')
Begin
  Step 1: Compute C = A×B
  Step 2: Compute C' = A×B'
  Step 3: Compare C and C';
    if equal then output C,
    else output error signal
End
The relation between C' and C is as follows:
c_i' = c_<i+n> for 0 ≤ i ≤ m-1, where <t> denotes t mod m.
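A behavioural sketch of Algorithm-CED is given below. It is a simplification under loudly stated assumptions: the product is computed in software by shift-and-add, the second pass is modelled only through the stated relation c_i' = c_<i+n> (a rotated placement of the same product on the output lines), and the fault is a stuck-at value on one physical output line rather than inside a cell of the array. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

def mul_trinomial(a: int, b: int, m: int, n: int) -> int:
    """Shift-and-add product modulo x^m + x^n + 1 (bit i = coefficient of x^i)."""
    mask, c, b_i = (1 << m) - 1, 0, b & ((1 << m) - 1)
    for i in range(m):
        if (a >> i) & 1:
            c ^= b_i
        carry = (b_i >> (m - 1)) & 1
        b_i = (b_i << 1) & mask
        if carry:
            b_i ^= (1 << n) | 1
    return c

def to_bits(v: int, m: int) -> List[int]:
    return [(v >> i) & 1 for i in range(m)]

def rotate(c: List[int], n: int) -> List[int]:
    """Second-pass layout: c'_i = c_<i+n>, with <t> = t mod m."""
    m = len(c)
    return [c[(i + n) % m] for i in range(m)]

def unrotate(cp: List[int], n: int) -> List[int]:
    m = len(cp)
    c = [0] * m
    for i in range(m):
        c[(i + n) % m] = cp[i]
    return c

def algorithm_ced(a: int, b: int, m: int, n: int,
                  stuck: Optional[Tuple[int, int]] = None) -> Tuple[List[int], bool]:
    """Return (C, error). `stuck` = (line, value) is an assumed output-line fault."""
    c = to_bits(mul_trinomial(a, b, m, n), m)              # Step 1
    cp = rotate(to_bits(mul_trinomial(a, b, m, n), m), n)  # Step 2
    if stuck is not None:
        line, value = stuck
        c[line] = value        # the same physical line is faulty in both passes
        cp[line] = value
    return c, unrotate(cp, n) != c                         # Step 3

if __name__ == "__main__":
    print(algorithm_ced(0b00011, 0b00011, 5, 2))
    print(algorithm_ced(0b00011, 0b00011, 5, 2, stuck=(0, 0)))
```

In this simplified model the fault touches logical bit `line` in the first pass and logical bit <line+n> in the second pass. Since 0 < n < m these positions differ, so either the comparison fails or the delivered result C is already correct.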
Theorem 1: Any single cell fault in the proposed PB multiplier in Fig. 7, Fig. 8, or Fig. 9 is detected by Algorithm-CED.
Proof:
Suppose that the faulty cell is H_{i,j}, where H = U or V, and 0 ≤ i, j ≤ m-1. The behavior of the faulty cell can then be classified into one of the following cases, each of which is proven detectable.
(1) Error on a_out
The output a_out of the cell H_{i,j} is a pass-through line from the input a_in; therefore, comparing the input a_in of the cell U_{i,0} with the output a_out of the cell V_{i,m-1} easily reveals any error.
(2) Error on b_out
Suppose an error occurs on the output b_out of the faulty cell H_{i,j}. In the first computation, the error will affect the outputs c_{j+1}, c_{j+2}, …, c_<j+m-1-i> (the symbol <t> denotes t mod m). In the second computation, the error will affect the outputs c'_{j+1}, c'_{j+2}, …, c'_<j+m-1-i>, which are equal to c_<j+1+n>, c_<j+2+n>, …, c_<j+m-1-i+n>.
It is easy to see that at least one bit of c'_{j+1}, c'_{j+2}, …, c'_<j+m-1-i> does not appear among c_{j+1}, c_{j+2}, …, c_<j+m-1-i> because n ≠ 0. For example, the output bit c_{j+1} is affected in Step 1 but not in Step 2. Therefore, comparing the outputs of the first and second computations reveals the presence of the error.
(3) Error on c_out
Suppose an error occurs on c_out of the faulty cell H_{i,j}. In the first computation, the error will affect the outputs c_j, c_{j+1}, c_{j+2}, …, c_<j+m-1-i>. In the second computation, the error will affect the outputs c'_j, c'_{j+1}, c'_{j+2}, …, c'_<j+m-1-i>, which correspond to the outputs c_<j+n>, c_<j+1+n>, c_<j+2+n>, …, c_<j+m-1-i+n>. In other words, at least one bit that belongs to c_j, c_{j+1}, c_{j+2}, …, c_<j+m-1-i> does not appear among c'_j, c'_{j+1}, c'_{j+2}, …, c'_<j+m-1-i> because n is not equal to 0. Hence, at least one output bit such as c_j is affected in Step 1 but not in Step 2, and comparing the outputs of both computations detects this error.
If the error occurs on the cell V_{i,m-1} (0 ≤ i ≤ m-2) of Fig. 7, 8, or 9, the faulty cell will influence all cells U_{i,n} (0 ≤ i ≤ m-2); thus Algorithm-CED cannot detect this error. Future research will address this problem.
5 Results
The proposed PB multiplier with concurrent error detection capability employs the RESO method. As mentioned, no concurrent error detection algorithms for PB multipliers generated by trinomials have been developed to date. Thus, existing PB multipliers generated by general polynomials are compared with the proposed one, and the results are listed in Table 1.
Recently, Wang-Lin [39] and Lee et al. [34] proposed methods for designing systolic and semi-systolic arrays with error detection, respectively. Wang-Lin [39] proposed an error detection scheme for the PB multiplier using a general polynomial, but their scheme could not provide concurrent error detection capability. Lee et al. [34], however, proposed a concurrent error detection architecture for the PB multiplier using a general polynomial.
We take the transistor count using a standard CMOS VLSI realization. In CMOS VLSI technology, an inverter, an n-input AND, an n-input OR, an n-input XOR, and a 1-bit latch are composed of 2, 2n+2, 2n+2, 2n+2, and 8 transistors, respectively [40]. Real circuits such as the M74HC86 (STMicroelectronics, 2-input XOR gate, tPD = 12 ns (typ.)) [41], M74HC08 (STMicroelectronics, AND gate, tPD = 7 ns (typ.)) [42], M74HC279 (STMicroelectronics, SR latch, tPD = 13 ns (typ.)) [43], M74HC32 (STMicroelectronics, OR gate, tPD = 8 ns (typ.)) [44], M74HC11 (STMicroelectronics, 3-input AND gate, tPD = 9 ns (typ.)) [45], and M74AC157 (STMicroelectronics, 2-1 multiplexer, tPD = 4 ns (typ.)) [46] are employed for comparing time complexity. The results of the comparison for various polynomial basis multipliers are listed in Table 1.
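The transistor-count rule above can be applied mechanically. The sketch below encodes the per-gate costs (2 for an inverter, 2n+2 for an n-input AND, OR, or XOR, 8 for a 1-bit latch) and evaluates a gate mix. The gate counts in `proposed_multiplier_gates` are an assumption: they are the values as read from the proposed-design column of Table 1, whose layout had to be inferred; the function names are hypothetical.

```python
from typing import Dict

# Transistors per gate: inverter 2, n-input AND/OR/XOR 2n+2, 1-bit latch 8.
TRANSISTORS_PER_GATE: Dict[str, int] = {
    "inv": 2,
    "xor2": 2 * 2 + 2, "xor3": 2 * 3 + 2,
    "and2": 2 * 2 + 2, "and3": 2 * 3 + 2, "and4": 2 * 4 + 2,
    "or2": 2 * 2 + 2, "or4": 2 * 4 + 2,
    "latch": 8,
}

def transistor_count(gates: Dict[str, int]) -> int:
    """Total transistors for a dictionary mapping gate type to gate count."""
    return sum(TRANSISTORS_PER_GATE[kind] * count for kind, count in gates.items())

def proposed_multiplier_gates(m: int) -> Dict[str, int]:
    """Gate counts of the multiplier array, as read from Table 1 (assumed reading)."""
    return {
        "inv": m + 2 * m,
        "xor2": m * m + 2 * m,
        "and2": m * m + 4 * m,
        "or2": 2 * m,
        "latch": 2 * m * m + m,
    }

if __name__ == "__main__":
    m = 5
    print(transistor_count(proposed_multiplier_gates(m)), 28 * m * m + 62 * m)
```

With this reading of the gate counts, the total agrees with the 28m^2 + 62m transistor count reported for the proposed multiplier array, excluding the equality checker.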
As shown in Table 1, the proposed PB multiplier with concurrent error detection capability has smaller time and space complexity than the Wang-Lin multiplier [39], which does not have concurrent error detection capability. Although the proposed PB multiplier is about 22% worse in time complexity than that of Lee et al. [34], it saves at least 40% of the space complexity.
Table 1. Comparisons of various polynomial basis multipliers with and without concurrent error detection capability

Multipliers | Lee et al. [34] | Wang-Lin [39] | Proposed design
Generated polynomial | General polynomial | General polynomial | Trinomial
Array type | Semi-systolic | Systolic | Semi-systolic
Number of cells | U: m^2+2m, V: m | m^2 | U: 2m, V: m^2
No. of cell types | 2 | 1 | 2
Space complexity: | | |
  Inverter | 0 | 0 | m+2m
  2-input XOR | 2m^2+5m | 2m^2 | m^2+2m
  3-input XOR | 0 | m^2 | 0
  2-input AND | 2m^2+2m | 0 | m^2+4m
  3-input AND | m | 0 | 0
  2-input OR | 0 | 0 | 2m
  1-bit latch | 2m^2+6m+1 | 7m^2 | 2m^2+m
  Transistor count | 40m^2+74m+8 | 76m^2 | 28m^2+62m
Extra equality checker circuits: | | |
  Inverter | 2m | - | m
  2-input XOR | 0 | - | 0
  3-input XOR | 0 | - | 0
  2-input AND | 4m-4 | - | 2m-2
  3-input AND | 0 | - | 0
  4-input AND | m-1 | - | (m-1)/2
  2-input OR | 4m-1 | - | 2m
  4-input OR | m-1 | - | (m-1)/2
  1-bit latch | 3m-3 | - | (3m-3)/2
  Transistor count | 48m^2+82m-64 | - | 24m^2+41m-32
Total space cost | 88m^2+156m-56 | 76m^2 | 52m^2+103m-32
Time complexity: | | |
  Cell delay | U: 64 ns, V: 34 ns | 117 ns | U: 80 ns, V: 45 ns
  Latency | m | 3m | m
  Total delay (unit: ns) | 97m | 351m | 125m
  Throughput (unit: 1/cycle) | 1/2 | 1 | 1/2
Concurrent error detection | Yes | No | Yes
6 Conclusions
A novel systolic polynomial basis multiplier using an irreducible trinomial has been proposed. By employing the RESO method [36], [37], the proposed systolic polynomial basis multiplier has been given concurrent error detection capability. The proposed multiplier with concurrent error detection capability incurs only a small space complexity overhead compared with other existing multipliers. Compared with the multiplier of Lee et al. [34], the proposed architecture increases the time overhead by about 20% but saves about 40% of the space complexity.
7 Acknowledgement
The authors would like to thank anonymous referees and the editor for carefully reading the paper and for their
great help in improving the paper.
References
[1] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, North Holland, Amsterdam, 1988.
[2] R. Lidl and H. Niederreiter, Introduction to Finite Fields and their Applications, Cambridge University Press, New York, 1994.
[3] R. E. Blahut, Fast Algorithms for Digital Signal Processing, Addison-Wesley, 1985.
[4] I. S. Reed and T. K. Truong, “The Use of Finite Fields to Compute Convolutions,” IEEE Transactions on Information Theory, Vol. 21, No. 2, pp. 208-213, 1975.
[5] B. Benjauthrit and I. S. Reed, “Galois Switching Functions and Their Applications,” IEEE Transactions on Computers, Vol. C-25, No. 1, pp. 78-86, 1976.
[6] C. C. Wang and D. Pei, “A VLSI Design for Computing Exponentiations in GF(2^m) and Its Application to Generate Pseudorandom Number Sequences,” IEEE Transactions on Computers, Vol. 39, No. 2, pp. 258-262, 1990.
[7] E. Berlekamp, “Bit-serial Reed-Solomon Encoders,” IEEE Transactions on Information Theory, Vol. 28, No. 6, pp. 869-874, 1982.
[8] M. Morii, M. Kasahara, D. L. Whiting, “Efficient Bit-serial Multiplication and the Discrete-time Wiener-Hopf Equation Over Finite Fields,” IEEE Transactions on Information Theory, Vol. 35, No. 6, pp. 1177-1183, 1989.
[9] R. Schroeppel, H. Orman, S. O'Malley, O. Spatscheck, “Fast Key Exchange with Elliptic Curve Systems,” in Proceedings of Advances in Cryptology-CRYPTO, Santa Barbara, California, USA, pp. 43-56, 1995.
[10] E.D. Win, A. Bosselaers, P. De Gersem, S. Vandenberghe, J. Vandewalle, “A Fast Software Implementation for
Arithmetic Operations in GF (2n),” in Proceedings of Advances in Cryptology - ASIACRYPT'96, Kyongju, Korea, pp.
65-76, 1996.
[11] A. J. Menezes and I. F. Blake, Applications of Finite Fields, Springer, 2010.
[12] C.Y. Lee, “Low Complexity Bit-parallel Systolic Multiplier Over GF(2m) Using Irreducible Trinomials,” IEE
Proceedings-Computers and Digital Techniques, Vol. 150, No. 1, pp. 39-42, 2003.
[13] C. Paar, “A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields,”
IEEE Transactions on Computers, Vol. 45, No. 7, pp. 856-861, 1996.
[14] C.W. Chiou, L.C. Lin, F.H. Chou, S.F. Shu, “Low-complexity Finite Field Multiplier Using Irreducible Trinomials,”
Electronics Letters, Vol. 39, No. 24, pp. 1709-1711, 2003.
[15] C.W. Chiou, C.Y. Lee, J.M. Lin, “Efficient Systolic Arrays for Power-sum, Inversion, and Division in GF (2 m),”
International Journal of Computer Sciences and Engineering Systems, Vol. 1, No. 1, pp. 27-41, 2007.
[16] C.Y. Lee, Y.H. Chen, C.W. Chiou, J.M. Lin, “Unified Parallel Systolic Multiplier Over GF(2 m),” Journal of Computer
Science and Technology, Vol. 22, No. 1, pp. 28-38, 2007.
[17] C.Y. Lee, J.M. Lin, C.W. Chiou, “Scalable and Systolic Architecture for Computing Double Exponentiation Over GF
(2m),” Acta Applicandae Mathematicae, Vol. 93, No. 1, pp. 161-178, 2006.
[18] J. L. Massey and J. K. Omura, Computational method and apparatus for finite field arithmetic, in US patent, pp. 627,
1986.
[19] A. Reyhani-Masoleh and M. A. Hasan, “A New Construction of Massey-Omura Parallel Multiplier Over GF(2m),”
IEEE Transactions on Computers, Vol. 51, No. 5, pp. 511-520, 2002.
[20] C.Y. Lee and C.W. Chiou, “Efficient Design of Low-complexity Bit-parallel Systolic Hankel Multipliers to Implement
Multiplication in Normal and Dual Bases of GF (2m),” IEICE Transactions on Fundamentals of Electronics,
Communications and Computer Sciences, Vol. 88, No. 11, pp. 3169-3179, 2005.
[21] C.W. Chiou and C.Y. Lee, “Multiplexer-based Double-exponentiation for Normal Basis of GF (2m),” Computers &
Security, Vol. 24, No. 1, pp. 83-86, 2005.
[22] H. Wu, M. A. Hasan, I. F. Blake, “New Low-complexity Bit-parallel Finite Field Multipliers Using Weakly Dual
Bases,” IEEE Transactions on Computers, Vol. 47, No. 11, pp. 1223-1234, 1998.
[23] C.Y. Lee, C.W. Chiou, J.M. Lin, “Concurrent Error Detection in a Bit-parallel Systolic Multiplier for Dual Basis of
GF (2m),” Journal of Electronic Testing, Vol. 21, No. 5, pp. 539-549, 2005.
[24] J. Kelsey, B. Schneier, D. Wagner, H. Hall, “Side Channel Cryptanalysis of Product Ciphers,” Journal of Computer
Security, Vol. 8, No. 2, pp. 141-158, 2000.
[25] E. Biham and A. Shamir, “Differential Fault Analysis of Secret Key Cryptosystems,” in Proceedings of Advances in
Cryptology — CRYPTO'97, Santa Barbara, California, USA, pp. 513-525, 1997.
[26] D. Boneh, R. DeMillo, R. Lipton, “On the Importance of Checking Cryptographic Protocols for Faults,” in
Proceedings of Advances in Cryptology-EUROCRYPT'97, Konstanz, Germany, Vol. 1233, pp. 37-51, 1997.
[27] R. Karri, G. Kuznetsov, M. Goessel, “Parity-based Concurrent Error Detection of Substitution-permutation Network
Block Ciphers,” in Proceedings of Cryptographic Hardware and Embedded Systems-CHES 2003, Cologne, Germany,
pp. 113-124, 2003.
[28] G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, V. Piuri, “Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard,” IEEE Transactions on Computers, Vol. 52, No. 4, pp. 492-505, 2003.
[29] M. Joye, A. K. Lenstra, J. J. Quisquater, “Chinese Remaindering Based Cryptosystems in the Presence of Faults,”
Journal of Cryptology, Vol. 12, No. 4, pp. 241-245, 1999.
[30] D. Boneh, R. DeMillo, R. J. Lipton, “On the importance of Eliminating Errors in Cryptographic Computations,”
Journal of Cryptology, Vol. 14, No. 2, pp. 101-119, 2001.
[31] S. Fenn, M. Gossel, M. Benaissa, D. Taylor, “On-line Error Detection for Bit-serial Multipliers in GF (2m),” Journal
of Electronic Testing, Vol. 13, No. 1, pp. 29-40, 1998.
[32] A. Reyhani-Masoleh and M. A. Hasan, “Error Detection in Polynomial Basis Multipliers Over Binary Extension
Fields,” in Proceedings of Cryptographic Hardware and Embedded Systems - CHES 2002, CA, USA, pp. 515-528,
2003.
[33] C.W. Chiou, “Concurrent Error Detection in Array Multipliers for GF (2 m) Fields,” Electronics Letters, Vol. 38, No.
14, pp. 688-689, 2002.
[34] C.Y. Lee, C.W. Chiou, J.L. Lin, “Concurrent Error Detection in a Polynomial Basis Multiplier Over GF (2m),” Journal
of Electronic Testing, Vol. 22, No. 2, pp. 143-150, 2006.
[35] C.W. Chiou, C.Y. Lee, A.W. Deng, J.M. Lin, “Concurrent Error Detection in Montgomery Multiplication Over GF
(2m),” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. 89, No. 2,
pp. 566-574, 2006.
[36] J. H. Patel and L.Y. Fung, “Concurrent Error Detection in ALU's by Recomputing with Shifted Operands,” IEEE
Transactions on Computers, Vol. C-31, No.7, pp. 589-595, 1982.
[37] J. H. Patel, L.Y. Fung, “Concurrent Error Detection in Multiply and Divide Arrays,” IEEE Transactions on Computers,
Vol. C-32, No. 4, pp. 417-422, 1983.
[38] J. F. Wakerly, Error Detecting Codes, Self-checking Circuits and Applications, Elsevier, 1978.
[39] C.L. Wang and, J.L. Lin, “Systolic Array Implementation of Multipliers for Finite Fields GF (2 m),” IEEE Transactions
on Circuits and Systems, Vol. 38, No. 7, pp. 796-800, 1991.
[40] N. H. E. Weste, K. Eshraghian, M. J. S. Smith, in Principles of CMOS VLSI Design: A Systems Perspective with
Verilog/VHDL Manual, Addison Wesley, 2000.
[41] M74HC86, Quad Exclusive OR Gate, STMicroelectronics,http://www.st.com/stonline/books/pdf/docs/2006.pdf
[42] M74HC08, Quad 2-input AND Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1885.pdf
[43] M74HC279, Quad S-R Latch, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1937.pdf
[44] M74HC32, Quad 2-input OR Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1944.pdf
[45] M74HC11, Triple 3-input AND Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1890.pdf
[46] M74AC157, Quad 2 Channel Multiplexer, STMicroelectronics, http://www.st.com/stonline/books/pdf/doce/5144.pdf