area efficient implementation of gf (2 ) multipliers for finite fields

advertisement
International Journal of Electrical, Electronics and Data Communication, ISSN: 2320-2084
Volume-2, Issue-5, May-2014
M
AREA EFFICIENT IMPLEMENTATION OF GF (2 ) MULTIPLIERS
FOR FINITE FIELDS
1
SONIA SOPHIA JOSEPH, 2S PRAKASH
1
PG Scholar, Applied Electronics, Jerusalem College of Engineering, Chennai, India.
2
Professor, E.C.E Department, Jerusalem College of Engineering, Chennai, India.
Email: soniasjoseph@gmail.com, prakash.sav4@gmail.com
Abstract— Galios field arithmetic have many applications in cryptography and other fields. Multiplication is the most
expensive operation in Galios field. Multipliers for Galios Field, GF (2m) have wide applications in Cryptography and error
control coding systems. The existing multipliers for Galios Field require large area and time complexity, which is
undesirable for real time applications. This paper presents an area-time efficient systolic structure for multiplication over GF
(2m) based on irreducible all-one polynomial (AOP). A novel cut-set retiming is used to reduce the critical path to one XOR
delay. The systolic structure can be further divided into two or more parallel branches to reduce the delay. The multiplier is
synthesized using Verilog HDL. Area can be further reduced using a novel register sharing technique. The result shows that
this multiplier has reduced area-time complexity than the existing multipliers.
Keywords—All One Polynomial, Finite Field, Systolic Design, Verilog HDL
I.
the critical delay. Using cut set retiming we can
introduce certain number of delays on all the edges in
one direction of any cut-set of a signal flow-graph
(SFG) by removing equal number of delays on all
the edges in the reverse direction of the same cut-set. It
is useful for pipelining digital circuits. This systolic
multiplier can be divided into two or more parallel
branches for reducing the delay. These parallel
branches share the same input. The area can be
further reduced by using a novel register sharing
technique. In this technique a shared register is used
for two processing elements that share the same input
operand.
This paper presents an area- time efficient systolic
structure for multiplication over GF (2m) based on all
one polynomial. The multiplier is synthesized using
Verilog HDL. The logic verification is done using
Modelsim 6.4e and the result is compared with the
existing designs.
INTRODUCTION
finite field multipliers over GF (2m) have many
applications cryptography and error control coding
system such as Reed-Solomon codes. In elliptical
curve cryptography (ECC) and many other
cryptographic applications finite field multipliers are
used. Efficient hardware design for polynomialbased multiplication is important for real-time
applications. All-one polynomial (AOP) is an
irreducible polynomial for efficient implementation
of finite field multiplication. Since AOP-based
multipliers are simple and regular, a number of works
have been done on its efficient realization. AOPbased fields could also be used for efficient
implementation of Reed-Solomon encoders. Since
irreducible AOPs are not abundant nearly AOP
(NAOP) is also used for the implementation of
finite field multipliers. Also, the AOP-based
architectures are also used as a kernel circuit for
exponentiation, inversion, and division architectures.
Systolic design is a preferred hardware for finite field
multipliers because of its high level of pipeline
ability, local connectivity and many other
advantageous features. Systolic array consists of a
network of processing elements. A digit serial
systolic
multiplier
based
on
Montgomery
multiplication algorithm was proposed in. In some
papers bit-parallel implementation of finite field
multipliers are synthesized based on AOP. In systolic
design only local interconnections exists. It provides
high throughput rates.
Systolic structure provides pipeline ability, but it has
two major disadvantages. The first is, systolic
structure consumes large area and power. Second is,
the latency is high for systolic design. If the order of
the field is m, then its latency is nearly m cycles. This
makes it undesirable for real time applications. A
novel cut set retiming is used in this paper to reduce
II.
ALGORITHM
Let f(x) =xm+xm-1+…+x+1 is an irreducible AOP of
degree m over GF (2). The set {m+1,m-2,….,1}
forms the polynomial basis (where  is a root of f(x)
), such that an element X of the binary field can be
given by
X=Xm-1m-1+Xm-2m-2+…+X1+X0, where XiGF (2)
for i=m-1… 2, 1, 0.
Since  is a root of f(x),
f()=0,and
f()+f()=(m+m-1+….++1)+
(m+m1
m+1
+….++1) =  +1=0
Therefore, we have m+1=1
This property of All one polynomial is used for the
multiplication of GF (2m) to reduce the complexity as
discussed in the following.
Area Efficient Implementation of GF (2m) Multipliers for Finite Fields
16
International Journal of Electrical, Electronics and Data Communication, ISSN: 2320-2084
Volume-2, Issue-5, May-2014
A=
j B=
j C =
j
,
Where aj bj, cj GF (2), for 0<= j <= m-1, and am=0,
bm=0, cm=0 If C is the product of elements.
…………………………….. .. (1)
Xi=bi*Ai …………………………………….. (2)
Ai+1=a0i+1 + a0i+1+ a1i+1.α+…..+ ami+1.αm …… (3)
The systolic implementation of multiplication over
GF (2m), the operations of (1), (2) and (3) can be
performed recursively. Each recursion is composed of
three steps, i.e., bit-addition of (1), bit-multiplication
of (2), and modular reduction of (3).
III.
Fig. 2. Cut-set retiming of the SFG. (a) Cut-set retiming in a
general way. (b) Proposed cut-set retiming. (c) Formation of PE.
“D” denotes unit delay.
The regular PE consists of bit shift cell (BSC),
addition cell and XOR cell. Bit shift cell performs
modular reduction operation and XOR cell performs
addition. The bit multiplication is done using AND
cell. The AND cell and the XOR cell consists of
(m+1) AND and XOR gates working in parallel.
The systolic multiplier produces the first output
(m+2) cycles after the first input is fed into it.
SYSTOLIC MULTIPLER DESIGN
A.
Basic systolic multiplier design
The systolic multiplier performs the operations of bit
addition (1), bit multiplication (2) and modular
reduction (3) recursively. This can be represented by
the SFG as in Fig.1.
Fig. 3. Proposed systolic structure.
Fig. 3 (b) Regular PE
Fig. 1. SFG of the algorithm. (a) The SFG. (b) Function of
node R(i). (c) Function of node M(i) . (d) Function of node A(i).
The node M (i) performs multiplication, A (i)
performs addition and R(i) performs modular
reduction operations. The proposed cut set retiming is
shown in Fig.2. The critical path is TA+TX in Fig 2(a),
where TA and TX are the propagation delay of AND
and XOR gates. The bit reduction node does not
perform any computation, therefore does not involve
any time consumption. Using the proposed cut set
retiming, the operations of (1), (2) and (3) can be
performed simultaneously. So the critical delay is
reduced to max (TA,TX). The processing elements
(PE) are formed based on the proposed cut set
retiming. The Fig 2(c) shows the formation of PE.
The basic systolic design is shown in Fig 3. This
multiplier has (m+2) PE. The structure of regular PE
is shown in Fig 3(d). It performs bit addition, bit
multiplication and modular reduction concurrently.
The PE and PE perform multiplication and modular
reduction only.
Fig. 4. Structure of PEs. (a) Internal structure of a regular PE. (b)
Internal struc- ture of PE[0] (c) An example of AND cell for
m=4. (d) Structure of the AC. (e) Structure of BSC . (f)
Alternate structure of a regular PE. (g) Alternate structure of
PE.
B.
Low latency systolic multiplier design
The systolic structure for multiplication can be
divided into two parallel branches
using the
equation to get the low latency systolic multiplier.
C=
+
Area Efficient Implementation of GF (2m) Multipliers for Finite Fields
17
International Journal of Electrical, Electronics and Data Communication, ISSN: 2320-2084
One of the sum contains [(m/2) +1] partial products
while the other has m/2 partial products. The low
latency systolic multiplier is shown in Fig 5. The
upper branch consists of [(m/2) +2] PE and the lower
branch consists of (m/2+1) PEs and a delay cell.
Besides, an addition-cell (AC) is required to perform
the final addition of the outputs of the two systolic
arrays.
Volume-2, Issue-5, May-2014
CONCLUSION
An area time efficient systolic design for the
multiplication over GF(2m) based on irreducible AOP
is proposed. By novel cut-set retiming we have been
able to reduce the critical path to one XOR gate
delay and a low latenc y bit-parallel systolic
multiplier is derived. Compared with the existing
systolic structures for bit-parallel realization of
multiplication over GF(2m), the proposed one requires
lesser area, shorter critical path and lower latency. The
proposed design can be improved using register
sharing technique to further reduce the latency.
REFERENCES
Fig. 5. Proposed low latency systolic structure. (a) The
systolic structure. (b) Function of the AC.
[1]
The low latency systolic structure consumes the same
space complexity as the systolic multiplier but the
delay is reduced to (m/2+3).
IV.
[2]
[3]
SIMULATION RESULTS
The simulation result for the area efficient systolic
structure for multiplication in GF (2m) is obtained
using modelsim 6.4c. The design is coded using
Verilog HDL. The value of m is chosen as 6 GF
(26). The proposed structure require (m+3) PE and an
addition cell. The latency of the design is (m/2+3)
cycles, where the duration of the clock-period is TX.
The simulation output of systolic multiplier for GF
(26) is shown in Fig 6.
[4]
[5]
[6]
[7]
[8]
[9]
[10]
Fig. 6. Simulation output of systolic multiplier GF (26)
[11]
The simulation output of low latency systolic
multiplier for GF (26) is shown in Fig 7.
[12]
[13]
[14]
Fig. 7. Simulation output of low latency systolic multiplier GF
(26)
M. Ciet, J. J. Quisquater, and F. Sica, “A secure family of
composite finite fields suitable for fast implementation of
elliptic curve cryptography,” in Proc. Int. Conf. Cryptol.
India, 2001, pp. 108–116.
H. Fan and M. A. Hasan, “Relationship between GF(2m)
Montgomery and shifted polynomial basis multiplication
algorithms,” IEEE Trans. Computers, vol. 55, no. 9, pp.
1202–1206, Sep. 2006.
C.L.Wang and J.L. Lin, “Systolic array implementation of
multipliers for finite fields GF(2m) ,” IEEE Trans. Circuits
Syst., vol. 38, no. 7, pp. 796–800, Jul. 1991.
B. Sunar and C. K. Koc, “Mastrovito multiplier for all
trinomials,” IEEE Trans. Comput., vol. 48, no. 5, pp. 522–
527, May 1999.s
C. H. Kim, C.-P. Hong, and S. Kwon, “A digit-serial
multiplier for finite field GF(2m) ,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 13, no. 4, pp. 476–483,
2005.
C. Paar, “Low complexity parallel multipliers for Galois
fields based on s GF(2m) special types of primitive
polynomials,” in Proc. IEEE Int. Symp. Inform. Theory,
1994, p. 98.
H. Wu, “Bit-parallel polynomial basis multiplier for new
classes of finite fields,” IEEE Trans. Computers, vol. 57, no.
8, pp. 1023–1031, Aug. 2008.
S. Fenn, M.G. Parker,M. Benaissa, and D. Taylor, “Bitserial multiplication in GF(2m) using all-one polynomials,”
IEE Proc. Com. Digit. Tech., vol. 144, no. 6, pp. 391–393,
1997.
K.Y. Chang, D. Hong, and H.S. Cho, “Low complexity bitparallel multiplier for GF(2m)
defined by all-one
polynomials using redundant representation,” IEEE Trans.
Computers, vol. 54, no. 12, pp. 1628–1629, Dec. 2005.
H.S. Kim and S.W. Lee, “LFSR multipliers over GF(2m)
defined by all-one polynomial,” Integr., VLSI J., vol. 40,
no. 4, pp. 571–578, 2007.
P. K. Meher, Y. Ha, and C.Y. Lee, “An optimized design of
serial-parallel finite field multiplier for GF(2m) based on
all-one polynomials,” in Proc. ASP-DAC, 2009, pp. 210–
215.
M. Sandoval, M. F. Uribe, and C. Kitsos, “Bit-serial and
digit-serial GF(2m) montgomery multipliers using linear
feedback shift registers,” IET Comput. Digit. Tech., vol. 5,
no. 2, pp. 86–94, 2011.
C.Y. Lee, E.H. Lu, and J.Y. Lee, “Bit-parallel systolic
multipliers for GF(2m) fields defined by all-one and equally
spaced polynomials,” IEEE Trans. Computers, vol. 50, no.
6, pp. 385–393, May 2001.
Y.R. Ting, E.H. Lu, and Y.C. Lu, “Ringed bit-parallel
systolic multipliers over a class of fields GF(2m) ,” Integr.,
VLSI J., vol. 38, no. 4, pp. 571–578, 2005.
[15]
C.-Y. Lee, J.S. Horng, I.C. Jou, and E.H. Lu,
“Low-complexity
bit-parallel
systolic
montgomery
ultipliers for special classes of GF(2m),” IEEE Trans.
Computers, vol. 54, no. 9, pp. 1061–1070, Sep. 2005.
Area Efficient Implementation of GF (2m) Multipliers for Finite Fields
18
Download