
International Journal of Conceptions on Electronics and Communication Engineering

Vol. 2, Issue. 1, June’ 2014; ISSN: 2357 - 2809

Power optimization of GCD processor using low power Spartan 6 FPGA family

(an improvement over Spartan 3 FPGA family)

Sachin D Kohale

Assistant Professor, Dept. of Electronics and Telecommunication Engineering,
St. John College of Engineering and Technology,
Palghar (E), Thane Dist., Maharashtra, India. sdk_pz@yahoo.com

Ratnaprabha W Jasutkar

Associate Professor, Dept. of Computer Science and Engineering,
G H Raisoni College of Engineering,
Hingna Road, Nagpur, Maharashtra, India. ratnaprabhajasutkar@gmail.com

Abstract— This work is an extension of earlier work done using the Spartan 3 FPGA family. Power dissipation is an important design consideration. Simulations are done using Spartan 6 because of its features, in particular its 42% lower power consumption and 12% higher performance compared with previous-generation devices. An arithmetic and logic unit (ALU) is implemented that can perform arithmetic and logical operations as well as calculate the greatest common divisor (GCD) using Euclid’s and Stein’s algorithms. Simulation is done with Xilinx ISE 13.4, and experimental results are obtained on Spartan 6. The experimental results show that Stein’s algorithm is better than Euclid’s algorithm for finding the GCD of two non-negative integers, and that the implementation consumes less power on Spartan 6 than on Spartan 3. Power is therefore optimized by using Spartan 6 instead of Spartan 3 to implement the GCD processor.

Keywords- Arithmetic and Logic Unit (ALU), Built-In Self Test (BIST), Euclid’s Algorithm, Greatest Common Divisor, Stein’s Algorithm.

I. INTRODUCTION TO THE ARITHMETIC AND LOGIC UNIT

An arithmetic and logic unit (ALU) is a digital circuit that performs arithmetic and logical operations. The ALU is a fundamental building block of the central processing unit (CPU) of a computer. The inputs to the ALU are the data to be operated on (called operands) and a code from the control unit indicating which operation to perform; its output is the result of the computation. The processors found inside modern CPUs and graphics processing units (GPUs) contain very powerful and complex ALUs, and a single component may contain several ALUs. Most of a processor's operations are performed by one or more ALUs. An ALU loads data from input registers, an external control unit tells the ALU which operation to perform on that data, and the ALU then stores its result in an output register. The control unit is responsible for moving the processed data between these registers, the ALU, and memory.

The ALU is part of the GCD processor, and the operations it performs are tested by adding BIST circuitry to it. The Spartan 6 FPGA family is used for experimentation.

II. BENEFITS OF SPARTAN 6

• Increased system performance with an efficient, dual-register 6-input LUT (look-up table) logic structure.
• Connectivity with up to 8 low-power (150 mW each) 3.2 Gbps GTP serial transceivers.
• DSP applications built with low-power 390 MHz DSP48A1 slices with 18 × 18 multipliers.
• Multi-voltage, multi-standard SelectIO™ banks with low-cost HSTL and SSTL memory interfaces.
• 1000 MHz clocking.

III. GREATEST COMMON DIVISOR AND ITS APPLICATIONS

Greatest common divisor (GCD) [9] calculations are needed in applications such as cryptography and data security. RSA is an algorithm for public-key cryptography that is based on the presumed difficulty of factoring large integers. A user of RSA creates and then publishes the product of two large prime numbers, along with an auxiliary value, as their public key. The prime factors must be kept secret. Anyone can use the public key to encrypt a message, but with currently published methods, if the public key is large enough, only someone with knowledge of the prime factors can feasibly decode the message. Whether breaking RSA encryption is as hard as factoring is an open question known as the RSA problem.

RSA involves a public key and a private key. The public key can be known to everyone and is used for encrypting messages. Messages encrypted with the public key can only be decrypted in a reasonable amount of time using the private key. The GCD enters during key generation: the public exponent e must satisfy gcd(e, φ(n)) = 1, where n is the product of the two primes and φ is Euler's totient function, and the private exponent is computed as the inverse of e modulo φ(n) with the extended Euclidean algorithm.
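In RSA key generation the public exponent e must satisfy gcd(e, φ(n)) = 1, and the private exponent d is the inverse of e modulo φ(n), which the extended Euclidean algorithm yields. The Python sketch below illustrates this under stated assumptions: the textbook toy primes p = 61, q = 53 and exponent e = 17 are chosen for illustration only (far too small for real security), and the function names extended_gcd and toy_rsa_keys are hypothetical.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def toy_rsa_keys(p: int, q: int, e: int) -> tuple[int, int, int]:
    """Return (n, e, d) for a toy RSA key; p and q are assumed to be prime."""
    n = p * q
    phi = (p - 1) * (q - 1)
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        # The public exponent must be coprime to phi(n), i.e. gcd(e, phi) == 1.
        raise ValueError("e is not coprime to phi(n)")
    d = x % phi  # modular inverse of e modulo phi(n)
    return n, e, d


if __name__ == "__main__":
    n, e, d = toy_rsa_keys(61, 53, 17)
    message = 65
    ciphertext = pow(message, e, n)    # encrypt with the public key (n, e)
    recovered = pow(ciphertext, d, n)  # decrypt with the private key (n, d)
    print(n, d, recovered)
```

Running it prints 3233 2753 65: the modulus, the private exponent and the recovered message. Choosing e = 3 instead raises ValueError, because gcd(3, 3120) = 3.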


In mathematics, the greatest common divisor (gcd), also known as the greatest common factor (gcf) or highest common factor (hcf), of two or more non-zero integers is the largest positive integer that divides the numbers without a remainder. For example, the GCD of 48 and 180 is 12.

Figure 1. GCD Calculation

There are two main algorithms for calculating the greatest common divisor (GCD) of two non-negative numbers: Euclid’s algorithm and Stein’s algorithm.

A. Euclid’s Algorithm

In mathematics, the Euclidean algorithm (also called Euclid's algorithm) is an efficient method for computing the greatest common divisor (GCD) of two integers, also known as the greatest common factor (GCF) or highest common factor (HCF). Euclid’s algorithm can be described as:

gcd(a, 0) = a (1)

gcd(a, b) = gcd(b, a mod b) (2)

If both arguments are greater than zero, then

gcd(a, a) = a (3)

gcd(a, b) = gcd(a - b, b) ; if b < a (4)

gcd(a, b) = gcd(a, b - a) ; if a < b (5)

For example, gcd(20, 20) is 20. Also, gcd(20, 40) is the same as gcd(20, 40 - 20), which is again gcd(20, 20).

B. Stein’s Algorithm

The binary GCD algorithm, also known as Stein's algorithm, computes the greatest common divisor of two non-negative integers. It gains a measure of efficiency over the ancient Euclidean algorithm by replacing divisions and multiplications with shifts. Stein’s algorithm can be described as:

gcd(0, v) = v (6)

gcd(u, 0) = u (7)

gcd(0, 0) = 0 (8)

If u and v are both even, then

gcd(u, v) = 2 · gcd(u/2, v/2) (9)

If u is even and v is odd, then

gcd(u, v) = gcd(u/2, v) (10)

Similarly, if u is odd and v is even, then

gcd(u, v) = gcd(u, v/2) (11)

If u and v are both odd and u ≥ v, then

gcd(u, v) = gcd((u - v)/2, v) (12)

If both are odd and u < v, then

gcd(u, v) = gcd((v - u)/2, u) (13)

For example, gcd(0, 22) is 22 and gcd(33, 0) is 33. Similarly, gcd(21, 22) is the same as gcd(21, 11), and gcd(21, 41) is the same as gcd((41 - 21)/2, 21), which is again gcd(10, 21).

IV. ALGORITHM IMPLEMENTATION

A. Euclid’s Algorithm

The flowchart of the Euclid’s algorithm implementation [9] is shown in Figure 2 below. The result, i.e. the GCD of the two non-negative inputs, is finally stored in register R. Three inputs, ldr, clk and rst, are used with each register: ldr loads a value into the respective register, clk provides the clock for synchronized operation, and rst resets the value stored in the register.

Figure 2. Euclid’s Algorithm Implementation Logic

B. Stein’s Algorithm

The Stein’s algorithm implementation logic is shown in Figure 3; its result is likewise finally stored in register R. In this way, the GCD is calculated using Euclid’s and Stein’s algorithms, and both algorithms are tested for correctness with the BIST feature added to the design.
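The GCD rules of Section III can be checked in software before they are mapped to hardware. The Python sketch below is a behavioral model only: it is not the paper's VHDL design and does not model the ldr/clk/rst registers. euclid_gcd applies rules (1) and (3)-(5) by repeated subtraction, and stein_gcd applies Stein's rules from Section III.B using only shifts, comparisons and subtraction. Both function names are hypothetical.

```python
import math


def euclid_gcd(a: int, b: int) -> int:
    """Subtraction-based Euclid's algorithm for non-negative integers."""
    if a == 0:
        return b                 # symmetric case of gcd(a, 0) = a
    if b == 0:
        return a                 # gcd(a, 0) = a, rule (1)
    while a != b:
        if b < a:
            a -= b               # gcd(a, b) = gcd(a - b, b) if b < a, rule (4)
        else:
            b -= a               # gcd(a, b) = gcd(a, b - a) if a < b, rule (5)
    return a                     # gcd(a, a) = a, rule (3)


def stein_gcd(u: int, v: int) -> int:
    """Binary (Stein's) GCD using only shifts, comparisons and subtraction."""
    if u == 0:
        return v                 # gcd(0, v) = v, which also gives gcd(0, 0) = 0
    if v == 0:
        return u                 # gcd(u, 0) = u
    shift = 0
    while ((u | v) & 1) == 0:    # both even: gcd(u, v) = 2 * gcd(u/2, v/2)
        u >>= 1
        v >>= 1
        shift += 1
    while (u & 1) == 0:          # u even, v odd: gcd(u, v) = gcd(u/2, v)
        u >>= 1
    while v != 0:
        while (v & 1) == 0:      # u odd, v even: gcd(u, v) = gcd(u, v/2)
            v >>= 1
        if u > v:                # both odd: keep the smaller value in u
            u, v = v, u
        v -= u                   # v - u is even; if non-zero, the next pass halves it
    return u << shift            # restore the common factors of 2


if __name__ == "__main__":
    for x in range(64):
        for y in range(64):
            assert euclid_gcd(x, y) == stein_gcd(x, y) == math.gcd(x, y)
    print(euclid_gcd(48, 180), stein_gcd(21, 41))
```

The main block cross-checks both functions against Python's math.gcd for all operand pairs below 64, much as the BIST circuitry compares a reference result with the circuit under test, and then prints 12 1. Stein's version needs no division, which is why it maps naturally onto shifters and subtractors.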


Random inputs are generated using a linear feedback shift register (LFSR). These random inputs are loaded into the respective registers as initial values to verify the output. The GCD is thus calculated, and the results are tested with BIST circuitry [8]. This circuitry includes flip-flops connected to each other through EX-OR and NOT gates, as shown in Fig. 4 and Fig. 5 below.

Figure 3. Stein’s Algorithm Implementation Logic

V. ALU OPERATIONS IN GCD PROCESSOR

The operations implemented by the ALU are listed in Table I below.

TABLE I. VARIOUS ALU OPERATIONS IN GCD PROCESSOR

Opcode | Operation of ALU
000 | Addition
001 | Subtraction
010 | Multiplication
011 | Division
100 | GCD using Euclid’s algorithm
101 | GCD using Stein’s algorithm
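Table I amounts to an opcode-to-operation mapping. The sketch below is a hypothetical behavioral model of such an ALU in Python, not the paper's VHDL. The 8-bit operand width, wrap-around addition and subtraction, a full 16-bit product, integer (floor) division and the ValueError for division by zero or an undefined opcode are all assumptions made for illustration. math.gcd stands in for both GCD units, since Euclid's and Stein's algorithms return the same value.

```python
import math

WIDTH = 8                      # assumed operand width, matching the 8-bit designs
MASK = (1 << WIDTH) - 1


def alu(opcode: str, a: int, b: int) -> int:
    """Behavioral model of the Table I ALU (hypothetical, for illustration)."""
    a &= MASK
    b &= MASK
    if opcode == "000":        # Addition, wrapping at WIDTH bits
        return (a + b) & MASK
    if opcode == "001":        # Subtraction, two's-complement wrap at WIDTH bits
        return (a - b) & MASK
    if opcode == "010":        # Multiplication, full 2*WIDTH-bit product
        return a * b
    if opcode == "011":        # Division, integer quotient
        if b == 0:
            raise ValueError("division by zero")
        return a // b
    if opcode in ("100", "101"):  # GCD via Euclid's / Stein's algorithm
        return math.gcd(a, b)     # both algorithms give the same result
    raise ValueError(f"undefined opcode {opcode!r}")
```

For example, alu("000", 200, 100) returns 44 because the sum wraps at 8 bits, and alu("100", 48, 180) returns 12.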

VI. CONCEPT OF LINEAR FEEDBACK SHIFT REGISTER

A linear feedback shift register (LFSR) [1] is a shift register that, when clocked, advances the signal through the register from one bit to the next most-significant bit. Some of the outputs are combined in an exclusive-OR configuration to form a feedback mechanism: an LFSR is formed by XORing the outputs of two or more of the flip-flops and feeding the result back into the input of one of the flip-flops.

To apply the BIST features [6], pseudo-random inputs are generated using linear feedback shift registers (LFSRs).

Figure 4. Generation of First Random Input

Inputs generated: 0, 1, 2, 4, 9, 0, 1, ...

Figure 5. Generation of Second Random Input

Inputs generated: 0, 1, 2, 5, 10, 0, 1, ...
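The shift-and-XOR behavior described in this section can be sketched in a few lines of Python. This is a generic 4-bit Fibonacci LFSR with assumed feedback taps on bits 3 and 2 (polynomial x^4 + x^3 + 1). It illustrates the technique only and is not claimed to reproduce the circuits of Fig. 4 and Fig. 5, whose listed sequences return to 0 after five states. The function names are hypothetical.

```python
def lfsr4_step(state: int) -> int:
    """One clock of a 4-bit Fibonacci LFSR (assumed taps: bits 3 and 2)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1  # XOR of the two tapped bits
    return ((state << 1) | feedback) & 0xF         # shift toward the MSB, feed back into bit 0


def lfsr4_sequence(seed: int, count: int) -> list[int]:
    """Return `count` successive register states starting from `seed`."""
    states = []
    state = seed & 0xF
    for _ in range(count):
        states.append(state)
        state = lfsr4_step(state)
    return states


if __name__ == "__main__":
    print(lfsr4_sequence(1, 16))
```

Starting from seed 1, the register visits all 15 non-zero 4-bit states (1, 2, 4, 9, 3, ...) before returning to 1. With XOR feedback the all-zeros state is a lock-up state; using XNOR feedback (an XOR followed by a NOT gate) makes the all-zeros state usable and turns all-ones into the lock-up state instead.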

VII. RESULTS AND DISCUSSION

Fig. 6 below shows a working snapshot of the ALU implementations of the GCD processor with BIST features using the Spartan 6 FPGA family.


Figure 6. Experimental Setup of ALU Implementations with BIST using Spartan 6

A. Euclid’s Algorithm with BIST

Fig. 7 below shows the RTL view of the 8-bit Euclid’s algorithm implementation with BIST using Spartan 6.

TABLE II. DEVICE UTILIZATION SUMMARY OF 8-BIT EUCLID’S ALGORITHM IMPLEMENTATION USING SPARTAN 6

Slice Logic Utilization | Used | Available | Utilization
Number of Slice Registers | 34 | 18224 | 1%
Number of Slice LUTs | 1371 | 9112 | 15%
Number of Occupied Slices | 450 | 2278 | 19%
Number of Bonded IOBs | 35 | 232 | 15%
Number of BUF/BUFG MUXs | 1 | 16 | 6%
Number of OLOGIC2/OSERDES2s | 16 | 248 | 6%
Average Fanout of Non-Clock Nets | 5.39 | - | -

Table II above shows the device utilization summary of the 8-bit Euclid’s algorithm implementation using Spartan 6.

TABLE III. POWER SUMMARY OF 8-BIT EUCLID’S ALGORITHM IMPLEMENTATION USING SPARTAN 6

On-Chip | Power (W) | Used | Available | Utilization (%)
Clocks | 0.000 | 3 | - | -
Logic | 0.000 | 1371 | 9112 | 15
Signals | 0.000 | 1504 | - | -
IOs | 0.000 | 35 | 232 | 15
Leakage | 0.020 | - | - | -
Total | 0.020 | - | - | -

Table III above shows the power summary of the 8-bit Euclid’s algorithm implementation using Spartan 6.

Figure 7. RTL View of 8-bit Euclid’s Algorithm Implementation with BIST

• data01 and data02 are the two random inputs generated by the LFSR circuitry.

• gcd and gcd1 carry the GCD calculated by the reference circuitry and by the actual circuitry, respectively.

• bist_out is ‘1’ when gcd = gcd1, which means the circuit has been tested successfully.
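The comparison performed by bist_out can be modeled in software. This Python sketch is illustrative only: the function names and the injected stuck-at fault are hypothetical, math.gcd plays the role of the reference circuitry (gcd), the function under test plays the role of the actual circuitry (gcd1), and the two LFSR sequences of Section VI are assumed to be applied in lockstep as data01 and data02.

```python
import math
from typing import Callable, Iterable


def bist_run(patterns: Iterable[tuple[int, int]],
             circuit_under_test: Callable[[int, int], int]) -> list[int]:
    """Return one bist_out bit per pattern: 1 when gcd == gcd1, else 0."""
    bist_out = []
    for data01, data02 in patterns:
        gcd = math.gcd(data01, data02)             # reference circuitry
        gcd1 = circuit_under_test(data01, data02)  # actual circuitry
        bist_out.append(1 if gcd == gcd1 else 0)
    return bist_out


def faulty_gcd(a: int, b: int) -> int:
    """Hypothetical faulty circuit: bit 0 of the result stuck at 0."""
    return math.gcd(a, b) & ~1


if __name__ == "__main__":
    # Assumption: the two LFSR sequences are applied in lockstep.
    patterns = list(zip([0, 1, 2, 4, 9], [0, 1, 2, 5, 10]))
    print(bist_run(patterns, math.gcd))    # fault-free circuit
    print(bist_run(patterns, faulty_gcd))  # circuit with injected fault
```

With the fault-free circuit every bist_out bit is 1; with the stuck-at fault the result is [1, 0, 1, 0, 0], flagging exactly the patterns whose GCD is odd.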

Figure 8. Power Summary Snapshot of 8-bit Euclid’s Algorithm implementation using Spartan 3

Using Spartan 3, the power dissipation of the 8-bit Euclid’s algorithm implementation was 92 mW (i.e. 0.092 W), as shown in Fig. 8 above.


With the Spartan 6 FPGA family instead, the power dissipation is reduced to 0.020 W, i.e. 20 mW.

B. Stein’s Algorithm with BIST

Fig. 9 below shows the RTL view of the 8-bit Stein’s algorithm implementation with BIST using Spartan 6.

• load is used to load the values into the registers.

• clk is used for synchronization.

• reset is used to reset the values.

Figure 9. RTL View of 8-bit Stein’s Algorithm Implementation with BIST

Table IV below shows the device utilization summary of the 8-bit Stein’s algorithm implementation using Spartan 6. A comparison of Table II and Table IV shows that the Stein’s algorithm implementation compares favorably with the Euclid’s algorithm implementation on these parameters.

TABLE IV. DEVICE UTILIZATION SUMMARY OF 8-BIT STEIN’S ALGORITHM IMPLEMENTATION USING SPARTAN 6

Slice Logic Utilization | Used | Available | Utilization
Number of Slice Registers | 32 | 18224 | 1%
Number of Slice LUTs | 39 | 9112 | 1%
Number of Occupied Slices | 14 | 2278 | 1%
Number of Bonded IOBs | 22 | 232 | 9%
Number of BUF/BUFG MUXs | 1 | 16 | 6%
Number of OLOGIC2/OSERDES2s | 0 | 248 | 0%
Average Fanout of Non-Clock Nets | 3.98 | - | -

Table V below shows the power summary of the 8-bit Stein’s algorithm implementation using Spartan 6. Comparing the power dissipation of the 8-bit Stein’s algorithm implementation on Spartan 3 and Spartan 6, the power dissipation using Spartan 3 was 37.23 mW (0.03723 W), as shown in Fig. 10 below, whereas using Spartan 6 it is 0.020 W (20 mW), as indicated in Table V below. Thus, Spartan 6 is more advantageous than Spartan 3.

Figure 10. Power Summary Snapshot of 8-bit Stein’s Algorithm implementation using Spartan 3

Also, Stein’s algorithm is better than Euclid’s algorithm for finding the GCD of two non-negative integers: its implementation has a lower average fanout (3.98 for Stein’s versus 5.39 for Euclid’s) and needs fewer slice registers, slice LUTs, occupied slices and bonded IOBs than the Euclid’s implementation (see Table II and Table IV).

TABLE V. POWER SUMMARY OF 8-BIT STEIN’S ALGORITHM IMPLEMENTATION USING SPARTAN 6

On-Chip | Power (W) | Used | Available | Utilization (%)
Clocks | 0.000 | 1 | - | -
Logic | 0.000 | 39 | 9112 | 0
Signals | 0.000 | 57 | - | -
IOs | 0.000 | 22 | 232 | 9
Leakage | 0.020 | - | - | -
Total | 0.020 | - | - | -
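The relative savings implied by the reported totals can be computed directly. The short Python calculation below uses only the figures quoted in this section (92 mW and 37.23 mW on Spartan 3, 20 mW on Spartan 6); the percentages it prints are derived arithmetic, not additional measurements, and the names reported_mw and reduction_percent are illustrative.

```python
# Total on-chip power in mW as reported in this section: (Spartan 3, Spartan 6)
reported_mw = {
    "Euclid's algorithm (8-bit, with BIST)": (92.0, 20.0),
    "Stein's algorithm (8-bit, with BIST)": (37.23, 20.0),
}


def reduction_percent(before: float, after: float) -> float:
    """Percentage reduction when power drops from `before` to `after`."""
    return 100.0 * (before - after) / before


for design, (spartan3, spartan6) in reported_mw.items():
    print(f"{design}: {spartan3} mW -> {spartan6} mW, "
          f"{reduction_percent(spartan3, spartan6):.1f}% lower")
```

This gives about 78.3% lower power for the Euclid’s implementation and about 46.3% for the Stein’s implementation. Both Spartan 6 totals consist entirely of the 0.020 W leakage entry in Tables III and V, with the dynamic entries reported as 0.000 W.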

REFERENCES

[1] Prathyusha Nayineni, S. K. Masthan, “Power optimization of BIST circuit using low power LFSR”, International Journal of Computer Trends and Technology, ISSN 2231-2803, vol. 2, issue 2, 2011, pp. 5-8.

[2] R. S. Oliveira, J. Semiao, I. C. Teixeira, M. B. Santos, J. P. Teixeira, “On-line BIST for Performance Failure Prediction under Aging Effects in Automotive Safety-Critical Applications”, Test Workshop (LATW), 978-1-4577-0/11/2011 IEEE.

[3] Massoud Shadfar, Zainalabedin Navabi, “BIST Modeling and its application in design verification”, VHDL Users Group / VHDL International Users Forum (VIUF) Proceedings, Spring '95, 1995, pp. 4.18-4.21.

[4] Shikha Khurana, Kanika Kaur, “Implementation of ALU using FPGA”, International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), ISSN 2278-6856, vol. 1, issue 2, July-August 2012, pp. 146-149.

[5] Prashanth B. U. V., P. Anil Kumar, G. Sreenivasulu, “Design & Implementation of Floating point ALU on a FPGA Processor”, International Conference on Computing, Electronics and Electrical Technologies (ICCEET), 2012, pp. 772-776.

[6] Navdeep Kaur, Neeru Malhotra, Balwinder Singh, “VHDL Implementation of ALU with Built In Self Test Technique”, International Journal of Engineering Research and Development, vol. 5, issue 1, November 2012, pp. 14-17.

[7] Douglas Densmore, John P. Hayes, “Built-In-Self Test (BIST) Implementations - An overview of design tradeoffs”, EECS 579 - Digital Systems Testing, 2001, pp. 1-24.

[8] Rekha Devi, Jaget Singh, Mandeep Singh, “VHDL Implementation of GCD Processor with Built in Self Test Feature”, International Journal of Computer Applications (0975-8887), vol. 25, no. 2, July 2011, pp. 50-54.

[9] Jamuna S. and V. K. Agrawal, “VHDL Implementation of BIST Controller”, Proc. Int. Conf. on Advances in Recent Technologies in Communication and Computing, 2011.

[10] Zhu Hongyu and Li Huiyun, “A BIST Scheme to Test Static Parameters of ADCs”, IEEE Symposium on Electrical & Electronics Engineering (EEESYM), 2012.

[11] P. Udaya, “Euclid's Algorithm and LFSR synthesis”, ISIT 2000, Sorrento, Italy, June 25-30, 2000, IEEE.

[12] Haroon Altarawneh, “A Comparison of Several Greatest Common Divisor (GCD) Algorithms”, International Journal of Computer Applications (0975-8887), vol. 26, no. 5, July 2011, pp. 24-31.

[13] George Purdy, Carla Purdy, and Kiran Vedantam, “Two binary algorithms for calculating the Jacobi symbol and a fast systolic implementation in hardware”, MWSCAS '06, 49th IEEE International Midwest Symposium on Circuits and Systems, 2006.

[14] K. Kobayashi, N. Takagi, K. Takagi, “Fast inversion algorithm in GF(2^m) suitable for implementation with a polynomial multiply instruction on GF(2)”, IET Computers & Digital Techniques (ISSN 1751-8601), The Institution of Engineering and Technology, 2012, pp. 180-185.
