2. Field Programmable Gate Array Prototyping of End

advertisement
Field Programmable Gate Array Prototyping of End-Around
Carry Parallel Prefix Tree Architectures.
ABSTRACT:
As an important part of many processors’ floating point unit, fused multiply-add unit
performs a multiplication followed immediately by an addition. In IBM POWER6
microprocessor’s fused multiply-add unit, a fast 128-bit floating-point end-around-carry (EAC)
adder is proposed. Very few algorithmic details exist in today’s literature about this adder. In
this study, a complete designed EAC adder that can work independently as a regular adder is
proposed. Details about the proposed EAC adder’s arithmetic algorithms are described. In IBM’s
original EAC adder, the Kogge–Stone tree has been chosen for its high performance on ASIC
technology. In this study, the authors present a comparative study on different parallel prefix
trees which are used in the design of our new EAC adder targeting field programmable gate
array (FPGA) technology. Our study highlights the main performance differences among 14
different architecture configurations focusing on the area requirements and the critical path
delay. The experimental results show that there is one architecture configuration with the
lower area requirement and the higher performance.
Key-Words: EAC, Parallel prefix, Kogge-stone
INTRODUCTION:
Fused multiply-add unit plays an important role in modern microprocessor. It performs floatingpoint multiplication followed immediately by an addition of the product with a third floatingpoint operand. In 2007, a seven-cycle fused multiply-add pipeline unit was proposed as a part
of the floating-point unit in IBM’s POWER6 microprocessor. In this fused multiply-add dataflow,
the product should be aligned before it is added with the addend. Because the magnitude of
the product is unknown in the early stages prior to the combination with the addend, it is
difficult to determine a priori which operand is bigger. Even if it was determined early that the
product was bigger, there would be a problem on conditionally complementing two
intermediate operands, the carry and sum outputs of the counter tree. Thus, an adder needs to
be designed to always output a positive magnitude result and preferably only needs to
conditionally complement one operand. Therefore a new 128-bit end-around carry (EAC) adder
was designed and fabricated in IBM’s fused multiply-add unit. The intention is not to produce
an adder with the best stand-alone performance but to provide the one with the best overall
floating-point performance.
VEDLABS, #112, Oxford Towers, Old airport Road, Kodihalli, Bangalore-08
Page 1
BLOCK DIAGRAM:
Fig: Architecture of modified EAC adder
Fig. shows the architecture of the proposed EAC adder. In this adder, the inputs are two 129-bit
binary addends
sum
magnitudes of
and the output is the
They are all in sign magnitude format.
are the
are the corresponding sign bits. The magnitudes of
VEDLABS, #112, Oxford Towers, Old airport Road, Kodihalli, Bangalore-08
Page 2
operands are used to produce the positive magnitude of the sum and the sign bits of operands
are used to produce the sign of the sum.
HARDWARE AND SOFTWARE REQUIREMENTS:
Software Requirement Specification:

Operating System: Windows XP with SP2

Synthesis Tool: Xilinx 12.2.

Simulation Tool: Modelsim6.3c.
Hardware Requirement specification:

Minimum Intel Pentium IV Processor

Primary memory: 2 GB RAM,

Spartan III FPGA

Xilinx Spartan III FPGA development board

JTAG cable, Power supply
REFERENCES:
1] CURRAN B., MCCREDIE B., SIQAL L., ET AL.: ‘4GHz+ low-latency fixed-point and binary floating-point
execution units for the POWER 6 processor’. Digest of 2006 IEEE Int. Solid-State Circuits Conf., 2006,
pp. 1728–1734
[2] SCHWARZ E.M.: ‘Binary floating-point unit design’, in U.S.S. (ED.): ‘High performance energy efficient
microprocessor design’ (Springer, 2006), pp. 189–208
[3] YU X.Y., FLEISCHER B., CHAN Y.H., ET AL.: ‘A 5 GHz+ 128-bit binary floating-point adder for the POWER 6
processor’. Proc. Int. Conf. 32nd European Solid-State Circuits, 2006, pp. 166–169
[4] LEOBANDUNG D.M.E., NAYAKAMA H., ET AL.: ‘High performance 65 nm SOI technology with dual stress liner
and low capacitance sram cell’. Digest of 2005 Symp. on VLSI Technology, 2005
[5] KOGGE P.M., STONE H.S.: ‘A parallel algorithm for the efficient solution of a general class of recurrence
equations’, IEEE Trans. Comput., 1973, 22, (8), pp. 786–793
[6] LADNER R., FISCHER M.: ‘Parallel prefix computation’, J. ACM, 1980, 27, (4), pp. 831–838
VEDLABS, #112, Oxford Towers, Old airport Road, Kodihalli, Bangalore-08
Page 3
Download