By: Mehrnaz Monajati
Instructor: Dr. S.M. Fakhrai

This is a class presentation. All data are copyrighted by their respective authors as listed in the references and have been used here for educational purposes only.

Fixed vs. Floating Point DSPs
• Cost
• Ease of use
• Accuracy
• Dynamic range

Fixed vs. Floating Point DSPs
• Cost
  • Today, fixed-point DSPs continue to benefit more from manufacturing economies of scale, since they are more often used in high-volume applications.
  • The same reductions will apply to floating-point DSPs when high-volume demand for those devices appears.
  • Cost has increasingly become an issue of SoC integration and volume, rather than a result of the size of the DSP core itself.

Fixed vs. Floating Point DSPs
• Ease of use
  Formerly:
  • TI floating-point DSPs supported the C language, while FXP DSPs were programmed at the assembly code level, so programming was easier in FLP.
  • Real arithmetic was coded directly into hardware in FLP, but only indirectly in FXP, through software routines that added development time and extra instructions to the algorithm.
  Today:
  • TI fixed-point DSPs have long been supported by outstandingly efficient C compilers, which has reduced FXP complexity.
  • The advantage of implementing real arithmetic directly in floating-point hardware still remains.
  • FXP DSPs still have an edge in cost and FLP DSPs in ease of use, but the edge has narrowed.

Fixed vs. Floating Point DSPs
• Accuracy
• Dynamic range
  • The accuracy of FLP is greater than that of FXP.
  • FLP has greater precision in integer as well as real values.
  • Exponentiation vastly increases the dynamic range.
  • Internal data representations in FLP DSPs are more exact than in FXP, ensuring greater accuracy in the end result.

Fixed vs. Floating Point DSPs
FXP DSPs
• TI's TMS320C62x™ FXP DSPs have two data paths operating in parallel, each with a 16-bit word width that provides signed integer values in the range –2^15 to 2^15 – 1.
• TMS320C64x™ DSPs double the overall throughput with four 16-bit multipliers.
• TMS320C5x™ and TMS320C2x™ DSPs, designed for handheld and control applications respectively, are based on single 16-bit data paths.

Fixed vs. Floating Point DSPs
FLP DSPs
• TMS320C67x™ FLP DSPs divide a 32-bit data path into two parts: a 24-bit mantissa and an 8-bit exponent.
• The mantissa gives a 16M (2^24) range of precision, and the exponent supports a vastly greater dynamic range than is available with the FXP format.
• The C67x™ DSP can also perform calculations using industry-standard double-width precision: 64 bits, including a 53-bit mantissa and an 11-bit exponent.
• Double precision achieves much greater precision and dynamic range at the expense of speed, since it requires multiple cycles for each operation.

Standards for FLP Number Formats

FLP Number Formats

Sample Floating Point DSPs
• AMD - Athlon Processor
• Xilinx – Virtex-5 APU Floating Point Unit
• Digital Core Design – DFPAU ver 2.05

AMD - Athlon Processor, 2000
• Includes the most powerful floating point engine for x86 platforms
• Delivers twice the peak x87 floating point execution rate of the Intel Pentium® III processor
• Rivals the FP performance of many RISC processors of that time
• Superscalar and superpipelined
  • Higher clock frequencies
  • Higher overall throughput
Ref. [3]

AMD - Athlon Processor, 2000
Ref. [3]

Xilinx – Virtex-5 APU FLP Unit, 2009
• Designed for the PowerPC® 440 embedded microprocessor of the Virtex-5 FXT FPGA family
• Supports the IEEE-754 standard in single or double precision
• Optimized for 2:1 and 3:1 APU:CPU clock ratios, allowing the PowerPC processor to operate at maximum frequency
• Applications:
  • Digital signal processing of high-quality audio or video signals, where a very large dynamic range is needed to retain fidelity
  • Matrix inversion in wireless communications and radar
  • Digital signal processing tasks and spectral methods such as the FFT
  • Statistical processing, where floating-point is often the simplest way to avoid integer overflow and rounding errors

Xilinx – Virtex-5 APU FLP Unit, 2009
• Increased processing capacity
  • Hardware floating-point operations complete faster than the equivalent software emulation routines.
  • The floating-point operators within the FPU are pipelined, so multiple floating-point calculations can proceed in parallel.
  • The FPU is autonomous: the PowerPC processor's internal pipeline can continue to execute integer instructions while floating-point operations are handled by the FPU in parallel.
• IEEE 754-1985 / Book-E standard compatibility
  • The standard represents very small numbers by allowing significands of the form "0.x" (denormalized numbers) in addition to the usual "1.x" used by normalized FLP numbers.
  • In Book-E, the multiply part of a multiply-add operation should not round its result before supplying it to the addition part.
  • The FPU treats all not-a-number (NaN) values as quiet NaNs, which do not cause exceptions.
  • When a floating-point operation results in a NaN because one of the inputs was a NaN, the input NaN is not propagated to the output; the default quiet NaN value is provided instead. This value is 0x7ff8000000000000 in double precision and 0x7fc00000 in single precision.

Xilinx – Virtex-5 APU FLP Unit
Ref. [4]

Digital Core Design – DFPAU ver. 2.05, 2010
• A FLP arithmetic coprocessor
• Directly replaces C software functions with equivalent, very fast hardware operations that significantly accelerate system performance
• Doesn't require any programming: everything is done automatically during software compilation by the DFPAU C driver
• Supports addition, subtraction, multiplication, division, square root, comparison, and absolute value
• The input number format follows IEEE-754
• Each floating point function can be turned on or off at configuration level, which makes the DFPAU module flexibly scalable
• Technology-independent design

Digital Core Design – DFPAU ver. 2.05, 2010
Ref. [5]

Architectural Modification to Improve FLP Unit in FPGAs – 2008 [1]
• Variable-length shifters account for over 30% of an adder and 25% of a multiplier.

                        Coarse-grained approach   Fine-grained approach
  Embedded block        Embedded shifter          4:1 multiplexer
  Consumed chip area    1.5%                      0.48%
  Saved area            14.6%                     7.3%
  Increased clock rate  3.3%                      11.6%

Low power FLP Unit – 2009 [2]
• Targets embedded systems applications that need low power consumption and fast processing of basic operations such as addition, subtraction, multiplication, and division
• Idea: the functional units (adder, shifter, registers) are shared between different operations
• Advantage: saves silicon area
• Disadvantage: increases the number of cycles required to perform an operation

Low power FLP Unit – 2009
Ref. [2]

Reconfigurable FLP Unit – 2009 [7]
• Non-numerical applications usually have very few FLP operations, so the FLP unit is idle most of the time.
• In idle mode, the floating-point unit still consumes power and its die area is wasted.
• Idea: a reconfigurable floating-point unit that provides both integer and floating-point operations

Reconfigurable FLP Unit
rAMM Array
Ref. [7]

Reconfigurable FLP Unit
Ref. [7]

References
1. M. Beauchamp, et al., "Architectural modifications to enhance the floating-point performance of FPGAs," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, p. 177, 2008.
2. R. Neves, et al., "A Floating Point Unit Architecture for Low Power Embedded Systems Applications," XXIV SIM - South Symposium on Microelectronics, 2009.
3. AMD Athlon Floating Point Engine, "AMD Athlon Processor Floating Point Capability, The Most Powerful, Architecturally Advanced Floating Point Engine Ever Delivered in an x86 Microprocessor," white paper, 2000.
4. Xilinx, Virtex-5 APU Floating-Point Unit v1.01a, Data Sheet DS693, 2009.
5. DFPAU floating-point pipelined divider, 2010, <http://www.altera.com>.
6. G. Frantz and R. Simar, "Comparing Fixed and Floating Point DSPs," SPRY061, Texas Instruments, 2004.
7. Y. Lee and J. Jou, "Design of A Reconfigurable Floating-Point Unit," 2009.

Embedded shifter block diagram
Ref. [1]

4:1 Multiplexer
Ref. [1]