DSP Floating Point Formats

By: Mehrnaz Monajati
Instructor: Dr. S.M. Fakhrai
This is a class presentation. All material is copyrighted by the
respective authors listed in the references and is used here for
educational purposes only.
Fixed vs. Floating Point DSPs
• Cost
• Ease of use
• Accuracy
• Dynamic range
Fixed vs. Floating Point DSPs
• Cost
 Today, fixed-point DSPs continue to benefit more from manufacturing
economies of scale, since they are more often used in high-volume
applications
 The same cost reductions will apply to floating-point DSPs when
high-volume demand for those devices appears
 Today, cost has increasingly become an issue of SoC integration and
volume, rather than a result of the size of the DSP core itself
Fixed vs. Floating Point DSPs
• Ease of use
In the past
 TI floating-point DSPs supported the C language
 FXP DSPs were programmed at the assembly code level
 Coding of real arithmetic into hardware
   Directly in FLP
   Indirectly in FXP, through software routines that added
    development time and extra instructions to the algorithm
 Programming
   Easier in FLP
Today
 TI fixed-point DSPs have long been supported by efficient C compilers
 The advantage of implementing real arithmetic directly in
floating-point hardware still remains
 Reduction in FXP complexity
 FXP DSPs still have an edge in cost and FLP DSPs in ease of use, but
the edge has narrowed
Fixed vs. Floating Point DSPs
• Accuracy
• Dynamic range
• Accuracy of FLP is greater than that of FXP
 FLP offers greater precision for integer as well as real values
 The exponent vastly increases the dynamic range
 Internal data representations in FLP DSPs are more exact than in FXP,
ensuring greater accuracy in the end result
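The effect of the exponent on dynamic range can be estimated with a short calculation. The sketch below is illustrative only: it assumes dynamic range is defined as the ratio of the largest to the smallest nonzero magnitude, compares a 16-bit signed integer with an IEEE-754 single-precision value (normalized numbers only), and uses a helper name (`dynamic_range_db`) that is not from the references.

```python
import math

def dynamic_range_db(largest, smallest):
    """Ratio of largest to smallest nonzero magnitude, in decibels."""
    return 20.0 * math.log10(largest / smallest)

# 16-bit signed fixed point: magnitudes from 1 up to 2^15 - 1.
FIXED16_DB = dynamic_range_db(2**15 - 1, 1)

# IEEE-754 single precision, normalized values only (assumption):
# largest = (2 - 2^-23) * 2^127, smallest = 2^-126.
FLOAT32_MAX = (2 - 2.0**-23) * 2.0**127
FLOAT32_MIN_NORMAL = 2.0**-126
FLOAT32_DB = dynamic_range_db(FLOAT32_MAX, FLOAT32_MIN_NORMAL)

print(f"16-bit fixed point : {FIXED16_DB:7.1f} dB")
print(f"32-bit float       : {FLOAT32_DB:7.1f} dB")
```

Under these assumptions the fixed-point figure is roughly 6 dB per bit of word width, while the floating-point figure is dominated by the exponent range rather than by the mantissa width.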
Fixed vs. Floating Point DSPs
FXP DSPs
 TI’s TMS320C62x™ FXP DSPs
   Two data paths operating in parallel
   Each with a 16-bit word width
   Provides signed integer values in the range -2^15 to 2^15 - 1
 TMS320C64x™ DSPs
   Double the overall throughput with four 16-bit multipliers
 TMS320C5x™ and TMS320C2x™ DSPs
   Designed for handheld and control applications, respectively
   Based on single 16-bit data paths
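A 16-bit signed word holds only 65,536 distinct values, so results that leave the range must either wrap around or be clamped. The sketch below illustrates both behaviours for a two's-complement 16-bit word; the function names (`wrap16`, `saturate16`) are hypothetical, and it does not model any specific TI instruction.

```python
INT16_MIN = -(2**15)      # -32768
INT16_MAX = 2**15 - 1     #  32767

def wrap16(x):
    """Two's-complement wraparound to a 16-bit signed value."""
    return ((x + 2**15) % 2**16) - 2**15

def saturate16(x):
    """Clamp to the 16-bit signed range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, x))

total = 30000 + 10000     # 40000 does not fit in 16 bits
print(wrap16(total), saturate16(total))
```

Fixed-point code usually has to scale its data so that neither case occurs, which is part of the programming effort discussed earlier.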
Fixed vs. Floating Point DSPs
FLP DSPs
 TMS320C67x™ FLP DSPs
   Divide a 32-bit data path into two parts: a 24-bit mantissa and an
    8-bit exponent
   The mantissa gives a 16M (2^24) range of precision
   The exponent supports a vastly greater dynamic range than is
    available with the FXP format
 The C67x™ DSP can also perform calculations using industry-standard
double-width precision
   64 bits, including a 53-bit mantissa and an 11-bit exponent
   Achieves much greater precision and dynamic range at the expense of
    speed, since it requires multiple cycles for each operation
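The field split can be inspected directly. The sketch below assumes the IEEE-754 layouts (1 sign bit, 8 exponent bits, and 23 stored fraction bits for single precision; 1, 11, and 52 for double precision). The 24-bit and 53-bit mantissa figures count the implicit leading 1 that is not stored. The helper names are illustrative.

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent, stored fraction) of x as a float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def float64_fields(x):
    """Return (sign, biased exponent, stored fraction) of x as a float64."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def to_float32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(float32_fields(-2.5))        # sign 1, exponent 128, fraction 0x200000
print(float64_fields(1.0))         # sign 0, exponent 1023, fraction 0

# 24 bits of precision: 2^24 + 1 cannot be held in single precision.
print(to_float32(2.0**24 + 1) == 2.0**24)
```

The last line shows the "16M" precision limit: integers above 2^24 are no longer all representable in the 32-bit format, while the 64-bit format still holds them exactly.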
Standards for FLP Number Formats
FLP Number Formats
Sample Floating Point DSPs
 AMD - Athlon Processor
 Xilinx – Virtex-5 APU Floating Point Unit
 Digital Core Design – DFPAU ver 2.05
AMD - Athlon Processor 2000
 Includes what AMD described as the most powerful floating-point
engine for x86 platforms
 Delivers twice the peak x87 floating-point execution rate of the
Intel Pentium® III processor
 Rivals the FP performance of many RISC processors of that time
 Superscalar and superpipelined
   Higher clock frequencies
   Higher overall throughput
Ref. [3]
AMD - Athlon Processor 2000
Ref. [3]
Xilinx – Virtex-5 APU FLP Unit 2009
 Designed for the PowerPC® 440 embedded microprocessor of the
Virtex-5 FXT FPGA family
 Supports the IEEE-754 standard in single or double precision
 Optimized for 2:1 and 3:1 APU:CPU clock ratios
   Allows the PowerPC processor to operate at maximum frequency
Applications:
 Digital signal processing of high-quality audio or video signals, where
a very large dynamic range is needed to retain fidelity
 Matrix inversion in wireless communications and radar
 Digital signal processing tasks using spectral methods such as the FFT
 Statistical processing, where floating point is often the simplest way
to avoid integer overflow and rounding errors
Xilinx – Virtex-5 APU FLP Unit 2009
 Increased processing capacity
   Hardware floating-point operations complete faster than the
    equivalent software emulation routines
   The floating-point operators within the FPU are pipelined, so
    multiple floating-point calculations can proceed in parallel
   The FPU is autonomous: the PowerPC processor's internal pipeline
    can continue to execute integer instructions while floating-point
    operations are handled by the FPU in parallel
 IEEE 754-1985 / Book-E standard compatibility
   The IEEE 754 standard represents very small (denormalized) numbers
    by allowing significands of the form "0.x" in addition to the
    usual "1.x" used by normalized FLP numbers
   In Book-E, the multiply part of a multiply-add operation should not
    round its result before supplying it to the addition part
   The FPU treats all not-a-number (NaN) values as quiet NaNs, which
    do not cause exceptions. When a floating-point operation results
    in a NaN because one of the inputs was a NaN, the input NaN is not
    propagated to the output; the default quiet NaN value is provided
    instead. This value is 0x7ff8000000000000 in double precision and
    0x7fc00000 in single precision
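Bit patterns such as these can be checked against the IEEE-754 single-precision encoding, where an all-ones exponent with a zero fraction is infinity and with a nonzero fraction is NaN. The sketch below is a generic classifier, not a model of the Xilinx FPU. The function name is hypothetical, and treating the most significant fraction bit as the "quiet" flag is an assumption that matches common practice (IEEE 754-1985 left that choice to the implementation).

```python
import math
import struct

def classify_single(bits):
    """Classify a 32-bit pattern under the IEEE-754 single format."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        # Denormalized values have a significand of the form "0.x".
        return "zero" if fraction == 0 else "denormalized"
    if exponent != 0xFF:
        return "normalized"          # significand of the form "1.x"
    if fraction == 0:
        return "infinity"
    # Assumption: top fraction bit set means quiet, clear means signaling.
    return "quiet NaN" if fraction & 0x400000 else "signaling NaN"

for pattern in (0x7FC00000, 0x7F800000, 0x00000001, 0x3F800000):
    print(f"0x{pattern:08x}: {classify_single(pattern)}")

# The double-precision pattern decodes to NaN as well.
value = struct.unpack(">d", (0x7FF8000000000000).to_bytes(8, "big"))[0]
print(math.isnan(value))
```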
Xilinx – Virtex-5 APU FLP Unit
Ref. [4]
Digital Core Design – DFPAU ver. 2.05, 2010
 An FLP arithmetic coprocessor
   Directly replaces C software functions with equivalent, very fast
    hardware operations
   Significantly accelerates system performance
 Does not require any programming
   Everything is done automatically during software compilation by
    the DFPAU C driver
 Supports addition, subtraction, multiplication, division, square
root, comparison, and absolute value
 Input number format conforms to IEEE-754
 Each floating-point function can be turned on or off at the
configuration level
   Provides flexible scalability of the DFPAU module
 Technology-independent design
Digital Core Design – DFPAU ver. 2.05, 2010
Ref. [5]
Architectural Modification to Improve
FLP Unit in FPGAs – 2008 [1]
 Variable-length shifters account for over 30% of an adder and 25% of
a multiplier
 Coarse-grained approach
   Embedded shifter
 Fine-grained approach
   Multiplexer

                        Embedded shifter   4:1 multiplexer
Consumed chip area      1.5%               0.48%
Saved area              14.6%              7.3%
Increased clock rate    3.3%               11.6%
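To see why multiplexers matter for shifters, a variable-length shifter can be modelled as a cascade of 4:1 multiplexer stages, each selecting one of four fixed shift amounts. The sketch below is a behavioural illustration under that assumption. It is not the circuit from Ref. [1], and the names `mux4` and `barrel_shift_right` are hypothetical.

```python
def mux4(sel, a, b, c, d):
    """Behavioural 4:1 multiplexer: pick one of four inputs."""
    return (a, b, c, d)[sel]

def barrel_shift_right(value, shift, width=24):
    """Right-shift a width-bit value using cascaded 4:1 mux stages.

    Stage k consumes two bits of the shift amount and shifts by
    0, 1, 2 or 3 times 4**k positions.
    """
    value &= (1 << width) - 1
    if shift >= width:
        return 0                      # everything is shifted out
    stage = 0
    while (1 << (2 * stage)) < width:
        step = 1 << (2 * stage)       # 1, 4, 16, ...
        sel = (shift >> (2 * stage)) & 0b11
        value = mux4(sel,
                     value,
                     value >> step,
                     value >> (2 * step),
                     value >> (3 * step))
        stage += 1
    return value

print(hex(barrel_shift_right(0x800000, 5)))   # same as 0x800000 >> 5
```

For a 24-bit mantissa this model needs three 4:1 stages, where a design built from 2:1 multiplexers would need five, which gives an intuition for why multiplexer resources affect shifter area and speed.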
Low power FLP Unit – 2009 [2]
 Targets embedded-system applications that need low power consumption
and fast processing
 Performs basic operations such as addition, subtraction,
multiplication, and division
 Idea:
   The functional units (adder, shifter, registers) are shared
    between different operations
   Advantage: saves silicon area
   Disadvantage: increases the number of cycles required to perform
    an operation
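The area-versus-cycles trade-off can be illustrated with a mantissa multiplier that reuses a single adder and shifter instead of a full multiplier array. The sketch below is a generic shift-and-add model under that assumption. It is not the architecture of Ref. [2], and the function name and the one-cycle-per-bit cost model are hypothetical.

```python
def shared_adder_multiply(a, b, width=24):
    """Multiply two width-bit mantissas with one shared adder.

    Returns (product, cycles). Each loop iteration models one clock
    cycle in which the single adder and shifter are used once.
    """
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    accumulator = 0
    cycles = 0
    for bit in range(width):
        if (b >> bit) & 1:
            accumulator += a << bit   # the one shared adder
        cycles += 1
    return accumulator, cycles

product, cycles = shared_adder_multiply(0xC00000, 0xA00000)
print(hex(product), cycles)
```

In this model the same adder could also serve addition and subtraction, which saves area, but a multiplication now takes as many cycles as the mantissa has bits.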
Low power FLP Unit - 2009
Ref. [2]
Low power FLP Unit - 2009
Ref. [2]
Reconfigurable FLP Unit – 2009 [7]
 Non-numerical applications usually have very few FLP operations
   The FLP unit is idle most of the time
   Even when idle, the floating-point unit still consumes power and
    its die area is wasted
 Idea:
   A reconfigurable floating-point unit that provides both integer
    and floating-point operations
Reconfigurable FLP Unit
rAMM Array
Ref. [7]
Reconfigurable FLP Unit
Ref. [7]
Reconfigurable FLP Unit
Ref. [7]
References
1. M. Beauchamp, et al., "Architectural modifications to enhance the
   floating-point performance of FPGAs," IEEE Transactions on Very Large
   Scale Integration Systems, vol. 16, p. 177, 2008.
2. R. Neves, et al., "A Floating Point Unit Architecture for Low Power
   Embedded Systems Applications," XXIV SIM - South Symposium on
   Microelectronics, 2009.
3. AMD Athlon Floating Point Engine, "AMD Athlon Processor Floating Point
   Capability, The Most Powerful, Architecturally Advanced Floating Point
   Engine Ever Delivered in an x86 Microprocessor," white paper, 2000.
4. Xilinx DS693 Virtex-5 APU Floating-Point Unit v1.01a, Data Sheet,
   DS693, 2009.
5. DFPAU floating-point pipelined divider, 2010, <http://www.altera.com>.
6. G. Frantz and R. Simar, "Comparing Fixed and Floating Point DSPs,"
   SPRY061, Texas Instruments, 2004.
7. Y. Lee and J. Jou, "Design of A Reconfigurable Floating-Point Unit,"
   2009.
Embedded shifter block diagram
Ref. [1]
4:1 Multiplexer
Ref. [1]