Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Tor Aamodt and Paul Chow
University of Toronto
{ aamodt, pc }@eecg.utoronto.ca

3rd ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Nov. 17-18th, 2000, San Jose CA

Slide 2 of 32: What is this presentation about?
FOCUS: Signal processing applications developed using high-level language representation and floating-point data types...
WANT: Faster fixed-point software development...
QUESTION: Are there "better" fixed-point DSP instruction-sets in terms of runtime, power, or roundoff-noise performance?

Slide 3 of 32: Presentation Outline
• Motivation & Background
• Focus on…
    • Automatic Conversion to Fixed-Point
    • Architectural Enhancements
    • Some Experimental Results
• Summary / Future Directions

Slide 4 of 32: Motivation
80% of DSPs in use are fixed-point. Why? Because fixed-point hardware is cheaper and uses less power… however, it is much harder to develop signal-processing software for.

Slide 5 of 32: Background
• UTDSP Project: DSP Compiler/Architecture Co-design
• Traditional DSP architectures are hard for compilers to generate efficient code for… e.g. extended-precision accumulators
• First Generation Silicon, Sept. 30, 1999: 108 pin PGA, 0.35 µm CMOS / 63 MHz (Sean Peng's M.A.Sc.)
• 16-bit Fixed-Point VLIW DSP with novel 2-level instruction fetching architecture (reduced pin-count)
• June 2000: Synopsys CoCentric Fixed-Point Designer Tool
• First commercial tool for transforming floating-point ANSI C programs into fixed-point ($20,000 US)

Slide 6 of 32: Background: Fixed-Point versus Floating-Point
32 bit Floating-Point (IEEE), explicit binary-point:
    sign bit | 8 bit exponent (excess 127) | 23+1 bit normalized mantissa
Fixed-Point, implied binary-point:
    sign bit | Integer Part | Fractional Part

Slide 7 of 32: Background: Using Fixed-Point Arithmetic
Floating-Point:  y[n] = y[n-1] + x[n]
Fixed-Point:     y[n] = (( • y[n-1] >> 3) + x[n]) << 1
The shifts (>> 3 and << 1) are Explicit Scaling Operations.

Slide 8 of 32: Automatic Conversion Process
Traditional Optimizing Compiler:
    Input Program → Parser → Optimizer → Code Generator → Processor
• CONSTRAINT: Input/Output Invariance
• GOAL: Application Speedup
i.e. make code faster, but do not break anything!!!

Slide 9 of 32: Automatic Conversion Process
Traditional Optimizing Compiler:
[Figure: the compiler flow from the previous slide (Input Program, Parser, Optimizer, Code Generator, Processor) with two added blocks: Sample Inputs and the Floating-Point to Fixed-Point Translator.]
• "RELAX" CONSTRAINTS…
• GOALS:
    • "Good" Input/Output Fidelity (e.g. good signal-to-noise ratio)
    • Fast/Low-Power Operation (10-500× faster than FP emulation)

Slide 10 of 32: Floating-Point to Fixed-Point Translation
Floating-point source:
    float a, b, x[N];
    y = a*x[i] + b*x[i+1];
Fixed-point result:
    int a, b, x[N];
    y = (a•x[i] >> 2) + b•x[i+1];
1. Type Conversion
2. Scaling Operations
3. Fractional Fixed-Point Operations

Slide 11 of 32: Floating-Point to Fixed-Point Translator
[Figure: translator block diagram. Blocks: SUIF Parser*, Optimizer, Identifier Assignment, Fixed-Point Conversion, Instrument Code, Sample Inputs, Profile.]
*SUIF = Stanford University Intermediate Format. See: http://suif.stanford.edu

Slide 12 of 32: Collecting Dynamic Range Information
Consider the ANSI C code:
    float a, b, x[N];
    y = a*x[i] + b*x[i+1];
Equivalent Expression Tree:
    + (root), with children * (a, x[i]) and * (b, x[i+1])
ID Assignment:
    "0" : y
    "1" : tmp_1
    "2" : tmp_2
Code Instrumentation:
    tmp_1 = a*x[i];     profile(tmp_1,1);
    tmp_2 = b*x[i+1];   profile(tmp_2,2);
    y = tmp_1 + tmp_2;  profile(y,0);

Slide 13 of 32: Generating Scaling Operations
Signal Scaling: Integer Word Length (IWL)
• definition: IWL[x] = floor(log2(max|x|)) + 1
[Figure: fixed-point word layout. Sign bit | Integer Part (IWL bits) | Fractional Part.]

Slide 14 of 32: Generating Scaling Operations
Example: "A op B"
[Figure: expression tree for "A op B". The converted sub-expressions A and B each carry a measured and a current IWL (IWL_A^measured, IWL_A^current, IWL_B^measured, IWL_B^current). The op node carries IWL_{A op B}^measured; its current value, IWL_{A op B}^current = ?, is what must be determined.]

Slide 15 of 32: Automatic Conversion Process: IRP: Using Intermediate Result Profile Data
Previous Algorithms:
• 'Worst-Case Evaluation': Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997. (a.k.a. predecessor to the Synopsys CoCentric Fixed-Point Designer Tool)
• A 'Statistical' Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
Neither uses Intermediate Result Profile data; instead, they combine range information from leaf nodes.
• Is Useful Information Lost?

Slide 16 of 32: IRP: Additive Operations
For example, assume |A| > |B| and IWL_{A+B}^measured ≤ IWL_A^measured:
    "A ± B"  →  "(A << n_A) ± (B >> [n - n_B])"
where:
    n_A = IWL_A^current - IWL_A^measured
    n_B = IWL_B^current - IWL_B^measured
    n   = IWL_A^measured - IWL_B^measured
    IWL_{A+B}^current = IWL_A^measured
[Figure: word alignment of A and B, with B shifted right by n relative to A.]

Slide 17 of 32: IRP: Multiplication
    "A • B"  →  "(A << n_A) • (B << n_B)"
where:
    n_A = IWL_A^current - IWL_A^measured
    n_B = IWL_B^current - IWL_B^measured
    IWL_{A•B}^current = IWL_A^measured + IWL_B^measured

Slide 18 of 32: IRP: Division
    "A / B"  →  "(A >> [n_dividend - n_A]) / (B << n_B)"
where:
    n_A = IWL_A^current - IWL_A^measured
    n_B = IWL_B^current - IWL_B^measured
    n_diff = IWL_{A/B}^measured - IWL_A^measured + IWL_B^measured
    n_dividend = n_diff, if n_diff ≥ 0; 0, otherwise

Slide 19 of 32: IRP-SA: Using 'Shift Absorption'
Example:
    y = (a*x[i] + (b*x[i+1]>>1)) << 1
Question: Is information discarded unnecessarily here?
Consider the following alternative:
    y = (a*x[i]<<1) + b*x[i+1]
BUT: Can we really discard most significant bits and get roughly the same answer? YES!

Slide 20 of 32: Architectural Support
Common occurrence (using IRP-SA): Fractional Multiplication with internal Left Shift
    A•B << n
[Figure: operand A with IWL_A, operand B with IWL_B, and the product A*B with IWL_A + IWL_B, shifted left by n.]

Slide 21 of 32: Experimental Results
Benchmarks:
• 4th Order Cascaded/Parallel IIR Filter (IIR-C, IIR-P)
• (Normalized) Lattice Filter (LAT, NLAT)
• 128-Point Radix 2 Decimation in Time FFT (FFT-NR, FFT-MW)
• Levinson-Durbin Recursion (LEVDUR)
• 10x10 Matrix-Multiply (MMUL10)
• Nonlinear Control (INVPEND)
• Trig Function (SIN)

Slide 22 of 32: SQNR Enhancement: FMLS and/or IRP-SA
[Figure: bar chart. Y axis: Equivalent Bits (-0.5 to 2). Series: IRP-SA, FMLS, IRP-SA w/ FMLS. Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, INVPEND, LEVDUR, MMUL10, SIN.]

Slide 23 of 32: What Is The Effect of "Shift Absorption"?
[Figure: Distribution of Fractional Multiply Output Shifts. Y axis: Relative Frequency (0 to 0.8). X axis: FMLS Output Shift Distance (3 left, 2 left, 1 left, none, 1 right). Series: IRP, IRP-SA.]

Slide 24 of 32: Experimental Results: Rotational Inverted Pendulum
U of T System Control Group Non-linear Testbench

Slide 25 of 32: Closed-Loop System Response: Rotational Inverted Pendulum
12-bit Controller Comparison:
    WC:              32.8 dB
    IRP-SA:          41.1 dB
    IRP-SA w/ FMLS:  48.0 dB

Slide 26 of 32: 128-Point Radix-2 FFT
(Generated by MATLAB Real-Time Workshop)

Slide 27 of 32: Speedup?
Rotational Inverted Pendulum: Fractional Multiply Output Shift Relative Frequencies

Slide 28 of 32: …Yup!

Slide 29 of 32: Speedup* Using FMLS
[Figure: bar chart. Y axis: Relative Speedup (1 to 1.4). Series: Limiting; 8-FMUL = { 4 left thru 3 right }; 4-FMUL = { 2 left thru 1 right }; 2-FMUL = { one left, no shift }. Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, SIN.]

Slide 30 of 32: SQNR Enhancement for various Output Shift Sets
[Figure: bar chart. Y axis: Equivalent Bits (0 to 2). Series: Limiting, 8-FMUL, 4-FMUL, 2-FMUL. Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, SIN.]

Slide 31 of 32: Summary
• The Fractional Multiply with internal Left Shift (FMLS) operation can improve runtime and signal-to-noise performance.
• Speedups of up to 35% and SQNR enhancement equivalent to up to 2 bits, maybe even 4 bits (depending on how you choose to measure it).
• Easy VLSI implementation, and easy for the compiler to use.

Slide 32 of 32: Future Directions
Higher Level Transformations:
• Automatic Generation of Block-Floating-Point...
• Quantization Error Feedback…
• BOTH need signal-flow-graph representation… therefore probably need a better DSP language than ANSI C
Variable Precision Arithmetic (How much precision does each operation need?)
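
Supplementary sketch (not from the slides): the profiling step on the "Collecting Dynamic Range Information" slide and the IWL definition on the "Generating Scaling Operations" slide could be prototyped roughly as below. The names max_abs, measured_iwl, kernel, add_shifts and irp_add_shifts are hypothetical, and the IWL is taken to be floor(log2(max|x|)) + 1. The last function is one reading of the additive IRP rule, assuming n_B is taken against B's own current IWL; treat it as an illustration under those assumptions and not as the authors' implementation.

```c
#include <assert.h>

#define NUM_IDS 3   /* identifiers as on the slide: "0": y, "1": tmp_1, "2": tmp_2 */

static double max_abs[NUM_IDS];   /* largest |value| seen for each identifier */

/* Record one intermediate result under its identifier. */
void profile(double value, int id)
{
    double mag = value < 0.0 ? -value : value;
    assert(id >= 0 && id < NUM_IDS);
    if (mag > max_abs[id])
        max_abs[id] = mag;
}

/* Assumed definition: IWL = floor(log2(max|x|)) + 1. Computed by repeated
   halving and doubling so that no math library is needed. Requires that a
   nonzero value has been profiled for this identifier. */
int measured_iwl(int id)
{
    double m;
    int iwl = 0;
    assert(id >= 0 && id < NUM_IDS && max_abs[id] > 0.0);
    m = max_abs[id];
    while (m >= 1.0) { m /= 2.0; iwl++; }
    while (m < 0.5)  { m *= 2.0; iwl--; }
    return iwl;
}

/* Instrumented form of y = a*x[i] + b*x[i+1]. */
double kernel(double a, double b, const double *x, int i)
{
    double tmp_1, tmp_2, y;
    tmp_1 = a * x[i];      profile(tmp_1, 1);
    tmp_2 = b * x[i + 1];  profile(tmp_2, 2);
    y = tmp_1 + tmp_2;     profile(y, 0);
    return y;
}

/* Hypothetical container for the shifts applied to "A + B". */
typedef struct { int a_left; int b_right; int result_iwl; } add_shifts;

/* Additive IRP rule, valid when |A| > |B| and the sum's measured IWL does
   not exceed A's measured IWL: (A << n_A) + (B >> [n - n_B]). */
add_shifts irp_add_shifts(int a_cur, int a_meas, int b_cur, int b_meas)
{
    add_shifts s;
    int n_a = a_cur - a_meas;   /* redundant integer bits currently in A */
    int n_b = b_cur - b_meas;   /* redundant integer bits currently in B */
    int n   = a_meas - b_meas;  /* binary-point alignment between A and B */
    s.a_left = n_a;
    s.b_right = n - n_b;
    s.result_iwl = a_meas;
    return s;
}
```

With x = {0.5, 3.0}, a = 1.5 and b = 2.0, one call to kernel profiles tmp_1 = 0.75, tmp_2 = 6.0 and y = 6.75, so the measured IWLs are 0, 3 and 3 respectively.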

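Supplementary sketch (not from the slides): the Fractional Multiply with internal Left Shift (FMLS) described on the "Architectural Support" slide can be modelled in C for the 16-bit word size mentioned in the Background slide. The function names fmul and fmls, the Q15 operand format, and the 0 to 3 shift range are assumptions made for illustration; the slides do not specify the instruction at this level of detail.

```c
#include <assert.h>
#include <stdint.h>

/* Plain fractional multiply: Q15 x Q15 gives a Q30 product, and the result
   keeps the 16 bits just below the redundant sign bit.
   Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but is what mainstream compilers do. */
int16_t fmul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* full-width Q30 product */
    return (int16_t)(p >> 15);
}

/* FMLS model: shift the full-width product left by n before truncating to
   16 bits. Shifting right by (15 - n) selects the same bits without
   left-shifting a signed value. The caller is responsible for choosing n
   so that only redundant sign bits are discarded at the top. */
int16_t fmls(int16_t a, int16_t b, int n)
{
    int32_t p = (int32_t)a * (int32_t)b;
    assert(n >= 0 && n <= 3);              /* assumed shift range */
    return (int16_t)(p >> (15 - n));
}
```

For a = b = 3277 (about 0.1 in Q15), fmul returns 327, while fmls with n = 3 returns 2621. The second result carries three more fractional bits of the product, which is the precision a separate left shift applied after truncation would have lost.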