Embedded ISA Support for Enhanced Floating-Point
to Fixed-Point ANSI C Compilation
Tor Aamodt and Paul Chow
University of Toronto
{ aamodt, pc }@eecg.utoronto.ca
3rd ACM International Conference on Compilers, Architectures and Synthesis for
Embedded Systems, Nov. 17-18th, 2000, San Jose CA
What is this presentation about?
FOCUS: Signal processing applications developed
using high-level language representation and
floating-point data types...
WANT: Faster fixed-point software development...
QUESTION: Are there “better” fixed-point DSP
instruction-sets in terms of runtime, power, or
roundoff-noise performance?
Tor Aamodt & Paul Chow
University of Toronto
Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation
Slide 2 of 32
Presentation Outline
Motivation & Background
Focus on…
• Automatic Conversion to Fixed-Point
• Architectural Enhancements
• Some Experimental Results
Summary / Future Directions
Slide 3 of 32
Motivation
80% of DSPs in use are Fixed-Point. Why?
Because fixed-point hardware is cheaper and
uses less power …
… however, it is much harder to develop
signal-processing software for.
Slide 4 of 32
Background
• UTDSP Project: DSP Compiler/Architecture Co-design
• Traditional DSP architectures are hard for compilers to generate
efficient code for, e.g. extended-precision accumulators
• First Generation Silicon, Sept. 30, 1999: 108-pin PGA, 0.35 µm
CMOS, 63 MHz (Sean Peng’s M.A.Sc.)
• 16-bit Fixed-Point VLIW DSP with novel 2-level instruction
fetching architecture (reduced pin count)
• June 2000: Synopsys CoCentric Fixed-Point Designer Tool
• First commercial tool for transforming floating-point ANSI C
programs into fixed-point ($20,000 US)
Slide 5 of 32
Background: Fixed-Point versus Floating-Point
32-bit Floating-Point (IEEE): sign bit | 8-bit exponent (excess 127) | 23+1 bit normalized mantissa (explicit binary point)
Fixed-Point: sign bit | Integer Part | Fractional Part (implied binary point)
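To make the implied binary point concrete, here is a minimal sketch that converts a real value to a 16-bit fixed-point word and back. The 16-bit word matches the UTDSP described earlier; the function names, the `int_bits` parameter (integer bits excluding the sign bit), and the round-and-saturate policy are assumptions made for this illustration, not something the slides specify.

```c
#include <stdint.h>

#define WORD_BITS 16  /* assumed word: 1 sign bit + integer bits + fractional bits */

/* Convert a real value to fixed point with `int_bits` integer bits
   (0 <= int_bits <= 15). The binary point is implied: the word simply
   stores round(v * 2^frac_bits). */
int16_t float_to_fixed(double v, int int_bits)
{
    int frac_bits = WORD_BITS - 1 - int_bits;
    double scaled = v * (double)(1L << frac_bits);
    double rounded = scaled + (scaled >= 0.0 ? 0.5 : -0.5);

    /* Saturate instead of wrapping when the value does not fit. */
    if (rounded > 32767.0)
        return INT16_MAX;
    if (rounded < -32768.0)
        return INT16_MIN;
    return (int16_t)rounded;  /* the cast truncates toward zero */
}

/* Recover the real value by undoing the implied scaling. */
double fixed_to_float(int16_t q, int int_bits)
{
    int frac_bits = WORD_BITS - 1 - int_bits;
    return (double)q / (double)(1L << frac_bits);
}
```

With no integer bits, 0.5 is stored as 16384; with one integer bit the same word would mean 1.0, which is why the compiler has to track the scaling of every value.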
Slide 6 of 32
Background: Using Fixed-Point Arithmetic
Floating-Point: y_n = y_{n-1} + x_n
Fixed-Point: y_n = ((y_{n-1} >> 3) + x_n) << 1
Explicit Scaling Operations
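As a rough sketch, the fixed-point recurrence could be written in C as follows. The shift distances 3 and 1 are the ones shown on the slide; the 16-bit types and the function name `accumulate_step` are assumptions for this example, and the caller is assumed to keep the result within 16 bits.

```c
#include <stdint.h>

/* One step of y[n] = y[n-1] + x[n] on 16-bit fixed-point data.
   In general the shift distances depend on the scaling chosen for x and y. */
int16_t accumulate_step(int16_t y_prev, int16_t x)
{
    /* Align y[n-1] with x before adding. Right-shifting a negative value is
       implementation-defined in C; an arithmetic shift is assumed here. */
    int32_t aligned = y_prev >> 3;
    int32_t sum = aligned + x;

    /* Rescale the sum. Multiplying by 2 matches << 1 but stays well-defined
       for negative sums. */
    return (int16_t)(sum * 2);
}
```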
Slide 7 of 32
Automatic Conversion Process
Traditional Optimizing Compiler:
Input Program → Parser → Optimizer → Code Generator → Processor
• CONSTRAINT: Input/Output Invariance
• GOAL: Application Speedup
i.e. make code faster, but do not break anything!!!
Slide 8 of 32
Automatic Conversion Process
Traditional Optimizing Compiler:
[Figure: compiler flow Input Program → Parser → Optimizer → Code Generator → Processor, extended with Sample Inputs and a Floating-Point to Fixed-Point Translator]
• “RELAX” CONSTRAINTS…
• GOALS: “Good” Input/Output Fidelity (e.g. good signal-to-noise ratio)
Fast/Low-Power Operation (10-500× faster than FP emulation)
Slide 9 of 32
Floating-Point to Fixed-Point Translation
float a, b, x[N];
y = a*x[i] + b*x[i+1];
int a, b, x[N];
y = (a•x[i] >> 2) + b•x[i+1];
1. Type Conversion
2. Scaling Operations
3. Fractional Fixed-Point Operations
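The three steps can be sketched in plain C as below. The fractional multiply written as "•" on the slide is emulated here by the hypothetical helper `fmul`, assuming 16-bit operands with 15 fractional bits; the function name `scaled_sum` is likewise only illustrative. The parentheses around the shift matter in real C, because `>>` binds less tightly than `+`.

```c
#include <stdint.h>

/* Fractional multiply: both operands have 15 fractional bits, so the 32-bit
   product has 30; shifting right by 15 restores the operand format.
   An arithmetic right shift is assumed for negative products. */
int16_t fmul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 15);
}

/* Fixed-point form of y = a*x[i] + b*x[i+1]. The ">> 2" is the scaling
   operation from the slide, applied to the first product only. */
int16_t scaled_sum(int16_t a, int16_t b, const int16_t *x, int i)
{
    return (int16_t)((fmul(a, x[i]) >> 2) + fmul(b, x[i + 1]));
}
```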
Slide 10 of 32
Floating-Point to Fixed-Point Translator
[Figure: translator block diagram with stages SUIF Parser*, Optimizer, Identifier Assignment, Fixed-Point Conversion, Instrument Code, and Profile; Sample Inputs feed the Profile stage]
*SUIF = Stanford University Intermediate Format
See: http://suif.stanford.edu
Slide 11 of 32
Collecting Dynamic Range Information
Consider the ANSI C code:
float a, b, x[N];
y = a*x[i] + b*x[i+1];
Equivalent Expression Tree and ID Assignment:
“0” : y = tmp_1 + tmp_2 (root “+”)
“1” : tmp_1 = a * x[i]
“2” : tmp_2 = b * x[i+1]
Code Instrumentation:
tmp_1 = a*x[i];
profile(tmp_1,1);
tmp_2 = b*x[i+1];
profile(tmp_2,2);
y = tmp_1 + tmp_2;
profile(y,0);
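The slide does not show what `profile` does internally. One plausible minimal version, sketched here under the assumption that only the largest magnitude per identifier is needed, is the following; `NUM_IDS`, `profiled_max`, and `instrumented_sum` are hypothetical names introduced for this example.

```c
#define NUM_IDS 3  /* identifiers 0..2 from the expression tree */

static float max_abs[NUM_IDS];  /* largest magnitude seen per identifier */

/* Record the dynamic range of one intermediate result. */
void profile(float value, int id)
{
    float magnitude = value < 0.0f ? -value : value;
    if (id >= 0 && id < NUM_IDS && magnitude > max_abs[id])
        max_abs[id] = magnitude;
}

/* Read back the largest magnitude recorded for an identifier. */
float profiled_max(int id)
{
    return max_abs[id];
}

/* Instrumented version of y = a*x[i] + b*x[i+1]. */
float instrumented_sum(float a, float b, const float *x, int i)
{
    float tmp_1, tmp_2, y;

    tmp_1 = a * x[i];
    profile(tmp_1, 1);
    tmp_2 = b * x[i + 1];
    profile(tmp_2, 2);
    y = tmp_1 + tmp_2;
    profile(y, 0);
    return y;
}
```

Running the instrumented program on the sample inputs then leaves one measured range per identifier, which is the data the scaling step consumes.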
Slide 12 of 32
Generating Scaling Operations
Signal Scaling: Integer Word Length (IWL)
• definition: $\mathrm{IWL}[x] = \lfloor \log_2 \max |x| \rfloor + 1$
[Figure: fixed-point word layout: Sign bit, Integer Part (IWL bits), Fractional Part]
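A small sketch of the IWL computation follows, taking the logarithm to be rounded down and x to be taken in magnitude (both are assumptions about the definition). The function name `iwl_from_max` and the convention of returning 0 for an all-zero signal are choices made only for this example.

```c
/* IWL = floor(log2(max|x|)) + 1, computed by repeated halving or doubling so
   that no math library is needed. `max_abs` is the largest magnitude seen
   while profiling. */
int iwl_from_max(double max_abs)
{
    int iwl = 1;  /* correct when 1 <= max_abs < 2 */

    if (max_abs <= 0.0)
        return 0;  /* arbitrary convention for a signal that is always zero */
    while (max_abs >= 2.0) {
        max_abs /= 2.0;
        iwl++;
    }
    while (max_abs < 1.0) {
        max_abs *= 2.0;
        iwl--;
    }
    return iwl;
}
```

A signal peaking at 7.5 needs three integer bits, while one that never exceeds 0.3 has a negative IWL, meaning leading fractional bits are unused and can be traded for precision.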
Slide 13 of 32
Generating Scaling Operations
Example: “A op B”:
• $\mathrm{IWL}_A^{\text{measured}}$, $\mathrm{IWL}_A^{\text{current}}$
• $\mathrm{IWL}_B^{\text{measured}}$, $\mathrm{IWL}_B^{\text{current}}$
• $\mathrm{IWL}_{A\,\mathrm{op}\,B}^{\text{measured}}$
• $\mathrm{IWL}_{A\,\mathrm{op}\,B}^{\text{current}} = {?}$
[Figure: expression tree with node “op” over the Converted Sub-Expressions A and B]
Slide 14 of 32
Automatic Conversion Process:
IRP: Using Intermediate Result Profile Data
Previous Algorithms:
• ‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE:
An Interactive Code Generation Environment for HW/SW CoDesign.
ICASSP, April 1997. (a.k.a. predecessor to the Synopsys CoCentric Fixed-Point Designer Tool)
• A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and
Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop,
August 1997.
Neither uses Intermediate Result Profile data;
instead, they combine range information from leaf
nodes → Is Useful Information Lost?
Slide 15 of 32
IRP: Additive Operations
For example, assume |A| > |B|, and $\mathrm{IWL}_{A+B}^{\text{measured}} \le \mathrm{IWL}_A^{\text{measured}}$
“A ± B” → “(A << n_A) ± (B >> [n - n_B])”
where:
$n_A = \mathrm{IWL}_A^{\text{current}} - \mathrm{IWL}_A^{\text{measured}}$
$n_B = \mathrm{IWL}_B^{\text{current}} - \mathrm{IWL}_B^{\text{measured}}$
$n = \mathrm{IWL}_A^{\text{measured}} - \mathrm{IWL}_B^{\text{measured}}$
$\mathrm{IWL}_{A+B}^{\text{current}} = \mathrm{IWL}_A^{\text{measured}}$
[Figure: bit fields of A and B, with the alignment shift >> n marked]
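The bookkeeping for the additive case can be sketched as a small C function. It assumes that n_B is measured against B's own current IWL (which is what makes B >> [n - n_B] line up with A), and the type and function names are hypothetical.

```c
/* IWL bookkeeping for one operand of a converted sub-expression. */
typedef struct {
    int iwl_current;   /* IWL the converted operand currently has */
    int iwl_measured;  /* IWL obtained from profiling */
} Operand;

typedef struct {
    int a_left_shift;   /* n_A, applied as A << n_A */
    int b_right_shift;  /* n - n_B, applied as B >> (n - n_B) */
    int result_iwl;     /* current IWL of the converted A +/- B */
} AddScaling;

/* Assumes |A| > |B| and that the profiled sum needs no more integer bits
   than A, as stated on the slide. */
AddScaling irp_add_scaling(Operand a, Operand b)
{
    AddScaling s;
    int n_a = a.iwl_current - a.iwl_measured;
    int n_b = b.iwl_current - b.iwl_measured;
    int n = a.iwl_measured - b.iwl_measured;

    s.a_left_shift = n_a;
    s.b_right_shift = n - n_b;
    s.result_iwl = a.iwl_measured;
    return s;
}
```

A negative `b_right_shift` would simply mean that B is shifted left rather than right.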
Slide 16 of 32
IRP: Multiplication
“A • B” → “(A << n_A) • (B << n_B)”
where:
$n_A = \mathrm{IWL}_A^{\text{current}} - \mathrm{IWL}_A^{\text{measured}}$
$n_B = \mathrm{IWL}_B^{\text{current}} - \mathrm{IWL}_B^{\text{measured}}$
$\mathrm{IWL}_{A \cdot B}^{\text{current}} = \mathrm{IWL}_A^{\text{measured}} + \mathrm{IWL}_B^{\text{measured}}$
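For multiplication the same bookkeeping is shorter. The sketch below again assumes that each operand's shift is measured against its own current IWL; `MulScaling` and `irp_mul_scaling` are illustrative names only.

```c
typedef struct {
    int a_left_shift;  /* n_A, applied as A << n_A */
    int b_left_shift;  /* n_B, applied as B << n_B */
    int result_iwl;    /* current IWL of the converted A * B */
} MulScaling;

/* Both operands are normalized to their measured IWL before the multiply,
   so the product's IWL is the sum of the two measured IWLs. */
MulScaling irp_mul_scaling(int iwl_a_current, int iwl_a_measured,
                           int iwl_b_current, int iwl_b_measured)
{
    MulScaling s;

    s.a_left_shift = iwl_a_current - iwl_a_measured;
    s.b_left_shift = iwl_b_current - iwl_b_measured;
    s.result_iwl = iwl_a_measured + iwl_b_measured;
    return s;
}
```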
Slide 17 of 32
IRP: Division
“A / B” → “(A >> [n_dividend - n_A]) / (B << n_B)”
$n_A = \mathrm{IWL}_A^{\text{current}} - \mathrm{IWL}_A^{\text{measured}}$
$n_B = \mathrm{IWL}_B^{\text{current}} - \mathrm{IWL}_B^{\text{measured}}$
$n_{\text{diff}} = \mathrm{IWL}_{A/B}^{\text{measured}} - \mathrm{IWL}_A^{\text{measured}} + \mathrm{IWL}_B^{\text{measured}}$
$n_{\text{dividend}} = \begin{cases} n_{\text{diff}}, & \text{if } n_{\text{diff}} \ge 0 \\ 0, & \text{otherwise} \end{cases}$
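The division rule can be sketched the same way. The comparison is taken to be "n_diff >= 0" (either reading gives the same value at zero), n_B is again assumed to use B's own current IWL, and the names `DivScaling` and `irp_div_scaling` are hypothetical.

```c
typedef struct {
    int a_right_shift;  /* n_dividend - n_A, applied as A >> (n_dividend - n_A) */
    int b_left_shift;   /* n_B, applied as B << n_B */
} DivScaling;

DivScaling irp_div_scaling(int iwl_a_current, int iwl_a_measured,
                           int iwl_b_current, int iwl_b_measured,
                           int iwl_quotient_measured)
{
    DivScaling s;
    int n_a = iwl_a_current - iwl_a_measured;
    int n_b = iwl_b_current - iwl_b_measured;
    int n_diff = iwl_quotient_measured - iwl_a_measured + iwl_b_measured;
    int n_dividend = n_diff >= 0 ? n_diff : 0;  /* never scale the dividend up */

    s.a_right_shift = n_dividend - n_a;
    s.b_left_shift = n_b;
    return s;
}
```

A negative `a_right_shift` means the dividend ends up shifted left instead.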
Slide 18 of 32
IRP-SA: Using ‘Shift Absorption’
Example:
y = (a*x[i] + (b*x[i+1]>>1)) << 1
Question: Is information discarded unnecessarily here?
Consider the following alternative:
y = (a*x[i]<<1) + b*x[i+1]
BUT: Can we really discard most significant bits and
get roughly the same answer???? YES!
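One way to see why the answer is "yes" is two's-complement wrap-around: if the final sum fits in the word, bits lost from an intermediate left shift do not change it. The sketch below assumes a 16-bit wrapping (non-saturating) datapath; `p` and `q` stand for the products a*x[i] and b*x[i+1], and all function names are hypothetical.

```c
#include <stdint.h>

/* Keep only the low 16 bits, as a wrapping 16-bit datapath would. */
static int16_t wrap16(int32_t v)
{
    uint16_t low = (uint16_t)v;  /* well-defined reduction modulo 2^16 */
    return low >= 0x8000u ? (int16_t)((int32_t)low - 65536) : (int16_t)low;
}

/* Original form (p + (q >> 1)) << 1: the right shift discards the low bit
   of q. An arithmetic right shift is assumed for negative q. */
int16_t scaled_then_shifted(int16_t p, int16_t q)
{
    return wrap16(((int32_t)p + (q >> 1)) * 2);
}

/* Shift-absorbed form (p << 1) + q: doubling p may overflow 16 bits, but the
   wrapped addition still gives the right sum when the final result fits. */
int16_t shift_absorbed(int16_t p, int16_t q)
{
    int16_t p2 = wrap16((int32_t)p * 2);
    return wrap16((int32_t)p2 + q);
}
```

With p = 20000 and q = -30001 the exact value of 2p + q is 9999: the shift-absorbed form returns 9999 even though 2p overflows, while the original form returns 9998.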
Slide 19 of 32
Architectural Support
Common occurrence (using IRP-SA):
Fractional Multiplication with internal Left Shift: A•B << n
[Figure: bit fields of A (integer part IWL_A), B (integer part IWL_B), and the product A*B (integer part IWL_A + IWL_B), with the shift distance n marked on the product]
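A software emulation helps show what the internal shift buys. The sketch below assumes 16-bit operands with 15 fractional bits and a 32-bit product; the names `fmul` and `fmls` are illustrative and not the UTDSP mnemonics or encoding.

```c
#include <stdint.h>

/* Plain fractional multiply: keep bits 30..15 of the 32-bit product. */
int16_t fmul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 15);
}

/* Fractional multiply with internal left shift by n (0 <= n <= 15): the
   16-bit result window is taken n bits lower in the product, so n extra
   low-order bits survive and the top n bits are dropped. An arithmetic
   right shift is assumed for negative products. */
int16_t fmls(int16_t a, int16_t b, int n)
{
    int32_t product = (int32_t)a * (int32_t)b;
    uint16_t low = (uint16_t)(product >> (15 - n));
    return low >= 0x8000u ? (int16_t)((int32_t)low - 65536) : (int16_t)low;
}
```

For a = 291, b = 1110 and n = 2, shifting after the multiply gives `fmul(a, b) << 2`, which is 36, whereas `fmls(a, b, 2)` gives 39 because the two bits that the plain multiply rounded away are still available inside the product.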
Slide 20 of 32
Experimental Results
Benchmarks
4th Order Cascaded/Parallel IIR Filter (IIR-C, IIR-P)
(Normalized) Lattice Filter (LAT, NLAT)
128-Point Radix 2 Decimation in Time FFT (FFT-NR, FFT-MW)
Levinson-Durbin Recursion (LEVDUR)
10x10 Matrix-Multiply (MMUL10)
Nonlinear Control (INVPEND)
Trig Function (SIN)
Slide 21 of 32
SQNR Enhancement: FMLS and/or IRP-SA
[Figure: bar chart of SQNR enhancement in Equivalent Bits (axis from -0.5 to 2) for IRP-SA, FMLS, and IRP-SA w/ FMLS on IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, INVPEND, LEVDUR, MMUL10, and SIN]
Slide 22 of 32
What Is The Effect of “Shift Absorption” ?
Distribution of Fractional Multiply Output Shifts
[Figure: Relative Frequency (axis from 0 to 0.8) versus FMLS Output Shift Distance (3 left, 2 left, 1 left, none, 1 right) for IRP and IRP-SA]
Slide 23 of 32
Experimental Results:
Rotational Inverted Pendulum
U of T System Control Group
Non-linear Testbench
Slide 24 of 32
Closed-Loop System Response: Rotational Inverted Pendulum
12-bit Controller Comparison
WC: 32.8 dB
IRP-SA: 41.1 dB
IRP-SA w/ FMLS: 48.0 dB
Slide 25 of 32
128-Point Radix-2 FFT
(Generated by MATLAB RealTime Workshop)
Slide 26 of 32
Speedup?
Rotational Inverted Pendulum:
Fractional Multiply Output Shift Relative Frequencies
Slide 27 of 32
…Yup!
Slide 28 of 32
Speedup* Using FMLS
8-FMUL = { 4 left thru 3 right }
4-FMUL = { 2 left thru 1 right }
2-FMUL = { one left, no shift }
[Figure: bar chart of Relative Speedup (axis from 1 to 1.4) for Limiting, 8-FMUL, 4-FMUL, and 2-FMUL on IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, and SIN]
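The three output-shift sets can be written down as ranges. The sketch below is a guess at how a code generator might decide whether a required output shift can be folded into the multiply or still needs a separate shift instruction; the type, constants, and function name are all hypothetical.

```c
/* Output-shift range of one FMUL variant. */
typedef struct {
    int max_right;  /* largest right shift the multiply can apply */
    int max_left;   /* largest left shift the multiply can apply */
} ShiftSet;

static const ShiftSet FMUL8 = { 3, 4 };  /* 4 left thru 3 right */
static const ShiftSet FMUL4 = { 1, 2 };  /* 2 left thru 1 right */
static const ShiftSet FMUL2 = { 0, 1 };  /* one left, no shift */

/* `shift` is positive for a left shift and negative for a right shift.
   Returns 1 when the multiply can apply it directly, 0 when a separate
   shift instruction would still be needed. */
int shift_is_absorbed(ShiftSet set, int shift)
{
    return shift <= set.max_left && -shift <= set.max_right;
}
```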
Slide 29 of 32
SQNR Enhancement for various Output Shift Sets
[Figure: bar chart of SQNR enhancement in Equivalent Bits (axis from 0 to 2) for Limiting, 8-FMUL, 4-FMUL, and 2-FMUL on IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, and SIN]
Slide 30 of 32
Summary
The Fractional Multiply with internal Left Shift
(FMLS) operation can improve runtime and
signal-to-noise performance: speedups of up to
35% and SQNR enhancement equivalent to up to
2 bits, maybe even 4 bits (depending on how you
choose to measure it).
Easy VLSI implementation, and easy for the compiler
to use.
Slide 31 of 32
Future Directions
Higher Level Transformations:
• Automatic Generation of Block-Floating-Point...
• Quantization Error Feedback…
• BOTH need signal-flow-graph representation…
therefore probably need a better DSP language
than ANSI C
Variable Precision Arithmetic (How much
precision does each operation need?)
Slide 32 of 32