Bitwidth Analysis with Application to Silicon Compilation

Mark Stephenson
Jonathan Babb
Saman Amarasinghe
MIT Laboratory for Computer Science
Goal
• For a program written in a high level
language, automatically find the minimum
number of bits needed to represent:
– Each static variable in the program
– Each operation in the program.
June 19th, 2000
www.cag.lcs.mit.edu/bitwise
Usefulness of Bitwidth Analysis
• Higher Language Abstraction
• Enables other compiler optimizations
1. Synthesizing
application-specific processors
2. Optimizing for power-aware processors
3. Extracting more parallelism for SIMD
processors
Bitwidth Opportunities
• Runtime profiling reveals plenty of bitwidth
opportunities.
• For the SPECint95 benchmark suite,
– Over 50% of operands use fewer than half the
bits specified by the programmer.
Analysis Constraints
• Bitwidth results must maintain program
correctness for all input data sets
– Results are not runtime/data dependent
• A static analysis can do very well, even in
light of this constraint
Bitwidth Extraction
• Use abundant hints in the source language
to discover bitwidths with near-optimal
precision.
• Caveats
– Analysis limited to fixed-point variables.
– We assume source program correctness.
The Hints
• Bitwidth refining constructs
1. Arithmetic operations
2. Boolean operations
3. Bitmask operations
4. Loop induction variable bounding
5. Clamping operations
6. Type castings
7. Static array index bounding
1. Arithmetic Operations
• Example
int a;
unsigned b;
a = random();
b = random();
a: 32 bits b: 32 bits
a = a / 2;
a: 31 bits b: 32 bits
b = b >> 4;
a: 31 bits b: 28 bits
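Each annotation follows from the range of values the operation can produce. The following is a minimal C sketch, not Bitwise's implementation: the helpers `bits_unsigned`, `bits_signed`, `div2_range` and `shr_range_hi` are hypothetical, signed widths are assumed to be two's complement, and intermediate arithmetic is assumed to be 64-bit.

```c
#include <stdint.h>

/* Hypothetical helper: bits needed for an unsigned value no larger than hi. */
static int bits_unsigned(uint64_t hi) {
    int n = 1;
    while (n < 64 && (hi >> n) != 0)
        n++;
    return n;
}

/* Hypothetical helper: bits needed for a two's-complement value in [lo, hi].
   Grows n until [-(2^(n-1)), 2^(n-1) - 1] covers [lo, hi]. */
static int bits_signed(int64_t lo, int64_t hi) {
    int n = 1;
    while (n < 64 && (lo < -((int64_t)1 << (n - 1)) || hi > ((int64_t)1 << (n - 1)) - 1))
        n++;
    return n;
}

/* Range transfer for x / 2. C division truncates toward zero, which is
   monotone in x, so the input endpoints map to the output endpoints. */
static void div2_range(int64_t lo, int64_t hi, int64_t *out_lo, int64_t *out_hi) {
    *out_lo = lo / 2;
    *out_hi = hi / 2;
}

/* Range transfer for x >> s on an unsigned value in [0, hi]. */
static uint64_t shr_range_hi(uint64_t hi, int s) { return hi >> s; }
```

For example, an `int` in [INT32_MIN, INT32_MAX] divided by 2 lies in [-2^30, 2^30 - 1], which needs 31 signed bits, and an unsigned 32-bit value shifted right by 4 is at most 2^28 - 1, which needs 28 bits.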
2. Boolean Operations
• Example
int a;
a: 32 bits
a = (b != 15);
a: 1 bit
3. Bitmask Operations
• Example
int a;
a: 32 bits
a = random() & 0xff;
a: 8 bits
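The Boolean and bitmask rules reduce to simple range facts: a comparison yields 0 or 1, and `x & mask` can never exceed a constant mask. A minimal sketch, assuming unsigned 32-bit values; `urange`, `bits_for`, `and_const` and `compare_result` are hypothetical names:

```c
#include <stdint.h>

/* Unsigned value range [lo, hi] (hypothetical representation). */
typedef struct { uint32_t lo, hi; } urange;

/* Bits needed for an unsigned value no larger than hi. */
static int bits_for(uint32_t hi) {
    int n = 1;
    while (n < 32 && (hi >> n) != 0)
        n++;
    return n;
}

/* x & mask with a constant mask: for unsigned values the result never
   exceeds either operand, so its upper bound is min(x.hi, mask). */
static urange and_const(urange x, uint32_t mask) {
    urange r = { 0, x.hi < mask ? x.hi : mask };
    return r;
}

/* A C comparison such as (b != 15) always yields 0 or 1. */
static urange compare_result(void) {
    urange r = { 0, 1 };
    return r;
}
```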
4. Loop Induction Variable Bounding
• Applicable to induction variables of for loops.
• Example
int i;
i: 32 bits
for (i = 0; i < 6; i++) {
i: 3 bits
…
}
i: 3 bits
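A minimal sketch of loop induction variable bounding, assuming a `for` loop with constant, non-negative `init` < `bound`, a step of +1, and a body that never assigns `i`; `range`, `bits_for`, `body_range` and `exit_range` are hypothetical names:

```c
/* Closed interval <lo, hi> (hypothetical representation). */
typedef struct { long long lo, hi; } range;

/* Bits needed for a non-negative value no larger than hi. */
static int bits_for(long long hi) {
    int n = 1;
    while (n < 62 && (hi >> n) != 0)
        n++;
    return n;
}

/* Inside the body of `for (i = init; i < bound; i++)`, i stays in
   [init, bound - 1] (assumes init < bound and no writes to i in the body). */
static range body_range(long long init, long long bound) {
    range r = { init, bound - 1 };
    return r;
}

/* The loop exits exactly when i reaches bound. */
static range exit_range(long long bound) {
    range r = { bound, bound };
    return r;
}
```

For `i < 6`, the body sees [0, 5] and the exit value is 6; both fit in 3 bits.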
5. Clamping Optimization
• Multimedia codes often simulate saturating
instructions.
• Example
int valpred;
valpred: 32 bits
if (valpred > 32767)
  valpred = 32767;
else if (valpred < -32768)
  valpred = -32768;
valpred: 16 bits
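Because a clamp is monotone, its output range is the clamp applied to each endpoint of the input range. A minimal sketch, assuming two's-complement widths; `range`, `clamp`, `clamp_range` and `bits_signed` are hypothetical names:

```c
#include <stdint.h>

/* Closed interval <lo, hi> (hypothetical representation). */
typedef struct { int64_t lo, hi; } range;

/* The saturating pattern from the slide, applied to one value. */
static int64_t clamp(int64_t x, int64_t lo, int64_t hi) {
    if (x > hi) return hi;
    if (x < lo) return lo;
    return x;
}

/* The clamp is monotone, so it maps the endpoints of the input range to
   the endpoints of the output range. */
static range clamp_range(range in, int64_t lo, int64_t hi) {
    range r = { clamp(in.lo, lo, hi), clamp(in.hi, lo, hi) };
    return r;
}

/* Bits needed for a two's-complement value in [lo, hi]. */
static int bits_signed(int64_t lo, int64_t hi) {
    int n = 1;
    while (n < 64 && (lo < -((int64_t)1 << (n - 1)) || hi > ((int64_t)1 << (n - 1)) - 1))
        n++;
    return n;
}
```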
6. Type Casting (Part I)
• Example
int a;
char b;
a: 32 bits b: 8 bits
a = b;
a: 8 bits b: 8 bits
6. Type Casting (Part II)
• Example
int a;
char b;
a: 32 bits b: 8 bits
a: 8 bits b: 8 bits
b = a;
a: 8 bits b: 8 bits
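The two casting rules can be checked directly. Forward (Part I), `a = b` gives `a` the range of an 8-bit `char`, assumed signed here. Backward (Part II), `b = a` only observes the low 8 bits of `a`, so if `a` has no other uses it can be narrowed without changing `b`. A minimal sketch with hypothetical helper names, using unsigned types so the narrowing conversion is well-defined in C:

```c
#include <stdint.h>

/* Bits needed for a two's-complement value in [lo, hi] (hypothetical helper). */
static int bits_signed(int64_t lo, int64_t hi) {
    int n = 1;
    while (n < 64 && (lo < -((int64_t)1 << (n - 1)) || hi > ((int64_t)1 << (n - 1)) - 1))
        n++;
    return n;
}

/* Part II, backward: `b = a` with an 8-bit b observes only the low 8 bits
   of a. Unsigned types keep the narrowing conversion well-defined in C. */
static uint8_t assign_to_8bit(uint32_t a) { return (uint8_t)a; }

/* a narrowed to 8 bits; valid only when a has no other uses (an assumption). */
static uint32_t narrow_to_8bit(uint32_t a) { return a & 0xffu; }
```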
7. Array Index Optimization
• The bitwidth of an array index can be bounded
by the size of the array.
• Example
int a, b;
int X[1024];
a: 32 bits b: 32 bits
a: 10 bits b: 8 bits
X[a] = X[4*b];
a: 10 bits b: 8 bits
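A minimal sketch of the array-index rule, assuming the program never indexes out of bounds (the analysis assumes source program correctness) and that the multiplier is a positive constant; `range`, `index_range`, `backward_mul_const` and `bits_for` are hypothetical names:

```c
/* Closed interval <lo, hi> (hypothetical representation). */
typedef struct { long long lo, hi; } range;

/* Bits needed for a non-negative value no larger than hi. */
static int bits_for(long long hi) {
    int n = 1;
    while (n < 62 && (hi >> n) != 0)
        n++;
    return n;
}

/* Valid indices of an n-element array, assuming in-bounds accesses. */
static range index_range(long long n) {
    range r = { 0, n - 1 };
    return r;
}

/* Backward through y = k * x with constant k > 0 and y.lo >= 0:
   x must satisfy y.lo <= k * x <= y.hi. */
static range backward_mul_const(range y, long long k) {
    range r = { (y.lo + k - 1) / k, y.hi / k };
    return r;
}
```

For `X[1024]`, an index lies in [0, 1023] (10 bits), and `4*b` in that range forces `b` into [0, 255] (8 bits).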
Propagating Data-Ranges
• Data-flow analysis
• Three candidate lattices
– Bitwidth
– Vector of bits
– Data-ranges
Propagating bitwidths:
a: 4 bits
a = a + 1
a: 5 bits
Propagating bit vectors:
a: 1X
a = a + 1
a: XXX
Propagating data-ranges:
a: <0,13>
a = a + 1
a: <1,14>
Four bits are required
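The three lattices can be compared on the `a = a + 1` example. This is a minimal sketch, not Bitwise's transfer functions: the bit-vector result is computed by brute-force enumeration purely for illustration, and all names (`range`, `add1_bitwidth`, `add1_range`, `add1_bitvector`, `bits_for`) are hypothetical.

```c
#include <string.h>

/* Closed interval <lo, hi> (hypothetical representation). */
typedef struct { long long lo, hi; } range;

/* Bitwidth lattice: adding 1 to a w-bit value may carry into a new bit. */
static int add1_bitwidth(int w) { return w + 1; }

/* Data-range lattice: adding a constant shifts both endpoints exactly. */
static range add1_range(range a) { range r = { a.lo + 1, a.hi + 1 }; return r; }

/* Bits needed for a non-negative value no larger than hi. */
static int bits_for(long long hi) {
    int n = 1;
    while (n < 62 && (hi >> n) != 0)
        n++;
    return n;
}

/* Bit-vector lattice, by brute force: enumerate every value matching pat
   ('0', '1' or 'X' per bit, most significant first), add 1, and mark each of
   the `width` result bits 'X' unless all results agree on it.
   Assumes strlen(pat) <= 16 and width <= 31; out must hold width + 1 chars. */
static void add1_bitvector(const char *pat, int width, char *out) {
    int n = (int)strlen(pat);
    unsigned long all_and = ~0UL, all_or = 0UL;
    for (unsigned long v = 0; v < (1UL << n); v++) {
        int match = 1;
        for (int i = 0; i < n; i++) {
            unsigned long bit = (v >> (n - 1 - i)) & 1UL;
            if ((pat[i] == '0' && bit) || (pat[i] == '1' && !bit))
                match = 0;
        }
        if (!match)
            continue;
        all_and &= v + 1;   /* bits set in every result */
        all_or |= v + 1;    /* bits set in at least one result */
    }
    for (int i = 0; i < width; i++) {
        unsigned long mask = 1UL << (width - 1 - i);
        out[i] = (all_and & mask) ? '1' : (all_or & mask) ? 'X' : '0';
    }
    out[width] = '\0';
}
```

The data-range result <1,14> still fits in four bits, whereas the bitwidth lattice must allow for a carry and the bit-vector lattice loses every bit: "1X" is {2, 3}, and {3, 4} is 011 and 100.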
Propagating Data-Ranges
• Propagate data-ranges forward and backward over
the control-flow graph using transfer functions
described in the paper
• Use Static Single Assignment (SSA) form with
extensions to:
– Gracefully handle pointers and arrays.
– Extract data-range information from conditional
statements.
Example of Data-Range Propagation
a0 = input()
a1 = a0 + 1
if (a1 < 0)
  true:  a2 = a1 : (a1 < 0)     range-refinement function
         a3 = a2 + 1
  false: a4 = a1 : (a1 >= 0)    range-refinement function
         c0 = a4
a5 = φ(a3, a4)
b0 = array[a5]
array’s bounds are [0:9]
Ranges (forward propagation → refined):
a0: <-128, 127> → <-2, 8>
a1: <-127, 127> → <-1, 9>
a2: <-127, -1> → <-1, -1>
a3: <-126, 0>
a4: <0, 127> → <0, 9>
c0: <0, 127> → <0, 9>
a5: <-126, 127> → <0, 9>
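The example can be replayed with a few interval operations. This is a minimal sketch, not the transfer functions from the paper: `input()` is assumed to return an 8-bit signed value, and all names (`range`, `add_c`, `intersect`, `join`, `refine_lt`, `refine_ge`, `example`) are hypothetical. It runs one forward pass, one backward pass from the array access, and then recomputes `c0`.

```c
/* Closed interval <lo, hi> (hypothetical representation). */
typedef struct { long long lo, hi; } range;

static long long min_ll(long long x, long long y) { return x < y ? x : y; }
static long long max_ll(long long x, long long y) { return x > y ? x : y; }

/* Forward transfer for y = x + c; with c negated it also serves backward. */
static range add_c(range x, long long c) { range r = { x.lo + c, x.hi + c }; return r; }

/* Both facts hold: intersect the ranges. */
static range intersect(range a, range b) { range r = { max_ll(a.lo, b.lo), min_ll(a.hi, b.hi) }; return r; }

/* Either fact may hold (a phi node, or a value flowing to two arms): union. */
static range join(range a, range b) { range r = { min_ll(a.lo, b.lo), max_ll(a.hi, b.hi) }; return r; }

/* Range-refinement functions for the two arms of `x < k`. */
static range refine_lt(range x, long long k) { range r = { x.lo, min_ll(x.hi, k - 1) }; return r; }
static range refine_ge(range x, long long k) { range r = { max_ll(x.lo, k), x.hi }; return r; }

enum { A0, A1, A2, A3, A4, C0, A5, NVARS };

static void example(range r[NVARS]) {
    /* forward pass; input() assumed to return an 8-bit signed value */
    r[A0] = (range){ -128, 127 };
    r[A1] = add_c(r[A0], 1);
    r[A2] = refine_lt(r[A1], 0);                  /* true arm:  a1 < 0  */
    r[A3] = add_c(r[A2], 1);
    r[A4] = refine_ge(r[A1], 0);                  /* false arm: a1 >= 0 */
    r[C0] = r[A4];
    r[A5] = join(r[A3], r[A4]);                   /* a5 = phi(a3, a4) */

    /* backward pass: array's bounds are [0:9] */
    r[A5] = intersect(r[A5], (range){ 0, 9 });
    r[A4] = intersect(r[A4], r[A5]);
    r[A2] = intersect(r[A2], add_c(r[A5], -1));   /* a3 = a2 + 1 must lie in a5 */
    r[A1] = intersect(r[A1], join(r[A2], r[A4])); /* a1 flows to both arms */
    r[A0] = intersect(r[A0], add_c(r[A1], -1));   /* a1 = a0 + 1 */

    r[C0] = r[A4];                                /* c0 = a4 picks up the refined range */
}
```

The refined ranges for a0, a1, a2, a4, c0 and a5 match the slide; the sketch leaves a3 at its forward range.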
What to do with Loops?
• Finding the fixed-point around back edges
will often saturate data-ranges.
• Example
a = 0;
for (y = 1; y < 100; y++)
  a = a + 5;
In SSA form:
a0 = 0
y0 = 1
y1 = φ(y0, y2)
a1 = φ(a0, a3)
y1 < 100
a2 = a1 + 5
y2 = y1 + 1
Ranges after successive passes around the back edge:
y: 1..1, 1..2, 1..3, …, 1..6
a: 0..0, 0..5, 0..10, …, 0..25
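The saturation problem can be reproduced with a naive fixed-point iteration over the loop. This is a minimal sketch, assuming that any range still changing after `max_iters` rounds is saturated to the full 32-bit signed range (a crude widening chosen for illustration); it feeds the body's new value of `a` straight back into the φ node, and all names are hypothetical.

```c
#include <stdint.h>

/* Closed interval <lo, hi> (hypothetical representation). */
typedef struct { int64_t lo, hi; } range;

static range join(range a, range b) {
    range r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

static int same(range a, range b) { return a.lo == b.lo && a.hi == b.hi; }

/* Naive iteration for: a = 0; for (y = 1; y < 100; y++) a = a + 5;
   a1 and y1 are the phi nodes at the loop header. */
static void loop_fixed_point(int max_iters, range *a_out, range *y_out, int *iters_out) {
    const range full = { INT32_MIN, INT32_MAX };
    const range a_init = { 0, 0 }, y_init = { 1, 1 };
    range a1 = a_init, y1 = y_init;
    int a_changed = 1, y_changed = 1, i = 0;
    while (i < max_iters && (a_changed || y_changed)) {
        range yb = y1;                        /* y1 refined by the test y1 < 100 */
        if (yb.hi > 99)
            yb.hi = 99;
        range y2 = { yb.lo + 1, yb.hi + 1 };  /* y2 = y1 + 1 */
        range a2 = { a1.lo + 5, a1.hi + 5 };  /* a2 = a1 + 5 */
        range ny1 = join(y_init, y2), na1 = join(a_init, a2);
        y_changed = !same(ny1, y1);
        a_changed = !same(na1, a1);
        y1 = ny1;
        a1 = na1;
        i++;
    }
    *a_out = a_changed ? full : a1;           /* saturate what never settled */
    *y_out = y_changed ? full : y1;
    *iters_out = i;
}
```

With `max_iters` set to 1000, `y` settles at 1..100 because the test `y1 < 100` bounds it, while `a` grows by 5 every round and would need hundreds of millions of rounds to reach the 32-bit limit, so its range saturates.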
Our Loop Solution
• Find the closed-form solutions to commonly
occurring sequences.
– A sequence is a mutually dependent group of
instructions.
• Use the closed-form solutions to determine
final ranges.
Finding the Closed-Form Solution
a = 0                    <0,0>
for i = 1 to 10
  a = a + 1              <1,460>
  for j = 1 to 10
    a = a + 2            <3,480>
  for k = 1 to 10
    a = a + 3            <24,510>
...= a + 4               <510,510>
• Non-trivial to find the exact ranges
• Can easily find conservative range of <0,510>
Solving the Linear Sequence
a = 0
for i = 1 to 10          <1,10>
  a = a + 1              <1,10>*<1,1> = <1,10>
  for j = 1 to 10        <1,100>
    a = a + 2            <1,100>*<2,2> = <2,200>
  for k = 1 to 10        <1,100>
    a = a + 3            <1,100>*<3,3> = <3,300>
...= a + 4               (<1,10> + <2,200> + <3,300>) ∪ <0,0> = <0,510>
• Figure out the iteration count of each loop.
• Find out how much each instruction contributes to the sequence using its iteration count.
• Sum all the contributions together, and take the data-range union with the initial value.
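The three steps translate into interval arithmetic. This is a minimal sketch with hypothetical names throughout. As an assumption, it adds the initial value to the summed contributions before taking the union, which reduces to the slide's formula when the initial value is <0,0>. `simulate_max` runs the loop nest directly as a cross-check.

```c
/* Closed interval <lo, hi> (hypothetical representation). */
typedef struct { long long lo, hi; } range;

static range add(range a, range b) { range r = { a.lo + b.lo, a.hi + b.hi }; return r; }

static range join(range a, range b) {
    range r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Interval multiplication: the extremes are among the four endpoint products. */
static range mul(range a, range b) {
    long long p[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
    range r = { p[0], p[0] };
    for (int i = 1; i < 4; i++) {
        if (p[i] < r.lo) r.lo = p[i];
        if (p[i] > r.hi) r.hi = p[i];
    }
    return r;
}

/* Closed form for a linear sequence: member i adds inc[i] and executes
   count[i] times. Sum the contributions, add the initial value, and take
   the union with the initial value. */
static range linear_sequence(range init, const range count[], const range inc[], int n) {
    range total = { 0, 0 };
    for (int i = 0; i < n; i++)
        total = add(total, mul(count[i], inc[i]));
    return join(init, add(init, total));
}

/* Cross-check: run the loop nest and return the largest value a takes. */
static long long simulate_max(void) {
    long long a = 0, max = 0;
    for (int i = 1; i <= 10; i++) {
        a = a + 1;
        if (a > max) max = a;
        for (int j = 1; j <= 10; j++) { a = a + 2; if (a > max) max = a; }
        for (int k = 1; k <= 10; k++) { a = a + 3; if (a > max) max = a; }
    }
    return max;
}
```

The conservative result <0,510> contains every value `a` actually takes; the largest is 510.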
Results
• Standalone Bitwise compiler.
– Bits cut from scalar variables
– Bits cut from array variables
• With the DeepC silicon compiler.
[Figure: Percentage of Original Scalar Bits. Percentage of bits remaining per benchmark, with Bitwise and with a dynamic profile: adpcm, bubblesort, convolve, histogram, intfir, intmatmul, jacobi, life, median, mpegcorr, parity, pmatch, softfloat, sor.]
[Figure: Percentage of Original Array Bits. Same benchmarks and series.]
DeepC Compiler Targeted to FPGAs
C/Fortran program
→ Suif Frontend
→ Pointer alias and other high-level analyses
→ Bitwidth Analysis
→ Raw parallelization
→ MachSuif Codegen
→ DeepC specialization
→ Verilog
→ Traditional CAD optimizations
→ Physical Circuit
FPGA Area
[Figure: area (CLB count) per benchmark, without and with Bitwise. Benchmarks, with main datapath width in parentheses: adpcm (8), bubblesort (32), convolve (16), histogram (16), intfir (32), intmatmul (16), jacobi (8), life (1), median (32), mpegcorr (16), newlife (1), parity (32), pmatch (32), sor (32).]
FPGA Clock Speed (50 MHz Target)
[Figure: XC4000-09 clock speed (MHz) per benchmark, without and with Bitwise: adpcm, bubblesort, convolve, histogram, intfir, intmatmul, jacobi, life, median, mpegcorr, newlife, parity, pmatch, sor.]
Power Savings
[Figure: average dynamic power (mW) without and with bitwidth analysis for bubblesort, histogram, jacobi and pmatch.]
Related Work
• Data-range propagation for branch prediction [Patterson]
• Symbolic data-range analysis [Rugina et al.]
• Bitwidth propagation [Ananian]
• Bit-vector propagation [Razdan, Budiu et al.]
Summary
• Developed Bitwise: a scalable bitwidth analyzer
– Standard data-flow analysis
– Loop analysis
– Incorporates pointer analysis
• Demonstrated savings when targeting silicon
from high-level languages
– 57% less area
– up to 86% improvement in clock speed
– less than 50% of the power
Power Savings
• C → ASIC
– IBM SA27E process
• 0.15 micron drawn
– 200 MHz
• Methodology
– C → RTL
– RTL simulation → register switching activity
– Synthesis reports dynamic power
Mismatched Bitwidths
• When the operands of an instruction have
differing sizes,
– type conversion instructions are added,
converting both operands
• to an integer as wide as the wider of the two, and
• with the appropriate sign.
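A minimal sketch of the conversion. It assumes each operand is sign- or zero-extended according to its own signedness (the slide says only "with the appropriate sign") and that widths are between 1 and 62 bits. `operand`, `extend`, `common_width` and `add_mismatched` are hypothetical names.

```c
#include <stdint.h>

/* An operand held as the low `width` bits of `raw`, signed or unsigned.
   Hypothetical representation; assumes 1 <= width <= 62. */
typedef struct { uint64_t raw; int width; int is_signed; } operand;

/* Sign- or zero-extend an operand according to its own signedness. */
static int64_t extend(operand x) {
    uint64_t v = x.raw & (((uint64_t)1 << x.width) - 1);
    if (x.is_signed && ((v >> (x.width - 1)) & 1))
        return (int64_t)v - ((int64_t)1 << x.width);
    return (int64_t)v;
}

/* Both operands are converted to an integer as wide as the wider of the two. */
static int common_width(operand a, operand b) { return a.width > b.width ? a.width : b.width; }

/* Example operation on mismatched operands: extend both, then add
   (in 64 bits here, so the sketch does not model wrap-around). */
static int64_t add_mismatched(operand a, operand b) { return extend(a) + extend(b); }
```

For instance, the 4-bit pattern 1111 extends to -1 when signed and to 15 when unsigned.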