Bitwidth Analysis with Application to Silicon Compilation
Mark Stephenson, Jonathan Babb, Saman Amarasinghe
MIT Laboratory for Computer Science

Goal
• For a program written in a high-level language, automatically find the minimum number of bits needed to represent:
  – Each static variable in the program
  – Each operation in the program

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Usefulness of Bitwidth Analysis
• Higher Language Abstraction
• Enables other compiler optimizations
  1. Synthesizing application-specific processors
  2. Optimizing for power-aware processors
  3. Extracting more parallelism for SIMD processors

Bitwidth Opportunities
• Runtime profiling reveals plenty of bitwidth opportunities.
• For the SPECint95 benchmark suite,
  – Over 50% of operands use less than half the number of bits specified by the programmer.

Analysis Constraints
• Bitwidth results must maintain program correctness for all input data sets
  – Results are not runtime/data dependent
• A static analysis can do very well, even in light of this constraint

Bitwidth Extraction
• Use abundant hints in the source language to discover bitwidths with near optimal precision.
• Caveats
  – Analysis limited to fixed-point variables.
  – We assume source program correctness.

The Hints
• Bitwidth refining constructs
  1. Arithmetic operations
  2. Boolean operations
  3. Bitmask operations
  4. Loop induction variable bounding
  5. Clamping operations
  6. Type castings
  7. Static array index bounding

1. Arithmetic Operations
• Example
    int a;
    unsigned b;
    a = random();
    b = random();      a: 32 bits   b: 32 bits
    a = a / 2;         a: 31 bits   b: 32 bits
    b = b >> 4;        a: 31 bits   b: 28 bits

2. Boolean Operations
• Example
    int a;             a: 32 bits
    a = (b != 15);     a: 1 bit

3. Bitmask Operations
• Example
    int a;                    a: 32 bits
    a = random() & 0xff;      a: 8 bits

4. Loop Induction Variable Bounding
• Applicable to for loop induction variables.
• Example
    int i;                         i: 32 bits
    for (i = 0; i < 6; i++) {      i: 3 bits
      …
    }                              i: 3 bits

5. Clamping Optimization
• Multimedia codes often simulate saturating instructions.
• Example
    int valpred;                   valpred: 32 bits
    if (valpred > 32767)
      valpred = 32767;
    else if (valpred < -32768)
      valpred = -32768;            valpred: 16 bits

6. Type Casting (Part I)
• Example
    int a;
    char b;            a: 32 bits   b: 8 bits
    a = b;             a: 8 bits    b: 8 bits

6. Type Casting (Part II)
• Example
    int a;
    char b;            a: 32 bits   b: 8 bits
                       a: 8 bits    b: 8 bits
    b = a;             a: 8 bits    b: 8 bits

7. Array Index Optimization
• An index into an array can be set based on the bounds of the array.
• Example
    int a, b;
    int X[1024];       a: 32 bits   b: 32 bits
                       a: 10 bits   b: 8 bits
    X[a] = X[4*b];     a: 10 bits   b: 8 bits

Propagating Data-Ranges
• Data-flow analysis
• Three candidate lattices
  – Bitwidth
  – Vector of bits
  – Data-ranges
• Effect of a = a + 1 under each lattice
    Propagating bitwidths       a: 4 bits    →   a: 5 bits
    Propagating bit vectors     a: 1X        →   a: XXX
    Propagating data-ranges     a: <0,13>    →   a: <1,14>    (four bits are required)

Propagating Data-Ranges
• Propagate data-ranges forward and backward over the control-flow graph using transfer functions described in the paper
• Use Static Single Assignment (SSA) form with extensions to:
  – Gracefully handle pointers and arrays.
  – Extract data-range information from conditional statements.

Example of Data-Range Propagation
                                   forward         backward
    a0 = input()                   <-128, 127>     <-2, 8>
    a1 = a0 + 1                    <-127, 127>     <-1, 9>
    a1 < 0
      true:
        a2 = a1:(a1<0)             <-127, -1>      <-1, -1>
        a3 = a2 + 1                <-126, 0>       <0, 9>
      false:
        a4 = a1:(a1≥0)             <0, 127>        <0, 9>
        c0 = a4                    <0, 127>        <0, 9>
    a5 = φ(a3,a4)                  <-126, 127>     <0, 9>
    b0 = array[a5]                 array's bounds are [0:9]
• a2 and a4 are defined by range-refinement functions.

What to do with Loops?
• Finding the fixed-point around back edges will often saturate data-ranges.
• Example
    a = 0
    for (y = 1; y < 100; y++)
      a = a + 5;

  In SSA form:
    a0 = 0
    y0 = 1
    y1 = φ(y0, y2)
    a1 = φ(a0, a3)
    y1 < 100
    a2 = a1 + 5
    y2 = y1 + 1

  Successive ranges shown on the slide as the back edge is iterated:
    y: 1..1, 1..2, 1..3, 1..5, 1..6
    a: 0..0, 0..5, 0..10, 0..20, 0..25

Our Loop Solution
• Find the closed-form solutions to commonly occurring sequences.
  – A sequence is a mutually dependent group of instructions.
• Use the closed-form solutions to determine final ranges.

Finding the Closed-Form Solution
    a = 0                      <0,0>
    for i = 1 to 10
      a = a + 1                <1,460>
      for j = 1 to 10
        a = a + 2              <3,480>
      for k = 1 to 10
        a = a + 3              <24,510>
    ... = a + 4                <510,510>
• Non-trivial to find the exact ranges
• Can easily find conservative range of <0,510>

Solving the Linear Sequence
    a = 0
    for i = 1 to 10            <1,10>
      a = a + 1
      for j = 1 to 10          <1,100>
        a = a + 2
      for k = 1 to 10          <1,100>
        a = a + 3
    ... = a + 4
• Figure out the iteration count of each loop.
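The linear-sequence procedure that this slide and the following ones walk through (iteration counts, per-instruction contributions, sum, then union with the initial value) could be sketched in C roughly as below. This is a minimal sketch that follows the slide's arithmetic literally, not the Bitwise implementation: the names `Range`, `Increment`, `range_add`, `range_mul_nonneg`, `range_union`, and `solve_linear_sequence` are hypothetical, iteration counts are assumed to be supplied already, increments are assumed to be non-negative constants, and overflow of `long` is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Closed integer data-range <lo, hi>, written as on the slides. */
typedef struct { long lo, hi; } Range;

/* <a,b> + <c,d> = <a+c, b+d> */
Range range_add(Range x, Range y) {
    return (Range){ x.lo + y.lo, x.hi + y.hi };
}

/* Product of two ranges whose bounds are all non-negative, so the
   extremes are simply lo*lo and hi*hi (assumption of this sketch). */
Range range_mul_nonneg(Range x, Range y) {
    assert(x.lo >= 0 && y.lo >= 0);
    return (Range){ x.lo * y.lo, x.hi * y.hi };
}

/* Data-range union: the smallest range that covers both operands. */
Range range_union(Range x, Range y) {
    return (Range){ x.lo < y.lo ? x.lo : y.lo,
                    x.hi > y.hi ? x.hi : y.hi };
}

/* One "a = a + step" instruction plus how often it may execute. */
typedef struct {
    Range iterations;  /* iteration count of the enclosing loop nest */
    long  step;        /* constant added on each execution */
} Increment;

/* Conservative range for the accumulator over the whole sequence:
   sum each instruction's contribution (iterations * step), then take
   the union with the initial value, as on the slide. */
Range solve_linear_sequence(Range initial, const Increment *incs, size_t n) {
    Range total = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        Range step = { incs[i].step, incs[i].step };
        total = range_add(total, range_mul_nonneg(incs[i].iterations, step));
    }
    return range_union(total, initial);
}
```

With the slide's inputs (iteration counts <1,10>, <1,100>, <1,100> and steps 1, 2, 3, starting from <0,0>), the contributions sum to <6,510> and the union with <0,0> gives <0,510>. Taking a union with the initial value rather than adding it matches this example, where the accumulator starts at zero; a nonzero start would presumably need the initial value added to the sum as well.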
Solving the Linear Sequence
    a = 0
    for i = 1 to 10            <1,10>
      a = a + 1                <1,10>*<1,1>=<1,10>
      for j = 1 to 10          <1,100>
        a = a + 2              <1,100>*<2,2>=<2,200>
      for k = 1 to 10          <1,100>
        a = a + 3              <1,100>*<3,3>=<3,300>
    ... = a + 4                (<1,10>+<2,200>+<3,300>) ∪ <0,0> = <0,510>
• Find out how much each instruction contributes to the sequence using the iteration count.
• Sum all the contributions together, and take the data-range union with the initial value.

Results
• Standalone Bitwise compiler.
  – Bits cut from scalar variables
  – Bits cut from array variables
• With the DeepC silicon compiler.

Percentage of Original Scalar Bits
[Figure: bar chart of percentage of bits remaining (0 to 100) per benchmark, comparing "with Bitwise" against "dynamic profile". Benchmarks: softfloat, adpcm, bubblesort, life, intmatmul, jacobi, median, mpegcorr, convolve, histogram, intfir, parity, pmatch, sor.]

Percentage of Original Array Bits
[Figure: bar chart of percentage of bits remaining (0 to 100) per benchmark, comparing "with Bitwise" against "dynamic profile". Same benchmarks as the scalar chart.]

DeepC Compiler Targeted to FPGAs
    C/Fortran program
    → Suif Frontend
    → Pointer alias and other high-level analyses
    → Bitwidth Analysis
    → Raw parallelization
    → MachSuif Codegen
    → DeepC specialization
    → Verilog
    → Traditional CAD optimizations
    → Physical Circuit

FPGA Area
[Figure: bar chart of area (CLB count, 0 to 2000) per benchmark, "Without bitwise" versus "With bitwise". Benchmarks (main datapath width): adpcm (8), bubblesort (32), convolve (16), histogram (16), intfir (32), intmatmul (16), jacobi (8), life (1), median (32), mpegcorr (16), newlife (1), parity (32), pmatch (32), sor (32).]

FPGA Clock Speed (50 MHz Target)
[Figure: bar chart of XC4000-09 clock speed (MHz, 0 to 150) per benchmark, "Without bitwise" versus "With bitwise". Benchmarks: adpcm, bubblesort, convolve, histogram, intfir, intmatmul, jacobi, life, median, mpegcorr, newlife, parity, pmatch, sor.]

Power Savings
[Figure: bar chart of average dynamic power (mW, 0 to 5), "Without bitwidth analysis" versus "With bitwidth analysis". Benchmarks: bubblesort, histogram, jacobi, pmatch.]

Related Work
• Data-range propagation for branch prediction [Patterson]
• Symbolic data-range analysis [Rugina et al.]
• Bitwidth propagation [Ananian]
• Bit-vector propagation [Razdan, Budiu et al.]

Summary
• Developed Bitwise: a scalable bitwidth analyzer
  – Standard data-flow analysis
  – Loop analysis
  – Incorporate pointer analysis
• Demonstrate savings when targeting silicon from high-level languages
  – 57% less area
  – up to 86% improvement in clock speed
  – less than 50% of the power

Power Savings
• C → ASIC
  – IBM SA27E process
    • 0.15 micron drawn
  – 200 MHz
• Methodology
  – C → RTL
  – RTL simulation → Register switching activity
  – Synthesis reports dynamic power

Mismatched Bitwidths
• When operands of an instruction are of differing sizes
  – type conversion instructions are added, converting both operands
    • to an integer of the widest of the two, and
    • with the appropriate sign
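To make the last step concrete, here is a hedged C sketch of how a data-range might be turned into a bitwidth, and how two operands of differing widths might be brought to a common type. Everything here is illustrative: `Range`, `Width`, `bits_unsigned`, `width_of_range`, and `common_width` are hypothetical names, and the sign rule (the result is signed if either operand is signed, and an unsigned operand gains one bit when viewed as signed) is an assumption of this sketch, since the slides say only "the appropriate sign".

```c
#include <assert.h>
#include <stdbool.h>

/* Closed integer data-range <lo, hi>, written as on the slides. */
typedef struct { long lo, hi; } Range;

/* A bitwidth together with its signedness. */
typedef struct { unsigned bits; bool is_signed; } Width;

/* Number of bits needed to hold the unsigned value v (at least 1). */
unsigned bits_unsigned(unsigned long v) {
    unsigned n = 1;
    while (v >>= 1) n++;
    return n;
}

/* Smallest width that holds every value in r: unsigned when the range
   is non-negative, two's complement otherwise. */
Width width_of_range(Range r) {
    Width w;
    assert(r.lo <= r.hi);
    if (r.lo >= 0) {
        w.is_signed = false;
        w.bits = bits_unsigned((unsigned long)r.hi);
    } else {
        /* n-bit two's complement covers [-2^(n-1), 2^(n-1) - 1], so the
           magnitude bits must hold both -(lo + 1) and max(hi, 0). */
        unsigned long neg = (unsigned long)(-(r.lo + 1));
        unsigned long pos = r.hi > 0 ? (unsigned long)r.hi : 0UL;
        unsigned long m = neg > pos ? neg : pos;
        w.is_signed = true;
        w.bits = (m == 0 ? 0u : bits_unsigned(m)) + 1u;  /* +1 sign bit */
    }
    return w;
}

/* Common type for two operands of differing sizes: the wider of the
   two, under this sketch's assumed sign rule. */
Width common_width(Width a, Width b) {
    Width w;
    w.is_signed = a.is_signed || b.is_signed;
    unsigned abits = a.bits + ((w.is_signed && !a.is_signed) ? 1u : 0u);
    unsigned bbits = b.bits + ((w.is_signed && !b.is_signed) ? 1u : 0u);
    w.bits = abits > bbits ? abits : bbits;
    return w;
}
```

Under these assumptions the range <0,13> from the data-range slide needs 4 unsigned bits, and the clamped range <-32768,32767> needs 16 signed bits, which agrees with the annotations on the earlier slides.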