Register Allocation by Puzzle Solving

Jens Palsberg
UCLA Computer Science Department
University of California, Los Angeles
palsberg@ucla.edu

This talk
• Register allocation
• Aliased registers and pre-coloring
• Optimal live-range splitting produces elementary programs
• Elementary programs have elementary interference graphs
• Coloring elementary graphs is the same as solving puzzles
• A linear-time puzzle solving algorithm
• Spilling
• Experimental results

Copyright @ 2007 UCLA

Register allocation: a collection of puzzles

A Compiler
source language -> parser -> intermediate representation -> code generator -> machine code

A Better Compiler
source language -> parser -> intermediate representation -> code generator with a register allocator -> machine code

What is Register Allocation?
    A = 10
    B = 20
    C = A + 30
    Print C + 40 + B

Assume we have two registers
Register allocation = liveness analysis + graph coloring

    Statement           Live after
    A = 10              A
    B = 20              B, A
    C = A + 30          B, C
    Print C + 40 + B

[Figure: interference graph on A, B, C]
[Figure: the same graph with colors]

After Register Allocation

    Before              After
    A = 10              R1 = 10
    B = 20              R2 = 20
    C = A + 30          R1 = R1 + 30
    Print C + 40 + B    Print R1 + 40 + R2

[Figure: colored interference graph on A, B, C]

Core (Spill-free) Register Allocation Problem
Instance: a program P and a number K of available registers.
Problem: can each variable of P be mapped to one of the K registers such that variables with interfering live ranges are assigned to different registers?
Theorem (Chaitin et al., 1981): NP-complete

A program in SSA form has a chordal interference graph
Proved by three groups independently in 2005:
• Bouchez (ENS Lyon)
• Brisk et al. (UCLA)
• Hack (U. Karlsruhe)
A chordal graph can be colored in linear time

Aliased registers on the Pentium

    32 bits   EAX      EBX      ECX      EDX
    16 bits   AX       BX       CX       DX
    8 bits    AH  AL   BH  BL   CH  CL   DH  DL

Pre-coloring
[figure]

Weighted Graphs and Aligned 1-2-coloring
Nodes have weight one or two
Two numbers 2i and 2i+1 are aligned
Aligned 1-2-coloring:
• assigns a color to every vertex of weight one
• assigns two aligned colors to every vertex of weight two
Partial aligned 1-2-coloring: a partial function; models pre-coloring

Aligned 1-2-coloring Extension
Instance: a number 2K of colors, a weighted graph G, and a partial aligned 1-2-coloring C of G
Problem: Extend C to an aligned 1-2-coloring of G.
Special cases:
• Aligned 1-2-coloring: no vertex is pre-colored
• Coloring Extension: all vertices have weight one
• Coloring: no vertex is pre-colored; all vertices have weight one

Related work

    Problem                          General       Chordal       Interval      Elementary
    Aligned 1-2-coloring extension   NP-complete   NP-complete   NP-complete   Linear time [this paper]
    Aligned 1-2-coloring             NP-complete   NP-complete   NP-complete   Linear time [this paper]
    Coloring extension               NP-complete   NP-complete   NP-complete   Linear time [this paper]
    Coloring                         NP-complete   Linear time   Linear time   Linear time

From strict programs to elementary programs
Optimal live-range splitting: strict program -> elementary program
Used by Appel and George (PLDI 2001)
[Figure: a basic block … Statement1, parallel copy, Statement2]

A program P is an elementary program if:
1. P is strict
2. P is in static single assignment form
3. For any variable v of P, LR(v) contains at most one program point outside the basic block that contains def(v)
4. If two variables u, v of P interfere, then either def(u) = def(v), or kill(u) = kill(v)
5. If two variables u, v of P interfere, then either LR(u) ⊆ LR(v), or LR(v) ⊆ LR(u)

Interference graph of the example program
[figure]

A clique substitution of P3
P3 is a path with three vertices
[figure]

Elementary graphs
Definition: G is an elementary graph if and only if every connected component of G is a clique substitution of P3
Theorem: An elementary program has an elementary interference graph.

Six classes of graphs
[figure]

A puzzle board
[figure]

The six kinds of pieces
[figure]

A puzzle and a solution
[figure]

From graphs to puzzles
Given P_{X,Y,Z} we build a puzzle:

    Graph          Puzzle
    Vertex         piece
    Color          column
    X-clique       upper row
    Y-clique       both rows
    Z-clique       lower row
    Precoloring    some pieces are on the board already

Theorem: Aligned 1-2-coloring extension for clique substitutions of P3 and puzzle solving are equivalent under linear-time reductions

A rule, a pattern, and a mismatch
[figure]

Example program
[figure]

Counterexample 1
Lesson: use a size-2 piece before two size-1 pieces

Counterexample 2
Lesson: statements 7-10 must come before statements 11-14

Counterexample 3
Lesson: statement 15 must come before statements 11-14

Counterexample 4
Lesson: the order of statements 11-14 is crucial

From graph coloring to puzzle solving
Theorem: A puzzle is solvable if and only if our program succeeds on the puzzle
Our puzzle solving program runs in linear time

Spilling
• Visit each puzzle once
• If the puzzle is not solvable, then remove some pieces and try to solve again
• Each time we remove a piece, we also remove all other pieces that stem from the same variable in the original program

Benchmark characteristics

    Benchmark               LoC       Asm       btcode
    ASCI Purple:smg2000     74,875    73,039    303,037
    SPEC2000:175.vpr        70,253    52,917    173,475
    SPEC2000:188.ammp       54,335    35,567    149,245
    MallocBench:espresso    52,853    45,041    250,770
    SPEC2000:197.parser     49,388    32,849    163,025
    SPEC2000:164.gzip       39,157    8,130     46,188
    (six more)              …         …         …
    Total                   409,540   286,900   1,345,898

Puzzles and the number of times we solve them

    Benchmark               Puzzles   Avg    Max   Once
    ASCI Purple:smg2000     52,791    1.33   8     33,822
    SPEC2000:175.vpr        47,276    1.10   10    45,575
    SPEC2000:188.ammp       33,428    1.09   9     28,515
    MallocBench:espresso    43,791    1.06   3     38,925
    SPEC2000:197.parser     30,868    1.05   4     28,992
    SPEC2000:164.gzip       7,840     1.06   3     6,718
    (six more)              …         …      …     …
    Total                   251,428   1.13   10    213,411

Execution time of the generated code vs. gcc
[figure]

Conclusion
• If you want to do register allocation for the Pentium, your problem is to solve a collection of puzzles
• Fast compilation time, competitive code quality
• Future work: compare with run times of code generated by Smith-Ramsey-Holloway (PLDI 2004)
• Future work: compare with Appel-George (PLDI 2001)
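
Worked sketch: liveness analysis + graph coloring on the A, B, C example

The two-register example from the "What is Register Allocation?" slides can be replayed in a few lines of Python. This is only an illustrative sketch, not the puzzle solver from the talk: the statement encoding `PROGRAM` and the helper names `live_after`, `interference_edges`, and `color` are assumptions made up for this example, and the coloring is a plain greedy pass in definition order, which happens to suffice for this straight-line program.

```python
# Hypothetical encoding of the slide example: (defined variable, used variables).
PROGRAM = [
    ("A", []),            # A = 10
    ("B", []),            # B = 20
    ("C", ["A"]),         # C = A + 30
    (None, ["C", "B"]),   # Print C + 40 + B
]


def live_after(program):
    """Backward liveness for straight-line code: live set after each statement."""
    live = set()
    result = [None] * len(program)
    for i in range(len(program) - 1, -1, -1):
        result[i] = set(live)          # variables live just after statement i
        defined, used = program[i]
        live.discard(defined)          # a definition ends the live range going backward
        live.update(used)              # every use makes the variable live before it
    return result


def interference_edges(live_sets):
    """Two variables interfere if they are live at the same program point."""
    edges = set()
    for live in live_sets:
        names = sorted(live)
        for i, u in enumerate(names):
            for v in names[i + 1:]:
                edges.add((u, v))
    return edges


def color(variables, edges, k):
    """Greedy coloring with k registers; returns None if some variable gets no register."""
    neighbours = {v: set() for v in variables}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    assignment = {}
    for v in variables:
        taken = {assignment[n] for n in neighbours[v] if n in assignment}
        free = [r for r in range(1, k + 1) if r not in taken]
        if not free:
            return None
        assignment[v] = free[0]
    return {v: f"R{r}" for v, r in assignment.items()}


if __name__ == "__main__":
    live = live_after(PROGRAM)
    edges = interference_edges(live)
    print(live)
    print(sorted(edges))
    print(color(["A", "B", "C"], edges, 2))
```

With two registers, A and C share R1 and B takes R2, which matches the rewritten program on the "After Register Allocation" slide. With a single register the sketch returns None, which is the point at which a real allocator would have to spill.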