Load-Reuse Analysis: Design and Evaluation
Rastislav Bodik, Rajiv Gupta, Mary Lou Soffa
PLDI'99
Presented by Sue Ann Hong, 4/11/2006

Register Promotion
1. Load-reuse analysis
   This paper: find as many reuses as possible
   Identify loads & stores to the same addr on a path
   a1 = a4 on path p1? a2 = a4 on path p2?
2. Alias analysis
   Make sure load value isn't changed
   a0 = a4?
3. Program transformation
   e.g. partial redundancy elimination
   hoist 'load a4' to path p3
[Figure: Load-reuse Example. Control-flow graph with paths p1, p2, p3 and the memory operations load a1, store a2, store a0, load a4.]

Related Work
• Lexical load-reuse analysis
  – Only loads with identical names
• Value numbering (Remember the hash tables…)
  – Only copy assignments
    x = 5;
    t1 = x;
    t2 = x;
Example:
    for (i=0; i < N-2; i++) { A[i+2] = A[i] }

Paper Does This
Compare:
• The ideal run-time reuse finder ("ground truth")
• Its load-reuse algorithm
• "Profile-based Estimator"
How many reuses they find, on SPEC95, of course…

Evaluating the Algorithm: Comparing to Ideal Reuse Analysis
• Ideal Reuse Analysis (dynamic = run-time)
  – Generally undecidable, so use simulation:
    (Simple) remember a little bit of access history for each memory inst and find prior load or store
    Note: they do show empirically that # of accesses in history > 1 doesn't matter too much.
  – Want tight upper bound
    Ignore possible (input-dependent, sporadic) reuses as noise
    Old history = expensive, tends to be …
    while ( c = read() ) { … = hashtbl[ hash(c) ]; }
  – Still, how input-independent is the simulation? ≤ 18%
• Identified reuse level (SPEC95)
  – See p67. Tall bars… Something like 55% of overall loads are reuses. So reuse analysis is probably worth it.

Load Reuse Analysis
A must-alias analysis
Value Name Graph (Data-flow analysis)
An addr value flows between two addr exprs if they access the same addr (they're equivalent).
3 steps for 3 goods
1. Symbolic interpretation
   Find equivalences after algebraic simplification; create synthetic names
     store(2x+12);     '2x+12'
     y = 2x + 8;
     z = load(y+4);    '2x+12'
   Remember the hash tables…
2. Symbolic value numbering
   Using the synthetic names and backward flow from temps, find equivalences due to assignment to temps
3. Data-flow analysis
   Connect the equivalences from prev steps along specific paths

Profile-based Estimators
• Intuition
  – Reuse analysis: which path contains what reuses? f(p_i) ∈ Z
  – Ideal analysis: how many reuses overall? n
  – n = Σ_i [f(p_i) * how many times path p_i is used]
    Estimator; use profiling
• Crazy: 5 different estimators
  – Lower and upper bounds to compensate for edge profiling errors

Experiments
• Figure 8 on p75. How do you interpret that thing?? How possible aliasing could make reuses useless.
• Ideal found ~55% of loads have reuse
• Their analysis found ~80% of those.
• Other than that, the paper doesn't really have conclusions.
• What happened after this paper (1999)? Ask the next dude.

Discussions from class
• Bodik's notion of defining and comparing to ideal performance is different from the usual approach of giving overall optimization performance. In fact, he's famous for not giving numbers for run-time optimization.
• Is this orthogonal to cache optimization? Yes. The paper doesn't address cache/locality-related issues.
• I probably shouldn't have laughed at the author for saying "Such an amount of registers [>34] will be soon available in general-purpose processors."
• Peter's PowerBook was able to display my presentation, in contrast to my Sony.
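Sketch for the "Ideal Reuse Analysis" slide: a minimal Python simulation of the "remember access history for each memory inst" idea, with a history depth of one access. It is not the paper's simulator. The trace format, the name `count_reuses`, and the rule that a store invalidates other instructions' records of the same address are all assumptions made for illustration.

```python
def count_reuses(trace):
    """Count dynamic loads whose address matches some instruction's last access.

    trace: iterable of (instr_id, kind, addr), kind is "load" or "store".
    Assumed format, not taken from the paper.
    """
    last_addr = {}  # instr_id -> address of its most recent access (depth-1 history)
    loads = reuses = 0
    for instr, kind, addr in trace:
        if kind == "load":
            loads += 1
            # Reuse if any memory instruction last touched this address.
            if addr in last_addr.values():
                reuses += 1
        else:
            # A store may change the value, so older records of this
            # address are dropped; the store itself becomes the source.
            for stale in [i for i, a in last_addr.items() if a == addr]:
                del last_addr[stale]
        last_addr[instr] = addr
    return loads, reuses


# Hypothetical loop: for i in 0..3: A[i+1] = A[i]
trace = []
for i in range(4):
    trace.append(("L", "load", i))
    trace.append(("S", "store", i + 1))

loads, reuses = count_reuses(trace)
print(loads, reuses)  # 4 3
```

With a reuse distance of two iterations, as in `A[i+2] = A[i]`, this depth-1 version would miss the reuse, which is where a longer history would come in.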
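Sketch for step 1 of the "Load Reuse Analysis" slide (symbolic interpretation): a small Python illustration of how algebraic simplification can give the store and the load the same synthetic name '2x+12'. The representation of address expressions as linear forms and the helper names `make_linear`, `add_linear`, and `synthetic_name` are hypothetical; the paper's actual representation may differ.

```python
def make_linear(const=0, **coeffs):
    """Canonical linear form: sorted (variable, coefficient) pairs plus a constant."""
    terms = tuple(sorted((v, c) for v, c in coeffs.items() if c != 0))
    return (terms, const)


def add_linear(a, b):
    """Add two linear forms and return the result in canonical form."""
    coeffs = dict(a[0])
    for var, c in b[0]:
        coeffs[var] = coeffs.get(var, 0) + c
    return make_linear(a[1] + b[1], **coeffs)


def synthetic_name(expr):
    """Printable name; equal canonical forms get equal names."""
    terms, const = expr
    parts = [f"{c}{v}" for v, c in terms]
    if const or not parts:
        parts.append(str(const))
    return "+".join(parts)


# store(2x+12);  y = 2x + 8;  z = load(y+4);
store_addr = make_linear(12, x=2)
y = make_linear(8, x=2)
load_addr = add_linear(y, make_linear(4))  # substitute y, then simplify

# Hash table from synthetic name to the accesses that share it.
names = {}
for label, addr in [("store", store_addr), ("load", load_addr)]:
    names.setdefault(synthetic_name(addr), []).append(label)

print(names)  # {'2x+12': ['store', 'load']}
```

The dictionary plays the role of the hash table the slide alludes to: two accesses that land in the same bucket are candidates for reuse.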
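Sketch for the "Profile-based Estimators" slide: the sum n = Σ_i [f(p_i) * how many times path p_i is used] written out in Python. The path names and counts below are made up, and the sketch assumes exact per-path execution counts; the paper works from edge profiles and therefore reports lower and upper bounds instead of a single number.

```python
def estimate_reuses(reuses_per_path, path_counts):
    """n = sum over paths of f(p_i) * (number of times p_i was executed)."""
    return sum(
        f * path_counts.get(path, 0)
        for path, f in reuses_per_path.items()
    )


# Hypothetical data: f(p_i) from the static analysis, counts from a profile.
reuses_per_path = {"p1": 2, "p2": 0, "p3": 1}
path_counts = {"p1": 100, "p2": 40, "p3": 10}

n = estimate_reuses(reuses_per_path, path_counts)
print(n)  # 210
```

Dividing n by the ideal analysis's dynamic reuse count would give the kind of "found ~80% of those" figure quoted on the Experiments slide.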