Load-Reuse Analysis: Design and Evaluation Sue Ann Hong 4/11/2006

Load-Reuse Analysis:
Design and Evaluation
Rastislav Bodik, Rajiv Gupta, Mary Lou Soffa
PLDI'99
Presented by
Sue Ann Hong
4/11/2006
Load-reuse Example
Register Promotion
1. Load-reuse analysis
– Identify loads & stores to the same addr on a path
– a1 = a4 on path p1? a2 = a4 on path p2?
– This paper: find as many reuses as possible
2. Alias analysis
– Make sure load value isn’t changed
– a0 = a4?
3. Program transformation
– e.g. partial redundancy elimination
– hoist ‘load a4’ to path p3
[Figure: control-flow graph with paths p1, p2, p3 and nodes ‘load a1’, ‘store a2’, ‘store a0’, ‘load a4’]
Related Work
• Lexical load-reuse analysis
Only loads with identical names
• Value numbering
Only copy assignments
Remember the hash tables…
x = 5;
t1 = x;
t2 = x;
for (i=0; i < N-2; i++) { A[i+2] = A[i] }
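In the loop on this slide, the value loaded from A[i] in iteration i is the value stored to A[i+2] two iterations earlier, so the two address expressions name the same location even though they are textually different. The Python sketch below is a hand-written illustration of what register promotion could do with that fact, not the paper's output. The function names and the load counters are made up for this example.

```python
def copy_naive(a):
    """The loop as written: one load of a[i] per iteration."""
    a = list(a)
    loads = 0
    for i in range(len(a) - 2):
        value = a[i]          # load
        loads += 1
        a[i + 2] = value      # store
    return a, loads


def copy_promoted(a):
    """Same loop with the loop-carried reuse kept in two 'registers'."""
    a = list(a)
    if len(a) < 3:
        return a, 0
    r0, r1 = a[0], a[1]       # the only two loads that remain
    loads = 2
    for i in range(len(a) - 2):
        a[i + 2] = r0         # store the value that a[i] would have loaded
        # Next iteration needs a[i+1] (in r1), then a[i+2] (just stored, r0).
        r0, r1 = r1, r0
    return a, loads
```

Both versions produce the same array; only the number of loads differs.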
Paper Does This
Compare:
– The ideal run-time reuse finder (“ground truth”)
– Its load-reuse algorithm (“Profile-based Estimator”)
How many reuses they find, on SPEC95, of course…
Evaluating the Algorithm
Comparing to Ideal Reuse Analysis
• Ideal Reuse Analysis (dynamic = run-time)
– Generally undecidable → use simulation:
Note: they do show empirically that # of accesses in history > 1 doesn’t matter too much.
(Simple) remember access history for each memory inst and find prior load or store
a little bit of
– Want tight upper bound
old history = expensive, tends to be
Ignore possible (input-dependent, sporadic) reuses as noise
while ( c = read() ) { … = hashtbl[ hash(c) ]; }
– Still, how input-independent is the simulation?
≤ 18%
• Identified reuse level (SPEC95)
– See p67. Tall bars… Something like 55% of overall loads are
reuses.
So reuse-analysis is
probably worth it.
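The simulation idea above (remember the last access of each memory instruction, then look for a prior load or store to the same address) can be sketched roughly as follows. This is a minimal sketch under assumptions, not the paper's simulator. The trace format `(inst, kind, addr)`, the history depth of 1, and the rule that a later store to the address kills older history entries are all choices made for this example.

```python
def count_reuses(trace):
    """Count loads whose address matches some instruction's last access.

    trace: iterable of (inst, kind, addr), kind is "load" or "store".
    Returns (number of loads, number of loads counted as reuses).
    """
    last_access = {}      # inst id -> (addr, time) of its most recent access
    last_store_time = {}  # addr -> time of the most recent store to it
    loads = reuses = 0
    for time, (inst, kind, addr) in enumerate(trace):
        if kind == "load":
            loads += 1
            killed_at = last_store_time.get(addr, -1)
            # A remembered access is usable only if no store to addr came after it
            # (the store itself counts as a usable source).
            if any(a == addr and t >= killed_at for a, t in last_access.values()):
                reuses += 1
        else:
            last_store_time[addr] = time
        last_access[inst] = (addr, time)   # history depth 1: overwrite
    return loads, reuses
```

With a history of one access per instruction, a reusable value can be forgotten when the instruction that produced it moves on to another address, which is the trade-off the note about history depth refers to.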
Load Reuse Analysis
A must-alias analysis
Value Name Graph (Data-flow analysis)
An addr value flows between two addr exprs if they
access the same addr (i.e., the two exprs are equivalent).
3 steps for 3 goods
1. Symbolic interpretation
store(2x+12); → ‘2x+12’
y = 2x + 8;
z = load (y+4); → ‘2x+12’
Find equivalences after algebraic simplification;
Create synthetic names
Remember the hash tables…
2. Symbolic value numbering
Use the synthetic names, and backward flow from temps,
find equivalences due to assignment to temps
3. Data-flow analysis
Connect the equivalences from prev steps along specific paths
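To make the ‘2x+12’ example concrete, here is a small Python sketch of naming address expressions through a hash table after algebraic simplification. It is an illustration under assumptions, not the paper's algorithm. It handles only linear expressions, it folds the treatment of temporaries into a simple substitution instead of a backward flow, and it omits the path-sensitive data-flow step. All helper names (`make`, `add`, `substitute`, `NameTable`) are hypothetical.

```python
def make(coeffs, const):
    """Canonical linear form: sorted non-zero (variable, coefficient) pairs + constant."""
    return (tuple(sorted((v, c) for v, c in coeffs.items() if c != 0)), const)


def add(e1, e2):
    """Sum of two canonical linear expressions."""
    coeffs = dict(e1[0])
    for v, c in e2[0]:
        coeffs[v] = coeffs.get(v, 0) + c
    return make(coeffs, e1[1] + e2[1])


def substitute(expr, defs):
    """Replace temporaries by their defining expressions (defs: temp -> expression)."""
    result = make({}, expr[1])
    for v, c in expr[0]:
        if v in defs:
            inner = substitute(defs[v], defs)
            term = make({u: c * k for u, k in inner[0]}, c * inner[1])
        else:
            term = make({v: c}, 0)
        result = add(result, term)
    return result


class NameTable:
    """Hash table from canonical expression to a synthetic name."""

    def __init__(self):
        self.names = {}

    def name(self, expr):
        return self.names.setdefault(expr, f"n{len(self.names)}")


# The slide's example: store(2x+12); y = 2x + 8; z = load(y+4)
defs = {"y": make({"x": 2}, 8)}
table = NameTable()
store_name = table.name(substitute(make({"x": 2}, 12), defs))
load_name = table.name(substitute(make({"y": 1}, 4), defs))
```

Here `store_name` and `load_name` come out equal, because `y+4` simplifies to the same canonical form as `2x+12`.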
Profile-based Estimators
Intuition
– Reuse-analysis → which path contains what
reuses: f(p_i) ∈ ℤ
– Ideal analysis → how many reuses overall? n
– n = Σ_i [f(p_i) × how many times path p_i is used]
Estimator; use profiling
• Crazy 5 different estimators
→ lower and upper bounds to compensate
for edge profiling errors
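The weighted sum in the intuition above is easy to state in code. The Python sketch below computes n from per-path reuse counts and path execution counts, and adds one simple way to bound a path's count from an edge profile (a path cannot run more often than its least-executed edge). That bound is a standard observation used here for illustration; it is not claimed to be one of the paper's five estimators. The function names and sample numbers are made up.

```python
def estimate_reuses(reuses_per_path, path_counts):
    """n = sum over paths of f(p_i) * (times p_i was executed)."""
    return sum(reuses_per_path[p] * path_counts[p] for p in reuses_per_path)


def path_count_upper_bound(path_edges, edge_counts):
    """A path executes at most as often as its least-executed edge."""
    return min(edge_counts[e] for e in path_edges)


# Hypothetical numbers: p1 and p2 each carry one reuse, p3 carries none.
reuses_per_path = {"p1": 1, "p2": 1, "p3": 0}
path_counts = {"p1": 30, "p2": 50, "p3": 20}
total = estimate_reuses(reuses_per_path, path_counts)
```

When only edge counts are available, plugging an upper (or lower) bound on each path count into the same sum gives an upper (or lower) estimate of n.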
Experiments
• Figure 8 on p75.
How do you interpret that thing??
How possible aliasing could make reuses useless.
• Ideal found ~55% of loads have reuse
• Their analysis found ~80% of those.
• Other than that, the paper doesn’t really have
conclusions.
• What happened after this paper (1999)?
Ask the next dude.
Discussions
from class
• Bodik’s notion of defining and comparing to ideal
performance is different from the usual approach of
giving overall optimization performance. In fact, he’s
famous for not giving numbers for run time optimization.
• Is this orthogonal to cache optimization?
Yes. The paper doesn’t address
cache/locality-related issues.
• I probably shouldn’t have laughed at the author for
saying “Such an amount of registers [>34] will be soon
available in general-purpose processors.” Peter’s
PowerBook was able to display my presentation in
contrast to my Sony.