Eliminating Memory References Joshua Dunfield Alina Oprea

Problem Definition – Register Promotion
Memory-reuse analysis
find loads and stores that access the same
address and the execution path along which the
reuse exists
Program transformation
promote values from memory to registers
replace redundant loads and stores with register
references
Different Approaches
Memory-reuse analysis
1. Using the SSAPRE algorithm
2. Modeled as a data flow problem
Program transformation
Formulated as a PRE problem
[Figure: control-flow example with three paths (path 1, path 2, path 3) containing load a, load b, and store c, followed by load d; a = d on path 1, b = d on path 2, and c != d]
Register promotion using the SSA representation
Paper: “Register Promotion by Sparse Partial Redundancy Elimination of Loads and Stores” – Lo, Chow, Kennedy, Liu, Tu
Register promotion = 2 problems:
1. PRE of loads
2. PRE of stores
Duality between loads and stores
Loads – as ordinary expressions with respect to
redundancy (have to delete the latter occurrences)
Stores – reverse (have to delete the earlier
occurrences)
[Figure: three successive occurrences of load a, and three successive occurrences of store a]
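The duality can be sketched on straight-line code. This is a minimal illustration, assuming a toy list of `(kind, address)` pairs with exact address names and no aliasing; the function names are hypothetical and this is not the SSAPRE algorithm itself.

```python
def redundant_loads(ops):
    """Forward scan: a load is redundant if the same address was loaded
    earlier with no store to it in between (the later occurrence goes)."""
    seen, redundant = set(), []
    for i, (kind, addr) in enumerate(ops):
        if kind == "load":
            if addr in seen:
                redundant.append(i)
            seen.add(addr)
        else:
            # Conservatively forget the address after a store.
            seen.discard(addr)
    return redundant


def redundant_stores(ops):
    """Backward scan: a store is redundant if the same address is stored
    again later with no load of it in between (the earlier occurrence goes)."""
    overwritten, redundant = set(), []
    for i in range(len(ops) - 1, -1, -1):
        kind, addr = ops[i]
        if kind == "store":
            if addr in overwritten:
                redundant.append(i)
            overwritten.add(addr)
        else:
            # A load needs the most recent earlier store, so keep it.
            overwritten.discard(addr)
    return sorted(redundant)


loads = [("load", "a")] * 3
stores = [("store", "a")] * 3
print(redundant_loads(loads))    # indices of the later loads
print(redundant_stores(stores))  # indices of the earlier stores
```

For three loads of a, the last two are flagged; for three stores of a, the first two are flagged.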
PRE of loads
Replace each store x ← expr by r ← expr; x ← r
where r is a pseudo-register
Apply the SSAPRE algorithm, but take into
account the occurrences of stores
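Effect of stores on loads:
[Figure: a store to x followed by load x; once the stored value is held in the pseudo-register r, the later load x can reuse r]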
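A minimal sketch of this rewriting on a single basic block, assuming a toy instruction format (`("store", addr, expr)`, `("load", dst, addr)`), no aliasing, and registers that are assigned only once. The `set` and `copy` opcodes and the `r_<addr>` naming are hypothetical; the real algorithm works across control flow through SSAPRE.

```python
def promote(block):
    """Split each store into r <- expr; x <- r, then let later loads of x
    reuse the pseudo-register instead of going to memory."""
    reg_of = {}  # address -> register currently holding its value
    out = []
    for op in block:
        if op[0] == "store":
            _, addr, expr = op
            r = f"r_{addr}"
            out.append(("set", r, expr))    # r <- expr
            out.append(("store", addr, r))  # x <- r
            reg_of[addr] = r
        elif op[0] == "load":
            _, dst, addr = op
            if addr in reg_of:
                out.append(("copy", dst, reg_of[addr]))  # register reference
            else:
                out.append(op)              # first access still loads
                reg_of[addr] = dst
        else:
            out.append(op)
    return out


block = [("store", "x", "a+b"), ("load", "t1", "x"), ("load", "t2", "x")]
for op in promote(block):
    print(op)
```

In this example both loads of x become register copies, so the rewritten block contains no load at all.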
Improving Code Motion in SSAPRE by Speculation
Speculation = inserting computations during
SSAPRE at Φ's where the computation is not
down-safe (anticipated)
Is not permitted by the original SSAPRE
2 strategies:
conservative speculation (when profile data is not
available)
profile-driven speculation
Speculation
Conservative Speculation
Move loop-invariant computations out of single-entry
loops (can perform worse if the body of the loop not
executed)
Profile-driven Speculation
The problem of determining the optimum code placement is
undecidable (the solution lies between no-speculate and fully-speculate)
Heuristic: do speculation at the granularity of the
connected components of the SSA graph (for each
connected component, either no-speculate or fully-speculate)
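The all-or-nothing choice per connected component can be illustrated with a simple cost comparison. This is only a sketch under assumed inputs: each component is given as a list of hypothetical `(inserted_count, removed_count)` pairs, where the counts are profile frequencies of the computations that speculation would insert and remove. The cost model in the paper may differ.

```python
def net_saving(component):
    """Executions saved if the whole component is speculated:
    removed computations minus inserted ones, weighted by profile counts."""
    return sum(removed - inserted for inserted, removed in component)


def choose_speculation(components):
    """Decide fully-speculate (True) or no-speculate (False) per component."""
    return [net_saving(c) > 0 for c in components]


hot_loop = [(10, 1000)]  # preheader runs 10 times, loop body 1000 times
cold_loop = [(10, 0)]    # loop body never executes
print(choose_speculation([hot_loop, cold_loop]))
```

The cold loop shows why conservative speculation can lose: hoisting adds 10 executions and removes none.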
PRE of Stores – SSU form
                  Loads                         Stores
Single            Assignment                    Use
Redundant         Available, later              Anticipated, earlier
Fully redundant   Dominated (by earlier load)   Post-dominated (by later store)
Where to factor   Merge points                  Split points
Factoring op      h3 ← Φ(h1, h2)                (h2, h3) ← Λ(h1)
Insertion points  Iterated DF                   Iterated post-DF
Movement          Backward                      Forward
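The insertion points for both forms can be computed with the same code, since the post-dominance frontier is the dominance frontier of the reversed CFG. The sketch below assumes a CFG given as a dict of successor lists in which every node is reachable from the entry; the function names are hypothetical, and the simple set-based dominator computation is chosen for brevity rather than speed.

```python
def dominators(succ, entry):
    """Iterative set-based dominators; returns (dom sets, predecessor map)."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, pred


def dominance_frontier(succ, entry):
    """DF(n) = nodes y where n dominates a predecessor of y
    but does not strictly dominate y."""
    dom, pred = dominators(succ, entry)
    df = {n: set() for n in succ}
    for y in succ:
        for p in pred[y]:
            for n in dom[p]:
                if n == y or n not in dom[y]:
                    df[n].add(y)
    return df


def iterated_frontier(df, start):
    """Close a set of nodes under the frontier relation."""
    result, work = set(), list(start)
    while work:
        for y in df[work.pop()]:
            if y not in result:
                result.add(y)
                work.append(y)
    return result


def reverse(succ):
    rev = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            rev[s].append(n)
    return rev


cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
merge_points = iterated_frontier(dominance_frontier(cfg, "entry"), {"a"})
split_points = iterated_frontier(dominance_frontier(reverse(cfg), "exit"), {"a"})
print(merge_points, split_points)
```

On this diamond-shaped CFG, an occurrence in block a places the SSA factoring operation at the merge point exit and the SSU factoring operation at the split point entry.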
Performance
PRE of loads reduces the number of loads by 25%
PRE of stores reduces the number of stores by 1%
Reason: a dead-store elimination algorithm had already been
applied
Speculation results:
conservative: same performance, or even worse in some
cases
profile-driven: 2% reduction in the # of loads and 0.5%
in the # of stores
Load-Reuse Analysis
Paper: “Load-Reuse Analysis: Design and
Evaluation” – Bodik, Gupta, Soffa
Modeled as a data-flow problem
3-fold contributions:
1. load-reuse analysis supporting indirect memory
accesses
2. simulation of the dynamic amount of load reuse
3. profile-based estimators: using edge profile
information to assign a dynamic weight to the static
load-reuse analysis
Framework
Load-Reuse Analysis
Uses Value Name Graph (VNG) representation –
keeps track of address expressions that compute
the same value
Traditionally – values identified by lexical name
VNG supports symbolic equivalences
Enhance VNG
handle indirect addressing
develop a sparse version (more space efficient)
VNG Example
name = address of last load
name = 2v + 10
u = store (v-2)
name = 2(*u) + 12
x = load (u)
name = 2x + 12
y = 2x + 8
name = y + 4
z = load (y+4)
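The backward substitution that produces these names can be sketched for the last two steps of the example. This is an illustration only, assuming address names are affine expressions `coef * var + const` over a single variable; the `Affine` type and `substitute` helper are hypothetical and much simpler than a real VNG.

```python
from typing import NamedTuple


class Affine(NamedTuple):
    """Symbolic address name of the form coef * var + const."""
    coef: int
    var: str
    const: int


def substitute(name, defined_var, rhs):
    """Rewrite `name` across the assignment defined_var = rhs,
    moving one statement backward in the code."""
    if name.var != defined_var:
        return name  # the assignment does not affect this name
    return Affine(name.coef * rhs.coef, rhs.var,
                  name.coef * rhs.const + name.const)


name = Affine(1, "y", 4)                          # z = load (y+4)
name = substitute(name, "y", Affine(2, "x", 8))   # across y = 2x+8
print(name)
name = substitute(name, "x", Affine(1, "*u", 0))  # across x = load (u)
print(name)
```

The first step yields 2x + 12 and the second 2(*u) + 12, where the indirect variable `*u` stands for one level of indirection.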
Load-Reuse Analysis (cont)
In computing symbolic names, do substitutions for
w iterations for each loop
Set the maximum number of indirection levels (0 or 1)
Find congruence classes (names that refer to the
same memory address)
Extract a VNG sparse representation that contains
only loads and stores
Solve the data flow problem on the sparse VNG
representation
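The final data flow step can be illustrated with a classic forward "must" analysis. This sketch assumes each congruence class has already been reduced to one address name, that distinct names never alias, and that a hypothetical `("clobber",)` operation stands for a store through an unknown pointer. It runs on an ordinary CFG rather than on a sparse VNG, so it is not the paper's algorithm.

```python
def transfer(avail, ops):
    """Apply one block's operations to the set of available addresses."""
    avail = set(avail)
    for op in ops:
        if op[0] in ("load", "store"):
            avail.add(op[1])   # value of this address is now in a register
        elif op[0] == "clobber":
            avail.clear()      # unknown store: forget everything
    return avail


def available_addresses(blocks, succ, entry):
    """Addresses whose value is available on every path into each block."""
    pred = {b: [] for b in blocks}
    for b, ss in succ.items():
        for s in ss:
            pred[s].append(b)
    universe = {op[1] for ops in blocks.values() for op in ops if len(op) > 1}
    avail_in = {b: set(universe) for b in blocks}
    avail_in[entry] = set()
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            outs = [transfer(avail_in[p], blocks[p]) for p in pred[b]]
            new = set.intersection(*outs) if outs else set()
            if new != avail_in[b]:
                avail_in[b], changed = new, True
    return avail_in


succ = {"entry": ["left", "right"], "left": ["join"], "right": ["join"], "join": []}
blocks = {"entry": [], "left": [("load", "p")],
          "right": [("load", "p")], "join": [("load", "p")]}
print(available_addresses(blocks, succ, "entry")["join"])
```

Here the load of p in `join` is fully redundant because p is available on both incoming paths. Replacing the right-hand block with `[("clobber",)]` empties the set, leaving only partial reuse along the left path.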