Static Single Assignment (SSA) Form
Construction - Analyses - Optimizations

Welf Löwe
Welf.lowe@lnu.se

Outline
- Introduction to SSA
  - Motivation
  - Value Numbering
  - Definition, Observations
- Construction, Destruction
  - Theoretical, Pessimistic, Optimistic Construction
  - Destruction
  - Memory SSA
  - Interprocedural analysis based on Memory SSA: example P2SSA
  - How to capture analysis results
- Optimizations
What is an IR tailored for analyses and optimizations?

Intermediate Representations
- Intermediate representations (like BB and SSA graphs) separate the compiler front-end (source-code-related representation) from the back-end (target-code-related representation)
- Analyses and optimizations can be performed independently of the source and target languages
- Tailored for analyses and optimizations:
  - Represents the dependencies of operations in the program
    - Control flow dependencies
    - Data dependencies
  - Only essential dependencies (approximation)
    - A dependency s;s' of operations is essential iff executing s';s changes the observable behavior of the program
    - Computation of essential dependencies is not decidable
  - Compact representation
    - … of dependencies
    - No (few) redundant expression representations
Static Single Assignment - SSA

Avoid redundant computations
- Goal:
  - Assign each (partial) expression a unique number
  - Increase efficiency of inter-/intra-procedural analyses and optimizations
  - Speed up dataflow analysis
  - Represent def-use relations explicitly
  - A good optimization in itself, as values can be reused instead of recomputed
  - Known as value numbering
- Idea:
  - Values that are provably equivalent get the same number
  - How to find equal values? Can be computed by data flow analysis (forward, must)

Basic idea for SSA
- Represent the program as a directed graph of operations τ
- Represent triples as a (single) assignment t := τ t' t'' with t, t', t'' a variable/label/register/edge connecting operations
- SSA property: there is only one single (static) position in a program/procedure defining t
  - This does not mean t is computed only once (due to iterations, the program point is in general executed more than once, possibly each time with different values)
  - But there is no doubt which static variable definition is used in the arguments of operations
Equivalent Values
- Two expressions are semantically equivalent iff they compute the same value - not decidable
- Two expressions are syntactically equivalent iff the operator is the same and the operands are the same or syntactically equivalent
- Generalization towards semantic equivalence using algebraic identities, e.g., a+a = 2*a
- In practice, provable equivalence (a conservative approximation): two expressions are congruent iff they are syntactically equivalent or algebraically identical

Idea of Value Numbering
- Congruent values get the same value number
- Values are defined by operations and used by other operations
- Values are computed only once and then reused (referring to their value number)
- Algorithmic idea to prove equality of expression values at different program points (congruence of tuples):
  - Base case: constants are easy to prove equivalent
  - Induction: see the definition of syntactic equivalence: if the inputs of two operations are equal and the operator is equal, the computed values are also equal
  - Also apply algebraic identities to prove congruence
- Problems (postponed):
  - Alias/points-to problem: addresses, and hence address contents, are not exactly computable. Where are values stored to and loaded from? Not decidable.
  - Meets in the control flow: which definition holds?
Value Numbering with Local Variables, without Alias Problem

- Types of value numbers:
  - INT for integer constants, BOOL for Boolean constants, etc.
  - Ids (labels) {t1, …, tn} otherwise
- Data structure:
  - A mapping (hash table) gets the value number vn of the operations defining the values, i.e., for tuples t := τ t' t'', a lookup of τ vn(t') vn(t'') gets the value number of t
  - A mapping of operation labels t to tuples τ t' t'' (implicit)
  - τ is an operator symbol
  - vn(t'), vn(t'') are the value numbers of the tuples labeled t', t'', i.e., t', t'' have hash table entries or are constants
- For a first try:
  - Computation is basic block local
  - One hash table per basic block

Algorithm:
(1) Initially: value number vn(constant) = constant; vn(t) = void for all tuples t.
(2) For all tuples t in program order:
    case (a) t = ST >local< t'
        vn(t) := vn(ST >local< vn(t'))
        if vn(t) = void then
            vn(LD <local>) := vn(t'),
            vn(t) := new value number,
            generate: vn(t): ST >local< vn(t')
    case (b) t = LD <local>
        vn(t) := vn(LD <local>)
        if vn(t) = void then
            vn(t) := new value number,
            generate: vn(t): LD <local>
    case (c) t = τ t' t''
        vn(t) := vn(τ vn(t') vn(t''))
        if vn(t) = void then
            vn(t) := new value number,
            generate: vn(t): τ vn(t') vn(t'')
    case (d) t = call proc t' t'' …
        -- analogous to (c) with τ = call proc
Example

Original:
t1:  ST >a< 2
t2:  LD <a>
t3:  LD <x>
t4:  MUL t2 t3
t5:  ADD t4 1
t6:  ST >b< t5
t7:  LD <x>
t8:  MUL 2 t7
t9:  ST >a< t8
t10: LD <a>
t11: ADD t10 1
t12: LD <b>
t13: ADD t11 t12
t14: ST >c< t13

Result (value number graph of the basic block):
v1: ST >a< 2
v2: LD <x>
v3: MUL 2 v2
v4: ADD v3 1
v5: ST >b< v4
v6: ST >a< v3
v7: ADD v4 v4
v8: ST >c< v7

The redundant load t7 and the congruent computations t4 ≡ t8 and t5 ≡ t11 collapse onto the shared value numbers v2, v3 and v4.
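The case analysis above can be turned into a small runnable sketch. The tuple encoding below (Python tuples tagged 'ST'/'LD', labels as strings, constants as ints) is a hypothetical choice for illustration, not the lecture's concrete data structure:

```python
# Basic-block-local value numbering, cases (a)-(c); no aliasing.
def value_number_block(tuples):
    table = {}   # canonical tuple -> value number (the hash table)
    var = {}     # local variable -> value number of its current value
    vn = {}      # tuple label -> value number; vn(constant) = constant
    out = []     # generated code: (value number, canonical tuple)
    count = 0

    def new_vn():
        nonlocal count
        count += 1
        return f'v{count}'

    def val(x):  # value number of an operand (constant or label)
        return x if isinstance(x, int) else vn[x]

    for label, t in tuples:
        if t[0] == 'ST':                    # case (a): t = ST >local< t'
            key = ('ST', t[1], val(t[2]))
            if key not in table:
                table[key] = new_vn()
                var[t[1]] = val(t[2])       # a later LD reuses vn(t')
                out.append((table[key], key))
            vn[label] = table[key]
        elif t[0] == 'LD':                  # case (b): t = LD <local>
            if t[1] not in var:             # no known value: new number
                var[t[1]] = new_vn()
                out.append((var[t[1]], t))
            vn[label] = var[t[1]]
        else:                               # case (c): t = op t' t''
            key = (t[0], val(t[1]), val(t[2]))
            if key not in table:            # redundant expressions share
                table[key] = new_vn()       # one value number
                out.append((table[key], key))
            vn[label] = table[key]
    return out

# the example block: redundant loads and the repeated MUL/ADD collapse
block = [('t1', ('ST', 'a', 2)),   ('t2', ('LD', 'a')),
         ('t3', ('LD', 'x')),      ('t4', ('MUL', 't2', 't3')),
         ('t5', ('ADD', 't4', 1)), ('t6', ('ST', 'b', 't5')),
         ('t7', ('LD', 'x')),      ('t8', ('MUL', 2, 't7')),
         ('t9', ('ST', 'a', 't8')),('t10', ('LD', 'a')),
         ('t11', ('ADD', 't10', 1)),('t12', ('LD', 'b')),
         ('t13', ('ADD', 't11', 't12')),('t14', ('ST', 'c', 't13'))]
```

Running `value_number_block(block)` yields exactly the eight tuples v1-v8 of the result above; the 14 input tuples shrink to 8.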
Value Numbering with Global Variables, without Alias Problem
- Case (a'), analogous to case (a) for local variables:
  case (a') t = ST >global< t'
      vn(t) := vn(ST >global< vn(t'))
      if vn(t) = void then
          vn(LD <global>) := vn(t'),
          vn(t) := new value number,
          generate: vn(t): ST >global< vn(t')
- Procedures:
  - Case (d) as before, but if a global is (potentially) redefined in proc, set the value numbers of the tuples ST >global<, LD <global> and all transitively depending tuples to void
  - Easy implementation: set all value numbers to void
  - Optimization for non-recursive leaf procedures:
    - New case (d): analyze the procedure as if it were inlined
    - Not trivial if proc has more than one basic block

General Value Numbering
-- t' is an address with unknown value (no compile-time constant address, no name),
-- computed in an operation with value number vn(t')
case (e) t = ST t' t''
    vn(t) := vn(ST vn(t') vn(t''))
    if vn(t) = void then
        vn(LD vn(t')) := vn(t''),
        vn(t) := new value number,
        generate: vn(t): ST vn(t') vn(t'')
    if t' may be semantically equivalent to tt:  -- points-to analysis
        vn(ST vn(tt) …) := void,
        vn(LD vn(tt)) := void,
        value numbers of transitively depending tuples := void
case (f) t = LD t'
    vn(t) := vn(LD vn(t'))
    if vn(t) = void then
        vn(t) := new value number,
        generate: vn(t): LD vn(t')
Value number graph → SSA
- SSA property: there is only one position in a program/procedure defining t
- Value numbering gets us halfway to an SSA representation, i.e., the value number graph is the SSA graph of a basic block
  - No (provably) unnecessary dependencies
  - No (provably) redundant computations
- Problem: what to do with variables that are assigned at more than one position?
  - E.g.
    if … then i:=1 else i:=2 end; x:=i
    i:=0; while … loop …; i:=i+1; … end; x:=i

Remarks
- Initially all value numbers are set to void. By knowing the values of predecessor basic blocks, this can be relaxed (initialization across basic blocks is solved by SSA, next issue)
- Each new entry in the hash table generates a new value number. After call proc, entries depending on non-local variables become void
- A store operation ST >a< t' sets to void all vn(LD <a'>) and vn(ST <a'> t') if it is not clear whether a = a' or a ≠ a' (alias problem). Special case: arrays with index expressions
Compact Representation of Dependencies
- Previous: #def × #use dependency edges
- Now: #def + #use dependency edges

φ-Functions
- Solution:
  - Each assignment to a variable a defines a new version ai
  - This version is actually the value number of the assigned expression
  - At meets in the control flow, we just add a pseudo expression selecting a value, defining a new version (value number) of that variable:
    a3 := φ(a1,a2)
- E.g.
  if … then i1:=1 else i2:=2 end; i3:=φ(i1,i2); x:=i3
  i1:=0; while i3:=φ(i1,i2); … loop …; i2:=i3+1; … end; x:=i3
- φ-functions
  - always occur at the beginning of a block
  - are all evaluated simultaneously for a block
  - guarantee that there is exactly one definition/assignment for each use of a variable
- An assignment i0 := φ(i1,…,ik) in a basic block indicates that the block has k direct predecessors in the control flow

[Figure: three definitions i := … in three predecessor blocks become i1 :=, i2 := …, i3 := …; the join block gets i4 := φ(i1, i2, i3), and the uses … := i become … := i4.]
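The simultaneous-evaluation rule can be made concrete with a tiny sketch (the `eval_phis` encoding and the environments are made up for illustration): when control enters a block from its i-th predecessor, every φ selects its i-th operand, and all operands are read before any target is written.

```python
# φ semantics: simultaneous selection by incoming-edge index.
def eval_phis(phis, env, pred_index):
    """phis maps each φ target to its operand list, e.g. {'i3': ['i1','i2']}."""
    # read all operands first, then write: φs evaluate simultaneously,
    # so one φ's result must not overwrite another φ's operand
    values = {t: env[ops[pred_index]] for t, ops in phis.items()}
    env.update(values)

# if … then i1:=1 else i2:=2 end; i3:=φ(i1,i2); x:=i3
env = {'i1': 1, 'i2': 2}
eval_phis({'i3': ['i1', 'i2']}, env, pred_index=1)   # arrived via else

# two φs in one block exchanging values: both read the old values
env2 = {'a1': 10, 'b1': 20}
eval_phis({'a2': ['b1'], 'b2': ['a1']}, env2, pred_index=0)
```

The second call shows why simultaneity matters: evaluated one after another, `b2` would see the freshly written `a2` instead of the predecessor's value.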
Example Program and Basic Block Graph
(1)  a=1;
(2)  b=2;
     while true {
(3)    c=a+b;
(4)    if (d=c-a)
(5)      while (d=b*d) {
(6)        d=a+b;
(7)        e=e+1;
         }
(8)    b=a+b;
(9)    if (e=c-a) break;
     }
(10) a=b*d;
(11) b=a-d;

[Figure: basic block graph with block 1 {(1),(2)}, block 2 {(3),(4)}, block 3 {(5)}, block 4 {(6),(7)}, block 5 {(8),(9)}, block 6 {(10),(11)}.]
Basic Block and SSA Graph

Block 1: a1=1; b1=2
Block 2: b2=φ(b1,b3); e2=φ(e1,e5); c1=a1+b2; d1=c1-a1
Block 3: d2=φ(d1,d4); e3=φ(e2,e4); d3=b2*d2
Block 4: d4=a1+b2; e4=e3+1
Block 5: d5=φ(d1,d3); b3=a1+b2; e5=c1-a1
Block 6: a2=b3*d5; b4=a2-d5

SSA Graph before and after Constant and Copy Propagation
Constant propagation (a1=1, b1=2) and value numbering find b3 ≡ c1, d4 ≡ c1 and e5 ≡ d1:
Block 2: b2=φ(2,c1); e2=φ(e1,d1); c1=1+b2; d1=c1-1
Block 3: d2=φ(d1,c1); e3=φ(e2,e4); d3=b2*d2
Block 4: e4=e3+1
Block 5: d5=φ(d1,d3)
Block 6: a2=c1*d5; b4=a2-d5

SSA Graph before and after using Algebraic Identities
The identity d1 = c1-1 = (1+b2)-1 ≡ b2 simplifies further:
Block 2: b2=φ(2,c1); e2=φ(e1,b2); c1=1+b2
Block 3: d2=φ(b2,c1); e3=φ(e2,e4); d3=b2*d2
Block 4: e4=e3+1
Block 5: d5=φ(b2,d3)
Block 6: a2=c1*d5; b4=a2-d5
Implementation: SSA Graph

[Figure: implementation-level SSA graph of the example procedure, built from Start, End, Jmp, Cond, Consti, Addi, Muli, Subi and φ nodes, grouped into basic blocks 1-6.]

SSA properties
P1: Typed in-/outputs of nodes; the in- and outputs of operation nodes connected by an edge have the same type.
P2: The operation nodes and edges of a basic block form a DAG. Note the correspondence to value number graphs and expression trees.
P3: The inputs of φ-operations have the same type as their output.
P4: The i-th operand of a φ-operation is available at the end of the i-th predecessor BB.
P5: A start node Start dominates all BBs of a procedure; an end node End post-dominates all nodes of a procedure.
P6: Every block has exactly one of the nodes Start, End, Jump, Cond, Ret.
P7: If an operation x in a BB Bx defines an operand of an operation y in a BB By, then there is a path Bx →+ By.
P7a: (Special case of P7) If operation y is a φ-operation and x is in Bx = By, then there is a cyclic path By →+ By.
P8: Let X, Y be BBs, each with a definition of a that may reach a use of a in BB Z. Let Z' be the first common BB of the execution paths X →+ Z, Y →+ Z. Then Z' contains a φ-operation for a.
Property P8 revisited

Let X, Y be BBs, each with a definition of a that may reach a use of a in BB Z.
Let Z' be the first common BB of the execution paths X →+ Z, Y →+ Z.
Then Z' contains a φ-operation for a.

[Figure: X: a= … and Y: a= … become X: a1= …, Y: a2= …; Z' gets a3= φ(a1, a2) and the use Z: …=…a… becomes Z: …=…a3….]

Dominance
Dominance: X ≥ Y — on each path from the start node S in the basic block graph to Y, X occurs before Y. ≥ is reflexive: X ≥ X.
Strict dominance: X > Y ⇔ X ≥ Y ∧ X ≠ Y.
Direct dominance: X = ddom(Y) ⇔ X > Y ∧ ¬∃ Z: X > Z > Y.
Post-dominance: X ≤ Y — on each path from the end node E in the basic block graph to Y, X occurs before Y.
The definitions of strict and direct post-dominance are analogous.
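One standard way to compute these relations (not spelled out on the slides) is the iterative data-flow formulation dom(S) = {S}, dom(Y) = {Y} ∪ ∩ dom(P) over all predecessors P; then X ≥ Y iff X ∈ dom(Y). A minimal sketch on a made-up CFG:

```python
# Iterative dominator computation: dom(n) = set of nodes dominating n.
def dominators(succ, start):
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pred = {n: [] for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].append(n)
    dom = {n: set(nodes) for n in nodes}   # pessimistic start: everything
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            new = {n}
            if pred[n]:                    # intersect over all predecessors
                new |= set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# diamond CFG: 1 branches to 2 and 3, both join in 4
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
dom = dominators(succ, 1)
```

Here dom(4) = {1, 4}: node 1 dominates the join, but neither branch does; strict dominance X > Y is simply X ∈ dom(Y) with X ≠ Y.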
(Iterated) Dominance Frontiers
- Dominance frontier DF(n)
  - The set of nodes just not strictly dominated by n any more:
  - DF(n) = {m | ¬(n > m) ∧ ∃b ∈ pre(m): n ≥ b}
- Dominance frontier of a set M of nodes:
  - DF(M) = ∪n∈M DF(n)
- Iterated dominance frontier DF+(M)
  - the minimum fixed point of:
    DF0 = DF(M),
    DFi+1 = DF(∪k∈0…i DFk)

Example
[Figure: a basic block graph with nodes 1-10, each annotated with its dominator set: {1}, {1,2}, {1,3}, {1,3,4}, {1,3,4,5}, {1,3,4,6}, {1,3,4,7}, {1,3,4,7,8}, {1,3,4,7,8,9}, {1,3,4,7,8,10}.]
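The definition translates directly into code. The sketch below (hypothetical encoding; the small loop CFG and its hand-written dominator sets are illustration only) computes DF(n) from dominator sets dom(m), where n ≥ m iff n ∈ dom(m), and iterates DF to the fixed point:

```python
# DF(n) = { m | not (n > m) and some predecessor of m is dominated by n }
def frontiers(succ, dom):
    pred = {m: [p for p in succ if m in succ[p]] for m in dom}
    df = {n: set() for n in dom}
    for n in dom:
        for m in dom:
            strictly_dominates = n in dom[m] and n != m
            if not strictly_dominates and any(n in dom[b] for b in pred[m]):
                df[n].add(m)
    return df

def iterated_df(df, M):
    result = set.union(*(df[n] for n in M))        # DF0 = DF(M)
    while True:                                    # DF(M ∪ DFi) until stable
        nxt = set.union(result, *(df[n] for n in M | result))
        if nxt == result:
            return result
        result = nxt

# loop: 1 -> 2 -> 3 -> {2, 4}; dominator sets written by hand
succ = {1: [2], 2: [3], 3: [2, 4], 4: []}
dom = {1: {1}, 2: {1, 2}, 3: {1, 2, 3}, 4: {1, 2, 3, 4}}
df = frontiers(succ, dom)
```

Note that the loop header 2 lies in its own frontier, DF(2) = {2}: it dominates its back-edge predecessor 3 but does not strictly dominate itself. This is exactly why loop headers receive φ-functions.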
Discussion of P8
- P8 defines where to position φ-functions
- Observation: the φ-functions lie in the iterated dominance frontiers DF+(…) of the BBs defining a variable
- Since a3 := φ(a1, a2) is itself a definition of a, the dominance frontiers DF(X,Y,Z') must also contain φ-functions; a fixed-point iteration is required

Proof (idea) of this observation
Let X, Y be the only BBs containing a definition of a. Both may reach a use of a in block Z. Then there are paths X →+ Z, Y →+ Z in the BB graph (P7). Let Z' be the first node common to both paths. Then:
- a3 := φ(a1, a2) is assigned to Z'
- Z' ≥ Z, since otherwise X, Y would not be the only definitions reaching Z (a not initialized)
- Z' dominates Z; in fact Z' is the first common successor of X and Y dominating Z
- Z' ∈ DF(X,Y), i.e., Z' is in the dominance frontier of X and Y

1. a3 := φ(a1, a2) cannot be defined before Z'.
2. X and Y, respectively, must dominate all direct predecessors of Z', since otherwise there would be a use of a without a previous definition. Hence Z' ∈ DF(X,Y).
3. a3 := φ(a1, a2) should not (but may) be inserted later in other dominators of Z dominated by Z', since on the path Z' →+ Z there is no change (no new definition) of a.
4. The iteration DF+(…) is required, as there is now a further definition of a in block Z' and DF(X,Y) ⊆ DF(X,Y,Z').
Example of Property P8

[Figure: BB graph with four blocks; a is defined in blocks 1, 2 and 3. φ-functions are placed in the iterated dominance frontier: a5:= φ(a3,a4) in block 4, and - since a5 is again a definition - a1:= φ(a0,a5) in block 1.]

A(a) = {1, 2, 3}
DF(A(a)) = {4} = DF1
DF2 = {1, 4} = DF+
Construction Theory
- A program of size n may contain O(n) variables
- In the worst case there are O(n) φ-functions (for the variables) in O(n) BBs; hence the worst-case complexity is Ω(n²)
- The previous discussion gives a straightforward implementation:
  - Perform a value numbering and update the BBs accordingly
  - For any variable that is used but not locally (in the BB) defined, compute the set of definition points (data flow analysis)
  - Compute the iterated dominance frontiers of the definition points
  - Insert φ-functions and rename variables accordingly
- In practice, easier constructions are possible

Reminder: Value Numbering
(1) Initially: value number vn(constant) = constant; vn(t) = void for all tuples t.
(2) For all tuples t in program order: cases (a)-(d) as defined above for basic-block-local value numbering.
Extended Value Numbering: Extended Initialization

(1) Initialization for current block Z
    (A) always: vn(constant)_Z = constant
    (B) if Z = start block: vn(t) = void for all tuples t
    (C) else:
        let Pred = {X, Y, …} be the predecessors of Z in the basic block graph
        for all variables t used in current block Z:
            if vn(t)_X ≠ vn(t)_Y
                vn(t)_Z := new value number
                generate: vn(t)_Z := φ(vn(t)_X, vn(t)_Y)
            if vn(t)_X = vn(t)_Y
                vn(t)_Z := vn(t)_X
(2) For all tuples t in program order: -- as before

[Figure: X: a1= … and Y: a2= … joining in Z: a3= φ(a1,a2) - ok. Open question: what if a predecessor has not been processed yet?]

If a predecessor's value numbers are still undefined, initialize that block recursively; add to step (1)(C), before the comparison:

            if vn(t)_X, X ∈ Pred, is undefined
                recursively initialize block X with (1)

[Figure: this works when the predecessor chain Y: a1= …, Y': a2= … has been visited; it fails for back edges from blocks that have not been visited yet.]
For back edges, guess: an unvisited predecessor gets a special (guessed) value number, and the generated φ becomes a preliminary φ'. Add to step (1)(C):

            if X ∈ Pred is unvisited
                vn(t)_X := new special value number (guessed number)
            if vn(t)_X, X ∈ Pred, is undefined
                recursively initialize block X with (1)
            if vn(t)_X ≠ vn(t)_Y
                vn(t)_Z := new value number
                if vn(t)_X or vn(t)_Y is guessed
                    generate: vn(t)_Z := φ'(vn(t)_X, vn(t)_Y)
                else
                    generate: vn(t)_Z := φ(vn(t)_X, vn(t)_Y)
            if vn(t)_X = vn(t)_Y
                vn(t)_Z := vn(t)_X

Eliminate/Mature φ'-Functions
- After value numbering is finished, for each block X:
  - replace the special value numbers in the φ'-functions of X by the last valid real value numbers in pre(X)
  - replace φ'-functions by mature φ-functions using real value numbers
  - delete vn(t)_Z := φ(vn(t)_Y, vn(t)_Z): if t was not changed in the previously unvisited blocks, no φ-function is required
    - then replace uses of vn(t)_Z by vn(t)_Y
- Insight:
  - a deletion could prove some other φ-functions unnecessary
  - iterate the deletion until a fixed point
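The guess-then-mature idea can be sketched compactly. The code below is close in spirit to the on-the-fly construction of Braun et al. and is deliberately simplified (hypothetical `Block` encoding; the iterative re-checking of other φs after a deletion is omitted): reading a variable recurses into predecessors, a join writes a placeholder value number first (the φ'), which breaks recursion on back edges, and the φ is matured or deleted once all operands are known.

```python
import itertools

class Block:
    def __init__(self, name, preds=()):
        self.name = name
        self.preds = list(preds)
        self.defs = {}     # variable -> value number
        self.phis = {}     # value number -> operand value numbers

fresh = (f'v{i}' for i in itertools.count(1))

def read(var, block):
    if var in block.defs:                 # locally defined: done
        return block.defs[var]
    if len(block.preds) == 1:             # single predecessor: recurse
        val = read(var, block.preds[0])
    else:                                 # join: tentative φ'
        val = next(fresh)
        block.defs[var] = val             # breaks recursion on back edges
        ops = [read(var, p) for p in block.preds]
        others = set(ops) - {val}
        if len(others) == 1:              # φ(x,…,x) or φ(x, itself): trivial
            val = others.pop()            # mature by deletion
        else:
            block.phis[val] = ops         # mature φ
    block.defs[var] = val
    return val

# loop where a is never redefined: entry -> header, header -> header
entry = Block('entry')
entry.defs['a'] = 'a0'
header = Block('header', preds=[entry])
header.preds.append(header)               # back edge

# diamond where x differs on the two paths
top = Block('top'); top.defs['x'] = 'x1'
left = Block('left', preds=[top]); left.defs['x'] = 'x2'
right = Block('right', preds=[top])
join = Block('join', preds=[left, right])
```

In the loop case `read('a', header)` returns `'a0'` and no φ survives, matching Examples II/III above; in the diamond a real φ over `['x2', 'x1']` remains.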
Example I: Maturing φ'-Functions
[Figure: block X holds a3= φ'(a1, ?) because one predecessor is not yet visited; Z: …=…a3… uses it. Once the predecessor Y': a2= … is processed, the φ' matures to X: a3= φ(a1, a2).]

Example II: Maturing φ'-Functions
[Figure: a is untouched on the back edge, so the matured function is X: a3= φ(a1, a3); it is trivial and can be deleted.]
Example III: Maturing φ'-Functions
[Figure: a is untouched in the unvisited part; the trivial φ in X is deleted entirely and Z uses a1 directly: Z: …=…a1….]

Example Program and BB Graph (running example)
[The example program (1)-(11) from above with its basic block graph: block 1 {(1),(2)}, block 2 {(3),(4)}, block 3 {(5)}, block 4 {(6),(7)}, block 5 {(8),(9)}, block 6 {(10),(11)}.]
SSA Construction, Block by Block

Block 1:
    a1=1
    b1=2

Block 2, initialization (predecessors: blocks 1 and 5; block 5 is unvisited, so a5, b5 are guessed numbers):
    a2=φ'(a1,a5)
    b2=φ'(b1,b5)
Block 2, value numbering:
    c1=a2+b2
    d1=c1-a2

Block 3, initialization (predecessors: blocks 2 and 4; block 4 is unvisited):
    b3=φ'(b2,b4)
    d2=φ'(d1,d4)
Block 3, value numbering:
    d3=b3*d2

Block 4, initialization: e is used but unknown; recursive initialization places e3=φ'(e2,e4) in block 3 and e2=φ'(e1,e5) in block 2.
Block 4, value numbering:
    d4=a2+b3
    e4=e3+1

Block 5, initialization (both predecessors are now visited, so real φ-functions):
    b4=φ(b3,b2)
Block 5, value numbering:
    b5=a2+b4
    e5=c1-a2

Block 6 uses d, which is not defined locally; recursive initialization places d5=φ(d3,d1) in block 5.
Block 6, value numbering:
    a3=b5*d5
    b6=a3-d5

Maturing block 2: the guessed a5 resolves to a2 itself (a is never redefined in the loop), so a2=φ(a1,a2) is trivial and deleted; uses of a2 become a1. Block 2 becomes:
    b2=φ(b1,b5); e2=φ(e1,e5); c1=a1+b2; d1=c1-a1
and the uses elsewhere become d4=a1+b3, b5=a1+b4, e5=c1-a1.

Maturing block 3: the guessed b4 resolves to b3 itself (b is not redefined in block 4), so b3=φ(b2,b3) is trivial and deleted; b3 becomes b2. Consequently b4=φ(b2,b2) in block 5 is trivial too and b4 becomes b2. The final construction result:

Block 1: a1=1; b1=2
Block 2: b2=φ(b1,b5); e2=φ(e1,e5); c1=a1+b2; d1=c1-a1
Block 3: d2=φ(d1,d4); e3=φ(e2,e4); d3=b2*d2
Block 4: d4=a1+b2; e4=e3+1
Block 5: d5=φ(d3,d1); b5=a1+b2; e5=c1-a1
Block 6: a3=b5*d5; b6=a3-d5
Final Simplifications
Value numbering with constants and algebraic identities over the matured graph finds b5 ≡ c1, d4 ≡ c1, d1 ≡ b2 and e5 ≡ b2:
Block 2: b2=φ(2,c1); e2=φ(e1,b2); c1=1+b2
Block 3: d2=φ(b2,c1); e3=φ(e2,e4); d3=b2*d2
Block 4: e4=e3+1
Block 5: d5=φ(d3,b2)
Block 6: a3=c1*d5; b6=a3-d5

Optimistic SSA Construction
- Idea:
  - all values (value numbers) are equal until the opposite is proven
  - the opposite is proven by:
    - the values are different constants
    - the values are generated from syntactically different operations
    - the values are generated from syntactically equivalent operations with provably different values as operands
- Advantage:
  - sometimes detects congruences that are not detected by the pessimistic construction
  - no φ-functions to mature
- Disadvantage:
  - sometimes misses congruences that the pessimistic construction detects (e.g., via algebraic identities)
  - requires definition-use analyses on the BB graph before construction
  - requires computation of iterated dominance frontiers to position φ-functions
Construction Algorithm
- Generate the BB graph and perform definition-use analysis (data flow analysis) for all variables. Notation:
  - v(i) = …             variable v defined in statement (i)
  - u(j) = …v(x,y,z,…)…  variable v used in statement (j), with may-reaching definitions in statements (x,y,z,…)
  - statements can be abstracted to blocks
- Set v(i) ≡ u(j) for all v(i), u(j) in the program
- Iterate until a fixed point over: set v(i) ≢ u(j) for
  - v(i) = c and u(j) ≠ c
  - v(i) = op1(…) and u(j) ≠ op1(…)
  - v(i) = op1(x1,y1) and u(j) = op1(x2,y2) and x1 ≢ x2 or y1 ≢ y2
- Find a unique value number for each equivalence class
- Replace variables consistently by the value number of their equivalence class
- Insert, if necessary, φ-functions (also possible during the iteration)

The running example, annotated with may-reaching definitions:
Block 1: a(1)=1; b(1)=2
Block 2: c(2)=a(1)+b(1,5); d(2)=c(2)-a(1)
Block 3: d(3)=b(1,5)*d(2,4)
Block 4: d(4)=a(1)+b(1,5); e(4)=e(0,4,5)+1
Block 5: b(5)=a(1)+b(1,5); e(5)=c(2)-a(1)
Block 6: a(6)=b(5)*d(2,3); b(6)=a(6)-d(2,3)
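The fixed-point iteration above is a partition refinement, and a minimal sketch makes the "equal until proven different" direction concrete (hypothetical encoding: each definition is a tuple of an operator tag and operand names, constants get their own tags; this is not the lecture's concrete data structure):

```python
# Optimistic congruence: start with one class, split until stable.
def optimistic_classes(defs):
    """defs: name -> (operator, *operand names); returns name -> class id."""
    cls = {n: 0 for n in defs}          # optimistically: everything equal
    while True:
        # a definition's signature: its old class, its operator, and the
        # classes of its operands; equal signature = still congruent.
        # Including the old class means classes are only ever split,
        # so the iteration terminates.
        sig = {n: (cls[n], defs[n][0]) + tuple(cls[a] for a in defs[n][1:])
               for n in defs}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {n: ids[sig[n]] for n in defs}
        if new == cls:
            return cls
        cls = new

# two loop counters incremented in lockstep: i1=φ(0,i2), i2=i1+1 and
# j1=φ(0,j2), j2=j1+1 stay congruent under the optimistic iteration
defs = {
    'zero': ('const0',), 'one': ('const1',),
    'i1': ('phi', 'zero', 'i2'), 'i2': ('add', 'i1', 'one'),
    'j1': ('phi', 'zero', 'j2'), 'j2': ('add', 'j1', 'one'),
    'k':  ('mul', 'i1', 'one'),
}
cls = optimistic_classes(defs)
```

The lockstep counters illustrate the advantage named above: the mutually recursive pairs i1/j1 and i2/j2 start congruent and no rule ever separates them, whereas a scheme that must prove equality bottom-up never gets the induction started.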
Optimistic SSA Construction, Step by Step
Starting from the annotated program, the fixed-point iteration splits equivalence classes and inserts φ-functions where differently numbered definitions meet:
- a has only the definition a(1): value number a1
- b(1) ≢ b(5): insert b2=φ(b1,b(5)) in block 2; uses of b(1,5) become b2
- similarly d1=φ(d(4),d(2)) in block 3, e1=φ(e0,e(5)) in block 2, e2=φ(e1,e(4)) in block 3, and d2=φ(d(3),d(2)) in block 5
- the classes {b(5), c(2)} (both a1+b2) and {d(2), e(5)} (both c(2)-a1) survive the iteration: the construction proves b(5) ≡ c(2) (value number c1) and e(5) ≡ d(2) (value number d4)

Result:
Block 1: a1=1; b1=2
Block 2: b2=φ(b1,c1); e1=φ(e0,d4); c1=a1+b2; d4=c1-a1
Block 3: e2=φ(e1,e3); d1=φ(c1,d4); d3=b2*d1
Block 4: e3=e2+1
Block 5: d2=φ(d3,d4)
Block 6: a2=c1*d2; b3=a2-d2
Optimistic vs. Pessimistic SSA
In these examples the pessimistic SSA is better: it additionally finds congruences such as d1 ≡ b2 via algebraic identities. Caution: this cannot be generalized!

Optimal (good) Policy
- Generate pessimistic SSA
  - program size reduced
  - definition-use information computed
  - no immature φ-functions
- Set all value(-number)s congruent
- Compute optimistic SSA
- Iteration (pessimistic-optimistic-pessimistic-…)
  - until a fixed point is possible
  - in practice: only pessimistic SSA, or pessimistic SSA with a single subsequent optimistic SSA computation (no iteration)
Minimal SSA Form
- Insight:
  - φ-functions guarantee that for each use of a variable there is exactly one definition
  - "Variable" means a program or auxiliary variable; this is a solution of the (may) Reaching-Definitions problem
  - Problems with array elements and indirectly addressed variables remain (discussed and solved later)
- Minimal SSA form: place a φ-function a0 := φ(a1, a2, …) in block B iff the value a0 is live in B
  - Use the data flow analysis liveIn(B) and check a ∈ liveIn(B)
  - Better: generate value numbers only on demand (integrated in the construction algorithms)

SSA Construction from AST
- Left-right traversal (1st round):
  - compute for each syntactic expression its basic block number
  - compute the precedence relation on basic blocks
  - generate expression triples into the BBs
- Right-left traversal (2nd round):
  - compute, for each live expression (beginning with the results of a procedure), the value numbers (contains φ'), using the data structures known from value numbering
- Left-right traversal (3rd round):
  - mature the φ'-functions
  - generate SSA for the non-empty blocks
  - further eliminations on the SSA graph
SSA from AST
- Rounds 1+2 can run in one left-right tree traversal if liveness is ignored:
  - construct the BBs
  - construct the SSA code for the basic blocks (value number graphs)
  - construct the control flow between BBs
- For each statement type (AST node type) there is a different set of actions when visiting nodes of that type, including:
  - assignments to local variables and expressions: like local value numbering in a left-to-right traversal
  - procedure calls: like any other operation expression
  - while, if, exception, …: introduce new BBs and control flow on the fly

[Figure: AST of a while statement with children Expr, Body and links to Parent and Succ, next to the basic blocks created for the loop: entry, Expr block, Body start/end blocks, successor block.]
SSA from AST: while actions
- Finalize the current block B(Parent)
- Create a new current block B(Expr)
- Add control flow B(Parent) → B(Expr)
- Recursively generate code for Expr, computing value numbers locally
- Finalize the current block B(Expr)
- Create a new current block Bstart(Body)
- Add control flow B(Expr) → Bstart(Body)
- Recursively generate code for Body
- After the return, the current block is Bend(Body); finalize it
- Add control flow Bend(Body) → B(Expr)
- Create a new current block Bstart(Succ)
- Add control flow B(Expr) → Bstart(Succ)
- Return with Bstart(Succ) as the current block

Deconstruction of SSA
- Serialize the SSA graph
- Replace data dependency edges by variables
- Remove φ-functions:
  - define a new variable
  - copy from the values in the predecessor basic blocks (possibly requires new blocks on some edges)
  - perform copy propagation to avoid unnecessary copy operations
- Allocate registers for variables
  - fixed number of registers
  - more variables than registers
  - idea: assign variables with non-overlapping lifetimes to the same register
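The φ-removal step can be sketched directly (hypothetical block encoding made up for illustration; parallel-copy ordering and critical-edge splitting are deliberately left out): each φ target becomes an ordinary variable assigned by a copy at the end of every predecessor.

```python
# Replace each φ by copies in the predecessor blocks.
def remove_phis(blocks):
    """blocks: name -> {'preds': [...], 'phis': {target: [op, ...]},
    'code': [(target, op, *operands), ...]}"""
    for b in blocks.values():
        for target, ops in b['phis'].items():
            # the i-th operand is copied at the end of the i-th predecessor
            for pred, op in zip(b['preds'], ops):
                blocks[pred]['code'].append((target, '=', op))
        b['phis'] = {}
    return blocks

# if … then i1:=1 else i2:=2 end; i3:=φ(i1,i2); x:=i3
blocks = {
    'then': {'preds': [], 'phis': {}, 'code': [('i1', '=', '1')]},
    'else': {'preds': [], 'phis': {}, 'code': [('i2', '=', '2')]},
    'join': {'preds': ['then', 'else'],
             'phis': {'i3': ['i1', 'i2']},
             'code': [('x', '=', 'i3')]},
}
remove_phis(blocks)
```

After the call, both branches end in a copy to `i3` and the join block contains no φ any more; on a critical edge the copy would instead go into a freshly inserted edge block, as in the example below.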
Deconstruction of the Running Example

[Figure: implementation-level SSA graph of the example, built from Start, Consti, Addi, Muli, Subi, φ, Cond and Jmp nodes.]

Introduce variables for the data dependency edges (starting from the simplified SSA graph):
Block 2: b2=φ(2,c1); e2=φ(e1,b2); c1=1+b2
Block 3: d2=φ(b2,c1); e3=φ(e4,e2); d3=b2*d2
Block 4: e4=e3+1
Block 5: d5=φ(d3,b2)
Block 6: a3=c1*d5; b6=a3-d5

Remove φ-functions: each φ target becomes an ordinary variable, assigned by copies at the end of the predecessor blocks. The entry copy b2=2 goes into block 1, while the back edge 5 → 2 gets a new block 7 holding the copy b2=c1.
Remove φ-functions

[Figure: stepwise φ-removal on the example; copies such as e2=e1, d2=c1, e3=e2, d5=d3, b2=c1, e2=b2, d5=b2, d2=b2, e3=e4 are placed in the predecessor blocks of the former φ-nodes]

Copy Propagation

[Figure: propagating the copies rewrites their uses, e.g. d3=b2*d2 becomes d5=b2*d2, and most copy operations become dead]

Remove Empty Block

[Figure: blocks left empty after copy propagation are removed from the control-flow graph]
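Copy propagation on straight-line SSA code can be sketched as follows. The tuple-based instruction format is an assumption for illustration; in SSA each name is defined once, so replacing uses of a copy's target by its source is always safe:

```python
# Sketch of copy propagation: every use of a copy target is replaced
# by its (ultimate) source; the now-dead copies are dropped.

def copy_propagate(instrs):
    """instrs: list of (target, op, operands); op 'copy' has one operand."""
    alias = {}  # copy target -> ultimate source
    out = []
    for target, op, operands in instrs:
        operands = [alias.get(o, o) for o in operands]  # rewrite uses
        if op == 'copy':
            alias[target] = operands[0]  # remember the source, emit nothing
        else:
            out.append((target, op, operands))
    return out

# names taken from the example slides: d5=d3 was itself a copy of b2
prog = [('d5', 'copy', ['b2']),
        ('a3', 'mul', ['c1', 'd5']),
        ('b6', 'sub', ['a3', 'd5'])]
print(copy_propagate(prog))
# [('a3', 'mul', ['c1', 'b2']), ('b6', 'sub', ['a3', 'b2'])]
```

Because operands are rewritten before a copy is recorded, chains of copies collapse to the original definition in one pass.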
Memory SSA
ƒ By now we can only handle simple variables
ƒ Extension:
ƒ Nodes: memory changing operations
ƒ Edges:
• Data- and control flow.
• Anti-/output dependencies between memory changing operations
ƒ Functional modeling of memory changing operations

[Figure: Store (inputs M, a, v; output M‘), Load (inputs M, a; outputs M‘, v), and Call (inputs M, …; outputs M‘, …) nodes. Legend: M, M‘ memory state; a address; v value]

Why Load Defines Memory?
ƒ Anti-depending memory operations: read an address essentially before redefining the value

[Figure: a Load of address a must read memory state M before a Store of v' to a redefines it; letting the Load also define a memory state M‘ serializes it before the Store]
Memory SSA
ƒ To capture only essential dependencies, distinguish disjoint memory fragments
ƒ In general not decidable
ƒ Approximated by analyses
ƒ Initial distinctions, e.g.,
• Heap vs. stack
• Different arrays on the stack
• Heap partitions for different object types
ƒ Distinction often only locally possible
ƒ Union necessary
ƒ Sync operation unifies disjoint memory fragments
ƒ Like φ-functions, but sync is strict

[Figure: two Store/Load chains on disjoint memory fragments joined by a φ and a Sync node]

Properties of Memory SSA
ƒ P1–P8: as before
ƒ P9 (new!): lifetimes of memory states do not overlap if they define different values of the same memory slot
ƒ Otherwise we would need to keep two versions of the memory alive
ƒ Memory does not fit into a register (usually)
ƒ Would make the programs non-implementable
ƒ Note: if we only have to analyze the program, P9 can be ignored
Reduced SSA Representations
ƒ Assume the goal is analysis, not compilation, as in many software tech. applications in IDEs
ƒ Rename ids
ƒ Go back in debugging
ƒ Move code
ƒ …
ƒ Not the whole program is directly relevant for an analysis
ƒ Certain data types are uninteresting, e.g., Int, Bool in call graph construction
ƒ Consequently, operation nodes consuming/defining values of these types and edges connecting them can be removed
ƒ More compact program representation
ƒ Faster in analysis
ƒ Still SSA properties hold
ƒ Example: Points-to-SSA, capturing only the reference information necessary for points-to analysis (ignoring basic types and operations)

Example Points-to-SSA
Cliffhanger from the first day
ƒ Inter-procedural analysis
ƒ Call graph construction (fast)
ƒ Call graph construction (precise)
ƒ Call graph construction (fast and precise)

Recall: Call graph construction
ƒ Points-to Analysis (P2A) computes reference information:
ƒ Which abstract objects may a variable refer to.
ƒ Which are the possible abstract target objects of a call.
ƒ In general: for any expression in a program, which are the abstract objects that are possibly bound to it in an execution of that program.
ƒ Call graph construction is a simple client analysis of P2A
Fast and Precise P2A
1. Data values
ƒ allocation sites abstract objects: O
ƒ abstract heap memory: O × F → 2^O (F set of fields); heap size: Int
2. Data-flow graph: Points-to-SSA graph for each method (constructor)
ƒ Nodes with ports represent operations relevant for P2A; ports correspond to operands; special φ-nodes for merge points in control flow
ƒ Edges represent intra-procedural control- and data-flow
3. Transfer functions:
ƒ update the heap according to the abstract semantics of the operations
ƒ special φ-nodes: ∪ on O values and max on Int values, resp.
4. Initialization: ∅ for O ports and 0 for Int ports, resp.
5. Simulated execution

Recall: Precise call graph construction
ƒ Construction of a Points-to Graph (P2G): nodes for objects and variables, edges for assignments and calls; propagate objects along the edges, i.e., data-flow analysis on that graph
ƒ The baseline P2G approach is locally and globally flow-insensitive; it focuses on data dependencies over variables and ignores the control flow
ƒ An analysis is flow-sensitive if it takes into account the order in which statements in a program are executed
ƒ In principle, additional def-use analysis avoids this problem
ƒ Using global def-use analysis does not scale
ƒ The baseline P2G approach is context-insensitive
ƒ An analysis is context-sensitive if it distinguishes different contexts in which methods are called
ƒ Object-sensitivity distinguishes methods by the abstract objects they are called on – can be understood as copies of the method’s graph
ƒ Scales but is quite slow
Recall: Simulated Execution
ƒ Interleaving of processMethod and the call nodes’ transfer function
ƒ processMethod:
ƒ Starts with main,
ƒ propagates data values along the edges in the P2-SSA graph
ƒ updates the heap and the data values in the nodes according to their transfer functions
ƒ If the node type is a call then …
ƒ Call nodes’ transfer function; if input changes:
ƒ Interrupts the processing of the caller method
ƒ Propagates arguments Pt(v1…vn) to all callees in Pt(a)
ƒ Processes the callees (one by one) completely (iterate in case of recursive calls)
ƒ Propagates back and merges the results Pt(r) of the callees
ƒ Updates heap size size(x’)
ƒ Continues processing the caller method …

[Figure: a Call node C.m with memory input x, target a, arguments v1 … vn, result r, and memory output x’]

Example transfer functions (if input changes, update as below, else skip):
ƒ Alloc C: Pt(v) = {oi} (unique i)
ƒ Load C.f: Pt(v) = ∪oi∈Pt(a) heap(oi, f)
ƒ Store C.f: ∀oi∈Pt(a): heap(oi, f) ∪= Pt(v); size(x’) = heap.size
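The Load and Store transfer functions above can be sketched in Python under an assumed encoding: points-to sets as Python sets of abstract object ids, the abstract heap as a map from (object, field) to a points-to set; the memory ports x/x’ and heap size are omitted:

```python
# Sketch of the Load/Store transfer functions of the points-to analysis.

heap = {}  # (abstract object, field) -> points-to set

def load(pt_a, f):
    """Pt(v) = union of heap(oi, f) over all oi in Pt(a)."""
    return set().union(*(heap.get((o, f), set()) for o in pt_a))

def store(pt_a, f, pt_v):
    """forall oi in Pt(a): heap(oi, f) |= Pt(v)  (a weak update)."""
    for o in pt_a:
        heap.setdefault((o, f), set()).update(pt_v)

store({'o1', 'o2'}, 'f', {'o3'})   # a.f = v with Pt(a)={o1,o2}, Pt(v)={o3}
print(load({'o1'}, 'f'))
# {'o3'}
```

The store is a weak update (it unions into every possible target object), which matches the accumulating ∪= semantics above.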
Flow-sensitivity
ƒ The Points-to-SSA approach has two features that contribute to flow-sensitivity:
1. Locally flow-sensitive: We have SSA edges imposing the correct ordering among all operations (calls and field accesses) within a method.
2. Restricted globally flow-sensitive: We have the simulated execution technique that follows the inter-procedural control flow from one method to another.
ƒ Effect:
ƒ An access a1.x will never be affected by another a2.x that we process after a1.x.
ƒ Each return only contains contributions from previously processed calls, i.e., reduced mixing of values returned by calls targeting the same method.
ƒ But: information is accumulated in method arguments to guarantee scalability.

Example: Global flow-sensitivity
Context-sensitivity
ƒ Context-insensitive analysis
ƒ Actual arguments of calls targeting the same method are mixed in the formal arguments.
ƒ Advantage (scalability): ensures termination for recursive call sequences and reaches a fixed point quickly
ƒ Disadvantage (accuracy): inaccurate due to mixed return values
ƒ Context-sensitive analysis
ƒ Divide calls targeting a given method a.m(v…) into a finite number of different categories
ƒ Analyze them separately – as if they defined different copies of that method.
ƒ We can define contexts as an abstraction of call stack situations:
  context: [m, call id, a, v1, …, vn] → [c1, …, cp]
ƒ Often it is just one context per call stack situation:
  context: [m, call id, a, v1, …, vn] → ci

Context-sensitive call handling

Call(m, call id, xin, a, v1, …, vn) → [xout, r]:
  [xout, r] = [0, ⊥]
  for all c ∈ context[m, call id, a, v1, …, vn] do
    if [xin, a, v1, …, vn] ⊑ previous arguments(m, c) then
      r = previous return(m, c)
      xout = xin
    else
      args = previous arguments(m, c) + [xin, a, v1, …, vn]
      previous arguments(m, c) = args
      [xout, r] = [xout, r] + processMethod(m, args)
      previous return(m, c) = r
    end if
  end for
  return [xout, r]
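The caching structure of this call handling might look as follows in Python. All names (prev_args, process_method, context_of) are my own; the xin/xout memory states are omitted and the subsumption check is simplified to set inclusion:

```python
# Sketch: per (method, context) cache accumulated arguments and the last
# return value; a call whose arguments are already subsumed by the cache
# reuses the cached return instead of re-processing the callee.

prev_args = {}  # (method, context) -> accumulated argument set
prev_ret = {}   # (method, context) -> cached return value

def handle_call(method, context_of, process_method, args):
    result = set()
    for c in context_of(method, args):
        key = (method, c)
        if key in prev_ret and args <= prev_args[key]:
            result |= prev_ret[key]          # reuse previous return
        else:
            prev_args[key] = prev_args.get(key, set()) | args
            prev_ret[key] = process_method(method, prev_args[key])
            result |= prev_ret[key]
    return result
```

Accumulating arguments per context is what guarantees termination for recursive call chains: the cached argument set only grows, so eventually every call is subsumed.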
Examples: Context abstractions

Object-sensitivity
ƒ A context is given by a pair (m,o)
ƒ where o ∈ O is a unique abstract object in the points-to value analyzed for the call target variable this.
ƒ Linearly (in program size) many contexts,
ƒ In practice slightly more precise than This-sensitivity.

This-sensitivity
ƒ A context is given by a pair (m,this)
ƒ where this ∈ 2^O is the unique points-to value (set of abstract objects) analyzed for the call target variable this.
ƒ Exponentially many contexts (in practice ok),
ƒ In practice an order of magnitude faster than Object-sensitivity.

Examples: Precision

In favor of Object-sensitivity
ƒ Method definitions:
  m() { field = this; }
  V n() { return field; }
ƒ Call 1: a1 = {o1, o2}; a1.m()
ƒ Call 2: a2 = {o1}; r2 = a2.n()
ƒ Object-sensitivity: Pt(r2) = {o1}.
ƒ This-sensitivity: Pt(r2) = {o1, o2}.

In favor of This-sensitivity
ƒ Method definition:
  V m(V v) { return v; }
ƒ Call 1: a1 = {o1}, v1 = {o3}; r1 = a1.m(v1)
ƒ Call 2: a2 = {o1, o2}, v2 = {o4}; r2 = a2.m(v2)
ƒ Object-sensitivity: Pt(r2) = {o3, o4}.
ƒ This-sensitivity: Pt(r2) = {o4}.
Results
ƒ Fast and accurate P2A
ƒ Points-to SSA ⇒ locally flow-sensitive P2A
ƒ Simulated execution ⇒ globally flow-sensitive P2A, fast
ƒ Context-insensitive in the first presented baseline version
ƒ More accurate (in theory and practice ca. 20%) and 2× as fast compared to classic flow- and context-insensitive P2A
ƒ Fast: < 1 min on javac with > 300 classes.
ƒ Context-sensitive variant (this-sensitivity) even more accurate
ƒ As fast and up to 3× as precise compared to classic flow- and context-insensitive P2A
ƒ As precise and 10× as fast compared to the best known context-sensitive variant (object-sensitivity) of classic P2A
ƒ Shows in client analyses like synchronization removal and static garbage collection (escape and side effect analysis)

Assignment I

Compute the SSA graph for:

1: c := 0;
2: while a/=0 do
3:   c := c+b;
4:   a := a-1;
5: od;
   stdout := c

ƒ Define the program with immature φ-functions and the final version.
ƒ Draw the graph (replacing variables by edges). Note that stdout := c is actually a function call.
Assignment II
ƒ Leave the SSA form of the program from Assignment I:
ƒ Remove φ-functions
ƒ Perform copy propagation
ƒ Compare the produced program to the original one
ƒ Perform register allocation for 3 registers
ƒ Do it mechanically using liveness analysis and graph coloring
ƒ Why is it immediately clear that 3 registers are sufficient but 2 are not?

Context-sensitive analysis
ƒ Distinguish different call contexts of a method
ƒ In general: distinguish different execution paths to a program point
ƒ Exponentially (in program size) many paths in a sequential program
ƒ Exponentially many analysis values
ƒ (Leaving aside the problem of analysis) How to capture the results efficiently?

[Figure: branching fragment with three forks: x=1 | x=2; then y=x, b=3 | y=2, b=4; then a=x | a=y]
113
Context-insensitive
Data flow analysis
Context-sensitive
Data flow analysis
x=1
x=2
y=x
b=3
y=2
b=4
a=x
a=y
x+y = ?{2,3,4}
{4,5,6}
a+b = ?
a=x
a=y
000 001 010 011 100
1
1
1
1
2
1
1
2
2
2
1
1
1
1
2
3
3
4
4
3
x
y
a
b
101
2
2
2
3
110
2
2
2
4
111
2
2
2
4
2(x+y) >= a+b ?
if
2(x+y)>=(a+b)
if
2(x+y)>=(a+b)
Representation of context in decision tables
Too large! Too much redundancy!
114
115
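The decision table above can be reproduced by enumerating the eight branch combinations; a small Python sketch (the bit-to-fork mapping is an assumption: bit i encodes the branch taken at fork i):

```python
# Enumerate all contexts of the three-fork fragment and collect the
# per-context values of x, y, a, b.

def run(b1, b2, b3):
    x = 1 if b1 == 0 else 2            # fork 1: x=1 | x=2
    y, b = (x, 3) if b2 == 0 else (2, 4)  # fork 2: y=x,b=3 | y=2,b=4
    a = x if b3 == 0 else y            # fork 3: a=x | a=y
    return x, y, a, b

table = {(b1, b2, b3): run(b1, b2, b3)
         for b1 in (0, 1) for b2 in (0, 1) for b3 in (0, 1)}
print(sorted({x + y for x, y, a, b in table.values()}))  # [2, 3, 4]
print(sorted({a + b for x, y, a, b in table.values()}))  # [4, 5, 6]
# yet in every single context the test is decidable:
print(all(2 * (x + y) >= a + b for x, y, a, b in table.values()))  # True
```

The mixed value sets match the context-insensitive result, while the per-context view decides the comparison; the table, however, has one column per context, which is exactly the redundancy χ-terms avoid.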
Concept of χ terms

[Figure: the branching fragment annotated with χ-terms: the x-fork yields x = χ1(1,2); the y/b-fork yields y = χ2(χ1(1,2),2) and b = χ2(3,4); the a-fork yields a = χ3(χ1(1,2), χ2(χ1(1,2),2)). The per-context value of a+b is then the χ-term χ3(χ2(χ1(4,5),χ1(5,6)), χ2(χ1(4,5),6)), whose subterms abstract to the sets {4,5}, {5,6}, 6 and finally ⊑ {4,5,6}]
Advantages of χ-Terms
ƒ Compact representation of context-sensitive information
ƒ Delayed widening (abstract interpretation) of terms until no more memory: no unnecessary loss of information
  χ3(χ2(χ1(4,5),χ1(5,6)),χ2(χ1(4,5),6)) ⊑ χ3(χ2({4,5},{5,6}),χ2({4,5},6)) ⊑ χ3({4,5,6},{4,5,6}) = {4,5,6} ⊑ [4,6] ⊑ ⊤
ƒ χ-Terms are an implementation of decision diagrams.
ƒ Adaptation of OBDD implementation techniques.
ƒ Symbolic computation with χ-terms (simplification of terms),
ƒ Transition of the idea of the SSA φ-function to data-flow values
ƒ Especially interesting for address values, to distinguish memory partitions as long as possible

Data-Flow Analyses (DFA) on SSA
ƒ Each SSA node type n has a concrete semantics [n]
ƒ On execution of n, map inputs i to outputs o: o = [n](i)
ƒ inputs i and outputs o are records of typed values (P1)
ƒ [n] :: type(i1) × … × type(ik) → type(o1) × … × type(ol)
ƒ Each static data-flow analysis abstracts from concrete semantics and values
ƒ The abstract semantics Tn is called the transfer function of node type n
ƒ On analysis of n, map abstract analysis inputs a(i) to abstract analysis outputs a(o): a(o) = Tn(a(i))
ƒ Tn :: type(a(i1)) × … × type(a(ik)) → type(a(o1)) × … × type(a(ol))
ƒ For each abstract semantics type A = type(a(•)) – the analysis universe – there is a partial order relation ⊆ and a join operation ∪ (supremum),
ƒ ⊆ :: A × A
ƒ ∪ :: A × A → A with ∪ = Tφ
ƒ (A, ⊆) defines a complete partial order
119
DFA on SSA (cont.)
Generalization to χ-terms
ƒ Provided that transfer functions are monotone: x ≤ y ⇒ Tn(x)
≤ Tn(y) with x,y ∈ A1 × … × Ak and ≤ defined element-wise
with ⊆ of the respective abstract semantics type
ƒ Following iteration terminates in a unique fixed point
ƒ Given such a context-insensitive analysis (lattices for
abstract values, set of transfer functions, initialization of start
node) we can systematically construct a context-sensitive
analysis
ƒ χ-term algebras Χ over abstract semantic values a ∈ A
introduced
ƒ Initialize the input of each node the SSA graph with the smallest
(bottom) element of the Lattice/CPO corresponding its abstract
semantics type
ƒ Initialize the start nodes of the SSA graph with proper abstract values
of the corresponding its abstract types
ƒ Attach each node with its corresponding transfer function
ƒ Uniform randomly compute abstract output values
ƒ (Fewer updates use SCC and interval analysis to determine an
traversal strategy, backward problems analyzed analogously)
ƒ a ∈ A ⇒ a ∈ ΧA
ƒ t1 ,t2 ∈ Χ ⇒ χi(t1, t2) ∈ ΧA
ƒ Induces sensitive CPO for abstract values (ΧA , ⊑) and
for a1 ,a2 ∈ A and t1 ,t2 ,t3 ,t4 ∈ Χ:
a1 ⊆ a2 ⇒ a1 ⊑ a2
χi(a1, a2) ⊑ a1 ∪ a2
t1 ⊑ t3 , t2 ⊑ t4 ⇒ χi(t1, t2) ⊑ χi(t3, t4)
ƒ New transfer functions induced
120
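The iteration scheme above can be sketched as a naive chaotic iteration in Python (an assumed encoding: abstract values compared with !=, transfer functions as callables over the list of predecessor values):

```python
# Naive fixed-point iteration: inputs start at bottom, each node's
# monotone transfer function recomputes its output from its
# predecessors' outputs until nothing changes.

def fixpoint(nodes, edges, transfer, bottom):
    """nodes: node ids; edges: (src, dst) data-flow edges;
    transfer[n]: list of predecessor values -> output value."""
    value = {n: bottom for n in nodes}
    preds = {n: [s for s, d in edges if d == n] for n in nodes}
    changed = True
    while changed:  # terminates for monotone transfers on a finite CPO
        changed = False
        for n in nodes:
            new = transfer[n]([value[p] for p in preds[n]])
            if new != value[n]:
                value[n], changed = new, True
    return value

# tiny example: two constant nodes feeding a phi (set union as T_phi)
transfer = {'a': lambda ins: {1},
            'b': lambda ins: {2},
            'phi': lambda ins: set().union(*ins) if ins else set()}
print(fixpoint(['a', 'b', 'phi'], [('a', 'phi'), ('b', 'phi')],
               transfer, set())['phi'])
# {1, 2}
```

A worklist seeded from SCCs in topological order, as the slide suggests, would trigger far fewer recomputations than this node sweep.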
New transfer functions
ƒ φ-nodes’ transfer functions:
ƒ Insensitive: Tφ = ∪ = a(i1) ∪ … ∪ a(ik)
ƒ Sensitive: Sφ = ⊔b = a’(i1) ⊔b … ⊔b a’(ik), with b the block number of the φ-node, and for a1, a2 ∈ A and t1, t2, t3, t4 ∈ Χ:
  a1 ⊔b a2 = χb(a1, a2)
  χx(t1, t2) ⊔b χy(t3, t4) = χb(t1 ∪ t2, t3 ⊔b t4)  iff x = b (the case y = b is analogous)
  χx(t1, t2) ⊔b χy(t3, t4) = χb(t1 ⊔b t2, t3 ⊔b t4)  otherwise
ƒ Ordinary operations’ (τ-nodes’) transfer functions:
ƒ Insensitive (w.l.o.g. binary operation): Tτ :: Aa × Ab → Ac
ƒ Sensitive: Sτ :: ΧAa × ΧAb → ΧAc, and for a1, a2 ∈ Aa, Ab and t1, t2 ∈ Χa, Χb:
  Sτ(a1, a2) = Tτ(a1, a2)
  Sτ(χx(t1,t2), χy(t3,t4)) = χk(Sτ(cof(χx(t1,t2),χk,1), cof(χy(t3,t4),χk,1)),
                                Sτ(cof(χx(t1,t2),χk,2), cof(χy(t3,t4),χk,2)))
  with k the larger of x, y and cof the co-factorization

Sensitive Transfer Schema (case a)

[Figure: a τ-node with two χi inputs produces a χi output whose branches are τ applied branch-wise]

Sensitive Transfer Schema (case b)

[Figure: a τ-node with a χi input and a χ<i input duplicates the χ<i term into both branches of the χi output]

Co-factorization
ƒ cof(χx(t1,t2),χk,i) selects the i-th branch of a χ-term if χx = χk, and returns the whole χ-term otherwise:
  cof(χx(t1,t2),χk,i) = χx(t1,t2)  iff k > x
  cof(χx(t1,t2),χk,1) = t1         iff k = x
  cof(χx(t1,t2),χk,2) = t2         iff k = x
Context-sensitive φ-node transfer function

[Figure: the branching fragment annotated with χ-terms computed by Sφ: x = χ1(1,2); y = χ2(χ1(1,2),2); b = χ2(3,4); a = χ3(χ1(1,2), χ2(χ1(1,2),2)); finally a+b is computed]

Example revisited: insensitive
ƒ SSA node ⊕
ƒ Semantics
ƒ [⊕] :: Int × Int → Int
ƒ [⊕](a,b) = a+b
ƒ Abstract Int values {⊥, 1, 2, …, maxint, ⊤}
ƒ Context-insensitive transfer function:
ƒ T⊕(⊥, x) = T⊕(x, ⊥) = ⊥
ƒ T⊕(⊤, x) = T⊕(x, ⊤) = ⊤
ƒ T⊕(a,b) = [⊕](a,b) = a+b for a,b ∈ Int
ƒ Context-insensitive meet function:
ƒ Tφ(⊥, x) = Tφ(x, ⊥) = x
ƒ Tφ(⊤, x) = Tφ(x, ⊤) = ⊤
ƒ Tφ(x, x) = x
ƒ Tφ(x, y) = ⊤ otherwise

Context-sensitive a ⊕ b

S⊕( χ3(χ1(1,2),χ2(χ1(1,2),2)) , χ2(3,4) )
= χ3( S⊕(cof(χ3(χ1(1,2),χ2(χ1(1,2),2)),χ3,1), cof(χ2(3,4),χ3,1)),
      S⊕(cof(χ3(χ1(1,2),χ2(χ1(1,2),2)),χ3,2), cof(χ2(3,4),χ3,2)) )
= χ3( S⊕(χ1(1,2), χ2(3,4)),
      S⊕(χ2(χ1(1,2),2), χ2(3,4)) )
= χ3( χ2(S⊕(cof(χ1(1,2),χ2,1), cof(χ2(3,4),χ2,1)),
         S⊕(cof(χ1(1,2),χ2,2), cof(χ2(3,4),χ2,2))),
      χ2(S⊕(cof(χ2(χ1(1,2),2),χ2,1), cof(χ2(3,4),χ2,1)),
         S⊕(cof(χ2(χ1(1,2),2),χ2,2), cof(χ2(3,4),χ2,2))) )
= χ3( χ2(S⊕(χ1(1,2),3), S⊕(χ1(1,2),4)),
      χ2(S⊕(χ1(1,2),3), S⊕(2,4)) )

Context-sensitive a ⊕ b (cont.)

S⊕( χ3(χ1(1,2),χ2(χ1(1,2),2)) , χ2(3,4) ) = …
= χ3( χ2(S⊕(χ1(1,2),3), S⊕(χ1(1,2),4)), χ2(S⊕(χ1(1,2),3), S⊕(2,4)) )
= χ3( χ2(χ1(4,5), χ1(5,6)), χ2(χ1(4,5), 6) )

[Figure: the resulting χ-term tree for a+b, with root χ3, inner nodes χ2 and χ1, and leaves 4, 5, 6]
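The definitions of cof and Sτ can be turned into a small executable sketch (my own Python encoding of χ-terms, not the lecture's implementation); applied to the running example it reproduces χ3(χ2(χ1(4,5),χ1(5,6)), χ2(χ1(4,5),6)):

```python
# Chi(i, l, r) encodes the term chi_i(l, r); plain Python values act as
# the abstract values of the insensitive analysis.

class Chi:
    def __init__(self, i, left, right):
        self.i, self.left, self.right = i, left, right
    def __eq__(self, other):
        return (isinstance(other, Chi) and self.i == other.i and
                self.left == other.left and self.right == other.right)
    def __repr__(self):
        return f"chi{self.i}({self.left},{self.right})"

def cof(t, k, branch):
    """Co-factorization: the branch of t w.r.t. chi_k if t's top index
    is k; the whole term otherwise (k > x, or t is a plain value)."""
    if isinstance(t, Chi) and t.i == k:
        return t.left if branch == 1 else t.right
    return t

def s_op(T, a, b):
    """Sensitive transfer S_tau lifted from the insensitive T_tau."""
    if not isinstance(a, Chi) and not isinstance(b, Chi):
        return T(a, b)
    k = max(t.i for t in (a, b) if isinstance(t, Chi))  # larger of x, y
    return Chi(k, s_op(T, cof(a, k, 1), cof(b, k, 1)),
                  s_op(T, cof(a, k, 2), cof(b, k, 2)))

# the running example: a + b with T_plus on concrete Ints
a = Chi(3, Chi(1, 1, 2), Chi(2, Chi(1, 1, 2), 2))
b = Chi(2, 3, 4)
print(s_op(lambda x, y: x + y, a, b))
# chi3(chi2(chi1(4,5),chi1(5,6)),chi2(chi1(4,5),6))
```

Recursing on the largest index at each step is what keeps the branch indices ordered along every path, mirroring the variable ordering of OBDDs.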
Assignment III

Given the SSA fragment:

[Figure: SSA fragment with two constants 1, a φ-node, and an ⊕-node feeding x]

ƒ Perform context-insensitive data-flow analysis (using the definitions on the previous slides). What is the value at the entry of node x?
ƒ Perform context-sensitive data-flow analysis (using the definitions on the previous slides). What is the value at the entry of node x?
ƒ Why is the former less precise than the latter?
ƒ Construct a scenario where you could take advantage of that precision in an optimization!