Shape Analysis via 3-Valued Logic.
Jenny Sannikov
jenny_sv@yahoo.com
Shape Analysis with applications.
Chapter 4.6









2. Lecture Outline.
Collecting Semantics using first order logic.
3-Valued logic and embedding.
Simple abstract semantics using logic.
More precise abstract semantics.
3. Collecting Semantics using Logic.
Represent states using logical structures.
Construct the program control flow graph with a distinguished node start.
Define the set of logical structures at start.
Define the meaning of program conditions using closed first order
formulae.
Define the meaning of statements using first order formulae.
* The set of structures can be finite or infinite.
* Suppose that at the beginning of the program the initial values are
known: for example, the whole memory is empty or, in a C program, all
global variables are 0.
4. The SWhile Programming Language.
Recall from the previous lecture that SWhile is an extension of the While
language that adds dynamic memory allocation, pointers and destructive update.
Each allocated memory element consists of two cells: car and cdr.
Abstract Syntax:
sel ::= car | cdr
a ::= x | x.sel | null | n | a1 op_a a2
b ::= true | false | not b | b1 op_b b2 | a1 op_r a2
S ::= [x := a]^l | [x.sel := a]^l | [x := malloc()]^l |
      [skip]^l | S1 ; S2 | if [b]^l then S1 else S2 | while [b]^l do S
Here [x.sel := a]^l is the destructive update and [x := malloc()]^l is the
memory allocation.
5,6. An Example.
The following program creates a linked list with count nodes.
[x := null]^1 ;
while ([count > 0]^2) (
  [t := malloc()]^3 ;
  [t.cdr := x]^4 ;
  [x := t]^5 ;
  [count := count - 1]^6 )


The following predicates are defined:
Unary predicates, one for each pointer variable in the program:
o x(v) - true for the cell pointed to by x.
o t(v) - true for the cell pointed to by t.
Binary predicates:
o car(v1,v2) - true if car(v1)=v2.
o cdr(v1,v2) - true if cdr(v1)=v2.
o eq(v1,v2) - true if v1 and v2 denote the same cell.
Let's analyze the control flow graph of the given program (as it appears in
slide 6):
1. x:=null "means" {x'(v) := 0}: x doesn't point to any cell in memory.
2. count>0 is ignored, meaning that the number of allocated
elements is unknown and unimportant. The loop can be performed any
number of times.
3. The semantics of t:=malloc() is {let v0 := new in
t'(v) := eq(v, v0)}
Here v ranges over all cells in memory.
4. t.cdr:=x :
{message $\neg\exists v: t(v)$ -> ...
cdr'(v1,v2) := (t(v1) ? x(v2) : cdr(v1,v2))}
The first part handles errors. If t doesn't point to any cell in memory
(there exists no v such that t(v) is true), an error message can be
generated. The second part is based on the well-known C shorthand for an
"if then else" expression. In logic,
$(\varphi \,?\, \varphi_1 : \varphi_2) \equiv (\varphi \wedge \varphi_1) \vee (\neg\varphi \wedge \varphi_2)$.
5. x:=t: {x'(v) := t(v)}. The semantics of the assignment is that afterwards
x points to a cell if and only if t points to it.
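The predicate-update formulae above can be run directly on a concrete (2-valued) structure. The following Python sketch is only an illustration: the `Structure` class, the helper names and the choice of integers as cell names are assumptions made here, not part of the lecture.

```python
from dataclasses import dataclass, field

@dataclass
class Structure:
    """A 2-valued structure: the cells plus the sets on which predicates hold."""
    cells: set = field(default_factory=set)
    x: set = field(default_factory=set)    # cells v with x(v) = 1
    t: set = field(default_factory=set)    # cells v with t(v) = 1
    cdr: set = field(default_factory=set)  # pairs (v1, v2) with cdr(v1, v2) = 1

def assign_null_x(s):
    # x := null  means  x'(v) := 0
    s.x = set()

def malloc_t(s):
    # t := malloc()  means  let v0 = new in t'(v) := eq(v, v0)
    v0 = len(s.cells)  # fresh cell name (assumption: cells are numbered)
    s.cells.add(v0)
    s.t = {v0}

def set_t_cdr_x(s):
    # t.cdr := x  means  cdr'(v1, v2) := (t(v1) ? x(v2) : cdr(v1, v2))
    if not s.t:
        raise RuntimeError("t points to no cell")  # the 'message' part
    s.cdr = {(v1, v2) for v1 in s.cells for v2 in s.cells
             if (v2 in s.x if v1 in s.t else (v1, v2) in s.cdr)}

def assign_x_t(s):
    # x := t  means  x'(v) := t(v)
    s.x = set(s.t)

def build_list(count):
    """Run the example program; the counter only bounds the loop."""
    s = Structure()
    assign_null_x(s)
    for _ in range(count):
        malloc_t(s)
        set_t_cdr_x(s)
        assign_x_t(s)
    return s
```

For instance, `build_list(3)` yields three cells in which the newest cell is pointed to by both x and t and each cell's cdr is the cell allocated before it.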
7,8. Another Example – the reverse program.
Recall from the previous lecture:
[y := null]^1 ;
while ([x != null]^2) (
  [t := y]^3 ;
  [y := x]^4 ;
  [x := x.cdr]^5 ;
  [y.cdr := t]^6 )
The list of predicates is built as in the previous example: three
unary predicates (x(v), t(v), y(v)) and three binary ones.
Let's analyze the control flow graph of the given program (as it appears in
slide 8).
The semantics of most of the statements is as in the previous program. Pay
attention to the condition x!=null. Its semantics is $\exists v: x(v)$.
x:=x.cdr "means" {message $\neg\exists v: x(v)$ -> ...
x'(v) := $\exists v_1: x(v_1) \wedge cdr(v_1, v)$}.
9. Statement's Meaning Conclusion.
Pay attention to y.sel and x.sel: selectors are the way to move through the
data structure. The same formulae therefore apply to any linked data
structure, not only to lists.
10. Condition's Meaning Conclusion.
Pay attention to the difference between eq and the program condition ==. The
binary predicate eq denotes equality of allocated (dynamic) cells.
The == condition denotes equality of syntactic program expressions denoting
L-values; it succeeds when both point to the same place in memory. There is
a difference between variables (statically determined) and locations
(dynamically allocated and unbounded in number).
11. Collecting Semantics.
$CS(start) = \{\langle \emptyset, \emptyset \rangle\}$
$CS(v) = \{[\![st(u)]\!](S) : (u \to v) \in E,\ S \in CS(u)\}$
$\quad \cup\ \{S : S \in CS(u),\ (u \to v) \in E_t,\ S \models cond(u)\}$
$\quad \cup\ \{S : S \in CS(u),\ (u \to v) \in E_f,\ S \models \neg cond(u)\}$
We deal here with sets of concrete states represented by logical
structures. We assume that the program's control flow graph has a start node
(otherwise such a node can be added). The start node has no predecessors.
For a node v, CS(v) is a union of sets and is usually infinite. Given a
logical structure describing the state at a predecessor node, we execute the
statement and get the current state. For a condition node there are two kinds
of outgoing edges, E_t and E_f. If the structure satisfies the condition (its
formula is closed, i.e. has no free variables, and evaluates to 1), then the
structure moves to the CS of the node along the E_t edge. If the structure
satisfies the negation of the formula, it moves along the E_f edge.
Look again at the control flow graph of slide 6. The CS of the first node in
the loop ([count>0]) is:
$CS(2) = \{[\![x := null]\!](S) \mid S \in CS(1)\} \cup \{[\![x := t]\!](S) \mid S \in CS(5)\}$
It is possible that at this point the memory is empty, or that it holds one,
two, three or more cells, one of which is pointed to by both t and x. What is
clear is that in CS(2) x and t are always equal: either both are null or both
point to the same cell.
12.Three-Valued Logic.
We will evaluate all formulae in three-valued logic, where the third value ½
means "unknown".
Two-valued logic: 1 and 0 stand for true and false.
Three-valued logic:
o 1: True
o 0: False
o ½: Unknown
o A join semi-lattice (information order): $0 \sqcup 1 = 1/2$.


13. 3 – Valued Logical Structures.
A set of individuals (nodes) U.
Predicate meaning:
- $p^S : (U^S)^k \to \{0, 1, 1/2\}$ for each predicate p in P of arity k. The
values 1, 0 and 1/2 stand for true, false and don't know, indicating whether
the individuals in U satisfy the predicate p.
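The three truth values and the join of the semi-lattice can be sketched in a few lines of Python. The encoding of ½ as the float `0.5` and the names `join` and `leq_info` are assumptions chosen for this illustration.

```python
HALF = 0.5  # the third truth value, "unknown"

def join(a, b):
    """Least upper bound in the information order: equal values stay,
    different values (such as 0 and 1) become 1/2."""
    return a if a == b else HALF

def leq_info(a, b):
    """Information order: a is below b if they are equal or b is 1/2."""
    return a == b or b == HALF
```

With this encoding, `join(0, 1)` is ½, while the definite values 0 and 1 are incomparable with each other.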
14. Example.
In the following example $x^S$ and $y^S$ are unary predicates and $car^S$,
$cdr^S$ and $eq^S$ are binary predicates. x points only at u1, so
$x^S = \{u_1 \mapsto 1,\ u_2 \mapsto 0,\ u_3 \mapsto 0\}$.
The list has three or more nodes. Since u3 is a summary node, its equality
with itself is unknown: $eq^S(u_3, u_3) = 1/2$.






15,16. Embedding.
A pre-partial order (preorder) on 3-valued logical structures.
$S_1 \sqsubseteq S_2$ means: every concrete state represented by $S_1$ is also
represented by $S_2$ (but $S_1$ is more precise).
The sets of nodes in $S_1$ and $S_2$ may be different.
o The nodes (abstract locations) carry no meaning of their own.
$S_1 \sqsubseteq^f S_2$ holds if:
o f maps the individuals of $S_1$ onto those of $S_2$. f is onto, meaning
that for every element of $S_2$ there is an element of $S_1$ mapped to it.
o $p^{S_1}(u_1, \ldots, u_k) \sqsubseteq p^{S_2}(f(u_1), \ldots, f(u_k))$ for every predicate p.
$S_1 \sqsubseteq S_2$ if there exists f such that $S_1 \sqsubseteq^f S_2$.
The order is a preorder: reflexive and transitive, but not anti-symmetric.
o Reflexive: take f to be the identity
o Transitive: compose the mappings

This induces a preorder on P(3-Struct):
o Set union is a least upper bound
o Finite height, which is important for terminating the iterative process.
Details later.
$\gamma$: 3-Struct $\to$ P(2-Struct)
o $\gamma(S) = \{S' : S' \in \text{2-Struct},\ S' \sqsubseteq S\}$
$\gamma$: P(3-Struct) $\to$ P(2-Struct)
o $\gamma(XS) = \bigcup_{S \in XS} \gamma(S)$
* Two concrete structures are considered equal if they are isomorphic.
Note that $0 \sqsubseteq 1/2$ and $1 \sqsubseteq 1/2$.
The following example demonstrates an embedding:
[Figure: S1 is a list x, L1, L2, L3, L4, L5 linked by cdr; S2 is a list
x, L1, L2, L3 linked by cdr, in which the last node is a summary node.]
The question asked here is whether $S_1 \sqsubseteq^f S_2$ for some f. Yes:
map L1 to L1, L2 to L2 and the rest (L3..L5) to the summary node. This f is
onto. The eq and cdr predicates satisfy the requirement, since
$0 \sqsubseteq 1/2$ and $1 \sqsubseteq 1/2$.
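The embedding check for this example can be sketched in Python. The dictionary encoding of structures, the function name `embeds` and the exact predicate values chosen for S2 (in particular the ½ edges around the summary node L3) are assumptions reconstructed from the description, not taken from the slide.

```python
from itertools import product

HALF = 0.5

def leq_info(a, b):
    # information order: equal values, or the right-hand side is 1/2
    return a == b or b == HALF

def embeds(s1, s2, f):
    """Check that f embeds s1 into s2 (missing predicate entries mean 0)."""
    if {f[u] for u in s1["nodes"]} != set(s2["nodes"]):
        return False  # f must be onto
    for name, arity in s1["arity"].items():
        for args in product(s1["nodes"], repeat=arity):
            v1 = s1["preds"][name].get(args, 0)
            v2 = s2["preds"][name].get(tuple(f[u] for u in args), 0)
            if not leq_info(v1, v2):
                return False
    return True

cells = ["L1", "L2", "L3", "L4", "L5"]
s1 = {"nodes": cells,
      "arity": {"x": 1, "cdr": 2, "eq": 2},
      "preds": {"x": {("L1",): 1},
                "cdr": {(a, b): 1 for a, b in zip(cells, cells[1:])},
                "eq": {(u, u): 1 for u in cells}}}
s2 = {"nodes": ["L1", "L2", "L3"],  # L3 plays the role of the summary node
      "preds": {"x": {("L1",): 1},
                "cdr": {("L1", "L2"): 1, ("L2", "L3"): HALF, ("L3", "L3"): HALF},
                "eq": {("L1", "L1"): 1, ("L2", "L2"): 1, ("L3", "L3"): HALF}}}
f = {"L1": "L1", "L2": "L2", "L3": "L3", "L4": "L3", "L5": "L3"}
```

Here `embeds(s1, s2, f)` holds, whereas remapping L4 to L2 breaks the embedding because a definite cdr edge of S1 would land on a pair where S2 has cdr = 0.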
17. Tight Embedding.







$S = \langle U^S, P^S \rangle$
$f: U^S \to U^\#$ such that f is onto.
Define $S^\# = \langle U^\#, P^\# \rangle$
o $p^\#(u^\#_1, \ldots, u^\#_k) = \bigsqcup \{ p^S(u_1, \ldots, u_k) : f(u_i) = u^\#_i \}$
Don't map to ½ unless you need to (join: $0 \sqcup 1 = 1/2$).
$S \sqsubseteq^f S^\#$
18. The Abstraction Principle.
Partition the individuals into equivalence classes based on the values of
their unary predicates (the blur operation in the next example).
Collapse the other predicates via join ($\sqcup$).
The result is a special case of tight embedding in which the mapping f is
determined by the (concrete) values of the unary predicates.
19. An example.
We can see that three elements of the concrete world, u2, u3 and u4, are
collapsed into one element u234 in the abstract world. These three elements
are grouped into one equivalence class because all their unary predicates
(x, y) are identical. Their binary predicate n (next) is not identical, but it
is collapsed by the join on each partition.
As a result, information is lost: in the abstract world we have a summary
node, which means that any of u2, u3, u4 may point to any of u2, u3, u4. In
the concrete world we knew that each cell points only to its successor.
This operation is called blur and can also be applied to an abstract 3-valued
structure. In this case it acts as a kind of widening: it embeds a 3-valued
structure in a potentially less precise but more compact one.
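A minimal Python sketch of blur follows. It is an illustration under assumptions: the dictionary encoding, the function name `blur`, naming each abstract node by its tuple of unary values, and the concrete list used as input (x points to u1, y points nowhere) are all choices made here.

```python
from itertools import product

HALF = 0.5

def join(a, b):
    return a if a == b else HALF

def blur(nodes, unary, binary):
    """Merge nodes that agree on all unary predicates; join the binary ones.

    unary:  {name: {node: value}}, binary: {name: {(n1, n2): value}};
    missing entries mean 0. Returns (abstract nodes, unary, binary, f).
    """
    names = sorted(unary)
    # f maps every node to its equivalence class (its unary values).
    f = {u: tuple(unary[p].get(u, 0) for p in names) for u in nodes}
    abs_nodes = sorted(set(f.values()))
    abs_unary = {p: {a: a[i] for a in abs_nodes} for i, p in enumerate(names)}
    abs_binary = {}
    for q, table in binary.items():
        out = {}
        for u1, u2 in product(nodes, repeat=2):
            key = (f[u1], f[u2])
            val = table.get((u1, u2), 0)
            out[key] = join(out[key], val) if key in out else val
        abs_binary[q] = out
    return abs_nodes, abs_unary, abs_binary, f

nodes = ["u1", "u2", "u3", "u4"]
unary = {"x": {"u1": 1}, "y": {}}
binary = {"n": {("u1", "u2"): 1, ("u2", "u3"): 1, ("u3", "u4"): 1},
          "eq": {(u, u): 1 for u in nodes}}
abs_nodes, abs_unary, abs_binary, f = blur(nodes, unary, binary)
```

In the result u2, u3 and u4 share one abstract node, its n self-loop and its eq value are both ½ (so it is a summary node), and the n edge from the node of u1 into it is ½ as well.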
20. Boolean Connectives [Kleene].
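A conservative meaning for the logical connectives is given by Kleene's
three-valued tables:
 and | 0    1    1/2       or  | 0    1    1/2       not |
 0   | 0    0    0         0   | 0    1    1/2       0   | 1
 1   | 0    1    1/2       1   | 1    1    1         1   | 0
 1/2 | 0    1/2  1/2       1/2 | 1/2  1    1/2       1/2 | 1/2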
21,22. Formal Semantics of First Order Formulae.
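For a structure $S = \langle U^S, P^S \rangle$.
Formulae $\varphi$ with free variables LVar.
Assignment z: LVar $\to U^S$ (maps free variables to individuals).
$[\![\varphi]\!]^S(z) \in \{0, 1, 1/2\}$ is the value of the formula $\varphi$
under the assignment z.
An inductive definition (max and min are taken with respect to 0 < 1/2 < 1):
o $[\![1]\!]^S(z) = 1$
o $[\![0]\!]^S(z) = 0$
o $[\![p(v_1, v_2, \ldots, v_k)]\!]^S(z) = p^S(z(v_1), z(v_2), \ldots, z(v_k))$
o $[\![\varphi_1 \vee \varphi_2]\!]^S(z) = \max([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
o $[\![\varphi_1 \wedge \varphi_2]\!]^S(z) = \min([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
o $[\![\neg\varphi_1]\!]^S(z) = 1 - [\![\varphi_1]\!]^S(z)$
o $[\![\exists v: \varphi_1]\!]^S(z) = \max\{[\![\varphi_1]\!]^S(z[v \mapsto u]) : u \in U^S\}$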
23,24. The embedding Theorem.
Evaluating a formula in S is conservative with respect to its evaluation in
the (potentially infinitely many) structures represented by S, i.e. those in
$\gamma(S)$.
Every formula $\varphi$ is preserved:
o $\varphi = 1$ in S $\Rightarrow$ $\varphi = 1$ in every $S' \in \gamma(S)$
o $\varphi = 0$ in S $\Rightarrow$ $\varphi = 0$ in every $S' \in \gamma(S)$
o $\varphi = 1/2$ in S $\Rightarrow$ don't know
Formally, if $S \sqsubseteq^f S'$, then for every formula $\varphi$ with free
variables LVar and every assignment z: LVar $\to U^S$:
o $[\![\varphi]\!]^S(z) \sqsubseteq [\![\varphi]\!]^{S'}(f \circ z)$

Having written the concrete semantics in logic, we get soundness for free.
o But the evaluation may be suboptimal (over-conservative).
For example, consider the formula $[\exists v : x(v) \,?\, 1 : 1]$ and a
structure in which $\exists v : x(v)$ evaluates to 1/2:
[Figure: the structure, with the pointer x.]
By definition, we have to evaluate
$(\exists v : x(v) \wedge 1) \vee (\neg\exists v : x(v) \wedge 1)$.
The calculated value of the formula is $(1/2 \wedge 1) \vee (1/2 \wedge 1) = 1/2$,
but clearly the formula is true. A practical solution for this problem is a
more precise definition of the formula. However, the undecidability of
first-order logic implies that there are many formulae whose value is ½ in
3-valued logic while they have a definite value (1 or 0) in all represented
concrete structures.
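The inductive definition and the over-conservative example can be reproduced with a small evaluator. This Python sketch is an illustration only: the tuple encoding of formulae, the function name `evaluate` and the one-node structure with x = ½ are assumptions made here.

```python
HALF = 0.5

def evaluate(phi, nodes, preds, z=None):
    """Kleene evaluation of a first-order formula in a 3-valued structure.

    phi is a nested tuple, e.g. ("exists", "v", ("pred", "x", "v")).
    preds maps a predicate name to {tuple of nodes: value}; missing means 0.
    """
    z = z or {}
    op = phi[0]
    if op == "const":
        return phi[1]
    if op == "pred":
        return preds[phi[1]].get(tuple(z[v] for v in phi[2:]), 0)
    if op == "or":
        return max(evaluate(phi[1], nodes, preds, z),
                   evaluate(phi[2], nodes, preds, z))
    if op == "and":
        return min(evaluate(phi[1], nodes, preds, z),
                   evaluate(phi[2], nodes, preds, z))
    if op == "not":
        return 1 - evaluate(phi[1], nodes, preds, z)
    if op == "exists":
        return max((evaluate(phi[2], nodes, preds, {**z, phi[1]: u})
                    for u in nodes), default=0)
    raise ValueError(f"unknown connective: {op}")

# Assumed structure: a single node u on which x is indefinite.
nodes = ["u"]
preds = {"x": {("u",): HALF}}
ex = ("exists", "v", ("pred", "x", "v"))
# (exists v: x(v) ? 1 : 1) unfolded into its and/or/not form
phi = ("or", ("and", ex, ("const", 1)), ("and", ("not", ex), ("const", 1)))
value = evaluate(phi, nodes, preds)
```

The evaluator returns ½ for `phi` even though the formula is true in every concrete structure, which is exactly the loss of precision described above.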


25. Shape Analysis via Abstract Interpretation.
Iteratively compute a set of 3-valued structures for every program point.
Every statement transforms structures according to the predicate-update
formulae
o use 3-valued logic instead of 2-valued logic
o use exactly the predicate-update formulae of the concrete
semantics!
26. Abstract Semantics.
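$AI(start) = \{\langle \emptyset, \emptyset \rangle\}$
$AI(v) = \{[\![st(u)]\!]_3(S) : (u \to v) \in E,\ S \in AI(u)\}$
$\quad \cup\ \{S : S \in AI(u),\ (u \to v) \in E_t,\ S \models_3 cond(u)\}$
$\quad \cup\ \{S : S \in AI(u),\ (u \to v) \in E_f,\ S \models_3 \neg cond(u)\}$
Notice that if a condition evaluates to ½ on a structure, then both the then
branch and the else branch are explored.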
27. Example.
Let us look at the example program of slide 6. At every program point there is
a set of structures, which is finite because of the blur.
We assume that at the beginning the work list is initialized with all the
nodes and the structure is empty.
The process has to stop, since blur is performed at each step. After three
iterations we get a list of any length. At the next iteration the process
stops, although the program can perform an unbounded number of allocations.
Note that at the end a number of important facts are known, for
example that x and t must point to the same cell in memory.
1. At the beginning the structure is empty, the work list is {1,2,3,4,5,exit}
2. First node: the structure is empty, work list = {2,3,4,5,exit}
3. Second node: the structure is empty, work list = {3,4,5,exit}
4. Third node: the structure is empty, work list = {3,4,5,exit}
5. Fourth node: the structure is
[Figure: labels t]
Work list = {4,5,exit}
6. Fifth node: the structure is
[Figure: labels x, t]
Work list = {5,exit}
7. [Figure: labels x, t]
After blur there is no change, since the unary predicates are different.
8. Fourth node again:
[Figure: labels x, cdr, t]
After blur there is no change, since the unary predicates are different.
9. Fifth node again:
[Figure: labels x, cdr, t]
After blur there is no change, since the unary predicates are different.
10. [Figure: labels cdr, x, t]
After blur there is no change, since the unary predicates are different.
11. Fourth node again:
[Figure: labels x, cdr, t, cdr]
After blur there is no change, since the unary predicates are different.
12. Fifth node:
[Figure: labels x, cdr, t, cdr]
After blur the last two cells are collapsed, since their unary predicates
are equal (x(v) and t(v) are both 0).
13. Finally:
[Figure: labels x, t, summary node]
28.The Reverse Program example.
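Look at the following list:
[Figure: labels x, u1, u2]
After a few steps we get:
[Figure: labels y, x, u1]
The next step is x:=x.cdr. Evaluating the update formula
$x'(v) = \exists v_1: x(v_1) \wedge cdr(v_1, v)$ gives 0 for u1 (no cell has a
cdr pointing to u1) and ½ for u2 ($1/2 \wedge 1 = 1/2$), so we get:
[Figure: labels y, u1, x]
Now we only know that x may point somewhere after y; a lot of information has
been lost. In particular, the fact that the structure is a list has been
"forgotten". The process is sound, but not precise.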
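The loss of precision in this step can be checked by evaluating the update formula of x:=x.cdr with min and max. The structure below is an assumption reconstructed from the text: u1 is pointed to by x, u2 is a summary node, and the cdr edges into u2 have value ½.

```python
HALF = 0.5

nodes = ["u1", "u2"]                  # u2 is assumed to be the summary node
x = {"u1": 1, "u2": 0}
cdr = {("u1", "u2"): HALF, ("u2", "u2"): HALF}  # missing pairs are 0

def x_after(v):
    # x'(v) := exists v1: x(v1) and cdr(v1, v)
    # (max plays the role of exists, min the role of and)
    return max(min(x[v1], cdr.get((v1, v), 0)) for v1 in nodes)

new_x = {v: x_after(v) for v in nodes}
```

After the statement, x is definitely not on u1 and only "maybe" on the summary node u2, so the analysis can no longer tell which cell of the tail x points to.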



29. Intermediate Summary.
Predicate logic allows naturally expressing SOS for languages with pointers
and dynamically allocated structures.
3-valued logic provides a sound solution
o Immediate from Embedding theorem.
o All you need is to guarantee the SOS correctness.
But not very precise (as seen in the previous example).
30. More precise abstract interpretation.
We will use two separate mechanisms:
o Refine the abstraction (and the concretization)
o More precise abstract interpretation of basic statements
  - But not necessarily the best (induced) one.



31.The Instrumentation Principle
Increase precision by storing the truth-value of some designated formulae.
Introduce predicate-update formulae to update the extra predicates.
This is the first mechanism: refining the abstraction and the concretization.
32,33.Example: Heap Sharing.
Recall from the previous lecture:
the "is shared" predicate is 0 for all the nodes, so its join is 0 too.
In the next example the "is shared" information holds for each concrete node
and is preserved in the abstract representation. For each statement in the
semantics, a special rule for updating the "is" value is added. Notice that
this kind of sharing is "heap-sharing", since it ignores sharing caused by
multiple pointers from the stack. Indeed, stack-sharing need not be stored
explicitly in the abstract domain, since it can be recovered from the values
of the unary predicates.
In this case the node u is determined to be shared.
Adding more unary properties increases the precision but may also increase
the cost.
How should we choose those properties? There is no precise answer. We
should rely on prior knowledge about which properties are important for a
certain structure. For example, heap-sharing via selectors is important since
it allows us to determine list-ness along an (unbounded) traversal of a list
by x = x.sel.
34.Updating sharing x.sel:=y.
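is[sel]'(v) :=
($\exists v_1: x(v_1)$ ?
  (y(v) ?
    $\exists v_2: sel(v_2, v) \wedge \neg eq(v_2, v_1)$
  : sel(v1,v) ?
    $\exists v_2, v_3: \varphi_{is[sel]}(v_2, v_3, v) \wedge \neg eq(v_2, v_1) \wedge \neg eq(v_3, v_1)$
  : is[sel](v))
: is[sel](v))
where $\varphi_{is[sel]}(v_2, v_3, v) = sel(v_2, v) \wedge sel(v_3, v) \wedge \neg eq(v_2, v_3)$
The trivial solution is to update the predicate by re-evaluating its defining
formula, which is sound but, of course, not precise.
The price of instrumentation is that once a predicate is defined, it must be
updated by every statement.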
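On concrete (2-valued) heaps the update formula can be compared against recomputing is[sel] from its definition. The sketch below is an illustration under assumptions: heaps are encoded as sets of (source, target) pairs, x is assumed to point to exactly one cell, and the function names are made up for this example.

```python
def is_shared(cells, sel):
    """is[sel](v): the sel fields of two distinct cells point to v."""
    return {v for v in cells
            if len({a for (a, b) in sel if b == v}) >= 2}

def update_x_sel_y(cells, x, y, sel, shared):
    """Apply x.sel := y and update is[sel] with the update formula."""
    (v1,) = x  # assumption: x points to exactly one cell
    new_shared = set()
    for v in cells:
        others = {a for (a, b) in sel if b == v} - {v1}
        if v in y:
            # v gains a pointer from v1: shared iff another cell points to it
            val = len(others) >= 1
        elif (v1, v) in sel:
            # v loses the pointer from v1: two other pointers must remain
            val = len(others) >= 2
        else:
            val = v in shared
        if val:
            new_shared.add(v)
    # destructive update of the heap itself
    new_sel = {(a, b) for (a, b) in sel if a != v1} | {(v1, w) for w in y}
    return new_sel, new_shared

cells = {"a", "b", "c", "d"}
sel = {("a", "c"), ("b", "c")}          # c is shared
sel1, shared1 = update_x_sel_y(cells, {"a"}, {"d"}, sel, is_shared(cells, sel))
sel2, shared2 = update_x_sel_y(cells, {"a"}, {"c"}, {("b", "c")}, set())
```

In both runs the incrementally updated set agrees with `is_shared` evaluated on the new heap; in 2-valued logic the two coincide, and the benefit of the update formula only shows in the 3-valued setting.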
35. Other Instrumentation.
Doubly linked list:
o c[cdr,car](v) = $\forall v_1: cdr(v, v_1) \to car(v_1, v)$
o c[car,cdr](v) = $\forall v_1: car(v, v_1) \to cdr(v_1, v)$
Reachability (useful for compile-time GC):
o r[sel](v1,v2) = sel*(v1,v2)
o r[x,sel](v) = $\exists v_1: x(v_1) \wedge sel^*(v_1, v)$
o r[x](v) = $\exists v_1: x(v_1) \wedge (car|cdr)^*(v_1, v)$
Sortedness (from the previous lecture):
o InOrder[sel,dle](v) = $\forall v_1: sel(v, v_1) \to dle(v, v_1)$
o InROrder[sel,dle](v) = $\forall v_1: sel(v, v_1) \to dle(v_1, v)$
The more properties we define, the more precise the system is. But it can
also be more expensive, since (i) we need to update these predicates and (ii)
the number of nodes in the represented structures may be larger.
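As an illustration of the reachability predicate r[x,sel], the following Python sketch computes it on a concrete heap by a simple graph traversal. The pair-set encoding of sel and the function name are assumptions for this example.

```python
def reachable_from(x, sel):
    """r[x, sel](v): the cells v with  exists v1: x(v1) and sel*(v1, v)."""
    seen, stack = set(x), list(x)
    while stack:
        u = stack.pop()
        for (a, b) in sel:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

heap = {(1, 2), (2, 3), (4, 5)}
live = reachable_from({1}, heap)
```

Cells outside `live` (here 4 and 5) are unreachable from x, which is the kind of fact a compile-time garbage collection analysis needs.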
36.Example.
Still not precise enough – x still points to ½.



37.Semantic Reduction.
The second mechanism used to increase precision.
Improve the precision of the analysis by recovering properties of the
program semantics.
A formal definition:
o A Galois connection $(L_1, \alpha, \gamma, L_2)$
o An operation op: $L_2 \to L_2$ is a semantic reduction if
  - $\forall l \in L_2: op(l) \sqsubseteq l$
  - $\gamma(op(l)) = \gamma(l)$
Can be applied before and after basic operations.
Preserves soundness!
In general the reduction is from the abstract world to the abstract world.
We get a more precise element, but both represent the same concrete
elements.
We don't look for the best solution, but for a better one.
38.Materialization.
A special case of semantic reduction:
The bottom structure is, of course, more precise, since we have created a
new node pointed to by x. In this case we know, for example, that x is the
successor of y. The bottom structure can be embedded into the top one by
mapping the last two elements to the same one.

39,40,41.The focusing principle.
To increase precision:
o "Bring the predicate formula into focus" (force ½ to 0 or 1)
o Then apply the predicate-update formula.
Generalizes materialization.

For example:

L1 and L2 are sets of structures. Here L1 has one structure and L2
has three, but both represent the same concrete memory states. For the first
structure of L2 the formula evaluates to 0, for the second to 1, and for the
third to 1 for u1 and to 0 for u0. The difference between the second and
the third is that in the second the formula cannot evaluate to 0.
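The three-way case split can be sketched in Python for the simplest situation. This is a deliberately reduced illustration, not the full focus operation: a structure is represented only by the value of the focus formula on each node plus the set of summary nodes, the adjustment of the binary predicates is omitted, and the names `focus`, `u.0` and `u.1` are assumptions.

```python
HALF = 0.5

def focus(values, summary):
    """Replace every 1/2 value of the focus formula by definite cases.

    values:  {node: value of the focus formula on that node}
    summary: set of summary nodes
    Returns a list of (values, summary) pairs without any 1/2.
    """
    for u, val in values.items():
        if val == HALF:
            rest = {w: c for w, c in values.items() if w != u}
            cases = [({**rest, u: 0}, summary),
                     ({**rest, u: 1}, summary)]
            if u in summary:
                # a summary node may stand for cells of both kinds: split it
                split = {**rest, u + ".0": 0, u + ".1": 1}
                cases.append((split, (summary - {u}) | {u + ".0", u + ".1"}))
            return [r for v, s in cases for r in focus(v, s)]
    return [(values, summary)]

three = focus({"u": HALF}, {"u"})
two = focus({"a": HALF, "b": 1}, set())
```

For a summary node with value ½ the sketch yields three structures (all 0, all 1, and a split into two nodes), while a non-summary node yields only the first two.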
Clarification regarding summary nodes and nodes or pointers
with a value of ½:
A summary node represents one or more nodes in the concrete world.
The first structure produced by focus does not have the cdr link from u1
to u (which has the value 1/2 in the unfocused structure). This seems to be
a contradiction: if a value of 1/2 meant that at least one such link exists
in the concrete world, then there should still be a link from u1 to
somewhere within the list represented by the summary node u in every
concrete structure.
There are two separate issues here, which may be confusing:
o A predicate value of 1/2 represents the value sets {0}, {1} and {0,1}.
For an indefinite (1/2) cdr edge from u1 to u2 this means that we really
don't know whether there exist a location l1 (represented by u1) and a
location l2 (represented by u2) such that l1.cdr = l2. The same holds for
summary nodes, which are nodes u with eq(u, u) = 1/2: we don't know
whether there exist locations l1 and l2, both represented by u, such that
l1 and l2 are different.
o For technical reasons we require that the mapping f used in the
embedding is onto, i.e., every individual (node) in the abstract
domain represents some location. This requirement ensures that
when the formula $\exists v : \varphi$ evaluates to 1 in an (abstract)
3-valued logical structure $S^\#$, it also evaluates to 1 in all the
concrete structures represented by $S^\#$. This means that every
node, including a summary node, represents one or more
concrete locations in each structure of $\gamma(S^\#)$. The only difference
between summary nodes and non-summary nodes is that a
summary node may represent two or more locations from *one* concrete
state, e.g., the elements in the tail of a long linked list. The practical
implication of this decision is that the abstract interpretation may
need to keep around many 3-valued structures for the different
possibilities.
Evaluate the predicate-update formulae:
The results are still not very precise – there are some unexpected things in
the concrete world.


42,43,44. The focus operation.
Focus: Formula $\to$ (P(3-Struct) $\to$ P(3-Struct))
For every formula $\varphi$:
o Focus($\varphi$)(X) yields structures in which $\varphi$ evaluates to a
definite value (0 or 1, but not ½) in all assignments.
o Focus($\varphi$) is a semantic reduction.
o But Focus($\varphi$)(X) may be undefined for some X.
o It can return an infinite number of structures.
For example: Focus on v1 : cdr (v, v1 ) X
[Figure: the structures before and after Focus; labels x, y, cdr, u1, u.]



Summary.
Predicate logic allows naturally expressing SOS for languages with pointers
and dynamically allocated structures.
3-valued logic provides a sound solution.
Semantic reduction improves precision and preserves soundness.