jenny

Shape Analysis via 3-Valued Logic
Jenny Sannikov, jenny_sv@yahoo.com
Shape Analysis with applications. Chapter 4.6

2. Lecture Outline.
o Collecting semantics using first-order logic.
o 3-valued logic and embedding.
o Simple abstract semantics using logic.
o More precise abstract semantics.

3. Collecting Semantics using Logic.
o Represent states using logical structures.
o Construct the program control flow graph with a distinguished node start.
o Define the set of logical structures at start.
o Define the meaning of program conditions using closed first-order formulae.
o Define the meaning of statements using first-order formulae.
* The set of structures can be finite or infinite.
* Suppose that the initial values at the beginning of the program are known: for example, the whole memory is empty or, in a C program, all global variables are 0.

4. The SWhile Programming Language.
Recall from the previous lecture that SWhile is an extension of the While language with dynamic memory allocation, pointers and destructive update. Each allocated memory element consists of two cells: car and cdr.
Abstract syntax:
  sel ::= car | cdr
  a ::= x | x.sel | null | n | a1 op_a a2
  b ::= true | false | not b | b1 op_b b2 | a1 op_r a2
  S ::= [x := a]^l | [x.sel := a]^l | [x := malloc()]^l | [skip]^l | S1 ; S2 | if [b]^l then S1 else S2 | while [b]^l do S
Here [x.sel := a]^l is the destructive update and [x := malloc()]^l is the memory allocation.

5,6. An Example.
The following program creates a linked list with count nodes.
  [x := null]^1;
  while ([count > 0]^2) (
    [t := malloc()]^3;
    [t.cdr := x]^4;
    [x := t]^5;
    [count := count - 1]^6
  )
The following predicates are defined.
Unary predicates, one per pointer variable of the program:
o x(v) - true for the cell pointed to by x.
o t(v) - true for the cell pointed to by t.
Binary predicates:
o car(v1,v2) - true if car(v1)=v2.
o cdr(v1,v2) - true if cdr(v1)=v2.
o eq(v1,v2) - true if v1 and v2 are the same cell.
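As a minimal sketch of how these predicates can be represented, the Python below stores a 2-valued logical structure as plain sets and runs the list-building program on it. The names Structure and build_list are hypothetical helpers introduced for illustration, cells are modelled as integers, and the car predicate is left out because the program never writes car.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """A 2-valued logical structure (hypothetical encoding, not from the lecture)."""
    universe: set = field(default_factory=set)  # allocated cells
    x: set = field(default_factory=set)         # cells v with x(v) = 1
    t: set = field(default_factory=set)         # cells v with t(v) = 1
    cdr: set = field(default_factory=set)       # pairs (v1, v2) with cdr(v1) = v2

    def eq(self, v1, v2):
        """eq(v1, v2) holds exactly when both arguments are the same cell."""
        return v1 == v2


def build_list(count):
    """Run the example program, updating the predicates statement by statement."""
    s = Structure()
    s.x = set()                                   # [x := null]1
    while count > 0:                              # [count > 0]2
        v0 = len(s.universe)                      # [t := malloc()]3: fresh cell v0
        s.universe.add(v0)
        s.t = {v0}
        # [t.cdr := x]4: the cell pointed to by t gets x's cell as its cdr,
        # every other cdr edge is left unchanged.
        s.cdr = {(v1, v2) for (v1, v2) in s.cdr if v1 not in s.t}
        s.cdr |= {(v1, v2) for v1 in s.t for v2 in s.x}
        s.x = set(s.t)                            # [x := t]5
        count -= 1                                # [count := count - 1]6
    return s


s = build_list(3)
print(sorted(s.universe), s.x, s.t, sorted(s.cdr))
```

In the resulting structure x and t hold for the same cell, and the cdr pairs chain the cells from the newest allocation back to the oldest.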
Let's analyze the control flow graph of this program (as it appears on slide 6):
1. x := null "means" {x'(v) := 0}: x does not point to any cell in memory.
2. count > 0 is ignored, meaning that the number of allocated elements is unknown and unimportant. The loop can be performed any number of times.
3. The semantics of t := malloc() is {let v0 := new in t'(v) := eq(v, v0)}, where v ranges over all cells in memory.
4. t.cdr := x: {message $\neg\exists v: t(v)$ -> ... cdr'(v1,v2) := (t(v1) ? x(v2) : cdr(v1,v2))}
   The first part handles errors. If t does not point to any cell in memory (there exists no v such that t(v) is true), an error message can be generated. The second part uses the well-known C shorthand for "if then else". In logic, $(\varphi \,?\, \varphi_1 : \varphi_2) \equiv (\varphi \wedge \varphi_1) \vee (\neg\varphi \wedge \varphi_2)$.
5. x := t: {x'(v) := t(v)}. After the assignment, x points to a cell if and only if t points to it.

7,8. Another Example: the reverse program.
Recall from the previous lecture:
  [y := null]^1;
  while ([x != null]^2) (
    [t := y]^3;
    [y := x]^4;
    [x := x.cdr]^5;
    [y.cdr := t]^6
  )
The list of predicates is built as in the previous example: three unary predicates (x(v), t(v), y(v)) and three binary ones.
Let's analyze the control flow graph of this program (as it appears on slide 8). The semantics of most statements is the same as in the previous program. Note the condition x != null: its semantics is $\exists v: x(v)$. The statement x := x.cdr "means" {message $\neg\exists v: x(v)$ -> ... x'(v) := $\exists v_1: x(v_1) \wedge cdr(v_1, v)$}.

9. Statement's Meaning Conclusion.
Note y.sel and x.sel, the way to move through the data structure. The data structure therefore does not have to be a list; it can be any data structure.

10. Condition's Meaning Conclusion.
Note the difference between eq and the program condition ==. The binary predicate eq denotes equality of allocated cells, a property of dynamic cells. The == condition denotes equality of syntactic program expressions denoting L-values. It succeeds when both denote the same place in memory.
There is a difference between variables (statically determined) and locations (dynamically allocated and unbounded in number).

11. Collecting Semantics.
$$CS(\mathit{start}) = \{\langle \emptyset, \emptyset \rangle\}$$
$$CS(v) = \{ [\![st(u)]\!](S) : u \to v \in E,\ S \in CS(u) \} \cup \{ S : S \in CS(u),\ u \to v \in E_t,\ S \models cond(u) \} \cup \{ S : S \in CS(u),\ u \to v \in E_f,\ S \models \neg cond(u) \}$$
We deal here with a set of concrete states represented by logical structures. We assume that the program's control flow graph has a start node (otherwise such a node can be added). The start node has no predecessors. For a node v, CS(v) is a union of sets and is usually infinite. Given a logical structure for the previous state, we execute the statement and get the current state. A condition node has two kinds of outgoing edges, $E_t$ and $E_f$. If the structure satisfies the condition (the condition's formula is closed, i.e. has no free variables, and evaluates to 1), the structure moves to CS of the next node along the true edge. If the structure satisfies the negation of the formula, the false branch is taken.
Look again at the control flow graph of slide 6. CS of the first node in the loop ([count > 0]) is:
$$CS(2) = \{ [\![x := null]\!](S) \mid S \in CS(1) \} \cup \{ [\![x := t]\!](S) \mid S \in CS(5) \}$$
At this point the memory may be empty, or there may be one (two, three, ...) cells, one of which is pointed to by both t and x. What is clear is that in CS(2) x and t are always equal: either both are null or both point to the same cell.

12. Three-Valued Logic.
We evaluate all formulas in three-valued logic, where the third value 1/2 means "unknown".
Two-valued logic: 1 and 0 stand for true and false.
Three-valued logic:
o 1: True
o 0: False
o 1/2: Unknown
o The values form a join semi-lattice with $0 \sqcup 1 = 1/2$.

13. 3-Valued Logical Structures.
o A set of individuals (nodes) U.
o Predicate meaning: $p^S : (U^S)^k \to \{0, 1, 1/2\}$ for a predicate p of arity k. The values 1, 0 and 1/2 stand for true, false and don't know, and indicate whether individuals of U satisfy the predicate p in P.

14. Example.
In the following example $x^S$ and $y^S$ are unary predicates and $car^S$, $cdr^S$ and $eq^S$ are binary predicates.
x points only to u1, so $x^S = \{u_1 \mapsto 1, u_2 \mapsto 0, u_3 \mapsto 0\}$. The list has three or more nodes. Since u3 is a summary node, its equality with itself is unknown: $eq^S(u_3, u_3) = 1/2$.

15,16. Embedding.
o A pre-partial order on 3-valued logical structures.
o $S_1 \sqsubseteq S_2$ implies that every concrete state represented by $S_1$ is also represented by $S_2$ (but $S_1$ is more precise).
o The sets of nodes in $S_1$ and $S_2$ may be different. The identity of nodes (abstract locations) carries no meaning.
o $S_1 \sqsubseteq^f S_2$ if and only if:
  - f maps the individuals of $S_1$ onto those of $S_2$. f is onto, meaning that every element of $S_2$ is the image of some element of $S_1$.
  - $p^{S_1}(u_1, \ldots, u_k) \sqsubseteq p^{S_2}(f(u_1), \ldots, f(u_k))$.
o $S_1 \sqsubseteq S_2$ if and only if there exists f such that $S_1 \sqsubseteq^f S_2$.
o Pre-partial order: the order is reflexive and transitive, but not anti-symmetric.
  - Reflexive: take the f that maps every element to itself.
  - Transitive: compose the mappings f.
o Induces a pre-partial order on P(3-Struct).
  - Set union is a least upper bound.
  - Finite height, which is important for terminating the iterative process. Details later.
o $\gamma : 3\text{-Struct} \to P(2\text{-Struct})$, with $\gamma(S) = \{ S' : S' \in 2\text{-Struct},\ S' \sqsubseteq S \}$.
o $\gamma : P(3\text{-Struct}) \to P(2\text{-Struct})$, with $\gamma(XS) = \bigcup_{S \in XS} \gamma(S)$.
* Two concrete structures are equal if they are isomorphic.
Note that $0 \sqsubseteq 1/2$ and $1 \sqsubseteq 1/2$.
The following example demonstrates an embedding:
  S1: x -> L1 -cdr-> L2 -cdr-> L3 -cdr-> L4 -cdr-> L5
  S2: x -> L1 -cdr-> L2 -cdr-> L3, where L3 is a summary node
Is $S_1 \sqsubseteq^f S_2$ for some f? Yes, if we map L1 to L1, L2 to L2 and the rest (L3..L5) to the summary node. This f is onto. The eq and cdr predicates satisfy the requirement because $0 \sqsubseteq 1/2$ and $1 \sqsubseteq 1/2$.

17. Tight Embedding.
o $S = \langle U^S, P^S \rangle$.
o $f : U^S \to U^\#$ such that f is onto.
o Define $S^\# = \langle U^\#, P^\# \rangle$ where
  $$p^\#(u^\#_1, \ldots, u^\#_k) = \bigsqcup \{ p^S(u_1, \ldots, u_k) : f(u_i) = u^\#_i \}$$
o Don't map to 1/2 if you don't need to (join: $0 \sqcup 1 = 1/2$).
o $S \sqsubseteq^f S^\#$.

18. The Abstraction Principle.
o Partition the individuals into equivalence classes based on the values of their unary predicates (the blur operation in the next example).
o Collapse the other predicates via $\sqcup$.
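The two steps of the abstraction principle can be sketched in Python as follows. This is an illustrative sketch only: the function names join and blur, the dictionary encoding of predicates (missing entries mean 0) and the four-cell list in the usage lines are assumptions, not part of the lecture.

```python
from itertools import product

HALF = 0.5  # the "unknown" truth value


def join(values):
    """Join of truth values: equal values stay as they are, otherwise 1/2."""
    distinct = set(values)
    return distinct.pop() if len(distinct) == 1 else HALF


def blur(universe, unary, binary):
    """Collapse individuals that agree on all unary predicates.

    unary maps a predicate name to {individual: value}; binary maps a name
    to {(individual, individual): value}. Abstract nodes are tuples of the
    individuals they represent.
    """
    names = sorted(unary)
    classes = {}
    for u in universe:
        key = tuple(unary[n].get(u, 0) for n in names)
        classes.setdefault(key, []).append(u)
    abs_universe = [tuple(members) for members in classes.values()]
    abs_unary = {
        n: {a: join(unary[n].get(u, 0) for u in a) for a in abs_universe}
        for n in names
    }
    abs_binary = {
        n: {
            (a, b): join(table.get((u1, u2), 0) for u1 in a for u2 in b)
            for a, b in product(abs_universe, repeat=2)
        }
        for n, table in binary.items()
    }
    return abs_universe, abs_unary, abs_binary


# Assumed input: a four-cell list u1 -> u2 -> u3 -> u4 with x pointing to u1.
cells = ["u1", "u2", "u3", "u4"]
unary = {"x": {"u1": 1}}
binary = {
    "n": {("u1", "u2"): 1, ("u2", "u3"): 1, ("u3", "u4"): 1},
    "eq": {(u, u): 1 for u in cells},
}
nodes, abs_unary, abs_binary = blur(cells, unary, binary)
head, tail = nodes
print(abs_binary["n"][(head, tail)], abs_binary["eq"][(tail, tail)])
```

In this run u2, u3 and u4 fall into one class because x is 0 for all of them. The n edge from the head into that class and eq on the class itself both become 1/2, which is what marks the class as a summary node.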
The result of this abstraction is a special case of tight embedding in which the mapping f is determined by the (concrete) values of the unary predicates.

19. An example.
We can see that three elements of the concrete world, u2, u3 and u4, collapse into one element u234 in the abstract world. These three elements are grouped into one equivalence class because all their unary predicates (x, y) are identical. Their binary predicate n (next) is not identical, so it is collapsed by taking the join over each partition. As a result information is lost: in the abstract world we have a summary node, which means that any of u2, u3, u4 may point to any of u2, u3, u4. In the concrete world we knew that only the successor pointers exist. This operation is called blur and can also be applied to an abstract 3-valued structure. In that case it acts as a kind of widening: it embeds a 3-valued structure into a potentially less precise but more compact one.

20. Boolean Connectives [Kleene].
A conservative meaning for the logical connectives is:
  and | 0    1/2  1        or  | 0    1/2  1
  ----+--------------      ----+--------------
  0   | 0    0    0        0   | 0    1/2  1
  1/2 | 0    1/2  1/2      1/2 | 1/2  1/2  1
  1   | 0    1/2  1        1   | 1    1    1

21,22. Formal Semantics of First Order Formulae.
o A structure $S = \langle U^S, P^S \rangle$.
o A formula $\varphi$ with free variables LVar.
o An assignment $z : LVar \to U^S$ (free variables to individuals).
o $[\![\varphi]\!]^S(z) \in \{0, 1, 1/2\}$ is the value of the formula under the assignment z.
An inductive definition:
o $[\![1]\!]^S(z) = 1$
o $[\![0]\!]^S(z) = 0$
o $[\![p(v_1, v_2, \ldots, v_k)]\!]^S(z) = p^S(z(v_1), z(v_2), \ldots, z(v_k))$
o $[\![\varphi_1 \vee \varphi_2]\!]^S(z) = \max([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
o $[\![\varphi_1 \wedge \varphi_2]\!]^S(z) = \min([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
o $[\![\neg\varphi_1]\!]^S(z) = 1 - [\![\varphi_1]\!]^S(z)$
o $[\![\exists v: \varphi_1]\!]^S(z) = \max\{ [\![\varphi_1]\!]^S(z[v \mapsto u]) : u \in U^S \}$

23,24. The Embedding Theorem.
Evaluating a formula in S is conservative with respect to its evaluation in the (potentially infinitely many) structures represented by $\gamma(S)$.
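A small evaluator makes this conservative evaluation concrete. The sketch below follows the inductive definition of slides 21,22, with formulas encoded as nested tuples. The encoding, the function name evaluate and the two-node structure (x points to u1, whose cdr may lead into a summary node u) are assumptions made for illustration.

```python
HALF = 0.5  # the "unknown" truth value


def evaluate(formula, structure, z=None):
    """Kleene evaluation of a formula under assignment z (variable -> individual)."""
    z = z or {}
    op = formula[0]
    if op == "const":
        return formula[1]
    if op == "pred":
        _, name, *variables = formula
        args = tuple(z[v] for v in variables)
        return structure["preds"][name].get(args, 0)  # missing entries mean 0
    if op == "or":
        return max(evaluate(formula[1], structure, z),
                   evaluate(formula[2], structure, z))
    if op == "and":
        return min(evaluate(formula[1], structure, z),
                   evaluate(formula[2], structure, z))
    if op == "not":
        return 1 - evaluate(formula[1], structure, z)
    if op == "exists":
        _, var, body = formula
        return max(
            (evaluate(body, structure, {**z, var: u}) for u in structure["universe"]),
            default=0,  # an existential over an empty universe is false
        )
    raise ValueError(f"unknown connective: {op}")


# Assumed abstract structure: x -> u1, indefinite cdr edges into summary node u.
S = {
    "universe": ["u1", "u"],
    "preds": {
        "x": {("u1",): 1},
        "cdr": {("u1", "u"): HALF, ("u", "u"): HALF},
    },
}

x_not_null = ("exists", "v", ("pred", "x", "v"))
some_cdr = ("exists", "v1", ("exists", "v2", ("pred", "cdr", "v1", "v2")))
x_has_pred = ("exists", "v", ("and", ("pred", "x", "v"),
                              ("exists", "w", ("pred", "cdr", "w", "v"))))

print(evaluate(x_not_null, S), evaluate(some_cdr, S), evaluate(("not", x_has_pred), S))
```

The first and third formulas receive definite values in the abstract structure, so they hold in every concrete structure it represents. The second evaluates to 1/2, which only says that the abstract structure cannot decide it.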
Every formula $\varphi$ is preserved:
o $\varphi = 1$ in S implies $\varphi = 1$ in every $S' \in \gamma(S)$.
o $\varphi = 0$ in S implies $\varphi = 0$ in every $S' \in \gamma(S)$.
o $\varphi = 1/2$ in S means "don't know".
More precisely, let $S \sqsubseteq^f S'$, let $\varphi$ be a formula with free variables LVar and let $z : LVar \to U^S$ be an assignment. Then
$$[\![\varphi]\!]^S(z) \sqsubseteq [\![\varphi]\!]^{S'}(f \circ z)$$
o By writing the concrete semantics in logic, we get soundness for free.
o But the evaluation may be suboptimal (overly conservative).
For example, consider the formula $[\exists v: x(v) \,?\, 1 : 1]$ and a structure in which $\exists v: x(v)$ evaluates to 1/2. By definition, we have to evaluate $(\exists v: x(v) \wedge 1) \vee (\neg\exists v: x(v) \wedge 1)$. The calculated value of the formula is $(1/2 \wedge 1) \vee (1/2 \wedge 1) = 1/2$, although the formula is clearly true. A practical solution to this problem is a more precise definition of the formula. However, the undecidability of first-order logic implies that there are many formulae whose value is 1/2 in 3-valued logic although they have a definite value (1 or 0) in all represented concrete structures.

25. Shape Analysis via Abstract Interpretation.
o Iteratively compute a set of 3-valued structures for every program point.
o Every statement transforms structures according to the predicate-update formulae:
  - use 3-valued logic instead of 2-valued logic;
  - use exactly the predicate-update formulae of the concrete semantics!

26. Abstract Semantics.
$$AI(\mathit{start}) = \{\langle \emptyset, \emptyset \rangle\}$$
$$AI(v) = \{ [\![st(u)]\!]_3(S) : u \to v \in E,\ S \in AI(u) \} \cup \{ S : S \in AI(u),\ u \to v \in E_t,\ S \models_3 cond(u) \} \cup \{ S : S \in AI(u),\ u \to v \in E_f,\ S \models_3 \neg cond(u) \}$$
Notice that if a condition evaluates to 1/2, then both the then branch and the else branch are computed.

27. Example.
Let us look at the example program of slide 6. At every point there is a set of structures, which is finite because of the blur. We assume that at the beginning the work-list is initialized with all the nodes and the structure is empty. The process has to stop because blur is performed at each step. After three iterations we get a list of any length. At the next iteration the process stops, although the program can perform an unbounded number of allocations.
Note that at the end a number of important facts are known, for example that x and t must point to the same cell in memory.
1. At the beginning the structure is empty and the work-list is {1,2,3,4,5,exit}.
2. First node: the structure is empty, work-list = {2,3,4,5,exit}.
3. Second node: the structure is empty, work-list = {3,4,5,exit}.
4. Third node: the structure is empty, work-list = {3,4,5,exit}.
5. Fourth node: the structure is [figure: labels t], work-list = {4,5,exit}.
6. Fifth node: the structure is [figure: labels x, t], work-list = {5,exit}.
7. [figure: labels x, t] After blur there is no change, since the unary predicates are different.
8. Fourth node again: [figure: labels x, cdr, t] After blur there is no change, since the unary predicates are different.
9. Fifth node again: [figure: labels x, cdr, t] After blur there is no change, since the unary predicates are different.
10. [figure: labels cdr, x, t] After blur there is no change, since the unary predicates are different.
11. Fourth node again: [figure: labels x, cdr, t, cdr] After blur there is no change, since the unary predicates are different.
12. Fifth node: [figure: labels x, cdr, t, cdr] After blur the last two cells are collapsed, since their unary predicates are equal (x(v) and t(v) are both 0).
13. Finally: [figure: labels x, t, summary node]

28. The Reverse Program example.
Look at the following list: [figure: labels x, u1, u2]. After a few steps we get: [figure: labels y, x, u1]. The next step is x := x.cdr. For u1 the new value of x is 0 (no cell has a cdr pointing to u1); for u2 it is 1/2 ($1/2 \wedge 1 = 1/2$). So we get: [figure: labels y, u1, x]. Now we only know that x points somewhere after y, so a lot of information was lost. In fact, the property that the structure is a list was "forgotten". The process is sound, but not precise.

29. Intermediate Summary.
o Predicate logic allows a natural expression of SOS for languages with pointers and dynamically allocated structures.
o 3-valued logic provides a sound solution.
  - Immediate from the Embedding Theorem.
  - All you need is to guarantee the correctness of the SOS.
o But it is not very precise (as seen in the previous example).

30. More precise abstract interpretation.
We will use two separate mechanisms:
o Refine the abstraction (and concretization).
o More precise abstract interpretation of basic statements.
  - But not necessarily the best (induced) one.

31. The Instrumentation Principle.
o Increase precision by storing the truth value of some designated formulae.
o Introduce predicate-update formulae to update the extra predicates.
This is the first mechanism: refining the abstraction and concretization.

32,33. Example: Heap Sharing.
Recall from the previous lecture: "is shared" is 0 for all the nodes, so the join is 0 too. In the next example the "is shared" information holds for each concrete node and is preserved in the abstract representation. For each statement of the semantics, a special rule for updating the "is" value is added. Notice that this kind of sharing is "heap-sharing", since it ignores sharing caused by multiple pointers from the stack. Indeed, stack-sharing need not be stored explicitly in the abstract domain, since it can be recovered from the values of the unary predicates. In this case the node u is determined to be shared.
Adding more unary properties increases the precision but may also increase the cost. How should we choose those properties? There is no precise answer. We should rely on previous knowledge about which properties are important for a certain structure. For example, heap-sharing via selectors is important since it allows us to determine list-ness along an (unbounded) traversal of a list, x = x.sel.

34. Updating sharing: x.sel := y.
$$\begin{aligned}
is[sel]'(v) := (\exists v_1: x(v_1) \;?\; (\,& y(v) \;?\; \exists v_2: sel(v_2, v) \wedge \neg eq(v_2, v_1) \\
 : \;& sel(v_1, v) \;?\; \exists v_2, v_3: \varphi_{is[sel]}(v_2, v_3, v) \wedge \neg eq(v_2, v_1) \wedge \neg eq(v_3, v_1) \\
 : \;& is[sel](v)) : is[sel](v))
\end{aligned}$$
$$\varphi_{is[sel]}(v_2, v_3, v) = sel(v_2, v) \wedge sel(v_3, v) \wedge \neg eq(v_2, v_3)$$
The trivial solution is to update the predicate by re-evaluating its defining formula, which of course is not precise. The problem is that once the predicate is defined, it must be updated.

35. Other Instrumentation.
Doubly linked list:
o $c[cdr,car](v) = \forall v_1: cdr(v, v_1) \to car(v_1, v)$
o $c[car,cdr](v) = \forall v_1: car(v, v_1) \to cdr(v_1, v)$
Reachability (useful for compile-time GC):
o $r[sel](v_1, v_2) = sel^*(v_1, v_2)$
o $r[x,sel](v) = \exists v_1: x(v_1) \wedge sel^*(v_1, v)$
o $r[x](v) = \exists v_1: x(v_1) \wedge (car|cdr)^*(v_1, v)$
Sortedness (from the previous lecture):
o $InOrder[sel,dle](v) = \forall v_1: sel(v, v_1) \to dle(v, v_1)$
o $InROrder[sel,dle](v) = \forall v_1: sel(v, v_1) \to dle(v_1, v)$
Defining more properties makes the system more precise. But it can be more expensive, since (i) we need to update each such predicate and (ii) the number of nodes in the represented structures may be larger.

36. Example.
Still not precise enough: x still points to its target with value 1/2.

37. Semantic Reduction.
The second mechanism used to increase precision. Improve the precision of the analysis by recovering properties of the program semantics. A formal definition:
o A Galois connection $(L_1, \alpha, \gamma, L_2)$.
o An operation $op : L_2 \to L_2$ is a semantic reduction if
  $$\forall l \in L_2: \quad op(l) \sqsubseteq l \quad \text{and} \quad \gamma(op(l)) = \gamma(l)$$
o It can be applied before and after basic operations.
o It preserves soundness!
In general the reduction goes from the abstract world to the abstract world. We get a more precise element, but both elements represent the same concrete elements. We do not look for the best solution, only for a better one.

38. Materialization.
A special case of semantic reduction. The bottom possibility in the figure is, of course, more precise, since we have created a new node pointed to by x. In this case we know, for example, that x is a successor of y. The bottom structure can be embedded into the top one by mapping the last two elements to the same one.

39,40,41. The focusing principle.
To increase precision:
o "Bring the predicate-update formula into focus" (force 1/2 to 0 or 1).
o Then apply the predicate-update formula.
This generalizes materialization. For example: L1 and L2 are sets of structures. Here L1 has one structure and L2 has three, but both represent the same memory states. For the first structure of L2 the formula evaluates to 0, for the second to 1, and for the third it evaluates to 1 for u1 and to 0 for u0.
The difference between the second and the third structure is that in the second the formula cannot evaluate to 0.

Clarification regarding summary nodes and nodes or pointers with a value of 1/2:
A summary node represents one or more nodes in the concrete world. The first option of the focus function does not have the cdr link from u1 to u (which has a value of 1/2 in the unfocused structure). This seems to be a contradiction: if a value of 1/2 meant that at least one such item exists in the concrete world, then in every concrete representation there should still be a link from u1 to somewhere within the list that the summary node u represents.
There are two separate issues here, which may be confusing:
o A predicate value of 1/2 represents the values {0}, {1} and {0,1}. For an indefinite (1/2) cdr-edge from u1 to u2 this means that we really do not know whether there exist a location l1 (represented by u1) and a location l2 (represented by u2) such that l1.cdr = l2. Notice that the same holds for summary nodes, which are nodes u with eq(u, u) = 1/2. In this case we do not know whether there exist a location l1 (represented by u) and a location l2 (represented by u) such that l1 and l2 are different.
o For technical reasons we require that the mapping f used in the embedding is onto, i.e. every individual (node) in the abstract domain represents some location. This requirement ensures that when the formula $\exists v: \varphi$ evaluates to 1 in an (abstract) 3-valued logical structure $S^\#$, we know that $\exists v: \varphi$ also evaluates to 1 in all the concrete structures represented by $S^\#$. This means that every node, including the summary node, represents one or more concrete locations in each structure of $\gamma(S^\#)$. The only difference between summary nodes and non-summary nodes is that a summary node may represent more than one location from *one* concrete state, e.g. the elements in the tail of a long linked list.
The practical implication of this decision is that the abstract interpretation may need to keep around many 3-valued structures for the different possibilities.
Evaluate the predicate-update formulae: the results are still not very precise, since they admit some situations that cannot occur in the concrete world.

42,43,44. The focus operation.
$$Focus : Formula \to (P(3\text{-Struct}) \to P(3\text{-Struct}))$$
For every formula $\varphi$:
o $Focus(\varphi)(X)$ yields structures in which $\varphi$ evaluates to a definite value (0 or 1, but not 1/2) under all assignments.
o $Focus(\varphi)$ is a semantic reduction.
o But $Focus(\varphi)(X)$ may be undefined for some X.
o It can return an infinite number of structures.
For example: focus on $\exists v_1: cdr(v, v_1)$. [figure: structures with labels x, y, cdr, u1, u]

Summary.
o Predicate logic allows a natural expression of SOS for languages with pointers and dynamically allocated structures.
o 3-valued logic provides a sound solution.
o Semantic reduction improves precision and preserves soundness.
