The Simplify Theorem Prover
Class Notes for Lecture No. 8 by Mooly Sagiv
Notes prepared by Daniel Deutch

Introduction

This lecture presents key aspects of the leading theorem-proving systems existing today, with emphasis on a system called Simplify, which is the proof engine used in the Extended Static Checkers ESC/Java and ESC/Modula-3. The system's basic approach is proof by refutation: it tries to show the validity of a given formula by showing that its negation is unsatisfiable. The system is incomplete, in the sense that it might report a formula invalid when it is in fact valid. However, it is sound: every formula whose validity the system confirms is guaranteed to be valid. We shall start with a description of the basic search strategy for showing unsatisfiability of propositional formulas, discuss the handling of formulas that combine propositions from multiple theories, and then add quantifiers to the mix. The lecture will conclude by analyzing the faults and merits of Simplify and presenting alternative systems.

Search Strategy

[Source – Simplify: A Theorem Prover for Program Checking \ Detlefs, Nelson, Saxe, http://citeseer.ist.psu.edu/detlefs03simplify.html]

We start with a simple propositional formula, with no quantifiers, and try to show its validity by considering its negation and showing it has no satisfying assignment. The algorithm uses a data structure called a context, which holds the negated formula in conjunction with the assumptions used in each case; these will be used to find contradictions. The context is composed of a set of literals, lits, representing the conjunction of all its elements, and a set of clauses, cls, where a clause is a set of literals representing their disjunction. The set cls represents the conjunction of all its elements.
We can thus see that the set cls corresponds to a conjunctive normal form (CNF) over the literals, and indeed we shall use this set to build a CNF translation of the formula as the algorithm proceeds. A context also contains a boolean named refuted, set to true when the context is found to be inconsistent. Setting this boolean to true means that the current search path has failed and a backtrack takes place. The SAT procedure is presented next. This procedure outputs a set of monomes, i.e. a set of satisfying assignments describing all ways of satisfying the context. A monome is a conjunction in which all variables set true by the assignment appear in their positive form, and all variables set false by the assignment appear in their negative form.

The SAT procedure

[Source – Simplify: A Theorem Prover for Program Checking \ Detlefs, Nelson, Saxe, http://citeseer.ist.psu.edu/detlefs03simplify.html]

Signature:
Input: works on the globally defined 'context'.
Output: a set of monomes such that:
(1) Each monome is consistent.
(2) Each monome implies the context.
(3) The context implies the disjunction of all the monomes.

(1) and (2) imply that each monome found is a legal assignment and that it is a satisfying assignment for the context. From (2) and (3) we get that the output is a disjunctive normal form (DNF) of the context. Moreover, soundness and completeness of the procedure are ensured by (1), (2) and (3). SAT uses a simple backtracking search that tries to extend the set of literals by including in it (which is equivalent to 'guessing' that it is true) one literal from each clause in the set of clauses. When a contradiction is found, backtracking takes place.
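To make the search concrete, here is a minimal propositional sketch of such a backtracking procedure. The encoding (nonzero integers as literals, with -v the negation of v) and the function name are ours, not Simplify's; the real procedure works over the full context, with theory reasoning and the refuted flag.

```python
def sat(lits, cls):
    """Enumerate monomes (consistent literal sets) satisfying the context.

    `lits` is a set of asserted literals (nonzero ints; -v negates v),
    read as a conjunction; `cls` is a list of clauses (sets of literals),
    each read as a disjunction.  Toy sketch of the SAT procedure above.
    """
    if any(-l in lits for l in lits):
        return []                          # refuted: lits is inconsistent
    if not cls:
        return [frozenset(lits)]           # a monome: one way to satisfy the context
    clause, rest = cls[0], cls[1:]
    if any(l in lits for l in clause):     # clause already implied by lits: drop it
        return sat(lits, rest)
    monomes = []
    for l in clause:                       # guess one literal from the clause
        if -l in lits:
            continue                       # its negation is asserted: useless literal
        monomes.extend(sat(lits | {l}, rest))
    return monomes
```

For example, `sat(set(), [frozenset({1, 2}), frozenset({-1})])` returns the single monome `{2, -1}`, while `sat(set(), [frozenset({1}), frozenset({-1})])` returns no monome at all, i.e. the context is unsatisfiable.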
A refinement procedure is used to remove useless clauses or literals: a clause that contains a literal implied by lits will never lead to a contradiction and can be removed, and if a clause contains a literal whose negation is implied by lits, then this literal can be deleted from the clause (a clause is a disjunction, so this literal won't be helpful for either satisfiability or unsatisfiability).

Example

Consider the following formula in the context of the theory of uninterpreted functions:

~f = f(x)=f(b) ^ f(y)=f(a) ^ (~(x=b) V ~(y=a))

The algorithm proceeds by guessing that f(x)=f(b) holds, then guessing that f(y)=f(a) holds, and then guessing ~(x=b). It now encounters a contradiction (with f(x)=f(b), together with the axioms of the theory of uninterpreted functions), and backtracks to trying ~(y=a). Again a contradiction (this time with f(y)=f(a)), and thus no satisfying assignment exists (so f, the original formula, is valid).

Handling multiple theories

[Source – Simplification by Cooperating Decision Procedures \ Nelson, Oppen, http://portal.acm.org/citation.cfm?id=357073.357079]

So far we've considered formulas over a single theory, e.g. the theory of uninterpreted functions in the example above. However, in real-life programs variables range over many different domains, and one would like to verify properties in which these variables are used together. We are thus in search of a way to combine the decision procedures of different theories. Throughout this section, we still assume the formulas to be quantifier-free.

Basic idea

We have in hand a formula f which we want to show is valid. The first stage is standard: as always in proof by refutation, we transform it into ~f (NOT f), and try to show it has no satisfying assignment.
However, the formula contains expressions from different theories (say two theories; the method applies repeatedly for more). So we separate the conjunctive formula into two formulas A and B, such that ~f is equivalent to A^B, where A contains only terms from one theory and B contains only terms from the other. Of course, a connection between the two formulas may appear in the original formula, so the formulas may share parts; but we can show that the decomposition can be carried out so that only constants are shared between A and B. Now we use each theory's decision procedure to decide the satisfiability of A and of B. If one of A or B is unsatisfiable, surely the entire formula is as well. If both are satisfiable, we start reasoning about equalities, and propagate these equalities between the formulas, repeatedly.

Examples of theories

Arrays:
select(store(v,i,e),j) = if i=j then e else select(v,j)
store(v,i,select(v,i)) = v
store(store(v,i,e),i,f) = store(v,i,f)
i<>j -> store(store(v,i,e),j,f) = store(store(v,j,f),i,e)

Lists:
car(cons(x,y)) = x
cdr(cons(x,y)) = y
~atom(x) -> cons(car(x),cdr(x)) = x
~atom(cons(x,y))

All axioms are implicitly universally quantified over all their variables.

Algorithm (Nelson-Oppen method)

1. Decompose F into F1 and F2 so that F1 is a conjunction of literals from one theory, F2 is a conjunction of literals from the second theory, and the conjunction of F1 and F2 is satisfiable if and only if F is.
2. If either F1 or F2 is unsatisfiable, return unsatisfiable.
3. If some equality between variables is a logical consequence of F1 or of F2 but not of the other, then add the equality as a new conjunct to the one that does not entail it, and repeat from step 2.
4. If either F1 or F2 entails a disjunction of equalities without entailing any of the equalities alone, then try all options (add each equality as a conjunct and apply the procedure recursively; repeat for all the equalities). If any of these formulas is satisfiable, return satisfiable.
Otherwise, return unsatisfiable.

Why is step 4 required? Convexity

A formula F is non-convex iff F entails a disjunction of formulas (in our case, equalities) but does not entail any one of them alone. A simple arithmetic example is the formula x*y=0, which entails (x=0 V y=0), but we cannot know which disjunct is true. For such formulas we need to check both cases (or, in the general case, m cases); this is called a case split. A theory is convex iff every conjunctive formula in it is convex; if it contains one or more non-convex conjunctive formulas, it is non-convex. The theory of uninterpreted functions and the theory of relational linear algebra are examples of convex theories. The theory of arrays, the theory of reals under multiplication, and the theory of integers under '+' and '<=' are all examples of non-convex theories.

Decomposition Example

1. Let f be a function symbol of one theory, let g be a function symbol of the second theory, and let the formula be {f(g(x)) = g(f(x))}. We repeatedly replace terms with newly generated variables w1, w2, ..., as follows:

{f(g(x)) = g(f(x))}
Replace g(x) by w1:
{w1=g(x), f(w1) = g(f(x))}
Replace f(x) by w2:
{w1=g(x), w2=f(x), f(w1) = g(w2)}
Replace f(w1) by w3 and g(w2) by w4:
{w1=g(x), w2=f(x), w3=f(w1), w4=g(w2), w3=w4}

We can now decompose the set into two sets, one containing only 'f' and the other containing only 'g':

{w2=f(x), w3=f(w1), w3=w4} , {w1=g(x), w4=g(w2)}

We then start to add equalities according to the closure (e.g. we can add w3=g(w2), etc.); however, naturally we won't find a contradiction, as none exists in this case.

2. F = {1 <= x, x <= 2, f(x) <> f(1), f(x) <> f(2)}. This is unsatisfiable over the theory of integers with inequalities combined with the theory of uninterpreted functions, both with equalities.
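Example 2 exercises exactly the situation of step 4: the integer side entails x=1 V x=2 only as a disjunction. A toy sketch of that case split, where `check` is a hypothetical callback (our invention) deciding the rest of the formula together with one candidate equality:

```python
def case_split(disjuncts, check):
    """Step 4 of the Nelson-Oppen method (toy sketch).

    `disjuncts` are candidate equalities entailed only as a disjunction
    by a non-convex theory; `check` decides whether the remaining
    formula is satisfiable together with one chosen equality.  The
    whole formula is satisfiable iff some branch is.
    """
    return any(check(eq) for eq in disjuncts)

# In example 2, 1 <= x <= 2 entails x=1 or x=2 over the integers, but
# f(x) <> f(1) and f(x) <> f(2) close both branches.
def check(eq):
    return eq not in ("x=1", "x=2")   # each branch refuted by congruence

case_split(["x=1", "x=2"], check)     # False: unsatisfiable
```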
The first two literals imply x=1 or x=2, but the latter two imply that both are impossible, so a contradiction will be found in this case. We apply variable replacement to '1' and '2' to obtain:

{w1 <= x, x <= w2, w1=1, w2=2} , {f(x) <> f(w1), f(x) <> f(w2)}

The set of shared variables is {x, w1, w2}. From the theory of uninterpreted functions we can conclude x<>w1 and x<>w2, through the axioms of uninterpreted functions and congruence closure, as we saw in the previous lecture. Moving these two disequalities over to {w1<=x, x<=w2, w1=1, w2=2}, we get {w1<=x, x<>w1, x<=w2, x<>w2, w1=1, w2=2}, which is detected as unsatisfiable, this time using the inequality axioms.

Algorithm Completeness

[Source: "Combining Decision Procedures" \ Manna, Zarba, http://www-step.stanford.edu/papers/fmcr03.html]

Residues of a formula

For every formula f, Res(f) is the strongest simple formula that f entails; that is, every simple formula h entailed by f is entailed by Res(f). Res(f) can be written so that its only variables are the free variables of f. The existence of such a formula is guaranteed by Craig's Interpolation Lemma: if F entails G, then there exists a 'mid'-formula H, over the symbols common to F and G, such that F entails H and H entails G. In our context, Res(f) is the strongest set of equalities between constants that is entailed by the formula. In a sense, it is the most interesting thing we can deduce from the formula, and we would like to show that this set of equalities stays intact upon decomposition. In other words, we would like to say that for any two formulas A, B: Res(A) ^ Res(B) = Res(A^B). However, this is true only under some assumptions, as follows.

Stably infinite theories

A theory T is stably infinite if for every T-satisfiable, quantifier-free formula f there exists a T-interpretation (model) A satisfying f whose domain is infinite.
The theory with the axiom {(Ax. x=1 V x=2)} is a simple example of a theory that is not stably infinite, as every interpretation must assign to each element the value 1 or 2, so a two-element domain always suffices. Conversely, the theories of integers, reals, lists and arrays are all stably infinite.

Correctness

The Nelson-Oppen method is complete if the following three assumptions hold:
1. The formulas are quantifier-free.
2. The signatures of the theories are disjoint (only constants are shared).
3. The theories are all stably infinite.
As stated before, it suffices to consider only two theories.

Combination Theorem

Let Ti, i=1,2, be two disjoint theories as before, and let F1, F2 be formulas over T1, T2 respectively. Then F1 U F2 is satisfiable if and only if there exist an interpretation A satisfying F1 and an interpretation B satisfying F2 such that:
1. |A| = |B| (the domains have the same cardinality).
2. x=y under A if and only if x=y under B, for each pair of shared variables x, y.

Proof (sketch)

Define a mapping h from the domain of A to the domain of B by h(x under A) = (x under B) for each shared variable x. The two conditions above guarantee that h can be extended to an isomorphism (a mapping that is one-to-one and onto) between the two domains. We can now build a single satisfying interpretation: interpret the variables, constants and function symbols of one signature as B does, and interpret each symbol of the other signature by transporting A's interpretation along h.

Theorem

Let Ti, i=1,2, be stably infinite theories, let F be a conjunctive formula in separated form as above, and let Fi be the part of F that resides in Ti. Then F is satisfiable if and only if there exists an equivalence relation E over the shared variables of the Fi formulas such that Fi together with the equalities and disequalities induced by E (denote this Fi U E) is Ti-satisfiable for i=1,2.

Proof

Let M be an interpretation satisfying F. Define the equivalence relation E by x=y iff x and y are equal under M.
By construction, both Fi's together with this equivalence relation are satisfied by the parts of M relevant to each. Conversely, let E be an equivalence relation such that Fi U E is Ti-satisfiable. Since T1 is stably infinite, there exists a T1-interpretation A satisfying F1 U E whose domain is countably infinite, and likewise a countably infinite T2-interpretation B satisfying F2 U E. Then |A|=|B|, and x=y under A if and only if x=y under B (both agree with E), so by the combination theorem there exists an interpretation satisfying F1 U F2 = F. The fact that such a combined interpretation exists still does not ensure that the algorithm will find it; but by the properties of congruence closure, this equivalence relation will eventually be found.

Handling quantifiers

So far we've handled only quantifier-free formulas. Allowing quantifiers within the formula requires a different approach: the satisfiability problem, which was NP-complete but decidable in the quantifier-free case, becomes undecidable, and naturally the Nelson-Oppen method does not apply to formulas with quantifiers. Simplify uses an incomplete method that is not guaranteed to find a proof when one exists. (Obviously we couldn't expect it to decide the problem, but there are systems, such as SPASS, that semi-decide it, i.e. that find a proof whenever one exists.) However, it turns out that in many practical cases the matching technique used in Simplify to handle quantifiers works well.

Basic idea

A universally quantified formula Ax1...Axn. P means that P holds for every assignment to x1..xn. So if we have such a formula in hand, we can instantiate its variables with any values and P will hold. The algorithm heuristically chooses the "most relevant" substitutions for the variables.
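Instantiation itself is mechanical once a substitution is chosen; the heuristic part is the choice. A small sketch, with terms encoded as nested tuples and quantified variables as '?'-prefixed strings (our encoding, not Simplify's):

```python
def instantiate(body, subst):
    """Apply a substitution to the body of a universally quantified
    formula (sketch).  Terms are nested tuples such as
    ('=', ('f', '?x'), ('f', ('g', '?x'))); strings starting with '?'
    stand for the quantified variables.
    """
    if isinstance(body, str):
        return subst.get(body, body)       # replace a bound variable, keep the rest
    return tuple(instantiate(t, subst) for t in body)

# Ax. f(x) = f(g(x)), instantiated with x := a
instantiate(('=', ('f', '?x'), ('f', ('g', '?x'))), {'?x': 'a'})
# -> ('=', ('f', 'a'), ('f', ('g', 'a')))
```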
An instance is relevant if it contains enough terms about which equalities are already known; the hope is that an instance carrying information about an equivalence class that has already been generated is more useful than an instance bearing information about terms we know nothing about. As a simple example, if we already know that a=b, then learning b=c lets us also deduce a=c, whereas from c=d we can obtain no new equalities. A simple implementation of this idea is obtained by choosing a single term from P, called the trigger and denoted t. A substitution S is considered relevant if S(t) appears among the terms about which equalities are already known. Choosing a good trigger is crucial for the algorithm's success. A good trigger will enable further equations, extending the congruence graph and allowing the proof to continue. A bad trigger might fail to create further instances or, in a more severe case, cause a matching loop, where extensions to the congruence graph keep occurring in an unbounded manner. We shall now see examples of these three kinds of triggers.

Example

Consider the formula AxAyAz. (x+y)*z = x*z+y*z

We can choose the trigger to be x+y. Now say that we have some prior knowledge of equalities involving a+b, i.e. it appears in the congruence graph. Then we will find it and generate the new information concerning it, adding a new '*' node to the congruence graph if one does not already appear. A more restrictive trigger would try to match the entire term (x+y)*z. The quality of both of these triggers depends on what we are trying to prove. If we are trying to prove a+b=b+a, then the restrictive trigger will fail to produce any new instance. However, if we are trying to prove (a+b)*c=a*c+b*c, then the restrictive trigger is excellent, as it immediately leads to the proof. A trigger with another bad impact is a trigger that generates a matching loop.
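Before turning to matching loops, trigger matching itself can be sketched as follows. This toy version is purely syntactic; Simplify actually matches modulo the congruence relation (E-matching), searching the congruence graph for terms equal to an instance of the trigger:

```python
def match(pattern, term, subst=None):
    """Match a trigger `pattern` against a ground `term` (sketch).

    Terms are nested tuples like ('*', ('+', 'a', 'b'), 'c'); strings
    starting with '?' are trigger variables.  Returns the substitution
    extending `subst`, or None if matching fails.
    """
    subst = dict(subst or {})
    if isinstance(pattern, str) and pattern.startswith('?'):
        if subst.get(pattern, term) != term:
            return None                    # variable already bound differently
        subst[pattern] = term
        return subst
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None                        # different function symbols
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

# The restrictive trigger (?x + ?y) * ?z against the known term (a+b)*c:
match(('*', ('+', '?x', '?y'), '?z'), ('*', ('+', 'a', 'b'), 'c'))
# -> {'?x': 'a', '?y': 'b', '?z': 'c'}
```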
A matching loop occurs when applying one matching leads to a new instance of another matching, and so on, indefinitely. An example follows.

Matching Loop Example

Assume Ax. f(x) = f(g(x)), and choose f(x) as the trigger. Now assume that we have in hand f(g(a)) = a from previous knowledge. We instantiate the trigger in the only way possible, by x=g(a), and now have {f(g(a)) = a, f(g(a)) = f(g(g(a))), a = f(g(g(a)))} (the first known, the second directly by instantiation, the third as a closure of the first two). So we now choose x=g(g(a)), add f(g(g(g(a)))) to the congruence graph, and so on. This process constitutes a never-terminating loop. The number of matching loops can be reduced by heuristics for choosing good triggers, or by asking the user to supply one. However, matching loops cannot be avoided entirely, and Simplify tries to detect them when they do occur, so that it will at least give an alert in such a case.

Triggers as sets

There are rules that cannot be instantiated by choosing a single trigger. The rule

AxAyAz. x=3 ^ y=4 ^ z=x*y -> z=12

will lead to nothing if we only choose x=3, so we are allowed (and sometimes must) to use a set of terms as the trigger.

Faults and merits of Simplify

Faults
1. The system is incomplete for quantified formulas, and may fail for some 'interesting' properties of real-life programs, which tend to be complicated.
2. The trigger mechanism may require human intervention in order to work well.
3. The mechanism for deciding formulas without quantifiers is not state of the art; there exist better, i.e. much faster, algorithms for SAT.

Merits
1. In terms of efficiency, the system performs well in practice, verifying large formulas efficiently.
2. In practice, and for simple programs, the system succeeds in verifying formulas without getting into a loop, so the incompleteness does not affect these cases.
3.
The system is almost fully automatic (apart from the possible human selection of triggers; the automatic selection may well be sufficient), as opposed to other systems that require major human help in the process.

Other Systems

Verifun

[Source: Lecture notes by Rajeev Joshi, www.eecs.berkeley.edu/~sseshia/219c/lectures/Lec18.pdf]

As mentioned above, one of the main problems with Simplify is that it uses an obsolete SAT-solving mechanism, causing a performance problem. The idea of Verifun is to use fast SAT solvers that supply candidate truth assignments for the atomic formulas, together with 'proof-supplying' modules. These modules are theory-specific, and can supply the reason for which an assignment failed to satisfy the formula. Using these 'reasons', we can disqualify, at once, a whole set of assignments containing the same inconsistency. Naturally, we need theory-specific modules that can produce such 'reasons' ("proof-explicating theory modules"). Again, we shall start with quantifier-free formulas and then introduce quantifiers.

Proof Explication Example

(Note that '<=' stands for 'less than or equal', and '->' stands for the logical connective of implication.)

Start with the formula (a=b) ^ ((f(a) <> f(b)) V (b=c)) ^ (f(a) <> f(c)).
Give each atom a new name, to create p ^ (q V r) ^ s.
Feed this formula to the SAT solver, to receive a truth assignment: say p, q, ~r, s.
Feed this set of literals (with their original meanings) to an equality decision procedure: (a=b), f(a) <> f(b), b<>c, f(a) <> f(c).
It finds an inconsistency, caused by the fact that (a=b) -> f(a)=f(b). This is mapped back to p -> ~q. So we add this clause to the formula, and now try to satisfy p ^ (q V r) ^ s ^ (p -> ~q). Note that in this way we have disqualified many truth assignments (namely, every assignment setting both p and q to true). The SAT solver may then suggest p, ~q, r, s as an assignment. This is inconsistent, as a=b ^ b=c -> f(a)=f(c).
So (p ^ r) -> ~s is added to the formula, and now the formula is found to be unsatisfiable by the SAT solver. Note that at each stage this algorithm learns 'important' new information, guided by the assignments it has already tried. This heuristic seems to be fruitful.

Handling quantifiers

As in Simplify, when quantifiers are present we might not find inconsistencies in the simple way described above. Rather, we must instantiate the quantified variables to create new pieces of information that will hopefully lead to an inconsistency.

Example

Ax Ay ((y < x) -> ~(f(x) < f(y)))
3 > b
a > 4
f(a) < f(b)

Any assignment that sets b to be smaller than 3 and a to be larger than 4 is consistent. To find the inconsistency we must instantiate the quantified formula with x=a, y=b, and thus add a new tautology:

Ax Ay ((y < x) -> ~(f(x) < f(y)))
3 > b
a > 4
f(a) < f(b)
(b < a) -> ~(f(a) < f(b))

Say we now choose an assignment that sets (a>4), (b<3) and ~(b<a) to true. The proof explicator finds a contradiction and we add a new rule: (a>4) ^ (b<3) -> ~~(b<a), i.e. (a>4) ^ (b<3) -> (b<a). It is now easy to see that no satisfying assignment exists: a contradiction was found.

ZAP

[Source: "Zap: Automated Theorem Proving for Software Analysis" \ Thomas Ball, Shuvendu Lahiri, Madanlal Musuvathi, http://research.microsoft.com/research/pubs/view.aspx?type=technical+report&id=991]

ZAP is a theorem-proving system developed at Microsoft Research. The system uses state-of-the-art SAT-solving algorithms as its search procedure, the Nelson-Oppen method to combine theories, and quantifier matching in a way similar to the one presented above. Its main target is to provide a richer set of operations specific to program analysis, thus serving as a more convenient interface for verifying software properties; accordingly, the emphasis is put on developing new, theory-specific decision procedures.
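To recap the lazy proof-explication loop that Verifun (and, in a similar architecture, ZAP) is built around, here is a toy sketch. The brute-force enumeration stands in for a real SAT solver, and the `explain` callback for a proof-explicating theory module; all names and encodings here are ours:

```python
from itertools import product

def lazy_solve(atoms, clauses, explain):
    """Lazy proof explication (toy sketch of the Verifun loop).

    A brute-force stand-in for a SAT solver proposes assignments
    satisfying the boolean skeleton; `explain` checks one against the
    theory and returns a refuting lemma clause, or None if the
    assignment is theory-consistent.  Literals are (atom, polarity)
    pairs; a clause is a set of literals read as a disjunction.
    """
    def holds(assign, clause):
        return any(assign[a] == pol for a, pol in clause)

    clauses = [set(c) for c in clauses]
    while True:
        model = None
        for vals in product([True, False], repeat=len(atoms)):
            assign = dict(zip(atoms, vals))
            if all(holds(assign, c) for c in clauses):
                model = assign
                break
        if model is None:
            return "unsatisfiable"        # boolean skeleton refuted
        lemma = explain(model)
        if lemma is None:
            return "satisfiable"          # theory-consistent model found
        clauses.append(lemma)             # learn the explicated reason

# The running example: p: a=b, q: f(a)<>f(b), r: b=c, s: f(a)<>f(c)
def explain(m):
    if m['p'] and m['q']:
        return {('p', False), ('q', False)}                 # a=b -> f(a)=f(b)
    if m['p'] and m['r'] and m['s']:
        return {('p', False), ('r', False), ('s', False)}   # a=b, b=c -> f(a)=f(c)
    return None

lazy_solve('pqrs',
           [{('p', True)}, {('q', True), ('r', True)}, {('s', True)}],
           explain)                       # 'unsatisfiable'
```

Running it on the example formula p ^ (q V r) ^ s from the Verifun section reproduces exactly the two learned lemmas described above before the skeleton becomes unsatisfiable.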