SAT-Based Decision Procedures for Subsets of First-Order Logic
Part II: Separation Logic

Randal E. Bryant
Carnegie Mellon University
http://www.cs.cmu.edu/~bryant

Outline
  - Background
  - SAT-based Decision Procedures
  - Equality with Uninterpreted Functions
      - Translating to propositional formula
      - Exploiting positive equality and sparse transitivity
  - Separation Logic
      - Translating to propositional formula
      - Hybrid encoding techniques

Separation Logic with Uninterpreted Functions (SUF)
  Suitable for verifying a wider class of systems

  Terms (T): integer expressions
      ITE(F, T1, T2)          If-then-else
      Fun(T1, …, Tk)          Function application
      T+1                     Increment
      T-1                     Decrement

  Formulas (F): Boolean expressions
      ¬F, F1 ∧ F2, F1 ∨ F2    Boolean connectives
      T1 = T2                 Equation
      T1 < T2                 Inequality
      Pred(T1, …, Tk)         Predicate application

SUF → Separation Logic
  Eliminate function and predicate applications using fresh variables and ITE expressions [Bryant, German, Velev, CAV'99]
      f(x) → v1 and f(y) → ITE(x = y, v1, v2)

  Terms (T): integer expressions
      ITE(F, T1, T2)          If-then-else
      Fun(T1, …, Tk)          Function application
      v                       Integer variable
      T+1                     Increment
      T-1                     Decrement

  Formulas (F): Boolean expressions
      ¬F, F1 ∧ F2, F1 ∨ F2    Boolean connectives
      T1 = T2                 Equation (separation predicate)
      T1 < T2                 Inequality (separation predicate)
      Pred(T1, …, Tk)         Predicate application
      b                       Boolean variable

Eager Boolean Encoding Methods for Separation Logic
  Separation Logic Formula
      → Small Domain Encoding (SD) or Per-Constraint Encoding (EIJ)
      → Boolean Formula
      → SAT Solver
      → satisfiable/unsatisfiable

Small Domain Encoding (SD) [Bryant, Lahiri, Seshia, CAV'02]
  Formula:   x ≥ y ∧ y ≥ z ∧ z ≥ x+1
  Encoding:  ⟨0 x1 x0⟩ ≥ ⟨0 y1 y0⟩ ∧ ⟨0 y1 y0⟩ ≥ ⟨0 z1 z0⟩ ∧ ⟨0 z1 z0⟩ ≥ ⟨0 x1 x0⟩ + 1
  - Observation: to check satisfiability, we need to consider all possible relative orderings of finitely many expressions
    [Figure: sample orderings of the expressions x, x+1, y, z along an axis of increasing values]
  - Can use a Boolean encoding of a finite range of values
      - 4 values in this case, so a 2-bit encoding

Per-Constraint Encoding (EIJ) [Strichman, Seshia, Bryant, CAV'02]
  Formula:   x ≥ y ∧ y ≥ z ∧ z ≥ x+1
  Predicate variables:
      e1 ⇔ x ≥ y
      e2 ⇔ y ≥ z
      e3 ⇔ z ≥ x+1
  Overall Boolean encoding:
      e1 ∧ e2 ∧ e3
  New separation predicate:
      e4 ⇔ x ≥ z
  Transitivity constraints:
      e1 ∧ e2 → e4
      e4 → ¬e3

Enforcing Transitivity Constraints
  Graph representation of separation constraints
  - Each constraint x ≥ y + c1 becomes an edge from x to y labeled c1
  - Directed multigraph where edges are labeled by constants
  Fourier-Motzkin elimination
  - Eliminate nodes in succession
  - Possibly exponential growth in edges
  [Figure: nodes x, y, z with edges x→y labeled c1, c3 and edges y→z labeled c2, c4; eliminating y yields x→z edges labeled c1 + c2, c1 + c4, c3 + c2, c3 + c4]

Introducing New Predicates
  [Figure: same graph as on the previous slide]
  Sample predicates:
      e1 ⇔ x ≥ y + c1
      e2 ⇔ y ≥ z + c2
      e3 ⇔ x ≥ z + c1 + c2
  Sample transitivity constraint:
      e1 ∧ e2 → e3
  Sample ordering constraint (for c1 < c2):
      e4 ⇔ x ≥ y + c2
      e4 → e1

Comparing Eager Encoding Methods
  Of the SD and EIJ encoding methods, which one is better?
  Comparison with respect to:
  - Size of the resulting Boolean formula
  - Performance of the SAT solver

Size of Boolean Encoding: SD better than EIJ
  - Let N be the size of the original separation logic formula
      - Size of a directed acyclic graph representation
  - SD encoding size is worst-case O(N^2)
  - EIJ encoding size is worst-case O(2^N)
      - Can generate O(2^N) transitivity constraints
  - Example: N = 6813

      Method   Boolean Encoding Size
      EIJ      > 1000000
      SD       54465

Impact on SAT problem: SD vs EIJ
  - Experimentally compared zChaff performance on SD and EIJ encodings of several unsatisfiable formulas
  - Sample result:

      Method   # Boolean variables   # CNF Clauses   # Conflict Clauses   zChaff Time (sec)
      EIJ      57211                 169387          150                  0.56
      SD       23112                 67699           15811                21.63

  - EIJ better than SD for zChaff

Impact on SAT: Why is EIJ better than SD?
  - Conjecture: for SD, the SAT solver has to "discover" transitivity constraints as conflict clauses
      - Violation of a transitivity constraint might be discovered only after assigning bits of several bit-vectors
  - EIJ adds all such constraints a priori
      - Less learning and backtracking required by the SAT solver

Eager Encoding Tradeoffs
  SD encoding
      (+) Polynomial size encoding
      (-) Worse for SAT solvers
  EIJ encoding
      (-) Worst-case exponential size encoding
      (+) Better for SAT solvers
  Can we automatically select between SD and EIJ based on the input formula?

Selection Strategy [Seshia, Lahiri, Bryant, DAC '03]
  Estimate number of transitivity constraints, C
      C > T?   YES: use SD encoding
               NO:  use EIJ encoding
  - Problem: computationally hard to estimate the number of transitivity constraints
  - Can we use a different metric?
  - Idea: identify a feature of the input formula that varies monotonically with the run-time of EIJ (but not with the run-time of SD)

A Good Formula Feature: Number of Separation Predicates
  [Figure: plots shown over two slides; not recoverable from the text]

Revised Selection Strategy
  Count number of separation predicates, m
      m > T?   YES: use SD encoding
               NO:  use EIJ encoding
  - Easy to count the number of separation predicates
  - Very approximate measure of the number of transitivity constraints
      - Constraints only relate predicates that share variables
  - Also need to automate setting of threshold T
      - Statistically estimate from a "training" set of benchmarks

Identifying Variable Classes
  [Figure: formula tree built from ∧ and ∨ nodes over the predicates u ≥ v, x ≥ y, z ≥ x+1, u = v-2, y ≥ z]
  - Predicates over {x,y,z} share variables
  - Predicates over {u,v} share variables
  - Assignments to {u,v} are independent of those to {x,y,z}

Hybrid Encoding Technique
  Separation Logic Formula
  Compute:
      1. Variable classes based on predicates
      2. Number of separation predicates for each class
  For each class, e.g. {x,y,z} with m1 predicates, ..., {u,v} with mk predicates:
      m1 > T?   YES: SD    NO: EIJ
      ...
      mk > T?   YES: SD    NO: EIJ
  Encode each class using SD or EIJ based on the local decision
  Result: Encoded Boolean Formula

Automatically Selecting a Threshold Value: Intuition
  EIJ run time increases drastically beyond a certain number of separation predicates
  [Figure: plot not recoverable from the text]

Automatically Selecting a Threshold Value using Clustering
  Cluster total time (Y-axis) values, minimizing the variance of each cluster
  [Figure: plot not recoverable from the text]

Experimental Evaluation
  Compared Hybrid against:
  - SD and EIJ encodings
  - Cooperating Validity Checker (CVC), based on a lazy encoding method [Stump et al. '02]
  - Stanford Validity Checker (SVC), non-SAT-based [Barrett et al. '96]
  - CVC and SVC can handle more expressive logics than SUF
  Benchmarks:
  - 49 unsatisfiable SUF formulas
      - Load-store unit, out-of-order unit, device driver code, compiler validation, DLX pipeline
  - Threshold value calculated from a subset of 16 benchmarks
      - Worked well for 39 out of the 49 benchmarks
  Setup:
  - Used zChaff SAT solver
  - Imposed timeout of 1800 sec. on total time (Encoding + SAT)

Hybrid vs. SD (39/49 benchmarks)
  [Figure: scatter plot with regions labeled "Hybrid better" and "SD better"]

Hybrid vs. EIJ (39/49 benchmarks)
  [Figure: scatter plot with regions labeled "Hybrid better" and "EIJ better"]

Hybrid vs. Lazy Encoding (CVC) (39/49 benchmarks)
  [Figure: scatter plot with regions labeled "Hybrid better" and "CVC better"]

Hybrid vs. Non-SAT-based Procedure (SVC) (39/49 benchmarks)
  [Figure: scatter plot with regions labeled "Hybrid better" and "SVC better"]

SD outperforms Hybrid on 10/49 benchmarks
  [Figure: scatter plot with regions labeled "Hybrid better" and "SD better"]

Conclusions & Ongoing Work
  Hybrid combination of EIJ and SD encodings:
  - is robust to formula variations
  - outperforms lazy encoding methods (CVC)
  - outperforms non-SAT-based methods (SVC)
  Ongoing & future work:
  - Alternate estimators for the number of transitivity constraints
  - Threshold setting technique based on clustering applies to other CAD problems too
  - Combination of lazy and eager encoding techniques might perform well on satisfiable formulas?
  More on the UCLID project webpage: http://www.cs.cmu.edu/~uclid
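
Sketch: eliminating function applications with ITE
  The slide on eliminating function and predicate applications gives f(x) → v1 and f(y) → ITE(x = y, v1, v2). A minimal Python sketch of that pattern for one function symbol follows. The helper name `eliminate_applications` and the string representation of terms are assumptions made for illustration, not part of the cited method's implementation.

```python
def eliminate_applications(args):
    """Replace the k-th application f(args[k]) by a nested ITE over fresh
    variables v1, v2, ... (hypothetical string-based term representation)."""
    results = []
    for k, arg in enumerate(args):
        # Innermost case: the argument differs from all earlier ones,
        # so the application gets its own fresh variable.
        expr = f"v{k + 1}"
        # Wrap from the most recent earlier argument outwards, so the
        # comparison against the first argument ends up outermost.
        for j in range(k - 1, -1, -1):
            expr = f"ITE({args[j]} = {arg}, v{j + 1}, {expr})"
        results.append(expr)
    return results


two = eliminate_applications(["x", "y"])
three = eliminate_applications(["x", "y", "z"])
print(two)
print(three[2])
```

  For the two applications f(x) and f(y), this reproduces the pair of replacements shown on the slide. The nesting guarantees functional consistency: equal arguments always select the same fresh variable.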
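
Sketch: the small-domain idea on the running example
  The SD slide observes that only a finite range of values needs to be considered (4 values for the example x ≥ y ∧ y ≥ z ∧ z ≥ x+1). The sketch below illustrates that idea by enumerating the range directly. This brute-force search is a stand-in for the bit-vector encoding plus SAT solver, and the function names are hypothetical.

```python
from itertools import product


def example_formula(x, y, z):
    # Running example from the slides: x >= y, y >= z, z >= x + 1.
    return x >= y and y >= z and z >= x + 1


def satisfiable_variant(x, y, z):
    # A variant made up for contrast: the last constraint is reversed.
    return x >= y and y >= z and x >= z + 1


def small_domain_search(formula, num_vars, domain_size):
    """Return the first satisfying assignment over {0..domain_size-1},
    or None if no assignment in that range satisfies the formula."""
    for assignment in product(range(domain_size), repeat=num_vars):
        if formula(*assignment):
            return assignment
    return None


unsat_result = small_domain_search(example_formula, 3, 4)
sat_result = small_domain_search(satisfiable_variant, 3, 4)
print(unsat_result, sat_result)
```

  The running example has no solution in the range, matching the fact that x ≥ y ≥ z ≥ x+1 is contradictory, while the contrasting variant is satisfied within the same range.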
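
Sketch: why EIJ needs transitivity constraints
  The EIJ slide encodes the running example as e1 ∧ e2 ∧ e3 and adds transitivity constraints over a new predicate e4 ⇔ x ≥ z. The sketch below enumerates all Boolean assignments, in place of a SAT solver, to show the effect of those constraints. The constraint e4 → ¬e3 is written out under the assumption that x ≥ z and z ≥ x+1 cannot both hold.

```python
from itertools import product


def eij_models(with_transitivity):
    """Enumerate assignments to (e1, e2, e3, e4) satisfying the encoding."""
    found = []
    for e1, e2, e3, e4 in product([False, True], repeat=4):
        ok = e1 and e2 and e3                      # overall Boolean encoding
        if with_transitivity:
            ok = ok and ((not (e1 and e2)) or e4)  # e1 AND e2 -> e4
            ok = ok and ((not e4) or (not e3))     # e4 -> NOT e3
        if ok:
            found.append((e1, e2, e3, e4))
    return found


spurious = eij_models(with_transitivity=False)
sound = eij_models(with_transitivity=True)
print(len(spurious), len(sound))
```

  Without the transitivity constraints the Boolean abstraction has spurious models; with them it is unsatisfiable, as the original separation formula is.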
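
Sketch: variable classes and the per-class SD/EIJ decision
  The hybrid technique groups variables into classes based on shared predicates, counts separation predicates per class, and picks SD when the count exceeds T. A minimal sketch using union-find follows. The pair-of-variables predicate representation, the function names, and the threshold value 2 are assumptions chosen for illustration.

```python
def variable_classes(predicates):
    """Group variables that share predicates; return (class, count) pairs.
    Each predicate is given as the pair of variables it mentions."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in predicates:
        parent[find(a)] = find(b)          # merge the two classes

    members, counts = {}, {}
    for a, b in predicates:
        root = find(a)
        members.setdefault(root, set()).update((a, b))
        counts[root] = counts.get(root, 0) + 1
    return [(frozenset(members[r]), counts[r]) for r in counts]


def choose_encoding(num_predicates, threshold):
    # Revised selection strategy from the slides: m > T selects SD.
    return "SD" if num_predicates > threshold else "EIJ"


# Predicates from the variable-classes slide:
# u >= v, x >= y, z >= x+1, u = v-2, y >= z
preds = [("u", "v"), ("x", "y"), ("z", "x"), ("u", "v"), ("y", "z")]
T = 2  # hypothetical threshold
plan = {cls: choose_encoding(m, T) for cls, m in variable_classes(preds)}
print(plan)
```

  With this illustrative threshold, the class {x,y,z} (three predicates) is encoded with SD and the class {u,v} (two predicates) with EIJ, each decided locally.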
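
Sketch: picking a threshold by clustering run times
  The slides say only that total-time values are clustered so as to minimize the variance of each cluster. The sketch below is one possible reading of that, not the authors' procedure: split the sorted times into two clusters with the smallest total squared deviation, then take T as the largest predicate count in the fast cluster. The sample data is hypothetical and is not taken from the reported experiments.

```python
def squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_cluster_cut(times):
    """Return the time value separating the best two-cluster split."""
    ordered = sorted(times)
    best_cost, best_cut = None, None
    for i in range(1, len(ordered)):
        cost = squared_error(ordered[:i]) + squared_error(ordered[i:])
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_cut = (ordered[i - 1] + ordered[i]) / 2
    return best_cut


def pick_threshold(samples):
    """samples: (number of separation predicates, total EIJ time) pairs."""
    cut = two_cluster_cut([t for _, t in samples])
    fast = [m for m, t in samples if t <= cut]
    return max(fast)


# Hypothetical training data: (predicate count, total time in seconds).
training = [(50, 0.4), (120, 0.9), (300, 1.5), (700, 2.0),
            (1500, 400.0), (2500, 450.0)]
threshold = pick_threshold(training)
print(threshold)
```

  On this made-up data the run times fall into a fast and a slow group, and the threshold lands at the largest predicate count still in the fast group.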