A Hybrid SAT-based Decision Procedure for Separation Logic with Uninterpreted Functions
Sanjit A. Seshia
Joint work with Shuvendu K. Lahiri & Randal E. Bryant
Carnegie Mellon University, USA
June 2003

Decision Procedures in Formal Verification
RTL/Source Code + Specification → (Abstraction) → Formal Model + Specification → Formula → Decision Procedure for a Decidable Fragment of First-Order Logic → Satisfiable/Unsatisfiable → Verification OK or Error
Applications: out-of-order and pipelined microprocessors; cache coherence protocols; device drivers; compiler validation; …

Data and Function Abstraction
Common operations and their abstractions:
– Bit-vectors x0, x1, …, xn-1 → (unbounded) integers x
– Multiplexer with select p → if-then-else ITE(p, x, y)
– Equality comparator → test for equality x = y
– Functional units (e.g., an ALU) → uninterpreted functions f, constrained only by functional consistency: a = x ∧ b = y ⇒ f(a, b) = f(x, y)
– Less-than comparator → test for ordering x < y
– Counters → x + 1

Separation Logic with Uninterpreted Functions (SUF)
– Sufficiently expressive for the aforementioned applications
– A system property is expressed as an SUF formula F, which is efficiently decided via translation to SAT
Terms (T), i.e., integer expressions:
– ITE(F, T1, T2)   if-then-else
– Fun(T1, …, Tk)   function application
– T + 1   increment
– T − 1   decrement
Formulas (F), i.e., Boolean expressions:
– ¬F, F1 ∧ F2, F1 ∨ F2   Boolean connectives
– T1 = T2   equation
– T1 < T2   inequality
– Pred(T1, …, Tk)   predicate application

SAT-based Decision Procedures
EAGER ENCODING: Input formula → satisfiability-preserving Boolean encoder → Boolean formula → SAT solver → satisfiable/unsatisfiable
LAZY ENCODING: Input formula → approximate Boolean encoder → Boolean formula → SAT solver
– unsatisfiable → the input formula is unsatisfiable
– satisfiable → the satisfying assignment is passed to a SAT checker for first-order conjunctions
   – satisfiable → the input formula is satisfiable
   – unsatisfiable → an additional clause is added to the Boolean formula, and the SAT solver runs again

Talk Outline
– SUF → Separation Logic → SAT
   – Two eager encoding techniques
   – Pros and cons of each technique
– Combining eager encoding techniques
   – The hybrid eager encoding technique
– Experimental results
   – Superior performance to lazy encoding methods and non-SAT-based decision procedures
– Conclusions
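The lazy-encoding loop on the SAT-based Decision Procedures slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the UCLID implementation:
– Each separation predicate is assumed to be already normalized to a difference constraint x − y ≤ c over integers.
– The Boolean structure is given as CNF with DIMACS-style literals.
– brute_force_sat is a hypothetical stand-in for a real SAT solver such as zChaff.
– The first-order conjunction check uses Bellman-Ford negative-cycle detection.
– The "additional clause" is the simplest possible choice: it blocks the whole assignment.
All function names are hypothetical.

```python
from itertools import product

# An atom (x, y, c) stands for the difference constraint  x - y <= c  over integers.
# Assumption: separation predicates have already been normalized to this form.

def negate(atom):
    """not (x - y <= c)  <=>  x - y >= c + 1  <=>  y - x <= -c - 1 (integers)."""
    x, y, c = atom
    return (y, x, -c - 1)

def brute_force_sat(clauses, n):
    """Hypothetical stand-in for a SAT solver. Clauses use DIMACS-style literals
    (+i / -i refer to atom i, 1-based). Returns a tuple of n booleans, or None."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in cl) for cl in clauses):
            return bits
    return None

def consistent(constraints):
    """Bellman-Ford: a conjunction of x - y <= c is satisfiable iff the graph with
    an edge y -> x of weight c per constraint has no negative cycle."""
    nodes = {v for x, y, _ in constraints for v in (x, y)}
    dist = {v: 0 for v in nodes}  # as if relaxed from a virtual source at distance 0
    for _ in range(len(nodes) + 1):
        changed = False
        for x, y, c in constraints:
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return True
    return False  # still relaxing after |V| + 1 rounds: negative cycle

def lazy_decide(atoms, clauses):
    """Lazy encoding loop: Boolean abstraction -> SAT -> theory check -> new clause."""
    clauses = [list(cl) for cl in clauses]
    while True:
        bits = brute_force_sat(clauses, len(atoms))
        if bits is None:
            return "unsatisfiable"
        constraints = [a if b else negate(a) for a, b in zip(atoms, bits)]
        if consistent(constraints):
            return "satisfiable"
        # Block this assignment: the "additional clause" fed back to the SAT solver.
        clauses.append([-(i + 1) if b else i + 1 for i, b in enumerate(bits)])
```

Consider the running example x ≥ y ∧ y ≥ z ∧ z ≥ x + 1, written as the atoms ("y", "x", 0), ("z", "y", 0) and ("x", "z", -1) with clauses [[1], [2], [3]]. The only Boolean assignment makes all three atoms true. Bellman-Ford then finds the negative cycle x → y → z → x, the assignment is blocked, and lazy_decide returns "unsatisfiable". Real lazy tools add smaller clauses derived from the conflicting subset of constraints, which prunes far more of the search space.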
SUF → Separation Logic
Eliminate function and predicate applications using fresh variables and ITE expressions [Bryant, German, Velev, CAV'99]:
– f(x) → v1 and f(y) → ITE(x = y, v1, v2)
Resulting separation logic:
Terms (T), i.e., integer expressions:
– ITE(F, T1, T2)   if-then-else
– v   integer variable (replaces function application)
– T + 1   increment
– T − 1   decrement
Formulas (F), i.e., Boolean expressions:
– ¬F, F1 ∧ F2, F1 ∨ F2   Boolean connectives
– T1 = T2   equation
– T1 < T2   separation predicate (formerly "inequality")
– b   Boolean variable (replaces predicate application)

Eager Boolean Encoding Methods for Separation Logic
Separation logic formula → Small Domain Encoding (SD) or Per-Constraint Encoding (EIJ) → Boolean formula → SAT solver → satisfiable/unsatisfiable

Small Domain Encoding (SD) [Bryant, Lahiri, Seshia, CAV'02]
Example: x ≥ y ∧ y ≥ z ∧ z ≥ x + 1
Encoded: ⟨0 x1 x0⟩ ≥ ⟨0 y1 y0⟩ ∧ ⟨0 y1 y0⟩ ≥ ⟨0 z1 z0⟩ ∧ ⟨0 z1 z0⟩ ≥ ⟨0 x1 x0⟩ + 1
Observation: to check satisfiability, we need to consider only the possible relative orderings of the finitely many expressions involved (here x, y, z and x + 1).
– A Boolean encoding of a finite range of values can therefore be used: 4 values in this case, so a 2-bit encoding.

Per-Constraint Encoding (EIJ) [Strichman, Seshia, Bryant, CAV'02]
Example: x ≥ y ∧ y ≥ z ∧ z ≥ x + 1
– One Boolean variable per separation predicate: e1 ≡ x ≥ y, e2 ≡ y ≥ z, e3 ≡ z ≥ x + 1
– New separation predicate: e4 ≡ x ≥ z
– Transitivity constraints: e1 ∧ e2 ⇒ e4 and e4 ⇒ ¬e3
Overall Boolean encoding: e1 ∧ e2 ∧ e3 ∧ (e1 ∧ e2 ⇒ e4) ∧ (e4 ⇒ ¬e3). Like the original formula, it is unsatisfiable.

Comparing Eager Encoding Methods
Of the SD and EIJ encoding methods, which one is better?
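To make the two eager encodings concrete, the small Python sketch below checks the running example x ≥ y ∧ y ≥ z ∧ z ≥ x + 1 both ways. It uses exhaustive enumeration rather than a SAT solver.
– sd_satisfiable enumerates the six bits of the 2-bit vectors ⟨x1 x0⟩, ⟨y1 y0⟩ and ⟨z1 z0⟩. The 4-value range is taken from the SD slide, not computed here.
– eij_satisfiable enumerates e1 to e4, with or without the two transitivity constraints.
Both function names are hypothetical.

```python
from itertools import product

def sd_satisfiable():
    """Small-domain (SD) style check of x >= y and y >= z and z >= x + 1:
    each variable is a 2-bit vector <v1 v0>, i.e. it ranges over 0..3."""
    for x1, x0, y1, y0, z1, z0 in product([0, 1], repeat=6):
        x, y, z = 2 * x1 + x0, 2 * y1 + y0, 2 * z1 + z0
        if x >= y and y >= z and z >= x + 1:
            return True
    return False

def eij_satisfiable(with_transitivity=True):
    """Per-constraint (EIJ) style check: e1: x >= y, e2: y >= z, e3: z >= x + 1,
    e4: x >= z (the new separation predicate)."""
    for e1, e2, e3, e4 in product([False, True], repeat=4):
        f = e1 and e2 and e3
        if with_transitivity:
            f = f and ((not (e1 and e2)) or e4)  # e1 and e2 => e4
            f = f and ((not e4) or (not e3))     # e4 => not e3
        if f:
            return True
    return False

print(sd_satisfiable(), eij_satisfiable(), eij_satisfiable(with_transitivity=False))
# prints: False False True
```

The last result shows why EIJ needs transitivity constraints. Without them, the purely propositional formula e1 ∧ e2 ∧ e3 is satisfiable even though the separation logic formula is not.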
Comparison with respect to:
– Size of the resulting Boolean formula
– Performance of the SAT solver

Size of Boolean Encoding: SD Better than EIJ
Let N be the size of the original separation logic formula, measured as the size of its directed acyclic graph representation.
– SD encoding size is worst-case O(N^2)
– EIJ encoding size is worst-case O(2^N), since it can generate O(2^N) transitivity constraints
Example with N = 6813:
Method | Boolean encoding size
EIJ    | > 1,000,000
SD     | 54,465

Impact on the SAT Problem: SD vs. EIJ
We experimentally compared zChaff performance on the SD and EIJ encodings of several unsatisfiable formulas. Sample result:
Method | # Boolean variables | # CNF clauses | # zChaff conflict clauses | Time (sec)
EIJ    | 57211               | 169387        | 150                       | 0.56
SD     | 23112               | 67699         | 15811                     | 21.63
EIJ performs better than SD with zChaff.

Impact on SAT: Why Is EIJ Better than SD?
Conjecture: with SD, the SAT solver has to "discover" transitivity constraints as conflict clauses.
– A violation of a transitivity constraint might be discovered only after bits of several bit-vectors have been assigned.
EIJ adds all such constraints a priori.
– The SAT solver therefore needs less learning and backtracking.

Eager Encoding Tradeoffs
SD encoding:
+ Polynomial-size encoding
– Worse for SAT solvers
EIJ encoding:
– Worst-case exponential-size encoding
+ Better for SAT solvers
Can we automatically select between SD and EIJ based on the input formula?

Selection Strategy
Estimate the number of transitivity constraints, C. If C > T, use the SD encoding; otherwise, use the EIJ encoding.
Problem: estimating the number of transitivity constraints is computationally hard. Can we use a different metric?
– Idea: identify a feature of the input formula that varies monotonically with the run time of EIJ, but not with the run time of SD.

A Good Formula Feature: Number of Separation Predicates
(Two slides of plots; not reproduced.)

Revised Selection Strategy
Count the number of separation predicates, m, and compare it with a threshold T:
– If m > T, use the SD encoding; otherwise, use the EIJ encoding.
+ The number of separation predicates is easy to count
– It is only a very approximate measure of the number of transitivity constraints
   – Constraints relate only predicates that share variables
The setting of threshold T must also be automated:
– Estimate it statistically from a "training" set of benchmarks

Identifying Variable Classes
Example formula: an ∧/∨ DAG over the predicates u ≥ v, u = v − 2, x ≥ y, y ≥ z and z ≥ x + 1.
– Predicates over {x, y, z} share variables, and so do predicates over {u, v}
– Assignments to {u, v} are independent of those to {x, y, z}

Hybrid Encoding Technique
Separation logic formula → compute:
1. Variable classes, based on the predicates
2. The number of separation predicates for each class, e.g., m1 for {x, y, z}, …, mk for {u, v}
For each class i: if mi > T, use SD; otherwise, use EIJ.
Each class is encoded with SD or EIJ according to this local decision → encoded Boolean formula.

Automatically Selecting a Threshold Value: Intuition
EIJ run time increases drastically beyond a certain number of separation predicates. (Plot not reproduced.)

Automatically Selecting a Threshold Value using Clustering
Cluster the total-time (Y-axis) values, minimizing the variance of each cluster. (Plot not reproduced.)

Experimental Evaluation Setup
Hybrid was compared against:
– The SD and EIJ encodings
– The Cooperating Validity Checker (CVC), based on a lazy encoding method [Stump et al. '02]
– The Stanford Validity Checker (SVC), which is not SAT-based [Barrett et al. '96]
– CVC and SVC can handle more expressive logics than SUF
Benchmarks:
– 49 unsatisfiable SUF formulas
– Load-store unit, out-of-order unit, device driver code, compiler validation, DLX pipeline
– The threshold value was calculated from a subset of 16 benchmarks and worked well for 39 of the 49 benchmarks
Setup:
– zChaff SAT solver
– Timeout of 1800 sec. on total time (encoding + SAT)

Results (scatter plots, not reproduced; each plot separates a "Hybrid better" region from a region where the other method is better):
– Hybrid vs. SD (39/49 benchmarks)
– Hybrid vs. EIJ (39/49 benchmarks)
– Hybrid vs. Lazy Encoding (CVC) (39/49 benchmarks)
– Hybrid vs. Non-SAT-based Procedure (SVC) (39/49 benchmarks)
– SD outperforms Hybrid on 10/49 benchmarks

Conclusions & Ongoing Work
The hybrid combination of the EIJ and SD encodings:
– is robust to formula variations
– outperforms lazy encoding methods (CVC)
– outperforms non-SAT-based methods (SVC)
Ongoing & future work:
– Alternate estimators for the number of transitivity constraints
– The clustering-based threshold-setting technique applies to other CAD problems too
– Might a combination of lazy and eager encoding techniques perform well on satisfiable formulas?
More on the UCLID project webpage: http://www.cs.cmu.edu/~uclid
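The two automated steps of the hybrid procedure described earlier can be sketched in Python: computing variable classes with a per-class SD/EIJ choice, and picking a threshold. This is one plausible reading, not the authors' implementation, and it rests on these assumptions:
– Each predicate is given only as the tuple of variables it mentions.
– Every predicate in a class counts toward m.
– pick_threshold reads "cluster total time values, minimizing variance" as an optimal two-way split of the sorted EIJ run times that minimizes within-cluster squared deviation. T is then the largest m in the fast cluster.
All names are hypothetical.

```python
from collections import defaultdict

def variable_classes(predicates):
    """Union-find: variables connected through shared predicates form one class.
    Returns (variables, m) pairs, m = number of predicates in the class."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for vars_ in predicates:
        root = find(vars_[0])
        for v in vars_[1:]:
            r = find(v)
            if r != root:
                parent[r] = root
    classes = defaultdict(lambda: [set(), 0])  # root -> [variables, predicate count]
    for vars_ in predicates:
        entry = classes[find(vars_[0])]
        entry[0].update(vars_)
        entry[1] += 1
    return [(frozenset(vs), m) for vs, m in classes.values()]

def choose_encodings(predicates, threshold):
    """Local decision per class, as in the hybrid flowchart: m > T -> SD, else EIJ."""
    return {vs: ("SD" if m > threshold else "EIJ")
            for vs, m in variable_classes(predicates)}

def pick_threshold(training):
    """training: at least two (m, eij_time_seconds) pairs from benchmark runs.
    Split the run times (sorted) into a fast and a slow cluster so that the total
    within-cluster squared deviation is minimal; return the largest m that is fast."""
    pts = sorted(training, key=lambda p: p[1])
    times = [t for _, t in pts]

    def sse(ts):
        mean = sum(ts) / len(ts)
        return sum((t - mean) ** 2 for t in ts)

    best_k = min(range(1, len(pts)), key=lambda k: sse(times[:k]) + sse(times[k:]))
    return max(m for m, _ in pts[:best_k])
```

Take the Identifying Variable Classes example, with predicates [("u", "v"), ("u", "v"), ("x", "y"), ("y", "z"), ("z", "x")]. This gives the classes {u, v} with m = 2 and {x, y, z} with m = 3. With T = 2, choose_encodings picks EIJ for {u, v} and SD for {x, y, z}. Sorting before splitting keeps each cluster a contiguous range of run times. For one-dimensional data, that is enough to find the variance-minimizing two-way split by trying every cut point.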