Introduction to Satisfiability Modulo Theories (SMT)
Clark Barrett, NYU
Sanjit A. Seshia, UC Berkeley
ICCAD Tutorial, November 2, 2009

Boolean Satisfiability (SAT)
[Figure: a Boolean circuit φ built from ∨, ∧, and ¬ gates over the inputs p1, p2, …, pn]
Is there an assignment to the variables p1, p2, …, pn such that φ evaluates to 1?

C. Barrett & S. A. Seshia, ICCAD 2009 Tutorial

Satisfiability Modulo Theories
[Figure: the same circuit φ, where the inputs p1, p2, …, pn are now theory atoms such as x = y, x + 2z ≥ 1, w & 0xFFFF = x, and x % 26 = v]
Is there an assignment to the variables x, y, z, w such that φ evaluates to 1?

Satisfiability Modulo Theories
• Given a formula in first-order logic, with associated background theories, is the formula satisfiable?
  – Yes: return a satisfying solution
  – No: [generate a proof of unsatisfiability]

Applications of SMT
• Hardware verification at higher levels of abstraction (RTL and above)
• Verification of analog/mixed-signal circuits
• Verification of hybrid systems
• Software model checking
• Software testing
• Security: finding vulnerabilities, verifying electronic voting machines, …
• Program synthesis
• …

References
• Satisfiability Modulo Theories. Clark Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. Chapter 8 in the Handbook of Satisfiability, Armin Biere, Hans van Maaren, and Toby Walsh, editors, IOS Press, 2009. (available from our webpages)
• SMTLIB: a repository for SMT formulas (common format) and tools
• SMTCOMP: an annual competition of SMT solvers

Roadmap for this Tutorial
• Background and Notation
• Survey of Theories
• Theory Solvers
• Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion

Roadmap for this Tutorial: Background and Notation
First-Order Logic
• A formal notation for mathematics, with expressions involving
  – Propositional symbols
  – Predicates
  – Functions and constant symbols
  – Quantifiers
• In contrast, propositional (Boolean) logic only involves propositional symbols and operators

First-Order Logic: Syntax
• As with propositional logic, expressions in first-order logic are made up of sequences of symbols.
• Symbols are divided into logical symbols and non-logical symbols (parameters).
• Example: (x = y) ∧ (y = z) ∧ (f(z) ≥ f(x) + 1)

First-Order Logic: Syntax
• Logical symbols
  – Propositional connectives: ∨, ∧, ¬, →, ↔
  – Variables: v1, v2, …
  – Quantifiers: ∀, ∃
• Non-logical symbols (parameters)
  – Equality: =
  – Functions: +, -, %, bit-wise &, f(), concat, …
  – Predicates: ≤, is_substring, …
  – Constant symbols: 0, 1.0, null, …

Quantifier-free Subset
• We will largely restrict ourselves to formulas without quantifiers (∀, ∃)
• This is called the quantifier-free subset (fragment) of first-order logic with the relevant theory

Logical Theory
• Defines a set of parameters (non-logical symbols) and their meanings
• This definition is called a signature.
• Example of a signature: theory of linear arithmetic over integers
  – Signature is (0, 1, +, -, ≤) interpreted over Z

Roadmap for this Tutorial: Survey of Theories
Some Useful Theories
• Equality (with uninterpreted functions)
• Linear arithmetic (over Q or Z)
• Difference logic (over Q or Z)
• Finite-precision bit-vectors
  – integer or floating-point
• Arrays / memories
• Misc.: non-linear arithmetic, strings, inductive datatypes (e.g. lists), sets, …

Theory of Equality and Uninterpreted Functions (EUF)
• Also called the "free theory"
  – Because function symbols can take any meaning
  – Only property required is congruence: these symbols map identical arguments to identical values, i.e., x = y ⇒ f(x) = f(y)
• SMTLIB name: QF_UF

Data and Function Abstraction with EUF
• Bit-vectors to abstract domain (e.g. Z): the bits x0, x1, x2, …, xn-1 become a single term x
• Common operations
  – If-then-else: ITE(p, x, y) selects x when p = 1 and y when p = 0
  – Test for equality: x = y
• Functional units (e.g. an ALU) to uninterpreted functions: a = x ∧ b = y ⇒ f(a, b) = f(x, y)

Hardware Abstraction with EUF
[Figure: pipelined processor datapath (IF/ID, ID/EX, and EX/WB stages; PC, instruction memory, register file, ALU, +4 incrementer) with data-transforming blocks replaced by uninterpreted functions F1, F2, F3]
• For any block that transforms or evaluates data:
  – Replace with generic, unspecified function
  – Also view instruction memory as function

Example QF_UF (EUF) Formula
(x = y) ∧ (y = z) ∧ (f(x) ≠ f(z))
• Transitivity: (x = y) ∧ (y = z) ⇒ (x = z)
• Congruence: (x = z) ⇒ (f(x) = f(z))

Equivalence Checking of Program Fragments

    int fun1(int y) {
        int x, z;
        z = y;
        y = x;
        x = z;
        return x*x;
    }

    int fun2(int y) {
        return y*y;
    }

SMT formula φ (satisfiable iff the programs are non-equivalent):
(z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = x1*x1) ∧ (ret2 = y*y) ∧ (ret1 ≠ ret2)

What if we use SAT to check equivalence?
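The transitivity and congruence steps in the QF_UF example above can be mechanized with a union-find structure. The following is a minimal sketch of congruence closure for a conjunction of literals, not the algorithm of any particular solver; the function name `euf_satisfiable` and its input encoding (terms as strings, applications as a dictionary) are assumptions made for illustration.

```python
from itertools import combinations


def euf_satisfiable(equalities, disequalities, applications):
    """Check a conjunction of EUF literals (hypothetical helper).

    equalities / disequalities: lists of (term, term) pairs.
    applications: dict mapping an application term such as "f(x)"
    to a (function_name, argument_term) pair.
    """
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    # Transitivity comes for free: merging classes is transitive.
    for a, b in equalities:
        union(a, b)

    # Congruence: equal arguments force equal results; repeat to a fixpoint.
    changed = True
    while changed:
        changed = False
        for (t1, (f1, a1)), (t2, (f2, a2)) in combinations(applications.items(), 2):
            if f1 == f2 and find(a1) == find(a2) and find(t1) != find(t2):
                union(t1, t2)
                changed = True

    # Satisfiable iff no disequality joins two terms of the same class.
    return all(find(a) != find(b) for a, b in disequalities)


apps = {"f(x)": ("f", "x"), "f(z)": ("f", "z")}
print(euf_satisfiable([("x", "y"), ("y", "z")], [("f(x)", "f(z)")], apps))
```

On the slide's formula the check reports unsatisfiable, because x and z end up in the same class and congruence then merges f(x) with f(z).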
Equivalence Checking of Program Fragments (continued)
Using SAT to check equivalence of fun1 and fun2 (w/ MiniSat):
• 32 bits for y: did not finish in over 5 hours
• 16 bits for y: 37 sec.
• 8 bits for y: 0.5 sec.

Equivalence Checking of Program Fragments (continued)
SMT formula φ′ (multiplication replaced by the uninterpreted function sq):
(z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = sq(x1)) ∧ (ret2 = sq(y)) ∧ (ret1 ≠ ret2)
Using EUF solver: 0.01 sec

Equivalence Checking of Program Fragments

    int fun1(int y) {
        int x;
        x = x ^ y;
        y = x ^ y;
        x = x ^ y;
        return x*x;
    }

    int fun2(int y) {
        return y*y;
    }

• Does EUF still work? No! Must reason about bit-wise XOR.
• Need a solver for bit-vector arithmetic.
• Solvable in less than a sec. with a current bit-vector solver.

Finite-Precision Bit-Vector Arithmetic (QF_BV)
– Fixed-width data words
  • Can model int, short, long, etc.
– Arithmetic operations
  • E.g., add/subtract/multiply/divide & comparisons
  • Two's complement and unsigned operations
– Bit-wise logical operations
  • E.g., and/or/xor, shift/extract and equality
– Boolean connectives

Linear Arithmetic (QF_LRA, QF_LIA)
• Boolean combination of linear constraints of the form (a1 x1 + a2 x2 + … + an xn ⋈ b)
• xi's could be in Q or Z, ⋈ ∈ {≥, >, ≤, <, =}
• Many applications, including:
  – Verification of analog circuits
  – Software verification, e.g., of array bounds
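To make the XOR-swap example above concrete, the sketch below checks the two fragments for equivalence by exhaustive enumeration at 8 bits. This is only a stand-in for a bit-vector solver, under the assumptions that arithmetic wraps modulo 2^WIDTH and that the uninitialized local `x` in `fun1` is treated as an arbitrary extra input; the names `WIDTH`, `MASK`, and `find_counterexample` are illustrative.

```python
WIDTH = 8                    # assumed word size for the experiment
MASK = (1 << WIDTH) - 1


def fun1(y, x):
    # x models the uninitialized local of the C fragment (any value).
    x = (x ^ y) & MASK
    y = (x ^ y) & MASK
    x = (x ^ y) & MASK
    return (x * x) & MASK


def fun2(y):
    return (y * y) & MASK


def find_counterexample():
    """Return (y, x) on which the fragments differ, or None if none exists."""
    for y in range(1 << WIDTH):
        for x in range(1 << WIDTH):
            if fun1(y, x) != fun2(y):
                return (y, x)
    return None


print(find_counterexample())
```

The search space here is 2^16 input pairs; at 32 bits it would be 2^64, which is why a solver that reasons about XOR at the word level is needed rather than enumeration.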
Difference Logic (QF_IDL, QF_RDL)
• Boolean combination of linear constraints of the form xi - xj ⋈ cij or xi ⋈ ci
  – ⋈ ∈ {≥, >, ≤, <, =}, xi's in Q or Z
• Applications:
  – Software verification (most linear constraints are of this form)
  – Processor datapath verification
  – Job shop scheduling / real-time systems
  – Timing verification for circuits

Arrays/Memories
• SMT solvers can also be very effective in modeling data structures in software and hardware
  – Arrays in programs
  – Memories in hardware designs: e.g. instruction and data memories, CAMs, etc.

Theory of Arrays (QF_AX): Select and Store
• Two interpreted functions: select and store
  – select(A, i): read from A at index i
  – store(A, i, d): write d to A at index i
• Two main axioms:
  – select(store(A, i, d), i) = d
  – select(store(A, i, d), j) = select(A, j) for i ≠ j
• One other axiom:
  – (∀ i. select(A, i) = select(B, i)) ⇒ A = B

Equivalence Checking of Program Fragments

    int fun1(int y) {
        int x[2];
        x[0] = y;
        y = x[1];
        x[1] = x[0];
        return x[1]*x[1];
    }

    int fun2(int y) {
        return y*y;
    }

SMT formula φ″:
[ x1 = store(x, 0, y) ∧ y1 = select(x1, 1) ∧ x2 = store(x1, 1, select(x1, 0)) ∧ ret1 = sq(select(x2, 1)) ] ∧ (ret2 = sq(y)) ∧ (ret1 ≠ ret2)

Roadmap for this Tutorial: Theory Solvers

Over to Clark…

Roadmap for this Tutorial: Approaches to SMT Solving, Eager Encoding to SAT
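The select/store axioms from the Theory of Arrays slide can be illustrated with a small functional model. This is a sketch under assumptions: arrays are modeled as immutable Python dictionaries with a default value for unwritten indices, and `fun1_array_result` is a hypothetical helper that replays the store/select chain of φ″ without the final sq application.

```python
def store(a, i, d):
    """Return a new array equal to a except that index i holds d."""
    updated = dict(a)
    updated[i] = d
    return updated


def select(a, i, default=0):
    """Read index i; unwritten indices read as an assumed default value."""
    return a.get(i, default)


def fun1_array_result(x, y):
    # Mirrors: x1 = store(x,0,y); y1 = select(x1,1);
    #          x2 = store(x1,1,select(x1,0)); result = select(x2,1)
    x1 = store(x, 0, y)
    y1 = select(x1, 1)  # value assigned to y in fun1; unused afterwards
    x2 = store(x1, 1, select(x1, 0))
    return select(x2, 1)


A = {0: 7, 1: 9}
print(select(store(A, 0, 42), 0))   # read-over-write, same index
print(select(store(A, 0, 42), 1))   # read-over-write, different index
print(fun1_array_result(A, 5))
```

Because `select(x2, 1)` always equals `y`, both fragments end up applying sq to the same value, which is the reason φ″ is unsatisfiable.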
Eager Approach to SMT
[Figure: Input formula → satisfiability-preserving Boolean encoder → Boolean formula → SAT solver → satisfiable / unsatisfiable. In eager encoding, the SAT solver is involved in theory reasoning.]
Key ideas:
• Small-domain encoding
  – Constrain model search
• Rewrite rules
• Abstraction-based methods (eager + lazy)
Example solvers: UCLID, STP, Spear, Boolector, Beaver, …

Theories
• Eager encoding methods have been demonstrated for the following theories:
  – Equality & uninterpreted functions
  – Integer linear arithmetic
  – Restricted lambda expressions
    • Arrays, memories, etc.
  – Finite-precision bit-vector arithmetic
  – Strings

UCLID Operation
[Figure: Input formula → Lambda expansion for arrays → λ-free formula → Function & predicate elimination → Linear/bit-vector arithmetic formula → Encoding arithmetic → Boolean formula → Boolean satisfiability]
• Operation
  – Series of transformations leading to Boolean formula
  – Each step is validity (satisfiability) preserving
  – Each step performs optimizations
http://uclid.eecs.berkeley.edu

Rewrites: Eliminating Function Applications
• Two applications of an uninterpreted function f in a formula: f(x1) and f(x2)
• Ackermann's encoding
  – Replace f(x1) by vf1 and f(x2) by vf2
  – Add the constraint x1 = x2 ⇒ vf1 = vf2
• Bryant, German, Velev's encoding
  – Replace f(x1) by vf1
  – Replace f(x2) by ITE(x1 = x2, vf1, vf2)

Small-Domain Encoding
• Consider an SMT formula φ(x1, x2, …, xn) where xi ∈ Di
• Small-domain encoding / finite instantiation: derive finite set Si ⊂ Di s.t. |Si| ≪ |Di|
  – In some cases, Si is finite where Di is infinite
• Encode each xi to take values only in Si
  – Could be done by encoding to SAT
• Example: Integer Linear Arithmetic (QF_LIA)
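The two eager ideas above, function elimination and small-domain encoding, can be combined on the earlier QF_UF example (x = y) ∧ (y = z) ∧ (f(x) ≠ f(z)). The sketch below applies Ackermann's encoding by hand and then enumerates a small domain instead of calling a SAT solver. It relies on the equality-logic bound d = n that the tutorial cites later from Pnueli et al.; the names `ackermannized` and `small_domain_sat` are assumptions for illustration.

```python
from itertools import product


def ackermannized(x, y, z, vf_x, vf_z):
    """The example after Ackermann's encoding.

    f(x) and f(z) are replaced by fresh variables vf_x and vf_z, and the
    congruence constraint x = z => vf_x = vf_z is conjoined.
    """
    original = (x == y) and (y == z) and (vf_x != vf_z)
    congruence = (x != z) or (vf_x == vf_z)
    return original and congruence


def small_domain_sat(formula, num_vars):
    """Search assignments over {0, ..., num_vars - 1} only.

    For equality logic, n values are enough for n variables, so this
    finite search decides satisfiability of the encoded formula.
    """
    for values in product(range(num_vars), repeat=num_vars):
        if formula(*values):
            return values
    return None


print(small_domain_sat(ackermannized, 5))
```

The search returns `None`, matching the unsatisfiability shown by the transitivity and congruence argument. Dropping the congruence constraint would make the encoded formula satisfiable, which is why the encoding must add it.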
Solving QF_LIA is NP-complete
• In NP:
  – If a satisfying solution exists, then one exists within a bound d
    • log d is polynomial in input size
  – Expression for d [Papadimitriou, '82]:
    d = (n + m) · (b_max + 1) · (m · a_max)^(2m+3)
  – Input size:
    • m: # constraints
    • n: # variables
    • b_max: largest constant (absolute value)
    • a_max: largest coefficient (absolute value)

Small-domain encoding / Finite Instantiation: Naïve approach
• Steps
  – Calculate the solution bound d
  – Encode each integer variable with ⌈log d⌉ bits & translate to Boolean formula
  – Run SAT solver
• Problem: for QF_LIA, d is Ω(m^m)
  – Ω(m log m) bits per variable
• Solution: exploit special cases and domain-specific structure

Special Case 1: Equality Logic
• Linear constraints are equalities xi = xj
• Result: d = n
  – x1 ≠ x2 ∧ x2 ≠ x3 ∧ x1 ≠ x3: a 3-valued domain is needed: {1, 2, 3}
  – x1 = x2 ∧ x2 ≠ x3 ∧ x1 ≠ x3: can find a solution with domain {1, 2}
[Pnueli et al., Information and Computation, 2002]

Special Case 2: Difference Logic
• Boolean combination of difference-bound constraints
  – xi ≥ xj + b, ±xi ≥ b
• Result: d = n · (b_max + 1) [Bryant, Lahiri, Seshia, CAV '02]
• Proof sketch: satisfying solution corresponds to shortest path in constraint graph
  – Longest such path has length ≤ n · (b_max + 1)
• Tighter formula-specific bounds possible

Special Case 3: Generalized 2SAT
• Generalized 2SAT constraints
  – xi + xj ≥ b, -xi - xj ≥ b, xi - xj ≥ b, xi ≥ b
• d = 2 · n · (b_max + 1) [Seshia, Subramani, Bryant, '04]

Full Integer Linear Arithmetic
• Can we avoid the m^m blow-up?
• In fact, yes.
  The idea is to derive a new parameterized solution bound d
  – Formalize parameters that the bound really depends on
  – Parameters characterize sparse structure
    • Occurs especially in software verification; also in many high-level hardware models
  – [Seshia & Bryant, LICS '04, LMCS '05]

Structure of Linear Constraints in Software Verification
• Characteristics of studied benchmarks
  – Mostly difference constraints
    • Only 3% of constraints were NOT difference constraints
  – Non-difference constraints are sparse
    • At most 6 variables per constraint (total number of variables in 1000s)
• Some similar observations: Pratt '77, ESC/JavaSimplify-TR'03

Parameterized Solution Bound
• New parameters:
  – k non-difference constraints
  – w variables per constraint (width)
• Our solution bound: d = n · (b_max + 1) · (w · a_max)^k
• Previous: d = (n + m) · (b_max + 1) · (m · a_max)^(2m+3)
  – m: # constraints
  – n: # variables
  – b_max: max |constant|
  – a_max: max |coefficient|
• Direct dependence on m eliminated (and k ≪ m)

Example
[Figure: a Boolean combination (∧, ∨, ¬) of the three constraints below]
• Constraints: x1 - x2 ≥ 1, x1 + 2 x2 + x3 > -3, x2 - x4 ≥ 0
• Parameters:
  – m (# constraints) = 3
  – n (# variables) = 4
  – b_max (max |constant|) = 3
  – a_max (max |coefficient|) = 2
  – k (# non-difference constraints) = 1
  – w (width) = 3
• Previous d = (4 + 3) · (3 + 1) · (3 · 2)^9 = 282,175,488
• New d = 4 · (3 + 1) · (3 · 2)^1 = 96

Summary of d Values
  Logic                              Solution bound d
  Equality logic                     n
  Difference logic                   n · (b_max + 1)
  Generalized 2SAT logic             2 · n · (b_max + 1)
  Full integer linear arithmetic     n · (b_max + 1) · (a_max^k · w^k)

Abstraction-Based Methods
• For some logics, one cannot easily compute a closed-form expression for the small domain
• Example: bit-vector arithmetic
• In such cases, an abstraction-refinement approach can be used to compute formula-specific small domains

Bit-Vector Arithmetic: Some History
• B.C.
  (Before Chaff)
  – String operations (concatenate, field extraction)
  – Linear arithmetic with bounds checking
  – Modular arithmetic
• SAT-based "bit blasting"
  – Generate Boolean circuit based on bit-level behavior of operations
    • Handles arbitrary operations
  – Check with best available SAT solver
  – Effective in many applications
    • CBMC [Clarke, Kroening, Lerda, TACAS '04]
    • Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV '05]

Research Challenge
• Is there a better way than bit blasting?
• Requirements
  – Provide same functionality as with bit blasting
    • Must support all bit-vector operators
  – Exploit word-level structure
  – Improve on performance of bit blasting
• Current approaches are based on two core ideas:
  1. Simplification: simplify input formula using word-level rewrite rules and solvers
  2. Abstraction: can use automatic abstraction-refinement to solve simplified formula

Bit-Vector SMT Solvers, circa Spring 2009
Current techniques with sample tools:
– Proof-based abstraction-refinement: UCLID [Bryant et al., TACAS '07]
– Solver for linear modular arithmetic to simplify the formula: STP [Ganesh & Dill, CAV '07]
– Automatic parameter tuning for SAT: Spear [Hutter et al., FMCAD '07]
– Rewrites, underapproximation, efficient SAT engine: Boolector [Brummayer & Biere, TACAS '09]
– Equality/constant propagation, logic optimization, special rules for non-linear ops: Beaver [Jha et al., CAV '09]
– DPLL(T) framework, layered approach, rewriting: CVC3 [Barrett et al.], MathSAT [Bruttomesso et al.], Yices [Dutertre et al.], Z3 [de Moura et al.]
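As an illustration of the "bit blasting" idea described above (generating a Boolean circuit from the bit-level behavior of an operation), here is a minimal sketch of a ripple-carry adder expressed purely with Boolean gates. A real bit-blaster would emit these gates as clauses for a SAT solver; in this sketch they are simply evaluated in Python, and the helper names (`full_adder`, `blast_add`, `to_bits`, `from_bits`) are assumptions.

```python
def full_adder(a, b, carry_in):
    """One-bit adder built only from Boolean gates."""
    total = a ^ b ^ carry_in
    carry_out = (a and b) or (carry_in and (a ^ b))
    return total, carry_out


def blast_add(xs, ys):
    """Add two equal-width bit lists (least significant bit first).

    The final carry is dropped, giving addition modulo 2**width.
    """
    carry = False
    out = []
    for a, b in zip(xs, ys):
        bit, carry = full_adder(a, b, carry)
        out.append(bit)
    return out


def to_bits(value, width):
    return [bool((value >> i) & 1) for i in range(width)]


def from_bits(bits):
    return sum(1 << i for i, bit in enumerate(bits) if bit)


# Check the circuit against word-level addition for every 4-bit input pair.
WIDTH = 4
ok = all(
    from_bits(blast_add(to_bits(x, WIDTH), to_bits(y, WIDTH))) == (x + y) % (1 << WIDTH)
    for x in range(1 << WIDTH)
    for y in range(1 << WIDTH)
)
print(ok)
```

The circuit has one full adder per bit, so the encoding of addition grows linearly with the word width; operations such as multiplication produce much larger circuits, which motivates the word-level techniques listed above.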
Abstraction-Refinement
• Deciding Bit-Vector Arithmetic with Abstraction [Bryant et al., TACAS '07, STTT '09]
  – Use bit blasting as core technique
  – Apply to simplified versions of formula: under- and overapproximations
  – Generate successive approximations until a solution is found or formula shown unsatisfiable
  – Inspired by McMillan & Amla's proof-based abstraction for finite-state model checking
• Small motivating example: (x + y ≠ y + x) ∧ (x * y ≠ y * x)
  – Sufficient to prove the left-hand conjunct unsat

Approximations to Formula
• Overapproximation φ+
  – More solutions: if φ+ is unsatisfiable, then so is φ
• Original formula φ
• Underapproximation φ−
  – Fewer solutions: a satisfying solution of φ− also satisfies φ
• Example approximation techniques
  – Underapproximating
    • Restrict word-level variables to smaller ranges of values
  – Overapproximating
    • Replace subformula with Boolean variable

Starting Iterations
• Initial underapproximation φ1−
  – (Greatly) restrict ranges of word-level variables
  – Intuition: satisfiable formula often has small-domain solution

First Half of Iteration
• SAT result for φ1−
  – Satisfiable
    • Then have found solution for φ
  – Unsatisfiable
    • Use UNSAT proof to generate overapproximation φ1+

Second Half of Iteration
• SAT result for φ1+
  – Unsatisfiable: then have shown φ unsatisfiable
  – Satisfiable: solution indicates variable ranges that must be expanded
    • Generate refined underapproximation φ2−

Example
φ := (x = y + 2) ∧ (x² > y²)
• φ1− (defined below): UNSAT; look at the proof
• φ1+ := (x = y + 2): SAT with x = 2, y = 0
• φ2− := (x[2] = y[2] + 2) ∧ (x[2]² > y[2]²): SAT, done.
  where φ1− := (x[1] = y[1] + 2) ∧ (x[1]² > y[1]²)

Iterative Behavior
• Underapproximations φ1−, φ2−, …, φk−
  – Successively more precise abstractions of φ
  – Allow wider variable ranges
• Overapproximations φ1+, φ2+, …, φk+
  – No predictable relation
  – UNSAT proof not unique

Overall Effect
• Soundness
  – Only terminate with solution on underapproximation
  – Only terminate as UNSAT on overapproximation
• Completeness
  – Successive underapproximations approach φ
  – Finite variable ranges guarantee termination
    • In worst case, get φk− = φ

Roadmap for this Tutorial: Conclusion

Summary of Ideas: Modeling
• Philosophy: model systems in first-order logic + suitable theories
• Widely-used theories:
  – Equality and uninterpreted functions
  – Linear arithmetic
  – Bit-vector arithmetic
  – Arrays

Summary of Ideas: Lazy Methods
• Philosophy: extend DPLL framework from SAT to SMT
• Literals assigned by SAT are sent to Theory Solver
• Theory Solver determines if literals are satisfiable in the theory
• Key optimizations: small explanations, early conflict detection, theory propagation

Summary of Ideas: Eager Methods
• Philosophy: constrain solution space with logic-specific methods
• Small-domain encoding
  – Compute bounds that work for any formula in the logic
• Abstraction-refinement of domains
  – Compute formula-specific small domains
• Rewrite rules: high level and bit level
  – Simplify formula before and after bit-blasting
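The abstraction-refinement of domains summarized above can be sketched on the tutorial's example φ := (x = y + 2) ∧ (x² > y²). This is a deliberately simplified illustration, not the UCLID procedure: it keeps only the underapproximation half (restricting x and y to k-bit values and widening k), uses brute-force search in place of a SAT solver, and omits the proof-based overapproximation step entirely. The 4-bit word size, the modular semantics of squaring, and the names `phi` and `solve_by_widening` are assumptions.

```python
WIDTH = 4                    # assumed word size
MASK = (1 << WIDTH) - 1


def phi(x, y):
    # (x = y + 2) AND (x^2 > y^2), with assumed wrap-around arithmetic
    return x == ((y + 2) & MASK) and ((x * x) & MASK) > ((y * y) & MASK)


def solve_by_widening(formula, width=WIDTH):
    """Try underapproximations with 1, 2, ..., width-bit variable ranges.

    Returns (k, (x, y)) for the first range in which a solution exists,
    or None when even the full range has none (formula unsatisfiable).
    """
    for k in range(1, width + 1):
        limit = 1 << k
        for x in range(limit):
            for y in range(limit):
                if formula(x, y):
                    return k, (x, y)
    return None


print(solve_by_widening(phi))
```

With 1-bit ranges no solution exists, and with 2-bit ranges the search finds x = 2, y = 0, mirroring the UNSAT-then-SAT sequence of the slide example. In the worst case the loop reaches the full width, which corresponds to the completeness argument that the last underapproximation equals φ.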
Challenges and Opportunities
• Solvers for new theories
  – Strings
  – Non-linear arithmetic
  – Can we exploit domain-specific structure?
• Parallel SMT
• Better support for quantifiers
• Better proof/interpolant generation

Join the SMT Community
• We need your new, exciting applications!
• Contribute to SMT-LIB
• Create new solvers, compete in SMTCOMP

Slides and book chapter available on our websites:
• Clark: http://cs.nyu.edu/~barrett
• Sanjit: http://www.eecs.berkeley.edu/~sseshia