Decision Procedures Customized for Formal Verification
Randal E. Bryant
Carnegie Mellon University
http://www.cs.cmu.edu/~bryant
Contributions by former graduate students: Sanjit Seshia, Shuvendu Lahiri
CADE ‘05

Outline
  Context
    Infinite state models of hardware systems
    Verification techniques
  Needs
    Requirements for decision procedures
    Dealing with quantifiers
  Our Solution
    SAT-based procedure
    “Eager” Boolean encoding

Verification Example
  Task
    Verify that microprocessor correctly implements instruction set definition
    Even though heavily pipelined
  [Figure: Alpha 21264 Microprocessor. Microprocessor Report, Oct. 28, 1996]

Existing Hardware Verification Methods
  Simulators, equivalence checkers, model checkers, …
  All Operate at Bit Level
    View each register or memory bit as state variable
    Behavior of each state variable defined by Boolean function
  Strengths
    Finite-state systems conceptually simple
    BDDs & SAT procedures allow high degrees of automation
  Limitations
    State space can be very large
    Only verify fixed instantiation of system
      Specific memory sizes, number of processes, buffer lengths, …

Verification Challenges
  Sources of Complexity
    Lots of internal state
    Complex control logic
  Opportunities
    Most of the logic serves to store, select, and communicate data
  [Figure: Alpha 21264 Microprocessor, as above]

Applying Data Abstraction to Hardware Verification
  Idea
    Abstract details of data encodings and operations
    Keep control logic precise
  Applications
    Verify overall correctness of system
    Assuming individual functional units correct
  Advantages of Abstraction
    Abstract infinite-state system easier to verify than detailed finite-state one
    Parametric representation allows verification of many different system variants
      Arbitrary number of processes, buffer lengths, etc.

Word Abstraction
  [Figure: control logic and combinational logic blocks 1 and 2 over a data path]
  Data: Abstract details of form & functions
  Control: Keep at bit level
  Timing: Keep at cycle level

Data Abstraction #1: Bits → Terms
  [Figure: bits x0, x1, x2, …, xn-1 replaced by a single term x]
  View Data as Symbolic Words
    Arbitrary integers
      No assumptions about size or encoding
      Classic model for reasoning about software
    Can store in memories & registers

Abstracting Data Bits
  [Figure: control logic, combinational logic blocks 1 and 2 (each marked “?”), data path]
  What do we do about logic functions?

Abstraction #2: Uninterpreted Functions
  [Figure: ALU replaced by function f]
  For any Block that Transforms or Evaluates Data:
    Replace with generic, unspecified function
    Only assumed property is functional consistency:
      $a = x \land b = y \Rightarrow f(a, b) = f(x, y)$

Abstracting Functions
  [Figure: control logic, combinational logic blocks replaced by F1 and F2, data path]
  For Any Block that Transforms Data:
    Replace by uninterpreted function
    Ignore detailed functionality
    Conservative approximation of actual system

Abstraction #3: Modeling Memories as Mutable Functions
  Memory M Modeled as Function
    M(a): Value at location a
  Initially
    Arbitrary state
    Modeled by uninterpreted function m0

Effect of Memory Write Operation
  Writing Transforms Memory
    $M' = \mathrm{Write}(M, wa, wd)$
  Express with Lambda Notation
    $M' = \lambda a.\ \mathrm{ITE}(a = wa,\ wd,\ M(a))$
  [Figure: multiplexer selecting wd when a = wa, and M(a) otherwise]
  Reading from updated memory:
    Address wa will get wd
    Otherwise get what’s already in M

Systems with Buffers
  [Figure: a circular queue (locations 0 to Max-1) and an unbounded buffer, each with head and tail pointers delimiting the in-use region]
  Modeling Method
    Mutable function to describe buffer contents
    Integers to represent head & tail pointers
    Parameterize buffer capacity with symbolic value Max

Some History of Term-Level Modeling
  Historically
    Standard model used for program verification
      Unbounded integer data types
    Widely used with theorem-proving approaches to hardware verification
      E.g., Hunt ’85
  Automated Approaches to Hardware Verification
    Burch & Dill, ’95
      Tool for verifying pipelined microprocessors
      Implemented by form of symbolic simulation
    Continued application to pipelined processor verification

UCLID
  Seshia, Lahiri, Bryant, CAV ‘02
  Term-Level Verification System
    Language for describing systems
      Inspired by CMU SMV
    Symbolic simulator
      Generates integer expressions describing system state after sequence of steps
    Decision procedure
      Determines validity of formulas
    Support for multiple verification techniques
  Available by Download
    http://www.cs.cmu.edu/~uclid

Required Logic
  Scalar Data Types
    Formulas (F): Boolean expressions
      Control signals
    Terms (T): Integer expressions
      Data values
  Functional Data Types
    Functions (Fun): Integer → Integer
      Immutable: Functional units
      Mutable: Memories
    Predicates (P): Integer → Boolean
      Immutable: Data-dependent control
      Mutable: Bit-level memories

CLU Logic
  Counter Arithmetic, Lambda Expressions and Uninterpreted Functions
  Terms (T): Integer Expressions
    $\mathrm{ITE}(F, T_1, T_2)$                          If-then-else
    $\mathrm{Fun}(T_1, \ldots, T_k)$                     Function application
    $\mathrm{succ}(T)$                                   Increment
    $\mathrm{pred}(T)$                                   Decrement
  Formulas (F): Boolean Expressions
    $\neg F$, $F_1 \land F_2$, $F_1 \lor F_2$            Boolean connectives
    $T_1 = T_2$                                          Equation
    $T_1 < T_2$                                          Inequality
    $P(T_1, \ldots, T_k)$                                Predicate application
  To support pointer operations

CLU Logic (Cont.)
  Functions (Fun): Integer → Integer
    $f$                                                  Uninterpreted function symbol
    $\lambda x_1, \ldots, x_k .\ T$                      Function definition
  Predicates (P): Integer → Boolean
    $p$                                                  Uninterpreted predicate symbol
    $\lambda x_1, \ldots, x_k .\ F$                      Predicate definition

Outline
  Context
    Infinite state models of hardware systems
    Verification techniques
  Needs
    Requirements for decision procedures
    Dealing with quantifiers
  Our Solution
    SAT-based procedure
    “Eager” Boolean encoding

Verifying Safety Properties
  [Figure: state machine with present state, next state, arbitrary inputs, and reset; state space showing reset states, reachable states, and bad states]
  State Machine Model
    State encoded as Booleans, integers, and functions
    Next state function expresses how updated on each step
  Prove: System will never reach bad state

Bounded Model Checking
  [Figure: reset states and successive sets R1, R2, …, Rn; bad states]
  Repeatedly Perform Image Computations
    Set of all states reachable by one more state transition
  Underapproximation of Reachable State Set
    But, typically catch most bugs with 8–10 steps

Implementing BMC
  [Figure: system unrolled from Reset through inputs X1, X2, …, Xn to the Bad check; is the result satisfiable?]
  Construct verification condition formula for step n by symbolically simulating system for n cycles
  Check with decision procedure
  Do as many cycles as tractable

True Model Checking
  [Figure: reset states and successive sets R1, R2, …, Rn; bad states]
  Reach Fixed-Point
    $R_n = R_{n+1} = \mathrm{Reachable}$
  Impractical for Term-Level Models
    Many systems never reach fixed point
      Can keep adding elements to buffer
    Convergence test undecidable (Bryant, Lahiri, Seshia, CHARME ’03)

Inductive Invariant Checking
  [Figure: invariant I enclosing the reachable states and reset states, disjoint from the bad states]
  Key Properties of System that Make it Operate Correctly
    Formulate as formula I
  Prove Inductive
    Holds initially: $I(s_0)$
    Preserved by all state changes: $I(s) \Rightarrow I(\delta(i, s))$

Inductive Invariants
  Formulas $I_1, \ldots, I_n$
    $I_j(s_0)$ holds for any initial state $s_0$, for $1 \le j \le n$
    $I_1(s) \land I_2(s) \land \ldots \land I_n(s) \Rightarrow I_j(s')$ for any current state $s$ and successor state $s'$, for $1 \le j \le n$
  Overall Correctness
    Follows by induction on time
  Restricted form of invariants
    $\forall x_1 \forall x_2 \ldots \forall x_k\ \phi(x_1 \ldots x_k)$
      $\phi(x_1 \ldots x_k)$ is a CLU formula without quantifiers
      $x_1 \ldots x_k$ are integer variables free in $\phi(x_1 \ldots x_k)$
    Express properties that hold for all buffer indices, register IDs, etc.
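The two proof obligations for an inductive invariant (it holds initially, and every transition preserves it) can be sketched in code. The example below is a minimal illustration under stated assumptions, not UCLID's method: the buffer model, the operation names `enq`/`deq`/`nop`, and the helper `check_inductive` are all hypothetical, and the check enumerates a small bounded range of states, so it is a sanity test rather than the decision-procedure-based proof described in the slides.

```python
from itertools import product

def step(state, op):
    """Next-state function for an unbounded buffer tracked by (head, tail)."""
    head, tail = state
    if op == "enq":
        return (head, tail + 1)
    if op == "deq" and head < tail:
        return (head + 1, tail)
    return state  # "nop", or "deq" on an empty buffer

def invariant(state):
    """Candidate invariant: the head pointer never passes the tail pointer."""
    head, tail = state
    return head <= tail

def check_inductive(inv, init_states, ops, bound):
    """Check the base case, then the inductive step over a bounded state range."""
    base_ok = all(inv(s) for s in init_states)
    failures = []
    for s in product(range(bound), repeat=2):
        if not inv(s):
            continue  # the inductive step only assumes states satisfying inv
        for op in ops:
            t = step(s, op)
            if not inv(t):
                failures.append((s, op, t))
    return base_ok, failures

OPS = ("enq", "deq", "nop")
base_ok, failures = check_inductive(invariant, [(0, 0)], OPS, bound=6)
print(base_ok, failures)
```

A property such as "tail - head <= 3" passes the base case but is not preserved by `enq`, so the same helper reports a failing transition for it. This is the situation where an invariant must be strengthened, or the model constrained, before the induction goes through.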

Proving Invariants
  Proving invariants inductive requires quantifiers
    $\models [\forall x_1 \forall x_2 \ldots \forall x_k\ \phi(x_1 \ldots x_k)] \Rightarrow [\forall y_1 \forall y_2 \ldots \forall y_m\ \psi(y_1 \ldots y_m)]$
  Prove unsatisfiability of formula
    $\forall x_1 \forall x_2 \ldots \forall x_k\ \phi(x_1 \ldots x_k) \land \neg\psi(y_1 \ldots y_m)$
  Undecidable Problem
    In logic with uninterpreted functions and equality

Invariant Checking: Out-of-Order Processor Designs
  Design                 Total invariants   UCLID time   Person time
  base                   13                 54 s         2 days
  exc                    34                 236 s        7 days
  exc / br               39                 403 s        9 days
  exc / br / mem-simp    67                 1594 s       24 days
  exc / br / mem         71                 2200 s       34 days
  Generating invariants requires considerable human effort
  Impractical for realistic designs

Constructing Invariants from Predicates
  Predicates
    $\mathrm{rob.head} \le \mathrm{reg.tag}(r)$
    $\mathrm{reg.valid}(r)$
  Invariant
    $\forall r, t.\ \neg\mathrm{reg.valid}(r) \land \mathrm{reg.tag}(r) = t \Rightarrow (\mathrm{rob.head} \le \mathrm{reg.tag}(r) < \mathrm{rob.tail} \land \mathrm{rob.dest}(t) = r)$
  Result: Correctness
    $\mathrm{reg.tag}(r) = t \Rightarrow \mathrm{rob.dest}(t) = r$

Automatic Predicate Abstraction
  Graf & Saïdi, CAV ’97
  Idea
    Given set of predicates $P_1(s), \ldots, P_k(s)$
      Boolean formulas describing properties of system state
    View as abstraction mapping: States → $\{0,1\}^k$
    Defines abstract FSM over state set $\{0,1\}^k$
      Form of abstract interpretation
    Do reachability analysis similar to symbolic model checking
  Early Implementations Inefficient
    Guess at possible next abstract states
    Test with call to decision procedure

P.A. as Invariant Generator
  [Figure: the concrete system and its reset states are abstracted (A) to an abstract system; sets R1, R2, …, Rn are computed from the abstract reset states; the result is concretized (C) to an invariant I of the concrete system]
  Reach Fixed-Point on Abstract System
    Termination guaranteed, since finite state
  Equivalent to Computing Invariant for Concrete System
    Strongest possible invariant that can be expressed by formula over these predicates

Symbolic Formulation of Predicate Abstraction
  Lahiri, Bryant, Cook, CAV ‘03
  Basic Operation
    Compute set of legal abstract next states $\rho'(B')$ given current abstract states $\rho(B)$
      $B$, $B'$: Abstract current and next-state state variables
      $\rho$, $\rho'$: Boolean formulas
    Create formula of form $\Phi(S, B')$
      Possible combinations of current concrete state $S$ and next abstract state $B'$
  Formulate as Quantifier Elimination Problem
    Generate formula of form $\rho'(B') \Leftrightarrow \exists S\ \Phi(S, B')$
      $S$: Integer variables
    For interpretation of $B'$, formula true iff $\Phi(S, B')$ satisfiable

Outline
  Context
    Infinite state models of hardware systems
    Verification techniques
  Needs
    Requirements for decision procedures
    Dealing with quantifiers
  Our Solution
    SAT-based procedure
    “Eager” Boolean encoding

Decision Procedure Needs
  Bounded Model Checking
    Satisfiability of quantifier-free CLU formula
    Handled by decision procedure
  Invariant Checking
    Satisfiability of quantified CLU formula
    Undecidable
  Predicate Abstraction
    Eliminate quantifiers from CLU formula
  Role of Decision Procedure
    Apply in sound, but incomplete way

UCLID Decision Procedure
  Operation
    Series of transformations leading to propositional formula
    Except for lambda expansion, each has polynomial complexity
  Flow
    CLU Formula → Lambda Expansion → $\lambda$-free Formula → Function & Predicate Elimination → Term Formula → Finite Instantiation → Boolean Formula → Boolean Satisfiability

SAT-based Decision Procedures
  [Figure: EAGER ENCODING: input formula → satisfiability-preserving Boolean encoder → Boolean formula → SAT solver → satisfiable / unsatisfiable. LAZY ENCODING: input formula → approximate Boolean encoder → Boolean formula → SAT solver, with additional clauses fed back to the SAT solver]
  [Figure, continued (LAZY ENCODING): a satisfying assignment from the SAT solver goes, as first-order conjunctions, to a SAT checker; outcomes are satisfiable / unsatisfiable]

Eager Encoding Characteristics
  [Figure: input formula → satisfiability-preserving Boolean encoder → Boolean formula → SAT solver → satisfiable / unsatisfiable]
  (–) Must encode all information about domain properties into Boolean formula
  (–) Some properties can give exponential blowup
  (+) Lets SAT solver do all of the work
  Good Approach for Some Domains
    Modern SAT solvers have remarkable capacity
    Good at extracting relevant portions out of very large formulas
    Learns about formula properties as search proceeds

Encoding Methods
  [Figure: difference logic formula → Small Domain Encoding (SD) or Per-Constraint Encoding (PC) → Boolean formula → SAT solver → satisfiable/unsatisfiable]

Small Domain Encoding (SD)
  [Bryant, Lahiri, Seshia, CAV’02]
  Example
    $x \ge y \;\land\; y \ge z \;\land\; z \ge x+1$
    $0x_1x_0 \ge 0y_1y_0 \;\land\; 0y_1y_0 \ge 0z_1z_0 \;\land\; 0z_1z_0 \ge 0x_1x_0 + 1$
  Observation: To check satisfiability, need to consider all possible relative orderings of finitely-many expressions
  [Figure: example relative orderings of x, x+1, y, and z along an axis of increasing values]
  Can use Boolean encoding of finite range of values
    4 values in this case, so 2-bit encoding

Per-Constraint Encoding (PC)
  [Strichman, Seshia, Bryant, CAV’02]
  Example
    $x \ge y \;\land\; y \ge z \;\land\; z \ge x+1$
    $e_1$: $x \ge y$,  $e_2$: $y \ge z$,  $e_3$: $z \ge x+1$
  Overall Boolean Encoding
    $e_1 \land e_2 \land e_3$
  New Difference Predicate
    $e_4$: $x \ge z$
  Transitivity Constraints
    $e_1 \land e_2 \Rightarrow e_4$
    $e_4 \Rightarrow \neg e_3$

Size of Boolean Encoding: SD better than PC
  Let N be size of original difference logic formula
    Size of a directed acyclic graph representation
  SD encoding size is worst-case $O(N^2)$
  PC encoding size is worst-case $O(2^N)$
    Can generate $O(2^N)$ transitivity constraints
  Example: N = 6813
    Method   Boolean Encoding Size
    PC       > 1000000
    SD       54465

Impact on SAT problem: SD vs PC
  Experimentally compared zChaff performance on SD and PC encodings of several unsatisfiable formulas
  Sample result:
    Method   # Boolean variables   # CNF Clauses   # Conflict Clauses   zChaff Time (sec)
    PC       57211                 169387          150                  0.56
    SD       23112                 67699           15811                21.63
  PC better than SD for zChaff

How to Choose Encoding
  Hybrid Strategy
    Partition variables into classes
      Which ones are compared to each other
    For each class, choose encoding method
      PC except SD when PC blows up
  How to Determine Whether PC Will Work
    Try to predict based on formula characteristics
      Number of constraints, density, …
    Selection procedure trained by machine learning

Some Lessons We’ve Learned About Decision Procedures
  Preserve Boolean Structure
    Other approaches require collapsing to conjunctions of predicates (or extracting them dynamically)
  Exploit Problem Characteristics
    Sparseness
    Polarity structure
  Let SAT Solver Do the Work
    Eager encoding: provide sufficient set of constraints to prove / disprove formula
    SAT solvers are good at digesting large volumes of information

Invariant Checking Revisited
  Prove Unsatisfiability of Formula
    $\forall x_1 \forall x_2 \ldots \forall x_k\ \phi(x_1 \ldots x_k) \land \neg\psi(y_1 \ldots y_m)$
    General Form: $\forall X\ \phi(X) \land \neg\psi(Y)$
  Quantifier Instantiation
    Generate expressions $E_1(Y), \ldots, E_n(Y)$
      Using terms that appear in Q
    Expand as $\phi(E_1(Y)) \land \ldots \land \phi(E_n(Y)) \land \neg\psi(Y)$
    If unsatisfiable, then so is quantified formula
    Sound, but incomplete
  Trade-off
    Be clever about instantiation, or
    Instantiate many terms and rely on decision procedure capacity

Predicate Abstraction Revisited
  Formulate as Quantifier Elimination Problem
    Generate formula of form $\rho'(B') \Leftrightarrow \exists S\ \Phi(S, B')$
      $S$: Integer variables
  Use Eager SAT Encoding of $\Phi$
    Get formula $\exists A\ P(A, B')$
      $A$: Boolean variables
    Satisfying solutions for $P$ w.r.t. $B'$ same as those for $\Phi$
    Core problem of symbolic model checking

Quantifier Elimination for P.A.
  Formula $\exists A\ P(A, B')$
    $A$: Boolean variables
    Typically: 200+ variables for $A$, ~20 for $B'$
  BDD-Based
    Use partitioning techniques developed for symbolic model checking
    Typically too many total Boolean variables
  SAT Enumeration
    Find satisfying solution $\alpha(A)$, $\beta(B')$ to $P$
    Enumerate solution $\beta(B')$
    Reformulate $P$ as $P \land \neg\beta(B')$
    Performance: about 1000 solutions / second

Why Verification Tasks Feasible
  CLU Logic Fairly Simple
    Equality, uninterpreted functions, difference constraints
    Small model property
  “Deep” Reasoning Not Required
    Formulas large and messy, but straightforward
    Verifying systems that are designed to have constrained behaviors
    Only checking effect of a few cycles of system operation

Decision Procedures Revisited
  SAT-Based Approaches Effective
    Good performance as decision procedures
    Key to implementing predicate abstraction
      Quantifier elimination
  Eager Encoding Gives Good Performance
    Avoids many iterations of theory-specific checkers
    Extends to linear integer arithmetic
      Seshia & Bryant, LICS ‘04
      Quantifier-free Presburger
      Small domain encoding exploiting sparseness

Areas of Research
  Bit-Vector Decision Procedures
    True model for hardware & low-level software
      Bit-field extraction
      Bit-wise Boolean operations
      Overflow effects
    Automatically apply abstractions
      Abstract to symbolic terms whenever possible
  Boolean Quantifier Elimination
    SAT enumeration still not good enough
      Limits predicate abstraction to ~25 predicates
    Core problem for symbolic model checking

More Research
  Proof Generation
    Hard to see how to generate unsatisfiability proof for CLU formula
  Debugging Support
    Bounded model checking: provide counterexample trace
    Invariant checking: hard to determine why invariant fails
      And may be due to weakness in quantifier instantiation
    Predicate abstraction: Gets nowhere without right set of predicates
  Proving Liveness
    Current abstractions do not preserve liveness properties
    Can help in proving progress invariant
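The SAT enumeration scheme for Boolean quantifier elimination described earlier (find a satisfying solution, keep only its assignment to the abstract-state variables, block that assignment, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions: `find_solution` is a brute-force stand-in for a SAT solver call, and `P`, `eliminate_exists`, and the variable counts are hypothetical names and values chosen for the example, not part of UCLID.

```python
from itertools import product

def find_solution(P, n_a, n_b, blocked):
    """Stand-in for a SAT solver: return some (a, b) with P(a, b) true
    and b not yet blocked, or None if no such solution exists."""
    for b in product((False, True), repeat=n_b):
        if b in blocked:
            continue
        for a in product((False, True), repeat=n_a):
            if P(a, b):
                return a, b
    return None

def eliminate_exists(P, n_a, n_b):
    """Enumerate every assignment b for which some a satisfies P(a, b)."""
    blocked = set()
    while True:
        solution = find_solution(P, n_a, n_b, blocked)
        if solution is None:
            return blocked
        _, b = solution
        blocked.add(b)  # corresponds to reformulating P as P AND NOT (B = b)

def P(a, b):
    """Toy formula over two quantified variables a and two free variables b."""
    return (a[0] or a[1]) and b[0] == a[0] and b[1] == (a[0] and a[1])

solutions = eliminate_exists(P, n_a=2, n_b=2)
print(sorted(solutions))
```

Each pass through the loop costs one solver call and yields one new assignment to the free variables, so the number of calls grows with the number of distinct solutions, which can be exponential in the number of predicates. That growth is one reason enumeration limits how many predicates are practical.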

Questions?