Formal Verification of Infinite-State Systems Using Boolean Methods
  Randal E. Bryant
  Carnegie Mellon University
  http://www.cs.cmu.edu/~bryant
  Contributions by former graduate students: Sanjit Seshia, Shuvendu Lahiri

Outline
  Task
    Formally verify abstract models of hardware and software systems
    Build on success in verifying finite models
  Infinite-State Models
    Need logic that is suitably expressive, yet remains reasonably tractable
  Verification Techniques
    Solve problems by mapping into propositional logic
    Proof engines can use powerful Boolean methods
    Different levels of automation and capacity

Theoretically Infinite-State Systems
  Systems with unbounded buffers
    Even though we can't really build one
  [Figure: unbounded buffer with head and tail pointers marking the region in use]

Arbitrarily Large Finite-State Systems
  Synchronization protocol that should work for arbitrary number of processes
    Verify for arbitrary N
    [Figure: processes P1, P2, ..., PN]
  Circular buffer with fixed, but arbitrary capacity
    Verify for arbitrary value of Max
    [Figure: circular buffer indexed 0 to Max-1, with head and tail pointers marking the region in use]

Existing Automatic Verification Methods
  Simulators, model checkers, …
  All Operate at Bit Level
    State model
      State encoded as words and arrays of words
      Composed of bits
    Must track how each bit of state gets updated
  Only Verify Single Instance of Design
    Fixed values for parameters
      Word size
      Buffer sizes
      Number of processes

What About Theorem Provers?
  Traditional Tool for Formal Verification
    Allow many forms of abstraction
  Hard to Use
    Lots of manual effort & expertise required
  Question:
    Can we incorporate some of these abstraction abilities into an automated tool?

Data Abstraction #1: Bits → Integers
  [Figure: bits x0, x1, x2, ..., xn-1 replaced by a single word x]
  View Data as Symbolic Words
    Arbitrary integers
      No assumptions about size or encoding
      Classic model for reasoning about software
    Can store in memories & registers

Abstracting Data Bits
  [Figure: control logic and data path containing combinational logic blocks "Com. Log. 1" and "Com. Log. 2"]
  What do we do about logic functions?
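
For reference, the circular buffer with fixed but arbitrary capacity mentioned earlier can be sketched in Python. This is only an illustration under stated assumptions: the names `step`, `reachable`, and `invariant`, and the explicit `count` field, are hypothetical and do not come from the slides. The sketch checks a head/tail invariant by explicit enumeration for one chosen capacity, which is the "single instance of the design" limitation the slides attribute to bit-level methods.

```python
def step(state, op, max_size):
    """Apply one enqueue ("enq") or dequeue ("deq") to (head, tail, count)."""
    head, tail, count = state
    if op == "enq" and count < max_size:
        return (head, (tail + 1) % max_size, count + 1)
    if op == "deq" and count > 0:
        return ((head + 1) % max_size, tail, count - 1)
    return state  # operation not enabled: state unchanged


def reachable(max_size):
    """Explicitly enumerate all states reachable from the empty buffer."""
    start = (0, 0, 0)
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for op in ("enq", "deq"):
            nxt = step(current, op, max_size)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def invariant(state, max_size):
    """Tail is always head plus the number of stored items, modulo capacity."""
    head, tail, count = state
    return 0 <= count <= max_size and (head + count) % max_size == tail


# The check below covers Max = 4 only; it says nothing about other capacities.
print(all(invariant(s, 4) for s in reachable(4)))
```

Each call to `reachable` fixes `max_size`, so covering "arbitrary Max" this way would need one run per capacity. The term-level approach in the following slides instead treats the pointers as integers and the buffer contents as a function.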
Abstraction #2: Uninterpreted Functions
  [Figure: ALU block replaced by generic function f]
  For any Block that Transforms or Evaluates Data:
    Replace with generic, unspecified function
    Only assumed property is functional consistency:
      a = x ∧ b = y ⇒ f(a, b) = f(x, y)

Abstracting Functions
  [Figure: control logic and data path, with combinational logic blocks replaced by F1 and F2]
  For Any Block that Transforms Data:
    Replace by uninterpreted function
    Ignore detailed functionality
    Conservative approximation of actual system

Modeling Data-Dependent Control
  [Figure: branch logic takes Adata, Bdata, and Cond and produces "Branch?"; modeled by predicate p]
  Model by Uninterpreted Predicate
    Yields arbitrary Boolean value for each control + data combination
    Produces same result when arguments match

Abstraction #3: Modeling Memories as Mutable Functions
  Memory M Modeled as Function
    M(a): Value at location a
  Initially
    Arbitrary state
    Modeled by uninterpreted function m0

Effect of Memory Write Operation
  Writing Transforms Memory
    M′ = Write(M, wa, wd)
    [Figure: multiplexor selects wd when a = wa, and M(a) otherwise]
  Reading from updated memory M′(a):
    Address wa will get wd
    Otherwise get what's already in M

Systems with Buffers
  [Figure: circular queue indexed 0 to Max-1 and unbounded buffer, each with head and tail pointers marking the region in use]
  Modeling Method
    Mutable function to describe buffer contents
    Integers to represent head & tail pointers

UCLID
  Seshia, Lahiri, Bryant, CAV '02
  Term-Level Verification System
    Language for describing systems
      Inspired by CMU SMV
    Symbolic simulator
      Generates integer expressions describing system state after sequence of steps
    Decision procedure
      Determines validity of formulas
    Support for multiple verification techniques
  Available by Download
    http://www.cs.cmu.edu/~uclid

System Model
  [Figure: present state, reset, and inputs (arbitrary) determine next state]
  State Variable Types
    Boolean    Control signals
    Integer    Data, addresses
    Function   Memories, buffers
  System Operation
    Synchronous
      All state variables updated on each step of operation
    Interleaving
      One (set of) state variable(s) updated at a time
      Simulate in synchronous model with uninterpreted scheduling function

Modeling Example
  DLX Pipeline
    Single-issue, 5-stage pipeline
  [Figure: pipeline stages Fetch, Decode, Execute, Memory, Write Back with pipeline registers fd, de, em, mw; state elements pc, pPC, RF, Mem; register fields include Instr, PC, Type, Arg1, Arg2, Target, Value, Data, Dest, Branch, Valid; legend: Boolean state, Integer state, Function state]

Writing & Reading Register File
  [Figure: Write Back stage (mw: Data, Dest, Valid) writes RF; Decode stage reads RF through src1 and src2 of fd Instr into de Arg1 and Arg2]

Writing Register File
    init[RF] := rf0; (* Uninterpreted Function *)
    next[RF] := Lambda(a) .
      case
        mw_Valid & (a = mw_Dest) : mw_Data;
        default : RF(a);
      esac;

Reading Register File
    init[de_Arg1] := dea10; (* Initially arbitrary *)
    next[de_Arg1] := next[RF](src1(fd_Instr));
    init[de_Arg2] := dea20; (* Initially arbitrary *)
    next[de_Arg2] := next[RF](src2(fd_Instr));
  Write-before-read semantics

Underlying Logic
  Scalar Data Types
    Formulas (F)       Boolean Expressions
      Control signals
    Terms (T)          Integer Expressions
      Data values
  Functional Data Types
    Functions (Fun)    Integer → Integer
      Immutable: Functional units
      Mutable: Memories
    Predicates (P)     Integer → Boolean
      Immutable: Data-dependent control
      Mutable: Bit-level memories

CLU Logic
  Counter Arithmetic, Lambda Expressions and Uninterpreted Functions
  Terms (T)                 Integer Expressions
    ITE(F, T1, T2)            If-then-else
    Fun(T1, …, Tk)            Function application
    succ(T)                   Increment
    pred(T)                   Decrement
  Formulas (F)              Boolean Expressions
    ¬F, F1 ∧ F2, F1 ∨ F2      Boolean connectives
    T1 = T2                   Equation
    T1 < T2                   Inequality
    P(T1, …, Tk)              Predicate application
  Counter arithmetic (succ, pred, <) is there to support pointer operations

CLU Logic (Cont.)
  Functions (Fun)           Integer → Integer
    f                         Uninterpreted function symbol
    λ x1, …, xk . T           Function definition
  Predicates (P)            Integer → Boolean
    p                         Uninterpreted predicate symbol
    λ x1, …, xk . F           Predicate definition

Decision Problem
  Circuit Representation of Formula
    Truth Values
      Dashed lines
      Model control
      Logical connectives
      Equations
    Integer Values
      Solid lines
      Model data
      Uninterpreted functions
      If-Then-Else operation
  [Figure: circuit for an example formula over e0, e1, x0, d0, built from f blocks, T/F multiplexors, and = comparators]
  Task
    Determine whether formula F is universally valid
      True for all interpretations of variables and function symbols
      Often expressed as (un)satisfiability problem
        Prove that formula ¬F is not satisfiable

Finite Model Property
  [Figure: same circuit, with the terms x0, d0, f(x0), f(d0) labeled]
  Observation
    Any formula has limited number of distinct expressions
    Only property that matters is whether or not different terms are equal

Boolean Encoding of Integer Values
    Expression   Possible Values   Bit Encoding
    x0           {0}               0    0
    d0           {0,1}             0    b10
    f(x0)        {0,1,2}           b21  b20
    f(d0)        {0,1,2,3}         b31  b30
  For Each Expression
    Either equal to or distinct from each preceding expression
  Boolean Encoding
    Use Boolean values to encode integers over small range
    CLU formula can be translated into propositional logic
      Logic circuit with multiplexors, comparators, logic gates
      Tautology iff original formula valid

Recent Progress in SAT Solving
  [Figure: bar chart of run-time (sec.), on an axis from 0 to 3600, for SAT solvers released between 2000 and 2005, including Grasp, zChaff, BerkMin, and Siege]

Verifying Safety Properties
  [Figure: state machine with present state, next state, reset, and inputs (arbitrary); state space showing reset states, reachable states, and bad states]
  Prove: System will never reach bad state

Bounded Model Checking
  [Figure: sets R1, R2, ..., Rn growing outward from the reset states toward the bad states]
  Repeatedly Perform Image Computations
    Set of all states reachable by one more state transition
  Easy to Implement
  Underapproximation of Reachable State Set
    But, typically catch most bugs with 8–10 steps

Implementing BMC
  [Figure: system unrolled n times from reset state S with inputs X1, X2, ..., Xn; is Bad satisfiable?]
  Construct verification condition formula for step n by symbolically simulating system for n cycles
  Check with decision procedure
  Do as many cycles as tractable

True Model Checking
  [Figure: sets R1, R2, ..., Rn growing outward from the reset states]
  Reach Fixed-Point
    Rn = Rn+1 = Reachable
  Impractical for Term-Level Models
    Many systems never reach fixed point
      Can keep adding elements to buffer
    Convergence test undecidable

Inductive Invariant Checking
  [Figure: invariant I encloses the reachable states and reset states and excludes the bad states]
  Key Properties of System that Make it Operate Correctly
    Formulate as formula I
  Prove Inductive
    Holds initially: I(s0)
    Preserved by all state changes: I(s) ⇒ I(δ(i, s))

An Out-of-order Processor (OOO)
  [Figure: PC with incr, program memory, decode, dispatch, register rename unit (valid, tag, val), reorder buffer with head and tail, ALU execute, result bus, retire]
  Reorder Buffer Fields
    Result: valid, value
    1st Operand: src1valid, src1val, src1tag
    2nd Operand: src2valid, src2val, src2tag
    dest, op
  Data Dependencies Resolved by Register Renaming
    Map register ID to instruction in reorder buffer that will generate register value
  Inorder Retirement Managed by Retirement Buffer
    FIFO buffer keeping pending instructions in program order

Verifying OOO
  Lahiri, Seshia, & Bryant, FMCAD 2002
  Goal
    Show that OOO implements Instruction Set Architecture (ISA) model
    For all possible execution sequences
  Challenge
    OOO holds partially executed instructions in reorder buffer
    States of two systems match only when reorder buffer flushed
  [Figure: ISA (Reg. File, PC) alongside OOO (Reg. File, PC, Reorder Buffer)]

Adding Shadow State
  McMillan, '98; Arons & Pnueli, '99
  Provides Link Between ISA & OOO Models
  Additional info. in ROB
    Does not affect OOO behavior
    Generated when instruction dispatched
    Predicts values of operands and result
      From ISA model
  [Figure: ISA (Reg. File, PC) alongside OOO (Reg. File, PC, Reorder Buffer)]

Invariant Checking
  Formulas I1, …, In
    Ij(s0) holds for any initial state s0, for 1 ≤ j ≤ n
    I1(s) ∧ I2(s) ∧ … ∧ In(s) ⇒ Ij(s′) for any current state s and successor state s′, for 1 ≤ j ≤ n
  Invariants for OOO (13)
    Refinement maps (2)
      Show relation between ISA and OOO models
    Shadow state (3)
      Shadow values correctly predict OOO values
    State consistency (8)
      Properties of OOO state that ensure proper operation
  Overall Correctness
    Follows by induction on time

State Consistency Invariant Examples
  Register Renaming invariants (2)
    Any mapped register should be in the ROB, and the destination register should match
      ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
    For any ROB entry, the destination should have reg.valid as false, and the tag should point to this or a later instruction
      ∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)

Extending the OOO Processor
  base
    Executes ALU instructions only
  exc
    Handles arithmetic exceptions
    Must flush reorder buffer
  exc/br
    Handles branches
    Predicts branch & speculatively executes along path
  exc/br/mem-simp
    Adds load & store instructions
    Store commits as instruction retires
  exc/br/mem
    Stores held in buffer
    Can commit later
    Loads must scan buffer for matching addresses

Comparative Verification Effort
                       base     exc      exc/br   exc/br/mem-simp   exc/br/mem
    Total Invariants   13       34       39       67                71
    UCLID time         54 s     236 s    403 s    1594 s            2200 s
    Person time        2 days   7 days   9 days   24 days           34 days
  (Person time shown cumulatively)

"I Just Want a Loaf of Bread"
  [Figure: Ingredients → Recipe → Result]

Cooking with Invariants
  Ingredients: Predicates
    rob.head ≤ reg.tag(r)
    reg.valid(r)
    reg.tag(r) = t
    rob.dest(t) = r
  Recipe: Invariants
    ∀r,t. ¬reg.valid(r) ∧ reg.tag(r) = t ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(t) = r)
  Result: Correctness

Automatic Recipe Generation
  [Figure: Ingredients → Recipe Creator → Result]
  Want Something More
    Given any set of ingredients
    Generate best recipe possible

Automatic Predicate Abstraction
  Graf & Saïdi, CAV '97
  Idea
    Given set of predicates P1(s), …, Pk(s)
      Boolean formulas describing properties of system state
    View as abstraction mapping: States → {0,1}^k
    Defines abstract FSM over state set {0,1}^k
      Form of abstract interpretation
      Do reachability analysis similar to symbolic model checking
  Implementation
    Early ones had weak inference capabilities
      Call theorem prover or decision procedure to test each potential transition
    Recent ones make better use of symbolic encodings

Abstract State Space
  [Figure: the abstraction function maps concrete state s to an abstract state via P1(s), …, Pk(s); the concretization function maps abstract state t to a set of concrete states]

Abstract State Machine
  [Figure: an abstract transition from s to t in the abstract system corresponds, through abstraction and concretization, to a concrete transition in the concrete system]
  Transitions in abstract system mirror those in concrete

Generating Concrete Invariant
  [Figure: abstract sets R1, R2, ..., Rn grow from the reset states to fixed point A in the abstract system; concretizing A gives C, the invariant I for the concrete system]
  Reach Fixed-Point on Abstract System
    Termination guaranteed, since finite state
  Equivalent to Computing Invariant for Concrete System
    Strongest possible invariant that can be expressed by formula over these predicates

Quantified Invariant Generation
  (Lahiri & Bryant, VMCAI 2004)
  User supplies predicates containing free variables
  Generate globally quantified invariant
  Example
    Predicates
      p1: reg.valid(r)
      p2: rob.dest(t) = r
      p3: reg.tag(r) = t
    Abstract state satisfying (p1 ∧ p2 ∧ p3) corresponds to concrete state satisfying
      ∀r,t [reg.valid(r) ∧ reg.tag(r) = t ∧ rob.dest(t) = r]
    rather than
      ∀r [reg.valid(r)] ∧ ∀r,t [reg.tag(r) = t] ∧ ∀r,t [rob.dest(t) = r]

Systems Verified with Predicate Abstraction
    Model                                    Predicates   Iterations   CPU Time
    Out-Of-Order Execution Unit              25           9            1,207 s
    German's Cache Protocol                  13           9            14 s
    German's Protocol, unbounded channels    24           17           427 s
    Lamport's Bakery Algorithm               33           18           471 s
  Safety properties only

Future Prospects
  Evaluation
    Demonstrated ability to verify complex, parameterized systems
  Predicate Abstraction Shows Promise
    Provides key automation advantage of model checking
  Successful Application to Program Application
    Qadeer & Lahiri, POPL '06
    Generate loop invariants for list manipulation programs

Automatic Predicate Discovery
  Strength of Predicate Abstraction
    If given the right set of predicates, PA will put them together into an invariant
  Weakness
    Gets nowhere without right set of predicates
    Typical failure mode: Generate "true" as invariant
  Challenges
    Too many predicates will overwhelm PA engine
    Our use of quantified invariants precludes counterexample-generated refinement techniques

Implementation of Predicate Discovery
  Lahiri & Bryant, CAV '04
  Initially: Extract predicates from verification condition
  Iterate: Add new predicates by composing next-state formulas
    With some heuristics thrown in
  Experience
    Can automatically generate invariants for real examples
    ~10X slower than for hand-selected predicates
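
The predicate abstraction loop described in the slides (abstract the states through a set of predicates, reach a fixed point on the abstract system, then concretize) can be sketched in Python on a toy example. Everything concrete here is an assumption made for illustration: the mod-16 counter, the two predicates, and the names `alpha`, `abstract_successors`, `abstract_reachable`, and `invariant` are hypothetical. The sketch also computes abstract transitions by enumerating a finite concrete state space, whereas a term-level tool would query a decision procedure instead.

```python
# Toy concrete system (hypothetical): a counter modulo 16 that steps by 2.
STATES = range(16)
INIT = {0}


def successors(x):
    """Concrete transition relation: one successor per state."""
    return {(x + 2) % 16}


# Predicates P1(s), P2(s) chosen by the user (illustrative choices).
PREDICATES = [
    lambda x: x % 2 == 0,  # P1: x is even
    lambda x: x < 8,       # P2: x is in the lower half
]


def alpha(x):
    """Abstraction function: map a concrete state to a tuple of predicate values."""
    return tuple(p(x) for p in PREDICATES)


def abstract_successors(a):
    """Abstract state b follows a if some concrete s in a steps to some t in b."""
    return {alpha(t) for s in STATES if alpha(s) == a for t in successors(s)}


def abstract_reachable():
    """Fixed-point reachability on the finite abstract system."""
    reached = {alpha(s) for s in INIT}
    frontier = list(reached)
    while frontier:
        a = frontier.pop()
        for b in abstract_successors(a):
            if b not in reached:
                reached.add(b)
                frontier.append(b)
    return reached


def invariant(x, reached):
    """Concretization: x satisfies the invariant if its abstraction was reached."""
    return alpha(x) in reached


reached = abstract_reachable()
print(sorted(reached))
print([x for x in STATES if invariant(x, reached)])
```

In this toy case the concretized invariant is "x is even", which is expressible over the chosen predicates. With a poorer choice, such as keeping only the second predicate, every abstract state becomes reachable and the invariant degenerates to "true", the failure mode the slides mention.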