Verification of parameterised systems
Automatic Predicate Abstraction of C Programs
Shilpa Seshadri
Universität Paderborn
Prof. Dr. Heike Wehrheim, Daniel Wonisch, Nils Timm, Steffen Ziegert

Agenda
- Motivation
- Introduction
- C2BP Algorithm
- SLAM Toolkit
- Future Work
- Conclusion
- Discussion

Automatic Predicate Abstraction of C Programs April 7, 2015

Motivation
- Model checking is a verification technique for finite state systems, widely used for validation and debugging
- State-space explosion sometimes limits the use of tools, hence model checkers operate on abstractions of systems
- Software systems are typically infinite state systems, so abstraction is critical
- Predicate abstraction of programs is one implemented approach
- Model checking software: a finite-state check of an abstraction of the software system

Model Checking
- Algorithmic exploration of the state space of the system
- Several advances in the past decade:
  - symbolic model checking
  - symmetry reductions
  - partial order reductions
  - compositional model checking
  - bounded model checking using SAT solvers
- Most hardware companies use a model checker in the validation cycle

Abstraction
[Figure: a program (infinite state) is abstracted into the model checker input (finite state).]
Example program:
    void add(Object o) {
        buffer[head] = o;
        head = (head+1)%size;
    }
    Object take() {
        …
        tail=(tail+1)%size;
        return buffer[tail];
    }

Abstraction (A simplified view)
- Abstraction is an effective tool in verification
- Given a transition system, we want to generate an abstract transition system that is easier to analyze
- However, we want to make sure that if a property holds in the abstract transition system, it also holds in the original (concrete) transition system

Abstraction (A simplified view)
If the property does not hold in the abstract transition system, what can we do?
- We can refine the abstract transition system (split some states that we merged)
- The refined transition system should still be an abstraction of the concrete transition system
- Then we can recheck the property on the refined transition system
- If the property still does not hold, we can refine again

Abstraction Refinement Loop
[Figure: abstraction refinement loop. Labels: actual program, initial abstraction, boolean program, model checker, verification, no error or bug found, spurious counterexample, abstraction refinement.]

Predicate Abstraction
- An automated abstraction technique that can be used to reduce the state space of a program
- The basic idea is to remove some variables from the program and keep only information about a set of predicates over them
- Predicate abstraction is a technique for doing such abstractions automatically

A Very Simple Example
- Assume that we have two integer variables x, y
- We want to abstract the program using a single predicate "x=y"
- We will divide the states of the program into two sets:
  - The states where "x=y" is true
  - The states where "x=y" is false, i.e., "x≠y"
- We will then merge all the states in the same set
- This is an abstraction: basically, we forget everything except the value of the predicate "x=y"

A Very Simple Example
- We will represent the predicate "x=y" as the boolean variable B in the abstract program
- "B=true" will mean "x=y" and "B=false" will mean "x≠y"
- Assume that we want to abstract the following program, which contains only one statement:
    y := y+1

Predicate Abstraction, Step 1
Calculate preconditions based on the predicate
- {x = y + 1} y := y + 1 {x = y}: precondition for B being true after executing the statement y := y+1
- {x ≠ y + 1} y := y + 1 {x ≠ y}: precondition for B being false after executing the statement y := y+1
Using our temporal logic notation we can say something like: {x=y+1} AX{x=y}
Again, using our temporal logic notation: {x≠y+1} AX{x≠y}

Predicate Abstraction, Step 2
Use decision procedures to determine if the predicates used for abstraction imply any of the preconditions
- x = y ⇒ x = y + 1 ? No
- x ≠ y ⇒ x = y + 1 ? No
- x = y ⇒ x ≠ y + 1 ? Yes
- x ≠ y ⇒ x ≠ y + 1 ? No

Predicate Abstraction, Step 3
Generate abstract code
Predicate abstraction wrt the predicate "x=y"
Concrete statement:
    y := y + 1
1) Compute preconditions
    {x = y + 1} y := y + 1 {x = y}
    {x ≠ y + 1} y := y + 1 {x ≠ y}
3) Generate abstract code
    IF B THEN B := false
    ELSE B := true | false
2) Check implications
    x = y ⇒ x = y + 1 ? No
    x ≠ y ⇒ x = y + 1 ? No
    x = y ⇒ x ≠ y + 1 ? Yes
    x ≠ y ⇒ x ≠ y + 1 ?
No

Automatic Predicate Abstraction
- First proposed by Graf & Saidi and reflected in T. Ball's work
- Concrete states are mapped to abstract states under a finite set of predicates
- Designed and implemented for:
  - Finite state systems
  - Infinite state systems specified as guarded commands
- Not implemented for a programming language such as C

Predicate Abstraction of C (c2bp)
- Performs automatic predicate abstraction of C programs
- Input: a C program P and a set of predicates E
  - predicate = pure C boolean expression
- Output: a boolean program BP(P,E) that is
  - a sound abstraction of P
  - a precise (boolean) abstraction of P
- Results: separate compilation (predicate abstraction) in the presence of procedures and pointers

Predicate abstraction by C2BP
[Figure: C2BP takes a program P and predicates E as input and produces the boolean program BP(P,E).]

Boolean program
BP(P, E): a C program with bool as its only type
- plus some additional constructs
- same control structure as P
Given
- P: a C program
- E = {e1,...,en}: a set of C boolean expressions over the variables in P
  - no side effects, no procedure calls
C2BP produces a boolean program B
- Same control-flow structure as P
- Properties true of B are true of P

Formal Properties of C2BP
- Soundness
  - B has a superset of the feasible paths in P
  - If {ei} is true (false) at some point on a path in B, then ei is true (false) at that point along a corresponding path in P
- Complexity
  - linear in the size of the program
  - exponential in the number of predicates

BEBOP model checker
- A symbolic model checker for boolean programs
- Performs interprocedural dataflow analysis using binary decision diagrams (BDDs)
- Used to analyze the boolean program
- Based on context-free language (CFL) reachability (see Glossary)

What is SLAM?
- SLAM is a software model checking project at Microsoft Research
- Goal: automatically check C programs (system software) against safety properties using model checking
- Safety property: "something bad never happens". An example: a lock is never released without first being acquired
- Application domain: device drivers
- Counterexample-driven refinement terminates in practice

SLAM
- Input
  - API usage rules
  - client C source code "as is"
- Analysis
  - create, explore and refine boolean program abstractions
- Output
  - Error traces (minimize noise)
  - Verification (soundness)

Static Driver Verifier
[Figure: Static Driver Verifier workflow. Labels: rules, read for understanding, new API rules, development, precise API usage rules (SLIC), defects, drive testing tools, software model checking, testing, 100% path coverage, source code.]

SLAM Toolkit
- The SLAM toolkit was developed to find errors in Windows device drivers
- Windows device drivers are required to interact with the Windows kernel according to certain interface rules
- The SLAM toolkit has an interface specification language called SLIC (Specification Language for Interface Checking), which is used for writing these interface rules
- The SLAM toolkit instruments the driver code with assertions based on these interface rules

Windows Device Drivers & SLIC
- The kernel presents a very complex interface to the driver
  - stack of drivers
  - the NT kernel is multi-threaded
- Correct API usage is described by finite state protocols
- SLIC
  - finite state language for stating rules
  - monitors behavior of C code
  - temporal safety properties
  - familiar C syntax

Newton
Given an error path p in boolean program B, Newton checks: is p a feasible path of the corresponding C program?
- Yes: found an error
- No: find predicates that explain the infeasibility
Newton uses the same interfaces to the theorem provers as c2bp.

How SLAM does it
- Model checking a C program is not feasible!
- Still, model checking is very effective on the model level ...
- Idea: automatically extract an (abstract) model from the C source.
- But even this is hard:
  - which aspects should be retained and which hidden?
  - how to extract?
- Idea: start with a very abstract model, whose extraction is quite trivial. Incrementally refine the abstraction as needed.

Traditional approach
[Figure: layers of the traditional approach. Labels: model checker; FSM (finite state machines); source code (sequential C program).]

SLAM
[Figure: layers of the SLAM approach. Labels: model checker (data flow analysis implemented using BDDs); push-down model; finite state machines (FSM); boolean program; abstraction; source code (sequential C program: C data structures, pointers, procedure calls, parameter passing, scoping, control flow).]

SLAM Soundness
- Idea: SLAM constructs sound abstractions!
- If A is a constructed abstraction of P, A preserves P's control structure.
- Therefore, theorem: paths(P) ⊆ paths(A). Every possible execution path of P is a possible execution path of A.
- Therefore, theorem: if A satisfies the SLIC spec, so does P!

SLAM completeness
- Unfortunately, the reverse of the previous theorem is generally not true:
  - an execution path (including an error path) in A may not be an execution path in P
  - so an error found in A may be a false error
- If A produces false errors, we can try to refine it (to make it more precise) to a new model A' such that paths(P) ⊆ paths(A') ⊆ paths(A)
  (suggesting an iterative procedure ...)
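
The iterative procedure suggested above can be sketched on a toy finite system. This is a minimal illustration under stated assumptions, not SLAM's implementation: states are small integers that are enumerated explicitly (SLAM uses BDDs and a theorem prover instead), all function names (abstract, find_abstract_error_path, replay, cegar) are hypothetical, and the refinement step simply splits off the concrete states reached just before the replay got stuck, which is a crude stand-in for Newton's predicate discovery.

```python
from collections import deque


def abstract(state, preds):
    """Map a concrete state to its abstract state: one boolean per predicate."""
    return tuple(p(state) for p in preds)


def find_abstract_error_path(states, init, step, bad, preds):
    """Breadth-first search in the abstraction; returns an abstract error path or None."""
    edges, bad_abs = {}, set()
    for s in states:
        a = abstract(s, preds)
        if bad(s):
            bad_abs.add(a)  # an abstract state is bad if it contains a bad concrete state
        for t in step(s):
            edges.setdefault(a, set()).add(abstract(t, preds))
    start = {abstract(s, preds) for s in init}
    parent = {a: None for a in start}
    queue = deque(start)
    while queue:
        a = queue.popleft()
        if a in bad_abs:
            path = []
            while a is not None:
                path.append(a)
                a = parent[a]
            return path[::-1]
        for b in edges.get(a, ()):
            if b not in parent:
                parent[b] = a
                queue.append(b)
    return None


def replay(path, init, step, bad, preds):
    """Replay an abstract path concretely; returns (feasible, last non-empty concrete set)."""
    current = {s for s in init if abstract(s, preds) == path[0]}
    for a in path[1:]:
        nxt = {t for s in current for t in step(s) if abstract(t, preds) == a}
        if not nxt:
            return False, current  # the path cannot be followed any further
        current = nxt
    return any(bad(s) for s in current), current


def cegar(states, init, step, bad, preds, max_rounds=20):
    """Abstract, check, refine; returns (verdict, number of predicates used)."""
    preds = list(preds)
    for _ in range(max_rounds):
        path = find_abstract_error_path(states, init, step, bad, preds)
        if path is None:
            return "safe", len(preds)       # no abstract error path, so no concrete one
        feasible, blocked = replay(path, init, step, bad, preds)
        if feasible:
            return "error", len(preds)      # the counterexample is real
        frozen = frozenset(blocked)         # toy refinement: split off these states
        preds.append(lambda s, frozen=frozen: s in frozen)
    return "unknown", len(preds)


# Toy system (an assumption for illustration): a counter modulo 8 starting at 0.
is_five = lambda x: x == 5
safe_result = cegar(range(8), {0}, lambda x: {(x + 2) % 8}, is_five, [is_five])
buggy_result = cegar(range(8), {0}, lambda x: {(x + 1) % 8}, is_five, [is_five])
print(safe_result[0], buggy_result[0])
```

With steps of 2 the counter only visits even values, so the first abstract counterexample is spurious and is eliminated by refinement; with steps of 1 the counterexample eventually replays concretely and is reported as a real error. Each refinement only removes abstract paths, which mirrors paths(P) ⊆ paths(A') ⊆ paths(A).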
SLAM main iteration
[Figure: flowchart of the SLAM main iteration, summarized below.]
- Instrument program P with property φ to obtain the instrumented program P'
- Abstraction: from P' and the initial predicates, construct an abstraction A of P'
- Model checking: A |= φ ?
  - no violation: property φ is valid
  - violation by an error path: is the path feasible in P?
    - yes: property φ is invalid
    - no: refine the abstraction and repeat
But verification is generally undecidable; hence this iteration may not terminate.

Pointers and SLAM
- Abstracting from a language with pointers (C) to one without pointers (boolean programs) is a challenge
- With pointers, C supports call-by-reference
  - Strictly speaking, C supports only call-by-value
  - With pointers and the address-of operator, one can simulate call-by-reference
- Boolean programs support only call-by-value-result
  - SLAM mimics call-by-reference with call-by-value-result
- Extra complications:
  - the address-of operator (&) in C
  - multiple levels of pointer dereference in C

Challenges of predicate abstraction
- Pointers: two related sub-problems treated in a uniform way
  - assignments through dereferenced pointers in the original C program
  - pointers and pointer dereferences in the predicates for the abstraction
- Procedures: allow procedural abstraction in boolean programs.
  Boolean programs also have:
  - global variables
  - procedures with local variables
  - call-by-value parameter passing
  - procedural abstraction: signatures constructed in isolation

Challenges of predicate abstraction (cont'd)
- Procedure calls: the abstraction process is challenging in the presence of pointers
  - after a call, the caller must conservatively update local state modified by the procedure
  - a sound and precise approach that takes side-effects into account
- Unknown values: it is not always possible to determine the effect of a statement in the C program in terms of the input predicate set E
  - such non-determinism is handled in BP via the non-deterministic control expression '*', which implicitly expresses a 3-valued domain for boolean variables

Assumptions over a C program
- All intra-procedural control flow is by if and goto
- All expressions are free of side-effects and short-circuit evaluation
- No expression contains multiple pointer dereferences (e.g. **p)
- Function calls occur only at the topmost level of expressions

Weakest Precondition
- For a statement s and a predicate φ, let WP(s, φ) denote the weakest liberal precondition of φ with respect to s
- For an assignment statement, by definition WP(x = e, φ) is φ with all occurrences of x replaced with e, denoted φ[e/x]
- For example, WP(x=x+1, x<5) = (x+1) < 5 = (x<4)
- Given S and Q, what is the weakest P' satisfying {P'} S {Q} ?
- P' is called the weakest precondition of S with respect to Q, written WP(S, Q)
- To check {P} S {Q}, check P ⇒ P'
- C2BP uses decision procedures (i.e., a theorem prover) to strengthen the weakest precondition

SLAM Future Work
- More impact
  - Static Driver Verifier (internal, external)
- More features
  - Heap abstractions
  - Concurrency
- More languages
  - C# and CIL

Predicate abstraction overview
PA problem: given (P, E) where
- P is a C program
- E = {φ1, …, φn} is a set of pure boolean C expressions over variables and constants of the C language
compute BP(P, E), which is a boolean program that
- has the same control structure as P
- contains only boolean variables V = {b1, …, bn}, where bi = {φi} represents predicate φi
- is guaranteed to be an abstraction of P (superset of traces modulo …)

SLAM – Software Model Checking
- SLAM innovations
  - boolean programs: a new model for software
  - model creation (c2bp)
  - model checking (bebop)
  - model refinement (newton)
- SLAM toolkit
  - built on MSR program analysis infrastructure
- SLAM is Microsoft's fully automated tool to verify the correctness of C programs
- More info: http://www.research.microsoft.com/slam/

Glossary
- Model checking: Checking properties by systematic exploration of the state-space of a model. Properties are usually specified as state machines, or using temporal logics.
- Safety properties: Properties whose violation can be witnessed by a finite run of the system. The most common safety properties are invariants.
- Reachability: Specialization of model checking to invariant checking. Properties are specified as invariants. Most common use of model checking. Safety properties can be reduced to reachability.
- Boolean programs: "C"-like programs with only boolean variables.
  Invariant checking and reachability are decidable for boolean programs.
- Predicate: A boolean expression over the state-space of the program, e.g. (x < 5)
- Predicate abstraction: A technique to construct a boolean model from a system using a given set of predicates. Each predicate is represented by a boolean variable in the model.
- Weakest precondition: The weakest precondition of a set of states S with respect to a statement T is the largest set of states from which executing T, when terminating, always results in a state in S.

Thank You for your Attention!
Questions are welcome
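
Backup: the one-statement example from the "A Very Simple Example" and "Predicate Abstraction, Step 1" to "Step 3" slides can be cross-checked by brute force. The sketch below is only an illustration under stated assumptions: it enumerates x and y over a small integer range instead of calling a decision procedure as C2BP does, and the names abstract_assignment and render are hypothetical helpers, not part of any SLAM tool.

```python
def abstract_assignment(pred, stmt, domain):
    """For each value of B before stmt, collect the values B can take afterwards."""
    outcomes = {True: set(), False: set()}
    for x in domain:
        for y in domain:
            x_after, y_after = stmt(x, y)
            outcomes[pred(x, y)].add(pred(x_after, y_after))
    return outcomes


def render(outcomes):
    """Write the abstract statement in the style used on the slides."""
    def rhs(values):
        # More than one possible value means a non-deterministic choice.
        return " | ".join(str(v).lower() for v in sorted(values, reverse=True))
    return f"IF B THEN B := {rhs(outcomes[True])} ELSE B := {rhs(outcomes[False])}"


# Predicate B is "x = y"; the concrete statement is y := y + 1.
outcomes = abstract_assignment(
    pred=lambda x, y: x == y,
    stmt=lambda x, y: (x, y + 1),
    domain=range(-5, 6),  # small bounded range, an assumption made for enumeration
)
abstract_code = render(outcomes)
print(abstract_code)  # prints: IF B THEN B := false ELSE B := true | false
```

When B is true beforehand, x = y always becomes false after the increment; when B is false, both outcomes occur (for instance x = 1, y = 0 versus x = 0, y = 5), so the abstract program has to choose non-deterministically.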