272: Software Engineering
Fall 2012
Instructor: Tevfik Bultan
Lecture 3: Modular Verification with Magic, Predicate Abstraction

Modular verification with Magic
• MAGIC: Modular Analysis of proGrams In C
• Goal: Automated verification of C programs against finite state machine specifications (given as labeled transition systems)
  – Checks that the behavior of the C program conforms to the behavior of the state machine
• It is a modular verification approach: the decomposition of the verification task follows the modularity in the code
  – The procedure that is being analyzed can invoke other procedures which are themselves specified as state machines
• It uses predicate abstraction to automatically generate procedure abstractions and then checks conformance of the extracted procedure abstraction to the specification
• It uses the abstract-verify-refine approach
  – If the conformance check fails, the procedure abstraction can be refined

Labeled transition systems as specifications
• A labeled transition system (LTS) M is a 4-tuple (S, S0, Act, T) where
  – S is a finite, non-empty set of states
  – S0 ⊆ S is the set of initial states
  – Act is the set of actions
  – T ⊆ S × Act × S is the transition relation
• Assume that there is a special type of state called a STOP state
  – A STOP state has no outgoing transitions
• (s, a, s’) ∈ T is also written as s →a s’
• s ⇒a s’ means that s’ is reachable from s by following exactly one a-transition and an arbitrary number of ε-transitions
  – ε is a special action in Act. It corresponds to a silent action (like skip)

Example LTS
[Figure: the LTS MyLock, with transitions labeled lock, return[0] and return[1], ending in STOP]
• There is a textual language (called Finite State Processes, FSP) for specifying labeled transition systems
• For the above LTS, the FSP specification would be:
  MyLock = { lock -> return {$0 == 0} -> STOP | return {$0 == 1} -> STOP } .
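
Illustration (not from the original slides): the LTS definition above can be made concrete with a small C encoding of the MyLock example. This is only a sketch under stated assumptions. The type names Transition and LTS, the helper functions step and is_stop_state, and the intermediate state name LOCKED are all hypothetical, and the transition table reflects one reading of the FSP text (lock followed by return[0], or return[1] directly). The step helper assumes the LTS is deterministic, which holds for this example but not for LTSs in general.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One labeled transition: src --action--> dst. */
typedef struct {
    int src;
    const char *action;
    int dst;
} Transition;

/* LTS M = (S, S0, Act, T). States are numbered 0..num_states-1.
   A single initial state is assumed, as required for LTSs inside a PA. */
typedef struct {
    int num_states;
    int initial;
    const Transition *trans;
    size_t num_trans;
} LTS;

/* State names are hypothetical; LOCKED is the state reached after "lock". */
enum { MYLOCK = 0, LOCKED = 1, STOP = 2 };

/* Assumed reading of the FSP specification of MyLock. */
static const Transition mylock_trans[] = {
    { MYLOCK, "lock",      LOCKED },
    { LOCKED, "return[0]", STOP   },
    { MYLOCK, "return[1]", STOP   },
};

static const LTS mylock = {
    3, MYLOCK, mylock_trans, sizeof mylock_trans / sizeof mylock_trans[0]
};

/* A STOP state is a state with no outgoing transitions. */
int is_stop_state(const LTS *m, int s) {
    for (size_t i = 0; i < m->num_trans; i++)
        if (m->trans[i].src == s) return 0;
    return 1;
}

/* Follow one transition labeled `action` from state s.
   Returns the successor state, or -1 if no such transition exists.
   Assumes at most one matching transition (deterministic LTS). */
int step(const LTS *m, int s, const char *action) {
    for (size_t i = 0; i < m->num_trans; i++)
        if (m->trans[i].src == s && strcmp(m->trans[i].action, action) == 0)
            return m->trans[i].dst;
    return -1;
}
```

For example, step(&mylock, MYLOCK, "lock") yields LOCKED, and following "return[0]" from there reaches STOP, which is_stop_state reports as having no outgoing transitions.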
An example LTS and an example procedure
[Figure: the LTS MyLock, with transitions labeled lock, return[0] and return[1], ending in STOP]
  int proc() {
    if (do_lock()) return 0;
    else return 1;
  }
• The goal is to check the conformance between the C procedures and the specification LTSs

Procedure Abstractions
• They define a procedure abstraction (PA) as a set of LTSs
• A PA is a tuple <d, l> where
  – d is the declaration for the procedure (as it appears in a C header file)
  – l is a finite list <g1, M1>, …, <gn, Mn> where each gi is a guard formula ranging over the parameters of the procedure and each Mi is an LTS with a single initial state
• The guards are mutually exclusive
• A PA is an abstraction of a procedure if, for all i between 1 and n, whenever the guard gi evaluates to true over the actual parameters passed to the procedure, the procedure conforms to the LTS Mi

Procedure Abstractions
• Procedure abstractions serve two purposes
  1. They are used to specify desired behavior of the procedures
     • They present automated extraction techniques to automatically extract a PA from a given procedure
  2. They are used to achieve modular verification
     • During verification of a procedure, the behaviors of procedures that are called by that procedure are abstracted as PAs

Conformance as Weak Simulation
• Once a PA is extracted from a given procedure, we want to check if the extracted PA conforms to the given LTS specification
• In order to do this we need to formalize what it means to “conform” to a given LTS specification
• They do this by using weak simulation
• Weak simulation preserves LTLX properties
  – LTLX is the temporal logic LTL without the next state operator X
  – So,
    1. if we verify an LTLX property on the specification LTS, and
    2. show that the procedure conforms to the specification LTS, then
    3.
we can conclude that the procedure also satisfies the LTLX property

Conformance as Weak Simulation
• Given two LTSs M = (S, S0, Act, T) and M’ = (S’, S0’, Act, T’)
• M’ weakly simulates M if and only if there exists a weak simulation relation E ⊆ S × S’ such that
  1. For all s ∈ S0 there exists an s’ ∈ S0’ such that (s, s’) ∈ E
  2. (s, s’) ∈ E implies that for all actions a ∈ Act \ {ε}, if s ⇒a s1 then there exists an s1’ ∈ S’ such that s’ ⇒a s1’ and (s1, s1’) ∈ E

Weak Simulation
• The existence of a simulation relation between two labeled transition systems can be checked by reducing the problem to an instance of Boolean satisfiability
• Due to the specific structure of the SAT instances produced in this reduction, satisfiability of the resulting SAT instance can be decided in linear time
• Weak simulation is the conformance criterion that is used in Magic:
  – A procedure conforms to an LTS if the LTS can weakly simulate the procedure
  – This means that the implementation (the C procedure) is safely abstracted by its specification (the LTS)

Overall Approach
Given a specification Mspec for a procedure
• First, extract Mimp which abstracts the behavior of the procedure
  – During the abstraction process, the procedures that are called by the procedure that is being analyzed are modeled using a set of given procedure abstractions (which are called assumption PAs)
  – The procedure abstraction is automatically generated using the given assumption PAs and predicate abstraction
• Then, check if Mimp conforms to Mspec (via weak simulation)
  – If Mimp conforms to Mspec then verification is successful and we are done
  – If Mimp does not conform to Mspec then we check the cause of the non-conformance
    • If it is a bug in the implementation,
then we found an error and we are done
    • If it is not a bug, but the non-conformance is due to imprecision in the abstraction Mimp, then refine Mimp and repeat the process

Model Extraction
Extraction of Mimp relies on the following principles:
• Every state of Mimp models a state during execution of the procedure, so every state is composed of a control component and a data component
• The control components intuitively represent the values of the program counter and are formally obtained from the control flow graph (CFG)
• The data components are abstract representations of the memory state of the procedure and are obtained using predicate abstraction
• The transitions between states of Mimp are derived from the transitions in the control flow graph, taking into account the assumption PAs and the predicate abstraction

Inlining assumption PAs
• During the model extraction, assumption PAs are used to handle procedure calls
• If the procedure that is being abstracted calls another procedure p, then the PA for p is inlined by
  – creating a copy of the LTS for p
  – inserting an ε-transition from the call location to the initial state of the LTS for p
  – inserting ε-transitions from the STOP states of the LTS for p to the statement right after the call statement

Experiments with MAGIC
• OpenSSL is an open source implementation of the publicly available SSL specification
  – The SSL protocol is used by a client (typically a web browser) and a server to establish a secure socket connection over a malicious network using public and symmetric key cryptography
• A critical component of the protocol is the handshake
• Check if the openssl-0.9.6c implementation of the server-side handshake conforms to its specification
  – Implementation is encapsulated in a single procedure with 347 lines of C code
  – They wrote the Mspec manually (an LTS with 28 states and 67 transitions)
• Check if the client-side implementation conforms to the specification
  – Implementation is encapsulated in a single procedure with 345
lines of C code
  – Mspec is an LTS with 28 states and 60 transitions

Experiments with MAGIC
• They provided 18 predicates for abstraction and provided the PAs for 12 library routines
• Server-side verification took 255 seconds and 130MB of memory
• Client-side verification took 226 seconds and 107MB of memory
• They then changed the specification model to see if their approach can catch errors
  – Server-side error was found in 247 seconds using 130MB of memory
  – Client-side error was found in 227 seconds using 11MB of memory

Predicate Abstraction
• In the following slides I will give an overview of the predicate abstraction technique

Abstraction (A simplified view)
• How do we generate an abstract transition system?
• Merge states in the concrete transition system (based on some criteria)
  – This reduces the number of states, so it should be easier to do verification
• Do not eliminate transitions
  – This will make sure that the paths in the abstract transition system subsume the paths in the concrete transition system

Abstraction (A simplified view)
• For every path in the concrete transition system, there is an equivalent path in the abstract transition system
  – If no path in the abstract transition system violates a property, then no path in the concrete system can violate the property
• Using this reasoning we can verify properties in the abstract transition system
  – If the property holds on the abstract transition system, we are sure that the property holds in the concrete transition system
  – If the property does not hold in the abstract transition system, then we are not sure if the property holds or not in the concrete transition system

Abstraction (A simplified view)
• If the property does not hold in the abstract transition system, what can we do?
• We can refine the abstract transition system (split some states that we merged)
• We have to make sure that the refined transition system is still an abstraction of the concrete transition system
• Then, we can recheck the property on the refined transition system
  – If the property still does not hold, we can refine again

Predicate Abstraction
• An automated abstraction technique which can be used to reduce the state space of a program
• The basic idea in predicate abstraction is to remove some variables from the program by just keeping information about a set of predicates about them
• For example, a predicate such as x = y may be the only information necessary about variables x and y to determine the behavior of the program
  – In that case we can just store a boolean variable which corresponds to the predicate x = y and remove variables x and y from the program
  – Predicate abstraction is a technique for doing such abstractions automatically

Predicate Abstraction
• Given a program and a set of predicates, predicate abstraction abstracts the program so that only the information about the given predicates is preserved
• The abstracted program adds nondeterminism since in some cases it may not be possible to figure out what the next value of a predicate will be based on the predicates in the given set
• One needs an automated theorem prover to compute the abstraction

Predicate Abstraction, A Very Simple Example
• Assume that we have two integer variables x, y
• We want to abstract the program using a single predicate “x=y”
• We will divide the states of the program into two sets:
  1. The states where “x=y” is true
  2.
The states where “x=y” is false, i.e., “x≠y”
• We will then merge all the states in the same set
  – This is an abstraction
  – Basically, we forget everything except the value of the predicate “x=y”

Predicate Abstraction, A Very Simple Example
• We will represent the predicate “x=y” as the boolean variable B in the abstract program
  – “B=true” will mean “x=y” and
  – “B=false” will mean “x≠y”
• Assume that we want to abstract the following program which contains only one statement:
  y := y + 1

Predicate Abstraction, Step 1
• Calculate preconditions based on the predicate
  {x = y + 1} y := y + 1 {x = y}    (precondition for B being true after executing the statement y := y + 1)
  {x ≠ y + 1} y := y + 1 {x ≠ y}    (precondition for B being false after executing the statement y := y + 1)
• Using our temporal logic notation we can say something like: {x = y + 1} ⇒ AX{x = y}
• Again, using our temporal logic notation: {x ≠ y + 1} ⇒ AX{x ≠ y}

Predicate Abstraction, Step 2
• Use decision procedures to determine if the predicates used for abstraction imply any of the preconditions
  x = y ⇒ x = y + 1 ?  No
  x ≠ y ⇒ x = y + 1 ?  No
  x = y ⇒ x ≠ y + 1 ?  Yes
  x ≠ y ⇒ x ≠ y + 1 ?  No

Predicate Abstraction, Step 3
• Generate abstract code
• Predicate abstraction of y := y + 1 wrt the predicate “x=y”:
  1) Compute preconditions: {x = y + 1} y := y + 1 {x = y} and {x ≠ y + 1} y := y + 1 {x ≠ y}
  2) Check implications: of the four implications, only x = y ⇒ x ≠ y + 1 holds
  3) Generate abstract code: IF B THEN B := false ELSE B := true | false

Checking conformance to a state machine
• We want to check if this procedure conforms to this LTS
  void example() {
    do {
      A: KeAcquireSpinLock();
      nPacketsOld = nPackets;
      req = devExt->WLHV;
      if(req && req->status){
        devExt->WLHV = req->Next;
        B: KeReleaseSpinLock();
        irp = req->irp;
        if(req->status > 0){
          irp->IoS.Status = SUCCESS;
          irp->IoS.Info = req->Status;
        } else {
          irp->IoS.Status = FAIL;
          irp->IoS.Info = req->Status;
        }
        SmartDevFreeBlock(req);
        IoCompleteRequest(irp);
        nPackets++;
      }
    } while(nPackets!=nPacketsOld);
    C: KeReleaseSpinLock();
  }
[Figure: the LTS SpinLock, with transitions labeled KeAcquireSpinLock(), KeReleaseSpinLock() and return, ending in STOP]

Converting a C program to a state machine
• We can convert a C program to a state machine
  – The control component of the state machine will be the states of the control flow graph
  – The data component of the state machine will be the values of the predicates used for predicate abstraction
• C code: the procedure example() shown above
• State machine (as a program):
  void example()
  begin
    do
      A: KeAcquireSpinLock();
      skip;
      if (*) then
        skip;
        B: KeReleaseSpinLock();
        skip;
        if (*) then
          skip;
        else
          skip;
        fi
        skip;
      fi
    while (*);
    C: KeReleaseSpinLock();
  end
• Other than the statements labeled A, B and C, all the rest are ε-transitions

Abstraction Preserves Correctness
• The state machine that is generated with predicate abstraction is nondeterministic (the branches labeled “*” are non-deterministic choices)
  – Non-determinism is used to handle the cases where the predicates used during predicate abstraction are not sufficient to determine which branch will be taken
• If we find no error in the
generated state machine then we are sure that there are no errors in the original program
  – The abstract state machine allows more behaviors than the original program due to non-determinism
  – Hence, if the abstract state machine is correct then the original program is also correct

Counter-Example Guided Abstraction Refinement (CEGAR)
• However, if we find an error in the abstract state machine this does not mean that the original program is incorrect
  – The erroneous behavior in the abstract state machine could be an infeasible execution path that is caused by the non-determinism introduced during abstraction
• Counter-example guided abstraction refinement is a technique used to iteratively refine the abstract state machine in order to remove the spurious counter-example traces

CEGAR
The basic idea in counter-example guided abstraction refinement is the following:
• First look for an error in the abstract program (if there are no errors, we can terminate since we know that the original program is correct)
• If there is an error in the abstract program, generate a counter-example path on the abstract program
• Check if the generated counter-example path is feasible using a theorem prover
• If the generated path is infeasible, add the predicate from the branch condition where an infeasible choice is made to the predicate set and generate a new abstract program using predicate abstraction

CEGAR
Abstraction:
  void example()
  begin
    do
      A: KeAcquireSpinLock();
      skip;
      if (*) then
        skip;
        B: KeReleaseSpinLock();
        skip;
        if (*) then
          skip;
        else
          skip;
        fi
        skip;
      fi
    while (*);
    C: KeReleaseSpinLock();
  end
Refined Abstraction (using the predicate (nPackets = nPacketsOld); the boolean variable b represents the predicate (nPackets = nPacketsOld)):
  void example()
  begin
    do
      A: KeAcquireSpinLock();
      b := T;
      if (*) then
        skip;
        B: KeReleaseSpinLock();
        skip;
        if (*) then
          skip;
        else
          skip;
        fi
        b := b ? F : *;
      fi
    while (!b);
    C: KeReleaseSpinLock();
  end

CEGAR
• Using counter-example guided abstraction refinement we iteratively create more and more refined abstractions
• This iterative abstraction refinement loop is not guaranteed to converge for infinite domains
  – This is not surprising since automated verification for infinite domains is undecidable in general
• The challenge in this approach is automatically choosing the right set of predicates for abstraction refinement
  – This is similar to finding a loop invariant that is strong enough to prove the property of interest

SLAM Project
• SLAM project at Microsoft Research
  – Verification of C programs
  – Can handle unbounded recursion but does not handle concurrency
  – Uses predicate abstraction and CEGAR
• SLAM toolkit was developed to find errors in Windows device drivers
  – Predicate abstraction example in my slides is from:
    • “The SLAM Toolkit”, Thomas Ball and Sriram K. Rajamani, CAV 2001
• Windows device drivers are required to interact with the Windows kernel according to certain interface rules
• SLAM toolkit has an interface specification language called SLIC (Specification Language for Interface Checking) which is used for writing these interface rules (which are state machines)
• The SLAM toolkit checks if the driver code conforms to these interface specifications
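
Illustration (not from the original slides): the effect of the CEGAR refinement step on the spin lock example can be checked by exhaustively exploring the two abstract programs. The C sketch below is a hand-written model, not the output of SLAM or MAGIC. The program counter encoding (PC_A, PC_IF, and so on), the functions explore and error_reachable, and the encoding of the lock rule (acquire and release must alternate) are all assumptions made for this illustration. With refined = 0 the loop test is while (*); with refined = 1 the variable b tracks the predicate nPackets = nPacketsOld and the loop test is while (!b).

```c
#include <assert.h>
#include <string.h>

/* Program counters of the abstract program (hypothetical encoding). */
enum { PC_A, PC_IF, PC_B, PC_WHILE, PC_C, PC_END, NUM_PC };

/* visited[pc][b][locked]: abstract states already explored. */
static int visited[NUM_PC][2][2];

/* Depth-first search over abstract states (pc, b, locked).
   Returns 1 if a violation of the lock rule is reachable. */
static int explore(int refined, int pc, int b, int locked) {
    if (pc == PC_END) return 0;
    if (visited[pc][b][locked]) return 0;
    visited[pc][b][locked] = 1;

    switch (pc) {
    case PC_A:      /* A: KeAcquireSpinLock(); then b := T if refined */
        if (locked) return 1;               /* acquired twice */
        return explore(refined, PC_IF, refined ? 1 : b, 1);
    case PC_IF:     /* if (*): both branches are possible */
        return explore(refined, PC_B, b, locked) ||
               explore(refined, PC_WHILE, b, locked);
    case PC_B:      /* B: KeReleaseSpinLock(); then b := b ? F : * */
        if (!locked) return 1;              /* released twice */
        if (!refined) return explore(refined, PC_WHILE, b, 0);
        if (b) return explore(refined, PC_WHILE, 0, 0);
        return explore(refined, PC_WHILE, 0, 0) ||
               explore(refined, PC_WHILE, 1, 0);
    case PC_WHILE:  /* while (*) if unrefined, while (!b) if refined */
        if (!refined)
            return explore(refined, PC_A, b, locked) ||
                   explore(refined, PC_C, b, locked);
        return explore(refined, b ? PC_C : PC_A, b, locked);
    case PC_C:      /* C: KeReleaseSpinLock(); */
        if (!locked) return 1;              /* released twice */
        return explore(refined, PC_END, b, 0);
    }
    return 0;
}

/* Returns 1 if the abstract program can violate the lock rule. */
int error_reachable(int refined) {
    memset(visited, 0, sizeof visited);
    return explore(refined, PC_A, 0, 0);
}
```

In this model the first abstraction reports a violation, for example by releasing the lock at B, leaving the loop, and releasing again at C. Under the refined abstraction that path is excluded, because taking the branch sets b to false and the loop must then repeat.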