Software Model Checking
Rajeev Alur, University of Pennsylvania
University of Edinburgh, July 2008

Systems Software
- Can Microsoft Windows version X be bug-free?
- Millions of lines of code
- Types of bugs that cause crashes are well known
- Enormous effort spent on debugging/testing code
- Certifying third-party code (e.g. device drivers)

    do {
        KeAcquireSpinLock();
        nPacketsOld = nPackets;
        if (request) {
            request = request->Next;
            KeReleaseSpinLock();
            nPackets++;
        }
    } while (nPackets != nPacketsOld);
    KeReleaseSpinLock();

Do lock operations, acquire and release, strictly alternate on every program execution?

Concurrency Libraries
- Exploiting concurrency efficiently and correctly
- Concurrent Queue (MS’96) in shared memory

    boolean_t dequeue(queue_t *queue, value_t *pvalue) {
        node_t *head;
        node_t *tail;
        node_t *next;
        while (true) {
            head = queue->head;
            tail = queue->tail;
            next = head->next;
            if (head == queue->head) {
                if (head == tail) {
                    if (next == 0)
                        return false;
                    cas(&queue->tail, tail, next);
                } else {
                    *pvalue = next->value;
                    if (cas(&queue->head, head, next))
                        break;
                }
            }
        }
        delete_node(head);
        return true;
    }

- Can the code deadlock?
- Is sequential semantics of a queue preserved? (Sequential consistency)

Security Checks for Java Applets
https://java.sun.com/javame/
- EventSharingMidlet from J2ME

    public Vector<String> phoneBook;
    public String number;
    public int selected;
    public void sendEvent() {
        phoneBook = getPhoneBook();
        selected = chooseReceiver();
        number = phoneBook.elementAt(selected);
        if ((number == null) || number.equals("")) {
            // output error
        } else {
            String message = inputMessage();
            sendMessage(number, message);
        }
    }

- How to certify applications for data integrity / confidentiality?
- By listening to messages, can one infer whether a particular entry is in the address book?

In Search of the Holy Grail…
[Diagram: Verifier; inputs: software/model, correctness specification; outputs: yes/proof, no/bug]
- Correctness is formalized as a mathematical claim to be proved or falsified rigorously
- Always with respect to the given specification
- Challenge: impossibility results for automated verifiers
- Verification problem is undecidable (Turing 1936)
- Even approximate versions are computationally intractable (model checking is PSPACE-hard)

1970s: Proof calculi for program correctness
Key to proof: finding suitable loop invariants

    BubbleSort (A : array[1..n] of int) {
      B = A : array[1..n] of int;
      for (i=0; i<n; i++) {
        // invariant: Permute(A,B), Sorted(B[n-i,n]),
        //            for 0<k<=n-i-1 and n-i<=k’<=n: B[k]<=B[k’]
        for (j=0; j<n-i; j++) {
          // invariant: Permute(A,B), Sorted(B[n-i,n]),
          //            for 0<k<=n-i-1 and n-i<=k’<=n: B[k]<=B[k’],
          //            for 0<k<j: B[k] <= B[j]
          if (B[j]>B[j+1]) swap(B,j,j+1)
        }
      };
      return B;
    }

Deductive Program Verification
- Powerful mathematical logic (e.g. first-order logic, higher-order logics) needed for formalization
- Great progress in decision procedures
- Finding a proof decomposition requires expertise, but modern tools support many built-in proof tactics
- Contemporary theorem provers: HOL, PVS, ACL2, ESC-Java, Boogie
- In practice: the user partially annotates the program with invariants, and the tool infers the remaining invariants needed to complete the proof
- Checks are modular (per function)
- Success story: Windows developers must add enough annotations to be able to prove absence of buffer overflow errors

1980s: Finite-state Protocol Analysis
- Automated analysis of finite-state protocols with respect to temporal logic specifications
- Network protocols, distributed algorithms
- Specs: Is there a deadlock? Does every req get ack? Does a buffer overflow?
- Tools: SPIN, Murphi, CADP…

Battling State-space Explosion
[Figure: state-transition graph with labels State, Transition, Bad states]
- Analysis is basically a reachability problem in a HUGE graph
- Size of the graph grows exponentially with the number of bits required for state encoding
- Graph is constructed only incrementally, on-the-fly
- Many techniques for exploiting structure: symmetry, data independence, hashing, partial order reduction …
- Great flexibility in modeling: scale down parameters (buffer size, number of network nodes…)

1990s: Symbolic Model Checking
- Constraint-based analysis of Boolean systems
- Symbolic Boolean representations (propositional formulas, OBDDs) used to encode system dynamics
- Success in finding high-quality bugs in hardware applications (VHDL/Verilog code)
- Deadlock found in cache coherency protocol Gigamax by model checker SMV
[Figure: Gigamax architecture with Global bus, Cluster bus, UIC, M and P nodes; protocol states Read-shared/read-owned/write-invalid/write-shared/…]

Symbolic Reachability Problem
- Model variables X = {x1, … xn}; each var is of finite type, say, boolean
- Initialization: I(X), a formula over X, e.g. (x1 && ~x2)
- Update: T(X,X’), how the new vars X’ are related to the old vars X as a result of executing one step of the program. It is a disjunction of clauses obtained by compiling individual instructions, e.g. (x1 && x1’ = x1 && x2’ = ~x2 && x3’ = x3)
- Target set: F(X), e.g. (x2 && x3)
- Computational problem: Can F be satisfied starting with I by repeatedly applying T?
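
The reachability question above can be tried out on the slide's own example formulas. The following is a minimal sketch only: it enumerates states explicitly instead of using a symbolic representation (OBDDs or SAT), and it assumes that the single example clause is the whole of T. The function names `init`, `trans`, `target` and `steps_to_target` are hypothetical.

```python
from itertools import product

# A state is a tuple (x1, x2, x3) of booleans.
STATES = list(product([False, True], repeat=3))

def init(s):
    """I(X): x1 && ~x2 (example formula from the slide)."""
    x1, x2, x3 = s
    return x1 and not x2

def trans(s, t):
    """T(X, X'): x1 && x1' = x1 && x2' = ~x2 && x3' = x3.
    Assumption: this single clause is the entire update relation."""
    x1, x2, x3 = s
    y1, y2, y3 = t
    return x1 and y1 == x1 and y2 == (not x2) and y3 == x3

def target(s):
    """F(X): x2 && x3."""
    x1, x2, x3 = s
    return x2 and x3

def steps_to_target():
    """Breadth-first search from I; return the least number of
    T-steps after which F holds, or None if F is unreachable."""
    frontier = {s for s in STATES if init(s)}
    seen = set(frontier)
    k = 0
    while frontier:
        if any(target(s) for s in frontier):
            return k
        successors = {t for s in frontier for t in STATES if trans(s, t)}
        frontier = successors - seen
        seen |= frontier
        k += 1
    return None

print(steps_to_target())
```

With these formulas the initial states have x2 false, so F cannot hold at step 0; one application of T flips x2 and keeps x3, so the target is reached when x3 was initially true.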
K-step reachability reduces to propositional satisfiability (SAT): Bounded Model Checking

    I(X0) && T(X0,X1) && T(X1,X2) && … && T(Xk-1,Xk) && F(Xk)

The Story of SAT
- Propositional Satisfiability: given a formula over Boolean variables, is there an assignment of 0/1’s to vars which makes the formula true?
- Canonical NP-hard problem (Cook 1971)
- Enormous progress in tools that can solve instances with 1000s of variables and millions of clauses

    Year   Solver      Instance size
    1952   Quine       10 var
    1960   DP          10 var
    1962   DLL         10 var
    1986   BDDs        100 var
    1988   SOCRATES    3k var
    1992   GSAT        300 var
    1994   Hannibal    3k var
    1996   GRASP       1k var
    1996   Stålmarck   1000 var
    1996   SATO        1k var
    2001   Chaff       10k var
    2002   Berkmin     10k var
    Source: Malik [2004]

2000s: Model Checking of C code
- Phase 1: Given a program P, build an abstract finite-state (Boolean) model A such that the set of behaviors of P is a subset of those of A (conservative abstraction)
- Phase 2: Model check A wrt the specification: this can prove P to be correct, or reveal a bug in P, or suggest inadequacy of A
- Shown to be effective on Windows device drivers in the Microsoft Research project SLAM

    do {
        KeAcquireSpinLock();
        nPacketsOld = nPackets;
        if (request) {
            request = request->Next;
            KeReleaseSpinLock();
            nPackets++;
        }
    } while (nPackets != nPacketsOld);
    KeReleaseSpinLock();

Do lock operations, acquire and release, strictly alternate on every program execution?

Program Abstraction
Predicate Abstraction with bx: x>0; by: y>0

    Program                         Abstraction
    int x, y;                       bool bx, by;
    if x>0 { … y=x+1 … }            if bx { … by=true … }
    else   { … y=x+1 … }            else  { … by={true,false} … }

Software Model Checking
- Tools for verifying source code combine many techniques
- Program analysis techniques such as slicing, range analysis
- Abstraction
- Model checking
- Refinement from counter-examples
- New challenges for model checking (beyond finite-state reachability analysis)
- Recursion gives pushdown control
- Pointers, dynamic creation of objects, inheritance…
- A very active and emerging research area
- Abstraction-based tools: SLAM, BLAST,…
- Direct state encoding: F-SOFT, CBMC, CheckFence…

Coming Up …
- CheckFence Project at Penn
- Concurrent Executions on Relaxed Memory Models
- Analysis tool for Concurrent Data Types
- Joint work with Sebastian Burckhardt and Milo Martin
- Not covered: How to check that a Java midlet does not leak user-specified secrets (ongoing work with Pavol Cerny)

Challenge: Exploiting Concurrency, Correctly
[Diagram labels: Multi-threaded Software, Shared-memory Multiprocessor, Concurrent Executions, Bugs]

Concurrency on Multiprocessors
Initially x = y = 0

    thread 1        thread 2
    x = 1           r1 = y
    y = 1           r2 = x

Standard Interleavings

    1. x = 1;  y = 1;  r1 = y; r2 = x    r1=r2=1
    2. x = 1;  r1 = y; y = 1;  r2 = x    r1=0,r2=1
    3. x = 1;  r1 = y; r2 = x; y = 1     r1=0,r2=1
    4. r1 = y; x = 1;  y = 1;  r2 = x    r1=0,r2=1
    5. r1 = y; x = 1;  r2 = x; y = 1     r1=0,r2=1
    6. r1 = y; r2 = x; x = 1;  y = 1     r1=r2=0

- Can we conclude that if r1 = 1 then r2 must be 1?
- No! On “real” multiprocessors, it is possible to have r1=1 and r2=0

Architectures with Weak Memory Models
[Figure labels: cache, Main Memory]
- A modern multiprocessor does not enforce global ordering of all instructions for performance reasons
- Lamport (1979): Sequential consistency semantics for correctness of multiprocessor shared memory (like interleaving)
- Considered too limiting, and many “relaxations” proposed
- In theory: TSO, RMO, Relaxed …
- In practice: Alpha, Intel IA32, IBM 370, Sun SPARC, PowerPC …

Concurrency in Theory: CCS (1978)
CCS Syntax

    P := e | a.P | P+P | P||P | P\a

CCS Operational Semantics (sample rules)

    a.P -a-> P
    P -a-> P’  implies  P||Q -a-> P’||Q
    P -a-> P’  implies  Q||P -a-> Q||P’
    P -a-> P’; Q -a-> Q’  implies  P||Q -t-> P’||Q’

Concurrency in Practice: Intel (2007)
Intel 64 memory ordering obeys the following principles
1. Loads are not reordered with other loads
2. Stores are not reordered with other stores
3. Stores are not reordered with older loads
4. Loads may be reordered with older stores to different locations but not with older stores to same locations
4 more rules + illustrative examples

Programming with Weak Memory Models
- Concurrent programming is already hard; shouldn’t the effects of weaker models be hidden from the programmer?
- Mostly yes …
  - Safe programming through extensive use of synchronization primitives
  - Use locks for every access to shared data
  - Compilers use memory fences to enforce ordering
- Not always …
  - Non-blocking data structures
  - Highly optimized library code for concurrency
  - Code for lock/unlock instructions
  - OS code managing process queues etc.

Non-blocking Queue (MS’96)
[Figure: linked list of nodes 1, 2, 3 with head and tail pointers]

    boolean_t dequeue(queue_t *queue, value_t *pvalue) {
        node_t *head;
        node_t *tail;
        node_t *next;
        // Queue is being possibly updated concurrently
        while (true) {
            head = queue->head;
            tail = queue->tail;
            next = head->next;
            if (head == queue->head) {
                if (head == tail) {
                    if (next == 0)
                        return false;
                    // Atomic compare-and-swap for synchronization
                    cas(&queue->tail, (uint32) tail, (uint32) next);
                } else {
                    *pvalue = next->value;
                    if (cas(&queue->head, (uint32) head, (uint32) next))
                        break;
                }
            }
        }
        delete_node(head);
        return true;
    }

Architecture-aware Concurrency Analysis
- Programs (multi-threaded)
- Application level concurrency model: simple, usable by programmers
- System-level code, concurrency libraries
- Architecture level concurrency model: complex, efficient use of parallelism
- Highly parallel hardware: multicores, SoCs

Software Model Checking for Concurrent Code on Multiprocessors
- Why?: Real bugs in real code
- Opportunities
  - 10s to 100s of lines of low-level library C code
  - Hard to design and verify -> buggy
  - Effects of weak memory models, fences …
- Challenges
  - Lots of behaviors possible: high level of concurrency
  - How to formalize and reason about weak memory models?
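
The six standard interleavings from the "Concurrency on Multiprocessors" slide can be reproduced mechanically. The sketch below is an illustration under stated assumptions, not part of any tool mentioned in the talk: it models only interleaving (sequentially consistent) semantics, and the helper names `interleavings`, `run` and `sc_outcomes` are hypothetical.

```python
from itertools import combinations

# The two threads from the slide; initially x = y = 0.
THREAD1 = [("store", "x", 1), ("store", "y", 1)]
THREAD2 = [("load", "y", "r1"), ("load", "x", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving on a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, location, arg in schedule:
        if kind == "store":
            memory[location] = arg
        else:
            regs[arg] = memory[location]
    return regs["r1"], regs["r2"]

def sc_outcomes():
    """All (r1, r2) pairs observable under interleaving semantics."""
    return {run(s) for s in interleavings(THREAD1, THREAD2)}

print(sorted(sc_outcomes()))
```

The outcome r1=1, r2=0 never appears in this set, which is why observing it on real hardware shows that the machine is not sequentially consistent.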

Shared Memory Consistency Models
- Specifies restrictions on what values a read from shared memory can return
- Program Order: x <p y if x and y are instructions belonging to the same thread and x appears before y
- Sequential Consistency (Lamport 79): Concurrent execution is correct if there exists a global order < of all accesses such that
  - If x <p y then x < y
  - Each load returns the value of the most recent, according to <, store to the same location (or the initial value, if no such store exists)
- Clean abstraction for programmers, but high implementation cost

Effect of Memory Model
Initially flag1 = flag2 = 0

    thread 1                          thread 2
    1. flag1 = 1;                     1. flag2 = 1;
    2. if (flag2 == 0) crit. sect.    2. if (flag1 == 0) crit. sect.

- Ensures mutual exclusion if the architecture supports SC memory
- Most architectures do not enforce ordering of accesses to different memory locations
- Does not ensure mutual exclusion under weaker models
- Ordering can be enforced using “fence” instructions
- Insert MEMBAR between lines 1 and 2 to ensure mutual exclusion

Weak Memory Models
- A large variety of models exist; a good starting point: Shared Memory Consistency Models: A tutorial, IEEE Computer 96, Adve & Gharachorloo
- How to relax the memory order requirement? Operations of the same thread to different locations need not be globally ordered
- How to relax the write atomicity requirement? A read may return the value of a write not yet globally visible
- Uniprocessor semantics preserved
- Typically defined in architecture manuals (e.g. SPARC manual)

Which Memory Model should a Verifier use?
[Figure: memory model hierarchy with labels RMO, PSO, TSO, 390, SC, Alpha, IA-32, Relaxed]

Formalization of Relaxed
- Program Order: x <p y if x and y are instructions belonging to the same thread and x appears before y
- Concurrent execution over a set X of accesses is correct wrt Relaxed if there exists a total order < over X such that
  1. If x <p y, and both x and y are accesses to the same address, and y is a store, then x < y must hold
  2. For a load l and a store s visible to l, either s and l have the same value, or there exists another store s’ visible to l with s < s’
- A store s is visible to load l if they are to the same address and either s < l or s <p l
- Constraint-based specification that can be easily encoded in logical formulas

CheckFence
[Diagram: CheckFence with input Memory Model Axioms and three possible results]
- Pass: all executions of the test are observationally equivalent to a serial execution
- Fail
- Inconclusive: runs out of time or memory

How To Bound Executions
- Verify individual “symbolic tests”
  - finite number of concurrent threads
  - finite number of operations/thread
  - nondeterministic input values
- Example: thread 1: enqueue(X); thread 2: dequeue() → Y
- User creates a suite of tests of increasing size

Why Symbolic Test Programs?
1) Make everything finite
  - State is unbounded (dynamic memory allocation) ... is bounded for individual test
  - Checking sequential consistency is undecidable (AMP 96) ... is decidable for individual test
2) Gives us a finite instruction sequence to work with
  - State space too large for interleaved system model ... can directly encode value flow between instructions
  - Memory model specified by axioms ... can directly encode ordering axioms on instructions

Tool Architecture
[Diagram labels: Trace, C code, Memory model, Symbolic Test]
- Symbolic test gives exponentially many executions (symbolic inputs, dynamic memory allocation, ordering of instructions).
- CheckFence solves for “incorrect” executions.
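
The two axioms of Relaxed, and the sequential consistency condition, can be checked on a tiny execution by brute force over all total orders. This is a sketch under stated assumptions and not how CheckFence works (the talk describes a SAT encoding): the names `Access`, `allowed` and the model strings are hypothetical, loads with no visible store are assumed to return the initial value 0, and axiom 2 is checked in the equivalent form "the latest visible store carries the loaded value".

```python
from collections import namedtuple
from itertools import permutations

Access = namedtuple("Access", "name thread index kind address value")

# Execution x = 1; y = 1 || r1 = y; r2 = x with observed r1 = 1, r2 = 0.
EXECUTION = [
    Access("s_x", 1, 0, "store", "x", 1),
    Access("s_y", 1, 1, "store", "y", 1),
    Access("l_y", 2, 0, "load", "y", 1),
    Access("l_x", 2, 1, "load", "x", 0),
]
INITIAL_VALUE = 0  # assumption: all locations start at 0

def program_order(a, b):
    """a <p b: same thread and a appears before b."""
    return a.thread == b.thread and a.index < b.index

def order_ok(order, model):
    """Check the ordering axiom of the chosen model on a total order."""
    pos = {a.name: i for i, a in enumerate(order)}
    for a in order:
        for b in order:
            if not program_order(a, b):
                continue
            if model == "SC":
                required = True
            else:  # "Relaxed": only same-address pairs ending in a store
                required = a.address == b.address and b.kind == "store"
            if required and pos[a.name] > pos[b.name]:
                return False
    return True

def values_ok(order):
    """Each load must return the latest visible store, else the initial value."""
    pos = {a.name: i for i, a in enumerate(order)}
    for load in (a for a in order if a.kind == "load"):
        visible = [s for s in order
                   if s.kind == "store" and s.address == load.address
                   and (pos[s.name] < pos[load.name] or program_order(s, load))]
        if not visible:
            if load.value != INITIAL_VALUE:
                return False
        elif max(visible, key=lambda s: pos[s.name]).value != load.value:
            return False
    return True

def allowed(execution, model):
    """True if some total order satisfies the model's axioms."""
    return any(order_ok(o, model) and values_ok(o)
               for o in permutations(execution))

print(allowed(EXECUTION, "SC"), allowed(EXECUTION, "Relaxed"))
```

For this execution the witness order under Relaxed is s_y, l_y, l_x, s_x: nothing forces the two stores (or the two loads) of a thread into program order because they touch different addresses.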
- Construct CNF formula whose solutions correspond precisely to the concurrent executions
- Automatic, lazy loop unrolling
- Automatic specification mining (enumerate correct observations)

Specification Mining
Two threads: one runs enqueue(X); enqueue(Y), the other runs dequeue() → Z

Possible Operation-level Interleavings

    1. enqueue(X);    enqueue(Y);    dequeue() -> Z
    2. enqueue(X);    dequeue() -> Z; enqueue(Y)
    3. dequeue() -> Z; enqueue(X);    enqueue(Y)

- For each interleaving, obtain a symbolic constraint by encoding the corresponding executions in the SAT solver
- Spec is the disjunction of all possibilities: Spec: (Z=X) | (Z=null)
- To find bugs, check satisfiability of Phi & ~Spec, where Phi encodes all possible concurrent executions

Encoding Memory Order

    thread 1          thread 2
    s1: store         l1: load
    s2: store         l2: load

- Variables for encoding
  - Use boolean vars for relative order (x<y) of memory accesses
  - Use bitvector variables Ax and Dx for address and data values associated with memory access x
- Encode constraints
  - encode transitivity of memory order
  - encode ordering axioms of the memory model. Example (for SC): (s1<s2) & (l1<l2)
  - encode value flow: “Loaded value must match last value stored to same address”. Example: value must flow from s1 to l1 under the following conditions:

    ((s1<l1) & (As1 = Al1) & ((s2<s1) | (l1<s2) | (As2 != Al1))) -> (Ds1 = Dl1)

Example: Memory Model Bug

    Processor 1 links new node into list      Processor 2 reads value at head of list
        ...                                       ...
    (3) node->value = 2;                      (2) value = head->value;
        ...                                       ...
    (1) head = node;
        ...

- Processor 1 reorders the stores! Memory accesses happen in order 1 2 3
- --> Processor 2 loads uninitialized value
- Adding a fence between the lines on the left side prevents reordering

Algorithms Analyzed

    Type       Description          LOC   Source
    Queue      Two-lock queue       80    M. Michael and L. Scott (PODC 1996)
    Queue      Non-blocking queue   98    M. Michael and L. Scott (PODC 1996)
    Set        Lazy list-based set  141   Heller et al. (OPODIS 2005)
    Set        Nonblocking list     174   T. Harris (DISC 2001)
    Deque      “snark” algorithm    159   D. Detlefs et al. (DISC 2000)
    LL/VL/SC   CAS-based            74    M. Moir (PODC 1997)
    LL/VL/SC   Bounded Tags         198   M. Moir (PODC 1997)

Results
- snark algorithm has 2 known bugs
- lazy list-based set had an unknown bug (missing initialization; missed by formal correctness proof [CAV 2006] because of hand-translation of pseudocode)
- Many failures on relaxed memory model
  • inserted fences by hand to fix them
  • small testcases sufficient for this purpose

    Type       Description          Regular bugs
    Queue      Two-lock queue
    Queue      Non-blocking queue
    Set        Lazy list-based set  1 unknown
    Set        Nonblocking list
    Deque      original “snark”     2 known
    Deque      fixed “snark”
    LL/VL/SC   CAS-based
    LL/VL/SC   Bounded Tags

[Table residue: "# Fences inserted" with columns Store-Store, Load-Load, Dependent Loads, Aliased Loads. Cell positions were lost; the counts in extracted order are: Two-lock queue 1; Non-blocking queue 2; 1 4 1 2 1 3 1 2 3 4 6; 4 2; CAS-based 3; Bounded Tags 4]

Typical Tool Performance
- Very efficient on small testcases (< 100 memory accesses)
  Example (nonblocking queue): T0 = i (e | d), T1 = i (e | e | d | d)
  - find counterexamples within a few seconds
  - verify within a few minutes
  - enough to cover all 9 fences in nonblocking queue
- Slows down with increasing number of memory accesses in test
  Example (snark deque): Dq = pop_l | pop_l | pop_r | pop_r | push_l | push_l | push_r | push_r
  - has 134 memory accesses (77 loads, 57 stores)
  - Dq finds second snark bug within ~1 hour
- Does not scale past ~300 memory accesses

CheckFence Summary
- Software model checking of low-level concurrent software requires encoding of memory models
- Challenge for model checking due to high level of concurrency and axiomatic specifications
- Opportunity to find bugs in library code that’s hard to design and verify
- CheckFence project at Penn
  - SAT-based bounded model checking for concurrent data types
  - Bugs in real code with fences

Ongoing Research
- What’s the best way to verify C code (on relaxed memory models)?
  - SAT-based encoding seems suitable to capture specifications of memory models, but many opportunities for improvement
- Can one develop abstract operational models for multiprocessor architectures?
- Proof methods for relaxed memory models
- Hardware support for transactional memory
  - Current interest in industry and architecture research
  - Can formal verification influence designs/standards?

Software Model Checker
[Diagram: Software Model Checker; inputs: software/model, correctness specification; outputs: yes/proof, no/bug]
- Impressive progress on an intractable problem
  - Device drivers
  - Concurrency libraries
  - Buffer overflows in OS
  - Network protocols …
- Academic research with industrial impact
- Ingredients for success
  - SAT almost feasible
  - Logic + Algorithms + Tools
  - Focus on specific problems
  - Scalability not necessary
  - Flexibility in setting up the problem
- Unmet challenge: Lack of robustness of tools -> lot of user expertise needed
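
As a supplementary sketch for the Specification Mining slide, the set of correct observations can be enumerated by running every operation-level interleaving against a sequential reference queue. This is a simplification made for illustration: the talk describes encoding the executions symbolically in a SAT solver, whereas this sketch uses concrete placeholder values "X" and "Y", and the names `mine_spec` and `violates` are hypothetical.

```python
from collections import deque
from itertools import combinations

# One thread enqueues X then Y; the other performs a single dequeue -> Z.
PRODUCER = [("enqueue", "X"), ("enqueue", "Y")]
CONSUMER = [("dequeue", None)]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def mine_spec(a, b):
    """Collect every value Z that a serial execution can observe.
    None stands for the 'null' result of dequeuing an empty queue."""
    observations = set()
    for schedule in interleavings(a, b):
        queue = deque()
        z = None
        for operation, argument in schedule:
            if operation == "enqueue":
                queue.append(argument)
            else:
                z = queue.popleft() if queue else None
        observations.add(z)
    return observations

def violates(spec, observed_z):
    """A concurrent execution is a bug candidate if its observation
    is not among the mined serial observations (Phi & ~Spec)."""
    return observed_z not in spec

spec = mine_spec(PRODUCER, CONSUMER)
print(spec)
```

The mined set corresponds to the slide's Spec: (Z=X) | (Z=null); an execution in which the dequeue returns Y would fall outside it.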