Software Model Checking Rajeev Alur University of Pennsylvania University of Edinburgh, July 2008

Systems Software
Can Microsoft Windows version X be bug-free?
Millions of lines of code
The types of bugs that cause crashes are well known
Enormous effort is spent on debugging/testing code
Certifying third-party code (e.g. device drivers)

do {
    KeAcquireSpinLock();
    nPacketsOld = nPackets;
    if (request) {
        request = request->Next;
        KeReleaseSpinLock();
        nPackets++;
    }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();

Do the lock operations, acquire and release, strictly alternate on every program execution?
Concurrency Libraries
Exploiting concurrency efficiently and correctly
boolean_t dequeue(queue_t *queue, value_t *pvalue)
{
    node_t *head;
    node_t *tail;
    node_t *next;

    while (true) {
        head = queue->head;
        tail = queue->tail;
        next = head->next;
        if (head == queue->head) {
            if (head == tail) {
                if (next == 0)
                    return false;
                cas(&queue->tail, tail, next);
            } else {
                *pvalue = next->value;
                if (cas(&queue->head, head, next))
                    break;
            }
        }
    }
    delete_node(head);
    return true;
}

Concurrent Queue (MS’96)
[Figure: threads accessing the queue in Shared Memory]
Can the code deadlock?
Are the sequential semantics of a queue preserved? (Sequential consistency)
Security Checks for Java Applets
https://java.sun.com/javame/
public Vector<String> phoneBook;
public String number;
public int selected;

public void sendEvent() {
    phoneBook = getPhoneBook();
    selected = chooseReceiver();
    number = phoneBook.elementAt(selected);
    if ((number == null) || number.equals("")) {
        // output error
    } else {
        String message = inputMessage();
        sendMessage(number, message);
    }
}

EventSharingMidlet from J2ME
How to certify applications for data integrity / confidentiality?
By listening to messages, can one infer whether a particular entry is in the address book?
In Search of the Holy Grail…
[Diagram: software/model and correctness specification feed a Verifier, which answers yes/proof or no/bug]
• Correctness is formalized as a mathematical claim to be proved or falsified rigorously
  - always with respect to the given specification
• Challenge: impossibility results for an automated verifier
  - The verification problem is undecidable (Turing 1936)
  - Even approximate versions are computationally intractable (model checking is PSPACE-hard)
1970s: Proof calculi for program correctness
Key to proof: finding suitable loop invariants

BubbleSort (A : array[1..n] of int) {
    B = A : array[1..n] of int;
    for (i=0; i<n; i++) {
        // Invariant: Permute(A,B), Sorted(B[n-i,n]),
        //            for 0<k<=n-i-1 and n-i<=k’<=n: B[k]<=B[k’]
        for (j=0; j<n-i; j++) {
            // Invariant: Permute(A,B), Sorted(B[n-i,n]),
            //            for 0<k<=n-i-1 and n-i<=k’<=n: B[k]<=B[k’],
            //            for 0<k<j: B[k]<=B[j]
            if (B[j]>B[j+1]) swap(B,j,j+1)
        }
    };
    return B;
}
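The invariants above can also be checked at run time, which is a cheap way to test a candidate invariant before trying to prove it. The following is a minimal Python sketch, not the proof calculus itself: the function name is hypothetical, indices are 0-based (the slide uses 1-based arrays), and the inner loop bound is tightened so that `b[j + 1]` stays in range.

```python
from collections import Counter

def bubble_sort_checked(a):
    """Bubble sort that asserts the slide's loop invariants at run time."""
    b = list(a)
    n = len(b)
    for i in range(n):
        # Outer invariant: B is a permutation of A, the last i entries are
        # sorted, and every earlier entry is <= each of those i entries.
        assert Counter(b) == Counter(a)
        assert all(b[k] <= b[k + 1] for k in range(n - i, n - 1))
        assert all(b[k] <= b[m] for k in range(n - i) for m in range(n - i, n))
        for j in range(n - i - 1):
            # Inner invariant: b[j] is the largest entry among b[0..j].
            assert all(b[k] <= b[j] for k in range(j))
            if b[j] > b[j + 1]:
                b[j], b[j + 1] = b[j + 1], b[j]
    return b
```

A failing assertion points at an invariant that is too strong or at a bug in the loop; passing assertions on test inputs are evidence only, not a proof.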
Deductive Program Verification
• Powerful mathematical logic (e.g. first-order logic, higher-order logics) is needed for formalization
  - Great progress in decision procedures
  - Finding a proof decomposition requires expertise, but modern tools support many built-in proof tactics
  - Contemporary theorem provers: HOL, PVS, ACL2, ESC-Java, Boogie
• In practice …
  - User partially annotates the program with invariants, and the tool infers the remaining invariants needed to complete the proof
  - Checks are modular (per function)
  - Success story: Windows developers must add enough annotations to be able to prove absence of buffer overflow errors
1980s: Finite-state Protocol Analysis
Automated analysis of finite-state protocols with respect to temporal logic specifications
Network protocols, distributed algorithms
Specs:
  Is there a deadlock?
  Does every req get an ack?
  Does a buffer overflow?
Tools: SPIN, Murphi, CADP …
Battling State-space Explosion
Analysis is basically a reachability problem in a HUGE graph
The size of the graph grows exponentially with the number of bits required for state encoding
The graph is constructed only incrementally, on-the-fly
Many techniques for exploiting structure: symmetry, data independence, hashing, partial order reduction …
Great flexibility in modeling: scale down parameters (buffer size, number of network nodes …)
[Figure: state-transition graph; legend: State, Transition, Bad states]
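On-the-fly reachability can be sketched in a few lines of Python. This is an illustrative breadth-first search, not the algorithm of SPIN or Murphi; the toy system (two processes sharing a lock that is tested and set in two separate steps) and all names are assumptions made for the example.

```python
from collections import deque

def find_bad_state(initial, successors, is_bad):
    """Breadth-first search that generates the graph only as it is explored.

    Returns a shortest path from `initial` to a bad state, or None.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

def naive_lock_successors(state):
    """Toy model: state is (pc of process 0, pc of process 1, lock).

    pc 0: test that the lock is free, pc 1: set the lock, pc 2: critical
    section, leaving it releases the lock. Test and set are not atomic.
    """
    pcs, lock = state[:2], state[2]
    for i in (0, 1):
        if pcs[i] == 0 and lock == 0:
            new_pc, new_lock = 1, lock
        elif pcs[i] == 1:
            new_pc, new_lock = 2, 1
        elif pcs[i] == 2:
            new_pc, new_lock = 0, 0
        else:
            continue  # blocked: the lock is taken
        new_pcs = list(pcs)
        new_pcs[i] = new_pc
        yield (new_pcs[0], new_pcs[1], new_lock)

def both_in_critical_section(state):
    return state[0] == 2 and state[1] == 2

counterexample = find_bad_state((0, 0, 0), naive_lock_successors,
                                both_in_critical_section)
```

Here `counterexample` is the trace in which both processes test the lock before either one sets it. Only states that are actually reachable are ever stored, which is the point of building the graph on the fly.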
1990s: Symbolic Model Checking
Constraint-based analysis of Boolean systems
Symbolic Boolean representations (propositional formulas, OBDDs) used to encode system dynamics
Success in finding high-quality bugs in hardware applications (VHDL/Verilog code)
Deadlock found in the cache coherency protocol Gigamax by the model checker SMV
[Figure: Gigamax architecture with processors (P), memories (M) and UICs on cluster buses joined by a global bus; bus commands: read-shared/read-owned/write-invalid/write-shared/…]
Symbolic Reachability Problem
Model variables X = {x1, …, xn}
  Each var is of finite type, say, boolean
Initialization: I(X), a formula over X, e.g. (x1 && ~x2)
Update: T(X,X’)
  How new vars X’ are related to old vars X as a result of executing one step of the program: disjunction of clauses obtained by compiling individual instructions, e.g. (x1 && x1’ = x1 && x2’ = ~x2 && x3’ = x3)
Target set: F(X), e.g. (x2 && x3)
Computational problem:
  Can F be satisfied starting with I by repeatedly applying T?
Bounded Model Checking: k-step reachability reduces to propositional satisfiability (SAT):
  I(X0) && T(X0,X1) && T(X1,X2) && … && T(Xk-1,Xk) && F(Xk)
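To make the unrolled formula concrete, the sketch below evaluates it for the example formulas on this slide. It is a toy: instead of calling a SAT solver it enumerates every assignment to X0 … Xk, which only works for a handful of variables, and the function names are hypothetical.

```python
from itertools import product

def init(s):
    x1, x2, x3 = s
    return x1 and not x2                      # I(X) = x1 && ~x2

def trans(s, t):
    x1, x2, x3 = s
    y1, y2, y3 = t
    # T(X,X') = x1 && x1' = x1 && x2' = ~x2 && x3' = x3
    return x1 and y1 == x1 and y2 == (not x2) and y3 == x3

def target(s):
    x1, x2, x3 = s
    return x2 and x3                          # F(X) = x2 && x3

def bmc(k):
    """Return a witness X0..Xk of I && T && ... && T && F, or None."""
    states = list(product([False, True], repeat=3))
    for path in product(states, repeat=k + 1):
        if (init(path[0])
                and all(trans(path[i], path[i + 1]) for i in range(k))
                and target(path[-1])):
            return path
    return None
```

For these formulas `bmc(0)` finds nothing, because I and F disagree on x2, while `bmc(1)` returns the single one-step witness in which x3 is true from the start. A real bounded model checker hands the same conjunction to a SAT solver rather than enumerating assignments.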
The Story of SAT
Propositional Satisfiability: Given a formula over Boolean variables, is there an assignment of 0/1’s to vars which makes the formula true?
Canonical NP-hard problem (Cook 1971)
Enormous progress in tools that can solve instances with 1000s of variables and millions of clauses

Year  Solver     Size
1952  Quine      ≈ 10 var
1960  DP         10 var
1962  DLL        ≈ 10 var
1986  BDDs       ≈ 100 var
1988  SOCRATES   ≈ 3k var
1992  GSAT       ≈ 300 var
1994  Hannibal   ≈ 3k var
1996  GRASP      1k var
1996  Stålmarck  ≈ 1000 var
1996  SATO       1k var
2001  Chaff      10k var
2002  Berkmin    10k var

Source: Malik [2004]
2000s: Model Checking of C code
Phase 1: Given a program P, build an abstract finite-state (Boolean) model A such that the set of behaviors of P is a subset of those of A (conservative abstraction)
Phase 2: Model check A wrt the specification: this can prove P to be correct, or reveal a bug in P, or suggest inadequacy of A
Shown to be effective on Windows device drivers in the Microsoft Research project SLAM

do {
    KeAcquireSpinLock();
    nPacketsOld = nPackets;
    if (request) {
        request = request->Next;
        KeReleaseSpinLock();
        nPackets++;
    }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();

Do the lock operations, acquire and release, strictly alternate on every program execution?
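The three possible outcomes of Phase 2 can be seen on this driver loop with a hand-built abstraction. The sketch below is an illustration under stated assumptions, not SLAM's algorithm: the abstract state is just "lock held or not" at the loop head, `request` is treated as a nondeterministic choice, and the optional predicate `equal` (standing for nPackets == nPacketsOld) is a choice made here for the example.

```python
def violation_reachable(track_predicate):
    """Explore the abstracted driver loop.

    Returns True if some abstract execution acquires a held lock or
    releases a free one. With track_predicate=False the loop condition
    is abstracted to a nondeterministic choice.
    """
    seen = set()
    stack = [False]              # lock state at the loop head; initially free
    while stack:
        locked = stack.pop()
        if locked in seen:
            continue
        seen.add(locked)
        if locked:
            return True          # KeAcquireSpinLock() on a held lock
        for request in (False, True):     # abstracts: if (request)
            held = True                   # after KeAcquireSpinLock()
            equal = True                  # after nPacketsOld = nPackets
            if request:
                held = False              # KeReleaseSpinLock()
                equal = False             # nPackets++
            if track_predicate:
                repeat_choices = [not equal]      # while (nPackets != nPacketsOld)
            else:
                repeat_choices = [False, True]    # loop condition unknown
            for repeat in repeat_choices:
                if repeat:
                    stack.append(held)
                elif not held:
                    return True  # final KeReleaseSpinLock() on a free lock
    return False
```

Without the predicate the model checker reports a violation that the real program cannot exhibit, which signals that the abstraction A is too coarse. With the predicate tracked, the abstract model has no violating execution, so the property holds for the program as well.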
Program Abstraction
Concrete program:
int x, y;
if x>0 {
    …
    y=x+1
    … }
else {
    …
    y=x+1
    … }

Predicate Abstraction
bx: x>0; by: y>0

Abstract program:
bool bx, by;
if bx {
    …
    by=true
    … }
else {
    …
    by={true,false}
    … }
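Why the abstract else-branch assigns {true,false} can be checked by running the concrete assignment over sample inputs. This is a small Python sanity check with hypothetical helper names; it tests the abstraction on a finite range of integers and is not a proof of soundness.

```python
def abstract_by(bx):
    """Values the abstract program may assign to by, given bx (x > 0)."""
    return {True} if bx else {True, False}

def concrete_by(x):
    """Truth value of the predicate y > 0 after the concrete y = x + 1."""
    y = x + 1
    return y > 0

def abstraction_is_sound(xs):
    """Every concrete outcome must be allowed by the abstract program."""
    return all(concrete_by(x) in abstract_by(x > 0) for x in xs)

def observed_by_values(xs, bx):
    """Concrete values of by seen on inputs whose predicate bx has the given value."""
    return {concrete_by(x) for x in xs if (x > 0) == bx}
```

On the else-branch both values really occur: x = 0 gives y = 1 > 0, while x = -1 gives y = 0. Knowing only that x > 0 is false, the abstract program cannot be more precise than a nondeterministic choice.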
Software Model Checking
• Tools for verifying source code combine many techniques
  - Program analysis techniques such as slicing, range analysis
  - Abstraction
  - Model checking
  - Refinement from counter-examples
• New challenges for model checking (beyond finite-state reachability analysis)
  - Recursion gives pushdown control
  - Pointers, dynamic creation of objects, inheritance …
• A very active and emerging research area
  - Abstraction-based tools: SLAM, BLAST, …
  - Direct state encoding: F-SOFT, CBMC, CheckFence …
Coming Up …
CheckFence Project at Penn
Concurrent Executions on Relaxed Memory Models
Analysis tool for Concurrent Data Types
Joint work with Sebastian Burckhardt and Milo Martin
Not covered: How to check that a Java midlet does not leak user-specified secrets (ongoing work with Pavol Cerny)
Challenge: Exploiting Concurrency, Correctly
Multi-threaded Software
Shared-memory Multiprocessor
Concurrent Executions
Bugs
Concurrency on Multiprocessors
Initially x = y = 0
thread 1: x = 1; y = 1
thread 2: r1 = y; r2 = x

Standard Interleavings
1. x = 1; y = 1; r1 = y; r2 = x      r1=r2=1
2. x = 1; r1 = y; y = 1; r2 = x      r1=0,r2=1
3. x = 1; r1 = y; r2 = x; y = 1      r1=0,r2=1
4. r1 = y; x = 1; y = 1; r2 = x      r1=0,r2=1
5. r1 = y; x = 1; r2 = x; y = 1      r1=0,r2=1
6. r1 = y; r2 = x; x = 1; y = 1      r1=r2=0

Can we conclude that if r1 = 1 then r2 must be 1?
No! On “real” multiprocessors, it is possible to have r1=1 and r2=0
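The six interleavings and their results can be reproduced mechanically. The Python sketch below enumerates every merge of the two threads that preserves each thread's own order and executes it against a single shared memory; the tuple encoding of instructions is an assumption made for the example.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

THREAD1 = [("store", "x", 1), ("store", "y", 1)]
THREAD2 = [("load", "y", "r1"), ("load", "x", "r2")]

def run(schedule):
    """Execute one interleaving and return the final (r1, r2)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r1"], regs["r2"]

def sc_outcomes():
    return [run(s) for s in interleavings(THREAD1, THREAD2)]
```

No interleaving yields r1=1 and r2=0. That outcome needs the two stores, or the two loads, to take effect out of program order, which an interleaving model cannot express.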
Architectures with Weak Memory Models
• A modern multiprocessor does not enforce a global ordering of all instructions, for performance reasons
• Lamport (1979): sequential consistency semantics for correctness of multiprocessor shared memory (like interleaving)
• Considered too limiting, and many “relaxations” proposed
  - In theory: TSO, RMO, Relaxed …
  - In practice: Alpha, Intel IA-32, IBM 370, Sun SPARC, PowerPC …
[Figure: cache and Main Memory]
Concurrency in Theory
CCS (1978)
CCS Syntax
  P := e | a.P | P+P | P||P | P\a
CCS Operational Semantics (sample rules)

  a.P -a-> P

  P -a-> P’
  -----------------
  P||Q -a-> P’||Q

  P -a-> P’
  -----------------
  Q||P -a-> Q||P’

  P -a-> P’; Q -a-> Q’
  ---------------------
  P||Q -t-> P’||Q’

Concurrency in Practice
Intel (2007)
Intel 64 memory ordering obeys the following principles
1. Loads are not reordered with other loads
2. Stores are not reordered with other stores
3. Stores are not reordered with older loads
4. Loads may be reordered with older stores to different locations but not with older stores to the same location
4 more rules + illustrative examples
Programming with Weak Memory Models
• Concurrent programming is already hard; shouldn’t the effects of weaker models be hidden from the programmer?
• Mostly yes …
  - Safe programming through extensive use of synchronization primitives
  - Use locks for every access to shared data
  - Compilers use memory fences to enforce ordering
• Not always …
  - Non-blocking data structures
  - Highly optimized library code for concurrency
  - Code for lock/unlock instructions
  - OS code managing process queues etc.
Non-blocking Queue (MS’96)
boolean_t dequeue(queue_t *queue, value_t *pvalue)
{
    node_t *head;
    node_t *tail;
    node_t *next;

    while (true) {
        head = queue->head;
        tail = queue->tail;
        next = head->next;
        if (head == queue->head) {
            if (head == tail) {
                if (next == 0)
                    return false;
                cas(&queue->tail, (uint32) tail, (uint32) next);
            } else {
                *pvalue = next->value;
                if (cas(&queue->head, (uint32) head, (uint32) next))
                    break;
            }
        }
    }
    delete_node(head);
    return true;
}

Callouts:
  Queue is possibly being updated concurrently
  Atomic compare-and-swap for synchronization
[Figure: linked list of nodes 1, 2, 3 with head and tail pointers]
[Diagram: layers of the concurrency stack, top to bottom]
Programs (multi-threaded)
Application level concurrency model: simple, usable by programmers
System-level code, concurrency libraries
Architecture-aware Concurrency Analysis
Architecture level concurrency model: complex, efficient use of parallelism
Highly parallel hardware -- multicores, SoCs
Software Model Checking for Concurrent Code on Multiprocessors
Why?: Real bugs in real code
• Opportunities
  - 10s to 100s of lines of low-level library C code
  - Hard to design and verify -> buggy
  - Effects of weak memory models, fences …
• Challenges
  - Lots of behaviors possible: high level of concurrency
  - How to formalize and reason about weak memory models?
Shared Memory Consistency Models
• A memory model specifies restrictions on what values a read from shared memory can return
• Program Order: x <p y if x and y are instructions belonging to the same thread and x appears before y
• Sequential Consistency (Lamport 79): a concurrent execution is correct if there exists a global order < of all accesses such that
  - If x <p y then x < y
  - Each load returns the value of the most recent store, according to <, to the same location (or the initial value, if no such store exists)
• Clean abstraction for programmers, but high implementation cost
Effect of Memory Model
Initially flag1 = flag2 = 0
thread 1:
  1. flag1 = 1;
  2. if (flag2 == 0) crit. sect.
thread 2:
  1. flag2 = 1;
  2. if (flag1 == 0) crit. sect.
Ensures mutual exclusion if the architecture supports the SC memory model
Most architectures do not enforce ordering of accesses to different memory locations
  - Does not ensure mutual exclusion under weaker models
Ordering can be enforced using “fence” instructions
  - Insert MEMBAR between lines 1 and 2 to ensure mutual exclusion
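One way to see the failure is to model each processor with a store buffer: a store first sits in the local buffer and reaches memory at some later point, so the following load can run ahead of it. The Python sketch below explores all executions of the two-flag protocol on such a machine. The store-buffer model is a simplification chosen for this example (it is not the formalization used later in the deck), and a fence is modeled as "the load must wait until the buffer is drained".

```python
def both_can_enter(fence):
    """Explore every execution of the two-flag protocol with store buffers.

    Per-thread state is (stage, buffered): stage 0 = before the store,
    1 = before the load, 2 = done. Returns True if both threads can read
    the other's flag as 0, i.e. both enter the critical section.
    """
    start = (((0, False), (0, False)), (0, 0), (None, None))
    stack, seen = [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        threads, mem, regs = state
        if regs == (0, 0):
            return True
        for i in (0, 1):
            stage, buffered = threads[i]
            steps = []
            if stage == 0:
                # flag_i = 1 goes into the local store buffer
                steps.append(((1, True), mem, regs))
            if buffered:
                # the buffered store becomes visible in memory
                new_mem = tuple(1 if j == i else mem[j] for j in (0, 1))
                steps.append(((stage, False), new_mem, regs))
            if stage == 1 and not (fence and buffered):
                # load the other thread's flag from memory
                new_regs = tuple(mem[1 - i] if j == i else regs[j]
                                 for j in (0, 1))
                steps.append(((2, buffered), mem, new_regs))
            for thread_state, new_mem, new_regs in steps:
                new_threads = tuple(thread_state if j == i else threads[j]
                                    for j in (0, 1))
                stack.append((new_threads, new_mem, new_regs))
    return False
```

Without the fence, both stores can still be sitting in their buffers when both loads execute, so both threads read 0. With the fence, whichever thread loads second is guaranteed to see the other thread's flag in memory.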
Weak Memory Models
• A large variety of models exist; a good starting point: Adve & Gharachorloo, “Shared Memory Consistency Models: A Tutorial”, IEEE Computer 96
• How to relax the memory order requirement?
  - Operations of the same thread to different locations need not be globally ordered
• How to relax the write atomicity requirement?
  - A read may return the value of a write not yet globally visible
• Uniprocessor semantics preserved
• Typically defined in architecture manuals (e.g. SPARC manual)
Which Memory Model should a Verifier use?
[Figure: hierarchy of memory models: RMO, PSO, TSO, 390, SC, Alpha, IA-32, Relaxed]
Formalization of Relaxed
• Program Order: x <p y if x and y are instructions belonging to the same thread and x appears before y
• A concurrent execution over a set X of accesses is correct wrt Relaxed if there exists a total order < over X such that
  1. If x <p y, and both x and y are accesses to the same address, and y is a store, then x < y must hold
  2. For a load l and a store s visible to l, either s and l have the same value, or there exists another store s’ visible to l with s < s’
  A store s is visible to load l if they are to the same address and either s < l or s <p l
• Constraint-based specification that can be easily encoded in logical formulas
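The two axioms can be checked by brute force on tiny executions. The Python sketch below tries every total order over the accesses and tests both conditions; it is meant only to make the definition concrete (CheckFence encodes the axioms in SAT instead). The tuple layout and function names are hypothetical, and the rule that a load with no visible store returns the initial value 0 is an assumption added here.

```python
from itertools import permutations

def _po(a, b):
    """Program order: same thread, a earlier than b."""
    return a[0] == b[0] and a[1] < b[1]

def _axiom1(accesses, rank):
    for i, x in enumerate(accesses):
        for j, y in enumerate(accesses):
            if (_po(x, y) and x[3] == y[3] and y[2] == "st"
                    and rank[i] > rank[j]):
                return False
    return True

def _axiom2(accesses, rank, initial):
    for j, load in enumerate(accesses):
        if load[2] != "ld":
            continue
        visible = [i for i, s in enumerate(accesses)
                   if s[2] == "st" and s[3] == load[3]
                   and (rank[i] < rank[j] or _po(s, load))]
        # Axiom 2 amounts to: the load returns the value of the
        # <-latest visible store (assumed: initial value if none).
        if visible:
            expected = accesses[max(visible, key=rank.get)][4]
        else:
            expected = initial
        if load[4] != expected:
            return False
    return True

def relaxed_allows(accesses, initial=0):
    """accesses: list of (thread, index_in_thread, kind, address, value)."""
    for order in permutations(range(len(accesses))):
        rank = {access: pos for pos, access in enumerate(order)}
        if _axiom1(accesses, rank) and _axiom2(accesses, rank, initial):
            return True
    return False

# thread 1: x = 1; y = 1      thread 2: r1 = y (reads 1); r2 = x (reads 0)
message_passing = [(1, 0, "st", "x", 1), (1, 1, "st", "y", 1),
                   (2, 0, "ld", "y", 1), (2, 1, "ld", "x", 0)]
# a single thread that stores 1 to x and then reads 0 from x
stale_own_read = [(1, 0, "st", "x", 1), (1, 1, "ld", "x", 0)]
```

Relaxed admits `message_passing`, the r1=1, r2=0 outcome that sequential consistency forbids, because the two stores go to different addresses and need not be ordered. It rejects `stale_own_read`, since a thread's own earlier store is always visible to its later load.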
[Diagram: CheckFence, with Memory Model Axioms as an input and three possible results]
Pass: all executions of the test are observationally equivalent to a serial execution
Fail:
Inconclusive: runs out of time or memory
How To Bound Executions
• Verify individual “symbolic tests”
  - finite number of concurrent threads
  - finite number of operations/thread
  - nondeterministic input values
• Example
    thread 1: enqueue(X)
    thread 2: dequeue() → Y
• User creates a suite of tests of increasing size
Why Symbolic Test Programs?
1) Make everything finite
  • State is unbounded (dynamic memory allocation)
    ... is bounded for an individual test
  • Checking sequential consistency is undecidable (AMP 96)
    ... is decidable for an individual test
2) Gives us a finite instruction sequence to work with
  • State space too large for an interleaved system model
    ... can directly encode value flow between instructions
  • Memory model specified by axioms
    ... can directly encode ordering axioms on instructions
Tool Architecture
[Diagram: inputs C code, Symbolic Test and Memory model; output Trace]
Symbolic test gives exponentially many executions (symbolic inputs, dynamic memory allocation, ordering of instructions).
CheckFence solves for “incorrect” executions.
  - construct a CNF formula whose solutions correspond precisely to the concurrent executions
  - automatic, lazy loop unrolling
  - automatic specification mining (enumerate correct observations)
Specification Mining
thread 1: enqueue(X); enqueue(Y)
thread 2: dequeue() → Z

Possible Operation-level Interleavings
1. enqueue(X); enqueue(Y); dequeue() -> Z
2. enqueue(X); dequeue() -> Z; enqueue(Y)
3. dequeue() -> Z; enqueue(X); enqueue(Y)

For each interleaving, obtain a symbolic constraint by encoding the corresponding executions in the SAT solver
Spec is the disjunction of all possibilities:
  Spec: (Z=X) | (Z=null)
To find bugs, check satisfiability of Phi & ~Spec, where Phi encodes all possible concurrent executions
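The mined specification for this test can be reproduced with a sequential reference queue. The Python sketch below replays each operation-level interleaving against `collections.deque` and collects the observed results. It uses the concrete tokens "X" and "Y" where CheckFence keeps the inputs symbolic, and the function names are hypothetical.

```python
from collections import deque

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def mine_spec(thread1, thread2):
    """Set of observation tuples produced by serial executions of the test."""
    observations = set()
    for schedule in interleavings(thread1, thread2):
        queue = deque()
        observed = []
        for op, arg in schedule:
            if op == "enqueue":
                queue.append(arg)
            else:
                # dequeue on an empty queue is observed as None (null)
                observed.append(queue.popleft() if queue else None)
        observations.add(tuple(observed))
    return observations

spec = mine_spec([("enqueue", "X"), ("enqueue", "Y")], [("dequeue", None)])
```

The result contains exactly the two observations Z=X and Z=null, matching the Spec on the slide. Any concurrent execution whose observation falls outside this set is a bug candidate.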
Encoding Memory Order
thread 1: s1 (store); s2 (store)
thread 2: l1 (load); l2 (load)
• Variables for encoding
  - Use boolean vars for relative order (x<y) of memory accesses
  - Use bitvector variables Ax and Dx for address and data values associated with memory access x
• Encode constraints
  - encode transitivity of memory order
  - encode ordering axioms of the memory model
    Example (for SC): (s1<s2) & (l1<l2)
  - encode value flow
    “Loaded value must match last value stored to same address”
    Example: value must flow from s1 to l1 under the following conditions:
    ((s1<l1)&(As1 = Al1)&((s2<s1)|(l1<s2)|(As2 != Al1))) -> (Ds1 = Dl1)
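The value-flow condition can be sanity-checked against a direct replay of the accesses. The Python sketch below enumerates every memory order of s1, s2, l1 and every choice of addresses from {0, 1}, and compares the premise of the formula with what l1 actually reads when the accesses are executed in that order. It is a brute-force check written for this example, not part of the CheckFence encoding.

```python
from itertools import permutations, product

def flow_condition(order, addr):
    """Premise of the slide's formula for value flow from s1 to l1."""
    def lt(x, y):
        return order.index(x) < order.index(y)
    return (lt("s1", "l1") and addr["s1"] == addr["l1"]
            and (lt("s2", "s1") or lt("l1", "s2")
                 or addr["s2"] != addr["l1"]))

def loaded_from_s1(order, addr):
    """Replay the accesses in memory order; does l1 read what s1 wrote?"""
    memory = {}
    for access in order:
        if access == "l1":
            return memory.get(addr["l1"]) == "s1"
        memory[addr[access]] = access   # tag each location with its last writer
    return False

def formula_matches_replay():
    for order in permutations(["s1", "s2", "l1"]):
        for a1, a2, al in product([0, 1], repeat=3):
            addr = {"s1": a1, "s2": a2, "l1": al}
            if flow_condition(order, addr) != loaded_from_s1(order, addr):
                return False
    return True
```

The premise holds exactly when s1 is the last store to l1's address before l1 in the memory order, which is why the formula may then force Ds1 = Dl1.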
Example: Memory Model Bug
Processor 1 links a new node into the list:
    ...
    3  node->value = 2;
    ...
    1  head = node;
    ...
Processor 2 reads the value at the head of the list:
    ...
    2  value = head->value;
    ...
[Figure: list with head pointer; memory accesses labeled 1, 2, 3]
Processor 1 reorders the stores! The memory accesses happen in order 1 2 3
--> Processor 2 loads an uninitialized value
Adding a fence between the two stores on Processor 1 prevents the reordering
Algorithms Analyzed
Type      Description          LOC  Source
Queue     Two-lock queue        80  M. Michael and L. Scott (PODC 1996)
Queue     Non-blocking queue    98  M. Michael and L. Scott (PODC 1996)
Set       Lazy list-based set  141  Heller et al. (OPODIS 2005)
Set       Nonblocking list     174  T. Harris (DISC 2001)
Deque     “snark” algorithm    159  D. Detlefs et al. (DISC 2000)
LL/VL/SC  CAS-based             74  M. Moir (PODC 1997)
LL/VL/SC  Bounded Tags         198  M. Moir (PODC 1997)
Results
• The snark algorithm has 2 known bugs
• The lazy list-based set had a previously unknown bug (missing initialization; missed by the formal correctness proof [CAV 2006] because of hand-translation of pseudocode)

Type      Description          Regular bugs
Queue     Two-lock queue
Queue     Non-blocking queue
Set       Lazy list-based set  1 unknown
Set       Nonblocking list
Deque     original “snark”     2 known
Deque     fixed “snark”
LL/VL/SC  CAS-based
LL/VL/SC  Bounded Tags
• Many failures on the relaxed memory model
  - inserted fences by hand to fix them
  - small testcases sufficient for this purpose
Columns: Type | Description | regular bugs | # Fences inserted (Store-Store, Load-Load, Dependent Loads, Aliased Loads)
Queue
Two-lock queue
1
Queue
Non-blocking queue
2
Set
Lazy list-based set
Set
Nonblocking list
Deque
original “snark”
Deque
fixed “snark”
1 unknown
1
4
1
2
1
3
1
2
3
4
6
2 known
4
2
LL/VL/SC CAS-based
3
LL/VL/SC Bounded Tags
4
Typical Tool Performance
• Very efficient on small testcases (< 100 memory accesses)
  Example (nonblocking queue): T0 = i (e | d), T1 = i (e | e | d | d)
  - finds counterexamples within a few seconds
  - verifies within a few minutes
  - enough to cover all 9 fences in the nonblocking queue
• Slows down with an increasing number of memory accesses in the test
  Example (snark deque):
  Dq = pop_l | pop_l | pop_r | pop_r | push_l | push_l | push_r | push_r
  - has 134 memory accesses (77 loads, 57 stores)
  - Dq finds the second snark bug within ~1 hour
• Does not scale past ~300 memory accesses
CheckFence Summary
• Software model checking of low-level concurrent software requires encoding of memory models
  - Challenge for model checking due to the high level of concurrency and axiomatic specifications
  - Opportunity to find bugs in library code that’s hard to design and verify
• CheckFence project at Penn
  - SAT-based bounded model checking for concurrent data types
  - Bugs in real code with fences
Ongoing Research
• What’s the best way to verify C code (on relaxed memory models)?
  - SAT-based encoding seems suitable to capture specifications of memory models, but there are many opportunities for improvement
  - Can one develop abstract operational models for multiprocessor architectures?
  - Proof methods for relaxed memory models
• Hardware support for transactional memory
  - Current interest in industry and architecture research
  - Can formal verification influence designs/standards?
[Diagram: software/model and correctness specification feed a Software Model Checker, which answers yes/proof or no/bug]
Impressive progress on an intractable problem
  Device drivers
  Concurrency libraries
  Buffer overflows in OS
  Network protocols …
Academic research with industrial impact
Ingredients for success
  SAT almost feasible
  Logic + Algorithms + Tools
  Focus on specific problems
  Scalability not necessary
  Flexibility in setting up the problem
Unmet challenge: Lack of robustness of tools -> lot of user expertise needed