Catching Bugs in Software Rajeev Alur Systems Design Research Lab University of Pennsylvania

Catching Bugs in Software
Rajeev Alur
Systems Design Research Lab
University of Pennsylvania
www.cis.upenn.edu/~alur/
Software Reliability
• Software bugs are pervasive
  - Bugs can be expensive
  - Bugs can cost lives
  - Bulk of development cost is in validation, testing, and bug fixes
• Old problem that just won’t go away
• Many approaches and decades of research
  - Systematic testing
  - Programming-language technology (e.g. types)
  - Formal methods (specification and verification)
Grand challenge for computer science: tools for designing “correct” software
[Figure: a software/model and a correctness specification are inputs to a Verifier, which answers either Yes/proof or No/bug]
• Correctness is formalized as a mathematical claim to be proved or falsified rigorously
  - Always with respect to the given specification
• A brief history of formal verification
1. Structured programs; Hoare logic; 1969
2. Network protocols; State-space search; 1990
3. Cache coherency protocols; Symbolic search; 1995
4. Device drivers; Automated abstraction; 2001
1. Program Verification
• Hoare logic for formalizing correctness of structured programs (late 1960s)
• Typical examples: sorting, graph algorithms
• Specification for sorting
  - Permute(A,B): array B is a permutation of the elements of array A
  - Sorted(A): for 0<i<n, A[i]<=A[i+1]
• Function sort is correct if the following holds
  - {True} B := sort(A) {Permute(A,B) & Sorted(B)}
• Provides a calculus for pre/post conditions of structured programs
Sample Proof: Bubble Sort
Key to proof: finding suitable loop invariants
BubbleSort (A : array[1..n] of int) {
  B = A : array[1..n] of int;
  for (i=0; i<n; i++) {
    // Invariant: Permute(A,B), Sorted(B[n-i,n]),
    //   for 0<k<=n-i-1 and n-i<=k’<=n: B[k]<=B[k’]
    for (j=0; j<n-i; j++) {
      // Invariant: Permute(A,B), Sorted(B[n-i,n]),
      //   for 0<k<=n-i-1 and n-i<=k’<=n: B[k]<=B[k’],
      //   for 0<k<j: B[k]<=B[j]
      if (B[j]>B[j+1]) swap(B,j,j+1)
    }
  };
  return B;
}
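The proof idea can be exercised at run time. The following is a minimal Python sketch, not the slide's proof: it uses 0-based indexing, and the helper names `permute` and `is_sorted` are chosen here to mirror Permute and Sorted. The loop invariants are written as assertions, so a violated invariant would raise immediately; passing assertions on test inputs is evidence, not a proof.

```python
from collections import Counter

def permute(a, b):
    """Permute(A,B): b is a permutation of the elements of a."""
    return Counter(a) == Counter(b)

def is_sorted(a):
    """Sorted(A): every element is <= its right neighbour."""
    return all(a[k] <= a[k + 1] for k in range(len(a) - 1))

def bubble_sort(a):
    b = list(a)
    n = len(b)
    for i in range(n):
        # Outer invariant: b is a permutation of a, the last i elements are
        # sorted, and every earlier element is <= each of those last i.
        assert permute(a, b)
        assert is_sorted(b[n - i:])
        assert all(x <= y for x in b[:n - i] for y in b[n - i:])
        for j in range(n - i - 1):
            # Inner invariant: b[j] is the maximum of b[0..j].
            assert all(b[k] <= b[j] for k in range(j))
            if b[j] > b[j + 1]:
                b[j], b[j + 1] = b[j + 1], b[j]
    return b

data = [5, 2, 9, 1, 5]
result = bubble_sort(data)
# Postcondition of the Hoare triple: Permute(A,B) & Sorted(B)
assert permute(data, result) and is_sorted(result)
```

Hoare logic establishes the same invariants for all inputs by deduction; the assertions only check them on the inputs that are actually run.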
Program Verification
• Powerful mathematical logic (e.g. first-order logic, higher-order logics) needed for formalization
  - Automation extremely difficult
  - Finding proof decomposition requires great expertise
• Alive and well, but not booming
• Contemporary theorem provers: HOL, PVS, ACL2 provide decision procedures and tactics for decomposition
  - Main applications: microprocessor verification, correctness of JVM…
2. Protocol Analysis
• Automated analysis of finite-state protocols
  - Network protocols, distributed algorithms
• Great progress in the last 20 years
  - Protocol modeled as communicating finite-state processes
  - Correctness specified using temporal logic
  - Verification performed automatically to reveal errors
  - Highly optimized state-space search techniques
• Model checker SPIN from Bell Labs
  - ACM Software Systems award (2001)
  - Success in finding high-quality bugs in real systems (NASA space shuttle, Lucent’s Pathstar switch)
Example: X.21 Communication Protocol
State-space Explosion!!
• Analysis is basically a reachability problem in a graph
  - Nodes are states, where each state gives values of all the variables of all the communicating processes
  - An edge represents execution of a single action of one of the processes (asynchronous communication)
• Size of graph grows exponentially with the number of bits required for state encoding, but…
  - Graph is constructed only incrementally, on-the-fly
  - Clever hashing and state compaction techniques
  - Many techniques for exploiting structure: symmetry, data independence, partial order reduction…
  - Millions of states can be explored quickly to reveal bugs
• Great flexibility in modeling
  - Abstract many details, simplify
  - Scale down parameters (buffer size, number of network nodes…)
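To make the on-the-fly search concrete, here is a minimal Python sketch of explicit-state reachability. It is not SPIN's algorithm: it omits state compaction and all reductions, and the two-process toy protocol (`make_successors`, with its `guarded` flag) is a hypothetical example invented for illustration. Successor states are generated only when a state is dequeued, and a hash table of visited states prevents re-exploration.

```python
from collections import deque

def find_bug(initial, successors, is_bad):
    """Breadth-first search; returns a shortest path to a bad state, or None."""
    parent = {initial: None}            # hash table of visited states
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            path = []
            while state is not None:    # rebuild the counterexample
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):   # successors are generated on demand
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

def make_successors(guarded):
    """Toy protocol: each process cycles idle -> want -> crit -> idle."""
    order = {"idle": "want", "want": "crit", "crit": "idle"}
    def successors(state):
        for p in (0, 1):                # one process moves per step
            if guarded and state[p] == "want" and state[1 - p] == "crit":
                continue                # entry blocked while the other is critical
            nxt = list(state)
            nxt[p] = order[state[p]]
            yield tuple(nxt)
    return successors

def both_critical(state):
    return state == ("crit", "crit")

trace = find_bug(("idle", "idle"), make_successors(guarded=False), both_critical)
print(trace)   # a shortest interleaving that violates mutual exclusion
print(find_bug(("idle", "idle"), make_successors(guarded=True), both_critical))
```

Breadth-first order is used here so that the reported counterexample is as short as possible; depth-first search with the same visited table would use less memory per frontier but return longer traces.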
3. Symbolic Model Checking
• Constraint-based analysis of Boolean systems
  - Cache coherency protocols, memory controllers,…
• Active in the past 12 years
  - Symbolic Boolean representations (propositional formulas, BDDs) used to encode system dynamics
  - Correctness specified using temporal logic CTL
  - Fix-point computation over state sets
  - Highly optimized memory management
• Model checker SMV from CMU
  - ACM Kanellakis Theory and Practice Award (1999)
  - Success in finding high-quality bugs in hardware applications (VHDL/Verilog code)
Cache consistency: Gigamax
Real design of a distributed multiprocessor
[Figure: Gigamax architecture, with components labeled Global bus, UIC, M, P, and Cluster bus]
Read-shared/read-owned/write-invalid/write-shared/…
Deadlock found using SMV
Similar successes: IEEE Futurebus+ standard, IBM/Intel/Motorola…
Symbolic Reachability Problem
Model variables X ={x1, … xn}
Each var is of finite type, say, boolean
Initialization: I(X) condition over X
Update: T(X,X’)
How new vars X’ are related to old vars X as a result of
executing one step of the program
Target set: F(X)
Computational problem:
Can F be satisfied starting with I by repeatedly applying T ?
Graph Search problem
Symbolic Solution
Data type: region to represent state-sets
R := I(X)
Repeat
  If R intersects F report “yes”
  else if R contains Post(R) report “no”
  else R := R union Post(R)
Post(R(X)) = (Exists X. R(X) and T(X,X’))[X’ -> X]
Operations needed: union, intersection, test for inclusion/emptiness, projection, renaming
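The fixpoint loop can be sketched in Python with `frozenset` standing in for the region data type. This is an assumption made for illustration: a real symbolic model checker would represent regions as BDDs or formulas instead of enumerating states, but the loop structure and the required operations (union, intersection, inclusion) are the same. The counter example at the end is hypothetical.

```python
def post(region, transitions):
    """Post(R): the set of states reachable from R in exactly one step."""
    return frozenset(t for s in region for t in transitions(s))

def reachable(init, transitions, target):
    region = frozenset(init)             # R := I
    target = frozenset(target)
    while True:
        if region & target:              # R intersects F
            return True
        image = post(region, transitions)
        if image <= region:              # R contains Post(R): fixpoint reached
            return False
        region = region | image          # R := R union Post(R)

# Hypothetical system: a 3-bit counter that adds 2 modulo 8, starting at 0.
def add_two(s):
    return [(s + 2) % 8]

print(reachable({0}, add_two, {6}))   # even values are reachable
print(reachable({0}, add_two, {5}))   # odd values are not
```

Termination follows because the region grows strictly in every iteration that does not return, and the state space is finite.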
Binary Decision Diagrams
Popular representations for Boolean functions
[Figure: BDD with decision nodes a, b, c, d, edges labeled 0 and 1, and terminal nodes 0 and 1]
Like a decision graph
No redundant nodes
No isomorphic subgraphs
Variables tested in fixed order
Function: (a and b) or (c and d)
Key properties:
Canonical!
Size depends on choice of ordering of variables
Operations such as union/intersection are efficient
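The two reduction rules and the ordering sensitivity can be seen in a small reduced ordered BDD. The sketch below is a teaching implementation written for this purpose, not the data structure of SMV or any BDD package; the class and method names (`BDD`, `mk`, `apply`, `size`, `build`) are chosen here. Nodes are integers, with 0 and 1 as terminals, and a unique table guarantees that equal functions get equal node numbers.

```python
class BDD:
    """Reduced ordered BDD; nodes 0 and 1 are the terminals."""

    def __init__(self, order):
        self.level = {v: i for i, v in enumerate(order)}
        self.nodes = [None, None]        # index -> (var, low, high)
        self.unique = {}                 # (var, low, high) -> index

    def mk(self, var, low, high):
        if low == high:                  # rule: no redundant nodes
            return low
        key = (var, low, high)
        if key not in self.unique:       # rule: no isomorphic subgraphs
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, name):
        return self.mk(name, 0, 1)

    def _cofactors(self, f, var):
        if f > 1 and self.nodes[f][0] == var:
            return self.nodes[f][1], self.nodes[f][2]
        return f, f                      # f does not test var at its root

    def apply(self, op, f, g, memo=None):
        """Combine f and g with a binary Boolean operator."""
        memo = {} if memo is None else memo
        if f <= 1 and g <= 1:
            return int(op(bool(f), bool(g)))
        if (f, g) not in memo:
            top = min((self.nodes[h][0] for h in (f, g) if h > 1),
                      key=self.level.get)
            f0, f1 = self._cofactors(f, top)
            g0, g1 = self._cofactors(g, top)
            memo[f, g] = self.mk(top, self.apply(op, f0, g0, memo),
                                 self.apply(op, f1, g1, memo))
        return memo[f, g]

    def size(self, f):
        """Number of decision nodes reachable from f."""
        seen, stack = set(), [f]
        while stack:
            h = stack.pop()
            if h > 1 and h not in seen:
                seen.add(h)
                stack.extend(self.nodes[h][1:])
        return len(seen)

def build(order):
    bdd = BDD(order)
    a, b, c, d = (bdd.var(v) for v in "abcd")
    AND, OR = (lambda x, y: x and y), (lambda x, y: x or y)
    f = bdd.apply(OR, bdd.apply(AND, a, b), bdd.apply(AND, c, d))
    g = bdd.apply(OR, bdd.apply(AND, d, c), bdd.apply(AND, b, a))
    return bdd, f, g

bdd, f, g = build("abcd")
print(f == g, bdd.size(f))     # same function, same node: canonical
bdd2, f2, _ = build("acbd")
print(bdd2.size(f2))           # a different ordering gives a larger graph
```

Because the representation is canonical, equivalence of two functions reduces to comparing node numbers, which is what makes the inclusion and emptiness tests of symbolic search cheap.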
Symbolic Search Techniques
• Size of BDDs can explode during search, and is quite unpredictable
  - Years of research leading to a plethora of heuristics
• Significant industrial interest
  - In-house groups: Cadence, Synopsys, IBM, NEC…
  - Commercial model checkers/verification consultants
• Recent focus: SAT solvers
  - Checking whether F can be reached within k steps can be formulated as satisfiability of a propositional formula with nk variables
  - Extremely fast solvers such as zChaff (from Princeton) can solve problems with 1000 vars fast!
  - SAT + BDD can be combined to great effect
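The bounded reachability question can be written out as a formula over one copy of the state variables per time step: I(X0), and T(Xi, Xi+1) for each i < k, and F(Xi) for some i. The Python sketch below checks that formula by brute-force enumeration of all assignments, which is an assumption made only to keep the example self-contained; a real tool would hand the formula to a SAT solver. The 2-bit counter is a hypothetical example, and this encoding uses n(k+1) Boolean variables.

```python
from itertools import product

def bounded_reach(n, k, init, trans, target):
    """Search for states x0..xk (n-bit tuples) with init(x0), trans(xi, xi+1)
    for every i < k, and target(xi) for some i. Returns the path or None."""
    for bits in product((False, True), repeat=n * (k + 1)):
        xs = [bits[i * n:(i + 1) * n] for i in range(k + 1)]
        if (init(xs[0])
                and all(trans(xs[i], xs[i + 1]) for i in range(k))
                and any(target(x) for x in xs)):
            return xs
    return None

# Hypothetical system: a 2-bit counter, x[0] is the low bit.
def value(x):
    return int(x[0]) + 2 * int(x[1])

def is_zero(x):
    return value(x) == 0

def increments(x, y):
    return value(y) == (value(x) + 1) % 4

def is_three(x):
    return value(x) == 3

print(bounded_reach(2, 2, is_zero, increments, is_three))  # bound too small
print(bounded_reach(2, 3, is_zero, increments, is_three))  # witness of length 3
```

A negative answer only says that no path of length at most k exists; increasing k or a separate completeness argument is needed to conclude unreachability.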
4. Software Model Checking via Abstraction
• Can we apply model checking to C programs?
  - SPIN approach is fine for analyzing models, but constructing models is expensive, and models have no relation to code
• Given a program P, build an abstract finite-state (Boolean) model A such that the set of behaviors of P is a subset of those of A (conservative abstraction)
  - Basic ideas around for a while, but all components put together effectively only recently by the Microsoft Research team in the SLAM project
  - Shown to be effective on Windows device drivers, Linux source code (about 10K lines of code)
Program Abstraction
int x, y;
if x>0 {
  …
  y:=x+1
  …
}
else {
  …
  y:=x+1
  …
}
Predicate Abstraction
bx: x>0; by: y>0
bool bx, by;
if bx {
  …
  by:=true
  …
}
else {
  …
  by:={true,false}
  …
}
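The abstract assignments can be reproduced by asking, for each value of bx, which values of by are possible after y := x+1. The Python sketch below answers that by enumerating a finite range of integers, which is a stand-in chosen for illustration and is not sound in general; tools of this kind query a theorem prover instead. The function name and the sample range are hypothetical.

```python
def abstract_assignment(bx, samples=range(-50, 51)):
    """Possible values of by (y > 0) after y := x + 1, given bx (x > 0)."""
    outcomes = set()
    for x in samples:
        if (x > 0) == bx:        # concrete states consistent with bx
            y = x + 1
            outcomes.add(y > 0)  # value of predicate by afterwards
    return outcomes

print(abstract_assignment(True))    # then-branch: by := true
print(abstract_assignment(False))   # else-branch: by := {true, false}
```

In the else-branch both outcomes occur (x = 0 gives y > 0, x = -1 does not), which is why the abstraction must assign a nondeterministic value there.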
Verification Example
Does this code obey the locking spec?
Specification: [Figure: automaton with states Unlocked, Locked, and Error, and transitions labeled Acq and Rel]
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();
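The locking specification can be read as a small monitor that consumes Acq and Rel events. The transition table in this Python sketch is an interpretation of the state and edge labels (Unlocked, Locked, Error, Acq, Rel): acquiring twice or releasing twice leads to Error. The event traces are hypothetical runs of the loop written out by hand.

```python
# (state, event) -> next state; assumed reading of the specification automaton
TRANSITIONS = {
    ("Unlocked", "Acq"): "Locked",
    ("Locked", "Rel"): "Unlocked",
    ("Locked", "Acq"): "Error",      # double acquire
    ("Unlocked", "Rel"): "Error",    # double release
}

def run_monitor(events, state="Unlocked"):
    """Return the specification state after the given event sequence."""
    for event in events:
        if state == "Error":         # Error is absorbing
            break
        state = TRANSITIONS[(state, event)]
    return state

# One iteration that takes the if-branch, then one that skips it and exits.
print(run_monitor(["Acq", "Rel", "Acq", "Rel"]))
# A loop that iterates again without having released the lock.
print(run_monitor(["Acq", "Acq"]))
```

The verification question is whether any execution of the C code produces an event sequence that drives this monitor to Error.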
Initial Abstraction
do {
  KeAcquireSpinLock();
  if(*){
    KeReleaseSpinLock();
  }
} while (*);
KeReleaseSpinLock();
[Figure: program points annotated with the reachable specification states U (Unlocked), L (Locked), and E (Error)]
Model checking the boolean program using BDDs
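The initial abstraction is small enough to explore exhaustively. The Python sketch below uses explicit breadth-first search over pairs (program counter, lock state) instead of BDDs, purely to keep it short; the program-counter numbering is an assumption introduced here, and each `*` becomes a nondeterministic choice.

```python
from collections import deque

def lock_step(lock, event):
    """Lock specification: U = unlocked, L = locked, E = error."""
    if event == "acq":
        return "L" if lock == "U" else "E"
    return "U" if lock == "L" else "E"   # event == "rel"

def successors(state):
    pc, lock = state
    if lock == "E":
        return []
    if pc == 0:                          # KeAcquireSpinLock();
        return [(1, lock_step(lock, "acq"))]
    if pc == 1:                          # if (*): take or skip the branch
        return [(2, lock), (3, lock)]
    if pc == 2:                          # KeReleaseSpinLock(); inside the if
        return [(3, lock_step(lock, "rel"))]
    if pc == 3:                          # while (*): loop again or exit
        return [(0, lock), (4, lock)]
    if pc == 4:                          # final KeReleaseSpinLock();
        return [(5, lock_step(lock, "rel"))]
    return []                            # pc == 5: end of program

def find_error(initial):
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state[1] == "E":
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

print(find_error((0, "U")))   # abstract error trace: skip the if, loop, acquire again
```

The trace exists in the abstraction because `*` forgets the relation between nPackets and nPacketsOld; whether it corresponds to a real execution is a separate question.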
Feasibility Analysis
Is the error path feasible in the C program? Requires a theorem prover for constraint propagation.
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();
[Figure: the abstract error path through the program, annotated with states U, L, and E]
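The feasibility question can be illustrated on one candidate error path: acquire the lock, skip the if-branch, go around the loop, and acquire again. That choice of path is an assumption made for this sketch. The Python code below checks the path conditions by enumerating small concrete values, a stand-in for the theorem prover that is only meant to show which constraints conflict.

```python
def error_path_feasible(values=range(-3, 4)):
    """Can a concrete run skip the if-branch and still repeat the loop?"""
    for n_packets in values:
        for request in (0, 1):
            n_packets_old = n_packets        # nPacketsOld = nPackets;
            if request:                      # the path assumes the branch is skipped
                continue
            if n_packets != n_packets_old:   # loop condition needed to iterate again
                return True
    return False

print(error_path_feasible())   # the two constraints contradict each other
```

The conflict is between nPacketsOld == nPackets, established by the assignment, and nPackets != nPacketsOld, required by the loop condition, which suggests tracking that equality as a predicate.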
Predicate Discovery
Add new predicate to boolean program: b : (nPacketsOld == nPackets)
New techniques
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;            // boolean program: b = true;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;                      // boolean program: b = b ? false : *;
  }
} while (nPackets != nPacketsOld);   // boolean program: !b
KeReleaseSpinLock();
[Figure: the abstract error path through the program, annotated with states U, L, and E]
Revised Abstraction
b : (nPacketsOld == nPackets)
do {
  KeAcquireSpinLock();
  b = true;
  if(*){
    KeReleaseSpinLock();
    b = b ? false : *;
  }
} while ( !b );
KeReleaseSpinLock();
[Figure: program points annotated with the reachable combinations of b and lock state, such as b L, b U, and !b U]
Model checking the refined boolean program
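With the predicate b added, the Error state is no longer reachable. The Python sketch below explores the refined boolean program explicitly; the program-counter numbering and the choice to start with b unknown (both values) are assumptions made here, and a tool such as SLAM would use BDD-based model checking instead of this enumeration.

```python
from collections import deque

def lock_step(lock, event):
    """Lock specification: U = unlocked, L = locked, E = error."""
    if event == "acq":
        return "L" if lock == "U" else "E"
    return "U" if lock == "L" else "E"   # event == "rel"

def successors(state):
    pc, lock, b = state
    if lock == "E":
        return []
    if pc == 0:                          # KeAcquireSpinLock();
        return [(1, lock_step(lock, "acq"), b)]
    if pc == 1:                          # b = true;
        return [(2, lock, True)]
    if pc == 2:                          # if (*)
        return [(3, lock, b), (4, lock, b)]
    if pc == 3:                          # KeReleaseSpinLock(); b = b ? false : *;
        unlocked = lock_step(lock, "rel")
        new_b = [False] if b else [False, True]
        return [(4, unlocked, v) for v in new_b]
    if pc == 4:                          # while (!b)
        return [(0, lock, b)] if not b else [(5, lock, b)]
    if pc == 5:                          # final KeReleaseSpinLock();
        return [(6, lock_step(lock, "rel"), b)]
    return []                            # pc == 6: end of program

def reachable_states(initial_states):
    seen = set(initial_states)
    queue = deque(initial_states)
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

states = reachable_states([(0, "U", True), (0, "U", False)])
print(any(lock == "E" for _, lock, _ in states))   # is Error reachable?
```

Since the abstraction is conservative, the absence of an abstract error trace shows that the C code obeys the locking specification.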
Abstraction Based Techniques
• Tools for verifying source code combine many techniques
  - Program analysis techniques such as slicing
  - Abstraction
  - Model checking
  - Refinement from counter-examples
• New challenges for model checking (beyond finite-state reachability analysis)
  - Recursion gives pushdown control
  - Pointers, dynamic creation of objects, inheritance…
• A very active and emerging research area
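Research in Formal Methods
[Figure: verification pipeline in which software is turned into a model and correctness into a specification; both are inputs to a Verifier, which produces a proof or a bug. Research topics are attached to each part:]
• Model: modeling languages (hierarchy, recursion; real-time, hybrid; stochastic)
• Bridging the gap between software and model: model extraction; model-based design (from models to code)
• Verifier: decision procedures, algorithms engineering, automated abstraction, compositional analysis
• Specification: temporal logics, automata, from requirements to specs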
Current Research Projects
• Foundations
  - Analysis of context-free models
  - Stochastic hybrid systems
  - Decision problems for timed automata
• Algorithms Engineering
  - Combining SAT, BDDs, Abstraction
  - Symbolic solutions to games
• Model-based design
  - From hybrid automata to embedded software
  - From state-machine models to Java card policies
• Software verification for Java classes
Classical Model Checking
• Both model M and specification S are regular (finite-state)
  - M as a generator of all possible behaviors
  - S as an acceptor of “good” behaviors (verification is language inclusion of M in S) or as an acceptor of “bad” behaviors (verification is checking emptiness of intersection of M and S)
• Typical specifications (using automata or temporal logic)
  - Safety: Always not (both P1 and P2 have write-exclusive copy)
  - Liveness: Always (if P1 requests, eventually it gets response)
• Robustness of theory of regular languages helps in many ways
  - M can be product of several components (closure under intersection)
• For liveness properties, one needs to consider automata over infinite words, but the corresponding theory of omega-regular languages is well developed and well understood
Boolean Programs
main() {
  bool y;
  …
  x = P(y);
  …
  z = P(x);
  …
}
bool P(u: bool) {
  …
  return Q(u);
}
bool Q(w: bool) {
  if …
  else return P(~w)
}
Recursive State Machines
[Figure: recursive state machine with components A1, A2, and A3; labels mark an entry-point, an exit-point, and a box (superstate)]
Model Checking of Recursive Models
• Control-flow requires stack, so model M defines a context-free language
• Algorithms exist for checking regular specifications against context-free models
  - Emptiness of pushdown automata is solvable
  - Product of a regular language and a context-free language is context-free
• But, checking context-free spec against a context-free model is undecidable!
  - Context-free languages are not closed under intersection
  - Inclusion as well as emptiness of intersection undecidable
Are Context-free Specs Interesting?
• Classical Hoare-style pre/post conditions
  - If p holds when procedure A is invoked, q holds upon return
  - Total correctness: every invocation of A terminates
  - Integral part of emerging standard JML
• Stack inspection properties (security/access control)
  - If a variable x is being accessed, procedure A must be in the call stack
• Above requires matching of calls with returns, or finding unmatched calls
  - Recall: language of words over [, ] such that brackets are well matched is not regular, but context-free
Caret for Context-free Specifications
• Caret: Temporal Logic of Calls and Returns [AEM03]
  - Context-free extension of Pnueli’s Linear Temporal Logic LTL
  - Allows specification of pre/post conditions
  - Allows specification of stack inspection properties
• Main result: Checking Caret specifications against a context-free model is decidable
  - Polynomial in the size of the model and exponential in the size of formula (as in case of classical model checking)
  - Proof technique: Product of pushdown model M and Caret specification S is again a pushdown automaton
  - Key to success: The notion of calls and returns is the same for M as well as S
Caret Definition
Interpreted over “structured” words in which positions are marked with calls { and returns }
[Figure: a structured word with positions labeled p, q, r and with calls { and returns }, annotated with p’ = Always(p or q) and q’ = Next(q)]
Caret provides classical temporal operators such as Next and Always
Caret Abstract Operators
Abstract versions of operators jump from a call to the matching return
[Figure: the structured word with positions labeled p, q, r and with calls { and returns }, annotated with p’ = abstract-always(p or q) and q’ = abstract-next(q)]
Sample specification: pre/post: Always( p & call -> abstract-next q )
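The jump from a call to its matching return can be computed with a stack. The Python sketch below checks the sample pre/post specification on a finite structured word; this is a simplification made for illustration, since Caret is defined over infinite words and has more operators. Treating a call with no matching return as a violation is an assumption of this sketch, and the word encoding as (kind, set of propositions) pairs is hypothetical.

```python
def matching_returns(word):
    """Map each call position to the position of its matching return."""
    stack, match = [], {}
    for pos, (kind, _props) in enumerate(word):
        if kind == "call":
            stack.append(pos)
        elif kind == "ret" and stack:
            match[stack.pop()] = pos
    return match

def check_pre_post(word, p, q):
    """Always(p & call -> abstract-next q), checked on a finite word."""
    match = matching_returns(word)
    for pos, (kind, props) in enumerate(word):
        if kind == "call" and p in props:
            ret = match.get(pos)
            if ret is None or q not in word[ret][1]:
                return False
    return True

word = [
    ("call", {"p"}),   # outer call, precondition p holds
    ("int",  {"r"}),
    ("call", {"r"}),   # inner call, p does not hold, so no obligation
    ("ret",  {"p"}),   # matches the inner call
    ("ret",  {"q"}),   # matches the outer call, postcondition q holds
]
print(matching_returns(word))
print(check_pre_post(word, "p", "q"))
```

A plain Next operator would look at position 1 after the outer call; the abstract version skips the whole invocation and looks at position 4, which is what a pre/post condition needs.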
Visibly Pushdown Languages [AM03]
• Subclass of context-free languages that is suitable for program analysis / algorithmic verification
• Alphabet is structured: symbols are tagged with calls and returns
• A visibly pushdown automaton’s moves are constrained by input
  - If current symbol is a call, it must push
  - If current symbol is a return, it must pop
  - Else it can only update control state
• Class of languages defined by these automata is very robust
  - Closed under union, intersection, complement, Kleene-*
  - Emptiness, inclusion, equivalence decidable
  - Alternative characterizations: embeddings of regular tree languages, Monadic Second Order theory with a binary matching predicate
• Caret is a subset of visibly pushdown languages
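The input-driven stack discipline can be sketched as a small interpreter. This Python version is a simplified, deterministic variant written for illustration: it rejects returns on an empty stack and accepts only well-matched words, whereas the automata of [AM03] are more general. The transition-table layout and the example language (every call is closed by a return with the same name) are assumptions of the sketch.

```python
def run_vpa(word, delta_call, delta_ret, delta_int, start, accepting):
    """Run a deterministic visibly pushdown automaton on (kind, symbol) pairs."""
    state, stack = start, []
    for kind, sym in word:
        if kind == "call":                      # a call must push
            step = delta_call.get((state, sym))
            if step is None:
                return False
            state, pushed = step
            stack.append(pushed)
        elif kind == "ret":                     # a return must pop
            if not stack:
                return False                    # simplification: no unmatched returns
            state = delta_ret.get((state, sym, stack.pop()))
            if state is None:
                return False
        else:                                   # internal: control state only
            state = delta_int.get((state, sym))
            if state is None:
                return False
    return state in accepting and not stack

# Example language: each call f or g is closed by a return of the same name.
delta_call = {("q", "f"): ("q", "F"), ("q", "g"): ("q", "G")}
delta_ret = {("q", "f", "F"): "q", ("q", "g", "G"): "q"}
delta_int = {("q", "x"): "q"}

good = [("call", "f"), ("call", "g"), ("int", "x"), ("ret", "g"), ("ret", "f")]
bad = [("call", "f"), ("ret", "g")]
print(run_vpa(good, delta_call, delta_ret, delta_int, "q", {"q"}))
print(run_vpa(bad, delta_call, delta_ret, delta_int, "q", {"q"}))
```

Because the input alone decides when the stack grows or shrinks, two such automata reading the same word always have stacks of the same height, which is the intuition behind closure under intersection.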
Synthesis of Behavioral Interfaces
• Behavioral type of a class specifies the allowed sequences of method calls
• Type for a file class may be (open; (read+open)*;close)*
• Can we synthesize this type automatically?
  - Given source code for the class implementation
  - Construct a regular language over the method calls so that a particular exception is never raised
• This is useful for compositional verification also: behavioral interface is a suitable abstraction of the class
• Proposed route (ongoing project)
  - Use abstraction to get a finite-state model
  - Solve a symbolic game to get the most general strategy for invoking methods to keep the abstract model “safe”
  - Extract interface type from the game solution
AbstractList.ListItr
public Object next() {
  …
  lastRet = cursor++;
  …
}
public Object prev() {
  …
  lastRet = cursor;
  …
}
public void remove() {
  if (lastRet==-1)
    throw new IllegalExc();
  …
  lastRet = -1;
  …
}
public void add(Object o) {
  …
  lastRet = -1;
  …
}
Behavioral Interface
[Figure: interface automaton with states Start, Safe, and Unsafe, and transitions labeled next, prev, add, and remove]
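A behavioral interface of this kind can be written as a small automaton over method names. The Python sketch below is derived by hand from the lastRet updates in the ListItr code, not synthesized: "Unsafe" stands for lastRet == -1 and "Safe" for any other value. It assumes that the iterator starts with lastRet == -1 (so the Start state is merged with Unsafe) and that next and prev themselves never throw.

```python
def step(state, method):
    """Abstract effect of one method call on the lastRet predicate."""
    if method in ("next", "prev"):
        return "Safe"                    # lastRet is set to a valid index
    if method == "add":
        return "Unsafe"                  # lastRet = -1
    if method == "remove":
        # remove throws when lastRet == -1, otherwise resets lastRet to -1
        return "Unsafe" if state == "Safe" else "Exception"
    raise ValueError(f"unknown method: {method}")

def allowed(calls, state="Unsafe"):
    """True if the call sequence never raises the exception."""
    for method in calls:
        state = step(state, method)
        if state == "Exception":
            return False
    return True

print(allowed(["next", "remove", "prev", "remove"]))
print(allowed(["next", "add", "remove"]))
```

The set of sequences accepted by `allowed` is a regular language over the method names, which is the form of behavioral type the synthesis aims to produce.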
Game in Abstracted Program
[Figure: game graph over the abstracted program, with method-call edges such as next and prev]
From black states, Player0 gets to choose the input method call
From purple states, Player1 gets to choose a path in the abstract program until the call returns
Objective for Player0: ensure error states (from which the exception can be raised) are avoided
Winning strategy: correct sequences of method calls
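A safety game of this shape can be solved by a backward fixpoint from the error states. The Python sketch below works on an explicit game graph, not a symbolic one, and assumes full information; the tiny example game (state names such as `run_remove_unsafe`) is hypothetical and only loosely modeled on an iterator with next and remove.

```python
def solve_safety_game(p0_moves, p1_moves, error):
    """p0_moves: state -> {method: successor}; p1_moves: state -> [successors].
    Returns the states from which Player1 can force an error, and the safe
    method choices of Player0 in every other Player0 state."""
    losing = set(error)
    changed = True
    while changed:
        changed = False
        for s, succs in p1_moves.items():
            # Player1 needs only one path that leads to trouble.
            if s not in losing and any(t in losing for t in succs):
                losing.add(s)
                changed = True
        for s, moves in p0_moves.items():
            # Player0 loses only if every available call leads to trouble.
            if s not in losing and all(t in losing for t in moves.values()):
                losing.add(s)
                changed = True
    strategy = {s: sorted(m for m, t in moves.items() if t not in losing)
                for s, moves in p0_moves.items() if s not in losing}
    return losing, strategy

p0_moves = {
    "unsafe": {"next": "run_next", "remove": "run_remove_unsafe"},
    "safe": {"next": "run_next", "remove": "run_remove_safe"},
}
p1_moves = {
    "run_next": ["safe"],
    "run_remove_safe": ["unsafe"],
    "run_remove_unsafe": ["error"],
}
losing, strategy = solve_safety_game(p0_moves, p1_moves, {"error"})
print(sorted(losing))
print(strategy)
```

The returned strategy lists every safe call in each state, so it is the most general one; reading it as an automaton over method names gives a candidate behavioral interface.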
Challenges
• Techniques for generating finite-state abstractions
• How to solve large games symbolically?
  - In fact, a partial information game (Player0 should choose the next method call only based on values returned so far)
• How to construct an understandable behavioral type from the winning strategy?
• Abstraction refinement
  - If Player0 does not invoke any method, exceptions can never be raised
  - How to refine the current abstraction based on quality of current behavioral type?
• Integrating all these into a working tool