STRESS06-rajamani

Use of Models in Analysis and
Design
Sriram K. Rajamani
Rigorous Software Engineering
Microsoft Research, India
Models
• Abstractions of reality
• All branches of science and engineering
use models. Some examples:
– Differential equations
– State machines
• Models enable conquering complexity
– Allow focus on one issue at a time, while
ignoring others
Models in software engineering
• Mainstream mantra:
– “Code is truth, and only truth”
• Models are used, but not widely
– Requirements capturing (UML):
• Used in specialized domains: telecom, automotive,
embedded systems
– Development tools:
• Testing and verification
– Design
• Model driven development
This talk
• Use of models in analysis and design
– Personal experience
• Analysis:
– Extracting analyzable models from source code using
iterative refinement
• Design:
– My assessment of state of the art and important
research problems
Models in Analysis
Software Validation
• Large scale reliable software is hard to
build and test.
• Different groups of programmers write
different components.
• Integration testing is a nightmare.
Property Checking
• Programmer provides redundant partial
specifications
• Code is automatically checked for
consistency
• Different from proving whole program
correctness
– Specifications are not complete
Interface Usage Rules
• Rules in documentation
– Incomplete, unenforced, wordy
– Order of operations & data
access
– Resource management
• Disobeying rules causes bad
behavior
– System crash or deadlock
– Unexpected exceptions
– Failed runtime checks
Does a given usage rule hold?
• Checking this is computationally
impossible!
• Equivalent to solving Turing’s halting
problem (undecidable)
• Even restricted computable versions of the
problem (finite state programs) are
prohibitively expensive
Why bother?
Just because a problem is undecidable, it
doesn’t go away!
Automatic property checking =
Study of tradeoffs
• Soundness vs completeness
– Missing errors vs reporting false alarms
• Annotation burden on the programmer
• Complexity of the analysis
– Local vs Global
– Precision vs Efficiency
– Space vs Time
Broad classification
• Underapproximations
– Testing
• After passing testing, a program may still violate a
given property
• Overapproximations
– Type checking
• Even if a program satisfies a property, the type
checker for the property could still reject it
Current trend
• Confluence of techniques from
different fields:
– Model checking
– Automatic theorem proving
– Program analysis
• Significant emphasis on practicality
• Several new projects in academia
and industry
Model Checking
• Algorithmic exploration of state space of the
system
• Several advances in the past decade:
– symbolic model checking
– symmetry reductions
– partial order reductions
– compositional model checking
– bounded model checking using SAT solvers
• Most hardware companies use a model checker in
the validation cycle
enum {N, T, C} state[1..2]
int turn
init
state[1] = N; state[2] = N
turn = 0
trans
state[i] = N & turn = 0 -> state[i] = T; turn = i
state[i] = N & turn != 0 -> state[i] = T
state[i] = T & turn = i -> state[i] = C
state[i] = C & state[3-i] = N -> state[i] = N
state[i] = C & state[3-i] != N -> state[i] = N; turn = 3-i
[Figure: state graph of the model. Nodes shown: (N1,N2, turn=0), (T1,N2, turn=1), (C1,N2, turn=1), (N1,T2, turn=2), (T1,T2, turn=1), (T1,T2, turn=2), (C1,T2, turn=1), (N1,C2, turn=2), (T1,C2, turn=2)]
N = noncritical, T = trying, C = critical
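The two-process model above is small enough to explore exhaustively. The following is a minimal sketch of explicit-state reachability in Python, not the algorithm of any particular model checker. It assumes that the "other" process of i is 3 - i (for i in 1..2) and otherwise follows the transition rules as written; the names `successors`, `reachable` and `mutual_exclusion` are illustrative.

```python
from collections import deque

N, T, C = "N", "T", "C"  # noncritical, trying, critical


def successors(state):
    """Yield every state reachable in one step from (state1, state2, turn)."""
    procs, turn = [state[0], state[1]], state[2]
    for i in (1, 2):
        me, other = procs[i - 1], procs[(3 - i) - 1]
        if me == N and turn == 0:
            step = (T, i)            # start trying and claim the turn
        elif me == N:
            step = (T, turn)         # start trying, turn already taken
        elif me == T and turn == i:
            step = (C, turn)         # enter the critical section
        elif me == C and other == N:
            step = (N, turn)         # leave, nobody is waiting
        elif me == C:
            step = (N, 3 - i)        # leave and hand the turn over
        else:
            continue                 # process i has no enabled rule
        new_procs = procs.copy()
        new_procs[i - 1] = step[0]
        yield (new_procs[0], new_procs[1], step[1])


def reachable(init):
    """Breadth-first exploration of the state space from init."""
    seen, frontier = {init}, deque([init])
    while frontier:
        for nxt in successors(frontier.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


states = reachable((N, N, 0))
# The property of interest: both processes are never critical at once.
mutual_exclusion = all(not (s1 == C and s2 == C) for s1, s2, _ in states)
print(sorted(states), mutual_exclusion)
```

The set `states` is an inductive invariant of the model: it contains the initial state and is closed under `successors`. Because the rules are transcribed literally, the computed set may contain states that the figure does not draw.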
Model Checking
• Strengths
– Fully automatic (when it works)
– Computes inductive invariants
• I such that F(I) ⊆ I
– Provides error traces
• Weaknesses
– Scale
– Operates only on models
• How do you get from the program to the
model?
Theorem proving
– Early theorem provers were proof checkers
• They were built to support assertional reasoning in
the Hoare-Dijkstra style
• Cumbersome and hard to use
– Greg Nelson’s thesis in the early 80s paved the
way for automatic theorem provers
• Theory of equality with uninterpreted functions
• Theory of lists
• Theory of linear arithmetic
• Combination of the above!
– Automatic theorem provers based on
Nelson’s work are widely used
• ESC
• Proof Carrying Code
Theory of Equality.
• Symbols: =, ≠, f, g, …
• Axiomatically defined:
– Reflexivity: E = E
– Symmetry: if E2 = E1 then E1 = E2
– Transitivity: if E1 = E2 and E2 = E3 then E1 = E3
– Congruence: if E1 = E2 then f(E1) = f(E2)
• Example of a satisfiability problem:
g(g(g(x))) = x ∧ g(g(g(g(g(x))))) = x ∧ g(x) ≠ x
• Satisfiability problem decidable in O(n log n)
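The example formula can be decided by congruence closure: merge the terms that are asserted equal, propagate equalities through function applications, then check the disequalities. The following is a naive Python sketch of that idea under stated assumptions. It is quadratic per pass rather than the O(n log n) procedure mentioned above, terms are encoded as nested tuples, and the names `satisfiable` and `g` are illustrative.

```python
def subterms(term, acc):
    """Collect term and all of its subterms into the set acc."""
    acc.add(term)
    if isinstance(term, tuple):          # ("f", arg1, arg2, ...)
        for arg in term[1:]:
            subterms(arg, acc)
    return acc


def satisfiable(equalities, disequalities):
    """Decide a conjunction of equalities and disequalities over terms."""
    terms = set()
    for a, b in equalities + disequalities:
        subterms(a, terms)
        subterms(b, terms)
    parent = {t: t for t in terms}       # union-find forest

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in equalities:
        parent[find(a)] = find(b)

    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:                       # propagate the congruence axiom
        changed = False
        for s in apps:
            for t in apps:
                same_symbol = s[0] == t[0] and len(s) == len(t)
                if (same_symbol and find(s) != find(t)
                        and all(find(p) == find(q)
                                for p, q in zip(s[1:], t[1:]))):
                    parent[find(s)] = find(t)
                    changed = True
    return all(find(a) != find(b) for a, b in disequalities)


def g(term, times=1):
    """Apply the uninterpreted function g to term the given number of times."""
    for _ in range(times):
        term = ("g", term)
    return term


x = "x"
# g(g(g(x))) = x and g(g(g(g(g(x))))) = x and g(x) != x
slide_formula = satisfiable([(g(x, 3), x), (g(x, 5), x)], [(g(x), x)])
print(slide_formula)
```

The formula is unsatisfiable: the two equalities force g(g(x)) = x and then g(x) = x, which contradicts the disequality. Dropping the second equality leaves a satisfiable formula, since g can then be a permutation of order three.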
a : array [1..len] of int;
int max := -MAXINT;
i := 1;
{ ∀ 1 ≤ j < i. a[j] ≤ max }      (loop invariant)
while (i ≤ len)
  if (a[i] > max)
    max := a[i];
  i := i+1;
endwhile
{ ∀ 1 ≤ j ≤ len. a[j] ≤ max }    (postcondition)
On loop exit: (∀ 1 ≤ j < i. a[j] ≤ max) ∧ (i > len) ⇒ (∀ 1 ≤ j ≤ len. a[j] ≤ max)
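The annotated loop can be made executable by turning each annotation into a runtime assertion. This is only a sketch in Python: a theorem prover would discharge the obligations for all inputs, whereas assertions check them on the inputs actually run. The value of `MAXINT` and the name `find_max` are assumptions, and the array is 0-based here, so a[j] on the slide is `a[j - 1]`.

```python
MAXINT = 2**31 - 1  # assumed bound; the slide does not fix a value


def find_max(a):
    """Return the maximum of a, checking the slide's annotations as it runs.

    Assumes every element of a is at least -MAXINT.
    """
    length = len(a)
    maximum = -MAXINT
    i = 1  # 1-based index, as on the slide

    def invariant():
        # forall 1 <= j < i. a[j] <= max
        return all(a[j - 1] <= maximum for j in range(1, i))

    assert invariant()          # holds on entry: the range of j is empty
    while i <= length:
        if a[i - 1] > maximum:
            maximum = a[i - 1]
        i += 1
        assert invariant()      # preserved by each iteration
    # invariant and (i > len) together imply the postcondition
    assert all(a[j - 1] <= maximum for j in range(1, length + 1))
    return maximum


print(find_max([3, 9, 2]))
```

The three assertions correspond to the three proof obligations of a loop invariant: it holds initially, it is preserved by the loop body, and together with the negated guard it implies the postcondition.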
Automatic theorem proving
• Strengths
– Handles unbounded domains naturally
– Good implementations for
• equality with uninterpreted functions
• linear inequalities
• combination of theories
• Weaknesses
– Hard to compute fixpoints
– Requires inductive invariants
• Pre and post conditions
• Loop invariants
Program analysis
• Originated in optimizing compilers
– constant propagation
– live variable analysis
– dead code elimination
– loop index optimization
• Type systems use similar analysis
• Are the type annotations consistent?
Program analysis
• Strengths
– Works on code
– Pointer aware
– Integrated into compilers
– Precision efficiency tradeoffs well studied
• flow (in)sensitive
• context (in)sensitive
• Weaknesses
– Abstraction is hardwired and done by the
designer of the analysis
– Not targeted at property checking
(traditionally)
Model Checking, Theorem
Proving and Program Analysis
• Very related to each other
• Different histories
– different emphasis
– different tradeoffs
• Complementary, in some ways
• Combination can be extremely powerful
What is the key design challenge in a model
checker for software?
It is the model!
Model Checking Hardware
Primitive values are booleans
States are boolean vectors of fixed size
Models are finite state machines !!
Characteristics of Software
Primitive values are more complicated
– Pointers
– Objects
Control flow (transition relation) is more complicated
– Functions
– Function pointers
– Exceptions
States are more complicated
– Unbounded graphs over values
Variables are scoped
– Locals
– Shared scopes
Much richer modularity constructs
– Functions
– Classes
Traditional approach
[Diagram: Source code (sequential C program) → model → FSM (finite state machines) → model checker]
SLAM
[Diagram: Source code (sequential C program: C data structures, pointers, procedure calls, parameter passing, scoping, control flow) → automatic abstraction → boolean program (push down model; the labels "FSM" and "finite state machines" also appear) → model checker (data flow analysis implemented using BDDs)]
Computing power doubles every 18
months
-Gordon Moore
An optimizing compiler doubles
performance every 18 years
-Todd Proebsting
When I use a model checker, it runs and
runs for ever and never comes back…
when I use a static analysis tool, it
comes back immediately and says “I
don’t know”
- Patrick Cousot
Static Driver Verifier
[Diagram: Static Driver Verifier workflow. Labels: Rules; Read for understanding; New API rules; Development; Precise API Usage Rules (SLIC); Defects; Drive testing tools; Software Model Checking; 100% path coverage; Source Code; Testing]
SLAM – Software Model Checking
• SLAM innovations
– boolean programs: a new model for software
– model creation (c2bp)
– model checking (bebop)
– model refinement (newton)
• SLAM toolkit
– built on MSR program analysis infrastructure
SLIC
• Finite state language for stating rules
– monitors behavior of C code
– temporal safety properties
– familiar C syntax
• Suitable for expressing control-dominated
properties
– e.g. proper sequence of events
– can encode data values inside state
State Machine for Locking
state {
  enum {Locked,Unlocked}
    s = Unlocked;
}
[Diagram: Unlocked –Acq→ Locked; Locked –Rel→ Unlocked; Unlocked –Rel→ Error; Locked –Acq→ Error]
Locking Rule in
SLIC
KeAcquireSpinLock.entry {
if (s==Locked) abort;
else s = Locked;
}
KeReleaseSpinLock.entry {
if (s==Unlocked) abort;
else s = Unlocked;
}
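The locking rule is a two-state monitor that aborts on a double acquire or a double release. A minimal Python sketch of the same monitor follows, applied to a recorded trace of events rather than checked statically as SLAM does. The class and function names and the event labels "Acq" and "Rel" are illustrative, not part of SLIC.

```python
class RuleViolation(Exception):
    """Raised where the SLIC rule would execute abort."""


class LockingRule:
    def __init__(self):
        self.s = "Unlocked"                 # initial monitor state

    def acquire(self):                      # KeAcquireSpinLock.entry
        if self.s == "Locked":
            raise RuleViolation("acquire while Locked")
        self.s = "Locked"

    def release(self):                      # KeReleaseSpinLock.entry
        if self.s == "Unlocked":
            raise RuleViolation("release while Unlocked")
        self.s = "Unlocked"


def first_violation(trace):
    """Return the index of the first event that breaks the rule, or None."""
    rule = LockingRule()
    handlers = {"Acq": rule.acquire, "Rel": rule.release}
    for index, event in enumerate(trace):
        try:
            handlers[event]()
        except RuleViolation:
            return index
    return None


print(first_violation(["Acq", "Rel", "Acq", "Acq"]))
```

A trace checker like this only sees the executions it is given, which is the underapproximation discussed earlier; the point of SLAM is to check the same monitor against all paths of the program.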
The SLAM Process
[Diagram: prog. P + SLIC rule → slic → prog. P’ → c2bp → boolean program → bebop → path → newton → predicates → c2bp]
Example
Does this code obey the locking rule?
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();
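Example
Model checking boolean program (bebop)
do {
  KeAcquireSpinLock();
  if(*){
    KeReleaseSpinLock();
  }
} while (*);
KeReleaseSpinLock();
[Slide annotations: lock states U, L and E marked beside the program points; E appears, so an error state is reached in this abstraction]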
Example
Is error path feasible in C program? (newton)
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();
[Slide annotations: lock states U, L and E of the abstract error path, overlaid on the C code]
Example
Add new predicate b : (nPacketsOld == nPackets) to boolean program (c2bp)
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;   b = true;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;   b = b ? false : *;
  }
} while (nPackets != nPacketsOld);   !b
KeReleaseSpinLock();
[Slide annotations: lock states U, L and E carried over from the previous slide; the updates to b are shown beside the C statements they abstract]
Example
Model checking refined boolean program (bebop)
b : (nPacketsOld == nPackets)
do {
  KeAcquireSpinLock();
  b = true;
  if(*){
    KeReleaseSpinLock();
    b = b ? false : *;
  }
} while ( !b );
KeReleaseSpinLock();
[Slide annotations: values of b / !b paired with lock states U and L beside the program points, with the earlier U, L and E marks still shown]
Example
Model checking refined boolean program (bebop)
b : (nPacketsOld == nPackets)
do {
  KeAcquireSpinLock();
  b = true;
  if(*){
    KeReleaseSpinLock();
    b = b ? false : *;
  }
} while ( !b );
KeReleaseSpinLock();
[Slide annotations: values of b / !b paired with lock states U and L beside the program points; no E appears]
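The effect of the refinement can be reproduced with a few lines of exhaustive search. The sketch below is a hand-written Python encoding of the two boolean programs from the example, not the output of c2bp and not bebop's algorithm. With `track_b=False` the loop condition is the nondeterministic `*`; with `track_b=True` it is `!b`. The function name and the "U"/"L" encoding of the lock state are assumptions made for illustration.

```python
def error_reachable(track_b):
    """Explore every path of the abstract loop; True if the rule can abort."""
    seen, work = set(), ["U"]          # lock state at the loop head
    while work:
        lock = work.pop()
        if lock in seen:
            continue
        seen.add(lock)
        if lock == "L":                # KeAcquireSpinLock() while Locked
            return True
        b = True                       # lock is now Locked; b = true
        outcomes = [("L", b)]          # if (*) not taken
        # if (*) taken: KeReleaseSpinLock() is safe here (lock is Locked),
        # then nPackets++ is abstracted as b = b ? false : *
        for new_b in ([False] if b else [False, True]):
            outcomes.append(("U", new_b))
        for lock_after, b_after in outcomes:
            continues = [not b_after] if track_b else [False, True]
            for cont in continues:
                if cont:
                    work.append(lock_after)   # back to the loop head
                elif lock_after == "U":       # final release while Unlocked
                    return True
    return False


print(error_reachable(track_b=False), error_reachable(track_b=True))
```

Without the predicate, the search finds error paths that the C program cannot execute; with b tracked, the branch that keeps the lock must exit the loop and the branch that releases it must iterate again, so no abort is reachable.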
Observations about SLAM
• Automatic discovery of invariants
– driven by property and a finite set of (false) execution paths
– predicates are not invariants, but observations
– abstraction + model checking computes inductive invariants
(boolean combinations of observations)
• A hybrid dynamic/static analysis
– newton executes path through C code symbolically
– c2bp+bebop explore all paths through abstraction
• A new form of program slicing
– program code and data not relevant to property are dropped
– non-determinism allows slices to have more behaviors
Current status of SDV
• Runs on 100s of
Windows drivers
• Finds several bugs,
proves several properties
• SDV now transferred
from MSR to Windows
division
• Used to check several
DDK and inbox drivers
• Beta Released at
WINHEC 2005!
Static Driver Verifier
• Driver: Parallel port device driver
• Rule: Checks that driver dispatch routines do not call
IoCompleteRequest(…) twice on the I/O request
packet passed to it by the OS or another driver
[Screenshot: Static Driver Verifier output with Call #1 and Call #2 marked]
SLAM/SDV History (with Tom Ball)
• 1999-2001
– foundations, algorithms, prototyping
– papers in CAV, PLDI, POPL, SPIN, TACAS
• March 2002
– Bill Gates review
• May 2002
– Windows committed to hire two Ph.D.s in model
checking to support Static Driver Verifier
• July 2002
– running SLAM on 100+ drivers, 20+ properties
• September 3, 2002
– made initial release of SDV to Windows (friends and family)
• April 1, 2003
– made wide release of SDV to Windows (any internal driver
developer)
• September, 2003
– team of six in Windows working on SDV
– researchers moving into “consultant” role
• November, 2003
– demonstration at Driver Developer Conference
• May, 2005
– Beta ships at WinHEC 2005!
SLAM
• Boolean program model has proved itself
• Successful for domain of device drivers
– control-dominated safety properties
– few boolean variables needed to do proof or find real
counterexamples
• Counterexample-driven refinement
– terminates in practice
– incompleteness of theorem prover not an issue
What is hard?
• Abstracting
– from a language with pointers (C)
– to one without pointers (boolean programs)
• All side effects need to be modeled by
copying (as in dataflow)
• Open environment problem
What stayed fixed?
• Boolean program model
• Basic tool flow
• Repercussions:
– newton has to copy between scopes
– c2bp has to model side-effects by value-result
– finite depth precision on the heap is all
boolean programs can handle
What changed?
• Interface between newton and c2bp
• We now use predicates for doing more
things
• refine alias precision via aliasing predicates
• newton helps resolve pointer aliasing imprecision
in c2bp
Model Checking, Theorem
Proving and Program Analysis
• Very related to each other
• Different histories
– different emphasis
– different tradeoffs
• Complementary, in some ways
• Combination can be extremely powerful
What worked well?
• Specific domain problem
• Safety properties
• Shoulders & synergies
• Separation of concerns
• Summer interns & visitors
• Strategic partnership with Windows
Predictions
• The holy grail of full program verification
has been abandoned. It will probably
remain abandoned
• Less ambitious tools like powerful type
checkers will emerge and become more
widely used
• These tools will exploit ideas from various
analysis disciplines
• Tools will alleviate the “chicken-and-egg”
problem of writing specifications
Further Reading
See papers, slides from:
http://research.microsoft.com/slam
http://research.microsoft.com/~sriram
Glossary
Model checking
Checking properties by systematic exploration of the state-space of a
model. Properties are usually specified as state machines, or using
temporal logics
Safety properties
Properties whose violation can be witnessed by a finite run of the system.
The most common safety properties are invariants
Reachability
Specialization of model checking to invariant checking. Properties are
specified as invariants. Most common use of model checking. Safety
properties can be reduced to reachability.
Boolean programs
“C”-like programs with only boolean variables. Invariant checking and
reachability are decidable for boolean programs.
Predicate
A Boolean expression over the state-space of the program, e.g. (x < 5)
Predicate abstraction
A technique to construct a boolean model from a system using a given set
of predicates. Each predicate is represented by a boolean variable in the
model.
Weakest precondition
The weakest precondition of a set of states S with respect to a statement T
is the largest set of states from which executing T, when terminating,
always results in a state in S.
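The predicate abstraction entry can be illustrated on a single statement. The sketch below takes the predicate b : (nPacketsOld == nPackets) and the statement nPackets++ from the locking example and finds, by brute force over a small range of integers, which values b can have after the statement for each value it had before. A tool such as c2bp would instead reason with weakest preconditions and a theorem prover; the enumeration, the range of values and the names used here are assumptions made for illustration.

```python
def abstract_update(predicate, statement, concrete_states):
    """Map each pre-value of the predicate to its possible post-values."""
    post = {True: set(), False: set()}
    for state in concrete_states:
        post[predicate(state)].add(predicate(statement(state)))
    return post


def b(state):
    """The predicate nPacketsOld == nPackets over (nPacketsOld, nPackets)."""
    n_packets_old, n_packets = state
    return n_packets_old == n_packets


def increment(state):
    """The statement nPackets++."""
    n_packets_old, n_packets = state
    return (n_packets_old, n_packets + 1)


# A small sample of concrete states (nPacketsOld, nPackets).
sample = [(old, new) for old in range(-3, 4) for new in range(-3, 4)]
result = abstract_update(b, increment, sample)
print(result)
```

On this sample, b being true before the increment forces it to be false afterwards, while b being false leaves both outcomes possible. That is the abstract statement b = b ? false : * used in the refined boolean program.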