ESP: Program Verification of Millions of Lines of Code
Manuvir Das
Researcher
PPRC Reliability Team
Microsoft Research
Motivation
No Buffer Overruns!
Approach
 Redundancy is good
 Redundancy exposes inconsistency
 Inconsistency points to errors
 Compare
  what the programmer should do
  what her code actually does
Lightweight specifications
 Rules
  Describe correct behavior
  Readable/writable by programmers
 Specify limited properties
  not total correctness/verification
Compare rules against code
Types are rules
 Programmers use types to
  document interface syntax
  represent program abstractions
 Types are written, read and checked
  routine part of development process
Why are types successful?
 types
are lightweight specifications
 type checking is fast & routine
 errors are found early, at compile-time
Can we extend this approach?
 Specify and check other properties
  languages to express rules
  tools to check that code obeys rules
 Goal is partial correctness
  detect and report important classes of errors
  no guarantee of program correctness
 Systematic tools of various flavors
  compile-time verifiers and bug-finders
  run-time monitors and fault injectors
  document generators
Rule-based programming
[Figure: rules are read by developers for understanding, drive testing tools, and, together with new API rules, feed a precise static verification tool (a program analysis engine with 100% path coverage) that checks source code and reports defects.]
ESP
[Figure: rules written in OPAL and C/C++ code are the inputs to ESP, a path-sensitive dataflow analysis with 100% path coverage, which reports defects.]
Requirements
 Scalability
  Complete coverage
  Millions of lines of code
  All features of C/C++
 Usability
  Low number of false positives
  Simple rule description language
  Informative error reports
The bottom line
 Can ESP verify a million lines of code?
 We’re not sure… yet
 We’ve done 150 KLOC in 70 seconds and 50 MB
 So, we’re cautiously optimistic
Are we running into a wall?
 Verification demands precision
  Need to minimize false error reports
  Must analyze each execution path
 Big programs demand scalability
  Exponentially/infinitely many paths
  Cannot analyze each execution path
  Must use approximate analysis
Research problem
 Can we invent a verification method that
  is always conservative,
  is always scalable,
  is almost always precise, and
  matches our intuition?
 Yes, for a certain class of rules
  Finite state, temporal safety properties
Finite state safety properties
 Property is described by an FSA
 As the program executes, a monitor
  tracks the current state of the FSA
  updates the current state
  signals an error when the FSA transitions into special error states
 Goal of verification:
  Is there some execution path that would cause the monitor to signal an error?
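Example: stdio usage in gcc
void main ()
{
  if (dump)
    fil = fopen(dumpFile,"w");   /* Open */
  if (p)
    x = 0;
  else
    x = 1;
  if (dump)
    fclose(fil);                 /* Close */
}
[Figure: property FSA with states Closed, Opened, Error. Open leads to Opened; Print: Opened -> Opened; Close: Opened -> Closed; Print/Close: Closed -> Error; *: Error -> Error.]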
Path-sensitive property analysis
 Symbolically evaluate the program
 Track FSA state and execution state
 At branch points:
  Execution state implies branch direction?
   Yes: process appropriate branch
   No: split state and process both branches
Example
 entry: [Closed]
 dump = T: [Closed|dump=T]
 Open: [Opened|dump=T]
 p = T: [Opened|dump=T,p=T]; x = 0: [Opened|dump=T,p=T,x=0]
 p = F: [Opened|dump=T,p=F]; x = 1: [Opened|dump=T,p=F,x=1]
 dump: each state implies dump = T, so only the T branch is processed
 Close: [Closed|dump=T,p=T,x=0] and [Closed|dump=T,p=F,x=1]
 exit (only the dump = T paths are shown)
Dataflow property analysis
 Track only FSA state
 Ignore non-state-changing code
 At control flow join points:
  Accumulate FSA states
Example
 entry: {Closed}
 after if (dump) Open: {Closed,Opened}
 after if (p) x = 0; else x = 1: {Closed,Opened}
 after if (dump) Close: {Error,Closed,Opened}
 exit
Why is this code correct?
void main ()
{
  if (dump)
    Open;
  if (p)
    x = 0;
  else
    x = 1;
  if (dump)
    Close;
}
[Figure: the stdio property FSA (Closed, Opened, Error), repeated from the earlier example.]
When is a branch relevant?
 Precise answer
  When the value of the branch condition determines the property FSA state
 Heuristic answer
  When the property FSA is driven to different states along the arms of the branch statement
Property simulation
 Modification of path-sensitive analysis
 At control flow join points:
  States agree on property FSA state?
   Yes: merge states
   No: process states separately
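Example
 entry: [Closed]
 after if (dump) Open: [Opened|dump=T] and [Closed|dump=F] (FSA states differ, kept separate)
 p = T arm: [Opened|dump=T,p=T,x=0]; p = F arm: [Opened|dump=T,p=F,x=1]
 join after if (p): the two Opened states merge into [Opened|dump=T]; [Closed|dump=F] stays separate
 if (dump) Close: [Opened|dump=T] takes the T branch to [Closed|dump=T]; [Closed|dump=F] takes the F branch
 exit: [Closed|dump=T] and [Closed|dump=F] merge into [Closed]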
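Loop example
 entry: [Closed]
 loop body: Open and new = old give [Opened|new=old]
 * = T: Close; new++ give [Closed|new=old+1]
 * = F: [Opened|new=old]
 loop test new != old: [Closed|new=old+1] implies T and returns to the loop head; [Opened|new=old] implies F and leaves the loop
 after the loop: Close gives [Closed|new=old]
 exit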
Making property simulation work
 Real programs are complex
  Multiple FSAs
  Aliasing
 Real code bases are very large
  Well beyond a million lines
ESP = Property Simulation + Multiple FSAs + Aliasing + Component-wise Analysis
Problem: Multiple FSAs
void main ()
{
  if (dump1)
    fil1 = fopen(dumpFile1,"w");   /* Open(fil1) */
  if (dump2)
    fil2 = fopen(dumpFile2,"w");   /* Open(fil2) */
  if (dump1)
    fclose(fil1);                  /* Close(fil1) */
  if (dump2)
    fclose(fil2);                  /* Close(fil2) */
}
[Figure: the stdio property FSA (Closed, Opened, Error).]
Source code pattern     Transition
e = fopen(_)            Open
fclose(e)               Close
Property simulation, bit by bit
 Problem: property state can be exponential
 Solution: track one FSA at a time
Tracking fil1 (ID: no transition of the tracked FSA):
void main ()
{
  if (dump1)
    Open;
  if (dump2)
    ID;
  if (dump1)
    Close;
  if (dump2)
    ID;
}
Tracking fil2:
void main ()
{
  if (dump1)
    ID;
  if (dump2)
    Open;
  if (dump1)
    ID;
  if (dump2)
    Close;
}
Property simulation, bit by bit
One FSA at a time
+ Avoids exponential property state
+ Fewer branches are relevant
+ Lifetimes are often short
+ Smaller memory footprint
+ Embarrassingly parallel
− Cannot correlate FSAs
Problem: Aliasing
void main ()
{
  if (dump1)
    fil1 = fopen(dumpFile1,"w");
  if (dump2)
    fil2 = fopen(dumpFile2,"w");
  fil3 = fil1;
  if (dump1)
    fclose(fil3);
  if (dump2)
    fclose(fil2);
}
ESP Model: Values Have State
 During execution, the program
  creates stateful values
  changes the state of stateful values
 The programmer defines
  how values are created (syntactic patterns)
  how values change state (syntactic patterns)
 Syntactic expressions are aliases for values
OPAL Rule Descriptions
 Object Property Automata Language
State Closed
State Opened
State Error
Initial Event Open { _object_ ASTFUNCTIONCALL
  { ASTSYMBOL "fopen" } { _anyargs_ } }
Event Close { ASTFUNCTIONCALL
  { ASTSYMBOL "fclose" } { _object_ } }
Transition _ -> Opened on Open
Transition Opened -> Closed on Close
Transition Closed -> Error on Close "File already closed"
Parameterized transitions
void main ()
{
  if (dump1) {
    t1 = fopen(dumpFile1,"w"); Open(t1); fil1 = t1;
  }
  if (dump2) {
    t2 = fopen(dumpFile2,"w"); Open(t2); fil2 = t2;
  }
  fil3 = fil1;
  if (dump1) {
    fclose(fil3); Close(fil3);
  }
  if (dump2) {
    fclose(fil2); Close(fil2);
  }
}
Expressions are value aliases
Value-alias analysis
 Is expression e an alias for value v?
 ESP uses GOLF (Generalized One Level Flow) to answer this query
  Context-sensitive
  Largely flow-insensitive
  Millions of lines of code, in seconds
Putting it all together
 Property simulation
  Identify and track relevant execution state
 Syntactic patterns + value-alias analysis
  Identify and isolate individual FSAs
 One FSA at a time
  Bit vector analysis for safety properties
Case study: stdio usage in gcc
 cc1 from gcc version 2.5.3 (SPEC95)
 Does cc1 always print to opened files?
 cc1 is a complex program:
  140K non-blank, non-comment lines of C
  2149 functions, 66 files, 1086 globals
  Call graph includes one strongly connected component (SCC) of 450 functions
Skeleton of cc1 source
FILE *f1, … , *f15;
int p1, … , p15;

void compileFile() {
  if (p1)
    f1 = fopen(…);
  …
  if (p15)
    f15 = fopen(…);
  restOfComp();
  if (p1)
    fclose(f1);
  …
  if (p15)
    fclose(f15);
}

void restOfComp() {
  if (p1)
    printRtl(f1);
  …
  if (p15)
    printRtl(f15);
  restOfComp();
}

void printRtl(FILE *f) {
  fprintf(f, …);
}
OPAL rules for stdio usage
State Uninit
State Closed
State Opened
State Error
Initial Event Decl {ASTDECLARATION
  {_object_ ASTSYMBOL _any_}}
Initial Event Open {_object_ ASTFUNCTIONCALL
  {ASTSYMBOL "fopen"} {_anyargs_}}
Event Print {ASTFUNCTIONCALL
  {ASTSYMBOL "fprintf"} {_object_,_anyargs_}}
Event Close {ASTFUNCTIONCALL
  {ASTSYMBOL "fclose"} {_object_}}
Transition _ -> Uninit on Decl
Transition _ -> Opened on Open
Transition Uninit -> Error on Print "File not opened"
Transition Opened -> Opened on Print
Transition Closed -> Error on Print "Printing to closed file"
Transition Opened -> Closed on Close
Transition Closed -> Error on Close "File already closed"
Experimental results
 Precision
  Verification succeeds for every file handle
  No transitions to Error; no false errors
 Scalability
  Average per handle: 72.9 seconds, 49.7 MB
  Single 1 GHz Pentium III laptop with 512 MB RAM
 We have proved that:
  Each of the 646 calls to fprintf in the source code prints to a valid, open file
Ongoing research
 Path-sensitive value-alias analysis
  Value-alias sets
   Expressions that hold the tracked value
  Track value-alias sets during simulation
  Add value-alias sets to property state
  When things get complicated, use GOLF
 Component-wise analysis
  Identify and analyze components
  Link using less precise analysis