“A System and Language for Building System-Specific, Static Analyses”

CMSC 631 – Fall 2003
Seth Hallem, Benjamin Chelf,
Yichen Xie, and Dawson Engler
(presented by Mujtaba Ali)
Motivation
• Goal: Find as many bugs as possible
• Applications:
– Free checker
• Detect double frees and dereferences of freed pointers
– Lock checker
• Warn if locks are released without being acquired, are
acquired twice, or are never released
– Statistical analysis to infer checking rules
• Infer whether routines a and b must be paired
State Machine Transitions
Free checker state machine (v is the tracked pointer):
  unknown --kfree(v)--> freed
  freed --kfree(v)--> stop
  freed --*v--> stop
• Analyses are modeled as state machine transitions
• State machines are:
– Simple enough for programmers to understand
– Expressive enough to specify lots of analyses
Note: stop state does not always imply an error
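To make the free checker's machine concrete, here is a minimal C sketch of the same transitions for a single pointer v. It is an illustration under assumed names (free_step, free_run, the FS_/FE_ enums), not xgcc's engine; in this particular checker stop is only entered when a bug is reported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of the free checker for one pointer v:
   unknown --kfree(v)--> freed, freed --kfree(v) or *v--> stop. */
typedef enum { FS_UNKNOWN, FS_FREED, FS_STOP } free_state;
typedef enum { FE_KFREE, FE_DEREF, FE_OTHER } free_event;

/* One transition; *err receives the report text, or NULL if none. */
static free_state free_step(free_state s, free_event e, const char **err) {
    *err = NULL;
    switch (s) {
    case FS_UNKNOWN:
        return e == FE_KFREE ? FS_FREED : FS_UNKNOWN;
    case FS_FREED:
        if (e == FE_DEREF) { *err = "using v after free!"; return FS_STOP; }
        if (e == FE_KFREE) { *err = "double free of v!"; return FS_STOP; }
        return FS_FREED;
    case FS_STOP:
        break;                  /* stop: this instance is no longer tracked */
    }
    return FS_STOP;
}

/* Feed a trace of events on v through the machine; return the final state
   and store the first report (if any) in *first_err. */
static free_state free_run(const free_event *trace, size_t n,
                           const char **first_err) {
    assert(trace != NULL || n == 0);
    free_state s = FS_UNKNOWN;
    const char *err;
    *first_err = NULL;
    for (size_t i = 0; i < n; i++) {
        s = free_step(s, trace[i], &err);
        if (err && !*first_err)
            *first_err = err;
    }
    return s;
}
```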
Free Checker Example
int contrived(int *p, int *w, int x) {  /* (p→freed) */
    int *q;                             /* (p→freed) */
    if(x) {                             /* Assume x!=0 */
        kfree(w);                       /* (p,w→freed) */
        q = p;                          /* (p,w,q→freed) */
        p = 0;                          /* (w,q→freed, p→stop) */
    }
    if(!x)                              /* Prune true branch */
        return *w; // safe
    return *q; // using 'q' after free!    (w→freed, q→stop)
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);                          /* (p→freed) */
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
Free Checker Example
int contrived(int *p, int *w, int x) {  /* (p→freed) */
    int *q;                             /* (p→freed) */
    if(x) {                             /* Assume x==0 */
        kfree(w);
        q = p;
        p = 0;
    }
    if(!x)                              /* Prune false branch */
        return *w; // safe                 (p→freed)
    return *q; // using 'q' after free!    (w→freed, q→stop)
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);                          /* (p→freed) */
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
Free Checker Example
int contrived(int *p, int *w, int x) {
    int *q;
    if(x) {
        kfree(w);
        q = p;
        p = 0;
    }
    if(!x)
        return *w; // safe                 (p→freed)
    return *q; // using 'q' after free!    (w→freed, q→stop)
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);                          /* (p→freed) */
    contrived (p, w, x);                /* union of both paths: (p,w→freed) */
    return *w; // using 'w' after free!    (p→freed, w→stop)
}
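The union step on this slide can be mimicked with a small set of (variable, state) facts: each path through contrived yields its own set, the caller's state after the call is their union, and the dereference of w then fires on the fact (w→freed). This is a simplified, hypothetical representation (it drops the global state value, and the names fact, set_union and deref are invented), not xgcc's data structure.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical analysis fact: one variable and its free-checker state. */
typedef enum { S_FREED, S_STOP } st;
typedef struct { const char *var; st state; } fact;
typedef struct { fact items[8]; int n; } fact_set;

static int contains(const fact_set *s, fact f) {
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->items[i].var, f.var) == 0 && s->items[i].state == f.state)
            return 1;
    return 0;
}

/* Union the facts reaching a join point from two paths (no duplicates). */
static fact_set set_union(const fact_set *a, const fact_set *b) {
    fact_set out = *a;
    for (int i = 0; i < b->n; i++)
        if (!contains(&out, b->items[i])) {
            assert(out.n < 8);
            out.items[out.n++] = b->items[i];
        }
    return out;
}

/* Apply *var to every fact: freed -> stop, counting one report each. */
static int deref(fact_set *s, const char *var) {
    int reports = 0;
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->items[i].var, var) == 0 && s->items[i].state == S_FREED) {
            s->items[i].state = S_STOP;
            reports++;
        }
    return reports;
}
```

With the x != 0 path returning {p→freed, w→freed} and the x == 0 path returning {p→freed}, the union is {p→freed, w→freed}, and deref(&joined, "w") reports once, leaving (p→freed, w→stop).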
A Unified Framework
• Two components:
– metal
• Language used for expressing custom analyses
• i.e., for expressing state machines
– xgcc
• Analysis engine that executes metal specifications
metal
• Language for specifying state machines
• A metal specification is called an “extension”
• For programmers, not compiler writers
– Many rules known only to programmers
• Flexibility allows for different kinds of
analyses, e.g.:
– Find violations of known correctness rules
– Automatically infer such rules from source
Example Extension: Free Checker
state decl any_pointer v;
start:
{ kfree(v) } ==> v.freed;
v.freed:
{ *v } ==> v.stop,
{ err("using %s after free!", mc_identifier(v)); }
| { kfree(v) } ==> v.stop,
{ err("double free of %s!", mc_identifier(v)); }
;
– Extensions feature ML-like pattern matching
metal Extension Terminology
state decl any_pointer v;
start:
{ kfree(v) } ==> v.freed;
v.freed:
{ *v } ==> v.stop,
{ err(...); }
| { kfree(v) } ==> v.stop,
{ err(...); }
;
– state decl any_pointer v; declares a variable-specific state variable, v
– start is a global state value
– v.freed and v.stop are variable-specific state values
– A global state variable (with exactly one instance) is implied
– Instances of variable-specific state variables come and go
metal Extensions and SMs
• An extension is composed of one or more SMs
– Extension state = the state of these SMs
• State machine state is a state tuple:
– Value of the global instance
– Value of one variable-specific instance
• State tuple notation: (start, v:p→freed)
• So, extension state = set of state tuples, e.g.
{(start, v:p→freed), (start, v:w→freed)}
xgcc
• Executes metal extensions
– Context-sensitive, interprocedural analysis
• Places no restrictions on metal extensions
– Beyond requiring determinism
• Scalability a primary design requirement
– More rules + more code = more bugs found
xgcc Algorithm Overview
• Applies the extension to each function's CFG in
depth-first order
• At each program point, looks for an executable
transition in every state machine
• Provides additional enhancements:
– Prunes non-executable paths
– Follows simple value flow
– Deletes state attached to redefined expressions
Intraprocedural Heuristics
• Basic block-level state caching
• Motivation: Exploit the determinism of extensions
– Applying extension to same program point in
same state always gives same result
• Algorithm:
– Before traversing a basic block, record the incoming
extension state in it: a “block summary”
– Subsequent traversals abort if their extension state
is a subset of the block summary
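Adding block summaries to a depth-first walk takes only a few lines. In this hypothetical sketch (block, apply and dfs_cached are invented names) the state for one tracked pointer is a bitmask of possible values, summary[b] accumulates every state set that has reached block b, and a traversal aborts when its incoming set is a subset of that summary. Because the extension is deterministic, the aborted traversal could only have repeated work and reports already done; the same check also keeps the walk from looping forever on CFG cycles.

```c
#include <assert.h>

/* Hypothetical setup: one tracked pointer, so a path's extension state is a
   set of per-variable values, encoded as a bitmask. */
enum { ST_UNKNOWN, ST_FREED, ST_STOP, NSTATES };
typedef enum { EV_NONE, EV_KFREE, EV_DEREF } event;
typedef struct { event ev; int succ[2]; } block;   /* -1 = no successor */

#define BIT(s) (1u << (s))

/* Apply the deterministic free-checker transition to every state in set. */
static unsigned apply(unsigned set, event e, int *reports) {
    unsigned out = 0;
    for (int s = 0; s < NSTATES; s++) {
        if (!(set & BIT(s)))
            continue;
        if (s == ST_UNKNOWN && e == EV_KFREE)
            out |= BIT(ST_FREED);
        else if (s == ST_FREED && e != EV_NONE) {   /* kfree or deref */
            (*reports)++;
            out |= BIT(ST_STOP);
        } else
            out |= BIT(s);
    }
    return out;
}

/* DFS with block summaries: abort when the incoming set adds nothing new. */
static void dfs_cached(const block *cfg, unsigned *summary, int b,
                       unsigned set, int *reports, int *visits) {
    if (b < 0)
        return;
    if ((set & ~summary[b]) == 0)
        return;                      /* subset of the block summary */
    summary[b] |= set;
    (*visits)++;
    unsigned out = apply(set, cfg[b].ev, reports);
    for (int k = 0; k < 2; k++)
        dfs_cached(cfg, summary, cfg[b].succ[k], out, reports, visits);
}
```

On a diamond CFG (kfree, two empty branches, then a dereference), the second path reaches the join block with {freed}, which is already in its summary, so it aborts and the use-after-free is reported once instead of twice.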
Block Summary
int contrived(int *p, int *w, int x) {  /* (p→freed) */
    int *q;                             /* (p→freed) */
    if(x) {                             /* Assume x!=0 */
        /* multi-line basic block: */
        kfree(w);                       /* (p,w→freed) */
        q = p;                          /* (p,w,q→freed) */
        p = 0;                          /* (w,q→freed, p→stop) */
    }
    if(!x)                              /* Prune true branch */
        return *w; // safe
    return *q; // using 'q' after free!    (w→freed, q→stop)
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);                          /* (p→freed) */
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
Block summary entries shown: (start, v:w→freed), (start, v:q→freed)
Interprocedural Heuristics
• Requires additional cached information
• Block summary is now a union of:
– Transition edges: (s, v:t→vs) → (s’, v:t→vs’)
– Add edges: (s, v:t→unknown) → (s’, v:t→vs’)
• When new instances are created inside a basic block
• Suffix summary
– Edges starting at a basic block and ending at the
function’s exit point
– Function summary = entry block’s suffix summary
– Built backwards (in contrast to block summaries)
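Applying a callee summary at a call site can be sketched as an edge lookup. Below, each edge is keyed by a variable instance and maps a state before the call to a state after it; the global state is omitted, the edges for contrived (including the w unknown→unknown edge for the x == 0 path) are illustrative, and the names edge and apply_summary are hypothetical. Building the suffix summary backwards is not shown.

```c
#include <assert.h>

/* Hypothetical summary edge over one variable instance's state value. */
typedef enum { VS_UNKNOWN, VS_FREED, VS_STOP } vstate;
typedef struct { int var; vstate from, to; } edge;   /* var: instance id */

/* Return the set (bitmask over vstate) of states the instance may have after
   the call. An add edge is just an edge whose source is VS_UNKNOWN: it creates
   state for an instance the caller was not tracking. A result of 0 means the
   summary has no edge for this state; in this sketch the caller would then
   have to analyze the callee with that state. */
static unsigned apply_summary(const edge *sum, int n, int var, vstate in) {
    assert(n >= 0);
    unsigned out = 0;
    for (int i = 0; i < n; i++)
        if (sum[i].var == var && sum[i].from == in)
            out |= 1u << sum[i].to;
    return out;
}
```

For example, with a transition edge p: freed→freed and an add edge w: unknown→freed, a caller holding (p→freed) and no state for w comes out of the call with both p and w freed, which is what lets the later return *w be flagged.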
Block and Suffix Summaries
int contrived(int *p, int *w, int x) {  /* (p→freed) */
    int *q;                             /* (p→freed) */
    if(x) {                             /* Assume x!=0 */
        /* summary edges for this block:
           (start, v:p→freed) → (start, v:p→freed)
           (start, v:w→unknown) → (start, v:w→freed) */
        kfree(w);                       /* (p,w→freed) */
        q = p;                          /* (p,w,q→freed) */
        p = 0;                          /* (w,q→freed, p→stop) */
    }
    if(!x)                              /* Prune true branch */
        return *w; // safe
    return *q; // using 'q' after free!    (w→freed, q→stop)
}
int contrived_caller (int *w, int x, int *p) {
    /* (start, v:w→freed) → (start, v:w→freed) */
    kfree (p);                          /* (p→freed) */
    /* (start, v:q→freed) → (start, v:q→stop) */
    contrived (p, w, x);
    /* (start, v:w→freed) → (start, v:w→freed) */
    return *w; // using 'w' after free!
}
Unsoundness
• xgcc’s interprocedural analysis is unsound
– But that’s OK (Jim Larus agrees)
– If it can catch some errors, it’s still useful
• Unsound analyses can catch some errors that
sound analyses can’t
– Some analyses (e.g., inferring which routines must
be paired) cannot be expressed soundly
• Focus is on executing extensions efficiently
Reducing False Positives
• Killing variables and expressions
– Remove a variable's state machine when the variable is redefined
• Synonyms
p = q = kmalloc(...);
if(!p)
return 0;
*q; /* safe dereference: q == p, and p is not null */
• False path pruning
• Targeted suppression
– i.e., xgcc hacks
Killing: on a write such as p = 0 in contrived, if there is a state machine for p, we “kill” it.
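Killing, plus the simple value flow that makes q a synonym of p at q = p, can be sketched with a per-variable state table. This is a hypothetical model (tstate, assign_untracked and assign_var are invented names): it marks a killed machine stop, as the annotation on p = 0 in contrived suggests, and it copies state at the assignment rather than keeping the two names linked, so it does not capture the later null check in the kmalloc example.

```c
#include <assert.h>

/* Hypothetical per-variable table; T_NONE = no state machine attached. */
typedef enum { T_NONE, T_FREED, T_STOP } tstate;
#define NVARS 4

/* dst = <untracked expression> (e.g. p = 0): the write kills dst's machine.
   This sketch marks it stop instead of deleting it. */
static void assign_untracked(tstate *tab, int dst) {
    assert(dst >= 0 && dst < NVARS);
    if (tab[dst] != T_NONE)
        tab[dst] = T_STOP;
}

/* dst = src: kill whatever dst had, then let src's live state flow to dst
   (a copy at this point, not a lasting synonym link). */
static void assign_var(tstate *tab, int dst, int src) {
    assert(src >= 0 && src < NVARS);
    assign_untracked(tab, dst);
    if (tab[src] != T_NONE && tab[src] != T_STOP)
        tab[dst] = tab[src];
}
```

Replaying contrived on the x != 0 path (p freed on entry, kfree(w), q = p, p = 0) leaves w and q freed and p stopped, matching the slide's (w,q→freed, p→stop).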
Ranking of Errors
• Impossible to eliminate all false positives
• xgcc ranks errors
– Generic ranking: by distance between the events that make up an error
– Path-specific ranking by annotating extensions
– Statistical ranking (z-ranking)
• Ranking can distinguish different uses
– Linux semaphore routines up and down are used as
both counters and locks
– Interprocedural analysis cannot handle this case
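Generic ranking by distance amounts to sorting reports on a distance key. The sketch below assumes each report already carries an integer distance (for example, lines between the two events); the struct, field and function names are hypothetical, and xgcc's actual metric may combine more factors.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical error report with an assumed, precomputed distance. */
typedef struct {
    const char *msg;
    int distance;     /* e.g. lines between the two events in the error */
} report;

/* qsort comparator: shorter distance ranks first. The (a > b) - (a < b)
   form avoids the overflow a plain subtraction could cause. */
static int by_distance(const void *a, const void *b) {
    const report *x = a, *y = b;
    return (x->distance > y->distance) - (x->distance < y->distance);
}

static void rank_reports(report *r, size_t n) {
    if (r == NULL || n < 2)
        return;       /* nothing to sort */
    qsort(r, n, sizeof *r, by_distance);
}
```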
Extending metal Extensions
• Extend the state space using general-purpose code
• Path-specific transitions
– Different destination states depending on whether the
analysis follows the true branch or the false branch
start:
{trylock(l) != 0} ==> true=l.locked, false=l.stop
| {trylock(l) == 0} ==> true=l.stop, false=l.locked
• C code actions
– Can manipulate the extension’s state through xgcc’s
interface
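A path-specific transition simply stores two destination states and picks one based on the branch being followed. This hypothetical C encoding (path_trans, follow and the L_ states are invented names) mirrors the two trylock patterns above; it assumes trylock returns nonzero when the lock is acquired, as the first pattern implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical path-specific transition: one destination per branch. */
typedef enum { L_START, L_LOCKED, L_STOP } lstate;
typedef struct { lstate on_true, on_false; } path_trans;

/* {trylock(l) != 0} ==> true=l.locked, false=l.stop */
static const path_trans TRYLOCK_NE0 = { L_LOCKED, L_STOP };
/* {trylock(l) == 0} ==> true=l.stop,   false=l.locked */
static const path_trans TRYLOCK_EQ0 = { L_STOP, L_LOCKED };

/* Destination state for the branch the analysis is currently following. */
static lstate follow(path_trans t, bool took_true_branch) {
    return took_true_branch ? t.on_true : t.on_false;
}
```

Both patterns agree on meaning: whichever way the condition is written, the branch on which trylock succeeded leads to l.locked and the other to l.stop.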
Example Extension: Free Checker
• C code actions: the { err(...); } blocks attached to the v.freed transitions
The Good
• Unsoundness presents new opportunities
• Designed for use by “everyday” programmers
• Heuristics to speed up execution
• Heuristics to reduce false positives
• Ranking to help sift through false positives
• Tested on systems code (Linux, OpenBSD)
• Paper is very clearly written!
The Bad
• Unsoundness is unsound
– Jim Larus says eventually programmers will want
to move to sound tools
• Designed for use by “everyday” programmers
– Advanced features require analysis knowledge
• Path-specific state machine transitions
• Path-specific error ranking
• xgcc/metal is now commercial
– Boooo!
Related Work
• ESP
– Sound
– Uses state machine language like metal
– More likely to scale in the interprocedural case
• SLAM
– Model-checking approach
– Verification tool intended for smaller code bases
• PREfix
– Unsound, more expensive analysis
– Fixed set of error types and analyses
Related Work (cont.)
• ESC/Java
– Uses theorem prover
– High annotation burden (1 annotation per 3 lines of code)
• Recent efforts to infer annotations
• Cqual
– Interprocedural, sound analysis
– Annotations to express program properties and to
suppress false positives
Singular Key Idea
• Unsound, incomplete analysis based on
clever heuristics can be an effective bug-fighting
tool
– Such analyses enable techniques that are not possible
with sound and complete analyses