“A System and Language for Building System-Specific, Static Analyses”
CMSC 631 – Fall 2003
Seth Hallem, Benjamin Chelf, Yichen Xie, and Dawson Engler
(presented by Mujtaba Ali)

Motivation
• Goal: Find as many bugs as possible
• Applications:
  – Free checker
    • Detect double frees and dereferences of freed pointers
  – Lock checker
    • Warn if locks are released without being acquired, acquired twice, or never released
  – Statistical analysis to infer checking rules
    • Infer whether routines a and b must be paired

State Machine Transitions
  unknown --kfree(v)--> freed
  freed --kfree(v)--> stop
  freed --*v--> stop
• Analyses are modeled as state machine transitions
• State machines are:
  – Simple enough for programmers to understand
  – Expressive enough to specify lots of analyses
Note: the stop state does not always imply an error

Free Checker Example
    int contrived(int *p, int *w, int x) {
        int *q;
        if(x) {
            kfree(w);
            q = p;
            p = 0;
        }
        if(!x)
            return *w; // safe
        return *q;     // using 'q' after free!
    }

    int contrived_caller(int *w, int x, int *p) {
        kfree(p);
        contrived(p, w, x);
        return *w;     // using 'w' after free!
    }
• Assume x != 0:
  – kfree(p) in contrived_caller gives (p→freed), which still holds on entry to contrived
  – kfree(w) gives (p, w→freed); q = p gives (p, w, q→freed); p = 0 gives (w, q→freed, p→stop)
  – The true branch of if(!x) is pruned, so return *q is flagged: (w→freed, q→stop)
  – Back in contrived_caller, return *w is flagged as well

Free Checker Example (cont.)
• Assume x == 0:
  – The body of if(x) is skipped, so the state stays (p→freed)
  – The false branch of if(!x) is pruned, so only the safe return *w is followed
  – contrived exits in state (p→freed)
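The free checker's state machine is small enough to write out by hand. The C sketch below is illustrative only: `fc_state`, `fc_event`, `fc_step` and the error codes are hypothetical names (not part of metal or xgcc), and it tracks a single pointer instead of a set of instances. Its transitions follow the State Machine Transitions slide: kfree(v) moves unknown to freed, and either *v or kfree(v) moves freed to stop while reporting an error.

```c
#include <assert.h>
#include <stdio.h>

/* States of one free-checker instance (one instance per tracked pointer). */
typedef enum { FC_UNKNOWN, FC_FREED, FC_STOP } fc_state;

/* Program events the checker matches for a pointer v (hypothetical encoding). */
typedef enum { EV_KFREE, EV_DEREF } fc_event;

/* Result of a step: no error, or one of the two errors the checker reports. */
typedef enum { FC_OK, FC_USE_AFTER_FREE, FC_DOUBLE_FREE } fc_error;

/* Apply one event to the state of pointer `name` and report any error. */
static fc_error fc_step(fc_state *s, fc_event ev, const char *name) {
    switch (*s) {
    case FC_UNKNOWN:
        if (ev == EV_KFREE)
            *s = FC_FREED;              /* kfree(v): unknown -> freed */
        return FC_OK;                   /* *v on an untracked pointer: no rule fires */
    case FC_FREED:
        *s = FC_STOP;                   /* both rules from freed end in stop */
        if (ev == EV_DEREF) {
            printf("using %s after free!\n", name);
            return FC_USE_AFTER_FREE;
        }
        printf("double free of %s!\n", name);
        return FC_DOUBLE_FREE;
    case FC_STOP:
    default:
        return FC_OK;                   /* stop: the instance is no longer tracked */
    }
}
```

For example, replaying kfree(p); *p; as fc_step(&s, EV_KFREE, "p") followed by fc_step(&s, EV_DEREF, "p") prints "using p after free!" and leaves s in FC_STOP. Note that reaching stop is not always an error in general; here both rules out of freed happen to report one.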
Free Checker Example (cont.)
• Union of the exit states of contrived along both paths: (w→freed, q→stop) for x != 0 and (p→freed) for x == 0
• Back in contrived_caller this gives (p, w→freed) after the call
• return *w is then flagged (using 'w' after free!), leaving (p→freed, w→stop)

A Unified Framework
• Two components:
  – metal
    • Language used for expressing custom analyses
    • i.e., for expressing state machines
  – xgcc
    • Analysis engine that executes metal specifications

metal
• Language for specifying state machines
• A metal specification is called an “extension”
• For programmers, not compiler writers
  – Many rules are known only to programmers
• Flexibility allows for different kinds of analyses, e.g.:
  – Find violations of known correctness rules
  – Automatically infer such rules from source

Example Extension: Free Checker
    state decl any_pointer v;

    start: { kfree(v) } ==> v.freed;

    v.freed: { *v } ==> v.stop,
                 { err("using %s after free!", mc_identifier(v)); }
           | { kfree(v) } ==> v.stop,
                 { err("double free of %s!", mc_identifier(v)); }
           ;
• Extensions feature ML-like pattern matching

metal Extension Terminology
• state decl any_pointer v; declares a variable-specific state variable, v
• start is a global state value
• v.freed and v.stop are variable-specific state values
  – A global state variable (with exactly one instance) is implied
  – Instances of variable-specific state variables come and go

metal Extensions and SMs
• An extension is composed of one or more SMs
  – Extension state = the state of these SMs
• An SM's state is a state tuple:
  – Value of the global instance
  – Value of one of the variable-specific instances
• State tuple notation: (start, v:p→freed)
• So, the extension state is a set of state tuples, e.g.
  {(start, v:p→freed), (start, v:w→freed)}

xgcc
• Executes metal extensions
  – Context-sensitive, interprocedural analysis
• Does not restrict metal extensions beyond requiring determinism
• Scalability is a primary design requirement
  – More rules + more code = more bugs found

xgcc Algorithm Overview
• Applies an extension to a function's CFG in depth-first order
• At each program point, looks for an executable transition in every state machine
• Provides additional enhancements:
  – Prunes non-executable paths
  – Follows simple value flow
  – Deletes state attached to redefined expressions

Intraprocedural Heuristics
• Basic block-level state caching
• Motivation: exploit the determinism of extensions
  – Applying an extension to the same program point in the same state always gives the same result
• Algorithm:
  – Before traversing a basic block, record the extension state in that block's “block summary”
  – Subsequent traversals abort if their extension state is a subset of the block summary

Block Summary
• Applied to the contrived example from the Free Checker slides; each of its multi-line basic blocks (e.g., the body of if(x)) gets one summary
• Summaries recorded in the example include (start, v:w→freed) and (start, v:q→freed)
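The depth-first traversal with block-level caching can be sketched in a few lines of C. This is a minimal sketch under simplifying assumptions, not xgcc's implementation: each bit of a `uint32_t` stands for one state tuple, each block's effect is a deterministic gen/kill transfer, path pruning and interprocedural summaries are omitted, and `cfg`, `cfg_init` and `dfs` are hypothetical names. It shows why determinism makes the cache safe: a block reached in a state it has already seen would produce the same results again, so the traversal stops there.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 8

/* Hypothetical CFG: up to two successors per block (-1 = none). Each bit of
 * a uint32_t stands for one state tuple, and each block's effect on the
 * extension state is a deterministic transfer: out = (in & ~kill) | gen. */
typedef struct {
    int succ[MAX_BLOCKS][2];
    uint32_t gen[MAX_BLOCKS], kill[MAX_BLOCKS];
    uint32_t summary[MAX_BLOCKS];   /* block summary: tuples seen on entry */
    int visits;                     /* number of block traversals performed */
} cfg;

static void cfg_init(cfg *g) {
    for (int b = 0; b < MAX_BLOCKS; b++) {
        g->succ[b][0] = g->succ[b][1] = -1;
        g->gen[b] = g->kill[b] = g->summary[b] = 0;
    }
    g->visits = 0;
}

/* Depth-first traversal with block-level state caching. */
static void dfs(cfg *g, int b, uint32_t state) {
    /* Abort if the incoming state is a subset of the block summary:
     * the extension is deterministic, so this traversal adds nothing new. */
    if ((state & ~g->summary[b]) == 0)
        return;
    g->summary[b] |= state;          /* record the state in the block summary */
    g->visits++;
    uint32_t out = (state & ~g->kill[b]) | g->gen[b];
    for (int i = 0; i < 2; i++)
        if (g->succ[b][i] >= 0)
            dfs(g, g->succ[b][i], out);
}
```

In a diamond-shaped CFG (0 → {1, 2} → 3) where block 1 adds a tuple, the second path reaches block 3 with a subset of its summary, so block 3 is traversed once instead of twice. Because a traversal only continues when it adds at least one tuple to some block's summary, the recursion also terminates on loops.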
Interprocedural Heuristics
• Require additional cache information
• A block summary is now a union of:
  – Transition edges: (s, v:t→vs) → (s′, v:t→vs′)
  – Add edges: (s, v:t→unknown) → (s′, v:t→vs′)
    • Record instances created inside the basic block
• Suffix summary
  – Edges starting at a basic block and ending at the function's exit point
  – Function summary = the entry block's suffix summary
  – Built backwards (in contrast to block summaries)

Block and Suffix Summaries
• Applied to the contrived example, path x != 0 (true branch of if(!x) pruned); per-path states are as on the Free Checker slides
• Edges shown:
  – Around if(x) and its body: (start, v:p→freed) → (start, v:p→freed), plus the add edge (start, v:w→unknown) → (start, v:w→freed), since kfree(w) creates a new instance
  – At return *q: (start, v:q→freed) → (start, v:q→stop)
  – Also shown: (start, v:w→freed) → (start, v:w→freed)
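To illustrate how a function summary might be used at a call site, here is a simplified, hypothetical C sketch: `edge`, `tuple` and `apply_summary` are illustrative names, summaries are flat edge lists indexed by parameter position, and a tuple with no matching edge is assumed to pass through unchanged. The example edges are modeled loosely on the ones shown for contrived; they are not taken from xgcc's output.

```c
#include <assert.h>

/* Value of a variable-specific instance; V_UNKNOWN means "not tracked". */
typedef enum { V_UNKNOWN, V_FREED, V_STOP } vstate;

/* Global state values; the free checker only uses start. */
typedef enum { G_START } gstate;

/* Summary edge (g, v:formal->from) -> (g2, v:formal->to).
 * An edge whose `from` is V_UNKNOWN is an add edge. */
typedef struct {
    gstate g;
    int formal;      /* index of the callee parameter the instance is bound to */
    vstate from;
    gstate g2;
    vstate to;
} edge;

typedef struct { gstate g; vstate v; } tuple;

/* Map one caller-side tuple, bound to parameter `formal`, through a function
 * summary. Writes each exit tuple to out[] (room for max(nedges, 1) entries)
 * and returns the count. Assumption: with no matching edge, the tuple passes
 * through unchanged. */
static int apply_summary(const edge *sum, int nedges, int formal,
                         tuple in, tuple *out) {
    int n = 0;
    for (int i = 0; i < nedges; i++) {
        if (sum[i].formal == formal && sum[i].g == in.g && sum[i].from == in.v) {
            out[n].g = sum[i].g2;
            out[n].v = sum[i].to;
            n++;
        }
    }
    if (n == 0)
        out[n++] = in;
    return n;
}
```

With the edges (start, v:p→freed) → (start, v:p→freed) on parameter 0 and the add edge (start, v:w→unknown) → (start, v:w→freed) on parameter 1, the call contrived(p, w, x) maps the caller's (p→freed) to (p→freed) and the untracked w to (w→freed), i.e. (p, w→freed) after the call. Reusing the summary this way avoids re-traversing contrived for every caller that reaches it in a known state.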
Unsoundness
• xgcc's interprocedural analysis is unsound
  – But that's OK (Jim Larus agrees)
  – If it can catch some errors, it's still useful
• Unsound analyses can catch some errors that sound analyses can't
  – Some analyses (e.g., inferring which routines must be paired) cannot be expressed soundly
• Focus is on executing extensions efficiently

Reducing False Positives
• Killing variables and expressions
  – Remove the state machine when a variable is defined
  – On a write, if there is a state machine for p, we “kill” it (e.g., p = 0 in contrived gives (p→stop))
• Synonyms
    p = q = kmalloc(...);
    if(!p)
        return 0;
    *q; /* safe dereference: q == p, and p is not null */
• False path pruning
• Targeted suppression
  – i.e., xgcc hacks
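Killing and simple value flow can be sketched as assignment handling over a table of state machine instances. This is a hypothetical C sketch (`sm_table`, `sm_set`, `sm_get` and `sm_assign` are illustrative names, variables are identified by name, and only the free checker's states are modeled); as an assumption matching the (p→stop) annotation, a kill moves the instance to stop rather than deleting it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { S_UNKNOWN, S_FREED, S_STOP } sm_state;

#define MAX_VARS 8

/* Hypothetical table of state-machine instances, one per tracked variable. */
typedef struct {
    const char *name[MAX_VARS];
    sm_state st[MAX_VARS];
    int n;
} sm_table;

static int sm_find(const sm_table *t, const char *v) {
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->name[i], v) == 0)
            return i;
    return -1;
}

/* State of v; variables without an instance are S_UNKNOWN. */
static sm_state sm_get(const sm_table *t, const char *v) {
    int i = sm_find(t, v);
    return i < 0 ? S_UNKNOWN : t->st[i];
}

/* Create or update the instance for v. */
static void sm_set(sm_table *t, const char *v, sm_state s) {
    int i = sm_find(t, v);
    if (i < 0) {
        assert(t->n < MAX_VARS);
        i = t->n++;
        t->name[i] = v;
    }
    t->st[i] = s;
}

/* Handle the assignment dst = src; src is NULL for a non-pointer value
 * such as 0. Any write kills dst's state machine (moves it to stop);
 * copying a pointer with a live instance makes dst a synonym in the same state. */
static void sm_assign(sm_table *t, const char *dst, const char *src) {
    sm_state s = src ? sm_get(t, src) : S_UNKNOWN;   /* read before killing dst */
    if (sm_find(t, dst) >= 0)
        sm_set(t, dst, S_STOP);                      /* kill */
    if (s != S_UNKNOWN && s != S_STOP)
        sm_set(t, dst, s);                           /* simple value flow */
}
```

Replaying the x != 0 path of contrived (kfree(p), kfree(w), q = p, p = 0) with these calls leaves w and q freed and p in stop, matching the (w, q→freed, p→stop) annotation, so the later return *q is still flagged while nothing more is reported about p.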
Ranking of Errors
• Impossible to eliminate all false positives
• xgcc ranks errors
  – Generic ranking: distance
  – Path-specific ranking by annotating extensions
  – Statistical ranking (z-ranking)
• Ranking can distinguish different uses
  – Linux semaphore routines up and down are used both as counters and as locks
  – Interprocedural analysis cannot handle this case

Extending metal Extensions
• Extend the state space using general-purpose code
• Path-specific transitions
  – Different destination states for when the analysis follows the true branch or the false branch
    start: {trylock(l) != 0} ==> true=l.locked, false=l.stop
         | {trylock(l) == 0} ==> true=l.stop, false=l.locked
         ;
• C code actions
  – Can manipulate the extension's state using xgcc's interface
  – e.g., the { err(...); } blocks in the free checker extension are C code actions

The Good
• Unsoundness presents new opportunities
• Designed for use by “everyday” programmers
• Heuristics to speed up execution
• Heuristics to reduce false positives
• Ranking to help sift through false positives
• Tested on systems code (Linux, OpenBSD)
• Paper is very clearly written!
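The trylock rules from Extending metal Extensions can be made concrete with a small C sketch of a path-specific transition. It is a hypothetical encoding (`lock_state`, `trylock_cond` and `trylock_transition` are illustrative names), and it assumes, as the two rules imply, that trylock returns nonzero when it acquires the lock: the destination state for l depends on which edge of the branch the analysis follows.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { L_START, L_LOCKED, L_STOP } lock_state;

/* The two conditions matched by the rules: trylock(l) != 0 and trylock(l) == 0. */
typedef enum { COND_NONZERO, COND_ZERO } trylock_cond;

/* Path-specific transition: l's destination state depends on which edge
 * of the branch the analysis follows. Assumes trylock returns nonzero
 * when it acquires the lock. */
static lock_state trylock_transition(trylock_cond c, bool took_true_edge) {
    bool acquired = (c == COND_NONZERO) ? took_true_edge : !took_true_edge;
    return acquired ? L_LOCKED : L_STOP;
}
```

For if (trylock(l) != 0) { ... } else { ... }, the analysis enters the then-branch with l locked and the else-branch with l in stop; for the == 0 form the two branches swap. A single destination state per pattern could not express this, which is why the true=/false= syntax is needed.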
The Bad
• Unsoundness is unsound
  – Jim Larus says programmers will eventually want to move to sound tools
• Designed for use by “everyday” programmers
  – Advanced features require analysis knowledge
    • Path-specific state machine transitions
    • Path-specific error ranking
• xgcc/metal is now commercial
  – Boooo!

Related Work
• ESP
  – Sound
  – Uses a state machine language like metal
  – More likely to scale in the interprocedural case
• SLAM
  – Model-checking approach
  – Verification tool intended for smaller code bases
• PREfix
  – Unsound, more expensive analysis
  – Fixed set of error types and analyses

Related Work (cont.)
• ESC/Java
  – Uses a theorem prover
  – High annotation burden (1 annotation per 3 lines of code)
    • Recent efforts to infer annotations
• Cqual
  – Interprocedural, sound analysis
  – Annotations to express program properties and to suppress false positives

Singular Key Idea
• Unsound, incomplete analysis based on clever heuristics can be an effective bug-fighting tool
  – Such analyses can allow techniques not possible with sound and complete analyses