Spring 2013
Program Analysis and Verification
Lecture 1: Introduction
Roman Manevich
Ben-Gurion University

30GB Zunes all over the world fail en masse
December 31, 2008

Zune bug

    while (days > 365) {
        if (IsLeapYear(year)) {
            if (days > 366) {
                days -= 366;
                year += 1;
            }
        } else {
            days -= 365;
            year += 1;
        }
    }

Zune bug
With days = 366 and year = 2008 (a leap year), no branch changes days, so the loop never terminates:

    while (366 > 365) {
        if (IsLeapYear(2008)) {
            if (366 > 366) {
                days -= 366;
                year += 1;
            }
        } else {
            days -= 365;
            year += 1;
        }
    }

Suggested solution: wait for tomorrow

Patriot missile failure
On the night of the 25th of February, 1991, a Patriot missile system operating in Dhahran, Saudi Arabia, failed to track and intercept an incoming Scud. The Iraqi missile struck an army barracks, killing 28 U.S. soldiers and injuring another 98.
February 25, 1991

Patriot bug – rounding error
• Time measured in 1/10 seconds
• Binary expansion of 1/10: 0.0001100110011001100110011001100....
• 24-bit register: 0.00011001100110011001100
  – error of 0.0000000000000000000000011001100... binary, or ~0.000000095 decimal
• After 100 hours of operation the error is 0.000000095×100×3600×10 = 0.34 seconds
• A Scud travels at about 1,676 meters per second, and so travels more than half a kilometer in this time
Suggested solution: reboot every 10 hours

“Billy Gates why do you make this possible ? Stop making money and fix your software!!”
(W32.Blaster.Worm)
August 13, 2003

Windows exploit(s)
Buffer Overflow

    #include <string.h>

    void foo (char *x) {
        char buf[2];
        strcpy(buf, x);   /* no bounds check: copies past the end of buf */
    }

    int main (int argc, char *argv[]) {
        foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault

[Figure: stack layout along the memory addresses, showing the previous frame, return address, saved FP, char* x, and buf[2]. The input "abracadabra" fills buf[2] with "ab" and overwrites the slots above it. An arrow marks the direction in which the stack grows.]

Buffer overrun exploits

    #include <stdio.h>
    #include <string.h>

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];
        strcpy(password_buffer, password);   /* unchecked copy into a 16-byte buffer */
        if(strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if(strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if(check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else
            printf("\nAccess Denied.\n");
    }

(source: “Hacking: The Art of Exploitation, 2nd Ed”)

(In)correct usage of APIs
• Application trend: increasing number of libraries and APIs
  – Non-trivial restrictions on permitted sequences of operations
• Typestate: temporal safety properties
  – What sequences of operations are permitted on an object?
  – Encoded as a DFA, e.g. “Don’t use a Socket unless it is connected”
[Figure: typestate automaton for Socket with states init, connected, closed, and err; edges are labeled connect(), close(), getInputStream(), getOutputStream(), and * (any operation)]

Challenges

    class SocketHolder {
        Socket s;
    }

    Socket makeSocket() {
        return new Socket(); // A
    }

    void open(Socket l) {
        l.connect();
    }

    void talk(Socket s) {
        s.getOutputStream().write("hello");
    }

    void main() {
        Set<SocketHolder> set = new HashSet<SocketHolder>();
        while(…) {
            SocketHolder h = new SocketHolder();
            h.s = makeSocket();
            set.add(h);
        }
        for (Iterator<SocketHolder> it = set.iterator(); …) {
            Socket g = it.next().s;
            open(g);
            talk(g);
        }
    }

Testing is not enough
• Observe some program behaviors
• What can you say about other behaviors?
• Concurrency makes things worse
• Smart testing is useful
  – requires the techniques that we will see in the course

Static analysis definition
Reason statically (at compile time) about the possible runtime behaviors of a program
“The algorithmic discovery of properties of a program by inspection of its source text [1]” -- Manna, Pnueli
[1] Does not have to literally be the source text; it just means without running the program

Is it at all doable?

    x = ?
    if (x > 0) {
        y = 42;
    } else {
        y = 73;
        foo();
    }
    assert (y == 42);

Bad news: problem is generally undecidable

Central idea: use approximation
[Figure: nested sets inside a universe: the under-approximation lies inside the exact set of configurations/behaviors, which lies inside the over-approximation]

Goal: exploring program states
[Figure: initial states, reachable states, and bad states]

Technique: explore abstract states
[Figure, shown in four animation steps: initial states, reachable states, and bad states, explored through abstract states]

Sound: cover all reachable states
[Figure: initial states, reachable states, and bad states]

Unsound: miss some reachable states
[Figure: initial states, reachable states, and bad states]

Imprecise abstraction
False alarms
[Figure: initial states, reachable states, and bad states]

A sound message

    x = ?
    if (x > 0) {
        y = 42;
    } else {
        y = 73;
        foo();
    }
    assert (y == 42);

Assertion may be violated

Precision
• Avoid useless result

    UselessAnalysis(Program p) {
        printf("assertion may be violated\n");
    }

• Low false alarm rate
• Understand where precision is lost

Runtime vs.
static analysis

                 Runtime                                     Static analysis
Effectiveness    Can miss errors                             Can find rare errors
                 Finds real errors                           Can raise false alarms
Cost             Proportional to program’s execution         Proportional to program’s complexity
                 No need to efficiently handle rare cases    Can handle limited classes of programs and still be useful

Static Driver Verifier
[Figure: Static Driver Verifier takes rules (precise API usage rules written in SLIC), the driver’s source code in C, and an environment model, and reports defects with 100% path coverage]

Bill Gates’ Quote
"Things like even software verification, this has been the Holy Grail of computer science for many decades but now in some very key areas, for example, driver verification we’re building tools that can do actual proof about the software and how it works in order to guarantee the reliability."
Bill Gates, April 18, 2002. Keynote address at WinHec 2002

The Astrée Static Analyzer
Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, Xavier Rival
ENS France

Objectives of Astrée
• Prove absence of errors in safety critical C code
• ASTRÉE was able to prove completely automatically the absence of any RTE in the primary flight control software of the Airbus A340 fly-by-wire system
  – a program of 132,000 lines of C analyzed
[Image credit: By Lasse Fuss (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons]

A little about me
• History
  – Studied B.Sc., M.Sc., Ph.D.
at Tel-Aviv University
    • Research in program analysis with IBM and Microsoft
  – Post-doc at UCLA and at UT Austin
  – Joined Ben-Gurion University this year
• Example research challenges
  – What’s a good algorithm for automatically discovering (with no hints) that a program generates a binary tree where all leaves are connected in a list?
  – What’s a good algorithm for automatically proving that a parallel program behaves “well”?
  – How can we automatically synthesize parallel code that is both correct and efficient?

Why study program analysis?
• Challenging and thought provoking
  – An approach for dealing with computationally hard (usually undecidable) problems
  – Treat programs as mathematical objects
• Understand how to systematically
  – Design optimizations
  – Reason about correctness / find bugs (security)
• Some techniques may be applied in other domains
  – Computational learning
  – Analysis of biological systems

What do you get in this course?
• Learn basic principles of static analysis
  – Understand jargon/papers
• Learn a few advanced techniques
  – Some principled way of developing analysis
  – Develop one in a small-scale project
• Put to practice what you learned in logic, automata, programming

My role
• Teach you theory and practice
• Teach you how to think of new techniques
• E-mail: romanm@cs.bgu.ac.il
• Office hours: Wednesday 13:00-15:00
• Course web-page
  – Announcements
  – Forum
  – …

Requirements
1. Summarize one lecture: 10% of grade
  – Submit initial summary
  – Get corrections/suggestions
  – Submit revised summary
2. Theoretical assignments and programming assignments: 50%
  – About 8 (some very small)
  – Must submit all
  – Must solve all questions
  – Otherwise re-submit (and get a lower grade)
3.
Final project: 40%
  – Implement a program analyzer for a given component

How to succeed in this course
• Attend all classes
• Make sure you understand material in class
  – Engage by asking questions and raising ideas
• Be on top of assignments
  – Submit on time
  – Don’t get stuck or give up on exercises: get help, ask me
  – Don’t start working on assignments the day before
• Be ethical
Joe (a day before assignment deadline): “I don’t really understand what you want from me in this assignment, can you help me/extend the deadline?”

The static analysis approach
• Formalize software behavior in a mathematical model (semantics)
• Prove properties of the mathematical model
  – Automatically, typically with approximation of the formal semantics
• Develop theory and tools for program correctness and robustness

Kinds of static analysis
• Spans a wide range
  – type checking … up to full functional verification
• General safety specifications
• Security properties (e.g., information flow)
• Concurrency correctness conditions (e.g., absence of data races, absence of deadlocks, atomicity)
• Correct usage of libraries (e.g., typestate)
• Underapproximations useful for bug-finding, test-case generation,…

Static analysis techniques
• Abstract Interpretation
• Dataflow analysis
• Constraint-based analysis
• Type and effect systems

Static analysis for verification
[Figure: the Analyzer takes a program and a specification and outputs either "Valid" or an abstract counter example]

Relation to program verification
Static Analysis
• Fully automatic
• Applicable to a programming language
• Can be very imprecise
• May yield false alarms
Program Verification
• Requires specification and loop invariants
• Program specific
• Relatively complete
• Provides counter examples
• Provides useful documentation
• Can be mechanized using theorem provers

Verification challenge

    main(int i) {
        int x = 3, y = 1;
        do {
            y = y + 1;
        } while (--i > 0);
        assert 0 < x + y;
    }

Determine what states can arise during any execution
Challenge: set of states is
unbounded

Abstract Interpretation

    main(int i) {
        int x = 3, y = 1;
        do {
            y = y + 1;
        } while (--i > 0);
        assert 0 < x + y;
    }

Recipe
1) Abstraction
2) Transformers
3) Exploration
Determine what states can arise during any execution
Challenge: set of states is unbounded
Solution: compute a bounded representation of (a superset of) program states

1) Abstraction
• concrete state: Var → Z
• abstract state (sign): Var → {+, 0, -, ?}
[Figure: for the program above, the concrete states (x=3, y=1, i=7), (x=3, y=2, i=6), … are represented by the abstract state (x=+, y=+, i=+)]

2) Transformers
• concrete transformer for y = y + 1:
    (x=3, y=1, i=0)  →  (x=3, y=2, i=0)
• abstract transformer for y = y + 1:
    (x=+, y=+, i=0)  →  (x=+, y=+, i=0)
    (x=+, y=0, i=0)  →  (x=+, y=+, i=0)
    (x=+, y=?, i=0)  →  (x=+, y=?, i=0)
    (x=+, y=-, i=0)  →  (x=+, y=?, i=0)

3) Exploration
[Figure: the program above annotated with abstract states (x, y, i): (?, ?, ?) on entry, (+, +, ?) after the initialization, and (+, +, ?) at every later program point]

Incompleteness

    main(int i) {
        int x = 3, y = 1;
        do {
            y = y - 2;
            y = y + 3;
        } while (--i > 0);
        assert 0 < x + y;
    }

[Figure: the program annotated with abstract states (x, y, i): (?, ?, ?) on entry, (+, +, ?) after the initialization, and (+, ?, ?) at every later program point. The sign of y is lost after y = y - 2, so the analysis cannot prove the assertion even though it always holds.]

Parity abstraction

    while (x != 1) do {
        if (x % 2) == 0 {
            x := x / 2;
        } else {
            x := x * 3 + 1;
            assert (x % 2 == 0);
        }
    }

Challenge: how to find “the right” abstraction

How to find “the right” abstraction?
• Pick an abstract domain suited for your property
  – Numerical domains
  – Domains for reasoning about the heap
  – …
• Combination of abstract domains
• Another approach
  – Abstraction refinement

Following the recipe (in a nutshell)
1) Abstraction
[Figure: a concrete heap state and the abstract state that represents it, drawn with pointer variables t and x and n-field edges]
2) Transformers
[Figure: the effect of the statement t->n = x on an abstract state]

Example: shape (heap) analysis

    void stack-init(int i) {
        Node* x = null;
        do {
            Node* t = malloc(…);
            t->n = x;
            x = t;
        } while (--i > 0);
        Top = x;
        assert(acyclic(Top));
    }

[Figure: abstract heaps at each program point, starting from emp (the empty heap), drawn with pointer variables t, x, and Top and n-field edges]

3) Exploration
[Figure: the stack-init program above annotated with the abstract heaps reached during exploration, from emp up to the heaps pointed to by Top]

Example: polyhedra (numerical) domain

    proc MC(n:int) returns (r:int)
    var t1:int, t2:int;
    begin
      if (n>100) then
        r = n-10;
      else
        t1 = n + 11;
        t2 = MC(t1);
        r = MC(t2);
      endif;
    end

    var a:int, b:int;
    begin
      b = MC(a);
    end

What is the result of this program?

McCarthy 91 function
if (n>=101) then n-10 else 91

    proc MC (n : int) returns (r : int)
    var t1 : int, t2 : int;
    begin
      /* (L6 C5) top */
      if n > 100 then
        /* (L7 C17) [|n-101>=0|] */
        r = n - 10;
        /* (L8 C14) [|-n+r+10=0; n-101>=0|] */
      else
        /* (L9 C6) [|-n+100>=0|] */
        t1 = n + 11;
        /* (L10 C17) [|-n+t1-11=0; -n+100>=0|] */
        t2 = MC(t1);
        /* (L11 C17) [|-n+t1-11=0; -n+100>=0; -n+t2-1>=0; t2-91>=0|] */
        r = MC(t2);
        /* (L12 C16) [|-n+t1-11=0; -n+100>=0; -n+t2-1>=0; t2-91>=0; r-t2+10>=0; r-91>=0|] */
      endif;
      /* (L13 C8) [|-n+r+10>=0; r-91>=0|] */
    end

    var a : int, b : int;
    begin
      /* (L18 C5) top */
      b = MC(a);
      /* (L19 C12) [|-a+b+10>=0; b-91>=0|] */
    end

Some things that should trouble you
• Does a result always exist?
• Does the recipe always converge?
• How “optimal” is the result?
• How do I pick my abstraction?
• How do I come up with abstract transformers?
• Other practical issues
  – Efficiency
  – How does it do in practice?

Abstraction refinement
[Figure: refinement loop. Verify takes the program, the specification, and the abstraction and outputs either "Valid" or an abstract counter example. Abstraction Refinement then either yields a counter example or changes the abstraction and feeds it back to Verify.]
Change the abstraction to match the program

Recap: program analysis
• Reason statically (at compile time) about the possible runtime behaviors of a program
• Use sound overapproximation of program behavior
• Abstract interpretation
  – abstract domain
  – transformers
  – exploration (fixed-point computation)
• Finding the right abstraction?

Next lecture: semantics of programming languages

References
• Patriot bug:
  – http://www.cs.usyd.edu.au/~alum/patriot_bug.html
  – Patrick Cousot’s NYU lecture notes
• Zune bug:
  – http://www.crunchgear.com/2008/12/31/zune-bug-explained-in-detail/
• Blaster worm:
  – http://www.sans.org/security-resources/malwarefaq/w32_blasterworm.php
• Interesting CACM article
  – http://cacm.acm.org/magazines/2010/2/69354-a-few-billion-lines-of-code-later/fulltext
• Interesting blog post
  – http://www.altdevblogaday.com/2011/12/24/static-code-analysis/
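
Appendix sketch: the sign-domain recipe
The three-step recipe from the Abstract Interpretation slides (abstraction, transformers, exploration) can be sketched in a few lines of Python for the running example x=3, y=1, do { y = y + c ... } while (--i > 0). This is only an illustration under stated assumptions, not the course's implementation: the function names alpha, join, add_const, analyze, and assertion_holds are hypothetical, and the loop body is restricted to a list of constants c, each meaning y = y + c.

```python
# Sign domain: '+', '0', '-', and '?' (unknown sign, the top element).
TOP = "?"


def alpha(n):
    """Abstraction: map a concrete integer to its sign."""
    return "+" if n > 0 else "-" if n < 0 else "0"


def join(a, b):
    """Least upper bound: equal signs are kept, different signs become '?'."""
    return a if a == b else TOP


def add_const(a, c):
    """Abstract transformer for v = v + c, where c is a concrete constant."""
    s = alpha(c)
    if s == "0":
        return a
    if a == "0" or a == s:
        return s
    return TOP  # opposite or unknown sign: the result sign is not determined


def join_states(s1, s2):
    """Join two abstract states variable by variable."""
    return {v: join(s1[v], s2[v]) for v in s1}


def analyze(body):
    """Exploration for: x=3; y=1; do { body } while (--i > 0); assert 0 < x+y.

    body is a list of constants c, each standing for  y = y + c.
    Returns the abstract state that reaches the assertion.
    """
    entry = {"x": alpha(3), "y": alpha(1), "i": TOP}  # i is an unknown input
    head = entry  # abstract state at the top of the loop body
    while True:
        state = dict(head)
        for c in body:
            state["y"] = add_const(state["y"], c)
        state["i"] = TOP  # --i: the sign of i stays unknown
        new_head = join_states(entry, state)  # first entry or back edge
        if new_head == head:  # fixed point reached
            return state
        head = new_head


def assertion_holds(state):
    """0 < x + y is guaranteed when neither sign is '-' or '?' and one is '+'."""
    signs = {state["x"], state["y"]}
    return signs <= {"+", "0"} and "+" in signs
```

Calling analyze([1]) models the loop body y = y + 1 and ends in the state (x=+, y=+, i=?), so assertion_holds succeeds. Calling analyze([-2, 3]) models the body from the Incompleteness slide and ends in (x=+, y=?, i=?), so the analysis can only report that the assertion may be violated. Termination of the loop in analyze relies on the sign domain being finite and the states only moving upward.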