CS5103 Software Engineering
Lecture 15: Static Bug Detection and Verification

Static bug detection
  Static bug detection is a minor approach to software quality assurance, compared with testing
  Compared to testing:
    Works only for specific kinds of bugs
    Sometimes not scalable
    Generates false positives
    Easy to start (no build, no setup, no install …)
    Can sometimes guarantee that the software is free of certain kinds of bugs
    No need for debugging

State of the art: static bug detection
  Type-specific detection (a fixed specification and improvement are provided)
    Targets major or important types of bugs:
      Null pointer, memory leak, unsafe cast, injection, buffer overflow, dynamic SQL error, race condition, deadlock, infinite loop, HTML error, UI inconsistency, i18n bugs, …
    A large number of techniques exist for each kind of bug
    Most of them have severe limitations that prevent practical use
  Specification-based detection
    Model checking, symbolic execution, theorem proving

Specification
  A description of the correct behavior of software
  A formal specification is required for static bug detection
  Three main types of specifications:
    Value
    Temporal
    Data flow

Value Specification
  The value(s) of one or several variables must satisfy a certain constraint
  Examples:
    Final exam score <= 100
    sortedlist(0) >= sortedlist(1)
    http_url.startsWith("http")
    Sql_query belongs to Language_SQL

Temporal Specification
  Two events (or a series of events) must happen in a certain order
  Examples:
    lock() -> unlock()
    file.open() -> file.close() and file.open() -> file.read()
    They are different, right? (close() must eventually follow open(); read() is only allowed after open())
  Temporal logic:
    lock() -> F(unlock())
    (!read()) U (open())

Data Flow Specification
  Data from a certain source must / must not flow to a certain sink
  Examples:
    ! Contact Info -> Internet
    Password -> encryption -> Internet
  Data flow specifications are mainly used for security

General Specifications
  Common behaviors of all software

  Specification                                  Bug if violated
  a/b -> b != 0                                  Divide by 0
  a.field -> a != null                           Null pointer reference
  a[x] -> x < a.length()                         Buffer overflow
  p.malloc() -> p.free()                         Memory leak
  lock(s) -> unlock(s)                           Deadlock
  while(Condition) -> F(!Condition)              Infinite loop
  <script> xxx </script> -> ! User_input -> xxx  XSS
  ! Hard-coded string -> User Interface          I18n error

Checking Specifications
  Basic ways
    Value specifications: symbolic execution
    Temporal specifications: model checking
    Data flow specifications: graph traversal (data dependence graph)

Static symbolic execution: Basic Example
  T is the condition for the statement to be executed; (y = ...) is the relationship of all variables to the inputs after the statement is executed

  Statement            State after the statement
  y = read();          T (y = s), s is a symbolic variable for input
  y = 2 * y;           T (y = 2*s)
  if (y <= 12)         T (y = 2*s)
      y = 3;           T ^ y<=12 (y = 3)
  else
      y = y + 1;       T ^ !(y<=12) (y = 2*s + 1)
  print("OK");         T ^ 2*s<=12 (y = 3) | T ^ !(2*s<=12) (y = 2*s + 1)

  Prove y > 0? Check that y <= 0 is unsatisfiable on every path:
    (2*s <= 12 & y = 3) & y <= 0              Not satisfiable
    !(2*s <= 12) & (y = 2*s + 1) & y <= 0     Not satisfiable

Static symbolic execution: Complex Example
  Statement            State after the statement
  y = read();          T (y = s), s is a symbolic variable for input
  p = 1;               T (p = 1, y = s)
  while (y < 10) {     T (p = 1, y = s)
      y = y + 1;       T ^ s<10 (y = s + 1, p = 1)
      if (y > 2)
          p = p + 1;   T ^ s<10 ^ s+1>2 (y = s + 1, p = 2)
      else
          p = p + 2;   T ^ s+1<=2 (y = s + 1, p = 3)
  }
  print(p);

  Second iteration, after y = y + 1:
    T ^ (2 < s+1 < 10 (y = s + 2, p = 2) | s+1 <= 2 (y = s + 2, p = 3))
    …
  The number of states grows with every loop iteration.

  Prove p > 0?
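The path-by-path reasoning of the basic example (y = read(); y = 2 * y; if (y <= 12) y = 3; else y = y + 1;) can be sketched in code. This is only a minimal illustration, not a real symbolic execution engine: the class SymbolicExecSketch and its helpers are hypothetical names, the input s is assumed to be a 32-bit integer with no overflow, path conditions are kept as a simple interval on s, and y is kept as a linear expression a*s + b. A real tool would hand the path conditions to a constraint solver instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: symbolic execution of the basic example by path enumeration. */
class SymbolicExecSketch {

    /** One path: path condition lo <= s <= hi, and symbolic value y = a*s + b. */
    static final class Path {
        final long lo, hi, a, b;

        Path(long lo, long hi, long a, long b) {
            this.lo = lo; this.hi = hi; this.a = a; this.b = b;
        }

        /** The path condition is satisfiable if the interval for s is non-empty. */
        boolean feasible() { return lo <= hi; }

        /** Smallest y on this path; valid because a >= 0 in this example. */
        long minY() { return a * lo + b; }
    }

    /** Symbolically runs: y = read(); y = 2*y; if (y <= 12) y = 3; else y = y + 1; */
    static List<Path> execute() {
        // Assumption: the input s is a 32-bit int and arithmetic does not overflow.
        long lo = Integer.MIN_VALUE, hi = Integer.MAX_VALUE;
        long a = 1, b = 0;          // y = read();  T (y = s)
        a *= 2; b *= 2;             // y = 2 * y;   T (y = 2*s)
        // For a > 0, a*s + b <= 12 is equivalent to s <= floor((12 - b) / a).
        long split = Math.floorDiv(12 - b, a);
        List<Path> paths = new ArrayList<>();
        paths.add(new Path(lo, Math.min(hi, split), 0, 3));          // then: y = 3
        paths.add(new Path(Math.max(lo, split + 1), hi, a, b + 1));  // else: y = 2*s + 1
        return paths;
    }

    /** y > 0 is proved if "y <= 0" is unsatisfiable on every feasible path. */
    static boolean provesYPositive(List<Path> paths) {
        for (Path p : paths) {
            if (p.feasible() && p.minY() <= 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("y > 0 proved: " + provesYPositive(execute()));
    }
}
```

The two Path objects correspond to the two disjuncts after print("OK"): s <= 6 with y = 3, and s >= 7 with y = 2*s + 1. The interval trick only works because this example is linear in one input; the loop in the complex example would add a new set of paths per iteration.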
Checking Specifications
  Basic ways
    Value specifications: symbolic execution
    Temporal specifications: model checking
    Data flow specifications: graph traversal (data dependence graph)

Model Checking
  Basic idea
    Transform the program into an automaton
    Program states are the states of the automaton, and statements are its transitions / edges
    Check temporal properties on the automaton by traversing it

Model Checking: Model Building
  Basic approach:
    Use the control flow graph
      View all program states after a statement as ONE state
    Use abstract states
      View all program states after a statement with the same abstract values as ONE state
    Use concrete values
      View all program states after a statement with the same concrete values as ONE state: usually impossible

An example with CFG-model
  Checking whether a file is closed in all cases

    boolean load() {
        f.open();
        line = f.read();
        while (line != null) {
            if (line.contains("key")) {
                f.close();
                return true;
            } else if (line.contains("value")) {
                f.close();
            }
            line = f.read();
        }
        return false;
    }

  [Figure: CFG model of load(). States: Start (f is not open), opened, new line read, closed (key branch), closed (value branch), ret. Edges labeled !=null, ==null, key, value, none.]

An example with CFG-model
  Traversing the model to find counterexamples
  [Figure: the same CFG model of load()]

An example with CFG-model
  Read must happen before close
  [Figure: the same CFG model of load()]

Temporal Logic
  The basic idea of model checking is to find a path in the model that violates the specification
  Temporal logic describes the sequential relationship among a number of events: the specification
    So that any specification can simply be read by a path-finding tool
    No need to write a path-finding tool for each proof

Usage of Temporal Logic
  Describe the sequential relationship among a number of events
  U: until
    P U Q means that P has to be true until Q is true
  F: future
    F P means that P will be true some time in the future
  Examples:
    !read(f) U open(f)
    !close(f) U open(f)
    open(f) -> F close(f)
    close(f) -> !F read(f)

Checking Specifications
  Basic ways
    Value specifications: symbolic execution, abstract interpretation
    Temporal specifications: model checking
    Data flow specifications: graph traversal (data dependence graph)

Some simple checks with graph traversal
  Check that x flows to w
  Check (!z used as divisor) U (z is written)

Problems of static bug detection
  Lack of specifications
    Project-specific formal specifications are very rare
    Solutions:
      General specifications (for typical bugs)
      Mining specifications (for API-specific and project-specific specifications)
  False positives vs. efficiency
    More sensitivities -> higher cost
    Path sensitivity is rarely achieved
    Combination of all sensitivities -> incomputable problems

State of the practice: static bug detection
  Findbugs
    A tool developed by researchers from UMD
    Widely used in industry for code checking before commit
    The idea actually comes from Lint
  Lint
    A code style enforcing tool for the C language
    Finds bad coding styles and raises warnings
      Bad naming
      Hard-coded strings
      …

Idea: do it in reverse
  Most static bug detection tools
    Set up a specification (either from users or a well-defined one)
      E.g., a divisor should not be 0, a null pointer should not be dereferenced, a person's salary cannot be negative
    Check all possible cases to guarantee that the specification holds
    Otherwise provide counterexamples
  Findbugs
    Detects code patterns for bugs
      E.g., a = null; b = a.field;
            str.replace(" ", "");

Characteristics of Findbugs
  Based on existing concrete code patterns
  Checks code patterns locally: only intraprocedural analysis
  Performs bug ranking according to the probability and potential severity of bugs
    Probability: the reported bug is likely to be a true bug
    Severity: the bug may cause severe consequences if not fixed
  What are the advantages and disadvantages of doing so?

Application of Findbugs-like tools
  Findbugs is adopted by a number of large companies such as Google
  Usually only the issues with the highest confidence / severity are reported as issues
  Statistics from Google, 2009:
    More than 4000 issues were identified, of which 1700 bugs were confirmed and 1100 were fixed
  The software department of USAA uses PMD, an alternative to Findbugs

Patterns to be checked
  404 bug patterns in 6 major categories
    Bad Practice / Dodgy code
    Correctness
    Internationalization
    Vulnerability / Security
    Multithread correctness
    Performance

Bad Practice / Dodgy code
  Hackish code that is not stable and may harm future maintenance
  Examples:
    Equals method should not assume the type of its object argument
      public boolean equals(Object o) {
          Myclass my = (Myclass) o;   // unchecked cast: fails for other types
          return my.id == this.id;
      }
    Abstract class defines covariant compareTo() method
      int compareTo(Myclass obj) { … }

Correctness
  The code pattern may result in incorrect behavior of the software
  Examples:
    DMI: Collections should not contain themselves
      List s = new …;
      …
      if (s.contains(s)) { … }
    DMI: Invocation of hashCode on an array
      int[] x = new int[10];
      …
      x.hashCode();

Internationalization
  A code pattern that will hinder future i18n of the software
  Examples:
    Use of toUpperCase, toLowerCase on localized strings
      String s = getLocale(key);
      s.toUpperCase();
    Use of getBytes() on localized strings
      String s = getLocale(key);
      s.getBytes();

Multi-thread correctness
  A code pattern that may cause incorrect behavior in multi-threaded execution
  Example:
    Synchronization on boxed primitive
      private static Boolean inited = Boolean.FALSE;
      ...
      synchronized (inited) {
          if (!inited) {
              init();
              inited = Boolean.TRUE;
          }
      }
      ...
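The synchronized(inited) example is a bug because Boolean.FALSE is a shared object that any other code can also lock on, and because assigning Boolean.TRUE changes which object later threads lock on. One common way to avoid the pattern is sketched below, under the assumption that a dedicated private lock object and a primitive flag are acceptable in the surrounding code. The names InitGuardSketch, LOCK, initCalls and runDemo are hypothetical and exist only for this illustration.

```java
/** Hypothetical fix sketch for the "synchronization on boxed primitive" pattern. */
class InitGuardSketch {
    // A private lock object that no other code can reach or replace.
    private static final Object LOCK = new Object();
    private static boolean inited = false;   // primitive flag, guarded by LOCK
    private static int initCalls = 0;        // counts init() runs, guarded by LOCK

    private static void init() {
        initCalls++;
    }

    /** Runs init() at most once, no matter how many threads call this method. */
    static void ensureInitialized() {
        synchronized (LOCK) {
            if (!inited) {
                init();
                inited = true;
            }
        }
    }

    static int initCalls() {
        synchronized (LOCK) {
            return initCalls;
        }
    }

    /** Calls ensureInitialized() from several threads and returns how often init() ran. */
    static int runDemo(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(InitGuardSketch::ensureInitialized);
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return initCalls();
    }

    public static void main(String[] args) {
        System.out.println("init() calls: " + runDemo(8));
    }
}
```

Every thread locks on the same LOCK object for the whole lifetime of the class, so the check and the update of the flag are always protected by one monitor.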
Vulnerability / Security
  The code pattern may result in vulnerability or security issues
  Examples:
    SQL: A SQL query is generated from a non-constant String
      String str = "select" + bb + " ddd" + …
      server.execute(str);
    XSS: This code directly writes an HTTP parameter to JSP output, which allows for a cross-site scripting vulnerability
      Para = request.getParameter(key);
      out.print(Para);

Performance
  The code pattern may harm the performance of the software
  Example:
    SBSC: Method concatenates strings using + in a loop
      String s = "";
      for (int i = 0; i < field.length; ++i) {
          s = s + field[i];
      }
    Better:
      StringBuffer buf = new StringBuffer();
      for (int i = 0; i < field.length; ++i) {
          buf.append(field[i]);
      }
      String s = buf.toString();

Major problem: False positives
  Overall precision
    5% to 10% on open source and industry projects
  Developers want to make sure they do not waste effort on a false positive
  Usually there are more bugs than developers can fix

Solution: Bug ranking
  Ranking bug categories
    Some categories are more likely to be bugs than others
  How to give a score to each category?
    Check a large number of issues in the history of the software
    How large a proportion was fixed?
  Raises precision to about 30% in the top-ranked 25% of bugs

Findbugs
  Disadvantages
    Cannot guarantee that the software is free of certain bugs
    Still involves many false positives
  Advantages
    Easy to start
    Scalable
    Relatively fewer false positives
    Somewhat like testing
  Has become the most popular and practical static bug detection technique

Review of Static Bug Detection
  Specification-based static bug detection
    Value specifications: symbolic execution, abstract interpretation
    Temporal specifications: model checking
    Data flow specifications: dependence graph traversal
  Pattern-based static bug detection
    Findbugs
    Bug ranking
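The category-ranking idea from the bug ranking slide (score each category by the proportion of its past issues that developers actually fixed) might be sketched as follows. This is only an illustration under simple assumptions, not how Findbugs itself ranks warnings: the class BugRankingSketch, the CategoryHistory type, and all counts in main are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: rank bug categories by their historical fix rate. */
class BugRankingSketch {

    /** Issue history of one category: how many warnings were reported and how many fixed. */
    static final class CategoryHistory {
        final String category;
        final int reported;
        final int fixed;

        CategoryHistory(String category, int reported, int fixed) {
            this.category = category;
            this.reported = reported;
            this.fixed = fixed;
        }

        /** Proportion of reported issues that were fixed; 0 if nothing was reported. */
        double fixRate() {
            return reported == 0 ? 0.0 : (double) fixed / reported;
        }
    }

    /** Returns category names ordered from highest to lowest fix rate. */
    static List<String> rankCategories(List<CategoryHistory> history) {
        List<CategoryHistory> sorted = new ArrayList<>(history);
        sorted.sort((x, y) -> Double.compare(y.fixRate(), x.fixRate()));
        List<String> names = new ArrayList<>();
        for (CategoryHistory h : sorted) {
            names.add(h.category);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        List<CategoryHistory> history = new ArrayList<>();
        history.add(new CategoryHistory("Performance", 200, 20));   // fix rate 0.10
        history.add(new CategoryHistory("Correctness", 100, 60));   // fix rate 0.60
        history.add(new CategoryHistory("Bad Practice", 300, 45));  // fix rate 0.15
        System.out.println(rankCategories(history));
    }
}
```

Warnings from the categories at the front of the list would be shown to developers first, which is how ranking can raise precision among the top-ranked reports.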