Dynamically Validating Static Memory Leak Warnings
Mengchen Li
Joint work with: Yuanjun Chen and Linzhang Wang (Nanjing University), Guoqing Xu (University of California, Irvine)
ISSTA 2013

Outline
Background & Motivation
Overview
Examples
Algorithms
Evaluation

2016/6/28 Software Engineering Group

Background
Memory leak
An important source of severe memory errors (39% of all reported vulnerabilities since 1991, according to the US-CERT Vulnerability Notes Database)
Occurs when dynamically allocated memory cannot be reclaimed and reused
In C/C++, explicit, manual memory management can easily lead to memory leaks and other vulnerabilities

Motivation
Both static and dynamic analyses have been developed to find memory leaks.
Dynamic analysis cannot find all problems without high-quality test suites.
Static analysis can detect potential memory leaks without execution overhead, but it models real programs imprecisely (complex pointer arithmetic operations, an extremely large number of paths). It therefore reports a sea of likely warnings, with the true problems buried among them.
Manually inspecting static warnings to find the true leaks is tedious, labor-intensive, and time-consuming, which significantly limits the real-world usefulness of static analysis.

Motivation
Goal: reduce the number of warnings that need to be manually validated.
[Figure: a classification system that sorts warnings into "need to be fixed", "likely to be false warnings", and "remain to be validated".]

Overview
Our approach works with any static analysis tool whose warnings report:
an allocation site a,
a path fragment p, and
a potential leaking point e.
[Figure: example path fragment. If Size>0, Result = Malloc(size) executes (allocation site a). Then, if Size>10, the function returns NULL (potential leaking point e); otherwise it returns result.]
Currently, Fortify SCA serves as the example tool to demonstrate the effectiveness of the approach. The approach can also be used with other static leak detectors.

Overview
For each warning, we generate test cases that cover its path fragment and dynamically track the allocated memory objects.
Warnings are classified into four categories: MUST-LEAK, LIKELY-NOT-LEAK, BLOAT, and MAY-LEAK.

Basic Idea
In the ideal situation, each warning (allocation site a, potential leaking point e) falls into one of two groups, T (condition C1) or F (¬C1):
T condition: in some execution, we can find an object created by a that has no incoming reference right after e.
F condition: in all possible executions, all objects created by a have incoming references after e.

Basic Idea
In the real situation, neither condition can be checked directly:
The number of incoming references is difficult to determine exactly, and tracking it requires expensive instrumentation and data-flow tracking.
Testing techniques cannot cover "all possible executions", so the F condition cannot be established.

Basic Idea
To approximate the ideal conditions, MUST-LEAK (condition Cw) is used for T, and LIKELY-NOT-LEAK (condition Cs) and BLOAT (condition Cb) are used for F. All other warnings are MAY-LEAK:
MAY-LEAK = (T ∪ F) \ (BLOAT ∪ MUST-LEAK ∪ LIKELY-NOT-LEAK)

Category: MUST-LEAK
A warning is MUST-LEAK if, along at least one execution that covers the static warning path, an object created by the reported allocation site a is not reclaimed (freed) before the end of the execution.
[Figure: main calls the example function. When Size>0, Result = Malloc(size) executes (a). When Size>10, the function returns NULL at e, so the object is never freed before End.]

Category: LIKELY-NOT-LEAK
LIKELY-NOT-LEAK: for the static warning path,
(1) along all executions,
(2) for the objects created by a in all tests,
(3) all are accessed after point e, and
(4) all are reclaimed at the end.
[Figure: main calls a function in which, when Size>0, p = Malloc(size) is added to a global pointer array by addToGlobal(p). Later, for(i=0;i<num;i++) Write(ptrArr[i]) accesses every stored object, and freePtrArr() frees all objects in ptrArr before End.]

Category: BLOAT
BLOAT: for the static warning path,
(1) along all executions,
(2) for the objects created by a in all tests,
(3) some are never used after point e (they are stale), and
(4) all are reclaimed at the end.
[Figure: the same program as above, except that the loop for(i=0;i<10;i++) Write(ptrArr[i]) accesses only ptrArr[0] through ptrArr[9] before freePtrArr() frees all objects in ptrArr.]

Basic Idea
[Figure: memory leak warnings divided into MUST-LEAK, MAY-LEAK, LIKELY-NOT-LEAK, and BLOAT. MUST-LEAK warnings need to be fixed, MAY-LEAK warnings remain to be validated, and LIKELY-NOT-LEAK warnings are likely to be false warnings.]
Priority comparisons among the four categories:
Manual validation priority: MAY-LEAK > BLOAT > LIKELY-NOT-LEAK > MUST-LEAK
Fixing priority: MUST-LEAK > MAY-LEAK > BLOAT > LIKELY-NOT-LEAK

Algorithms
The approach combines path-guided concolic testing with object-based state tracking.
Pre-processing:
Instrumentation declares symbolic variables, marks the path fragment, and tracks the usage of each run-time object.
Reachability analysis, computed on the CFG, directs concolic testing to cover the path fragment more efficiently.

Test Generation
[Figure: test generation illustration.]
Built on a modified version of CREST.
Reachability: for each control-flow branch, whether the path fragment can potentially be reached from that branch.
A reachability map directs concolic testing, and unreachable paths are pruned from the concolic search space.

Update Tracking Data
[Figure: state diagram for tracking each allocated object.]
Freed1: the object did not cover p, or it was freed before e
Freed2: BLOAT
Freed3: LIKELY-NOT-LEAK
LP / UseAfterLP: MUST-LEAK

Experiment and Evaluation
Two experiments evaluate the effectiveness of the approach: one measures precision and efficiency, and one measures scalability.
They are designed to answer the following questions:
How accurate is the classification?
How much manual effort can it save?
How efficient is it?
Can it be applied to large-scale, real-world applications?

Experiment 1: Classification Accuracy and Efficiency
[Figure: per-program charts (e.g., 3.print_tokens2) of the number of injected leaks (#injection), the warnings classified as MUST-LEAK, LIKELY-NOT-LEAK, BLOAT, and MAY-LEAK, and categorized versus MAY-LEAK warnings.]
No true leak is mistakenly classified as LIKELY-NOT-LEAK or BLOAT.

Experiment 1: Classification Accuracy and Efficiency
[Figure: per-program counts of all warnings and of the warnings that remain MAY-LEAK.]
Of 155 warnings in total, only 37 remain MAY-LEAK, a 76.1% reduction.

Experiment 1: Classification Accuracy and Efficiency
[Figure: per-program running time (T0 vs. T1, in seconds) and peak memory consumption (Sp0 vs. Sp1, in MB). The charts are annotated with 24.8% for running time and 21.4% for peak memory.]

Experiment 2: A Large-scale Program
A case study on a large-scale program: Texinfo-4.13 (46,493 lines of code)
No leaks were injected into this application.
We manually wrote a set of input files. The concolic engine generated the command-line part of each test and chose among these input files.
The 91 warnings for Texinfo were classified into 69 MUST-LEAK, 1 LIKELY-NOT-LEAK, 0 BLOAT, and 21 MAY-LEAK, so only 21 of the 91 warnings remain MAY-LEAK (a 76.9% reduction).
The time and space overheads for this application are 77.5% and 26.4%.

Conclusions
We classify memory leak warnings into four categories: MUST-LEAK, LIKELY-NOT-LEAK, BLOAT, and MAY-LEAK.
This reduces human effort and improves productivity.
The approach combines path-guided concolic testing with object-based state tracking.
Future work:
More experiments using stronger test generation techniques
Extension to other types of vulnerabilities