Dynamically Validating Static Memory Leak Warnings
Mengchen Li
Joint work with:
Yuanjun Chen, Linzhang Wang, Nanjing University
Guoqing Xu, University of California, Irvine
ISSTA 2013
Outline
- Background & Motivation
- Overview
- Examples
- Algorithms
- Evaluation
2016/6/28 Software Engineering Group
Background
Memory Leak
- An important source of severe memory errors: 39% of all vulnerabilities reported since 1991, according to the US-CERT Vulnerability Notes Database
- Occurs when dynamically allocated memory cannot be reclaimed and reused
- In C/C++, explicit and manual memory management can easily lead to memory leaks and other vulnerabilities
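As a minimal illustration of this last point (the function names here are my own, not from the talk), a leak arises whenever the only pointer to a heap block is dropped before `free` is called:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of src, or NULL on failure.
   The caller owns the buffer and must free() it. */
char *copy_string(const char *src) {
    char *buf = malloc(strlen(src) + 1);
    if (buf != NULL)
        strcpy(buf, src);
    return buf;
}

/* Leaks: the buffer returned by copy_string is never freed,
   and the last pointer to it is lost when this function returns. */
void leaky_use(void) {
    char *copy = copy_string("hello");
    (void)copy;            /* ...use copy... */
    /* missing: free(copy); */
}
```

Once `leaky_use` returns, no reference to the allocation survives, so the block can never be reclaimed or reused.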
Motivation
Both static and dynamic analysis have been developed to find memory leaks.
- Dynamic analysis:
  - Cannot find all problems without high-quality test suites
- Static analysis:
  - Can detect potential memory leaks without execution overhead
  - Imprecise modeling of real programs: complex pointer arithmetic operations; extremely large number of paths
  - Reports a sea of likely warnings, with the true problems buried among them
Manually inspecting static warnings to find true leaks is tedious, labor-intensive, and time-consuming, which significantly limits the real-world usefulness of static analysis.
Motivation
Goal: reduce the number of warnings that need to be manually validated.
[Figure: warnings flow into a classification system, which separates them into warnings that need to be fixed, warnings likely to be false, and warnings that remain to be validated]
Overview
- Our approach works for all static analysis tools producing:
  - Allocation site a
  - Path fragment p
  - Potential leaking point e
- Currently we use Fortify SCA as an example tool:
  - Demonstrate the effectiveness
  - Can be used on other static leak detectors
[Figure: example warning as a control-flow graph — "result = malloc(size)" is the allocation site a; branches on size>10 and size>0 lead either to "return NULL" (the potential leaking point e) or to "return result"]
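The warning format the approach consumes can be sketched as a small record (the field names and layout here are my own illustration, not Fortify SCA's actual output schema):

```c
#include <stddef.h>

#define MAX_FRAGMENT 32

/* One static leak warning: allocation site a, path fragment p,
   and potential leaking point e, each given as a source line. */
typedef struct {
    int    alloc_site;              /* a: line of the allocation        */
    int    fragment[MAX_FRAGMENT];  /* p: lines along the warning path  */
    size_t fragment_len;
    int    leak_point;              /* e: line where the leak is reported */
} LeakWarning;
```

Any static detector that can emit such an (a, p, e) triple can feed the classification system.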
Overview
- Generate test cases to cover the path fragment of each warning, and dynamically track the allocated memory objects
- Warnings are classified into four categories:
  - MUST-LEAK
  - LIKELY-NOT-LEAK
  - BLOAT
  - MAY-LEAK
Basic Idea
In the ideal situation, given allocation site a and potential leaking point e, divide warnings into T (C1) and F (¬C1):
- T condition: in some execution, we can find an object created by a that has no incoming reference right after e
- F condition: in all possible executions, all objects created by a have incoming references after e
Basic Idea
Given allocation site a and potential leaking point e, in the real situation:
- The number of incoming references is difficult to determine exactly, and requires expensive instrumentation and data-flow tracking
- Because we are restricted to testing techniques, "in all possible executions" cannot be satisfied
Basic Idea
To approximate the ideal condition:
- T is approximated by MUST-LEAK (Cw)
- F is approximated by LIKELY-NOT-LEAK (Cs) and BLOAT (Cb)
- MAY-LEAK = (T ∪ F) \ (MUST-LEAK ∪ LIKELY-NOT-LEAK ∪ BLOAT)
Category: MUST-LEAK
A warning is MUST-LEAK if:
(1) along at least one execution covering the static warning path,
(2) an object created by the reported leaking allocation site a is
(3) not reclaimed (freed) before the end of the execution
[Figure: control-flow graph of main calling the warned function — "result = malloc(size)" (allocation site a), branches on size>0 and size>10, "return NULL" (leaking point e), and "return result"]
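The control-flow graph on this slide can be rendered roughly as the following C sketch (a reconstruction of the figure, not the authors' exact code): the `malloc` is allocation site a, and the `return NULL` branch is the potential leaking point e, since the only reference to the fresh block is dropped there without a `free`.

```c
#include <stdlib.h>

/* Allocation site a: the malloc below.
   Potential leaking point e: the `return NULL` branch. */
void *alloc_checked(size_t size) {
    void *result = malloc(size);   /* allocation site a */
    if (size > 10) {
        /* MUST-LEAK: every execution reaching here with a
           successful malloc never reclaims the object. */
        return NULL;               /* leaking point e */
    }
    if (size > 0)
        return result;             /* caller takes ownership */
    free(result);
    return NULL;
}
```

A generated test with `size > 10` drives execution through e and observes that the object is still unreclaimed at program exit, confirming MUST-LEAK.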
Category: LIKELY-NOT-LEAK
A warning is LIKELY-NOT-LEAK if:
(1) along all executions covering the static warning path,
(2) for the objects created by a in all tests,
(3) all are accessed after point e, and
(4) all are reclaimed in the end
[Figure: the callee runs "p = malloc(size); addToGlobal(p)"; after point e, main runs "for (i = 0; i < num; i++) write(ptrArr[i]);" and then "freePtrArr(); // free all in ptrArr"]
Category: BLOAT
A warning is BLOAT if:
(1) along all executions covering the static warning path,
(2) for the objects created by a in all tests,
(3) some are never used after point e (stale), and
(4) all are reclaimed in the end
[Figure: same structure as the LIKELY-NOT-LEAK example, but only "ptrArr[i]" for i < 10 is written after point e before "freePtrArr(); // free all in ptrArr"]
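The two benign categories can be contrasted in one sketch along the lines of the slides' pseudo-code (`addToGlobal`, `ptrArr`, and `freePtrArr` are hypothetical helpers reconstructed from the figures):

```c
#include <stdlib.h>

#define MAX_PTRS 64

/* Global registry mirroring the slides' ptrArr. */
int *ptrArr[MAX_PTRS];
int  numPtrs = 0;

void addToGlobal(int *p) {
    if (numPtrs < MAX_PTRS)
        ptrArr[numPtrs++] = p;
}

/* Every registered object is eventually reclaimed here, so the
   static warning at the allocation site is benign either way. */
void freePtrArr(void) {
    for (int i = 0; i < numPtrs; i++)
        free(ptrArr[i]);
    numPtrs = 0;
}

void run(int num, int use_all) {
    for (int i = 0; i < num; i++) {
        int *p = malloc(sizeof *p);  /* allocation site a */
        addToGlobal(p);
    }                                /* point e: p escapes only via
                                        the global array */
    /* LIKELY-NOT-LEAK when use_all != 0: every object is
       accessed after e; BLOAT when use_all == 0: the objects
       stay reachable but are never used again (stale). */
    for (int i = 0; i < (use_all ? numPtrs : 0); i++)
        *ptrArr[i] = i;
    freePtrArr();                    /* all reclaimed in the end */
}
```

In both runs nothing actually leaks; the category only records whether the retained objects were ever touched after the reported leaking point.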
Basic Idea
Memory leak warnings are partitioned into MUST-LEAK, LIKELY-NOT-LEAK, BLOAT, and MAY-LEAK: MUST-LEAK warnings need to be fixed, LIKELY-NOT-LEAK and BLOAT warnings are likely to be false, and MAY-LEAK warnings remain to be validated.
Priority comparisons among the four categories:
- Manual validation priority: MAY-LEAK > BLOAT > LIKELY-NOT-LEAK > MUST-LEAK
- Fixing priority: MUST-LEAK > MAY-LEAK > BLOAT > LIKELY-NOT-LEAK
Algorithms
- Path-guided concolic testing
- Object-based state tracking
- Pre-processing:
  - Instrumentation: declares symbolic variables, marks the path fragment, and tracks the usage of each run-time object
  - Reachability analysis: computed on the CFG; directs concolic testing to cover the path fragment more efficiently
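A minimal sketch of the pruning idea (my own illustration, not CREST's actual data structures): the CFG reachability analysis yields a map from branches to a flag, and during the concolic search a branch is only worth negating if the warning's path fragment is still reachable from it.

```c
#include <stdbool.h>

#define NUM_BRANCHES 8

/* reachable[b] is true iff the warning's path fragment can be
   reached from branch b (precomputed once on the CFG). */
bool reachable[NUM_BRANCHES];

/* During the concolic search, only branches that can still reach
   the path fragment are negated; all others are pruned, which
   shrinks the search space the solver must explore. */
bool worth_flipping(int branch) {
    return branch >= 0 && branch < NUM_BRANCHES && reachable[branch];
}
```

The real implementation consults such a map at every branch of the recorded path before asking the constraint solver for a flipped input.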
Test Generation Illustration
- Modified CREST
- Reachability: for each control-flow branch, whether the path fragment can potentially be reached from this branch
- Use a reachability map to direct concolic testing
- Prune unreachable paths from the concolic search space
Update Tracking Data
Each tracked object ends in one of the following states:
- Freed1: did not cover p, or was freed before e
- Freed2: BLOAT
- Freed3: LIKELY-NOT-LEAK
- LP / UseAfterLP: MUST-LEAK
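The slide's states can be summarized as a small lookup (a sketch: the state names follow the slide, but the enum, the mapping function, and the treatment of Freed1 as unclassified evidence are my own assumptions):

```c
#include <string.h>

/* Per-object tracking states from the slide. */
typedef enum {
    FREED1,        /* did not cover p, or freed before e      */
    FREED2,        /* reachable but never used after e        */
    FREED3,        /* used after e and freed in the end       */
    LP,            /* reached leaking point e, never freed    */
    USE_AFTER_LP   /* used after e, never freed               */
} TrackState;

/* Maps one object's final state to the warning category that
   state supports. */
const char *categorize(TrackState s) {
    switch (s) {
    case FREED2:       return "BLOAT";
    case FREED3:       return "LIKELY-NOT-LEAK";
    case LP:
    case USE_AFTER_LP: return "MUST-LEAK";
    default:           return "UNCLASSIFIED"; /* Freed1: no evidence */
    }
}
```

The per-warning verdict then aggregates these per-object states over all generated tests, as defined in the category slides above.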
Experiment and Evaluation
- Two experiments to evaluate the effectiveness:
  - Precision and efficiency
  - Scalability
- To answer:
  - How accurate is the classification?
  - How much manual effort can it save?
  - How efficient is it?
  - Does it perform on large-scale, real-world applications?
Experiment 1: Classification Accuracy and Efficiency
[Chart: per-benchmark counts of injected leaks (#injection), categorized warnings (#categorized), and remaining MAY-LEAK warnings (#may-leak), broken down into MUST-LEAK, LIKELY-NOT-LEAK, BLOAT, and MAY-LEAK, for benchmarks including 3.print_tokens2]
NO true leak is mistakenly classified into LIKELY-NOT-LEAK or BLOAT.
Experiment 1: Classification Accuracy and Efficiency
[Chart: total warnings vs. remaining MAY-LEAK warnings per benchmark — 155 warnings in total, of which 37 remain MAY-LEAK, a 76.1% reduction in warnings that must be manually validated]
Experiment 1: Classification Accuracy and Efficiency
[Chart: running time (T0 vs. T1, in seconds) and peak memory consumption (Sp0 vs. Sp1, in MB) per benchmark — overheads of 24.8% in time and 21.4% in memory]
Experiment 2: A Large-scale Program
- A case study on a large-scale program:
  - Texinfo-4.13 (46,493 lines of code)
  - No leak injection for this application
  - Manually wrote a set of input files; let the concolic engine generate the command-line part and choose among the input files
- 91 warnings for texinfo, classified into 69 MUST-LEAK, 1 LIKELY-NOT-LEAK, 0 BLOAT, and 21 MAY-LEAK (a 76.9% reduction)
- Time and space overheads for this application are 77.5% and 26.4%
Conclusions
- Classify memory leak warnings into four categories: MUST-LEAK, LIKELY-NOT-LEAK, BLOAT, and MAY-LEAK
- Reduce human effort and improve productivity
- Combine path-guided concolic testing with object-based state tracking
- Future work:
  - More experiments using stronger test-generation techniques
  - Extend to other types of vulnerabilities