CS5103 Software Engineering

Lecture 15
Static Bug Detection and
Verification
Static bug detection


Static bug detection is a minor approach to
software quality assurance, compared with testing

Compared to testing:

Works only for specific kinds of bugs

Sometimes not scalable

Generates false positives

Easy to start (no build, no setup, no install …)

Can sometimes guarantee that the software is free of
certain kinds of bugs

No need for debugging
State of the art: static bug detection

Type-specific detection (Fixed Specification and
improvement is provided)

Major or important types of bugs:
Null pointer, memory leak, unsafe cast, injection, buffer
overflow, dynamic SQL error, race condition, deadlock, infinite loop,
HTML error, UI inconsistency, i18n bugs, …

A large number of techniques for each kind of bug

Most of them have severe limitations that prevent
practical use

Specification-based detection

Model checking, symbolic execution, theorem proving
Specification



A description of the correct behavior of
software
A formal specification is required for static
bug detection
Three main types of specifications

Value

Temporal

Data Flow
Value Specification


The value(s) of one or more variables must
satisfy a certain constraint
Example:

Final Exam Score <= 100

sortedlist(0) >= sortedlist(1)

http_url.startsWith(“http”)

Sql_query belongs to Language_SQL
Temporal Specification



Two events (or a series of events) must happen in a
certain order
Example

lock() -> unlock()

file.open() -> file.close() and file.open() -> file.read()

They are different, right?
Temporal Logic

lock() -> F(unlock())

(!read())U(open())
Data Flow Specification



Data from a certain source must / must not flow to
a certain sink
Example:

! Contact Info -> Internet

Password -> encryption -> Internet
Data flow specifications are mainly used for
security
General Specifications

Common behaviors of all software

Divide by 0:             a/b -> b!=0

Null Pointer Reference:  a.field -> a!=null

Buffer Overflow:         a[x] -> x<a.length()

Memory Leak:             p.malloc() -> p.free()

Deadlock:                lock(s) -> unlock(s)

Infinite Loop:           while(Condition) -> F(!Condition)

XSS:                     <script> xxx </script> -> ! User_input -> xxx

I18n error:              ! Hard-coded string -> User Interface
Checking Specifications

Basic ways

Value Specifications: Symbolic execution

Temporal Specification: Model Checking

Data Flow Specification: Graph traversal (Data Dependence Graph)
Static symbolic execution

Basic Example

Here T is the condition for the statement to be executed, and
(y=s) is the relationship of all variables to the inputs after
the statement is executed

y = read();          T (y=s), s is a symbolic variable for input
y = 2 * y;           T (y=2*s)
if (y <= 12)         T (y=2*s)
    y = 3;           T^y<=12 (y = 3)
else
    y = y + 1;       T^!(y<=12) (y = 2*s + 1)
print ("OK");        T^2*s<=12 (y = 3) | T^!(2*s<=12) (y = 2*s + 1)

Prove y > 0?

(2*s <= 12 & y = 3) & y <= 0             Not Satisfiable
!(2*s <= 12) & (y = 2*s + 1) & y <= 0    Not Satisfiable
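The two "Not Satisfiable" results can be spelled out step by step. This is a sketch that only restates the slide's formulas: to prove y > 0 at the print statement, the negated property y <= 0 is conjoined with each path's condition and state, and each conjunction is shown to be contradictory.

```latex
\begin{align*}
\textbf{Path 1 (then-branch):}\quad
  & (2s \le 12) \land (y = 3) \land (y \le 0) \\
  & \Rightarrow\; 3 \le 0, \quad \text{which is false.} \\[4pt]
\textbf{Path 2 (else-branch):}\quad
  & \lnot(2s \le 12) \land (y = 2s + 1) \land (y \le 0) \\
  & \Rightarrow\; (2s > 12) \land (2s + 1 \le 0) \\
  & \Rightarrow\; (2s > 12) \land (2s \le -1), \quad \text{which is false.}
\end{align*}
```

Since y <= 0 is unsatisfiable on both paths, y > 0 holds whenever print is reached.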
Static symbolic execution

Complex Example

y = read();            T (y = s), s is a symbolic variable for input
p = 1;                 T (p = 1, y = s)
while(y < 10){         T (p = 1, y = s)
    y = y + 1;         T^s<10 (y = s + 1, p = 1)
    if y > 2
        p = p + 1;     T^2<s+1<10 (y = s + 1, p = 2)
    else
        p = p + 2;     T^s+1<=2 (y = s + 1, p = 3)
}                      T^2<s+1<10 (y = s + 1, p = 2) | T^s+1<=2 (y = s + 1, p = 3)
                       next iteration: (y = s + 2, p = 3) | …
print (p);

Prove p > 0?
Model Checking

Basic idea



Transform the program to an automaton
Program states are states of the automaton, and
statements are transitions / edges
Checking temporal properties on the automaton
by traversing it
Model Checking: Model Building

Basic approach:

Use Control Flow Graph:
View all program states after a statement as ONE
state

Use Abstract states:
View all program states after a statement with same
abstract values as ONE state

Use Concrete values:
View all program states after a statement with same
concrete values as ONE state: usually impossible
An example with CFG-model

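Checking whether a file is closed in all
cases

boolean load(){
    f.open();
    line = f.read();
    while(line != null){
        if(line.contains("key")){
            f.close();          // closed before returning
            return true;
        }else if(line.contains("value")){
            f.close();          // closed, but the loop goes on and reads again
        }
        line = f.read();
    }
    return false;               // can be reached without any close()
}

[Figure: CFG model of load(). States: Start, f is not open, opened, new line read, closed, ret. Edge labels: !=null, ==null, key, value, none.]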
An example with CFG-model

Traversing the model to find
counterexamples

[Figure: the same CFG model. States: Start, f is not open, opened, new line read, closed, ret. Edge labels: !=null, ==null, key, value, none.]
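The traversal can be sketched as a depth-first search over the CFG of load(). This is a minimal illustration, not the lecture's tool: the node names (open, loopTest, closeKey, …) and the class CfgModelCheck are made up here, and the property checked is "every path from open to a return passes through a close".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CfgModelCheck {

    /** CFG of load(); node names are hypothetical labels for its statements. */
    static Map<String, List<String>> buildCfg() {
        Map<String, List<String>> cfg = new LinkedHashMap<>();
        cfg.put("open", List.of("read"));
        cfg.put("read", List.of("loopTest"));
        cfg.put("loopTest", List.of("keyTest", "returnFalse")); // line != null / line == null
        cfg.put("keyTest", List.of("closeKey", "valueTest"));
        cfg.put("closeKey", List.of("returnTrue"));
        cfg.put("valueTest", List.of("closeValue", "readNext"));
        cfg.put("closeValue", List.of("readNext"));
        cfg.put("readNext", List.of("loopTest"));
        cfg.put("returnTrue", List.of());
        cfg.put("returnFalse", List.of());
        return cfg;
    }

    /** Returns a path from `from` to an exit that visits no node in `avoid`, or an empty list. */
    static List<String> findPathAvoiding(Map<String, List<String>> cfg, String from,
                                         Set<String> exits, Set<String> avoid) {
        Deque<String> path = new ArrayDeque<>();
        if (dfs(cfg, from, exits, avoid, new HashSet<>(), path)) {
            return new ArrayList<>(path);
        }
        return List.of();
    }

    private static boolean dfs(Map<String, List<String>> cfg, String node, Set<String> exits,
                               Set<String> avoid, Set<String> visited, Deque<String> path) {
        // Skip close() nodes and nodes that were already explored.
        if (avoid.contains(node) || !visited.add(node)) return false;
        path.addLast(node);
        if (exits.contains(node)) return true;
        for (String next : cfg.get(node)) {
            if (dfs(cfg, next, exits, avoid, visited, path)) return true;
        }
        path.removeLast(); // dead end: backtrack
        return false;
    }

    public static void main(String[] args) {
        List<String> counterexample = findPathAvoiding(buildCfg(), "open",
                Set.of("returnTrue", "returnFalse"), Set.of("closeKey", "closeValue"));
        System.out.println("Path that never closes f: " + counterexample);
    }
}
```

A non-empty result is a counterexample to the specification. Here the search finds the path that leaves the loop on line == null without ever calling f.close().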
An example with CFG-model

Read must happen before close

[Figure: the same CFG model. States: Start, f is not open, opened, new line read, closed, ret. Edge labels: !=null, ==null, key, value, none.]
Temporal Logic


The basic idea of model checking is to find a
path in the model that violates the
specification

Temporal logic describes the sequential relationship among a
number of events: the specification

So that any specification can simply be read by a path
finding tool

No need to write a path finding tool for
each proof
Usage of Temporal Logic


Describe the sequential relationship among
a number of events

U: Until

P U Q means that P has to be true until Q is true

!read(f) U open(f)
!close(f) U open(f)

F: Future

F P means that P will be true some time in the future

open(f) -> F close(f)
close(f) -> !F read(f)
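To make U and F concrete, the four formulas can be evaluated over a single finite trace of events. This is only a sketch under stated assumptions: the class TraceChecker and its method names are invented here, the implications are read as holding at every position of the trace, and U is treated as strong until (Q must actually occur). A model checker examines all paths of the model rather than one trace.

```java
import java.util.List;

/** Evaluates the slide's temporal patterns over one finite trace of event names. */
final class TraceChecker {

    /** (!forbidden) U required: no `forbidden` event before the first `required` event. */
    static boolean notUntil(List<String> trace, String forbidden, String required) {
        for (String event : trace) {
            if (event.equals(required)) return true;    // Q holds: obligation discharged
            if (event.equals(forbidden)) return false;  // !P broken before Q was seen
        }
        return false; // strong until (assumed): Q must occur somewhere in the trace
    }

    /** trigger -> F response: every trigger is eventually followed by a response. */
    static boolean eventuallyAfter(List<String> trace, String trigger, String response) {
        boolean pending = false;
        for (String event : trace) {
            if (event.equals(trigger)) pending = true;
            else if (event.equals(response)) pending = false;
        }
        return !pending;
    }

    /** trigger -> !F forbidden: no forbidden event occurs after a trigger. */
    static boolean neverAfter(List<String> trace, String trigger, String forbidden) {
        boolean triggered = false;
        for (String event : trace) {
            if (triggered && event.equals(forbidden)) return false;
            if (event.equals(trigger)) triggered = true;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> good = List.of("open", "read", "close");
        List<String> leak = List.of("open", "read");
        List<String> late = List.of("open", "close", "read");
        System.out.println(notUntil(good, "read", "open"));         // !read(f) U open(f)
        System.out.println(eventuallyAfter(leak, "open", "close")); // open(f) -> F close(f)
        System.out.println(neverAfter(late, "close", "read"));      // close(f) -> !F read(f)
    }
}
```

The trace open, read, close satisfies all four formulas; dropping the close violates open(f) -> F close(f), and reading after close violates close(f) -> !F read(f).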
Checking Specifications

Basic ways

Value Specifications: Symbolic execution, Abstract Interpretation

Temporal Specification: Model Checking

Data Flow Specification: Graph traversal (Data Dependence Graph)
Some Simple Checks with Graph Traversal
Check x flows to w
Check (!z used as divisor) U (z is written)
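The first check, "x flows to w", amounts to reachability on the data dependence graph. The sketch below is a minimal illustration: the graph is hypothetical (the slide's figure did not survive extraction), with an edge a -> b meaning that the value of a is used to compute b, and the class name DataFlowCheck is invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DataFlowCheck {

    /** Hypothetical graph for: y = x + 1; w = y * 2; v = z / 3; */
    static Map<String, List<String>> exampleGraph() {
        return Map.of(
                "x", List.of("y"),
                "y", List.of("w"),
                "z", List.of("v"));
    }

    /** True if data from `source` can reach `sink` along data-dependence edges (BFS). */
    static boolean flowsTo(Map<String, List<String>> ddg, String source, String sink) {
        Deque<String> work = new ArrayDeque<>(List.of(source));
        Set<String> seen = new HashSet<>(List.of(source));
        while (!work.isEmpty()) {
            String node = work.removeFirst();
            if (node.equals(sink)) return true;
            // Nodes without outgoing edges are simply absent from the map.
            for (String next : ddg.getOrDefault(node, List.of())) {
                if (seen.add(next)) work.addLast(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ddg = exampleGraph();
        System.out.println("x flows to w: " + flowsTo(ddg, "x", "w"));
        System.out.println("z flows to w: " + flowsTo(ddg, "z", "w"));
    }
}
```

A "must not flow" specification such as ! Contact Info -> Internet is violated exactly when flowsTo returns true for that source and sink.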
Problems of static bug detection

Lack of Specifications

Project-specific formal specifications are very rare

Solutions:
General specifications (for typical bugs)
Mining specifications (for API-specific, project-specific
specifications)

False Positives vs. Efficiency

More sensitivities -> higher cost

Path sensitivity is rarely achieved

Combination of all sensitivities -> uncomputable problems
State-of-practice: static bug detection


Findbugs

A tool developed by researchers from UMD

Widely used in industry for code checking before commit

The idea actually comes from Lint
Lint

A code style enforcing tool for C language

Find bad coding styles and raise warnings

Bad naming

Hard coded strings

…
Idea: do it in reverse

Most static bug detection tools

Set up a specification (either from users or well-defined
ones)
E.g., divisor should not be 0, null pointer should not be
dereferenced, the salary of a person cannot be negative

Check all possible cases to guarantee that the
specification holds

Otherwise provide counterexamples

Findbugs

Detect code patterns for bugs

E.g., a = null, b = a.field;

str.replace(" ", "");
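The second pattern is a bug because Java strings are immutable: replace returns a new string, and the statement above discards it. A minimal sketch follows; the class and method names are made up for illustration.

```java
final class IgnoredReturnValue {

    /** Buggy pattern: the result of replace() is thrown away, so str is unchanged. */
    static String buggyStrip(String str) {
        str.replace(" ", "");
        return str;
    }

    /** Fixed: use the string that replace() returns. */
    static String fixedStrip(String str) {
        return str.replace(" ", "");
    }

    public static void main(String[] args) {
        System.out.println(buggyStrip("a b c")); // prints "a b c"
        System.out.println(fixedStrip("a b c")); // prints "abc"
    }
}
```

A pattern detector only has to notice a call to a side-effect-free method whose return value is unused; no specification or path exploration is needed.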
Characteristics of Findbugs

Based on existing concrete code patterns

Check code patterns locally: only do intraprocedural analysis
What are the advantages and disadvantages of doing
so?

Perform bug ranking according to the
probability and potential severity of bugs
Probability: the reported bug is likely to be a real bug
Severity: the bug may cause severe consequences if
not fixed
Application of Findbugs-like tools

Findbugs is adopted by a number of large
companies such as Google

Usually only the issues with the highest
confidence/severity are reported as issues

Statistics from Google, 2009:
More than 4000 issues were identified, of which 1700
were confirmed as bugs, and 1100 were fixed.

The software department of USAA is using
PMD, an alternative to Findbugs
Patterns to be checked

404 bug patterns in 6 major categories

Bad Practice / Dodgy code

Correctness

Internationalization

Vulnerability / Security

Multithread correctness

Performance
Bad Practice / Dodgy code


Hackish code, not stable and may harm future
maintenance
Examples:

Equals method should not assume type of object argument
boolean equals(Object o){
    Myclass my = (Myclass)o;   // unchecked cast: fails for arguments of any other type
    return my.id == this.id;
}

Abstract class defines covariant compareTo()
method
int compareTo(Myclass obj){ … }
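One way to avoid both warnings is sketched below. It assumes Myclass has a single int field id, as the snippet above suggests; everything else (the constructor, hashCode, the choice to implement Comparable<Myclass>) is added for illustration.

```java
final class Myclass implements Comparable<Myclass> {
    private final int id;

    Myclass(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Myclass)) return false; // also rejects null
        return ((Myclass) o).id == this.id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id); // kept consistent with equals()
    }

    // Declaring Comparable<Myclass> makes compareTo(Myclass) the real interface method,
    // not an unrelated overload.
    @Override
    public int compareTo(Myclass other) {
        return Integer.compare(this.id, other.id);
    }

    public static void main(String[] args) {
        System.out.println(new Myclass(1).equals(new Myclass(1)));
        System.out.println(new Myclass(1).equals("1"));
        System.out.println(new Myclass(1).compareTo(new Myclass(2)));
    }
}
```

With the instanceof check, equals returns false for an unrelated argument instead of throwing ClassCastException.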
Correctness


The code pattern may result in incorrect behavior
of the software
Examples:

DMI: Collections should not contain themselves
List s = new …; …
if(s.contains(s)){ … }

DMI: Invocation of hashCode on an array
int[] x = new int[10];
…
x.hashCode();
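Calling hashCode() on an array uses the identity-based implementation inherited from Object, so two arrays with equal contents generally hash differently. A small sketch of the usual alternative, java.util.Arrays.hashCode, follows; the class name ArrayHashDemo is made up.

```java
import java.util.Arrays;

final class ArrayHashDemo {

    /** Content-based comparison of hashes: equal contents always give equal hashes. */
    static boolean sameContentHash(int[] a, int[] b) {
        return Arrays.hashCode(a) == Arrays.hashCode(b);
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3};
        int[] y = {1, 2, 3};
        // Identity-based: depends on the object, not on the elements.
        System.out.println(x.hashCode() == y.hashCode());
        // Content-based: computed from the elements.
        System.out.println(sameContentHash(x, y));
    }
}
```

The same reasoning applies to equals: Arrays.equals compares contents, while x.equals(y) compares references.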
Internationalization


A code pattern that will hinder future i18n of
the software
Example:

Use toUpperCase, toLowerCase on localized strings
String s = getLocale(key);
s.toUpperCase();

Perform getBytes() on localized strings
String s = getLocale(key);
s.getBytes();
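Both calls above depend on the platform defaults: toUpperCase() uses the default locale and getBytes() uses the default charset, so results can differ between machines. A hedged sketch of the explicit variants is shown below; the helper names are invented, and Locale.ROOT and UTF-8 are assumptions that suit protocol-style text rather than user-facing text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class LocaleSafeStrings {

    /** Locale-independent upper-casing, e.g. for keys or protocol tokens. */
    static String upperForProtocol(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    /** Encoding with an explicit charset instead of the platform default. */
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // In Turkish, 'i' upper-cases to dotted capital I (U+0130), not to 'I'.
        System.out.println("title".toUpperCase(turkish));
        System.out.println(upperForProtocol("title"));
        System.out.println(encode("\u00e9").length); // one character, two UTF-8 bytes
    }
}
```

For text shown to users, passing the user's actual locale is the better choice; the point of the warning is that the locale and charset should be chosen deliberately.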
Multi-thread correctness


A code pattern that may cause incorrectness in
multi-thread execution
Examples

Synchronization on boxed primitive
private static Boolean inited = Boolean.FALSE;
...
synchronized(inited) {
    if (!inited) {
        init();
        inited = Boolean.TRUE;
    }
}
...
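The pattern is unsafe for two reasons: Boolean.FALSE is a single shared object, so unrelated code may lock on it, and once inited is reassigned to Boolean.TRUE, later threads synchronize on a different object than earlier ones. A common fix, sketched here with invented names (LazyInit, LOCK, getInitCount), is a dedicated lock object that is never reassigned.

```java
final class LazyInit {
    private static final Object LOCK = new Object(); // dedicated lock, never reassigned
    private static boolean inited = false;           // guarded by LOCK
    private static int initCount = 0;                // guarded by LOCK; for demonstration only

    static void ensureInit() {
        synchronized (LOCK) {
            if (!inited) {
                init();
                inited = true;
            }
        }
    }

    private static void init() {
        initCount++; // stands in for the real initialization work
    }

    static int getInitCount() {
        synchronized (LOCK) {
            return initCount;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(LazyInit::ensureInit);
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("init() ran " + getInitCount() + " time(s)");
    }
}
```

Because every thread locks the same object for the whole check-then-act sequence, init() runs at most once.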
Vulnerability/Security


The code pattern may result in vulnerability or
security issues
Examples:

SQL: A SQL query is generated from a non-constant
String
String str = "select" + bb + " ddd" + …
server.execute(str);

This code directly writes an HTTP parameter to JSP
output, which allows for a cross site scripting vulnerability
Para = request.getParameter(key);
out.print(Para);
Performance


The code pattern may harm the performance of
the software
Examples:

SBSC: Method concatenates strings using + in a loop
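// Slow: every + builds a new String, so the loop copies the text again and again
String s = "";
for (int i = 0; i < field.length; ++i) {
    s = s + field[i];
}

// Better: append to one buffer and convert once at the end
StringBuffer buf = new StringBuffer();
for (int i = 0; i < field.length; ++i) {
    buf.append(field[i]);
}
String s = buf.toString();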
Major problem: False positives

Overall precision

5% to 10% on open source and industry projects

Developers want to make sure they do not waste
effort on a false positive

Usually more bugs than developers can fix
Solution: Bug ranking



Ranking bug categories

Some categories are more likely to be bugs
than others

How to give scores to each category?

Check a large number of issues in the history of the
software

How large a proportion was fixed?

Raises precision to about 30% in the top-ranked 25%
of bugs
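The scoring idea can be sketched as follows: score each category by the fraction of its past warnings that developers actually fixed, then report categories in descending score order. This is an illustration only; the class names, the category names in main, and all the counts are made up and do not come from Findbugs or from the lecture.

```java
import java.util.ArrayList;
import java.util.List;

final class CategoryRanking {

    /** History of one bug category: warnings reported and warnings later fixed. */
    static final class History {
        final String category;
        final int reported;
        final int fixed;

        History(String category, int reported, int fixed) {
            this.category = category;
            this.reported = reported;
            this.fixed = fixed;
        }

        /** Score: proportion of reported warnings that were fixed. */
        double fixRatio() {
            return reported == 0 ? 0.0 : (double) fixed / reported;
        }
    }

    /** Category names ordered from highest to lowest fix ratio. */
    static List<String> rank(List<History> histories) {
        List<History> sorted = new ArrayList<>(histories);
        sorted.sort((a, b) -> Double.compare(b.fixRatio(), a.fixRatio()));
        List<String> names = new ArrayList<>();
        for (History h : sorted) {
            names.add(h.category);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical numbers, for illustration only.
        List<History> histories = List.of(
                new History("Performance", 200, 10),
                new History("Correctness", 100, 40),
                new History("Bad Practice", 300, 30));
        System.out.println(rank(histories));
    }
}
```

Reporting only the top of this ranking trades recall for precision, which matches the goal of not wasting developer effort on false positives.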
Findbugs



Disadvantages

Cannot guarantee the software to be free of certain bugs

Still involves many false positives

Advantages

Easy to start

Scalable

Relatively fewer false positives

Somewhat like testing

Has become the most popular and practical static bug
detection technique
Review of Static Bug Detection

Specification-based static bug detection




Value Specifications : Symbolic Execution,
Abstract Interpretation
Temporal Specifications: Model Checking
Data Flow Specifications: Dependence Graph,
Traversing
Pattern-based static bug detection

Findbugs

Bug Ranking