Program Analysis and Verification

Spring 2013
Lecture 1: Introduction
Roman Manevich
Ben-Gurion University
30GB Zunes all over the world fail en masse
December 31, 2008
Zune bug
while (days > 365) {
    if (IsLeapYear(year)) {
        if (days > 366) {
            days -= 366;
            year += 1;
        }
    } else {
        days -= 365;
        year += 1;
    }
}
Zune bug
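On December 31, 2008, days = 366 and year = 2008:
while (366 > 365) {
    if (IsLeapYear(2008)) {
        if (366 > 366) {
            days -= 366;
            year += 1;
        }
    } else {
        days -= 365;
        year += 1;
    }
}
The inner condition 366 > 366 is false, so neither days nor year changes and the loop never terminates.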
Suggested solution: wait for tomorrow
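The loop above can be replayed in a small sketch. This is an illustrative Python transcription, not the Zune firmware: the names `buggy_year_from_days`, `fixed_year_from_days`, and the iteration bound are assumptions added here so that the non-termination can be observed safely.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def buggy_year_from_days(days: int, year: int, max_iterations: int = 10_000):
    """Transcription of the slide's loop, with a bound instead of hanging.

    Returns (year, day_in_year), or None if the bound is hit.
    """
    for _ in range(max_iterations):
        if days <= 365:
            return year, days
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            # days == 366 in a leap year: nothing changes, so the loop is stuck
        else:
            days -= 365
            year += 1
    return None


def fixed_year_from_days(days: int, year: int):
    """One possible repair: compare against the length of the current year."""
    while True:
        year_length = 366 if is_leap_year(year) else 365
        if days <= year_length:
            return year, days
        days -= year_length
        year += 1


print(buggy_year_from_days(366, 2008))   # bound is hit on the last day of a leap year
print(fixed_year_from_days(366, 2008))
```

The repair shown is only one way to restore progress in every iteration; the point is that each pass through the loop must either exit or strictly decrease `days`.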
Patriot missile failure
On the night of the 25th of February, 1991, a Patriot missile system operating in Dhahran, Saudi Arabia, failed to track and intercept an incoming Scud. The Iraqi missile impacted into an army barracks, killing 28 U.S. soldiers and injuring another 98.
February 25, 1991
Patriot bug – rounding error
• Time measured in 1/10 seconds
• Binary expansion of 1/10: 0.0001100110011001100110011001100....
• 24-bit register: 0.00011001100110011001100
• Error per tick of 0.0000000000000000000000011001100... binary, or ~0.000000095 decimal
• After 100 hours of operation the accumulated error is 0.000000095×100×3600×10 = 0.34 seconds
• A Scud travels at about 1,676 meters per second, and so travels more than half a kilometer in this time
Suggested solution: reboot every 10 hours
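The arithmetic above can be checked exactly with rational numbers. The sketch below assumes that the stored value of 1/10 is chopped (not rounded) to the 23 binary fraction digits shown in the truncated expansion; the function and constant names are made up for this illustration.

```python
from fractions import Fraction

FRACTION_BITS = 23          # binary digits after the point in the truncated value
SCUD_SPEED_M_PER_S = 1676   # speed quoted on the slide


def truncated_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """1/10 chopped to `bits` binary fraction digits."""
    scale = 2 ** bits
    return Fraction(int(Fraction(1, 10) * scale), scale)


def clock_drift_seconds(hours: int) -> Fraction:
    """Accumulated timing error after `hours` of uptime."""
    error_per_tick = Fraction(1, 10) - truncated_tenth()
    ticks = hours * 3600 * 10   # the clock ticks ten times per second
    return error_per_tick * ticks


drift = clock_drift_seconds(100)
print(float(drift))                        # roughly 0.34 seconds
print(float(drift * SCUD_SPEED_M_PER_S))   # distance covered by a Scud, in meters
```

Because the error grows linearly with uptime, rebooting every 10 hours keeps the drift to about a tenth of the 100-hour figure.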
Billy Gates why do you make this possible ? Stop making money
and fix your software!!
(W32.Blaster.Worm)
August 13, 2003
Windows exploit(s)
Buffer Overflow
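void foo (char *x) {
    char buf[2];
    strcpy(buf, x);   /* no bounds check: overflows buf when x is longer than 1 character */
}
int main (int argc, char *argv[]) {
    foo(argv[1]);
}
$ ./a.out abracadabra
Segmentation fault
[Figure: stack frame of foo along the memory addresses (previous frame, return address, saved FP, char* x, buf[2]), with an arrow marking the direction of stack growth. buf[2] holds "ab"; the rest of "abracadabra" overwrites x ("ra"), the saved FP ("ca"), the return address ("da"), and the previous frame ("br").]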
Buffer overrun exploits
int check_authentication(char *password) {
int auth_flag = 0;
char password_buffer[16];
strcpy(password_buffer, password);
if(strcmp(password_buffer, "brillig") == 0) auth_flag = 1;
if(strcmp(password_buffer, "outgrabe") == 0) auth_flag = 1;
return auth_flag;
}
int main(int argc, char *argv[]) {
if(check_authentication(argv[1])) {
printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
printf(" Access Granted.\n");
printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n"); }
else
printf("\nAccess Denied.\n");
}
(source: “Hacking: The Art of Exploitation, 2nd Ed.”)
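The effect of the unchecked strcpy can be imitated with a byte array standing in for the stack frame. This is a simulation under a stated assumption, not the behavior of any particular compiler: it assumes auth_flag is laid out directly after password_buffer and is a 4-byte little-endian int. The constants and helper names are hypothetical.

```python
import struct

BUFFER_SIZE = 16              # char password_buffer[16]
FLAG_OFFSET = BUFFER_SIZE     # assumed layout: auth_flag sits right after the buffer
FRAME_SIZE = BUFFER_SIZE + 4  # buffer plus a 4-byte int


def check_authentication(password: bytes) -> int:
    frame = bytearray(FRAME_SIZE)   # all zero, so auth_flag starts as 0
    # strcpy: copy every byte plus the terminating NUL, with no bounds check.
    # The simulation stops at the end of the frame instead of crashing.
    for offset, byte in enumerate(password + b"\x00"):
        if offset >= FRAME_SIZE:
            break
        frame[offset] = byte
    # strcmp reads up to the first NUL byte, wherever that happens to be.
    stored = bytes(frame).split(b"\x00", 1)[0]
    if stored in (b"brillig", b"outgrabe"):
        struct.pack_into("<i", frame, FLAG_OFFSET, 1)
    return struct.unpack_from("<i", frame, FLAG_OFFSET)[0]


for attempt in (b"brillig", b"guess", b"A" * 20):
    granted = check_authentication(attempt) != 0
    print(attempt, "Access Granted." if granted else "Access Denied.")
```

Under the assumed layout, any password of 17 or more non-NUL bytes makes the flag non-zero, which the C caller treats as success even though neither comparison matched.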
(In)correct usage of APIs

Application trend: Increasing number of libraries and APIs
– Non-trivial restrictions on permitted sequences of operations

Typestate: Temporal safety properties
– What sequence of operations are permitted on an object?
– Encoded as DFA
e.g. “Don’t use a Socket unless it is connected”
[Figure: typestate DFA for Socket with states init, connected, closed, and err; transitions are labeled connect(), close(), getInputStream(), getOutputStream(), and * (any operation, on err)]
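A typestate property of this kind can be written as a transition table over the object's abstract states. The sketch below is an assumption-laden reading of the Socket automaton: only the edges listed in `TRANSITIONS` are taken as legal, and every other (state, operation) pair is sent to err. The exact edge set of the slide's figure may differ.

```python
ERR = "err"

# Assumed legal transitions; anything not listed leads to the error state.
TRANSITIONS = {
    ("init", "connect"): "connected",
    ("connected", "getInputStream"): "connected",
    ("connected", "getOutputStream"): "connected",
    ("connected", "close"): "closed",
}


def step(state: str, operation: str) -> str:
    """One DFA step; err is absorbing because it has no outgoing entries."""
    return TRANSITIONS.get((state, operation), ERR)


def final_state(operations, state: str = "init") -> str:
    """Run a sequence of operations on a fresh Socket."""
    for operation in operations:
        state = step(state, operation)
    return state


print(final_state(["connect", "getOutputStream", "close"]))
print(final_state(["getOutputStream"]))   # used before connect()
```

A static typestate checker has to track such a state for every Socket the program may create, which is what makes the aliasing in the next example hard.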
Challenges
class SocketHolder {
    Socket s;
}
Socket makeSocket() { return new Socket(); } // A
void open(Socket l) { l.connect(); }
void talk(Socket s) { s.getOutputStream().write("hello"); }
main() {
    Set<SocketHolder> set = new HashSet<SocketHolder>();
    while(…) {
        SocketHolder h = new SocketHolder();
        h.s = makeSocket();
        set.add(h);
    }
    for (Iterator<SocketHolder> it = set.iterator(); …) {
        Socket g = it.next().s;
        open(g);
        talk(g);
    }
}
Testing is not enough
• Observe some program behaviors
• What can you say about other behaviors?
• Concurrency makes things worse
• Smart testing is useful
– requires the techniques that we will see in the
course
Static analysis definition
Reason statically (at compile time) about the possible runtime behaviors of a program
“The algorithmic discovery of properties of a program by inspection of its source text [1]” -- Manna, Pnueli
[1] Does not have to literally be the source text; it just means without running the program
Is it at all doable?
x=?
if (x > 0) {
y = 42;
} else {
y = 73;
foo();
}
assert (y == 42);
Bad news: problem is generally undecidable
Central idea: use approximation
[Figure: nested sets inside the universe: an under-approximation is contained in the exact set of configurations/behaviors, which is contained in an over-approximation]
Goal: exploring program states
[Figure: the state space, with initial states, reachable states, and bad states marked; the same diagram accompanies each of the following slides]
Technique: explore abstract states
Sound: cover all reachable states
Unsound: miss some reachable states
Imprecise abstraction: false alarms
A sound message
x=?
if (x > 0) {
y = 42;
} else {
y = 73;
foo();
}
assert (y == 42);
Analyzer message: Assertion may be violated
Precision
• Avoid useless results
UselessAnalysis(Program p) {
    printf("assertion may be violated\n");
}
• Low false alarm rate
• Understand where precision is lost
Runtime vs. static analysis
               | Runtime                              | Static analysis
Effectiveness  | Can miss errors                      | Can find rare errors
               | Finds real errors                    | Can raise false alarms
Cost           | Proportional to program’s execution  | Proportional to program’s complexity
               | No need to efficiently handle        | Can handle limited classes of
               | rare cases                           | programs and still be useful
Static Driver Verifier
[Figure: Static Driver Verifier takes precise API usage rules (SLIC), the driver’s source code in C, and an environment model, and reports defects with 100% path coverage]
Bill Gates’ Quote
"Things like even software verification, this has been the Holy Grail of computer science for many decades but now in some very key areas, for example, driver verification we’re building tools that can do actual proof about the software and how it works in order to guarantee the reliability." Bill Gates, April 18, 2002. Keynote address at WinHEC 2002
The Astrée Static Analyzer
Patrick Cousot
Radhia Cousot
Jérôme Feret
Laurent Mauborgne
Antoine Miné
Xavier Rival
ENS France
Objectives of Astrée
• Prove absence of errors in safety critical C code
• ASTRÉE was able to prove completely automatically the absence of any run-time error (RTE) in the primary flight control software of the Airbus A340 fly-by-wire system
– a program of 132,000 lines of C analyzed
By Lasse Fuss (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
A little about me
• History
– Studied B.Sc., M.Sc., Ph.D. at Tel-Aviv University
• Research in program analysis with IBM and Microsoft
– Post-doc in UCLA and in UT Austin
– Joined Ben-Gurion University this year
• Example research challenges
– What’s a good algorithm for automatically discovering
(with no hints) that a program generates a binary tree
where all leaves are connected in a list?
– What’s a good algorithm for automatically proving that a
parallel program behaves “well”?
– How can we automatically synthesize parallel code that is
both correct and efficient?
Why study program analysis?
• Challenging and thought provoking
– An approach for dealing with computationally hard
(usually undecidable) problems
– Treat programs as mathematical objects
• Understand how to systematically
– Design optimizations
– Reason about correctness / find bugs (security)
• Some techniques may be applied in other
domains
– Computational learning
– Analysis of biological systems
What do you get in this course?
• Learn basic principles of static analysis
– Understand jargon/papers
• Learn a few advanced techniques
– Some principled way of developing analysis
– Develop one in a small-scale project
• Put to practice what you learned in logic,
automata, programming
My role
• Teach you theory and practice
• Teach you how to think of new techniques
• E-mail: romanm@cs.bgu.ac.il
• Office hours: Wednesday 13:00-15:00
• Course web-page
– Announcements
– Forum
–…
Requirements
1. Summarize one lecture: 10% of grade
– Submit initial summary
– Get corrections/suggestions
– Submit revised summary
2. Theoretical assignments and programming assignments: 50%
– About 8 (some very small)
– Must submit all
– Must solve all questions
– Otherwise re-submit (and get a lower grade)
3. Final project: 40%
– Implement a program analyzer for a given component
How to succeed in this course
• Attend all classes
• Make sure you understand material in class
– Engage by asking questions and raising ideas
• Be on top of assignments
– Submit on time
– Don’t get stuck or give up on exercises – get help – ask me
– Don’t start working on assignments the day before
• Be ethical
Joe (a day before assignment deadline):
“I don’t really understand what you want from me in this
assignment, can you help me/extend the deadline”?
The static analysis approach
• Formalize software behavior in a
mathematical model (semantics)
• Prove properties of the mathematical model
– Automatically, typically with approximation of the
formal semantics
• Develop theory and tools for program
correctness and robustness
Kinds of static analysis
• Spans a wide range
– type checking … up to full functional verification
• General safety specifications
• Security properties (e.g., information flow)
• Concurrency correctness conditions (e.g., absence of
data races, absence of deadlocks, atomicity)
• Correct usage of libraries (e.g., typestate)
• Underapproximations useful for bug-finding, test-case
generation,…
Static analysis techniques
• Abstract Interpretation
• Dataflow analysis
• Constraint-based analysis
• Type and effect systems
Static analysis for verification
[Figure: the Analyzer takes a program and a specification and outputs either Valid or an abstract counter example]
Relation to program verification
Static Analysis
• Fully automatic
• Applicable to a programming language
• Can be very imprecise
• May yield false alarms
Program Verification
• Requires specification and loop invariants
• Program specific
• Relatively complete
• Provides counter examples
• Provides useful documentation
• Can be mechanized using theorem provers
Verification challenge
main(int i) {
    int x=3,y=1;
    do {
        y = y + 1;
    } while(--i > 0)
    assert 0 < x + y;
}
Determine what states can arise during any execution
Challenge: set of states is unbounded
Abstract Interpretation
main(int i) {
    int x=3,y=1;
    do {
        y = y + 1;
    } while(--i > 0)
    assert 0 < x + y;
}
Determine what states can arise during any execution
Recipe
1) Abstraction
2) Transformers
3) Exploration
Challenge: set of states is unbounded
Solution: compute a bounded representation of (a superset of) program states
1) Abstraction
main(int i) {
    int x=3,y=1;
    do {
        y = y + 1;
    } while(--i > 0)
    assert 0 < x + y;
}
• concrete state $\sigma : \mathit{Var} \to \mathbb{Z}$
• abstract state (sign) $\sigma^{\#} : \mathit{Var} \to \{+, 0, -, ?\}$
Concrete states (x, y, i): (3, 1, 7), (3, 2, 6), …
Abstract state (x, y, i): (+, +, +)
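The sign abstraction can be sketched as a function from concrete states to abstract ones. The representation below (dictionaries from variable names to values or signs, `"?"` for "unknown sign") and the names `alpha` and `alpha_set` are choices made for this illustration only.

```python
from functools import reduce

TOP = "?"   # sign unknown


def sign_of(value: int) -> str:
    """Abstract a single integer to its sign."""
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"


def join(a: str, b: str) -> str:
    """Least upper bound of two signs: equal signs stay, different ones become ?."""
    return a if a == b else TOP


def alpha(state: dict) -> dict:
    """Abstract one concrete state, variable by variable."""
    return {var: sign_of(value) for var, value in state.items()}


def alpha_set(states: list) -> dict:
    """Abstract a non-empty set of concrete states to a single abstract state."""
    abstracted = [alpha(state) for state in states]
    return reduce(lambda s, t: {var: join(s[var], t[var]) for var in s}, abstracted)


print(alpha({"x": 3, "y": 1, "i": 7}))
print(alpha_set([{"x": 3, "y": 1, "i": 7}, {"x": 3, "y": 2, "i": 6}]))
```

Unboundedly many concrete states collapse to finitely many abstract ones, which is what makes exhaustive exploration possible.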
2) Transformers
main(int i) {
    int x=3,y=1;
    do {
        y = y + 1;
    } while(--i > 0)
    assert 0 < x + y;
}
• concrete transformer for y = y + 1
    (x, y, i) = (3, 1, 0)  →  (3, 2, 0)
• abstract transformer for y = y + 1
    (x, y, i) = (+, +, 0)  →  (+, +, 0)
    (x, y, i) = (+, 0, 0)  →  (+, +, 0)
    (x, y, i) = (+, ?, 0)  →  (+, ?, 0)
    (x, y, i) = (+, -, 0)  →  (+, ?, 0)
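The abstract transformer for y = y + 1 can be written down directly and compared against the concrete one. This is a sketch in the sign domain with `"?"` as the unknown sign; the helper `covers` and the function names are introduced here for illustration.

```python
TOP = "?"


def sign_of(value: int) -> str:
    return "+" if value > 0 else "-" if value < 0 else "0"


def abstract_increment(sign: str) -> str:
    """Abstract effect of v = v + 1 on the sign of v."""
    if sign in ("+", "0"):
        return "+"
    return TOP   # a negative or unknown value may become -, 0, or +


def abstract_assign_y_plus_1(state: dict) -> dict:
    """Abstract transformer for y = y + 1: only y changes."""
    return dict(state, y=abstract_increment(state["y"]))


def covers(abstract_sign: str, value: int) -> bool:
    """True if the abstract sign describes the concrete value."""
    return abstract_sign == TOP or abstract_sign == sign_of(value)


# Soundness check on a sample of integers: the abstract result must
# describe the concrete result for every input it describes.
sound = all(covers(abstract_increment(sign_of(n)), n + 1) for n in range(-100, 101))
print(sound)
print(abstract_assign_y_plus_1({"x": "+", "y": "-", "i": "0"}))
```

The check over a finite range is only a sanity test; soundness in general is argued case by case on the four abstract values.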
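3) Exploration
main(int i) {
    int x=3,y=1;
    do {
        y = y + 1;
    } while(--i > 0)
    assert 0 < x + y;
}
Abstract states (x, y, i) computed at the program points:
    at entry:                (?, ?, ?)
    after int x=3,y=1:       (+, +, ?)
    at every later point:    (+, +, ?)
Since x and y are both + at the assertion, 0 < x + y is verified.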
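The exploration step amounts to iterating the abstract transformers around the loop until the abstract state at the loop head stops changing. The sketch below hard-codes the example program rather than parsing it, and it ignores the loop condition on the back edge, which is a sound over-approximation; all function names are illustrative.

```python
TOP = "?"


def join(a: str, b: str) -> str:
    return a if a == b else TOP


def join_states(s: dict, t: dict) -> dict:
    return {var: join(s[var], t[var]) for var in s}


def increment(sign: str) -> str:      # abstract effect of v = v + 1
    return "+" if sign in ("+", "0") else TOP


def decrement(sign: str) -> str:      # abstract effect of --v
    return "-" if sign in ("-", "0") else TOP


def add(a: str, b: str) -> str:       # abstract effect of a + b
    if a == "0":
        return b
    if b == "0":
        return a
    return a if a == b else TOP


def explore() -> dict:
    """Fixed-point iteration for: x=3, y=1; do { y = y + 1; } while (--i > 0)."""
    head = {"x": "+", "y": "+", "i": TOP}          # state after the initialization
    while True:
        after_body = dict(head, y=increment(head["y"]))
        after_test = dict(after_body, i=decrement(after_body["i"]))
        new_head = join_states(head, after_test)   # join with the back edge
        if new_head == head:
            return after_test                      # state at the loop exit
        head = new_head


exit_state = explore()
print(exit_state)
print("assertion verified:", add(exit_state["x"], exit_state["y"]) == "+")
```

The iteration terminates because each variable can only move upward in a finite lattice (from a definite sign to `?`), so the loop-head state can change only finitely many times.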
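Incompleteness
main(int i) {
    int x=3,y=1;
    do {
        y = y - 2;
        y = y + 3;
    } while(--i > 0)
    assert 0 < x + y;
}
Abstract states (x, y, i) computed at the program points:
    at entry:                (?, ?, ?)
    after int x=3,y=1:       (+, +, ?)
    at every later point:    (+, ?, ?)
The assertion holds in every concrete execution (each iteration increases y by 1), but the sign abstraction loses the sign of y after y = y - 2 and cannot prove it.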
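The loss of precision can be reproduced by composing the two abstract transformers by hand. This sketch uses the sign domain with `"?"` for an unknown sign; the function names are introduced for the illustration.

```python
TOP = "?"


def sign_of(value: int) -> str:
    return "+" if value > 0 else "-" if value < 0 else "0"


def add_positive_constant(sign: str) -> str:
    """Abstract effect of v = v + c for some constant c > 0."""
    return "+" if sign in ("+", "0") else TOP


def subtract_positive_constant(sign: str) -> str:
    """Abstract effect of v = v - c for some constant c > 0."""
    return "-" if sign in ("-", "0") else TOP


# Abstract execution of the loop body starting from y = +.
after_minus_2 = subtract_positive_constant("+")    # + minus 2 may be -, 0, or +
after_plus_3 = add_positive_constant(after_minus_2)
print(after_minus_2, after_plus_3)

# Concrete execution: starting from any positive y, the body yields y + 1 > 0.
always_positive = all(sign_of(y - 2 + 3) == "+" for y in range(1, 1000))
print(always_positive)
```

The analysis is still sound (it reports "may be violated"), but the alarm is false: the sign domain cannot express that y - 2 is at least -1.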
Parity abstraction
while (x != 1) do {
    if (x % 2) == 0 {
        x := x / 2;
    } else {
        x := x * 3 + 1;
        assert (x % 2 == 0);
    }
}
challenge: how to find “the right” abstraction
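For this program the sign domain is of no help, while a parity domain proves the assertion immediately. The sketch below defines abstract multiplication and addition over the values even, odd, and `"?"` (unknown); the encoding and names are assumptions of this illustration.

```python
TOP = "?"


def parity_of(value: int) -> str:
    return "even" if value % 2 == 0 else "odd"


def multiply(a: str, b: str) -> str:
    """Abstract product: an even factor makes the product even."""
    if a == "even" or b == "even":
        return "even"
    if a == "odd" and b == "odd":
        return "odd"
    return TOP


def add(a: str, b: str) -> str:
    """Abstract sum: equal parities give even, different parities give odd."""
    if TOP in (a, b):
        return TOP
    return "even" if a == b else "odd"


# In the else branch x is odd, so abstractly x * 3 + 1 is odd * odd + odd.
x_after = add(multiply("odd", parity_of(3)), parity_of(1))
print(x_after)

# Cross-check against concrete odd values.
print(all(parity_of(n * 3 + 1) == "even" for n in range(-99, 100, 2)))
```

The property to prove dictates the domain: parity is exactly the information the assertion talks about, so nothing is lost in the abstraction.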
How to find “the right” abstraction?
• Pick an abstract domain suited for your
property
– Numerical domains
– Domains for reasoning about the heap
–…
• Combination of abstract domains
• Another approach
– Abstraction refinement
Following the recipe (in a nutshell)
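1) Abstraction
[Figure: a concrete state (heap nodes linked by n pointers, with variables t and x) next to its abstract state]
2) Transformers
[Figure: effect of t->n = x on the state, shown before and after the statement]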
Example: shape (heap) analysis
void stack-init(int i) {
    Node* x = null;
    do {
        Node* t = malloc(…)
        t->n = x;
        x = t;
    } while(--i>0)
    Top = x;
    assert(acyclic(Top))
}
[Figure: abstract heap shapes at each program point, starting from emp (the empty heap), drawn as nodes pointed to by t, x, and Top and linked by n edges]
3) Exploration
[Figure: the same stack-init program, with the set of abstract heap shapes (nodes pointed to by t, x, and Top, linked by n edges) accumulated at each program point, starting from emp]
Example: polyhedra (numerical) domain
proc MC(n:int) returns (r:int)
var t1:int, t2:int;
begin
if (n>100) then
r = n-10;
else
t1 = n + 11;
t2 = MC(t1);
r = MC(t2);
endif;
end
var a:int, b:int;
begin
b = MC(a);
end
What is the result of this program?
McCarthy 91 function
if (n>=101) then n-10 else 91
proc MC (n : int) returns (r : int)
var t1 : int, t2 : int;
begin
/* (L6 C5) top */
if n > 100 then
/* (L7 C17) [|n-101>=0|] */
r = n - 10;
/* (L8 C14) [|-n+r+10=0; n-101>=0|] */
else
/* (L9 C6) [|-n+100>=0|] */
t1 = n + 11;
/* (L10 C17) [|-n+t1-11=0; -n+100>=0|] */
t2 = MC(t1);
/* (L11 C17) [|-n+t1-11=0; -n+100>=0;
-n+t2-1>=0; t2-91>=0|] */
r = MC(t2);
/* (L12 C16) [|-n+t1-11=0; -n+100>=0;
-n+t2-1>=0; t2-91>=0; r-t2+10>=0;
r-91>=0|] */
endif;
/* (L13 C8) [|-n+r+10>=0; r-91>=0|] */
end
var a : int, b : int;
begin
/* (L18 C5) top */
b = MC(a);
/* (L19 C12) [|-a+b+10>=0; b-91>=0|] */
end
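The inferred invariants can be sanity-checked by running the function. The sketch below is a direct Python transcription of MC together with the closed form and the final linear constraints from the annotated listing; it tests a finite range of inputs, which is evidence rather than a proof.

```python
def mc(n: int) -> int:
    """McCarthy 91 function, transcribed from the procedure MC."""
    if n > 100:
        return n - 10
    t1 = n + 11
    t2 = mc(t1)
    return mc(t2)


def closed_form(n: int) -> int:
    """if (n>=101) then n-10 else 91"""
    return n - 10 if n >= 101 else 91


def satisfies_inferred_invariant(n: int, r: int) -> bool:
    """Constraints reported at the end of MC: -n+r+10>=0 and r-91>=0."""
    return -n + r + 10 >= 0 and r - 91 >= 0


inputs = range(-50, 201)
print(all(mc(n) == closed_form(n) for n in inputs))
print(all(satisfies_inferred_invariant(n, mc(n)) for n in inputs))
```

The polyhedral result is weaker than the closed form (it only bounds r from below by 91 and by n - 10), yet it is a convex constraint set that every actual result satisfies.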
Some things that should trouble you
• Does a result always exist?
• Does the recipe always converge?
• How “optimal” is the result?
• How do I pick my abstraction?
• How do I come up with abstract transformers?
• Other practical issues
– Efficiency
– How does it do in practice?
Abstraction refinement
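[Figure: Verify takes the program, the specification, and an abstraction, and outputs either Valid or an abstract counter example; Abstraction Refinement takes the abstract counter example and yields either a counter example or a new abstraction for Verify]
Change the abstraction to match the program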
Recap: program analysis
• Reason statically (at compile time) about the
possible runtime behaviors of a program
• use sound overapproximation of program
behavior
• abstract interpretation
– abstract domain
– transformers
– exploration (fixed-point computation)
• finding the right abstraction?
Next lecture:
semantics of programming languages
References
• Patriot bug:
– http://www.cs.usyd.edu.au/~alum/patriot_bug.html
– Patrick Cousot’s NYU lecture notes
• Zune bug:
– http://www.crunchgear.com/2008/12/31/zune-bug-explained-indetail/
• Blaster worm:
– http://www.sans.org/securityresources/malwarefaq/w32_blasterworm.php
• Interesting CACM article
– http://cacm.acm.org/magazines/2010/2/69354-a-few-billion-lines-ofcode-later/fulltext
• Interesting blog post
– http://www.altdevblogaday.com/2011/12/24/static-code-analysis/