Program Analysis Lecture Notes

CFA example and CSSV (June 30 2002)
Compiled by Roman Manevich (rumster@post.tau.ac.il)
Applying the CFA algorithm to the motivating example
class Vehicle extends Object { int position = 10;
static void move (Vehicle this1 {Car} , int x1) {
position = position + x1 ;}}
class Car extends Vehicle { int passengers;
static void await(Car this2 {Car} , Vehicle v {Truck}) {
if (v.position < position)
then v.move(v, position - v.position);
else this2.move(this2, 10); }}
class Truck extends Vehicle {
static void move(Truck this3 {Truck}, int x2) {
if (x2 < 55) position += x2; }}
void main { Car c; Truck t; Vehicle v1;
new c;
new t;
v1 := c;
c.passengers := 2;
c.move(c, 60);
v1.move(v1, 70);
c.await(c, t) ;}
In this example, the aim is to find the run-time class of each object on which a method is
invoked. This is useful because, statistically, most calls are applied to objects of exactly one
class; if this information is available to an optimizing compiler, it can turn the call into a
static one, saving the overhead of a virtual function call.
The constraint system for the analysis is given below (V, C and T abbreviate Vehicle, Car
and Truck; t1, t2 and t3 abbreviate this1, this2 and this3):
1. {V} ⊆ cl(v) ⇒ cl(v) ⊆ cl(t1)
2. {C} ⊆ cl(v) ⇒ cl(v) ⊆ cl(t1)
3. {T} ⊆ cl(v) ⇒ cl(v) ⊆ cl(t3)
4. {C} ⊆ cl(t2) ⇒ cl(t2) ⊆ cl(t1)
5. {C} ⊆ cl(c)
6. {T} ⊆ cl(t)
7. cl(c) ⊆ cl(v1)
8. {C} ⊆ cl(c) ⇒ cl(c) ⊆ cl(t1)
9. {V} ⊆ cl(v1) ⇒ cl(v1) ⊆ cl(t1)
10. {C} ⊆ cl(v1) ⇒ cl(v1) ⊆ cl(t1)
11. {T} ⊆ cl(v1) ⇒ cl(v1) ⊆ cl(t3)
12. {C} ⊆ cl(c) ⇒ cl(c) ⊆ cl(t2)
13. {C} ⊆ cl(c) ⇒ cl(t) ⊆ cl(v)
The conditional constraints are suitable for treating function calls.
The minimal solution of this system gives us a class-level analysis of this program.
In the presentation, edges are used to associate sets of constraints with each context.
1. The analysis of “if (v.position < position) then v.move()” adds the first
three constraints, one for each possible class.
2. Notice that in the second step we add only one constraint because ‘Car’ is
not sub-classed, and we always assume that the front-end supplies us with
this information by enforcing type-safety before we get to this analysis.
3. When we analyze an unconditional constraint such as {C} ⊆ cl(c), the effect is to add
cl(c) to the work-list.
4. The same holds for {T} ⊆ cl(t).
…
Intuitively, this analysis may be more efficient than data-flow analysis, since every time the
algorithm visits a constraint there is a good chance that it adds more elements.
In real-life applications many optimizations are applied, such as avoiding repeated
circular evaluations.
In this particular example, we were able to find the exact class for each method
invocation.
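The minimal solution can be reproduced with a small fixed-point computation. The sketch below is only an illustration: it uses plain round-robin iteration rather than the work-list algorithm, and the bit-mask encoding, the `CLS_*`/`X_*` names and the `solve` function are assumptions made for this example, not part of the lecture material.

```c
#include <stddef.h>

/* Classes as bits, so a set of classes is a bit mask (hypothetical encoding). */
enum { CLS_V = 1, CLS_C = 2, CLS_T = 4 };
/* Program variables v, this1, this2, this3, c, t, v1. */
enum { X_V, X_T1, X_T2, X_T3, X_C, X_T, X_V1, NVARS };

struct constraint {
    unsigned guard;   /* class that must be in cl(guard_var); 0 = unconditional */
    int guard_var;
    unsigned add;     /* constant set that flows into cl(dst) */
    int src;          /* variable whose set flows into cl(dst), or -1 */
    int dst;
};

/* The 13 constraints of the example, in order. */
static const struct constraint constraints[] = {
    { CLS_V, X_V,  0, X_V,  X_T1 },
    { CLS_C, X_V,  0, X_V,  X_T1 },
    { CLS_T, X_V,  0, X_V,  X_T3 },
    { CLS_C, X_T2, 0, X_T2, X_T1 },
    { 0, 0, CLS_C, -1, X_C },
    { 0, 0, CLS_T, -1, X_T },
    { 0, 0, 0, X_C, X_V1 },
    { CLS_C, X_C,  0, X_C,  X_T1 },
    { CLS_V, X_V1, 0, X_V1, X_T1 },
    { CLS_C, X_V1, 0, X_V1, X_T1 },
    { CLS_T, X_V1, 0, X_V1, X_T3 },
    { CLS_C, X_C,  0, X_C,  X_T2 },
    { CLS_C, X_C,  0, X_T,  X_V  },
};

/* Start from empty sets and re-apply every constraint until nothing changes. */
static void solve(unsigned cl[NVARS])
{
    int changed;
    for (int i = 0; i < NVARS; i++)
        cl[i] = 0;
    do {
        changed = 0;
        for (size_t k = 0; k < sizeof constraints / sizeof constraints[0]; k++) {
            const struct constraint *q = &constraints[k];
            if (q->guard && !(cl[q->guard_var] & q->guard))
                continue;                      /* guard not satisfied yet */
            unsigned flow = q->add | (q->src >= 0 ? cl[q->src] : 0u);
            if ((cl[q->dst] | flow) != cl[q->dst]) {
                cl[q->dst] |= flow;
                changed = 1;
            }
        }
    } while (changed);
}
```

Running `solve` yields a singleton set for every variable: cl(c) = cl(v1) = cl(this1) = cl(this2) = {Car} and cl(t) = cl(v) = cl(this3) = {Truck}, which agrees with the class annotations written next to the parameters in the program text.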
More information about control flow analysis is available from the following URLs:
http://www.cs.berkeley.edu/Research/~Aiken/bane.html
http://www.cs.washington.edu/research/projects/cecil
CSSV – C String Static Verifier
A work by: Nurit Dor, Michael Rodeh, Mooly Sagiv and Greta Yorsh.
Example – unsafe call to strcpy()
simple()
{
    char s[20];
    char *p;
    char t[10];
    strcpy(s, "Hello");
    p = s + 5;
    strcpy(p, " world!");
    strcpy(t, s);
}
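The last call to strcpy causes characters to be written past the end of the buffer t:
s holds "Hello world!", which needs 13 bytes including the terminating null character,
while t has room for only 10.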
Complicated Example
/* from web2c [fixwrites.c] */
#define BUFSIZ 1024
char buf[BUFSIZ];
char *insert_long(char *cp)
{
    char temp[BUFSIZ];
    …
    for (i = 0; &buf[i] < cp; ++i)
        temp[i] = buf[i];
    strcpy(&temp[i], "(long)");
    strcpy(&temp[i+6], cp);
    …
[Figure: buf, cp, and temp with "(long)" inserted]
When the cp pointer is close to the end of the buffer, the statement
strcpy(&temp[i],"(long)") might access memory outside the buffer’s bounds, as shown in the
next figure:
[Figure: buf, cp and temp; "(long)" extends past the end of temp]
In the next figure we see an example in which cp is not too close to the end of the buffer,
but the statement strcpy(&temp[i+6],cp); might “push” elements outside the buffer’s
bounds:
[Figure: buf, cp and temp; the text copied after "(long)" extends past the end of temp]
Note that this does not necessarily mean that a program using this function
contains errors. In many cases there are complicated relationships between
the server and the client, which makes it harder to avoid false alarms (for example, the
client function may perform the bounds check before calling this function).
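The two overflow conditions in insert_long reduce to simple arithmetic on i and on the length of the string at cp. The sketch below spells this out; the function names and the constant `TEMP_SIZE` (standing in for BUFSIZ = 1024 from the example) are hypothetical names chosen for illustration.

```c
#include <stddef.h>

enum { TEMP_SIZE = 1024 };   /* assumed size of temp, as in the example */

/* strcpy(&temp[i], "(long)") writes 6 characters plus the terminator,
   i.e. the bytes temp[i] .. temp[i + 6]. */
static int first_copy_in_bounds(size_t i)
{
    return i + 7 <= TEMP_SIZE;
}

/* strcpy(&temp[i + 6], cp) writes cp_len characters plus the terminator,
   i.e. the bytes temp[i + 6] .. temp[i + 6 + cp_len]. */
static int second_copy_in_bounds(size_t i, size_t cp_len)
{
    return i + 6 + cp_len + 1 <= TEMP_SIZE;
}
```

Under these assumptions the first copy is safe only for i ≤ 1017, and the second copy can still overflow for smaller i when the remainder of the string at cp is long enough.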
A Real Example
void RTC_Si_SkipLine(const INT32 NbLine, char ** const PtrEndText)
{
    INT32 indice;
    for (indice=0; indice<NbLine; indice++) {
        **PtrEndText = '\n';
        (*PtrEndText)++;
    }
    **PtrEndText = '\0';
    return;
}
This example involves multilevel pointers and numeric values.
[Figure: PtrEndText points to a pointer into the text buffer; the function writes NbLine + 1 characters through it]
Are String Violations Common?
FUZZ study (1995)
– Random test programs on various systems
– 9 different UNIX systems
– 18% – 23% hang or crash
– 80% are string-related errors
– “Errors in the use of pointers and array subscripts dominate the results of our tests.”
CERT advisory
– 50% of attacks are abuses of buffer overflows
Current Methods
Runtime
– Safe-C [PLDI’94]
– Purify
– Bounds checking
Static + Runtime
– CCured [POPL’02]
Structure of CSSV
C files
The input files.
Procedure name
A procedure to analyze.
Pre/Mod/Post Annotations
Function annotations supplied by the user.
The Pre annotation specifies what is expected on entrance to the function (precondition). This is checked before a call to another function.
The Mod annotation specifies everything that can change by the end of the
function.
The Post annotation specifies the condition expected on exit from the function
(post-condition).
The annotations have two roles. They allow the verifier to work without
interprocedural analysis (modular analysis), since it only has to verify that each
function is consistent with its local constraints. They also help reduce a potentially
large number of false alarms caused by the very conservative pointer analysis.
Specification of strcpy
char* strcpy(char* dst, char *src)
requires ( string(src) ∧ alloc(dst) > len(src) )
mod dst.strlen, dst.is_nullt
ensures ( len(dst) == pre@len(src) ∧ return == pre@dst )
Notice that there is no explicit requirement that src remain unchanged. This is specified
implicitly by not including src in the mod section.
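To make the contract concrete, the sketch below checks the same conditions at run time. This is not how CSSV works (CSSV verifies the contract statically); `checked_strcpy` and its `dst_alloc` parameter are hypothetical, introduced only because plain C cannot ask a pointer for alloc(dst).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* strcpy with the pre- and post-conditions of the specification asserted.
   dst_alloc plays the role of alloc(dst): the bytes available at dst. */
static char *checked_strcpy(char *dst, size_t dst_alloc, const char *src)
{
    size_t src_len = strlen(src);       /* len(src); relies on string(src) */
    assert(dst_alloc > src_len);        /* requires: alloc(dst) > len(src) */

    char *ret = strcpy(dst, src);

    assert(strlen(dst) == src_len);     /* ensures: len(dst) == pre@len(src) */
    assert(ret == dst);                 /* ensures: return == pre@dst */
    return ret;
}
```

With this wrapper, the last call in simple() would correspond to checked_strcpy(t, sizeof t, s) and would fail the first assertion, because 10 > 12 does not hold.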
Specification – insert_long()
/* insert_long.c */
#include "insert_long.h"
char buf[BUFSIZ];
char * insert_long (char *cp) {
    char temp[BUFSIZ];
    int i;
    for (i=0; &buf[i] < cp; ++i){
        temp[i] = buf[i];
    }
    strcpy (&temp[i],"(long)");
    strcpy (&temp[i + 6], cp);
    strcpy (buf, temp);
    return cp + 6;
}
char * insert_long(char *cp)
requires ( string(cp) ∧ buf ≤ cp < buf + BUFSIZ )
mod cp.strlen
ensures ( cp.strlen == pre[cp.strlen + 6] ∧
          return_value == cp + 6 )
In this example, the requires annotation specifies that cp is a string and gives its bounds
within buf.
The mod annotation specifies that only cp.strlen may be modified by this function.
The ensures annotation specifies that the length of the string (cp.strlen) after the call
is six characters longer than its original length (pre denotes the value on
entrance to insert_long), and that the returned pointer is at an offset of six characters
from cp.
Pointer Analysis
CSSV uses an interprocedural, flow-insensitive pointer analysis.
The analysis is used to build local function information for every argument, which
increases the precision of the overall analysis by allowing it to perform
strong updates to pointer selectors.
foo(char *p, char *q)
{
    char local[100];
    …
    p = local;
    *q = 0;
    …
}
[Figure: local, p, q]
main()
{
    char s[10], t[20], r[30];
    char *temp;
    foo(s,t);
    foo(s,r);
    …
    temp = s;
    …
}
[Figure: s, t, temp]
In this example, we see how the information built for foo does not distinguish between
the two calls.
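The loss of precision can be seen by collecting every "pointer receives buffer" fact of the program into one set per pointer, ignoring both statement order and call site. The sketch below does only that; it is a deliberately tiny stand-in for a real points-to analysis, and the `LOC_*`/`PTR_*` names, the `facts` table and `points_to` are assumptions made for this illustration.

```c
#include <stddef.h>

/* Buffers as bits, so a points-to set is a bit mask (hypothetical encoding). */
enum { LOC_S = 1, LOC_T = 2, LOC_R = 4, LOC_LOCAL = 8 };
/* Pointer variables of the example. */
enum { PTR_P, PTR_Q, PTR_TEMP, NPTRS };

struct fact { int ptr; unsigned loc; };

/* Every assignment of a buffer address to a pointer, in no particular order. */
static const struct fact facts[] = {
    { PTR_P,    LOC_S     },   /* foo(s,t): p receives s */
    { PTR_Q,    LOC_T     },   /* foo(s,t): q receives t */
    { PTR_P,    LOC_S     },   /* foo(s,r): p receives s */
    { PTR_Q,    LOC_R     },   /* foo(s,r): q receives r */
    { PTR_P,    LOC_LOCAL },   /* p = local */
    { PTR_TEMP, LOC_S     },   /* temp = s */
};

/* Flow- and context-insensitive merge: union all facts per pointer. */
static void points_to(unsigned pts[NPTRS])
{
    for (int i = 0; i < NPTRS; i++)
        pts[i] = 0;
    for (size_t k = 0; k < sizeof facts / sizeof facts[0]; k++)
        pts[facts[k].ptr] |= facts[k].loc;
}
```

The result is p → {s, local}, q → {t, r} and temp → {s}: inside foo, q may refer to either t or r, regardless of which call is being considered.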
C2IP
Converts a C program to an integer program.
The conversion is conservative in the sense that if a (potential) bug (violation of the
pre/post constraints) exists in the original program then an assert statement is violated in
the integer program.
The conversion is done by inlining the specifications, as in the following example:
strcpy(s, "hello");
assert( s.offset < s.alloc &&
        s.alloc - s.offset > 5 );
eliminate( s.len );
assume( s.len == s.offset + 5);
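The integer program can be mimicked in ordinary C by tracking only the integer attributes of a pointer. The sketch below is a hand-written model under stated assumptions, not C2IP output: the struct `istr` and the function `ip_strcpy_const` are hypothetical, the precondition follows the strcpy specification (alloc(dst) > len(src)), and the eliminate/assume pair is modelled by a plain assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Integer view of a char pointer and the buffer it points into. */
struct istr {
    long offset;    /* offset of the pointer from the start of the buffer */
    long alloc;     /* allocated size of the buffer */
    long len;       /* position of the null terminator within the buffer */
    bool is_nullt;  /* whether the buffer holds a null-terminated string */
};

/* Integer-level effect of strcpy(dst, <string constant of length src_len>). */
static void ip_strcpy_const(struct istr *dst, long src_len)
{
    /* inlined precondition: dst points inside its buffer and the copy fits */
    assert(dst->offset < dst->alloc);
    assert(dst->alloc - dst->offset > src_len);

    /* inlined postcondition: the terminator sits src_len past the pointer */
    dst->len = dst->offset + src_len;
    dst->is_nullt = true;
}
```

For the buffer s[20] from simple(), calling the model with offset 0 and length 5 gives len = 5; advancing the offset by 5 (p = s + 5) and calling it again with length 7 gives len = 12, the length of "Hello world!".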
Sometimes the points-to information is insufficient for determining which part of the
branch should be taken, as in the following example:
[Figure: p may point to either aloc1 or aloc5]
So a non-deterministic treatment is necessary:
if (…) {
    aloc1.len = p.offset;
    aloc1.is_nullt = true; }
else {
    aloc5.len = p.offset;
    aloc5.is_nullt = true; }
Memory allocation statements, such as “malloc”, are handled conservatively by
representing all memory allocated at the same statement by a single summary node.
Integer Analysis
The interval analysis approach we learned in class is not sufficiently precise for our
purposes, because it ignores the relationships between variables.
For this purpose, a much stronger analysis is needed, which uses polyhedra to express
linear relations between variables. Cousot and Halbwachs introduced this abstract domain
in 1978.
Linear inequalities between variables can be expressed directly, and a pair of opposite weak
inequalities specifies an equality.
The points in the feasible region of the polyhedron (the filled area) correspond to
potential combinations of variable values.
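[Figure: a polyhedron in the x-y plane, with both axes marked from 0 to 3]
Constraints: y ≥ 1, x + y ≥ 3, -x + y ≤ 1
Generators: vertices V = <(1,2) (2,1)>, rays R = <(1,0) (1,1)>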
The polyhedra domain defines the join operator as the convex hull of two
polyhedra, and provides a widening operator (needed because the height of the lattice is not
finite).
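The two descriptions of the example polyhedron, by constraints and by generators, can be related with a few lines of code. The sketch below assumes that the constraints in the figure read y ≥ 1, x + y ≥ 3 and -x + y ≤ 1, with vertices (1,2), (2,1) and rays (1,0), (1,1); the function names are hypothetical.

```c
#include <stdbool.h>

/* Constraint form: y >= 1, x + y >= 3, -x + y <= 1. */
static bool in_polyhedron(double x, double y)
{
    return y >= 1.0 && x + y >= 3.0 && -x + y <= 1.0;
}

/* Generator form: a convex combination of the vertices (1,2) and (2,1),
   moved along non-negative multiples of the rays (1,0) and (1,1).
   Expected arguments: 0 <= lambda <= 1, a >= 0, b >= 0. */
static void from_generators(double lambda, double a, double b,
                            double *x, double *y)
{
    *x = lambda * 1.0 + (1.0 - lambda) * 2.0 + a * 1.0 + b * 1.0;
    *y = lambda * 2.0 + (1.0 - lambda) * 1.0 + a * 0.0 + b * 1.0;
}
```

Every point produced by from_generators with valid arguments satisfies in_polyhedron, while a point such as (1,3) violates -x + y ≤ 1 and lies outside.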
AWP
Approximate Weakest Precondition
CSSV also includes an automatic generator for producing (conservative) annotations.
1. Mod annotations can be computed by analyzing the body of the function
using points-to information.
2. The Pre annotation is approximated by running the integer analysis backwards.
This means that the control-flow graph of the integer program is
traversed in a backward manner.
3. The Post annotation is computed from Pre by analyzing the body of the
procedure.
These annotations can be used as a starting point, and later refined manually.
Potential Error Messages
CSSV can also supply counter-examples, as in the following example:
buf.offset = 0
temp.offset = 0
0 ≤ cp.offset = i
i ≤ sbuf.len < sbuf.msize
sbuf.msize = 1024
stemp.msize = 1024
i = cp.offset ≥ 1018
[Figure: buf, cp and temp; "(long)" extends past the end of temp]
assert(0 ≤ i < stemp.msize - 6);
// strcpy(&temp[i],"(long)");
Potential violation when
cp.offset ≥ 1018
Good Luck with the exam!