Program Analysis Lecture Notes
CFA example and CSSV (June 30, 2002)
Compiled by Roman Manevich (rumster@post.tau.ac.il)

Applying the CFA algorithm to the motivating example

    class Vehicle Object {
        int position = 10;
        static void move(Vehicle this1 {Car}, int x1) {
            position = position + x1;
        }
    }

    class Car extends Vehicle {
        int passengers;
        static void await(Car this2 {Car}, Vehicle v {Truck}) {
            if (v.position < position)
                then v.move(v, position - v.position);
            else this2.move(this2, 10);
        }
    }

    class Truck extends Vehicle {
        static void move(Truck this3 {Truck}, int x2) {
            if (x2 < 55) position += x2;
        }
    }

    void main {
        Car c; Truck t; Vehicle v1;
        new c; new t;
        v1 := c;
        c.passengers := 2;
        c.move(c, 60);
        v1.move(v1, 70);
        c.await(c, t);
    }

The class sets written in braces after the parameters (for example {Car} after this1) are the sets the analysis computes for them.

In this example, the aim is to find the run-time types of the objects on which methods are invoked. This is useful because, statistically, most calls are applied to exactly one type. If this information is available to an optimizing compiler, it can turn the call into a static one, saving the overhead of a virtual function call.

The constraint system for the analysis is given below. V, C and T stand for the classes Vehicle, Car and Truck, and t1, t2, t3 stand for this1, this2, this3.

    1.  {V} ⊆ cl(v)  ⇒ cl(v) ⊆ cl(t1)
    2.  {C} ⊆ cl(v)  ⇒ cl(v) ⊆ cl(t1)
    3.  {T} ⊆ cl(v)  ⇒ cl(v) ⊆ cl(t3)
    4.  {C} ⊆ cl(t2) ⇒ cl(t2) ⊆ cl(t1)
    5.  {C} ⊆ cl(c)
    6.  {T} ⊆ cl(t)
    7.  cl(c) ⊆ cl(v1)
    8.  {C} ⊆ cl(c)  ⇒ cl(c) ⊆ cl(t1)
    9.  {V} ⊆ cl(v1) ⇒ cl(v1) ⊆ cl(t1)
    10. {C} ⊆ cl(v1) ⇒ cl(v1) ⊆ cl(t1)
    11. {T} ⊆ cl(v1) ⇒ cl(v1) ⊆ cl(t3)
    12. {C} ⊆ cl(c)  ⇒ cl(c) ⊆ cl(t2)
    13. {C} ⊆ cl(c)  ⇒ cl(t) ⊆ cl(v)

The conditional constraints are suitable for treating function calls. The minimal solution of this system gives us a class-level analysis of the program. In the presentation, edges are used to associate sets of constraints with each context.

1. The analysis of "if (v.position < position) then v.move()" adds the first three constraints, one for each possible class.
2. In the second step (the else branch, this2.move(this2, 10)) we add only one constraint, because Car is not sub-classed. We always assume that the front-end supplies us with this information by enforcing type-safety before we get to this analysis.
3. When we analyze an unconditional constraint such as {C} ⊆ cl(c), the effect is to add cl(c) to the work-list.
4. The same holds for {T} ⊆ cl(t) …

Intuitively, this analysis may be more efficient than data flow analysis, since every time the algorithm visits a constraint there is a good chance that it adds more elements. In real-life applications many optimizations are applied, such as avoiding repeated circular evaluations. In this particular example, we were able to find the exact class for each method invocation.

More information about control flow analysis is available from the following URLs:
http://www.cs.berkeley.edu/Research/~Aiken/bane.html
http://www.cs.washington.edu/research/projects/cecil

CSSV – C String Static Verifier

A work by Nurit Dor, Michael Rodeh, Mooly Sagiv and Greta Yorsh.

Example – unsafe call to strcpy()

    void simple() {
        char s[20];
        char *p;
        char t[10];
        strcpy(s, "Hello");
        p = s + 5;
        strcpy(p, " world!");
        strcpy(t, s);
    }

The last call to strcpy causes characters to be written past the end of the buffer t: at that point s holds "Hello world!", which needs 13 bytes including the terminating null, while t has room for only 10.
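The overflow in simple() can be checked by hand by comparing each string length with the space left in its destination. The following is a minimal sketch in C of that check. The helper names copy_fits and simple_checked are hypothetical and are not part of CSSV or of the C library; the sketch only replays the three copies with a runtime size test in front of each one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of CSSV or the C library): returns nonzero
   when src, including its terminating null byte, fits in dst_size bytes. */
int copy_fits(size_t dst_size, const char *src) {
    return strlen(src) < dst_size;
}

/* Replays simple() with every strcpy guarded by the size check.
   Returns 1 if the final copy into t is safe, 0 otherwise. */
int simple_checked(void) {
    char s[20];
    char *p;
    char t[10];

    assert(copy_fits(sizeof s, "Hello"));        /* 6 bytes into 20 */
    strcpy(s, "Hello");
    p = s + 5;                                   /* p points at the null byte of s */
    assert(copy_fits(sizeof s - 5, " world!"));  /* 8 bytes into the remaining 15 */
    strcpy(p, " world!");

    /* s now holds "Hello world!": 12 characters plus the null byte. */
    if (!copy_fits(sizeof t, s)) {
        return 0;                                /* 13 bytes do not fit in 10 */
    }
    strcpy(t, s);
    return 1;
}
```

Calling simple_checked() returns 0, because the third copy needs 13 bytes and t provides 10. A static verifier aims to establish the same fact without running the program.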
Complicated Example

    /* from web2c [fixwrites.c] */
    #define BUFSIZ 1024
    char buf[BUFSIZ];

    char *insert_long(char *cp) {
        char temp[BUFSIZ];
        …
        for (i = 0; &buf[i] < cp; ++i)
            temp[i] = buf[i];
        strcpy(&temp[i], "(long)");
        strcpy(&temp[i+6], cp);
        …

When cp points close to the end of buf, the statement strcpy(&temp[i], "(long)") may write past the end of temp, as shown in the next figure:

[Figure: buf with cp near its end; temp with "(long)" running past its end]

In the next figure we see an example in which cp is not too close to the end of the buffer, but the statement strcpy(&temp[i+6], cp) might "push" elements outside the buffer's bounds:

[Figure: buf with cp; temp with "(long)" followed by the copied tail]

Notice that this does not mean that a program using this function necessarily contains errors. In many cases there are complicated relationships between the server (the called function) and its clients (the callers), and this makes it harder to avoid false alarms. For example, the client function may perform the bounds check before calling this function.

A Real Example

    void RTC_Si_SkipLine(const INT32 NbLine, char ** const PtrEndText)
    {
        INT32 indice;
        for (indice = 0; indice < NbLine; indice++) {
            **PtrEndText = '\n';
            (*PtrEndText)++;
        }
        **PtrEndText = '\0';
        return;
    }

[Figure: PtrEndText and the NbLine + 1 bytes that are written]

This example involves multilevel pointers and numeric values: the function writes NbLine newline characters followed by a terminating null, that is, NbLine + 1 bytes.

Are String Violations Common?

FUZZ study (1995)
• Random tests of programs on various systems
• 9 different UNIX systems
• 18% to 23% hang or crash
• 80% are string-related errors
• "Errors in the use of pointers and array subscripts dominate the results of our tests."

CERT advisory
• 50% of attacks are abuses of buffer overflows

Current Methods

Runtime
- Safe-C [PLDI'94]
- Purify
- Bound-checking

Static + Runtime
- CCured [POPL'02]

Structure of CSSV

C files: the input files.
Procedure name: the procedure to analyze.
Pre/Mod/Post annotations: function annotations supplied by the user. The Pre annotation specifies what is expected on entrance to the function (precondition).
This is checked before a call to another function. The Mod annotation specifies everything that can change by the end of the function. The Post annotation specifies the condition expected on exit from the function (post-condition).

The annotations have two roles. First, they allow the verifier to work without interprocedural analysis (modular analysis), since it only has to verify that each function is consistent with its local constraints. Second, they help reduce a potentially large number of false alarms caused by the very conservative pointer analysis.

Specification of strcpy

    char* strcpy(char* dst, char *src)
        requires ( string(src) ∧ alloc(dst) > len(src) )
        mod dst.strlen, dst.is_nullt
        ensures ( len(dst) == pre@len(src) ∧ return == pre@dst )

Notice that there is no explicit requirement for src to be immutable. This is implicitly specified by not including src in the mod section.

Specification – insert_long()

    /* insert_long.c */
    #include "insert_long.h"
    char buf[BUFSIZ];

    char * insert_long (char *cp) {
        char temp[BUFSIZ];
        int i;
        for (i = 0; &buf[i] < cp; ++i) {
            temp[i] = buf[i];
        }
        strcpy (&temp[i], "(long)");
        strcpy (&temp[i + 6], cp);
        strcpy (buf, temp);
        return cp + 6;
    }

    char * insert_long(char *cp)
        requires ( string(cp) ∧ buf ≤ cp < buf + BUFSIZ )
        mod cp.strlen
        ensures ( cp.strlen == pre[cp.strlen + 6] ∧ return_value == cp + 6 )

In this example, the requires annotation specifies that cp is a string and that it points inside buf. The mod annotation specifies that only the length of the string pointed to by cp (cp.strlen) may be modified in this function. The ensures annotation specifies that the length of the string after the call is six characters longer than its original length (pre means the value on entrance to insert_long), and that the returned pointer is at an offset of six characters from cp.

Pointer Analysis

Interprocedural flow-insensitive pointer analysis.
The analysis is used to build local function information for every argument, which increases the precision of the overall analysis by allowing it to perform strong updates to pointer selectors.

    foo(char *p, char *q) {
        char local[100];
        …
        p = local;
        *q = 0;
        …
    }

[Figure: points-to information for foo; nodes local, p, q]

    main() {
        char s[10], t[20], r[30];
        char *temp;
        foo(s, t);
        foo(s, r);
        …
        temp = s
        …
    }

[Figure: points-to information for main; nodes s, t, temp]

In this example, we see that the information built for foo does not distinguish between the two calls: inside foo, q may point to either t or r.

C2IP

Converts a C program to an integer program. The conversion is conservative in the sense that if a (potential) bug, that is, a violation of the pre/post constraints, exists in the original program, then an assert statement is violated in the integer program. The conversion is done by inlining the specifications. For example, the call

    strcpy(s, "hello");

is translated into

    assert(s.offset < s.alloc && s.alloc - s.offset > s.len);
    eliminate(s.len);
    assume(s.len == s.offset + 5);

Sometimes the points-to information is insufficient to determine which location a pointer refers to, as in the following example, where p may point to either aloc1 or aloc5:

[Figure: p pointing to aloc1 and aloc5]

So a non-deterministic treatment is necessary:

    if (…) {
        aloc1.len = p.offset;
        aloc1.is_nullt = true;
    } else {
        aloc5.len = p.offset;
        aloc5.is_nullt = true;
    }

Memory allocation statements, such as "malloc", are handled conservatively by representing all memory allocated at the same statement by a single summary node.

Integer Analysis

The interval analysis approach we learned in class is not sufficiently precise for our purposes, because it ignores the relationships between variables. A much stronger analysis is needed, one that uses polyhedra to express linear relations between variables. Cousot and Halbwachs introduced this abstract domain in 1978. Linear inequalities between variables can be expressed directly, and a pair of weak inequalities can specify an equality (x ≤ y together with y ≤ x gives x = y).
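The loss of precision in the interval domain can be seen in a small sketch. The C code below is only an illustration under stated assumptions: the Interval type and the functions interval_sub, proves_positive and interval_proves_room are hypothetical names and are not taken from CSSV. It tracks alloc in [1, 1024] and len in [0, 1023] as two unrelated intervals. Even if every concrete state satisfies len = alloc - 1, the interval computed for alloc - len is [-1022, 1024], so the check alloc > len cannot be proved.

```c
/* Hypothetical interval abstraction: all values v with lo <= v <= hi. */
typedef struct {
    int lo;
    int hi;
} Interval;

/* Abstract subtraction: the tightest interval containing a - b for every
   a in x and every b in y, treating x and y as unrelated variables. */
Interval interval_sub(Interval x, Interval y) {
    Interval r = { x.lo - y.hi, x.hi - y.lo };
    return r;
}

/* Nonzero if the interval proves that the value is strictly positive. */
int proves_positive(Interval x) {
    return x.lo > 0;
}

/* Tries to prove alloc > len (the shape of the strcpy precondition)
   using intervals only. Returns 0: the proof fails. */
int interval_proves_room(void) {
    Interval alloc = { 1, 1024 };
    Interval len = { 0, 1023 };
    return proves_positive(interval_sub(alloc, len));
}
```

A relational domain that keeps the single constraint alloc - len ≥ 1 proves the same check immediately, which is the motivation for moving from intervals to polyhedra.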
The points in the feasible region of the polyhedron (the filled area) correspond to potential combinations of variable values.

[Figure: a polyhedron in the x-y plane (both axes marked 0 to 3), defined by the constraints y ≥ 1, x + y ≥ 3, -x + y ≤ 1, or equivalently by the vertices V = <(1,2), (2,1)> and the rays R = <(1,0), (1,1)>]

The polyhedra domain defines the join operator by means of the convex hull of two polyhedra, and a widening operator (needed because the height of the lattice is not finite).

AWP: Approximate Weakest Precondition

CSSV also includes an automatic generator for producing (conservative) annotations.

1. Mod annotations can be computed by analyzing the body of the function using points-to information.
2. The Pre annotation is approximated by running the integer analysis backwards. This means that the control flow graph of the integer program is traversed in a backward manner.
3. The Post condition is computed from Pre by analyzing the body of the procedure.

These annotations can be used as a starting point and later refined manually.

Potential Error Messages

CSSV can also supply counter-examples, as in the following example:

    buf.offset = 0
    temp.offset = 0
    0 ≤ cp.offset = i
    i ≤ sbuf.len < sbuf.msize
    sbuf.msize = 1024
    stemp.msize = 1024

    i = cp.offset ≥ 1018

[Figure: buf with cp near its end; temp with "(long)" running past its end]

    assert(0 ≤ i < stemp.msize - 6);   // strcpy(&temp[i],"(long)");

Potential violation when cp.offset ≥ 1018.

Good Luck with the exam!