CMSC 631 – Fall 2002
Ruggero Morselli

CMSC 631 – PAPER REVIEW
Jens Palsberg, Type-Based Analysis and Applications, PASTE 2001

Program analysis is a general term that covers several problems. Examples include alias analysis (determine whether two program expressions can both refer to the same memory location), liveness analysis (determine whether the value of a variable at a certain point of the program may be read before it gets overwritten or destroyed), and method reachability (determine which procedures a polymorphic procedure call can actually invoke at run time), to mention just a few. Most of these problems are relevant to optimizing compilation, but there are other applications, such as program understanding and correctness checking.

Type-based program analysis is the collective name for the techniques that solve these problems by exploiting the type derivation of the program, assuming that the program is written in a language with some notion of static types and that it type checks. The paper discusses the advantages of type-based program analysis and illustrates them with several examples.

The first sample problem is flow analysis for the lambda calculus: given a lambda-calculus expression e, determine, for each subexpression e' of e, which functions f whose definitions appear in e the subexpression e' can evaluate to. The paper briefly recalls 0-CFA, a non-type-based technique for this analysis. It then discusses three different type-based techniques for the same problem.

1. Types as discriminators: a simple type system is defined for the lambda calculus, and the type inference algorithm is run on the input expression. Each e' can evaluate to any function f that has the same type as e'.
2. Type and effect system: the simple type system is augmented with qualified function types. Each function type is qualified with the set of functions that a value of that type can possibly be. Type inference then produces the result.
3. Sparse flow graph: this method does not use the type system directly.
It runs an algorithm similar to 0-CFA that apparently ignores types. The advantage is that, if the expression type checks against the original type system and the size of the types of the subexpressions is O(1) in the size of the program, then this algorithm produces the output graph in linear, rather than cubic, time.

A second example of an analysis problem is method reachability, which is useful for determining which function calls can be inlined. In most statically typed object-oriented languages, a polymorphic call is made through a pointer to an object, and the type of the pointer is known. The type of the object can only be a subtype of the type of the pointer. This fact can be used to efficiently rule out most of the functions of the program as possible targets of the call. More type information can be used to refine the answer: for instance, checking whether the program contains upcasts from a type T to a supertype T1 tells us whether a pointer of type T1 can actually refer to an object of type T.

Another example is alias analysis. The same techniques seen for method reachability can be used to decide whether two pointers can actually refer to the same memory location. Additional type-based tricks include observing that two expressions p.f and q.g are never aliases if f ≠ g, because at most they can be two distinct fields of the same object.

Other examples mentioned in the paper include using qualified types to check whether the objects of a Java class with default (package) access can actually be accessed by code in a different package (confinement analysis), which is useful for program understanding, and implementing region-based memory management for variables in Standard ML, where the places in the program at which regions are allocated and deallocated can be determined statically.

Advantages of type-based analysis? First of all, it tends to be far more efficient than analyses that ignore types. The sparse flow graph compared with 0-CFA, mentioned above, is just one example.
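The way type information prunes the candidates in the method-reachability problem described above can be illustrated with a minimal sketch, written in the spirit of class hierarchy analysis. The class names, the `PARENT` and `DEFINES` tables, and the helper functions are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical toy hierarchy: each class maps to its direct superclass.
PARENT = {
    "Shape": None,
    "Circle": "Shape",
    "Square": "Shape",
    "RoundedSquare": "Square",
    "Logger": None,
}

# Methods that each class defines itself (inherited methods are not listed).
DEFINES = {
    "Shape": {"area", "name"},
    "Circle": {"area"},
    "Square": {"area"},
    "RoundedSquare": set(),
    "Logger": {"name"},
}


def is_subtype(cls, ancestor):
    """True if cls equals ancestor or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def resolve(cls, method):
    """Class whose definition of method runs for an object of class cls."""
    while cls is not None:
        if method in DEFINES[cls]:
            return cls
        cls = PARENT[cls]
    return None


def possible_targets(static_type, method):
    """Conservative set of definitions a call through static_type may invoke."""
    targets = set()
    for cls in PARENT:
        # Only subtypes of the pointer's static type can be the run-time class.
        if is_subtype(cls, static_type):
            owner = resolve(cls, method)
            if owner is not None:
                targets.add(f"{owner}.{method}")
    return targets
```

In this toy hierarchy, `possible_targets("Shape", "name")` contains a single definition, so such a call would be a candidate for inlining, while `Logger.name` is ruled out by its type alone.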
Another important point is that type-based analyses are easier to understand and therefore simpler to design. It is also well understood how to prove the correctness of a type system, and such a proof often establishes the correctness of the analysis as well.

In conclusion, types are very useful for deducing properties of a program, and they enable efficient optimizing compilation and program understanding. This also underscores the advantage of strongly, statically typed languages like Java over weakly typed languages like C or dynamically typed languages like Lisp.
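The "types as discriminators" technique for flow analysis summarized in this review can be sketched as follows. This is only an illustration under simplifying assumptions: lambda parameters carry explicit type annotations, so the sketch type checks instead of running type inference, and the term classes, labels, and the function `flow_by_type` are hypothetical names, not the paper's.

```python
from dataclasses import dataclass

INT = "int"


def arrow(a, b):
    """Function type a -> b, encoded as a tuple."""
    return ("->", a, b)


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    label: str          # name used to report this function in the result
    param: str
    param_type: object  # explicit annotation (assumption of this sketch)
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def type_of(term, env, record):
    """Simple type checker; appends (subterm, type) to record for every subterm."""
    if isinstance(term, Const):
        t = INT
    elif isinstance(term, Var):
        t = env[term.name]
    elif isinstance(term, Lam):
        body_t = type_of(term.body, {**env, term.param: term.param_type}, record)
        t = arrow(term.param_type, body_t)
    elif isinstance(term, App):
        fn_t = type_of(term.fn, env, record)
        arg_t = type_of(term.arg, env, record)
        if not (isinstance(fn_t, tuple) and fn_t[1] == arg_t):
            raise TypeError("ill-typed application")
        t = fn_t[2]
    else:
        raise TypeError("unknown term")
    record.append((term, t))
    return t


def flow_by_type(program):
    """Map each subterm to the labels of all lambdas that share its type."""
    record = []
    type_of(program, {}, record)
    by_type = {}
    for term, t in record:
        if isinstance(term, Lam):
            by_type.setdefault(t, set()).add(term.label)
    return {term: by_type.get(t, set()) for term, t in record}


# Example: (lambda g. (lambda f. f 1) id) zero
id_int = Lam("id", "x", INT, Var("x"))
zero = Lam("zero", "y", INT, Const(0))
apply1 = Lam("apply1", "f", arrow(INT, INT), App(Var("f"), Const(1)))
program = App(Lam("ignore", "g", arrow(INT, INT), App(apply1, id_int)), zero)
flows = flow_by_type(program)
```

Here `flows[Var("f")]` contains both `id` and `zero`, although only `id` is ever bound to `f` at run time. The answer is safe but coarse, which is the kind of imprecision that the more refined techniques in the list above aim to reduce.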