IBM Research

Applications of Type Constraints in Software Engineering Tools
Frank Tip
IBM T.J. Watson Research Center
© 2004 IBM Corporation

This Presentation is Based on Joint Work With
Ittai Balaban (New York University)
Dirk Bäumer (IBM Zurich Research Center)
Bjorn De Sutter (Ghent University)
Julian Dolby (IBM T.J. Watson Research Center)
Robert Fuhrer (IBM T.J. Watson Research Center)
Adam Kieżun (MIT)

IBM Research
about 3000 people worldwide
– 1600 at IBM T.J. Watson Research Center
– other sites: Almaden, Austin, Zurich, Haifa, China, India
Software Technology Department
– about 70 people, director Daniel Yellin
– projects on: compiler optimization (JikesRVM), aspects, performance analysis, web services, refactoring, verification, XML, ...
– www.research.ibm.com/compsci/plansoft/index.html
ARTIST project (Advanced Refactoring Tools for Improving Software archiTecture)
– Robert Fuhrer, Mandana Vaziri, Tim Klinger, Adam Kieżun (intern), Frank Tip (project leader)
– collaboration with the Eclipse JDT team at IBM Zurich
– collaboration with IBM Rational
– academic collaborations with Bjorn De Sutter (Ghent University), Ittai Balaban (NYU)

Other Research Activities
change impact analysis
– given an old and a new version of a program, and a test that fails in the new version, find the subset of the source code changes responsible for the failure
– with Barbara Ryder and Xiaoxia Ren (Rutgers), Julian Dolby (IBM), and Max Stoerzer (University of Passau)
– papers: PASTE’01, OOPSLA’04
Jax: an application extractor for Java
– apply static analysis techniques to eliminate redundant functionality from Java applications, and apply size-reducing transformations
– with Peter Sweeney, Chris Laffra, Aldo Eisma, David Streeter
– transferred to an IBM product (WebSphere Studio Device Developer)
– papers: CACM’03, TOPLAS’02, OOPSLA’00, FSE’00, OOPSLA’99
Outline
background
type constraints for Java programs
– notation and terminology
– constraint generation rules
applications
– generalization-related refactorings (OOPSLA’03)
– customization of library classes (ECOOP’04)
– refactorings for introducing generics (work in progress)
related work
conclusions and future work

Scope of our Research
start with a type-correct Java program P
for a given transformation that transforms P into P’
– we would like to check/guarantee that P’ is type-correct
– we would like to check/guarantee that P’ has the same behavior as P
– (in some cases) compute the “maximal” P’ for which the above properties hold
we use type constraints to establish these properties
– a formalism for expressing relationships between the types of program expressions that must hold in order for a program to be type-correct
– traditionally used for type checking and type inference
transformations under consideration
– refactorings: well-known maintenance operations, usually aimed at making code more flexible/general; proposed by the programmer
– transformations driven by static/dynamic analysis in a link-time optimizer

Refactoring
refactoring: the application of behavior-preserving transformations to a program in order to improve its design
– eliminating undesirable program characteristics
– e.g., duplicated code, classes/methods that are too large, ...
– making existing classes/methods usable in new contexts
– preparing for extensions
– breaking up monolithic systems into components
– introduction of design patterns
refactoring (noun): a specific program transformation, usually identified by:
– a name (e.g., “Extract Method”, “Pull Up Members”, ...)
– preconditions
– a specific set of transformations to be performed by a programmer or by an automated tool

Refactoring
pioneered by Griswold [1991], Opdyke [1992] & Johnson, leading to the Smalltalk Refactoring Browser [Roberts 1992]
recently popularized by continuous-refinement methodologies such as “Extreme Programming” [Beck 2000]
catalogues of common refactorings: [Fowler 1999], [Kerievsky 2003]
Fowler describes refactorings as a series of steps to be performed by the programmer
– manual refactoring is very error-prone
– renewed interest in automated refactoring support in IDEs
– refactoring support featured in Eclipse, IntelliJ IDEA, OmniCore, ...

Categories of Refactorings (see Fowler’s book)
making method calls simpler
– Rename Method, Add/Remove Parameter, ...
composing methods
– Extract Method, Inline Method, Inline Local, ...
moving features between objects
– Move Method, Move Field, Extract Class, ...
organizing data
– Self-Encapsulate Field, Replace Data Value with Object, ...
simplifying/eliminating conditionals
– Replace Conditional with Polymorphism, ...
dealing with generalization
– Extract Interface, Pull Up Members, ...
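To make the generalization category concrete, here is a minimal sketch of what code looks like after an Extract Interface step. The classes Shape, Circle, Square and ExtractInterfaceDemo are hypothetical illustrations, not taken from the talk: Shape is assumed to have been extracted from Circle, and a client parameter that only calls area() has been retyped from Circle[] to Shape[].

```java
// Hypothetical result of applying Extract Interface to class Circle.
interface Shape {
    double area();
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    public double area() { return Math.PI * radius * radius; }

    // Not selected for the extracted interface, so it stays Circle-only.
    public double radius() { return radius; }
}

// A second implementation that the generalized client can now accept.
class Square implements Shape {
    private final double side;

    Square(double side) { this.side = side; }

    public double area() { return side * side; }
}

public class ExtractInterfaceDemo {
    // Before the refactoring this parameter was declared as Circle[].
    // It only calls area(), so it can be declared with the interface type.
    static double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Square(2.0) };
        System.out.println(totalArea(shapes));
    }
}
```

The hard part of the refactoring is not creating the interface but deciding which client declarations (here, the parameter of totalArea) may safely be retyped; that decision is what the type-constraint analysis is meant to automate.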
Eclipse (www.eclipse.org)
open-source (CPL) development environment
– implemented in Java, XML
– basis for commercial offerings by IBM (WSAD, WSDD) and others
plugin architecture
– plugins contribute views/perspectives
– plugins provide extension points
state-of-the-art development environment for Java
– quick-fixes, refactoring, type hierarchy view, call hierarchy, search facilities
– support for other languages (C, Smalltalk, AspectJ)
various IBM programs focused on Eclipse
– Eclipse Innovation Grants for academics (2002, 2003)
– Eclipse Technology Exchange meetings (ICSE, OOPSLA, ECOOP)
solid basis for research/education projects
– Penumbra, Gild, Hipikat, ECESIS, ...
– Continuous Testing, Java Traits, Ownership Types, ...

Demo: Eclipse Refactorings

Type Constraints
formalism developed in the 1990s
– captures relationships between the types of program constructs
original purpose: type checking/inference
– prove that certain kinds of errors cannot occur at run-time
– e.g., no “message not understood” errors
we use a variation on the formalism from a book by Palsberg & Schwartzbach
– adapted/extended to capture the semantics of Java

Type Constraints Notation
[E]            the type of expression E
[M]            the declared return type of method M
[F]            the declared type of field F
Decl(M)        the type that contains method M
Param(M,i)     the i-th parameter of method M
≤, <           subtype relation (≤ includes equality; < denotes a proper subtype)

Syntax of Type Constraints
[E] = [E’]     the type of expression E must be the same as the type of expression E’
[E] < [E’]     the type of expression E is a proper subtype of the type of expression E’
[E] ≤ [E’]     either [E] = [E’] or [E] < [E’]
[E] ≜ T        the type of expression E is defined to be T
[E] ≤ [E1] or ... or [E] ≤ [Ek]
               disjunction: at least one of the subconstraints [E] ≤ [E1], ..., [E] ≤ [Ek] must hold

Generating Type Constraints
program construct                          implied type constraint(s)
declaration C v                            [v] ≜ C
assignment E1 = E2                         [E2] ≤ [E1]
access E.f to field F                      [E.f] ≜ [F],  [E] ≤ Decl(F)
return E in method M                       [E] ≤ [M]
method M in class C                        Decl(M) ≜ C
this in method M                           [this] ≜ Decl(M)
direct call E.m(E1,...,En) to method M     [E.m(E1,...,En)] ≜ [M],  [Ei] ≤ [Param(M,i)],  [E] ≤ Decl(M)

Virtual Method Calls
for a call E.m(E1,...,En) to a virtual method M:
[E.m(E1,...,En)] ≜ [M]
[Ei] ≤ [Param(M,i)]
[E] ≤ Decl(M1) or ... or [E] ≤ Decl(Mk)
where RootDefs(M) = { M1, ..., Mk }
RootDefs(M) = { M’ | M overrides M’, and there exists no M’’ (M’’ ≠ M’) such that M’ overrides M’’ }

Constraints for Virtual Method Calls
Hashtable extends Dictionary and implements Map; put() is declared in all three types.

public void foo(String s1, String s2) {
  Hashtable h = new Hashtable();
  h.put(s1, s2);
}

RootDefs(Hashtable.put(...)) = { Map.put(...), Dictionary.put(...) }, so:
[h] ≤ Decl(Map.put(...)) or [h] ≤ Decl(Dictionary.put(...))
i.e., [h] ≤ Map or [h] ≤ Dictionary

Constraints for Overriding & Hiding
if method M’ overrides method M, M’ ≠ M:
[Param(M’,i)] = [Param(M,i)]
[M’] = [M]
Decl(M’) < Decl(M)
if field F’ hides field F:
Decl(F’) < Decl(F)

Casts
for a cast (C)E:
[(C)E] ≜ C
[E] ≤ [(C)E] or [(C)E] ≤ [E]    if C is a class and [E] is a class
the latter constraint need not be generated if C or [E] is an interface
these constraints only capture the requirements for type-correctness (not necessarily program behavior)
it is possible to avoid generating disjunctions by preserving the “directionality” of the cast

Refactoring for Generalization
several refactorings are concerned with generalization
– moving methods/fields to superclasses and subclasses
– splitting & merging of classes
– manipulating the types of declarations
Chapter 11 of Fowler’s book mentions:
– Extract Interface
– Pull Up Member(s)
– Push Down Member(s)
– Extract Subclass
– Generalize Type

Extract Interface: Recipe
select class C
select a subset M of C’s methods
create an interface I containing declarations of the methods in M
add the inheritance relation “C implements I”
“Adjust client type declarations to use the interface” [Fowler, p.342]

Extract Interface: An Example
List class with methods as follows:
– add(Comparable): add an element
– addAll(List): add the contents of another List
– iterator(): iteration support
– sort(): sorts the list
ListIterator class
– implements java.util.Iterator; methods hasNext(), next()
Client class
– creates a List; adds some elements
– adds the contents of another List; sorts the List
– prints the contents of the List
extract an interface Bag from List
– declares add(Comparable), addAll(List), iterator()

List/Bag Example (1)
import java.util.Iterator;

interface Bag {
  public Iterator iterator();
  public List add(Comparable e);
  public List addAll(List v0);
}

class List implements Bag {
  int size = 0;
  Comparable[] elems = new Comparable[10];
  public Iterator iterator() {
    return new ListIterator(this);
  }
  public List add(Comparable e) {
    if (this.size + 1 == this.elems.length) {
      // grow the backing array before it fills up
      Comparable[] newElems = new Comparable[2 * this.size];
      System.arraycopy(this.elems, 0, newElems, 0, this.size);
      this.elems = newElems;
    }
    this.elems[this.size++] = e;
    return this;
  }
  public List addAll(List v1) {
    java.util.Iterator i = v1.iterator();
    for (; i.hasNext(); this.add((Comparable) i.next()));
    return this;
  }
  public void sort() { /* insertion sort */ }
}

List/Bag Example (2)
class ListIterator implements java.util.Iterator {
  private int count = 0;
  private List v2;
  ListIterator(List v3) { v2 = v3; }
  public boolean hasNext() { return this.count < this.v2.size; }
  public Object next() { return this.v2.elems[this.count++]; }
  // required by java.util.Iterator before Java 8
  public void remove() { throw new UnsupportedOperationException(); }
}

public class Client {
  public static void main(String[] args) {
    List v4 = createList();
    populate(v4);
    update(v4);
    sortList(v4);
    print(v4);
  }
  static List createList() { return new List(); }
  static void populate(List v5) { v5.add("foo").add("bar"); }
  static void update(List v6) {
    List v7 = new List().add("zap").add("baz");
    v6.addAll(v7);
  }
  static void sortList(List v8) { v8.sort(); }
  static void print(List v9) {
    for (Iterator iter = v9.iterator(); iter.hasNext();)
      System.out.println("Object: " + iter.next());
  }
}

Problem Statement
identify all declarations that can be updated to make use of the newly extracted interface
want to be able to reason about:
– correctness of the solution
– maximality of the solution

Using Type Constraints
declared types of variables, fields, and parameters are constrained by:
– field accesses, method calls
– assignments, parameter-passing
several other invariants must be maintained to preserve type-correctness & program behavior
Observation: all these constraints can be stated succinctly and uniformly using type constraints

List.add(), Bag.add():           [Bag.add()] = [List.add()]
List.addAll(), Bag.addAll():     [v0] = [v1],  [Bag.addAll()] = [List.addAll()]
List.iterator():                 List ≤ [v3]
List.add():                      List ≤ [List.add()]
List.addAll():                   [v1] ≤ Bag,  List ≤ [List.addAll()]
ListIterator.ListIterator():     [v3] ≤ [v2]
ListIterator.hasNext():          [v2] ≤ List
ListIterator.next():             [v2] ≤ List
Client.main():                   [Client.createList()] ≤ [v4],  [v4] ≤ [v5],  [v4] ≤ [v6],  [v4] ≤ [v8],  [v4] ≤ [v9]
Client.createList():             List ≤ [Client.createList()]
Client.populate():               [v5] ≤ Bag,  [List.add()] ≤ Bag
Client.update():                 [List.add()] ≤ [v7],  [List.add()] ≤ Bag,  [v6] ≤ Bag,  [v7] ≤ [v1]
Client.sortList():               [v8] ≤ List
Client.print():                  [v9] ≤ Bag

Observation
the constraints for the original program contain all the information we need
some declarations cannot be updated:
– List ≤ [v3] ≤ [v2] ≤ List
– [v4] ≤ [v8] ≤ List
other variables are less constrained:
– [v1] ≤ Bag

Algorithm for Determining “Updatable” Declarations
iterative algorithm for determining non-updatable declarations
– first determine the declarations that cannot be updated because of member access (e.g., [v2] ≤ List, [v8] ≤ List)
– if x is non-updatable, and there is a type constraint [y] ≤ [x], [y] = [x], or [y] < [x], then y is non-updatable
– iterate until a fixed point is reached

Non-Updatable Declarations for the Example Program
{ v2, v3, v4, v8, Client.createList() }
(consistent with the earlier observation)

Justification (Details in Paper)
type-correctness
– updating the “updatable” declaration elements results in a program that satisfies all type constraints
preservation of behavior
– argument based on the fact that method dispatch and the behavior of casts/instanceof do not depend on declared types
maximality
– updating any non-updatable declaration will result in the violation of type constraints

Another Refactoring: Pull Up Members
can foo() be moved from B to A?

class A { ... }
class B extends A {
  public B foo() { return this; }
}

[this] ≜ Decl(B.foo())
Decl(B.foo()) ≜ B
[B.foo()] ≜ B
[this] ≤ [B.foo()]

Pull Up Members (2)
class A {
  public B foo() { return this; }
}
class B extends A {
  ...
}

[this] ≜ Decl(A.foo())
Decl(A.foo()) ≜ A
[A.foo()] ≜ B
[this] ≤ [A.foo()]
together these require A ≤ B, which does not hold: the pull-up would make the program type-incorrect

Other Refactorings
Generalize Type
– update the type of a declaration E
– use type constraints to determine allowable supertypes/subtypes
– may enable Pull Up Members in certain cases
Extract Subclass
– splitting of a class
– can be treated similarly to Extract Interface
Push Down Members
– the “inverse” of Pull Up Members
– similar issues

Perspective
infer from the original program a system of ordering constraints between the types of declaration elements
– the original program is just one possible solution
Extract Interface
– declarations: variables
– locations of members: constants
Pull Up Members
– declarations: constants
– locations of members: variables
Generalize Type
– selected declaration: variable
– all other declarations & locations of members: constants

Demo: Extract Interface & Generalize Type

Class Libraries
class libraries improve programmer productivity
– programmers don’t have to waste time developing & debugging standard infrastructure
but... class libraries are often implemented with a typical/average usage pattern in mind
for example, container class implementations assume that:
– elements are accessed frequently
– a large number of elements is stored
performance is lost if the actual usage of a library class differs from this typical usage pattern
– evidence: classes such as “MyHashTable”, “SmartHashtable”, ... in various benchmarks

Our Approach
derive custom versions from library classes
rewrite the application to use these custom versions
ship the custom library classes with the application
technical foundations:
– use type constraints to determine where custom classes can be used
– use profile information to determine where introducing custom classes is profitable
– use static analysis and profile information to decide how to customize

Example Program
class Example {
  void foo(Map m) {
    Hashtable r1 = new Hashtable();
    JTree tree = new JTree(r1);
    Hashtable r2 = new Hashtable();
    Hashtable r3 = new Hashtable();
    r2.put("FOO", "BAR");
    bar(r3);
    r2 = r3;
    r2.putAll(m);
    bar("HELLO");
  }
  void bar(Object o) {
    Hashtable r4 = (Hashtable) o;
    if (r4.contains("FOO")) { … }
  }
}
abbreviations used in the following slides: O = Object, S = String, D = Dictionary, M = Map, H = Hashtable

How to customize?
allocate custom classes H1 and H2 (all other lines of the example program are unchanged):
    H r2 = new H1();
    H r3 = new H2();

How to customize?
update the declarations of r2 and r3 as well:
    H1 r2 = new H1();
    H2 r3 = new H2();

How to customize?
use H1 for both r2 and r3:
    H1 r2 = new H1();
    H1 r3 = new H1();

How to customize?
introduce AH, a common supertype of H1 and H2:
    AH r2 = new H1();
    AH r3 = new H2();
– update allocations of library types
– update declarations

Restrictions?
(same example program as above)
call to javax.swing.JTree(Hashtable): the library code expects a Hashtable, so r1 must remain one
– type correctness
– interface compatibility
– preserve behavior of cast and instanceof operations

Outline of Approach
generate type constraints for the program
– additional constraints are generated to ensure that the behavior of cast/instanceof operations is preserved
constraint simplification
– rewrite/replace all constraints to use “≤” only
solve the resulting constraint system
rewrite the program’s declarations and allocation sites to use the inferred types

Preserving the Behavior of Cast & instanceof
we want to change declarations and allocation sites
– need to ensure that cast/instanceof operations succeed and fail in exactly the same cases as before
– use points-to analysis to approximate the set of objects to which the cast/instanceof is applied
– easily expressed using a ≰ constraint (to be replaced with a ≤ constraint)

public class Example {
  void zip() {
    zap(new Hashtable()); // A1
    zap(new String());    // A2
  }
  void zap(Object o) {
    Hashtable h = (Hashtable) o; // C
  }
}

A1 ≤ C
A2 ≰ C

Type Constraints
constraint variables: a1, a2, a3 for allocation sites, d1 ... d5 for declarations, c1 for the cast

class Example {
  void foo(Map m) {
    d1 r1 = new a1();
    JTree tree = new JTree(r1);
    d2 r2 = new a2();
    d3 r3 = new a3();
    r2.put("FOO", "BAR");
    bar(r3);
    r2 = r3;
    r2.putAll(m);
    bar("HELLO");
  }
  void bar(d4 o) {
    d5 r4 = (c1) o;
    if (r4.contains("FOO")) { … }
  }
}

a1 ≤ d1
d1 ≤ H
a2 ≤ d2
a3 ≤ d3
d2 ≤ D or d2 ≤ M
d3 ≤ d4
d3 ≤ d2
d2 ≤ M
S ≤ d4
c1 ≤ d4 or d4 ≤ c1
c1 ≤ d5
d5 ≤ H or d5 ≤ AH
a3 ≤ c1
S ≰ c1
d5 ≤ H
c1 ≤ H

Constraint Solving
each constraint variable starts with a set of candidate types:
– declarations and the cast (d1, ..., d5, c1): {O, S, H, H1, H2, D, M, AH}
– allocation sites (a1, a2, a3): {O, S, H, H1, H2}
these sets are narrowed using the constraints (e.g., d1 ≤ H, a1 ≤ d1)

Rewriting the Example Program
solution: a1 = H, d1 = H, a2 = H1, d2 = AH, a3 = H2, d3 = AH, d4 = O, c1 = H2, d5 = AH

class Example {
  void foo(Map m) {
    Hashtable r1 = new Hashtable();
    JTree tree = new JTree(r1);
    AH r2 = new H1();
    AH r3 = new H2();
    r2.put("FOO", "BAR");
    bar(r3);
    r2 = r3;
    r2.putAll(m);
    bar("HELLO");
  }
  void bar(Object o) {
    AH r4 = (H2) o;
    if (r4.contains("FOO")) { … }
  }
}

Creating Custom Classes
1. create a custom “profiling” Hashtable
– determine how often allocation sites are executed
– simulate caching schemes
– count the number of succeeding/failing get/put operations
2. static analysis (using the “gnosis” framework developed at IBM)
– construct a call graph (0-CFA, distinct allocation sites for classes of interest)
– compute type estimates
– escape analysis
3. generate custom implementations: H1, H2, …
– generated from a template (using the C preprocessor)
4. rewrite the bytecode for the program

Generating Custom Classes
1. lazy vs. eager allocation
2. synchronized vs. unsynchronized
3. optimizing edge cases
4. caching of frequently accessed objects
5. removal of unused fail-safe iteration code
6. …

Applied Customizations
_202_jess
– specialization of Hashtable keys (String/Integer)
– synchronization removal on frequently used Vectors
_209_db
– use caching to optimize consecutive Vector retrievals
– synchronization removal on frequently used Vectors
_218_jack
– 99% of all search operations are on empty Hashtables
– lazy allocation, removal of bookkeeping for fail-safe iterators
– synchronization removal on Hashtables
Jax
– most containers remain small; decrease the initial container size
HyperJ
– optimization of empty Hashtables, removal of bookkeeping for fail-safe iterators
– synchronization removal
Chess*
– frequent iteration over Hashtables of fixed, small size
– use a smaller initial size
Pmd*
– the vast majority of a huge number of allocated HashSets remain empty
– lazy allocation, removal of bookkeeping for fail-safe iterators
*no synchronization removal because of GUI-related multi-threading in these benchmarks

Speedups
customization of:
– java.util.* containers
– StringBuffers (desynchronization only)
measurements taken on a HyperThreaded Pentium 4 @ 2.8 GHz running Linux 2.4.21

Heap Consumption
significant reduction in heap consumption on _218_jack because of lazy allocation of many Hashtable objects that remain empty

Impact on Application Size
note: the original size of _209_db is only 6 KB
– 15 KB of custom container classes are added
on large benchmarks (>100 KB), the size increase is ≤ 12%

Java Generics
generics (parametric polymorphism) to be introduced in Java 1.5
– classes can have type parameters that have optional bounds
– reduces the need for downcasts

class Hashtable<Key,Value> { ... }
class Tree<Elem extends Comparable<Elem>> { ... }
Hashtable<Integer,String> table = new Hashtable<Integer,String>();
...
String s = table.get(someInteger);

Generic Collections
in most Java applications, the use of Collection classes is the main source of downcasts
the standard libraries for Java 1.5 contain generic versions of existing Collection classes
– Vector<T> instead of Vector
– HashMap<K,V> instead of HashMap
goal: refactor applications that use non-generic collections
– make them use generic collections instead
– use type inference to infer element types
– remove downcasts

Example 1
class A {
  public void foo() {
    Vector v1 = new Vector();
    String s1 = "aaa";
    this.insert(v1, s1);
    String s2 = (String) v1.get(0);
  }
  public void insert(List v2, Object o) {
    v2.add(o);
  }
}

Example 1 (refactored)
class A {
  public void foo() {
    Vector<String> v1 = new Vector<String>();
    String s1 = "aaa";
    this.insert(v1, s1);
    String s2 = v1.get(0); // cast removed
  }
  public void insert(List<String> v2, String o) {
    v2.add(o);
  }
}
– update “collection” declarations
– remove casts
– note the update of the declaration of o (Object to String)

Example 2
public void bar() {
  List v1 = new Vector();
  v1.add(new Float(3.4));
  this.reverse(v1);
  Float f1 = (Float) v1.iterator().next();
}
public void baz() {
  List v2 = new Vector();
  v2.add(new Integer(17));
  this.reverse(v2);
  Integer i1 = (Integer) v2.iterator().next();
}
public void reverse(List v3) {
  for (int t = 0; t < v3.size()/2; t++) {
    Object temp = v3.get(v3.size()-1);
    v3.add(v3.size()-1, v3.get(t));
    v3.add(t, temp);
  }
}

Example 2 (version 1)
public void bar() {
  List<Number> v1 = new Vector<Number>();
  v1.add(new Float(3.4));
  this.reverse(v1);
  Float f1 = (Float) v1.iterator().next();
}
public void baz() {
  List<Number> v2 = new Vector<Number>();
  v2.add(new Integer(17));
  this.reverse(v2);
  Integer i1 = (Integer) v2.iterator().next();
}
public void reverse(List<Number> v3) {
  for (int t = 0; t < v3.size()/2; t++) {
    Number temp = v3.get(v3.size()-1);
    v3.add(v3.size()-1, v3.get(t));
    v3.add(t, temp);
  }
}
– element types are “merged” in reverse()
– the casts in the callers cannot be removed

Example 2 (version 2)
public void bar() {
  List<Float> v1 = new Vector<Float>();
  v1.add(new Float(3.4));
  this.reverse(v1);
  Float f1 = v1.iterator().next();
}
public void baz() {
  List<Integer> v2 = new Vector<Integer>();
  v2.add(new Integer(17));
  this.reverse(v2);
  Integer i1 = v2.iterator().next();
}
public <T> void reverse(List<T> v3) {
  for (int t = 0; t < v3.size()/2; t++) {
    T temp = v3.get(v3.size()-1);
    v3.add(v3.size()-1, v3.get(t));
    v3.add(t, temp);
  }
}
– observation: there is no flow of values between different invocations of reverse()
– introduction of type parameters
– need for context-sensitive analysis

Outline of Approach
context inference
– use a low-cost variation on Agesen’s Cartesian Product Algorithm (CPA) [Agesen:95] for inferring relevant contexts
– simultaneously computes points-to information for expressions and a set of contexts for each method
type inference
– generate type constraints for the program that explicitly encode context information
– solving the type constraints produces element types for declarations and allocations of container class types
source rewriting
– analyze the (element) types inferred for different contexts; introduce a type parameter if necessary

Context Inference
public void bar() {                          // context [●]
  List v1 = new Vector();                    // allocation site L1
  v1.add(new Float(3.4));
  this.reverse(v1);
  Float f1 = (Float) v1.iterator().next();
}
public void baz() {                          // context [●]
  List v2 = new Vector();                    // allocation site L2
  v2.add(new Integer(17));
  this.reverse(v2);
  Integer i1 = (Integer) v2.iterator().next();
}
public void reverse(List v3) {               // contexts [●,L1], [●,L2], [●,Lext]
  for (int t = 0; t < v3.size()/2; t++) {
    Object temp = v3.get(v3.size()-1);
    v3.add(v3.size()-1, v3.get(t));
    v3.add(t, temp);
  }
}
reverse() is analyzed separately for lists allocated at L1, at L2, and at external allocation sites (Lext)

Example Constraints
in bar():
|new Vector()|[●] ≜ Vector<X1>
|new Vector()|[●] ≤ |v1|[●]
|new Float(3.4)|[●] ≜ Float
|new Float(3.4)|[●] ≤ Types[●](v1)
|v1|[●] ≤ |v3|[●,L1]
in baz():
|new Vector()|[●] ≜ Vector<X2>
|new Vector()|[●] ≤ |v2|[●]
|new Integer(17)|[●] ≜ Integer
|new Integer(17)|[●] ≤ Types[●](v2)
|v2|[●] ≤ |v3|[●,L2]
in reverse(), for each context c in { [●,L1], [●,L2], [●,Lext] }:
|v3.get()|[c] ≜ Elem[c](v3)
|v3.get()|[c] ≤ |temp|[c]
|v3.get()|[c] ≤ Elem[c](v3)
|temp|[c] ≤ Elem[c](v3)

Constraint Solving
standard propagation-based solver
– computes a type for each constraint variable |E|
– in cases where multiple types can be chosen for an expression E, a heuristics-based choice is made (a least specific type for container-related expressions, a most specific type for other expressions)
– different types may be computed for the same expression in different contexts (e.g., |E|1 and |E|2)
element types are unified across ≤ constraints
processing type variables
– a type variable is bound by matching it with a concrete set of types
– matching two type variables results in their unification
– type variables may be left unbound (e.g., in incomplete programs)
– use an approximate solution (e.g., element type Object) when processing programs with code like v.add(v)
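The difference between the two versions of Example 2 can be reproduced with a small sketch. It is a simplification, not the solver described above: it assumes the per-context element types of v3 are already known (Float for the L1 context, Integer for the L2 context; the Lext context is left out), it computes a least common superclass by walking getSuperclass() only (interfaces such as Comparable are ignored), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ElementTypeMerge {
    // Least common superclass of a and b, following the superclass chain of a.
    // Interfaces are ignored in this simplification.
    static Class<?> leastCommonSuperclass(Class<?> a, Class<?> b) {
        for (Class<?> c = a; c != null; c = c.getSuperclass()) {
            if (c.isAssignableFrom(b)) {
                return c;
            }
        }
        return Object.class;
    }

    // Context-insensitive choice (version 1): one element type shared by all contexts.
    static Class<?> merged(Map<String, Class<?>> elemPerContext) {
        Class<?> result = null;
        for (Class<?> t : elemPerContext.values()) {
            result = (result == null) ? t : leastCommonSuperclass(result, t);
        }
        return result == null ? Object.class : result;
    }

    // Context-sensitive choice (version 2): a type parameter is needed
    // when different contexts require different element types.
    static boolean needsTypeParameter(Map<String, Class<?>> elemPerContext) {
        return elemPerContext.values().stream().distinct().count() > 1;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> elemOfV3 = new LinkedHashMap<>();
        elemOfV3.put("L1", Float.class);   // list allocated in bar()
        elemOfV3.put("L2", Integer.class); // list allocated in baz()
        System.out.println(merged(elemOfV3).getSimpleName()); // prints Number
        System.out.println(needsTypeParameter(elemOfV3));     // prints true
    }
}
```

Merging the two contexts yields Number, the element type of version 1, and the callers' casts to Float and Integer stay necessary. Because the contexts disagree, version 2 instead gives reverse() its own type parameter T.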
IBM Research public void bar(){ [●] List v1= new Vector(); // L1 Constraint Solving v1.add(new Float(3.4)); Elem[●](v1) = Float this.reverse(v1); Float f1 = (Float) v1.iterator().next(); } public void baz(){ [●] List v2 = new Vector(); // L2 Elem[●](v2) = Integer v2.add(new Integer(17)); this.reverse(v2); Integer i1 = (Integer) v2.iterator().next(); } public void reverse(List v3){ [●,L1], [●,L2], [●,Lext] for (int t=0; t < v3.size()/2; t++){ Object temp = v3.get(v3.size()-1); Elem[●,L1](v3) = Float v3.add(v3.size()-1, v3.get(t)); v3.add(t, temp); Elem[●,L2](v3) = Integer } Elem[●,Lext(v3) = Object } 73 © 2004 IBM Corporation IBM Research public void bar(){ List<Float> v1= new Vector<Float>(); Code v1.add(new Float(3.4)); this.reverse(v1); Float f1 = (Float) v1.iterator().next(); } public void baz(){ List<Integer> v2 = new Vector<Integer>(); v2.add(new Integer(17)); this.reverse(v2); Integer i1 = (Integer) v2.iterator().next(); } public <T> void reverse(List<T> v3){ for (int t=0; t < v3.size()/2; t++){ T temp = v3.get(v3.size()-1); v3.add(v3.size()-1, v3.get(t)); v3.add(t, temp); } } 74 Generation © 2004 IBM Corporation IBM Research Results 75 benchmark LOC #container allocations #container declarations #casts #casts removed %casts removed Hanoi 4028 3 6 20 14 70 JUnit 5317 24 63 54 21 39 JLex 7841 17 45 71 53 75 JavaCup 10598 19 78 502 373 74 Mango1 2808 2 9 2 2 100 Mango2 2808 3 13 4 2 50 Mango3 2808 1 17 10 0 0 © 2004 IBM Corporation IBM Research Demo: Prototype “Genericize” Refactoring 76 © 2004 IBM Corporation IBM Research Outline background type constraints for Java programs – notation and terminology – constraint generation rules applications – generalization-related refactorings (OOPSLA’03) – customization of library classes (ECOOP’04) – refactorings for introducing generics (work in progress) related work conclusions and future work 77 © 2004 IBM Corporation IBM Research Related Work on Customization automatic data structure selection for SETL – see 
[Schonberg et al. ’81]
automatic component selection
– see, e.g., [Hogstedt et al. ’01, Yellin ’03]
– purely profile-based, no static analysis
– all possible component implementations supplied up-front
automatic optimization of data structures in specific domains
– e.g., data structure selection for sparse matrix problems
optimizations applied to specific container classes
– see, e.g., [Beckmann & Wang, Friedman et al. ’01]
– e.g., prefetching, incrementalizing rehash operations
much related work on partial evaluation and program specialization
– see, e.g., [Schultz, Lawall, Consel ’03]

Other Related Work
type inference and type-directed transformation have been used in the translation of large COBOL programs for Y2K compliance [Eidorff et al. ’99, Ramalingam et al. ’99]
informal characterization of type constraints [Opdyke ’92, Seguin ’00, Tokuda & Batory ’01]
detecting overspecific variables [Halloran & Scherlis ’02]
generating proposals for refactoring class hierarchies using concept analysis [Snelting & Tip ’00]
inferring generic types in Java programs [Duggan ’99, Donovan et al. ’04, Von Dincklage & Diwan ’04]

Future Work
in progress: support for migration between functionally equivalent classes
– e.g., from Vector to ArrayList, Hashtable to HashMap
– limitations on migration due to interaction with external code
– application: upgrading of “legacy” applications
variation on Java in which programmers only refer to interface types such as Set, Map, List instead of concrete types such as HashSet, TreeMap, ArrayList
– use customization techniques to select implementation
– similar in spirit to the SETL work at NYU by Paige, Schonberg, et al.
in the 1970s and 1980s
other generics-related refactorings
– select a declaration & change its type into a type parameter

Conclusions
type constraints are a useful tool for supporting refactorings and related program transformations
– checking of preconditions
– determining allowable source-code modifications
– reasoning about program behavior
applications
– refactorings related to generalization
– customization of library classes
– refactorings for introducing generics
– more refactorings in the works
implemented in Eclipse
– Extract Interface, Generalize Type available now
– generics refactorings planned for Eclipse 3.1
– freely available from www.eclipse.org

EXTRA SLIDES

Typical Refactoring Scenario
user proposes a transformation by interacting with GUI/wizards in the IDE
system checks if preconditions are met
system determines necessary/allowable source code updates
system shows before/after “diff” view
user confirms
program works as before

Solving the Constraints
naive approach
– explicitly enumerate all values; each expression’s type is in { C, I }
– for each solution, determine if constraints are satisfied
cost: O(2^n), where n is the number of declarations of type C

Object-Oriented Type Systems
“A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute” [Benjamin C.
Pierce, 2002]
Traditional applications of type systems:
– enhance readability/understandability
– prove/guarantee that certain kinds of run-time errors will not occur during program execution (e.g., “message not understood”)
– foundation for abstractions & language features (e.g., module systems)
– enable optimizations (e.g., replace dynamic dispatch with direct call)

Some Terminology
type: set of objects that share properties (e.g., supported operations)
– in Java, there is a direct correspondence between types and the classes and interfaces in the inheritance hierarchy
static typing: type information is explicit in the source code
– consistency checks can be performed by a compiler (type checking)
– note: some run-time checking may still be needed
type checking: checking certain consistency properties of programs that contain explicit type declarations
– to guarantee the absence of certain run-time errors
– a program that type-checks is (statically) type-correct
type inference
– in dynamically typed languages, types of expressions are inferred from their usage
– also used in statically typed languages for optimization (e.g., analysis may prove certain run-time checks unnecessary)
type constraints
– formalism for expressing relationships between program expressions that must hold in order for a program to be type-correct
– used for type checking as well as for type inference

Observations
cannot update variable e1 because method getName() is called on e1, which is not declared in Billable
cannot update variable e2 because method getAddress() is called on e2, which is not declared in Billable
updating the return type of findEmployee() produces a type mismatch in the assignment to e2
updating the cast produces a type mismatch in the assignment to e1

Observations
– type of v2 must be List, because of field access v2.size
– type of v3 must be List,
because of assignment v2 = v3
– type of v8 must be List, because of call v8.sort()
– type of v4 must be List, because it is passed as an argument to Client.sortList(), implying an assignment v8 = v4
– return type of Client.createList() must be List, because of assignment v4 = Client.createList()
Conclusion:
– v0, v1, v5, v6, v7, v9, and the return types of List.add(), List.addAll(), Bag.add(), Bag.addAll() can be given type Bag

Conclusions & Future Work
customization: a technique for library-level optimizations
– use type constraints to determine where applicable
– use profile information to determine where useful
– use static analysis and profile information to select optimizations
strong results
– speedups up to 76.7% (18.8-24.1% on average)
– heap consumption reduced by up to 45.9% (11.9% on average)
– modest increase in application size (<12% on large applications)
future work:
– apply additional optimizations
– apply to additional library classes
– self-customizing classes
– incorporate into whole-program optimizers
  • e.g., Jax [Tip et al. ’02], IBM WSDD SmartLinker

[figure: Detailed Speedup Results]

[figure: Detailed Heap/Size Results]

Implementation
implemented in Eclipse using existing refactoring framework [Baeumer et al.
01]
– Extract Interface
– Generalize Type
– Pull Up Members
– Push Down Members
determining type constraints is nontrivial for several language features
– arrays
– member types (inner classes)
– exceptions
– overloading

Demonstration of Eclipse Refactoring Support
Basic Stuff:
– texthovers: JavaDoc
– ctrl-hover: Code + HyperLink
– Ctrl-T: hierarchy
– code completion
Rename Class
– remove ugly prefix: JX_RTA -> RTA
Extract Method
– method RTA.process() too long
– extract processCurrentCallSitesWrtProcessedClasses()
– estIterations()
– undo
– estIterations() with next line: two return values
– convert local to field
– estIterations() with next line: OK now
Inline Method
– RTA.moveNewToCurrentClasses()
Inline Local Variable
– inline “callSite” in processCurrentCallSitesWrtProcessedClasses()
Extract Constant
– DONE_ESTIMATE at end of RTA.process()
Pull Up Members

Example
public class Employee {
  public String getName(){ return _name; }
  public String getAddress(){ return _address; }
  public int getRate(){ return _rate; }
  public boolean hasSpecialSkill(){ return _hasSpecialSkill; }
  private int _rate;
  private boolean _hasSpecialSkill;
  private String _name;
  private String _address;
}
public class TimeSheet {
  public double charge(Employee emp, int days){
    int base = emp.getRate() * days;
    if (emp.hasSpecialSkill())
      return base * 1.05;
    else
      return base;
  }
}
Example taken from Fowler’s “Refactoring”, p.342

Example
public interface Billable {
  int getRate();
  boolean hasSpecialSkill();
}
public class Employee implements Billable {
  // contents of this class same as before
}
public class TimeSheet {
  public double charge(Billable emp, int days){
    int base = emp.getRate() * days;
    if (emp.hasSpecialSkill())
      return base * 1.05;
    else
      return base;
  }
}
Example taken from Fowler’s “Refactoring”, p.342

But updating any of these references to
Employee leads to compilation errors...
public class Personnel {
  public static Employee findEmployee(String name) throws NotFoundException {
    for (int t=0; t < employees.size(); t++){
      Employee e1 = (Employee)employees.elementAt(t);
      if (e1.getName().equals(name)) return e1;
    }
    throw new NotFoundException();
  }
  public static String findAddress(String name) throws NotFoundException {
    Employee e2 = findEmployee(name);
    return e2.getAddress();
  }
  private static Vector employees;
}

Context Inference
assume that allocation sites in a program are labeled
– distinct labels L1, ..., Lk for container-related allocation sites
– a single “blob” label ● used for all other allocation sites
– distinct label Lext represents collections created outside the application
for each method m, infer a set of contexts Contexts(m)
– each context represents a set of callers of a method
– identified by a list of labels, one for each parameter; e.g., [L1, L2, ●, ●]
for each expression E that occurs in the body of method m, and for each α ∈ Contexts(m), infer a points-to set Objects_α(E)
– set of labels; e.g., Objects_α(E) = {L1, L2, L9, ●}
compute context-sensitive call graph
– compute, for each pair <call-site, context>, a set of <method, context> pairs
– make conservative assumptions about entry point methods

Context Inference
we assume a given set of entry points
– e.g., all public methods
– to be specified by the user of the refactoring tool
conservative assumptions about objects bound to parameters of entry point methods
– depends on declared type of the parameter
conservative assumptions about calls to external methods for which source code is unavailable
use Class Hierarchy Analysis (CHA) [Grove et al.
95] to approximate behavior of dynamic dispatch
null constants, literals, and primitive values are modeled as objects

Auxiliary Definitions for Context Inference Rules
set of objects assumed to be bound to parameters of entry-point methods:
  ExternalObjects(T) = { Lext }      if T ≤ Collection
                       { ● }         if neither T ≤ Collection nor Collection ≤ T
                       { Lext, ● }   otherwise
construct contexts for call sites that occur in method m, for α ∈ Contexts(m):
  SelectContexts(α, E0,...,Ek) = { [p0,...,pk] | pi ∈ Objects_α(Ei), 0 ≤ i ≤ k }

Some of the Context Inference Rules
T0.m(T1,...,Tn) is an entry point, pi ∈ ExternalObjects(Ti), α = [p0,...,pn], 1 ≤ i ≤ n
  ⟹ α ∈ Contexts(T0.m(T1,...,Tn))   (C1)
     pi ∈ Objects_α(Param(T0.m(T1,...,Tn), i))   (C2)
m contains assignment E1 = E2, α ∈ Contexts(m)
  ⟹ Objects_α(E2) ⊆ Objects_α(E1)   (C3)
m contains call E0 ≡ new T_L(E1,...,En) to constructor m′, T ≤ Collection, α ∈ Contexts(m)
  ⟹ L ∈ Objects_α(E0)   (C4)
m contains call E0 ≡ new T_L(E1,...,En) to constructor m′, T ≰ Collection, α ∈ Contexts(m), α′ ∈ SelectContexts(α, E0,...,En), 0 ≤ i ≤ n
  ⟹ α′ ∈ Contexts(m′)   (C5)
     ● ∈ Objects_α(E0)   (C6)
     Objects_α(Ei) ⊆ Objects_α′(Param(m′, i))   (C7)

Constraint Generation
constraint generation rules similar to those used for generalization-related refactorings
– constraint variables annotated with a subscript that identifies their “containing” context
– additional rules that model the behavior of operations on collections
constraint variable Elem(E) represents the element type of container objects in Objects(E)
– similar: Key(E), Value(E) types for Map-style collections
notation: NewType(T) denotes a parameterized version of type T with a fresh type variable

Some of the Constraint Generation Rules
m contains assignment E1 = E2, α ∈ Contexts(m)
  ⟹ |E2|_α ≤ |E1|_α   (B1)
m contains direct call E ≡ T.n(E1,...,Ek) to method m′, T ≰ Collection, α ∈ Contexts(m), α′ ∈ SelectContexts(α, E1,...,Ek), E′i = Param(m′, i), 1 ≤ i ≤ k
  ⟹ |E|_α = |m′|_α′   (B4)
     |Ei|_α ≤ |E′i|_α′   (B5)
m contains
call E0.add(E1) to method m′, α ∈ Contexts(m), Decl(m′) ≤ Collection
  ⟹ |E1|_α ∈ Types_α(E0)   (B16)
T ∈ Types(E) ⟹ T ≤ Elem(E)
|E1|_α ≤ |E2|_α′   (B24)
Elem_α(E1) = Elem_α′(E2)   (B27)

Constraint Generation for new Expressions
m contains expression E0 ≡ new T(E1,...,Ek) to constructor m′, T ≰ Collection, α ∈ Contexts(m), α′ ∈ SelectContexts(α, E0,...,Ek), E′i = Param(m′, i), 0 ≤ i ≤ k
  ⟹ |E0|_α = T   (B2)
     |Ei|_α ≤ |E′i|_α′   (B3)
m contains expression E0 ≡ new T(E1,...,Ek), T ≤ Collection, α ∈ Contexts(m), T′ = NewType(T)
  ⟹ |E0|_α = T′   (B14)

Code Generation
source code updating for a method m is trivial if there is one context for m, or if the types inferred for the expressions in m are the same in all contexts
if, for a given expression E in method m, different types are computed in different contexts for m, we attempt to introduce a type parameter for E
– need to determine which (if any) other expressions must have the same type as E
– a bound on a type parameter T of method m is needed if expressions of type T are constrained to be of a type X more specific than Object in some context of m
  • use a common upper bound of all such types X
in programs with failing casts, the type constraint system may not have a solution in a given context
– approach: merge all contexts for methods with failing casts, and continue solving (context-insensitive solution)
a down-cast (T)E is redundant if the inferred type for E is a subtype of T
– in all contexts for E
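The last rule can be made concrete with a hedged, runnable Java rendering of the genericized bar/baz/reverse example, with the redundant down-casts dropped. In every context, the inferred element types (Float and Integer) are subtypes of the cast targets, so the (Float) and (Integer) casts can go.

A few choices here are assumptions made to keep the example self-contained:
– the class name GenericReverse
– static methods instead of instance methods
– Float.valueOf/Integer.valueOf instead of the constructors
– reverse swaps elements with List.set, whereas the slides' version calls add, which inserts elements and grows the list on each iteration

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Hypothetical, self-contained rendering of the genericized example.
public class GenericReverse {

    // T replaces Object: the element type is Float, Integer, or Object
    // depending on the calling context.
    public static <T> void reverse(List<T> v3) {
        int n = v3.size();
        for (int t = 0; t < n / 2; t++) {
            T temp = v3.get(n - 1 - t);
            v3.set(n - 1 - t, v3.get(t));
            v3.set(t, temp);
        }
    }

    public static Float bar() {
        List<Float> v1 = new Vector<Float>();
        v1.add(Float.valueOf(3.4f));
        reverse(v1);
        return v1.iterator().next(); // no (Float) cast needed
    }

    public static Integer baz() {
        List<Integer> v2 = new Vector<Integer>();
        v2.add(Integer.valueOf(17));
        reverse(v2);
        return v2.iterator().next(); // no (Integer) cast needed
    }

    public static void main(String[] args) {
        System.out.println(bar()); // 3.4
        System.out.println(baz()); // 17
        List<String> words = new Vector<String>(Arrays.asList("a", "b", "c", "d"));
        reverse(words);
        System.out.println(words); // [d, c, b, a]
    }
}
```

The generic method compiles once and serves all three contexts. Only the call sites in bar and baz change from raw to parameterized types.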