An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages

John Whaley, Monica S. Lam
Computer Systems Laboratory, Stanford University
SAS 2002, September 18, 2002

Slide 2: Background
- Andersen's points-to analysis for C (1994)
  - Flow-insensitive, context-insensitive
  - Inclusion-based, more accurate than unification-based Steensgaard
  - O(n^3), considered too slow to be practical
- CLA optimization to Andersen's analysis (Heintze & Tardieu, PLDI'01)
  - Online caching/cycle elimination
  - Field-independent: 1.3M lines of code in 137s

Slide 3: Doing it for Java
- We want Andersen-level pointers for Java
- Naïve port of CLA algorithm:
  - Spec "compress" benchmark: 2+ hours!
  - Call graph accuracy: same as RTA (terrible)
- Our paper: how to do CLA for Java
  - Spec "compress" benchmark: 5 seconds!
  - JEdit (1371 classes): ~10 minutes!
  - Call graph accuracy: very good

Slide 4: Java vs. C: Virtual calls
- Java has many virtual calls
  - Accuracy of analysis strongly affects number of call targets
  - More call targets lead to more code being analyzed and longer analysis times

Slide 5: Java vs. C: Treatment of Fields
- Field-independent: in o.f, use only o
  - Most C pointer analyses
  - Sound even for non-type-safe languages
- Field-based: in o.f, use only f
  - Very inaccurate, requires type safety
- Field-sensitive: in o.f, use both o, f
  - Strictly more accurate than field-independent or field-based
  - Essential for Java

Slide 6: Java vs. C: Local variables
- Local variables/stack locations are reused
- Flow insensitivity causes many false aliases
- Local flow sensitivity is necessary

Slide 7: Our Contribution
- Andersen-style inclusion-based points-to analysis for Java, based on ideas from CLA
- Field sensitivity
  - Tracks separate fields of separate objects
- Uses "method summary graphs"
  - Sparse representation, uses local flow sensitivity
- Optimizations
  - Caching across iterations, reducing redundant ops
- Supports all features of Java

Slide 8: Algorithm Overview
- Intraprocedural: generate a sparse, flow-insensitive summary graph for each method
  - Based on access paths, uses local flow sensitivity
- Interprocedural: using summary graphs, build inclusion graph to obtain whole-program result

Slide 9: Method Summaries
- Sparse, flow-insensitive summary of the semantics of each method
  - Stores (writes) in method
  - Calls made by method and their parameters
  - Return values, thrown and caught exceptions
- Use a flow-sensitive technique to generate method summaries
  - Precisely model updates to stack and locals

Slide 10: Method Summary: Example
Code for method foo:

    static void foo(C x, C y) {
        C t = x.f;
        t.g = y;
        x.g = x;
        t.bar(y);
    }

Summary for method foo:
[Figure: summary graph with nodes x, x.f and y, edges labeled f and g, and the call bar(t,y). Legend: read edge, write edge, parameter map edge.]

Slide 11: Node types
A node represents an object at run time.
- Concrete type nodes
  - Objects that have a known concrete type
  - new statements and constant objects
- Abstract nodes
  - Parameters, return values, dereferences
  - Interprocedural phase maps an abstract node to the set of concrete nodes it can represent

Slide 12: Edge types
- Read edge (labeled with a field f):
  - Created by load statements
  - Represents dereferences (access paths) of known locations
- Write edge (labeled with a field f):
  - Created by store statements
  - Represents references created by the method

Slide 13: Outgoing parameter map
- Records which nodes are passed as which parameters
- This is used in the interprocedural phase to match call sites to call targets
[Figure: the summary graph for foo (nodes x, x.f, y; edges f and g) with the call t.bar(y).]

Slide 14: Generating method summary
- Worklist data flow solver (flow-sensitive)
- Strong updates on locals, weak on others
- Detect and close cycles in access paths
- More detail in the paper

Slide 15: Review: Andersen's Points-to
- Points-to is encoded as inclusion relations
  - x = y implies x ⊇ y
  - x ⊇ y is also drawn as an edge between x and y

Slide 16: Review: Andersen's Points-to

    Rule name            If code contains   Apply rule
    Store                x.f = e;           x ⊇ new_y  implies  new_y.f ⊇ e
    Load                 e = x.f;           x ⊇ new_y  implies  e ⊇ new_y.f
    Copy                 e1 = e2;           e1 ⊇ e2
    Transitive closure                      e1 ⊇ e2, e2 ⊇ e3  implies  e1 ⊇ e3

Slides 17-24: Andersen example
Code: t = x.f; t.g = y; x.g = x;
[Figure, built up step by step: abstract nodes x, x.f and y; concrete nodes C, D and E; edges labeled f and g. Slides 19-20 apply the Load rule (e = x.f); slides 21-24 apply the Store rule (x.f = e).]

Slides 25-27: Mapping method calls
Code: t = x.f; t.g = y; x.g = x; t.bar(y);
[Figure: the same graph, with the call t.bar(y) mapped to the callee nodes "Bar: this" and "Bar: p1".]

Slide 28: Overall Picture
[Figure: an "Abstract" world and a "Concrete" world, with nodes labeled C, D, E and F.]

Slide 29: Graph-based Andersen
- Computing full transitive closure is prohibitively expensive
- Store the graph in pre-transitive form, and calculate reachable nodes on demand

Slide 30: Algorithm

    foreach write edge e1 → e2 (field f) do
        foreach n in getConcreteNodes(e1)
            add write edge n.f → e2
    foreach read edge e1 → e2 (field f) do
        foreach n in getConcreteNodes(e1)
            add inclusion edge e2 ⊇ n.f
    foreach method call e1.f() do
        foreach n in getConcreteNodes(e1)
            add parameter mappings for target method

Slide 31: Caching reachability queries
- getConcreteNodes(e): transitive closure query on the inclusion graph
- The same queries are repeated many times
- Store the result in a hash table
  - Cached result may be stale due to edges added since the last query
  - Iterate until convergence

Slide 32: Online cycle detection
- Inclusion graph includes cycles
- The algorithm collapses cycles as they are traversed
  - During traversal, keeps track of current path
  - If a node on current path is revisited, collapse all nodes in cycle
  - Each node has a "skip" pointer, which is set when collapsed and followed on all accesses

Slide 33: Reusing caches
- Concrete node cache values don't change much between algorithm iterations
- Reallocating and rebuilding them is expensive
- Reuse caches from old iterations
- Keep track of an iteration 'version' number for each cache entry

Slide 34: Minimizing set union operations
- Many caches don't change across iterations
- Avoid set union operations for caches that haven't changed since the last iteration
  - Keep a 'changed' flag for each cache entry, recording whether the last computation changed the entry
  - If input set hasn't changed, set union operation is redundant

Slide 35: Experimental Results
- Concrete type inference
- Static call graph
- Implemented in ~800 lines of Java
- Freely available at: http://joeq.sourceforge.net

Slide 36: Programs
- SpecJVM
  - Standard benchmark suite
- J2EE (Java 2 Enterprise Edition v1.3)
  - Massive (1+ million lines) business framework
- joeq
  - Compiler infrastructure, 75K lines
- Cloudscape
  - Database shipped with J2EE, no source code
- JEdit
  - Full-featured editor, 100K lines

Slide 37: Experimental Results
- We analyzed the reachable code for each application
  - Results include code in class library
  - Analysis was very effective in reducing total program size
- Pentium 4 2GHz, 2GB RAM, Redhat 7.2
- Sun JDK 1.3.1_01 with 512MB heap

Slide 38: Analysis Precision vs. RTA
[Bar chart: average targets per call site (y-axis, 0 to 3) for RTA and Points-to. Benchmarks: check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace, admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit.]

Slide 39: Analysis time: Small benchmarks
[Bar chart: seconds (y-axis, 0 to 80), No opt vs. Opt. Benchmarks: check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace.]

Slide 40: Analysis time: Large benchmarks
[Bar chart: seconds (y-axis, 0 to 2000), No opt vs. Opt. Benchmarks: admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit.]

Slide 41: Analysis time (speedup)
[Bar chart: times speedup of Opt (y-axis, 0 to 20) for all benchmarks listed on Slide 38.]

Slide 42: Analysis time (bytecodes/second)
[Bar chart: bytecodes per second (y-axis, 0 to 20000) for all benchmarks listed on Slide 38.]

Slide 43: Related Work
- Original CLA paper
  - Heintze and Tardieu (PLDI 2001)
- Andersen's analysis for Java
  - Rountev, Milanova, Ryder (OOPSLA 2001)
  - Liang, Pennings, Harrold (PASTE 2001)
  - Many others…
- Concrete type inference
  - CHA, RTA
  - Flow and context sensitivity, 0-CFA

Slide 44: Conclusion
- Improved precision
  - Field sensitivity
  - Local flow sensitivity
- Improved efficiency
  - Reuse reachability cache across iterations
  - Minimize set-union operations
- Scales to the largest Java programs
- A new baseline for Java pointers
  - No reason to use a less precise analysis
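
The interprocedural loop from the "Algorithm" and "Caching reachability queries" slides can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the authors' joeq implementation: the class name PreTransitiveSketch, the string node names, the `includes` map, and the choice to clear the cache at the start of every pass (instead of reusing it with version numbers) are all hypothetical simplifications, and method calls with their parameter mappings are left out. The example graph follows the "Andersen example" slides: x may refer to concrete node C, y to E, and C.f to D.

```java
import java.util.*;

class PreTransitiveSketch {
    /** A summary edge "src --field--> dst"; used for both read and write edges. */
    static final class Edge {
        final String src, field, dst;
        Edge(String src, String field, String dst) {
            this.src = src; this.field = field; this.dst = dst;
        }
    }

    final Set<String> concrete = new HashSet<>();               // concrete type nodes
    final Map<String, Set<String>> includes = new HashMap<>();  // a -> {b : a includes b}
    final Map<String, Set<String>> cache = new HashMap<>();     // memoized query results
    final List<Edge> reads = new ArrayList<>();
    final List<Edge> writes = new ArrayList<>();

    /** Records "from includes to"; returns true if the edge is new. */
    boolean addInclusion(String from, String to) {
        return includes.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    /** On-demand query: concrete nodes reachable from e over inclusion edges. */
    Set<String> getConcreteNodes(String e) {
        Set<String> cached = cache.get(e);
        if (cached != null) return cached;      // may be stale within a pass
        Set<String> result = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(e);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (!visited.add(n)) continue;      // already seen; also guards against cycles
            if (concrete.contains(n)) { result.add(n); continue; }
            for (String m : includes.getOrDefault(n, Collections.<String>emptySet())) {
                work.push(m);
            }
        }
        cache.put(e, result);
        return result;
    }

    /** Repeats full passes until a pass adds no new inclusion edge. */
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            cache.clear();                      // assumption: fresh cache per pass
            for (Edge w : writes)
                for (String n : getConcreteNodes(w.src))
                    changed |= addInclusion(n + "." + w.field, w.dst);  // n.f includes e2
            for (Edge r : reads)
                for (String n : getConcreteNodes(r.src))
                    changed |= addInclusion(r.dst, n + "." + r.field);  // e2 includes n.f
        }
    }

    /** Summary of foo (without the call) plus the concrete nodes from the example slides. */
    static PreTransitiveSketch fooExample() {
        PreTransitiveSketch s = new PreTransitiveSketch();
        s.concrete.addAll(Arrays.asList("C", "D", "E"));
        s.addInclusion("x", "C");
        s.addInclusion("y", "E");
        s.addInclusion("C.f", "D");
        s.reads.add(new Edge("x", "f", "x.f"));    // t = x.f
        s.writes.add(new Edge("x.f", "g", "y"));   // t.g = y
        s.writes.add(new Edge("x", "g", "x"));     // x.g = x
        return s;
    }

    public static void main(String[] args) {
        PreTransitiveSketch s = fooExample();
        s.solve();
        System.out.println("x.f -> " + s.getConcreteNodes("x.f"));  // [D]
        System.out.println("D.g -> " + s.getConcreteNodes("D.g"));  // [E]
        System.out.println("C.g -> " + s.getConcreteNodes("C.g"));  // [C]
    }
}
```

Because a pass that adds no edge leaves the graph unchanged, every cached answer is exact once the loop exits; stale answers can only occur in earlier passes, which is why the loop iterates to convergence.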

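
The "Online cycle detection" slide describes collapsing cycles during traversal by tracking the current path and giving each node a "skip" pointer. The Java sketch below illustrates that idea only; it is not taken from the paper or from joeq. The class name CycleCollapseSketch, the integer node ids, and the helper names `find`, `visit` and `collapse` are assumptions made for the example, and the path lookup uses a linear `indexOf` for brevity.

```java
import java.util.*;

class CycleCollapseSketch {
    final int[] skip;                                   // skip[n] == n: n is a representative
    final List<Set<Integer>> succ = new ArrayList<>();  // inclusion edges per node

    CycleCollapseSketch(int nodes) {
        skip = new int[nodes];
        for (int i = 0; i < nodes; i++) {
            skip[i] = i;
            succ.add(new HashSet<Integer>());
        }
    }

    void addEdge(int from, int to) {
        succ.get(find(from)).add(to);
    }

    /** Follows skip pointers to the representative, shortening the chain on the way. */
    int find(int n) {
        while (skip[n] != n) {
            skip[n] = skip[skip[n]];
            n = skip[n];
        }
        return n;
    }

    /** Representatives reachable from start; cycles met on the way are collapsed. */
    Set<Integer> reachable(int start) {
        Set<Integer> visited = new HashSet<>();
        visit(find(start), visited, new ArrayList<Integer>());
        Set<Integer> reps = new TreeSet<>();
        for (int v : visited) reps.add(find(v));
        return reps;
    }

    private void visit(int n, Set<Integer> visited, List<Integer> path) {
        visited.add(n);
        path.add(n);
        // Iterate over a snapshot, because collapsing edits the successor sets.
        for (int t : new ArrayList<Integer>(succ.get(n))) {
            int r = find(t);
            int idx = path.indexOf(r);
            if (idx >= 0) {
                collapse(r, path, idx);         // r is on the current path: a cycle
            } else if (!visited.contains(r)) {
                visit(r, visited, path);
            }
        }
        path.remove(path.size() - 1);
    }

    /** Merges every path node after position idx into rep and sets its skip pointer. */
    private void collapse(int rep, List<Integer> path, int idx) {
        for (int i = idx + 1; i < path.size(); i++) {
            int c = find(path.get(i));
            if (c == rep) continue;             // already part of rep
            skip[c] = rep;
            succ.get(rep).addAll(succ.get(c));
            succ.get(c).clear();
        }
    }

    public static void main(String[] args) {
        CycleCollapseSketch g = new CycleCollapseSketch(4);
        g.addEdge(0, 1); g.addEdge(1, 2); g.addEdge(2, 0);  // cycle 0 -> 1 -> 2 -> 0
        g.addEdge(2, 3);
        System.out.println(g.reachable(0));                 // [0, 3]
        System.out.println(g.find(1) + " " + g.find(2));    // 0 0
    }
}
```

The earliest node on the path is kept as the representative, so every later access to a collapsed node resolves through `find` to a node that is still stored on the path under its own id.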