Dynamic Purity Analysis for Java Programs
Haiying Xu, Christopher J.F. Pickett, Clark Verbrugge
School of Computer Science, McGill University
PASTE ’07 Conference, San Diego, CA
Presented by Derek White
CSE 6329

Outline
• Introduction
• Approach and Contributions
• Design: Static Purity Analysis
• Kinds of Dynamic Purity
• Design: Dynamic Purity Analysis
• Memoization
• Experimental Evaluation
• Conclusions

Introduction
• Functional programming emphasizes application of functions and avoids mutable data (side effects)
• Popular functional languages include Scheme, Haskell, F#, OCaml, Scala, etc.
• But you can program in a functional style using other languages
• “Pure” methods are methods that have functional (side effect free) behavior
  – Several definitions for purity, either no externally visible side effects or the extent of side effects is limited
  – Constraints may also be placed on level of dependency on previously available state

Introduction (2)
• Why do we care if a method is pure?
• Helpful in program understanding, allows us to isolate side effect free parts
• Verification in model checking
• Can be used to guide compiler optimization
  – Better method purity info allows for less conservative assumptions
  – Caching (memoization) of function calls

Introduction (3)
• Static analysis has allowed large classifications for pure methods, but there is variation in the precise definitions used
• Static analysis is conservative with respect to runtime behavior
• It is unclear if some classes of pure methods have any practical value
• So, the authors present a detailed examination of method purity for Java
  – Considering several definitions of purity
  – Investigating both static and dynamic properties

Approach and Contributions
• Extending previous work on static analysis, showing that different forms of purity occur at different frequencies in a dynamic environment
• Design and implementation of dynamic purity analysis, online and offline
  – Scalable, handles SPECjvm98 at size 100 “with acceptable overhead”
• Support for multiple purity definitions in order to compare to static purity analysis; also identified pure forms only observable dynamically

Approach and Contributions (2)
• Three metrics for the evaluation of extent of dynamic purity
  – Method, invocation, bytecode
  – These are applied to a static analysis as well as dynamic purity definitions
• Implementation of memoization on JVM, a traditional consumer of purity information
  – Doesn’t achieve any speedup, just a functional test module

Design: Static Analysis
• Previous work has found that a large number of methods have weak purity properties; stronger purity properties result in fewer pure methods
• Static work done here considers strong purity
  – Method is “strongly pure” iff it doesn’t depend on OR change initial state beyond primitive input values
  – Must always return the same result for the same input
• Specifically, the method may not:
  – Read/write heap or static data
  – Synchronize
  – Allocate objects
  – Invoke
    native methods
  – Throw exceptions
  – Invoke any non-pure methods

Design: Static Analysis (2)
• Java class files used as input
• Flow-insensitive analysis done using Soot

Figure 1. Static analysis framework
(Components shown: Class files, Soot, Jimple, Static Analysis, Attribute Generation, Class files + attributes, SableVM, Attribute Parser, Dynamic Metrics, Output)

Design: Static Analysis (3)
• Instructions within a method are scanned; any instruction found to be impure marks the method as impure
• Interprocedural analysis is done next, propagating impurity up from leaves of a CHA-based call graph
• Assumption is made that exceptions do not propagate up the call stack unchecked

Impurity          Instructions
Native code exec  native INVOKE*
Heap access       NEW, NEWARRAY, ANEWARRAY, MULTIANEWARRAY, GETFIELD, PUTFIELD, *ALOAD, *ASTORE
Static access     GETSTATIC, PUTSTATIC
Synchronization   synchronized INVOKE*, synchronized *RETURN, MONITORENTER, MONITOREXIT
Exceptions        ATHROW

Design: Static Analysis (4)
• Easily extended for dynamic evaluation of strong static purity analysis
• Soot writes purity information to class file attributes
• SableVM reads attributes and records:
  – Pure methods reached at runtime
  – Frequency of pure method invocations
  – Percentage of pure bytecode executed by pure methods
• Provides indications about how static results correlate with dynamic runtime behavior

Design: Dynamic Analysis
• Under the static analysis, a method is either determined to be pure for all possible executions or is impure otherwise; this may be too conservative
• Methods that were flagged impure by static analysis may execute only pure control flow at runtime
• Goal of dynamic analysis is to identify pure methods based on runtime behavior, increasing the number of pure methods found

Design: Dynamic Analysis (2)
Figure 2.
Dynamic purity analysis framework

Design: Dynamic Analysis (3)
• Class files read into SableVM, instruction stream is examined for purity
• Purity analysis module uses an online escape analysis tracking writes to locally allocated objects
• Purity information can be used immediately by the VM or written to a file as offline analysis for a later execution
• Offline analysis removes the execution overhead
• Clients of analysis are memoization and metrics used in static analysis
• Four kinds of purity: strong, moderate, weak, once-impure

Kinds of Dynamic Purity: Strong
• Same criteria as strong static purity
• Only executed instructions are considered
• All methods start with unknown status
• Impure method information propagates up the call stack
• As with static, once a method is identified as impure it is conservatively always considered impure

Kinds of Dynamic Purity: Moderate
• Objects can be created and altered as long as the objects do not escape the method execution context
• A method may call an impure method as long as the impurity is contained
• Must not change behavior based on heap or global state; behavior is based completely on primitive input arguments
• Methods still cannot:
  – Invoke native methods
  – Read/write existing heap or static objects
  – Perform monitor operations
  – Throw exceptions
  – Call moderately impure methods, unless modified data belongs to and is contained in the caller
• Native System.arraycopy() and Object.clone() treated as heap access and allocation instructions

Kinds of Dynamic Purity: Moderate (2)
• Analysis needs to take a closer look at *NEW*, GETFIELD, PUTFIELD, *ALOAD, *ASTORE
• *NEW* instructions used to determine object locality
  – Objects of a method are local if they do not escape the method, or if they escape from a callee
  – Frames in the call stack have an object table storing all currently local objects
• PUTFIELD can allow objects local to the callee to escape to the caller (requires an update to the object table)
• GETFIELD,
  PUTFIELD, *ALOAD, *ASTORE can be classified depending on a frame’s object table
• Moderately pure methods can only use object parameters for reference comparisons

Kinds of Dynamic Purity: Weak
• Allows heap reads so a method can inspect object parameters
• Maintains the property that the method is a function of its input
• GETFIELD is always safe
• PUTFIELD is still considered in the context of the escape analysis

Kinds of Dynamic Purity: Once-Impure
• Observed that some impure methods became weakly pure after a first invocation
• Once-impure is a weakly pure method that was impure during its first execution

Memoization: Optimization with Purity
• All forms of purity mentioned previously ensure that there is a unique result for any given input
• All are candidates for memoization
• Memoization caches the argument to return value mapping, allowing the VM to bypass repeated execution of a method with the same arguments
• Benefit from jumping past execution must outweigh cost of looking up the return value in cache

Memoization (2)
• Method must be long enough to be worth optimizing
• After the first invocation, arguments are hashed together, looked up in a hash table, and the stored return value is substituted for invocation
• Primitive args stored directly, reference args are flattened (gathering type and primitive fields)
  – Done so that garbage collection doesn’t invalidate memo tables
• Direct object reference comparisons cannot be safely memoized, so ACMP_* bytecodes must be considered impure
• Upper bounds on memory consumption limit the number of method invocations that can be cached

Experimental Evaluation
• Experiments conducted using programs from SPEC JVM98 benchmark
• Metrics
  – Static method purity: percentage of all methods in the call graph that are pure
  – Dynamic method purity: percentage of methods reached at runtime that are pure
  – Dynamic invocation purity: percentage of method invocations that are pure
  – Dynamic bytecode purity: percentage of executed
    bytecode stream belonging to pure methods

Experimental Evaluation: Static
• Experimental analysis includes both application and class library code used
• On average, 13% of methods are found to be strongly pure
• Not all methods are invoked at runtime; dynamically it is found that 5-6% of reached methods are statically identified as pure
• Many of these methods are small (20 instructions or fewer) or are executed infrequently

Table 2. Strong Static Purity: Static methods row shows percentage of all methods in the call graph identified as statically pure. Dynamic methods row shows percentage of all dynamic method invocations that execute a statically pure method. Bytecode row shows the percentage of the bytecode stream that is executed by a statically pure method

Experimental Evaluation: Dynamic
• Strong dynamic purity is weaker than the static equivalent
• The first rows of Tables 3, 4, and 5 show an improvement over the runtime use of strong static purity in rows 2-4 of Table 2
• Table 3 shows up to 4% more pure methods reached with strong dynamic purity
• Some methods are invoked with significant frequency; Table 4 shows 13% more pure invocations for db

Experimental Evaluation: Dynamic (2)
Table 3. Dynamic method purity: All reached methods
Table 4. Dynamic invocation purity: Invoked methods that are pure for dynamic purity definitions
Table 5. Dynamic bytecode purity: Bytecode instruction streams that are pure for dynamic purity definitions

Experimental Evaluation: Dynamic (3)
• Reasons for impurity
Table 8. Reasons for dynamic impurity

Experimental Evaluation: Memoization
• Once-impure dynamic purity analysis is used; a method is always invoked once prior to memoization
• Only applied to methods meeting cost-effectiveness criteria
Table 11. Memoized/memoizable methods: Minimum method size setting shown in far left column

Experimental Evaluation: Execution
Figure 3.
Execution times: Minimum method size for memoization is set to 50

Conclusions
• Dynamic purity analyses identify considerable amounts of purity
• Actual program behavior is not predictable based only on static observations
• Little variation in purity over the benchmark suite
• May be the case that memoization is of limited use for non-functional languages

Questions
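
Illustrative sketch: purity kinds and memoization
As a rough illustration of the purity kinds and the memoization scheme described in the slides, the Java sketch below shows one hand-written method per category and a bounded memo table keyed on flattened arguments. It is a source-level approximation under stated assumptions: the analysis in the slides works on bytecode inside the VM, and every name here (PuritySketch, Point, MAX_ENTRIES, memoizedCoordinateSum, and so on) is hypothetical rather than taken from the paper or SableVM.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; nothing here is taken from the paper or SableVM.
final class PuritySketch {
    static int counter = 0;               // static state used by the impure example
    static int bodyExecutions = 0;        // demo bookkeeping only
    static final int MAX_ENTRIES = 1024;  // assumed cap on memo table size
    static final Map<List<Object>, Integer> memo = new HashMap<>();

    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Strong: depends only on primitive inputs; no heap access, static access, or allocation.
    static int sumOfSquares(int a, int b) {
        return a * a + b * b;
    }

    // Moderate: allocates and writes an array (NEWARRAY, *ASTORE), but the array never escapes.
    static int digitSum(int n) {
        int[] digits = new int[10];       // an int has at most 10 decimal digits
        int count = 0;
        for (int v = (n < 0 ? -n : n); v > 0; v /= 10) {
            digits[count++] = v % 10;
        }
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += digits[i];
        }
        return sum;
    }

    // Weak: reads fields of an object parameter (GETFIELD) without writing anything.
    static int coordinateSum(Point p) {
        return p.x + p.y;
    }

    // Impure: reads and writes static data (GETSTATIC, PUTSTATIC).
    static int nextId() {
        counter = counter + 1;
        return counter;
    }

    // Memoized call of the weakly pure method. The reference argument is flattened
    // into its type name and primitive fields, so the key holds no reference to p.
    static int memoizedCoordinateSum(Point p) {
        List<Object> key = Arrays.<Object>asList(p.getClass().getName(), p.x, p.y);
        Integer cached = memo.get(key);
        if (cached != null) {
            return cached;                // cache hit: skip the method body
        }
        bodyExecutions++;
        int result = coordinateSum(p);
        if (memo.size() < MAX_ENTRIES) {  // respect the assumed memory bound
            memo.put(key, result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3, 4));
        System.out.println(digitSum(1234));
        System.out.println(memoizedCoordinateSum(new Point(2, 5)));
        System.out.println(memoizedCoordinateSum(new Point(2, 5))); // served from the memo table
        System.out.println("body executions: " + bodyExecutions);
    }
}
```

Two distinct Point objects with equal fields share one memo entry because the key is built from flattened values rather than object identity. For a method this small, the lookup probably costs more than the body it skips, which is the reason the slides restrict memoization to methods above a minimum size.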