Precise Memory Leak Detection for Java Software Using Container Profiling Guoqing Xu and Atanas Rountev Ohio State University Supported by NSF under CAREER grant CCF-0546040 PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Java Memory Leak Memory leaks still exist in Java !! - Lost objects: unreachable but not freed - Useless objects: reachable but not used again A Java memory leak can cause severe problems - Performance degradation - OutofMemory exception It is difficult to detect such redundant objects - Static analysis may be of limited usefulness - Dynamic analysis is typically the “weapon of choice” 2 PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Related Dynamic Approaches Basic idea: from-symptom-to-cause-diagnosis - Track all objects during the execution - Find suspicious objects - Traverse the run-time object graph to find the cause An orthogonal issue: the definition of symptom - Growing number of instances of a type (LeakBot ECOOP’05, Cork POPL’07) - Staleness (time since last use) of an object (Sleigh ASPLOS’06) Problem: lack of precision PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Why Imprecise Sources of imprecision - One single factor does NOT suffice to serve as leak symptom Growth of #objects may not point to a leak Can we created, consider but the is A JFrame object is never used since combination of not a leak multiple factors? Other factors that may contribute e.g. memory transitively consumed by an object - Hard to find the real cause from arbitrary objects We want a highly-precise report PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Outline Motivation Leak confidence analysis Memory leak detection for Java Experimental evaluation - Experience with real-world memory leaks - Run-time overhead 5 PRESTO: Program Analyses and Software Tools Research Group, Ohio State University A New Perspective Observation: containers are often the leak causes - Many JDK memory leak bugs are caused by misuse of containers Let’s reverse the traditional diagnosis process - Start by suspecting that all containers are leaking, and use symptoms to rule out those less likely to leak - Avoid the effort to search for a cause from arbitrary objects PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Our Proposal Container-centric - Tracking containers - Assign a confidence value to each container, based on the symptoms it shows - Rank containers based on confidence We only find bugs caused by containers at the first and second levels of the tree Other approaches could be used as supplement PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Leak Confidence Analysis Consider the combined effect of multiple factors - Overall memory consumption - Memory taken up by an individual container - Staleness of a container Container abstraction - An ADT with three operations ADD, GET, and REMOVE ADD(n, o):void GET(n):o REMOVE(n, o):void Confidence analysis - Conceptual, and can be implemented in different ways PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Leaking Region Leaking region - A time region in which memory leak symptoms occur - Determined using GC events (1, 2 , …, n) - Defined as [s, e] where the memory consumption at GC events keep growing Decide on e - Offline: the time at which the program ends or OutOfMemory exception is thrown - Online: any time when a user wants to generate a report Determine s - Backward traverse GC events from e PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Leak-free Containers A container is considered to be leak-free - It is in state 0 at time e (#REMOVE = #ADD), or - It is deallocated within the leaking region Deallocation of n is treated as n REMOVE operations Container leak confidence - Memory contribution - Staleness contribution PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Memory Contribution A memory-time-graph is captures a container’s memory footprint in the leaking region - X axis: time relative to e - Y axis: memory consumed by all objects reachable from the container relative to total used memory 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0.4 0.5 0.6 0.7 0.8 0.9 MC () = the area covered by the curve [0, 1] PRESTO: Program Analyses and Software Tools Research Group, Ohio State University 1 Staleness Contribution Time since last use [Bond and McKinley ASPLOS 06] A new definition in terms of and o - SC(o) = (2 - 1)/(2 - 0) ADD (σ, o) τ0 GET (σ, o) REMOVE (σ, o) τ1 τ2 SC() = staleness(o)/ size(), o [0, 1] PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Putting It All Together: Leak Confidence LC = SC X MC1-SC [0, 1] SC is more important than MC Increasing either SC or MC increases LC Properties - MC = 0 and SC[0, 1] SC = 0 and MC[0, 1] SC = 1 and MC[0, 1] MC = 1 and SC[0, 1] LC = LC = LC = LC = 0 0 1 SC PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Memory Leak Detection for Java Modeling of container behavior - A glue class is built for each container class - A bridge method is used to connect a container method with one of the ADD, REMOVE, or GET operations Instrumentation - Soot transformation framework Profiling - JVMTI Data analysis PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Modeling of Containers class HashMap{ Object put(Object key, Object value) {…} Object get(Object key) {…} Object remove(Object key) {…} } class Java_util_HashMap{ static void put_after (int csID, Map receiver, Object key, Object value , Object result) { if (result == null){ … Recorder.v().record(csID, receiver, key, …, Recorder.EFFECT_ADD); } } } Object result = map.put(a,b); Java_util_HashMap.put_after(1234, map, a, b, result); PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Profiling Approximation of MC - An object graph traversal thread is launched periodically to calculate the total of amount of memory consumed by objects reachable from the container object - Precision and overhead tradeoff is defined by the interval between two runs of the thread - Our experience shows 1/50GC is an appropriate value PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Data Analysis Identify leaking region Compute the approximation of MC - MC = MTi X (Ti+1 – Ti) Compute SC - Decompress data - Scan the data and remove data entries outside the leaking region - For each element, find its REMOVE site and its last GET site PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Experience with Detecting Memory Leak Bugs Java AWT/Swing bugs - Sun JDK bug #6209673 – existed in Java 5, fixed in 6 - Sun JDK bug #6559589 – still open in Java 6 SPECjbb bug The generated reports are precise - Top-ranked containers are the actual causes of the bugs - Confidence values for bug-inducing containers and correctly-used containers differ significantly PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Sun JDK Bug #6209673 The bug manifests when switching between two Swing applications According to a developer’s report, it is very hard to track down We instrumented the entire java.awt and javax.swing packages, and the test case that triggered the bug PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Sun JDK Bug #6209673 Container:29781703 type: java.util.HashMap (LC: 0.443, SC: 0.480, MC: 0.855) ---cs: javax.swing.RepaintManager:591 Container:2263554 type: class java.util.LinkedList (LC: 0.145, SC:0.172, MC: 0.814) ---cs: java.awt.DefaultKeyboardFocusManager:738 Container:399262 type: class javax.swing.JPanel (LC: 0.038, SC:0.044, MC: 0.860) ---cs: javax.swing.JComponent:796 Data analyzed in 21593ms PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Sun JDK Bug #6209673 Line 591 of javax.swing.RepaintManager - An GET operation image = (VolatileImage) volatileMap.get(config); - The container that is misused is the volatileMap - This information may be sufficient for a developer to locate the bug Where? - VolatileImage objects are cached in the map - Upon a display mode switch, the old configuration object get invalidated and will not used again - But the images are still maintained PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Overhead Compile-time analysis Dynamic overhead - Sampling rate: 1/15GC, 1/50GC - Initial heap size: default, 512k PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Overhead for Different Sampling Rates bu g2 jf le x xa la n pm d 1/50GC rate jy th on lu in de x lu se ar ch fo p ch ar t an tl r hs ql db 1/15GC rate 350 300 250 200 150 100 50 0 Y-axis: (NewTime-OldTime)/OldTime 1/15GC: 121.2% 1/50GC: 87.5% PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Overhead for Different Init Heap Size xa la n jf le x bu g2 512k heap pm d hs ql db jy th o lu n in d lu ex se ar ch an tl r ch ar t fo p default heap size 400 350 300 250 200 150 100 50 0 Default heap: 177.2% 512K heap: 87.5% PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Conclusions Our approach is container-centric - Tracking all modeled containers - Computing a leak confidence for each container Memory contribution Staleness contribution - Can be used for both online and offline diagnosis Memory leak detection for Java - Compiler assisted transformation - Profiling Future work - Reduce overhead (e.g., selective profiling, JIKES RVM) - Automated detection of containers - Try other models PRESTO: Program Analyses and Software Tools Research Group, Ohio State University