Efficient Checkpointing of Java Software using Context-Sensitive Capture and Replay
Guoqing Xu, Atanas Rountev, Yan Tang, Feng Qin
Ohio State University
ESEC/FSE 07
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University

Outline
Motivation
  - Challenges for checkpointing/replaying Java software
  - Summary of our approach
Contributions
  - Static analyses
  - Multiple execution regions
  - Experimental evaluation
Conclusions

Motivation
Checkpointing/replaying has been used for a variety of purposes at system level
  - Originally designed to support fault tolerance
  - Debugging of OS and of parallel and distributed software
Checkpointing can benefit a number of software engineering tasks
  - Reduce the cost of manual debugging and testing
  - Support for automated techniques for debugging and testing: e.g., dynamic slicing and delta-debugging
  - Inspired by both system-level checkpointing [Pan-PDD88, Dunlap-OSDI02, King-USENIX05] and "saving-and-restoring" software engineering techniques [Saff-ASE05, Orso-WODA05, Orso-WODA06, Elbaum-FSE06]

Challenges
Ease of use and deployment
  - Application-level checkpointing: no JVM/runtime support, just code analysis and instrumentation
  - Challenge: no direct access to the call stack; no control over thread scheduling or external resources (files, etc.)
Reduce the size of the recorded state
  - Dumping the entire heap may be prohibitively expensive, especially for large programs
  - Challenge: static analyses to prune redundant state
Static and dynamic overhead
  - Static analysis cost is amortized over multiple runs
  - Approach is intended for long-running applications

Summary of Our Approach
Tool input: program + checkpoint definition
Performs static analyses and code instrumentation
Tool output: two program versions
First, an augmented checkpointing version is executed once to record (parts of) the run-time program states
  - At the checkpoint: heap objects, static fields, locals
  - At certain points along the call chain leading to the checkpoint
Next, a pruned replaying version is executed multiple times
  - Restore variables saved at the checkpoint
  - Restore variables saved at points along the call chain
How do we resume execution from the checkpoint?
  - Step 1: control flow quickly reaches the checkpoint
  - Step 2: recover state at the checkpoint
  - Step 3: incrementally recover state after call sites along the call chain leading to the checkpoint

Definitions
Crosscut call chain (CC-chain)
  - A programmer-specified call chain that leads to the method that contains the checkpoint
  - E.g. main(44) -> run(28)
Decision points
  - A call site on the CC-chain (e.g. m.run), because polymorphism makes the call target depend on the receiver's run-time type
  - A predicate on which a decision point or the checkpoint is control-dependent
At a decision point, the checkpointing version records the control-flow outcome
The replaying version uses this information to force the control flow to reach the checkpoint

Replaying, Step 1: Recover the Call Stack
Predicate decision point: recover the boolean value
Call site decision point o.m(a1, …, an)
  - Recover the run-time type of the receiver object; the receiver is instantiated during replaying using sun.misc.Unsafe

Checkpointing Version
(lines marked => are inserted by the instrumentation; DP = decision point)

    static void main(String[] args) {
        Main m = new Main();
        boolean b = args.length != 0;
        => save(b);
        if (b) { // DP
            => save(type_of(m));
            m.run(args); // DP
        }
        ...
    }

    void run(String[] args) {
        processCmdLine(args);
        loadNecessaryClasses();
        Set wp_packs = getWpacks();
        Set body_packs = getBpacks();
        boolean b = Options.v().whole_jimple();
        => save(b);
        if (b) { // DP
            getPack("cg").apply();
            // --- checkpoint ---
            => save(…);
            getPack("wjtp").apply();
            getPack("wjop").apply();
            getPack("wjap").apply();
        }
        retrieveAllBodies();
        …
    }
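The pairing of recording and replaying at decision points can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the tool's instrumentation: `DecisionLog`, its methods, and the `StringBuilder` stand-in receiver are hypothetical names, and the log is kept in memory rather than written to disk. Instantiating the receiver (done with sun.misc.Unsafe in the approach described above) is left out; the sketch only recovers the recorded type name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical in-memory log of decision-point outcomes (illustration only). */
final class DecisionLog {
    private final Deque<Object> entries = new ArrayDeque<>();

    /** Checkpointing run: record a predicate outcome and pass it through. */
    boolean save(boolean outcome) {
        entries.addLast(outcome);
        return outcome;
    }

    /** Checkpointing run: record the run-time type of a call-site receiver. */
    void saveType(Object receiver) {
        entries.addLast(receiver.getClass().getName());
    }

    /** Replaying run: next recorded predicate outcome, in recording order. */
    boolean readBoolean() {
        return (Boolean) entries.removeFirst();
    }

    /** Replaying run: next recorded receiver type name, in recording order. */
    String readType() {
        return (String) entries.removeFirst();
    }
}

final class DecisionLogDemo {
    /** Plays the role of the checkpointing version of a main-like method. */
    static DecisionLog record(String[] args) {
        DecisionLog log = new DecisionLog();
        Object m = new StringBuilder();             // stand-in for the receiver object
        boolean b = log.save(args.length != 0);     // predicate decision point
        if (b) {
            log.saveType(m);                        // call-site decision point
        }
        return log;
    }

    /** Plays the role of the replaying version: no command-line input is consulted. */
    static String replay(DecisionLog log) {
        boolean b = log.readBoolean();              // forced branch outcome
        return b ? log.readType() : "not taken";    // type the replay would instantiate
    }

    public static void main(String[] unused) {
        DecisionLog log = record(new String[] {"input.txt"});
        System.out.println(replay(log));            // prints java.lang.StringBuilder
    }
}
```

Reading the entries back in the same order they were saved is what lets the replay follow the recorded path without re-evaluating the original predicates.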

Replaying Version

    static void main(String[] args) {
        Main m = new Main();
        boolean b = args.length != 0;
        => read(b);
        if (b) { // DP
            => read(type_of(m));
            => unsafe.allocate(m);
            => args = null;
            m.run(args); // DP
        }
    }

    void run(String[] args) {
        processCmdLine(args);
        loadNecessaryClasses();
        Set wp_packs = getWpacks();
        Set body_packs = getBpacks();
        boolean b = Options.v().whole_jimple();
        => read(b);
        if (b) { // DP
            getPack("cg").apply();
            // --- checkpoint ---
            => read(…);
            getPack("wjtp").apply();
            getPack("wjop").apply();
            getPack("wjap").apply();
        }
        retrieveAllBodies();
        …
    }

Step 2: Recover at the Checkpoint
Our static analysis selects locals for recording (for checkpointing) / recovering (for replaying) when
  - They are written before the checkpoint
  - They are read after the checkpoint
Record primitive-typed values or entire object graphs on the heap (all reachable objects)
Static fields are selected based on the same idea

    void run(String[] args) {
        processCmdLine(args);
        loadNecessaryClasses();
        Set wp_packs = getWpacks();
        Set body_packs = getBpacks();
        if (Options.v().whole_jimple()) {
            getPack("cg").apply();
            // --- checkpoint ---
            getPack("wjtp").apply();
            getPack("wjop").apply();
            getPack("wjap").apply();
        }
        retrieveAllBodies();
        for (Iterator i = body_packs.iterator(); i.hasNext();) {
            …
        }
        …
    }

In this example, body_packs is written before the checkpoint and read after it (in the loop), so it meets both conditions.

Selection of Static Fields
A whole-program Mod/Use analysis
  - A static field is "written" if its value is changed, or any heap object reachable from it is mutated
  - A static field is "read" if its value is directly read
Analysis algorithm
  - Context-sensitive and flow-insensitive; uses the points-to solution and the call graph from Spark [Lhotak-CC03]
  - Bottom-up traversal of the SCC-DAG of the call graph
  - For each method m, a set Cm is maintained to contain all objects from which a mutated object can be reached
  - Propagate backwards the objects in Cm that escape a callee method to its callers
  - Select a static field fld if PointsToSet(fld) ∩ Cm ≠ ∅

Step 3: Recover after the Checkpoint
Replaying only at decision points and the checkpoint is not enough to guarantee correct execution after the checkpoint
Additionally record/recover local variables that will be read after each call site in the CC-chain

    void main() {
        Set hs = new HashSet();
        B b = new B(hs);
        //-- reco/rest (type_of(b))
        b.m();
        //-- extra reco/rest (hs)
        if (hs == b.s) { … }
    }

    class B {
        Set s;
        void m() {
            B r0 = this;
            r0.s = new HashSet();
            //-- checkpoint
            //-- reco/rest (r0)
            r0.s.add("");
        }
    }

Without the extra reco/rest, hs is uninitialized after b.m() returns during replaying.

Additional Issues
A checkpoint can have multiple run-time instances
If a method in the CC-chain has callers that are not in the chain, it has to be replicated
Multi-threaded programs are currently not supported
Our technique does not guarantee correct execution when the post-checkpoint part of the program
  - Depends on external resources, such as files, databases
  - Depends on unique-per-execution values, such as clock
  - Is modified with new cross-checkpoint dependencies
Multiple execution regions
  - Designated by a starting point and an ending point
  - Specified by two CC-chains

Study 1: Static Analysis

    Program                  #R    #IP
    compress socksproxy      1 3 6 11
    socksecho                 3     14
    raytrace                  3     10
    soot-2.2.3               10     35
    muffine                   3     20
    sablecc                   4     11
    jess                      3      8
    violet                    4      9
    javacup                   4      9
    jtar-1.21                 2      4
    db                        2      5
    jflex                     2      8
    jb-6.1                    3      5
    jlex-1.2.6                3      8

Static Analysis: Locals Reduction
[Figure: bar chart per program, Total Locals vs. Selected Locals]

Static Analysis: Static Fields Reduction
[Figure: bar chart per program, Total SF vs. Selected SF]

Static Analysis: Removed/Inserted Statements
[Figure: bar chart per program, Stmts Left after Pruning (%) vs. Stmts Inserted (%)]

Static Analysis Cost
Phase 1: Soot infrastructure cost
  - Between 1.64ms and 30.6ms per thousand Jimple statements
  - On average, 11.1ms/1000 statements
Phase 2: Our analysis cost
  - Between 1.67ms and 26.6ms per thousand Jimple statements
  - On average, 9.4ms/1000 statements
This should be amortized across multiple runs of the replaying version

Study 2: Run-Time Performance (compress)
Original program: compressing and decompressing 5 big tar files several times
Evaluated for five checkpoint definitions
  - One checkpoint, close to the beginning of the program
  - Two regions of compression and decompression
  - A region containing the process of compression
  - A region containing the process of decompression
  - One checkpoint, close to the end of the program

compress Performance
[Figure: normalized running times of the checkpointing version and the replaying version, checkpoint definitions 1-5]
[Figure: normalized size of captured program state, Size of Heap vs. Size of Captured Program State, checkpoint definitions 1-5]

Study 2: Run-Time Performance (soot)
Input: soot-2.2.3 itself containing 2227333 methods
Phases
  - Enabling cg.spark, wjtp, wjop.ji, wjap.uft, jtp, jop.cp
Evaluated for six checkpoint definitions
  - Before whole-program packs
  - After cg
  - After wjtp
  - After wjop
  - After wjap
  - After body packs

soot Performance
[Figure: normalized running times of the checkpointing version and the replaying version, checkpoint definitions 1-6]
[Figure: normalized captured program state, Size of Heap vs. Size of Captured Program State, checkpoint definitions 1-6]

Study 2: Run-Time Performance (jflex-1.4.1)
Input: a .flex grammar file corresponding to a DFA containing 21769 states
Evaluated for four checkpoint definitions
  - After NFA is generated
  - After DFA is generated
  - After DFA minimization
  - After emission

jflex Performance
[Figure: normalized running time of the checkpointing version and the replaying version, checkpoint definitions 1-4]
[Figure: normalized size of captured state, Size of Heap vs. Size of Captured Program State, checkpoint definitions 1-4]

Summary of Evaluation
Static analysis successfully reduces the size of program state recorded and recovered
It is more meaningful to checkpoint/replay long-running programs
Checkpoints are better taken after a long computation phase with (relatively) small output state
  - √ compress: small program state, short running time
  - √ soot: large program state, but very long computation time
  - X jflex: large program state, short running time
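The state reduction summarized above rests on the static-field selection rule stated earlier: select a field fld if PointsToSet(fld) ∩ Cm ≠ ∅. The check itself is a set-intersection test and can be sketched as follows. This is only an illustration: the class name, the field names, and the object labels `o1`, `o2`, ... are hypothetical, abstract heap objects are modeled as plain strings, and the set Cm is assumed to have been computed already by the bottom-up propagation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StaticFieldSelection {
    /**
     * Returns the static fields whose points-to set shares at least one
     * object with cm, i.e. PointsToSet(fld) ∩ Cm is non-empty.
     */
    static Set<String> select(Map<String, Set<String>> pointsTo, Set<String> cm) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : pointsTo.entrySet()) {
            // disjoint(...) is true exactly when the intersection is empty
            if (!Collections.disjoint(entry.getValue(), cm)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical points-to sets for two static fields
        Map<String, Set<String>> pointsTo = new LinkedHashMap<>();
        pointsTo.put("Config.options", Set.of("o1", "o2"));
        pointsTo.put("Cache.table", Set.of("o3"));

        // Hypothetical Cm: objects from which a mutated object can be reached
        Set<String> cm = Set.of("o2", "o7");

        System.out.println(select(pointsTo, cm)); // prints [Config.options]
    }
}
```

Fields whose points-to sets never meet Cm are left out of the recorded state, which is where the reduction in captured static fields would come from in this simplified view.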

Conclusions
A static-analysis-based checkpointing/replaying technique
An implementation and an evaluation showing that our technique is a candidate for testing, debugging, and dynamic slicing of long-running programs
Future work
  - Language-level checkpointing/replaying of multi-threaded programs
  - More precise static analyses could be employed to reduce the size of program state to be captured
  - The run-time support for object reading and writing could be improved

Questions?

compress

    Run   #Objects   Space    %Heap    Timec(s) (%wio)   Timer(s) (%rio)
    1     31         471by    0.17%    4.19 (0.74%)      4.14 (0.38%)
    2     545        89.7M    28.8%    5.22 (10.4%)      3.19 (11.8%)
    3     22         89.7by   28.9%    5.38 (9.0%)       2.17 (12.8%)
    4     578        89M      26.7%    4.70 (12.3%)      1.39 (24.7%)
    5     31         296by    0.008%   4.17 (8.1%)       47 (34.0%)

Original running time: 4.05s

soot

    Run   #Objects   Space    %Heap    Timec(s) (%wio)   Timer(s) (%rio)
    1     461058     36.2M    36.3%    4695.3 (0.4%)     4643.5 (0.5%)
    2     65648481   745M     73.2%    4712.2 (7.2%)     4410.5 (9.1%)
    3     65648481   745M     73.2%    4688.4 (6.9%)     4387.3 (8.7%)
    4     77739391   806.4M   79.0%    4770.1 (8.0%)     511.5 (95.2%)
    5     77767256   806.5M   63.5%    4972.8 (8.0%)     533.1 (97.8%)
    6     75668735   795.3M   72.8%    4661.6 (8.0%)     411.5 (96.5%)

Original running time: 4665.7s

jflex

    Run   #Objects   Space    %Heap     Timec(s) (%wio)   Timer(s) (%rio)
    1     6606489    259.8M   86.1%     64.9 (8.0%)       68.8 (18.3%)
    2     6695173    385.1M   68.1%     65.2 (12.3%)      55.6 (26.1%)
    3     6695172    385.1M   68.1%     63.9 (12.1%)      55.4 (26.0%)
    4     21         2K       0.0003%   56.2 (0.14%)      0.063 (50.8%)

Original running time: 52.6s
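
The normalized running times can be recomputed from the soot and jflex tables above. The sketch below assumes that the Timer(s) column is the running time of the replaying version and that normalization means a percentage of the original running time; both are readings of the tables, not statements from the slides. `ReplayCost` and `normalizedPercent` are hypothetical names.

```java
final class ReplayCost {
    /** Replay time expressed as a percentage of the original running time. */
    static double normalizedPercent(double replaySeconds, double originalSeconds) {
        return 100.0 * replaySeconds / originalSeconds;
    }

    public static void main(String[] args) {
        double sootOriginal = 4665.7;  // original running time listed under the soot table
        double jflexOriginal = 52.6;   // original running time listed under the jflex table

        // Timer(s) values copied from the tables (assumed to be replay times)
        System.out.printf("soot run 1:  %.1f%%%n", normalizedPercent(4643.5, sootOriginal));
        System.out.printf("soot run 4:  %.1f%%%n", normalizedPercent(511.5, sootOriginal));
        System.out.printf("jflex run 1: %.1f%%%n", normalizedPercent(68.8, jflexOriginal));
        System.out.printf("jflex run 4: %.1f%%%n", normalizedPercent(0.063, jflexOriginal));
    }
}
```

Under these assumptions, soot run 4 replays in roughly 11% of the original time, while jflex run 1 takes longer to replay than the original run (about 131%), which is consistent with the evaluation summary's verdict on the two programs.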