Efficient Checkpointing of Java Software using Context-Sensitive Capture and Replay

Guoqing Xu, Atanas Rountev, Yan Tang, Feng Qin
Ohio State University
ESEC/FSE 07
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Outline
 Motivation
- Challenges for checkpointing/replaying Java software
- Summary of our approach
 Contributions
- Static analyses
- Multiple execution regions
- Experimental evaluation
 Conclusions
Motivation
 Checkpointing/replaying has been used for a variety
of purposes at system level
- Originally designed to support fault tolerance
- Debugging of OS and of parallel and distributed software
 Checkpointing can benefit a number of software
engineering tasks
- Reduce the cost of manual debugging and testing
- Support for automated techniques for debugging and
testing: e.g., dynamic slicing and delta-debugging
- Inspired by both system-level checkpointing [Pan-PDD88,
Dunlap-OSDI02, King-USENIX05] and “saving-and-restoring”
software engineering techniques [Saff-ASE05, Orso-WODA05,
Orso-WODA06, Elbaum-FSE06]
Challenges
 Ease of use and deployment
- Application-level checkpointing: no JVM/runtime support,
just code analysis and instrumentation
- Challenge: no direct access to the call stack; no control
over thread scheduling or external resources (files, etc.)
 Reduce the size of the recorded state
- Dumping the entire heap may be prohibitively expensive,
especially for large programs
- Challenge: static analyses to prune redundant state
 Static and dynamic overhead
- Static analysis cost is amortized over multiple runs
- Approach is intended for long-running applications
Summary of Our Approach
 Tool input: program + checkpoint definition
 Performs static analyses and code instrumentation
 Tool output: two program versions
 First, an augmented checkpointing version is executed once
to record (parts of) the run-time program states
- At the checkpoint: heap objects, static fields, locals
- At certain points along the call chain leading to the checkpoint
 Next, a pruned replaying version is executed multiple times
- Restore variables saved at the checkpoint
- Restore variables saved at points along the call chain
 How do we resume execution from the checkpoint?
- Step 1: control flow quickly reaches the checkpoint
- Step 2: recover state at checkpoint
- Step 3: incrementally recover state after call sites along the call
chain leading to the checkpoint
Definitions
 Crosscut call chain (CC-chain)
- A programmer-specified call chain that leads to the
method that contains the checkpoint
- E.g. main(44) -> run(28)
 Decision points
- A call site on the CC-chain (e.g. m.run) – due to
polymorphism
- A predicate on which a decision point or the checkpoint is
control-dependent
 At a decision point, the checkpointing version
records the control-flow outcome
 The replaying version uses this info to force the
control flow to reach the checkpoint
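The record-then-force idea for predicate decision points can be illustrated with a minimal sketch. The class name `DecisionLog` and the in-memory queue are assumptions made for illustration only; the slides show just the `save(b)` and `read(b)` markers and do not say how the tool stores the recorded outcomes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not the tool's actual runtime: an in-memory FIFO log
// of control-flow outcomes observed at decision points.
final class DecisionLog {
    private final Deque<Boolean> outcomes = new ArrayDeque<>();

    // Checkpointing version: record the outcome and pass it through unchanged.
    boolean save(boolean outcome) {
        outcomes.addLast(outcome);
        return outcome;
    }

    // Replaying version: return the recorded outcome instead of recomputing it.
    boolean read() {
        return outcomes.removeFirst();
    }

    public static void main(String[] args) {
        DecisionLog log = new DecisionLog();
        // Checkpointing run: two predicate decision points are evaluated and saved.
        log.save(true);
        log.save(false);
        // Replaying run: outcomes are read back in the order they were saved.
        System.out.println(log.read() + " " + log.read()); // prints: true false
    }
}
```

Because the replaying version takes the branch outcome from the log, it can skip the computation that originally produced the predicate value and still follow the same path to the checkpoint.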
Replaying, Step 1: Recover the Call Stack
 Predicate decision point: recover boolean
value
 Call site decision point o.m(a1, …, an)
- Recover the run-time type of the receiver object;
instantiated during replaying using sun.misc.Unsafe
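As a rough illustration of how a receiver object of a recorded run-time type might be created without running its constructor, the sketch below uses `sun.misc.Unsafe.allocateInstance`. The class `ReceiverAllocation`, the nested `Receiver` class, and the idea of passing the recorded type as a class-name string are assumptions for illustration; the slides only state that `sun.misc.Unsafe` is used. `sun.misc.Unsafe` is an unsupported JDK-internal API, so this may produce compiler warnings.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class ReceiverAllocation {
    // Illustrative receiver class; its constructor should not run during replay.
    static class Receiver {
        boolean constructorRan;
        Receiver() { constructorRan = true; }
        String run() { return "Receiver.run"; }
    }

    // Obtains the Unsafe singleton through its private static field "theUnsafe".
    static Unsafe unsafe() throws ReflectiveOperationException {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    // Allocates an instance of the recorded run-time type without invoking
    // any constructor; fields keep their default values.
    static Object allocate(String recordedTypeName) throws ReflectiveOperationException {
        Class<?> type = Class.forName(recordedTypeName);
        return unsafe().allocateInstance(type);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Receiver r = (Receiver) allocate(Receiver.class.getName());
        // The call dispatches on the restored type even though no constructor ran.
        System.out.println(r.run() + ", constructor ran: " + r.constructorRan);
    }
}
```

Skipping the constructor matters here because the code that originally built the receiver may have been pruned from the replaying version; only the dynamic type is needed to dispatch the call on the CC-chain.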
Checkpointing Version
static void main(String[] args)
{
    Main m = new Main();
    boolean b = args.length != 0;
    => save(b);
    if (b) { // DP
        => save(type_of(m));
        m.run(args); // DP
    }
    ...
}

void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    boolean b = Options.v().whole_jimple();
    => save(b);
    if (b) { // DP
        getPack("cg").apply();
        // --- checkpoint ---
        => save(…);
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    …
}
Replaying Version
static void main(String[] args)
{
    Main m = new Main();
    boolean b = args.length != 0;
    => read(b);
    if (b) { // DP
        => read(type_of(m));
        => unsafe.allocate(m);
        => args = null;
        m.run(args); // DP
    }
}

void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    boolean b = Options.v().whole_jimple();
    => read(b);
    if (b) { // DP
        getPack("cg").apply();
        // --- checkpoint ---
        => read(…);
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    …
}
Step 2: Recover at the Checkpoint
 Our static analysis selects locals for recording (for
checkpointing)/recovering (for replaying) when
- They are written before the checkpoint
- They are read after the checkpoint
 Record primitive-typed values or entire object
graphs on the heap (all reachable objects)
 Static fields are selected based on the same idea
void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    if (Options.v().whole_jimple()) {
        getPack("cg").apply();
        // --- checkpoint ---
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    for (Iterator i = body_packs.iterator(); i.hasNext();) {
        …
    }
    …
}
Selected local: body_packs
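The selection rule for locals can be read as a set intersection: a local is recorded only if it is both written before and read after the checkpoint. The sketch below shows just that final step. The class `LocalSelection` and the hand-written input sets are assumptions for illustration; in the actual tool these sets would come from static analysis of the method body, which is not shown here.

```java
import java.util.Set;
import java.util.TreeSet;

final class LocalSelection {
    // Returns the locals that are both written before and read after the checkpoint.
    static Set<String> select(Set<String> writtenBefore, Set<String> readAfter) {
        Set<String> selected = new TreeSet<>(writtenBefore);
        selected.retainAll(readAfter);
        return selected;
    }

    public static void main(String[] args) {
        // Locals of run(...) classified by hand for illustration (an assumption):
        // wp_packs and body_packs are assigned before the checkpoint, while only
        // body_packs and the loop iterator i are read in the visible code after it.
        Set<String> writtenBefore = Set.of("wp_packs", "body_packs");
        Set<String> readAfter = Set.of("body_packs", "i");
        System.out.println(select(writtenBefore, readAfter)); // prints: [body_packs]
    }
}
```

A local such as `i`, which is first written after the checkpoint, needs no recording because the replaying version recomputes it.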
Selection of Static Fields
 A whole-program Mod/Use analysis
- A static field is “written” if its value is changed, or any
heap object reachable from it is mutated
- A static field is “read” if its value is directly read
 Analysis algorithm
- Context-sensitive and flow-insensitive; uses the points-to
solution and the call graph from Spark [Lhotak-CC03]
- Bottom-up traversal of the SCC-DAG of the call graph
- For each method m, a set Cm is maintained to contain all
objects from which a mutated object can be reached
- Propagate backwards the objects in Cm that escape a
callee method to its callers
- Select a static field fld if PointsToSet(fld) ∩ Cm ≠ ∅
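The final selection test, PointsToSet(fld) ∩ Cm ≠ ∅, can be sketched as follows. The class `StaticFieldSelection`, the field names `A.f` and `B.g`, and the abstract object names `o1` to `o3` are hypothetical; the sketch takes the points-to sets and Cm as given inputs and does not reproduce the bottom-up propagation over the call graph.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StaticFieldSelection {
    // Selects every static field whose points-to set shares at least one
    // abstract object with cm (objects from which a mutated object is reachable).
    static Set<String> select(Map<String, Set<String>> pointsTo, Set<String> cm) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : pointsTo.entrySet()) {
            if (!Collections.disjoint(e.getValue(), cm)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical points-to solution for two static fields.
        Map<String, Set<String>> pointsTo = Map.of(
            "A.f", Set.of("o1"),
            "B.g", Set.of("o2", "o3"));
        // Hypothetical Cm: only o3 can reach a mutated object.
        Set<String> cm = Set.of("o3");
        System.out.println(select(pointsTo, cm)); // prints: [B.g]
    }
}
```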
Step 3: Recover after the Checkpoint
 Replaying only at decision points and the
checkpoint is not enough to guarantee correct
execution after the checkpoint
 Additionally record/recover local variables that
will be read after each call site in the CC-chain
void main(){
    Set hs = new HashSet();
    B b = new B(hs);
    //-- reco/rest (type_of(b))
    b.m();
    //-- extra reco/rest (hs); otherwise hs is uninitialized
    if(hs == b.s){ … }
}
class B{
    Set s;
    void m(){
        B r0 = this;
        r0.s = new HashSet();
        //-- checkpoint
        //-- reco/rest (r0)
        r0.s.add("");
    }
}
Additional Issues
 A checkpoint can have multiple run-time instances
 If a method in CC-chain has callers that are not in
the chain, it has to be replicated
 Currently do not support multi-threaded programs
 Our technique does not guarantee the correctness
of the execution, when the post-checkpoint part of
the program
- Depends on external resources, such as files, databases
- Depends on unique-per-execution values, such as clock
- Is modified with new cross-checkpoint dependencies
 Multiple execution regions
- Designated by a starting point and an ending point
- Specified by two CC-chains
Study 1: Static Analysis
Program       #R    #IP
compress      1     6
socksproxy    3     11
socksecho     3     14
raytrace      3     10
soot-2.2.3    10    35
muffine       3     20
sablecc       4     11
jess          3     8
violet        4     9
javacup       4     9
jtar-1.21     2     4
db            2     5
jflex         2     8
jb-6.1        3     5
jlex-1.2.6    3     8
Static Analysis: Locals Reduction
[Figure: bar chart of Total Locals vs. Selected Locals for each program (compress, socksproxy, socksecho, raytrace, soot-2.2.3, muffine, sablecc, jess, violet, javacup, jtar-1.21, db, jflex, jb-6.1, jlex-1.2.6); y-axis from 0 to 1800]
Static Analysis: Static Fields Reduction
[Figure: bar chart of Total SF vs. Selected SF (static fields) for each program (compress, socksproxy, socksecho, raytrace, soot-2.2.3, muffine, sablecc, jess, violet, javacup, jtar-1.21, db, jflex, jb-6.1, jlex-1.2.6); y-axis from 0 to 3500]
Static Analysis: Removed/Inserted Statements
[Figure: bar chart of Stmts Left after Pruning (%) and Stmts Inserted (%) for each program (compress, socksproxy, socksecho, raytrace, soot-2.2.3, muffine, sablecc, jess, violet, javacup, jtar-1.21, db, jflex, jb-6.1, jlex-1.2.6); y-axis from 0 to 120]
Static Analysis Cost
 Phase 1: Soot infrastructure cost
- Between 1.64ms and 30.6ms per thousand Jimple
statements
- On average, 11.1ms/1000 statements
 Phase 2: Our analysis cost
- Between 1.67ms and 26.6ms per thousand Jimple
statements
- On average, 9.4ms/1000 statements
 This should be amortized across multiple runs of the
replaying version
Study 2: Run-Time Performance (compress)
 Original program: compressing and decompressing 5
big tar files several times
 Evaluated for five checkpoint definitions
- One checkpoint, close to the beginning of the program
- Two regions of compression and decompression
- A region containing the process of compression
- A region containing the process of decompression
- One checkpoint, close to the end of the program
compress Performance
 Normalized running times
[Figure: bar chart of the checkpointing version and the replaying version; x-axis 1 to 5, y-axis from 0 to 140]
 Normalized size of captured program state
[Figure: bar chart of Size of Heap and Size of Captured Program State; x-axis 1 to 5, y-axis from 0 to 100]
Study 2: Run-Time Performance (soot)
 Input: soot-2.2.3 itself containing 2227333 methods
 Phases
- Enabling cg.spark, wjtp, wjop.ji, wjap.uft, jtp, jop.cp
 Evaluated for six checkpoint definitions
- Before whole-program packs
- After cg
- After wjtp
- After wjop
- After wjap
- After body packs
soot Performance
 Normalized running times
[Figure: bar chart of the checkpointing version and the replaying version; x-axis 1 to 6, y-axis from 0 to 120]
 Normalized captured program state
[Figure: bar chart of Size of Heap and Size of Captured Program State; x-axis 1 to 6, y-axis from 0 to 100]
Study 2: Run-Time Performance (jflex-1.4.1)
 Input: a .flex grammar file corresponding to a DFA
containing 21769 states
 Evaluated for four checkpoint definitions
- After NFA is generated
- After DFA is generated
- After DFA minimization
- After emission
jflex Performance
 Normalized running time
[Figure: bar chart of the replaying version and the checkpointing version; x-axis 1 to 4, y-axis from 0 to 150]
 Normalized size of captured state
[Figure: bar chart of Size of Heap and Size of Captured Program State; x-axis 1 to 4, y-axis from 0 to 100]
Summary of Evaluation
 Static analysis successfully reduces the size of
program state recorded and recovered
 It is more meaningful to checkpoint/replay long-running programs
 Checkpoints are best taken after a long-running
computation phase that produces a (relatively) small
output state
- √ compress: small program state, short running time
- √ soot: large program state, but very long computation time
- X jflex: large program state, short running time
Conclusions
 A static-analysis-based checkpointing/replaying
technique
 An implementation and an evaluation that shows
our technique can be an interesting candidate for
testing, debugging, and dynamic slicing of long-running programs
 Future work
- Language-level checkpointing/replaying of multi-threaded
programs
- More precise static analyses could be employed to reduce
the size of program state to be captured
- The run-time support for object reading and writing could
be improved
 Questions?
compress
Run   #Objects   Space    %Heap    Time_c(s) (%wio)   Time_r(s) (%rio)
1     31         471by    0.17%    4.19 (0.74%)       4.14 (0.38%)
2     545        89.7M    28.8%    5.22 (10.4%)       3.19 (11.8%)
3     22         89.7by   28.9%    5.38 (9.0%)        2.17 (12.8%)
4     578        89M      26.7%    4.70 (12.3%)       1.39 (24.7%)
5     31         296by    0.008%   4.17 (8.1%)        47 (34.0%)
Original running time: 4.05s
soot
Run   #Objects   Space    %Heap   Time_c(s) (%wio)   Time_r(s) (%rio)
1     461058     36.2M    36.3%   4695.3 (0.4%)      4643.5 (0.5%)
2     65648481   745M     73.2%   4712.2 (7.2%)      4410.5 (9.1%)
3     65648481   745M     73.2%   4688.4 (6.9%)      4387.3 (8.7%)
4     77739391   806.4M   79.0%   4770.1 (8.0%)      511.5 (95.2%)
5     77767256   806.5M   63.5%   4972.8 (8.0%)      533.1 (97.8%)
6     75668735   795.3M   72.8%   4661.6 (8.0%)      411.5 (96.5%)
Original running time: 4665.7s
jflex
Run   #Objects   Space    %Heap     Time_c(s) (%wio)   Time_r(s) (%rio)
1     6606489    259.8M   86.1%     64.9 (8.0%)        68.8 (18.3%)
2     6695173    385.1M   68.1%     65.2 (12.3%)       55.6 (26.1%)
3     6695172    385.1M   68.1%     63.9 (12.1%)       55.4 (26.0%)
4     21         2K       0.0003%   56.2 (0.14%)       0.063 (50.8%)
Original running time: 52.6s