objects - Computer Science Department, Technion

Heap Shape Scalability
Scalable Garbage Collection on
Highly Parallel Platforms
Kathy Barabash, Erez Petrank
Computer Science Department
Technion, Israel
Outline
Is tracing GC ready for the many-core?
  How the heap shape is related?
Evaluating the heap shape scalability
  Idealized Trace Utilization
Improving the heap shape scalability
  Solution 1: Reshaping with Shortcut References
  Solution 2: Tracing with Speculative Roots
Related work & conclusion
ISMM 2010

Is Tracing GC Ready for Many-core?
GC tracing
  Traverse lots of objects
Sequential trace
  Each live object is touched (BFS, DFS)
Parallel trace
  Load balancing
  1K cores really soon
[Figure: roots pointing into a heap of objects a to m]

Can Heaps Spoil the Scalability?
4M live objects
  Single linked list
Sequential trace
  4M steps
Parallel trace
  Not any faster
[Figure: roots pointing to a single linked list of objects 1, 2, 3, ... 4M in the heap]

Deep Object Graphs Can be Evil
Definition:
  Object Depth: length of the shortest path from any root to the object
  Object-Graph Depth: maximal depth of a live object
Example:
  [Figure: heap objects labeled with their object depths, 0 to 3]
How deep are object graphs of Java programs?
  SpecJVM, DaCapo, SpecJBB
  Instrumented BFS trace
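
The two definitions above can be made concrete with a breadth-first traversal. The following is a minimal sketch, not the instrumented trace used for the measurements: the dict-of-lists heap representation and the names `object_depths` and `object_graph_depth` are assumptions made for illustration.

```python
from collections import deque


def object_depths(heap, roots):
    """Map each reachable object to its depth.

    `heap` is a hypothetical representation: object -> list of referents.
    Root objects get depth 0; BFS guarantees each depth is minimal.
    """
    depth = {root: 0 for root in roots}
    queue = deque(depth)
    while queue:
        obj = queue.popleft()
        for child in heap.get(obj, ()):
            if child not in depth:  # first visit is along a shortest path
                depth[child] = depth[obj] + 1
                queue.append(child)
    return depth


def object_graph_depth(heap, roots):
    """Maximal depth over all live (reachable) objects."""
    return max(object_depths(heap, roots).values(), default=0)


# Small example heap; "x" is unreachable and therefore has no depth.
example_heap = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "x": ["a"]}
example_depths = object_depths(example_heap, ["a"])
```

In this example `d` is reachable through both `b` and `c`, and its depth is the shorter distance (2), so the object-graph depth is 3.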

Object-Graph Depths of Java Benchmarks
Name      Description                  Heap Size (MB)   GC Cycles   Max Depth
SpecJVM
  javac   Java compiler run 3 times    32               15          1,234
  mtrt    3D raytracer                 32               8           1,416
DaCapo
  bloat   Java byte code analyzer      48               344         1,195
  pmd     Java code analyzer           48               59          18,482
  xalan   Transforms XML into HTML     128              129         8,476
Other 15 benchmarks: 128

Not all Deep Object Graphs are Evil
Object-graph
  1K equally sized linked lists of 4K objects each
Sequential trace
  4M steps
Parallel trace
  4K steps
  Scales well for up to 1K processors
[Figure: roots pointing to 1K linked lists, each holding objects 1, 2, 3, ... 4K, in the heap]
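
The step counts on the two linked-list slides follow from a simple lower bound: a trace can finish no sooner than the longest chain allows, and no sooner than the total work divided by the number of processors. The sketch below works through that arithmetic; the function name `min_trace_steps` is hypothetical, and one object scan is assumed to take one step.

```python
def min_trace_steps(num_lists, list_length, processors):
    """Lower bound on trace steps for `num_lists` linked lists of equal length.

    Assumes one object scan per processor per step. The bound is the larger
    of the critical path (one list must be walked in order) and the total
    work divided evenly among the processors, rounded up.
    """
    total_objects = num_lists * list_length
    work_bound = (total_objects + processors - 1) // processors  # ceiling
    return max(list_length, work_bound)


K = 1024
M = K * K

# One list of 4M objects: extra processors do not help.
single_list_seq = min_trace_steps(1, 4 * M, 1)
single_list_par = min_trace_steps(1, 4 * M, K)

# 1K lists of 4K objects: the trace shortens until 1K processors are used.
many_lists_seq = min_trace_steps(K, 4 * K, 1)
many_lists_par = min_trace_steps(K, 4 * K, K)
many_lists_2k = min_trace_steps(K, 4 * K, 2 * K)
```

Both heaps hold 4M objects and both have object-graph depth in the thousands, yet only the second one benefits from parallelism, and only up to 1K processors.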

Deep and Narrow Object Graphs are Evil
Definition:
  Object Depths Distribution: number of objects at each depth
Example:
  [Figure: example heap annotated with # objects per depth]
Graphical Representation (Object-graph shape):
  [Figure: chart of # objects (0 to 5) against depth (1 to 5)]
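
The object depths distribution can be computed with a level-by-level BFS that records how many objects each level contains. This is a minimal sketch under the same assumed dict-of-lists heap representation; `depth_distribution` is a hypothetical name, and depths are counted from 0 at the roots.

```python
def depth_distribution(heap, roots):
    """Return a list where entry d is the number of live objects at depth d.

    `heap` maps each object to the list of objects it references.
    """
    seen = set(roots)
    level = list(dict.fromkeys(roots))  # de-duplicate roots, keep order
    widths = []
    while level:
        widths.append(len(level))
        next_level = []
        for obj in level:
            for child in heap.get(obj, ()):
                if child not in seen:
                    seen.add(child)
                    next_level.append(child)
        level = next_level
    return widths


# A deep and narrow shape: a single linked list of five objects.
chain = {i: [i + 1] for i in range(4)}
chain_shape = depth_distribution(chain, [0])

# A shallow and wide shape: a complete binary tree with seven objects.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
tree_shape = depth_distribution(tree, [0])
```

The chain yields one object per depth, which is the deep and narrow shape the slide warns about, while the tree widens quickly and offers work for several tracers after a few levels.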

Object-Graph Shapes of Java Benchmarks
[Figure: # objects vs. depth, shown for jython and xalan]
[Figure: # objects (log 10) vs. depth (log 10), one panel each for db, jython, jess, bloat, jack, javac, lusearch, mtrt, hsqldb, xalan, antlr, pmd]

The Idealized Trace Utilization
Simulate the idealized traversal by N threads
  Perfect load balancing
  Perfect cache behavior
  BFS traversal
  Each object scan takes a single time tick
During the traversal, count
  Objects available to be scanned at every time tick
  Processor slots: some are busy and some are wasted
At the end, report the utilization (ITU)
  ITU = (Total Scanned Objects / Total Processor Slots) * 100%

Idealized Trace Utilization Example
4 Tracers (Core 1, Core 2, Core 3, Core 4)
Time ticks:       1   2   3   4   5   6   7   8
Scanned objects:  2   5   9  11  12  13  14  15
ITU = (Total Scanned Objects / Total Processor Slots) * 100% = (15 / (8*4)) * 100% = 47%
[Figure: heap objects]
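
The simulation described on the ITU slides can be sketched as follows. This is an illustrative reading rather than the authors' prototype: the heap is assumed to be a dict mapping each object to its referents, `idealized_trace_utilization` is a hypothetical name, and "perfect load balancing" is modeled by letting any idle thread take any available object at each tick.

```python
from collections import deque


def idealized_trace_utilization(heap, roots, n_threads):
    """Simulate a BFS trace by `n_threads` threads and return the ITU in percent.

    Each tick, up to `n_threads` available objects are scanned (one tick per
    scan). Objects discovered during a tick become available at the next tick.
    """
    marked = set(roots)
    available = deque(dict.fromkeys(roots))
    scanned = 0
    ticks = 0
    while available:
        ticks += 1
        batch_size = min(n_threads, len(available))
        batch = [available.popleft() for _ in range(batch_size)]
        for obj in batch:
            scanned += 1  # one busy processor slot
            for child in heap.get(obj, ()):
                if child not in marked:
                    marked.add(child)
                    available.append(child)
    if ticks == 0:
        return 0.0  # nothing to trace
    return 100.0 * scanned / (ticks * n_threads)


# A single linked list of 8 objects keeps only one of 4 threads busy.
linked_list = {i: [i + 1] for i in range(7)}
list_itu = idealized_trace_utilization(linked_list, [0], 4)

# A complete binary tree with 15 objects widens quickly.
binary_tree = {i: [2 * i + 1, 2 * i + 2] for i in range(7)}
tree_itu = idealized_trace_utilization(binary_tree, [0], 4)
```

With 4 threads, the list needs 8 ticks for 8 objects (25% utilization), while the tree needs 5 ticks for 15 objects (75% utilization).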

Graphical Representation
1. Simulate and compute
2. Draw the graph
[Figure: object-graph shape (# objects vs. depth) next to the resulting utilization (0 to 100) vs. processors (1, 2, 4, 8)]

Worst Case ITU for Java Benchmarks
[Figure: utilization (0 to 100) vs. processors (1 to 1024, in powers of two) for check, compress, db, jack, javac, jess, mpegaudio, mtrt, antlr, bloat, hsqldb, jython, lusearch, pmd, xalan]

Average ITU for Java Benchmarks
[Figure: utilization (0 to 100) vs. processors (1 to 1024, in powers of two) for check, compress, db, jack, javac, jess, mpegaudio, mtrt, antlr, bloat, hsqldb, jython, lusearch, pmd, xalan]

What’s Next?
Problematic heaps exist
  javac, mtrt, pmd, bloat, xalan
Can we improve the trace scalability without modifying the benchmarks?
  Reshape with Shortcut References
  Trace with Speculative Roots

Reshape with Shortcut References
New references are added
  Invisible to the program
  Useful for the tracers
Sequential trace
  16K steps
Parallel trace
  4K steps
  Scales for 4 processors
[Figure: roots and a linked list of objects 1, 2, 3, 4, ... 16K in the heap]

Evaluation Prototype
Devise a shortcut strategy
  Where shortcuts are needed
When the program is stopped for GC
  Compute the Idealized Trace Utilization
  Run the shortcut-adding algorithm
  Compute the ITU for the modified heap
Report
  ITU improvement
  Number of shortcuts added

Shortcut Strategy and Parameters
Identify candidate subgraphs
  With at least 'size' objects
  With a depth-to-size ratio no less than 'ratio'
  Example: Size=5, Depth=4, Ratio=0.8
Add a shortcut to the root of the subgraph
  Leading to an object 'length' pointers away
  The next shortcut is introduced no closer than 'distance' pointers away
[Figure: chain of objects 1 to 9 illustrating Length (4) and Distance (2)]
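
The strategy above can be sketched for the simplest deep subgraph, a single linked list. The candidate test follows the slide directly. The placement rule is only one possible reading of the 'length' and 'distance' parameters and is an assumption: shortcut sources are spaced `distance` pointers apart, starting at the subgraph root, and each shortcut points `length` pointers ahead. All function names are hypothetical.

```python
from collections import deque


def is_candidate(size, depth, min_size, min_ratio):
    """Candidate test: enough objects, and deep relative to its size."""
    return size >= min_size and depth / size >= min_ratio


def chain_shortcuts(n, length, distance):
    """Shortcuts for the chain 0 -> 1 -> ... -> n-1 (assumed placement rule)."""
    if length < 1 or distance < 1:
        raise ValueError("length and distance must be positive")
    shortcuts = []
    src = 0  # start at the root of the subgraph
    while src + length < n:
        shortcuts.append((src, src + length))
        src += distance  # next source is `distance` pointers further on
    return shortcuts


def chain_depth(n, shortcuts):
    """Object-graph depth of the chain rooted at object 0, computed by BFS."""
    refs = {i: [i + 1] for i in range(n - 1)}
    refs[n - 1] = []
    for src, dst in shortcuts:
        refs[src].append(dst)
    depth = {0: 0}
    queue = deque([0])
    while queue:
        obj = queue.popleft()
        for child in refs[obj]:
            if child not in depth:
                depth[child] = depth[obj] + 1
                queue.append(child)
    return max(depth.values())


# Nine objects with Length=4 and Distance=2, as in the slide's figure
# (objects are numbered from 0 here rather than from 1).
added = chain_shortcuts(9, length=4, distance=2)
depth_before = chain_depth(9, [])
depth_after = chain_depth(9, added)
```

Under this reading, the nine-object chain receives three shortcuts and its depth drops from 8 to 4. The slide's own example subgraph (Size=5, Depth=4) passes the candidate test for a minimum size of 5 and a ratio of 0.8.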

Results for SpecJVM mtrt
Size=50, Ratio=0.2, Length=50, Distance=25
~500K live objects
Max shortcuts: 110
Avg shortcuts: 94
[Figure: utilization (0 to 100) vs. processors (1 to 1024); series: Worst before, Worst after, Avg before, Avg after]

Results for DaCapo xalan
Size=50, Ratio=0.2, Length=50, Distance=25
~400K live objects
Max shortcuts: 888
Avg shortcuts: 536
[Figure: utilization (0 to 100) vs. processors (1 to 1024); series: Worst before, Worst after, Avg before, Avg after]

Results for DaCapo bloat
Size=50, Ratio=0.2, Length=50, Distance=25
~400K live objects
Max shortcuts: 940
Avg shortcuts: 378
[Figure: utilization (0 to 100) vs. processors (1 to 1024); series: Worst before, Worst after, Avg before, Avg after]

Results for DaCapo pmd
Size=600, Ratio=0.1, Length=120, Distance=40
~434K live objects
Max shortcuts: 5,874
Avg shortcuts: 432
[Figure: utilization (0 to 100) vs. processors (1 to 1024); series: Worst before, Worst after, Avg before, Avg after]

Results for SpecJVM javac
Size=500, Ratio=0.1, Length=100, Distance=50
~383K live objects
Max shortcuts: 292
Avg shortcuts: 16
[Figure: utilization (0 to 100) vs. processors (1 to 1024); series: Worst before, Worst after, Avg before, Avg after]

Trace with Speculative Roots
Sequential trace
  16M steps
Helper tracers
  Pick random roots
  Trace using custom colors
Parallel trace
  Scales for 4 processors
[Figure: roots and heap, with labels 4M and 4K]

Speculative Trace
Helper tracer
  Pick a speculative root
  Pick a color, e.g. red
  Trace; if a blue object is discovered, mark blue as reachable from red
Regular trace
  Trace from a root; if a blue object is discovered, mark blue as live
Complete trace
  All colors reachable from live colors are marked live
  All objects marked by live colors survive the collection
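
The three phases above can be illustrated with a small sequential simulation. This is a sketch under several simplifying assumptions, not the prototype from the talk: helpers run one after another and finish before the regular trace starts, speculative roots are passed in explicitly instead of being chosen at random, and the heap is a dict mapping each object to its referents. The name `speculative_trace` and the integer colors are hypothetical.

```python
LIVE = 0  # color used by the regular tracer


def speculative_trace(heap, roots, speculative_roots):
    """Return the set of objects that survive the collection."""
    color = {}    # object -> color that marked it
    reaches = {}  # helper color -> colors discovered while tracing with it

    # Helper tracers: each picks a root and a fresh color.
    for helper_color, start in enumerate(speculative_roots, start=1):
        if start in color:
            continue  # already marked by an earlier helper
        reaches[helper_color] = set()
        color[start] = helper_color
        stack = [start]
        while stack:
            obj = stack.pop()
            for child in heap.get(obj, ()):
                if child not in color:
                    color[child] = helper_color
                    stack.append(child)
                elif color[child] != helper_color:
                    # Another color was discovered: record reachability.
                    reaches[helper_color].add(color[child])

    # Regular trace: a discovered helper color is marked live, not re-traced.
    live_colors = {LIVE}
    stack = []
    for root in roots:
        if root not in color:
            color[root] = LIVE
            stack.append(root)
        else:
            live_colors.add(color[root])
    while stack:
        obj = stack.pop()
        for child in heap.get(obj, ()):
            if child not in color:
                color[child] = LIVE
                stack.append(child)
            else:
                live_colors.add(color[child])

    # Completion: colors reachable from live colors become live.
    pending = list(live_colors)
    while pending:
        current = pending.pop()
        for other in reaches.get(current, ()):
            if other not in live_colors:
                live_colors.add(other)
                pending.append(other)

    return {obj for obj, c in color.items() if c in live_colors}


# a -> b -> c -> d is live; x is dead but references c.
heap = {"a": ["b"], "b": ["c"], "c": ["d"], "d": [], "x": ["c"]}
good_pick = speculative_trace(heap, ["a"], ["c", "x"])
bad_pick = speculative_trace(heap, ["a"], ["x"])
```

When the helper starts at `c`, its color turns out live and the dead object `x` (colored separately) is reclaimed. When the helper starts at the dead object `x`, it colors `x`, `c` and `d` with one color, the regular trace later marks that color live, and `x` survives as floating garbage.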

Evaluation Prototype
4 regular tracers, 4 helper tracers
Speculative roots: random unmarked objects
ITU before and after the colored trace
Useful helper work
  Live objects colored by live colors
Wasted helper work
  Dead objects colored by dead colors
Floating garbage
  Dead objects colored by live colors
[Figure: heap of objects a to m]

Limit the floating garbage
Cap the number of objects colored by a single color
  Helpers must save discovered but not traced objects
  Trace completion phase takes care of the saved fronts
Make the random root choices smarter
  To avoid choosing dead objects
  To reach deeper parts of the live object graph
  Filter for the recursive objects
    Objects with referents of their own type

Results
Lots of floating garbage
  Even with the filter
Hard to find good roots
  Progressively harder as the live objects are getting marked
Trace completion phase is complex
  Can defeat the purpose
Modest improvement in the Idealized Trace Utilization scores

Results for DaCapo xalan
Worst case ITU improvement, with the random choices filter
[Figure: utilization (0 to 100) vs. processors (1 to 1024); series: Before, After]

Results for DaCapo bloat
Worst case ITU improvement, with the random choices filter
[Figure: utilization (0 to 100) vs. processors (1 to 1024); series: Before, After]

Related Work
Parallel Garbage Collection Folklore
  There are heap structures that can foil any clever load balancing scheme
Siebert (ISMM’08)
  Reported object graph depths for SpecJVM benchmarks
  Proposed an upper bound on the worst case scalability as a way to compute real-time guarantees for the GC tracing
Random tracing originally proposed by Click

Summary
Studied the heap shape properties of Java benchmarks
  Out of twenty considered benchmarks, five had heap shapes that did not scale during the run
Devised a measure to quantify the heap shape scalability
  Idealized Trace Utilization
Proposed, prototyped and evaluated two approaches to improve the tracing scalability
  Reshaping with Shortcuts appears to be more promising than Tracing from Speculative Roots

Thank You!