380C lecture 19 • Where are we & where we are going

• Where are we & where we are going
  – Managed languages
    • Dynamic compilation
    • Inlining
    • Garbage collection
  – Opportunity to improve data locality on-the-fly
  – Other opportunities?
  – Why you need to care about workloads
  – Alias analysis
  – Dependence analysis
  – Loop transformations
  – EDGE architectures
CS380C Lecture 19
Garbage Collection Advantage:
Improving Program Locality
Xianglong Huang (UT)
Stephen M Blackburn (ANU), Kathryn S McKinley (UT)
J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)
Today: Advanced Topics
• Generational Garbage Collection
• Copying objects is an opportunity
• Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM), “The Garbage Collection Advantage: Improving Program Locality,” OOPSLA 2004.
Motivation
• Memory gap problem: processor speed has far outpaced memory speed
• OO programs are becoming more popular
• OO programs exacerbate the memory gap problem
  – Automatic memory management
  – Pointer data structures
  – Many small methods
Goal: improve OO program locality
Allocation Mechanisms
Bump-Pointer
  • Fast (increment & bounds check)
  • Contemporaneous object locality
  • Can't incrementally free & reuse: must free en masse
Free-List
  • Slightly slower (consult list for fit)
  • Mystery locality
  • Can incrementally free & reuse cells
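The two allocation policies above can be sketched as follows. This is a minimal, hypothetical Java model, not MMTk's code: "addresses" are plain ints in a simulated region, the free list uses first fit, and freed cells are never coalesced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical teaching model: "addresses" are ints in a simulated region.
class BumpPointerAllocator {
    private int cursor;      // next free address
    private final int limit; // end of the region

    BumpPointerAllocator(int start, int limit) {
        this.cursor = start;
        this.limit = limit;
    }

    /** Returns the new cell's address, or -1 if the region is exhausted. */
    int alloc(int bytes) {
        int next = cursor + bytes;   // increment...
        if (next > limit) return -1; // ...and bounds check
        int result = cursor;
        cursor = next;
        return result;
    }

    /** No per-object free: the whole region is reclaimed en masse. */
    void reset(int start) {
        cursor = start;
    }
}

class FreeListAllocator {
    /** A free cell covering [addr, addr + size). */
    private static final class Cell {
        final int addr;
        final int size;

        Cell(int addr, int size) {
            this.addr = addr;
            this.size = size;
        }
    }

    private final List<Cell> freeCells = new ArrayList<>();

    FreeListAllocator(int start, int limit) {
        freeCells.add(new Cell(start, limit - start));
    }

    /** First fit: consult the list for a cell that is large enough. */
    int alloc(int bytes) {
        for (int i = 0; i < freeCells.size(); i++) {
            Cell c = freeCells.get(i);
            if (c.size < bytes) continue;
            if (c.size == bytes) {
                freeCells.remove(i);                                         // exact fit
            } else {
                freeCells.set(i, new Cell(c.addr + bytes, c.size - bytes)); // split the cell
            }
            return c.addr;
        }
        return -1;
    }

    /** Cells can be freed individually and reused (no coalescing in this sketch). */
    void free(int addr, int bytes) {
        freeCells.add(0, new Cell(addr, bytes));
    }
}
```

With the free list, allocating 16 bytes twice, freeing the first cell, and allocating 16 bytes again returns address 0, so the new object lands next to an older neighbor; this is one way the "mystery locality" arises. The bump pointer always places consecutively allocated objects side by side, but can only reclaim space by resetting the whole region.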
State-of-the-Art Throughput: Copying Generational GC
[Figure: the heap divided into a 'nursery' and an 'older generation']
• Requirements
  – Write barrier to track inter-generation pointers
    • Remembered sets (remsets), cards
  – Copy reserve
• Advantages
  – Minimizes copying of older objects
  – Compacts long-lived objects
• Problems
  – Not very incremental
  – The very youngest objects are always copied
  – What order should the GC use to copy objects?
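The write-barrier requirement can be illustrated with a minimal sketch. It assumes an object-granularity remembered set (real collectors may instead record individual slots or mark cards), and every class and method name here is hypothetical, not a Jikes RVM or MMTk API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical object model: each object knows which generation it lives in.
class HeapObject {
    final boolean inNursery;
    final HeapObject[] fields;

    HeapObject(boolean inNursery, int fieldCount) {
        this.inNursery = inNursery;
        this.fields = new HeapObject[fieldCount];
    }
}

class GenerationalHeap {
    /** Remembered set: older-generation objects holding pointers into the nursery. */
    final Set<HeapObject> remembered = Collections.newSetFromMap(new IdentityHashMap<>());

    /** Write barrier: in a real VM the compiler inserts this at every pointer store. */
    void writeField(HeapObject src, int index, HeapObject target) {
        src.fields[index] = target;
        // Only old -> young pointers need recording; young objects are
        // traced anyway when the nursery is collected.
        if (!src.inNursery && target != null && target.inNursery) {
            remembered.add(src);
        }
    }

    /** A nursery collection starts from the usual roots plus the remembered set,
     *  so it never has to scan the whole older generation. */
    List<HeapObject> nurseryRoots(List<HeapObject> stackRoots) {
        List<HeapObject> roots = new ArrayList<>(stackRoots);
        roots.addAll(remembered);
        return roots;
    }
}
```

The barrier's cost is a few comparisons per pointer store; in exchange, each nursery collection touches only the nursery plus the remembered objects, which is why the older generation is rarely copied.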
Opportunity
• A generational copying garbage collector reorders objects at runtime
Copying of Linked Objects
[Figure: the same seven linked objects copied in breadth-first order, in depth-first order, and with Online Object Reordering, giving three different memory layouts]
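The difference between the two static traversal orders can be sketched in a few lines. This hypothetical Java model copies a seven-object tree and records the to-space order; it ignores forwarding pointers (a tree has no shared objects) and uses an explicit queue where a Cheney-style collector would use to-space itself as the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical two-field object; ids label the objects level by level.
class TreeObj {
    final int id;
    final TreeObj left, right;

    TreeObj(int id, TreeObj left, TreeObj right) {
        this.id = id;
        this.left = left;
        this.right = right;
    }
}

class CopyOrder {
    /** Breadth-first: copied objects are scanned in FIFO order. */
    static List<Integer> breadthFirst(TreeObj root) {
        List<Integer> toSpace = new ArrayList<>(); // ids in to-space address order
        Deque<TreeObj> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            TreeObj o = queue.poll();
            toSpace.add(o.id);                        // "copy" o to the next address
            if (o.left != null) queue.add(o.left);
            if (o.right != null) queue.add(o.right);
        }
        return toSpace;
    }

    /** Depth-first: finish a child's whole subtree before its sibling. */
    static List<Integer> depthFirst(TreeObj root) {
        List<Integer> toSpace = new ArrayList<>();
        Deque<TreeObj> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            TreeObj o = stack.pop();
            toSpace.add(o.id);
            if (o.right != null) stack.push(o.right); // pushed first, so copied later
            if (o.left != null) stack.push(o.left);
        }
        return toSpace;
    }

    /** Seven objects: 1 -> (2, 3), 2 -> (4, 5), 3 -> (6, 7). */
    static TreeObj sampleTree() {
        return new TreeObj(1,
                new TreeObj(2, new TreeObj(4, null, null), new TreeObj(5, null, null)),
                new TreeObj(3, new TreeObj(6, null, null), new TreeObj(7, null, null)));
    }
}
```

Breadth-first yields 1, 2, 3, 4, 5, 6, 7; depth-first yields 1, 2, 4, 5, 3, 6, 7, so object 4 sits right after its parent's sibling in one layout and right after its parent in the other. Which layout helps depends on which pointers the program actually follows.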
Outline
• Motivation
• Online Object Reordering (OOR)
• Methodology
• Experimental Results
• Conclusion
Cache Performance Matters
[Figure: _213_javac total cycles (in billions) across several L1/L2 cache configurations, including idealized 'perfect' caches]
Online Object Reordering
• Where are the cache misses?
• How to identify hot field accesses at
runtime?
• How to reorder the objects?
Where Are The Cache Misses?
• Heap structure:
[Figure: heap layout showing VM objects, stack, older generation, and nursery (not to scale)]
Where Are The Cache Misses?
[Figure: _209_db, total accesses (in millions) to the nursery, older generation, stack, and VM objects, each split into L2 hits and L2 misses]
Where Are The Cache Misses?
• Two opportunities to reorder objects
in the older generation
– Promote nursery objects
– Full heap collection
How to Find Hot Fields?
• Runtime info (intercept every read)?
• Compiler analysis?
• Runtime information + compiler
analysis
Key: Low overhead estimation
Which Classes Need Reordering?
Step 1: Compiler analysis
  – Excludes cold basic blocks
  – Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
  – Marks the field accesses in hot methods as hot
Key: Low overhead estimation
Example: Compiler Analysis
Method Foo {
  Class A a;
  try {
    …=a.b;      // Hot BB: collect access info
    …
  }
  catch(Exception e){
    …a.c        // Cold BB: ignore
  }
}
Compiler access list:
  1. A.b
  2. ….
  ….
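Step 1 can be sketched as a pass over a method's basic blocks. The `BasicBlock` type and the `"Class.field"` string encoding are assumptions made for illustration; the slide does not say how the optimizing compiler decides that a block is cold, so here it is simply a flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical IR: each basic block is flagged cold or not and lists the fields it reads.
class BasicBlock {
    final boolean cold;            // e.g., an exception handler
    final List<String> fieldReads; // "Class.field", e.g. "A.b"

    BasicBlock(boolean cold, List<String> fieldReads) {
        this.cold = cold;
        this.fieldReads = fieldReads;
    }
}

class AccessAnalysis {
    /** Builds a method's access list from its non-cold basic blocks. */
    static List<String> accessList(List<BasicBlock> method) {
        List<String> accesses = new ArrayList<>();
        for (BasicBlock bb : method) {
            if (bb.cold) continue; // ignore cold blocks such as catch handlers
            for (String field : bb.fieldReads) {
                if (!accesses.contains(field)) accesses.add(field); // keep first-seen order
            }
        }
        return accesses;
    }
}
```

For `Foo`, the try block reads `A.b` and the catch block reads `A.c`; only `A.b` reaches the access list. The analysis is static and cheap, but it only says which fields a method *can* touch, not how often.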
Example: Adaptive Sampling
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  }
  catch(Exception e){
    …a.c
  }
}
Adaptive sampling: Foo is hot
Foo accesses:
  1. A.b
  2. ….
  ….
⇒ A.b is hot
[Figure: type information for A (fields c, ….., b) and B (fields c, b); A's entry for b is marked hot]
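Step 2 can be sketched as a registry that counts timer samples per method and, once a method crosses a threshold, marks every field in its access list as hot for the owning class. The threshold, the string encodings, and all names here are hypothetical; Jikes RVM's adaptive system has its own, different interfaces.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical registry tying timer samples to compiler access lists.
class HotFieldRegistry {
    private final int hotThreshold; // samples before a method counts as hot (assumed knob)
    private final Map<String, Integer> samples = new HashMap<>();          // method -> samples
    private final Map<String, List<String>> accessLists = new HashMap<>(); // method -> "Class.field" list
    private final Map<String, Set<String>> hotFields = new HashMap<>();    // class -> hot field names

    HotFieldRegistry(int hotThreshold) {
        this.hotThreshold = hotThreshold;
    }

    /** Recorded by the compiler analysis (Step 1). */
    void registerAccessList(String method, List<String> accesses) {
        accessLists.put(method, accesses);
    }

    /** Called on each timer tick with the method that was executing. */
    void sample(String method) {
        int n = samples.merge(method, 1, Integer::sum);
        if (n == hotThreshold) markHot(method); // fires once, when the method becomes hot
    }

    private void markHot(String method) {
        for (String access : accessLists.getOrDefault(method, List.of())) {
            int dot = access.indexOf('.');
            String cls = access.substring(0, dot);
            hotFields.computeIfAbsent(cls, k -> new HashSet<>()).add(access.substring(dot + 1));
        }
    }

    /** What the GC would consult in the class's type information when copying. */
    boolean isHot(String cls, String field) {
        return hotFields.getOrDefault(cls, Set.of()).contains(field);
    }
}
```

Sampling keeps the cost proportional to the timer rate rather than to the number of field reads, which is the "low overhead estimation" the earlier slides emphasize. In this sketch, an access list registered after its method is already hot is not picked up; a fuller version would check hotness at registration time too.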
Copying of Linked Objects
[Figure: guided by the type information, Online Object Reordering copies the seven linked objects into a hot space and a cold space]
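One way a collector can use hot-field information is sketched below, under stated assumptions: copying is breadth-first by default, but whenever an object is copied, the targets of its hot fields are copied immediately after it, so hot parent/child pairs end up adjacent. This is a simplified illustration, not necessarily the exact OOR algorithm; it does not model the separate hot and cold spaces in the figure, its recursion can grow deep on long hot chains, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical object: a type name plus named pointer fields in declaration order.
class OorObj {
    final int id;
    final String type;
    final Map<String, OorObj> fields = new LinkedHashMap<>();

    OorObj(int id, String type) {
        this.id = id;
        this.type = type;
    }
}

class HotFieldFirstCopier {
    private final Set<String> hotFields;                     // "Type.field", e.g. from sampling
    private final List<Integer> toSpace = new ArrayList<>(); // ids in copy order
    private final Deque<OorObj> scanQueue = new ArrayDeque<>();
    private final Set<OorObj> copied = Collections.newSetFromMap(new IdentityHashMap<>());

    HotFieldFirstCopier(Set<String> hotFields) {
        this.hotFields = hotFields;
    }

    /** Copies everything reachable from root; use one instance per collection. */
    List<Integer> copyFrom(OorObj root) {
        copy(root);
        while (!scanQueue.isEmpty()) {
            OorObj o = scanQueue.poll();
            for (Map.Entry<String, OorObj> f : o.fields.entrySet()) {
                if (!isHot(o, f.getKey())) copy(f.getValue()); // cold fields: breadth-first
            }
        }
        return toSpace;
    }

    /** Copies o, then immediately copies the targets of its hot fields. */
    private void copy(OorObj o) {
        if (o == null || !copied.add(o)) return; // null, or already copied (forwarded)
        toSpace.add(o.id);
        scanQueue.add(o);
        for (Map.Entry<String, OorObj> f : o.fields.entrySet()) {
            if (isHot(o, f.getKey())) copy(f.getValue());
        }
    }

    private boolean isHot(OorObj o, String field) {
        return hotFields.contains(o.type + "." + field);
    }
}
```

In a seven-object tree of type `A` where each inner object has fields `c` and `b` (1 → 2, 3; 2 → 4, 5; 3 → 6, 7) and only `A.b` is hot, the copy order is 1, 3, 7, 2, 5, 6, 4: the hot chain 1 → 3 → 7 is contiguous. With no hot fields, the same code reduces to plain breadth-first order 1, 2, ..., 7.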
OOR System Overview
[Figure: OOR system overview inside Jikes RVM: source code, baseline compiler, optimizing compiler, adaptive sampling of hot methods, executing code with input/output, an access info database, and a GC that copies objects. Labeled edges include 'Adds Entries', 'Register Hot Field Accesses', 'Look Up', 'Advice', 'Affects', and 'Improves Locality'; the legend distinguishes Jikes RVM components from OOR additions]
Outline
• Motivation
• Online Object Reordering
• Methodology
• Experimental Results
• Conclusion
Methodology: Virtual Machine
• Jikes RVM
  – VM written in Java
  – High performance
  – Timer-based adaptive sampling
  – Dynamic optimization
• Experiment setup
  – Pseudo-adaptive
  – 2nd iteration [Eeckhout et al.]
Methodology: Memory Management
• Memory Management Toolkit (MMTk)
  – Allocators and garbage collectors
  – Multi-space heap
    • Boot image
    • Large object space (LOS)
    • Immortal space
• Experiment setup
  – Generational copying GC with a 4 MB bounded nursery
Overhead: OOR Analysis Only
Benchmark   Base execution time (s)   With OOR analysis only (s)   Overhead
jess                           4.39                         4.43      0.84%
jack                           5.79                         5.82      0.57%
raytrace                       4.63                         4.61     -0.59%
mtrt                           4.95                         4.99      0.70%
javac                         12.83                        12.70     -1.05%
compress                       8.56                         8.54      0.20%
pseudojbb                     13.39                        13.43      0.36%
db                            18.88                        18.88     -0.03%
antlr                          0.94                         0.91     -2.90%
hsqldb                       160.56                       158.46     -1.30%
ipsixql                       41.62                        42.43      1.93%
jython                        37.71                        37.16     -1.44%
ps-fun                       129.24                       128.04     -1.03%
Mean                                                                 -0.19%
Detailed Experiments
• Separate application and GC time
• Vary thresholds for method heat
• Vary thresholds for cold basic blocks
• Three architectures
  – x86, AMD, PowerPC
• x86 performance counters
  – DL1, trace cache, L2, DTLB, ITLB
Performance javac
Performance db
Performance jython
Any static ordering leaves you vulnerable to pathological cases.
Phase Changes
Related Work
• Evaluating static orderings [Wilson et al.]
  – Large performance variation
• Static profiling [Chilimbi et al., and others]
  – Lack of flexibility
• Instance-based object reordering [Chilimbi et al.]
  – Too expensive
Conclusion
• Static traversal orders vary in performance by up to 25%
• OOR improves on or matches the best static ordering
• OOR has very low overhead
• The past predicts the future
380C
• Where are we & where we are going
  – Managed languages
    • Dynamic compilation
    • Inlining
    • Garbage collection
  – Why you need to care about workloads & methodology
  – Alias analysis
  – Dependence analysis
  – Loop transformations
  – EDGE architectures
• Read: Blackburn et al., “Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century,” Communications of the ACM, 51(8): 83-89, August 2008.