Whose Cache Line Is It Anyway? - University of British Columbia

Whose Cache Line Is It Anyway?
Operating System Support for Live Detection and Repair of False Sharing
Mihir Nanavati, Mark Spear, Nathan Taylor, Shriram Rajagopalan,
Dutch T. Meyer, William Aiello, and Andrew Warfield
University of British Columbia
2
3
Mondriaan Memory Protection [ASPLOS ’02, SOSP ’05]
4
Byte-granularity, software-only remapping
5
False Sharing
6
7
[Diagram: system architecture. Labels: Control VM (Dom0), Target System, Xen, Memory, Hardware]
8
Dynamic Detection and Mitigation of False Sharing
9
10
[Diagram: threads T1 and T2 access a shared cache line. Labels: Write, Read 0x300, Write 0x308; Cache and Main Memory with lines at 0x300 and 0x340]
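The addresses on this slide can be checked with a little arithmetic. The sketch below assumes 64-byte cache lines (a common size, not stated on the slide); the names `CACHE_LINE_SIZE`, `cache_line_base`, and `same_cache_line` are illustrative only.

```python
CACHE_LINE_SIZE = 64  # bytes; an assumption, not taken from the slides


def cache_line_base(addr: int, line_size: int = CACHE_LINE_SIZE) -> int:
    """Return the address of the first byte of the cache line holding addr."""
    # line_size must be a power of two for the mask to be valid.
    return addr & ~(line_size - 1)


def same_cache_line(a: int, b: int, line_size: int = CACHE_LINE_SIZE) -> bool:
    """True if both addresses fall inside the same cache line."""
    return cache_line_base(a, line_size) == cache_line_base(b, line_size)


if __name__ == "__main__":
    for addr in (0x300, 0x308, 0x340):
        print(hex(addr), "is in the line starting at", hex(cache_line_base(addr)))
    # A read of 0x300 by one thread and a write of 0x308 by another touch
    # different bytes but the same line, which is the false sharing case.
    print("0x300 and 0x308 share a line:", same_cache_line(0x300, 0x308))
```

Under this assumption 0x300 and 0x308 map to the same line, while 0x340 starts the next one.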
11
[Diagram: a C structure laid out against a cache line. Variants shown: With Padding, With Allocator Metadata]
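The effect of padding on layout can be sketched with Python's `ctypes`, which follows the platform's C layout rules. This is an illustration only: the structures `Counters` and `PaddedCounters` are hypothetical, 64-byte cache lines are assumed, and the structure is assumed to start on a cache line boundary.

```python
import ctypes

CACHE_LINE_SIZE = 64  # bytes; an assumption, not taken from the slides


class Counters(ctypes.Structure):
    """Two per-thread counters packed side by side (hypothetical example)."""
    _fields_ = [("a", ctypes.c_uint64), ("b", ctypes.c_uint64)]


class PaddedCounters(ctypes.Structure):
    """Same counters, with padding so that b starts on the next cache line."""
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("pad", ctypes.c_char * (CACHE_LINE_SIZE - ctypes.sizeof(ctypes.c_uint64))),
        ("b", ctypes.c_uint64),
    ]


def fields_share_line(struct_type, first: str, second: str) -> bool:
    """True if two fields land in the same cache line.

    Assumes the structure itself begins at a cache line boundary.
    """
    off1 = getattr(struct_type, first).offset
    off2 = getattr(struct_type, second).offset
    return off1 // CACHE_LINE_SIZE == off2 // CACHE_LINE_SIZE


if __name__ == "__main__":
    print("unpadded a/b share a line:", fields_share_line(Counters, "a", "b"))
    print("padded a/b share a line:", fields_share_line(PaddedCounters, "a", "b"))
```

The padded layout trades memory for isolation: each counter occupies its own line, so writes by one thread no longer invalidate the other thread's copy.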
12
[Figure: execution time in seconds (0 to 40) versus number of cores (1 to 8). Legend: Serial, Parallel, Regular (FS), Source Fixed]
16
[Figure: the execution time versus number of cores plot (legend: Serial, Parallel, Regular (FS), Source Fixed), annotated "7.5x"]
Linux Kernel [OSDI ’10], JVM [Dice, 2012],
Software Transactional Memory [HPCA ’06]
17
Dynamic Detection and Mitigation of False Sharing
18
Modify access locations
Modify access frequency
Sheriff [OOPSLA ’11]
19
20
T1 T2
Isolated Page
Underlay Page
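One way to picture byte-granularity remapping is as a table of rules that redirect selected byte ranges of a page to an isolated page, while every other byte stays on the underlay page. The sketch below is a toy model under that assumption, not the system's actual mechanism; `RemapRule`, `translate`, the 4 KiB page size, and the choice to keep the same offset inside the isolated page are all hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumed page size


@dataclass(frozen=True)
class RemapRule:
    """Redirect bytes [start, end) of an underlay page to an isolated page."""
    start: int          # first byte offset within the page (inclusive)
    end: int            # last byte offset within the page (exclusive)
    isolated_page: int  # page number of the isolated page (hypothetical)


def translate(addr: int, rules: dict) -> int:
    """Return the address an access should actually use.

    rules maps an underlay page number to a list of RemapRule objects.
    Bytes covered by a rule go to the isolated page at the same offset;
    all other bytes stay on the underlay page.
    """
    page, offset = divmod(addr, PAGE_SIZE)
    for rule in rules.get(page, ()):
        if rule.start <= offset < rule.end:
            return rule.isolated_page * PAGE_SIZE + offset
    return addr


if __name__ == "__main__":
    # Move the 8 bytes written by T2 (0x308..0x30f) away from T1's data.
    rules = {0: [RemapRule(start=0x308, end=0x310, isolated_page=7)]}
    print(hex(translate(0x300, rules)))  # stays on the underlay page
    print(hex(translate(0x308, rules)))  # redirected to the isolated page
```

In this model the two threads end up touching different pages, and therefore different cache lines, without any change to the program's source.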
21
Dynamic Detection and Mitigation of False Sharing
22
Persistent, high-frequency false sharing
23
Very Fast and Imprecise
Fast and Somewhat Precise
Slow and Precise
24
Performance Counters: Does contention exist?
Log Page Reads: What pages are involved in the contention?
Instruction Emulation: What are the byte ranges being accessed?
Log-Analysis: Does this signify false sharing?
Rules for remapper
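The final question, whether logged accesses signify false sharing, can be sketched as a small analysis over an access log. This is a simplified illustration, not the system's actual rules: the log format `(thread, address, size, is_write)`, the function `classify_lines`, and the 64-byte line size are assumptions.

```python
from collections import defaultdict

CACHE_LINE_SIZE = 64  # bytes; an assumption, not taken from the slides


def classify_lines(accesses):
    """Classify each cache line seen in an access log.

    accesses: iterable of (thread_id, addr, size, is_write) tuples
    (a hypothetical log format). Returns {line_base: verdict}.
    """
    touched = defaultdict(lambda: defaultdict(set))  # line -> thread -> bytes
    written = defaultdict(lambda: defaultdict(set))  # line -> thread -> bytes
    for thread, addr, size, is_write in accesses:
        for byte in range(addr, addr + size):
            line = byte // CACHE_LINE_SIZE * CACHE_LINE_SIZE
            touched[line][thread].add(byte)
            if is_write:
                written[line][thread].add(byte)

    verdicts = {}
    for line, by_thread in touched.items():
        writers = written[line]
        # Contention needs at least two threads and at least one writer.
        if len(by_thread) < 2 or not writers:
            verdicts[line] = "uncontended"
            continue
        # True sharing: a byte written by one thread is also used by another.
        overlap = any(
            writers[w] & by_thread[t]
            for w in list(writers)
            for t in by_thread
            if t != w
        )
        verdicts[line] = "true sharing" if overlap else "false sharing"
    return verdicts


if __name__ == "__main__":
    log = [
        (1, 0x300, 8, False),  # T1 reads 0x300
        (2, 0x308, 8, True),   # T2 writes 0x308, same line, disjoint bytes
    ]
    for line, verdict in classify_lines(log).items():
        print(hex(line), verdict)
```

Lines judged "false sharing" are the candidates for remapping rules, since their threads never touch each other's bytes.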
25
Dynamic Detection and Mitigation of False Sharing
26
T1 T2
Isolated Page
Underlay Page
27
Don’t be Evil
28
Fault Driven Redirection
29
Original Code
Code Cache
31
Catch all accesses via data path
Avoid code trampolines
Amortize page fault cost
32
“Know When You are Beaten”
33
T1 T2
Isolated Page
Underlay Page
34
Evaluation
35
[Figure: progress (million records, 0 to 600) versus time (ms, 0 to 6000). Labels: Version with false sharing under Plastic, Source-fixed Version, Coherence Invalidations. Annotations: "Remappings Established", "110 M/sec", "160 M/sec"]
36
[Figure: normalized performance (0 to 1), Regular versus w/Plastic, for CCBench, Phoenix, and Parsec. Annotations: "5.4x", "3.6x", "1.4x"]
37
Low overhead runtime detection
Byte-granularity remapping
Speedup of up to 5.4x
38
Performance Optimizations
Security Enhancements
39
40