Whose Cache Line Is It Anyway? Operating System Support for Live Detection and Repair of False Sharing
Mihir Nanavati, Mark Spear, Nathan Taylor, Shriram Rajagopalan, Dutch T. Meyer, William Aiello, and Andrew Warfield
University of British Columbia

Slide 3: Mondriaan Memory Protection [ASPLOS ’02, SOSP ’05]
Slide 4: Byte-granularity, software-only remapping
Slide 5: False Sharing
Slide 7: [Diagram labels: Control VM (Dom0) + Target System; Xen; Memory Hardware]
Slide 8: Dynamic Detection and Mitigation of False Sharing
Slide 10: [Diagram labels: T1, T2; Write, Read 0x300; Write 0x308; Cache; 0x300, 0x340; Main Memory]
Slide 11: [Diagram labels: Cache Line; C Structure; With Padding; With Allocator Metadata]
Slides 12-16: [Chart, built up over five slides: Time (s), 0 to 40, against No. of Cores, 1 to 8; labels: Serial, Parallel, Regular (FS), Source Fixed; slide 16 adds the annotation 7.5x]
Slide 16: Linux Kernel [OSDI ’10], JVM [Dice, 2012], Software Transactional Memory [HPCA ’06]
Slide 17: Dynamic Detection and Mitigation of False Sharing
Slide 18: Modify access locations; Modify access frequency; Sheriff [OOPSLA ’11]
Slide 20: [Diagram labels: T1, T2; Isolated Page; Underlay Page]
Slide 21: Dynamic Detection and Mitigation of False Sharing
Slide 22: Persistent, high-frequency false sharing
Slide 23: Very Fast and Imprecise; Fast and Somewhat Precise; Slow and Precise
Slide 24:
  Performance Counters: Does contention exist?
  Log Page Reads: What pages are involved in the contention?
  Instruction Emulation: What are the byte ranges being accessed?
  Log-Analysis: Does this signify false sharing?
  Rules for remapper
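The last two questions of the detection pipeline above (which byte ranges are accessed, and whether that signifies false sharing) can be illustrated with a small log-analysis sketch. This is not the analysis presented in the talk: the log format, the 64-byte line size, the function names, and the access pattern are all assumptions made for illustration. Only the addresses 0x300, 0x308, and 0x340 are taken from the earlier diagram.

```python
from collections import defaultdict

CACHE_LINE_BYTES = 64  # assumed line size; the slides do not state one


def cache_line(addr):
    """Return the base address of the cache line that contains addr."""
    return addr - (addr % CACHE_LINE_BYTES)


def classify_lines(accesses):
    """Label each contended cache line as 'true' or 'false' sharing.

    accesses: iterable of (thread_id, address, size_in_bytes, is_write).
    A line is contended when two or more threads touch it and at least
    one access is a write. It is false sharing when no byte written by
    one thread is ever read or written by a different thread.
    """
    per_line = defaultdict(list)
    for thread, addr, size, is_write in accesses:
        for byte in range(addr, addr + size):
            per_line[cache_line(byte)].append((thread, byte, is_write))

    result = {}
    for line, entries in per_line.items():
        threads = {t for t, _, _ in entries}
        if len(threads) < 2 or not any(w for _, _, w in entries):
            continue  # private or read-only line: nothing to report
        touched = defaultdict(set)  # byte -> threads that read or wrote it
        written = set()             # bytes written by any thread
        for thread, byte, is_write in entries:
            touched[byte].add(thread)
            if is_write:
                written.add(byte)
        # A written byte touched by more than one thread is real sharing.
        overlap = any(len(touched[byte]) > 1 for byte in written)
        result[line] = "true" if overlap else "false"
    return result


# Hypothetical log: T1 writes 8 bytes at 0x300, T2 reads and writes 8 bytes
# at 0x308 (same line, disjoint bytes); both threads use the same 4 bytes
# at 0x340 (same line, same bytes).
example_log = [
    (1, 0x300, 8, True),
    (2, 0x308, 8, False),
    (2, 0x308, 8, True),
    (1, 0x340, 4, True),
    (2, 0x340, 4, False),
]

for line, kind in classify_lines(example_log).items():
    print(f"line {line:#x}: {kind} sharing")
```

Tracking individual bytes keeps the sketch short; a real analysis would more likely work on merged byte ranges and weight them by access frequency.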
Slide 25: Dynamic Detection and Mitigation of False Sharing
Slide 26: [Diagram labels: T1, T2; Isolated Page; Underlay Page]
Slide 27: Don’t be Evil
Slide 28: Fault Driven Redirection
Slides 29-30: [Diagram labels: Original Code; Code Cache]
Slide 31: Catch all accesses via data path; Avoid code trampolines; Amortize page fault cost
Slide 32: “Know When You are Beaten”
Slide 33: [Diagram labels: T1, T2; Isolated Page; Underlay Page]
Slide 34: Evaluation
Slide 35: [Chart: Progress (million records), 0 to 600, against Time (ms), 0 to 6000; labels: Version with false sharing under Plastic, Source-fixed Version, Remappings Established, Coherence Invalidations; annotations: 160 M/sec, 110 M/sec]
Slide 36: [Chart: Normalized Performance, 0 to 1; labels: Regular, w/Plastic, CCBench, Phoenix, Parsec; annotations: 5.4x, 3.6x, 1.4x]
Slide 37: Low overhead runtime detection; Byte-granularity remapping; Speedup of up to 5.4x
Slide 38: Performance Optimizations; Security Enhancements
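The idea of byte-granularity remapping named in the summary can be sketched as a toy address-translation model. The slides give only the labels "Isolated Page" and "Underlay Page", so the semantics here are an assumption: a remapped byte range is served from an isolated page, and every other byte stays on the underlay page. The class name, rule format, page size, line size, and addresses are hypothetical and do not describe the system presented in the talk.

```python
PAGE_SIZE = 4096       # assumed page size
CACHE_LINE_BYTES = 64  # assumed cache line size


class ByteRemapper:
    """Toy model: redirect chosen byte ranges of a page to isolated pages."""

    def __init__(self):
        # Each rule: (first byte, one past the last byte, isolated page base).
        self.rules = []

    def add_rule(self, start, end, isolated_base):
        """Serve addresses in [start, end) from the page at isolated_base."""
        if start >= end or start // PAGE_SIZE != (end - 1) // PAGE_SIZE:
            raise ValueError("a rule must cover a non-empty range within one page")
        self.rules.append((start, end, isolated_base))

    def resolve(self, addr):
        """Translate an address; unmapped bytes stay on the underlay page."""
        for start, end, isolated_base in self.rules:
            if start <= addr < end:
                # Keep the same offset within the page on the isolated page.
                return isolated_base + addr % PAGE_SIZE
        return addr


def same_cache_line(a, b):
    """True when both addresses fall in the same cache line."""
    return a // CACHE_LINE_BYTES == b // CACHE_LINE_BYTES


remapper = ByteRemapper()
t1_field, t2_field = 0x300, 0x308  # two fields that share one cache line

before = same_cache_line(remapper.resolve(t1_field), remapper.resolve(t2_field))

# Hypothetical rule: move T2's 8 bytes to an isolated page at 0x10000.
remapper.add_rule(0x308, 0x310, isolated_base=0x10000)

after = same_cache_line(remapper.resolve(t1_field), remapper.resolve(t2_field))
print("share a line before remapping:", before)
print("share a line after remapping:", after)
```

In this model the remapped bytes keep their page offset, so only the page changes. That is a simplification chosen for the example, not something the slides specify.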