2011 International Symposium on Performance Analysis of Systems and Software (ISPASS)
Characterization and Dynamic Mitigation of Intra-Application Cache Interference
Carole-Jean Wu and Margaret Martonosi
Princeton University
4/11/2011

Today’s CMP systems
[Figure: an operating system running App. 1, App. 2, App. 3, and App. 4.]

Within a single application, cache interference can stem from…
[Figure: App. 1 running on the operating system, with four sources of cache references.]
• HW prefetch requests
• TLB miss handling
• Other OS requests
• Application data loads/stores

Real-System LLC Miss Characterization
[Figure: stacked bars of LLC miss percentage (0% to 100%), split into "Application" and "Others [prefetching, page table walks, etc.]".]
More than 50% of LLC misses are due to prefetching, TLB miss handling, other OS references, etc.

Prior Work for Intra-Application Cache Interference
• System-induced Cache Interference
– Characterization indicates significant OS/user cache interference [Agarwal et al. TOC ’88] [Torrellas et al. ASPLOS ’92]
– Reduce TLB miss handling effects [Jacob, Mudge ASPLOS ’98] [Bhargava et al. ASPLOS ’08] [Barr, Cox, and Rixner ISCA ’10]
• Prefetch-induced Cache Interference
– Prefetch buffer/filter [Peir et al. ICS ’02] [Hur and Lin MICRO ’06]
– Replacement policies (prefetch bit per cache line) [Alameldeen and Wood ISCA ’07] [Lin et al. HPCA ’01]
– Prefetching algorithms [Ebrahimi et al. MICRO ’09] [Nesbit et al. ISCA ’07] [Iacobovici et al. ICS ’04]
But all require hardware modification.

Contributions of This Paper
1. Cache interference within an application is a problem
– Real-system characterization
– Detailed full-system simulation
2.
Dynamic management mechanisms
– System-aware cache management
– Real-system, real-time prefetch manager

Talk Outline
• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
– System-Aware Cache Management
– Real-System Dynamic Prefetch Manager
• Conclusion

Measurement Methodology
• Real-system infrastructure
– Intel Nehalem-based Core i7 (Bloomfield)
– perfmon2 to access hardware PMCs
• Full-system simulation
– Simics/GEMS
• Benchmarks
– SPEC CPU2006 benchmark suite

System-Mode Reference Breakdown
[Figure: stacked bars (0% to 100%) splitting system-mode references into page table walk references and other system-mode references.]
80% of system references are due to TLB miss handling (details in the paper).

Memory Reuse Characteristics Analysis for User References
[Figure: percentage of zero-reused cache lines among user-mode references for mcf, sphinx3, sjeng, bzip2, and Avg.; bars compare "zero-reused cache lines [baseline]" with "zero-reused cache lines [user only]".]
System cache lines destroy the good data locality of user lines when sharing the cache!

Memory Reuse Characteristics Analysis for System References
[Figure: percentage of zero-reused cache lines among system-mode references for mcf, sphinx3, sjeng, bzip2, and Avg.; bars compare "zero-reused cache lines [baseline]" with "zero-reused cache lines [system only]".]
The majority of system cache lines are not reused. Bypassing lines?

System-Aware Cache Management
[Figure sequence: one cache set drawn as a recency stack from MRU through MID to LRU, holding example lines such as 0xDADA, 0xEEAF, 0x1234, 0xDFAE, and 0xEEEA. Incoming references (Refs) are split into user and system references; user references enter at MRU, and the policy name indicates where system references are inserted. Policy labels on successive slides: SYS-LRUinsert, SYS-MIDinsert.]
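
As a rough illustration of the insertion policies named on these slides, the sketch below models one cache set as a recency-ordered list and places a missing line at MRU, MID, or LRU depending on whether the reference is a user or a system reference. The class name `CacheSet`, the `"baseline"` policy string, promotion to MRU on a hit, and the use of the middle index for MID are assumptions made for illustration; the slides do not specify these details.

```python
from typing import List


class CacheSet:
    """One cache set, ordered from MRU (index 0) to LRU (last index)."""

    def __init__(self, ways: int, policy: str = "baseline"):
        self.ways = ways
        self.policy = policy        # "baseline", "SYS-LRUinsert", or "SYS-MIDinsert"
        self.lines: List[int] = []  # line tags in recency order

    def _insert_index(self, is_system: bool) -> int:
        # User lines always enter at MRU; system lines depend on the policy.
        if not is_system or self.policy == "baseline":
            return 0
        if self.policy == "SYS-MIDinsert":
            return len(self.lines) // 2  # assumed definition of the MID position
        if self.policy == "SYS-LRUinsert":
            return len(self.lines)       # place at the LRU end
        raise ValueError(f"unknown policy: {self.policy}")

    def access(self, tag: int, is_system: bool) -> bool:
        """Return True on a hit. Hits are promoted to MRU (an assumption)."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.insert(0, tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()             # evict the current LRU line
        self.lines.insert(self._insert_index(is_system), tag)
        return False


# Usage: fill a 4-way set with user lines, then bring in one system line.
demo = CacheSet(ways=4, policy="SYS-LRUinsert")
for user_tag in (0xDADA, 0xEEAF, 0x1234, 0xDFAE):
    demo.access(user_tag, is_system=False)
demo.access(0xEEEA, is_system=True)
print([hex(tag) for tag in demo.lines])
```

Under SYS-LRUinsert in this model, the system line 0xEEEA sits at the LRU end and becomes the next eviction victim unless it is reused first, so it displaces at most one user line.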

System-Aware Cache Management
[Figure: a cache set's recency stack (MRU, MID, LRU) with user and system references; policy label: SYS-DYNAMIC.]
*Set sampling: DIP [Qureshi et al. ISCA ’07]

IPC Performance Improvement
[Figure: Aggr. IPC normalized to baseline (higher is better), y-axis from 0.8 to 1.3; series: SYS-LRUinsert, SYS-MIDinsert, SYS-DYNAMIC.]
SYS-DYNAMIC improves performance for ALL applications by as much as 10% (avg. of 3%).

Intra-application cache interference can also stem from hardware prefetching
• L1: instruction and streamer prefetchers
• Mid-Level Cache (MLC): spatial and streamer prefetchers

Intra-Application Interference Caused by Hardware Prefetching
[Figure: application LLC miss counts with the MLC prefetcher OFF, normalized to the system default (ALL prefetchers on), y-axis from 0 to 3.]
Fewer LLC misses for libquantum and sphinx3.

Dynamic Prefetch Management
• Use Nehalem’s Precise Event Based Sampling (PEBS)
• Sample the application instruction count periodically.
[Figure: timeline. RDTSC is read at t0, t1, and t2. K instructions run with the MLC prefetchers ON between t0 and t1, then K instructions run with them OFF between t1 and t2; an interval labeled N follows.]
    if (t2 - t1 > t1 - t0)
        Turn ON MLC prefetchers;
    else
        Turn OFF MLC prefetchers;

Dynamic Management Mitigating Prefetch-Induced LLC Interference
[Figure: application LLC miss counts normalized to the system default, y-axis from 0 to 3; series: Prefetchers On (System Default), Prefetchers Off, Dynamic Management.]
Dynamic modulation of MLC prefetchers outperforms static ON/OFF prefetch options.

Summary
• Dynamic System-Aware Cache Management
– Full-system evaluation (OS effects)
– Performance improvement by as much as 10% (on avg. 3%).
• Real-time Dynamic Prefetch Manager
– Real-system implementation on Nehalem PEBS
– 25% LLC miss count reduction: better performance, bandwidth and energy savings

Characterization and Dynamic Mitigation of Intra-Application Cache Interference
[Figure: App. 1 running on the operating system, with HW prefetch requests, TLB miss handling, other OS requests, and application data loads/stores.]
*Intra-application* cache interference from modern hardware prefetching and the OS influences application performance significantly!

Carole-Jean Wu and Margaret Martonosi
{carolewu, mrm}@princeton.edu
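
The decision rule on the Dynamic Prefetch Management slide can be sketched in a few lines. The code below is a minimal model, not the authors' implementation: it runs K instructions with the MLC prefetchers on, K with them off, compares the two elapsed times, and applies the faster setting. `FakeMachine`, `manage`, and `choose_prefetch_setting` are hypothetical names, the toy machine replaces real PEBS sampling and prefetcher control, and treating N as the number of instructions run under the chosen setting is an assumption about the slide's timeline.

```python
from typing import List


def choose_prefetch_setting(t0: int, t1: int, t2: int) -> bool:
    """Return True to turn the MLC prefetchers ON, False to turn them OFF.

    [t0, t1] covers K instructions with prefetchers ON,
    [t1, t2] covers K instructions with prefetchers OFF.
    """
    time_on = t1 - t0
    time_off = t2 - t1
    # Same test as the slide: keep prefetching only if the OFF interval was slower.
    return time_off > time_on


class FakeMachine:
    """Toy stand-in for hardware: cycles per instruction depend on prefetch state."""

    def __init__(self, cpi_on: float, cpi_off: float):
        self.cpi_on = cpi_on
        self.cpi_off = cpi_off
        self.prefetch_on = True
        self.tsc = 0  # simulated time-stamp counter

    def read_tsc(self) -> int:
        return self.tsc

    def set_prefetchers(self, enable: bool) -> None:
        self.prefetch_on = enable

    def run_instructions(self, count: int) -> None:
        cpi = self.cpi_on if self.prefetch_on else self.cpi_off
        self.tsc += int(count * cpi)


def manage(machine: FakeMachine, k: int, n: int, rounds: int) -> List[bool]:
    """Repeat: sample K instructions ON, K OFF, then run N with the chosen setting."""
    decisions = []
    for _ in range(rounds):
        machine.set_prefetchers(True)
        t0 = machine.read_tsc()
        machine.run_instructions(k)
        t1 = machine.read_tsc()
        machine.set_prefetchers(False)
        machine.run_instructions(k)
        t2 = machine.read_tsc()
        enable = choose_prefetch_setting(t0, t1, t2)
        machine.set_prefetchers(enable)
        machine.run_instructions(n)  # assumed meaning of the interval labeled N
        decisions.append(enable)
    return decisions


# Usage: a workload that runs slower with prefetching enabled.
print(manage(FakeMachine(cpi_on=2.0, cpi_off=1.0), k=1000, n=10000, rounds=3))
```

A tie between the two sample intervals falls to the `else` branch and turns the prefetchers off, matching the slide's pseudocode.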