Application Slowdown Model: Quantifying and Controlling Impact of Interference at Shared Caches and Main Memory Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, Onur Mutlu Carnegie Mellon University, Intel Labs, University of Virginia Impact of Shared Resource Interference Core Shared Cache Core Main Memory 6 5 4 3 2 1 0 Slowdown Core Slowdown Problem: Shared Resource Interference Provide high and predictable performance in the presence of shared resource interference 6 5 4 3 2 1 0 leslie3d gccgcc (core (core 0) (core 1) 1) Core Our Goal Our Approach leslie3d mcfmcf (core (core 0) (core 1) 1) 1. Build a model to estimate slowdowns 2. Leverage our model for slowdownaware resource management 1. High application slowdowns 2. Unpredictable application slowdowns Application Slowdown Model (ASM) Observation: Proxy for Performance For a memory bound application, Performance α Cache access rate Minimize memory bandwidth contention: Using priority (Subramanian et al., HPCA 2013) 1. Run alone 2.2 1.8 astar lbm bzip2 1.4 1.2 1 1.2 1.4 1.6 1.8 2 2.2 Shared Cache Access Rate Alone/ Shared Cache Access Rate Shared Slowdown= 2 1 Core Main Memory Shared Cache 2. Run with another application Time units Service order Request Buffer State 3 Main Memory 1 Slowdown= 3 Main Memory 1.6 Quantify shared cache capacity contention: Using auxiliary tag stores (Pomerene et al., 1989) Time units Service order Request Buffer State 2 Slowdown Challenge: Estimating Cache Access Rate Alone 2 1 Core Main Memory Performance Alone Performance Shared Cache Access Rate Alone (CARAlone) Cache Access Rate Shared (CARShared) Auxiliary Tag Store Time units Service order 3 2 1 Priority Auxiliary Tag Store Main Memory 3. Run with another application: highest priority Request Buffer State Main Memory Main Memory Auxiliary tag store counts #contention misses Highest priority Little interference (almost as if application were run alone) Enables estimation of miss service time Cache Contention Cycles = #Contention Misses x Average Miss Service Time From auxiliary tag store when given high priority Measured when given high priority Remove contention cycles when estimating CARAlone Average error of ASM: 10%; Average error of previous models: 30% Leveraging the Application Slowdown Model Coordinated Resource Allocation Schemes Shared Cache Main Memory Core Core Previous work: Reduce miss counts; Our proposal: Reduce slowdowns Shared Cache Core Slowdown-aware memory bandwidth partitioning Shared Cache Main Memory Core Allocation memory bandwidth proportional to slowdowns Main Memory Core 16-core system 100 workloads Fairness (Lower is better) Core Core 11 0.35 10 0.3 9 0.25 Performance Core 8 7 6 0 1 2 Number of Channels 3.5 TCM+UCP 3 PARBS+UCP ASM-Cache-Mem 0.1 4 4 FRFCFS+UCP 0.15 0.05 • Cache allocation with the goal of meeting a slowdown bound • Allocate just enough cache space to critical application • Allocate remaining cache space to other applications FRFCFS-NoPart 0.2 5 Providing Slowdown Guarantees 1 2 Number of Channels Slowdown Slowdown-aware cache capacity partitioning Naive-QoS 2.5 ASM-QoS-2.5 2 ASM-QoS-3 1.5 ASM-QoS-3.5 ASM-QoS-4 1 0.5 Significant fairness benefits across different channel counts 0 h264ref mcf sphinx3 soplex