MISE: Providing Performance Predictability in Shared Main Memory Systems
Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, Onur Mutlu

Main Memory Interference is a Problem
Multiple cores share a single main memory, and their requests interfere with one another.
[Figure: four cores connected to one shared main memory]

Unpredictable Application Slowdowns
[Figure: slowdowns of leslie3d (core 0) when run with gcc (core 1) versus when run with mcf (core 1)]
An application's performance depends on which application it is running with.

Need for Predictable Performance
There is a need for predictable performance when multiple applications share resources, especially if some applications require performance guarantees.
- Example 1: In mobile systems, interactive applications run with non-interactive applications. We need to guarantee performance for the interactive applications.
- Example 2: In server systems, different users' jobs are consolidated onto the same server. We need to provide bounded slowdowns to critical jobs.
Our Goal: Predictable performance in the presence of memory interference.

Outline
1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Slowdown: Definition
Slowdown = Performance Alone / Performance Shared

Key Observation 1
For a memory-bound application, performance is proportional to its memory request service rate.
[Figure: normalized performance vs. normalized request service rate for omnetpp, mcf, and astar on an Intel Core i7 with 4 cores and 8.5 GB/s memory bandwidth; the relationship is close to linear]
Therefore, for a memory-bound application:
Slowdown = Performance Alone / Performance Shared = Request Service Rate Alone / Request Service Rate Shared
Alone performance is harder to measure while the application shares the system; request service rates are easy to measure.

Key Observation 2
Request Service Rate Alone (RSRAlone) of an application can be estimated by giving the application the highest priority in accessing memory.
Highest priority -> little interference (almost as if the application were run alone).

Key Observation 2 (illustration)
[Figure: request buffer state and service order in three cases: (1) the application runs alone and its three requests are serviced back to back; (2) it runs with another application and its requests are delayed by the other application's requests; (3) it runs with another application but has the highest priority, so it is serviced almost as if it were running alone]

MISE Model for Memory-Bound Applications
Memory Interference-induced Slowdown Estimation (MISE) model for memory-bound applications:
Slowdown = RSRAlone / RSRShared

Key Observation 3
A memory-bound application alternates between a compute phase and a memory phase, and spends most of its time in the memory phase.
[Figure: request timelines with and without interference]
Memory phase slowdown dominates overall slowdown.

Key Observation 3 (continued)
A non-memory-bound application spends a fraction α of its time in the memory phase; only this memory fraction slows down with interference.
MISE model for non-memory-bound applications:
Slowdown = (1 - α) + α × (RSRAlone / RSRShared)
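To make the two model equations concrete, here is a minimal numeric sketch (illustrative code, not part of the talk); the memory-bound model is simply the special case α = 1.

```python
def mise_slowdown(rsr_alone: float, rsr_shared: float, alpha: float = 1.0) -> float:
    """MISE slowdown estimate.

    rsr_alone  -- estimated request service rate when run alone
    rsr_shared -- measured request service rate when sharing memory
    alpha      -- fraction of time spent in the memory phase
                  (alpha = 1.0 gives the memory-bound model RSRAlone / RSRShared)
    """
    if rsr_shared <= 0:
        raise ValueError("shared request service rate must be positive")
    return (1.0 - alpha) + alpha * (rsr_alone / rsr_shared)

# Example: an application that is 80% memory-bound and whose request service
# rate halves under sharing is estimated to slow down by a factor of 1.8.
print(mise_slowdown(rsr_alone=0.10, rsr_shared=0.05, alpha=0.8))  # 1.8
```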
Outline (repeated)

Interval Based Operation
Execution time is divided into intervals. During each interval, measure RSRShared and estimate RSRAlone; at the end of the interval, estimate slowdown. Repeat for every interval.

Measuring RSRShared and α
- Request Service Rate Shared (RSRShared): a per-core counter tracks the number of requests serviced. At the end of each interval:
  RSRShared = Number of Requests Serviced / Interval Length
- Memory Phase Fraction (α): count the number of stall cycles at the core and compute the fraction of cycles stalled for memory.

Estimating Request Service Rate Alone (RSRAlone)
Goal: estimate RSRAlone. How: periodically give each application the highest priority in accessing memory.
- Divide each interval into shorter epochs.
- At the beginning of each epoch, the memory controller randomly picks one application as the highest priority application.
- At the end of an interval, for each application, estimate:
  RSRAlone = Number of Requests During High Priority Epochs / Number of Cycles Application Given High Priority

Inaccuracy in Estimating RSRAlone
Even when an application has the highest priority, it still experiences some interference.
[Figure: request buffer state and service order showing that a request issued earlier by another application delays the highest-priority application's request; the wasted cycles are interference cycles]

Accounting for Interference in RSRAlone Estimation
Solution: determine interference cycles and remove them from the RSRAlone calculation:
RSRAlone = Number of Requests During High Priority Epochs / (Number of Cycles Application Given High Priority - Interference Cycles)
A cycle is an interference cycle if a request from the highest priority application is waiting in the request buffer and another application's request was issued previously. (A counter-level sketch of this bookkeeping appears after the next two slides.)

Outline (repeated)

MISE Model: Putting it All Together
In each interval, measure RSRShared and estimate RSRAlone; at the end of the interval, plug them into the MISE model to estimate slowdown. Repeat every interval.
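The per-interval bookkeeping from the "Measuring RSRShared and α", "Estimating RSRAlone", and interference-accounting slides can be summarized in a short sketch. This is an illustrative software model of the counters, not the hardware design; the counter names and the epoch-selection helper are assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class CoreCounters:
    """Per-core counters described on the slides (reset at interval boundaries)."""
    requests_serviced: int = 0     # all requests serviced (for RSRShared)
    hp_epoch_requests: int = 0     # requests serviced during this core's high-priority epochs
    hp_epoch_cycles: int = 0       # cycles this core was given highest priority
    interference_cycles: int = 0   # HP cycles spent waiting behind another core's in-flight request
    memory_stall_cycles: int = 0   # cycles stalled on memory (for the phase fraction alpha)

def pick_high_priority(core_ids):
    """At each epoch boundary the controller randomly picks the highest-priority core."""
    return random.choice(core_ids)

def end_of_interval(c: CoreCounters, interval_cycles: int):
    """Turn the raw counters into RSRShared, RSRAlone and alpha at an interval boundary.
    These three values feed the mise_slowdown() formula sketched earlier."""
    rsr_shared = c.requests_serviced / interval_cycles
    rsr_alone = c.hp_epoch_requests / max(1, c.hp_epoch_cycles - c.interference_cycles)
    alpha = c.memory_stall_cycles / interval_cycles
    return rsr_shared, rsr_alone, alpha
```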
Outline (repeated)

Previous Work on Slowdown Estimation
- STFM (Stall Time Fair Memory) Scheduling [Mutlu+, MICRO '07]
- FST (Fairness via Source Throttling) [Ebrahimi+, ASPLOS '10]
- Per-thread Cycle Accounting [Du Bois+, HiPEAC '13]
Basic Idea: Slowdown = Stall Time Alone / Stall Time Shared
Stall Time Shared is easy to measure, but Stall Time Alone is hard to estimate: these schemes count the number of cycles an application receives interference.

Two Major Advantages of MISE Over STFM
- Advantage 1: STFM estimates alone performance while an application is receiving interference (hard). MISE estimates alone performance while giving an application the highest priority (easier).
- Advantage 2: STFM does not take the compute phase into account for non-memory-bound applications. MISE accounts for the compute phase, giving better accuracy.

Methodology
Configuration of our simulated system:
- 4 cores
- 1 channel, 8 banks/channel
- DDR3-1066 DRAM
- 512 KB private cache per core
Workloads: SPEC CPU2006; 300 multiprogrammed workloads.

Quantitative Comparison
[Figure: actual slowdown of the SPEC CPU2006 application leslie3d versus the slowdowns estimated by STFM and MISE over 100 million cycles]

Comparison to STFM
[Figure: actual vs. estimated slowdowns for cactusADM, GemsFDTD, soplex, wrf, calculix, and povray over 100 million cycles]
Average error of MISE: 8.2%; average error of STFM: 29.4% (across 300 workloads).

Outline (repeated)

Providing "Soft" Slowdown Guarantees
Goals:
1. Ensure QoS-critical applications meet a prescribed slowdown bound.
2. Maximize system performance for the other applications.
Basic Idea: allocate just enough bandwidth to the QoS-critical application and assign the remaining bandwidth to the other applications.

MISE-QoS: Mechanism to Provide Soft QoS
- Assign an initial bandwidth allocation to the QoS-critical application.
- Estimate the slowdown of the QoS-critical application using the MISE model.
- After every N intervals:
  - If slowdown > bound B + ε, increase the bandwidth allocation.
  - If slowdown < bound B - ε, decrease the bandwidth allocation.
- When the slowdown bound is not met for N intervals, notify the OS so it can migrate or de-schedule jobs.
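A minimal sketch of the MISE-QoS control loop just described, assuming the bandwidth allocation is a fraction between 0 and 1 and that adjust() is invoked once every N intervals with the latest MISE slowdown estimate; the initial allocation, step size, miss limit, and notify_os hook are illustrative, not values from the paper.

```python
class MiseQosController:
    """Sketch of MISE-QoS bandwidth adjustment for one QoS-critical application."""

    def __init__(self, bound_b: float, epsilon: float = 0.1,
                 step: float = 0.05, miss_limit: int = 3):
        self.bound_b = bound_b        # target slowdown bound B
        self.epsilon = epsilon        # tolerance around the bound
        self.step = step              # how much bandwidth to add or remove at a time
        self.miss_limit = miss_limit  # consecutive misses before escalating to the OS
        self.allocation = 0.5         # initial bandwidth fraction (illustrative)
        self.misses = 0

    def adjust(self, slowdown_estimate: float, notify_os=lambda: None) -> float:
        """Called after every N intervals with the current MISE slowdown estimate."""
        if slowdown_estimate > self.bound_b + self.epsilon:
            self.allocation = min(1.0, self.allocation + self.step)
            self.misses += 1
        else:
            if slowdown_estimate < self.bound_b - self.epsilon:
                self.allocation = max(0.0, self.allocation - self.step)
            self.misses = 0
        if self.misses >= self.miss_limit:
            notify_os()               # bound repeatedly missed: let the OS migrate/de-schedule
            self.misses = 0
        return self.allocation
```

The remaining bandwidth (1 - allocation) would be shared by the non-QoS-critical applications, matching the "allocate just enough" idea on the previous slide.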
Methodology
- Each of the 25 applications is in turn considered the QoS-critical application.
- Each is run with 12 sets of co-runners of different memory intensities, for a total of 300 multiprogrammed workloads.
- Each workload is run with 10 slowdown bound values.
- Baseline memory scheduling mechanism: always prioritize the QoS-critical application [Iyer+, SIGMETRICS 2007]; other applications' requests are scheduled in FR-FCFS order [Zuravleff+, US Patent 1997; Rixner+, ISCA 2000].

A Look at One Workload
[Figure: slowdowns of leslie3d (QoS-critical) and hmmer, lbm, omnetpp (non-QoS-critical) under AlwaysPrioritize and MISE-QoS-10/1 through MISE-QoS-10/9, for slowdown bounds of 10, 3.33, and 2]
MISE is effective in (1) meeting the slowdown bound for the QoS-critical application and (2) improving the performance of the non-QoS-critical applications.

Effectiveness of MISE in Enforcing QoS
Across 3000 data points:
                       Predicted Met   Predicted Not Met
  QoS Bound Met        78.8%           2.1%
  QoS Bound Not Met    2.2%            16.9%
- MISE-QoS meets the bound for 80.9% of workloads.
- MISE-QoS correctly predicts whether or not the bound is met for 95.7% of workloads.
- AlwaysPrioritize meets the bound for 83% of workloads.

Performance of Non-QoS-Critical Applications
[Figure: harmonic speedup of the non-QoS-critical applications vs. number of memory-intensive applications, for AlwaysPrioritize and MISE-QoS-10/1 through MISE-QoS-10/9]
When the slowdown bound is 10/3, MISE-QoS improves system performance by 10% on average; the gain is higher when the bound is looser.

Outline (repeated)

Other Results in the Paper
- Sensitivity to model parameters: the model is robust across different parameter values.
- Comparison of the STFM and MISE models in enforcing soft slowdown guarantees: MISE is significantly more effective.
- Minimizing maximum slowdown: MISE improves fairness across several system configurations.

Summary
- Uncontrolled memory interference slows down applications unpredictably.
- Goal: estimate and control slowdowns.
- Key contribution: MISE, an accurate slowdown estimation model (average error: 8.2%).
- Key ideas: request service rate is a proxy for performance, and Request Service Rate Alone can be estimated by giving an application the highest priority in accessing memory.
- The slowdown estimates are leveraged to control slowdowns: providing soft slowdown guarantees and minimizing maximum slowdown.

Thank You

Backup Slides

Case Study with Two QoS-Critical Applications
Two comparison points: always prioritize both applications, or prioritize each application 50% of the time (EqualBandwidth).
[Figure: slowdowns of astar, mcf, leslie3d, and mcf under AlwaysPrioritize, EqualBandwidth, and MISE-QoS-10/1 through MISE-QoS-10/5]
MISE-QoS can achieve a lower slowdown bound for both QoS-critical applications and provides much lower slowdowns for the non-QoS-critical applications.

Minimizing Maximum Slowdown
Goal: minimize the maximum slowdown experienced by any application.
Basic Idea: assign more memory bandwidth to the more slowed-down application.
Mechanism: the memory controller tracks a slowdown bound B and the bandwidth allocation of all applications. The mechanism has three components: a bandwidth redistribution policy, modifying the target bound, and communicating the target bound to the OS periodically. (A sketch of the first two components follows after the next two slides.)

Bandwidth Redistribution
At the end of each interval:
- Group applications into two clusters: Cluster 1, applications that meet the bound; Cluster 2, applications that do not.
- Steal a small amount of bandwidth from each application in Cluster 1 and allocate it to the applications in Cluster 2.

Modifying Target Bound
- If bound B has been met for the past N intervals, the bound can be made more aggressive: set it just above the slowdown of the most slowed-down application (which tightens it, since the maximum slowdown is currently below B).
- If bound B has not been met for the past N intervals by more than half the applications, the bound should be relaxed: set it to the slowdown of the most slowed-down application.
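A rough sketch of the two policies above (bandwidth redistribution and target-bound adjustment), assuming per-application bandwidth shares that sum to 1; the steal amount and margin are illustrative parameters, not the paper's values.

```python
def redistribute_bandwidth(shares: dict, slowdowns: dict, bound_b: float,
                           steal: float = 0.02) -> dict:
    """Bandwidth redistribution at the end of an interval.

    shares    -- per-application bandwidth fractions, summing to 1
    slowdowns -- per-application MISE slowdown estimates
    """
    meets = [a for a in shares if slowdowns[a] <= bound_b]   # cluster 1
    misses = [a for a in shares if slowdowns[a] > bound_b]   # cluster 2
    if not meets or not misses:
        return dict(shares)
    new_shares = dict(shares)
    stolen = 0.0
    for a in meets:                   # take a small amount from each app that meets the bound
        taken = min(steal, new_shares[a])
        new_shares[a] -= taken
        stolen += taken
    for a in misses:                  # spread it over the apps that miss the bound
        new_shares[a] += stolen / len(misses)
    return new_shares

def update_target_bound(bound_b: float, slowdowns: dict,
                        met_for_n: bool, missed_by_majority_for_n: bool,
                        margin: float = 0.1) -> float:
    """Adjust the target bound as on the 'Modifying Target Bound' slide."""
    max_slowdown = max(slowdowns.values())
    if met_for_n:
        return max_slowdown + margin  # tighten: just above the current maximum slowdown
    if missed_by_majority_for_n:
        return max_slowdown           # relax: match the most slowed-down application
    return bound_b
```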
Results: Harmonic Speedup
[Figure: harmonic speedup of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair at 4, 8, and 16 cores]

Results: Maximum Slowdown
[Figure: maximum slowdown of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair at 4, 8, and 16 cores]

Sensitivity to Memory Intensity
[Figure: maximum slowdown of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair across workloads of different memory intensities (x-axis: 0, 25, 50, 75, 100, Avg)]

MISE's Implementation Cost
1. Per-core counters worth 20 bytes:
   - Request Service Rate Shared
   - Request Service Rate Alone: one counter for the number of high-priority-epoch requests, one for the number of high-priority-epoch cycles, and one for interference cycles
   - Memory phase fraction (α)
2. A register for the current bandwidth allocation: 4 bytes.
3. Logic for prioritizing an application in each epoch.
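As a sanity check on the quoted storage cost, the five per-core counters can be pictured as 4-byte fields; the 4-byte width is an assumption chosen to match the 20 bytes the slide lists.

```python
import struct

# Hypothetical packing of the per-core MISE state: requests serviced (RSRShared),
# high-priority-epoch requests, high-priority-epoch cycles, interference cycles,
# and memory stall cycles (for alpha). Field widths are illustrative assumptions.
PER_CORE_FORMAT = "<5I"        # five 4-byte counters -> 20 bytes per core
PER_CONTROLLER_FORMAT = "<I"   # current bandwidth allocation register -> 4 bytes

per_core_bytes = struct.calcsize(PER_CORE_FORMAT)          # 20
controller_bytes = struct.calcsize(PER_CONTROLLER_FORMAT)  # 4

def total_overhead(num_cores: int) -> int:
    """Total counter storage in bytes for a system with num_cores cores."""
    return num_cores * per_core_bytes + controller_bytes

print(total_overhead(4))  # the 4-core evaluated system -> 84 bytes of counter state
```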