MISE: Providing Performance Predictability in Shared Main Memory Systems
Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, Onur Mutlu

Main Memory Interference is a Problem
Multiple cores share a single main memory, and their requests interfere with one another.
[Figure: four cores connected to one shared main memory]

Unpredictable Application Slowdowns
[Figure: slowdowns of leslie3d (core 0) when run with gcc (core 1) versus when run with mcf (core 1)]
An application's performance depends on which application it is running with.

Need for Predictable Performance
There is a need for predictable performance when multiple applications share resources, especially if some applications require performance guarantees.
- Example 1: In mobile systems, interactive applications run with non-interactive applications. We need to guarantee performance for the interactive applications.
- Example 2: In server systems, different users' jobs are consolidated onto the same server. We need to provide bounded slowdowns to critical jobs.
Our Goal: Predictable performance in the presence of memory interference.

Outline
1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Slowdown: Definition
Slowdown = Performance Alone / Performance Shared

Key Observation 1
For a memory-bound application, performance is proportional to its memory request service rate.
[Figure: normalized performance vs. normalized request service rate for omnetpp, mcf, and astar on an Intel Core i7 with 4 cores and 8.5 GB/s memory bandwidth; the relationship is close to linear]
Therefore, for a memory-bound application:
Slowdown = Performance Alone / Performance Shared = Request Service Rate Alone / Request Service Rate Shared
Alone performance is harder to measure while the application shares the system; request service rates are easy to measure.

Key Observation 2
Request Service Rate Alone (RSRAlone) of an application can be estimated by giving the application the highest priority in accessing memory.
Highest priority -> little interference (almost as if the application were run alone).

Key Observation 2 (illustration)
[Figure: request buffer state and service order in three cases: (1) the application runs alone and its three requests are serviced back to back; (2) it runs with another application and its requests are delayed by the other application's requests; (3) it runs with another application but has the highest priority, so it is serviced almost as if it were running alone]

MISE Model for Memory-Bound Applications
Memory Interference-induced Slowdown Estimation (MISE) model for memory-bound applications:
Slowdown = RSRAlone / RSRShared

Key Observation 3
A memory-bound application alternates between a compute phase and a memory phase, and spends most of its time in the memory phase.
[Figure: request timelines with and without interference]
Memory phase slowdown dominates overall slowdown.

Key Observation 3 (continued)
A non-memory-bound application spends a fraction α of its time in the memory phase; only this memory fraction slows down with interference.
MISE model for non-memory-bound applications:
Slowdown = (1 - α) + α × (RSRAlone / RSRShared)
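To make the two model equations concrete, here is a minimal numeric sketch (illustrative code, not part of the talk); the memory-bound model is simply the special case α = 1.

```python
def mise_slowdown(rsr_alone: float, rsr_shared: float, alpha: float = 1.0) -> float:
    """MISE slowdown estimate.

    rsr_alone  -- estimated request service rate when run alone
    rsr_shared -- measured request service rate when sharing memory
    alpha      -- fraction of time spent in the memory phase
                  (alpha = 1.0 gives the memory-bound model RSRAlone / RSRShared)
    """
    if rsr_shared <= 0:
        raise ValueError("shared request service rate must be positive")
    return (1.0 - alpha) + alpha * (rsr_alone / rsr_shared)

# Example: an application that is 80% memory-bound and whose request service
# rate halves under sharing is estimated to slow down by a factor of 1.8.
print(mise_slowdown(rsr_alone=0.10, rsr_shared=0.05, alpha=0.8))  # 1.8
```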
Outline (repeated)

Interval Based Operation
Execution time is divided into intervals. During each interval, measure RSRShared and estimate RSRAlone; at the end of the interval, estimate slowdown. Repeat for every interval.

Measuring RSRShared and α
- Request Service Rate Shared (RSRShared): a per-core counter tracks the number of requests serviced. At the end of each interval:
  RSRShared = Number of Requests Serviced / Interval Length
- Memory Phase Fraction (α): count the number of stall cycles at the core and compute the fraction of cycles stalled for memory.

Estimating Request Service Rate Alone (RSRAlone)
Goal: estimate RSRAlone. How: periodically give each application the highest priority in accessing memory.
- Divide each interval into shorter epochs.
- At the beginning of each epoch, the memory controller randomly picks one application as the highest priority application.
- At the end of an interval, for each application, estimate:
  RSRAlone = Number of Requests During High Priority Epochs / Number of Cycles Application Given High Priority

Inaccuracy in Estimating RSRAlone
Even when an application has the highest priority, it still experiences some interference.
[Figure: request buffer state and service order showing that a request issued earlier by another application delays the highest-priority application's request; the wasted cycles are interference cycles]

Accounting for Interference in RSRAlone Estimation
Solution: determine interference cycles and remove them from the RSRAlone calculation:
RSRAlone = Number of Requests During High Priority Epochs / (Number of Cycles Application Given High Priority - Interference Cycles)
A cycle is an interference cycle if a request from the highest priority application is waiting in the request buffer and another application's request was issued previously. (A counter-level sketch of this bookkeeping appears after the next two slides.)

Outline (repeated)

MISE Model: Putting it All Together
In each interval, measure RSRShared and estimate RSRAlone; at the end of the interval, plug them into the MISE model to estimate slowdown. Repeat every interval.
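The per-interval bookkeeping from the "Measuring RSRShared and α", "Estimating RSRAlone", and interference-accounting slides can be summarized in a short sketch. This is an illustrative software model of the counters, not the hardware design; the counter names and the epoch-selection helper are assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class CoreCounters:
    """Per-core counters described on the slides (reset at interval boundaries)."""
    requests_serviced: int = 0     # all requests serviced (for RSRShared)
    hp_epoch_requests: int = 0     # requests serviced during this core's high-priority epochs
    hp_epoch_cycles: int = 0       # cycles this core was given highest priority
    interference_cycles: int = 0   # HP cycles spent waiting behind another core's in-flight request
    memory_stall_cycles: int = 0   # cycles stalled on memory (for the phase fraction alpha)

def pick_high_priority(core_ids):
    """At each epoch boundary the controller randomly picks the highest-priority core."""
    return random.choice(core_ids)

def end_of_interval(c: CoreCounters, interval_cycles: int):
    """Turn the raw counters into RSRShared, RSRAlone and alpha at an interval boundary.
    These three values feed the mise_slowdown() formula sketched earlier."""
    rsr_shared = c.requests_serviced / interval_cycles
    rsr_alone = c.hp_epoch_requests / max(1, c.hp_epoch_cycles - c.interference_cycles)
    alpha = c.memory_stall_cycles / interval_cycles
    return rsr_shared, rsr_alone, alpha
```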
Outline (repeated)

Previous Work on Slowdown Estimation
- STFM (Stall Time Fair Memory) Scheduling [Mutlu+, MICRO '07]
- FST (Fairness via Source Throttling) [Ebrahimi+, ASPLOS '10]
- Per-thread Cycle Accounting [Du Bois+, HiPEAC '13]
Basic Idea: Slowdown = Stall Time Alone / Stall Time Shared
Stall Time Shared is easy to measure, but Stall Time Alone is hard to estimate: these schemes count the number of cycles an application receives interference.

Two Major Advantages of MISE Over STFM
- Advantage 1: STFM estimates alone performance while an application is receiving interference (hard). MISE estimates alone performance while giving an application the highest priority (easier).
- Advantage 2: STFM does not take the compute phase into account for non-memory-bound applications. MISE accounts for the compute phase, giving better accuracy.

Methodology
Configuration of our simulated system:
- 4 cores
- 1 channel, 8 banks/channel
- DDR3-1066 DRAM
- 512 KB private cache per core
Workloads: SPEC CPU2006; 300 multiprogrammed workloads.

Quantitative Comparison
[Figure: actual slowdown of the SPEC CPU2006 application leslie3d versus the slowdowns estimated by STFM and MISE over 100 million cycles]

Comparison to STFM
[Figure: actual vs. estimated slowdowns for cactusADM, GemsFDTD, soplex, wrf, calculix, and povray over 100 million cycles]
Average error of MISE: 8.2%; average error of STFM: 29.4% (across 300 workloads).

Outline (repeated)

Providing "Soft" Slowdown Guarantees
Goals:
1. Ensure QoS-critical applications meet a prescribed slowdown bound.
2. Maximize system performance for the other applications.
Basic Idea: allocate just enough bandwidth to the QoS-critical application and assign the remaining bandwidth to the other applications.

MISE-QoS: Mechanism to Provide Soft QoS
- Assign an initial bandwidth allocation to the QoS-critical application.
- Estimate the slowdown of the QoS-critical application using the MISE model.
- After every N intervals:
  - If slowdown > bound B + ε, increase the bandwidth allocation.
  - If slowdown < bound B - ε, decrease the bandwidth allocation.
- When the slowdown bound is not met for N intervals, notify the OS so it can migrate or de-schedule jobs.
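A minimal sketch of the MISE-QoS control loop just described, assuming the bandwidth allocation is a fraction between 0 and 1 and that adjust() is invoked once every N intervals with the latest MISE slowdown estimate; the initial allocation, step size, miss limit, and notify_os hook are illustrative, not values from the paper.

```python
class MiseQosController:
    """Sketch of MISE-QoS bandwidth adjustment for one QoS-critical application."""

    def __init__(self, bound_b: float, epsilon: float = 0.1,
                 step: float = 0.05, miss_limit: int = 3):
        self.bound_b = bound_b        # target slowdown bound B
        self.epsilon = epsilon        # tolerance around the bound
        self.step = step              # how much bandwidth to add or remove at a time
        self.miss_limit = miss_limit  # consecutive misses before escalating to the OS
        self.allocation = 0.5         # initial bandwidth fraction (illustrative)
        self.misses = 0

    def adjust(self, slowdown_estimate: float, notify_os=lambda: None) -> float:
        """Called after every N intervals with the current MISE slowdown estimate."""
        if slowdown_estimate > self.bound_b + self.epsilon:
            self.allocation = min(1.0, self.allocation + self.step)
            self.misses += 1
        else:
            if slowdown_estimate < self.bound_b - self.epsilon:
                self.allocation = max(0.0, self.allocation - self.step)
            self.misses = 0
        if self.misses >= self.miss_limit:
            notify_os()               # bound repeatedly missed: let the OS migrate/de-schedule
            self.misses = 0
        return self.allocation
```

The remaining bandwidth (1 - allocation) would be shared by the non-QoS-critical applications, matching the "allocate just enough" idea on the previous slide.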
Methodology
- Each of the 25 applications is in turn considered the QoS-critical application.
- Each is run with 12 sets of co-runners of different memory intensities, for a total of 300 multiprogrammed workloads.
- Each workload is run with 10 slowdown bound values.
- Baseline memory scheduling mechanism: always prioritize the QoS-critical application [Iyer+, SIGMETRICS 2007]; other applications' requests are scheduled in FR-FCFS order [Zuravleff+, US Patent 1997; Rixner+, ISCA 2000].

A Look at One Workload
[Figure: slowdowns of leslie3d (QoS-critical) and hmmer, lbm, omnetpp (non-QoS-critical) under AlwaysPrioritize and MISE-QoS-10/1 through MISE-QoS-10/9, for slowdown bounds of 10, 3.33, and 2]
MISE is effective in (1) meeting the slowdown bound for the QoS-critical application and (2) improving the performance of the non-QoS-critical applications.

Effectiveness of MISE in Enforcing QoS
Across 3000 data points:
                       Predicted Met   Predicted Not Met
  QoS Bound Met        78.8%           2.1%
  QoS Bound Not Met    2.2%            16.9%
- MISE-QoS meets the bound for 80.9% of workloads.
- MISE-QoS correctly predicts whether or not the bound is met for 95.7% of workloads.
- AlwaysPrioritize meets the bound for 83% of workloads.

Performance of Non-QoS-Critical Applications
[Figure: harmonic speedup of the non-QoS-critical applications vs. number of memory-intensive applications, for AlwaysPrioritize and MISE-QoS-10/1 through MISE-QoS-10/9]
When the slowdown bound is 10/3, MISE-QoS improves system performance by 10% on average; the gain is higher when the bound is looser.

Outline (repeated)

Other Results in the Paper
- Sensitivity to model parameters: the model is robust across different parameter values.
- Comparison of the STFM and MISE models in enforcing soft slowdown guarantees: MISE is significantly more effective.
- Minimizing maximum slowdown: MISE improves fairness across several system configurations.

Summary
- Uncontrolled memory interference slows down applications unpredictably.
- Goal: estimate and control slowdowns.
- Key contribution: MISE, an accurate slowdown estimation model (average error: 8.2%).
- Key ideas: request service rate is a proxy for performance, and Request Service Rate Alone can be estimated by giving an application the highest priority in accessing memory.
- The slowdown estimates are leveraged to control slowdowns: providing soft slowdown guarantees and minimizing maximum slowdown.

Thank You

Backup Slides

Case Study with Two QoS-Critical Applications
Two comparison points: always prioritize both applications, or prioritize each application 50% of the time (EqualBandwidth).
[Figure: slowdowns of astar, mcf, leslie3d, and mcf under AlwaysPrioritize, EqualBandwidth, and MISE-QoS-10/1 through MISE-QoS-10/5]
MISE-QoS can achieve a lower slowdown bound for both QoS-critical applications and provides much lower slowdowns for the non-QoS-critical applications.

Minimizing Maximum Slowdown
Goal: minimize the maximum slowdown experienced by any application.
Basic Idea: assign more memory bandwidth to the more slowed-down application.
Mechanism: the memory controller tracks a slowdown bound B and the bandwidth allocation of all applications. The mechanism has three components: a bandwidth redistribution policy, modifying the target bound, and communicating the target bound to the OS periodically. (A sketch of the first two components follows after the next two slides.)

Bandwidth Redistribution
At the end of each interval:
- Group applications into two clusters: Cluster 1, applications that meet the bound; Cluster 2, applications that do not.
- Steal a small amount of bandwidth from each application in Cluster 1 and allocate it to the applications in Cluster 2.

Modifying Target Bound
- If bound B has been met for the past N intervals, the bound can be made more aggressive: set it just above the slowdown of the most slowed-down application (which tightens it, since the maximum slowdown is currently below B).
- If bound B has not been met for the past N intervals by more than half the applications, the bound should be relaxed: set it to the slowdown of the most slowed-down application.
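A rough sketch of the two policies above (bandwidth redistribution and target-bound adjustment), assuming per-application bandwidth shares that sum to 1; the steal amount and margin are illustrative parameters, not the paper's values.

```python
def redistribute_bandwidth(shares: dict, slowdowns: dict, bound_b: float,
                           steal: float = 0.02) -> dict:
    """Bandwidth redistribution at the end of an interval.

    shares    -- per-application bandwidth fractions, summing to 1
    slowdowns -- per-application MISE slowdown estimates
    """
    meets = [a for a in shares if slowdowns[a] <= bound_b]   # cluster 1
    misses = [a for a in shares if slowdowns[a] > bound_b]   # cluster 2
    if not meets or not misses:
        return dict(shares)
    new_shares = dict(shares)
    stolen = 0.0
    for a in meets:                   # take a small amount from each app that meets the bound
        taken = min(steal, new_shares[a])
        new_shares[a] -= taken
        stolen += taken
    for a in misses:                  # spread it over the apps that miss the bound
        new_shares[a] += stolen / len(misses)
    return new_shares

def update_target_bound(bound_b: float, slowdowns: dict,
                        met_for_n: bool, missed_by_majority_for_n: bool,
                        margin: float = 0.1) -> float:
    """Adjust the target bound as on the 'Modifying Target Bound' slide."""
    max_slowdown = max(slowdowns.values())
    if met_for_n:
        return max_slowdown + margin  # tighten: just above the current maximum slowdown
    if missed_by_majority_for_n:
        return max_slowdown           # relax: match the most slowed-down application
    return bound_b
```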
Results: Harmonic Speedup
[Figure: harmonic speedup of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair at 4, 8, and 16 cores]

Results: Maximum Slowdown
[Figure: maximum slowdown of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair at 4, 8, and 16 cores]

Sensitivity to Memory Intensity
[Figure: maximum slowdown of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair across workloads of different memory intensities (x-axis: 0, 25, 50, 75, 100, Avg)]

MISE's Implementation Cost
1. Per-core counters worth 20 bytes:
   - Request Service Rate Shared
   - Request Service Rate Alone: one counter for the number of high-priority-epoch requests, one for the number of high-priority-epoch cycles, and one for interference cycles
   - Memory phase fraction (α)
2. A register for the current bandwidth allocation: 4 bytes.
3. Logic for prioritizing an application in each epoch.
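As a sanity check on the quoted storage cost, the five per-core counters can be pictured as 4-byte fields; the 4-byte width is an assumption chosen to match the 20 bytes the slide lists.

```python
import struct

# Hypothetical packing of the per-core MISE state: requests serviced (RSRShared),
# high-priority-epoch requests, high-priority-epoch cycles, interference cycles,
# and memory stall cycles (for alpha). Field widths are illustrative assumptions.
PER_CORE_FORMAT = "<5I"        # five 4-byte counters -> 20 bytes per core
PER_CONTROLLER_FORMAT = "<I"   # current bandwidth allocation register -> 4 bytes

per_core_bytes = struct.calcsize(PER_CORE_FORMAT)          # 20
controller_bytes = struct.calcsize(PER_CONTROLLER_FORMAT)  # 4

def total_overhead(num_cores: int) -> int:
    """Total counter storage in bytes for a system with num_cores cores."""
    return num_cores * per_core_bytes + controller_bytes

print(total_overhead(4))  # the 4-core evaluated system -> 84 bytes of counter state
```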