MISE: Providing Performance Predictability in Shared Main Memory Systems
Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, Onur Mutlu

Main Memory Interference is a Problem
[Figure: multiple cores sharing a single main memory]

Unpredictable Application Slowdowns
[Charts: slowdown of leslie3d (core 0) when run with gcc (core 1) vs. when run with mcf (core 1); the slowdowns differ substantially depending on the co-runner]
An application's performance depends on which application it is running with.

Need for Predictable Performance

There is a need for predictable performance
- When multiple applications share resources
- Especially if some applications require performance guarantees

Example 1: In mobile systems
- Interactive applications run with non-interactive applications
- Need to guarantee performance for interactive applications

Example 2: In server systems
- Different users' jobs are consolidated onto the same server
- Need to provide bounded slowdowns to critical jobs

Our Goal: Predictable performance in the presence of memory interference

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Slowdown: Definition

Slowdown = Performance Alone / Performance Shared

Key Observation 1

For a memory-bound application, Performance ∝ Memory request service rate

[Chart: Normalized Performance vs. Normalized Request Service Rate for omnetpp, mcf, and astar on an Intel Core i7 (4 cores, 8.5 GB/s memory bandwidth); performance scales with request service rate]

Substituting request service rate for performance:
Slowdown = Request Service Rate Alone / Request Service Rate Shared

Key Observation 2

Request Service Rate Alone (RSRAlone) of an application can be estimated by giving the application the highest priority in accessing memory.
Highest priority → Little interference (almost as if the application were run alone)

Key Observation 2

[Figure: request buffer state, service order, and service time (in time units) for three cases: 1. the application run alone, 2. run with another application, 3. run with another application but given highest priority]

Memory Interference-induced Slowdown Estimation (MISE) model for memory-bound applications:

Slowdown = Request Service Rate Alone (RSRAlone) / Request Service Rate Shared (RSRShared)

Key Observation 3

Memory-bound application: execution consists of a compute phase and a memory phase.
[Figure: request timelines with no interference vs. with interference]
Memory phase slowdown dominates overall slowdown.

Key Observation 3

Non-memory-bound application: execution consists of a compute phase and a memory phase.
Only the memory fraction (α) slows down with interference.
[Figure: request timelines with no interference vs. with interference; only the memory phase stretches]

Memory Interference-induced Slowdown Estimation (MISE) model for non-memory-bound applications:

Slowdown = (1 − α) + α × (RSRAlone / RSRShared)

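To make the two model equations concrete, here is a minimal sketch, assuming RSRAlone, RSRShared, and the memory phase fraction α have already been obtained as described on the following slides; the function name and signature are illustrative, not from the paper.

```python
def mise_slowdown(rsr_alone: float, rsr_shared: float, alpha: float) -> float:
    """MISE slowdown estimate for one application.

    rsr_alone  -- estimated Request Service Rate Alone (RSRAlone)
    rsr_shared -- measured Request Service Rate Shared (RSRShared)
    alpha      -- memory phase fraction (1.0 for a fully memory-bound application)
    """
    # The compute phase (fraction 1 - alpha) is assumed unaffected by interference;
    # only the memory phase stretches by the factor RSRAlone / RSRShared.
    return (1.0 - alpha) + alpha * (rsr_alone / rsr_shared)
```

With alpha = 1 this reduces to the memory-bound form, RSRAlone / RSRShared.
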
Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Interval Based Operation

[Figure: time divided into successive intervals]
During each interval: measure RSRShared and α, and estimate RSRAlone.
At the end of each interval: estimate slowdown.

Measuring RSRShared and α

Request Service Rate Shared (RSRShared)
- Per-core counter to track the number of requests serviced
- At the end of each interval:
  RSRShared = Number of Requests Serviced / Interval Length

Memory Phase Fraction (α)
- Count the number of stall cycles at the core
- Compute the fraction of cycles stalled for memory

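A minimal bookkeeping sketch of these per-interval measurements; the class and field names are illustrative, not the actual hardware counter names.

```python
class CoreCounters:
    """Per-core counters, reset at every interval boundary (names illustrative)."""

    def __init__(self):
        self.requests_serviced = 0    # requests from this core serviced by memory
        self.memory_stall_cycles = 0  # cycles the core stalled waiting for memory
        self.interval_cycles = 0      # total cycles elapsed in the current interval

    def end_of_interval(self):
        # RSRShared = number of requests serviced / interval length
        rsr_shared = self.requests_serviced / self.interval_cycles
        # alpha = fraction of the interval the core spent stalled on memory
        alpha = self.memory_stall_cycles / self.interval_cycles
        # Reset for the next interval
        self.requests_serviced = 0
        self.memory_stall_cycles = 0
        self.interval_cycles = 0
        return rsr_shared, alpha
```
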
Estimating Request Service Rate Alone (RSRAlone)

Goal: Estimate RSRAlone
How: Periodically give each application the highest priority in accessing memory

- Divide each interval into shorter epochs
- At the beginning of each epoch, the memory controller randomly picks an application as the highest priority application
- At the end of an interval, for each application, estimate:
  RSRAlone = Number of Requests During High Priority Epochs / Number of Cycles Application Given High Priority

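A minimal sketch of the epoch-based estimation, assuming per-application counters that are updated only while that application holds the highest priority; names are illustrative.

```python
import random

def pick_epoch_priority(applications: list) -> str:
    """At each epoch boundary, randomly choose the highest-priority application."""
    return random.choice(applications)

class HighPriorityCounters:
    """Per-application counters updated only during its highest-priority epochs
    (field names illustrative)."""

    def __init__(self):
        self.high_prio_requests = 0  # requests serviced while holding highest priority
        self.high_prio_cycles = 0    # cycles during which highest priority was held

    def rsr_alone_estimate(self) -> float:
        # First-cut estimate; residual interference is accounted for on a later slide.
        return self.high_prio_requests / self.high_prio_cycles
```
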
Inaccuracy in Estimating RSRAlone

Even when an application has the highest priority, it still experiences some interference.
[Figure: request buffer state and service order over time for an application given highest priority; the extra service time relative to running alone constitutes interference cycles]

Accounting for Interference in RSRAlone Estimation

Solution: Determine and remove interference cycles from the RSRAlone calculation:
RSRAlone = Number of Requests During High Priority Epochs / (Number of Cycles Application Given High Priority − Interference Cycles)

A cycle is an interference cycle if
- a request from the highest priority application is waiting in the request buffer, and
- another application's request was issued previously

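A minimal sketch of the corrected estimate with interference-cycle accounting; the per-cycle hook and field names are illustrative assumptions.

```python
class HighPriorityCountersCorrected:
    """Counters for highest-priority epochs, with interference-cycle accounting
    (field names illustrative)."""

    def __init__(self):
        self.high_prio_requests = 0
        self.high_prio_cycles = 0
        self.interference_cycles = 0

    def tick(self, own_request_waiting: bool, other_request_outstanding: bool):
        """Called once per memory cycle while this application has highest priority."""
        self.high_prio_cycles += 1
        # Interference cycle: a request from the highest-priority application is
        # waiting in the request buffer while a previously issued request from
        # another application is still being serviced.
        if own_request_waiting and other_request_outstanding:
            self.interference_cycles += 1

    def rsr_alone_estimate(self) -> float:
        # Remove interference cycles from the denominator
        return self.high_prio_requests / (self.high_prio_cycles - self.interference_cycles)
```
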
Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

MISE Model: Putting it All Together

[Figure: interval-based operation repeated over time]
During each interval: measure RSRShared and α, and estimate RSRAlone.
At the end of each interval: estimate slowdown using the MISE model.

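Tying the measured quantities together, here is a minimal end-of-interval sketch of the full estimation; the counter names and the example values at the bottom are made up for illustration, not measured data.

```python
def estimate_slowdowns(per_app_counters: dict, interval_cycles: int) -> dict:
    """End-of-interval slowdown estimates from raw counters (names illustrative):
      req_shared -- requests serviced during the interval
      stall      -- cycles the core stalled on memory
      hp_req     -- requests serviced during highest-priority epochs
      hp_cyc     -- cycles spent holding highest priority
      intf       -- interference cycles during those epochs
    """
    slowdowns = {}
    for app, c in per_app_counters.items():
        rsr_shared = c["req_shared"] / interval_cycles
        alpha = c["stall"] / interval_cycles
        rsr_alone = c["hp_req"] / (c["hp_cyc"] - c["intf"])
        slowdowns[app] = (1 - alpha) + alpha * (rsr_alone / rsr_shared)
    return slowdowns

# Example usage with made-up counter values:
counters = {"appA": {"req_shared": 40_000, "stall": 700_000,
                     "hp_req": 1_200, "hp_cyc": 20_000, "intf": 2_000}}
print(estimate_slowdowns(counters, interval_cycles=1_000_000))  # ~1.47
```
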
Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Previous Work on Slowdown Estimation

Previous work on slowdown estimation:
- STFM (Stall Time Fair Memory) Scheduling [Mutlu+, MICRO '07]
- FST (Fairness via Source Throttling) [Ebrahimi+, ASPLOS '10]
- Per-thread Cycle Accounting [Du Bois+, HiPEAC '13]

Basic Idea:
Slowdown = Stall Time Shared / Stall Time Alone
Stall Time Shared is easy to measure; Stall Time Alone is hard to estimate.
Count the number of cycles the application receives interference to estimate Stall Time Alone.

Two Major Advantages of MISE Over STFM

Advantage 1:
- STFM estimates alone performance while an application is receiving interference → Hard
- MISE estimates alone performance while giving an application the highest priority → Easier

Advantage 2:
- STFM does not take into account the compute phase for non-memory-bound applications
- MISE accounts for the compute phase → Better accuracy

Methodology

Configuration of our simulated system:
- 4 cores
- 1 channel, 8 banks/channel
- DDR3-1066 DRAM
- 512 KB private cache per core

Workloads:
- SPEC CPU2006
- 300 multiprogrammed workloads

Quantitative Comparison

[Chart: slowdown over time (million cycles) for the SPEC CPU2006 application leslie3d; actual slowdown compared against the STFM and MISE estimates]

Comparison to STFM

[Charts: actual, STFM-estimated, and MISE-estimated slowdowns over time for cactusADM, GemsFDTD, soplex, wrf, calculix, and povray]

Average error of MISE: 8.2%
Average error of STFM: 29.4%
(across 300 workloads)

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Providing "Soft" Slowdown Guarantees

Goal
1. Ensure QoS-critical applications meet a prescribed slowdown bound
2. Maximize system performance for other applications

Basic Idea
- Allocate just enough bandwidth to the QoS-critical application
- Assign the remaining bandwidth to other applications

MISE-QoS: Mechanism to Provide Soft QoS

- Assign an initial bandwidth allocation to the QoS-critical application
- Estimate the slowdown of the QoS-critical application using the MISE model
- After every N intervals:
  - If slowdown > bound B + ε, increase the bandwidth allocation
  - If slowdown < bound B − ε, decrease the bandwidth allocation
- When the slowdown bound is not met for N intervals:
  - Notify the OS so it can migrate/de-schedule jobs

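A minimal sketch of this control loop, invoked once every N intervals; the adjustment step size and the [0, 1] clamping are illustrative assumptions, not values from the paper.

```python
def adjust_qos_allocation(slowdown: float, bound_b: float, epsilon: float,
                          allocation: float, step: float = 0.05) -> float:
    """Adjust the QoS-critical application's bandwidth allocation.
    allocation is the fraction of memory bandwidth reserved for it."""
    if slowdown > bound_b + epsilon:
        # Slowing down more than the bound allows: reserve more bandwidth.
        allocation = min(1.0, allocation + step)
    elif slowdown < bound_b - epsilon:
        # Comfortably within the bound: release bandwidth to the other applications.
        allocation = max(0.0, allocation - step)
    return allocation
```
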
Methodology

- Each application (25 applications in total) considered as the QoS-critical application
- Run with 12 sets of co-runners of different memory intensities
- Total of 300 multiprogrammed workloads
- Each workload run with 10 slowdown bound values
- Baseline memory scheduling mechanism:
  - Always prioritize the QoS-critical application [Iyer+, SIGMETRICS 2007]
  - Other applications' requests scheduled in FR-FCFS order [Zuravleff+, US Patent 1997; Rixner+, ISCA 2000]

A Look at One Workload

[Chart: slowdowns of leslie3d (QoS-critical) and hmmer, lbm, omnetpp (non-QoS-critical) under AlwaysPrioritize and MISE-QoS-10/1 through MISE-QoS-10/9, with slowdown bounds of 10, 3.33, and 2 marked]

MISE is effective in
1. meeting the slowdown bound for the QoS-critical application
2. improving performance of the non-QoS-critical applications

Effectiveness of MISE in Enforcing QoS

Across 3000 data points:

                      Predicted Met    Predicted Not Met
QoS Bound Met             78.8%               2.1%
QoS Bound Not Met          2.2%              16.9%

MISE-QoS meets the bound for 80.9% of workloads.
MISE-QoS correctly predicts whether or not the bound is met for 95.7% of workloads.
AlwaysPrioritize meets the bound for 83% of workloads.

Performance of Non-QoS-Critical Applications

[Chart: harmonic speedup of non-QoS-critical applications vs. number of memory-intensive applications in the workload, for AlwaysPrioritize and MISE-QoS-10/1 through MISE-QoS-10/9]

When the slowdown bound is 10/3, MISE-QoS improves system performance by 10% on average.
Performance is higher when the bound is loose.

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Other Results in the Paper

- Sensitivity to model parameters
  - Robust across different values of model parameters
- Comparison of STFM and MISE models in enforcing soft slowdown guarantees
  - MISE significantly more effective in enforcing guarantees
- Minimizing maximum slowdown
  - MISE improves fairness across several system configurations

Summary

- Uncontrolled memory interference slows down applications unpredictably
- Goal: Estimate and control slowdowns
- Key contribution
  - MISE: An accurate slowdown estimation model
  - Average error of MISE: 8.2%
- Key Idea
  - Request Service Rate is a proxy for performance
  - Request Service Rate Alone is estimated by giving an application the highest priority in accessing memory
- Leverage slowdown estimates to control slowdowns
  - Providing soft slowdown guarantees
  - Minimizing maximum slowdown

Thank You

MISE: Providing Performance Predictability in Shared Main Memory Systems
Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, Onur Mutlu

Backup Slides

Case Study with Two QoS-Critical Applications

Two comparison points:
- Always prioritize both applications
- Prioritize each application 50% of the time

[Chart: slowdowns of astar, mcf, leslie3d, and mcf under AlwaysPrioritize, EqualBandwidth, and MISE-QoS-10/1 through MISE-QoS-10/5]

MISE-QoS can achieve a lower slowdown bound for both QoS-critical applications.
MISE-QoS provides much lower slowdowns for the non-QoS-critical applications.

Minimizing Maximum Slowdown

Goal
- Minimize the maximum slowdown experienced by any application

Basic Idea
- Assign more memory bandwidth to the more slowed-down application

Mechanism
- Memory controller tracks
  - Slowdown bound B
  - Bandwidth allocation of all applications
- Different components of the mechanism
  - Bandwidth redistribution policy
  - Modifying the target bound
  - Communicating the target bound to the OS periodically

Bandwidth Redistribution

At the end of each interval:
- Group applications into two clusters
  - Cluster 1: applications that meet the bound
  - Cluster 2: applications that do not meet the bound
- Steal a small amount of bandwidth from each application in cluster 1 and allocate it to the applications in cluster 2

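A minimal sketch of this redistribution step, assuming per-application slowdown estimates and bandwidth allocations expressed as fractions of total bandwidth; the amount stolen per application is an illustrative assumption.

```python
def redistribute_bandwidth(slowdowns: dict, allocations: dict,
                           bound_b: float, steal: float = 0.01) -> dict:
    """Move a small amount of bandwidth from applications that meet the bound
    (cluster 1) to applications that do not (cluster 2)."""
    cluster1 = [app for app, s in slowdowns.items() if s <= bound_b]
    cluster2 = [app for app, s in slowdowns.items() if s > bound_b]
    if not cluster1 or not cluster2:
        return allocations
    reclaimed = 0.0
    for app in cluster1:
        taken = min(steal, allocations[app])
        allocations[app] -= taken
        reclaimed += taken
    # Split the reclaimed bandwidth evenly among applications missing the bound.
    for app in cluster2:
        allocations[app] += reclaimed / len(cluster2)
    return allocations
```
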
Modifying Target Bound

- If bound B has been met for the past N intervals:
  - The bound can be made more aggressive
  - Set the bound higher than the slowdown of the most slowed-down application
- If bound B has not been met for the past N intervals by more than half the applications:
  - The bound should be relaxed
  - Set the bound to the slowdown of the most slowed-down application

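A minimal sketch of the bound adjustment, assuming per-application slowdown estimates and two history flags computed over the last N intervals; the tightening margin is an illustrative assumption, not a value from the paper.

```python
def modify_target_bound(bound_b: float, slowdowns: dict,
                        all_met_for_n: bool, majority_missed_for_n: bool) -> float:
    """Adjust the target bound B based on recent history."""
    max_slowdown = max(slowdowns.values())
    if all_met_for_n:
        # Every application met B for the past N intervals: tighten the bound to
        # just above the current maximum slowdown (5% margin is illustrative).
        bound_b = 1.05 * max_slowdown
    elif majority_missed_for_n:
        # More than half the applications missed B: relax the bound to the
        # slowdown of the most slowed-down application.
        bound_b = max_slowdown
    return bound_b
```
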
Results: Harmonic Speedup

[Chart: harmonic speedup of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair at 4, 8, and 16 cores]

Results: Maximum Slowdown

[Chart: maximum slowdown of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair at core counts of 4, 8, and 16]

Sensitivity to Memory Intensity

[Chart: maximum slowdown of FRFCFS, ATLAS, TCM, STFM, and MISE-Fair as the percentage of memory-intensive applications in the workload varies from 0 to 100, plus the average]

MISE's Implementation Cost

1. Per-core counters worth 20 bytes
   - Request Service Rate Shared
   - Request Service Rate Alone
     - 1 counter for the number of high priority epoch requests
     - 1 counter for the number of high priority epoch cycles
     - 1 counter for interference cycles
   - Memory phase fraction (α)
2. Register for the current bandwidth allocation: 4 bytes
3. Logic for prioritizing an application in each epoch