[Figure: a multi-core chip with four cores (CORE 0 through CORE 3), each with a private L2 cache, sharing a single DRAM memory controller and a shared DRAM memory system of eight banks (DRAM Bank 0 through Bank 7); the shared controller is where unfairness arises.]
[Figure: organization of a DRAM bank — rows and columns of cells, a row buffer that holds the currently open row, and a column decoder that selects the requested data.]
A row-conflict memory access takes much longer than a row-hit access (100ns vs. 50ns)
Current controllers take advantage of the row buffer
Commonly used scheduling policy (FR-FCFS):
Row-hit first: Service row-hit memory accesses first
Oldest-first: Then service older accesses first
This scheduling policy aims to
Maximize DRAM throughput
But it opens the door to a new form of denial of service when multiple threads share the memory system
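To make the policy concrete, here is a minimal sketch of FR-FCFS request selection over a simple array-based request buffer; the struct and function names are illustrative, not taken from the paper or any real controller:

    struct request {
        int  row;       /* DRAM row this request targets       */
        long arrival;   /* arrival time, used for oldest-first */
        int  valid;     /* slot holds a pending request        */
    };

    /* Return the index of the request FR-FCFS would service next,
       given the row currently open in the row buffer, or -1 if the
       buffer holds no pending request. */
    int frfcfs_pick(const struct request *buf, int n, int open_row)
    {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (!buf[i].valid)
                continue;
            if (best < 0) { best = i; continue; }
            int i_hit = (buf[i].row == open_row);
            int b_hit = (buf[best].row == open_row);
            if (i_hit != b_hit) {
                if (i_hit) best = i;                    /* 1. row-hit first */
            } else if (buf[i].arrival < buf[best].arrival) {
                best = i;                               /* 2. oldest first  */
            }
        }
        return best;
    }

A thread whose requests keep targeting the already-open row keeps winning step (1); that behavior is exactly what the rest of the talk exploits.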
Thomas Moscibroda and Onur Mutlu
Computer Architecture Group
Microsoft Research
Outline
The Problem
Memory Performance Hogs
Preventing Memory Performance Hogs
Our Solution
Fair Memory Scheduling
Experimental Evaluation
Conclusions
Multiple threads share the DRAM controller
DRAM controllers are designed to maximize DRAM throughput
DRAM scheduling policies are thread-unaware and unfair
Row-hit first: unfairly prioritizes threads with high row buffer locality
Streaming threads
Threads that keep accessing the same row
Oldest-first: unfairly prioritizes memory-intensive threads
Vulnerable to denial of service attacks
A Memory Performance Hog (MPH) is a thread that exploits unfairness in the DRAM controller to deny memory service (for long time periods) to threads co-scheduled on the same chip
Can severely degrade other threads’ performance
While maintaining most of its own performance!
Easy to write an MPH
An MPH can be designed intentionally (malicious)
Or, unintentionally!
Implications in commodity multi-core systems:
widespread performance degradation and unpredictability
user discomfort and loss of productivity
vulnerable hardware could lose market share
STREAM
- Sequential memory access
- Very high row buffer locality (96% hit rate)
- Memory intensive
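A hypothetical sketch of such a streaming kernel (illustrative only; not the exact program from the paper):

    #include <stdlib.h>

    #define N (64 * 1024 * 1024)    /* large enough to defeat the L2 cache */

    /* Sequential walk: consecutive 64B cache blocks fall into the same 8KB
       DRAM row, so nearly every memory access hits the open row buffer. */
    void stream_like(void)
    {
        char *a = malloc(N);
        for (size_t i = 0; i < N; i += 64)
            a[i]++;
        free(a);
    }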
RDARRAY
- Random memory access
- Very low row buffer locality (3% hit rate)
- Similarly memory intensive
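A correspondingly hypothetical sketch of a random-access kernel (again illustrative, not the paper’s code):

    #include <stdlib.h>

    #define N (64 * 1024 * 1024)    /* same footprint as the streaming kernel */

    /* Random indices land in different DRAM rows (and banks), so most
       accesses are row conflicts rather than row hits. A simple linear
       congruential generator keeps the sketch self-contained. */
    void rdarray_like(void)
    {
        char *a = malloc(N);
        unsigned long long x = 12345;
        for (long i = 0; i < N / 64; i++) {
            x = x * 6364136223846793005ULL + 1442695040888963407ULL;
            a[(x >> 16) % N]++;
        }
        free(a);
    }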
[Figure: the DRAM request buffer and row buffer when STREAM and RDARRAY run together. Row size: 8KB, data size: 64B. STREAM’s sequential requests keep hitting the open row and are serviced first under row-hit-first scheduling, while RDARRAY’s row-conflict requests wait.]
[Figure: measured slowdowns when STREAM and RDARRAY run together — RDARRAY suffers a 2.82X slowdown while STREAM suffers only a 1.18X slowdown.]
Results on Intel Pentium D running Windows XP
(Similar results for Intel Core Duo and AMD Turion, and on Fedora)
Simulated configuration:
- 512KB private L2s
- 3-wide processors
- 128-entry instruction window
Preventing Memory Performance Hogs
MPHs cannot be prevented in software
Fundamentally hard to distinguish between malicious and unintentional MPHs
Software has no direct control over DRAM controller’s scheduling
OS/VMM can adjust applications’ virtual-to-physical page mappings
No control over the DRAM controller’s hardwired page-to-bank mappings
MPHs should be prevented in hardware
Hardware also cannot distinguish malicious vs unintentional
Goal: Contain and limit MPHs by providing fair memory scheduling
A thread’s DRAM performance depends on its inherent memory access behavior (e.g., its row-buffer locality)
Well-studied notions of fairness are inadequate
Fair scheduling cannot simply
Equalize memory bandwidth among threads
Unfairly penalizes threads with good bandwidth utilization
DRAM scheduling is not a pure bandwidth allocation problem
Unlike a network wire, DRAM is a stateful system
Equalize the latency or number of requests among threads
Unfairly penalizes threads with good row-buffer locality
Row-hit requests are less costly than row-conflict requests
A DRAM system is fair if it slows down each thread equally
Compared to when the thread is run alone
Experienced Latency = EL(i) = Cumulative DRAM latency of thread i running with other threads
Ideal Latency = IL(i) = Cumulative DRAM latency of thread i if it were run alone on the same hardware resources
DRAM-Slowdown(i) = EL(i) / IL(i)
The goal of a fair scheduling policy is to equalize DRAM-Slowdown(i) for all threads i
Considers inherent DRAM performance of each thread
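As a tiny illustration with made-up numbers: a thread whose requests accumulate 1200ns of DRAM latency when co-scheduled, but would accumulate only 600ns running alone, has a DRAM slowdown of 2.0.

    /* DRAM-Slowdown(i) = EL(i) / IL(i); the numbers below are invented
       purely to illustrate the definition. */
    double dram_slowdown(double experienced_ns, double ideal_ns)
    {
        return experienced_ns / ideal_ns;
    }
    /* dram_slowdown(1200.0, 600.0) == 2.0 */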
Our Solution: Fair Memory Scheduling
During each time interval (a configurable number of cycles), for each thread, the DRAM controller
Tracks EL (Experienced Latency)
Estimates IL (Ideal Latency)
At the beginning of a scheduling cycle, DRAM controller
Computes Slowdown = EL/IL for each thread
Computes unfairness index = MAX Slowdown / MIN Slowdown
If unfairness < the maximum tolerable unfairness threshold
Use baseline scheduling policy to maximize DRAM throughput
(1) row-hit first
(2) oldest-first
If unfairness ≥ the maximum tolerable unfairness threshold
Use fairness-oriented scheduling policy
(1) requests from thread with MAX Slowdown first (if any)
(2) row-hit first
(3) oldest-first
Maximizes DRAM throughput when fairness cannot be improved
Two parameters are set by system software and conveyed to the hardware via privileged instructions:
One sets the maximum tolerable unfairness
The other is a lever between short-term and long-term fairness
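A minimal sketch of this two-mode decision, assuming the per-thread EL and IL counters already exist; all names are illustrative, and max_tolerable_unfairness stands in for the threshold parameter set by system software:

    struct sched_request {
        int  thread;    /* issuing thread                 */
        int  row;       /* target DRAM row                */
        long arrival;   /* arrival time, for oldest-first */
        int  valid;
    };

    /* Row-hit first, then oldest first, optionally restricted to one
       thread (pass only_thread = -1 to consider every thread). */
    static int pick(const struct sched_request *buf, int n,
                    int open_row, int only_thread)
    {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (!buf[i].valid ||
                (only_thread >= 0 && buf[i].thread != only_thread))
                continue;
            if (best < 0) { best = i; continue; }
            int i_hit = (buf[i].row == open_row);
            int b_hit = (buf[best].row == open_row);
            if (i_hit != b_hit) { if (i_hit) best = i; }
            else if (buf[i].arrival < buf[best].arrival) best = i;
        }
        return best;
    }

    /* Pick the next request to service: baseline FR-FCFS while the
       unfairness index stays below the threshold, otherwise prioritize
       the thread with the largest slowdown. */
    int fair_pick(const struct sched_request *buf, int n, int open_row,
                  const double *el, const double *il, int nthreads,
                  double max_tolerable_unfairness)
    {
        double max_sd = 0.0, min_sd = 1e300;
        int slowest = -1;
        for (int t = 0; t < nthreads; t++) {
            double sd = el[t] / il[t];            /* DRAM-Slowdown(t) */
            if (sd > max_sd) { max_sd = sd; slowest = t; }
            if (sd < min_sd) min_sd = sd;
        }
        if (max_sd / min_sd < max_tolerable_unfairness)
            return pick(buf, n, open_row, -1);    /* baseline FR-FCFS */

        int r = pick(buf, n, open_row, slowest);  /* favor MAX-slowdown thread */
        return r >= 0 ? r : pick(buf, n, open_row, -1);
    }

If the most-slowed-down thread has no pending request, the sketch falls back to the baseline policy, matching the rule that DRAM throughput is maximized whenever fairness cannot be improved.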
[Figure: worked example of fair scheduling — the request buffer holds requests from threads T0 and T1 to different rows (e.g., T0: Row 0, T1: Row 5, T1: Row 111); the controller tracks each thread’s slowdown and the resulting unfairness index (e.g., 1.05) to decide which request to service next.]
Tracking Experienced Latency
Relatively easy
Update a per-thread latency counter as long as there is an outstanding memory request for a thread
Estimating Ideal Latency
More involved because thread is not running alone
Need to estimate inherent row buffer locality
Keep track of the row that would have been in the row buffer if thread were running alone
For each memory request, update a per-thread latency counter based on the “would-have-been” row buffer state
Efficient implementation possible – details in paper
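A rough sketch of that bookkeeping, reusing the 50ns/100ns row-hit/row-conflict latencies from earlier; the structure and names are illustrative, not the paper’s hardware design:

    #define ROW_HIT_NS      50
    #define ROW_CONFLICT_NS 100

    struct thread_state {
        int    shadow_row;   /* row that would be open if the thread ran alone */
        double el_ns;        /* experienced latency, measured directly         */
        double il_ns;        /* ideal (alone) latency, estimated               */
    };

    /* Called once per serviced memory request of this thread. The experienced
       latency is whatever the request actually took in the shared system; the
       ideal latency is charged as a hit or conflict against the thread's own
       "would-have-been" row buffer state. */
    void account_request(struct thread_state *t, int row, double measured_ns)
    {
        t->el_ns += measured_ns;
        t->il_ns += (row == t->shadow_row) ? ROW_HIT_NS : ROW_CONFLICT_NS;
        t->shadow_row = row;    /* this request would leave its row open */
    }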
Experimental Evaluation
Instruction-level performance simulation
Detailed x86 processor model based on Intel Pentium M
4 GHz processor, 128-entry instruction window
512 Kbyte per core private L2 caches
Detailed DRAM model
8 banks, 2Kbyte row buffer
Row-hit latency: 50ns (200 cycles)
Row-conflict latency: 100ns (400 cycles)
Benchmarks
Stream (MPH) and rdarray
SPEC CPU 2000 and Olden benchmark suites
MPH with non-intensive VPR:
Fairness improvement: 7.9X (unfairness index reduced to 1.11)
System throughput improvement: 4.5X
MPH with memory-intensive MCF:
Fairness improvement: 4.8X (unfairness index reduced to 1.08)
System throughput improvement: 1.9X
• Non-memory intensive applications suffer the most (VPR)
• Low row-buffer locality hurts (MCF)
• Fairness improvement: 6.6X
• System throughput improvement: 2.2X
• Fairness improvement: 10.1X
• System throughput improvement: 1.97X
• Vulnerability increases with row buffer size
• Also with memory latency (in paper)
Conclusions
Introduced the notion of memory performance attacks in shared DRAM memory systems
Unfair DRAM scheduling is the cause of the vulnerability
More severe problem in future many-core systems
We provide a novel definition of DRAM fairness
Threads should experience equal slowdowns
New DRAM scheduling algorithm enforces this definition
Effectively prevents memory performance attacks
Preventing attacks also improves system throughput!