Tech/App/Arch Trends → New Needs from the Memory Hierarchy
Technology scalable (not DRAM)
QoS and performance guarantees (not free-for-all)
Energy and power efficient (not one-size-fits-all)
New Memory Hierarchy Research Challenges
Tech. Scalability: Enabling NVRAM (+ DRAM) memory
QoS/performance: Reducing and controlling interference
Energy efficiency: Customizability and minimal waste
Need fundamental research in HW/SW cooperation
Memory architecture, organization, controllers, HW algorithms
Low level system software and HW/SW interface
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Summary
DRAM does not scale well beyond N nm
Memory scaling benefits: density, capacity, cost
Energy/power already key design limiters
Memory hierarchy responsible for a large fraction of power
More transistors (cores) on chip (Moore’s Law)
Pin bandwidth not increasing as fast as number of transistors
Memory subsystem is a key shared resource among cores
More pressure on the memory hierarchy
Many different threads/applications/virtual machines will share the memory system
Cloud computing/servers: Many workloads consolidated on-chip to improve efficiency
GP-GPUs: Many threads from multiple parallel applications
Mobile: Interactive + non-interactive consolidation
Different applications with different requirements (SLAs)
Some applications/threads require performance guarantees
Modern hierarchies do not distinguish between applications
Different goals for different systems/users
System throughput, fairness, per-application performance
Modern hierarchies are not flexible/configurable
More cores and components
More pressure on the memory hierarchy
Asymmetric cores: Performance asymmetry, CPU+GPUs, accelerators, …
Motivated by energy efficiency and Amdahl’s Law
Different cores have different performance requirements
Memory hierarchies do not distinguish between cores
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Summary
Traditional
High system performance
Enough capacity
Low cost
New
Technology scalability
QoS support, performance guarantees, configurability
Energy (and power, bandwidth) efficiency
Traditional
High system performance: Reduce inter-thread interference
Enough capacity
Low cost
New
Technology scalability
Emerging non-volatile memory technologies (PCM, MRAM) can help
QoS support, performance guarantees, configurability
Need HW mechanisms SW can use to satisfy QoS policies
Energy (and power, bandwidth) efficiency
One-size-fits-all designs waste energy, performance, and bandwidth
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Technology scalability
QoS support: Inter-thread/application interference
Energy/power/bandwidth efficiency
Summary
Problem: DRAM is not scalable beyond N nm → memory capacity and cost may not continue to scale
Some emerging resistive memory technologies (NVRAM) are more scalable than DRAM
NVRAM will likely be a main component of the memory hierarchy, but we need to enable it:
Redesign the hierarchy to mitigate NVRAM shortcomings
Find the right way to place NVRAM in the subsystem
Satisfy all other requirements in the presence of NVRAM
Phase Change Memory
Pros
Better technology scaling
Non-volatility
Low idle power
Cons
Higher latencies (especially write)
Higher active energy
Lower endurance
How should NVRAM-based (main) memory be organized?
Hybrid NVRAM+DRAM [Qureshi et al., ISCA’09]:
How to partition/migrate data (energy, performance, endurance); a placement sketch follows below
How to design the memory controllers and system software
Exploit advantages, minimize disadvantages
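A minimal sketch of one way the partition/migrate question could be handled by hardware: an epoch-based policy that promotes hot or write-heavy pages to DRAM (latency, endurance) and demotes cold pages back to PCM (capacity, low idle power). The per-page counters, thresholds, and epoch structure are illustrative assumptions, not the mechanism of Qureshi et al.

/* Hypothetical hot-page migration policy for a hybrid DRAM+PCM main memory.
 * Assumes the memory controller keeps per-page access and write counters;
 * thresholds and epoch length are made-up illustration values. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_PAGES          8
#define HOT_ACCESS_THRESH  64   /* accesses per epoch before a page is "hot"   */
#define WRITE_HEAVY_THRESH 16   /* writes per epoch before endurance is a risk */

typedef enum { IN_PCM, IN_DRAM } placement_t;

typedef struct {
    uint32_t    reads_this_epoch;
    uint32_t    writes_this_epoch;
    placement_t where;
} page_info_t;

static page_info_t pages[NUM_PAGES];

/* Called once per epoch: promote hot or write-heavy pages to DRAM
 * (latency + endurance), demote cold pages back to PCM
 * (capacity + low idle power), then reset the counters. */
static void migrate_epoch(void)
{
    for (int p = 0; p < NUM_PAGES; p++) {
        uint32_t accesses = pages[p].reads_this_epoch + pages[p].writes_this_epoch;
        bool hot         = accesses > HOT_ACCESS_THRESH;
        bool write_heavy = pages[p].writes_this_epoch > WRITE_HEAVY_THRESH;

        if ((hot || write_heavy) && pages[p].where == IN_PCM) {
            pages[p].where = IN_DRAM;            /* copy page PCM -> DRAM  */
            printf("page %d: promote to DRAM\n", p);
        } else if (!hot && !write_heavy && pages[p].where == IN_DRAM) {
            pages[p].where = IN_PCM;             /* write page back to PCM */
            printf("page %d: demote to PCM\n", p);
        }
        pages[p].reads_this_epoch = pages[p].writes_this_epoch = 0;
    }
}

int main(void)
{
    pages[0].writes_this_epoch = 40;   /* write-heavy -> DRAM (endurance)    */
    pages[1].reads_this_epoch  = 200;  /* hot, read-mostly -> DRAM (latency) */
    pages[2].reads_this_epoch  = 3;    /* cold -> stays in PCM               */
    migrate_epoch();
    return 0;
}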
How should NVRAM-based (main) memory be organized?
Pure NVRAM main memory [Lee et al., ISCA’09, IEEE Micro’10]:
How to redesign the entire hierarchy (and cores) to overcome NVRAM shortcomings; a write-minimization sketch follows below
Latency, energy, endurance
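One concrete example of redesigning around NVRAM shortcomings, in the spirit of the partial-write idea in Lee et al.: track which words of a buffered row are dirty and write back only those, reducing write energy and wear. The row/word sizes and interfaces below are illustrative assumptions, not the paper's exact design.

/* Sketch: write back only the dirty words of a buffered PCM row to reduce
 * write energy and wear. Row/word sizes are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define WORDS_PER_ROW 64

typedef struct {
    uint64_t data[WORDS_PER_ROW];
    uint64_t dirty_bitmap;          /* one bit per word */
} row_buffer_t;

static void buffer_write(row_buffer_t *rb, unsigned word, uint64_t value)
{
    rb->data[word] = value;
    rb->dirty_bitmap |= 1ULL << word;   /* mark only this word dirty */
}

/* On eviction, issue PCM writes only for dirty words instead of the
 * whole row: fewer bits written => lower energy, longer endurance. */
static unsigned writeback(row_buffer_t *rb)
{
    unsigned writes = 0;
    for (unsigned w = 0; w < WORDS_PER_ROW; w++) {
        if (rb->dirty_bitmap & (1ULL << w)) {
            /* pcm_write_word(row_addr, w, rb->data[w]);  -- device access */
            writes++;
        }
    }
    rb->dirty_bitmap = 0;
    return writes;
}

int main(void)
{
    row_buffer_t rb = {0};
    buffer_write(&rb, 3, 0xdeadbeef);
    buffer_write(&rb, 17, 42);
    printf("words written back: %u of %d\n", writeback(&rb), WORDS_PER_ROW);
    return 0;
}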
Partitioning
Should DRAM be a cache or main memory, or configurable?
What fraction? How many controllers?
Data allocation/movement (energy, perf, lifetime, security)
Access latency critical, heavy modifications → DRAM
Non-volatility critical, not accessed heavily → PCM
Who manages allocation/movement? (an allocation-hint sketch follows after this list)
What are good control algorithms?
Redesign of cache hierarchy, memory controllers
How can NVRAM be exploited on chip?
Design of NVRAM/DRAM chips
Rethink the design of PCM/DRAM with new requirements?
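On the "who manages allocation/movement" question, a hedged sketch of what a software-side interface might look like: system software tags allocations with behavior hints and a simple policy routine chooses DRAM or PCM. The hint names and the policy are hypothetical, invented here purely for illustration.

/* Hypothetical allocation-hint interface: system software describes a
 * region's expected behavior, a policy routine picks DRAM or PCM.
 * All names and flags are illustrative, not an existing OS API. */
#include <stdio.h>

enum mem_hint {
    HINT_LATENCY_CRITICAL = 1 << 0,  /* on the critical path    */
    HINT_WRITE_HEAVY      = 1 << 1,  /* frequent modifications  */
    HINT_PERSISTENT       = 1 << 2,  /* wants non-volatility    */
    HINT_COLD             = 1 << 3,  /* large, rarely touched   */
};

enum mem_target { TARGET_DRAM, TARGET_PCM };

static enum mem_target place(unsigned hints)
{
    /* Latency-critical or heavily modified data -> DRAM.
     * Non-volatility-critical or rarely accessed data -> PCM. */
    if (hints & (HINT_LATENCY_CRITICAL | HINT_WRITE_HEAVY))
        return TARGET_DRAM;
    if (hints & (HINT_PERSISTENT | HINT_COLD))
        return TARGET_PCM;
    return TARGET_DRAM;              /* default: safe but capacity-limited */
}

int main(void)
{
    printf("log buffer     -> %s\n",
           place(HINT_PERSISTENT) == TARGET_PCM ? "PCM" : "DRAM");
    printf("hot hash table -> %s\n",
           place(HINT_LATENCY_CRITICAL | HINT_WRITE_HEAVY) == TARGET_PCM
               ? "PCM" : "DRAM");
    return 0;
}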
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Technology scalability
QoS support: Inter-thread/application interference
Energy/power/bandwidth efficiency
Summary
[Figure: multiple cores sharing the memory system; the threads’ requests interfere]
Problem: Threads share the memory system, but the memory system does not distinguish between threads' requests
Memory system algorithms are thread-unaware and thread-unfair
Existing memory systems
Free-for-all, demand-based sharing of the memory system
Aggressive threads can deny service to others
Do not try to reduce or control inter-thread interference (see the scheduler sketch below)
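To make "free-for-all, demand-based sharing" concrete, here is a minimal sketch of a thread-unaware, FR-FCFS-style scheduling decision (row-buffer hits first, then the oldest request) that never consults which thread issued a request; a thread streaming through an open row can therefore starve everyone else. The request structure is an illustrative assumption, not any specific controller's design.

/* Sketch of a thread-unaware FR-FCFS-style DRAM scheduler: prefer
 * row-buffer hits, then the oldest request, ignoring which thread
 * issued each request. Structures are illustrative. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t row;         /* DRAM row this request targets   */
    uint64_t arrival;     /* timestamp for FCFS tie-breaking */
    int      thread_id;   /* present, but never consulted    */
    int      valid;
} mem_req_t;

/* Pick the next request for a bank whose currently open row is open_row. */
static int schedule(const mem_req_t *q, size_t n, uint64_t open_row)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!q[i].valid)
            continue;
        if (best < 0) { best = (int)i; continue; }
        int cand_hit = (q[i].row == open_row);
        int best_hit = (q[best].row == open_row);
        /* Row hits beat misses; among equals, older wins. Nothing here
         * prevents one thread's stream of row hits from starving every
         * other thread's requests. */
        if (cand_hit > best_hit ||
            (cand_hit == best_hit && q[i].arrival < q[best].arrival))
            best = (int)i;
    }
    return best;   /* index of chosen request, or -1 if queue empty */
}

int main(void)
{
    mem_req_t q[3] = {
        { .row = 7, .arrival = 10, .thread_id = 0, .valid = 1 },  /* row hit */
        { .row = 3, .arrival = 1,  .thread_id = 1, .valid = 1 },  /* oldest  */
        { .row = 7, .arrival = 12, .thread_id = 0, .valid = 1 },  /* row hit */
    };
    /* Thread 0's row hit is chosen over thread 1's much older request. */
    return schedule(q, 3, 7) == 0 ? 0 : 1;
}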
[Figure: multi-core example in which DRAM is the only shared resource; despite one thread being high priority, some cores make very slow progress]
Unfair slowdown of different threads [MICRO’07, ISCA’08, ASPLOS’10]
System performance loss [MICRO’07, ISCA’08, HPCA’10]
Vulnerability to denial of service [USENIX Security’07]
Priority inversion: unable to enforce priorities/SLAs [MICRO’07]
Poor performance predictability (no performance isolation)
How do we reduce inter-thread interference?
Improve system performance and utilization
Preserve the benefits of single-thread performance techniques
How do we control inter-thread interference?
Provide fairness when needed
Satisfy performance guarantees of threads when needed
Provide mechanisms to enable system software to enforce a variety of QoS policies
All the while providing high system performance
How do we make the memory system configurable/flexible?
Enable flexible mechanisms that can achieve many goals
Hardware is good at fine-grained prioritization mechanisms → high performance
Software is good at coarse-grained prioritization mechanisms
Software is needed to decide the QoS policy in the memory system (e.g., system throughput vs. fairness, which application is more important)
Hardware provides configurable partitioning/prioritization and feedback mechanisms (fine-grained interference control)
Software configures the hardware mechanisms (coarse-grained)
Many challenges
How to design flexible hardware resources?
How to design the software/hardware interface? (see the interface sketch below)
How should system software be written?
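One hypothetical shape for that software/hardware interface: hardware exposes per-application priority and bandwidth-share registers plus slowdown feedback counters, and system software writes them periodically from its QoS policy (throughput vs. fairness, SLAs). Every register name, field, and threshold below is invented for illustration; this is a sketch of the division of labor, not an existing interface.

/* Hypothetical HW/SW QoS interface: hardware exposes per-application
 * configuration registers and slowdown feedback; system software sets
 * them from its policy. All register names/fields are invented. */
#include <stdint.h>

#define MAX_APPS 16

struct qos_regs {                      /* imagined memory-mapped registers  */
    uint8_t  priority[MAX_APPS];       /* coarse priority set by the OS     */
    uint16_t bw_share_pct[MAX_APPS];   /* target share of memory bandwidth  */
    uint32_t slowdown_x100[MAX_APPS];  /* feedback: estimated slowdown *100 */
};

/* OS-side policy hook, run periodically: map SLAs to hardware knobs,
 * using the hardware's slowdown estimates as feedback. */
static void apply_policy(volatile struct qos_regs *hw,
                         const uint8_t sla_priority[MAX_APPS])
{
    for (int a = 0; a < MAX_APPS; a++) {
        hw->priority[a] = sla_priority[a];            /* coarse-grained: SW */
        /* If an important app is slowing down too much, grant it more
         * bandwidth; fine-grained enforcement stays in hardware. */
        if (sla_priority[a] > 1 && hw->slowdown_x100[a] > 200 &&
            hw->bw_share_pct[a] < 90)
            hw->bw_share_pct[a] += 10;
    }
}

int main(void)
{
    static struct qos_regs fake_hw = { .slowdown_x100 = { [2] = 350 } };
    static const uint8_t sla[MAX_APPS] = { [2] = 3 };   /* app 2: important */
    apply_policy(&fake_hw, sla);
    return fake_hw.bw_share_pct[2] == 10 ? 0 : 1;
}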
Smart resources: Design each shared resource to have a configurable fairness/QoS mechanism
Fair/QoS-aware memory schedulers, interconnects, caches, arbiters
Examples: fair memory schedulers [Mutlu MICRO 2007], parallelism-aware memory schedulers [Mutlu ISCA 2008], application-aware on-chip networks [Das et al. MICRO 2009, ISCA 2010; Grot et al. MICRO 2009]
Dumb resources: Keep each resource free-for-all, but control access to the memory system at the cores/sources
Fairness via Source Throttling [Ebrahimi et al., ASPLOS 2010]
Estimate thread slowdowns in the entire system and throttle cores that slow down others (see the throttling sketch below)
Coordinated Prefetcher Throttling [Ebrahimi et al., MICRO 2009]
Combined approaches are even more powerful
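A hedged sketch of the source-throttling direction: estimate each application's slowdown every epoch and, when unfairness crosses a threshold, reduce the request rate of the core that appears to cause the interference while relaxing the most slowed-down one. The slowdown estimation is stubbed out, and the "least slowed down = main interferer" heuristic is a crude proxy, not FST's actual interference accounting.

/* Sketch of source-based interference control: estimate per-application
 * slowdowns each epoch, then throttle down the app that appears to cause
 * the interference and relax the most slowed-down app. */
#include <stdio.h>

#define NUM_APPS          4
#define UNFAIRNESS_THRESH 1.4   /* trigger: max_slowdown / min_slowdown */
#define MIN_RATE          10    /* request-rate caps, in percent        */
#define MAX_RATE          100

static unsigned throttle_pct[NUM_APPS] = { MAX_RATE, MAX_RATE, MAX_RATE, MAX_RATE };

/* Stub standing in for a hardware estimate of T_shared / T_alone. */
static double estimate_slowdown(int app)
{
    static const double fake[NUM_APPS] = { 1.1, 2.6, 1.2, 1.3 };
    return fake[app];
}

static void throttling_epoch(void)
{
    int victim = 0, culprit = 0;
    double slow[NUM_APPS];

    for (int a = 0; a < NUM_APPS; a++) {
        slow[a] = estimate_slowdown(a);
        if (slow[a] > slow[victim])  victim  = a;  /* most slowed down        */
        if (slow[a] < slow[culprit]) culprit = a;  /* crude proxy: least hurt */
    }

    if (slow[victim] / slow[culprit] > UNFAIRNESS_THRESH) {
        /* Limit how aggressively the interfering core may inject requests
         * (e.g., fewer outstanding misses), and relax the victim's cap. */
        if (throttle_pct[culprit] > MIN_RATE) throttle_pct[culprit] -= 10;
        if (throttle_pct[victim]  < MAX_RATE) throttle_pct[victim]  += 10;
    }
}

int main(void)
{
    throttling_epoch();
    for (int a = 0; a < NUM_APPS; a++)
        printf("app %d: request rate capped at %u%%\n", a, throttle_pct[a]);
    return 0;
}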
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Technology scalability
QoS support: Inter-thread/application interference
Energy/power/bandwidth efficiency
Summary
Problem: How to minimize energy/power consumption while satisfying performance requirements
Existing memory systems are wasteful
Optimized for “general” behavior, suboptimal for particular access patterns
E.g., fixed cache line size, fixed cache size
A lot of data movement: Moving data is costly in energy and time
Can we design the memory hierarchy to be customizable? (see the sketch below)
Can we minimize (data) movement?
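As one illustration of what "customizable, minimal waste" could mean in practice, a hypothetical sketch that adapts the fetch granularity per memory region based on how much of each fetched line is actually used, spending bandwidth and energy only where spatial locality exists. Region granularity, thresholds, and fetch sizes are made-up values, not a specific proposed design.

/* Hypothetical customizable-hierarchy knob: adapt fetch granularity per
 * memory region based on how much of each fetched line was actually used.
 * Region count, thresholds, and granularities are made-up values. */
#include <stdint.h>
#include <stdio.h>

#define NUM_REGIONS       16
#define FULL_LINE_WORDS   8   /* default fetch: 8 words (e.g., 64B) */
#define SMALL_FETCH_WORDS 2   /* reduced fetch for sparse regions   */

typedef struct {
    uint32_t words_fetched;   /* words brought in for this region    */
    uint32_t words_used;      /* of those, words actually referenced */
    uint8_t  fetch_words;     /* current fetch granularity in words  */
} region_stats_t;

static region_stats_t regions[NUM_REGIONS];

/* Periodically adjust each region's fetch size: if most fetched words go
 * unused, shrink the fetch (save bandwidth/energy); if utilization is
 * high, grow it back (exploit spatial locality). */
static void adapt_granularity(void)
{
    for (int r = 0; r < NUM_REGIONS; r++) {
        region_stats_t *s = &regions[r];
        if (s->words_fetched < 64) continue;           /* too few samples */
        unsigned used_pct = 100u * s->words_used / s->words_fetched;
        s->fetch_words = (used_pct < 25) ? SMALL_FETCH_WORDS : FULL_LINE_WORDS;
        s->words_fetched = s->words_used = 0;
    }
}

int main(void)
{
    for (int r = 0; r < NUM_REGIONS; r++) regions[r].fetch_words = FULL_LINE_WORDS;
    regions[3].words_fetched = 800; regions[3].words_used = 90;   /* sparse */
    regions[5].words_fetched = 800; regions[5].words_used = 700;  /* dense  */
    adapt_granularity();
    printf("region 3 fetch: %u words, region 5 fetch: %u words\n",
           (unsigned)regions[3].fetch_words, (unsigned)regions[5].fetch_words);
    return 0;
}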
[Figure: a non-configurable many-core chip with identical cores (C) contrasted with a configurable chip containing distinct resources C1, C2, C3]
Non-configurable: One size fits all
Energy and performance suboptimal for different behaviors
Configurable: Enables tradeoffs and customization
Processing requirements vary across applications and phases
Execute code on best-fit resources (minimal energy, adequate perf.)
What types of access/communication patterns deserve customization?
How do we enable customization?
How should applications be mapped to the best-fit memory hierarchy resources?
Many design, monitoring, program characterization questions
Hardware and software should work cooperatively
Technology, application, and architecture trends dictate fundamentally new needs from the memory system
A Fresh Look at Re-designing the Memory Hierarchy
Tech. Scalability: Enabling NVRAM (+ DRAM) memory
QoS/performance: Reducing and controlling interference
Energy efficiency: Customizability and minimal waste
HW/SW cooperation essential
Fundamental changes to architecture, uarch, software
Many challenges and opportunities