Tech/App/Arch Trends → New Needs from the Memory Hierarchy
Technology scalable (not DRAM)
QoS and performance guarantees (not free-for-all)
Energy and power efficient (not one-size-fits-all)
New Memory Hierarchy Research Challenges
Tech. Scalability: Enabling NVRAM (+ DRAM) memory
QoS/performance: Reducing and controlling interference
Energy efficiency: Customizability and minimal waste
Need fundamental research in HW/SW cooperation
Memory architecture, organization, controllers, HW algorithms
Low level system software and HW/SW interface
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Summary
DRAM does not scale well beyond N nm
Memory scaling benefits: density, capacity, cost
Energy/power already key design limiters
Memory hierarchy responsible for a large fraction of power
More transistors (cores) on chip (Moore’s Law)
Pin bandwidth not increasing as fast as number of transistors
Memory subsystem is a key shared resource among cores
More pressure on the memory hierarchy
Many different threads/applications/virtual machines will share the memory system
Cloud computing/servers: Many workloads consolidated on-chip to improve efficiency
GP-GPUs: Many threads from multiple parallel applications
Mobile: Interactive + non-interactive consolidation
Different applications with different requirements (SLAs)
Some applications/threads require performance guarantees
Modern hierarchies do not distinguish between applications
Different goals for different systems/users
System throughput, fairness, per-application performance
Modern hierarchies are not flexible/configurable
More cores and components
More pressure on the memory hierarchy
Asymmetric cores: Performance asymmetry, CPU+GPUs, accelerators, …
Motivated by energy efficiency and Amdahl’s Law
Different cores have different performance requirements
Memory hierarchies do not distinguish between cores
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Summary
Traditional
High system performance
Enough capacity
Low cost
New
Technology scalability
QoS support, performance guarantees, configurability
Energy (and power, bandwidth) efficiency
Traditional
High system performance: Reduce inter-thread interference
Enough capacity
Low cost
New
Technology scalability
Emerging non-volatile memory technologies (PCM, MRAM) can help
QoS support, performance guarantees, configurability
Need HW mechanisms SW can use to satisfy QoS policies
Energy (and power, bandwidth) efficiency
One-size-fits-all designs waste energy, performance, and bandwidth
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Technology scalability
QoS support: Inter-thread/application interference
Energy/power/bandwidth efficiency
Summary
Problem: DRAM is not scalable beyond N nm → memory capacity and cost may not continue to scale
Some emerging resistive memory technologies (NVRAM) are more scalable than DRAM
NVRAM will likely be a main component of the memory hierarchy, but we need to enable it:
Redesign the hierarchy to mitigate NVRAM shortcomings
Find the right way to place NVRAM in the subsystem
Satisfy all other requirements in the presence of NVRAM
Phase Change Memory
Pros
Better technology scaling
Non-volatility
Low idle power
Cons
Higher latencies (especially write)
Higher active energy
Lower endurance
How should NVRAM-based (main) memory be organized?
Hybrid NVRAM+DRAM [Qureshi et al., ISCA’09]:
How to partition/migrate data (energy, performance, endurance); a placement sketch follows below
How to design the memory controllers and system software
Exploit advantages, minimize disadvantages
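A minimal sketch of one way the partition/migrate question could be handled by hardware: an epoch-based policy that promotes hot or write-heavy pages to DRAM (latency, endurance) and demotes cold pages back to PCM (capacity, low idle power). The per-page counters, thresholds, and epoch structure are illustrative assumptions, not the mechanism of Qureshi et al.

/* Hypothetical hot-page migration policy for a hybrid DRAM+PCM main memory.
 * Assumes the memory controller keeps per-page access and write counters;
 * thresholds and epoch length are made-up illustration values. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_PAGES          8
#define HOT_ACCESS_THRESH  64   /* accesses per epoch before a page is "hot"   */
#define WRITE_HEAVY_THRESH 16   /* writes per epoch before endurance is a risk */

typedef enum { IN_PCM, IN_DRAM } placement_t;

typedef struct {
    uint32_t    reads_this_epoch;
    uint32_t    writes_this_epoch;
    placement_t where;
} page_info_t;

static page_info_t pages[NUM_PAGES];

/* Called once per epoch: promote hot or write-heavy pages to DRAM
 * (latency + endurance), demote cold pages back to PCM
 * (capacity + low idle power), then reset the counters. */
static void migrate_epoch(void)
{
    for (int p = 0; p < NUM_PAGES; p++) {
        uint32_t accesses = pages[p].reads_this_epoch + pages[p].writes_this_epoch;
        bool hot         = accesses > HOT_ACCESS_THRESH;
        bool write_heavy = pages[p].writes_this_epoch > WRITE_HEAVY_THRESH;

        if ((hot || write_heavy) && pages[p].where == IN_PCM) {
            pages[p].where = IN_DRAM;            /* copy page PCM -> DRAM  */
            printf("page %d: promote to DRAM\n", p);
        } else if (!hot && !write_heavy && pages[p].where == IN_DRAM) {
            pages[p].where = IN_PCM;             /* write page back to PCM */
            printf("page %d: demote to PCM\n", p);
        }
        pages[p].reads_this_epoch = pages[p].writes_this_epoch = 0;
    }
}

int main(void)
{
    pages[0].writes_this_epoch = 40;   /* write-heavy -> DRAM (endurance)    */
    pages[1].reads_this_epoch  = 200;  /* hot, read-mostly -> DRAM (latency) */
    pages[2].reads_this_epoch  = 3;    /* cold -> stays in PCM               */
    migrate_epoch();
    return 0;
}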
How should NVRAM-based (main) memory be organized?
Pure NVRAM main memory [Lee et al., ISCA’09, IEEE Micro’10]:
How to redesign the entire hierarchy (and cores) to overcome NVRAM shortcomings; a write-minimization sketch follows below
Latency, energy, endurance
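One concrete example of redesigning around NVRAM shortcomings, in the spirit of the partial-write idea in Lee et al.: track which words of a buffered row are dirty and write back only those, reducing write energy and wear. The row/word sizes and interfaces below are illustrative assumptions, not the paper's exact design.

/* Sketch: write back only the dirty words of a buffered PCM row to reduce
 * write energy and wear. Row/word sizes are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define WORDS_PER_ROW 64

typedef struct {
    uint64_t data[WORDS_PER_ROW];
    uint64_t dirty_bitmap;          /* one bit per word */
} row_buffer_t;

static void buffer_write(row_buffer_t *rb, unsigned word, uint64_t value)
{
    rb->data[word] = value;
    rb->dirty_bitmap |= 1ULL << word;   /* mark only this word dirty */
}

/* On eviction, issue PCM writes only for dirty words instead of the
 * whole row: fewer bits written => lower energy, longer endurance. */
static unsigned writeback(row_buffer_t *rb)
{
    unsigned writes = 0;
    for (unsigned w = 0; w < WORDS_PER_ROW; w++) {
        if (rb->dirty_bitmap & (1ULL << w)) {
            /* pcm_write_word(row_addr, w, rb->data[w]);  -- device access */
            writes++;
        }
    }
    rb->dirty_bitmap = 0;
    return writes;
}

int main(void)
{
    row_buffer_t rb = {0};
    buffer_write(&rb, 3, 0xdeadbeef);
    buffer_write(&rb, 17, 42);
    printf("words written back: %u of %d\n", writeback(&rb), WORDS_PER_ROW);
    return 0;
}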
Partitioning
Should DRAM be a cache or main memory, or configurable?
What fraction? How many controllers?
Data allocation/movement (energy, perf, lifetime, security)
Access latency critical, heavy modifications → DRAM
Non-volatility critical, not accessed heavily → PCM
Who manages allocation/movement? (an allocation-hint sketch follows after this list)
What are good control algorithms?
Redesign of cache hierarchy, memory controllers
How can NVRAM be exploited on chip?
Design of NVRAM/DRAM chips
Rethink the design of PCM/DRAM with new requirements?
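On the "who manages allocation/movement" question, a hedged sketch of what a software-side interface might look like: system software tags allocations with behavior hints and a simple policy routine chooses DRAM or PCM. The hint names and the policy are hypothetical, invented here purely for illustration.

/* Hypothetical allocation-hint interface: system software describes a
 * region's expected behavior, a policy routine picks DRAM or PCM.
 * All names and flags are illustrative, not an existing OS API. */
#include <stdio.h>

enum mem_hint {
    HINT_LATENCY_CRITICAL = 1 << 0,  /* on the critical path    */
    HINT_WRITE_HEAVY      = 1 << 1,  /* frequent modifications  */
    HINT_PERSISTENT       = 1 << 2,  /* wants non-volatility    */
    HINT_COLD             = 1 << 3,  /* large, rarely touched   */
};

enum mem_target { TARGET_DRAM, TARGET_PCM };

static enum mem_target place(unsigned hints)
{
    /* Latency-critical or heavily modified data -> DRAM.
     * Non-volatility-critical or rarely accessed data -> PCM. */
    if (hints & (HINT_LATENCY_CRITICAL | HINT_WRITE_HEAVY))
        return TARGET_DRAM;
    if (hints & (HINT_PERSISTENT | HINT_COLD))
        return TARGET_PCM;
    return TARGET_DRAM;              /* default: safe but capacity-limited */
}

int main(void)
{
    printf("log buffer     -> %s\n",
           place(HINT_PERSISTENT) == TARGET_PCM ? "PCM" : "DRAM");
    printf("hot hash table -> %s\n",
           place(HINT_LATENCY_CRITICAL | HINT_WRITE_HEAVY) == TARGET_PCM
               ? "PCM" : "DRAM");
    return 0;
}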
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Technology scalability
QoS support: Inter-thread/application interference
Energy/power/bandwidth efficiency
Summary
[Figure: multiple cores sharing the memory system; the threads’ requests interfere]
Problem: Threads share the memory system, but the memory system does not distinguish between threads' requests
Memory system algorithms are thread-unaware and thread-unfair
Existing memory systems
Free-for-all, demand-based sharing of the memory system
Aggressive threads can deny service to others
Do not try to reduce or control inter-thread interference (see the scheduler sketch below)
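To make "free-for-all, demand-based sharing" concrete, here is a minimal sketch of a thread-unaware, FR-FCFS-style scheduling decision (row-buffer hits first, then the oldest request) that never consults which thread issued a request; a thread streaming through an open row can therefore starve everyone else. The request structure is an illustrative assumption, not any specific controller's design.

/* Sketch of a thread-unaware FR-FCFS-style DRAM scheduler: prefer
 * row-buffer hits, then the oldest request, ignoring which thread
 * issued each request. Structures are illustrative. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t row;         /* DRAM row this request targets   */
    uint64_t arrival;     /* timestamp for FCFS tie-breaking */
    int      thread_id;   /* present, but never consulted    */
    int      valid;
} mem_req_t;

/* Pick the next request for a bank whose currently open row is open_row. */
static int schedule(const mem_req_t *q, size_t n, uint64_t open_row)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!q[i].valid)
            continue;
        if (best < 0) { best = (int)i; continue; }
        int cand_hit = (q[i].row == open_row);
        int best_hit = (q[best].row == open_row);
        /* Row hits beat misses; among equals, older wins. Nothing here
         * prevents one thread's stream of row hits from starving every
         * other thread's requests. */
        if (cand_hit > best_hit ||
            (cand_hit == best_hit && q[i].arrival < q[best].arrival))
            best = (int)i;
    }
    return best;   /* index of chosen request, or -1 if queue empty */
}

int main(void)
{
    mem_req_t q[3] = {
        { .row = 7, .arrival = 10, .thread_id = 0, .valid = 1 },  /* row hit */
        { .row = 3, .arrival = 1,  .thread_id = 1, .valid = 1 },  /* oldest  */
        { .row = 7, .arrival = 12, .thread_id = 0, .valid = 1 },  /* row hit */
    };
    /* Thread 0's row hit is chosen over thread 1's much older request. */
    return schedule(q, 3, 7) == 0 ? 0 : 1;
}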
[Figure: multi-core example in which DRAM is the only shared resource; despite one thread being high priority, some cores make very slow progress]
Unfair slowdown of different threads [MICRO’07, ISCA’08, ASPLOS’10]
System performance loss [MICRO’07, ISCA’08, HPCA’10]
Vulnerability to denial of service [USENIX Security’07]
Priority inversion: unable to enforce priorities/SLAs [MICRO’07]
Poor performance predictability (no performance isolation)
How do we reduce inter-thread interference?
Improve system performance and utilization
Preserve the benefits of single-thread performance techniques
How do we control inter-thread interference?
Provide fairness when needed
Satisfy performance guarantees of threads when needed
Provide mechanisms to enable system software to enforce a variety of QoS policies
All the while providing high system performance
How do we make the memory system configurable/flexible?
Enable flexible mechanisms that can achieve many goals
Hardware is good at fine-grained prioritization mechanisms → high performance
Software is good at coarse-grained prioritization mechanisms
Software is needed to decide the QoS policy in the memory system (e.g., system throughput vs. fairness, which application is more important)
Hardware provides configurable partitioning/prioritization and feedback mechanisms (fine-grained interference control)
Software configures the hardware mechanisms (coarse-grained)
Many challenges
How to design flexible hardware resources?
How to design the software/hardware interface? (see the interface sketch below)
How should system software be written?
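One hypothetical shape for that software/hardware interface: hardware exposes per-application priority and bandwidth-share registers plus slowdown feedback counters, and system software writes them periodically from its QoS policy (throughput vs. fairness, SLAs). Every register name, field, and threshold below is invented for illustration; this is a sketch of the division of labor, not an existing interface.

/* Hypothetical HW/SW QoS interface: hardware exposes per-application
 * configuration registers and slowdown feedback; system software sets
 * them from its policy. All register names/fields are invented. */
#include <stdint.h>

#define MAX_APPS 16

struct qos_regs {                      /* imagined memory-mapped registers  */
    uint8_t  priority[MAX_APPS];       /* coarse priority set by the OS     */
    uint16_t bw_share_pct[MAX_APPS];   /* target share of memory bandwidth  */
    uint32_t slowdown_x100[MAX_APPS];  /* feedback: estimated slowdown *100 */
};

/* OS-side policy hook, run periodically: map SLAs to hardware knobs,
 * using the hardware's slowdown estimates as feedback. */
static void apply_policy(volatile struct qos_regs *hw,
                         const uint8_t sla_priority[MAX_APPS])
{
    for (int a = 0; a < MAX_APPS; a++) {
        hw->priority[a] = sla_priority[a];            /* coarse-grained: SW */
        /* If an important app is slowing down too much, grant it more
         * bandwidth; fine-grained enforcement stays in hardware. */
        if (sla_priority[a] > 1 && hw->slowdown_x100[a] > 200 &&
            hw->bw_share_pct[a] < 90)
            hw->bw_share_pct[a] += 10;
    }
}

int main(void)
{
    static struct qos_regs fake_hw = { .slowdown_x100 = { [2] = 350 } };
    static const uint8_t sla[MAX_APPS] = { [2] = 3 };   /* app 2: important */
    apply_policy(&fake_hw, sla);
    return fake_hw.bw_share_pct[2] == 10 ? 0 : 1;
}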
Smart resources: Design each shared resource to have a configurable fairness/QoS mechanism
Fair/QoS-aware memory schedulers, interconnects, caches, arbiters
Examples: fair memory schedulers [Mutlu MICRO 2007], parallelism-aware memory schedulers [Mutlu ISCA 2008], application-aware on-chip networks [Das et al. MICRO 2009, ISCA 2010; Grot et al. MICRO 2009]
Dumb resources: Keep each resource free-for-all, but control access to the memory system at the cores/sources
Fairness via Source Throttling [Ebrahimi et al., ASPLOS 2010]
Estimate thread slowdowns in the entire system and throttle cores that slow down others (see the throttling sketch below)
Coordinated Prefetcher Throttling [Ebrahimi et al., MICRO 2009]
Combined approaches are even more powerful
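A hedged sketch of the source-throttling direction: estimate each application's slowdown every epoch and, when unfairness crosses a threshold, reduce the request rate of the core that appears to cause the interference while relaxing the most slowed-down one. The slowdown estimation is stubbed out, and the "least slowed down = main interferer" heuristic is a crude proxy, not FST's actual interference accounting.

/* Sketch of source-based interference control: estimate per-application
 * slowdowns each epoch, then throttle down the app that appears to cause
 * the interference and relax the most slowed-down app. */
#include <stdio.h>

#define NUM_APPS          4
#define UNFAIRNESS_THRESH 1.4   /* trigger: max_slowdown / min_slowdown */
#define MIN_RATE          10    /* request-rate caps, in percent        */
#define MAX_RATE          100

static unsigned throttle_pct[NUM_APPS] = { MAX_RATE, MAX_RATE, MAX_RATE, MAX_RATE };

/* Stub standing in for a hardware estimate of T_shared / T_alone. */
static double estimate_slowdown(int app)
{
    static const double fake[NUM_APPS] = { 1.1, 2.6, 1.2, 1.3 };
    return fake[app];
}

static void throttling_epoch(void)
{
    int victim = 0, culprit = 0;
    double slow[NUM_APPS];

    for (int a = 0; a < NUM_APPS; a++) {
        slow[a] = estimate_slowdown(a);
        if (slow[a] > slow[victim])  victim  = a;  /* most slowed down        */
        if (slow[a] < slow[culprit]) culprit = a;  /* crude proxy: least hurt */
    }

    if (slow[victim] / slow[culprit] > UNFAIRNESS_THRESH) {
        /* Limit how aggressively the interfering core may inject requests
         * (e.g., fewer outstanding misses), and relax the victim's cap. */
        if (throttle_pct[culprit] > MIN_RATE) throttle_pct[culprit] -= 10;
        if (throttle_pct[victim]  < MAX_RATE) throttle_pct[victim]  += 10;
    }
}

int main(void)
{
    throttling_epoch();
    for (int a = 0; a < NUM_APPS; a++)
        printf("app %d: request rate capped at %u%%\n", a, throttle_pct[a]);
    return 0;
}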
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Technology scalability
QoS support: Inter-thread/application interference
Energy/power/bandwidth efficiency
Summary
Problem: How to minimize energy/power consumption while satisfying performance requirements
Existing memory systems are wasteful
Optimized for “general” behavior, suboptimal for particular access patterns
E.g., fixed cache line size, fixed cache size
A lot of data movement: Moving data is costly in energy and time
Can we design the memory hierarchy to be customizable? (see the sketch below)
Can we minimize (data) movement?
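As one illustration of what "customizable, minimal waste" could mean in practice, a hypothetical sketch that adapts the fetch granularity per memory region based on how much of each fetched line is actually used, spending bandwidth and energy only where spatial locality exists. Region granularity, thresholds, and fetch sizes are made-up values, not a specific proposed design.

/* Hypothetical customizable-hierarchy knob: adapt fetch granularity per
 * memory region based on how much of each fetched line was actually used.
 * Region count, thresholds, and granularities are made-up values. */
#include <stdint.h>
#include <stdio.h>

#define NUM_REGIONS       16
#define FULL_LINE_WORDS   8   /* default fetch: 8 words (e.g., 64B) */
#define SMALL_FETCH_WORDS 2   /* reduced fetch for sparse regions   */

typedef struct {
    uint32_t words_fetched;   /* words brought in for this region    */
    uint32_t words_used;      /* of those, words actually referenced */
    uint8_t  fetch_words;     /* current fetch granularity in words  */
} region_stats_t;

static region_stats_t regions[NUM_REGIONS];

/* Periodically adjust each region's fetch size: if most fetched words go
 * unused, shrink the fetch (save bandwidth/energy); if utilization is
 * high, grow it back (exploit spatial locality). */
static void adapt_granularity(void)
{
    for (int r = 0; r < NUM_REGIONS; r++) {
        region_stats_t *s = &regions[r];
        if (s->words_fetched < 64) continue;           /* too few samples */
        unsigned used_pct = 100u * s->words_used / s->words_fetched;
        s->fetch_words = (used_pct < 25) ? SMALL_FETCH_WORDS : FULL_LINE_WORDS;
        s->words_fetched = s->words_used = 0;
    }
}

int main(void)
{
    for (int r = 0; r < NUM_REGIONS; r++) regions[r].fetch_words = FULL_LINE_WORDS;
    regions[3].words_fetched = 800; regions[3].words_used = 90;   /* sparse */
    regions[5].words_fetched = 800; regions[5].words_used = 700;  /* dense  */
    adapt_granularity();
    printf("region 3 fetch: %u words, region 5 fetch: %u words\n",
           (unsigned)regions[3].fetch_words, (unsigned)regions[5].fetch_words);
    return 0;
}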
[Figure: a non-configurable many-core chip with identical cores (C) contrasted with a configurable chip containing distinct resources C1, C2, C3]
Non-configurable: One size fits all
Energy and performance suboptimal for different behaviors
Configurable: Enables tradeoffs and customization
Processing requirements vary across applications and phases
Execute code on best-fit resources (minimal energy, adequate perf.)
What types of access/communication patterns deserve customization?
How do we enable customization?
How should applications be mapped to the best-fit memory hierarchy resources?
Many design, monitoring, program characterization questions
Hardware and software should work cooperatively
Technology, application, and architecture trends dictate fundamentally new needs from the memory system
A Fresh Look at Re-designing the Memory Hierarchy
Tech. Scalability: Enabling NVRAM (+ DRAM) memory
QoS/performance: Reducing and controlling interference
Energy efficiency: Customizability and minimal waste
HW/SW cooperation essential
Fundamental changes to architecture, uarch, software
Many challenges and opportunities