isca11 - University of Utah

Combining Memory and a Controller with Photonics through 3D-Stacking to Enable Scalable and Energy-Efficient Systems
Aniruddha N. Udipi
Naveen Muralimanohar*
Rajeev Balasubramonian
Al Davis
Norm Jouppi*
University of Utah and *HP Labs
Memory Trends - I
• Multi-socket, multi-core, multi-thread
– High bandwidth requirement
– 1 TB/s by 2017
• Edge-bandwidth bottleneck
– Pin count, per pin bandwidth
– Signal integrity and off-chip power
▪ Limited number of DIMMs
• Without melting the system
– Or setting up in the Tundra!
Sources: Tom’s Hardware, ZDNet
Memory Trends - II
• The job of the memory controller is hard
– 18+ timing parameters for DRAM!
– Maintenance operations
▪ Refresh, scrub, power down, etc.
• Several DIMM and controller variants
– Hard to provide interoperability
– Need processor-side support for new memory features
• Now throw in heterogeneity
– Memristors, PCM, STT-RAM, etc.
Improving the interface
Memory interface under severe pressure
• Part 1 – Memory Interconnect: efficient application of Silicon Photonics, without modifying DRAM dies
• Part 2 – Communication protocol: Streamlined Slot-based Interface
[Figure: CPU with memory controller (MC) connected to DIMMs]
PART 1 – Memory Interconnect
Silicon Photonic Interconnects
• We need something that can break the edge-bandwidth bottleneck
• Ring modulator based photonics (image source: Xu et al., Optics Express 16(6), 2008)
– Off-chip light source
– Indirect modulation using resonant rings
– Relatively cheap coupling on- and off-chip
• DWDM for high bandwidth density
– As many as 67 wavelengths possible
– Limited by Free Spectral Range, and coupling losses between rings
DWDM: 64 λ × 10 Gbps/λ = 80 GB/s per waveguide
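The per-waveguide figure above is a unit conversion from bits to bytes. A minimal sketch of that arithmetic follows; the function name is illustrative and not from the slides.

```python
def waveguide_bandwidth_gbytes(wavelengths: int, gbps_per_wavelength: float) -> float:
    """Aggregate bandwidth of one DWDM waveguide in GB/s.

    Each wavelength carries an independent bit stream, so the bit rates
    add up; dividing by 8 converts gigabits to gigabytes.
    """
    return wavelengths * gbps_per_wavelength / 8


# Values taken from the slide: 64 wavelengths at 10 Gbps each.
print(waveguide_bandwidth_gbytes(64, 10))  # 80.0
```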
Static Photonic Energy
• Photonic interconnects
– Large static power dissipation: ring tuning
– Much lower dynamic energy consumption, relatively independent of distance
• Electrical interconnects
– Relatively small static power dissipation
– Large dynamic energy consumption
• Should not over-provision photonic bandwidth, use only where necessary
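One way to see why over-provisioning hurts is to amortize static power over the bits actually delivered. The sketch below is a toy model, not the authors' evaluation: the static power and dynamic energy values are hypothetical placeholders chosen only to show the trend, and only the 640 Gbps peak rate (64 λ × 10 Gbps) comes from the slides.

```python
def energy_per_bit_pj(static_power_w, dynamic_pj_per_bit, peak_gbps, utilization):
    """Energy per delivered bit in pJ: amortized static power plus dynamic energy."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    delivered_bits_per_s = peak_gbps * 1e9 * utilization
    # Static power is paid whether or not the link is busy (e.g. ring tuning).
    static_pj_per_bit = static_power_w / delivered_bits_per_s * 1e12
    return static_pj_per_bit + dynamic_pj_per_bit


PEAK_GBPS = 640.0  # 64 wavelengths x 10 Gbps, from the slides

# HYPOTHETICAL placeholder parameters, not measured values.
PHOTONIC = {"static_power_w": 1.0, "dynamic_pj_per_bit": 0.1}
ELECTRICAL = {"static_power_w": 0.05, "dynamic_pj_per_bit": 2.0}


def compare(utilization):
    """Return (photonic, electrical) energy per bit in pJ at a given utilization."""
    photonic = energy_per_bit_pj(peak_gbps=PEAK_GBPS, utilization=utilization, **PHOTONIC)
    electrical = energy_per_bit_pj(peak_gbps=PEAK_GBPS, utilization=utilization, **ELECTRICAL)
    return photonic, electrical


for u in (0.05, 0.5, 1.0):
    p, e = compare(u)
    print(f"utilization {u:.2f}: photonic {p:.2f} pJ/bit, electrical {e:.2f} pJ/bit")
```

With these placeholder numbers the photonic link wins only when it is kept busy, which is the qualitative point of the slide.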
The Questions We’re Trying to Answer
• What should the role of electrical signaling be?
• How do we make photonics less invasive to memory die design?
• Should we replace all interconnects with photonics? On-chip too?
• What should the role of 3D be in an optically connected memory?
• Should we be designing photonic DRAM dies? Stacks? Channels?
Contributions Beyond Prior Work
• Beamer et al. (ISCA 2010)
– First paper on fully integrated optical memory
– Studied electrical-optical balance point
– Focus on losses, proposed photonic power guiding
• We build upon this
– Focus on tuning power constraints
– Effect of low-swing wires
– Effect of 3D and daisy-chaining
Energy Balance Within a DRAM Chip
[Figure: photonic energy and electrical energy within a DRAM chip]
Single Die Design
[Figure: one photonic DRAM die, with full-swing and with low-swing on-chip wires]
• Similar to state-of-the-art design, based on prior work. Argues for a specially designed photonic DRAM.
• More efficient on-chip electrical communication provides the added benefit of allowing fewer photonic resources.
• 46% energy reduction going between best full-swing config (4 stops) and best low-swing config (1 stop).
3D Stacking Imminent for Capacity
• Simply stack photonic dies?
– Vertical coupling and hierarchical power guiding suggested by prior work
– This is our baseline design
• But, more photonic rings in the channel
– Exactly the same number active as before
• Energy optimal point shifts towards fewer “stops”
– Single set of rings becomes optimal
• 2.4x energy consumption, for 8x capacity
[Figure: 8 optimally designed photonic DRAM dies]
Key Idea – Exploiting TSVs
• Move all photonic components to a separate interface die, shared by several memory dies
• Photonics off-chip only
• TSVs for inter-die communication
– Best of both worlds; high BW and low static energy
• Efficient low-swing wires on-die
[Figure: 8 optimally designed photonic DRAM dies versus 8 commodity DRAM dies with a single photonic interface die]
Proposed Design
• Advantage: increased activity factor, more efficient use of photonics
• Advantage: rings are co-located; easier to isolate or tune thermally
• Advantage: not disruptive to the design of commodity memory dies
[Figure labels: processor, memory controller, waveguide, DIMM, DRAM chips, photonic interface die]
Energy Characteristics
[Figure panels: single die on the channel; four 8-die stacks on the channel]
• Static energy trumps distance-independent dynamic energy
Final System
[Figure labels: processor, memory controller, waveguide, DIMM, DRAM chips, photonic interface die]
• Makes the job of the memory controller difficult!
• 23% reduced energy consumption
• 4X capacity per channel
• Potential for performance improvements due to increased bank count
• Less disruptive to memory die design
PART 2 – Communication Protocol
The Scalability Problem
• Large capacity, high bandwidth, and evolving technology trends will increase pressure on the memory interface
• Processor-side support required for every memory innovation
• Current micro-management requires several signals
– Heavy pressure on address/command bus
– Worse with several independent banks, large amounts of state
Proposed Solution
• Release MC’s tight control, make memory stack more autonomous
• Move mundane tasks to the interface die
– Maintenance operations (refresh, scrub, etc.)
– Routine operations (DRAM precharge, NVM wear leveling)
– Timing control (18+ constraints for DRAM alone)
– Coding and any other special requirements
What would it take to do this?
• “Back-pressure” from the memory
• But, “Free-for-all” would be inefficient
– Needs explicit arbitration
• Novel slot-based interface
– Memory controller retains control over data bus
– Memory module only needs address, returns data
Memory Access Operation
[Figure: timeline of a memory access. Labels: Arrival, Issue, Start looking, ML, > ML, X, S1 (first free slot), S2 (backup slot), Time]
• Slot: cache line data bus occupancy
• X: reserved slot
• ML: memory latency = addr. latency + bank access + data bus latency
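The reservation step in the timeline can be sketched as follows, assuming time is counted in whole data-bus slots. This is an illustration rather than the authors' implementation: the function names are hypothetical, and how far after S1 the search for the backup slot begins is not recoverable from the slide, so it is exposed as the assumed parameter `backup_gap_slots`.

```python
from typing import Set, Tuple


def first_free(reserved: Set[int], start: int) -> int:
    """Return the first slot index >= start that is not already reserved."""
    slot = start
    while slot in reserved:
        slot += 1
    return slot


def reserve_slots(reserved: Set[int], issue_slot: int, ml_slots: int,
                  backup_gap_slots: int = 1) -> Tuple[int, int]:
    """Reserve a first slot (S1) and a backup slot (S2) for one request.

    The controller starts looking ML slots after issuing the address,
    skips slots held by earlier requests (the X marks), and takes the
    first free one. backup_gap_slots is an ASSUMED parameter for where
    the backup search starts relative to S1.
    """
    s1 = first_free(reserved, issue_slot + ml_slots)
    reserved.add(s1)
    s2 = first_free(reserved, s1 + backup_gap_slots)
    reserved.add(s2)
    return s1, s2


# Slots 10 and 11 are already taken by earlier requests.
bus = {10, 11}
print(reserve_slots(bus, issue_slot=4, ml_slots=6))  # (12, 13)
```

Because the controller picks the slots itself, it keeps control of the data bus while the memory side only needs the address and returns data in a reserved slot.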
Advantages
• Plug and play
– Everything is interchangeable and interoperable
– Only interface-die support required (communicate ML)
• Better support for heterogeneous systems
– Easier DRAM-NVM data movement on the same channel
• More innovation in the memory system
– Without processor-side support constraints
• Fewer commands between processor and memory
– Energy, performance advantages
Target System and Methodology
• Terascale memory node in an Exascale system
– 1 TB of memory, 1 TB/s of bandwidth
• Assuming 80 GB/s per channel, we need 16 channels, with 64 GB per channel
– 2 GB dies x 8 dies per stack x 4 stacks per channel
• Focus on the design of a single channel
• In-house DRAM simulator + SIMICS
– PARSEC, STREAM, synthetic random traffic
– Max. traffic load used, just below channel saturation
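The sizing of the target node can be checked with a few lines of arithmetic. The sketch below uses the figures from the slide and assumes 1 TB of capacity means 1024 GB and 1 TB/s means 1000 GB/s; under those assumptions, 16 channels is what the capacity target requires, and it more than covers the bandwidth target.

```python
import math

# Figures from the slide.
GB_PER_DIE = 2
DIES_PER_STACK = 8
STACKS_PER_CHANNEL = 4
CHANNEL_BW_GBPS = 80       # GB/s per channel

# ASSUMED unit conventions for the node-level targets.
TARGET_CAPACITY_GB = 1024  # 1 TB of memory
TARGET_BW_GBPS = 1000      # 1 TB/s of bandwidth


def size_system():
    """Return per-channel capacity and the channel count meeting both targets."""
    capacity_per_channel = GB_PER_DIE * DIES_PER_STACK * STACKS_PER_CHANNEL
    channels_for_capacity = math.ceil(TARGET_CAPACITY_GB / capacity_per_channel)
    channels_for_bandwidth = math.ceil(TARGET_BW_GBPS / CHANNEL_BW_GBPS)
    channels = max(channels_for_capacity, channels_for_bandwidth)
    return {
        "capacity_per_channel_gb": capacity_per_channel,
        "channels_for_capacity": channels_for_capacity,
        "channels_for_bandwidth": channels_for_bandwidth,
        "channels": channels,
        "total_bw_gbps": channels * CHANNEL_BW_GBPS,
    }


print(size_system())
```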
Performance Impact – Synthetic Traffic
< 9% latency impact, even at maximum load
Virtually no impact on achieved bandwidth
Performance Impact – PARSEC/STREAM
Apps have very low BW requirements
Scaled down system, similar trends
Tying it together – The Interface Die
Summary of Design
• Proposed 3D-stacked interface die with 2 major functions
– Holds photonic devices for Electrical-Optical-Electrical conversion
▪ Photonics only on the busy shared bus between this die and the processor
▪ Intra-memory communication all-electrical exploiting TSVs and low-swing wires
– Holds device controller logic
▪ Handles all mundane/routine tasks for the memory devices (refresh, scrub, coding, timing constraints, sleep modes, etc.)
▪ Processor-side controller deals with more important functions such as scheduling, channel arbitration, etc.
▪ Simple speculative slot-based interface
Key Contributions
• Efficient application of photonics
– 23% lower energy
– 4X capacity, potential for performance improvements
• Minimally disruptive to memory die design
– Single memory die design for photonics and electronics
• Streamlined memory interface
– More interoperability and flexibility
– Innovation without processor-side changes
– Support for heterogeneous memory