Serial Memories Fill a Need
Memcon 2015
Agenda
v  Michael Sporer – Director of Marketing
§  The future of parallel versus serial interface for memory
v  Mark Baumann – Director of Applications Engineering
§  Based on MoSys's experience developing and introducing the GigaChip
interface and the 1st, 2nd and 3rd generations of Bandwidth Engine ICs,
we will describe several options for future memory interface solutions.
MemCon 2015 - October 12th
Copyright ©MoSys, Inc. 2015. All rights reserved.
2
Discrete DRAM doesn’t do Serial… yet
v  Memory is the last holdout that still hasn’t gone serial
Challenges of Implementing DDR
v  Design, development & qualification effort
v  DRAM bus trace-length matching requirements
(Source: Agilent)
Tradeoffs: Serial vs. Parallel
v  On the Chip
v  On the Board
§  SerDes adds costs on chip
•  MUX deMUX
•  2.5GHz chip with 25 Gbps IO
§  Fewer lanes
•  25GHz is more challenging, but is
solvable
§  Longer reach than parallel
•  Easier board floor planning
•  Distributed thermal loads
§  Greater noise immunity
v  IO Bandwidth / Chip Area
§  Roughly the same on chip
§  Depends on the range
v  Is it a balanced tradeoff?
v  IO Bandwidth / Power
§  It depends on reach
MemCon 2015 - October 12th
Copyright ©MoSys, Inc. 2015. All rights reserved.
5
HMC gives them the bandwidth they need
v  “DDR has run out of pins on the package”
Source: Xilinx Technology Outlook, Liam Madden, FPL, Sept 2014
TSV Based DRAM Stacks
v  The performance potential of TSV based DRAM stacks can be
realized with two very different interface and packaging
solutions.
v  High Bandwidth Memory (HBM)
§  Evolutionary
§  wide, parallel interface
v  Hybrid Memory Cube (HMC)
§  high performance serial interface.
v  Both solutions have their place in new systems design and there
are advancements in both options on the horizon.
and HBM is coming …
v  Just look at what AMD and nvidia have planned
HBM Gen1 shipping now
HBM Gen2 coming soon
Interposer-based MCM
v  Xilinx highlighted that the critical element was not the technology
but the supply chain.
Source: Xilinx Technology Outlook, Liam Madden, FPL, Sept 2014
Economics of Direct Attach HBM
v  @Customer: Can customer afford Direct Attach HBM?
§  Interposer development costs
§  Fixed memory footprint
§  Special Supply Chain
§  What is the volume required to recoup incremental costs?
v  @Manufacturer: Can DA-HBM exist in a low volume, high mix
manufacturing environment?
Serial HBM:
High Performance, Low Pin Count
Serial HBM Solution
v  Serial HBM Reduces Risk at the Customer
§  Lower technology risk
•  Pin count advantage for the host device
•  Ease of routing a serial interface
•  Standard CEI interface
•  Scalable and versatile
§  Component-type supply chain
•  Inventories
•  Test and burn-in
§  Cost advantages
•  Standard board assembly
v  Serial HBM Markets
§  Networking
•  Packet buffering and high-capacity tables
§  Embedded
•  Supports a range of capacities and speeds with long product lifecycles
•  Protects customers from changes to the HBM memory interface on the host
v  All the bandwidth but none of the headaches of DA-HBM
(Figure: serial-interface HBM, a shim die connecting HBM over GCI)
Flexible Capacity Expansion: Serial
v  One host port of 16 lanes can connect to 1, 2 or 4 devices
v  No additional bus loading or pin count
v  No throughput degradation
§  Expansion example shows the MoSys Bandwidth Engine
(Figure: 1x = 16 lanes to one device; 2x = 8 + 8 lanes to two devices;
4x = 4 + 4 + 4 + 4 lanes to four devices)
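The fan-out above can be sketched in a few lines; this is a minimal sketch assuming an illustrative 25 Gbps per-lane rate (not a figure from this deck). The invariant is that splitting the port over more devices divides the lanes per device but leaves the host pin count and aggregate bandwidth unchanged.

```python
# Sketch: a 16-lane host port split across 1, 2 or 4 serial devices.
# LANE_RATE_GBPS is an illustrative assumption, not a slide figure.

LANE_RATE_GBPS = 25  # assumed per-lane rate

def expansion(total_lanes, devices):
    assert total_lanes % devices == 0, "lanes must split evenly"
    return {
        "devices": devices,
        "lanes_per_device": total_lanes // devices,
        "aggregate_gbps": total_lanes * LANE_RATE_GBPS,  # independent of fan-out
    }

for n in (1, 2, 4):
    print(expansion(16, n))
```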
HBM MCM Yield Analysis
HBM Memory Solutions
v  Direct Attach HBM: ASIC plus 4 HBM stacks on a 55 um interposer
§  MCM yield
§  Single sourced
§  Interface support longevity
§  Memory controller complexity and power added to ASIC
v  Serial HBM Package on Package: HBM plus shim, 180 um substrate
§  Tested and optional burn-in of component HBM before MCM assembly
§  Shim features optimized for application
§  Incremental power for the additional shim ASIC
§  USR SerDes for MCM
v  Serial HBM On Motherboard: HBM plus shim, 180 um substrate
§  Lowest cost, highest yield solution
§  30% board area increase
§  Easiest thermal solution
§  VSR SerDes for motherboard
Serial vs. Direct Attach Value Comparison

Attribute      | Serial HBM                                             | Direct Attach HBM
Technical Risk | + Smaller interposer; + discrete component BI & test   | - MCM yield; - HBM repair
Cost           | + Lower yielded cost; + supply-chain inventory         | - MCM development cost; - MCM yield
Power          | - Incremental power/BW                                 | + Lower power
Thermal        | + Distributed sources                                  | - Higher thermal density
Time to Market | + Proven standard SerDes; + discrete component design  | - HBM interface IP availability; - MCM complexity
Flexibility    | + On or off substrate; + memory expansion (depopulate or not); + fungible SerDes | - Single-purpose HBM IO block
Reliability    | + Burn-in option; + field repair managed in Serial HBM | - JEDEC field repair in host ASIC
Normalized Yielded Cost of HBM
Assembly yield expected to be 95%
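A minimal sketch of how such a normalized yielded cost can be computed, assuming a scrap-on-failure model (a bad assembly scraps every component). The 95% assembly yield is the slide's figure; the component costs and known-good-die yields in the example are illustrative assumptions.

```python
# Normalized yielded cost of an MCM: bill of materials divided by the
# compound yield (assembly yield times each component's KGD yield).
# ASSEMBLY_YIELD is from the slide; the example costs/yields are not.

ASSEMBLY_YIELD = 0.95

def yielded_cost(component_costs, component_yields,
                 assembly_yield=ASSEMBLY_YIELD):
    bom = sum(component_costs)
    compound = assembly_yield
    for y in component_yields:
        compound *= y
    return bom / compound

# Hypothetical MCM: one ASIC (normalized cost 1.0) plus 4 HBM stacks
# (0.5 each), each component at an assumed 99% known-good yield.
print(round(yielded_cost([1.0] + [0.5] * 4, [0.99] * 5), 3))
```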
HMC – Hybrid Memory Cube
v  Breakthrough in power due to
TSV based construction
§  5 pJ/b DRAM only
v  Combined with Logic die resulting
in 24.5W per 1Tbps
§  3 links @ 12.5G
§  24.5 pJ/b total (vs. 39 for DDR4)
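These pJ/b figures convert directly to watts, since 1 pJ/b at 1 Tb/s is exactly 1 W. A quick check of the arithmetic (both energy-per-bit figures are from the slide):

```python
# 1 pJ/b * 1 Tb/s = (1e-12 J/b) * (1e12 b/s) = 1 W,
# so power in watts is just pJ/b times bandwidth in Tbps.

def link_power_watts(pj_per_bit, bandwidth_tbps):
    return pj_per_bit * bandwidth_tbps

print(link_power_watts(24.5, 1.0))  # HMC incl. logic die: 24.5 W per 1 Tbps
print(link_power_watts(39.0, 1.0))  # DDR4 at the same 1 Tbps
```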
Serial vs. Parallel Memory Comparison

Attribute          | Bandwidth Engine BE-2 | BE-3      | Hybrid Memory Cube (HMC) | High Bandwidth Memory (JEDEC) | DDR4 (JEDEC)
Physical Interface | Serial CEI Standard   | Serial CEI Standard | Serial CEI Std | JEDEC HBM IO       | JEDEC DDR4 IO
Protocol           | GigaChip™ Interface   | GigaChip™ Interface | HMC Consortium | RAS/CAS            | RAS/CAS
Source of Supply   | Dual-Sourced          | Dual-Sourced | Single Sourced        | Multi-Sourced      | Multi-Sourced
Access             | TDM                   | TDM       | Scheduler / Switch       | Banked RAM         | Banked RAM
Capacity           | 576 Mb                | 1152 Mb   | 16~32 Gb                 | 32-64 Gb           | 4-8 Gb
Buffer Bandwidth   | 400 Gbps              | 800 Gbps  | 1280 Gbps                | 2048 Gbps          | 38 Gbps
Transaction Rate   | >4.5 Bt/s             | >10 Bt/s  | 2.6~2.9 Bt/s             | TBD                | 0.2 Bt/s
Signal Pins        | 66                    | 66        | 272                      | ~1600              | 42
Package            | BGA 19x19             | BGA 25x25 | BGA 31x31                | KGSD               | BGA 8x12
Power              | 7-11W                 | TBA       | ~28W                     | 8W estimated       | 0.7W

(Figure: interface diagrams. BE/HMC: serial IO, 8 or 16 lanes per link;
DDR4: two channels, ~16+20 pins per channel; HBM: 8 channels & 128 banks,
~1600 pins, Si interposer)
Future TSV DRAM Comparison

Attribute                | Direct Attach HBM | Serial HBM concept              | HMC
Bandwidth                | equal             | equal                           | equal
Interposer / yield cost  | CPU               | Memory                          | Memory
Power                    | 1x                | <2x                             | >3x
Latency                  | Lowest            | Low                             | ?
Deterministic            | Yes               | Yes                             | No
Longevity of interface   | 5 years           | indefinitely                    | ?
Field repair             | Host based        | Serial HBM based                | HMC based
Host IO (PHY & pins)     | Single purpose    | General purpose and LP SerDes   | ?
Supply chain             | MCM-type          | Component                       | Component
Test or burn-in          | Not possible      | Possible                        | ?
Application performance  | none              | Optimized for application       | Generic
Source                   | Multi-sourced     | Multi-sourced                   | HMC Specification, single source
What to build with? It depends…
The Ultimate Network Processor's Memory Implementation
v  At MemCon 2014, MoSys presented on extreme memories for networking
and showed the relative position and value of different memories for a
1.2 Tbps network processor.
v  HBM for buffering
v  Serial memories for header processing and search
v  Off-chip PHY to optimize the datapath
v  This is a great point solution for a 1.2 Tbps datapath
v  What about less extreme systems?
Example 400G Line Card w/ EZchip NPS
v  MoSys MSRZ30 adds 50% system memory bandwidth
§  Intelligent offload
§  Flexible feature & performance expansion
§  Attached over 8-16 serial lanes
v  EZchip NPS packet forwarding engine: flexibility + performance
§  "C" programmable processors + L2-L7 accelerators
§  Embedded memory and hardware accelerators
§  Memory I/O provides memory bandwidth for packet buffering, cores
and HW accelerators
v  Packet buffer: 24 x DDR4 devices
(Figure: front panel, framer/gearbox, NPS with uP cores and DDR4 banks,
FIC to backplane, MSRZ30 offload)
800GE Using Serial HBM & BE3
v  Two 400G PFEs (ASIC/FPGA), each with 4 x 100G optics modules through
LineSpeed gearbox/retimers (GB/RT)
v  Bandwidth Engine Gen 3 and Serial HBM (shim) attached over GCI
v  Shared between the two PFEs:
•  FIB tables
•  Statistics
•  Metering
•  Semaphores
•  Packet buffers
Conclusion
v  Serial memory offers advantages over Direct Attach HBM
§ 
§ 
§ 
§ 
§ 
Economics driven by Supply Chain
Flexible and adaptable
Scalable performance
Quality and reliability
Simplifying board design and cooling
v  Pick your memory for your application
§  Memory core performance and capacity (DRAM vs. others)
§  Architecture ( Point to Point versus Chainable)
§  IO serial vs. parallel
v  DDR DRAM is the defacto standard based on decades of
evolution and optimization.
§  If DDR doesn’t meet your needs there are other options available.
MemCon 2015 - October 12th
Copyright ©MoSys, Inc. 2015. All rights reserved.
25
Bandwidth Engine Serial Interface
(GCI)
Mark Baumann
Director of Applications
Topics
v  Parallel Interface evolution – faster, wider à
How long can this Last?
v  Serial Interface evolution – NRZ à PAM4 à
emerging
v  Interface efficiency – HMC vs. GCI vs. ILA
v  Standards based solutions vs. proprietary
v  Interface for offload (abstracted)
§  serial is better (variable size transfers)
§  Splitting transaction layer from transport layer
v  Purpose built vs. Fungible IO
MemCon 2015 - October 12th
Copyright ©MoSys, Inc. 2015. All rights reserved.
27
NPU Interface Options Today
v  Memory & co-processor, DDR style: DDR-3 SDRAM (SSTL/HSTL),
RLDRAM (SSTL/HSTL), QDR SRAM (SSTL/HSTL)
v  Memory & co-processor, serial style: KBP/TCAM (SerDes)
v  Network & backplane interfaces, all serial: XAUI, 10G KR,
Interlaken, PCIe
NPU Interfaces Using Serial
v  Memory & co-processor, serial style: DDR-3 SDRAM through a DDR-3
bridge (GCI), Bandwidth Engine (GCI), serial SRAM?, KBP/TCAM, all over
SerDes
v  Network & backplane interfaces: XAUI, 10G KR, Interlaken, PCIe
v  3x to 4x bandwidth density per mm2, enabled by 10G KR / GCI-enabled
SerDes
NPU Interfaces Using Serial
v  Memory & co-processor, serial style: HMC or Serial HBM, Bandwidth
Engine (GCI), serial SRAM?, KBP/TCAM, all over SerDes
v  Network & backplane interfaces: XAUI, 10G KR, Interlaken, PCIe
v  3x to 4x bandwidth density per mm2, enabled by 10G KR / GCI-enabled
SerDes
Parallel vs Serial
GigaChip Interface Layers & Frame Format
v  Protocol layers
§  Transaction: application specific (BE, QDR, TCAM…)
§  Data Link (GigaChip Interface protocol): reliable transport of
frames via CRC & positive ack
§  Physical Coding Sublayer (PCS): link initialization, lane deskew,
scrambling
§  Physical Media Access: electrical, CEI-compatible SerDes over PC
board trace
v  Data Link Layer frame format: Payload (72b) | DLL (1b) | Rx Ack (1b) | CRC (6b)
v  Frame striped across SerDes lanes (1, 2, 4, 8, 16)
§  Modulo 10 UI, fixed size
§  Sized to meet the needs of the application
§  >90% bandwidth efficiency at 80b
v  Data Link Layer operations
§  The DLL bit indicates whether the payload is a Data Link Layer
operation or a data payload
§  Data Link Layer operations: Replay, Pause (no-op)
v  Data payload format is up to the application
§  Op codes, address, data… formatting is left to the higher level
§  For memory transactions: 1 frame = 1 transaction
§  For packets: a variable number of frames can be used
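A minimal sketch of the 80b frame described above. The field widths and the 90% payload efficiency follow from the slide; the 6-bit CRC polynomial is an illustrative assumption, since the real polynomial is defined by the GCI specification.

```python
# GCI Data Link Layer frame: 72b payload | 1b DLL | 1b Rx Ack | 6b CRC.
# crc6() uses an assumed polynomial (x^6 + x + 1) purely to illustrate
# the structure; it is not the CRC defined by the GCI specification.

PAYLOAD_BITS, DLL_BITS, ACK_BITS, CRC_BITS = 72, 1, 1, 6
FRAME_BITS = PAYLOAD_BITS + DLL_BITS + ACK_BITS + CRC_BITS  # 80

def crc6(value, bits):
    """Toy 6-bit CRC by long division (assumed polynomial)."""
    poly = 0b1000011
    value <<= 6
    for i in range(bits + 6 - 1, 5, -1):
        if value >> i & 1:
            value ^= poly << (i - 6)
    return value & 0x3F

def pack_frame(payload, dll, ack):
    body = (payload << 2) | (dll << 1) | ack          # 74-bit frame body
    return (body << CRC_BITS) | crc6(body, PAYLOAD_BITS + 2)

print(f"bandwidth efficiency: {PAYLOAD_BITS / FRAME_BITS:.0%}")  # 90%
```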
CRC Error Handling w/Positive Ack
v  Device A (Tx): a transactor queue feeds the Tx SerDes (PISO); each
frame gets a CRC generated over its 72b body and is also held in a Tx
replay queue
v  Device B (Rx): after the Rx SerDes (SIPO), the CRC is checked;
frames post to the Rx target transactor queue only if the CRC is OK,
and the Rx ack counter advances
v  The 6b ack count travels back to Device A, which compares it with
its own count and replays from the replay queue when the link is
"stuck"
v  On a CRC error the receiver freezes its ack count and stops
posting; posting and acking resume on the Replay frame
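The replay flow above can be modeled in a few lines. This is a toy sketch, not the GCI implementation: the frame dicts and the `corrupt` flag stand in for serialized frames and link bit errors.

```python
# Toy model of positive-ack replay: the transmitter keeps every sent
# frame until the receiver's ack count covers it; a CRC error freezes
# the ack count, and comparing counts drives the replay.

from collections import deque

class Transmitter:
    def __init__(self):
        self.replay_queue = deque()  # sent but not yet acked
        self.sent = 0

    def send(self, frame, rx, corrupt=False):
        self.replay_queue.append(frame)
        self.sent += 1
        rx.deliver(frame, corrupt)

    def on_ack(self, ack_count, rx):
        while len(self.replay_queue) > self.sent - ack_count:
            self.replay_queue.popleft()        # covered by the ack count
        for frame in list(self.replay_queue):  # link "stuck": replay the rest
            rx.deliver(frame, corrupt=False)

class Receiver:
    def __init__(self):
        self.ack_count = 0  # frozen on CRC error until replay succeeds
        self.posted = []

    def deliver(self, frame, corrupt=False):
        if not corrupt:                        # stands in for the CRC check
            self.posted.append(frame["payload"])
            self.ack_count += 1
        # on error: frame dropped, ack count frozen, wait for replay

tx, rx = Transmitter(), Receiver()
tx.send({"payload": "A"}, rx)
tx.send({"payload": "B"}, rx, corrupt=True)  # hit by a bit error in flight
tx.on_ack(rx.ack_count, rx)                  # ack=1 < sent=2, so replay "B"
print(rx.posted)  # ['A', 'B']
```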
Multi Core => Multi-Partition & Multi-bank
v  Multi-threaded, multi-core packet processor (ingress/egress, cores
0 … n) allows for high processing throughput
v  Multiple serial links allow for concurrent transport operations
(800 Gb/s)
v  Multi-bank, multi-partition memory allows for high access
availability
v  ALU for functional acceleration; local processing minimizes
intra-chip traffic
v  BIST and self-repair allow extended carrier-class & in-package
repair
v  Multi-cycle scheduler
(Figure: Bandwidth Engine internal architecture)
Protocol Transfer Efficiency Comparison:
Range of Payload Sizes and Applications
(Charts: packet header processing application, read/write data transfer
efficiency vs. payload size 0-40 B for BE 50:50 and HMC 50:50; packet
buffering applications, read-only data efficiency vs. payload size
0-180 B for BE, ILA, and HMC at 32B, 64B and 128B block sizes)
Efficiency includes transaction & transport protocol:
Transfer Efficiency = Data / (CMD + Address + Data + Transport Protocol)
Note GCI: GCI + TL 2.0
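The efficiency formula above, as a runnable function. The overhead byte counts in the example are illustrative assumptions (a few bytes of command/address plus transport framing), not the actual GCI, HMC or Interlaken overheads; they just show why small payloads favor a low-overhead protocol.

```python
# Transfer Efficiency = Data / (CMD + Address + Data + Transport),
# computed for a range of payload sizes with assumed fixed overheads.

def transfer_efficiency(data_bytes, cmd_addr_bytes, transport_bytes):
    return data_bytes / (cmd_addr_bytes + data_bytes + transport_bytes)

# Small transfers amortize a fixed overhead poorly; large ones well.
for payload in (4, 16, 64, 128):
    eff = transfer_efficiency(payload, cmd_addr_bytes=5, transport_bytes=3)
    print(f"{payload:4d} B payload -> {eff:.0%}")
```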
Protocol Transport Efficiency Comparison:
GCI Optimized For Smaller Transfers
(Chart: transport efficiency vs. frame size 0-80 B for GCI,
GCI + TL 2.0, and Interlaken (ILA). At header-processing frame sizes
GCI is ~2x Interlaken; at packet-transfer frame sizes GCI ≈ Interlaken.)
Serial Link Rate Road Map
v  Xilinx UltraScale+ (2016): 33G GTY SerDes
v  BE3 (2016 Q1): 31G SerDes
v  56G PAM4 is being demonstrated now
CEI-56G Will Address Chip-to-Chip, Module, +
Summary
v  GCI is a proven chip to chip reliable transport protocol
§  Multiple designs in FPGA, ASIC and ASSP in production systems
v  GCI Specification is freely available without restriction on use
§  Same as Interlaken model
v  GCI protocol is designed to evolve as the CEI standard evolves
v  The inherent performance efficiency of GCI naturally equates to
improved energy efficiency
MemCon 2015 - October 12th
Copyright ©MoSys, Inc. 2015. All rights reserved.
39
Thank You
CMOS Memory Core Technologies
v  Tradeoffs across memory core technologies: transaction rate, power,
mm2/bit, cost
(Figure: technologies arranged by #bit cells per sense amp, from
logic-fab cores (TCAM, SRAM, eDRAM, LL/RL DRAM) to DRAM-fab cores with
limited metal (HMC, HBM, DDR/Mobile DRAM))