Serial Memories Fill a Need
MemCon 2015 - October 12th
Copyright ©MoSys, Inc. 2015. All rights reserved.

Agenda
v Michael Sporer, Director of Marketing
§ The future of parallel versus serial interfaces for memory
v Mark Baumann, Director of Applications Engineering
§ Drawing on MoSys's experience developing and introducing the GigaChip Interface and the 1st, 2nd and 3rd generations of Bandwidth Engine ICs, we will describe several options for future memory interface solutions.

Discrete DRAM doesn't do Serial... yet
v Memory is the last holdout that still hasn't gone serial.

Challenges of Implementing DDR
v Design, development and qualification: DRAM bus trace-length matching requirements.
[Figure: matched-length DDR trace routing. Source: Agilent]

Tradeoffs: Serial vs. Parallel
v On the chip
§ SerDes adds cost on chip
• MUX / deMUX (e.g. a 2.5 GHz chip with 25 Gbps IO)
• 25 GHz is more challenging, but is solvable
v On the board
§ Fewer lanes
§ Longer reach than parallel
• Easier board floor planning
• Distributed thermal loads
§ Greater noise immunity
v IO bandwidth / chip area
§ Roughly the same on chip
§ Depends on the range
v IO bandwidth / power
§ It depends on reach
v Is it a balanced tradeoff?

HMC gives them the bandwidth they need
v "DDR has run out of pins on the package"
Source: Xilinx Technology Outlook, Liam Madden, FPL, Sept 2014

TSV Based DRAM Stacks
v The performance potential of TSV-based DRAM stacks can be realized with two very different interface and packaging solutions.
v High Bandwidth Memory (HBM)
§ Evolutionary
§ Wide, parallel interface
v Hybrid Memory Cube (HMC)
§ High-performance serial interface
v Both solutions have their place in new system designs, and there are advancements in both options on the horizon.

... and HBM is coming
v Just look at what AMD and NVIDIA have planned
§ HBM Gen1 shipping now
§ HBM Gen2 coming soon

Interposer based MCM
v Xilinx highlighted that the technology wasn't the critical element; the supply chain was.
Source: Xilinx Technology Outlook, Liam Madden, FPL, Sept 2014

Economics of Direct Attach HBM
v @Customer: Can the customer afford Direct Attach HBM?
§ Interposer development costs
§ Fixed memory footprint
§ Special supply chain
§ What is the volume required to recoup the incremental costs?
v @Manufacturer: Can DA-HBM exist in a low-volume, high-mix manufacturing environment?

Serial HBM: High Performance, Low Pin Count
[Figure: Serial HBM solution: the host connects over a GCI serial interface to a shim die that drives the HBM stack]
v Serial HBM reduces risk at the customer
§ Lower technology risk
• Pin-count advantage for the host device
• Ease of routing a serial interface
• Standard CEI interface
• Scalable and versatile
§ Component-type supply chain
• Inventories
• Test and burn-in
§ Cost advantages
• Standard board assembly
v Serial HBM markets
§ Networking
• Packet buffering and high-capacity tables
§ Embedded
• Supports a range of capacities and speeds with long product lifecycles
• Protects customers from changes to the HBM memory interface on the host
v All the bandwidth, but none of the headaches of DA-HBM
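The "volume required to recoup incremental costs" question on the economics slide reduces to a break-even calculation: the fixed interposer NRE divided by the per-unit cost difference between the packaging options. The figures below are hypothetical, chosen only to illustrate the arithmetic; they are not from the slides.

```python
def breakeven_units(interposer_nre: float, per_unit_saving: float) -> float:
    """Units needed before DA-HBM's fixed NRE is recouped by per-unit savings."""
    return interposer_nre / per_unit_saving

# Hypothetical example: $2M of interposer development cost against a
# $40/unit saving implies a 50,000-unit break-even volume.
volume = breakeven_units(2_000_000, 40)
```

In a low-volume, high-mix manufacturing environment that threshold may never be reached, which is the slide's point.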
Flexible Capacity Expansion: Serial
v One host port of 16 lanes can connect to 1, 2 or 4 devices (1 x 16, 2 x 8 or 4 x 4 lanes)
v No additional bus loading or pin count
v No throughput degradation
§ Expansion example shows the MoSys Bandwidth Engine

HBM Memory Solutions
v Direct Attach HBM (4 HBM stacks)
§ MCM yield
§ Single sourced
§ Interface support longevity
§ Memory controller complexity and power added to the ASIC
v Serial HBM on the motherboard
§ VSR SerDes across the motherboard
§ Lowest-cost, highest-yield solution
§ 30% board area increase
§ Easiest thermal solution
v Serial HBM package-on-package
§ Tested (and optionally burned-in) component HBM before MCM assembly
§ Shim features optimized for the application
§ Incremental power for the additional shim
§ USR SerDes within the MCM
[Figure: the three packaging options, with 55 um and 180 um interconnect pitches annotated]

Serial vs. Direct Attach Value Comparison

Attribute      | Serial HBM                                                 | Direct Attach HBM
Technical Risk | + smaller interposer; + discrete-component BI & test       | - MCM yield; - HBM repair
Cost           | + lower yielded cost; + supply-chain inventory             | - MCM development cost; - MCM yield
Power          | - incremental power per bandwidth                          | + lower power
Thermal        | + distributed sources                                      | - higher thermal density
Time to Market | + proven standard SerDes; + discrete-component design      | - HBM interface IP availability; - MCM complexity
Flexibility    | + on- or off-substrate memory expansion; + fungible SerDes | - depopulate or not; - single-purpose HBM IO block
Reliability    | + burn-in option; + field repair managed in Serial HBM     | - JEDEC field repair in host ASIC
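The lane-splitting scheme on the capacity-expansion slide (16 lanes to one device, 8+8 to two, or 4+4+4+4 to four) can be sanity-checked in a few lines: the aggregate lane count, and hence the port throughput, is the same in every configuration.

```python
def fanout(host_lanes: int, devices: int) -> list[int]:
    """Split one host port's serial lanes evenly across the attached devices."""
    assert host_lanes % devices == 0, "lanes must divide evenly"
    return [host_lanes // devices] * devices

# 16-lane host port, as on the slide: 1x16, 2x8 or 4x4.
for n_devices in (1, 2, 4):
    lanes = fanout(16, n_devices)
    assert sum(lanes) == 16   # total lanes (and throughput) unchanged
```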
Normalized Yielded Cost of HBM
v Assembly yield is expected to be 95%.
[Chart: normalized yielded cost of the HBM assembly options]

HMC - Hybrid Memory Cube
v Breakthrough in power due to TSV-based construction
§ 5 pJ/b for the DRAM alone
v Combined with the logic die, 24.5 W per 1 Tbps
§ 3 links @ 12.5G
§ 24.5 pJ/b total (vs. 39 pJ/b for DDR4)

Serial vs. Parallel Memory Comparison

Attribute          | BE-2                | BE-3                | HMC            | HBM (JEDEC)    | DDR4 (JEDEC)
Physical Interface | Serial CEI std      | Serial CEI std      | Serial CEI std | JEDEC HBM IO   | JEDEC DDR4 IO
Protocol           | GigaChip™ Interface | GigaChip™ Interface | HMC Consortium | RAS/CAS        | RAS/CAS
Source of Supply   | Dual-sourced        | Dual-sourced        | Single-sourced | Multi-sourced  | Multi-sourced
Access             | TDM scheduler       | TDM scheduler       | Sched./switch  | Banked RAM     | Banked RAM
Capacity           | 576 Mb              | 1152 Mb             | 16~32 Gb       | 32-64 Gb       | 4-8 Gb
Buffer Bandwidth   | 400 Gbps            | 800 Gbps            | 1280 Gbps      | 2048 Gbps      | 38 Gbps
Transaction Rate   | >4.5 Bt/s           | >10 Bt/s            | 2.6~2.9 Bt/s   | TBD            | 0.2 Bt/s
Signal Pins        | 66                  | 66                  | 272            | ~1600          | 42
Package            | BGA 19x19           | BGA 25x25           | BGA 31x31      | KGSD           | BGA 8x12
Power              | 7-11W               | TBA                 | ~28W           | 8W (estimated) | 0.7W

[Figure: block diagrams: the BE and HMC place a TDM scheduler or switch behind 8- or 16-lane serial IO; DDR4 uses two channels of roughly 16+20 pins each; HBM uses 8 channels and 128 banks, ~1600 pins, on a Si interposer]

Future TSV DRAM Comparison

Attribute               | Direct Attach HBM | Serial HBM (concept)        | HMC
Bandwidth               | equal             | equal                       | equal
Interposer / yield cost | CPU               | Memory                      | Memory
Power                   | 1x                | <2x                         | >3x
Latency                 | Lowest            | Low                         | ?
Deterministic           | Yes               | Yes                         | No
Longevity of interface  | 5 years           | Indefinite                  | ?
Field repair            | Host based        | Serial HBM based            | HMC based
Host IO (PHY & pins)    | Single purpose    | General purpose & LP SerDes | General purpose & LP SerDes
Test or burn-in         | Not possible      | Possible                    | Possible
Supply chain            | MCM-type          | Component                   | Component
Application performance | None              | Optimized for application   | Generic
Source                  | Single source     | Multi-sourced               | HMC specification; single source

What to Build With? It Depends...
The Ultimate Network Processor's Memory Implementation
v At MemCon 2014, MoSys presented on extreme memories for networking and showed the relative position and value of different memories for a 1.2 Tbps network processor:
§ HBM for buffering
§ Serial memories for header processing and search
§ Off-chip PHY to optimize the datapath
v This is a great point solution for a 1.2 Tbps datapath.
v What about less extreme systems?

Example 400G Line Card with EZchip NPS
v The MoSys Z30 adds 50% system memory bandwidth through intelligent offload, with flexible feature and performance expansion
v EZchip NPS combines flexibility and performance: "C"-programmable processors plus L2-L7 accelerators
v Memory bandwidth serves packet buffering, the cores and the hardware accelerators
v Packet buffer: 24 x DDR4 devices
[Figure: line-card block diagram: front-panel optics, MoSys framer/gearbox, MSR Z30, and a packet forwarding engine with uP cores, hardware accelerators, embedded memory, DDR4 banks, an FIC and 8-16 serial backplane lanes]

800GE Using Serial HBM & BE3
[Figure: two 400G PFEs (ASIC/FPGA), each fed by a 4 x 100G optics module through a LineSpeed gearbox/retimer (GB/RT), sharing a Bandwidth Engine Gen 3]
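The per-bit energy figures on the HMC slide multiply out directly, since 1 pJ/b sustained over 1 Tb/s is exactly 1 W:

```python
def power_watts(pj_per_bit: float, tbps: float) -> float:
    """Interface-plus-DRAM power: energy per bit times sustained bandwidth.
    Units cancel cleanly: pJ/b * Tb/s = 1e-12 J/b * 1e12 b/s = W."""
    return pj_per_bit * tbps

assert power_watts(24.5, 1.0) == 24.5   # HMC total: 24.5 W per 1 Tbps
assert power_watts(39.0, 1.0) == 39.0   # DDR4 at the same bandwidth
assert power_watts(5.0, 1.0) == 5.0     # the TSV DRAM stack alone
```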
v Shared between the two 400G PFEs over GCI (via the shim):
§ FIB tables
§ Statistics
§ Metering
§ Semaphores
§ Packet buffers

Conclusion
v Serial memory offers advantages over Direct Attach HBM
§ Economics driven by the supply chain
§ Flexible and adaptable
§ Scalable performance
§ Quality and reliability
§ Simplified board design and cooling
v Pick your memory for your application
§ Memory core performance and capacity (DRAM vs. others)
§ Architecture (point-to-point versus chainable)
§ IO: serial vs. parallel
v DDR DRAM is the de facto standard, built on decades of evolution and optimization.
§ If DDR doesn't meet your needs, there are other options available.

Bandwidth Engine Serial Interface (GCI)
Mark Baumann, Director of Applications

Topics
v Parallel interface evolution: faster, wider -> how long can this last?
v Serial interface evolution: NRZ -> PAM4 -> emerging
v Interface efficiency: HMC vs. GCI vs. ILA
v Standards-based solutions vs. proprietary
v Interfaces for offload (abstracted)
§ Serial is better (variable-size transfers)
§ Splitting the transaction layer from the transport layer
v Purpose-built vs. fungible IO

NPU Interface Options Today
[Figure: an NPU with DDR-style SSTL/HSTL interfaces to DDR-3 SDRAM, RLDRAM and QDR SRAM; SerDes interfaces to the KBP/TCAM and to the network and backplane (XAUI, 10G KR, Interlaken, PCIe)]

NPU Interfaces Using Serial
v Moving memory and co-processors to serial yields 3x to 4x bandwidth density per mm², enabled by 10G-KR- and GCI-enabled SerDes
[Figure: the same NPU with DDR-3 SDRAM behind a bridge, the Bandwidth Engine on GCI, a serial SRAM (?), and the KBP/TCAM on SerDes; the XAUI, 10G KR, Interlaken and PCIe interfaces are unchanged]
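The bandwidth-density advantage claimed for serial can be approximated from the comparison-table numbers as bandwidth per signal pin. This is a rough proxy only: the slide's 3x-4x figure is per mm² of die area, which also depends on pad pitch.

```python
def gbps_per_pin(gbps: float, signal_pins: int) -> float:
    """Interface bandwidth carried per signal pin."""
    return gbps / signal_pins

be2  = gbps_per_pin(400, 66)   # BE-2: 400 Gbps over 66 signal pins
ddr4 = gbps_per_pin(38, 42)    # DDR4 device: 38 Gbps over 42 signal pins
assert be2 / ddr4 > 6          # serial carries over 6x the bandwidth per pin
```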
v The same serial NPU topology applies with an HMC or Serial HBM device in place of the Bandwidth Engine.

Parallel vs Serial
[Chart: parallel versus serial signaling comparison]

GigaChip Interface Layers & Frame Format
v Protocol layers
§ Transaction: application specific (BE, QDR, TCAM...)
§ GigaChip Interface Protocol data link: reliable transport of frames via CRC and positive ack
§ Physical Coding Sublayer (PCS): link initialization, lane deskew, scrambling
§ Physical media access: CEI-compatible SerDes
§ Electrical: PC board trace
v Data link layer frame format: Payload 72b | DLL 1b | Rx Ack 1b | CRC 6b
v Frames are striped across the SerDes lanes (1, 2, 4, 8 or 16)
§ Modulo 10 UI, fixed size
§ Sized to meet the needs of the application
§ >90% bandwidth efficiency at 80b
v Data link layer operations
§ The DLL bit indicates whether the frame carries a data link layer operation or a data payload
§ Data link layer operations: replay, pause (no-op)
v Data payload format is up to the application
§ Op codes, address, data... formatting is left to the higher level
§ For memory transactions: 1 frame = 1 transaction
§ For packets: a variable number of frames can be used
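The 80b frame budget on the frame-format slide can be tallied directly: 72 payload bits against 8 bits of DLL, ack and CRC overhead gives the roughly 90% bandwidth efficiency the slide quotes.

```python
# GCI data link layer frame fields, in bits, from the frame-format slide.
PAYLOAD, DLL, RX_ACK, CRC = 72, 1, 1, 6

def frame_efficiency() -> float:
    """Fraction of each frame that carries payload."""
    total = PAYLOAD + DLL + RX_ACK + CRC
    assert total == 80            # frames are a fixed 80 bits
    return PAYLOAD / total

assert frame_efficiency() == 0.9  # 72/80: 90% of raw lane bandwidth
```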
CRC Error Handling with Positive Ack
v Transmitter (Device A): frames from the request transactor queue pass through a Tx replay queue, a CRC is generated over the 72b payload, and the frame is serialized (PISO) onto the Tx SerDes. The transmitter compares the returned ack count with its own and begins a replay when the count is "stuck."
v Receiver (Device B): the CRC of each incoming frame is checked; good frames are posted to the target transactor queue and the Rx ack counter advances. On a CRC error, posting and the ack counter freeze, and both resume when the replay frame arrives.
[Figure: transmit and receive datapaths with replay queue, CRC generate/check, ack counters, and 72+6 bit frames across the SerDes]

Multi Core => Multi-Partition & Multi-Bank
v Multi-threaded, multi-core packet processors allow high processing throughput
v Multiple serial links allow concurrent transport operations
v Multi-bank, multi-partition memory allows high access availability
v An on-chip ALU provides functional acceleration; local processing minimizes intra-chip traffic
v A multi-cycle scheduler plus BIST and self-repair allow extended carrier-class operation and in-package repair
v 800 Gb/s aggregate access (Bandwidth Engine)

Protocol Transfer Efficiency Comparison: Range of Payload Sizes and Applications
[Charts: read-only transfer efficiency vs. payload size for a packet-header-processing application (BE vs. HMC), and read/write (50:50) efficiency vs. payload size for packet-buffering applications (BE, ILA, and HMC at 32B, 64B and 128B block sizes)]
v Efficiency includes the transaction and transport protocol:
Transfer Efficiency = Data / (CMD + Address + Data + Transport Protocol)
v Note: the GCI curves include GCI + TL 2.0
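The freeze-and-replay behavior on the CRC error-handling slide can be sketched as a toy model. This is an illustrative sketch, not MoSys code: zlib.crc32 stands in for the real 6-bit frame CRC, and the explicit is_replay flag stands in for the replay-frame marker.

```python
import zlib

def crc(payload: bytes) -> int:
    # Stand-in for GCI's 6-bit frame CRC.
    return zlib.crc32(payload)

class Sender:
    """Keeps every frame in a replay queue until positively acknowledged."""
    def __init__(self):
        self.replay = []                      # unacked frames, oldest first
    def send(self, payload: bytes):
        self.replay.append(payload)
        return payload, crc(payload)
    def on_ack(self, n: int):
        del self.replay[:n]                   # receiver acked n more frames
    def replay_frames(self):
        return [(p, crc(p)) for p in self.replay]

class Receiver:
    """Posts good frames; freezes after a CRC error until a replay arrives."""
    def __init__(self):
        self.posted = []
        self.frozen = False
    def receive(self, payload: bytes, rx_crc: int, is_replay: bool = False):
        if is_replay:
            self.frozen = False               # replay resumes posting
        if self.frozen:
            return                            # drop frames after an error
        if rx_crc == crc(payload):
            self.posted.append(payload)       # post if CRC OK, advance ack
        else:
            self.frozen = True                # freeze acks, wait for replay

tx, rx = Sender(), Receiver()
for i in range(3):
    payload, c = tx.send(bytes([i]))
    if i == 1:
        c ^= 1                                # corrupt frame 1 in flight
    rx.receive(payload, c)
tx.on_ack(len(rx.posted))                     # only frame 0 got through
for payload, c in tx.replay_frames():         # resend frames 1 and 2
    rx.receive(payload, c, is_replay=True)
```

The key property is that the receiver posts nothing after an error until the transmitter replays from the last acknowledged frame, so frames are delivered exactly once and in order.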
Protocol Transport Efficiency Comparison: GCI Optimized for Smaller Transfers
[Chart: transport efficiency vs. frame size (0-80 bytes) for GCI + TL 2.0 and Interlaken]
v At header-processing frame sizes, GCI is roughly 2x as efficient as Interlaken
v At larger packet-transfer sizes, GCI ≈ Interlaken

Serial Link Rate Road Map
v Xilinx UltraScale+ (2016): 33G GTY SerDes
v BE3 (Q1 2016): 31G SerDes
v 56G PAM4 is being demonstrated now

CEI-56G Will Address Chip-to-Chip, Module, and Beyond
[Chart: CEI-56G reach classes]

Summary
v GCI is a proven chip-to-chip reliable transport protocol
§ Multiple designs in FPGAs, ASICs and ASSPs in production systems
v The GCI specification is freely available without restriction on use
§ The same model as Interlaken
v The GCI protocol is designed to evolve as the CEI standard evolves
v The inherent protocol efficiency of GCI naturally translates into improved energy efficiency

Thank You
Copyright ©MoSys, Inc. 2015. All rights reserved.

CMOS Memory Core Technologies
[Chart: transaction rate, power, mm²/bit and cost versus bit cells per sense amp, spanning logic-fab technologies (TCAM, SRAM, eDRAM) to DRAM-fab technologies with limited metal (LL/RL DRAM, HMC, HBM, DDR, mobile DRAM)]
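The roadmap's jump from ~28-33G NRZ-class rates to 56G PAM4 comes from modulation, not a faster symbol clock: PAM4 encodes log2(4) = 2 bits per symbol, doubling the bit rate at the same baud rate. A minimal check:

```python
import math

def lane_rate_gbps(baud_gbd: float, levels: int) -> float:
    """Serial lane bit rate from symbol rate and modulation levels."""
    return baud_gbd * math.log2(levels)

assert lane_rate_gbps(28, 2) == 28.0   # NRZ: 1 bit per symbol
assert lane_rate_gbps(28, 4) == 56.0   # PAM4: 2 bits per symbol, same baud
```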