Serial Network SDRAM
ENEE 759H, Spring 2003

Introduction
  SDRAM system drawbacks
    - No parallelism for memory accesses
    - Multitude of pins for address/command/data
  Overall Goals
    - Increase parallelism, reduce latency
    - Reduce pin count
    - Attempt to increase bandwidth

Motivation
  Poulton’s idea
    - Bi-directional serial links.
    - Theoretically high bandwidth!
    - Fewer pins required for same functionality!
    - Looks perfect!
  *Graphic from Poulton’s Signaling Tutorial

Evolution I
  Initial design
    - Split topology.
    - Effectively halve latency.
    - Complicated protocol and connection details.
  [Figure labels: Memory Controller; Address, etc.]

Evolution II
  Initial design
    - Individual DRAM chips directly connected.
    - High overall bandwidth.
    - Inflexible, lower capacity for system.
    - We need a better design!
  [Figure labels: Memory Controller; 8 SDRAM Chips]

The Next Step
    - Want simple system interconnects
    - Keep basic SDRAM chip structure intact
    - Utilize the strengths of both parallel and serial connections
    - Create a system that facilitates parallelism

System Overview
  Take a “step back”…
    - Consider memory module interface.
    - Consider inter-chip interface on module.
  [Figure labels: Memory Controller; Memory Modules; Serial Lines @ fast clock]

System Overview
    - 1 logical channel, 4 physical channels
    - 3.2 GHz point-to-point connections
    - Each channel called “module”
    - 5 pins/module on memory controller
    - Intra-module connections: parallel
    - External connections: high-speed serial

Module Topology
  [Figure labels: Memory Controller; Serial Lines @ 3.2 GHz (CLK1, DIN, CLK2, DOUT, CMD); Memory Module; Translator Circuits; DIN/DOUT Buffers; 256 Mbit x8 SDRAM Parts; 8-bit Data-in buses; 8-bit Data-out buses; 18-bit Addr/Cmd buses]

System Details I
  [Figure: translator circuits]
    - DIN Translator: DIN in, 8 bits to DIN Buffer
    - DOUT Translator: 8 bits from DOUT Buffer, DOUT out
    - COMMAND: 18 bits to SDRAM Chips

System Details II
  [Figure: buffers]
    - DOUT Buffer: 8 bits from each of SDRAM Chips 0 to 3, 8 bits to DOUT Translator
    - DIN Buffer: 8 bits from DIN Translator, 8 bits to each of SDRAM Chips 0 to 3
    - Four 8-bit registers

System Details – Protocol I
  The Command Set

    CMD     USE                                              OP    ADDR?
    NOP     No operation.                                    000   N
    ACT     Activate a row; uses bank and row address.       001   Y
    READ    Selects bank/column, initiates read burst.       010   Y
    WRITE   Select bank and column, initiate write burst.    011   Y
    PREC    Precharge; deactivate row in bank.               100   *
    AUTOR   Auto-refresh; enter refresh mode.                101   N
    XXX     Reserved                                         110
    XXX     Reserved                                         111

System Details – Protocol II
  Packets
    - 18 bit command/address
    - 32 bit data packets
  Activate this row and bank…
    COMMAND 0 0 1 0 1 1 1 1 0 1 0 0 1 0 0 1 1 1 1 1 1 0 0
  Start a READ burst at this column…
    COMMAND 0 1 0 1 1 0 1 1 0 0 1 0 0
  *Operating at 3.2 GHz, command packets take 5.625 ns; data packets take 10 ns (the same as SDRAM operating at 100 MHz).

Cubing I
  “Chip stacking”
    - Developed by Irvine Sensors Corp.
    - Currently can stack two 256 Mbit chips.
    - Smaller footprint/area!
    - Much shorter connection wires!
  *Graphics from Irvine-Sensors Data Sheet

Cubing II – Serial Network
    - Point-to-point star topology.
    - Dedicated circuits, high-speed serial lines.
    - Departure from “traditional” bus concept.
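As a rough illustration of the command set and packet timings on the Protocol I and Protocol II slides, the sketch below packs a 3-bit opcode and an address payload into an 18-bit command packet and computes how long a packet occupies a serial line. The field layout (opcode in the top 3 bits, a single 15-bit address payload below it) and the one-bit-per-clock transfer at 3.2 GHz are assumptions made for illustration; the slides give only the opcodes, the packet widths, and the resulting times. The names `encode_command`, `to_bits`, and `transfer_time_ns` are hypothetical.

```python
# Opcodes taken from the command-set table (3-bit OP field).
OPCODES = {
    "NOP": 0b000,
    "ACT": 0b001,
    "READ": 0b010,
    "WRITE": 0b011,
    "PREC": 0b100,
    "AUTOR": 0b101,
}

CMD_BITS = 18               # command/address packet width (from the slides)
DATA_BITS = 32              # data packet width (from the slides)
OP_BITS = 3
ADDR_BITS = CMD_BITS - OP_BITS  # 15 payload bits; internal split is NOT given
LINK_RATE_GHZ = 3.2         # assumed: one bit per clock, i.e. 3.2 bits per ns


def encode_command(cmd, addr=0):
    """Pack opcode (top 3 bits) and address payload (low 15 bits).

    The placement of the opcode in the most significant bits is an assumption.
    """
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address payload does not fit in 15 bits")
    return (OPCODES[cmd] << ADDR_BITS) | addr


def to_bits(packet):
    """Render an 18-bit packet as a bit string, most significant bit first."""
    return format(packet, f"0{CMD_BITS}b")


def transfer_time_ns(n_bits, rate_ghz=LINK_RATE_GHZ):
    """Time a packet of n_bits occupies one serial line, in nanoseconds."""
    return n_bits / rate_ghz


print(to_bits(encode_command("ACT", 0b101)))
print(transfer_time_ns(CMD_BITS), transfer_time_ns(DATA_BITS))
```

Under these assumptions the 18-bit and 32-bit packets come out at the command and data times quoted on the Protocol II slide, and the 32-bit data packet time matches one 100 MHz SDRAM clock period.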
  [Figure labels: 4-stack Cubes; Memory Controller; Address/Command line; DOUT line; DIN line; Clock line @ 3.2 GHz]

System Access Protocol
  Consecutive access to same module
    - Similar timing as SDRAM.
    - Bandwidth matched between parallel and serial.
    - DIN/DOUT buffers - no additional timing constraints.
  *Graphic from Dr. Jacob and Dave Wang

System Access Protocol
  Independent, simultaneous access to separate modules.
    - Conventional SDRAM: No inter-module timing issues.
  *Graphic from Dr. Jacob and Dave Wang

Serial Network Advantages I
  Path length matching
    - No more heroic routing!
    - Star topology is symmetric.
    - No clock mismatch issues… Everyone is on time!
  *Graphic from Dr. Jacob and Dave Wang

Serial Network Advantages IIa
    - No need for bus termination.
    - Point-to-point communication, terminated in module.
  *Graphic from Dr. Jacob and Dave Wang

Serial Network Advantages IIb
  Serial/P2P vs. RAMBUS multi-drop.
    - Faster signaling!
    - No ringing!
    - Clean timing.
    - Serial wins… RAMBUSted!
  *Graphic from Dr. Jacob and Dave Wang

System Simulation
    - SimpleScalar
    - Single CPU, Single Thread
    - SNSDRAM (32 Meg x 8)
    - 1 rank in every memory module
    - Channel width: 32 bits
    - One extra cycle of Transaction Queue Delay to model the parallel to serial conversion

Simulation Run I - Parallel Bus

    Channel            1        1        1        1
    Rank Per Channel   1        2        4        8
    Sim_Cycles         884521   881421   880361   880361

Simulation Run I - Serial Network

    Channel            1        2        4        8
    Rank Per Channel   1        1        1        1
    Sim_Cycles         885291   805721   766711   766711

Simulation I Cycles Chart
  [Chart: test-printf. Y-axis: Total Cycles in Thousands (700 to 900). X-axis: Number of Channels (Serial Link) / Number of Ranks (Parallel Bus), 1 to 8. Series: Serial Link, Parallel Bus.]

Simulation Run II – Parallel Bus

    Channel            1          2          4          8
    Rank Per Channel   1          1          1          1
    Sim_Cycles         13206613   13169500   13144737   13144737

Simulation Run II – Serial Network

    Channel            1          2          4          8
    Rank Per Channel   1          1          1          1
    Sim_Cycles         13264603   12633349   12510912   12510912

Simulation II Cycles Chart
  [Chart: test-printf. Y-axis: Total Cycles in Thousands (12000 to 13400). X-axis: Number of Channels (Serial Link) / Number of Ranks (Parallel Bus), 1 to 8. Series: Serial Link, Parallel Bus.]
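To make the Simulation Run I and Run II numbers easier to compare, the following sketch computes the percentage reduction in simulated cycles of the serial network relative to the parallel bus, pairing the results column by column as the cycles charts do. The Sim_Cycles values are copied from the run tables; the helper name `cycle_reduction_pct` and the choice of the parallel bus as the baseline are assumptions made for illustration.

```python
# Column positions 1, 2, 4, 8: channels (serial link) or ranks (parallel bus).
CONFIGS = (1, 2, 4, 8)

# Sim_Cycles copied from the Simulation Run I and Run II tables.
RUNS = {
    "Run I": {
        "parallel": (884521, 881421, 880361, 880361),
        "serial": (885291, 805721, 766711, 766711),
    },
    "Run II": {
        "parallel": (13206613, 13169500, 13144737, 13144737),
        "serial": (13264603, 12633349, 12510912, 12510912),
    },
}


def cycle_reduction_pct(baseline_cycles, cycles):
    """Percentage of cycles saved relative to the baseline (negative = slower)."""
    return 100.0 * (baseline_cycles - cycles) / baseline_cycles


for run_name, run in RUNS.items():
    for n, par, ser in zip(CONFIGS, run["parallel"], run["serial"]):
        saved = cycle_reduction_pct(par, ser)
        print(f"{run_name}, {n}: serial vs. parallel bus = {saved:+.2f}% cycles saved")
```

With a single channel the serial network is marginally slower in both runs, which is consistent with the extra cycle of transaction queue delay modeled for the parallel to serial conversion; at 4 and 8 channels the saving is about 12.9 percent in Run I and about 4.8 percent in Run II.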
Memory Mapping
  Basic SDRAM
    Rank | Row ID | Bank | Hi Col ID | Channel ID | Lo Col ID | Col Size
  High Performance SDRAM
    Row ID | Rank | Bank | Hi Col ID | Channel ID | Lo Col ID | Col Size

Analysis
    - Cache line = 64 byte channel width
    - Read after Read
    - Multi-CPU
    - Single CPU Multi-Thread

Summary I
  Recall…
    - SDRAM has complex interface, simple chips.
    - RDRAM has a simple interface, but very complex chips.
  SNSDRAM…
    - Blends these seemingly split philosophies!

Summary II
  Advantages
    - Smaller pin count on memory controller.
    - Independent memory modules facilitate parallelism.
    - Simulated performance improvement over similar SDRAM configurations.
    - Smaller system footprint with cubing technology.
    - Theoretically scalable.
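The two field orderings on the Memory Mapping slide can be sketched as a small address decoder. This is only an illustration: it assumes the fields are listed from most significant to least significant bit, and every field width in `WIDTHS` is a hypothetical value (the slides do not give widths). The names `decode`, `BASIC`, `HIGH_PERF`, and `WIDTHS` are likewise made up for this example.

```python
# Field orders from the Memory Mapping slide, assumed to run MSB -> LSB.
BASIC = ("rank", "row", "bank", "hi_col", "channel", "lo_col", "col_size")
HIGH_PERF = ("row", "rank", "bank", "hi_col", "channel", "lo_col", "col_size")

# Hypothetical field widths in bits; NOT taken from the slides.
WIDTHS = {
    "rank": 1,
    "row": 13,
    "bank": 2,
    "hi_col": 7,
    "channel": 2,
    "lo_col": 3,
    "col_size": 2,
}


def decode(addr, order, widths=WIDTHS):
    """Split a physical address into fields, peeling bits off the LSB end."""
    fields = {}
    for name in reversed(order):
        width = widths[name]
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields


# With these widths, consecutive 32-byte blocks land on consecutive channels
# under both mappings, because the low-order fields are identical.
for addr in (0, 32, 64, 96):
    print(addr, decode(addr, BASIC)["channel"], decode(addr, HIGH_PERF)["channel"])
```

The two orderings differ only in the top fields: with Rank above Row ID, a rank covers one contiguous half of the address space, while with Row ID above Rank, stepping to the next row range alternates between ranks.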