3. Overall design space of main memories
Dezső Sima
September 2008 (Ver. 1.0)

Design space of processors
• Underlying principle of operation: Von Neumann computational model, Instruction Set Architecture (ISA)
• Underlying principles of implementation: Microarchitecture
Figure: Design space of processors

Design space of main memories (MM)
• Underlying principle of operation: Principles of attaching memory and I/O, Control Set Architecture (CSA)
• Underlying principles of implementation: Microarchitecture of the MM
Figure: Design space of main memories (MM)

Underlying principle of operation
• Basic operation
• Refreshing (not discussed)
Figure: Underlying principle of operation of DRAM devices

Basic operation of DRAM devices
(Assuming device/bank/row/column addressing)
Reads (timing diagram over t, with tRAC marked):
• Activate: C, AD, AB, AR
• Read: C, AD, AB, AC
• Read data (RD)
• Precharge: C, AD, AR, AB
Writes (timing diagram over t, with tRAC marked):
• Activate: C, AD, AB, AR
• Write: C, AD, AB, AC
• Write data (WD)
• Precharge: C, AD, AR, AB
Legend: C: Command, AD: Device address, AB: Bank address, AR: Row address, AC: Column address
Figure: Basic operation of DRAM devices

Underlying principles of the implementation of MMs
• One/two level implementation
• Managing the DRAM status
• Multiplexing commands, addresses and data
• Bus topology
• Principle of communication
• Type of signaling
• Bus width
• Type of synchronisation
Figure: Main dimensions of the design space of the underlying principles of implementation of MMs

One/two-level implementation
• One-level implementation: MM is built up of DRAM devices
• Two-level implementation: MM is built up of modules, modules are built up of DRAM devices
Type of mounting: typically soldered (one-level); typically socketed (two-level)
Expandability: not expandable (one-level); easily expandable (two-level)
Board space requirement: large board space (one-level); small board space (two-level)
Signal integrity: good signal integrity (one-level); unfavorable signal integrity (two-level)
E.g.: earliest PC main memories, XDR memories (one-level); all other types of main memories (two-level)
Figure: One/two level implementation of main memories

Managing DRAM status
• Along with the basic operation: all other types of main memories
• Detached from the basic operation (via a second dedicated interface): RDRAM, XDR
Figure: Options to manage DRAM status
This dimension of the design space is not discussed.

Assumptions for multiplexing commands, addresses and data
a) Commands and addresses
• are unidirectional (they flow in one direction, from the MC to the MM),
• are transferred on the same communication principle.
b) Data
• is bidirectional (read data flows from the MM to the MC, write data from the MC to the MM),
• is transferred separately from the addresses/commands.

AR/AC multiplexing
• AR/AC multiplexed: asynchronous DRAMs (from the MK4096 on), synchronous DRAMs (SDRAMs)
• AR/AC separate: first DRAMs (before the MK4096)
DW/DR multiplexing
• DW/DR multiplexed (bi-directional)
• DW/DR separate, not multiplexed (unidirectional)
Figure: Multiplexing row and column addresses (AR/AC) vs read and write data (DW/DR)

Principle of communication
• Via a parallel bus in a single cycle (1 cycle)
• Packet-based in a number of cycles (e.g. 4 cycles or 16 cycles; the figure includes packet transfer over a one-bit-wide data path)
Figure: Principles of communication used in main memories

Bus topology
(The figure shows, for each topology, how soldered DRAM devices and socketed DIMMs are attached to the MC.)
• Multi-drop, stub-bus (socketed): unfavorable signal integrity (due to transmission line (TL) discontinuities); peak transfer rate (recently) up to 1.6 Gb/s (with increasingly sophisticated termination)
• Multi-drop, fly-by: better signal integrity; up to 4.8 Gb/s
• Daisy-chained: good signal integrity; up to 4.8 Gb/s
• Point-to-point: excellent signal integrity; up to --- Gb/s
Figure: Bus topologies used to connect DRAM devices or modules to the memory controller

Bus width
• Parallel bus: 32, 64 bits (Pentium)
• Serial bus: width of serial bus

Figure: Bus topologies used to attach DRAM devices or DIMMs
(Matrix of data bus topology (multi-drop stub-bus, multi-drop fly-by, daisy-chained, point-to-point) vs address/control bus topology, for parallel transmission. Entries include SDRAM, DDR, DDR2 and DDR3 modules, RDRAM and XDR/XDR2 devices and FB-DIMM modules; some cells are marked "?" or TBI.)

Synchronisation alternatives
• Central synchronization
• Source synchronization
• Mesochronous synchronization
(The figure classifies SDRAM, DDR/2/3, CRDRAM, RDRAM, XDR and XDR2 by the synchronisation used for data and for capturing control/address information; the XDR and XDR2 entries are marked "?" and some cells TBI.)
Figure: Synchronisation alternatives

Signals
• Open ended
  TTL (5 V): PCI
  LVTTL (3.3 V): SDRAM, PCI, PCI-X, AGP1.0
• Voltage referenced (signal compared with VREF)
  SSTL: SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3)
  AGP2.0 (1.5 V), AGP3.0 (0.8 V)
• Differential (S+, S- around VCM)
  HVDS: SCSI-1
  LVDS: Hypertransport, SATA, Ultra-2 SCSI and later, PCI-E
Data rates increase from open ended towards differential signaling.
LVTTL: Low Voltage TTL
SSTL: Stub Series Terminated Logic
HVDS: High Voltage Differential Signaling
LVDS: Low Voltage Differential Signaling
VREF: Reference Voltage
VCM: Common Mode Voltage
Figure: Different kinds of signals used in buses or interfaces

Serial connected DRAMs
• RDRAMs
  Devices: aimed at consumer products (PS2)
  RIMMs: aimed at desktops (PIII/P4)
• XDRs
  Devices: aimed at consumer products (PS3) and servers (QS20/21)
• FB-DIMMs
  DIMMs: aimed at servers (Intel's 5000/7000, Sun's Niagara II)
Produced by (as listed in the figure): Micron, Elpida, Samsung, Toshiba, Qimonda, Nanya, Hynix
Figure: Use and production of serial connected DRAMs

Principle of operation (1)
The set of buses defined
Dedicated buses for
• data,
• memory requests and
• control register (CR) reads/writes.
Designation of the buses
• Data bus: DQ/N [15:0]
• Request bus: RQ [11:0]
• CR reads/writes: serial bus (SCK, CMD, SDI, SDO, RST)

Comparison: The set and direction of buses defined in major memory types
(Direction is interpreted from the point of view of the memory device/module)
XDR
• read/write data (I/O): DQ [15:0]
• memory requests (I): RQ [11:0]
• control register (CR) reads (O) and control register (CR) writes (I): SDI, SDO
FB-DIMM (between the AMBs and the memory controller)
• read data/device status (O): PN [13:0]
• memory requests/write data/CR reads or writes (I): PS [9:0]
Synchronous DRAMs
• read/write data (I/O): DQ [3:0/7:0/15:0]
• commands (set of individual command lines) (I): CS, RAS, CAS, WE
• addresses (bank address/address within a bank) (I): BA [7:0], A [N:0]

Principle of operation (1)
Topology of the buses interconnecting the memory controller and the XDR devices
• Point-to-point topology for the data bus,
• Fly-by topology for the system clock, request bus and serial bus.

Principle of operation (1) 1/1
Point-to-point topology for the data bus rather than a multidrop or daisy chained topology.
Point-to-point data bus
• Data from only one device can be accessed
• Small memory size
• Good signal integrity
• High data rate of 3.2...4.8 Gb/s
(The figure shows data packets, control packets and CC R/W packets between the memory controller and the device.)
Figure: Point-to-point implementation of the data bus [4]
Figure: Implementation of a two-channel XDR memory with two XDR devices/channel [6]

XDR: point-to-point connection [4]
• Data from only one device can be accessed
• Good signal integrity
• Small memory size
• High data rate of 3.2...4.8 Gb/s
FB-DIMM: daisy chained connection
• Data from multiple modules can be accessed
• Good signal integrity
• Large memory size
• High data rate of 3.2...4.8 Gb/s
Figure: Contrasting the point-to-point and daisy chained bus implementations of the data bus

Figure: Daisy chained connection of the AMBs in FB-DIMMs [7]
(There are two Command/Address buses (C/A) to reduce the loading coming from the 9 to 36 DRAMs mounted on the module)

Note
With respect to termination, the daisy chained connection appears like a point-to-point connection: the controller "sees" only the first memory device/module, whereas further devices/modules are hidden from the controller by the repeater chain feature of the daisy chain topology, and vice versa.

Point-to-point connection
• Data from only one device can be accessed
• Small memory size
• Good signal integrity
• High data rate of 3.2...4.8 Gb/s
Stub-bus connection
• Data from multiple modules can be accessed
• Large memory size
• Unfavourable signal integrity
• Low data rate of 0.8...1.6 Gb/s
Figure: Contrasting the point-to-point and multidrop bus implementations of the data bus

Stub bus topology
(Figure labels: fly-by command, address, control and CK, CK# signals; ODT terminated DQ, DQS/DQS#, DM signals)

Principle of operation (1/2)
Fly-by topology for the
• request and
• CR read/write buses
Request bus: RQ [11:0]
CR reads/writes: SDI, SDO

Comparison: Bus topologies chosen for the major memory types
XDR
• read/write data (I/O), DQ [15:0]: point-to-point
• memory requests (I), RQ [11:0]: fly-by
• control register (CR) reads (O) and writes (I), SDI, SDO: fly-by
FB-DIMM (AMBs - memory controller)
• read data/device status (O), PN [13:0]: daisy-chained
• memory requests/write data/CR reads or writes (I), PS [9:0]: daisy-chained
Synchronous DRAMs (except DDR3)
• read/write data (I/O), DQ [3:0/7:0/15:0]: stub bus
• commands (I), CS, RAS, CAS, WE: stub bus
• addresses (I), BA [7:0], A [N:0]: stub bus
DDR3
• read/write data (I/O), DQ [3:0/7:0/15:0]: stub bus
• commands (I), CS, RAS, CAS, WE: fly-by
• addresses (I), BA [7:0], A [N:0]: fly-by

Comparison: Signaling chosen for the major memory types
XDR
• DQ [15:0]: differential
• RQ [11:0]: voltage referenced
• SDI, SDO: voltage referenced
FB-DIMM (AMBs - memory controller)
• PN [13:0]: differential
• PS [9:0]: differential
Synchronous DRAMs (except DDR3)
• DQ [3:0/7:0/15:0]: voltage referenced
• CS, RAS, CAS, WE: voltage referenced
• BA [7:0], A [N:0]: voltage referenced
DDR3
• DQ [3:0/7:0/15:0]: voltage referenced
• CS, RAS, CAS, WE: voltage referenced
• BA [7:0], A [N:0]: voltage referenced

Contrasting XDR and DDR3
Communication principle: packet based (XDR); parallel bus based (DDR3)
Signaling, data: PTP, differential (DRSL) (XDR); bus, voltage referenced (SSTL) (DDR3)
Signaling, commands/addresses: bus, fly-by, voltage referenced (RSL) (XDR); bus, fly-by, voltage referenced (SSTL) (DDR3)
Signaling, clock: fly-by, differential (DRSL) (XDR); fly-by, differential (differential SSTL) (DDR3)
Signaling, control register manipulation: serial 1-bit, voltage referenced (RSL) (XDR); n.a. (DDR3)
Synchronisation: FlexPhase (XDR); read/write leveling (DDR3)
(DDR3 figure labels: fly-by command, address, control and CK, CK# signals; ODT terminated DQ, DQS/DQS#, DM signals)
Figure: Contrasting communication and synchronisation in XDR and DDR3 memories [4], [9]

Principle of operation (2)
Packet based communication between the memory controller and the XDR devices (like in FB-DIMM modules)
Packets
• Data packets over the DQ/N lines
• Request packets over the RQ lines
• CC R/W packets over the serial interface
Interface lines
• DQ/N [15:0]: Data lines
• RQ [11:0]: Request lines
• CFM/N: Clock From Master
• SCK...: Serial interface
CC: Control Register
R/W: Read/Write
/N: Negative signal
Figure: Principle of operation [4]

Remark
Contrasting the packet concepts of XDR and FB-DIMM memories (1)
XDR [4]
• Data packets
• Control packets
• CC R/W packets
FB-DIMM
• Southbound packets: commands, write data
• Northbound packets: read data, status
Figure: Contrasting the packet concepts of XDR and FB-DIMM memories (1)

Contrasting the packet concepts of XDR and FB-DIMM memories (2)
Both XDR and FB-DIMM memories use packet based communication between the memory controller and the memory devices/modules.
Differences in the packet policies
XDRs
"Clean" packets of
• memory access and maintenance commands (termed request packets),
• data,
• control register read/write commands.
FB-DIMMs
"Clean" packets of
• read data or status (termed northbound packets).
Mixed packets of
• commands and write data (termed southbound packets).

Principle of operation (2)
• The memory controller sends request packets to the XDR devices,
• the XDR devices satisfy these requests, e.g. by sending read data packets to the memory controller.
The basic command sequence is the same as for synchronous DRAMs:
Activate - Read/Write - Precharge

Example 1
Operation:
• Activate: Bank a, Row a
• Read: Bank a, Column a1
• Read: Bank a, Column a2
• Precharge: Bank a
Returned data: read data packet Q(a1), read data packet Q(a2)
Figure: Example for reading from the XDR device [3]

Example 2
Operation:
• Activate: Bank a, Row a
• Write: Bank a, Column a1
• Write: Bank a, Column a2
• Precharge: Bank a
Transferred data: write data packet D(a1), write data packet D(a2)
Figure: Example for writing to the XDR device [3]
Figure: Peak memory size vs peak bandwidth (BW) of particular DRAM technologies in Intel's chipsets, IBM's QS2x blades and Sun's T2
(Vertical axis: memory size in GB; horizontal axis: peak bandwidth in GB/s. Technologies plotted: SDRAM, RDRAM, DDR, DDR (reg), DDR2, DDR2 (reg), FB-DIMM and XDR, for P4 and Core 2 desktop and server chipsets, the QS20/QS21/QS22 blades and the Sun T2.)
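
As a rough illustration of how peak bandwidth values like those on the horizontal axis of the figure can arise, the following minimal Python sketch multiplies a per-pin transfer rate by the data bus width. The function name and the example configurations are assumptions made for illustration only. The 16-bit XDR data bus (DQ [15:0]) and the per-pin rates of 3.2 Gb/s and 0.8 Gb/s come from the earlier slides, whereas the 64-bit module width and the dual-channel arrangement are assumed here.

```python
def peak_bandwidth_gb_per_s(rate_gbit_per_pin: float,
                            bus_width_bits: int,
                            channels: int = 1) -> float:
    """Peak bandwidth in GB/s.

    rate_gbit_per_pin: peak transfer rate of one data line in Gb/s
    bus_width_bits:    number of data lines of one channel
    channels:          number of independent channels (assumed parameter)
    """
    # bits per second over all lines and channels, divided by 8 bits per byte
    return rate_gbit_per_pin * bus_width_bits * channels / 8


if __name__ == "__main__":
    # One XDR device: DQ [15:0] at 3.2 Gb/s per line (values from the slides)
    print(f"XDR device, 16 bit @ 3.2 Gb/s: {peak_bandwidth_gb_per_s(3.2, 16):.1f} GB/s")
    # Assumed 64-bit wide stub-bus channel at 0.8 Gb/s per line
    print(f"64 bit @ 0.8 Gb/s, 1 channel:  {peak_bandwidth_gb_per_s(0.8, 64):.1f} GB/s")
    # The same channel duplicated (assumed dual-channel configuration)
    print(f"64 bit @ 0.8 Gb/s, 2 channels: {peak_bandwidth_gb_per_s(0.8, 64, channels=2):.1f} GB/s")
```

The sketch only computes theoretical peak values; it ignores command overhead, refresh and bus turnaround, so sustained bandwidth in a real system would be lower.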