2. Challenges/Limiters of Parallel Connected Synchronous Memories

Dezső Sima
September 2008 (Ver. 1.0)
© Sima Dezső, 2008

Overview

• 1. Key challenges facing main memories
• 2. Main limiters of increasing the transfer rate of main memories - Overview
• 3. Challenges in increasing the rate of sourcing/sinking data from/to the memory cell array
• 4. Challenges in increasing the transfer rate between the memory controller and the DRAM parts
• 5. Challenges in increasing the rate of capturing data in the memory controller/DRAM parts
• 6. Main limiters of increasing the memory size
• 7. References

1. Key challenges facing main memories

Key challenges facing main memories:

• Increasing (single core) processor performance (the past)
• Multicore/manycore processors with core counts doubling about every two years (the present and near future)

Figure 1.2: Integer performance growth of Intel's x86 processors (SPECint92, 1979-2005), rising at roughly 100×/10 years from the 8088 up to the Pentium 4 line and levelling off in the last years shown

Figure: Evolution of Intel's process technology [1] (shrinking of feature sizes at a rate of ~0.7/2 years)

Figure: The actual rise of IC complexity in DRAMs and microprocessors (Moore's law) [2]

Figure: Rapid spreading of multicore processors in Intel's processor portfolio

The Cell BE (2006)

SPE: Synergistic Processing Element, SPU: Synergistic Processor Unit, SXU: Synergistic Execution Unit, LS: Local Store of 256 KB, SMF: Synergistic Memory Flow Unit, EIB: Element Interface Bus, PPE: Power Processing Element, PPU: Power Processing Unit, PXU: POWER Execution Unit, MIC: Memory Interface Controller, BIC: Bus Interface Controller, XDR: Rambus DRAM

Figure: Block diagram of the Cell BE [3]

Assuming that IC process technology evolves in the near future at a rate similar to today's (shrinking of characteristic feature sizes at a rate of ~0.7/2 years), the number of cores will also double about every two years.

Higher processor performance/more cores translate into higher memory performance requirements in terms of

• larger memory size,
• higher memory bandwidth,
• lower memory latency.

How strongly these requirements grow depends on

• the characteristics of the application,
• the cache architecture,
• ...

(an interesting research area in itself).
Satisfying these requirements, however, runs into limitations of recent implementations.

2. Main limiters of increasing the transfer rate of main memories - Overview

Main components of the main memory: the memory controller and the DRAM devices, each DRAM device consisting of a memory cell array and I/O buffers.

Figure: Main components of the main memory

Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates:

• the rate of sourcing/sinking data from/to the memory cell array (the problem of reducing the column cycle time of the memory cell array),
• the rate of transmitting data between the memory controller and the memory modules (the transmission line termination problem),
• the rate of capturing data in the memory controller/memory module (the signaling and synchronization problem).

Figure: Schematic view of the structure of the main memory (sourcing/sinking at the memory cell array, transferring between the I/O buffers and the memory controller, capturing at both ends)

The most serious of these limitations constrains the achievable transfer rate.

3. Challenges in increasing the rate of sourcing/sinking data from/to the memory cell array

Basic operation speed of recent synchronous DRAMs

The memory cell array sources/sinks data to/from the I/O buffers at a rate of T (at a data width of x4/x8/x16):

T = (1/tCCD) × FW

with

tCCD: min. column cycle time of the memory cell array
FW: fetch width of the memory cell array

The min. column cycle time (tCCD) of the memory cell array
tCCD (core column delay) is the min. time interval between consecutive Reads or Writes.

Figure: The interpretation of tCCD [4]

Remark: tCCD is also designated as the Read/Write command to Read/Write command delay.

Figure: The evolution of the column cycle time (tCCD) in different SDRAM types (ns) [5]

Note: The min. column cycle time (tCCD) of synchronous DRAMs is 7.5 ns for SDRAM and 5 ns for DDR/DDR2/DDR3.

The fetch width (FW) of the memory cell array specifies how many times more bits the cell array fetches per column cycle than the data width of the device. E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4, that is 16 bits, from the memory cell array per column cycle.

The fetch width (FW) of the memory cell array of synchronous DRAMs is typically:

DRAM type   FW
SDRAM       1
DDR         2
DDR2        4
DDR3        8

Figure: Fetch width of synchronous DRAM generations (all with a DRAM core clock of 100 MHz):

Type        Core clock   CK(/CK#), DQS   Fetch width   Data transfer
SDRAM-100   100 MHz      100 MHz (CK)    n bits        100 MT/s, on the rising edges of CK
DDR-200     100 MHz      100 MHz         2×n bits      200 MT/s, on both edges of DQS
DDR2-400    100 MHz      200 MHz         4×n bits      400 MT/s, on both edges of DQS
DDR3-800    100 MHz      400 MHz         8×n bits      800 MT/s, on both edges of DQS

According to Tmax = (1/tCCD) × FW, the peak rates of sourcing/sinking data to/from the I/O buffers are:

SDRAM: 1/7.5 × 1 = 133 MT/s
DDR:   1/5 × 2 = 400 MT/s
DDR2:  1/5 × 4 = 800 MT/s
DDR3:  1/5 × 8 = 1600 MT/s (not yet achieved)

The main limitation in increasing the rates of sourcing/sinking data from/to the memory array is tCCD (the column cycle time). The column cycle time (tCCD) resulting from a DRAM design depends on a number of architectural choices, like the column decoder layout, the array block size, array partitioning, decisions to share resources between array banks etc. [32]. Its reduction below 5 ns is an intricate circuit design task that is out of the scope of our discussion. For an insight into the subject see [32].

Remark: GDDR3 and GDDR4 devices, with peak transfer rates of 1.6 and 2.5 GT/s, respectively, achieve min. column cycle times (tCCD) of 2.5 and 3.2 ns, respectively [32].
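The peak-rate arithmetic above is easy to check mechanically. A minimal Python sketch (not from the slides; the dictionaries simply restate the tCCD and FW values quoted in this section):

```python
# Minimum column cycle times in ns, as quoted above
T_CCD_NS = {"SDRAM": 7.5, "DDR": 5.0, "DDR2": 5.0, "DDR3": 5.0}
# Fetch widths of the memory cell array
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}

for dram, tccd in T_CCD_NS.items():
    # 1/tCCD (in 1/ns) is the column cycle rate in GHz; multiplying by the
    # fetch width gives the peak sourcing/sinking rate, scaled here to MT/s.
    tmax_mts = FETCH_WIDTH[dram] / tccd * 1000
    print(f"{dram:5s}: {tmax_mts:4.0f} MT/s")
# -> SDRAM ~133 MT/s, DDR 400 MT/s, DDR2 800 MT/s, DDR3 1600 MT/s
```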
4. Challenges in increasing the transfer rate between the memory controller and the DRAM parts

The dataway connecting the memory controller and the DRAM chips consists of motherboard traces and the memory modules.

Figure: The dataway connecting the memory controller and the DRAM chips (based on [6])

At higher data rates the PCB traces behave as transmission lines.

Basic behaviour of transmission lines (TL)

Principle of operation:

• A signal front given at the input of the TL travels down the TL from the driver side to the receiver side.
• Arriving at the receiver side, the signal is reflected back to the driver side, then
• at the driver side the signal is reflected again toward the receiver side, etc.

PCB traces (microstrips) behave above ~100 MT/s like transmission lines with

• a characteristic impedance (ZO), and
• a trace velocity.

Characteristic impedance of PCB traces (ZO) [7]

Table: Typical characteristic impedance values of PCB traces [8]

Table: Typical trace velocity values of PCB traces [8]

Remark: With 1 ft = 30.48 cm, the equivalent values in cm/ns are:
1.6 ns/ft equals ~19 cm/ns
2.0 ns/ft equals ~15 cm/ns
2.2 ns/ft equals ~14 cm/ns

Behaviour of an ideal TL (no attenuation, no capacitive or inductive loading)

Figure: Equivalent circuit of an ideal transmission line (neglecting attenuation along the TL as well as capacitive and inductive loading of the TL), with

VO(t): generator voltage
VD(t): voltage at the driver output
VrD(t): reflected voltage at the driver
VR(t): voltage at the receiver
VrR(t): reflected voltage at the receiver
ZD: internal impedance of the driver
ZO: characteristic impedance of the TL
ZT: impedance terminating the TL
T: flight time over the TL

Characteristic equations describing the reflections and the driver/receiver side voltages (based on [9]):

At t = 0 (driver side):
VD(0) = VO × ZO/(ZO + ZD)
VrD(0) = VD(0)

At t = T (T: propagation time across the TL; receiver side):
VR(T) = VD(0) × (1 + rR)
VrR(T) = VD(0) × rR
with rR = (ZT − ZO)/(ZT + ZO)

At t = nT (n > 1):
Driver side:
VD((n+1)T) = VD((n−1)T) + VrR(nT) × (1 + rD)
VrD((n+1)T) = VrR(nT) × rD
with rD = (ZD − ZO)/(ZD + ZO)
Receiver side:
VR(nT) = VR((n−2)T) + VrD((n−1)T) × (1 + rR)
VrR(nT) = VrD((n−1)T) × rR

At t → ∞ (steady state; receiver side):
VR(∞) = VO × ZT/(ZT + ZD)
(which reduces to VO × ZO/(ZO + ZD) for a matched line and to VO for an open line)

Example 1: Open ended ideal TL

Figure: Equivalent circuit of an open ended ideal TL (VO(t=0) = 2 V, ZD = 25 Ω, ZO = 50 Ω, ZT >> ZO)

Figure: Ladder diagram and VD(t), VR(t) waveforms of an open ended ideal TL (based on [6]). The launched 1.333 V wavefront is reflected back and forth: VR(t) steps through 2.666, 1.778, 2.074, 1.976 V, ..., VD(t) through 1.333, 2.222, 1.926, 2.025 V, ..., with the reflected amplitudes decaying (−0.444, 0.148, −0.049, 0.002, ...) as both voltages converge to the 2 V steady state.
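The ladder diagram can be reproduced directly from the characteristic equations. Below is a minimal Python sketch (our own simplification: it tracks a single bouncing wavefront rather than the full per-side bookkeeping of [9]); running it prints the voltage staircase of Example 1:

```python
def lattice(V0, ZD, Z0, ZT, steps=8):
    """Bounce (lattice) diagram of an ideal TL: per-flight-time driver and
    receiver voltages for a generator step V0 behind driver impedance ZD,
    a line of characteristic impedance Z0, and a termination ZT."""
    rD = (ZD - Z0) / (ZD + Z0)        # driver-side reflection coefficient
    rR = (ZT - Z0) / (ZT + Z0)        # receiver-side reflection coefficient
    wave = V0 * Z0 / (Z0 + ZD)        # wavefront launched at t = 0
    vD, vR = wave, 0.0
    events = [("0T", "driver", vD)]
    for n in range(1, steps + 1):
        if n % 2:                      # odd multiples of T: hits the receiver
            vR += wave * (1 + rR)
            wave *= rR                 # part of the wave bounces back
            events.append((f"{n}T", "receiver", vR))
        else:                          # even multiples of T: back at the driver
            vD += wave * (1 + rD)
            wave *= rD
            events.append((f"{n}T", "driver", vD))
    return events

# Example 1: open-ended line (ZT very large)
for t, side, v in lattice(V0=2.0, ZD=25, Z0=50, ZT=1e9):
    print(f"t = {t:>3}  {side:8s} {v:6.3f} V")
# -> 1.333, 2.666, 2.222, 1.778, 1.926, 2.074, ... converging toward 2 V
```

With ZT = ZO the receiver-side reflection coefficient is zero, so the loop degenerates after the first arrival; that is exactly the perfectly terminated case of Example 3 below.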
Figure: Open ended real TL (differential connection) [10]; reflections occur at both ends (R-end, D-end)

Figure: Reflections shown on an eye diagram due to termination mismatch [11]

Implications of the reflections on a TL

• When a data signal is given at the driver side of the TL, a signal wavefront travels down the TL and is ping-ponged between both ends of the TL until the steady state condition is reached.
• Until the signal has at least nearly settled, no further wavefront can be given to the TL, else inter-symbol interference (ISI) arises.

Reflections limit the max. data transfer rate of a TL.

The max. data transfer rate is limited primarily by the time until the signal settles, that is, it depends both on

• the number of signal round trips until the signal settles, and
• the length of the TL.

Example: Open ended TL of a length of 10 cm

Assumptions:
• The signal velocity on the TL is 20 cm/ns, so T = 0.5 ns.
• Reflections settle to an acceptable level after three round trips (6T).

Then the wavefront of a signal settles after about 6 × 0.5 ns = 3 ns. As half of the min. cycle time is thus 3 ns, the min. cycle time is 6 ns, and the max. transfer rate of the above open ended TL is ~166 MHz.

Open ended TLs may therefore be used only for

• relatively low transfer rates (up to ~100 MHz), that is, up to SDRAM devices, and
• short distances (up to ~10 cm).

For higher transfer rates or longer distances the TL needs to be terminated by its characteristic impedance ZO.

Reducing reflections by a series resistor

A series resistor put before the TL reduces reflections, giving improved signal integrity and higher transfer rates.

Example 2: Using series resistors to reduce reflections

Figure: Equivalent circuit of an open ended TL with a series resistor (R3 in the figure) included between the driver and the TL (Micro-Cap 9.0.5.0)

Figure: Driver (Vout) and receiver (Vin) voltages of an open ended TL with a series resistor R3, with the value of R3 swept from 0 to 25 Ω

Figure: Series resistors on an SDRAM module inserted into the DQ, DQS, DM lines between the memory controller (LVTTL signaling) and the SDR DIMM slots (RS = 10 or 22 Ω)

Matched TLs

Needed above ~100 MHz (i.e. for DDR/DDR2/DDR3 memories).

Figure: Basic termination scheme for unidirectional signals, assuming SSTL signaling: the TL (ZO = 50 Ω) is terminated by RT = 50 Ω to the termination voltage VT = VREF = 0.5 × output voltage [12]
SSTL: Stub Series Termination Logic
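Why termination works can be read off the reflection coefficients alone. A small Python check, using the impedance values of the examples in this section:

```python
def gamma(Zterm, Z0=50.0):
    """Reflection coefficient seen at an impedance discontinuity."""
    return (Zterm - Z0) / (Zterm + Z0)

print(f"open end:          {gamma(1e9):+.3f}")   # ~ +1.0, full reflection
print(f"matched (ZT = Z0): {gamma(50.0):+.3f}")  # 0.0, no reflection
# At the driver end, the re-reflection vanishes when ZD + RS = Z0,
# e.g. a 25-ohm driver plus a 25-ohm series resistor into a 50-ohm trace:
print(f"driver + series R: {gamma(25.0 + 25.0):+.3f}")  # 0.0
```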
Example 3: Perfectly terminated ideal TL

Figure: Equivalent circuit of a perfectly terminated ideal TL (VO(t=0) = 2 V, ZD = 25 Ω, ZO = 50 Ω, ZT = 50 Ω)

Figure: Ladder diagram and waveforms VD(t), VR(t) of a perfectly matched ideal TL (based on [6]). The single 1.33 V wavefront arrives at the receiver at t = 1T and, as no reflection occurs, VR(t) settles at 1.33 V immediately.

Figure: Perfectly matched real TL (differential connection, RT = ZO) [10]; no reflections from the receiver end

The problem of TL inhomogeneity

• The TL connecting the memory controller and the DRAM devices is not homogeneous; it consists of multiple sections.
• Between different TL sections there are discontinuities, which give rise to reflections.

Figure: Discontinuities of TLs connecting the memory controller and the memory modules (based on [6])

Figure: Discontinuities of TLs connecting the slot to the particular DRAM devices, assuming stub-bus topology and a registered memory module [5]

Addressing the problem of TL discontinuities: SSTL termination (Stub Series Termination Logic)

Used in DDR/DDR2/DDR3 devices.

Principle: use both perfect termination and a series resistor (RS) to increase the TL attenuation and thus reduce reflections from the memory module back to the memory controller [6].

Figure: SSTL termination of a unidirectional signal (RS: 22/25 Ω, RT: 50 Ω to VT = VREF = 0.5 × output voltage, ZO = 50 Ω) [12]

Figure: Equivalent circuit of two TLs (T1, T2) with slightly different characteristic impedances and a series resistor (R3); T2 is terminated by 50 Ω in parallel with 3 pF.

Discontinuities of the transmission line generate reflections.

Figure: Driver (Vout) and receiver (Vin) voltages of the previous equivalent circuit

Higher series resistor values attenuate the reflections but lower the steady state output voltage.

Figure: Driver (Vout) and receiver (Vin) voltages of the previous equivalent circuit, with the value of R3 swept from 0 to 25 Ω

Higher output capacitance values lower the reflections as well.

Figure: Driver (Vout) and receiver (Vin) voltages of the previous equivalent circuit, with the value of C3 swept from 0 to 9 pF

Note: With an increasing value of RS (from 2 Ω to 22 Ω) the amplitude of the reflected voltage at the receiver side clearly decreases.
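The trade-off just noted, that larger series resistors damp reflections but shrink the usable signal swing, already shows up in a pure DC analysis. A minimal sketch (the component values are illustrative SSTL_2-style figures chosen by us, not taken from a datasheet):

```python
def sstl_steady_state(VO, VTT, ZD, RS, RT):
    """DC (steady-state) receiver voltage for a driver of internal impedance
    ZD driving through a series resistor RS into a line whose far end is
    terminated by RT to the termination voltage VTT. At DC the line itself
    is transparent, so the network is a plain voltage divider."""
    return VTT + (VO - VTT) * RT / (ZD + RS + RT)

# SSTL_2-like numbers: 2.5 V output, VTT = 1.25 V, 25-ohm driver, RT = 50 ohm
for RS in (0, 10, 22, 25):
    v = sstl_steady_state(VO=2.5, VTT=1.25, ZD=25, RS=RS, RT=50)
    print(f"RS = {RS:2d} ohm -> receiver high level {v:.2f} V")
# The swing around VTT visibly shrinks as RS grows, which is why RS values
# stay in the 10-25 ohm range in the termination examples that follow.
```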
Example 1: Line terminations in a DDR memory

Figure: Line terminations in a DDR memory: the SSTL_2 memory controller drives the command/control/address lines through series resistors RS1 to the DDR DIMM slots, and the DQ, DQS/# and DM lines through series resistors RS2, with parallel termination RT to VT on the board (RS1: 7.5 Ω for 4 devices, 5.1 Ω for 8 devices, 3 Ω for 16 devices; RS2 = 22 Ω; RT = 56 Ω)

In order to achieve higher transfer rates, more and more sophisticated line terminations are needed, as the following examples for synchronous (commodity) DRAMs show.

Example 2: Line terminations in a DDR2 memory: On-Die Termination (ODT)

Figure: Line terminations in a DDR2 memory: the SSTL_18 memory controller drives the command/control/address lines through RS1 with parallel termination RTT to VT on the board, while the DQ, DQS/# and DM lines include series resistors RS2 on the module and are terminated on die (ODT, resistor pairs R1/R2 between VTT and Vss) (RS1: 10 Ω for 4 devices, 5.1-10 Ω for 8 devices, 7.5 Ω for 16 devices; RS2 = 22 Ω; RTT = 47 Ω)

Example 3: Line terminations in a DDR3 memory: dynamic On-Die Termination (ODT), optionally adjusting the termination resistors along with each write command

Figure: Line terminations in a DDR3 memory: the SSTL_15 memory controller drives the command/control/address lines with parallel termination RT on the module, while the DQ, DQS/# and DM lines include series resistors RS on the module and are terminated on die (dynamic ODT, resistor pairs R1/R2 between VTT and Vss) (RS = 10-15 Ω, RT = 36-39 Ω, RZQ = 240 Ω ±1%)

ZQ calibration: adjusting the "on" and the "termination" impedances of the merged drivers every 128 ms, using the external precision resistor RZQ.

Remark: Due to the fly-by module topology no series resistors are needed for the command/control/address lines.

Table: Implementation details of SDRAM types
(C/C/A: Command/Control/Address, DQ: Data, DQM: Data Mask, DQS: Data Strobe)

Signaling
• C/C/A: LVTTL (SDRAM); SSTL_2 (DDR); SSTL_18 (DDR2); SSTL_15 (DDR3)
• Clock (CLK/CK): LVTTL; SSTL_2 Diff.; SSTL_18 Diff.; SSTL_15 Diff.
• DQ, DQM: LVTTL; SSTL_2; SSTL_18; SSTL_15
• DQS: --; SSTL_2; SSTL_18 / SSTL_18 Diff.; SSTL_15 Diff.

Terminations (SDRAM; DDR; DDR2; DDR3)
• C/C/A, RS: no RS; RS on module; RS on module; no RS
• C/C/A, RT: no RT; RT on board; RT on board; RT on module
• DQ/DQS/DM, RS: RS on module in all four types
• DQ/DQS/DM, RT: --; RT on board; ODT (RT on die); dynamic ODT (RT on die)

Driver architecture: separate output/termination drivers in SDRAM/DDR/DDR2; merged output/termination drivers with ZQ calibration (during power up/periodically) in DDR3

Synchronization
• Basic scheme: central clock (SDRAM); source synchronization (DDR/DDR2/DDR3)
• Aligning DQS with CK: no DQS (SDRAM); DLL (DDR/DDR2); DLL + read/write leveling to compensate flight-time skews between DQS and CK during power-up (DDR3)

Posted reads/writes: no (SDRAM/DDR); yes (DDR2/DDR3)
Reset pin: no (SDRAM/DDR/DDR2); yes (DDR3)
DIMM topology: stub architecture (SDRAM/DDR/DDR2); fly-by architecture (DDR3)
Packaging: TSOP-54 (SDRAM/DDR); BGA-60 for x4/x8 and BGA-84 for x16 (DDR2); BGA-78 for x4/x8 and BGA-96 for x16 (DDR3)

Line terminations of recent commodity DRAMs have already reached a rather high level of sophistication; not much headroom remains for further improvements.

5. Challenges in increasing the rate of capturing data in the memory controller/DRAM parts

• 5.1 Coping with capturing data
• 5.2 Using more advanced signaling
• 5.3 Using more advanced synchronisation
5.1 Coping with capturing data

Basics of capturing data

• Data and commands are latched by D flip-flops.
• For correctly capturing data or commands, input signals need to be held valid for specified periods of time before and after the clock edge, termed the setup time (tS) and the hold time (tH), as shown in the figure.

Figure: Temporal requirements for correctly capturing data

Setup time (tS): the minimum time interval for which the input signal must remain valid (high or low) prior to the clock edge in order to capture the data bit correctly.

Hold time (tH): the minimum time interval for which the input signal must remain valid (high or low) following the clock edge in order to capture the data bit correctly.

Specification of the setup time (tS) and the hold time (tH)

In device datasheets, e.g. in case of a DDR-400 device:

Table: Excerpt from the specification of the dynamic parameters of a DDR-400 device [13]

Note: A DDR-400 device is clocked at 200 MHz and transfers data on both edges, so its data bit time is 2.5 ns (half of it 1.25 ns). By contrast, its setup and hold times are only 0.4 ns each (designated as tDS and tDH in the table).

Minimum data valid window (DVW): the minimum time interval for which the input signal must remain valid (high or low) before and after the clock edge in order to capture the data bits correctly.

Figure: Interpretation of the minimum DVW for ideal signals

The minimum DVW has two characteristics: a size, that is, the sum of the setup time (tS) and the hold time (tH), and a correct phase relative to the clock edge, satisfying both the tS and the tH requirements.

If tS = tH, the clock edge needs to be center aligned within the DVW, as indicated below.

Figure: Center aligned clock edge within the min. DVW

Example: In a DDR-400 SDRAM tS = tH = 0.4 ns [13]; then
• the min. DVW is 0.8 ns, i.e. roughly 2/3 of the half bit time (1.25 ns), and
• the clock edge needs to be center aligned within the min. DVW.

Available DVW: the time interval for which the input signal actually remains valid (high or low).

Figure: Interpretation of the minimum DVW and the available DVW for ideal signals

For correctly capturing data, two requirements need to be fulfilled:
• the available DVW ≥ the min. DVW, and
• the clock edge needs to be properly aligned (usually center aligned) within the available DVW.

Note: Assuming tS = tH (as usual), for the highest transfer rate the clock signal needs to be center aligned with the data.

Reduction of the available DVW in real systems

In real systems the available DVW is reduced due to
• skews and
• jitter.

Skew is a time offset of the signal edges
• between different occurrences of the same signal, such as a clock, at different locations on a chip or a PC board (as shown in the figure below), or
• between different bit lines of a parallel bus at a given location.

Figure: Skew due to propagation delay [15]

Skews arise mainly due to
• propagation delays in the PC-board traces, also termed time of flight (TOF) (about 170 ps/inch), as indicated above [14],
• capacitive loading of a PC-board trace (about 50 ps per pF), as indicated in the subsequent figure [14],
• SSO (Simultaneous Switching Output) noise occurring due to parasitic inductances when a number of bit lines change their output states simultaneously.
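To get a feel for the magnitudes, the two per-unit figures quoted above can be turned into a back-of-the-envelope skew estimate. A sketch (the example mismatches are invented):

```python
TOF_PS_PER_INCH = 170.0   # propagation delay of a PCB trace [14]
PS_PER_PF = 50.0          # extra delay per pF of capacitive load [14]

def skew_ps(trace_len_mismatch_in, cap_load_mismatch_pf):
    """Estimated skew between two signal lines, from a trace-length
    mismatch (inches) and a capacitive-loading mismatch (pF)."""
    return (trace_len_mismatch_in * TOF_PS_PER_INCH
            + cap_load_mismatch_pf * PS_PER_PF)

# E.g. half an inch of extra trace plus 2 pF of extra loading:
print(f"{skew_ps(0.5, 2.0):.0f} ps")
# ~185 ps, already a sizeable fraction of the 0.8 ns min. DVW of a
# DDR-400 device quoted earlier in this section.
```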
Figure: Skew due to capacitive loading of signal lines (CK-1 vs. CK-2) [14]

Reduction of operational tolerances due to skews

Figure: Reduction of operational tolerances due to clock skew, contrasting a center aligned clock with a skewed clock (ideal signals assumed)

A larger than indicated skew would even jeopardize or prevent correct operation; deskewing of the clock distribution is needed.

Jitter
• is a phase uncertainty causing ambiguity in the rising and falling edges of a signal, as shown in the figure below, and
• has a stochastic nature.

Figure: Jitter of signal edges [15]

The main sources of jitter are
• crosstalk, caused by coupling between adjacent traces on the board or in the DRAM device,
• ISI (Inter-Symbol Interference), caused by cycling the bus faster than it can settle,
• reflection noise, due to mismatched termination of signal lines,
• EMI (Electromagnetic Interference), caused by electromagnetic radiation emitted from external sources.

Narrowing of the available DVW due to jitter

Jitter obviously narrows the available DVW, as shown in the following example for DDR-200 devices (DDR-200 devices are clocked at 100 MHz, thus their half clock period is 5 ns).

Figure: Narrowing of the available DVW (~5 ns without jitter) due to jitter

The timing budget of the available DVW

The available DVW needs to cover
• the min. requested DVW (tS + tH),
• all possible sources of skew,
• all possible sources of jitter.

Figure: Interpretation of the timing budget of the available DVW

Note: The white areas before and after the min. DVW represent the available timing margins.

Example: Timing budget of a DDR-266 memory

Table: Timing budget of a DDR-266 memory [16]

Remark: The table uses partly different terminology, as follows:
Total skew: available DVW
Transmitter skew: setup time
Receiver skew: hold time
VREF noise: SSO
CIN mismatch: skew due to different capacitive loading

Note: The crucial sources and the actual extent of the occurring skews and jitter depend on
• the frequency range in question,
• the DRAM type used, and
• mainboard and memory module implementation details.

Timing budget tuning is a main task in developing DRAM devices/modules and mainboards.

Shrinking of the available DVW at higher transfer rates

Higher data rates mean shorter clock periods and thus shorter available DVWs. This is one of the key problems to be handled for achieving higher data rates.

Figure: Shrinking of the available DVW (tDV: width of the available DVW) while raising the data rate from PC-133 to DDR-400 and DDR2-800 [17]

Addressing the problem of shrinking (available) DVWs in order to raise DRAM speed: reducing skews and jitter by
• using more advanced signaling techniques, such as SSTL (Stub Series Terminated Logic) or LVDS (Low Voltage Differential Signaling), instead of open-ended LVTTL (Low Voltage TTL),
• using more efficient synchronisation schemes than central clocking, such as source-synchronous synchronisation, and
• using DLLs/PLLs to align clock or data strobe edges.
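The timing budget discussed above lends itself to a simple bookkeeping check: subtract all skew/jitter terms from the bit time and see whether tS + tH still fits. A sketch with illustrative numbers of our own (the real DDR-266 budget is in [16]):

```python
def timing_margin_ps(bit_time_ps, t_setup_ps, t_hold_ps, skew_jitter_ps):
    """Timing budget of the available DVW: the bit time, reduced by all
    skew/jitter terms, must still cover tS + tH; what is left over is the
    margin (negative means capture is no longer guaranteed)."""
    available_dvw = bit_time_ps - sum(skew_jitter_ps)
    return available_dvw - (t_setup_ps + t_hold_ps)

budget = [300, 250, 200, 150]      # e.g. clock skew, SSO, crosstalk, ISI (ps)
margin = timing_margin_ps(bit_time_ps=3750,   # 266 MT/s -> ~3.75 ns bit time
                          t_setup_ps=500, t_hold_ps=500,
                          skew_jitter_ps=budget)
print(f"remaining margin: {margin} ps")
# -> 1850 ps here; the same skew/jitter terms would eat the entire budget
#    at DDR2-800 bit times, which is why faster signaling is needed.
```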
5.2 Using more advanced signaling

Signal types

• Ground referenced: TTL (5 V), used e.g. for PCI; LVTTL (3.3 V), used e.g. for SDRAM, PCI, PCI-X, AGP 1.0.
• Voltage referenced, single ended (S+ compared against VREF): single ended SSTL, i.e. SSTL_2 (2.5 V, DDR), SSTL_18 (1.8 V, DDR2), SSTL_15 (1.5 V, DDR3); also AGP 2.0 (1.5 V) and AGP 3.0 (0.8 V).
• Voltage referenced, differential (S+/S− around the common mode voltage VCM): HVDS, used e.g. for SCSI-1; LVDS, used e.g. for HyperTransport, SATA, Ultra-2 SCSI and later, PCI-E; differential SSTL.

Moving from ground referenced toward differential signals supports higher data rates.

Figure: Overview of signal types
(LVTTL: Low Voltage TTL, SSTL: Stub Series Terminated Logic, HVDS: High Voltage Differential Signaling, LVDS: Low Voltage Differential Signaling, VREF: Reference Voltage, VCM: Common Mode Voltage)

Signal types used in mainstream DRAMs (the earliest 1K/4K DRAMs omitted):

• TTL: ground referenced, open ended, 5 V; used in Page Mode, FPM and EDO DRAMs.
• LVTTL (Low Voltage TTL): ground referenced, open ended, 3.3 V; used in FPM, EDO and SDRAM devices.
• SSTL (Stub Series Termination Logic): voltage referenced, single ended/differential, terminated, 2.5/1.8/1.5 V; used in DDR, DDR2 and DDR3 devices.

Table: Signal types of the main signal groups in synchronous DRAM devices

                                    SDRAM   DDR SDRAM      DDR2 SDRAM                DDR3 SDRAM
Comm./Control/Addr./DQ/DM           LVTTL   SSTL_2         SSTL_18                   SSTL_15
Clock (CLK/CK)                      LVTTL   SSTL_2 Diff.   SSTL_18 Diff.             SSTL_15 Diff.
Data Strobe (DQS)                   --      SSTL_2         SSTL_18 / SSTL_18 Diff.   SSTL_15 Diff.

Figure: Input/output characteristics of TTL signals as used in PM/FPM/EDO devices (VOL max = 0.4 V, VIL max = 0.8 V, VIH min = 2.0 V, VOH min = 2.4 V at a 5 V supply) (based on [6])

Figure: Input/output characteristics of LVTTL signals (the same levels at a 3.3 V supply) (based on [6])

Stub Series Terminated Logic (SSTL): three generations

• SSTL_2: VDDQ = 2.5 V, JESD8-9 [18], used in DDR SDRAMs
• SSTL_18: VDDQ = 1.8 V, JESD8-15A [19], used in DDR2 SDRAMs
• SSTL_15: VDDQ = 1.5 V, used in DDR3 SDRAMs

SSTL signals may be single ended or differential:
• single ended: used for command/control/address, data (DQ), data mask (DM), and the data strobe (DQS) in DDR/DDR2;
• differential: used for the clock (CK) and the data strobe (DQS) in DDR2/DDR3.

Figure: Types of SSTL signals

Figure: Input/output characteristics of single ended SSTL signals, the static view (SSTL_2 levels: VREF = 1.25 V, VIL max = VREF − 150 mV, VIH min = VREF + 150 mV, VOL max = 0.375 V, VOH min = 2.125 V) (based on [6])

The dynamic view

Figure: Interpretation of the characteristic input levels of single ended SSTL signals [18]

• DC values define the final logic state.
• AC values define the timing specifications the receiver needs to meet (e.g. slew rate).

State changes: a certain amount of time after the input has crossed the DC threshold and then also the AC threshold (hold time), the device switches state and will not switch back as long as the input stays beyond the DC threshold [18].
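The DC/AC threshold behaviour just described can be expressed as a small decision rule. A sketch (simplified: it models only the voltage thresholds, not the timing/slew-rate requirements, and uses DDR2-style levels):

```python
def sstl_receiver(samples, vref, v_ac=0.250, v_dc=0.125, state=0):
    """Walk a sequence of input voltages (DDR2-like levels: AC thresholds
    at VREF +/- 250 mV, DC thresholds at VREF +/- 125 mV) and return the
    captured logic states: a change requires crossing the AC threshold,
    while excursions that stay beyond the DC threshold on the current
    side cannot flip the state back."""
    out = []
    for v in samples:
        if state == 0 and v >= vref + v_ac:      # crossed VIH(ac): go high
            state = 1
        elif state == 1 and v <= vref - v_ac:    # crossed VIL(ac): go low
            state = 0
        out.append(state)
    return out

print(sstl_receiver([0.9, 1.2, 1.05, 0.6, 0.85], vref=0.9))
# -> [0, 1, 1, 0, 0]: the dip back to 1.05 V (still above VIH(dc) =
#    1.025 V) does not flip the state; only the swing below VIL(ac) does.
```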
Figure: Using the AC values for defining the falling and rising slew rates of single ended SSTL signals [19]

Table: Characteristic input levels of single ended SSTL signals in DDR/DDR2/DDR3 devices [20], [21], [22]

              DDR             DDR2            DDR3
VDDQ          2.5 V           1.8 V           1.5 V
VREF          1.25 V          0.9 V           0.75 V
VIH(ac) min   VREF + 310 mV   VREF + 250 mV   VREF + 175 mV
VIH(dc) min   VREF + 150 mV   VREF + 125 mV   VREF + 100 mV
VIL(dc) max   VREF − 150 mV   VREF − 125 mV   VREF − 100 mV
VIL(ac) max   VREF − 310 mV   VREF − 250 mV   VREF − 175 mV
VSS           Ground          Ground          Ground

Figure: Interpretation of the characteristic input levels of differential SSTL signals (CK/CK#, DQS/DQS#; VTR: true level, VCP: complementary level) [19]

Table: Characteristic input levels of differential SSTL signals in DDR/DDR2/DDR3 devices [20], [22], [19]

        DDR      DDR2     DDR3
VDDQ    2.5 V    1.8 V    1.5 V
VREF    1.25 V   0.9 V    0.75 V
VID     620 mV   500 mV   400 mV
VIX     VREF     VREF     VREF
VSS     Ground   Ground   Ground

Skew reduction by differential data strobes (DQS, DQS#)

Figure: Skew reduction when using differential strobes instead of single ended strobes [23]

The eye diagram

The eye diagram visualizes both signal traces (belonging to the H and L levels) by overlapping subsequent symbols in time, as indicated below for both an ideal and a real signal (the latter showing jitter and reflections).

Figure: Eye diagram of an ideal and a real signal

The eye diagram is a favorable way
• to visualize reflections and jitter, and
• to contrast required and available values both for the DVW and for the voltage levels.

Visualizing both the min. and the available DVW and the voltage margins by means of an eye diagram:

Figure: Eye diagram of an ideal signal showing the min. DVW (tS + tH) inside the data eye, together with the timing margins and the voltage margins above VIH min and below VIL max

Figure: Eye diagram of a real signal showing both the min. and the available DVW and voltage levels [24]

For correct operation, the available DVW and voltage values must be ≥ the required values. A stable operation needs reasonable timing margins (timing budget) and voltage margins.
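How an eye diagram is built, by overlaying successive unit intervals (UIs) of the waveform, can be shown in a few lines. A toy Python sketch (the sample data are invented; a real measurement would feed in scope samples, and plotting the slices on top of each other would render the eye):

```python
def eye_slices(waveform, samples_per_ui):
    """Cut a 1-D sequence of voltage samples into unit-interval slices."""
    n = len(waveform) // samples_per_ui
    return [waveform[i*samples_per_ui:(i+1)*samples_per_ui] for i in range(n)]

def eye_opening(waveform, samples_per_ui, vref):
    """Vertical eye opening at the UI centre: the gap between the lowest
    'high' sample and the highest 'low' sample across all overlaid UIs."""
    mid = samples_per_ui // 2
    centres = [ui[mid] for ui in eye_slices(waveform, samples_per_ui)]
    highs = [v for v in centres if v > vref]
    lows = [v for v in centres if v <= vref]
    return min(highs) - max(lows) if highs and lows else 0.0

# Toy data: 4 samples per UI, jittery high/low levels around VREF = 1.25 V
wave = [0.3, 2.1, 2.3, 2.0,  0.4, 0.5, 0.3, 0.6,  0.2, 2.2, 2.4, 2.1]
print(f"eye opening: {eye_opening(wave, 4, vref=1.25):.2f} V")
```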
5.3 Using more advanced synchronisation

Improving the basic synchronisation scheme

Basic synchronisation schemes:

• Central clock synchronisation: a central clock is used to latch (capture) addresses, commands and data from the respective buses, or to send fetched data.
  • It leads to high skews due to propagation delays (time of flight), different path lengths, different loading of the traces etc.
  • SDRAMs and earlier DRAMs are centrally clocked.
• Source synchronisation: an extra data strobe signal (DQS) is provided to accompany the data sent from the driving unit to the receiving unit.
  • The data strobe signal eliminates propagation delay differences between the data lines.
  • DDR SDRAMs are source synchronised.
  • The data strobe signal (DQS) is bidirectional to reduce the pin count.

Figure: Basic synchronisation schemes

Central clock synchronization (SDRAMs): address, command and data lines are latched by the central clock (CLK).

Source synchronization (DDR SDRAMs): command and address lines are latched by the differential clock (CK, CK#), but data are latched by the source synchronous data strobe (DQS).

Figure: Contrasting central clocking (SDRAMs) and source synchronised clocking (DDR SDRAMs) while writing random data [25], [13] (tDQSS: Write command to first DQS latching transition)

Required phase alignments for synchronous DRAM devices, controllers and modules

In case of SDRAM devices:
• Memory controllers need to perform the following alignments:
  • for all commands, center align the address, command and control signals with the clock (CLK),
  • for data writes, center align the data signals (DQ) with the clock (CLK),
  • for data reads, SDRAM devices do not perform any alignment on the data sent to the controller; it is the task of the controller to shift the CLK edge to the center of the data eye.
• SDRAM devices do not need to perform any phase alignments; however, for data reads they have to guarantee that the required minimal data hold time (tOH) is satisfied (see the figure).
• SDRAM modules need to perform clock deskewing for the clock (CLK) distribution circuitry.

In case of DDRx SDRAM devices:
• Memory controllers need to perform the following alignments:
  • for all commands, center align the address, command and control signals with the clock (CK),
  • for data writes, center align the data signals (DQ) with the data strobe (DQS),
  • for data reads, DDRx devices send the data strobe (DQS) edge aligned with the data signals (DQ); it is then the task of the controller to shift the DQS edge to the center of the data eye.
• DDRx devices perform the following alignment:
  • for data reads they edge align the data strobe signal (DQS) with the data signals (DQ).

Figure: Required phase alignments in case of DDRx devices

Example: Shifting DQS into the center of DQ

Figure: DDR2 write operation at 800 MT/s showing the 90° shift of the differential DQS into the center of the data eye [27]

The rationale of this alignment scheme is to keep the DRAM devices as simple as possible and to put the complexity into the memory controller [27]: the DLL circuitry needed to accomplish the alignments is centralized in a single place, that is, in the memory controller, which avoids replicating DLLs in every DRAM device (except the DLLs needed in the DRAMs to edge align DQS with CK for reads) [26].

Furthermore, DDRx modules need to perform clock deskewing for the clock (CK) distribution circuitry.

DLLs (Delay Locked Loops) are used to
• edge align or deskew two signals, or
• center align the data strobe signal (DQS) with the data signal (DQ).

Figure: Deskewing the CLK signal with reference to the CLKREF signal by means of a DLL (delaying CLK by CLKD to obtain CLKOUT)

Figure: Shifting the data strobe (DQS) to the center of the data signal (DQ) by means of a DLL

Simplified block diagram and principle of operation of a DLL

The DLL is built up mainly of a delay line and a phase delay control unit. The phase delay control unit inserts delay into the clock signal (CLK) until the rising edge of the delayed clock is in phase with the rising edge of the reference clock signal (CLKREF).

Figure: Block diagram (a chain of delay elements, the phase delay control unit and the clock distribution network) and principle of operation of a DLL deskewing the clock signal CLK (based on [28])
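The principle of operation just described can be mimicked with a toy model: keep adding delay taps until the delayed clock edge coincides, within half a tap, with the reference edge. A sketch with an invented tap size and offset:

```python
def dll_lock(offset_ps, period_ps, tap_ps=50):
    """Insert whole delay taps of tap_ps until the delayed CLK edge is
    within half a tap of a CLKREF edge (modulo one clock period).
    Returns the number of taps used and the residual phase error."""
    taps = 0
    err = offset_ps % period_ps
    while min(err, period_ps - err) > tap_ps / 2:
        taps += 1                                  # add one more delay tap
        err = (offset_ps - taps * tap_ps) % period_ps
    return taps, min(err, period_ps - err)

taps, err = dll_lock(offset_ps=3200, period_ps=5000)
print(f"locked after {taps} taps, residual phase error {err} ps")
# -> 64 taps of 50 ps cover the 3.2 ns offset. As noted next, a real DRAM
#    DLL needs about 200 clock cycles after being enabled before it locks.
```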
"Warm-up" time of DLLs

In a DRAM device the DLL is activated during initialization (the power-up procedure). After being enabled, however, the DLL needs about 200 clock cycles to lock [13] before any read command can be issued.

Remark [6]
• PLLs and DLLs fulfill similar tasks. However,
  • PLLs include a voltage controlled oscillator (VCO) that generates a new clock signal whose phase is adjustable, whereas
  • DLLs include a delay line that inserts a voltage controlled phase delay between the input and the output signal.
• While DLLs just delay the incoming signal to achieve phase alignment, PLLs actually synthesize a new clock signal whose phase is adjustable.
• Since DLLs do not incorporate a VCO, they are cheaper to implement than PLLs.
• Memory controllers and DRAM devices of synchronous DRAMs make use of DLLs to implement phase alignments. In contrast, memory modules use PLLs to deskew clock distribution networks.

6. Main limiters of increasing the memory size

Memory size (CM)

The memory size is given basically by the amount of memory installed in the memory system:

CM = nCU × nCH × nM × nR × CR

with
nCU: no. of north bridges/memory control units
nCH: no. of memory channels per north bridge/control unit
nM: no. of memory modules per channel
nR: no. of ranks per memory module
CR: rank capacity (device density × no. of DRAM devices per rank)

E.g. the Core 2 based P35 chipset supports up to two memory channels with up to two dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density per rank. The resulting maximum memory capacity is:

CMmax = 1 × 2 × 2 × 2 × (8 × 1 Gb) = 8 GB
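The formula and the P35 example above translate directly into code. A minimal sketch (parameter names follow the definitions in the text):

```python
def max_memory_gb(n_cu, n_ch, n_m, n_r, device_gb, devices_per_rank):
    """CM = nCU x nCH x nM x nR x CR, with the rank capacity
    CR = device density x number of DRAM devices per rank."""
    rank_gb = device_gb * devices_per_rank
    return n_cu * n_ch * n_m * n_r * rank_gb

# P35: 1 MCH, 2 channels, 2 dual-ranked modules/channel, 8 x8 devices of 1 Gb
cm = max_memory_gb(n_cu=1, n_ch=2, n_m=2, n_r=2,
                   device_gb=1 / 8,        # 1 Gb device = 1/8 GB
                   devices_per_rank=8)
print(f"{cm:.0f} GB")                      # -> 8 GB, as quoted above
```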
Remark: Beyond the maximum installable memory, the maximum memory size may also be limited by particular constraints, such as the supported maximum addressable space due to the number of address pins on the FSB, as in the 925X and 925XE desktop chipsets [31].

Crucial factors limiting the maximum size of main memories:
• nM: the number of memory modules supported per memory channel,
• CR: the rank capacity (device density × no. of DRAM devices/rank).

Number of memory modules supported per memory channel

• Modules connected via a parallel bus (SDRAM, DDR, DDR2, DDR3 modules): typically 1-4 memory modules; higher transfer rates limit the number of memory modules typically to one or two.
• Modules connected via a serial bus (FBDIMM modules): 6-8 memory modules.

Figure: Number of memory modules supported per memory channel

Figure: Max. number of supported memory modules (slots)/channel in Intel's desktop chipsets, decreasing as the transfer rate rises from 133 to 1600 MT/s

Figure: Max. number of supported memory modules (slots)/channel in Intel's server chipsets (at introduction and later), decreasing as the transfer rate rises from 133 to 800 MT/s

Notes

1. Servers prefer memory size over memory speed. E.g.
• current desktop chipsets support speed grades of up to DDR3-1333 (even DDR3-1600 with strong size restrictions) and memory sizes of up to 4 GB/channel, whereas
• current server chipsets using parallel connected main memory support speed grades of only up to DDR2-667, but memory sizes of up to 16/24 GB/channel.

2. Servers expect registered memory modules rather than the unbuffered modules desktops use. Registered modules buffer the address and control lines; by reducing the signal loading they increase the number of supported memory slots (memory modules) and thus the supported memory size.

3. At higher transfer rates the next wavefront arrives earlier on the transmission line, so less time remains for settling the reflections of the previous wavefront, and inter-symbol interference (ISI) rises. Thus, at higher frequencies reflections, as well as skews and jitter, impede signal integrity more and more. This limits the number of supported memory modules/channel: recent desktop chipsets typically support 1-2, whereas server chipsets with a parallel communication path typically support 2-3 memory modules (slots)/channel.

Rank capacity (CR)

CR = nD × D

with
nD: number of DRAM devices/rank
D: device density

Number of DRAM devices/rank: a 64-bit wide rank consists of 8 x8 or 16 x4 devices and usually occupies one module side. E.g. a one-sided (single rank) DDR3 memory module is built up of 8 devices.

Remark: A few Intel server chipsets, such as the E7500 and E7501, supported stacked devices as well. E.g. the E7500 server chipset supported double-sided dual-rank DIMMs with 16 stacked devices (one rank) mounted on each side, yielding a total module size of 2 GB.

Figure: Double sided DDR SDRAM DIMM with 16 stacked devices on each side [30]

Device density

Figure: Evolution of DRAM densities (from 16 Kb in 1980 up to 1 Gb devices) and the number of units shipped per year; density grows at ~4×/4 years (based on [29])

Figure: Supported max. device size and max. memory size/channel in Intel's desktop chipsets, from the 845 (1/02) with 512 Mb devices and 1 GB/channel through the 875P (4/03), 925X (6/04) and 975X (11/05) to the P35 (6/07) and X48 (3/08) with 1 Gb devices and 4 GB/channel

Figure: Supported max. device size and max. memory size/channel in Intel's server chipsets (at introduction and later), from the E7501 (12/02) with 512 Mb devices and 8 GB/channel through the E7520 (8/04) to the 5100 (1/08) with 2 Gb devices and 16-24 GB/channel

Notes

1. As the figures indicate, recent desktops provide up to 4 GB/channel memory size, whereas recent servers (with parallel bus attachment) offer 4-8 times larger sizes.

2. Servers achieve larger memory sizes by
• supporting more memory modules (with registering expected) than desktop chipsets do, and
• using higher density DRAM devices at the same speed grade than desktop chipsets do (e.g. 1 Gb devices instead of 512 Mb devices, or 2 Gb devices instead of 1 Gb devices).

3. Recent server chipsets supporting main memories with serial bus attachment (like Intel's 5000 and 7000 DP and MP family chipsets) support both more channels and more modules/channel, providing much larger main memory sizes of up to 192 GB or more (see the section on main memories with serial bus attachment).

The rate of increasing DRAM densities

In accordance with Moore's law (saying that the transistor count per chip doubles about every 24 months), DRAM densities evolve at about 4×/4 years. For the same numbers of control units/modules/ranks the maximum size of main memories would thus also increase at about 4×/4 years. But as the number of modules/channel decreases with higher transfer rates, the maximum size of main memories increases at a rate of less than 4×/4 years.
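The closing growth-rate argument can be made concrete with a toy projection (all numbers are illustrative; the halving of modules/channel per 4-year span is an assumption made for the sake of the example):

```python
def projected_size_gb(size_gb, years, density_rate=4.0, span_years=4.0,
                      module_factor=1.0):
    """Scale today's per-channel memory size by the ~4x/4-years density
    growth and by any change in the supported modules per channel
    (module_factor applied once per 4-year span)."""
    spans = years / span_years
    return size_gb * (density_rate ** spans) * (module_factor ** spans)

# Starting from 4 GB/channel:
print(projected_size_gb(4, years=8))                     # 64 GB if the
                                                         # module count held
print(projected_size_gb(4, years=8, module_factor=0.5))  # 16 GB if modules/
                                                         # channel halve
```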
7. References

[1]: Bhandarkar D., "The Dawn of a New Era," 11th EMEA, May 2006
[2]: Moore G. E., "No Exponential is Forever...," ISSCC 2003, ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf
[3]: Gschwind M., "Chip Multiprocessing and the Cell BE," ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
[4]: 16 Mb Synchronous DRAM, MT48LC4M4A1/A2, MT48LC2M8A1/A2, Micron, http://datasheet.digchip.com/297/297-04447-0-MT48LC2M8A1.pdf
[5]: Rhoden D., "The Evolution of DDR," Via Technology Forum, 2005, http://www.via.com.tw/en/downloads/presentations/events/vtf2005/vtf05hd_inphi.pdf
[6]: Jacob B., Ng S. W., Wang D. T., Memory Systems, Elsevier, 2008
[7]: Backplane Designer's Guide, Section 9 - Layout Considerations, Fairchild Semiconductor, Apr. 2002, http://www.fairchildsemi.com/ms/MS/MS-569.pdf
[8]: PC133 SDRAM Registered DIMM Design Specification, Rev. 1.1, Aug. 1999, IBM & Reliance Computer Corp., http://www.simmtester.com/PAGE/memory/techdata_pc133rev1_1.pdf
[9]: Horna O. A., "Pulse Reflection in Transmission Lines," IEEE Transactions on Computers, Vol. C-20, No. 12, Dec. 1971, pp. 1558-1563
[10]: Vo J., "A Comparison of Differential Termination Techniques," Application Note 903, Aug. 1993, National Semiconductor, http://www1.control.com/PLCArchive/RS485_3.pdf
[11]: Allan G., "The outlook for DRAMs in consumer electronics," EETimes Europe Online, 01/12/2007, http://eetimes.eu/showArticle.jhtml?articleID=196901366&queryText=calibrated
[12]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Feb. 2003, Xilinx Inc.
[13]: Double Data Rate (DDR) SDRAM MT46V128M4, MT46V64M8, MT46V32M16, Micron Technology, Inc., 2000, http://download.micron.com/pdf/datasheets/dram/ddr/512MBDDRx4x8x16.pdf
[14]: Kirstein B., "Practical timing analysis for 100-MHz digital design," EDN, Aug. 8, 2002, www.edn.com
[15]: Ebeling C., Koontz T., Krueger R., "System Clock Management Simplified with Virtex-II Pro FPGAs," WP190, Feb. 25, 2003, Xilinx, http://www.xilinx.com/support/documentation/white_papers/wp190.pdf
[16]: DDR Simulation Process Introduction, TN-46-11, July 2005, Micron, http://download.micron.com/pdf/technotes/DDR/TN4611.pdf
[17]: Allan G., "DDR Integration," Chip Design Magazine, June/July 2007
[18]: Stub Series Terminated Logic for 2.5 Volts (SSTL-2), EIA/JEDEC Standard JESD8-9, Sept. 1998
[19]: Stub Series Terminated Logic for 1.8 Volts (SSTL-18), JEDEC Standard JESD8-15A, Sept. 2003
[20]: Double Data Rate (DDR) SDRAM Specification, JEDEC Standard JESD79E, May 2005
[21]: DDR2 SDRAM Specification, JEDEC Standard JESD79-2, May 2006
[22]: DDR3 SDRAM Standard, JEDEC Standard JESD79-3, June 2007
[23]: DDR2 (Point-to-Point) Features and Functionality, TN-47-19, Micron, 2003, http://download.micron.com/pdf/technotes/ddr2/TN4719.pdf
[24]: Ahn J.-H., "Memory Design Overview," March 2007, Hynix, http://netro.ajou.ac.kr/~jungyol/memory2.pdf
[25]: Micron Synchronous DRAM, 64 Mbit, MT48LC16M4A2, MT48LC16M8A2, MT48LC16M16A2, Micron Technology, Inc., Oct. 2000, http://www.micron.com/products/dram/sdram/partlist.aspx
[26]: General DDR SDRAM Functionality, TN-46-05, Micron Technology, Inc., July 2001, http://download.micron.com/pdf/technotes/TN4605.pdf
[27]: Haskill, "The Love/Hate Relationship with DDR SDRAM Controllers," Mosaid, Oct. 2006, http://www.mosaid.com/corporate/products-services/ip/SDRAM_Controller_whitepaper_Oct_2006.pdf
[28]: Introduction to Xilinx, Xilinx FPGA Design Workshop, http://www.eas.asu.edu/~kchatha/cse320_f07/xilinx_intro.ppt
[29]: DRAM Pricing - A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/dram_pricing.pdf
[30]: Intel E7500 MCH A2 x4/x8 DDR Memory Limitations, Application Note AP-722, March 2002, Intel
[31]: Intel 925X/925XE Express Chipset, Datasheet, Rev. 001, June 2004, Intel
[32]: Keeth B., Baker R. J., Johnson B., Lin F., DRAM Circuit Design, Wiley-Interscience, 2008