2. Challenges/limiters of parallel connected synchronous memories

Dezső Sima
September 2008
(Ver. 1.0)
© Sima Dezső, 2008
Overview
• 1. Key challenges facing main memories
• 2. Main limiters of increasing the transfer rate of main memories - Overview
• 3. Challenges in increasing the rate of sourcing/sinking data from/to the memory cell array
• 4. Challenges in increasing the transfer rate between the memory controller and the DRAM parts
• 5. Challenges in increasing the rate of capturing data in the memory controller/DRAM parts
• 6. Main limiters of increasing the memory size
• 7. References
1. Key challenges facing main memories
1. Key challenges facing main memories (1)
Key challenges facing main memories
• Increasing (single core) processor performance (the past)
1. Key challenges facing main memories (2)
Integer performance grows
SPECint92 performance of Intel's x86 processors grew at a rate of roughly 100x per 10 years, from the 8088/5 (1979) through the 386, 486, Pentium, Pentium Pro, PII, PIII and P4 lines, levelling off around the P4/3200 and the Prescott cores (2003-2005).
Figure 1.2: Integer performance growth of Intel's x86 processors
1. Key challenges facing main memories (3)
Key challenges facing main memories
• Increasing (single core) processor performance (the past)
• Multicore/manycore processors with doubling core numbers about every two years
(the present and near future)
1. Key challenges facing main memories (4)
Evolution of Intel’s process technology
Shrinking: ~ 0.7/2 Years
Figure: Evolution of Intel’s process technology [1]
1. Key challenges facing main memories (5)
The evolution of IC complexity (Moore's law)
Figure: The actual rise of IC complexity in DRAMs and microprocessors [2]
1. Key challenges facing main memories (6)
Rapid spreading of multicore processors in Intel’s processor portfolio
Figure: Rapid spreading of Intel’s multicore processors
1. Key challenges facing main memories (7)
The Cell BE (2006)
SPE: Synergistic Processing Element
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
LS: Local Store of 256 KB
SMF: Synergistic Mem. Flow Unit
EIB: Element Interconnect Bus
PPE: Power Processing Element
PPU: Power Processing Unit
PXU: POWER Execution Unit
MIC: Memory Interface Contr.
BIC: Bus Interface Contr.
XDR: Rambus DRAM
Figure: Block diagram of the Cell BE [3]
1. Key challenges facing main memories (8)
Assuming that IC process technology will evolve in the near future at a similar rate as now
(shrinking of characteristic feature sizes at a rate of ~0.7/2 years),
the number of cores will also double about every two years.
1. Key challenges facing main memories (9)
Higher processor performance/more cores
Higher memory performance requirements in terms of
• larger memory size
• higher memory bandwidth
• lower memory latency
1. Key challenges facing main memories (10)
Higher processor performance/more cores
Depends on
• characteristics of the application
• cache architecture
• ...
Higher memory performance requirements in terms of
• larger memory size
• higher memory bandwidth
• lower memory latency
1. Key challenges facing main memories (11)
Higher processor performance/more cores
Interesting
research
area
Depends on
• characteristics of the application
• cache architecture
• ...
Higher memory performance requirements in terms of
• larger memory size
• higher memory bandwidth
• lower memory latency
1. Key challenges facing main memories (12)
Higher processor performance/more cores
Depends on
• characteristics of the application
• cache architecture
• ...
Higher memory performance requirements in terms of
• larger memory size
• higher memory bandwidth
• lower memory latency
Limitations of
recent
implementations
2. Main limiters of increasing the transfer rate of
main memories - Overview
2. The transfer rate of main memories (1)
Main components of the main memory
The main memory consists of the memory controller and the DRAM devices; each DRAM device comprises a memory cell array and I/O buffers.
Figure: Main components of the main memory
2. The transfer rate of main memories (2)
Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates
• The rate of sourcing/sinking data from/to the memory array
(problem of reducing the Column Cycle Time of the memory cell array)
Figure: Schematic view of the structure of the main memory
2. The transfer rate of main memories (3)
Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates
• The rate of transmitting data between the memory controller and the memory modules
(transmission line termination problem)
Figure: Schematic view of the structure of the main memory
2. The transfer rate of main memories (4)
Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates
• The rate of capturing data in the memory controller/memory module
(signaling and synchronization problem)
Figure: Schematic view of the structure of the main memory
2. The transfer rate of main memories (5)
Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates
• The rate of sourcing/sinking data from/to the memory array
(problem of reducing the Column Cycle Time of the memory cell array),
• The rate of transmitting data between the memory controller and the memory modules
(transmission line termination problem),
• The rate of capturing data at the memory controller/memory module
(signaling and synchronization problem).
The most serious limitation constrains the achievable transfer rate.
3. Challenges in increasing the rate of sourcing/sinking data from/to the memory cell array
3. The rate of sourcing/sinking data (1)
Basic operation speed of recent synchronous DRAMs
The memory cell array sources/sinks data to/from the I/O buffers at a rate of T (at a data width of x4/x8/x16):

T = 1/tCCD x FW

with tCCD: Min. column cycle time of the memory cell array
     FW:   Fetch width of the memory cell array
3. The rate of sourcing/sinking data (2)
The min. column cycle time (tCCD) of the memory cell array
tCCD (Core column delay)
is the min. time interval between consecutive Reads or Writes.
Figure: The interpretation of tCCD [4]
Remark
tCCD is also designated as the Read/Write command to Read/Write command delay.
3. The rate of sourcing/sinking data (3)
Figure: The evolution of the column cycle time (tCCD) in different SDRAM types (ns) [5]
Note: The min. column cycle time (tCCD) of synchronous DRAMs is:
SDRAM: 7.5 ns
DDR/DDR2/DDR3: 5 ns
3. The rate of sourcing/sinking data (4)
The fetch width (FW) of the memory cell array
specifies how many times more bits the cell array fetches per column cycle than the data width of the device.
E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 x 4, that is 16 bits, from the memory cell array per column cycle.
The fetch width (FW) of the memory cell array of synchronous DRAMs is typically:

DRAM type   FW
SDRAM       1
DDR         2
DDR2        4
DDR3        8
3. The rate of sourcing/sinking data (5)
E.g., with a DRAM core clock of 100 MHz:
• SDRAM-100: the memory cell array delivers n bits per column cycle to the I/O buffers; data are transferred on the rising edges of CK over the data lines (DQ0-DQn-1): 100 MT/s (CK: 100 MHz).
• DDR-200: the array delivers 2 x n bits per column cycle; data are transferred on both edges of DQS over the data lines: 200 MT/s (DQS, CK/CK#: 100 MHz).
• DDR2-400: the array delivers 4 x n bits per column cycle to I/O buffers clocked at twice the core clock; data are transferred on both edges of DQS: 400 MT/s (DQS, CK/CK#: 200 MHz, DRAM core clock = fCK/2).
• DDR3-800: the array delivers 8 x n bits per column cycle; data are transferred on both edges of DQS: 800 MT/s (DQS, CK/CK#: 400 MHz, DRAM core clock = fCK/4).
Figure: Fetch width of synchronous DRAM generations
3. The rate of sourcing/sinking data (6)
According to Tmax = 1/tCCD x FW,
the peak rates of sourcing/sinking data to/from the I/O buffers are:

SDRAM: 1/7.5 x 1 = 133 MT/s
DDR:   1/5 x 2 = 400 MT/s
DDR2:  1/5 x 4 = 800 MT/s
DDR3:  1/5 x 8 = 1600 MT/s (not yet achieved)
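A minimal sketch (in Python, not part of the original slides) reproducing the peak-rate figures above from the tCCD and FW values given in the text:

def peak_rate_mts(tccd_ns, fetch_width):
    # Tmax = 1/tCCD x FW; 1/ns equals GT/s, so multiply by 1000 for MT/s
    return 1e3 / tccd_ns * fetch_width

for name, tccd, fw in [("SDRAM", 7.5, 1), ("DDR", 5.0, 2),
                       ("DDR2", 5.0, 4), ("DDR3", 5.0, 8)]:
    print(name, round(peak_rate_mts(tccd, fw)), "MT/s")
# prints 133, 400, 800 and 1600 MT/s, matching the list above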
The main limitation in increasing the rates of sourcing/sinking data from/to the memory array is tCCD (the column cycle time).
The column cycle time (tCCD) resulting from a DRAM design depends on a number of architectural choices, like column decoder layout, array block size, array partitioning, decisions to share resources between array banks etc. [32].
Its reduction below 5 ns is an intricate circuit design task that is out of the scope of our discussion. For an insight into the subject see [32].
Remark
GDDR3 and GDDR4 devices, with peak transfer rates of 1.6 and 2.5 GT/s, respectively, achieve min. column cycle times (tCCD) of 2.5 and 3.2 ns, respectively [32].
4. Challenges in increasing the transfer rate between
the memory controller and the DRAM parts
4. The transfer rate between the MC and the DRAM parts (1)
The dataway connecting the memory controller and the DRAM chips
The memory controller is connected to the memory modules by motherboard traces.
Figure: The dataway connecting the memory controller and the DRAM chips (based on [6])
4. The transfer rate between the MC and the DRAM parts (2)
The dataway connecting the memory controller and the DRAM chips
For higher data rates PCB traces behave as transmission lines.
Figure: The dataway connecting the memory controller and the DRAM chips (based on [6])
4. The transfer rate between the MC and the DRAM parts (3)
Basic behaviour of transmission lines (TLs)
A TL connects a driver with a receiver.
Principle of operation
• A signal front given at the input of the TL travels down the TL from the driver side to the receiver side.
• Arriving at the receiver side, the signal is reflected back to the driver side; then,
• at the driver side, the signal is reflected again toward the receiver side, etc.
4. The transfer rate between the MC and the DRAM parts (4)
Transmission lines (TLs)
PC board traces (microstrips) behave above ~100 MT/s like transmission lines with
• a characteristic impedance (ZO)
• and a trace velocity.
4. The transfer rate between the MC and the DRAM parts (5)
Characteristic impedance of PCB traces (ZO) [7]
Table: Typical characteristic impedance values of PCB traces [8]
4. The transfer rate between the MC and the DRAM parts (6)
Trace velocity
Table: Typical trace velocity values of PCB traces [8]
Remark
With 1 ft = 30.48 cm, the equivalent values in cm/ns are:
1.6 ns/ft equals ~ 19 cm/ns
2.0 ns/ft equals ~ 15 cm/ns
2.2 ns/ft equals ~ 14 cm/ns
4. The transfer rate between the MC and the DRAM parts (7)
Behaviour of an ideal TL
Ideal TL: no attenuation, no capacitive or inductive loading.

With VO(t):  Generator voltage
     VD(t):  Voltage at the driver output
     VrD(t): Reflected voltage at the driver
     VR(t):  Voltage at the receiver
     VrR(t): Reflected voltage at the receiver
     ZD:     Internal impedance of the driver
     ZO:     Characteristic impedance of the TL
     ZT:     Impedance terminating the TL
     T:      Flight time over the TL

Figure: Equivalent circuit of an ideal transmission line
(neglecting attenuation along the TL and capacitive as well as inductive loading of the TL)
4. The transfer rate between the MC and the DRAM parts (8)
Characteristic equations
describing the reflections and driver/receiver side voltages (based on [9])

At t = 0 (driver side):
VO(t=0) = VO
VD(0) = VO x ZO / (ZO + ZD)
VrD(0) = VD(0)

At t = T (T: propagation time across the TL; receiver side):
VR(T) = VD(0) x (1 + rR)
VrR(T) = VD(0) x rR
where rR = (ZT - ZO) / (ZT + ZO)

At t = nT (n > 1):
Driver side:
VD((n+1)T) = VD((n-1)T) + VrR(nT) x (1 + rD)
VrD((n+1)T) = VrR(nT) x rD
where rD = (ZD - ZO) / (ZD + ZO)
Receiver side:
VR(nT) = VR((n-2)T) + VrD((n-1)T) x (1 + rR)
VrR(nT) = VrD((n-1)T) x rR

At t → ∞ (steady state):
Receiver side:
VR(t → ∞) = VO x ZT / (ZT + ZD)
4. The transfer rate between the MC and the DRAM parts (10)
Example 1: Open ended ideal TL
VO(t=0) = 2 V, ZD = 25 Ω, ZO = 50 Ω, ZT >> ZO (open end)
Figure: Equivalent circuit of an open ended ideal TL
4. The transfer rate between the MC and the DRAM parts (11)
Starting from VD(0) = 1.33 V, the node voltages bounce through 2.67, 2.22, 1.78, 1.93, 2.07, 2.02 and 1.98 V at successive intervals of T, alternating between the receiver and the driver side and converging in ever smaller steps toward the 2 V steady state value.
Figure: Ladder diagram and VD(t), VR(t) waveforms of an open ended ideal TL (based on [6])
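The bounce behaviour can be reproduced with a small script. The following Python sketch (not part of the original slides) simply iterates the characteristic equations above; the impedances are those of Example 1, and the huge ZT value stands in for an open end:

def reflection_coeff(z_load, z0):
    return (z_load - z0) / (z_load + z0)

def bounce(vo, zd, z0, zt, round_trips=4):
    r_d = reflection_coeff(zd, z0)       # driver side reflection coefficient
    r_r = reflection_coeff(zt, z0)       # receiver side reflection coefficient
    wave = vo * z0 / (z0 + zd)           # initial wavefront VD(0)
    vd, vr = wave, 0.0                   # node voltages at driver/receiver
    print(f"t=0 : VD = {vd:.2f} V")
    for n in range(round_trips):
        vr += wave * (1 + r_r)           # incident plus reflected at receiver
        wave *= r_r
        print(f"t={2*n+1}T: VR = {vr:.2f} V")
        vd += wave * (1 + r_d)           # incident plus reflected at driver
        wave *= r_d
        print(f"t={2*n+2}T: VD = {vd:.2f} V")

bounce(2.0, 25.0, 50.0, 1e12)            # open end modelled by a huge ZT
# reproduces the ladder values 1.33, 2.67, 2.22, 1.78, 1.93, 2.07, ... V

With a matched termination (bounce(2.0, 25.0, 50.0, 50.0)) rR = 0, so VD and VR settle to 1.33 V immediately, as in Example 3 later in this section.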
4. The transfer rate between the MC and the DRAM parts (12)
D: Driver, R: Receiver, O: Output, I: Input
Figure: Open ended real TL (differential connection) [10]
Reflections occur at both ends (R-end, D-end).
(R-end, D-end)
4. The transfer rate between the MC and the DRAM parts (13)
Reflections
Figure: Reflections shown on an eye diagram, due to termination mismatch [11]
4. The transfer rate between the MC and the DRAM parts (14)
Implications of the reflections on a TL
• When a data signal is given at the driver side of the TL, a signal wavefront travels down the TL and is ping-ponged between both ends of the TL until the steady state condition is reached.
• Until the signal has at least nearly settled, no further wavefront can be given to the TL, otherwise inter-symbol interference (ISI) arises.
Reflections limit the max. data transfer rate of a TL.
4. The transfer rate between the MC and the DRAM parts (15)
The max. data transfer rate is limited primarily by the time until the signal settles, that is, it depends both on
• the number of signal round trips until the signal settles, and
• the length of the TL.

Example
Open ended TL of a length of 10 cm
Assumptions:
• Signal velocity on the TL is 20 cm/ns, so T = 0.5 ns.
• Reflections settle to an acceptable level after three round trips (6T).
Then the wavefront of a signal settles after about 6 x 0.5 ns = 3 ns.
As half of the min. cycle time is thus 3 ns, the min. cycle time is 6 ns, and
the max. transfer rate of the above open ended TL is ~166 MHz.
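The settling-time estimate of the example can be expressed as a small Python helper (a sketch under the example's assumptions, not a general signal integrity model):

def max_rate_mhz(length_cm, velocity_cm_per_ns, settle_flights=6):
    t_flight_ns = length_cm / velocity_cm_per_ns   # one-way flight time T
    settle_ns = settle_flights * t_flight_ns       # reflections acceptably low
    min_cycle_ns = 2 * settle_ns                   # settle time is half a cycle
    return 1e3 / min_cycle_ns

print(max_rate_mhz(10, 20))   # -> ~166 MHz for the 10 cm open ended trace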
4. The transfer rate between the MC and the DRAM parts (16)
Open ended TLs may be used only for
• relatively low transfer rates (up to ~100 MHz), that is, up to SDRAM devices, and
• short distances (up to ~10 cm).
For higher transfer rates or longer distances the TL needs to be terminated by its characteristic impedance ZO.
4. The transfer rate between the MC and the DRAM parts (17)
Reducing reflections by a series resistor
A series resistor put before the TL reduces reflections →
improved signal integrity, higher transfer rates.
4. The transfer rate between the MC and the DRAM parts (18)
Example 2: Using series resistors to reduce reflections
Figure: Equivalent circuit of an open ended TL with a series resistor (R3 in the figure) included
between the driver and the TL (Micro-Cap 9.0.5.0)
4. The transfer rate between the MC and the DRAM parts (19)
Figure: Driver (Vout) and Receiver (Vin) voltages of an open ended TL with a series resistor R3
(the value of R3 is varied from 0 to 25 Ω)
4. The transfer rate between the MC and the DRAM parts (20)
The memory controller drives the two SDR DIMM slots with LVTTL signaling; on the modules, series resistors (RS) are inserted into the DQ, DQS and DM lines, while the command, control and address lines carry no series resistors.
Figure: Series resistors on an SDRAM module inserted into the DQ, DQS, DM lines
(RS = 10 or 22 Ω)
4. The transfer rate between the MC and the DRAM parts (21)
Matched TLs
Needed above ~100 MHz (i.e. for DDR/DDR2/DDR3 memories).
Basic scheme for unidirectional signals (assuming SSTL signaling): the TL (ZO = 50 Ω) is terminated at the receiver side by RT, connected to the termination voltage VT.

VT:   Termination voltage = VREF
VREF: 0.5 x output voltage
RT:   50 Ω

Figure: Termination of a TL with its characteristic impedance [12]
SSTL: Stub Series Terminated Logic
4. The transfer rate between the MC and the DRAM parts (22)
Example 3: Perfectly terminated ideal TL
VO(t=0) = 2 V, ZD = 25 Ω, ZO = 50 Ω, ZT = 50 Ω
Figure: Equivalent circuit of a perfectly terminated ideal TL
4. The transfer rate between the MC and the DRAM parts (23)
The driver voltage steps to VD = 1.33 V at t = 0 and the receiver voltage to VR = 1.33 V at t = T; since rR = 0, no reflections occur and both voltages remain at 1.33 V.
Figure: Ladder diagram and waveforms VD(t), VR(t) of a perfectly matched ideal TL (based on [6])
4. The transfer rate between the MC and the DRAM parts (24)
RT = ZO
Figure: Perfectly matched real TL (differential connection) [10]
No reflections occur from the receiver end.
4. The transfer rate between the MC and the DRAM parts (25)
The problem of TL inhomogeneity
• The TL connecting the memory controller and the DRAM devices is not homogeneous; it consists of multiple sections.
Figure: Discontinuities of TLs connecting the memory controller and the memory modules (based on [6])
4. The transfer rate between the MC and the DRAM parts (26)
Figure: Discontinuities of TLs connecting the slot to the particular DRAM devices, assuming stub-bus topology and a registered memory module [5]
4. The transfer rate between the MC and the DRAM parts (27)
The problem of TL inhomogeneity
• The TL connecting the memory controller and the DRAM devices is not homogeneous; it consists of multiple sections.
• Between different TL sections there are discontinuities that give rise to reflections.
Figure: Discontinuities of TLs connecting the memory controller and the memory modules (based on [6])
4. The transfer rate between the MC and the DRAM parts (28)
Addressing the problem of TL discontinuities
SSTL termination (Stub Series Terminated Logic)
Used in DDR/DDR2/DDR3 devices
Principle
Use both perfect termination and a series resistor (RS) to increase the TL attenuation and thus reduce reflections from the memory module back to the memory controller [6].

RS:   22/25 Ω
RT:   50 Ω
VT:   Termination voltage = VREF
VREF: 0.5 x output voltage

Figure: SSTL termination of a unidirectional signal (ZO = 50 Ω) [12]
4. The transfer rate between the MC and the DRAM parts (29)
Figure: Equivalent circuit of two TLs (T1, T2) with slightly different characteristic impedances and a series resistor (R3); T2 is terminated by 50 Ω and 3 pF.
4. The transfer rate between the MC and the DRAM parts (30)
Discontinuities of the transmission line generate reflections.
Figure: Driver (Vout) and Receiver (Vin) voltages of the previous equivalent circuit
4. The transfer rate between the MC and the DRAM parts (31)
Higher series resistor values attenuate reflections but lower the steady state output voltage.
Figure: Driver (Vout) and Receiver (Vin) voltages of the previous equivalent circuit
(the value of R3 is varied from 0 to 25 Ω)
4. The transfer rate between the MC and the DRAM parts (32)
Higher output capacitance values lower the reflections.
Figure: Driver (Vout) and Receiver (Vin) voltages of the previous equivalent circuit
(the value of C3 is varied from 0 to 9 pF)
4. The transfer rate between the MC and the DRAM parts (33)
Note
With increasing values of RS (from 2 Ω to 22 Ω) the amplitude of the reflected voltage at the receiver side clearly decreases.
4. The transfer rate between the MC and the DRAM parts (34)
Example 1: Line terminations in a DDR memory
The memory controller drives the two DDR DIMM slots with SSTL_2 signaling; the command, control and address lines carry series resistors RS1 on the modules, the DQ, DQS/# and DM lines carry series resistors RS2 on the modules, and the lines are terminated by RT to the termination voltage VT on the motherboard.
Figure: Line terminations in a DDR memory
(RS1: 7.5 Ω for 4 devices, 5.1 Ω for 8 devices, 3 Ω for 16 devices; RS2 = 22 Ω, RT = 56 Ω)
4. The transfer rate between the MC and the DRAM parts (35)
In order to achieve higher transfer rates
more and more sophisticated line terminations are needed.
Examples: Synchronous DRAMs (commodity DRAMs)
4. The transfer rate between the MC and the DRAM parts (36)
Example 2: Line terminations in a DDR2 memory
DDR2 introduces On-Die Termination (ODT): the data lines (DQ, DQS/#, DM) are terminated on the DRAM die by resistor pairs (R1, R2), while the command, control and address lines remain terminated on the motherboard. SSTL_18 signaling is used, with series resistors RS1 (command/control/address) and RS2 (data lines) on the modules.
Figure: Line terminations in a DDR2 memory
(RS1: 10 Ω for 4 devices, 5.1-10 Ω for 8 devices, 7.5 Ω for 16 devices; RS2 = 22 Ω, RTT = 47 Ω)
4. The transfer rate between the MC and the DRAM parts (37)
Example 3: Line terminations in a DDR3 memory
Dynamic On-Die Termination (ODT): the on-die termination resistance of the data lines can be optimized along with each write command. SSTL_15 signaling is used; the command, control and address lines are routed in fly-by topology and terminated by RT at the end of the module.
ZQ calibration: the "on" and the "termination" impedances of the merged drivers are adjusted every 128 ms, using an external precision resistor (RZQ) attached to the ZQ pin.
Figure: Line terminations in a DDR3 memory
(RS = 10-15 Ω, RT = 36-39 Ω, RZQ = 240 Ω ±1%)
Remark: Due to the fly-by module topology no series resistors are needed for the Command/Control/Address lines.
4. The transfer rate between the MC and the DRAM parts (38)
(Values listed in the order: SDRAM; DDR SDRAM; DDR2 SDRAM; DDR3 SDRAM)

Signaling
• C/C/A: LVTTL; SSTL_2; SSTL_18; SSTL_15
• Clock (CLK/CK): LVTTL; SSTL_2 Diff.; SSTL_18 Diff.; SSTL_15 Diff.
• DQ, DQM: LVTTL; SSTL_2; SSTL_18; SSTL_15
• DQS: --; SSTL_2; SSTL_18/SSTL_18 Diff.; SSTL_15 Diff.

Terminations
• C/C/A:
  RS: no RS; RS on module; RS on module; no RS (fly-by topology)
  RT: no RT; RT on board; RT on board; RT on module
• DQ/DQS/DM:
  RS: RS on module (all types)
  RT: --; RT on board; ODT (RT on die); Dyn. ODT (RT on die)

Driver architecture: separate output/termination drivers (SDRAM, DDR, DDR2); merged output/termination drivers with ZQ calibration during power up/periodically (DDR3)

Synchronization
• Basic scheme: central clock (SDRAM); source synchronization (DDR/DDR2/DDR3)
• Aligning DQS with CK: no DQS; DLL; DLL; DLL + read/write leveling to compensate flight-time skews between DQS and CK (during power-up)

Posted reads/writes: no; no; yes; yes
Reset pin: no; no; no; yes
DIMM topology: stub architecture (SDRAM, DDR, DDR2); fly-by architecture (DDR3)
Packaging: TSOP-54; TSOP-54, BGA-60; BGA-60 for x4/x8, BGA-84 for x16; BGA-78 for x4/x8, BGA-96 for x16

C/C/A: Command/Control/Address; DQ: Data; DQM: Data Mask; DQS: Data Strobe

Table: Implementation details of SDRAM types
4. The transfer rate between the MC and the DRAM parts (39)
Line terminations of recent commodity DRAMs have already reached a rather high grade of sophistication;
there is not much headroom remaining for further improvements.
5. Challenges in increasing the rate of capturing data in the memory controller/DRAM parts
• 5.1 Coping with capturing data
• 5.2 Using more advanced signalling
• 5.3 Using more advanced synchronisation
5.1 Coping with capturing data (1)
Basics of capturing data
• Data and commands are latched by D flip-flops.
• For correctly capturing data or commands, input signals need to be held valid for specified periods of time before and after the clock pulse, termed the setup time (tS) and the hold time (tH), as shown in the figure.
Figure: Temporal requirements for correctly capturing data
5.1 Coping with capturing data (2)
Setup time (tS)
the minimum time interval for which the input signal must remain valid (high or low)
prior to the clock edge in order to capture the data bit correctly.
Hold Time (tH)
the minimum time interval for which the input signal must remain valid (high or low)
following the clock edge in order to capture the data bit correctly.
5.1 Coping with capturing data (3)
Specification of the setup time (tS) and the hold time (tH)
In device datasheets, e.g. in case of a DDR-400 device:
Table: Excerpt from the specification of the dynamic parameters of a DDR-400 device [13]
Note: A DDR-400 device is clocked at 200 MHz, so its half clock period (one bit time) is 2.5 ns.
By contrast, its setup and hold times are only 0.4 ns each (designated as tDS, tDH in the table).
5.1 Coping with capturing data (4)
Minimum data valid window (DVW)
the minimum time interval for which the input signal must remain valid (high or low) before and after the clock edge in order to capture the data bits correctly.
Figure: Interpretation of the minimum DVW for ideal signals
The minimum DVW has two characteristics:
• a size, i.e. the sum of the setup time (tS) and the hold time (tH), and
• a correct phase relative to the clock edge, satisfying both the tS and tH requirements.
5.1 Coping with capturing data (5)
If tS = tH, the clock edge needs to be center aligned with the DVW, as indicated below.
Figure: Center aligned clock edge within the min. DVW
Example
In a DDR-400 SDRAM tS = tH = 0.4 ns [13], so
• the min. DVW is 0.8 ns, i.e. roughly 1/3 of the 2.5 ns bit time, and
• the clock edge needs to be center aligned in the min. DVW.
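The remaining per-bit timing budget can be illustrated with a short Python sketch (values taken from the DDR-400 example above; the helper itself is not from the slides):

def capture_margin_ns(f_clk_mhz, t_setup_ns, t_hold_ns, ddr=True):
    bit_time_ns = 1e3 / f_clk_mhz      # clock period in ns
    if ddr:
        bit_time_ns /= 2               # two transfers per clock period
    min_dvw_ns = t_setup_ns + t_hold_ns
    return bit_time_ns - min_dvw_ns    # budget left for skews and jitter

print(capture_margin_ns(200, 0.4, 0.4))   # -> 1.7 ns margin per bit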
5.1 Coping with capturing data (6)
Available DVW
the time interval for which the input signal actually remains valid (high or low).
Figure: Interpretation of the minimum DVW and the available DVW for ideal signals
For correctly capturing data, two requirements need to be fulfilled:
• the available DVW ≥ the min. DVW, and
• the clock edge needs to be properly aligned (usually center aligned) within the available DVW.
5.1 Coping with capturing data (7)
Note
Assuming tS = tH (as usual), for the highest transfer rate the clock signal needs to be center aligned with the data.
5.1 Coping with capturing data (8)
Reduction of the available DVW in real systems
In real systems the available DVW is reduced due to
• skews and
• jitter.
5.1 Coping with capturing data (9)
Skew
is a time offset of the signal edges
• between different occurrences of the same signal, such as a clock, at different locations on a chip or a PC board (as shown in the figure below), or
• between different bit lines of a parallel bus at a given location.
Figure: Skew due to propagation delay [15]
Skews arise mainly due to
• propagation delays in the PC-board traces, also termed time of flight (TOF) (about 170 ps/inch), as indicated above [14],
• capacitive loading of a PC-board trace (about 50 ps per pF), as indicated in the subsequent figure [14],
• SSO (Simultaneous Switching Outputs), occurring due to parasitic inductances when a number of bit lines change their output states simultaneously.
5.1 Coping with capturing data (10)
Figure: Skew due to capacitive loading of signal lines (CK-1 vs. CK-2) [14]
5.1 Coping with capturing data (11)
Reduction of operational tolerances due to skews
(Center aligned clock vs. skewed clock: with a skewed clock the timing margins around the min. DVW shrink.)
Figure: Reduction of operational tolerances due to clock skew
(ideal signals assumed)
A larger than indicated skew would even jeopardize or prevent correct operation.
Deskewing of clock distribution is needed.
5.1 Coping with capturing data (12)
Jitter
• phase uncertainty causing ambiguity in the rising and falling edges of a signal, as shown in the figure below;
• it has a stochastic nature.
Figure: Jitter of signal edges [15]
The main sources of jitter are
• crosstalk, caused by coupling between adjacent traces on the board or in the DRAM device,
• ISI (Inter-Symbol Interference), caused by cycling the bus faster than it can settle,
• reflection noise, due to mismatched termination of signal lines,
• EMI (Electromagnetic Interference), caused by electromagnetic radiation emitted from external sources.
5.1 Coping with capturing data (13)
Narrowing the available DVW due to jitter
Jitter obviously narrows the available DVW, as shown in the following example for DDR-200 devices.
(DDR-200 devices are clocked at 100 MHz, thus their half clock period is 5 ns.)
Figure: Narrowing the available DVW due to jitter (available DVW of ~5 ns without jitter vs. with jitter)
5.1 Coping with capturing data (14)
The timing budget of the available DVW
The available DVW needs to cover
• the min. requested DVW (tS + tH),
• all possible sources of skew,
• all possible sources of jitter.
Figure: Interpretation of the timing budget of the available DVW
Note
The white areas before and after the min. DVW represent available timing margins.
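As a sketch, the budget check amounts to a simple sum (Python); the numbers below are illustrative placeholders, not the Micron DDR-266 budget quoted in the next slide:

def timing_margin_ns(available_dvw_ns, t_s_ns, t_h_ns, skews_ns, jitters_ns):
    consumed = t_s_ns + t_h_ns + sum(skews_ns) + sum(jitters_ns)
    return available_dvw_ns - consumed     # positive: margin remains

print(timing_margin_ns(2.5, 0.5, 0.5, [0.3, 0.2], [0.25]))  # -> 0.75 ns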
5.1 Coping with capturing data (15)
Example
Timing budget of a DDR-266 memory
Table: Timing budget of a DDR-266 memory [16]
Remark
The table uses partly different terminology, as follows:
Total skew: Available DVW
Transmitter skew: Setup time
Receiver skew: Hold time
VREF noise: OSS
CIN mismatch: Skew due to different capacitive loading
5.1 Coping with capturing data (16)
Note
The crucial sources and the actual extent of skews and jitter depend on
• the frequency range in question,
• the DRAM type used,
• mainboard and memory module implementation details.
Timing budget tuning is a main task of developing DRAM devices/modules
and mainboards.
5.1 Coping with capturing data (17)
Shrinking the available DVW for higher transfer rates
Higher data rates → shorter clock periods → shorter available DVWs
This is one of the key problems to be handled for achieving higher data rates.
Figure: Shrinking the available DVW (tDV) while raising the data rate from PC-133 to DDR-400 and DDR2-800 [17]
5.1 Coping with capturing data (18)
Addressing the problem of shrinking (available) DVWs in order to raise DRAM speed
Reducing skews and jitters by
• using more advanced signaling techniques, such as
• SSTL (Stub Series Terminated Logic) or
• LVDS (Low Voltage Differential Signaling),
instead of open-ended LVTTL (Low Voltage TTL),
• using more efficient synchronisation schemes than central clocking, such as
source-synchronous synchronisation.
• using DLLs/PLLs to align clock or data strobe edges.
5.2 Using more advanced signaling (1)
Using more advanced signaling techniques
5.2 Using more advanced signaling (2)
Signal types

• Ground referenced: TTL (5 V): PCI; LVTTL (3.3 V): SDRAM, PCI, PCI-X, AGP1.0
• Voltage referenced, single ended: SSTL single ended signals: SSTL_2 (2.5 V) (DDR), SSTL_18 (1.8 V) (DDR2), SSTL_15 (1.5 V) (DDR3); AGP2.0 (1.5 V), AGP3.0 (0.8 V)
• Voltage referenced, differential: HVDS: SCSI-1; LVDS: Hypertransport, SATA, Ultra-2 SCSI and later, PCI-E; SSTL differential signals

(Higher data rates are reached moving from ground referenced toward differential signaling.)

Figure: Overview of signal types
LVTTL: Low Voltage TTL
HVDS: High Voltage Differential Signaling
LVDS: Low Voltage Differential Signaling
SSTL: Stub Series Terminated Logic
VREF: Reference Voltage
VCM: Common Mode Voltage
5.2 Using more advanced signaling (3)
Signal types used in mainstream DRAMs

              TTL          LVTTL               SSTL
                           (Low Voltage TTL)   (Stub Series Terminated Logic)
Signal type   Ground       Ground              Voltage referenced,
              referenced   referenced          single ended/differential
Termination   Open ended   Open ended          Terminated
Voltage       5 V          3.3 V               2.5/1.8/1.5 V
Used in the   Page Mode,   FPM, EDO,           DDR, DDR2, DDR3
DRAM types    FPM, EDO     SDRAM

Figure: Signal types used in mainstream DRAMs
(Earliest DRAMs (1K/4K) omitted)
5.2 Using more advanced signaling (4)
                          SDRAM   DDR SDRAM      DDR2 SDRAM                DDR3 SDRAM
Comm./Control/Addr./
Data (DQ)/Data Mask (DM)  LVTTL   SSTL_2         SSTL_18                   SSTL_15
Clock (CLK/CK)            LVTTL   SSTL_2 Diff.   SSTL_18 Diff.             SSTL_15 Diff.
Data Strobe (DQS)         --      SSTL_2         SSTL_18 / SSTL_18 Diff.   SSTL_15 Diff.

Table: Signal types of the main signal groups in synchronous DRAM devices
5.2 Using more advanced signaling (5)
Characteristic TTL levels (5 V supply): VOH min = 2.4 V, VOL max = 0.4 V, VIH min = 2.0 V, VIL max = 0.8 V.
Figure: Input/output characteristics of TTL signals as used in PM/FPM/EDO devices (based on [6])
5.2 Using more advanced signaling (6)
Characteristic LVTTL levels (3.3 V supply): VOH min = 2.4 V, VOL max = 0.4 V, VIH min = 2.0 V, VIL max = 0.8 V.
Figure: Input/output characteristics of LVTTL signals (based on [6])
5.2 Using more advanced signaling (7)
Stub Series Terminated Logic (SSTL)

Three generations:
• SSTL_2: VDDQ = 2.5 V, JESD8-9 (Sept. 1998), used in DDR SDRAMs
• SSTL_18: VDDQ = 1.8 V, JESD8-15A (Oct. 2002), used in DDR2 SDRAMs
• SSTL_15: VDDQ = 1.5 V, used in DDR3 SDRAMs

SSTL signals
• Single ended: used for Command/Control/Address, Data (DQ), Data Mask (DM), and the Data Strobe (DQS) in DDR/DDR2
• Differential: used for the Clock (CK) and the Data Strobe (DQS) in DDR2/DDR3

Figure: Types of SSTL signals
5.2 Using more advanced signaling (8)
Characteristic single ended SSTL_2 levels: VREF = 1.25 V, VIL max = VREF - 150 mV, VIH min = VREF + 150 mV, VOL max = 0.375 V, VOH min = 2.125 V.
Figure: Input/output characteristics of single ended SSTL signals (based on [6])
The static view
5.2 Using more advanced signaling (9)
Figure: Interpretation of characteristic input levels of single ended SSTL signals [18]
The dynamic view
DC values: define the final logic state.
AC values: define the timing specifications the receiver needs to meet (e.g. slew rate).
State changes
A certain amount of time after the input has crossed the DC threshold and then also the AC threshold (hold time), the device will switch state and will not switch back as long as the input stays beyond the DC threshold [18].
5.2 Using more advanced signaling (10)
Figure: Using AC values for defining the falling and rising slew rates
of single ended SSTL signals [19]
5.2 Using more advanced signaling (11)
               DDR             DDR2            DDR3
VDDQ           2.5 V           1.8 V           1.5 V
VREF           1.25 V          0.9 V           0.75 V
VIH (ac) min.  VREF + 310 mV   VREF + 250 mV   VREF + 175 mV
VIH (dc) min.  VREF + 150 mV   VREF + 125 mV   VREF + 100 mV
VIL (dc) max.  VREF - 150 mV   VREF - 125 mV   VREF - 100 mV
VIL (ac) max.  VREF - 310 mV   VREF - 250 mV   VREF - 175 mV
VSS            Ground          Ground          Ground

Table: Characteristic input levels of single ended SSTL signals in DDR/DDR2/DDR3 devices [20], [21], [22]
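For illustration, deciding the logic state of a single ended SSTL input against the DC thresholds of the table reduces to two comparisons. A Python sketch with the DDR2/SSTL_18 values (not code from any standard):

def sstl18_state(v_in, vref=0.9):
    if v_in >= vref + 0.125:     # VIH (dc) min = VREF + 125 mV
        return "high"
    if v_in <= vref - 0.125:     # VIL (dc) max = VREF - 125 mV
        return "low"
    return "undefined"           # between the DC thresholds

print(sstl18_state(1.10))   # -> high
print(sstl18_state(0.85))   # -> undefined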
5.2 Using more advanced signaling (12)
VTR: True level
VCP: Complementary level
Figure: Interpretation of characteristic input levels of differential SSTL signals (CK/CK#, DQS/DQS#) [19]

        DDR      DDR2     DDR3
VDDQ    2.5 V    1.8 V    1.5 V
VREF    1.25 V   0.9 V    0.75 V
VID     620 mV   500 mV   400 mV
VIX     VREF     VREF     VREF
VSS     Ground   Ground   Ground

Table: Characteristic input levels of differential SSTL signals in DDR/DDR2/DDR3 devices [20], [22], [19]
5.2 Using more advanced signaling (13)
Skew reduction by differential data strobes (DQS, DQS#)
Figure: Skew reduction while using differential strobes instead of single ended strobes [23]
5.2 Using more advanced signaling (14)
The eye diagram
Visualizes both signal traces (belonging to the H and L levels) by overlapping subsequent
symbols in time, as indicated below for both an ideal and real signal.
Figure: Eye diagram of an ideal and a real signal (showing jitter and reflections)
The eye diagram is a favorable way
• to visualize reflections, jitter and
• to contrast expected and available values both for the DVW and voltage levels.
5.2 Using more advanced signaling (15)
Visualizing both min. and available DVWs and voltage margins
by means of an eye diagram
The data eye shows the min. DVW (tS + tH) inside the available DVW, along with the voltage margins above VIH min and below VIL max.
Figure: Eye diagram of an ideal signal showing both min. and available DVW and voltage levels
5.2 Using more advanced signaling (16)
Figure: Eye diagram of a real signal showing both min. and available DVW and voltage levels [24]
5.2 Using more advanced signaling (17)
For correct operation: available DVW and voltage values ≥ required values.
A stable operation needs reasonable temporal margins (timing budget) and voltage margins.
5.3 Using more advanced synchronisation (1)
Improving the basic synchronisation scheme

Basic synchronisation scheme

Central clock synchronisation
A central clock is used to latch (capture) addresses, commands and data from the respective buses, or to send fetched data.
• Leads to high skews due to propagation delays (time of flight), different path lengths, different loading of the traces etc.
• SDRAMs and earlier DRAMs are centrally clocked.

Source synchronisation
An extra data strobe signal (DQS) is provided to accompany data sent from the driving unit to the receiving unit.
• The data strobe signal eliminates propagation delays between data lines.
• DDR SDRAMs are source synchronised.
• The data strobe signal (DQS) is bidirectional to reduce pin count.

Figure: Basic synchronisation schemes

5.3 Using more advanced synchronisation (2)
Central clock synchronization (SDRAMs)
Address, command and data lines are latched by the central clock (CLK).

Source synchronization (DDR SDRAMs)
Command and address lines are latched by the differential clock (CK, CK#), but data are latched by the source synchronous data strobe (DQS).

Figure: Contrasting central clocking (SDRAMs) and source synchronised clocking (DDR SDRAMs) while writing random data [25], [13]
(tDQSS: Write command to first DQS latching transition)
5.3 Using more advanced synchronisation (7)
Required phase alignments for synchronous DRAM devices, controllers and modules
In case of SDRAM devices
• Memory controllers need to perform the following alignments:
  • for all commands: center align address, command and control signals with the clock (CLK),
  • for data writes: center align data signals (DQ) with the clock (CLK),
  • for data reads: SDRAM devices do not perform any alignment on the data sent to the controller; it is the task of the controller to shift the CLK edge to the center of the data eye.
• SDRAM devices do not need to perform any phase alignments; however,
  • for data reads they have to guarantee that the required minimal data hold time (tOH) is satisfied, see Figure.
• SDRAM modules need to perform clock deskewing for the clock (CLK) distribution circuitry.
5.3 Using more advanced synchronisation (8)
In case of DDRx SDRAM devices
• Memory controllers need to perform the following alignments:
  • for all commands: center align address, command and control signals with the clock (CK),
  • for data writes: center align data signals (DQ) with the data strobe (DQS),
  • for data reads: DDRx devices send the data strobe signals (DQS) edge aligned with the data signals (DQ); it is then the task of the controller to shift the DQS edge to the center of the data eye.
Figure: Required phase alignments in case of DDRx devices
5.3 Using more advanced synchronisation (15)
Example: Shifting DQS into the center of DQ
Figure: DDR2 write operation at 800 MT/s showing the 90° shift of the differential DQS into the center of the data eye [27]
5.3 Using more advanced synchronisation (16)
In case of DDRx SDRAM devices (cont.)
• DDRx devices perform the following alignment:
  • for data reads they edge align the data strobe signal (DQS) with the data signal (DQ).
5.3 Using more advanced synchronisation (17)
The rationale of this alignment scheme
is to keep DRAM devices as simple as possible and put complexity into the memory controller [27], by centralizing the DLL circuitry needed to accomplish the alignments in a single place, i.e. in the memory controller, and thus avoiding the need to replicate DLLs in every DRAM device (except the DLLs needed in the DRAMs to edge align DQS with CK for reads) [26].
5.3 Using more advanced synchronisation (18)
Furthermore
• DDRx modules need to perform clock deskewing for the clock (CK) distribution circuitry.
5.3 Using more advanced synchronisation (19)
DLLs (Delay Locked Loops)
are used to
• edge align or deskew two signals, or
• center align the data strobe signal (DQS) with the data signal (DQ).
Figure: Deskewing the CLK signal with reference to the CLKREF signal by means of a DLL
Figure: Shifting the data strobe (DQS) to the center of the data signal (DQ) by means of a DLL
5.3 Using more advanced synchronisation (20)
Simplified block diagram and principle of operation of a DLL
The DLL is built up mainly of a delay line and a phase delay control unit.
The phase delay control unit inserts delay on the clock signal (CLK) until the rising edge of the clock signal (CLK) is in phase with the rising edge of the reference clock signal (CLKREF).
Figure: Block diagram and principle of operation of the DLL, deskewing the clock signal CLK (based on [28])
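The control principle can be sketched behaviourally in Python (an illustrative model of the loop only, not a circuit-level DLL; step size and tolerance are assumed values):

def dll_lock(clk_phase_deg, clkref_phase_deg, step_deg=1.0, tol_deg=0.5):
    delay_deg = 0.0
    while True:
        err = (clkref_phase_deg - (clk_phase_deg + delay_deg)) % 360.0
        if err < tol_deg or err > 360.0 - tol_deg:
            return delay_deg      # locked: this delay deskews CLK to CLKREF
        delay_deg += step_deg     # phase detector asks for more delay

print(dll_lock(0.0, 90.0))        # -> ~90 degrees of inserted delay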
5.3 Using more advanced synchronisation (21)
„Warm up” time of DLLs
In a DRAM device the DLL is activated during initialization (the power up procedure).
After being enabled, however, the DLL needs about 200 clock cycles to lock [13] before any read command can be issued.
Remark [6]
• PLLs and DLLs fulfill similar tasks. However,
• PLLs include a voltage controlled oscillator (VCO) that generates a new clock signal whose phase is adjustable.
• DLLs include a delay line that inserts a voltage controlled phase delay between the input and output signal.
While DLLs just delay the incoming signal to achieve a phase alignment, PLLs actually synthesize a new clock signal whose phase is adjustable.
• Since DLLs do not incorporate a VCO, they are cheaper to implement than PLLs.
Memory controllers and DRAM devices of synchronous DRAMs make use of DLLs to implement phase alignments. In contrast, memory modules use PLLs to deskew clock distribution networks.
6. Main limiters of increasing the memory size
6. Main limiters of increasing the memory size (1)
Memory size (CM)
The memory size is given basically by the amount of memory installed in the memory system:

CM = nCU x nCH x nM x nR x CR

with nCU: No. of north bridges/memory control units
     nCH: No. of memory channels per north bridge/control unit
     nM:  No. of memory modules per channel
     nR:  No. of ranks per memory module
     CR:  Rank capacity (device density x no. of DRAM devices)

E.g. the Core 2 based P35 chipset supports up to two memory channels with up to two dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density per rank.
The resulting maximum memory capacity is:
CMmax = 1 x 2 x 2 x 2 x (8 x 1 Gb) = 8 GB
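The capacity formula is easy to check numerically; the Python sketch below (not from the slides) recomputes the P35 example with the values quoted in the text:

def max_memory_gb(n_cu, n_ch, n_m, n_r, devices_per_rank, device_gbit):
    rank_gb = devices_per_rank * device_gbit / 8   # rank capacity CR in GB
    return n_cu * n_ch * n_m * n_r * rank_gb

# P35: 1 control unit, 2 channels, 2 dual-ranked modules/channel,
# ranks of 8 x8 devices of 1 Gb density
print(max_memory_gb(1, 2, 2, 2, 8, 1))             # -> 8.0 GB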
6. Main limiters of increasing the memory size (2)
Remark
Beyond the max. installable memory, the max. memory size may be limited by particular constraints, such as the max. addressable space supported by the number of address pins on the FSB, as in the 925X and 925XE desktop chipsets [31].

Crucial factors limiting the maximum size of main memories:
• nM: No. of memory modules supported per memory channel
• CR: Rank capacity (device density x no. of DRAM devices/rank)
6. Main limiters of increasing the memory size (3)
Number of memory modules supported per memory channel

• Modules connected via a parallel bus (SDRAM, DDR, DDR2, DDR3 modules): e.g. 1-4 memory modules. Higher transfer rates limit the number of memory modules typically to one or two.
• Modules connected via a serial bus (FB-DIMM modules): e.g. 6-8 memory modules.

Figure: Number of memory modules supported per memory channel
6. Main limiters of increasing the memory size (4)
Figure: Max. number of supported memory modules (slots)/channel in Intel's desktop chipsets (1-4 slots/channel, over transfer rates from 133 to 1600 MT/s)
6. Main limiters of increasing the memory size (5)
Figure: Max. number of supported memory modules (slots)/channel in Intel's server chipsets, at introduction and later (1-4 slots/channel, over transfer rates from 133 to 800 MT/s)
6. Main limiters of increasing the memory size (6)
Notes
1. Servers prefer memory size over memory speed. E.g.
• current desktop chipsets support speed grades of up to DDR3-1333 (even DDR3-1600 with strong size restrictions) and memory sizes of up to 4 GB/channel,
• current server chipsets using parallel connected main memory support speed grades of only up to DDR2-667, but memory sizes of up to 16/24 GB/channel.
2. Servers expect registered memory modules rather than the unbuffered modules desktops use. Registered modules provide buffering for the address and control lines, and by reducing signal loading they increase the number of supported memory slots (memory modules) and thus the supported memory size.
3. At higher transfer rates the next wavefront arrives earlier at the transmission line →
less time remains for settling the reflections of the previous wavefront →
inter-symbol interference (ISI) rises.
Thus, at higher frequencies reflections, as well as skews and jitter, impede signal integrity more and more. This limits the number of supported memory modules/channel.
Recent desktop chipsets typically support 1-2, whereas server chipsets with a parallel communication path typically support 2-3 memory modules (slots)/channel.
6. Main limiters of increasing the memory size (7)
Rank capacity (CR)

CR = nD x D

with nD: Number of DRAM devices/rank
     D:  Device density

Number of DRAM devices/rank
A 64-bit wide rank consists of 8 x8 or 16 x4 devices, and usually occupies one module side.
E.g. a one-sided (single rank) DDR3 memory module is built up of 8 devices.
6. Main limiters of increasing the memory size (8)
Remark
A few Intel server chipsets, such as the E7500 and E7501, supported stacked devices as well.
E.g. the E7500 server chipset supported double-sided dual rank DIMMs with 16 stacked devices (a rank) mounted on each side, yielding a total module size of 2 GB.
Figure: Double sided DDR SDRAM DIMM with 16 stacked devices on each side [30]
6. Main limiters of increasing the memory size (9)
Device density
DRAM densities grew from 16 Kb (around 1980) to 1 Gb (by the mid 2000s), i.e. at a rate of roughly 4x/4 years.
Figure: Evolution of DRAM densities (Mbit) and no. of units shipped/year (based on [29])
6. Main limiters of increasing the memory size (10)
Desktop chipsets considered: 845 (1/02), 875P (4/03), 925X (6/04), 975X (11/05), P35 (6/07), X48 (3/08).
Over this period the supported max. device size grew from 512 Mb to 1 Gb, and the max. memory size/channel from 1 GB to 4 GB, while transfer rates rose from 133 to 1600 MT/s.
Figure: Supported max. device size and max. memory size/channel in Intel's desktop chipsets
6. Main limiters of increasing the memory size (11)
Server chipsets considered: E7501 (12/02), E7520 (8/04), 5100 (1/08).
Over this period the supported max. device size grew from 512 Mb to 2 Gb, and the max. memory size/channel from 8 GB to 16/24 GB (at introduction/later), at transfer rates from 133 to 800 MT/s.
Figure: Supported max. device size and max. memory size/channel in Intel's server chipsets
6. Main limiters of increasing the memory size (12)
Notes
1. As the figures indicate, recent desktops provide up to 4 GB/channel memory size, whereas recent servers (with parallel bus attachment) offer 4-8 times larger sizes.
2. Servers achieve larger memory sizes by
• supporting more memory modules (with registering expected) than desktop chipsets do, and
• using higher density DRAM devices at the same speed grade than desktop chipsets (e.g. 1 Gb devices instead of 512 Mb devices, or 2 Gb devices instead of 1 Gb devices).
3. Recent server chipsets supporting main memories with serial bus attachment (like Intel's 5000 and 7000 DP and MP-family chipsets) support both more channels and more modules/channel, providing much higher main memory sizes of up to 192 GB or more (see Section Main memories with serial bus attachment).
6. Main limiters of increasing the memory size (13)
The rate of increasing DRAM densities
In accordance with Moore's law (saying that the transistor count per chip doubles about every 24 months), DRAM densities evolve at about 4x/4 years.
For the same numbers of control units/modules/ranks, the maximum size of main memories would also increase at about 4x/4 years.
But as the number of modules/channel decreases with higher transfer rates, the maximum size of main memories increases at a rate < 4x/4 years.
7. References (1)
[1]: D. Bhandarkar: „The Dawn of a New Era”, 11. EMEA, May, 2006.
[2]: Moore G. E., No Exponential is Forever... ISSCC 2003,
ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf
[3]: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
[4]: 16 Mb Synchronous DRAM, MT48LC4M4A1/A2, MT48LC2M8A1/A2 Micron,
http://datasheet.digchip.com/297/297-04447-0-MT48LC2M8A1.pdf
[5]: Rhoden D., „The Evolution of DDR”, Via Technology Forum, 2005,
http://www.via.com.tw/en/downloads/presentations/events/vtf2005/vtf05hd_inphi.pdf
[6]: Jacob B., Ng S. W., Wang D. T., Memory Systems, Elsevier, 2008
[7]: Backplane Designer’s Guide, Section 9 - Layout Considerations, Fairchild
Semiconductor, Apr. 2002, http://www.fairchildsemi.com/ms/MS/MS-569.pdf
[8]: PC133 SDRAM Registered DIMM Design Specification, Rev. 1.1, Aug. 1999, IBM &
Reliance Computer Corp., http://www.simmtester.com/PAGE/memory/techdata_
pc133rev1_1.pdf
[9]: Horna O. A., „Pulse Reflection in Transmission Lines,” IEEE Transactions on
Computers, Vol. C-20, No. 12, Dec. 1971, pp. 1558-1563
7. References (2)
[10]: Vo J., „A Comparison of Differential Termination Techniques,” Application Note 903,
Aug. 1993, National Semiconductor, http://www1.control.com/PLCArchive/RS485_3.pdf
[11]: Allan G., „The outlook for DRAMs in consumer electronics”, EETIMES Europe Online,
01/12/2007, http://eetimes.eu/showArticle.jhtml?articleID=196901366&queryText
=calibrated
[12]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384,
Febr. 2003, Xilinx Inc.
[13]: Double Data Rate (DDR) SDRAM MT46V128M4, MT46V64M8, MT46V32M16,
Micron Techn. Inc, 2000,
http://download.micron.com/pdf/datasheets/dram/ddr/512MBDDRx4x8x16.pdf
[14]: Kirstein B., „Practical timing analysis for 100-MHz digital design,”, EDN, Aug. 8, 2002,
www.edn.com
[15]: Ebeling C., Koontz T., Krueger R., „System Clock Management Simplified with Virtex-II
Pro FPGAs”, WP190, Febr. 25 2003, Xilinx, http://www.xilinx.com/support/
documentation/white_papers/wp190.pdf
[16]: DDR Simulation Process Introduction, TN-46-11, July 2005, Micron, http://download.
micron.com/pdf/technotes/DDR/TN4611.pdf
[17]: Allan G., „DDR Integration,” Chip Design Magazine, June/July 2007
7. References (3)
[18]: Stub Series Terminated Logic for 2.5 Volts (SSTL-2), EIA/JEDEC Standard JESD8-9,
Sept. 1998
[19]: Stub Series Terminated Logic for 1.8 Volts (SSTL-18), JEDEC Standard JESD8-15A,
Sept. 2003
[20]: Double Data Rate (DDR) SDRAM Specification, JEDEC Standard JESD79E, May 2005
[21]: DDR2 SDRAM Specification, JEDEC Standard JESD79-2, May 2006
[22]: DDR3 SDRAM Standard, JEDEC Standard JESD79-3, June 2007
[23]: DDR2 (Point-to-Point) Features and Functionality, TN-47-19, Micron,2003,
http://download.micron.com/pdf/technotes/ddr2/TN4719.pdf
[24]: Ahn J.-H., „Memory Design Overview,” March 2007, Hynix,
http://netro.ajou.ac.kr/~jungyol/memory2.pdf
[25]: Micron Synchronous DRAM, 64 Mbit, MT48LC16M4A2, MT48LC16M8A2, MT48LC16M16A2,
Micron Technology, Inc. http://www.micron.com/products/dram/sdram/partlist.aspx
Oct. 2000
[26] General DDR SDRAM Functionality, TN-46-05, Micron Techn. Inc., July 2001,
http://download.micron.com/pdf/technotes/TN4605.pdf
[27]: Haskill, „The Love/Hate relationship with DDR SDRAM Controllers,” Mosaid, Oct. 2006,
http://www.mosaid.com/corporate/products-services/ip/SDRAM_Controller_whitepaper_
Oct_2006.pdf
7. References (4)
[28]: Introduction to Xilinx, Xilinx FPGA Design Workshop, http://www.eas.asu.edu/~kchatha/
cse320_f07/xilinx_intro.ppt
[29]: DRAM Pricing – A White Paper, Tachyon Semiconductors,
http://www.tachyonsemi.com/about/papers/dram_pricing.pdf
[30]: Intel E7500 MCH A2 x4/x8 DDR Memory Limitations, Application Note AP-722,
March 2002, Intel
[31]: Intel 925X/925XE Express Chipset, Datasheet, Rev. 001, Jun. 2004, Intel
[32]: Keeth B., Baker R. J., Johnson B., Lin F., DRAM Circuit Design, Wiley-Interscience, 2008