FB-DIMM

FB-DIMM technology
Dezső Sima
Spring 2008
(Ver. 1.0)
© Sima Dezső, 2008
Motivations to introduce FB-DIMMs in servers/workstations
Shortcomings of the stub-bus topology used with conventional DRAM architectures [2]
Stub-bus topology: the data lines of the memory controller are electrically connected to the data lines of every DRAM device on the bus (memory channel).
Impedance discontinuities affect signal integrity [2].
Memory channels may have 8 DIMMs with 9 DRAM devices/DIMM (i.e. 72 devices/channel).
Heavy signal loading, caused by the large number of devices and by impedance discontinuities on the bus, limits the number of DRAM devices that can be connected to the channel. The higher the data rate, the tighter the limit.
Figure: Scaling the number of channels with memory hubs [7].
Two ranks of DRAM devices per DIMM are assumed. With a single rank per DIMM the number of DIMMs per channel may be doubled, but the declining trend shown in the figure remains the same.
For higher DRAM speeds, fewer DRAM devices can be connected per memory channel [2].
Stub-bus channel capacity (device density x number of devices) has hit its ceiling [2], but increasing server performance doubles memory capacity demand about every two years [2].
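The capacity squeeze can be expressed in numbers with a small Python sketch. It assumes 8 DIMMs per channel with 9 x8 devices each, one of which stores ECC bits. This matches the 72 devices/channel figure. The 1 Gbit device density and the 4 GB starting demand are hypothetical values, and the helper names are illustrative only.

```python
# Sketch: stub-bus channel capacity vs. a demand that doubles every two years.
# Assumed topology: 8 DIMMs/channel, 9 x8 devices/DIMM (8 data + 1 ECC).

DIMMS_PER_CHANNEL = 8
DEVICES_PER_DIMM = 9
ECC_DEVICES_PER_DIMM = 1


def devices_per_channel() -> int:
    """Electrical loads hanging on the shared data lines of one channel."""
    return DIMMS_PER_CHANNEL * DEVICES_PER_DIMM


def channel_capacity_gb(density_gbit: float) -> float:
    """Usable capacity = device density x number of data (non-ECC) devices."""
    data_devices = DIMMS_PER_CHANNEL * (DEVICES_PER_DIMM - ECC_DEVICES_PER_DIMM)
    return data_devices * density_gbit / 8  # Gbit -> GB


def projected_demand_gb(start_gb: float, years: float) -> float:
    """Capacity demand doubling about every two years: D(t) = D0 * 2**(t/2)."""
    return start_gb * 2 ** (years / 2)


print(devices_per_channel())           # → 72 loads on the bus
print(channel_capacity_gb(1.0))        # → 8.0 GB with hypothetical 1 Gbit devices
print(projected_demand_gb(4.0, 4.0))   # → 16.0 GB, hypothetical 4 GB demand after 4 years
```

Under these assumed numbers, the demand outgrows a fixed 8 GB channel within four years. A higher data rate only makes things worse, because it lowers the number of devices the bus can carry.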
(Source: Jacob et al., Memory Systems, 2007)
Increasing the number of memory channels is limited by pin count: each DDR2 memory channel requires 240 pins.
FB-DIMM technology (1)
Principle of operation
• introduce packet-based serial transmission (as in the PCI-E, SATA and SAS buses)
• introduce full buffering (registered DIMMs buffer only addresses)
• CRC error checking (cyclic redundancy check)
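A generic bit-serial CRC in Python shows how CRC checking catches transmission errors. It illustrates only the principle. It uses the well-known CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF), not the CRC codes actually defined for FB-DIMM frames, and the 18-byte frame is hypothetical.

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bit-serial CRC-16/CCITT-FALSE (MSB first, no reflection, no final XOR)."""
    crc = init
    for byte in data:
        crc ^= byte << 8                  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:              # top bit set: shift and XOR in the polynomial
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Sender: compute the CRC of a hypothetical 144-bit (18-byte) frame payload.
frame = bytes(range(18))
sent_crc = crc16_ccitt(frame)

# Link: a single bit flips in transit.
received = bytearray(frame)
received[5] ^= 0x04

# Receiver: recompute the CRC and compare it with the transmitted one.
detected = crc16_ccitt(bytes(received)) != sent_crc
print("error detected:", detected)
```

A CRC whose generator polynomial has more than one term detects every single-bit error. That is why a mismatch between the recomputed and the received check bits is enough to flag a corrupted frame.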
FB-DIMM technology (2)
Figure: FB-DIMM memory architecture [4]
Figure: Maximum supported FB-DIMM configuration [6]
(6 channels/8 DIMMs)
FB-DIMM technology (3)
Implementation details (1)
• Serial transmission between the North Bridge and the DIMMs
(each bit needs a pair of wires)
• Number of serial links:
• 14 read lanes (2 wires each)
• 10 write lanes (2 wires each)
• Clocked at 6 x the double-pumped DRAM data rate,
e.g. for DDR2-667 DRAM the link clock rate is 6 x 667 MHz ≈ 4 GHz
• Every 12 link cycles (that is, every two memory cycles) constitute a packet.
• Read packets (frames, bursts): 168 bits (12 x 14 bits)
• 144 data bits
(the number of bits a 72-bit wide DDR2 module, with 64 data bits + 8 ECC bits, delivers in two memory cycles)
• 24 CRC bits.
• Write packets (frames, bursts): 120 bits (12 x 10 bits)
• 98 payload bits
• 22 CRC bits.
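A short Python sketch can check the frame arithmetic above. It assumes that the 6x factor applies to the DRAM data rate in MT/s (the text's "667 MHz") and that every lane carries one bit per link cycle. The class and function names are illustrative, not taken from the JEDEC specification.

```python
from dataclasses import dataclass

BITS_PER_LANE_PER_FRAME = 12   # a frame spans 12 link cycles
LINK_RATE_FACTOR = 6           # lane bit rate = 6 x DRAM data rate


@dataclass(frozen=True)
class FrameFormat:
    name: str
    lanes: int
    crc_bits: int

    @property
    def total_bits(self) -> int:
        return self.lanes * BITS_PER_LANE_PER_FRAME

    @property
    def payload_bits(self) -> int:
        return self.total_bits - self.crc_bits


READ = FrameFormat("read (northbound)", lanes=14, crc_bits=24)
WRITE = FrameFormat("write (southbound)", lanes=10, crc_bits=22)


def lane_rate_gbps(data_rate_mts: float) -> float:
    """Bit rate of one lane, assuming 6 link cycles per DRAM transfer."""
    return LINK_RATE_FACTOR * data_rate_mts / 1000


def frame_time_ns(data_rate_mts: float) -> float:
    """Duration of one frame; equals the time of two DRAM transfers."""
    return BITS_PER_LANE_PER_FRAME / lane_rate_gbps(data_rate_mts)


for fmt in (READ, WRITE):
    print(f"{fmt.name}: {fmt.total_bits} bits = {fmt.payload_bits} payload + {fmt.crc_bits} CRC")
print(f"DDR2-667: {lane_rate_gbps(667):.2f} Gbit/s per lane, frame {frame_time_ns(667):.2f} ns")
```

The sketch reproduces 168 = 144 + 24 bits for read frames and 120 = 98 + 22 bits for write frames. A frame lasts about 3 ns at DDR2-667, the same as two DRAM transfers.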
FB-DIMM technology (4)
Implementation details (2)
98 payload bits:
• 2 frame type bits,
• 24 bits of command,
• 72 bits for data and commands, depending on the frame type,
e.g. 72 bits of data, 36 bits of data + one command, or two commands.
Commands
• row select, precharge, refresh, read, write etc.
• all commands include a 3-bit FB-DIMM module address to select one of 8 modules.
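The write-frame payload layout can be sketched in Python as plain bit packing. The field widths (2 + 24 + 72 = 98 bits) and the 3-bit module address come from the text. The bit order and the function names are hypothetical choices made for the illustration, not the layout defined in JESD206. In this sketch the frame type sits in the top bits and the module address in the top 3 bits of the command.

```python
FRAME_TYPE_BITS = 2
COMMAND_BITS = 24
SLOT_BITS = 72                     # data and/or commands, depending on frame type
PAYLOAD_BITS = FRAME_TYPE_BITS + COMMAND_BITS + SLOT_BITS   # 98
DIMM_ADDR_BITS = 3                 # selects one of 8 modules on the channel


def _check(value: int, bits: int, name: str) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def make_command(dimm: int, fields: int) -> int:
    """Hypothetical 24-bit command: module address in the 3 top bits."""
    _check(dimm, DIMM_ADDR_BITS, "module address")
    rest = COMMAND_BITS - DIMM_ADDR_BITS
    _check(fields, rest, "command fields")
    return (dimm << rest) | fields


def pack_payload(frame_type: int, command: int, slot: int) -> int:
    """Concatenate frame type | command | 72-bit slot into one 98-bit value."""
    _check(frame_type, FRAME_TYPE_BITS, "frame type")
    _check(command, COMMAND_BITS, "command")
    _check(slot, SLOT_BITS, "slot")
    return (frame_type << (COMMAND_BITS + SLOT_BITS)) | (command << SLOT_BITS) | slot


def unpack_payload(payload: int) -> tuple:
    """Inverse of pack_payload: returns (frame_type, command, slot)."""
    slot = payload & ((1 << SLOT_BITS) - 1)
    command = (payload >> SLOT_BITS) & ((1 << COMMAND_BITS) - 1)
    frame_type = payload >> (COMMAND_BITS + SLOT_BITS)
    return frame_type, command, slot


cmd = make_command(dimm=5, fields=0x1234)      # hypothetical command for module 5
payload = pack_payload(1, cmd, 0xABCDEF)       # hypothetical frame type 1
print(unpack_payload(payload) == (1, cmd, 0xABCDEF), cmd >> (COMMAND_BITS - DIMM_ADDR_BITS))
```

Each AMB would compare the 3-bit address field with its own position in the chain and act only on commands addressed to it.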
FB-DIMM technology (5)
Implementation details (3)
Read bandwidth:
One FB-DIMM channel transfers in one frame (that is, in 12 cycles):
128 data bits + 16 ECC bits
One frame lasts 2 memory cycles
One DDR2 DIMM channel transfers in 2 memory cycles:
2 x 72 bits (2 x 64-bit data + 2 x 8-bit ECC)
The read bandwidth of an FB-DIMM channel equals the bandwidth of a DDR2 channel.
Write bandwidth:
The write bandwidth of an FB-DIMM channel is up to 0.5 x the read bandwidth.
However, FB-DIMMs allow simultaneous read and write operations.
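A few lines of Python reproduce the bandwidth comparison. The sketch counts raw bits including ECC and assumes one northbound frame per two DRAM transfers. For writes it takes the best case, in which the whole 72-bit slot carries data. The peak combined figure simply adds the read and write rates and ignores command and CRC overhead.

```python
READ_FRAME_BITS = 144          # 128 data + 16 ECC per northbound frame
WRITE_FRAME_BITS_MAX = 72      # best case: whole 72-bit slot carries data
DDR2_BUS_BITS = 72             # 64 data + 8 ECC per DRAM transfer
TRANSFERS_PER_FRAME = 2        # one frame lasts two DRAM transfers


def ddr2_gbyte_s(rate_mts: float) -> float:
    return rate_mts * 1e6 * DDR2_BUS_BITS / 8 / 1e9


def fbdimm_read_gbyte_s(rate_mts: float) -> float:
    frames_per_s = rate_mts * 1e6 / TRANSFERS_PER_FRAME
    return frames_per_s * READ_FRAME_BITS / 8 / 1e9


def fbdimm_write_gbyte_s(rate_mts: float) -> float:
    frames_per_s = rate_mts * 1e6 / TRANSFERS_PER_FRAME
    return frames_per_s * WRITE_FRAME_BITS_MAX / 8 / 1e9


rate = 667  # DDR2-667
read, write = fbdimm_read_gbyte_s(rate), fbdimm_write_gbyte_s(rate)
print(f"DDR2 channel: {ddr2_gbyte_s(rate):.2f} GB/s")
print(f"FB-DIMM read: {read:.2f} GB/s, write: {write:.2f} GB/s")
print(f"write/read = {write / read:.2f}, peak read+write = {read + write:.2f} GB/s")
```

Under these assumptions, overlapping reads and writes give a peak combined raw rate of about 1.5 times a DDR2 channel. This is an upper bound, not a sustained figure.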
FB-DIMM technology (6)
FB-DIMM data buffer
(Advanced Memory Buffer, AMB)
Manages the read/write operations
of the module
Source: PCStats
FB-DIMM-4300 (DDR2-533 SDRAM); Clock Speed: 133MHz, Data Rate: 532MHz, Throughput 4300MB/s
PC2-5300 (DDR2-667 SDRAM); Clock Speed: 167MHz, Data Rate: 667MHz, Throughput 5300MB/s
PC2-6400 (DDR2-800 SDRAM); Clock Speed: 200MHz, Data Rate: 800MHz, Throughput 6400MB/s
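The three speed grades follow a simple pattern, sketched below in Python. Throughput is the data rate times the 8-byte (64-bit) data width, rounded to the nearest 100 MB/s. The listed clock speed is the data rate divided by 4. This assumes it denotes the DRAM core clock of a DDR2 part with a 4-bit prefetch.

```python
BUS_BYTES = 8      # 64 data bits per transfer (ECC bits not counted)
DDR2_PREFETCH = 4  # DDR2 moves 4 words per core clock cycle


def throughput_mb_s(data_rate_mhz: float) -> int:
    """Peak module throughput, rounded to the nearest 100 MB/s."""
    return int(round(data_rate_mhz * BUS_BYTES, -2))


def core_clock_mhz(data_rate_mhz: float) -> int:
    """Assumed DRAM core clock: data rate divided by the prefetch depth."""
    return round(data_rate_mhz / DDR2_PREFETCH)


for rate in (532, 667, 800):
    print(rate, core_clock_mhz(rate), throughput_mb_s(rate))
```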
Figure: Different implementations of FB-DIMMs
Figure: Block diagram of the AMB [3]
(There are two command/address (C/A) buses to limit the load of 9 to 36 DRAM devices.)
FB-DIMM technology (7)
Necessary routing to connect the north bridge to the DIMM socket
a) In case of a DDR2 DIMM (240 pins)
A 3-layer PCB is needed
b) In case of an FB-DIMM (69 pins)
A 2-layer PCB is needed (but a third layer is used for power lines)
Figure: PCB routing [4]
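The pin counts in the figure translate directly into how many channels a memory controller can afford. The Python sketch below uses a hypothetical fixed budget of 480 controller pins for memory channels. It takes the per-channel counts to be exactly the 240 and 69 pins shown above.

```python
DDR2_PINS_PER_CHANNEL = 240
FBDIMM_PINS_PER_CHANNEL = 69


def channels_for_budget(pin_budget: int, pins_per_channel: int) -> int:
    """Whole channels that fit into a given number of controller pins."""
    return pin_budget // pins_per_channel


budget = 480  # hypothetical pin budget
print("DDR2:", channels_for_budget(budget, DDR2_PINS_PER_CHANNEL))
print("FB-DIMM:", channels_for_budget(budget, FBDIMM_PINS_PER_CHANNEL))
print("pins for 6 channels:", 6 * DDR2_PINS_PER_CHANNEL, "vs", 6 * FBDIMM_PINS_PER_CHANNEL)
```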
FB-DIMM technology (8)
Figure: Latency and bandwidth figures of different DRAM technologies for a mix of SPEC applications [5]
FB-DIMM technology (9)
Pros and cons of FB-DIMMs
Advantages of FB-DIMMs vs. DDR2 and DDR3 DIMMs
• more memory channels (up to 6), hence higher total bandwidth
• more DIMM modules (up to 8) per channel, hence higher memory capacity (up to 192 GB)
• fewer wires, hence simplified PCB routing
• simultaneous read/write operation in a channel
Disadvantages of FB-DIMMs vs. DDR2 and DDR3 DIMMs
• higher latency and lower bandwidth figures for 4 to 8 DIMM modules
• higher cost
• higher dissipation
(typical dissipation figures: DDR2 DIMM about 5 W, AMB about 5 W, DDR2 FB-DIMM about 10 W)
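The capacity and power figures in the list are related by simple arithmetic. The Python sketch assumes 4 GB FB-DIMMs, the module size implied by 192 GB across 6 channels x 8 DIMMs. It also uses the typical per-module dissipation figures quoted above. The total for a fully populated system is a rough estimate, not a measured value.

```python
MAX_CHANNELS = 6
MAX_DIMMS_PER_CHANNEL = 8
DRAM_WATTS = 5.0   # typical DDR2 DRAM devices of one module (from the text)
AMB_WATTS = 5.0    # typical Advanced Memory Buffer (from the text)


def max_capacity_gb(dimm_gb: int) -> int:
    return MAX_CHANNELS * MAX_DIMMS_PER_CHANNEL * dimm_gb


def fbdimm_watts() -> float:
    return DRAM_WATTS + AMB_WATTS


def fully_populated_watts() -> float:
    return MAX_CHANNELS * MAX_DIMMS_PER_CHANNEL * fbdimm_watts()


print(max_capacity_gb(4), fbdimm_watts(), fully_populated_watts())
```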
Latency
Latency is the potentially more troubling issue. Intel addressed it by not storing the signals and then retransmitting them: the data travels along a special fast pass-through path in the buffer itself. This avoids much of the latency that a store-and-forward architecture would introduce.
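The benefit of pass-through grows with the length of the daisy chain. The Python sketch below is a deliberately simplified model, not Intel's design. In the model, a read to the k-th DIMM crosses k AMBs southbound and k AMBs northbound, and each hop adds a fixed pass-through delay. A store-and-forward buffer would additionally wait for a whole frame at every hop. All delay values are hypothetical.

```python
def read_latency_ns(k: int, dram_ns: float, hop_ns: float,
                    frame_ns: float, store_and_forward: bool) -> float:
    """Latency of a read served by the k-th DIMM (1 = nearest to the controller)."""
    per_hop = hop_ns + (frame_ns if store_and_forward else 0.0)
    return dram_ns + 2 * k * per_hop   # k hops down, k hops back


# Hypothetical values: 15 ns DRAM access, 2 ns per AMB hop, 3 ns frame time.
for k in (1, 4, 8):
    fast = read_latency_ns(k, 15.0, 2.0, 3.0, store_and_forward=False)
    slow = read_latency_ns(k, 15.0, 2.0, 3.0, store_and_forward=True)
    print(f"DIMM {k}: pass-through {fast:.0f} ns, store-and-forward {slow:.0f} ns")
```

Even with pass-through, latency still grows with the DIMM's position in the chain. This is consistent with the higher latency listed above for 4 to 8 DIMM modules.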
Figure: FB-DIMM heat sinks (heat spreaders)
FB-DIMM technology (10)
Market penetration of the FB-DIMM technology
• 5/2006 Intel adopts it in its Bensley platform (5000) for DP servers
• 9/2006 AMD removes it from its roadmap
• 8/2007 Sun introduces it in the Niagara II
• 9/2007 Intel uses it in the Caneland platform (7000) for MP servers
• 2007 Major memory manufacturers intend to develop DDR3 DIMMs instead of DDR3-based FB-DIMMs
Standardisation
3/2007 JESD205 DDR2 SDRAM Fully Buffered DIMM (FBDIMM) Design Specification
DDR2-533, DDR2-667, DDR2-800 x72 ECC, 240 pin
256 Mb, 512 Mb, 1 Gb, 2 Gb, 4 Gb devices
1/2007 JESD206 FBDIMM Architecture and Protocol
FB-DIMM technology (11)
DDR2 vs. DDR (SDRAM)
The key difference between DDR and DDR2 is that the DDR2 data bus is clocked
at twice the speed of the memory cells, so four data words can be transferred in
each memory cell cycle without speeding up the memory cells themselves.
Figure: Clocking schemes of the SDR, DDR and DDR2 SDRAM technologies [1]
DDR2's bus frequency is boosted by electrical interface improvements, on-die
termination, prefetch buffers and off-chip drivers. However, latency is greatly increased
as a trade-off. The DDR2 prefetch buffer is 4 bits deep, whereas it is 2 bits deep for DDR
(and 8 bits deep for DDR3). While DDR SDRAM has typical read latencies of between 2
and 3 bus cycles, early DDR2 may have read latencies between 4 and 6 cycles.
Although DDR2 was introduced in Q2 2003 at 200/266 MHz, it was initially not competitive because of its high latency. As lower-latency parts became available by the end of 2004, DDR2 became widespread.
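The relation between cell (core) clock, prefetch depth and data rate described above fits in a few lines of Python. The sketch assumes the prefetch depths quoted in the text: 2 for DDR, 4 for DDR2 and 8 for DDR3. It also assumes the I/O bus transfers data on both clock edges. The 133 MHz core clock in the usage loop is illustrative.

```python
PREFETCH_BITS = {"DDR": 2, "DDR2": 4, "DDR3": 8}


def data_rate_mts(core_clock_mhz: float, generation: str) -> float:
    """Transfers per second on the bus = core clock x prefetch depth."""
    return core_clock_mhz * PREFETCH_BITS[generation]


def io_bus_clock_mhz(core_clock_mhz: float, generation: str) -> float:
    """Double data rate: two transfers per bus clock cycle."""
    return data_rate_mts(core_clock_mhz, generation) / 2


for gen in ("DDR", "DDR2", "DDR3"):
    print(gen, data_rate_mts(133, gen), io_bus_clock_mhz(133, gen))
```

For DDR2, the bus clock comes out at twice the core clock. This is the "data bus clocked at twice the speed of the memory cells" described above.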
Memory           Timings   Latency   Bandwidth in dual-channel mode
DDR400 SDRAM     2.5–3–3   12.5 ns   6.4 GB/s
DDR400 SDRAM     2–3–2     10 ns     6.4 GB/s
DDR533 SDRAM     3–4–4     11.2 ns   8.5 GB/s
DDR533 SDRAM     2.5–3–3   9.4 ns    8.5 GB/s
DDR2-533 SDRAM   5–5–5     18.8 ns   8.5 GB/s
DDR2-533 SDRAM   4–4–4     15 ns     8.5 GB/s
DDR2-533 SDRAM   3–3–3     11.2 ns   8.5 GB/s
DDR2-600 SDRAM   5–5–5     16.6 ns   9.6 GB/s
DDR2-600 SDRAM   4–4–4     13.3 ns   9.6 GB/s
Table: Burst timing, latency and bandwidth figures of DDR and DDR2 DRAM technologies [1]
CAS latency (CL, Column Address Strobe latency):
the delay (in clock cycles) between the moment a memory chip is accessed for data and the moment the first data bit becomes available.
For instance, after accessing a 400 MHz CL3 device, the first bit arrives after 3 x 2.5 ns = 7.5 ns.
Early DDR2-533 SDRAM modules available when the i925 and i915 chipsets were announced (6/2004) had 4-4-4 timings (CAS Latency - RAS to CAS Delay - RAS Precharge Time).
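The latency and bandwidth columns of the table follow from the timings. The CAS latency in ns is CL times the bus clock period. Dual-channel bandwidth is the data rate times 8 bytes times 2 channels. The Python sketch below assumes the bus clock is half the data rate, e.g. 200 MHz for DDR400. Small differences from the table come from rounding, such as 16.7 vs. 16.6 ns for DDR2-600 CL5.

```python
def cas_latency_ns(cl: float, bus_clock_mhz: float) -> float:
    """First data bit arrives CL bus clock periods after the read command."""
    return cl * 1000.0 / bus_clock_mhz


def dual_channel_gb_s(data_rate_mts: float, bus_bytes: int = 8) -> float:
    """Two 64-bit channels transferring in parallel."""
    return 2 * data_rate_mts * bus_bytes / 1000.0


ROWS = [  # (memory, data rate in MT/s, CAS latency)
    ("DDR400", 400, 2.5), ("DDR400", 400, 2),
    ("DDR533", 533, 3), ("DDR533", 533, 2.5),
    ("DDR2-533", 533, 5), ("DDR2-533", 533, 4), ("DDR2-533", 533, 3),
    ("DDR2-600", 600, 5), ("DDR2-600", 600, 4),
]

for name, rate, cl in ROWS:
    print(f"{name:9s} CL{cl:<4} {cas_latency_ns(cl, rate / 2):5.1f} ns  "
          f"{dual_channel_gb_s(rate):.1f} GB/s")
```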
FB-DIMM technology (12)
DDR2's power savings are achieved primarily through a drop in operating voltage (1.8 V compared to DDR's 2.5 V).
DDR2 DIMMs have 240 pins instead of the 184 pins used by DDR DIMMs.
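The first-order CMOS dynamic power model P ≈ C·V²·f gives a rough estimate of the effect of the lower supply voltage. The Python sketch below assumes unchanged capacitance and frequency, so it captures only the voltage term. Actual module power also depends on clock rates, termination and standby currents.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Relative dynamic power at equal capacitance and frequency: (V_new/V_old)^2."""
    return (v_new / v_old) ** 2


ddr2_vs_ddr = dynamic_power_ratio(1.8, 2.5)    # DDR2 1.8 V vs. DDR 2.5 V
ddr3_vs_ddr2 = dynamic_power_ratio(1.5, 1.8)   # DDR3 1.5 V vs. DDR2 1.8 V
print(f"DDR2/DDR: {ddr2_vs_ddr:.2f}, DDR3/DDR2: {ddr3_vs_ddr2:.2f}")
```

Under this model, DDR2 needs roughly half the dynamic power of DDR at the same frequency.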
DDR3
Official JEDEC specifications
                  DDR2               DDR3
Rated speed       400-800 Mbps       800-1600 Mbps
Vdd/Vddq          1.8 V +/- 0.1 V    1.5 V +/- 0.075 V
Internal banks    4                  8
Termination       Limited            All DQ signals
Topology          Conventional T     Fly-by
Driver control    OCD calibration    Self-calibration with ZQ
Thermal sensor    No                 Yes (optional)
Source: Anandtech
DDR3 appeared in mid-2007, e.g. with Intel's P35 (Bearlake) chipset.
Source: Wikipedia
5.2. Speed gap between processor and memory (1a)
Figure 5.1a: DRAM types (1: DRAM, 2: Fast Page Mode DRAM (FPM), 3: Extended Data Out DRAM (EDO), 4: Burst mode EDO (BEDO), 5: Synchronous DRAM (SDRAM), 6: Direct Rambus DRAM (DRDRAM)), compared by access mode, cycle time within a burst, full burst timing, bandwidth and chipset examples
5.2. Speed gap between processor and memory (1b)
Figure 5.1b: Latency of DRAM chips (t_RAC, the row access time from row address until data valid, in ns, vs. year 1981–2000, with typical DRAM part densities and the corresponding x86 processors and PC chipsets)
5.2. Speed gap between processor and memory (1c)
Figure 5.1c: System-level memory latency in x86-based PCs (memory latency in ns and in processor cycles vs. year 1981–2000, from the 8088 to the Pentium 4)
5.2. Speed gap between processor and memory (1d)
Figure 5.1d: Latency of DRAM chips (in clock cycles) vs. processor clock frequency f_c (GHz), from the 386 to the Pentium 4, for FPM, EDO, PC66/100/133, RDRAM, DDR 266/333/400 and DDR2-533 memories
5.2. Speed gap between processor and memory (2)
Figure 5.2: Relative transfer rate of memories (D: dual channel), T_memory/f_c vs. processor clock frequency f_c (GHz), from the Pentium to the Pentium 4, for FPM, EDO, PC-66/100/133, PC-800D, DDR 266/333/333D/400/400D and DDR 533D memories
References
[1]: Gavrichenkov I., "DDR2 vs. DDR: Revenge Gained," Xbit Laboratories, 12/17/2004,
http://www.xbitlabs.com/articles/memory/display/ddr2-ddr.html
[2]: Vogt P., "Fully Buffered DIMM (FB-DIMM) Server Memory Architecture," Intel Developer Forum, Feb. 18, 2004,
http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf
[3]: McTague M. & David H., "Fully Buffered DIMM (FB-DIMM) Design Considerations," Intel Developer Forum, Feb. 18, 2004,
http://www.idt.com/content/OSA-S009.pdf
[4]: Haas J. & Vogt P., "Fully Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology@Intel Magazine, March 2005, pp. 1-7
[5]: Ganesh B., Jaleel A., Wang D., Jacob B., "Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling," Proc. HPCA 2007
[6]: "Introducing FB-DIMM Memory: Birth of Serial RAM?," PCStats, Dec. 23, 2005,
http://www.pcstats.com/articleview.cfm?articleid=1812&page=1
[7]: Haas J. & Vogt P., "Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology@Intel Magazine,
http://www.intel.com/technology/magazine/computing/fully-buffered-dimm-0305.htm