MEMORY MODULES

1. Macroarchitecture and performance parameters of MMs
Dezső Sima
September 2008
(Ver. 1.0)
© Sima Dezső, 2008
Overview
• 1. Introduction
• 2. Macroarchitecture of main memories
• 3. Key performance parameters of main memories
• 4. References
1. Introduction (1)
Scope
General purpose main memories,
i.e. main memories used in desktops, servers and laptops
1. Introduction (2)
Figure: Main memories on motherboards (desktop [32], server [77])
1. Introduction (3)
Figure: Different kinds of memory modules
1. Introduction (4)
Layout of main memories
• Macroarchitecture of the main memory
• Layout of the memory modules
Figure: Main dimensions of the layout of main memories
2. Macroarchitecture of main memories
• 2.1 Introduction
• 2.2 Attachment policy
• 2.3 Point of attachment
• 2.4 Number of memory controllers
• 2.5 Number of memory channels
• 2.6 Attributes of memory channels
2.1. Introduction (1)
Macroarchitecture of main memories
Example 1
Figure: Single channel main memory attached via the FSB and the north bridge
2.1. Introduction (2)
Example 2
Figure: Dual channel main memory attached via the FSB and the north bridge
2.1. Introduction (3)
Example 3
Figure: Single channel main memory attached via a dedicated memory controller
2.1. Introduction (4)
Example 4
Figure: Dual channel main memory attached via a dedicated memory controller
2.1. Introduction (5)
Macroarchitecture of main memories
• Attachment policy
• Point of attachment
• No. of mem. controllers (in case of direct attachment)
• No. of mem. channels
• Attributes of mem. channels
Figure: Main dimensions of the macroarchitecture of main memories
2.2. Attachment policy (1)
Attachment policy
• Indirect attachment: attachment via the FSB and the north bridge (mem. control hub)
  • Longer access times (~20-30%)
  • Independence of memory technology and speed
  E.g. PA-8800 (2004), PA-8900 (2005), Core Duo line (2006), Montecito (2006)
• Direct attachment: attachment via mem. controller(s)
  • Shorter access times (~20-30%)
  • Dependence on memory technology and speed
  E.g. POWER4 (2001), POWER5 (2005), POWER6 (2007), Cell BE (2006), UltraSPARC IV (2004), UltraSPARC IV+ (2005), UltraSPARC T1 (2005), Athlon 64 X2 line (2005), Opteron line (2003), Barcelona (2007)
Figure: Attachment policy
2.2. Attachment policy (2)
Figure: Indirect attachment of the main memory to the system architecture (e.g. Core Duo (2006), Core 2 Duo (2006))
Figure: Direct attachment of the main memory to the system architecture (e.g. Athlon 64 X2 (2005))
2.3. Point of attachment (1)
The point of attachment
• The highest cache level (via an IN)
  2-level caches: the IN connecting the L2 cache
  3-level caches: the IN connecting the L3 cache
  The M. c. is usually connected in this way if the highest cache level is inclusive.
• Between the two highest cache levels (via the IN connecting these levels)
  2-level caches: the IN connecting the L1 and L2 caches
  3-level caches: the IN connecting the L2 and L3 caches
  The M. c. is usually connected in this way if the highest cache level is exclusive.
Figure: Possible points of attachment of main memory to the system architecture
2.3. Point of attachment (2)
Interrelationship between the inclusion policy of L3 caches and the point of attachment
• Inclusive L3: the M.c. is attached to the L3.
  E.g. Montecito (2006), POWER4 (2001)
• Exclusive L3: the M.c. is attached to the IN connecting the L2 and the L3.
  • Lines replaced in the L2 are written into the L3.
  • Lines missing in the L2 are reloaded from the L3 and deleted from the L3.
  • Data missing in both L2 and L3 is fetched from memory (high traffic).
  • Replaced, modified data is written back from the L3 to memory (low traffic).
  E.g. UltraSPARC IV+ (2004), POWER5 (2004)
2.3. Point of attachment (3)
Figure: Examples for attaching memory via the highest cache level (two-level cache hierarchy: Core 2 Duo (2006), Athlon 64 X2 (2005); three-level cache hierarchy: Montecito (2006))
2.3. Point of attachment (4)
Figure: Examples for attaching memory via the interconnection network connecting the two highest cache levels (two-level cache hierarchy: UltraSPARC T1 (2005); three-level cache hierarchy: UltraSPARC IV+ (2005) with exclusive L3)
2.4. Number of memory controllers (1)
Number of memory controllers
(in case of direct attachment)
• Single memory controller
  Typ. use: usual implementations
  E.g. POWER5 (2004), K8-based processors (2006)
• Dual memory controllers
  Typ. use: a few recent designs
  E.g. POWER6 (2007), Barcelona (2007)
• Quad memory controllers
  Typ. use: exceptional designs
  E.g. UltraSPARC T1 (2005), UltraSPARC T2 (2007)
Figure: Number of memory controllers (in case of direct attachment)
2.4. Number of memory controllers (2)
Figure: Block diagrams of the POWER5 and POWER6 processors [57]
2.4. Number of memory controllers (3)
Figure: Block diagrams of AMD’s K8 and Barcelona processors [58]
2.4. Number of memory controllers (4)
Figure: Block diagram of the UltraSPARC T2 (Niagara-2) [59]
2.5. Number of memory channels (1)
Number of memory channels
(per north bridge/memory controller)
• Single memory channel
  Typ. use: early desktops
  E.g. Intel’s 845/848 chipset families for P4 desktops and earlier desktop chipsets
• Dual memory channels
  Typ. use: recent desktops, single core DP/MP servers
  E.g. Intel’s 865 and higher chipset families for P4 desktops, Intel’s P4-based DP server chipsets, Cell BE
• Quad memory channels
  Typ. use: recent DC and QC DP/MP servers with FB-DIMM memory
  E.g. Intel’s 5000 (Bensley) and 7300 (Caneland) platforms for DC and QC Xeon processors
Figure: Number of memory channels supported per north bridge/memory controller
2.5. Number of memory channels (2)
Example 1
Figure: Block diagram of an early P4 desktop having a single memory channel
(Intel 845 chipset) [49]
2.5. Number of memory channels (3)
Example 2
Figure: Block diagram of a more advanced P4 desktop including dual memory channels
(Intel’s 975 chipset) [50]
2.5. Number of memory channels (4)
Example 3
Figure: Block diagram of an early P4-based DP server including dual memory channels
(Supermicro’s E7520 chipset based X6DH8-G2/X6DHE-G2 motherboard) [51]
2.5. Number of memory channels (5)
Memory Interface Controller (MIC)
• Dual XDR™ memory channels
• Interleaved addressing in the channels
• The MIC can be configured to support only a single channel
• ECC support (32 + 4 bits)
Dual 36-bit wide XDR channels (32 data bits + 4 ECC bits each)
Memory bandwidth at 3.2 GT/s transfer rate:
3.2 GT/s x 2 channels x 4 B = 25.6 GB/s
Figure: Basic blocks of the Cell BE processor [60]
2.5. Number of memory channels (6)
Remark
In dual channel configurations (or, in general, with multiple memory channels)
a scheme is needed to define how memory addresses are allocated to the individual channels.
Allocation of addresses to the individual channels
• Interleaved mode
  • Addresses are allocated alternately to the channels at 64 B boundaries,
    assuming 64 B long cache lines.
    Thus two consecutive cache lines can be retrieved simultaneously.
  • Both memory channels must be populated with modules of the same size (e.g. 1 GB).
  • Provides maximum performance in real applications.
• Asymmetric mode
  • Addresses start in the first channel and are allocated to this channel
    up to its highest rank. Then addresses continue in the second channel.
  • There is no need to populate both channels, or to populate them with modules of the same size.
  • In real applications, performance is limited to single channel performance.
Figure: Address allocation alternatives to the individual channels
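The two allocation schemes can be sketched as address-to-channel mapping functions. This is a minimal illustration only, assuming 64-byte cache lines (as in the text) and two channels; the function names `interleaved_channel` and `asymmetric_channel` are hypothetical, and real memory controllers use more elaborate mappings (e.g. also decoding rank and bank bits).

```python
from typing import Sequence

LINE_SIZE = 64  # bytes per cache line, as assumed in the text


def interleaved_channel(addr: int, n_channels: int = 2) -> int:
    """Interleaved mode: consecutive 64 B cache lines alternate between the channels."""
    return (addr // LINE_SIZE) % n_channels


def asymmetric_channel(addr: int, channel_sizes: Sequence[int]) -> int:
    """Asymmetric mode: addresses fill channel 0 completely, then continue in channel 1."""
    base = 0
    for channel, size in enumerate(channel_sizes):
        if addr < base + size:
            return channel
        base += size
    raise ValueError("address lies beyond the installed memory")


if __name__ == "__main__":
    GiB = 1 << 30
    lines = range(0, 4 * LINE_SIZE, LINE_SIZE)  # four consecutive cache lines
    print([interleaved_channel(a) for a in lines])                  # [0, 1, 0, 1]
    print([asymmetric_channel(a, [GiB, GiB // 2]) for a in lines])  # [0, 0, 0, 0]
```

With interleaving, consecutive lines map to different channels and can be fetched in parallel; in asymmetric mode a sequential stream stays within one channel, which is why performance falls back to that of a single channel.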
2.5. Number of memory channels (7)
Example 4
Chipset: 5000 (Blackford)
Xeon processors:
• 5000 (Dempsey, NetBurst), DC
• 5100 (Woodcrest, Core 2), DC
• 5300 (Clovertown, Core 2), QC
Memory: FB-DIMM, up to 64 GB
In workstations the snoop filter eliminates snoop traffic to the graphics port.
Figure: Block diagram of Intel’s 5000 (Bensley) DP platform for DC/QC Core 2 Duo processors
including quad memory channels [52]
2.5. Number of memory channels (8)
Example 5
Xeon processors:
• 7200 (Tigerton DC, Core 2), DC
• 7300 (Tigerton QC, Core 2), QC
Memory: FB-DIMM, up to 512 GB
Figure: Block diagram of Intel’s 7300 (Caneland) MP platform for DC/QC Core 2 Duo processors
including quad memory channels [53]
2.5. Number of memory channels (9)
Remark
The FB-DIMM technology supports as many as 6 memory channels with 8 DIMMs each [54]; nevertheless,
actual implementations typically support only four DIMMs.
Figure: Maximum supported FB-DIMM configuration [54]
(6 channels/8 DIMMs)
2.6. Attributes of memory channels (1)
Attributes of memory channels
• Supported type of mem. modules
• Supported no. of mem. modules
• Supported no. of ranks per mem. module
• Supported attributes of DRAM devices
Figure: Attributes of memory channels
2.6. Attributes of memory channels (2)
Supported type of memory modules
• Memory modules of the same DRAM type
  Usual implementation
• Memory modules of different DRAM types (e.g. DRAM type A: DDR, DRAM type B: DDR2)
  In order to provide a choice and an evolution path in times of memory technology
  transitions (e.g. while DDR2 technology replaces DDR technology)
Figure: Type of memory modules supported on the memory channel(s)
2.6. Attributes of memory channels (3)
Note: Motherboards that allow choosing between two different DRAM types are
termed combo boards.
Example
Intel’s 915P/G chipsets support dual memory channels with either DDR or DDR2 technology.
A single memory module is supported per channel (with one or two memory ranks).
Accordingly, a mainboard based on the 915G chipset, such as MSI’s 915G Combo mainboard,
is designated a combo mainboard.
2.6. Attributes of memory channels (4)
4 DIMM slots; north bridge of the 915G chipset
Figure: MSI’s 915G Combo motherboard (based on Intel’s 915G chipset) [61]
2.6. Attributes of memory channels (5)
DDR2 and DDR DIMM slots: two DDR or DDR2 channels with a single DIMM slot on each channel
Figure: DIMM slots of MSI’s 915G Combo motherboard [61]
2.6. Attributes of memory channels (6)
Supported number of memory modules
It depends on the
• DRAM connection technology
• DRAM speed
• Number of ranks mounted onto the memory module(s).
2.6. Attributes of memory channels (7)
Dependency on the memory connection technology
The maximum number of supported memory modules depends heavily on the
memory connection technology, that is, whether the modules are connected
• via a parallel bus (as with SDRAM, DDR, DDR2 and DDR3 modules) or
• via a serial bus (as with FBDIMM modules).
Number of memory modules supported per memory channel
• Modules connected via a parallel bus: 1-4 memory modules
  E.g. SDRAM, DDR, DDR2, DDR3 modules
• Modules connected via a serial bus: 6-8 memory modules
  E.g. FBDIMM modules
Figure: Number of memory modules vs memory connection technology in synchronous DRAMs
2.6. Attributes of memory channels (8)
Remarks
1. Early chipsets supporting low speed, 1 or 4 Byte wide asynchronous DRAM modules
often allowed 4 – 8 memory modules to be attached.
2. The Pentium processor provided a 64-bit wide datapath, so early (430 family) chipsets
typically supported two pairs of 32-bit wide FPM/EDO modules.
2.6. Attributes of memory channels (9)
Dependency on the memory speed
At higher transfer rates
• skews,
• jitter and
• reflections (caused by impedance mismatch when terminating transmission lines)
increasingly impair signal integrity.
The more memory modules are present on a channel,
the more serious the signal integrity problems become.
Consequently, higher transfer rates limit the number of memory modules that can be
supported on a memory channel.
2.6. Attributes of memory channels (10)
Figure: Scaling down the number of supported DIMMs per channel with increasing data rates
(assuming two ranks per DIMM) [62]
2.6. Attributes of memory channels (11)
Figure: Scaling down the number of PCI-X slots with increasing PCI-X bus speed [55]
2.6. Attributes of memory channels (12)
Levelling off channel capacity for synchronous DRAMs
With
• increasing device densities,
• but a decreasing number of modules supported by memory channels at higher transfer rates,
the maximum memory capacity per memory channel remains roughly the same
for synchronous DRAM devices [66].
But increasing server performance doubles memory capacity demand
about every two years [66].
Figure: Channel capacity of synchronous SDRAMs vs memory capacity demand [66]
2.6. Attributes of memory channels (13)
Increasing server capacity demand calls for memory technologies with
higher capacity potential, such as DRAM technologies with serial bus
connection, like FB-DIMM.
2.6. Attributes of memory channels (14)
Dependency on the number of ranks mounted onto the memory modules
Dual memory ranks mounted on the memory modules result in higher bus loading
and may reduce the maximum number of supported memory slots.
E.g. at 133 MHz memory speed the north bridge of Intel’s 815 chipset supports
• up to three SDRAM DIMMs with just a single rank or
• up to two SDRAM DIMMs with dual ranks.
2.6. Attributes of memory channels (15)
Number of memory modules supported per memory channel
• 1-2 memory modules
  Typical use: desktops/entry level servers
• 6-8 memory modules
  Typical use: DP/MP servers with FBDIMM mem. modules
Figure: Number of memory modules supported per memory channel
by Intel’s P4/Core 2 Duo north bridges
2.6. Attributes of memory channels (16)
4 DIMM slots (DDR2 and DDR): two DDR or DDR2 channels with a single DIMM slot on each channel
Figure: Example 1. P4 based desktop motherboard
(MSI’s 915G Combo motherboard with Intel’s 915G chipset) [61]
2.6. Attributes of memory channels (17)
4 DIMM slots: two DDR2 channels (Ch. A, Ch. B) with two DIMM slots on each channel;
MCH (E7221), CPU
Figure: Example 2. P4-based entry-level DP server motherboard
(Supermicro’s P8SCT with Intel’s E7221 chipset) [63]
2.6. Attributes of memory channels (18)
4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel
Figure: Example 3. Block diagram of a Core 2 based four-processor MP server
(Supermicro’s X7QC3 with Intel’s 7300 North bridge) [64]
2.6. Attributes of memory channels (19)
4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel, up to 192 GB
Xeon 7200 DC / 7300 QC (Tigerton) processors
7300 NB, SBE2 SB; ATI ES1000 graphics with 32 MB video memory
Figure: Example 3. Core 2 based four-processor MP server motherboard
(Supermicro’s X7QC3 with Intel’s 7300 North bridge) [64]
2.6. Attributes of memory channels (20)
Xeon processors:
• 7200 (Tigerton DC, Core 2), DC
• 7300 (Tigerton QC, Core 2), QC
Four DDR2 FB-DIMM channels with 8 DIMM slots on each channel, up to 512 GB
Figure: Example 4. Block diagram of Intel’s Core 2 based 7300 (Caneland) MP platform
with the 7300 (Clarksboro) chipset (9/2007) [65]
2.6. Attributes of memory channels (21)
Supported number of ranks per memory module
Rank: logical unit
Memory module: physical unit
• A rank consists of the set of DRAM devices (of a given width) that are needed to
  achieve the expected data width of the memory module.
  E.g. a 64-bit wide rank consists of 8 x8 or 4 x16 DRAM devices.
• Optionally, a rank may include an additional DRAM device to hold ECC bits.
• The DRAM devices constituting a rank are mounted side by side onto a memory module.
• A rank usually covers one side of the memory module (when using x8 or x16 devices),
  but 64-bit wide ranks built up of x4 devices (16 devices) typically cover both sides.
• All devices of a rank share the address and command bus.
• All devices of a rank are selected by the same CS (Chip Select) signal, whereas
  different ranks have different CS signals.
A memory rank is sometimes also designated as a row.
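The relation between rank width, device width and device count can be made explicit with a small helper. This is a sketch only: the function name is hypothetical, and the ECC handling assumes the common 72-bit organization (64 data bits + 8 ECC bits per rank), which the text only alludes to ("an additional DRAM device to hold ECC bits").

```python
def devices_per_rank(rank_width: int = 64, device_width: int = 8, ecc: bool = False) -> int:
    """Number of DRAM devices needed to build one rank.

    rank_width   -- data width of the rank in bits (64 for a DIMM)
    device_width -- data width of one DRAM device (4, 8 or 16 for x4/x8/x16 parts)
    ecc          -- assumption: 8 extra ECC bits per 64 data bits (72-bit rank)
    """
    if rank_width % device_width:
        raise ValueError("rank width must be a multiple of the device width")
    data_devices = rank_width // device_width
    ecc_bits = rank_width // 8 if ecc else 0
    ecc_devices = -(-ecc_bits // device_width)  # ceiling division
    return data_devices + ecc_devices


if __name__ == "__main__":
    for width in (4, 8, 16):
        print(f"x{width}: {devices_per_rank(64, width)} devices per 64-bit rank")
    print(devices_per_rank(64, 8, ecc=True))  # 9: eight data devices + one ECC device
```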
2.6. Attributes of memory channels (22)
Figure: Connecting ranks to the memory controller [68]
2.6. Attributes of memory channels (23)
Memory module: physical unit
• A memory module is basically a small printed circuit board that carries one or more ranks
  and fits into a memory slot of the motherboard.
• Memory modules may be populated either on one side or on both sides.
A memory module may contain
• a single rank on one of its sides,
• a single rank spread over both of its sides, or
• two ranks, one on each of its sides.
2.6. Attributes of memory channels (24)
Figure: Example 1: One 64-bit wide DDR3 SO-DIMM rank consisting of 4 16-bit DRAM devices,
that are mounted on one side of the module [67]
2.6. Attributes of memory channels (25)
Figure: Example 2: One 64-bit wide DDR3 SO-DIMM rank consisting of 8 8-bit DRAM devices,
that are mounted on both sides of the module [67]
2.6. Attributes of memory channels (26)
Figure: Example 3. Two 64-bit wide DDR3 SO-DIMM ranks, each consisting of 4 16-bit DRAM devices,
that are mounted on both sides of the module [67]
2.6. Attributes of memory channels (27)
Supported number of ranks per memory module
• A single rank is supported per mem. module
  In few cases, usually as a restriction for higher DRAM speeds
• Dual ranks are supported per mem. module
  Typical implementation
Figure: Supported number of ranks (rows) per memory module
Examples
a) The north bridge of Intel’s 815 chipset supports
• up to three SDRAM-100 DIMMs with dual ranks or
• up to three SDRAM-133 DIMMs with just a single rank or
• up to two SDRAM-133 DIMMs with dual ranks.
b) The north bridge of Intel’s P35 chipset for Core 2 Duo processors supports
• up to two DDR2-800/667 or DDR3-1066/800 DIMMs with dual ranks.
2.6. Attributes of memory channels (28)
Supported attributes of DRAM devices
• DRAM type
• DRAM width
• DRAM density
• DRAM speed
Figure: Supported attributes of DRAM devices
2.6. Attributes of memory channels (29)
DRAM types (for general use)
• DRAMs with parallel bus connection
  Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
  Synchronous DRAMs: SDRAM (1996), DDR (2000), DDR2 (2004), DDR3 (2007)
• DRAMs with serial bus connection
  DRDRAM (1999), XDR (2006)1, FBDIMM (2006)
Main stream DRAM types: DRAMs with parallel bus connection
Challenging DRAM types: DRAMs with serial bus connection
1 Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers
Figure: DRAM types for general use
(Described in Sections 4, 5, 6 of the Chapter DRAM devices)
2.6. Attributes of memory channels (30)
DRAM width
North bridges/memory controllers specify the width of the supported DRAM devices.
Most recent north bridges/memory controllers support x8 and x16 DRAM devices.
DRAM density
North bridges/memory controllers specify the supported DRAM densities.
Example 1
The north bridge of Intel’s 815 chipsets for Pentium III processors supports SDRAM devices
with 16Mb/64Mb/128Mb/256Mb densities.
Example 2
The north bridge of Intel’s Series 3 chipset family for Core 2 Duo and Core 2 Quad processors
supports DDR2 and DDR3 devices with 512Mb and 1Gb densities.
DRAM speed
North bridges/memory controllers also specify the supported DRAM speeds.
2.6. Attributes of memory channels (31)
845xx (Brookdale) family of MCH/GMCHs: 845, 845E, 845G, 845GL, 845GV, 845GE, 845PE
(introduced between 9/01 and 10/02)
Memory: single channel SDR/DDR SDRAM (unbuffered), max. memory 2 GB
FSB: 400 MHz or 533/400 MHz; HT supported by some members
DRAM speeds: PC133, DDR 266/200 or DDR 333/266, depending on the member
Example: Supported DRAM speeds of the north bridges of Intel’s 845xx family of chipsets.
Another example:
The north bridge of Intel’s Series 3 chipsets for Core 2 Duo and Core 2 Quad processors
supports DDR2 devices with 667/800 MT/s or DDR3 devices with 800/1066 MT/s
transfer rates.
3. Key performance parameters of main memories
• 3.1 Memory capacity
• 3.2 Memory bandwidth
• 3.3 Memory latency
3.1. Memory capacity (1)
Memory capacity (CM)
CM = nCU x nCH x nM x nR x CR
with nCU: No. of north bridges/memory control units
nCH: No. of memory channels per north bridge/control unit
nM: No. of memory modules per channel
nR: No. of ranks per memory module
CR: Rank capacity (device density x no. of DRAM devices)
E.g. The Core 2 based P35 chipset supports up to two memory channels with up to two
dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density
per rank.
The resulting maximum memory capacity is:
CMmax = 1 x 2 x 2 x 2 x (8 x 1 Gb) = 8 x 1 GB = 8 GB
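The capacity formula can be evaluated mechanically. The sketch below is illustrative only (the function names are hypothetical): it computes CR = nD x D in gigabits, multiplies by nCU x nCH x nM x nR, and converts bits to bytes. It reproduces the P35 example and the Core 2 server configuration discussed in this section (4 channels, 6 dual-ranked modules per channel, 8 devices of 4 Gb per rank).

```python
def rank_capacity_gbit(n_devices: int, density_gbit: float) -> float:
    """CR = nD x D (in Gbit)."""
    return n_devices * density_gbit


def memory_capacity_gbyte(n_cu: int, n_ch: int, n_m: int, n_r: int,
                          n_devices: int, density_gbit: float) -> float:
    """CM = nCU x nCH x nM x nR x CR, converted from Gbit to GByte (8 bits per byte)."""
    c_r = rank_capacity_gbit(n_devices, density_gbit)
    return n_cu * n_ch * n_m * n_r * c_r / 8


if __name__ == "__main__":
    # P35 desktop: 1 controller, 2 channels, 2 modules/channel, 2 ranks, 8 x 1 Gb devices
    print(memory_capacity_gbyte(1, 2, 2, 2, 8, 1))  # 8.0
    # Server: 1 controller, 4 channels, 6 modules/channel, 2 ranks, 8 x 4 Gb devices
    print(memory_capacity_gbyte(1, 4, 6, 2, 8, 4))  # 192.0
```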
3.1. Memory capacity (2)
Crucial factors limiting the maximum capacity of main memories
• nM: No. of memory modules supported per memory channel
• CR: Rank capacity (device density x no. of DRAM devices/rank).
3.1. Memory capacity (3)
Number of memory modules supported per memory channel
• Modules connected via a parallel bus: 1-4 memory modules
  E.g. SDRAM, DDR, DDR2, DDR3 modules
  Higher transfer rates limit the number of mem. modules typically to one or two.
• Modules connected via a serial bus: 6-8 memory modules
  E.g. FBDIMM modules
Figure: Number of memory modules supported per memory channel
3.1. Memory capacity (4)
Rank capacity (CR)
CR = nD x D
with nD: Number of DRAM devices/rank
D: Device density
Number of DRAM devices/rank: typically up to 8
E.g. a one-sided (single rank) DDR3 memory module built up of 8 devices
3.1. Memory capacity (5)
Device density
Figure: Evolution of DRAM densities (16K to 1G, Mbit) and no. of units shipped/year, 1980-2015 (based on [35]); density grows by ~4× every 4 years
Ranks typically include up to 8 DRAM devices.
3.1. Memory capacity (6)
Typical maximum main memory sizes (CMmax) of recent Core 2 based desktops, assuming:
• 2 memory channels
• 2 modules per channel
• dual ranked modules
• populated with 8 x8 DDR2 or DDR3 devices of 1 Gb density (CR = 1 GB):
CMmax = 1 x 2 x 2 x 2 x 1 GB = 8 GB
Typical maximum main memory sizes of recent Core 2 based servers, assuming:
• 4 memory channels
• 6 modules per channel
• dual ranked modules
• populated with 8 x8 FB-DIMM DDR2 devices of 4 Gb density (CR = 4 GB):
CMmax = 1 x 4 x 6 x 2 x 4 GB = 192 GB
3.1. Memory capacity (7)
The rate of increasing DRAM densities
In accordance with Moore’s law (which states that the transistor count per chip
doubles about every 24 months),
DRAM densities grow about 4x every 4 years.
Thus, for the same number of control units/modules/ranks,
the maximum size of main memories also grows about 4x every 4 years.
3.2. Memory bandwidth (8)
Bandwidth of memory systems
Total bandwidth (BW) provided by a memory system:
BW = nCU x nCH x T x WM
with nCU: No. of north bridges/memory control units
nCH: No. of memory channels per north bridge/control unit
T: Transfer rate of the module (no. of data transfers/sec)
WM: Data width of the memory modules
E.g. A memory system with a single, dual channel controller and
8 Byte wide DDR2-800 modules provides a total bandwidth of:
BW = 1 x 2 x 800 MT/s x 8 B = 12800 MB/s = 12.8 GB/s
Processors with an increasing number of cores obviously require increasingly higher
memory bandwidth.
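The bandwidth formula BW = nCU x nCH x T x WM can be written as a small function. A sketch with a hypothetical function name; T is given in MT/s and WM in bytes, and the result is in GB/s with 1 GB/s = 1000 MB/s, as in the example above. The second call applies the same formula to the Cell BE's two XDR channels (3.2 GT/s, 4 B of data per channel, ECC bits excluded).

```python
def peak_bandwidth_gbs(n_cu: int, n_ch: int, t_mts: float, w_bytes: float) -> float:
    """BW = nCU x nCH x T x WM with T in MT/s and WM in bytes; result in GB/s."""
    return n_cu * n_ch * t_mts * w_bytes / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    print(peak_bandwidth_gbs(1, 2, 800, 8))   # DDR2-800, dual channel: 12.8
    print(peak_bandwidth_gbs(1, 2, 3200, 4))  # Cell BE, two XDR channels: 25.6
```

The result is a peak figure; sustained bandwidth is lower because of refresh, bank conflicts and bus turnarounds.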
3.2. Memory bandwidth (10)
The min. column cycle time (tCCD) of the memory cell array
tCCD (core column delay)
is the minimum time interval between consecutive Reads or Writes.
Figure: The interpretation of tCCD [36]
Remark
tCCD is also designated as the Read/Write command to Read/Write command delay.
3.2. Memory bandwidth (11)
Figure: The evolution of the column cycle time (tCCD) in different SDRAM types (ns) [37]
Note: The min. column cycle time (tCCD) of synchronous DRAMs is:
SDRAM: 7.5 ns
DDR/DDR2/DDR3: 5 ns
3.2. Memory bandwidth (9)
The crucial factor limiting the memory bandwidth of the main memory:
Transfer rate of the memory module (no. of data transfers/sec)
The transfer rate of the memory module (T)
equals the transfer rate of the DRAM devices used.
The peak transfer rate (Tmax) of synchronous DRAM devices:
Tmax = 1/tCCD x FW
with tCCD: Min. column cycle time of the memory cell array
FW: Fetch width of the memory cell array
3.2. Memory bandwidth (12)
The fetch width (FW) of the memory cell array
specifies how many times more bits the cell array fetches per column cycle
than the data width of the device.
E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM)
fetches 4 × 4, that is 16 bits, from the memory cell array per column cycle.
The fetch width (FW) of the memory cell array of synchronous DRAMs is:
DRAM type   FW
SDRAM       1
DDR         2
DDR2        4
DDR3        8
3.2. Memory bandwidth (13)
The peak transfer rates of the different DRAM technologies are:
Tmax = 1/tCCD x FW
SDRAM: 1/7.5 ns x 1 = 133 MT/s
DDR: 1/5 ns x 2 = 400 MT/s
DDR2: 1/5 ns x 4 = 800 MT/s
DDR3: 1/5 ns x 8 = 1600 MT/s
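The peak transfer rates can be recomputed from tCCD and FW. A sketch: the dictionaries hold the values given above (tCCD = 7.5 ns for SDRAM and 5 ns for DDR/DDR2/DDR3; FW = 1, 2, 4, 8). Since 1/ns corresponds to GT/s, the result is scaled by 1000 to MT/s. The helper `bits_per_column_cycle` (a hypothetical name) reproduces the x4 DDR2 fetch example.

```python
# Values from the tCCD note and the fetch width table above
T_CCD_NS = {"SDRAM": 7.5, "DDR": 5.0, "DDR2": 5.0, "DDR3": 5.0}
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def peak_transfer_rate_mts(t_ccd_ns: float, fetch_width: int) -> float:
    """Tmax = 1/tCCD x FW; 1/ns is GT/s, so multiply by 1000 to get MT/s."""
    return fetch_width * 1000 / t_ccd_ns


def bits_per_column_cycle(device_width: int, fetch_width: int) -> int:
    """Bits fetched from the cell array per column cycle (device width x FW)."""
    return device_width * fetch_width


if __name__ == "__main__":
    for dram, fw in FETCH_WIDTH.items():
        print(dram, round(peak_transfer_rate_mts(T_CCD_NS[dram], fw)), "MT/s")
    print(bits_per_column_cycle(4, 4))  # x4 DDR2 device: 16 bits per column cycle
```

Since tCCD stayed at 5 ns from DDR to DDR3, the sketch makes visible that the rate gains of these generations come entirely from the growing fetch width.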
3.2. Memory bandwidth (14)
Figure: The evolution of peak transfer rates (MT/s) of synchronous DRAMs in Intel’s chipsets, 1996-2008 (SDRAM 66/100/133, DDR 200/266/333/400, DDR2 533/667/800, DDR3 1067; trend ~10×/10 years)
3.2. Memory bandwidth (15)
The evolution of peak transfer rates of synchronous DRAMs
Peak transfer rates grow by ≈ 10x/10 years,
that is, they double roughly every 3 years.
Sources of the evolution
the introduction of new synchronous DRAM technologies (SDRAM/DDR/DDR2/DDR3),
more specifically,
increasingly advanced approaches to improve, first of all,
• signaling (by using SSTL_2/1.8/1.5, differential CK/DQS)
• synchronisation (by using source synchronisation, DLLs to align CK with DQs etc.) and
• line terminations (by using ODT, dynamic ODT, ZQ calibration etc.)
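The growth rates quoted in this chapter (≈10x/10 years for memory transfer rates, ≈100x/10 years for processor clock frequencies, ≈4x/4 years for DRAM densities) can be converted into doubling times, assuming steady exponential growth. A minimal sketch with a hypothetical function name:

```python
import math


def doubling_time_years(factor: float, period_years: float) -> float:
    """Doubling time for growth by `factor` every `period_years` (exponential growth)."""
    return period_years * math.log(2) / math.log(factor)


if __name__ == "__main__":
    print(f"transfer rates, 10x/10 years: {doubling_time_years(10, 10):.2f} years")   # 3.01
    print(f"clock rates, 100x/10 years:   {doubling_time_years(100, 10):.2f} years")  # 1.51
    print(f"DRAM densities, 4x/4 years:   {doubling_time_years(4, 4):.2f} years")     # 2.00
```

The density case matches the two-year doubling of Moore's law, while clock frequencies doubled about twice as fast as memory transfer rates, which is the gap discussed in the following slides.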
3.2. Memory bandwidth (16)
The evolution of processor clock frequencies vs
transfer rates of main memories in mainstream processors
3.2. Memory bandwidth (17)
The evolution of processor clock frequencies (fC) in desktops
Figure: Evolution of clock frequencies (fC, MHz) in Intel’s desktop processors from the 8088 to the Pentium 4 (x-axis: year of first volume shipment, 1978-2005); growth of ~10×/10 years, later ~100×/10 years, then leveling off
3.2. Memory bandwidth (21)
The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets
Figure: The evolution of peak transfer rates (MT/s) of synchronous DRAMs in Intel’s chipsets, 1996-2008 (SDRAM 66/100/133, DDR 200/266/333/400, DDR2 533/667/800, DDR3 1067; trend ~10×/10 years)
3.2. Memory bandwidth (19)
The evolution of processor clock frequencies (fC) in desktops between 1995-2003
Figure: Evolution of clock frequencies (fC, MHz) in Intel’s desktop processors from the 8088 to the Pentium 4 (x-axis: year of first volume shipment, 1978-2005); growth of ~10×/10 years, later ~100×/10 years, then leveling off
3.2. Memory bandwidth (20)
In the time period of about 1995 - 2003,
clock frequencies rose by a rate of ≈ 100x/10 years,
whereas the transfer rates of main memories rose only by a rate of ≈ 10x/10 years.
In this time period
the gap between clock frequencies and memory transfer rates became
continuously wider.
Since higher clock rates were the main source of higher processor performance in this period,
and higher processor performance induces higher memory traffic,
a strong motivation arose to increase the bandwidth of main memories by
widening the datapath to the main memory,
first of all by introducing dual memory channels.
Dual memory channels became commonplace even in desktops.
3.2. Memory bandwidth (24)
The evolution of processor clock frequencies (fC) in desktops after about 2003
Figure: Evolution of clock frequencies (fC, MHz) in Intel’s desktop processors after about 2003 (8088 to Pentium 4, x-axis: year of first volume shipment, 1978-2005); growth of ~10×/10 years, later ~100×/10 years, then leveling off
3.2. Memory bandwidth (23)
After about 2003, however,
clock frequencies saturated (due to hitting the thermal wall),
and single core processors remained the mainline until about 2005.
In the time period of about 2003 - 2005
the gap between clock frequencies and memory transfer rates
became narrower.
Nevertheless,
beginning with ~ 2005
the era of multicores emerged,
with the core count doubling about every two years.
Beginning with about 2005
a new scenario has become dominant,
with steadily increasing bandwidth/transfer rate requirements.
3.2. Memory bandwidth (25)
The status quo in increasing bandwidth/transfer rates
3.2. Memory bandwidth (26)
Figure: Evolution of the bandwidth of dual-channel synchronous DRAM memory systems (double data rate SDRAM migration) [56]
3.2. Memory bandwidth (27)
Figure: Evolution of transfer rates (per pin bandwidth figures) of different DRAM types [40]
3.3. Memory latency (1)
Memory latency
• Device level memory latency
• System level memory latency
3.3. Memory latency (2)
Figure: Estimated maximum and minimum read latencies of DRAM devices (ns) for the typical DRAM parts used in Intel’s desktop platforms, from the PC (1981) to Core 2 (2007)
Notes:
1 Read latency of DRAM, FPM, EDO and BEDO parts = tRAC (row access time, i.e. the time from row address until data valid).
Read latency of SDRAM parts = CL + tRCD (CAS latency + row to column delay).
2 The 815 chipset supports SDRAMs, while the 820 supports RDRAMs.
3 A new revision of the 845 supports DDRs instead of SDRAMs.
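The read latency of SDRAM parts, CL + tRCD, is specified in clock cycles, so converting it to nanoseconds needs the clock period. A minimal sketch with hypothetical function names, assuming a DDR2-800 part with CL = 5 and tRCD = 5 (a common 5-5-5 speed grade chosen for illustration, not a value read from the figure) and the double data rate relation clock frequency = transfer rate / 2:

```python
def clock_period_ns(transfer_rate_mts: float, double_data_rate: bool = True) -> float:
    """tCK in ns; DDR-type devices transfer data on both clock edges."""
    clock_mhz = transfer_rate_mts / 2 if double_data_rate else transfer_rate_mts
    return 1000 / clock_mhz


def sdram_read_latency_ns(cl: int, t_rcd: int, t_ck_ns: float) -> float:
    """Read latency = (CL + tRCD) x tCK, with CL and tRCD given in clock cycles."""
    return (cl + t_rcd) * t_ck_ns


if __name__ == "__main__":
    # Hypothetical example: a DDR2-800 part with CL = 5 and tRCD = 5
    print(sdram_read_latency_ns(5, 5, clock_period_ns(800)))  # 25.0
```

Because CL and tRCD in cycles tend to grow with each DRAM generation while tCK shrinks, the latency in nanoseconds changes much less than the transfer rate.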
3.3. Memory latency (3)
Figure: Estimated typical system-level memory latency in x86-based PCs (in ns), from the PC (1981) to Core 2 (2008)
3.3. Memory latency (4)
[Figure: system-level memory latency in processor clock cycles by year, 1981 to 2008. Each period is annotated with the typical desktop processor (PC (8088), AT (286), 386 DX, 486 DX, P, PPro, PII, PIII, P4, Core2), the chipset, the DRAM type (DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3) and the typical DRAM part size (16 K to 2G bits).]
Figure 5.1c: System-level memory latencies in x86-based PCs (in proc. clock cycles)
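The latency in processor clock cycles is the latency in ns scaled by the core clock frequency. Core clock rates rose much faster than DRAM latency fell, so the same memory access costs ever more processor cycles. The minimal Python sketch below shows the conversion. The function name and the 70 ns, 1 GHz and 3 GHz values are hypothetical and are not taken from the figures.

```python
def latency_in_cpu_cycles(latency_ns: float, cpu_clock_ghz: float) -> float:
    """Processor clock cycles that elapse during one memory access.

    At f GHz the processor completes f cycles per ns, so
    cycles = latency [ns] x clock [GHz].
    """
    return latency_ns * cpu_clock_ghz


# Hypothetical values: the same 70 ns access seen by a 1 GHz and a 3 GHz core
print(latency_in_cpu_cycles(70, 1.0))  # 70.0
print(latency_in_cpu_cycles(70, 3.0))  # 210.0
```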
4. References (1)
[1]: 64MB Apple G3 Beige 168p SDRAM DIMM, http://www.memoryx.net/apl168s64.html
[2]: 4, 8 MEG x 32 DRAM SIMMs, Micron, http://www.pjrc.com/mp3/simm/datasheet.html
[3]: 168 Pin, PC133 SDRAM Registered DIMM Design Specification, JEDEC Standard
No. 21-C, Page 4.20.2
[4]: 184 Pin Unbuffered DDR SDRAM DIMM Family, JEDEC Standard No. 21-C, Page 4.5.10
[5]: Direct Rambus DRAM RIMM Module, 512 MB, MC-4R512FKE6D, Elpida,
http://pdf1.alldatasheet.com/datasheet-pdf/view/60081/ELPIDA/MC-4R512FKE6D.html
[6]: DDR2 SDRAM UDIMM Features, Micron,
http://www.micron.com/products/modules/udimm/partlist
[7]: DDR3 SDRAM UDIMM Features, Micron,
http://www.micron.com/products/modules/udimm/partlist
[8]: DDR2 SDRAM FBDIMM Features, Micron,
http://www.micron.com/products/modules/fbdimm/partlist
[9]: Torres G., „Memory Tutorial”, July 19, 2005, Hardwaresecrets,
http://www.hardwaresecrets.com/article/167/1
[10]: Besedin D., „First look at DDR3”, Digit-life, June 29, 2007,
http://www.digit-life.com/articles2/mainboard/ddr3-rmma.html
4. References (2)
[11]: http://www.hardwaresecrets.com/fullimage.php?image=2862
[12]: http://cgi.ebay.com/Vintage-Microsoft-8-Bit-ISA-PC-RAM-Card-W-Gold-5150_W0QQitemZ310017171151QQcmdZViewItem
[13]: http://www.hardwaresecrets.com/fullimage.php?image=2856
[14]: http://www.memex.com.au/images/72psimm.jpg
[15]: Ahn J.-H., „DRAM Operation & Architecture,” 2007. 9. 10., Hynix,
http://netro.ajou.ac.kr/~jungyol/memory2.pdf
[16]: http://www.twinmos.com/dram/dram_p_dt_ddr.htm#s
[17]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg
[18]: http://www.twinmos.com/dram/dram_p_dt_ddr3_1333.htm#s
[19]: http://item.express.ebay.com/16mb-EDO-3-3V-72-Pin-SODIMM-LAPTOP-RAMLAPTOP-16mb-EDO_W0QQitemZ230060958674QQihZ013QQcmdZExpressItem
[20]: http://www.twinmos.com/dram/dram_p_nb_sdr_sodimm.htm
[21]: http://www.cdw.com/shop/products/default.aspx?EDC=915882
[22]: http://laptoping.com/category/laptop-memory
[23]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg
[24]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg
4. References (3)
[25]: Datasheet, Micron,
http://download.micron.com/pdf/datasheets/modules/sdram/SD18C32_64_128x72D.pdf
[25]: Datasheet, Micron,
http://download.micron.com/pdf/datasheets/modules/sdram/sd18c32_64_128x72.pdf
[26]: Datasheet, Micron,
http://download.micron.com/pdf/datasheets/modules/ddr/DDF18C64_128x72D.pdf
[27]: Datasheet, Micron,
http://download.micron.com/pdf/datasheets/modules/ddr2/HTF18C64_128_256x72D.pdf
[28]: Datasheet, Micron,
http://download.micron.com/pdf/datasheets/modules/ddr3/JSF18C256x72PD.pdf
[29]: PLL Clock Driver for 2.5V DDR-SDRAM Memory, Datasheet, Pericom, Febr. 2003,
http://www.pericom.com/pdf/datasheets/PI6CV857.pdf
[30]: PC2100 and PC1600 DDR SDRAM Registered DIMM Design Specification,
JEDEC Standard No. 21-C, Page 4.20.4-1, Revision 1.3, Jan. 2002,
http://www.jedec.org/download/search/4_20_04R13.PDF
[31]: Supermicro Motherboards, http://www.supermicro.com/products/motherboard/
[32]: http://www.pricegrabber.com/search_getprod.php/masterid=3191326
[33]: Definition of CDCV857 PLL Clock Driver for Registered DDR DIMM Applications,
JESD82, JEDEC, July 2000
4. References (4)
[34]: http://www.tranzistoare.ro/datasheets2/32/327037_1.pdf
[35]: DRAM Pricing – A White Paper, Tachyon Semiconductors,
http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf
[36]: 16 Mb Synchronous DRAM, MT48LC4M4A1/A2, MT48LC2M8A1/A2 Micron,
http://datasheet.digchip.com/297/297-04447-0-MT48LC2M8A1.pdf
[37]: Rhoden D., „The Evolution of DDR”, Via Technology Forum, 2005,
http://www.via.com.tw/en/downloads/presentations/events/vtf2005/vtf05hd_inphi.pdf
[38]: Haskill, „The Love/Hate relationship with DDR SDRAM Controllers,” Mosaid, Oct. 2006,
http://www.mosaid.com/corporate/products-services/ip/SDRAM_Controller_whitepaper_Oct_2006.pdf
[39]: Van Roon T., „What exactly is a PLL?,” April 2006,
http://www.uoguelph.ca/~antoon/gadgets/pll/pll.html
[40]: Choi J. H., „High Speed DRAM,” Memory Division, Samsung, 2004,
http://asic.postech.ac.kr/1.Nrl/2.NRL%20Seminar/invitation/041208ChoiJH.pdf
[41]: Introduction to Xilinx, Xilinx FPGA Design Workshop,
http://www.eas.asu.edu/~kchatha/cse320_f07/xilinx_intro.ppt
[42]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384,
Febr. 2003, Xilinx Inc.
4. References (5)
[43]: 64-bit Flow-Thru Error Detection and Correction Unit, IDT49C466, Integrated
Device Technology Inc., 1999,
http://www.digchip.com/datasheets/parts/datasheet/222/IDT49C466.php
[44]: Tam S., „Single Error Correction and Double Error Detection,” Xilinx Application
Note XAPP645 (v.2.2), Aug. 2006,
http://www.xilinx.com/support/documentation/application_notes/xapp645.pdf
[45]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page
4.20.4-1, Jan. 2002, http://www.jedec.org
[46]: Understanding DDR3 Serial Presence Detect (SPD) Table, July 17, 2007, Simmtester,
http://www.simmtester.com/PAGE/news/showpubnews.asp?num=153
[47]: DDR2 DIMM SPD Definition, August 25, 2006,
http://docmemory.com/page/news/showpubnews.asp?num=141
[48]: Memory Module Serial Presence-Detect, TN-04-42, Micron, 2002
http://download.micron.com/pdf/technotes/TN_04_42_C.pdf
[49]: Intel 845 Chipset: 82845 Memory Controller Hub (MCH) for DDR, Datasheet, Jan. 2002,
Intel, No. 298604-001
[50]: Intel 975X Express Chipset: 82975X Memory Controller Hub (MCH), Datasheet,
Nov. 2005, Intel, No. 310158-001
[51]: Supermicro X6DH8-G2, X6DHE-G2 Mainboards User’s Manual, Rev. 1.1b, June 2007,
SUPER MICRO Computer Inc.
4. References (6)
[52]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet,
Sept. 2006, http://www.intel.com/design/chipsets/datashts/313071.htm
[53]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007,
http://www.intel.com/design/chipsets/datashts/313082.htm
[54]: „Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005,
http://www.pcstats.com/articleview.cfm?articleid=1812&page=1
[55]: PCI Technology overview, Febr. 2003, http://www.digi.com/pdf/prd_msc_pcitech.pdf
[56]: DDR3 SDRAM, Samsung,
http://www.samsung.com/global/business/semiconductor/products/dram/Products_DDR3SDRAM.html
[57]: Le H. Q. et al., „IBM POWER6 microarchitecture,”
IBM J. R&D, Vol. 51, No. 6, 2007, pp. 639-662
[58]: Kanter D., „Inside Barcelona: AMD's Next Generation,” May 2007,
http://www.realworldtech.com/includes/templates/articles.cfm?ArticleID=RWT051607033728&mode=print
[59]: Golla R., „Niagara2: A Highly Threaded Server-on-a-Chip,” Oct. 2006,
http://www.opensparc.net/pubs/preszo//06/04-Sun-Golla.pdf
[60]: Hofstee P., „Tutorial: Hardware and Software Architectures
for the CELL BROADBAND ENGINE processor”, IBM Corp., September 2005
http://www.crest.gatech.edu/conferences/cases2005/pdf/Cell-tutorial.pdf
4. References (7)
[61]: 915 P/G Combo Mainboard (MS-7058) Manual, May 2004, MSI
[62]: Haas J. & Vogt P., „Fully-Buffered DIMM Technology Moves Enterprise Platforms to the
Next Level,” Technology@Intel Magazine,
http://www.intel.com/technology/magazine/computing/fully-buffered-dimm-0305.htm
[63]: http://www.supermicro.com/manuals/motherboard/E7221/MNL-0776.pdf
[64]: http://www.supermicro.com/manuals/motherboard/7300/MNL-0955.pdf
[65]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007,
http://www.intel.com/design/chipsets/datashts/313082.htm
[66]: Vogt P., „Fully Buffered DIMM (FB-DIMM) Server Memory Architecture,” Febr. 18, 2004,
Intel Developer Forum, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf
[67]: 204-Pin DDR3 SDRAM Unbuffered SO-DIMM Design Specification, JEDEC Standard
No. 21C, Page 4.20.18-1
[68]: Jacob B. & Wang D., „Memory Systems: Circuits, Architecture and Performance
Analysis,” Lecture notes, University of Maryland, ENEE759H, Spring 2005
[69]: Datasheet,
http://download.micron.com/pdf/datasheets/modules/sdram/SD9C16_32x72.pdf
[70]: Solanki V., „Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37,
Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf