1. Macroarchitecture and performance parameters of MMs
Dezső Sima
September 2008 (Ver. 1.0)
Sima Dezső, 2008

Overview
• 1. Introduction
• 2. Macroarchitecture of main memories
• 3. Key performance parameters of main memories
• 4. References

1. Introduction (1)
Scope
General-purpose main memories, i.e. main memories used in desktops, servers and laptops.

1. Introduction (2)
Figure: Main memories on motherboards (desktop [32], server [77])

1. Introduction (3)
Figure: Different kinds of memory modules

1. Introduction (4)
Layout of main memories
• Macroarchitecture of the main memory
• Layout of the memory modules
Figure: Main dimensions of the layout of main memories

2. Macroarchitecture of main memories
• 2.1 Introduction
• 2.2 Attachment policy
• 2.3 Point of attachment
• 2.4 Number of memory controllers
• 2.5 Number of memory channels
• 2.6 Attributes of memory channels

2.1. Introduction (1)
Macroarchitecture of main memories
Example 1
Figure: Single-channel main memory attached via the FSB and the north bridge

2.1. Introduction (2)
Example 2
Figure: Dual-channel main memory attached via the FSB and the north bridge

2.1. Introduction (3)
Example 3
Figure: Single-channel main memory attached via a dedicated memory controller

2.1. Introduction (4)
Example 4
Figure: Dual-channel main memory attached via a dedicated memory controller

2.1. Introduction (5)
Macroarchitecture of main memories
• Attachment policy
• Point of attachment
• Number of memory controllers (in case of direct attachment)
• Number of memory channels
• Attributes of memory channels
Figure: Main dimensions of the macroarchitecture of main memories

2.2. Attachment policy (1)
Attachment policy
• Indirect attachment: attachment via the FSB and the north bridge (memory control hub)
  - Longer access times (by ~20-30%)
  - Independence of the processor from the memory technology and speed
• Direct attachment: attachment via memory controller(s)
  - Shorter access times (by ~20-30%)
  - Dependence of the processor on the memory technology and speed
Examples
• Indirect attachment: PA-8800 (2004), PA-8900 (2005), Core Duo line (2006), Montecito (2006)
• Direct attachment: POWER4 (2001), POWER5 (2005), POWER6 (2007), Cell BE (2006), UltraSPARC IV (2004), UltraSPARC IV+ (2005), UltraSPARC T1 (2005), Athlon 64 X2 line (2005), Opteron line (2003), Barcelona (2007)
Figure: Attachment policy

2.2. Attachment policy (2)
Figure: Indirect attachment of the main memory to the system architecture (Core Duo (2006), Core 2 Duo (2006): memory attached via the FSB and the north bridge)
Figure: Direct attachment of the main memory to the system architecture (Athlon 64 X2 (2005): memory attached via the memory controller (M. c.) connected to the crossbar (IN); HT bus for I/O)

2.3. Point of attachment (1)
The point of attachment
• The highest cache level (via an IN)
  - 2-level caches: the IN connecting the L2 cache
  - 3-level caches: the IN connecting the L3 cache
  The M. c. is usually connected in this way if the highest cache level is inclusive.
• Between the two highest cache levels (via the IN connecting these levels)
  - 2-level caches: the IN connecting the L1 and L2 caches
  - 3-level caches: the IN connecting the L2 and L3 caches
  The M. c. is usually connected in this way if the highest cache level is exclusive.
Figure: Possible points of attachment of main memory to the system architecture
2.3. Point of attachment (2)
Interrelationship between the inclusion policy of L3 caches and the point of attachment
• Inclusive L3, e.g. Montecito (2006), POWER4 (2001): the M. c. is attached behind the L3 (L2 - L3 - M. c. - memory).
• Exclusive L3, e.g. UltraSPARC IV+ (2004), POWER5 (2004): the M. c. is attached to the IN between the L2 and the L3.
  - Lines replaced in L2 are written into L3.
  - Lines missing in L2 are reloaded from L3 and deleted there.
  - Data missing in both L2 and L3 are fetched from memory (high traffic).
  - Replaced, modified data are written back to memory (low traffic).

2.3. Point of attachment (3)
Figure: Examples for attaching memory via the highest cache level (two-level cache hierarchy: Core 2 Duo (2006), Athlon 64 X2 (2005); three-level cache hierarchy: Montecito (2006))

2.3. Point of attachment (4)
Figure: Examples for attaching memory via the interconnection network connecting the two highest cache levels (two-level cache hierarchy: UltraSPARC T1 (2005), whose crossbar connects the cores to four L2 banks, each with its own M. c.; three-level cache hierarchy: UltraSPARC IV+ (2005), exclusive L3)

2.4. Number of memory controllers (1)
Number of memory controllers (in case of direct attachment)
• Single memory controller: usual implementations, e.g. POWER5 (2004), K8-based processors (2006)
• Dual memory controllers: a few recent designs, e.g. POWER6 (2007), Barcelona (2007)
• Quad memory controllers: exceptional designs, e.g. UltraSPARC T1 (2005), UltraSPARC T2 (2007)
Figure: Number of memory controllers (in case of direct attachment)

2.4. Number of memory controllers (2)
Figure: Block diagrams of the POWER5 and POWER6 processors [57]

2.4. Number of memory controllers (3)
Figure: Block diagrams of AMD’s K8 and Barcelona processors [58]
2.4. Number of memory controllers (4)
Figure: Block diagram of the UltraSPARC T2 (Niagara-2) [59]

2.5. Number of memory channels (1)
Number of memory channels (per north bridge/memory controller)
• Single memory channel
  Typical use: early desktops
  E.g. Intel’s 845/848 chipset families for P4 desktops and earlier desktop chipsets
• Dual memory channels
  Typical use: recent desktops, single-core DP/MP servers
  E.g. Intel’s 865 and higher chipset families for P4 desktops, Intel’s P4-based DP server chipsets, Cell BE
• Quad memory channels
  Typical use: recent DC and QC DP/MP servers with FB-DIMM memory
  E.g. Intel’s 5000 (Bensley) and 7000 (Caneland) platforms for DC and QC Core 2 processors
Figure: Number of memory channels supported per north bridge/memory controller

2.5. Number of memory channels (2)
Example 1
Figure: Block diagram of an early P4 desktop having a single memory channel (Intel 845 chipset) [49]

2.5. Number of memory channels (3)
Example 2
Figure: Block diagram of a more advanced P4 desktop including dual memory channels (Intel’s 975 chipset) [50]

2.5. Number of memory channels (4)
Example 3
Figure: Block diagram of an early P4-based DP server including dual memory channels (Supermicro’s X6DH8-G2/X6DHE-G2 motherboard, based on Intel’s E7520 chipset) [51]

2.5. Number of memory channels (5)
Memory Interface Controller (MIC)
• Dual XDR™ memory channels
• Interleaved addressing in the channels
• The MIC can be configured to support only a single channel
• ECC support (32 + 4 bits)
Dual 36-bit wide XDR channels (32 data bits + 4 ECC bits each)
Memory bandwidth at 3.2 GT/s transfer rate: 3.2 GT/s × 2 channels × 4 B = 25.6 GB/s
Figure: Basic blocks of the Cell BE processor [60]

2.5. Number of memory channels (6)
Remark
In dual-channel configurations (or, in general, with multiple memory channels) a scheme is needed to define how memory addresses are allocated to the individual channels.
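The two allocation schemes (interleaved mode, where addresses alternate between the channels at 64 B cache-line boundaries, and asymmetric mode, where addresses fill the first channel before continuing in the second) can be sketched in a few lines of Python. This is a simplified model, assuming 64 B cache lines as stated for interleaved mode; the function names are hypothetical, and real memory controllers may select the channel from other (e.g. hashed) address bits.

```python
CACHE_LINE = 64  # bytes; interleaving granularity assumed to equal the cache-line size


def interleaved_channel(addr: int, n_channels: int = 2) -> int:
    """Interleaved mode: consecutive 64 B cache lines alternate between the channels."""
    return (addr // CACHE_LINE) % n_channels


def asymmetric_channel(addr: int, channel_sizes: list[int]) -> int:
    """Asymmetric mode: fill channel 0 up to its size, then continue in channel 1, ..."""
    base = 0
    for ch, size in enumerate(channel_sizes):
        if addr < base + size:
            return ch
        base += size
    raise ValueError("address beyond installed memory")


if __name__ == "__main__":
    GiB = 1 << 30
    # Consecutive cache lines go to different channels, so they can be fetched in parallel.
    print([interleaved_channel(a) for a in (0, 64, 128, 192)])
    # 1 GB in channel 0 and 2 GB in channel 1: all low addresses hit channel 0 only.
    print([asymmetric_channel(a, [GiB, 2 * GiB]) for a in (0, 64, GiB, 2 * GiB)])
```

The asymmetric variant shows why its performance degrades to single-channel performance: a program working on a contiguous region keeps hitting the same channel, whereas the interleaved variant spreads neighbouring lines over both channels (which is also why it needs equally sized channels).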
Allocation of addresses to the individual channels
• Interleaved mode
  - Addresses are allocated alternately to the channels at 64 B boundaries, assuming 64 B long cache lines, so two consecutive cache lines can be retrieved simultaneously.
  - Both memory channels must be populated with modules of the same size (e.g. 1 GB).
  - Provides maximum performance in real applications.
• Asymmetric mode
  - Addresses start in the first channel and are allocated to this channel up to its highest rank; then addresses continue in the second channel.
  - There is no need to populate both channels, or to populate them with modules of the same size.
  - In real applications, performance is limited to single-channel performance.
Figure: Address allocation alternatives to the individual channels

2.5. Number of memory channels (7)
Example 4
Intel’s 5000 (Blackford) chipset; processors: Xeon 5000 (Dempsey, NetBurst, DC), Xeon 5100 (Woodcrest, Core 2, DC), Xeon 5300 (Clovertown, Core 2, QC); FB-DIMM memory up to 64 GB.
In workstations the snoop filter eliminates snoop traffic to the graphics port.
Figure: Block diagram of Intel’s 5000 (Bensley) DP platform for DC/QC Core 2 processors including quad memory channels [52]

2.5. Number of memory channels (8)
Example 5
Processors: Xeon 7200 (Tigerton DC, Core 2), Xeon 7300 (Tigerton QC, Core 2); FB-DIMM memory up to 512 GB.
Figure: Block diagram of Intel’s 7300 (Caneland) MP platform for DC/QC Core 2 processors including quad memory channels [53]

2.5. Number of memory channels (9)
Remark
The FB-DIMM technology supports even 6 memory channels with 8 DIMMs each [54]; nevertheless, actual implementations typically support only four channels.
Figure: Maximum supported FB-DIMM configuration [54] (6 channels/8 DIMMs)

2.6. Attributes of memory channels (1)
Attributes of memory channels
• Supported type of memory modules
• Supported number of memory modules
• Supported number of ranks per memory module
• Supported attributes of DRAM devices
Figure: Attributes of memory channels
2.6. Attributes of memory channels (2)
Supported type of memory modules
• Memory modules of the same DRAM type: usual implementation
• Memory modules of different DRAM types (DRAM type A, e.g. DDR, and DRAM type B, e.g. DDR2): in order to provide a choice and an evolution path in times of memory technology transitions (e.g. while DDR2 technology replaces DDR technology)
Figure: Type of memory modules supported on the memory channel(s)

2.6. Attributes of memory channels (3)
Note: Motherboards that allow a choice between two different DRAM types are termed combo boards.
Example
Intel’s 915P/G chipsets support dual memory channels with either DDR or DDR2 technology. A single memory module is supported per channel (with one or two memory ranks). Accordingly, a mainboard based on the 915G chipset, such as MSI’s 915G Combo mainboard, is designated as a combo mainboard.

2.6. Attributes of memory channels (4)
Figure: MSI’s 915G Combo motherboard (based on Intel’s 915G chipset) [61], showing the 4 DIMM slots and the north bridge of the 915G chipset

2.6. Attributes of memory channels (5)
Figure: DIMM slots of MSI’s 915G Combo motherboard [61]: two DDR or DDR2 channels with a single DIMM slot on each channel

2.6. Attributes of memory channels (6)
Supported number of memory modules
It depends on the
• DRAM connection technology,
• DRAM speed,
• number of ranks mounted onto the memory module(s).

2.6. Attributes of memory channels (7)
Dependency on the memory connection technology
The maximum number of supported memory modules depends heavily on the memory connection technology, that is, whether the modules are connected
• via a parallel bus (as in the case of SDRAM, DDR, DDR2 and DDR3 modules) or
• via a serial bus (as in the case of FB-DIMM modules).
Number of memory modules supported per memory channel
• Modules connected via a parallel bus (e.g. SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules
• Modules connected via a serial bus (e.g. FB-DIMM modules): 6-8 memory modules
Figure: Number of memory modules vs. memory connection technology in synchronous DRAMs

2.6. Attributes of memory channels (8)
Remarks
1. Early chipsets supporting low-speed, 1 or 4 byte wide asynchronous DRAM modules often allowed 4-8 memory modules to be attached.
2. The Pentium processor provided a 64-bit wide data path, so early (430 family) chipsets typically supported two pairs of 32-bit wide FPM/EDO modules.

2.6. Attributes of memory channels (9)
Dependency on the memory speed
At higher transfer rates, skews, jitter and reflections (caused by impedance mismatch when terminating transmission lines) increasingly impair signal integrity. The more memory modules are present on a channel, the more serious the signal integrity problems become.
Higher transfer rates therefore limit the number of memory modules that can be supported on a memory channel.

2.6. Attributes of memory channels (10)
Figure: Scaling down the number of supported DIMMs per channel with increasing data rates (assuming two ranks per DIMM) [62]

2.6. Attributes of memory channels (11)
Figure: Scaling down the number of PCI-X slots with increasing PCI-X bus speed [55]

2.6. Attributes of memory channels (12)
Levelling off channel capacity for synchronous DRAMs
With
• increasing device densities, but
• a decreasing number of modules supported by memory channels at higher transfer rates,
the maximum memory capacity per memory channel remains roughly the same for synchronous DRAM devices [66].
But increasing server performance doubles memory capacity demand about every two years [66].
Figure: Channel capacity of synchronous SDRAMs vs. memory capacity demand [66]

2.6. Attributes of memory channels (13)
Increasing server capacity demand calls for memory technologies with higher capacity potential, such as DRAM technologies with a serial bus connection, like FB-DIMM.
2.6. Attributes of memory channels (14)
Dependency on the number of ranks mounted onto the memory modules
Dual memory ranks mounted on the memory modules result in higher bus loading and may reduce the maximum number of supported memory slots.
E.g. at 133 MHz memory speed the north bridge of Intel’s 815 chipset supports
• up to three SDRAM DIMMs with a single rank, or
• up to two SDRAM DIMMs with dual ranks.

2.6. Attributes of memory channels (15)
Number of memory modules supported per memory channel
• 1-2 memory modules: desktops/entry-level servers
• 6-8 memory modules: DP/MP servers with FB-DIMM memory modules
Figure: Number of memory modules supported per memory channel by Intel’s P4/Core 2 Duo north bridges

2.6. Attributes of memory channels (16)
Figure: Example 1. P4-based desktop motherboard (MSI’s 915G Combo motherboard with Intel’s 915G chipset) [61]: 4 DIMM slots; two DDR or DDR2 channels with a single DIMM slot on each channel

2.6. Attributes of memory channels (17)
Figure: Example 2. P4-based entry-level DP server motherboard (Supermicro’s P8SCT with Intel’s E7221 chipset) [63]: 4 DIMM slots; two DDR2 channels (Ch. A, Ch. B) with two DIMM slots on each channel

2.6. Attributes of memory channels (18)
Figure: Example 3. Block diagram of a Core 2 based four-processor MP server (Supermicro’s X7QC3 with Intel’s 7300 north bridge) [64]: 4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel

2.6. Attributes of memory channels (19)
Figure: Example 3. Core 2 based four-processor MP server motherboard (Supermicro’s X7QC3 with Intel’s 7300 north bridge) [64]: Xeon 7200 (DC)/7300 (QC) (Tigerton) processors; 4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel (up to 192 GB); ATI ES1000 graphics with 32 MB video memory; 7300 north bridge and ESB2 south bridge

2.6. Attributes of memory channels (20)
Processors: Xeon 7200 (Tigerton DC, Core 2), Xeon 7300 (Tigerton QC, Core 2); four DDR2 FB-DIMM channels with 8 DIMM slots on each channel, up to 512 GB
Figure: Example 4.
Block diagram of Intel’s Core 2 based 7300 (Caneland) MP platform with the 7300 (Clarksboro) chipset (9/2007) [65]

2.6. Attributes of memory channels (21)
Supported number of ranks per memory module
Rank: logical unit; memory module: physical unit
• A rank consists of the set of DRAM devices (of a given width) needed to achieve the expected data width of the memory module. E.g. a 64-bit wide rank consists of 8 8-bit wide or 4 16-bit wide DRAM devices.
• Optionally, a rank may include an additional DRAM device to hold ECC bits.
• The DRAM devices constituting a rank are mounted side by side onto a memory module.
• A rank usually covers one side of the memory module (when using x8 or x16 devices), but 64-bit wide ranks built up of x4 devices (16 devices) typically cover both sides.
• All devices of a rank share the address and command bus.
• All devices of a rank are selected by the same CS (Chip Select) signal, whereas different ranks have different CS signals.
A memory rank is sometimes also designated as a row.

2.6. Attributes of memory channels (22)
Figure: Connecting ranks to the memory controller [68]

2.6. Attributes of memory channels (23)
Memory module: physical unit
• A memory module is basically a PC card that carries one or more ranks and fits into a memory slot of the motherboard.
• Memory modules may be populated either on one side or on both sides.
A memory module may contain
• a single rank on one of its sides,
• a single rank spread over both of its sides, or
• two ranks, one on each of its sides.

2.6. Attributes of memory channels (24)
Figure: Example 1: One 64-bit wide DDR3 SO-DIMM rank consisting of 4 16-bit DRAM devices, mounted on one side of the module [67]

2.6. Attributes of memory channels (25)
Figure: Example 2: One 64-bit wide DDR3 SO-DIMM rank consisting of 8 8-bit DRAM devices, mounted on both sides of the module [67]

2.6. Attributes of memory channels (26)
Figure: Example 3.
Two 64-bit wide DDR3 SO-DIMM ranks, each consisting of 4 16-bit DRAM devices, mounted on both sides of the module [67]

2.6. Attributes of memory channels (27)
Supported number of ranks per memory module
• A single rank per memory module: in a few cases, usually as a restriction at higher DRAM speeds
• Dual ranks per memory module: typical implementation
Figure: Supported number of ranks (rows) per memory module
Examples
a) The north bridge of Intel’s 815 chipset supports
• up to three SDRAM-100 DIMMs with dual ranks, or
• up to three SDRAM-133 DIMMs with a single rank, or
• up to two SDRAM-133 DIMMs with dual ranks.
b) The north bridge of Intel’s P35 chipset for Core 2 Duo processors supports
• up to two DDR2-800/667 or DDR3-1066/800 DIMMs with dual ranks per channel.

2.6. Attributes of memory channels (28)
Supported attributes of DRAM devices
• DRAM type
• DRAM width
• DRAM density
• DRAM speed
Figure: Supported attributes of DRAM devices

2.6. Attributes of memory channels (29)
DRAM types (for general use), with year of introduction
• DRAMs with parallel bus connection
  - Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
  - Synchronous DRAMs (mainstream DRAM types): SDRAM (1996), DDR (2000), DDR2 (2004), DDR3 (2007)
• DRAMs with serial bus connection
  - Challenging DRAM types: DRDRAM (1999), XDR (2006)¹
  - FBDIMM (2006)
¹ Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers
Figure: DRAM types for general use (described in Sections 4, 5 and 6 of the chapter "DRAM devices")

2.6. Attributes of memory channels (30)
DRAM width
North bridges/memory controllers specify the width of the supported DRAM devices. Most recent north bridges/memory controllers support x8 and x16 DRAM devices.
DRAM density
North bridges/memory controllers specify the supported DRAM densities.
Example 1
The north bridge of Intel’s 815 chipsets for Pentium III processors supports SDRAM devices with 16 Mb/64 Mb/128 Mb/256 Mb densities.
Example 2
The north bridge of Intel’s Series 3 chipset family for Core 2 Duo and Core 2 Quad processors supports DDR2 and DDR3 devices with 512 Mb and 1 Gb densities.
DRAM speed
North bridges/memory controllers also specify the supported DRAM speeds.

2.6. Attributes of memory channels (31)
Example: Supported DRAM speeds of the north bridges of Intel’s 845xx (Brookdale) family of chipsets
Family features: single-channel SDR/DDR SDRAM (unbuffered), max. memory 2 GB; FSB 400 MHz (HT not supported) or 533/400 MHz (HT supported), depending on the member
MCH/GMCH (introduction): supported DRAM speed
• 845 (9/01): PC133
• 845 (1/02): DDR 266/200
• 845GL (5/02): PC133, DDR 266/200
• 845G (5/02): PC133, DDR 266/200
• 845E (5/02): DDR 266/200
• 845GV (10/02): DDR 266/200
• 845GE (10/02): DDR 333/266
• 845PE (10/02): DDR 333/266
Another example: the north bridges of Intel’s Series 3 chipsets for Core 2 Duo and Core 2 Quad processors support DDR2 devices with 667/800 MT/s or DDR3 devices with 800/1066 MT/s transfer rates.

3. Key performance parameters of main memories
• 3.1 Memory capacity
• 3.2 Memory bandwidth
• 3.3 Memory latency

3.1. Memory capacity (1)
Memory capacity (CM)
CM = nCU × nCH × nM × nR × CR
with
nCU: number of north bridges/memory control units
nCH: number of memory channels per north bridge/control unit
nM: number of memory modules per channel
nR: number of ranks per memory module
CR: rank capacity (device density × number of DRAM devices per rank)
E.g. the Core 2 based P35 chipset supports up to two memory channels with up to two dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density per rank. The resulting maximum memory capacity is:
CMmax = 1 × 2 × 2 × 2 × (8 × 1 Gb) = 8 × 1 GB = 8 GB

3.1. Memory capacity (2)
Crucial factors limiting the maximum capacity of main memories
• nM: number of memory modules supported per memory channel
• CR: rank capacity (device density × number of DRAM devices per rank)

3.1. Memory capacity (3)
Number of memory modules supported per memory channel
• Modules connected via a parallel bus (e.g. SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules; higher transfer rates typically limit the number of memory modules to one or two.
• Modules connected via a serial bus (e.g. FB-DIMM modules): 6-8 memory modules
Figure: Number of memory modules supported per memory channel

3.1. Memory capacity (4)
Rank capacity (CR)
CR = nD × D
with
nD: number of DRAM devices per rank (typically up to 8)
D: device density
E.g. a one-sided (single-rank) DDR3 memory module built up of 8 devices.

3.1. Memory capacity (5)
Figure: Evolution of DRAM densities (Mbit) and number of units shipped per year (based on [35]); densities rose from 16K through 64K, 256K, 1M, 4M, 16M, 64M and 256M to 1G, i.e. by ~4× every 4 years

3.1. Memory capacity (6)
Typical maximum main memory size (CMmax) of recent Core 2 based desktops, assuming
• 2 memory channels,
• 1 module per channel,
• dual-ranked modules,
• ranks of 8 x8 DDR2 or DDR3 devices of 1 Gb density (CR = 1 GB):
CMmax = 1 × 2 × 1 × 2 × 1 GB = 4 GB
Typical maximum main memory size of recent Core 2 based servers, assuming
• 4 memory channels,
• 6 modules per channel,
• dual-ranked modules,
• ranks of 8 x8 FB-DIMM DDR2 devices of 4 Gb density (CR = 4 GB):
CMmax = 1 × 4 × 6 × 2 × 4 GB = 192 GB

3.1. Memory capacity (7)
The rate of increasing DRAM densities
In accordance with Moore’s law (which states that the transistor count per chip doubles about every 24 months), DRAM densities grow about 4× every 4 years.
For the same number of control units/modules/ranks, the maximum size of main memories also grows about 4× every 4 years.
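The capacity formulas CR = nD × D and CM = nCU × nCH × nM × nR × CR can be checked with a small Python sketch. Assumptions not stated in the slides: binary prefixes (1 Gb = 2^30 bits, 1 GB = 2^30 bytes), and, for the optional ECC device, 8 ECC bits per 64 data bits; all function names are hypothetical. The usage lines reproduce the P35 (8 GB) and server (192 GB) examples.

```python
GBIT = 1 << 30  # bits in 1 Gb (binary prefix assumed)
GB = 1 << 30    # bytes in 1 GB (binary prefix assumed)


def devices_per_rank(rank_width_bits: int, device_width_bits: int, ecc: bool = False) -> int:
    """Number of DRAM devices forming one rank; with ECC, 8 extra bits per 64 data bits."""
    width = rank_width_bits + (rank_width_bits // 8 if ecc else 0)
    return -(-width // device_width_bits)  # ceiling division


def rank_capacity_bytes(n_devices: int, density_bits: int) -> int:
    """CR = nD x D (data devices only), converted from bits to bytes."""
    return n_devices * density_bits // 8


def memory_capacity_bytes(n_cu: int, n_ch: int, n_m: int, n_r: int, c_r: int) -> int:
    """CM = nCU x nCH x nM x nR x CR."""
    return n_cu * n_ch * n_m * n_r * c_r


if __name__ == "__main__":
    # A 64-bit rank: 8 x8, 4 x16 or 16 x4 devices; 9 x8 devices with an ECC device.
    print(devices_per_rank(64, 8), devices_per_rank(64, 16), devices_per_rank(64, 4),
          devices_per_rank(64, 8, ecc=True))
    # P35: 2 channels, 2 dual-ranked modules per channel, 8 x8 devices of 1 Gb per rank.
    cr_p35 = rank_capacity_bytes(devices_per_rank(64, 8), GBIT)
    print(memory_capacity_bytes(1, 2, 2, 2, cr_p35) // GB, "GB")
    # Server: 4 channels, 6 dual-ranked modules per channel, 8 x8 devices of 4 Gb per rank.
    cr_srv = rank_capacity_bytes(8, 4 * GBIT)
    print(memory_capacity_bytes(1, 4, 6, 2, cr_srv) // GB, "GB")
```

Keeping CR as a separate function mirrors the slides' point that, for a given number of control units, channels, modules and ranks, capacity scales directly with device density.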
3.2. Memory bandwidth (8)
Bandwidth of memory systems
Total bandwidth (BW) provided by a memory system:
BW = nCU × nCH × T × WM
with
nCU: number of north bridges/memory control units
nCH: number of memory channels per north bridge/control unit
T: transfer rate of the memory modules (number of data transfers/s)
WM: data width of the memory modules
E.g. a memory system with a single dual-channel controller and 8-byte wide DDR2-800 modules provides a total bandwidth of:
BW = 1 × 2 × 800 MT/s × 8 B = 12.8 GB/s
Processors with an increasing number of cores obviously require increasingly higher memory bandwidth.

3.2. Memory bandwidth (9)
The crucial factor limiting the bandwidth of the main memory is the transfer rate of the memory module (number of data transfers/s).
The transfer rate of the memory module (T) equals the transfer rate of the DRAM devices used.
The peak transfer rate (Tmax) of synchronous DRAM devices:
Tmax = (1/tCCD) × FW
with
tCCD: minimum column cycle time of the memory cell array
FW: fetch width of the memory cell array

3.2. Memory bandwidth (10)
The minimum column cycle time (tCCD) of the memory cell array
tCCD (core column delay) is the minimum time interval between consecutive Reads or Writes.
Figure: The interpretation of tCCD [36]
Remark
tCCD is also designated as the Read/Write command to Read/Write command delay.

3.2. Memory bandwidth (11)
Figure: The evolution of the column cycle time (tCCD) in different SDRAM types (ns) [37]
Note: The minimum column cycle time (tCCD) of synchronous DRAMs is:
• SDRAM: 7.5 ns
• DDR/DDR2/DDR3: 5 ns

3.2. Memory bandwidth (12)
The fetch width (FW) of the memory cell array specifies how many times more bits the cell array fetches per column cycle than the data width of the device.
E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4 = 16 bits from the memory cell array per column cycle.
The fetch width (FW) of the memory cell array of synchronous DRAMs:
• SDRAM: 1
• DDR: 2
• DDR2: 4
• DDR3: 8
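The peak transfer rate and bandwidth formulas can be evaluated with a short Python sketch, using the tCCD and FW values listed for synchronous DRAMs. Decimal units are assumed (1/ns = 1000 MT/s, 1 GB/s = 1000 MB/s, as in the slides' 12.8 GB/s example); the function names are hypothetical.

```python
# Minimum column cycle time (ns) and fetch width per DRAM type, as given in the slides.
T_CCD_NS = {"SDRAM": 7.5, "DDR": 5.0, "DDR2": 5.0, "DDR3": 5.0}
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def peak_transfer_rate_mts(dram_type: str) -> float:
    """Tmax = (1 / tCCD) x FW, in MT/s (1/ns corresponds to 1000 MT/s)."""
    return 1000 / T_CCD_NS[dram_type] * FETCH_WIDTH[dram_type]


def bandwidth_gbs(n_cu: int, n_ch: int, t_mts: float, w_m_bytes: int) -> float:
    """BW = nCU x nCH x T x WM, returned in GB/s (decimal)."""
    return n_cu * n_ch * t_mts * w_m_bytes / 1000


if __name__ == "__main__":
    for dram in ("SDRAM", "DDR", "DDR2", "DDR3"):
        print(dram, round(peak_transfer_rate_mts(dram)), "MT/s")
    # Single dual-channel controller with 8 B wide DDR2-800 modules.
    print(bandwidth_gbs(1, 2, 800, 8), "GB/s")
    # Cell BE MIC: 2 XDR channels at 3.2 GT/s, 4 data bytes each.
    print(bandwidth_gbs(1, 2, 3200, 4), "GB/s")
```

Since tCCD stays at 5 ns from DDR to DDR3, the sketch makes visible that the peak transfer rate of these generations grows only through the fetch width.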
3.2. Memory bandwidth (13)
The peak transfer rates of the different DRAM technologies, Tmax = (1/tCCD) × FW:
• SDRAM: 1/7.5 ns × 1 ≈ 133 MT/s
• DDR: 1/5 ns × 2 = 400 MT/s
• DDR2: 1/5 ns × 4 = 800 MT/s
• DDR3: 1/5 ns × 8 = 1600 MT/s

3.2. Memory bandwidth (14)
Figure: The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets (transfer rate in MT/s, 1996-2008: SDRAM 66, SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800, DDR3 1067; trend ~10×/10 years)

3.2. Memory bandwidth (15)
The evolution of peak transfer rates of synchronous DRAMs
Peak transfer rates grow by ≈ 10× every 10 years, i.e. they double roughly every 3-4 years.
Sources of the evolution: the introduction of new synchronous DRAM technologies (SDRAM/DDR/DDR2/DDR3), more specifically increasingly advanced approaches to improve, first of all,
• signaling (by using SSTL_2/1.8/1.5, differential CK/DQS),
• synchronization (by using source synchronization, DLLs to align CK with DQs etc.) and
• line terminations (by using ODT, dynamic ODT, ZQ calibration etc.).

3.2. Memory bandwidth (16)
The evolution of processor clock frequencies vs. transfer rates of main memories in mainstream processors

3.2. Memory bandwidth (17)
The evolution of processor clock frequencies (fC) in desktops
Figure: Evolution of clock frequencies in Intel’s desktop processors (fC in MHz vs. year of first volume shipment, 1978-2005; 8088, 286, 386, 486, 486-DX2, 486-DX4, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4; growth of ~10×/10 years, then ~100×/10 years, finally levelling off)
3.2. Memory bandwidth (19)
The evolution of processor clock frequencies (fC) in desktops between 1995 and 2003
Figure: Evolution of clock frequencies in Intel’s desktop processors (the period 1995-2003 with ~100×/10 years highlighted)

3.2. Memory bandwidth (22)
In the period of about 1995-2003
• clock frequencies rose at a rate of ≈ 100× every 10 years, whereas
• transfer rates of main memories rose only at a rate of ≈ 10× every 10 years.
In this period the gap between clock frequencies and memory transfer rates became continuously wider.
In this period higher clock rates were the main source of higher processor performance. Since higher processor performance causes higher memory traffic, a strong motivation arose to increase the bandwidth of main memories by widening the datapath to the main memory, first of all by introducing dual memory channels. Dual memory channels became commonplace even in desktops.
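The growth rates quoted above can be turned into doubling times under the idealized assumption of constant exponential growth. This Python sketch (hypothetical helper names) shows that ≈10×/10 years corresponds to doubling about every 3 years, ≈100×/10 years to about every 1.5 years, and the ~4×/4 years of DRAM densities to every 2 years; over one decade the clock/memory gap widens by 100/10 = 10×.

```python
import math


def doubling_time_years(factor: float, period_years: float) -> float:
    """Years needed to double, given growth by `factor` over `period_years` at a constant rate."""
    return period_years * math.log(2) / math.log(factor)


def gap_growth(clock_factor: float, memory_factor: float) -> float:
    """Factor by which the clock-frequency/memory-transfer-rate gap widens over the same period."""
    return clock_factor / memory_factor


if __name__ == "__main__":
    print("memory transfer rates:", round(doubling_time_years(10, 10), 1), "years")
    print("clock frequencies 1995-2003:", round(doubling_time_years(100, 10), 1), "years")
    print("DRAM densities:", round(doubling_time_years(4, 4), 1), "years")
    print("gap widening per decade:", gap_growth(100, 10))
```

Real data scatter around these trend lines, so the results are only indicative of the ratios the slides discuss.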
3.2. Memory bandwidth (23)
After about 2003, however, clock frequencies saturated (due to hitting the thermal wall), and single-core processors remained the mainline until about 2005.
In the period of about 2003-2005 the gap between clock frequencies and memory transfer rates became narrower.
Nevertheless, beginning with about 2005 the era of multicores emerged, with the core count doubling about every two years.
Beginning with about 2005, a new scenario became dominant, with steadily increasing bandwidth/transfer rate requirements.

3.2. Memory bandwidth (24)
The evolution of processor clock frequencies (fC) in desktops after about 2003
Figure: Evolution of clock frequencies in Intel’s desktop processors after about 2003 (levelling off)

3.2. Memory bandwidth (25)
The status quo in increasing bandwidth/transfer rates

3.2. Memory bandwidth (26)
Figure: Evolution of the bandwidth of dual-channel synchronous DRAM memory systems (double data rate SDRAM migration) [56]

3.2. Memory bandwidth (27)
Figure: Evolution of transfer rates (per-pin bandwidth figures) of different DRAM types [40]

3.3. Memory latency (1)
Memory latency
• Device-level memory latency
• System-level memory latency

3.3. Memory latency (2)
Figure: Estimated maximum and minimum read¹ latencies of DRAM devices (ns), 1981-2007, for the typical DRAM parts used with Intel desktop processors and chipsets from the AT/386 era (DRAM, FPM) through EDO, SDRAM, RDRAM and DDR parts (815², 820², 845³, 850) to DDR2/DDR3 parts in Core 2 systems
¹ Read latency of DRAM, FPM, EDO and BEDO parts = tRAC (row access time: time from row address until data valid); read latency of SDRAM parts = CL + tRCD (CAS latency + row-to-column delay)
² The 815 chipset supports SDRAMs, while the 820 supports RDRAMs.
³ A new revision of the 845 supports DDRs instead of SDRAMs.

3.3. Memory latency (3)
Figure: Estimated typical system-level memory latency in x86-based PCs (in ns)
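Converting a latency given in ns into processor clock cycles only requires the processor clock frequency: cycles = latency [ns] × fC [GHz]. The Python sketch below is illustrative only; the helper name and the sample values (120 ns at 33 MHz, 100 ns at 3000 MHz) are hypothetical and not read off the figures.

```python
def latency_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Memory latency in processor clock cycles: t [ns] x f [MHz] / 1000."""
    return latency_ns * clock_mhz / 1000


if __name__ == "__main__":
    # Hypothetical early PC: ~4 cycles of waiting per memory access.
    print(latency_cycles(120, 33))
    # Hypothetical recent PC: a somewhat shorter latency costs hundreds of cycles.
    print(latency_cycles(100, 3000))
```

This is why latencies expressed in processor clock cycles grow as clock frequencies rise, even when the latencies in ns decrease.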
DRAM parts (bits) 280 * 10 * 10 1 220 * 40 * 30 20 3 2 240 * 83 84 85 AT 386 DX (286) DRAM 64 K 86 87 DRAM 88 89 1990 91 92 486 DX DRAM DRAM FPM FPM 96 97 98 99 2000 01 02 93 94 95 P PPro 420TX 430LX FPM FPM PII 03 04 05 PIII P4 430VX 440ZX 8152 850 845 915 8453 8202 EDO EDO SDRAM DDR DDR EDO RDRAM FPM FPM SDRAM SDRAM RDRAM DDR2 SDRAM 430FX 64 K 64 K 256 K 256 K 256 K 128 K 128 K 256 K 1M 1M 1M 4M 4M 16 M 16 M 4M 64 M 64 M 128 M 64 M 256 M 256 M 64 M 128 M 256 M 128 M 512 M 512 M 1G 256 M 256 M 1G 512 M 16 M Year 06 07 08 Core2 865 835 DDR2 DDR3 DDR2 512 M 512 M 1G 1G 2G Figure 5.1c: System-level memory latencies in x86-based PCs (in proc. clock cycles) 4. References (1) [1]: 64MB Apple G3 Beige 168p SDRAM DIMM, http://www.memoryx.net/apl168s64.html [2]: 4, 8 MEG x 32 DRAM SIMMs, Micron, http://www.pjrc.com/mp3/simm/datasheet.html [3]: 168 Pin, PC133 SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.2 [4]: 184 Pin Unbuffered DDR SDRAM DIMM Family, JEDEC Standard No. 21-C, Page 4.5.10 [5]: Direct Rambus DRAMM RIMM Module, 512 MB, MC-4R512FKE6D, Elpida, http://pdf1.alldatasheet.com/datasheet-pdf/view/60081/ELPIDA/MC-4R512FKE6D.html [6]: DDR2 SDRAM UDIMM Features, Micron, http://www.micron.com/products/modules/udimm/partlist [7]: DDR3 SDRAM UDIMM Features, Micron, http://www.micron.com/products/modules/udimm/partlist [8]: DDR2 SDRAM FBDIMM Features, Micron, http://www.micron.com/products/modules/fbdimm/partlist [9]: Torres G., „Memory Tutorial”, July 19, 2005, Hardwaresecrets, http://www.hardwaresecrets.com/article/167/1 [10]: Besedin D., „First look at DDR3”, Digit-life, June 29, 2007, http://www.digit-life.com/articles2/mainboard/ddr3-rmma.html 4. 
References (2) [11]: http://www.hardwaresecrets.com/fullimage.php?image=2862 [12]: http://cgi.ebay.com/Vintage-Microsoft-8-Bit-ISA-PC-RAM-Card-W-Gold-5150_ W0QQitemZ310017171151QQcmdZViewItem [13]: http://www.hardwaresecrets.com/fullimage.php?image=2856 [14]: http://www.memex.com.au/images/72psimm.jpg [15]: Ahn J.-H., „DRAM Operation & Architecture,” 2007. 9. 10., Hynix, http://netro.ajou.ac.kr/~jungyol/memory2.pdf [16]: http://www.twinmos.com/dram/dram_p_dt_ddr.htm#s [17]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg [18]: http://www.twinmos.com/dram/dram_p_dt_ddr3_1333.htm#s [19]: http:// item.express.ebay.com/16mb-EDO-3-3V-72-Pin-SODIMM-LAPTOP-RAMLAPTOP-16mb-EDO_W0QQitemZ230060958674QQihZ013QQcmdZExpressItem [20]: http:// www.twinmos.com/dram/dram_p_nb_sdr_sodimm.htm [21]: http:// www.cdw.com/shop/products/default.aspx?EDC=915882 [22]: http:// laptoping.com/category/laptop-memory [23]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg [24]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg 4. References (3) [25]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/sdram/ SD18C32_64_128x72D.pdf [25]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/sdram/ sd18c32_64_128x72.pdf [26]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr/ DDF18C64_128x72D.pdf [27]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr2/ HTF18C64_128_256x72D.pdf [28]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr3/ JSF18C256x72PD.pdf [29]: PLL Clock Driver for 2.5V DDR-SDRAM Memory, Datasheet, Pericom, Febr. 2003, http://www.pericom.com/pdf/datasheets/PI6CV857.pdf [30]: PC2100 and PC1600 DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Revison 1.3, Jan. 
2002, http://www.jedec.org/download/search/4_20_04R13.PDF [31]: Supermicro Motherboards, http://www.supermicro.com/products/motherboard/ [32]: http://www.pricegrabber.com/search_getprod.php/masterid=3191326 [33]: Definition of CDCV857 PLL Clock Driver for Registered DDR DIMM Applications, JESD82, JEDEC, July 2000 4. References (4) [34]: http://www.tranzistoare.ro/datasheets2/32/327037_1.pdf [35]: DRAM Pricing – A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf [36]: 16 Mb Synchronous DRAM, MT48LC4M4A1/A2, MT48LC2M8A1/A2 Micron, http://datasheet.digchip.com/297/297-04447-0-MT48LC2M8A1.pdf [37]: Rhoden D., „The Evolution of DDR”, Via Technology Forum, 2005, http://www.via.com.tw/en/downloads/presentations/events/vtf2005/vtf05hd_inphi.pdf [38]: Haskill, „The Love/Hate relationship with DDR SDRAM Controllers,” Mosaid, Oct. 2006, http://www.mosaid.com/corporate/products-services/ip/ SDRAM_Controller_whitepaper_Oct_2006.pdf [39]: Van Roon T., „What exactly is a PLL?,” April 2006, http://www.uoguelph.ca/~antoon/gadgets/pll/pll.html [40]: Choi J. H., „High Speed DRAM,” Memory Division, Samsung, 2004, http://asic.postech.ac.kr/1.Nrl/2.NRL%20Seminar/invitation/041208ChoiJH.pdf [41]: Introduction to Xilinx, Xilinx FPGA Design Workshop, http://www.eas.asu.edu/~kchatha/ cse320_f07/xilinx_intro.ppt [42]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Febr. 2003, XILINC inc. 4. References (5) [43]: 64-bit Flow-Thru Error Detection and Correction Unit, IDT49C466, Integrated Device Technology Inc., 1999, http://www.digchip.com/datasheets/parts/ datasheet/222/IDT49C466.php [44]: Tam S., „Single Error Correction and Double Error Detection,”, XILINX Application Note XAP645 (v.2.2), Aug. 2006, http://www.xilinx.com/support/documentation/ application_notes/xapp645.pdf [45]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 
2002, http://www.jedec.org [46]: Understanding DDR3 Serial Presence Detect (SPD) Table, July 17, 2007, Simmtester, http://www.simmtester.com/PAGE/news/showpubnews.asp?num=153 [47]: DDR2 DIMM SPD Definition, August 25, 2006, http://docmemory.com/page/news/showpubnews.asp?num=141 [48]: Memory Module Serial Presence-Detect, TN-04-42, Micron, 2002 http://download.micron.com/pdf/technotes/TN_04_42_C.pdf [49] Intel 845 Chipset: 8245 Memory Controller Hub (MCH) for DDR, Datasheet, Jan. 2002, Intel, No. 298604-001 [50] Intel 975X Express Chipset: 82975X Memory Controller Hub (MCH), Datasheet, Nov. 2005, Intel, No. 310158-001 [51] Supermicro X6DH8-G2, X6DHE-G2 Mainboards User’s Manual, Rev. 1.1b, June 2007, SUPER MICRO Computer Inc. 4. References (6) [52]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2006. http://www.intel.com/design/chipsets/datashts/313071.htm [53]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007, http://www.intel.com/design/chipsets/datashts/313082.htm [54]: „Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1 [55]: PCI Technology overview, Febr. 2003, http://www.digi.com/pdf/prd_msc_pcitech.pdf [56]: DDR3 SDRAM, Samsung, http://www.samsung.com/global/business/semiconductor/ products/dram/Products_DDR3SDRAM.html [57]: Le H. Q. et al., „IBM POWER6 microarchitecture,” IBM J. R&D, Vol. 51, No. 6, 2007. pp 639-662 [58]: Kanter D., „Inside Barcelona: AMD's Next Generation,” May 2007, http://www.realworldtech.com/includes/templates/articles.cfm? ArticleID=RWT051607033728&mode=print [59]: Golla R., „Niagara2: A Highly Threaded Server-on-a-Chip,” Oct. 2006 http://www.opensparc.net/pubs/preszo//06/04-Sun-Golla.pdf [60]: Hofstee P., „Tutorial: Hardware and Software Architectures for the CELL BROADBAND ENGINE processor”, IBM Corp., September 2005 http://www.crest.gatech.edu/conferences/cases2005/pdf/Cell-tutorial.pdf 4. 
References (7) [61]: 915 P/G Combo Mainboard (MS-7058) Manual, Mai 2004, MSI [62]: Haas J. & Vogt P., „Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level,” Technology Intel Magazin, http://www.intel.com/technology/magazine/ computing/fully-buffered-dimm-0305.htm [63]: http://www.supermicro.com/manuals/motherboard/E7221/MNL-0776.pdf [64]: http://www.supermicro.com/manuals/motherboard/7300/MNL-0955.pdf [65]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007, http://www.intel.com/design/chipsets/datashts/313082.htm [66]: Vogt P., Fully Buffered DIMM (FB-DIMM) Server Memory Architecture,”, Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf [67]: 204-Pin DDR3 SDRAM Unbuffered SO-DIMM Design Specification, JEDEC Standard No. 21C, Page 4.20.18-1 [68]: Jacob B. & Wang D., „Memory Systems: Circuits, Architecture and Performance Analysis,” Lecture notes, University of Maryland, ENEE759H, Spring 2005 [69]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/ SD9C16_32x72.pdf [70]: Solanki V., „Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf
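As a worked illustration of the bandwidth and latency quantities discussed in Sections 3.2 and 3.3, the following minimal Python sketch computes the peak bandwidth of a multi-channel DDR memory system, the device-level SDRAM read latency CL + tRCD in ns, and a latency expressed in processor clock cycles. It assumes a 64-bit (8-byte) data path per channel and CL and tRCD counted in memory clock cycles (tCK). The function names and the example parameters (dual-channel DDR2-800, CL = tRCD = 5, a 3 GHz processor, a 100 ns system-level latency) are hypothetical illustrations, not values read from the figures above.

```python
# Minimal sketch. Assumptions: 64-bit channels, CL and tRCD counted in memory
# clock cycles (tCK). Function names and example values are hypothetical.

BYTES_PER_TRANSFER = 8  # assumed 64-bit data path per channel (ECC bits ignored)


def peak_bandwidth_gbs(channels: int, transfer_rate_mts: float) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s): channels x 8 bytes x transfers/s."""
    return channels * BYTES_PER_TRANSFER * transfer_rate_mts * 1e6 / 1e9


def sdram_read_latency_ns(cl: int, trcd: int, clock_mhz: float) -> float:
    """Device-level SDRAM read latency CL + tRCD, converted from cycles to ns."""
    tck_ns = 1e3 / clock_mhz  # memory clock period in ns
    return (cl + trcd) * tck_ns


def latency_in_cpu_cycles(latency_ns: float, cpu_clock_mhz: float) -> float:
    """The same latency expressed in processor clock cycles."""
    return latency_ns * cpu_clock_mhz / 1e3


if __name__ == "__main__":
    # Dual-channel DDR2-800: 800 MT/s per channel.
    print(f"{peak_bandwidth_gbs(2, 800):.1f} GB/s")  # 12.8 GB/s
    # DDR2-800 part with CL = tRCD = 5: 400 MHz memory clock, so tCK = 2.5 ns.
    print(f"{sdram_read_latency_ns(5, 5, 400):.1f} ns")  # 25.0 ns
    # A hypothetical 100 ns system-level latency seen by a 3 GHz core.
    print(f"{latency_in_cpu_cycles(100, 3000):.0f} cycles")  # 300 cycles
```

The last conversion reflects the effect tracked by the final latency figure: since processor clock frequencies rose much faster than system-level latencies fell, a latency of a given length in ns corresponds to ever more processor clock cycles.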