Platforms I.

Dezső Sima, Olivér Asztalos
November 2014
(Ver. 1.7)
© Sima Dezső, Olivér Asztalos 2012-2014
Contents
• 1. Introduction to platforms
• 2. Main components of platforms
• 3. Platform architectures
• 4. Memory subsystem design considerations
• 5. References
1. Introduction to platforms
• 1.1. The notion of platform
• 1.2. Description of particular platforms
• 1.3. Representation forms of platforms
• 1.4. Compatibility of platform components
1.1. The notion of platform
1.1 The notion of platform (1)
1.1 The notion of platform
The notion of a platform is widely used in different segments of the IT industry, e.g. by
IC manufacturers, system providers or even software suppliers, with different interpretations.
Here we focus on the platform concept as used typically by system providers.
1.1 The notion of platform (2)
Modular (unified) system design and the notion of a platform
Modular system design means that the system architecture is partitioned into a few
standard components (modules), such as the processor, the memory control hub (MCH) and the
I/O control hub (ICH), that are interconnected by specified (standard) interconnections.
Figure: Intel's Core 2 Duo (and Core 2 Extreme, the highest speed model) aimed
DT platform (the Bridge Creek platform):
Core 2 Duo / Core 2 Extreme (2C) - FSB: 1066/800/533 MT/s - 965 Series MCH (with ME) -
two memory channels: DDR2-800/666/533, two DIMMs per channel - DMI and C-link - ICH8
1.1 The notion of platform (3)
Modular system design became part of scientific research at the end of the 1990s, see e.g. [4].
Remark
The need for a modular system design, called platform design, arose in the PC industry
when PCI-based system designs were replaced by port based system designs,
around 1998-1999.
1.1 The notion of platform (4)
Figure: Late PCI-based system architecture (~1998), used typically with Pentium II/III
processors and built around Intel's 440xx chipset, vs. early port-based system architecture
(~1999), used first with the Pentium III and built around Intel's 810 chipset.
• In the PCI-based design the system controller (processor bus, AGP, EDO/SDRAM main memory)
and the peripheral controller (2xIDE/ATA33/66, 2xUSB, PCI device adapters) are both attached
to the PCI bus, and a PCI-to-ISA bridge attaches ISA device adapters and legacy devices on
the ISA bus.
• In the port-based design the system controller (processor bus, AGP, SDRAM main memory)
connects to the peripheral controller over a dedicated hub interface; the peripheral
controller provides 2xIDE/ATA 33/66/100, 2x/4x USB, the PCI bus, LPC for the Super I/O
(KBD, MS, etc.) and AC'97 (legacy and/or slow devices).
1.1 The notion of platform (5)
Main goals of modular system level design
• to reduce the complexity of designing complex systems by partitioning them into modules,
• to have stable interfaces (at least for a few years) interconnecting the modules,
and in this way
• to minimize design rework when upgrading a given system design, e.g. when moving from
one processor generation to the next, and thus
• to shorten the time to market.
Co-design of platform components
Platform components are typically co-designed, announced and delivered as a set.
1.1 The notion of platform (6)
The notion of platforms
System providers, however, may use the notion of a platform either in a more general or a more
specific sense.
Interpretation of the notion of a platform:
• In a more general sense: a modular system design targeting a given application area,
used in terms like DT or MP platforms.
• In a more specific sense: a particular modular system architecture developed for a given
application area, such as a given DT or MP platform, like Intel's Sandy Bridge based
Sugar Bay DT platform or AMD's Phenom II X4 based Dragon platform for gamers (2009).
1.1 The notion of platform (7)
Benefits of the platform concept for computer manufacturers
• With the platform concept in mind, manufacturers like Intel or AMD plan, design
and market all key components of a platform, such as the processor(s)
and the related chipset, as an integrated entity [5].
• This is beneficial for the manufacturers, since it motivates OEMs, as system providers,
to buy all key parts of a computer system from the same manufacturer.
1.1 The notion of platform (8)
Benefits of the platform concept for customers
The platform concept is beneficial for the customers as well, since an integrated "backbone"
of the system architecture promises a more reliable and more cost effective system.
1.1 The notion of platform (9)
Interpretation of the notion of a platform in a more specific sense
In a more specific sense the notion of a platform refers to a particular modular system
architecture developed for a given application area, such as a DT, DP or MP platform.
In this sense a platform is interpreted as a standardized backbone of a system architecture,
developed for a given application area, that is built up typically of
• the processor or processors,
• the chipset,
• the memory subsystem (MSS), attached by a specific memory interface,
• in some cases, such as in mobile or business oriented DT platforms, also the
networking component [7], as well as
• the buses interconnecting the above components of the platform.
Basic components of a platform: processor or processors - chipset - the memory subsystem -
(LAN controller) - buses interconnecting the preceding basic components.
Subsequently, we will focus on the interpretation of the notion platform in this latter sense.
1.1 The notion of platform (10)
Example 1: Intel’s Core 2 aimed home user DT platform (Bridge Creek) [3]
Figure: Block diagram of the platform (processor connected by a 1066 MT/s FSB to the MCH,
display card attached to the MCH, two memory channels with 2 DIMMs/channel, C-link,
DMI to the ICH).
1.1 The notion of platform (11)
Example 2: Intel’s Nehalem-EX aimed Boxboro-EX MP server platform, assuming 1 IOH
Figure: The platform comprises four Xeon 7500 (Nehalem-EX, Beckton, 8C) / Xeon E7-4800
(Westmere-EX, 10C) processors, fully interconnected by QPI links. Each processor attaches
DDR3-1067 memory through four Scalable Memory Buffers (SMBs) over 2x4 SMI channels;
the 7500 IOH connects to the processors by QPI and to the ICH10 (with ME) by ESI.
Interfaces connecting platform components:
SMI: serial link between the processors and the SMBs
SMB: Scalable Memory Buffer (parallel/serial conversion)
ME: Management Engine
1.1 The notion of platform (12)
The structure of a platform is termed its architecture (or topology).
It describes the basic components and their interconnections and will be discussed in Section 3.
1.1 The notion of platform (13)
Historical remarks
System providers began using the notion "platform" around 2000, e.g.:
• Philips' Nexperia digital video platform (1999),
• Texas Instruments' (TI) OMAP platform for SOCs (2002),
• Intel's first generation mobile oriented Centrino platform for laptops, designated as the
Carmel platform (3/2003).
Intel contributed significantly to spreading the notion of a platform when, based on the
success of their Centrino platform, they introduced this concept also for their desktops [5]
and servers [6], [7] in 2004.
1.1 The notion of platform (14)
Intel’s early server and workstation roadmap from Aug. 2004 [6]
Note
a) This roadmap already makes use of the notion platform without revealing platform names.
b) In 2004 Intel made a transition from 32 bit systems to 64 bit systems.
1.1 The notion of platform (15)
Intel’s multicore platform roadmap announced at the IDF Spring 2005 [8]
Note
This roadmap includes also the particular platform designations for desktops, UP servers etc.
1.2. Description of a particular platform
1.2 Description of a particular platform (1)
Description of a particular platform includes:
• detailing the platform architecture.
Example: The Tylersburg DT platform (2008): Processor - MCH - ICH
1.2 Description of a particular platform (2)
Detailing the platform architecture includes specifying the architecture (topology) of the
processor, memory and I/O subsystems (to be discussed in Section 3).
Example: The Tylersburg DT platform (2008): Processor - MCH - ICH
It is concerned with issues such as whether the processors of an MP server are connected
to the MCH via an FSB or otherwise, or whether the memory is attached to the
system architecture through the MCH or through the processors, etc.
1.2 Description of a particular platform (3)
Description of a particular platform includes further:
• identification of the platform components.
Example: The Tylersburg DT platform (2008):
Processor: 1. gen. Nehalem (4C) / Westmere-EP (6C) - MCH: X58 IOH - ICH: ICH10
1.2 Description of a particular platform (4)
Description of a particular platform includes finally:
• specification of the interfaces interconnecting the platform components.
Example: The Tylersburg DT platform (2008):
Processor (1. gen. Nehalem (4C) / Westmere-EP (6C)) - QPI - MCH (X58 IOH) - DMI - ICH (ICH10)
1.2 Description of a particular platform (5)
Remark
The specification of a platform is completed by the datasheets of the related platform
components.
1.2 Description of a particular platform (6)
Dependence of the platform architecture on the platform category
Platforms may be classified according to the target area of application, such as
• mobile platforms,
• desktop (DT) platforms,
• dual processor (DP) platforms and
• quad processor (MP) platforms.
Of course, beyond the above categories further processor categories and related platforms
exist as well, such as embedded processors and related platforms.
In conformity with the different platform categories, different platform architectures arise,
i.e. architectures of mobile, DT, DP and MP platforms.
In these slides platform architectures will be discussed in Section 3, restricted however
to DT, DP and MP platforms.
1.3. Representation forms of platforms
1.3 Representation forms of platforms (1)
1.3 Representation forms of platforms
a) Thumbnail representation
b) Extended representation (an arbitrarily chosen representation form in these slides)
c) Block diagram of a platform.
1.3 Representation forms of platforms (3)
a) Thumbnail representation
It is a concise representation of a particular platform.
In particular, the thumbnail representation
• reveals the platform architecture,
• identifies the basic components of a platform, such as the processor or processors, the chipset,
in some cases (e.g. in mobile platforms) also the Gigabit Ethernet controller,
• and specifies the interconnection links (buses) between the platform components.
Example
Core 2 Duo / Core 2 Extreme (2C) - FSB: 1066/800/533 MT/s - 965 Series MCH (with ME) -
two DDR2 channels: DDR2-800/666/533, two DIMMs per channel - DMI and C-link - ICH8
Intel's Core 2 Duo aimed, home user oriented platform (the Bridge Creek platform)
1.3 Representation forms of platforms (4)
b) Extended representation
This kind of representation
• indicates a few additional data of the processor and the chipset
(like data of the die, the cache system or the memory),
• reveals the dates of the introduction of platform components, and
• identifies compatibility ranges of processors or chipsets
in platforms by encircling compatible components,
• but lacks the graphical representation of the platform.
Example: the Core 2-aimed (65 nm) Bridge Creek DT platform:
• DT cores (7/2006): Core 2 Duo 2C: E6xxx/E4xxx; Core 2 Extreme 2C: X6800
(Conroe: E6xxx/X6800¹, Allendale: E4xxx¹); 65 nm; Conroe: 291 mtrs/143 mm²,
Allendale: 167 mtrs/111 mm²; Conroe: 4 MB / Allendale: 2 MB L2;
X6800/E6xxx: 1066 MT/s, E4xxx: 800 MT/s; LGA775
• MCH (6/2006): 965 Series (Broadwater); FSB: 1066/800/533 MT/s;
2 DDR2 channels: DDR2-800/666/533, 4 ranks/channel, 8 GB max.
• ICH (6/2006): ICH8
¹The Allendale is a later stepping (steppings L2/M0) of the Core 2 (steppings B2/G0)
that provided typically only 2 MB L2 and appeared 1/2007.
1.3 Representation forms of platforms (5)
Example for stating the compatibility range of a platform
The Core 2 Duo aimed DT platform that targets home users (designated as the
Bridge Creek platform), Core 2-aimed (65 nm):
• DT core (7/2006): Core 2 Duo (2C): E6xxx/E4xxx; Core 2 Extreme (2C): X6800;
E6xxx/X6800¹: Conroe, E4xxx¹: Allendale; 65 nm; Conroe: 291 mtrs/143 mm²,
Allendale: 167 mtrs/111 mm²; Conroe: 4 MB / Allendale: 2 MB L2;
X6800/E6xxx: 1066 MT/s, E4xxx: 800 MT/s; LGA775
• MCH (6/2006): 965 Series (Broadwater); FSB: 1066/800/533 MT/s;
2 DDR2 channels: DDR2-800/666/533, 4 ranks/channel, 8 GB max.; ME;
C-link and DMI to the ICH
• ICH (6/2006): ICH8
Beyond the target processor this platform may be used also with
• the previous Pentium D/EE and Pentium 4 6x0/6x1/EE and
• the subsequent Core 2 Quad lines of processors,
as shown on the next slides.
¹The Allendale is a later stepping (steppings L2/M0) of the Core 2 (steppings B2/G0)
that provided typically only 2 MB L2 and appeared 1/2007.
1.3 Representation forms of platforms (6)
Support of Pentium 4/D/EE processors on the Core 2-aimed (65 nm) Bridge Creek platform:
• Pentium 4 6x0/6x1/EE (2/2005; Prescott-2M, 1C): 90 nm; 169 mtrs; 135 mm²; 2 MB L2;
800 MT/s; two-way multithreading; LGA775
• Pentium D/EE 8xx¹ (5/2005; Smithfield, 2x1C): 90 nm; 2x115 mtrs; 2x103 mm²; 2x1 MB L2;
800/533 MT/s; no multithreading; LGA775
• Pentium D/EE 9xx²,³ (1/2006; Presler, 2x1C): 65 nm; 2x188 mtrs; 2x81 mm²; 2x2 MB L2;
1066/800 MT/s; no multithreading; LGA775
• Core 2 Duo/Extreme (2C, 7/2006): Core 2 Duo: E6xxx/E4xxx, Core 2 Extreme: X6800
(Conroe/Allendale, data as before)
The MCH (965 Series, Broadwater, 6/2006; FSB: 1066/800/533 MT/s; 2 DDR2 channels:
DDR2-800/666/533, 4 ranks/channel, 8 GB max.) supports also the Pentium 4 6x0/6x1/EE
(90 nm) and the Pentium D/EE (90/65 nm) processors; the ICH is the ICH8 (6/2006).
¹Pentium EE 840 supports only 800 MT/s
²Pentium D 9xx support only 800 MT/s
³Pentium EE 955/965 supports only 1066 MT/s
1.3 Representation forms of platforms (7)
Support of Core 2 Quad processors on the Core 2-aimed (65 nm) Bridge Creek DT platform:
• DT core (7/2006): Core 2 Duo (2C): E6xxx/E4xxx; Core 2 Extreme (2C): X6800
(Conroe/Allendale, data as before)
• Core 2 Quad (2x2C, 11/2006): Q6xxx (Kentsfield); 65 nm; 2x291 mtrs/2x143 mm²;
2x4 MB L2; 1066 MT/s; LGA775
• MCH (6/2006): 965 Series (Broadwater); FSB: 1066/800/533 MT/s; 2 DDR2 channels:
DDR2-800/666/533, 4 ranks/channel, 8 GB max.; supports also the Core 2 Quad
processors (65 nm)
• ICH (6/2006): ICH8
1.3 Representation forms of platforms (8)
c) Block diagram of a platform
Example: The Core 2 aimed home user DT platform (Bridge Creek)
(without an integrated display controller) [3]
Figure: Block diagram (processor on a 1066 MT/s FSB, display card attached to the MCH,
two memory channels with 2 DIMMs/channel, C-link to the ICH).
1.4. Compatibility of platform components
1.4 Compatibility of platform components (1)
1.4 Compatibility of platform components
One of the goals of platform based designs is to use stabilized interfaces (at least for a while)
to minimize or eliminate design rework while moving from one processor generation to the
next [2].
Consequently, assuming platform based designs, platform components such as processors
or chipsets of a given line are typically compatible with their previous or subsequent
generations as long as the same interfaces are used and interface parameters (such as
FSB speed) or other implementation requirements (either on the side of the components
to be substituted or of the substituting components) do not restrict this.
1.4 Compatibility of platform components (2)
Limits of compatibility
In the discussed DT platform the target processor is the Core 2, which is connected to the
MCH by an FSB running at 1066/800/533 MT/s.
The target processor of the platform, however, can be substituted
• either by processors of the three previous generations or
• by processors of the subsequent generation (Core 2 Quad),
since all these processors have FSBs of 533/800/1066 MT/s, as shown before.
(Thumbnail: Core 2 Duo / Core 2 Extreme (2C) - FSB: 1066/800/533 MT/s - 965 Series MCH
(ME) - two memory channels: DDR2-800/666/533, two DIMMs per channel - DMI and C-link - ICH8)
Nevertheless, the highest performance Core 2 Quad, termed the Core 2 Extreme Quad,
already provided an increased FSB speed of 1333 MT/s and was therefore no longer supported
by the Core 2 aimed platform considered.
2. Basic components of platforms
• 2.1. Processors
• 2.2. The memory subsystem
• 2.3. Buses interconnecting platform components
2. Basic components of platforms (1)
Basic components of platforms - Overview
As already discussed in Section 1, the notion of a platform is interpreted as a standardized
backbone of a system architecture, developed for a given application area, that is built up
typically of
• the processor or processors,
• the chipset,
• the memory subsystem (MSS), attached by a specific memory interface,
• in some cases, such as in mobile or business oriented DT platforms, also the
networking component [7], as well as
• the buses interconnecting the above components.
Basic components of a platform: processor or processors - chipset - the memory subsystem -
(LAN controller) - buses interconnecting the preceding basic components.
Subsequently, we will discuss the following three basic components of platforms:
• Processors (Section 2.1)
• The memory subsystem (Section 2.2) and
• Buses interconnecting platform components (excluding memory buses) (Section 2.3).
2.1. Processors
2.1 Processors (1)
Intel's Tick-Tock model (a new microarchitecture (TOCK) and its shrink (TICK) alternate,
each step about 2 years apart):
• TOCK Pentium 4 / Willamette - 180 nm - 11/2000 - new microarch.
• TICK Pentium 4 / Northwood - 130 nm - 01/2002 - adv. microarch., hyperthreading
• TOCK Pentium 4 / Prescott - 90 nm - 02/2004 - adv. microarch., hyperthreading, 64-bit
• TICK Pentium 4 / Cedar Mill - 65 nm - 01/2006
• TOCK Core 2 - 65 nm - 07/2006 - new microarch., 4-wide core, 128-bit SIMD,
no hyperthreading
• TICK Penryn family - 45 nm - 11/2007
• TOCK Nehalem - 45 nm - 11/2008 - new microarch., hyperthreading, (inclusive) L3,
integrated MC, QPI
• TICK Westmere - 32 nm - 01/2010
• TOCK Sandy Bridge - 32 nm - 01/2011 - new microarch., hyperthreading, 256-bit AVX,
integr. GPU, ring bus
• TICK Ivy Bridge - 22 nm - 04/2012
• TOCK Haswell - 22 nm
Figure 2.1: Overview of Intel's Tick-Tock model (based on [17])
2.1 Processors (2)
Basic architectures and their related shrinks
Considered from the Pentium 4 Prescott (the third core of the Pentium 4) on.
Basic architectures: Pentium 4 (Prescott), Core 2, Nehalem, Sandy Bridge, Haswell.
Basic architectures and their shrinks:
• Pentium 4 (90 nm, 2005) -> Pentium 4 (65 nm, 2006)
• Core 2 (65 nm, 2006) -> Penryn (45 nm, 2007)
• Nehalem (45 nm, 2008) -> Westmere (32 nm, 2010)
• Sandy Bridge (32 nm, 2011) -> Ivy Bridge (22 nm, 2012)
• Haswell (22 nm, 2013)
2.1 Processors (5)
Intel's Core 2 based and subsequent multicore DT processor lines
(basic architecture / technology; line (core, cores, introduction): cache architecture;
interface):

Core 2 (65 nm), FSB:
• X6800 (Conroe, 2C, 7/2006): 4 MB L2/2C
• E6xxx (Conroe, 2C, 7/2006): 2/4 MB L2/2C
• E4xxx (Allendale, 2C, 1/2007): 2 MB L2/2C
• E6xxx (Allendale, 2C, 7/2007): 2 MB L2/2C
• QX67xx (Kentsfield, 2x2C, 11/2006): 4 MB L2/2C
• Q6xxx (Kentsfield, 2x2C, 1/2007): 4 MB L2/2C

Penryn (45 nm), FSB:
• E8xxx (Wolfdale, 2C, 1/2008): 6 MB L2/2C
• E7xxx (Wolfdale-3M, 2C, 4/2008): 3 MB L2/2C
• QX9xxx (Yorkfield XE, 2x2C, 11/2007): 6 MB L2/2C
• Q9xxx (Yorkfield, 2x2C, 1/2008): 6 MB L2/2C
• Q9xxx (Yorkfield-6M, 2x2C, 1/2008): 3 MB L2/2C
• Q8xxx (Yorkfield-4M, 2x2C, 8/2008): 2 MB L2/2C

1. gen. Nehalem-EP (45 nm), QPI:
• i7-920-965 (Bloomfield, 4C, 11/2008): ¼ MB L2/C, 8 MB L3

2. gen. Nehalem-EP (45 nm), DMI:
• i7-8xx/i5-7xx (Lynnfield, 4C, 9/2009): ¼ MB L2/C, 8 MB L3

Westmere-EP (32 nm):
• i7-9xxX (Gulftown, 6C, 3/2010): ¼ MB L2/C, 12 MB L3 - QPI
• i7-9xx (Gulftown, 6C, 7/2010): ¼ MB L2/C, 12 MB L3 - QPI
• i5-6xx/i3-5xx (Clarkdale, 2C+G, 1/2010): ¼ MB L2/C, max. 4 MB L3 - DMI

Sandy Bridge (32 nm), DMI 2.0 (PCIe 2.0):
• i7-39/38xx (6C, 11/2011): ¼ MB L2/C, 15 MB L3
• i7-26/27xx (2/4C+G, 1/2011): ¼ MB L2/C, 4/8 MB L3
• i5-23/24/25xx (2/4C+G, 1/2011): ¼ MB L2/C, 3/6 MB L3
• i3-21xx (2C+G, 1/2011): ¼ MB L2/C, 3 MB L3

Ivy Bridge (22 nm), DMI 2.0 (PCIe 3.0):
• i7-3770 (4C+G, 4/2012): ¼ MB L2/C, 8 MB L3
• i5-33/34/35xx (2/4C+G, 4/2012): ¼ MB L2/C, 6 MB L3
• i3-32xx (2C, 9/2012): ¼ MB L2/C, 3 MB L3

Table 2.1: Intel's Core 2 based and subsequent multicore DT processor lines
2.1 Processors (6)
Overview of Intel's multicore DP server processors
(basic architecture / technology / introduction / line (core): configuration):
• Pentium 4 (Prescott), 90 nm, 10/2005: Paxville DP 2.8 - 2x1 C, 2 MB L2/C
• Pentium 4, 65 nm, 5/2006: 5000 (Dempsey) - 2x1 C, 2 MB L2/C
• Core 2, 65 nm, 6/2006: 5100 (Woodcrest) - 1x2 C, 4 MB L2/C;
11/2006: 5300 (Clovertown) - 2x2 C, 4 MB L2/C
• Penryn, 45 nm, 11/2007: 5400 (Harpertown) - 2x2 C, 6 MB L2/2C
• Nehalem-EP, 45 nm, 3/2009: 5500 (Gainestown) - 1x4 C, ¼ MB L2/C, 8 MB L3
• Westmere-EP, 32 nm, 3/2010: 56xx (Gulftown) - 1x6 C, ¼ MB L2/C, 12 MB L3
• Nehalem-EX, 45 nm, 3/2010: 6500 (Beckton) - 1x8 C, ¼ MB L2/C, 24 MB L3
• Westmere-EX, 32 nm, 4/2011: E7-28xx (Westmere-EX) - 1x10 C, ¼ MB L2/C, 30 MB L3
• Sandy Bridge-EN, 32 nm, 5/2012: E5-2xxx - 1x8 C, ¼ MB L2/C, 20 MB L3
• Ivy Bridge, 22 nm: -
Table 2.2: Overview of Intel's multicore DP server processors
2.1 Processors (7)
Overview of Intel's multicore MP server processors
(basic architecture / technology / introduction / line (core): configuration):
• Pentium 4 (Prescott), 90 nm, 11/2005: Paxville MP - 2x1 C, 2 MB L2/C
• Pentium 4, 65 nm, 8/2006: 7100 (Tulsa) - 2x1 C, 1 MB L2/C, 16 MB L3
• Core 2, 65 nm, 9/2007: 7200 (Tigerton DC) - 1x2 C, 4 MB L2/C;
7300 (Tigerton QC) - 2x2 C, 4 MB L2/C
• Penryn, 45 nm, 9/2008: 7400 (Dunnington) - 1x6 C, 3 MB L2/2C, 16 MB L3
• Nehalem-EP, 45 nm: -
• Westmere-EP, 32 nm: -
• Nehalem-EX, 45 nm, 3/2010: 7500 (Beckton) - 1x8 C, ¼ MB L2/C, 24 MB L3
• Westmere-EX, 32 nm, 4/2011: E7-48xx (Westmere-EX) - 1x10 C, ¼ MB L2/C, 30 MB L3
• Sandy Bridge-EP, 32 nm, 5/2012: E5-4xxx - 1x8 C, ¼ MB L2/C, 20 MB L3
• Ivy Bridge, 22 nm: -
Table 2.3: Overview of Intel's multicore MP server processors
2.2. The memory subsystem
• 2.2.1. Key parameters of the memory subsystem
• 2.2.2. Main attributes of the memory technology used
• 2.2.2.1. Overview: Main attributes of the memory technology used
• 2.2.2.2. Memory type
• 2.2.2.3. Speed grades
• 2.2.2.4. DIMM density
• 2.2.2.5. Use of ECC support
• 2.2.2.6. Use of registering
2.2.1 Key performance parameters of the memory subsystem (1)
2.2.1 Key performance parameters of the memory subsystem
This issue will be discussed in Section 4.
2.2.2 Main attributes of the memory technology used
2.2.2 Main attributes of the memory technology used
2.2.2.1 Overview: Main attributes of the memory technology used
Main attributes of the memory technology used:
• Memory type (Section 2.2.2.2)
• Speed grade (Section 2.2.2.3)
• DIMM density (Section 2.2.2.4)
• Use of ECC support (Section 2.2.2.5)
• Use of registering (Section 2.2.2.6)
2.2.2.2 Memory type (1)
2.2.2.2 Memory type
a) Overview: Main DRAM types
DRAMs for general use:
• DRAMs with parallel bus connection
  • Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
  • Synchronous DRAMs (the mainstream DRAM types): SDRAM (1996), DDR (2000),
    DDR2 (2004), DDR3 (2007), DDR4 (2014)
• DRAMs with serial bus connection (the challenging DRAM types): DRDRAM (1999),
  XDR (2006)¹, FB-DIMM (2006)
¹Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers
2.2.2.2 Memory type (2)
b) Synchronous DRAMs (SDRAM, DDR, DDR2, DDR3, DDR4)
2.2.2.2 Memory type (3)
SDRAM to DDR4 DIMMs:
• SDRAM (SDR): 168-pin
• DDR: 184-pin
• DDR2: 240-pin
• DDR3: 240-pin
• DDR4: 288-pin
All these DIMM modules are 64-bit (8-byte) wide.
2.2.2.2 Memory type (4)
Principle of operation of synchronous DRAMs (SDRAM to DDR4 memory chips)
A DRAM device consists of a memory cell array and I/O buffers, connected to the memory
controller (MC):
• The memory cell array sources/sinks data to/from the I/O buffers at a rate of fCell
and at a width of FW (fetch width).
• The I/O buffers receive/transmit data to/from the MC at a rate of fCK (SDRAM) or
2 x fCK (DDR to DDR4).
• Data transmission occurs on the rising edge of the clock (CK) for SDRAMs, or on both
edges of the strobe (DQS) for DDR/DDR2/DDR3/DDR4.
2.2.2.2 Memory type (5)
Sourcing/sinking data by the memory cell array
The memory cell array sources/sinks data to/from the I/O buffers
• at a rate of fCell, where fCell is the clock frequency of the memory cell array,
• at a data width of FW, where FW is the fetch width of the memory cell array.
The core clock frequency of the memory cell array (fCell)
• fCell is 100 to 200 MHz.
• fCK stands in a given ratio to fCell, as follows:
  SDRAM: fCK = fCell; DDR: fCK = fCell; DDR2: fCK = 2 x fCell; DDR3: fCK = 4 x fCell;
  DDR4: fCK = 8 x fCell
Raising fCell from 100 MHz to 200 MHz characterizes the evolution of each memory technology:
• When a new memory technology (e.g. DDR2 or DDR3) appears, fCell is initially 100 MHz;
this sets the initial speed grade of fCK accordingly (e.g. to 400 MT/s for DDR2 or
to 800 MT/s for DDR3).
• As the memory technology evolves, fCell is raised from 100 MHz to 133, 167 and finally
200 MHz.
• Along with fCell, fCK and thus the speed grades are raised as well, as illustrated by
the sketch below.
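These ratios can be cross-checked with a small calculation. The following Python sketch (an illustration added here, not part of the original figure; all names and the dictionary layout are ours) derives the transfer rate from fCell, the fCK multiplier and the number of transfers per fCK cycle:

```python
# Cross-check of the fCell/fCK ratios above (illustrative sketch only).
# For each generation: (fCK multiplier vs. fCell, transfers per fCK cycle).
GENERATIONS = {
    "SDRAM": (1, 1),  # data on rising CK edges only
    "DDR":   (1, 2),  # data on both edges of DQS
    "DDR2":  (2, 2),
    "DDR3":  (4, 2),
    "DDR4":  (8, 2),
}

def transfer_rate_mts(generation, f_cell_mhz):
    """Peak transfer rate in MT/s for a given memory cell array clock."""
    ck_mult, transfers = GENERATIONS[generation]
    return f_cell_mhz * ck_mult * transfers

for gen in GENERATIONS:
    initial, final = transfer_rate_mts(gen, 100), transfer_rate_mts(gen, 200)
    print(f"{gen}: {initial:.0f} - {final:.0f} MT/s")
# DDR2: 400 - 800 MT/s and DDR3: 800 - 1600 MT/s, as stated above.
```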
2.2.2.2 Memory type (6)
The fetch width (FW) of the memory cell array
It specifies how many times more bits the cell array fetches per column cycle
than the data width of the device (xn).
E.g. a 4-bit wide DRAM device (x4 DRAM chip) with a fetch width of 4 (actually a DDR2 DRAM)
fetches 4 x 4 = 16 bits from the memory cell array in every fCell cycle.
The fetch width (FW) of the memory cell array of synchronous DRAMs is as follows:
  SDRAM: 1; DDR: 2; DDR2: 4; DDR3: 8; DDR4: 8
The DDR4 architecture is an 8n prefetch with two or four selectable bank groups.
This design permits DDR4 memory devices to have separate activation, read,
write or refresh operations underway in each unique bank group.
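To make the fetch-width arithmetic concrete, here is a minimal Python sketch (our illustration; the function name is ours):

```python
# Fetch-width sketch (illustrative): bits moved from the memory cell array
# to the I/O buffers per fCell cycle = device width (xn) * fetch width (FW).
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8, "DDR4": 8}

def bits_per_cell_cycle(generation, device_width):
    return device_width * FETCH_WIDTH[generation]

# The slide's example: an x4 DDR2 chip fetches 4 x 4 = 16 bits per fCell cycle
assert bits_per_cell_cycle("DDR2", 4) == 16
print(bits_per_cell_cycle("DDR3", 8))  # an x8 DDR3 chip: 64 bits per cycle
```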
DDR4 devices have 16 banks organized in four groups, compared to DDR3's 8 independent banks.
(Source: http://www.chip.de/artikel/DDR4-RAM-So-funktioniert-der-neue-Arbeitsspeicher_68928617.html)
DDR4 uses a point-to-point topology (one DIMM per channel), compared to its predecessors'
multi-drop bus (multiple DIMMs per channel).
(Source: http://www.bit-tech.net/hardware/memory/2010/08/26/ddr4-what-we-can-expect/2)
Figure 2.3: RAS feature comparison of DDR3 and DDR4 SDRAM
(Source: http://www.samsung.com/global/business/semiconductor/file/media/DDR4_Brochure-0.pdf)
Examples (SDRAM-100 to DDR4-1600, each with a 100 MHz DRAM core clock):
• SDRAM-100: fCell = 100 MHz, CK = 100 MHz; n bits fetched per fCell cycle;
data transfer on the rising edges of CK over the data lines (DQ0 - DQn-1): 100 MT/s
• DDR-200: fCell = 100 MHz, CK/CK# = 100 MHz, DQS = 100 MHz; 2 x n bits per fCell cycle;
data transfer on both edges of DQS over the data lines: 200 MT/s
• DDR2-400: fCell = 100 MHz, CK/CK# = 200 MHz, DQS = 200 MHz; 4 x n bits per fCell cycle;
data transfer on both edges of DQS over the data lines: 400 MT/s
• DDR3-800: fCell = 100 MHz, CK/CK# = 400 MHz, DQS = 400 MHz; 8 x n bits per fCell cycle;
data transfer on both edges of DQS over the data lines: 800 MT/s
• DDR4-1600: fCell = 100 MHz, CK/CK# = 800 MHz, DQS = 800 MHz; 8 x n bits per fCell cycle
from the selectable bank groups (bank group 0, bank group 1, ...); data transfer on both
edges of DQS over the data lines: 1600 MT/s
2.2.2.2 Memory type (9)
The main technique to increase memory speed
Relation between voltage swings and rise/fall times of signals:
Q = Cin x V = I x tR, thus tR ~ Cin x V / I
where
• Q: charge on the input capacitance of the line (Cin)
• Cin: input capacitance of the line
• V: voltage (swing)
• I: current strength of the driver
• tR: rise time
Smaller voltage swings -> shorter signal rise/fall times -> higher speed grades,
but a lower voltage budget -> higher requirements for signal integrity.
Voltage/voltage swing by memory type:
SDRAM: 3.3 V; DDR: 2.5 V; DDR2: 1.8 V; DDR3: 1.5 V; DDR4: 1.2 V
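To illustrate the tR ~ Cin x V / I relation numerically, the following Python sketch uses made-up example values for Cin and I (assumptions for illustration only; the voltage swings are the supply voltages from the table above):

```python
# Rise-time trend sketch: tR ~ Cin * V / I. The capacitance and drive
# current below are hypothetical example values, used only to show the trend.
def rise_time_ns(c_in_pf, v_swing, i_drive_ma):
    return c_in_pf * v_swing / i_drive_ma  # pF * V / mA = ns

C_IN_PF, I_MA = 5.0, 10.0  # hypothetical line capacitance and driver current
for name, swing in (("SDRAM", 3.3), ("DDR", 2.5), ("DDR2", 1.8),
                    ("DDR3", 1.5), ("DDR4", 1.2)):
    print(f"{name} ({swing} V): tR ~ {rise_time_ns(C_IN_PF, swing, I_MA):.2f} ns")
# Lower swings yield proportionally shorter rise times for the same driver,
# which is what enables the higher speed grades.
```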
2.2.2.2 Memory type (9b)
Signaling used in buses
Signals may be single ended, voltage referenced (compared against a reference voltage VREF)
or differential (S+/S- around a common mode voltage VCM):
• Single ended, typ. voltage swing 3.3-5 V: TTL (5 V): FPM/EDO; LVTTL (3.3 V): FPM/EDO,
SDRAM, HI1.5
• Voltage referenced, typ. voltage swing 600-800 mV: SSTL: SSTL2 (DDR), SSTL1.8 (DDR2),
SSTL1.5 (DDR3), SSTL1.2 (DDR4); RSL (RDRAM); FSB
• Differential, typ. voltage swing 200-300 mV: LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs;
DRSL: XDR (data)
Voltage swings decrease from single ended toward differential signaling.
LVTTL: Low Voltage TTL; SSTL: Stub Series Terminated Logic; LVDS: Low Voltage Differential
Signaling; (D)RSL: (Differential) Rambus Signaling Level; VREF: Reference Voltage;
VCM: Common Mode Voltage
Figure 2.4: Signal types used in MMs for control, address and data signals
2.2.2.2 Memory type (10)
Signaling of the command, control and address lines vs. the data lines:
• FPM, EDO: single ended (TTL, LVTTL) for both
• SDRAM: single ended (LVTTL) for both
• DDR, DDR2, DDR3, DDR4: voltage referenced (SSTL) for both
• RDRAM: voltage referenced (RSL) for both
• XDR, XDR2: voltage referenced (RSL) for command/address, differential (DRSL) for data
• FB-DIMM: differential (LVDS) for both
Figure 2.7: Signaling alternatives of buses used with memories
2.2.2.2 Memory type (11)
Key features of synchronous DRAM devices (SDRAM to DDR4)
• SDR SDRAM: standard: JESD 21-C Release 4 (11/1993); key features: synchronous, pipelined,
burst oriented; device density: 64 Mb; organization: x4/8/16; device speed: 66/100/133 MT/s;
DIMM density: 16-256 MB (x8/16); typical processors: Pentium (3 V) to Pentium III;
voltage: 3.3 V; 168 pins on the module
• DDR SDRAM: JESD 79 (6/2000) to JESD 79E (5/2005); double data rate, 2n prefetch
architecture; 128 Mb - 1 Gb; x4/8/16; 200/266/333/400 MT/s; 64-512 MB (x8/16);
P4 (Willamette), P4 (Northwood), P4 (Prescott); 2.5 V; 184 pins
• DDR2 SDRAM: JESD 79-2 (9/2003) to JESD 79-2C (5/2006); double data rate, 4n prefetch
architecture; 256 Mb - 4 Gb; x4/8/16; 400/533/667/800 MT/s; 128 MB - 1 GB (x8/16);
P4 (Prescott), P4 (Presler), Pentium D, Core 2 Duo; 1.8 V; 240 pins
• DDR3 SDRAM: JESD 79-3 (6/2007); double data rate, 8n prefetch architecture; 512 Mb - 8 Gb;
x4/8/16; 800/1066/1333/1600 MT/s; 512 MB - 16 GB; Core 2 Duo to Broadwell; 1.5 V; 240 pins
• DDR4 SDRAM: JESD 79-4 (9/2012); double data rate, 8n prefetch + bank groups; 2 Gb - 8 Gb;
x4/8/16; 1600/1866/2133/2400/2667/3200 MT/s; 2 GB - 16 GB; Haswell-E onward; 1.2 V; 288 pins
Table 2.4: Key features of synchronous DRAM devices
2.2.2.2 Memory type (12)
Approximate appearance dates and speed grades of DDR DRAMs, as well as the bandwidth¹
provided by a dual channel memory subsystem (figure omitted; source:
http://www.samsung.com/global/business/semiconductor/file/media/DDR4_Brochure-0.pdf).
¹Bandwidth of a dual channel memory subsystem [12]
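The bandwidth column of such tables follows directly from the speed grade: each 64-bit channel moves 8 bytes per transfer. A minimal Python sketch (our illustration; names are ours):

```python
# Peak-bandwidth sketch for a dual channel memory subsystem:
# bandwidth = transfer rate (MT/s) x 8 bytes per transfer x channels.
def peak_bandwidth_gbs(transfer_rate_mts, channels=2):
    return transfer_rate_mts * 8 * channels / 1000.0

for grade in (400, 800, 1066, 1333, 1600):
    print(f"{grade} MT/s: {peak_bandwidth_gbs(grade):.1f} GB/s (dual channel)")
# e.g. DDR3-1600: 1600 MT/s x 8 B x 2 channels = 25.6 GB/s
```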
2.2.2.2 Memory type (13)
Green and ultra-low power memories
These represent the latest achievements of DRAM memory technology.
Green memories: lower dissipation memories
• Low voltage DDR3L memories: use of a 1.35 V supply voltage instead of 1.50 V
to reduce dissipation
• Ultra low voltage DDR3U memories: use of a 1.25 V supply voltage instead of 1.50 V
to reduce dissipation
2.2.2.2 Memory type (14)
Green and ultra-low power memories - Examples [13]
2.2.2.2 Memory type (15)
c) FB-DIMMs
FB-DIMMs belong to the DRAMs with serial bus connection, i.e. to the challenging DRAM types
(see the overview of the main DRAM types in part a); FB-DIMMs appeared in 2006).
2.2.2.2 Memory type (16)
Principle of operation
• Introduces packet based serial transmission (as in the PCIe, SATA and SAS buses)
• Introduces full buffering (registered DIMMs buffer only the addresses)
• CRC error checking (Cyclic Redundancy Check)
2.2.2.2 Memory type (17)
The architecture of FB-DIMM memories [19]
2.2.2.2 Memory type (18)
Figure 2.8: Maximum supported FB-DIMM configuration [20]
(6 channels/8 DIMMs)
2.2.2.2 Memory type (19)
Implementation details (1)
• Serial (differential) transmission between the North Bridge and the DIMMs
(each bit needs a pair of wires)
• Number of serial links:
  • 14 read lanes (2 wires each)
  • 10 write lanes (2 wires each)
• Clocked at 6x the data rate of the DDR2 memory,
e.g. for a DDR2-667 DRAM the link clock is 6 x 667 MHz ≈ 4 GHz
• Every 12 cycles (that is, every two memory cycles) constitute a packet
• Read packets (frames, bursts): 168 bits (12 x 14 bits)
  • 144 data bits (equal to the number of data bits produced by a 72-bit wide DDR2 module
    (64 data bits + 8 ECC bits) in two memory cycles)
  • 24 CRC bits
• Write packets (frames, bursts): 120 bits (12 x 10 bits)
  • 98 payload bits
  • 22 CRC bits
2.2.2.2 Memory type (20)
Implementation details (2)
The 98 payload bits comprise
• 2 frame type bits,
• 24 command bits,
• 72 bits for data and commands, according to the frame type,
e.g. 72 bits of data, 36 bits of data + one command, or two commands.
Commands
• All commands include a 3-bit FB-DIMM module address to select one of 8 modules.
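The frame sizes quoted above can be verified with a few lines of arithmetic. A Python sketch (our illustration; the constant names are ours):

```python
# FB-DIMM frame-size sketch: cross-checking the bit counts quoted above.
READ_LANES, WRITE_LANES, CYCLES_PER_FRAME = 14, 10, 12

read_frame = READ_LANES * CYCLES_PER_FRAME    # 168 bits
write_frame = WRITE_LANES * CYCLES_PER_FRAME  # 120 bits

# A 72-bit wide DDR2 module (64 data + 8 ECC bits) delivers two beats
# per frame, i.e. 144 bits; the rest of the read frame is CRC.
assert read_frame == 2 * 72 + 24
# Write frames: 2 frame type + 24 command + 72 data/command = 98 payload bits
assert write_frame == (2 + 24 + 72) + 22
print(read_frame, write_frame)  # 168 120
```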
2.2.2.2 Memory type (22)
The FB-DIMM data buffer (Advanced Memory Buffer, AMB) manages the read/write operations
of the module. (Source: PCstats)
• FB-DIMM-4300 (DDR2-533 SDRAM): clock speed 133 MHz, data rate 533 MT/s, throughput 4300 MB/s
• FB-DIMM-5300 (DDR2-667 SDRAM): clock speed 167 MHz, data rate 667 MT/s, throughput 5300 MB/s
• FB-DIMM-6400 (DDR2-800 SDRAM): clock speed 200 MHz, data rate 800 MT/s, throughput 6400 MB/s
Figure 2.9: Different implementations of FB-DIMMs
2.2.2.2 Memory type (23)
Figure 2.10: Block diagram of the AMB [21]
(There are two Command/Address buses (C/A) to limit loads of 9 to 36 DRAMs)
2.2.2.2 Memory type (24)
Necessary routing to connect the north bridge to the DIMM socket:
a) In case of a DDR2 DIMM (240 pins): a 3-layer PCB is needed.
b) In case of an FB-DIMM (69 pins): a 2-layer PCB is needed
(but a 3rd layer is used for power lines).
Figure 2.11: PCB routing [19]
2.2.2.2 Memory type (25)
Assessing benefits and drawbacks of FB-DIMM memories (as compared to DDR2/3 memories)
Benefits of FB-DIMMs
• more memory channels (up to 6) -> higher memory size and bandwidth
• more DIMM modules (up to 8) per channel -> higher memory size
(6 x 8 = 48 DIMMs; assuming 8 GB/DIMM, up to 384 GB)
• the same bandwidth figures as the underlying (DDR2) parts
Drawbacks of FB-DIMMs
• higher latency
• higher dissipation (typical figures: DDR2 DIMM: about 5 W, AMB: about 5 W,
FB-DIMM with DDR2: about 10 W)
• higher cost
2.2.2.2 Memory type (26)
Latency [22]
• Due to their additional serialization tasks and daisy-chained nature, FB-DIMMs have
about 15% higher overall average latency than DDR2 memories.
Production
The production of FB-DIMMs stopped with DDR2-800 modules; no DDR3 modules came to the
market due to the drawbacks of the technology.
2.2.2.3 Speed grades (1)
2.2.2.3 Speed grades
Overview of the speed grades of DDR DRAMs and the bandwidth¹ provided by a dual channel
memory subsystem (figure omitted; source:
http://www.samsung.com/global/business/semiconductor/file/media/DDR4_Brochure-0.pdf).
¹Bandwidth of a dual channel memory subsystem [12]
2.2.2.3 Speed grades (2)
Remark
Speed grades of FSBs and DRAMs were defined at the time when the base clock frequency
of the FSBs was 133 MHz (around 2000).
Subsequent speed grades of FSBs, and also those of the memories, were then chosen
as integral multiples of 133 MHz, such as
266 = 2 x 133
400 ~= 3 x 133
533 ~= 4 x 133
667 ~= 5 x 133
800 = 6 x 133
1067 ~= 8 x 133
1333 ~= 10 x 133
1600 ~= 12 x 133 etc.
(see the sketch below)
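A short Python sketch reproducing this grade ladder (our illustration; treating "133 MHz" as the usual shorthand for 400/3 MHz is our reading):

```python
# Speed-grade ladder sketch: the common FSB/DRAM grades as rounded
# multiples of the 133 MHz base clock (more precisely 400/3 MHz).
BASE_MHZ = 400 / 3
for mult in (2, 3, 4, 5, 6, 8, 10, 12):
    print(f"{mult} x 133 MHz ~= {round(mult * BASE_MHZ)} MT/s")
# Prints 267 (conventionally written 266), 400, 533, 667, 800, 1067,
# 1333 and 1600 - the familiar grade ladder.
```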
2.2.2.3 Speed grades (3)
Rate of increasing the transfer rates in synchronous DRAMs
The peak transfer rate grew from SDRAM-66 (around 1996) through SDRAM-100/133,
DDR-266/333/400/533, DDR2-667/800 and DDR3-1333/1600 to DDR4-2133 (around 2013),
i.e. roughly 10x per 10 years.
Figure 2.12: The evolution of peak transfer rates of parallel connected synchronous DRAMs,
as manifested in Intel's chipsets
2.2.2.3 Speed grades (4)
Memory speed grades used in Intel's multicore systems, by the kind of memory attachment
(in Intel's MC systems, typically):
• Attaching memory by parallel channels
  • memory attached to the MCH: up to DDR2-667
  • memory attached to the processor(s): up to DDR3-1600
• Attaching memory by serial channels
  • using FB-DIMMs: up to DDR2-667
  • using serial channels with S/P converters: up to DDR3-1600/2133
2.2.2.4 DIMM density (1)
2.2.2.4. DIMM density
a) Device density
DRAM device density has grown roughly 4x every 4 years, from 16 Kb (around 1980)
through 64 Kb, 256 Kb, 1 Mb, 4 Mb, 16 Mb, 64 Mb and 256 Mb up to 1 Gb.
Figure 2.13: Evolution of DRAM densities (Mbit) and no. of units shipped/year (based on [23])
2.2.2.4 DIMM density (2)
b) DIMM (module) density
Based on device densities of 1 to 8 Gb and typical widths of x4 to x16 (bits), DDR3 and
DDR4 modules provide typical densities of up to 8 or 16 GB.
A DDR4 DIMM's theoretical maximum capacity is 512 GB, compared to DDR3's 128 GB:
2 ranks x 8 high x 18 devices x 16 Gb (0.5 KB page size) vs. 2 ranks x 4 high x 18
devices x 8 Gb (2 KB page size).
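The two maxima can be reproduced with a small calculation. A Python sketch (our illustration; treating 2 of the 18 devices per rank as ECC devices that do not add usable capacity is our assumption):

```python
# DIMM-capacity sketch: capacity = ranks x die stack height x data devices
# x device density (Gb), converted to GB. Of the 18 devices per rank,
# 2 are assumed to serve ECC and are excluded from usable capacity.
def dimm_capacity_gb(ranks, stack_height, devices, density_gb, ecc_devices=2):
    return ranks * stack_height * (devices - ecc_devices) * density_gb / 8

print(dimm_capacity_gb(2, 8, 18, 16))  # DDR4 theoretical maximum: 512.0 GB
print(dimm_capacity_gb(2, 4, 18, 8))   # DDR3 theoretical maximum: 128.0 GB
```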
2.2.2.5 Use of ECC support (1)
2.2.2.5 Use of ECC support
ECC basics (as used in DIMMs)
Implemented as SEC-DED (Single Error Correction, Double Error Detection)
Single bit error correction
For D data bits, P check bits are added.
Figure: The code word (D data bits followed by P check bits)
What is the minimum number of check bits (P) needed for single bit error correction?
Requirement: 2^P ≥ the minimum number of states to be distinguished.
2.2.2.5 Use of ECC support (2)
The minimum number of states to be distinguished:
• the bit position of a possible single bit error in the code word, consisting of both
data and check bits, has to be specified; this requires D + P states,
• plus one additional state to specify the "no error" state.
Thus the minimum number of states to be distinguished is D + P + 1.
Accordingly, to implement single bit error correction the minimum number of check bits (P)
needs to satisfy the requirement
2^P ≥ D + P + 1
2.2.2.5 Use of ECC support (3)
Double bit error detection
An additional parity bit is needed to check for an additional error.
Then the minimum number of check bits (CB) needed for SEC-DED is CB = P + 1.
Substituting P = CB - 1 into 2^P ≥ D + P + 1 gives
2^(CB-1) ≥ D + CB - 1 + 1
2^(CB-1) ≥ D + CB
Data bits (D):   1 | 3:2 | 7:4 | 15:8 | 31:16 | 63:32 | 127:64 | 255:128 | 511:256
Check bits (CB): 2 |  3  |  4  |  5   |  6    |  7    |  8     |  9      |  10
Table 2.5: The number of check bits (CB) needed for D data bits
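The table's values for the common word widths can be reproduced directly from the inequality 2^(CB-1) ≥ D + CB. A minimal Python sketch (our illustration):

```python
# SEC-DED check-bit sketch: smallest CB satisfying 2**(CB-1) >= D + CB,
# the inequality derived above.
def min_checkbits(d):
    cb = 2
    while 2 ** (cb - 1) < d + cb:
        cb += 1
    return cb

for d in (8, 16, 32, 64):
    print(f"{d} data bits -> {min_checkbits(d)} check bits")
# 64 data bits -> 8 check bits: the 64 + 8 = 72-bit word of ECC DIMMs
```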
2.2.2.5 Use of ECC support (4)
Supported memory features of DT and DP/MP platforms
DT platforms typically do not support ECC or registered (buffered) DIMMs,
whereas servers typically make use of registered DIMMs with ECC protection.
2.2.2.5 Use of ECC support (5)
Typical implementation of ECC protected registered DIMMs (used typically in servers)
Main components
• Two register chips for buffering the address and command lines
• A PLL (Phase Locked Loop) unit for deskewing the clock distribution
Figure 2.14: Typical layout of a registered memory module with ECC [14]
2.2.2.6 Use of registering (1)
2.2.2.6 Use of registering
Problems arising while implementing higher memory capacities
Higher memory capacities need more modules -> higher loading of the lines ->
signal integrity problems.
Remedy: buffering the address and command lines and phase locked clocking of the modules.
2.2.2.6 Use of registering (2)
Registering
Principle
Buffering address and control lines
• to reduce signal loading in a memory channel
• in order to increase the number of supported DIMM slots (max. mem. capacity),
needed first of all in servers.
2.2.2.6 Use of registering (3)
Example: Block diagram of a registered DDR DIMM
The address/control lines from the motherboard pass through register chips (PI74SSTV16857)
to the SDRAM devices, the input clock from the motherboard is distributed by a PLL chip
(PI6CV857), and the data lines run directly from/to the motherboard to the SDRAM devices.
Figure 2.17: Example block diagram of a registered DDR DIMM [16]
2.2.2.6 Use of registering (4)
Implementation of registering
By means of a register chip that buffers the address and control lines
(enabled by the REGE register enable signal).
Figure 2.15: Registered signals in case of an SDRAM memory module [15]
Note: Data (DQ) and data strobe (DQS) signals are not registered,
as only the address and control signals are common to all memory chips.
2.2.2.6 Use of registering (5)
Number of register chips required
• Synchronous memory modules (SDRAM to DDR3 DIMMs) have about 20 – 30
address and control lines,
• Register chips buffer usually 14 lines,
Typically, two register chips are needed per memory module [16].
2.2.2.6 Use of registering (6)
Typical layout of registered DIMMs
• Two register chips for buffering the address and command lines
• A PLL (Phase Locked Loop) unit for deskewing the clock distribution
Figure 2.16: Typical layout of a registered memory module with ECC [14]
2.2.2.6 Use of registering (7)
Registered DIMM module with ECC
Figure 2.18: Registered DIMM module with ECC [14]
2.2.2.6 Use of registering (8)
Typical use of unregistered DIMMs (UDIMMs): in desktops/laptops
(memory capacities up to a few GB).
Typical use of registered DIMMs (RDIMMs): in servers
(memory capacities from a few tens up to a few hundreds of GB).
2.3. Buses interconnecting platform components
2.3 Buses interconnecting platform components (1)
2.3 Buses interconnecting platform components
Use of buses in Intel's DT/DP and MP platforms:
• buses interconnecting processors (in NUMA topologies),
• buses interconnecting processors to chipsets,
• buses interconnecting MCHs to ICHs (in 2-part chipsets).
Example: the Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores):
two Xeon 6500 (Nehalem-EX, Beckton, 8C) or Xeon E7-2800 (Westmere-EX, 10C) processors,
interconnected by QPI; each processor attaches DDR3-1067 memory through Scalable Memory
Buffers (SMBs) over SMI links, and both processors connect by QPI to the 7500 IOH,
which in turn connects by ESI to the ICH10 (with ME).
SMI: serial link between the processor and the SMB
SMB: Scalable Memory Buffer with parallel/serial conversion
Remark
Buses connecting the memory subsystem with the main body of the platforms are memory
specific interfaces and will be discussed in Section 4.
2.3 Buses interconnecting platform components (2)
Implementation of buses used in Intel's DT/DP and MP platforms:
• Parallel buses:
  • FSB (Front Side Bus): 64-bit wide; used to interconnect processors to chipsets
    in previous platforms
  • HI1.5: 8-bit wide; used to interconnect MCHs to ICHs in previous platforms
• Serial buses (point-to-point interconnections):
  • QPI (QuickPath Interconnect) / QPI1.1 (QuickPath Interconnect v1.1): 16-bit wide;
    used to interconnect processors to processors and processors to chipsets
  • DMI (Direct Media Interface) / ESI (Enterprise System Interface) / DMI2
    (Direct Media Interface 2. gen.): 4-bit wide (4 PCIe lanes); used to interconnect
    processors to chipsets or MCHs to ICHs
2.3 Buses interconnecting platform components (3)
Buses used in Intel’s DT/DP/MP platforms
• Buses interconnecting processors (in NUMA topologies) - serial:
  • QPI (2008): 20 lanes, 84 lines, 9.6/11.72/12.8 GB/s in each direction
  • QPI 1.1 (2012?): specification n.a.
• Buses interconnecting processors to chipsets:
  • parallel (low-cost systems): FSB (64-bit: 1993): 64-bit wide, ~150 lines,
    3.2-12.8 GB/s total in both directions
  • serial (high-performance systems): QPI (2008): 20 lanes, 84 lines,
    9.6/11.72/12.8 GB/s in each direction; DMI/ESI (2008²): 4 PCIe lanes, 18 lines,
    1 GB/s/direction; DMI2 (2011): 4 PCIe lanes, 18 lines, 2 GB/s/direction
• Buses interconnecting MCHs to ICHs (in 2-part chipsets):
  • parallel: HI 1.5 (1999): 8-bit wide, 16 lines, 266 MB/s total in both directions
  • serial: DMI/ESI (2004¹): 4 PCIe lanes, 18 lines, 1 GB/s/direction;
    DMI2 (2011): 4 PCIe lanes, 18 lines, 2 GB/s/direction
2.3 Buses interconnecting platform components (4)
Remarks
¹DMI: introduced as an interface between the MCH and the ICH first along with the ICH6,
supporting Pentium 4 Prescott processors, in 2004.
²DMI: introduced as an interface between the processors and the chipset first between
the Nehalem-EP and the 34xx PCH, in 2008,
after the memory controllers had been placed on the processor die.
2.3 Buses interconnecting platform components (5)
Signaling used in buses
(Repeats Figure 2.4 from Section 2.2.2.2 (9b): signals may be single ended
(TTL (5 V): FPM/EDO; LVTTL (3.3 V): FPM/EDO, SDRAM, HI1.5; typ. swing 3.3-5 V),
voltage referenced (SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3), RSL (RDRAM), FSB;
typ. swing 600-800 mV) or differential (LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs;
DRSL: XDR (data); typ. swing 200-300 mV).)
Figure 2.4: Signal types used in MMs for control, address and data signals
2.3 Buses interconnecting platform components (6)
Main features of parallel buses used in Intel's multicore platforms
(FSB/HI 1.5: bus type interconnects):
• FSB: typical use: connecting the processors and the chipset; introduced with the
Pentium (1993); width: 64 bit; clock: 100-400 MHz; DDR/QDR: QDR since the Pentium 4 (2000);
transfer rate: 400-1600 MT/s; bandwidth: 3.2-12.8 GB/s in both directions altogether;
signaling: voltage referenced data signals; ~150 lines
• HI 1.5: typical use: connecting the MCH and the ICH; introduced with the
Pentium III (1999); width: 8 bit; clock: 66 MHz; QDR; transfer rate: 266 MT/s;
bandwidth: 266 MB/s in both directions altogether; signaling: single-ended data signals;
~16 lines
2.3 Buses interconnecting platform components (7)
Main features of serial buses used in Intel's platforms
(DMI/QPI: point-to-point interconnections):
• DMI/ESI: typical use: to interconnect MCHs and ICHs or processors to chipsets;
introduced in connection with the 2. gen. Nehalem in 2008; width: 4 PCIe lanes;
clock: 2.5 GHz; no DDR; 10 bit/8 bit encoding; bandwidth: 1 GB/s per direction;
LVDS signaling; 18 lines
• DMI2: same typical use; introduced in connection with Sandy Bridge in 2011;
width: 4 PCIe2 lanes; clock: 5 GHz; no DDR; 10 bit/8 bit encoding;
bandwidth: 2 GB/s per direction; LVDS signaling; 18 lines
• QPI: typical use: to interconnect processors in NUMA topologies or processors
to chipsets; introduced in connection with the Nehalem-EP in 2008; width: 20 lanes;
clock: 2.4/2.93/3.2 GHz; DDR; no encoding; bandwidth: 9.6/11.72/12.8 GB/s per direction;
LVDS signaling; 84 lines
• QPI 1.1: introduced in connection with Sandy Bridge in 2012 (?);
no specification available yet
2.3 Buses interconnecting platform components (8)
Comparing main features of Intel’s FSB and QPI [9]
GTL+: A kind of voltage referenced signaling
2.3 Buses interconnecting platform components (9)
Principle of LVDS signal transmission used in serial buses
Figure 2.5: LVDS Single Link Interface Circuit [10]
2.3 Buses interconnecting platform components (10)
PCIe package format (data frames)
PCI Express data frame [10]. The related fields are:
• Frame: 1-byte Start-of-Frame / End-of-Frame
• Seq#: 2-byte sequence number
• Header: 16- or 20-byte header
• Data: 0-4096-byte data field
• CRC: 4-byte ECRC (End-to-End CRC) + 4-byte LCRC (Link CRC)
(CRC: Cyclic Redundancy Check)
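Using these field sizes, the per-frame overhead and thus the payload efficiency can be estimated. A Python sketch (our illustration; taking the framing as 1 start + 1 end byte and the header as 16 bytes are assumptions):

```python
# PCIe frame-efficiency sketch using the field sizes listed above; framing
# is assumed to be 1 start + 1 end byte, the header is taken as 16 bytes.
FRAMING, SEQ, HEADER, ECRC, LCRC = 2, 2, 16, 4, 4  # bytes

def payload_efficiency(payload_bytes):
    return payload_bytes / (FRAMING + SEQ + HEADER + ECRC + LCRC + payload_bytes)

for payload in (64, 256, 4096):
    print(f"{payload}-byte payload: {payload_efficiency(payload):.1%}")
# Larger payloads amortize the fixed 28 bytes of per-frame overhead.
```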
2.3 Buses interconnecting platform components (11)
Principle of the QuickPath Interconnect bus (QPI bus)
Each direction consists of a unidirectional link (TX and RX) of 20 lanes:
16 data lanes, 2 protocol lanes and 2 CRC lanes.
Figure 2.6: Signals of the QuickPath Interconnect bus (QPI bus) [11]
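The QPI bandwidth figures quoted in the tables earlier follow from these signals. A Python sketch (our illustration):

```python
# QPI bandwidth sketch: 16 of the 20 lanes per direction carry data and
# transfers occur on both clock edges (DDR); each direction is a separate
# unidirectional link.
def qpi_gbs_per_direction(clock_ghz, data_lanes=16):
    return 2 * clock_ghz * data_lanes / 8  # transfers/s x bits -> GB/s

for clk in (2.4, 2.93, 3.2):
    print(f"{clk} GHz clock: {qpi_gbs_per_direction(clk):.2f} GB/s per direction")
# 9.60, 11.72 and 12.80 GB/s - the figures quoted in the tables above
```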
5. References
5. References (1)
[1]: Wikipedia: Centrino, http://en.wikipedia.org/wiki/Centrino
[2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong
Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire,
Feb. 24, 1999, http://www.thefreelibrary.com/Industry+Uniting+Around+Intel+Server
+Architecture%3B+Platform...-a053949226
[3]: Intel Core 2 Duo Processor, http://www.intel.com/pressroom/kits/core2duo/
[4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design:
Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on
Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp. 1-29.
[5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec. 07 2004,
http://pcworld.about.net/news/Dec072004id118866.htm
[6]: Perich D., Intel Volume platforms Technology Leadership, Presentation at HP World 2004,
http://98.190.245.141:8080/Proceed/HPW04CD/papers/4194.pdf
[7]: Powerful New Intel Server Platforms Feature Array Of Enterprise-Class Innovations,
Intel press release, Aug. 2, 2004,
http://www.intel.com/pressroom/archive/releases/2004/20040802comp.htm
[8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation,
March 1 2005, http://www.silentpcreview.com/article224-page2
[9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, http://www.intel.com/
content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf
[10]: Davis L. PCI Express Bus,
http://www.interfacebus.com/PCI-Express-Bus-PCIe-Description.html
5. References (2)
[11]: Ng P. K., “High End Desktop Platform Design Overview for the Next Generation Intel
Microarchitecture (Nehalem) Processor,” IDF Taipei, TDPS001, 2008,
http://intel.wingateweb.com/taiwan08/published/sessions/TDPS001/FA08%20IDFTaipei_TDPS001_100.pdf
[12]: Computing DRAM, Samsung.com, http://www.samsung.com/global/business/semiconductor
/products/dram/Products_ComputingDRAM.html
[13]: Samsung’s Green DDR3 – Solution 3, 20nm class 1.35V, Sept. 2011,
http://www.samsung.com/global/business/semiconductor/Greenmemory/Downloads/
Documents/downloads/green_ddr3_2011.pdf
[14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page
4.20.4-1, Jan. 2002, http://www.jedec.org
[15]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/
SD9C16_32x72.pdf
[16]: Solanki V., „Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37,
Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf
[17]: Fisher S., “Technical Overview of the 45 nm Next Generation Intel Core Microarchitecture
(Penryn),” IDF 2007, ITPS001, http://isdlibrary.intel-dispatch.com/isd/89/45nm.pdf
[18]: Razin A., Core, Nehalem, Gesher. Intel: New Architecture Every Two Years,
Xbit Laboratories, 04/28/2006,
http://www.xbitlabs.com/news/cpu/display/20060428162855.html
[19]: Haas, J. & Vogt P., Fully buffered DIMM Technology Moves Enterprise Platforms to the
Next Level,” Technology Intel Magazine, March 2005, pp. 1-7
5. References (3)
[20]: „Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005,
http://www.pcstats.com/articleview.cfm?articleid=1812&page=1
[21]: McTague M. & David H., „ Fully Buffered DIMM (FB-DIMM) Design Considerations,”
Feb. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA-S009.pdf
[22]: Ganesh B., Jaleel A., Wang D., Jacob B., Fully-Buffered DIMM Memory Architectures:
Understanding Mechanisms, Overheads and Scaling, 2007.
[23]: DRAM Pricing – A White Paper, Tachyon Semiconductors,
http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf