Computer Workstation and Server Hardware Part 2

Lecture 4: Workstation Hardware Part 2
Intel and AMD Chipsets 2006
Basic Differences in Processor/Chipset Designs: FSB vs. Direct Connect and HyperTransport Technologies
What is the difference between AMD and Intel processors? The answer is in AMD's Direct
Connect Architecture and its use of HyperTransport Technology (HTT). A main function of the
processor is to read and write information to and from memory as fast and efficiently as possible.
When developing the AMD Opteron and AMD Athlon 64 processors, AMD’s engineers needed
to find a method to efficiently deliver data to the processor from main memory even when
handling the large data loads of 64-bit and multi-core computing. To keep all cores as productive
as possible, they looked to processor designs outside of what had been used for 32-bit x86-based
processors. The integrated memory controller, one of the key innovations of its Direct
Connect Architecture, has its roots in the architecture of RISC processors. The benefit of an
integrated memory controller for AMD64 processors, along with the power efficiency of its
processors, resulted in AMD's performance-per-watt leadership; however, Intel's Core 2 Duo
architecture has knocked AMD off that perch, at least for now.
In traditional x86 system design, a front side bus (FSB) connects the CPU to the main memory of
a system. A separate memory controller chip is responsible for coordinating the data traffic as it
passes from the memory to the CPU. However, in AMD64 and RISC processor design, the
memory controller is integrated onto the processor, which can reduce the amount of time it takes
for data to move from one component to another. For example, until the middle of this year, when
the DDR2 Direct Connect bus was introduced, an AMD Athlon 64 accessed RAM by
transferring data on both the leading and trailing edges of the clock pulse, giving it a maximum
transfer rate to RAM of 3,200 MB/s, the same maximum rate as DDR400/PC3200 memory.
It still does this, but now has two channels to DDR2.
Thus the Athlon 64 (X2, Opteron, etc.) has two external buses: the first accesses RAM
(using either a single channel to DDR or a dual channel to DDR2 RAM), and the second accesses
the chipset (typically the Northbridge). This second bus is called the HyperTransport bus.
Theoretically this architecture is better than Intel’s FSB/Chipset architecture. Like Intel’s FSB,
the HT bus is used to communicate with the Northbridge chip (which is used as a gateway to the
video card and the slower devices off the Southbridge chip); however, it is faster and can be easily
extended in both bus width and speed. In theory, the Athlon 64 can communicate with the
memory and with the other circuits of the computer at the same time.
Another advantage of HyperTransport is that it has one path for the transmission of data and a
second for its reception; it is thus two-way, or full duplex. In the traditional architecture used by
the other processors, a single path is used both for the transmission and for the reception of data.
In theory, the Athlon 64 can both transmit data to and receive data from the chipset at the same
time. Thus the HyperTransport bus works at 3,200 MB/s in each direction: that is why it is often
listed as a 6,400 MB/s bus, which is not true, and this is one of the major misunderstandings among
computer practitioners. In brief, it is as if people claimed that a motorway had a speed limit of
130 km/h just because there is a speed limit of 65 km/h in each direction. Full duplex is nonetheless
a significant advantage over Intel's FSB, as it permits the processor(s) to send data (e.g. to the
video card) and receive data (e.g. from the Internet/DVD/HDD) simultaneously.
A typical HTT (Version 1.1) bus works at 800 MHz, with 2 bytes of data being transferred on
both the leading and trailing edges of the clock pulse: in terms of speed this equates to an
800 * 2 = 1,600 MHz bus transferring 2 bytes of data per clock pulse. In effect the bandwidth is
calculated by multiplying 800 MHz * 2 (one transfer on the leading edge, one on the trailing edge)
* 2 bytes, giving a bandwidth of 3,200 MB/s. The former calculation leads to another
misunderstanding: describing the external bus of the Athlon 64 as a 1,600 MHz bus. For
comparison, the older AGP 8x bus works at just over 2 GB/s, so a HyperTransport link comfortably
exceeds it, leaving plenty of bandwidth for the remaining I/O devices. However, the emergence of
PCI Express, with higher bandwidth (especially for video I/O), meant that HTT would have to
provide higher bandwidth speeds.
All this means that HyperTransport Technology had to evolve: from HTT 1.1 to 2.0 and now to
3.0. It's interesting that within the HyperTransport system, performance is nominally measured in
billions of transfers per second, or gigatransfers (GT/s), not gigabytes. That's because each
interconnect or bus can range in width from two to 32 bits. Furthermore, as indicated, it is also a
"Double Data Rate" or DDR technology, meaning nodes send data on both the rising and falling
edges of the clock signal. Thus the transfer rate of the original HyperTransport 1.1 link is
1.6 gigatransfers/second (800 MHz * 2): if the bus width is 16 bits (2 bytes), this translates to
3.2 GB/s of data per direction. HyperTransport 2.0 adds three new speed grades: 2.0, 2.4, and
2.8 gigatransfers per second, with clock rates of between 1.0 GHz and 1.4 GHz. So, a 1 GHz
HyperTransport bus with a 16-bit width will deliver a bandwidth of 2 GT/s * 2 bytes = 4 GB/s
(1 GHz * 2 (DDR) * 2 bytes (16 bits)). Thus, with a 2-byte bus, a HyperTransport 2.0 system
with speeds of 2.0, 2.4 and 2.8 GT/s translates to 4.0, 4.8 or 5.6 GB/s.
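These GT/s-to-GB/s conversions all follow one formula, sketched below (the helper name is my own, not part of any HyperTransport specification):

```python
def ht_bandwidth_gb_s(clock_ghz, width_bits):
    """Per-direction HyperTransport bandwidth in GB/s.

    Transfer rate (GT/s) = clock * 2, since data moves on both
    clock edges (DDR); bandwidth = transfer rate * width in bytes.
    """
    transfers_gt_s = clock_ghz * 2
    return transfers_gt_s * (width_bits / 8)

# HTT 1.1: 800 MHz clock, 16-bit link -> 3.2 GB/s per direction
print(ht_bandwidth_gb_s(0.8, 16))
# HTT 2.0 grades at 16 bits: 1.0/1.2/1.4 GHz -> 4.0, 4.8, 5.6 GB/s
print([ht_bandwidth_gb_s(c, 16) for c in (1.0, 1.2, 1.4)])
```

Doubling the per-direction figure again gives the 6,400 MB/s number that, as noted above, is often misquoted as the speed of the bus itself.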
HyperTransport is primarily a chip-to-chip interconnect, so an important element of its design is
its bridge capabilities to board-level bus systems, such as AGP, PCI, PCI-X and PCI Express.
That's required for providing a HyperTransport-based system board with access to the huge array
of I/O devices, ranging from AGP desktop graphics cards to Ethernet, Fibre Channel and SCSI
adapters. Each HyperTransport link consists of a host device and an endpoint; any of those
devices may be a bridge to one of those slower board-to-board networks. The original
HyperTransport specification defined the PCI and AGP bridges. HyperTransport 1.05 created a
bridge to PCI-X; HyperTransport 2.0, which appeared in 2004, added in the PCI Express
mappings and appropriate bridge technology.
PCI Express

Lanes   Bandwidth per stream   Bandwidth, duplex
1       256 MB/s               512 MB/s
2       512 MB/s               1 GB/s
4       1 GB/s                 2 GB/s
8       2 GB/s                 4 GB/s
16      4 GB/s                 8 GB/s
32      8 GB/s                 16 GB/s
Basic PCI Express x1 buses, which are single channel, permit a 256 MB/s transfer rate. Each
channel or lane of a PCI Express connection contains two pairs of wires: one to send and one to
receive. Packets of data move across the channel/lane at a rate of one bit per cycle. An x1
connection, the smallest PCI Express bus, has one lane made up of four wires. It carries one bit
per cycle in each direction. An x2 link contains eight wires and transmits two bits at once, an x4
link transmits four bits, and so on. Other configurations are x12, x16 and x32. For example, the
x16 bus (16 channels) used for video boards has a bandwidth of 4 GB/s (16 * 256 MB/s) in a single
direction, with 8 GB/s possible in both directions. PCIe transfers data at roughly 250 MB/s per
channel, up to a maximum of 32 lanes, a total combined transfer rate of 8 GB/s. It must be noted
that PCIe is able to transfer data in both directions at once (full duplex). This effectively doubles
the data transfer rate, allowing 500 MB/s per channel and giving a total combined transfer rate of
16 GB/s when 32 channels are employed.
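The table's per-lane scaling can be reproduced with a short loop (a sketch using the 256 MB/s per-lane figure from the table above; the constant name is mine):

```python
PER_LANE_MB_S = 256  # per lane, per direction (figure used in the table)

for lanes in (1, 2, 4, 8, 16, 32):
    per_stream = PER_LANE_MB_S * lanes   # single direction
    duplex = per_stream * 2              # full duplex: send + receive at once
    print(f"x{lanes:<2}: {per_stream:>5} MB/s per stream, {duplex:>5} MB/s duplex")
```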
AMD Athlon and Opteron Family Chipset: 2004-2006
Figure 1 introduces the basic Athlon 64 chipset architecture using a VIA K8 Series chipset,
introduced in 2004. This architecture was used for all AMD Athlon 64 processors prior to the
release of the X2 dual-core and AM2 socket processors. Similar chipsets were released by
NVIDIA and other manufacturers.
Figure 1 Typical AMD Athlon 64 Platform
The VIA K8 Series supports AMD Athlon 64 processor-based systems and had a 90% market
share in 2004. The VIA K8T890 chipset enables 64-bit capable performance desktop PCs and
Media Centers with graphics capabilities for gaming and digital video editing. VIA's
HyperTransport implementation enables a full 16-bit/1000 MHz (DDR), or 4 GB/s, high-bandwidth
link in both directions between processor and chipset for the AMD Athlon 64 processor platform
(the previous K8 Series supported 16-bit/800 MHz (DDR), or 3.2 GB/s): this ensures that AMD
Athlon 64 processor-based systems can achieve their full performance potential. Note that this
member of the K8 Series supports PCI Express x16 graphics cards.
The chipset’s V-MAP architecture supports the Northbridge/Southbridge Ultra V-Link
interconnect with 1 GB/s throughput (V-Link, supporting a 533 MB/s Southbridge/Northbridge
interconnect, is also an option). From here on down it is dissimilar to Intel’s chipsets, as VIA’s
DriveStation™ Advanced Controller Suite supports a high-performance 150 MB/s dual-channel
Serial ATA interface (note: the bus bandwidth here is equivalent to 1.5 Gb/s, or gigabits per
second). There is also extended support for two additional Serial ATA devices through the
SATAlite™ interface, and a V-RAID serial RAID controller supporting RAID 0, RAID 1, and
RAID 0+1.
Figure 2 VIA Chipset RAID Support
There is also support for Parallel ATA-133 and up to four PATA devices. In addition, the
Southbridge includes VIA’s Advanced Connectivity Controller Suite, which provides: a USB 2.0
controller with support for 8 USB 2.0/1.1 ports; an enterprise-class 10/100 Mbps Fast Ethernet
MAC network controller; PCI Express and LPC bus controllers; and VIA Vinyl™ Audio, either
VIA Vinyl integrated 5.1 surround sound AC ’97 audio with the VIA Six-TRAC codec, or VIA
Vinyl Gold onboard 7.1 surround sound with 24-bit/96 kHz resolution audio (VIA Envy24PT plus
VIA Six-TRAC codec plus an additional DAC).
While VIA was the market leader in 2004 for AMD-based chipsets, that situation had changed by
2006, when NVIDIA's nForce4 obtained a sizable chunk of the market for high-end enthusiast
motherboards. ATI followed suit with its Radeon Xpress 200 CrossFire Edition chipset. VIA had
no immediate answer for the nForce4: the K8T890 north bridge wasn't bad, but it couldn't drive a
pair of PCI Express graphics slots with eight lanes going to each slot, so its potential for multi-GPU
use was limited. This situation changed with the release of the K8T900 (see below); however,
NVIDIA is now the undoubted leader, as the following section explains.
AMD Chipsets 2006+: The NVIDIA nForce 500 Platform
For the launch of socket AM2, NVIDIA is providing four new chipset offerings, dependent upon
the market sector. With this product introduction NVIDIA is launching two new chips, the
C51XE and the MCP55PXE, which form the basis for the four models. A quick summary of the new
product choices can be found in the following table. The new socket AM2 (940) motherboards
cost slightly more than their socket 939 counterparts, at least initially, but long-term they should
have basically the same prices.
NVIDIA Chipset Breakdown

Market Segment           Socket 939        Socket AM2
High-End Enthusiast      nForce4 SLI X16   nForce 590 SLI
Mainstream Enthusiast    nForce4 SLI       nForce 570 SLI
Performance Mainstream   nForce4 Ultra     nForce 570 Ultra
Value Mainstream         nForce4 4X        nForce 550
At the top of the product offering, the nForce 590 SLI consists of two chips, the C51Xe SPP and
the MCP55PXE. This solution offers dual X16 PCI-E lanes for multiple graphics card
configurations. While other features have changed, the overall design is very similar to the
nForce4 SLI X16.
One step down from the nForce 590 SLI is the nForce 570 SLI. This is a single-chip solution,
providing two X8 lanes for multiple graphics cards. A total of 28 lanes are available, up from
the 20 lanes that were available on the nForce4 SLI chipset. The 570 chipset also lacks the
LinkBoost technology included with the 590, but is otherwise much the same.
The nForce 570 Ultra comes next, with a drop to 20 total PCI-E lanes. If the name didn't clue you
in already, the Ultra also drops support for SLI: that is, there is only support for a single PCI
Express x16 GPU. Basically, this is the "performance mainstream" offering, targeting users who
are only interested in running a single graphics card.
Specification         nForce 590 SLI          nForce4 SLI X16             nForce 570 SLI          nForce 570 Ultra   nForce 550
Segment               Enthusiast SLI (2x16)   Enthusiast SLI (2x16)       Performance SLI (2x8)   Performance        Mainstream
CPU Suggestion        Athlon 64 FX,           Athlon 64 FX,               Athlon 64 FX,           Athlon 64          Athlon 64,
                      Athlon 64               Athlon 64                   Athlon 64                                  Sempron
SLI Technology        Yes                     Yes                         Yes                     No                 No
NVIDIA LinkBoost      Yes                     No                          No                      No                 No
NVIDIA FirstPacket    Yes                     No                          Yes                     Yes                No
NVIDIA DualNet        Yes                     No                          Yes                     Yes                No
Gigabit Connections   2                       2 (requires an              2                       2                  1
                                              external chipset)
Teaming               Yes                     No                          Yes                     Yes                No
TCP/IP Acceleration   Yes                     Yes (ActiveArmor)           Yes                     Yes                No
MediaShield           Yes                     Yes                         Yes                     Yes                Yes
SATA / PATA Drives    6 SATA, 2 PATA          4 SATA, 4 PATA              6 SATA, 2 PATA          6 SATA, 2 PATA     4 SATA, 2 PATA
RAID                  0, 1, 0+1, 5            0, 1, 0+1                   0, 1, 0+1, 5            0, 1, 0+1, 5       0, 1, 0+1
NVIDIA nTune 5        Yes                     No                          Yes                     Yes                Yes
PCI Express Lanes     46                      38                          28                      20                 20
Links                 9                       8                           6                       5                  5
USB Ports             10                      10                          10                      10                 10
PCI Slots Supported   5                       5                           5                       5                  5
Audio                 Azalia                  AC'97                       Azalia                  Azalia             Azalia
Finally, the nForce 550 is the "value mainstream" product, taking over from the nForce4 4X.
Several of the higher-end options have been dropped from the 550 chipset, including support for
dual Ethernet controllers, the FirstPacket technology, TCP/IP acceleration, and RAID 5. The
number of natively supported SATA ports has also been reduced from six down to four. Whereas
the other three chipsets are recommended for Athlon 64/FX/X2 users, the nForce 550 is
recommended for Athlon 64 (single core) and Sempron users. In conclusion, NVIDIA applies the
same chipset technology to Intel platforms as well. Figure 3 shows a chipset released in 2005 to
support the last generation (i.e. before the Core 2 Duo family). Note the similarities between this
and AMD-based chipsets, and the use of HyperTransport Technology for the
Northbridge/Southbridge link.
Figure 3 NVIDIA Chipsets for Intel with HTT
Cutting to the Chase: Chipsets for Intel Processors in 2006
The NVIDIA nForce 590 SLI
NVIDIA's flagship nForce 590 SLI chipset has the ability to let a Core 2 Duo processor reach its
maximum potential while still maintaining the low noise and power consumption benefits of this
impressive processor series.
At the top of the product offering, the nForce 590 SLI consists of two chips, the 590SLI SPP and
the 590SLI MCP. This solution offers dual X16 PCI-E lanes for multiple graphics card
configurations. While other features have changed, the overall design is very similar to the
nForce4 Intel SLI X16. The total number of PCI-E lanes is now 48, with 18 lanes coming from
the SPP. NVIDIA has the only chipsets in the Intel processor range that fully support SLI
technology.
NVIDIA nForce 500 Series MCPs for Intel

Chipset:                       NVIDIA nForce 590 SLI          NVIDIA nForce 570 SLI
Segment:                       Enthusiast                     Performance SLI
CPU:                           Intel Socket 775:              Intel Socket 775:
                               Core 2 Extreme,                Core 2 Extreme,
                               Core 2 Duo,                    Core 2 Duo,
                               Pentium D 9XX,                 Pentium D 9XX,
                               Pentium D 8XX,                 Pentium D 8XX,
                               Pentium 4                      Pentium 4, Celeron D
NVIDIA SLI Technology:         Yes - 2 x16                    Yes - 1 x16, 2 x8
FSB (MHz):                     1066, 800, 533                 1066, 800, 533
DDR2 Memory (MHz):             667+                           667+
PCI-E - # Lanes:               48 lanes                       20 lanes
PCI-E - # Links:               9 links                        5 links
Configuration:                 16, 16, 8/4/2/1, 4/2/1, etc.   16/8, 8, 4/2/1, etc.
SATA/PATA drives:              6, 2                           4, 4
SATA speed:                    3 Gb/s                         3 Gb/s
RAID:                          0, 1, 0+1, 5                   0, 1, 0+1, 5
NVIDIA MediaShield Storage:    Yes                            Yes
Native Gigabit Ethernet:       2                              1
NVIDIA FirstPacket:            Yes                            Yes
NVIDIA DualNet:                Yes                            No
Teaming:                       Yes                            No
TCP/IP Acceleration:           Yes                            No
NVIDIA nTune Utility:          Yes                            Yes
USB ports:                     10                             8
PCI Slots:                     5                              5
Audio:                         HDA (Azalia)                   HDA (Azalia)
The nForce 590 SLI chipset is the most powerful. While the nForce 590 SLI doesn't list Celeron
D support, at least unofficially it should work. Intel didn't officially want their 975X chipset to
support 533 FSB processors, but a few motherboard manufacturers disagreed on this point, and
the end result is that 975X motherboards are able to run Celeron D chips. A similar situation
applies with the nForce 590 SLI. Not that putting a budget processor in a high-end motherboard
is recommended, but at least you can if you so desire.
The second point is on lane configurations. It's difficult to properly convey all of the options
available with a simple features table, so here's a lengthier explanation. The combination of PCI
Express lanes and slots can be tweaked according to individual manufacturer desires. In SLI
mode, the 590 will always provide two X16 slots with X16 bandwidth and the 570 will provide
two X16 slots with X8 bandwidth. Beyond that, many potential configurations exist. 590 SLI
motherboards could offer two more X16 slots, but due to the lane configuration (30 lanes from
the Northbridge and 18 from the Southbridge) the slots will be limited to a maximum of one X8
data connection and one X4 data connection. That would give motherboards a total of four X16
slots with varying bandwidth offered to each slot (2 X16, 1 X8, 1 X4). Considering the layout of
expansion slots on ATX/BTX motherboards, this would be our ideal configuration, and the
remaining expansion slots can be filled out with either X1/X2 PCI-E or regular PCI connectors.
There really does not seem to be much point in including X1 physical slots, particularly on
enthusiast level hardware, and ATI at least has already recommended that motherboard
manufacturers begin including more X16 physical connectors. The situation is similar with
nForce 570 SLI, only with a lower number of total available lanes and links. The ideal
configuration for 570 SLI motherboards seems to be three X16 connections, two with X8 links
and the third with an X4 link; another option would be to provide two X2 links. Naturally, it is
important to provide spacing between the X16 slots so that dual slot GPUs can be used without
blocking access to the other slots.
The 975X and 965 Intel Chipsets
There are only two high-end core logic products for socket 775 available: Nvidia's nForce 590
SLI chipset, and Intel's 975X. Both support all current processors, and high-end motherboards
come with comparable feature sets. You need the Nvidia chipset to install an Nvidia SLI dual
graphics solution, or the 975X to run an ATI Crossfire dual graphics setup. (It's ironic to see an
ATI setup running on an Intel system these days, because the Canadian graphics company was
recently purchased by AMD.)
The 975X is Intel's state-of-the-art core logic for high-end desktop and entry-level workstation
systems. It supports all socket 775 Pentium 4, Pentium D, Core 2 Duo and Core 2 Extreme processors
at FSB800 and FSB1066 speeds, and it comes with a Dual DDR2-800 memory controller. As
already mentioned, the chipset will support two graphics cards, each running at PCI Express x8
link speed, or the graphics interface can be configured to support a single graphics card at x16
link speed. For device connectivity, Intel uses its well-known ICH7R Southbridge component,
which supports High Definition audio, eight USB 2.0 ports, six x1 PCI Express lanes, a
conventional PCI bus and four Serial ATA/300 ports for hard drives and modern optical drives.
There is also one UltraATA/100 channel for your legacy devices, and Intel has implemented its
Matrix Storage technology, which allows you to install multiple RAID sets across a common set
of hard drives.
Figure 4 Intel's 975X Chipset
Figure 5 Intel Core 2 Duo Chipset
Intel released the P965 Express MCH and ICH8/R chipsets in 2006. The major features of the
P965 MCH include Intel's new Fast Memory Access technology, 1066MHz front side bus
support, 800MHz DDR-2 memory support, and full support for the new Core 2 Duo processor
lineup. The integrated graphics versions, G965 and Q965, will ship with the new GMA X3000
graphics engine for the ViiV and Corporate markets respectively. The new ICH8/R chipsets offer
ten USB 2.0 ports, up to six 3Gb/s SATA ports, Intel's new Quiet System Technology, and the
removal of Parallel ATA support.
While one could argue that the removal of PATA support is a needed step forward in technology,
the industry believes that Intel should have waited until the next generation ICH for this change.
The Optical Drive market is still about 98% PATA based and does not seem to be changing
anytime soon. While this development might spur the optical drive suppliers into offering
additional SATA drives in the near future, it does not address the requirements of the current
PATA installed base. This most likely means there will be additional costs and complexity on the
motherboards using ICH8 as the manufacturers will have to add an additional chipset for PATA
support. (Manufacturers could simply choose not to have any PATA ports, but at present we feel
such a motherboard would be doomed to fail.)
Random Access Memory in Personal Computers
Dynamic RAM (DRAM) is a type of RAM that employs refresh circuitry to maintain data in its
logic circuits. Each memory cell in DRAM consists of a single transistor and a capacitor. The
capacitor is responsible for holding the electrical charge that designates a 1 bit; the absence of a
charge designates a logical 0. Capacitors lose their charge over time and therefore need to be
recharged or refreshed. A more expensive and faster type of RAM, Static RAM (SRAM), uses
between 4 and 6 transistors in a special ‘flip-flop’ circuit that maintains a 1 or 0 while the
computer system is operating. SRAM can be read or written to like DRAM. DRAM logic, on the
other hand, is refreshed several hundred times a second. To do this, the DRAM controller
logic merely reads the contents of each memory cell, and because of the way in which the cells are
constructed, the reading action simply refreshes the contents of the memory. This action puts the
‘dynamic’ into DRAM. However, refreshing takes time and increases the latency (the time from
a memory access request to the moment the data is output) of DRAM.
DRAM is used in all computers and associated devices for main system memory even though
DRAM is slower than SRAM, due to the operation of the refresh circuitry. DRAM is also used
because it is much cheaper and takes up less space—typically 25% of the silicon area of SRAM or
less. To build 256 MB of system memory from SRAM would be prohibitively expensive. However,
technological advances have led to faster and faster forms of DRAM, despite the disadvantages of
the refresh circuit. As indicated, DRAM modules are smaller and less expensive than SRAM
because the latter employs four to six transistors (or more) to store one single bit, as opposed to
DRAM’s single transistor (the switch) and capacitor (the 1/0 charge store). SRAM is mainly
employed in L1 and L2 cache memory on Intel Pentium CPUs and in L1, L2 and L3 cache in the
Itanium family. In 2004, Intel released Pentium IVs with an onboard L3 cache, and its L2 cache
reached a whopping 1 MB. All things being equal, the more cache on board a CPU package, the
larger the scale of integration, as the number of transistors runs into the millions. Itanium II
processors have up to 500 million transistors, the majority of which are in the L1, L2 and L3 caches.
DRAM technology involves very large scale integration (VLSI) using a silicon substrate which is
etched with the patterns that make the transistors and capacitors. Each unit of DRAM comes
packaged in an integrated circuit (IC). By 2003, DRAM technologies had evolved to the point
where several competing technologies existed; however, the older of these are slower (i.e. have a
higher latency) and offer fewer MB of storage per unit.
Ceteris paribus, adding more memory to a computer system increases its performance. Why is
this? Well, if the amount of RAM is insufficient to hold the processes and data required, the
operating system has to create a swap file on the hard disk, which is used to create virtual memory.
On average, it takes a CPU about 200 nanoseconds (ns) to access DRAM, compared to about
12,000,000 ns to access the hard drive. More RAM means less swapping, which speeds up a system.
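A quick back-of-the-envelope calculation, using the two figures just quoted, shows the scale of the penalty:

```python
dram_access_ns = 200            # typical DRAM access, as quoted above
disk_access_ns = 12_000_000     # ~12 ms hard-drive access, as quoted above

ratio = disk_access_ns // dram_access_ns
print(f"One disk access costs as much time as {ratio:,} DRAM accesses")
# -> One disk access costs as much time as 60,000 DRAM accesses
```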
Synchronous DRAM (SDRAM)
In late 1996, synchronous DRAM began to appear in computer systems. Unlike previous RAM
technologies, which used asynchronous memory access, SDRAM is synchronized with
the system clock and the CPU. The SDRAM interface is synchronized with the clocked operation
of the computer's system bus (i.e. clock pulses trigger the gates to open and close), and thus with
the processor. In the SDRAM module itself, clock pulses drive logic circuits that pipeline
incoming read/write commands. This allows the chip to have a more complex pattern of operation
than plain DRAM, which has no synchronizing control circuits. Pipelining means that the chip
can accept a new command before it has finished processing the previous one. In a pipelined
write, the write command can be immediately followed by another command without waiting for
the data to be written to the memory array. In a pipelined read, the requested data appears a fixed
number of clock pulses after the read command. This delay is called the latency, and it is a key
performance variable for all types of RAM. SDRAM also employs interleaving and burst mode
functions, which make memory retrieval even faster.
SDRAM dual inline memory modules (DIMMs, as opposed to the older single inline memory
modules, SIMMs) were available from numerous vendors at several different packing densities
(i.e. the number of MB on each DIMM) and speeds. The speed of SDRAM chips was closely
tied to the speed of the front-side bus, in order to synchronize with the operation of the
CPU. For example, PC66 SDRAM runs at 66 MHz, PC100 SDRAM runs at 100 MHz, PC133
SDRAM runs at 133 MHz, and so on. Faster SDRAM speeds such as 200 MHz and 266 MHz
appeared later.
Double Data Rate Synchronous SDRAM (DDR SDRAM)
DDR SDRAM was targeted at Intel’s 7th-generation Pentium CPUs. Its key innovation is that
the memory control logic gates switch on both the leading and trailing edges of a clock pulse,
rather than on just the leading edge as with normal gate operation. With typical SDRAM
technology, binary signals on the control, data and address portions of the system bus from the
Northbridge chip to the memory unit are transferred on the leading edge of the clock pulse that
opens the bus interface logic gates.
Until the advent of the Pentium IV CPU, bus speed was dictated by the system clock speed,
which ran at 100 MHz and 133 MHz on the Pentium III. The front-side bus (FSB) to the
Northbridge chip and the portion of the system bus from the Northbridge chip to the memory
chips ran at 100 MHz and 133 MHz. The Pentium IV Willamette, however, had an FSB speed of
400 MHz (the 100 MHz system clock was ‘quad pumped’ (x 4) to achieve this). With a data bus
width of 64 bits (8 bytes), this gives a data bandwidth of 3.2 gigabytes per second (400 MHz x
8 bytes = 3,200 MB/s or 3.2 GB/s). Note that data transfer rates within a computer are rated in
kilo-, mega-, or gigabytes per second due to the parallel method of transfer, while in a computer
network (client/server etc.) they are measured in bits per second (kilo-, mega- or gigabits per
second). Older SDRAM technologies (PC100 and PC133 SDRAM) operate at system bus speeds
and therefore constitute a bottleneck for Pentium IV systems. Hence the advent of DDR
technology, which helped alleviate the bottleneck, and Intel’s support for RAMBus DRAM,
which it felt was a better solution.
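The quad-pumped arithmetic can be sketched as follows (the function name is mine, and the figures are the ones quoted above):

```python
def fsb_bandwidth_mb_s(base_clock_mhz, pump_factor, bus_width_bytes):
    """Peak FSB bandwidth: base clock * transfers per cycle * bus width."""
    return base_clock_mhz * pump_factor * bus_width_bytes

# Pentium IV Willamette: 100 MHz clock, quad pumped, 64-bit (8-byte) bus
print(fsb_bandwidth_mb_s(100, 4, 8))   # 3200 MB/s, i.e. 3.2 GB/s
# Pentium III for comparison: 133 MHz clock, one transfer per cycle
print(fsb_bandwidth_mb_s(133, 1, 8))   # 1064 MB/s
```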
With DDR technology, special logic circuitry enables data to be transferred on the leading and
trailing edges of the ‘1’ clock pulse (remember each clock cycle consists of a ‘1’ followed by a
‘0’). Taking the data bus as an example, a clock pulse transition from the ‘0’ of the preceding cycle
to the ‘1’ of the next opens logic gates to allow 64 bits of data onto the data bus, while the
transition from ‘1’ to ‘0’ results in another 64 bits being switched onto the bus. Of course, the
gates on the Northbridge chip that serve the memory segment of the system bus open and close in
unison. This effectively doubles the speed of operation of SDRAM, hence the term
Double Data Rate. With DDR SDRAM, a 100 or 133 MHz system clock rate yields an effective
data rate of 200 MHz or 266 MHz when double switched. Newer designs (PC2700 etc.) are based
on DDR SDRAM running at 166 MHz, which is double switched to give an effective rate of
333 MHz.
Speed Ratings for DDR SDRAM
As indicated, in the past the speeds at which SDRAM chips operated were dictated by bus
speeds, so PC100 and PC133 SDRAM DIMMs operated on 100 and 133 MHz FSBs. DDR
SDRAM ratings are not based on clock speed, but on the maximum data bandwidth or throughput.




- DDR-200: DDR-SDRAM memory chips specified to operate at 100 MHz
- DDR-266: DDR-SDRAM memory chips specified to operate at 133 MHz
- DDR-333: DDR-SDRAM memory chips specified to operate at 166 MHz
- DDR-400: DDR-SDRAM memory chips specified to operate at 200 MHz
Hence, with a 100 MHz system bus on a Pentium IV Willamette system, the maximum data
bandwidth is 1,600 MB per second (100 x 2 x 8), or 1.6 GB/s. Hence, the industry designation
for DDR SDRAM DIMMs on 100 MHz systems is PC1600. Likewise, with a 133 MHz system
clock speed, the designation for DDR SDRAM that operates at 133 MHz is PC2100 (133 x 2 x 8
= 2,133 MB per second). The reason for this rating system lies with manufacturers’ marketing
strategies. For example, RAMBus DRAM RIMMs are designated PC800 because of the rate in
MHz at which the memory chips operate. This is the internal and external rate of operation to the
Northbridge chip; however, the data bus between memory and the Northbridge chip is a mere 16
bits wide. This gives a bandwidth of 1,600 MB/second (800 x 2 bytes), the same as DDR
(although the actual throughput of data is higher and latency lower in RAMBus DRAM RIMMs).
The manufacturers of DDR SDRAM were reluctant to badge their chips with smaller designations
(e.g. PC200 or PC266), as potential customers might not buy their chips even though the
difference in performance was negligible. Further advances in DDR SDRAM technology saw
DDR SDRAM-based Intel and VIA chipsets which accommodated PC2400 and PC2700 DDR
SDRAM running at 150 MHz and 166 MHz respectively, which is double clocked to 300 and
333 MHz (so-called DDR300 and DDR333). However, the evolution of DDR366 and new
chipset designs led to PC3000 DDR SDRAM being released with even higher bandwidth
speeds. New to the market is Dual Channel DDR400, with 3.2 GB/s (200 x 2 x 8) peak
bandwidth per channel.
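The PC-rating scheme maps directly onto the bandwidth formula; the sketch below (function name mine) shows why DDR at 100 MHz is badged PC1600:

```python
def ddr_peak_mb_s(clock_mhz, bus_width_bytes=8):
    """Peak DDR bandwidth in MB/s: clock * 2 (both clock edges) * bus width."""
    return clock_mhz * 2 * bus_width_bytes

for clock in (100, 133, 166, 200):
    # Marketing rounds the odd figures: 2128 -> PC2100, 2656 -> PC2700
    print(f"DDR-{clock * 2} at {clock} MHz -> ~PC{ddr_peak_mb_s(clock)}")
```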
Dual Channel DDR: Making DDR Perform Faster
The latest Pentium IV Intel 915/925 chipsets have an 800 MHz FSB memory bus, twice the
speed of 400 MHz DDR400; a single channel of DDR400 would therefore throttle the system
down to the memory's rate. Ideally, you would want to match the processor front
side bus and the memory bus, so DDR SDRAM running at 800 MHz would be an optimal
solution. However, the technical challenges of getting an 8 byte memory bus to operate at 800
MHz are not insignificant. The solution to this was to adopt a dual channel approach, just like that
in RAMBus memory technologies. Practically speaking, dual channel DDR400 requires two
DIMM slots and two modules. The architecture, while offering 6.4 GBytes/s of peak bandwidth,
simultaneously splits the back-and-forth signaling with the CPU. The signal from each channel
comes from one of two sockets in the chipset. In May 2002, Intel began to play catch-up with Via
and SiS by releasing Pentium IV chipsets with a 500 MHz FSB that supported DDR SDRAM.
This seemed to increase Intel's commitment to DDR technology. Remember that Intel had based its
first Pentium IV chipset on RDRAM, and the company took a long time before it considered adding
SDRAM, much less DDR, support. However, RAMBus was never popular with the computer
manufacturing industry, especially among JEDEC1 members and memory suppliers. Even so,
it came as some surprise when, at a recent Intel Developer Forum (IDF), the company
indicated its support for single- and dual-channel DDR400 in its new 800 MHz FSB chipsets.
More significant, however, was that Intel's memory roadmap for the future did not include
RAMBus memory.
DDR2 and DDR3
With a clock frequency of 100 MHz, single-data-rate SDRAM transfers data only on the rising edge
of the clock pulse, achieving an effective 100 MHz data transfer rate. Both DDR
and DDR2, however, are double-pumped: their logic transfers data on both the
rising and falling edges of the clock (which swings between 0.0 V and 2.5 V for DDR, or 1.8 V for DDR2). As indicated
above, this achieves an effective rate of 200 MHz (and a theoretical bandwidth of 1.6 GB/s) at
the same clock frequency. DDR operates both the internal logic circuits of the memory chip and
its I/O bus (to the Intel Northbridge Memory Controller Hub (MCH) or AMD Athlon 64 on-die memory
controller) at the same speed. DDR2 memory logic, in contrast, operates at half the speed of the I/O clock,
the I/O clock being the rate at which DDR2's interface logic drives the bus to the Northbridge
MCH. The internal logic and external I/O bus of DDR PC-3200 both operate at 200 MHz; in contrast,
the internal logic and memory circuits of DDR2 PC2-3200 operate at 100 MHz, while the I/O
interface switches at 200 MHz.
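The clock relationships just described can be sketched as follows (an illustrative helper, not part of any standard API):

```python
def ddr2_clock_domains(io_clock_mhz):
    """Return the three DDR2 clock domains for a given I/O clock:
    the internal memory array runs at half the I/O clock, while data
    is transferred on both edges of the I/O clock."""
    return {
        "internal_mhz": io_clock_mhz // 2,   # memory-cell logic
        "io_mhz": io_clock_mhz,              # bus to the Northbridge MCH
        "effective_mts": io_clock_mhz * 2,   # transfers per second
    }

# DDR2 PC2-3200: 200 MHz I/O clock -> 100 MHz internal, 400 MT/s effective
print(ddr2_clock_domains(200))
```

For plain DDR, by contrast, the internal and I/O clocks would be equal, with only the final doubling applied.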
1 Joint Electron Device Engineering Council (JEDEC) is the semiconductor engineering standardization
body of the Electronic Industries Alliance (EIA), a trade association that represents all areas of the
electronics industry.
DDR2's bus frequency is boosted by electrical interface improvements, on-die termination,
prefetch buffers and off-chip drivers. The trade-off is greater latency. To
compensate for the slower internal clock, the DDR2 prefetch buffer is 4 bits wide, whereas DDR's is just 2 bits wide
(one reason DDR has lower latency); DDR3's prefetch buffer is 8 bits wide, suggesting it will have
even greater latency. Another important feature of DDR2 is that it consumes less power:
the savings come primarily from an improved manufacturing process, which allows a
drop in operating voltage (1.8 V compared to DDR's 2.5 V).
Chip Specifications
- DDR2-400: runs at 100 MHz, I/O clock at 200 MHz, PC2-3200, 3.200 GB/s bandwidth
- DDR2-533: runs at 133 MHz, I/O clock at 266 MHz, PC2-4200, 4.267 GB/s bandwidth
- DDR2-667: runs at 166 MHz, I/O clock at 333 MHz, PC2-5300, 5.333 GB/s bandwidth
- DDR2-800: runs at 200 MHz, I/O clock at 400 MHz, PC2-6400, 6.400 GB/s bandwidth
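The bandwidth figures in the list above all follow from the effective transfer rate times the 8-byte bus width; a hypothetical helper to verify them:

```python
def ddr2_bandwidth_gbs(effective_rate_mts):
    """Peak bandwidth in GB/s for a standard 64-bit (8-byte) DDR2 module.
    Nominal rates like 533 are rounded from 533.33 MT/s, so the printed
    values differ very slightly from the official figures."""
    return effective_rate_mts * 8 / 1000

for rate in (400, 533, 667, 800):
    print(f"DDR2-{rate}: {ddr2_bandwidth_gbs(rate):.3f} GB/s")
```

DDR2-400 and DDR2-800 come out exactly as 3.200 and 6.400 GB/s; the 533 and 667 grades land within a few MB/s of the quoted 4.267 and 5.333 GB/s because of the rounding noted in the comment.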
DDR3 SDRAM (Double Data Rate 3 Synchronous Dynamic Random Access Memory)
DDR3 SDRAM comes with the promise of a 40% reduction in power consumption compared to
current commercial DDR2 modules, thanks to DDR3's 90 nm fabrication technology, which allows
lower operating currents and voltages (1.5 V, compared to DDR2's 1.8 V or DDR's 2.5 V). "Dual-gate" transistors will be used to reduce current leakage. DDR3's prefetch buffer is 8 bits wide,
whereas DDR2's is 4 bits and DDR's is 2 bits. Theoretically, these modules could transfer data at
an I/O clock of 400-800 MHz (an effective rate of 800-1600 MT/s),
compared to DDR2's current range of 200-533 MHz (400-1066 MT/s) or DDR's range of 100-300
MHz (200-600 MT/s). To date, such bandwidth requirements have existed mainly in the graphics
market, where vast amounts of data must be moved to and from framebuffers. Intel has
preliminarily announced that it expects to be able to offer support near the end of 2007.
AMD's roadmap indicates its own adoption of DDR3 to come in 2008.
- DDR3-800: runs at 100 MHz, I/O clock at 400 MHz
- DDR3-1066: runs at 133 MHz, I/O clock at 533 MHz
- DDR3-1333: runs at 166 MHz, I/O clock at 667 MHz
Graphics Double Data Rate 3 (GDDR3) is a graphics card-specific memory technology designed by ATI
Technologies.
GDDR3 memory might sound similar to DDR3 but is more like DDR2: it has been in use for
several years in high-end graphics cards such as those from NVIDIA and ATI Technologies, and as
main system memory in the Xbox 360. It is sometimes incorrectly referred to as "DDR3". It has
much the same technological base as DDR2, but its power and heat-dispersal requirements have
been reduced somewhat, allowing for higher-speed memory modules and simplified cooling
systems. Unlike the DDR2 used on graphics cards, GDDR3 is unrelated to the upcoming JEDEC
DDR3 specification. This memory uses internal terminators, enabling it to better handle certain
graphics demands. To improve bandwidth, GDDR3 memory transfers 4 bits of data per pin over 2
clock cycles.
RAMBus DRAM
While DDR SDRAM is relatively new, its major competitor, RAMBus DRAM (RDRAM), has
been around for some time. Intel's support for RAMBus DRAM, as shown by its collaboration
with Rambus Inc., its efforts to develop chipset support for the standard, and its promotion
of the combination as a de facto standard to PC manufacturers, suggested at the time that the future
of DRAM was RDRAM. RDRAM is a proprietary technology developed by Rambus Inc. in collaboration
with Intel. However, Intel had problems with its initial chipset designs, and the emergence
of DDR support among Intel's major competitors in the CPU and chipset markets caused
Intel to develop a chipset (the i845) to support DDR. Then there was the relative cost of DDR
SDRAM and RDRAM, the latter being prohibitively more expensive until 2002/2003, when
the prices of DDR SDRAM DIMMs and RAMBus DRAM RIMMs became more or less equal, though
not those of the PC systems of which they were a part.
There are three types of RAMBus DRAM: Base RAMBus, Concurrent RAMBus and Direct
RAMBus. Direct RAMBus is the newest DRAM architecture and interface standard, and it
challenges traditional main memory designs. Direct RAMBus transfers data at speeds up to and
over 800 MHz over a 16-bit bus called a Direct RAMBus Channel (later versions use an 18-bit
bus). PC600 RDRAM therefore delivers a peak bandwidth of 1,200 MB/s, while
PC800 delivers 1,600 MB/s. Advances in RAMBus technology have seen new
designations such as PC1066 and PC1200, with concomitant increases in bandwidth.
- PC600: 16-bit, single-channel RIMM, specified to operate at 300 MHz clock speed, 1200 MB/s bandwidth
- PC700: 16-bit, single-channel RIMM, specified to operate at 355 MHz clock speed, 1420 MB/s bandwidth
- PC800: 16-bit, single-channel RIMM, specified to operate at 400 MHz clock speed, 1600 MB/s bandwidth
- PC1066 (RIMM 2100): 16-bit, single-channel RIMM, specified to operate at 533 MHz clock speed, 2133 MB/s bandwidth
- PC1200 (RIMM 2400): 16-bit, single-channel RIMM, specified to operate at 600 MHz clock speed, 2400 MB/s bandwidth
- RIMM 3200: 32-bit, dual-channel RIMM, specified to operate at 400 MHz clock speed, 3200 MB/s bandwidth
- RIMM 4200: 32-bit, dual-channel RIMM, specified to operate at 533 MHz clock speed, 4200 MB/s bandwidth
- RIMM 4800: 32-bit, dual-channel RIMM, specified to operate at 600 MHz clock speed, 4800 MB/s bandwidth
- RIMM 6400: 32-bit, dual-channel RIMM, specified to operate at 800 MHz clock speed, 6400 MB/s bandwidth
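Every figure in the RIMM list reduces to the same formula: clock speed, times two transfers per cycle, times the channel width in bytes. A small illustrative check (the function name is ours, not a real API):

```python
def rdram_bandwidth_mbs(clock_mhz, bus_width_bits=16):
    """Peak RDRAM bandwidth in MB/s: double data rate over a narrow
    16-bit channel (32 bits for dual-channel RIMMs)."""
    return clock_mhz * 2 * bus_width_bits // 8

print(rdram_bandwidth_mbs(400))       # PC800:     1600 MB/s
print(rdram_bandwidth_mbs(600))       # PC1200:    2400 MB/s
print(rdram_bandwidth_mbs(400, 32))   # RIMM 3200: 3200 MB/s
print(rdram_bandwidth_mbs(800, 32))   # RIMM 6400: 6400 MB/s
```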
However, there is a problem when it comes to RAMBus DRAM. While PC800 RAMBus DRAM
delivers the same peak throughput as PC1600 DDR SDRAM, the efficiency of RDRAM is on the order
of 80% or more, whereas DDR SDRAM delivers between 40% and 70% depending on system
and application. This might seem to give RAMBus the advantage; however, the latency
of RAMBus DRAM is higher, and it increases with every RIMM installed. Remember that latency
refers to the time that elapses between the CPU requesting (addressing) instructions or data from
RAM and receiving them. PC-800 RDRAM operated with a latency of 45 ns, compared to
only 7.5 ns for PC-133 SDRAM. RDRAM memory chips also put out significantly more heat
than SDRAM chips, necessitating heatspreaders on all RIMM devices. RDRAM includes a
memory controller on each memory chip, significantly increasing manufacturing complexity
compared to SDRAM, which used a single memory controller located on the Northbridge chipset.
RDRAM was also two to three times the price of PC-133 SDRAM, due to a combination of high
manufacturing costs and high license fees. PC-2100 DDR SDRAM, introduced
in 2000, operated at a clock speed of 133 MHz and delivered 2100 MB/s over a 64-bit bus
using a 184-pin DIMM form factor.
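The efficiency figures above can be combined with peak bandwidth to estimate sustained throughput; a rough, illustrative calculation using the percentages quoted in the text:

```python
def sustained_mbs(peak_mbs, efficiency):
    """Rough sustained throughput: peak bandwidth scaled by bus efficiency."""
    return peak_mbs * efficiency

# PC800 RDRAM at ~80% efficiency vs PC1600 DDR at 40-70%
print(sustained_mbs(1600, 0.80))   # RDRAM:          ~1280 MB/s
print(sustained_mbs(1600, 0.40))   # DDR worst case:  ~640 MB/s
print(sustained_mbs(1600, 0.70))   # DDR best case:  ~1120 MB/s
```

On throughput alone RDRAM wins, which is why the latency penalty discussed next is the deciding factor.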
So, what is the net effect of higher efficiency and throughput but higher latency? In 2003 the jury
was still out, but DDR SDRAM then remained the best option for most PC systems, except in
high-end solutions where performance, whatever the cost, is the goal. The reason was the
evolution of even higher DRAM-to-Northbridge bandwidths in PC2700 and PC3000 DDR
SDRAM. With the introduction of the i840 chipset, Intel added support for dual-channel PC-800
RDRAM, doubling bandwidth to 3200 MB/s by increasing the bus width to 32 bits. This was
followed in 2002 by the i850E chipset, which introduced PC-1066 RDRAM, increasing total
dual-channel bandwidth to 4200 MB/s. Later in 2002, Intel released the E7205 "Granite Bay"
chipset, which introduced dual-channel DDR support for a total bandwidth of 4200 MB/s, but at a
much lower latency than competing RDRAM.