Lecture 4: Workstation Hardware Part 2: Intel and AMD Chipsets (2006)

Basic Differences in Processor/Chipset Designs: FSB vs. Direct Connect and HyperTransport Technologies

What is the difference between AMD and Intel processors? The answer lies in AMD's Direct Connect Architecture and its use of HyperTransport Technology (HTT). A main function of the processor is to read and write information to and from memory as quickly and efficiently as possible. When developing the AMD Opteron and AMD Athlon 64 processors, AMD's engineers needed a method to deliver data efficiently to the processor from main memory, even under the large data loads of 64-bit and multi-core computing. To keep all cores as productive as possible, they looked to processor designs outside of what had been used for 32-bit x86-based processors. The integrated memory controller, one of the key innovations of the Direct Connect Architecture, has its roots in the architecture of RISC processors. The benefit of an integrated memory controller for AMD64 processors, along with their power efficiency, resulted in AMD's performance-per-watt leadership; however, Intel's Core 2 Duo architecture has knocked AMD off that perch, for now.

In traditional x86 system design, a front side bus (FSB) connects the CPU to the main memory of the system, and a separate memory controller chip is responsible for coordinating the data traffic as it passes between memory and the CPU. In AMD64 and RISC processor designs, however, the memory controller is integrated onto the processor, which reduces the time it takes for data to move from one component to another. For example, until the middle of this year, when the DDR2 Direct Connect bus was introduced, an AMD Athlon 64 accessed RAM by transferring data on both the leading and trailing edges of the clock pulse, giving a maximum transfer rate to RAM of 3,200 MB/s, the same maximum rate as DDR400/PC3200 memory.
It still does this, but now has two channels to DDR2. Thus the Athlon 64 (X2, Opteron, etc.) has two external buses: the first accesses RAM (either a single channel to DDR or dual channels to DDR2), and the second accesses the chipset (typically the Northbridge). This second bus is called the HyperTransport bus. Theoretically this architecture is better than Intel's FSB/chipset architecture. Like Intel's FSB, the HT bus is used to communicate with the Northbridge chip (which acts as a gateway to the video card and to the slower devices off the Southbridge chip); however, it is faster and can easily be extended in both bus width and speed. In theory, the Athlon 64 can communicate with memory and with the other circuits of the computer at the same time. Another advantage of HyperTransport is that it has one path for the transmission of data and a second for its reception: it is thus two-way or full-duplex. In the traditional architecture used by other processors, a single path is used both for transmission and for reception of data. In theory, the Athlon 64 can therefore transmit data to and receive data from the chipset at the same time. Thus the HyperTransport bus works at 3,200 MB/s in each direction; that is why it is sometimes listed as a 6,400 MB/s bus, which is not true, and is one of the major misunderstandings among computer practitioners. In brief, it is as if people claimed that a motorway had a speed limit of 130 kph just because there is a speed limit of 65 kph in each direction. It is still a significant advantage over Intel's FSB, as it permits the processor(s) to send data (e.g. to the video card) and receive data (from the Internet/DVD/HDD) simultaneously. A typical HTT (version 1.1) bus works at 800 MHz, with 2 bytes of data transferred on both the leading and trailing edges of the clock pulse: in terms of speed this equates to an 800 * 2 = 1,600 MHz bus transferring 2 bytes of data per clock pulse.
In effect the bandwidth is calculated by multiplying 800 MHz * 2 (data switched on both the leading and trailing edges) * 2 bytes, giving a bandwidth of 3,200 MB/s. This calculation leads to another misunderstanding: describing the external bus of the Athlon 64 as a 1,600 MHz bus. For comparison, the older AGP 8x bus delivers roughly 2.1 GB/s; a 3.2 GB/s HyperTransport link therefore comfortably exceeds it, leaving bandwidth to spare for the remaining I/O devices. However, the emergence of PCI Express, with its higher bandwidth (especially for video I/O), meant that HTT would have to provide higher bandwidth too. All this means that HyperTransport Technology had to evolve: from HTT 1.1 to 2.0 and now to 3.0. It is interesting that within the HyperTransport system, performance is nominally measured in billions of transfers per second, or gigatransfers (GT/s), not gigabytes. That is because each interconnect or bus can range in width from two to 32 bits. Furthermore, as indicated, it is also a "Double Data Rate" or DDR technology, meaning nodes send data on both the rising and falling edges of the clock signal. Thus the transfer rate of the original HyperTransport 1.1 link is 1.6 gigatransfers/second (800 MHz * 2); if the bus width is 16 bits (2 bytes), this translates to 3.2 GB/s of data per direction. HyperTransport 2.0 adds three new speed grades: 2.0, 2.4, and 2.8 gigatransfers per second, with clock rates of between 1.0 GHz and 1.4 GHz. So a 1 GHz HyperTransport bus with a 16-bit width will deliver a bandwidth of 2 GT/s * 2 bytes = 4 GB/s (1 GHz * 2 (DDR) * 2 bytes (16 bits)). Thus, with a 2-byte bus, HyperTransport 2.0 speeds of 2.0, 2.4 and 2.8 GT/s translate to 4.0, 4.8 or 5.6 GB/s. HyperTransport is primarily a chip-to-chip interconnect, so an important element of its design is its bridge capabilities to board-level bus systems such as AGP, PCI, PCI-X and PCI Express.
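The HyperTransport arithmetic above can be condensed into a few lines. This is a minimal sketch for illustration only; the function name is ours, not part of any real API.

```python
# Sketch of the HyperTransport per-direction bandwidth arithmetic:
# clock (MHz) * 2 (data on both clock edges) * link width in bytes.
def ht_bandwidth_mb_s(clock_mhz, width_bits):
    """Return per-direction bandwidth in MB/s."""
    transfers = clock_mhz * 2            # DDR: two transfers per clock cycle
    return transfers * (width_bits // 8)  # multiply by link width in bytes

# HTT 1.1: 800 MHz clock, 16-bit link -> 3,200 MB/s per direction
print(ht_bandwidth_mb_s(800, 16))   # 3200
# HyperTransport 2.0 at 1.0 GHz (2.0 GT/s), 16-bit -> 4,000 MB/s per direction
print(ht_bandwidth_mb_s(1000, 16))  # 4000
```

Remember that these figures are per direction; a full-duplex link moves this much each way simultaneously, which is the source of the doubled "6,400 MB/s" marketing figure criticised above.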
That is required for providing a HyperTransport-based system board with access to the huge array of I/O devices, ranging from AGP desktop graphics cards to Ethernet, Fibre Channel and SCSI adapters. Each HyperTransport link consists of a host device and an endpoint; any of those devices may be a bridge to one of those slower board-level buses. The original HyperTransport specification defined the PCI and AGP bridges. HyperTransport 1.05 created a bridge to PCI-X; HyperTransport 2.0, which appeared in 2004, added the PCI Express mappings and appropriate bridge technology.

PCI Express

Lanes   Bandwidth per stream   Bandwidth, duplex
1       256 MB/s               512 MB/s
2       512 MB/s               1 GB/s
4       1 GB/s                 2 GB/s
8       2 GB/s                 4 GB/s
16      4 GB/s                 8 GB/s
32      8 GB/s                 16 GB/s

Basic PCI Express x1 buses, which are single channel, permit a 256 MB/s transfer rate. Each channel or lane of a PCI Express connection contains two pairs of wires: one to send and one to receive. Packets of data move across the lane at a rate of one bit per cycle. An x1 connection, the smallest PCI Express bus, has one lane made up of four wires and carries one bit per cycle in each direction. An x2 link contains eight wires and transmits two bits at once, an x4 link transmits four bits, and so on. Other configurations are x12, x16 and x32. For example, the x16 bus (16 lanes) used for video boards has a bandwidth of 4 GB/s (16 * 256 MB/s) in a single direction, with 8 GB/s possible in both directions. PCIe transfers data at 250 MB/s per lane up to a maximum of 32 lanes, a total combined transfer rate of 8 GB/s. It must be noted that PCIe is able to transfer data in both directions at once (full-duplex). This effectively doubles the data transfer rate, allowing 500 MB/s per lane and a total combined transfer rate of 16 GB/s when 32 lanes are employed.
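The PCI Express table above follows the same pattern. A small sketch, using the table's 256 MB/s-per-lane convention (the text also quotes 250 MB/s; both conventions appear in practice):

```python
# Sketch of the PCI Express bandwidth table: per-lane rate scaled by lane
# count, doubled for full-duplex operation.
PER_LANE_MB_S = 256  # MB/s per lane, per direction, as in the table above

def pcie_bandwidth(lanes):
    """Return (per-direction, full-duplex) bandwidth in MB/s."""
    one_way = lanes * PER_LANE_MB_S
    return one_way, one_way * 2

print(pcie_bandwidth(1))   # (256, 512)
print(pcie_bandwidth(16))  # (4096, 8192): x16 gives 4 GB/s each way, 8 GB/s duplex
```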
AMD Athlon and Opteron Family Chipsets: 2004-2006

Figure 1 introduces the basic Athlon 64 chipset architecture using a VIA K8 Series chipset, introduced in 2004. This architecture was used for all AMD Athlon 64 processors prior to the release of the X2 dual-core and socket AM2 processors. Similar chipsets were released by NVIDIA and other manufacturers.

Figure 1 Typical AMD Athlon 64 Platform

The VIA K8 Series supports AMD Athlon 64 processor-based systems and had a 90% market share in 2004. The VIA K8T890 chipset enables 64-bit capable performance desktop PCs and Media Centers with graphics capabilities for gaming and digital video editing. VIA's HyperTransport implementation enables a full 16-bit/1000 MHz (DDR), or 4 GB/s, high-bandwidth link in each direction between processor and chipset for the AMD Athlon 64 platform (the previous K8 Series supported 16-bit/800 MHz (DDR), or 3.2 GB/s): this ensures that AMD Athlon 64 processor-based systems can achieve their full performance potential. Note that this member of the K8 Series supports PCI Express x16 graphics cards. The chipset's V-MAP architecture supports the Ultra V-Link Northbridge/Southbridge interconnect with 1 GB/s throughput (V-Link, supporting a 533 MB/s Southbridge/Northbridge interconnect, is also an option). From here on down it is dissimilar to Intel's chipsets: VIA's DriveStation™ Advanced Controller Suite supports a high-performance 150 MB/s dual-channel Serial ATA interface (note: the bus bandwidth here is equivalent to 1.5 Gb/s, or gigabits per second). There is also extended support for two additional Serial ATA devices through the SATAlite™ interface, and a V-RAID serial RAID controller supporting RAID 0, RAID 1, and RAID 0+1.

Figure 2 VIA Chipset RAID Support

There is also support for Parallel ATA-133 and up to four PATA devices. In addition, the Southbridge includes the VIA Advanced Connectivity Controller Suite, with a USB 2.0 controller supporting 8 USB 2.0/1.1 ports.
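The two SATA figures quoted above (150 MB/s and 1.5 Gb/s) are the same speed seen through different units: SATA uses 8b/10b encoding, which spends 10 line bits for every data byte. A quick check:

```python
# Reconciling the SATA-150 figures: 1.5 Gb/s serial line rate, 8b/10b
# encoding (10 line bits carry each 8-bit data byte), so the payload
# rate is 1.5e9 / 10 bits-per-byte = 150 MB/s.
line_rate_bits = 1.5e9        # 1.5 Gb/s line rate
bits_per_byte_on_wire = 10    # 8b/10b encoding overhead
payload_mb_s = line_rate_bits / bits_per_byte_on_wire / 1e6
print(payload_mb_s)  # 150.0
```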
Other Southbridge features include an enterprise-class 10/100 Mbps Fast Ethernet MAC network controller; PCI Express and LPC bus controllers; and VIA Vinyl™ audio: either VIA Vinyl integrated 5.1 surround sound AC'97 audio with the VIA Six-TRAC codec, or VIA Vinyl Gold onboard 7.1 surround sound with 24-bit/96 kHz resolution audio (VIA Envy24PT + VIA Six-TRAC codec + additional DAC). While VIA was the market leader in 2004 for AMD-based chipsets, that situation had changed by 2006, when the NVIDIA nForce4 obtained a sizable chunk of the market for high-end enthusiast motherboards. ATI followed suit with its Radeon Xpress 200 CrossFire Edition chipset. VIA had no immediate answer for the nForce4: the K8T890 Northbridge wasn't bad, but it couldn't drive a pair of PCI Express graphics slots with eight lanes going to each slot, so its potential for multi-GPU use was limited. This situation changed with the release of the K8T900 (see below); however, NVIDIA is now the undoubted leader, and the following section explains why.

AMD Chipsets 2006+: The NVIDIA nForce 500 Platform

For the launch of socket AM2, NVIDIA is providing four new chipset offerings, dependent upon the market sector. With this product introduction NVIDIA is launching two new chips, the C51XE and the MCP55PXE, that form the basis for the four models. A quick summary of the new product choices can be found in the following table. The new socket AM2 (940) motherboards cost slightly more than their socket 939 counterparts, at least initially, but long-term they should settle at basically the same prices.

NVIDIA Chipset Breakdown

Market Segment          Socket 939       Socket AM2
High-End Enthusiast     nForce4 SLI X16  nForce 590 SLI
Mainstream Enthusiast   nForce4 SLI      nForce 570 SLI
Performance Mainstream  nForce4 Ultra    nForce 570 Ultra
Value Mainstream        nForce4 4X       nForce 550

At the top of the product offering, the nForce 590 SLI consists of two chips, the C51XE SPP and the MCP55PXE. This solution offers dual x16 PCI-E links for multiple graphics card configurations.
While other features have changed, the overall design is very similar to the nForce4 SLI X16. One step down from the nForce 590 SLI is the nForce 570 SLI. This is a single-chip solution, providing two x8 links for multiple graphics cards. A total of 28 lanes are available, up from the 20 lanes on the nForce4 SLI chipset. The 570 chipset also lacks the LinkBoost technology but is otherwise the same as the 590. The nForce 570 Ultra comes next, with a drop to 20 total PCI-E lanes. If the name didn't clue you in already, the Ultra also drops support for SLI; that is, there is only support for one PCI Express x16 GPU. Basically, this is the "performance mainstream" offering, targeting users who are only interested in running single graphics cards.

Specification          nForce 590 SLI   nForce4 SLI x16   nForce 570 SLI   nForce 570 Ultra   nForce 550
Segment                Enthusiast       Enthusiast        Performance      Performance        Mainstream
                       SLI (2x16)       SLI (2x16)        SLI (2x8)
CPU Suggestion         Athlon 64 FX,    Athlon 64 FX,     Athlon 64 FX,    Athlon 64 FX,      Athlon 64,
                       Athlon 64        Athlon 64         Athlon 64        Athlon 64          Sempron
SLI Technology         Yes              Yes               Yes              No                 No
NVIDIA LinkBoost       Yes              No                No               No                 No
NVIDIA FirstPacket     Yes              No                Yes              Yes                No
NVIDIA DualNet         Yes              No                Yes              Yes                No
Gigabit Connections    2                2                 2                2                  1
Teaming                Yes              No                Yes              Yes                No
TCP/IP Acceleration    Yes              Yes (ActiveArmor) Yes              Yes                No
MediaShield            Yes              Yes               Yes              Yes                Yes
SATA / PATA Drives     6 SATA, 2 PATA   4 SATA, 4 PATA    6 SATA, 2 PATA   6 SATA, 2 PATA     4 SATA, 2 PATA
RAID                   0, 1, 0+1, 5     0, 1, 0+1         0, 1, 0+1, 5     0, 1, 0+1, 5       0, 1, 0+1
NVIDIA nTune 5         Yes              No                Yes              Yes                Yes
PCI Express Lanes      46               38                28               20                 20
Links                  9                8                 6                5                  5
USB Ports              10               10                10               10                 10
PCI Slots Supported    5                5                 5                5                  5
Audio                  Azalia           AC'97             Azalia           Azalia             Azalia

Finally, the nForce 550 is the "value mainstream" product, taking over from the nForce4 4X.
Several of the higher-end options have been dropped from the 550 chipset, including support for dual Ethernet controllers, the FirstPacket technology, TCP/IP acceleration, and RAID 5. The number of natively supported SATA ports has also been reduced from six down to four. Whereas the other three chipsets are recommended for Athlon 64/FX/X2 users, the nForce 550 is recommended for Athlon 64 (single-core) and Sempron users. NVIDIA applies the same chipset technology to Intel platforms as well. Figure 3 shows a chipset released in 2005 to support the previous generation (i.e. before the Core 2 Duo family). Note the similarities between this and the AMD-based chipsets, and the use of HyperTransport for the Northbridge/Southbridge link.

Figure 3 NVIDIA Chipsets for Intel with HTT

Cutting to the Chase: Chipsets for Intel Processors in 2006

The NVIDIA nForce 590 SLI

NVIDIA's flagship nForce 590 SLI chipset has the ability to let a Core 2 Duo processor reach its maximum potential while still maintaining the low noise and power consumption benefits of this impressive processor series. At the top of the product offering, the nForce 590 SLI consists of two chips, the 590 SLI SPP and the 590 SLI MCP. This solution offers dual x16 PCI-E links for multiple graphics card configurations. While other features have changed, the overall design is very similar to the nForce4 Intel SLI X16. The total number of PCI-E lanes is now 48, with 18 lanes coming from the SPP. NVIDIA has the only chipsets in the Intel processor range that fully support SLI technology.
NVIDIA nForce 500 Series MCPs for Intel

Chipset                  nForce 590 SLI                     nForce 570 SLI
Segment                  Enthusiast SLI                     Performance SLI
CPU                      Intel Socket 775: Core 2 Extreme,  Intel Socket 775: Core 2 Extreme,
                         Core 2 Duo, Pentium D 9XX,         Core 2 Duo, Pentium D 9XX,
                         Pentium D 8XX, Pentium 4           Pentium D 8XX, Pentium 4, Celeron D
NVIDIA SLI Technology    Yes - 2 x16                        Yes - 1 x16, 2 x8
FSB (MHz)                1066, 800, 533                     1066, 800, 533
DDR2 Memory (MHz)        667+                               667+
PCI-E Lanes              48                                 20
PCI-E Links              9                                  5
Configuration            16, 16, 8/4/2/1, 4/2/1, etc.       16/8, 8, 4/2/1, etc.
SATA / PATA drives       6, 2                               4, 4
SATA speed               3 Gb/s                             3 Gb/s
RAID                     0, 1, 0+1, 5                       0, 1, 0+1, 5
MediaShield Storage      Yes                                Yes
Native Gigabit Ethernet  2                                  1
FirstPacket Technology   Yes                                Yes
DualNet Technology       Yes                                No
Teaming                  Yes                                No
TCP/IP Acceleration      Yes                                No
nTune Utility            Yes                                Yes
USB ports                10                                 8
PCI Slots                5                                  5
Audio                    HDA (Azalia)                       HDA (Azalia)

The nForce 590 SLI chipset is the most powerful. While the nForce 590 SLI doesn't list Celeron D support, unofficially at least it should work. Intel didn't officially want its 975X chipset to support 533 MHz FSB processors, but a few motherboard manufacturers disagreed on this point, and the end result is that 975X motherboards are able to run Celeron D chips. A similar situation applies with the nForce 590 SLI. Not that putting a budget processor in a high-end motherboard is recommended, but at least you can if you so desire. The second point concerns lane configurations. It's difficult to properly convey all of the options available with a simple features table, so here's a lengthier explanation. The combination of PCI Express lanes and slots can be tweaked according to individual manufacturer desires. In SLI mode, the 590 will always provide two x16 slots with x16 bandwidth and the 570 will provide two x16 slots with x8 bandwidth. Beyond that, many potential configurations exist.
590 SLI motherboards could offer two more X16 slots, but due to the lane configuration (30 lanes from the Northbridge and 18 from the Southbridge) the slots will be limited to a maximum of one X8 data connection and one X4 data connection. That would give motherboards a total of four X16 slots with varying bandwidth offered to each slot (2 X16, 1 X8, 1 X4). Considering the layout of expansion slots on ATX/BTX motherboards, this would be our ideal configuration, and the remaining expansion slots can be filled out with either X1/X2 PCI-E or regular PCI connectors. There really does not seem to be much point in including X1 physical slots, particularly on enthusiast level hardware, and ATI at least has already recommended that motherboard manufacturers begin including more X16 physical connectors. The situation is similar with nForce 570 SLI, only with a lower number of total available lanes and links. The ideal configuration for 570 SLI motherboards seems to be three X16 connections, two with X8 links and the third with an X4 link; another option would be to provide two X2 links. Naturally, it is important to provide spacing between the X16 slots so that dual slot GPUs can be used without blocking access to the other slots.

The 975X and 965 Intel Chipsets

There are only two high-end core logic products for socket 775 available: Nvidia's nForce 590 SLI chipset, and Intel's 975X. Both support all current processors, and high-end motherboards come with comparable feature sets. You need the Nvidia chipset to install an Nvidia SLI dual graphics solution, or the 975X to run an ATI Crossfire dual graphics setup. (It's ironic to see an ATI setup running on an Intel system these days, because the Canadian graphics company was recently purchased by AMD.) The 975X is Intel's state-of-the-art core logic for high-end desktop and entry-level workstation systems.
It supports all socket 775 Pentium 4, Pentium D, Core 2 Solo and Core 2 Duo processors at FSB800 and FSB1066 speeds, and it comes with a dual-channel DDR2-800 memory controller. As already mentioned, the chipset will support two graphics cards, each running at PCI Express x8 link speed, or the graphics interface can be configured to support a single graphics card at x16 link speed. For device connectivity, Intel uses its well-known ICH7R Southbridge component, which supports High Definition audio, eight USB 2.0 ports, six x1 PCI Express lanes, a conventional PCI bus and four Serial ATA/300 ports for hard drives and modern optical drives. There is also one UltraATA/100 channel for your legacy devices, and Intel has implemented its Matrix Storage technology, which allows you to install multiple RAID sets across a common set of hard drives.

Figure 4 Intel's 975X Chipset

Figure 5 Intel Core 2 Duo Chipset

Intel released the P965 Express MCH and ICH8/R chipsets in 2006. The major features of the P965 MCH include Intel's new Fast Memory Access technology, 1066 MHz front side bus support, 800 MHz DDR2 memory support, and full support for the new Core 2 Duo processor lineup. The integrated graphics versions, G965 and Q965, will ship with the new GMA X3000 graphics engine for the Viiv and corporate markets respectively. The new ICH8/R chipsets offer ten USB 2.0 ports, up to six 3 Gb/s SATA ports, Intel's new Quiet System Technology, and the removal of Parallel ATA support. While one could argue that the removal of PATA support is a needed step forward in technology, the industry believes that Intel should have waited until the next-generation ICH for this change. The optical drive market is still about 98% PATA-based and does not seem to be changing anytime soon. While this development might spur the optical drive suppliers into offering additional SATA drives in the near future, it does not address the requirements of the current PATA installed base.
This most likely means there will be additional costs and complexity on motherboards using the ICH8, as manufacturers will have to add an additional chip for PATA support. (Manufacturers could simply choose not to have any PATA ports, but at present such a motherboard would seem doomed to fail.)

Random Access Memory in Personal Computers

Dynamic RAM (DRAM) is a type of RAM that employs refresh circuits to maintain data in its logic circuits. Each memory cell in DRAM consists of a single transistor and a capacitor. The capacitor is responsible for holding the electrical charge that designates a 1 bit; the absence of a charge designates a logical 0. Capacitors lose their charge over time and therefore need to be recharged or refreshed. A more expensive and faster type of RAM, static RAM (SRAM), uses between 4 and 6 transistors in a special 'flip-flop' circuit that maintains a 1 or 0 while the computer system is operating. SRAM can be read from or written to like DRAM. DRAM logic, on the other hand, is refreshed several hundred times a second. To do this, the DRAM controller logic merely reads the contents of each memory cell, and because of the way in which the cells are constructed, the reading action itself refreshes the contents of the memory. This action puts the 'dynamic' into DRAM. However, refreshing takes time and increases the latency (the time from a memory access request to the moment the data is output) of DRAM. DRAM is used in all computers and associated devices for main system memory even though DRAM is slower than SRAM, due to the operation of the refresh circuitry. DRAM is used because it is much cheaper and takes up less space: typically 25% of the silicon area of SRAM or less. To build 256 MB of system memory from SRAM would be prohibitively expensive. However, technological advances have led to faster and faster forms of DRAM, despite the disadvantages of the refresh circuit.
As indicated, DRAM modules are smaller and less expensive than SRAM because the latter employs four to six transistors (or more) to store one single bit, as opposed to DRAM's single transistor (the switch) and capacitor (the 1/0 charge store). SRAM is mainly employed in L1 and L2 cache memory on Intel Pentium CPUs and in L1, L2 and L3 cache in the Itanium family. In 2004, Intel released Pentium IVs with an onboard L3 cache, and L2 cache reached a whopping 1 MB. All things being equal, the more cache on board a CPU package, the larger the scale of integration, as the number of transistors runs into the millions. Itanium II processors have up to 500 million transistors, the majority of which are in the L1, L2 and L3 caches. DRAM technology involves very large scale integration (VLSI) using a silicon substrate which is etched with the patterns that make the transistors and capacitors. Each unit of DRAM comes packaged in an integrated circuit (IC). By 2003, DRAM technology had evolved to the point where several competing technologies existed; however, the older of these are slower (i.e. have a higher latency) and offer fewer MB of storage per unit. Ceteris paribus, adding more memory to a computer system increases its performance. Why is this? Well, if the amount of RAM is insufficient to hold the processes and data required, the operating system has to create a swap file on the hard disk, which is used to create virtual memory. On average, it takes a CPU about 200 nanoseconds (ns) to access DRAM, compared to 12,000,000 ns to access the hard drive. More RAM means less swapping, which speeds up a system.

Synchronous DRAM (SDRAM)

In late 1996, synchronous DRAM began to appear in computer systems. Unlike previous RAM technology, which used asynchronous memory access techniques, SDRAM is synchronized with the system clock and the CPU. The SDRAM interface is synchronized with the clocked operation of the computer's system bus (i.e.
clock pulses trigger the gates to open and close), and thus with the processor. In the SDRAM module itself, clock pulses drive logic circuits that pipeline incoming read/write commands. This allows the chip to have a more complex pattern of operation than plain DRAM, which has no synchronizing control circuits. Pipelining means that the chip can accept a new command before it has finished processing the previous one. In a pipelined write, the write command can be immediately followed by another command without waiting for the data to be written to the memory array. In a pipelined read, the requested data appears a fixed number of clock pulses after the read command. This delay is called the latency, and it is a key performance variable for all types of RAM. SDRAM also employs interleaving and burst mode functions, which make memory retrieval even faster. SDRAM dual inline memory modules (DIMMs, as opposed to the older single inline memory modules, SIMMs) were available from numerous vendors at several different packaging densities (i.e. the number of MB on each DIMM) and speeds. The speed of SDRAM chips was closely associated with the speed of the front side bus, in order to synchronize with the operation of the CPU. For example, PC66 SDRAM runs at 66 MHz, PC100 SDRAM runs at 100 MHz, PC133 SDRAM runs at 133 MHz, and so on. Faster SDRAM speeds such as 200 MHz and 266 MHz appeared later.

Double Data Rate SDRAM (DDR SDRAM)

DDR SDRAM was targeted at Intel's 7th-generation Pentium CPUs. Its key innovation is that the memory control logic gates switch on both the leading and trailing edges of a clock pulse, rather than on just the leading edge as with normal gate operation. With typical SDRAM technology, binary signals on the control, data and address portions of the system bus from the Northbridge chip to the memory unit are transferred on the leading edge of the clock pulse that opens the bus interface logic gates.
Until the advent of the Pentium IV CPU, bus speed was dictated by the system clock speed, which ran at 100 MHz and 133 MHz on the Pentium III. The front side bus (FSB) to the Northbridge chip and the portion of the system bus from the Northbridge chip to the memory chips both ran at 100 MHz or 133 MHz. The Pentium IV Willamette, however, had an FSB speed of 400 MHz (the 100 MHz system clock was 'quad pumped' (x 4) to achieve this). With a data bus width of 64 bits (8 bytes), this gives a data bandwidth of 3.2 gigabytes per second (400 MHz x 8 bytes = 3,200 MB/s or 3.2 GB/s). Note that data transfer rates within a computer are rated in kilo-, mega- or gigabytes per second due to the parallel method of transfer, while on a computer network (client/server etc.) they are measured in bits per second (kilo-, mega- or gigabits per second). Older SDRAM technologies (PC100 and PC133 SDRAM) operate at system bus speeds and therefore constitute a bottleneck for Pentium IV systems. Hence the advent of DDR technology, which helped alleviate the bottleneck, and Intel's support for RAMBus DRAM, which it felt was a better solution. With DDR technology, special logic circuitry enables data to be transferred on both the leading and trailing edges of the clock pulse (remember each clock cycle consists of a '1' followed by a '0'). Taking the data bus as an example, a clock pulse transition from the '0' of the preceding cycle to the '1' of the next opens logic gates to allow 64 bits of data onto the data bus, while the transition from '1' to '0' results in another 64 bits being switched onto the bus. Of course, the gates on the Northbridge chip that serve the memory segment of the system bus open and close in unison. This effectively doubles the speed of operation of SDRAM, hence the term Double Data Rate. With DDR SDRAM, a 100 or 133 MHz system clock rate yields an effective data rate of 200 MHz or 266 MHz when double switched. Newer designs (PC2700 etc.)
are based on DDR SDRAM running at 166 MHz, which is double switched to give an effective rate of 333 MHz.

Speed Ratings for DDR SDRAM

As indicated, in the past the speeds at which SDRAM chips operated were dictated by bus speeds, so PC100 and PC133 SDRAM DIMMs operated on 100 and 133 MHz FSBs. DDR SDRAM ratings are not based on clock speed, but on the maximum data bandwidth or throughput.

DDR-200: DDR SDRAM memory chips specified to operate at 100 MHz
DDR-266: DDR SDRAM memory chips specified to operate at 133 MHz
DDR-333: DDR SDRAM memory chips specified to operate at 166 MHz
DDR-400: DDR SDRAM memory chips specified to operate at 200 MHz

Hence, with a 100 MHz system bus on a Pentium IV Willamette system, the maximum data bandwidth is 1,600 MB per second (100 MHz x 2 x 8 bytes), or 1.6 GB/s. Hence the industry designation for DDR SDRAM DIMMs on 100 MHz systems is PC1600. Likewise, with a 133 MHz system clock speed, the designation for DDR SDRAM that operates at 133 MHz is PC2100 (133 x 2 x 8 = 2,133 MB per second). The reason for this rating system lies with manufacturers' marketing strategies. For example, RAMBus DRAM RIMMs are designated PC800 because of the rate in MHz at which the memory chips operate. This is the internal and external rate of operation to the Northbridge chip; however, the data bus between memory and the Northbridge chip is a mere 16 bits wide. This gives a bandwidth of 1,600 MB/s (800 MHz x 2 bytes), the same as DDR (however, the actual throughput of data is higher and the latency lower in RAMBus DRAM RIMMs). The manufacturers of DDR SDRAM were reluctant to badge their chips with smaller designations (e.g. PC200 or PC266), as potential customers might not buy their chips even though the difference in performance was negligible.
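The ratings above all come from one formula: effective clock in MHz times the 8-byte bus width gives peak MB/s, and the module is badged "PCxxxx" after a rounded version of that figure. A minimal sketch (the helper name is ours):

```python
# Sketch of the bandwidth arithmetic behind FSB and DDR module ratings:
# effective clock (MHz) * bus width (bytes) = peak bandwidth in MB/s.
def peak_mb_s(effective_mhz, width_bytes=8):
    return effective_mhz * width_bytes

# Willamette FSB: 100 MHz system clock, quad pumped (x4)
print(peak_mb_s(100 * 4))  # 3200 -> 3.2 GB/s
# DDR module ratings: clock * 2 (double switched)
print(peak_mb_s(100 * 2))  # 1600 -> badged PC1600
print(peak_mb_s(133 * 2))  # 2128 -> badged PC2100 (exactly 133.3 MHz gives 2133)
print(peak_mb_s(200 * 2))  # 3200 -> badged PC3200
```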
Further advances in DDR SDRAM technology saw DDR SDRAM-based Intel and VIA chipsets which accommodated PC2400 and PC2700 DDR SDRAM running at 150 MHz and 166 MHz respectively, which is double clocked to 300 and 333 MHz (so-called DDR300 and DDR333). The evolution of DDR366 and new chipset designs then led to the release of PC3000 DDR SDRAM with even higher bandwidth. New to the market is dual channel DDR400, with 3.2 GB/s (200 x 2 x 8) peak bandwidth per channel.

Dual Channel DDR: Making DDR Perform Faster

The latest Pentium IV Intel 915/925 chipsets have an 800 MHz FSB, twice the effective speed of DDR400, so a single channel of DDR400 would bottleneck the memory bus at half the FSB's bandwidth. Ideally, you would want to match the processor front side bus and the memory bus, so DDR SDRAM running at 800 MHz would be an optimal solution. However, the technical challenges of getting an 8-byte memory bus to operate at 800 MHz are not insignificant. The solution was to adopt a dual channel approach, just like that in RAMBus memory technologies. Practically speaking, dual channel DDR400 requires two DIMM slots and two modules. The architecture, while offering 6.4 GB/s of peak bandwidth, simultaneously splits the back-and-forth signaling with the CPU; the signal for each channel comes from one of two sockets served by the chipset. In May 2002, Intel began to play catch-up with VIA and SiS by releasing Pentium IV chipsets with a 533 MHz FSB that supported DDR SDRAM. This seemed to increase Intel's commitment to DDR technology. Remember that Intel had based its first Pentium IV chipset on RDRAM, and the company took a long time before it considered adding SDRAM, much less DDR, support. However, RAMBus was never popular with the computer manufacturing industry, especially among JEDEC1 members and memory suppliers.
Even with this, it came as some surprise when, at a recent Intel Developers Forum (IDF), the company indicated its support for single- and dual-channel DDR400 in its new 800 MHz FSB chipsets. More significant, however, was that Intel's memory roadmap for the future did not include RAMBus memory.

DDR2 and DDR3

With a clock frequency of 100 MHz, single-data-rate SDRAM transfers data on every rising edge of the clock pulse, thus achieving an effective 100 MHz data transfer rate. However, both DDR and DDR2 are double clocked; that is, their logic is switched to transfer data on both the rising and falling edges of the clock, which swings between 0.0 V and 2.5 V (1.8 V for DDR2). As indicated above, this achieves an effective rate of 200 MHz (and a theoretical bandwidth of 1.6 GB/s) with the same clock frequency. DDR operates both the internal logic circuits of the memory chip and its I/O bus (to the Intel Northbridge Memory Control Hub (MCH) or the AMD Athlon 64's on-die memory controller) at the same speed. DDR2 memory logic, in contrast, operates at half the speed of the I/O clock, the rate at which DDR2's I/O gates switch data to the Northbridge MCH. The internal logic and external I/O bus of DDR PC-3200 operate at 200 MHz; in contrast, the internal logic and memory circuits of DDR2 PC2-3200 operate at 100 MHz, while the I/O interface logic switches at 200 MHz.

1 Joint Electron Device Engineering Council (JEDEC) is the semiconductor engineering standardization body of the Electronic Industries Alliance (EIA), a trade association that represents all areas of the electronics industry.

DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers. However, latency is increased as a trade-off. To compensate for the slower core clock, the DDR2 prefetch buffer is 4 bits wide, whereas DDR's is just 2 bits wide; DDR correspondingly has the lower latency.
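The relationship between DDR2's three rates — core clock, I/O clock and effective data rate — can be sketched as follows (illustrative code, not a standard formula library):

```python
def ddr2_rates(io_clock_mhz):
    # DDR2's core logic runs at half the I/O clock, while data moves on
    # both edges of the I/O clock; the 4-bit prefetch bridges the gap
    # (data rate = core clock x 4 = I/O clock x 2).
    core_mhz = io_clock_mhz // 2
    data_rate_mhz = io_clock_mhz * 2
    bandwidth_mb_s = data_rate_mhz * 8  # 64-bit (8-byte) module bus
    return core_mhz, data_rate_mhz, bandwidth_mb_s

print(ddr2_rates(200))  # (100, 400, 3200): DDR2-400 / PC2-3200
```

The same pattern continues with DDR3, where an 8-bit prefetch lets a 100 MHz core sustain an 800 MHz data rate.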
The prefetch buffer for DDR3 is 8 bits wide, indicating that it has still greater latency. Another important feature of DDR2 is that it consumes less power: the savings are achieved primarily through an improved manufacturing process, resulting in a drop in operating voltage (1.8 V compared to DDR's 2.5 V).

Chip Specifications
DDR2-400: core runs at 100 MHz, I/O clock at 200 MHz, PC2-3200, 3.200 GB/s bandwidth
DDR2-533: core runs at 133 MHz, I/O clock at 266 MHz, PC2-4200, 4.267 GB/s bandwidth
DDR2-667: core runs at 166 MHz, I/O clock at 333 MHz, PC2-5300, 5.333 GB/s bandwidth
DDR2-800: core runs at 200 MHz, I/O clock at 400 MHz, PC2-6400, 6.400 GB/s bandwidth

DDR3 SDRAM (Double Data Rate 3 Synchronous Dynamic Random Access Memory)

DDR3 SDRAM comes with a promise of a power consumption reduction of 40% compared to current commercial DDR2 modules, due to DDR3's 90 nm fabrication technology, allowing for lower operating currents and voltages (1.5 V, compared to DDR2's 1.8 V or DDR's 2.5 V). "Dual-gate" transistors will be used to reduce current leakage. DDR3's prefetch buffer is 8 bits wide, whereas DDR2's is 4 bits and DDR's is 2 bits. Theoretically, these modules could transfer data at an I/O clock of 400-800 MHz (an effective data rate of 800-1600 MHz), compared to DDR2's current range of 200-533 MHz (400-1066 MHz effective) or DDR's range of 100-300 MHz (200-600 MHz effective). To date, such bandwidth requirements have been seen mainly in the graphics market, where vast transfers of information between framebuffers are required. Intel has preliminarily announced that it expects to be able to offer support for DDR3 near the end of 2007; AMD's roadmap indicates its own adoption of DDR3 will come in 2008.
DDR3-800: core runs at 100 MHz, I/O clock at 400 MHz
DDR3-1066: core runs at 133 MHz, I/O clock at 533 MHz
DDR3-1333: core runs at 166 MHz, I/O clock at 667 MHz

Graphics Double Data Rate 3 (GDDR3) is a graphics-card-specific memory technology, designed by ATI Technologies.
GDDR3 memory might sound similar to DDR3 but is more like DDR2: it has been in use for several years in high-end graphics cards, such as those from NVIDIA or ATI Technologies, and as the main system memory of the Xbox 360. It is sometimes incorrectly referred to as "DDR3". It has much the same technological base as DDR2, but its power and heat-dispersal requirements have been reduced somewhat, allowing for higher-speed memory modules and simplified cooling systems. Unlike the DDR2 used on graphics cards, GDDR3 is unrelated to the upcoming JEDEC DDR3 specification. This memory uses internal terminators, enabling it to better handle certain graphics demands. To improve bandwidth, GDDR3 memory transfers 4 bits of data per pin over 2 clock cycles.

RAMBus DRAM

While DDR SDRAM is relatively new, its major competitor, RAMBus DRAM (RDRAM), has been around for some time. Intel's support for RDRAM, as indicated by its collaboration with Rambus Inc. and its efforts to develop chipset support for the standard and promote the combination as the de facto standard to PC manufacturers, suggested at the time that Intel saw RDRAM as the future of DRAM. RDRAM is a proprietary technology developed by Rambus Inc. in collaboration with Intel. However, Intel had problems with its initial chipset designs, and the emergence of support for DDR among Intel's major competitors in the CPU and chipset markets caused Intel to develop a chipset (the i845) to support DDR. Then there was the relative cost of DDR SDRAM and RDRAM, with the latter being prohibitively more expensive until 2002/2003, when the cost of DDR SDRAM DIMMs and RAMBus DRAM RIMMs became more or less equal (though not that of the PC systems of which they were a part). There are three types of RAMBus DRAM: Base RAMBus, Concurrent RAMBus and Direct RAMBus. Direct RAMBus is the newest DRAM architecture and interface standard, and it challenges traditional main memory designs.
Direct RAMBus transfers data at effective rates of 800 MHz and above over a 16-bit bus called a Direct RAMBus Channel (later versions use an 18-bit bus). Therefore, PC600 RDRAM delivers a peak bandwidth of 1,200 MBytes/second, while PC800 delivers 1,600 MBytes/second. Advances in RAMBus technology have seen new designations such as PC1066 and PC1200, with concomitant increases in bandwidth.

PC600: 16-bit, single-channel RIMM, specified to operate at 300 MHz clock speed, 1200 MB/s bandwidth
PC700: 16-bit, single-channel RIMM, specified to operate at 355 MHz clock speed, 1420 MB/s bandwidth
PC800: 16-bit, single-channel RIMM, specified to operate at 400 MHz clock speed, 1600 MB/s bandwidth
PC1066 (RIMM 2100): 16-bit, single-channel RIMM, specified to operate at 533 MHz clock speed, 2133 MB/s bandwidth
PC1200 (RIMM 2400): 16-bit, single-channel RIMM, specified to operate at 600 MHz clock speed, 2400 MB/s bandwidth
RIMM 3200: 32-bit, dual-channel RIMM, specified to operate at 400 MHz clock speed, 3200 MB/s bandwidth
RIMM 4200: 32-bit, dual-channel RIMM, specified to operate at 533 MHz clock speed, 4200 MB/s bandwidth
RIMM 4800: 32-bit, dual-channel RIMM, specified to operate at 600 MHz clock speed, 4800 MB/s bandwidth
RIMM 6400: 32-bit, dual-channel RIMM, specified to operate at 800 MHz clock speed, 6400 MB/s bandwidth

However, there is a caveat when it comes to RAMBus DRAM. While PC800 RAMBus DRAM delivers the same peak throughput as PC1600 DDR SDRAM, the efficiency of RDRAM is of the order of 80% or more, whereas DDR SDRAM delivers between 40% and 70%, depending on the system and application. This might seem to give RAMBus the advantage; however, the latency of RAMBus DRAM is higher, and increases with every RIMM installed. Remember, latency refers to the time that elapses between the CPU requesting (addressing) instructions or data from RAM and the time it receives them. PC-800 RDRAM operated with a latency of 45 ns, compared to only 7.5 ns for PC-133 SDRAM.
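The RIMM designations above all follow the same arithmetic as the DDR ratings, only over a much narrower channel (a sketch, with illustrative function names):

```python
def rdram_bandwidth_mb_s(clock_mhz, bus_bits=16, channels=1):
    # 2 transfers per clock over a narrow 16-bit (2-byte) Direct RAMBus Channel
    return clock_mhz * 2 * (bus_bits // 8) * channels

print(rdram_bandwidth_mb_s(400))              # 1600 MB/s: PC800
print(rdram_bandwidth_mb_s(400, channels=2))  # 3200 MB/s: RIMM 3200
print(rdram_bandwidth_mb_s(600, channels=2))  # 4800 MB/s: RIMM 4800
```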
RDRAM memory chips also put out significantly more heat than SDRAM chips, necessitating heatspreaders on all RIMM devices. RDRAM includes a memory controller on each memory chip, significantly increasing manufacturing complexity compared to SDRAM, which used a single memory controller located on the Northbridge chipset. RDRAM was also two to three times the price of PC-133 SDRAM, due to a combination of high manufacturing costs and high license fees. PC-2100 DDR SDRAM, introduced in 2000, operated at a clock speed of 133 MHz and delivered 2100 MB/s over a 64-bit bus using a 184-pin DIMM form factor. So, what is the net effect of higher efficiency and throughput combined with higher latency? In 2003, the jury was still out, but DDR SDRAM then remained the best option for most PC systems, except in high-end solutions, where performance, rather than cost, is the goal. The reason for this was the evolution of even higher DRAM-to-Northbridge bandwidths in the PC2700 and PC3000 DDR SDRAM. With the introduction of the i840 chipset, Intel added support for dual-channel PC-800 RDRAM, doubling bandwidth to 3200 MB/s by increasing the bus width to 32 bits. This was followed in 2002 by the i850E chipset, which introduced PC-1066 RDRAM, increasing total dual-channel bandwidth to 4200 MB/s. Then in 2002, Intel released the E7205 Granite Bay chipset, which introduced dual-channel DDR support for a total bandwidth of 4200 MB/s, but at a much lower latency than competing RDRAM.
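The efficiency figures quoted earlier can be combined with peak bandwidth to compare sustained throughput; the sketch below uses 80% for RDRAM (from the text) and 55% for DDR (my illustrative midpoint of the text's 40-70% range):

```python
def sustained_mb_s(peak_mb_s, efficiency):
    # Sustained throughput = peak bandwidth x bus efficiency
    return peak_mb_s * efficiency

print(sustained_mb_s(1600, 0.80))  # PC800 RDRAM at ~80% -> 1280.0 MB/s
print(sustained_mb_s(1600, 0.55))  # PC1600 DDR at ~55% -> about 880 MB/s
```

On these rough numbers, RDRAM's higher efficiency outweighs its equal peak rating, which is why the comparison hinged on latency and cost rather than bandwidth alone.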