More About Processors
CSIT 301 (Blum)

Pentium 4 Processor Specs

The above list of processor specifications includes such aspects as
• CPU Speed, Bus Speed, Manufacturing Technology, Stepping, Cache Size, Package Type

CPU Speed
• The activities of the processor are kept in sync by the clock.
• The clock goes through a regular, repetitive action. In a binary system, a cycle consists of a 1 and a 0 (a high followed by a low).
• The clock is usually a quartz oscillator that is external to the microprocessor.
• So the CPU speed is not something built into the chip, but rather the maximum rate at which the chip can be expected to perform normally.

CPU Speed (Cont.)
• Sometimes differently rated chips are made from the same manufacturing process, and the CPU speed is determined by testing after the fact.
• Some people try to operate the processor faster than the designated rate. This is known as “overclocking.”

CPU Speed (Cont.)
• The speed is measured in Hertz, which are cycles per second.
– KiloHertz, kHz, is thousands (10^3) of cycles per second
– MegaHertz, MHz, is millions (10^6) of cycles per second
– GigaHertz, GHz, is billions (10^9) of cycles per second
– What’s next?

CPU Speed (Cont.)
• The clock speed is also known as the clock’s frequency (the number of cycles per second).
• A related quantity is the period, which is the time required for one cycle (a.k.a. a clock tick).
• A clock’s frequency and period are reciprocals.
– f = 1/T or T = 1/f, where f is frequency and T is period
– E.g. a frequency of 60 Hertz (cycles per second) corresponds to a period of 1/60 ≈ 0.0167 seconds per cycle

CPU Speed (Cont.)
• A frequency of 1 kHz [a thousand cycles per second] corresponds to a period (tick) of 1 millisecond (ms) [a thousandth (10^-3) of a second per cycle].
• A frequency of 1 MHz [a million cycles per second] corresponds to a period (tick) of 1 microsecond (µs) [a millionth (10^-6) of a second per cycle].
• A frequency of 1 GHz [a billion cycles per second] corresponds to a period (tick) of 1 nanosecond (ns) [a billionth (10^-9) of a second per cycle].

Bus Speed
• There is a hierarchy of buses in a computer, but in a discussion of processors, the buses of interest are the front-side bus and the back-side bus.
• In early processors the CPU speed and bus speed (and thus the speed of interactions with memory, etc.) were the same. But a bottleneck (the von Neumann bottleneck) arose because memory speeds could not keep up with processor speeds, so memory accesses held the processor back.

Front-side Bus (FSB)
• The front-side bus (a.k.a. the memory bus or system bus) connects the processor to other parts via the chipset.
• It allows communication between the processor and main memory (RAM), the system chipset, PCI devices, the AGP card, and other peripheral buses.
• When the “bus speed” is given as one of the processor’s specs, it refers to the front-side bus speed.

The Northbridge
• A chipset is simply a group of chips that work together to perform related functions.
• The Northbridge communicates with the processor (using the FSB) and controls interaction with memory, the PCI bus, and AGP.
• The Northbridge’s partner in the chipset is the Southbridge. The Southbridge handles the I/O functions.
– The Intel Hub Architecture (IHA) is replacing the Northbridge/Southbridge chipset.

Back-side Bus
• The back-side bus (a.k.a. the cache bus) connects the processor to L2 cache. The term back-side bus is reserved for cases in which the L2 cache is packaged with the microprocessor.
– If the L2 cache is separate from the processor, the front-side bus will connect the processor to the Level 2 cache.
• Cache (SRAM) operates faster than memory (DRAM). The back-side bus operates at faster speeds than the front-side bus; sometimes it works at the processor speed.

FSB Speeds
• The ratio between the CPU speed and bus speed is a simple fraction.
– For example, a CPU speed of 3.2 GHz and a bus speed of 800 MHz have a ratio of 4.
• With Pentium IIIs, the 100 and 133 MHz FSB speeds became standard.
• That rate has been somewhat fixed for a few years, but what is changing is the amount of data transferred each clock cycle.
• This is where one begins to talk of “DDR” or “quad-pumped.”

Edge-triggering
• The clock keeps the various circuit elements working in unison.
• Elements are typically designed to be active on an “edge” of the clock, either
– when it is rising (the positive edge), or
– when it is falling (the negative edge).
• This is more precise than level activation, where the action takes place when the clock has a certain state or level (e.g. when the clock is high).

DDR
• Double Data Rate (DDR) allows data to be fetched on both the positive and negative edges of the clock.
– Thus it is essentially the equivalent of doubling the clock rate.
– E.g. a 100 MHz DDR transfer rate equals that of a 200 MHz SDR transfer

Quad pumped
• A quad-pumped bus allows four signals to be communicated per clock cycle. This is sometimes called QDR (Quad Data Rate).
• Pentium 4s use a quad-pumped FSB.
– The 400 MHz FSB is a 100 MHz bus with four signals per cycle.
– The 533 MHz FSB is a quad-pumped 133 MHz bus.
• Quad pumping is one of the features of the Pentium 4 NetBurst micro-architecture.

Manufacturing Technology
• The next specification found in the table is manufacturing technology, which indicates the size of the components (mainly transistors) and therefore how many components can be placed on the chip.
• In earlier microprocessors, one used terms like large-scale integration (LSI), very large-scale integration (VLSI) and ultra large-scale integration (ULSI).
– But as Moore’s Law continued to hold true, we ran out of adjectives.

Manufacturing Technology
• Today the manufacturing technology is given in terms of microns or nanometers (e.g. the 0.13-micron or the 90-nm technology).
– A nanometer (nm) is a billionth of a meter (10^-9 m).
• The same chip may be made using different technologies, but this is done to perfect the newer technology so that more components can be added to later chips.

Stepping
• As with software, mistakes (errata) in hardware are found and revisions are needed. However, hardware mistakes are more difficult to fix.
• The stepping refers to various fixes, so one wants a higher stepping, which presumably has fewer bugs.
– AMD uses the term “revision number.”
• Although the circuitry cannot be changed on an existing chip, it might be possible to work around a processor bug by updating the BIOS, which can be changed (flashed).

Pentium 4 Product Information

Document on Specification Update (Stepping Levels)

Cache Size
• Recall that there are three levels of cache (L1, L2 and L3) associated with the processor.
• The cache specification on the previous slide refers to L2 cache.
• A more detailed set of specifications will reveal the amount of L1 and L2 as well as the amount of L3 that can be supported.

Package Type

Form Factor and Package
• The term form factor applies to many devices, including processors. It refers to their size and shape, and in the case of processors it also includes how they connect to the motherboard.
– The motherboard has a slot or socket.
• A related term is the “package”: an enclosure for a chip (integrated circuit).
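Looking back at the CPU Speed, FSB Speeds, DDR and Quad pumped slides, the clock arithmetic can be checked with a short Python sketch. The function names below are illustrative only (they do not come from the slides), and the numbers are the ones quoted in the slides.

```python
def period_seconds(frequency_hz: float) -> float:
    """Return the clock period T = 1/f in seconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


def effective_rate_mhz(base_clock_mhz: float, transfers_per_cycle: int) -> float:
    """Effective transfer rate of a bus that moves data several times per clock cycle.

    transfers_per_cycle is 1 for SDR, 2 for DDR and 4 for a quad-pumped bus.
    """
    return base_clock_mhz * transfers_per_cycle


def cpu_to_bus_ratio(cpu_mhz: float, bus_mhz: float) -> float:
    """Ratio between the CPU speed and the quoted bus speed."""
    return cpu_mhz / bus_mhz


if __name__ == "__main__":
    # Frequency and period are reciprocals: 1 kHz -> 1 ms, 1 MHz -> 1 us, 1 GHz -> 1 ns.
    for label, hz in [("60 Hz", 60.0), ("1 kHz", 1e3), ("1 MHz", 1e6), ("1 GHz", 1e9)]:
        print(f"{label}: period = {period_seconds(hz):.4g} s")

    # A 100 MHz clock with two transfers per cycle (DDR) or four (quad-pumped).
    print("DDR on 100 MHz:", effective_rate_mhz(100, 2), "MHz")
    print("Quad-pumped on 100 MHz:", effective_rate_mhz(100, 4), "MHz")

    # The slide's example: a 3.2 GHz CPU with an 800 MHz bus.
    print("CPU/bus ratio:", cpu_to_bus_ratio(3200, 800))
```

Note that a quad-pumped 133 MHz bus works out to 532 MHz with this arithmetic; the quoted 533 MHz figure reflects a nominal clock of about 133.33 MHz.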
Pinning
The pins or leads are how a chip interfaces with the outside world. There are various ways to arrange the pins on a chip. Furthermore, several chips can be brought together into a unit called a module (common in memory).

PGA/DIP/SIP
• PGA: pin grid array, chip in which the pins are located on the bottom in concentric squares.
– Used in some microprocessors.
• DIP: dual in-line package, rectangular chip with two rows of pins, one on each side.
• SIP: single in-line package, chip with pins protruding from one side.

SEPP
An outdated processor packaging scheme.
• Single-Edge Processor Package
• With the S.E.P.P. form factor, the processor is not completely covered by the black plastic (as in S.E.C.C. and S.E.C.C.2).
• The circuit board (substrate) can be seen from the bottom side.

SECC
Another outdated processor packaging scheme.
• Single Edge Contact Cartridge
• With the S.E.C.C. form factor, processors have a plastic shroud covering with an active heatsink and fan.
• Identifiable by the goldfinger contacts, which in this case are inside the plastic housing.

Heat
• Recall that in the history of processors the number of transistors continues to grow (Moore’s Law) while the relative size of the chip stays fixed. With more transistors carrying current, more heat is produced.
• Various developments have occurred to deal with the issue of heat. One is a reduction in the working voltage (5 V → 3.3 V → 2 V). Another has been the introduction of the heatsink and fan.

Heat Sink
• The computer has had a fan for some time to deal with heat. But starting with the 486, the processor needed special consideration.
• A heat sink is an element designed to take heat away from the processor.
• In this case, heat is dissipated mainly via convection: the heat is transferred to the nearby air and is carried away with the air as it moves.
– Convection is why a breeze feels nice on a hot summer day.

Desired Effects
• A heat sink should have a large surface area, since this is where the heat is transferred to the air.
• But the heat sink should not block the air flow, since this is how the heat is carried away.
• Heat sinks often have very strange shapes to try to balance these two competing effects.
– Typically made of aluminum
– May have “fins”

Passive and Active
• All modern processors have a heat sink. Some also require a fan.
– Without a fan: passive heat sink
– With a fan: active heat sink
• Because the heat sink’s purpose is to dissipate heat, it is important that the heat can get from the processor to the heat sink. The material “gluing” the heat sink to the processor must conduct heat well.
• A heat slug is a piece of metal that connects the processor core to the processor package and/or heatsink.

SECC2
• As with SECC, with SECC2 the processors have a plastic housing with an active heatsink (meaning it has a fan).
• It is distinct from SECC in that the goldfinger contacts are exposed.

PPGA
• Plastic Pin Grid Array
• With PPGA the processors have pins arranged in a square pattern. They fit into Socket 370 motherboards.
• Look for the square pattern (Pin Grid Array) on the bottom.
• Slot connectors do not have pins.

FC-PGA
• Flip-Chip Pin Grid Array
• The chip is designed so that the “core” of the processor, which is the part that gets the hottest, is on top (closer to the heat sink).
• Also fits into a Socket 370 motherboard. But it must be an FC-PGA compliant motherboard for an FC-PGA processor to work.

Pentium 4 Form Factors
• Pentium 4s also come in an FC-PGA form factor.
– The package uses 478 pins, which are 2.03 mm long and 0.32 mm in diameter.
• FCBGA (Flip Chip Ball Grid Array)
– Instead of pins, FCBGA uses small balls, which act as contacts for the processor. Pins bend, balls don’t.
– The package uses 479 balls, which are 0.78 mm in diameter.

The LGA
• "Intel’s new LGA, or Land Grid Array, 775 processor socket takes a step away from traditional implementations in that the package no longer features pins, rather the bottom of the LGA 775 processors only have small gold contacts. With the LGA package, Intel has moved the pins into the bottom portion of the processor socket, something that will make installation of the processor easier in that there is no need to watch for bent pins on the package...although it will make it more difficult as well. You no longer need to worry about bent or damaged pins on the processor, rather now you have to worry twice as much about bent pins within the processor socket itself."
• http://rootprompt.org/article.php3?article=7115

The previous specifications differentiated one Pentium 4 from another. Now let us look at some of the features that differentiate the Pentium 4 from other Intel microprocessors.

Micro-architecture
• A processor’s architecture refers to its instruction set, the number and type of registers, and memory-resident data structures (e.g. stacks) that are available to a programmer (at least at the assembly level).
• A processor’s micro-architecture refers to the hardware implementation of the architecture (the transistors).
• Backward compatibility is within the architecture (which is more of a logical level). The micro-architecture (implementation) may change dramatically and is not necessarily compatible with previous versions.
NetBurst Micro-architecture
• Features of the Pentium 4’s NetBurst micro-architecture include:
– Hyper-Pipelined Technology
– Improved Branch Prediction
– Level 1 Execution Trace Cache
– Rapid Execution Engine
– 400 or 533 MHz System Bus (quad pumping)
• Actually even faster now

Pipelining
• Recall that to execute an instruction, one must fetch it, decode it, fetch any data required, execute the instruction, write the answer to the appropriate place and possibly look for any interrupt requests that might have occurred in the meantime.
• In pipelining, a processor can begin executing a second instruction before the first has been completed.
• Thus, many instructions are in the pipeline at the same time, though at various processing stages.

Pipelining
• The pipeline is divided into segments. Each segment can perform its duty at the same time as the other segments.
• When a segment completes its task, it passes the result to the next segment and fetches the next operation from the preceding segment.
• Once a feature of only high-end processors, pipelining is now standard.
– A Pentium had up to six instructions in the pipeline.

Hyper-Pipelined Technology
• The Pentium 4’s Hyper-Pipelined Technology uses a 20-stage pipeline.
• Having so many instructions in the works can be a problem if the program branches and one has the wrong instructions in the pipeline.
• For long pipelines to be effective there must be good “branch prediction.”

BPU
• The Pentium 4’s Branch Prediction Unit (BPU) is about 33% more efficient than that of the Pentium III at predicting the instruction one needs to line up.
• The improved BPU is part of what Intel calls “Advanced Dynamic Execution.”

Rapid Execution Engine
• Pentium 4s have two Arithmetic Logic Units (ALUs) clocked at twice the core processor frequency.
• This allows basic integer instructions such as Add, Logical AND, etc. to execute in half of a clock cycle.
• E.g. the Rapid Execution Engine on a 1.50 GHz Pentium 4 processor runs at 3 GHz.

Level 2 Advanced Transfer Cache
• L2 Advanced Transfer Cache (ATC) yields a higher throughput between L2 cache and processor.
– 256 KB in the 0.18-micron technology
– 512 KB in the 0.13-micron technology
– 1 MB in the 90-nm technology
– Now up to 2 MB

Manufacturing Technology and Cache Correlation
If we can put more on the chip, one thing we will choose to put on is more cache.

Advanced Transfer Cache
• Features of the ATC include:
– Non-blocking, full-speed, on-die Level 2 cache
– 8-way set associativity
• We will explain that when we cover cache
– 512-bit or 256-bit data bus to the Level 2 cache
– Data clocked into and out of the cache every clock cycle
– The Data Prefetch Logic anticipates the data needed by an application and pre-loads it into the Advanced Transfer Cache, further increasing processor and application performance.

Level 1 Execution Trace Cache
• Along with an 8-KB data cache, the Pentium 4 has a 12-KB Execution Trace Cache that stores decoded micro-instructions in the order of program execution.
• Caching decoded micro-instructions saves on the instruction decoding portion of execution.
• Storing them in execution order speeds things up and prevents one from having to store instructions that are “jumped over.”
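Looking back at the Pipelining, Hyper-Pipelined Technology and Rapid Execution Engine slides, a small Python sketch can show why a long pipeline helps and why mispredicted branches hurt. It is an idealized model, not a description of the real Pentium 4: it assumes one cycle per stage, one instruction entering the pipeline per cycle, and a full refill after every mispredicted branch. All function names are illustrative.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """The first instruction needs n_stages cycles; each later one finishes a cycle after."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def cycles_with_mispredictions(n_instructions: int, n_stages: int,
                               n_mispredictions: int) -> int:
    """Pipelined cycle count plus a refill penalty per mispredicted branch.

    Assumption (not from the slides): each misprediction discards the
    instructions behind the branch and costs n_stages - 1 extra cycles.
    """
    refill_penalty = n_stages - 1
    return cycles_pipelined(n_instructions, n_stages) + n_mispredictions * refill_penalty


def alu_clock_ghz(core_clock_ghz: float) -> float:
    """ALUs clocked at twice the core frequency, as on the Rapid Execution Engine slide."""
    return 2 * core_clock_ghz


if __name__ == "__main__":
    n, stages = 1000, 20  # 20 stages as quoted for the Pentium 4
    print("unpipelined:", cycles_unpipelined(n, stages))                  # 20000
    print("pipelined:", cycles_pipelined(n, stages))                      # 1019
    print("with 10 mispredictions:", cycles_with_mispredictions(n, stages, 10))  # 1209
    print("ALU clock for a 1.5 GHz core:", alu_clock_ghz(1.5), "GHz")     # 3.0
```

In this model the penalty per misprediction grows with the number of stages, which is the reason the slides stress that long pipelines depend on good branch prediction.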
Enhanced Floating Point and Multimedia Unit
• The Pentium 4 has an expanded 128-bit floating point register and an additional register for data movement.
• It improves performance on floating-point operations and multimedia applications.

Internet Streaming SIMD Extensions
• SSE is an acronym within an acronym: it stands for Streaming SIMD Extensions, where SIMD is Single Instruction Multiple Data.
• SSE consists of 70 SIMD instructions for integer and floating-point operations. It helps with high-resolution images, audio and video viewing, speech recognition, etc.
• The Pentium 4 actually uses SSE2.
• SSE2 adds 144 new instructions.

Hyperthreading

Special Compiler
• A compiler is a software tool that takes raw source code and converts (or compiles) it into a machine language a computer can understand.
• Intel® compilers have additional features that make code run more efficiently and take advantage of the Intel® NetBurst™ architecture.

VTune
• The Intel® VTune™ Performance Analyzer is used to determine how software performs when run on a specific processor such as the Intel® Pentium® 4 processor. Software developers can then optimize their software to utilize a processor's features such as SSE2.

References
• PC Hardware in a Nutshell, Thompson and Thompson
• http://www.webopedia.com
• http://www.intel.com
• http://www.anandtech.com
• http://www.mbreview.com/lga775.php