Introduction to Computer Administration
Week-2: Advanced Concepts related to Computer Parts and Types

1. System Clock
2. MIPS / TFLOPS
3. Cache
4. DMA (Direct Memory Access)
5. Pipeline
6. SMP (Symmetric Multiprocessing)
7. Single Point of Failure

System Clock (Clock Rate, MHz, GHz)

The clock rate is the fundamental rate in cycles per second (measured in hertz) of the clock in any synchronous circuit. For example, a crystal oscillator frequency reference is typically synonymous with a fixed sinusoidal waveform; a clock rate is that frequency reference translated by electronic circuitry into a corresponding square-wave pulse, typically for digital electronics applications. In this context the word speed (physical movement) should not be confused with frequency or its corresponding clock rate, so the term "clock speed" is, strictly speaking, a misnomer.

CPU manufacturers typically charge premium prices for CPUs that operate at higher clock rates. For a given CPU, the clock rate is determined at the end of the manufacturing process through actual testing of each CPU. CPUs that are tested as complying with a given set of standards may be labeled with a higher clock rate, e.g., 1.50 GHz, while those that fail the standards of the higher clock rate yet pass the standards of a lesser clock rate may be labeled with the lesser clock rate, e.g., 1.33 GHz, and sold at a relatively lower price.

Limits to clock rate

The clock rate of a CPU is normally determined by the frequency of an oscillator crystal. The first commercial PC, the Altair 8800 (by MITS), used an Intel 8080 CPU with a clock rate of 2 MHz (2 million cycles/second). The original IBM PC (c. 1981) had a clock rate of 4.77 MHz (4,772,727 cycles/second). In 1995, Intel's Pentium chip ran at 100 MHz (100 million cycles/second), and in 2002, an Intel Pentium 4 model was introduced as the first CPU with a clock rate of 3 GHz (three billion cycles/second, corresponding to ~3.3 x 10^-10 seconds per cycle).

With any particular CPU, replacing the crystal with another crystal that oscillates at half the frequency ("underclocking") will generally make the CPU run at half the performance. It will also make the CPU produce roughly half as much waste heat.

The clock rate of a computer is only useful for providing comparisons between computer chips in the same processor family. An IBM PC with an Intel 80486 CPU running at 50 MHz will be about twice as fast as one with the same CPU, memory and display running at 25 MHz, but the same comparison does not hold against a MIPS R4000 running at the same clock rate, because the two are different processors with different internal designs. Furthermore, there are many other factors to consider when comparing the performance of entire computers, such as the clock rate of the computer's front-side bus (FSB), the clock rate of the RAM, the width in bits of the CPU's bus, and the amount of Level 1, Level 2 and Level 3 cache. In many cases a computer's performance also depends on factors outside the CPU, such as the speed of access to storage devices like hard drives.

Clock rates should not be used when comparing different computers or different processor families; rather, some software benchmark should be used. Clock rates can be very misleading, since the amount of work different computer chips can do in one cycle varies. For example, RISC CPUs tend to have simpler instructions than CISC CPUs (but higher clock rates), and superscalar processors can execute more than one instruction per cycle on average, yet it is not uncommon for them to do "less" in a given clock cycle. Conversely, subscalar CPUs and the degree of parallelism also affect overall performance regardless of clock rate.

Clock rates:
1 Hz  = 1 cycle per second
1 kHz = 10^3 cycles per second
1 MHz = 10^6 cycles per second
1 GHz = 10^9 cycles per second
1 THz = 10^12 cycles per second
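The cycle times quoted above follow directly from the clock rates. The short Python sketch below is added here for illustration (it is not part of the original notes); the processor names and rates are the ones already quoted in this section.

def cycle_time_seconds(clock_hz):
    # One clock cycle lasts 1/frequency seconds.
    return 1.0 / clock_hz

for name, hz in [("Altair 8800 (Intel 8080)", 2e6),   # 2 MHz
                 ("Original IBM PC", 4.77e6),         # 4.77 MHz
                 ("Intel Pentium (1995)", 100e6),     # 100 MHz
                 ("Intel Pentium 4 (2002)", 3e9)]:    # 3 GHz
    print(f"{name}: {cycle_time_seconds(hz):.2e} seconds per cycle")

For the 3 GHz Pentium 4 this prints roughly 3.33e-10 seconds per cycle, matching the figure above; underclocking to half the frequency doubles the cycle time.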
MIPS (Million Instructions Per Second)

Processor | IPS | IPS/MHz | Year | Source
Intel 486DX | 54 MIPS at 66 MHz | 0.818 MIPS/MHz | 1992 |
DEC Alpha 21064 EV4 | 300 MIPS at 150 MHz | 2 MIPS/MHz | 1992 | [4]
Motorola 68060 | 88 MIPS at 66 MHz | 1.33 MIPS/MHz | 1994 |
Intel Pentium Pro | 541 MIPS at 200 MHz | 2.705 MIPS/MHz | 1996 | [5]
ARM 7500FE | 35.9 MIPS at 40 MHz | 0.897 MIPS/MHz | 1996 |
PowerPC G3 | 525 MIPS at 233 MHz | 2.253 MIPS/MHz | 1997 |
Zilog eZ80 | 80 MIPS at 50 MHz | 1.6 MIPS/MHz | 1999 | [6]
Intel Pentium III | 1,354 MIPS at 500 MHz | 2.708 MIPS/MHz | 1999 | [7]
Freescale MPC8272 (Integrated Communications Processor) | 760 MIPS at 400 MHz | 1.9 MIPS/MHz | - |
AMD Athlon | 3,561 MIPS at 1.2 GHz | 2.967 MIPS/MHz | 2000 |
AMD Athlon XP 2400+ | 5,935 MIPS at 2.0 GHz | 2.967 MIPS/MHz | 2002 |
Pentium 4 Extreme Edition | 9,726 MIPS at 3.2 GHz | 3.039 MIPS/MHz | 2003 |
ARM Cortex A8 | 2,000 MIPS at 1.0 GHz | 2.0 MIPS/MHz | 2005 | [8]
AMD Athlon FX-57 | 12,000 MIPS at 2.8 GHz | 4.285 MIPS/MHz | 2005 |
AMD Athlon 64 3800+ X2 (Dual Core) | 14,564 MIPS at 2.0 GHz | 7.282 MIPS/MHz | 2005 |
Xbox 360 IBM "Xenon" (Triple Core) | 19,200 MIPS at 3.2 GHz | 2.0 MIPS/MHz | 2005 |
PS3 Cell BE (PPE only) | 10,240 MIPS at 3.2 GHz | 3.2 MIPS/MHz | 2006 | [9]
AMD Athlon FX-60 (Dual Core) | 18,938 MIPS at 2.6 GHz | 7.283 MIPS/MHz | 2006 | [9]
Intel Core 2 Extreme X6800 | 27,079 MIPS at 2.93 GHz | 9.242 MIPS/MHz | 2006 | [9]
Intel Core 2 Extreme QX6700 | 49,161 MIPS at 2.66 GHz | 18.481 MIPS/MHz | 2006 | [10]
P.A. Semi PA6T-1682M | 8,800 MIPS at 2.0 GHz | 4.4 MIPS/MHz | 2007 | [11]
Intel Core 2 Extreme QX9770 | 59,455 MIPS at 3.2 GHz | 18.580 MIPS/MHz | 2008 | [12]
Intel Core i7 Extreme 965EE | 76,383 MIPS at 3.2 GHz | 23.860 MIPS/MHz | 2008 | [13]
AMD Phenom II X4 940 Black Edition | 42,820 MIPS at 3.0 GHz | 14.273 MIPS/MHz | 2009 | [14]

TFLOPS (10^12 FLoating point Operations Per Second)

In computing, FLOPS (or flops or flop/s) is an acronym meaning FLoating point Operations Per Second. FLOPS is a measure of a computer's performance, especially in fields of scientific calculation that make heavy use of floating-point arithmetic, similar to the older, simpler measure of instructions per second.

Computer performance
Name       | FLOPS
yottaFLOPS | 10^24
zettaFLOPS | 10^21
exaFLOPS   | 10^18
petaFLOPS  | 10^15
teraFLOPS  | 10^12
gigaFLOPS  | 10^9
megaFLOPS  | 10^6
kiloFLOPS  | 10^3
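To make the IPS/MHz column concrete, the small Python sketch below (an illustration added here, using only figures taken from the table above) recomputes the ratio for one row and spells out a FLOPS prefix.

def mips_per_mhz(mips, clock_mhz):
    # Millions of instructions per second, normalized by the clock rate in MHz.
    return mips / clock_mhz

# Intel Pentium III row: 1,354 MIPS at 500 MHz
print(mips_per_mhz(1354, 500))   # prints 2.708, matching the table

# FLOPS prefixes are plain powers of ten, e.g. one teraFLOPS:
print(10**12, "floating point operations per second")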
Cache Memory (Key words: Cache Hit, Cache Miss, Hit Rate, Latency, Cache Types)

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.

When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory.

Consider two memories, the main memory and the cache. Each location in each memory holds a datum (a cache line), which in different designs ranges in size from 8 to 512 bytes. The size of the cache line is usually larger than the size of the usual access requested by a CPU instruction, which ranges from 1 to 16 bytes. Each location in each memory also has an index, which is a unique number used to refer to that location. The index for a location in main memory is called an address. Each location in the cache has a tag that contains the index of the datum in main memory that has been cached. In a CPU's data cache these entries are called cache lines or cache blocks.

Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data.

When the processor needs to read or write a location in main memory, it first checks whether that memory location is in the cache. This is accomplished by comparing the address of the memory location to all tags in the cache that might contain that address. If the processor finds that the memory location is in the cache, we say that a cache hit has occurred; otherwise, we speak of a cache miss. In the case of a cache hit, the processor immediately reads or writes the data in the cache line. The proportion of accesses that result in a cache hit is known as the hit rate, and it is a measure of the effectiveness of the cache.

In the case of a cache miss, most caches allocate a new entry, which comprises the tag just missed and a copy of the data from memory. The reference can then be applied to the new entry just as in the case of a hit. Misses are comparatively slow because they require the data to be transferred from main memory. This transfer incurs a delay, since main memory is much slower than cache memory, and it also incurs the overhead of recording the new data in the cache before it is delivered to the processor.
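The influence of the hit rate on average latency can be sketched with the usual weighted-average formula. The Python example below is illustrative only; the 1 ns cache latency and 100 ns main-memory latency are assumed values, not figures from these notes.

def average_access_time(hit_rate, cache_latency_ns, memory_latency_ns):
    # Hits are served at the cache latency; misses pay the (much larger) main-memory latency.
    return hit_rate * cache_latency_ns + (1.0 - hit_rate) * memory_latency_ns

# Assumed example latencies: 1 ns cache, 100 ns main memory.
for hit_rate in (0.50, 0.90, 0.99):
    print(f"hit rate {hit_rate:.2f}: {average_access_time(hit_rate, 1.0, 100.0):.2f} ns average")

With a 99% hit rate the average is about 2 ns, i.e. close to the cache latency, which is exactly the point made above.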
DMA (Direct Memory Access)

Direct memory access (DMA) is a feature of modern computers and microprocessors that allows certain hardware subsystems within the computer to access system memory for reading and/or writing independently of the central processing unit. Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards and sound cards. DMA is also used for intra-chip data transfer in multi-core processors, especially in multiprocessor systems-on-chip, where each processing element is equipped with a local memory (often called scratchpad memory) and DMA is used for transferring data between the local memory and the main memory.

Computers that have DMA channels can transfer data to and from devices with much less CPU overhead than computers without a DMA channel. Similarly, a processing element inside a multi-core processor can transfer data to and from its local memory without occupying its processor time, allowing computation and data transfer to proceed concurrently. Without DMA, using programmed input/output (PIO) mode for communication with peripheral devices, or load/store instructions in the case of multi-core chips, the CPU is typically fully occupied for the entire duration of the read or write operation and is thus unavailable to perform other work.

With DMA, the CPU initiates the transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is done. This is especially useful in real-time computing applications where not stalling behind concurrent operations is critical. Another related application area is various forms of stream processing, where it is essential to have data processing and data transfer in parallel in order to achieve sufficient throughput. A typical use of DMA is copying a block of memory from system RAM to or from a buffer on the device. Such an operation does not stall the processor, which as a result can be scheduled to perform other tasks.

Pipeline

An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time). The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once. The term pipeline refers to the fact that each step is carrying data at once (like water), and each step is connected to the next (like the links of a pipe).

Generic pipeline

Consider a generic four-stage pipeline, in which the colored boxes represent instructions independent of each other. The stages are:

1. Fetch
2. Decode
3. Execute
4. Write-back (for lw and sw instructions, memory is accessed after the execute stage)

The top gray box is the list of instructions waiting to be executed, the bottom gray box is the list of instructions that have been completed, and the middle white box is the pipeline. Execution proceeds as follows:

Clock tick | Execution
0 | Four instructions are awaiting execution.
1 | The green instruction is fetched from memory.
2 | The green instruction is decoded; the purple instruction is fetched from memory.
3 | The green instruction is executed (the actual operation is performed); the purple instruction is decoded; the blue instruction is fetched.
4 | The green instruction's results are written back to the register file or memory; the purple instruction is executed; the blue instruction is decoded; the red instruction is fetched.
5 | The green instruction is completed; the purple instruction is written back; the blue instruction is executed; the red instruction is decoded.
6 | The purple instruction is completed; the blue instruction is written back; the red instruction is executed.
7 | The blue instruction is completed; the red instruction is written back.
8 | The red instruction is completed.
9 | All instructions have been executed.
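The clock-tick table above can be reproduced with a very small simulation. The Python sketch below is an illustration added here (not the actual hardware mechanism): it shifts four instructions through the same four stages and prints which instruction occupies which stage at each tick.

STAGES = ["Fetch", "Decode", "Execute", "Write-back"]
instructions = ["green", "purple", "blue", "red"]

pipeline = [None] * len(STAGES)   # pipeline[i] = instruction currently in stage i
pending = list(instructions)      # instructions waiting to enter the pipeline
tick = 0

while pending or any(stage is not None for stage in pipeline):
    tick += 1
    completed = pipeline[-1]      # instruction leaving the Write-back stage
    # Advance the pipeline: the next waiting instruction enters Fetch, everything else shifts.
    pipeline = [pending.pop(0) if pending else None] + pipeline[:-1]
    report = [f"{stage}: {instr}" for stage, instr in zip(STAGES, pipeline) if instr]
    if completed:
        report.append(f"{completed} completed")
    print(f"tick {tick}: " + "; ".join(report))

Running it reproduces ticks 1 through 8 of the table: the first instruction needs four ticks to pass through all stages, but after that one instruction finishes on every tick, which is where the throughput gain comes from.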
SMP (Symmetric Multiprocessing)

In computing, symmetric multiprocessing (SMP) involves a multiprocessor computer architecture in which two or more identical processors are connected to a single shared main memory. Most common multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors. SMP systems allow any processor to work on any task no matter where the data for that task are located in memory; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.

Single Point of Failure

A single point of failure (SPOF) is a part of a system which, if it fails, will stop the entire system from working [1]. SPOFs are undesirable in any system whose goal is high availability, be it a network, a software application or another industrial system. Assessing potential single points of failure identifies the critical components of a complex system whose malfunction would provoke a total system failure. Highly reliable systems should not rely on any such individual component. Strategies to prevent total system failure include:

Reduced complexity: Complex systems should be designed according to principles that decompose complexity to the required level.

Redundancy: Redundant systems include a second instance of any critical component, with an automatic and robust switch to turn control over to the other, well-functioning unit (failover).

Diversity: Diversity is a special redundancy concept in which the duplicated functionality is implemented with completely different component designs, to decrease the probability that both redundant components fail at the same time under identical conditions.

Transparency: Whatever the system design delivers, long-term reliability depends on transparent and comprehensive documentation.
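As a minimal sketch of the redundancy/failover idea (the function and component names below are hypothetical, chosen only for illustration, not a production pattern), a caller can keep a second instance of a critical component and switch to it when the primary fails:

class ServiceUnavailable(Exception):
    pass

def call_with_failover(primary, standby, request):
    # Try the primary instance first; on failure, fail over to the standby.
    # With only a single instance, the same failure would stop the whole system.
    try:
        return primary(request)
    except ServiceUnavailable:
        return standby(request)

# Hypothetical example units: the primary is down, the standby answers.
def primary(request):
    raise ServiceUnavailable("primary unit has failed")

def standby(request):
    return f"handled '{request}' on standby unit"

print(call_with_failover(primary, standby, "read sensor"))

Removing the single point of failure this way only helps, of course, if the switch-over itself is automatic and robust, as noted above.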