COMPUTER ABSTRACTIONS AND TECHNOLOGY

INTRODUCTION
• Computer technology has made incredible progress since the invention of the first general-purpose computers
• After the rate of improvement brought about by the microprocessor, computer designers were able to succeed commercially with newer architectures because
  – The virtual elimination of assembly language reduced the need for object-code compatibility
  – The creation of standard, vendor-independent operating systems such as UNIX and Linux lowered the cost and risk of introducing a new architecture
• These changes made possible a new set of architectures with simpler instructions, called RISC, that focused on instruction-level parallelism and the use of caches

CLASSES OF COMPUTERS
• The five classes of computers: Internet of Things/embedded computers, personal mobile devices (PMDs), desktop computers, servers, and clusters/warehouse-scale computers

CLASSES OF PARALLELISM
• Parallelism is the use of multiple processing elements to solve a problem
• There are two kinds of parallelism in applications
  – Data-level parallelism (DLP): many data items can be operated on at the same time
  – Task-level parallelism (TLP): tasks of work are created that can run independently and largely in parallel
• Computer hardware exploits these two kinds of application parallelism in four major ways
  – Instruction-level parallelism exploits DLP at modest levels with the compiler's help, using ideas like pipelining and speculative execution
  – Vector architectures, graphics processing units (GPUs), and multimedia instruction sets exploit DLP by applying a single instruction to a collection of data in parallel
  – Thread-level parallelism exploits either DLP or TLP in a tightly coupled hardware model that allows interaction among parallel threads
  – Request-level parallelism exploits parallelism among largely decoupled tasks specified by the programmer or the operating system

FLYNN'S TAXONOMY
• Single instruction stream, single data stream (SISD)
  – The uniprocessor: a single instruction operates on a single data stream
• Single instruction stream, multiple data streams (SIMD)
  – The same instruction is executed by multiple processors, each operating on its own data stream
  – Exploits DLP
• Multiple instruction streams, single data stream (MISD)
  – Multiple processors execute different instructions while operating on the same data stream
  – No commercial multiprocessor of this type has been built
• Multiple instruction streams, multiple data streams (MIMD)
  – Each processor fetches its own instructions and operates on its own data
  – Targets TLP

DEFINING COMPUTER ARCHITECTURE

ISA: THE MYOPIC VIEW OF ARCHITECTURE
• Computer architecture is the design of the instruction set and its implementation
• The instruction set architecture (ISA) is the programmer-visible instruction set
• The ISA serves as the boundary between software and hardware
• e.g. the RISC-V, ARM, and 80x86 ISAs
• The seven dimensions of an ISA
  – Class of ISA
  – Memory addressing
  – Addressing modes
  – Types and sizes of operands
  – Operations
  – Control flow instructions
  – Encoding an ISA

GENUINE COMPUTER ARCHITECTURE
• Computer implementation has two components: organization (or microarchitecture) and hardware
  – Organization covers the high-level aspects of a computer's design, such as the memory system, the memory interconnect, and the design of the internal processor
  – Two processors can have the same ISA but different organizations; an example is the Intel Core i7 and the AMD Opteron
  – Hardware refers to the specifics of a computer, including the detailed logic design and the packaging technology
  – Two processors can have the same ISA and nearly identical organizations but different hardware; an example is the Intel Core i7 and the Intel Xeon E7, which differ in clock rate and memory system
• COMPUTER ARCHITECTURE = ISA + Organization + Hardware

FUNCTIONAL REQUIREMENTS

TRENDS IN TECHNOLOGY

TRENDS IN POWER AND ENERGY OF ICs

POWER AND ENERGY: A SYSTEM'S PERSPECTIVE
• From a system designer's point of view, there are three primary concerns relating power, energy, and performance
  – What is the maximum power a processor ever requires?
  – What is the sustained power consumption? This is called the thermal design power (TDP)
  – Energy and energy efficiency

ENERGY AND POWER WITHIN A MICROPROCESSOR
• For CMOS chips, the traditional primary energy consumption has been in switching transistors, also called dynamic energy
• Energy_dynamic ∝ Capacitive load × Voltage²
  – the energy of the full pulse of a logic transition 0→1→0 or 1→0→1
• Energy_dynamic ∝ 1/2 × Capacitive load × Voltage²
  – the energy required per single transition from 1 to 0 or 0 to 1
• Power_dynamic ∝ 1/2 × Capacitive load × Voltage² × Frequency switched
  – the dynamic power per transistor

EXAMPLE
• QUESTION
  Some microprocessors today are designed to have adjustable voltage, so a 15% reduction in voltage may result in a 15% reduction in frequency. What would be the impact on dynamic energy and on dynamic power?
• ANSWER
  Energy_new / Energy_old = (Voltage × 0.85)² / Voltage² = 0.85² ≈ 0.72
  Power_new / Power_old = 0.72 × (Frequency switched × 0.85) / Frequency switched ≈ 0.61
  Therefore the dynamic energy and dynamic power used by the microprocessor are reduced to about 72% and 61% of their original values, respectively

TECHNIQUES TO IMPROVE ENERGY EFFICIENCY
• Do nothing well
  – Disable the clocks of inactive modules
• Dynamic voltage-frequency scaling (DVFS)
  – Reduce the clock frequency and voltage during periods of low activity
• Design for the typical case
  – Memory and storage in PMDs have low-power modes to save energy
  – Microprocessors have built-in temperature sensors that throttle activity when the chip runs hot
• Overclocking
  – Increase the clock rate of a microprocessor for a short period, until temperatures start to rise
• [Figure: growth of clock rates over the years]

ENERGY AND POWER WITHIN A MICROPROCESSOR CONT'D
• Static power is the power consumed when there is no circuit activity; it occurs due to leakage current within the devices
• Power_static ∝ Static current × Voltage
• The larger the number of devices, the larger the overall static power
• Static power can be controlled with techniques such as power gating or race-to-halt

THE SHIFT IN COMPUTER ARCHITECTURE BECAUSE OF LIMITS IN ENERGY
• [Figure: comparison of the energy and die area of arithmetic operations and the energy cost of accesses to SRAM and DRAM]
• Dark silicon is the phenomenon in which idle transistors are switched off to reduce energy usage and power consumption
• Studies of this phenomenon led to domain-specific processors, which save energy by reducing wide floating-point operations and by deploying special-purpose memories to reduce accesses to DRAM

DEPENDABILITY
• Dependability is the ability of a system to deliver its intended level of service to its users
• A service level agreement (SLA) or service level objective (SLO) is a formal contract between a service provider and a client that outlines the specific services to be delivered, performance standards, and the responsibilities of both parties. It defines the expected level of service, including metrics for measuring performance and what happens if those performance levels are not met
• With respect to an SLA, a system can be in one of two states
  – Service accomplishment, where the service is delivered as specified
  – Service interruption, where the delivered service is different from the SLA
• Quantifying the transitions between these states leads to the two main measures of dependability
  – Module reliability: a measure of continuous service accomplishment (or, equivalently, of the time to failure) from a reference initial instant; mean time to failure (MTTF) is a reliability measure
  – Module availability: a measure of service accomplishment with respect to the alternation between the two states of accomplishment and interruption
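These two measures can be illustrated with a short Python sketch. The helper names and the 200,000-hour MTTF / 24-hour MTTR figures are illustrative; the formulas are the standard exponential-lifetime failure rate (1/MTTF) and steady-state availability (MTTF/(MTTF + MTTR)) relations:

```python
# A sketch of the two dependability measures (illustrative figures: a module
# with a 200,000-hour MTTF repaired in 24 hours; formulas are the standard
# exponential-lifetime and steady-state availability relations).

def failure_rate(mttf_hours: float) -> float:
    """Failures per hour, assuming exponentially distributed lifetimes."""
    return 1.0 / mttf_hours

def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a nonredundant module with repair delivers service."""
    return mttf_hours / (mttf_hours + mttr_hours)

print(failure_rate(200_000))      # 5e-06 failures per hour
print(availability(200_000, 24))  # ~0.99988
```

Note that for realistic modules MTTR is tiny relative to MTTF, so availability is close to 1 and is often quoted in "nines".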
• For a nonredundant system with repair, Module availability = MTTF / (MTTF + MTTR)

DEPENDABILITY EXAMPLE
• QUESTION
  Assume a disk subsystem with the following components and MTTFs:
  ■ 10 disks, each rated at 1,000,000-hour MTTF
  ■ 1 ATA controller, 500,000-hour MTTF
  ■ 1 power supply, 200,000-hour MTTF
  ■ 1 fan, 200,000-hour MTTF
  ■ 1 ATA cable, 1,000,000-hour MTTF
  Using the simplifying assumptions that the lifetimes are exponentially distributed and that failures are independent, compute the MTTF of the system as a whole
• ANSWER
  Failure rate_system = 10 × 1/1,000,000 + 1/500,000 + 1/200,000 + 1/200,000 + 1/1,000,000
                      = (10 + 2 + 5 + 5 + 1) / 1,000,000 hours
                      = 23/1,000,000 hours, or 23,000 FIT (failures per 10^9 hours)
  MTTF_system = 1 / Failure rate_system = 1,000,000,000 hours / 23,000 ≈ 43,500 hours (just under 5 years)

EXAMPLE
• QUESTION
  Disk subsystems often have redundant power supplies to improve dependability. Using the preceding components and MTTFs, calculate the reliability of redundant power supplies. Assume that one power supply is sufficient to run the disk subsystem and that we are adding one redundant power supply.
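As a quick check, the series-system arithmetic from the disk-subsystem example above can be reproduced in Python (a sketch; with independent, exponentially distributed lifetimes, the failure rates of components in series simply add):

```python
# Reproduces the disk-subsystem MTTF arithmetic: failure rates of
# independent components in series add, and the system MTTF is the
# reciprocal of the summed rate.
component_mttfs = (
    [1_000_000] * 10   # 10 disks
    + [500_000]        # ATA controller
    + [200_000]        # power supply
    + [200_000]        # fan
    + [1_000_000]      # ATA cable
)

failure_rate = sum(1.0 / m for m in component_mttfs)  # failures per hour
mttf_system = 1.0 / failure_rate                      # hours

print(failure_rate * 1e9)  # ≈ 23,000 FIT (failures per 10^9 hours)
print(mttf_system)         # ≈ 43,478 hours
```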
• ANSWER
  MTTF_power supply pair = (mean time until one power supply fails) / (probability of a second failure before the first is repaired)
  Mean time until one of the pair fails = MTTF_power supply / 2
  Probability of a second failure before repair ≈ MTTR_power supply / MTTF_power supply
  MTTF_power supply pair = (MTTF_power supply / 2) / (MTTR_power supply / MTTF_power supply) = MTTF_power supply² / (2 × MTTR_power supply)

EXAMPLE
• Using the preceding MTTF numbers, if we assume it takes on average 24 hours for a human operator to notice that a power supply has failed and to replace it, the reliability of the fault-tolerant pair of power supplies is
  MTTF_power supply pair = 200,000² / (2 × 24) ≈ 830,000,000 hours
  which is about 4150 times more reliable than a single power supply

MEASURING, REPORTING AND SUMMARIZING PERFORMANCE
• Performance can be seen in terms of response time (or execution time), the time between the start and the completion of an event, and throughput, the total amount of work done in a given time
• Comparing the performance of two computers, X and Y, "X is n times faster than Y" means
  Execution time_Y / Execution time_X = Performance_X / Performance_Y = n
• Execution time, or elapsed time, includes the time for I/O activities, memory accesses, and storage accesses
• CPU time is the time the processor spends computing, not including the time waiting for I/O or running other programs

QUANTITATIVE PRINCIPLES OF COMPUTER DESIGN
• Take advantage of parallelism
• Principle of locality
• Focus on the common case
• Amdahl's law
  – The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time the improved part is actually used
  – Speedup = Performance for entire task using the enhancement when possible / Performance for entire task without using the enhancement
  – Execution time_new = Execution time_old × ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
  – Speedup_overall = Execution time_old / Execution time_new = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

EXAMPLE
• QUESTION
  Suppose that we want to enhance the processor used for web serving. The new processor is 10 times faster on computation in the web serving application than the old processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?
• ANSWER
  Fraction_enhanced = 0.4, Speedup_enhanced = 10
  Speedup_overall = 1 / (0.6 + 0.4/10) = 1/0.64 ≈ 1.56

EXAMPLE
• QUESTION
  A common transformation required in graphics processors is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FSQRT) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FSQRT hardware and speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for half of the execution time for the application. The design team believes that they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root. Compare these two design alternatives.
• ANSWER
  Speedup_FSQRT = 1 / ((1 − 0.2) + 0.2/10) = 1/0.82 ≈ 1.22
  Speedup_FP = 1 / ((1 − 0.5) + 0.5/1.6) = 1/0.8125 ≈ 1.23
  Improving all FP instructions is slightly better because they account for a larger fraction of the execution time

EXAMPLE
• QUESTION
  Amdahl's law is applicable beyond performance. Let's redo the reliability example from page 39 after improving the reliability of the power supply via redundancy, from a 200,000-hour to an 830,000,000-hour MTTF, or 4150 times better.
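The speedups worked in the examples above all come from the same Amdahl's law expression; a small helper (a sketch, with the fractions and speedups taken from those examples) makes this concrete:

```python
# Amdahl's law as a helper function; the calls reproduce the worked
# examples (web serving, FSQRT vs. all-FP) from the text.

def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

print(round(amdahl_speedup(0.4, 10), 2))   # 1.56  (web serving: 40% of time, 10x)
print(round(amdahl_speedup(0.2, 10), 2))   # 1.22  (FSQRT: 20% of time, 10x)
print(round(amdahl_speedup(0.5, 1.6), 2))  # 1.23  (all FP: 50% of time, 1.6x)
```

As the first call shows, even a 10x enhancement yields only a 1.56x overall gain when 60% of the time is untouched.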
• ANSWER
  Failure rate_power supply = 1/200,000 = 5 × 10^-6 failures per hour
  The power supply accounts for 5/23 ≈ 0.22 of the original system failure rate
  Improvement = 1 / ((1 − 0.22) + 0.22/4150) ≈ 1/0.78 ≈ 1.28
  The reliability of the system therefore improves by about a factor of 1.28

THE PROCESSOR PERFORMANCE EQUATION
• CPU time = CPU clock cycles for a program × Clock cycle time
• OR CPU time = CPU clock cycles for a program / Clock rate
• Clock cycles per instruction (CPI) = CPU clock cycles for a program / Instruction count (IC)
• CPU time = IC × CPI × Clock cycle time
• Equivalently, CPU time = Instructions/Program × Clock cycles/Instruction × Seconds/Clock cycle = Seconds/Program
• With n instruction classes, CPU clock cycles = Σ(i=1..n) IC_i × CPI_i, where IC_i is the count of instructions of class i and CPI_i is the cycles per instruction for that class
• CPU time = (Σ(i=1..n) IC_i × CPI_i) × Clock cycle time
• CPI = (Σ(i=1..n) IC_i × CPI_i) / IC
• For example, a program with IC = 2 × 10^9, CPI = 1.5, and a 1 GHz clock (1 ns cycle time) takes 2 × 10^9 × 1.5 × 1 ns = 3 seconds of CPU time

THE END