Performance And Power Benchmarking
Khushboo Sheth
Department of Electrical and Computer Engineering
10/24/05 ELEC6500 1

Performance
Reducing response time (execution time): the time between the start and the completion of a task. This is the total time required for the computer to complete the task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, etc.
Increasing throughput: the total amount of work done in a given time.

Performance
Performance and execution time for a computer X are related by:

    Performance_X = 1 / Execution_Time_X

If computer X is n times faster than computer Y, then the execution time on Y is n times longer than it is on X:

    Performance_X / Performance_Y = Execution_Time_Y / Execution_Time_X = n

Execution Time
Elapsed time: the total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, etc.
CPU time: the time the CPU spends computing for the task; it does not include time spent waiting for I/O or running other programs (response time is the elapsed time, not the CPU time).
User CPU time: the CPU time spent in the program.
System CPU time: the CPU time spent in the operating system performing tasks on behalf of the program.

Computing CPU Execution Time
Computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware. These discrete time intervals are called clock cycles. The clock rate is the inverse of the clock period.
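The performance and speedup relations above can be checked numerically; a minimal sketch in Python (the execution times used are illustrative values, not measurements):

```python
def performance(execution_time_s):
    # Performance is defined as the reciprocal of execution time.
    return 1.0 / execution_time_s

def speedup(exec_time_x_s, exec_time_y_s):
    # "X is n times faster than Y" means Perf_X / Perf_Y = Time_Y / Time_X = n.
    return exec_time_y_s / exec_time_x_s

# Illustrative example: X finishes the task in 10 s, Y in 15 s.
n = speedup(10.0, 15.0)
print(n)  # 1.5, i.e. X is 1.5 times faster than Y
```

Both definitions agree: the performance ratio of the two machines equals the inverse ratio of their execution times.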
CPU execution time for a program = CPU clock cycles for the program * Clock cycle time

CPU clock cycles for a program = Instructions for the program * Average clock cycles per instruction

CPU Time = Instruction count * CPI * Clock cycle time

    Seconds     Instructions   Clock cycles      Seconds
    -------  =  ------------ * ------------ * -----------
    Program       Program      Instruction    Clock cycle

Evaluating Performance
A computer may be evaluated using a set of BENCHMARKS: programs specifically chosen to measure performance. The benchmarks form a workload that the user hopes will predict the performance of the actual workload.
"Synthetic" benchmarks: specially created programs that impose a workload on the component.
"Application" benchmarks: actual real-world programs run on the system.
Application benchmarks usually give a much better measure of real-world performance on a given system; synthetic benchmarks still have their use for testing individual components such as a hard disk or networking device.

Types Of Benchmarks
Real Program: word-processing software, CAD tool software, the user's own application software (MIS).
Kernel: contains key code, normally abstracted from an actual program. Popular kernels include the Livermore loops and the Linpack benchmark (basic linear algebra subroutines written in FORTRAN). Results are represented in MFLOPS.
Toy Benchmark: the user can program it and use it to test the computer's basic components.

Types of Benchmarks
Synthetic Benchmark
Procedure for programming a synthetic benchmark:
1. Take statistics of all types of operations from many application programs.
2. Obtain the proportion of each operation.
3. Write a program based on the proportions above.
Results are represented in KWIPS (Kilo Whetstone Instructions Per Second). Synthetic benchmarks are not suitable for measuring pipelined computers.
Types of Synthetic Benchmarks
Whetstone: a benchmark for evaluating the computing power of computers.
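The CPU time equation above (CPU time = instruction count * CPI * clock cycle time) can be sketched in Python; the instruction count, CPI and clock rate below are illustrative values:

```python
def cpu_time_s(instruction_count, cpi, clock_rate_hz):
    # Seconds/Program = Instructions/Program * Clock cycles/Instruction * Seconds/Clock cycle
    clock_cycle_time_s = 1.0 / clock_rate_hz    # clock period is the inverse of the clock rate
    cpu_clock_cycles = instruction_count * cpi  # total clock cycles for the program
    return cpu_clock_cycles * clock_cycle_time_s

# Illustrative example: 10 billion instructions, CPI of 2.0, 1 GHz clock.
t = cpu_time_s(10e9, 2.0, 1e9)
print(t)  # 20.0 seconds
```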
It was first written in Algol 60 at the National Physical Laboratory in the United Kingdom, and it originally measured computing power in units of kilo-WIPS. Results for a variety of languages, compilers and system architectures have been obtained, and modern workstations typically achieve more than 1,000,000 KWIPS. It primarily measures floating-point arithmetic performance.

Types of Benchmarks
Types of Synthetic Benchmarks
Dhrystone: a benchmark invented in 1984 by Reinhold P. Weicker. It contains no floating-point operations; the name is thus a pun on the then-popular Whetstone benchmark for floating-point operations. The output of the benchmark is the number of Dhrystones per second (the number of iterations of the main code loop per second). One common representation of the Dhrystone result is DMIPS (Dhrystone MIPS), obtained when the Dhrystone score is divided by 1,757 (the number of Dhrystones per second obtained on the VAX 11/780, a 1 MIPS machine). The Dhrystone benchmark contains mainly integer and string operations. Like most synthetic benchmarks, however, it is not particularly useful for measuring the performance of real-world computer systems and has fallen into disuse, replaced by benchmarks that more closely resemble typical actual usage.

SPEC
The Standard Performance Evaluation Corporation (SPEC) is a non-profit organization that aims to produce fair, impartial and meaningful benchmarks for computers. SPEC was founded in 1988 and is financed by its member organizations, which include all leading computer and software manufacturers. SPEC benchmarks are widely used today in evaluating the performance of computer systems. The benchmarks aim to test real-life situations.
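The Dhrystone-to-DMIPS conversion described above is a single division by the VAX 11/780 reference score; a minimal sketch (the measured score below is an illustrative value):

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # reference score of the VAX 11/780, a 1 MIPS machine

def dmips(dhrystones_per_second):
    # DMIPS = Dhrystone score divided by the VAX 11/780 reference score.
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SEC

# Illustrative example: a processor scoring 3,514,000 Dhrystones per second.
score = dmips(3_514_000)
print(score)  # 2000.0 DMIPS
```

By construction, the VAX 11/780 itself scores exactly 1.0 DMIPS.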
SPEC_WEB, for example, tests web server performance by performing various types of parallel HTTP requests, and SPEC_CPU tests CPU performance by measuring the run time of several programs such as the compiler gcc and the chess program crafty. The various tasks are assigned weights based on their perceived importance; these weights are used to compute a single benchmark result in the end. SPEC benchmarks are written in a platform-neutral programming language (usually C or FORTRAN), and interested parties may compile the code using whatever compiler they prefer for their platform, but may not change the code. Manufacturers have been known to optimize their compilers to improve the performance of the various SPEC benchmarks.

Various Current SPEC Benchmarks
SPEC CPU2000: combined performance of CPU, memory and compiler.
CINT2000 ("SPECint"): tests integer arithmetic, with programs such as compilers, interpreters, word processors, chess programs, etc.
CFP2000 ("SPECfp"): tests floating-point performance, with physical simulations, 3D graphics, image processing, computational chemistry, etc.
SPECweb99: web server performance, measured by setting up a network of client machines that stress the server with parallel requests.
SPEC HPC2002: tests high-end parallel computing systems with applications such as weather prediction and computational chemistry.
SPEC JVM98: performance of a Java client system running a Java virtual machine.
SPEC MAIL2001: performance of a mail server, testing the SMTP and POP protocols.
SPEC SFS97_R1: NFS file server throughput and response time.

Power Benchmarking
Power benchmarking of a computer is fundamentally the notion of determining how much energy the computer consumes in order to accomplish some measure of work.
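The "energy to accomplish some measure of work" notion above can be sketched as a simple energy-per-task comparison in Python (all power and time figures are illustrative values):

```python
def task_energy_j(avg_power_w, task_time_s):
    # Energy (joules) = average power (watts) * time to complete the task (seconds).
    return avg_power_w * task_time_s

# Illustrative comparison: a faster but hungrier system can still win on energy,
# because it finishes the work sooner ("race to idle").
fast = task_energy_j(avg_power_w=30.0, task_time_s=10.0)  # 300 J per task
slow = task_energy_j(avg_power_w=20.0, task_time_s=20.0)  # 400 J per task
print(fast < slow)  # True: the faster system completes the work on less energy
```

This is why power benchmarks measure energy per unit of work rather than instantaneous power draw alone.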
The BDTI (Berkeley Design Technology Inc.), EEMBC (EDN Embedded Microprocessor Benchmark Consortium), and SPEC (Standard Performance Evaluation Corp.) benchmark organizations support benchmark suites that highlight a processor's performance when performing application-specific tasks. Researchers at BDTI and EEMBC are both working on how to extend their benchmark suites to measure and compare a processor's energy efficiency, as opposed to raw power consumption, when performing application-specific tasks.

Power Benchmark Strategy
There are three primary areas of interest when benchmarking the characteristics of "low power" systems that employ power management techniques to achieve low-power goals.
First: the actual power consumption of the system under typical user conditions, presumably across the power management spectrum.
Second: system operability or usability under power management conditions. It is clear that one could achieve remarkable power characteristics at the cost of system performance and response time.
Third: the impact of power management techniques on system reliability.
An appropriate benchmarking strategy for power-managed systems must address these three areas in order to postulate an overall system figure of merit: low power without sacrificing system operability or reliability. It would be one that characterizes the system's power consumption while the system carries out some useful task.

Power Benchmarking
The primary interest in power benchmarking is the power consumed over the course of exercising a given application or, in the case of multi-tasking environments, multiple applications running simultaneously. Specifically, what is the system power consumption as an application is exercised and the system transitions through its various power-managed power states?
This is fundamentally a question of system expectations, from both the application's and the end user's perspective, and of how a power management facility might exploit these expectations to reduce system power consumption. If a specific system component isn't being used and is unlikely to be used in the immediate time frame, its level of readiness might be compromised in order to reduce its power consumption and, ultimately, that of the system. For example, a word-processing application being used in EDIT mode might not access the system's fixed disk for an extended period of time. The power management facility might recognize this as a flag suggesting that the fixed disk is unlikely to be called upon in the near term and, based on this determination, exploit the opportunity to transition the fixed disk to a lower power state. This scenario could progress to the point where the fixed disk is actually completely powered off, its lowest power state and lowest state of readiness.

Power Benchmarking
The energy consumption of a system is therefore the aggregate power dissipated by its components over time, at varying power states. In terms of time, it is the energy required to execute a given task to completion. This can be reflected at the system level as the summation of the energy requirements of each subtask, and can be computed by the following expression:

    Pt = [ sum over n = 1 to m of Pn ] * Tc / 3600

where
    Pt  = "task energy" in watt-hours (WHrs)
    m   = number of power transitions occurring during the task
    Pn  = segmented power dissipation during power state n
    Tc  = "task cycle" time, the time required to complete the task

    Pn = (Tsn * Ps) / Tc

where
    Tsn = time duration of power state n
    Ps  = power level during power state n

Power Benchmarking
This expression accurately reflects the energy consumed by a system during the execution of a task, but it does not reflect any notion of system operability, specifically the time spent to complete the task. It remains unclear how to define and apply a consistent approach to measuring energy efficiency and correlating it with a performance point. Both BDTI and EEMBC now propose that applying a workload to the core and its local memories is sufficient, provided that proper disclosure of the testing configuration exists. Standard power and energy-efficiency benchmarks are coming to fruition, and considerable opportunity exists to refine them. The importance of power benchmarks will continue to grow, especially because a growing number of processors have similar or identical core architectures. However, just like performance benchmarks, power benchmarks require developers to practice due diligence when mapping the benchmark data and testing configuration to their project's requirements.

References: David A. Patterson and John L. Hennessy; James W. Davis; EDNAsia.com
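The task-energy expression earlier (Pt as the Tc-weighted sum of the segmented powers Pn, divided by 3600 to convert to watt-hours) can be implemented directly; a minimal sketch in Python, with illustrative state durations and power levels:

```python
def segmented_power_w(state_duration_s, state_power_w, task_cycle_s):
    # Pn = (Tsn * Ps) / Tc : a power state's dissipation weighted by its share of the task.
    return (state_duration_s * state_power_w) / task_cycle_s

def task_energy_whrs(states, task_cycle_s):
    # Pt = [ sum of Pn over the m power states ] * Tc / 3600, in watt-hours.
    total_pn_w = sum(segmented_power_w(tsn, ps, task_cycle_s) for tsn, ps in states)
    return total_pn_w * task_cycle_s / 3600.0

# Illustrative task: 60 s active at 25 W, then 30 s in a low-power state at 5 W.
states = [(60.0, 25.0), (30.0, 5.0)]  # (Tsn seconds, Ps watts) per power state
pt = task_energy_whrs(states, task_cycle_s=90.0)
print(pt)  # about 0.458 WHrs (1650 joules / 3600)
```

Note that the Tc factors cancel: the result is simply the sum of Tsn * Ps over all states, converted from watt-seconds to watt-hours.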