Evaluating Performance
Chapter 1, Section 1.4
Dr. Iyad F. Jafar

Outline
- Introduction
- CPU Execution Time
- The Performance Equation
- Determinants of Performance
- SPEC Benchmark
- Other Performance Metrics
- Examples

Introduction
Given a collection of computers, which one should we buy? The one with the best performance? The least cost? The best cost/performance ratio?

How do we define performance?
- Time required to finish a task (response time, which matters to individual users)
- Number of tasks executed per unit time (throughput)

Less time to finish a task implies better performance:

$$\text{Performance} = \frac{1}{\text{Execution Time}}$$

If X is n times faster than Y, then

$$n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X}$$

CPU Execution Time
How do we measure execution time? Almost all modern computers are driven by a clock. The clock is a periodic square wave with a known period (the cycle time), where Period = 1 / Frequency.

The base unit for measuring time is the cycle time (one clock cycle), thus

$$\text{Time} = \text{cycles} \times \text{cycle time} = \frac{\text{cycles}}{\text{clock frequency}}$$

The time required to execute a program is essentially the time required to execute its instructions. If every instruction took exactly one cycle, then

$$\text{Time} = \#\text{instructions} \times \text{cycle time}$$

However, not all instructions take the same time!
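The snippet below is a minimal Python sketch of the clock relations above. It converts a cycle count into seconds using Time = cycles / clock frequency. It then computes n in "X is n times faster than Y" as Time_Y / Time_X. The helper names `execution_time` and `times_faster` are hypothetical, and the cycle count of 2 million and the two clock rates are made-up illustration values.

```python
def execution_time(cycles: float, clock_hz: float) -> float:
    """Time = cycles * cycle time = cycles / clock frequency (seconds)."""
    cycle_time = 1.0 / clock_hz  # Period = 1 / Frequency
    return cycles * cycle_time


def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (Time_Y / Time_X)."""
    return time_y / time_x


# Hypothetical example: the same 2 million cycles on a 1 GHz and a 2 GHz clock.
t_slow = execution_time(2_000_000, 1e9)  # about 2 ms
t_fast = execution_time(2_000_000, 2e9)  # about 1 ms
n = times_faster(t_fast, t_slow)
print(f"{t_slow * 1e3:.1f} ms vs {t_fast * 1e3:.1f} ms: {n:.1f} times faster")
```

Because the cycle count is identical here, the ratio depends only on the clock rates. Once instructions take different numbers of cycles, the cycle count itself changes, which is what the CPI terms below capture.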
One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction:

$$\text{Time} = \#\text{instructions} \times \text{average cycles per instruction} \times \text{cycle time}$$

$$\text{Time} = IC \times CPI \times CC$$

The average CPI is computed by

$$\text{Effective CPI} = \sum_{k=1}^{N} CPI_k \times \frac{IC_k}{IC}$$

where
- $IC_k$ is the number of instructions of class k executed
- $CPI_k$ is the number of clock cycles per instruction for that instruction class
- $N$ is the number of instruction classes
- $IC = \sum_{k=1}^{N} IC_k$ is the total number of instructions executed

Note: the overall effective CPI varies with the instruction mix, a measure of the dynamic frequency of instructions across one or many programs.

The Performance Equation

$$\text{Execution Time} = IC \times CPI \times CC = \frac{IC \times CPI}{CR}$$

$$\text{Performance} = \frac{1}{\text{Execution Time}}$$

Notes: three key factors determine performance: IC, CPI, and CC.
- CC: the clock cycle time, or equivalently the clock rate CR = 1 / CC, is usually given.
- IC: the overall count of executed instructions, obtained using profilers or simulators.
- CPI: varies by instruction type and ISA implementation.

Example 1. In a certain program, 1000 instructions were executed on a CPU running at 1 GHz. Given the instruction count and CPI for each class below, how long does it take to execute the program?

| Instruction Class | Instruction Count | Class CPI |
|---|---|---|
| 1 | 200 | 2 |
| 2 | 300 | 3 |
| 3 | 500 | 1 |

$$\text{Effective CPI} = \frac{200 \times 2 + 300 \times 3 + 500 \times 1}{1000} = 1.8$$

$$\text{Time} = 1000 \times 1.8 \times 1\ \text{ns} = 1.8\ \mu\text{s}$$

Example 2. Suppose computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program. Computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster?

$$\text{Time}_A = IC \times 2.0 \times 250\ \text{ps} = 500\,IC\ \text{ps}$$

$$\text{Time}_B = IC \times 1.2 \times 500\ \text{ps} = 600\,IC\ \text{ps}$$

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Time}_B}{\text{Time}_A} = \frac{600\,IC}{500\,IC} = 1.2$$

Computer A is 1.2 times faster than B.

Example 3. A certain processor that has four instruction classes is to be modified using different approaches.
The details of the program used to evaluate the approaches are given in the table below. What is the effective CPI for each of the following?
- The original processor
- Approach 1: a cache is added, reducing the average load time to 2 cycles.
- Approach 2: a branch prediction scheme is used, cutting the branch time by 1 cycle.
- Approach 3: a second ALU is added to execute two ALU instructions at once.

| Class | Frequency (F) | Class CPI | CPI_k × F (Original) | CPI_k × F (App. 1) | CPI_k × F (App. 2) | CPI_k × F (App. 3) |
|---|---|---|---|---|---|---|
| ALU | 50% | 1 | 0.5 | 0.5 | 0.5 | 0.25 |
| Load | 20% | 5 | 1.0 | 0.4 | 1.0 | 1.0 |
| Store | 10% | 3 | 0.3 | 0.3 | 0.3 | 0.3 |
| Branch | 20% | 2 | 0.4 | 0.4 | 0.2 | 0.4 |
| Effective CPI | | | 2.2 | 1.6 | 2.0 | 1.95 |
| Speedup | | | | 1.375 | 1.10 | 1.128 |

Since IC and CC are unchanged, Speedup = Effective CPI (original) / Effective CPI (approach).

Determinants of Performance

$$\text{Execution Time} = IC \times CPI \times CC$$

| Factor | IC | CPI | CC |
|---|---|---|---|
| Algorithm | X | X | |
| Programming Language | X | X | |
| Compiler | X | X | |
| ISA | X | X | X |
| Processor Organization | | X | X |
| Technology | | | X |

SPEC Benchmark
What programs can be used to evaluate different computers? Can vendors cheat? We need a standard!

SPEC Benchmark: Standard Performance Evaluation Corp. (SPEC)
- Programs used to measure performance (CPU, Web, I/O, ...)
- Typical of actual workloads

SPEC CPU2006
- Elapsed time to execute a selection of programs
- Negligible I/O, so it focuses on CPU performance
- Results are summarized as the geometric mean of performance ratios (SPECratio = reference time / execution time)
- CINT2006 (integer) and CFP2006 (floating-point)

$$\text{Geometric Mean} = \sqrt[n]{\prod_{i=1}^{n} \text{Execution Time Ratio}_i}$$

CINT2006 for Opteron X4 2356

| Name | Description | IC×10^9 | CPI | Tc (ns) | Exec time (s) | Ref time (s) | SPECratio |
|---|---|---|---|---|---|---|---|
| perl | Interpreted string processing | 2,118 | 0.75 | 0.40 | 637 | 9,777 | 15.3 |
| bzip2 | Block-sorting compression | 2,389 | 0.85 | 0.40 | 817 | 9,650 | 11.8 |
| gcc | GNU C Compiler | 1,050 | 1.72 | 0.40 | 724 | 8,050 | 11.1 |
| mcf | Combinatorial optimization | 336 | 10.00 | 0.40 | 1,345 | 9,120 | 6.8 |
| go | Go game (AI) | 1,658 | 1.09 | 0.40 | 721 | 10,490 | 14.6 |
| hmmer | Search gene sequence | 2,783 | 0.80 | 0.40 | 890 | 9,330 | 10.5 |
| sjeng | Chess game (AI) | 2,176 | 0.96 | 0.40 | 837 | 12,100 | 14.5 |
| libquantum | Quantum computer simulation | 1,623 | 1.61 | 0.40 | 1,047 | 20,720 | 19.8 |
| h264avc | Video compression | 3,102 | 0.80 | 0.40 | 993 | 22,130 | 22.3 |
| omnetpp | Discrete event simulation | 587 | 2.94 | 0.40 | 690 | 6,250 | 9.1 |
| astar | Games/path finding | 1,082 | 1.79 | 0.40 | 773 | 7,020 | 9.1 |
| xalancbmk | XML parsing | 1,058 | 2.70 | 0.40 | 1,143 | 6,900 | 6.0 |
| Geometric mean | | | | | | | 11.7 |

Note: the high CPI of mcf (10.00) is due to high cache miss rates.

CINT2000 Results for Various Processors
[Figure: CINT2000 score versus clock speed (GHz) for Pentium 3, Pentium 4, P4 Extreme, Xeon, Athlon, Athlon 64, Opteron, Pmac G5, Athlon FX (DC), Core Duo, and Core 2 Duo.]

CFP2000 Results for Various Processors
[Figure: CFP2000 score versus clock speed (GHz) for the same processors.]

Other Performance Factors
Power consumption is another factor in evaluating computers. It is especially important in the embedded market, where battery life (and passive cooling) matter.

Examples
Example 4. A program executes 10^6 instructions with the following mix: 10% class A, 20% class B, 50% class C, and 20% class D. The program is executed on two different processors with the specifications given below.

| Processor | CR (GHz) | CPI Class A | CPI Class B | CPI Class C | CPI Class D |
|---|---|---|---|---|---|
| 1 | 1.5 | 1 | 2 | 3 | 4 |
| 2 | 2 | 2 | 2 | 2 | 2 |

- What is the effective CPI for the program on each implementation?
- Which implementation is faster?
- What is the speedup?

Example 5. The information for a program executed on some processor is given below.

| Class_i | CPI_i | Frequency_i |
|---|---|---|
| 1 | 2 | 0.3 |
| 2 | 5 | 0.2 |
| 3 | 3 | 0.5 |

The processor is modified so that the CPI for class 2 instructions is reduced to 2. Would it be beneficial to adopt this modification in each of the following cases?
(a) The modification requires increasing the clock cycle time by 10%.
(b) The modification does not affect the clock cycle but requires twice the amount of power to execute the program.
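The worked Examples 1 to 3 can be checked with a short Python sketch of the performance equation. The names `effective_cpi` and `exec_time` are hypothetical helpers, not from the slides. `effective_cpi` normalizes its weights, so it accepts either raw instruction counts (Example 1) or class frequencies (Example 3). One modeling choice is an assumption: Approach 3 is represented as an ALU CPI of 0.5, because that reproduces the table's 0.25 entry.

```python
def effective_cpi(weights, cpis):
    """Effective CPI = sum_k (IC_k / IC) * CPI_k.

    `weights` may be instruction counts IC_k or frequencies IC_k / IC;
    they are normalized by their sum, so either form works.
    """
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, cpis)) / total


def exec_time(ic, cpi, cycle_time_s):
    """Execution Time = IC x CPI x CC, in seconds."""
    return ic * cpi * cycle_time_s


# Example 1: three classes, 1000 instructions, 1 GHz clock (CC = 1 ns).
cpi1 = effective_cpi([200, 300, 500], [2, 3, 1])
t1 = exec_time(1000, cpi1, 1e-9)
print(f"Example 1: CPI = {cpi1:.2f}, time = {t1 * 1e6:.2f} us")

# Example 2: IC is the same on both machines, so use IC = 1; it cancels.
ratio = exec_time(1, 1.2, 500e-12) / exec_time(1, 2.0, 250e-12)
print(f"Example 2: A is {ratio:.2f} times faster than B")

# Example 3: frequency and CPI per class (ALU, Load, Store, Branch).
classes = ["ALU", "Load", "Store", "Branch"]
freq = [0.5, 0.2, 0.1, 0.2]
base = {"ALU": 1, "Load": 5, "Store": 3, "Branch": 2}
variants = {
    "Original": base,
    "Approach 1": {**base, "Load": 2},    # cache: loads take 2 cycles
    "Approach 2": {**base, "Branch": 1},  # branch prediction saves 1 cycle
    "Approach 3": {**base, "ALU": 0.5},   # assumed: two ALU ops per cycle
}
cpi_orig = effective_cpi(freq, [base[c] for c in classes])
for name, cpis in variants.items():
    cpi = effective_cpi(freq, [cpis[c] for c in classes])
    # IC and CC are unchanged, so speedup is the ratio of effective CPIs.
    print(f"Example 3, {name}: CPI = {cpi:.2f}, speedup = {cpi_orig / cpi:.3f}")
```

Normalizing inside `effective_cpi` means the same helper covers both ways the slides state an instruction mix.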
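The same approach can be applied to Examples 4 and 5 as a sketch for checking answers. The helper names are again hypothetical. For Example 5(b), the code assumes that "twice the amount of power" means the average power doubles while the program runs. Under that assumption, energy (power × time) scales by 2 × (new time / old time). A different reading of the question would change that part of the calculation.

```python
def effective_cpi(weights, cpis):
    """Effective CPI = sum_k (IC_k / IC) * CPI_k, weights normalized."""
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, cpis)) / total


def exec_time_from_rate(ic, cpi, clock_rate_hz):
    """Execution Time = IC x CPI / CR, in seconds."""
    return ic * cpi / clock_rate_hz


# Example 4: 10^6 instructions, mix A/B/C/D = 10%/20%/50%/20%.
ic = 10**6
mix = [0.1, 0.2, 0.5, 0.2]
cpi_p1 = effective_cpi(mix, [1, 2, 3, 4])  # processor 1 at 1.5 GHz
cpi_p2 = effective_cpi(mix, [2, 2, 2, 2])  # processor 2 at 2 GHz
t_p1 = exec_time_from_rate(ic, cpi_p1, 1.5e9)
t_p2 = exec_time_from_rate(ic, cpi_p2, 2.0e9)
faster = "processor 2" if t_p2 < t_p1 else "processor 1"
print(f"Example 4: CPI1 = {cpi_p1:.2f}, CPI2 = {cpi_p2:.2f}, "
      f"{faster} is faster, speedup = {max(t_p1, t_p2) / min(t_p1, t_p2):.3f}")

# Example 5: classes 1, 2, 3 with frequencies 0.3, 0.2, 0.5.
freq5 = [0.3, 0.2, 0.5]
cpi_old = effective_cpi(freq5, [2, 5, 3])
cpi_new = effective_cpi(freq5, [2, 2, 3])  # class 2 CPI reduced from 5 to 2

# (a) Clock cycle time grows by 10%; IC is unchanged, so it cancels.
speedup_a = cpi_old / (cpi_new * 1.10)

# (b) Same clock cycle time. Assumption: average power doubles, so
#     energy = power x time scales by 2 x (new time / old time).
speedup_b = cpi_old / cpi_new
energy_ratio_b = 2 * cpi_new / cpi_old
print(f"Example 5: (a) speedup = {speedup_a:.3f}; "
      f"(b) speedup = {speedup_b:.3f}, energy ratio = {energy_ratio_b:.3f}")
```

With these inputs, case (a) still gives a speedup above 1, so the lower CPI outweighs the slower clock. In case (b) the program runs faster but, under the stated power assumption, consumes more energy. Whether (b) is worthwhile therefore depends on whether time or energy matters more, for example in battery-powered embedded systems.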