Recap

Measuring Performance
• A computer user: response time (execution time).
• A computer center manager: throughput, the total amount of work done in a period of time.
• CPU time: a very good and fair measure of performance.
• CPU time can also be divided into user CPU time (program) and system CPU time (OS).

Aspects of CPU Execution Time
CPU Time = Instruction count x CPI x Clock cycle
• Instruction Count I depends on: Program Used, Compiler, ISA
• CPI depends on: Program Used, Compiler, ISA, CPU Organization
• Clock Cycle C depends on: CPU Organization, Technology

Factors Affecting CPU Performance
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

                                     Instruction Count I   CPI   Clock Cycle C
Program                              X                     X
Compiler                             X                     X
Instruction Set Architecture (ISA)   X                     X
Organization                                               X     X
Technology                                                       X

Example: tradeoff between C and CPI

Op         Frequency   Cycle Count
ALU ops    43%         1
Loads      21%         1
Stores     12%         2
Branches   24%         2

• Assume stores can execute in 1 cycle by slowing clock 15%
• Should this be implemented?
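The question above can be checked numerically from CPU Time = I x CPI x C. The following is a minimal sketch in Python, not part of the original slides; the helper names `average_cpi` and `speedup` are hypothetical, and it assumes the instruction count I is the same before and after the change, so only CPI and C matter.

```python
# Instruction mix from the example: op -> (frequency, cycles per instruction).
OLD_MIX = {
    "alu": (0.43, 1),
    "load": (0.21, 1),
    "store": (0.12, 2),
    "branch": (0.24, 2),
}
# Proposed design: stores take 1 cycle instead of 2.
NEW_MIX = dict(OLD_MIX, store=(0.12, 1))
# New clock cycle / old clock cycle (clock slowed by 15%).
CLOCK_RATIO = 1.15


def average_cpi(mix):
    """Weighted average cycles per instruction for an instruction mix."""
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup(old_mix, new_mix, clock_ratio):
    """Old time / new time, assuming the instruction count I is unchanged.

    CPU time = I x CPI x C, so I cancels in the ratio.
    """
    return average_cpi(old_mix) / (average_cpi(new_mix) * clock_ratio)


if __name__ == "__main__":
    print(f"old CPI = {average_cpi(OLD_MIX):.2f}")
    print(f"new CPI = {average_cpi(NEW_MIX):.2f}")
    print(f"speedup = {speedup(OLD_MIX, NEW_MIX, CLOCK_RATIO):.2f}")
```

A speedup below 1 means the modified design runs the same program more slowly than the original.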
Simple Example
• Old CPI = 0.43 x 1 + 0.21 x 1 + 0.12 x 2 + 0.24 x 2 = 1.36
• New CPI = 0.43 x 1 + 0.21 x 1 + 0.12 x 1 + 0.24 x 2 = 1.24
• Speedup = old time / new time
 – = (I x old CPI x C) / (I x new CPI x 1.15 C)
 – = 1.36 / (1.24 x 1.15) = 0.95
• Answer: Don’t make the change (the speedup is below 1, so the new design would be slower)

Some Caveats
• Inter-dependence of I, CPI, and C: improvement in one may impact another
 – Increasing pipeline depth tends to increase clock speed but may increase CPI
 – Change in ISA to reduce instruction count may require a design with slower clock => may not improve performance
 – CPI depends on instruction mix => smaller instruction count may not improve performance

Code Size & Performance
[Figure: performance factor (axis 1 to 12) for the KDF9, B5500, ICL 1907, ATLAS, CDC 6600, and NU 1108, compared by instructions executed, code size in instructions, code size in bits, and time.]

Benchmarks and Benchmarking
• In the absence of a universal task, pick some programs that represent common tasks
• Use representative programs to compare performance of systems
• CAUTIONS:
 – Comparisons are only as good as the benchmarks are in representing your real workload.
 – Many parameters affect measured performance

Example: We must use the same compiler
• Compiler “enhancements” and performance
[Figure: SPEC performance ratio (axis 0 to 800) per benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing "Compiler" with "Enhanced compiler". 1998 Morgan Kaufmann Publishers]

Benchmark Suites
• A Suite Is a Collection of Representative Benchmarks From Different Application Domains
• Weakness of Any One Benchmark Likely to Be Compensated By Another
• Standard Performance Evaluation Corporation (SPEC)
 – Most Popular Benchmark Suite
 – Suite Consists of Kernels, Small Fragments, Large Applications
 – SPEC2006: CINT2006, CFP2006
 – http://www.spec.org/
• Benchmark suites for servers
 – SPECSFS: measures performance of file servers
 – SPECWeb: measures performance of Web servers

SPEC CPU2006 Programs: CINT2006 (Integer)

Benchmark        Language    Description
400.Perlbench    C           Programming Language
401.bzip2        C           Compression
403.Gcc          C           C Compiler
429.mcf          C           Combinatorial Optimization
445.gobmk        C           Artificial Intelligence: Go
456.Hmmer        C           Search Gene Sequence
458.sjeng        C           Artificial Intelligence: chess
462.libquantum   C           Physics / Quantum Computing
464.h264ref      C           Video Compression
471.omnetpp      C++         Discrete Event Simulation
473.astar        C++         Path-finding Algorithms
483.xalancbmk    C++         XML Processing

Source: http://www.spec.org/osg/cpu2006/CINT2006/

SPEC CPU2006 Programs: CFP2006 (Floating Point)

Benchmark        Language    Description
410.Bwaves       Fortran     Fluid Dynamics
416.Gamess       Fortran     Quantum Chemistry
433.Milc         C           Physics / Quantum Chromodynamics
434.Zeusmp       Fortran     Physics / CFD
435.Gromacs      C, Fortran  Biochemistry / Molecular Dynamics
436.cactusADM    C, Fortran  Physics / General Relativity
437.leslie3d     Fortran     Fluid Dynamics
444.Namd         C++         Biology / Molecular Dynamics
447.dealII       C++         Finite Element Analysis
450.Soplex       C++         Linear Programming, Optimization
453.Povray       C++         Image Ray-tracing
454.Calculix     C, Fortran  Structural Mechanics
459.GemsFDTD     Fortran     Computational Electromagnetics
465.Tonto        Fortran     Quantum Chemistry
470.Lbm          C           Fluid Dynamics
481.Wrf          C, Fortran  Weather
482.sphinx3      C           Speech

Source: http://www.spec.org/osg/cpu2006/CFP2006/

Top 20 SPEC CPU2006 Results (As of August 2007)

    Top 20 SPECint2006                             Top 20 SPECfp2006
 #  MHz   Processor           int peak  int base   MHz   Processor            fp peak  fp base
 1  3000  Core 2 Duo E6850    22.6      20.2       4700  POWER6               22.4     17.8
 2  4700  POWER6              21.6      17.8       3000  Core 2 Duo E6850     19.3     18.7
 3  3000  Xeon 5160           21.0      17.9       1600  Dual-Core Itanium 2  18.1     17.3
 4  3000  Xeon X5365          20.8      18.9       1600  Dual-Core Itanium 2  17.8     17.0
 5  2666  Core 2 Duo E6750    20.5      18.3       2666  Core 2 Duo E6750     17.7     17.1
 6  2667  Core 2 Duo E6700    20.0      17.9       3000  Xeon 5160            17.7     17.1
 7  2667  Core 2 Quad Q6700   19.7      17.6       3000  Opteron 2222         17.4     16.0
 8  2666  Xeon X5355          19.1      17.3       2667  Core 2 Duo E6700     16.9     16.3
 9  2666  Xeon 5150           19.1      17.3       2800  Opteron 2220         16.7     13.3
10  2666  Xeon X5355          18.9      17.2       3000  Xeon 5160            16.6     16.1
11  2667  Xeon X5355          18.6      16.8       2667  Xeon X5355           16.6     16.1
12  2933  Core 2              18.5      17.8       2667  Core 2 Quad Q6700    16.6     16.1
13  2400  Core 2 Quad Q6600   18.5      16.5       2666  Xeon X5355           16.6     16.1
14  2600  Core 2 Duo X7800    18.3      16.4       2933  Core 2 Extreme X6800 16.2     16.0
15  2667  Xeon 5150           17.6      16.6       2400  Core 2 Quad Q6600    16.0     15.4
16  2400  Core 2 Duo T7700    17.6      16.6       1400  Dual-Core Itanium 2  15.9     15.2
17  2333  Xeon E5345          17.5      15.9       2667  Xeon 5150            15.9     15.5
18  2333  Xeon 5148           17.4      15.9       2333  Xeon E5345           15.4     14.9
19  2333  Xeon 5140           17.4      15.7       2600  Opteron 2218         15.4     12.5
20  2660  Xeon X5355          17.4      15.7       2400  Xeon X3220           15.3     15.1

Source: http://www.spec.org/cpu2006/results/cint2006.html

Performance Evaluation Using Benchmarks
• “For better or worse, benchmarks shape a field”
• Good products created when we have:
 – Good benchmarks
 – Good ways to summarize performance
• Given sales depend in big part on performance relative to competition, there is big investment in improving products as reported by performance summary
• If benchmarks inadequate, then choose between improving product for real programs vs.
improving product to get more sales; sales almost always wins!

How to Summarize Performance
[Figure: SPEC performance ratio (axis 0 to 800) per benchmark: gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv.]
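One way to collapse per-benchmark ratios into a single number is a mean. The sketch below, which is not from the slides, compares the arithmetic mean with the geometric mean (the summary SPEC uses for its per-benchmark ratios). The ratio values are invented for illustration and are not read from the figure; the function names are hypothetical.

```python
import math

# Hypothetical SPEC-style ratios (reference time / measured time).
# Invented for illustration only; not taken from the figure.
RATIOS = {"gcc": 100.0, "espresso": 120.0, "matrix300": 700.0, "tomcatv": 150.0}


def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    values = list(values)
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    print(f"arithmetic mean = {arithmetic_mean(RATIOS.values()):.1f}")
    print(f"geometric mean  = {geometric_mean(RATIOS.values()):.1f}")
```

With one benchmark far above the rest, as matrix300 is in these made-up numbers, the arithmetic mean is pulled toward the outlier more strongly than the geometric mean.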