CDA 3101 Spring 2016
Introduction to Computer Organization
Benchmarks
14 January 2016

Overview
• Benchmarks
• Popular benchmarks
  – Linpack
  – Intel's iCOMP
• SPEC benchmarks
• MIPS benchmark
• Fallacies and pitfalls

Benchmarks
• Benchmarks measure different aspects of component and system performance
• Ideal situation: use the real workload
• Types of benchmarks
  – Real programs
  – Kernels
  – Toy benchmarks
  – Synthetic benchmarks
• Risk: vendors adjust their designs to the benchmark's requirements
  – (Partial) solution: use real programs and update them constantly
• Application areas covered by benchmark programs
  – Engineering or scientific applications
  – Software development tools
  – Transaction processing
  – Office applications

A Benchmark Story
1. You create a benchmark called the vmark
2. You run it on lots of different computers
3. You publish the vmarks at www.vmark.org
4. vmark and www.vmark.org become popular
   – Users start buying their PCs based on vmark
   – Vendors bang on your door
5. Vendors examine the vmark code and tune their compilers and/or microarchitectures to run vmark well
6. Your vmark benchmark has been broken
7. You create vmark 2.0

Performance Reports
• Reproducibility
  – Include the hardware / software configuration (SPEC)
  – Conditions of the evaluation process
• Summarizing performance (a small sketch of these computations follows)
  – Total time:      Total = Σ exec time_i
  – Arithmetic mean: AM = (1/n) * Σ exec time_i
  – Harmonic mean:   HM = n / Σ (1/rate_i)
  – Weighted mean:   WM = Σ w_i * exec time_i
  – Geometric mean:  GM = (Π exec time ratio_i)^(1/n)
  – Useful GM property: GM(X_i) / GM(Y_i) = GM(X_i / Y_i)
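To make these summary statistics concrete, here is a minimal sketch in C. The execution times and weights are made-up illustrative values, not measurements from any machine or benchmark; the sketch also checks the GM property above, i.e., that the ratio of geometric means equals the geometric mean of the per-program ratios.

    /* means.c - minimal sketch of the summary statistics above.
     * All execution times and weights are made-up illustrative values. */
    #include <stdio.h>
    #include <math.h>

    #define N 4

    int main(void) {
        double time_A[N] = {2.0, 4.0, 8.0, 16.0};  /* exec time_i on machine A (s) */
        double time_B[N] = {1.0, 8.0, 4.0, 32.0};  /* exec time_i on machine B (s) */
        double w[N]      = {0.4, 0.3, 0.2, 0.1};   /* workload weights, sum to 1   */

        double total = 0.0, wm = 0.0, inv_rate_sum = 0.0;
        double gm_A = 1.0, gm_B = 1.0, gm_ratio = 1.0;

        for (int i = 0; i < N; i++) {
            double rate = 1.0 / time_A[i];     /* rate_i, e.g. runs per second */
            total        += time_A[i];         /* Total = Σ exec time_i        */
            wm           += w[i] * time_A[i];  /* WM = Σ w_i * exec time_i     */
            inv_rate_sum += 1.0 / rate;        /* Σ (1 / rate_i)               */
            gm_A     *= time_A[i];
            gm_B     *= time_B[i];
            gm_ratio *= time_A[i] / time_B[i]; /* per-program ratio X_i / Y_i  */
        }

        double am = total / N;                 /* AM = (1/n) * Σ exec time_i   */
        double hm = N / inv_rate_sum;          /* HM = n / Σ (1/rate_i)        */
        gm_A     = pow(gm_A,     1.0 / N);
        gm_B     = pow(gm_B,     1.0 / N);
        gm_ratio = pow(gm_ratio, 1.0 / N);

        printf("Total = %.1f s   AM = %.2f s   HM = %.3f runs/s   WM = %.2f s\n",
               total, am, hm, wm);
        printf("GM(A)/GM(B) = %.3f   GM(A_i/B_i) = %.3f   (always equal)\n",
               gm_A / gm_B, gm_ratio);
        return 0;
    }

This ratio-preserving property is one reason often cited for SPEC's use of the geometric mean of times normalized to a reference machine: the comparison between two machines does not depend on which machine is chosen as the reference.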
Ex.1: Linpack Benchmark
• "Mother of all benchmarks"
• Measures the time to solve a dense system of linear equations; the inner kernel is the DAXPY loop:

      DO I = 1, N
         DY(I) = DY(I) + DA * DX(I)
      END DO

• Metrics
  – Rpeak: system peak Gflops
  – Nmax: matrix size that gives the highest Gflops
  – N1/2: matrix size that achieves half the rated Rmax Gflops
  – Rmax: the Gflops achieved for the Nmax-size matrix
• Used to rank systems at http://www.top500.org

Ex.2: Intel's iCOMP Index 3.0
• The new version (3.0) reflects:
  – the mix of instructions for existing and emerging software
  – the increasing use of 3D, multimedia, and Internet software
• Benchmarks and weights
  – 2 integer productivity applications (20% each)
  – 3D geometry and lighting calculations (20%)
  – FP engineering and finance programs and games (5%)
  – Multimedia and Internet application (25%)
  – Java application (10%)
• Score: weighted GM of relative performance
  – Baseline processor: Pentium II at 350 MHz

Ex.3: SPEC CPU Benchmarks
• Standard Performance Evaluation Corporation (www.spec.org)
• Benchmarks need to be updated/upgraded over time
  – Longer run times
  – Larger problems
  – Application diversity
• Rules for running and reporting
  – Baseline and optimized results
  – Geometric mean of normalized execution times
  – Reference machine: Sun Ultra5_10 (300-MHz SPARC, 256 MB)
• CPU2006: latest SPEC CPU benchmark (4th version)
  – 12 integer and 17 floating-point programs
• Metrics: response time and throughput

Ex.3: SPEC CPU Benchmarks 1989-2006
[Timeline of the previous SPEC CPU suites, now retired]

Ex.3: SPEC CPU Benchmarks
• Observe: We will use the SPEC 2000 and SPEC 2006 CPU benchmarks in this set of notes.
• Task: However, you are asked to read about the SPEC 2006 CPU benchmark suite, described at www.spec.org/cpu2006
• Result: Compare SPEC 2006 with the SPEC 2000 data at www.spec.org/cpu2000 to answer the extra-credit questions in Homework #2.

SPEC CINT2000 Benchmarks
 1. 164.gzip      C    Compression
 2. 175.vpr       C    FPGA Circuit Placement and Routing
 3. 176.gcc       C    C Programming Language Compiler
 4. 181.mcf       C    Combinatorial Optimization
 5. 186.crafty    C    Game Playing: Chess
 6. 197.parser    C    Word Processing
 7. 252.eon       C++  Computer Visualization
 8. 253.perlbmk   C    PERL Programming Language
 9. 254.gap       C    Group Theory, Interpreter
10. 255.vortex    C    Object-oriented Database
11. 256.bzip2     C    Compression
12. 300.twolf     C    Place and Route Simulator

SPEC CFP2000 Benchmarks
 1. 168.wupwise   F77  Physics / Quantum Chromodynamics
 2. 171.swim      F77  Shallow Water Modeling
 3. 172.mgrid     F77  Multi-grid Solver: 3D Potential Field
 4. 173.applu     F77  Parabolic / Elliptic Partial Differential Equations
 5. 177.mesa      C    3-D Graphics Library
 6. 178.galgel    F90  Computational Fluid Dynamics
 7. 179.art       C    Image Recognition / Neural Networks
 8. 183.equake    C    Seismic Wave Propagation Simulation
 9. 187.facerec   F90  Image Processing: Face Recognition
10. 188.ammp      C    Computational Chemistry
11. 189.lucas     F90  Number Theory / Primality Testing
12. 191.fma3d     F90  Finite-element Crash Simulation
13. 200.sixtrack  F77  High Energy Nuclear Physics Accelerator Design
14. 301.apsi      F77  Meteorology: Pollutant Distribution

SPECINT2000 Metrics
• SPECint2000: the geometric mean of 12 normalized ratios (one for each integer benchmark) when each benchmark is compiled with "aggressive" optimization
• SPECint_base2000: the geometric mean of 12 normalized ratios when compiled with "conservative" optimization
• SPECint_rate2000: the geometric mean of 12 normalized throughput ratios when compiled with "aggressive" optimization
• SPECint_rate_base2000: the geometric mean of 12 normalized throughput ratios when compiled with "conservative" optimization

SPECint_base2000 Results
[Chart comparing MIPS/IRIX R12000 @ 400 MHz, Intel/NT 4.0 Pentium III @ 733 MHz, and Alpha/Tru64 21264 @ 667 MHz]

SPECfp_base2000 Results
[Chart comparing MIPS/IRIX R12000 @ 400 MHz, Alpha/Tru64 21264 @ 667 MHz, and Intel/NT 4.0 Pentium III @ 733 MHz]

Effect of CPI: SPECint95 Ratings
[Chart showing ratings improving with microarchitecture improvements; CPU time = IC * CPI * clock cycle]

Effect of CPI: SPECfp95 Ratings
[Chart showing ratings improving with microarchitecture improvements]

SPEC Recommended Readings
• SPEC 2006 – Survey of Benchmark Programs
  http://www.spec.org/cpu2006/publications/CPU2006benchmarks.pdf
• SPEC 2006 Benchmarks – Journal Articles on Implementation Techniques and Problems
  http://www.spec.org/cpu2006/publications/SIGARCH-2007-03/
• SPEC 2006 Installation, Build, and Runtime Issues
  http://www.spec.org/cpu2006/issues/

Another Benchmark: MIPS
• Millions of Instructions Per Second
• MIPS = IC / (CPU time * 10^6)
• Comparing MIPS across machines is comparing apples to oranges
• Flaw: 1 MIPS on one processor does not accomplish the same work as 1 MIPS on another
  – It is like determining the winner of a foot race by counting who used fewer steps
  – Some processors do FP in software (e.g., 1 FP operation = 100 integer instructions)
  – Different instructions take different amounts of time
• Useful for comparisons between two processors from the same vendor that support the same ISA and use the same compiler (e.g., Intel's iCOMP benchmark)

Fallacies and Pitfalls
• Ignoring Amdahl's law (Gene Amdahl, 1922-2015)
• Using clock rate or MIPS as a performance metric (see the first sketch below)
• Using the arithmetic mean of normalized CPU times (ratios) instead of the geometric mean (see the second sketch below)
• Using hardware-independent metrics
  – e.g., using code size as a measure of speed
• Believing that synthetic benchmarks predict performance
  – They do not reflect the behavior of real programs
• Believing that the geometric mean of CPU time ratios is proportional to total execution time [NOT!!]
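First, a small sketch of the MIPS pitfall. The instruction counts and times below are made-up numbers for two hypothetical machines running the same program, not measurements: the machine with the higher MIPS rating actually finishes later.

    /* mips_fallacy.c - made-up numbers for two hypothetical machines
     * running the same program; higher MIPS does not mean faster. */
    #include <stdio.h>

    int main(void) {
        /* Machine A: compact ISA, each instruction does more work. */
        double ic_A = 10e6;      /* instructions executed */
        double t_A  = 0.020;     /* CPU time in seconds   */

        /* Machine B: simpler ISA, e.g. FP done in software, so the same
         * program needs many more (cheaper) instructions. */
        double ic_B = 50e6;
        double t_B  = 0.040;

        double mips_A = ic_A / (t_A * 1e6);   /* MIPS = IC / (CPU time * 10^6) */
        double mips_B = ic_B / (t_B * 1e6);

        printf("Machine A: %.0f MIPS, %.3f s\n", mips_A, t_A);
        printf("Machine B: %.0f MIPS, %.3f s\n", mips_B, t_B);
        /* B reports 1250 MIPS vs A's 500 MIPS, yet A finishes the same
         * program twice as fast: counting instructions is counting steps,
         * not deciding who crosses the finish line first. */
        return 0;
    }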
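Second, a small sketch of the normalized-ratio pitfalls, again with made-up times: two programs on two machines, summarized by the arithmetic and geometric means of times normalized to a reference machine. The arithmetic mean reverses its verdict when the reference changes; the geometric mean is consistent, but it is not proportional to total execution time.

    /* norm_mean_fallacy.c - made-up execution times for two programs on
     * two machines, comparing AM and GM of normalized ratios. */
    #include <stdio.h>
    #include <math.h>

    #define NPROG 2

    static void summarize(const char *ref_name, const double a[],
                          const double b[], const double ref[]) {
        double am_a = 0.0, am_b = 0.0, gm_a = 1.0, gm_b = 1.0;
        for (int i = 0; i < NPROG; i++) {
            am_a += (a[i] / ref[i]) / NPROG;   /* AM of times normalized to ref */
            am_b += (b[i] / ref[i]) / NPROG;
            gm_a *= a[i] / ref[i];             /* GM of the same ratios */
            gm_b *= b[i] / ref[i];
        }
        gm_a = pow(gm_a, 1.0 / NPROG);
        gm_b = pow(gm_b, 1.0 / NPROG);
        printf("Normalized to %s:  AM(A)=%.2f AM(B)=%.2f   GM(A)=%.2f GM(B)=%.2f\n",
               ref_name, am_a, am_b, gm_a, gm_b);
    }

    int main(void) {
        double a[NPROG] = {1.0, 1000.0};   /* machine A: program 1, program 2 (s) */
        double b[NPROG] = {10.0, 100.0};   /* machine B: program 1, program 2 (s) */

        printf("Total time: A = %.0f s, B = %.0f s\n", a[0] + a[1], b[0] + b[1]);
        summarize("A", a, b, a);   /* AM flips its verdict with the reference... */
        summarize("B", a, b, b);   /* ...the GM does not, but calls them equal   */
        return 0;
    }

With machine A as the reference the arithmetic mean favors A, and with machine B as the reference it favors B; the geometric mean calls the two machines equal either way, even though B finishes the whole two-program workload in 110 s versus 1001 s for A.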
Conclusions
• Performance is specific to a particular program or set of programs
• CPU time is the only adequate measure of performance
• For a given ISA, performance increases come from:
  – increases in clock rate (without adverse CPI effects)
  – improvements in processor organization that lower CPI
  – compiler enhancements that lower CPI and/or IC
• Your workload is the ideal benchmark
• You should not always believe everything you read!

Happy & Safe Weekend