CS2710 Computer Organization

Power calculation for transistor operation

$$P = \text{CapacitiveLoad} \times \text{Voltage}^2 \times \text{ClockFrequency} \quad \text{or} \quad P = C V^2 f_{\text{clock}}$$

• What will cause power consumption to increase?

Measuring the current used by the ATmega microprocessor shows a linear relationship

[Figure: ATmega32 current versus crystal frequency. x-axis: Crystal Frequency (Hz), 0 to 20,000,000; y-axis: Microprocessor Current (mA), 0 to 50. Linear fit: y = 1E-06x + 21.406, R² = 0.9921]

$$P = C V^2 f_{\text{clock}}, \qquad \text{also: } P = IV, \qquad \text{thus: } I = C V f_{\text{clock}}$$

Note: V = 5 V in this case.

What effect does increasing the voltage supplied to a microprocessor have on power? On speed?

[Figure: Power versus microprocessor voltage. x-axis: Microprocessor Voltage (V), 0 to 6; y-axis: Microprocessor Power (mW), 0 to 250. Power-law fit: y = 3.428x^2.5874, R² = 0.9984]

Below around 2.5 V (for this microprocessor), the transistors simply stop working.

The Power Wall: Why haven't clock rates continued to increase at historical rates?

Manufacturers have turned to multi-core architectures to bypass the Power Wall

Clock speeds decrease, but overall performance increases.

Lecture Objectives:
1) Explain the SPEC benchmarks.
2) Define Amdahl's law.
3) Define MIPS.

Amdahl's Law (p. 51)
• The performance enhancement possible with a given improvement is limited by the amount that the improved feature is used

$$\text{ExecutionTime}_{\text{Improved}} = \frac{\text{ExecutionTime}_{\text{AffectedByImprovement}}}{\text{AmountOfImprovement}} + \text{ExecutionTime}_{\text{Unaffected}}$$

Amdahl's Law Applied
• A program spends 40 seconds performing network transfers and 60 seconds generating reports.
  – Suppose we could rewrite the report generator to make it more efficient.
  – What improvement in performance in the report generator would be necessary to increase the overall speed of the program by a factor of 2?
  – How about by a factor of 3?
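The Amdahl's law formula can be solved for AmountOfImprovement to answer the network/report questions. This is a minimal Python sketch; `improved_time` and `required_improvement` are hypothetical helper names introduced here for illustration, not part of the course material. The second function returns None when no finite improvement of the affected part can reach the requested overall speedup.

```python
def improved_time(affected, unaffected, improvement):
    """Amdahl's law: execution time after speeding up only the affected part."""
    return affected / improvement + unaffected


def required_improvement(affected, unaffected, overall_speedup):
    """Factor by which the affected part must improve so the whole program
    runs `overall_speedup` times faster, or None if no finite factor works."""
    target = (affected + unaffected) / overall_speedup
    remaining = target - unaffected  # time budget left for the affected part
    if remaining <= 0:
        return None  # no finite improvement reaches the target
    return affected / remaining


# Slide example: 60 s of report generation (improved), 40 s of network transfers (unaffected)
print(required_improvement(60, 40, 2))  # 6.0
print(required_improvement(60, 40, 3))  # None
print(improved_time(60, 40, 6))         # 50.0
```

For a factor of 2 the total must drop from 100 s to 50 s, leaving 10 s for the 60 s report phase, so the report generator would need to become 6 times faster. A factor of 3 would require a total of about 33.3 s, which is already less than the 40 s of network transfers the rewrite does not touch, so no report-generator speedup can achieve it.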
A Performance Metric: MIPS

$$\text{MIPS} = \frac{\text{InstructionCount}}{\text{ExecutionTime} \times 10^6}$$

Units: millions of instructions per second

Issues with MIPS metrics
1. Measures instruction execution rate, but doesn't consider the complexity of the instructions performed
2. Average instruction complexity varies between programs executing on a single computer
3. Different microprocessors implement instructions of differing complexities
• MIPS may vary independently from performance
• We cannot compare computers with different instruction sets using MIPS!

Benchmarking: How do you decide which computer to buy?

SPEC Benchmark
• A set of programs used to measure performance
  – Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  – Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  – Elapsed time to execute a selection of programs
    • Negligible I/O, so focuses on CPU performance
  – Normalize relative to reference machine
  – Summarize as geometric mean of performance ratios
    • CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

Geometric vs. Arithmetic Mean
• Arithmetic mean: $\frac{1}{n}\sum_{i=1}^{n} x_i$
• Geometric mean: $\sqrt[n]{\prod_{i=1}^{n} x_i}$

Which computer has better overall performance? (Higher values mean better performance.)

                     Computer A   Computer B   Computer C
Program 1                     1           10           20
Program 2                  1000          100           20
Arithmetic mean           500.5           55           20
Geometric mean          31.622…      31.622…           20

A is fastest via the arithmetic mean. A and B are tied via the geometric mean. The geometric mean is the appropriate mean when the ranges of the values being compared vary significantly.
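Both means in the table can be checked with a short Python sketch. The helper names are illustrative; `math.prod` needs Python 3.8 or later, and the standard library's `statistics.geometric_mean` would work equally well for the second mean. The input values are the table's per-program performance numbers for computers A, B and C.

```python
import math


def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product of the n values."""
    return math.prod(xs) ** (1 / len(xs))


# Per-program performance values from the table (higher is better)
perf = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

for name, values in perf.items():
    print(name, arithmetic_mean(values), round(geometric_mean(values), 3))
```

A and B both have a product of 1000 (1 × 1000 and 10 × 100), so both geometric means are √1000 ≈ 31.62, which is why they tie even though A's arithmetic mean is nearly ten times larger than B's.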
Benchmarking often computes performance relative to a standard reference

                     Computer A   Computer B   Computer C
Program 1                     1           10           20
Program 2                  1000          100           20

Let's say A is the "reference" computer. We adjust all performance values by dividing each value by the reference computer's value. In this example, we divide all results for Program 2 by the reference computer's performance value of 1000, giving:

                    Computer A (reference)   Computer B   Computer C
Program 1                                1           10           20
Program 2                                1          0.1         0.02

Scaling the results in this manner is called normalization. Note that no normalization was needed for Program 1, since the reference computer's value was already 1.

Arithmetic and geometric means based on the normalized values:

                     Computer A   Computer B   Computer C
Program 1                     1           10           20
Program 2                     1          0.1         0.02
Arithmetic mean               1         5.05        10.01
Geometric mean                1            1       0.632…

Now C is fastest via the arithmetic mean! A and B are still tied via the geometric mean.

Now consider computer B to be the "reference" computer and normalize A and C with respect to B:

                     Computer A   Computer B (reference)   Computer C
Program 1                   0.1                        1            2
Program 2                    10                        1          0.2
Arithmetic mean            5.05                        1          1.1
Geometric mean                1                        1       0.632…

Now A is fastest via the arithmetic mean! A and B are still tied via the geometric mean. The geometric mean ranks the computers consistently regardless of which computer is used as the reference for normalization!

The SPECjvm2008 application
– SPECjvm2008 is a benchmark suite for measuring the performance of a Java Runtime Environment (JRE), containing several real-life applications and benchmarks focusing on core Java functionality.
– The SPECjvm2008 workload mimics a variety of common general-purpose application computations.
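The normalization experiment can be replayed in a few lines of Python. This is a minimal sketch with hypothetical helper names (`normalize`, `summarize`): it divides each computer's per-program values by the chosen reference computer's values and then computes both means for every computer.

```python
import math


def geometric_mean(xs):
    """n-th root of the product of the n values."""
    return math.prod(xs) ** (1 / len(xs))


def normalize(perf, ref):
    """Divide every computer's per-program value by the reference computer's value."""
    return {name: [x / r for x, r in zip(xs, perf[ref])]
            for name, xs in perf.items()}


def summarize(perf, ref):
    """Arithmetic and geometric means of the values normalized to `ref`."""
    norm = normalize(perf, ref)
    arith = {n: sum(v) / len(v) for n, v in norm.items()}
    geo = {n: geometric_mean(v) for n, v in norm.items()}
    return arith, geo


perf = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

for ref in ("A", "B"):
    arith, geo = summarize(perf, ref)
    print(f"reference {ref}:")
    print("  arithmetic:", {n: round(m, 3) for n, m in arith.items()})
    print("  geometric: ", {n: round(m, 3) for n, m in geo.items()})
```

Switching the reference from A to B moves the arithmetic-mean winner from C to A, while the geometric means stay at 1, 1 and about 0.632. The reason is algebraic: the geometric mean of the ratios x_i / r_i equals the geometric mean of the x_i divided by the geometric mean of the r_i, so changing the reference scales every computer by the same constant and cannot change the ranking.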
CINT2006 integer performance benchmarks for the Opteron X4 2356

Name        Description                     IC×10^9    CPI  Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl        Interpreted string processing     2,118   0.75     0.40            637         9,777       15.3
bzip2       Block-sorting compression         2,389   0.85     0.40            817         9,650       11.8
gcc         GNU C Compiler                    1,050   1.72     0.40            724         8,050       11.1
mcf         Combinatorial optimization          336  10.00     0.40          1,345         9,120        6.8
go          Go game (AI)                      1,658   1.09     0.40            721        10,490       14.6
hmmer       Search gene sequence              2,783   0.80     0.40            890         9,330       10.5
sjeng       Chess game (AI)                   2,176   0.96     0.40            837        12,100       14.5
libquantum  Quantum computer simulation       1,623   1.61     0.40          1,047        20,720       19.8
h264avc     Video compression                 3,102   0.80     0.40            993        22,130       22.3
omnetpp     Discrete event simulation           587   2.94     0.40            690         6,250        9.1
astar       Games/path finding                1,082   1.79     0.40            773         7,020        9.1
xalancbmk   XML parsing                       1,058   2.70     0.40          1,143         6,900        6.0

Geometric mean of the SPECratios: 11.7

SPEC and power: ssj_ops (server-side Java operations/sec)
• Power consumption of server at different workload levels
  – Performance: ssj_ops/sec
  – Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$

A Power benchmark: SPEC Power versus load

SPECpower_ssj2008 for X4

Target Load %  Performance (ssj_ops/sec)    Average Power (Watts)
100%                             231,867                      295
90%                              211,282                      286
80%                              185,803                      275
70%                              163,427                      265
60%                              140,160                      256
50%                              118,324                      246
40%                               92,035                      233
30%                               70,500                      222
20%                               47,126                      206
10%                               23,066                      180
0%                                     0                      141
Overall sum                    1,283,590                    2,605

Overall ∑ssj_ops / ∑power: 493

Low power at low usage? No!
• Look back at X4 power benchmark
  – At 100% load: 295 W
  – At 50% load: 246 W (83% of the 100%-load power)
  – At 10% load: 180 W (61% of the 100%-load power)
• Google data center
  – Mostly operates at 10%–50% load
  – At 100% load less than 1% of the time
• Future research/development: Design processors to make power proportional to load
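The overall SPECpower metric is a ratio of two sums, not an average of the per-level ratios. This minimal Python sketch reproduces the 493 figure and the percentages quoted on the "Low power at low usage?" slide from the SPECpower_ssj2008 table for the X4. The 40% row is entered as 92,035 ssj_ops, the value consistent with the 1,283,590 total; the function names are hypothetical.

```python
# (target load %, ssj_ops/sec, average power in watts) from the X4 SPECpower_ssj2008 table
measurements = [
    (100, 231_867, 295), (90, 211_282, 286), (80, 185_803, 275),
    (70, 163_427, 265), (60, 140_160, 256), (50, 118_324, 246),
    (40, 92_035, 233), (30, 70_500, 222), (20, 47_126, 206),
    (10, 23_066, 180), (0, 0, 141),
]


def overall_ssj_ops_per_watt(rows):
    """Sum of ssj_ops over all load levels divided by the sum of power."""
    total_ops = sum(ops for _, ops, _ in rows)
    total_power = sum(watts for _, _, watts in rows)
    return total_ops / total_power


def power_fraction(rows, load):
    """Power at `load` percent as a fraction of the power at 100% load."""
    power = {pct: watts for pct, _, watts in rows}
    return power[load] / power[100]


print(round(overall_ssj_ops_per_watt(measurements)))  # 493
print(round(power_fraction(measurements, 50), 2))     # 0.83
print(round(power_fraction(measurements, 10), 2))     # 0.61
```

The two fractions make the slide's point concrete: dropping the load from 100% to 10% cuts the work by about 90% but the power by only about 39%.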