Reading: 2.4, 3.1-3.5.

Measuring and Discussing Computer System Performance
or “My computer is faster than your computer”

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Match (Best) Performance Metric to Domain
(Prep with explanation of metrics and domains.)

Performance Metrics
1. Network Bandwidth (data/sec)
2. Network Latency (ms)
3. Frame Rate (frames/sec)
4. Throughput (ops/sec)

Domains
Selection  Online WoW  Crysis (FPS)  Torrent Download  Google Server Farm
A          4           3             1                 2
B          4           1             3                 2
C          2           1             3                 4
D          2           3             1                 4
E          None of the above

Jack’s car buying analogy, we care about many….

Measures of “Performance”
• Execution Time
• Frame Rate
• Throughput (operations/time)
• Responsiveness
• Performance / Cost
• Performance / Power
• Performance / Power^2

Recall our O(n) discussion
• Much of computer science focuses on execution time, and much of our class will as well.
• Ultimately, we want much of what we do to be fast (response time). People sit in front of computers / iPhones / etc.
• So is time a reasonable metric?
  - User time / clock time / CPU time
  - CPU time for this class, for now.

All Together Now
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
seconds = instructions × cycles/instruction × seconds/cycle

CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
• IC = 1 billion, 500 MHz processor, execution time of 3 seconds. What is the CPI for this program?

3 sec = 1*10^9 insts * CPI * 1 sec / (5*10^8 cycles)
1.5*10^9 cycles = 10^9 insts * CPI
CPI = 1.5

Selection  CPI
A          3
B          15
C          1.5
D          15*10^9
E          None of the above

Individual only

Who Affects Performance?
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• There are a number of people involved in processor / programming design.
• Each element of the performance equation can be impacted by different designer(s).
• The next slides are about who can impact what. We’ll do speed voting (1 min ind, 1 min group), then discuss each slide.

Who Affects Performance? (1 min ind / 1 min group)
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• What can a programmer influence?

Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          IC and CT
E          None of the above

Who Affects Performance? (1 min ind / 1 min group)
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• What can a compiler influence?

Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above

Who Affects Performance? (1 min ind / 1 min group)
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• What can an instruction set architect influence?

Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above

Who Affects Performance? (1 min ind / 1 min group)
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• What can a hardware designer influence?
Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above

Performance Variation
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time

Row  Scenario
1    Same machine, different programs
2    Same programs, different machines, same ISA
3    Same programs, different machines, different ISA

Number of instructions, CPI: DIFF Same Diff Same DIFF Same DIFF Same Diff
Clock Cycle Time: Same Diff Diff

Selection  Row(s) Correct
A          1
B          1 and 3
C          2
D          2 and 3
E          None of the above

Other Performance Metrics
• Time is useful, but how else might we try to measure the “performance” of a machine?
  - MIPS
  - MFLOPS

MIPS
MIPS = Millions of Instructions Per Second
     = Instruction Count / (Execution Time × 10^6)
     = Clock Rate / (CPI × 10^6)
• program-independent
• deceptive
Just crank up the clock rate and have it execute tons of no-ops.
But we need to sell processors, so what do we market?

Trying to market the “performance” of a processor.
• “Speed Demons” vs. “Brainiacs”
Intel vs. Alpha: Intel wins… but ends up remarketing.
If we can’t use something like raw processor speed (CT) and want to account for CPI, we need to look at performance on benchmarks.

Benchmarks - Which Programs?
• Peak throughput measures (simple programs)?
• Synthetic benchmarks (whetstone, dhrystone, ...)?
• Real applications
• SPEC (best of both worlds, but with problems of their own)
  - System Performance Evaluation Cooperative
  - Provides a common set of real applications along with strict guidelines for how to run them.
  - Provides a relatively unbiased means to compare machines.

Danger in Benchmark-Specific Performance Measures
• Measures compiler as much as architecture!
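
The performance equation and the MIPS definition from the earlier slides can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, and the no-op scenario (padding the program with 2 billion one-cycle no-ops) uses invented numbers purely to illustrate why MIPS can be deceptive.

```python
def execution_time(ic, cpi, clock_rate_hz):
    """CPU execution time in seconds: IC * CPI * CT, with CT = 1 / clock rate."""
    return ic * cpi / clock_rate_hz


def cpi_from_time(exec_time_s, ic, clock_rate_hz):
    """Solve the performance equation for CPI."""
    return exec_time_s * clock_rate_hz / ic


def mips(ic, exec_time_s):
    """Millions of instructions per second: IC / (execution time * 10^6)."""
    return ic / (exec_time_s * 1e6)


# Slide example: IC = 1 billion, 500 MHz processor, 3 seconds of execution time.
ic = 1e9
clock = 500e6
cpi = cpi_from_time(3.0, ic, clock)
base_mips = mips(ic, 3.0)

# Assumed scenario (not from the slides): pad the same program with
# 2 billion no-ops that each take one cycle.
noops = 2e9
padded_ic = ic + noops
padded_time = 3.0 + execution_time(noops, 1.0, clock)
padded_mips = mips(padded_ic, padded_time)

print(f"CPI = {cpi}")
print(f"original: {base_mips:.1f} MIPS in 3.0 s")
print(f"padded:   {padded_mips:.1f} MIPS in {padded_time} s")
```

In this sketch the padded program reports a higher MIPS rating even though it takes longer to finish, which is the sense in which the metric is deceptive.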
SPEC Performance on Pentium III and Pentium 4
Focus on clock rate relative to change in INT vs. FP performance. FP on the P3 used a register stack (x87); with SSE2, the P4 had independent registers.

Speedup
• Often want to compare performance of one machine against another.

Performance = 1 / Execution Time
Speedup (A over B) = Performance_A / Performance_B
Speedup (A over B) = ET_B / ET_A

Amdahl’s Law
Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected

Amdahl’s Law and Parallelism (ISOMORPHIC)
• Our program is 90% parallelizable (segment of code executable in parallel on multiple cores) and runs in 100 seconds with a single core. What is the execution time if you use 4 cores (assume no overhead for parallelization)?

Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected

Selection  Execution Time
A          25 seconds
B          32.5 seconds
C          50 seconds
D          92.5 seconds
E          None of the above

Amdahl’s Law and Parallelism (ISOMORPHIC)
• Our program is 90% parallelizable (segment of code executable in parallel on multiple cores) and runs in 100 seconds with a single core. What is the execution time if you use 2 cores (assume no overhead for parallelization)?

Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected

Selection  Execution Time
A          55 seconds
B          50 seconds
C          100 seconds
D          95 seconds
E          None of the above

Amdahl’s Law
• So what does Amdahl’s Law mean at a high level?

Selection  “BEST” message from Amdahl’s Law
A          Parallel programming is critical for improving performance.
B          Improving serial code execution is ultimately the most important goal.
C          Performance is strictly tied to the ability to determine which percentage of code is parallelizable.
D          The impact of a performance improvement is limited by the percent of execution time affected by the improvement.
E          None of the above

Point out Phenom II x4 and x2 get same performance, but one has 4 cores and the other 2. What does this tell us? (Note: with some slower speeds of the Phenom this isn’t the case; 4 cores help.)

Speedup vs. Sizeup
• Speedup runs into problems for parallelization because of diminishing returns and dominance of serial execution.
• What if time were a constant?
  - Human perception (graphics)
  - Earthquake prediction
  - Weather prediction

Key Points
• Be careful how you specify “performance”
• Execution time = IC * CPI * CT
• Use real applications, if possible
• Use standards, if possible
• Make the common case fast
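
The Amdahl’s Law questions from the earlier slides can be worked through with a short calculation. The following is a minimal Python sketch under the slides’ stated assumptions (90% of a 100-second program is parallelizable, no parallelization overhead); the function names `amdahl_time` and `speedup` are hypothetical and chosen only for illustration.

```python
def amdahl_time(total_time_s, affected_fraction, improvement):
    """Execution time after improvement:
    (time affected / amount of improvement) + time unaffected."""
    affected = total_time_s * affected_fraction
    unaffected = total_time_s - affected
    return affected / improvement + unaffected


def speedup(old_time_s, new_time_s):
    """Speedup of the new configuration over the old one: ET_old / ET_new."""
    return old_time_s / new_time_s


# 90% parallelizable, 100 seconds on a single core, no overhead.
for cores in (2, 4, 8, 1024):
    t = amdahl_time(100.0, 0.9, cores)
    print(f"{cores:5d} cores: {t:7.2f} s, speedup {speedup(100.0, t):.2f}x")
```

With 4 cores this gives 90/4 + 10 = 32.5 seconds, and with 2 cores 90/2 + 10 = 55 seconds. However many cores are added, the 10 seconds of serial work remain, so the speedup stays below 10x, which is why the impact of an improvement is limited by the share of execution time it affects.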