COMPUTER ORGANIZATION AND DESIGN, 5th Edition
The Hardware/Software Interface
Chapter 1: Computer Abstractions and Technology

§1.6 Performance

Defining Performance
■ Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers × mph]

Relative Performance
■ Define Performance = 1/Execution Time
■ "X is n times faster than Y":

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

■ Example: time taken to run a program
    ■ 10s on A, 15s on B
    ■ Execution Time_B / Execution Time_A = 15s / 10s = 1.5
    ■ So A is 1.5 times faster than B

CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

■ Performance improved by
    ■ Reducing number of clock cycles
    ■ Increasing clock rate
    ■ Hardware designer must often trade off clock rate against cycle count

CPU Time Example
■ Computer A: 2GHz clock, 10s CPU time
■ Designing Computer B
    ■ Aim for 6s CPU time
    ■ Can do faster clock, but causes 1.2 × clock cycles
■ How fast must Computer B clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$

Instruction Count and CPI

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

■ Instruction Count for a program
    ■ Determined by program, ISA and compiler
■ Average cycles per instruction
    ■ Determined by CPU hardware
    ■ If different instructions have different CPI
        ■ Average CPI affected by instruction mix

CPI Example
■ Computer A: Cycle Time = 250ps, CPI = 2.0
■ Computer B: Cycle Time = 500ps, CPI = 1.2
■ Same ISA
■ Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

■ A is faster, by a factor of 1.2

CPI in More Detail
■ Different instruction classes (indexed by i) take different numbers of clock cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

■ Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

    ■ Instruction Count_i / Instruction Count is the relative frequency of class i

CPI Example
■ Alternative compiled code sequences using instructions in classes A, B, C

    Class               A    B    C
    CPI for class       1    2    3
    IC in sequence 1    2    1    2
    IC in sequence 2    4    1    1

■ Sequence 1: IC = 5
    ■ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
    ■ Avg. CPI = 10/5 = 2.0
■ Sequence 2: IC = 6
    ■ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
    ■ Avg. CPI = 9/6 = 1.5

Performance Summary
The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

■ Performance depends on
    ■ Algorithm: affects IC, possibly CPI
    ■ Programming language: affects IC, CPI
    ■ Compiler: affects IC, CPI
    ■ Instruction set architecture: affects IC, CPI, T_c

§1.10 Fallacies and Pitfalls

Pitfall: Amdahl's Law
■ Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

■ Example: multiply accounts for 80s/100s
    ■ How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

    ■ Cannot be done! 80/n would have to be 0, which no finite n achieves
■ Corollary: make the common case fast

Fallacy: Low Power at Idle
■ Look back at i7 power benchmark
    ■ At 100% load: 258W
    ■ At 50% load: 170W (66%)
    ■ At 10% load: 121W (47%)
■ Google data center
    ■ Mostly operates at 10% – 50% load
    ■ At 100% load less than 1% of the time
■ Consider designing processors to make power proportional to load

Pitfall: MIPS as a Performance Metric
■ MIPS: Millions of Instructions Per Second
    ■ Doesn't account for
        ■ Differences in ISAs between computers
        ■ Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

    ■ CPI varies between programs on a given CPU
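
The formulas on these slides (CPU time, weighted average CPI, Amdahl's Law, MIPS) can be checked numerically with a few lines of Python. This is a minimal sketch: the function names are invented here for illustration and do not come from the slides, and the inputs are the numbers used in the slide examples.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU Time = Instruction Count x CPI / Clock Rate (seconds)."""
    return instruction_count * cpi / clock_rate_hz


def average_cpi(cpi_by_class, count_by_class):
    """Weighted average CPI = total clock cycles / total instruction count."""
    total_cycles = sum(cpi_by_class[c] * n for c, n in count_by_class.items())
    total_instructions = sum(count_by_class.values())
    return total_cycles / total_instructions


def amdahl_improved_time(t_affected, t_unaffected, factor):
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def mips(clock_rate_hz, cpi):
    """MIPS = Clock Rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# CPU Time Example: A runs 10 s at 2 GHz; B needs 1.2x the cycles in 6 s.
cycles_a = 10 * 2e9
clock_rate_b = 1.2 * cycles_a / 6
print("Clock rate B (GHz):", clock_rate_b / 1e9)

# CPI Example: per-class CPI and the two compiled code sequences.
cpi_by_class = {"A": 1, "B": 2, "C": 3}
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}
print("Avg CPI, sequence 1:", average_cpi(cpi_by_class, seq1))  # 2.0
print("Avg CPI, sequence 2:", average_cpi(cpi_by_class, seq2))  # 1.5

# Amdahl's Law: a 5x overall speedup needs 20 s total, but the
# unaffected 20 s alone already uses that up, however large the factor.
for factor in (2, 10, 1_000_000):
    print("factor", factor, "->", amdahl_improved_time(80, 20, factor), "s")

# MIPS for the two computers in the CPI example
# (250 ps cycle = 4 GHz, 500 ps cycle = 2 GHz).
print("MIPS A:", mips(4e9, 2.0))
print("MIPS B:", mips(2e9, 1.2))
```

In this sketch, Computer A's MIPS rating is about 1.2 times Computer B's, which matches the CPU time ratio only because both machines share an ISA and run the same instruction count. That is the condition the MIPS pitfall warns does not hold in general.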