Lecture 1: Course Introduction, Technology Trends, Performance Professor Alvin R. Lebeck Computer Science 220 Fall 2001 Administrative • Office Hours Office: D304 LSRC Hours: Mon 10:00-11:00 Thurs 2:00-3:00 or by appointment (email) email: alvy@cs.duke.edu Phone: 660-6551 • Teaching Assistant Fareed Zaffar Office: D125 LSRC Hours: Tuesday 10:00-11:00, Wednesday 1:00-2:00 email: fareed@cs.duke.edu Phone: 660-6576 © Alvin R. Lebeck 2001 CPS 220 2 Administrative (Grading) • 30% Homeworks – 6 Homeworks – 5 points per day late, for first 10 days – Always do the homework (better late than never) • 30% Examinations (Midterm + Final) • 30% Research Project (work in pairs) • 10% Class Participation • This course requires hard work. © Alvin R. Lebeck 2001 CPS 220 3 Administrative (Continued) • Midterm Exam: In class (75 min) Closed book • Final Exam: (3 hours) closed book • This is a “Quals” Course. – Quals pass based on Midterm and Final exams only © Alvin R. Lebeck 2001 4 Administrative (Continued) • Course Web Page – http://www.cs.duke.edu/courses/fall01/cps220 – Lectures posted there after class (pdf) – Homework posted there • Course News Group – duke.cs.cps220 – Use it to 1) read announcements/comments on class or homework, 2) ask questions (help), 3) communicate with each other • Need Duke CS account – Duke ID, ACPUB account name (see HW #0) © Alvin R. Lebeck 2001 5 SPIDER: Systems Seminar • Systems & Architecture Seminar – Wednesdays 3:45-5:00 in D344 – duke.cs.os-research (spider newsgroup) • Presentations on current work – Practice talks for conferences – Discussion on recent papers – Your own research • Why you should go? – If you want to work in Systems/Architecture… – Good time to practice public speaking in front of friendly crowd – Learn about current topics © Alvin R. Lebeck 2001 6 Assignment • Homework #0 (Background, due Thursday) • Read Chapters 1 & 2 © Alvin R. Lebeck 2001 7 CPS 220 Course Focus Understanding the design techniques, machine structures, technology factors, evaluation methods that will determine the form of computers in 21st Century Technology Parallelism Programming Languages Applications Computer Architecture: • Instruction Set Design • Organization • Hardware Power Operating Systems © Alvin R. Lebeck 2001 Measurement & Evaluation CPS 220 Interface Design (ISA) History 8 Related Courses Prerequisites • CPS 104: Basic Machine Organization • CPS 110: Basic Operating System Functions • This course: focus on why, analysis, evaluation – Cost/performance – Power budget Follow on Courses • CPS 221: Advanced Computer Architecture II – Parallel computer architecture © Alvin R. Lebeck 2001 9 Computer Architecture Is … the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation. Amdahl, Blaaw, and Brooks, 1964 SOFTWARE © Alvin R. Lebeck 2001 CPS 220 10 Topic Coverage Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2nd Ed., 1995. • • • • • • • • • • • Fundamentals of Computer Architecture (Chapter 1) Instruction Set Architecture (Chapter 2, Appendix C&D) Pipelining (Chapter 3) Advanced Pipelining and ILP (Chapter 4) Memory Hierarchy (Chapter 5) Input/Output and Storage (Chapter 6) Networks and Interconnection Technology (Chapter 7) Multiprocessors (Chapter 8) Vectors (Apendix) New Architectures/trends (papers) Power (papers) © Alvin R. Lebeck 2001 CPS 220 11 Computer Architecture Topics Input/Output and Storage Disks, WORM, Tape Emerging Technologies Interleaving Bus protocols DRAM Memory Hierarchy Coherence, Bandwidth, Latency L2 Cache L1 Cache Addressing, Protection, Exception Handling VLSI Instruction Set Architecture Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation © Alvin R. Lebeck 2001 RAID CPS 220 Pipelining and Instruction Level Parallelism 12 Computer Architecture Topics (CPS 221) P M P M S °°° P M P M Interconnection Network Processor-Memory-Switch Multiprocessors Networks and Interconnections © Alvin R. Lebeck 2001 CPS 220 Shared Memory, Message Passing, Data Parallel Network Interfaces Topologies, Routing, Bandwidth, Latency, Reliability 13 Computer Engineering Methodology Technology Trends © Alvin R. Lebeck 2001 14 Computer Engineering Methodology Evaluate Existing Systems for Bottlenecks Benchmarks Technology Trends © Alvin R. Lebeck 2001 15 Computer Engineering Methodology Evaluate Existing Systems for Bottlenecks Benchmarks Technology Trends Simulate New Designs and Organizations Workloads © Alvin R. Lebeck 2001 16 Computer Engineering Methodology Implementation Complexity Evaluate Existing Systems for Bottlenecks Benchmarks Technology Trends Implement Next Generation System Simulate New Designs and Organizations Workloads © Alvin R. Lebeck 2001 17 Context for Designing New Architectures • Application Area – Special Purpose (e.g., DSP) / General Purpose – Scientific (FP intensive) / Commercial (Mainframe) – Portable (Power matters) • Level of Software Compatibility – Object Code/Binary Compatible (cost HW vs. SW; IBM S/360) – Assembly Language (dream to be different from binary) – Programming Language; Why not? © Alvin R. Lebeck 2001 CPS 220 18 Context for Designing New Architectures • OS Requirements for General Purpose Apps – – – – – Size of Address Space Memory Management/Protection Context Switch Interrupts and Traps Communication • Standards: Innovation vs. Competition – – – – IEEE 754 Floating Point I/O Bus Networks Operating Systems / Programming Languages ... © Alvin R. Lebeck 2001 CPS 220 19 Technology Trends: Microprocessor Capacity 1000000000 “Graduation Window” 100000000 10000000 1000000 100000 10000 1000 19 71 19 74 19 78 19 82 19 85 19 89 19 93 19 95 20 00 20 03 20 06 20 06 20 11 100 Intel Digital Pentium Pro: 5.5 million Sparc Ultra: 5.2 million PowerPC 620: 6.9 million Alpha 21164: 9.3 million Alpha 21264: 15 million Pentium III: 28 million Pentium 4: 42 million Alpha 21364: 100 million Alpha 21464: 250 million CMOS improvements: • Die size: 2X every 3 yrs • Line width: halve / 7 yrs © Alvin R. Lebeck 2001 20 DRAM Capacity (single chip) size 10000000 1000000 100000 10000 1000 100 year size 1980 64 Kb 1983 256 Kb 1986 1 Mb 1989 4 Mb 1992 16 Mb 1996 64Mb 1998 256Mb 2002 1Gb cyc time 250 ns 220 ns 190 ns 165 ns 145 ns 104 ns 10 1980 1983 1986 1989 1992 1996 1998 2002 © Alvin R. Lebeck 2001 21 Technology Trends (Summary) Capacity Speed Logic 2x in 3 years 2x in 3 years DRAM 4x in 3 years 1.4x in 10 years Disk 2x in 3 years 1.4x in 10 years © Alvin R. Lebeck 2001 CPS 220 22 Processor Performance 300 Sun UltraSparc P e r f o r m a n c e 1.54X /yr 250 DEC 21064a 200 150 IBM Power 2/590 100 DEC AX P 3000 HP 9000/750 50 MIPS M/120 MIPS M2000 Sun-4/260 IBM RS6000/540 1.35X /yr 0 1987 1988 1989 1990 1991 1992 1993 1994 1995 Year © Alvin R. Lebeck 2001 CPS 220 23 Alpha SPECint and SPECfp Integer Floating Point 1.54x/yr Performance (Specmark) 700 600 500 400 300 200 100 0 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 © Alvin R. Lebeck 2001 24 Chip Area Reachable in One Clock Cycle Fraction of Chip Reached 1.2 1 0.8 f16 f8 fSIA 0.6 0.4 0.2 0 250 180 130 100 70 50 35 Nanometers © Alvin R. Lebeck 2001 25 Power Density Power Density W/cm^2 1000 100 Processor Hot Plate Laser diode 10 1 1.5 1 0.8 0.6 0.35 0.25 0.18 0.13 0.1 Microns © Alvin R. Lebeck 2001 26 Processor Perspective • Putting performance growth in perspective: Pentium-III Cray YMP Personal Comp. Supercomputer Year 1998 1988 MIPS > 400 MIPS < 50 MIPS Linpack 140 MFLOPS 160 MFLOPS Cost $3,000 $1M ($1.6M in 1994$) Clock 400 MHz 167 MHz Cache 512 KB 0.25 KB Memory 128 MB 256 MB • 1988 supercomputer in 1998 personal computer! © Alvin R. Lebeck 2001 27 Measurement and Evaluation Design Architecture is an iterative process: • Searching the space of possible designs • At all levels of computer systems Analysis Creativity Cost / Performance Analysis Good Ideas Bad Ideas © Alvin R. Lebeck 2001 CPS 220 Mediocre Ideas 28 Measurement Tools • • • • How do I evaluate an idea? Performance, Cost, Die Area, Power Estimation Benchmarks, Traces, Mixes Simulation (many levels) – ISA, RT, Gate, Circuit • Queuing Theory • Rules of Thumb • Fundamental Laws • Question: What is “better” Boeing 747 or Concorde? © Alvin R. Lebeck 2001 CPS 220 29 The Bottom Line: Performance (and Cost) Plane DC to Paris Speed Passengers Throughput (pmph) Boeing 747 6.5 hours 610 mph 470 286,700 BAD/Sud Concorde 3 hours 1350 mph 132 178,200 • Time to run the task (ExTime) – Execution time, response time, latency • Tasks per day, hour, week, sec, ns … (Performance) – Throughput, bandwidth © Alvin R. Lebeck 2001 CPS 220 30 The Bottom Line: Performance (and Cost) "X is n times faster than Y" means ExTime(Y) --------ExTime(X) = Performance(X) --------------Performance(Y) • Speed of Concorde vs. Boeing 747 • Throughput of Boeing 747 vs. Concorde © Alvin R. Lebeck 2001 CPS 220 31 Performance Terminology “X is n% faster than Y” means: ExTime(Y) --------- = ExTime(X) Performance(X) -------------Performance(Y) = 1 + n ----100 n = 100(Performance(X) - Performance(Y)) Performance(Y) Example: Y takes 15 seconds to complete a task, X takes 10 seconds. What % faster is X? © Alvin R. Lebeck 2001 CPS 220 32 Example ExTime(Y) = ExTime(X) © Alvin R. Lebeck 2001 15 10 1.5 1.0 = n = n = CPS 220 = Performance (X) Performance (Y) 100 (1.5 - 1.0) 1.0 50% 33 Amdahl's Law Speedup due to enhancement E: ExTime w/o E Speedup(E) = ------------ExTime w/ E = Performance w/ E ------------------Performance w/o E Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then: ExTime(E) = Speedup(E) = © Alvin R. Lebeck 2001 CPS 220 34 Amdahl’s Law ExTimenew = ExTimeold x (1 - Fractionenhanced) + Fractionenhanced Speedupenhanced Speedupoverall = ExTimeold ExTimenew 1 = (1 - Fractionenhanced) + Fractionenhanced Speedupenhanced © Alvin R. Lebeck 2001 CPS 220 35 Amdahl’s Law • Floating point instructions improved to run 2X; but only 10% of actual instruction execution time is FP ExTimenew = Speedupoverall = © Alvin R. Lebeck 2001 CPS 220 36 Amdahl’s Law • Floating point instructions improved to run 2X; but only 10% of actual instruction execution time is FP ExTimenew = ExTimeold x (0.9 + .1/2) = 0.95 x ExTimeold Speedupoverall = © Alvin R. Lebeck 2001 1 0.95 CPS 220 = 1.053 37 Corollary: Make The Common Case Fast • All instructions require an instruction fetch, only a fraction require a data fetch/store. – Optimize instruction access over data access • Programs exhibit locality Spatial Locality Temporal Locality • Access to small memories is faster – Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories. Reg's © Alvin R. Lebeck 2001 Cache Memory CPS 220 Disk / Tape 38 Occam's Toothbrush • The simple case is usually the most frequent and the easiest to optimize! • Do simple, fast things in hardware and be sure the rest can be handled correctly in software © Alvin R. Lebeck 2001 CPS 220 39 Metrics of Performance Application Answers per month Operations per second Programming Language Compiler ISA (millions) of Instructions per second: MIPS (millions) of (FP) operations per second: MFLOP/s Datapath Control Function Units Transistors Wires Pins © Alvin R. Lebeck 2001 Megabytes per second Cycles per second (clock rate) CPS 220 40 Aspects of CPU Performance CPU time = Seconds Program = Instructions x Cycles Program Instruction Instr. Cnt CPI x Seconds Cycle Clock Rate Program Compiler Instr. Set Organization Technology © Alvin R. Lebeck 2001 CPS 220 41 Marketing Metrics Instructio n Count Clock Rate 6 MIPS 10 106 Time CPI • Machines with different instruction sets ? • Programs with different instruction mixes ? – Dynamic frequency of instructions • Uncorrelated with performance MFLOPS FP Operations 10 6 Time • Machine dependent • Often not where time is spent © Alvin R. Lebeck 2001 CPS 220 Normalized: add,sub,compare,mult 1 divide, sqrt 4 exp, sin, . . . 8 43 Cycles Per Instruction “Average Cycles Per Instruction” CPI CPU time Clock Rate Cycles Instructio n Count Instructio n Count n CPU time Cycle Time CPI i I i i 1 “Instruction Frequency” n CPI CPI i Fi i 1 wher e Fi Ii Instructio n Count Invest Resources where time is Spent! © Alvin R. Lebeck 2001 44 Organizational Trade-offs Application Programming Language Compiler Instruction Mix ISA Datapath Control Function Units Transistors Wires Pins © Alvin R. Lebeck 2001 CPS 220 CPI Cycle Time 45 Example: Calculating CPI Base Machine (Reg / Reg) Op Freq Cycles CPIi ALU 50% 1 .5 Load 20% 2 .4 Store 10% 2 .2 Branch 20% 2 .4 1.5 (% Time) (33%) (27%) (13%) (27%) Typical Mix © Alvin R. Lebeck 2001 CPS 220 46 Example Add register / memory operations to traditional RISC: – One source operand in memory – One source operand in register – Cycle count of 2 Branch cycle count to increase to 3. What fraction of the loads must be eliminated for this to pay off? Base Machine (Reg / Reg) Op Freq Cycles ALU 50% 1 Load 20% 2 Store 10% 2 Branch 20% 2 © Alvin R. Lebeck 2001 CPS 220 47 Next Time • Benchmarks • Performance Metrics • Cost • Instruction Set Architectures TODO • Read Chapters 1 & 2 • Do Homework #0 © Alvin R. Lebeck 2001 CPS 220 48