MS108 Computer System I
Lecture 1: Introduction
Prof. Xiaoyao Liang
2015/3/4

Course Details
• Time: Wed 10:00-11:40am, Fri 10:00-11:40am
• Location: Xia Yuan (下院) 201
• Course Website:
  http://www.cs.sjtu.edu.cn/~liang-xy/ms108/MS108-L*.ppt
  http://www.cs.sjtu.edu.cn/~liang-xy/ms108/hw*.pdf
• Instructor: Xiaoyao Liang, liang-xy@cs.sjtu.edu.cn
• TA: TBD
• Textbook: Computer Architecture: A Quantitative Approach, Fifth Edition (English edition), John L. Hennessy and David A. Patterson, China Machine Press, published January 1, 2012, ISBN 9787111364580 (English reprint)
• Reference: Computer Organization and Design: The Hardware/Software Interface (3rd or 4th edition), David A. Patterson and John L. Hennessy, China Machine Press
• Grades: Homework (40%), Attendance (10%), Midterm Exam (20%), Project (30%)

Course Prerequisites
• Computing Hardware or similar
  – Logic design (computer arithmetic)
  – Basic ISA (what is a RISC instruction)
  – Pipelining (control/data hazards, forwarding)
  – Will review the above during the first couple of weeks
• C Programming, Linux
• Compilers, OS, Circuits/VLSI background is a plus, but not required

Course Importance
• Embarrassing if you hold a BS in CS/CE and can’t make sense of the following terms: DRAM, pipelining, cache hierarchies, I/O, virtual memory
• Embarrassing if you hold a BS in CS/CE and can’t decide which processor to buy: 3 GHz Core 2 Duo or 1 GHz ARM (helps us reason about performance/power)
• Obvious first step for chip designers, compiler/OS writers
• Will knowledge of the hardware help me write better programs?

Course Importance
• Memory management: if we understand how/where data is placed, we can help ensure that relevant data is nearby
• Thread management: if we understand how threads interact, we can write smarter multi-threaded programs

Why do we care about multi-threaded programs?
Average Joe Programmer vs.
Stephaney Programmer

Course Topics
• Focus on what modern computer architects worry about (both academia and industry)
• Get through the basics of modern processor design
• Understand the interfaces between architecture and system software (compilers, OS)
• System architecture and I/O (disks, memory, multiprocessors)
• Look at technology trends, recent research ideas, and the future of computing hardware

Course Arrangement
• Introduction and Performance Metrics (1 week)
• ISA/Basic Pipelining Review (2 weeks)
• Hardware ILP (2 weeks)
• Software ILP (1 week)
• Caches/Memory (2 weeks)
• Modern Processor Case Studies (2 weeks)
• Multiprocessors/Multithreading (2 weeks)
• Input/Output and Interconnects (1 week)
• Research Trends (1 week)
• Technology Trends’ Impact on Architecture (1 week)

What is Computer Architecture?

Computers Are Everywhere
• General-Purpose Laptop/Desktop
  – Productivity, interactive graphics, video, audio
  – Optimize price-performance
  – Examples: Intel Core 2 Duo, Nvidia GTX
• Embedded Computers
  – PDAs, cell phones, sensors => price, lifetime
  – Examples: iPhone, iPad, Android phone
  – Game machines, network µPs => price-performance
  – Examples: Sony PS, Xbox, IBM 750FX
• Data Centers
  – HPC, cloud => price, throughput, power, cooling
  – Examples: Google, Amazon

Microprocessor Capacity
• 2X transistors/chip every 1.5 years, called “Moore’s Law”
• Microprocessors have become smaller, denser, and more powerful.
• Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.

Microprocessor Speed
[Figure: two plots. Left: growth in transistors per chip, 1970-2005 (i4004, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, R10000). Right: increase in clock rate (MHz), 1970-2000.]
Why bother with parallel programming?
Just wait a year or two…

Microprocessor Performance
• Move to multi-processor

Limit #1: Power Density
• “Can soon put more transistors on a chip than can afford to turn on.” -- Patterson ’07
• Scaling clock speed (business as usual) will not work
[Figure: power density (W/cm²) vs. year, 1970-2010, for Intel processors (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P6), with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun’s surface. Source: Patrick Gelsinger, Intel]

Limit #2: ILP Tapped Out
• Application performance was increasing by 52% per year, as measured by the SPECint benchmarks
  – ½ due to transistor density
  – ½ due to architecture changes, e.g., Instruction Level Parallelism (ILP)
• VAX: 25%/year, 1978 to 1986
• RISC + x86: 52%/year, 1986 to 2002
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006

Limit #2: ILP Tapped Out
• Superscalar (SS) designs were the state of the art; many forms of parallelism are not visible to the programmer
  – Multiple instruction issue
  – Dynamic scheduling: hardware discovers parallelism between instructions
  – Speculative execution: look past predicted branches
  – Non-blocking caches: multiple outstanding memory ops
• You may have heard of these before, but you haven’t needed to know about them to write software
• Unfortunately, these sources have been used up

Limit #3: Chip Yield
Manufacturing costs and yield problems limit the use of density
• Moore’s (Rock’s) 2nd law: fabrication costs go up
• Yield (% usable chips) drops
• Parallelism can help
  – More, smaller, simpler processors are easier to design and validate
  – Can use partially working chips, e.g., Cell processor (PS3)

Current Situation
• Chip density continues to increase
  – Clock speed does not
  – Number of processor cores may double instead
• There is little or no hidden parallelism (ILP) left to be found
• Parallelism must be exposed to and managed by software
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)

Abstraction
• As an
architect, the main job is to deal with tradeoffs
  – Performance, Power, Die Size, Complexity, Applications Support, Functionality, Compatibility, Reliability, etc.
• Technology trends, applications… How do we deal with all of this to make real tradeoffs?
• Abstractions allow this to happen
• Focus is on metrics of these abstractions
  – Performance, Cost, Availability, Power

Computer Components
• Input/output devices
• Secondary storage: non-volatile, slower, cheaper
• Primary storage: volatile, faster, costlier
• Communication: bus, cable
• CPU/processor

IC Manufacturing

Processor Technology Trend
• Integrated circuit technology
  – Transistor density: 35%/year
  – Die size: 10-20%/year
  – Integration overall: 40-55%/year
• DRAM capacity: 25-40%/year (slowing)
• Flash capacity: 50-60%/year
  – 15-20X cheaper/bit than DRAM
• Magnetic disk technology: 40%/year
  – 15-25X cheaper/bit than Flash
  – 300-500X cheaper/bit than DRAM

Memory and IO Technology Trend
• Bandwidth or throughput
  – Total work done in a given time
  – 10,000-25,000X improvement for processors
  – 300-1200X improvement for memory and disks
• Latency or response time
  – Time between start and completion of an event
  – 30-80X improvement for processors
  – 6-8X improvement for memory and disks

Bandwidth vs. Latency
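
The two definitions above (bandwidth as total work done in a given time, latency as the time between start and completion of an event) can be sketched with a toy model. The following is a minimal illustration, assuming an ideal pipeline with no stalls; the function names, stage counts, and stage times are hypothetical and are not taken from the lecture.

```python
# Toy model of latency vs. bandwidth on an ideal, stall-free pipeline.
# All names and numbers are hypothetical, chosen only for illustration.


def latency_ns(stages: int, stage_time_ns: float) -> float:
    """Latency: time between start and completion of one operation."""
    return stages * stage_time_ns


def total_time_ns(n_ops: int, stages: int, stage_time_ns: float) -> float:
    """Time to complete n_ops back-to-back operations.

    The first operation needs the full pipeline latency; every later
    operation completes one stage time after the previous one.
    """
    if n_ops <= 0:
        return 0.0
    return (stages + n_ops - 1) * stage_time_ns


def throughput_ops_per_ns(n_ops: int, stages: int, stage_time_ns: float) -> float:
    """Bandwidth: total work done divided by the time it took."""
    if n_ops <= 0:
        raise ValueError("n_ops must be positive")
    return n_ops / total_time_ns(n_ops, stages, stage_time_ns)


if __name__ == "__main__":
    n = 1000
    # Two hypothetical designs: 5 stages of 1.0 ns vs. 10 stages of 0.5 ns.
    for stages, stage_time in [(5, 1.0), (10, 0.5)]:
        print(
            f"{stages} stages x {stage_time} ns: "
            f"latency = {latency_ns(stages, stage_time)} ns, "
            f"throughput = {throughput_ops_per_ns(n, stages, stage_time):.3f} ops/ns"
        )
```

In this sketch both designs have the same 5 ns latency for a single operation, while the deeper pipeline roughly doubles the throughput. That is one simple way to see how bandwidth can improve much faster than latency.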