CS5100 Advanced Computer Architecture
Technology Trends
Prof. Chung-Ta King
Department of Computer Science
National Tsing Hua University, Taiwan

Outline
• Goal of this class:
  − To understand the trends of IC technology and be able to relate these trends to computer architecture designs
  − Why do we need to know the trends?
    • Learn from history
    • Understand the possible future and know how to adapt now
• Class outline:
  − Trends in technology (Sec. 1.4)
  − Trends in power and energy (Sec. 1.5)
  − Trends in cost (Sec. 1.6)

IC Technology and Processor Performance
[Figure: # transistors on ICs doubles (x2) every 2 years, exponential growth. Source: Intel Corp.]

Review of Transistors (MOSFET) on IC

Technology Scaling
• Feature size:
  − Minimum size of a transistor or wire in the x or y dimension
  − From 10 microns in 1971 to 22 nm in 2012
  − A new technology node every 2 years or so
  − Each generation scales feature size to ~70% of the previous one (S ≈ 0.7x), i.e., a ~30% reduction
• Technology nodes: 10 µm (1971), 6 µm (1974), 3 µm (1977), 1.5 µm (1982), 1 µm (1985), 800 nm (1989), 600 nm (1994), 350 nm (1995), 250 nm (1997), 180 nm (1999), 130 nm (2001), 90 nm (2004), 65 nm (2006), 45 nm (2008), 32 nm (2010), 22 nm (2012), 14 nm (2014), 10 nm (2016)

Effects of Scaling
• More transistors per unit area
  − Feature size reduced by 0.7 (S) → area of a transistor reduced by ~0.5 (S²)
  − 2X # transistors per unit area
  − Fixed cost per wafer → lower cost per transistor
• Faster transistors
  − Time to switch transistors on/off reduced by S → speed improved by ~1/S → exponential increase in clock rate
• Lower supply voltage and power
  − Power to switch a transistor is reduced, but power density is not
  − Voltage needed to drive transistors is reduced

Effects of Scaling (cont.)
• Local wires are getting faster
• Global wires are getting slower, i.e.,
  scale poorly
  − It is no longer possible to cross the chip in one cycle
  − Computer architects need to plan around this
[Figure: chip size vs. scaling of reachable radius; labels: 3D stacking, distributed mechanisms]

Summary: Technology Trends
• Integrated circuit technology
  − Transistor density: 35%/year
  − Die size: 10-20%/year
  − Integration overall: 40-55%/year (slowed down after 2003!)
• DRAM capacity: 25-40%/year (slowing)
• Flash capacity: 50-60%/year
  − 15-20X cheaper/bit than DRAM
• Magnetic disk capacity: 40%/year
  − 15-25X cheaper/bit than Flash
  − 300-500X cheaper/bit than DRAM
  − But not speed

Implications for Computer Architecture
• High rate of density improvements
  − Used to bring 4-bit, 8-bit, and eventually 64-bit microprocessors in the early days of microprocessors
  − Used for multiple processors per chip, wider SIMD, …, in recent years
• Quantitative changes lead to qualitative changes
  − 25K to 30K transistors per chip in the early 1980s → possible to build a single-chip 32-bit microprocessor
  − By the mid 1980s, the FP unit could be integrated
  − By the late 1980s, the L1 cache could fit on the same chip
  → Performance improvements often come in discrete steps
• Must work with signal propagation delay on wires

Bandwidth versus Latency
• Bandwidth or throughput
  − Total work done in a given time
  − 10,000-25,000X improvement for processors
  − 300-1200X improvement for memory and disks
• Latency or response time
  − Time between the start and completion of an event
  − 30-80X improvement for processors
  − 6-8X improvement for memory and disks

Bandwidth and Latency
[Figure: log-log plot of bandwidth and latency milestones]

Summary: Bandwidth and Latency
• For disk, LAN, memory & microprocessor, bandwidth improves by the square of the latency improvement
  − In the time that bandwidth doubles, latency improves by no more than 1.2X to 1.4X
• The lag is probably even larger in real systems, as bandwidth gains are multiplied by replicated components
  − Multiple processors in a cluster
    or in a chip
  − Multiple disks in a disk array
  − Multiple memory modules in a large memory
  − Simultaneous communication in a switched LAN
• HW and SW developers should innovate assuming latency lags bandwidth

Outline
• Trends in technology (Sec. 1.4)
• Trends in power and energy (Sec. 1.5)
• Trends in cost (Sec. 1.6)

Power Density Trend
\( P = \alpha C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak} \)
[Figure: power density trend. Source: Intel Corp.]

Power
• Intel 80386 consumed ~2 W, but a 3.3 GHz Intel Core i7 consumes 130 W
• Heat must be dissipated from the chip
• Today, power, not silicon area, is the major limitation on using transistors

Power and Energy
• \( P_{avg} = P_{dynamic} + P_{static} \)
• Energy is related to power through time
  − If power dissipation remains constant over time \( T \), then \( E = P_{avg} \times T \)

Dynamic Power and Energy
• For CMOS chips, the traditionally dominant energy consumption has been in switching transistors, called dynamic power
  − Dynamic power = ½ × capacitive load × voltage² × frequency switched
• For mobile devices, energy is the better metric
  − Dynamic energy per transition = ½ × capacitive load × voltage²
• Reducing the clock rate reduces power, but not energy
• Reducing power:
  − Do nothing well: turn off the clock of inactive modules
  − Dynamic Voltage-Frequency Scaling (DVFS)
  − Low-power states for DRAM and disks
  − Overclocking, turning off cores

Static Power
• Because leakage current flows even when a transistor is off, static power is now important too
  − Static power = static current × voltage
  − Scales with the number of transistors
  − Increases as transistors shrink and the number of transistors increases
• With 65 nm or smaller technologies, leakage can account for 50% of total power if not designed properly
  − To reduce it: power gating

Implications for Computer Architecture
• Architectural designs for low power using metrics such as tasks per joule
  or performance per watt
  − Use the right power/energy to do the right things
• Sometimes doing things faster at a higher power may be better → race to halt
  − Often, techniques for performance also lead to power savings

Outline
• Trends in technology (Sec. 1.4)
• Trends in power and energy (Sec. 1.5)
• Trends in cost (Sec. 1.6)

VLSI Economics
• Selling price \( S_{total} \)
  − \( S_{total} = C_{total} / (1 - m) \)
    • m = profit margin
    • \( C_{total} \) = total cost
      − Nonrecurring engineering cost (NRE)
      − Recurring cost
      − Fixed cost: data sheets and application notes, marketing and advertising, yield analysis

NRE
• Engineering cost
  − Depends on the size of the design team, including benefits, training, computers
  − CAD tools:
    • Digital front end: $10K
    • Analog front end: $100K
    • Digital back end: $1M
• Prototype manufacturing
  − Mask costs: $500K – 1M in a 130 nm process
  − Test fixture and package tooling

Recurring Cost of IC
• Die yield parameters:
  − Defects per unit area = 0.016-0.057 defects per cm² (2010)
  − N = process-complexity factor = 11.5-15.5 (40 nm, 2010)

Cost and Computer Architecture
• The only part of IC cost that computer architects directly control is die area, and hence only a portion of the cost
  − What functions should be included or excluded in the design?
  − Number of I/O pins
  − Design complexity

Technology and Architecture
• How to translate technology improvements into increases in computing performance?
• Increased transistor counts:
  − Basic strategies: parallelism, speculation, overlapping, monitoring/profiling
  − Modular and hierarchical architectures
  − Constraints on power dissipation, localized communication, design and verification complexities
• Increasing clock frequency:
  − Need to tackle power, heat, clock skew, wire delay
  − Gap to memory and I/O devices, PC board design → multi-level cache (with on-chip cache)

Technology and Architecture (cont.)
  − Need scalable design with little complexity, parallelism, e.g., multiple functional units, RISC cores
  − Need good locality; avoid long-distance and rapid interaction, e.g., MP on a chip
• Shorter wires, lower complexity, scale with technology
• On-chip cache/DRAM, MP on a chip, multithreading, vector processing, VLIW
  − For monitoring and learning a program's execution and subsequently recasting it for faster execution
  − Self-adapting, self-managing, self-healing, …
  − More functionalities: multimedia, facilities for I/O and memory, bandwidth and latency improvement

Recap
• Trends in technology
• Trends in power and energy
• Do you understand the trends of IC technology?
• Can you explain the implications and relate the trends to computer architecture designs?
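Worked example for the "Technology Scaling" and "Effects of Scaling" slides. This is a minimal Python sketch of idealized, first-order scaling. It assumes a fixed linear factor S = 0.7 per generation, so real processes will deviate. It also checks the "~0.7x every 2 years or so" rule of thumb by taking the geometric mean over the node list on the slide. The function names are hypothetical.

```python
# Per-generation linear scaling factor S quoted on the "Technology Scaling" slide.
S = 0.7

# (feature size in nm, year) pairs transcribed from the node list on the slide.
NODES = [
    (10_000, 1971), (6_000, 1974), (3_000, 1977), (1_500, 1982), (1_000, 1985),
    (800, 1989), (600, 1994), (350, 1995), (250, 1997), (180, 1999),
    (130, 2001), (90, 2004), (65, 2006), (45, 2008), (32, 2010),
    (22, 2012), (14, 2014), (10, 2016),
]


def one_generation(feature_nm: float, s: float = S) -> dict:
    """Idealized (first-order) effects of shrinking linear dimensions by s."""
    return {
        "feature_nm": feature_nm * s,   # minimum feature size shrinks by S
        "area_factor": s ** 2,          # transistor area shrinks by S^2 (~0.5)
        "density_factor": 1 / s ** 2,   # transistors per unit area grow ~2x
        "delay_factor": s,              # idealized switching delay shrinks by S
    }


def average_node_shrink(nodes=NODES) -> tuple[float, float]:
    """Geometric-mean shrink per node step and average years per step."""
    (first_nm, first_year), (last_nm, last_year) = nodes[0], nodes[-1]
    steps = len(nodes) - 1
    ratio = (last_nm / first_nm) ** (1 / steps)
    years = (last_year - first_year) / steps
    return ratio, years


if __name__ == "__main__":
    g = one_generation(22)
    print(f"22 nm -> {g['feature_nm']:.1f} nm, area x{g['area_factor']:.2f}, "
          f"density x{g['density_factor']:.2f}")
    ratio, years = average_node_shrink()
    print(f"average shrink per node: {ratio:.2f}x every {years:.1f} years")
```

Applied to the slide's 1971 to 2016 node list, the geometric mean comes out to roughly 0.67x per node step, about every 2.6 years. That is close to the ~0.7x, ~2-year rule of thumb.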
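Worked example for the "Summary: Technology Trends" and "Summary: Bandwidth and Latency" slides. The annual growth rates compound, so it helps to convert them into doubling times and per-decade factors. The sketch below does that, taking the lower end wherever the slide gives a range. It also encodes the rule of thumb that bandwidth improves by the square of the latency improvement. It assumes constant annual rates, which the slides themselves note no longer hold (e.g., integration slowed after 2003). The helper names are illustrative only.

```python
import math

# Annual growth rates quoted on the summary slide (lower end where a range is given).
RATES = {
    "transistor density": 0.35,
    "DRAM capacity": 0.25,
    "Flash capacity": 0.50,
    "magnetic disk capacity": 0.40,
}


def doubling_time_years(annual_rate: float) -> float:
    """Years for a quantity growing at annual_rate (0.35 means 35%/year) to double."""
    return math.log(2) / math.log1p(annual_rate)


def growth_factor(annual_rate: float, years: float) -> float:
    """Total improvement after compounding annual_rate for the given number of years."""
    return (1 + annual_rate) ** years


def latency_gain(bandwidth_gain: float) -> float:
    """Rule of thumb: bandwidth gain ~ (latency gain)^2, so latency gain ~ sqrt(bandwidth gain)."""
    return math.sqrt(bandwidth_gain)


if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name:>24}: {rate:.0%}/year -> doubles every "
              f"{doubling_time_years(rate):.1f} years, x{growth_factor(rate, 10):.0f} per decade")
    # When bandwidth doubles, latency improves by only about sqrt(2), i.e., ~1.4X.
    print(f"latency gain when bandwidth doubles: {latency_gain(2):.2f}X")
```

For example, 35%/year doubles in about 2.3 years. The square-law rule gives √2 ≈ 1.41, which is the upper end of the 1.2X to 1.4X latency gain the slide quotes per doubling of bandwidth.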
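Worked example for the "Power and Energy", "Dynamic Power and Energy", "Static Power" and "Implications for Computer Architecture" slides. The sketch uses the slides' forms: ½ × C × V² × f for dynamic power, static power added on top, and \( E = P_{avg} \times T \). `c_eff` is a hypothetical effective capacitance assumed to switch once per cycle, and the component values are made up.

The sketch reproduces three points from the slides:
• Lowering the clock rate alone reduces power but not dynamic energy per task.
• Once static power is included, running faster can use less total energy (race to halt), assuming the voltage cannot be lowered along with the clock.
• Scaling voltage and frequency together (DVFS) cuts both power and energy.

```python
def dynamic_power(c_eff: float, vdd: float, freq: float) -> float:
    """P_dynamic = 1/2 * C * V^2 * f, with f the switching frequency (W)."""
    return 0.5 * c_eff * vdd ** 2 * freq


def task_energy(cycles: float, c_eff: float, vdd: float, freq: float,
                p_static: float) -> tuple[float, float]:
    """Energy (J) and run time (s) for a task of `cycles` clock cycles.

    Assumes one switching event of c_eff per cycle and constant power over the
    run, so E = P_avg * T with P_avg = P_dynamic + P_static.
    """
    run_time = cycles / freq
    p_avg = dynamic_power(c_eff, vdd, freq) + p_static
    return p_avg * run_time, run_time


def dvfs_ratios(scale: float) -> tuple[float, float]:
    """Scale V and f by the same factor; return (dynamic power ratio, dynamic energy ratio)."""
    energy_ratio = scale ** 2            # E_dynamic ~ V^2
    power_ratio = energy_ratio * scale   # P_dynamic ~ V^2 * f
    return power_ratio, energy_ratio


if __name__ == "__main__":
    C, V, CYCLES = 1e-9, 1.0, 2e9   # hypothetical: 1 nF effective, 1 V, 2e9-cycle task
    for f in (2e9, 1e9):
        e_dyn, t = task_energy(CYCLES, C, V, f, p_static=0.0)
        e_all, _ = task_energy(CYCLES, C, V, f, p_static=0.5)
        print(f"f = {f / 1e9:.0f} GHz: P_dyn = {dynamic_power(C, V, f):.2f} W, "
              f"t = {t:.1f} s, E_dyn = {e_dyn:.2f} J, E_total = {e_all:.2f} J")
    p, e = dvfs_ratios(0.85)
    print(f"V and f down 15%: power x{p:.2f}, energy x{e:.2f}")
```

With these numbers, halving the clock halves dynamic power (1.00 W to 0.50 W) but leaves dynamic energy at 1.00 J. The 0.5 W static term makes the slower run cost 2.00 J instead of 1.50 J. Scaling V and f down by 15% cuts dynamic power to about 0.61x and dynamic energy to about 0.72x.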
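Worked example for the "Recurring Cost of IC" and "VLSI Economics" slides. The recurring-cost slide keeps only the parameters of a die-yield model; the formulas themselves did not survive in this text. The sketch below assumes the die-cost model from Hennessy and Patterson's Computer Architecture: A Quantitative Approach, whose Sec. 1.6 these parameters appear to come from:
• Dies per wafer, with an edge-loss correction.
• Die yield = wafer yield / (1 + defects per unit area × die area)^N.
• Die cost = wafer cost / (dies per wafer × die yield).

It also applies the slide's pricing rule \( S_{total} = C_{total} / (1 - m) \). The wafer cost, wafer size and die size are hypothetical. The defect density and N are picked from within the slide's ranges.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> int:
    """Whole dies per wafer: wafer area / die area, minus partial dies at the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return int(wafer_area / die_area_cm2 - edge_loss)  # truncate to whole dies


def die_yield(defects_per_cm2: float, die_area_cm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Fraction of good dies: wafer_yield / (1 + D * A)^N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def die_cost(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
             defects_per_cm2: float, n: float) -> float:
    """Cost per good die = wafer cost / (dies per wafer * die yield)."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, n))
    return wafer_cost / good_dies


def selling_price(total_cost: float, margin: float) -> float:
    """S_total = C_total / (1 - m), with m the profit margin (0 <= m < 1)."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return total_cost / (1 - margin)


if __name__ == "__main__":
    # Hypothetical: 30 cm (300 mm) wafer, 1.5 cm x 1.5 cm die, $5,000 wafer,
    # D = 0.031 defects/cm^2 and N = 13.5 (both within the slide's ranges).
    area = 1.5 * 1.5
    print("dies per wafer:", dies_per_wafer(30, area))
    print(f"die yield: {die_yield(0.031, area, 13.5):.2f}")
    print(f"cost per good die: ${die_cost(5000, 30, area, 0.031, 13.5):.2f}")
    print(f"price at 40% margin for $100 total cost: ${selling_price(100, 0.4):.2f}")
```

With these assumed inputs the model gives 269 whole dies per wafer, a die yield of about 0.40, and roughly $46 per good die. This illustrates why die area, the cost lever the slides say architects control, matters so much: a larger die lowers both the dies per wafer and the yield. A full IC cost in the same model would also add testing and packaging and divide by the final test yield.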