CS/EE 1012 Computing for the Near and Long Term
Haldun Hadimioglu
Spring 2010
CS/EE1012 Introduction to Computer Engineering Spring 2010

Outline
What has happened?
Designing chips
Near future directions
Long term directions
Conclusions
Figures: Intel Eight-Core Xeon die with 2.3 billion transistors; Cray Jaguar supercomputer, the fastest computer in the world.

What has Happened?
Moore's Law has been holding since the 1960s.
It will continue to hold, perhaps at a slower rate of doubling every three years (www.ieee.org).
We will have very small transistors!
Smaller transistors are more susceptible to errors caused by alpha particles!
More transistors will be defective!

Intel's Past Microprocessor Roadmap
Figure: Intel 80-core research die: 1.01 TFLOPS, 100 million transistors, 62 Watts, each core at 3.16 GHz.
Figure: Intel eight-core Xeon 7500 die with 2.3 billion transistors (>26 MB cache).
Roadmap data point for 2010: 2,300,000,000 transistors.

Power Density was Increasing Exponentially!
Power was doubling every 4 years.
Figure: power density (Watts/cm², log scale from 1 to 1000) of Intel processors from the i386 and i486 through the Pentium, Pentium Pro, Pentium II, Pentium III and Pentium 4, plotted against process technology from 1.5 µm down to 0.07 µm, with a hot plate, a nuclear reactor and a rocket nozzle as reference points.
Courtesy: "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies", Fred Pollack, Intel Corp., Micro32 conference keynote, 1999. Courtesy Avi Mendelson, Intel.

Microprocessor Speed
Every two years the speed of microprocessors doubles.
Processor speed increases 50% a year!
But memory speed increases only 10% a year!
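To see why a 50% versus 10% yearly improvement matters, here is a minimal Python sketch that compounds the two annual rates quoted above. It assumes (an idealization, not a claim from the slide) that processor and memory speed start out equal and that both rates stay constant; the constant and function names are illustrative only.

```python
import math

# Annual improvement rates quoted on the slide (assumed constant here).
PROCESSOR_RATE = 0.50  # processor speed: +50% per year
MEMORY_RATE = 0.10     # memory speed: +10% per year


def growth(rate: float, years: float) -> float:
    """Factor by which a quantity grows after `years` of compounding at `rate`."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed to double at a fixed annual growth `rate`."""
    return math.log(2.0) / math.log(1.0 + rate)


def speed_gap(years: float) -> float:
    """Processor-to-memory speed ratio after `years`, assuming both start equal."""
    return growth(PROCESSOR_RATE, years) / growth(MEMORY_RATE, years)


if __name__ == "__main__":
    print(f"processor speed doubles every {doubling_time(PROCESSOR_RATE):.2f} years")
    print(f"memory speed doubles every {doubling_time(MEMORY_RATE):.2f} years")
    for years in (1, 5, 10):
        print(f"after {years:2d} years the processor/memory gap is {speed_gap(years):6.2f}x")
```

Under these assumptions the gap grows by a factor of 1.5/1.1 ≈ 1.36 each year, so it roughly doubles every 2.2 years and exceeds 20x after ten years; this compounding mismatch is what the multi-core slide calls the memory wall.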
Microprocessor speed for an application depends on:
The number of operations in the application (lower is better), which reflects the quality of the code.
The number of operations performed in parallel (higher is better): do more operations in parallel.
How fast each operation is performed (higher is better).
Because of Moore's Law, transistors are smaller and wires are shorter, so the clock frequency can be increased.
Until 2005, increasing the clock frequency was the main way to increase speed.
Power consumption (heat generation) increases with the frequency, so the chip has to be cooled by a heat sink, a fan or a liquid.
Since 2005, power consumption has changed the way speed is increased.

Multi-Core Microprocessors
Since 2005, the increase in microprocessor speed depends on:
The number of operations in the code (the quality of the code).
The number of operations performed in parallel.
Dual-core microprocessors running at a reduced frequency consume less power (generate less heat).
Two, four or eight cores perform more operations in parallel.
The speed increase continues into the future with more cores on a chip.
Figure: clock frequency and number of cores per chip.
The number of cores per chip doubles every two years.
The memory can become a bottleneck: memory speed increases only 10% a year, while more cores increase the demand on memory. This is the memory wall problem.
Parallel programming has to be improved dramatically: the parallel programming wall.

Designing Chips
We have been using hardware description languages (HDLs) to design chips.
We write an HDL program to design a chip, just as we draw a schematic to design a chip.
Why an HDL program and not schematics? Real-life circuits are too complex to be designed with schematics.
There are two popular HDLs today: VHDL and Verilog HDL.
Knowing one HDL helps one learn another HDL faster.

Why HDLs?
Software: statements are executed sequentially.
The order of the statements is significant, since they are executed in that order.
Examples: Java, C++, C, Ada, Pascal, Fortran, ...
Hardware: events happen concurrently.
A sequential software language cannot be used for describing and simulating hardware.
Concurrent software languages cannot be used either, because we do not have powerful tools for them.
Programs in C/C++ etc. will be used to design chips in the future.
This is already done for C and C++ programs in limited cases: the programs are first converted to HDL programs and then to hardware.

Full Adder VHDL Program
Data-flow description of the full adder circuit, with inputs $k_i$, $m_i$, $c_i$ and outputs $s_i$, $c_o$:
$s_i = \overline{k_i}\,\overline{m_i}\,c_i + \overline{k_i}\,m_i\,\overline{c_i} + k_i\,\overline{m_i}\,\overline{c_i} + k_i\,m_i\,c_i$
$c_o = k_i m_i + k_i c_i + m_i c_i$
Figure: IBM dual-core BlueGene/L microprocessor die and its chip (© IBM).

VHDL Details: 3-to-8 Decoder

3-to-8 Decoder VHDL Program: Entity Part
Figure: 3-to-8 decoder (DCD) symbol with select inputs A0, A1, A2, enable inputs G1, G2A_L, G2B_L, and active-low outputs Y_L0 through Y_L7.

3-to-8 Decoder VHDL Program: Architecture Part
Figure: the same 3-to-8 decoder symbol.
All statements happen concurrently.

Near Future Directions
Double the number of cores every two years.
Make sure to handle errors due to alpha particles and defective transistors.
Make sure to handle the power wall and the memory wall.
Make sure to improve parallel programming.

Near Future Directions
Intel Unveils 48-Core Research Chip
On Wednesday Intel shifted its Tera-scale Computing Research Program into second gear by demonstrating a 48-core x86 processor.
The company is intending to use the new chip as a research platform for the purpose of lighting a fire under many-core computing. According to Intel, the new chip boasts 1.3 billion transistors and is built on 45nm CMOS technology. Its distinction is that it contains the largest number of Intel Architecture (IA) cores ever assembled on a single microprocessor. As such, it represents the sequel to Intel's 2007 "Polaris" 80-core prototype that was based on simple floating point units. While the latter chip was said to reach 2 teraflops, the company is not talking about performance for the 48-core version.
HPC Wire, December 4, 2009

The IBM Power7 chips are implemented in a 45 nanometer copper/SOI process and have 1.2 billion transistors with eight cores on a single die. The Power7 core has 32KB of L1 instruction cache and 32KB of L1 data cache. Each core sports simultaneous multithreading that delivers four virtual threads per core, and has 256KB of L2 cache tightly coupled to it. The chip also has 32MB of embedded DRAM that acts as a shared L3 cache, with 4MB segments affiliated with each of the eight cores. The Power7 chip has two dual-channel DDR3 memory controllers implemented on the chip, which deliver 100 GB/sec of sustained bandwidth per chip.
Sources: September 1, 2009; http://www.arstechnica.com; http://www.theregister.co.uk, November 27, 2009

Intel & IBM Vision for Next 5-8 Years
Figure sources: Scalable High Performance Main Memory System Using PCM Technology, Moinuddin K. Qureshi et al., ISCA 2009 (IBM); Intel, www.anandtech.com; Intel Technology Journal, November 2005.

Near Future Directions: Next 5-8 Years
Applications:
Intel: recognition, mining and synthesis as the Platform 2015 workload model (on chips with massively parallel cores).
IBM: presence information, knowing where people and things are and how best to match them; people are sensorized.
Microsoft: the intention machine; the computer predicts user intentions and delivers useful information.
CMU: computational thinking, a computer-science-based approach to solving problems, designing systems and understanding human behavior.
Traditional computing will continue: a C/C++/Java program for an application becomes software, and a compiler generates the machine language program file.
A new type of computing: a C/C++/Java program for an application becomes hardware, and a hardware compiler generates the transistor circuit. The result is a custom chip.

Near Future Directions: New Computing Types?
Any other new possibility?
A C/C++/Java program for an application becomes hardware: a CAD tool generates the bit file to reconfigure the FPGA.
An FPGA chip is a hardware-programmable chip: the bit file configures the chip, and the chip emulates the designed circuit.
The CS 2204 Digital Logic Lab uses FPGAs!
There can be more opportunities with FPGA chips!
FPGAs are increasingly used in commercial products!
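The idea that a bit file configures an FPGA can be made concrete with a toy model. Real FPGA architectures and bit-file formats are vendor-specific; this sketch assumes only the common textbook picture in which the basic logic element is a small lookup table (LUT) whose stored bits are its truth table. The class `LUT` and helper `truth_table` are hypothetical names for illustration. Two 3-input LUTs are configured as the sum and carry outputs of the full adder from the earlier slide, and a third is reconfigured by loading new bits.

```python
from typing import Callable, List, Sequence


class LUT:
    """Toy model of a k-input FPGA lookup table (LUT).

    Assumption: the LUT is just 2**k stored bits; configuration bit number i
    is the output produced when the inputs, read as a binary number with the
    first input as the most significant bit, equal i.
    """

    def __init__(self, k: int, config_bits: Sequence[int]) -> None:
        self.k = k
        self.config: List[int] = []
        self.load(config_bits)

    def load(self, config_bits: Sequence[int]) -> None:
        """(Re)configure the LUT, like loading its share of a new bit file."""
        if len(config_bits) != 2 ** self.k:
            raise ValueError(f"a {self.k}-input LUT needs {2 ** self.k} bits")
        if any(bit not in (0, 1) for bit in config_bits):
            raise ValueError("configuration bits must be 0 or 1")
        self.config = list(config_bits)

    def evaluate(self, *inputs: int) -> int:
        """Use the input bits as an address into the stored truth table."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.config[address]


def truth_table(func: Callable[..., int], k: int) -> List[int]:
    """Enumerate all 2**k input patterns to get the configuration bits for func."""
    bits = []
    for address in range(2 ** k):
        inputs = [(address >> (k - 1 - i)) & 1 for i in range(k)]
        bits.append(func(*inputs) & 1)
    return bits


# Configure two 3-input LUTs as the full adder's sum and carry outputs.
sum_lut = LUT(3, truth_table(lambda ki, mi, ci: ki ^ mi ^ ci, 3))
carry_lut = LUT(3, truth_table(lambda ki, mi, ci: (ki & mi) | (ki & ci) | (mi & ci), 3))

# Reconfiguration: load new bits into an existing LUT to change its function.
spare_lut = LUT(3, [0] * 8)                                 # starts as constant 0
spare_lut.load(truth_table(lambda a, b, c: a & b & c, 3))   # now a 3-input AND
```

Real devices add flip-flops, programmable routing and proprietary bit-file formats, none of which this toy model attempts to capture; it only shows why the same silicon can implement different circuits.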
FPGAs are becoming cost competitive with microprocessors.
FPGAs are becoming speed competitive with custom chips.
FPGAs are used for applications where speed and programmability matter.
The latest FPGAs also have microprocessor cores, so they can run software as well.
The application can be divided into software and hardware.

Near Future Directions: New Computing Types
A C/C++/Java program becomes part software and part hardware.
An FPGA with processor cores and reconfigurable areas runs the application: software is run by the processor cores, and hardware is in the reconfigurable area.
When such an FPGA runs an application, some operations are done in hardware while, simultaneously, other operations are done in software.
Figure labels: reconfigurable area to do operations in hardware; processor core to run software.
These FPGAs are available now, but we need much better tools: software tools (compilers) and CAD tools must merge.
Reconfigurable areas and cores allow recovering from errors due to alpha particles and defective transistors.

Near Future Directions: Hybrid Switching Elements
CMOL: circuitry composed of CMOS and nanodevices.
Figures (from Konstantin K. Likharev): a larger view and a closer look at FPGA-like reconfigurable logic circuits; the interface between CMOS and nanodevices; two CMOS cells and a nanodevice.

Near Future Directions: Possible New Structures
Microelectromechanical systems (MEMS) with computing elements:
Microembedded systems
Smart Dust at UC Berkeley
Microbiolab on a chip, sometimes referred to as a biochip!
Other structures that can be used for a number of different applications, with or without computing elements: microcameras, microsensors, micromirrors, micromotors, microlenses.
An all-optical computing chip with micromirrors and microlenses.
Bio MEMS: the Biochip Group at Mesa+, University of Twente, Holland.

Near Future Directions: Year 2020
SEMATECH: a consortium of semiconductor manufacturers from America, Asia and Europe.
SEMATECH predictions for the year 2020 (from its 2009 Update of the International Technology Roadmap for Semiconductors, ITRS):
Clock speed: 12 GHz
Number of transistors on a microprocessor chip: 35 billion
DRAM chips: 32 Gbit
Process length: 14 nm
Make sure to handle errors due to alpha particles and defective transistors.
http://www.sematech.org

Long Term Directions: Possible New Structures
Nanotechnology, programmable materials, NEMS, Bio NEMS
Nano medicine, drug delivery, smart diagnosis
Nanocomputing, quantum computing, molecular computing
Molecular self assembly, testing of molecular structures, adaptive molecular structures
Merger of bio and non-bio structures, synthetic biology
Figures: IBM Blue Gene/L molecular dynamics demo (www.ibm.com); 1 Watt supercomputer.

Long Term Directions: 2020 and Beyond
Many interconnected computing elements of varying size, using each other's results autonomously.
Ubiquitous computing with little human intervention, from cloud computing to nano computing.
Personal agents, intelligent spaces, nano medicine with targeted drug delivery.
We need:
Self-healing, adaptive, self-managing, trustworthy, dependable hardware and software
New computational models
New programming languages
Hardware and software reliability
Efficient parallel processing
Figure credit: www.uky.edu

Long Term Directions: 2020 and Beyond
Will hardware and software be developed separately, as they are today?
How will software be developed for nano systems? Quantum software? Molecular software? Biosoftware?
How will hardware be developed for nano systems? VHDL, Verilog HDL, C, C++, or something else?
Developing tools is critical.
Figures: simulation of protein molecules folding on a supercomputer; iron atoms on copper with electron movement.

Long Term Directions: 2020 and Beyond
By 2019 a $1000 computer will match the processing power of the human brain (Raymond Kurzweil, KurzweilAI.net, 9/1/1999).
His keynote speech at the Supercomputing Conference (SC06) in November 2006 was titled "The Coming Merger of Biological and Non-Biological Intelligence".
Singularity point?
Brain downloads possible by 2050 (Ian Pearson, Head of British Telecom's futurology unit, CNN.com, 5/23/2005).
Computers will be used as virtual brain extensions?
Direct brain-Internet link?

Long Term Directions
Figure: Hans Moravec, 1998.
You will be facing many ethical issues!
Being prepared will help!

Conclusions
Digital logic evolution will continue: faster, cheaper, smaller, lighter, lower-power and more reliable digital products.
This is due to converging research in various areas: mathematics, computer science, computer engineering, electrical engineering, mechanical engineering, physics, chemistry, material science, biology?
There will be many ethical issues.
Try to prepare! Try to be informed!