(ISCAS-2011)
Aiming at the Natural Equilibrium of Planet Earth Requires to Reinvent Computing
Reiner Hartenstein, IEEE fellow, TU Kaiserslautern
© 2010, 2011 reiner@hartenstein.de, http://hartenstein.de

(Preface) Without Computers? (Business Information System)
• Lufthansa anno 1960

(Preface) Rebooting the World
• http://www.macrowikinomics.com
• The World Economic Forum: Rebooting the World for the New Realities
• Smart graphic interfaces replacing UN, UNESCO and other bureaucracies by direct interaction with the population (mass collaboration)

Preface
• Enormous Trouble in Computing:
– Longterm Programming Crisis
– Keynotes and Panel Discussions booming
– Excessive Power Consumption

Outline (1)
• Energy consumption of Computers
• Toward Exascale Computing
• The von Neumann Syndrome
• We need to Reinvent Computing
• Conclusions

Beyond peak oil
• "6 more Saudi Arabias needed for demand predicted for 2030" [Fatih Birol, Chief Economist IEA]
• https://www.theoildrum.com/

Saudi Arabia

How many more Saudi Arabias needed?
• Rio de Janeiro

Power Consumption of the Internet
• Power consumption by internet: x30 by 2030 if trends continue
• soon 8 billion smart wireless devices
• G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8-11 Sep 2008

[Randy Katz: IEEE Spectrum, Febr.
2009]

Google Data Center at Columbia River

More Google Data Centers [datacenterknowledge.com]
• Google causing 2% electricity consumption worldwide?

Electricity Bill: a Key Issue
• Google going to sell electricity
• Patent for water-based data centers
• Already in 2005, Google's electricity bill was higher than the value of its equipment.
• Cost of a Google data center dominated only by the monthly power bill
• "The possibility of computer equipment power consumption spiraling out of control could have serious consequences for the overall affordability of computing." [L. A. Barroso, Google]

The World's largest Data Center [datacenterknowledge.com]

Microsoft Data Center at Quincy [datacenterknowledge.com]

About 2000 datacenters world-wide [datacenterknowledge.com]

Outline (2)
• Energy consumption of Computers
• Toward Exascale Computing
• The von Neumann Syndrome
• We need to Reinvent Computing
• Conclusions

Multicore: Break-through or Breakdown?
• [Figure: relative performance over the years 1994 to 2030, marking the begin of the multicore era]
• "forcing a historic transition to a parallel programming model yet to be invented" [David Callahan, Microsoft distinguished engineer]

"Intel has thrown a Hail Mary Pass" [Dave Patterson]

"... I would be panicking ..." [John Hennessy]

Exascale affordable?
• Exa-scale (10^18 computations/second) expected by 2018 [several sources]
• Power estimated (single supercomputer): 250 MW - 10 GW (2x NY City: 16 million people)

Supercomputers: no Computers?
• "In my opinion, the largest supercomputers at any time, including the first exaflops, should not be thought of as computers. ..." [Andrew Jones, vice president Numerical Algorithms Group]

Supercomputers as Scientific Instruments
• "... Their usage patterns and scientific impact are closer to major research facilities such as CERN, ITER, or Hubble." [Andrew Jones, vice president Numerical Algorithms Group]
• no reason to solve the power problem?
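The exascale power estimates quoted above can be turned into an energy budget per operation. The following is only a back-of-the-envelope sketch: it assumes that one "computation" is one operation and that the machine runs continuously all year; the function names are hypothetical and not from the talk.

```python
# Back-of-the-envelope energy budget for one exascale machine.
# Assumptions (not from the slides): 1 "computation" = 1 operation,
# and the machine draws its rated power around the clock.

OPS_PER_SECOND = 1e18          # exascale: 10^18 computations/second
POWER_LOW_W = 250e6            # lower estimate: 250 MW
POWER_HIGH_W = 10e9            # upper estimate: 10 GW
HOURS_PER_YEAR = 24 * 365


def joules_per_operation(power_watts, ops_per_second=OPS_PER_SECOND):
    """Energy spent per operation (J) at a given sustained power draw."""
    return power_watts / ops_per_second


def terawatt_hours_per_year(power_watts):
    """Electrical energy per year (TWh) at a given sustained power draw."""
    return power_watts * HOURS_PER_YEAR / 1e12


for label, power in (("low", POWER_LOW_W), ("high", POWER_HIGH_W)):
    print(label,
          f"{joules_per_operation(power):.2e} J/op,",
          f"{terawatt_hours_per_year(power):.2f} TWh/year")
```

Under these assumptions the estimates correspond to roughly 250 pJ to 10 nJ per operation, or about 2 to 88 TWh per year for a single machine.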
CERN (1)

CERN (2)

Hubble

Learning how to go Exascale
• CACHES 2011: 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems, June 4th, 2011
• held in conjunction with ICS'2011, 25th International Conference on Supercomputing, May 31 - June 4, 2011, Loews Ventana Canyon Resort, Tucson, Arizona

Outline (3)
• Energy consumption of Computers
• Toward Exascale Computing
• The von Neumann Syndrome
• We need to Reinvent Computing
• Conclusions

Potential of RC
• Reconfigurable Computing offers an overwhelming reduction of electricity consumption as well as massive speed-up factors ...

Speed-up factors are not new
• obtained by avoiding the von Neumann paradigm
• PISA project: >15000
• [Figure: published speed-up factors on a logarithmic scale from 1 to 1,000,000, grouped into image processing / pattern matching / multimedia, DSP and wireless, bioinformatics, crypto, and astrophysics (GRAPE). Applications shown: real-time face detection, video-rate stereo vision, pattern recognition, SPIHT wavelet-based image compression, Reed-Solomon decoding, Viterbi decoding, MAC, FFT, BLAST, protein identification, Smith-Waterman pattern matching, molecular dynamics simulation, DNA sequencing, DES breaking, CT imaging.]

Power save factors obtained
• Energy saving factors: ~10% of speedup
• [Figure: same speed-up chart as on the previous slide]

RC*: Demonstrating the intensive Impact
• SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster [Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]

  Application                      Speed-up factor   Savings: Power   Cost   Size
  DNA and Protein sequencing       8723              779              22     253
  DES breaking                     28514             3439             96     1116

• massively saving energy
• much less equipment needed
*) RC = Reconfigurable Computing

Drastically less Equipment needed
• For instance: a hangar full of racks replaced by a single rack without air conditioning, or ½ rack

The Reconfigurability Paradox
• Lower clock speed
• Wiring overhead
• Reconfigurability overhead
• Routing congestion
• Orders of magnitude better performance by a massively worse technology?

because of: The von Neumann Syndrome

von Neumann Syndrome
Lambert M. Surhone, Mariam T.
Tennoe, Susan F. Henssonow (ed.): Von Neumann Syndrome; Betascript Publishing 2011

von Neumann Model Critics
• Nathan's Law: Software is a gas. It expands to fill all its containers ... [Nathan Myhrvold, Microsoft Ex-CTO]
• incompetent programmers

  year   system            MLOC (millions)
  2001   Windows XP        40
  2005   Mac OS X 10.4     86
  2007   SAP NetWeaver     238

• "The von Neumann Syndrome" [C.V. "RAM" Ramamoorthy 2007; UC Berkeley]
• Critique of von Neumann is not new: E. Dijkstra 1968; J. Backus 1978; Arvind 1983; Peter G. Neumann 1985-2003; L. Savain 2006.
• Software Disaster Reports:
– N. N. 1995: THE STANDISH GROUP REPORT
– Robert N. Charette 2005: Why Software Fails; IEEE Spectrum
– Anthony Berglas 2008: Why it is Important that Software Projects Fail

All hardware but ALU is overhead: x20 inefficiency
• "GP Processors are inefficient" (data cache) [R. Hameed et al.: Understanding Sources of Inefficiency in General-Purpose Chips; 37th ISCA, June 19-23, 2010, St. Malo, France]
• x20 inefficiency: just one of several overhead layers

"The Memory Wall"
• term coined by Sally McKee
• Patterson's Law: Processor-Memory Performance Gap (grows 50% / year)
• "The overwhelming problem is data moving complexity, not processor performance." [Dr. Djordje Maric* (ETH Zurich)]
• and complex multi-MLOC instruction movement
• [Figure: CPU vs. DRAM performance on a logarithmic scale (1 to 1000), 1980 to 2008; gap >1000]

Through-Silicon-Via (TSV): reducing the memory wall?
• SIP: multiple dice
• PoP: Package on Package
• PiP: Package in Package
• TSV: Through silicon via
• reduce power consumption by 75% [Wally Rh., Micro News 2/28/2011]

Massive Overhead Phenomena

  von Neumann overhead              machine
  instruction fetch                 instruction stream
  state address computation         instruction stream
  data address computation          instruction stream
  data meet PU + other overh.       instruction stream
  i/o to/from off-chip RAM          instruction stream
  Inter PU communication            instruction stream
  message passing overhead          instruction stream
  transactional memory overh.       instruction stream
  multithreading overhead etc.      instruction stream

• proportionate to the number of processors
• overproportionate to the number of processors

von Neumann overhead vs. Reconfigurable Computing

  overhead                          von Neumann machine    datastream machine
  instruction fetch                 instruction stream     none*
  state address computation         instruction stream     none*
  data address computation          instruction stream     none*
  data meet PU + other overh.       instruction stream     none*
  i/o to/from off-chip RAM          instruction stream     none*
  Inter PU communication            instruction stream     none*
  message passing overhead          instruction stream     none*
  transactional memory overh.       instruction stream     none*
  multithreading overhead etc.      instruction stream     none*

Outline (4)
• Energy consumption of Computers
• Toward Exascale Computing
• The von Neumann Syndrome
• We need to Reinvent Computing
• Conclusions

Putting Old Ideas Into Practice
• Software Engineering Notes: http://www.acm.org/sigsoft/SEN/parnas.html, SEN vol. 24 no. 3, May 1999
• The biggest payoff will come from putting old ideas into practice (POIIP) and teaching people how to apply them properly.
[David Parnas]

Mike Flynn's Taxonomy
• M. J. Flynn: "Very high-speed computing systems"; Proc. IEEE, Vol. 54, No. 12, pp. 1901-1909, Dec. 1966.

Diana's extended Taxonomy
• I: instruction stream; D: data stream
• 4 x SISD:
– rSI: I can be reconfigured at run time, e.g. RISP
– rSD: can exchange data memory or datapath
– rSIrSD: both possible
• 4 x SIMD:
– rSI: I can be reconfigured at run time, e.g. RISP
– rMD: SIMD processors can exchange their data memories or reconfigure their datapaths
– rSIrMD: can reconfigure both, D and I, at run time
• 4 x MIMD:
– rMI: MPSoCs w. reconfigurable I
– rMD: MPSoCs w. reconfigurable D
– rMIrMD: supports both
• D. Göhringer, M. Hübner, T. Perschke, J. Becker: "A Taxonomy of Reconfigurable Single/Multi-Processor Systems-on-Chip"; International Journal of Reconfigurable Computing, Hindawi, Special Issue: Selected Papers from ReCoSoC 2008, 2009.

Software to Configware Migration
• S = R + (if C then A else B endif);
• POIIP: decision box turns into (de)multiplexer **
• [Figure: section of a very large pipe network with inputs R, A, B and an adder; the decision box on C (=1 / 0) is drawn as a (de)multiplexer selecting A or B under control of C]
** W. A. Clark: 1967 SJCC, AFIPS Conf. Proc. C. G.
Bell et al.: IEEE Trans. C-21/5, May 1972

POIIP: Loop to Pipe Mapping (on "platform FPGAs")
• loop (Memory + CPU) is mapped to a pipeline of rDPUs ((reconfigurable) DataPath Units)
• loop body: rDPU
• complex loop body: complex rDPU or pipe network inside rDPU
• nested loops: complex pipe network
• [Figure: example pipeline FMDemod, Split, LPF1 to LPF3 / HPF1 to HPF3, Gather, Adder, Speaker. Source: MIT StreamIT]

Imperative Language Twins
• MoPL: [FPL'94, Prague]; Antimachine: [COMPEURO '89]

  language category               Software Languages                        Flowware Languages
                                  (von Neumann Languages)                   (Languages f. Anti Machine)
  both deterministic              procedural sequencing: traceable, checkpointable
  operation sequence driven by    read next instruction,                    read next data item,
                                  goto (instr. addr.),                      goto (data addr.),
                                  jump (to instr. addr.),                   jump (to data addr.),
                                  instr. loop, loop nesting,                data loop, loop nesting,
                                  no parallel loops, escapes,               parallel loops, escapes,
                                  instruction stream branching              data stream branching
  state register                  program counter                           data counter(s)
  address computation             massive memory cycle overhead             overhead avoided
  instruction fetch               memory cycle overhead                     overhead avoided
  parallel memory bank access     interleaving only                         no restrictions
  language features               control flow + data manipulation          data streams only (no data manipulation)

A Heliocentric CS Model needed
• PE (Program Engineering): the Generalization of Software Engineering
• SE (Software Engineering): CPU
• FE (Flowware Engineering): asM (auto-sequencing Memory)
• CE (Configware Engineering): structures, pipe network model
• rDPU: reconfigurable Data-Path Unit; rDPA: reconfigurable Data-Path Array
*) do not confuse with "dataflow"!

A Clean Terminology, please

  program source     compilation result
  Software           instruction streams
  Flowware           data streams
  Configware         datapath structures configured

Outline (5)
• Energy consumption of Computers
• Toward Exascale Computing
• The von Neumann Syndrome
• We need to Reinvent Computing
• Conclusions

Absurdly incomprehensible abstractions are the problem in "standard" languages
• We need model-based abstractions at algorithmic level [for architecture design & debug]
• Concurrency models can operate at component architecture level rather than programming languages. [E. A. Lee]
• [E. A. Lee: Are new languages necessary for multicore? 2007]
• [E. A. Lee: The problem with threads. Computer, 2006.]
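The software-to-configware migration example from the earlier slides, S = R + (if C then A else B endif), together with the loop-to-pipe mapping, can be imitated in ordinary software. This is only a sketch under assumptions: Python generators stand in for data streams, the stage names `mux` and `adder` are hypothetical, and nothing here is reconfigurable hardware. It merely shows that the branch inside the loop body and the multiplexer inside the pipe network compute the same stream.

```python
from typing import Iterable, Iterator, List


def software_version(r: List[int], c: List[int],
                     a: List[int], b: List[int]) -> List[int]:
    """Time domain: one instruction stream, one branch per loop iteration."""
    out = []
    for i in range(len(r)):
        if c[i]:
            out.append(r[i] + a[i])
        else:
            out.append(r[i] + b[i])
    return out


def mux(c: Iterable[int], a: Iterable[int], b: Iterable[int]) -> Iterator[int]:
    """Multiplexer stage: both inputs arrive; C only selects which one passes."""
    for ci, ai, bi in zip(c, a, b):
        yield ai if ci else bi


def adder(x: Iterable[int], y: Iterable[int]) -> Iterator[int]:
    """Adder stage: adds two data streams element by element."""
    for xi, yi in zip(x, y):
        yield xi + yi


def pipe_version(r, c, a, b) -> List[int]:
    """Space domain (simulated): data streams flow through mux, then adder."""
    return list(adder(r, mux(c, a, b)))


R = [10, 20, 30, 40]
C = [1, 0, 0, 1]
A = [1, 2, 3, 4]
B = [100, 200, 300, 400]

print(software_version(R, C, A, B))  # [11, 220, 330, 44]
print(pipe_version(R, C, A, B))      # [11, 220, 330, 44]
```

In the pipe version there is no control flow left in the composition itself: the decision has become a data-driven selection, which is the point of the decision-box-to-multiplexer slide.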
Higher Abstraction Levels
• Nick Tredennick: Efforts to extend standards-based, serial programming languages with features to describe parallel constructs are likely to fail. What's more likely to succeed are languages that raise the level of abstraction in algorithm description.
• Mauricio Ayala-Rincón: Term Rewriting Systems (TRS) may raise the abstraction level up to math formulae
• TRS: powerful for better language design and design space exploration

Conclusions
• Since we have to re-write software anyway, we should do it twin-paradigm.
• We need a tool flow & education efforts supporting a twin-paradigm approach and locality awareness
• Twin-paradigm skills & basic hardware knowledge are essential qualifications for programmers.
• We urgently need a fundamental CS Education and Research Revolution for dual-rail thinking

We need "une Levée en Masse"

Thank You very much!
• too many panels and keynotes? Don't worry!
time to space mapping
• time domain: procedure domain; space domain: structure domain
• time algorithm: program loop, n time steps, 1 CPU
• space algorithm: pipeline, 1 time step, n DPUs
• Bubble Sort as time algorithm: n x k time steps, 1 "conditional swap" unit
• as space/time algorithm: k time steps, n "conditional swap" units
• Shuffle Sort
• [Figure: chains of "conditional swap" units with inputs x and y]

Architecture instead of synchro: Example
• direct time to space mapping: accessing conflicts
• modification with shuffle function: "Shuffle Sort"
• Better architecture instead of complex synchronisation: half the number of blocks + up and down of data (shuffle function); no von Neumann syndrome!
• [Figure: arrays of "conditional swap" units before and after the modification]

Understanding Complex Hetero Systems [Ed Lee]
• We must change how programmers think
• Internode communication reduces computational efficiency
• Understanding streams through complex fabrics needed
• Efficient distribution of tasks being memory limited
• Focusing on memory mapping issues and transfer modes to detect overhead and bottlenecks
• Layers of abstraction and automatic parallelization hide critical sources of, and limits to, efficient parallel execution
• essential: awareness of locality

Vertical Disintegration
• courtesy Manfred Glesner
• [Figure: timeline from 1960 to 200X]

Market Complexity
• Source: Gartner
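The time-to-space mapping of a sorting algorithm described on the slides above can be sketched in software. The sketch below is an assumption-laden illustration, not the Shuffle Sort architecture itself: the space-domain variant uses the well-known odd-even transposition sort as a stand-in, where each time step applies a row of independent "conditional swap" units, and the function names are hypothetical.

```python
def conditional_swap(x, y):
    """One 'conditional swap' unit: returns the pair in ascending order."""
    return (x, y) if x <= y else (y, x)


def bubble_sort_time(data):
    """Time domain: a single conditional-swap unit is reused step by step.

    Returns the sorted list and the number of sequential swap-unit steps.
    """
    a = list(data)
    n = len(a)
    steps = 0
    for _ in range(n - 1):
        for i in range(n - 1):
            a[i], a[i + 1] = conditional_swap(a[i], a[i + 1])
            steps += 1
    return a, steps


def transposition_sort_space(data):
    """Space domain (simulated): odd-even transposition sort.

    In each time step the swap units work on disjoint neighbour pairs, so
    in hardware they could all fire at once. Returns the sorted list and
    the number of time steps.
    """
    a = list(data)
    n = len(a)
    for step in range(n):
        start = step % 2  # alternate between even and odd neighbour pairs
        for i in range(start, n - 1, 2):
            a[i], a[i + 1] = conditional_swap(a[i], a[i + 1])
    return a, n


data = [5, 2, 9, 1, 7, 3, 8, 6]
print(bubble_sort_time(data))          # sorted list, 49 sequential steps
print(transposition_sort_space(data))  # sorted list, 8 time steps
```

Each time step of the space variant uses at most n/2 swap units, which echoes the "half the number of blocks" remark; the data movement of the actual shuffle function is not modelled here.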
Taxonomy of Twin Paradigm Programming Flows (HPRC)
• [courtesy Richard Newton]
• "The nroff of EDA" [R. N.]
• E. El-Araby et al.: Comparative Analysis of High Level Programming for Reconfigurable Computers: Methodology And Empirical Study; Proc. SPL2007, Mar del Plata, Argentina, Febr. 2007

HLL programming models

Some hardware description languages
• DeFacto
• Galadriel & Nenya
• MATCH

Some programming languages

Some languages for parallelism

More Languages
• Some datastream languages
• Some functional languages

Why Computers are important
• R. Rajkumar, I. Lee, L. Sha, J. Stankovic: Cyber-Physical Systems: The Next Computing Revolution; DAC 2010

Science alone?
• see the claims by Andrew Jones, ...

Mobile Communication

                                                         2007    2014    2020
  Worldwide radio base station sites* (millions)         3.3     7.6     11.2
  Average power consumption per site (kW)                1.7     1.3     1.1
  Total power consumption of all sites (GW)              5.6     10      12.5
  Total global RAN energy consumption (TWh)              49      84      99
  Total # of subscriptions expected (billions)           ?       6       9
  Broadband subscriptions expected (billions)            ?       2       ?
  Video streams (%)                                      ?       66      90
  Share of mobile data in total mobile traffic (%)       ?       98      99.6
  *) all standards
  Further 2007 values on the slide (row not recoverable): 71, 37.5

• A. Fehske, J. Malmodin, G. Biczók, G. Fettweis: The Global Footprint of Mobile Communications - The Ecological and Economic Perspective; IEEE Communications Magazine, Aug 2011
• Data transmission speed grows by a factor of ten every five years (cellular, local + personal area networks)
• Technologies to reduce energy consumption are a key enabler

Undersea Cable
• Google: 9,620 km submarine cable Japan-US; 1st use Febr 21, 2011
• Five fiber pairs deliver up to 4.8 Terabits per second (Tbps)
• multiple (e.g. 5) pairs of fibers: each pair has one fiber in each direction
• wavelength-division multiplexing dramatically increases fiber capacity
• >100 kilometers between repeaters
• repeater laser power consumption <25 W
• <1000 repeaters: <25 kW
• power consumption of fabrication and cable layer ships much higher
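The repeater figures on the undersea cable slide can be cross-checked with simple arithmetic. The sketch below rests on one reading of the slide that is an assumption, not a statement from the talk: every repeater site carries one laser amplifier per fiber, i.e. two per fiber pair. The function names are hypothetical.

```python
# Cross-check of the slide's "<1000 repeaters: <25 kW" bound.
# Assumption (not from the slide): one laser amplifier per fiber at
# every repeater site, i.e. two amplifiers per fiber pair.

CABLE_LENGTH_KM = 9620          # Japan-US cable length
MIN_REPEATER_SPACING_KM = 100   # ">100 kilometers between repeaters"
FIBER_PAIRS = 5
MAX_LASER_POWER_W = 25          # "repeater laser power consumption <25 W"
TOTAL_CAPACITY_TBPS = 4.8


def max_repeater_sites(length_km, spacing_km):
    """Upper bound on repeater sites when spacing is at least spacing_km."""
    return length_km // spacing_km


def max_amplifiers(sites, fiber_pairs):
    """Upper bound on laser amplifiers: two fibers per pair at each site."""
    return sites * fiber_pairs * 2


def max_total_power_kw(amplifiers, watts_each):
    """Upper bound on total repeater laser power in kW."""
    return amplifiers * watts_each / 1000


sites = max_repeater_sites(CABLE_LENGTH_KM, MIN_REPEATER_SPACING_KM)
amps = max_amplifiers(sites, FIBER_PAIRS)
print(sites, amps, max_total_power_kw(amps, MAX_LASER_POWER_W))  # 96 960 24.0
print(TOTAL_CAPACITY_TBPS / FIBER_PAIRS, "Tbps per fiber pair")
```

Under this reading the bound comes out at 960 amplifiers and 24 kW, consistent with the slide's "<1000 repeaters: <25 kW".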