IPDPS Looking Back Panel
Uzi Vishkin, University of Maryland

Moderator: What has gone well?
IPDPS a big success. Has become top facilitator for:
• Specific technical contributions, and
• Open debate of challenges – e.g., this panel
Warm congrats to Viktor and all others!

What also has gone well
• Parallel PRAM algorithmic theory, second in magnitude only to the serial algorithmic theory
• Won the “battle of ideas” in the 1980s. Repeatedly:
• Challenged without success → no real alternative!

Dream opportunity
Limited interest in parallel computing evolved into quest for general-purpose parallel computing in mainstream machines. Example: many-core desktops.
So much for the good news.
Are we doing everything we can to ensure that manycores are not rejected by programmers?
Recall Einstein’s observation: “A perfection of means, and confusion of aims, seems to be our main problem.”

“If you find yourself in a hole, stop digging”
Moderator: What has gone wrong
We found ourselves in a hole: most programmers can’t handle today’s (multicore) desktops
Moderator: What was surprising
We keep digging

Why are we in trouble
1940’s: Stored-program & program-counter. For serial comp: knowledge of algorithms was low priority for arch (& knowledge of arch was low priority for alg people).
No such agreed “bridge” for many-cores. Still:
- Industry realizes the need to reinvent computing for parallelism but is stuck with short-term pressures/culture. Academia hold on.
- Education: Architects pay too limited attention to (this time parallel) algorithms. How will they know to build machines that are easy to program? Instead, funding guides using problematic designs

What follows
- Are many-core architectures doomed to mismatch parallel algorithms and ease-of-programming (EoP)?
- What difference can a matching arch make?
- What is feasible?

How come that most programmers can’t handle today’s (multicore) desktops?
Hypothesis: Flawed architecture foundation
Origin: ‘build-first figure-out-how-to-program-later’
Parallel languages: fitted flawed architectures, then standardized
Who can save the field and promote the aim of ease-of-programming (EoP)?
Industry (perfecting means)
- Follow-up architectures fit language standards → remain flawed
- Insufficient competition
Academia (perfecting means)
- Consider a vendor-backed flawed system. Wonderful opportunity for our originality-seeking publications culture:
  * The simplest problem requires creativity → More papers
  * Cite one another if on similar systems → high # citations coupled with ‘industry impact’
- Ultimate job security – By the time the ink dries on these papers, next flawed ‘modern’ ‘state-of-the-art’ system. Culture of short-term impact

Anecdotal Validation (?)
• Breadth-first-search (BFS) example. 42 students: joint UIUC/UMD course
  - <1X speedups using OpenMP on 8-processor SMP
  - 7x-25x speedups on 64-processor XMT FPGA prototype [Built at UMD]
  What’s the big deal of 64 processors beating 8? Silicon area of 64 XMT processors ~= 1-2 SMP processors
• Questionnaire: Rank approaches for achieving (hard) speedups. All students, but one: XMTC ahead of OpenMP
• Order-of-magnitude advantage on teachability (MS, HS & up, SIGCSE’10)
• SPAA’11: >100X speedup on max-flow relative to 2.5X on GPU (IPDPS’10)
• Fleck/Kuhn: research too esoteric to be reliable → exoteric validation!

What has gone wrong
“Only heroic programmers can exploit the vast parallelism in current machines” – The Future of Computing Performance: Game Over or Next Level?, Report by NAE 2011.
Conclusion: Fund power.. Reward alert: Try to publish a paper boasting easy to obtain results
EoP: 1. Badly needed. Yet, 2. A lose-lose proposition.

Parallel Random-Access Machine/Model
PRAM: n synchronous processors all having unit time access to a shared memory.
Reactions
You got to be kidding, this is way:
- Too easy
- Too difficult: Why even mention processors?
  What to do with n processors? How to allocate processors to instructions?

Immediate Concurrent Execution
‘Work-Depth framework’ SV82, Adopted in Par Alg texts [J92,KKT01].
ICE basis for architecture specs: V, Using simple abstraction to reinvent computing for parallelism, CACM 1/2011
Similar to role of stored-program & program-counter in arch specs for serial comp

Algorithms-aware many-core is feasible
Algorithms → Programming → Programmer’s workflow
- Rudimentary yet stable compiler
PRAM-On-Chip HW Prototypes
- 64-core, 75MHz FPGA of XMT [SPAA98..CF08]
- Toolchain: Compiler + simulator, HIPS’11
- 128-core interconnection network. IBM 90nm: 9mmX5mm, 400 MHz [HotI07]
- FPGA design → ASIC
  • IBM 90nm: 10mmX10mm
  • 150 MHz
Architecture scales to 1000+ cores on-chip
XMT homepage: www.umiacs.umd.edu/users/vishkin/XMT/index.shtml or search: ‘XMT’

Where are your specs? What is your par alg abstraction?
• ‘First-specs then-build’ is “not uncommon”.. for engineering
• I see only 2 options for architects:
  A. 1. Go through parallel algorithms immersion
     2. Develop abstraction that meets EoP
     3. Develop specs
     4. Build
  B. Start from abstraction with proven EoP

Sociologists of science
• Debates between adherents of different thought styles consist almost entirely of misunderstandings. Members of both parties are talking of different things (though they are usually under an illusion that they are talking about the same thing). They are applying different methods and criteria of correctness (although they are usually under an illusion that their arguments are universally valid and if their opponents do not want to accept them, then they are either stupid or malicious).
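
Illustration (not from the slides): Work-Depth view of BFS
The Work-Depth / ICE framework and the BFS exercise mentioned earlier can be made concrete with a minimal sketch. It is written in Python rather than XMTC, and it only simulates the concurrent rounds serially, so it says nothing about real speedups. The function name `bfs_work_depth`, the adjacency-dictionary input, and the way work is counted (one unit per vertex expansion and per edge inspection) are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple


def bfs_work_depth(adj: Dict[int, List[int]], source: int) -> Tuple[Dict[int, int], int, int]:
    """Level-synchronous BFS, simulated serially (hypothetical helper).

    Each iteration of the while-loop stands for one ICE-style round in which
    all frontier vertices are expanded "at once". Returns (level, depth, work):
    depth is the number of rounds, work is the total number of operations
    (vertex expansions plus edge inspections) summed over all rounds.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    work = 0
    while frontier:
        next_frontier = []
        # Conceptually, every u in the frontier is handled concurrently.
        for u in frontier:
            work += 1                      # expanding vertex u
            for v in adj.get(u, []):
                work += 1                  # inspecting edge (u, v)
                if v not in level:
                    # In a truly concurrent round several vertices could try to
                    # claim v; the serial simulation lets the first one win.
                    level[v] = level[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level, depth, work


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    levels, depth, work = bfs_work_depth(graph, 0)
    print(levels, depth, work)
```

In this style the programmer states only what can happen in each round. The question raised above, how to allocate processors to instructions, is left to the system, and the two numbers of interest are the total work and the number of rounds (depth).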