Educational Applications of Supercomputing and Cyberinfrastructure
KAUST Economic Development International Symposium at ISC'11, 21 June 2011, Hamburg
Supercomputing in Science and Engineering: Economic and Technological Opportunities and Challenges
Dr. Craig A. Stewart
Associate Dean, Research Technologies
Executive Director, Pervasive Technology Institute
Indiana University
stewart@iu.edu

Outline
• Too many people, too few people
• Inspiring young people
• Examples of interesting educational activities (roughly in order of increasing number of participants)
• New opportunities for cyberinfrastructure at the campus, national, and international levels (campus bridging)
• Conclusions: education, technology, economic development
NB: License terms for these slides appear at the end.

Some definitions
• Supercomputer – a large, monolithic, tightly integrated computer
• High Performance Computer – a more general term than supercomputer, covering a wider variety of cluster types
• High Throughput Computing – systems of computers, linked by (very) low-bandwidth connections, that work on nicely (embarrassingly) parallel problems
• Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.
• eScience – large-scale science increasingly carried out by global collaborations enabled by the Internet
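The High Throughput Computing definition depends on problems that split into independent pieces needing almost no communication. A Monte Carlo estimate of π is a common classroom example of such a problem. It is our choice of illustration, not one taken from this talk.

In the sketch below, each hypothetical "work unit" draws its own random points from its own seeded generator. The only communication is summing the hit counts at the end. The names `count_hits`, `estimate_pi`, and the `mapper` parameter are invented for this sketch. The built-in `map` runs the units one after another on a single core.

```python
import math
import random
from typing import Callable, Iterable, Tuple


def count_hits(unit: Tuple[int, int]) -> int:
    """One independent work unit: sample points in the unit square and
    count how many fall inside the quarter circle of radius 1."""
    seed, samples = unit
    rng = random.Random(seed)  # private, seeded generator: units never share state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_units: int, samples_per_unit: int,
                mapper: Callable[..., Iterable[int]] = map) -> float:
    """Split the job into independent work units, run them via `mapper`,
    and combine the results with a single sum (the only 'communication')."""
    units = [(seed, samples_per_unit) for seed in range(n_units)]
    total_hits = sum(mapper(count_hits, units))
    return 4.0 * total_hits / (n_units * samples_per_unit)


if __name__ == "__main__":
    estimate = estimate_pi(n_units=8, samples_per_unit=25_000)
    print(f"pi ~ {estimate:.4f} (error {abs(estimate - math.pi):.4f})")
```

Because the units share nothing, you can pass the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `mapper` to spread them across cores. Call it under an `if __name__ == "__main__":` guard. The per-unit seeds keep the result identical however the units are scheduled.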
Technology assertions…
• “…results of this discovery upon society will be greater than the imagination of the most sanguine can now distinctly conceive.” The telegraph (The American Biblical Repository, 1838)
• “… will tremendously influence our national elections, will promote world understanding of social, racial, and economic problems, will influence our daily lives to a degree yet undreamed of.” The television (Franklin Dunham, 1956)
• “… is becoming the town square for the global village of tomorrow.” The Internet (Bill Gates, 1996)
• “The world is poised on the cusp of an economic and cultural shift as dramatic as that of the Industrial Revolution.” The WWW (Steven Levy, 1997)
• “We have technology, finally, that for the first time in human history allows people to really maintain rich connections with much larger numbers of people.” The Internet (Pierre Omidyar, 2005)

World population growth (history and predicted)
[Figure: world population in billions (0–12), from the Old Stone Age through the New Stone Age, Bronze Age, Iron Age, Middle Ages, and Modern Age to a projection for 2100; the Black Death (the Plague) and the years 1950, 1975, 2000, and 2100 are marked.]
Source: © Population Reference Bureau; and United Nations, World Population Projections to 2100 (1998).

[Figure] © Schnabel, R. 2011. ACM’s engagement in education policy. CRA Leadership Meeting, 28 Feb 2011.

Analytics market: $76B by 2015
• Information & Analytics Market: $60B in 2011; 6.4% CGR ′10–′15
  – Info Integration & MDM: $4.9B (′10); 8.3% CGR ′10–′15
  – DW DBMS: $6.9B (′11); 7.1% CGR ′10–′15
  – Analytic Applications: $7.3B (′11); 7.0% CGR ′10–′15
  – BI Platform & PM: $15.0B (′11); 7.7% CGR ′10–′15
  – Data Mgmt & IDM: $18.9B (′11); 4.2% CGR ′10–′15
  – Content Management: $6.9B (′11); 6.7% CGR ′10–′15
Source: GMV 2H10 (incl. analytic applications). © IBM, Inc.
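The analytics headline can be cross-checked against the other figures on the same slide. This is a minimal sketch under two assumptions:
• "CGR" means a growth rate compounded annually.
• The $76B figure comes from growing the $60B 2011 total for four years at 6.4%.

Under those assumptions, the listed segments sum to about $59.9B, and $60B × 1.064⁴ ≈ $76.9B, within about $1B of the headline. The slide does not say how its own figure was derived or rounded. The helper name `project` is hypothetical.

```python
def project(base: float, rate: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return base * (1.0 + rate) ** years


# Segment sizes as listed on the slide, in $B. Info Integration & MDM is a
# 2010 figure; the others are 2011 figures.
segments_billion = {
    "Info Integration & MDM": 4.9,
    "DW DBMS": 6.9,
    "Analytic Applications": 7.3,
    "BI Platform & PM": 15.0,
    "Data Mgmt & IDM": 18.9,
    "Content Management": 6.9,
}

if __name__ == "__main__":
    # The segments should add up to roughly the $60B market total.
    print(f"Sum of segments: ${sum(segments_billion.values()):.1f}B")
    # Assumption: four years (2011 -> 2015) of 6.4% growth, compounded annually.
    print(f"2015 projection: ${project(60.0, 0.064, 4):.1f}B")
```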
The conundrum
• Technology will not solve our problems by itself
• We do not have enough knowledge workers
• People in many parts of the globe do not have access to the education that would enable them to fill the jobs of today and tomorrow
• Colleges and universities are not recruiting and retaining enough students to meet the demand for graduates with CSTEM skills in general and advanced computing skills in particular

We need people comfortable with critical thinking and computational thinking
• Critical thinking skills
• Computational thinking skills
  – Conceptualizing, not programming
  – Fundamental, not rote skill
  – A way that humans, not computers, think
  – Complements and combines mathematical and engineering thinking
  – Ideas, not artifacts
  – For everyone, everywhere
From: Wing, J.M. 2006. Computational thinking. Communications of the ACM 49(3): 33–35.

Inspiration matters!
[Photo] © Estes-COX Inc. www.estesrockets.com
Two young model and model-rocket builders, shortly after claiming the world record for continuous model building at 37 hours and 40 minutes (quickly surpassed by others), April 1973.

[Photos]
• Ready, Set, Robots! Camp @ PTI, 2010
• From the 3D movie What is Cancer? by Albert William, IUPUI, SOIC, AVL, Research Technologies, UITS / PTI
• © Matthew King, student in IU professor Margaret Dolinsky's Digital Art class
• Mike Boyles, AVL, Research Technologies, UITS / PTI, giving a demo

Games are not reality

PolarGrid
Je’aime Powell, Elizabeth City State University graduate researcher, on the Greenland expedition, 2009. Photos courtesy of Keith Lehigh and Matt Link, Indiana University. Geoffrey Fox, PI, PolarGrid.

SC ‘08 Cluster Challenge
IU / Dresden team – organized on the IU side by Dr. Andrew Lumsdaine, Director, Open Systems Lab and Center for Scalable Computing, PTI, and Professor, School of Informatics and Computing; and by Matt Link and D. Scott McCaulay, Directors, Research Technologies, UITS / PTI

Guitar workshop
Photos courtesy of Rebecca Lowe, Open Systems Lab, SOIC and PTI, Indiana University. Guitar workshop sponsored by Dr. Andrew Lumsdaine, Director, Open Systems Lab and Center for Scalable Computing, PTI; and Professor, School of Informatics and Computing.

Minority Engineering Advancement Program (MEAP) @ IUPUI
• Use of the Bootable Cluster CD with the “Game of Life” to demonstrate parallel speedup
• LittleFe – a small, integrated cluster
Matt Link, Director, Research Technologies, UITS; and Associate Director, Center for Scalable Computing, PTI

LittleFe
“LittleFe is a complete 6 node Beowulf style portable computational cluster. The entire package weighs less than 50 pounds; easily travels; and sets-up in 5 minutes. Current generation LittleFe hardware includes multicore processors and GPGPU support enabling support for shared memory parallelism, distributed memory parallelism, and hybrid models. By leveraging the Bootable Cluster CD project, and the Computational Science Education Reference Desk LittleFe is a powerful, ready-to-run, computational science and parallel programming educational platform for the price of a high-end laptop.” http://LittleFe.net
Photo courtesy Charlie Peck, Earlham College. © Earlham College.

LEAD (Linked Environments for Atmospheric Discovery) & LEAD II – an example Science Gateway
Meteorology researchers used data and images generated by LEAD II while chasing tornadoes. Images © Beth Plale, Professor, School of Informatics & Computing; Director, Data to Insight Center, PTI.

WxChallenge & LEAD II
www.wxchallenge.com. Screen image © University of Oklahoma.
In support of the 2010 Vortex2 campaign, LEAD II successfully executed 214 workflows, used 109,568 CPU hours, and generated 215 GB of data and over 9,100 2D products.
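The MEAP activity described above uses the "Game of Life" to demonstrate speedup. We do not know how the Bootable Cluster CD's own version is written. What follows is a hypothetical Python sketch of the usual approach, domain decomposition:
• The grid is cut into horizontal strips.
• Each strip is updated using one "halo" row borrowed from each neighboring strip.
• The strips are reassembled into the next generation.

Within one generation the strips are independent, so they could be handed to different cores or cluster nodes. `step_serial`, `step_parallel`, `update_strip`, and the `mapper` parameter are names invented here. Cells beyond the grid edge are assumed dead.

```python
from typing import Callable, Iterable, List, Tuple

Grid = List[List[int]]  # 1 = live cell, 0 = dead cell


def live_neighbors(grid: Grid, r: int, c: int) -> int:
    """Count the live neighbors of cell (r, c); cells outside `grid` count as dead."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                total += grid[r + dr][c + dc]
    return total


def next_state(alive: int, neighbors: int) -> int:
    """Conway's rules: birth with exactly 3 neighbors, survival with 2 or 3."""
    return 1 if neighbors == 3 or (alive and neighbors == 2) else 0


def step_serial(grid: Grid) -> Grid:
    """Advance the whole grid by one generation on a single processor."""
    return [[next_state(grid[r][c], live_neighbors(grid, r, c))
             for c in range(len(grid[0]))] for r in range(len(grid))]


def update_strip(task: Tuple[Grid, int, int]) -> Grid:
    """Worker: update `count` rows starting at row `first` of a sub-grid
    that also carries the halo rows needed to count neighbors."""
    sub, first, count = task
    return [[next_state(sub[r][c], live_neighbors(sub, r, c))
             for c in range(len(sub[0]))] for r in range(first, first + count)]


def step_parallel(grid: Grid, n_strips: int,
                  mapper: Callable[..., Iterable[Grid]] = map) -> Grid:
    """Advance one generation by splitting the grid into row strips,
    each padded with one halo row per side where a neighbor strip exists."""
    rows = len(grid)
    tasks = []
    for i in range(n_strips):
        start, end = i * rows // n_strips, (i + 1) * rows // n_strips
        lo, hi = max(0, start - 1), min(rows, end + 1)  # include halo rows
        tasks.append((grid[lo:hi], start - lo, end - start))
    result: Grid = []
    for strip in mapper(update_strip, tasks):  # strips are independent within a generation
        result.extend(strip)
    return result
```

`step_parallel(grid, 4)` should return the same grid as `step_serial(grid)`. Passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `mapper` spreads the strips across cores. In Python, pickling overhead means any speedup appears only on large grids. On a cluster, the same halo rows would instead be exchanged between nodes by message passing.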
LEAD II in the Vortex2 campaign: http://pti.iu.edu/d2i/leadii-vortex2 (image © Trustees of Indiana University)

nanoHUB
Screen image © Network for Computational Nanotechnology (nanohub.org/groups/ncn).

nanoHUB usage
[Map] nanoHUB usage, September 2010. Red dots: tutorial and seminar use. Yellow dots: online simulation use. Dot size indicates the number of users at each location. nanoHUB serves over 170,000 users in 172 countries annually.
© Gerhard Klimeck, Network for Computational Nanotechnology (nanohub.org/groups/ncn). Used by permission. May not be reused without permission.

@home projects (based on BOINC)
• http://escatter11.fullerton.edu/nfs/. Image courtesy Dr. Greg Childers, © California State University, Fullerton.
• docking.cis.udel.edu. Image courtesy of Michela Taufer, GCLab, U. Delaware. © U. Delaware.

You don’t need access to a supercomputer to teach parallel computing… or data-intensive computing
• Multicore & GPUs
• LittleFe
• Cloud providers
• Citizen science – access to and participation in authentic science
physicsworld.com/cws/article/news/2738 © Institute of Physics. Reused under licensing terms at physicsworld.com/cws/copyright
Photograph © Chris Eller, Advanced Visualization Lab, Research Technologies, UITS; and PTI

Campus bridging
• Campus bridging is the seamlessly integrated use of cyberinfrastructure operated by a scientist or engineer with other cyberinfrastructure on the scientist’s campus, at other campuses, and at the regional, national, and international levels as if they were proximate to the scientist, and when working within the context of a Virtual Organization (VO) make the ‘virtual’ aspect of the organization irrelevant (or helpful) to the work of the VO.
• Campus bridging material: http://pti.iu.edu/campusbridging/
• ACCI Task Force final reports: http://www.nsf.gov/od/oci/taskforces/

Estimated computing capacity (TFLOPS)
[Figure: bar chart of estimated computing capacity, on a 0–12,000 TFLOPS scale, for: NSF Track 1; Track 2 and other major facilities; campus HPC / Tier 3 systems; workstations at Carnegie research universities; volunteer computing; commercial cloud (IaaS and PaaS).]
Data at http://hdl.handle.net/2022/13136

Single-lab biological instruments (type of instrument – model: raw image data; data products)
• Light microscopy – BD Pathway 855 Bioimager: raw image data N/A; data products 7 GB/day
• Genome sequencing – Roche 454 Life Sciences genome analyzer system: raw image data 39 GB/day; data products 9 GB/day
• Genome sequencing – Illumina-Solexa genome analyzer system: raw image data 367 GB/day; data products 100 GB/day
• Genome sequencing – ABI SOLiD 3: raw image data 238 GB/day; data products 150 GB/day
• Microarray gene expression chip reader – Molecular Devices GenePix Professional 4200A Scanner: raw image data N/A; data products 8 MB/day
• Microarray gene expression chip reader – NimbleGen Hybridization System 4 (110V): raw image data N/A; data products 300 MB/day
Several Task Force recommendations to the NSF concern hardware and networking: much more attention to data and networking challenges!

Cyberinfrastructure is infrastructure
Strategic recommendations to the NSF:
• NSF must lead the community in establishing a blueprint for a national CI
• CI software must be made more robust
National Science Foundation. Investing in America’s Future: Strategic Plan FY 2006–2011. September 2006. Available from: http://www.nsf.gov/pubs/2006/nsf0648/nsf0648.jsp

Examples of mature and maturing systems & software
• DEISA
• UK eScience Grid
• NSF CIF21 (Cyberinfrastructure Framework for 21st Century Science and Engineering)
• ROCKS (www.rocksclusters.org)
• Condor (www.condor.org)
Figure © DEISA.
http://www.deisa.eu/usersupport/user-documentation/unicore-5-in-deisa/job-submission-through-unicore-5/DEISA-UNICORE-Figure01.png/image_preview

Critical challenge: curriculum materials
http://ocw.mit.edu/index.htm. Used under Creative Commons License – Attribution-NonCommercial-ShareAlike 3.0 United States (CC BY-NC-SA 3.0) http://creativecommons.org/licenses/by-nc-sa/3.0/us/

Existing curriculum resources
• MIT Computer Science & Engineering curriculum – web.mit.edu/catalog/degre.engin.ch6.html
• ACM – www.acm.org/education/curricula-recommendations
• TCPP (Technical Committee on Parallel Processing) – tcpp.cs.gsu.edu/
  – Core courses:
    • CS1: Introduction to Computer Programming (first courses)
    • CS2: Second Programming Course in the Introductory Sequence
    • Systems: Intro Systems/Architecture Core Course
    • DS/A: Data Structures and Algorithms
    • DM: Discrete Structures/Math
  – Advanced/elective courses:
    • Arch 2: Advanced Elective Course on Architecture
    • Algo 2: Elective/Advanced Algorithm Design and Analysis
    • Lang: Programming Language/Principles (after introductory sequence)
    • SwEngg: Software Engineering
    • ParAlgo: Parallel Algorithms
    • ParProg: Parallel Programming
    • Compilers: Compiler Design
  – IMHO: the TCPP curriculum demonstrates the need for more attention to computational thinking in K-12 education

300+ students learning about Twister & Hadoop MapReduce technologies, supported by FutureGrid
July 26–30, 2010, NCSA Summer School Workshop. http://salsahpc.indiana.edu/tutorial
Locations shown: Washington University; University of Minnesota; Iowa; IBM Almaden Research Center; University of California at Los Angeles; San Diego Supercomputer Center; Michigan State Univ.; Illinois at Chicago; Notre Dame; Johns Hopkins; Penn State; Indiana University; University of Texas at El Paso; University of Arkansas; University of Florida
Slide © Judy Qiu, SOIC and SALSA Lab, PTI

Economies of scale in training
Photo courtesy Robert Quick, Research Technologies & PTI.
OSG Grid School in São Paulo, Brazil, January 2011
Image from TeraGrid EOT: Education, Outreach, and Training 2010. https://www.teragrid.org/web/news/news#2010scihigh

Great challenges, great opportunities
• Challenges
  – Matters such as human impact on the global environment will be most successfully addressed with fact-based consensus approaches.
  – More countries must have the skills and access to technology to do their own modeling.
• Cyberinfrastructure and education opportunities
  – If we can treat cyberinfrastructure more like infrastructure … we can focus on the challenging / important / fun work
  – Robust cyberinfrastructure => reusable educational materials
  – Data-intensive science creates tremendous need and opportunity in education and application
  – While we are busy improving the pipeline of talent, involving undergraduates in research may greatly increase the percentage of students already in the pipeline who pursue advanced technology careers

New economic growth opportunities
• VOs and the opportunities they provide for research
• Digital manufacturing (new opportunities in a different approach to globalization)
• Sustainable societies
• With better education in supercomputing and all forms of high performance computing, a better-prepared workforce may enable us to achieve some of the technology nirvana described at the beginning of this talk

This talk is dedicated to the memory of Truman O. Stewart

Additional information
• Droegemeier, K., B. Plale, M. Ramamurthy, and C. Mattocks. 2009. "A New Approach for Using Web Services, Grids, and Virtual Organizations in Mesoscale Meteorological Research." 25th Conference on Interactive Information Processing Systems for Meteorology, Oceanography, and Hydrology (IIPS), January 2009.
• Stewart, C.A., S. Simms, B. Plale, M. Link, D. Hancock, and G. Fox. 2010. What is Cyberinfrastructure? In: Proceedings of SIGUCCS 2010 (Norfolk, VA, 24–27 Oct 2010).
http://portal.acm.org/citation.cfm?doid=1878335.1878347
• http://www.computinginthecore.org/ – “a non-partisan advocacy coalition … to elevate computer science education to a core academic subject in K-12 education …”
• http://hubzero.org/resources/408/ – Exploring the Impact of nanoHUB.org on Research and Education
• Cohen, D. 2006. Globalization and Its Enemies. MIT Press.

Acknowledgments
• Thanks to King Abdullah University of Science and Technology for the opportunity to present today (Through Inspiration, Discovery indeed!)
• Malinda Lingwall for editing, graphical work, and fact-finding and fact-checking
• Ready, Set, Robots! Camp: Daphne Siefert-Herron, Kurt Seiffert, Kristy Kallback-Rose, Danko Antolovic, Jenett Tillotson, Therese Miller
• MEAP: David Hancock, Andrew Arenson, Rich Knepper, Kurt Seiffert, Matt Link (Research Technologies, UITS, Research Technologies, PTI); Patrick Gee, Mark Russell
• Thanks to all of the IU Research Technologies staff and Pervasive Technology Institute students, staff, and faculty who have led or been involved in the IU projects described here

Acknowledgments
• Many of the scientific workflow examples here use the IU Data Capacitor – a project led by Steve Simms, Research Technologies, UITS, & PTI. http://pti.iu.edu/dc/ NSF CNS 05-21433
• LEAD: Beth Plale, IU (SOIC-PTI), funded by NSF 0331480
• PolarGrid: NSF 0723054 (G. Fox, PI)
• FutureGrid: NSF 0910812 (G. Fox, PI)
• nanoHUB: nanoHUB.org is operated by the Network for Computational Nanotechnology (NCN). NCN was funded by the National Science Foundation (NSF) under various grants. Development and support of nanoHUB are also supported in part by the HUBzero consortium, of which IU is a member.
• Campus Bridging: NSF 040777, 1059812, 0948142, 1002526, 0829462
• LittleFe: Support from TeraGrid, SC Conference, Intel Corporation, and Earlham College.
• Lilly Endowment for its support of IU through INGEN, METACyt, and the Pervasive Technology Institute
• Tevfik Kosar, who as Chair of DIDC ‘10 invited me to give the keynote presentation at the Third International Workshop on Data Intensive Distributed Computing (DIDC'10): “It’s not a data deluge – it’s worse than that.” Several slides from that talk are reused here. The original talk is available at: http://hdl.handle.net/2022/13195
• Thanks to those individuals who gave permission to use images presented in this talk
• Any opinions presented here are those of the presenter and do not necessarily represent the opinions of the National Science Foundation, the Lilly Endowment, the NSF ACCI, the NSF ACCI Task Force on Campus Bridging, or any other funding agencies or organizations

License terms
• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Please cite this presentation as: Stewart, C.A. “Educational Applications of Supercomputing and Cyberinfrastructure.” Presentation at KAUST Economic Development International Symposium at ISC'11, 21 June 2011. Available from: http://hdl.handle.net/2022/13365
• Except where otherwise noted, contents of this presentation are copyright 2011 by the Trustees of Indiana University.
• This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute, and transmit the work) and to remix (to adapt the work) under the following conditions: Attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Questions?
And thank you for your kind attention…