Perspectives on Cyberinfrastructure
Daniel E. Atkins, atkins@umich.edu
Professor, University of Michigan School of Information & Dept. of EECS
October 2002

Input to Panel
• 62 presentations at invitational public testimony sessions
• 700 responses to a community-wide survey
• Review of dozens of prior relevant reports; scores of unsolicited emails and phone calls
• 250 pages of written critique from 60 reviewers of an early draft of this report
• Hundreds of hours of deliberation and discussion among Panel members
• Panel members have backgrounds in areas widely relevant to creating, managing, and using advanced cyberinfrastructure.

Report Flow
[Figure: report flow diagram]

(Cyber)infrastructure
• The term infrastructure has been used since the 1920s to refer collectively to the roads, bridges, rail lines, and similar public works that an industrial economy requires in order to function.
• The recent term cyberinfrastructure refers to infrastructure based on computer, information, and communication technology that is increasingly required for the discovery, dissemination, and preservation of knowledge.
• Traditional infrastructure is required for an industrial economy; cyberinfrastructure is required for an information economy.
Cyberinfrastructure: the Middle Layer
• Applications in science and engineering research and education
• Cyberinfrastructure: hardware, software, personnel, services, institutions
• Base technology: computation, storage, communication

Enabling and Motivating a CI Initiative
[Diagram: the following activities converge on a CyberInfrastructure Initiative]
• ASC
• PACIs
• Pittsburgh TSC
• Distributed Terascale Facility
• Some ITR projects
• Digital Library Initiatives
• Networking Initiatives
• Middleware Initiatives
• Other CISE research
• Collaboratories
• Scientific data collection/curation initiatives in non-CISE directorates
• NSB Research Infrastructure Review
• Initiatives in DOE, NIH, DOD, NASA, …
• International initiatives: UK e-science, Earth Simulator, EU Grid & 6th Framework
-> CyberInfrastructure Initiative

Trends & Issues
• Components
  - Circuit speed increases flatten in about 6 years; thereafter, most gains come from improving chip density and massive parallelism. New technology curves?
  - Disk capacity increasing 60-100% per year.
  - Networking: 1.6 terabits/sec running in labs on a single fiber (40 channels at 40 gigabits/sec).
  - Ubiquitous wireless.

Computational Diversity
• Capability, not just capacity: technology, policy, tools.
• Still need some center-based, leading-edge supercomputers.
• On-demand supercomputing, not just batch.

Content
• Digital everything; exponential growth; conversion and born-digital.
• S&E literature is digital. Microfilm -> digital for preservation. Digital libraries are real and getting better.
• Distributed (global-scale), multimedia, multidisciplinary observation. Huge volume.
• Need for large-scale, enduring, professionally managed/curated data repositories.
• New modes of scholarly communication emerging.
• IP, openness, ownership, privacy, and security issues

Converging Streams of Activity
[Diagram: the following streams converge on CI-enabled Science & Engineering Research & Education]
• Grids (broadly defined)
• E-science
• ITFRU
• Scholarly communication in the digital age
• Science-driven pilots (not using above labels)

Futures: The Computing Continuum
[Diagram: a continuum extending in two directions, Building Up and Building Out, grounded in Science, Policy and Education]
• Petabyte archives
• National petascale systems
• Laboratory terascale systems
• Terabit networks
• Collaboratories
• Responsive environments
• Smart objects
• Ubiquitous sensor/actuator networks
• Contextual awareness
• Ubiquitous infosphere

Components of CI-enabled Science & Engineering
A broad, systemic, strategic conceptualization
• Humans
• Individual & group interfaces & visualization
• Collaboration services
• High-performance computing for modeling, simulation, data processing/mining
• Instruments for observation and characterization
• Facilities for activation, manipulation, and construction
• Knowledge management institutions for collection building and curation of data, information, literature, digital objects
• Global connectivity
• Physical world

Community Planning Guidance: Examples from Geosciences
Consultation with environmental community leaders, NSF, Nov. 19, 2001

Cyberinfrastructure-Enabled Science
• NVO and ALMA
• Climate change
• ATLAS and CMS
• LIGO
The number of nation-scale projects is growing rapidly!

More Diversity, New Devices, New Applications
[Figures: earthquake and bridge; digital sky]
• Sensors
• Wireless networks
• Instruments
• Personalized medicine
• Knowledge from data

Four LHC Experiments: The Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCb
• Physics goals: Higgs + new particles; quark-gluon plasma; CP violation
• Data stored: ~40 petabytes/year and up; CPU: 0.30 petaflops and up
• 0.1 to 1 exabyte (1 EB = 10^18 bytes) (2007) (~2012?) for the LHC experiments

[Figure: Crab Nebula in four spectral regions: X-ray, optical, infrared, radio]

Cyberinfrastructure is a First-Class Tool for Science
[Diagram: Network for Earthquake Engineering Simulation, linking remote users, laboratory equipment, instrumented structures and sites, field equipment, a curated data repository, and leading-edge computation through high-performance network(s) and global connections]

Need highly coordinated, persistent, major investment in…
• Research and development (CI as object of R&D)
  - Base technology (CISE)
  - CI components & systems (CISE & SEB)
  - Science-driven pilots (CISE, SEB, all others)
• Operational services
  - Distributed but connected (Grid)
  - Exploit commonality, interoperability
  - Advanced, leading-edge but… robust, predictable, responsive, persistent
• Domain science communities (CI in service of R&D)
  - Specific application of CI to revolutionizing research (pilot -> operational)
  - Required, not optional. New things, new ways.
  - Empowerment, training, retraining. X-informatics.
• Education and broader engagement
  - Multi-use: education, public science literacy
  - Equity of access
  - Pilots of broader application: ITFRU, industry, workforce & economic development

Shared Opportunity and Responsibility
• All NSF communities
• Multi-agency
• Industry
• International

From Prime Minister Tony Blair’s Speech to the Royal Society (23 May 2002)
• What is particularly impressive is the way that scientists are now undaunted by important complex phenomena. Pulling together the massive power available from modern computers, the engineering capability to design and build enormously complex automated instruments to collect new data, with the weight of scientific understanding developed over the centuries, the frontiers of science have moved into a detailed understanding of complex phenomena ranging from the genome to our global climate.
Predictive climate modelling covers the period to the end of this century and beyond, with our own Hadley Centre playing the leading role internationally.
• The emerging field of e-science should transform this kind of work. It's significant that the UK is the first country to develop a national e-science Grid, which intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.
• One of the pilot e-science projects is to develop a digital mammographic archive, together with an intelligent medical decision support system for breast cancer diagnosis and treatment. An individual hospital will not have supercomputing facilities, but through the Grid it could buy the time it needs. So the surgeon in the operating room will be able to pull up a high-resolution mammogram to identify exactly where the tumour can be found.

Bottom Line
• NSF has a unique responsibility to provide leadership for the Nation in an initiative to revolutionize science and engineering research by capitalizing on cyberinfrastructure opportunities.
  - A nascent revolution has begun.
  - Demand is here and growing.
  - The time is now (opportunities & opportunity costs).
  - Many prior investments (projects, initiatives, centers) are a key resource to build upon.
  - Now need sanction, leadership, and empowerment through significant new funding and effective coordination.
  - Need very broad (synergistic) participation by many communities with complementary needs and expertise.
  - Need appropriate leadership and management structure.
  - Need incremental funding of $1B/year (continuing).
Incremental Budget Estimates
• Our estimates are based on:
  - current and previous NSF activities
  - testimonies
  - other agencies’ programs in related areas
  - activities in other countries
  - explicit input from the community on Draft 1.0

Budget Overview (incremental, $ millions)
• Fundamental research to advance CI: $60
• Application of CI to advance S&E research: $200
• Provision of operational CI: $660
• Information and data support: $200
• TOTAL: $1020

The INITIATIVE = ???
1. Advanced Cyberinfrastructure Initiative (ACI)
2. Advanced Application and Cyberinfrastructure Initiative (AACI)
3. Advanced Cyberinfrastructure and Application Initiative (ACAI)
4. Advanced Digital Science and Engineering (ADSE)
5. eScience Initiative (eSI)
6. Digital Science for the Future (DSF)
7. Digital Science and Engineering for the Future (DSEF)
8. New Science and Engineering Research (NSER)
9. Revolutions in Digital Exploration (RIDE)
10. Digital Science and Engineering Exploration (D-SEE)

Need Appropriate Organizational Structure
• An INITIATIVE OFFICE with a highly placed, credible leader empowered to:
  - Initiate competitive, discipline-driven, path-breaking applications of cyberinfrastructure within NSF that contribute to the shared goals of the INITIATIVE.
  - Coordinate policy and allocations across fields and projects, with participants across NSF directorates, Federal agencies, and international e-science.
  - Develop high-quality middleware and other software that is essential and special to scientific research.
  - Manage individual computational, storage, and networking resources at least 100x larger than individual projects or universities can provide.