Cyberinfrastructure in the Next Decade

Perspectives on Cyberinfrastructure
Daniel E. Atkins
atkins@umich.edu
Professor, University of Michigan
School of Information & Dept. of EECS
October 2002
Input to Panel
• 62 presentations at invitational public testimony
sessions
• 700 responses to a community-wide survey
• review of dozens of prior relevant reports; scores
of unsolicited emails and phone calls
• 250 pages of written critique from 60 reviewers of
an early draft of this report
• hundreds of hours of deliberation and discussion
between Panel members
• The members of the Panel have backgrounds in
areas widely relevant to creating, managing, and
using advanced cyberinfrastructure.
Report Flow
(Cyber) infrastructure
• The term infrastructure has been used since the 1920s to refer collectively to the roads, bridges, rail lines, and similar public works that are required for an industrial economy to function.
• The more recent term cyberinfrastructure refers to infrastructure based on computing, information, and communication technology, which is increasingly required for the discovery, dissemination, and preservation of knowledge.
• Traditional infrastructure is required for an
industrial economy. Cyberinfrastructure is
required for an information economy.
Cyberinfrastructure: the Middle Layer
• Applications in science and engineering research and education
• Cyberinfrastructure: hardware, software, personnel, services, institutions
• Base technology: computation, storage, communication
Enabling and Motivating a CI Initiative
• ASC
• PACIs
• Pittsburgh TCS
• Distributed Terascale Facility
• Some ITR Projects
• Digital Library Initiatives
• Networking Initiatives
• Middleware Initiatives
• Other CISE Research
• Collaboratories
• Scientific Data Collection/Curation
• Initiatives in non-CISE Directorates
• NSB Research Infrastructure Review
• Initiatives in DOE, NIH, DOD, NASA, …
• International Initiatives: UK e-science, Earth Simulator, EU Grid & 6th Framework
-> Cyberinfrastructure Initiative
Trends & Issues
• Components
  – Circuit speed is expected to flatten in about 6 years; most further increases will then come from improving chip density and massive parallelism. New technology curves?
  – Disk capacity is increasing 60–100% per year.
  – Networking: 1.6 terabits/s demonstrated in labs on a single fiber (40 channels at 40 gigabits/s each). Ubiquitous wireless.
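The growth rates and link figures above can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming constant compound growth for disk capacity and assuming the 40 channels on one fiber (wavelength-division multiplexing, our label rather than the slide's) simply add their rates. The function names are hypothetical.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for capacity to double under constant compound annual growth."""
    return math.log(2) / math.log(1.0 + annual_growth)


def growth_factor(annual_growth: float, years: int) -> float:
    """Total capacity multiple after `years` of compound growth."""
    return (1.0 + annual_growth) ** years


def fiber_aggregate_tbps(channels: int, gbps_per_channel: float) -> float:
    """Aggregate single-fiber rate in Tb/s, assuming channel rates simply add."""
    return channels * gbps_per_channel / 1000.0  # 1 Tb/s = 1000 Gb/s


if __name__ == "__main__":
    # The slide's range for disk capacity growth: 60-100% per year.
    for rate in (0.60, 1.00):
        print(
            f"{rate:.0%}/year: doubles every {doubling_time_years(rate):.2f} years, "
            f"x{growth_factor(rate, 10):.0f} over a decade"
        )
    # 40 wavelengths at 40 Gb/s each on one fiber.
    print(f"40 x 40 Gb/s = {fiber_aggregate_tbps(40, 40):.1f} Tb/s")
```

Under these assumptions, 60% annual growth doubles disk capacity roughly every 1.5 years (about 110x in a decade), while 100% doubles it every year (1024x in a decade).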
Computational Diversity
• Capability, not just capacity: technology, policy, tools.
• Still need some center-based, leading-edge supercomputers.
• On-demand supercomputing, not just batch.
Content
• Digital everything; exponential growth; conversion
and born-digital.
• S&E literature is digital. Microfilm -> digital for
preservation. Digital libraries are real and getting
better.
• Distributed (global scale), multi-media, multidisciplinary observation. Huge volume.
• Need for large-scale, enduring, professionally
managed/curated data repositories.
• New modes of scholarly communication emerging.
• IP, openness, ownership, privacy, and security issues.
Converging Streams of Activity
• GRIDS (broadly defined)
• E-science
• ITFRU
• Scholarly communication in the digital age
• Science-driven pilots (not using above labels)
-> CI-enabled Science & Engineering Research & Education
Futures: The Computing Continuum
[Diagram with two directions, Building Up and Building Out. Labels: Laboratory Terascale Systems, National Petascale Systems, Petabyte Archives, Terabit Networks, Collaboratories, Smart Objects, Ubiquitous Sensor/Actuator Networks, Contextual Awareness, Responsive Environments, Ubiquitous Infosphere; Science, Policy and Education.]
Components of CI-enabled science & engineering
A broad, systemic, strategic conceptualization
• High-performance computing for modeling, simulation, data processing/mining
• Humans
• Individual & group interfaces & visualization
• Collaboration services
• Instruments for observation and characterization
• Global connectivity
• Physical world
• Facilities for activation, manipulation and construction
• Knowledge management institutions for collection building and curation of data, information, literature, digital objects
Community Planning Guidance: Examples from Geosciences
• Consultation with environmental community leaders, NSF, Nov. 19, 2001
Cyberinfrastructure Enabled Science
NVO and ALMA
Climate Change
ATLAS and CMS
LIGO
The number of nation-scale projects is growing rapidly!
More Diversity, New Devices, New Applications
[Images: earthquake and bridge; digital sky]
• Sensors
• Personalized Medicine
• Wireless networks
• Knowledge from Data
• Instruments
Four LHC Experiments: The Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCb
Higgs + New particles; Quark-Gluon Plasma; CP Violation
• Data stored: ~40 petabytes/year and up
• CPU: 0.30 petaflops and up
• 0.1 exabyte (2007) to 1 exabyte (~2012?) for the LHC experiments (1 EB = 10^18 bytes)
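The storage figures on this slide mix petabytes and exabytes. The sketch below is a minimal unit check, assuming decimal SI prefixes (1 PB = 10^15 bytes) and reading the slide's dates as 0.1 EB in 2007 and 1 EB around 2012. The helper names are hypothetical, and the growth rate is derived only from those two assumed data points.

```python
PETABYTE = 10**15  # bytes (decimal SI prefix, assumed)
EXABYTE = 10**18   # bytes, matching the slide's 1 EB = 10^18 bytes


def pb_to_eb(petabytes: float) -> float:
    """Convert petabytes to exabytes (1 EB = 1000 PB)."""
    return petabytes * PETABYTE / EXABYTE


def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


def years_to_accumulate(target_pb: float, pb_per_year: float) -> float:
    """Years to store `target_pb` at a constant, non-growing annual rate."""
    return target_pb / pb_per_year


if __name__ == "__main__":
    print(f"40 PB = {pb_to_eb(40):.2f} EB")
    # Assumed reading of the slide: 0.1 EB in 2007, 1 EB around 2012.
    g = implied_annual_growth(0.1, 1.0, 2012 - 2007)
    print(f"0.1 EB -> 1 EB in 5 years implies ~{g:.0%} growth per year")
    print(f"0.1 EB at a flat 40 PB/year takes {years_to_accumulate(100, 40):.1f} years")
```

With these readings, a tenfold rise over five years implies roughly 58% compound growth per year, and a flat 40 PB/year would take 2.5 years to fill 0.1 EB.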
Crab Nebula in 4 spectral regions: X-ray, optical, infrared, radio
Cyberinfrastructure is a First-Class Tool for Science
Example: Network for Earthquake Engineering Simulation (NEES)
• Remote users
• Laboratory equipment
• Field equipment
• Instrumented structures and sites
• High-performance network(s)
• Curated data repository
• Leading-edge computation
• Global connections
Need highly coordinated, persistent, major investment in…
• Research and development (CI as object of R&D)
  – Base technology (CISE)
  – CI components & systems (CISE & SEB)
  – Science-driven pilots (CISE, SEB, all others)
• Operational services
  – Distributed but connected (Grid)
  – Exploit commonality, interoperability
  – Advanced, leading-edge but…
  – Robust, predictable, responsive, persistent
• Domain science communities (CI in service of R&D)
  – Specific applications of CI to revolutionize research (pilot -> operational)
  – Required, not optional.
  – New things, new ways. Empowerment, training, retraining. X-informatics.
• Education and broader engagement
  – Multi-use: education, public science literacy
  – Equity of access
  – Pilots of broader application: ITFRU, industry, workforce & economic development
Shared Opportunity and Responsibility
• All NSF communities
• Multi-agency
• Industry
• International
From Prime Minister Tony Blair’s Speech to the Royal Society (23 May 2002)
• What is particularly impressive is the way that scientists are now undaunted by important complex phenomena. Pulling together the massive power available from modern computers, the engineering capability to design and build enormously complex automated instruments to collect new data, with the weight of scientific understanding developed over the centuries, the frontiers of science have moved into a detailed understanding of complex phenomena ranging from the genome to our global climate. Predictive climate modelling covers the period to the end of this century and beyond, with our own Hadley Centre playing the leading role internationally.
• The emerging field of e-science should transform this kind of work. It's significant that the UK is the first country to develop a national e-science Grid, which intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.
• One of the pilot e-science projects is to develop a digital mammographic archive, together with an intelligent medical decision support system for breast cancer diagnosis and treatment. An individual hospital will not have supercomputing facilities, but through the Grid it could buy the time it needs. So the surgeon in the operating room will be able to pull up a high-resolution mammogram to identify exactly where the tumour can be found.
Bottom line
• NSF has a unique responsibility to provide leadership for the Nation in an initiative to revolutionize science and engineering research by capitalizing on cyberinfrastructure opportunities.
  – A nascent revolution has begun. Demand is here and growing. The time is now (opportunities & opportunity costs).
  – Many prior investments (projects, initiatives, centers) are a key resource to build upon.
  – Now need sanction, leadership, and empowerment through significant new funding and effective coordination.
  – Need very broad (synergistic) participation by many communities with complementary needs and expertise.
  – Need appropriate leadership and management structure.
  – Need incremental funding of $1B/year (continuing).
Incremental budget estimates
• Our estimates are based on
  – current and previous NSF activities
  – testimonies
  – other agencies’ programs in related areas
  – activities in other countries
  – explicit input from the community on Draft 1.0
Budget Overview (Incremental, $ Millions)
• Fundamental research to advance CI: $60
• Application of CI to advance S&E research: $200
• Provision of operational CI: $660
• Information and data support: $200
• TOTAL: $1020
The INITIATIVE = ???
• 1. Advanced Cyberinfrastructure Initiative (ACI)
• 2. Advanced Application and Cyberinfrastructure Initiative (AACI)
• 3. Advanced Cyberinfrastructure and Application Initiative (ACAI)
• 4. Advanced Digital Science and Engineering (ADSE)
• 5. eScience Initiative (eSI)
• 6. Digital Science for the Future (DSF)
• 7. Digital Science and Engineering for the Future (DSEF)
• 8. New Science and Engineering Research (NSER)
• 9. Revolutions in Digital Exploration (RIDE)
• 10. Digital Science and Engineering Exploration (D-SEE)
END
Need Appropriate Organizational Structure
• An INITIATIVE OFFICE with a highly placed, credible leader empowered to
  – Initiate competitive, discipline-driven, path-breaking applications of cyberinfrastructure within NSF that contribute to the shared goals of the INITIATIVE.
  – Coordinate policy and allocations across fields and projects, with participants across NSF directorates, Federal agencies, and international e-science.
  – Develop high-quality middleware and other software that is essential and special to scientific research.
  – Manage individual computational, storage, and networking resources at least 100x larger than individual projects or universities can provide.