Cyberinfrastructure
From Dreams to Reality
Deborah L. Crawford
Deputy Assistant Director of NSF for Computer & Information Science & Engineering
Workshop for eInfrastructures
Rome, December 9, 2003
Daniel E. Atkins, Chair, University of Michigan
Kelvin K. Droegemeier, University of Oklahoma
Stuart I. Feldman, IBM
Hector Garcia-Molina, Stanford University
Michael L. Klein, University of Pennsylvania
David G. Messerschmitt, University of California at Berkeley
Paul Messina, California Institute of Technology
Jeremiah P. Ostriker, Princeton University
Margaret H. Wright, New York University
http://www.communitytechnology.org/nsf_ci_report/
Setting the Stage

"In summary then, the opportunity is here to create cyberinfrastructure that enables more ubiquitous, comprehensive knowledge environments that become functionally complete ... in terms of people, data, information, tools, and instruments, and that include unprecedented capacity for computation, storage, and communication. They can serve individuals, teams and organizations in ways that revolutionize what they do, how they do it, and who can participate."
- The Atkins Report
Desired Characteristics
• Science- and engineering-driven
• Enabling discovery, learning and innovation
• Promising economies of scale and scope
• Supporting data-, instrumentation-, compute- and collaboration-intensive applications
• High-end to desktop
• Heterogeneous
• Interoperable - enabled by a collection of reusable, common building blocks
Integrated Cyberinfrastructure
meeting the needs of a community of communities

(Layered architecture diagram, top to bottom:)
• Discovery, Learning & Innovation
• Applications: Environmental Science, High Energy Physics, Proteomics/Genomics, Learning
• Science of CI; Training & Workforce Development
• Science Gateways
• CI Commons: CI Services & Middleware, Hardware, Distributed Resources
Overarching Principles
• Enrich the portfolio
  • Demonstrate transformative power of CI across S&E enterprise
  • Empower range of CI users – current and emerging
  • System-wide evaluation and CI-enabling research informs progress
• Develop intellectual capital
  • Catalyze community development and support
  • Enable training and professional development
  • Broaden participation in the CI enterprise
• Enable integration and interoperability
  • Develop shared vision, integrating architectures, common investments
  • Promote collaboration, coordination and communication across fields
  • Share promising technologies, practices and lessons learned
CI Planning - A Systems Approach

(Diagram components, annotated:)
• S&E Gateways - domain-specific strategic plans: technology/human capital roadmaps; gap and barrier analyses (policy, funding, ...)
• CI-enabling Research
• Integrative CI - a "system of systems"
• Core Activities - system-wide activities: education and training, (inter)national networks, capacity computing, Science of CI
• CI Commons - compute-centric, information-intensive, instrumentation-enabling, interactive-intensive
Baselining NSF CI Investments
• Core (examples)
  • Protein Databank
  • Network for Earthquake Engineering Simulation
  • International Integrated Microdata Access System
  • Partnerships for Advanced Computational Infrastructure
  • Circumarctic Environmental Observatory Network
  • National Science Digital Library
  • Pacific Rim GRID Middleware
• Priority Areas (examples)
  • Geosciences Network
  • international Virtual Data Grid Laboratory
  • Grid Research and Applications Development
... and others too numerous to mention (~$400M in FY '04)
CI Building Blocks

Partnerships for Advanced Computational Infrastructure (PACI)
– Science Gateways (Alpha projects, Expeditions)
– Middleware Technologies (NPACKage, ATG, Access Grid in a Box, OSCAR)
– Computational Infrastructure

Extensible Terascale Facility (TERAGRID)
– Science Gateways (value-added of integrated system approach)
– Common Teragrid Software Stack (CTSS)
– Compute engines, Data, Instruments, Visualization

NSF Middleware Initiative (NMI)
– Production software releases
– GridsCenter Software Suite, etc.

Early Adopters
– Grid Physics Network (GriPhyN), international Virtual Data Grid Laboratory (iVDGL)
– National Virtual Observatory
– Network for Earthquake Engineering Simulation (NEES)
– Bio-Informatics Research Network (BIRN)
Extensible Terascale Facility (TERAGRID)
A CI Pathfinder
• Pathfinder Role
– integrated with extant CI capabilities
– clear value-added
• supporting a new class of S&E applications
• Deploy a balanced, distributed system
– not a “distributed computer” but rather
– a distributed “system” using Grid technologies
• computing and data management
• visualization and scientific application analysis
• remote instrumentation access
• Define an open and extensible infrastructure
– an “enabling cyberinfrastructure” demonstration
– extensible beyond original sites with additional funding
• NCSA, SDSC, ANL, Caltech and PSC
• ORNL, TACC, Indiana University, Purdue University and Atlanta hub
Resource Providers + 4 New Sites

(Network diagram: sites joined by an Extensible Backplane Network through LA and Chicago hubs over 30-40 Gb/s links; node types shown include IA-64 clusters, IA-32 clusters, visualization clusters, storage servers, shared-memory systems, disk storage, and backplane routers.)

• Caltech - Data collection analysis: 0.4 TF IA-64, IA32 Datawulf, 80 TB storage
• ANL - Visualization: 1.25 TF IA-64, 96 visualization nodes, 20 TB storage
• SDSC - Data intensive: 4 TF IA-64, 1.1 TF Power4, DB2 and Oracle servers, 500 TB disk storage, 6 PB tape storage
• NCSA - Compute intensive: 10 TF IA-64, 128 large-memory nodes, 230 TB disk storage, 3 PB tape storage, GPFS and data mining
• PSC - Compute intensive: 6 TF EV68, 0.3 TF EV7 shared-memory, 71 TB storage, 150 TB storage server
Common Teragrid Software Stack (CTSS)
• Linux Operating Environment
• Basic and Core Globus Services
  – GSI (Grid Security Infrastructure)
  – GSI-enabled SSH and GSIFTP
  – GRAM (Grid Resource Allocation & Management)
  – GridFTP
  – Information Service
  – Distributed accounting
  – MPICH-G2
  – Science Portals
• Advanced and Data Services
  – Replica Management Tools
  – GRAM-2 (GRAM extensions)
  – CAS (Community Authorization Service)
  – Condor-G (as brokering "super scheduler")
  – SDSC SRB (Storage Resource Broker)
  – APST user middleware, etc.
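The brokering role the stack assigns to Condor-G can be sketched in a few lines: match a job's requirements against heterogeneous sites and pick the best fit. This is an illustrative sketch only, not the Condor-G API; the site names, loads, and service tags below are made up for the example.

```python
# Illustrative "super scheduler" sketch (NOT the Condor-G API): match a job's
# CPU and service requirements against heterogeneous grid sites and choose
# the least-loaded site that satisfies them. All data here is hypothetical.
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    cpus: int                                   # total CPUs at the site
    load: float                                 # fraction of CPUs busy, 0.0-1.0
    services: set = field(default_factory=set)  # e.g. {"gram", "gridftp"}

def broker(job_cpus, needed_services, resources):
    """Return the least-loaded resource that can run the job, or None."""
    candidates = [
        r for r in resources
        if r.cpus >= job_cpus and needed_services <= r.services
    ]
    return min(candidates, key=lambda r: r.load, default=None)

sites = [
    Resource("NCSA", cpus=2000, load=0.9, services={"gram", "gridftp"}),
    Resource("SDSC", cpus=1000, load=0.4, services={"gram", "gridftp", "srb"}),
    Resource("ANL",  cpus=200,  load=0.1, services={"gram"}),
]

chosen = broker(64, {"gram", "gridftp"}, sites)
print(chosen.name)  # SDSC: ANL lacks GridFTP, NCSA is more heavily loaded
```

The real broker layers GSI authentication, GRAM job submission, and queue-depth monitoring on top of this matchmaking idea, but the selection logic is the same in spirit.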
TERAGRID as a Pathfinder
• Science Drivers - Gateways
  – On-demand computing
  – Remote visual steering
  – Data-intensive computing
• Systems Integrator/Manager
  – Common TERAGRID Software Stack
  – User training & services
  – TERAGRID Operations Center
• Resource Providers
  – Data resources, compute engines, viz, user services
Focus on Policy and Social Dynamics
• Policy issues must be considered up front
• Social engineering will be at least as important as software engineering
• Well-defined interfaces will be critical for successful software development
• Application communities will need to participate from the beginning
Fran Berman, SDSC
CI Commons
Goals
• Commercial-grade software – stable, well-supported and well-documented
• User surveys and focus groups inform priority-setting
• Development of a "Commons roadmap"

Unanswered questions
• What role does industry play in the development and support of products?
• In what timeframe will software and services be available?
• How will customer satisfaction be assessed, and by whom?
• What role do standards play – and does an effective standards process exist today?
CI Commons
Community Development Approach
• End-user communities willing and able to modify code
• Adds features, repairs defects, improves code
• Customizes common building blocks for domain applications
• Leads to higher quality code, enhances diversity
• Natural way to set priorities
Requires
• Education, training in community development methodologies
• Effective Commons governance plan
• Strong, sustained interaction between Commons developers and community code enhancers
Challenging Context
• Cyberinfrastructure Ecology
– Technological change more rapid than institutional change
– Disruptive technology promises unforeseen opportunity
• Seamless Integration of New and Old
– Balancing upgrades of existing and creation of new resources
– Legacy instruments, models, data, methodologies
• Broadening Participation
• Community-Building
• Requires Effective Migration Strategy
On-Demand: Severe Weather Forecasting

Several times a week, need multiple hours of dedicated access to a multi-Teraflops system.

Kelvin Droegemeier, Center for Analysis and Prediction of Storms (CAPS), University of Oklahoma
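"On-demand" dedicated access implies a scheduling policy in which an urgent forecast run can preempt lower-priority batch work until enough CPUs are free. A minimal sketch of that policy, with made-up job names and sizes (none of this is CAPS or TeraGrid software):

```python
# Hypothetical sketch of an on-demand preemption policy: suspend the
# lowest-priority batch jobs until an urgent forecast run has the CPUs
# it needs. Job names, sizes, and priorities are illustrative only.

def preempt_for_urgent(running, urgent_cpus, total_cpus):
    """Suspend lowest-priority jobs until urgent_cpus are free.
    `running` is a list of (job_name, cpus, priority) tuples; returns
    the names of suspended jobs, or None if the request cannot be met."""
    free = total_cpus - sum(cpus for _, cpus, _ in running)
    suspended = []
    # Walk jobs from lowest to highest priority, suspending as we go.
    for name, cpus, _ in sorted(running, key=lambda j: j[2]):
        if free >= urgent_cpus:
            break
        suspended.append(name)
        free += cpus
    return suspended if free >= urgent_cpus else None

jobs = [("batch-a", 512, 1), ("batch-b", 256, 2), ("viz", 64, 5)]
print(preempt_for_urgent(jobs, urgent_cpus=700, total_cpus=1024))  # ['batch-a']
```

A production scheduler would also checkpoint or requeue the suspended jobs and account for the preempted cycles, which is where the deck's policy and social-dynamics concerns enter.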
On Demand: Brain Data Grid

Objective: Form a national-scale testbed for federating large databases using NIH High Field NMR Centers.

(Map diagram: participating sites include Stanford, U. of MN, Harvard, Cal Tech, UCLA, Duke, SDSC, Cal-(IT)2, and UCSD's NCRR imaging and computing resources; access spans the surface web, the deep web, and wireless "pad" web interfaces.)

Cyberinfrastructure linking tele-instrumentation, data-intensive computing, and multi-scale brain databases.

Mark Ellisman, Larry Smarr, UCSD
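The federation idea behind the testbed is that one query fans out to independently owned site databases and the results are merged, so users see a single logical database. A toy sketch of that pattern, with entirely hypothetical record layouts and site contents:

```python
# Illustrative federation sketch (all data hypothetical): fan one query out
# to each site's local records and merge the hits, tagging each with the
# site it came from, so the federation looks like one logical database.

def federated_query(sites, predicate):
    """Run `predicate` against every site's records; return (site, record) hits."""
    hits = []
    for site, records in sites.items():
        hits.extend((site, r) for r in records if predicate(r))
    return hits

# Toy per-site "databases" of imaging records (field strength in tesla).
sites = {
    "UCSD":     [{"subject": "s1", "field_T": 7.0}, {"subject": "s2", "field_T": 3.0}],
    "Stanford": [{"subject": "s3", "field_T": 7.0}],
    "Duke":     [{"subject": "s4", "field_T": 1.5}],
}

high_field = federated_query(sites, lambda r: r["field_T"] >= 7.0)
print(high_field)  # hits from UCSD and Stanford only
```

The real testbed adds the hard parts this sketch omits: schema mediation across heterogeneous databases, authentication, and moving multi-terabyte imagery rather than dictionaries.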
Molecular Biology Simulation

(Workflow diagram: a user drives the pipeline through a Web Portal; a Data Workflow Manager coordinates the stages, and a Globus Client dispatches them to TeraGrid Resources.)

Stages: Molecular Dynamics, Hole Profile analysis, Electrostatics I, Electrostatics II, Brownian Dynamics.

Parameters flowing through the workflow include: membrane potential(s); bath and channel diffusion constants (related by the sampling method used for their calculation in the MD simulations); protein/lipid 3-D structure coordinates and topology; time step-size; force field sets; number of time steps; ion-water ratio; ion type and initial positions; channel diameter and length; force profile; ion trajectory; bath concentrations (inside/outside); ion position in channel; temperature; approximate channel direction; ionic strength; pH of bath; protein and water dielectrics; partial charges of, and interaction potentials between, titratable groups in the protein; and stage-specific technical and methodology specifications.

Eric Jakobsson, UIUC
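The core job of a workflow manager like the one in this slide is to run stages in dependency order, feeding each stage's outputs to its successors. A minimal sketch of that idea; the stage names follow the slide, but the dependency edges are one plausible reading of the diagram and the stage bodies are stand-in stubs, not the actual pipeline:

```python
# Hedged sketch of a data workflow manager: execute pipeline stages in
# topological order, wiring each stage's output to its dependents. The
# edges below are an illustrative reading of the slide, not its exact DAG.
from graphlib import TopologicalSorter

# stage -> stages it depends on (assumed ordering for illustration)
deps = {
    "molecular_dynamics": set(),
    "hole_analysis": {"molecular_dynamics"},
    "electrostatics_1": {"hole_analysis"},
    "electrostatics_2": {"electrostatics_1"},
    "brownian_dynamics": {"electrostatics_2", "hole_analysis"},
}

def run_stage(name, inputs):
    # Stub: a real stage would be submitted to TeraGrid via the Globus client.
    return f"{name}-output"

def run_workflow(deps):
    results = {}
    for stage in TopologicalSorter(deps).static_order():
        results[stage] = run_stage(stage, [results[d] for d in deps[stage]])
    return results

print(list(run_workflow(deps)))
```

Swapping the stub for a grid job submission, and the in-memory results dict for files staged with GridFTP, turns this skeleton into the architecture the slide depicts.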