Grid Challenges
It’s the vision, stupid
…but it NEEDS TO be followed
by operational standards
based on real applications…
The Global Grid Forum
25 June 2003
Gordon Bell
Microsoft Corporation
A quick look at
some past visions
and a challenge
NREN >> Internet
WWW
Challenge: Will match any Grid-enabled
application that wins a Gordon Bell
Prize for parallelism
FCCSET NREN Plan 11/1987
[Chart: network bandwidth vs. year, 1988 to 2000, log scale from 10K to 10G. Labeled rates: 56K, 1.5 M, 45 M, 3G. Stages: Phase 1, Phase 2, Optical. Annotation: a factor of 1000 makes a difference.]
Originating Bandwidth (Gb/s)
U.S. Interstate Comm. traffic, L Roberts ’92
ARPAnet Goals c1972 = Grid Goals
[Chart: originating bandwidth, log scale from 1 to 10,000 Gb/s, 1990 to 2020. Curves: Voice, FAX, Email, Broadcast TV, Video Conf., Video on Demand. Marked point: NSF bb.]
Growth in hype vs reality
[Chart labels: WWW books, newspapers; Infoway regulation; Infoway speculation; “how great it’ll be” (politicians, telecoms & futurists); Infoway addiction; conferences; lawsuits]
c 1995 Data from Gordon’s WAG
Articles per newspaper versus orders per second sent via Internet
[Chart: two curves, orders per second and articles per newspaper]
c 1995 Data from Gordon’s WAG
Articles about security, privacy, & fraud versus commerce ($M)
[Chart labels: actual commerce; articles about risk and NOT doing commerce; organized crime on Internet]
c 1995 Data from Gordon’s WAG
The virtuous cycle of bandwidth supply and demand
[Diagram: a cycle linking Increased Demand, Create new service, Increase Capacity (circuits & bw), and Lower response time. Standards: IP. Services: Telnet & FTP, EMAIL, WWW, Audio, Video, Voice!, Grids, Video Conf., FTP, Web Svcs]
For More Information
Grid Book c1998 from 1996
- www.mkp.com/grids
- 651 pp., 22 chapters, 41 authors
The Globus Project™
- www.globus.org
OGSA
- www.globus.org/ogsa
Global Grid Forum
- www.gridforum.org
Grid Computing 2003
- 1080 pages, 43 chapters, O(100) authors
From Carl Kesselman, ISI
Progress...a review
Grid started out with great promise…c1998
Interesting use at NASA for coupled programs
NMI (National Middleware Infrastructure)
…State_Tools.gov, funded by NSF.gov
clearly open, clearly not “free” not IETF model
Tools vs. standards & evolving working code
Some examples:
C1980: Seti@home, folding@home, >> Napster p2p
2001 15 TB TerraServer > TerraService w/Web Services
2003 Alex Szalay & Jim Gray: SkyServer/SkyService
Cornell Theory Center Web Services based apps
NEES—good poster child. An XML task
GrADS and TeraGrid… dream or research or just $$s?
TerraServer c1998:
The “Whole Earth” Database
TerraServer Experience c2001
Successful Web Site
- 50,000 daily users satisfied with “human-accessible” data
- 59 GB imagery transmitted daily
New Feature Requests
- Programmable access to meta-data
- User selectable image sizes, i.e. “a map server”
- Permission to use TerraServer data within server applications
[Callout: To the rescue!]
.NET TerraService Architecture
[Diagram components: Standard Browsers; Smart Clients (Windows Forms, .NET Framework); Map UI (Web Forms); Map Server Http Handler; TerraServer Web Service; ADO.NET / OLEDB; Existing DB Server: three SQL 2000 databases of 1.0 TB each, 668 m rows]
Data Intensive Science:
the Next Frontier
The W.M. Keck Fellows
in Advanced Scientific
Data Analysis
Alex Szalay
The Johns Hopkins University
Department of Physics and Astronomy
National Virtual Observatory
NSF ITR project, “Building the Framework for
the National Virtual Observatory” is a
collaboration of 17 funded and 3 unfunded
organizations
Astronomy data centers
National observatories
Supercomputer centers
University departments
Computer science/information technology specialists
PI and project director: Alex Szalay (JHU)
CoPI: Roy Williams (Caltech/CACR)
Scientific Data Exploration
1. Thousand years ago: science was empirical
describing natural phenomena
2. Last few hundred years: theoretical branch
using models, generalizations
3. Last few decades: a computational branch
simulating complex phenomena
4. Today: data exploration is emerging
synthesizing theory, experiment and computation with
advanced data management and statistics
Living in an Exponential World
Astronomers have a few hundred TB now
1 pixel (byte) / sq arc second ~ 4TB
Multi-spectral, temporal, … → 1PB
They mine it looking for
new (kinds of) objects,
more of interesting ones (quasars),
density variations in 400-D space,
correlations in 400-D space
Data doubles every year
Data is public after 1 year
So, 50% of the data is public
Same trend appears in all sciences
[Chart: data volume, log scale from 0.1 to 1000, 1970 to 2000. Series: CCDs, Glass]
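The "50% of the data is public" claim on this slide follows from simple arithmetic: if the archive doubles over a fixed period, everything older than one doubling period is half of today's total. A minimal sketch of that calculation, assuming smooth exponential growth (the function name and parameters are illustrative, not from the talk):

```python
def public_fraction(doubling_time_years: float, embargo_years: float) -> float:
    """Fraction of all data collected so far that is older than the embargo.

    Assumes the total volume grows as 2 ** (t / doubling_time_years), so the
    volume that existed `embargo_years` ago, divided by today's volume, is
    2 ** (-embargo_years / doubling_time_years).
    """
    return 2.0 ** (-embargo_years / doubling_time_years)


if __name__ == "__main__":
    # The slide's case: data doubles every year, public after 1 year.
    print(public_fraction(1.0, 1.0))
    # A slower-growing archive has a larger public share for the same embargo.
    print(public_fraction(2.0, 1.0))
```

Under the same assumption, a longer embargo or faster growth shrinks the public share, which is why the doubling rate matters as much as the release policy.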
Why Is Astronomy Special?
It has no commercial value
No privacy concerns, freely share results with others
Great for experimenting with algorithms
It is real and well documented
High-dimensional (with confidence intervals)
Spatial, temporal
Diverse and distributed
Many different instruments from many different places and many different times
The questions are interesting
There is a lot of it (soon petabytes)
GB: It is not over-funded aka it’s poor
[Images of the sky at different wavelengths: ROSAT ~keV, DSS Optical, 2MASS 2µ, IRAS 25µ, IRAS 100µ, GB 6cm, NVSS 20cm, WENSS 92cm]
Making Discoveries
When and where are discoveries made?
Always at the edges and boundaries
Going deeper, collecting more data, using more colors….
Metcalfe’s law
Utility of computer networks grows as the
number of possible connections: O(N²)
VO: Federation of N archives
Possibilities for new discoveries grow as O(N²)
Current sky surveys have proven this
Very early discoveries from SDSS, 2MASS, DPOSS
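The O(N²) growth can be made concrete by counting the distinct pairs of archives that a federation could cross-match. A small sketch, assuming each unordered pair counts as one possible connection; the three survey names come from the slide, while the function name is illustrative:

```python
from itertools import combinations


def pairwise_links(n: int) -> int:
    """Number of distinct unordered pairs among n archives: n(n-1)/2."""
    return n * (n - 1) // 2


archives = ["SDSS", "2MASS", "DPOSS"]
# Enumerate the pairs explicitly to check the closed-form count.
pairs = list(combinations(archives, 2))

if __name__ == "__main__":
    print(pairs)
    for n in (3, 10, 100):
        print(n, "archives ->", pairwise_links(n), "possible cross-matches")
```

The count grows roughly as N²/2, so doubling the number of federated archives about quadruples the possible pairings.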
What can be learned from Sky Server?
It’s about data, not about harvesting flops
1-2 hr. query programs versus
1 wk programs based on grep
10 minute runs versus
3 day compute & searches
Database viewpoint. 100x speed-ups
Avoid costly re-computation and searches
Use indices and PARALLEL I/O.
Read / Write >>1.
Parallelism is automatic, transparent, and just
depends on the number of computers/disks.
Limited experience and talent to use databases.
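The "use indices" point can be illustrated with any relational engine. The sketch below uses Python's built-in sqlite3 module rather than the SQL Server system behind SkyServer, and the table, column names, and synthetic values are all made up for illustration. It shows the query planner switching from a full-table scan (the grep-style approach) to an index search once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)")

# Synthetic catalog rows (hypothetical values, not real survey data).
rows = [
    (i, (i * 0.37) % 360.0, ((i * 0.11) % 180.0) - 90.0, 10.0 + (i % 1500) / 100.0)
    for i in range(20_000)
]
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)

query = "SELECT COUNT(*) FROM objects WHERE mag BETWEEN 12.0 AND 12.05"


def plan(sql: str) -> str:
    """Return the planner's description of how it would run the query."""
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)   # no index yet: the whole table is scanned
count_before = conn.execute(query).fetchone()[0]

conn.execute("CREATE INDEX idx_mag ON objects (mag)")

plan_after = plan(query)    # the range predicate can now use idx_mag
count_after = conn.execute(query).fetchone()[0]

if __name__ == "__main__":
    print("before:", plan_before)
    print("after: ", plan_after)
    print("same answer:", count_before == count_after)
```

The answer is identical either way; only the amount of data touched changes, and that is where the speed-up comes from.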
Soon: The Virtual Observatory
Many new surveys are coming
SDSS is a dry run for the next ones
LSST will be 5TB/night
All the data will be on the Internet
ftp, web services…
Data and applications will be
associated with the instruments
Distributed world wide, cross-indexed
Federation is a must
Will be the best telescope in the world
World Wide Telescope
Finds the “needle in the haystack”
Successful demonstrations in Jan’03
Emerging Concepts
Standardizing distributed data access
Web Services, supported on all platforms
XML: Extensible Markup Language
SOAP: Simple Object Access Protocol
WSDL: Web Services Description Language
Standardizing distributed computing
Grid Services
Custom configure remote computing dynamically
Build your own remote computer, and discard
Virtual Data: new data sets on demand
Both needed for Data Exploration
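To make the XML and SOAP items concrete, here is a rough sketch of what a SOAP 1.1 request envelope looks like when built and read back with Python's standard library. The service namespace, operation name, and parameters are hypothetical and do not describe any real archive's interface; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:sky-archive"                 # hypothetical service namespace


def build_request(operation: str, params: dict) -> str:
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text: str):
    """Recover the operation name and parameters from an envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]                      # first child of Body is the operation
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    message = build_request("GetObjects", {"ra": 180.0, "dec": -1.5})
    print(message)
    print(parse_request(message))
```

In practice a WSDL document would describe the operation and its parameter types, and a toolkit would generate this plumbing; the sketch only shows the message shape that makes the exchange platform neutral.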
Computational Science
Simulations based on
Web Services
Gerd Heber
Cornell Theory Center
heber@tc.cornell.edu
Three Flavors of Adaptivity
Application-level
- Mathematical model
- High/low confidence
Algorithm-level
- Discretization method
- Solution technique
System-level
- Resource availability
- Fault tolerance
International Conference on Computational Science 2003
The Problem
Do distributed, coupled and adaptive multi-physics simulations of
- Mechanics of chemically-reacting flows
- (Damage) Thermo-Mechanics of solids
Components provided as Web Services
Geography
Cornell University
- Theory Center
- Department of Computer Science
- Department of Civil Engineering
University of Alabama
Mississippi State University
College of William and Mary
Workflow
Components
MiniCAD
Meshers
- Surface (Delaunay, quality guarantees)
- Volume (Dmesh, Jmesh, Gmesh)
Fluid/Thermal simulation (Loci, CHEM)
Thermo-mechanical component (CPTC)
Fracture mechanics
Visualization (OpenDX + SQL Server)
Web Services
“Web Services are self-contained, modular applications that can be described, published, located, and invoked over a network, …” (IBM)
Service oriented architecture: Publish, find, bind
XML, SOAP, UDDI, WSDL
Features and Requirements
Distributed expertise
Platform and language neutrality
No porting
Network accessibility (“firewall compliant”)
Security
Industry standards
Metadata
State
Students shouldn’t waste too much time with coding!
GrADS Vision
• Build a National Problem-Solving System on the Grid
- Transparent to the user, who sees a problem-solving system
• Software Support for Application Development on Grids
- Goal: Design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment
• Challenges:
- Presenting a high-level application development interface*
  - If programming is hard, the Grid will not reach its potential
- Designing and constructing applications for adaptability
- Late mapping of applications to Grid resources
- Monitoring and control of performance
  - When should the application be interrupted and remapped?
*GB note: This is a superset of the previously unsolved clusters programming problem!
GrADSoft Architecture
[Diagram components: Source Application; Software Components; Libraries; Whole-Program Compiler; Performance Problem; Configurable Object Program; Resource Negotiator; Negotiation; Scheduler; Binder; Real-time Performance Monitor; Performance Feedback; Grid Runtime System]
Network for Earthquake Eng. Simulation
NEESgrid: US national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
On-demand access to experiments, data streams, computing, archives, collaboration
Carl Kesselman, ISI
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
From www.neesgrid.org
A Universal Architecture for Web Services… Microsoft Vision
[Diagram: Messaging Infrastructure with Security, Reliable Messaging, Transactions, Routing, …]
“Scales Up” on large systems
“Scales In” on a machine
“Scales Down” to devices
“Scales Away” spans organizations & geographies
“Scales Out” by adding machines
Distributed applications, Vertical processes, Embedded systems, Network equipment, …
Web Services: Level I
Foundation to Build Upon
Basic profile
Defined by WS-I
XML, SOAP, WSDL, UDDI
Broad vendor support
WS-I assures widespread compatibility
Level II
Secure, Reliable, Transacted
Connected Applications
[Diagram blocks: Secure; Reliable Messaging; Transacted; Metadata; Management; Business Process; XML; Transports; …]
Level III
From Infrastructure to Solutions
Application schemas
Domain specific profiles
Vertical industry services
Vision: Community/Data-Centric Computing
Versus Machine-Centered Centers
Goal: Enable technical communities to create and take
responsibility for their own computing environments of
personal, data, and program collaboration and
distribution
Design based on technology and cost, e.g. networking,
apps programs maintenance, databases, and providing
24x7 web and other services
Many alternative styles and locations are possible
Service from existing centers, including many state centers
Software vendors could be encouraged to supply apps web
services
NCAR style center based on shared data and apps
Instrument- and model-based databases. Both central &
distributed when multiple viewpoints create the whole.
Wholly distributed services supplied by many individual
groups
Community/Data Centric: “web service”
Community is responsible
Planned & budgeted as resources
Responsible for its infrastructure
Apps are from community
Computing is integral to work
In sync with technologies
1-3 Tflops/$M; 1-3 PBytes/$M
to buy smallish Tflops & PBytes.
New scalables are “centers”
Community can afford and evolve
Dedicated to a community
Program, data & database centric
May be aligned with instruments or other community
activities
Output = web service
Can communities form that can supply services?
Commitment to standards
A general architecture comes largely from
understanding the problems
Understanding the problems comes from
actually solving such problems
This is bottom-up, based on experience
Microsoft is committed to developing
community-wide web services
standards…
Is the Grid Forum equally committed?
The End
How can GRIDs become a real,
useful, computer structure?
Get a life.
Use the standards and tools.
Adopt an application and/or
community…now!