lanlgridsept03 - Digital Science Center

Remarks on Grids, e-Science, CyberInfrastructure and Peer-to-Peer Networks
Los Alamos
September 23 2003
Geoffrey Fox
Community Grids Lab
Indiana University
gcf@indiana.edu
What is a High Performance Computer?
• We might wish to consider three classes of multi-node computers
• 1) Classic MPP with microsecond latency and scalable internode
bandwidth (tcomm/tcalc ~ 10 or so)
• 2) Classic Cluster, which can vary from configurations like 1) to 3)
but typically has millisecond latency and modest bandwidth
• 3) Classic Grid or distributed systems of computers around the
network
– Latencies of inter-node communication are 100’s of milliseconds, but
bandwidth can be good
• All have the same peak CPU performance, but synchronization costs
increase as one goes from 1) to 3)
• Cost of system (dollars per gigaflop) decreases by a factor of 2 at
each step from 1) to 2) to 3)
• One should NOT use a classic MPP if class 2) or 3) suffices, unless
some security or data issues dominate over cost-performance
• One should not use a Grid as a true parallel computer – it can link
parallel computers together for convenient access etc.
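Why synchronization cost rules out a Grid as a true parallel computer can be seen with a deliberately crude model, which is an assumption and not taken from these slides: each node computes for t_calc seconds and then waits one inter-node latency before the next step. The latencies below are illustrative values for the three classes; the 10 ms compute time is arbitrary.

```python
# Toy model (an assumption, not from the slides): each step is t_calc seconds
# of computation followed by one synchronization that costs `latency` seconds.

def parallel_efficiency(t_calc: float, latency: float) -> float:
    """Fraction of wall-clock time spent computing in one compute+sync step."""
    return t_calc / (t_calc + latency)


# Illustrative latencies for the three classes of multi-node computer.
LATENCIES = {
    "1) MPP": 1e-6,       # microseconds
    "2) Cluster": 1e-3,   # milliseconds
    "3) Grid": 0.1,       # hundreds of milliseconds
}

if __name__ == "__main__":
    t_calc = 0.01  # assumed 10 ms of computation between synchronizations
    for name, lat in LATENCIES.items():
        print(f"{name:12s} efficiency = {parallel_efficiency(t_calc, lat):.3f}")
```

With these numbers the MPP stays near full efficiency, the cluster loses under 10%, and the Grid spends most of its time waiting; coarser grain (larger t_calc) is what makes classes 2) and 3) usable.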
What is a Grid I?
• Collaborative Environment (Ch2.2,18)
• Combining powerful resources, federated computing and a security
structure (Ch38.2)
• Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations (Ch6)
• Data Grids as Managed Distributed Systems for Global Virtual
Organizations (Ch39)
• Distributed Computing or distributed systems (Ch2.2,10)
• Enabling Scalable Virtual Organizations (Ch6)
• Enabling use of enterprise-wide systems, and someday nationwide
systems, that consist of workstations, vector supercomputers, and
parallel supercomputers connected by local and wide area networks.
Users will be presented the illusion of a single, very powerful
computer, rather than a collection of disparate machines. The system
will schedule application components on processors, manage data
transfer, and provide communication and synchronization in such a
manner as to dramatically improve application performance. Further,
boundaries between computers will be invisible, as will the location
of data and the failure of processors. (Ch10)
What is a Grid II?
• Supporting e-Science representing increasing global collaborations of
people and of shared resources that will be needed to solve the new
problems of Science and Engineering (Ch36)
• As infrastructure that will provide us with the ability to dynamically
link together resources as an ensemble to support the execution of
large-scale, resource-intensive, and distributed applications. (Ch1)
• Makes high-performance computers superfluous (Ch6)
• Metasystems or metacomputing systems (Ch10,37)
• Middleware as the services needed to support a common set of
applications in a distributed network environment (Ch6)
• Next Generation Internet (Ch6)
• Peer-to-peer Network (Ch10, 18)
• Realizing the thirty-year dream of science fiction writers who have spun
yarns featuring worldwide networks of interconnected computers that
behave as a single entity. (Ch10)
• Technology on which to build CyberInfrastructure (NSF)
• High Performance Computing World’s view of the Web
The Grid for my purposes is “best practice” in all of this!
Taxonomy of Grid Functionalities
Name of Grid Type | Description of Grid Functionality
Compute/File Grid or Data File Grid | Run multiple jobs with distributed compute and data resources (Global “UNIX Shell”)
Desktop Grid (e.g. SETI@Home) | “Internet Computing” and “Cycle Scavenging” with secure sandbox on large numbers of untrusted computers
Information Grid or Data Service Grid | Grid service access to distributed information, data and knowledge repositories
Complexity or Hybrid Grid | Hybrid combination of Information and Compute/File Grid emphasizing integration of experimental data, filters and simulations: Data assimilation
Campus Grid | Grid supporting University community computing
Enterprise Grid | Grid supporting a company’s enterprise infrastructure
Classes of Computing Grid Applications
• Running “Pleasing Parallel Jobs” as in United Devices,
Entropia (Desktop Grid) “cycle stealing systems”
• Can be managed (“inside” the enterprise as in Condor) or
more informal (as in SETI@Home)
• Computing-on-demand in Industry where jobs spawned are
perhaps very large (SAP, Oracle …)
• Support distributed file systems as in Legion (Avaki),
Globus with (web-enhanced) UNIX programming paradigm
– Particle Physics will run some 30,000 simultaneous jobs this way
• Pipelined applications linking data/instruments, compute,
visualization
• Seamless Access where Grid portals allow one to choose
one of multiple resources with a common interface
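A “Pleasing Parallel” workload needs no communication between jobs, so the only logic is dispatch and collection. The sketch below assumes local threads as stand-ins for remote resources; a real Desktop or Compute Grid (Condor, SETI@Home) would ship each job to a separate machine. `run_job` is a hypothetical placeholder job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def run_job(seed: int) -> int:
    """Hypothetical independent job: a small deterministic 'simulation'."""
    x = seed
    for _ in range(1000):
        x = (1103515245 * x + 12345) % (2 ** 31)  # linear congruential step
    return x


def farm(seeds: Iterable[int], workers: int = 4) -> List[int]:
    """Run jobs with no inter-job communication; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, seeds))


if __name__ == "__main__":
    print(farm(range(8)))
```

Because the jobs never talk to each other, the same code works whether the workers sit on one cluster or on thousands of scavenged desktops; only latency of dispatch changes.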
Information/Knowledge Grids
• These are typified by virtual observatory and
bioinformatics applications
• Distributed data sources, 10’s to 1000’s of them (instruments,
file systems, curated databases …)
• Possible filters assigned dynamically
– Run image processing algorithm on telescope image
– Run Gene sequencing algorithm on data from EBI/NCBI
• Integrate across experiments as in multi-wavelength
astronomy
• Needs decision support front end with “what-if”
simulations
• Metadata (provenance) critical to annotate data
• SERVOGrid – Solid Earth Research Virtual Observatory
will link Japan, Australia, USA
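The pattern above (filters assigned dynamically to distributed sources, results integrated across experiments, provenance recorded on the way) can be sketched with plain data structures. Source names, filter functions and the provenance string format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Dataset:
    source: str                                   # hypothetical source label
    values: List[float]
    provenance: List[str] = field(default_factory=list)


def apply_filter(ds: Dataset, name: str, fn: Callable[[float], float]) -> Dataset:
    """Apply a dynamically chosen filter and record it in the provenance trail."""
    return Dataset(
        source=ds.source,
        values=[fn(v) for v in ds.values],
        provenance=ds.provenance + [f"{name} applied to {ds.source}"],
    )


# Filters chosen at run time by source type (all hypothetical).
FILTERS: Dict[str, Callable[[float], float]] = {
    "telescope": lambda v: max(v - 1.0, 0.0),  # e.g. subtract a background level
    "sequencer": lambda v: float(round(v)),    # e.g. quantize a score
}


def integrate(datasets: List[Dataset]) -> Dataset:
    """Merge filtered datasets (as in multi-wavelength astronomy), keeping provenance."""
    merged = Dataset(source="integrated", values=[])
    for ds in datasets:
        merged.values.extend(ds.values)
        merged.provenance.extend(ds.provenance)
    return merged
```

Keeping provenance on every derived dataset is what lets a later “what-if” front end explain where each integrated value came from.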
[Figure: SERVOGrid Caricature. Repositories and federated databases, sensor nets with streaming data, loosely coupled filters, closely coupled compute nodes, and analysis and visualization]
Sources of Grid Technology
• Grids support distributed collaboratories or virtual
organizations integrating concepts from
• The Web
• Agents
• Distributed Objects (CORBA Java/Jini COM)
• Globus, Legion, Condor, NetSolve, Ninf and other High
Performance Computing activities
• Peer-to-peer Networks
• With perhaps the Web and P2P networks being the most
important for “Information Grids” and Globus for
“Compute Grids”
The Essence of Grid Technology?
• We will start from the Web view and assert that the basic
paradigm is
• Meta-data rich Web Services communicating via
messages
• These have some basic support from some runtime such
as .NET, Jini (pure Java), Apache Tomcat+Axis (Web
Service toolkit), Enterprise JavaBeans, WebSphere (IBM)
or GT3 (Globus Toolkit 3)
– These are the distributed equivalent of operating system
functions as in UNIX Shell
– Called Hosting Environment or platform
• The W3C standard WSDL defines an IDL (Interface Definition Language) for
Web Services
Services and Distributed Objects
• A web service is a computer program running on either the local or
remote machine with a set of well defined interfaces (ports) specified
in XML (WSDL)
• Web Services (WS) have many similarities with Distributed Object
(DO) technology but there are some (important) technical and
religious points
– CORBA Java COM are typical DO technologies
– Agents are typically SOA (Service Oriented Architecture)
• Both involve distributed entities but Web Services are more loosely
coupled
– WS interact with messages; DO with RPC
– DO have “factories”; WS manage instances internally, and interaction-specific
state is not exposed and hence need not be managed
– DO have explicit state (stateful services); WS use context in the messages to
link interactions (stateful interactions)
• Claim: DO’s do NOT scale; WS build on experience (with CORBA)
and do scale
A typical Web Service
• In principle, services can be in any language (Fortran .. Java .. Perl ..
Python) and the interfaces can be method calls, Java RMI Messages,
CGI Web invocations, totally compiled away (inlining)
• The simplest implementations involve XML messages (SOAP) and
programs written in net friendly languages like Java and Python
[Figure: a typical Web Service composed from other services through WSDL interfaces: Security, Payment, Credit Card, Catalog, Warehouse and Shipping]
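The “XML messages (SOAP)” implementation can be sketched with Python's standard `xml.etree.ElementTree`. The envelope namespace is the SOAP 1.1 one; the service namespace `urn:example:catalog`, the operation `getPrice` and its parameters are hypothetical. Real toolkits (Axis, .NET) generate this from WSDL and also handle encoding rules, headers and faults, which this sketch omits.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:catalog"                          # hypothetical service namespace


def build_request(operation: str, params: Dict[str, object]) -> bytes:
    """Serialize a call as Envelope > Body > operation > one element per parameter."""
    ET.register_namespace("soap", SOAP_ENV)
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(env, encoding="utf-8")


def parse_request(payload: bytes) -> Tuple[str, Dict[str, str]]:
    """Recover the operation name and (string) parameters on the receiving side."""
    env = ET.fromstring(payload)
    op = env.find(f"{{{SOAP_ENV}}}Body")[0]
    name = op.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in op}
    return name, params


if __name__ == "__main__":
    msg = build_request("getPrice", {"item": "widget", "qty": 3})
    print(msg.decode("utf-8"))
    print(parse_request(msg))
```

Note that everything arrives as text; mapping it back to typed values is exactly what the WSDL interface description is for.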
Details of Web Service Protocol Stack
• UDDI finds where programs are
– remote (distributed) programs are
just Web Services
– (not a great success)
• WSFL links programs together
(under revision as BPEL4WS)
• WSDL defines interface (methods,
parameters, data formats)
• SOAP defines structure of message
including serialization of information
• HTTP is negotiation/transport protocol
• TCP/IP is layers 3-4 of OSI
• Physical Network is layer 1 of OSI
[Figure: protocol stack, top to bottom: UDDI or WSIL; WSFL; WSDL; SOAP or RMI; HTTP or SMTP or IIOP or RMTP; TCP/IP; Physical Network]
What are System and Application Services?
• There are generic Grid system services: security, collaboration,
persistent storage, universal access
– OGSA (Open Grid Service Architecture) is implementing these as
extended Web Services
• An Application Web Service is a capability used either by another
service or by a user
– It has input and output ports – data is from sensors or other
services
• Consider Satellite-based Sensor Operations as a Web Service
– Satellite management (with a web front end)
– Each tracking station is a service
– Image Processing is a pipeline of filters – which can be grouped
into different services
– Data storage is an important system service
– Big services built hierarchically from “basic” services
• Portals are the user (web browser) interfaces to Web services
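The idea that image processing is a pipeline of filters which can be grouped into services, and that big services are built hierarchically from basic ones, can be sketched with function composition. The filter names and numbers are hypothetical; in a real Grid each stage would be a separate Web Service reached through its ports.

```python
from typing import Callable, List

Filter = Callable[[List[float]], List[float]]


def compose(*stages: Filter) -> Filter:
    """Group filters into one bigger 'service'; the result is itself a Filter."""
    def pipeline(data: List[float]) -> List[float]:
        for stage in stages:
            data = stage(data)
        return data
    return pipeline


# Hypothetical basic filters for satellite image processing.
def calibrate(d: List[float]) -> List[float]:
    return [2.0 * x for x in d]          # assumed sensor gain of 2


def threshold(d: List[float]) -> List[float]:
    return [x for x in d if x >= 1.0]    # drop weak pixels


def normalize(d: List[float]) -> List[float]:
    peak = max(d, default=1.0) or 1.0    # guard against empty or all-zero input
    return [x / peak for x in d]


preprocess = compose(calibrate, threshold)       # one grouped service
image_service = compose(preprocess, normalize)   # built hierarchically from it
```

Because a composed pipeline has the same type as a basic filter, grouping can be repeated at any level, which is the hierarchical construction the slide describes.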
Application Web Services
• An Application Web Service is a capability used either by another service or by a
user
– It has input and output ports – data is from users, sensors
or other services
– Big services built hierarchically from “basic” services
• Note: the Service model integrates sensors, sensor analysis, simulations and people
[Figure: Sensor Data as a Web service (WS), Sensor Management WS, Data Analysis WS, Simulation WS and Visualization WS linked together; analysis is built as multiple Filter Web Services (Filter1, Filter2, Filter3) and simulation as multiple interdisciplinary Programs (Prog1, Prog2)]
What is Happening?
• Grid ideas are being developed in (at least) two
communities
– Web Service – W3C, OASIS
– Grid Forum (High Performance Computing, e-Science)
• Service Standards are being debated
• Grid Operational Infrastructure is being deployed
• Grid Architecture and core software are being developed
• Particular System Services are being developed “centrally”
– OGSA framework for this in
• Lots of fields are setting domain specific standards and
building domain specific services
• There is a lot of hype
• Grids are viewed differently in different areas
– Largely “computing-on-demand” in industry (IBM, Oracle, HP,
Sun)
– Largely distributed collaboratories in academia
Grid Applications
• Cope with Data Deluge – Moore’s law for detectors
• Astronomy – virtual observatories
• Biology – distributed repositories and filtering
• Chemistry – online laboratories
• Earth/Environmental Science – distributed sensors
• Engineering – distributed monitors
• Health – medical instruments and images
• Particle Physics – analyze LHC data
• Gridsourcing – animation in China, software in India and
design/leadership in USA
– Basketball coaching in Indiana, players in China
– Teachers in Los Alamos, students in universities
• Command and Control for DoD
• Federation of Information systems and modeling and simulation
• Problem Solving Environment and Software Integration
DAME
[Figure: Distributed Aircraft Maintenance Environment (Rolls Royce and UK e-Science Program). In-flight data from ~5000 engines, ~ Gigabyte per aircraft per engine per transatlantic flight, passes from the airline via a ground station and a global network such as SITA to the Engine Health (Data) Center, which notifies the Maintenance Centre via Internet, e-mail and pager]
OGSA OGSI & Hosting Environments
• Start with Web Services in a hosting environment
• Add OGSI to get a Grid service and a component model
• Add OGSA to get Interoperable Grid “correcting” differences in base
platform and adding key functionalities
[Figure: layered stack, from top: domain-specific services (not OGSA); more specialized services such as data replication and workflow (possibly OGSA); broadly applicable services such as registry, authorization, monitoring and data access (OGSA environment); OGSI on Web Services; Hosting Environment for WS; Network. Annotation: “Given to us from on high”]
OGSI Open Grid Service Interface
• http://www.gridforum.org/ogsi-wg
• It is a “component model” for web services.
• It defines a set of behavior patterns that each OGSI service must exhibit.
• Every “Grid Service” portType extends a common base type.
– Defines an introspection model for the service
– You can query it (in a standard way) to discover
• What methods/messages a port understands
• What other port types the service provides
• If the service is “stateful”, what the current state is
• Factory Model
• A set of standard portTypes for
– Message subscription and notification
– Service collections
• Each service is identified by a URI called the “Grid Service Handle” (GSH)
• GSHs are bound dynamically to Grid Service References (GSRs, typically WSDL
documents)
– A GSR may be transient. GSHs are fixed.
– Handle map services translate GSHs into GSRs.
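The split between a fixed handle and a transient reference can be sketched as a lookup table with expiry. This is a minimal illustration, not the OGSI HandleResolver interface: class names, URLs and the expiry-as-number convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GridServiceReference:
    wsdl_url: str    # where the service's WSDL can be fetched (hypothetical URLs)
    expires: float   # a GSR may be transient


class HandleMap:
    """Minimal handle map: fixed GSH (a URI) -> current, possibly transient, GSR."""

    def __init__(self) -> None:
        self._table: Dict[str, GridServiceReference] = {}

    def bind(self, gsh: str, gsr: GridServiceReference) -> None:
        self._table[gsh] = gsr   # rebinding changes the GSR, never the GSH

    def resolve(self, gsh: str, now: float) -> GridServiceReference:
        gsr = self._table.get(gsh)
        if gsr is None or gsr.expires <= now:
            raise LookupError(f"no valid reference for {gsh}")
        return gsr
```

A client keeps only the GSH; when the service migrates or its reference expires, the provider rebinds and the next `resolve` returns the new GSR.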
OGSI and Stateful Services
• Sometimes you can send a message to a service, get a result and
that’s the end
– This is a stateless service
• However most non-trivial services need state to allow persistent
asynchronous interactions
• OGSI is designed to support Stateful services through two
mechanisms
– Information Port: where you can query for SDEs (Service
Data Elements)
– “Factories” that allow one to view a Service as a “class” (in an
object-oriented language sense) and create separate instances for
each Service invocation
• There are several interesting issues here
– Difference between Stateful interactions and Stateful services
– System or Service managed instances
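A stateful service with a separate information port can be sketched as a class whose operations change named service data elements while a query method reads them outside any interaction. This is a loose illustration of the idea, not OGSI's actual portType operations; the service, its SDE names and `query_sde` are hypothetical.

```python
from typing import Any, Dict, List


class StatefulJobService:
    """Hypothetical stateful service with an operation port and an information port."""

    def __init__(self) -> None:
        # Service data elements (SDEs): named pieces of state, queryable from outside.
        self._sde: Dict[str, Any] = {
            "portTypes": ["GridService", "JobControl"],  # illustrative names
            "status": "idle",
            "jobsRun": 0,
        }

    # Operation port: an interaction that changes the service's state.
    def run_job(self, name: str) -> str:
        self._sde["status"] = f"running {name}"
        self._sde["jobsRun"] += 1
        return f"started {name}"

    # Information port: read state without taking part in an interaction.
    def query_sde(self, sde_name: str) -> Any:
        return self._sde[sde_name]

    def list_sdes(self) -> List[str]:
        return sorted(self._sde)
```

The state here belongs to the service itself and can be inspected by anyone, which is the “stateful service” case; the correlation-in-messages alternative on the next slide keeps state tied to a particular interaction instead.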
Factories and OGSI
• Stateful interactions are typified by Amazon.com, where messages carry correlation
information allowing multiple messages to be linked together
– Amazon preserves state in this fashion, and that state is in fact stored
permanently in its database
• Stateful services have state that can be queried outside a particular interaction
• Also note difference between implicit and explicit factories
– Some claim that implicit factories scale as each service manages its own
instances and so do not need to worry about registering instances and lifetime
management
• See WS-Addressing from largely IBM and Microsoft
http://msdn.microsoft.com/webservices/default.aspx?pull=/library/en-us/dnglobspec/html/ws-addressing.asp
[Figure: Explicit Factory versus Implicit Factory, each shown as a FACTORY with service instances 1, 2, 3, 4]
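The contrast between explicit and implicit factories can be sketched side by side. In the explicit case the client creates an instance, addresses it by handle and must eventually destroy it; in the implicit case the client sends correlation data (as in the Amazon.com example) and the service creates and tracks instances itself. All class and method names are hypothetical.

```python
import itertools
from typing import Dict


class Instance:
    def __init__(self, instance_id: int) -> None:
        self.instance_id = instance_id
        self.calls = 0

    def invoke(self) -> int:
        self.calls += 1
        return self.calls


class ExplicitFactory:
    """Client asks for an instance, addresses it directly, and must destroy it."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.registry: Dict[int, Instance] = {}   # registration and lifetime are visible

    def create(self) -> int:
        inst = Instance(next(self._ids))
        self.registry[inst.instance_id] = inst
        return inst.instance_id                    # handle exposed to the client

    def invoke(self, instance_id: int) -> int:
        return self.registry[instance_id].invoke()

    def destroy(self, instance_id: int) -> None:
        del self.registry[instance_id]


class ImplicitFactoryService:
    """Client only sees the service; correlation data in each message selects an
    instance that the service creates and manages internally."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._instances: Dict[str, Instance] = {}

    def handle(self, correlation_id: str) -> int:
        inst = self._instances.get(correlation_id)
        if inst is None:
            inst = self._instances[correlation_id] = Instance(next(self._ids))
        return inst.invoke()
```

The implicit version never hands out instance references, so clients cannot leak or depend on them; that is the scaling argument quoted above, although the service still has to expire idle entries itself.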
Technical Activities of Note
• Look at different styles of Grids such as Autonomic
(Robust Reliable Resilient)
• New Grid architectures are hard to introduce due to the investment required
• Critical Services Such as
– Security – build message based not connection based
– Notification – event services
– Metadata – Use Semantic Web, provenance
– Databases and repositories – instruments, sensors
– Computing – Submit job, scheduling, distributed file systems
– Visualization, Computational Steering
– Fabric and Service Management
– Network performance
• Program the Grid – Workflow
• Access the Grid – Portals, Grid Computing Environments
Issues and Types of Grid Services
• 1) Types of Grid
– R3
– Lightweight
– P2P
– Federation and Interoperability
• 2) Core Infrastructure and Hosting Environment
– Service Management
– Component Model
– Service wrapper/Invocation
– Messaging
• 3) Security Services
– Certificate Authority
– Authentication
– Authorization
– Policy
• 4) Workflow Services and Programming Model
– Enactment Engines (Runtime)
– Languages and Programming
– Compiler
– Composition/Development
• 5) Notification Services
• 6) Metadata and Information Services
– Basic including Registry
– Semantically rich Services and meta-data
– Information Aggregation (events)
– Provenance
• 7) Information Grid Services
– OGSA-DAI/DAIT
– Integration with compute resources
– P2P and database models
• 8) Compute/File Grid Services
– Job Submission
– Job Planning Scheduling Management
– Access to Remote Files, Storage and Computers
– Replica (cache) Management
– Virtual Data
– Parallel Computing
• 9) Other services including
– Grid Shell
– Accounting
– Fabric Management
– Visualization Data-mining and Computational Steering
– Collaboration
• 10) Portals and Problem Solving Environments
• 11) Network Services
– Performance
– Reservation
– Operations
[Figure: Technology Components of (Services in) a Computing Grid. A Job Management Service (the Grid Service Interface to user or program client) coordinates numbered components: 1 Plan Execution, 2 Schedule and control Execution, 3 Access to Remote Computers, 4 Job Submittal, 5 Data Transfer, 6 File and Storage Access, 7 Cache Data Replicas, 8 Virtual Data, 9 Grid MPI, 10 Job Status, together with Remote Grid Services and Data]
Taxonomy of Grid Operational Style
Name of Grid Style | Description of Grid Operational or Architectural Style
Semantic Grid | Integration of Grid and Semantic Web meta-data and ontology technologies
Peer-to-peer Grid | Grid built with peer-to-peer mechanisms
Lightweight Grid | Grid designed for rapid deployment and minimum life-cycle support costs
Collaboration Grid | Grid supporting collaborative tools like the Access Grid, whiteboard and shared applications
Robust Reliable Resilient (RRR) or Autonomic Grid | Fault tolerant and self-healing Grid
Virtualization
• The Grid could and sometimes does virtualize various
concepts – should do more
• Location: URI (Uniform Resource Identifier) virtualizes
URL (WS-Addressing goes further)
• Replica management (caching) virtualizes file location,
generalized by the GriPhyN virtual data concept
• Protocol: message transport and WSDL bindings
virtualize transport protocol as a QoS request
• P2P or Publish-subscribe messaging virtualizes matching
of source and destination services
• Semantic Grid virtualizes Knowledge as a meta-data
query
• Brokering virtualizes resource allocation
• Virtualization implies all references can be indirect and
needs powerful mapping (look-up) services -- metadata
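Replica management is the simplest case of “all references can be indirect”: clients hold a logical file name, and a look-up service maps it to one of several physical copies. The catalog below is a minimal sketch; the logical and physical names are hypothetical, and real replica catalogs add consistency, authorization and smarter site selection.

```python
import random
from typing import Dict, List, Optional


class ReplicaCatalog:
    """Logical file name -> physical replicas; clients never see locations directly."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = {}

    def register(self, lfn: str, pfn: str) -> None:
        self._replicas.setdefault(lfn, []).append(pfn)

    def resolve(self, lfn: str, prefer_site: Optional[str] = None) -> str:
        copies = self._replicas.get(lfn)
        if not copies:
            raise KeyError(lfn)
        if prefer_site:
            for pfn in copies:
                if prefer_site in pfn:
                    return pfn          # nearest copy wins
        return random.choice(copies)    # otherwise any copy will do
```

Moving, adding or deleting a copy only changes the catalog; every program that names the logical file keeps working, which is the virtualization the slide asks for.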
Metadata and Semantic Grid
• Can store in one catalog, multiple catalogs or in each service
– Not clear how a coherent approach will develop
• Specialized metadata services like UDDI and MDS (Globus)
– Nobody likes UDDI
– MDS uses old fashioned LDAP
– R-GMA is MDS with a relational database backend
• Basic XML databases (Oracle, Xindice …)
• “By hand” as in the current SERVOGrid Portal, which is roughly the same
as using service-stored SDEs (Service Data Elements) as in OGSI
• The Semantic Web (DARPA) produced a lot of metadata tools aimed at
annotating and searching/reasoning about metadata-enhanced
webpages
– Semantic Grid uses these tools to enrich Web Services
– Implies interesting programming model with traditional analysis
(compiler) augmented by meta-data annotation
Three Metadata Architectures
[Figure: three metadata architectures: (a) a System or Federated Registry or Metadata Catalog backed by one Database; (b) Grid or Domain Specific Metadata Catalogs backed by Database1, Database2 and Database3; (c) Individual Services, each exposing Information Ports holding SDE1 and SDE2]
[Figure: SERVOGrid metadata: an XML Meta-data Service holding Selected GeoInformatics Data, Tool MetaData, Job MetaData, MultiScale Ontologies and Complexity Scripts, linked to Jobs, Database, Tools, Workflow, the SERVOGrid Complexity Simulation Service and SERVOPSE Programs using CCEML (SERVOML)]
Importance of Metadata Service; how should this be implemented?
SERVOGrid Requirements
• Seamless Access to Data repositories and large scale
computers
• Integration of multiple data sources including sensors,
databases, file systems with analysis system
– Including filtered OGSA-DAI
• Rich meta-data generation and access with SERVOGrid
specific Schema extending OpenGIS standards and using
Semantic Grid
• Portals with component model for user interfaces and web
control of all capabilities
• Collaboration to support world-wide work
• Basic Grid tools: workflow and notification
Approach
• Build on e-Science methodology and Grid
technology
• Science applications with multi-scale
models, scalable parallelism, data
assimilation as key issues
– Data-driven models for earthquakes, climate,
environment …
• Use existing code/database technology
(SQL/Fortran/C++) linked to “Application
Web/OGSA services”
– XML specification of models, computational
steering, scale supported at “Web Service” level,
as we don’t need “high performance” here
– Allows use of Semantic Grid technology
[Figure: typical codes wrapped as an Application WS, with WS linking to the user and to other WS (data sources)]
Integration of Data and Filters
• One has the OGSA-DAI Data repository interface
combined with WSDL of the (Perl, Fortran, Python …)
filter
• User only sees WSDL not data syntax
• Some non-trivial issues as to where the filtering compute
power is
– Microsoft says filter next to data
[Figure: a Filter exposed through its own WSDL, reading from a DB through the OGSA-DAI Interface]
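The “filter next to data” question is largely about how much crosses the network. The sketch below counts records shipped when the client pulls everything and filters locally versus when the filter runs inside the data service. The `DataService` class and record schema are hypothetical and do not reflect the OGSA-DAI API.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, float]   # (station id, measurement); hypothetical schema


class DataService:
    """Stand-in for a data repository service; counts records sent over the wire."""

    def __init__(self, records: List[Record]) -> None:
        self._records = records
        self.records_sent = 0

    def fetch_all(self) -> List[Record]:
        self.records_sent += len(self._records)
        return list(self._records)

    def fetch_filtered(self, keep: Callable[[Record], bool]) -> List[Record]:
        """Run the user's filter next to the data; ship only what survives."""
        out = [r for r in self._records if keep(r)]
        self.records_sent += len(out)
        return out
```

Both paths give the same answer; the difference is transfer volume versus where the filtering compute power has to live, which is the non-trivial trade-off the slide mentions.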
SERVOGrid Complexity Computing Environment
[Figure: Users reach services through Portal Aggregation; a Middle Tier with XML Interfaces provides CCE Control over a Database Service, Sensor Service, XML Meta-data Service, Compute Service, Parallel Simulation Service, Complexity Simulation Service, Visualization Service and Application Services 1, 2 and 3]
[Figure: OGSA-DAI Grid Services and Grid Data Assimilation feed HPC Simulation, linked to Analysis, Control and Visualize steps. This type of Grid integrates with parallel computing: multiple HPC facilities, but only one is used at a time; many simultaneous data sources and sinks; distributed filters massage data for simulation]
SERVOGrid (Complexity) Computing Model
Data Assimilation
• Data assimilation implies one is solving some optimization
problem which might have Kalman Filter like structure
$$\min_{\text{Theoretical Unknowns}} \; \sum_{i=1}^{N_{obs}} \frac{\left(\text{Data}_i(\text{position}, \text{time}) - \text{Simulated\_Value}_i\right)^2}{\text{Error}_i^2}$$
• As discussed by DAO at the Earth Science meeting, one will
become more and more dominated by the data (N_obs much
larger than the number of simulation points).
• Natural approach is to form for each local (position, time)
patch the “important” data combinations so that
optimization doesn’t waste time on large error or insensitive
data.
• Data reduction should be done in a natural distributed fashion, NOT on the
HPC machine, as distributed computing is most cost effective
when calculations are essentially independent
– Filter functions must be transmitted from HPC machine
Distributed Filtering
$$N_{obs}^{\text{local patch}} \gg N_{filtered}^{\text{local patch}} \approx \text{Number\_of\_Unknowns}^{\text{local patch}}$$
In the simplest approach, the filtered data are obtained by linear transformations
of the original data based on the Singular Value Decomposition of the least-squares
matrix
[Figure: geographically distributed sensor patches (local patch 1 and local patch 2, each with N_obs observations) on a distributed machine reduce their data to N_filtered values; the HPC machine factorizes the matrix into a product of local patches, sends the needed filter to each patch and receives the filtered data]
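The SVD-based reduction can be checked numerically. The sketch assumes one local patch whose observations depend linearly on k local unknowns through a known matrix A (with full column rank) and uses NumPy, which the slides do not mention. Weighting rows by 1/Error_i turns the chi-squared sum into an ordinary least-squares problem; the thin SVD then gives k filtered numbers U^T d_w from which the same minimizer follows exactly. In the distributed setting U^T W would be the filter sent from the HPC machine to the sensor patch; here both sides run in one process.

```python
import numpy as np


def local_filter(A: np.ndarray, d: np.ndarray, err: np.ndarray):
    """Reduce one patch's N_obs observations to k = A.shape[1] filtered values.

    A   : (N_obs, k) linearized map from the k local unknowns to the observations
    d   : (N_obs,) observed data
    err : (N_obs,) measurement errors, used as 1/Error_i row weights
    Returns the filtered data U^T d_w plus S and Vt needed to recover the unknowns.
    """
    Aw = A / err[:, None]                               # weight rows by 1/Error_i
    dw = d / err
    U, S, Vt = np.linalg.svd(Aw, full_matrices=False)   # thin SVD, U is (N_obs, k)
    return U.T @ dw, S, Vt                              # k numbers instead of N_obs


def solve_from_filtered(filtered: np.ndarray, S: np.ndarray, Vt: np.ndarray) -> np.ndarray:
    """Least-squares unknowns from the filtered data alone: x = V S^-1 (U^T d_w)."""
    return Vt.T @ (filtered / S)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(500, 3))                  # N_obs = 500, k = 3 (illustrative)
    err = np.full(500, 0.1)
    d = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)
    f, S, Vt = local_filter(A, d, err)
    print(f.shape, solve_from_filtered(f, S, Vt))
```

The filtered data do not reproduce the full chi-squared value (they drop a constant residual term), but they give the identical minimizer, which is all the optimization needs.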
Two-level Programming I
• The paradigm implicitly assumes a two-level Programming
Model
• We make a Service (same as a “distributed object” or
“computer program” running on a remote computer) using
conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and
databases
• The Grid is built by coordinating such services assuming
we have solved problem of programming the service
Two-level Programming II
• The Grid is discussing the composition of distributed
services with the runtime interfaces to Grid as
opposed to UNIX pipes/data streams
[Figure: Service1, Service2, Service3 and Service4 composed together]
• Familiar from the use of UNIX Shell, Perl or Python scripts
to produce real applications from core programs
• Such interpretative environments are the single processor
analog of Grid Programming
• Some projects like GrADS from Rice University are
looking at integration between service and composition
levels but dominant effort looks at each level separately
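The two levels can be sketched with Python generators standing in for services and ordinary function composition standing in for the Grid runtime, much as a shell line `sensor | filter | scale | sink` composes core programs. All service names and behaviors are hypothetical; a Grid version would pass messages between remote services rather than iterate in one process.

```python
from typing import Iterable, Iterator


# Level 1: each "service" is an ordinary program written with conventional tools.
def sensor_service(n: int) -> Iterator[float]:
    for i in range(n):
        yield float(i)              # hypothetical sensor readings


def filter_service(stream: Iterable[float]) -> Iterator[float]:
    for x in stream:
        if x % 2 == 0:
            yield x                 # keep even readings only


def scale_service(stream: Iterable[float]) -> Iterator[float]:
    for x in stream:
        yield 10 * x


def sink_service(stream: Iterable[float]) -> float:
    return sum(stream)


# Level 2: composition, the analog of `sensor | filter | scale | sink` in a shell.
def application(n: int) -> float:
    return sink_service(scale_service(filter_service(sensor_service(n))))
```

Nothing in `application` depends on how each service is implemented, which is why the two levels can be, and mostly are, developed separately.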
Why we can dream of using HTTP and that slow stuff
• We have at least three tiers in the computing environment
• Client (user portal)
• “Middle Tier” (Web Servers/brokers)
• Back end (databases, files, computers etc.)
• In Grid programming, we use HTTP (and used to use
CORBA and Java RMI) in the middle tier ONLY to
manipulate a proxy for the real job
– Proxy holds metadata
– Control communication in middle tier only uses metadata
– “Real” (data transfer) high performance communication in
back end
[Figure: layered architecture: Grid Computing Environments with User Services and Portal Services; an Application Service holding Application Metadata; Middleware and the “Core” Grid; the Actual Application; Raw (HPC) Resources and Database; System Services appear at each layer]
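The reason slow middle-tier protocols are acceptable is that they only ever touch a small proxy. The sketch below models a middle tier that submits and tracks jobs purely through metadata; bulk data movement between back-end resources is deliberately not modeled. Class names, fields and host names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class JobProxy:
    """Middle-tier stand-in for a real back-end job: holds metadata only."""
    job_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


class MiddleTier:
    """Control messages (small, could travel over HTTP/SOAP) touch only proxies.
    High-volume data transfer happens directly in the back end (not modeled here)."""

    def __init__(self) -> None:
        self.proxies: Dict[str, JobProxy] = {}

    def submit(self, job_id: str, executable: str, host: str) -> JobProxy:
        proxy = JobProxy(job_id, {"executable": executable, "host": host,
                                  "status": "queued"})
        self.proxies[job_id] = proxy
        return proxy

    def update_status(self, job_id: str, status: str) -> None:
        self.proxies[job_id].metadata["status"] = status

    def status(self, job_id: str) -> str:
        return self.proxies[job_id].metadata["status"]
```

Each control call moves a few hundred bytes at most, so even 100’s of milliseconds per message is negligible next to a job that runs for hours on the back end.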
Workflow and SERVOGrid CCE
• SERVOGrid will use workflow technology to support both
– “code and data coupling”
– Multiscale features
• Implementing multiscale model requires
– building Web services for each model,
– describing each model with metadata and
– Describing linkage of models (linkage of ports on web services)
– And describing when to use which scale model
• So workflow and multiscale depend on web services described by
rich metadata
• This analysis isn’t correct if scales must be “tightly coupled” as
current workflow won’t support this (area addressed by CCA from
DoE)
– We should focus on multiscale models with loose “service”
coupling
– Hopefully we will learn how to take same architecture, compile
away inefficiencies and get high performance on tighter coupling
than conventional distributed workflow
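The multiscale recipe above (a service per model, metadata describing each, a rule for when to use which scale, and linkage of ports) can be sketched for the loosely coupled case. The two models, their `max_km` metadata and the linear response functions are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModelService:
    name: str
    metadata: Dict[str, object]        # describes the model, e.g. its scale (hypothetical)
    run: Callable[[float], float]      # input port -> output port


# Two hypothetical models of the same process at different scales.
MODELS: List[ModelService] = [
    ModelService("fault_patch", {"scale": "fine", "max_km": 10},
                 lambda strain: strain * 1.5),
    ModelService("regional", {"scale": "coarse", "max_km": 1000},
                 lambda strain: strain * 1.2),
]


def choose_model(region_km: float) -> ModelService:
    """'When to use which scale model', decided from metadata alone."""
    for m in sorted(MODELS, key=lambda s: s.metadata["max_km"]):
        if region_km <= m.metadata["max_km"]:
            return m
    raise ValueError("no model covers this scale")


def workflow(region_km: float, strain: float,
             links: List[Callable[[float], float]]) -> float:
    """Loosely coupled chain: the chosen model's output port feeds later services."""
    value = choose_model(region_km).run(strain)
    for step in links:
        value = step(value)
    return value
```

Every coupling decision here reads metadata rather than code, so adding a new scale means registering one more described service, not rewriting the workflow.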