Information Services The Globus Toolkit™: The Globus Project™ Argonne National Laboratory

advertisement
The Globus Toolkit™:
Information Services
The Globus Project™
Argonne National Laboratory
USC Information Sciences Institute
http://www.globus.org
Grid Information Services
z
System information is critical to operation
of the grid and construction of applications
– What resources are available?
> Resource discovery
– What is the “state” of the grid?
> Resource selection
– How to optimize resource use
> Application configuration and adaptation?
z
We need a general information
infrastructure to answer these questions
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
2
Examples of Useful Information
z
Characteristics of a compute resource
– IP address, software available, system
administrator, networks connected to, OS
version, load
z
Characteristics of a network
– Bandwidth and latency, protocols, logical
topology
z
Characteristics of the Globus infrastructure
– Hosts, resource managers
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
3
Grid Information: Facts of Life
z
Information is always old
– Time of flight, changing system state
– Need to provide quality metrics
z
Distributed state hard to obtain
– Complexity of global snapshot
z
Component will fail
z
Scalability and overhead
z
Many different usage scenarios
– Heterogeneous policy, different information
organizations, etc.
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
4
Grid Information Service
z
z
z
Provide access to static and dynamic
information regarding system components
A basis for configuration and adaptation in
heterogeneous, dynamic environments
Requirements and characteristics
– Uniform, flexible access to information
– Scalable, efficient access to dynamic data
– Access to multiple information sources
– Decentralized maintenance
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
5
The GIS Problem: Many Information
Sources, Many Views
R
R
?
R
VO C
R
R
R
R
?
R
VO A
R
R
?
R
October 22, 2001
R
R
R
?
R
R
R
VO B
Intro to Grid Computing and Globus Toolkit™
6
What is a Virtual Organization?
• Facilitates the workflow of a group of users
across multiple domains who share (some
of) their resources to solve particular
classes of problems
• Collates and presents information about
these resources in a uniform view
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
7
Two Classes Of Information Servers
z
Resource Description Services
– Supplies information about a specific
resource (e.g. Globus 1.1.3 GRIS).
z
Aggregate Directory Services
– Supplies collection of information which was
gathered from multiple GRIS servers (e.g.
Globus 1.1.3 GIIS).
– Customized naming and indexing
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
8
Information Protocols
z
Grid Resource Registration Protocol
– Support information/resource discovery
– Designed to support machine/network
failure
z
Grid Resource Inquiry Protocol
– Query resource description server for
information
– Query aggregate server for information
– LDAP V3.0 in Globus 1.1.3
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
9
GIS Architecture
Customized Aggregate Directories
Users
Enquiry
A
A
Protocol
Registration
Protocol
R
R
R
R
Standard Resource Description Services
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
10
Metacomputing Directory Service
z
z
Use LDAP as Inquiry
Access information in a distributed directory
– Directory represented by collection of LDAP
servers
– Each server optimized for particular function
z
Directory can be updated by:
– Information providers and tools
– Applications (i.e., users)
– Backend tools which generate info on demand
z
Information dynamically available to tools and
applications
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
11
Two Classes Of MDS Servers
z
Grid Resource Information Service (GRIS)
– Supplies information about a specific resource
– Configurable to support multiple information
providers
– LDAP as inquiry protocol
z
Grid Index Information Service (GIIS)
– Supplies collection of information which was
gathered from multiple GRIS servers
– Supports efficient queries against information
which is spread across multiple GRIS server
– LDAP as inquiry protocol
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
12
LDAP Details
z
Lightweight Directory Access Protocol
– IETF Standard
– Stripped down version of X.500 DAP protocol
– Supports distributed storage/access (referrals)
– Supports authentication and access control
z
Defines:
– Network protocol for accessing directory contents
– Information model defining form of information
– Namespace defining how information is
referenced and organized
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
13
MDS Components
z
LDAP 3.0 Protocol Engine
– Based on OpenLDAP with custom backend
– Integrated caching
z
Information providers
– Delivers resource information to backend
z
APIs for accessing & updating MDS contents
– C, Java, PERL (LDAP API, JNDI)
z
Various tools for manipulating MDS contents
– Command line tools, Shell scripts & GUIs
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
14
Grid Resource Information Service
z
Server which runs on each resource
– Given the resource DNS name, you can find the
GRIS server (well known port = 2135)
z
Provides resource specific information
– Much of this information may be dynamic
> Load, process information, storage information, etc.
> GRIS gathers this information on demand
z
“White pages” lookup of resource information
– Ex: How much memory does machine have?
z
“Yellow pages” lookup of resource options
– Ex: Which queues on machine allows large jobs?
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
15
Grid Index Information Service
z
GIIS describes a class of servers
– Gathers information from multiple GRIS servers
– Each GIIS is optimized for particular queries
> Ex1: Which Alliance machines are >16 process SGIs?
> Ex2: Which Alliance storage servers have >100Mbps
bandwidth to host X?
– Akin to web search engines
z
Organization GIIS
– The Globus Toolkit ships with one GIIS
– Caches GRIS info with long update frequency
> Useful for queries across an organization that rely on
relatively static information (Ex1 above)
z
Can be merged into GRIS
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
16
Finding a GRIS
and Server Registration
z
A GRIS or GIIS server can be configured to
(de-) register itself during
startup/shutdown
– Targets specified in configuration file
z
Softstate registration protocol
– Good behavior in case of failure
z
Allows for federations of information
servers
– E.g. Argonne GRIS can register with both
Alliance and DOE GIIS servers
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
17
Logical MDS Deployment
Grads
Gusto
GIIS
ISI
GRISes
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
18
MDS Commands
z
LDAP defines a set of standard commands
ldapsearch, etc.
z
We also define MDS-specific commands
– grid-info-search, grid-info-host-search
z
APIs are defined for C, Java, etc.
– C: OpenLDAP client API
> ldap_search_s(), …
– Java: JNDI
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
19
Information Services API
z
RFC 1823 defines an IETF draft standard
client API for accessing LDAP databases
– Connect to server
– Pose query which returns data structures
contains sets of object classes and
attributes
– Functions to walk these data structures
z
Globus does not provide an LDAP API. We
recommend the use of OpenLDAP, an open
source implementation of RFC 1823.
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
20
Searching an LDAP Directory
grid-info-search [options] filter [attributes]
z
Default grid-info-search options
-h mds.globus.org
MDS server
-p 389
MDS port
-b “o=Grid”
search start point
-T 30
LDAP query timeout
-s sub
scope = subtree
alternatives:
base : lookup this entry
one
: lookup immediate children
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
21
Searching a GRIS Server
grid-info-host-search [options] filter [attributes]
z
Exactly like grid-info-search, except defaults:
-h localhost
-p 2135
z
GRIS server
GRIS port
Example:
grid-info-host-search –h pitcairn “dn=*” dn
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
22
Filtering
z
Filters allow selection of object based on
relational operators (=, ~=,<=, >=)
– grid-info-search “cputype=*”
z
Compound filters can be construct with
Boolean operations: (&, |, !)
– grid-info-search “(&(cputype=*)(cpuload1<=1.0))”
– grid-info-search “(&(hn~=sdsc.edu)(latency<=10))”
z
Hints:
required
– white space is significant
– use -L for LDIF format
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
23
Example: Filtering
% grid-info-host-search -L “(objectclass=GlobusSoftware)”
dn: sw=Globus, hn=pitcairn.mcs.anl.gov, dc=mcs, dc=anl, dc=gov,
o=Grid
objectclass: GlobusSoftware
releasedate: 2000/04/11 19:48:29
releasemajor: 1
releaseminor: 1
releasepatch: 3
releasebeta: 11
lastupdate: Sun Apr 30 19:28:19 GMT 2000
objectname: sw=Globus, hn=pitcairn.mcs.anl.gov, dc=mcs, dc=anl,
dc=gov, o=Grid
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
24
Example: Attribute Selection
% grid-info-host- search -L “(objectclass=*)” dn hn
– Returns the distinguished name (dn) and hostname (hn) of
all objects
dn: sw=Globus, hn=pitcairn.mcs.anl.gov, dc=mcs, dc=anl, dc=gov, o=Grid
dn: hn=pitcairn.mcs.anl.gov, dc=mcs, dc=anl, dc=gov, o=Grid
hn: pitcairn.mcs.anl.gov
dn: service=jobmanager, hn=pitcairn.mcs.anl.gov, dc=mcs, dc=anl, dc=gov,
o=Grid
hn: pitcairn.mcs.anl.gov
dn: queue=default, service=jobmanager, hn=pitcairn.mcs.anl.gov, dc=mcs,
dc=anl,
dc=gov, o=Grid
– Objects without hn fields are still listed
– DNs are always listed
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
25
Example: Discovering CPU Load
z
Retrieve CPU load fields of compute resources
% grid-info-search -L “(objectclass=GlobusComputeResource)” \
dn cpuload1 cpuload5 cpuload15
dn: hn=lemon.mcs.anl.gov, ou=MCS, o=Argonne National Laboratory,
o=Globus, c=US
cpuload1: 0.48
cpuload5: 0.20
cpuload15: 0.03
dn: hn=tuva.mcs.anl.gov, ou=MCS, o=Argonne National Laboratory,
o=Globus, c=US
cpuload1: 3.11
cpuload5: 2.64
cpuload15: 2.57
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
26
The Globus Toolkit™:
Resource Management
Services
The Globus Project™
Argonne National Laboratory
USC Information Sciences Institute
http://www.globus.org
The Challenge
z
Enabling secure, controlled remote access
to heterogeneous computational resources
and management of remote computation
– Authentication and authorization
– Resource discovery & characterization
– Reservation and allocation
– Computation monitoring and control
z
Addressed by new protocols & services
– GRAM protocol as a basic building block
– Resource brokering & co-allocation services
– GSI for security, MDS for discovery
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
28
Resource Management
z
z
z
The Grid Resource Allocation Management
(GRAM) protocol and client API allows
programs to be started on remote
resources, despite local heterogeneity
Resource Specification Language (RSL) is
used to communicate requirements
A layered architecture allows applicationspecific resource brokers and co-allocators
to be defined in terms of GRAM services
– Integrated with Condor, PBS, MPICH-G2, …
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
29
Resource
Management Architecture
RSL
specialization
Broker
RSL
Queries
& Info
Application
Ground RSL
Information
Service
Co-allocator
Simple ground RSL
Local
resource
managers
October 22, 2001
GRAM
GRAM
GRAM
LSF
Condor
NQE
Intro to Grid Computing and Globus Toolkit™
30
GRAM Protocol
z
GRAM-1: Simple HTTP-based RPC
– Job request
> Returns a “job contact”: Opaque string that can be passed
between clients, for access to job
– Job cancel, status, signal
– Event notification (callbacks) for state changes
> Pending, active, done, failed, suspended
z
GRAM-1.5 (U Wisconsin contribution)
– Add reliability improvements
> Once-and-only-once submission
> Recoverable job manager service
> Reliable termination detection
z
GRAM-2: Moving to Web Services (SOAP)…
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
31
Globus Toolkit Implementation
z
Gatekeeper
– Single point of entry
– Authenticates user, maps to local security
environment, runs service
– In essence, a “secure inetd”
z
Job manager
– A gatekeeper service
– Layers on top of local resource
management system (e.g., PBS, LSF, etc.)
– Handles remote interaction with the job
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
32
GRAM Components
Client
MDS client API calls
to locate resources
MDS: Grid Index Info Server
Site boundary
MDS client API calls
to get resource info
GRAM client API calls to
MDS: Grid Resource Info Server
request resource allocation
and process creation.
Query current status
of resource
GRAM client API state
change callbacks
Grid Security
Local Resource Manager
Infrastructure
Allocate &
Request
create processes
Job Manager
Create
Gatekeeper
Parse
RSL Library
October 22, 2001
Monitor &
control
Process
Process
Process
Intro to Grid Computing and Globus Toolkit™
33
Job Submission Interfaces
z
Globus Toolkit includes several command
line programs for job submission
– globus-job-run: Interactive jobs
– globus-job-submit: Batch/offline jobs
– globusrun: Flexible scripting infrastructure
z
Others are building better interfaces
– General purpose
> Condor-G, PBS, GRD, Hotpage, etc
– Application specific
> ECCE’, Cactus, Web portals
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
34
globus-job-run
z
z
For running of interactive jobs
Additional functionality beyond rsh
– Ex: Run 2 process job w/ executable staging
globus-job-run -: host –np 2 –s myprog arg1 arg2
– Ex: Run 5 processes across 2 hosts
globus-job-run \
-: host1 –np 2 –s myprog.linux arg1 \
-: host2 –np 3 –s myprog.aix arg2
– For list of arguments run:
globus-job-run -help
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
35
globus-job-submit
z
For running of batch/offline jobs
– globus-job-submit
Submit job
> Same interface as globus-job-run
> Returns immediately
– globus-job-status
Check job status
– globus-job-cancel
Cancel job
– globus-job-get-output
Get job stdout/err
– globus-job-clean
Cleanup after job
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
36
globusrun
z
Flexible job submission for scripting
– Uses an RSL string to specify job request
– Contains an embedded globus-gass-server
> Defines GASS URL prefix in RSL substitution variable:
(stdout=$(GLOBUSRUN_GASS_URL)/stdout)
– Supports both interactive and offline jobs
z
Complex to use
– Must write RSL by hand
– Must understand its esoteric features
– Generally you should use globus-job-*
commands instead
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
37
Resource Management APIs
z
z
z
z
The globus_gram_client API provides access to all
of the core job submission and management
capabilities, including callback capabilities for
monitoring job status.
The globus_rsl API provides convenience functions
for manipulating and constructing RSL strings.
The globus_gram_myjob allows multi-process jobs
to self-organize and to communicate with each
other.
The globus_duroc_control and
globus_duroc_runtime APIs provide access to
multirequest (co-allocation) capabilities.
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
38
Resource Specification Language
z
Common notation for exchange of
information between components
– Syntax similar to MDS/LDAP filters
z
RSL provides two types of information:
– Resource requirements: Machine type,
number of nodes, memory, etc.
– Job configuration: Directory, executable,
args, environment
z
Globus Toolkit provides an API/SDK for
manipulating RSL
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
39
RSL Syntax
z
Elementary form: parenthesis clauses
– (attribute op value [ value … ] )
z
Operators Supported:
– <, <=, =, >=, > , !=
z
Some supported attributes:
– executable, arguments, environment, stdin, stdout,
stderr, resourceManagerContact,
resourceManagerName
z
Unknown attributes are passed through
– May be handled by subsequent tools
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
40
Constraints: “&”
z
For example:
& (count>=5) (count<=10)
(max_time=240) (memory>=64)
(executable=myprog)
z
“Create 5-10 instances of myprog, each
on a machine with at least 64 MB
memory that is available to me for 4
hours”
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
41
Disjunction: “|”
z
For example:
& (executable=myprog)
( | (&(count=5)(memory>=64))
(&(count=10)(memory>=32)))
z
Create 5 instances of myprog on a
machine that has at least 64MB of
memory, or 10 instances on a machine
with at least 32MB of memory
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
42
RSL and GRAM
z
Will come back to this later...
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
43
Advance Reservation
and Other Generalizations
z
General-purpose Architecture for Reservation
and Allocation (GARA)
– 2nd generation resource management services
z
Broadens GRAM on two axes
– Generalize to support various resource types
> CPU, storage, network, devices, etc.
– Advance reservation of resources, in addition
to allocation
z
Currently a research prototype
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
44
Co-allocation
z
Simultaneous allocation of a resource set
– Handled via optimistic co-allocation based
on free nodes or queue prediction
– In the future, advance reservations will also
be supported (already in prototype)
z
Globus APIs/SDKs support the coallocation of specific multi-requests
– Uses a Globus component called the
Dynamically Updated Request Online
Co-allocator (DUROC)
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
45
Multirequest: “+”
z
A multirequest allows us to specify multiple
resource needs, for example
+ (& (count=5)(memory>=64)
(executable=p1))
(&(network=atm) (executable=p2))
– Execute 5 instances of p1 on a machine with at least
64M of memory
– Execute p2 on a machine with an ATM connection
z
Multirequests are central to co-allocation
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
46
A Co-allocation Multirequest
+( & (resourceManagerContact=
“flash.isi.edu:754:/C=US/…/CN=flash.isi.edu-fork”)
(count=1)
(label="subjob A")
Different resource
(executable= my_app1)
managers
)
Different ( & (resourceManagerContact=
counts
“sp139.sdsc.edu:8711:/C=US/…/CN=sp097.sdsc.edu-lsf")
(count=2)
(label="subjob B")
Different executables
(executable=my_app2)
)
October 22, 2001
Intro to Grid Computing and Globus Toolkit™
47
GARA: The Big Picture
Co-Reservation Agent
Gatekeeper
GRIO RM
October 22, 2001
Gatekeeper
Scheduler RM
MDS Info Service
Gatekeeper
Diffserv RM
Intro to Grid Computing and Globus Toolkit™
Gatekeeper
DSRT RM
48
Resource Management Futures:
GRAM-2 (planned for 2002)
z
Advance reservations
– As prototyped in GARA in previous 2 years
z
Multiple resource types
– Manage anything: storage, networks, etc., etc.
z
Recoverable requests, timeout, etc.
z
Better lifetime management
z
Policy evaluation points for restricted proxies
z
Use of Web Services (WSDL, SOAP)
October
22, 2001
Intro
to Grid Computing
Karl
Czajkowski,
Steve
Tuecke,
othersand Globus Toolkit™
49
Download