Scientific Grid RM and Discovery
Karl Czajkowski
Center for Grid Technologies
USC/ISI

Talk Outline
• Introduction
  – Scientific Grid RM
  – Virtual Organizations
• MDS-2 Architecture
  – Distributed service model
  – Mapping to OGSA
• GRAM-1 and GRAM-2 Architectures
  – Service model
  – Mapping to OGSA

Scientific Resource Management
• Supercomputing jobs are services
  – Provide domain-specific capability
  – Require resource/hosting-environment
  – Many legacy applications
• Leading-edge users act like administrators
  – Deploy jobs dynamically
  – Reconfigure service environment
• Complex resource environment
  – This is the root of Grid computing

Resource Discovery/Monitoring
[Figure: dispersed users and resources (R) spread across a network, with some resources grouped into VO-A and VO-B]
• Distributed users and resources
• Variable resource status
• Variable grouping and connectivity

Resource Acquisition Phases
• Resource Discovery
  – “What resources are relevant?”
  – Bootstraps planner state
• Resource Status Inquiry
  – “How do resources compare (now)?”
  – Refines planner knowledge
• Resource Control
  – “Did I acquire the resources?”
  – Affects service environment

Base Required Features
• Virtual Organizations (VOs)
  – Group together resources and users
  – Support community-specific “discovery”
  – Specialized “views”
• Scalability
  – Many resources
  – Many VOs
  – Graceful degradation of service

Virtual Organizations
• Collaborating individuals and institutions
  – Shared goals
  – Enable sharing of resources
  – Non-locality of participants
• Dynamic in nature
  – VOs come and go
  – Resources join and leave VOs
  – Resources change status and fail
• Community-wide goals

MDS-2 Service Architecture
[Figure: discovery (GRIP?) of VO-specific Aggregate Directories (A); lookup (GRIP) and registration (GRRP) between the directories and standard Resource Description services (R)]
• Dynamic Registration via Reg. Protocol (GRRP)
• Resource Inquiry via Info. Protocol (GRIP)
  – Co-located with resource on network
• Resource Discovery (via GRIP or other)
  – Using GRIP allows resource/directory hierarchy

Distributed Services
[Figure: resources (R) sending registration messages to directories (D); replicated directories, a fault partition, and divergent directories across VO-A and VO-B]
• Service scales with Grid growth
• Loose consistency model tolerates failures
• Interoperability by GRIP/GRRP protocols

Soft-state Registration
• Periodic notification
  – “Service/resource is available”
  – Expected-frequency metadata
• Automatic index/registry construction
  – Add new resources to registry
  – Invite resources to join new registry
• Self-cleaning
  – Reduce occurrence of “dead” references

Mapping to OGSA
• GRIP: OGSI ServiceData enquiry
  – Self-describing services
  – Extensible data model
  – Query and subscription/notification
• GRRP: OGSI Registry
  – Simple case of “mutable store”
• GRRP: OGSI ServiceData notification
  – Allows general Index transformation

Index Namespace Management
[Figure: an upper AggDir indexing the AggDirs of organizations O1 and O2, which in turn index ResDesc services on hosts R1, R2, R3]
  Upper AggDir:  host: hn=R1, O=O1
                 host: hn=R2, O=O1
                 host: hn=R3, O=O1
                 host: hn=R1, O=O2
                 host: hn=R2, O=O2
  AggDir O1:     host: hn=R1
                 host: hn=R2
                 host: hn=R3
  AggDir O2:     host: hn=R1
                 host: hn=R2
  ResDesc:       R1, R2, R3 (O1) and R1, R2 (O2), each publishing a "host" entry
• ServiceData is named within home service
• Qualifying “source name” to disambiguate in index, or use URLs to refer to remote info

GRAM Architecture
[Figure: an Application passes RSL to a Broker (RSL specialization, using queries and info from the Information Service); ground RSL goes to a Co-allocator, which sends simple ground RSL to GRAM instances in front of the local resource managers (LSF, Condor, NQE)]

Resource Specification Language
• Common notation for exchange of information between components
  – Meant as a machine-to-machine language
• RSL provides two types of information:
  – Resource requirements: machine type, number of nodes, memory, etc.
  – Job configuration: directory, executable, args, environment

Advance Reservation and Other Generalizations
• General-purpose Architecture for Reservation and Allocation (GARA)
  – 2nd generation resource management services
• Broadens GRAM on two axes
  – Generalize to support various resource types
    > CPU, storage, network, devices, etc.
  – Advance reservation of resources, in addition to allocation
• Currently a research prototype

GARA: The Big Picture
[Figure: a Co-Reservation Agent and the MDS Info Service, with Gatekeepers in front of four resource managers: GRIO RM, Scheduler RM, Diffserv RM, DSRT RM]

GRAM-2 (planned for GT-3)
• Advance reservations
  – As prototyped in GARA in previous 2 years
• Multiple resource types
  – Manage anything: storage, networks, etc., etc.
• Recoverable requests, timeout, etc.
• Exploit OGSI capabilities
  – Reliable lifetime management
  – Use ServiceData mechanisms
  – Depend on generalized security solutions
Karl Czajkowski, Steve Tuecke, others

GRAM-2 Agreement Model
• Submission agreements
  – Manager agrees to run task for client
  – Temporary service deployment
• Assignment agreements
  – Manager agrees to provide resources
  – Advance reservation and QoS
• Binding agreements
  – Manager binds assignment to task
  – Allows complex RM arrangements

Mapping to OGSA
• Manager is a Factory
  – Agreements rendered as transient services
  – Agreements present simple meta-interface
• Agreements embed Resource Description
  – XML-based Resource Model forms “RSL2”
• Tasks may reflect as Grid Services
  – Provide OGSI service interface
    > Including ServiceData and domain-specific methods
  – Appear in service registries
    > Become discoverable resources themselves

Moving Forward
• MDS-2 Architecture details
  – Paper from HPDC-10 on www.globus.org
• GRAM-2 Architecture details
  – Paper submitted for publication
    > Contact me (Karl Czajkowski) for access
  – Grid Service rendering being outlined
    > Perhaps a BOF at GGF-5?
• Resource Modeling
  – Not just for requests… also advertisement
  – Needs GGF discussion
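
Illustrative sketch: soft-state registration
The soft-state registration idea from the MDS-2 slides (periodic notification, expected-frequency metadata, self-cleaning of "dead" references) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MDS-2 or GRRP implementation: the names SoftStateRegistry, Registration, register, purge, and lookup, and the ttl parameter standing in for the expected-frequency metadata, are all hypothetical. The injectable clock is likewise an assumption, added so the expiry behavior can be exercised without waiting in real time.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Registration:
    """One soft-state entry: who registered, and when the entry lapses."""
    name: str
    expires_at: float


@dataclass
class SoftStateRegistry:
    """Hypothetical aggregate directory; entries live only while refreshed."""
    clock: Callable[[], float] = time.monotonic  # assumption: injectable time source
    entries: Dict[str, Registration] = field(default_factory=dict)

    def register(self, name: str, ttl: float) -> None:
        """Record or refresh a registration that is valid for `ttl` seconds."""
        self.entries[name] = Registration(name, self.clock() + ttl)

    def purge(self) -> List[str]:
        """Drop entries whose refresh is overdue; return the removed names."""
        now = self.clock()
        dead = sorted(n for n, r in self.entries.items() if r.expires_at <= now)
        for n in dead:
            del self.entries[n]
        return dead

    def lookup(self) -> List[str]:
        """Return the names of resources currently believed to be available."""
        self.purge()
        return sorted(self.entries)
```

A resource would call register periodically; if it fails or leaves the VO, it simply stops refreshing and the directory forgets it at the next purge. No explicit deregistration message is needed, which is what lets a loose consistency model of this kind tolerate failures.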