gLite, the next generation middleware for Grid computing
Oxana Smirnova (Lund/CERN)
Nordic Grid Neighborhood Meeting, Linköping, October 20, 2004
Uses material from E. Laure and F. Hemmer

gLite
• What is gLite (quoted from http://www.glite.org):
  - "the next generation middleware for grid computing"
  - "collaborative efforts of more than 80 people in 10 different academic and industrial research centers"
  - "part of the EGEE project (http://www.eu-egee.org)"
  - "bleeding-edge, best-of-breed framework for building grid applications tapping into the power of distributed computing and storage resources across the Internet"
• [Diagram: EGEE Activity Areas]
• Nordic contributors: HIP, PDC, UiB

Architecture guiding principles
• Lightweight services
  - Easily and quickly deployable
  - Use existing services where possible as a basis for re-engineering
  - "Lightweight" does not mean fewer services or non-intrusiveness; it means modularity
• Interoperability
  - Allow for multiple implementations
• Portability
  - Being built on Scientific Linux and Windows
  - 60+ external dependencies
• Performance/scalability and resilience/fault tolerance
• Large-scale deployment and continuous usage
  - Reduce requirements on participating sites
  - Flexible service deployment: multiple services running on the same physical machine (if possible)
• Co-existence with deployed infrastructure
  - Co-existence with LCG-2 and OSG (US) is essential for the EGEE Grid service
• Service-oriented approach …

Service-oriented approach
• Adopt the Open Grid Services Architecture, with components that are:
  - Loosely coupled (via messages)
  - Accessible across the network; modular and self-contained; with clean modes of failure
  - Able to change implementation without changing interfaces
  - Able to be developed in anticipation of new use cases
  (a minimal code sketch of this interface/implementation separation follows the Approach slide below)
• Follow WSRF standardization
  - No mature WSRF implementations exist to date, so start with plain web services
  - WSRF compliance is not an immediate goal, but the WSRF evolution is followed
  - WS-I compliance is important

gLite vs LCG-2
• Intended to replace LCG-2
• Starts with existing components
• Aims to address LCG-2 shortcomings and advanced needs from applications (in particular, feedback from the DCs)
• Prototyping: short development cycles for fast user feedback
• Initial web-services-based prototypes are being tested with representatives from the application groups
• [Diagram: LCG-1 and LCG-2 are Globus 2 based; gLite-1 and gLite-2 are web services based]

Approach
• Exploit experience and components from existing projects: AliEn, VDT, EDG, LCG, and others
• Design team works out the architecture and design
  - Architecture: https://edms.cern.ch/document/476451
  - Design: https://edms.cern.ch/document/487871/
  - Feedback and guidance from EGEE PTF, EGEE NA4, LCG GAG, LCG Operations, LCG ARDA
• Components are initially deployed on a prototype infrastructure
  - Small scale (CERN & Univ. Wisconsin)
  - Get user feedback on service semantics and interfaces
• After internal integration and testing, components are to be deployed on the pre-production service
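To make the "change implementation without changing interfaces" point from the Service-oriented approach slide concrete, here is a minimal Python sketch. It is not gLite code; the names (JobSubmissionService, LocalStubService, run_analysis) are hypothetical and chosen only to illustrate that clients bind to a stable interface while the implementation behind it can be swapped.

    # Minimal illustrative sketch (not gLite code): clients depend only on a
    # stable interface, so the implementation behind it can be replaced
    # without changing client code.
    from abc import ABC, abstractmethod


    class JobSubmissionService(ABC):
        """Hypothetical, simplified workload-management-style interface."""

        @abstractmethod
        def submit(self, job_description: str) -> str:
            """Accept a job description and return an opaque job identifier."""


    class LocalStubService(JobSubmissionService):
        """Toy in-process implementation, e.g. for testing service semantics."""

        def __init__(self) -> None:
            self._jobs: dict[str, str] = {}

        def submit(self, job_description: str) -> str:
            job_id = f"job-{len(self._jobs) + 1}"
            self._jobs[job_id] = job_description
            return job_id


    def run_analysis(service: JobSubmissionService) -> str:
        # Written against the interface only: a web-service-backed
        # implementation could replace LocalStubService without changes here.
        return service.submit('Executable = "analysis.sh";')


    if __name__ == "__main__":
        print(run_analysis(LocalStubService()))  # prints "job-1"

The same separation is what lets the plain-WS prototypes above evolve towards WSRF later without disturbing the application-facing interfaces.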
Subsystems/components
• LCG-2: components; gLite: services
• [Diagram listing: User Interface, AliEn, Computing Element, Worker Node, Workload Management System, Package Management, Job Provenance, Logging and Bookkeeping, Data Management, Information & Monitoring, Job Monitoring, Accounting, Site Proxy, Security, Fabric management]

Workload Management System
• [Architecture diagram]

Computing Element
• Works in push or pull mode
• Site policy enforcement
• Exploits the new Globus gatekeeper and Condor-C (close interaction with the Globus and Condor teams)
• Acronyms in the diagram: CEA = Computing Element Acceptance, JC = Job Controller, MON = Monitoring, LRMS = Local Resource Management System

Data Management
• Scheduled data transfers (like jobs)
• Reliable file transfer
• Site self-consistency
• SRM-based storage

Catalogs
• File Catalog
  - Filesystem-like view on logical file names (LFNs)
  - Keeps track of the sites where data is stored
  - Conflict resolution
• Replica Catalog
  - Keeps information at a site
• Metadata Catalog
  - Attributes of files on the logical level
  - Boundary between generic middleware and application layer
• [Diagram: metadata and LFNs map to GUIDs and site IDs in the File Catalog; per-site Replica Catalogs (Site A, Site B) map GUIDs to SURLs; see the sketch below]
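A minimal Python sketch of the catalog layering above, assuming a purely in-memory toy layout: it is not the gLite catalog API, and all class and field names are invented for illustration. It shows an LFN resolving to a GUID in the File Catalog, and the GUID resolving to SURLs via per-site Replica Catalogs.

    # Illustrative sketch only (not gLite code): how an LFN could resolve to
    # physical replicas via the GUID, following the catalog layering above.
    from dataclasses import dataclass, field


    @dataclass
    class FileCatalog:
        """Filesystem-like view: LFN <-> GUID, plus the sites holding replicas."""
        lfn_to_guid: dict[str, str] = field(default_factory=dict)
        guid_to_sites: dict[str, list[str]] = field(default_factory=dict)


    @dataclass
    class ReplicaCatalog:
        """Per-site catalog: GUID -> storage URLs (SURLs) at that site."""
        guid_to_surls: dict[str, list[str]] = field(default_factory=dict)


    def resolve(lfn: str, fc: FileCatalog, rcs: dict[str, ReplicaCatalog]) -> list[str]:
        """Return all SURLs known for a logical file name."""
        guid = fc.lfn_to_guid[lfn]
        surls: list[str] = []
        for site in fc.guid_to_sites.get(guid, []):
            surls.extend(rcs[site].guid_to_surls.get(guid, []))
        return surls


    if __name__ == "__main__":
        fc = FileCatalog(
            lfn_to_guid={"/grid/vo/run42/file.root": "guid-0001"},
            guid_to_sites={"guid-0001": ["SiteA", "SiteB"]},
        )
        rcs = {
            "SiteA": ReplicaCatalog({"guid-0001": ["srm://se.sitea.example/data/f1"]}),
            "SiteB": ReplicaCatalog({"guid-0001": ["srm://se.siteb.example/data/f1"]}),
        }
        print(resolve("/grid/vo/run42/file.root", fc, rcs))

The GUID in the middle is what decouples the logical namespace from the physical replicas, so files can be renamed or replicated without touching the other layer.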
Information and Monitoring
• R-GMA for:
  - Information system and system monitoring
  - Application monitoring
• No major changes in architecture, but re-engineer and harden the system
• Co-existence and interoperability with other systems is a goal, e.g. MonALISA
• [Diagram: example of D0 application monitoring; job wrappers publish through MPPs to a DbSP; MPP = Memory Primary Producer, DbSP = Database Secondary Producer]

Security
1. Credential Storage (myProxy): obtain Grid (X.509) credentials for Joe
2. Pseudonymity Service (optional, tbd): "Joe → Zyx"
3. Attribute Authority (VOMS): "issue Joe's privileges to Zyx"
4. "The Grid" sees "User=Zyx, Issuer=Pseudo CA"; enforcement via GSI and LCAS/LCMAPS
(a conceptual code sketch of this flow follows the Summary slide)

GAS & Package Manager
• Grid Access Service (GAS)
  - Discovers and manages services on behalf of the user
  - File and metadata catalogs already integrated
• Package Manager
  - Provides application software at the execution site
  - Based upon existing solutions
  - Details being worked out together with the experiments and operations

Current Prototype
• CE (CERN, Wisconsin): Globus Gatekeeper, Condor-C, PBS/LSF, "pull component" (AliEn CE)
• WN: 23 at CERN + 1 at Wisconsin
• SE (CERN, Wisconsin): external SRM implementations (dCache, Castor), gLite-I/O
• Data Transfer (CERN, Wisc): GridFTP
• Data Scheduling (CERN): File Transfer Service (Stork)
• Catalogs (CERN): AliEn FileCatalog, RLS (EDG), gLite Replica Catalog
• Metadata Catalog (CERN): simple interface defined
• WMS: AliEn TaskQueue, EDG WMS, EDG L&B (CNAF)
• Information & Monitoring (CERN, Wisc): R-GMA
• Security: VOMS (CERN), myProxy, gridmapfile and GSI security
• User Interface (CERN & Wisc): AliEn shell, CLIs and APIs, GAS
• Package manager: prototype based on the AliEn PM

Summary, plans
• Most Grid systems (including LCG-2) are oriented towards batch-job production; gLite addresses distributed analysis
  - Most likely the two will co-exist, at least for a while
• A prototype exists, and new services are being added: dynamic accounts, gLite CEmon, Globus RLS, File Placement Service, Data Scheduler, fine-grained authorization, accounting…
• A Pre-Production Testbed is being set up: more sites, tested/stable services
• First release due end of March 2005
  - Functionality freeze at Christmas
  - Intense integration and testing period from January to March 2005
• 2nd release candidate: November 2005
  - May: revised architecture document; June: revised design document
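As a closing illustration of the four-step flow on the Security slide above, here is a conceptual Python sketch. It is not the gLite, VOMS, myProxy, or LCAS/LCMAPS API; every class and function name is invented, and the steps are modeled only as plain data transformations.

    # Conceptual sketch only: the four numbered steps from the Security slide,
    # with all names invented for illustration (this is not a real security API).
    from dataclasses import dataclass


    @dataclass
    class ProxyCredential:
        subject: str       # the user's (or pseudonym's) distinguished name
        issuer: str        # CA or pseudonymity service that signed it
        attributes: tuple  # VO membership / role attributes attached later


    def obtain_credentials(user: str) -> ProxyCredential:
        # Step 1: obtain short-lived Grid (X.509 proxy) credentials for the
        # user, e.g. retrieved from a credential store such as myProxy.
        return ProxyCredential(subject=user, issuer="CA", attributes=())


    def pseudonymise(cred: ProxyCredential, pseudonym: str) -> ProxyCredential:
        # Step 2 (optional): a pseudonymity service maps "Joe" to "Zyx".
        return ProxyCredential(subject=pseudonym, issuer="Pseudo CA",
                               attributes=cred.attributes)


    def add_vo_attributes(cred: ProxyCredential, vo: str, role: str) -> ProxyCredential:
        # Step 3: an attribute authority (VOMS-like) issues the user's VO
        # privileges to the (pseudonymous) identity.
        return ProxyCredential(cred.subject, cred.issuer,
                               cred.attributes + ((vo, role),))


    def authorise(cred: ProxyCredential, required_vo: str) -> bool:
        # Step 4: a site service checks the presented credential and its
        # attributes (the role played by GSI plus LCAS/LCMAPS on the slide).
        return any(vo == required_vo for vo, _ in cred.attributes)


    if __name__ == "__main__":
        cred = obtain_credentials("Joe")
        cred = pseudonymise(cred, "Zyx")
        cred = add_vo_attributes(cred, vo="myvo", role="analysis")
        print(authorise(cred, required_vo="myvo"))  # True

The point of the layering is that the site only ever sees the pseudonymous identity and its attributes, while the mapping back to the real user stays with the pseudonymity service.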