Managing distributed computing resources with DIRAC
A. Tsaregorodtsev, CPPM-IN2P3-CNRS, Marseille
12-17 September 2011, NEC'11, Varna

Outline
- DIRAC overview
- Main subsystems
  • Workload Management
  • Request Management
  • Transformation Management
  • Data Management
- Use in LHCb and other experiments
- DIRAC as a service
- Conclusion

Introduction
- DIRAC is first of all a framework for building distributed computing systems
- Supports Service Oriented Architectures
- GSI-compliant secure client/service protocol
  • Fine-grained service access rules
- Hierarchical Configuration Service for bootstrapping distributed services and agents
- This framework is used to build all the DIRAC systems:
  • Workload Management, based on the Pilot Job paradigm
  • Production Management
  • Data Management
  • etc.

Workload Management architecture
[WMS architecture diagram: jobs from the Production Manager and Physicist Users go to the central Matcher Service; dedicated Pilot Directors (EGEE, NDG, EELA and CREAM) deploy pilot jobs on the EGI/WLCG grid, the NDG grid, the GISELA grid and on CREAM CEs.]

User credentials management
- The WMS with Pilot Jobs requires a strict user proxy management system
- Jobs are submitted to the DIRAC central Task Queue with the credentials of their owner (VOMS proxy)
- Pilot Jobs are submitted to a grid WMS with the credentials of a user with a special Pilot role
- The Pilot Job fetches the user job and the job owner's proxy
- The user job is executed with its owner's proxy, which is used to access SEs, catalogs, etc.
- The DIRAC Proxy Manager service provides the necessary functionality:
  • Proxy storage and renewal
  • Possibility to outsource proxy renewal to a MyProxy server

Direct submission to CEs
- The gLite WMS is now used just as a pilot deployment mechanism
  • Limited use of its brokering features: for jobs with input data the destination site is already chosen
  • Multiple Resource Brokers have to be used because of scalability problems
- DIRAC supports direct submission to CEs (CREAM CEs)
  • Individual site policies can be applied
  • Direct measurement of the site state by watching the pilot status information
  • The site chooses how much load it can take (pull vs push paradigm)
- This is a general trend: all the LHC experiments have declared that they will eventually abandon the gLite WMS

DIRAC sites
- A dedicated Pilot Director per site (or group of sites)
- On-site Director
  • Site managers have full control of LHCb payloads
- Off-site Director
  • The site delegates control to the central service
  • The site only has to define a dedicated local user account
  • Payloads are submitted through an SSH tunnel
- In both cases the payload is executed with the owner's credentials (a simplified sketch of the pilot workflow follows)
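The pull workflow described above can be illustrated with a short sketch. The Python fragment below only illustrates the pilot logic (pull a matched job, install the owner's proxy, run the payload); the function names, endpoint and returned data are hypothetical stand-ins rather than the actual DIRAC client API, and the two fetch_* helpers return canned values so the flow can be executed locally.

# Minimal, illustrative pilot sketch: all names here are hypothetical
# stand-ins, not the real DIRAC API.
import os
import subprocess
import tempfile


def fetch_matched_job(capabilities):
    """Stand-in for the GSI-authenticated call to the central Matcher service."""
    # A real pilot would send its resource description and receive a waiting
    # user job from the Task Queue, or nothing if no job matches.
    return {"JobID": 42, "Owner": "jsmith", "Executable": "echo payload running"}


def fetch_owner_proxy(job):
    """Stand-in for retrieving the job owner's VOMS proxy from the Proxy Manager."""
    return "-----BEGIN FAKE PROXY-----\n...\n-----END FAKE PROXY-----\n"


def run_pilot():
    # 1. Describe the local resource; the site effectively decides how much
    #    load it takes by how many pilots it lets run (pull vs push).
    capabilities = {"Site": "DIRAC.Example.org", "CPUTime": 86400}

    # 2. Pull a user job from the central Task Queue via the Matcher.
    job = fetch_matched_job(capabilities)
    if job is None:
        return  # nothing matched, the pilot simply exits

    # 3. Install the owner's proxy so that SE and catalog access is done
    #    with the owner's credentials, not the pilot's.
    with tempfile.NamedTemporaryFile("w", suffix=".pem", delete=False) as pf:
        pf.write(fetch_owner_proxy(job))
    os.environ["X509_USER_PROXY"] = pf.name

    # 4. Execute the payload; failed operations would be reported back
    #    asynchronously (see the Request Management System below).
    subprocess.run(job["Executable"], shell=True, check=False)


if __name__ == "__main__":
    run_pilot()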

DIRAC Sites
- Several DIRAC sites are in production in LHCb
  • e.g. Yandex: 1800 cores, the second largest MC production site
- An interesting possibility for small user communities or infrastructures, e.g. contributing local clusters or building regional or university grids

WMS performance
- Up to 35K concurrent jobs at ~120 distinct sites
  • Limited by the resources available to LHCb
- 10 mid-range servers host the DIRAC central services
- Further optimizations to increase the capacity are possible
  • Hardware and database optimizations, service load balancing, etc.

Belle (KEK) use of Amazon EC2
- A VM scheduler was developed for the Belle MC production system
- Dynamic VM spawning takes spot prices and the Task Queue state into account
(Thomas Kuhr, Belle)

Belle use of Amazon EC2
- Various computing resources are combined in a single production system:
  • KEK cluster
  • LCG grid sites
  • Amazon EC2
- Common monitoring, accounting, etc.
(Thomas Kuhr, Belle II)

Belle II
- Raw data storage and processing starting in 2015, after the KEK upgrade
- 50 ab-1 expected by 2020
- Data rate of 1.8 GB/s in the high-rate scenario
- Computing model covers raw data processing, MC production, and Ntuple production and analysis
- Uses the KEK computing centre together with grid and cloud resources
- The Belle II distributed computing system is based on DIRAC
(Thomas Kuhr, Belle II)

Support for MPI jobs
- An MPI service was developed for applications in the GISELA grid
  • Astrophysics, biomed and seismology applications
- No special MPI support on the sites is required
  • The MPI software is installed by the Pilot Jobs
- MPI ring usage optimization
  • Ring reuse for multiple jobs, lowering the load on the gLite WMS
  • Variable ring sizes for different jobs
- Possible usage for HEP applications: PROOF-on-demand dynamic sessions

Coping with failures
- Problem: distributed resources and services are unreliable
  • Software bugs, misconfiguration
  • Hardware failures
  • Human errors
- Solution: redundancy and asynchronous operations
- DIRAC services are redundant
  • Geographically: Configuration, Request Management
  • Several instances of any service

Request Management System
- A Request Management System (RMS) accepts and executes asynchronously any kind of operation that can fail
- Requests are collected by RMS instances on VO-boxes at the 7 Tier-1 sites
  • Data upload and registration
  • Job status and parameter reports
  • Extra redundancy in VO-box availability
- Requests are forwarded to the central Request Database
  • To keep track of the pending requests
  • For efficient bulk request execution

DIRAC Transformation Management
- Data-driven payload generation based on templates
- Generates data processing and replication tasks
- LHCb-specific templates and catalogs

Data Management
- Based on the Request Management System
- Asynchronous data operations: transfers, registration, removal
- Two complementary replication mechanisms
  • Transfer Agent: user data, public network
  • FTS service: production data, private FTS OPN network, smart pluggable replication strategies

Transfer accounting (LHCb)
[Plot of LHCb transfer accounting]

ILC using DIRAC
- ILC CERN group
- 2M jobs run in the first year, instead of the 20K planned initially, using the DIRAC Workload Management and Transformation systems
- The DIRAC FileCatalog was developed for ILC
  • More efficient than the LFC for common queries
  • Includes user metadata natively

DIRAC as a service
- A DIRAC installation shared by a number of user communities and centrally operated
- EELA/GISELA grid (gLite based)
  • DIRAC is part of the grid production infrastructure
  • Single VO
- French NGI installation, https://dirac.in2p3.fr
  • Started as a service supporting grid tutorials
  • Now serving users from various domains: biomed, earth observation, seismology, ...
  • Multiple VOs

DIRAC as a service
- Necessity to manage multiple VOs with a single DIRAC installation:
  • Per-VO pilot credentials
  • Per-VO accounting
  • Per-VO resources description
- Pilot Directors are VO-aware
- Job matching takes the pilot's VO assignment into account (see the sketch below)
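To make the VO-aware matching concrete, here is a minimal sketch using plain Python and in-memory data structures; it is not the actual DIRAC Matcher implementation, and the class, function and site names are illustrative assumptions. It shows how a pilot's VO assignment can restrict which task queues it may be matched against.

# Sketch of VO-aware matching in a multi-VO installation: the pilot reports
# the VO it was submitted for, and only task queues of that VO are considered.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskQueue:
    vo: str                 # community that owns the queued jobs
    owner_group: str        # e.g. "lhcb_user", "biomed_user"
    site: str               # "ANY", or a fixed site for jobs with input data
    jobs: List[dict] = field(default_factory=list)


def match_pilot(pilot_vo: str, pilot_site: str,
                task_queues: List[TaskQueue]) -> Optional[dict]:
    """Return the next job for this pilot, honouring its VO assignment."""
    for tq in task_queues:
        if tq.vo != pilot_vo:                   # per-VO pilot credentials imply per-VO matching
            continue
        if tq.site not in ("ANY", pilot_site):  # destination already chosen for input-data jobs
            continue
        if tq.jobs:
            return tq.jobs.pop(0)
    return None


# Example: a biomed pilot never receives an LHCb payload, and vice versa.
queues = [
    TaskQueue("lhcb",   "lhcb_user",   "ANY", [{"JobID": 1}]),
    TaskQueue("biomed", "biomed_user", "ANY", [{"JobID": 2}]),
]
print(match_pilot("biomed", "DIRAC.Example.org", queues))  # -> {'JobID': 2}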

DIRAC Consortium
- Other projects are starting to use or evaluate DIRAC
  • CTA, SuperB, BES, VIP (medical imaging), ...
  • Contributing to DIRAC development
  • Increasing the number of experts
- Need for a user support infrastructure
- Turning DIRAC into an open source project
  • A DIRAC Consortium agreement is in preparation: IN2P3, Barcelona University, CERN, ...
- http://diracgrid.org: news, docs, forum

Conclusions
- DIRAC has been successfully used in LHCb for all distributed computing tasks during the first years of LHC operation
- Other experiments and user communities have started to use DIRAC, contributing their developments to the project
- The DIRAC open source project is now being built to bring the experience of HEP computing to other experiments and application domains

LHCb in brief
- An experiment dedicated to studying CP violation, which is responsible for the dominance of matter over antimatter
- The matter-antimatter difference is studied using the b-quark (beauty)
- High-precision physics (a tiny difference...)
- A single-arm spectrometer, which looks like a fixed-target experiment
- The smallest of the 4 big LHC experiments: ~500 physicists
- Nevertheless, computing is also a challenge...

LHCb Computing Model

Tier-0 centre
- Raw data is shipped in real time to Tier-0
- Performs part of the first-pass reconstruction and re-reconstruction, acting as one of the Tier-1 centres
- Calibration and alignment are performed on a selected part of the data stream (at CERN)
- Resilience is enforced by a second copy at the Tier-1s
- Rate: ~3000 evts/s (35 kB) at ~100 MB/s (cross-checked at the end of these slides)
- Alignment and tracking calibration using dimuons (~5/s)
  • Also used for validation of new calibrations
- PID calibration using Ks, D*

CAF – CERN Analysis Facility
- Grid resources for analysis
- Direct batch system usage (LXBATCH) for SW tuning
- Interactive usage (LXPLUS)

Tier-1 centres
- Real data persistency
- First-pass reconstruction and re-reconstruction
- Data stripping and group analysis
  • Event preselection in several streams (if needed)
  • The resulting DST data is shipped to all the other Tier-1 centres
  • Further reduction of the datasets, μDST format
  • Centrally managed using the LHCb Production System
- User analysis
  • Selections on stripped data
  • Preparation of N-tuples and reduced datasets for local analysis

Tier-2/Tier-3 centres
- No assumption of local LHCb-specific support
- MC production facilities
  • Small local storage requirements to buffer MC data before shipping it to the respective Tier-1 centre
- User analysis
  • Not assumed in the baseline computing model
  • However, several distinguished centres are willing to contribute
  • Analysis (stripped) data is replicated to Tier-2/Tier-3 centres by the site managers, as full or partial samples
  • This increases the amount of resources capable of running user analysis jobs
  • Analysis data at Tier-2 centres is available to the whole Collaboration, with no special preference for local users
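A quick arithmetic cross-check of the Tier-0 raw-data rate quoted in the computing-model slides above: ~3000 events/s × 35 kB/event ≈ 105 MB/s, consistent with the quoted ~100 MB/s.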