Distributed Infrastructure with Remote Agent Control (DIRAC)
Personal Interface (Architecture Review)

LHCB Technical Note
Issue: Draft
Revision: 0.1
Reference: GridPP-LHCb
Created: 28 April 2003
Last modified: 16 February 2016
Prepared By / Editor: Gennady G. Kuznetsov, Glenn N. Patrick, Alexander Soroko

Abstract

DIRAC is the name given to the LHCb distributed Monte Carlo production environment and its monitoring and control system. Personal Interface is an extension of DIRAC designed for personal use.

Document Status Sheet

Table 1 Document Status Sheet

1. Document Title: [Project Name Qualification] User Requirements Document
2. Document Reference Number: [Document Reference Number]

3. Issue    4. Revision    5. Date          6. Reason for change
Draft       1              28 April 2003    First version

Table of Contents

1 INTRODUCTION
  1.1 PURPOSE OF THE DOCUMENT
  1.2 DEFINITIONS, ACRONYMS AND ABBREVIATIONS
  1.3 REFERENCES
2 CURRENT SYSTEM OVERVIEW
3 SYSTEM DESIGN
  3.1 SYSTEM DECOMPOSITION

List of Figures

Figure 1 UML Deployment Diagram of the centralised DIRAC package.
Figure 2 UML Deployment Diagram of DIRAC including the Personal Interface.

1 Introduction

1.1 Purpose of the document

The purpose of this document is to specify the architecture of the DIRAC Personal Interface.

1.2 Definitions, acronyms and abbreviations

1.2.1 Definitions

Architecture: The software architecture of a program or computing system is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them.

Framework: A framework represents a collection of classes that provide a set of services for a particular domain; a framework exports a number of individual classes and mechanisms that clients can use or adapt. A framework realizes the architecture.

Component: A software component is a reusable piece of software that has a well-specified public interface and implements a limited functionality.
Software components achieve reuse by following standard conventions.

1.2.2 Acronyms

CERN   European Organization for Nuclear Research
RAL    Rutherford Appleton Laboratory
SSH    Secure Shell
UML    Unified Modelling Language
WWW    World Wide Web

1.3 References

[1] Data Management: Job Configuration, Bookkeeping, Data Production, LHCb Technical Note, LHCb COMP 02-nn, 22 May 2002.
[2] Gaudi Framework (http://proj-gaudi.web.cern.ch/proj-gaudi/)

2 Current System Overview

DIRAC (Distributed Infrastructure with Remote Agent Control) is a software package designed to carry out large-scale distributed calculations. It is based on client/server technology and, topologically, can be presented as a collection of distributed services (Figure 1).

[Figure 1: deployment diagram. The CERN Services (Production Service and Bookkeeping Service databases, an XML-RPC Production Service, a tape Storage Service, the Production Interface and the CERN Storage Interface) connect over the Internet to Centre 1 and Centre 2. Each centre has a Batch Frontend with a controlling Agent and an intranet linking Nodes 1 to 3, each of which runs a Job. A deployment diagram is used in modelling the physical aspects of an object-oriented system: it shows the configuration of run-time processing nodes and the components that live on them.]

Figure 1 UML Deployment Diagram of the centralised DIRAC package.
The design of this package is very flexible, but it relies on a centralised scheme for software and job distribution [1]. The job configuration data are stored in the production database and can be managed remotely in the form of a workflow. The execution part, however, is implemented by the Agent (a Python program) and is limited to the cases the Agent already implements: every change to the execution algorithm of a single application requires the Agent to be updated.

3 System Design

The newly proposed extension of DIRAC is called the Personal Interface. This additional package is designed to focus on the needs of the local physicist and to be independent of any specific type of executed application. It also supports a collaborative mode of execution for large-scale production (Figure 2).

[Figure 2: deployment diagram. The same CERN Services as in Figure 1 are reached over the Internet via XML-RPC. The diagram adds a bbftp Server and a Client Computer hosting the Personal Agent and the Personal Desktop, with links labelled IIOP and SSH. Centre 1 has a Grid Frontend and Centre 2 a Batch Frontend; each centre has an intranet linking Nodes 1 to 3, each of which runs a Job.]

Figure 2 UML Deployment Diagram of DIRAC including the Personal Interface.

3.1 System decomposition

The Personal Interface consists of two components: the Personal Agent and the Personal Desktop. The Personal Agent is a set of services that continuously represents the user in the network.
The user starts, shuts down and communicates with the Personal Agent using the Personal Desktop.

3.1.1 Personal Desktop decomposition

The Personal Desktop provides a GUI for the Production Manager to create collections of jobs. This GUI is based on a component approach, in which the final element is assembled from basic building blocks called Modules. Modules are used to create a bigger entity called a Step. Subsequently, Steps are used to create a Workflow. The Workflow is the biggest entity and reflects how the user wants to tackle the problem. When submitted, a Workflow is instantiated into a Production, which consists of many dependent or independent jobs.

3.1.1.1 Module Editor

This editor automates the process of packing Python code into an XML Module. It is designed for use by Python programmers. Modules are the building blocks of the system. Different experiments (LHCb, ATLAS) may have different libraries of Modules.

3.1.1.2 Module Library

A collection of Modules designed for an application or experiment, presented by the Library Browser.

3.1.1.3 Step Editor

A graphical editor that defines a Step as a linked collection of Module instances. Each instance of a Step becomes a Job.

3.1.1.4 Workflow Editor

A graphical editor that defines a Workflow as a linked collection of Steps. Each link represents a persistent external object such as a File. The specific meaning of each File is given when the Workflow is instantiated into a Production.

3.1.1.5 Production Monitor

Tools for monitoring the status of the Production and of individual jobs.

3.1.2 Personal Agent decomposition

The Personal Agent is the application that represents the physicist in the network.

3.1.2.1 Job Dependency Service

A service that resolves dependencies between jobs that depend on each other.
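
The note does not say how the Job Dependency Service resolves dependencies. As a minimal sketch only, assuming that each Job is an instantiated Step and that the links between Steps are Files (as described for the Workflow Editor), the dependency problem can be treated as a topological ordering of jobs by the Files they read and write. The Job record, the execution_order function and the file names below are hypothetical and are not part of DIRAC.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Job:
    """Hypothetical job record: one instantiated Step with its linked Files."""
    name: str
    inputs: set = field(default_factory=set)    # Files the job reads
    outputs: set = field(default_factory=set)   # Files the job writes


def execution_order(jobs):
    """Return job names in an order that respects File dependencies."""
    # Map each File to the name of the job that produces it.
    producer = {f: job.name for job in jobs for f in job.outputs}
    sorter = TopologicalSorter()
    for job in jobs:
        # A job depends on the producers of its inputs. Files with no
        # producer are assumed to be external inputs that already exist.
        deps = {producer[f] for f in job.inputs if f in producer}
        sorter.add(job.name, *deps)
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(sorter.static_order())


# Illustrative four-job chain, deliberately listed out of order.
jobs = [
    Job("reconstruction", inputs={"digi.dat"}, outputs={"dst.dat"}),
    Job("simulation", inputs={"events.dat"}, outputs={"sim.dat"}),
    Job("digitisation", inputs={"sim.dat"}, outputs={"digi.dat"}),
    Job("generation", outputs={"events.dat"}),
]
print(execution_order(jobs))
# → ['generation', 'simulation', 'digitisation', 'reconstruction']
```

Independent jobs have no mutual ordering constraint in this sketch, so a real service would be free to submit them in parallel.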
3.1.2.2 Job Submission Service

A service that submits jobs to different batch systems (including the Grid) using Pull technology (see Andrei Tsaregorodtsev's publications). It also provides the connection to the central Job Distribution Service at CERN.

3.1.2.3 File Storage Service

A short-term file storage system that exists only for the lifetime of the Production.

3.1.2.4 File Distribution Service

Supports the File Replication Service.

3.1.2.5 Monitoring Service

Publishes the status of a single job.
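
The Pull technology named in section 3.1.2.2 is not specified further in this note. The sketch below illustrates only the general pull pattern, in which an agent at each site asks for work when it has free capacity instead of a central service pushing jobs out. The JobQueue and SiteAgent classes, their methods and the slot counts are assumptions made for illustration; they are not the DIRAC API.

```python
from collections import deque


class JobQueue:
    """Hypothetical stand-in for a submission service holding waiting jobs."""

    def __init__(self, jobs):
        self._waiting = deque(jobs)

    def request_job(self):
        """Hand out the next waiting job, or None when the queue is empty."""
        return self._waiting.popleft() if self._waiting else None


class SiteAgent:
    """Hypothetical agent at a batch or Grid frontend that pulls work."""

    def __init__(self, site, free_slots):
        self.site = site
        self.free_slots = free_slots
        self.running = []

    def poll(self, queue):
        """Pull jobs while the site has free slots; return how many were taken."""
        taken = 0
        while self.free_slots > 0:
            job = queue.request_job()
            if job is None:
                break  # nothing left to pull
            self.running.append(job)
            self.free_slots -= 1
            taken += 1
        return taken


queue = JobQueue(["job-1", "job-2", "job-3", "job-4", "job-5"])
agents = [SiteAgent("Centre 1", free_slots=2), SiteAgent("Centre 2", free_slots=1)]
for agent in agents:
    agent.poll(queue)
    print(agent.site, agent.running)
```

In this arrangement the queue never needs to know how many slots a site has; each site takes only what it can run, and the remaining jobs wait until some agent polls again.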