INFN computing scenarios for GRID architecture

Padova, 14 February 2000

Document for discussing GRID tools and services with Carl Kesselman

R. Cucchi, A. Ghiselli, L. Luminari, L. Perini, M. Mazzucato, M. Sgaravatto, C. Vistoli

1 Introduction
2 HEP computing
3 Computing requirements for LHC experiment
4 Approaching computing and data grid
   4.1 Layout Model
   4.2 GRID middleware required
5 Use case 1
6 Use case 2
   6.1 Testbed

1 Introduction

A computational grid is more than just a collection of resources: it is also a set of services for obtaining information about grid components, locating and scheduling resources, communicating, accessing code and data, measuring performance, authenticating users and resources, ensuring the privacy of communications, and so forth (from the GRID book).

The aim of this document is to summarize the most important characteristics of LHC computing (from the MONARC documents) and its requirements, and to describe some use cases in order to plan a test program for GRID services and tools.

2 HEP computing

The LHC experiment collaborations have already investigated many aspects of LHC computing in frameworks such as RD45 and MONARC. Some well-established elements emerging from this work are listed below.

Data types:
o RAW (taken at CERN): ~1 MB per event, 10^9 events per year per experiment (1 PB).
o ESD (Event Summary Data): refers to physics objects; by construction not larger than 100 KB.
o AOD (Analysis Object Data): refers to objects which facilitate analysis; by construction not larger than 10 KB. Created by the collaboration analysis groups from the ESD.
o Tag: very small objects (100 to 500 B) which identify an event by its physics signature.

Tasks of the offline software of each experiment:
o Data reconstruction: from RAW to ESD
o MC production
o Offline calibration
o Successive data reconstructions
o Analysis

Technical services:
o Database maintenance (including backup, recovery, installation of new versions, monitoring and policing)
o Basic and experiment-specific software maintenance (backup, updating, installation)
o Support for experiment-specific software development
o Production of tools for data services
o Production and maintenance of documentation (including Web pages)
o Storage management (disks, tapes, distributed file systems if applicable)
o CPU usage monitoring and policing
o Database access monitoring and policing
o I/O usage monitoring and policing
o Network maintenance (as appropriate)
o Support of large bandwidth

Current estimates of the capacity to be installed at CERN for a single LHC experiment by 2006 (see Robertson):
o 520,000 SI95 of CPU, covering data recording, first-pass reconstruction, some reprocessing, basic analysis of the ESD and support for 4 analysis groups
o about 1400 boxes to be managed
o 540 TB of disk capacity
o 3 PB of automated tape capacity
o 46 GB/s LAN throughput

One can assume that 10 to 100 TB of disk space is allocated to AOD/ESD/tags at the central site.
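As a consistency check of the figures above, a minimal Python sketch (illustrative only) turns the per-event sizes and the 10^9 events/year rate into yearly volumes; the 500 B value used for the Tag is the upper bound of the range quoted above.

    # Back-of-the-envelope yearly data volumes for one LHC experiment,
    # using the per-event sizes and the 10^9 events/year rate quoted above.
    EVENTS_PER_YEAR = 1e9

    SIZES_BYTES = {
        "RAW": 1e6,      # ~1 MB per event
        "ESD": 100e3,    # <= 100 KB per event
        "AOD": 10e3,     # <= 10 KB per event
        "Tag": 500,      # 100-500 B per event (upper bound used here)
    }

    for name, size in SIZES_BYTES.items():
        volume_tb = size * EVENTS_PER_YEAR / 1e12
        print(f"{name}: ~{volume_tb:g} TB/year")

    # RAW ~1000 TB/year (1 PB); ESD ~100 TB; AOD ~10 TB; Tag ~0.5 TB,
    # consistent with the 10-100 TB of disk assumed for AOD/ESD/tags at CERN.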
Distributed, Hierarchical Regional Centers Architecture

The envisaged hierarchy of resources may be summarized in terms of tiers of Regional Centers (RC) with five decreasing levels of complexity and capability. A possible scheme is:
o Tier-0: CERN, acting also as a Tier-1
o Tier-1: large RC on a national scale, expensive, multi-service
o Tier-2: smaller RC, less expensive, mostly dedicated to analysis
o Tier-3: institute workgroup servers, satellites of Tier-2 and/or Tier-1
o Tier-4: individual desktops

Data model

Distributed ODBMS, presently based on Objectivity/DB.

Objectivity/DB characteristics (many of them to be kept in case of other database software):
o Objects are stored and managed via C++.
o Distributed database architecture: federated database.
o Application architecture: client/server, with local server, remote server and lock server (MROW: multiple readers, one writer).
o Federated database: several databases and servers with a single lock server (single partition).
o FTO (fault tolerant option): allows one federated DB to be split into several partitions, each with its own lock server.
o DRO (database replication option): allows database replication and access to the nearest replica; it also allows parallel access to the data.
o Data modeling: the SCHEMA is the same for application and database.
o Locking granularity: federated database, database, container.
o Server design: page read/write, object clustering, caching in the server.
o Federated database size: 64-bit object references allow about 10 million TeraBytes; the OID (Object Identification) is unique within a federated database.

3 Computing requirements for LHC experiment

There are several characteristics of experimental HEP code and applications that are important in designing computing facilities (based on GRID?) for HEP data processing:
o In general the computing problem consists of processing a very large number of independent transactions, which may therefore be processed in parallel; the granularity of the parallelism can be selected freely.
o Modest floating-point requirements: computational requirements are therefore expressed in SPECint (not SPECfp) units.
o Massive data storage: measured in PetaBytes (10^15 Bytes) for each experiment.
o Read-mostly data, rarely modified, usually simply replaced completely when new versions are generated.
o High sustained throughput is more important than peak speed: the performance measure is the time it takes to complete processing for all of the independent transactions.
o Resilience of the overall (grid) system in the presence of sub-system failures is far more important than trying to ensure 100% availability of all sub-systems at all times.

Therefore HEP applications need High Throughput Computing rather than High Performance Computing.

4 Approaching computing and data grid

Which aspects should be considered first?

Application programming: based on a wide spectrum of programming paradigms (Java/RMI, CORBA, GRID-oriented, …), running in a multi-platform, heterogeneous computing environment (application-oriented middleware system).

4.1 Layout Model

Computing resources are distributed through the INFN sites:
o Tier 1: 2 or more regional centers; all kinds of data, batch and interactive computing. They receive data from CERN at 100 Hz and replicate them (a rough estimate of the corresponding ingest rate is sketched after this list). The RC could be distributed.
o Tier 2/3: institute workgroup servers, mostly dedicated to analysis, satellites of Tier 1.
o Tier 4: individual desktops.
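As referenced in the Tier 1 item above, the sustained bandwidth implied by a 100 Hz data stream from CERN can be estimated from the per-event sizes of Section 2. The following Python sketch is an illustrative estimate, not a measured figure.

    # Rough sustained bandwidth needed by a Tier 1 receiving events from CERN
    # at 100 Hz, for the event formats of Section 2 (illustrative only).
    RATE_HZ = 100.0

    EVENT_SIZE_BYTES = {
        "RAW": 1e6,    # ~1 MB
        "ESD": 100e3,  # <= 100 KB
        "AOD": 10e3,   # <= 10 KB
    }

    for fmt, size in EVENT_SIZE_BYTES.items():
        mbit_per_s = size * RATE_HZ * 8 / 1e6
        print(f"{fmt} at {RATE_HZ:.0f} Hz: ~{mbit_per_s:.0f} Mbit/s sustained")

    # RAW: ~800 Mbit/s, ESD: ~80 Mbit/s, AOD: ~8 Mbit/s

Under these assumptions, replicating RAW data at 100 Hz approaches Gbit/s sustained rates, while ESD-level replication stays below ~100 Mbit/s.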
The logical layout of the multi-tier client-server architecture for one LHC experiment is represented in the following figure:

[Figure: logical layout — CERN (Tier 0), Tier 1 and Tier 2/3 data servers, client machines, desktops and the INFN WAN Condor pool.]

In the above configuration example there are three RCs with data servers (or data movers) and computing farms (client/server model). Several client machines connect to the data servers through LAN and WAN links (this will provide a direct comparison between LAN and WAN behavior and allow evaluation of the network impact on application behavior and efficiency). Users access the system through desktop or client machines. The INFN WAN Condor pool is connected to the grid system. Data are distributed across all the RC data servers; users run jobs from their desktops, and the resource managers must locate clients and data in order to process the data in the most efficient way.

4.2 GRID middleware required

The INFN GRID working group (with physicists and IT experts) defined the following requirements:

o Wide-area workload management:
   - optimal co-allocation of data, CPU and network for specific grid/network-aware jobs
   - distributed scheduling (data and/or code migration)
   - unscheduled/scheduled job submission
   - management of heterogeneous computing systems
   - uniform interface to the various local resource managers and schedulers
   - priorities and policies on resource (CPU, data, network) usage
   - bookkeeping and 'web' user interface
o Wide-area data management:
   - universal name space: transparent and location independent
   - data replication and caching
   - data mover (scheduled/interactive, at object/file/DB granularity)
   - loose synchronization between replicas
   - application metadata, interfaced with the DBMS (i.e. Objectivity, …)
o Network services definition for a given application:
   - end-system network protocol tuning
o Wide-area application monitoring:
   - performance: "instrumented systems" with timing information and analysis tools
   - run-time analysis of collected application events
   - bottleneck analysis
   - dynamic monitoring of GRID resources to optimize resource allocation
   - failure management
o Computing fabric and general utilities for a globally managed Grid:
   - configuration management of computing facilities
   - automatic software installation and maintenance
   - system, service and network monitoring; global alarm notification; automatic recovery from failures
   - resource usage accounting
o Security of GRID resources and infrastructure usage
o Information service

5 Use case 1

Five Globus machines, geographically distributed and configured to run High Level Trigger simulation programs. These jobs run on single machines with local disk I/O. The aim is to optimize the CPU usage of all five machines.

6 Use case 2

This use case describes the WAN MONARC testbed with Objectivity 5.2 in a multi-server configuration:
o the Atlfast++ program is used to populate the database, following the Tag/Event data model proposed by the LHC++ project, and to read data back from the database;
o 3 AMS servers;
o a single federated Objectivity database containing about 50,000 events (~2 GB);
o the application program performs read/write access to the database.

The procedure followed to perform these tests consists in submitting an increasing number of concurrent jobs from each client and then monitoring CPU utilization, network throughput and job execution time (wall-clock time). The idea is to use Globus and a resource manager to optimize client CPU usage …
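A minimal sketch of the measurement loop described above, in Python. The client executable name (./atlfast_client) and the concurrency levels are placeholders, not part of the original test description; CPU and network utilization would be sampled by separate monitoring tools.

    import subprocess
    import time

    # Use case 2 procedure sketch: submit an increasing number of concurrent
    # client jobs and record the wall-clock time needed to complete each batch.
    CLIENT_CMD = ["./atlfast_client"]        # hypothetical client executable
    CONCURRENCY_LEVELS = [1, 2, 4, 8, 16]    # illustrative values

    for n_jobs in CONCURRENCY_LEVELS:
        start = time.time()
        procs = [subprocess.Popen(CLIENT_CMD) for _ in range(n_jobs)]
        for p in procs:
            p.wait()                         # wait for the whole batch
        elapsed = time.time() - start
        print(f"{n_jobs:2d} concurrent jobs: {elapsed:.1f} s wall-clock")

Plotting the recorded wall-clock times against the number of concurrent jobs shows the saturation behavior of client CPU and network that the tests aim to characterize.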
[Figure: QoS flows between the client machines and the data servers.]

The network layout will also be configured with QoS mechanisms based on Differentiated Services (DS), allowing traffic flows of different priority. The aim is to perform a careful evaluation of TCP performance and application performance and to draw conclusions about how to configure DS to provide a premium service in this scenario. Together with the DS mechanisms, GARA should be used to deliver per-flow, advance-reservation, end-to-end Quality of Service. Then data grid: …

6.1 Testbed start-up

[Figure: testbed layout — data servers in Milano, at CNAF and at CERN, interconnected by 10 Mbps WAN links over GARR-B, TEN-155 and LAN or international links; client sites in Genova, Padova, Roma and Bologna connected at 10 Mbps.]

There will be three data servers: one in Milano, one at CNAF and one at CERN. These servers will be interconnected by dedicated 10 Mbps links. Different client sites will be connected to these servers at 10 Mbps: Genova to Milano; Padova, Roma and Bologna to CNAF.
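As a rough feasibility check of the 10 Mbps links against the Use case 2 dataset (illustrative arithmetic only, using the ~2 GB and ~50,000-event figures of Section 6; protocol overhead and concurrent clients are ignored):

    # Rough numbers for remote access over the 10 Mbps testbed links,
    # using the Use case 2 figures (a ~2 GB federation of ~50,000 events).
    LINK_MBPS = 10.0
    FEDERATION_BYTES = 2e9
    N_EVENTS = 50_000

    avg_event_bytes = FEDERATION_BYTES / N_EVENTS    # ~40 KB/event
    link_bytes_per_s = LINK_MBPS * 1e6 / 8           # ~1.25 MB/s

    print(f"Average event size:        ~{avg_event_bytes / 1e3:.0f} KB")
    print(f"Full federation transfer:  ~{FEDERATION_BYTES / link_bytes_per_s / 3600:.1f} h")
    print(f"Upper bound on event rate: ~{link_bytes_per_s / avg_event_bytes:.0f} events/s per link")

    # i.e. ~40 KB/event, ~0.4 h to move the full 2 GB federation, and at most
    # ~31 events/s per 10 Mbps link, before protocol overhead and contention.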