Annex 1

The Execution Plan for the Deployment and Commissioning of the CMS Tier-1 Center in JINR

Introduction

In March 2011 the proposal to create an LCG Tier-1 center as an integral part of the central data handling service of the LHC experiments in Russia was put forward in an official letter from the Minister of Science and Education of Russia, Andrey Fursenko, to the CERN Director General, Rolf-Dieter Heuer. In pursuance of the principal provisions of this proposal, Russia agreed to accept responsibility for the creation of a Tier-1 center to serve all four LHC experiments (ALICE, ATLAS, CMS and LHCb).

In 2011 the Federal Target Programme project «Creation of an automated system of data processing for experiments at the Large Hadron Collider of Tier-1 level and maintenance of Grid services for distributed analysis of these data» was approved for the period 2011-2013 with a budget of about 8.5 MCHF. The project is aimed at the creation of a Tier-1 center in Russia for the processing of experimental data received from the LHC and the provision of Grid services for the subsequent analysis of these data at the distributed centers of the LHC Computing Grid. The work is shared so that the National Research Centre "Kurchatov Institute" (Moscow) is primarily responsible for the support of the ALICE, ATLAS and LHCb experiments, while JINR (Dubna) provides Tier-1 services for the CMS experiment.

The present document is intended to demonstrate the capability to create and operate the Tier-1 centre at JINR, Dubna. It contains the execution plan with milestones, to be presented to the WLCG Overview Board for the purpose of signing the WLCG Memorandum of Understanding as an associate Tier-1 centre, with the aim of becoming a full Tier-1 within one year.

Milestones of the Tier-1 Deployment and Commissioning

The master execution plan consists of two phases in 2012-2014. The first phase is the construction of the prototype by the end of 2012, followed by the implementation of full Tier-1 functionality, to be completed in 2013 (Phase I). Phase II in 2014 foresees the upgrade of the Tier-1 resources. Work on the full-featured Tier-1 started concurrently with the testing of the prototype in 2013. In order to demonstrate the ability to operate a Tier-1 center, we define four milestones:

Milestone 1: the Tier-1 prototype with 10% of the full resource capacity (excluding tapes) is deployed; the Tier-1 is integrated into the LHC OPN with a connectivity of 2 Gbit/s.

Milestone 2: data transfer over the LHC OPN is tested at the 2 Gbit/s level; the WLCG and CMS-specific services on the 10% capacity are tested for compliance with the availability and reliability requirements; the physical LHC OPN connectivity is increased to 10 Gbit/s.

Milestone 3: data transfer over the LHC OPN is tested at the 10 Gbit/s level; resources are upgraded.

Milestone 4: the WLCG and CMS-specific services on the full Tier-1 capacity are tested for compliance with the availability and reliability requirements; the WLCG MoU is signed to become an associate Tier-1 centre; resources are upgraded to the level of 10% of the aggregate existing Tier-1 capacity in 2014.
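Milestones 2 and 4 refer to the WLCG availability and reliability requirements. For orientation only, the minimal sketch below shows one common way such figures can be derived from monitored uptime and downtime; the function names, the 30-day example window and the hour-level granularity are illustrative assumptions and do not reproduce the official WLCG computation.

    # Illustrative only: summarising service availability and reliability
    # over a monitoring period (e.g. the 30-day window referred to in the
    # milestone table). Hour granularity and the example downtimes are
    # assumptions for illustration.

    def availability(up_hours, total_hours):
        # Fraction of the whole period during which the service was usable.
        return up_hours / total_hours

    def reliability(up_hours, total_hours, scheduled_down_hours):
        # Same as availability, but scheduled downtime is not counted
        # against the service.
        return up_hours / (total_hours - scheduled_down_hours)

    total = 30 * 24      # a 30-day window, in hours
    scheduled = 12       # scheduled maintenance (illustrative)
    unscheduled = 8      # unscheduled outages (illustrative)
    up = total - scheduled - unscheduled

    print(f"availability = {availability(up, total):.3f}")            # 0.972
    print(f"reliability  = {reliability(up, total, scheduled):.3f}")  # 0.989
    print("meets the 98% target:", reliability(up, total, scheduled) >= 0.98)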
Here is the detailed plan of phases and milestones (objective and target date):

- Presentation of the Execution Plan to the WLCG OB (Sep 2012)

Prototype:
- Disk and servers installation and tests (Oct 2012)
- Tape system installation (Nov 2012)
- Organization of network infrastructure and connectivity to CERN via GEANT, 2 Gb (Nov 2012)
- LHC OPN integration (2 Gb) and registration of the JINR Tier-1 center in GOCDB, including integration with the APEL accounting system (Dec 2012)
- Milestone 1 (Dec 2012)

Phase I:
- LHC OPN functional tests, 2 Gb (May 2013)
- Test of WLCG and CMS services using the 2 Gb LHC OPN (May 2013)
- Test of the tape system at JINR: data transfers from CERN to JINR using the 2 Gb LHC OPN (May 2013)
- Test of accounting data publishing via APEL (May 2013)
- Definition of the support level for Tier-2 (May 2013)
- Increase of CERN connectivity to 10 Gb (Jul 2013)
- Milestone 2 (Jul 2013)
- LHC OPN functional tests, 10 Gb (Aug 2013)
- Test of the tape system: data transfers from CERN to JINR using the 10 Gb LHC OPN, local access test (Aug 2013)
- Upgrade of tape, disk and CPU capacity at JINR (Nov 2013)
- Milestone 3 (Nov 2013)
- 85% of the job capacity running for at least 2 months (Feb 2014)
- Storage availability > 98% (functional tests) for at least 2 months (Feb 2014)
- Running with > 98% availability and reliability for at least 30 days (Feb 2014)
- WLCG MoU as an associate Tier-1 center (Feb 2014)

Phase II:
- Upgrade of disk, tape and CPU capacity at JINR (Oct 2014)
- Milestone 4 (Dec 2014)

Detailed execution plan

1. Disk and servers installation and tests
- 1200 CPU slots will be installed in November 2012
- 660 TB of disk-based storage will be installed in November 2012
- 1600 CPU slots will be installed in November 2013
- 3168 TB of disk-based storage will be installed in November 2013
- 1600 CPU slots will be added in October 2014
- 1056 TB of disk-based storage will be added in October 2014

2. Tape system installation
- A tape library with a total capacity of 72 TB will be purchased and installed in November 2012
- A tape library with a total capacity of 5720 TB will be purchased and installed in November 2013
- An additional 1600 TB of tape media and additional disk drives will be purchased and installed in October 2014

3. Network connectivity
The network bandwidth within the LHC OPN for Tier-0 to Tier-1 and Tier-1 to Tier-1 connections is about 2 Gbps in 2012 and will be increased to 10 Gbps in 2013. The existing JINR link to public (academic) networks, with a bandwidth of 2x10 Gbps, will be used to connect the Tier-1 with other Tier-2/Tier-3 centers.

    Year    LHCOPN, Gbps    WAN, Gbps (Russian academic networks and GEANT2)
    2012    2               2x10
    2013    10              2x10
    2014    10              2x10

Below is the plan for the integration of the upcoming Tier-1 into the LHC OPN:

- Integration into the LHC OPN, 2 Gb: Dec 2012
- Functional tests of the OPN, 2 Gb: Feb 2013
- Integration into the LHC OPN, 10 Gb: Jul 2013
- Functional tests of the OPN, 10 Gb: Aug 2013

4. Data transfer tests
Data transfer tests from CERN to JINR will be performed in 2013 with 2 Gb and 10 Gb OPN connectivity, and in 2014 with 20 Gb OPN connectivity. The tests will demonstrate the ability to receive and store into the tape system the CMS raw data in an amount scaled according to the JINR Tier-1 capacity at the time of the tests.
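As a rough indication of what these transfer tests imply, the sketch below estimates the volume that could be moved per day over the 2 Gbit/s and 10 Gbit/s OPN links; the assumed 70% sustained link efficiency is an illustrative figure, not a measured or required value.

    # Illustrative back-of-envelope estimate of daily transfer volume over
    # the LHC OPN link; the 70% sustained-efficiency figure is an assumption
    # for illustration only.

    def daily_volume_tb(link_gbps, efficiency=0.7):
        seconds_per_day = 24 * 3600
        bytes_per_day = link_gbps * 1e9 / 8 * efficiency * seconds_per_day
        return bytes_per_day / 1e12   # terabytes per day

    for gbps in (2, 10):
        print(f"{gbps:>2} Gbit/s link: ~{daily_volume_tb(gbps):.0f} TB/day")
    #  2 Gbit/s link: ~15 TB/day
    # 10 Gbit/s link: ~76 TB/day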
5. Tests for WLCG and VO-specific services
From the start-up, in line with the requirements of WLCG and the LHC experiments, the upcoming Tier-1 has to provide support for a number of the main services for all four experiments. In particular, at JINR:
- WLCG authorisation and security (GSI, Argus, gLExec)
- Computing Element (CREAM CE) and Worker Nodes
- Storage Elements (disk- and tape-based)
- Monitoring and Accounting (Nagios, APEL)
- Workload Management System (WMS)
- Logging and Bookkeeping service (LB)
- Information service (BDII)
- File Transfer Service (FTS)
- Core services (NTP, DNS, logging and auditing)
- HTTP proxy farm
- VO-specific services for CMS: PhEDEx
Our schedule includes tests of these services at JINR at all stages of the Tier-1 project.

6. Plan for service availability and reliability
Integration into the SAM/Nagios framework and acceptance of the availability and reliability tests will take place in the prototype phase (December 2012) and will continue from the early days of the Tier-1 centre, to check that the deployed resources are available and reliable. The main tests of service availability and reliability are planned for the end of 2013, when the Tier-1 centre at JINR will reach the target of 10% of the total capacity of the Tier-1 centres in 2013.

- 85% of the job capacity run: minimum running time 2 months, from Dec 2013 to Feb 2014
- 98% storage element availability: minimum running time 2 months, from Dec 2013 to Feb 2014
- 98% availability of the WLCG and VO-specific services: minimum running time 2 months, from Dec 2013 to Feb 2014

7. Tier-2 support
In agreement with the CMS computing model, the Tier-1 center at JINR will accept the agreed share of raw and Monte Carlo data, allow access to the stored data by other Tier-2/Tier-3 centres of the WLCG infrastructure, and operate FTS channels for the Russian Tier-2 centres, including monitoring of data transfers. The details will be defined in May 2013.

8. A plan for providing on-call services/support according to the Tier-1 specifications as laid out in the WLCG MoU
The Tier-1 at JINR will operate an on-call service for the regional centres and users. It will be available during working hours (9:00 - 18:00 MSK) and will include support by e-mail, phone and, in certain cases, in-person visits to the regional centre that needs help. The services provided include consultation on the deployment of a typical Grid centre, help with specific problems of Grid-related services, support in handling security-related incidents, and dissemination of best practices.

Staffing and pledges

Here we describe the staffing of the Tier-1 and the support model.

    Role                                 FTE
    Administrative                       1.5
    Engineering infrastructure           2
    Network support                      2.5
    Hardware support                     3
    Core software and WLCG middleware    4.5
    CMS services                         3.5
    Total                                17

The computing resources to be allocated at the JINR Tier-1 for the years 2012-2014 are:

    Year                2012     2013     2014
    CPU (HEPSpec06)     14400    28800    43200
    Disk (Terabytes)    660*     3168*    4224*
    Tape (Terabytes)    72       5700     8000
    * including tape buffer pools

It is assumed that after Milestone 3 in 2013 the center will have computing facilities equal to 10% of the total existing CMS Tier-1 resources for 2013 (excluding CERN), and that after Milestone 4 the resources will be increased further to match the Tier-1 pledges for 2014.
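Referring back to the resource table above, the short sketch below merely tabulates the pledged capacities and prints the year-over-year increments; the figures are copied from the plan and no new numbers are introduced.

    # Pledged JINR Tier-1 capacities, copied from the table above.
    pledges = {
        "CPU (HEPSpec06)":  {2012: 14400, 2013: 28800, 2014: 43200},
        "Disk (Terabytes)": {2012: 660,   2013: 3168,  2014: 4224},
        "Tape (Terabytes)": {2012: 72,    2013: 5700,  2014: 8000},
    }

    for resource, by_year in pledges.items():
        years = sorted(by_year)
        steps = [f"{by_year[b] - by_year[a]:+d} in {b}"
                 for a, b in zip(years, years[1:])]
        print(f"{resource:18s}: " + ", ".join(steps))
    # CPU (HEPSpec06)   : +14400 in 2013, +14400 in 2014
    # Disk (Terabytes)  : +2508 in 2013, +1056 in 2014
    # Tape (Terabytes)  : +5628 in 2013, +2300 in 2014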