Annex 1
The Execution Plan for the Deployment and Commissioning
of the CMS Tier-1 Center in JINR
Introduction
In March 2011 the proposal to create an LCG Tier-1 center in Russia as an integral part of
the central data handling service of the LHC experiments was expressed in an official
letter from the Minister of Science and Education of Russia, Andrey Fursenko, to the CERN
Director General, Rolf-Dieter Heuer. In pursuance of the principal provisions of this
proposal, Russia will accept responsibility for creating a Tier-1 center to serve
all four LHC experiments (ALICE, ATLAS, CMS and LHCb).
In 2011 the Federal Target Programme project «Creation of the automated
system of data processing for experiments at the Large Hadron Collider of Tier-1 level
and maintenance of Grid services for distributed analysis of this data» was approved for
the period 2011-2013 with a budget of about 8.5 MCHF.
The Project aims to create a Tier-1 center in Russia for processing experimental
data received from the LHC and to provide Grid services for subsequent analysis of
this data at the distributed centers of the LHC computing Grid.
Responsibilities are shared: the National Research Centre "Kurchatov Institute" (Moscow)
is primarily responsible for supporting the ALICE, ATLAS and LHCb experiments, while
JINR (Dubna) provides Tier-1 services for the CMS experiment.
The present document aims to demonstrate the capability to create and operate
the Tier-1 centre at JINR, Dubna. It contains the execution plan with milestones,
presented to the WLCG Overview Board for the purpose of signing the WLCG Memorandum of
Understanding as an associate Tier-1 centre, with the idea of becoming a full Tier-1 within
one year.
Milestones of the Tier-1 Deployment and Commissioning
The master execution plan for 2012-2014 consists of a prototype stage followed by two
phases. The prototype is to be constructed by the end of 2012. Phase I is the
implementation of full Tier-1 functionality, which has to be completed in 2013. Phase II,
in 2014, foresees the upgrade of Tier-1 resources. Work on the full-featured Tier-1 will
start concurrently with testing of the prototype in 2013. To demonstrate the ability
to operate a Tier-1 center, we define four milestones:

• Milestone 1: the Tier-1 prototype with 10% of the whole capacity (excluding tapes)
  is deployed, and the Tier-1 is integrated into the LHC OPN with connectivity of
  2 Gbit/sec;
• Milestone 2: data transfer over the LHC OPN is tested at the 2 Gbit/sec level, WLCG
  and CMS-specific services on the 10% capacity are tested for compliance with the
  availability and reliability requirements, and the physical LHC OPN connectivity is
  upgraded to 10 Gbit/sec;
• Milestone 3: data transfer over the LHC OPN is tested at the 10 Gbit/sec level, and
  resources are upgraded;
• Milestone 4: WLCG and CMS-specific services on the full Tier-1 capacity are
  tested for compliance with the availability and reliability requirements, the WLCG
  MoU is signed to become an associate Tier-1 centre, and resources are upgraded to the
  level of 10% of the aggregate existing Tier-1 capacity in 2014.
Here is the detailed plan of phases and milestones:
Objective                                                       Target date
Presentation of the Execution Plan to the WLCG OB               Sep 2012
Prototype
  Disk & Servers installation and tests                         Oct 2012
  Tape system installation                                      Nov 2012
  Organization of network infrastructure and connectivity to   Nov 2012
    CERN via GEANT (2 Gb)
  LHC OPN integration (2 Gb) and registration of the JINR       Dec 2012
    Tier-1 center in GOCDB, including integration with the
    APEL accounting system
  Milestone 1                                                   Dec 2012
Phase I
  LHC OPN functional tests (2 Gb)                               May 2013
  Test of WLCG and CMS services (using 2 Gb LHC OPN)            May 2013
  Test of tape system at JINR: data transfers from CERN to      May 2013
    JINR (using 2 Gb LHC OPN)
  Test of accounting data publishing via APEL                   May 2013
  Definition of support level for Tier 2                        May 2013
  Increase CERN connectivity to 10 Gb                           Jul 2013
  Milestone 2                                                   Jul 2013
  LHC OPN functional tests (10 Gb)                              Aug 2013
  Test of tape system: data transfers from CERN to JINR         Aug 2013
    (using 10 Gb LHC OPN), local access test
  Upgrade of tape, disk and CPU capacity at JINR                Nov 2013
  Milestone 3                                                   Nov 2013
  85% of the job capacity running for at least 2 months         Feb 2014
  Storage availability > 98% (functional tests) for at least
    2 months
  Running with > 98% Availability & Reliability for at least
    30 days
  WLCG MoU as an associate Tier-1 center
Phase II
  Upgrade of disk, tape and CPU capacity at JINR                Oct 2014
  Milestone 4                                                   Dec 2014
Detailed execution plan
1. Disk & Servers installation and tests
• 1200 CPU slots will be installed in November 2012
• 660 TB of disk-based storage will be installed in November 2012
• 1600 CPU slots will be installed in November 2013
• 3168 TB of disk-based storage will be installed in November 2013
• 1600 CPU slots will be added in October 2014
• 1056 TB of disk-based storage will be added in October 2014
2. Tape system installation
• A tape library with a total capacity of 72 TB will be purchased and installed in
  November 2012
• A tape library with a total capacity of 5720 TB will be purchased and installed in
  November 2013
• An additional 1600 TB of tape media and additional disk drives will be purchased
  and installed in October 2014
3. Network connectivity
The LHCOPN network bandwidth for Tier-0-Tier-1 and Tier-1-Tier-1 connections is
about 2 Gbps in 2012 and will be increased to 10 Gbps in 2014. The existing JINR link
to public (academic) networks, with a bandwidth of 2x10 Gbps, will be used to connect
the Tier-1 with other Tier-2/Tier-3 centers.
Year    LHCOPN, Gbps    WAN, Gbps (Russian academic networks and GEANT2)
2012    2               2x10
2013    10              2x10
2014    10              2x10
Below is the plan for the integration of the upcoming Tier-1 into the LHC OPN:
Goal                                  Date
Integration into LHC OPN (2 Gb)       Dec 2012
Functional tests of the OPN (2 Gb)    Feb 2013
Integration into LHC OPN (10 Gb)      Jul 2013
Functional tests of the OPN (10 Gb)   Aug 2013
4. Data transfer tests
Data transfer tests from CERN to JINR will be performed in 2013 with 2 Gb and 10 Gb
OPN connectivity and in 2014 with 20 Gb OPN connectivity. The tests will demonstrate
the ability to receive CMS raw data and store it in the tape system, in an amount
scaled according to the JINR Tier-1 capacity during each testing period.
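As a rough illustration of what the planned link speeds imply for these tests, the
sketch below estimates the daily volume and the time needed to move a given amount of
data. It assumes decimal units and an ideal, fully utilised link. The function names and
the efficiency parameter are hypothetical and not part of the plan. The 72 TB figure is
the 2012 tape library capacity quoted above.

```python
# Idealised transfer estimates for the planned LHC OPN link speeds.
# Assumptions (not from the plan): decimal units (1 TB = 10**12 bytes,
# 1 Gbit = 10**9 bits) and a configurable link efficiency (1.0 = ideal).

SECONDS_PER_DAY = 86_400


def daily_volume_tb(link_gbps: float, efficiency: float = 1.0) -> float:
    """Terabytes that can be moved in one day over a link of the given speed."""
    bytes_per_second = link_gbps * 1e9 / 8 * efficiency
    return bytes_per_second * SECONDS_PER_DAY / 1e12


def transfer_hours(volume_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move volume_tb terabytes over the link."""
    bits = volume_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600


if __name__ == "__main__":
    for gbps in (2, 10):
        print(f"{gbps:>2} Gbit/s: {daily_volume_tb(gbps):6.1f} TB/day, "
              f"72 TB in {transfer_hours(72, gbps):5.1f} h")
```

Under these idealised assumptions a 2 Gbit/s link moves about 21.6 TB per day and a
10 Gbit/s link about 108 TB per day. Real transfers would be lower once protocol
overhead and tape write rates are taken into account.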
5. Tests for WLCG and VO-specific services
From start-up, in line with the requirements of WLCG and the LHC experiments, the
upcoming Tier-1 has to support a number of the main services for all four
experiments. In particular, at JINR:
• WLCG authorisation and security (GSI, Argus, gLExec)
• Computing Element (CREAM CE) and Worker Nodes
• Storage Elements (disk- and tape-based)
• Monitoring and Accounting (Nagios, APEL)
• Workload management (WMS)
• Logging and Bookkeeping service (LB)
• Information service (BDII)
• File transfer service (FTS)
• Core services (NTP, DNS, logging and auditing)
• HTTP proxy farm
• VO-specific services for CMS: PhEDEx
Our schedule includes tests of these services in JINR at all stages of the Tier-1 project.
6. Plan for service availability and reliability
Integration into the SAM/Nagios framework and acceptance of the availability and
reliability tests will be done in the prototype phase (December 2012). The tests will
then continue from the early days of the Tier-1 centre to check that the deployed
resources are available and reliable.
The main tests of service availability and reliability are planned for the end of 2013,
when the Tier-1 centre at JINR will reach the target of 10% of the total capacity of
Tier-1 centres in 2013.
Goal                                       Minimum running time   From       To
85% of the job capacity run                2 months               Dec 2013   Feb 2014
98% of the storage element availability    2 months               Dec 2013   Feb 2014
98% of the WLCG and VO-specific services   2 months               Dec 2013   Feb 2014
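For orientation, the targets in the table can be related to hours of permitted downtime.
The sketch below uses the availability and reliability definitions as they are commonly
stated for WLCG sites (uptime over known time, with scheduled downtime additionally
excluded for reliability). These definitions, the Period class and the sample figures
are assumptions for illustration and are not taken from this plan.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Hours observed in a reporting period (hypothetical helper type)."""
    total_hours: float
    up_hours: float
    scheduled_down_hours: float = 0.0
    unknown_hours: float = 0.0


def availability(p: Period) -> float:
    # Assumed definition: uptime / (total time - time with unknown status)
    return p.up_hours / (p.total_hours - p.unknown_hours)


def reliability(p: Period) -> float:
    # Assumed definition: uptime / (total - scheduled downtime - unknown)
    return p.up_hours / (p.total_hours - p.scheduled_down_hours - p.unknown_hours)


def downtime_budget_hours(target: float, days: int) -> float:
    """Maximum downtime (hours) that still meets an availability target."""
    return (1.0 - target) * days * 24


if __name__ == "__main__":
    # Invented sample: a 30-day period with 8 h scheduled and 4 h unscheduled downtime.
    sample = Period(total_hours=720, up_hours=708, scheduled_down_hours=8)
    print(f"availability = {availability(sample):.4f}")
    print(f"reliability  = {reliability(sample):.4f}")
    print(f"98% over 30 days allows {downtime_budget_hours(0.98, 30):.1f} h of downtime")
```

With these definitions a 98% target over 30 days leaves roughly 14.4 hours of downtime
in total, so a single extended outage can consume most of the budget.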
7. Tier-2 support
In agreement with the CMS computing models, the Tier-1 center at JINR will
• accept the agreed share of raw and Monte Carlo data;
• allow access to the stored data by other Tier-2/Tier-3 centres of the WLCG
  infrastructure;
• operate FTS channels for Russian Tier-2 centres, including monitoring of data
  transfers.
The details will be defined in May 2013.
8. A plan for providing on-call services/support according to the Tier-1 specifications
as laid out in the WLCG MoU
Tier-1 in JINR will operate an on-call service for the regional centres and users. It will be
available during the working hours (9:00 – 18:00 MSK) and will include support by e-mail,
phone and, in certain cases, by in-person visits to the regional centre that needs help.
Provided services include
• consultation on deploying a typical Grid centre;
• help with specific problems of Grid-related services;
• support in handling security-related incidents;
• dissemination of best practices.
Staffing and pledges
Here we describe the staffing of the T1 and the support model.
ROLE                                  FTE
Administrative                        1.5
Engineering Infrastructure            2
Network support                       2.5
Hardware support                      3
Core software and WLCG middleware     4.5
CMS services                          3.5
Total                                 17
The computing resources to be allocated at JINR Tier-1 for the years 2012 - 2014 are:
Year                2012     2013     2014
CPU (HEPSpec06)     14400    28800    43200
Disk (Terabytes)    660*     3168*    4224*
Tape (Terabytes)    72       5700     8000
* includes tape buffer pools
It is assumed that after milestone 3 in 2013 this center will have computing facilities of
10% of the total existing CMS Tier-1 resources for 2013 (excluding CERN) and after
milestone 4 resources will be increased further to catch up with the Tier-1 pledges for
2014.
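The year-on-year growth implied by the resource table can be worked out directly. The
sketch below is only an illustration: the dictionary layout and the function name are
hypothetical, and the figures are copied from the table of allocated resources.

```python
# Resources to be allocated at the JINR Tier-1, as listed in the table above.
PLEDGES = {
    2012: {"cpu_hs06": 14400, "disk_tb": 660, "tape_tb": 72},
    2013: {"cpu_hs06": 28800, "disk_tb": 3168, "tape_tb": 5700},
    2014: {"cpu_hs06": 43200, "disk_tb": 4224, "tape_tb": 8000},
}


def yearly_increments(pledges: dict) -> dict:
    """Return, for each year after the first, the increase over the previous year."""
    years = sorted(pledges)
    return {
        later: {key: pledges[later][key] - pledges[earlier][key]
                for key in pledges[later]}
        for earlier, later in zip(years, years[1:])
    }


if __name__ == "__main__":
    for year, delta in yearly_increments(PLEDGES).items():
        print(year, delta)
```

With the table's figures the 2014 disk increment comes out at 1056 TB, which matches the
disk addition planned for October 2014 in the detailed execution plan.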