Overview of the Computing Fabric at CERN
Bernd Panzer-Steindel
Computing Fabrics Area Manager, CERN/IT
The following overview tries to describe in some detail the setup and organization of the Computing
Fabric at CERN.
The LCG project is of course by nature largely international, while the Fabric Area of the project
has a very strong CERN focus: its goal is to provide the necessary local infrastructure, the T0 center, to
safely store and process the large amount of raw data produced by the four LHC experiments (ALICE,
ATLAS, CMS, LHCb). The second goal is the installation and deployment of a more general T1 center
for the reprocessing, analysis and import/export of the data (Tier center structure) (dataflow diagram).
The internal structure and the choice of technology are handled separately by each center, but some
coordination is done through the GDB, regular conferences like HEPIX and bilateral projects between
HEP centers.
The Fabric activities are integrated into the management structure of the CERN IT division.
The Computing Fabric consists mainly of services and of activities to improve or evolve them.
Its basic principle is the coupling of computing elements through software and hardware to reach
different complexity levels (complexity diagram).
In the functional view, the Computing Fabric consists of several layers (fabric diagram).
Infrastructure
The basic infrastructure provides space, electricity and cooling for the computing equipment. The
CERN computer center is currently undergoing major changes to prepare for the large amount of
equipment needed in 2007 when the LHC starts. The vault of building 513 has been completely refurbished
(pictures) to nearly double the available space.
The major project over the next two years is the upgrade of the power stations to provide the center with
2 MW of electrical power and to ensure the corresponding cooling (overview electricity and cooling).
The amount of available power determines how much compute power can be provided.
Unfortunately, there are not yet enough technological developments improving the ratio of compute power
per consumed watt (power status).
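As a rough, hedged illustration of this constraint, the following sketch estimates how many dual-processor boxes a 2 MW budget could host; the per-node wattage and the cooling overhead factor are assumed numbers for illustration only, not CERN figures.

# Back-of-the-envelope sketch: node capacity for a given power budget.
# The per-node wattage and cooling overhead are assumptions, not measured values.
POWER_BUDGET_W = 2_000_000   # 2 MW, as planned for the upgraded center
WATTS_PER_NODE = 300         # assumed draw of one dual-processor box
COOLING_OVERHEAD = 1.5       # assumed ratio of total power to IT power

usable_it_power_w = POWER_BUDGET_W / COOLING_OVERHEAD
max_nodes = int(usable_it_power_w // WATTS_PER_NODE)
print(f"Roughly {max_nodes} boxes fit into the power budget")  # ~4400 with these assumptions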
Another important activity is the management of the material flow in the computer center, which covers a
wide range of issues: the organization of purchases, the deployment and installation of new equipment, the
regular replacement of ‘old’ systems, the daily repair/maintenance of broken equipment and the detailed
bookkeeping of the material flow and its status. One focus here is the preparation of the purchase procedures
for very large quantities of computing items, as during Phase 2 of the LCG project (2006 – 2008) we will
have to invest about 60 million SFr to cope with the computing requirements of the LHC (cost evaluation).
System management and operation
The CERN IT approach to the general handling of system administration has evolved during the last two
years. We are moving from an outsourced solution to insourcing. This is based on the detailed experience
of running a large center during the last years with a mixture of local service teams and an IT company
providing system administration services. This period also greatly improved the understanding of our
total cost of ownership (description).
In parallel, the software tools have been developed and adapted to achieve a high level of automation.
The current architecture distinguishes between three different functional clusters in the center: CPU nodes
for computation, disk nodes for storage and tape servers for access to the mass storage (tape silos).
All functionalities are provided by dual Intel processor ‘deskside’ boxes running the Linux operating
system. The system management team uses tool sets to install the nodes (operating system and
environment), configure the environment, monitor their status and correct problems.
Overview installation and configuration
Overview monitoring and fault tolerance
Overview basic control
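To make the monitor-and-correct part of this tool set concrete, here is a minimal conceptual sketch in Python; it is not the actual CERN tooling, and the metric name, threshold and node name are invented for illustration.

# Conceptual sketch of a monitor-and-correct cycle (not the CERN tool set).
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Check:
    metric: str                               # name of the sampled quantity
    threshold: float                          # value above which we react
    corrective_action: Callable[[str], None]  # automated recovery step

def restart_service(node: str) -> None:
    # Placeholder corrective action; a real system would restart a daemon,
    # reschedule jobs or open a ticket for an operator.
    print(f"[{node}] corrective action triggered")

CHECKS: List[Check] = [
    Check(metric="load_average", threshold=20.0, corrective_action=restart_service),
]

def evaluate(node: str, samples: Dict[str, float]) -> None:
    # Compare one node's monitoring samples against the configured checks.
    for check in CHECKS:
        value = samples.get(check.metric)
        if value is not None and value > check.threshold:
            check.corrective_action(node)

evaluate("lxb0001", {"load_average": 35.2})  # hypothetical node and sample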
A team of Linux experts provides the necessary support for the system management team.
The evolution and deployment of Linux versions on the production cluster (mainly concerning the CPU
nodes) is done in close collaboration between the different service teams and the experiments (Linux
certification). In addition, some HEP-wide coordination is done regularly in the framework of the HEPIX
conference and via HEP mailing lists.
Services for the Data and Task flow
In this layer the different computing elements are coupled physically through the network structure and
logically through a batch scheduling system and storage systems.
Our network strategy is based on a hierarchical tree structure using different Ethernet implementations
(Fast, Gigabit, 10 Gbit; network diagram), which will expand in size and performance from 2005 onwards
to cope with the expected data rates in 2008 and later (data rate note). This is organized by the CERN IT
network team.
Overview cluster network
The large production cluster is separated into two parts, called Lxplus and Lxbatch. The Lxplus service
provides the interactive user environments for the different experiments. The major resources are provided
by Lxbatch, which uses the batch scheduling system LSF for the coordination and load balancing of the user
jobs.
Overview batch scheduler
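As a minimal, hedged illustration of how user work reaches Lxbatch, a submission could be wrapped as below; the queue name and job script are hypothetical, and only the standard LSF command 'bsub -q <queue> <command>' is assumed.

# Sketch of an LSF job submission; LSF then places the job on a suitable
# Lxbatch node according to queue policy and current load.
import subprocess

def submit_job(queue: str, command: str) -> None:
    # Invoke the standard LSF submission command.
    subprocess.run(["bsub", "-q", queue, command], check=True)

# Hypothetical usage:
# submit_job("one_day_queue", "./run_reconstruction.sh")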
There are two services with different mechanisms to provide access to data and software repositories:
1. Access to small files (software, user environment, etc.) is managed by the Andrew File System
(AFS). The shared filesystem approach is also becoming a key element of the distribution mechanism for
experiment software environments in the GRID.
Overview shared storage
2. The bulk data management is done through the Hierarchical Storage Management system
CASTOR. It is not a full-fledged shared filesystem, but rather provides a minimal set of similar
functionality to keep the complexity low. Today about 300 disk servers with about 200 TB of disk
space are managed by CASTOR as a disk cache layer, with about 2 PB of tape space in large tape
libraries as the backend storage; the basic idea is sketched after this list.
Hierarchical Storage Manager, CASTOR
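The hierarchical idea behind CASTOR, a limited disk cache in front of a large tape backend, can be sketched conceptually as follows; this is an illustration of the principle only, not the CASTOR interface.

# Conceptual model of a hierarchical storage manager: reads are served from
# a bounded disk cache, and a cache miss triggers a recall from tape.
class HierarchicalStore:
    def __init__(self, cache_capacity: int) -> None:
        self.cache_capacity = cache_capacity  # files the disk layer can hold
        self.disk_cache = {}                  # fast layer (disk servers)
        self.tape = {}                        # large, slow backend (tape silos)

    def write(self, name: str, data: bytes) -> None:
        # New data lands in the cache and is migrated to the tape backend.
        self._make_room()
        self.disk_cache[name] = data
        self.tape[name] = data

    def read(self, name: str) -> bytes:
        # Serve from disk if cached, otherwise recall the file from tape.
        if name not in self.disk_cache:
            self._make_room()
            self.disk_cache[name] = self.tape[name]
        return self.disk_cache[name]

    def _make_room(self) -> None:
        # Evict arbitrary cached files when full; real systems use smarter
        # replacement policies and garbage collection.
        while len(self.disk_cache) >= self.cache_capacity:
            self.disk_cache.pop(next(iter(self.disk_cache)))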
Couplings to the ‘inside/outside’
The production part of the Fabric is ‘coupled’ to the inside and outside via the online farms of the
experiments, feedback loops to experimental test clusters, WAN and GRID middleware connections to other
centers, and projects with industry and institutes.
The deployed nodes in the center are actually not a monolithic installation used exclusively for production,
but rather a collection of sub-clusters (cluster diagram) for a variety of activities. Besides the main
production nodes, an important system is the so-called prototype farm. This is used for technology
evaluations focused on future production use (relations with Industry, openlab) and for high performance
computing data challenges like the ALICE mass storage test or online activities (ALICE event building,
ATLAS control) (data challenges). This is related to the computing models for the four main computing
tasks:
1. Central Data Recording (CDR) and reprocessing
2. Analysis
3. Monte Carlo production
4. Data import and export
A considerable number of nodes is used in the different GRID testbeds and certification activities.
The boundary between online computing at the experiments and offline computing in the center is a rather
arbitrary one. Both sides now use commodity computing equipment as much as possible, but still
have a few differences in requirements. Some discussions about synergies in these areas have started
(workshop).
CERN is the lead partner in the DataTAG EU project to test high-speed (10 Gbit) Wide Area Network
connections. The equipment will be used to provide a 10 Gbit WAN connection to the clusters in the
center, starting with a link to the prototype farm. A project between US-CMS and IT in the LCG
framework will use this to test the data export via WAN on a large scale in time and size (proposal).
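To give a feeling for the scale, a 10 Gbit/s link at full line rate corresponds to roughly 1.25 GB/s, so one terabyte could in principle be exported in about a quarter of an hour before protocol and disk overheads; the dataset size below is an assumed example.

# Rough arithmetic for a 10 Gbit/s WAN link; real transfers achieve less
# than line rate because of protocol, disk and scheduling overheads.
LINK_GBIT_PER_S = 10
BYTES_PER_S = LINK_GBIT_PER_S * 1e9 / 8   # ~1.25 GB/s at line rate
DATASET_BYTES = 1e12                      # assumed 1 TB example dataset
minutes = DATASET_BYTES / BYTES_PER_S / 60
print(f"~{minutes:.0f} minutes per TB at full line rate")  # ~13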
The interaction with the experiments takes place via the different services; the resource planning was done
partly through the COCOTIME committee and is now handled by the PEB.
The center of course provides services not only for the LHC experiments, but also for the fixed-target
experiments (already with large data rates, 150 MB/s CDR peak values) and for the general user community
(small experiments, engineers, etc.) (IT home page).
During the last two years we have organized two external audits of the activities in the center
(audit1, audit2).
Details about the planning and developments in the different areas described above can be found
on the milestone status page.
The progress in the Fabric developments is reported regularly (per quarter) in the quarterly reports together
with the other areas of the LCG project; a detailed risk analysis paper is also available.