Computing resources for the OPERA experiment

A. Electronic detectors and hybrid analysis

Computing resources at LNGS

The OPERA acquisition system has been designed to implement Ethernet standards for data transfer at the earliest stage of the readout chain. The readout boards are equipped with controllers that play a role similar to the "nodes" of traditional Ethernet networks. A GPS-based clock distribution system synchronizes the whole DAQ. Owing to the low counting rate, a trigger-less operation mode can be envisaged; the corresponding bandwidth for this mode is estimated to be of the order of 100 Mbit/s. However, the size of permanently stored data is much lower, and it is this size that determines the computing resources needed for the electronic detectors. Permanent data belong to one of the following categories (a classification sketch is given after the list):

Detector hits in coincidence with the CNGS beam spill

Events surviving a level-2 asynchronous trigger for the selection of cosmic muons

Calibration events
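
As an illustration of this classification, the C++ sketch below shows how a GPS-timestamped DAQ record could be assigned to one of the three permanent-data categories. The structures, flags and window parameters are hypothetical and are not taken from the OPERA DAQ code; the sketch only mirrors the selection logic described above.

// Hypothetical sketch (not OPERA code): how permanently stored data might be
// categorized from GPS-timestamped DAQ records under the trigger-less scheme
// described above. Names, flags and window widths are illustrative only.
#include <cmath>

enum class DataCategory { BeamCoincidence, CosmicMuon, Calibration, Discard };

struct Event {
    double gpsTime;       // GPS timestamp of the event [s]
    bool   passesLevel2;  // outcome of the asynchronous level-2 cosmic selection
    bool   isCalibration; // flagged as a calibration event by the DAQ
};

// Assumed spill description: nearest CNGS extraction time and a coincidence window.
struct SpillInfo {
    double nearestSpillTime; // GPS time of the closest beam extraction [s]
    double windowHalfWidth;  // half-width of the coincidence window [s]
};

DataCategory classify(const Event& ev, const SpillInfo& spill)
{
    if (std::fabs(ev.gpsTime - spill.nearestSpillTime) < spill.windowHalfWidth)
        return DataCategory::BeamCoincidence; // kept: in coincidence with the CNGS spill
    if (ev.passesLevel2)
        return DataCategory::CosmicMuon;      // kept: selected by the level-2 cosmic trigger
    if (ev.isCalibration)
        return DataCategory::Calibration;     // kept: calibration data (99% of the stored volume)
    return DataCategory::Discard;             // not stored permanently
}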

The size of the raw data produced by the electronic detectors in one year of operation is of the order of 100 Gigabytes. In fact, 99% of these data are made up of calibration events. Hence, the overall data storage will not exceed 1 Terabyte (500 GB for storage and 500 GB for backup). As far as off-line computing is concerned, the CPU power is mainly devoted to the local processing of the data in order to determine the changeable sheets (CS) which have to be removed and sent to the CS Scanning Stations (see Sec. B). To ease software upgrades and maintenance during data taking, setting up a farm "cloned" from the LINUX farm at CERN is advisable. However, the corresponding CPU power needed is significantly lower: about 5 dual-processor PCs with a 2 GHz clock. The trace-back of brick movements during operation and the slow-control information are recorded locally in an ORACLE database; a dedicated 200 GB mirrored hard disk is foreseen for this task. Finally, workstations for physicists and technicians at LNGS during data taking have to be supplied (6-10 PCs). The electronic detector data and the information stored in the database will be sent to CERN. From the estimates above, it can be inferred that the bandwidth occupation of the LNGS network will be rather limited (a few hundred Gigabytes per year). Data from the various scanning stations are also expected to be sent to CERN for hybrid analysis.

Computing resources at CERN

The OPERA computing resources at CERN have recently been agreed with IT through the COCOTIME committee. Presently, 1.2 Terabytes have been allocated for OPERA. Extensions of up to 600 Gigabytes per year can be requested. The purchase and maintenance of these devices are part of the facilities offered by CERN to the approved experiments and do not affect the OPERA budget. Additional storage areas based on CASTOR can be requested; in this case, however, a cost of 400 Euro per Terabyte is charged for a 5-year period of allocation. A CPU power of 2000 kSi2K has been assigned to OPERA. This is equivalent to the computing power of the PC farm presently used by OPERA at CERN for MC production and analysis (10 dual-processor PCs with a 1.2 GHz clock), and it corresponds to about 5% of the CPU power allocated at CERN to a large LHC experiment like ATLAS or CMS. CERN workstations and licenses will host the ORACLE-based database systems for OPERA at no additional cost for the experiment. Finally, writing of the data to permanent media (e.g. DLT) will also be carried out at CERN.

Italian group activities concerning software development and hybrid analysis

Padova and Frascati share the responsibility for the development of the software related to the magnetic spectrometer. This includes the raw-data analysis and the corresponding DST production, the software for track reconstruction in the magnetic field, and the simulation of the active (RPC, XPC) and passive (magnetic field maps in iron and air) parts of the spectrometer. Moreover, the coordination of the physics analysis of the OPERA experiment is an Italian responsibility (Napoli).

About 10 FTE will be involved in the hybrid analysis (emulsion + electronic detectors). The manpower needed for the construction and operation of the Italian scanning stations is discussed in Sec. B.

B. Emulsion scanning activities

Most data from the OPERA experiment will come from the emulsions, so it is important to allocate a reasonable amount of resources to the scanning activities in order to support them efficiently. As explained in previous documents, the emulsion scanning work will be shared among European and Japanese laboratories. The European laboratories will take part in the OPERA emulsion scanning with a shared Scanning Station for CS scanning, and with several local laboratories for vertex location, vertex selection and precision measurements. The Collaboration has decided that the data will be organized in a database built on top of the Oracle Database Server; the current version (Oracle 9iDS) is taken as a reference for benchmarks and resource estimation. The official OPERA analysis tool will be ROOT, which will be interfaced with the Oracle Database Server for data analysis.
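
As an illustration of this ROOT-Oracle coupling, the macro below sketches how scanning output stored in the Oracle database might be queried from a ROOT session through ROOT's TSQLServer interface. The connection string, schema and column names (CS_TRACKS, SLOPE_X, SLOPE_Y, BRICK_ID) are invented for this example; only the ROOT SQL classes themselves are real.

// Hypothetical sketch of reading scanning output from the Oracle DB into ROOT.
// Table and column names are illustrative; the ROOT SQL interface
// (TSQLServer/TSQLResult/TSQLRow) is the only part taken from ROOT itself.
#include "TSQLServer.h"
#include "TSQLResult.h"
#include "TSQLRow.h"
#include "TH1F.h"
#include <cstdlib>

void read_cs_tracks()
{
    // Connect to the central Oracle server (or a local mirror).
    TSQLServer *db = TSQLServer::Connect("oracle://opera-db.example/opera",
                                         "reader", "password");
    if (!db) return;

    // Retrieve reconstructed CS microtrack slopes for one brick (hypothetical schema).
    TSQLResult *res = db->Query("SELECT SLOPE_X, SLOPE_Y FROM CS_TRACKS WHERE BRICK_ID = 12345");

    TH1F hx("hx", "CS track slope X", 100, -0.6, 0.6);
    while (TSQLRow *row = res->Next()) {
        hx.Fill(std::atof(row->GetField(0))); // fill histogram with the first column
        delete row;
    }
    hx.Draw();

    delete res;
    delete db;
}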

Central DB resources

The output of the scanning procedures will be stored in a central place, whose exact location could be either the Scanning Station site or CERN. The central DB resources will consist of 4 DB servers, each keeping one copy of the data. These machines have to provide the local laboratories with information about the CS scanning output, and will collect the vertex location, selection and precision measurement output. The high redundancy ensures not only data safety but also a quick response to queries coming from the local laboratories. Having 4 central replicas of the full data set also eliminates the need for local backup mirrors (see below). A typical DB server machine should have at least the power of a Dual Pentium IV Xeon at 3 GHz.

The amount of data to be stored in the DB servers for the whole sample of OPERA events is estimated to be below 2.5 TB. This quantity includes:
a) 5 years' run;
b) the increased CNGS beam flux;
c) calibration areas on each sheet of each brick for overall alignment, each containing 100 tracks;
d) for each scanback track, a set of 100 "spectator" tracks of cosmic ray particles used for local alignment;
e) for each vertex/track to be studied, a set of 100 "spectator" tracks of cosmic ray particles used for local alignment.

The estimated amount of data does not include raw data (i.e. track grains and tracks that do not pass minimum quality requirements). We do not plan to store these data in the DB; however, they will be stored locally in each laboratory (see below).

In order to ensure fault tolerance and access speed, the DB server disks should be SCSI units, with hardware RAID controllers.

The computing resources needed for the central DB are summarized as follows:
a) 4 Dual Pentium IV Xeon machines at 3 GHz;
b) 10 TB (DB space) + 0.4 TB (individual hard disks of the 4 machines, 100 GB each).

The network bandwidth should be large enough to ensure proper performance of the DB system. A reliable estimate can only come from a real-life simulation; setting one up, however, will take some work, because every step of the data-taking process must be simulated. For the time being, a rough number can be obtained from the following considerations: as a typical load, we can consider the bandwidth needed to write 2.5 TB from the various sites and to read these 2.5 TB at least 10 times during the 5 years of data taking. The total transfer capability should then be 27.5 TB / 5 yr ≈ 1.4 Mbit/s. This bandwidth should be ensured on average, whereas the peak requirements could be much higher; in any case, the required bandwidth should not be larger than 32 Mbit/s.
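
As a cross-check of this figure, the short sketch below reproduces the arithmetic (total transfer volume divided by the live time of the 5-year run); the only inputs are the numbers quoted above and standard unit conversions.

// Back-of-the-envelope check of the average DB bandwidth quoted above:
// 2.5 TB written once plus read 10 times = 27.5 TB over 5 years.
#include <cstdio>

int main()
{
    const double totalTB   = 2.5 * (1.0 + 10.0);           // write once, read 10 times
    const double totalBits = totalTB * 1e12 * 8.0;          // TB -> bits (decimal TB)
    const double seconds   = 5.0 * 365.25 * 24.0 * 3600.0;  // 5 years in seconds
    std::printf("average bandwidth = %.2f Mbit/s\n", totalBits / seconds / 1e6);
    // prints ~1.39 Mbit/s, consistent with the 1.4 Mbit/s estimate in the text
    return 0;
}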

Computing power at the Scanning Station

The Scanning Station will host about 10 fast microscope stages. Each microscope will be driven by a PC equipped with a Matrox frame grabber and an NI motor controller. The PCs will run under Windows; machines equipped with a Dual Pentium IV Xeon at 3 GHz fit our needs.

Additional machines will support the Scanning Station microscope PCs. Their functions are summarized here:
a) 2 Domain Controllers; redundancy is needed in order to ensure fault tolerance;
b) 2 File and Post-processing Servers to perform early post-processing operations, such as the connection of microtracks from one side of the emulsion plates to the other and the plate-to-plate alignment needed to follow scanback tracks; these machines should provide enough power to keep up with the microscope PCs;
c) 1 local DB Server mirroring the Scanning Station data, to reduce the bandwidth needs;
d) 2 personal workstations to keep the microscope PCs and servers under control and to monitor the data quality.
Each server machine (items "a" through "c") should provide at least the power of a Dual Pentium IV Xeon at 3 GHz. Commercially available personal workstations are expected to have the same computing power.

The processing power is summarized as follows:

17 Dual Pentium IV Xeon machines at 3 GHz.

The Scanning Station DB Server will store less than 500 GB (almost exclusively CS scanning data). Raw data should be sent to the file servers and moved to mass storage devices as soon as possible. The file servers act as staging areas, and a storage space of 300 GB per file server should be enough. Fault tolerance should be ensured through the use of RAID configurations. Raw data should be stored in files. The overall size of the raw data for the CS will depend on the quality cuts applied to the emulsion microtracks, but it should be within 1 TB. One mirror is foreseen for data safety. Storage systems based on LTO technology can already offer up to 15 TB of online storage space in a single box. Raw data that are no longer needed can be moved offline by removing the LTO cassettes and storing them in a safe place.

The storage capabilities are then summarized as follows:
a) 1.1 TB of disk space for DB servers and file servers + 1.7 TB of additional space for the individual hard disks of each PC (100 GB each);
b) 1 TB for LTO online mass storage + 1 TB for LTO offline mass storage.

Network at the Scanning Station

The internal network of the Scanning Station must be very fast in order to avoid slowing down the data exchange among DAQ PCs and servers. A Local Area Network with 1 Gbit/s speed can easily be set up. A dedicated external network connection at 2 Mbit/s should satisfy the needs. This should not interfere with the communication to and from the central DB, so these 2 Mbit/s should not be included in the 32 Mbit/s dedicated to the DB.

Computing power at local laboratories

The tasks of vertex location, vertex selection and precision measurement will be shared by several European laboratories, 5 of which are Italian. The computing structure of each laboratory will mimic that of the Scanning Station. 40 bricks/day will leave the OPERA detector to be scanned, and about 20 of these should be processed in Europe. The Italian laboratories are expected to provide the scanning power for about 2/3 of the European bricks, i.e. about 14 bricks/day. The Scanning Station will scan the CS for all the European laboratories.

About 20 microscope stages will be driven by DAQ PCs, each equipped with one Matrox frame grabber and one NI motor controller. About 15 of these microscope stages should be in Italy. At each site, two additional machines will act as Domain Controller, File and Post-processing Server (see above) and DB Server. One more personal workstation will be used to keep the data quality under control. We expect to need 2 more PCs per site for data analysis. The needs for processing power can be summarized as follows:

40 Dual Pentium IV Xeon machines at 3 GHz.

In order to estimate the disk storage needs, we have to take into account the fact that each laboratory only needs a local copy of its own data, so as to avoid overloading the network. On average, each of the 5 Italian laboratories will have a local DB with 2/15 (1/5 × 2/3) of the total data set. As a result, another, delocalized copy of the whole scanning data set will be shared among the laboratories. The amount of raw data will be much larger than in the case of CS scanning, because the brick sheets will contain cosmic ray tracks for alignment. The present estimate for the raw data is 30 TB. This quantity will be shared among the laboratories, most probably stored on LTO devices. Including the need for safety mirrors, a total amount of 60 TB of raw data must be stored. The storage capacity needed by the scanning laboratories can be summarized as:
a) 1.7 TB for the local DBs + 4.0 TB of additional space for the individual hard disks of each PC (100 GB each);
b) 30 TB of LTO online mass storage + 30 TB of LTO offline mass storage.

Network at the local laboratories

The internal network of each laboratory can be a normal 100 Mbit/s LAN. For the external network connections, a dedicated bandwidth of 2 Mbit/s per scanning laboratory should be enough.

Human resources

The Italian groups are heavily involved in the emulsion scanning and analysis tasks of the OPERA experiment, with about 20 FTE.

The coordination of the emulsion scanning and analysis on the European side is an Italian responsibility (Bari). The Scanning Station, which will be in charge of scanning the CS for all the European laboratories, will be an Italian responsibility. The Italian local laboratories must provide the scanning of 2/3 of the bricks assigned to the European laboratories. Moreover, the Italian groups will be responsible for setting up and maintaining the Emulsion DB for the European part of the OPERA collaboration.

Summary of resource needs for emulsion scanning

Processing power:

57 Dual Pentium IV Xeon machines at 3 GHz

Data storage:
a) 18.9 TB of disk space;
b) 62 TB of LTO mass storage.

Network bandwidth:
a) 1 central DB station with an internal LAN at 1 Gbit/s and a 32 Mbit/s access to the Internet;
b) 6 local stations (including the CS Scanning Station) with internal LANs at 100 Mbit/s and a 2 Mbit/s access to the Internet.

Human power:

25 FTE
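
For convenience, the short C++ sketch below tallies the per-site figures quoted in the previous subsections and reproduces the storage and processing totals above; the numbers are those from the text, grouped here only to show how the totals are obtained.

// Tally of the emulsion-scanning resources quoted in the text
// (central DB + Scanning Station + local laboratories).
#include <cstdio>

int main()
{
    // Machines (all Dual Pentium IV Xeon at 3 GHz); the 4 central DB servers
    // quoted earlier are not part of this 57.
    const int machines = 17 /* Scanning Station */ + 40 /* local laboratories */;

    // Disk space [TB]
    const double disk = (10.0 + 0.4)   // central DB: DB space + individual disks
                      + (1.1 + 1.7)    // Scanning Station: DB/file servers + PC disks
                      + (1.7 + 4.0);   // local labs: local DBs + PC disks

    // LTO mass storage [TB], online + offline
    const double lto = (1.0 + 1.0)     // Scanning Station CS raw data + mirror
                     + (30.0 + 30.0);  // local laboratories raw data + mirror

    std::printf("machines = %d, disk = %.1f TB, LTO = %.0f TB\n", machines, disk, lto);
    // prints: machines = 57, disk = 18.9 TB, LTO = 62 TB
    return 0;
}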
