Short- to middle-term GRID deployment plan for LHCb

The GRID is a unique opportunity to unify the CPU resources available across the collaborating institutes of the experiment.
To make this a reality, we need to agree on a common deployment plan with the institutes concerned. This plan could be part of the EU grid project, or it could be a stand-alone LHCb plan.
I see the following steps and milestones.
LHCb GRID working group
An LHCb GRID working group has been started, initially consisting of CERN, RAL and Liverpool. A kick-off meeting took place at RAL on June 14th. The purpose of this meeting was to establish some goals that are realistically achievable in the short term:
1. Globus should be installed and tested at CERN, RAL and Liverpool. Status:
  - Globus version 1.1.3 has been installed at CERN on a single Linux node (pcrd25.cern.ch) running Red Hat 6.1 but is not yet available for use. The service should be ready for use this week.
  - Globus version 1.1.3 will be installed at RAL on CSF by the end of this week. Some instability is expected during July.
  - Globus version 1.1.3 is being installed at Liverpool MAP by...
2. Members of the GRID working group are given access to their respective testbeds. Members cross-check that they can run jobs on each other's machines. Status:
  - Eric and Glenn Patrick have access to CERN pcrd25.cern.ch. Chris Brew and Girish Patel still to be given access to CERN.
  - Eric, Glenn and Chris Brew have access to RAL CSF. Girish to be given access to RAL CSF.
  - All to be given access to MAP.
Timescale: accounts ready by end of June.
3. Checkpoint during the LHCb software week meeting. Inform the collaboration of this project and
invite other institutes to join the working group.
4. We make sure that SICBMC can be run at CERN, RAL and Liverpool. (SICBDST requires input tapes - we propose to wait until later.)
  - Eric to prepare a script using Globus commands (this will ensure we all run the same executable) to run sicbmc and copy the data back to the host from which the job was submitted; a sketch of such a wrapper follows after this list. Timescale: 1st week of July.
  - Other partners to test the script from their sites. Timescale: end of July.
5. We verify that the data produced by SICBMC can be shipped back to CERN and written onto
tape.
  - This should be done exclusively using Globus commands - new fast FTP commands are under development. Some scripting may be required to get the data to SHIFT and onto tape. Timescale: early August.
6. Checkpoint meeting at Liverpool in mid to late August.
7. Some benchmarking of sustained data transfer rates between the three sites.
8. This mini-project should be completed by the end of September. We should report the results at the LHCb week in Milano.
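
To make steps 4, 5 and 7 a little more concrete, the sketch below shows one possible shape for such a wrapper. It is written in Java (the language of our existing production tools) rather than as the shell script Eric will prepare; the remote host, file names and sicbmc argument are invented for illustration, and the command names (globus-job-run for remote execution, globus-rcp for the copy) are assumptions to be checked against the installed toolkit.

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

public class RunSicbmc {

    // Run a command, echo its combined output, and stop on a non-zero exit code.
    static void run(String... cmd) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        if (p.waitFor() != 0) {
            throw new RuntimeException("command failed: " + String.join(" ", cmd));
        }
    }

    public static void main(String[] args) throws Exception {
        String remoteHost = "csf.rl.ac.uk";          // placeholder testbed node
        String remoteExe  = "/lhcb/bin/sicbmc";      // the common executable (placeholder path)
        String remoteOut  = "/lhcb/data/run001.dat"; // placeholder output file on the remote site
        String localOut   = "run001.dat";

        // Step 4: run the same sicbmc executable on the remote testbed
        // (globus-job-run <host> <executable> [args] is assumed to submit through GRAM).
        run("globus-job-run", remoteHost, remoteExe, remoteOut);

        // Steps 5 and 7: copy the output back and time the transfer.
        // globus-rcp is an assumed copy command; the fast ftp commands are still under development.
        long start = System.currentTimeMillis();
        run("globus-rcp", remoteHost + ":" + remoteOut, localOut);
        double seconds = (System.currentTimeMillis() - start) / 1000.0;
        double megabytes = new File(localOut).length() / 1.0e6;
        System.out.printf("transferred %.1f MB in %.1f s (%.2f MB/s)%n", megabytes, seconds, megabytes / seconds);
    }
}

The timing around the copy only gives a first rough number for step 7; proper benchmarking would use larger files and repeated transfers.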
Integration of the GRID with PCSF, LSF and MAP
1. Eric, Chris, Glenn, Girish and Gareth attended the Globus course at RAL on 21 & 22 June. The following issues came up:
  - The Globus toolkit has a C API, so it should be easy to integrate with Gaudi.
  - There are Globus commands for remotely running scripts (or executables), recovering data from standard output, copying data between sites, and saving/consulting metadata through LDAP (Lightweight Directory Access Protocol).
  - By using Globus, we can ensure that:
    1. We are all using the same executables.
    2. We are all using the same scripts.
    3. We are handling the data in a uniform way.
  - Globus is independent of the underlying operating system (though there is no date yet for a Win32 client) and of any particular batch scheduler, although it does work with Condor. This will be discussed in Lyon on June 30th.
  - We should investigate whether we can use LDAP for our bookkeeping database (a sketch of such a query follows at the end of this section):
    1. There is a well-defined C API (which would solve our current Oracle -> Gaudi interface problems).
    2. It would greatly simplify the DB updating done by MC production.
    3. Everybody seems to be heading in this direction.
    4. Oracle have an LDAP server product, so it should be possible to migrate. Someone should explore this and report. If no one volunteers, I'll look into it.
2. My Java tools to manage the job submission, tape management and bookkeeping database updating should be extended to use the GRID testbeds. Timescale: October. (This will be done in parallel with the current NT based production.)
3. As the RAL farm will be converted from NT to Linux before the end of the year, the CERN PCSF is dual boot and can be switched over to Linux at any time, and MAP is already running Linux, I suggest that we decide to use Linux globally as the LHCb GRID operating system. IT (Tony Cass) have been informed that they should install Globus on PCSF. Timescale for this: November.
4. The question of abandoning NT for production should be discussed with the collaboration during the next software week. I know of at least one person who is very much against it.
5. To test the system, we should aim for a production run using the GRID in December.
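
To make the LDAP suggestion above a little more concrete, here is a minimal sketch of a bookkeeping query from Java using the standard JNDI directory API. The server URL, base DN, filter and attribute names are invented for illustration; the real schema is exactly what the proposed investigation would have to define.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class BookkeepingQuery {
    public static void main(String[] args) throws Exception {
        // Connection parameters; the server name and port are placeholders.
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://lhcb-bookkeeping.cern.ch:389");

        DirContext ctx = new InitialDirContext(env);
        try {
            // Look up all datasets belonging to one Monte Carlo production run.
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            controls.setReturningAttributes(new String[] {"datasetName", "eventCount", "tapeLabel"});

            NamingEnumeration<SearchResult> results = ctx.search(
                    "ou=datasets,o=lhcb",        // placeholder base DN
                    "(productionRun=000123)",    // placeholder filter
                    controls);

            while (results.hasMore()) {
                Attributes attrs = results.next().getAttributes();
                System.out.println(attrs.get("datasetName") + "  "
                        + attrs.get("eventCount") + "  "
                        + attrs.get("tapeLabel"));
            }
        } finally {
            ctx.close();
        }
    }
}

The same API supports adding and modifying entries, which is what the MC production update step would use in place of the current Oracle interface.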
Extension to other institutes
It seems that the architecture of the LHCb GRID will become:
1. Intel PCs running Linux.
2. LSF (or Condor) for managing job scheduling.
3. Globus software for managing the GRID.
4. LDAP for our bookkeeping database.
5. Java tools for connecting our own production and bookkeeping requirements to the GRID.
Clearly the precise form of this architecture will depend on the results of the tests (and discussions) that will take place later this year. Provided that this architecture is in place, it should be easy for other institutes to join the GRID. The objective of the GRID working group next year should be to encourage the maximum participation of outside institutes and to help them with the installation of the software that is part of the architecture.