LCG & EGEE Status &
Overview
GridPP9
February 4th 2004
Tony.Cass@CERN.ch
Agenda
 LCG
– LHCC Review
– Area Status
» Applications
» Fabric
» Grid Deployment
» Grid Technology
– LCG & GridPP2
 EGEE @ CERN
LCG – LHCC Review
 LHCC comprehensive review of the project, 24th/25th November.
– See http://agenda.cern.ch/fullAgenda.php?ida=a035729
 Preceded by:
– Application Area review, as part of overall experiment software planning, 4th September.
» See http://agenda.cern.ch/fullAgenda.php?ida=a032308
– LCG internal review of the Grid & Fabric areas, 17th-19th November.
» See http://agenda.cern.ch/fullAgenda.php?ida=a035728
LCG — Review Conclusions
 The LHCC noted significant progress: “It is realistic to expect the LCG project to have prototyped, built and commissioned the initial LHC computing environment”.
 The LHCC noted the good progress in the Applications Area. No problems for Fabrics!
 Usual worries about Grid deployment, middleware development and middleware directions (ARDA), but the review committee considered that the project is steering an appropriate course.
 GridPP funded manpower is a substantial factor behind the progress noted by the LHCC Reviewers!
LCG — Applications Area
 From the Applications Area report for Q4 2003:
– “The applications area in this quarter continued to move
through a period in which rapid-paced development and
feedback-driven debugging are giving way to
consolidation of the software in a number of areas and
increased direct support for experiment integration.”
 Internal Applications Area review in October, prior to the LHCC review.
– Review report reflected in AA plans for 2004.
– In particular, recommendation for closer relationship
with ROOT team being followed up in area of dictionary &
maths libraries.
» SEAL and ROOT teams developing proposed workplans for
consideration in Q1 this year.
LCG — Applications Area
 POOL
– passed major integration and deployment milestone with production
use by CMS: millions of events written per week;
– no longer on CMS critical path to Data Challenge readiness, a major
success for the POOL team and CMS.
 Simulation project
– completed important milestones (initial cycle of EM physics validation), drew close to completing others (revisiting of physics requirements, hadronic physics validation), and made an important clarification of the objectives and program of the generic simulation framework subproject.
– Maybe not directly Grid related, but LHCC review “underlined the
importance of supporting the Monte Carlo generator codes for the
experiments.”
 Other items
– SEAL and POOL now available for Windows; initial PI program
essentially complete; ARDA RTAG report.
LCG — Fabric Area
 Extremely Large Farm management system (ELFms) expanding its control of the CERN fabric
– quattor management of CPU servers being extended to disk & tape
servers (including CASTOR configuration). Disk configuration stored
in CDB using HMS.
– Lemon: EDG/WP4 repository in production since Sept
– LEAF: new State Management System being used to move systems
into/out of production and drive, e.g., kernel upgrades.
– All computer centre machines registered in quattor/CDB.
– Use of ELFms tools, particularly quattor, for management of
experiment farms is under discussion (and test).
 CERN Computer Centre upgrade continues.
– Substation civil engineering almost complete; electrical equipment
starts to arrive in March.
– RHS of machine room upgraded: Major equipment migration to free
the LHS is in preparation!
LCG — Fabric Area
 Phase II purchasing process starting now
– See talk at http://agenda.cern.ch/fullAgenda.php?ida=a036781.
– Long lead time before 2006 purchases given CERN rules.
» Install early in 2006. Volumes are large, especially for disk servers.
– Plan to qualify suppliers of “generic PCs”
» “Intel-like architecture” about the only requirement
» Selection criteria for companies are the major consideration at present. Plan careful evaluation of potential bidders in the autumn.
– Expect CPU servers to be commodity equipment as at present.
– Disk server area is major concern.
» Significant problems with EIDE servers in 2003. Reasons not fully
understood (yet!). Procedures and control much improved since
November (end of 2003 data taking).
» Still, these servers are significantly cheaper than alternatives. We need
to be able to deal with hardware failures in this area.
 CMS and ATLAS are watching our plans closely.
– Common suppliers for Tier0/1 and online farms?
LCG — Grid Deployment
 LCG1 service now covers 28 sites. Major productions for ATLAS and CMS during the Christmas holidays.
– CMS: 600K events; sites mainly in Italy & Spain.
– ATLAS: 14K events (although only 75 jobs).
– US/ATLAS sent requests for job execution to LCG-1
from the US Grid3 infrastructure. After some work,
events were successfully generated using LCG-1 sites
CERN, Turin and Brookhaven with the output data staged
at the University of Chicago and registered in the Globus
RLS.
 LCG2 service runs on a smaller number of sites
– Avoid configuration and stability issues
– Require commitment of sufficient support effort and compute resources
LCG2 Core sites and commitments

Site        Immediate    Later
CERN        200          1200
CNAF        200          500
FNAL        10           ?
FZK         100          ?
Nikhef      124          180
PIC         100          300
RAL         70           250
Taipei      60           ?
Russia      30           50
Prague      17           40
Budapest    100          ?
Totals      864(+147)    >2600(+>90)

CERN through Taipei are the initial LCG-2 core sites; Russia, Prague and Budapest are other firm commitments. Will bring in the other 20 LCG-1 sites as quickly as possible.
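As a sanity check on the table above, the “Immediate” column splits into the eight initial core sites and the three other firm commitments. A short Python sketch (figures copied from the table) reproduces the 864(+147) total:

```python
# Immediate CPU commitments from the LCG-2 core sites table.
core_sites = {"CERN": 200, "CNAF": 200, "FNAL": 10, "FZK": 100,
              "Nikhef": 124, "PIC": 100, "RAL": 70, "Taipei": 60}
other_commitments = {"Russia": 30, "Prague": 17, "Budapest": 100}

core_total = sum(core_sites.values())          # 864
other_total = sum(other_commitments.values())  # 147
print(f"Totals: {core_total}(+{other_total})")  # Totals: 864(+147)
```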
LCG2 functionality

General
– CondorG –
» new grid manager (critical, now in official VDT); gahp-server (critical, local, with
Condor team now); scheduler, memory usage (with Condor team)
– Globus
» RM wouldn't work behind the firewall; prevent occasional hangs of CE; a number of errors in the handling of return status from various functions
» Refrained from putting all fixes into 2.2.x knowing that they would be included in
2.4.3.
– RB – new WP1 release fixed a number of LCG-1 problems (reported by LCG)
» above this we fixed (with WP1 team) memory leaks in Interlockd, network server
& filelist problem
– CE – memory leaks
 Installation
– WN installation independent from LCFGng (or other tools)
– LCFGng still required for service nodes
 Still require outbound IP connectivity from WNs
– Work to be done to address this in the Replica Manager
– Add statement to security policy to recognise the need – but limit it – applications must not rely on this
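The outbound-connectivity requirement for worker nodes can be probed with a simple TCP check. This is an illustrative sketch only; the endpoint named in the comment is a placeholder, not an LCG-mandated service:

```python
import socket

def has_outbound_connectivity(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical endpoint): a WN with no outbound IP
# connectivity would get False here.
# has_outbound_connectivity("rls.example.org", 2811)
```

Whether the check should target a well-known service or a site-local gateway is a deployment choice; the point is only that applications should degrade gracefully when it returns False.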
LCG2 Status
 Generally OK, but delayed by problems in the SE area.
 Intention was to use SRM interfaces for Castor
and dCache, but there are still problems…
 Agreed now to continue for the present with
gridftp access to storage.
– dCache will be available as a space manager for sites without one, but not using the SRM interface initially.
 Joint testing with ALICE starts this week.
LCG — Grid Technology
 Key topic has been, of course, the direction of Grid Middleware.
 ARDA started as an RTAG in the applications area
to define the completion of the Physicist Interface
programme (distributed analysis). Much discussion,
though, on the Grid Middleware and impact on the
DA framework.
 ARDA workshop held at CERN, January 21st/22nd, to plan post-RTAG developments.
– See report later this afternoon
» and you’ve just heard from Tony!
– also http://agenda.cern.ch/fullAgenda.php?ida=a036745.
LCG & GridPP2
 Remember: delay of LHC to April 2007 means LCG and GridPP are now out of phase. LCG Phase II starts only in January 2006.
– A work programme and plan both exist; however, there is a shortfall in resources, principally a lack of staff.
– Strong desire to maintain UK contribution (and influence)
and links between GridPP & LCG.
– UK message that clear case must be made is understood.
Discussions with new CERN management are starting.
– £1M would support 5 FTE over the 3 years (cf. 25+ now and 10 in the GridPP2 proposal). Work areas to be agreed.
 Existing GridPP funded staff have ~1 year left before the end of their contracts. There will be a review of post effectiveness similar to that just completed for other GridPP posts.
EGEE @ CERN
 See Neil for the high level politics!
 Implementation Plan
– Initial service will be based on the LCG infrastructure (this will be
the production service, most resources allocated here)
» Cross membership of LCG & EGEE project management boards.
– Also will need a certification test-bed system
» For debugging and problem resolving of the production system
– In parallel must deploy a development service
» Runs the candidate next software release for production
» Treated as a reliable facility (but with less support than the production service)
 EGEE All Activities Meeting, January 13th/14th
– See http://agenda.cern.ch/fullAgenda.php?ida=a036343.
 Two areas managed by CERN
– JRA1/Middleware: Frederic Hemmer
– SA1/Operations: Ian Bird
– Significant effort in the recruitment area over the last 2 months. Four boards held. 19 job offers made to date. CERN support for at least one person prior to the April project start.
Conclusion
 Good progress in all areas :->
 As ever, strongly supported by GridPP funded effort at CERN.