13th June 2006
David Colling, Imperial College London
Slightly unusual to be asked to summarise a session that everyone has just sat through, so:
• I will try to summarise the important points of each model
- This will be a personal view
- I "manage" a distributed Tier 2 in the UK that currently supports all LHC experiments
- I am involved in CMS computing/analysis
• Then there will be a further opportunity to question the experiment experts about the implementation of the models on the Tier 2s.
• Firstly, only three of the four LHC experiments plan to do any analysis at the Tier 2s.
• However, conceptually, those three have very similar models.
• They have the majority (if not all) of end-user analysis being performed at the Tier 2s. This gives the Tier 2s a crucial role in extracting the physics of the LHC.
• The analysis shares the Tier 2s with Monte Carlo production.
• The experiments want to be able to control the fraction of the Tier 2 resources that are used for different purposes (analysis vs production, analysis A vs analysis B).
• They all realise that data movement, together with knowledge of the content and location of those data, is vitally important (sketched below):
- They all separate the data content and data location databases
- All have jobs going to the data
• The experiments all realise that there is a need to separate the user from the complexity of the WLCG.
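A minimal sketch of the "jobs go to the data" pattern with separated content and location catalogues might look like the following; every function and catalogue interface here is a hypothetical stand-in, not any experiment's actual tooling.

```python
# Illustrative only: the catalogue clients and the job object are hypothetical.

def lookup_content(dataset_catalogue, dataset_name):
    """Ask the (content) dataset catalogue which files make up a dataset."""
    return dataset_catalogue.files_in(dataset_name)        # hypothetical API

def lookup_locations(location_catalogue, files):
    """Ask the (separate) location catalogue which sites host those files."""
    sites = set()
    for f in files:
        sites.update(location_catalogue.sites_hosting(f))  # hypothetical API
    return sites

def submit_analysis(job, dataset_catalogue, location_catalogue, dataset_name):
    """Send the job to the data: restrict submission to sites holding replicas."""
    files = lookup_content(dataset_catalogue, dataset_name)
    for site in lookup_locations(location_catalogue, files):
        job.add_candidate_site(site)                       # hypothetical API
    return job.submit()
```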
• Implementation: here they differ widely on:
- What services need to be installed/maintained at each Tier 2
- What additional software they need above the "standard" grid installations
- Details of the job submission system (e.g. pilot jobs or not, very different UIs, etc.)
- How they handle different Grids
• Maturity:
- CMS has a system capable of running >100K jobs/month, whereas Atlas only has a few hundred GB of appropriate data
Let's start with Atlas…
• Different implementations on different Grids. Looking here at the EGEE Atlas implementation.
• No services required at the Tier 2, only software installed by the SGM.
• All services (file catalogue, data-moving services of Don Quixote, etc.) are at the local Tier 1.
• As a Tier 2 "manager" this makes me very happy, as it minimises the support load at the Tier 2 and means that it is left to experts at the Tier 1. It means that all sites within the London Tier 2 will be available for Atlas analysis.
[Figure: Accessing data for analysis on the Atlas EGEE installation. Jobs on the Tier 2 CE read data from the local SE via rfio/dcap/gridftp/nfs; the LRC and FTS sit at the Tier 1 (queried via http / the LRC protocol); the dataset catalogue and a VOBOX sit at the Tier 0.]
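For the analysis job itself, reading a file from the local SE over one of these protocols typically reduces to opening a protocol URL from ROOT. A hedged example: the SE hostname and file path are invented, and it assumes a ROOT build with the relevant dcap/rfio plugins.

```python
# PyROOT sketch: open an event file on the Tier 2 SE via dcap (or rfio).
# Hostname and path are placeholders; real values come from the replica
# catalogue lookup at the Tier 1.
import ROOT

# ROOT dispatches on the URL scheme to the matching plugin (dCache, RFIO, ...),
# provided it was built with that support.
f = ROOT.TFile.Open("dcap://se.example.ac.uk/pnfs/example.ac.uk/data/aod.root")
if f and not f.IsZombie():
    tree = f.Get("CollectionTree")   # tree name is illustrative
    print("entries:", tree.GetEntries())
```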
• The prioritisation mechanism will come from the EGEE Priorities Working Group.
[Figure: target shares of the Tier 2 resources, as read from the slide: production 70%, long analysis queue 20%, short analysis queue 9%, software installation 1%.]
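To make those shares concrete, here is a small worked example; the slot count is invented, the percentages are the ones quoted above.

```python
# Turn the fractional shares from the slide into job-slot targets for a
# hypothetical Tier 2 with 400 slots.
shares = {"production": 0.70, "analysis_long": 0.20,
          "analysis_short": 0.09, "software": 0.01}

total_slots = 400  # hypothetical site capacity

allocation = {queue: round(total_slots * frac) for queue, frac in shares.items()}
print(allocation)
# {'production': 280, 'analysis_long': 80, 'analysis_short': 36, 'software': 4}
```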
US: using the Panda system
• Much more work at the Tier 2.
• However, US Tier 2s seem to be better endowed with support effort, so this may not be a problem.
NorduGrid
• Implementation still ongoing.
Maturity
• Only a few hundred GB of appropriate data.
• Experience of SC4 will be important.
Next, CMS…
• Requires installation of some services at the Tier 2s: PhEDEx and the trivial file catalogue (sketched below).
• However, it is possible to run the instances for different sites within a distributed Tier 2 at a single site.
• So as a distributed Tier 2 "manager" I am not too unhappy… for example, in the UK I can see all sites in the London Tier 2 and in SouthGrid running CMS analysis, but this is less likely in NorthGrid and ScotGrid.
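The "trivial" file catalogue is essentially a set of rewriting rules mapping logical file names onto site-local physical paths, so no catalogue service has to run at the Tier 2. A rough Python re-implementation of that idea (the rule pattern and paths below are invented for illustration, not taken from a real CMS site configuration):

```python
import re

# Ordered (pattern, replacement) rules in the spirit of a trivial file
# catalogue: an LFN becomes a site-local PFN by string rewriting alone.
LFN_TO_PFN_RULES = [
    (r"^/store/(.*)", r"dcap://se.example.ac.uk/pnfs/example.ac.uk/cms/store/\1"),
]

def lfn_to_pfn(lfn):
    """Map a logical file name to a physical file name using the rules."""
    for pattern, replacement in LFN_TO_PFN_RULES:
        if re.match(pattern, lfn):
            return re.sub(pattern, replacement, lfn)
    raise ValueError("no rule matches %s" % lfn)

print(lfn_to_pfn("/store/mc/2006/ttbar/aod_001.root"))
```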
• Installation as similar as possible across EGEE and OSG.
• Same UI for both… called CRAB.
• CRAB can be used to submit to OSG sites via an EGEE WMS or directly via Condor-G.
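The design point is one user-facing tool with interchangeable submission backends. A hedged sketch of that idea follows; the class and function names are hypothetical and are not CRAB's actual internals.

```python
# Hypothetical dispatch layer: the same user-facing submit() call can go to an
# EGEE workload management system or straight to OSG via Condor-G.

class GliteWMSBackend:
    def submit(self, job_description):
        # would call the EGEE WMS submission client here
        print("submitting via EGEE WMS:", job_description["name"])

class CondorGBackend:
    def submit(self, job_description):
        # would hand the job to Condor-G for direct submission to an OSG CE
        print("submitting via Condor-G:", job_description["name"])

BACKENDS = {"glite": GliteWMSBackend, "condor_g": CondorGBackend}

def submit(job_description, scheduler="glite"):
    """Single entry point; the Grid-specific details live in the backend."""
    return BACKENDS[scheduler]().submit(job_description)

submit({"name": "cms_analysis_042"}, scheduler="condor_g")
```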
• PhEDEx has proved to be very reliable since DC04.
• CRAB has been in use since the end of 2004.
• A hundred thousand jobs a month.
• Tens of sites, both for execution and submission.
• Note that there are still failures.
Finally, Alice…
• Only really running on EGEE and Alice-specific sites.
• Puts many requirements on a site: xrootd, a VO Box running the AliEn SE and CE, the package management server, the MonALISA server, an LCG UI and AliEn file transfer.
• All jobs are submitted via AliEn tools.
• All data are accessed only via AliEn.
[Figure: ALICE site layout. The VO Box runs the AliEn services (SE, CE, FTD, MonALISA, PackMan) plus an LCG UI; alongside it sit an xrootd redirector (which can run on the VO Box), the storage disk servers and the standard LCG CE and FTS/SRM-SE/LFC services.]

Ports required (incoming, in addition to outgoing access):

VO Box
  8082  incoming from World  SE (Storage Element)
  8083  incoming from World  FTD (File Transfer Daemon)
  8084  incoming from CERN   CM (ClusterMonitor)
  9991  incoming from CERN   PackMan

Xrootd redirector
  1094  incoming from World  xrootd file transfer

Worker node configuration/requirements are the same for batch processing at Tier 0/1/2 centres: 2 GB RAM per CPU and 4 GB of local scratch space.
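As a quick site-side sanity check of those firewall requirements, one could probe the listed inbound ports, for example with a short script like this; the hostnames are placeholders, the ports are those in the table above.

```python
import socket

# Ports taken from the table above; hostnames are placeholders for the
# site's VO Box and xrootd redirector.
CHECKS = [
    ("vobox.example.ac.uk", 8082),   # SE
    ("vobox.example.ac.uk", 8083),   # FTD
    ("vobox.example.ac.uk", 8084),   # ClusterMonitor
    ("vobox.example.ac.uk", 9991),   # PackMan
    ("xrootd.example.ac.uk", 1094),  # xrootd
]

for host, port in CHECKS:
    try:
        with socket.create_connection((host, port), timeout=5):
            print("open   %s:%d" % (host, port))
    except OSError:
        print("closed %s:%d" % (host, port))
```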
• All data access is via xrootd… this allows innovative access to data; however, it is an additional requirement on the site (see the example below).
• May be able to use xrootd front ends to a standard SRM.
• Batch analysis implicitly allows prioritisation through a central job queue.
• However, this does involve using pilot-job-like functionality.
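Reading data through xrootd from an analysis job looks much like the dcap/rfio case earlier, just with a root:// URL pointing at the redirector. A hedged sketch (redirector hostname, path and tree name are invented):

```python
import ROOT

# The redirector decides which disk server actually serves the file; the job
# only ever sees the single root:// URL. Names below are placeholders.
f = ROOT.TFile.Open("root://xrootd.example.ac.uk//alice/data/run1234/esd.root")
if f and not f.IsZombie():
    tree = f.Get("esdTree")          # tree name is illustrative
    print("entries:", tree.GetEntries())
```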
[Figure: AliEn job submission flow. A user's JDL goes via the API services to the central task queue; an optimiser applies splitting, requirements, FTS replication and policies; the JDL is then matched either to an AliEn CE at a Tier 2 or, via an LCG UI and RB, to an LCG CE and its batch system. Agents on the Tier 2 run ROOT, resolve files through the AliEn file catalogue (central services) and read data from the Tier 2 SE via xrootd.]
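The essential mechanics of the diagram above is a pull model: a central task queue holds the user's split jobs, and generic agents running in the Tier 2 batch slots pull whatever matches their site. A very reduced sketch of that idea, with entirely hypothetical names (this is not AliEn code):

```python
import queue

# Central task queue holding split jobs; each entry records which sites hold
# the input data, mirroring the "JDL match" step in the diagram.
task_queue = queue.Queue()
task_queue.put({"name": "ana_000", "sites": {"Tier2_A"}})
task_queue.put({"name": "ana_001", "sites": {"Tier2_B"}})

def agent(site_name):
    """A generic agent landed on a worker node: pull one matching task."""
    skipped, task = [], None
    while not task_queue.empty():
        candidate = task_queue.get()
        if site_name in candidate["sites"]:
            task = candidate
            break
        skipped.append(candidate)
    for item in skipped:              # put non-matching tasks back
        task_queue.put(item)
    return task

print(agent("Tier2_B"))               # -> {'name': 'ana_001', 'sites': {'Tier2_B'}}
```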
• As a distributed Tier 2 "manager" this set-up does not fill me with joy.
• I cannot imagine installing such VO boxes within the London Tier 2, and I would be surprised if any UK Tier 2 sites (with the exception of Birmingham) install such boxes.
Interactive Analysis
• More important to Alice than to the others.
• Novel and interesting approach based on PROOF and xrootd (see the sketch below).
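A hedged sketch of what such a PROOF-based interactive session might look like from PyROOT; the master hostname, tree name, file URLs and selector are all placeholders, and it assumes a PROOF master sitting in front of the xrootd-served data.

```python
import ROOT

# Connect to a (hypothetical) PROOF master and process a chain of
# xrootd-served files in parallel with a user-supplied TSelector.
proof = ROOT.TProof.Open("proof-master.example.ac.uk")

chain = ROOT.TChain("esdTree")        # tree name is illustrative
chain.Add("root://xrootd.example.ac.uk//alice/data/run1234/esd_001.root")
chain.Add("root://xrootd.example.ac.uk//alice/data/run1234/esd_002.root")
chain.SetProof()                      # route Process() calls through PROOF
chain.Process("MySelector.C+")        # selector compiled and run on the workers
```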
Maturity
• Currently, only a handful of people are trying to perform Grid-based analysis.
• Not a core part of the SC4 activity for Alice.
• Three of the four experiments plan to use Tier 2 sites for end-user analysis.
• These three experiments have conceptually similar models (at least for batch analysis).
• The implementations of these similar models have very different implications for the Tier 2s supporting the VOs.