
Summary of the Analysis Systems

13th June 2006 David Colling

Imperial College London


Outline

Slightly unusual to be asked to summarise a session that everyone has just sat through, so:

• I will try to summarise the important points of each model

- This will be a personal view

- I “manage” a distributed Tier 2 in the UK that currently supports all LHC experiments

- I am involved in CMS computing/analysis

• Then there will be further opportunity to question the experiment experts about the implementation of the models on the Tier 2s.


Comparing the Models

• Firstly, only three of the four LHC experiments plan to do any analysis at the Tier 2s

• However, conceptually, those three have very similar models.

• They have the majority (if not all) end user analysis being performed at the Tier 2s. This gives the Tier 2 a crucial role in extracting the physics of the LHC.

• Analysis shares the Tier 2s with Monte Carlo production


Comparing the Models

• The experiments want to be able to control the fraction of the Tier 2 resources that are used for different purposes

(analysis v production, analysis A v analysis B)

• They all realise that data movement, together with knowledge of the content and location of those data, is vitally important.

- They all separate the data content and data location databases

- All have jobs going to the data (see the sketch after this list)

• The experiments all realise that there is a need to shield the user from the complexity of the WLCG.
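A minimal sketch of the shared pattern above, with invented names rather than any experiment's real API: a dataset content catalogue, a separate dataset location catalogue, and analysis work split so that jobs go to sites that already hold the data.

```python
# Minimal sketch (invented names, not any experiment's real API) of the shared
# pattern: a dataset *content* catalogue, a separate dataset *location*
# catalogue, and jobs split so that they run where the data already are.

from dataclasses import dataclass

# Hypothetical in-memory catalogues, for illustration only.
CONTENT_CATALOGUE = {
    # dataset name -> logical file names (LFNs)
    "Zmumu/RECO": ["/store/zmumu/reco_001.root", "/store/zmumu/reco_002.root"],
}
LOCATION_CATALOGUE = {
    # dataset name -> Tier 2 sites hosting a replica
    "Zmumu/RECO": ["T2_London_IC", "T2_SouthGrid_RALPPD"],
}


@dataclass
class AnalysisJob:
    dataset: str
    files: list[str]
    site: str  # the job is brokered to a site that already holds the data


def jobs_for_dataset(dataset: str) -> list[AnalysisJob]:
    """Split a dataset into per-site jobs: 'jobs go to the data'."""
    files = CONTENT_CATALOGUE[dataset]
    sites = LOCATION_CATALOGUE[dataset]
    # Trivial brokering: round-robin the files over the sites holding replicas.
    return [AnalysisJob(dataset, [f], sites[i % len(sites)]) for i, f in enumerate(files)]


if __name__ == "__main__":
    for job in jobs_for_dataset("Zmumu/RECO"):
        print(f"send to {job.site}: {job.files}")
```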


So where do they differ?

• Implementation, where they differ widely on:

-What services need to be installed/maintained at each Tier 2.

- What additional software they need above the “standard” grid installations.

- Details of the job submission system (e.g. pilot jobs or not, very different UIs etc; see the pilot sketch after this list)

• How they handle different Grids

• Maturity:

- CMS has a system capable of running >100K jobs/month, whereas Atlas only has a few hundred GB of appropriate data
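The "pilot jobs or not" distinction above can be pictured with a toy pilot that lands on a worker node and then pulls its real payload from a central task queue, rather than being pushed fully specified through the grid broker. The queue and payloads below are invented for the example.

```python
# Toy illustration of the "pilot job" pattern: a generic job lands on a worker
# node first and then pulls its real payload from a central task queue.
# The queue contents and node names are invented for the example.

import queue

# Stands in for a central, experiment-wide task queue.
task_queue: "queue.Queue[str]" = queue.Queue()
for payload in ["analysisA --file 1", "analysisA --file 2", "production --lumi 7"]:
    task_queue.put(payload)


def pilot(worker_node: str) -> None:
    """Pull and 'run' payloads until the central queue is empty."""
    while True:
        try:
            payload = task_queue.get_nowait()
        except queue.Empty:
            return
        print(f"[{worker_node}] running payload: {payload}")


if __name__ == "__main__":
    pilot("t2-wn-042")
```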


Implementations

Let's start with Atlas…

• Different implementations on different Grids.

Looking at the EGEE Atlas implementation.

• No services required at the Tier 2, only software installed by the SGM.

• All services (file catalogue, data moving services of Don Quixote etc.) at the local T1.

• As a Tier 2 “manager” this makes me very happy, as it minimises the support load at the Tier 2 and means that it is left to experts at the Tier 1. It means that all sites within the London Tier 2 will be available for Atlas analysis.


Accessing data for analysis on the Atlas EGEE installation

[Diagram: at the Tier 2, a CE and an SE with rfio, dcap, gridftp and nfs access; a dataset catalogue at the Tier 0; the VOBOX, LRC and FTS at the Tier 1; data access via http and the lrc protocol]

Atlas Implementations

Prioritisation mechanism will come from the EGEE Priorities Working Group

[Diagram: the Tier 2 CEs serving production, long, short and software queues, with shares of 70%, 20%, 9% and 1%]
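A sketch of what a share-based mechanism of this kind might do: given target fractions per queue and the work already run, schedule the most under-served queue next. The queue-to-percentage mapping is my reading of the diagram, and this is only an illustration of the idea, not the actual EGEE Priorities Working Group mechanism.

```python
# Illustrative share-based scheduling: pick the queue whose used fraction lags
# its target share the most. Shares taken from the diagram; the mapping of
# numbers to queues is an assumption for this example.

TARGET_SHARE = {"production": 0.70, "long": 0.20, "short": 0.09, "software": 0.01}


def next_queue(used_hours: dict[str, float]) -> str:
    """Return the queue whose used fraction lags its target share the most."""
    total = sum(used_hours.values()) or 1.0
    deficit = {q: TARGET_SHARE[q] - used_hours.get(q, 0.0) / total for q in TARGET_SHARE}
    return max(deficit, key=deficit.get)


if __name__ == "__main__":
    # Production has had about 62% of the time so far, so it is scheduled next.
    print(next_queue({"production": 500.0, "long": 300.0, "short": 10.0, "software": 0.0}))
```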


Atlas Implementations and maturity

US: using the Panda system:

• Much more work at the Tier 2

• However, US Tier 2s seem to be better endowed with support effort, so this may not be a problem.

NorduGrid

• Implementation still ongoing

Maturity

• Only a few hundred GB of appropriate data

• Experience of SC4 will be important.


CMS Implementation

• Requires installation of some services at Tier 2s: PhEDEx & trivial file catalogue (sketched after this list)

• However, it is possible to run the instances for different sites within a distributed T2 at a single site.

• So as a distributed Tier 2 “manager” I am not too unhappy … for example, in the UK I can see all sites in the London Tier 2 and in SouthGrid running CMS analysis, but this is less likely in NorthGrid and ScotGrid
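What a "trivial file catalogue" amounts to in practice is rule-based mapping of logical file names (LFNs) to site-local physical file names (PFNs), with no database behind it. The rule below is an invented example, not a real site's configuration.

```python
# Sketch of a trivial file catalogue: site-local rules turn a logical file
# name (LFN) into a physical file name (PFN). The rule here is invented.

import re

# (regex on the LFN, replacement giving the site-local PFN)
TFC_RULES = [
    (r"^/store/(.*)", r"rfio:///castor/example.ac.uk/cms/store/\1"),
]


def lfn_to_pfn(lfn: str) -> str:
    """Apply the first matching rule to map an LFN to this site's PFN."""
    for pattern, replacement in TFC_RULES:
        if re.match(pattern, lfn):
            return re.sub(pattern, replacement, lfn)
    raise LookupError(f"no TFC rule matches {lfn}")


if __name__ == "__main__":
    print(lfn_to_pfn("/store/zmumu/reco_001.root"))
```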


CMS Implementation across Grids

• Installation as similar as possible across EGEE and OSG.

• Same UI for both … called CRAB

• Can use CRAB to submit to OSG sites via an EGEE WMS or directly via CondorG, as sketched below
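A hedged sketch of that routing choice; the function below is illustrative only and is not CRAB's real interface.

```python
# Illustrative routing decision: the same UI can reach an OSG site either
# through an EGEE WMS or directly via CondorG. Not CRAB's actual API.

def choose_route(site_grid: str, direct_condor_g: bool) -> str:
    """Pick a submission route for a site on a given grid flavour."""
    if site_grid == "EGEE":
        return "EGEE WMS"
    if site_grid == "OSG":
        return "CondorG (direct)" if direct_condor_g else "EGEE WMS"
    raise ValueError(f"unknown grid flavour: {site_grid}")


if __name__ == "__main__":
    print(choose_route("OSG", direct_condor_g=True))   # CondorG (direct)
    print(choose_route("OSG", direct_condor_g=False))  # EGEE WMS
```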


CMS Maturity

• PhEDEx has proved to be very reliable since DC04

• CRAB in use since end of 2004

• Hundred thousand jobs a month

• Tens of sites both for execution and submission

• Note that there are still failures


Alice Implementation

• Only really running on EGEE and Alice specific sites

• Puts many requirements on a site: xrootd, a VO Box running the AliEn SE and CE, the package management server, MonALISA server, LCG UI and AliEn file transfer.

• All jobs are submitted via AliEn tools

• All data is accessed only via AliEn


Tier-2 Infrastructure/Setup Example

VO-Box services: SE, CE, FTD, MonALISA, PackMan, LCG-UI

Port #   Access (outgoing +)    Service
=============================================
8082     incoming from World    SE (Storage Element)
8084     incoming from CERN     CM (ClusterMonitor)
8083     incoming from World    FTD (FileTransferDaemon)
9991     incoming from CERN     PackMan

Xrootd redirector (can run on the VO Box)

Port #   Access (outgoing +)    Service
=============================================
1094     incoming from World    xrootd file transfer

Site services: LCG CE, LCG FTS/SRM-SE/LFC, xrootd storage (Disk Server 1, Disk Server 2, Disk Server 3)

Worker node configuration/requirements are equal for batch processing at Tier 0/1/2 centres (2 GB RAM/CPU, 4 GB local scratch space)
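A small connectivity check based on the port table above, of the kind a site might run when setting this up. The hostnames are placeholders; only the port/service pairs come from the slide.

```python
# Check that the VO Box and xrootd redirector ports listed on the slide are
# reachable. Hostnames are placeholders, not real machines.

import socket

SERVICES = [
    ("vobox.example-t2.ac.uk", 8082, "SE (Storage Element)"),
    ("vobox.example-t2.ac.uk", 8083, "FTD (FileTransferDaemon)"),
    ("vobox.example-t2.ac.uk", 8084, "CM (ClusterMonitor)"),
    ("vobox.example-t2.ac.uk", 9991, "PackMan"),
    ("xrootd.example-t2.ac.uk", 1094, "xrootd file transfer"),
]


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host, port, service in SERVICES:
        status = "open" if port_open(host, port) else "unreachable"
        print(f"{host}:{port:<5} {service:<26} {status}")
```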


Alice Implementation

• All data access is via xrootd … allows innovative access to data. However, it is a requirement on the site.

• May be able to use xrootd front ends to standard SRM

• Batch analysis implicitly allows prioritisation through a central job queue

• However, this does involve using glexec-like functionality


Alice Implementation – Batch analysis

[Diagram: AliEn batch analysis. A JDL is submitted to the central Task Queue via the API services; optimisation is applied (splitting, requirements, FTS replication, policies); the JDL is matched either to an AliEn CE at a Tier-2 or, via an LCG UI and RB, to an LCG CE and its batch system; job agents run ROOT at the Tier-2, reading data through xrootd from the Tier-2 SE; central services include the AliEn FC]
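The "JDL match" step in the diagram can be pictured as matching job requirements against what each Tier-2 advertises. The toy matcher below is purely illustrative: it is not AliEn's real matchmaking, and the attribute names and sites are invented.

```python
# Toy picture of JDL matchmaking: job requirements versus advertised Tier-2
# resources. Not AliEn's real mechanism; names are invented for illustration.

JOB_JDL = {
    "Requirements": {"software": "AliRoot", "dataset_at_close_se": "PbPb/ESD"},
}

SITES = {
    "T2_A": {"software": ["AliRoot", "ROOT"], "close_se_datasets": ["PbPb/ESD"]},
    "T2_B": {"software": ["ROOT"], "close_se_datasets": []},
}


def matching_sites(jdl: dict) -> list[str]:
    """Return the Tier-2 sites whose advertised resources satisfy the JDL requirements."""
    req = jdl["Requirements"]
    return [
        name
        for name, res in SITES.items()
        if req["software"] in res["software"]
        and req["dataset_at_close_se"] in res["close_se_datasets"]
    ]


if __name__ == "__main__":
    print(matching_sites(JOB_JDL))  # ['T2_A']
```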

Alice Implementation

• As a distributed Tier 2 “manager” this setup does not fill me with joy.

• I cannot imagine installing such VO boxes within the London Tier 2, and would be surprised if any UK Tier 2 sites (with the exception of Birmingham) install such boxes.


Alice Implementation

Interactive Analysis

• More important to Alice than to the others.

• Novel and interesting approach based on PROOF and xrootd

Maturity

• Currently, only a handful of people trying to perform Grid based analysis

• Not a core part of SC4 activity for Alice.


Conclusions

• Three of the four experiments plan to use Tier 2 sites for end user analysis.

• These three experiments have conceptually similar models (at least for batch)

• The implementations of these similar models have very different implications for the Tier 2s supporting the VOs


Discussion…
