LHCb Development
Glenn Patrick
Rutherford Appleton Laboratory
GRIDPP9, 4th February 2004
LHCb - Reminder
[Figure: side view of the LHCb detector (~20 m long), showing the VELO, RICH1, magnet, tracking stations (inner and outer), RICH2, calorimeters and muon system; inset: B and anti-B mesons (b/d quark content).]

1.2M electronic channels.
Weight ~4,000 tonnes.
LHCb GridPP Development
LHCb development has been taking place on three fronts:

- MC Production Control and Monitoring: Gennady Kuznetsov (RAL)
- Data Management: Carmine Cioffi (Oxford), Karl Harrison (Cambridge)
- GANGA: Alexander Soroko (Oxford), Karl Harrison (Cambridge)
All developed in tandem with the LHCb Data Challenges.
Data Challenge DC03
“Physics” Data Challenge, used to redesign and optimise the detector:

- 65M events processed.
- Distributed over 19 different centres.
- Averaged 830,000 events/day, equivalent to 2,300 × 1.5 GHz computers.
- 34% processed in the UK at 7 different institutes.
- All data written to CERN.

[Figure: reoptimised detector layout, labelling VELO, TT, RICH1 and RICH2.]
The LHCb Detector
Changes were made for material reduction and L1 trigger improvement:

- Reduced number of layers for M1 (4 → 2).
- Reduced number of tracking stations behind the magnet (4 → 3).
- No tracking chambers in the magnet.
- No B field shielding plate.
- Full Si station.
- Reoptimized RICH-1 design.
- Reduced number of VELO stations (25 → 21).

“Detector” TDRs completed; only the Computing TDR remains.
Data Challenge 2004
“Computing” Data Challenge, April – June 2004:

- Produce 10× more events.
- At least 50% to be done via LCG.
- Store data at the nearest Tier-1 (i.e. RAL for UK institutes).
- Try out distributed analysis.
- Test the computing model and write the Computing TDR.
- Requires a stable LCG2 release with SRM interfaced to the RAL DataStore.
DC04: UK Tier-2 Centres
NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD
ScotGrid: Durham, Edinburgh, Glasgow
LondonGrid: Brunel, Imperial, QMUL, RHUL, UCL
DIRAC Architecture
[Diagram: DIRAC architecture. A User Interface and API sit above services for Authentication, Authorisation, Auditing, Accounting, Job Provenance, Information Service, Grid Monitoring, Metadata Catalogue, File Catalogue, Workload Management, Data Management and Package Manager. Storage Elements and Computing Elements provide the resources (LCG, LHCb production sites). DIRAC components are shown alongside other project components: AliEn, LCG, ...]
MC Control Status
Gennady Kuznetsov
DIRAC: Distributed Infrastructure with Remote Agent Control.

Control toolkit breaking the production workflow down into components – modules and steps.
To be deployed in DC04.

SUCCESS!
DIRAC v1.0
Original scheme: central Production, Monitoring and Bookkeeping services; an Agent at each site (A, B, C, D) gets jobs from the Production service and returns monitoring info and bookkeeping data.

“Pull” rather than “Push”.
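As a rough illustration of the pull model, a site agent can be pictured as a polling loop. The service URL and method names below are invented; only the pattern (agents fetch jobs when they have free capacity and report back) comes from the DIRAC design.

```python
# Minimal sketch of a pull-mode site agent (illustrative only; not
# the real DIRAC API).
import time
import xmlrpclib  # Python 2-era XML-RPC client, as used around 2004

service = xmlrpclib.ServerProxy("http://lhcb-prod.example.cern.ch/rpc")

def execute_locally(job):
    """Placeholder: hand the job script to the local batch system."""
    return "Done"

def run_agent(site):
    while True:
        job = service.requestJob(site)          # pull, not push
        if job:
            status = execute_locally(job)
            service.reportStatus(job["id"], status)
        else:
            time.sleep(60)                      # queue empty; poll again later
```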
Components – MC Control
The Module is the basic component of the architecture.

[Diagram: Modules are assembled into Steps, Steps into a Workflow, and an instantiated Workflow yields a Production of many Jobs.]

This structure allows the Production Manager to construct any algorithm as a combination of modules.

Levels of usage:
1. Module – Programmer
2. Step – Production Manager
3. Workflow – User/Production Manager

Each step generates its job as a Python program (schematic sketch below).
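A schematic sketch of the hierarchy in Python. The class names and attributes are illustrative, not the actual DIRAC toolkit API:

```python
# Illustrative sketch of the Module / Step / Workflow hierarchy.
# Names are invented for clarity and do not reproduce the real toolkit.

class Module(object):
    """Basic component: a named block of Python code plus variables."""
    def __init__(self, name, code, variables=None):
        self.name, self.code, self.variables = name, code, variables or {}

class Step(object):
    """Ordered module instances; serialised to XML with the module
    definitions embedded."""
    def __init__(self, name, modules):
        self.name, self.modules = name, modules

class Workflow(object):
    """Ordered steps; instantiating it emits each generated job as a
    standalone Python program."""
    def __init__(self, name, steps):
        self.name, self.steps = name, steps

    def instantiate(self, parameters):
        # In the real toolkit this writes out a Python job script;
        # here we just return a token per generated job.
        return ["job(%s)" % (parameters,)]
```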
Gennady Kuznetsov
Module Editor

[Screenshot: Module Editor GUI – module name, description, module variables, and the Python code of a single module (which can be many classes). Stored as an XML file.]
Gennady Kuznetsov
Step Editor

[Screenshot: Step Editor GUI – step name, description, definitions and instances of modules, the currently selected instance with its variables, and step variables. Stored as an XML file in which all the modules are embedded.]
Gennady Kuznetsov
Workflow Editor

[Screenshot: Workflow Editor GUI – workflow name, description, step definitions, step instances, the currently selected step instance with its variables, and workflow variables. Stored as an XML file.]
Gennady Kuznetsov
Job Splitting
The input value for job splitting is a Python list object.

Every (top-level) element of this list is applied to the Workflow Definition, propagates through the code, and generates a single element of the production (one or several jobs), as sketched below.

[Diagram: Python list → Workflow Definition (steps) → Production (jobs).]
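In outline (with illustrative names, reusing the Workflow sketch above), the splitting amounts to:

```python
# Sketch of list-driven job splitting: each top-level element of the
# input list instantiates the workflow once and contributes one
# production element (one or several jobs).  Names are illustrative.

def split_production(workflow, input_list):
    jobs = []
    for element in input_list:
        jobs.extend(workflow.instantiate(element))
    return jobs

# e.g. one production element per run number:
# jobs = split_production(dc04_workflow, [1001, 1002, 1003])
```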
Gennady Kuznetsov
Future: Production Console
Once an agent has received a workflow, the Production Manager has no control over any function in a remote centre: the Local Manager must perform all configuration and interventions at the individual site.

Develop a “Production Console” which will provide extensive control and monitoring functions for the Production Manager:

- Monitor and configure remote agents.
- Data replication control.

This is an intrusive system – need to address Grid security mechanisms and provide a robust environment.
DIRAC v1.0 Architecture
[Diagram: DIRAC v1.0 production architecture. Production preparation: an Application packager creates the application tar file; Workflow and Production editors let the Production Manager edit and instantiate workflows. Central services at CERN: Production service (Production DB), Monitoring service (Monitoring DB), Bookkeeping service (Bookkeeping DB), with Castor MSS as central storage. Production resources: Agents at sites A, B, ..., n exchange job requests, job XML, job status, metadata XML and dataset replica records with the central services.]
DIRAC v2.0 WMS Architecture
Based on a central queue service.

[Diagram: the Production service and user interfaces (GANGA, command-line UI) feed a Job Receiver in the DIRAC Workload Management system; Optimizers order jobs into the central Job queue (backed by a Job DB); a Match Maker serves jobs to Agents running on the computing resources (Agent 1: LCG CE, Agent 2: LCG WMS, Agent 3: DIRAC CE). Data can also be stored remotely.]
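The match-making step can be pictured as follows. The data structures and function names are invented for illustration; only the central-queue/pull pattern is from the slides.

```python
# Sketch of central-queue matchmaking: agents ask the Match Maker
# for work that fits their resources.  Illustrative names only.

job_queue = [
    {"id": 1, "requirements": {"ce": "LCG"}},
    {"id": 2, "requirements": {"ce": "DIRAC"}},
]

def match_job(agent_capabilities):
    """Return the first queued job whose requirements the agent meets."""
    for job in job_queue:
        if all(agent_capabilities.get(k) == v
               for k, v in job["requirements"].items()):
            job_queue.remove(job)
            return job
    return None

# An agent at an LCG computing element would call:
# job = match_job({"ce": "LCG"})
```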
Data Management Status
Carmine Cioffi
File catalogue browser for POOL.

Integration of the POOL persistency framework into GAUDI → new EventSelector interface.

SUCCESS!
Main Panel, LFN Mode Browsing
The POOL file catalogue provides the LFN–PFN association; the browser lets the user interact with the catalogue via a GUI (a toy illustration of the association follows).

[Screenshot: main panel in LFN mode – tabs for LFN/PFN mode selection; a list of LFNs with, on the right sub-panel, the PFNs associated to the LFN selected on the left; filter and search text bars; controls to reload the catalog, import a catalog fragment, read the next/previous bunch of files, list the selected files, list all metadata values, show (and change) the metadata schema, and select write mode.]

The list of selected LFNs can be saved for the job sandbox.
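A toy stand-in for that association (not the POOL API):

```python
# The LFN/PFN association in miniature: one logical file name maps to
# many physical replicas.  Paths and hostnames are invented.
catalogue = {
    "LFN:dc03/B2pipi/evts_0001.dst": [
        "PFN:castor://cern.ch/lhcb/dc03/evts_0001.dst",
        "PFN:rfio://ral.ac.uk/lhcb/dc03/evts_0001.dst",
    ],
}

def replicas(lfn):
    """Return the PFNs registered for a given LFN."""
    return catalogue.get(lfn, [])
```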
Main Panel, PFN Mode Browsing
In PFN mode, the files are browsed much as in Windows Explorer: the folders are shown on the left sub-panel and their contents on the right sub-panel.

[Screenshot: PFN-mode panel with a sub-menu of three operations on the selected file. The write-mode button opens the WrFCBrowser frame, allowing the user to write to the catalogue.]
Write Mode Panel
[Screenshot: write-mode panel – add an LFN, remove an LFN, register a PFN, add a PFN replica, delete a PFN, add a metadata value, commit or rollback, and a log showing the actions performed.]
[Screenshots: the PFN register frame; a frame to show and change the metadata schema of the catalog; a frame for setting metadata values.]
[Screenshots: frames showing the metadata values and attribute values of the PFN "Myfile", and the list of selected files.]
GAUDI/POOL Integration
Benefit from the investment in LCG: retire parts of Gaudi → reduced maintenance.

Designed and implemented a new interface for the LHCb EventSelector. Selection criteria:

- One or more “datasets” (e.g. a list of runs, a list of files matching given criteria).
- One or more “EventTagCollections”, with extra selection based on tag values.
- One or more physical files.

The result of an event selection is a virtual list of event pointers (sketched below).
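Purely as a conceptual sketch (the names below do not reproduce the actual EventSelector interface), the three criteria can be pictured as feeding one virtual list:

```python
# Conceptual sketch only: the three selection criteria resolve to a
# single lazy ("virtual") list of event pointers.  All names invented.

def select_events(datasets=(), tag_collections=(), files=(), tag_cut=None):
    """Yield event pointers from datasets, tag collections and files."""
    for source in list(datasets) + list(files):
        for pointer in source.event_pointers():
            yield pointer
    for collection in tag_collections:
        for tags, pointer in collection.entries():
            if tag_cut is None or tag_cut(tags):  # extra tag-based cut
                yield pointer
```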
Physicist’s View of Event Data
[Diagram: a Dataset groups event files (Event 1, Event 2, ... Event N) and event tag collections (Tag 1 ... Tag M, with example tag values such as 0.3, 1.2, 8, 3.1); a Collection Set holds named selections such as “B → ππ Candidates (Phys)” and “B → J/Ψ (μ+ μ-) Candidates”; raw files (RAW2-1/1/2008, RAW3-22/9/2007, RAW4-2/2/2008, ...) are tracked through Gaudi and the Bookkeeping.]
Future: Data to Metadata
The file catalogue holds only a minimal amount of metadata. LHCb deploys a separate “bookkeeping” database service to store the metadata for datasets and event collections; this corresponds to the ARDA Job Provenance DB and Metadata Catalogue.

It is based on a central ORACLE server at CERN with a query service through an XML-RPC interface. This is not scalable, particularly for the Grid, and a completely new metadata solution is required. An ARDA-based system will be investigated.

It is vital that this development is optimised for LHCb and synchronised with the data challenges.
Metadata: Data Production
[Diagram: information flow in data production. The Production Manager builds a new configuration and selects defaults; the Configuration generates Job.xml for the production jobs; when production is done, the jobs feed the Bookkeeping and the File Catalogue.]
Metadata: Data Analysis
[Diagram: information flow in data analysis. The user picks up the default configuration, modifies defaults, and selects input data via the Bookkeeping; the Configuration generates Job.opts for the user job, which DIRAC runs against the File Catalogue.]
LHCb GANGA Status
Alexander Soroko, Karl Harrison (+ Alvin Tan, Janusz Martyniak) – LHCb, ATLAS, BaBar

User Grid Interface.
First prototype released in April 2003.
To be deployed for the LHCb 2004 Data Challenge.

SUCCESS!
GANGA for LHCb
GANGA will allow the LHCb user to perform standard analysis tasks (a toy session sketch follows the list):

- Data queries.
- Configuration of jobs, defining the job splitting/merging strategy.
- Submitting jobs to the chosen Grid resources.
- Following the progress of jobs.
- Retrieval of job output.
- Job bookkeeping.
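A toy sketch of the kind of Python session this implies. All names are stand-ins, not the real GANGA API:

```python
# Toy sketch of a GANGA-style session; everything here is a stand-in.

class Job(object):
    """Stand-in for a Ganga job object."""
    def __init__(self, application):
        self.application = application
        self.inputdata = None      # filled by a data query
        self.splitter = None       # splitting/merging strategy
        self.backend = None        # chosen Grid/batch resource
        self.status = "new"

    def submit(self):
        self.status = "submitted"  # would hand the job to the Grid

j = Job(application="DaVinci")                 # configure an analysis job
j.inputdata = ["LFN:/lhcb/dc04/file1.dst"]     # result of a data query
j.splitter = "one-subjob-per-file"
j.backend = "LCG"
j.submit()                                     # then follow j.status,
                                               # retrieve output, bookkeep
```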
GANGA User Interface
[Diagram: GANGA user interface. On the local client, a Job Factory (Job Registry class) builds Ganga Job objects from the Job Options Editor, data selection (input/output files) backed by a database of standard job options, strategy selection backed by a strategy database (splitting scripts), and job requirements (LSF resources, etc.). Jobs are submitted to the Grid/batch system: a JDL file and job script go via the gatekeeper to the worker nodes; GANGA gets monitoring info and retrieves the job output, with file transfer to a Storage Element.]
Software Bus
The user has access to the functionality of Ganga components through a GUI and CLI, layered one over the other above a Software Bus; the Software Bus is itself a Ganga component implemented in Python (a toy sketch follows the list).

Components used by Ganga fall into 3 categories:

- Core components of general applicability: Job Definition, Job Registry, Job Handling, File Transfer.
- Components providing specialised functionality: Gaudi/Athena Job Definition, Gaudi/Athena Job Options Editor, BaBar Job Definition and Splitting.
- External components: Python Native, Gaudi Python, Python Root, PyCMT, PyMagda, PyAMI.
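A toy sketch of the software-bus idea (entirely illustrative, not the GANGA implementation):

```python
# Components register on a Python "bus"; GUI and CLI layers look them
# up by name rather than importing each other directly.

class SoftwareBus(object):
    def __init__(self):
        self._components = {}

    def register(self, name, component):
        self._components[name] = component

    def get(self, name):
        return self._components[name]

bus = SoftwareBus()
bus.register("job_registry", {})          # core component
bus.register("gaudi_job_definition", {})  # specialised component
```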
GUIs Galore
[Screenshots: a gallery of GANGA GUI windows.]
DIRAC WMS Architecture
[Diagram: repeat of the DIRAC v2.0 WMS architecture shown earlier, highlighting GANGA as a front-end submitting into the central job queue served by the Match Maker to Agents at LCG CE, LCG WMS and DIRAC CE resources.]
Future Plans
Motivation:

- Refactorisation of Ganga, with submission on a remote client.
- Ease integration of external components.
- Facilitate multi-person, distributed development.
- Increase customizability/flexibility.
- Permit GANGA components to be used externally more simply.

[Diagram: refactored architecture. A Software/Component Server feeds software and component caches on the remote client. On the local client, a Job Factory (machinery for generating XML descriptions of multiple jobs) draws on a Job-Options Editor (with job-options template and knowledge base, backed by a database of standard job options), dataset selection (dataset catalogue), strategy selection (strategy database of splitter algorithms), and user/derived job requirements (database of job requirements). Job collections (XML descriptions) pass through a Scheduler Proxy and Dispatcher (JDL, ClassAds) to a Scheduler Service and Remote-Client Scheduler, which runs/validates jobs on the execution node via Grid/batch-system agents (NorduGrid, LSF, local PBS, DIAL, EDG, DIRAC, USG, other).]

2nd GANGA prototype ~ April 2004.
Future: GANGA
Develop into a generic front-end capable of submitting a range of applications to the Grid. This requires a central core and modular structure (started with the version 2 re-factorisation) to allow new frameworks to be plugged in.

Enable GANGA to be used in a complex analysis environment over many years and by many users: hierarchical structure, import/export facility, schema evolution, etc.

Interact with multiple Grids (e.g. LCG, NorduGrid, EGEE, ...). GANGA needs to keep pace with the development of Grid services and synchronise with ARDA developments.

Interactive analysis? ROOT, PROOF.