The Particle Physics Computational Grid
Paul Jeffreys/CCLRC
21 March 2000
System Managers Meeting
Financial Times, 7 March 2000
Front Page FT, 7 March 2000
LHC Computing: Different from Previous Experiment Generations
– Geographical dispersion: of people and resources
– Complexity: the detector and the LHC environment
– Scale: petabytes per year of data
(NB – for purposes of this talk – mostly LHC specific)
~5000 physicists, 250 institutes, ~50 countries
Major challenges associated with:
– Coordinated use of distributed computing resources
– Remote software development and physics analysis
– Communication and collaboration at a distance
R&D: a new form of distributed system: the Data-Grid
The LHC Computing Challenge – by example
• Consider a UK group searching for the Higgs particle in an LHC experiment
– Data flow off the detectors at 40 TB/sec (30 million floppies/sec)!
• A rejection factor of c. 5×10^5 is applied online before writing to media
– But have to be sure we are not throwing away the physics with the background
– Need to simulate samples to exercise the rejection algorithms
• Simulation samples will be created around the world
• Common access required
– After 1 year, a 1 PB sample of experimental events is stored on media
• The initial analysed sample will be at CERN, in due course elsewhere
– UK has particular detector expertise (CMS: e-, e+, γ)
– Apply our expertise: access the 1 PB of experimental data (located?), re-analyse
e.m. signatures (where?) to select c. 1 in 10^4 Higgs candidates, but S/N
will be c. 1 to 20 (continuum background), and store the results (where?)
• Also .. access some simulated samples (located?), generate (where?)
additional samples, store (where?) -- PHYSICS (where?)
• In addition .. strong competition
• Desire to implement the infrastructure in a generic way
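As a back-of-envelope check of these numbers, the short sketch below reproduces the chain from the 40 TB/s detector rate to the ~1 PB/year stored sample and the Higgs candidate counts; the ~1 MB event size and ~10^7 seconds of running per year are assumptions for illustration, not figures from the slide.

```python
# Back-of-envelope check of the slide's numbers (assumptions flagged inline).

DETECTOR_RATE = 40e12          # bytes/sec off the detectors (from the slide)
ONLINE_REJECTION = 5e5         # online rejection factor (from the slide)
SECONDS_PER_YEAR = 1e7         # assumed effective running time per year
EVENT_SIZE = 1e6               # assumed ~1 MB per stored event

rate_to_media = DETECTOR_RATE / ONLINE_REJECTION        # bytes/sec written to media
stored_per_year = rate_to_media * SECONDS_PER_YEAR      # bytes stored per year

events_per_year = stored_per_year / EVENT_SIZE
higgs_candidates = events_per_year / 1e4                # 1 in 10^4 selection
signal_events = higgs_candidates / 21                   # S/N ~ 1:20 => ~1/21 are signal

print(f"rate to media    : {rate_to_media / 1e6:.0f} MB/s")
print(f"stored per year  : {stored_per_year / 1e15:.1f} PB")   # close to the 1 PB on the slide
print(f"events per year  : {events_per_year:.1e}")
print(f"Higgs candidates : {higgs_candidates:.1e}")
print(f"of which signal  : {signal_events:.1e}")
```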
Proposed Solution to LHC Computing Challenge (?)
• A data analysis ‘Grid’ for High Energy Physics
[Diagram: hierarchical grid topology with CERN at the centre, surrounded by Tier 1
regional centres, Tier 2 centres, and further tiers (3, 4) below]
Access Patterns
Typical particle physics experiment in 2000-2005:
one year of acquisition and analysis of data
[Diagram: data tiers with access rates (aggregate, average)]
  Raw Data                        ~1000 TB         100 MB/s   (2-5 physicists)
  Reco-V1, Reco-V2                ~1000 TB each    500 MB/s   (5-10 physicists)
  ESD-V1.1, V1.2, V2.1, V2.2      ~100 TB each     1000 MB/s  (~50 physicists)
  AOD datasets                    ~10 TB each      2000 MB/s  (~150 physicists)
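For orientation, a hedged reading of these aggregate figures as average per-physicist rates, using the tier-to-rate pairing reconstructed above and the midpoints of the quoted physicist ranges (an illustration, not data from the slide):

```python
# Average per-physicist access rate implied by each tier's aggregate figure.
# The tier-to-rate pairing follows the reconstructed table above.
tiers = {
    "Raw":  (100, 3.5),    # aggregate MB/s, midpoint of 2-5 physicists
    "Reco": (500, 7.5),    # midpoint of 5-10 physicists
    "ESD":  (1000, 50),
    "AOD":  (2000, 150),
}
for name, (aggregate_mb_s, physicists) in tiers.items():
    print(f"{name:>4}: {aggregate_mb_s / physicists:6.1f} MB/s per physicist")
```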
Hierarchical Data Grid
• Physical
– Efficient network/resource use: local > regional > national > oceanic (see the
sketch after this list)
• Human
– University/regional computing complements national labs, which in turn
complement the accelerator site
• Easier to leverage resources, maintain control and assert priorities at the
regional/local level
– Effective involvement of scientists and students independently of location
• The 'challenge for UK particle physics' … How do we:
– Go from today's maximum of a 200-PC99 farm to a 10000-PC99 centre?
– Connect to/participate in the European and world-wide PP grid?
– Write the applications needed to operate within this hierarchical grid?
AND
– Ensure other disciplines are able to work with us, that our developments and
applications are made available to others, that expertise is exchanged, and
that we enjoy fruitful collaboration with Computer Scientists and Industry
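A minimal sketch of the "local > regional > national > oceanic" preference: given the sites holding replicas of a dataset, pick the most local one. The site names, catalogue contents and tier ranking are illustrative assumptions, not part of the proposal.

```python
# Hypothetical replica catalogue: dataset -> sites holding a copy (illustrative names).
REPLICAS = {
    "higgs-aod-v1": ["ral.ac.uk", "cern.ch", "fnal.gov"],
    "bbbar-esd":    ["cern.ch"],
}

# Tier preference as seen from a UK university: local > regional > national > oceanic.
TIER_RANK = {
    "liv.ac.uk": 0,   # local
    "ral.ac.uk": 1,   # national Tier-1
    "cern.ch":   2,   # accelerator site
    "fnal.gov":  3,   # transatlantic
}

def nearest_replica(dataset: str) -> str:
    """Return the most local site holding the dataset, per the tier ranking."""
    sites = REPLICAS.get(dataset, [])
    if not sites:
        raise LookupError(f"no replica registered for {dataset}")
    return min(sites, key=lambda s: TIER_RANK.get(s, 99))

print(nearest_replica("higgs-aod-v1"))   # -> ral.ac.uk (closest UK copy)
```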
Quantitative Requirements
• Start with typical experiment’s Computing Model
• UK Tier-1 Regional Centre specification
• Then consider implications for UK Particle Physics Computational
Grid
– Over years 2000, 2001, 2002, 2003
– Joint Infrastructure Bid made for resources to cover this
– Estimates of costs
• Look further ahead
Steering Committee
'Help establish the Particle Physics Grid activities in the UK'
a. An interim committee is to be put in place.
b. The immediate objectives are to prepare for the presentation to John Taylor on
27 March 2000, and to co-ordinate the EU 'Work Package' activities for 14 April.
c. After discharging these objectives, membership will be re-considered.
d. The next action of the committee will be to refine the Terms of Reference
(presented to the meeting on 15 March).
e. After that the Steering Committee will be charged with commissioning a Project
Team to co-ordinate the Grid technical work in the UK.
f. The interim membership is:
• Chairman: Andy Halley
• Secretary: Paul Jeffreys
• Tier 2 reps: Themis Bowcock, Steve Playfer
• CDF: Todd Hoffmann
• D0: Ian Bertram
• CMS: David Britton
• BaBar: Alessandra Forti
• CNAP: Steve Lloyd
– The 'labels' against the members are not official in any sense at this stage, but the
members are intended to cover these areas approximately!
UK Project Team
• Need to really get underway!
• System Managers crucial!
• PPARC needs to see genuine plans and genuine activities…
• Must coordinate our activities
• And
– Fit in with CERN activities
– Meet needs of experiments (BaBar, CDF, D0, …)
• So … go through range of options and then discuss…
EU Bid (1)
• A bid will be made to the EU to link national grids
– The "process" has become more than 'just a bid'
• We have almost reached the point where we have to be an active participant in
the EU bid, and its associated activities, in order to access data from CERN in
the future
• Decisions need to be taken today…
• Timescale:
– March 7: Workshop at CERN to prepare the programme of work (RPM)
– March 17: Editorial meeting to look for industrial partners
– March 30: Outline of paper used to obtain pre-commitment of partners
– April 17: Finalise 'Work Packages' – see next slides
– April 25: Final draft of proposal
– May 1: Final version of proposal for signature
– May 7: Submit
EU Bid(2)
• The bid was originally for 30 MECU, with a matching contribution from
national funding organisations
– Now scaled down, possibly to 10 MECU
– Possibly as ‘taster’ before follow-up bid?
– EU funds for Grid activities in Framework VI likely to be larger
• Work Packages have been defined
– Objective is that countries (through named individuals) take
responsibility to split up the work and define deliverables within each,
to generate draft content for EU bid
– BUT
• Without doubt the same people will be well positioned to lead the
work in due course
• .. And funds split accordingly??
• Considerable manoeuvring!
– UK – need to establish priorities, decide where to contribute…
Work Packages
                                      Contact Point
Middleware
  1  Grid Work Scheduling             Cristina Vistoli     INFN
  2  Grid Data Management             Ben Segal            CERN
  3  Grid Application Monitoring      Robin Middleton      UK
  4  Fabric Management                Tim Smith            CERN
  5  Mass Storage Management          Olof Barring         CERN
Infrastructure
  6  Testbed and Demonstrators        François Etienne     IN2P3
  7  Network Services                 Christian Michau     CNRS
Applications
  8  HEP Applications                 Hans Hoffmann        4 expts
  9  Earth Observation Applications   Luigi Fusco
 10  Biology Applications             Christian Michau
Management
 11  Project Management               Fabrizio Gagliardi   CERN
Robin is a 'place-holder', holding the UK's interest (explanation in Open Session)
UK Participation in Work Packages
MIDDLEWARE
1. Grid Work Scheduling
2. Grid Data Management            TONY DOYLE, Iain Bertram?
3. Grid Application Monitoring     ROBIN MIDDLETON, Chris Brew
4. Fabric Management
5. Mass Storage Management         JOHN GORDON
INFRASTRUCTURE
6. Testbed and demonstrators
7. Network Services                PETER CLARKE, Richard Hughes-Jones
APPLICATIONS
8. HEP Applications
PPDG as an NGI Problem
PPDG Goals
The ability to query and partially retrieve hundreds of terabytes across Wide
Area Networks within seconds, making effective data analysis from ten to one
hundred US universities possible.
PPDG is taking advantage of NGI services in three areas:
– Differentiated Services: to allow particle-physics bulk data
transport to coexist with interactive and real-time remote
collaboration sessions, and other network traffic.
– Distributed caching: to allow for rapid data delivery in response to
multiple “interleaved” requests
– “Robustness”: Matchmaking and Request/Resource
co-scheduling: to manage workflow and use computing and net
resources efficiently; to achieve high throughput
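The "distributed caching" item above is essentially a read-through disk cache in front of slow wide-area fetches. A minimal sketch of that idea follows; the cache path, size limit and the fetch_from_lab stub are hypothetical, and a real cache manager (e.g. the GC Cache Manager named later) would also handle concurrency and partial files.

```python
import collections, os

CACHE_DIR = "/data/cache"        # hypothetical local disk cache
CACHE_LIMIT = 500 * 10**9        # assumed 500 GB of local cache space

# Least-recently-used bookkeeping: logical file name -> size in bytes.
lru = collections.OrderedDict()

def fetch_from_lab(lfn: str, dest: str) -> None:
    """Placeholder for the slow wide-area transfer from the accelerator lab."""
    raise NotImplementedError("site-specific file mover goes here")

def open_cached(lfn: str):
    """Return a local file object, fetching and caching over the WAN on a miss."""
    path = os.path.join(CACHE_DIR, lfn.replace("/", "_"))
    if lfn in lru:
        lru.move_to_end(lfn)                     # mark as recently used
    else:
        fetch_from_lab(lfn, path)                # slow WAN copy on a miss
        lru[lfn] = os.path.getsize(path)
        while sum(lru.values()) > CACHE_LIMIT:   # evict least-recently-used files
            victim, _ = lru.popitem(last=False)
            os.remove(os.path.join(CACHE_DIR, victim.replace("/", "_")))
    return open(path, "rb")

# Usage sketch: open_cached("run123/evt0001.root") would hit the WAN only once.
```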
[Source: Richard P. Mount, SLAC, "Particle Physics Data Grid", DoE NGI Program PI Meeting, October 1999]
PPDG Resources
• Network Testbeds:
– ESNET links at up to 622 Mbits/s (e.g. LBNL-ANL)
– Other testbed links at up to 2.5 Gbits/s (e.g. Caltech-SLAC via NTON)
• Data and Hardware:
– Tens of terabytes of disk-resident particle physics data (plus hundreds of
terabytes of tape-resident data) at accelerator labs;
– Dedicated terabyte university disk cache;
– Gigabit LANs at most sites.
• Middleware Developed by Collaborators:
– Many components needed to meet short-term targets (e.g. Globus, SRB,
MCAT, Condor, OOFS, Netlogger, STACS, Mass Storage Management)
already developed by collaborators.
• Existing Achievements of Collaborators:
– WAN transfer at 57 Mbytes/s;
– Single site database access at 175 Mbytes/s
[Source: Richard P. Mount, "Data Analysis for SLAC Physics", CHEP 2000]
PPDG First Year Milestones
• Project start: August 1999
• Decision on existing middleware to be integrated into the first-year Data
Grid: October 1999
• First demonstration of high-speed site-to-site data replication: January 2000
• First demonstration of multi-site cached file access (3 sites): February 2000
• Deployment of high-speed site-to-site data replication in support of two
particle-physics experiments: July 2000
• Deployment of multi-site cached file access in partial support of at least two
particle-physics experiments: August 2000
[Source: Richard P. Mount, SLAC, "Particle Physics Data Grid", DoE NGI Program PI Meeting, October 1999]
First Year PPDG "System" Components
Middleware Components (Initial Choice): see PPDG Proposal page 15
  Object and File-Based Application Services   Objectivity/DB (SLAC enhanced);
                                               GC Query Object, Event Iterator,
                                               Query Monitor; FNAL SAM System
  Resource Management                          Start with human intervention
                                               (but begin to deploy resource
                                               discovery & mgmnt tools)
  File Access Service                          Components of OOFS (SLAC)
  Cache Manager                                GC Cache Manager (LBNL)
  Mass Storage Manager                         HPSS, Enstore, OSM (site-dependent)
  Matchmaking Service                          Condor (U. Wisconsin)
  File Replication Index                       MCAT (SDSC)
  Transfer Cost Estimation Service             Globus (ANL)
  File Fetching Service                        Components of OOFS
  File Mover(s)                                SRB (SDSC); site specific
  End-to-end Network Services                  Globus tools for QoS reservation
  Security and authentication                  Globus (ANL)
[Source: Richard P. Mount, SLAC, "Particle Physics Data Grid", DoE NGI Program PI Meeting, October 1999]
LHCb Contribution to EU Proposal: HEP Applications Work Package
• Grid testbed in 2001, 2002
• Production of 10^6 simulated b -> D*pi events
– Create 10^8 events at Liverpool MAP in 4 months
– Transfer 0.62 TB to RAL
– RAL dispatches AOD and TAG datasets to other sites
• 0.02 TB to Lyon and CERN
• Then permit a study of all the various options for performing a distributed
analysis in a Grid environment
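As a rough feasibility check of the figures above, assuming the 0.62 TB is shipped over roughly the same 4-month production window and that MAP runs continuously (assumptions, not statements from the slide):

```python
# Rough rates implied by the LHCb production plan above.
events = 1e8
months = 4
seconds = months * 30 * 24 * 3600           # ~4 months of wall-clock time

events_per_sec = events / seconds
print(f"sustained generation rate : {events_per_sec:.0f} events/s")   # ~10 events/s

transfer_bytes = 0.62e12                    # 0.62 TB shipped to RAL
rate_mbit_s = transfer_bytes * 8 / seconds / 1e6
print(f"average network rate      : {rate_mbit_s:.2f} Mbit/s")        # well under 1 Mbit/s
```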
American Activities
• Collaboration with Ian Foster
– Transatlantic collaboration using GLOBUS
• Networking
– QoS tests with SLAC
– Also link in with GLOBUS?
• CDF and D0
– Real challenge to 'export data'
– Have to implement a 4 Mbps connection (see the sketch below)
– Have to set up a mini-Grid
• BaBar
– Distributed LINUX farms etc in JIF bid
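For scale, the arithmetic behind that 4 Mbps connection, ignoring protocol overhead and competing traffic (a rough illustration only):

```python
# How much data a dedicated 4 Mbps link can export, ignoring overheads.
link_mbps = 4
bytes_per_day = link_mbps * 1e6 / 8 * 86400
print(f"per day  : {bytes_per_day / 1e9:.0f} GB")          # ~43 GB/day
print(f"per year : {bytes_per_day * 365 / 1e12:.1f} TB")   # ~16 TB/year

# Time to move a 1 TB dataset at this rate:
days_for_1tb = 1e12 / bytes_per_day
print(f"1 TB takes {days_for_1tb:.1f} days")                # ~23 days
```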
Networking Proposal - 1
DETAILS
– Single site-to-site tests at low rates to set up the technologies and gain
experience. These should include, and benefit, the experiments which will be
taking data: Tier-0 -> Tier-1 (CERN-RAL), Tier-1 -> Tier-1 (FNAL-RAL),
Tier-1 -> Tier-2 (RAL-LVPL, GLA/ED).
– Multi-site file replication, cascaded file replication at modest rates.
– Transfers at Neo-GRID rates.
– Use existing monitoring tools; adapt them to function as resource predictors also.
DEPENDENCIES/RISKS
– Availability of temporary PVCs on inter- and intra-national WANs, or from
collaborating industries. Needs negotiation now.
– Monitoring expertise/tools already available: PPNCG (UK), ICFA (worldwide).
RESOURCES REQUIRED
– 1.5 SY
– 10 Mbit/s PVCs between sites in 00-01; 50 Mbit/s PVCs between sites in 01-02;
>100 Mbit/s PVCs in 02-03.
MILESTONES
– Jan-01: Demonstration of low-rate transfers between all sites; demonstration of
high-rate site-to-site file replication.
– Jan-02: Demonstration of cascaded file transfer; demonstration of sustained
modest-rate transfers.
– 03: Implementation of sustained transfers of real data at rates approaching
1000 Mbit/s (see the rate sketch below).
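For orientation, a quick conversion of the proposed PVC rates into the daily volumes they could sustain, ignoring protocol overhead (rough arithmetic, not part of the proposal):

```python
# Daily volume each proposed PVC rate could sustain, ignoring overheads.
for label, mbit_s in [("00-01 PVC", 10), ("01-02 PVC", 50),
                      ("02-03 PVC", 100), ("03 target", 1000)]:
    gbytes_per_day = mbit_s * 1e6 / 8 * 86400 / 1e9
    print(f"{label:>10}: {mbit_s:5d} Mbit/s  ->  {gbytes_per_day:7.0f} GB/day")
```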
Networking - 2
Differentiated Services
DETAILS
– Deploy some form of DiffServ on dedicated PVCs. Measure high- and low-priority
latency and rates as a function of strategy and load (see the marking sketch below).
– [Depends upon QoS developments] Attempt to deploy end-to-end QoS across
several interconnected networks.
DEPENDENCIES/RISKS
– PVCs must be QoS capable. May rely upon proprietary or technology-dependent
factors in the short term.
– WAN end-to-end QoS depends upon expected developments by network suppliers.
RESOURCES REQUIRED
– 1.5 SY
– Same PVCs as in NET-1
– Monitoring tools
MILESTONES
– Apr-01: Successful deployment and measurement of pilot QoS on PVCs under
project control.
– Production deployment of QoS on the WAN.
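As one concrete shape such a DiffServ test could take, the sketch below marks bulk and interactive TCP sockets with different DSCP values on a Linux host; whether the routers along the PVCs honour the marking is exactly what the measurements above would establish. The code points are standard DiffServ values, and the commented endpoint is a placeholder.

```python
import socket

# DiffServ code points, shifted into the IP TOS byte (DSCP occupies the top 6 bits).
DSCP_EF = 46 << 2   # Expedited Forwarding: interactive / real-time collaboration
DSCP_BE = 0         # Best Effort: bulk physics data transport

def marked_socket(tos: int) -> socket.socket:
    """TCP socket whose outgoing packets carry the given TOS/DSCP marking (Linux)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return s

bulk = marked_socket(DSCP_BE)
interactive = marked_socket(DSCP_EF)
# bulk.connect(("tier1.example.ac.uk", 5001))   # hypothetical transfer-test server
```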
Networking - 3
Monitoring and Metrics for resource prediction
DETAILS
– Survey and define the monitoring requirements of the GRID.
– Adapt existing monitoring tools for the measurement and monitoring needs of
the network work packages (all NET-xx) as described here.
– In particular, develop protocol-sensitive monitoring, as will be needed for QoS.
– Develop and test prediction metrics (see the predictor sketch below).
DEPENDENCIES/RISKS
– PPNCG monitoring
– ICFA monitoring
RESOURCES REQUIRED
– 0.5 SY
MILESTONES
– Dec-00: Interim report on GRID monitoring requirements
– Jul-01:
– Dec-01: Finish adaptation of existing tools for monitoring
– Jul-02: First prototype predictive tools deployed
– Dec-02: Report on tests of predictive tools
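As one possible shape for a "prediction metric", a sketch of an exponentially weighted moving average over past throughput measurements; the smoothing factor and the sample history are purely illustrative.

```python
# Exponentially weighted moving average as a simple throughput predictor.
def ewma_forecast(samples, alpha=0.3):
    """Predict the next throughput value (Mbit/s) from past measurements."""
    estimate = samples[0]
    for x in samples[1:]:
        estimate = alpha * x + (1 - alpha) * estimate   # weight recent samples more
    return estimate

# Illustrative measurements from a periodic site-to-site transfer test.
history = [8.2, 7.9, 6.5, 7.1, 7.8, 5.9]
print(f"predicted next-interval rate: {ewma_forecast(history):.1f} Mbit/s")
```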
Networking - 4
Data Flow modelling
DETAILS
– Assimilate the Monarc modelling tool set.
– Determine the requirements of a model of the UK GRID, and to what extent this
factorises (or not) from the international GRID.
– Determine the scope of work needed to adapt/provide components; appraise the
work needed to adapt/write the necessary components.
– Configure and run models in parallel with the transfer tests (NET-1) and QoS
tests (NET-2) for calibration purposes.
– Apply the models to the determination of GRID topology and resource location
(see the toy model below).
DEPENDENCIES/RISKS
– Applicability of existing tools unknown before appraisal.
RESOURCES REQUIRED
– 3 SY
MILESTONES
– Oct-00: Assimilate Monarc
– Dec-00: Determine requirements of the GRID model
– ??: Configure initial model
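A toy illustration of the kind of question a data-flow model answers, namely how long a dataset takes to reach each site over an assumed topology; the link capacities and the 1 TB volume are made-up numbers, not MONARC output.

```python
# Toy data-flow model: transfer time of a dataset along each path of an assumed topology.
LINKS_MBIT_S = {                 # assumed point-to-point capacities
    ("CERN", "RAL"): 50,
    ("RAL", "Liverpool"): 10,
    ("RAL", "Glasgow"): 10,
}

def transfer_days(path, dataset_tb):
    """Days to move the dataset along a path, limited by the slowest hop."""
    bottleneck = min(LINKS_MBIT_S[(a, b)] for a, b in zip(path, path[1:]))
    seconds = dataset_tb * 1e12 * 8 / (bottleneck * 1e6)
    return seconds / 86400

for path in [("CERN", "RAL"), ("CERN", "RAL", "Liverpool")]:
    print(" -> ".join(path), f": {transfer_days(path, 1.0):.1f} days for 1 TB")
```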
Pulling it together…
• Networking:
– EU work package
– Existing tests
– Integration of ICFA studies into the Grid
• Will networking lead the non-experiment activities??
• Data Storage
– EU work package
• Grid Application Monitoring
– EU work package
• CDF, D0 and BaBar
– Need to integrate these into Grid activities
– Best approach is to centre on experiments
…Pulling it all together
• Experiment-driven
– Like LHCb, meet specific objectives
• Middleware preparation
– Set up GLOBUS?
• QMW, RAL, DL ..?
– Authenticate
– Familiar
– Try moving data between sites
• Resource Specification
• Collect dynamic information (see the sketch below)
• Try with international collaborators
– Learn about alternatives to GLOBUS
– Understand what is missing
– Exercise and measure performance of distributed caching
• What do you think?
• Anyone like to work with Ian Foster for 3 months?!
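On "collect dynamic information": a minimal sketch of the per-node status a site might gather for publication to whatever information service is eventually chosen (e.g. the GLOBUS directory service); the fields are illustrative and the calls are Unix-specific.

```python
import json, os, socket, time

def node_status() -> dict:
    """Snapshot of dynamic resource information for this node (illustrative fields)."""
    disk = os.statvfs("/")                       # Unix-only: free space on the root filesystem
    load1, load5, load15 = os.getloadavg()       # Unix-only: 1/5/15-minute load averages
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "load_1min": load1,
        "free_disk_gb": disk.f_bavail * disk.f_frsize / 1e9,
        "cpus": os.cpu_count(),
    }

# A real setup would push this to an information service; here we just print it.
print(json.dumps(node_status(), indent=2))
```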