Planning LHCb computing infrastructure at CERN and at regional centres
F. Harris
22 May 2000
Talk Outline
 Reminder of LHCb distributed model
 Requirements and planning for 2000-2005 (growth of regional centres)
 EU GRID proposal status and LHCb planning
 Large prototype proposal and LHCb possible uses
 Some NEWS from LHCb (and other) activities
General Comments
 A draft LHCb Technical Note for the computing model exists
 New requirements estimates have been made (big changes in MC requirements)
 Several presentations have been made to the LHC computing review in March and May (and at the May 8 LHCb meeting)
 http://lhcb.cern.ch/computing/Steering/Reviews/LHCComputing2000/default.htm
Baseline Computing Model - Roles
 To provide an equitable sharing of the total computing load we can envisage a scheme such as the following
 After 2005, role of CERN (notionally 1/3)
 to be the production centre for real data
 to support physics analysis of real and simulated data by CERN-based physicists
 Role of regional centres (notionally 2/3)
 to be production centres for simulation
 to support physics analysis of real and simulated data by local physicists
 Institutes with sufficient CPU capacity share the simulation load, with data archived at the nearest regional centre
[Diagram: baseline computing model data flow]
 Production Centre: CPU for production; mass storage for RAW, ESD, AOD and TAG; generates raw data, reconstruction, production analysis, user analysis
 Regional Centres: CPU for analysis; mass storage for AOD and TAG; user analysis
 Institutes: CPU and data servers; selected user analyses
 Data flows: AOD and TAG from the production centre to regional centres (real: 80 TB/yr, sim: 120 TB/yr); AOD and TAG from regional centres to institutes (8-12 TB/yr)
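As a rough feel for what these yearly AOD+TAG volumes mean in network terms, a back-of-envelope sketch (illustrative only; it simply spreads each yearly volume evenly over the year):

```python
# Average sustained rates implied by the yearly AOD+TAG flows in the diagram.
SECONDS_PER_YEAR = 365 * 24 * 3600

flows_tb_per_year = {
    "production centre -> regional centres (real)": 80,
    "production centre -> regional centres (sim)": 120,
    "regional centre -> institute (upper estimate)": 12,
}
for route, terabytes in flows_tb_per_year.items():
    mb_per_s = terabytes * 1e6 / SECONDS_PER_YEAR      # 1 TB ~ 1e6 MB
    print(f"{route}: ~{mb_per_s:.1f} MB/s average")
# e.g. 80 TB/yr averages to about 2.5 MB/s over a full year.
```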
Physics : Plans for Simulation 2000-2005
 In 2000 and 2001 we will produce 3 x 10^6 simulated events each year for detector optimisation studies in preparation for the detector TDRs (expected in 2001 and early 2002).
 In 2002 and 2003 studies will be made of the high level trigger algorithms, for which we are required to produce 6 x 10^6 simulated events each year.
 In 2004 and 2005 we will start to produce very large samples of simulated events, in particular background, for which samples of 10^7 events are required.
 This on-going physics production work will be used as far as is practicable for testing development of the computing infrastructure.
Computing : MDC Tests of Infrastructure
 2002 : MDC 1 - application tests of grid middleware and farm management software using a real simulation and analysis of 10^7 B channel decay events. Several regional facilities will participate:
    CERN, RAL, Lyon/CCIN2P3, Liverpool, INFN, ….
 2003 : MDC 2 - participate in the exploitation of the large scale Tier 0 prototype to be set up at CERN
    High Level Triggering – online environment, performance
    Management of systems and applications
    Reconstruction – design and performance optimisation
    Analysis – study chaotic data access patterns
    STRESS TESTS of data models, algorithms and technology
 2004 : MDC 3 - start to install the event filter farm at the experiment, to be ready for commissioning of detectors in 2004 and 2005
Cost of CPU, disk and tape
 Moore’s Law evolution with time for the cost of CPU and storage. The scale in MSFr is for a facility sized to ATLAS requirements (> 3 x LHCb)
 At today’s prices the total cost for LHCb (CERN and regional centres) would be ~60 MSFr
 In 2004 the cost would be ~10-20 MSFr
 After 2005 the maintenance cost is ~5 MSFr/year
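A minimal back-of-envelope check of these figures, assuming the ~1.4x per year price fall implied by the unit-cost table on the next slide (illustrative only):

```python
# Check of the cost extrapolation above.
# Assumption: unit prices fall by a factor ~1.4 per year, as the unit-cost
# table on the next slide suggests (CPU: 64.5 -> 46.1 -> 32.9 SFr/SI95, ...).
ANNUAL_PRICE_FACTOR = 1.4
COST_AT_2000_MSFR = 60.0            # total LHCb cost at today's (2000) prices

def projected_cost_msfr(year):
    """Cost of the same capacity if it were bought entirely in the given year."""
    return COST_AT_2000_MSFR / ANNUAL_PRICE_FACTOR ** (year - 2000)

for year in (2000, 2004, 2005):
    print(year, round(projected_cost_msfr(year), 1), "MSFr")
# Gives ~15.6 MSFr for 2004 and ~11.2 MSFr for 2005, consistent with the
# ~10-20 MSFr quoted above for 2004.
```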
Growth in Requirements to Meet Simulation Needs

Requirements
                                        2000     2001     2002     2003     2004     2005     2006     2007     2008     2009     2010
No of signal events                    1.0E+06  1.0E+06  2.0E+06  3.0E+06  5.0E+06  1.0E+07  1.0E+07  1.0E+07  1.0E+07  1.0E+07  1.0E+07
No of background events                1.0E+06  1.5E+06  2.0E+06  4.0E+06  1.0E+07  1.0E+09  1.0E+09  1.0E+09  1.0E+09  1.0E+09  1.0E+09
CPU for simulation of signal (SI95)      10000    10000    20000    30000    50000   100000   100000   100000   100000   100000   100000
CPU for background simulation (SI95)     16000    24000    32000    64000   160000   400000   400000   400000   400000   400000   400000
CPU for user analysis (SI95)              2500     2500     5000     7500    12500    25000    25000    25000    25000    25000    25000
RAWmc data on disk (TB)                    0.4      0.5      0.8      1.4      3       202      202      202      202      202      202
RAWmc data on tape (TB)                    0.4      0.5      0.8      1.4      3       202      202      202      202      202      202
ESDmc data on disk (TB)                    0.2      0.25     0.4      0.7      1.5     101      101      101      101      101      101
AODmc data on disk (TB)                    0.06     0.1      0.1      0.3      0.5      30.5     39.4     42.1     42.9     43.2     43.3
TAGmc data on disk (TB)                    0.002    0.0025   0.004    0.007    0.015     1.01     1.01     1.01     1.01     1.01     1.01

Unit Costs
                                        2000     2001     2002     2003     2004     2005     2006     2007     2008     2009     2010
CPU cost / SI95 (SFr)                     64.5     46.1     32.9     23.5     16.8     12.0      8.6      6.1      4.4      3.1      2.2
Disk cost / GB (SFr)                      16.1     11.5      8.2      5.9      4.2      3.0      2.1      1.5      1.1      0.8      0.6
Tape cost / GB (SFr)                       2.7      1.9      1.4      1.0      0.7      0.5      0.36     0.26     0.18     0.13     0.09

Investment per year (kSFr)
                                        2000     2001     2002     2003     2004     2005     2006     2007     2008     2009     2010
CPU for signal                             323      230      329      235      336      600      171      122       87       62       45
CPU for background                           0      369      263      753     1613     2880      857      612      437      312      223
CPU for user analysis                       65       92       66       59       84      150       69       49       35       25       18
RAWmc data on disk                           3        6        2        4        7      597      129       92       66       47       33
RAWmc data on tape                         0.2      0.2      0.2      0.3      0.4     20.2     14.4     10.3      7.4      5.3      3.8
ESDmc data on disk                           3        3        1        2        3      299       64       46       33       23       17
AODmc data on disk                           1        1        0        1        1       90       21       15       11        8        6
TAGmc data on disk                         0.0      0.0      0.0      0.0      0.1      3.0      2.2      1.5      1.1      0.8      0.6
Total investment per year                  395      701      663     1053     2045     4639     1328      949      678      484      346
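Read together, the tables suggest a simple pattern for the ramp-up years: the spend in a given year is roughly the capacity added that year times that year's unit price (the 2000/2001 CPU purchases are spread over two years, and the tape row and post-2005 figures follow a replacement profile instead). A minimal sketch of that arithmetic, under this reading of the tables:

```python
# Illustrative reading of the tables above: for 2002-2005 the investment is
# roughly (capacity added this year) x (this year's unit price).
# Figures are copied from the tables; this is not the official costing model.
cpu_signal_si95 = {2001: 10000, 2002: 20000, 2003: 30000, 2004: 50000, 2005: 100000}
cpu_bkg_si95    = {2001: 24000, 2002: 32000, 2003: 64000, 2004: 160000, 2005: 400000}
raw_disk_tb     = {2001: 0.5,   2002: 0.8,   2003: 1.4,   2004: 3,      2005: 202}
cpu_price_sfr   = {2002: 32.9, 2003: 23.5, 2004: 16.8, 2005: 12.0}   # SFr per SI95
disk_price_sfr  = {2002: 8.2,  2003: 5.9,  2004: 4.2,  2005: 3.0}    # SFr per GB

for year in (2002, 2003, 2004, 2005):
    signal_kchf = (cpu_signal_si95[year] - cpu_signal_si95[year - 1]) * cpu_price_sfr[year] / 1000
    bkg_kchf    = (cpu_bkg_si95[year] - cpu_bkg_si95[year - 1]) * cpu_price_sfr[year] / 1000
    disk_kchf   = (raw_disk_tb[year] - raw_disk_tb[year - 1]) * 1000 * disk_price_sfr[year] / 1000
    print(year, round(signal_kchf), round(bkg_kchf), round(disk_kchf))
# Reproduces (to rounding) the CPU-for-signal, CPU-for-background and
# RAWmc-on-disk investment rows for 2002-2005 shown above.
```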
Cost / Regional Centre for Simulation
 Assume there are 5 regional centres (UK, IN2P3, INFN, CERN + consortium of Nikhef, Russia, etc...)
 Assume costs are shared equally

Requirements per regional centre
                                          2000     2001     2002     2003     2004     2005     2006     2007     2008     2009     2010
No of signal events                      2.0E+05  2.0E+05  4.0E+05  6.0E+05  1.0E+06  2.0E+06  2.0E+06  2.0E+06  2.0E+06  2.0E+06  2.0E+06
No of background events                  2.0E+05  3.0E+05  4.0E+05  8.0E+05  2.0E+06  2.0E+08  2.0E+08  2.0E+08  2.0E+08  2.0E+08  2.0E+08
CPU for simulation of signal (SI95)         2000     2000     4000     6000    10000    20000    20000    20000    20000    20000    20000
CPU for simulation of background (SI95)     3200     4800     6400    12800    32000    80000    80000    80000    80000    80000    80000
CPU for user analysis (SI95)                 500      500     1000     1500     2500     5000     5000     5000     5000     5000     5000
RAWmc data on disk (TB)                      0.08     0.1      0.16     0.28     0.6      40.4     40.4     40.4     40.4     40.4     40.4
RAWmc data on tape (TB)                      0.08     0.1      0.16     0.28     0.6      40.4     40.4     40.4     40.4     40.4     40.4
ESDmc data on disk (TB)                      0.04     0.05     0.08     0.14     0.3      20.2     20.2     20.2     20.2     20.2     20.2
AODmc data on disk (TB)                      0.012   0.0186  0.02958  0.05087  0.10526  6.09158  7.88747  8.42624  8.58787  8.63636  8.65091
TAGmc data on disk (TB)                      0.0004   0.0005   0.0008   0.0014   0.003     0.202    0.202    0.202    0.202    0.202    0.202

Investment per regional centre per year (kSFr)
                                          2000     2001     2002     2003     2004     2005     2006     2007     2008     2009     2010
CPU for signal                              64.5     46.1     65.9     47.0     67.2    120.0     34.3     24.5     17.5     12.5      8.9
CPU for background                           0.0     73.8     52.7    150.5    322.6    576.0    171.4    122.4     87.5     62.5     44.6
CPU for user analysis                       12.9     18.4     13.2     11.8     16.8     30.0     13.7      9.8      7.0      5.0      3.6
RAWmc data on disk                           0.6      1.2      0.5      0.7      1.3    119.4     25.7     18.4     13.1      9.4      6.7
RAWmc data on tape                           0.0      0.0      0.0      0.1      0.1      4.0      2.9      2.1      1.5      1.1      0.8
ESDmc data on disk                           0.6      0.6      0.2      0.4      0.7     59.7     12.9      9.2      6.6      4.7      3.3
AODmc data on disk                           0.2      0.2      0.1      0.1      0.2     18.0      4.3      3.1      2.2      1.6      1.1
TAGmc data on disk                           0.0      0.0      0.0      0.0      0.0      0.6      0.4      0.3      0.2      0.2      0.1
Total investment per year                     79      140      133      211      409      928      266      190      136       97       69
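A quick consistency check of the equal-share assumption (illustrative): multiplying the per-centre investment row by the five centres recovers the overall profile on the previous slide to within rounding.

```python
# Per-centre investment row (kSFr) x 5 centres vs the overall row on the
# previous slide; the figures agree to within rounding of the per-centre numbers.
per_centre = [79, 140, 133, 211, 409, 928, 266, 190, 136, 97, 69]
overall    = [395, 701, 663, 1053, 2045, 4639, 1328, 949, 678, 484, 346]
for year, (c, o) in enumerate(zip(per_centre, overall), start=2000):
    print(year, 5 * c, o)
```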
EU GRID proposal status
(http://grid.web.cern.ch/grid/)
 GRIDs: software to manage all aspects of distributed computing (security and authorisation, resource management, monitoring). Interface to high energy physics...
 The proposal was submitted on May 9
 Main signatories (CERN, France, Italy, UK, Netherlands, ESA) + associate signatories (Spain, Czech Republic, Hungary, Portugal, Scandinavia, ...)
 Project composed of Work Packages (to which countries provide
effort)
 LHCb involvement
 Depends on country
 Essentially comes via ‘Testbeds’ and ‘HEP applications’
EU Grid Work Packages
 Middleware
    Grid work scheduling                  C Vistoli (INFN)
    Grid Data Management                  B Segal (IT)
    Grid Application Monitoring           R Middleton (RAL)
    Fabric Management                     T Smith (IT)
    Mass Storage Management               O Barring (IT)
 Infrastructure
    Testbed and Demonstrators (LHCb in)   F Etienne (Marseille)
    Network Services                      C Michau (CNRS)
 Applications
    HEP (LHCb in)                         H Hoffmann (CERN)
    Earth Observation                     L Fusco (ESA)
    Biology                               C Michau (CNRS)
 Management
    Project Management                    F Gagliardi (IT)
Grid LHCb WP - Grid Testbed (DRAFT)
 The MAP farm at Liverpool has 300 processors and would take 4 months to generate the full sample of events
 All data generated (~3 TB) would be transferred to RAL for archive (UK regional facility)
 All AOD and TAG datasets dispatched from RAL to other regional centres, such as Lyon, CERN, INFN
 Physicists run jobs at the regional centre or ship AOD and TAG data to the local institute and run jobs there. Also copy ESD for a fraction (~10%) of events for systematic studies (~100 GB)
 The resulting data volumes to be shipped between facilities over 4 months would be as follows:
    Liverpool to RAL          3 TB (RAW, ESD, AOD and TAG)
    RAL to LYON/CERN/…        0.3 TB (AOD and TAG)
    LYON to LHCb institute    0.3 TB (AOD and TAG)
    RAL to LHCb institute     100 GB (ESD for systematic studies)
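For a rough idea of the sustained network rates these transfers imply (illustrative only, assuming each volume is spread evenly over the 4 months):

```python
# Average rate implied by each 4-month transfer listed above.
SECONDS_IN_4_MONTHS = 4 * 30 * 24 * 3600     # ~10.4 million seconds (30-day months)

transfers_gb = {
    "Liverpool -> RAL": 3000,                # 3 TB of RAW, ESD, AOD and TAG
    "RAL -> Lyon/CERN/...": 300,             # 0.3 TB of AOD and TAG
    "Lyon -> LHCb institute": 300,
    "RAL -> LHCb institute": 100,            # ESD for systematic studies
}
for route, gigabytes in transfers_gb.items():
    mbit_per_s = gigabytes * 8000 / SECONDS_IN_4_MONTHS   # GB -> Mbit, spread evenly
    print(f"{route}: {mbit_per_s:.2f} Mbit/s average")
# The Liverpool -> RAL transfer averages ~2.3 Mbit/s sustained; the other
# flows are well below 1 Mbit/s.
```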
MILESTONES for the 3-year EU GRID project starting Jan 2001
 Mx1 (June 2001) Coordination with the other WPs. Identification of use cases and minimal grid services required at every step of the project. Planning of the exploitation of the GRID steps.
 Mx2 (Dec 2001) Development of use-case programs. Interface with existing GRID services as planned in Mx1.
 Mx3 (June 2002) Run #0 executed (distributed Monte Carlo production and reconstruction) and feedback provided to the other WPs.
 Mx4 (Dec 2002) Run #1 executed (distributed analysis) and corresponding feedback to the other WPs. WP workshop.
 Mx5 (June 2003) Run #2 executed including additional GRID functionality.
 Mx6 (Dec 2003) Run #3 extended to a larger user community
‘Agreed’ LHCb resources going into EU GRID project over 3 years
Country        FTE equivalent/year
CERN           1
France         1
Italy          1
UK             1
Netherlands    0.5
 These people should work together…. LHCb GRID CLUB!
 This is for the HEP applications WP - interfacing our physics software into the GRID and running it in testbed environments
 Some effort may also go into the testbed WP (not yet known whether LHCb countries have signed up for this)
Grid computing – LHCb planning
 Now : forming a GRID technical working group with reps from regional facilities
 Liverpool (1), RAL (2), CERN (1), IN2P3 (?), INFN (?), …
 June 2000 : define simulation samples needed in coming years
 July 2000 : Install Globus software in LHCb regional centres and
start to study integration with LHCb production tools
 End 2000 : define grid services for farm production
 June 2001 : implementation of basic grid services for farm
production provided by EU Grid project
 Dec 2001 : MDC 1 - small production for test of software
implementation (GEANT4)
 June 2002 : MDC 2 - large production of signal/background
sample for tests of world-wide analysis model
 June 2003 : MDC 3 - stress/scalability test on large scale Tier 0
facility, tests of Event Filter Farm, Farm control/management,
data throughput tests.
Prototype Computing Infrastructure
 Aim to build a prototype production facility at CERN in 2003 (proposal coming out of the LHC computing review)
 Scale of the prototype limited by what is affordable: ~0.5 of the number of components of the ATLAS system
 Cost ~20 MSFr
 Joint project between the four experiments
 Access to facility for tests to be shared
 Need to develop a distributed network of resources
involving other regional centres and deploy data
production software over the infrastructure for
tests in 2003
 Results of this prototype deployment used as basis
for Computing MoU
Tests Using Tier 0 Prototype in 2003
 We intend to use the Tier 0 prototype planned for construction in 2003 to carry out stress tests of both hardware and software
 We will prepare realistic examples of two types of
application :
 Tests designed to gain experience with the online farm
environment
 Production tests of simulation, reconstruction, and analysis
Event Filter Farm Architecture
[Diagram: event filter farm architecture]
 ~100 Readout Units (RU) feed a switch that functions as the readout network (candidate technology: GbE)
 Of order 10-100 Sub-Farm Controllers (SFC) hang off the switch, each controlling a sub-farm of ~10 worker CPUs on a sub-farm network (Ethernet)
 Storage controller(s) handle Storage/CDR; Control PCs (CPC) and a controls network (Ethernet) connect the farm to the controls system
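A minimal sketch of this topology (illustrative only; the counts are the order-of-magnitude figures read from the diagram and are assumptions, not a fixed design):

```python
# Order-of-magnitude model of the event filter farm topology in the diagram.
# The counts are assumptions taken from the "~" figures on the slide.
N_READOUT_UNITS = 100        # RUs feeding the readout network switch
N_SUBFARM_CONTROLLERS = 100  # SFCs hanging off the switch (slide gives ~10-100)
CPUS_PER_SUBFARM = 10        # worker CPUs per sub-farm (Ethernet sub-farm network)

total_worker_cpus = N_SUBFARM_CONTROLLERS * CPUS_PER_SUBFARM
print(f"RUs: {N_READOUT_UNITS}, SFCs: {N_SUBFARM_CONTROLLERS}, "
      f"worker CPUs: {total_worker_cpus}")
# With these assumed figures the farm is of order a thousand worker CPUs,
# which is the scale the scalability tests on the following slides exercise.
```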
Testing/Verification
[Diagram: the event filter farm architecture of the previous slide, annotated to show which parts are covered by each class of test]
 Small scale lab tests + simulation
 Full scale lab tests
 Large/full scale tests using the farm prototype
Scalability tests for simulation and reconstruction
 Test writing of reconstructed+raw data at 200 Hz in online farm environment
 Test writing of reconstructed+simulated data in offline Monte Carlo farm environment
    Population of event database from multiple input processes
 Test efficiency of event and detector data models
    Access to conditions data from multiple reconstruction jobs
    Online calibration strategies and distribution of results to multiple reconstruction jobs
    Stress testing of reconstruction to identify hot spots, weak code etc.
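To give a feel for the scale of the 200 Hz writing test, a back-of-envelope sketch; the per-event size used here is an assumed figure for illustration, not taken from this talk:

```python
# Rough data-rate estimate for the 200 Hz writing test above.
# The event size is an assumed figure for illustration; it is not given in
# this talk.
EVENT_RATE_HZ = 200                 # writing rate from the test description
ASSUMED_EVENT_SIZE_KB = 150         # assumed combined raw+reconstructed event size

mb_per_s = EVENT_RATE_HZ * ASSUMED_EVENT_SIZE_KB / 1024
tb_per_day = mb_per_s * 86400 / (1024 * 1024)
print(f"~{mb_per_s:.0f} MB/s sustained, ~{tb_per_day:.1f} TB/day")
# With a 150 kB event this is roughly 30 MB/s sustained and ~2.4 TB/day into
# the event database.
```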
Scalability tests for analysis
 Stress test of event database
    Multiple concurrent accesses by “chaotic” analysis jobs
 Optimisation of data model
    Study data access patterns of multiple, independent, concurrent analysis jobs
    Modify event and conditions data models as necessary
    Determine data clustering strategies
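A toy illustration of why clustering matters for “chaotic” access patterns (purely illustrative; the event counts and cluster size are invented for the example):

```python
# Toy model of chaotic analysis access: a job reads a random subset of events,
# and we count how many storage clusters (e.g. files or containers) it touches.
# The numbers are invented for illustration only.
import random

N_EVENTS = 1_000_000
CLUSTER_SIZE = 1_000          # events stored together per cluster
EVENTS_PER_JOB = 5_000        # events a single analysis job reads

def clusters_touched(selected_events):
    return len({e // CLUSTER_SIZE for e in selected_events})

random.seed(1)
chaotic = random.sample(range(N_EVENTS), EVENTS_PER_JOB)   # scattered picks
clustered = range(EVENTS_PER_JOB)                          # same count, pre-clustered
print("chaotic access touches  ", clusters_touched(chaotic), "clusters")
print("clustered access touches", clusters_touched(clustered), "clusters")
# The scattered selection touches nearly all ~1000 clusters, while the same
# number of events aligned with the clustering touches only 5 -- the motivation
# for studying access patterns and choosing clustering strategies.
```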
Work required now for planning prototypes for 2003/4
(request from Resource panel of LHC review)
 Plan for evolution to prototypes (Tier0/1) - who will work on this from the institutes?
 Hardware evolution
 Spending profile
 Organisation (sharing of responsibilities in collaboration/CERN/centres)
 Description of Mock Data Challenges
 Draft of proposal (hardware and software) for prototype construction
 ? By end 2000
 If a shared Tier-0 prototype, then a single proposal for the 4 experiments??
Some NEWS from LHCb RC activities (and other..)
 LHCb/Italy currently preparing a case to be submitted to INFN in June (compatible with the planning shown in this talk)
 Liverpool
 Increased COMPASS nodes to 6 (3 TBytes of disk)
 Bidding for a 1000-PC system with 800 MHz/processor and 70 GByte/processor
 Globus should be fully installed soon
 Collaborating with Cambridge Astronomy to test Globus package
 Other experiments and the GRID
 CDF and BaBar planning to set up GRID prototypes soon…
 GRID workshop in Sep (date and details to be confirmed)
 Any other news?