Planning LHCb computing infrastructure at CERN and at regional centres
F. Harris
22 May 2000

Slide 2: Talk Outline
- Reminder of the LHCb distributed computing model
- Requirements and planning for 2000-2005 (growth of regional centres)
- EU GRID proposal status and LHCb planning
- Large prototype proposal and possible LHCb uses
- Some news from LHCb (and other) activities

Slide 3: General Comments
- A draft LHCb Technical Note on the computing model exists.
- New requirements estimates have been made (big changes in the MC requirements).
- Several presentations have been made to the LHC computing review in March and May (and at the May 8 LHCb meeting):
  http://lhcb.cern.ch/computing/Steering/Reviews/LHCComputing2000/default.htm

Slide 4: Baseline Computing Model - Roles
- To provide an equitable sharing of the total computing load, one can envisage a scheme such as the following.
- After 2005, the role of CERN (notionally 1/3) is to:
  - be the production centre for real data
  - support physics analysis of real and simulated data by CERN-based physicists
- The role of the regional centres (notionally 2/3) is to:
  - be production centres for simulation
  - support physics analysis of real and simulated data by local physicists
- Institutes with sufficient CPU capacity share the simulation load, with their data archived at the nearest regional centre.

Slide 5: Baseline Computing Model - Data Flow (diagram; only the labels are reproduced here)
- Production Centre: CPU for production; mass storage for RAW, ESD, AOD and TAG; generates raw data; reconstruction, production analysis and user analysis.
- Regional Centres: CPU for analysis; mass storage for AOD and TAG; user analysis. AOD and TAG shipped from the production centre: real 80 TB/yr, simulated 120 TB/yr.
- Institutes: CPU and data servers; selected user analyses. AOD and TAG shipped from the regional centres: 8-12 TB/yr.

Slide 6: Physics - Plans for Simulation 2000-2005
- In 2000 and 2001 we will produce 3 x 10^6 simulated events each year for detector optimisation studies in preparation of the detector TDRs (expected in 2001 and early 2002).
- In 2002 and 2003 studies will be made of the high-level trigger algorithms, for which we are required to produce 6 x 10^6 simulated events each year.
- In 2004 and 2005 we will start to produce very large samples of simulated events, in particular background, for which samples of 10^7 events are required.
- This ongoing physics production work will be used, as far as is practicable, for testing the development of the computing infrastructure.
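The growth table later in the talk (slide 9) pairs these sample sizes with the CPU capacity foreseen for signal simulation, from 10000 SI95 for 10^6 events per year in 2000 up to 100000 SI95 for 10^7 events per year in 2005, i.e. a purely linear scaling. The sketch below makes that check explicit and, under the added assumption (mine, not the talk's) that each quoted capacity runs flat out for a full year, shows the per-event cost this implies.

```python
# Consistency check of the slide 9 signal-simulation figures.  The one-year
# duty cycle is an assumption made here for illustration, not a statement
# from the talk.
SECONDS_PER_YEAR = 3.15e7

# (year, signal events per year, CPU for signal simulation in SI95), slide 9
signal_plan = [
    (2000, 1.0e6,  10_000),
    (2002, 2.0e6,  20_000),
    (2003, 3.0e6,  30_000),
    (2004, 5.0e6,  50_000),
    (2005, 1.0e7, 100_000),
]

for year, n_events, capacity_si95 in signal_plan:
    per_event = capacity_si95 * SECONDS_PER_YEAR / n_events   # SI95*s / event
    print(f"{year}: {per_event:.2e} SI95*s per signal event")

# Every year gives the same ~3e5 SI95*s per event: the table simply scales
# the 2000 figure (10000 SI95 per 10^6 events/year) linearly with sample size.
```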
Slide 7: Computing - MDC Tests of Infrastructure
- 2002: MDC 1 - application tests of grid middleware and farm management software, using a real simulation and analysis of 10^7 B channel decay events. Several regional facilities will participate: CERN, RAL, Lyon/CCIN2P3, Liverpool, INFN, ...
- 2003: MDC 2 - participate in the exploitation of the large-scale Tier 0 prototype to be set up at CERN:
  - high-level triggering: online environment, performance
  - management of systems and applications
  - reconstruction: design and performance optimisation
  - analysis: study of chaotic data access patterns
  - STRESS TESTS of data models, algorithms and technology
- 2004: MDC 3 - start to install the event filter farm at the experiment, to be ready for the commissioning of the detectors in 2004 and 2005.

Slide 8: Cost of CPU, Disk and Tape
- (Plot) Moore's-law evolution with time of the cost of CPU and storage; the scale in MSFr is for a facility sized to the ATLAS requirements (more than 3 x LHCb).
- At today's prices the total cost for LHCb (CERN and regional centres) would be ~60 MSFr.
- In 2004 the cost would be ~10-20 MSFr.
- After 2005 the maintenance cost is ~5 MSFr/year.

Slide 9: Growth in Requirements to Meet Simulation Needs

Requirements by year:

                                          2000     2001     2002     2003     2004     2005     2006     2007     2008     2009     2010
No of signal events                    1.0E+06  1.0E+06  2.0E+06  3.0E+06  5.0E+06  1.0E+07  1.0E+07  1.0E+07  1.0E+07  1.0E+07  1.0E+07
No of background events                1.0E+06  1.5E+06  2.0E+06  4.0E+06  1.0E+07  1.0E+09  1.0E+09  1.0E+09  1.0E+09  1.0E+09  1.0E+09
CPU for simulation of signal (SI95)      10000    10000    20000    30000    50000   100000   100000   100000   100000   100000   100000
CPU for background simulation (SI95)     16000    24000    32000    64000   160000   400000   400000   400000   400000   400000   400000
CPU for user analysis (SI95)              2500     2500     5000     7500    12500    25000    25000    25000    25000    25000    25000
RAWmc data on disk (TB)                    0.4      0.5      0.8      1.4        3      202      202      202      202      202      202
RAWmc data on tape (TB)                    0.4      0.5      0.8      1.4        3      202      202      202      202      202      202
ESDmc data on disk (TB)                    0.2     0.25      0.4      0.7      1.5      101      101      101      101      101      101
AODmc data on disk (TB)                   0.06      0.1      0.1      0.3      0.5     30.5     39.4     42.1     42.9     43.2     43.3
TAGmc data on disk (TB)                  0.002   0.0025    0.004    0.007    0.015     1.01     1.01     1.01     1.01     1.01     1.01

Unit costs by year (SFr):

                      2000   2001   2002   2003   2004   2005   2006   2007   2008   2009   2010
CPU cost / SI95       64.5   46.1   32.9   23.5   16.8   12.0    8.6    6.1    4.4    3.1    2.2
Disk cost / GB        16.1   11.5    8.2    5.9    4.2    3.0    2.1    1.5    1.1    0.8    0.6
Tape cost / GB         2.7    1.9    1.4    1.0    0.7    0.5   0.36   0.26   0.18   0.13   0.09

Investment per year (kSFr):

                             2000   2001   2002   2003   2004   2005   2006   2007   2008   2009   2010
CPU for signal                323    230    329    235    336    600    171    122     87     62     45
CPU for background              0    369    263    753   1613   2880    857    612    437    312    223
CPU for user analysis          65     92     66     59     84    150     69     49     35     25     18
RAWmc data on disk              3      6      2      4      7    597    129     92     66     47     33
RAWmc data on tape            0.2    0.2    0.2    0.3    0.4   20.2   14.4   10.3    7.4    5.3    3.8
ESDmc data on disk              3      3      1      2      3    299     64     46     33     23     17
AODmc data on disk              1      1      0      1      1     90     21     15     11      8      6
TAGmc data on disk            0.0    0.0    0.0    0.0    0.1    3.0    2.2    1.5    1.1    0.8    0.6
Investment per year (total)   395    701    663   1053   2045   4639   1328    949    678    484    346
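The unit-cost rows above fall by a nearly constant factor of about 0.715 per year (64.5, 46.1, 32.9, ... SFr per SI95, and similarly for disk and tape), i.e. roughly a 30% price drop per year. Below is a minimal sketch of that extrapolation; the 2000 prices come from the table, but the single reduction factor is read off the table by eye rather than quoted anywhere in the talk.

```python
# Extrapolate the slide 9 unit costs with a constant annual price-reduction
# factor (~0.715, fitted to the table; roughly "Moore's law" for cost).
FACTOR = 0.715           # assumed annual cost reduction, read off the table
PRICES_2000 = {          # unit costs in SFr, 2000 column of slide 9
    "CPU (per SI95)": 64.5,
    "disk (per GB)": 16.1,
    "tape (per GB)": 2.7,
}

def unit_cost(item: str, year: int) -> float:
    """Estimated unit cost in SFr for the given year (2000-2010)."""
    return PRICES_2000[item] * FACTOR ** (year - 2000)

for year in (2000, 2003, 2005, 2010):
    cpu = unit_cost("CPU (per SI95)", year)
    print(f"{year}: CPU ~{cpu:.1f} SFr/SI95, "
          f"100k SI95 would cost ~{100_000 * cpu / 1e6:.1f} MSFr")
```

With this factor the later columns of the unit-cost table are reproduced to within rounding, and the investment rows for 2002-2005 are consistent with buying each year's capacity increment at that year's unit price.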
Slide 10: Cost per Regional Centre for Simulation
- Assume there are 5 regional centres (UK, IN2P3, INFN, CERN, plus a consortium of NIKHEF, Russia, etc.).
- Assume the costs are shared equally.

Per-centre requirements by year:

                                             2000     2001     2002     2003     2004     2005     2006     2007     2008     2009     2010
No of signal events                       2.0E+05  2.0E+05  4.0E+05  6.0E+05  1.0E+06  2.0E+06  2.0E+06  2.0E+06  2.0E+06  2.0E+06  2.0E+06
No of background events                   2.0E+05  3.0E+05  4.0E+05  8.0E+05  2.0E+06  2.0E+08  2.0E+08  2.0E+08  2.0E+08  2.0E+08  2.0E+08
CPU for simulation of signal (SI95)          2000     2000     4000     6000    10000    20000    20000    20000    20000    20000    20000
CPU for simulation of background (SI95)      3200     4800     6400    12800    32000    80000    80000    80000    80000    80000    80000
CPU for user analysis (SI95)                  500      500     1000     1500     2500     5000     5000     5000     5000     5000     5000
RAWmc data on disk (TB)                      0.08      0.1     0.16     0.28      0.6     40.4     40.4     40.4     40.4     40.4     40.4
RAWmc data on tape (TB)                      0.08      0.1     0.16     0.28      0.6     40.4     40.4     40.4     40.4     40.4     40.4
ESDmc data on disk (TB)                      0.04     0.05     0.08     0.14      0.3     20.2     20.2     20.2     20.2     20.2     20.2
AODmc data on disk (TB)                     0.012   0.0186  0.02958  0.05087  0.10526  6.09158  7.88747  8.42624  8.58787  8.63636  8.65091
TAGmc data on disk (TB)                    0.0004   0.0005   0.0008   0.0014    0.003    0.202    0.202    0.202    0.202    0.202    0.202

Per-centre investment per year (kSFr):

                             2000   2001   2002   2003   2004   2005   2006   2007   2008   2009   2010
CPU for signal               64.5   46.1   65.9   47.0   67.2  120.0   34.3   24.5   17.5   12.5    8.9
CPU for background            0.0   73.8   52.7  150.5  322.6  576.0  171.4  122.4   87.5   62.5   44.6
CPU for user analysis        12.9   18.4   13.2   11.8   16.8   30.0   13.7    9.8    7.0    5.0    3.6
RAWmc data on disk            0.6    1.2    0.5    0.7    1.3  119.4   25.7   18.4   13.1    9.4    6.7
RAWmc data on tape            0.0    0.0    0.0    0.1    0.1    4.0    2.9    2.1    1.5    1.1    0.8
ESDmc data on disk            0.6    0.6    0.2    0.4    0.7   59.7   12.9    9.2    6.6    4.7    3.3
AODmc data on disk            0.2    0.2    0.1    0.1    0.2   18.0    4.3    3.1    2.2    1.6    1.1
TAGmc data on disk            0.0    0.0    0.0    0.0    0.0    0.6    0.4    0.3    0.2    0.2    0.1
Investment per year (total)    79    140    133    211    409    928    266    190    136     97     69

Slide 11: EU GRID Proposal Status (http://grid.web.cern.ch/grid/)
- GRIDs: software to manage all aspects of distributed computing (security and authorisation, resource management, monitoring). Interface to high energy physics ...
- The proposal was submitted on May 9.
- Main signatories: CERN, France, Italy, UK, Netherlands, ESA; associate signatories: Spain, Czech Republic, Hungary, Portugal, Scandinavia, ...
- The project is composed of Work Packages, to which the countries provide effort.
- LHCb involvement depends on the country; it essentially comes via the 'Testbeds' and 'HEP applications' work packages.

Slide 12: EU Grid Work Packages (with work package leaders)
Middleware:
- Grid Work Scheduling: C. Vistoli (INFN)
- Grid Data Management: B. Segal (CERN IT)
- Grid Application Monitoring: R. Middleton (RAL)
- Fabric Management: T. Smith (CERN IT)
- Mass Storage Management: O. Barring (CERN IT)
Infrastructure:
- Testbed and Demonstrators (LHCb in): F. Etienne (Marseille)
- Network Services: C. Michau (CNRS)
Applications:
- HEP (LHCb in): H. Hoffmann (CERN)
- Earth Observation: L. Fusco (ESA)
- Biology: C. Michau (CNRS)
Management:
- Project Management: F. Gagliardi (CERN IT)

Slide 13: Grid LHCb WP - Grid Testbed (DRAFT)
- The MAP farm at Liverpool has 300 processors; it would take 4 months to generate the full sample of events.
- All the data generated (~3 TB) would be transferred to RAL, the UK regional facility, for archiving.
- All AOD and TAG datasets would be dispatched from RAL to the other regional centres, such as Lyon, CERN and INFN.
- Physicists run jobs at a regional centre, or ship AOD and TAG data to their local institute and run jobs there.
- The ESD for a fraction (~10%) of the events (~100 GB) is also copied, for systematic studies.
- The resulting data volumes to be shipped between facilities over the 4 months would be:
  - Liverpool to RAL: 3 TB (RAW, ESD, AOD and TAG)
  - RAL to Lyon/CERN/...: 0.3 TB (AOD and TAG)
  - Lyon to LHCb institute: 0.3 TB (AOD and TAG)
  - RAL to LHCb institute: 100 GB (ESD for systematic studies)
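To put these shipments in perspective, the short sketch below converts each volume into the sustained network rate it implies if spread evenly over the 4-month production period; the volumes and the 4-month window come from the slide, the rest is plain arithmetic.

```python
# Sustained network rate implied by the Grid testbed shipments on slide 13,
# spread evenly over the 4-month MAP production period.
SECONDS = 4 * 30 * 24 * 3600          # ~4 months, approximated as 120 days

transfers_tb = {                      # volumes in TB, from slide 13
    "Liverpool -> RAL (RAW+ESD+AOD+TAG)": 3.0,
    "RAL -> Lyon/CERN/... (AOD+TAG)":     0.3,
    "Lyon -> LHCb institute (AOD+TAG)":   0.3,
    "RAL -> LHCb institute (ESD, ~10%)":  0.1,
}

for route, tb in transfers_tb.items():
    mbit_s = tb * 1e12 * 8 / SECONDS / 1e6    # TB -> Mbit/s, decimal units
    print(f"{route}: ~{mbit_s:.2f} Mbit/s sustained")

# The Liverpool -> RAL shipment works out to roughly 2-3 Mbit/s sustained;
# the other routes are an order of magnitude smaller.
```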
Slide 14: Milestones for the 3-Year EU GRID Project Starting January 2001
- Mx1 (June 2001): Coordination with the other WPs. Identification of the use cases and the minimal grid services required at every step of the project. Planning of the exploitation of the GRID steps.
- Mx2 (Dec 2001): Development of the use-case programs. Interfacing with existing GRID services as planned in Mx1.
- Mx3 (June 2002): Run #0 executed (distributed Monte Carlo production and reconstruction) and feedback provided to the other WPs.
- Mx4 (Dec 2002): Run #1 executed (distributed analysis) and corresponding feedback to the other WPs. WP workshop.
- Mx5 (June 2003): Run #2 executed, including additional GRID functionality.
- Mx6 (Dec 2003): Run #3 extended to a larger user community.

Slide 15: 'Agreed' LHCb Resources Going into the EU GRID Project over 3 Years

  Country       FTE equivalent per year
  CERN          1
  France        1
  Italy         1
  UK            1
  Netherlands   0.5

- These people should work together - an LHCb GRID club!
- This effort is for the HEP applications WP: interfacing our physics software to the GRID and running it in testbed environments.
- Some effort may also go into the testbed WP (it is not yet known whether the LHCb countries have signed up for this).

Slide 16: Grid Computing - LHCb Planning
- Now: form a GRID technical working group with representatives from the regional facilities: Liverpool (1), RAL (2), CERN (1), IN2P3 (?), INFN (?), ...
- June 2000: define the simulation samples needed in the coming years.
- July 2000: install the Globus software at the LHCb regional centres and start to study its integration with the LHCb production tools (see the sketch after this list).
- End 2000: define the grid services needed for farm production.
- June 2001: implementation of basic grid services for farm production, provided by the EU Grid project.
- Dec 2001: MDC 1 - small production for a test of the software implementation (GEANT4).
- June 2002: MDC 2 - large production of signal/background samples for tests of the world-wide analysis model.
- June 2003: MDC 3 - stress/scalability tests on the large-scale Tier 0 facility; tests of the event filter farm, farm control/management, and data throughput.
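As a concrete illustration of the July 2000 item above (installing Globus and studying its integration with the LHCb production tools), here is a minimal sketch that wraps a Globus GRAM submission in a small Python helper. It is only a sketch: the gatekeeper contact string, the executable path and the argument names are placeholders invented for illustration, and it assumes the standard globusrun client with an RSL job description is available on the submitting machine.

```python
# Toy wrapper around the Globus 'globusrun' client to submit one simulation
# job to a remote gatekeeper.  Contact string, paths and argument names are
# placeholders, not real LHCb configuration.
import subprocess

def submit_sim_job(gatekeeper: str, executable: str, n_events: int) -> int:
    """Submit one job via Globus GRAM and return the client's exit status."""
    rsl = (f'&(executable={executable})'
           f'(arguments="-nevents" "{n_events}")'
           f'(stdout=sim_{n_events}.log)')
    # -o streams the job output back, -r selects the remote resource
    return subprocess.call(["globusrun", "-o", "-r", gatekeeper, rsl])

if __name__ == "__main__":
    status = submit_sim_job("gatekeeper.example.org/jobmanager-pbs",
                            "/opt/lhcb/bin/simulate", 500)
    print("globusrun exit status:", status)
```

A production-tool integration would layer job splitting, bookkeeping and output registration on top of such a call, which is the kind of work the July 2000 study would need to scope out.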
Slide 17: Prototype Computing Infrastructure
- Aim to build a prototype production facility at CERN in 2003 (a proposal coming out of the LHC computing review).
- The scale of the prototype is limited by what is affordable: ~0.5 of the number of components of the ATLAS system, at a cost of ~20 MSFr.
- It would be a joint project between the four experiments, with access to the facility for tests being shared.
- We need to develop a distributed network of resources involving the other regional centres, and to deploy the data production software over this infrastructure for tests in 2003.
- The results of this prototype deployment would be used as the basis for the Computing MoU.

Slide 18: Tests Using the Tier 0 Prototype in 2003
- We intend to use the Tier 0 prototype planned for construction in 2003 to make stress tests of both hardware and software.
- We will prepare realistic examples of two types of application:
  - tests designed to gain experience with the online farm environment
  - production tests of simulation, reconstruction and analysis

Slide 19: Event Filter Farm Architecture (diagram; only the labels are reproduced here)
- ~100 readout units (RU) feed a switch that functions as the readout network (readout network technology: GbE?).
- ~100 sub-farm controllers (SFC) sit behind the switch, each serving ~10 work CPUs over a sub-farm network (Ethernet).
- Storage controller(s) handle storage/CDR.
- A controls network (Ethernet) connects control PCs (CPC) to the controls system.
- Legend: SFC = sub-farm controller, CPU = work CPU, CPC = control PC.

Slide 20: Testing/Verification (diagram; only the labels are reproduced here)
- The same farm architecture, annotated with the test strategy: small-scale lab tests plus simulation, full-scale lab tests, and large/full-scale tests using the farm prototype.

Slide 21: Scalability Tests for Simulation and Reconstruction
- Test the writing of reconstructed + raw data at 200 Hz in the online farm environment.
- Test the writing of reconstructed + simulated data in the offline Monte Carlo farm environment.
- Population of the event database from multiple input processes.
- Test the efficiency of the event and detector data models.
- Access to conditions data from multiple reconstruction jobs.
- Online calibration strategies and the distribution of their results to multiple reconstruction jobs.
- Stress testing of the reconstruction to identify hot spots, weak code, etc.

Slide 22: Scalability Tests for Analysis
- Stress test of the event database: multiple concurrent accesses by "chaotic" analysis jobs (a toy sketch of such a test follows slide 23).
- Optimisation of the data model:
  - study the data access patterns of multiple, independent, concurrent analysis jobs
  - modify the event and conditions data models as necessary
  - determine data clustering strategies

Slide 23: Work Required Now for Planning the 2003/4 Prototypes (request from the Resource panel of the LHC review)
- Plan for the evolution to the prototypes (Tier 0/1) - who will work on this from the institutes?
  - hardware evolution
  - spending profile
  - organisation (sharing of responsibilities between the collaboration, CERN and the centres)
  - description of the Mock Data Challenges
- Draft proposal (hardware and software) for the prototype construction - by end 2000?
- If the Tier 0 prototype is shared, then a single proposal for the 4 experiments??
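The analysis scalability tests on slide 22 revolve around many independent jobs hitting the event store at once with unpredictable ("chaotic") access patterns. The toy harness below, which is not any actual LHCb tool, emulates this by spawning several worker processes that each read a random subset of fixed-size records from a shared file; the event count, record size and file layout are invented for illustration.

```python
# Toy "chaotic access" harness: N independent workers read random events
# from a shared store.  All sizes and counts here are illustrative only.
import os
import random
import time
from multiprocessing import Pool

EVENT_SIZE = 10_000      # bytes per fake event record
N_EVENTS = 5_000         # events in the fake store
STORE = "fake_event_store.bin"

def build_store() -> None:
    """Write a flat file of fixed-size fake event records."""
    with open(STORE, "wb") as f:
        f.write(os.urandom(EVENT_SIZE * N_EVENTS))

def analysis_job(seed: int) -> float:
    """Read 500 randomly chosen events; return the wall-clock time used."""
    rng = random.Random(seed)
    start = time.time()
    with open(STORE, "rb") as f:
        for _ in range(500):
            i = rng.randrange(N_EVENTS)
            f.seek(i * EVENT_SIZE)
            f.read(EVENT_SIZE)
    return time.time() - start

if __name__ == "__main__":
    build_store()
    with Pool(8) as pool:                      # 8 concurrent "chaotic" jobs
        times = pool.map(analysis_job, range(8))
    print("per-job wall time (s):", [round(t, 3) for t in times])
    os.remove(STORE)
```

Comparing the per-job wall times as the number of concurrent workers grows gives a first feel for how data clustering and concurrency interact; the real tests on the Tier 0 prototype would make the same kind of measurement against the actual event database at a far larger scale.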
Slide 24: Some News from LHCb RC Activities (and Other ...)
- LHCb/Italy is currently preparing a case to be submitted to INFN in June (compatible with the planning shown in this talk).
- In Liverpool:
  - the COMPASS system has been increased to 6 nodes (3 TB of disk)
  - bidding for a 1000-PC system with 800 MHz processors and 70 GB of disk per processor
  - Globus should be fully installed soon
  - collaborating with Cambridge Astronomy to test the Globus package
- Other experiments and the GRID: CDF and BaBar are planning to set up GRID prototypes soon ...
- A GRID workshop is planned for September (date and details to be confirmed).
- Any other news?