ATLAS Data Challenges: The Physics Point of View
UCL, September 5th 2001
Fabiola Gianotti (CERN)

Three data challenges are foreseen:
-- DC0 : end 2001
-- DC1 : first half 2002
-- DC2 : first half 2003 --> Computing TDR

Goals: validate our computing model and our software.
Important physics content: provide data samples for physics studies and, hopefully, many physics results.

How?
-- Start with data that look like real data: need MC generators, G3/G4 simulation, event model, detailed detector response (e.g. noise, cross-talk, etc.), pile-up
-- Run the filtering/trigger and reconstruction chain
-- Store the output data in the database
-- Run the analysis
-- Produce physics results

DC0: November - December 2001
In principle this should be a test of the WHOLE software chain: a kind of "rehearsal" for DC1 (check that everything works for DC1). The issue is therefore not massive production of huge data samples, but a few 100k events able to test the whole software chain.
Chosen physics sample: a few 100k Z+jet events, with Z decays.
-- allows tests of ALL sub-detectors (including b-tagging, since 6% of the jets are b-jets)
-- the idea is to produce several samples with the 3 general-purpose generators (PYTHIA, ISAJET, HERWIG)
If you want to participate in DC1, you are (strongly) encouraged to participate in DC0 as well.

DC1: February - July 2002
Scope: stress-test the system with large-scale production, reconstruction and analysis.
Several samples of up to 10^7 events, ~10% of the data collected at the LHC in one year.
Crucial issues:
-- simulation will be done mainly with G3, but it is important to perform smaller-scale productions with G4
-- comparison G3/G4 (with the same geometry, to be meaningful ...)
-- learn about the event model and the detector description
-- I/O performance: N events with different technologies
-- pile-up treatment
-- understand bottlenecks
-- understand the distributed computing model / GRID (not discussed here)

DC1: Physics samples
10^7 jets for e/jet separation studies in view of the Trigger/DAQ TDR (due end of 2002).
~10 times more statistics than the "old" jet production. Study the performance of ATHENA and of the HLT algorithms. Also useful for other physics studies (e.g. optimisation of the jet energy reconstruction algorithm).
Any other CPU-consuming physics sample considered useful for physics studies, mainly SM "background processes". Examples:
-- inclusive muon sample (for B-physics and muon performance studies), Zbb and Wbb samples (backgrounds to many searches), WW/ZZ samples
-- Z samples for tau-lifetime studies
-- several samples with different generators, to understand the physics of the various MCs
Physics groups and Combined Performance groups are asked to prepare a list of wishes: first discussions at Physics Coordination in Lund and at the October ATLAS week. Everybody is encouraged to make suggestions.

DC2: January - September 2003
Scope/precise goals: depend on the outcome of DC0/DC1.
Present goals:
-- 10^8 events (~ the data collected in 1 LHC year)
-- Geant4 should play a major role
-- full test of the calibration/alignment procedures and of the conditions database
-- question: do we want to add part or all of DAQ, LVL1, LVL2, Event Filter?
Physics content:
-- demonstrate the capability of extracting and interpreting a signal from New Physics
-- generate various SM samples and "hide" in each one a different New Physics process (e.g. SUSY for one mSUGRA point, excited leptons, etc.)
-- people will be asked to understand the nature and all possible features of the signal (without knowing a priori what it is)

DC production: CPU and data size

         Number of events   Time (SI95-hours)   Total size
  DC0    ~10^5              ~10^5               ~0.2 TB
  DC1    ~10^7              ~10^7               ~20 TB
  DC2    ~10^8              ~10^8               ~200 TB

"Physics readiness document" (a kind of Physics TDR prime ...): LHC t0 - 1 year.
Content (examples):
-- work done with MC generators, the ATLAS MC library, status/strategy for MC production
-- strategy for using different levels of simulation (full, parametrisations, fast) for different processes
-- comparisons G4/test-beam data and FLUKA/test-beam data; systematics from full simulation
-- main figures of the Physics TDR redone with the new/final software
-- specialised packages needed for various physics studies (e.g. MSSM scan packages for Higgs and SUSY with up-to-date theoretical calculations, etc.)
-- etc.

Status of the non-core software (my view, with emphasis on the "physics part")
Main generators (PYTHIA, ISAJET, HERWIG) interfaced to HepMC (HERWIG being finalised ...). Next: specialised generators (e.g. VECBOS, QQ).
Simulation:
-- G4: physics validation not completed (a lot of work done with EM physics, hadronic physics being tested now); the full ATLAS geometry is not yet in
-- DC0, DC1: use G3, plus smaller/restricted (e.g. to some detector parts) productions with G4
-- FLUKA: I am 100% sure we need it. I initiated a pilot project with Tilecal: G4 test-beam geometry input to FLUKA (first results in Lund). Then extend to other sub-detectors.
Intermediate simulation (e.g. shower/track parametrisation): I am 100% sure we need it. We have tried to find people over the last two years, without success. Recently a couple of groups have shown some interest.
ATLFAST OO (UK product):
-- runs in ATHENA
-- reads HepMC from Objectivity, writes output into Objectivity (and ntuples)
-- first validation made; further results in Lund (from "users, not developers")
-- next steps: improve functionality (beyond the Fortran ATLFAST), e.g. shower shapes?
Trigger simulation? Parametrisation for B-physics?
C++/OO reconstruction:
-- runs in ATHENA
-- reads G3 hits/digits (Physics TDR data)
-- validation results in Lund
Less clear situation (to me ...) for, e.g.:
-- event data model
-- detector description
-- database, conditions database, technology choice
-- simulation framework vs ATHENA
-- analysis tools (maybe premature today, but one of the aims of the DCs should be the validation of analysis tools)

Where could you contribute?
A lot of work to be done everywhere, of course ... Examples:
-- improve the understanding of the ATLAS potential for physics (e.g. SUSY, extra dimensions, backgrounds) and detector performance (e.g. can we tag charm jets?) by analysing data produced by the DCs
-- improve reconstruction, algorithms, etc. (e.g. HLT, E-flow algorithm for jet reconstruction using ID+CALOs)
-- validation of MC generators: which MC for which process? For which processes do we need more calculations and/or additional/specialised MCs?
-- validation of the G4/FLUKA physics: comparisons with test-beam data (in particular nuclear interactions)
-- validation of ATLFAST OO and of the new reconstruction against the old Fortran ATLFAST and ATRECON
-- intermediate simulation (shower and track parametrisations)
-- detector response: hits --> digits (including noise, pile-up with the correct time structure, efficiency, etc.)
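The last bullet mentions overlaying pile-up with the correct time structure. A minimal sketch of the in-time part of that task, assuming LHC design parameters that are NOT quoted in the talk (design luminosity 10^34 cm^-2 s^-1, inelastic cross-section ~70 mb, 25 ns bunch spacing): the number of minimum-bias events overlaid on each signal event is Poisson-distributed with mean mu = L x sigma_inel x t_bx.

```python
import math
import random

# Assumed LHC design parameters (illustrative, not from the talk):
L = 1.0e34            # luminosity, cm^-2 s^-1
SIGMA_INEL = 70e-27   # inelastic pp cross-section ~70 mb, in cm^2
T_BX = 25e-9          # bunch spacing, s

# Mean number of pile-up events per bunch crossing.
mu = L * SIGMA_INEL * T_BX   # = 17.5 for these inputs

def n_pileup(rng):
    """Draw how many min-bias events to overlay on one signal event
    (Knuth's Poisson sampling; adequate for mu of this size)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(42)
sample = [n_pileup(rng) for _ in range(10000)]
print(f"mu = {mu:.1f}, sample mean = {sum(sample) / len(sample):.1f}")
```

A full pile-up treatment would also overlay out-of-time crossings according to the detector's sensitive time window, which is exactly the "correct time structure" the bullet refers to.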
HLT-DC1 scenario
This has to be discussed with the HLT community, but the basis could be similar to what has been done previously.
Generation:
-- p_T of the hard scattering > 17 GeV, |η| < 2.7
-- 2 samples:
   1) e-candidate: ΣE_T > 17 GeV in a 0.12 x 0.12 grid, no μ, no ν
   2) jet-candidate: ΣE_T > 40 GeV in a 1.0 x 1.0 grid
A first selection is made at the level of the event generation: one keeps 14.5% of the generated events (14.4% for (1) and 2% for (2)).
Simulation:
The remaining events are run through the full simulation, and the LVL1 trigger is applied at that level. One keeps 13.7% of the events (97% for (1) and 10% for (2)).
Pile-up is then run for the remaining events, which means ~2% of the 'generated' sample.
Reconstruction:
The events are then run through LVL2, the Event Filter and the offline reconstruction.

What next
Prepare a first list of goals & requirements with:
-- the HLT and Physics communities
-- the simulation, reconstruction and database communities
-- the people working on 'infrastructure' activities (bookkeeping)
to be discussed with the A-team and with the CSG (July 24th meeting), in order to prepare a list of tasks:
-- some physics-oriented
-- but also e.g. testing code, running production, ...
and to define the priorities.
Then:
-- start the validation of the various components in the chain (setting deadlines for readiness):
   -- software: simulation, pile-up, ...
   -- infrastructure: database, bookkeeping, ...
-- estimate what will be realistic (!) to do for DC0 and for DC1
-- "and turn the key"

The ATLAS Data Challenges: Project Structure
[Organisation chart] DC Overview Board (reviews of the ATLAS Data Challenges, reports, resource matters; links to the CSG, the NCB, the DataGrid project, other computing Grid projects and the TIERs) --> DC Execution Board (work-plan definition) --> DC Definition Committee (DC2), with several work packages (WP) and an RTAG.

Expressions of interest
So far, after the NCB meeting of July 10th: Canada, France, Italy, Japan, Nordic Grid, Russia, Taiwan, UK.
-- propositions to help in DC0 and to participate in DC1
-- contact with the HLT community
-- contact with EU-DataGrid
-- kit of ATLAS software
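The selection fractions quoted in the HLT-DC1 scenario above chain multiplicatively. A quick cross-check that the generator-level filter (14.5% kept) followed by the LVL1 trigger after full simulation (13.7% kept) reproduces the "~2% of the 'generated' sample" quoted for the pile-up stage:

```python
# Fractions as quoted in the HLT-DC1 slides.
gen_filter = 0.145   # kept by the generator-level selection
lvl1 = 0.137         # kept by the LVL1 trigger after full simulation

# Fraction of the generated sample that reaches the pile-up stage.
after_lvl1 = gen_filter * lvl1
print(f"fraction of generated sample after LVL1: {after_lvl1:.1%}")  # prints 2.0%
```

So the ~2% figure is consistent with the two upstream efficiencies; the per-sample numbers (14.4%/2% at generation, 97%/10% at LVL1) would be combined with the same product for each of the two samples separately.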