ATLAS, the Grid and the UK
Roger Jones, Lancaster University
Edinburgh, 29 January 2009

Event Data Model

- RAW: "ByteStream" format, ~1.6 MB/event.
- ESD (Event Summary Data): full output of reconstruction in object (POOL/ROOT) format: tracks (and their hits), calorimeter clusters, calorimeter cells, combined reconstruction objects, etc.
  - Nominal size 1 MB/event initially, to decrease as the understanding of the detector improves.
  - A compromise between "being able to do everything on the ESD" and "not using storage for oversized events"; large variations depending on physics channel.
- AOD (Analysis Object Data): summary of event reconstruction with "physics" (POOL/ROOT) objects: electrons, muons, jets, etc.
  - Nominal size 100 kB/event (currently roughly double that).
- DPD (Derived Physics Data): skimmed/slimmed/thinned events plus other useful "user" data derived from AODs and conditions data; DPerfD is mainly skimmed ESD.
  - Nominally 10 kB/event on average.
- TAG: database (or ROOT files) used to quickly select events in AOD and/or ESD files.

Computing Model: main operations

- Tier-0:
  - Copy RAW data to CERN Castor for archival, and to Tier-1s for storage and reprocessing.
  - Run first-pass calibration/alignment.
  - Run first-pass reconstruction (within 48 hours).
  - Distribute reconstruction output (ESDs, AODs, DPDs and TAGs) to Tier-1s.
- Tier-1s (x10):
  - Store and take care of a fraction of the RAW data (forever).
  - Run "slow" calibration/alignment procedures.
  - Rerun reconstruction with better calibration/alignment and/or algorithms.
  - Distribute reconstruction output to Tier-2s.
  - Keep current versions of ESDs and AODs on disk for analysis.
  - Run large-scale event selection and analysis jobs for physics and detector groups.
  - Some user access looks likely to be granted, but it will be limited, with NO ACCESS TO TAPE or long-term storage.
- Tier-2s (x~35):
  - Run simulation (and calibration/alignment when/where appropriate).
  - Run analysis jobs (mainly on AOD and DPD).
  - Keep current versions of AODs and samples of other data types on disk for analysis.
- Tier-3s:
  - Provide access to Grid resources and local storage for end-user data.
  - Contribute CPU cycles for simulation and analysis if/when possible.

Necessity of Distributed Computing

- We are going to collect raw data at 320 MB/s for 50k seconds/day and ~100 days/year, giving a RAW dataset of 1.6 PB/year.
- Processing (and reprocessing) these events will require ~10k CPUs full time in the first year of data-taking, and a lot more in the future as data accumulate.
- Reconstructed events will also be large, as people want to study detector performance as well as do physics analysis using the output data: an ESD dataset of 1.0 PB/year, and AOD and DPD datasets of hundreds of TB/year.
- At least another 10k CPUs are needed for continuous simulation production of at least 20-30% of the real data rate, and for analysis.
- There is no way to concentrate all the needed computing power and storage capacity in one place: the LEP/Tevatron model will not scale to this level.
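The yearly volumes quoted above follow directly from the nominal rates and event sizes. The short Python sketch below is only a back-of-the-envelope check of those numbers; the constants are the nominal figures from these slides, not parameters of any ATLAS tool.

    # Back-of-the-envelope check of the nominal data volumes above. The rates,
    # live time and event sizes are the nominal figures from these slides; this
    # is not an ATLAS tool, just arithmetic.

    RATE_HZ = 200                 # trigger output rate
    LIVE_SECONDS_PER_DAY = 50000  # data-taking seconds per day
    DAYS_PER_YEAR = 100           # data-taking days per year

    EVENT_SIZE_MB = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.1, "DPD": 0.01}

    events_per_year = RATE_HZ * LIVE_SECONDS_PER_DAY * DAYS_PER_YEAR  # ~1e9 events

    for fmt, size_mb in EVENT_SIZE_MB.items():
        tb_per_year = events_per_year * size_mb / 1e6                 # MB -> TB
        print(f"{fmt:>4}: {tb_per_year:8.1f} TB/year")
    # -> RAW 1600 TB (1.6 PB), ESD 1000 TB (1.0 PB), AOD 100 TB, DPD 10 TB

    # RAW rate into the Tier-0 buffer: 1.6 MB/event * 200 Hz = 320 MB/s
    print("RAW rate:", EVENT_SIZE_MB["RAW"] * RATE_HZ, "MB/s")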
Event data flow from online to offline

- Events are written in "ByteStream" format by the Event Filter farm in files of <= 2 GB.
- Event rates: collisions at 40 MHz, interesting events at 100-1000 Hz, and a 200 Hz trigger output rate (independent of luminosity, except for heavy ions).
- Events will be grouped by "luminosity block" (1-2 minute intervals; see the sketch after this slide):
  - One luminosity block can be approximated as having constant luminosity.
  - There should be enough information for each luminosity block to be able to calculate the luminosity.
- Nominal RAW event size is 1.6 MB/event.
- Several streams (an event can be in more than one stream):
  - ~5 physics event streams, separated by main trigger signature, e.g. muons, electromagnetic, hadronic jets, taus, minimum bias.
  - An express stream with monitoring and calibration (physics) events to be processed immediately.
  - Calibration data streams.
  - "Trouble maker" events (for debugging).
- Data will be transferred to the Tier-0 input buffer at 320 MB/s (average).
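As an illustration of the bookkeeping described above, here is a minimal, purely illustrative Python sketch of how events might be bucketed into luminosity blocks and routed into (possibly several) streams. The 120-second block length, the stream names and the trigger-to-stream mapping are assumptions made for this example only; they are not the actual ATLAS trigger menu or streaming configuration.

    # Illustrative only: toy luminosity-block and stream assignment.
    # Block length, stream names and the trigger->stream map are invented for
    # this sketch; real ATLAS streaming is configured in the trigger menu.

    LUMI_BLOCK_SECONDS = 120   # assumed 1-2 minute blocks; 120 s chosen here

    TRIGGER_TO_STREAM = {      # hypothetical trigger-signature -> stream map
        "mu20": "Muons",
        "e25i": "Egamma",
        "j160": "Jets",
        "tau35": "Taus",
        "mbts": "MinBias",
    }

    def lumi_block(event_time_s, run_start_s):
        """Index of the luminosity block an event falls into (constant lumi assumed per block)."""
        return int((event_time_s - run_start_s) // LUMI_BLOCK_SECONDS)

    def streams(fired_triggers):
        """An event enters every stream whose trigger fired, so streams can overlap."""
        return sorted({TRIGGER_TO_STREAM[t] for t in fired_triggers if t in TRIGGER_TO_STREAM})

    # An event 400 s into the run that fired both a muon and a jet trigger:
    print(lumi_block(400, 0))            # -> 3 (the fourth luminosity block)
    print(streams(["mu20", "j160"]))     # -> ['Jets', 'Muons'] (written to two streams)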
Tier-2 Data on Disk

The ~35 Tier-2 sites, of very, very different sizes, together contain:
- Some fraction of ESD and RAW:
  - In 2008/early 2009 data: 30% of RAW and 150% of ESD in the Tier-2 cloud.
  - In late 2009 and after: 10% of RAW and 30% of ESD in the Tier-2 cloud.
  - This will largely be "pre-placed" in early running; small samples can be recalled through group production at the Tier-1.
  - Additional access to ESD and RAW in the CAF (1/18 of RAW and 10% of ESD).
- 10 copies of the full AOD on disk.
- A full set of official group DPD (in the production area).
- Lots of small group DPD (in the production area).
- User data.
- Access is "on demand".

Tier-2 Nominal Disk Share 2009

[Pie chart: the nominal 2009 Tier-2 disk share, broken down into RAW, ESD/DPerfD, AOD, TAG, simulated RAW, simulated ESD (current), simulated AOD, simulated TAG, group DPD and user data.]

Tier-3s

- These have many forms; basically they represent resources not available for general ATLAS usage:
  - some fraction of Tier-1/Tier-2 resources;
  - local university clusters;
  - desktop/laptop machines.
- The Tier-3 task force provides recommended solutions (plural!):
  http://indico.cern.ch/getFile.py/access?contribId=30&sessionId=14&resId=0&materialId=slides&confId=22132
- There is concern over the apparent belief that Tier-3s can host or pull down large samples: consider the required storage and effort, and the network and server loads at Tier-2s.

Minimal Tier-3 requirements

- The ATLAS software environment, together with the ATLAS and Grid middleware tools, allows us to build a working model for collaborators located at sites with low network bandwidth to Europe or North America.
- The minimal requirement is on local installations, which should be configured with Tier-3 functionality:
  - A Computing Element known to the Grid, in order to benefit from the automatic distribution of ATLAS software releases.
  - An SRM-based Storage Element, in order to be able to transfer data automatically from the Grid to local storage, and vice versa.
- The local cluster should have installed:
  - A Grid User Interface suite, to allow job submission to the Grid.
  - The ATLAS DDM client tools, to permit access to the DDM data catalogues and the data transfer utilities.
  - The Ganga/pAthena client, to allow the submission of analysis jobs to all ATLAS computing resources.

A Few Statements of Policy

- The ATLAS data volumes WILL be very large: even after hard selections, you will have large sets to work with.
- We have a distributed computing model:
  - Every ATLAS physicist should have managed access to the data and to some of the CPU at any ATLAS Tier-2 worldwide.
  - The same does not apply to Tier-1s: the Tier-1s have a production role only, although some Tier-1s have attached Tier-2s.
- But not too distributed: we attempt to keep all analysis sets within a cloud.
  - The UK is a cloud, roughly 10% of the ATLAS total.
  - The cloud has some degree of autonomy: we set the policy for the placement of sets within the cloud.
  - We have also set aside some Tier-2 disk storage and some Tier-2 CPU for UK-only use, beyond the disk and CPU pledged to the whole of ATLAS.

User Data Movement Policy

- Jobs go to the data, not data to the job.
- Users need to access the files they produce; this means they need the (ATLAS) data tools on Tier-3s.
- There is a risk that some users may attempt to move large data volumes, leading to SE overload, network congestion, or DDM meltdown.
- ATLAS policy in outline (sketched below):
  - O(10 GB/day/user): who cares?
  - O(50 GB/day/user): rate throttled.
  - O(10 TB/day/user): user throttled!
  - Planned large movements are possible if negotiated.
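The outline policy above is a statement of intent, not a published algorithm. The small Python sketch below simply encodes those order-of-magnitude thresholds to make the tiers of the policy explicit; the function name and the exact cut-off values are assumptions made for illustration only.

    # Illustrative encoding of the outline user data-movement policy. The
    # order-of-magnitude figures come from the slide; treating them as hard
    # cut-offs (e.g. 50 GB between "fine" and "throttled") is an assumption
    # made only for this sketch.

    def movement_policy(volume_gb_per_day, negotiated=False):
        """Classify a user's daily data-movement volume against the outline policy."""
        if negotiated:
            return "planned large movement: fine if agreed in advance"
        if volume_gb_per_day < 50:         # O(10 GB/day)
            return "who cares?"
        if volume_gb_per_day < 10000:      # O(50 GB/day) up to O(10 TB/day)
            return "rate throttled"
        return "user throttled!"           # O(10 TB/day) and beyond

    print(movement_policy(5))       # -> who cares?
    print(movement_policy(80))      # -> rate throttled
    print(movement_policy(20000))   # -> user throttled!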
The UK and Data Placement

- The movement and placement of data must be managed:
  - Overload of the data management system slows the system down for everyone.
  - Unnecessary multiple copies waste disk space and will prevent a full set being available; some multiple copies, though, are a good idea to balance loads.
- We have a group for deciding data placement: the UK Physics Co-ordinator, the UK deputy spokesman, Tony Doyle (UK data operations) and Roger Jones (UK operations), plus Stewart, Love and Brochu.
  - The UK Physics Co-ordinator consults the institute physics representatives.
- The initial data plan follows the matching of trigger type to site from previous exercises.
- We will make second copies until we run short of space; the second copies will then be removed *at little notice*.

Grid Storage for the UK

- The general user area on ATLAS Tier-2s is scratch space:
  - There is no guaranteed lifetime, as we cannot control the writing of new files, so users need to be responsible.
  - The aim is to keep files on disk for many days (a month?), allowing them to be discarded, used and discarded, or used and moved to secure storage.
- So where can I keep files?
  - On your local non-Grid storage.
  - In ATLASLOCALUSERDISK space:
    - "Local" here means local to the UK: your certificate must be in the ATLAS-UK group.
    - We must manage this space, so be responsible! We would like to avoid heavy policing of it, but we can and will police it if people go crazy.

CPU in the UK

- 20% of the Tier-2 Grid CPU capacity in the UK is allocated for UK-specific usage.
- At present we have not implemented this, as we are not heavily loaded.
- When we do, it will also be based on VO membership, with some queues only open to ATLAS-UK.

ATLAS Software & Computing Project

- The ATLAS Collaboration has developed a set of software and middleware tools that give all members of the collaboration access to data for physics analysis, independently of their geographical location. The main building blocks of this infrastructure are:
  - The Athena software framework, with its associated modular structure of the event data model, including the software for:
    - event simulation;
    - the event trigger;
    - event reconstruction;
    - physics analysis tools.
  - The distributed computing tools built on top of Grid middleware:
    - the Distributed Data Management (DDM) system;
    - the Distributed Production System;
    - the Ganga/pAthena frameworks for distributed analysis on the Grid;
    - monitoring and accounting.
- DDM is the central link between all components, as data access is needed for every processing and analysis step.

Disillusionment?

[Figure: the Gartner Group "hype cycle", with the HEP Grid's position along it marked for each year from 2002 to 2008 against the LHC timeline.]

ATLAS Distributed Data Management

- The DDM design is based on:
  - a hierarchical definition of datasets;
  - central dataset catalogues;
  - data blocks as units of file storage and replication;
  - distributed file catalogues;
  - automatic data transfer mechanisms using distributed services (the dataset subscription system).
- There are also local tools that let you access data from the Grid at a grid-enabled local site.
- How do I find the data? You can use tools like AMI and ELSSI: see James' talk, or attend a tutorial at CERN or in Edinburgh at the end of this month!

Central vs Local Services

- The DDM system now has a central role with respect to the ATLAS Grid tools.
- One fundamental feature is the presence of distributed file catalogues and, above all, auxiliary services:
  - Clearly we cannot ask every single Grid centre to install ATLAS services.
  - We decided to install "local" catalogues and services at Tier-1 centres: a VO Box, an FTS channel server (both directions), and a local file catalogue (part of DDM/DQ2).
- We believe that this architecture scales to our needs:
  - moving several tens of thousands of files per day;
  - supporting up to 100,000 organized production jobs per day;
  - supporting the analysis work of >1000 active ATLAS physicists.

[Diagram: the Tier-0 and each Tier-1 host a VO box, an LFC and an FTS server; the LFC is local within each "cloud"; Tier-2s attach to their Tier-1; all SEs have an SRM interface.]

Distributed Analysis

- The ATLAS tool for distributed analysis is GANGA:
  - It supports all three Grid flavours, local jobs and batch systems.
  - It is quite generic and can support all sorts of jobs, not just Athena.
  - It can be driven from the command line or from scripts, and it also has a GUI.
- But what is this pAthena then?
  - Developed for running Athena jobs on OSG.
  - Uses the same back-end as the production system.
  - Has the equivalent of the GANGA command-line interface.
  - Not yet compliant with EGEE and NDGF requirements for general users.
- GANGA has a back-end to the same system; the two projects are being integrated (a sketch of a typical GANGA job definition follows below).
- The distributed analysis tests have been invaluable, especially in the UK:
  - Feedback from UK users was very good and encouraging.
  - The tests of the pilot-based system in the UK are a credit to the UK.
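To make the GANGA workflow concrete, here is a rough sketch of what defining and submitting an Athena analysis job from the GANGA Python prompt typically looks like. It is not taken from these slides: the class and attribute names (Job, Athena, DQ2Dataset, DQ2JobSplitter, DQ2OutputDataset, LCG) follow common GangaAtlas usage of the time but should be treated as indicative, and the exact interface depends on the GANGA release in use; the job-options file and dataset name are placeholders.

    # Rough sketch of a GANGA Athena analysis job, typed at the ganga prompt.
    # Class/attribute names are indicative of GangaAtlas usage circa 2009 and
    # may differ between releases; dataset and options file are placeholders.

    j = Job()
    j.name = 'my-aod-analysis'

    j.application = Athena()
    j.application.option_file = ['MyAnalysis_jobOptions.py']  # your Athena job options
    j.application.prepare()                                   # package the local user area

    j.inputdata = DQ2Dataset()
    j.inputdata.dataset = 'some.atlas.AOD.dataset.name'       # placeholder dataset name

    j.splitter = DQ2JobSplitter()                             # split into sub-jobs over the input files
    j.splitter.numsubjobs = 20

    j.outputdata = DQ2OutputDataset()                         # register the output back in DDM

    j.backend = LCG()                                         # or another Grid flavour's backend

    j.submit()

Because the jobs are split over the files of a DDM dataset and sent to sites that already hold those files, this follows the "jobs go to the data" policy stated earlier.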
Shifts, Operations & User Support

- Computing operations need effort both at CERN and in the UK.
- There is a lot of information available, but it is not always clear where to look:
  - We have a UK computing operations wiki that should inform you of general problems in the UK cloud: http://www.atlas.ac.uk/ops.html
  - There is also an ATLAS UK Grid operations page: http://www.atlas.ac.uk/grid.html
  - UK mailing lists:
    - atlas-uk-comp-users@cern.ch (UK ATLAS user support, discussion and announcements);
    - gridpp-users@jiscmail.ac.uk (support for all GridPP experiments).
  - There is a weekly UK operations meeting that addresses problems in the UK cloud.
  - A Savannah bug-tracking system exists for UK issues; it is best used by the expert team after a site problem has been identified.

The ATLAS UK Grid page

[Screenshot of the ATLAS UK Grid operations web page.]

Is everything ready then?

- Unfortunately not yet: a lot of work remains.
  - Thorough testing of existing software and tools.
  - Optimisation of CPU usage, memory consumption, I/O rates and event size on disk.
  - Completion of the data management tools, with disk space management.
  - Completion of the accounting, priority and quota tools (both for CPU and for storage).
- Just one example (but there are many!):
  - In the computing model we foresee distributing a full copy of the AOD data to each Tier-1, and an additional full copy distributed amongst all the Tier-2s of a given Tier-1 "cloud".
    - In total, >20 copies around the world, as some large Tier-2s want a full set.
    - This model is based on the general principle of making AOD data easily accessible to everyone for analysis.
  - In reality, we don't know how many concurrent analysis jobs a data server can support.
    - Tests could be made by submitting large numbers of Grid jobs that read from the same data server.
    - The results will be functions of the server type (hardware, connectivity to the CPU farm, local file system, Grid data interface) but also of the access pattern (all events vs. sparse data in a file).
  - If we can reduce the number of AOD copies, we can increase the number of other data samples (RAW, ESD, simulation) on disk; the sketch below gives a feel for the volumes involved.
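As a rough feel for that trade-off, the sketch below estimates the disk tied up in the AOD replicas using the nominal figures quoted earlier in these slides (200 Hz for 50k seconds/day over ~100 days, 100 kB/event nominal AOD size, currently roughly double). The copy count and the per-copy arithmetic are illustrative, not an official resource calculation.

    # Rough estimate of disk held by AOD replicas, using the nominal numbers
    # quoted earlier in these slides. Purely illustrative, not an official
    # ATLAS resource calculation.

    events_per_year = 200 * 50000 * 100   # 200 Hz, 50k s/day, ~100 days -> ~1e9 events

    aod_kb_nominal = 100                  # nominal AOD size
    aod_kb_current = 200                  # "currently roughly double that"

    def full_copy_tb(event_size_kb):
        return events_per_year * event_size_kb / 1e9   # kB -> TB

    for label, size_kb in [("nominal", aod_kb_nominal), ("current", aod_kb_current)]:
        per_copy = full_copy_tb(size_kb)
        print(f"{label}: {per_copy:.0f} TB per full AOD copy, "
              f"{20 * per_copy / 1000:.1f} PB for >20 copies worldwide")

    # Every full AOD copy dropped frees roughly one such per-copy volume for
    # extra RAW, ESD or simulated samples on disk.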