University user perspectives of the ideal computing environment and SLAC’s role
Bill Lockman
SLUO/LHC workshop Computing Session, July 17, 2009

Outline:
• View of the ideal computing environment
• ATLAS computing structure
• T3 types and comparisons
• Scorecard

My view of the ideal computing environment
• Full system support by a dedicated professional
  • hardware and software (OS and file system)
• High-bandwidth access to the data at the desired level of detail
  • e.g., ESD, AOD, summary data and conditions data
• Access to all relevant ATLAS software and grid services
• Access to compute cycles equivalent to the purchased hardware
• Access to additional burst cycles
• Access to ATLAS software support when needed
• Conversationally close to those in the same working group
  • preferentially face to face
These are my views, derived from discussions with Jason Nielsen, Terry Schalk (UCSC), Jim Cochran (Iowa State), Anyes Taffard (UCI), Ray Frey, Eric Torrence (Oregon), Gordon Watts (Washington), Richard Mount and Charlie Young (SLAC).

ATLAS Computing Structure
• ATLAS has a world-wide tiered computing structure in which ~30 TB of raw data/day from ATLAS is reconstructed, reduced and distributed to the end user for analysis
• T0: CERN
• T1: 10 centers worldwide. US: BNL. No end-user analysis.
• T2: some end-user analysis capability @ 5 US centers, 1 located @ SLAC
• T3: end-user analysis @ universities and some national labs
• See the ATLAS T3 report: http://www.pa.msu.edu/~brock/file_sharing/T3TaskForce//final/TierThree_v1_executiveFinal.pdf

Data Formats in ATLAS (size per event)
• RAW - data output from DAQ (streamed on trigger bits): 1.6 MB/event
• ESD - event summary data: reco info + most RAW: 0.5 MB/event
• AOD - analysis object data: summary of ESD data: 0.15 MB/event
• TAG - event-level metadata with pointers to data files: 0.001 MB/event
• DPD - derived physics data: ~25 kB/event, ~30 kB/event or ~5 kB/event, depending on the derivation

Possible data reduction chain
(possible scenario for the “mature” phase of the ATLAS experiment; diagram not reproduced here)

T3g: Tier3 with grid connectivity (a typical university-based system)
• Tower or rack-based
• Interactive nodes
• Batch system with worker nodes
• ATLAS code available (in kit releases)
• ATLAS DDM client tools available to fetch data (currently dq2-ls, dq2-get); a minimal fetch sketch appears below
• Can submit grid jobs
• Data storage located on worker nodes or dedicated file servers
• Possible activities: detector studies from ESD/pDPD, physics/validation studies from D3PD, fast MC, CPU-intensive matrix-element calculations, ...

A university-based ATLAS T3g
• Local computing is a key to producing physics results quickly from reduced datasets
• Analyses/streams of interest at the typical university:
  • performance streams (ESD/pDPD at T2), number of analyses: e-gamma 1, W/Z(e) 2, W/Z(m) 2, minbias 1
  • physics streams (AOD/D1PD at T2), number of analyses: e-gamma 2, muon 1, jet/missEt 1
• CPU and storage needed for the first 2 years: 160 cores, 70 TB

A university-based ATLAS T3g (continued)
Requirements matched by a rack-based system from the T3 report:
• 320 kSI2K processing
• 10 kW heat
• The university has a 10 Gb/s network to the outside; the group will locate the T3g near the campus switch and interface directly to it
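The DDM client tools called out on the T3g slide (dq2-ls, dq2-get) are how a T3g pulls reduced datasets down to its local storage. Below is a minimal sketch of driving them from Python, assuming the DQ2 client tools are on the path and a valid grid proxy exists; the dataset pattern and destination directory are hypothetical placeholders, and exact tool options vary between DQ2 client versions.

    #!/usr/bin/env python
    # Minimal sketch: list DQ2 datasets matching a pattern with dq2-ls and pull
    # each one into local T3g storage with dq2-get, run from the destination
    # directory so the tool writes its output there.
    # Assumes the DQ2 client tools are set up and a valid grid proxy exists.
    import os
    import subprocess
    import sys

    DATASET_PATTERN = "data09*AOD*"   # hypothetical dataset name pattern
    DEST_DIR = "/data/t3g/aod"        # hypothetical local storage area

    def list_datasets(pattern):
        """Return dataset names printed by dq2-ls for the given pattern."""
        out = subprocess.check_output(["dq2-ls", pattern])
        return [line.strip() for line in out.decode().splitlines() if line.strip()]

    def fetch_dataset(name, dest):
        """Download one dataset by running dq2-get in the destination directory."""
        return subprocess.call(["dq2-get", name], cwd=dest)

    if __name__ == "__main__":
        if not os.path.isdir(DEST_DIR):
            os.makedirs(DEST_DIR)
        datasets = list_datasets(DATASET_PATTERN)
        if not datasets:
            sys.exit("no datasets matched pattern %s" % DATASET_PATTERN)
        for ds in datasets:
            print("fetching %s into %s" % (ds, DEST_DIR))
            if fetch_dataset(ds, DEST_DIR) != 0:
                print("WARNING: dq2-get failed for %s" % ds)

Running dq2-get from the destination directory avoids relying on version-specific output options; a site would normally wrap the same two commands in its own batch or cron scripts rather than tying up an interactive node.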
Tier3 AF (Analysis Facility)
Two sites have expressed interest and have set up prototypes:
• BNL: interactive nodes, batch cluster, PROOF cluster
• SLAC: interactive nodes and batch cluster
In a T3AF, university groups can contribute funds or hardware:
• Groups are granted priority access to the resources they purchased
• Groups purchase batch slots
• The rest of ATLAS may use the resources when they are not in use by their owners
SLAC-specific case:
• Details are covered in Richard Mount’s talk

University T3 vs. T3AF

University T3 advantages:
• Cooling, power, space usually provided
• Control over use of resources
• More freedom to innovate/experiment
• Dedicated CPU resource
• Potential matching funds from the university
• Access to databases
University T3 disadvantages:
• Limited cooling, power, space and funds to scale acquisition in future years
• Support not 24/7 and not professional; cost may be comparable to that at a T3AF
• Limited networking and networking support
• No surge capability

T3AF advantages:
• 24/7 hardware and software support (mostly professional)
• Shared space for code, data (AOD) by university
• Excellent access to ATLAS data and databases
• Fair-share mechanism to allow universities to use what they contributed
• Better network security
• ATLAS release installation provided
T3AF disadvantages:
• A yearly buy-in cost
• Less freedom to innovate/experiment
• Must share some cycles

Some groups will site disks and/or worker nodes at the T3AF and interactive nodes at the university.

Qualitative score card (University T3g | T3AF)
• Full system support by a dedicated professional: no | generally yes
• High-bandwidth access to the data at the desired level of detail: variable | good
• Access to all relevant ATLAS software and grid services: variable | yes
• Access to compute cycles equivalent to purchased hardware: yes | yes
• Access to additional burst cycles (e.g., crunch-time analysis): generally not | yes
• Access to ATLAS software support when needed: generally yes | yes
• Cost (hardware, infrastructure): some | being negotiated

• Cost is probably the driving factor in the hardware siting decision
• Hybrid options are also possible
A T3AF at SLAC will be an important option for university groups considering a T3.

Extra
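As an illustrative back-of-the-envelope sketch for the backup section, the snippet below combines only two numbers quoted earlier in the deck: the ~30 TB/day of RAW data (ATLAS Computing Structure slide) and the per-event sizes (Data Formats slide), to show roughly how the daily event-data volume shrinks at each reduction step. The implied event rate and the derived volumes are illustrative, not official planning figures.

    # Back-of-envelope: daily data volumes implied by the per-event sizes above,
    # assuming ~30 TB/day of RAW data. Illustrative only; real volumes depend on
    # trigger menus, livetime and streaming.
    RAW_TB_PER_DAY = 30.0             # ~30 TB raw data/day (from the deck)

    SIZE_MB_PER_EVENT = {             # per-event sizes from the Data Formats slide
        "RAW": 1.6,
        "ESD": 0.5,
        "AOD": 0.15,
        "TAG": 0.001,
    }

    # Implied number of events per day: 30 TB / 1.6 MB per RAW event
    events_per_day = RAW_TB_PER_DAY * 1.0e6 / SIZE_MB_PER_EVENT["RAW"]
    print("implied events/day: %.2e" % events_per_day)

    # Daily volume of each format if every event were kept in that format
    for fmt, size_mb in SIZE_MB_PER_EVENT.items():
        print("%-3s ~ %7.2f TB/day" % (fmt, events_per_day * size_mb / 1.0e6))

Applying the same arithmetic to the ~5 to ~30 kB/event derived physics data sizes gives roughly 0.1 to 0.6 TB/day, the scale of reduced data a university T3 is intended to handle.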