Tier1 Status Report
Martin Bly
RAL
27/28 April 2005

Topics
• Hardware
• Atlas DataStore
• Networking
• Batch services
• Storage
• Service Challenges
• Security

Hardware
• Approximately 550 CPU nodes
  – ~980 processors deployed in batch
  – Remainder are service nodes, servers, etc.
• 220TB disk space: ~60 servers, ~120 arrays
• Decommissioning
  – Majority of the P3/600MHz systems decommissioned Jan 05
  – P3/1GHz systems to be decommissioned in July/Aug 05, after commissioning of the Year 4 procurement
  – Babar SUN systems decommissioned by end Feb 05
  – CDF IBM systems decommissioned and sent to Oxford, Liverpool, Glasgow and London
• Next procurement
  – 64-bit AMD or Intel CPU nodes – power and cooling considerations
  – Dual cores possibly too new
  – Infortrend arrays / SATA disks / SCSI connect
• Future
  – Evaluate new disk technologies, dual-core CPUs, etc.

Atlas DataStore
• Evaluating new disk systems for staging cache
  – FC-attached SATA arrays
  – Additional 4TB/server, 16TB total
  – Existing IBM/AIX servers
• Tape drives
  – Two additional 9940B drives, FC attached
  – 1 for the ADS, 1 for a test CASTOR installation
• Developments
  – Evaluating a test CASTOR installation
  – Stress-testing ADS components to prepare for the Service Challenges
  – Planning for a new robot
  – Considering the next generation of tape drives
  – SC4 (2006) requires a step up in cache performance
  – Ancillary network rationalised

Networking
• Planned upgrades to the Tier1 production network
  – Started November 04
  – Based on Nortel 5510-48T 'stacks' for large groups of CPU and disk server nodes (up to 8 units/stack, 384 ports)
  – High-speed inter-unit backbone interconnect (40Gb/s bidirectional) within stacks
  – Multiple 1Gb/s uplinks aggregated to form the backbone
    • Currently 2 x 1Gb/s, max 4 x 1Gb/s
  – Upgrade to 10Gb/s uplinks and head node as costs fall
  – Uplink configuration, with links to separate units within each stack and to the head switch, provides resilience
  – Ancillary links (APCs, disk arrays) on a separate network
• Connected to UKLight for SC2 (see Service Challenges below)
  – 2 x 1Gb/s links aggregated from the Tier1

Batch Services
• Worker node configuration based on traditional-style batch workers with the LCG configuration on top
  – Running SL 3.0.3 with LCG-2_4_0
  – Provisioning by PXE/Kickstart
  – YUM/Yumit, Yaim, Sure, Nagios, Ganglia, …
• All rack-mounted workers are dual-purpose, accessed via a single batch system PBS server (Torque)
• The scheduler (Maui) allocates resources for LCG, Babar and the other experiments using fair-share allocations from the User Board
• Jobs can spill into other experiments' allocations, and from one 'side' to the other, when spare capacity is available, to make best use of the capacity
• Some issues with jobs that use excess memory (memory leaks) not being killed by Maui or Torque – under investigation (a monitoring sketch follows this slide)
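The memory issue above is partly a detection problem: neither Maui nor Torque is currently killing jobs that grow well beyond their request, so the first step is simply to spot them. Below is a minimal monitoring sketch, not the Tier1's actual tooling; it assumes Torque's qstat -f is available on the PBS server and reports resources_used.mem and Resource_List.mem as values ending in "kb", and it only reports offenders rather than deleting them.

#!/usr/bin/env python
"""Sketch of a watchdog for batch jobs exceeding their memory request.

Illustrative assumptions, not taken from the talk: Torque's 'qstat -f'
is on the PATH and reports 'resources_used.mem' and 'Resource_List.mem'
as values ending in 'kb'.
"""
import re
import subprocess

KB_RE = re.compile(r'^(\d+)kb$')


def parse_qstat_full(text):
    """Split 'qstat -f' output into {job_id: {attribute: value}}."""
    jobs, current = {}, None
    for line in text.splitlines():
        if line.startswith('Job Id:'):
            current = line.split(':', 1)[1].strip()
            jobs[current] = {}
        elif current is not None and ' = ' in line:
            key, _, value = line.strip().partition(' = ')
            jobs[current][key] = value
    return jobs


def kilobytes(value):
    """Return the integer kb from a value like '102400kb', else None."""
    match = KB_RE.match(value or '')
    return int(match.group(1)) if match else None


def over_limit(jobs):
    """Yield (job_id, used_kb, requested_kb) for jobs above their request."""
    for job_id, attrs in jobs.items():
        used = kilobytes(attrs.get('resources_used.mem'))
        requested = kilobytes(attrs.get('Resource_List.mem'))
        if used is not None and requested is not None and used > requested:
            yield job_id, used, requested


if __name__ == '__main__':
    output = subprocess.check_output(['qstat', '-f']).decode()
    for job_id, used, requested in over_limit(parse_qstat_full(output)):
        # Report only; whether to qdel automatically is a policy decision.
        print('%s: using %d kb, requested %d kb' % (job_id, used, requested))

Run from cron, such a report could feed the existing Nagios/Sure exception monitoring; automatic job deletion is deliberately left out of the sketch.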
Service Systems
• Service systems migrated to SL 3
  – Mail hub, NIS servers, UIs
  – Babar UIs configured as a DNS triplet
• NFS / data servers
  – Customised RH 7.n
    • Driver issues
    • NFS performance of SL 3 uninspiring compared with 7.n
  – dCache systems at SL 3
• LCG service nodes at SL 3 with LCG-2_4_0
• Need to migrate to LCG-2_4_0 or lose work

Storage
• Moving from NFS to SRMs for data access
  – dCache successfully deployed in production
    • Used by CMS, ATLAS, …
    • See talk by Derek Ross
  – Xrootd deployed in production
    • Used by Babar
    • Two 'redirector' systems handle requests
      – Selected by DNS pair
      – Hand off requests to the appropriate server
      – Reduces NFS load on the disk servers
• Load issues with the Objectivity server
  – Two additional servers being commissioned
• Project to look at SL 4 for servers
  – 2.6 kernel, journaling file systems: ext3, XFS

Service Challenges I
• The Service Challenges are a program of infrastructure trials designed to test the LCG fabric at increasing levels of stress/capacity in the run-up to LHC operation
• SC2 – March/April 05
  – Aim: T0 -> T1s aggregate of >500MB/s sustained for 2 weeks
  – 2Gb/s link via UKLight to CERN
  – RAL sustained 80MB/s for two weeks to a dedicated (non-production) dCache
    • 11/13 gridftp servers
    • Limited by network issues
  – Internal testing reached 3.5Gb/s (~400MB/s) aggregate, disk to disk
  – Aggregate to the 7 participating sites: ~650MB/s
• SC3 – July 05 – Tier1 expects:
  – CERN -> RAL at 150MB/s sustained for 1 month
  – T2s -> RAL (and RAL -> T2s?) at a yet-to-be-defined rate
    • Lancaster, Imperial, …
    • Some on UKLight, some via SJ4
  – Production phase Sept-Dec 05

Service Challenges II
• SC4 – April 06
  – CERN-RAL T0-T1 expected at 220MB/s sustained for one month
  – RAL expects T2-T1 traffic at N x 100MB/s simultaneously
• June 06 – Sept 06: production phase
• Longer term:
  – Some as-yet-undefined T1 -> T1 capacity is needed; this could add 50 to 100MB/s
  – CMS production will require 800MB/s combined and sustained from the batch workers to the storage systems within the Tier1
  – At some point there will be a sustained double-rate test: 440MB/s T0-T1 plus whatever is then needed for T2-T1
• It is clear that the Tier1 will be able to keep a significant part of a 10Gb/s link busy continuously, probably from late 2006 (see the back-of-envelope tally after the Security slide)

Security
• The Badguys™ are out there
  – Users are vulnerable to losing authentication data anywhere
• Still some less-than-ideal practices
  – All local privilege-escalation exploits must be treated as high-priority must-fixes
  – Continuing program of locking down and hardening exposed services and systems
  – You can only ever be more secure
• See talk by Romain Wartel
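As promised on the Service Challenges II slide, here is a back-of-envelope tally of the wide-area rates quoted there. The per-flow figures come from the slides; the choice of N = 3 simultaneous T2 flows and the 75MB/s mid-range T1-T1 figure are illustrative assumptions, and the 800MB/s CMS requirement is excluded because it is internal to the Tier1 and does not cross the external link.

"""Back-of-envelope tally of the Service Challenge wide-area rates.

The per-flow figures are from the Service Challenges slides; N = 3
simultaneous T2 flows and the 75 MB/s T1-T1 figure are illustrative
assumptions, not numbers from the talk.
"""

def mb_per_s_to_gb_per_s(rate_mb_s):
    """Convert megabytes/s to gigabits/s (1 Gb/s = 125 MB/s)."""
    return rate_mb_s * 8 / 1000.0

t0_t1 = 440      # sustained double-rate test, T0 -> T1 (MB/s)
t1_t1 = 75       # T1 <-> T1 quoted as 50-100 MB/s; take a mid-range value
t2_t1 = 3 * 100  # T2 -> T1 at "N x 100 MB/s"; assume N = 3

total_mb_s = t0_t1 + t1_t1 + t2_t1
total_gb_s = mb_per_s_to_gb_per_s(total_mb_s)

print('Aggregate WAN rate: %d MB/s = %.1f Gb/s' % (total_mb_s, total_gb_s))
print('Fraction of a 10 Gb/s link: %.0f%%' % (100 * total_gb_s / 10))
# With these assumptions: 815 MB/s, roughly 6.5 Gb/s, i.e. about two thirds
# of a 10 Gb/s link before any headroom for bursts or retransmission.

Even with modest choices for N, the external flows alone occupy most of a 10Gb/s link, which is the point of the final bullet on the Service Challenges II slide.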