DiRAC@GRIDPP
• www.dirac.ac.uk – please look at our wiki!
• Part of the new national e-Infrastructure: http://www.bis.gov.uk/policies/science/science-funding/elc
• Provides advanced IT services for theoretical and simulation research in Particle Physics, Particle Astrophysics, Cosmology, Astrophysics and Solar System Science [90%]
• And for Industry, Commerce and the Public Sector [10%]
• Given the mission of investigating and deploying cutting-edge technologies (hardware and software) in a production environment
• Erk! We are a play pen as well as a research facility

Community Run
• DiRAC uses the standard Research Council Facility model
• Academic community oversight and supervision of technical staff
• Regular reporting to the Project Management Board, including science outcomes
• Outcome-driven resource allocation – no research outcomes, no research
• Sound familiar?

Structure
• Standard Facility model:
  – Oversight Committee (meets twice yearly); Chair: Foster (CERN)
  – Project Management Board (meets monthly); Chair: Davies (Glasgow)
  – Technical Working Group (meets fortnightly); Chair: Boyle (Edinburgh)
  – Project Director: Yates (UCL)
• The PMB sets strategy and policy and considers reports on equipment usage and research outcomes – 10 members
• TWG members deliver HPC services and undertake projects for the Facility – 8 members

Computational Methods
• Monte Carlo – particle-based codes for CFD and the solution of PDEs
• Matrix methods – Navier-Stokes and Schrödinger equations
• Integrators – Runge-Kutta, ODEs
• Numerical lattice QCD calculations using Monte Carlo methods
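For flavour, the sketch below shows what two of these method families look like in miniature: a textbook fourth-order Runge-Kutta step and a toy Monte Carlo quadrature. It is a generic Python illustration, not code taken from any DiRAC application; the test problems (exponential decay, a Gaussian integrand) are arbitrary choices.

```python
# Illustrative sketches only -- not DiRAC production codes.
import math
import random

def rk4_step(f, t, y, h):
    """One classical Runge-Kutta 4 step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

def mc_integrate(f, a, b, n=100_000, seed=0):
    """Toy Monte Carlo estimate of the integral of f over [a, b]."""
    rng = random.Random(seed)
    total = sum(f(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n

if __name__ == "__main__":
    # Exponential decay dy/dt = -y, integrated from t = 0 to t = 1.
    y, t, h = 1.0, 0.0, 0.01
    while t < 1.0 - 1e-12:
        y = rk4_step(lambda t, y: -y, t, y, h)
        t += h
    print("RK4 estimate of exp(-1):", y, "exact:", math.exp(-1))

    # Monte Carlo estimate of the Gaussian integral over [0, 1].
    print("MC estimate:", mc_integrate(lambda x: math.exp(-x * x), 0.0, 1.0))
```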
Where Are We Going – DiRAC-2
• 22 September 2011: DiRAC invited to submit a one-page proposal to BIS to upgrade its systems and make them part of the baseline service for the new National E-Infrastructure. Assisted by STFC (Trish, Malcolm and Janet)
• Awarded £14M for compute and £1M for storage
• Strict payment deadlines (31/03/2012) imposed
• Reviewed by the Cabinet Office under the Gateway Review mechanism
• Payment profiles agreed with BIS

How Has It Gone?
• Owning kit and paying for admin support at our sites works best – a hosting solution seems to get by far the mostest
  – Rapid deployment of 5 systems using local HEI procurement
• Buying access via SLA/MoU is simply not as good – you are just another user, and SLAs don't always exist!
• Excellent research outcomes – papers are flowing from the service
• Had to consolidate from 14 systems to 5

New Systems III
• Final system specifications below. Costs include some data centre capital works.

System (supplier)       | Tflop/s              | Connectivity   | RAM  | PFS        | Cost /£M
BG/Q (IBM)              | 540 (total now 1300) | 5D torus       | 16TB | 1PB        | 6.0
SMP (SGI)               | 42                   | NUMA           | 16TB | 200TB      | 1.8
Data Centric (OSF/IBM)  | 135                  | QDR IB         | 56TB | 2PB usable | 3.7
Data Analytic (Dell)    | 50% of 200           | FDR (Mellanox) | 38TB | 2PB usable | 1.5
Complexity (HP)         | 90                   | FDR (Mellanox) | 36TB | 0.8PB      | 2.0

User Access to Resources
• Now have an independent peer-review system
• People apply for time, just like experimentalists do!
• First call – 21 proposals! Over-contention
• Central budget from STFC for power and support staff (4 FTE)
• Will need to leverage users' local sysadmin support to assist with DiRAC
• We do need a cadre of developers to parallelise and optimise existing codes and to develop new applications
• Working with vendors and industry to attract more funds
• Created a common login and reporting environment for DiRAC using the EPCC-DL SAFE system – crude identity management

TWG Projects
• In progress: authentication/access for all our users to the allowed resources, using SSH and a database-updating kludge
• In progress: usage and system monitoring – using SAFE initially
• Networking testing in concert with Janet
• GPFS multi-clustering – multi-clustering enables compute servers outside the cluster serving a file system to remotely mount that file system

GPFS Multi-Clustering
• Why might we want it? You can use ls, cp, mv etc. – much more intuitive for humble users, and no ftp-ing involved
• Does it work over long distances (WAN)? Weeell – perhaps
• Offer from IBM to test between Edinburgh and Durham, both DiRAC GPFS sites. Would like to test simple data replication workflows (a minimal sketch is given at the end of this section)
• Understand and quantify identity and UID mapping issues. How YUK are they? Can SAFE help sort these out, or do we need something else?
• Extend to the Hartree Centre, another GPFS site. Perform more complex workflows
• Extend to Cambridge and Leicester – non-IBM sites. IBM want to solve interoperability issues

The Near Future
• New management structure:
  – Project Director (me) in place and Project Manager now funded at the 0.5 FTE level
  – 1 FTE of system support at each of the four hosting sites
• Need to sort out the sustainability of the Facility – tied to the establishment of the N E-I LC's 10-year capital and recurrent investment programme (~April 2014?)
• We can now perform research-domain leadership simulations in the UK
• Need to federate access/authentication/monitoring systems – middleware that actually works, is easily usable AND is easy to manage
• Need to federate storage to allow easier workflows and data security

Some Theoretical Physics – The Universe (well, 12.5% of it)
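As a concrete reading of the "simple data replication workflows" bullet in the GPFS Multi-Clustering slide above: once a remote GPFS file system is multi-cluster mounted it is just another POSIX path, so a replication job reduces to ordinary file operations rather than FTP transfers. The sketch below is an assumption-laden illustration – the mount points (/gpfs/edinburgh/project, /gpfs/durham/project) and the crude UID comparison are invented for the example and do not describe the actual Edinburgh–Durham test setup.

```python
# Minimal sketch of a data replication workflow between two GPFS
# multi-cluster mounts. Paths and the UID check are illustrative
# assumptions, not the real DiRAC configuration.
import shutil
from pathlib import Path

LOCAL_FS = Path("/gpfs/edinburgh/project")   # hypothetical local mount
REMOTE_FS = Path("/gpfs/durham/project")     # hypothetical remote mount

def replicate(src_root: Path, dst_root: Path) -> None:
    """Copy files that are absent from dst_root or newer in src_root."""
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            shutil.copy2(src, dst)  # ordinary cp semantics, no ftp-ing
            # Crude probe of the UID-mapping question: does the copy
            # end up owned by the same numeric UID as the original?
            if dst.stat().st_uid != src.stat().st_uid:
                print(f"UID mismatch: {src} ({src.stat().st_uid}) "
                      f"-> {dst} ({dst.stat().st_uid})")

if __name__ == "__main__":
    replicate(LOCAL_FS, REMOTE_FS)
```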