National HPC Infrastructure … and Edinburgh's role
Professor Arthur Trew, Director, EPCC
A.S.Trew@ed.ac.uk   +44 131 650 5025

The UK model for HPC
• … based on the Branscomb pyramid
• with different layers being funded through different mechanisms

from the bottom up
• Personal computing
  – paid for by university departments or research grants
  – personal control
• Departmental computing
  – paid for by university central funds (SRIF) or larger research grants
  – group control

local capacity computing
• examples include:
  – BlueGene/L at Edinburgh
  – ECDF at Edinburgh – 500 -> 1500 processors
  – Darwin at Cambridge – 2300 processors
• increasingly becoming Grid-enabled
• forming compute Grids
  – ScotGrid
  – White Rose Grid …
• control still informal

the top end
• funded by Research Councils
  – some special-purpose systems for specific communities funded by Research Grants, eg
    – QCDOC for UKQCD at Edinburgh
    – COSMA for cosmology at Durham
    – as with mid-range systems, these have local control
  – EPSRC managing agent for HPC services for general computational research: chemistry, engineering, climate …
    – HPCx
    – HECToR
    – access through national peer-review process

the six year cycle
• EPSRC strategy based on two co-operating services
  – each lasting for six years
  – running three years out of phase
  – [timeline: Cray T3D/E, CSAR, HPCx and HECToR service periods by year, up to the present]
  – … though we don't always manage the three-year overlap

rivalry at the high end
• each service is based on two-year phases, with refreshes to keep pace with Moore's Law
• during the first "three years" from installation that service is the national capability service
• … when overtaken by the next service it becomes the high-end capability/capacity service
  – but often complementary architectures enable a more flexible approach
  – eg, HPCx can offer shared memory, more flexible scheduling … than HECToR

HPCx
• owned and operated by UoE HPCx Ltd
  – wholly-owned subsidiary of the University of Edinburgh
  – based at Daresbury Laboratory
• 2002 – 2008
  – currently in Phase 3: 20xp575
  – 2560 processors, 2GB/processor
  – 15 Tflops
  – No. 43 in the world

HPCx Usage
[usage chart]

HECToR
• also owned and operated by UoE HPCx Ltd
  – based at the University of Edinburgh
• 2007 – 2013
• installation of Phase 1 in Summer 2007
  – 60xCray XT4
  – 11,328 processors, 3GB/processor
  – 60 Tflops
  – ~top-10 in the world
  – 4x more in Phase 2

extension to ACF
[photographs: new construction]

looking forward
• spiralling costs:
  – CSAR £26M over 6 years
  – HPCx £55M over 6 years
  – HECToR £113M over 6 years
• -> European co-operation
• PACE = international bid to create European supercomputer centre(s)
  – petaflops-scale
  – planning to start this year, operation ~2010

leading Europe
• goal: "Edinburgh to be the premier European computational science R&D centre"
  – with facilities equivalent to one of the big US centres, eg the San Diego Supercomputer Center
• think globally, act locally

building the vision
[diagram of EPCC activity areas: European leadership, R&D excellence, HPC facilities + skills, Training, Partnerships (Industry + Academia), Dbase + Grid expertise]

EPCC Activities
• vital statistics:
  – founded in 1990
  – 75 staff
  – £4.0M turnover, 90+% external
  – required to break even year-on-year
• multidisciplinary and multi-funded
  – … with a large spectrum of activities
  – … and a critical mass of expertise
• operates a "win-win-win" model
  – inter-project leverage delivers:
    – improved marginal cost-effectiveness
    – radically enhanced services

Facilities
• EPCC leads research computing at Edinburgh:
  – Edinburgh researchers
    – UoE HPC service
    – Blue Gene
    – SAN
  – national research groups
    – QCDOC
    – HPCx
    – HECToR
  – European researchers
    – HPC-Europa visitor programme
• widest range of facilities in any European university

Technology Transfer
• project-based consultancy to industry and commerce
  – 75 clients in the past 3 years
  – large enterprises…
    – eg, IBM, Oracle, UKMO, Sun, C&G, AEA, Cisco
  – …to local SMEs
    – eg, Weidlinger, Quadstone, Jardine, Arran Aromatics …
• funded by business, DTI, SE and EU
  – judged by SE "the best example of commercialising the science base in Scotland" in 1996 & 2000
• industrial work generates 50% of revenues

Grid expertise
• middleware:
  – OGSA-DAI: most successful UK middleware development
  – QCDGrid: developed for UKQCD, deploying in medicine …
• applications:
  – EdSkyQuery – used in the Sloan Digital Sky Survey
  – Mouse Atlas – 3D imaging for embryo development
  – Gene Expression – linking databases worldwide

HPC Research
• computational techniques
  – programming techniques
  – numerical algorithms
  – languages and standards
• applications:
  – cosmology: Virgo
  – particle physics: QCD
  – materials: RealityGrid
  – …
• measured by:
  – papers in journals and at international conferences
  – RAE return – 22% of UoA 19 in 1996 & 2001

Training
• Centre of Excellence in HPC Training
• MSc in HPC
  – one of the first in the world
  – funded by EPSRC
  – now extended as part of the taught PhD programme
  – supported by industry
  – excellent practical grounding
• increasing activity in undergraduate teaching in Physics
  – Computational Simulation
  – PFiP
  – Cosmology
  – …

European Engagement
• HPC-Europa
  – funded by the EU since 1993
  – repeatedly reviewed to be the leading computing programme
  – ~50 visitors per year, each for 1-3 months
  – collaborative research with all Schools in this College
  – visitors from almost every European country
• ENACTS
  – EPCC led a Round Table of European HPC & Data Centres
  – assessed the impact of emerging technologies for scientific computing, eg Grid computing
  – set the model for I3 in FP6
• IST (industrial) projects
  – Gridstart: co-ordination of all EU Grid development projects
  – NextGrid: disseminating best-practice tech transfer in Grid
• DEISA – supercomputing metacentre

novel architectures … why bother?
• HPCx
  – 6.1 Tflops, 1.6 TByte memory
  – 50xp690+ cabinets, 2500 sq ft
  – cost: £XX per Tflops
  – power consumption: 500 kW
• Blue Gene
  – 4.5 Tflops, 0.5 TByte memory
  – 1xcabinet, 200 sq ft
  – cost: ~£XX/10 per Tflops
  – power consumption: 30 kW
• because they can be extremely cost-effective (see the sketch below)
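A back-of-the-envelope comparison using only the performance and power figures quoted above (the costs are elided on the slide and so left out here) makes the cost-effectiveness point concrete:

```c
#include <stdio.h>

int main(void)
{
    /* Figures quoted on the "why bother?" slide: peak Tflops and power draw. */
    double hpcx_tflops = 6.1,  hpcx_kw = 500.0;   /* HPCx      */
    double bg_tflops   = 4.5,  bg_kw   = 30.0;    /* Blue Gene */

    double hpcx_eff = 1000.0 * hpcx_tflops / hpcx_kw;   /* Gflops per kW */
    double bg_eff   = 1000.0 * bg_tflops   / bg_kw;

    printf("HPCx:      %5.1f Gflops/kW\n", hpcx_eff);
    printf("Blue Gene: %5.1f Gflops/kW\n", bg_eff);
    printf("Blue Gene is ~%.0fx more power-efficient\n", bg_eff / hpcx_eff);
    return 0;
}
```

At the quoted figures Blue Gene delivers roughly 150 Gflops per kW against roughly 12 Gflops per kW for HPCx: an order-of-magnitude difference in power efficiency, before floor space or cost are even considered.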
QCDOC & Blue Gene
• QCDOC
  – designed for QCD by Columbia, IBM & Edinburgh
  – first of three to begin operation; exploited by an international team
  – high compute/communications ratio, low memory requirement
  – => low-latency 6D torus
  – 14,000 pes, 11 Tflops peak; sustains 35% on full code
• Blue Gene
  – providing a capability service for key projects
    – nuclear materials simulation, financial modelling, complex fluids, systems biology, physical chemistry …
  – aim to keep the number of projects small
  – relatively easy to port applications
  – library optimisation jointly with IBM

FPGA
• in collaboration with Xilinx, AlphaData, Nallatech & Algotronix
  – builds on Scottish leadership in FPGAs
• aim is a 64-pe FPGA-based parallel computer for scientific/industrial use
• EPCC:
  – has designed the architecture
  – is developing the parallel toolkit
  – will port demonstrator commercial applications

biological systems
• Jason Crain is investigating how proteins assemble from amino-acid chains into biologically functional structures
  – protein mis-folding is implicated in diseases such as cystic fibrosis, CJD and Alzheimer's
  – joint research with IBM
• Igor Goryanin is integrating gene, protein and metabolite data to understand how biological systems function
  – better understanding of the network of processes in living organisms
  – developing cellular, organ and organism models

new materials
• Mike Cates has invented a new generic way to make gels involving particles suspended in a mixture of two solvents
  – for use in, eg, cosmetics, foodstuffs, oil exploration and pharmaceuticals
• Graeme Ackland is simulating the way plasma damages fission reactors
  – could this lead to better designs for fusion reactors?
• Paul Madden is looking at simulating material behaviour that is unmeasurable
  – eg, the lower edge of the Earth's mantle

financial modelling
• Andreas Grothey and Jacek Gondzio are modelling financial assets and liabilities
  – maximise the expected gain and minimise the associated risk of investments
  – OOPS makes the solution of general, large nonlinear financial problems feasible
• … a truly large-scale optimisation problem
  – even moderate numbers of assets, time stages and scenarios lead to problems with huge dimensions (see the sketch after this slide)
• using Blue Gene, we have been able to solve much larger problems than previously possible
  – solved a problem with 500 million variables in under 2 hours
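To see why "even moderate numbers of assets, time stages and scenarios" blow the problem up, the minimal sketch below counts decision variables for a hypothetical multistage asset-liability model with one holding variable per (asset, scenario-tree node). The sizes used (20 assets, 7 stages, 10 branches per node) are assumptions chosen purely for illustration, not the configuration actually solved with OOPS on Blue Gene.

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Hypothetical problem sizes, chosen only to illustrate the growth:  */
    /* one decision variable per (asset, scenario-tree node).             */
    int assets = 20, stages = 7, branching = 10;

    double nodes = 0.0;
    for (int t = 0; t <= stages; t++)
        nodes += pow(branching, t);       /* branching^t nodes at stage t */

    printf("scenario-tree nodes: %.0f\n", nodes);
    printf("decision variables:  %.2e\n", assets * nodes);
    return 0;
}
```

Even these modest-sounding inputs give a tree of about 11 million nodes and over 200 million variables, the same order of magnitude as the 500-million-variable problem reported above.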