National HPC Infrastructure
… and Edinburgh’s role
Professor Arthur Trew
Director, EPCC
A.S.Trew@ed.ac.uk
+44 131 650 5025
The UK model for HPC
• … based on the Branscomb pyramid
• with different layers being funded through different
mechanisms
from the bottom up
• Personal computing – paid for by university departments or research grants – personal control
• Departmental computing – paid for by university central funds (SRIF) or larger research grants – group control
local capacity computing
• examples include:
– BlueGene/L at Edinburgh
– ECDF at Edinburgh (500 -> 1500 processors)
– Darwin at Cambridge (2300 processors)
• increasingly becoming Grid-enabled
• forming compute Grids
– Scotgrid
– White Rose Grid …
• control still informal
the top end
• funded by Research Councils
– some special-purpose systems for
specific communities funded by
Research Grants, eg
– QCDOC for UKQCD at Edinburgh
– COSMA for cosmology at Durham
– as with mid-range systems these
have local control
– EPSRC managing agent for HPC
services for general computational
research: chemistry, engineering,
climate …
– HPCx
– HECToR
– access through national peer-review process
the six year cycle
• EPSRC strategy based on two co-operating services
– each lasting for six years
– running three years out of phase
[Timeline figure: service periods of Cray T3D/E, CSAR, HPCx and HECToR plotted against year, with the present marked]
– … though we don’t always manage the three-year overlap
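The overlap rule on this slide (six-year services, started three years apart) can be sketched in a few lines. This is only an illustration: the function names and the start years used in the usage note are hypothetical, not actual service dates.

```python
# Sketch of the overlapping-service model described on the slide.
SERVICE_LENGTH = 6  # years each service lasts (from the slide)
PHASE_OFFSET = 3    # years between successive service starts (from the slide)


def services_running(year, first_start, n_services):
    """Return the indices of the services in operation during `year`.

    `first_start` is the (hypothetical) start year of service 0.
    """
    running = []
    for i in range(n_services):
        start = first_start + i * PHASE_OFFSET
        if start <= year < start + SERVICE_LENGTH:
            running.append(i)
    return running


def capability_service(year, first_start, n_services):
    """Return the newest running service, i.e. the capability service."""
    running = services_running(year, first_start, n_services)
    return max(running) if running else None
```

With an illustrative first start year of 2000, `services_running(2005, 2000, 3)` returns two services, matching the "two co-operating services" idea. In the ideal schedule only the first three years of the first service are covered by a single service.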
rivalry at the high-end
• each service is based on two-year phases with refreshes to
keep pace with Moore’s Law
• during the first “three years” from installation that service is
the national capability service
• … when overtaken by the next service it becomes the high-end capability/capacity service
– but often complementary architectures enable a more flexible
approach
– eg, HPCx can offer shared-memory, more flexible scheduling … than
HECToR
HPCx
• owned and operated by UoE HPCx Ltd – wholly-owned
subsidiary of the University of Edinburgh
– based at Daresbury Laboratory
• 2002 – 2008
– currently in Phase 3: 20xp575 – 2560 processors, 2GB/processor
– 15 Tflops – No. 43 in the world
HPCx Usage
HECToR
• also owned and operated by UoE HPCx Ltd
– based at University of Edinburgh
• 2007 – 2013
• installation of Phase 1 in Summer 2007
– 60xCray XT4 – 11,328 processors, 3GB/processor
– 60 Tflops – ~top-10 in the world
– 4x more in Phase 2
extension to ACF
[Figure: new construction]
looking forward
• spiralling costs:
– CSAR £26M over 6 years
– HPCx £55M over 6 years
– HECToR £113M over 6 years
• -> European co-operation
• PACE = international bid to create European supercomputer
centre(s)
– petaflops-scale
– planning to start this year, operation ~2010
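The "spiralling costs" figures above can be turned into per-year costs and generation-to-generation growth factors. A minimal sketch, assuming the HECToR figure is in £M like the other two; the variable names are illustrative.

```python
# Total service costs in £M over six years, as listed on the slide.
# Assumption: the HECToR figure is in £M like the other two.
SERVICE_YEARS = 6
total_cost_m = {"CSAR": 26, "HPCx": 55, "HECToR": 113}

# Average cost per year of service, in £M.
annual_cost_m = {name: cost / SERVICE_YEARS for name, cost in total_cost_m.items()}

# Growth factor from each service to its successor.
names = list(total_cost_m)
growth = {
    f"{a} -> {b}": total_cost_m[b] / total_cost_m[a]
    for a, b in zip(names, names[1:])
}

for name, cost in annual_cost_m.items():
    print(f"{name}: about £{cost:.1f}M per year")
for step, factor in growth.items():
    print(f"{step}: x{factor:.2f}")
```

On these numbers the cost slightly more than doubles with each new service.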
leading Europe
• goal
“Edinburgh to be the premier European
computational science R&D centre”
– with facilities equivalent of one of the big US centres, eg San Diego
Supercomputer Center


• think globally, act locally
building the vision
European
leadership
R&D
excellence
HPC
facilities+skills
Training
Partnerships
Industry +
Academia
Dbase + Grid
expertise
EPCC Activities
• vital statistics:
– founded in 1990
– 75 staff
– £4.0M turnover, 90+% external
– required to break even year-on-year
• multidisciplinary and multi-funded
– ... with a large spectrum of activities
– … and a critical mass of expertise
• operates a “win-win-win” model
– inter-project leverage delivers
– improved marginal cost-effectiveness
– radically enhanced services
Facilities
• EPCC leads research computing at Edinburgh:
– Edinburgh researchers
– UoE HPC service
– Blue Gene
– SAN
– national research groups
– QCDOC
– HPC(X)
– HECToR
– European researchers
– HPC-Europa visitor programme
• widest range of facilities in any European university
Technology Transfer
• project-based consultancy to industry and commerce
• 75 clients in past 3 years
– Large enterprises...
– eg, IBM, Oracle, UKMO, Sun, C&G, AEA, Cisco
– ...to local SMEs
– eg, Weidlinger, Quadstone, Jardine, Arran Aromatics ...
• funded by business, DTI, SE and EU
– judged by SE “the best example of commercialising the science base in Scotland” 1996 & 2000
• industrial work generates 50% of revenues
Grid expertise
• middleware:
– OGSA-DAI: most successful
UK middleware development
– QCDGrid: developed for
UKQCD, deploying in
medicine …
• applications:
– EdSkyQuery – used in Sloan
Digital Sky Survey
– Mouse Atlas – 3D imaging
for embryo development
– Gene Expression – linking
databases worldwide
HPC Research
• computational techniques
– programming techniques
– numerical algorithms
– languages and standards
• applications:
– cosmology: Virgo
– particle physics: QCD
– materials: RealityGrid
– ….
• measured by:
– papers in journals and at international conferences
– RAE return
– 22% of UoA 19 in 1996 & 2001
Training
• Centre of Excellence in HPC Training
• MSc in HPC
– one of first in world
– funded by EPSRC
– now extended as part of taught PhD programme
– supported by industry
– excellent practical grounding
• increasing activity in undergraduate teaching in Physics
– Computational Simulation
– PFiP
– Cosmology
– …
European Engagement
• HPC-Europa
– funded by EU since 1993
– repeatedly reviewed to be the leading computing programme
– ~50 visitors per year, each for 1-3 months
– collaborative research with all Schools in this College
– visitors from almost every European country
• ENACTS
– EPCC led Round Table of European HPC & Data Centres
– assessed impact of emerging technologies for scientific computing, eg, Grid computing
– set model for I3 in FP6
• IST (industrial) projects
– Gridstart: co-ordination of all EU Grid development projects
– NextGrid: disseminating best-practice tech transfer in Grid
• DEISA
– supercomputing metacentre
novel architectures … why bother?
• HPCx
– 6.1 Tflops, 1.6 TByte memory
– 50xp690+ cabinets, 2500 sq ft
– cost: £XX per Tflops
– power consumption: 500 kW
• Blue Gene
– 4.5 Tflops, 0.5 TByte memory
– 1xcabinet, 200 sq ft
– cost: ~£XX/10 per Tflops
– power consumption: 30 kW
• because they can be extremely cost-effective
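The slide's cost-effectiveness claim can be checked on the two metrics it gives actual numbers for, power and floor space. A small sketch using only the figures above; the cost figures are left out because the slide gives them as placeholders. Dictionary keys and function names are illustrative.

```python
# Figures taken from the slide; cost per Tflops is omitted because
# the slide only gives placeholders for it.
systems = {
    "HPCx": {"tflops": 6.1, "power_kw": 500, "floor_sqft": 2500},
    "Blue Gene": {"tflops": 4.5, "power_kw": 30, "floor_sqft": 200},
}


def gflops_per_kw(system):
    """Compute performance per unit of power, in Gflops per kW."""
    return system["tflops"] * 1000 / system["power_kw"]


def gflops_per_sqft(system):
    """Compute performance per unit of floor space, in Gflops per sq ft."""
    return system["tflops"] * 1000 / system["floor_sqft"]


for name, system in systems.items():
    print(
        f"{name}: {gflops_per_kw(system):.1f} Gflops/kW, "
        f"{gflops_per_sqft(system):.2f} Gflops/sq ft"
    )
```

On these numbers Blue Gene delivers roughly 12 times the performance per kW and roughly 9 times the performance per square foot.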
QCDOC & Blue Gene
• QCDOC
– designed for QCD by Columbia, IBM & Edinburgh
– first of three to begin operation, exploited by an international team
– high compute/communications ratio, low memory requirement
– => low-latency 6D torus
– 14,000 pes, 11 Tflops peak, sustains 35% on full code
• Blue Gene
– providing capability service for key projects
– nuclear materials simulation; financial modelling; complex fluids;
systems biology, physical chemistry …
– aim to keep number of projects small
– relatively easy to port applications
– library optimisation jointly with IBM
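The 6D torus mentioned for QCDOC gives every processing element two nearest neighbours along each of six axes, with wrap-around at the edges. The sketch below shows how neighbours are found in a generic torus; the machine dimensions in the usage note are made up for illustration and are not QCDOC's real partition shape.

```python
def torus_neighbours(coord, dims):
    """Return the nearest neighbours of `coord` in a torus of shape `dims`.

    Each axis contributes two neighbours (one step down, one step up),
    and the modulo makes the edges wrap around.
    """
    neighbours = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            neighbour = list(coord)
            neighbour[axis] = (neighbour[axis] + step) % size
            neighbours.append(tuple(neighbour))
    return neighbours
```

For a hypothetical 4x4x4x4x4x4 machine, `torus_neighbours((0, 0, 0, 0, 0, 0), (4,) * 6)` returns 12 distinct neighbours, including the wrapped one at `(3, 0, 0, 0, 0, 0)`. Every element is a single hop from each of its neighbours, which suits codes that exchange data only with nearby lattice sites.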
FPGA
• in collaboration with Xilinx, AlphaData, Nallatech & Algotronix
– builds on Scottish leadership in FPGAs
– aim is 64-pe FPGA-based parallel computer for scientific/industrial use
– EPCC:
– has designed the architecture
– is developing parallel toolkit
– will port demonstrator commercial applications
biological systems
• Jason Crain is investigating how proteins
assemble from amino acid chains into
biologically-functional structures
– diseases such as cystic fibrosis, CJD and Alzheimer's
are attributed to protein mis-folding
– joint research with IBM
• Igor Goryanin is integrating gene, protein
and metabolite data to understand how
biological systems function
– better understanding of the network of
processes in living organisms
– developing cellular, organ and organism models
new materials
• Mike Cates has invented a new generic way
to make gels involving particles suspended
in a mixture of two solvents
– use in, eg, cosmetics, foodstuffs, oil
exploration and pharmaceuticals
• Graeme Ackland is simulating the way
plasma damages fission reactors
– lead to better design in fusion reactors?
• Paul Madden is looking at simulating
material behaviour that is unmeasurable
– eg. lower edge of the Earth’s mantle
financial modelling
• Andreas Grothey and Jacek Gondzio are
modelling financial assets and liabilities
– maximise expected gain and minimise the
associated risk of investments
– OOPS makes the solution of general large
nonlinear financial problems feasible.
• … a truly large scale optimisation problem
– even moderate numbers of assets, time stages and scenarios lead
to problems with huge dimensions.
• using Blue Gene, we have been able to solve much larger
problems than previously possible
– solved a problem with 500 Million variables in under 2 hours.
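To see why even moderate numbers of assets, time stages and scenarios produce huge problems, consider a simple scenario-tree count. This is a generic textbook-style model, assumed here for illustration. It is not necessarily the formulation used by OOPS, and the function names and example sizes are hypothetical.

```python
def scenario_tree_nodes(branching, stages):
    """Count the nodes in a scenario tree.

    Assumes every node branches into `branching` scenarios at each of
    `stages` time stages after the root.
    """
    return sum(branching ** t for t in range(stages + 1))


def problem_variables(assets, branching, stages):
    """Rough variable count, assuming one holding per asset per tree node."""
    return assets * scenario_tree_nodes(branching, stages)


# Illustrative sizes only: 50 assets, 10 scenarios per stage.
for stages in range(1, 7):
    print(stages, problem_variables(assets=50, branching=10, stages=stages))
```

In this model the variable count grows roughly tenfold with every extra time stage, so modest-looking inputs quickly reach the hundreds of millions of variables quoted above.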