NeSI: Providing HPC Resources for the NZ Science Community
Dr. Michael J Uddstrom
Director, NIWA HPCF
(on behalf of the NeSI Team)
michael.uddstrom@niwa.co.nz
Outline
• NeSI – How it Happened…
• NeSI – Goals & Purposes;
• Defining HPC & Fit for Purpose;
• NeSI Capabilities:
  – HPC Resources;
  – Services.
• How to gain Research Access:
  – The Process;
  – The Allocation Classes;
  – The responsibilities.
• What we need to do – to safeguard the future…
• Summary;
• Contact details.
How did NeSI Arise?
• In October 2010 an Investment Case entitled:
  – “National eScience Infrastructure (NeSI) High Performance Computational Platforms and Services for NZ’s Research Communities”
  was submitted to the Minister of Research, Science & Technology.
• It was prepared by a Working Group of representatives from UoA, UC, UoO, Landcare, AgResearch & NIWA (under an independent Chair).
• The Investment Case asserted that:
– HPC and related eScience infrastructure are indispensable components of
modern science, and are having a major impact on almost every branch of
research;
– By taking a sector approach, more efficient coordination and cooperation
would be achieved, leading to strategically targeted investment in HPC;
– Thereby providing international-scale HPC to a wide range of communities
and disciplines.
• It was formulated following a “Needs Analysis” from 2010.
What “we” said about our HPC needs…
• In 2010 the NZ research community was surveyed to determine its existing and anticipated HPC requirements. We (~194 of us) said (e.g.):

  Processors to run a code     < 10       10 - 100    100 - 1000   > 10,000
    In 2010                     40%        39%         15%          5%
    In 2015                     12%        32%         38%         17%

  File space per experiment    < 100 GB   1 - 10 TB   100 TB       > 1 PB
    In 2010                     64%        35%         0%           0.6%
    In 2015                     27%        58%        13%           3%

  Off-site data transfers/day  < 100 MB   1 GB        1 TB         > 10 TB
    In 2010                     33%        42%        21%           4%
    In 2015                     17%        28%        33%          23%
How is NeSI Funded?
• There are three anchor partners (Principal Investors):
  – University of Auckland (with Crown funding of ~$2.2M pa);
  – University of Canterbury (with Crown funding of ~$2.6M pa);
  – NIWA (with Crown funding of ~$1.0M pa);
• And two associate members (Associate Investors): Landcare and University of Otago.
• The Crown’s investment is $27M over 4 years;
• Institutional investment (partners & associates) is $21M over 4 years
  – $15.4M in Capital Expenditure & $5.6M in Operational Expenditure
• Which provides:
  – 4 × HPC systems at the University of Auckland, University of Canterbury, and NIWA;
  – NeSI Directorate at Auckland (5 staff positions + admin): 5.5 FTEs;
  – Systems Engineers: ~5.5 FTEs; Site Management: 1.8 FTEs;
  – HPC Specialist Scientific Programmers: ~5.7 FTEs.
• The “Partners” (called collaborators) control 60% of the HPC for their own purposes; the remaining 40% is available for “Merit” / “Research” Access.
What is NeSI Focused on?
• Creating an advanced, scalable computing infrastructure to support
New Zealand’s research communities;
– i.e. International scale as opposed to institutional/department scale.
• Providing the grid middleware, research tools and applications, data
management, user-support, and community engagement needed for
the best possible uptake (of HPC);
– i.e. Enable efficient use of these systems.
• Encouraging a high level of coordination and cooperation within the
research sector;
– i.e. Fit the science to the HPC as opposed to “my institution’s resources”.
• Contributing to high quality research outputs from the application of
advanced computing and data management techniques and
associated services, which support the Government’s published
priorities for science.
– i.e. It’s all about better science to underpin national outcomes.
If it’s about HPC… what is an HPC?
• There are two “basic” types:
– Capability (aka Supercomputer): provides the maximum computing power available to solve large problems; the emphasis is on problem size (large memory, lots of CPUs), e.g.:
  • IBM p775/p7 & p575/p6, Cray XK6, IBM BG/Q & BG/P
– Capacity: typically uses efficient, cost-effective computing power; the emphasis is on throughput (dealing with loads larger than a single PC/small cluster), e.g.:
• IBM iDataPlex, HP Cluster Platform n000
• The essential differences are:
– the interconnect fabric performance;
– the processor performance, and
– reliability (i.e. resiliency to component failure).
• Supercomputers have high efficiency:
– Efficiency = sustained-performance/peak-performance (%).
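  – e.g. (illustrative numbers only, not a NeSI benchmark): a code sustaining 5 TFLOPS on a machine with a 34 TFLOPS peak is running at 5/34 ≈ 15% efficiency; capability systems are engineered to keep that fraction high.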
The Limits to Scalability: Amdahl’s Law
• Amdahl's law states that if P is the proportion of a program that can be
made parallel, and (1 − P) is the proportion that cannot be parallelized
(remains serial), then the maximum speedup that can be achieved by
using N processors is:
  speedup(N) = 1 / ((1 − P) + P/N)
• In the limit, as N tends to infinity, the maximum speedup tends to 1 / (1
− P). In practice, performance to price ratio falls rapidly as N is
increased once there is even a small component of (1 − P).
  e.g. if P = 0.99 and N = 64, then speedup = 39.3 (perfect = 64) ✓
       if P = 0.99 and N = 1024, then speedup = 91.2 (perfect = 1024) ×
       if P = 0.90 and N = 1024, then speedup = 9.9 (perfect = 1024) ××!
  (a short sketch reproducing these numbers follows below)
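As a quick illustration of the numbers above, the formula is easy to evaluate yourself. The short Python sketch below (the function name and the chosen P/N pairs are purely illustrative) reproduces the three cases:

```python
def amdahl_speedup(p, n):
    """Maximum speedup under Amdahl's law for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# The three cases quoted above:
for p, n in [(0.99, 64), (0.99, 1024), (0.90, 1024)]:
    print(f"P={p}, N={n}: speedup = {amdahl_speedup(p, n):.1f} (perfect = {n})")
# Prints speedups of 39.3, 91.2 and 9.9 respectively.
```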
• It’s a Challenge… but this is “our” future!!
What types of HPC do we need?
• It depends on the problem (and the data locality)!
• Is it “Embarrassingly Parallel” (EP)?
  – This means the problem can be split into independent tasks (P ≈ 1), with each sent to a different processor:
    • e.g. image rendering, classification, Monte Carlo calculations, BLAST, genetic algorithms, etc. (a minimal EP example is sketched after this list).
• If not EP, then is it highly scalable?
  – This means the problem does not place high demands on processor performance – because the coupling between processors is relatively “loose” (P > 0.999) and you can use very many:
    • e.g. materials codes, Schrödinger’s equation, DCA++, LSMS, NWChem…
• If not EP and not highly scalable – is it tightly coupled?
  – This means the problem will place high demands on processor performance and on the interconnect between processors (P > ~0.9):
    • e.g. numerical weather and climate prediction, variational data assimilation, combustion.
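To make the EP case concrete, here is a minimal sketch (not a NeSI-specific recipe; the worker count and sample size are arbitrary) of an embarrassingly parallel Monte Carlo estimate of π using Python's standard multiprocessing module; each task is fully independent, so P ≈ 1:

```python
import random
from multiprocessing import Pool

def estimate_pi(n_samples):
    """One independent task: Monte Carlo estimate of pi from n_samples random points."""
    hits = sum(1 for _ in range(n_samples)
               if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return 4.0 * hits / n_samples

if __name__ == "__main__":
    # Each worker runs an independent task; results are only combined at the end,
    # so the work farms out across however many cores (or cluster nodes) are free.
    with Pool(processes=8) as pool:
        estimates = pool.map(estimate_pi, [1_000_000] * 8)
    print("pi ~=", sum(estimates) / len(estimates))
```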
Applications running on Jaguar at ORNL (2011)
  Domain area    Code name    Institution      # of cores   Performance   Notes
  Materials      DCA++        ORNL             213,120      1.9 PF        2008 Gordon Bell Prize Winner
  Materials      WL-LSMS      ORNL/ETH         223,232      1.8 PF        2009 Gordon Bell Prize Winner
  Chemistry      NWChem       PNNL/ORNL        224,196      1.4 PF        2008 Gordon Bell Prize Finalist
  Nanoscience    OMEN         Duke             222,720      > 1 PF        2010 Gordon Bell Prize Finalist
  Biomedical     MoBo         GaTech           196,608      780 TF        2010 Gordon Bell Prize Winner
  Chemistry      MADNESS      UT/ORNL          140,000      550 TF
  Materials      LS3DF        LBL              147,456      442 TF        2008 Gordon Bell Prize Winner
  Seismology     SPECFEM3D    USA (multiple)   149,784      165 TF        2008 Gordon Bell Prize Finalist
  Combustion     S3D          SNL              147,456      83 TF
  Weather        WRF          USA (multiple)   150,000      50 TF         Just 3.6% of DCA++ performance

  Highly scalable codes (top rows) sustain a high fraction of peak performance; tightly coupled problems (bottom rows) sustain only a small fraction of peak.
NeSI HPCs in Summary
• University of Auckland:
– IBM iDataPlex Intel processor Cluster, large node memory + some exotic
hardware (i.e. GPGPUs) (Pan)
• General purpose HPC cluster
• Optimised for EP and Highly Scalable problems;
• University of Canterbury:
– IBM BlueGene/P Supercomputer
• Optimised for EP and Highly Scalable problems;
– IBM p755/POWER7 cluster
• General purpose / capability HPC cluster
– IBM iDataPlex Visualisation Cluster
• NIWA High Performance Computing Facility:
– IBM p575/POWER6 Supercomputer (FitzRoy)
• Optimised for tightly coupled (large) problems
• Operational/production-ready (i.e. IBM Support arrangements)
NeSI Auckland / Centre for eResearch
• Pan cluster:
– IBM iDataPlex: dx360 M3 & M4 servers (912 → 1,904 Intel cores)
  • 76 nodes (Westmere: 12 cores / SMP node, 96 GB/node) (64-bit)
  • 12 GPGPU nodes (Tesla M2090: 2 devices/node)
  • 62 nodes (Sandy Bridge: 16 cores / SMP node, 128 GB/node)
  • UoO nodes…
  • InfiniBand interconnect
  • Linux Red Hat 6.1
• Storage (General Parallel File System – GPFS)
– 200 TB SAN User Disk
• BeSTGRID Auckland cluster
– Commodity hardware (approx. 500 cores)
• Grid submission support:
– grisu, gricli, Jobs Online and other clients.
NeSI Auckland / Centre for eResearch
• DataFabric (Federated Data Sharing Service):
  – Hosted at UoA and UC;
  – Integrated Rule Oriented Data Service (iRODS);
  – Tuakiri authentication;
  – Web interface / WebDAV / FUSE / GridFTP;
  – http://df.auckland.ac.nz/
• Applications:
– Preinstalled (some licensing may be required):
• Math: Gap, Magma, Matlab, Mathematica, R
• BioInformatics: BLAST, BEAST, beagle, PhyML,
MrBayes, BEDtools, Bamtools, Bowtie, Clustal Omega,
Cufflinks, FastQC, FASTX Toolkit
• Computational Chemistry: Gaussian, Gromacs,
AMBER, Orca, VASP
• Engineering: Ansys, Abaqus, OpenFOAM
• Meteorology: WRF, WPS
NeSI Auckland / Centre for eResearch
• Scientific libraries
– Compilers: Fortran, C & C++ (gcc, Intel & PGI)
– BLAS, LAPACK/LAPACK++, ATLAS, FFTW…
• Support for custom built applications:
  – Batch submission (non-interactive, non-GUI, preferably parallel processing)
  – Compilers (GNU, PGI C/C++ and Fortran, Intel)
  – Java, Python, OpenMPI (a minimal MPI example is sketched below)
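As a sketch only (it assumes an MPI-enabled Python, e.g. mpi4py built against the cluster's OpenMPI, is available; that is not stated above), a minimal MPI "hello world" suitable for a batch, non-interactive run looks like this:

```python
# hello_mpi.py – launch with, e.g.:  mpirun -np 16 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD              # communicator spanning all MPI ranks in the job
rank = comm.Get_rank()             # this process's id (0 .. size-1)
size = comm.Get_size()             # total number of processes

# Each rank reports its host name; rank 0 gathers and prints a summary.
names = comm.gather(MPI.Get_processor_name(), root=0)
if rank == 0:
    print(f"{size} ranks running on nodes: {sorted(set(names))}")
```

Under OpenMPI the same script runs unchanged on one node or many; the 16-rank launch line in the comment is just an example.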
• CeR NeSI Staff:
– Service Delivery Manager: Marcus Gustafsson
– Systems Engineers: Yuriy Halytskyy, Aaron Hicks, + 1 TBD
– HPC Specialist Programmers: Markus Binsteiner, Martin Feller, Ben Roberts, Gene
Soudlenkov, + 2 TBD
– https://wiki.auckland.ac.nz/display/CERES/Centre+for+eResearch
NeSI HPCF / NIWA
• FitzRoy: IBM p575/p6 Supercomputer;
– Being upgraded in Q4 2012;
– 58 (→ 108) POWER6 nodes;
  • 32 cores / SMP node (64-bit):
    – 1,856 × 4.7 GHz (→ 3,456) cores;
    – 34 (→ 66) TFLOPS peak;
      • 602 GFLOPS / node;
    – 5.3 (→ 8.5) TB Memory:
• 64 and 128 GB memory nodes.
• InfiniBand interconnect fabric;
• Storage: General Parallel File System (GPFS):
– 790 TB SAN user disk;
– 5 PB Automatic Tape Library storage with Hierarchical Storage
Management;
• AIX Operating System.
NeSI HPCF / NIWA
• Operating Environment:
  – Documentation: https://teamwork.niwa.co.nz/display/HPCF
  – SSH/X logon (via dedicated NeSI login node);
  – LoadLeveler (batch queue) to run non-interactive jobs (a sketch of a job script follows below);
  – IBM (xl) Fortran (77, 90, 95, 2003), C and C++ compilers;
  – IBM High Performance Computing Toolkit (MPI, OpenMP, etc.);
  – TotalView graphical debugger;
  – Third Party Software (e.g.):
    • Make, CMake, Python, Java, git, Subversion, GSL, Hypre, LAPACK, ParMETIS, FFTW, NetCDF (3 & 4), parallel-NetCDF, HDF5, JasPer, VisIt;
    • Access any set of specific s/w versions via MODULES.
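To make the batch workflow above concrete, here is a minimal sketch of a LoadLeveler job script. The class name, node/task counts, module name and executable are illustrative assumptions only; the real queue classes and module names on FitzRoy are site-specific:

```
# run_model.ll – illustrative LoadLeveler job script; submit with: llsubmit run_model.ll
#@ job_name         = my_model_run
#@ job_type         = parallel
#@ class            = General
#@ node             = 2
#@ tasks_per_node   = 32
#@ wall_clock_limit = 02:00:00
#@ output           = $(job_name).$(jobid).out
#@ error            = $(job_name).$(jobid).err
#@ queue

# Select software versions via MODULES, then launch the MPI executable with poe.
module load netcdf
poe ./my_model
```

Job status can then be checked with llq, and a queued job cancelled with llcancel.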
• NIWA NeSI Support Staff:
– Service Delivery Manager: Michael Uddstrom (0.05 FTE)
– Systems Engineers: Chris Edsall, Fabrice Cantos (0.21 FTE each)
– HPC Specialist Scientific Programmer: Mark Cheeseman (0.21 FTE)
• Expected system uptime: >99.5%
NeSI HPCF / NIWA Applications
• Unified Model (UK Met Office / Hadley Centre):
  – Weather forecasting (global & regional – to 100 m resolution);
  – 3DVAR & 4DVAR data assimilation;
  – Regional Climate modelling – HadGEM3-RA;
  – Coupled (atmospheric, ocean, land, sea ice) earth simulation – HadGEM3;
  – Chemistry Climate Modelling – UKCA.
• Ocean Modelling:
– ROMS (Regional Ocean Model);
– NEMO (Global Ocean Model).
• Wave Modelling:
– WaveWatch 3, SWAN.
• CFD: Gerris (self refining grid);
• Typical job sizes:
– 64 – 1024 cores & O(10) GB output per job.
NeSI BlueFern / University of Canterbury
• IBM BlueGene/P:
– 2048 nodes (4 cores / SMP node):
• 8192 × 0.85 GHz PowerPC 450 cores (32 bit);
• 4 GB memory / node;
• 13.6 GFLOPS / node, 23 TFLOPS (peak).
– 3 Dimensional Torus Interconnect.
• IBM p755/POWER7 Cluster (split between AIX & Linux):
– 13 nodes (32 cores / SMP node):
• 416 × 3.3 GHz POWER7 cores (64 bit);
• 1.7 TB Memory (128 GB / node);
• 769 GFLOPS/node, 10 TFLOPS (peak).
– InfiniBand Interconnect.
• Storage: General Parallel Filesystem (GPFS):
– 180 TB SAN user disk;
– 1 PB Automatic Tape Library storage with Hierarchical Storage Management.
NeSI BlueFern / University of Canterbury
• IBM iDataPlex Visualisation Cluster
– 5 nodes (8 cores / SMP node)
• 40 × 3.03 GHz Intel Xeon cores
• 2 × Tesla M2070Q GPUs / node
• 96 GB Memory / node
• Applications:
– BG/P
• Molecular Dynamics (NAMD, LAMMPS, VASP etc), Weather Forecasting
(WRF), Protein Folding/Docking (AMBER, GROMACS, etc), Monte Carlo &
researcher codes
– P755/POWER7
• Fluid Dynamics (Fluent/CFX), Genomics (MrBayes etc), Numerical (Octave,
R), interpreted languages (Java, Python+SciPy/NumPy) & researcher codes
– Visualization
• Visualization tools (VTK, ParaView, VisIt etc.) and high-speed remote graphical
sessions (vizstack, turboVNC, etc.)
NeSI BlueFern / University of Canterbury
• Operating Environment:
– Documentation:
wiki.canterbury.ac.nz/display/BlueFern/BlueFern+User+Documentation+Wiki
– SSH/X logon
– VNC Remote Desktop
– LoadLeveler (Batch Queue) to run non-interactive jobs;
– IBM (xl) Fortran (77, 90, 95, 2003), C and C++ compilers;
– IBM High Performance Computing Toolkit (MPI, OpenMP, etc.);
– TotalView graphical debugger;
– Libraries (e.g.): LAPACK, ScaLAPACK, BLAST/MpiBlast, ESSL, FFTW, GSL
– IDE: Eclipse PTP
• Support:
– Service Delivery Manager: Dan Sun
– HPC Support Consultants: François Bissey, Céline Cattoën-Gilbert, Tony
Dale, Vladimir Mencl
– bluefern@canterbury.ac.nz
Accessing NeSI Resources (I)
• NeSI is focused on providing high quality access to the most important
science that needs HPC.
• This means that NeSI HPC Allocations are given to Research Projects
that meet the following simple criteria:
– The science has been peer reviewed (and approved);
– There is a demonstrable need to use HPC:
• i.e. Researchers / PIs need to submit a Technical Proposal to NeSI explaining
what they need, and why – which is assessed by a Panel of Experts.
• Meet these, and you will be granted a “Research Allocation” on one or
more NeSI HPCs… and NeSI will subsidise 80% of the cost of the
HPC core-hours that you need:
– In return… your project needs to fund the remaining 20%…
– And… now that I have your attention…
• It is recognised that this process will take time – as it needs
researchers to explicitly fund HPC in their proposals…
Accessing NeSI Resources (II)
• During this transition period …
– If you meet the Science and Technical Requirements, but are unable to
pay even the 20% then you will still be provided with an HPC Allocation
(i.e. core hours) as “Research-Unfunded” – but:
• Your jobs will have low priority;
• “Research-Funded” projects (i.e. those that have paid for HPC time) will always have higher priority.
• So how much do NeSI HPC resources cost?

  Platform             Cost per core-h (before subsidy)   Minimum Allocation Unit
  UoA iDataPlex        $0.20                              12 or 16 cores / node
  NIWA HPCF p575/P6    $0.20                              32 cores / node
  UC BG/P              $0.05                              256 cores / partition
  UC p755/P7           $0.20                              32 cores / node

• PIs’ projects will pay only 20% of the cost per core-h indicated above (a worked example follows below).
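  – As an illustration (the request size is hypothetical): a project needing 100,000 core-h on the UoA iDataPlex represents 100,000 × $0.20 = $20,000 before subsidy, so with NeSI covering 80% the project would fund ~$4,000. The same 100,000 core-h on the UC BG/P would be 100,000 × $0.05 = $5,000, with the project funding ~$1,000.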
Accessing NeSI Resources (III)
• So what benefits do Research Allocation Users get?
– Can seek up to 1,000,000 core hours on the UoA, NIWA and/or UC BG/P
systems
  – Expert advice and assistance for getting their codes running on these systems (i.e. there are 5.7 FTEs of specialist support available)
  – No charge for data storage (or for data transfers)…
  – Can use NeSI eResearch infrastructure (post-BeSTGRID tools), e.g.:
    • DataFabric (iRODS) for sharing data
    • Grisu & Gricli grid submission tools
    • GridFTP to transfer data, etc.
    • Tuakiri authentication
• What if I have never used an HPC and don’t know where to begin?
– You can apply for a “Proposal Development” Allocation…
Accessing NeSI Resources (IV)
• Proposal Development Allocations are designed to:
– Provide limited access to the HPCs at no cost to the PI;
– Provide an opportunity to:
• Become familiar with the operating environments on the HPC(s)
• Port / implement application codes on the target HPC(s)
• Do preliminary runs to determine scalability and suitability of the HPC(s) for
your problem (is it fit for purpose?)
• Enable you to complete a Technical Proposal for a Research Allocation.
– Conditions:
• One Allocation concurrently per project per PI;
• Public display of project description and results required.
• Teaching Allocations:
  – Available on UoA and UC systems;
  – To support academic education classes or training workshops;
  – Require a publicly viewable class or workshop description;
  – No cost.
Proposal Development Allocation Application
• Does it meet the conditions for a Proposal Development Allocation?
• NeSI Technical Application requirements:
  – Contact details of PI and the team members, etc.
  – Title
  – Outline the scientific goals of the potential future project
  – It would be useful to outline the team’s HPC experience to date
  – Indicate which HPC architecture is likely to be most suitable
  – List the codes to be used
  – Development tools needed (such as compilers, debuggers, profilers, etc.)
  – Estimate how much specialist help will be required (e.g. for software porting & installation, etc.)
  – Indicate how much data storage will be needed.
• NeSI staff will be able to assist you in responding to these questions.
Research Allocation Application
• Does your project meet the Scientific Requirements Test?
• NeSI Technical Application requirements:
  – Contact details of PI and the team members, etc.
  – Title (should relate to the Science proposal title)
  – Outline the scientific goals / hypothesis and say why you need HPC
  – Outline the deliverables of the overall project (e.g. # papers)
  – Outline the team’s HPC experience & the HPC architectures it has worked on in the past (incl. any Proposal Development experience)
  – Indicate which HPC architecture is likely to be most suitable (and why)
    • List the codes to be used and parallelisation methods planned (if any)
    • Indicate the scalability of the code(s) on the target architecture
  – Indicate all software requirements (libraries, compilers) – any licensing?
  – Specify the size of allocation requested (i.e. core-hours)
  – Indicate data storage requirements
  – Estimate how much specialist help will be required (e.g. for optimisation)
NeSI Status Reflection / Review (at 07/12)
• Creating an advanced, scalable computing infrastructure to support
New Zealand’s research communities;
– 3 international scale HPCs operating: A grade
• Providing the grid middleware, research tools and applications, data
management, user-support, and community engagement needed for
the best possible uptake (of HPC);
– NeSI is staffed (few open positions), grid middleware being developed, community
engagement underway (HPC Workshop, Presentations beginning): C+ Grade
• Encouraging a high level of coordination and cooperation within the
research sector;
– Will always be a challenge – but Auckland, Canterbury & NIWA working together
for the Good of NZ Science: B+ Grade
• Contributing to high quality research outputs… which support the
Government’s published priorities for science.
– Too early to tell – but we need to do so within the next 12 months: No Grade yet…
Summary
• NeSI is a big new investment by the Government in HPC for NZ Science
• It is making a world-class HPC ecosystem available to NZ Science
• It is a collaboration between Universities & CRIs
• It is funded till June 2014 – but will need to prove its success by
September 2013
• To be successful… (i.e. attract ongoing funding & HPC access)
– NZ Scientists will need to demonstrate their need for HPC (see User
Needs Survey)
– This means transitioning from the problems that I can solve on “my” PC,
and/or departmental cluster – to large scale HPC provided by NeSI;
• This is a function of the funding-round cycle too…
• It will take time to learn new programming methods & tools: MPI, OpenMP, and
new Operating Environments
– In the presence of PBRF and Publication pressures…
Summary
• And:
– PIs will have to demonstrate the value of NeSI by funding 20% of their
access costs
• This has implications for the way our Institutions provide access to Operational
Expenditure (most prefer to provide Capital Expenditure)
– Research Projects using NeSI will need to generate excellent science
(that could not be done without these HPCs)
– Contribute to Government Outcomes
• In which case we can expect a long period of HPC funding in the
years ahead.
Where to get Help
• NeSI Central:        http://www.nesi.org.nz/
• Access Policy:       http://www.nesi.org.nz/access-policy
• Eligibility:         http://www.nesi.org.nz/eligibility
• Allocation Classes:  http://www.nesi.org.nz/allocations
• Application Forms:   http://www.nesi.org.nz/apply
• Calls Timetable:     http://www.nesi.org.nz/timetable
• Storage:             http://www.nesi.org.nz/files-and-data
• Case Studies:        http://www.nesi.org.nz/case-studies
• Sites:
  – CeR:               http://www.eresearch.auckland.ac.nz/uoa/
  – NIWA:              http://www.niwa.co.nz/our-services/hpcf
  – UC:                http://www.bluefern.canterbury.ac.nz/
• NeSI Staff are both here to help, and willing to help!
Extra Slides
What we said we Need/will Need in HPC