Blue Waters and PPL’s Role
Celso Mendes & Eric Bohm
Parallel Programming Laboratory
Dep. Computer Science, University of Illinois
Outline
• Blue Waters System
• NCSA Project
• Petascale Computing Facility
• Machine Characteristics
• PPL Participation
• PPL’s Role in Blue Waters
• Object-Based Virtualization
• BigSim Simulation System
• NAMD Petascale Application
• Conclusion
• How to Join Us
• Acknowledgements
2
Blue Waters System
• Comes online in 2011 at NCSA
• World’s first sustained petascale system for open
scientific research
• Hundreds of times more powerful than today’s typical
supercomputer
• 1 quadrillion calculations per second sustained
• Collaborators:
• University of Illinois/NCSA
• IBM
• Great Lakes Consortium for Petascale Computation
3
The Blue Waters Project
• Will enable unprecedented science and engineering
advances
• Supports:
• Application development
• System software development
• Interaction with business and industry
• Educational programs
• Includes Petascale Application Collaboration Teams
(PACTs) that will help researchers:
• Port, scale, and optimize existing applications
• Create new applications
4
Blue Waters – How We Won
• Two years from start to finish to develop proposal and
go through intense competition and peer-review
process
• Rivals from across the country included universities and national labs in
California, Tennessee, and Pennsylvania
• Illinois offered an excellent, open site; an unparalleled technical team;
collaborators from across the country; and an intense focus on scientific
research
• Leverages $300M DARPA investment in IBM technology
• 3 years of development, followed by 5 years of
operations. Blue Waters will come online in 2011 and be
retired or upgraded in 2016
5
Blue Waters – What’s a Petaflop?
One quadrillion calculations per second!
If you performed one calculation per second (multiplying two 14-digit numbers
each time), it would take (see the quick check after this slide):
• 32 years to complete 1 billion calculations
• 32 thousand years to complete 1 trillion calculations
• 32 million years to complete 1 quadrillion calculations
For perspective:
• 32 years ago, America celebrated its bicentennial
• 32 thousand years ago, early cave paintings were completed
• 32 million years ago, the Alps were rising in Europe
6
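These figures follow directly from one calculation per second: a billion seconds is roughly 31.7 years, and each further factor of 1,000 scales the time accordingly. A quick standalone check of that arithmetic (an illustration added here, not from the original slides):

    #include <cstdio>

    int main() {
        const double secondsPerYear = 365.25 * 24 * 3600;   // ~3.16e7 seconds
        const double counts[] = {1e9, 1e12, 1e15};           // billion, trillion, quadrillion
        for (double n : counts)
            std::printf("%.0e calculations -> %.1f years\n", n, n / secondsPerYear);
        return 0;
    }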
Blue Waters – The lay of the land
Blue Waters is the powerhouse of the National Science
Foundation’s strategy to support supercomputers for
scientists nationwide
• Blue Waters (NCSA/Illinois): 1 petaflop sustained
• Roadrunner (DOE/Los Alamos): 1.3 petaflops peak
• Ranger (TACC/Texas): 504 teraflops peak
• Kraken (NICS/Tennessee): 166 teraflops peak (with upgrade to come)
• Campuses across the U.S. (several sites): 50-100 teraflops peak
7
Petascale Computing Facility
• Future home of Blue Waters and other NCSA hardware
• 88,000 square feet, with a 20,000-square-foot machine room
• Water-cooled computers are about 40 percent more efficient than air-cooled ones
• Onsite cooling towers save even more energy
8
Blue Waters – Interim Systems
An interesting challenge: The IBM POWER7 hardware on
which Blue Waters will be based isn’t available yet. NCSA
has installed four systems to prepare for Blue Waters:
• “BluePrint,” an IBM POWER575+
cluster for studying the software
environment
• Two IBM POWER6 systems for
developing the archival storage
environment and scientific
applications
• An x86 system running “Mambo,” an
IBM system simulator that allows
researchers to study the performance
of scientific codes on Blue Waters’
POWER7 hardware
9
Selection Criteria for Petascale Computer
• Maximize Core Performance
… to minimize number of cores needed for a given level of performance as
well as lessen impact of sections of code with limited scalability
• Incorporate Large, High-bandwidth Memory Subsystem
… to enable the solution of memory-intensive problems
• Maximize Interconnect Performance
… to facilitate scaling to the large numbers of processors required for
sustained petascale performance
• High-performance I/O Subsystem
… to enable solution of data-intensive problems
• Maximize System Integration, Leverage Mainframe
Reliability, Availability, Serviceability (RAS) Technologies
… to assure reliable operation for long-running, large-scale simulations
10
Blue Waters - Main Characteristics
• Hardware:
• Processor: IBM POWER7 multicore architecture
• More than 200,000 cores will be available
• Capable of simultaneous multithreading (SMT)
• Vector multimedia extension (VMX) capability
• Four or more floating-point operations per cycle
• Multiple levels of cache: L1, L2, and shared L3
• 32 GB+ memory per SMP, 2 GB+ per core
• 16+ cores per SMP
• 10+ petabytes of disk storage
• Network interconnect with RDMA technology
11
Blue Waters - Main Characteristics
• Software:
• C, C++, and Fortran compilers
• UPC and Co-Array Fortran compilers
• MASS, ESSL, and Parallel ESSL libraries
• MPI, MPI-2, OpenMP
• Low-level active-messaging layer
• Eclipse-based application development framework
• HPC and HPCS toolkits
• Cactus framework
• Charm++/AMPI infrastructure
• Tools for debugging at scale
• GPFS file system
• Batch and interactive access
12
Blue Waters Project Leadership
• Thom H. Dunning, Jr. – NCSA
• Project Director
• Bill Kramer - NCSA
• Deputy Project Director
• Wen-mei Hwu – UIUC/ECE
• Co-Principal Investigator
• Marc Snir – UIUC/CS
• Co-Principal Investigator
• Bill Gropp – UIUC/CS
• Co-Principal Investigator
To learn more about Blue Waters: http://www.ncsa.uiuc.edu/BlueWaters
13
PPL Participation in Blue Waters
• Since the very beginning…
14
Current PPL Participants in Blue Waters
• Leadership:
• Prof. Laxmikant (Sanjay) Kale
• Research Staff:
• Eric Bohm *
• Celso Mendes *
• Ryan Mokos
• Viraj Paropkari
• Gengbin Zheng #
• Grad Students:
• Filippo Gioachin
• Chao Mei
• Phil Miller
• Admin. Support:
• JoAnne Geigner *
* Partially funded    # Unfunded
15
PPL’s Role in Blue Waters
• Three Activities:
a) Object-Based Virtualization – Charm++ & AMPI
b) BigSim Simulation System
c) NAMD Application Porting and Tuning
• Major Effort Features:
• Deployments specific to Blue Waters
• Close integration with NCSA staff
• Leverages PPL’s other research funding
• DOE & NSF (Charm++), NSF (BigSim), NIH (NAMD)
16
Object-Based Virtualization
• Charm++ is widely used software that has been ported to many different
parallel machines (a minimal sketch follows this slide)
• OS platforms including Linux, AIX, MacOS, Windows,…
• Object and thread migration is fully supported on those platforms
• Many existing applications based on Charm++ have
already scaled beyond 20,000 processors
• e.g. NAMD, ChaNGa, OpenAtom
• Adaptive MPI (AMPI): designed for legacy MPI codes
• MPI implementation based on Charm++; supports C/C++/Fortran
• Usability has been continuously enhanced
• Scope of PPL Work:
• Deploy and optimize Charm++, AMPI and possibly other virtualized
GAS languages on Blue Waters
17
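To make object-based virtualization concrete, here is a minimal Charm++ sketch (an illustration written for this summary, not code from Blue Waters or the slides): a 1D chare array with many more elements than physical processors. The runtime system maps the elements to processors and, because array elements provide a migration constructor (and a PUP routine for any data they hold), it can migrate them for load balance. Module and class names are made up.

    // hello.ci -- Charm++ interface file (illustrative names)
    mainmodule hello {
      mainchare Main {
        entry Main(CkArgMsg* m);
      };
      array [1D] Hello {
        entry Hello();
        entry void sayHi();
      };
    };

    // hello.C
    #include "hello.decl.h"

    class Main : public CBase_Main {
     public:
      Main(CkArgMsg* m) {
        // Create far more objects than processors; the runtime places them.
        CProxy_Hello arr = CProxy_Hello::ckNew(64);
        arr.sayHi();                       // broadcast to every array element
        delete m;
      }
    };

    class Hello : public CBase_Hello {
     public:
      Hello() {}
      Hello(CkMigrateMessage*) {}          // required so elements can migrate
      void sayHi() {
        CkPrintf("Element %d running on PE %d\n", thisIndex, CkMyPe());
        if (thisIndex == 0) CkExit();      // simplified; real codes use a reduction
      }
    };

    #include "hello.def.h"

AMPI applies the same idea to MPI codes: each MPI rank becomes a migratable user-level thread, so an existing MPI program can run with more virtual processors than physical ones.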
Current Charm++/SMP Performance
[Figure] Improvement on K-Neighbor Test (24 cores, March 2009)
18
BigSim Simulation System
• Two-Phase Operation:
• Emulation: Run actual program with AMPI, generate logs
• Simulation: Feed logs to a discrete-event simulator
• Multiple Fidelity Levels Available
• Computation: simple scaling factor, hardware performance counters, or a
cycle-accurate processor simulator
• Communication: latency/bandwidth model only, or a full contention-based
network model (a minimal latency/bandwidth sketch follows this slide)
19
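For intuition about the fidelity levels above, the simplest communication model charges each message a fixed latency plus a bandwidth term and ignores contention. A minimal sketch of that model (parameter values are illustrative, not BigSim's actual defaults):

    #include <cstdio>

    // Predicted transfer time under a no-contention latency/bandwidth model.
    double messageTime(double latencySec, double bytesPerSec, double msgBytes) {
        return latencySec + msgBytes / bytesPerSec;
    }

    int main() {
        // Example: 2 microsecond latency, 10 GB/s link, 1 MB message.
        std::printf("predicted time: %.2f microseconds\n",
                    1e6 * messageTime(2e-6, 10e9, 1e6));
        return 0;
    }

The contention-based models replace this closed-form cost by simulating the traffic on the links each message actually traverses.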
Combined BigSim/Processor-Simulator Use
[Diagram] Sequential code blocks of interest are bracketed with simulation markers:

    void func(...) {
        StartSim();
        ...
        EndSim();
    }

The BigSim Emulator runs the program and produces log files. The bracketed
blocks are fed, with parameter files, to a cycle-accurate processor simulator
(e.g., Mambo); interpolation of the simulator's results replaces the
sequential timings in the logs, and the new log files are passed to the
BigSim Simulator.
20
Recent BigSim Enhancements
• Incremental Reading of Log Files (sketched after this slide)
• Enables handling large log files that might not fit in memory
• Creation of Out-of-Core Support for Emulation
• Enables emulating applications with large memory footprint
• Tests with Memory-Reuse Schemes (in progress)
• Enables reusing memory data of emulated processors
• Applicable to codes without data-dependent behavior
• Flexible Support for Non-Contention Network Model
• Network parameters easily configured at runtime
• Development of Blue Waters Network Model (in progress)
• Contention-based model specific to the Blue Waters network
• NOT a part of BigSim’s public distribution
21
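The incremental-reading enhancement noted above boils down to streaming trace records instead of loading an entire log into memory. A hedged, self-contained sketch of the technique (the record layout and file name are hypothetical, not BigSim's actual trace format):

    #include <cstdio>
    #include <fstream>

    // Hypothetical fixed-size trace record; BigSim's real format differs.
    struct Record {
        double startTime;
        double endTime;
        int    srcPe;
    };

    int main() {
        std::ifstream in("bgTrace0", std::ios::binary);   // illustrative file name
        Record r;
        // Memory use stays constant no matter how large the trace file is,
        // because only one record is resident at a time.
        while (in.read(reinterpret_cast<char*>(&r), sizeof(r))) {
            std::printf("PE %d: %.6f s\n", r.srcPe, r.endTime - r.startTime);
        }
        return 0;
    }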
Petascale Problems from NSF
• NSF Solicitation (June 5, 2006):
• http://www.nsf.gov/pubs/2006/nsf06573/nsf06573.html
• Three applications selected for sustained petaflop performance:
• Turbulence
• Lattice-gauge QCD
• Molecular Dynamics
• Turbulence and QCD cases were defined by problem specifications
• Molecular dynamics:
• Defined by a problem specification
• Required use of the NAMD code
22
MD Problem Statement from NSF
“A molecular dynamics (MD) simulation of curvature-inducing protein BAR
domains binding to a charged phospholipid vesicle over 10 ns simulation time
under periodic boundary conditions. The vesicle, 100 nm in diameter, should
consist of a mixture of dioleoylphosphatidylcholine (DOPC) and
dioleoylphosphatidylserine (DOPS) at a ratio of 2:1. The entire system should
consist of 100,000 lipids and 1000 BAR domains solvated in 30 million water
molecules, with NaCl also included at a concentration of 0.15 M, for a total
system size of 100 million atoms. All system components should be modeled
using the CHARMM27 all-atom empirical force field. The target wall-clock time
for completion of the model problem using the NAMD MD package with the
velocity Verlet time-stepping algorithm, Langevin dynamics temperature
coupling, Nose-Hoover Langevin piston pressure control, the Particle Mesh
Ewald algorithm with a tolerance of 1.0e-6 for calculation of electrostatics,
a short-range (van der Waals) cut-off of 12 Angstroms, and a time step of
0.002 ps, with 64-bit floating point (or similar) arithmetic, is 25 hours. The
positions, velocities, and forces of all the atoms should be saved to disk
every 500 timesteps.”
23
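The quoted requirements imply a concrete per-step budget: 10 ns of simulated time at a 0.002 ps (2 fs) timestep is 5 million steps, so a 25-hour wall-clock target allows about 18 ms per step for the 100-million-atom system. A quick check of that arithmetic:

    #include <cstdio>

    int main() {
        const double simTime_fs  = 10.0 * 1e6;   // 10 ns expressed in femtoseconds
        const double timestep_fs = 2.0;          // 0.002 ps per step
        const double wallSeconds = 25.0 * 3600;  // 25-hour target

        const double steps     = simTime_fs / timestep_fs;        // 5,000,000 steps
        const double msPerStep = 1000.0 * wallSeconds / steps;    // ~18 ms per step

        std::printf("%.0f steps, %.1f ms of wall-clock time per step\n",
                    steps, msPerStep);
        return 0;
    }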
NAMD Challenges
• At the time the NSF benchmark was proposed, the largest systems being run in
NAMD (or similar applications) had fewer than 4 million atoms.
• Standard file formats (PSF, PDB) could not even express systems larger than
10 million atoms.
• Startup, input, and output were all handled on one processor.
• The NAMD toolset (NAMD, VMD, and PSFGen) required significant enhancements
to handle systems of this size.
• Blue Waters hardware was not yet available.
24
NAMD Progress
• New file formats to support 100-million-atom systems
• New I/O framework to reduce memory footprint
• New output framework to parallelize output
• New input framework to parallelize input and startup
• New PME communication framework
• Performance analysis of sequential blocks in Mambo
• Performance prediction via BigSim and Mambo
• Performance analysis of extremely large systems
25
NAMD Progress (cont.)
• NAMD 2.7b1 and Charm++ 6.1 released
• Support execution of 100M-atom systems
• New plug-in system to support arbitrary file formats
• Limit on number of atoms in PSF files fixed
• Tested using 116-million-atom BAR domain and water systems
• Parallel Output (see the sketch after this slide)
• Parallel output is complete
• Performance tuning is ongoing
26
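The parallel-output work described above follows the usual pattern: a set of output processors each owns a contiguous range of atoms and writes its block at a computed byte offset, so no single processor has to gather the whole system. A hedged sketch of that offset arithmetic (an illustration only, not NAMD's actual implementation; the record layout is made up):

    #include <cstdint>
    #include <cstdio>

    // Illustrative fixed-size per-atom output record (x, y, z coordinates).
    struct AtomRecord { double x, y, z; };

    // Byte offset at which writer 'rank' (of 'numWriters') places its block,
    // assuming atoms are split into contiguous, nearly equal ranges.
    std::uint64_t writerOffset(std::uint64_t headerBytes, std::uint64_t totalAtoms,
                               int numWriters, int rank) {
        std::uint64_t firstAtom = totalAtoms * rank / numWriters;
        return headerBytes + firstAtom * sizeof(AtomRecord);
    }

    int main() {
        const std::uint64_t atoms = 100000000ULL;   // 100 M atoms
        for (int r = 0; r < 4; ++r)                 // e.g., four output processors
            std::printf("writer %d starts at byte %llu\n", r,
                        (unsigned long long)writerOffset(1024, atoms, 4, r));
        return 0;
    }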
NAMD Progress (cont.)
• Parallel Input
• Work was delayed by complexities in file formats
• Worked with John Stone and Jim Phillips (Beckman Inst.) to revise file formats
• New plug-in system integrated
• Demonstrated 10x-20x performance improvement for the 116M-atom system
• 10M, 50M, 100M BAR systems
• PSFgen could not build them with the old format (see above)
• Currently have 10M, 20M, 50M, 100M, and 150M water boxes
• 116M-atom BAR domain constructed, solvated, and run in NAMD
• Analysis of 10M, 50M, 100M systems
• Comparative analysis of overheads from fine decomposition and molecule size
is ongoing
27
Parallel Startup in NAMD
Table 1: Parallel startup for 10-million-atom water system on BlueGene/P

  Nodes    Start (sec)    Memory (MB)
  1        NA             4484.55 *
  8        446.499        865.117
  16       424.765        456.487
  32       420.492        258.023
  64       435.366        235.949
  128      227.018        222.219
  256      122.296        218.285
  512      73.2571        218.449
  1024     76.1005        214.758

Table 2: Parallel startup for 116-million-atom BAR domain system on Abe

  Nodes    Start (sec)    Memory (MB)
  1        3075.6 *       75457.7 *
  50       340.361        1008
  80       322.165        908
  120      323.561        710

28
Current NAMD Performance
29
Summary
• Blue Waters arriving at Illinois in 2011
• First sustained-Petaflop system
• PPL early participation
• NSF proposal preparation
• Application studies
• PPL current participation
• Charm++/AMPI deployment
• BigSim simulation
• NAMD porting and tuning
• Illinois’ HPC tradition continues…
30
Conclusion
Want to Join Us?
• NCSA has a few open positions
• Visit http://www.ncsa.uiuc.edu/AboutUs/Employment
• PPL may have postdoc and RA positions in the near future
• E-mail kale@illinois.edu
Acknowledgments - Blue Waters funding
• NSF grant OCI-0725070
• State of Illinois funds
31