Connecting HPIO Capabilities
with Domain Specific Needs
Rob Ross
MCS Division
Argonne National Laboratory
[email protected]
I/O in an HPC system
[Diagram: clients running applications (100s-1000s), each with Application and I/O System Software layers, connected through a storage or system network to I/O devices or servers (10s-100s), each with I/O System Software and Storage Hardware layers]
• Many cooperating tasks sharing I/O resources
• Relying on parallelism of hardware and software
for performance
2
Motivation
• HPC applications increasingly rely on I/O
subsystems
– Large input datasets, checkpointing, visualization
• Applications continue to be scaled, putting more
pressure on I/O subsystems
• Application programmers desire interfaces that
match the domain
– Multidimensional arrays, typed data, portable formats
• Two issues to be resolved by I/O system
– Very high performance requirements
– Gap between app. abstractions and HW abstractions
3
I/O history in a nutshell
• I/O hardware has lagged behind and
continues to lag behind all other system
components
• I/O software has matured more slowly than
other components (e.g. message passing
libraries)
– Parallel file systems (PFSs) are not enough
• This combination has led to poor I/O
performance on most HPC platforms
• Only in a few instances have I/O libraries
presented abstractions matching application
needs
4
Evolution of I/O software
(Not to scale or necessarily in the right order…)
• Goal is convenience and performance for HPC
• Slowly capabilities have emerged
• Parallel high-level libraries bring together good
abstractions and performance, maybe
5
I/O software stacks
Application
High-level I/O Library
MPI-IO Library
Parallel File System
I/O Hardware
• Myriad I/O components are converging into
layered solutions
• Insulate applications from eccentric MPI-IO
and PFS details
• Maintain (most of) I/O performance
– Some HLL features do cost performance
6
Role of parallel file systems
• Manage storage hardware
– Lots of independent components
– Must present a single view
– Provide fault tolerance
• Focus on concurrent, independent access
– Difficult to pass knowledge of collectives to PFS
• Scale to many clients
– Probably means removing all shared state
– Lock-free approaches
• Publish an interface that MPI-IO can use
effectively
– Not POSIX
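One common way a file system presents a single view over many independent components is round-robin striping. The slide does not name a layout, so the sketch below is an assumption for illustration only: `locate`, `split_request`, `stripe_size`, and `num_servers` are hypothetical names and do not come from any particular PFS.

```python
def locate(offset, stripe_size, num_servers):
    """Map a logical file offset to (server index, offset in that server's local object).

    Assumes simple round-robin striping; real file systems may use other layouts.
    """
    stripe_index = offset // stripe_size          # which stripe unit of the logical file
    server = stripe_index % num_servers           # stripe units rotate across servers
    local_stripe = stripe_index // num_servers    # how many units this server holds before it
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


def split_request(offset, length, stripe_size, num_servers):
    """Break one logical request into per-server pieces (server, local_offset, nbytes)."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local = locate(offset, stripe_size, num_servers)
        room = stripe_size - offset % stripe_size  # bytes left in the current stripe unit
        nbytes = min(room, end - offset)
        pieces.append((server, local, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    # A 100-byte request starting at offset 48, 64-byte stripe units, 4 servers.
    for piece in split_request(48, 100, 64, 4):
        print(piece)
```

Under this layout a single contiguous request touches several servers at once, which is where the hardware parallelism comes from.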
7
Role of MPI-IO implementations
• Facilitate concurrent access by groups of
processes
– Understanding of the programming model
• Provide hooks for tuning PFS
– MPI_Info as interface to PFS tuning parameters
• Expose a fairly generic interface
– Good for building other libraries
• Leverage MPI-IO semantics
– Aggregation of I/O operations
• Hide unimportant details of parallel file
system
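The aggregation bullet can be illustrated with a small model of request merging. This is a sketch of the idea only, not how any MPI-IO implementation is written: it uses no MPI calls, and `coalesce` and `interleaved_requests` are hypothetical helper names. It assumes each process writes fixed-size chunks interleaved with those of the other processes.

```python
def coalesce(requests):
    """Merge (offset, length) requests that touch or overlap into larger extents."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This request starts inside or right at the end of the previous extent.
            last_offset, last_length = merged[-1]
            new_end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            merged.append((offset, length))
    return merged


def interleaved_requests(num_procs, chunk, rounds):
    """Process p writes `chunk` bytes at offsets (r * num_procs + p) * chunk for each round r."""
    return {
        p: [((r * num_procs + p) * chunk, chunk) for r in range(rounds)]
        for p in range(num_procs)
    }


if __name__ == "__main__":
    per_proc = interleaved_requests(num_procs=4, chunk=1024, rounds=8)
    # Seen independently, each process issues 8 small, non-adjacent requests.
    print(len(coalesce(per_proc[0])))
    # Seen as a group, the same 32 requests form one contiguous extent.
    everything = [req for reqs in per_proc.values() for req in reqs]
    print(coalesce(everything))
```

The merge is only possible when the library sees the requests of the whole group at once, which is the knowledge a collective call provides and an independent call does not.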
8
Role of high-level libraries
• Provide an appropriate abstraction for the
domain
– Multidimensional, typed datasets
– Attributes
– Consistency semantics that match usage
– Portable format
• Maintain the scalability of MPI-IO
– Map data abstractions to datatypes
– Encourage collective I/O
• Implement optimizations that MPI-IO cannot
(e.g. header caching)
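To make "map data abstractions to datatypes" concrete, the sketch below lists the byte runs that a sub-block of a row-major multidimensional array occupies in a file. It is a model of the information a subarray datatype carries, not an MPI call: the parameters mirror the sizes, subsizes, and starts arguments of MPI_Type_create_subarray, while `subarray_runs` itself is a hypothetical helper and row-major (C) order is an assumption.

```python
from itertools import product


def subarray_runs(shape, starts, subsizes, itemsize):
    """Return (byte_offset, byte_length) runs for a sub-block of a row-major N-D array."""
    # Row-major strides, counted in elements.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    runs = []
    # Every combination of indices in the leading dimensions yields one contiguous
    # run along the last (fastest-varying) dimension.
    leading = [range(s, s + n) for s, n in zip(starts[:-1], subsizes[:-1])]
    for idx in product(*leading):
        elem = sum(i * st for i, st in zip(idx, strides[:-1])) + starts[-1]
        runs.append((elem * itemsize, subsizes[-1] * itemsize))
    return runs


if __name__ == "__main__":
    # A 2 x 3 region starting at row 1, column 2 of a 4 x 6 array of 8-byte values.
    print(subarray_runs(shape=(4, 6), starts=(1, 2), subsizes=(2, 3), itemsize=8))
```

A high-level library that hands this description to MPI-IO as one datatype, rather than issuing one write per run, leaves the lower layers free to optimize the whole access.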
9
Example: ASCI/Alliance FLASH
ASCI FLASH
Parallel netCDF
IBM MPI-IO
GPFS
Storage
• FLASH is an astrophysics
simulation code from the
ASCI/Alliance Center for
Astrophysical Thermonuclear Flashes
• Fluid dynamics code using adaptive mesh
refinement (AMR)
• Runs on systems with thousands of nodes
• Three layers of I/O software between the
application and the I/O hardware
• Example system: ASCI White Frost
10
FLASH data and I/O
• 3D AMR blocks
– 16^3 elements per block
– 24 variables per element
– Perimeter of ghost cells
• Checkpoint writes all variables
– no ghost cells
– one variable at a time (noncontiguous)
• Visualization output is a subset of
variables
• Portability of data desirable
– Postprocessing on separate platform
[Figure: AMR block, labeled "Ghost cell" and "Element (24 vars)"]
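A rough worked example of why the checkpoint access pattern is noncontiguous. The block size and variable count come from the slide; everything else is an assumption made for illustration: 8-byte values, a ghost-cell width of 4, and an in-memory layout of [z][y][x][variable]. The function names are hypothetical.

```python
NXB = 16              # elements per side of a block (16^3 per block, from the slide)
NVARS = 24            # variables per element (from the slide)
NGUARD = 4            # ghost-cell width on each side: ASSUMPTION, not stated in the slides
BYTES_PER_VALUE = 8   # ASSUMPTION: double-precision values


def block_stats(nxb=NXB, nvars=NVARS, nguard=NGUARD, nbytes=BYTES_PER_VALUE):
    """Sizes for one block, with and without the ghost-cell perimeter."""
    interior = nxb ** 3
    padded = (nxb + 2 * nguard) ** 3
    return {
        "interior_elements": interior,
        "padded_elements": padded,
        "checkpoint_bytes_per_block": interior * nvars * nbytes,
        "bytes_per_variable_per_block": interior * nbytes,
    }


def variable_indices(var, nxb=NXB, nvars=NVARS, nguard=NGUARD):
    """Flat in-memory indices of one variable's interior values.

    Assumes the layout [z][y][x][var] including ghost cells.
    """
    side = nxb + 2 * nguard
    indices = []
    for z in range(nguard, nguard + nxb):
        for y in range(nguard, nguard + nxb):
            for x in range(nguard, nguard + nxb):
                indices.append(((z * side + y) * side + x) * nvars + var)
    return indices


if __name__ == "__main__":
    print(block_stats())
    idx = variable_indices(0)
    # Neighbouring values of one variable sit NVARS slots apart in memory.
    print(len(idx), idx[1] - idx[0])
```

Under these assumptions, writing one variable without ghost cells means gathering 4096 values per block that are each 24 slots apart, with further gaps where ghost cells are skipped.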
11
Tying it all together
• FLASH tells PnetCDF that all its
processes want to write out
regions of variables and store
them in a portable format
[Chart: FLASH I/O Benchmark. x-axis: Processors (16, 32, 64, 128, 256); y-axis ticks from 0 to 120 (units not shown); series: HDF5, PnetCDF]
• PnetCDF performs data conversion and calls appropriate
MPI-IO collectives
• MPI-IO optimizes writing of data to GPFS using data
shipping, I/O agents
• GPFS handles moving data from agents to storage
resources, storing the data, and maintaining file metadata
• In this case, PnetCDF is a better match to the application
12
Future of I/O system software
Application
Domain Specific I/O Library
High-level I/O Library
MPI-IO Library
Parallel File System
I/O Hardware
• More layers in the I/O stack
– Better match application view of data
– Mapping this view to PnetCDF or similar
– Maintaining collectives, rich descriptions
• More high-level libraries using MPI-IO
– PnetCDF, HDF5 are great starts
– These should be considered mandatory
I/O system software on our machines
• Focusing component implementations on their roles
– Less general-purpose file systems
- Scalability and APIs of existing PFSs aren’t up to workloads and scales
– More aggressive MPI-IO implementations
- Lots can be done if we’re not busy working around broken PFSs
– More aggressive high-level library optimization
- They know the most about what is going on
13
Future
• Creation and adoption of parallel high-level I/O
libraries should make things easier for everyone
– New domains may need new libraries or new middleware
– HLLs that target database backends seem obvious; probably
someone else is already doing this?
• Further evolution of components necessary to get
best performance
– Tuning/extending file systems for HPC (e.g. user metadata
storage, better APIs)
• Aggregation, collective I/O, and leveraging semantics
are even more important at larger scale
– Reliability too, especially for kernel FS components
• Potential HW changes (MEMS, active disk) are
complementary
14