PIO, NCAR (Jim Edwards)

Parallel IO in the Community Earth System Model

Jim Edwards (NCAR), jedwards@ucar.edu
John Dennis (NCAR)
Ray Loy (ANL)
Pat Worley (ORNL)

NCAR, P.O. Box 3000, Boulder, CO 80307-3000, USA
Workshop on Scalable IO in Climate Models, 27/02/2012
[Diagram: CESM architecture. The CPL7 coupler connects the CAM atmospheric model, the CLM land model, the CICE sea ice model, the POP2 ocean model, and the CISM land ice model.]
• Some CESM 1.1 capabilities:
  – Ensemble configurations with multiple instances of each component
  – Highly scalable capability, proven to 100K+ tasks
  – Regionally refined grids
  – Data assimilation with DART
• Each model component was independent, with its own IO interface
• Mix of file formats:
  – NetCDF
  – Binary (POSIX)
  – Binary (Fortran)
• Gather-scatter method to interface with serial IO
• Converge on a single file format
  – NetCDF selected:
    • Self-describing
    • Lossless, with a lossy capability (netCDF4 only)
    • Works with the current postprocessing tool chain
• Extension to parallel IO
• Reduce the single-task memory profile
• Maintain a single, decomposition-independent file format
• Performance (a secondary issue)
• Parallel IO from all compute tasks is not the best strategy:
  – Data rearrangement is complicated, leading to numerous small and inefficient IO operations
  – MPI-IO aggregation alone cannot overcome this problem
• Goals:
  – Reduce per-MPI-task memory usage
  – Easy to use
  – Improve performance
• Write/read a single file from a parallel application
• Multiple backend libraries: MPI-IO, netCDF3, netCDF4, pNetCDF, netCDF+VDC
• Meta-IO library: a potential interface to other general libraries
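As a rough sketch of how the backend choice surfaces to the caller (based on the PIO 1.x Fortran interface; the exact PIO_createfile argument order and constants such as PIO_iotype_pnetcdf and PIO_clobber should be checked against the library documentation), switching backends amounts to changing a single iotype argument:

  subroutine create_history_file(iosystem, file)
    use pio
    type(iosystem_desc_t), intent(inout) :: iosystem  ! set up earlier by PIO_init
    type(file_desc_t),     intent(out)   :: file
    integer :: ierr
    ! netCDF3 (serial) backend
    ierr = PIO_createfile(iosystem, file, PIO_iotype_netcdf, 'hist.nc', PIO_clobber)
    ! swapping in PIO_iotype_pnetcdf (or PIO_iotype_netcdf4p) here selects a
    ! parallel backend; no other application code needs to change
  end subroutine create_history_file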
[Diagram: the CESM components (CAM, CLM, CICE, POP2, CISM, CPL7 coupler) perform IO through the PIO layer, which sits on top of the backend libraries: netCDF3, netCDF4, pNetCDF, VDC, HDF5, and MPI-IO.]
• Separation of concerns
• Separate computational and I/O decompositions
• Flexible user-level rearrangement
• Encapsulate expert knowledge
• What versus How
  – Concern of the user:
    • What to write/read to/from disk?
    • e.g.: “I want to write T, V, PS.”
  – Concern of the library developer:
    • How to efficiently access the disk?
    • e.g.: “How do I construct I/O operations so that write bandwidth is maximized?”
• Improves ease of use
• Improves robustness
• Enables better reuse
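A hedged sketch of what the user-level “what” looks like in the PIO 1.x Fortran API (the helper subroutine, dimension names, and sizes are illustrative only; argument order should be verified against the library). The user declares the variable and hands PIO the local piece of the distributed array; the “how” stays inside the library:

  subroutine write_temperature(file, iodesc, t_local)
    use pio
    type(file_desc_t), intent(inout) :: file        ! opened with PIO_createfile
    type(io_desc_t),   intent(inout) :: iodesc      ! built with PIO_initdecomp
    real,              intent(in)    :: t_local(:)  ! this task's piece of T
    type(var_desc_t) :: vardesc
    integer :: dimids(3), ierr

    ! "what": describe the variable ...
    ierr = PIO_def_dim(file, 'lon', 3600, dimids(1))
    ierr = PIO_def_dim(file, 'lat', 2400, dimids(2))
    ierr = PIO_def_dim(file, 'lev',   40, dimids(3))
    ierr = PIO_def_var(file, 'T', PIO_real, dimids, vardesc)
    ierr = PIO_enddef(file)

    ! ... and write the distributed data; the "how" (rearrangement,
    ! aggregation, backend-specific I/O) is handled inside the library
    call PIO_write_darray(file, vardesc, iodesc, t_local, ierr)
  end subroutine write_temperature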
Separate computational and I/O decompositions
[Diagram: data is rearranged between the computational decomposition used by the model and a separate I/O decomposition used for disk access.]
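A hedged sketch of how the computational decomposition is described to the library (the compdof convention shown, a 1-based global offset for each locally owned element, follows my reading of the PIO 1.x documentation; names and argument order are assumptions to verify). Once PIO_initdecomp has been called, the rearrangement onto the I/O decomposition is handled internally:

  subroutine build_iodesc(iosystem, nx_global, ny_global, istart, iend, jstart, jend, iodesc)
    use pio
    type(iosystem_desc_t), intent(inout) :: iosystem
    integer, intent(in)  :: nx_global, ny_global        ! global grid size
    integer, intent(in)  :: istart, iend, jstart, jend  ! this task's block
    type(io_desc_t), intent(out) :: iodesc
    integer, allocatable :: compdof(:)   ! global offset of each local element
    integer :: i, j, n

    allocate(compdof((iend-istart+1)*(jend-jstart+1)))
    n = 0
    do j = jstart, jend
       do i = istart, iend
          n = n + 1
          compdof(n) = (j-1)*nx_global + i   ! 1-based global index
       end do
    end do

    ! describe the computational decomposition; PIO derives the I/O
    ! decomposition and the rearrangement between the two
    call PIO_initdecomp(iosystem, PIO_real, (/ nx_global, ny_global /), compdof, iodesc)
    deallocate(compdof)
  end subroutine build_iodesc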
Flexible user-level rearrangement
• A single technical solution is not suitable for the entire user community:
  – User A: Linux cluster, 32-core job, 200 MB files, NFS file system
  – User B: Cray XE6, 115,000-core job, 100 GB files, Lustre file system
• Different compute environments require different technical solutions!
Writing distributed data (I)
[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]
+ Maximizes the size of individual I/O operations to disk
- Non-scalable user-space buffering
- Very large fan-in -> large MPI buffer allocations
Correct solution for User A
Writing distributed data (II)
[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]
+ Scalable user-space memory
+ Relatively large individual I/O operations to disk
- Very large fan-in -> large MPI buffer allocations
Writing distributed data (III)
[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]
+ Scalable user-space memory
+ Smaller fan-in -> modest MPI buffer allocations
- Smaller individual I/O operations to disk
Correct solution for User B
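This choice surfaces in the PIO_init call, where the caller sets the number of I/O tasks and the stride between them. A hedged sketch (the PIO 1.x argument order and the PIO_rearr_box constant are assumptions to check against the library; the one-I/O-task-per-256-compute-tasks ratio is purely illustrative):

  subroutine init_pio_for_job(mype, comm, npes, iosystem)
    use pio
    integer, intent(in) :: mype, comm, npes   ! MPI rank, communicator, size
    type(iosystem_desc_t), intent(out) :: iosystem
    integer :: num_iotasks, stride

    ! User A (32-task cluster job, NFS): one I/O task is enough
    !   num_iotasks = 1 ; stride = 1
    ! User B (115,000-task Cray XE6 job, Lustre): many I/O tasks, spread out
    num_iotasks = max(1, npes/256)   ! e.g. one I/O task per 256 compute tasks
    stride      = npes/num_iotasks   ! spacing between I/O tasks

    call PIO_init(mype, comm, num_iotasks, 0, stride, PIO_rearr_box, iosystem)
  end subroutine init_pio_for_job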
Encapsulate expert knowledge
• Flow-control algorithm
• Match the size of I/O operations to the stripe size
  – Cray XT5/XE6 + Lustre file system
  – Minimizes message-passing traffic at the MPI-IO layer
• Load-balance disk traffic over all I/O nodes
  – IBM Blue Gene/{L,P} + GPFS file system
  – Utilizes Blue Gene-specific topology information
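In rough outline, the flow-control idea works like this (this is not PIO source code, just a hedged illustration of the technique): the I/O task invites only a limited number of compute tasks to send at any one time, so a large fan-in never turns into unbounded MPI buffer usage.

  subroutine gather_with_flow_control(comm, is_io, io_rank, senders, nsenders, blksz, localblock, filebuf)
    use mpi
    integer, intent(in) :: comm, io_rank, nsenders, blksz
    logical, intent(in) :: is_io
    integer, intent(in) :: senders(nsenders)         ! ranks that hold data blocks
    real,    intent(in) :: localblock(blksz)         ! this compute task's block
    real,    intent(out):: filebuf(blksz, nsenders)  ! assembled on the I/O task
    integer :: i, ierr, token, status(MPI_STATUS_SIZE)
    integer, parameter :: tag_go = 1, tag_data = 2

    token = 0
    if (is_io) then
       do i = 1, nsenders
          ! invite one sender at a time (real implementations typically keep a
          ! small window of outstanding requests); this bounds buffer usage
          call MPI_Send(token, 0, MPI_INTEGER, senders(i), tag_go, comm, ierr)
          call MPI_Recv(filebuf(:,i), blksz, MPI_REAL, senders(i), tag_data, comm, status, ierr)
       end do
    else
       ! compute task: wait for permission, then send its block
       call MPI_Recv(token, 0, MPI_INTEGER, io_rank, tag_go, comm, status, ierr)
       call MPI_Send(localblock, blksz, MPI_REAL, io_rank, tag_data, comm, ierr)
    end if
  end subroutine gather_with_flow_control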
• Did we achieve our design goals?
• Impact of PIO features:
  – Flow control
  – Varying the number of IO tasks
  – Different general I/O backends
• Read/write a 3D POP-sized variable [3600x2400x40]
• 10 files, 10 variables per file [max bandwidth reported]
• Using Kraken (Cray XT5) + Lustre file system
  – Used 16 of 336 OSTs
[Figures: read/write bandwidth for 3D POP arrays (3600x2400x40), showing the impact of flow control, the number of IO tasks, and the choice of I/O backend.]
PIOVDC
Parallel output to a VAPOR Data Collection (VDC)
• VDC:
  – A wavelet-based, gridded data format supporting both progressive access and efficient data subsetting
• Data may be progressively accessed (read back) at different levels of detail, permitting the application to trade off speed and accuracy
  – Think Google Earth: less detail when the viewer is far away, progressively more detail as the viewer zooms in
  – Enables rapid (interactive) exploration and hypothesis testing that can subsequently be validated with the full-fidelity data as needed
• Subsetting
  – Arrays are decomposed into smaller blocks that significantly improve extraction of arbitrarily oriented sub-arrays
• Wavelet transform
  – Similar to Fourier transforms
  – Computationally efficient: O(n)
  – Basis for many multimedia compression technologies (e.g. MPEG-4, JPEG 2000)
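As a toy illustration of why the transform is O(n) (a generic one-level Haar transform, not VAPOR/VDC code): each pass touches every sample once, producing averages (the coarse approximation) and differences (the detail coefficients); compression comes from keeping only the largest coefficients, which is the coefficient prioritization shown in the figures later in the deck.

  subroutine haar_level(x, n, approx, detail)
    ! one level of the Haar wavelet transform: O(n) work;
    ! averages -> coarse approximation, differences -> detail coefficients
    integer, intent(in)  :: n            ! length of x, assumed even
    real,    intent(in)  :: x(n)
    real,    intent(out) :: approx(n/2)
    real,    intent(out) :: detail(n/2)
    integer :: i
    do i = 1, n/2
       approx(i) = (x(2*i-1) + x(2*i)) / sqrt(2.0)
       detail(i) = (x(2*i-1) - x(2*i)) / sqrt(2.0)
    end do
  end subroutine haar_level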
• Earth System Modeling Framework (ESMF)
• Model for Prediction Across Scales (MPAS)
• Geophysical High Order Suite for Turbulence (GHOST)
• Data Assimilation Research Testbed (DART)
[Figure: 100:1 compression with coefficient prioritization. 1024^3 Taylor-Green turbulence (enstrophy field) [P. Mininni, 2006]; panels compare no compression with coefficient prioritization (VDC2).]
[Figure: 4096^3 homogeneous turbulence simulation. Volume rendering of the original enstrophy field and the 800:1 compressed field. Original: 275 GB/field; 800:1 compressed: 0.34 GB/field. Data provided by P.K. Yeung (Georgia Tech) and Diego Donzis (Texas A&M).]
interface PIO_write_darray
! TYPE real,int
! DIMS 1,2,3
   module procedure write_darray_{DIMS}d_{TYPE}
end interface

The genf90.pl preprocessor expands this template into:
# 1 "tmp.F90.in"
interface PIO_write_darray
   module procedure write_darray_1d_real
   module procedure write_darray_2d_real
   module procedure write_darray_3d_real
   module procedure write_darray_1d_int
   module procedure write_darray_2d_int
   module procedure write_darray_3d_int
end interface
• PIO is open source
  – http://code.google.com/p/parallelio/
• Documentation using doxygen
  – http://web.ncar.teragrid.org/~dennis/pio_doc/html/
• Thank you
• netCDF3
  – Serial
  – Easy to implement
  – Limited flexibility
• HDF5
  – Serial and parallel
  – Very flexible
  – Difficult to implement
  – Difficult to achieve good performance
• netCDF4
  – Serial and parallel
  – Based on HDF5
  – Easy to implement
  – Limited flexibility
  – Difficult to achieve good performance
• Parallel-netCDF
  – Parallel
  – Easy to implement
  – Limited flexibility
  – Difficult to achieve good performance
• MPI-IO
  – Parallel
  – Very difficult to implement
  – Very flexible
  – Difficult to achieve good performance
• ADIOS
  – Serial and parallel
  – Easy to implement
  – BP file format
    • Easy to achieve good performance
  – All other file formats
    • Difficult to achieve good performance