Parallel I/O Optimizations
Sources/Credits:
• R. Thakur, W. Gropp, and E. Lusk. A Case for Using MPI's Derived Datatypes to Improve I/O Performance. Supercomputing 98.
• http://www.cs.dartmouth.edu/pario/bib/short.html (bibliography)
• Xiaosong Ma, Marianne Winslett, Jonghyun Lee, and Shengke Yu. Improving MPI IO output performance with active buffering plus threads. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE Computer Society Press, April 2003.
High Performance with Derived Data Types (Thakur et al.: SC 98)
• The potential of parallel file systems is not fully utilized because of applications' I/O access patterns:
  a. Applications issue many small requests to noncontiguous blocks
  b. Most parallel file systems are optimized for access to a single large chunk
• Hence the motivation for describing the whole access with derived datatypes and making a single call
• ROMIO (MPICH's I/O implementation) performs 2 optimizations: data sieving and collective I/O
Datatype Constructors in MPI
1. contiguous
2. vector/hvector
3. indexed/hindexed/indexed_block
4. struct
5. subarray
6. darray
[Figure: example element layouts for each constructor, drawn as cells labeled I, D, and C]
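To make the vector constructor concrete, the sketch below expands a vector-style description into the list of byte ranges it covers. It is a plain-Python illustration, not MPI code: the function name `vector_blocks` and its `start` parameter are hypothetical, and the only thing borrowed from MPI is the convention of `MPI_Type_vector` that `stride` counts elements between the starts of consecutive blocks.

```python
def vector_blocks(count, blocklength, stride, elem_size, start=0):
    """Return (byte_offset, byte_length) for each block of a vector layout.

    count       -- number of blocks
    blocklength -- elements per block
    stride      -- elements between the starts of consecutive blocks
    elem_size   -- size of one element in bytes
    start       -- byte offset of the first block (hypothetical convenience)
    """
    return [
        (start + i * stride * elem_size, blocklength * elem_size)
        for i in range(count)
    ]


# One column of a 4x4 row-major array of 4-byte integers:
# 4 blocks of 1 element each, 4 elements apart.
column = vector_blocks(count=4, blocklength=1, stride=4, elem_size=4)
print(column)  # [(0, 4), (16, 4), (32, 4), (48, 4)]
```

A single description like this is what lets an I/O library see the whole noncontiguous pattern at once instead of four separate small requests.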
Different levels of access
Optimizations in ROMIO for derived-datatype noncontiguous access
1. Data sieving
• Make a few large, contiguous requests to the file system even if the user's request consists of several small, noncontiguous requests
• Extract (pick out) in memory the data that is really needed
• This is fine for reads. What about writes?
  • Writes need read-modify-write along with locking
• Use a smaller buffer for writing with data sieving than for reading with data sieving. Why?
  • The larger the write buffer, the greater the contention among processes for locks
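The idea of data sieving can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not ROMIO's implementation: the names `sieve_read` and `sieve_write` are hypothetical, an in-memory `io.BytesIO` stands in for the file, the whole span is handled in one buffer (a real implementation works through a bounded buffer), and locking is only indicated by a comment.

```python
import io


def sieve_read(f, requests):
    """Serve noncontiguous (offset, length) requests with one contiguous read."""
    lo = min(off for off, _ in requests)
    hi = max(off + n for off, n in requests)
    f.seek(lo)
    buf = f.read(hi - lo)  # one large request, holes included
    # Pick out only the bytes that were actually requested.
    return [buf[off - lo:off - lo + n] for off, n in requests]


def sieve_write(f, pieces):
    """Write noncontiguous (offset, bytes) pieces via read-modify-write."""
    lo = min(off for off, _ in pieces)
    hi = max(off + len(data) for off, data in pieces)
    # A real implementation would lock the byte range [lo, hi) here, so that
    # no other process changes the holes between the read and the write.
    f.seek(lo)
    buf = bytearray(f.read(hi - lo))
    buf.extend(b"\0" * (hi - lo - len(buf)))  # pad if the file is shorter
    for off, data in pieces:
        buf[off - lo:off - lo + len(data)] = data  # modify in memory
    f.seek(lo)
    f.write(buf)  # one large write; hole contents are preserved


f = io.BytesIO(b"abcdefghijklmnop")
print(sieve_read(f, [(1, 2), (6, 2), (12, 3)]))  # [b'bc', b'gh', b'mno']
sieve_write(f, [(0, b"X"), (4, b"YZ")])
print(f.getvalue())  # b'XbcdYZghijklmnop'
```

The write path shows why the locked region grows with the buffer: everything between the first and last piece is read, held, and rewritten, including bytes the caller never asked to change.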
Optimizations in ROMIO for derived-datatype noncontiguous access
1. Data sieving
2. Collective I/O
• During collective I/O functions, the implementation can analyze and merge the requests of different processes
• The merged request can be large and contiguous although the individual requests were noncontiguous
• Perform I/O in 2 phases:
  • I/O phase: processes perform I/O for the merged request. Some data may belong to other processes. If the merged request is not contiguous, use data sieving
  • Communication phase: processes redistribute data to obtain the desired distribution
  • The additional cost of the communication phase can be offset by the performance gain due to contiguous access
  • Data sieving and collective I/O also help improve caching and prefetching in the underlying file system
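The two phases can be simulated without MPI. The sketch below is a single-process model under stated assumptions: `two_phase_read` is a hypothetical name, a `bytes` object stands in for the file, the merged range is split evenly into one contiguous domain per process, and the communication phase is modeled as lookups into the other processes' domains rather than real message passing.

```python
def two_phase_read(file_bytes, requests):
    """Simulate a two-phase collective read.

    requests[p] is the list of (offset, length) pieces that process p wants.
    Returns, for each process, the list of byte strings it asked for.
    """
    nprocs = len(requests)
    all_reqs = [r for reqs in requests for r in reqs]
    lo = min(off for off, _ in all_reqs)
    hi = max(off + n for off, n in all_reqs)

    # I/O phase: split the merged range [lo, hi) into one contiguous
    # domain per process; each process does a single contiguous read.
    chunk = -(-(hi - lo) // nprocs)  # ceiling division
    domains = []
    for p in range(nprocs):
        start = min(lo + p * chunk, hi)
        end = min(start + chunk, hi)
        domains.append((start, file_bytes[start:end]))

    # Communication phase: each requested piece is assembled from whichever
    # domains hold it (this stands in for inter-process messages).
    def fetch(off, n):
        out = bytearray()
        for start, data in domains:  # domains are in ascending file order
            a = max(off, start)
            b = min(off + n, start + len(data))
            if a < b:
                out += data[a - start:b - start]
        return bytes(out)

    return [[fetch(off, n) for off, n in reqs] for reqs in requests]


data = b"AAbbAAbbAAbb"
p0 = [(0, 2), (4, 2), (8, 2)]   # P0 wants the "AA" blocks
p1 = [(2, 2), (6, 2), (10, 2)]  # P1 wants the "bb" blocks
print(two_phase_read(data, [p0, p1]))
# [[b'AA', b'AA', b'AA'], [b'bb', b'bb', b'bb']]
```

Each process issues three small interleaved requests, yet the file sees only two contiguous 6-byte reads; the interleaving is undone in memory.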
Collective I/O Illustration
[Figure: file blocks labeled by owning process, P0 and P1]
Active Buffering with Threads (Xiaosong Ma et al.: IPDPS 2003)
• The above optimizations alone are not enough
• Active buffering: use of separate I/O nodes
• Overlapping I/O access with computation by using threads
• Buffer space automatically adjusted to available memory
Original Scheme (Ma: IPDPS 2002)
• Hierarchical buffering scheme
• Dedicated I/O server nodes
• During I/O:
    if (not overflow in compute nodes)
        compute nodes -> local buffers
    else if (not overflow in server nodes)
        compute nodes -> server buffers (using MPI)
    else
        server nodes -> I/O system
• During computation:
  • Server nodes clear their local buffers by writing them to the I/O system
  • Server nodes fetch data from compute nodes (one-sided communication) and write it to the I/O system
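The routing rule of the original scheme can be modeled as a small state machine. This is a toy sketch, not the authors' code: the class name `HierarchicalBuffer`, the byte capacities, and the route labels are all hypothetical, and the MPI transfers and file writes are reduced to counters.

```python
class HierarchicalBuffer:
    """Toy model of the routing rule; capacities are in bytes (hypothetical)."""

    def __init__(self, compute_capacity, server_capacity):
        self.compute_capacity = compute_capacity
        self.server_capacity = server_capacity
        self.compute_used = 0
        self.server_used = 0
        self.on_disk = 0  # bytes that have reached the I/O system

    def write(self, size):
        """During I/O: choose the first buffer level that does not overflow."""
        if self.compute_used + size <= self.compute_capacity:
            self.compute_used += size
            return "local"   # compute node keeps the data locally
        if self.server_used + size <= self.server_capacity:
            self.server_used += size
            return "server"  # data is shipped to a server buffer
        self.on_disk += size
        return "io"          # both levels full: write through

    def drain(self):
        """During computation: servers flush their own buffers, then fetch
        and flush what the compute nodes are still holding."""
        self.on_disk += self.server_used + self.compute_used
        self.server_used = 0
        self.compute_used = 0


hb = HierarchicalBuffer(compute_capacity=10, server_capacity=20)
routes = [hb.write(n) for n in (8, 8, 15, 4)]
print(routes)      # ['local', 'server', 'io', 'server']
hb.drain()
print(hb.on_disk)  # 35
```

Only the third write has to wait for the I/O system; the rest are absorbed by buffers and reach the file later, while the application computes.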



Current Scheme
• I/O threads perform collective I/O, overlapped with the main threads' computation and communication
• Uses pthreads with kernel-level scheduling
• Intercepts ROMIO's I/O calls
• Main threads and I/O threads coordinate through a buffer queue
• This is an instance of the producer-consumer (bounded-buffer) problem
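The coordination between a main thread and an I/O thread can be sketched with a bounded queue. This is a minimal Python illustration of the producer-consumer pattern, not the pthreads implementation described above: the function name `run_with_io_thread`, the queue capacity, and the use of a list in place of a file are assumptions made for the example.

```python
import queue
import threading


def run_with_io_thread(buffers, capacity=2):
    """Hand output buffers to a background I/O thread through a bounded queue."""
    q = queue.Queue(maxsize=capacity)  # the bounded buffer queue
    written = []                       # stands in for the output file

    def io_thread():
        while True:
            item = q.get()
            if item is None:           # sentinel: no more output
                break
            written.append(item)       # stands in for the actual file write

    t = threading.Thread(target=io_thread)
    t.start()
    for b in buffers:
        # The main thread blocks here only when the queue is full;
        # otherwise it returns at once and continues computing.
        q.put(b)
    q.put(None)
    t.join()
    return written


print(run_with_io_thread([b"step1", b"step2", b"step3"]))
# [b'step1', b'step2', b'step3']
```

With one consumer and a FIFO queue the buffers are written in the order they were produced; the queue capacity bounds how far the main thread can run ahead of the I/O thread.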
Execution Timeline
Bibliography
• Philip H. Carns, Walter B. Ligon III, Robert B. Ross, and Rajeev Thakur. PVFS: A parallel file system for Linux clusters. In Proceedings of the 4th Annual Linux Showcase and Conference, pages 317-327, Atlanta, GA, October 2000. USENIX Association.
• Jose Aguilar. A graph theoretical model for scheduling simultaneous I/O operations on parallel and distributed environments. Parallel Processing Letters, 12(1):113-126, March 2002.
• Rajesh Bordawekar. Implementation of collective I/O in the Intel Paragon parallel file system: Initial experiences. In Proceedings of the 11th ACM International Conference on Supercomputing, pages 20-27. ACM Press, July 1997.
• Peter Brezany, Marianne Winslett, Denis A. Nicole, and Toni Cortes. Parallel I/O and storage technology. In Proceedings of the Seventh International Euro-Par Conference, volume 2150 of Lecture Notes in Computer Science, pages 887-888, Manchester, UK, August 2001. Springer-Verlag.
• Bradley Broom, Rob Fowler, and Ken Kennedy. KelpIO: A telescope-ready domain-specific I/O library for irregular block-structured applications. In Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, pages 148-155, Brisbane, Australia, May 2001. IEEE Computer Society Press.
• J. Carretero, F. Pérez, P. de Miguel, F. García, and L. Alonso. I/O data mapping in ParFiSys: support for high-performance I/O in parallel and distributed systems. In Euro-Par '96, volume 1123 of Lecture Notes in Computer Science, pages 522-526. Springer-Verlag, August 1996.
• Ying Chen, Marianne Winslett, Y. Cho, and S. Kuo. Automatic parallel I/O performance optimization using genetic algorithms. In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, pages 155-162. IEEE Computer Society Press, July 1998.
• Ying Chen, Ian Foster, Jarek Nieplocha, and Marianne Winslett. Optimizing collective I/O performance on parallel computers: A multisystem study. In Proceedings of the 11th ACM International Conference on Supercomputing, pages 28-35. ACM Press, July 1997.
• Avery Ching, Alok Choudhary, Kenin Coloma, Wei keng Liao, Robert Ross, and William Gropp. Noncontiguous I/O accesses through MPI-IO. In Proceedings of the Third IEEE/ACM International Symposium on Cluster Computing and the Grid, pages 104-111, Tokyo, Japan, May 2003. IEEE Computer Society Press.
• Phillip M. Dickens and Rajeev Thakur. Evaluation of collective I/O implementations on parallel architectures. Journal of Parallel and Distributed Computing, 61(8):1052-1076, August 2001.
• Félix Garcia-Carballeira, Alejandro Calderon, Jesus Carretero, Javier Fernandez, and Jose M. Perez. The design of the Expand parallel file system. The International Journal of High Performance Computing Applications, 17(1):21-38, 2003.
• Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pages 96-108, Bolton Landing, NY, October 2003. ACM Press.
• James V. Huber, Jr., Christopher L. Elford, Daniel A. Reed, Andrew A. Chien, and David S. Blumenthal. PPFS: A high performance portable parallel file system. In Hai Jin, Toni Cortes, and Rajkumar Buyya, editors, High Performance Mass Storage and Parallel I/O: Technologies and Applications, chapter 22, pages 330-343. IEEE Computer Society Press and Wiley, New York, NY, 2001.
• Meenakshi A. Kandaswamy, Mahmut Kandemir, Alok Choudhary, and David Bernholdt. An experimental evaluation of I/O optimizations on different applications. IEEE Transactions on Parallel and Distributed Systems, 13(7):728-744, July 2002.
• Mahmut Kandemir. Compiler-directed collective I/O. IEEE Transactions on Parallel and Distributed Systems, 12(12):1318-1331, December 2001.
• Xiaosong Ma, Marianne Winslett, Jonghyun Lee, and Shengke Yu. Improving MPI IO output performance with active buffering plus threads. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE Computer Society Press, April 2003.
• Tara M. Madhyastha and Daniel A. Reed. Learning to classify parallel input/output access patterns. IEEE Transactions on Parallel and Distributed Systems, 13(8):802-813, August 2002.
• Ethan L. Miller and Randy H. Katz. RAMA: An easy-to-use, high-performance parallel file system. Parallel Computing, 23(4-5):419-446, June 1997.
• Bill Nitzberg and Virginia Lo. Collective buffering: Improving parallel I/O performance. In Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing, pages 148-157, Portland, OR, August 1997. IEEE Computer Society Press.
• Huseyin Simitci and Daniel Reed. A comparison of logical and physical parallel I/O patterns. The International Journal of High Performance Computing Applications, 12(3):364-380, Fall 1998.
• Domenico Talia and Pradip K. Srimani. Parallel data-intensive algorithms and applications. Parallel Computing, 28(5):669-671, May 2002.
• Len Wisniewski, Brad Smisloff, and Nils Nieuwejaar. Sun MPI I/O: Efficient I/O for parallel applications. In Proceedings of SC99: High Performance Networking and Computing, Portland, OR, November 1999. ACM Press and IEEE Computer Society Press.
• K. K. Lee, M. Kallahalla, B. S. Lee, and P. J. Varman. Performance comparison of prefetching and placement policies for parallel I/O. International Journal of Parallel and Distributed Systems and Networks, 5(2):76-84, 2002.
• M. Kallahalla and P. J. Varman. PC-OPT: Optimal offline prefetching and caching for parallel I/O systems. IEEE Transactions on Computers, 51(11):1333-1344, November 2002.