Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen
Oct. 15, 2013 @ U-REaSON Seminar
Data-Intensive Scalable Computing Laboratory (DISCL)

Introduction
- Scientific simulations nowadays generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future.
- Example: VPIC (Vector Particle-In-Cell), a plasma physics simulation, stores 26 bytes per particle and produces about 30 TB per run.
- Accessing and analyzing these data shows poor I/O performance due to the mismatch between the logical access pattern and the physical data layout.

Introduction: Scientific Datasets and Scientific I/O Libraries
- Scientific I/O libraries: PnetCDF, HDF5, ADIOS.
- Software stack: PnetCDF runs on top of MPI-IO, which runs on top of parallel file systems.
- Scientific I/O libraries allow users to specify array-based logical input.
- The logical access pattern does not match the physical layout (logical-physical mismatch).

Motivation
- I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5):
  - Independent I/O: collaboration among processes: no; collaboration among calls: no.
  - Collective I/O: collaboration among processes: yes; collaboration among calls: no.
  - Nonblocking I/O: collaboration among processes: yes; collaboration among calls: yes.
- [Figure: each call (Call0, Call1, ..., Calli) goes through two-phase collective I/O with its own aggregators (ag00-ag03, ..., agi0-agi3), causing contention on the storage servers when locality is not taken into account.]

Performance with Overlapping Calls
- [Figure: I/O cost (s) vs. number of calls (1-50) for independent, collective, and nonblocking collective I/O, each measured with overlapping and non-overlapping calls, plus a comparison of the three methods.]
- Conclusion: overlapping among calls should be removed.

Idea: High-level I/O Aggregation
- Decompose the logical input according to the physical layout, for example:
  - Call0: start {0,0,0}, length {100,200,200} decomposes into
    - sub0: start {0,0,0}, length {100,200,100}
    - sub1: start {0,0,100}, length {100,200,100}
  - Call1: start {10,20,100}, length {10,300,400} decomposes into
    - sub2: start {10,20,100}, length {10,150,400}
    - sub3: start {10,170,100}, length {10,150,400}
- Basic idea:
  - Figure out the overlap among requests.
  - Eliminate the overlap before doing I/O.
- Challenges:
  - How to decompose the requests.
  - How to aggregate the sub-arrays at a high level.

HiLa: High-level I/O Aggregation
- A way to figure out the physical layout: the sub-correlation function.
- Lustre striping parameters: stripe size t, stripe count l.
- Dataset parameters: dimension d, subset size m.
- Together these define the sub-correlation set.

HiLa Algorithm: Prior Step
- Calculate the sub-correlation set (a one-time analysis).

HiLa Algorithm: Decomposition
- Main steps: request decomposition and aggregation (see the sketch below).
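The slides give only the structure of the algorithm (a one-time sub-correlation analysis, then request decomposition and aggregation), so the following is a minimal Python sketch of that idea rather than the authors' implementation. It assumes requests have already been linearized to (offset, length) byte extents in a single file striped round-robin over l storage targets with stripe size t; the function names stripe_of, decompose, and aggregate are illustrative only.

```python
# Minimal sketch (not the HiLa code): stripe-aware decomposition and
# aggregation of byte-range requests. Assumptions: the dataset is already
# linearized to byte offsets in one file, the file is striped round-robin
# over `l` storage targets with stripe size `t`, and each I/O call is a
# list of (offset, length) extents. All names here are hypothetical.

def stripe_of(offset, t, l):
    """Sub-correlation: which storage target holds this byte offset."""
    return (offset // t) % l

def decompose(extent, t):
    """Split one (offset, length) extent at stripe boundaries."""
    offset, length = extent
    pieces = []
    while length > 0:
        in_stripe = t - (offset % t)          # bytes left in current stripe
        step = min(in_stripe, length)
        pieces.append((offset, step))
        offset += step
        length -= step
    return pieces

def aggregate(calls, t, l):
    """Decompose all calls, drop overlap, and group pieces per target."""
    per_target = {i: [] for i in range(l)}
    for call in calls:                        # each call = list of extents
        for extent in call:
            for off, ln in decompose(extent, t):
                per_target[stripe_of(off, t, l)].append((off, off + ln))
    # Coalesce overlapping or adjacent ranges on each target so every byte
    # is read at most once (the "eliminate overlap" step).
    for tgt, ranges in per_target.items():
        ranges.sort()
        merged = []
        for lo, hi in ranges:
            if merged and lo <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        per_target[tgt] = [(lo, hi - lo) for lo, hi in merged]
    return per_target

if __name__ == "__main__":
    # Two overlapping calls, stripe size 1 MiB, 4 storage targets.
    t, l = 1 << 20, 4
    call0 = [(0, 3 << 20)]                    # bytes [0, 3 MiB)
    call1 = [(2 << 20, 2 << 20)]              # bytes [2 MiB, 4 MiB), overlaps call0
    for tgt, extents in aggregate([call0, call1], t, l).items():
        print("target", tgt, "->", extents)
```

In HiLa itself the decomposition operates on d-dimensional sub-arrays (as in the Call0/Call1 example above) and the aggregated pieces are still issued through the existing I/O methods, but the locality mapping and overlap elimination follow the same spirit as this sketch.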
Improvement with HiLa
- [Figure: I/O cost (s) vs. number of calls (5-50) for independent vs. HiLa-ind, nonblocking collective vs. HiLa-nbc, and collective vs. HiLa-col.]
- I/O cost (s), traditional vs. HiLa:

  Method                   Traditional   HiLa
  Independent              2.769361      2.262536
  Collective               12.567792     12.118085
  Nonblocking Collective   5.693901      4.613422

Performance Improved with HiLa
- [Figure: I/O cost (s) and speedup vs. number of calls (5-50) for FASM vs. FASM-HiLa; FASM is improved with HiLa.]

Conclusion and Future Work
- Conclusion:
  - The mismatch between logical access and physical layout can lead to poor I/O performance.
  - We propose the locality-driven high-level aggregation approach (HiLa) to improve the existing I/O methods by eliminating the overlap among sub-array requests.
- Future work:
  - Apply HiLa to write operations.
  - Integrate HiLa with file systems.

Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Thanks / Q&A
http://discl.cs.ttu.edu