Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen
Oct. 15, 2013 @ U-REaSON Seminar
Data-Intensive Scalable Computing Laboratory (DISCL)

Introduction
- Scientific simulations nowadays generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future.
- Example: VPIC (Vector Particle-In-Cell), a plasma physics simulation, stores 26 bytes per particle and produces about 30 TB per run.
- Accessing and analyzing these data shows poor I/O performance due to the mismatch between the logical access pattern and the physical data layout.

Introduction: Scientific Datasets and Scientific I/O Libraries
- Scientific I/O libraries: PnetCDF, HDF5, ADIOS.
- Software stack: PnetCDF runs on top of MPI-IO, which runs on top of parallel file systems.
- Scientific I/O libraries allow users to specify array-based logical input.
- The logical access pattern does not match the physical layout (logical-physical mismatch).

Motivation
- I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5):
  - Independent I/O: collaboration among processes: no; collaboration among calls: no.
  - Collective I/O: collaboration among processes: yes; collaboration among calls: no.
  - Nonblocking I/O: collaboration among processes: yes; collaboration among calls: yes.
- [Figure: each call (Call0, Call1, ..., Calli) goes through two-phase collective I/O with its own aggregators (ag00-ag03, ..., agi0-agi3), causing contention on the storage servers when locality is not taken into account.]

Performance with Overlapping Calls
- [Figure: I/O cost (s) vs. number of calls (1-50) for independent, collective, and nonblocking collective I/O, each measured with overlapping and non-overlapping calls, plus a comparison of the three methods.]
- Conclusion: overlapping among calls should be removed.

Idea: High-level I/O Aggregation
- Decompose the logical input according to the physical layout, for example:
  - Call0: start {0,0,0}, length {100,200,200} decomposes into
    - sub0: start {0,0,0}, length {100,200,100}
    - sub1: start {0,0,100}, length {100,200,100}
  - Call1: start {10,20,100}, length {10,300,400} decomposes into
    - sub2: start {10,20,100}, length {10,150,400}
    - sub3: start {10,170,100}, length {10,150,400}
- Basic idea:
  - Figure out the overlap among requests.
  - Eliminate the overlap before doing I/O.
- Challenges:
  - How to decompose the requests.
  - How to aggregate the sub-arrays at a high level.

HiLa: High-level I/O Aggregation
- A way to figure out the physical layout: the sub-correlation function.
- Lustre striping parameters: stripe size t, stripe count l.
- Dataset parameters: dimension d, subset size m.
- Together these define the sub-correlation set.

HiLa Algorithm: Prior Step
- Calculate the sub-correlation set (a one-time analysis).

HiLa Algorithm: Decomposition
- Main steps: request decomposition and aggregation (see the sketch below).
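The slides give only the structure of the algorithm (a one-time sub-correlation analysis, then request decomposition and aggregation), so the following is a minimal Python sketch of that idea rather than the authors' implementation. It assumes requests have already been linearized to (offset, length) byte extents in a single file striped round-robin over l storage targets with stripe size t; the function names stripe_of, decompose, and aggregate are illustrative only.

```python
# Minimal sketch (not the HiLa code): stripe-aware decomposition and
# aggregation of byte-range requests. Assumptions: the dataset is already
# linearized to byte offsets in one file, the file is striped round-robin
# over `l` storage targets with stripe size `t`, and each I/O call is a
# list of (offset, length) extents. All names here are hypothetical.

def stripe_of(offset, t, l):
    """Sub-correlation: which storage target holds this byte offset."""
    return (offset // t) % l

def decompose(extent, t):
    """Split one (offset, length) extent at stripe boundaries."""
    offset, length = extent
    pieces = []
    while length > 0:
        in_stripe = t - (offset % t)          # bytes left in current stripe
        step = min(in_stripe, length)
        pieces.append((offset, step))
        offset += step
        length -= step
    return pieces

def aggregate(calls, t, l):
    """Decompose all calls, drop overlap, and group pieces per target."""
    per_target = {i: [] for i in range(l)}
    for call in calls:                        # each call = list of extents
        for extent in call:
            for off, ln in decompose(extent, t):
                per_target[stripe_of(off, t, l)].append((off, off + ln))
    # Coalesce overlapping or adjacent ranges on each target so every byte
    # is read at most once (the "eliminate overlap" step).
    for tgt, ranges in per_target.items():
        ranges.sort()
        merged = []
        for lo, hi in ranges:
            if merged and lo <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        per_target[tgt] = [(lo, hi - lo) for lo, hi in merged]
    return per_target

if __name__ == "__main__":
    # Two overlapping calls, stripe size 1 MiB, 4 storage targets.
    t, l = 1 << 20, 4
    call0 = [(0, 3 << 20)]                    # bytes [0, 3 MiB)
    call1 = [(2 << 20, 2 << 20)]              # bytes [2 MiB, 4 MiB), overlaps call0
    for tgt, extents in aggregate([call0, call1], t, l).items():
        print("target", tgt, "->", extents)
```

In HiLa itself the decomposition operates on d-dimensional sub-arrays (as in the Call0/Call1 example above) and the aggregated pieces are still issued through the existing I/O methods, but the locality mapping and overlap elimination follow the same spirit as this sketch.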
Improvement with HiLa
- [Figure: I/O cost (s) vs. number of calls (5-50) for independent vs. HiLa-ind, nonblocking collective vs. HiLa-nbc, and collective vs. HiLa-col.]
- I/O cost (s), traditional vs. HiLa:

  Method                   Traditional   HiLa
  Independent              2.769361      2.262536
  Collective               12.567792     12.118085
  Nonblocking Collective   5.693901      4.613422

Performance Improved with HiLa
- [Figure: I/O cost (s) and speedup vs. number of calls (5-50) for FASM vs. FASM-HiLa; FASM is improved with HiLa.]

Conclusion and Future Work
- Conclusion:
  - The mismatch between logical access and physical layout can lead to poor I/O performance.
  - We propose the locality-driven high-level aggregation approach (HiLa) to improve the existing I/O methods by eliminating the overlap among sub-array requests.
- Future work:
  - Apply HiLa to write operations.
  - Integrate HiLa with file systems.

Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Thanks / Q&A
http://discl.cs.ttu.edu