Locality-driven High-level I/O Aggregation
for Processing Scientific Datasets
Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen
Oct. 15, 2013 @ U-REaSON Seminar
Data-Intensive Scalable Computing Laboratory (DISCL)
Introduction
• Scientific simulations nowadays generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future.
• VPIC (Vector Particle-in-Cell), plasma physics: 26 bytes per particle, 30 TB of data.
• Accessing and analyzing these data suffers poor I/O performance due to the mismatch between logical access and physical layout.
Introduction
• Scientific datasets and scientific I/O libraries: PnetCDF, HDF5, ADIOS
• I/O software stack: PnetCDF → MPI-IO → parallel file systems
• Scientific I/O libraries allow users to specify array-based logical input (a start corner and a length per dimension); see the sketch below.
• Logical-physical mismatch: the logical view exposed to users does not match the physical layout on storage.
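
As a concrete illustration of array-based logical input, here is a minimal PnetCDF read sketch. The file name vpic.nc, the variable name particles, and the float element type are assumptions for illustration; the start/count values mirror Call0 from the decomposition example later in the slides.

/* Minimal sketch: array-based logical read with PnetCDF.
 * "vpic.nc", "particles", and the float type are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv) {
    int ncid, varid;
    MPI_Init(&argc, &argv);

    ncmpi_open(MPI_COMM_WORLD, "vpic.nc", NC_NOWRITE, MPI_INFO_NULL, &ncid);
    ncmpi_inq_varid(ncid, "particles", &varid);

    /* Logical request: a start corner and an edge length per dimension,
     * independent of how the data are laid out on disk. */
    MPI_Offset start[3] = {0, 0, 0};
    MPI_Offset count[3] = {100, 200, 200};
    float *buf = malloc(sizeof(float) * 100 * 200 * 200);

    /* Collective read of the sub-array described by start/count. */
    ncmpi_get_vara_float_all(ncid, varid, start, count, buf);

    ncmpi_close(ncid);
    free(buf);
    MPI_Finalize();
    return 0;
}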
Motivation
I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5), contrasted in the sketch after the table:
Method            Process collaboration   Call collaboration
Independent I/O   No                      No
Collective I/O    Yes                     No
Nonblocking I/O   Yes                     Yes
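
A sketch (not from the slides) of how the three read paths above are invoked in PnetCDF. The helper name read_three_ways and the two start/count/buffer pairs are assumptions; the same sub-arrays are read three ways purely to contrast the call patterns.

#include <pnetcdf.h>

/* Hypothetical helper contrasting the three PnetCDF read paths.
 * ncid/varid and the two sub-array requests come from the caller. */
static void read_three_ways(int ncid, int varid,
                            MPI_Offset start0[3], MPI_Offset count0[3], float *buf0,
                            MPI_Offset start1[3], MPI_Offset count1[3], float *buf1)
{
    /* Independent I/O: no collaboration among processes or among calls. */
    ncmpi_begin_indep_data(ncid);
    ncmpi_get_vara_float(ncid, varid, start0, count0, buf0);
    ncmpi_get_vara_float(ncid, varid, start1, count1, buf1);
    ncmpi_end_indep_data(ncid);

    /* Collective I/O: processes collaborate within one call (two-phase I/O),
     * but separate calls remain unaware of each other. */
    ncmpi_get_vara_float_all(ncid, varid, start0, count0, buf0);
    ncmpi_get_vara_float_all(ncid, varid, start1, count1, buf1);

    /* Nonblocking I/O: calls are posted first and carried out together at
     * ncmpi_wait_all(), so both processes and calls can collaborate. */
    int reqs[2], stats[2];
    ncmpi_iget_vara_float(ncid, varid, start0, count0, buf0, &reqs[0]);
    ncmpi_iget_vara_float(ncid, varid, start1, count1, buf1, &reqs[1]);
    ncmpi_wait_all(ncid, 2, reqs, stats);
}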
Motivation
[Figure: Two-phase collective I/O. Requests Call0, Call1, ..., Calli are redistributed among aggregators (ag00 ... agi3); without locality awareness, overlapping requests cause contention on the storage servers.]
Performance with Overlapping Calls
[Figure: I/O cost (s) vs. number of calls (1, 5, 10, 20, 30, 40, 50) for Independent, Collective, and Nonblocking Collective I/O, comparing non-overlapping calls with overlapping calls, plus a comparison of the three methods.]
Conclusion: overlapping among calls should be removed.
Idea: High-level I/O Aggregation
Example: logical input decomposed into sub-arrays that match the physical layout

Call0: start{0,0,0},     length{100,200,200}
  → sub0: start{0,0,0},     length{100,200,100}
  → sub1: start{0,0,100},   length{100,200,100}

Call1: start{10,20,100},  length{10,300,400}
  → sub2: start{10,20,100},  length{10,150,400}
  → sub3: start{10,170,100}, length{10,150,400}
Idea: High-level I/O Aggregation
Basic Idea
• Figure out the overlapping among requests (see the sketch below)
• Eliminate the overlapping before doing I/O
Challenges
• How to decompose the requests
• How to aggregate the sub-arrays at a high level
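
A minimal sketch of the first step: detecting overlap between two sub-array requests described by start/length, as in the decomposition example. The names SubArray, overlaps, and intersection are illustrative, not taken from the HiLa paper.

#include <stdbool.h>

#define NDIMS 3

/* A sub-array request: a start corner and a length per dimension. */
typedef struct {
    long start[NDIMS];
    long length[NDIMS];
} SubArray;

/* Two sub-arrays overlap iff their extents intersect in every dimension. */
static bool overlaps(const SubArray *a, const SubArray *b)
{
    for (int d = 0; d < NDIMS; d++) {
        long a_end = a->start[d] + a->length[d];   /* exclusive end */
        long b_end = b->start[d] + b->length[d];
        if (a_end <= b->start[d] || b_end <= a->start[d])
            return false;                          /* disjoint in dim d */
    }
    return true;
}

/* The overlapping region itself (valid only when overlaps() is true);
 * HiLa would read such a region once instead of once per request. */
static SubArray intersection(const SubArray *a, const SubArray *b)
{
    SubArray r;
    for (int d = 0; d < NDIMS; d++) {
        long lo   = a->start[d] > b->start[d] ? a->start[d] : b->start[d];
        long hi_a = a->start[d] + a->length[d];
        long hi_b = b->start[d] + b->length[d];
        long hi   = hi_a < hi_b ? hi_a : hi_b;
        r.start[d]  = lo;
        r.length[d] = hi - lo;
    }
    return r;
}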
HiLa: High-Level I/O Aggregation
Way to figure out the physical layout (see the sketch below)
• Sub-correlation function
• Lustre striping: stripe size t, stripe count l
• Dataset: dimension d, subset size m
• Sub-correlation set
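
A minimal sketch of the locality information behind the sub-correlation function, assuming plain round-robin Lustre striping with stripe size t and stripe count l. The function names and the flat-offset view are assumptions for illustration, not the paper's exact formulation.

typedef struct {
    long stripe_size;   /* t: bytes per stripe */
    int  stripe_count;  /* l: number of OSTs the file is striped over */
} Striping;

/* OST index (0 .. l-1) that stores byte `offset` of the file under
 * round-robin striping. */
static int offset_to_server(long offset, const Striping *s)
{
    return (int)((offset / s->stripe_size) % s->stripe_count);
}

/* Mark the set of servers touched by a contiguous file extent
 * [offset, offset+len), len >= 1: the basis of a sub-correlation set
 * relating sub-arrays to storage servers. */
static void extent_servers(long offset, long len, const Striping *s,
                           int touched[/* stripe_count */])
{
    for (int i = 0; i < s->stripe_count; i++)
        touched[i] = 0;
    for (long stripe = offset / s->stripe_size;
         stripe <= (offset + len - 1) / s->stripe_size; stripe++)
        touched[stripe % s->stripe_count] = 1;
}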
HiLa Algorithm: Prior Step
Prior step: calculate the sub-correlation set (a one-time analysis)
HiLa Algorithm: Decomposition
Main steps: request decomposition and aggregation (see the sketch below)
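
An illustrative sketch of the aggregation step, assuming each decomposed sub-array has already been mapped to a contiguous extent (offset, length) on one server; overlapping or touching extents bound for the same server are merged so each byte is requested only once. This is the interval-merge idea behind eliminating overlap, not the paper's exact algorithm.

#include <stdlib.h>

typedef struct {
    long offset;  /* byte offset of the extent in the file */
    long length;  /* extent length in bytes */
} Extent;

static int cmp_extent(const void *pa, const void *pb)
{
    const Extent *a = pa, *b = pb;
    return (a->offset > b->offset) - (a->offset < b->offset);
}

/* Merge overlapping/adjacent extents in place for one server;
 * returns the number of aggregated extents to actually read. */
static int aggregate(Extent *e, int n)
{
    if (n == 0) return 0;
    qsort(e, n, sizeof(Extent), cmp_extent);

    int m = 0;                       /* index of last merged extent */
    for (int i = 1; i < n; i++) {
        long end = e[m].offset + e[m].length;
        if (e[i].offset <= end) {    /* overlaps or touches: extend */
            long new_end = e[i].offset + e[i].length;
            if (new_end > end)
                e[m].length = new_end - e[m].offset;
        } else {
            e[++m] = e[i];           /* disjoint: start a new extent */
        }
    }
    return m + 1;
}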
Improvement with HiLa
[Figure: I/O cost (s) vs. number of calls (5, 10, 20, 30, 40, 50), comparing Independent vs. HiLa-ind, Collective vs. HiLa-col, and Nonblocking Collective vs. HiLa-nbc.]

I/O cost (s), Traditional vs. HiLa:
                Independent    Collective    Nonblocking Collective
Traditional     2.769361       12.567792     5.693901
HiLa            2.262536       12.118085     4.613422
Performance Improved with HiLa
[Figure: FASM improved with HiLa; I/O cost (s) and speedup vs. number of calls (5, 10, 20, 30, 40, 50), comparing FASM with FASM-HiLa.]
Conclusion and Future Work
Conclusion
• The mismatch between logical access and physical layout can lead to poor I/O performance.
• We propose a locality-driven high-level aggregation approach (HiLa) that improves existing I/O methods by eliminating the overlap among sub-array requests.
Future Work
• Apply to write operations
• Integrate with file systems
Locality-driven High-level I/O Aggregation
for Processing Scientific Datasets
Thanks
Q&A
http://discl.cs.ttu.edu