Dynamic Active Storage for
High Performance I/O
Chao Chen (chao.chen@ttu.edu)
4.02.2012
UREaSON
Outline
Ø Background
Ø Active Storage
Ø Issues/challenges
Ø Dynamic Active Storage
Ø Prototyping and Evaluation
Ø Conclusion and future work
Background
Ø  Applications from the area of geographical
information systems, Climate Science,
astrophysics, high-energy physics, etc. are
becoming more and more data intensive.
§ 
NASA’s Shuttle Radar Topography Mission (10TB)
§ 
FLASH: Buoyancy-Driven Turbulent Nuclear
Burning(75TB~300TB)
§ 
Climate Science (10TB~355TB)
Ø  Efficient tools are needed to store and
analyze these data sets.
Background
Ø  CN: compute nodes, dedicated for processing (sum,
minus, multiple etc.)
Ø  It is very time consuming.
Ø  SN: storage nodes, dedicated for storing the data.
Ø  I/O operations dominate the
system performance
[Figure: Traditional storage architecture. The application and analysis kernel run on the compute nodes (CN1…CNn); I/O requests travel over the network to the storage nodes (SN1…SNm), which read the data from disk and ship it back.]
Active Storage
Ø  Active Storage was proposed to mitigate such issue, and attracted intensive
attention.
Ø  It moves appropriate computations near data (Storage Nodes)
[Figure: Active storage architecture. The application on the compute node issues an I/O request; the analysis kernel runs on the storage node next to the data on disk, and only the result is returned, so the network bandwidth cost is reduced.]
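As a back-of-the-envelope illustration of this bandwidth saving, the sketch below compares the bytes that cross the network under traditional storage and under active storage for a reduction such as SUM; the function names and sizes are illustrative, not part of any prototype.

def traffic_traditional(file_size_bytes):
    # Traditional storage: the whole file crosses the network to the compute nodes.
    return file_size_bytes

def traffic_active(result_size_bytes):
    # Active storage: the analysis kernel runs on the storage nodes,
    # so only the (much smaller) result crosses the network.
    return result_size_bytes

# Example: a SUM over a 24 GB dataset reduces to a single 8-byte value.
file_size = 24 * 1024**3
print(traffic_traditional(file_size))  # 25769803776 bytes over the network
print(traffic_active(8))               # 8 bytes over the network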
Active Storage
Two well-known prototypes:
Ø  Felix et al. proposed the first prototype, based on Lustre
§  Supports only limited and simple operations
§  Lacks a flexible method to add processing kernels
[Figure: Lustre-based active storage architecture on an OST: NAL, ASOBD, OBDfilter, ext3, and ASDEV, with a user-space processing component.]
Active Storage
Ø  Woo et. al proposed another
prototype based on PVFS
§ 
It provides a more sophisticated
prototype based on MPI
§ 
User can register their process
kernels
[Figure: PVFS-based active storage architecture. Clients (Client 1…Client n) run the application on top of the parallel file system client, using either the parallel file system API or the active storage API; registered kernels run on the servers (Server 1…Server n), which are connected by an interconnection network and equipped with disks and GPUs.]
Issues/Challenges
Ø  All existing studies don’t consider data dependence
Ø  Dependence commonly exists among data accesses
Issues/Challenges
For example, flow-direction and flow-accumulation operations in terrain analysis:
[Fig. 1: Examples of SFD (single flow direction) and MFD (multiple flow direction) over a latitude/longitude grid.]
Issues/Challenges
Ø  Dependence has a great impact on performance
[Figure: Execution time (s) of TS and AS for data sizes of 24–60 GB. Left: performance of Active Storage with no dependence (SUM operation). Right: performance of Active Storage with dependence (flow-routing operation).]
Question: is every operation suitable to be offloaded to the storage nodes?
Data Dependence
[Figure: A terrain map of N cells is divided into stripes (each stripe is 64 KB in PVFS) and distributed across storage servers (Server a, Server b, Server c), each running an analysis kernel over its local disk. Under the default distribution, neighboring stripes (e.g., stripe o, stripe p, stripe q) can land on different servers.]
Possible bandwidth cost: 2 times the file size
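The sketch below illustrates, with hypothetical names and numbers, why the default round-robin striping is hostile to neighbor-dependent kernels such as flow routing: with more than one server, every stripe's next neighbor lives on a different server, so roughly an extra copy of the data must cross the network, which matches the 2-times bandwidth cost noted above.

def server_of(stripe, num_servers):
    # Default PVFS-style round-robin placement: stripe i lives on server i mod D.
    return stripe % num_servers

def remote_neighbor_fraction(num_stripes, num_servers):
    # Fraction of stripes whose next stripe (needed by a neighbor-dependent
    # kernel such as flow routing) is stored on a different server.
    remote = sum(1 for s in range(num_stripes - 1)
                 if server_of(s, num_servers) != server_of(s + 1, num_servers))
    return remote / (num_stripes - 1)

# With round-robin striping and more than one server, every consecutive pair of
# stripes is split across servers, so almost all dependent data is remote and
# the total traffic approaches twice the file size.
print(remote_neighbor_fraction(num_stripes=1024, num_servers=3))  # -> 1.0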
Dynamic Active Storage
A Dynamic Active Storage (DAS) prototype is proposed that:
Ø  Predicts the I/O bandwidth cost before an active I/O request is accepted
Ø  Dynamically determines which operations are beneficial to offload and process on the storage nodes
Ø  Introduces a new data layout method
DAS System Architecture
[Figure: DAS system architecture, with the new components highlighted.]
Key components:
1.  Bandwidth prediction
2.  Data distribution calculation (layout optimizer)
3.  Kernel features
4.  Local I/O API
5.  Processing kernels
Bandwidth Prediction
Knowing the dependence pattern, we can calculate the data locations and estimate the bandwidth cost in advance.
[Figure: data elements i, j, k separated by a constant stride.]
Under round-robin striping, the location of the ith element is
    L_i = floor((i × E) / stripe_size) mod D
where
    i, j, k – the ith, jth, and kth data elements
    E – data element size
    D – number of storage nodes
    L – location (storage node) of a data element
    stripe_size – parallel file system striping parameter
Bandwidth Prediction
if Formula 1 holds (the dependent elements i, j, and k have the same location, L_i = L_j = L_k), then
    all of the dependent data is located on the same storage node, so accept the active I/O (offload) request;
else
    the offload would cost roughly 2 times the file size in bandwidth, so reject the active I/O request.
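A minimal sketch of how this prediction step could be implemented, assuming the round-robin location formula L_i above; the function names and the way dependent indices are passed in are illustrative, not the prototype's actual API.

def location(i, element_size, stripe_size, num_servers):
    # L_i: storage node holding the i-th data element under round-robin striping.
    return (i * element_size // stripe_size) % num_servers

def accept_active_io(dependent_indices, element_size, stripe_size, num_servers):
    # Accept the active I/O (offload) request only if every dependent element is
    # predicted to reside on the same storage node (the Formula 1 condition);
    # otherwise the offload would cost roughly twice the file size in bandwidth.
    nodes = {location(i, element_size, stripe_size, num_servers)
             for i in dependent_indices}
    return len(nodes) == 1

# Example: elements i, j = i + stride, k = i + 2*stride, with 8-byte elements,
# 64 KB stripes, and 4 storage nodes.
i, stride = 1000, 16
print(accept_active_io([i, i + stride, i + 2 * stride],
                       element_size=8, stripe_size=64 * 1024, num_servers=4))  # -> True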
Issues/Challenges
On the other hand, successive operations commonly share the same data access pattern in terrain analysis and image processing
§  For example, flow-direction is always followed by flow-accumulation in terrain analysis
§  Flow-direction generates an intermediate image/map that flow-accumulation consumes
Layout Optimizer
A new data distribution method is introduced (see the sketch after this list):
Ø  Adopt a suitable data distribution for storing the intermediate image/data
Ø  Ensure no or little data dependency for successive operations (such as flow-accumulation)
Ø  The round-robin pattern is discarded; each storage node stores k successive stripes
Ø  Two copies of each boundary data stripe are stored on two successive storage nodes
Layout Optimizer
[Figure: Normal (round-robin) data layout. The terrain map is divided into stripes 1…N; neighboring stripes (stripe l…stripe q) are spread across Server a and Server b, so processing them requires data transfer between the servers.]
Layout Optimizer
[Figure: Optimized data layout. Each server stores successive stripes (e.g., Server a: stripes l, m, n; Server b: stripes o, p, q), and copies of the boundary stripes are kept on the neighboring server, reducing the data transfer between servers.]
Layout Optimizer
New formulas:
[Figure: formulas for choosing the parameters of the optimized layout.]
What the prototype needs to do is calculate suitable values for k, D, and stripe_size.
Evaluation
Platform: Hrothgar Cluster
# of Nodes: 24, 36, 48, 60
Evaluated operations: Flow-routing, Flow-accumulation, and 2D Gaussian Filter
Data set size: 24 GB, 36 GB, 48 GB, and 60 GB
Evaluated schemes: TS (traditional storage), NAS (normal active storage), DAS (proposed prototype)
Impact of Data Dependence
[Figure: Performance Impact of Data Dependence. Execution time (s) of flow-routing, flow-accumulation, and Gaussian filter under the NAS and TS schemes for data sizes of 24–60 GB.]
The execution time of the NAS scheme is compared with that of the TS scheme.
Performance Improvement
[Figure: Execution Time of Each Scheme. Execution time (s) of NAS, DAS, and TS for flow-routing, flow-accumulation, and Gaussian filter (24 GB data, 24 nodes).]
§  30% improvement vs. TS
§  60% improvement vs. NAS
Scalability Analysis
[Figure: Scalability with Varied Number of Nodes. Execution time (s) of flow-routing, flow-accumulation, and Gaussian filter under DAS and TS as the number of nodes increases from 24 to 60.]
All execution times decrease by about 15% for every 12 additional nodes.
Scalability Analysis
[Figure: Scalability with Varied Data Set Size. Execution time (s) of flow-routing, flow-accumulation, and Gaussian filter under NAS, DAS, and TS for data sizes of 24–60 GB.]
For every additional 12 GB of data, execution time increases by about 15% for DAS and about 30% for NAS and TS.
Bandwidth Improvement
[Figure: Normalized Bandwidth. Normalized sustained bandwidth of NAS, DAS, and TS for data sizes of 24–60 GB.]
Compared to TS, DAS achieves 1.8 times the sustained bandwidth, while NAS achieves only 0.7 times.
Conclusion and Future Work
Ø  Data dependence has a great impact on performance of Active Storage
Ø  DAS is introduced to solve such challenge issue
Ø  Resource contention
References
1.  R. Ross, R. Latham, M. Unangst, and B. Welch. Parallel I/O in Practice. Tutorial at the ACM/IEEE Supercomputing Conference, 2009.
2.  J. F. O'Callaghan and D. M. Mark. The Extraction of Drainage Networks from Digital Elevation Data. Computer Vision, Graphics, and Image Processing, 28:323–344, 1984.
3.  J. Piernas, J. Nieplocha, and E. J. Felix. Evaluation of Active Storage Strategies for the Lustre Parallel File System. In Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, 2007.
4.  E. J. Felix, K. Fox, K. Regimbal, and J. Nieplocha. Active Storage Processing in a Parallel File System. In 6th LCI International Conference on Linux Clusters: The HPC Revolution, Chapel Hill, North Carolina, 2005.
5.  S. W. Son, S. Lang, P. Carns, R. Ross, and R. Thakur. Enabling Active Storage on Parallel I/O Software Stacks. In 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2010.
Thank you