Dynamic Active Storage for High Performance I/O
Chao Chen (chao.chen@ttu.edu)
4.02.2012
UREaSON

Outline
- Background
- Active Storage
- Issues/challenges
- Dynamic Active Storage
- Prototyping and evaluation
- Conclusion and future work

Background
- Applications in geographic information systems, climate science, astrophysics, high-energy physics, etc. are becoming more and more data intensive.
  - NASA's Shuttle Radar Topography Mission (10 TB)
  - FLASH: Buoyancy-Driven Turbulent Nuclear Burning (75 TB to 300 TB)
  - Climate science (10 TB to 355 TB)
- Efficient tools are needed to store and analyze these data sets.

Background
- CN: compute nodes, dedicated to processing (sum, subtraction, multiplication, etc.).
- SN: storage nodes, dedicated to storing the data.
- Moving data between them is very time consuming: I/O operations dominate system performance.
[Figure: compute nodes CN1..CNn connected to storage nodes SN1..SNm over a network; the application and analysis kernel run on the compute nodes and issue I/O requests for data stored on the storage nodes' disks]

Active Storage
- Active Storage was proposed to mitigate this issue and has attracted intensive attention.
- It moves appropriate computations near the data, onto the storage nodes.
[Figure: the application on the compute node sends an I/O request; the analysis kernel runs on the storage node against the data on disk and returns only the result, so the network bandwidth cost is reduced]

Active Storage
Two well-known prototypes:
- Felix et al. proposed the first prototype, based on Lustre.
  - Supports only limited, simple operations.
  - Lacks a flexible method for adding processing kernels.
[Figure: Lustre-based architecture with NAL, a user-space processing component, OST, ASOBD, OBDfilter, ext3, and ASDEV]

Active Storage
- Son et al. proposed another prototype, based on PVFS.
  - A more sophisticated prototype, built on MPI.
  - Users can register their own processing kernels.
[Figure: PVFS-based architecture: application clients 1..n use the parallel file system API and an Active Storage API over an interconnection network; servers 1..n host the registered kernels, disks, and GPUs]

Issues/Challenges
- Existing studies do not consider data dependence.
- Dependence commonly exists among data accesses.
- Example: the flow-direction and flow-accumulation operations in terrain analysis.
[Fig. 1: examples of SFD (single flow direction) and MFD (multiple flow direction) over a latitude/longitude grid]

Issues/Challenges
- Dependence has a great impact on performance.
[Figure: execution time (s) of TS and AS versus data size (24 GB to 60 GB) for a SUM operation (no dependence) and a flow-routing operation (with dependence)]
- Question: is every operation suitable to be offloaded to the storage nodes?

Data Dependence
- A terrain map is split into stripes 1..N; each stripe is 64 KB in PVFS.
- Under round-robin distribution, the neighbors s1..s8 of an element s can fall into stripes o, p, and q, which are stored on different servers a, b, and c, each running its own analysis kernel (see the sketch after the architecture overview).
- Possible bandwidth cost: 2 times the file size.
[Figure: terrain map stripes, a possible data distribution across servers a, b, and c, and the resulting cross-server accesses]

Dynamic Active Storage
A Dynamic Active Storage (DAS) prototype is proposed. It:
- predicts the I/O bandwidth cost before an active I/O request is accepted;
- dynamically determines which operations are beneficial to offload to and process on the storage nodes;
- introduces a new data layout method.

DAS System Architecture
[Figure: DAS system architecture, with the new components highlighted]
Key components:
1. Bandwidth prediction
2. Data distribution calculation (layout optimizer)
3. Kernel features
4. Local I/O API
5. Processing kernels
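To make the striping arithmetic behind the data-dependence problem (and the bandwidth-prediction component listed above) concrete, here is a minimal sketch in Python. It is an illustration, not code from the DAS prototype: the element size, grid shape, and function names are assumptions; only the 64 KB stripe size and the round-robin placement come from the slides.

```python
# Minimal sketch (not the DAS code base): locate a grid cell's stripe and
# storage node under PVFS-style round-robin striping, then check whether the
# 3x3 neighborhood used by flow-direction stays on a single node.
# Assumed parameters: 8-byte cells, row-major layout.

STRIPE_SIZE = 64 * 1024   # bytes; PVFS stripe size from the slides
ELEM_SIZE   = 8           # bytes per cell (assumption)

def node_of(row, col, ncols, num_nodes):
    """Return the storage node holding cell (row, col) under round-robin striping."""
    offset = (row * ncols + col) * ELEM_SIZE   # byte offset in the file
    stripe = offset // STRIPE_SIZE             # which stripe the cell falls in
    return stripe % num_nodes                  # round-robin stripe placement

def neighborhood_is_local(row, col, ncols, num_nodes):
    """True if the 3x3 stencil around (row, col) maps to a single storage node."""
    nodes = {node_of(row + dr, col + dc, ncols, num_nodes)
             for dr in (-1, 0, 1) for dc in (-1, 0, 1)}
    return len(nodes) == 1

# Example: a 4096-column terrain map on 4 storage nodes. One row is 32 KB here,
# so a stripe holds only two rows and essentially every interior cell pulls
# neighbors from another node; that cross-node traffic is what DAS predicts.
print(node_of(100, 200, 4096, 4))              # -> 2
print(neighborhood_is_local(100, 200, 4096, 4))  # -> False
```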
Bandwidth Prediction
Given the dependence pattern (e.g., data elements i, j, and k separated by a fixed stride), we can calculate the data locations and estimate the bandwidth cost in advance. Under round-robin striping, the location of the i-th element is

  L(i) = floor((i * E) / Stripe_size) mod D

where:
- i, j, k: the i-th, j-th, and k-th data elements
- E: data element size
- D: number of storage nodes
- L: location (storage node) of a data element
- Stripe_size: the parallel file system striping parameter

Bandwidth Prediction
- If Formula 1 holds, i.e., L(i) = L(j) = L(k), then all dependent data is located on the same storage node, and the offload request is accepted.
- Otherwise, the operation would cost about twice the bandwidth of the file size, and the active I/O request should be rejected.

Issues/Challenges
On the other hand, it is common for successive operations to share the same data access pattern in terrain analysis and image processing.
- For example, flow-direction is always followed by flow-accumulation in terrain analysis.
- Flow-direction generates an intermediate image/map that flow-accumulation consumes.

Layout Optimizer
A new data distribution method is introduced:
- Adopt a suitable data distribution to store the intermediate image/data.
- Ensure no (or little) data dependency for successive operations such as flow-accumulation.
- The round-robin pattern is discarded; each storage node stores k successive stripes.
- Two copies of each boundary data stripe are stored on two successive storage nodes (see the sketch after the performance results below).
[Figure: the normal round-robin layout of stripes 1..N versus the new layout, in which stripes l, m, n reside on server a and stripes o, p, q on server b, with the boundary stripes copied between the two servers to avoid data transfer]

Layout Optimizer
New formulas: with the block layout, what the prototype needs to do is calculate suitable values for k, D, and Stripe_size.

Evaluation
- Platform: Hrothgar cluster
- Number of nodes: 24, 36, 48, 60
- Evaluated operations: flow-routing, flow-accumulation, and a 2D Gaussian filter
- Data set sizes: 24 GB, 36 GB, 48 GB, and 60 GB
- Evaluated schemes: TS (traditional storage), NAS (normal active storage), DAS (the proposed prototype)

Impact of Data Dependence
[Figure: performance impact of data dependence: execution time (s) versus data size (24 GB to 60 GB) for flow-routing, flow-accumulation, and the Gaussian filter under NAS and TS]
The execution time of the NAS scheme is compared with that of the TS scheme.

Performance Improvement
[Figure: comparison of the execution time of NAS, TS, and DAS for flow-routing, flow-accumulation, and the Gaussian filter (24 GB data, 24 nodes)]
- About 30% improvement vs. TS
- About 60% improvement vs. NAS
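Returning to the layout optimizer: here is a minimal sketch, in Python, of the block placement idea (k successive stripes per node, with boundary stripes replicated on the neighboring node). It is an assumption-laden illustration rather than the DAS implementation; the function name, the boundary parameter, and the choice to replicate exactly one stripe per block edge are made up here for clarity.

```python
# Minimal sketch of the DAS-style block layout (structure and names are assumptions):
# instead of round-robin, give each storage node k successive stripes, and replicate
# each block's boundary stripes on the neighboring node so that a stencil which
# straddles a block boundary can still be evaluated locally.

def block_layout(num_stripes, num_nodes, k, boundary=1):
    """Map each stripe index to the list of nodes holding a copy of it."""
    placement = {s: [] for s in range(num_stripes)}
    for s in range(num_stripes):
        primary = (s // k) % num_nodes              # k successive stripes per node
        placement[s].append(primary)
        pos = s % k                                  # position inside the block
        if pos < boundary and s >= k:                # first stripe(s) of a block:
            placement[s].append(((s // k) - 1) % num_nodes)   # copy on previous node
        if pos >= k - boundary and (s // k + 1) * k < num_stripes:  # last stripe(s):
            placement[s].append(((s // k) + 1) % num_nodes)   # copy on next node
    return placement

# Example: 12 stripes, 3 nodes, k = 4. Stripes 3 and 4 (the boundary between the
# blocks on node 0 and node 1) each end up on two nodes.
for stripe, nodes in block_layout(12, 3, 4).items():
    print(stripe, nodes)
```

With this placement, the stripes on either side of a block boundary are available on a single node, which is what lets a dependent operation such as flow-accumulation run on the storage nodes without the extra transfers measured for NAS above.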
Scalability Analysis
[Figure: scalability with a varied number of nodes: execution time (s) versus number of nodes (24 to 60) for flow-routing, flow-accumulation, and the Gaussian filter under DAS and TS]
Comparison of execution time as the number of nodes increases: all execution times decrease by about 15% for every 12 additional nodes.

Scalability Analysis
[Figure: scalability with a varied data set size: execution time (s) versus data size (24 GB to 60 GB) for flow-routing, flow-accumulation, and the Gaussian filter under NAS, DAS, and TS]
When the data size increases by 12 GB, the execution time increases by about 15% for DAS and by about 30% for NAS and TS.

Bandwidth Improvement
[Figure: normalized sustained bandwidth versus data size (24 GB to 60 GB) for NAS, DAS, and TS]
Compared to TS, DAS sustains about 1.8 times the bandwidth, while NAS sustains only about 0.7 times.

Conclusion and Future Work
- Data dependence has a great impact on the performance of Active Storage.
- DAS is introduced to address this challenging issue.
- Future work: resource contention.

References
1. R. Ross, R. Latham, M. Unangst, and B. Welch. Parallel I/O in Practice. Tutorial at the ACM/IEEE Supercomputing Conference, 2009.
2. J. F. O'Callaghan and D. M. Mark. The Extraction of Drainage Networks from Digital Elevation Data. Computer Vision, Graphics, and Image Processing, 28:323-344, 1984.
3. J. Piernas, J. Nieplocha, and E. J. Felix. Evaluation of Active Storage Strategies for the Lustre Parallel File System. In Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, 2007.
4. E. J. Felix, K. Fox, K. Regimbal, and J. Nieplocha. Active Storage Processing in a Parallel File System. In 6th LCI International Conference on Linux Clusters: The HPC Revolution, Chapel Hill, North Carolina, 2005.
5. S. W. Son, S. Lang, P. Carns, R. Ross, and R. Thakur. Enabling Active Storage on Parallel I/O Software Stacks. In 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2010.
etc.

Thank you