Performance Model Directed Data Sieving
for High Performance I/O
DISCL group
Yin Lu
Performance Model Directed Data Sieving for High Performance I/O
Outline
§  Introduction
§  Performance Model Directed Data Sieving
§  Evaluation
§  Conclusion & Future work
Introduction
§  Large-scale scientific computing involves highly data-intensive I/O.
[Figures: SciDAC climate studies visualization at ORNL; SciDAC astrophysics simulation visualization at ORNL]
Introduction
§  Data requirements of representative INCITE applications run at Argonne National Laboratory.
(INCITE: Innovative and Novel Computational Impact on Theory and Experiment Program)

Application                                            On-Line Data   Off-Line Data
FLASH: Buoyancy-Driven Turbulent Nuclear Burning       75 TB          300 TB
Reactor Core Hydrodynamics                             2 TB           5 TB
Computational Nuclear Structure                        4 TB           40 TB
Computational Protein Structure                        1 TB           2 TB
Performance Evaluation and Analysis                    1 TB           1 TB
Kinetics and Thermodynamics of Metal and Complex
  Hydride Nanoparticles                                5 TB           100 TB
Climate Science                                        10 TB          345 TB
Parkinson's Disease                                    2.5 TB         50 TB
Plasma Microturbulence                                 2 TB           10 TB
Lattice QCD                                            1 TB           44 TB
Thermal Striping in Sodium Cooled Reactors             4 TB           8 TB
Gating Mechanisms of Membrane Proteins                 10 TB          10 TB
Introduction
§  Poor performance when dealing with a large number of small and noncontiguous data requests.
[Figure: compute nodes issuing requests over the network to a metadata server and multiple storage servers]
Introduction
§  Structured data leads naturally to noncontiguous I/O
§  Noncontiguous I/O has three forms
•  Noncontiguous in memory, noncontiguous in file, or noncontiguous in both
[Figure: a large array distributed among 16 processes (P0 to P15); each square represents a subarray in the memory of a single process, and the access pattern in the file interleaves the processes' subarrays]
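To make these access forms concrete, here is a minimal MPI-IO sketch of a noncontiguous-in-file read, in the spirit of the subarray pattern above (illustrative only; the file name, block size, and counts are made up):

/* compile with mpicc; each rank reads 4 strided blocks of 64 ints */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* file type: blocks of 64 ints, strided so the ranks interleave */
    MPI_Datatype filetype;
    MPI_Type_vector(4, 64, 64 * nprocs, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.dat",      /* example file */
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    /* each rank's view starts at its own block offset */
    MPI_File_set_view(fh, (MPI_Offset)rank * 64 * sizeof(int),
                      MPI_INT, filetype, "native", MPI_INFO_NULL);

    int buf[4 * 64];   /* contiguous in memory, noncontiguous in file */
    MPI_File_read_all(fh, buf, 4 * 64, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}

Collective, strided accesses like this are what ROMIO's data sieving and two-phase optimizations (next slides) are designed to handle.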
Introduction
§  ROMIO addresses noncontiguous I/O effectively on parallel file systems.
•  The most popular MPI-IO implementation.
•  MPI: a standardized and portable message-passing system used to program parallel computers.
•  MPI-IO: a standard interface for parallel I/O.
•  Its layered implementation supports many storage types:
–  Local file systems (e.g., XFS)
–  Parallel file systems (e.g., PVFS2)
–  NFS and remote I/O (RFS)
–  The UFS implementation works for most other file systems (e.g., GPFS and Lustre)
•  Includes the data sieving and two-phase optimizations.
[Figure: ROMIO's layered architecture: MPI-IO Interface, Common Functionality, ADIO Interface, and file-system implementations (PVFS, XFS, UFS, NFS)]
Introduction
§  Data sieving combines small and noncontiguous I/O requests into a large and contiguous request to reduce the effect of the high I/O latency caused by noncontiguous access patterns (a read-side sketch follows).
§  Data sieving write operations follow the same idea as a read-modify-write: read the covering extent, update the requested portions in the buffer, and write the extent back.
[Figure: data sieving write operations]
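To make the idea concrete, here is a minimal read-side sketch of data sieving in C (an illustration, not ROMIO's code; the file name and request list are made up):

/* instead of one pread() per small request, read the single contiguous
 * extent covering all requests once, then copy the pieces out of it */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* hypothetical noncontiguous request list: (offset, length) pairs */
    off_t  off[] = { 0, 4096, 12288 };
    size_t len[] = { 512, 512, 512 };
    int    nreq  = 3;

    int fd = open("data.bin", O_RDONLY);     /* example input file */
    if (fd < 0) { perror("open"); return 1; }

    /* one contiguous read from the first to the last requested byte */
    off_t  first  = off[0];
    off_t  last   = off[nreq - 1] + (off_t)len[nreq - 1];
    size_t extent = (size_t)(last - first);
    char  *buf    = malloc(extent);
    if (pread(fd, buf, extent, first) != (ssize_t)extent) {
        perror("pread");
        return 1;
    }

    /* sieve: keep only the requested pieces, discard the holes */
    for (int i = 0; i < nreq; i++) {
        char *piece = buf + (off[i] - first);
        printf("request %d: %zu bytes at offset %lld\n",
               i, len[i], (long long)off[i]);
        (void)piece;   /* hand `piece` to the consumer here */
    }

    free(buf);
    close(fd);
    return 0;
}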
Introduction
§  Benefits depend heavily on the specific access pattern
•  The existing strategy always combines all the requests to form one large, contiguous request.
•  It lacks a dynamic decision based on different access patterns.
•  The non-requested portions (holes) can be too large for data sieving to be beneficial.
Introduction
§  Potential problem of extensive memory requirements
•  A single contiguous chunk of data, spanning from the first to the last byte requested by the user, is read into the temporary buffer.
•  Although total memory capacity gradually increases in HPC systems, the available memory capacity per core is actually decreasing
–  especially as HPC systems are projected to scale to a million cores or beyond.
Performance Model Directed Data Sieving for High Performance I/O
Outline
§  Introduction
§  Performance Model Directed Data Sieving
§  Evaluation
§  Conclusion & Future work
Performance Model Directed Data Sieving
§  Our work
•  Develop a performance model of the parallel I/O system.
•  A Dynamic Decision component determines when to perform data sieving, on the fly, depending on access patterns.
•  A Requests Grouping component determines how data sieving is performed, based on the performance model and the specific access patterns.
[Figure: PMD data sieving inside the ROMIO stack; the Common Functionality layer hosts the Performance Model, Dynamic Decision, Requests Grouping, and Data Access components, between the MPI-IO Interface and the ADIO Interface (PVFS, XFS, UFS, NFS)]
Performance Model Directed Data Sieving
Basic model
§  For reading a particular block of data, the total time required is

    TRtotal = (start-up time) + (time for system I/O call) + (request size / read bandwidth)

§  Similarly, for writing a particular block of data:

    TWtotal = (start-up time) + (time for system I/O call) + (request size / write bandwidth)

[Figure: a sieved access; the non-requested region between two requests is a hole]
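For illustration only (the numbers below are made up, not measurements from this work): with a start-up time of 0.1 ms, a system I/O call time of 0.05 ms, a request size of 1 MB, and a read bandwidth of 100 MB/s,

    TRtotal = 0.1 ms + 0.05 ms + (1 MB / 100 MB/s) = 0.1 + 0.05 + 10 = 10.15 ms

The fixed per-request terms dominate when requests are small and numerous, which is exactly the situation data sieving targets.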
Performance Model Directed Data Sieving
Extended model
§  The client nodes and storage nodes are separate from each other, and every data access involves network transmission.
§  Data is striped across all the storage nodes in a round-robin fashion.
Performance Model Directed Data Sieving
Extended model
Table 1. Parameters and Descriptions

    Parameter   Description
    p           Number of I/O client processes
    n           Number of storage nodes (file servers)
    te          Time to establish a network connection for a single node
    tt          Network transmission time of one unit of data
    cud         Time to read/write one unit of data
    lqdep       Latency for outstanding I/Os
    sizerd      Read data size of one I/O request
    sizewr      Write data size of one I/O request

Table 2. Formulas for Deriving I/O Performance

    Total time to establish network connections:       te * p
    Total time spent on network transmission:          (tt * sizerd) / n  or  (tt * sizewr) / n
    Total start-up time for I/O operations:            p * (seek time + system I/O call)
    Total time spent on actual data read/write (Trw):  (sizerd * cud) / n  or  (sizewr * cud) / n

Ttotal = Tnetwork + Tstorage + lqdep
       = te * p + (tt * size_{rd/wr}) / n + p * (seek time + system I/O call) + (size_{rd/wr} * cud) / n + lqdep
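The following is a minimal sketch of the extended model as a cost-estimating function (parameter names follow Table 1; the values in main are placeholders for illustration, not the measured values from the evaluation):

/* extended-model cost estimator: a sketch, not the paper's implementation */
#include <stdio.h>

typedef struct {
    int    p;      /* number of I/O client processes          */
    int    n;      /* number of storage nodes (file servers)  */
    double te;     /* connection-establishment time (s)       */
    double tt;     /* network transmission time (s per MB)    */
    double cud;    /* read/write time per unit of data (s/MB) */
    double tseek;  /* seek time per operation (s)             */
    double tsys;   /* system I/O call time (s)                */
    double lqdep;  /* latency for outstanding I/Os (s)        */
} io_model;

/* Ttotal = te*p + tt*size/n + p*(seek + syscall) + size*cud/n + lqdep */
double t_total(const io_model *m, double size_mb)
{
    double t_network = m->te * m->p + m->tt * size_mb / m->n;
    double t_storage = m->p * (m->tseek + m->tsys) + size_mb * m->cud / m->n;
    return t_network + t_storage + m->lqdep;
}

int main(void)
{
    /* placeholder parameter values, chosen only to exercise the formula */
    io_model m = { 8, 4, 0.0003, 1.0 / 120, 1.0 / 120, 0.0005, 0.0001, 0.0 };
    printf("estimated Ttotal for a 64 MB access: %.4f s\n", t_total(&m, 64.0));
    return 0;
}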
Performance Model Directed Data Sieving
Components Design
§  Dynamic Decision Component
•  Input: the hole size (sizeh), the read seek latency, the time for a system I/O call, the read bandwidth, the number of storage nodes (n), the number of I/O client processes (p), the time to establish a network connection for a single node (te), the network transmission time of one unit of data (tt), and the next I/O access size (sizerd).
•  Output: YES or NO. If YES, the data sieving technique is adopted; if NO, the requests are handled as independent I/O requests.

Algorithm 1 (a C sketch follows):
{
    THread = sizeh / (bandwidthread * n);
    TStart = (te * p) + (tt * sizerd) / n + (seek latencyread + time for system I/O call) * p;
    If (TStart > THread)
        Return (YES);
    Else
        Return (NO);
}

[Figure: the Dynamic Decision component inside the PMD data sieving stack (MPI-IO Interface, Common Functionality, ADIO Interface; PVFS, XFS, UFS, NFS)]
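One way Algorithm 1 might look in code (a sketch under the slide's definitions; parameter names and units are assumptions, not ROMIO code):

/* returns true when a separate request would cost more than
 * reading through the hole, i.e. when sieving pays off */
#include <stdbool.h>

bool should_sieve(double size_h,    /* hole size (MB)                    */
                  double bw_read,   /* per-node read bandwidth (MB/s)    */
                  int    n,         /* number of storage nodes           */
                  int    p,         /* number of I/O client processes    */
                  double te,        /* connection setup time (s)         */
                  double tt,        /* transmission time (s per MB)      */
                  double size_rd,   /* next I/O access size (MB)         */
                  double seek_lat,  /* read seek latency (s)             */
                  double t_syscall) /* system I/O call time (s)          */
{
    double th_read = size_h / (bw_read * n);          /* cost of reading the hole   */
    double t_start = te * p + (tt * size_rd) / n
                   + (seek_lat + t_syscall) * p;      /* cost of a separate request */
    return t_start > th_read;
}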
Performance Model Directed Data Sieving
Components Design
§  Requests Grouping Component
•  Input: the list of offsets and the list of lengths of each I/O request.
•  Output: a set of groups containing all I/O requests; within each group, the data sieving technique is applied.

Algorithm 2 (a C sketch follows):
{
    Start from the lowest offset;
    While (the end of the largest offset has not been reached)
    {
        A data request is encountered;
        decision = call Algorithm 1;
        If (decision == NO)
        {
            If (the encountered data request is not in any group)
                Issue it as an independent I/O request;
            Else
                Close that group;
        }
        Else  /* decision == YES */
        {
            Group it with the next consecutive I/O request;
        }
    }
}

[Figure: the Requests Grouping component inside the PMD data sieving stack]
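A compact sketch of the grouping pass (illustrative only; the decision rule is stubbed with a hypothetical threshold instead of the full Algorithm 1, and the request list is made up):

#include <stdbool.h>
#include <stdio.h>

/* stand-in for Algorithm 1: sieve only through holes below a cutoff */
static bool should_sieve_hole(double hole_mb)
{
    return hole_mb < 1.0;   /* hypothetical 1 MB threshold */
}

/* walk requests in offset order; merge consecutive requests while the
 * decision says YES, close the group when it says NO */
static void group_requests(const double *off, const double *len, int nreq)
{
    int group_start = 0;
    for (int i = 0; i < nreq - 1; i++) {
        double hole = off[i + 1] - (off[i] + len[i]);  /* gap to next request */
        if (should_sieve_hole(hole))
            continue;                       /* request i+1 joins the group */
        if (group_start == i)
            printf("request %d: independent I/O\n", i);
        else
            printf("group [%d..%d]: data sieving\n", group_start, i);
        group_start = i + 1;                /* close group, start a new one */
    }
    if (group_start == nreq - 1)            /* flush the final group */
        printf("request %d: independent I/O\n", nreq - 1);
    else
        printf("group [%d..%d]: data sieving\n", group_start, nreq - 1);
}

int main(void)
{
    double off[] = { 0.0, 0.5, 4.0, 4.5 };      /* offsets in MB, made up */
    double len[] = { 0.25, 0.25, 0.25, 0.25 };  /* lengths in MB          */
    group_requests(off, len, 4);
    return 0;
}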
Performance Model Directed Data Sieving for High Performance I/O
Outline
§  Introduction
§  Performance Model Directed Data Sieving
§  Evaluation
§  Conclusion & Future work
Evaluation
§  Experimental Environment
•  One Sun Fire X4240 head node with dual 2.7 GHz quad-core Opteron processors and 8 GB memory; 64 Sun Fire X2200 compute nodes with dual 2.3 GHz quad-core Opteron processors and 8 GB memory, connected with Gigabit Ethernet.
•  Each node is equipped with one solid state drive, model OCZ Technology OCZSSDPX-1RVDX0100 REVO X2 PCIE SSD 100GB MLC.
•  An Ubuntu 4.3.3-5 system with kernel 2.6.28.10, the PVFS 2.8.1 file system, and the MPICH2-1.0.5p3 library manage the storage system and runtime environment.
•  The actual values of the parameters used in the performance model were obtained through measurement on the experimental platform:
–  te: 0.0003 sec
–  tt: 1/120 sec per MB (about 120 MB/s)
–  cud: 1/120 sec per MB (about 120 MB/s)
Evaluation
§  Three synthetic I/O benchmark scenarios drawn from real application kernels
•  All requests, and the holes among them, have different sizes
•  Sparse noncontiguous I/O requests, with large holes among the requests
•  Dense noncontiguous I/O requests, with small holes among the requests
[Figure: the three access scenarios]
Evaluation
§  Experimental Results on Single Node
[Figures: execution time of the three strategies; memory requirement of the three strategies; speedup ratio of the two strategies]
Evaluation
§  Experimental Results on Multiple Nodes
[Figures: execution time for access scenarios 1, 2, and 3, first with a fixed number of storage nodes, then with a fixed number of I/O client processes]
Performance Model Directed Data Sieving for High Performance I/O
Outline
§  Introduction
§  Performance Model Directed Data Sieving
§  Evaluation
§  Conclusion & Future work
Conclusion
§  Data sieving remains a critical approach for improving the performance of small and noncontiguous accesses in data-intensive applications.
§  The existing data sieving strategy is static and suffers from heavy memory pressure.
§  The proposed performance model directed (PMD) data sieving approach is essentially a heuristic data sieving approach directed by the estimates of a performance model.
§  Experiments were performed on a cluster to evaluate the benefit of the PMD approach:
•  PMD outperforms both the direct method and the current data sieving approach in terms of execution time.
•  PMD also reduces the memory requirement considerably compared with conventional data sieving.
Future work
§  Study the performance model more rigorously by including additional parameters
§  Integrate PMD data sieving with hybrid storage media
Questions?
Backup