UPC-IO Reference Implementation
Beta Version 2.0 – June 2006
Yiyi Yao, Kun Xi, Tarek El-Ghazawi
{yyy, kunxi, tarek}@gwu.edu
High Performance Computing Lab
The George Washington University
Rajeev Thakur
thakur@mcs.anl.gov
Mathematics and Computer Science Division
Argonne National Laboratory
Brief UPC-IO Background
UPC-IO is defined as the standard I/O library of the UPC programming language. The
UPC-IO specification V1.0 [1] was first proposed to the UPC language consortium in
2004 and has since been merged into the latest UPC language specification, V1.2 [2], as
an appendix. The current UPC-IO specification defines 26 functions, covering basic
operations such as open/close and file read/write, advanced functions such as list-IO,
and a set of miscellaneous operations. For more information on the UPC-IO specification,
its function definitions, or UPC in general, please refer to the official UPC website
http://upc.gwu.edu.
UPC-IO Reference Implementation History
Version        Time         Features
UPC-IO v1.0    May 2005     MPI-IO based
UPC-IO v2.0    June 2006    Standalone UPC implementation
Table 1. UPC-IO reference implementation history
The UPC-IO reference implementation V1.x [3] was released in May 2005. It is built on
top of MPI-IO for portability, so in principle any platform with MPI-IO installed is
supported. However, the UPC implementation and the MPI-based UPC-IO implementation
must be kept consistent: the user must make sure that the UPC compiler and run-time
system interoperate correctly with the MPI layer. For example, the value of MYTHREAD
must map to the MPI rank, and THREADS must equal the number of MPI processes. This
requirement for an MPI-compliant UPC compiler is the main limitation of V1.x, and it
motivated a new UPC-IO implementation built on a pure UPC environment.
The UPC-IO reference implementation V2.0 [4] was completed in June 2006. This
version is built on a pure UPC language environment. To isolate the high-level parallel
language constructs from the low-level I/O facilities, it introduces an Abstract Device
(AD) layer [5]. The AD layer is designed so that file system providers do not need to be
aware of UPC-specific language features. As a result, it is relatively easy to port this
version of the UPC-IO reference implementation to systems with new parallel file
systems.
Both reference implementations are open-source, portable, library-based language
extensions that implement the complete UPC-IO appendix of the latest UPC language
specification, v1.2.
What is NEW in UPC-IO Reference Implementation V2.0
The new UPC-IO Reference Implementation targets both portability and high
performance. To this end it introduces an Abstract Device (AD) layer, which isolates the
high-level, UPC-specific parallel language features from the low-level file system and
I/O operations. For a complete description of each AD function, please refer to the
Abstract Device API definition [5].
Figure 1a. V1.0 Architecture (top to bottom): UPC-IO / MPI-IO / File Systems
Figure 1b. V2.0 Architecture (top to bottom): UPC-IO / UPC-AD Layer / File Systems
Figure 1. Software Architectures of the UPC-IO reference implementations
Figure 1 depicts the software architectures of the UPC-IO reference implementations
V1.x and V2.0. The major difference is that, instead of mapping each UPC-IO function
call onto its equivalent MPI-IO call as V1.0 does, V2.0 uses the newly introduced AD
layer to translate UPC-IO requests directly into low-level file system operations. The AD
layer is written in pure UPC, so the new implementation no longer depends on any
non-UPC package such as MPI-IO. Because it can talk directly to the file system's native
calls or to POSIX calls, the AD layer can potentially provide better performance.
UPC-IO Reference Implementation V2.0 Details
The goal of the new UPC-IO reference implementation is to be universally portable while
exploiting the performance advantages of emerging parallel file systems such as Lustre
and PVFS2. The core component of the UPC-IO Reference Implementation V2.0 library
is the AD layer, which divides the implementation into two parts. The upper part, called
the UPC AD layer, handles all UPC- and UPC-IO-specific features, such as shared
memory, individual and common file pointers, and list-IO requests. The UPC AD layer
translates these features into a set of generalized requests on the Abstract Device API,
which serves as the interface between the upper UPC AD layer and the lower File
System layer. The design of the Abstract Device API was initially inspired by the
ROMIO [6] implementation, whose ADIO layer makes ROMIO portable across a number
of file systems. The API is defined to be file system and POSIX friendly. To extend
UPC-IO to a new file system, only the lower File System layer needs to be implemented,
which is much easier than reimplementing all of the UPC/UPC-IO features from scratch.
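The following C sketch illustrates the layering idea under stated assumptions: the lower File System layer is modeled as a table of function pointers (similar in spirit to ROMIO's ADIO), and a POSIX driver fills that table with thin wrappers around `open`, `pread`, `pwrite`, and `close`. The names `ad_fs_ops`, `posix_ops`, `ad_write_at`, and `ad_read_at` are hypothetical and do not come from the actual AD API definition [5]; the real layer is written in UPC, and plain C is used here only for illustration.

```c
#define _XOPEN_SOURCE 700   /* expose pread/pwrite in <unistd.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lower-layer interface: one table per file-system driver. */
struct ad_fs_ops {
    const char *name;
    int     (*fs_open)(const char *path, int flags);
    ssize_t (*fs_read_at)(int fd, void *buf, size_t len, off_t off);
    ssize_t (*fs_write_at)(int fd, const void *buf, size_t len, off_t off);
    int     (*fs_close)(int fd);
};

/* POSIX driver: each entry is a thin wrapper around one POSIX call. */
static int posix_open(const char *path, int flags) {
    return open(path, flags, 0644);   /* mode is only used with O_CREAT */
}
static ssize_t posix_read_at(int fd, void *buf, size_t len, off_t off) {
    return pread(fd, buf, len, off);
}
static ssize_t posix_write_at(int fd, const void *buf, size_t len, off_t off) {
    return pwrite(fd, buf, len, off);
}
static int posix_close(int fd) {
    return close(fd);
}

static const struct ad_fs_ops posix_ops = {
    "posix", posix_open, posix_read_at, posix_write_at, posix_close
};

/* Hypothetical upper-layer helpers: they see only the table, never POSIX. */
static ssize_t ad_write_at(const struct ad_fs_ops *ops, int fd,
                           const void *buf, size_t len, off_t off) {
    assert(ops != NULL && ops->fs_write_at != NULL);
    return ops->fs_write_at(fd, buf, len, off);
}
static ssize_t ad_read_at(const struct ad_fs_ops *ops, int fd,
                          void *buf, size_t len, off_t off) {
    assert(ops != NULL && ops->fs_read_at != NULL);
    return ops->fs_read_at(fd, buf, len, off);
}
```

Under this model, porting to another file system would mean supplying a second table (for example, one backed by native PVFS2 calls) while the upper-layer helpers stay unchanged.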
Along with the UPC-IO reference implementation V2.0, a standard POSIX-based lower
File System layer is released. Since most new high-performance parallel file systems,
e.g. Lustre and PVFS2, support POSIX as their standard programming interface, this
version should be able to benefit from the performance these file systems offer.
Figure 2. UPC-IO reference implementation V2.0 architecture
Figure 2 gives a layered view of the current V2.0 architecture. The UPC-IO extension
library contains two separate layers: the UPC AD layer and the POSIX File System
interface. All UPC-IO function calls in UPC code are handled by the UPC AD layer and
mapped onto AD API calls. These API calls are in turn translated into POSIX I/O
function calls in the File System layer. Finally, all I/O requests are directed to the
various file systems supported by the OS kernel.
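As an illustration of how a single UPC-IO request can become several POSIX calls, the sketch below assumes that a list-IO write has already been reduced by the upper layer to a contiguous memory buffer plus a list of file regions. The `file_chunk` type and the `list_write` function are hypothetical, not part of UPC-IO or the AD API; each region becomes one or more `pwrite` calls at an explicit offset.

```c
#define _XOPEN_SOURCE 700   /* expose pwrite in <unistd.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one contiguous region of the file. */
struct file_chunk {
    off_t  offset;   /* where the region starts in the file */
    size_t length;   /* number of bytes in the region */
};

/* Scatter a contiguous memory buffer into a list of file regions.
 * The buffer is consumed in order: chunk 0 receives the first
 * chunks[0].length bytes, chunk 1 the following bytes, and so on.
 * Returns the total number of bytes written, or -1 on error. */
static ssize_t list_write(int fd, const char *buf,
                          const struct file_chunk *chunks, size_t nchunks) {
    size_t consumed = 0;                        /* bytes of buf used so far */
    for (size_t i = 0; i < nchunks; i++) {
        size_t done = 0;
        while (done < chunks[i].length) {       /* pwrite may be partial */
            ssize_t n = pwrite(fd, buf + consumed + done,
                               chunks[i].length - done,
                               chunks[i].offset + (off_t)done);
            if (n <= 0)
                return -1;
            done += (size_t)n;
        }
        consumed += done;
    }
    return (ssize_t)consumed;
}
```

Because `pwrite` takes an explicit offset and leaves the descriptor's own file position untouched, this is one way an upper layer could keep individual and common file pointers under its own control.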
In Figure 2, the File System layer is labeled POSIX because in the current V2.0
implementation all AD API calls are implemented with the POSIX I/O interface. Each
participating UPC thread (process) holds a dedicated file handle for its own file
operations, so the actual I/O work is carried out in multi-process mode.
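The per-process handle model can be sketched in C with `fork`: each child process stands in for a UPC thread, opens its own descriptor on the shared file, and writes one block at offset `rank * block_size`. The helper names `write_my_block` and `parallel_block_write` and the letter fill pattern are assumptions made for this example; in an actual UPC program the processes are UPC threads launched by the UPC run-time, not children created with `fork`.

```c
#define _XOPEN_SOURCE 700   /* expose pwrite in <unistd.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One worker: opens its OWN descriptor on the shared file and writes
 * block_size bytes of a rank-specific letter at offset rank * block_size. */
static int write_my_block(const char *path, int rank, size_t block_size) {
    char buf[256];
    if (block_size > sizeof buf)
        return -1;
    for (size_t i = 0; i < block_size; i++)
        buf[i] = (char)('A' + rank);          /* 'A' for rank 0, 'B' for rank 1, ... */
    int fd = open(path, O_WRONLY);            /* private handle opened after fork */
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, buf, block_size, (off_t)rank * (off_t)block_size);
    close(fd);
    return n == (ssize_t)block_size ? 0 : -1;
}

/* Create the file once, fork nprocs workers (stand-ins for UPC threads),
 * and wait for all of them. Returns 0 only if every worker succeeded. */
static int parallel_block_write(const char *path, int nprocs, size_t block_size) {
    assert(nprocs > 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    int spawned = 0, rc = 0;
    for (int rank = 0; rank < nprocs; rank++) {
        pid_t pid = fork();
        if (pid < 0) {           /* stop spawning, but still reap earlier children */
            rc = -1;
            break;
        }
        if (pid == 0)            /* child: do one block of I/O and exit */
            _exit(write_my_block(path, rank, block_size) == 0 ? 0 : 1);
        spawned++;
    }
    for (int i = 0; i < spawned; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    }
    return rc;
}
```

Since every worker writes a disjoint byte range through its own descriptor, no coordination between the processes is needed beyond waiting for them to finish.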
The AD API can also be implemented using a file system's native calls (e.g. PVFS2
native function calls), which are expected to provide the best performance. Fortunately,
the POSIX I/O interface is supported by many newly emerging high-performance parallel
file systems; POSIX is even the recommended programming interface for several of them,
such as Lustre and PVFS2.
To make the non-blocking AD API calls truly asynchronous, a dedicated I/O thread is
spawned on each participating UPC thread (process) using the POSIX thread library. The
main thread can therefore return as soon as the I/O thread has been created. The
wait/test scheme is implemented with pthread_join and signal-kill techniques.
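A minimal sketch of the start/test/wait pattern, assuming a single positioned write per request: `async_write_start` spawns an I/O thread with `pthread_create` and returns immediately, `async_write_wait` blocks in `pthread_join`, and `async_write_test` polls for completion. Note that the test step here substitutes a C11 atomic completion flag for the signal-kill technique described above, because whether `pthread_kill(thread, 0)` reports a thread that has exited but not yet been joined is not portable. All names are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose pwrite in <unistd.h> */
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request object for one non-blocking positioned write. */
struct async_write {
    int         fd;
    const void *buf;
    size_t      len;
    off_t       off;
    ssize_t     result;   /* filled in by the I/O thread */
    atomic_int  done;     /* set to 1 when the I/O thread has finished */
    pthread_t   tid;
};

/* Body of the dedicated I/O thread: do the blocking call, then publish. */
static void *io_thread_main(void *arg) {
    struct async_write *req = arg;
    req->result = pwrite(req->fd, req->buf, req->len, req->off);
    atomic_store(&req->done, 1);       /* result is visible once done == 1 */
    return NULL;
}

/* Start: returns as soon as the I/O thread has been created (0 on success). */
static int async_write_start(struct async_write *req) {
    atomic_init(&req->done, 0);
    return pthread_create(&req->tid, NULL, io_thread_main, req);
}

/* Test: non-blocking completion check via the flag (not a signal probe). */
static int async_write_test(struct async_write *req) {
    return atomic_load(&req->done);
}

/* Wait: block until the I/O thread exits, then return its result. */
static ssize_t async_write_wait(struct async_write *req) {
    if (pthread_join(req->tid, NULL) != 0)
        return -1;
    return req->result;
}
```

Even after `async_write_test` reports completion, the caller still calls `async_write_wait` so that the I/O thread is joined and its resources are released.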
This implementation uses pure UPC instead of sitting on top of MPI-IO as V1.0 did, so it
should work with every existing UPC compiler. Thanks to the AD layer, support for other
file systems can be added relatively easily in the future.
Testing
This implementation is distributed with a set of quick compiler test cases that can be used
to verify the correctness of the UPC-IO reference implementation V2.0 in the user's or
developer's UPC environment. Additional tests can be obtained from LLNL /
UC Berkeley.
The High Performance Computing Lab at the George Washington University also
provides the official UPC-IO test suites, which can be downloaded from
http://upc.gwu.edu.
License Note
This is an open-source project. Anyone can use, modify, or redistribute this software.
This file and its license notes MUST be kept in the project. This software is provided
without any warranty. We hope it is useful and helpful, but we are NOT responsible for
any damage caused by using it. Users should decide whether this is the right software for
their system.
Acknowledgements
We want to thank everyone who helped us during the development of this UPC-IO
library.
Special thanks to Dan Bonachea of the Berkeley UPC group, who gave us great help
while we were developing this UPC-IO library.
References
[1] UPC-IO specification v1.0
[2] UPC language specification v1.2
[3] UPC-IO reference implementation v1.0
[4] UPC-IO reference implementation v2.0
[5] UPC Abstract Device API definition
[6] MPICH ROMIO implementation