Conclusion of the Postdoctoral Program on SSI-OSCAR

September, 2005
Geoffroy Vallee
Introduction
From March 2004 to September 2005, EDF R&D, INRIA and ORNL carried out a collaboration on software infrastructure for clustering. Its main goal was to integrate Kerrighed, the Single System Image (SSI) developed by INRIA and EDF R&D, into the OSCAR cluster toolkit.
This collaboration funded a postdoctoral position, based at IRISA, France, for the first five months and then at ORNL, Oak Ridge, USA.
Section 1 presents the initial objectives, Section 2 the studies and prototypes carried out during the postdoc, and Section 3 the scientific papers and talks given by the postdoc.
Initial Objectives
EDF R&D, INRIA (Institut National de Recherche en Informatique et en Automatique), more specifically the INRIA research unit in Rennes at IRISA (Institut de Recherche en Informatique et Systèmes Aléatoires), and Oak Ridge National Laboratory (ORNL) initiated a two-year research collaboration on Cluster Computing. This collaboration focuses on three strategic objectives:
SO1: evaluation of the OSCAR consortium and of the OSCAR toolbox by EDF R&D and INRIA;
SO2: preparation of the decision by INRIA and EDF R&D on whether to join the OSCAR consortium and, if they decide to join, definition with the consortium of the role INRIA and EDF R&D will play;
SO3: initiation of a more general, long-term collaboration between INRIA, EDF R&D and ORNL on High Performance Computing.
Two technical objectives:
TO1: evaluation of the Kerrighed system by ORNL;
TO2: integration of the Kerrighed system into the OSCAR toolbox.
One industrial objective:
IO1: adapting the OSCAR toolbox to industrial needs and making it available on commercial clusters.
A two-year postdoctoral position, funded by INRIA and EDF R&D, was created within this collaboration. Occasional exchanges of researchers between the three organizations were also organized. Initially, the postdoc, Geoffroy Vallée, was to join the team of Stephen Scott at ORNL during the first year and the team of Christine Morin at IRISA/INRIA during the second year. The first three months of the collaboration were to be dedicated to the first two strategic objectives. The two technical objectives were to be achieved by the end of the first year, and the industrial objective by the end of the second year, combining the efforts of G. Vallée, researchers from ORNL and INRIA, and engineers from EDF R&D.
The collaboration was also seen as an opportunity to build a more general, long-term collaboration between INRIA, EDF R&D and ORNL on High Performance Computing.
Work Done
Schedule
Because of visa issues, the organization of the postdoc was modified. Geoffroy Vallée spent the first five months at IRISA while waiting for his visa to be issued. He arrived at ORNL in August 2004 and stayed there until the end of the postdoc program in September 2005. He also spent a month in France (December 2004 to January 2005) working with IRISA researchers and EDF engineers.
SSI-OSCAR
The postdoc program led to the creation of the SSI-OSCAR software. SSI-OSCAR aims to make clusters both easy to use, by providing a Single System Image (SSI), and easy to administer, by relying on OSCAR, a distribution for high-performance computing on clusters. To this end, the Kerrighed SSI has been integrated into the OSCAR distribution.
The first version of SSI-OSCAR was a complete spin-off suite, because it required significant modifications to OSCAR. These modifications were needed because of limitations in both OSCAR and Kerrighed: for example, OSCAR could not easily change the kernel installed on compute nodes, and since Kerrighed is an extension of the Linux kernel, it could not be deployed on compute nodes without modifying OSCAR.
SSI-OSCAR 1.0 was released in November 2004 and announced at SuperComputing'04. This version was based on OSCAR 3.0 for RedHat 9.0 and on Kerrighed 1.0 release candidate 8. It was provided as an alternative OSCAR suite, with significant modifications to support the Kerrighed kernel.
SSI-OSCAR 2.0 was released in March 2005. This version was based on Kerrighed 1.0.0 and on OSCAR 4.0 for RedHat 9.0 and Fedora Core 2. It was still released as an alternative OSCAR suite; its new features were support for the new OSCAR version and the new Kerrighed version.
SSI-OSCAR 3.0 was released in May 2005. This version introduced a new architecture: SSI-OSCAR became a "spin-off" OSCAR package that can be downloaded and installed with the OSCAR Package Downloader (OPD). This version provides Kerrighed 1.0.0 (the Kerrighed kernel package was modified to include a large set of drivers and to support initrd images) and is based on OSCAR 4.1 for RedHat 9 and Fedora Core 2. It was announced at the OSCAR symposium.
SSI-OSCAR 3.1 was released in May 2005. It includes Kerrighed 1.0.2 and integrates the Kerrighed tests into the OSCAR tests: with this version, the Kerrighed installation can be tested from the OSCAR GUI, and Kerrighed is automatically started at the end of the cluster installation. This version is based on OSCAR 4.1 for RedHat 9 and Fedora Core 2.
OSCAR Package
OSCAR is based on binary packages, so the first step in integrating Kerrighed into OSCAR is to create binary packages for Kerrighed. Since RedHat 9 and Fedora Core 2 are the two popular Linux distributions supported by OSCAR, packages were created for both. Both distributions use RPM packages, so some information can be shared between their packages. Nevertheless, each distribution needs specific parameters (e.g. the compiler version to use). Therefore, to ease package creation, a framework was developed to automatically create Kerrighed binary packages from the Linux and Kerrighed sources.
This framework, named Kpackager, centralizes the components common to all binary packages and eases the management of distribution-specific parameters. As a result, the package maintainer can create Kerrighed packages with a simple make command. The advantages of this solution are that it:
● simplifies the management of the files needed to create a package. Creating binary packages quickly becomes complex because of the large set of files (e.g. patches, configuration) and information (e.g. source locations) to manage.
● eases the creation of packages for a new Kerrighed version: the package maintainer only has to update the version-specific information (such as the Kerrighed patch).
● eases the support of new distributions that use the same binary package format: the files for one Linux distribution can serve as a starting point for another.
The current Kpackager version creates packages for Kerrighed 1.0.2 for RedHat 9 and Fedora Core 2. Each distribution has its own configuration files to specialize the kernel for that distribution (e.g. a different compiler or a different configuration of kernel options).
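The layering described above (components shared by all packages, plus per-distribution parameters such as the compiler or kernel options) can be illustrated with a minimal Python sketch. It is hypothetical: Kpackager itself is driven by make, and the parameter names, placeholder values and helper functions below are illustrative, not taken from Kpackager. For illustration only, we also assume the merged values are handed to rpmbuild as `--define` macros for a spec file.

```python
import shlex

# Hypothetical sketch: names and values are placeholders, not Kpackager's
# actual configuration files.

# Parameters shared by every RPM-based target.
COMMON = {
    "kerrighed_version": "1.0.2",
    "package_format": "rpm",
}

# Distribution-specific parameters (e.g. compiler, kernel options file).
DISTROS = {
    "redhat-9": {"compiler": "gcc-rh9", "kernel_config": "config-rh9"},
    "fedora-core-2": {"compiler": "gcc-fc2", "kernel_config": "config-fc2"},
}


def build_params(distro: str) -> dict[str, str]:
    """Merge the common parameters with one distribution's parameters."""
    if distro not in DISTROS:
        raise ValueError(f"unsupported distribution: {distro}")
    # Later entries win, so a distribution can override a common default.
    return {**COMMON, **DISTROS[distro], "distro": distro}


def rpmbuild_command(distro: str, spec_file: str) -> list[str]:
    """Express the merged parameters as rpmbuild --define macros (binary build only)."""
    cmd = ["rpmbuild", "-bb"]
    for name, value in sorted(build_params(distro).items()):
        cmd += ["--define", f"{name} {value}"]
    cmd.append(spec_file)
    return cmd


if __name__ == "__main__":
    # Print (do not run) one command per supported distribution.
    for distro in DISTROS:
        print(shlex.join(rpmbuild_command(distro, "kerrighed.spec")))
```

Adding a distribution in this sketch only means adding one entry to `DISTROS`; the common values are written once, which mirrors the maintenance benefit listed above.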
Kpackager was added to the CVS repository of the SSI-OSCAR project (CVS server on lievre.irisa.fr, project kpackager) to ease the technical transfer to the next team that will manage the project.
Future Work
The current versions of Kpackager and SSI-OSCAR are based on RedHat 9 and Fedora Core 2. These two Linux distributions are no longer the most popular distributions supported by OSCAR. The next versions of Kpackager and SSI-OSCAR may be based on CentOS 4 (a clone of the commercial Red Hat Enterprise Linux distribution) and Fedora Core 3.
The management of Kerrighed patches in Kpackager could also be improved: patches are currently duplicated for each Linux distribution.
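One possible way to remove the patch duplication mentioned above is to store the patches common to every distribution once and keep only the extra patches per distribution. The Python sketch below is hypothetical: the function names and patch file names are illustrative, not Kpackager code, and it assumes that shared patches can always be applied before distribution-specific ones, which a real kernel patch series would need to verify.

```python
# Hypothetical sketch: patch file names and helpers are illustrative, not Kpackager code.
# Assumption: shared patches can be applied before distribution-specific ones.


def factor_patches(
    per_distro: dict[str, list[str]],
) -> tuple[list[str], dict[str, list[str]]]:
    """Split per-distribution patch lists into one shared list plus per-distribution extras."""
    if not per_distro:
        return [], {}
    lists = list(per_distro.values())
    # A patch is shared if it appears in every distribution's list.
    shared_set = set(lists[0]).intersection(*lists[1:])
    # Keep the shared patches in the order they appear for the first distribution.
    shared = [p for p in lists[0] if p in shared_set]
    extras = {d: [p for p in ps if p not in shared_set] for d, ps in per_distro.items()}
    return shared, extras


def patch_series(shared: list[str], extras: dict[str, list[str]], distro: str) -> list[str]:
    """Rebuild the full series for one distribution: shared patches first, then its own."""
    return shared + extras.get(distro, [])


if __name__ == "__main__":
    duplicated = {
        "redhat-9": ["kerrighed-1.0.2.patch", "rh9-build.patch"],
        "fedora-core-2": ["kerrighed-1.0.2.patch", "fc2-build.patch"],
    }
    shared, extras = factor_patches(duplicated)
    print(shared)  # ['kerrighed-1.0.2.patch']
    print(patch_series(shared, extras, "fedora-core-2"))
```

With this layout, a new Kerrighed version would only require updating the shared patch once instead of once per distribution.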
Kerrighed Port on the 2.4.29 Kernel
At the beginning of this program, Kerrighed was based on the 2.4.24 kernel. Since this version was old compared to the kernels supported by Linux distributions, some issues appeared during the development of SSI-OSCAR.
Therefore, Kerrighed was ported to the 2.4.29 kernel, the most recent 2.4 kernel available at the time of the port. This port made it possible to:
● fix some kernel issues caused by the differences between the kernels supported by Linux distributions and the Kerrighed kernel;
● update Kerrighed to the most recent 2.4 kernel;
● initiate the x86_64 port, this architecture not being adequately supported by the 2.4.24 kernel.
Initiation of the x86_64 Port
The port of Kerrighed to x86_64 machines was made possible by the port to the 2.4.29 kernel, whose x86_64 support is better than that of the 2.4.24 kernel initially used by Kerrighed. Kerrighed can now be compiled on x86_64 machines, but the port has not been validated.
OSCAR on Debian
EDF clusters are based on the Debian Linux distribution. OSCAR 3.0 (the official stable version of OSCAR at the beginning of the collaboration program) only supported RPM-based Linux distributions, although a basic framework was available to ease extension to new binary package formats.
A first port to Debian was created and announced at the OSCAR symposium (May 2005). This experimental version led to a summer student project funded by Google through its Summer of Code program, which fixed issues in the initial version. The current version is mature enough to be integrated into the development repository of the OSCAR project, and its integration into OSCAR 5 is planned.
SSI-Related Work at ORNL
ORNL is involved in two FastOS projects: PetascaleSSI and Molar. PetascaleSSI aims at creating a petascale SSI. Molar (MOdular Linux and Adaptive Runtime support for HEC OS/R research) aims at adaptive, reliable, and efficient operating and runtime system solutions for ultra-scale, high-end scientific computing on the next generation of supercomputers. SSI features (e.g. process checkpoint/restart and process migration) were studied in this context, and SSI-OSCAR may be used for some specific aspects of these projects. As part of these studies of large-scale systems, G. Vallée also studied a cluster virtualization solution for kernel research and testing, and a paper based on this work has been written.
Conferences and Talks
● The COSET-1 workshop, June 2004: presentation of the SSI-OSCAR project and chair of the session Cluster Single System Image Operating Systems.
● The Cluster 2004 IEEE International Conference: presentation of the paper:
@InProceedings{morlotvalgalmarbersch04cluster,
  Author = {Christine Morin and Renaud Lottiaux and Geoffroy Vallée and Pascal Gallard and David Margery and Jean-Yves Berthou and Isaac D. Scherson},
  Title = {Kerrighed and Data Parallelism: Cluster Computing on Single System Image Operating Systems},
  Booktitle = {The 2004 IEEE International Conference on Cluster Computing},
  Address = {San Diego, California, USA},
  Pages = {20--23},
  Month = sep,
  Year = 2004
}
● Presentation at ETSU (Johnson City, Tennessee, USA), September 2004: The Kerrighed Operating System: a Single System Image for Clusters.
● Presentation at the University of Tennessee (Knoxville, Tennessee, USA), September 2004: The Kerrighed Operating System: a Single System Image for Clusters.
● The HAPCW workshop, October 2004: presentation of the paper:
@InProceedings{valbergallotmarmor04hapcw,
  Author = {Geoffroy Vallée and Jean-Yves Berthou and Pascal Gallard and Renaud Lottiaux and David Margery and Christine Morin},
  Title = {Kerrighed: a Single System Image Providing High Availability Capabilities to Applications},
  Booktitle = {HAPCW'04: High Availability and Performance Computing Workshop},
  Organization = {Held in conjunction with LACSI 2004},
  Month = oct,
  Year = 2004
}
● Talk at ORNL, October 2004: Kerrighed: a Single System Image for Clusters.
● SuperComputing OSCAR BOF, November 2004: presentation of the SSI-OSCAR project.
● The CCGRID 2005 IEEE International Conference: presentation of the paper:
@InProceedings{lotboigalvalmor05ccgrid,
  Author = {Renaud Lottiaux and Benoit Boissinot and Pascal Gallard and Geoffroy Vallée and Christine Morin},
  Title = {OpenMosix, OpenSSI and Kerrighed: A Comparative Study},
  Booktitle = {Cluster Computing and Grid 2005 (CCGRID 2005)},
  Address = {Cardiff, Wales, UK},
  Month = may,
  Year = 2005
}
● The DSM 2005 workshop (held in conjunction with CCGRID 2005): session co-chair.
● OSCAR Symposium, May 2005: member of the program committee and presentation of the papers:
@InProceedings{valberprilep05oscar,
  Author = {Geoffroy Vallée and Jean-Yves Berthou and Hugues Prisker and Daniel Leprince},
  Title = {OSCAR on Debian: the EDF Experience},
  Booktitle = {The 3rd Annual OSCAR Symposium},
  Organization = {Held in conjunction with the 19th International Symposium on High Performance Computing Systems and Applications (HPCS 2005)},
  Address = {University of Guelph, Guelph, Ontario, Canada},
  Month = may,
  Year = 2005,
}
@InProceedings{valscomorberpri05oscar,
  Author = {Geoffroy Vallée and Stephen L. Scott and Christine Morin and Jean-Yves Berthou and Hugues Prisker},
  Title = {SSI-OSCAR: a Cluster Distribution for High Performance Computing Using a Single System Image},
  Booktitle = {The 3rd Annual OSCAR Symposium},
  Organization = {Held in conjunction with the 19th International Symposium on High Performance Computing Systems and Applications (HPCS 2005)},
  Address = {University of Guelph, Guelph, Ontario, Canada},
  Month = may,
  Year = 2005,
}
● The COSET-2 workshop, June 2005: member of the program committee.
● The ISPDC 2005 IEEE International Conference, July 2005: presentation of the paper:
@InProceedings{vallotmarmorber05ispdc,
  Author = {Geoffroy Vallée and Renaud Lottiaux and David Margery and Christine Morin and Jean-Yves Berthou},
  Title = {Ghost Process: a Sound Basis to Implement Process Duplication, Migration and Checkpoint/Restart in Linux Clusters},
  Booktitle = {The 4th International Symposium on Parallel and Distributed Computing},
  Address = {Lille, France},
  Month = jul,
  Year = 2005
}