OSG TI: Grid usage of Condor VM Universe
Authors: Brian Bockelman, Ashu Guru
1. Introduction
Amazon EC2 can be credited with two major advances in cyberinfrastructure:

1. The first popular, large-scale implementation of leasing a virtualization-based computing infrastructure (“the cloud”).
2. A transparent “dollars-per-hour” price point for computing infrastructure.

Since its introduction, EC2’s virtualization-based approach has become extremely popular as a basis for cyberinfrastructure, and it is the subject of this investigation. The utility of the transparent price point is the subject of a separate TI.
Traditional university and laboratory computing centers have been presented with
a false dichotomy: should they continue to run a batch system, where the atomic
unit of work is a “job”, or a “cloud”, where the atomic unit of work is a “virtual
machine”? This investigation demonstrates the viability of a hybrid approach –
utilizing the Condor VM universe to run virtual machines within the same
infrastructure (Condor) as regular batch jobs (an ongoing concurrent investigation
looks at the pure-VM approach using OpenStack). We believe there are sufficient
advantages and disadvantages with virtual machines that some sites will never
completely transition to a pure virtualization-based infrastructure, motivating this
hybrid approach.
This approach will also be viable as a transition strategy for OSG stakeholders
(such as ATLAS) who have a stated desire to increasingly virtualize their
infrastructure. Sites can maintain their worker node environment for the
traditional user while allowing advanced users to completely customize their
operating system environment by using virtual machines.
This document summarizes the work done and the findings of this technology investigation, which encompassed the creation, uploading, and execution of VMs on the OSG. The following objectives were reached in this investigation:

- Creation of VMs, with the required specifications, to be deployed via Condor-C using a kickstart file
- Staging of the VM image
- Submitting VM jobs via Condor (grid universe and regular condor_submit)
- Joining a Condor pool from a Condor instance running inside of the launched VM Condor job (Condor inside Condor)

While this work has the Condor-inside-Condor inner instance join the same pool as the outer instance, an actual deployment will likely feature the inner instance joining a per-user Condor pool.
The document is divided into four sections. Section 2 provides background on the technical components used in this study. Section 3 describes the details of the tasks and any issues that were faced during the implementation process. Section 4 contains the summary and recommendations for future work.
2. Background
We combine a few technologies in this work: Condor-C (providing traditional “grid submission” to a remote Condor instance), the Condor VM universe for launching and managing virtual machines, and libvirt/KVM for virtualization.
With Condor-C, a local Condor schedd submits jobs to the queue of a remote Condor schedd, a traditional “work delegation” between two sites. It allows one to manage all the jobs as if they were in the local queue, regardless of where they are actually running. Condor-C was used in this work because it allows easier expression of Condor ClassAds (the VM universe requires many ClassAd tweaks that are not easily achievable with the much more common Globus GRAM). Globus GRAM would have been an acceptable alternative to using Condor-C.
The Condor VM universe is the “job type” used by the remote Condor scheduler.
Instead of specifying a process to launch, the Condor VM universe job will specify a
virtual machine to launch. On the worker node, the Condor startd will interact with
libvirt, a common VM manager. We utilize KVM (Kernel-based Virtual Machine), a
full virtualization solution for Linux on x86 hardware containing virtualization
extensions (Intel VT or AMD-V). KVM supports multiple virtual machines running
unmodified Linux or Windows images. It works with the default kernel in the most
recent versions of RHEL5, meaning the same worker node can run both batch jobs
and virtual machines simultaneously.
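For reference, a VM universe job is described with a submit file of roughly the following form. This is a minimal sketch following the general VM universe submit syntax; the image name, device, and resource values are illustrative and are not taken from this investigation's actual submit files (which are linked in Section 3).

    # Minimal VM universe submit description (illustrative values)
    universe             = vm
    vm_type              = kvm
    vm_memory            = 1024
    vm_networking        = true
    vm_no_output_vm      = true
    # file:device:permission for the disk presented to the guest
    vm_disk              = vm-worker.qcow2:vda:w
    transfer_input_files = vm-worker.qcow2
    log                  = vm_job.log
    queue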
3. Technical Implementation
The following figure shows the outline of the workflow for launching VMs on the
worker node, from when the job arrives on the worker node to when the Condor
instance inside the VM joins the pool.
Figure 1 - Workflow of a VM instance launched as a Condor job:
1. The job arrives on a worker node (URL-based transfer_input_files).
2. The file transfer plugin is invoked; it stages the VM disk image on the worker node.
3. The VM instance is launched by the Condor daemon, and a Condor instance starts inside the VM to join a configured collector.
4. The Condor inside the VM acts as a worker node, providing a consistent execution environment for a job.
Beyond integrating the entire workflow, two technical issues were addressed, as described below.
Creation of VMs to be deployed via Condor-C
The virtual machine images have to be created in a manner that is easily reproducible. We decided to use the “appliance-tools” package created by the Fedora project. To use this package, the basic requirement is to provide a Red Hat “kickstart” file1 (the standard format for describing how to create machines in RHEL) with the configuration of the VM and to host it on an httpd server. Once the disk image is created, it can be tested and further customized using the standard KVM tool virt-install; the web address of the kickstart file is passed to virt-install using the -x flag2. Please refer to the blog entry3 for further details of this task. After several iterations of creating and testing, the disk image was ready to be staged to the cluster.
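As a concrete illustration, the image build can be driven with two commands of roughly the following form. The host names, file names, and install-tree URL below are placeholders, and the exact option spellings may vary with the appliance-tools and virt-install versions in use.

    # Build a disk image from a kickstart file (names and URLs are illustrative)
    appliance-creator -c vm-worker.ks -n vm-worker

    # Test and further customize the image; the web address of the kickstart
    # file is handed to the installer kernel via virt-install's -x flag
    virt-install --name vm-worker-test --ram 1024 \
        --disk path=vm-worker.img \
        --location http://example.host/rhel5/os/x86_64/ \
        -x "ks=http://example.host/kickstart/vm-worker.ks"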
1 http://t2.unl.edu:8094/browser/VMApps/ReportNov2011/2.ExampleKickStartFile.txt
2 http://t2.unl.edu:8094/browser/VMApps/ReportNov2011/1.ReadmeCreateVMImage.txt
3 http://osgtech.blogspot.com/2011/08/kernel-based-virtualization-andcondor.html

Staging of the VM Image
Here, “staging” means the workflow that transfers the VM disk image from a storage location to the Condor-allocated worker node where the VM will be instantiated. The storage could be a Storage Element (SE) accessible via SRM or a Network File System (NFS). By default, Condor transfers the images from the submit host (a severe network bandwidth bottleneck when running many jobs). To accomplish scalable staging of VM disk images, we utilized Condor's custom file transfer plugin features (the ENABLE_URL_TRANSFERS and FILETRANSFER_PLUGINS parameters were enabled). The file transfer plugin written as part of this study can handle staging VM disk images that may be archived and/or compressed. As OSG provides a large amount of scratch space per CPU (10 GB is the default), we found that the VM disk images achieved a high compression ratio (approximately 20x). The plugin additionally creates a copy-on-write (COW) disk image for the running instance of the VM on the worker node. Multiple running virtual machine images on a worker node can thus share a common backing file. Use of COW images reduces the disk space requirement and the network traffic between the storage and the Condor-allocated KVM host.
For further details please see the transfer plugin filesystemplugin.sh4 or the blog
posting5.
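To illustrate the mechanism, a hedged sketch follows. The first snippet shows the worker-node configuration that enables URL transfers and registers a plugin (the plugin path is illustrative); the second outlines the kind of logic such a plugin can use. This is not the actual filesystemplugin.sh referenced above: the URL scheme, cache directory, and image names are assumptions of the sketch.

    # condor_config on the worker node (plugin path is illustrative)
    ENABLE_URL_TRANSFERS = TRUE
    FILETRANSFER_PLUGINS = $(FILETRANSFER_PLUGINS), /usr/libexec/condor/filesystemplugin.sh

    #!/bin/sh
    # Sketch of a URL transfer plugin (not the actual filesystemplugin.sh).
    # Condor invokes the plugin as "plugin <source-url> <destination>", and
    # with "-classad" to ask which URL methods it supports.
    if [ "$1" = "-classad" ]; then
        echo 'PluginType = "FileTransfer"'
        echo 'SupportedMethods = "nfs"'
        exit 0
    fi

    SRC=${1#nfs://}                 # e.g. nfs:///mnt/images/vm-worker.qcow2.gz
    DEST=$2
    CACHE=/var/cache/vmimages
    BACKING=$CACHE/$(basename "$SRC" .gz)

    mkdir -p "$CACHE"
    # Fetch and decompress the common backing image once per worker node
    if [ ! -f "$BACKING" ]; then
        cp "$SRC" "$BACKING.gz" && gunzip "$BACKING.gz"
    fi

    # Give this job its own copy-on-write overlay sharing the backing file
    qemu-img create -f qcow2 -b "$BACKING" "$DEST"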
While ENABLE_URL_TRANSFERS and FILETRANSFER_PLUGINS work very well with regular Condor jobs, this feature had a bug when handling Condor-C grid universe jobs. Specifically, the file transfer plugin was being invoked at the remote schedd rather than at the Condor-allocated worker node. Please see the documents GridRemoteFileTransferBug.txt6 and GridRemoteFileTransferBugTable.pdf7 for details regarding the issue. This issue was identified during an initial test and forwarded to the Condor development team, and it was resolved in a subsequent release.
4 http://t2.unl.edu:8094/browser/VMApps/ReportNov2011/4.filesystemplugin.sh
5 http://osgtech.blogspot.com/2011/10/kvm-and-condor-part-2-condor.html
6 http://t2.unl.edu:8094/browser/VMApps/ReportNov2011/5.GridRemoteFileTransferBug.txt
7 http://t2.unl.edu:8094/browser/VMApps/ReportNov2011/6.GridRemoteFileTransferBugTable.pdf

Submitting VM jobs via Condor
Once the Condor worker node has been configured with ENABLE_URL_TRANSFERS and FILETRANSFER_PLUGINS, submitting a Condor job is straightforward. Some highlights of the submit files are noted below. To send an attribute to the remote job ClassAd for jobs launched using the grid universe, use the prefix “remote_” and prepend a ‘+’. For additional details, look at the example submit files for Condor-C and Condor (CondorCSubmitFileExample.txt8 and CondorSubmitFileExample.txt9, respectively).
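The sketch below illustrates the mechanism only: the remote schedd and pool names are placeholders, and the attribute names and values are illustrative; the example submit files referenced above contain the exact set of attributes the VM universe expects.

    # Condor-C (grid universe) submit sketch with illustrative values
    universe      = grid
    grid_resource = condor remote-schedd.example.edu remote-pool.example.edu

    # Attributes destined for the remote job ClassAd: prefix the name with
    # "remote_" and prepend a '+', as described above
    +remote_JobUniverse = 13
    +remote_VMType      = "kvm"
    +remote_VMMemory    = 1024

    log = condor_c_vm.log
    queue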
ISSUES IDENTIFIED
When a job is launched via the grid universe to run in the remote VM universe and reaches the worker node, the file transfer plugin stages the VM disk image in the Condor-allocated execute directory. As soon as the plugin completes the file transfer, the execute directory is deleted by a Condor daemon, causing the transferred VM disk image to be deleted as well. The current study could not identify what causes the Condor-allocated execute directory to be deleted; this behavior is not seen for jobs submitted directly to the VM universe. As a result, the worker node Startd and VM_Gahp complain about the incorrect/missing VM disk image, which ultimately places the job in a hold status.
A workaround for now is to have the file transfer plugin stage the VM disk image in a temporary location outside of the Condor-allocated execute directory and to supply the absolute path to the transferred file in the vm_disk parameter of the job ClassAd. In this case, even though the Condor execute directory is deleted, the VM is launched and the job maintains its running status.
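For example, if the plugin stages the image to a fixed scratch location, the job would carry a line along the following lines (the staging directory is an assumption of this sketch, not the path used in the study):

    # Point the VM universe at the absolute path written by the plugin,
    # outside of the Condor execute directory
    vm_disk = /var/tmp/vm-staging/vm-worker.qcow2:vda:w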
Joining a Condor collector from a Condor instance running inside of the launched VM Condor job (Condor inside Condor)
The collector configuration for this particular study was hardcoded. However, this task could be partially configured (installing the required Condor packages, etc.) via the kickstart file used to create the initial VM disk image, and partially via configuration sent along with a CD-ROM ISO image that may be mounted from the launched VM instance.
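A minimal sketch of the Condor configuration carried inside the VM (for example, written by the kickstart %post section or read from the mounted ISO) might look like the following; the collector host name is a placeholder, and security and authentication settings are omitted.

    # Condor configuration inside the VM image (illustrative values)
    CONDOR_HOST = collector.example.edu
    DAEMON_LIST = MASTER, STARTD
    # Advertise the VM's slots so jobs can match it as a worker node
    START = TRUE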
4. Conclusions and Future Work
This work demonstrated the viability of a hybrid batch-system/“cloud” approach.
The same Condor instance was utilized via the grid for launching both jobs and
virtual machines. The resulting virtual machines were designed to be reminiscent of batch nodes (they integrated with a Condor pool) but to be fully under the control of the user rather than the site.
Because the size of an executable (megabytes) is typically three orders of magnitude smaller than that of a typical virtual machine image (gigabytes), special care was taken to improve the scalability of the system. This was especially evident in the staging method, which uses a COW disk image with a common backing file rather than a self-contained disk image. Given the bugs discovered, we believe we were the first to combine the VM universe with a reasonably scalable staging method.
8 http://t2.unl.edu:8094/browser/VMApps/ReportNov2011/7.CondorCSubmitFileExample.txt
9 http://t2.unl.edu:8094/browser/VMApps/ReportNov2011/8.CondorSubmitFileExample.txt
While this work demonstrated viability, future work is needed to put it into production. While the hand-written kickstart files and manual upload were sufficient for this project, managing images across the entire OSG will require a management framework for the creation and maintenance of VM templates, using tools such as Aeolus10.
Additionally, to finish the TI within the allotted time, we narrowed the scope and did not integrate the VM image with a VO's workflow management system, such as GlideinWMS or PanDA; this will likely be the topic of a follow-up TI. Finally, we believe there is a need to implement a monitoring framework that will assure that the VM instance is successfully started and that it joins the configured collector to serve as a VM-based worker node.
10 http://www.aeolusproject.org/