Running jobs in the Vacuum
Andrew McNab
School of Physics and Astronomy, University of Manchester
Federico Stagni and Mario Ubeda Garcia
CERN
In parallel with the Grid and Cloud models, the
Vacuum is a third model for the operation of
computing nodes at a site. In the Vacuum case, virtual
machines (VMs) are created and contextualized for
virtual organisations (VOs) by the site itself. For the
VO, these virtual machines appear to be produced
spontaneously "in the vacuum" rather than in
response to requests by the VO. This is an inversion
of the more usual cloud model of Infrastructure-as-a-Service, and these sites operate as Infrastructure-as-a-Client (IaaC).
The Vacuum model takes advantage of the mature pilot
job frameworks adopted by many VOs, in which pilot
jobs submitted via a Grid infrastructure in turn start job
agents which fetch the real jobs from the VO's central
task queue. In the Vacuum model, the contextualisation
process starts a job agent within the virtual machine
and real jobs are then fetched from the central task
queue as normal for these systems. This parallels
similar developments in which Cloud interfaces are
used to start virtual machines containing job agents.
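The job agent pattern described above can be sketched in a few lines. This is an illustrative toy, not DIRAC's real JobAgent API: the class and function names, and the in-memory task queue standing in for the VO's central matcher service, are all hypothetical.

```python
# Minimal sketch of a pilot-style job agent loop. All names here are
# illustrative stand-ins, not DIRAC or Vac internals.

class TaskQueueClient:
    """Stands in for the VO's central task queue (matcher) service."""
    def __init__(self, jobs):
        self._jobs = list(jobs)

    def fetch_matching_job(self, resource_description):
        # A real matcher would compare each job's requirements against
        # the resource description (VO, CPUs, memory, ...) before
        # handing out a payload. Here we just pop the next queued job.
        return self._jobs.pop(0) if self._jobs else None

def job_agent_loop(task_queue, resource_description, max_idle_polls=3):
    """Fetch and run real jobs until the queue stays empty."""
    completed = []
    idle = 0
    while idle < max_idle_polls:
        job = task_queue.fetch_matching_job(resource_description)
        if job is None:
            idle += 1   # a real agent would back off and retry here
            continue
        idle = 0
        completed.append(job["id"])  # stand-in for executing the payload
    return completed

queue = TaskQueueClient([{"id": "job-1"}, {"id": "job-2"}])
print(job_agent_loop(queue, {"vo": "lhcb", "cpus": 1}))  # ['job-1', 'job-2']
```

The point of the pattern is that the same loop runs unchanged whether it was started by a Grid pilot job, a Cloud-provisioned VM, or a Vacuum VM: only the way the surrounding environment is created differs.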
Grid vs Cloud vs Vacuum
[Figure: The Grid, "Push becomes Pull". Central services (a WMS broker, or a director acting as pilot factory) submit pilot agents to a Grid site's CREAM-CE and batch queues; each pilot agent runs a JobAgent which fetches real jobs from the central Matcher & Task queue.]
The Grid: originally conceived with central brokers
forwarding users' jobs to sites, the Grid has evolved
towards direct submission of jobs to sites. Large VOs have
created their own overlay infrastructures, such as LHCb's
DIRAC, with users' jobs fetched from a central task queue
by pilot jobs once they arrive at the site.
[Figure: The Cloud, "Infrastructure-as-a-Service". A central director (VM factory) asks a Cloud site to create VMs; each VM runs a JobAgent which fetches user and production jobs from the central Matcher & Task queue. Similar to direct Grid submission, and introduces VMs.]

[Figure: The Vacuum, "Infrastructure-as-a-Client". The Vacuum site creates VMs itself; each VM runs a JobAgent which fetches user and production jobs from the central Matcher & Task queue. A further simplification: the central director/factory is removed.]
The Cloud: an evolution of the demand for server rental in
the commercial world, which has led to highly elastic
virtual machine hosting services. Some VOs are now
looking at using either commercial Cloud providers or
sites using software developed for Clouds, and are adding
support for VMs to their private infrastructures.
The Vacuum: provides a third model, in which VMs are
run for VOs by sites using a contextualization provided in
advance by the VO. As before, user jobs are fetched from
the VO's central task queues. This leverages the existing
work on supporting VMs for Clouds and the VO's private
overlay infrastructures such as DIRAC.
Vac implementation
An implementation of the Vacuum scheme, Vac, has been developed at Manchester. A
VM factory daemon, vacd, runs on each physical worker node to create and
contextualise that node's set of virtual machines. Each node's VM factory can decide
which virtual organization's virtual machines to run, based on site-wide target shares
and on a peer-to-peer protocol between factory nodes.
The site's VM factory nodes query each other to discover which virtual machine types
they are running, and therefore identify which virtual organisations' virtual machines
should be started as VM slots become available again. This allows sites to provide
virtual environments to VOs and still maintain the fair share allocation of capacity
between multiple VOs that is currently achieved with grid interfaces to conventional
batch systems.
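How a factory node might combine peer information with site-wide target shares can be sketched as follows. This is an illustrative heuristic under stated assumptions, not Vac's actual allocation algorithm: the function name, the share representation as fractions, and the largest-shortfall rule are all assumptions made for the example.

```python
# Hypothetical sketch (not Vac's real algorithm): pick which VO's VM a
# factory node should start next, given site-wide target shares and the
# VMs its peer factory nodes report they are currently running.

def choose_vo(target_shares, running_vms_by_peer):
    # Tally running VMs per VO across all peer factory nodes.
    running = {vo: 0 for vo in target_shares}
    for peer_vms in running_vms_by_peer.values():
        for vo in peer_vms:
            running[vo] = running.get(vo, 0) + 1
    total = sum(running.values()) or 1  # avoid division by zero
    # Start a VM for the VO with the largest shortfall between its
    # target share and its current fraction of running VMs.
    return max(target_shares,
               key=lambda vo: target_shares[vo] - running.get(vo, 0) / total)

shares = {"lhcb": 0.6, "atlas": 0.4}
peers = {"node1": ["lhcb", "lhcb"], "node2": ["lhcb"]}
print(choose_vo(shares, peers))  # atlas: furthest below its target share
```

Because each node runs the same rule against the same peer-reported state, the site converges towards its target shares without any central scheduler, which is the fair-share behaviour the text describes.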
To create a virtual machine for a virtual organization, the VO must supply a
contextualization procedure to the site, which vacd applies. A suitable contextualization
was developed by the LHCb collaboration at CERN for use with Clouds, and has been
extended to work with Vac. During the summer of 2013, we successfully demonstrated
running production jobs from the main LHCb central task queues without any
modification to the central DIRAC services or operations procedures. During these
production runs, Vac was in operation at 3 sites in the UK (Manchester, Lancaster, Imperial College).
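One way such a contextualization step might look is sketched below. This is a hedged illustration only: the `##user_data_...##` placeholder syntax, the `contextualize` function, and the `start-job-agent` command are all hypothetical, standing in for whatever template substitution and boot script the VO actually supplies.

```python
# Hypothetical sketch of applying a VO-supplied contextualization:
# substitute site- and VM-specific details into a user_data template
# before the VM boots. The placeholder syntax is illustrative, not
# Vac's real template format.

def contextualize(user_data_template, substitutions):
    """Return the user_data script handed to the new VM at boot."""
    user_data = user_data_template
    for key, value in substitutions.items():
        user_data = user_data.replace("##user_data_%s##" % key, value)
    return user_data

template = ("#!/bin/sh\n"
            "# Runs inside the VM at boot: start the VO's job agent,\n"
            "# which then fetches real jobs from the central task queue.\n"
            "start-job-agent --site ##user_data_site## --vo ##user_data_vo##\n")

print(contextualize(template, {"site": "UKI-MANCHESTER", "vo": "lhcb"}))
```

Keeping the VO-specific logic inside the template is what lets the same contextualization serve both Cloud and Vacuum sites, as the text notes for LHCb's case.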
[Figure: A Vac site. Each physical node runs a vacd factory daemon managing that node's set of VMs; Vac nodes query their peers to decide which VMs to start.]
For more about the Vac implementation, please see http://www.gridpp.ac.uk/vac/
Andrew.McNab@manchester.ac.uk – www.gridpp.ac.uk/vac/
Another property of this implementation is that there is no gate keeper service, head
node, or batch system accepting and then directing jobs to particular worker nodes.
This avoids several central points of failure, and considerably simplifies the installation
and maintenance of these sites.