Condor services for the Global Grid

Condor Services for the Global Grid:
Interoperability between OGSA and Condor
Clovis Chapman (1), Paul Wilson (2), Todd Tannenbaum (3),
Matthew Farrellee (3), Miron Livny (3), John Brodholt (2), and
Wolfgang Emmerich (1)
(1) Dept. of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
(2) Dept. of Earth Sciences, University College London, Gower St, London WC1E 6BT, United Kingdom
(3) Computer Sciences Department, University of Wisconsin, 1210 W. Dayton St., Madison, WI 53706-1685, U.S.A.
Goals

• Leverage acceptance of grid standards: investigate the potential for interoperability with established systems
• Complementary architectures: OGSA allows us to expose a range of Condor services
- Seamless integration of Condor resources in a standardized Grid environment
• Improving Condor's grid capabilities:
- Bring Condor in line with advances in grid computing and add significant new functionality
• Providing a set of high-throughput computing services to the grid community (workload management, scheduling, etc.)
Condor Architecture overview
[Diagram. Labels: Central manager (Collector); User jobs; Submission machine(s) (Schedd); Execution machine(s) (Startd)]
Condor Architecture overview
[Diagram. Labels: Central manager (Negotiator, Collector); Submission machine(s) (Schedd, Shadow); Execution machine(s) (Startd, Starter)]
Architectural alternatives
[Diagram comparing Option 1 and Option 2. Box labels: Remote client, Job queue, Site A, Site B, Manager, Local Scheduler, Job Execution Manager. Step labels as recovered, in order of appearance: "1. Job execution request", "2. Resource allocation request"; then "1. Resource allocation request", "2. Job", "3. Job execution".]
Comparisons
Must take into account real-world constraints such as:
• Firewalls or private LANs: might not have access to all machines of a pool, even though the use of SOAP should help ease access through firewalls
• Potential cost in resource usage: Condor is currently relatively lightweight, so the weight of the hosting environment needs to be considered (debatable)
• Should avoid interfering with the intricate relationships between Condor components
Option 1: Job Delegation
• Need to provide:
- Job submission and queue management interface
- Job execution management
- Resource information providers: allow external sources to estimate pool suitability before submission
• Can be mapped to: schedd, collector (shadow)
Option 1: The scheduler
• Can present a transaction-oriented interface for job submission
• Transient schedulers: allow users to instantiate their own instances of the scheduler via a scheduler factory
- Isolates user/application-specific sets of jobs
- Can be destroyed when no longer required
- Security benefits: scheduler would no longer require root access
• Expose job classAds as service data elements
- Job classAds represent a job and its characteristics during its lifetime
- Allows job information to be obtained via OGSA query mechanisms
- Allows for asynchronous notifications of classAd updates
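The scheduler ideas above can be illustrated with a minimal in-memory sketch. It is not Condor or OGSA code: `SchedulerFactory`, `TransientScheduler`, `Transaction`, and the attribute names `Cmd`, `Owner`, and `JobStatus` are hypothetical stand-ins, job classAds are simplified to plain dictionaries, and notifications are delivered through synchronous callbacks to keep the example short.

```python
from typing import Any, Callable, Dict, List

ClassAd = Dict[str, Any]                    # simplified stand-in for a job classAd
Listener = Callable[[int, ClassAd], None]   # called with (job id, classAd copy)


class Transaction:
    """Buffers submissions; nothing reaches the queue until commit()."""

    def __init__(self, scheduler: "TransientScheduler") -> None:
        self._scheduler = scheduler
        self._pending: List[ClassAd] = []
        self._open = True

    def submit(self, ad: ClassAd) -> None:
        if not self._open:
            raise RuntimeError("transaction is closed")
        self._pending.append(dict(ad))

    def commit(self) -> List[int]:
        if not self._open:
            raise RuntimeError("transaction is closed")
        self._open = False
        return [self._scheduler._enqueue(ad) for ad in self._pending]

    def abort(self) -> None:
        self._open = False
        self._pending.clear()


class TransientScheduler:
    """Per-user scheduler instance; job classAds act as service data."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.service_data: Dict[int, ClassAd] = {}   # job id -> classAd
        self._listeners: List[Listener] = []
        self._next_id = 1
        self.destroyed = False

    def begin(self) -> Transaction:
        if self.destroyed:
            raise RuntimeError("scheduler has been destroyed")
        return Transaction(self)

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def update(self, job_id: int, **changes: Any) -> None:
        self.service_data[job_id].update(changes)
        self._notify(job_id)

    def destroy(self) -> None:
        self.destroyed = True
        self.service_data.clear()
        self._listeners.clear()

    def _enqueue(self, ad: ClassAd) -> int:
        job_id = self._next_id
        self._next_id += 1
        self.service_data[job_id] = {**ad, "Owner": self.owner, "JobStatus": "Idle"}
        self._notify(job_id)
        return job_id

    def _notify(self, job_id: int) -> None:
        for listener in self._listeners:
            listener(job_id, dict(self.service_data[job_id]))


class SchedulerFactory:
    """Creates one isolated scheduler per owner and destroys it on request."""

    def __init__(self) -> None:
        self._instances: Dict[str, TransientScheduler] = {}

    def create(self, owner: str) -> TransientScheduler:
        if owner not in self._instances:
            self._instances[owner] = TransientScheduler(owner)
        return self._instances[owner]

    def destroy(self, owner: str) -> None:
        self._instances.pop(owner).destroy()
```

A client would call `factory.create("alice")`, open a transaction with `begin()`, submit one or more classAds, and `commit()`; subscribers then receive a callback for each queued job and for every later `update()`.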
Option 1: Resource Information Providers
The collector:
• Collects information about availability and characteristics of resources in a pool in the form of resource classAds
• Can expose resource classAds as individual service data elements
• Can complement this information with pool policies (priorities, job pre-emption rules, etc.), but need a clearer representation of customer capabilities in Condor
• Will the central manager be accessible? (Firewalls…)
• Might want to use proxy services or redirect queries through the scheduler
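As a rough illustration of how an external source might estimate pool suitability from resource classAds, the sketch below models the collector as an in-memory store queried with a predicate. The `Collector` class, the `count_suitable` helper, and the attribute names (`Name`, `OpSys`, `Memory`, `State`) are assumptions made for this example; it does not use the real Condor collector interface or OGSA query mechanisms.

```python
from typing import Any, Callable, Dict, List

ClassAd = Dict[str, Any]   # simplified stand-in for a resource classAd


class Collector:
    """Holds the latest resource classAd advertised by each machine."""

    def __init__(self) -> None:
        self._ads: Dict[str, ClassAd] = {}

    def advertise(self, ad: ClassAd) -> None:
        # A newer ad from the same machine replaces the previous one.
        self._ads[str(ad["Name"])] = dict(ad)

    def query(self, constraint: Callable[[ClassAd], bool]) -> List[ClassAd]:
        # Return copies so callers cannot modify the stored ads.
        return [dict(ad) for ad in self._ads.values() if constraint(ad)]


def count_suitable(collector: Collector, op_sys: str, min_memory_mb: int) -> int:
    """Count unclaimed machines that satisfy the job's basic requirements."""
    matches = collector.query(
        lambda ad: ad.get("OpSys") == op_sys
        and ad.get("Memory", 0) >= min_memory_mb
        and ad.get("State") == "Unclaimed"
    )
    return len(matches)
```

A remote client could call `count_suitable` before submission and skip pools that report no matching machines. If the central manager sits behind a firewall, the same query could be relayed through a proxy or the scheduler, as the slide suggests.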
Conclusion
• VO-wide management tools will be the focus of future development work
• Project funded by DTI, JISC and Microsoft
• Starting point: implementation of a (transient) scheduler
- Take advantage of OGSA concepts such as service data, notification and factories to boost Condor capabilities and ease remote access and integration in a grid environment
- Couple this with (VO-wide) discovery and monitoring services
- Move to WSRF and Web Services Notification