Condor Services for the Global Grid: Interoperability between OGSA and Condor

Clovis Chapman (1), Paul Wilson (2), Todd Tannenbaum (3), Matthew Farrellee (3), Miron Livny (3), John Brodholt (2), and Wolfgang Emmerich (1)

(1) Dept. of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
(2) Dept. of Earth Sciences, University College London, Gower St, London WC1E 6BT, United Kingdom
(3) Computer Sciences Department, University of Wisconsin, 1210 W. Dayton St., Madison, WI 53706-1685, U.S.A.

Goals

- Leverage the acceptance of grid standards: investigate the potential for interoperability with established systems
- Complementary architectures: OGSA allows us to expose a range of Condor services
- Seamless integration of Condor resources in a standardized grid environment
- Improve Condor's grid capabilities: bring Condor in line with advances in grid computing, and add significant new functionality
- Provide a set of high-throughput computing services to the grid community (workload management, scheduling, etc.)

Condor Architecture Overview

[Diagram: submission machines run a schedd holding user job queues; execution machines run a startd; both advertise themselves to the collector on the central manager.]

[Diagram: the negotiator on the central manager matches queued jobs to resources; once a match is made, the schedd spawns a shadow on the submission machine and the startd spawns a starter on the execution machine to run the job.]

Architectural Alternatives

[Diagram contrasting two options for remote access to a pool:
- Option 1 (Site A): a remote client sends a job execution request to the site's job queue; the local scheduler then issues the resource allocation request and runs the job on the pool.
- Option 2 (Site B): a remote client sends a resource allocation request to the site's manager, then drives job execution itself on the allocated resources.]
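The matchmaking at the heart of the Condor architecture shown above can be sketched in a few lines: the negotiator pairs a job classAd with a machine classAd when each ad's Requirements expression evaluates to true against the other ad's attributes. A minimal, illustrative Python sketch, using dictionaries in place of real classAds and Python expressions in place of the ClassAd language (all attribute names and values here are invented for illustration; real ClassAd matching is considerably richer):

```python
def matches(job, machine):
    # Merge the two ads so unqualified attribute references resolve
    # against either side (a simplification of ClassAd MY./TARGET. scoping),
    # then require both Requirements expressions to hold.
    env = {**machine, **job}
    return bool(eval(job["Requirements"], {}, env)
                and eval(machine["Requirements"], {}, env))

job = {
    "Owner": "wilson",
    "ImageSize": 512,
    "Requirements": 'Arch == "INTEL" and OpSys == "LINUX" and Memory >= 512',
}

machine = {
    "Name": "execute01.example.org",
    "Arch": "INTEL",
    "OpSys": "LINUX",
    "Memory": 1024,
    "Requirements": "ImageSize <= Memory",
}

print(matches(job, machine))  # True: the two ads satisfy each other
```

The symmetry is the point of the design: resource owners constrain which jobs they accept just as job owners constrain which resources they will run on, which is why the collector's resource classAds matter to a remote client trying to estimate pool suitability.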
Comparisons

Must take into account real-world constraints such as:
- Firewalls and private LANs: remote clients might not have access to all machines in a pool, although the use of SOAP should help ease access through firewalls
- Potential cost in resource usage: Condor is currently relatively lightweight, so the weight of the hosting environment needs to be considered (debatable)
- The intricate relationships between Condor components should not be interfered with

Option 1: Job Delegation

Need to provide:
- A job submission and queue management interface
- Job execution management
- Resource information providers, allowing external sources to estimate pool suitability before submission
These map onto existing Condor components: the schedd, the shadow, and the collector, respectively.

Option 1: The Scheduler

- Can present a transaction-oriented interface for job submission
- Transient schedulers: allow users to instantiate their own instances of the scheduler via a scheduler factory
  - Isolates user- and application-specific sets of jobs
  - Can be destroyed when no longer required
  - Security benefits: the scheduler would no longer require root access
- Expose job classAds as service data elements
  - A job classAd represents a job and its characteristics throughout its lifetime
  - Allows job information to be obtained via OGSA query mechanisms
  - Allows for asynchronous notification of classAd updates

Option 1: Resource Information Providers

The collector:
- Collects information about the availability and characteristics of resources in a pool, in the form of resource classAds
- Can expose resource classAds as individual service data elements
- Can complement this information with pool policies (priorities, job pre-emption rules, etc.), although a clearer representation of customer capabilities in Condor is needed
But will the central manager be accessible?
If not (because of firewalls, for instance), we might want to use proxy services or redirect queries through the scheduler.

Conclusion

- VO-wide management tools will be the focus of future development work (project funded by the DTI, JISC and Microsoft)
- Starting point: implementation of a (transient) scheduler
  - Take advantage of OGSA concepts such as service data, notification and factories to boost Condor capabilities and ease remote access and integration in a grid environment
  - Couple this with (VO-wide) discovery and monitoring services
  - Move to WSRF and Web Services Notification
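To make the "job classAds as service data" idea concrete, the following is the kind of simplified job classAd a transient scheduler would expose. The attribute names follow Condor conventions, but all values here are invented for illustration:

```
MyType       = "Job"
TargetType   = "Machine"
ClusterId    = 42
ProcId       = 0
Owner        = "wilson"
Cmd          = "/usr/local/bin/simulate"
JobStatus    = 2
Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Memory >= 512)
```

Exposed as a service data element, an ad like this can be retrieved through standard OGSA query mechanisms, or pushed to subscribers as a notification when attributes change, for example when JobStatus moves to 2 (running in Condor's encoding).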