Predictable Workflow Deployment Service
Stephen McGough
Ali Afzal, Anthony Mayer, Steven Newhouse, Laurie Young
London e-Science Centre, Department of Computing, Imperial College London

ICENI
The Iceni, under Queen Boudicca, united the tribes of South-East England in a revolt against the occupying Roman forces in AD60.
• IC e-Science Networked Infrastructure
• Developed by LeSC Grid Middleware Group
• Collect and provide relevant Grid meta-data
• Use to define and develop higher-level services
• Interaction with other frameworks: OGSA, Jxta etc.

The Architecture: Showing the Trinity
[Diagram labels: Scheduler, Performance Store, Reservation Service, Launcher, Application Service, Reservation Engine]

Scheduling Service
• Scheduling Algorithm
  – Algorithm to select where to deploy components
• Scheduling Framework
• Application Mapper
  – Generates the possible mappings of components to resources
• Listen out for services
  – Launcher Services
  – Reservation Services
  – Performance Services

Scheduling Ports
• launchJob
  – Takes an EP (workflow) or JDML (job description)
  – Works out where to deploy on the grid
    • Uses the Performance, Reservation and Launching services to help determine this
  – Deploys work to appropriate resources (as JDML)
  – Returns an EP indicating what has been done
• generateQuote
  – Same as launchJob, but does not deploy

The Performance Repository Framework
• Performance Store
  – Persistent performance storage
• Performance Framework
• Data Collector
  – Collects data on currently running applications (event times)
• Performance Processing
  – Converts raw event times into performance data

Performance Ports
• register
  – Informs the Performance Service (PS) of a new application to monitor
  – Provides a unique id that is used for further calls to the PS
• addEP
  – Provides a workflow for an application the PS is monitoring
  – Requires the Execution Plan (a workflow)
  – Requires the unique id provided by register

Performance Ports (2)
• getActivityTime
  – Gets an estimated execution time for part of a workflow
  – Compulsory data
    • Component type: identifies the component we are interested in
    • Resource: the resource it will be run on
    • Activity: which part of the component
  – Optional data
    • Share count: the number of other components that will be running on the resource
    • Problem space characteristics: a set of parameters specified by the component designer (e.g. the number of unknowns for a set of linear equations)

Performance Ports (3)
• getProblemCharacteristics
  – Finds out the set of parameters, and their types, that can be used when querying the Performance Service for a given component
  – Requires component type, resource, activity

Performance Events
• When ICENI components start, or component ports are accessed, events are fired
  – Used to gather performance information about currently running applications
• Events contain data about
  – Time, component where the event happened, resource, type of event (start or port), application to which the event refers
• Events are serialised objects
  – Can be XML documents

Collection of Performance Results
[Figure: events from a linear-equation workflow (Linear Equation Source, Linear Equation Solver, Display Vector Results), e.g. "Event: Start Linear Equation Source", are sent to the Data Collector, then on to Performance Processing and the Performance Store. The event log has two columns: Time (12:00, 12:04, 12:03, 12:05, 12:12) and Event (Linear Equation Source Start, Send out Equations, Linear Equation Solver Start, Receive Equations, ...).]

Launching Service
• Launcher
  – Converts a JDML document into a platform-specific job
• Launcher Factory
  – Generates a Launcher for each job submitted to the Launching Service
• Launching Framework
• Advertiser
  – Generates a document for each resource available from this Launcher
• Reservation
  – Provides a mechanism for reservations to be made

Launching Service
• launchJob
  – Takes an XML description of the job to deploy, written in JDML, and enacts this job on the appropriate resource
  – JDML is translated to the local DRM specifics
• getResources
  – Returns the set of ids of the resources available from this launcher
  – If a set of user credentials is provided, the list contains only those resources that the user may use

Launching Service (2)
• getResourceDescription
  – Gets the resource description for a named resource as an XML document
  – If credentials are provided, returns the document only if the user can use the resource
• getResourceAttribute
  – Queries a specific attribute of a resource: given the name of a resource and the name of one of its attributes, returns the value of that attribute
• getLocations
  – Gets a list of the names of the resources

Launcher With Reservations
• createReservation
  – Given an agreement document, requests a reservation for a resource
  – Returns an agreement document and an agreement identity
• renegotiateAgreement
  – Takes an agreement document returned previously and attempts to modify it
  – If successful, a new document is returned
  – If unsuccessful, an alternative proposal is returned
• cancelReservation
  – Takes a reservation identity and cancels the associated reservation

Launcher With Reservations (2)
• createHold
  – Given an agreement document and a timeout value, makes a hold on a resource
  – Arguments may be negotiated
  – Returns an agreement document with the hold identity
  – Hold is not permanent (time limited)
    • May need to cancel if holds cannot be obtained for all other components in the application
• confirmHold
  – Takes a hold identity and makes the hold permanent
• cancelHold
  – Takes a hold identity and cancels that hold on the resource

Reservation Service
• makeReservations
  – Takes a set of EPs (workflows) and tries to see if any of them can be fully reserved for the given user credentials
  – Returns an EP that can be fully reserved (if one exists)
  – Does this by making holds with the Launching Services and confirming them
• cancelReservation
  – Takes the resource identity and reservation identity and cancels that reservation
  – These are found in the EP returned from creating a reservation

Reservation Engine
• Exposes the underlying reservation features of the DRM
• makeReservation
  – Takes a reservation, including time interval and user credentials
  – Either confirms the reservation is accepted or offers an alternative
• cancelReservation
  – Takes a reservation identity and cancels it
• makeHold
  – Takes a reservation request and duration
  – Returns the time interval that the hold will be held for
• cancelHold
  – Cancels a hold request given its id
• confirmHold
  – Makes a hold into a reservation; requires the hold id

Example Execution
[Sequence diagram. Participants: Actor, Scheduling Service, Performance Service, Reservations Service, Launcher Service, Reservations Engine. Message labels: Advertise (four times), Submit workflow, Get resource information, Get performance information, Performance data, Resource information, Evaluate Performance Models, Schedule workflow, Create Reservations, Create Hold, Hold Created, Confirm Hold, Execution Plan, Reservation Confirmed, Create Hold, Hold Created, Confirm Hold, Reservation Created, Deploy Jobs onto Resources, Application Started.]

Service: ICENI
• End-to-end Grid middleware, providing launching, scheduling, reservation and inter-application communication
  – URL: www.lesc.doc.ic.ac.uk/iceni
  – Licence: ICENI, based on Sun open source licence
  – Support: Web site / mailing list
• SOA Model: Jini

What do you use to build your service? (i.e. How ‘standard’ is your service?)
• Widely Implemented Standard Specification (1pt)
  – JINI
• Implemented draft specification (2pt)
• Implemented draft specification (3pt)
• Implemented proposal (4pt)
  – ICENI Architecture
• Non-implemented proposal (5pt)
• Concept (6pt)
• TOTAL: JINI 1pt + Implemented proposal 4pt = 5pt

Service Dependencies
• What else does your service depend on (i.e. external dependencies)?
  – Logging: Java Logging
• What does your implementation depend on?
  – Languages: Java
  – JINI based

AAA & Security
• What authentication mechanism do you use?
  – X509 certificate based
• What authorisation mechanism do you use?
  – From ICENI infrastructure
• What accounting mechanism do you use?
  – None at present
• Does service interaction need to be encrypted?
• If these are not used now, will they be in the future?

Exploiting the Service Architecture
• What features from your ‘plumbing’ do you use in your service?
  – Event notification
  – Meta-data
  – Registry discovery/advertisement

Service Activity
• Multiple interaction or single user?
  – Multiple interaction
• Throughput (1 per day or 100 per second?)
  – ~10 per min
• Typical data volume moved in
• Typical data volume moved out
  – Depends on job

Service Failure
• Required Reliability
  – Failure semantics?
    • Positive ack
• Required Persistence
  – No current persistence
• Required Availability
  – One of many

Required Service Management
• Remote access to:
  – Performance
  – Progress (limited at present)

The Future
• How will ICENI develop?
• Want to re-engineer services as web services
• Already have this for the launcher (WS-JDML)
• Bring ICENI back into mainstream services
  – More reliable and useful to others
  – Fragment ICENI into separate interoperating services
  – Explore different service discovery mechanisms

Acknowledgements
• Director: Professor John Darlington
• Research Staff:
  – Anthony Mayer, Nathalie Furmento
  – Stephen McGough, William Lee, Jeremy Cohen
  – Marko Krznaric, Murtaza Gulamali
  – Asif Saleem, Laurie Young, Jeffery Hau
• Others:
  – Steven Newhouse, Yong Xie, Gary Kong, James Stanton
• Contact:
  – http://www.lesc.ic.ac.uk/iceni
  – e-mail: lesc@ic.ac.uk
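
Supplementary sketch: hold-then-confirm reservation
The Reservation Service slide says that makeReservations takes a set of EPs, makes holds with the Launching Services and confirms them, returning an EP that can be fully reserved if one exists. The Python sketch below illustrates that idea only; it is not ICENI code. The class FakeLauncher, the function make_reservations, their signatures, and the representation of an EP as a plain list of resource names are all assumptions made for illustration. Agreement documents, timeouts and user credentials are left out.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Set


@dataclass
class FakeLauncher:
    """Hypothetical in-memory stand-in for a launcher with reservation ports.

    Method names echo createHold / confirmHold / cancelHold from the slides;
    the signatures and data model are assumptions, not the ICENI API.
    """
    free: Set[str]                                   # resources that can still be held
    holds: Dict[int, str] = field(default_factory=dict)
    reserved: Set[str] = field(default_factory=set)
    _ids: count = field(default_factory=count)       # source of hold identities

    def create_hold(self, resource: str) -> Optional[int]:
        """Place a hold and return its identity, or None if unavailable."""
        if resource not in self.free:
            return None
        self.free.discard(resource)
        hold_id = next(self._ids)
        self.holds[hold_id] = resource
        return hold_id

    def confirm_hold(self, hold_id: int) -> None:
        """Turn a hold into a reservation."""
        self.reserved.add(self.holds.pop(hold_id))

    def cancel_hold(self, hold_id: int) -> None:
        """Release a hold so the resource becomes available again."""
        self.free.add(self.holds.pop(hold_id))


def make_reservations(plans: List[List[str]],
                      launcher: FakeLauncher) -> Optional[List[str]]:
    """Return the first plan whose resources can all be held, else None."""
    for plan in plans:
        hold_ids = []
        for resource in plan:
            hold_id = launcher.create_hold(resource)
            if hold_id is None:
                break                                # this plan cannot be fully held
            hold_ids.append(hold_id)
        if len(hold_ids) == len(plan):
            for hold_id in hold_ids:                 # every hold succeeded: confirm all
                launcher.confirm_hold(hold_id)
            return plan
        for hold_id in hold_ids:                     # partial success: release holds
            launcher.cancel_hold(hold_id)
    return None


launcher = FakeLauncher(free={"a", "b", "c"})
launcher.create_hold("b")                            # another user already holds "b"
chosen = make_reservations([["a", "b"], ["a", "c"]], launcher)
```

In this run the first plan fails because "b" is already held, so the hold on "a" is cancelled and the second plan is reserved instead. Cancelling partial holds before trying the next plan matches the note on the createHold slide that a hold may need to be cancelled when the other components of the application cannot be held.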