Dynamic adaptation of parallel codes
Toward self-adaptable components for the Grid

Françoise André, Jérémy Buisson & Jean-Louis Pazat
IRISA / INSA de Rennes / Université de Rennes 1

Our view of the Grid
[Diagram labels: Application, WAN, Cluster resource (several), "Our point of interest"]
[Diagram labels: Cluster resource, Processor resources, Network resource, …]
• Environment that is:
  – Parallel
    • Grid is built up from parallel machines
  – Dynamic
    • Resource allocation may change dynamically
  – Distributed
    • Resources are distributed over a network
    • Resources are in different administration domains
• Need for a new programming technique
  – Parallel self-adaptable distributed components

Related works
• Parallel and distributed components / objects exist
  – Example: GridCCM, PARDIS
• Self-adaptable components exist
  – Example: ACEEL, DART
• But no parallel and self-adaptable distributed component

Principles of parallel components
• Encapsulation of a parallel code
  – Collaboration of several communicating processes
• Goal: make it easy to couple parallel codes

Principles of dynamic adaptation
• Modification of the executed code
  – Reflective programming
• Goal: better fit to allocated resources
[Diagram labels: Execution flow, 1. Event, 2. Reaction]

Dynamic adaptation
• Three key questions:
  – When should the component adapt?
  – How should the component be modified?
  – Where can the reaction be executed?

Dynamic adaptation
• When should the component adapt?
  – Upon reception of an event from a monitor
  – According to the policy
[Diagram: the Monitor notifies the Decider of events; the Decider interprets the Adaptation policy]

Dynamic adaptation
• How should the component be modified?
  – Executing special code
  – Following directives of the policy
[Diagram: the Decider interprets the Adaptation policy and requests execution of reactions from the Coordinator; the Coordinator executes the Reaction]

Dynamic adaptation
• Where can the reaction be executed?
  – At the next adaptation point
  – Approximated prediction of the next point
    • Based on control flow graph
[Diagram labels: Behavior 1, Behavior 2, Reaction, "Not an adaptation point", "An adaptation point"]

Dynamic adaptation
[Diagram: the Monitor notifies the Decider of events; the Decider interprets the Adaptation policy and requests execution of reactions from the Coordinator; the Coordinator executes the Reaction, which modifies the Behavior. Other labels: Component, Platform]

Mixing parallelism and adaptation
[Diagram: the same architecture with a Parallel coordinator, a Parallel reaction and several Parallel behaviors]

Mixing parallelism and adaptation
• Introduction of global adaptation points
  – All the processes at the same state
• Need to coordinate all the processes
• Example: SPMD code
  – Adaptation point between each phase
[Diagram labels: Local adaptation point, Global adaptation points, Not global adaptation points]

Mixing parallelism and adaptation
• Need for a distributed algorithm for the parallel coordinator
  – Only consider globally reachable points
    • In the future of all the processes
  – Reach an agreement among all the processes
    • Choose the same point for all the processes

Mixing parallelism and adaptation
• Need to control the non-determinism
  – Due to parallelism
    • Dynamically insert synchronization statements
  – Due to unpredictable conditional instructions
    • Force the result of the conditions if possible
      – Example: insertion of empty iterations in loops
    • Otherwise postpone the decision-making

Experiment
• Experiment
  – Iterative SPMD code
    • Adaptation points between iterations
  – Increase of the number of processors
• Results
  – Negligible time in adaptation points
  – Gain thanks to the adaptation
  – Expected to scale well
[Figure: execution time (sec) per iteration (0 to 100) for the Original and Adaptable versions, with the adaptation marked]

Related domains
• Computation steering
  – Notions equivalent to global adaptation points
    • Need to execute some “special code” at the next “special point”
  – Particular use of adaptation mechanisms
    • User interface instead of monitors
[Diagram: the adaptation architecture extended with a Man-Machine Interface]

Related domains
• Fault tolerance
  – Consider dynamic environment
  – Need for a global “consistent” state
    • In the past for fault tolerance
    • In the future for dynamic adaptation
  – Relation to dynamic adaptation
    • An application?
    • A complementary feature?

Work done
• Design of the overall architecture
  – Identification of functional “boxes”
• Distributed algorithm for the coordinator
  – Automated instrumentation by static behavioral reification
  – Simple negotiation protocol
• Demonstration prototype
  – Ad-hoc mechanisms
  – Proof of concept

Future work
• Generalizing the approach
  – Generic definition of global adaptation points
    • Limits of the “same state” definition
    • Case of non-SPMD codes
  – Expression of the adaptation policy
    • Limits of explicit event-based rules
    • Need for more sophisticated (intelligent?) policies
      – Smoothing measures of resource availability
      – Balancing instabilities

Future work
• Collaborative adaptation of components
  – Control side-effects
    • Avoid adaptation cycles
  – Common policy at the level of:
    • A group of components
    • A composite
    • The whole application
  – Consider full Grid applications
    • Not only their components
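
Illustration (not part of the original slides)
The slides describe a decider that interprets a policy, and a parallel coordinator that makes all processes agree on one global adaptation point lying in their common future. The sketch below is one possible reading of that scheme for an iterative SPMD code, simulated in a single Python program. Every name in it (`Process`, `decide`, `agree_on_global_point`, `coordinate`, `spawn_two_processes`, the `"processors_added"` event) is hypothetical, and the agreement step is assumed to be a simple maximum over local proposals; the slides do not detail the actual negotiation protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Process:
    """One process of an iterative SPMD code (hypothetical model)."""
    rank: int
    iteration: int  # number of iterations this process has completed

    def next_adaptation_point(self) -> int:
        # Adaptation points sit between iterations: the earliest point this
        # process can still reach is the end of the iteration in progress.
        return self.iteration + 1


Reaction = Callable[[List[Process], int], None]


def decide(event: str, policy: Dict[str, str]) -> Optional[str]:
    """Decider: interpret the policy to map an event to a reaction name."""
    return policy.get(event)


def agree_on_global_point(processes: List[Process]) -> int:
    """Choose one point that lies in the future of every process.

    Assumption: a max-reduction over the local proposals stands in for the
    negotiation protocol, which the slides do not spell out.
    """
    return max(p.next_adaptation_point() for p in processes)


def coordinate(processes: List[Process], event: str,
               policy: Dict[str, str],
               reactions: Dict[str, Reaction]) -> Optional[int]:
    """Coordinator: run the chosen reaction at a global adaptation point."""
    name = decide(event, policy)
    if name is None:
        return None  # the policy has no rule for this event
    point = agree_on_global_point(processes)
    for p in processes:
        # Simulated: each process keeps computing up to the agreed point.
        p.iteration = point
    reactions[name](processes, point)
    return point


def spawn_two_processes(processes: List[Process], point: int) -> None:
    """Example reaction: grow the process set, starting at the agreed point."""
    first_new_rank = len(processes)
    for rank in range(first_new_rank, first_new_rank + 2):
        processes.append(Process(rank=rank, iteration=point))


procs = [Process(0, 3), Process(1, 5), Process(2, 4)]
policy = {"processors_added": "spawn"}
reactions = {"spawn": spawn_two_processes}
agreed = coordinate(procs, "processors_added", policy, reactions)
print(agreed, [p.iteration for p in procs])  # prints: 6 [6, 6, 6, 6, 6]
```

Taking the maximum guarantees that no process is asked to adapt at a point it has already passed, which matches the "in the future of all the processes" constraint. A real implementation would replace the loop over `processes` with a distributed reduction and would also have to handle the non-determinism discussed in the slides.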