Use Case Scenarios for Performance Control of Grid-based Metacomputing
John Gurd, Ken Mayes, Graham Riley
3rd Grid Performance Workshop, June 2005
www.cs.man.ac.uk/cnc

Overview
Preamble
• The case for Performance Control
Context
• Malleable, component-based Grid applications
The PERCO (Performance Control) System
• Design and implementation
Homogeneous Components
• Simple performance control scenarios
More Complex Scenarios
Conclusions

Achieving Performance
Engineering for maximum performance:
• coarse design, then fine tuning
• requires a high degree of repeatability
• benefits from homogeneity, symmetry, etc.
Control to achieve a (less than maximum) target:
• use negative feedback control at run-time
• necessary to cope with a dynamic environment
• helps to deal with heterogeneity

How to Control Performance?
Requires (negative) feedback
[Figure: feedback loop with feedback function, error and actuator]
• needs sensors, actuators and compensators
• timers, control ‘handles’, predictive models
Whole system vs. piece-wise control
• who is responsible for what?
Perception is that a hierarchy is needed
• hence need hierarchical software structure

Controllable Components?
Several groups have suggested that control should be effected via a component-based software architecture
• degenerates to singleton component
• can reduce the complexity of control
• can form a control hierarchy

Overview of PERCO
Two-tier hierarchical performance control
• CPS (Component Performance Steerer)
  - one wrapped around each component
  - all attached to APS (see below)
  - maximises performance on deployed platform
• APS (Application Performance Steerer)
  - (re)deploys components on available resources
  - maximises performance on allocated platforms
Requires an external resource allocator (from which to obtain a set of resources in which to effect its deployments)

Modus Operandi
Components progress via a sequence of progress points, at each of which a component calls out to its CPS for any component-specific performance control actions (local actuation; requires component to be malleable)
Certain progress points are also safepoints (i.e. the component is in a state that permits it to be redeployed) and, at these points, the CPS can call out to the APS for redeployment-based performance control actions (the APS means of actuation)

Progress Points
Assume that the execution of components and application proceeds through phases, and that the phase boundaries are marked by progress points.
[Figure: timeline of phases Ph 1 to Ph 7, with the phase boundaries marked by progress points 0 to 7]
Can take decisions about performance and (possibly) actuate at the progress points

Application vs. Component Progress Points
[Figure: APS application progress points and CPS component progress points, shown against component time]
Application progress points need to be safepoints

PERCO System Overview
[Figure]

PERCO Infrastructure
Each component is attached to a local loader which is capable of moving the component safely around the distributed Grid hardware according to the APS commands
The local loaders act in concert with the APS to form a virtual loader layer for the application
Each CPS communicates with the local loader on behalf of its component

PERCO System for 2 Components, C1 & C2
[Figure]

Controllable Components?
Several groups have suggested that control should be effected via a component-based software architecture
• degenerates to singleton component
• can reduce the complexity of control
• can form a control hierarchy
But where do the components come from?
• a knotty problem (cf. RealityGrid LB3D)

One Answer . . .
Homogeneous components
• each component a copy of the same model
• used e.g. for parameter search
• e.g. LB3D from RealityGrid
Performance control scenarios
• N instances of LB3D, finish as fast as possible
  - equates to keeping them in (approximate) timestep with each other (see next slides)
• execute N instances of LB3D at specified rates relative to one another
  - e.g. N=2, one instance executes twice as many timesteps per unit of time as the other

With No Control
[Figure]

With Control Exerted
[Figure]

Slightly More Complex Answer . . .
“Almost homogeneous” components
• each component a copy of a similar model, but ...
• ... with different driving parameters
  - e.g. LB3D with different resolutions
Performance control scenarios
• TeraGyroid experiment (from RealityGrid; conducted during SC’2003; see next slide)
• IntBioSim “beading” method
• Hurricane “tracking”
Embedded high resolution subdomains
• when does extra resolution become new physics?
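
The performance control scenarios above (keeping N instances in approximate timestep, or running them at specified relative rates) can be illustrated with a toy negative feedback loop. The sketch below is not the PERCO implementation: the names `Instance`, `run_phase`, `rebalance` and `simulate`, the linear performance model (rate = speed × share) and the proportional gain are all assumptions made for illustration. The sensor is the rate measured over the last phase, the decision is taken at a progress point, and the actuator is a change in each instance's share of the allocated resources (a stand-in for redeployment).

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One component instance (e.g. one copy of a model). All fields are illustrative."""
    name: str
    speed: float        # assumed timesteps/second when given all resources
    share: float        # fraction of the allocated resources it currently holds
    rate: float = 0.0   # sensor reading: timesteps/second over the last phase
    steps: float = 0.0  # timesteps completed so far


def run_phase(instances, seconds=1.0):
    """Advance every instance to the next progress point and record its rate."""
    for inst in instances:
        inst.rate = inst.speed * inst.share   # toy performance model (assumption)
        inst.steps += inst.rate * seconds


def rebalance(instances, targets, gain=0.5):
    """Negative feedback step: shift resource share towards lagging instances.

    targets[i] is the desired relative rate of instances[i]; equal targets
    mean "stay in timestep". gain (0 < gain < 1) damps the correction.
    """
    normalised = [inst.rate / t for inst, t in zip(instances, targets)]
    mean = sum(normalised) / len(normalised)
    if mean <= 0.0:
        return
    for inst, value in zip(instances, normalised):
        error = (mean - value) / mean          # positive when the instance lags
        inst.share = max(inst.share * (1.0 + gain * error), 1e-6)
    total = sum(inst.share for inst in instances)
    for inst in instances:
        inst.share /= total                    # shares always sum to 1


def simulate(speeds, targets, phases=30, controlled=True):
    """Run the toy experiment and return the final per-instance rates."""
    n = len(speeds)
    instances = [Instance(f"c{i}", s, 1.0 / n) for i, s in enumerate(speeds)]
    for _ in range(phases):
        run_phase(instances)
        if controlled:
            rebalance(instances, targets)
    return [inst.rate for inst in instances]
```

With speeds of 10 and 5 and equal targets, the uncontrolled run leaves one instance twice as fast as the other, while the controlled run moves the shares towards 1/3 and 2/3 so that both progress at about the same rate. Setting `targets=[2.0, 1.0]` instead gives the N=2 scenario in which one instance executes twice as many timesteps per unit of time as the other.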

TeraGyroid Use Case Scenario
[Figure]

Even More Complex Answer: Coupled Models
Many scientific modellers are finding a need to link together multiple models:
• climate/environment models (ocean + atmosphere + ...)
• multi-scale phenomena (CFD + MD = HybridMD)
• aircraft lightning strike (CEM + airframe structure)
+ others, all needing high performance & ‘Grid’
The individual models seem to constitute ready-made components:
• can these be used for performance control?

Summary
We are investigating the practicalities of component-based performance control in Grid execution environments
A prototype performance control system is being developed and we have shown that it can be used to achieve a scientifically meaningful high-level performance objective
We are ready to apply it to realistic scientific coupled model applications
K.R. Mayes, M. Luján, G.D. Riley, J. Chin, P.V. Coveney, J.R. Gurd, Towards performance control on the Grid, Philosophical Transactions of the Royal Society of London: Series A, to appear, August 2005.

Related Projects at Manchester
FLUME - design of next generation Unified Model software
• funded by The Met Office (led by Mick Carter)
RealityGrid - condensed matter modelling
• EPSRC-funded e-Science (led by Peter Coveney at UCL)
SoftIAM - climate impact, integrated assessment modelling
• funded by the Tyndall Centre (led by Rachel Warren)
IntBioSim - integrated biological simulation
• BBSRC-funded e-Science (led by Mark Sansom at Oxford)
GENIEfy - Earth system modelling
• NERC-funded e-Science (led by Tim Lenton + Tyndall Centre)

Weblinks
For more information check:
http://www.cs.man.ac.uk/cnc
http://www.realitygrid.org
http://www.intbiosim.org (under construction)