AgentCities - Agents and Grids
Thoughts on Monitoring and Agents
Prof Mark Baker
ACET, University of Reading
Tel: +44 118 378 8615
E-mail: Mark.Baker@computer.org
Web: http://acet.rdg.ac.uk/~mab
February 20, 07 mark.baker@computer.org

Outline
• Monitoring: What is it?
• A View of Grid Monitoring.
• Ganglia Example.
• Generic Monitoring Architecture.
• A Layered View.
• Monitoring Issues.
• Where do Agents fit in?
• Summary.

Monitoring: What is it?
• Monitoring is part of the process of administering and managing computer-based resources:
  – However, "monitoring" is a rather overloaded word.
• The term implies that we are effectively "watching" the state of some component or resource.
• This type of passive (read-only) monitoring is useful in some spheres (e.g. job submission), but of limited use for actually managing computer-based resources.
• Dynamic (read/write) monitoring is more useful: we can not only watch the status of the resources but also interact with them to control and manage them (e.g. reconfigure on the fly, change QoS settings or queue priorities…).

A View of Grid Monitoring
• The traditional view of monitoring is looking at static and dynamic information about computer-based resources:
  – Static information:
    • For example, CPU type, amount of memory, OS type…
  – Dynamic information:
    • For example, CPU, memory and disk use.
• The information gathered can be used for all manner of tasks:
  – Basic systems monitoring (sys admin tasks),
  – General accounting,
  – Monitoring for job-submission purposes (choosing the best resource for task placement),
  – Monitoring to ensure QoS,
  – Policing SLAs,
  – Performance profiling of systems and applications (looking for bottlenecks and other problems),
  – Potentially, security monitoring.
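To make the passive/dynamic distinction concrete, here is a minimal Python sketch. Both monitors see the same static and dynamic information. Only the dynamic (read/write) monitor can also change a management setting. All class and method names (`ResourceInfo`, `LocalResource`, `PassiveMonitor`, `DynamicMonitor`, `set_queue_priority`) are hypothetical. The "queue priority" is only a key in a dictionary, not a real scheduler setting.

```python
import os
import platform
from dataclasses import dataclass, field


@dataclass
class ResourceInfo:
    """Snapshot of one computer-based resource."""
    static: dict = field(default_factory=dict)   # rarely changes: CPU type, OS type...
    dynamic: dict = field(default_factory=dict)  # changes constantly: CPU load...


class LocalResource:
    """Hypothetical wrapper around the machine this code runs on."""

    def __init__(self):
        self.settings = {}  # writable management settings (illustrative only)

    def snapshot(self) -> ResourceInfo:
        static = {
            "os": platform.system(),
            "cpu": platform.machine(),
            "cpus": os.cpu_count(),
        }
        dynamic = {}
        try:
            dynamic["load_1min"] = os.getloadavg()[0]  # Unix only
        except (AttributeError, OSError):
            pass  # load average not available on this platform
        return ResourceInfo(static=static, dynamic=dynamic)


class PassiveMonitor:
    """Read-only monitoring: we can watch the resource, nothing more."""

    def __init__(self, resource: LocalResource):
        self._resource = resource

    def read(self) -> ResourceInfo:
        return self._resource.snapshot()


class DynamicMonitor(PassiveMonitor):
    """Read/write monitoring: watch the resource and also manage it."""

    def set_queue_priority(self, queue: str, priority: int) -> None:
        # In a real system this would reconfigure a batch scheduler;
        # here it only records the (hypothetical) setting.
        self._resource.settings[f"queue.{queue}.priority"] = priority


if __name__ == "__main__":
    res = LocalResource()
    print(PassiveMonitor(res).read())
    DynamicMonitor(res).set_queue_priority("short", 5)
    print(res.settings)
```

The design choice is that write access is an extension of read access, so any tool written against the passive interface also works with a dynamic monitor.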
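Several of the tasks listed above, such as profiling, QoS monitoring and sys admin alerts, need the gathered dynamic information to be kept over time and checked as it arrives. The following is a minimal sketch under stated assumptions. Each resource's sensor is a plain Python callable that returns metric values. This is a hypothetical interface standing in for SNMP, Ganglia and similar sources. The monitor polls every sensor and stores samples in a local SQLite cache so that history is available. It also records a warning whenever a metric exceeds its threshold. The metric names and thresholds are illustrative.

```python
import sqlite3
import time
from typing import Callable, Dict, List

# Hypothetical sensor interface: any callable returning {metric: value}.
Sensor = Callable[[], Dict[str, float]]


class ResourceMonitor:
    """Polls per-resource sensors, caches samples locally, raises warnings."""

    def __init__(self, thresholds: Dict[str, float], db_path: str = ":memory:"):
        self.thresholds = thresholds        # metric -> upper limit (illustrative)
        self.alerts: List[str] = []
        self.sensors: Dict[str, Sensor] = {}
        self.db = sqlite3.connect(db_path)  # local cache of historical data
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS samples "
            "(ts REAL, resource TEXT, metric TEXT, value REAL)"
        )

    def register(self, resource: str, sensor: Sensor) -> None:
        self.sensors[resource] = sensor

    def poll(self) -> None:
        """Gather one round of statistics from every registered sensor."""
        now = time.time()
        for resource, sensor in self.sensors.items():
            for metric, value in sensor().items():
                self.db.execute(
                    "INSERT INTO samples VALUES (?, ?, ?, ?)",
                    (now, resource, metric, value),
                )
                limit = self.thresholds.get(metric)
                if limit is not None and value > limit:
                    self.alerts.append(
                        f"WARNING {resource}: {metric}={value} > {limit}"
                    )
        self.db.commit()

    def history(self, resource: str, metric: str) -> List[float]:
        """Historical values for one metric, oldest first."""
        rows = self.db.execute(
            "SELECT value FROM samples WHERE resource=? AND metric=? "
            "ORDER BY ts, rowid",
            (resource, metric),
        )
        return [value for (value,) in rows]
```

Publishing the cache to remote sites, for example through a web server, is deliberately left out. A real deployment would also need to bound the size of the cache.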
Ganglia
[Figure: Ganglia example]

Generic Architecture (Local)
[Figure: generic local architecture. Labels: Grid site resource monitor (gathers performance statistics; issues resource warnings & alerts); local cache (database) of resource and historical performance data; webserver (servlets); an Agent/Sensor on each of local Grid resources 1 to n; remote (registered) Grid sites. Performance information gathering protocols: SNMP, WBEM…]

Generic Architecture (Global)
[Figure: generic global architecture]

Data Management Issues
• Need to produce:
  – A simple and expressive API,
  – Device drivers and a driver manager for each Agent,
  – A means of describing the monitored data:
    • Implies an XML-based schema and an ontology.
[Figure: layered view. Labels, top to bottom as extracted: Ontologies and Schema; Resource Markup Language; API; Agent API; Driver Managers; Common Agent API; Agent Devices (SNMP, NWS, NetL, WBEM, SCM and XYZ Agents).]

Some Architectural Issues
• Sensors/Agents:
  – Make everyone install custom agents, or use existing ones?
  – Potentially billions of resources need monitoring!
• Protocols:
  – No real standards apart from SNMP.
  – XML is now used extensively; GLUE is often used (but is limited).
• Resources versus services:
  – An ongoing debate.
• Scalability:
  – Global extent is needed, but current systems are typically designed for small scale, based on cluster monitoring.
• Security:
  – Often little or no security.
  – OK for read-only systems, but…
• Intrusiveness:
  – A trade-off as usual: we do not want monitoring to affect the systems being monitored.

Monitoring Systems
• A recent review showed that there are about twenty active Grid-based monitoring systems.
• These range from systems:
  – That are “built from scratch”: to use such a system you need to install all of its software for monitoring purposes,
  – To those built on existing infrastructure and standards, which gather SNMP/Ganglia data and use it for monitoring purposes.
• The latter systems are becoming increasingly popular and are widely used today.

Where do Agents fit in with Monitoring?
• Agent booklet definition:
  – “An agent is a computer system that is capable of flexible autonomous action in dynamic, unpredictable, typically multi-agent domains.”
• Taken literally, this definition suggests we “just” throw away what we have and start again with agents!
• However, there is a raft of very practical problems…
  – Not least among these is that most of the world does not use agent-based technologies, and does not want to replace its monitoring infrastructure with something new and unproven.

Where do Agents fit in with Monitoring?
[Figure: layered view with agents. Labels, top to bottom as extracted: Intelligence/Knowledge; Clients; Intelligent Tools; Ontologies and Schema; Brokers, Schedulers, Policing; API; Agent/Sensor API; Driver Managers; Common Agent/Sensor API; Agent Devices (SNMP, NWS, NetL, WBEM, SCM and XYZ Agents); Data/Information.]

Where do Agents fit in with Monitoring?
• It is not practical to replace existing monitoring infrastructure with agents.
• However, there is ample scope to use agents to process the data/information gathered, and to use it to provide intelligence/knowledge to higher-level tools.
• Key agent features:
  – Intelligence: rule-based decision making.
  – Complex agent-to-agent interaction: producing knowledge for more sophisticated decision making.
• Potential problems!
  – Integrating agent frameworks with the Grid (APIs and protocols): the practical aspects of wide-scale deployment!

Where do Agents fit in with Monitoring?
• SLA/QoS/site-policy policing.
• Intelligent brokering for a range of tasks:
  – Negotiation,
  – Bartering,
  – Arbitration,
  – Job submission,
  – Resource reservation.
• Accounting tools.
• Autonomic behaviour: helping to provide self-healing capabilities in distributed systems.
• Working with Semantic Web technologies to create/provide knowledge.

Summary
• Monitoring infrastructure is well established for existing distributed systems: clusters, LANs, the Grid…
• Higher-level tools/services that use the gathered monitoring data are few and far between; this seems a good space for agent-based systems.
• “Intelligence” is needed to provide knowledge to consumers of Grid-based services.
• Combining agent and Grid infrastructure is not necessarily easy; there are various issues: security, different architectures, APIs, protocols…
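The Data Management Issues slide and the layered views call for three pieces. The first is a common agent API over protocol-specific agents. The second is a driver manager that dispatches to those agents. The third is an XML-based description of the monitored data. The following Python sketch shows one way these layers could fit together. It is a minimal sketch under stated assumptions. `AgentDriver`, `FakeSnmpDriver`, `DriverManager` and `to_markup` are hypothetical names. The SNMP driver returns canned values instead of querying a device. The XML element and attribute names are invented for illustration and are not GLUE or any other standard schema.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Dict


class AgentDriver(ABC):
    """Common agent API: every driver offers the same collect() call,
    whatever protocol (SNMP, NWS, WBEM, ...) it speaks underneath."""

    protocol: str = "unknown"

    @abstractmethod
    def collect(self, host: str) -> Dict[str, str]:
        ...


class FakeSnmpDriver(AgentDriver):
    """Hypothetical stand-in for an SNMP driver: returns canned values
    instead of issuing real SNMP requests."""

    protocol = "snmp"

    def collect(self, host: str) -> Dict[str, str]:
        return {"sysUpTime": "123456", "ifInOctets": "9876"}


class DriverManager:
    """Keeps one driver per protocol and routes requests to it."""

    def __init__(self):
        self._drivers: Dict[str, AgentDriver] = {}

    def register(self, driver: AgentDriver) -> None:
        self._drivers[driver.protocol] = driver

    def collect(self, protocol: str, host: str) -> Dict[str, str]:
        return self._drivers[protocol].collect(host)


def to_markup(host: str, protocol: str, data: Dict[str, str]) -> str:
    """Describe monitored data as a simple XML record
    (hypothetical element names, not a standard schema)."""
    root = ET.Element("resource", host=host, source=protocol)
    for name, value in sorted(data.items()):
        metric = ET.SubElement(root, "metric", name=name)
        metric.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    manager = DriverManager()
    manager.register(FakeSnmpDriver())
    print(to_markup("node1", "snmp", manager.collect("snmp", "node1")))
```

Because higher layers only see `collect()` and the XML record, adding another protocol means writing one more driver class. None of the layers above it need to change.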
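One of the key agent features listed earlier is "Intelligence - rule-based decision making". It can be illustrated with a tiny forward-chaining rule engine that turns raw monitoring values into conclusions a broker could act on. This is a minimal sketch, not the behaviour of any particular agent framework. The rules, thresholds, metric names and the `infer`/`recommend` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Facts = Dict[str, float]  # raw monitoring data for one site


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts, Set[str]], bool]  # sees raw data and derived labels
    conclusion: str                                # label added when the rule fires


def infer(facts: Facts, rules: List[Rule]) -> Set[str]:
    """Forward chaining: keep firing rules until no new conclusions appear."""
    derived: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in derived and rule.condition(facts, derived):
                derived.add(rule.conclusion)
                changed = True
    return derived


# Hypothetical rules and thresholds, for illustration only.
RULES = [
    Rule("busy-cpu", lambda f, d: f.get("cpu_load", 0) > 0.9, "cpu-saturated"),
    Rule("long-queue", lambda f, d: f.get("queue_length", 0) > 10, "queue-backlog"),
    Rule("overloaded",
         lambda f, d: {"cpu-saturated", "queue-backlog"} <= d, "overloaded"),
    Rule("avoid", lambda f, d: "overloaded" in d, "avoid-for-new-jobs"),
]


def recommend(sites: Dict[str, Facts]) -> List[str]:
    """Knowledge for a higher-level broker: sites not flagged to avoid."""
    return sorted(
        site for site, facts in sites.items()
        if "avoid-for-new-jobs" not in infer(facts, RULES)
    )
```

A broker calling `recommend()` receives knowledge, namely which sites to use, rather than raw numbers. The slides argue that this kind of layering is where agents can add value on top of existing monitoring infrastructure.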
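SLA/QoS/site-policy policing is the first agent role listed above. At its simplest, it reduces to comparing monitored samples against agreed limits. The following is a minimal sketch that assumes an SLA is a list of per-metric upper ("max") or lower ("min") bounds. The `SlaTerm` structure, the `police` function, the metric names and the limits are hypothetical. A policing agent would run a check like this periodically over data from the monitoring layer and then report or act on any violations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlaTerm:
    metric: str
    limit: float
    kind: str  # "max": values must stay <= limit; "min": values must stay >= limit


def police(terms: List[SlaTerm], samples: Dict[str, List[float]]) -> List[str]:
    """Describe every SLA term breached by the monitored samples."""
    violations = []
    for term in terms:
        values = samples.get(term.metric, [])
        if term.kind == "max":
            bad = [v for v in values if v > term.limit]
        elif term.kind == "min":
            bad = [v for v in values if v < term.limit]
        else:
            raise ValueError(f"unknown term kind: {term.kind}")
        if bad:
            violations.append(
                f"{term.metric}: {len(bad)} of {len(values)} samples "
                f"violate {term.kind} limit {term.limit}"
            )
    return violations
```

Real SLAs usually specify aggregates, such as a percentile or availability over a window, rather than per-sample bounds. The per-sample check is used here only to keep the sketch short.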