CompSci 296.2
Self-Managing Systems
Shivnath Babu

Today
• Project schedule (reminder)
• Finish QueS presentation
  – System, challenges
• Sample projects
• If we have time, start ROC discussion

Project
• Group size <= 2
• Identify “general topic” by end of January, meet Shivnath
• Feb 7: Scope the problem, give 15-minute talk
• Feb 21: 3-minute talk
• March 7: 15-minute talk
• March 28: 3-minute talk
• April 4/6: 15-minute talk
• April 20/24: 15-minute final in-class presentation (+ “demo”)

Querying Systems as Data
• What are probable causes of the Service-Level-Agreement (SLA) violations rising to 12%?
  – Root-cause query

Queries: What if …
• Given today’s workload, how will average response time change if my database fails?
• If I double the memory on my application servers, how will SLA violation rate change?

Queries: Let me know …
• Let me know if, with 75% probability, average response time will exceed 5 seconds in next 30 minutes
  – Prediction
  – Continuous query

Queries: What should I do?
• What should I do to reduce SLA violations of requests A to <1%, without increasing violations of other requests?
  – Root-cause + What-if

Querying Systems as Data
• Instrumented traces, logs
• System activity data
• Data from active probing
• Workload
• System configuration data (e.g., buffer size, indexes)
• Source code
• Models
  – Analytic performance models
  – Machine learning models
  – Rules from system experts
  – Simulators
[Diagram label: DATA]

Querying Systems with QueS (30,000 ft)
[Architecture diagram. Labels: System mgmt. services; Queries; Answers; Model-driven DB Engine; Query Processor; Data Maintenance; DATA; Data Acquisition]

Challenges: Query Complexity
• Support for complex queries
  – Rank probable causes of SLA violation rising to 12%?
  – “What should I do” queries
• Queries may be acquisitional

Challenges: Query Specification
• Declarative query language
  – Expressibility of language
  – Composition
• Snapshot queries and continuous queries

Challenges: Query Processing
• Model-based query processing
• Many types of data sources
  – Structured, semi-structured, and unstructured
• Uncertainty in input data
  – E.g., legacy systems may have partial/no instrumentation
• Imprecise answers
  – Answers may include quantification of accuracy
  – Ranking

Challenges: Run-time Overhead
• Real-time service for 24x7 systems
• Tunable data acquisition
• Active probing

Sample Projects
• NIMO
• Fa
• What-if querying for database systems
• Combining structured & unstructured data
• Projects using Nagios
• Projects using IBM software

Sample Project (in progress)
• NIMO (Piyush Shivam)
• Answering queries about:
  1. Expected performance given a resource assignment
  2. Feasible resource assignments to meet SLA
  3. What-if queries for applications in network utilities

Sample Project (in progress)
• Fa (Songyun Duan)
• Can we automate problem-prediction and diagnosis?
• Use of Bayesian Networks for:
  • Predicting performance problems (continuous query)
  • Root-cause queries

Sample Project
• What-if queries on database configuration-parameter settings
  – Ex: What happens to transaction response times if I change value of parameter X from v to v’

Sample Project
• Combined querying of structured and unstructured system data
  – Structured data: MySQL performance counters, processor utilization, number of I/O accesses
  – Unstructured data: Application and system logs
• Interested: Hao He

Sample Project
• Add problem-prediction capability to Nagios
• Add root-cause querying to Nagios
• Similar projects using the IBM Autonomic Computing Toolkit + ABLE framework
  – Ex: Wrap them inside a query interface

Projects at HP Research
• Project 1: Predicting performance problems, finding root causes of problems
• Project 2: Debugging complex systems
• Project 3: Designing adaptive systems (using control theory)
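
Supplementary sketch: the “What if …” queries on the earlier slides ask how a metric such as average response time would change under a different configuration, answered from a model rather than by changing the live system. One way this might look with an analytic performance model is shown below. The choice of an M/M/1 queue, the function names, and all numbers are illustrative assumptions; nothing here is taken from QueS, NIMO, or Fa.

```python
import math


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda).

    Assumed model for illustration only. Returns infinity when the
    server is saturated (arrival rate >= service rate).
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def what_if(arrival_rate: float, current_rate: float, new_rate: float):
    """Answer a what-if query: predicted response time before and after
    a hypothetical change in service capacity, under the same workload."""
    before = mm1_response_time(arrival_rate, current_rate)
    after = mm1_response_time(arrival_rate, new_rate)
    return before, after


# Hypothetical workload: 80 requests/s against a server that handles 100/s.
# What if capacity were doubled to 200/s?
before, after = what_if(arrival_rate=80.0, current_rate=100.0, new_rate=200.0)
print(f"before: {before * 1000:.1f} ms, after: {after * 1000:.1f} ms")
```

The same query interface could sit in front of a machine learning model or a simulator instead; the analytic model is used here only because it is compact enough to check by hand.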