CompSci 296.2 Self-Managing Systems Shivnath Babu

Today
• Project schedule (reminder)
• Finish QueS presentation
– System, challenges
• Sample projects
• If we have time, start ROC discussion
Project
• Group size <= 2
• Identify “general topic” by end of January, meet Shivnath
• Feb 7: Scope the problem, give 15-minute talk
• Feb 21: 3-minute talk
• March 7: 15-minute talk
• March 28: 3-minute talk
• April 4/6: 15-minute talk
• April 20/24: 15-minute final in-class presentation (+ “demo”)
Querying Systems as Data
• What are the probable causes of Service-Level-Agreement (SLA) violations rising to 12%?
– Root-cause query
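A root-cause query of this kind can be sketched as a ranking over conditions observed in request traces. The sketch below is only an illustration, not the QueS method: the record layout (`conditions`, `sla_violated`), the condition names, and the choice of lift as the scoring function are all assumptions made for this example.

```python
from collections import Counter

def rank_probable_causes(requests):
    """Rank conditions by lift: P(violation | condition) / P(violation)."""
    total = len(requests)
    violations = sum(1 for r in requests if r["sla_violated"])
    if total == 0 or violations == 0:
        return []
    base_rate = violations / total
    seen = Counter()  # requests in which each condition was observed
    bad = Counter()   # of those, the requests that violated the SLA
    for r in requests:
        for cond in r["conditions"]:
            seen[cond] += 1
            if r["sla_violated"]:
                bad[cond] += 1
    scores = [(cond, (bad[cond] / seen[cond]) / base_rate) for cond in seen]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical trace records (not taken from any real system).
trace = [
    {"conditions": {"db_slow"}, "sla_violated": True},
    {"conditions": {"db_slow", "cache_miss"}, "sla_violated": True},
    {"conditions": {"cache_miss"}, "sla_violated": False},
    {"conditions": set(), "sla_violated": False},
]
ranking = rank_probable_causes(trace)
```

A lift above 1 means the condition co-occurs with violations more often than the base rate would suggest. This is a correlation, not a proven cause.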
Queries: What if …
• Given today’s workload, how will average response time change if my database fails?
• If I double the memory on my application servers, how will SLA violation rate change?
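One way to answer a what-if query is to evaluate a model under the current and the hypothetical configuration and compare the results. The sketch below assumes an M/M/1 queueing model, chosen here purely for illustration and not named in the slides. The rates and the function names are hypothetical.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue is unstable; response time grows without bound
    return 1.0 / (service_rate - arrival_rate)

def what_if(arrival_rate, current_rate, hypothetical_rate):
    """Return (before, after) mean response times for a change in service rate."""
    before = mm1_response_time(arrival_rate, current_rate)
    after = mm1_response_time(arrival_rate, hypothetical_rate)
    return before, after

# Hypothetical numbers: 8 requests/s arriving, service rate doubled from 10/s to 20/s.
before, after = what_if(arrival_rate=8.0, current_rate=10.0, hypothetical_rate=20.0)
```

The same pattern applies when the model is a learned model or a simulator: only `mm1_response_time` would be swapped out.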
Queries: Let me know …
• Let me know if, with 75% probability, average response time will exceed 5 seconds in next 30 minutes
– Prediction
– Continuous query
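A continuous query with a probability threshold can be sketched as a generator that re-evaluates a predictor on every new measurement. The predictor below is deliberately naive and is an assumption of this example: it takes the fraction of recent samples above the threshold as the probability estimate. A real system would plug in a proper predictive model.

```python
from collections import deque

def continuous_alert(samples, window=4, threshold=5.0, min_prob=0.75):
    """Yield (index, estimate) each time the estimated probability reaches min_prob.

    The estimate is the fraction of the last `window` samples that exceed
    `threshold`, a stand-in for a real prediction model.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(samples):
        recent.append(value)
        if len(recent) < window:
            continue  # not enough history yet
        prob = sum(1 for v in recent if v > threshold) / window
        if prob >= min_prob:
            yield i, prob

# Hypothetical response times in seconds.
alerts = list(continuous_alert([1, 2, 6, 7, 8, 9, 3]))
```

Because the function is a generator, it can consume an unbounded stream and emit alerts as they occur, which is the defining property of a continuous query.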
Queries: What should I do?
• What should I do to reduce SLA violations for requests of type A to <1%, without increasing violations for other requests?
– Root-cause + What-if
Querying Systems as Data
• Data
– Instrumented traces, logs
– System activity data
– Data from active probing
– Workload
– System configuration data (e.g., buffer size, indexes)
– Source code
• Models
– Analytic performance models
– Machine learning models
– Rules from system experts
– Simulators
Querying Systems with QueS (30,000 ft)
[Figure: QueS architecture diagram. Component labels: System mgmt. services, Queries, Answers, Query Processor, Model-driven DB Engine, Data Acquisition, Data Maintenance, DATA]
Challenges: Query Complexity
• Support for complex queries
– Rank probable causes of SLA violations rising to 12%
– “What should I do” queries
• Queries may be acquisitional (answering them can require collecting new data from the system)
Challenges: Query Specification
• Declarative query language
– Expressibility of language
– Composition
• Snapshot queries and continuous queries
Challenges: Query Processing
• Model-based query processing
• Many types of data sources
– Structured, semi-structured, and unstructured
• Uncertainty in input data
– E.g., legacy systems may have partial/no instrumentation
• Imprecise answers
– Answers may include quantification of accuracy
– Ranking
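An answer that carries its own accuracy estimate can be sketched as a mean with a confidence interval. The sketch assumes a normal approximation (z = 1.96 for roughly 95% confidence), which is only reasonable for larger samples. The function name and the sample values are hypothetical.

```python
import math
import statistics

def mean_with_interval(samples, z=1.96):
    """Return (mean, low, high) using a normal-approximation confidence interval."""
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - z * sem, mean + z * sem

# Hypothetical response-time measurements in seconds.
mean, low, high = mean_with_interval([2.0, 4.0, 4.0, 6.0])
```

With partial instrumentation the sample is smaller, so the interval widens and the answer reports that loss of accuracy.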
Challenges: Run-time Overhead
• Real-time service for 24x7 systems
• Tunable data acquisition
• Active probing
Sample Projects
• NIMO
• Fa
• What-if querying for database systems
• Combining structured & unstructured data
• Projects using Nagios
• Projects using IBM software
Sample Project (in progress)
• NIMO (Piyush Shivam)
• Answering queries about:
1. Expected performance given a resource assignment
2. Feasible resource assignments to meet SLA
3. What-if queries for applications in network utilities
Sample Project (in progress)
• Fa (Songyun Duan)
• Can we automate problem-prediction and diagnosis?
• Use of Bayesian Networks for:
– Predicting performance problems (continuous query)
– Root-cause queries
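The simplest Bayesian network for a root-cause query has one cause node and one symptom node, and inference reduces to Bayes' rule. The sketch below is not Fa's implementation. It assumes mutually exclusive, exhaustive causes, and every probability and cause name in it is made up for illustration.

```python
def posterior_over_causes(prior, likelihood):
    """P(cause | symptom) is proportional to P(symptom | cause) * P(cause)."""
    joint = {cause: prior[cause] * likelihood[cause] for cause in prior}
    evidence = sum(joint.values())  # P(symptom), the normalizing constant
    return {cause: p / evidence for cause, p in joint.items()}

# Hypothetical numbers: prior P(cause) and likelihood P(SLA violation | cause).
prior = {"db_overload": 0.2, "network": 0.1, "none": 0.7}
likelihood = {"db_overload": 0.9, "network": 0.5, "none": 0.05}

posterior = posterior_over_causes(prior, likelihood)
most_likely = max(posterior, key=posterior.get)
```

Sorting `posterior` by value gives the ranked list of probable causes that a root-cause query would return.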
Sample Project
• What-if queries on database configuration-parameter settings
– Ex: What happens to transaction response times if I change the value of parameter X from v to v’?
Sample Project
• Combined querying of structured and unstructured system data
– Structured data: MySQL performance counters, processor utilization, number of I/O accesses
– Unstructured data: Application and system logs
• Interested: Hao He
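A combined query over both kinds of data can be sketched as a join between structured counters and events parsed from log text. The log line format (`<timestamp> <LEVEL> <message>`), the field names, and the CPU threshold are assumptions made for this example, not a format used by MySQL or any specific system.

```python
import re

# Assumed log format: "<integer timestamp> <LEVEL> <free-text message>".
LOG_PATTERN = re.compile(r"^(\d+)\s+(ERROR|WARN|INFO)\s+(.*)$")

def parse_log(lines):
    """Turn raw log lines into structured events, skipping unparseable lines."""
    events = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            events.append({"ts": int(m.group(1)), "level": m.group(2), "msg": m.group(3)})
    return events

def errors_during_high_cpu(counters, log_lines, cpu_threshold=90.0):
    """Return ERROR events whose timestamp coincides with high processor utilization."""
    hot = {c["ts"] for c in counters if c["cpu_util"] > cpu_threshold}
    return [e for e in parse_log(log_lines) if e["level"] == "ERROR" and e["ts"] in hot]

# Hypothetical inputs.
counters = [{"ts": 100, "cpu_util": 95.0}, {"ts": 101, "cpu_util": 40.0}]
logs = ["100 ERROR query timeout", "101 ERROR disk full", "100 INFO started", "garbage"]
matches = errors_during_high_cpu(counters, logs)
```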
Sample Project
• Add problem-prediction capability to Nagios
• Add root-cause querying to Nagios
• Similar projects using the IBM Autonomic Computing Toolkit + ABLE framework
– Ex: Wrap them inside a query interface
Projects at HP Research
• Project 1: Predicting performance problems, finding root causes of problems
• Project 2: Debugging complex systems
• Project 3: Designing adaptive systems (using control theory)