lecture1

Self-* Systems
CSE 598B
Instructor: Bhuvan Urgaonkar
Fall 2005
Introduction
 Bhuvan Urgaonkar
– Assistant Professor, CSE
 Ph.D. Univ. of Mass., Amherst
 Research Interests
– Distributed systems, operating systems, computer networking,
modeling of systems
 Office: 338D, Email: bhuvan@cse.psu.edu
 Office hours and class timings
– Undecided as of now, we will figure this out at the end of the class
– If in doubt: just walk in anytime!
 Students’ turn to introduce themselves
Self-* systems
Self-*: a regular expression
– But not quite
– No self-destroying systems 
Three themes
– Self-tuning systems
– Self-healing systems
– Self-stabilizing systems
Course Web page:
– http://www.cse.psu.edu/~bhuvan/teaching/fall05/self-star.html
To do: Set up a course mailing list
Self-tuning systems
 Systems that can adapt their behavior to dynamically
changing external influences on their own
[Figure: rocket guidance analogy. Labels: Desired Trajectory; Friction, Turbulence; Guidance Model; Thrust Parameters; Rocket Thrusters; Actual Trajectory]
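A minimal sketch of the feedback idea behind the rocket analogy, assuming a one-dimensional position, a simple proportional correction, and a constant disturbance. The names `track`, `gain`, and `drift` are illustrative and do not come from the lecture.

```python
def track(desired, gain=0.5, drift=-0.2, start=0.0):
    """Follow a desired trajectory using proportional feedback.

    desired: list of target positions, one per time step.
    gain:    fraction of the observed error corrected each step (assumed).
    drift:   constant disturbance standing in for friction/turbulence (assumed).
    Returns the list of actual positions, one per time step.
    """
    actual = []
    position = start
    for target in desired:
        error = target - position              # compare desired with actual
        thrust = gain * error                  # the "guidance model" picks a correction
        position = position + thrust + drift   # the disturbance also acts on the rocket
        actual.append(position)
    return actual


if __name__ == "__main__":
    desired = [10.0] * 30
    actual = track(desired)
    print(f"final position: {actual[-1]:.2f} (target {desired[-1]})")
```

With a constant disturbance, purely proportional feedback settles near the target rather than exactly on it, which is one reason real controllers add further terms.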
Internet applications
 Proliferation of Internet applications
– E.g., auction site, online game, online retail store
 Growing significance in personal, business affairs
 Focus: Internet server applications
Hosting platforms
 Data Centers
– Clusters of servers
– Storage devices
– High-speed interconnect
 Hosting platforms:
– Rent resources to third-party applications
– Performance guarantees in return for revenue
 Benefits:
– Applications: don’t need to maintain their own infrastructure
• Rent server resources, possibly on demand
– Platform provider: generates revenue by renting resources
Goals of a hosting platform
 Meet service-level agreements
– Satisfy application performance guarantees
• E.g., average response time, throughput
 Maximize revenue
– E.g., maximize the number of hosted applications
Question: How should a hosting platform manage its
resources to meet these goals?
Challenge: dynamic workloads
 Multi-time-scale variations
– Time-of-day, hour-of-day
 Overloads
– E.g., Flash crowds
 User threshold for response time: 8-10 s
[Figures: request rate (req/min) vs. time (days); arrivals per min vs. time (hours)]
 Key issue: How to provide good
response time under varying workloads?
Self-tuning systems
 A self-tuning hosting platform
[Figure: self-tuning hosting platform. Labels: Application Performance Goals; Dynamic Workloads; Resource Inference Model; Resource Shares; Resource Schedulers; Actual Performance]
Dynamic provisioning
[Figure: loop of Monitor workload -> Compute current/future demand -> Adjust allocation]
 Key idea: increase or decrease allocated servers to
handle workload fluctuations
– Monitor incoming workload
– Compute current or future demand
– Match number of allocated servers to demand
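The three steps above can be sketched as follows. This is an illustration only: it assumes each server handles a fixed number of requests per minute, and it estimates demand as the last observed rate plus a safety margin. The names `servers_needed`, `provision`, and `headroom` are hypothetical.

```python
import math


def servers_needed(request_rate, per_server_capacity, min_servers=1):
    """Servers required to carry request_rate (req/min), never below min_servers."""
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    return max(min_servers, math.ceil(request_rate / per_server_capacity))


def provision(observed_rates, per_server_capacity, headroom=1.25):
    """Return the server allocation chosen after each monitoring interval.

    Demand estimate (assumed policy): last observed rate times headroom.
    """
    allocations = []
    for rate in observed_rates:              # 1. monitor incoming workload
        demand = rate * headroom             # 2. estimate current/future demand
        allocations.append(                  # 3. match allocated servers to demand
            servers_needed(demand, per_server_capacity)
        )
    return allocations


if __name__ == "__main__":
    rates = [1000, 5000, 20000]              # req/min, made-up values
    print(provision(rates, per_server_capacity=1000))
```

A real platform would replace the fixed per-server capacity with an application model and would account for the delay in bringing new servers online.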
Dynamic provisioning at multiple time-scales
 Predictive provisioning
– Certain Internet workload patterns can be predicted
• E.g., time-of-day effects, increased workload during Thanksgiving
– Design a good application model
– Provision using model at time-scale of hours or days
 Reactive provisioning
– Applications may see unpredictable fluctuations
• E.g., Increased workload to news-sites after an earthquake
– Detect such anomalies and react fast (minutes)
 Question: How to put these together?
– When to invoke the predictor and the reactor?
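One possible way to combine the two, offered as a sketch rather than the answer to the question above: use the prediction by default and switch to the observed rate when it exceeds the prediction by a large margin. The peak-per-hour predictor, the `surge_factor` threshold, and all function names are assumptions made for illustration.

```python
import math


def predict_hourly(history):
    """Predict the request rate for each hour of the day.

    history: list of past days, each a list of 24 hourly request rates.
    Prediction (assumed): the peak rate seen at that hour on any past day.
    """
    return [max(day[hour] for day in history) for hour in range(24)]


def allocate(predicted_rate, observed_rate, per_server_capacity, surge_factor=1.5):
    """Choose the number of servers for the next interval.

    Predictive path: trust the prediction while observations stay close to it.
    Reactive path: if the observed rate exceeds surge_factor times the
    prediction, size the allocation from the observation instead.
    Returns (servers, mode).
    """
    if observed_rate > surge_factor * predicted_rate:
        rate, mode = observed_rate, "reactive"
    else:
        rate, mode = predicted_rate, "predictive"
    return max(1, math.ceil(rate / per_server_capacity)), mode


if __name__ == "__main__":
    history = [[100] * 24, [200] * 24]       # two made-up days of hourly rates
    predicted = predict_hourly(history)
    print(allocate(predicted[9], 250, per_server_capacity=100))
    print(allocate(predicted[9], 1000, per_server_capacity=100))
```

In this sketch the predictor would run at a time-scale of hours or days and the surge check every few minutes, matching the two time-scales named above.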
Self-healing systems
 Systems that continue to operate on their own despite
faults or failures
 Distinction between faults and failures
– Fault: A sysadmin sets a small concurrency limit for a Web server
– Failure: debris from an external fuel tank is thought to have struck
Columbia's left wing in 2003.
 Failure/fault handling capability built into the system
– Graceful degradation
 We will study classic literature in fault tolerance, papers
that apply these principles to modern distributed systems
Self-stabilizing systems
 Guaranteed to converge to a desired behavior from any
initial state if left alone
 Why should one be interested in self-stabilizing
algorithms?
– Their applicability to distributed systems
– Recovering from faults in a space shuttle. Faults may
cause malfunction for a while. Using a self-stabilizing
algorithm for its control will cause an automatic
recovery and enable the shuttle to continue its task
What is a self-stabilizing algorithm?
 This question will be answered using the “Stabilizing
Orchestra” example
 The Problem:
– The conductor is unable to participate – harmony
is achieved by players listening to their neighbor
players
– Windy evening – the wind can turn some pages in
the score, and the players may not notice the
change
The “Stabilizing Orchestra” Example
 Our Goal:
To guarantee that harmony is achieved at some point
following the last undesired page turn
 Imagine that the drummer notices that the violin player
next to him is on a different page … (solutions and their problems):
1. The drummer turns to his neighbor’s page – what if
the violin player noticed the difference as well?
2. Both the drummer and the violin player start from the
beginning – what if the player next to the violin player
notices the change only after the other two have synchronized?
The Self-Stabilizing Solution
 Every player will join the neighboring player who is
playing the earliest page (including himself)
 Note that the score has a bounded
length. What happens if a player
goes to the first page of the score
before harmony is achieved?
 In every long enough period in which
the wind does not turn a page, the
orchestra resumes playing in
synchrony
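A small simulation of the rule above, under simplifying assumptions that are not in the slides: players sit in a line, everyone acts in synchronous rounds, and the score is long enough that nobody wraps around to the first page (so the bounded-length question is deliberately left open). The function names are illustrative.

```python
def step(pages):
    """One synchronous round for players seated in a line.

    Each player joins the earliest page among itself and its immediate
    neighbors, then everyone plays that page and moves to the next one.
    """
    n = len(pages)
    joined = [min(pages[max(0, i - 1): i + 2]) for i in range(n)]
    return [page + 1 for page in joined]


def rounds_until_harmony(pages, max_rounds=1000):
    """Count rounds until all players share a page, with no further wind."""
    for rounds in range(max_rounds + 1):
        if len(set(pages)) == 1:
            return rounds
        pages = step(pages)
    raise RuntimeError("no harmony within max_rounds")


if __name__ == "__main__":
    after_wind = [7, 3, 9, 12]               # made-up pages after a gust of wind
    print(rounds_until_harmony(after_wind))
```

Because the earliest page spreads one seat per round, the number of rounds in this model is bounded by the number of seats between the player on the earliest page and the farthest player.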
Discussion: Overlaps and distinctions
 Self-tuning vs self-healing vs self-stabilizing systems
 Proactive vs reactive
Crosscutting goals and challenges
 Removing costly and error-prone humans from
administering complex systems
 Learning from the past
 Modeling systems to render them amenable to analysis
 Understanding how robust a system is
– Robust = predictable behavior, graceful degradation
– Equivalently: figuring out how to make a system robust
Introspection!
 Everyone gives an example of a self-* aspect from
their research/experience
– Arjun: e-commerce applications
– Amitayu: dynamic allocation of servers in a farm
– Ross: Ross’s sensor network
– Huajing: information retrieval/feedback
– Young: fault handling by duplication
– Krishna: activity migration in a multiprocessor
Goals of the course
 Understand classic literature
 Identify theory and systems issues/tools common across
these diverse domains
– Statistical learning, control theory, measurement techniques,
data analysis, fault tolerance, modeling
• I will try to have some guest lectures
 Learn to appreciate how theory translates into and
compares with practice
 Critically evaluate papers and present them, use these in
research
Some administrative details …
Grading policy
 Paper presentations: 30%
 Class participation and discussion: 15%
– Let’s have lots of heated discussions
– Don’t be shy!
 Paper evaluations due before class: 15%
– A conference-style evaluation form
 Semester-long project: 30%
– May be replaced by a term paper
– Apply ideas to your research, master’s thesis
 Final exam: 10%
– Take-home exam
Expected course-load
 No intentions of stressing you out!
 Round-robin presentation policy
– Number of presentations will depend on how many students enroll
– Red-teams: To make sure you come prepared
– We DON’T want bad presentations!
 Mid-term and final presentations for students doing
projects
 End-of-semester take-home exam
– Goal: Find out what we learnt in the course
Presentations
 Prepare a talk about 45 minutes long
 Rest of the class for discussions
– We will accept or reject papers at the end of each class 
 Red team
– Each presenter will practice his/her talk with the assigned red team
before the class
– You are welcome to talk to me, discuss slides, ask for help
understanding the paper before presenting it
 Use the PowerPoint template on the course page
 We will try to become good speakers and reviewers!
Paper evaluations
 Due at midnight before the class
 I will put up an evaluation format that you will adhere to
– No long essays needed
– Be critical, read the papers carefully
 I will anonymize evaluations and put them up after the
class so all can read them
 Acceptable: txt, pdf
Course project
 Not compulsory
 You may work in groups of up to 2 students
 You may replace it with a term paper
– Survey of additional reading material
 Project may be
– A theoretical exercise
– Implementation-based
– A thought experiment
 Report and term papers due at the end of the semester
Final exam
 Day-long take-home exam
 For students doing projects, I will design questions
related to their project
 For students doing a survey, I will design questions
based on their survey report
Miscellaneous
 Please register soon so the course can be offered
– At least 5 students need to take the course
 Let’s figure out course timings suitable to all
 Random thoughts
– Would you like to solve puzzles?
– Would you like to have discussions on systems
research in general, hot areas, top conferences …?
– Would you like to take turns as scribes?
 Hope: We will learn a lot and have lots of fun in this course
Questions or comments?