Lecture 12
Synchronization
Roadmap for today
- Project logistics
  - Posted yesterday
  - P01 due Wednesday Nov. 3rd
  - Apply for PlanetLab accounts
- Discuss quiz questions
- Synchronization in distributed systems
EECE 411: Design of Distributed Software Applications

Before we start: survey results

Useful (number of responses in parentheses)
- Discussions structure (20): good thinking exercises; helps understand how knowledge is applied; good to discuss quiz-like questions
- Assignments (9): closely related to class materials, useful hands-on work, good that marking is done on coding style as much as functionality, but *too few*!
- Slides (8): good summary of material; useful for assignment; good overlap with previous week to tie in; "I like the repetition, makes it more obvious what we have to learn"
- Real-world examples (7)
- Availability of TA / instructor || coding session (1)
- 'hmmmm' voting technique (1)
Concerns
- Project not yet up
  - description / grading scheme / project expectations
  - PlanetLab tutorial
- Textbook is not a good reference
- Epidemic

I would like to see this topic
- Replication
- How do large things work
- Distributed decision making
- Cloud computing
- Event-driven programming
- Examples of pseudocode
- Virtualization
- Security

Response counts (in slide order): 10, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1
Suggestions (number of responses in parentheses)
- Sample quizzes / clearer idea on quiz expectations (8)
- More sample questions / more sample problems (7)
- Questions with answers (summarize discussions in slides) (6)
- More detailed explanation within the slides (3)
- Make slides available earlier (2)
- Discussion board (2)
- Fixed course structure (1)
- More structured relationships between topics (1)
- More coding sessions (1)
- More short assignments (1)
- Make sure that students without adequate coding experience cannot take the class (1)
- Results of a design rather than covering a bit about each (1)
- Instructor-assigned groups (rather than based on student preference) (1)
- Good on quiz one: lots of questions (1)
- Provide grading scheme beforehand (1)
- Provide more of a big picture / Better organization of content / Roadmap (1)
- Tighter deadlines around project so that there is no cramming … (1)
- Assignments that include written parts (1)
- If we learn about gossip, should we have a chance to implement it? (1)
- More quantitative discussion questions (1)
- Go slower (1)
- Be more clear in describing concepts (1)
- Grading scheme: best 2 out of 3 quizzes (1)
Q5.) Consider a circular Distributed Hash Table (DHT) with identifiers
in the range [0, 127]. Suppose there are eight participating nodes
with identifiers 1, 13, 43, 51, 70, 83, 100 and 115. The DHT is
configured so that the successor list has length 2. Also, the DHT
is configured so that the finger table has size one: i.e., each peer
maintains only one 'shortcut' (or 'finger'), which aims to cut the
search space in half. Questions:
a.) Suppose that the following (key,value) pairs should be stored in
the DHT: (0,'mama'), (3,'tata'), (7,'zaza'), (15,'bibi'), (110,'zizi')
and (125,'cici'). Which peers will store which (key,value) pair?
Present your answer as a table.
b.) Assume a search launched at node 13 for key 0. Describe the
search process.
c.) Suppose that peer 13 learns that peer 43 has left the DHT. How
does peer 13 update its successor state information? Which peer
is now its first successor? Its second successor? Is there any
change in the set of keys each peer is responsible for?
d.) Suppose that a new peer with the identifier 6 wants to join the
DHT and it initially only knows the IP address of the peer 51.
What steps are taken for peer 6 to join the system? What does the
system look like after peer 6 joins?
Key placement
Circular ID space 0..127 (wraps around at 128); each key is stored at the first node whose ID is equal to or follows the key.
Node   Keys
N1     K0, K125
N13    K3, K7
N43    K15
N115   K110
N51, N70, N83, N100 hold none of the listed keys.
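The key placement above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the usual successor rule (a key lives on the first node whose identifier is greater than or equal to the key, wrapping around the ring); the names `responsible_node` and `placement` are illustrative and not part of the course material.

```python
from bisect import bisect_left

RING_SIZE = 128  # identifiers 0..127, as in Q5
NODES = sorted([1, 13, 43, 51, 70, 83, 100, 115])
PAIRS = [(0, "mama"), (3, "tata"), (7, "zaza"),
         (15, "bibi"), (110, "zizi"), (125, "cici")]


def responsible_node(key, nodes=NODES):
    """Return the first node clockwise from `key` (inclusive)."""
    i = bisect_left(nodes, key % RING_SIZE)
    # Keys past the largest node ID wrap around to the smallest node.
    return nodes[i % len(nodes)]


def placement(pairs=PAIRS, nodes=NODES):
    """Group the (key, value) pairs by the node that stores them."""
    table = {}
    for key, value in pairs:
        table.setdefault(responsible_node(key, nodes), []).append((key, value))
    return table


if __name__ == "__main__":
    for node, stored in sorted(placement().items()):
        print(f"N{node}: {stored}")
```

Key 125 is the interesting case: no node has an identifier of 125 or more, so the lookup wraps around and the key lands on N1.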
Search launched at N13 for K0
Each node maintains
- Successor list (2)
- Shortcuts (1)
[Figure: ring of nodes N1, N13, N43, N51, N70, N83, N100, N115, with K0 and K125 at N1, K3 and K7 at N13, K15 at N43, K110 at N115]
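One way to trace the search is sketched below. The slides do not spell out the routing rule, so two assumptions are made here: the single shortcut of node n points to the node responsible for n + 64 (half the ring), and each node forwards greedily to the known node that gets closest to the key without passing it. All function names are hypothetical.

```python
from bisect import bisect_left

RING_SIZE = 128
NODES = sorted([1, 13, 43, 51, 70, 83, 100, 115])


def responsible_node(ident):
    """First node clockwise from `ident` (inclusive)."""
    i = bisect_left(NODES, ident % RING_SIZE)
    return NODES[i % len(NODES)]


def dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING_SIZE


def known_nodes(n):
    """Two successors plus one shortcut (assumed to target n + RING_SIZE/2)."""
    i = NODES.index(n)
    successors = [NODES[(i + k) % len(NODES)] for k in (1, 2)]
    shortcut = responsible_node(n + RING_SIZE // 2)
    return successors + [shortcut]


def lookup(start, key):
    """Return the list of nodes visited, ending at the responsible node."""
    path = [start]
    current = start
    while True:
        if key % RING_SIZE == current:
            return path  # current node owns the key
        first_successor = known_nodes(current)[0]
        if dist(current, key) <= dist(current, first_successor):
            path.append(first_successor)  # key lies in (current, successor]
            return path
        # Forward to the farthest known node that does not overshoot the key.
        candidates = [c for c in known_nodes(current)
                      if 0 < dist(current, c) <= dist(current, key)]
        current = max(candidates, key=lambda c: dist(current, c))
        path.append(current)


if __name__ == "__main__":
    print(" -> ".join(f"N{n}" for n in lookup(13, 0)))
```

Under these assumptions the search visits N13, N83 (shortcut), N115 (second successor of N83) and ends at N1. A different shortcut or forwarding rule would give a different path.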
Peer 13 learns that peer 43 is dead. How does peer 13 update its
successor state information? Which peer is now its first/second
successor? Is there any change in the set of keys each peer holds?
[Figure: the same ring with N43 marked as crashed; K15 was stored at N43]
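The effect of the crash on successor lists and key ownership can be checked with a short sketch. It computes the result from a global view of the ring, which a real peer does not have (N13 would instead promote its second successor and ask it for the next one); the helper names are illustrative assumptions.

```python
RING_SIZE = 128
NODES = [1, 13, 43, 51, 70, 83, 100, 115]


def successor_list(nodes, n, length=2):
    """The `length` nodes that follow n clockwise on the ring."""
    ring = sorted(nodes)
    i = ring.index(n)
    return [ring[(i + k) % len(ring)] for k in range(1, length + 1)]


def owner(nodes, key):
    """First node whose ID is >= key, wrapping to the smallest node."""
    ring = sorted(nodes)
    for n in ring:
        if n >= key % RING_SIZE:
            return n
    return ring[0]


alive = [n for n in NODES if n != 43]  # N43 has crashed

if __name__ == "__main__":
    print("N13 before:", successor_list(NODES, 13))
    print("N13 after: ", successor_list(alive, 13))
    print("K15 owner before/after:", owner(NODES, 15), owner(alive, 15))
```

N13's successor list changes from [N43, N51] to [N51, N70], and N1's list (which also contained N43) becomes [N13, N51]. Responsibility for K15 moves to N51; whether the stored value survives depends on replication, which the slide does not specify.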
A new peer (6) wants to join the DHT and it initially only knows the
IP address of the peer 51. What steps are taken for peer 6 to join
the system? What does the system look like after peer 6 joins?
Two invariants to maintain for correctness
- Key to node assignment
- Successor lists
1. Lookup for 6? → N13
2. Predecessor of N13? → N1
3. Announce yourself to N1
4. N1 updates successor
5. N1 notifies predecessor
6. N115 updates successor
7. N6 joins
8. Creates successor list
9. Splits keys with N13
10. N13 updates predecessor
[Figure: the ring after N6 joins between N1 and N13]
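The state after the join can be sanity-checked against the two invariants with a small sketch. As before, it uses a global view of the ring purely for illustration; the helper names are assumptions, and the message exchange in the steps above is not modelled.

```python
RING_SIZE = 128
BEFORE = [1, 13, 43, 51, 70, 83, 100, 115]
AFTER = BEFORE + [6]  # peer 6 has joined
KEYS = [0, 3, 7, 15, 110, 125]


def successor_list(nodes, n, length=2):
    """The `length` nodes that follow n clockwise on the ring."""
    ring = sorted(nodes)
    i = ring.index(n)
    return [ring[(i + k) % len(ring)] for k in range(1, length + 1)]


def owner(nodes, key):
    """First node whose ID is >= key, wrapping to the smallest node."""
    ring = sorted(nodes)
    for n in ring:
        if n >= key % RING_SIZE:
            return n
    return ring[0]


def moved_keys(before, after, keys):
    """Keys whose responsible node changes, as {key: (old, new)}."""
    return {k: (owner(before, k), owner(after, k))
            for k in keys if owner(before, k) != owner(after, k)}


if __name__ == "__main__":
    print("moved keys:", moved_keys(BEFORE, AFTER, KEYS))
    for n in (115, 1, 6):
        print(f"N{n} successors:", successor_list(AFTER, n))
```

Only K3 changes hands (from N13 to N6), while K7 stays on N13. The successor lists that must change are those of N115 (now [N1, N6]) and N1 (now [N6, N13]); N6 itself starts with [N13, N43].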
6.) What are the criteria to choose between a system based on consistent hashing and one based on a distributed hash table?
Where are the differences?
- Key to node assignment? No difference
- Lookup? Yes (logN hops for DHT vs. 1 hop for consistent hashing)
- Information used for lookup? Yes (logN vs. N)
- Impact of failures? Yes (logN vs. N)
- Ability to scale? Yes
Summary so far …
A distributed system is:
- a collection of independent computers that appears to its users as a single coherent system
Components need to:
- Communicate
  - Point to point: sockets, RPC/RMI
  - Point to multipoint: multicast, epidemic
- Cooperate
  - Naming to enable some resource sharing
    - Naming systems for flat (unstructured) namespaces: consistent hashing, DHTs
    - Naming systems for structured namespaces: EECE456 for DNS
  - Synchronization
Synchronization to support coordination
- Examples
  - Distributed make
  - Printer sharing
  - Monitoring of a real world system
  - Agreement on message ordering
- Why is synchronization more complex than in a single-box system?
  - No global views, multiple clocks, failures
Roadmap
- Physical clocks
  - Provide actual / real time
- 'Logical clocks'
  - Where only ordering of events matters
- Leader election
  - How do I choose a coordinator?
Physical clocks (I)
Problem: How to achieve agreement on time in a distributed system?
A possible solution: use Coordinated Universal Time (UTC):
- Based on the number of transitions per second of the cesium 133 atom (pretty accurate).
- At present, the real time is taken as the average of some 50 cesium clocks around the world.
- Introduces a leap second from time to time to compensate for days getting longer.
- UTC is broadcast through short-wave radio and satellite.
  - Accuracy ±1 ms (±10 ms once weather conditions are taken into account)
Physical clocks - underlying model
Suppose we have a distributed system with a UTC-receiver somewhere in it.
Problem: we still have to distribute time to each machine.
Internal mechanism at each node
- Each machine has a timer
- Timer causes an interrupt H times a second
- Interrupt handler adds 1 to a software clock
- Software clock keeps track of the number of ticks since an agreed-upon time in the past.
Notation
- Value of clock on machine p at real time t is Cp(t)
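The per-node mechanism above amounts to a tick counter. A minimal sketch, assuming a hypothetical `SoftwareClock` class in which the interrupt handler is called by hand (there is no real hardware timer here):

```python
class SoftwareClock:
    """Counts timer interrupts since an agreed-upon epoch."""

    def __init__(self, ticks_per_second, epoch=0.0):
        self.ticks_per_second = ticks_per_second  # H in the slides
        self.epoch = epoch                        # agreed-upon time in the past
        self.ticks = 0

    def on_timer_interrupt(self):
        """Interrupt handler: add 1 to the software clock."""
        self.ticks += 1

    def now(self):
        """Clock value Cp, in seconds since the epoch."""
        return self.epoch + self.ticks / self.ticks_per_second


if __name__ == "__main__":
    clock = SoftwareClock(ticks_per_second=100)
    for _ in range(250):  # simulate 250 timer interrupts
        clock.on_timer_interrupt()
    print(clock.now())
```

If the hardware timer fires slightly more or less often than H times per real second, `now()` drifts away from real time, which is the problem the next slide addresses.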
Physical clocks – main problem: clock drift
Notation: the value of the clock on machine p at real time t is Cp(t)
Ideally: Cp(t) = t for all t, i.e., dCp/dt = 1
Real world: clock drift, i.e., |Cp(t) - t| > 0
The clock rate is guaranteed to stay within bounds:
1 - ρ ≤ dCp/dt ≤ 1 + ρ
where ρ is the maximum drift rate
Goal: never let the clocks of any two nodes in the system differ by more than x time units
- Two clocks drifting in opposite directions move apart at a rate of up to 2ρ, so synchronize at least every x/(2ρ) seconds.
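The x/(2ρ) bound can be checked numerically. In the sketch below the function names and the example values (ρ = 10⁻⁵, x = 1 ms) are assumptions chosen for illustration, not figures from the lecture.

```python
def max_resync_interval(x, rho):
    """Longest time between synchronizations that keeps skew <= x.

    x   -- maximum tolerated difference between any two clocks (seconds)
    rho -- maximum drift rate (seconds of error per second)
    """
    if x <= 0 or rho <= 0:
        raise ValueError("x and rho must be positive")
    return x / (2.0 * rho)


def worst_case_skew(rho, elapsed):
    """Skew between one clock running at 1 + rho and one at 1 - rho."""
    fast = (1.0 + rho) * elapsed
    slow = (1.0 - rho) * elapsed
    return fast - slow


if __name__ == "__main__":
    rho, x = 1e-5, 1e-3  # assumed example values
    interval = max_resync_interval(x, rho)
    print(f"resynchronize at least every {interval:.1f} s")
    print(f"worst-case skew after that interval: {worst_case_skew(rho, interval):.6f} s")
```

With these example values the clocks must be resynchronized roughly every 50 seconds, and the worst-case skew after that interval equals the tolerated 1 ms.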
Building a complete system …
Option I: Every machine asks a time server for the accurate time at least once every x/(2ρ) seconds (Network Time Protocol).
- Okay, but need to account for network delays, including interrupt handling and processing of messages.
- Client updates time to: Tnew = CUTC + (T2 - T1)/2
  (T1: time the request is sent, T2: time the reply is received, both read on the client's clock; CUTC: time reported by the server)
- Fundamental: setting the time back is never allowed, so adjustments must be made smoothly (gradually).
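The client-side update and the "never set the time back" rule can be sketched as follows. The delay compensation is the formula from the slide (an approach usually attributed to Cristian); the gradual adjustment is only one possible way to do it, and the function names, `tick` and `max_slew` values are assumptions.

```python
def estimate_time(t1, t2, c_utc):
    """Tnew = CUTC + (T2 - T1)/2, assuming symmetric network delay."""
    return c_utc + (t2 - t1) / 2.0


def slew_increments(offset, tick=0.01, max_slew=0.1):
    """Yield per-tick clock increments that absorb `offset` gradually.

    offset > 0: local clock is behind, so ticks are stretched.
    offset < 0: local clock is ahead, so ticks are shortened.
    Each increment stays positive as long as max_slew < 1, so the
    clock never moves backwards.
    """
    remaining = offset
    while abs(remaining) > 1e-12:
        limit = max_slew * tick
        step = max(-limit, min(limit, remaining))
        remaining -= step
        yield tick + step


if __name__ == "__main__":
    # Request sent at 10.000 s, reply received at 10.020 s (client clock);
    # the server reported 12.500 s.
    print(estimate_time(10.000, 10.020, 12.500))
    # The local clock is 5 ms ahead: slow it down instead of stepping back.
    steps = list(slew_increments(-0.005))
    print(len(steps), min(steps) > 0)
```

The estimate is only as good as the symmetry assumption: if the request and the reply take different amounts of time, the error can be up to half the round-trip time.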
Option II: Let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time.
- Note: you don't even need to propagate UTC time.
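Option II resembles what is commonly called the Berkeley algorithm. A minimal sketch of the averaging step follows; the function name and the sample clock readings (in minutes) are assumptions, and network delay is ignored.

```python
def averaging_adjustments(clocks):
    """Return, per machine, how much to adjust its clock.

    clocks -- dict mapping machine name to its current clock reading.
    A positive adjustment means "advance", a negative one "slow down".
    """
    average = sum(clocks.values()) / len(clocks)
    return {name: average - reading for name, reading in clocks.items()}


if __name__ == "__main__":
    # Readings in minutes past midnight: 3:00, 3:25 and 2:50.
    readings = {"server": 180, "a": 205, "b": 170}
    for name, delta in averaging_adjustments(readings).items():
        print(f"{name}: {delta:+.0f} min")
```

The server sends relative adjustments rather than an absolute time, so the machines converge on a common value even if none of them knows UTC. A negative adjustment would still have to be applied gradually, since clocks must not be set back.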
Real world: Network Time Protocol (NTP)
- Stratum 0 NTP servers receive time from external sources (cesium clocks, GPS, radio broadcasts)
- Stratum N+1 servers synchronize with stratum N servers and between themselves
  - Self-configuring network
- User configured to contact local NTP server
- Survey (N. Minar '99)
  - > 175K NTP servers
  - 90% of the NTP servers have <100 ms offset from their synchronization peer
  - 99% are synchronized within 1 s
Uses of (synchronized) physical clocks in the real world
- NTP
- Global Positioning Systems
- Using physical clocks to implement at-most-once semantics
Summary so far
- Synchronization solutions
  - Physical time synchronization
    - Often costly, imperfect
    - But with real applications (NEXT TIME)