Distributed Systems
CS 15-440
Google Chubby and Message Ordering
Recitation 4, Sep 29, 2011
Majd F. Sakr, Vinay Kolar, Mohammad Hammoud
Today…
Last recitation session:
Google Protocol Buffers and Publish-Subscribe
Today’s session:
Google Chubby
A Google library and infrastructure for synchronization
Ordered Communication
Ordering events and enforcing ordering while communicating
Announcement:
Project 1 due on Oct 3rd
Overview
Recap
Google Chubby
Ordered Communication
Recap: Google Physical Infrastructure
Google has created a large distributed system from
commodity PCs
Commodity PC
Data Center
Rack
Approx 40 to 80 PCs
One Ethernet switch (internal = 100 Mbps, external = 1 Gbps)
Cluster
Approx 30 racks (around 2400 PCs)
2 high-bandwidth switches (each rack connected to both
the switches for redundancy)
Placement and replication generally done at cluster level
Recap: Google Data center Architecture
(To avoid clutter the Ethernet connections are shown from only one of the clusters to the external
links)
Recap: Google System Architecture
Recap: Google Infrastructure
Google Chubby
Google Chubby offers coordination and storage
services to other services (e.g., to the Google File System)
It provides coarse-grained distributed locks to synchronize
distributed activities in a large-scale, asynchronous environment
It can be used to support the election of a primary among a set of replicas
It can be used as a name service within Google
It provides a file system offering reliable storage of small files
Chubby is an all-in-one package consisting of a file system, a locking service,
a naming service and an election facilitator!
Chubby Interface
Chubby provides an abstraction based on the file-system
concept that every data object is a file
Files are organized into a hierarchical namespace
Example
/ls/chubby_cell/directory_name/…/file_name
ls: Lock Service
chubby_cell: An identifier naming the instance of Chubby
Chubby serves as both a file system and a locking service
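The name layout above can be illustrated with a small parser. This is only a sketch: `ChubbyPath` and `parse_chubby_path` are hypothetical names invented for illustration and are not part of any Chubby client library.

```python
from typing import List, NamedTuple


class ChubbyPath(NamedTuple):
    cell: str               # name of the Chubby instance (cell)
    components: List[str]   # directories and file name inside the cell


def parse_chubby_path(path: str) -> ChubbyPath:
    """Split a name of the form /ls/<cell>/<dir>/.../<file> (hypothetical helper)."""
    parts = path.split("/")
    # parts[0] is the empty string before the leading slash,
    # parts[1] must be the lock-service prefix, parts[2] the cell name.
    if len(parts) < 3 or parts[0] != "" or parts[1] != "ls" or not parts[2]:
        raise ValueError(f"not a Chubby-style name: {path!r}")
    return ChubbyPath(cell=parts[2], components=parts[3:])


parsed = parse_chubby_path("/ls/cell_a/some_dir/some_file")
```

Here `parsed.cell` is the cell name and `parsed.components` holds the path inside that cell.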
The interface provides an easy mechanism to store
small files
Chubby provides the following interfaces:
General Interfaces
File-System Interfaces
Locking Service Interfaces
Chubby – General Interfaces
Chubby provides interfaces for opening, closing and
deleting a file in its namespace
Open call: Opens a file or directory and returns a handle
Client can specify if the file has to be opened for
reading, writing or locking
Close call: Relinquishes the handle
Delete call: Removes the file or directory
Chubby – File-System Interfaces
Chubby provides two services:
Whole-file reading and writing operations
Single atomic operations are provided to read and write complete
data in the file
Chubby can be used to store small files (but not large files)
Access control
A file is associated with an Access Control List (ACL)
The ACL can be read and set through the interface
Chubby – Locking Service Interfaces
In Chubby, a file can be opened as a lock
The owner of the lock has the handle to the file
Chubby provides three interfaces
Acquire: This call acquires the lock (blocking until it is granted)
Release: This call releases the lock
TryAcquire: This is a non-blocking variant of the Acquire call
Chubby provides advisory locks, not mandatory locks
Advantage: Extra flexibility and resilience
Disadvantage: The programmer has to manage conflicts
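To make the difference between Acquire, TryAcquire and advisory locking concrete, here is a minimal in-memory sketch. `ToyLockService`, the client names and the file name are all assumptions made for illustration; this is not the Chubby API and it runs inside a single Python process.

```python
import threading


class ToyLockService:
    """In-memory stand-in for a lock service; not the real Chubby API."""

    def __init__(self):
        self._owners = {}                      # lock name -> current owner
        self._cond = threading.Condition()

    def try_acquire(self, name, owner):
        """Non-blocking: return True if the lock was free and is now ours."""
        with self._cond:
            if name in self._owners:
                return False
            self._owners[name] = owner
            return True

    def acquire(self, name, owner):
        """Blocking: wait until the lock is free, then take it."""
        with self._cond:
            while name in self._owners:
                self._cond.wait()
            self._owners[name] = owner

    def release(self, name, owner):
        with self._cond:
            if self._owners.get(name) != owner:
                raise RuntimeError("only the current owner can release")
            del self._owners[name]
            self._cond.notify_all()


service = ToyLockService()
data = {"/ls/cell/config": "v1"}               # hypothetical file contents

a_holds = service.try_acquire("/ls/cell/config", "client-A")   # lock was free
b_holds = service.try_acquire("/ls/cell/config", "client-B")   # lock is taken

# Advisory: the lock does not guard the data, so a client that ignores
# the failed TryAcquire can still write.
data["/ls/cell/config"] = "written by client-B without the lock"
```

The same pattern could support primary election: each replica calls `try_acquire` on an agreed lock name, and the one that succeeds acts as primary.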
Summary of Chubby Interfaces
Chubby Architecture
A Chubby instance (or Chubby cell) is the first level of the
hierarchy under the lock-service prefix (ls)
/ls/chubby_cell/directory_name/…/file_name
A Chubby instance is implemented as a small number of
replicated servers (typically 5) with one designated master
Clients access these replicas using the Chubby library
Uses Protocol Buffers to communicate
Replicas are placed at failure-independent sites
Typically, they are placed within a cluster but not within the same rack
Chubby Namespace Architecture
The hierarchical namespace of directories and files/locks
is maintained in a database at each replica
The consistency of the replicated database is ensured
through a consensus protocol that uses operation logs
Logs can be used to reconstruct the state of the system
Problem: Logs can become too large over time
Solution: Chubby takes a snapshot of the system periodically,
and erases the old logs
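The log-plus-snapshot idea can be sketched as follows. `ReplicaState` and its operation names are hypothetical; the sketch ignores the consensus protocol and only shows how a snapshot lets the old log be discarded without changing the reconstructed state.

```python
class ReplicaState:
    """Toy replica database: a snapshot plus the operations logged since it."""

    def __init__(self):
        self.snapshot = {}   # state as of the last snapshot
        self.log = []        # (op, path, value) tuples applied after the snapshot

    def apply(self, op, path, value=None):
        if op not in ("set", "delete"):
            raise ValueError(f"unknown operation: {op}")
        self.log.append((op, path, value))

    def current_state(self):
        """Reconstruct the state by replaying the log on top of the snapshot."""
        state = dict(self.snapshot)
        for op, path, value in self.log:
            if op == "set":
                state[path] = value
            else:
                state.pop(path, None)
        return state

    def take_snapshot(self):
        """Fold the log into a new snapshot and erase the old log."""
        self.snapshot = self.current_state()
        self.log.clear()


replica = ReplicaState()
replica.apply("set", "/ls/cell/a", "1")
replica.apply("set", "/ls/cell/b", "2")
replica.apply("delete", "/ls/cell/a")
before = replica.current_state()
replica.take_snapshot()
after = replica.current_state()   # same state, but the log is now empty
```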
Chubby Session
A Chubby session is the relationship between a client and a
Chubby cell
KeepAlive messages maintain the session
Client Caching and Consistency
Clients cache file data, metadata and open handles
Cache consistency
Whenever a mutation is to occur, the associated operation is
blocked until all caches are invalidated
Invalidation messages are piggybacked on KeepAlive messages
Disadvantages:
Cached copies are only invalidated, not simultaneously
updated
The operation cannot progress until all caches are invalidated
Advantages:
Simple and elegant for small files and locks
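A minimal sketch of the invalidate-before-write rule is shown below. The `Master` and `Client` classes are hypothetical, and invalidations are delivered by a direct method call rather than piggybacked on KeepAlive messages, which is a simplification.

```python
class Client:
    def __init__(self, name, master):
        self.name = name
        self.master = master
        self.cache = {}                        # path -> cached file data

    def read(self, path):
        if path not in self.cache:             # cache miss: ask the master
            self.cache[path] = self.master.fetch(path, self)
        return self.cache[path]

    def invalidate(self, path):
        self.cache.pop(path, None)
        return True                            # acknowledgement


class Master:
    def __init__(self):
        self.files = {}                        # path -> file data
        self.cachers = {}                      # path -> clients caching it

    def fetch(self, path, client):
        self.cachers.setdefault(path, set()).add(client)
        return self.files.get(path)

    def write(self, path, data):
        # The mutation is held back until every caching client has
        # acknowledged the invalidation; caches are dropped, not updated.
        for client in self.cachers.pop(path, set()):
            if not client.invalidate(path):
                raise RuntimeError(f"{client.name} did not acknowledge")
        self.files[path] = data


master = Master()
master.files["/ls/cell/f"] = "old"
alice, bob = Client("alice", master), Client("bob", master)
first_read = (alice.read("/ls/cell/f"), bob.read("/ls/cell/f"))
master.write("/ls/cell/f", "new")
cached_after_write = "/ls/cell/f" in alice.cache or "/ls/cell/f" in bob.cache
second_read = alice.read("/ls/cell/f")
```

After the write, neither client holds a cached copy, so the next read goes back to the master and sees the new data.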
Chubby Architecture Diagram
Ordered Communication
In several applications, ordering of events is vital
For example, consider a flight-booking system
[Figure: timeline between Client and Server with Reserve and Cancel messages; label "Prices 15% Off"]
Server cancels the reservation before booking – even when
the messages are reliably delivered!
We will study how to ensure ordered delivery of events in
group communication
Ordered Multicast – An Example
An example where total-ordering is necessary
In an eCommerce application, the bank database has been
replicated across many servers
Let us consider a 2-replica scenario
Event 1 = Add $1000
Event 2 = Add interest of 5%
Replicated Database
One replica applies Event 1, then Event 2: Bal=1000, Bal=2000, Bal=2100
The other replica applies Event 2, then Event 1: Bal=1000, Bal=1050, Bal=2050
The updates from Event 1 and Event 2 should be performed in the same
order on every replicated server. Otherwise the replicas become inconsistent.
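The two-replica scenario can be checked with a few lines of arithmetic. The sketch assumes the starting balance of 1000 shown in the slide's figure; the function names are made up for illustration.

```python
def add_1000(balance):
    """Event 1: add $1000."""
    return balance + 1000


def add_interest(balance):
    """Event 2: add interest of 5% (integer arithmetic on whole dollars)."""
    return balance * 105 // 100


def apply_all(balance, events):
    for event in events:
        balance = event(balance)
    return balance


replica_1 = apply_all(1000, [add_1000, add_interest])   # Event 1, then Event 2
replica_2 = apply_all(1000, [add_interest, add_1000])   # Event 2, then Event 1
consistent = replica_1 == replica_2
```

With these inputs `replica_1` ends at 2100 and `replica_2` at 2050, so `consistent` is `False`: the same two updates applied in different orders leave the replicas in different states.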
Three Types of Ordering
FIFO Order
Causal Order
Total Order
FIFO Ordering
FIFO Order
If a process multicasts a message m before m’, then no correct
process delivers m’ unless it has already delivered m
In the example,
F1 and F2 are in FIFO Order
Drawback:
FIFO Order does not specify any order for messages generated by
different processes
e.g., F1 and F3 can be delivered in any order
[Figure: timelines of processes P1, P2 and P3 showing multicast messages T1, T2, F1, F2, F3, C1, C2 and C3]
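One common way to enforce FIFO order is to tag each message with a per-sender sequence number and hold back anything that arrives early. The slides do not say how FIFO order is implemented, so this is an assumed technique, and `FifoReceiver`, the sender names and the payloads are hypothetical.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers messages from each sender in the order they were sent."""

    def __init__(self):
        self.next_seq = defaultdict(int)    # sender -> next expected sequence number
        self.held = defaultdict(dict)       # sender -> {seq: payload} received early
        self.delivered = []                 # payloads handed to the application

    def receive(self, sender, seq, payload):
        self.held[sender][seq] = payload
        # Deliver as many consecutive messages from this sender as possible.
        while self.next_seq[sender] in self.held[sender]:
            seq_now = self.next_seq[sender]
            self.delivered.append(self.held[sender].pop(seq_now))
            self.next_seq[sender] += 1


receiver = FifoReceiver()
receiver.receive("sender_a", 1, "m_prime")   # arrives early, held back
receiver.receive("sender_b", 0, "x")         # different sender, delivered at once
receiver.receive("sender_a", 0, "m")         # unblocks m, then m_prime
```

The receiver delivers `m` before `m_prime` even though they arrived in the opposite order, while `x` from another sender is not ordered relative to either of them.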
Causal Ordering
Causal Order
If process Pi multicasts a message mi
and Pj multicasts mj, and if mi → mj
(operator ‘→’ is Lamport’s
happened-before relation), then
any correct process that delivers mj
will deliver mi before mj
Relationship between FIFO and Causal order:
Causal Order implies FIFO Order, but FIFO
Order does not imply Causal Order
In the example, C1 and C3 are in Causal Order
Drawback:
The happened-before relation between
mi and mj has to be established before
communication
[Figure: timelines of processes P1, P2 and P3 showing multicast messages T1, T2, F1, F2, F3, C1, C2 and C3]
Total Ordering
Total Order
If process Pi multicasts a message
mi and Pj multicasts mj, and if one
correct process delivers mi before mj,
then every correct process delivers
mi before mj
In the example, T1 and T2 are in Total
Order
Drawback:
Total order does not imply FIFO or
causal order
[Figure: timelines of processes P1, P2 and P3 showing multicast messages T1, T2, F1, F2, F3, C1, C2 and C3]
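The total-order definition can be turned into a small checker over recorded delivery sequences. `is_totally_ordered` is a hypothetical helper that assumes each process delivers a given message at most once.

```python
from itertools import combinations


def is_totally_ordered(deliveries):
    """deliveries maps a process name to its list of message ids in delivery order."""
    positions = {
        proc: {msg: index for index, msg in enumerate(sequence)}
        for proc, sequence in deliveries.items()
    }
    for p, q in combinations(positions, 2):
        common = set(positions[p]) & set(positions[q])
        for a, b in combinations(common, 2):
            # Both processes must agree on the relative order of a and b.
            if (positions[p][a] < positions[p][b]) != (positions[q][a] < positions[q][b]):
                return False
    return True


agree = is_totally_ordered({
    "P1": ["T1", "T2"],
    "P2": ["T1", "T2"],
    "P3": ["T1", "T2"],
})
disagree = is_totally_ordered({
    "P1": ["T1", "T2"],
    "P2": ["T2", "T1"],
})
```

The first call returns `True` and the second `False`. Note that the check says nothing about the order in which the messages were sent, which is why total order alone does not imply FIFO or causal order.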
Totally Ordered Multicast
Totally Ordered Multicast is a multicast communication paradigm
that ensures that all messages are delivered in the same order at all
the receivers
Approach:
Process Pi sends a timestamped multicast message msgi to all the
receivers in the group
At the sender, the message is buffered in a local queue queuei
Any incoming message at Pj is queued in queuej, ordered by its
timestamp, and acknowledged to every other process
[Figure: numbered event timelines for Process 1, Process 2 and Process 3]
Totally Ordered Multicast (cont’d)
A receiver will deliver the message to the application if
The message is at the head of the queue, and
The message has been acknowledged by every other process
Assumptions in Totally Ordered Multicast:
Communication is reliable
There is no out-of-order delivery of messages that are
transmitted from the same sender
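The queue-and-acknowledge approach can be simulated in one Python process. This is a sketch under stated assumptions: timestamps are Lamport clock values with the sender id as tie-breaker, the network is a set of reliable FIFO channels scheduled at random, and the `Process` and `Network` classes are invented for illustration.

```python
import heapq
import random
from collections import deque


class Process:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.clock = 0          # Lamport logical clock
        self.queue = []         # min-heap of (timestamp, sender, payload)
        self.acks = {}          # (timestamp, sender) -> pids known to have seen it
        self.delivered = []


class Network:
    """Reliable network with one FIFO channel per (source, destination) pair."""

    def __init__(self, n, seed=0):
        self.procs = [Process(i, n) for i in range(n)]
        self.channels = {(s, d): deque()
                         for s in range(n) for d in range(n) if s != d}
        self.rng = random.Random(seed)

    def _send_all(self, src, packet):
        for dst in range(len(self.procs)):
            if dst != src:
                self.channels[(src, dst)].append(packet)

    def multicast(self, src, payload):
        p = self.procs[src]
        p.clock += 1
        msg_id = (p.clock, src)
        heapq.heappush(p.queue, (p.clock, src, payload))   # buffer at the sender
        p.acks.setdefault(msg_id, set()).add(src)
        self._send_all(src, ("msg", msg_id, payload))

    def _receive(self, dst, src, packet):
        kind, msg_id, payload = packet
        p = self.procs[dst]
        p.clock = max(p.clock, msg_id[0]) + 1              # Lamport clock update
        acks = p.acks.setdefault(msg_id, set())
        if kind == "msg":
            heapq.heappush(p.queue, (msg_id[0], msg_id[1], payload))
            acks.update({msg_id[1], dst})                  # sender and receiver saw it
            self._send_all(dst, ("ack", msg_id, None))     # acknowledge to the others
        else:
            acks.add(src)
        # Deliver while the head of the queue is acknowledged by everyone.
        while p.queue and len(p.acks[p.queue[0][:2]]) == p.n:
            p.delivered.append(heapq.heappop(p.queue)[2])

    def run(self):
        while True:
            busy = [c for c, q in self.channels.items() if q]
            if not busy:
                return
            src, dst = self.rng.choice(busy)               # arbitrary interleaving
            self._receive(dst, src, self.channels[(src, dst)].popleft())


net = Network(3, seed=1)
net.multicast(0, "a")
net.multicast(1, "b")
net.multicast(2, "c")
net.run()
orders = [p.delivered for p in net.procs]
```

All three entries of `orders` are identical regardless of the seed, because a message is only delivered once it heads the queue and every process is known to have seen it. The per-channel FIFO queues mirror the two assumptions listed above.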
Application of Vector Clocks:
Causally Ordered Multicast
In Causally Ordered Communication, a message m is delivered to an
application only if all messages that causally precede m have been
received
Vector Clocks allow implementation of Causally Ordered Multicast
Here, a multicast message is delivered to an application in the causal order
Under some criteria, Causally Ordered Multicast is weaker than
Totally Ordered Multicast
If two messages are not related to each other, it does not matter in which order
they are delivered to the application
Causally Ordered Multicast – An Example
Causally Ordered Multicast – Approach
Clocks are adjusted only when sending and receiving messages
When sending a message m from Process Pi:
VCi[i] = VCi[i] + 1
ts(m) = VCi
When Pj delivers a message m with timestamp ts(m):
VCj[k] = max(VCj[k], ts(m)[k]) ; (for all k)
When Pj receives a message m (with timestamp ts(m)) from Pi, it
will deliver the message to the application only if:
ts(m)[i] = VCj[i]+1
m is the next message that Pj was expecting from Pi
ts(m)[k] <= VCj[k]; (for all k != i)
Pj has seen all the messages that have been seen by Pi when it
sent the message m
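The send, deliver and hold-back rules above can be sketched directly. `CausalProcess` is a hypothetical class, process ids are 0-based list indices, and messages are passed by hand instead of over a network.

```python
class CausalProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n        # vector clock VC
        self.pending = []        # received but not yet deliverable
        self.delivered = []      # payloads handed to the application

    def send(self, payload):
        self.vc[self.pid] += 1                       # VCi[i] = VCi[i] + 1
        return (self.pid, list(self.vc), payload)    # ts(m) = VCi

    def _deliverable(self, sender, ts):
        # m is the next message expected from the sender, and this process
        # has seen everything the sender had seen when it sent m.
        return ts[sender] == self.vc[sender] + 1 and all(
            ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender
        )

    def receive(self, message):
        self.pending.append(message)
        progress = True
        while progress:                              # a delivery may unblock others
            progress = False
            for msg in list(self.pending):
                sender, ts, payload = msg
                if self._deliverable(sender, ts):
                    self.pending.remove(msg)
                    self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.send("m1")          # timestamp [1, 0, 0]
p1.receive(m1)
m2 = p1.send("m2")          # timestamp [1, 1, 0], causally after m1
p2.receive(m2)              # held back: p2 has not yet seen m1
held_back = list(p2.pending)
p2.receive(m1)              # delivers m1, which then unblocks m2
```

At the end `p2.delivered` is `["m1", "m2"]`, even though `m2` arrived first.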