ZooKeeper
Wait-Free Coordination for Internet-Scale Systems
What is ZooKeeper
Service for coordinating distributed processes
Wait-free coordination
Enables high-performance server implementation
Can handle hundreds of thousands of transactions per second
Distributed system for implementing distributed systems!
What distributed processes entail
Large number of processes
Heterogeneous hardware
Inter-Process Communication
Asynchronous systems
Network delays
Some Examples
Search engines
Crawling
Indexing
Query Processing
Large-scale data processing
Map-reduce
Hadoop
Dryad
Why is it necessary
Distributed systems need
Configuration Maintenance
Distributed Synchronization
Group Membership
Because
Race Conditions
Deadlocks
Bugs
Introduction
ZooKeeper – Coordination service
Database of meta-data
Relieves distributed applications of their coordination responsibilities
How?
Elements of ZooKeeper
Replicated in-memory database
Hierarchical DHT
Coarse-grained lock service
Event queue server
Hierarchical Pub/Sub server
Guarantees of ZooKeeper
FIFO client order
Per-client ordering of requests
All requests from a given client are processed in the order they were sent
Linearizability
Linearizable Writes
All writes from all clients are applied in a single global order that respects precedence
Data Model
File-system-like namespace; znode data is read and written in full
Uses znodes
Data objects
Hierarchical ordering
Znodes are unlike files: not meant for general data storage
They do store metadata used for coordination
The API
create(path, data, flags)
delete(path, version)
exists(path, watch)
getData(path, watch)
setData(path, data, version)
getChildren(path, watch)
sync(path)
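The version argument on setData and delete makes these calls conditional updates. A minimal in-memory sketch of that behavior follows. ToyZnodeStore and BadVersion are hypothetical names for illustration, not the real ZooKeeper client; flags (ephemeral, sequential), watches, and sync are left out.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version does not match the znode's."""


class ToyZnodeStore:
    """Hypothetical in-memory stand-in for a znode tree (not the real client)."""

    def __init__(self):
        # path -> (data, version); the root always exists
        self.nodes = {"/": (b"", 0)}

    def create(self, path, data):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = (data, 0)
        return path

    def exists(self, path):
        return path in self.nodes

    def get_data(self, path):
        # Data is always returned in full, together with its version
        return self.nodes[path]

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def _check_version(self, path, version):
        _, current = self.nodes[path]
        # A version of -1 skips the check
        if version != -1 and version != current:
            raise BadVersion(f"expected {version}, found {current}")
        return current

    def set_data(self, path, data, version):
        current = self._check_version(path, version)
        self.nodes[path] = (data, current + 1)
        return current + 1

    def delete(self, path, version):
        self._check_version(path, version)
        if self.get_children(path):
            raise KeyError(f"{path} has children")
        del self.nodes[path]


store = ToyZnodeStore()
store.create("/app", b"")
store.create("/app/config", b"v1")
data, version = store.get_data("/app/config")
store.set_data("/app/config", b"v2", version)      # matches version 0, succeeds

stale_write_rejected = False
try:
    store.set_data("/app/config", b"v3", version)  # version 0 is now stale
except BadVersion:
    stale_write_rejected = True
```

The second write fails because another update got in first, which is how a client can detect a lost race without holding a lock.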
Why several functions for the same purpose
Atomicity
Message passing
Three notifications
exists -> znode insertion at a path
getData -> znode data updates
getChildren -> znode group broadcasts
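The three notifications can be sketched with a toy in-memory store. ToyWatchedStore and its event names are assumptions made for this example, not the real client API. The sketch models watches as one-shot triggers and, for brevity, ignores deletions and the extra events a real watch can deliver.

```python
from collections import defaultdict


class ToyWatchedStore:
    """Hypothetical in-memory store with one-shot watches."""

    def __init__(self):
        self.data = {}
        self.watches = defaultdict(list)  # (kind, path) -> callbacks

    def _fire(self, kind, path, event):
        # One-shot: the watch is removed before its callback runs
        for callback in self.watches.pop((kind, path), []):
            callback(event, path)

    def exists(self, path, watch=None):
        if watch:
            self.watches[("exists", path)].append(watch)
        return path in self.data

    def get_data(self, path, watch=None):
        if watch:
            self.watches[("data", path)].append(watch)
        return self.data[path]

    def get_children(self, path, watch=None):
        if watch:
            self.watches[("children", path)].append(watch)
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.data
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def create(self, path, data):
        self.data[path] = data
        self._fire("exists", path, "created")
        parent = path.rsplit("/", 1)[0] or "/"
        self._fire("children", parent, "children_changed")

    def set_data(self, path, data):
        self.data[path] = data
        self._fire("data", path, "data_changed")


events = []
record = lambda event, path: events.append((event, path))

store = ToyWatchedStore()
store.exists("/config", watch=record)      # znode absent, watch registered
store.create("/config", b"v1")             # fires the exists watch
store.get_data("/config", watch=record)
store.set_data("/config", b"v2")           # fires the data watch
store.set_data("/config", b"v3")           # no event: the watch was one-shot
store.get_children("/", watch=record)
store.create("/worker-1", b"")             # fires the children watch on "/"
```

A client that wants continuous updates has to re-register the watch each time it reads.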
Failure detection
Synchronization
The many guarantees of ZooKeeper
Sequential consistency
Atomicity
Reliability
Group revision
Linearizable writes (reads may return stale data unless preceded by sync)
ZooKeeper Implementation
Request Processor
Provides high availability by replication
Writes are coordinated across replicas with an atomic broadcast protocol (Zab)
Reads are answered directly from the local replica
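A rough sketch of the write and read paths, assuming a toy model in which a single ordered log stands in for atomic broadcast. ToyEnsemble, Replica, and deliver are hypothetical names; quorum acknowledgement and leader election are not modeled. Delivery is made an explicit step so that a lagging replica can be shown.

```python
class Replica:
    """One server's in-memory copy of the data."""

    def __init__(self):
        self.state = {}
        self.last_zxid = 0  # id of the last transaction applied here

    def apply(self, zxid, path, data):
        # Transactions must be applied strictly in order, with no gaps
        assert zxid == self.last_zxid + 1
        self.state[path] = data
        self.last_zxid = zxid

    def read(self, path):
        # Reads never leave the replica; they use whatever state is local
        return self.state.get(path)


class ToyEnsemble:
    """Hypothetical stand-in for a replicated service with a totally ordered log."""

    def __init__(self, size=3):
        self.replicas = [Replica() for _ in range(size)]
        self.log = []  # transactions in their agreed total order

    def write(self, path, data):
        zxid = len(self.log) + 1
        self.log.append((zxid, path, data))
        return zxid

    def deliver(self, replica_index, upto=None):
        # Deliver the pending log entries (optionally only up to an index)
        replica = self.replicas[replica_index]
        for zxid, path, data in self.log[replica.last_zxid:upto]:
            replica.apply(zxid, path, data)


ensemble = ToyEnsemble()
ensemble.write("/config", b"v1")
ensemble.write("/config", b"v2")

ensemble.deliver(0)            # replica 0 has applied both writes
ensemble.deliver(1, upto=1)    # replica 1 has applied only the first

fresh = ensemble.replicas[0].read("/config")
stale = ensemble.replicas[1].read("/config")

ensemble.deliver(1)            # replica 1 catches up
caught_up = ensemble.replicas[1].read("/config")
```

Every replica sees the same writes in the same order, but a local read can lag behind, which is the gap sync(path) is meant to close.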
ZooKeeper Implementation
Request Processor
Replicated database contains entire tree
Maintains logs for recoverability
Clients connect to one server to submit requests
Transactions are idempotent.
Writes are forwarded to a single server, the leader
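One way to picture idempotent transactions: the leader turns a conditional request into a transaction that records the resulting state rather than the operation. The sketch below assumes a plain dict as state; to_transaction and apply_transaction are hypothetical helpers, not names from ZooKeeper.

```python
def to_transaction(state, path, data, expected_version):
    """Leader-side step: evaluate the condition once and record the outcome."""
    _, current = state[path]
    if expected_version != -1 and expected_version != current:
        return ("error", path)
    # The transaction carries the final data and version, not "increment by one"
    return ("set_data", path, data, current + 1)


def apply_transaction(state, txn):
    """Replica-side step: installing the same transaction twice changes nothing."""
    if txn[0] == "set_data":
        _, path, data, new_version = txn
        state[path] = (data, new_version)


state = {"/config": (b"v1", 0)}
txn = to_transaction(state, "/config", b"v2", expected_version=0)

apply_transaction(state, txn)
after_first = dict(state)
apply_transaction(state, txn)   # replayed, for example during log recovery
after_second = dict(state)

rejected = to_transaction(state, "/config", b"v3", expected_version=0)
```

Because a replay leaves the state unchanged, a recovering server can re-apply log entries it may already have applied, as long as it applies them in order.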
ZooKeeper Primitives
Configuration Management
Rendezvous
Group membership
Simple locks
Read / Write locks
Double barrier
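As an illustration of how one of these primitives can be built on znodes, here is a sketch of a simple lock using sequential ephemeral znodes. ToyLockNamespace is a hypothetical in-memory simulation, not client code. Each waiter watches only the znode just ahead of it, so a release wakes one client instead of all of them.

```python
class ToyLockNamespace:
    """Hypothetical simulation of the children of one lock znode."""

    def __init__(self):
        self.counter = 0
        self.children = {}  # znode name -> owning session

    def create_sequential_ephemeral(self, session):
        # Zero-padded so that string order equals sequence order
        name = f"lock-{self.counter:010d}"
        self.counter += 1
        self.children[name] = session
        return name

    def holder(self):
        # The znode with the lowest sequence number holds the lock
        return min(self.children) if self.children else None

    def predecessor(self, name):
        # A waiting client watches only the znode immediately before its own
        earlier = [child for child in self.children if child < name]
        return max(earlier) if earlier else None

    def delete(self, name):
        # Explicit unlock
        self.children.pop(name, None)

    def close_session(self, session):
        # Ephemeral znodes disappear when their session ends
        for name in [n for n, s in self.children.items() if s == session]:
            del self.children[name]


locks = ToyLockNamespace()
a = locks.create_sequential_ephemeral("client-a")
b = locks.create_sequential_ephemeral("client-b")
c = locks.create_sequential_ephemeral("client-c")

first_holder = locks.holder()          # client-a's znode
watched_by_c = locks.predecessor(c)    # client-c waits on client-b's znode

locks.close_session("client-a")        # client-a crashes or disconnects
next_holder = locks.holder()           # the lock passes to client-b
```

Using ephemeral znodes means a crashed holder releases the lock automatically, which also covers the failure-detection side of group membership.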
Evaluation of ZooKeeper
Variable number of servers, fixed number of clients.
35 machines simulating 250 simultaneous clients, which all use the asynchronous API.
Read/write payloads all 1KB in size.
Benchmarking done on the client side.
Evaluation of ZooKeeper
1. Failure and recovery of a follower;
2. Failure and recovery of a different follower;
3. Failure of the leader;
4. Failure of two followers (a, b) in the first two marks, and recovery at the third mark (c);
5. Failure of the leader.
Conclusion
Wait-free approach towards coordinating processes
Used in several applications
Yahoo Message Broker (Pub/Sub)
Hadoop
Katta – Distributed Indexer