Composable Consistency for Wide Area Replication
Sai Susarla
Advisor: Prof. John Carter
PhD Dissertation Defense

Overview
- Goal: middleware support for wide-area caching in diverse distributed applications
- Key hurdle: flexible consistency management
- Our solution: a novel consistency interface/model, Composable Consistency
- Benefit: supports a broader set of sharing needs than existing models
  » Examples: file systems, databases, directories, collaborative apps (a wider variety than any existing consistency model can support)
- Demo platform: Swarm, a novel P2P middleware data store

Caching: Overview
- The idea: cache frequently used items locally for quick retrieval
- Benefits
  » Within a cluster: load balancing, scalability
  » Across the WAN: lower latency, improved throughput and availability
- Applications: data stored in one place, accessed from multiple locations
  » File system: personal files, calendars, log files, software, …
  » Database: online shopping, inventory, auctions, …
  » Directory: DNS, LDAP, Active Directory, KaZaa, …
  » Collaboration: chat, multi-player games, meetings, …

Centralized Service
- [diagram: clients and users across the Internet access a single primary server cluster]

Proxy-based Caching
- [diagram: caching proxy server clusters placed near clients run a consistency protocol with the primary server cluster over the Internet]

Caching: The Challenge
- Applications have diverse consistency needs:

  Application                              | Sharing characteristics                      | Consistency needs
  Static web content, media, s/w updates   | Read-mostly                                  | Stale data and manual reload OK
  Chat, whiteboard                         | Concurrent appends                           | Real-time sync, causal message order
  Auctions, ticket sales, financial DB     | Write-sharing, conflicts, varying contention | Serializability, latest data, atomicity (ACID)
  …                                        | …                                            | …

Caching: The Problem
- Consistency requirements are diverse
- Caching is difficult over WANs: variable delays, node failures, network partitions, admin domains, …
- Thus, most WAN applications either:
  » roll their own caching solution, or
  » do not cache and live with the latency
- Can we do better?

Thesis
"A consistency management system that provides a small set of customizable consistency mechanisms can efficiently satisfy the data sharing needs of a wide variety of distributed applications."
Outline
- Further motivation
- Application study: a new taxonomy to classify application sharing needs
- The Composable Consistency (CC) model
  » A novel interface to express consistency semantics for each access
  » A small option set can express more diverse semantics
- Evaluation

Existing Models are Inadequate
- They provide a few packaged consistency semantics for specific needs, e.g., optimistic/eventual, close-to-open, strong
- Or they lack the flexibility to support diverse needs
  » TACT cannot express weak consistency or session semantics
  » Bayou cannot support strong consistency
- Or they leave the consistency management burden on applications, e.g., OceanStore, Globe

Existing Middleware is Inadequate
- Existing middleware supports specific sharing needs:
  » Read-only data: PAST, BitTorrent
  » Rare write-sharing: file systems (NFS, Coda, Ficus, …)
  » Master-slave (read-only) replication: storage vendors, MySQL
  » Scheduled (nightly) replication: storage and DB services
  » Read-write replication within a cluster: commercial DB vendors, Petal

Application Survey
- Surveyed 40+ applications with diverse consistency needs:

  Application                               | Sharing characteristics                      | Consistency needs
  Static web content, media, s/w updates    | Read-mostly                                  | Stale data and manual reload OK
  Stock quotes                              | Read-only                                    | Limit maximum staleness to T seconds
  Chat, whiteboard                          | Concurrent appends                           | Real-time sync, causal message order
  Multiplayer game                          | Heavy write-sharing                          | Real-time sync, totally ordered play moves
  Auctions, ticket sales, financial DB      | Write-sharing, conflicts, varying contention | Serializability, latest data, atomicity (ACID)
  Personal file access                      | Rare write-sharing                           | Eventual consistency
  Mobile file access, collaborative sharing | Sequential write-sharing                     | Latest data, session semantics
  Directory, calendars, groupware           | Write-sharing, mergeable writes              | Tight sync within a campus, relaxed sync across campuses

Survey Results
- Found common issues and overlapping choices:
  » Are parallel reads and writes OK?
  » How often should replicas synchronize?
  » Does update order matter?
  » What if some copies are inaccessible?
  » …
- Can we exploit this commonality?

Composable Consistency
- A novel interface to express consistency semantics, organized as a small set of per-access options (sketched in code below):

  Concurrency control
    » Access mode: concurrent | exclusive
  Replica synchronization
    » Sync frequency: manual push/pull | T seconds stale | N missed writes
    » Strength: hard | soft
    » Causality: yes | no
    » Atomicity: yes | no
    » Update ordering: none | total | serial
  Failure handling
    » Inaccessible copy: ignore | fail the access
  View isolation
    » Accept remote updates: per session | immediately
  Update visibility
    » Reveal local updates: on close() | immediately
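To make the option space concrete, the following is a minimal sketch of how this option vector might be encoded in C. The struct and enum names are illustrative assumptions that mirror the option set above; they are not Swarm's actual declarations.

    /* Illustrative only: one possible encoding of the composable consistency
     * option vector.  Field and enum names mirror the option set above; they
     * are not the actual Swarm declarations. */
    enum cc_access_mode { CC_CONCURRENT, CC_EXCLUSIVE };
    enum cc_strength    { CC_HARD, CC_SOFT };
    enum cc_ordering    { CC_ORDER_NONE, CC_ORDER_TOTAL, CC_ORDER_SERIAL };
    enum cc_on_failure  { CC_IGNORE_UNREACHABLE, CC_FAIL_ACCESS };
    enum cc_accept      { CC_ACCEPT_AT_OPEN, CC_ACCEPT_IMMEDIATELY };
    enum cc_reveal      { CC_REVEAL_AT_CLOSE, CC_REVEAL_IMMEDIATELY };

    struct cc_options {
        enum cc_access_mode mode;           /* concurrency control                     */
        int  max_staleness_secs;            /* sync frequency: at most T seconds stale */
        int  max_missed_writes;             /* sync frequency: at most N missed writes */
        enum cc_strength    strength;       /* hard vs. soft synchronization           */
        int  causal;                        /* preserve causal order of updates?       */
        int  atomic;                        /* apply a session's updates atomically?   */
        enum cc_ordering    ordering;       /* update ordering among replicas          */
        enum cc_on_failure  on_partition;   /* failure handling for unreachable copies */
        enum cc_accept      accept_updates; /* view isolation                          */
        enum cc_reveal      reveal_updates; /* update visibility                       */
    };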
Example: Close-to-open (AFS)
- Allow parallel reads and writes
- Guarantee the latest data at open()
- Fail the access when partitioned
- Accept remote updates only at open()
- Reveal local updates to others only at close()

Example: Eventual Consistency (Bayou)
- Allow parallel reads and writes
- Sync copies at most once every 10 minutes
- Syncing should not block or fail operations
- Accept remote updates as they arrive
- Reveal local updates to others as they happen

Handling Conflicting Semantics
- What if two sessions request different semantics?
  » If they conflict, block one session until the conflict goes away (serialize)
  » Otherwise, allow them to proceed in parallel
- Simple rules for checking conflicts (a conflict matrix)
- Examples:
  » Exclusive write vs. exclusive read vs. eventual write: serialize
  » Write-immediate vs. session-grain isolation: serialize
  » Write-immediate vs. eventual read: no conflict

Using Composable Consistency
- Perform data access within a session, e.g.:
    session_id = open(object, CC_option_vector);
    read(session_id, buf);
    write(session_id, buf);
    /* or */ update(session_id, incr_counter(value));
    close(session_id);
- Specify consistency semantics per session at open() via the CC option vector:
  » concurrency control, replica synchronization, failure handling, view isolation, and update visibility
- The system enforces the semantics by mediating each access (see the sketch below)
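As an illustration, here is a minimal sketch of a session that requests close-to-open (AFS-style) semantics through a per-session option vector, reusing the hypothetical cc_options encoding sketched earlier. The sw_open_session/sw_read/sw_write/sw_close names and signatures are assumptions; the slide above shows only the generic open/read/write/close pattern.

    #include <string.h>

    /* Hypothetical session calls mirroring the open/read/write/close pattern
     * above; these signatures are assumptions, not Swarm's actual API. */
    int sw_open_session(const char *object, const struct cc_options *opts);
    int sw_read (int session_id, void *buf, int len);
    int sw_write(int session_id, const void *buf, int len);
    int sw_close(int session_id);

    void edit_shared_object(const char *object)
    {
        struct cc_options opts;
        memset(&opts, 0, sizeof(opts));
        opts.mode               = CC_CONCURRENT;      /* parallel reads and writes           */
        opts.max_staleness_secs = 0;                  /* latest data guaranteed at open()    */
        opts.strength           = CC_HARD;            /* staleness bound is a hard guarantee */
        opts.on_partition       = CC_FAIL_ACCESS;     /* fail the access when partitioned    */
        opts.accept_updates     = CC_ACCEPT_AT_OPEN;  /* remote updates seen only at open()  */
        opts.reveal_updates     = CC_REVEAL_AT_CLOSE; /* local updates revealed at close()   */

        int sid = sw_open_session(object, &opts);
        char buf[4096];
        int n = sw_read(sid, buf, sizeof(buf));
        if (n > 0)
            sw_write(sid, buf, n);                    /* modify and write back       */
        sw_close(sid);                                /* updates become visible here */
    }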
Composable Consistency Benefits
- Powerful: a small option set can express diverse semantics
- Customizable: allows different semantics for each access
- Effective: amenable to an efficient WAN implementation
- Benefit to middleware: can provide read-write caching to a broader set of applications
- Benefit to an application:
  » can customize consistency to diverse and varying sharing needs
  » can simultaneously enforce different semantics on the same data for different users

Evaluation

Swarm: A Middleware Providing CC
- Swarm: a shared file interface with CC options
  » location-transparent, page-grained file access
- Aggressive P2P caching
  » a dynamic cycle-free replica hierarchy per file
- The prototype implements CC (except causality and atomicity)
  » per-file, per-replica, and per-session consistency
- Network economy (exploits nearby replicas)
- Contention-aware replication (RPC vs. caching)
- Multi-level leases for failure resilience

Client-server BerkeleyDB Application
- [architecture diagram: application users on two LANs reach a primary app server across the Internet; the server runs the app logic over a BerkeleyDB kernel and a local file system]

BerkeleyDB Application using Swarm
- [architecture diagram: the primary app server's logic calls a BerkeleyDB (RDB) wrapper, which accesses the database through a co-located Swarm server with an RDB plugin]

Caching Proxy App Server using Swarm
- [architecture diagram: a proxy app server near remote users runs its own Swarm server and RDB plugin with a database replica; the proxy and primary Swarm servers keep the replicas consistent across the Internet]

Swarm-based Applications
- SwarmDB: transparent BerkeleyDB database replication across WANs
- SwarmFS: a wide-area P2P read-write file system
- SwarmProxy: caching WAN proxies for an auction service with strong consistency
- SwarmChat: efficient message/event dissemination
- No single consistency model can support the sharing needs of all these applications

SwarmDB: Replicated BerkeleyDB
- Replication support built as a wrapper library; uses the unmodified BerkeleyDB binary
- Evaluated with five consistency flavors:
  » locking writes, eventual reads
  » master-slave writes, eventual reads
  » close-to-open reads and writes
  » staleness-bounded reads and writes
  » eventual reads and writes
- Compared against the BerkeleyDB-provided RPC version
- Order-of-magnitude throughput gains over RPC by relaxing consistency

SwarmDB Evaluation
- BerkeleyDB B-tree index replicated across N nodes
- Nodes connected via 1 Mbps links to a common router, with 40 ms RTT to each other
- Full-speed workload: 30% writes (inserts, deletes, updates), 70% reads (lookups, cursor scans)
- Number of replicas varied from 1 to 48

SwarmDB Write Throughput per Replica
- [graph: per-replica write throughput for a local SwarmDB server, optimistic (eventual), 20 ms staleness, 10 ms staleness, master-slave writes with eventual reads, close-to-open, locking writes with eventual reads, and RPC over the WAN]

SwarmDB Query Throughput per Replica
- [graph: per-replica query throughput for a local SwarmDB server, optimistic, 10 ms staleness, close-to-open, and RPC over the WAN]

SwarmDB Results
- Customizing consistency can improve WAN caching performance dramatically
- An application can enforce diverse semantics simply by changing CC options
- Updates and queries can run with different semantics simultaneously

SwarmFS Distributed File System
- Sample SwarmFS path: /swarmfs/swid:0x1234.2/home/sai/thesis.pdf (a usage sketch follows this list)
- Performance summary:
  » achieves >80% of local file system performance on the Andrew benchmark
  » more network-efficient than Coda for wide-area access
  » correctly supports fine-grained collaboration across WANs
  » correctly supports file locking for RCS repository sharing
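Because SwarmFS exposes a location-transparent file namespace, an unmodified program can use ordinary POSIX calls on SwarmFS paths. The sketch below illustrates this; the file name (adapted from the sample path above) and the use of fcntl() advisory locking are illustrative assumptions rather than claims about the prototype's exact behavior.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Illustrative only: an unmodified POSIX client appending to a file held
     * in SwarmFS.  The path is hypothetical, adapted from the sample path on
     * the slide above. */
    int append_note(const char *text)
    {
        const char *path = "/swarmfs/swid:0x1234.2/home/sai/notes.txt";
        int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return -1; }

        /* Assumes SwarmFS honors standard advisory locks (cf. the RCS
         * file-locking item above); lock the whole file while appending. */
        struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        if (fcntl(fd, F_SETLKW, &lk) < 0) { perror("fcntl"); close(fd); return -1; }

        if (write(fd, text, strlen(text)) < 0)
            perror("write");

        lk.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &lk);
        close(fd);   /* e.g., under close-to-open options, the update becomes
                        visible to the next open() elsewhere */
        return 0;
    }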
SwarmFS: Distributed Development
- [figure: the distributed development scenario used for the roaming file access experiment]

Replica Topology
- [figure: the SwarmFS replica topology formed across the test sites]

SwarmFS vs. Coda: Roaming File Access
- [graph: compile latency from a cold cache, in seconds, at each location (x-axis: LAN#-node# and RTT to home U1: U1, I1 at 24 ms, C1 at 50 ms, T1 at 160 ms, F1 at 130 ms), comparing Coda-s, Coda-w, and SwarmFS]
- Network economy: Coda-s always fetches files from the distant home server U1, whereas SwarmFS fetches each file from the nearest copy.
- A more efficient P2P protocol: Coda-s writes files through to U1 to provide close-to-open semantics, so SwarmFS performs better for temporary files; Swarm's pull-based P2P protocol avoids this write-through.
- Eventual consistency is inadequate: Coda-w behaves incorrectly; its trickle reintegration pushed huge object files to U1 and clogged the network link, `make' skipped files, and the linker found corrupt object files, causing compile errors.

Evaluation Summary
- SwarmDB: the gains of customizable consistency
- SwarmFS: network economy under write-sharing
- SwarmProxy: strong consistency over WANs under varying contention
- SwarmChat: real-time update dissemination
- By employing CC, the Swarm middleware data store can support diverse application needs effectively

Related Work
- Flexible consistency models/interfaces: Munin, WebFS, Fluid Replication, TACT
- Wide-area caching solutions/middleware
  » File systems and data stores: AFS, Coda, Ficus, Pangaea, Bayou, Thor, …
  » Peer-to-peer systems: Napster, PAST, Farsite, Freenet, OceanStore, BitTorrent, …

Future Work
- Security and authentication
- Fault tolerance via first-class replication

Thesis Contributions
- A survey of the sharing needs of numerous applications
- A new taxonomy to classify application sharing needs
- The Composable Consistency model, based on that taxonomy
- Demonstrated that the CC model is practical and supports diverse applications across WANs effectively

Conclusion
- Can a storage service provide effective WAN caching support for diverse distributed applications? YES.
- Key enabler: a novel, flexible consistency interface called Composable Consistency
  » allows an application to customize consistency to diverse and varying sharing needs
  » allows the middleware to serve a broader set of applications effectively

SwarmDB Control Flow
- [figure]

Composing Master-slave Replication
- Master-slave replication composed from CC options:
  » serialize updates: concurrent-mode writes (WR) with serial update ordering (apply updates at a central master)
  » eventual consistency for queries (the options mentioned earlier)
- Use case: MySQL-style read-only DB replication across WANs

Clustered BerkeleyDB
- [figure]

BerkeleyDB Proxy using Swarm
- [figure]

A Swarm-based Chat Room
- Chat transcript options: WR (concurrent write) mode, 0-second soft staleness, immediate visibility, no isolation (spelled out below)
- [figure: update propagation path among chat replicas 1-4 and peer P]
- Sample chat client code (pseudocode from the slide, lightly annotated):

    callback(handle, newdata) {        /* invoked by Swarm when a peer appends to the transcript */
        display(newdata);
    }

    main() {
        handle = sw_open(kid, "a+");   /* open the shared chat transcript for appending */
        sw_snoop(handle, callback);    /* register for notification of new updates */
        while (!done) {
            read(&newdata);            /* read the user's next message */
            display(newdata);
            sw_write(handle, newdata); /* append it to the transcript */
        }
        sw_close(handle);
    }
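For reference, the chat transcript's options noted above (WR mode, 0-second soft staleness, immediate visibility, no isolation) could be written down with the illustrative cc_options encoding sketched earlier; as before, the field names are assumptions, not Swarm's actual declarations.

    /* Illustrative only: the chat transcript's consistency options, expressed
     * with the hypothetical cc_options struct from the earlier sketch. */
    struct cc_options chat_opts = {
        .mode               = CC_CONCURRENT,          /* WR: concurrent writes           */
        .max_staleness_secs = 0,                      /* propagate updates right away... */
        .strength           = CC_SOFT,                /* ...but only as a soft bound     */
        .accept_updates     = CC_ACCEPT_IMMEDIATELY,  /* no isolation                    */
        .reveal_updates     = CC_REVEAL_IMMEDIATELY,  /* immediate visibility            */
    };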