Consistency-Based Service Level Agreements for Cloud Storage
Douglas B. Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K. Aguilera, Hussam Abu-Libdeh
Microsoft Research

“A foolish consistency is the hobgoblin of little minds” -- Ralph Waldo Emerson (1841)
“… and of large clouds” -- Douglas Brian Terry (2013)

2 Today’s Cloud Storage Providers
• Replicate data widely
• Offer a choice of strong or eventual consistency, e.g. Amazon DynamoDB, Yahoo PNUTS, Google App Engine, Oracle NoSQL, Cassandra, … Microsoft Windows Azure
• Trade off consistency, availability, and performance

3 Problem
• Developers must choose consistency
• No single choice is best for all clients and situations

Client/          U.S.          England     India         China
Consistency      (secondary)   (primary)   (secondary)   (client only)
strong           147.5         1.2         435.5         307.3
eventual         1.1           1.0         1.1           160.2
(roundtrip times in milliseconds)

4 Pileus Key Features
[Image: a cap cloud]
• Replicated, partitioned key-value store
• Choice of consistency
• Consistency-based service level agreements (SLAs)

5 Pileus System Model
[Figure: system model diagram; labels: API, secondary nodes, primary core, sync replication, lazy replication]
API:
• BeginSession (SLA)
• BeginTx (SLA)
• Put (key, value)
• Get (key, SLA) returns value, consistency
• EndTx ()
• EndSession ()

6 Read Consistency Guarantees
• Strong Consistency: Return value of latest Put.
• Causal Consistency: Return value of latest causally preceding Put. [COPS 2011]
• Bounded Staleness (t): Return value that is stale by at most t seconds. [TACT 2002]
• Read My Writes: Return value of latest Put in client session or a later value. [Bayou 1994]
• Monotonic Reads: Return same or later value as earlier Get in client session.
• Eventual Consistency: Return value of any Put.

7 Read Latencies
Client/          U.S.          England     India         China
Consistency      (secondary)   (primary)   (secondary)   (client only)
strong           147.5         1.2         435.5         307.3
causal           146.3         1.0         431.6         306.4
bounded(30)      75.1          1.0         234.6         241.9
read-my-writes   13.0          1.1         18.4          166.8
monotonic        1.1           1.0         1.1           160.2
eventual         1.1           1.0         1.1           160.2
(roundtrip times in milliseconds)
• Consistency affects latency
• Client location affects latency

8 Consistency-based SLA
• Applications declare desired consistency/latency
Shopping Cart:
     consistency      latency   utility
1.   strong           300 ms    1.0
2.   read my writes   300 ms    0.5
3.   eventual         300 ms    0.1

9 SLA Enforcement: Client Monitoring
For each tablet:
Node   Primary?
A      yes        210
B      no         166
C      no         203
• Node and Primary?: from configuration service
• RTTs: measured on Gets, Puts, and pings
• High Timestamp: returned from Gets, Puts, and pings

10 SLA Enforcement: Node Selection
On Get (key, SLA):
1. For each subSLA and node,
   a. compute Platency
   b. compute Pconsistency
   c. compute Platency x Pconsistency x utility
2. Select node with maximum expected utility
3. Send Get operation to node
4. Measure RTT and update records
5. Return data and delivered consistency to caller

11 Experimental Setup
System configuration:
[Figure: map of the U.S., England, India, and China sites with pairwise roundtrip times; values shown: 161, 149, 436, 308, 287, 181]
• Primary: England
• Secondaries: U.S., India
• Clients: U.S., England, India, China
Benchmark:
• YCSB with 50/50 Gets/Puts
• 500-op sessions
Node selection schemes:
• Primary = get from primary
• Random = get from random node
• Closest = get from closest node
• Pileus = get from node with highest expected utility
Measurement:
• Average utility for Get operations

12 Experiment #1: SLA
Simplified shopping cart SLA:
     consistency      latency   utility
1.   read my writes   300 ms    1.0
2.   eventual         300 ms    0.5

13-20 Experiment #1: Delivered Utility
[Figure: average utility per Get (y-axis) by client datacenter (x-axis): U.S. (secondary), England (primary), India (secondary), China (client only)]
Annotations, in slide order:
• Primary selection works well when close to the primary, but poorly when distant.
• Random selection rarely works well.
• 100% of Gets from England; 100% meet top subSLA.
• 91% from U.S.; 9% from England; 100% meet top subSLA; 14.5 ms avg. latency vs. 148 ms for primary.
• 99.6% from U.S.; 0.4% from India; 96% meet top subSLA.
• Pileus always delivers the most utility!
• 9% fail to meet read-my-writes.

21 Experiment #2: SLA
Password checking SLA:
     consistency   latency   utility
1.   strong        150 ms    1.0
2.   eventual      150 ms    0.5
3.   strong        1000 ms   0.25

22 Experiment #2: Delivered Utility
[Figure: average utility per Get (y-axis) by client datacenter (x-axis): U.S. (secondary), England (primary), India (secondary), China (client only)]

23 Conclusions: Main Contributions
Our Pileus system
• provides a broad choice of consistency guarantees and a range of delivered read latencies
• allows declarative specification of desired consistency and latency through consistency-based SLAs
• selects nodes to maximize expected utility while adapting to varying conditions
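
The node selection steps from the "SLA Enforcement: Node Selection" slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Pileus implementation: the names `SubSLA`, `NodeRecord`, `p_latency`, `p_consistency`, and `select_node` are hypothetical; Platency is assumed to be the fraction of recorded RTTs within the latency bound; Pconsistency is simplified to 0 or 1 based on the primary flag and the node's high timestamp; only three of the six consistency guarantees are modeled; and the RTT samples are invented for the example. The SLA is the shopping cart SLA from the slides, and the node names and high timestamps are those of the client monitoring slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubSLA:
    consistency: str   # "strong", "read_my_writes", or "eventual"
    latency_ms: float  # target roundtrip latency
    utility: float     # value to the application if this subSLA is met


@dataclass
class NodeRecord:
    name: str
    is_primary: bool
    high_timestamp: int                          # latest timestamp seen from this node
    rtts_ms: list = field(default_factory=list)  # recent RTT samples


def p_latency(node, bound_ms):
    """Assumed estimator: fraction of recorded RTTs within the bound."""
    if not node.rtts_ms:
        return 0.0
    return sum(r <= bound_ms for r in node.rtts_ms) / len(node.rtts_ms)


def p_consistency(node, consistency, last_write_ts):
    """Simplified 0/1 estimate of whether the node can meet the guarantee."""
    if consistency == "strong":
        return 1.0 if node.is_primary else 0.0
    if consistency == "read_my_writes":
        caught_up = node.high_timestamp >= last_write_ts
        return 1.0 if node.is_primary or caught_up else 0.0
    if consistency == "eventual":
        return 1.0
    raise ValueError(f"unsupported consistency: {consistency}")


def select_node(sla, nodes, last_write_ts=0):
    """Return (expected utility, node, subSLA) with maximum expected utility."""
    best = None
    for sub in sla:
        for node in nodes:
            expected = (p_latency(node, sub.latency_ms)
                        * p_consistency(node, sub.consistency, last_write_ts)
                        * sub.utility)
            # Strict '>' keeps the earlier (higher-ranked) subSLA on ties.
            if best is None or expected > best[0]:
                best = (expected, node, sub)
    return best


# Shopping cart SLA from the slides.
shopping_cart = [
    SubSLA("strong", 300, 1.0),
    SubSLA("read_my_writes", 300, 0.5),
    SubSLA("eventual", 300, 0.1),
]

# Node names and high timestamps from the monitoring slide; RTTs are invented.
nodes = [
    NodeRecord("A", True, 210, [310, 320, 290, 305]),
    NodeRecord("B", False, 166, [20, 25, 22, 30]),
    NodeRecord("C", False, 203, [150, 160, 155, 400]),
]

expected, node, sub = select_node(shopping_cart, nodes, last_write_ts=200)
print(node.name, sub.consistency, expected)  # C read_my_writes 0.375
```

With these invented samples the distant primary A rarely answers within 300 ms, and the nearby node B has not yet seen the session's last write, so node C under the read-my-writes subSLA gives the highest expected utility. A real client would also send the Get, measure the RTT, and update its records, as in steps 3 to 5 of the slide.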