PLATO: Predictive Latency-Aware Total Ordering

Mahesh Balakrishnan, Ken Birman, Amar Phanishayee

Total Ordering
- a.k.a. Atomic Broadcast
- Delivering messages to a set of nodes in the same order:
  - messages arrive at nodes in different orders…
  - nodes agree on a single delivery order
  - messages are delivered at nodes in the agreed order

Modern Datacenters
- Applications
  - E-tailers, Finance, Aerospace
  - Service-Oriented Architectures, Publish-Subscribe, Distributed Objects, Event Notification…
  - … Totally Ordered Multicast!
- Hardware
  - Fast high-capacity networks
  - Failure-prone commodity nodes

Total Ordering in a Datacenter
[Figure: Query and Update 1 / Update 2 traffic reaching Inventory Service Replica 1 and Replica 2 of a Replicated Service. Updates are Totally Ordered.]
- Totally Ordered Multicast is used to consistently update Replicated Services
- Latency of Multicast -> System Consistency
- Requirement: order multicasts consistently, rapidly, robustly

Multicast Wishlist
- Low Latency!
- High (stable) throughput
- Minimal, proactive overheads
- Leverage hardware properties
  - HW Multicast/Broadcast is fast, unreliable
- Handle varying data rates
  - Datacenter workloads have sharp spikes… and extended troughs!

State-of-the-Art
- Traditional Protocols
  - Conservative
  - Latency-Overhead tradeoff
  - Example: Fixed Sequencer
  - Simple, works well
- Optimistic Total Ordering
  - Deliver optimistically, rollback if incorrect
  - Why this works: no out-of-order arrival in LANs
- Optimistic total ordering for datacenters?

PLATO: Predictive Ordering
- In a datacenter, broadcast / multicast occurs almost instantaneously
  - Most of the time, messages arrive in the same order at all nodes.
  - Some of the time, messages arrive in different orders at different nodes.
- Can we predict out-of-order arrival?
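To make "messages arrive in different orders at different nodes" concrete, here is a minimal Python sketch that compares the arrival sequences seen by two receivers and counts the message pairs whose relative order differs. The function name `swapped_pairs` and this pairwise definition of a swap are assumptions made for illustration; the slides do not say how swaps were measured.

```python
from itertools import combinations


def swapped_pairs(order_a, order_b):
    """Count message pairs that two receivers saw in opposite relative order.

    Hypothetical helper, not from the slides. Messages missing at either
    receiver (for example, lost packets) are ignored.
    Returns (number_of_swapped_pairs, number_of_pairs_compared).
    """
    pos_b = {msg: i for i, msg in enumerate(order_b)}
    common = [msg for msg in order_a if msg in pos_b]
    # combinations() yields (x, y) with x before y at receiver A;
    # the pair is swapped if receiver B saw y before x.
    swaps = sum(1 for x, y in combinations(common, 2) if pos_b[x] > pos_b[y])
    total = len(common) * (len(common) - 1) // 2
    return swaps, total


receiver_1 = ["A", "B", "C", "D"]
receiver_2 = ["A", "C", "B", "D"]  # B and C arrived in the opposite order
swaps, total = swapped_pairs(receiver_1, receiver_2)
print(f"{swaps} of {total} pairs swapped")
```

A total ordering protocol has to hide such disagreements: both receivers must deliver B and C in the same agreed order, whichever order they arrived in.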
Reasons for Disorder: Swaps
[Figure: Sender 1 and Sender 2 multicast through two switches to Receiver 1 and Receiver 2. One receiver gets Sender 2's message after Sender 1's message; the other gets Sender 1's message after Sender 2's message.]
- Typical Datacenter Diameter: 50-500 microseconds
- Out-of-order arrival can occur when the inter-send interval between two messages is smaller than the diameter of the network

Reasons for Disorder: Loss
- Datacenter networks are overprovisioned
  - Loss never occurs in the network
- Datacenter nodes are cheap
  - Loss occurs due to end-host buffer overflows caused by CPU contention
[Figure: Order of arrivals into user-space over time t, for packets A through H.]

The Utah Emulab Testbed
[Figure: Emulab topology of Cisco 6509 and Cisco 6513 switches with 100 Mb, 1 Gb, 4 Gb, and 8 Gb links; nodes range from 600 MHz to 3 GHz.]
- Emulab2 test scenario: 2 switches of separation
  - One-way ping latency: ~100 microseconds
- Emulab3 test scenario: 3 switches of separation
  - One-way ping latency: ~110 microseconds

The Cornell Testbed
[Figure: Cornell topology of HP Procurve 6108 and HP Procurve 4000M switches with 100 Mb and 1 Gb links; 1.3 GHz nodes.]
- Cornell3 test scenario: 3 switches of separation
  - One-way ping latency: ~70 microseconds
- Cornell5 test scenario: 5 switches of separation
  - One-way ping latency: ~110 microseconds

Disorder: Emulab3
- Percentage of swaps and losses goes up with data rate
- At 2800 packets per sec, 2% of all packet pairs are swapped and 0.5% of packets are lost.

Predicting Disorder
- Predictor: Inter-arrival time of consecutive packets into user-space
- Why?
  - Swaps: simultaneous multicasts -> low inter-arrival time
  - Loss: kernel buffer overflow -> sequence of low inter-arrival times

Predicting Disorder
[Figure: Inter-arrival time of swaps and inter-arrival time of all pairs. Cornell Datacenter, 400 multicasts/sec.]
- 95% of swaps and 14% of all pairs are within 128 µsecs

PLATO Design
- Heuristic: If two packets arrive within Δ µsecs, possibility of disorder
- PLATO: Heuristic + Lazy Fixed Sequencer
  - Heuristic works -> ~ zero (Δ) latency
  - Heuristic fails -> fixed sequencer latency

PLATO Design
- API: optdeliver, confirm, revoke
- Ordering Layer:
  - Pending Queue: Packets suspected to be out-of-order, or queued behind suspected packets
  - Suspicious Queue: Packets optdelivered to the application, not yet confirmed

PLATO Design
[Figure: timeline of an example run showing the pending and suspicious queues over time t. Upcalls shown: optdeliver(A), optdeliver(B), optdeliver(D), optdeliver(E), revoke(D), revoke(E), setsuspect(C), setsuspect(D), setsuspect(E), confirm(A, B, C, D). Annotations: TE-TA > DELTA; TC-TD < DELTA; Sequencer Msg: A B C D. Underlined packets in pending are suspected.]

Performance
[Figure: Fixed Sequencer vs. PLATO.]
- At small values of Δ, very low latency of delivery but more rollbacks

Performance
- Latency of both Fixed Sequencer and PLATO decreases as throughput increases

Performance
- Traffic Spike: PLATO is insensitive to data rate, while Fixed Sequencer depends on data rate

Performance
- Δ is varied adaptively in reaction to rollbacks
- Latency is as good as static Δ parameterization

Conclusion
- First optimistic total order protocol that predicts out-of-order delivery
- Slashes ordering latency in datacenter settings
- Stable at varying loads
- Ordering layer of a time-critical protocol stack for Datacenters
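The PLATO Design slides combine the inter-arrival heuristic with a lazy fixed sequencer and the upcalls optdeliver, confirm, and revoke. The Python sketch below is one simplified reading of that design, not the authors' implementation: the class name `PlatoSketch`, the method names, the rule for revoking the earlier packet of a close pair, and the handling of the sequencer's order message are all assumptions made for illustration. The value Δ = 128 µs is borrowed from the Cornell measurement only as an example.

```python
from collections import deque


class PlatoSketch:
    """Illustrative model of predictive optimistic ordering at one receiver."""

    def __init__(self, delta_us):
        self.delta_us = delta_us
        self.last_arrival_us = None
        self.last_msg = None
        self.pending = deque()     # suspected packets, or packets queued behind them
        self.suspicious = deque()  # optdelivered to the application, not yet confirmed
        self.events = []           # upcalls made to the application, in order

    def on_arrival(self, msg, t_us):
        # Heuristic: two packets arriving within delta_us may be out of order.
        close = (self.last_arrival_us is not None
                 and t_us - self.last_arrival_us < self.delta_us)
        self.last_arrival_us = t_us
        if (close and not self.pending and self.suspicious
                and self.suspicious[-1] == self.last_msg):
            # Assumed rule: the earlier packet of a close pair is suspect too,
            # so its optimistic delivery is rolled back.
            prev = self.suspicious.pop()
            self.events.append(("revoke", prev))
            self.pending.append(prev)
        if close or self.pending:
            self.pending.append(msg)      # wait for the sequencer
        else:
            self.suspicious.append(msg)   # deliver with no ordering delay
            self.events.append(("optdeliver", msg))
        self.last_msg = msg

    def on_sequencer_order(self, order):
        """Apply the sequencer's authoritative order for a batch of messages."""
        order = list(order)
        i = 0
        # Confirm the optimistic deliveries that match the sequencer's order.
        while i < len(order) and self.suspicious and self.suspicious[0] == order[i]:
            self.events.append(("confirm", self.suspicious.popleft()))
            i += 1
        rest = order[i:]
        if not rest:
            return
        # Anything still unconfirmed was delivered too early: revoke, newest first.
        while self.suspicious:
            msg = self.suspicious.pop()
            self.events.append(("revoke", msg))
            self.pending.appendleft(msg)
        for msg in rest:
            if msg not in self.pending:
                break  # not received yet; a real protocol would recover the loss
            self.pending.remove(msg)
            self.events.append(("optdeliver", msg))
            self.events.append(("confirm", msg))


node = PlatoSketch(delta_us=128)
for msg, t_us in [("A", 0), ("B", 1000), ("D", 2000), ("C", 2050)]:
    node.on_arrival(msg, t_us)
node.on_sequencer_order(["A", "B", "C", "D"])
for event in node.events:
    print(event)
```

In this run A and B are delivered immediately and later confirmed, which is the "heuristic works" path. C arrives 50 µs after D, so both are held in the pending queue until the sequencer's order arrives, which is the "heuristic fails" path that pays fixed sequencer latency.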
