Fine-Grained Latency and Loss Measurements in the Presence of Reordering Myungjin Lee, Sharon Goldberg, Ramana Rao Kompella, George Varghese Trend toward low-latency networks Low latency: one of important metrics in designing a network Switch vendors introduce switches that provide low latency Financial data center begins to demand more stringent latency Benefits of low-latency networks Content Provider Our network provides E-to-E latency SLA of a few μseconds Brokerage Financial Service Provider Network Low latency An automated trading program can buy shares cheaply A cluster application can run 1000’s more instructions But… Guaranteeing low latency in data centers is hard Reason 1: No traffic models for different applications Congestion needs to be less than a certain level Hinders managers from predicting offending applications Reason 2: New application’s behavior is often unforeseen until it is actually deployed E.g., TCP incast problem [SIGCOMM ’09] Latency & loss measurements are crucial Need latency & loss measurements on a continuous basis Detect problems Fix: re-routing offending application, upgrading links, etc. Goal: Providing fine-grained end-to-end aggregate latency and loss measurements in data center environments Content Provider Brokerage A B E-to-E latency and loss measurements Measurement model Content Provider Out-of-order delivery Filter Financial Service A Multiple paths Provider Brokerage … B Brokerage Network Filter Out-of-order packet delivery due to multiple paths Packet filtering associates packet stream between A and B Time synchronization: IEEE 1588, GPS clock, etc. No header changes: Regular packets carry no timestamp Measurement model Router A Router B Content Filter Provider Interval Interval Financial Service A Message Provider Network Measurement B Interval Filter Interval message: A special ‘sync’ control packet to mark off a measurement interval Message Brokerage Injected by measurement modules at an edge (e.g., Router A) Measurement interval: A set of packets ‘bookended’ by a pair of interval messages Existing solutions Active probes Storing timestamps and packet digests locally Problem: Not effective due to huge probe rate requirement Problem: Significant overhead for communication Packet sampling: Trade-off between accuracy and overhead Lossy Difference Aggregator (LDA) [Kompella, SIGCOMM ’09] State-of-the-art solution with FIFO packet delivery assumption Problem: Not suitable in case where packets can be reordered LDA in packet loss case Bucket Interval Message Packet count Timestamp sum X 2 A 1 5 7 Router Hash 2 1 1 6 2 1 2 9 11 3 B 9 Router Hash Corrupted bucket 2 1 12 3 1 11 (3 – 1) + (9 – 5) + (11 – 7) 12 – 6 True delay = =3 Estimated delay = 2 3 = 3.3 Estimation error = 9% Key point: Only useful buckets must be used for estimation A useful bucket: a bucket updated by the same set of packets at A and B Bad packets: lost packets to corrupt buckets LDA in packet loss + reordering case Freeze buckets Packet count Timestamp sum X 2 A 1 5 7 Router Hash 2 1 1 6 2 1 2 9 Hash No reordering 12 12 3 True delay = 3.3 12 + 24 – 6 – 9 = 5.25 Estimated delay = 4 11 13 3 B 9 Router Reordering 2 1 11 24 Freeze buckets after update Estimation error = 59% Problem: LDA confounds loss and reordering Packet count match in buckets between A and B is insufficient Reordered packets are also bad packets Significant error in loss and aggregate latency estimation True delay = 3.3 Quick fix of LDA: per-path LDA Let LDA operate on a per-path basis Exploit the fact that packets in a flow are not reordered by ECMP Issues (1) Associating a flow with a path is difficult (2) Not scalable: potentially need to handle millions of separate TCP flows Packet reordering in IP networks Today’s trend New trend: Data centers exploit the path diversity No reordering among packets in a flow No reordering across flows between two interfaces ECMP splits flows across multiple equal-cost paths Reordering can occur across flows Future direction: Switches may allow reordering within switches for improved load balancing and utilization Reordering-tolerant TCP for use in data centers Proposed approach: FineComb Objective Detect and correct unusable buckets Control the number of unusable buckets Key ideas 1) Incremental stream digests: Detect unusable buckets 2) Stash recovery: Make corrupted buckets useful by correction 3) Packet sampling: Control the number of bad packets included Incremental stream digests (ISDs) An ISD = H(pkt1) H(pkt2) … H(pktk) Property 1: Low collision probability is an invertible commutative operator (e.g., XOR) Two different packet streams hash to different value Allows to detect corrupted buckets Property 2: Invertibility Easy addition/subtraction of a packet digest from an ISD The basis of stash recovery ISDs handles loss and reordering X 2A 04 A 03 06 Router Hash Packet count Timestamp sum ISD 2 1 1 6 2 1 2 9 03 09 04 2E 2A 03 B 06 10 Router Hash value Hash ISDs don’t match 2 1 12 3 2 1 11 24 03 09 3A 2A ISDs detects corrupted buckets by loss and reordering Buckets are usable only if both packet counts and ISDs match each other between A and B True delay = 3.3 Latency and loss estimation Router A Packet count Timestamp sum ISD 2 6 09 2 9 2E 3 9 A1 Router B 2 12 09 2 1 24 19 3A 9C Average latency estimation Delay sum = (12 – 6) + (0 – 0) + (0 – 0) = 6 Count = 2 + 0 + 0 = 2 Average latency = 3.0 Loss estimation Loss count sum = (2 – 2) + (2 – 2) + (3 – 1) = 3 Total packets = 2 + 2 + 3 = 7 Loss rate = 0.43 Stash recovery Stash: A set of (timestamp, bucket index, hash value) tuple of packets which are potentially reordered (-) stash Contains packets potentially added to a receiver (Router B) In recovery, packet digests are subtracted from bad buckets at a receiver (+) stash Contains packets potentially missing at a receiver (Router B) In recovery, packet digests are added to bad buckets at a receiver Stash recovery A bad bucket can be recovered iff reordered packets corrupted it Reordered packets are not counted as lost packets Increase loss estimation accuracy (–) Stash in B A bucket in A 2 12 A bucket in B 2E ISDs match don’t ISDs match 1 3 – 34 3E 1 5 2 10 04 2 29 32 2E 3A { { { 2 04 1 5 10 All subsets 1 2 04 1 5 10 1 2 04 } } 1 5 10 } Sizing buckets and stashes Known loss and reordering rates Given a fixed storage size, we obtain the optimal packet sampling rate (p*) We provision stash and buckets based on the the p* Unknown loss and reordering rates Use multiple banks optimized for different set of loss and reordering rate Details can be found in our paper Accuracy of latency estimation Average relative error Packet loss rate = 0.01%, #packets = 5M, true mean delay = 10μs 1000x difference Reordering rate FineComb: ISD+stash, FineComb-: ISD only Accuracy of loss estimation Average relative error Packet loss rate = 0.01%, #packets = 5M Stash helps to obtain accurate loss estimation Reordering rate Summary Data centers require end-to-end fine-grain latency and loss measurements We proposed a data structure called FineComb Resilient to packet loss and reordering Incremental stream digest detects corrupted buckets Stash recovers buckets only corrupted by reordered packets Evaluation shows FineComb achieves higher accuracy in latency and loss estimation than LDA Thank you! Questions? Backup Average relative error Microscopic loss estimation Reordering rate Average relative error Handling unknown loss & reordering rates Reordering rate LDA: 2-banks, FineComb: 4-banks with same memory size