NET-REPLAY: A NEW NETWORK PRIMITIVE
Ashok Anand, Aditya Akella
University of Wisconsin, Madison

Network is a black box
  - Black box view: end hosts only observe that a packet was lost or delayed
  - No standardized way for the network to tell end hosts where and why a glitch occurred
  - This keeps the network simple and efficient
  - However, end hosts have to resort to complicated logic to infer the nature of glitches

Current tools for locating glitches
  - Probing tools
      - Tulip [SOSP 2003] sends multiple probes to routers on the path and uses their responses to infer the nature of glitches
  - Issues with such an approach
      - Out-of-band troubleshooting
      - Probe packets can be treated differently
      - Transient failures are hard to detect

What if the network could tell end hosts about glitches?
  - The network tells end hosts where and why a glitch occurred
  - End hosts can take better actions
      - No need to reduce the flow rate if packet loss was not due to congestion
      - Route around glitches using alternate routes via multi-homing or recent routing protocol enhancements such as path splicing
  - Benefits emerging applications (e.g. gaming, video streaming), which can achieve more robust network performance

Network-assisted troubleshooting
  - Design requirement
      - Keep the network as simple as possible
      - While enabling end hosts to determine where and why glitches occurred
  - Our design: Net-Replay
      - Routers remember the packets they have forwarded
      - Routers annotate packets (e.g. with their identifier) that they see for the first time
      - When a glitch occurs, the sender replays the packets that experienced the glitch
      - Based on the annotations, the receiver determines the nature of the glitch experienced by the original packet

Using Net-Replay to characterize loss
  [Figure: path from Sender through routers A, B and C to Receiver]
  - Each router remembers the (green) packet and annotates it before forwarding
  - The sender replays the lost packet
  - Packet already present at router A
  - Replayed packet was seen for the first time at B
  - Receiver infers that the packet was dropped on the A-B link

Outline
  - Supporting Net-Replay functionality in the network
  - End hosts using Net-Replay
  - Discussion: deployment, cheating issues and new applications enabled by Net-Replay

Basic support at router
  - Finding whether a new packet was already seen by the router
      - Compute hash, excluding mutable fields (e.g. TTL)
      - Look up hash in Hashstore
  - Remembering a new packet as seen
      - Compute hash
      - Store hash pointing to packet in Hashstore
      - Evict the oldest packet if the Hashstore becomes full
  - Hashstore
      - Simple hash table implementation in DRAM for speeds like 2.4 Gbps
      - SRAM for higher speeds (40 Gbps)
      - 16 MB SRAM currently available

High speed Hashstore implementation
  - Use Bloom filters
  - What about false positives?
      - Can probabilistically report the location of glitches
  - What about packet eviction?
      - Use 2 Bloom filters: primary and secondary
      - When the primary is half filled, start using both
      - When the primary is fully filled, copy the secondary to the primary and clear out the secondary
  - How much time's worth of packets can be stored in 16 MB of SRAM?
      - Up to 3 s at 40 Gbps with an average packet size of 600 bytes
      - Greater than 10 RTTs, assuming RTT < 250 ms
      - Sufficient for end applications to react

How can end hosts use Net-Replay?
  - Characterizing glitches
      - Packet loss
          - Replay the lost packet
      - Delay
          - Router remembers which packets were delayed
          - Replay the delayed packet
      - Reordering
          - If it happened due to route changes, the sender could learn the first router where the route changed
          - Replay the reordered packet

End host protocol stack
  [Figure: the TCP layer reports the nature of glitches to the application or higher layer, which decides how to overcome them: MIRO/path splicing, or no action]
  - The higher layer should decide the policy for handling glitches
  - The TCP layer can tell the higher layer the nature of glitches, e.g. loss
      - After a loss, the TCP layer retransmits the packet (as in current TCP protocols)
      - It uses the retransmitted packet to find the nature of the loss
      - The receiver sends the information about the loss back to the sender along with the ACK

Deployment discussion
  - Partial deployment
      - Net-Replay can be deployed on a few routers and used to find the nature of glitches in path segments
      - Border routers of an ISP and information per domain
  - Avoiding device modifications
      - Can be deployed in 2-port hardware switches as bumps in the wire
  [Figure: Net-Replay agnostic devices with Net-Replay aware bumps in the wire]

Other applications
  - Network tomography uses complicated logic to infer link loss rates
      - With Net-Replay, the location of a loss can be precisely determined
      - Simplifies network tomography
  - Packets can be moved from fast memory to disks in batches
      - Can be used for debugging distributed applications
      - Useful for network operators to examine performance at a fine-grained level

Conclusion
  - Net-Replay helps applications perform in-band characterization of glitches
  - Net-Replay requires simple support from the network infrastructure
  - End hosts can get robust network performance using Net-Replay

Questions
  Thank you

Backup

Hashstore implementation
  - A simple hash table in DRAM (50 ns latency) is good enough at 2.5 Gbps
      - Lookup and store: 100 ns per packet
      - 40 B packets arrive every 128 ns at 2.5 Gbps
  - However, DRAM latency cannot keep up with 40 Gbps; this requires faster memory such as SRAM
  - Current SRAM goes up to 16 MB only; a space-efficient data structure is needed

Cheating issues
  - An ISP inserts wrong annotations to ensure that it is never considered accountable for glitches
      - Chances are that the ISP is caught
  - An ISP modifies the ACK packet if it finds that its router is causing glitches
      - Use encrypted ACKs
  - Possibly other issues; need to investigate
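
Sketch: loss localization with an exact Hashstore

The router behaviour from "Basic support at router" and the inference from "Using Net-Replay to characterize loss" can be put together in a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Hashstore`, `Router`, `send` and `locate_loss`, the dictionary packet layout, and the use of SHA-1 over the source, destination, sequence number and payload are all hypothetical choices made for illustration. It models the simple hash-table variant with oldest-first eviction, not the Bloom-filter one.

```python
import hashlib
from collections import OrderedDict


class Hashstore:
    """Exact hash table with oldest-first eviction (assumed DRAM-style variant)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.seen = OrderedDict()  # insertion order doubles as age order

    @staticmethod
    def digest(packet):
        # Hash only fields that do not change in flight; TTL and annotations are excluded.
        key = f"{packet['src']}|{packet['dst']}|{packet['seq']}|{packet['payload']}"
        return hashlib.sha1(key.encode()).hexdigest()

    def check_and_remember(self, packet):
        """Return True if the packet was already seen; otherwise remember it."""
        h = self.digest(packet)
        if h in self.seen:
            return True
        if len(self.seen) >= self.capacity:
            self.seen.popitem(last=False)  # evict the oldest entry
        self.seen[h] = True
        return False


class Router:
    def __init__(self, name, capacity=1024):
        self.name = name
        self.store = Hashstore(capacity)

    def forward(self, packet):
        # Work on a copy so the sender's original packet stays untouched.
        out = dict(packet, ttl=packet["ttl"] - 1,
                   annotations=list(packet["annotations"]))
        if not self.store.check_and_remember(packet):
            out["annotations"].append(self.name)  # first sighting: annotate
        return out


def send(packet, path, drop_before=None):
    """Forward along `path`; drop on the link entering router `drop_before`."""
    for router in path:
        if router.name == drop_before:
            return None  # packet lost before reaching this router
        packet = router.forward(packet)
    return packet


def locate_loss(replayed, path_names):
    """Receiver side: the first router that annotated the replay had never
    seen the original, so the loss happened on the link just before it."""
    if not replayed["annotations"]:
        return (path_names[-1], "receiver")  # every router had seen the original
    first_new = replayed["annotations"][0]
    i = path_names.index(first_new)
    upstream = path_names[i - 1] if i > 0 else "sender"
    return (upstream, first_new)


# Usage: the original packet is dropped on the A-B link, then replayed.
path = [Router("A"), Router("B"), Router("C")]
pkt = {"src": "10.0.0.1", "dst": "10.0.0.2", "seq": 7,
       "payload": "hello", "ttl": 64, "annotations": []}
assert send(pkt, path, drop_before="B") is None
replayed = send(pkt, path)
print(locate_loss(replayed, ["A", "B", "C"]))  # ('A', 'B')
```

In this sketch the replayed packet carries no special flag; routers annotate any packet they have not seen before, so the original and the replay are told apart only by which routers annotate them.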
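
Sketch: two-Bloom-filter Hashstore

The eviction scheme from "High speed Hashstore implementation" (primary and secondary Bloom filters) could look roughly as follows. This is one possible reading of the slide, not a confirmed design: it assumes "filled" means the number of insertions has reached a configured `capacity`, and the class names, the filter size `m_bits`, the hash count `k` and the use of salted BLAKE2b are hypothetical.

```python
import hashlib


class Bloom:
    """Plain Bloom filter; `count` tracks insertions, not distinct items."""

    def __init__(self, m_bits, k):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)
        self.count = 0

    def _positions(self, item):
        # Derive k bit positions from k differently salted hashes.
        for i in range(self.k):
            h = hashlib.blake2b(item, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


class TwoFilterHashstore:
    """Remembers roughly the last capacity/2 to capacity packet hashes."""

    def __init__(self, capacity, m_bits=1 << 16, k=4):
        self.capacity, self.m_bits, self.k = capacity, m_bits, k
        self.primary = Bloom(m_bits, k)
        self.secondary = Bloom(m_bits, k)

    def seen(self, item):
        # Lookups consult the primary only; false positives are possible.
        return item in self.primary

    def remember(self, item):
        # Once the primary is half filled, mirror new items into the secondary.
        if self.primary.count >= self.capacity // 2:
            self.secondary.add(item)
        self.primary.add(item)
        # When the primary is full, the secondary (recent half) takes its
        # place and a fresh, empty secondary is started.
        if self.primary.count >= self.capacity:
            self.primary = self.secondary
            self.secondary = Bloom(self.m_bits, self.k)


# Usage: with capacity 4, six insertions trigger two rotations.
store = TwoFilterHashstore(capacity=4)
for n in range(6):
    store.remember(f"p{n}".encode())
print(store.seen(b"p5"), store.primary.count)  # True 2
```

Because a Bloom filter cannot delete individual entries, rotating two filters is what approximates "evict the oldest packet"; the cost is that the remembered window varies between half and all of the configured capacity.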