Slide 1: SmartRE: An Architecture for Coordinated Network-Wide Redundancy Elimination
Ashok Anand, Vyas Sekar, Aditya Akella
University of Wisconsin-Madison; Carnegie Mellon University

Slide 2: Redundancy Elimination (RE) for Increasing Network Capacity
• RE: leverage repeated transmissions to optimize network capacity.
• Many "narrow" point solutions improve performance today: WAN optimizers, dedup/archival systems, HTTP caches, CDNs, serving video, data centers, web content, backup, and other services for mobile users, enterprises, and home users.
• Can we generalize this transparently, and benefit both users and ISPs?

Slide 3: In-Network RE as a Service
• Routers keep a cache of recent packets. Packets get "encoded" (compressed) with respect to cached packets and "decoded" (uncompressed) downstream.
• Key issues:
  1. Performance: minimize the traffic footprint ("byte-hops").
  2. Cache provisioning: routers have only finite DRAM.
  3. Processing constraints: encoding and decoding are memory-access limited.
• RE as an IP-layer service generalizes the "narrow" deployments, is transparent to users and apps (democratizing the benefits of RE), and benefits ISPs (better traffic engineering, lower load).

Slide 4: In-Network RE as a Service: Hop-by-Hop (Anand et al., SIGCOMM '08)
• Every link encodes and decodes independently, so the same packet is encoded, decoded, and cached many times.
• Hop-by-hop RE is limited by the encoding bottleneck: encoding takes ~15 memory accesses per packet (~2.5 Gbps at 50 ns DRAM), while decoding takes only ~3-4 accesses (>10 Gbps at 50 ns DRAM).
• Performance (leverage all RE): ✔  Cache constraints: ✖  Processing constraints: ✖

Slide 5: In-Network RE as a Service: At the Edge
• Encode at the ingress, decode at the egress.
• Can leverage intra-path RE but cannot leverage inter-path RE, and does not help ISPs (e.g., with traffic engineering).
• Performance (leverage all RE): ✖  Cache constraints: ✔  Processing constraints: ✔

Slide 6: Edge vs. Hop-by-Hop
• Performance (leverage all RE): Edge ✖, Hop-by-Hop ✔
• Cache constraints: Edge ✔, Hop-by-Hop ✖
• Processing constraints: Edge ✔, Hop-by-Hop ✖
• Motivating question: how can we practically leverage the benefits of network-wide RE, optimally?
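The encoding-bottleneck numbers on the hop-by-hop slide can be checked with a back-of-envelope sketch. This is only an illustration: the slides give the access counts and DRAM latency, while the ~250-byte average packet size is an assumption chosen here to show how such a bound is computed.

```python
def throughput_gbps(mem_accesses_per_pkt, dram_latency_ns, avg_pkt_bytes):
    """Upper bound on line rate if every packet costs the given number
    of serialized DRAM accesses."""
    time_per_pkt_ns = mem_accesses_per_pkt * dram_latency_ns
    return (avg_pkt_bytes * 8) / time_per_pkt_ns  # bits per ns == Gbps

# Encoding: ~15 accesses/pkt at 50 ns DRAM -> a few Gbps.
enc = throughput_gbps(15, 50, avg_pkt_bytes=250)
# Decoding: ~3-4 accesses/pkt -> comfortably above 10 Gbps.
dec = throughput_gbps(3.5, 50, avg_pkt_bytes=250)
print(f"encode ~{enc:.1f} Gbps, decode ~{dec:.1f} Gbps")
```

With these assumed values the bound lands in the same ballpark as the slide's ~2.5 Gbps encode / >10 Gbps decode figures, which is why decoders end up waiting on encoders.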
Slide 7: Outline
• Background and Motivation
• High-Level Idea
• Design and Implementation
• Evaluation

Slide 8: SmartRE: High-Level Idea
• Don't look at one link at a time; treat RE as a network-wide problem.
• Cache constraints: "coordinated caches", where each packet is cached only once downstream.
• Processing constraints: encode at the ingress, decode at interior/egress routers; decoding can occur multiple hops after the encoder.
• Performance: a network-wide optimization that accounts for traffic, routing, constraints, etc.
• SmartRE: coordinated, network-wide RE.

Slide 9: Cache Constraints Example
• Packet arrivals: A, B, A, B. The ingress can store 2 packets; each interior router can store 1.
• With uncoordinated caches, after the 4th packet there is RE on the first link but no RE in the interior.
• Total RE savings in network footprint ("byte-hops"): 2 * 1 = 2. Can we do better than this?

Slide 10: Cache Constraints Example: Coordinated Caching
• Same arrivals and cache sizes, but each interior router caches a different packet.
• RE for packet A saves 2 hops; RE for packet B saves 3 hops.
• Total RE savings in network footprint: 1 * 2 + 1 * 3 = 5 byte-hops.

Slide 11: Processing Constraints Example
• Assume 4 memory ops per encoding, 2 per decoding, and a budget of 20 memory ops/s per router, i.e., 5 enc/s or 10 dec/s.
• When every hop encodes at 5 enc/s, downstream decoders run at only 5 dec/s: even though the decoders can do more work, they are limited by the encoders.
• Total RE savings in network footprint ("byte-hops"): 5 * 6 = 30 units/s. Can we do better than this?

Slide 12: Processing Constraints Example: A Smarter Approach
• Decode at the edge for some paths and in the core for others, with ingresses encoding at 5 enc/s and one decoder running at its full 10 dec/s.
• Many nodes are idle, yet this still does better, which also bodes well for partial deployment.
• Total RE savings in network footprint ("byte-hops"):
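The byte-hop accounting used across these examples can be reproduced in a few lines. This is a sketch using the toy values from the slides, with each packet counted as one unit; `footprint_savings` is an illustrative helper, not a function from the paper.

```python
def footprint_savings(matches):
    """Network-footprint savings in byte-hops.

    matches: list of (units_saved, hops_traversed_in_encoded_form),
    one entry per redundant packet (or per-second packet rate) that
    RE shrinks.
    """
    return sum(units * hops for units, hops in matches)

# Cache example, uncoordinated: A and B each shrink on the first link only.
assert footprint_savings([(1, 1), (1, 1)]) == 2
# Cache example, coordinated: A decoded 2 hops downstream, B 3 hops downstream.
assert footprint_savings([(1, 2), (1, 3)]) == 5
# Processing example, hop-by-hop: 5 pkt/s saved on each of 6 links.
assert footprint_savings([(5, 1)] * 6) == 30
# Processing example, coordinated: 10 pkt/s over 3 hops + 5 pkt/s over 2 hops.
assert footprint_savings([(10, 3), (5, 2)]) == 40
```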
10 * 3 + 5 * 2 = 40 units/s (decoding at the edge plus decoding in the core).

Slide 13: Outline
• Background and Motivation
• High-Level Idea
• Design and Implementation
• Evaluation

Slide 14: SmartRE Overview
• A network-wide optimization runs at the NOC.
• It sends "encoding configs" to ingresses and "decoding configs" to interior routers.

Slide 15: Ingress/Encoder Operation
• Check whether this packet needs to be cached, per the encoding config.
• Identify candidate packets in the packet cache and find "compressible" regions with respect to cached packets (Spring & Wetherall, SIGCOMM '00; Anand et al., SIGCOMM '08).
• A shim header carries Info(matched pkt) and a MatchRegionSpec.

Slide 16: Interior/Decoder Operation
• Check whether this packet needs to be cached, per the decoding config.
• Reconstruct "compressed" regions using the reference packets identified by the shim's Info(matched pkt) and MatchRegionSpec.

Slide 17: Design Components
• How do we specify coordinated caching responsibilities?
• What does the optimization entail?
• Correctness: how do ingresses and interior nodes maintain cache consistency?
• How do ingresses identify candidate packets for encoding?

Slide 18: How Do We "Coordinate" Caching Responsibilities Across Routers?
• Non-overlapping hash ranges per path avoid redundant caching (borrowed from cSamp, NSDI '08); e.g., ranges [0, 0.1], [0.1, 0.4], and [0.7, 0.9] along one path.
• Per packet: (1) hash(pkt.header); (2) get the path info for the packet; (3) cache if the hash falls in this router's range for that path.

Slide 20: What Does the "Optimization" Entail?
• Inputs: traffic patterns (traffic matrix and redundancy profile, intra- plus inter-path) and topology (routing matrix and topology map).
• Objective: maximize
footprint reduction (byte-hops), or any ISP objective (e.g., traffic engineering).
• A linear program performs the network-wide optimization, subject to router constraints: processing (memory accesses) and cache size.
• Output: encoding and decoding manifests, i.e., (path, hash range) assignments.

Slide 22: How Do Ingresses and Interior Nodes Maintain Cache Consistency?
• Problem: what if a traffic surge on one path causes packets cached for another path to be evicted?
• Solution: create "logical buckets" for every (path, interior router) pair, and evict only within buckets.

Slide 23: SmartRE: Putting the Pieces Together
• At the NOC, the network-wide optimization takes the traffic and redundancy profile, routing, and device constraints, and sends "encoding configs" to ingresses and "decoding configs" to interior routers.
• Candidate packets for encoding must be available on a new packet's path.
• Non-overlapping per-path hash ranges avoid redundant caching; logical buckets per (path, interior) pair, with eviction only within buckets, keep caches consistent.

Slide 24: Outline
• Background and Motivation
• High-Level Idea
• Design and Implementation
• Evaluation

Slide 25: Reduction in Network Footprint
• Setup: real traces from U. Wisconsin emulated over tier-1 ISP topologies; processing constraints derived from memory ops and DRAM speed; 2 GB cache per RE device.
• SmartRE is 4-5x better than the hop-by-hop approach and achieves 80-90% of ideal, unconstrained RE.
• Results are consistent across redundancy profiles and on synthetic traces.

Slide 26: More Results
• Can we benefit even with partial deployment? Even simple strategies work pretty well.
• What if redundancy profiles change over time? Some "dominant" patterns are stable, so performance stays good even with dated configs.
Slide 27: To Summarize
• RE as a network service is a promising vision
  – Generalizes specific deployments: benefits all users, apps, and ISPs
• SmartRE makes this vision more practical
  – Looks beyond the link-local view; decouples encoding from decoding
  – Network-wide, coordinated approach
• 4-5x better than current proposals
  – Works even with less-than-ideal or partial deployment
• Issues glossed over: consistent configs, decoding gaps, packet losses, routing dynamics
• Other domains: data-center networks, multi-hop wireless, etc.
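The two coordination mechanisms in the design, per-path hash ranges (slide 18) and logical buckets for cache consistency (slide 22), can be sketched together. This is only a sketch: the router names, paths, ranges, and all identifiers here are illustrative (the ranges echo the slide's figure), not taken from the SmartRE implementation.

```python
import hashlib
from collections import OrderedDict

def pkt_hash(header: bytes) -> float:
    """Hash a packet's invariant header fields to a point in [0, 1)."""
    return int.from_bytes(hashlib.sha1(header).digest()[:8], "big") / 2**64

# Hypothetical decoding config: non-overlapping hash ranges for the
# routers on one path.
DECODING_CONFIG = {
    ("R1", "pathA"): (0.0, 0.1),
    ("R2", "pathA"): (0.1, 0.4),
    ("R3", "pathA"): (0.7, 0.9),
}

def should_cache(router: str, path: str, header: bytes) -> bool:
    """Cache iff the hash falls in this router's range for the packet's path."""
    lo, hi = DECODING_CONFIG.get((router, path), (0.0, 0.0))
    return lo <= pkt_hash(header) < hi

class BucketedCache:
    """Logical buckets per (path, router): evictions stay inside a bucket,
    so a surge on one path cannot evict another path's packets."""

    def __init__(self, capacity):
        # capacity: {(path, router): max cached packets in that bucket}
        self.capacity = dict(capacity)
        self.buckets = {k: OrderedDict() for k in capacity}

    def insert(self, path, router, pkt_id, pkt):
        bucket = self.buckets[(path, router)]
        bucket[pkt_id] = pkt
        while len(bucket) > self.capacity[(path, router)]:
            bucket.popitem(last=False)  # FIFO eviction, this bucket only

    def lookup(self, path, router, pkt_id):
        return self.buckets[(path, router)].get(pkt_id)

# Disjoint ranges mean each packet is cached at most once along the path
# (and some packets, by design, are not cached at all).
for i in range(100):
    hdr = f"pkt-{i}".encode()
    assert sum(should_cache(r, "pathA", hdr) for r in ("R1", "R2", "R3")) <= 1

# A surge of red-path packets cannot evict the black path's cached packet.
cache = BucketedCache({("red", "R2"): 2, ("black", "R2"): 1})
cache.insert("black", "R2", "A", b"payload-A")
for i in range(10):
    cache.insert("red", "R2", f"r{i}", b"...")
assert cache.lookup("black", "R2", "A") == b"payload-A"
```

The two assertions at the end correspond directly to the slides' claims: non-overlapping ranges give at-most-once caching per path, and bucket-local eviction isolates paths from each other's traffic surges.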