CacheCast: Eliminating Redundant Link Traffic for Single Source Multiple Destination Transfers

Piotr Srebrny, Thomas Plagemann, Vera Goebel
Department of Informatics, University of Oslo

Andreas Mauthe
Computing Department, Lancaster University

Outline
Problem statement
Idea
Design
Related work
CacheCast evaluation
Conclusions

What is the Problem?
The Internet provides only a mechanism for single source to single destination datagram transmission (unicast)
This is expressed in the IP header
[figure: IP header layout – source address, destination address, payload]

What is the Problem? (cont.)
The Internet – a network of routers and links
[figure: source S sends the same data to clients A, B, and C; identical copies traverse the shared links – redundancy!]

What has been done?
"Datagram Routing for Internet Multicasting", L. Aguilar – explicit list of destinations in the IP header
◦ Follow-ups: XCast, Small Group Multicast
"Host Extensions for IP Multicasting", S. Deering – a destination address denotes a group of hosts
"A Case for End System Multicast", Y.-h. Chu et al. – application level multicast

CacheCast
CacheCast is a network layer caching mechanism that eliminates redundant data transmissions
[figure: with link caches in place, the payload from S crosses each link only once on its way to A, B, and C]

CacheCast Idea
[figure, steps 1–3: the source emits packets carrying payload P addressed to A and B back-to-back; on each link the first packet carries P and fills the link cache, and the payload is removed from the packets that follow]

CacheCast Idea (cont.)
[figure, steps 4–5: at the link exit the cached payload is restored, so every destination receives a complete packet although P crossed each shared link only once]

Link Cache
Point-to-point logical links
Caching is done per link:
◦ Cache Management Unit (CMU)
◦ Cache Store (CS)
[figure: a caching router pairs a CMU at the link entry with a CS at the link exit; routers without these units process packets normally]

Link Cache Requirements
Simple processing
◦ ~72ns to process a minimum size packet on a 10Gbps link and ~18ns on a 40Gbps link
◦ Modern DDR r/w cycle: ~20ns
◦ Modern SRAM r/w cycle: ~5ns
Small cache size
◦ A link queue is scaled to 250ms of the link traffic

Source Requirements
Simple cache processing
◦ A source provides information on the payload: its ID and size
Minimise cache size
◦ A source batches requests for the same data and transmits them within the minimum amount of time

Cacheable Packets
A CacheCast packet carries metadata describing the packet payload:
◦ Payload ID
◦ Payload size
◦ Index
Only packets with the metadata are cached

CMU & CS
Cache miss: packet P1 arrives with its full payload; the CMU records its payload ID under a free index (here 0) and the CS stores the payload under the same index

CMU table            Cache store
Index  Payload ID    Index  Payload
0      P1            0      P1
1      -             1      -
2      -             2      -

CMU & CS (cont.)
Cache hit: packet P2 arrives and its payload ID is already in the CMU table under index 1; the CMU strips the payload and forwards only the header, and the CS at the link exit restores the payload from its store

CMU table            Cache store
Index  Payload ID    Index  Payload
0      P1            0      P1
1      P2            1      P2
2      P3            2      P3
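The hit/miss handling above can be made concrete with a short sketch. Below is a minimal C illustration of the per-link CMU and CS logic, assuming a small table with FIFO slot reuse; the names (cc_metadata, cmu_process, cs_process) and the linear lookup are illustrative assumptions, not the actual implementation, which must meet the per-packet time budget stated earlier.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMU_SLOTS   16      /* illustrative; a real table covers ~10ms of link traffic */
#define MAX_PAYLOAD 2048    /* assumed upper bound on payload size */

/* CacheCast metadata carried in the packet header
 * (fields named on the "Cacheable Packets" slide). */
struct cc_metadata {
    uint64_t payload_id;    /* identifies the payload */
    uint16_t payload_size;  /* size of the (possibly elided) payload */
    uint16_t index;         /* CS slot holding the payload */
};

/* CMU table at the link entry: maps each slot to the payload ID stored there. */
static uint64_t cmu_table[CMU_SLOTS];
static bool     cmu_valid[CMU_SLOTS];
static uint16_t next_slot;  /* FIFO slot reuse, an assumption of this sketch */

/* Decide at the link entry whether the payload can be elided.
 * Returns true on a cache hit (send header only), false on a miss. */
bool cmu_process(struct cc_metadata *md)
{
    for (uint16_t i = 0; i < CMU_SLOTS; i++) {
        if (cmu_valid[i] && cmu_table[i] == md->payload_id) {
            md->index = i;  /* the CS at the link exit holds this payload */
            return true;
        }
    }
    /* miss: claim the next slot and send the packet in full */
    md->index = next_slot;
    cmu_table[next_slot] = md->payload_id;
    cmu_valid[next_slot] = true;
    next_slot = (uint16_t)((next_slot + 1) % CMU_SLOTS);
    return false;
}

/* Cache store at the link exit: a miss packet fills the slot named in the
 * metadata, a hit packet is rebuilt from it. */
static uint8_t cache_store[CMU_SLOTS][MAX_PAYLOAD];

void cs_process(const struct cc_metadata *md, uint8_t *payload, bool truncated)
{
    if (truncated)
        memcpy(payload, cache_store[md->index], md->payload_size);  /* restore */
    else
        memcpy(cache_store[md->index], payload, md->payload_size);  /* store */
}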
Estimating Cache Size
Concept of a packet train
It is sufficient to hold a payload in the CS for the packet train duration
How many packet headers can a source send within that time?

Estimating Cache Size (cont.)
Back-of-the-envelope calculations – headers sent within a packet train duration of:

Source uplink speed   2ms    10ms    50ms
512Kbps               2      8       40
1Mbps                 4      16      79
10Mbps                32     157     781
100Mbps               313    1561    7802

Caches covering a ~10ms packet train time are sufficient

Implication of the Small Size
10ms cache size on a 10Gbps link:
◦ ~12.8MB of storage space for the CS
◦ ~13000 entries in the CMU table
What about a 100Mbps LAN?
◦ ~130KB for the CS
◦ ~130 entries in the CMU table
We can afford that!

Related Work
"Packet Caches on Routers: The Implications of Universal Redundant Traffic Elimination", Ashok Anand et al.
◦ Per link cache
◦ Universal redundancy elimination
◦ No server support

Evaluation
Bandwidth consumption
◦ CacheCast vs. IP Multicast
◦ Unique packet headers
◦ Finite cache sizes
Incremental deployment
◦ Benefits of partial cache deployment
Congestion control
◦ CacheCast impact on TFRC throughput

Bandwidth Consumption
Multicast efficiency metric:

η_m = 1 − L_m/L_u

L_m – total number of multicast links
L_u – total number of unicast links

Example: a tree connecting S to A, B, and C with L_m = 5 and L_u = 9 gives η_m = 1 − 5/9 = 4/9

Bandwidth Consumption (cont.)
A CacheCast packet has a unicast header part (h) and a multicast payload part (p); headers traverse all L_u unicast links, while payloads traverse only the L_m links of the multicast tree
Thus:

η_CC = 1 − (s_h·L_u + s_p·L_m) / ((s_p + s_h)·L_u) = 1/(1 + r) · η_m,  where r = s_h/s_p

E.g.: with packets where s_p = 1416B and s_h = 84B, the efficiency drops by only ~5% compared to ideal multicast

Finite Cache Size
The more destinations, the higher the efficiency η_m
E.g.:
◦ 512Kbps – only 8 headers fit in 10ms, but e.g. 12 destinations to serve
[figure: a packet train to destinations A–L exceeds the 8-header window, so payload P must be carried twice]
Slow sources transmitting to many destinations cannot achieve the maximum efficiency

Finite Cache Size (cont.)
Sources with different uplink speeds transmitting to a growing number of destinations

Uplink     Packet headers per 10ms
512Kbps    8
1Mbps      16
10Mbps     157
100Mbps    1561

Incremental Deployment
The CS and CMU can be deployed incrementally
[figure: link caches added one link at a time along the paths from S]

Incremental Deployment (cont.)
[figure: benefits of partial cache deployment]

Bottleneck Link Test
ns2 implementation
100 TCP flows competing with 100 TFRC flows on a bottleneck link

Bottleneck Link Test (cont.)
Both TCP and TFRC benefit from CacheCast

CacheCast Implementation
Router part
◦ Click Modular Router – CMU and CS elements, in total ~400 lines of code
Server part
◦ Linux kernel – system call msend(fd_set *write_to, fd_set *written, char *buf, int len)
◦ Paraslash tools – a streaming server that uses the msend system call, and a receiver

Testbed
Testbed setup:
◦ Paraslash server (S)
◦ Click Modular Router (R)
◦ Paraslash receivers (A, B)

Testbed Results
Bandwidth consumed only by packet header transmission
msend overhead is negligible

Conclusions
CacheCast is:
◦ A valid solution for single source multiple destination transfers
◦ Simple and reliable
◦ Fully distributed
◦ Incrementally deployable

Thank You for Your Attention!
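Backup: Using msend
A minimal sketch, assuming connected UDP sockets, of how a streaming server such as paraslash might drive the msend system call from the implementation slide. Only the msend signature comes from the slides; the batching loop, the retry semantics of the 'written' set, and the helper send_chunk are illustrative assumptions (msend is a CacheCast kernel extension, not a stock Linux call).

#include <string.h>
#include <sys/select.h>

/* Prototype mirroring the signature shown on the implementation slide. */
extern int msend(fd_set *write_to, fd_set *written, char *buf, int len);

/* Hand one media chunk to all receivers in a single batch, so that the
 * packets carrying the same payload leave back-to-back and fall within
 * the ~10ms link cache window. */
void send_chunk(int socks[], int nsocks, char *chunk, int len)
{
    fd_set write_to, written;

    FD_ZERO(&write_to);
    for (int i = 0; i < nsocks; i++)
        FD_SET(socks[i], &write_to);

    for (;;) {
        FD_ZERO(&written);
        if (msend(&write_to, &written, chunk, len) < 0)
            return;                          /* error handling elided */

        int pending = 0;
        for (int i = 0; i < nsocks; i++) {
            if (FD_ISSET(socks[i], &written))
                FD_CLR(socks[i], &write_to); /* this receiver is served */
            else if (FD_ISSET(socks[i], &write_to))
                pending = 1;                 /* retry the remaining ones */
        }
        if (!pending)
            return;
    }
}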