The Power of Prediction: Cloud Bandwidth and Cost Reduction

Eyal Zohar, Israel Cidon (Technion)
Osnat (Ossi) Mokryn (Tel-Aviv College)
SIGCOMM 2011

Traffic Redundancy Elimination (TRE)

Traffic redundancy stems from downloading the same or similar information items.
We found around 70% redundancy in end-client traffic, compared with past traffic and local files.

TRE Importance

Moving to the cloud => higher e2e traffic.
Cloud users pay for the traffic they actually use => incentive to use TRE.

[Figure: the end-user exchanges cloud traffic with the cloud user's application, which pays the cloud provider per use; TRE is applied to the cloud traffic.]

How TRE Works

The server parses the outgoing stream into content-based chunks and signs each chunk with SHA-1.

[Figure: a rolling hash over the byte stream places Anchors 1 to 4, which delimit Chunks 1 to 3; each chunk gets a SHA-1 signature (Sign. 1 to 3). Insertion example: new bytes inserted into Chunk 2 yield Chunk 1, Chunk 2', Chunk 3.]

Problems in Existing Solutions

In the cloud environment:
1. High processing costs in the cloud.
2. Scalability: the server must remember each client.
3. Elasticity: a server is unaware of data received from other sources.
4. Long-term repeats (days/weeks) are not handled.

[Figure: one receiver served by Server 1 and Server 2.]

Our Solution: PACK (Predictive ACK)

Redundancy is detected by the client.
Repeats appear in chains.
The client tries to match incoming chunks with a previously received chain or a local file.
It sends the server predictions of the future data.

PACK: The Client Prediction

[Figure: received stream chunks (Chunk 1 to 3) and their SHA-1 signatures (Sign. 1 to 3) form a chain of chunks. A prediction carries the TCP seq. of the chunk, a last-byte hint, and a SHA-1 signature.]

Each prediction contains:
1. TCP seq.: no server parsing is needed.
2. Hint: spares unnecessary SHA-1 computations.
3. SHA-1 signature.

PACK: Server Operation

The server compares the hint with the last byte of the data to sign.
Upon a hint match it performs the expensive SHA-1.
PACK saves the cloud's computational effort in the absence of redundancy.
First receiver-based TRE: the server does not parse. It signs with >99% confidence.
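
The prediction check described on the last two slides can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Prediction` field names, the explicit `length` field, the `base_seq` parameter and both function names are assumptions introduced here; the slides only state that a prediction carries a TCP sequence number, a last-byte hint and a SHA-1 signature.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical layout of one client prediction (names are assumptions)."""
    seq: int      # TCP sequence number where the predicted chunk starts
    length: int   # predicted chunk length in bytes (assumed field)
    hint: int     # last byte of the predicted chunk
    sha1: bytes   # SHA-1 signature of the predicted chunk


def make_prediction(seq: int, chunk: bytes) -> Prediction:
    """Client side: describe a chunk it already holds and expects to see again."""
    if not chunk:
        raise ValueError("chunk must be non-empty")
    return Prediction(seq, len(chunk), chunk[-1], hashlib.sha1(chunk).digest())


def server_verifies(outgoing: bytes, base_seq: int, p: Prediction) -> bool:
    """Server side: cheap hint test first, SHA-1 only when the hint matches.

    `outgoing` is the buffered outgoing data and `base_seq` the sequence
    number of its first byte (both assumed for this sketch).
    """
    start = p.seq - base_seq
    end = start + p.length
    if p.length <= 0 or start < 0 or end > len(outgoing):
        return False  # the prediction does not cover buffered data
    if outgoing[end - 1] != p.hint:
        return False  # hint mismatch: the SHA-1 computation is skipped
    return hashlib.sha1(outgoing[start:end]).digest() == p.sha1


if __name__ == "__main__":
    chunk = b"hello world"
    pred = make_prediction(100, chunk)
    buffered = b"0123456789" + chunk + b"!!"
    print(server_verifies(buffered, 90, pred))
```

Because the sequence number pins down where the predicted chunk would sit, the server in this sketch never runs a rolling hash over its own data, and a single byte comparison filters out most wrong predictions before any SHA-1 work is done.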
[Figure for "PACK: Server Operation": a client, holding a chain of chunks in local storage, and a server.]

PACK Benefits

Minimizes processing costs induced by TRE.
– Signs with SHA-1 only in the presence of redundancy.
Receiver-based end-to-end TRE => suitable for cloud server elasticity and client mobility.
– Does not require the server to continuously maintain clients' status.

Server Effort Experiment

Several data-sets in 3 modes: baseline no-TRE, PACK, and a sender-based TRE.
25%-30% redundancy is common to many data-sets.

[Figure: single-server cloud operational cost (100% = without a TRE system) versus redundancy elimination ratio (0% to 50%), for an EndRE-like sender-based TRE and for PACK.]

YouTube Redundancy

Traces of 40k clients, captured at an ISP.
Found 30% end-to-end (personal) redundancy.

[Figure: all YouTube traffic (Gbps) and PACK TRE (removed redundancy, %) over time (24 hours).]

Long-Term TRE

Social network: eliminated 30% with a one-hour cache and 75% with a long-term cache.

[Figure: average redundancy of daily traffic versus days since start, for 1-hour, 24-hour and unlimited caches.]

Cloud Email Redundancy

Gmail account with 1,000 Inbox messages.
Found 32% static redundancy (higher when messages are read multiple times).

[Figure: traffic volume per month (MB), redundant versus non-redundant, January to December.]

Implementation

Linux with Netfilter Queue, 25k lines of C and Java, available for download.
The receiver-sender protocol is embedded in the TCP Options field.
Transparent use at both sides.

Processing Effort in the Client

Laptop experiment: PACK-related CPU consumption is ~4% when playing HD video (9 Mbps with 30% redundancy).
Smartphone experiment: PACK consumes ~3% of the battery power when processing 1 GB of video (avg. monthly data plan).
Virtual traffic saves the client the need to chunk or sign.

New Chunking Algorithm

Most existing solutions use Rabin fingerprint.

[Figure: a 64-bit value with Mask = 00 00 8A 31 10 58 30 80, built from stream bytes labeled n, n-1, ..., n-8 and n-40, ..., n-47.]

Summary

Current TRE solutions may not reduce cloud cost.
PACK is the first receiver-based TRE; it leverages the power of prediction.
Minimizes processing costs induced by TRE.
Suitable for cloud server migration and client mobility.
Implementation is available for download.
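
As a footnote to the New Chunking Algorithm slide, here is one way the 64-bit mask could drive content-based chunking. The slides give only the mask and the byte positions n to n-47, so the update rule below (shift the 64-bit value left by one bit, then XOR in the next byte) is an assumption, as are the function names. It was picked because each byte then drops out of the masked bits after exactly 48 steps, which agrees with the byte positions in the figure. The mask has 13 set bits, so on uniformly random data an anchor would be expected about once every 2^13 = 8192 bytes.

```python
MASK = 0x00008A3110583080   # mask value from the slide
WINDOW = 48                 # bytes n .. n-47 in the slide's figure
U64 = (1 << 64) - 1         # keep the rolling value within 64 bits


def find_anchors(data: bytes) -> list:
    """Return the offsets just past each anchor byte (assumed update rule)."""
    anchors = []
    longval = 0
    for i, byte in enumerate(data):
        # Assumed rolling update: shift left one bit, XOR in the new byte.
        longval = ((longval << 1) & U64) ^ byte
        # Declare an anchor only once a full window has been processed
        # and every masked bit is set.
        if i + 1 >= WINDOW and (longval & MASK) == MASK:
            anchors.append(i + 1)
    return anchors


def split_chunks(data: bytes) -> list:
    """Cut the data at every anchor; the tail after the last anchor is a chunk."""
    cuts = [0] + find_anchors(data)
    if cuts[-1] != len(data):
        cuts.append(len(data))
    return [data[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    stream = bytes(rng.getrandbits(8) for _ in range(100_000))
    chunks = split_chunks(stream)
    print(len(chunks), "chunks,", sum(map(len, chunks)), "bytes in total")
```

Since the anchor test depends only on the last 48 bytes, inserting bytes early in a stream shifts the later anchors without changing which content they fall on. That is the property the insertion example in "How TRE Works" relies on.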