Web Caching on Smartphones: Ideal vs. Reality

Feng Qian¹, Kee Shen Quah¹, Junxian Huang¹, Jeffrey Erman², Alexandre Gerber², Z. Morley Mao¹, Subhabrata Sen², Oliver Spatscheck²
¹University of Michigan  ²AT&T Labs - Research
June 27 2012

Mobile Traffic: An Explosive Growth

Year                                              2011    2012    2013    2014    2015    2016
Global Mobile Data Traffic per Month (10^6 TB)    0.6     1.3     2.4     4.2     6.9     10.8
Avg. Smartphone Traffic per Month (MB)            150                                     2576
(Avg. smartphone traffic: 1600% increase from 2011 to 2016)

Source: Cisco Visual Networking Index (VNI) Global Mobile Data Traffic Forecast, 2011-2016

• Deployment of cellular infrastructures: much slower
  – Spectrum shortage and economic issues
  – The cellular infrastructure spending in 2011 was expected to be only a 6.7% rise over 2010

Web Caching on Cellular Devices

• The big picture: traffic redundancy elimination
• The first network-wide study of redundant transfers caused by inefficient HTTP caching on cellular devices
  – HTTP: The dominant app-layer protocol for ~20 years
  – Caching: Huge benefits, but complex
  – Caching on cellular devices:
      Reduces redundant data transferred over the RAN
      Improves performance due to reduced latency
      Cuts cellular bills for customers

Background: Caching in HTTP 1.1

• Use Expiration and Revalidation to ensure caching consistency
• Before expiration: the client can safely assume the freshness of the cached file
• After expiration: the client must send a revalidation message to the server to query the freshness of the cache entry
• The protocol has been well known for 20 years. What is the state-of-the-art in the context of cellular devices?

[Figure: example client-server exchange showing Last-Modified and Expires response headers, an If-Modified-Since revalidation request, and a 304 Not Modified response]
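
The expiration and revalidation rules on the background slide can be sketched as a small client-side decision function. This is a minimal illustration under stated assumptions, not the authors' simulator: the names `CacheEntry`, `next_action`, `revalidation_headers` and `after_revalidation` are hypothetical, only the `Expires` and `Last-Modified` headers are modeled, and the dates are illustrative values.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional


@dataclass
class CacheEntry:
    """Hypothetical cache record holding the two headers used on the slide."""
    last_modified: datetime  # from the Last-Modified response header
    expires: datetime        # from the Expires response header


def next_action(entry: Optional[CacheEntry], now: datetime) -> str:
    """Decide what a well-behaved client does for a request at time `now`."""
    if entry is None:
        return "FETCH"        # cache miss: a full transfer is unavoidable
    if now < entry.expires:
        return "USE_CACHE"    # before expiration: no network traffic needed
    return "REVALIDATE"       # after expiration: ask the server first


def revalidation_headers(entry: CacheEntry) -> Dict[str, str]:
    """Build the conditional request header sent when revalidating."""
    return {"If-Modified-Since": format_datetime(entry.last_modified, usegmt=True)}


def after_revalidation(status: int) -> str:
    """304 means the cached copy is still valid; otherwise replace it."""
    return "USE_CACHE" if status == 304 else "REPLACE_ENTRY"


# Illustrative values only.
entry = CacheEntry(
    last_modified=datetime(2012, 2, 1, 15, 0, 0, tzinfo=timezone.utc),
    expires=datetime(2012, 2, 10, 15, 0, 0, tzinfo=timezone.utc),
)
before = datetime(2012, 2, 5, 12, 0, 0, tzinfo=timezone.utc)
after = datetime(2012, 2, 12, 12, 0, 0, tzinfo=timezone.utc)
print(next_action(entry, before))   # request made before the entry expires
print(next_action(entry, after))    # request made after the entry expires
print(revalidation_headers(entry))
```

A redundant transfer, in the sense used in this talk, occurs when a client fetches the full file in a situation where this function would have answered `USE_CACHE`, or where a revalidation would have returned 304.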

Measurement Goal

• Goal: understand the state-of-the-art in HTTP caching on cellular devices
• What to study: redundant transfers caused by inefficient HTTP caching
• Potential cause: HTTP implementation related
  – Caching logic (client/server) not following HTTP spec
  – Limited cache size
  – Non-persistent cache
  (Callout) They account for 20% of the total HTTP traffic volume!
• Potential cause: application semantics related
  – Server conservatively sets headers to make files uncacheable or expire too soon

Measurement Data

Name                  ISP                                         UMICH
Collection period     May 20 2011 (24 hours)                      May to Oct 12 2011 (5 months)
Collection location   Commercial cellular core network            Directly on user handsets
Data format           695 million records of HTTP transactions    Full packet trace with payload of all traffic
Traffic volume        24.3 TB                                     118 GB
Dataset size          271 GB                                      119 GB
# Users               About 2.9 million                           20 U of Michigan students
Platforms             Multiple (mainly iOS and Android)           Android 2.2

[Figure: user interface for the data collector/uploader software]

Methodology

• A simulator strictly follows HTTP/1.1 caching logic (RFC 2616)
  – Expiration and freshness calculation mechanism
  – Non-cacheable objects
  – Partial caching due to byte-range requests and broken connection
  – LRU cache replacement algorithm, and more …
• Feed each user’s HTTP transactions to the cache simulator
• Redundant transfers are accurately identified in the simulation process
• HTTP caching is not simple: 2K C++ LoC even for the simulation core

Cacheability and Redundancy

• File cacheability: for both datasets
  – Most bytes (70% to 78%) and most files (66% to 72%) are cacheable.
• Traffic Redundancy (assuming unlimited cache size)

Dataset    % Redundancy (HTTP only)    % Redundancy (HTTP + non-HTTP)
ISP        17.7%                       N/A
UMICH      20.3%                       17.3%

  (Callout) Under-estimation due to HTTPS and app-semantic-related redundancy

• Root causes of redundant transfers (within all HTTP traffic)

Origin of redundancy                                                                        Issue     ISP      UMICH
1. Handset issues a request before local copies expire                                      Client    15.9%    16.3%
2. Handset does not revalidate after local copies expire (the file unchanged)               Client    1.8%     4.0%
3. Server does not recognize revalidation after local copies expire (the file unchanged)    Server    <0.1%    <0.1%

Limited Cache Size and Non-persistent Cache

• Which factor has the main responsibility for redundancy?
  – Problematic caching logic
  – Limited cache size: HTTP traffic savings are 17% with an unlimited (∞) cache and 13% with a 4 MB cache. The benefits are significant even for a small cache.
  – Non-persistent cache: 59% of consecutive cache hits occur within 1 min. It is unlikely that the handset is rebooted during such a short interval.
• How large does the cache need to be?
  – A cache of 50 MB achieves 90% of the gain (w.r.t. traffic reduction) compared to an unlimited cache

[Figure: distribution of intervals between consecutive cache hits on the same entry (ISP trace)]

Quantifying the Resource Impact of Redundant Traffic

• In cellular networks, we also care about cellular resources
• Use our trace-driven RRC state machine simulator with a handset radio power model [Qian et al., MobiSys 11]
  – Applied to only cellular traffic within UMICH dataset
• Three important metrics characterizing cellular resource consumption:
  – D: radio resource consumption
  – S: signaling load
  – E: handset radio energy consumption
• Compute the impact: $\Delta E = (E_0 - E_R) / E_0$
  – $\Delta E$: radio energy impact of redundant transfers (a positive value)
  – $E_R$: radio energy consumption in modified traces with redundant transfers removed
  – $E_0$: radio energy consumption in original traces

Quantifying the Resource Impact of Redundant Traffic

               ΔS (signaling load impact)    ΔE (radio energy impact)    ΔD (radio resource impact)
HTTP only      27%                           26%                         27%
All traffic    6%                            7%                          9%

• When redundant and other traffic coexist, only eliminating redundant traffic may not reduce resource consumption
  – As long as one of the concurrent transfers exists, the radio is on (i.e., consuming resources)
• Non-HTTP traffic plays a role (push notification and chatting)
  – Traffic volume: small (1%); resource impact: high (18%)
  – Resource release is controlled by fixed inactivity timers
  – Sending small data incurs high resource overhead

Testing HTTP Libraries and Browsers

• Verify measurement findings by testing popular HTTP libraries and browsers on real handsets
• Design 13 controlled tests to cover all important aspects of caching implementation
• Revisit: which factor has the main responsibility for redundancy?
  – Problematic caching logic
  – Limited cache size
  – Non-persistent cache

Feature tests (is it well-supported?)    Attribute tests (infer the parameters)
1. Basic caching                         1. Shared or non-shared?
2. Revalidation                          2. Persistent or non-persistent?
3. Various non-caching directives        3. Cache entry size limit
4. Various expiration directives         4. Total cache size
5. URL with query strings                5. Cache entry replacement policy
6. Partial caching                       6. Heuristic freshness lifetime
7. Redirection caching

Testing HTTP Libraries and Browsers

• Basic caching test
  – Handset requests a small cacheable file f
  – Server transfers f with a proper Expires directive
  – Client requests f again before it expires
  – PASS iff the 2nd request does not incur any network traffic
• Cache size test: perform binary search
• Cache replacement policy test: try popular algorithms (LRU, LFU, FIFO)
• See paper for all 13 tests

Test Results

HTTP library / browser                Smartphone OS version    Support caching?    Caching enabled by default?
java.net.URLConnection                Android 2.3              No                  No
java.net.HttpURLConnection            Android 2.3              No                  No
org.apache.http.client.HttpClient     Android 2.3              No                  No
android.webkit.WebView                Android 2.3              No                  Yes
android.net.http.HttpResponseCache    Android 4.0.2            Partially           No
Three20 (Version 1.0.6.2)             iOS 4.3.4                No                  No
NSURLRequest                          iOS 5.0.1                Partially           No
ASIHTTPRequest (Version 1.8.1)        iOS 4.3.4                Partially           No
Android Browser                       Android 2.3              Partially           Yes
iPhone Browser                        iOS 4.3.4/5.0.1          Partially           Yes
Chrome Browser                        Android 4.0.2            Yes                 Yes

Implementation issues of caching:
• 4 out of 8 libraries do not support caching at all.
• For both browsers, when loading the same URL back-to-back, the second request is treated as a full reload from the remote server
• Android browser uses a small cache of 8MB
• Partial caching is not supported
• Some do not properly handle Pragma:no-cache or Cache-Control:no-cache.
• …

(Callout) A huge gap between protocol specification and implementation, leading to significant redundancy of network traffic.

Summary

• The first network-wide study of cellular HTTP caching
• Redundant transfers are prevalent
  – 18% (ISP) and 20% (UMICH) of HTTP traffic volume
  – 17% of overall traffic volume (UMICH)
  – 6% to 9% of cellular resource consumption (UMICH)
  – The root cause: problematic caching logic on handsets
  – Validated by caching tests of popular libraries and browsers

Backup Slides

Diversity Among Applications

• Identifying smartphone applications
  – ISP: by user-agent fields in HTTP requests
  – UMICH: by the captured packet-process correspondence
• Diversity among top apps
  – HTTP redundancy ratios range from 0.0% to 100.0%
• Validate apps with high redundancy ratios (> 90%)
  – Analyze locally collected tcpdump traces
  – They do not cache HTTP responses
• Some apps have negligible redundant transfers
  – Almost all bytes are not cacheable, e.g., all requests are HTTP POST instead of HTTP GET

The Cache Simulator (Simplified Version)

    foreach HTTP transaction r
        if (file is not storable) then
            assign_label(r, NOT_STORABLE); continue;
        else if (cache entry not exists) then
            assign_label(r, CACHE_ENTRY_NOT_EXIST);
        else if (cache entry not expired) then
            assign_label(r, NOT_EXPIRED_DUP); continue;
        else if (file changed) then
            assign_label(r, FILE_CHANGED);
        else if (HTTP 304 used) then
            assign_label(r, HTTP_304);
        else if (revalidation not performed) then
            assign_label(r, EXPIRED_DUP);
        else
            assign_label(r, EXPIRED_DUP_SVR);
        update_cache_entry(r);
    endfor

The simulation algorithm:
• Performs fine-grained caching simulation on a per-user basis
• Assigns to each HTTP transaction a label indicating its caching status
• Red labels (NOT_EXPIRED_DUP, EXPIRED_DUP, EXPIRED_DUP_SVR) correspond to duplicated transfers
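
The simplified labeling loop above can be made runnable as follows. This is a sketch under stated assumptions rather than the authors' 2K-line C++ simulator: the `Txn` record and its fields are hypothetical, the cache is unlimited (no LRU replacement), partial caching is ignored, and "file changed" is approximated by comparing a content digest.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Txn:
    """Hypothetical per-transaction record; field names are assumptions."""
    url: str
    time: float        # request time in seconds
    storable: bool     # False for e.g. "Cache-Control: no-store"
    expires: float     # absolute expiration time carried by the response
    body_hash: str     # digest of the returned file ("" for a 304)
    conditional: bool  # request carried a revalidation header
    status: int        # 200 or 304


def label_transactions(txns: List[Txn]) -> List[str]:
    cache: Dict[str, Tuple[float, str]] = {}  # url -> (expires, body_hash)
    labels: List[str] = []
    for r in txns:
        entry = cache.get(r.url)
        if not r.storable:
            labels.append("NOT_STORABLE")
            continue
        elif entry is None:
            label = "CACHE_ENTRY_NOT_EXIST"
        elif r.time < entry[0]:
            labels.append("NOT_EXPIRED_DUP")   # requested before expiry
            continue
        elif r.status != 304 and r.body_hash != entry[1]:
            label = "FILE_CHANGED"
        elif r.status == 304:
            label = "HTTP_304"                 # proper revalidation
        elif not r.conditional:
            label = "EXPIRED_DUP"              # handset skipped revalidation
        else:
            label = "EXPIRED_DUP_SVR"          # server ignored revalidation
        labels.append(label)
        # update_cache_entry(r): a 304 keeps the old body, refreshes expiry
        body = entry[1] if (r.status == 304 and entry is not None) else r.body_hash
        cache[r.url] = (r.expires, body)
    return labels


trace = [
    Txn("a", 0, True, 100, "h1", False, 200),
    Txn("a", 50, True, 150, "h1", False, 200),
    Txn("a", 120, True, 200, "h1", False, 200),
    Txn("a", 250, True, 300, "", True, 304),
    Txn("a", 350, True, 400, "h1", True, 200),
    Txn("a", 450, True, 500, "h2", False, 200),
    Txn("b", 460, False, 0, "x", False, 200),
]
for txn, lab in zip(trace, label_transactions(trace)):
    print(txn.url, txn.time, lab)
```

Summing the bytes of transactions labeled `NOT_EXPIRED_DUP`, `EXPIRED_DUP` and `EXPIRED_DUP_SVR` would give the redundant volume for one user under these assumptions.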

Meaning of each label:
• NOT_STORABLE: The file contains "Cache-Control: no-store". It cannot be cached.
• CACHE_ENTRY_NOT_EXIST: Cache miss.
• NOT_EXPIRED_DUP: Duplicated transfer: the file has not changed. A request is issued before the file expires.
• FILE_CHANGED: The file has changed after the cache entry expires.
• HTTP_304: The file has not changed after the cache entry expires, and a cache revalidation is properly performed.
• EXPIRED_DUP: Duplicated transfer: the file has not changed after the cache entry expires, but the handset does not perform cache revalidation.
• EXPIRED_DUP_SVR: Duplicated transfer: the file has not changed after the cache entry expires, but the server does not recognize the cache revalidation.

Background: Radio Resource Management in Cellular Networks

• RRC (Radio Resource Control) state machine [3GPP TS 25.331]
  – State promotions have promotion delay
  – State demotions incur tail times

RRC State    Channel                  Radio Power
IDLE         Not allocated            Almost zero
CELL_FACH    Shared, Low Speed        Low
CELL_DCH     Dedicated, High Speed    High

[Figure: UMTS RRC state machine for a large US 3G carrier, with promotion delays of 2 s and 1.5 s and tail times on state demotions]

Background: Radio Resource Management in Cellular Networks

[Figure: timeline of a transfer with promotion delay 2 sec, DCH tail 5 sec and FACH tail 12 sec]

• Tail time: waiting for inactivity timers to expire
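
To illustrate how tail times shape the resource metrics, here is a deliberately simplified sketch of a tail-timer model. It is not the trace-driven simulator from the talk. The 5 s DCH tail and 12 s FACH tail come from the slide; everything else is an assumption made for illustration: every packet is assumed to require CELL_DCH, transfers are treated as instantaneous, promotion delays are ignored, and the function names `radio_usage` and `relative_impact` are hypothetical. Energy is left out because the slides give no power values.

```python
from typing import List, Tuple

DCH_TAIL = 5.0    # seconds in CELL_DCH after the last packet (from the slide)
FACH_TAIL = 12.0  # seconds in CELL_FACH before returning to IDLE (from the slide)


def radio_usage(packet_times: List[float]) -> Tuple[float, float, int]:
    """Return (seconds in DCH, seconds in FACH, number of promotions).

    Assumptions: every packet needs CELL_DCH, transfers are instantaneous,
    and promotion delays are ignored.
    """
    dch = fach = 0.0
    promotions = 0
    prev = None
    for t in sorted(packet_times):
        if prev is None:
            promotions += 1                 # IDLE -> DCH
        else:
            gap = t - prev
            if gap <= DCH_TAIL:
                dch += gap                  # radio never left DCH
            elif gap <= DCH_TAIL + FACH_TAIL:
                dch += DCH_TAIL
                fach += gap - DCH_TAIL
                promotions += 1             # FACH -> DCH
            else:
                dch += DCH_TAIL
                fach += FACH_TAIL
                promotions += 1             # demoted to IDLE, promoted again
        prev = t
    if prev is not None:                    # tails after the final packet
        dch += DCH_TAIL
        fach += FACH_TAIL
    return dch, fach, promotions


def relative_impact(original: float, reduced: float) -> float:
    """Same form as the talk's impact formula: (X_0 - X_R) / X_0."""
    return (original - reduced) / original


# A redundant packet at t=1 that overlaps other traffic: removing it saves nothing.
print(radio_usage([0, 1, 2, 30]), radio_usage([0, 2, 30]))
# An isolated redundant packet at t=100: removing it halves DCH time.
d0, _, s0 = radio_usage([0, 100])
dr, _, sr = radio_usage([0])
print(relative_impact(d0, dr), relative_impact(s0, sr))
```

The first comparison mirrors the point made in the main slides: when redundant and other traffic coexist, the radio stays on anyway, so removing only the redundant bytes may not reduce resource consumption.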