mBenchLab: Measuring QoE of Web Applications Using Mobile Devices
http://benchlab.cs.umass.edu/
Emmanuel Cecchet, Robert Sims, Prashant Shenoy and Xin He
University of Massachusetts Amherst
mBenchLab – cecchet@cs.umass.edu

THE WEB CIRCA 2000
[Logos of popular sites with their launch years, e.g. Wikipedia (2001) and other sites launched 1995-2004]

THE WEB TODAY: AMAZON.COM
- Tablet: 225 requests (30 .js, 10 .css, 10 .html, 175 multimedia files)
- Phone: 15 requests (1 .js, 1 .css, 10 .html, 8 multimedia files)

THE WEB TODAY: WIKIPEDIA
- Wikipedia page on Montreal: 226 requests, 3 MB

BENCHMARKING TOOLS HAVE NOT KEPT UP
- Traditional approach (TPC-W, RUBiS…): a Web emulator drives the application under test
- BenchLab approach: a workload definition (a list of URLs from an HTTP trace) is replayed against the application under test using real devices, real browsers and real networks, with QoE measurement during the replay

BROWSERS MATTER FOR QOE?
- Old way: QoE = Server + Network
- Modern way: QoE = Servers + Network + Browser
- Browsers are smart: parallelism on multiple connections; JavaScript execution can trigger additional queries; rendering introduces delays in resource access; caching and pre-fetching
- HTTP replay cannot approximate real Web browser access to resources
[Waterfall of a Wikipedia page load: (1) GET /wiki/page, server generates the page; (2) page analysis plus rendering/JavaScript trigger ~28 GETs for CSS and JS (combined.min.css, jquery-ui.css, shared.css, wikibits.js, jquery.min.js, mwsuggest.js, …); (3) rendering/JavaScript trigger further GETs (ExtraTools.js, Navigation.js, EditToolbar.js, MediaWikiCommon.css, …); (4) rendering triggers GETs for images (page-base.png, bullet-icon.png, tab-current.png, wiki.png, …). Per-step network times range from 0.06 s to 1.88 s; total network time 3.86 s, plus 2.21 s total rendering time.]

USE CASES
Provide an open-source infrastructure for accurate QoE measurement on mobile devices:
- Researchers: benchmarking of real applications and workloads, over real networks, using real devices and browsers
- Users: QoE on relevant Web sites, different from a basic SpeedTest

OUTLINE
- The mBenchLab approach
- Experimental results
- What's next?

MBENCHLAB ARCHITECTURE
- BenchLab Dashboard: Web frontend and experiment scheduler; stores traces (HAR or access_log), results (HAR or latency), experiment configs and benchmark VMs
- mBenchLab App (mBA) on each device: browser registration, experiment start/stop, trace download, results upload; collects metrics, HTML snapshots, …
- Devices replay the workload against the backend application under test

RUNNING AN EXPERIMENT WITH MBENCHLAB
1. Start the BenchLab Dashboard and create an experiment
2. Start the mBenchLab App on the mobile device
3. The browser issues HTTP requests and the App performs QoE measurements
4. Upload collected results to the Dashboard
5. Analyze results
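Results come back to the Dashboard as HAR files ("Results (HAR or latency)" above). As a rough sketch of what a Dashboard-side consumer could do with one, assuming the standard HAR 1.2 layout — the helper function and sample values here are illustrative, not mBenchLab's actual code:

```python
import json

def page_load_summary(har_text):
    """Summarize a HAR capture: request count and browser onLoad time."""
    log = json.loads(har_text)["log"]
    pages = log.get("pages", [])
    # pageTimings.onLoad is the load-event time in ms, when recorded
    on_load = pages[0]["pageTimings"].get("onLoad") if pages else None
    return {"requests": len(log["entries"]), "onLoad_ms": on_load}

# Made-up capture echoing the Montreal page example
sample = json.dumps({"log": {
    "pages": [{"id": "p1", "pageTimings": {"onLoad": 3860}}],
    "entries": [
        {"request": {"url": "http://en.wikipedia.org/wiki/Montreal"}, "time": 250},
        {"request": {"url": "http://example.org/combined.min.css"}, "time": 60},
    ],
}})
summary = page_load_summary(sample)
# summary == {'requests': 2, 'onLoad_ms': 3860}
```

Because the HAR records what the browser actually fetched, request counts and load times include the extra queries triggered by JavaScript and rendering, which a plain HTTP replay would miss.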
[Architecture diagram annotated with the steps: the Dashboard (Web frontend, experiment scheduler) handles trace upload, experiment definition and start, and stores traces (HAR or access_log), results (HAR or latency), experiment configs and benchmark VMs; the device handles browser registration, experiment start/stop, trace download, detailed network and browser timings, and results upload.]
- After the run: view results, repeat the experiment, export setup/traces/VMs/results

MBENCHLAB ANDROID APP (MBA)
- Issues HTTP requests in the native Android browser, driven through the Selenium Android driver
- Collects QoE measurements in HAR format through a HAR recording proxy
- Uploads results to the BenchLab Dashboard when done
[App structure: mBenchLab runtime, HAR recording proxy, Selenium Android driver and native Android browser; local storage holds the trace and HAR snapshots; GPS provides location; network access over WiFi or 3G/4G to cloud Web services.]

MBA MEASUREMENTS
- QoE: overall page loading time including JavaScript execution and rendering time; request failure/success/cache hit rate; HTML correctness; rendering overview with screen snapshots
- Network: DNS resolution time; connection establishment time; send/wait/receive time on network connections
- Device: hardware and software configurations; location (optional)

OUTLINE
- The mBenchLab approach
- Experimental results
- What's next?
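The network metrics that mBA reports (DNS, connect, send/wait/receive) correspond to the per-entry `timings` object in HAR. A hedged sketch of aggregating them, together with a failure count — the helper and sample entries are illustrative, not mBA's actual code:

```python
def network_breakdown(entries):
    """Sum DNS, connect, send, wait and receive times across HAR entries.
    In HAR, a timing of -1 means 'not applicable' and must be skipped."""
    totals = {"dns": 0, "connect": 0, "send": 0, "wait": 0, "receive": 0}
    failures = 0
    for e in entries:
        status = e["response"]["status"]
        if status == 0 or status >= 400:   # 0 = no response (network error)
            failures += 1
        for field in totals:
            v = e["timings"].get(field, -1)
            if v >= 0:
                totals[field] += v
    return totals, failures

entries = [
    {"response": {"status": 200},
     "timings": {"dns": 12, "connect": 30, "send": 1, "wait": 180, "receive": 25}},
    {"response": {"status": 404},   # reused connection: no DNS/connect
     "timings": {"dns": -1, "connect": -1, "send": 1, "wait": 90, "receive": 5}},
]
totals, failures = network_breakdown(entries)
# totals == {'dns': 12, 'connect': 30, 'send': 2, 'wait': 270, 'receive': 30}
# failures == 1
```

Skipping the -1 entries matters: on keep-alive connections DNS and connect do not happen again, so naively summing raw values would understate totals.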
EXPERIMENTAL SETUP
- Desktop: MacBook Pro using Firefox
- Tablets: Trio Stealth Pro ($50 Android 4 tablet), Kindle Fire
- Phones: Samsung S3 GT-I9300 (3G), Motorola Droid Razr (4G), HTC Desire C
- Traces: Amazon, Craigslist, Wikipedia, Wikibooks

QOE ON DIFFERENT WEB SITES
- Web sites can be very dynamic
- Amazon content is very dynamic (hurts repeatability)
- Craigslist is very simple and similar across platforms

INSTRUMENTATION OVERHEAD
- Hardware matters
- Single-core, underpowered hardware shows ~3 s of instrumentation overhead on Wikipedia pages
- Modern devices are powerful enough to be instrumented with negligible overhead

QOE ON DIFFERENT MOBILE NETWORKS
- Quantify QoE variation between WiFi vs EDGE vs 3G vs 4G
- Performance varies based on location, network provider, hardware…

IDENTIFYING QOE ISSUES
- Why is the page loading time so high?

QOE BUGS: THE SAMSUNG S3 PHONE
- Number of HTTP requests and page sizes are off for Wikipedia pages
- Cause: bug in the srcset implementation
<img src="pear-desktop.jpeg"
     srcset="pear-mobile.jpeg 720w, pear-tablet.jpeg 1280w"
     alt="The pear">

OUTLINE
- The mBenchLab approach
- Experimental results
- What's next?

RELATED WORK
Benchmarking tools have not kept up:
- RUBiS, TPC-*: obsolete backends
- Httperf: unrealistic replay
- Commercial benchmarks (SPEC-*): not open nor free
Mobility adds complexity:
- Networks: Hossein et al., A first look at traffic on smartphones, IMC'10
- Hardware: Hyojun et al., Revisiting Storage for Smartphones, FAST'12; Thiagarajan et al., Who killed my battery?, WWW'12
- Location: Nordström et al., Serval: An End-Host Stack for Service-Centric Networking, NSDI'12

SUMMARY AND FUTURE WORK
- Real devices, browsers, networks and application backends are needed for modern WebApp benchmarking
- mBenchLab provides, for researchers, an infrastructure for Internet-scale benchmarking of real applications with mobile devices; for users, insight on QoE with real Web sites rather than a simple SpeedTest
- Future work: larger-scale experiments (more users, devices, locations; more Web applications and traces)

SOFTWARE, DOCUMENTATION, RESULTS: http://benchlab.cs.umass.edu/
WATCH TUTORIALS AND DEMOS ON YOUTUBE
Q&A

BONUS SLIDES

RELATED WORK (DETAILS)
- Hossein et al., A first look at traffic on smartphones, IMC'10: the majority of traffic is Web browsing; 3G performance varies according to network provider; mobile proxies improve performance
- Hyojun et al., Revisiting Storage for Smartphones, FAST'12: device storage performance affects the browsing experience
- Thiagarajan et al., Who killed my battery?, WWW'12: battery consumption can be reduced with better JS and CSS; energy savings with JPG
- Nordström et al., Serval: An End-Host Stack for Service-Centric Networking, NSDI'12: transparent switching between networks; location-based performance

WEB APPLICATIONS HAVE CHANGED
Web 2.0 applications:
- Rich client interactions (AJAX, JS…)
- Multimedia content
- Replication, caching…
- Large databases (a few GB to multiple TB)
- Complex Web interactions: HTTP 1.1, CSS, images, Flash, HTML 5…
- WAN latencies, caching, Content Delivery Networks…

EVOLUTION OF WEB APPLICATIONS
Number of interactions to fetch the home page of various web sites and benchmarks:

Application     HTML  CSS  JS   Multimedia  Total
RUBiS             1     0    0       1          2
eBay.com          1     3    3      31         38
TPC-W             1     0    0       5          6
amazon.com        6     3   33      99        141
CloudStone        1     2    4      21         28
facebook.com      6    13   22     135        176
wikibooks.org     1    19   23      35         78
wikipedia.org     1     5   20      36         62

TYPING SPEED MATTERS
- Auto-completion in search fields is common
- Each keystroke can generate a query
- Text searches use a lot of resources
Example (one query per keystroke):
GET /api.php?action=opensearch&search=W
GET /api.php?action=opensearch&search=Web
GET /api.php?action=opensearch&search=Web+
GET /api.php?action=opensearch&search=Web+2
GET /api.php?action=opensearch&search=Web+2.
GET /api.php?action=opensearch&search=Web+2.0

STATE SIZE MATTERS
- Does the entire DB of Amazon or eBay fit in the memory of a cell phone?
- TPC-W DB size: 684 MB; RUBiS DB size: 1022 MB
Impact of CloudStone database size on performance:

Dataset size              25 users  100 users  200 users  400 users  500 users
State size (GB)            3.2       12         22         38         44
Database rows              173745    655344     1151590    1703262    1891242
Avg CPU load (25 users)    8%        10%        16%        41%        45%

CloudStone Web application server load observed for various dataset sizes, using a workload trace of 25 users replayed with Apache HttpClient 3.

OUTLINE
- What has changed in WebApps
- Benchmarking real applications with BenchLab
- Experimental results
- Demo

RECORDING HTTP TRACES
3 options to record traces in HTTP Archive (HAR) format:
- directly in the Web browser
- at the HAProxy load balancer level
- using Apache httpd logs
[Diagram: Internet → frontend/load balancer (HAProxy recorder) → app servers (httpd recorder) → databases; browser-side recording happens on the client.]

WIKIMEDIA FOUNDATION WIKIS
- Wiki open-source software stack: lots of extensions; very complex to set up and install
- Real database dumps (up to 6 TB): 3 months to create a dump, 3 years to restore with default tools
- Multimedia content: images, audio, video; generators (dynamic or static) to avoid copyright issues
- Real Web traces from Wikimedia, packaged as virtual appliances

COMPLEXITY BEHIND THE SCENES
- Browsers are smart: caching, prefetching, parallelism…
- JavaScript can trigger additional requests
- Real network latencies vary a lot
- Too complex to simulate to get accurate QoE

BENCHLAB DASHBOARD
- JEE WebApp with an embedded database
- Repository of benchmarks and traces
- Schedules and controls experiment execution
- Results repository
- Can be used to distribute/reproduce experiments and compare results
- Workflow: upload traces/VMs; define and run experiments; compare results; distribute benchmarks, traces, configs and results

OPEN VS CLOSED
Open Versus Closed: A Cautionary Tale – B. Schroeder, A. Wierman, M.
Harchol-Balter – NSDI'06
- The response-time difference between open and closed systems can be large
- Scheduling is more beneficial in open systems

SUMMARY AND FUTURE WORK
- Larger-scale experiments: more users, devices, locations; more Web applications and traces
- Social and economic aspects: user privacy, cost of mobile data plans, experiment feedback (to and from users)
- Automated result processing: anomaly detection, performance comparison
- Server-side measurements with Wikipedia virtual appliances
- Mobile proxy
- Software distribution for other researchers to set up their own BenchLab infrastructure
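The open-versus-closed distinction cited in the bonus slides can be made concrete with a small sketch (illustrative only, not BenchLab code): a closed-loop generator self-throttles because each simulated user waits for a response plus think time before issuing the next request, while an open-loop generator keeps injecting requests at a fixed rate no matter how slow the server gets.

```python
def closed_arrivals(num_users, num_requests, service, think):
    """Closed system: each user waits for the response (service time),
    thinks, then sends its next request; arrivals adapt to server speed."""
    next_time = [0.0] * num_users          # per-user next issue time
    arrivals = []
    for _ in range(num_requests):
        u = min(range(num_users), key=lambda i: next_time[i])
        t = next_time[u]
        arrivals.append(t)
        next_time[u] = t + service + think  # reply received, then think
    return sorted(arrivals)

def open_arrivals(rate, num_requests):
    """Open system: requests arrive at a fixed rate, independent of
    whether earlier responses have come back."""
    return [i / rate for i in range(num_requests)]

# With 2 users, 1 s service and 1 s think time, the closed generator
# never issues more than one request per user every 2 s...
closed = closed_arrivals(2, 6, service=1.0, think=1.0)
# closed == [0.0, 0.0, 2.0, 2.0, 4.0, 4.0]
# ...while the open generator injects 5 requests/s regardless of load.
open_ = open_arrivals(rate=5.0, num_requests=6)
```

Under overload, the open generator's queue grows without bound while the closed one stalls with its users, which is why the choice of model changes both the measured response times and the apparent benefit of scheduling.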