Understanding Website Complexity: Measurements, Metrics, and Implications Michael Butkiewicz Harsha Madhyastha UC Riverside Vyas Sekar Intel Labs 1 doubleclick.net cdn.turner.com ads.cnn.com Websites today are very complex! Diverse content from many servers and third party services cnn.com facebook.com 2 Users see slow loading websites! 67% of users encounter “slow” sites once a week (gomez.com) 3 Why does load time matter? Source: gomez.com Implications for: Website owners End users Browser developers Customization 4 Our work • Comprehensive study of website complexity – Analysis of sites across rank and category – Content and Service level metrics • Key metrics that impact performance 5 Roadmap • Introduction • Measurement Setup • Complexity • Performance Implications • Discussion and Summary 6 Measurement Setup • Selecting websites – 1,700 websites from Quantcast top 20k – Primary focus on landing (home) page – Annotated with Alexa Categories • Tools – Firefox + Firebug – No Local Caching • Approach – 4 vantage points (3 EC2, 1 UCR) – Every 60 second one page loaded – ~30 measurements per site per vantage point over 9 weeks 7 Example Site Download Objects A B C D HTML CSS Image Script Time 8 Example Firebug Log "log":{ "browser":{ "name":"Firefox”, }, "pages":[{ "startedDateTime":"18:12:59.702-04:00", "title":"Wired News”, "pageTimings":{ "onLoad":4630 }] "entries":[{ "startedDateTime":"18:12:59.702-04:00", "time”:75, "request":{ ... "headers":[{ "name":"Host", "value":"www.wired.com" }, } “response":{ "content":{ "mimeType":"text/html", "size":186013, ... ... 9 Roadmap • Introduction • Measurement Setup • Complexity – Content-level – Service-level • Performance Implications • Discussion and future work 10 Number of Objects: Across Categories Median 125 Objects! Median Site = 57 Objects! 11 Number of Objects: Across Ranks Not as much difference across rank ranges 12 Types of Content Median site: 33 Images, 10 JavaScript, 3 CSS, 0 Flash Flash 13 Normalized types of content Type % Mostly Homogeneous; Flash Skewed 14 Roadmap • Introduction • Measurement Setup • Complexity – Content-level – Service-level • Performance Implications • Discussion and Summary 15 Number of Servers Median News 30 Servers! Median site requires contacting 8 servers 16 Number of Origins 20% Sites > 13 Origins Median News 20 Origins! Median site requires contacting 6 origins 17 Popular non-origin providers Name google-analytics % of sites 58 doubleclick quantserve scorecardresearch 45 30 27 2mdn googleadservices facebook 24 18 17 yieldmanager 16 Not just usual suspects! bluekai.com imrworldwide.com invitemedia.com > 5% of sites each! Most common services: Analytics & Advertising Most common objects: Image (small!) and Javascript 18 Contribution of non-origin services 20% sites > 80% from 3rd Party Median Site Objects / Bytes 30% 35% 19 Contribution of non-origin services 80% from 3rd Party Median Site Objects / Bytes 30% 35% Time only 15% 20 Roadmap • Introduction • Measurement Setup • Complexity • Performance Implications • Discussion and Summary 21 Metric Review Content-Level Characteristics • Total Objects • Object Type: Number, Size • Absolute & Normalized Service-Level Characteristics • Number: Server & Origins • Non-Origin Fraction: Servers / Objects / Time Total combination of 33 metrics 22 Load Time vs. Metric Correlation 23 Load Time Variability vs. Metrics 24 Roadmap • Introduction • Measurement Setup • Complexity – Content-level – Service-level • Performance Implications • Discussion and Summary 25 Discussion: Many other variables • Client-side plugins – NoScript reduces #objects by half! • Mobile-specific customizations – Mobile version reduces #objects to a quarter • Landing vs. non-landing pages – Non-landing seem less complex 26 Conclusions • Comprehensive study of Website Complexity • Median site: 57 objects, contacts 8 servers across 6 origins • Categories show more differences than popularity ranks • Non-origin content: – Analytics & advertising popular – large # objects, bytes, servers, not time • Key performance indicators: – Load Time # objects – Variability # servers Data: www.cs.ucr.edu/~harsha/web_complexity 27