slides

advertisement
Understanding Website Complexity:
Measurements, Metrics, and Implications
Michael Butkiewicz
Harsha Madhyastha
UC Riverside
Vyas Sekar
Intel Labs
1
doubleclick.net
cdn.turner.com
ads.cnn.com
Websites today are very complex!
Diverse content from many servers and third party services
cnn.com
facebook.com
2
Users see slow loading websites!
67% of users encounter “slow” sites once a week (gomez.com)
3
Why does load time matter?
Source: gomez.com
Implications for:
Website owners
End users
Browser developers
Customization
4
Our work
• Comprehensive study of website complexity
– Analysis of sites across rank and category
– Content and Service level metrics
• Key metrics that impact performance
5
Roadmap
• Introduction
• Measurement Setup
• Complexity
• Performance Implications
• Discussion and Summary
6
Measurement Setup
• Selecting websites
– 1,700 websites from Quantcast top 20k
– Primary focus on landing (home) page
– Annotated with Alexa Categories
• Tools
– Firefox + Firebug
– No Local Caching
• Approach
– 4 vantage points (3 EC2, 1 UCR)
– Every 60 second one page loaded
– ~30 measurements per site per vantage point over 9 weeks
7
Example Site Download
Objects
A
B
C
D
HTML
CSS
Image
Script
Time
8
Example Firebug Log
"log":{
"browser":{
"name":"Firefox”,
},
"pages":[{
"startedDateTime":"18:12:59.702-04:00",
"title":"Wired News”,
"pageTimings":{
"onLoad":4630 }]
"entries":[{
"startedDateTime":"18:12:59.702-04:00",
"time”:75,
"request":{
...
"headers":[{
"name":"Host",
"value":"www.wired.com"
},
}
“response":{
"content":{
"mimeType":"text/html",
"size":186013,
...
...
9
Roadmap
• Introduction
• Measurement Setup
• Complexity
– Content-level
– Service-level
• Performance Implications
• Discussion and future work
10
Number of Objects: Across Categories
Median
125 Objects!
Median Site = 57 Objects!
11
Number of Objects: Across Ranks
Not as much difference across rank ranges
12
Types of Content
Median site: 33 Images, 10 JavaScript, 3 CSS, 0 Flash
Flash
13
Normalized types of content
Type % Mostly Homogeneous; Flash Skewed
14
Roadmap
• Introduction
• Measurement Setup
• Complexity
– Content-level
– Service-level
• Performance Implications
• Discussion and Summary
15
Number of Servers
Median News
30 Servers!
Median site requires contacting 8 servers
16
Number of Origins
20% Sites > 13 Origins
Median News
20 Origins!
Median site requires contacting 6 origins
17
Popular non-origin providers
Name
google-analytics
% of sites
58
doubleclick
quantserve
scorecardresearch
45
30
27
2mdn
googleadservices
facebook
24
18
17
yieldmanager
16
Not just usual suspects!
bluekai.com
imrworldwide.com
invitemedia.com
> 5% of sites each!
Most common services: Analytics & Advertising
Most common objects: Image (small!) and Javascript
18
Contribution of non-origin services
20% sites > 80%
from 3rd Party
Median Site
Objects / Bytes
30%
35%
19
Contribution of non-origin services
80% from 3rd Party
Median Site
Objects / Bytes
30%
35%
Time only 15%
20
Roadmap
• Introduction
• Measurement Setup
• Complexity
• Performance Implications
• Discussion and Summary
21
Metric Review
Content-Level Characteristics
• Total Objects
• Object Type: Number, Size
• Absolute & Normalized
Service-Level Characteristics
• Number: Server & Origins
• Non-Origin Fraction: Servers / Objects / Time
Total combination of 33 metrics
22
Load Time vs. Metric Correlation
23
Load Time Variability vs. Metrics
24
Roadmap
• Introduction
• Measurement Setup
• Complexity
– Content-level
– Service-level
• Performance Implications
• Discussion and Summary
25
Discussion: Many other variables
• Client-side plugins
– NoScript reduces #objects by half!
• Mobile-specific customizations
– Mobile version reduces #objects to a quarter
• Landing vs. non-landing pages
– Non-landing seem less complex
26
Conclusions
• Comprehensive study of Website Complexity
• Median site: 57 objects, contacts 8 servers across 6 origins
• Categories show more differences than popularity ranks
• Non-origin content:
– Analytics & advertising popular
– large # objects, bytes, servers, not time
• Key performance indicators:
– Load Time  # objects
– Variability  # servers
Data: www.cs.ucr.edu/~harsha/web_complexity
27
Download