Towards Understanding Developing World Traffic

TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC
Sunghwan Ihm (Princeton)
KyoungSoo Park (KAIST)
Vivek S. Pai (Princeton)
IMPROVING NETWORK ACCESS IN THE DEVELOPING WORLD
• Internet access is a scarce commodity in the developing world: expensive / slow
• Our focus: improving performance of connected network access
• Non-focus: providing/extending connectivity (e.g., DTN, WiLDNet)
Sunghwan Ihm, Princeton University
POSSIBLE OPTIONS
• Web proxy caching
  • Whole objects
  • Single endpoint (local)
  • Designated cacheable traffic only
• WAN acceleration
  • Packet-level caching
  • Mostly for enterprise
  • Two (or more) endpoints, coordinated
• Effective in first world
DEVELOPING WORLD QUESTIONS
• How effective are these approaches?
  • Systems designed for first-world use
  • Most traffic studies small, first-world focused
  • How similar is developing region traffic?
• Any new opportunities to exploit?
  • Differences in traffic
  • Differences in cost/tradeoffs
  • System design issues
UNDERSTANDING DEVELOPING WORLD TRAFFIC
• Goal: shape system design by better understanding the traffic optimization opportunities
• Requirements: large-scale, content-focused analysis
PRIOR TRAFFIC ANALYSIS WORK
• Large-scale traffic analysis
  • Internet Study 2007, 2008/2009 by ipoque
  • One million users
  • High-level characteristics via DPI
  • First-world focus
• Developing world traffic analysis
  • Du et al. WWW’06, Johnson et al. NSDR’10
  • Proxy-level analysis from kiosks, Internet cafes, and community centers
OUR APPROACH
• Combine best features
  • Large-scale and content-focused
  • First world and developing world
• Use traffic from CoDeeN content distribution network (CDN)
  • Global proxy (500+ PlanetLab nodes)
  • Running since 2003
  • 30+ million requests per day
WHAT TO ANALYZE?
1. Traffic profile
2. Caching opportunities
3. User behavior
DATA COLLECTION
[Diagram: request path through User, Browser Cache, Local Proxy Cache, CoDeeN Cache, and Origin Web Server, with a WAN link]
• Assume local proxy caches
• Focus on cache misses only
• Capture full content
DATA SET
• Duration: 1 week (March 25-31, 2010)
• # Requests: 157 million
• Volume: 3 terabytes
• # Clients (unique IPs): 348 K
• # Countries/Regions: 190
  • /8 networks coverage: 61.3%
  • /16 networks coverage: 24.1%
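The slides do not say how the /8 and /16 coverage figures were computed. A minimal sketch of one plausible reading follows, assuming coverage means the fraction of all possible IPv4 prefixes of a given length that contain at least one client address. The function name `prefix_coverage` and the sample addresses are hypothetical.

```python
import ipaddress


def prefix_coverage(client_ips, prefix_len):
    """Fraction of all IPv4 /prefix_len networks containing at least one client.

    Assumption: the denominator is every possible prefix (2 ** prefix_len),
    not only allocated or routable ones.
    """
    seen = set()
    for ip in client_ips:
        addr = int(ipaddress.IPv4Address(ip))
        seen.add(addr >> (32 - prefix_len))  # keep only the leading prefix bits
    return len(seen) / (2 ** prefix_len)


# Hypothetical sample: two clients share the 10.0.0.0/8 network.
sample_ips = ["10.0.0.1", "10.1.2.3", "192.168.1.1"]
print(prefix_coverage(sample_ips, 8))
print(prefix_coverage(sample_ips, 16))
```

If the original analysis excluded reserved or unallocated address space from the denominator, the percentages would differ from what this sketch produces.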
TOP COUNTRIES
[Figure: three pie charts showing the top countries by Requests %, Bytes %, and Clients %]
Legend: PL (Poland), CN (China), SA (Saudi Arabia), US (United States), DE (Germany), RU (Russian Federation), AE (United Arab Emirates), Etc. (185 countries)
OECD VS. DEVREG
• OECD: the first world
  • 27 high-income economies from OECD member countries
  • 25% of total traffic
• DevReg: the developing world
  • The remaining 163 countries and 3 OECD members: Mexico, Poland, and Turkey
  • 75% of total traffic
ANALYSIS #1: TRAFFIC PROFILE
• Conjecture: DevReg users visit low-bandwidth Web pages (small objects and text-heavy)
• We often hear a variant of “Offline Wikipedia content suffices for developing world users”
OBJECT SIZE
[Figure: object size plot; label: 16KB]
• Small: median 3KB vs. 5KB
• Large: similar demand/profile
TEXT AND IMAGES
• DevReg has a higher fraction of images
  • Exact opposite of bandwidth conjecture
VIDEO AND AUDIO
• DevReg: higher fraction of video & audio
  • Music videos and MP3 songs
APPLICATION (FLASH)
• DevReg has a higher fraction of application traffic
  • Median near 7%
ANALYSIS #1 SUMMARY
• Some evidence that DevReg-visited sites have smaller objects, but
• DevReg users visit large pages as well, and
• DevReg users seek a higher fraction of rich content than OECD users
ANALYSIS #2: CACHING OPPORTUNITY
• Conjecture: little gain from larger caches
  • Some analysis suggests 1GB sufficient
  • Typical cache size < 20GB
  • Object-based caching
CONTENT-BASED CHUNK CACHING
[Diagram: content split into chunks A, B, C, D, E]
• Split content into chunks
  • Name chunks by content (SHA-1 hash)
  • Cache chunks instead of objects
• Fetch content, send only modified chunks
  • Two endpoints needed
  • Applies to “uncacheable” content
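The chunk caching idea can be sketched as two cooperating endpoints that exchange chunk names and send payloads only for chunks the far side has not seen. This is a minimal illustration, not the system evaluated in the talk: it assumes fixed-size chunks for simplicity (the slides do not say how chunk boundaries are chosen), and the names `Sender`, `Receiver`, `CHUNK_SIZE`, and `payload_bytes` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple

CHUNK_SIZE = 1024  # assumption: fixed-size chunks, chosen only for illustration

Message = List[Tuple[str, Optional[bytes]]]  # (chunk name, payload or None)


def split_chunks(content: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split content into fixed-size chunks (the last one may be shorter)."""
    return [content[i:i + size] for i in range(0, len(content), size)]


def chunk_name(chunk: bytes) -> str:
    """Name a chunk by its content: the hex SHA-1 digest."""
    return hashlib.sha1(chunk).hexdigest()


class Sender:
    """Endpoint near the origin; tracks which chunk names the receiver holds."""

    def __init__(self) -> None:
        self.known: Set[str] = set()

    def encode(self, content: bytes) -> Message:
        message: Message = []
        for chunk in split_chunks(content):
            name = chunk_name(chunk)
            if name in self.known:
                message.append((name, None))   # receiver already has this chunk
            else:
                self.known.add(name)
                message.append((name, chunk))  # first sighting: send the bytes
        return message


class Receiver:
    """Endpoint near the client; keeps a chunk cache keyed by SHA-1 name."""

    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}

    def rebuild(self, message: Message) -> bytes:
        parts = []
        for name, payload in message:
            if payload is not None:
                self.cache[name] = payload     # store newly received chunk
            parts.append(self.cache[name])
        return b"".join(parts)


def payload_bytes(message: Message) -> int:
    """Bytes of chunk data actually carried over the WAN in this message."""
    return sum(len(p) for _, p in message if p is not None)


sender, receiver = Sender(), Receiver()
page_v1 = b"A" * 1024 + b"B" * 1024 + b"C" * 1024
page_v2 = b"A" * 1024 + b"X" * 1024 + b"C" * 1024  # only the middle chunk changed
first = sender.encode(page_v1)
second = sender.encode(page_v2)
assert receiver.rebuild(first) == page_v1
assert receiver.rebuild(second) == page_v2
print(payload_bytes(first), payload_bytes(second))
```

Because chunks are matched by content rather than by URL or cache headers, the second transfer carries only the changed chunk even if the page itself is marked uncacheable. The sketch ignores eviction and assumes the sender's view of the receiver's cache never goes stale.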
OVERALL REDUNDANCY
• 40% @ 64 KB: objects or parts of large object
• 60% @ 1 KB: parts of text pages
• 65% @ 128 bytes: paragraphs or sentences
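One way to see why smaller chunks expose more redundancy is to count the bytes that fall in chunks already seen earlier in the trace. The sketch below assumes fixed-size chunks and an unbounded set of seen digests; the function name `redundancy` and the toy objects are hypothetical, and the talk's own measurement method may differ.

```python
import hashlib


def redundancy(objects, chunk_size):
    """Fraction of bytes whose chunk (by SHA-1) already appeared earlier."""
    seen = set()
    total = redundant = 0
    for body in objects:
        for i in range(0, len(body), chunk_size):
            chunk = body[i:i + chunk_size]
            digest = hashlib.sha1(chunk).digest()
            total += len(chunk)
            if digest in seen:
                redundant += len(chunk)   # these bytes would not need resending
            else:
                seen.add(digest)
    return redundant / total if total else 0.0


# Two toy objects that share their first 128 bytes but differ afterwards.
objects = [b"a" * 128 + b"b" * 128, b"a" * 128 + b"c" * 128]
for size in (256, 128):
    print(size, redundancy(objects, size))
```

At 256-byte chunks the two objects look entirely different, while at 128 bytes the shared prefix is detected, mirroring the trend on the slide from 64 KB down to 128 bytes.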
CACHE BEHAVIOR SIMULATION
• Simulate one week’s traffic
  • Cache misses only
  • LRU cache replacement policy
• Determine size for near-ideal hit rate
  • Calculate byte hit ratio (BHR)
  • Vary storage size (from 10MB to max)
• Results for US, China, and Brazil
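The simulation described above can be sketched as a replay of a request trace through an LRU cache of a given capacity, reporting the byte hit ratio. This is an illustration under stated assumptions, not the authors' simulator: each trace entry is a hypothetical `(key, size)` pair, where the key could be a URL or a chunk hash, and an item always has the same size.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity_bytes):
    """Replay (key, size) requests through an LRU cache; return byte hit ratio."""
    cache = OrderedDict()  # key -> size, least recently used entry first
    used = hit_bytes = total_bytes = 0
    for key, size in requests:
        total_bytes += size
        if key in cache:
            hit_bytes += size
            cache.move_to_end(key)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # item can never fit, so it is not cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted_size
        cache[key] = size
        used += size
    return hit_bytes / total_bytes if total_bytes else 0.0


# Hypothetical toy trace; sizes are in bytes.
trace = [("a", 100), ("b", 100), ("a", 100), ("c", 100), ("b", 100)]
for capacity in (50, 200, 300):
    print(capacity, simulate_lru(trace, capacity))
```

Sweeping the capacity from 10MB upward and finding where the byte hit ratio stops improving gives the storage size needed for a near-ideal hit rate.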
US – 213 GB
CHINA – 559 GB
BRAZIL – 44 GB
ANALYSIS #2 SUMMARY
• Chunk caching useful
  • Reduces WAN (cache miss) traffic
  • Complements existing Web proxies
• Larger caches useful
  • Useful reduction in miss rate
  • Cheap compared to bandwidth costs
ANALYSIS #3: USER BEHAVIOR
• Conjecture: as first-world Web pages get larger, DevReg users suffer delays
• Mechanism: observe aborted transfers
  • Intentional termination
  • Automatic when browsing away
• Abort = users bored or downloads slow
CANCELLED OBJECT SIZE C-CDF
• Cancelled objects larger than normal (red)
• Complete objects (green) much larger than actual download (blue)
• Most downloads less than 10MB
CANCELLED TRANSFER VOLUME
• 17% of transfers are terminated early
• Due to the early termination, 25% of actual traffic
• If fully downloaded, would have been 80% of all bytes
• Overall traffic increase of 375%
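The figures on this slide can be checked with a short calculation. The sketch below assumes one reading of the bullets: cancelled transfers account for 25% of the bytes actually transferred, and would account for 80% of all bytes had they completed, while the bytes of non-cancelled transfers stay fixed. The function name `traffic_if_completed` is hypothetical.

```python
def traffic_if_completed(cancelled_share_actual, cancelled_share_full):
    """Total traffic, relative to actual traffic, if cancelled transfers finished.

    cancelled_share_actual: cancelled transfers' share of actual bytes (0.25).
    cancelled_share_full:   their share of all bytes had they completed (0.80).
    """
    other = 1.0 - cancelled_share_actual  # bytes from transfers that completed
    # Those bytes are unchanged and would form (1 - cancelled_share_full)
    # of the hypothetical total, which fixes the size of that total.
    return other / (1.0 - cancelled_share_full)


ratio = traffic_if_completed(0.25, 0.80)
print(round(ratio, 2))  # total traffic as a multiple of actual traffic
```

Under this reading, total traffic would be about 3.75 times the actual volume, which appears to be what the 375% figure on the slide refers to.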
CANCELLED CONTENT TYPES
• Most cancelled responses were text
• Most bytes from video/audio/application
% CANCELLED REQUESTS CDF
• OECD users cancel more often than DevReg users
  • Median almost double
ANALYSIS #3 SUMMARY
• Many transactions aborted
• Previewing video files
  • Content-based caching is effective
• OECD users less patient than DevReg
  • Cheap bandwidth = more sampling?
CONCLUSIONS
• First glimpse at CoDeeN traffic
  • Large-scale, content-focused analysis
  • OECD and developing world
• Many DevReg assumptions are false
  • In fact, strong desire for rich content, and
  • Patient despite slow connections
• Systems implications
  • Chunk caching worth more exploration
  • Larger caches very useful
sihm@cs.princeton.edu
http://www.cs.princeton.edu/~sihm/