L07-PPT - Interactive Computing Lab

Content Distribution
March 2, 2011
Contents
 P2P architecture and benefits
 P2P content distribution
 Content distribution network (CDN)
Pure P2P architecture
 no always-on server
 arbitrary end systems communicate directly, peer-to-peer
 peers are intermittently connected and change IP addresses
 Three topics:
  File distribution
  Searching for information
  Case Study: Skype
File Distribution: Server-Client vs P2P
Question: How much time to distribute a file from one server to N peers?

[Figure: a server with upload bandwidth us connected, through a network with abundant bandwidth, to N peers; peer i has upload bandwidth ui and download bandwidth di]

Notation:
 F: file size
 us: server upload bandwidth
 ui: peer i upload bandwidth
 di: peer i download bandwidth
File distribution time: server-client
 server sequentially sends N copies: NF/us time
 client i takes F/di time to download

Time to distribute F to N clients using the client/server approach:

    dcs = max { NF/us , F/dmin },  where dmin = min_i di

 increases linearly in N (for large N)
File distribution time: P2P
 server must send one copy: F/us time
 client i takes F/di time to download
 NF bits must be downloaded in aggregate
 fastest possible aggregate upload rate: us + Σi ui

    dP2P = max { F/us , F/dmin , NF/(us + Σi ui) }
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us
[Plot: minimum distribution time (hours, 0 to 3.5) vs. number of peers N (0 to 35) for the client-server and P2P approaches]
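A minimal Python sketch (variable names are mine, not from the slides) that plugs the example's parameters into the two bounds above:

```python
# Compare the client-server and P2P lower bounds on distribution time as N grows.
def d_cs(N, F, u_s, d_min):
    """Client-server: server sends N copies; slowest client limits the rest."""
    return max(N * F / u_s, F / d_min)

def d_p2p(N, F, u_s, d_min, u):
    """P2P: server sends one copy; peers contribute aggregate upload N*u."""
    return max(F / u_s, F / d_min, N * F / (u_s + N * u))

u = 1.0            # peer upload rate (units of F per hour)
F = 1.0            # file size, chosen so that F/u = 1 hour
u_s = 10 * u       # server upload rate
d_min = u_s        # dmin >= us, so downloads never limit the time

for N in (5, 10, 20, 35):
    print(N, round(d_cs(N, F, u_s, d_min), 2), round(d_p2p(N, F, u_s, d_min, u), 2))
# client-server time grows linearly with N (3.5 h at N = 35);
# P2P stays below roughly 0.8 h over the same range
```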
Contents
 P2P architecture and benefits
 P2P content distribution
 Content distribution network (CDN)
P2P content distribution issues
 Issues
  Peer discovery and group management
  Data placement and searching
  Reliable and efficient file exchange
  Security / privacy / anonymity / trust
 Approaches for group management and data search (i.e., who has what?)
  Centralized (e.g., BitTorrent tracker)
  Unstructured (e.g., Gnutella)
  Structured (distributed hash tables [DHT])
Centralized index (Napster)
original “Napster” design
1) when a peer connects, it informs the central server of its IP address and its content
2) Alice queries the directory server for “Hey Jude”
3) Alice requests the file directly from Bob

[Figure: peers Alice and Bob connected to a centralized directory server; step 1 registers each peer, step 2 is Alice’s query, step 3 is the direct file request to Bob]
Centralized model
file transfer is decentralized, but locating content is highly centralized

[Figure: peers Alice, Bob, Jane, and Judy exchange files directly while lookups go through the central directory]
Centralized
 Benefits:
  Low per-node state
  Limited bandwidth usage
  Short location time
  High success rate
  Fault tolerant
 Drawbacks:
  Single point of failure
  Limited scale
  Possibly unbalanced load
  Copyright infringement
File distribution: BitTorrent
 P2P file distribution
 tracker: tracks peers participating in the torrent
 torrent: group of peers exchanging chunks of a file

[Figure: a joining peer obtains a list of peers from the tracker, then starts trading chunks with the peers in the torrent]
BitTorrent (1)
 file divided into 256 KB chunks
 peer joining torrent:
  has no chunks, but will accumulate them over time
  registers with tracker to get a list of peers, connects to a subset of peers (“neighbors”)
 while downloading, a peer uploads chunks to other peers
 peers may come and go
 once a peer has the entire file, it may (selfishly) leave or (altruistically) remain
BitTorrent (2)
Pulling Chunks
 at any given time, different peers have different subsets of file chunks
 periodically, a peer (Alice) asks each neighbor for the list of chunks it has
 Alice sends requests for her missing chunks, rarest first

Sending Chunks: tit-for-tat
 Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
  re-evaluates the top 4 every 10 secs
 every 30 secs: randomly selects another peer and starts sending it chunks
  newly chosen peer may join the top 4
  “optimistically unchoke”
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners & get the file faster!
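A rough sketch (illustrative names, not the real BitTorrent client) of the two policies above: rarest-first chunk requests and top-4 / optimistic unchoking.

```python
import random
from collections import Counter

def rarest_first(my_chunks, neighbor_chunks):
    """Pick the missing chunk that the fewest neighbors hold."""
    counts = Counter(c for chunks in neighbor_chunks.values() for c in chunks)
    missing = [c for c in counts if c not in my_chunks]
    return min(missing, key=lambda c: counts[c]) if missing else None

def unchoked(rates_to_me, neighbors):
    """Top 4 neighbors by upload rate to us, plus one random optimistic unchoke."""
    top4 = sorted(rates_to_me, key=rates_to_me.get, reverse=True)[:4]
    others = [n for n in neighbors if n not in top4]
    return top4 + ([random.choice(others)] if others else [])

my_chunks = {0, 1}
neighbor_chunks = {"bob": {0, 2, 3}, "carol": {2}, "dave": {3, 4}}
rates = {"bob": 50, "carol": 80, "dave": 20}
print(rarest_first(my_chunks, neighbor_chunks))   # 4 (held by only one neighbor)
print(unchoked(rates, list(neighbor_chunks)))     # ['carol', 'bob', 'dave']
```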
P2P Case study: Skype
 inherently P2P: pairs of users communicate
 proprietary application-layer protocol (inferred via reverse engineering)
 hierarchical overlay with supernodes (SNs)
 index maps usernames to IP addresses; distributed over SNs

[Figure: Skype clients (SC) attached to supernodes (SN), plus a central Skype login server]
Peers as relays
 Problem: both Alice and Bob are behind “NATs”
  NAT prevents an outside peer from initiating a call to an inside peer
 Solution:
  using Alice’s and Bob’s SNs, a relay is chosen
  each peer initiates a session with the relay
  peers can now communicate through NATs via the relay
Distributed Hash Table (DHT)
 DHT = distributed P2P database
 database has (key, value) pairs; examples:
  key: social security number; value: human name
  key: content type; value: IP address
 peers query the database with a key
  the database returns values that match the key
 peers can also insert (key, value) pairs
DHT Identifiers
 assign each peer an integer identifier in the range [0, 2^n − 1]
  each identifier can be represented by n bits
 require each key to be an integer in the same range
 to get integer keys, hash the original key
  e.g., key = h(“Led Zeppelin IV”)
 this is why it is called a distributed “hash” table
How to assign keys to peers?
 Central issue: assigning (key, value) pairs to peers
 Rule: assign each key to the peer that has the closest ID
 Convention in this lecture: closest is the immediate successor of the key
 Example (n = 4; peers: 1, 3, 4, 5, 8, 10, 12, 14; see the sketch below):
  key = 13, then successor peer = 14
  key = 15, then successor peer = 1 (wrap around)
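A small sketch (my own helper names) of the rule above: hash a key into [0, 2^n − 1], then assign it to its immediate successor on the identifier ring.

```python
import hashlib

def int_key(name, n_bits=4):
    """Hash an arbitrary key (e.g. a song title) to an integer in [0, 2^n - 1]."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** n_bits)

def successor(key, peers):
    """The peer responsible for `key`: the first peer ID >= key, wrapping around."""
    candidates = [p for p in sorted(peers) if p >= key]
    return candidates[0] if candidates else min(peers)

peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(successor(13, peers))                        # 14
print(successor(15, peers))                        # 1 (wrap around)
print(successor(int_key("Led Zeppelin IV"), peers))
```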
Chord (a circular DHT) (1)
[Figure: ring of peers with IDs 1, 3, 4, 5, 8, 10, 12, 15]

 each peer is only aware of its immediate successor and predecessor
 this forms an “overlay network”
Chord (a circular DHT) (2)
 O(N) messages on average to resolve a query when there are N peers
 define “closest” as the closest successor

[Figure: ring of peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111; the query “who is responsible for key 1110?” is forwarded peer by peer around the ring until peer 1111 answers “I am”]
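A toy sketch (names are mine) of the linear lookup above: each node knows only its immediate successor, so the query walks the ring one hop at a time.

```python
def in_range(key, node, succ, ring_size=16):
    """Is `key` in the ring interval (node, succ]?"""
    return 0 < (key - node) % ring_size <= (succ - node) % ring_size

def lookup(key, start, ring):
    node, hops = start, 0
    while True:
        succ = ring[(ring.index(node) + 1) % len(ring)]
        if in_range(key, node, succ):
            return succ, hops            # succ is responsible for the key
        node, hops = succ, hops + 1      # forward the query: O(N) hops overall

ring = sorted([0b0001, 0b0011, 0b0100, 0b0101, 0b1000, 0b1010, 0b1100, 0b1111])
print(lookup(0b1110, start=0b0001, ring=ring))   # (15, 6): six forwarding messages
```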
Chord (a circular DHT) with Shortcuts
[Figure: same ring of peers 1, 3, 4, 5, 8, 10, 12, 15; the query “who is responsible for key 1110?” now follows shortcut links]

 each peer keeps track of the IP addresses of its predecessor, its successor, and its shortcuts
 for this query, the number of messages is reduced from 6 to 2
 it is possible to design the shortcuts so that each peer has O(log N) neighbors and a query takes O(log N) messages
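A rough sketch of one way to design such shortcuts, in the spirit of Chord "fingers" (simplified and illustrative, not the full protocol): node p keeps links to the successors of p + 2^i, and a query jumps to the farthest finger that does not pass the key, giving O(log N) messages.

```python
N_BITS = 4
RING = 2 ** N_BITS

def successor_of(x, peers):
    return min((p for p in peers if p >= x % RING), default=min(peers))

def fingers(node, peers):
    """Shortcut table: successors of node + 1, node + 2, node + 4, ..."""
    return [successor_of(node + 2 ** i, peers) for i in range(N_BITS)]

def between(x, a, b):
    """Is x in the ring interval (a, b]?"""
    return (x - a) % RING <= (b - a) % RING and x != a

def lookup(key, node, peers, hops=0):
    succ = successor_of(node + 1, peers)
    if between(key, node, succ):
        return succ, hops
    # forward to the finger that gets closest to the key without passing it
    nxt = max((f for f in fingers(node, peers) if between(f, node, key)),
              key=lambda f: (f - node) % RING, default=succ)
    return lookup(key, nxt, peers, hops + 1)

peers = [1, 3, 4, 5, 8, 10, 12, 15]
print(lookup(14, node=1, peers=peers))   # (15, 2): two messages instead of six
```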
Peer Churn
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]

 to handle peer churn, require each peer to know the IP addresses of its two successors
 each peer periodically pings its two successors to see if they are still alive
 example: peer 5 abruptly leaves
  peer 4 detects this; it makes 8 its immediate successor, asks 8 who its immediate successor is, and makes 8’s immediate successor its second successor
 what if peer 13 wants to join?
Contents
 P2P architecture and benefits
 P2P content distribution
 Content distribution network (CDN)
Why Content Networks?
 more hops between client and Web server → more congestion!
 the same data flows repeatedly over the links between clients and the Web server

[Figure: clients C1-C4 reaching server S across a chain of IP routers]

Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt
Why Content Networks?
 Origin server becomes a bottleneck as the number of users grows
 Flash crowds (for instance, Sept. 11)
 The content distribution problem: arrange a rendezvous between a content source at the origin server (www.cnn.com) and a content sink (us, as users)

Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt
Example: Web Server Farm
 simple solution to the content distribution problem: deploy a large group of servers

[Figure: three copies of www.cnn.com behind an L4-L7 switch; requests from grad.umd.edu and ren.cis.udel.edu are spread across the copies]

 arbitrate client requests to servers using an “intelligent” L4-L7 switch
 pretty widely used today
Example: Caching Proxy
 mainly motivated by ISP business interests: reduces the ISP’s bandwidth consumption to and from the rest of the Internet
 reduced network traffic
 reduced user-perceived latency

[Figure: clients ren.cis.udel.edu and merlot.cis.udel.edu inside an ISP; an interceptor redirects TCP port 80 traffic to a proxy, while other traffic goes directly to the Internet (www.cnn.com)]
But on Sept. 11, 2001
[Figure: www.cnn.com publishes new content (“WTC News!”); 1,000,000 other hosts request it, congesting the path to the origin server, while the ISP’s caching proxy can only serve old content to the user at mslab.kaist.ac.kr]
Problems with discussed approaches:
Server farms and Caching proxies
 Server farms do nothing about network congestion and do not improve latency caused by the network
 Caching proxies serve only their own clients, not all users on the Internet
 Content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies
 Accounting issues with caching proxies
  for instance, www.cnn.com needs to know the number of hits to the webpage for the advertisements displayed on it
Again on Sept. 11, 2001 with CDN
[Figure: www.cnn.com pushes the new content (“WTC News!”) through a distribution infrastructure to surrogates in WA, CA, MI, IL, MA, FL, NY, and DE; the 1,000,000 other users and the user at mslab.kaist.ac.kr all fetch the new content from nearby surrogates]
Web replication - CDNs
 overlay network to distribute content from origin servers to users
 avoids large amounts of the same data repeatedly traversing potentially congested links on the Internet
 reduces Web server load
 reduces user-perceived latency
 tries to route around congested networks
CDN vs. Caching Proxies
 Caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users
 Caches are reactive; CDNs are proactive
 Caching proxies cater to their users (web clients), not to content providers (web servers); CDNs cater to both content providers and clients
 CDNs give content providers control over the content; caching proxies do not
CDN Architecture
[Figure: an origin server feeds the CDN; a request-routing infrastructure and a distribution & accounting infrastructure steer content to surrogates, which serve the clients]
CDN Components
 Content delivery infrastructure: delivering content to clients from surrogates
 Request-routing infrastructure: steering or directing a content request from a client to a suitable surrogate
 Distribution infrastructure: moving or replicating content from the content source (origin server, content provider) to surrogates
 Accounting infrastructure: logging and reporting of distribution and delivery activities
Server Interaction with CDN
1. The origin server pushes new content to the CDN, OR the CDN pulls content from the origin server
2. The origin server requests logs and other accounting info from the CDN, OR the CDN provides logs and other accounting info to the origin server

[Figure: the origin server (www.cnn.com) interacting with the CDN’s distribution infrastructure (step 1) and accounting infrastructure (step 2)]
Client Interaction with CDN
1. Client: “Hi! I need www.cnn.com/sept11”
2. Request-routing infrastructure: “Go to surrogate newyork.cnn.akamai.com”
3. Client to the New York surrogate: “Hi! I need content /sept11”

Q: How did the CDN choose the New York surrogate over the California surrogate (california.cnn.akamai.com)?

[Figure: the client’s request is steered by the CDN’s request-routing infrastructure to the surrogate in NY rather than the one in CA]
Request Routing Techniques
 request-routing techniques use a set of metrics to direct users to the “best” surrogate
 proprietary, but the underlying techniques are known:
  DNS-based request routing
  content modification (URL rewriting)
  anycast-based (how common is anycast?)
  URL-based request routing
  transport-layer request routing
  combinations of multiple mechanisms
DNS based Request-Routing
 common due to the ubiquity of DNS as a directory service
 a specialized DNS server is inserted in the DNS resolution process
 this DNS server is capable of returning a different set of A, NS, or CNAME records based on policies/metrics
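A toy sketch of the idea above (not any real CDN's logic): a CDN-operated DNS server picks which A record to return based on the requesting resolver's address. The surrogate names, IPs, and TTL below are the example values used on the following slides; the policy function is made up for illustration.

```python
SURROGATES = {
    "newyork.cnn.akamai.com":    "145.155.10.15",
    "california.cnn.akamai.com": "58.15.100.152",
}

def resolve(name, client_dns_ip, pick_best):
    """Answer a query for www.cnn.com with the A record of the 'best' surrogate."""
    if name != "www.cnn.com":
        return None
    best = pick_best(client_dns_ip, SURROGATES)     # policy/metric plug-in
    return {"name": name, "type": "A", "data": SURROGATES[best], "ttl": 10}

def east_coast_policy(client_dns_ip, surrogates):
    """Naive policy: send resolvers in 128.4.0.0/16 (e.g. dns.nyu.edu) to New York."""
    return ("newyork.cnn.akamai.com" if client_dns_ip.startswith("128.4.")
            else "california.cnn.akamai.com")

print(resolve("www.cnn.com", "128.4.4.12", east_coast_policy))
# {'name': 'www.cnn.com', 'type': 'A', 'data': '145.155.10.15', 'ttl': 10}
```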
DNS based Request-Routing
Q: How does the Akamai DNS know which surrogate is closest?

[Figure: test.nyu.edu (128.4.30.15) issues a DNS query for www.cnn.com through its local DNS server dns.nyu.edu (128.4.4.12); the Akamai DNS responds with an A record for the New York surrogate newyork.cnn.akamai.com (145.155.10.15) rather than the California surrogate california.cnn.akamai.com (58.15.100.152)]
DNS based Request-Routing
[Figure: the same exchange at a glance: the DNS query from test.nyu.edu travels via the local DNS server (dns.nyu.edu) to the Akamai DNS, and the DNS response points the client to one of the Akamai CDN surrogates]
DNS based Request Routing: Caching
[Figure: the client’s DNS server (76.43.32.4) queries the Akamai DNS for www.cnn.com; the two surrogates (145.155.10.15 and 58.15.100.152) have different measured paths to the requesting DNS (10 kbps / 10 ms RTT vs. 5 kbps / 100 ms RTT); the response “A 145.155.10.15, TTL = 10 s” is cached at the client DNS only briefly because of the short TTL]
DNS based Request Routing: Discussion
 Originator problem: the client may be far removed from its client DNS server
 Client DNS masking problem: virtually all DNS servers, except root DNS servers, honor requests for recursion
  Q: Which DNS server resolves a request for test.nyu.edu?
  Q: Which DNS server performs the last recursion of the DNS request?
 Hidden load factor: a single DNS resolution may result in drastically different load on the selected surrogate; this is an issue when load-balancing requests and predicting load on surrogates
Server Selection Metrics
 Network proximity (surrogate to client):
  network hops (traceroute)
  Internet mapping services (NetGeo, IDMaps)
  …
 Surrogate load:
  number of active TCP connections
  HTTP request arrival rate
  other OS metrics
  …
 Bandwidth availability
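A hedged sketch of combining the metrics above into a single score per surrogate. The weights and metric values are made up for illustration; real CDNs use proprietary formulas.

```python
WEIGHTS = {"rtt_ms": -1.0, "active_conns": -0.5, "bandwidth_kbps": 2.0}

def score(surrogate):
    """Weighted sum of proximity, load, and bandwidth metrics (higher is better)."""
    return sum(w * surrogate[m] for m, w in WEIGHTS.items())

surrogates = [
    {"name": "newyork.cnn.akamai.com",    "rtt_ms": 10,  "active_conns": 40, "bandwidth_kbps": 10},
    {"name": "california.cnn.akamai.com", "rtt_ms": 100, "active_conns": 5,  "bandwidth_kbps": 5},
]
best = max(surrogates, key=score)
print(best["name"])   # the surrogate with the best weighted metric combination
```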
P4P: Provider Portal for (P2P) Applications
Laboratory of Networked Systems, Yale University
P2P: Benefits and Challenges
P2P is a key to content delivery
– low costs to content owners/distributors
– scalability
Challenge
– network-obliviousness usually leads to network inefficiency
  • intradomain: in the Verizon network, P2P traffic traverses 1,000 miles and 5.5 metro-hops on average
  • interdomain: 50%-90% of pieces that already exist locally among active users are downloaded externally*

* Karagiannis et al. Should Internet service providers fear peer-assisted content distribution? In Proceedings of IMC 2005.
ISP Attempts to Address P2P Issues
 Upgrade infrastructure
 Customer pricing
 Rate limiting, or termination of services
 P2P caching
ISPs cannot effectively address network efficiency alone
Locality-aware P2P: P2P’s Attempt to Improve Network Efficiency
 P2P has flexibility in shaping communication patterns
 locality-aware P2P tries to use this flexibility to improve network efficiency
  e.g., Karagiannis et al. 2005, Bindal et al. 2006, Choffnes et al. 2008 (Ono)
Problems of Locality-aware P2P
 locality-aware P2P needs to reverse-engineer network topology, traffic load, and network policy
 locality-aware P2P may not achieve network efficiency
  it may choose congested links
  it may traverse costly interdomain links

[Figure: peers spread across ISP 0, ISP 1, ISP 2, …, ISP K]
A Fundamental Problem
 Feedback from networks is limited
  e.g., end-to-end flow measurements or limited ICMP feedback
Our Goal
Design a framework to enable better cooperation between networks and P2P
P4P: Provider Portal for (P2P) Applications
P4P Architecture
 Providers publish information via an iTracker
 Applications
  query providers’ information
  adjust traffic patterns accordingly

[Figure: ISP A and ISP B each run an iTracker; the P2P application queries the iTrackers]
Example: Tracker-based P2P
 Information flow
  1. peer queries the appTracker
  2/3. appTracker queries the ISP’s iTracker
  4. appTracker selects a set of active peers

[Figure: a peer in ISP A contacts the appTracker (1); the appTracker consults ISP A’s iTracker (2, 3) and returns a set of active peers (4); the peer then downloads from those peers]
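A speculative sketch of the flow above (the interface and names are illustrative, not the actual P4P API): the appTracker consults ISP-published "distances" between network locations and prefers candidates the provider rates as close to the requesting peer.

```python
def itracker_distance(pid_a, pid_b, itracker_table):
    """Provider-published cost between two network locations (PIDs)."""
    return itracker_table.get((pid_a, pid_b), itracker_table.get((pid_b, pid_a), 100))

def select_peers(requester_pid, candidates, itracker_table, k=2):
    """appTracker policy: return the k candidates the ISP rates as closest."""
    return sorted(candidates,
                  key=lambda c: itracker_distance(requester_pid, c["pid"], itracker_table))[:k]

# toy data: intra-ISP traffic is cheap, inter-ISP traffic is expensive
itracker_table = {("ispA", "ispA"): 1, ("ispA", "ispB"): 50}
candidates = [{"peer": "p1", "pid": "ispA"},
              {"peer": "p2", "pid": "ispB"},
              {"peer": "p3", "pid": "ispA"}]
print(select_peers("ispA", candidates, itracker_table))   # prefers p1 and p3 (same ISP)
```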