IS250
Spring 2010
John Chuang
The Internet infrastructure, based on TCP/IP, provides:
Global reachability
Reliable end-to-end transport
Highly successful in supporting one-to-one (unicast) communication
But there are some limitations:
Difficult to deploy new network services (e.g., IP multicast, IP anycast, QoS, IPv6)
Lack of support for one-to-many (multicast) or even many-tomany (“peer-to-peer”) communication
End hosts have no control over what goes on in the network
(e.g., no source routing or user-directed routing)
John Chuang 2
One strategy: build an overlay network at the application layer
End hosts gain control over topology formation, routing, to meet specific application needs
New applications and services can be deployed without changes to the TCP/IP infrastructure
John Chuang 3
Logical topology
Self-organized
Dynamic
Application specific
Application layer overlay
Network layer
4 John Chuang
Domain Name Service (DNS)
6bone: IPv6 over IPv4
Mbone: multicast over unicast IP
X-Bone
John Chuang http://www.mbone.cl.cam.ac.uk/mbone/mbone-small.gif
http://graphics.stanford.edu/papers/mbone/morepix/world-6bone.jpeg
5
Web Caching and Content Distribution Networks (CDNs)
Application Layer Multicast (ALM)
User Directed Routing
Anonymous Routing
Resilient overlay network
Peer-to-Peer (P2P)
Unstructured P2P: gnutella, FreeNet, kazaa,…
Structured P2P: Distributed Hash Tables (DHTs)
John Chuang 6
Improves download latency, content availability by storing local copy of popular web objects
Web caches are L7 boxes client
John Chuang proxy cache network caches reverse proxy cache web server
7
Clients are intelligently redirected to nearest CDN server to download publisher content
IP anycast (if it exists) could accomplish this easily…
In the absence of IP anycast, companies like Akamai constructs
CDNs as application layer overlay networks client
CDN servers web server
John Chuang 8
Step 1: client queries DNS for IP address of www.publisher.com; based on client’s IP address, reconfigured publisher DNS returns IP address of replica closest to client
Local DNS publisher DNS publisher client
Nearest replica
John Chuang 9
Step 2: client contacts replica for object
Local DNS publisher DNS publisher client
Nearest replica
John Chuang 10
Local DNS
Step 1: client queries DNS for IP address of www.publisher.com
publisher client
CDN server
CDN DNS
John Chuang 11
Step 2: client contacts publisher; publisher returns HTML with embedded
Local DNS objects’ URLs pointing to best CDN server publisher client
CDN server
CDN DNS
John Chuang 12
Local DNS
Step 3: client queries DNS for IP address of
CDN server publisher client
CDN server
CDN DNS
John Chuang 13
Local DNS
Step 4: client contacts CDN server;
CDN server returns embedded objs publisher client
CDN server
CDN DNS
John Chuang 14
Web Caching and Content Distribution Networks (CDNs)
Application Layer Multicast (ALM)
User Directed Routing
Anonymous Routing
Resilient overlay network
Peer-to-Peer (P2P)
Unstructured P2P: gnutella, FreeNet, kazaa,…
Structured P2P: Distributed Hash Tables (DHTs)
John Chuang 15
Network routers must implement IP Multicast to construct delivery tree and forward packets to multicast group receivers routers client server
John Chuang 16
End hosts self-organize to construct multicast delivery tree; messages sent using IP unicast
Sacrifice some efficiency (latency stretch) for deployability
Various systems: ESM, Overcast, Promise, Scattercast, SplitStream,
Yoid, … routers client server
John Chuang 17
Web Caching and Content Distribution Networks (CDNs)
Application Layer Multicast (ALM)
User Directed Routing
Anonymous Routing
Resilient overlay network
Peer-to-Peer (P2P)
Unstructured P2P: gnutella, FreeNet, kazaa,…
Structured P2P: Distributed Hash Tables (DHTs)
John Chuang 18
IP source route allows end hosts to exercise some degree of route control
However, many ISPs turned off IP source routing option for security reasons routers client server
John Chuang 19
Some applications would benefit from having some degree of control over route selection
Resiliency: e.g., resilient overlay network (RON), Detour
Anonymity: onion routing, MIX-nets, … routers client server
John Chuang 20
Application layer overlay for anonymous routing
Existence of communication between
Alice and Bob not revealed to any 3rd party
Alice constructs onion where message is successively encrypted with keys of intermediate routing nodes
Each intermediate node ‘peels’ one layer of onion and forward to next node
Example system: Tor
John Chuang http://tor.eff.org/overview.html.en
21
Web Caching and Content Distribution Networks (CDNs)
Application Layer Multicast (ALM)
User Directed Routing
Anonymous Routing
Resilient overlay network
Peer-to-Peer (P2P)
Unstructured P2P: gnutella, FreeNet, kazaa,…
Structured P2P: Distributed Hash Tables (DHTs)
John Chuang 22
Self-organized overlay network to support distributed storage, search and retrieval of content
The killer-app: free music and movies
Individual peers contribute resources
Content
Network management (e.g., forwarding query messages)
Desirable properties:
Scalability
Performance (latency, recall)
Robustness
Anonymity, censorship-resistance
Design challenges:
Dynamic membership
Various forms of attacks
Free-riding behavior
John Chuang 23
1 st generation: centralized index
e.g., Napster
2 nd generation: decentralized indices
e.g., Gnutella v0.4, Freenet
3 rd generation: hierarchical
e.g., FastTrack (KaZaA, Grokster, Morpheus), eDonkey2000, Gnutella v0.6
4 th generation:
Structured topologies using DHTs, e.g., eMule,
Overnet, BitTorrent
Parallel downloads, e.g., BitTorrent, Avalanche
Darknets, e.g., WASTE for small-scale “F2F”
John Chuang networks
24
Maintains a centralized index that maps files to machines
How to find a file
Query the index system return a list of peers that store the requested file
Transfer the file directly from peer(s)
Advantage: m6
F
Simplicity: easy to implement sophisticated search engines on top of the index system
Disadvantage:
Single point of failure
E?
m1
A
E
E?
m5 m5
E m1 A m2 B m3 C m4 D m5 E m6 F m2
B
John Chuang
Slide adapted from Ion Stoica, Nicolas Christin
25 m4
D m3
C
Flood the request
How to find a file:
Send request to all neighbors
Neighbors recursively propagate the request
Eventually a machine that has the file receives the request, and it sends back the answer m6
Advantages:
Totally decentralized, highly robust
Disadvantages:
The entire network can be swamped with a request
Can be alleviated using TTLs, but can then fail to locate files (and still high resource usage)
John Chuang
F m1
E
A
E?
m5
E?
E m2
B
E?
E?
Slide adapted from Ion Stoica, Nicolas Christin m4 m3
26
C
Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5;…
D
Use two-level hierarchy
F
Some nodes are elected as “super nodes” or “ultra-peers”
Each ultra-peer serves as centralized index for a portion of the network
If an ultra-peer does not know where to find an item, query is forwarded to other ultra-peers
Advantage:
F
F?
E
F?
m4
Reduce the amount of network traffic compared to “naïve” flooding
F?
Disadvantage:
Ultra-peers vulnerable to attacks
A
B
C m3
Potential convergence problems when ultra-peers leave abruptly
Used in FastTrack (KaZaA, Grokster,
Morpheus), eDonkey2000, Gnutella v0.6
John Chuang m1 m2
Assume red nodes are ultra-peers
Slide adapted from Ion Stoica, Nicolas Christin
27
D
Gnutella and KaZaA topologies are unstructured
Neighbor selection largely random
No guarantee that a file can be located, even if it exists in the network
Distributed hash tables (DHTs) offer to solve this problem by constructing highly structured topologies
John Chuang 28
Applications: distributed search (e.g., p2p, CDNs, cooperative caching), application layer overlays for multicast, anycast, etc.
Similar to traditional hash table data structure, except data is stored in distributed peer nodes
Each node is analogous to a bucket in a hash table
Put(), Get() interface like a regular hash table:
-
put( id , item ); item = get( id );
Designed to scale to large numbers of nodes and to handle continual node arrivals, departures, or failures.
Various DHT designs:
CAN, Chord, Kademlia, Pastry, Tapestry, Viceroy, etc.
John Chuang 29
Associate each node and item to a unique identifier in a one
Each node x
-
-
Fingers are neighbors i -th entry in finger table is the first node that succeeds or equals x + 2 i successor node of dimensional space (0..2
maintains a finger table
An item identified by id id is stored on the m )
Properties
Routing table size O(log( nodes
N )) , where N is the total number of
Guarantees that a file (if it exists) is found in O(log( N )) steps
John Chuang
Slide adapted from Ion Stoica, Nicolas Christin
30
Assume m = 3, i.e., an identifier space
0..7
Node n1:(1) joins
7
6
0
1
2
Finger Table i id +2 i succ
0 2 1
1 3 1
2 5 1
John Chuang
5 3
4
Slide adapted from Ion Stoica, Nicolas Christin
31
Assume m = 3, i.e., an identifier space
0..7
Node n1:(1) joins
Node n2:(2) joins
7
John Chuang
0
1
Finger Table i id +2 i succ
0 2 2
1 3 1
2 5 1
6 2
5
4
3
Finger Table i id +2 i succ
0 3 1
1 4 1
2 6 1
Slide adapted from Ion Stoica, Nicolas Christin
32
Assume m = 3, i.e., an identifier space
0..7
Node n1:(1) joins
Node n2:(2) joins
Nodes n3:(0), n4:(6) join
Finger Table i id +2 i succ
0 7 0
1 0 0
2 2 2
6
7
John Chuang
0
Finger Table i id +2 i succ
0 1 1
1 2 2
2 4 6
1
2
Finger Table i id +2 i succ
0 2 2
1 3 6
2 5 6
5
4
3
Finger Table i id +2 i succ
0 3 6
1 4 6
2 6 6
Slide adapted from Ion Stoica, Nicolas Christin
33
Items inserted: f1:(7), f2:(1)
6
7
0
Finger Table i id +2 i succ
0 1 1
1 2 2
2 4 6
Items
7
1
Finger Table i id +2 i succ
0 2 2
1 3 6
2 5 6
Items
1
2 Finger Table i id +2 i succ
0 7 0
1 0 0
2 2 2
John Chuang
5
4
3
Finger Table i id +2 i succ
0 3 6
1 4 6
2 6 6
Slide adapted from Ion Stoica, Nicolas Christin
34
Upon receiving a query for item id , a node
Checks if item is cached locally
If not, forwards the query to the largest node in its successor table that does not exceed id
7
6 Finger Table i id +2 i succ
0 7 0
1 0 0
2 2 2
5
John Chuang
0
Finger Table i id +2 i succ
0 1 1
1 2 2
2 4 6
Items
7 query(7)
1
Finger Table i id +2 i succ
0 2 2
1 3 6
2 5 6
Items
1
2
4
3
Finger Table i id +2 i succ
0 3 6
1 4 6
2 6 6
Slide adapted from Ion Stoica, Nicolas Christin
35
Difficult to deploy new network services at network layer
Response: build overlay network at the application layer
End hosts gain control over topology formation, routing, to meet specific application needs
New applications and services can be deployed without changes to the
TCP/IP infrastructure
Many flavors of application layer overlay networks:
Web Caching and Content Distribution Networks (CDNs)
Application Layer Multicast (ALM)
Anonymous Routing (Tor)
Resilient overlay network (RON)
P2P file-sharing networks
Distributed Hash Tables (DHTs)
…
John Chuang 36