Building Very Large Overlay Networks Jorg Liebeherr University of Virginia

advertisement
Building Very Large Overlay
Networks
Jorg Liebeherr
University of Virginia
HyperCast Project
• HyperCast is a set of protocols for large-scale overlay
multicasting and peer-to-peer networking
Research problems:
• Investigate which topologies are best
suited for
… a large number of peers
… that spuriously join and leave
… in a network that is dynamically changing
• Make it easier to write applications for
overlays easier by
… providing suitable abstractions (“overlay socket”)
… shielding application programmer from overlay topology issues
Acknowledgements
• Team:
– Past: Bhupinder Sethi, Tyler
Beam, Burton Filstrup, Mike
Nahas, Dongwen Wang,
Konrad Lorincz, Jean Ablutz,
Haiyong Wang, Weisheng Si,
Huafeng Lu
– Current: Jianping Wang,
Guimin Zhang, Guangyu
Dong, Josh Zaritsky
• This work is supported in
part by the National Science
Foundation:
D E N A L I
Applications with many receivers
1,000,000
Number
of Senders
Sensor
networks
Peer-to-Peer
Applications
1,000
Games
10
1
Streaming
Collaboration
Tools
10
Software
Distribution
1,000
1,000,000
Number
of Receivers
Need for Multicasting ?
• Maintaining unicast connections is not feasible
• Infrastructure or services needs to support a “send to group”
Multicast support in the network
infrastructure (IP Multicast)
• Reality Check:
– Deployment has encountered severe scalability limitations in both the
size and number of groups that can be supported
– IP Multicast is still plagued with concerns pertaining to scalability,
network management, deployment and support for error, flow and
congestion control
Overlay Multicasting
• Logical overlay resides on top of the layer-3 network
infrastructure (“underlay”)
• Data is forwarded between neighbors in the overlay
• Transparent to underlay network
• Overlay topology should match the layer-3 infrastructure
Overlay-based approaches for multicasting
• Build an overlay mesh network and embed trees into the mesh:
– Narada (CMU)
– RMX/Gossamer (UCB)
– more
• Build a shared tree:
– Yallcast/Yoid (NTT, ACIRI)
– AMRoute (Telcordia, UMD – College Park)
– Overcast (MIT)
– Nice (UMD)
– more
• Build an overlay using distributed hash tables and embed trees:
– Chord (UCB, MIT)
– CAN (UCB, ACIRI)
– Pastry/Scribe (Rice, MSR)
– more
HyperCast Overlay Topologies


Applications organize themselves
to form a logical overlay network
with a given topology
Data is forwarded along the
edges of the overlay network
Hypercube
Delaunay
triangulation
Spanning tree
(for mobile ad hoc)
Programming in HyperCast: Overlay Socket
•
Application Program
Socket-based API
Overlay Socket Interface
•
•
•
UDP (unicast or multicast)
or TCP
Supports different semantics
for transport of data
Supports different
overlay protocols
Implementation in Java
Statistics Interface
•
Overlay
Socket
Application
Receive
Buffer
Overlay Node Interface
Forwarding Engine
Overlay Node
Adapter Interface
Adapter Interface
Node Adapter
Socket Adapter
Messages of
the Overlay
Protocol
Application
Messages
Underlay Network
• HyperCast Project website:
http://www.cs.virginia.edu/hypercast
Message Store
Network of overlay sockets
•
Overlay socket is an endpoint of
communication in an overlay network
•
An overlay network is a collection of
overlay sockets
•
Overlay sockets in the same overlay
network have a common set of
attributes
•
•
Each overlay network has an overlay
ID, which can be used the access
attributes of overlay
Nodes exchange data with neighbors
in the overlay topology
Application
Application
Overlay
socket
Overlay
socket
Internet
Application
Overlay
socket
Application
Overlay
socket
Application
Application
Unicast and Multicast in overlays
• Unicast and multicast
is done along trees
that are embedded in
the overlay network.
• Requirement:
Overlay node must
be able to compute
the child nodes and
parent node with
respect to a given
root
Multicast
Root
(sender)
Unicast
Root
(receiver)
Sender
Overlay Message Format
Loosely modeled after IPv6 
minimal header with extensions
• Common Header:
1
2
3
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-------+---------------+---+---+-------------------------------+
|Version|LAS|Dmd| Traffic Class | Flow Label
| Next Header
|
+-------+---------------+---+---+-------------------------------+
| OL Message Length
|
Hop Limit
|
+-------------------------------+-------------------------------+
|
Src LA
|
+--------------------------------------------------------------|
Dest LA
|
+---------------------------------------------------------------+
Version (4 bit):
LAS (2 bit):
Dmd (4bit)
Traffic Class (8 bit):
Flow Label (8 bit):
Next Header (8 bit)
OL Message Length (8 bit)
Hop Limit (16 bit):
Src LA ((LAS+1)*4 bytes)
Dest LA ((LAS+1)*4 bytes
Version of the protocol (current Version is 0x0)
Size of logical address field
Delivery Mode (Multicast, Flood, Unicast, Anycast)
Specifies Quality of Service class (default: 0x00)
Flow identifier
Specifies the type of the next header following this header
Specifies the type of the next header following this header.
TTL field
Logical address of the source
Logical address of the destination
Socket Based API
•
•
Tries to stay close to Socket API for UDP Multicast
Program is independent of overlay topology
//Generate the configuration object
OverlayManager om = new OverlayManager(“hypercast.prop”);
String overlayID = om.getDefaultProperty(“MyOverlayID")
OverlaySocketConfig config = new
om.getOverlaySocketConfig(overlayID);
//create an overlay socket
OL_Socket socket = config.createOverlaySocket(callback);
//Join an overlay
socket.joinGroup();
//Create a message
OL_Message msg = socket.createMessage(byte[] data, int length);
//Send the message to all members in overlay network
socket.sendToAll(msg);
//Receive a message from the socket
OL_Message msg = socket.receive();
//Extract the payload
byte[] data = msg.getPayload();
Property File
• Stores attributes that configure the overlay socket (overlay
protocol, transport protocol and addresses)
# This is the Hypercast Configuration File
#
#
# LOG FILE:
LogFileName = stderr
# ERROR FILE:
ErrorFileName = stderr
# OVERLAY Server:
OverlayServer =
# OVERLAY ID:
OverlayID = 224.228.19.78/9472
KeyAttributes = Socket,Node,SocketAdapter
# SOCKET:
Socket = HCast2-0
HCAST2-0.TTL = 255
HCAST2-0.ReceiveBufferSize = 200
HCAST2-0.ReadTimeout = 0
...
# SOCKET ADAPTER:
SocketAdapter = TCP
SocketAdapter.TCP.MaximumPacketLength = 16384
SocketAdapter.UDP.MessageBufferSize = 100
# NODE:
Node = HC2-0
HC2-0.SleepTime = 400
HC2-0.MaxAge = 5
HC2-0.MaxMissingNeighbor = 10
HC2-0.MaxSuppressJoinBeacon = 3
# NODE ADAPTER:
#
NodeAdapter = UDPMulticast
NodeAdapter.UDP.MaximumPacketLength = 8192
NodeAdapter.UDP.MessageBufferSize = 18
NodeAdapter.UDPServer.UdpServer0 = 128.143.71.50:8081
NodeAdapter.UDPServer.MaxTransmissionTime = 1000
NodeAdapter.UDPMulticastAddress = 224.242.224.243/2424
Hypercast Software: Demo Applications
Distributed Whiteboard
Multicast file transfer
Data aggregation in P2P
Net:  Homework assignment in
CS757 (Computer Networks)
Application: Emergency Response Network
for Arlington County, Virginia
Project directed by
B. Horowitz and S.
Patek
Application of P2P Technology to
Advanced Emergency Response Systems
Dynamic Grouping and
Information Management
Situational Awareness
Priority Messaging
Other Features of Overlay Socket
Current version (2.0):
• Statistics: Each part of the overlay socket maintains statistics which are
accessed using a common interface
• Monitor and control: XML based protocol for remote access of statistics
and remote control of experiments
• LoToS: Simulator for testing and visualization of overlay protocols
• Overlay Manager: Server for storing and downloading overlay
configurations
Next version (parts are done, release in Summer 2004):
– “MessageStore”: Enhanced data services, e.g., end-to-end reliability,
persistence, streaming, etc.
– HyperCast for mobile ad-hoc networks on handheld devices
– Service differentiation for multi-stream delivery
– Clustering
– Integrity and privacy with distributed key management
Delaunay Triangulation Overlays
Nodes in a Plane
Nodes are assigned
x-y coordinates
4,9
10,8
(e.g., based on
geographic location)
0,6
0,3
0,2
5,2
0,1
0,0 1,0 2,0 3,0
12,0
Voronoi Regions
4,9
10,8
The Voronoi region of
a node is the region of
the plane that is closer
to this node than to
any other node.
0,6
5,2
12,0
Delaunay Triangulation
4,9
10,8
The Delaunay
triangulation has
edges between nodes
in neighboring Voronoi
regions.
0,6
5,2
12,0
Delaunay Triangulation
4,9
10,8
0,6
An equivalent
definition:
A triangulation such
that each
circumscribing circle
of a triangle formed by
three vertices, no
vertex of is in the
interior of the circle.
5,2
12,0
Locally Equiangular Property
• Sibson 1977: Maximize
the minimum angle
For every convex quadrilateral
formed by triangles ACB and
ABD that share a common edge
AB, the minimum internal angle
of triangles ACB and ABD is at
least as large as the minimum
internal angle of triangles ACD
and CBD.
C
A
a
D
b
B
Next-hop routing with Compass Routing
• A node’s parent in a spanning tree is its neighbor which forms
the smallest angle with the root.
• A node need only know information on its neighbors – no
routing protocol is needed for the overlay.
A
30°
Root
B is the Node’s Parent
15°
B
Node
Spanning tree when node
(8,4) is root. The tree can
be calculated by both
parents and children.
4,9
10,8
0,6
8,4
5,2
12,0
Evaluation of Delaunay Triangulation overlays
• Delaunay triangulation
can consider location of
nodes in an (x,y) plane,
but is not aware of the
network topology
Question: How does
Delaunay triangulation
compare with other
overlay topologies?
Overlay Topologies
Delaunay Triangulation and variants
– DT
– Hierarchical DT
– Multipoint DT
Hypercube
Degree-6 Graph
– Similar to graphs generated in Narada
Degree-3 Tree
– Similar to graphs generated in Yoid
Logical MST
– Minimum Spanning Tree
overlays used
by HyperCast
overlays that
assume
knowledge of
network
topology
Transit-Stub Network
Transit-Stub
•
•
•
•
•
GeorgiaTech
topology generator
4 transit domains
416 stub domains
1024 total routers
128 hosts on stub
domain
Evaluation of Overlays
• Simulation:
– Network with 1024 routers (“Transit-Stub” topology)
– 2 - 512 hosts
• Performance measures for trees embedded in an overlay
network:
– Degree of a node in an embedded tree
– “Stretch”: Ratio of delay in overlay to shortest path delay
– “Stress”: Number of duplicate transmissions over a
physical link
Illustration of Stress and Stretch
1
A
1
1
1
1
Stress = 2
1
1
1
1
B
1
Stress = 2
Unicast delay AB :
Delay AB in overlay:
Stretch for AB:
4
6
1.5
Stretch (90th Percentile)
Average Stretch
Stretch (90th Percentile)
90th Percentile of Stretch
Delaunay triangulation
Average Stress
Delaunay triangulation
90th Percentile of Stress
Delaunay triangulation
The DT Protocol
Protocol which organizes members of
a network in a Delaunay
Triangulation
• Each member only knows
its neighbors
• “soft-state” protocol
Topics:
• Nodes and Neighbors
• Example: A node joins
• State Diagram
• Rendezvous
• Measurement Experiments
Each node sends Hello
messages to its neighbors
periodically
4,9
10,8
0,6
5,2
12,0
• Each Hello contains the clockwise (CW) and counterclockwise (CCW)
neighbors
• Receiver of a Hello runs a “Neighbor test” ( locally equiangular prop.)
• CW and CCW are used to detect new neighbors
4,9
5,2
4,9
12,0
5,2
12,0
CCW
0,6
CW
Neighborhood Table of 10,8
Neighbor
10,8
12,0
5,2
–
4,9
–
10,8
A node that wants to join the
triangulation contacts a node
that is “close”
4,9
10,8
0,6
New node
8,4
5,2
12,0
Node (5,2) updates its Voronoi
region, and the triangulation
4,9
10,8
0,6
8,4
5,2
12,0
(5,2) sends a Hello which contains
info for contacting its clockwise and
counterclockwise neighbors
4,9
10,8
0,6
8,4
5,2
12,0
(8,4) contacts these neighbors ...
4,9
10,8
0,6
8,4
5,2
12,0
… which update their respective
Voronoi regions.
4,9
10,8
0,6
8,4
5,2
12,0
(4,9) and (12,0) send Hellos and
provide info for contacting their
respective clockwise and
counterclockwise neighbors.
4,9
10,8
0,6
8,4
5,2
12,0
(8,4) contacts the new neighbor
(10,8) ...
4,9
10,8
0,6
8,4
5,2
12,0
…which updates its Voronoi region...
4,9
10,8
0,6
8,4
5,2
12,0
…and responds with a Hello
4,9
10,8
0,6
8,4
5,2
12,0
This completes the update of the
Voronoi regions and the Delaunay
Triangulation
4,9
10,8
0,6
8,4
5,2
12,0
Rendezvous Methods
•
Rendezvous Problems:
– How does a new node detect a member of the overlay?
– How does the overlay repair a partition?
•
Three solutions:
1. Announcement via broadcast
2. Use of a rendezvous server
3. Use `likely’ members (“Buddy List”)
Rendezvous Method 1:
Announcement via broadcast (e.g.,
using IP Multicast)
4,9
10,8
0,6
New node
8,4
5,2
12,0
Rendezvous
Method 1:
Leader
4,9
10,8
A Leader is a node
with a Y-coordinate
higher than any of its
neighbors.
0,6
5,2
12,0
Rendezvous Method 2:
New node and leader contact a
rendezvous server. Server keeps a cache of some other nodes
4,9
10,8
0,6
New node
8,4
Server
5,2
12,0
Rendezvous Method 3: Each node has a list of “likely” members
of the overlay network
4,9
10,8
0,6
New node
with Buddy
List:
(12,0) (4,9)
8,4
5,2
12,0
State Diagram of a Node
Stopped
Leader
without
Neighbor
Application starts
Neighbor added
(with larger coordinates)
Neighbor added
(with smaller coordinates)
All neighbors
leave or timeout
All neighbors
leave or timeout
A new neighbor with
greater coordinates is added
Leader with
Neighbor
Not
Leader
After removing some neighbor,
this node has largest coordinates
Application
exits
Send Goodbye
Send Goodbye
Leaving
Send
Goodbye
Sub-states of a Node
Stable Without
Candidate
Neighbor
After neighborhood
updating, node becomes
not stable
Node contained in NewNode
passes neighbor test
After handling the
candidate neighbor,
node remains stable
Stable With
Candidate
Neighbor
After neighborhood updating,
node becomes stable.
After neighborhood
updating,
node becomes not stable
Not
Stable
• A node is stable when all nodes that appear in the CW and
CCW neighbor columns of the neighborhood table also appear
in the neighbor column
Measurement Experiments
• Experimental Platform:
Centurion cluster at UVA (cluster of 300 Linux PCs)
– 2 to 10,000 overlay members
– 1–100 members per PC
• Random coordinate assignments
Gigabit Ethernet
Internet
Switch 3
centurion183
centurion253-255
centurion149-167
Switch 4
centurion246
centurion250
centurion251
Switch 8
centurion168-182
centurion164-187
Switch 5
Switch 9
centurion249
centurion252
centurion188-211
Switch 6
Switch 11
Switch 7
Switch 10
centurion128-147
centurion228-247
Experiment: Adding Members
Time to Complete (sec)
How long does it take to add M members to an overlay network
of N members ?
M+N members
Experiment: Throughput of Multicasting
Average throughput
(Mbps)
100 MB bulk transfer for N=2-100 members (1 node per PC)
10 MB bulk transfer for N=20-1000 members (10 nodes per PC)
Bandwidth bounds
(due to stress)
Measured
values
Number of Members N
Experiment: Delay
Delay of a packet (msec)
100 MB bulk transfer for N=2-100 members (1 node per PC)
10 MB bulk transfer for N=20-1000 members (10 nodes per PC)
Number of Nodes N
Wide-area experiments
• PlanetLab: worldwide network of Linux PCs
• Obtain coordinates via “triangulation”
– Delay measurements
to 3 PCs
– Global Network Positioning (Eugene Ng, CMU)
Planetlab Video Streaming Demo
Summary
• Overlay socket is general API for programming P2P systems
• Several proof-of-concept applications
• Intensive experimental testing:
– Local PC cluster (on 100 PCs)
• 10,000 node overlay in < 1 minute
• Throughput of 2 Mbps for 1,000 receivers
– PlanetLab (on 60 machines)
• Up to 2,000 nodes
• Video streaming experiment with 80 receivers at 800 kbps had few losses
HyperCast (Version 2.0) site:
http://www.cs.virginia.edu/hypercast
Design documents, download software, user manual
Download