COMP/ELEC 429 Introduction to Computer Networks
Overlay Networks
Some slides used with permission from Edward W. Knightly, T. S. Eugene Ng, Ion Stoica, Hui Zhang
T. S. Eugene Ng, eugeneng at cs.rice.edu, Rice University

Abstract View of the Internet
• A collection of IP routers and point-to-point physical links connecting the routers
• Point-to-point links between two routers are physically as direct as possible
– A copper wire, a coax cable, or a fiber laid from one router to another

Reality
• Fibers and wires are laid under tremendous physical constraints
– You can't just dig up the ground everywhere and lay fiber
– Right-of-way issues: most fibers are laid along railroads
• The physical fiber topology is often very far from the topology you want
• The Internet is over-laid on top of this physical fiber topology
• The Internet topology is only logical!
• Revelation: the Internet is itself an overlay network

National Lambda Rail Project – Fiber Topology
[Figure: the National Lambda Rail fiber map, illustrating how an IP logical link is realized by a circuit over the physical fiber]

Overlay Made Possible by Layer Abstraction
• Layering hides the detail of the lower layer from the higher layer
• IP operates on the logical topology created by the datalink layer (say, ATM or SONET)
• ATM or SONET creates point-to-point circuits on the fibers
[Figure: protocol stacks, Network/Datalink/Physical at the end points and Datalink/Physical at the switches in between]

Overlay
• Overlay is clearly a general concept
– You can keep overlaying one network on another; it's all logical
• The IP Internet overlays on top of the physical topology
– Why stop here?
• Something else can overlay on top of the IP Internet
– Use IP encapsulation to create logical links for yet another logical topology
– E.g., a VPN link from home to Rice's campus network

Advanced Reasons to Overlay on the IP Internet
• IP provides a basic best-effort datagram service
• Many things you may want in a network are not widely supported
– Performance-based routing (vs. shortest AS path)
– Multicast: one-to-many packet delivery

Overlay Multicast
• IP multicast is not widely deployed
– Increases router complexity
– Needs universal router support, so it cannot be deployed incrementally
– Poor built-in access control (anyone can potentially send multicast traffic)
– Hard to multicast data reliably
– Hard to perform congestion control on multicast data
• Overlay multicast
– Uses end hosts as multicast tree nodes to distribute data
– Most of the problems with IP multicast are eliminated
– But overlay multicast has its own set of problems

Motivating Example: Conference Attendance

Solution Based on Naive Unicast
[Figure: the source, CMU, sends a separate unicast stream to each of Stanford, Berkeley, and Gatech]
• Client-server architecture (the Web)
• Does not scale well with group size
– The source host is the bottleneck

End System Multicast
[Figure: the same hosts (CMU, Stanford-LAN, Stanford-Modem, Berkeley1, Berkeley2, Gatech) connected into an overlay tree of unicast links; a minimal forwarding sketch follows]
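To make this concrete, here is a minimal sketch (not from the slides) of the forwarding loop an end-system-multicast node could run: every packet received from the parent over ordinary unicast UDP is re-sent to each child, so routers see nothing but unicast. The class name, ports, and addresses are illustrative assumptions; a real system would add joining, failure recovery, and congestion control, which is exactly what the following slides worry about.

    import socket

    class OverlayNode:
        """One interior node of an overlay multicast tree: receives data
        from its parent and duplicates it to each child over unicast UDP."""

        def __init__(self, listen_port, children):
            self.children = children  # list of (host, port) child addresses
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.sock.bind(("0.0.0.0", listen_port))

        def run(self):
            while True:
                data, _parent = self.sock.recvfrom(65535)  # one packet in...
                for child in self.children:                # ...one copy out per child
                    self.sock.sendto(data, child)

    # Illustrative wiring: a node on port 9000 feeding two local children.
    if __name__ == "__main__":
        OverlayNode(9000, [("127.0.0.1", 9001), ("127.0.0.1", 9002)]).run()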
End System Multicast: Benefits
• Scalability
– Routers do not do anything special; all new features live in end systems
• Easy to deploy
– Works over the existing IP infrastructure
• Can simplify support for higher-level functionality
[Figure: the overlay tree annotated with unicast congestion control on each overlay link and transcoding for the Stanford-Modem member]

Concerns with End System Multicast
• Challenge: constructing efficient overlay trees
• Performance concerns compared to IP multicast
– Increased delay
– Wasted bandwidth (packet duplication)
[Figure: the same group served by IP multicast vs. end system multicast, showing duplicated packets on shared links in the overlay case]

More Challenges
• Overlays must adapt to network dynamics and congestion
• Group membership is dynamic: members can join and leave
[Figure: two snapshots of the overlay tree over CMU, Stanford-LAN, Stanford-Modem, Berkeley1, Berkeley2, and Gatech, one adapting to congestion, one to membership churn]

Inefficient Overlay Trees
[Figure: three poor trees over the same hosts: one with potential congestion near CMU, one with poor bandwidth to members, one with high latency]

An Efficient Overlay Tree
[Figure: an efficient overlay tree over the same six hosts]

Auto Tree Building is Very Hard
• When a node joins, it needs to quickly find a good parent to connect to in the tree
• A good parent:
– has low delay to the source
– experiences low packet loss
– has enough spare outbound network capacity
– has a low-delay, high-bandwidth path to the new node
• When a node leaves, its orphaned children need to quickly find new parents
• When the loss rate or delay is high, a node must decide whether to find a new parent
– Is the performance problem at this node, at its parent, or in the middle of the network?
– Are there actually better alternatives?

Introducing a New Idea
• Multicast overlays and unicast overlays are meant to facilitate the distribution of data
– Route data to receivers
– Basically the same fundamental function the Internet is designed for
• Content-addressable overlays are meant to facilitate the look-up of information!
– Route questions to answers

In 1999, Napster was a huge hit
[Figure: Napster's centralized index, essentially a hash table mapping content to the hosts that have it]

Distributed Hash Tables (DHT)
• Operations: insert(k, v), lookup(k)
[Figure: a P2P overlay network whose nodes collectively store the key-value pairs (k1,v1) through (k6,v6)]
• A toy sketch of these two operations appears after the Pastry overview below

Why p2p overlays?
• Leverage pooled resources (storage, bandwidth, CPU)
• Leverage resource diversity (geographic, ownership)
• Scalability
• Robustness
• Self-organization

Pastry – An example system developed at Rice and Microsoft
• A generic p2p location and routing substrate
• Self-organizing overlay network
• Lookup/insert an object in < log16 N routing steps (expected)
• O(log N) per-node state
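Before the Pastry details, a toy, fully centralized sketch (an illustration, not any real system's code) of the DHT abstraction: keys and node names are hashed into one 128-bit circular id space, and each key-value pair lives at the node whose id is numerically closest to the key's hash. Pastry's contribution, described in the following slides, is maintaining exactly this mapping with no central table, using only O(log N) state per node.

    import hashlib

    ID_BITS = 128
    ID_SPACE = 2 ** ID_BITS

    def to_id(name):
        """Hash a string uniformly into the 128-bit circular id space."""
        return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

    class ToyDHT:
        def __init__(self, node_names):
            self.node_ids = [to_id(n) for n in node_names]
            self.store = {nid: {} for nid in self.node_ids}  # per-node storage

        def _closest(self, key_id):
            # Numerically closest node id, with distance wrapping on the circle.
            def dist(nid):
                d = abs(nid - key_id)
                return min(d, ID_SPACE - d)
            return min(self.node_ids, key=dist)

        def insert(self, k, v):
            self.store[self._closest(to_id(k))][k] = v

        def lookup(self, k):
            return self.store[self._closest(to_id(k))].get(k)

    dht = ToyDHT(["nodeA", "nodeB", "nodeC", "nodeD"])
    dht.insert("song.mp3", "held by peer 10.0.0.7")  # illustrative value
    print(dht.lookup("song.mp3"))                    # -> "held by peer 10.0.0.7"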
Pastry: Object Distribution
• 128-bit circular id space, from 0 to 2^128 - 1
• nodeIds are assigned uniformly at random
• objIds are chosen uniformly at random
• Invariant: the node with the numerically closest nodeId maintains the object

Pastry
• One primitive, route(M, X): route message M to the live node with the nodeId closest to key X

Pastry: Object Insertion/Lookup
• A message with key X is routed to the live node with the nodeId closest to X
[Figure: Route(X) traversing the circular id space toward the responsible node]

Pastry: Leaf Sets
• Each node maintains the IP addresses of the L/2 nodes with the numerically closest larger nodeIds and the L/2 nodes with the numerically closest smaller nodeIds
– L is a configurable parameter
– enables robust routing through neighbor nodes
– enables fault detection (periodic hello messages)

Pastry: Routing
[Figure: an example route for key d46a1c starting at node 65a1fc and passing nodes d13da3, d4213f, d462ba, d467c4, d471f1, each sharing a longer prefix with the key]
• Properties
– log16 N steps
– O(log N) state

Pastry: Routing Table (of node 65a1fcx)
Row 0:  0x 1x 2x 3x 4x 5x  –  7x 8x 9x ax bx cx dx ex fx
Row 1:  60x 61x 62x 63x 64x  –  66x 67x 68x 69x 6ax 6bx 6cx 6dx 6ex 6fx
Row 2:  650x 651x 652x 653x 654x 655x 656x 657x 658x 659x  –  65bx 65cx 65dx 65ex 65fx
Row 3:  65a0x  –  65a2x 65a3x 65a4x 65a5x 65a6x 65a7x 65a8x 65a9x 65aax 65abx 65acx 65adx 65aex 65afx
• log16 N rows in total; x stands for an arbitrary suffix
• The entry in row i, column d points to a node whose id shares the first i digits with 65a1fc and has d as its next digit; the column for the node's own digit (shown as –) is not needed

Pastry: Routing Tradeoff
• O(log N) routing table size
• O(log N) message forwarding steps

Pastry: Routing Procedure
if (destination D is within range of our leaf set)
    forward to the numerically closest leaf set member
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit of D's address
    if (the routing table entry R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that
        (a) shares at least as long a prefix with D, and
        (b) is numerically closer to D than this node
(A runnable sketch of this procedure appears at the end of the section.)

Pastry: Node Addition
[Figure: a new node with nodeId d46a1c joins by issuing Route(d46a1c); the message travels from 65a1fc through d13da3, d4213f, d462ba, d467c4 toward the numerically closest existing node]

Node Departure (Failure)
• Leaf set members exchange keep-alive messages
• Leaf set repair (eager): request the set from the farthest live node in the set
• Routing table repair (lazy): get the table from peers in the same row, then from higher rows

Application Example – PAST: File Storage
[Figure: Insert(fileId) routes the file to the nodes with nodeIds closest to fileId]
• Storage invariant: file "replicas" are stored on the k nodes with nodeIds closest to fileId (k = 4 in the figure)
• k is bounded by the leaf set size

PAST: File Retrieval
• Lookup(fileId) locates the file in log16 N steps (expected)
• Usually locates the replica nearest the client C
[Figure: client C retrieving one of the k replicas]

Application Example – SCRIBE: Large-Scale Multicast
[Figure: publishers route Publish(topicId) and subscribers route Subscribe(topicId) toward the node whose id is closest to topicId]
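Finally, returning to the routing procedure above, a runnable sketch in Python. It is a simplification for illustration only: ids are short hex strings, the leaf-set range test ignores wrap-around at the top of the circular id space, and every id and table entry in the example is made up.

    DIGITS = 4  # digits per id; real Pastry ids have 32 hex digits (128 bits)

    def shared_prefix_len(a, b):
        """Number of leading hex digits that two ids have in common."""
        n = 0
        while n < DIGITS and a[n] == b[n]:
            n += 1
        return n

    def next_hop(key, my_id, leaf_set, routing_table):
        """Pick the next node for a message with the given key, or return
        my_id if this node is the destination.
        leaf_set: ids numerically closest to my_id (both sides)
        routing_table: routing_table[row][digit] is an id or None, where
        row = length of the prefix shared with this node's id."""
        k = int(key, 16)
        # 1. Key within the leaf set's range: deliver to the numerically
        #    closest member (possibly ourselves).
        candidates = leaf_set + [my_id]
        ids = [int(n, 16) for n in candidates]
        if min(ids) <= k <= max(ids):
            return min(candidates, key=lambda n: abs(int(n, 16) - k))
        # 2. Otherwise use the table entry that extends the shared prefix
        #    by one more digit.
        l = shared_prefix_len(key, my_id)
        entry = routing_table[l][int(key[l], 16)] if l < DIGITS else None
        if entry is not None:
            return entry
        # 3. Fall back to any known node that shares at least as long a
        #    prefix and is numerically closer to the key than we are.
        known = leaf_set + [n for row in routing_table for n in row if n]
        for n in known:
            if (shared_prefix_len(key, n) >= l
                    and abs(int(n, 16) - k) < abs(int(my_id, 16) - k)):
                return n
        return my_id  # no better node known

    # Illustrative state for a node 65a1: an empty table except for one
    # row-0 entry toward the d... region, and two nearby leaves.
    table = [[None] * 16 for _ in range(DIGITS)]
    table[0][0xD] = "d13d"
    print(next_hop("d46a", "65a1", ["65a0", "65a3"], table))  # -> "d13d"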