Peer-to-Peer Overlay Networks

Outline
• Overview of P2P overlay networks
• Applications of overlay networks
• Classification of overlay networks
  – Structured overlay networks
  – Unstructured overlay networks
  – Overlay multicast networks

Overview of P2P overlay networks
• What are P2P systems?
  – P2P refers to applications that take advantage of resources (storage, cycles, content, human presence) available at the end systems of the Internet.
• What are overlay networks?
  – Overlay networks are networks constructed on top of another network (e.g. IP).
• What is a P2P overlay network?
  – Any overlay network constructed by Internet peers in the application layer on top of the IP network.

Overview of P2P overlay networks
• P2P overlay network properties
  – Efficient use of resources
  – Self-organizing
    • All peers organize themselves into an application-layer network on top of IP.
  – Scalability
    • Consumers of resources also donate resources
    • Aggregate resources grow naturally with utilization
  – Reliability
    • No single point of failure
    • Redundant overlay links between the peers
    • Redundant data sources
  – Ease of deployment and administration
    • The nodes are self-organized
    • No need to deploy servers to satisfy demand
    • Built-in fault tolerance, replication, and load balancing
    • No changes needed in the underlying IP network

Applications of P2P overlay networks
• P2P file sharing
  – Napster, Gnutella, KaZaA, eMule, eDonkey, BitTorrent, etc.
• Application layer multicasting
• P2P media streaming
• Content distribution
• Distributed caching
• Distributed storage
• Distributed backup systems
• Grid computing

Classification of overlay networks
• Structured overlay networks
  – Are based on Distributed Hash Tables (DHT)
  – The overlay network assigns keys to data items and organizes its peers into a graph that maps each data key to a peer.
• Unstructured overlay networks
  – The overlay network organizes peers in a random graph in a flat or hierarchical manner.
• Overlay multicast networks
  – The peers organize themselves into an overlay tree for multicasting.

Structured overlay networks
• Overlay topology construction is based on NodeIDs that are generated by using Distributed Hash Tables (DHT).
• In this category, the overlay network assigns keys to data items and organizes its peers into a graph that maps each data key to a peer.
• This structured graph enables efficient discovery of data items using the given keys.
• Storing the objects in the networks is based on
• Object lookup is guaranteed within a bounded number of hops, typically O(log n).
• Examples: Content Addressable Network (CAN), Chord, Pastry.

Unstructured P2P overlay networks
• An unstructured system is composed of peers that join the network following some loose rules, without any prior knowledge of the topology.
• The network uses flooding or random walks, with a limited scope, to send queries across the overlay.
• When a peer receives the flood query, it sends a list of all content matching the query to the originating peer.
• Examples: FreeNet, Gnutella, KaZaA, BitTorrent

Unstructured P2P File Sharing Networks
• Centralized Directory based P2P systems
• Pure P2P systems
• Hybrid P2P systems

Unstructured P2P File Sharing Networks
• Centralized Directory based P2P systems
  – All peers are connected to a central entity
  – Peers establish connections between each other on demand to exchange user data (e.g.
    mp3 compressed data)
  – Central entity is necessary to provide the service
  – Central entity is some kind of index/group database
  – Central entity acts as the lookup/routing table
  – Examples: Napster, BitTorrent

Unstructured P2P File Sharing Networks
• Pure P2P systems
  – Any terminal entity can be removed without loss of functionality
  – No central entities employed in the overlay
  – Peers establish connections between each other randomly
    • To route request and response messages
    • To insert request messages into the overlay
  – Examples: Gnutella, FreeNet

Unstructured P2P File Sharing Networks
• Hybrid P2P systems
  – Main characteristic, compared to pure P2P: introduction of another dynamic hierarchical layer
  – Election process to select and assign Superpeers
  – Superpeers: high degree (degree >> 20, depending on network size)
  – Leafnodes: connected to one or more Superpeers (degree < 7)
  – Example: KaZaA
[Figure: Superpeers and leafnodes]

P2P: centralized directory
• Original “Napster” design
  1) When a peer connects, it informs the central server of its:
     – IP address
     – content
  2) Alice queries for “Hey Jude”
  3) Alice requests file from Bob
[Figure: centralized directory server and peers, including Bob and Alice, with steps 1 to 3 marked]

P2P: problems with centralized directory
• Single point of failure
• Performance bottleneck
• Copyright infringement
File transfer is decentralized, but locating content is highly centralized.

Query flooding: Gnutella
• Fully distributed
  – No central server
• Public domain protocol
• Many Gnutella clients implementing the protocol
Overlay network: graph
• Edge between peers X and Y if there’s a TCP connection
• All active peers and edges form the overlay net
• An edge is not a physical link
• A given peer will typically be connected to < 10 overlay neighbors

Gnutella: protocol
• Query message sent over existing TCP connections
• Peers forward Query message
• QueryHit sent over reverse Query path
• Scalability: limited scope flooding
[Figure: Query and QueryHit messages between peers; file transfer over HTTP]

Gnutella: Peer joining
1.
Joining peer X must find some other peer in the Gnutella network: use list of candidate peers
2. X sequentially attempts to make TCP connections with peers on the list until a connection is set up with Y
3. X sends Ping message to Y; Y forwards Ping message.
4. All peers receiving Ping message respond with Pong message
5. X receives many Pong messages. It can then set up additional TCP connections
Peer leaving: see homework problem!

Exploiting heterogeneity: KaZaA
• Each peer is either a group leader or assigned to a group leader.
  – TCP connection between peer and its group leader.
  – TCP connections between some pairs of group leaders.
• Group leader tracks the content in all its children.
[Figure legend: ordinary peer; group-leader peer; neighboring relationships in overlay network]

KaZaA: Querying
• Each file has a hash and a descriptor
• Client sends keyword query to its group leader
• Group leader responds with matches:
  – For each match: metadata, hash, IP address
• If group leader forwards query to other group leaders, they respond with matches
• Client then selects files for downloading
  – HTTP requests using hash as identifier sent to peers holding desired file

KaZaA tricks
• Limitations on simultaneous uploads
• Request queuing
• Incentive priorities
• Parallel downloading

Internet P2P Traffic Statistics
• Between 50 and 65 percent of all download traffic is P2P related.
• Between 75 and 90 percent of all upload traffic is P2P related.
• And it seems that more people are using P2P today
• So what do people download?
  – 61.4 percent video
  – 11.3 percent audio
  – 27.2 percent games/software/etc.
• Source: http://torrentfreak.com/peer-to-peer-trafficstatistics/

Overlay Multicasting
• Motivation
  – IP multicast has not been widely deployed over the Internet due to some fundamental problems in congestion control, flow control, security, group management, etc.
  – New emerging applications such as multimedia streaming require an Internet multicast service.
  – Solution: Overlay Multicasting
• Overlay multicasting (or application layer multicasting) is increasingly being used to overcome the problem of non-ubiquitous deployment of IP multicast across heterogeneous networks.

Overlay Multicasting
• Main idea
  – Internet peers organize themselves into an overlay tree on top of the Internet.
  – Packet replication and forwarding are performed by peers in the application layer by using IP unicast service.

Overlay Multicasting
• Overlay multicasting benefits
  – Easy deployment
    • It is self-organized
    • It is based on IP unicast service
    • No protocol support is required from Internet routers.
  – Scalability
    • It scales with the number of multicast groups and the number of members in each group.
  – Efficient resource usage
    • Uplink resources of the Internet peers are used for multicast data distribution.
    • It is not necessary to use dedicated infrastructure and bandwidth for massive data distribution in the Internet.

Overlay Multicasting
• Classification of overlay multicast approaches
  – DHT based
  – Tree based
  – Mesh-tree based

Overlay Multicasting
• DHT based
  – Overlay tree is constructed on top of a DHT based P2P routing infrastructure such as Pastry, CAN, Chord, etc.
  – Example: Scribe, in which the overlay tree is constructed on a Pastry network by using a multicast routing algorithm (similar to core based tree (CBT)).

Overlay Multicasting
• Tree based
  – Group members self-organize into a tree by explicitly picking a parent for each new group member.
  – Nodes on the tree may establish and maintain control links to one another in addition to the links provided by the data tree. As such, the tree with these additional control links constitutes the control topology in a tree structure.
  – This approach is simple and is capable of building efficient data delivery trees.
  – The tree building algorithm must prevent loops and handle tree partition, as the failure of a single node may cause a partition of the overlay topology.
  – Examples: ALMA, ALMI, OMNI, NICE, ZIGZAG, BTP, Overcast, …

Overlay Multicasting
• Mesh-tree based
  – The mesh-tree approach is a two-step design of the overlay topology.
  – Group members commonly first organize themselves, in a distributed manner, into an overlay control topology called the mesh. A routing protocol runs across this control topology and defines a unique overlay path to each and every member.
  – Data distribution trees rooted at any member are then built across this mesh based on some multicast routing protocol, e.g. DVMRP.
  – Compared to the tree-only design, the mesh-tree approach is more complex.
  – It has the advantages of avoiding replication of group management functions across multiple (per-source) trees, providing more resilience to member failures, and leveraging standard routing algorithms, which simplifies overlay construction and maintenance because loop avoidance and detection are built into routing algorithms.
  – Examples: Narada, Kudos, Scattercast, Yoid
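
The two steps of the mesh-tree approach can be sketched as follows. This is only a minimal illustration, not the algorithm of Narada or of any other system listed above. The mesh, its link costs, and the names MESH, source_tree and children are hypothetical, and a centralized Dijkstra shortest-path computation stands in for the distributed routing protocol (such as DVMRP) that a real overlay would run.

```python
import heapq

# Hypothetical overlay mesh (step 1): peer -> {neighbor: link cost, e.g. delay in ms}.
MESH = {
    "A": {"B": 10, "C": 25},
    "B": {"A": 10, "C": 10, "D": 30},
    "C": {"A": 25, "B": 10, "D": 15},
    "D": {"B": 30, "C": 15},
}

def source_tree(mesh, source):
    """Step 2: return {peer: parent} for the shortest-path tree rooted at source."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for nbr, cost in mesh[node].items():
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return parent

def children(parent):
    """Invert the parent map: each peer replicates data to its children by unicast."""
    kids = {peer: [] for peer in parent}
    for peer, par in parent.items():
        if par is not None:
            kids[par].append(peer)
    return kids

# One tree per source, all derived from the same mesh.
tree_a = source_tree(MESH, "A")
tree_d = source_tree(MESH, "D")
print(children(tree_a))
print(children(tree_d))
```

Both trees are derived from the same mesh, so group membership is maintained once rather than per tree, and each tree is loop-free because every peer has exactly one parent on its shortest path to the source.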