Survey on Peer-to-Peer Systems: Incentives for Cooperation

Smita Rai
Guides: Prof. Dipak Ghosal, Prof. Xin Liu

Abstract

Peer-to-peer (P2P) computation, because of its advantages over the common client/server model, has given rise to several killer applications such as Napster, Gnutella and KaZaA. The central tenet of P2P systems is cooperation. However, most users are not altruistic and have natural disincentives to cooperate. Incentive mechanisms that motivate users to contribute resources may therefore be critical to the eventual success of such systems. This report looks at some well-known peer-to-peer applications and the problems posed by “free-riders” in such systems. We then survey some of the incentive-based schemes proposed to overcome this problem.

ECS289 Survey Report

Table of Contents

Introduction
    The Tragedy of the Commons
    Gnutella: A Case Study
Incentive based Schemes
    Quantifying disincentives in P2P Systems
    Rationality and Self Interest in P2P Networks
    Peer-Approved Incentive Mechanism
    Incentives for cooperation in Peer-to-Peer Networks
    Addressing the Non-cooperation Problem in P2P Systems
Conclusions
References

Introduction

The appearance of new forms of Peer-to-Peer (P2P) network applications such as Gnutella [Gn00a], KaZaA [Kazaa] and FreeNet [Fr00] holds promise for the emergence of fully distributed information sharing systems. These systems, inspired by Napster [Na00], will allow users worldwide to access and provide information while enjoying a level of privacy not possible in the present client-server architecture of the Web. The traditional client/server architecture has the following limitations:

a) Hard to achieve scalability.
b) Single point of failure.
c) Administrative requirements.
d) Unused resources at the edges of the network.

P2P computing, which aims to avoid the above problems, is defined as the sharing of computer resources and services by direct exchange between systems [P2P]. These resources and services can include the exchange of information, processing cycles, cache storage, and disk storage for files. P2P computing takes advantage of existing computing power, computer storage and networking connectivity, allowing users to leverage their collective power for the ‘benefit’ of all.

Because cooperation is the central tenet of such systems, the key issue is how to secure enough of it in large and autonomous systems for them to become truly useful. There is a possibility that users will stop producing and only consume. This free riding behavior is the result of a social dilemma that all users of such systems confront, and it may result in “The Tragedy of the Commons” [Hardin68] for the system.

The Tragedy of the Commons

This term was coined by G. Hardin [Hardin68] to denote the situation in which a group of people attempts to utilize a common good in the absence of central authority.
In the context of P2P applications this common good can be the provision of a very large library of files, music and other documents to the user community. The dilemma for each individual is then either to contribute to the common good, or to shirk and free ride on the work of others.

Hardin used the simple example of an open pasture to demonstrate how the tragedy develops. It is to be expected that each herdsman will try to keep as many cattle as possible on the commons. As a rational being, each herdsman seeks to maximize his gain. His utility for adding one more animal to his herd has one positive and one negative component.

1. The positive component is a function of the increment of one animal. Since the herdsman receives all the proceeds from the sale of the additional animal, the positive utility is nearly +1.
2. The negative component is a function of the additional overgrazing created by one more animal. Since, however, the effects of overgrazing are shared by all the herdsmen, the negative utility for any particular decision-making herdsman is only a fraction of -1.

Adding together the component partial utilities, the rational herdsman concludes that the only sensible course for him to pursue is to add another animal to his herd. However, this is the conclusion reached by each and every rational herdsman sharing the commons. “Therein is the tragedy. Each man is locked into a system that compels him to increase his herd without limit -- in a world that is limited. Ruin is the destination toward which all men rush, each pursuing his own best interest in a society that believes in the freedom of the commons.”

In the following section we look at a typical P2P file sharing system and at how “free-riders” create problems that limit the utility of the system.

Gnutella: A Case Study

The architecture of the Gnutella network [Gn00a] is as follows:

1. There are no central servers.
2. In order to join the system, a user initially connects to one of several known hosts that are almost always available.
3. The user runs an application that adheres to the Gnutella protocol. Each instance of this application is called a peer. A peer can act as a client (a consumer of information) or as a server (a supplier of information).
4. Peers broadcast query messages for a file, each carrying a time-to-live (TTL). A peer that receives a query either sends a query response (if it has the file) or forwards the query to its neighbors, unless the TTL has expired.

Fig 1: Gnutella (Courtesy: http://computer.howstuffworks.com/file-sharing.htm)

Since files on Gnutella are treated like a public good and users are not charged in proportion to their use, it appears rational for people to download music files without contributing by making their own files accessible to other users. Because every individual can reason this way and free ride on the efforts of others, the performance of the whole system can degrade considerably, which makes everyone worse off.

A second problem is that free riding creates vulnerabilities in a system in which individuals bear risk. If only a few individuals contribute to the public good, these few peers effectively act as centralized servers. Users in such an environment thus become vulnerable to lawsuits, denial of service attacks, and potential loss of privacy.

Extensive analysis of user traffic on Gnutella shows a significant amount of free riding in the system [AdHu00]. By sampling messages on the Gnutella network, the authors find that almost 70% of Gnutella users share no files, and that nearly 50% of all responses are returned by the top 1% of sharing hosts.

The top               Files shared    As percent of the whole
333 hosts (1%)        1,142,645       37%
1,667 hosts (5%)      2,182,087       70%
3,334 hosts (10%)     2,692,082       87%
5,000 hosts (15%)     2,928,905       94%
6,667 hosts (20%)     3,037,232       98%
8,333 hosts (25%)     3,082,572       99%

Table 1: Statistics for the Gnutella Network [AdHu00]

This study also differentiates between two kinds of free riders:

1. Peers that do not provide files for download by others.
2. Peers that provide downloadable content that is not desirable.

The second kind is essentially a quality versus quantity issue. It poses a social dilemma when there is a cost to the provider of making desirable files available to others.

Thus, the case of the Gnutella network demonstrates the need to give users incentives to cooperate in similar P2P applications. In the next section we look at some of the proposals that seek to mitigate the selfish behavior of users in order to improve the utility of the overall system.

Incentive based Schemes

Quantifying disincentives in P2P Systems

In this paper [Feldman03], the authors attempt to quantify the performance-based disincentive that a user in a typical P2P file-sharing system may have. They use the average latency of a file transfer as the performance metric. The authors try to capture how the performance experienced by a user varies as a function of:

a) the sharing level
b) whether a user shares files or not
c) the asymmetry in the host's incoming and outgoing bandwidths
d) the system load.

Fig 2: Local view of one host downloading from another [Feldman03]. The figure shows a server S sending data to a client C, with TCP ACKs returning from C to S; the link labels are bSin and bSout at the server and bCin and bCout at the client.

The authors make the following assumptions about their model:

a) Download time is dominated by transfer time.
b) The bottleneck is always at the edge of the network.
c) Traffic follows the TCP protocol.
d) Searches experience no delay and require negligible bandwidth.
e) Files have the same size, popularity and spatial distribution.
f) Generated load is evenly distributed.
g) The number of uploads per node is proportional to its outgoing bandwidth.

The authors distinguish between potential and actual disincentives. The potential disincentive arises when users think their downloads will be delayed by their uploads. The reason is that the throughput of TCP traffic depends on the interaction between the data and the ACK flows. The authors show, by simulating a three-node network, that as a result of uploads the ACKs from a node are delayed sufficiently (in contending with the uploaded data on the outgoing link) to produce the following utilization (fraction of the incoming link used for downloads):

Link type    Incoming bandwidth    Outgoing bandwidth    Link utilization
ADSL         1.5 Mb/s              128 Kb/s              0.2
Ethernet     10 Mb/s               10 Mb/s               0.8

Thus, there is a high potential disincentive for a node to allow uploads, particularly in the ADSL case. However, the authors show theoretically, based on their model, that the actual disincentive depends on the location of the bottleneck in the transmission. If the server is the bottleneck, which occurs at a low level of sharing, there is no actual disincentive for the client to share. If the client is the bottleneck, which occurs when the level of sharing in the network is high, the client has a disincentive to share. The authors' simulation results substantiate their theoretical results.
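
To make the utilization figures reported above concrete, the following sketch converts them into effective download rates by multiplying each incoming bandwidth by its reported utilization. The function and variable names are illustrative and do not come from [Feldman03]; only the bandwidth and utilization values are taken from the table above.

```python
# Illustrative arithmetic only. Names are hypothetical; the numbers are
# the bandwidth and utilization values quoted from [Feldman03] above.

def effective_download_rate(in_bw_mbps: float, utilization: float) -> float:
    """Incoming bandwidth (Mb/s) actually used for downloads while uploading."""
    return in_bw_mbps * utilization


links = {
    "ADSL": {"in_bw_mbps": 1.5, "utilization": 0.2},
    "Ethernet": {"in_bw_mbps": 10.0, "utilization": 0.8},
}

# Effective download rate per link type.
rates = {name: effective_download_rate(**p) for name, p in links.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:.1f} Mb/s of download throughput while uploading")
```

Under these figures an uploading ADSL node would see roughly a fifth of its nominal download capacity, which is why the potential disincentive is described as particularly high for ADSL.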
For homogeneous systems there is no disincentive to share, whatever the sharing level. For heterogeneous systems, the nodes with ADSL experience an actual disincentive to share, but only at a high level of sharing in the network.

Fig 3: Latency experienced by nodes in a heterogeneous system with 95% ADSL and 5% Ethernet nodes [Feldman03]

To remove this disincentive, the authors propose that TCP ACKs be prioritized over normal data flows on the outgoing link. The authors consider the simulation results for this change inconclusive, however, since prioritizing ACKs improves the receiver's incoming throughput but reduces the sender's outgoing throughput.

Rationality and Self Interest in P2P Networks

This paper [Shn03] is largely theoretical. It is included in the survey because it gives an interesting perspective and proposes the use of an emerging field of computer science and artificial intelligence to address the problem of rational behavior in P2P systems. The paper has three objectives:

a) To convince the reader that rationality is a real issue in peer-to-peer networks.
b) To introduce Algorithmic Mechanism Design (AMD) and Distributed Algorithmic Mechanism Design (DAMD) as tools that can be used when designing networks with rational nodes.
c) To describe three open problems that are relevant in the peer-to-peer setting but are unsolved in existing AMD/DAMD work.

The authors give examples of rational behavior in different forms of P2P systems, from peer-to-peer search, as in KaZaA [Kazaa], to peer-to-peer computation, as in the SETI@home project, to argue that rational behavior can arise in any P2P system. Proposing the use of the field of Mechanism Design (MD), the authors give an overview of the objectives of MD that are of interest to the designer of a P2P system.
The idea in MD is to define the strategic situation, or “rules of the game”, so that the system as a whole exhibits good behavior in equilibrium when self-interested nodes pursue self-interested strategies. Formally, a mechanism is a specification of possible player strategies and a mapping from the set of played strategies to outcomes. MD can be thought of as inverse game theory: where game theory reasons about how agents will play a game, MD reasons about how to design games that produce desired outcomes. MD assumes that the players feed their calculated strategies to a special obedient center that performs the mechanism calculation and declares the outcome. A famous example of a good mechanism (with a center) is the second-price sealed-bid auction (Vickrey auction), in which the highest bidder wins but pays the second-highest bid, so that bidding one's true value is a dominant strategy. In contrast to MD, the field of DAMD assumes that the mechanism calculation is carried out via a distributed computation.

The authors finally raise three open problems that are unsolved in the AMD/DAMD work but relevant to the P2P setting:

a) Open Problem #1: What effect does network topology have on message passing in a centralized mechanism running on a peer-to-peer network? What about in a decentralized mechanism?
b) Open Problem #2: What are the bounds on the guarantees that mechanism design can provide in a distributed setting, and what is the minimum set of helper technologies that must be employed in concert with DAMD ideas in distributed networks?
c) Open Problem #3: How can assumptions about the distribution (but not the identity) of various node strategy types help to create mechanisms with good properties?

Peer-Approved Incentive Mechanism

The authors of [Rang03] model the problem of cooperation in P2P systems as a Multi-Person Prisoner's Dilemma (MPD) [MPD]. The following four conditions define an MPD:

1. There are n players in the system, each with the same binary choice and payoffs.
2. Each player has the same preferred choice, which does not change no matter what the other players do.
3. A player is always better off if more of the others choose the un-preferred alternative.
4. For a certain k > 1, if k or more players choose the un-preferred alternative, they are better off than if all players had chosen the preferred alternative.

They classify incentive schemes as using either pricing policies or non-pricing policies, and they compare one scheme from each category using simulation:

a) Token Exchange: a pricing scheme in which a consumer must transfer a token to the supplier prior to a file download. To enable newcomers to use the system, each first-time user might be allotted a fixed number of tokens, but once these run out, the user has to serve files to earn tokens.
b) Peer-Approved: a reputation system is used to maintain ratings for users, who are allowed to download files only from others with a lower or equal rating. This strategy motivates users to increase their rating in order to gain access to more files. User ratings can be based on different metrics, e.g., the number of files advertised by a user or the number of file requests served by a user. First-time users without files to share should be allowed to download a small number of files so that they can enter the system and build their rating.

The authors believe the second, non-pricing scheme is more flexible, since users do not have to make a decision each time they want a file. This is akin to the argument for flat pricing over usage-based pricing. However, the authors assume that a reliable and secure underlying mechanism for implementing the above schemes is already in place, and they focus on the policies.

The authors use simulations to compare the above two strategies and a modification of Peer-Approved called Peer-Approved Tier, in which only a limited number of user rating categories are allowed.
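
The admission rules of the two schemes described above can be sketched as follows. This is a minimal illustration rather than the implementation used in [Rang03]: the class, the function names and the newcomer allowance of five tokens are hypothetical, the rating is taken to be the number of files a user advertises (one of the metrics mentioned above), and the newcomer allowance for Peer-Approved is omitted for brevity.

```python
from dataclasses import dataclass

INITIAL_TOKENS = 5  # hypothetical allowance for first-time users


@dataclass
class User:
    name: str
    tokens: int = INITIAL_TOKENS   # Token Exchange balance
    files_advertised: int = 0      # used here as the Peer-Approved rating


def token_exchange_download(consumer: User, supplier: User) -> bool:
    """Token Exchange: the consumer pays one token to the supplier per download."""
    if consumer.tokens < 1:
        return False  # out of tokens: must serve files to earn more
    consumer.tokens -= 1
    supplier.tokens += 1
    return True


def peer_approved_can_download(consumer: User, supplier: User) -> bool:
    """Peer-Approved: download only from users with a lower or equal rating."""
    return supplier.files_advertised <= consumer.files_advertised


alice = User("alice", files_advertised=10)
bob = User("bob", tokens=0, files_advertised=40)

print(token_exchange_download(alice, bob))     # alice can pay bob a token
print(peer_approved_can_download(alice, bob))  # bob's rating exceeds alice's
print(peer_approved_can_download(bob, alice))  # alice's rating is lower than bob's
```

The sketch shows the difference in where the decision sits: Token Exchange charges on every download, while Peer-Approved only compares two ratings that change when a user decides to advertise more files.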
In the simulation analysis, a heterogeneous set of users change the number of files they share depending on the benefits they perceive in each iteration of the simulation. Each user has 50 files, and all files are assumed to be equally popular. All users have the same bandwidth and storage space. Each user initially advertises only a percentage of his files, according to a Zipf distribution. For the Peer-Approved schemes the rating of a user is the number of files currently advertised. The results are illustrated in Fig 4.

Fig 4: Simulation results for a Zipf file advertising distribution [Rang03]

The performance of Peer-Approved is comparable to that of Token Exchange, so the authors conclude that it is a useful scheme in scenarios where a pricing scheme like Token Exchange is not preferred. It should be noted, however, that the authors ignore the second type of free rider identified in the Gnutella study: those who advertise content that is not desired by others. A user who advertises files may choose to advertise files that are of no use to others, and thus manipulate the system.

Incentives for cooperation in Peer-to-Peer Networks

In this paper [Lai03], the authors use and extend the Evolutionary Prisoner's Dilemma (EPD) [EPD] to study cooperation in a P2P system. The EPD extends the classical Prisoner's Dilemma by introducing repeated games and the building of reputation. The authors' extended version, called the Asymmetric EPD (AEPD), works as follows:

a) The AEPD consists of players who meet for games.
b) A player can be a client in one game and a server in another.
c) The server has a choice between cooperation and defection.
d) Players decide depending on a strategy.
e) They may maintain histories of other players' actions.
f) As a result of the client's and server's actions, the payoffs from a payoff matrix are added to their scores.
g) A round consists of one game by each player in the system as a client and one as a server.
h) A generation consists of r rounds.
i) After a generation, all history is cleared.
j) After a generation, players evolve from their current strategies to higher scoring strategies in proportion to the difference between the average scores of the two strategies.

They assume three types of players at the start of the game: 100% Cooperators, 100% Defectors, and Reciprocatives, who use the decision function

$P(\text{cooperation with } X) = \min\left(\frac{\text{cooperation } X \text{ gave}}{\text{cooperation } X \text{ received}},\ 1\right)$

They give simulation results for different initial mixes of these player types.

They also compare the performance of different stranger policies:

a. 100% Defect.
b. 100% Cooperate.
c. Adaptive: $p_c^{t+1} = (1 - \mu)\, p_c^{t} + \mu\, C_t$, where $p_c^{t}$ is the probability of cooperating with a stranger at time t, and $C_t = 1$ if the last stranger cooperated and 0 otherwise.

In summary, their simulation results show the following:

1. Incentive techniques relying on a private history of other players' actions fail as the population size increases.
2. Shared history scales to large populations but requires supporting infrastructure and is subject to collusion. Collusion occurs when a group of players conspire to share false history about their defector friends, claiming that they cooperated.
3. Incentive techniques that adapt to the behavior of strangers converge to complete cooperation despite the absence of centralized identity allocation.

Addressing the Non-cooperation Problem in P2P Systems

The authors of [Kamvar03] look at the problem of cooperation from a fresh perspective. They make the following observations:

a) In a P2P system where users gain from answering a query, free riding is an unlikely problem. For example, in a pay-per-transaction file-sharing system where peers get paid for uploading files, peers will want to share files, because this generates income. In an auction system, where the auction advertisement is analogous to a query and bids are analogous to query responses, peers will want to submit bids. Not only are peers eager to provide services (e.g. share files), but they are in competition with other peers to provide their services.
b) This competition creates another problem for networks like Gnutella, which depend on peers to forward queries to other peers, since a peer may drop a query to improve its chances of winning the auction (being selected to answer the query).

The authors propose the Right To Respond (RTR) protocol to tackle the second issue. They propose to run this protocol on top of Gnutella, and they assume the existence of an efficient micropayment scheme.

The RTR protocol works as follows. At the core of the protocol is the concept of a right to respond, or RTR. An RTR is simply a token signifying that a peer has a right to respond to a query message. A query is treated as a commodity: peers should pay to receive the query, because it in turn brings in potential business. If a peer never receives any queries, then it can never provide its service to anyone. A real-life analogue is companies that buy lists of email addresses or referrals from other companies, so that they have a new pool of potential customers.

Once a peer buys an RTR for a given query, it may do one or both of the following: (a) respond to the query and hope that it is chosen to provide its services, (b) sell the RTR to other peers. (It can still respond to the query even if it resells the RTR.) Peers can buy and sell RTRs with their neighbors only.

In this framework, selling an RTR is equivalent to forwarding a query. Hence, there is a built-in incentive to forward queries, since peers get paid to do so. Of course, some peers may still choose not to forward any queries in order to increase the probability that they will be chosen to provide the service. However, their actions will be offset by those peers who hedge their risk by selling a few RTRs, and by those peers who speculate in RTRs (buying RTRs simply to resell them).
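
The trading rules above can be sketched as a small model. This is only an illustration under stated assumptions, not the protocol of [Kamvar03]: the `Peer` class, the `sell_rtr` function and the plain balance transfer (standing in for the micropayment scheme the authors assume) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    balance: float = 0.0
    neighbors: set = field(default_factory=set)  # names of neighboring peers
    held_rtrs: set = field(default_factory=set)  # query ids this peer may answer


def sell_rtr(seller: Peer, buyer: Peer, query_id: str, price: float) -> bool:
    """Selling an RTR plays the role of forwarding the query to a neighbor."""
    if buyer.name not in seller.neighbors:
        return False  # RTRs are traded with neighbors only
    if query_id not in seller.held_rtrs:
        return False  # a peer cannot sell a right it does not hold
    if query_id in buyer.held_rtrs:
        return False  # the buyer has already bought this RTR
    if buyer.balance < price:
        return False  # assumption: no credit in this toy payment model
    buyer.balance -= price
    seller.balance += price
    buyer.held_rtrs.add(query_id)  # the seller keeps its own right to respond
    return True


a = Peer("a", neighbors={"b"}, held_rtrs={"q1"})
b = Peer("b", balance=1.0, neighbors={"a"})
c = Peer("c", balance=1.0)

print(sell_rtr(a, b, "q1", 0.25))  # neighbor purchase: a is paid to forward
print(sell_rtr(a, c, "q1", 0.25))  # c is not a neighbor of a
```

Because the seller is paid for each sale and still retains its own right to respond, forwarding a query becomes a source of income rather than a pure cost, which is the incentive the protocol relies on.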

Basic Implementation of RTR

An RTR has the following format:

RTR = {Q, ts, query}_SKQ

where Q is the identity of the querying peer, ts is the timestamp at which the query was first issued, and query is the actual query string. These three values are signed with the querying peer's secret key SKQ, so that RTRs cannot be forged. Hence, each query requires a single signature generation, and one verification per forward.

When a peer A forwards a query to a neighbor B, it first sends an offer containing partial RTR information and a price:

Offer = {rep(Q), ts, query, price}

where rep(Q) is the reputation of the querying peer. The authors do not specify exactly how this reputation is determined. The offer contains enough information for B to determine whether to purchase the RTR, and whether the RTR is a duplicate that B has seen before. However, because the identity of Q is not revealed, B cannot actually answer the query without purchasing the full RTR. If B decides not to purchase the RTR, it simply drops the offer. Otherwise, B sends a purchase request to A, and peer A forwards the full RTR to B.

The RTR protocol also allows the use of filters to restrict the RTRs received by a peer. These filters, in turn, can be used by a node to judge how desirable an RTR is to its neighbor and to fix an appropriate price. Peers also have the option of disconnecting from neighbors who are either bad sellers (they sell uninteresting RTRs) or bad buyers (they do not buy RTRs).

The authors propose to study the performance of the protocol using simulation in future work.

Conclusions

The peer-to-peer networking paradigm promises to revolutionize the way we design, build and use the communications networks of tomorrow. The fundamental premise of peer-to-peer systems is that individual peers voluntarily contribute resources to the system.
However, the inherent tension between universal cooperation for optimal overall utility and the rational individual's incentive to defect leads to suboptimal utility in such systems. This problem has come into sharp focus with the revealing study of Gnutella in [AdHu00]. The remedy is to provide incentives for users to cooperate in such systems.

Most of the schemes surveyed in this report use game-theoretic models to analyze the problem. One proposal introduces the field of mechanism design (inverse game theory) for designing P2P systems. Others propose pricing mechanisms as incentives to mitigate selfish behavior. However, the effectiveness of these schemes can only be tested by a full-fledged implementation, since most of them rely on assumptions about P2P systems that need to be verified in practice.

The growing popularity of P2P systems, evidenced by the report that there is at present more KaZaA traffic than Web traffic [RossInfocom], makes it urgent to address the issues threatening their survival.

References

[Gn00a] The Gnutella home page, http://www.gnutella.com/
[Na00] The Napster home page, http://www.napster.com/
[Fr00] The FreeNet home page, http://freenet.sourceforge.net/
[Kazaa] The KaZaA home page, http://www.kazaa.com/us/
[P2P] http://www-sop.inria.fr/mistral/personnel/Robin.Groenevelt/Publications/Peer-toPeer_Introduction_Feb.ppt
[Kamvar03] S. Kamvar, B. Yang, and H. Garcia-Molina, "Addressing the Non Cooperation Problem in Competitive P2P Systems," Workshop on Economics of Peer-to-Peer Systems, June 2003.
[Shn03] J. Shneidman and D. Parkes, "Rationality and Self-Interest in Peer-to-Peer Networks," Proceedings of 2nd Int. Workshop on Peer-to-Peer Systems, February 2003.
[Lai03] K. Lai, M. Feldman, I. Stoica, and J. Chuang, "Incentives for Cooperation in Peer-to-Peer Networks," Workshop on Economics of Peer-to-Peer Systems, June 2003.
[Hardin68] G. Hardin, "The Tragedy of the Commons," Science 162 (1968), 1243–1248.
[EPD] R. Axelrod, The Evolution of Cooperation, Basic Books, 1984.
[Rang03] K. Ranganathan, M. Ripeanu, A. Sarin, and I. Foster, "To Share or Not to Share: An Analysis of Incentives to Contribute in Collaborative File Sharing Environments," Workshop on Economics of Peer-to-Peer Systems, June 2003.
[MPD] T.C. Schelling, Micromotives and Macrobehavior, W.W. Norton & Company, 1978.
[Feldman03] M. Feldman, K. Lai, J. Chuang, and I. Stoica, "Quantifying Disincentives in Peer-to-Peer Networks," 1st Workshop on Economics of Peer-to-Peer Systems, June 2003.
[AdHu00] E. Adar and B.A. Huberman, "Free Riding on Gnutella," First Monday, 2000. http://www.firstmonday.dk/issues/issue5_10/adar/
[RossInfocom] http://cis.poly.edu/~ross/papers/P2PtutorialInfocom.pdf