MIT Sloan School of Management

Sloan Working Paper 4180-01
eBusiness@MIT Working Paper 101
October 2001

BUILDING TRUST ON-LINE: THE DESIGN OF RELIABLE REPUTATION REPORTING MECHANISMS FOR ONLINE TRADING COMMUNITIES

Chrysanthos Dellarocas

This paper is available through the Center for eBusiness@MIT web site at the following URL: http://ebusiness.mit.edu/research/papers.html

This paper also can be downloaded without charge from the Social Science Research Network Electronic Paper Collection: http://papers.ssrn.com/abstract=289967

Building Trust On-Line: The Design of Reliable Reputation Reporting Mechanisms for Online Trading Communities

Chrysanthos Dellarocas
Sloan School of Management
Massachusetts Institute of Technology
Room E53-315, Cambridge, MA 02139
dell@mit.edu

Abstract: Several properties of online interaction are challenging the accumulated wisdom of trading communities on how to produce and manage trust. Online reputation reporting systems have emerged as a promising trust management mechanism in such settings. The objective of this paper is to contribute to the construction of online reputation reporting systems that are robust in the presence of unfair and deceitful raters. The paper sets the stage by providing a critical overview of the current state of the art in this area. Following that, it identifies a number of important ways in which the reliability of the current generation of reputation reporting systems can be severely compromised by unfair buyers and sellers. The central contribution of the paper is a number of novel "immunization mechanisms" for effectively countering the undesirable effects of such fraudulent behavior. The paper describes the mechanisms, proves their properties and explains how various parameters of the marketplace microstructure, most notably the anonymity and authentication regimes, can influence their effectiveness.
Finally, it concludes by discussing the implications of the findings for the managers and users of current and future electronic marketplaces and identifies some important open issues for future research.

1. Introduction

At the heart of any bilateral exchange there is a temptation, for the party who moves second, to defect from the agreed upon terms in ways that result in individual gains for it (and losses for the other party). For example, in transactions in goods or services where the buyer pays first, the seller is tempted to not provide the agreed upon goods or services, or to provide them at a quality which is inferior to what was advertised to the buyer. Unless there are some other guarantees, the buyer would then be tempted to hold back on her side of the exchange as well. In such situations, the trade will never take place and both parties will end up being worse off. Unsecured bilateral exchanges thus have the structure of a Prisoner's Dilemma.

Our society has developed a wide range of informal and formal mechanisms and institutions for managing such risks and thus facilitating trade. The simple act of meeting face-to-face to settle a transaction helps reduce the likelihood that one party will end up empty-handed. Written contracts, commercial law, credit card companies and escrow services are additional examples of institutions with exactly the same goals. Although mechanism design and institutional support can help reduce transaction risks, they can never eliminate them completely. One example is the risk involving the exchange of goods whose "real" quality can only be assessed by the buyer a relatively long time after a trade has been completed (e.g. used cars). Even where society does provide remedial measures to cover risks in such cases (for example, the Massachusetts "lemon law"), these are usually burdensome and costly, and most buyers would very much rather not have to resort to them.
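The Prisoner's Dilemma structure mentioned above can be made concrete with a toy payoff table. All payoff numbers below are invented for illustration and are not taken from the paper; they merely satisfy the ordering that defines the dilemma.

```python
# Illustrative Prisoner's Dilemma payoffs for an unsecured exchange.
# Entries are (buyer_payoff, seller_payoff); numbers are hypothetical.
payoffs = {
    ("pay",  "ship"): (3, 3),    # both cooperate: trade completes
    ("pay",  "keep"): (-2, 5),   # buyer pays, seller defects
    ("hold", "ship"): (5, -2),   # seller ships, buyer defects
    ("hold", "keep"): (0, 0),    # no trade
}

def best_reply_seller(buyer_move):
    # The seller picks the action maximizing her own payoff.
    return max(("ship", "keep"), key=lambda s: payoffs[(buyer_move, s)][1])

def best_reply_buyer(seller_move):
    # The buyer does the same.
    return max(("pay", "hold"), key=lambda b: payoffs[(b, seller_move)][0])

# Defection is a dominant strategy for both parties:
assert best_reply_seller("pay") == "keep"
assert best_reply_seller("hold") == "keep"
assert best_reply_buyer("ship") == "hold"
assert best_reply_buyer("keep") == "hold"
# So the unique equilibrium is ("hold", "keep"): no trade, payoff (0, 0),
# even though ("pay", "ship") would leave both parties better off.
```

Whatever the other side does, each party gains by defecting, so without outside guarantees the trade unravels exactly as described above.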
Generally speaking, the more the two sides of a transaction are separated in time and space, the greater the risks. In those cases, no transaction will take place unless the party who moves first possesses a sufficient degree of trust that the party who moves second will indeed honor its commitments. The production of trust, therefore, is a precondition for the existence of any market and of civilized society in general (Dunn, 1984; Gambetta, 1990).

In "bricks and mortar" communities, the production of trust is based on several cues, often rational but sometimes purely intuitive. For example, we tend to trust or distrust potential trading partners based on their appearance, the tone of their voice or their body language. We also ask our already trusted partners about their prior experiences with the new prospect, under the assumption that past behavior is a relatively reliable predictor of future behavior. Taken together, these experiences form the reputation of our prospective partners.

The emergence of electronic markets and other types of online trading communities is changing the rules on many aspects of doing business. Electronic markets promise substantial gains in productivity and efficiency by bringing together a much larger set of buyers and sellers and by substantially reducing search and transaction costs (Bakos, 1997; Bakos, 1998). In theory, buyers can then look for the best possible deal and end up transacting with a different seller on every single transaction. None of these theoretical gains will be realized, however, unless market makers and online community managers find effective ways to produce trust among their members. The production of trust is thus emerging as an important management challenge in any organization that operates or participates in online trading communities.

Several properties of online communities challenge the accumulated wisdom of our societies on how to produce trust. Formal institutions, such as legal guarantees, are less effective in global electronic markets, which span multiple, often conflicting, legal jurisdictions (Johnson and Post, 1996). For example, it is very difficult, and costly, for a buyer who resides in the U.S.A. to resolve a trading dispute with a seller who lives in Indonesia. The difficulty is compounded by the fact that, in many electronic markets, it is relatively easy for trading partners to suddenly "disappear" and reappear under a different online identity (Friedman and Resnick, 1999; Kollock, 1999). Furthermore, many of the cues based on which we tend to trust or distrust other individuals are absent in electronic markets, where face-to-face contact among parties is the exception. Finally, electronic markets open up the universe of potential trading partners and enable transactions among parties who have never worked together in the past. In such a large trading space, most of one's already trusted partners are unlikely to be able to provide much information about the reputation of many of the other prospects that one may be considering.

As a counterbalance to those challenges, electronic communities are capable of storing complete and accurate information about all transactions they mediate. Several researchers and practitioners have, therefore, started to look at ways in which this information can be aggregated and processed by the market makers or other trusted third parties in order to help online buyers and sellers assess each other's trustworthiness. This has led to a new breed of systems, which are quickly becoming an indispensable component of every successful online trading community: electronic reputation reporting systems.
We are already seeing the first generation of such systems in the form of online ratings, feedback or recommender systems (Resnick and Varian, 1997; Schafer et al., 2001). The basic idea is that community members are given the ability to rate or provide feedback about their experiences with other community members. Feedback systems aim to build trust by aggregating such individual ratings of past behavior and making them available to other users as predictors of future behavior. eBay (www.ebay.com), for example, encourages both parties of each transaction to rate one another with either a positive (+1), neutral (0) or a negative (-1) rating plus a short comment. eBay makes the cumulative ratings of its members, as well as all individual comments, publicly available to every registered user.

The majority of the current generation of online feedback systems have been developed by Internet entrepreneurs and their reliability has not yet been systematically researched. In fact, there is ample anecdotal evidence, as well as one recent legal case¹, related to the ability to effectively manipulate people's actions by using online feedback forums (stock message boards in this case) to spread false opinions. As more and more organizations participate in electronic marketplaces, online reputation reporting systems deserve new scrutiny, and the study of trust management systems in digital communities deserves to become a new addition to the burgeoning field of Management Science.

The objective of this paper is to contribute to the construction of online reputation reporting systems that are robust in the presence of unfair and deceitful raters. The paper sets the stage by providing a critical overview of the current state of the art in this area (Section 2). Following that, it identifies a number of important ways in which the predictive value of the current generation of reputation reporting systems can be severely compromised by unfair buyers and sellers (Section 3). The central contribution of the paper is a number of novel "immunization mechanisms" for effectively countering the undesirable effects of such fraudulent behavior. The paper describes the mechanisms, proves their properties and explains how various parameters of the marketplace microstructure, most notably the anonymity and authentication regimes, can influence their effectiveness (Section 4). Finally, it concludes by discussing the implications of the findings for the managers and users of current and future electronic marketplaces and identifies some open issues for future research (Section 5).

2. Reputation reporting mechanisms

The relative ease with which computers can capture, store and process huge amounts of information about past transactions makes past behavior (reputational) information a particularly promising base on which to produce trust in online communities. This fact, together with the fact that the other traditional ways of producing trust (institutional guarantees, indirect cues) do not work as well in cyberspace, has prompted researchers and practitioners to focus their attention on developing online trust building mechanisms based on reputational information. This section provides a critical survey of the state-of-the-art in this field.

A reputation, as defined by Wilson (1985), is a "characteristic or attribute ascribed to one person by another". Operationally, a reputation is usually represented as a prediction about likely future behavior. It is, however, primarily an empirical statement. Its predictive power depends on the supposition that past behavior is indicative of future behavior. Reputation has long been the object of study of the social sciences (Schmalensee, 1978; Shapiro, 1982; Smallwood and Conlisk, 1979). Several economists and game theorists have demonstrated that, in the presence of imperfect information, the formation of reputations is an important force that helps buyers manage transaction risks, and that also provides incentives to sellers to provide good service quality.

Having interacted with someone in the past is, of course, the most reliable source of information about that agent's reputation. But relying only on direct experiences is both inefficient and dangerous. Inefficient, because an individual will be limited in the number of exchange partners he or she has, and dangerous because one will discover untrustworthy partners only through hard experience (Kollock, 1999). These shortcomings are especially severe in the context of online communities, where the number of potential partners is huge and the institutional guarantees in case of negative experiences are weaker.

Great gains are possible if information about past interactions is shared and aggregated within a group in the form of opinions, ratings or recommendations. In "bricks and mortar" communities this can take many forms: informal gossip networks, institutionalized rating agencies, professional critics, etc. In cyberspace, they take the form of online reputation reporting systems, also known as online recommender systems (Resnick and Varian, 1997). The following sections provide a brief discussion of the most important design challenges and categories of these systems.
2.1 Design challenges in online reputation reporting systems

Although the effective aggregation of other community members' opinions can be a very effective way to gather information about the reputation of prospective trading partners, it is not without pitfalls. The following paragraphs describe two important issues that need to be addressed by opinion-based reputation reporting mechanisms.

Subjectively measurable attributes. In the rest of the paper we will use the term "agent" to refer to a participant (human or software) of an online trading community. We say that an attribute Q of an agent s is subjectively measurable if identical behavior of agent s vis-a-vis two different agents b1 and b2 may result in two different ratings R_b1 ≠ R_b2 of attribute Q by the respective raters. The most common example of a subjectively measurable attribute is the notion of product or service "quality". In most transaction types, some of the attributes of interest are subjectively measurable. In order for agent b to make use of other agents' ratings for subjectively measurable attributes as a basis for calculating agent s's reputation, it must first try to "translate" each of them into its own value system.

In traditional communities we address the above issue by primarily accepting recommendations from people whom we know already. In those cases, our prior experience with these people helps us gauge their opinions and "translate" them into our value system. For example, we may know from past experience that Bill is extremely demanding, and so a rating of "acceptable" on his scale would correspond to "brilliant" on our scale. As a further example, we may know that Mary and we have had similar tastes in movies but not in food, so we follow her opinions on movies while we ignore her recommendations on restaurants. Due to the much larger number of potential trading partners in online communities, it is less likely that our immediate "friends" will have had direct experiences with several of the prospects considered. It is, once again, much more likely that we will have to rely on the opinions of strangers, and so gauging such opinions becomes much more difficult.

Intentionally false opinions. For a number of reasons (see Section 3), agents may deliberately provide false opinions about another agent, that is, opinions which bear no relationship to their truthful assessment of their experiences with that other agent. In contrast to subjective opinions, for which we have assumed that there can be a possibility of "translation" to somebody else's value system, false opinions are usually deliberately constructed to mislead their recipients, and the only sensible way to treat them is to ignore them. In order to be able to ignore them, however, one has to first be able to identify them. Before accepting opinions, raters must, therefore, also assess the trustworthiness of other agents with respect to giving honest opinions. (Yahalom et al., 1993) correctly pointed out that the so-called "recommender trustworthiness" of an agent is orthogonal to its trustworthiness as a service provider. In other words, an agent can be a high-quality service provider and a very unreliable recommendation provider, or vice versa.

In the rest of the section we will briefly survey the various classes of proposed online reputation reporting systems and will discuss how each of them fares in addressing the above issues.

2.2 Recommendation repositories

Recommendation repositories store and make available recommendations from a large number of community members without attempting to substantially process or qualify them. The Web is obviously very well suited for constructing such repositories. In fact, most current-generation web-based recommendation systems (message boards, opinion forums, etc.) fall into this category. A typical representative of this class of systems is the feedback mechanism of auction site eBay. Other popular auction sites, such as Yahoo and Amazon, employ very similar mechanisms. eBay encourages the buyer and seller of an eBay-mediated transaction to leave feedback for each other. Feedback consists of a numerical rating, which can be +1 (praise), 0 (neutral) or -1 (complaint), plus a short (80 characters max.) text comment. eBay then makes the list of all submitted feedback accessible to any other registered user of the system. eBay does calculate some rudimentary statistics of the submitted ratings for each user (the sum of positive, neutral and negative ratings in the last 7 days, past month and 6 months) but, otherwise, it does not filter, modify or process the submitted ratings.

Recommendation repositories are a step in the right direction. They make lots of information about other agents available to interested users, but they expect users to "make sense" of those ratings themselves and draw their own conclusions. On the one hand, this is consistent with the viewpoint that the assessment of somebody's trustworthiness is an essentially subjective process (Boon and Holmes, 1991). On the other hand, however, this baseline approach does not scale very well. In situations where there are dozens or hundreds of, possibly conflicting, ratings, users need to spend considerable effort reading "between the lines" of individual ratings in order to "translate" other people's ratings to their own value system, or in order to decide whether a particular rating is honest or not. What's more, in communities where most raters are complete strangers to one another, there is no concrete evidence that reliable "reading between the lines" is possible at all. In fact, as we mentioned, there is ample anecdotal evidence of people being misled by following the recommendations of false messages posted on Internet feedback forums.
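The rudimentary statistics described above can be sketched in a few lines of code. This is a simplified reconstruction for illustration only: the field names, exact window lengths and data layout are our own assumptions, not eBay's actual implementation.

```python
# Sketch of eBay-style feedback statistics: a running total of +1/0/-1
# ratings, plus positive/neutral/negative counts over recent time windows.
# All structure here is hypothetical, chosen only to illustrate the idea.
from datetime import datetime, timedelta

def feedback_profile(ratings, now):
    """ratings: list of (timestamp, score) pairs with score in {+1, 0, -1}."""
    windows = {"7 days": 7, "past month": 30, "6 months": 182}
    profile = {"total": sum(score for _, score in ratings)}
    for label, days in windows.items():
        cutoff = now - timedelta(days=days)
        recent = [s for t, s in ratings if t >= cutoff]
        profile[label] = {
            "positive": sum(1 for s in recent if s == +1),
            "neutral":  sum(1 for s in recent if s == 0),
            "negative": sum(1 for s in recent if s == -1),
        }
    return profile

now = datetime(2001, 10, 1)
ratings = [(now - timedelta(days=d), s)
           for d, s in [(1, +1), (3, +1), (10, -1), (40, 0), (200, +1)]]
profile = feedback_profile(ratings, now)
print(profile["total"])  # -> 2 (three praises, one complaint, one neutral)
```

Note that this is exactly the kind of unfiltered aggregation criticized above: every rating counts equally, honest or not.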
2.3 Professional (specialist) rating sites

Specialist-based recommendation systems employ trusted and knowledgeable specialists who engage in first-hand transactions with a number of service providers and then publish their "authoritative" ratings. Other users then use these ratings as a basis for forming their own assessment of someone's trustworthiness. Examples of specialist-based recommendations are movie and restaurant critics, credit-rating agencies (Moody's) and e-commerce professional rating agencies, such as Gomez Advisors, Inc. (www.gomez.com).

The biggest advantage of specialist-based recommendation systems is that they address the problem of false ratings mentioned above. In most cases specialists are professionals and take great pains to build and maintain their trustworthiness as disinterested, fair sources of opinions (otherwise they would quickly find themselves out of business). On the other hand, specialist-based recommendation systems have a number of shortcomings, which become even more severe in online communities. First, specialists can only test a relatively small number of service providers. There is time and cost involved in performing these tests and, the larger and the more volatile the population of one community, the lower the percentage of certified providers. Second, specialists must be able to successfully conceal their identity, or else there is a danger that providers will provide atypically good service to the specialist for the purpose of receiving good ratings. Third, specialists are individuals with their own tastes and internal ratings scale, which do not necessarily match those of any other user of the system. Individual users still need to be able to gauge a specialist's recommendation in order to derive their own likely assessment. Last but not least, specialists typically base their ratings on a very small number of sample interactions with the service providers (often just one).
This makes specialist ratings a very weak basis from which to estimate the variability of someone's service attributes, which is an important aspect of someone's trustworthiness, especially in dynamic, time-varying environments.

2.4 Collaborative filtering systems

Collaborative filtering techniques (Goldberg et al., 1992; Resnick et al., 1994; Shardanand and Maes, 1995; Billsus and Pazzani, 1998) attempt to process "raw" ratings contained in a recommendation repository in order to help raters focus their attention only on a subset of those ratings which are most likely to be useful to them. The basic idea behind collaborative filtering is to use past ratings submitted by an agent b as a basis for locating other agents b1, b2, ... whose ratings are likely to be most "useful" to agent b in order to accurately predict someone's reputation from its own subjective perspective. There are several classes of proposed techniques.

Classification or clustering approaches rely on the assumption that agent communities form a relatively small set of taste clusters, with the property that ratings of agents of the same cluster for similar things are similar to each other. Therefore, if the taste cluster of an agent b can be identified, then ratings of other members of that cluster for an attribute Q of agent s can be used as statistical samples for calculating the estimated rating R̂ for that same attribute from the perspective of b. The problem of identifying the "right" taste cluster for a given agent reduces to the well-studied problem of classification/data clustering (Kaufman and Rousseeuw, 1990; Jain et al., 1999; Gordon, 1999). In the context of collaborative filtering, the similarity of two buyers is commonly a function of the distance of their ratings for commonly rated sellers. Collaborative filtering researchers have experimented with a variety of approaches, based on statistical similarity measures (Resnick et al., 1994; Breese et al., 1998) as well as machine learning techniques (Billsus and Pazzani, 1998).
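As a concrete illustration of the similarity-based approach, the sketch below compares buyers via the Pearson correlation of their ratings for commonly rated sellers and picks nearest neighbors. This is a minimal illustration, not any of the cited systems; all buyer names and ratings are made up.

```python
# Similarity-based neighbor selection (illustrative sketch).
from math import sqrt

def pearson(r1, r2):
    """r1, r2: dicts mapping seller -> rating; returns similarity in [-1, 1]."""
    common = set(r1) & set(r2)
    if len(common) < 2:
        return 0.0  # not enough overlap to compare tastes
    x = [r1[s] for s in common]
    y = [r2[s] for s in common]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def nearest_neighbors(buyer, ratings, k=1):
    """Return the k buyers most similar to `buyer` by Pearson correlation."""
    scores = [(pearson(ratings[buyer], ratings[other]), other)
              for other in ratings if other != buyer]
    return [other for _, other in sorted(scores, reverse=True)[:k]]

ratings = {
    "b":  {"s1": 8, "s2": 2, "s3": 5},
    "b1": {"s1": 9, "s2": 1, "s3": 6},   # similar tastes to b
    "b2": {"s1": 2, "s2": 9, "s3": 4},   # roughly opposite tastes
}
print(nearest_neighbors("b", ratings))   # -> ['b1']
```

Ratings of b's nearest neighbors for a seller s can then serve as the statistical samples from which b's personalized estimate of s's reputation is computed.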
Regression approaches rely on the assumption that the ratings of an agent b can often be related to the ratings of another agent b' through a linear relationship of the form:

    R_b = α + β·R_b' + ε,  for all commonly rated agents s    (1)

This assumption is motivated by the belief, widely accepted by economists (Arrow, 1963; Sen, 1986), that, even when agents have "similar" tastes, one user's internal scale is not comparable to another user's scale. According to this belief, in a given community the number of strict nearest neighbors of an agent will be very limited, while the assumption of (1) opens the possibility of using the recommendations of a much larger number of agents as the basis for calculating an agent's reputation. In that case, if, for each pair of agents, we can estimate the parameters α, β, we can use formula (1) to "translate" the ratings of agents b' to the "internal scale" of agent b, and then treat the translated ratings as statistical samples for estimating the reputation from the perspective of agent b. The problem of estimating those parameters reduces to the well-studied problem of linear regression. There is a huge literature on the topic and a lot of efficient techniques which are applicable to this context (Malinvaud, 1966; Pindyck and Rubinfeld, 1981).

Both classification and regression approaches relate buyers to one another based on their ratings for a common set of sellers. If the universe of sellers is large enough, even active buyers may have rated a very small subset of sellers. Accordingly, classification and regression approaches may be unable to calculate estimated reputations for many seller-buyer pairs, because fairly little ratings data can be used to derive them. This problem is known as reduced coverage, and the accuracy of such reputation estimates may also be poor due to the sparse nature of ratings. Such weaknesses are prompting researchers to experiment, in the context of online reputation reporting systems, with the use of techniques from the field of Knowledge Discovery in Databases (Fayyad et al., 1996), which discover latent relationships among elements of sparse databases. The promising use of one such technique, Singular Value Decomposition (SVD), has been reported in (Billsus and Pazzani, 1998; Sarwar et al., 2000).
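The regression idea in formula (1) can be sketched as follows: estimate, by least squares, the (α, β) that map agent b2's rating scale onto b's, then "translate" b2's ratings into b's internal scale. All ratings below are invented so that, for the example, b2's scale maps exactly onto b's via R_b = 1 + 2·R_b2.

```python
# Least-squares "scale translation" between two raters (illustrative sketch).
def fit_translation(rb, rb2):
    """Fit rb[s] ~ alpha + beta * rb2[s] over the sellers both have rated."""
    common = sorted(set(rb) & set(rb2))
    x = [rb2[s] for s in common]
    y = [rb[s] for s in common]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    alpha = my - beta * mx
    return alpha, beta

rb  = {"s1": 9, "s2": 5, "s3": 3, "s4": 7}   # b's ratings (0-9 scale)
rb2 = {"s1": 4, "s2": 2, "s3": 1, "s4": 3}   # b2 uses a "harsher" scale
alpha, beta = fit_translation(rb, rb2)        # -> (1.0, 2.0)

# b2 rated an unseen seller s5 as 3.5; translated into b's internal scale:
translated = alpha + beta * 3.5               # -> 8.0
```

The translated ratings of many such agents b2, b3, ... can then be pooled as statistical samples for estimating a seller's reputation from b's perspective, as the text describes.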
3. The effects of unfair ratings in online reputation reporting systems

Of the various classes of systems surveyed in the previous section, we believe that recommendation repositories with collaborative filtering have the best potential for scalability and accuracy. Nevertheless, while these techniques address issues related to the subjective nature of ratings, they do not address the problem of unfair ratings. This section looks at this problem in more detail. More specifically, our goal is to study a number of unfair rating scenarios and analyze their effects in compromising the reliability of a collaborative-filtering-based reputation reporting system.

To simplify the discussion, in the rest of the paper we make the following assumptions. We assume a trading community whose participants are distinguished into buyers and sellers. In a typical transaction i, a buyer b contracts with a seller s for the provision of a service. Upon conclusion of the transaction, b provides a numerical rating R_b(i), reflecting some attribute Q of the service offered by s, as perceived by b. We further assume that only buyers can rate sellers (ratings can only be submitted in conjunction with a transaction). In a future study we will consider the implications of bi-directional ratings. Again, for the sake of simplicity, we assume that R_b(i) is a scalar quantity, although in most transactions there is more than one critical attribute and R_b(i) would be a vector.
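One purely illustrative way to represent the setup above in code is a transaction-linked scalar rating record. The field names below are our own assumptions, not the paper's notation.

```python
# A minimal data model for the paper's setting: each rating R_b(i) is a
# scalar score for attribute Q, tied to a specific transaction i.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rating:
    buyer: str        # b, the rater
    seller: str       # s, the rated party
    transaction: int  # i: ratings are submitted only with a transaction
    value: float      # R_b(i), a scalar score for attribute Q
    time: float       # submission time (relevant for time-windowed estimates)

history = [
    Rating("b1", "s", 1, 8.0, 0.0),
    Rating("b2", "s", 2, 7.5, 1.0),
    Rating("b1", "t", 3, 4.0, 2.0),
]
# A seller's raw rating history is then just a filtered list:
ratings_for_s = [r.value for r in history if r.seller == "s"]  # -> [8.0, 7.5]
```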
We assume the existence of an online reputation reporting mechanism, whose goal is to store and process past ratings in order to calculate reliable personalized reputation estimates for seller s upon the request of a prospective buyer b. In settings where the critical attribute Q for which ratings are provided is subjectively measurable, there exist four scenarios where buyers and/or sellers can intentionally try to "rig the system", resulting in biased reputation estimates which deviate from a "fair" assessment of attribute Q for a given seller:

a. Unfair ratings by buyers

• Unfairly high ratings ("ballot stuffing"): A seller colludes with a group of buyers in order to be given unfairly high ratings by them. This will have the effect of inflating a seller's reputation, therefore allowing that seller to receive more orders from buyers, and at a higher price than she deserves.

• Unfairly low ratings ("bad-mouthing"): Sellers can collude with buyers in order to "bad-mouth" other sellers that they want to drive out of the market. In such a situation, the conspiring buyers provide unfairly negative ratings to the targeted sellers, thus lowering their reputation.

b. Discriminatory seller behavior

• Negative discrimination: Sellers provide good service to everyone except a few specific buyers that they "don't like". If the number of buyers being discriminated upon is relatively small, the cumulative reputation of the sellers will be good and an externality will be created against the victimized buyers.

• Positive discrimination: Sellers provide exceptionally good service to a few select individuals and average service to the rest. The effect of this is equivalent to ballot stuffing. That is, if the group of favored buyers is sufficiently large, their favorable ratings will inflate the reputation of the discriminating sellers and will create an externality against the rest of the buyers.
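To get a feel for the magnitude of these effects, the following toy simulation (all parameter values arbitrary, chosen only for illustration) contaminates a stream of fair ratings with a fraction delta of ballot-stuffers and measures how far a naive sample-mean reputation estimate moves:

```python
# Toy ballot-stuffing simulation: a fraction delta of colluders always
# award the maximum rating, while fair ratings are (truncated) normal
# around mu. The naive sample mean then overstates the seller's quality
# by roughly delta * (R_MAX - mu).
import random

random.seed(1)
R_MIN, R_MAX = 0.0, 9.0
mu, sigma, delta = 3.0, 1.0, 0.2   # arbitrary illustrative parameters

def fair_rating():
    # Normal rating clipped to the allowed range.
    return max(R_MIN, min(R_MAX, random.gauss(mu, sigma)))

ratings = [R_MAX if random.random() < delta else fair_rating()
           for _ in range(200_000)]
estimate = sum(ratings) / len(ratings)
print(round(estimate - mu, 2))  # inflation near 0.2 * (9 - 3) = 1.2
```

Even with only 20% colluders, the seller's estimated quality jumps by more than a full point on a 0-9 scale; the analysis below derives this effect in general.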
is that there will not objectively measurable, it will be a dispersion of ratings for a given be very difficult, or distinguish ratings dispersion due to genuine taste differences from that which is impossible to due to unfair ratings or discriminatory behavior. This creates a moral hazard, which requires additional mechanisms in order to be and resolved. either avoided, or detected In the following analysis, we assume More issue of subjective ratings. the use of collaborative filtering techniques in order to address the we assume specifically, reputation of 5 from the perspective ofb, some that, in collaborative filtering technique N oi^b. N includes buyers who have previously rated nearest neighbor set neighbors of i, based on the similarity of their ratings, for other Sometimes, order to estimate the personalized .s commonly is used and who are the nearest rated sellers, with those ofb. Suppose, however, that the colluders have taken this step will filter out all unfair buyers. collaborative filtering into account and have cleverly picked buyers whose some fair raters in and unfair raters. Effects when reputation The simplest scenario is steady over lime to analyze is one where remains steady over time. That means ratings in their database, In order to make our ,R between!/? tnin 1 max and more concrete, we will which be approximated maximum rating will to the nearest taste cluster, it is neighbor expected that the assumption of normality. more set algorithms can take into account all the assumption that fair ratings can range of the general form: z~ N{^,a) to T' discussion. It is (2) (R) ~ N{jU, (T) The bounds corresponds nicely with common distributed fair ratings, requires belong make that they follow a distribution of the paper and therefore reputation, old. 
,z)) where in the rest that agent behavior, that, collaborative filtering no matter how analysis we can assume = max{R,mm{R TUR) b and ofb tastes are similar to those everything else except their ratings of 5. In that case, the resulting set A' will include some to identify the . practice. The assumption of nomially based on the previous assumption that those ratings of a given buyer, and therefore represent fair ratings will introduction of minimum a single taste cluster. Within a be relatively closely clustered around some value and hence In this paper we will focus on the reliable estimation of the reputation mean. Given all the above assumptions, the goal of a reliable reputation reporting system should be the calculation of a reputation estimate (MRE) R'^ which is equal to or very close to jU distribution in the nearest neighbor set. Ideally, therefore: , the mean of the fair fair ratings mean ' bjatr On the other hand, the goal of unfair raters MRE R between the actual the distance is to strategically introduce unfair ratings in * , calculated by the reputation system and the fair specifically the objective of ballot-stuffing agent to minimize the it. Note that, in contrast to the is to case of fair ratings, it is MRE. More MRE while bad-mouthing agents aim maximize the not safe to ratings. Therefore, all analyses in the rest form of the distribution of unfair order to maximize make any assumptions about of this paper will calculate system behavior under the most disruptive possible unfair ratings strategy. We will only analyze the case of ballot-stuffing since the case of bad-mouthing is symmetrical. that the initial collaborative filtering step constructs a nearest unfair raters is that the actual 5 and the proportion of fair raters MRE qualifying rater in A^. /?' is ^ U where the is u =R max u '^ ' , i.e. B= Rlb.uctual In the '^ mean is the proportion mean of the most where all et. al. unfair buyers give the 2001). 
In that case, the actual MRE will approximate:

R*_b,actual = (1−δ)μ + δū   (4)

where ū is the mean value of the unfair ratings. We define the reputation estimate bias for a contaminated set of ratings to be:

B = R*_b,actual − R*_b,fair   (5)

The ballot-stuffing strategy which maximizes the above bias is one where all unfair buyers give the maximum possible rating to the seller. In that scenario, the maximum MRE bias is given by:

B_max = (1−δ)μ + δR_max − μ = δ(R_max − μ)   (6)

Figure 1 tabulates B_max for several different values of δ and μ, in the special case where ratings range in [0,9]. For the purpose of comparing with the "immunization mechanisms" described in Section 4, we have highlighted biases above 5% of the ratings range (i.e. biases greater than ±0.5 points on ratings which range from 0 to 9). As can be seen, formula (6) can result in very significant inflation of a seller's MRE, especially for small μ and large δ.

Figure 1. Maximum MRE bias B_max for various values of μ and various percentages of unfair ratings (ratings range 0-9; biases above 5% of the range highlighted).
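Formula (6) can be tabulated directly (the function name is ours):

```python
def max_mean_bias(delta, mu, r_max=9.0):
    """Worst-case inflation of a mean-based MRE when a fraction delta of the
    raters all submit the maximum rating r_max (formula (6))."""
    return delta * (r_max - mu)

# In the spirit of Figure 1 (ratings range 0-9): even 10% ballot stuffers
# against a mediocre seller (mu = 3) already push the bias past 0.5 points.
for delta in (0.05, 0.10, 0.20):
    print(delta, max_mean_bias(delta, mu=3.0))
```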
In environments where reputation estimates use all available ratings, this ensures that, eventually, δ can never be more than the actual fraction of unfair raters in the community, usually a very small fraction. However, the strategy breaks down in environments where reputation estimates are based on ratings submitted within a relatively short time window (or where older ratings are heavily discounted). The following paragraph explains why.

Let us assume that the initial personalized nearest neighbor set N_initial contains m fair raters and n unfair raters; in most cases n << m. Assume further that the average interarrival time of fair ratings for a given seller is λ and that MREs R*(t) are based only on ratings for s submitted by buyers in N_initial within the time window W = [t − kλ, t]. Based on the above assumptions, the average number of fair ratings submitted within W would be equal to k. To ensure accurate reputation estimates, the width of the time window W should be relatively small; therefore k should generally be a small number (say, between 5 and 20). For k << m, we can assume that every rating submitted within W is from a distinct fair rater.

Assume now that unfair raters flood the system with ratings at a frequency much higher than the frequency of fair ratings. If the unfair ratings frequency is high enough, every one of the n unfair raters will have submitted at least one rating within the time window W. As suggested by Zacharia et al., we keep only the last rating sent by each rater. Even using that rule, however, the above scenario would result in an active nearest neighbor set where the fraction of unfair raters is δ = n/(n+k). This expression is independent of how small n is relative to m, and results in δ > 0.5 whenever n > k. For example, for n = 10 and k = 5, δ = 10/(10+5) = 0.67. We therefore see that, for relatively small time windows, even a small (e.g. 5-10) set of colluding buyers can successfully use unfair ratings flooding to dominate the set of ratings used to calculate MREs and completely bias the estimate provided by the system.

The results of this section indicate that even a relatively small number of unfair raters can significantly compromise the reliability of collaborative-filtering-based reputation reporting systems. This requires the development of effective measures for addressing the problem. The next section proposes and analyzes several such measures.
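The flooding arithmetic above is easy to check (the function name is ours):

```python
def flooded_unfair_fraction(n_unfair, k_fair):
    """Fraction of unfair raters in the active rater set when n unfair raters
    flood a time window that contains ratings from about k distinct fair raters."""
    return n_unfair / (n_unfair + k_fair)

print(flooded_unfair_fraction(10, 5))  # 10/(10+5) = 0.666...
```

Note that the result depends only on n and k, not on the (possibly very large) number m of fair raters in the community.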
4. Mechanisms for immunizing online reputation reporting systems against unfair rater behavior

Having recognized the problem of unfair ratings as a real and important one, this section proposes a number of mechanisms for eliminating or significantly reducing its adverse effects on the reliability of online reputation reporting systems.

4.1 Avoiding negative unfair ratings using controlled anonymity

The main argument of this section is that the anonymity regime of an online community can influence the kinds of reputation system attacks that are possible. A slightly surprising result is the realization that a fully transparent marketplace, where everybody knows everybody else's true identity, incurs more dangers of reputation system fraud than a marketplace where the true identities of traders are carefully concealed from each other but are known to a trusted third entity (usually the market-maker).

Bad-mouthing and negative discrimination are based on the ability to pick a few specific "victims" and give them unfairly poor ratings, or provide them with poor service, respectively. Usually, victims are selected on the basis of some real-life attributes of their associated principal entities (for example, because they are our competitors, or because of religious or racial prejudices). This adverse selection process can be avoided if the community conceals the true identities of the buyers and sellers from each other. In such a "controlled anonymity" scheme, the marketplace knows the true identity of market participants by applying some effective authentication process before it allows access to any agent (Hutt et al. 1995). In addition, it keeps track of all transactions and publishes the estimated reputation of buyers and sellers, but keeps their identities concealed from each other (or assigns pseudonyms which change from one transaction to the next, in order to make identity detection very difficult). In that way, buyers and sellers make their decisions solely on the basis of the offered terms of trade as well as the published reputations.
Because they can no longer identify their "victims", bad-mouthing and negative discrimination can be avoided.

It is interesting to observe that, while in most cases the anonymity of online communities has been viewed as a source of additional risks (Kollock 1999; Friedman and Resnick 1999), here we have an example of a situation where some controlled degree of anonymity can be used to eliminate some transaction risks.

Concealing the identities of buyers and sellers is not possible in all domains. For example, concealing the identities of sellers is not possible in restaurant and hotel ratings (although concealing the identity of buyers is). In other domains, it may require the creative intervention of the marketplace. For example, in a marketplace of electronic component distributors, it may require the marketplace to act as an intermediary shipping hub that will help erase information about the seller's address.

If concealing the identities of both parties from each other is not possible, then it may still be useful to conceal the identity of one party only. More specifically, concealing the identity of buyers but not sellers avoids negative discrimination against hand-picked buyers, but does not avoid bad-mouthing of hand-picked sellers. In an analogous manner, concealing the identity of sellers but not buyers avoids bad-mouthing but not negative discrimination. These results are summarized in Figure 2.

Generally speaking, concealing the identities of buyers is usually easier than concealing the identities of sellers (a similar point is made in Cranor and Resnick 1999). This means that negative discrimination is easier to avoid than bad-mouthing. Furthermore, concealing the identities of sellers before a service is performed is usually easier than afterwards.
In domains with this property, controlled anonymity can be used at the seller selection stage in order to, at least, protect sellers from being intentionally picked for subsequent bad-mouthing. For example, in the above-mentioned marketplace of electronic component distributors, one could conceal the identities of sellers until after the closing of a deal. Assuming that the number of distributors for a given component type is relatively large, this strategy would make it difficult, or impossible, for malevolent buyers to intentionally pick specific distributors for subsequent bad-mouthing.

Figure 2. Anonymity regimes and the types of unfair behavior that each one avoids.

4.2 Reducing the effect of unfair ratings using median filtering

The sample median of a set of n ordered observations Y_1 ≤ Y_2 ≤ ... ≤ Y_n is Y_k, where k = (n+1)/2 when n is odd. When n is even, the median is considered to be any value between the two middle observations Y_k and Y_(k+1), where k = n/2, although it is most often taken to be their average. It is well known (Hojo, 1931) that, as the size of the sample increases, the median of a sample drawn from a normal distribution converges rapidly to a normal distribution with mean equal to the median of the parent distribution. In normal distributions, the median is equal to the mean. Therefore, in situations where there are no unfair raters (i.e. when δ = 0), the use of the sample median as the MRE results in unbiased estimates:

E[R*_b,fair] = μ   (7)

Let us now assume that unfair raters know that MREs are based on the sample median. They will strategically try to introduce unfair ratings whose values will maximize the absolute bias between the sample median of the fair set and the sample median of the contaminated set. More specifically, "ballot stuffers" will try to maximize that bias, while "bad-mouthers" will try to minimize it. In the following analysis, we consider the case of ballot stuffing. The case of bad-mouthing is symmetric, with the signs reversed.
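A tiny numeric example illustrates the median's robustness to ballot stuffing (the ratings are made up for illustration):

```python
from statistics import mean, median

fair = [4.5, 5.0, 5.5, 6.0, 6.0, 6.5, 7.0, 7.5]   # fair ratings; mean and median 6.0
stuffed = fair + [9.0, 9.0]                        # 20% ballot stuffing at R_max = 9

print(round(mean(stuffed) - mean(fair), 2))    # 0.6 = delta*(R_max - mu), formula (6)
print(median(stuffed) - median(fair))          # 0.25: the median moves far less
```

The mean absorbs the full bias of formula (6), while the median only shifts by a few order-statistic positions within the fair sample.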
Assume that the nearest neighbor set consists of n ratings, of which (1−δ)n are fair and δn are unfair, where 0 < δ < 0.5. The most disruptive unfair ratings strategy, in terms of influencing the sample median, is one where all the unfair ratings are higher than the sample median of the fair ratings. In that case, the sample median of the contaminated set will be identical to the k-th order statistic of the fair set, where k = (n+1)/2. It has been shown (Cadwell 1952) that, as the size n of a sample drawn from a normal distribution N(μ, σ) increases, its k-th order statistic converges rapidly to a normal distribution with mean equal to the q-th quantile of the parent distribution, where q = k/n. Therefore, for large samples and under the most disruptive unfair ratings strategy, the sample median of the contaminated set will converge to the value x_q defined by:

Pr[R ≤ x_q] = q,  that is,  x_q = Φ^-1(q)·σ + μ   (8)

where

q = k/((1−δ)n) = (1 + 1/n)/(2(1−δ))   (9)

and Φ^-1 is the inverse standard normal CDF. Given that R*_b,fair = μ, the asymptotic formula for the average reputation bias achievable by δ·100% unfair ratings, when fair ratings are drawn from a normal distribution N(μ, σ) and unfair ratings follow the most disruptive possible unfair ratings strategy, is:

E[B_max] = E[R*_b,actual − R*_b,fair] = σ·Φ^-1(1/(2(1−δ)))   (10)
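Formula (10) is easy to evaluate numerically; Python's standard library provides the inverse normal CDF (the helper name is ours):

```python
from statistics import NormalDist

def max_median_bias(delta, sigma):
    """Asymptotic worst-case bias of a median-based MRE, formula (10):
    sigma * Phi^-1(1 / (2 * (1 - delta)))."""
    q = 1.0 / (2.0 * (1.0 - delta))
    return sigma * NormalDist().inv_cdf(q)

# With sigma = 0.9 (10% of a 0-9 range), even 30% unfair ratings keep the
# bias around half a point, i.e. near the 5%-of-range threshold.
print(max_median_bias(0.30, 0.9))
```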
Figure 3 shows E[B_max] for various values of δ and σ. As before, we assume that ratings range from 0 to 9. Given that all ratings in the nearest neighbor set correspond to users in the same taste cluster, it is expected that the standard deviation of the fair ratings will be relatively small; we therefore did not consider standard deviations higher than 10% of the rating range. As before, we have highlighted biases of 5% of the rating range or more. Figure 3 clearly shows that the use of the sample median as the basis of calculating MREs manages to reduce the maximum average bias to below 5% of the rating range for unfair rater ratios of up to 30-40% and a wide range of fair rating standard deviations.

Figure 3. Maximum average MRE bias E[B_max] when the sample median is used, for various percentages of unfair ratings and fair rating standard deviations.

4.3 Using frequency filtering to eliminate unfair ratings flooding

Formulas (6) and (10) confirm the intuitive fact that the reputation bias due to unfair ratings increases with δ, the ratio of unfair raters in a given sample. In settings where a seller's quality attributes can vary over time (most realistic settings), the calculation of reputation should be based on recent ratings only, using time discounting or a time-window approach. In those cases, Section 3 demonstrated that, by "flooding" the system with ratings, a relatively small number of unfair raters can manage to increase the ratio δ of unfair ratings in any given time window above 50% and completely compromise the reliability of the system. This section proposes an approach for effectively immunizing a reputation reporting system against unfair ratings flooding. The main idea is to filter raters in the nearest neighbor set based on their ratings submission frequency.
Description of frequency filtering

Frequency filtering depends on estimating the average frequency of ratings submitted by each buyer for a given seller. Since this frequency is a time-varying quantity (sellers can become more or less popular with the passage of time), it too needs to be estimated using a time-window approach. More precisely:

Step 1: Calculate the set F^s(t) of buyer-specific average ratings submission frequencies for each buyer b that has submitted ratings for seller s during the ratings submission frequency calculation time window W_f = [t − E, t]:

f_b^s(t) = (number of ratings submitted by b for s during W_f) / E   (11)

Set the cutoff frequency f_c^s(t) to be equal to the k-th order statistic of the set F^s(t), where k = (1−D)·n, n is the total number of elements of F^s(t), and D is a conservative estimate of the fraction of unfair raters present in the total buyer population. For example, if F^s(t) contains average ratings submission frequencies from n = 100 buyers, and D = 0.1 (i.e. we assume that no more than 10% of the buyer population is unfair), then the cutoff frequency would be equal to the 90th smallest frequency (the 10th largest frequency). The width E of the ratings submission frequency calculation time window should be large enough to contain at least a few ratings from all buyers for a given seller.
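Step 1 can be sketched as follows (function and variable names are ours; frequencies are ratings per unit time):

```python
def cutoff_frequency(frequencies, d=0.1):
    """Step 1: the cutoff is the (1-D)*n-th order statistic of the observed
    per-buyer ratings submission frequencies, where D is a conservative
    estimate of the fraction of unfair raters in the buyer population."""
    ordered = sorted(frequencies)
    k = int((1.0 - d) * len(ordered))   # k-th smallest value, k = (1-D)*n
    return ordered[k - 1]

freqs = [0.1 * i for i in range(1, 101)]   # 100 buyers: 0.1, 0.2, ..., 10.0
print(cutoff_frequency(freqs, d=0.1))      # the 90th smallest frequency
```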
Step 2: During the calculation of an MRE for seller s, eliminate all raters b in the nearest neighbor set for whom f_b^s > f_c^s. In other words, eliminate all buyers whose average ratings submission frequency for s lies above the cutoff frequency.

Analysis of frequency filtering

We will show that frequency filtering provides effective protection against unfair ratings flooding by guaranteeing that the ratio of unfair raters in the MRE calculation set cannot be more than twice as large as the ratio of unfair raters in the total buyer population. As before, we will assume that the entire buyer population is n, that the unfair raters number δn << n, and that the width of the reputation estimation time window W is relatively small (so that each rating within W typically comes from a different rater).
Then, after applying frequency filtering to the nearest neighbor set of raters in a typical time window W, we expect to find (1−δ)·n·W·∫ u·φ(u) du fair ratings (the integral taken from 0 to f_c), where φ(u) is the probability density function of fair raters' submission frequencies and f_c is the cutoff frequency, and at most δ·n·a·W·f_c unfair ratings, where 0 < a ≤ 1 is the fraction of unfair raters with frequencies below the cutoff. Therefore, the unfair/fair ratings ratio in the final set would be equal to:

δ′/(1−δ′) = (δ/(1−δ))·I,  where  I = a·f_c / ∫ u·φ(u) du   (12)

denotes the inflation of the unfair/fair ratings ratio in the final set relative to its value in the original set. The goal of unfair raters is to strategically distribute their ratings frequencies above and below the cutoff frequency in order to maximize I. In contrast, the goal of the market designer is to pick the cutoff frequency f_c so as to minimize I. The cutoff frequency has been defined as the (1−D)·n-th order statistic of the sample of buyer frequencies, where D ≥ δ. For relatively large samples, f_c converges to the q-th quantile of the fair rating frequencies distribution, where q satisfies:

(1−D)·n = q·(1−δ)·n + a·δ·n  ⇒  q = (1 − D − aδ)/(1−δ)   (13)

An exact analysis from this point on requires some assumptions about the probability density function of fair ratings frequencies. We start by assuming a uniform distribution between F_min and F_max, with spread S = F_max − F_min. Then f_c = F_min + qS, the integral in (12) equals q·(F_min + f_c)/2, and:

I = 2a·(F_min + qS) / (q·(2F_min + qS))   (14)

From (13) and (14), after some algebraic manipulation, we find that ∂I/∂a > 0 and ∂I/∂D > 0. This means that unfair raters will keep all of their frequencies just below the cutoff (a = 1), while market makers will want to set D as close as possible to an accurate estimate of the ratio of unfair raters in the total population. Therefore, at equilibrium, a = 1 and D = δ, so that q = (1−2δ)/(1−δ). The resulting value of I depends on the spread S of fair rating frequencies; at the limiting cases we get:

lim (S→0) I = 1/q  and  lim (S→∞) I = 2/q   (15)

By substituting the above limiting values of I in equation (12), we get the final formula for the equilibrium relationship between δ, the ratio of unfair raters in the total population of buyers, and δ′, the final ratio of unfair ratings in the nearest neighbor set using time windowing and frequency filtering:

δ/(1−δ) ≤ δ′ ≤ 2δ   (16)

Equation (16) shows that, no matter how hard unfair raters may try to "flood" the system with ratings, the presence of frequency filtering guarantees that they cannot inflate their presence in the final MRE calculation set by more than a factor of two. This concludes the proof. In most online communities, the exact ratio δ of unfair raters will not be known. In such cases, if we have a belief that δ < 0.1, then setting D = 0.1 has been experimentally proven to result in inflation ratios which also fall within the bounds of equation (16).

A more realistic assumption about fair rating frequencies is that they follow a lognormal distribution with mean and variance related to the frequency spread S. This assumption is consistent with the findings of researchers in marketing (Lawrence 1980). In this case, the expression for the final ratio δ′ cannot be given in closed form.
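Under the uniform-frequency assumption, the equilibrium bound of equation (16) can be checked numerically (a sketch with our own function name; a = 1 and D = δ are the equilibrium values derived above):

```python
def filtered_unfair_ratio(delta, f_min, f_max):
    """Equilibrium fraction of unfair ratings after frequency filtering,
    assuming uniformly distributed fair rating frequencies on [f_min, f_max]
    and the equilibrium values a = 1, D = delta."""
    s = f_max - f_min
    q = (1 - 2 * delta) / (1 - delta)        # quantile of the cutoff, eq. (13)
    f_cut = f_min + q * s                    # cutoff frequency
    fair_mass = q * (f_min + f_cut) / 2      # integral of u*phi(u) up to f_cut
    inflation = f_cut / fair_mass            # I, eq. (14) with a = 1
    odds = inflation * delta / (1 - delta)   # delta'/(1 - delta'), eq. (12)
    return odds / (1 + odds)

delta = 0.1
for spread in (0.001, 1.0, 1000.0):
    d_final = filtered_unfair_ratio(delta, f_min=1.0, f_max=1.0 + spread)
    assert delta / (1 - delta) <= d_final <= 2 * delta   # the bound of eq. (16)
```

Small and large frequency spreads approach the lower and upper bounds of (16), respectively.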
However, a numerical solution yields results which approximate very closely those obtained analytically for uniformly distributed fair rating frequencies (Figure 4).

Figure 4. Maximum unfair ratings inflation factor I as a function of the frequency spread, for uniformly and lognormally distributed fair rating frequencies (D = 0.1).

Given that median filtering guarantees reputation biases of less than 5% of the ratings scale (e.g. less than ±0.5 points when ratings range from 0 to 9) for contamination ratios of up to 30-40%, and that frequency filtering guarantees that unfair raters cannot use flooding to inflate their presence by more than a factor of two, the combination of frequency filtering and median filtering guarantees reputation biases of less than 5% when the ratio of unfair raters is up to 15-20% of the total buyer population.

One possible criticism of the frequency filtering approach is that it potentially eliminates those fair buyers who transact most frequently with a given seller. In fact, in the absence of unfair raters, all raters filtered out based on their high ratings submission frequency would be fair raters. Nevertheless, we do not believe that this property constitutes a weakness of the approach. We argue that the "best customers" of a given seller often receive preferential treatment, which is in a way a form of positive discrimination on behalf of the seller. Therefore, we believe that the potential elimination of such raters from the final reputation estimate in fact benefits the construction of more unbiased estimates for the benefit of first-time prospective buyers.

4.4 Issues in communities where buyer identity is not authenticated

The effectiveness of frequency filtering relies on the assumption that a given principal entity can only have one buyer agent acting on its behalf in a given marketplace.
The technique is also valid in situations where principal entities have multiple buyer agents with authenticated identifiers. In that case, frequency filtering works if we consider all agents of a given principal entity as a single buyer for frequency filtering purposes.

In non-authenticated online communities (communities where "pseudonyms" are "cheap", to use the term of Friedman and Resnick) with time-windowed reputation estimation, unfair buyers can still manage to "flood" the system with unfair ratings by creating a large number of pseudonymously known buyer agents acting on their behalf. In that case, the total ratio δ of unfair agents relative to the entire buyer population can be made arbitrarily high. Even in the presence of frequency filtering, unfair raters can then still manage to severely contaminate a seller's reputation, because the guarantee δ′ ≤ 2δ offers no protection when δ itself can be made arbitrarily high.

Further research is needed in order to develop immunization techniques that are effective in communities where the "true" identity of buyer agents cannot be authenticated. In the meantime, the observations of this section make a strong argument for using some reasonably effective authentication regime for buyers (for example, requiring that all newly registering buyers supply a valid credit card for authentication purposes) in all online communities where trust is based on reputational information.

5. Conclusions and Management Implications

We began this paper by arguing that managers of online marketplaces should pay special attention to the design of effective trust management mechanisms that will help guarantee the stability, longevity, and growth of their respective communities.
We pointed out some of the challenges of producing trust in online environments and argued that online reputation reporting systems, an emerging class of information systems, hold the potential of becoming an effective, scalable, and relatively low-cost approach for achieving this goal, especially when the set of buyers and sellers is large and volatile. Understanding the proper implementation, usage, and limitations of such systems (in order to eventually overcome them) is therefore important, both for the managers as well as for the participants of online communities. This paper has contributed in this direction, first, by analyzing the reliability of current-generation reputation reporting systems in the presence of buyers who intentionally give unfair ratings to sellers and, second, by proposing and evaluating a set of "immunization mechanisms" which eliminate or significantly reduce the undesirable effects of such fraudulent behavior.

In Section 3, the motivations for submitting unfair ratings were discussed, and the effects of such ratings on biasing a reputation reporting system's mean reputation estimate of a seller were analyzed. We concluded that reputation estimation methods based on the ratings mean, which are commonly used in commercial recommender systems, are particularly vulnerable to unfair rating attacks, especially in contexts where a seller's reputation may vary over time. We hope that the techniques proposed by this work will provide a useful basis that will stimulate further research in the important and promising field of online reputation reporting systems.

References

Arrow, Kenneth (1963). Social Choice and Individual Values. Yale University Press.

Bakos, Y. (1997). Reducing Buyer Search Costs: Implications for Electronic Marketplaces. Management Science, Vol. 43 (12), December 1997.
Bakos, Y. (1998). Towards Friction-Free Markets: The Emerging Role of Electronic Marketplaces on the Internet. Communications of the ACM, Vol. 41 (8), August 1998, pp. 35-42.

Billsus, D. and Pazzani, M.J. (1998). Learning collaborative information filters. In Proceedings of the 15th International Conference on Machine Learning, July 1998, pp. 46-54.

Boon, Susan D., and Holmes, John G. (1991). The dynamics of interpersonal trust: resolving uncertainty in the face of risk. Pages 190-211 of Hinde, Robert A., and Groebel, Jo (eds), Cooperation and Prosocial Behaviour. Cambridge University Press.

Breese, J.S., Heckerman, D., and Kadie, C. (1998). Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pp. 43-52, San Francisco, July 24-26, 1998.

Cadwell, J.H. (1952). The distribution of quantiles of small samples. Biometrika, Vol. 39, pp. 207-211.

Cranor, L.F. and Resnick, P. (2000). Protocols for Automated Negotiations with Buyer Anonymity and Seller Reputations. To appear in Netnomics.

Dellarocas, C. (2000). The Design of Reliable Trust Management Systems for Online Trading Communities. Working Paper, available from http://ccs.mit.edu/dell/trustmgt.pdf

Dunn, John (1984). The concept of 'trust' in the politics of John Locke. Chap. 12, pages 279-301 of Rorty, Richard, Schneewind, J.B., and Skinner, Quentin (eds), Philosophy in History. Cambridge University Press.

Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., eds. (1996). Advances in Knowledge Discovery and Data Mining. MIT Press, Cambridge, Mass.

Friedman, E.J. and Resnick, P. (1999). The Social Cost of Cheap Pseudonyms. Working paper. An earlier version was presented at the Telecommunications Policy Research Conference, Washington, DC, October 1998.

Gambetta, Diego (ed). (1990). Trust. Oxford: Basil Blackwell.

Gordon, A.D. (1999). Classification. Boca Raton: Chapman & Hall/CRC.
Goldberg, D., Nichols, D., Oki, B.M., and Terry, D. (1992). Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM, Vol. 35 (12), pp. 61-70, December 1992.

Hojo, T. (1931). Distribution of the median, quartiles and interquartile distance in samples from a normal population. Biometrika, Vol. 23, pp. 315-360.

Huber, Peter (1981). Robust Statistics. Wiley, New York.

Hutt, A.E., Bosworth, S., and Hoyt, D.B., eds. (1995). Computer Security Handbook (3rd edition). Wiley, New York.

Jain, A.K., Murty, M.N., and Flynn, P.J. (1999). Data clustering: a review. ACM Computing Surveys, Vol. 31 (3), September 1999, pages 264-323.

Johnson, D.R. and Post, D.G. (1996). Law and Borders - The Rise of Law in Cyberspace. Stanford Law Review, Vol. 48.

Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Kollock, P. (1999). The Production of Trust in Online Markets. In Lawler, E.J., Macy, M., Thyne, S., and Walker, H.A., eds., Advances in Group Processes (Vol. 16). Greenwich, CT: JAI Press.

Lawrence, R.J. (1980). The Lognormal Distribution of Buying Frequency Rates. Journal of Marketing Research, Vol. XVII, May 1980, pp. 212-226.

Malinvaud, E. (1966). Statistical Methods of Econometrics. Paris: North Holland.

Pindyck, R. and Rubinfeld, D.L. (1981). Econometric Models and Economic Forecasts (2nd Edition). McGraw-Hill, New York.

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. (1994). GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pp. 175-186. New York, NY: ACM Press.

Resnick, P. and Varian, H.R. (1997). Recommender Systems. Communications of the ACM, Vol. 40 (3), pp. 56-58.

Sarwar, B.M., Karypis, G., Konstan, J.A., and Riedl, J. (2000). Application of Dimensionality Reduction in Recommender System - A Case Study. In ACM WebKDD 2000 Web Mining for E-Commerce Workshop.
Schmalensee, R. (1978). Advertising and Product Quality. Journal of Political Economy, Vol. 86, pp. 485-503.

Sen, A. (1986). Social choice theory. In Handbook of Mathematical Economics, Volume 3. Elsevier Science Publishers.

Schafer, J.B., Konstan, J., and Riedl, J. (2001). Electronic Commerce Recommender Applications. Journal of Data Mining and Knowledge Discovery. January 2001 (expected).

Shapiro, C. (1982). Consumer Information, Product Quality, and Seller Reputation. Bell Journal of Economics, Vol. 13 (1), pp. 20-35, Spring 1982.

Shardanand, U. and Maes, P. (1995). Social information filtering: Algorithms for automating "word of mouth". In Proceedings of the Conference on Human Factors in Computing Systems (CHI95), Denver, CO, pp. 210-217.

Smallwood, D. and Conlisk, J. (1979). Product Quality in Markets Where Consumers Are Imperfectly Informed. Quarterly Journal of Economics, Vol. 93, pp. 1-23.

Wilson, Robert (1985). Reputations in Games and Markets. In Game-Theoretic Models of Bargaining, edited by Alvin Roth. Cambridge University Press, pp. 27-62.

Yahalom, R., Klein, B., and Beth, T. (1993). Trust Relationships in Secure Systems - A Distributed Authentication Perspective. In Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, 1993.

Zacharia, G., Moukas, A., and Maes, P. (1999). Collaborative Reputation Mechanisms in Online Marketplaces. In Proceedings of the 32nd Hawaii International Conference on System Sciences (HICSS-32), Maui, Hawaii, January 1999.

Footnotes

1. Jonathan Lebed, a 15-year-old boy, was sued by the SEC for buying large blocks of inexpensive, thinly traded stocks, posting false messages promoting the stocks on Internet message boards, and then dumping the stocks after prices rose, partly as a result of his messages. The boy allegedly earned more than a quarter-million dollars in less than six months and settled the lawsuit on September 20, 2000 for $285,000 (Source: Associated Press).