Does Cumulative Advantage Increase Inequality in the Distribution of Success? Evidence from a Crowdfunding Experiment ∗ Rembrand Koning Stanford GSB Jacob Model Stanford GSB September 29, 2015 Abstract The diffusion of online marketplaces has increased our access to “social information” – records of past behavior and opinions of consumers. One concern with this development is that social information may create cumulative advantage dynamics that distort marketplaces by increasing inequality in the distribution of success. Critically, this argument assumes that products that are likely to succeed disproportionately benefit from social information. We challenge this assumption and argue that cumulative advantage processes can aggregate to have nearly any effect on distribution of success, even decreasing the level of inequality in some cases. We assess these claims using archival data and a field experiment in a crowdfunding marketplace. Consistent with prior work, randomized changes to social information generate cumulative advantage. However, our treatments did not change the distribution of success. Products benefited equally from our treatments regardless of their predicted likelihood of success. Our treatments still affected marketplace dynamics by weakening the relationship between predicted and achieved success. ∗ Both authors contributed equally, order is alphabetical. Special thanks to DonorsChoose for making this research possible. This work has benefited from feedback provided by participants at CAOSS, AOM, and ASA. Generous funding was provided by Stanford’s Center on Philanthropy and Civil Society and Center for Social Innovation. 1 1 Introduction One of the seminal insights of social science is that social information, past behavior and opinions of others, powerfully shapes our attitudes, beliefs and actions. The rise of online platforms has made social information in the form of popularity, ratings and reviews increasingly accessible. However, concomitant with this profusion of social information, there is a growing controversy as to whether it facilitates or distorts marketplaces (Zuckerman, 2012; Muchnik, Aral and Taylor, 2013). On the one hand, such information may provide informative signals about difficult to observe aspects of products and services (Zhang and Liu, 2012). On the other hand, the use of social information is subject to many well-documented biases that may undermine or even pervert its potential signaling value (Cialdini, 1993; Simonsohn and Ariely, 2008). Small and arbitrary initial differences in social information may lead to large differences in success due to self-reinforcing “cumulative advantage” dynamics (Banerjee, 1992; DiPrete and Eirich, 2006). Indeed, a large body of archival (Simonsohn and Ariely, 2008; Chen, Wang and Xie, 2011; Burtch, Ghose and Wattal, 2013) and experimental (Salganik, Dodds and Watts, 2006; Tucker and Zhang, 2011; Muchnik, Aral and Taylor, 2013; van de Rijt et al., 2014) research has found evidence that changes in social information lead to cumulative advantage processes. Scholars have used this evidence to argue that social information in marketplaces inherently magnifies inequality in the distribution of success (Salganik, Dodds and Watts, 2006; Tucker and Zhang, 2011; Muchnik, Aral and Taylor, 2013; van de Rijt et al., 2014). Prior to any differences in social information, products in a marketplace already differ in their expected success. For instance, we take for granted that some products auctioned on eBay are more or less likely to reach their reserve price based on factors such as appearance and the reputation of the seller. Past research has largely sidestepped these incoming differences by employing sophisticated statistical controls or using randomization to account for pre-existing differences (cf. Salganik, Dodds and 2 Watts, 2006). However, in order to understand how social information changes the distribution of success in marketplaces, we need to know how incoming differences in expected success interact with changes in social information. Consider the case of two new books being sold in an online marketplace such as Amazon. One that is from a first-time author and one that is from a well-known author. A positive change in social information to both books, such as a good review, will create a wider gap in sales if the well-known author benefits more from the review. Alternatively, the same good review may plausibly benefit the first-time author more than a well-known author as the latter already has some established reputation and audience (Kovács and Sharkey, 2014). In this case, the change in social information will reduce the gap in success between these two books. This example illustrates how the direction of the effect of cumulative advantage on inequality is not obvious (Allison, Long and Krauze, 1982). Without an understanding of how cumulative advantage processes interact with a product’s expected success, we can say little about the effects of social information on inequality of success in a marketplace. In this paper, we investigate how changes to social information interact with a product’s expected success and explore the implications of these interactions for the distribution of success in online marketplaces. Specifically, we couple archival data from a online fundraising marketplace with an field experiment to examine the interaction between a product’s expected success and cumulative advantage processes. The marketplace, DonorsChoose, is a two-sided marketplace that helps public school teachers find donors willing to fund school supplies. DonorsChoose facilitates this by enabling teachers to post fundraising appeals (called “projects”) and by providing modern search tools to donors to find projects that appeal to them. One significant challenge is that the expected success of a product is typically not directly observable in field settings. We build on the approach taken in Salganik, Dodds and Watts’s (2006) seminal study of online culture markets in which Salganik 3 and colleagues created multiple online markets with the identical set of products. The authors experimentally manipulated the presence or absence of social information (in their context, popularity) and tracked the success of products in the marketplace. In their design, the success of products in marketplaces without social information served as a direct measure of expected success in the absence of social information. Although we cannot observe the same DonorsChoose project in multiple conditions, we take a conceptually similar approach to Salganik and colleagues by creating a counterfactual measure of expected success for projects that received our experimental treatments. We estimate expected success by generating predictions for each project using a Random Forest algorithm trained on the characteristics and fundraising outcomes of over 400 thousand past projects. With this measure of expected success, we then randomly contributed $5 or $40 to 320 projects and tracked their progress towards their fundraising goals. This randomization enables us to test whether exogenous changes in social information lead to cumulative advantage dynamics. Critically, we test whether these dynamics disproportionately benefit projects that were (un)likely to succeed by interacting our randomized treatments with our measure of expected success. This approach enables us to assess how arbitrary changes to social information change the distribution of funding outcomes. Consistent with prior work on social information in online marketplaces, we find that our randomly assigned contributions result in cumulative advantages that increase the probability that a project reaches its funding goal. Surprisingly, this effect is remarkable homogeneous across projects with different levels of predicted success; projects with characteristics that made them very likely to succeed benefited just as much as projects with characteristics that made them unlikely to succeed. In this way, our intervention introduced more noise into the fundraising process by reducing the correlation between a project’s expected and realized success. The utility of this noise depends on the goals of the marketplace. Increasing the unpredictability of success 4 reduces the chance that projects with the most popular characteristics succeed. However, a little noise also creates greater opportunities for novel, innovative and risky projects to reach their funding goals. 2 Social Influence, Cumulative Advantage and Inequality Cumulative advantage processes are considered a key driver of inequalities in society and markets (DiPrete and Eirich, 2006). Initial advantages become magnified through self-reinforcing dynamics resulting in a distribution of success that is highly unequal. Scholars have suggested two mechanisms that produce cumulative advantage processes: resource accumulation and information cascades. In the first case, small initial differences in the distribution of resources become magnified as those with initial resource advantages can channel these into additional resources and skills (Merton, 1968). An example of this is Merton’s pioneering work on cumulative advantage in academic research. This work takes a broad view of resources to include status garnered through awards. Award winners leverage their increased status to get additional funding for their work and so attract more talented students and collaborators (Merton, 1968). In this sense, Merton’s mechanism is one of resource accumulation —award winners can use their initial advantage to improve the quality of their work. Even if the initial allocation of the award were arbitrary, the emergent performance differences are merited. An alternative mechanism is information cascades (Banerjee, 1992). In these models, the assumption is that the quality of a product or service is difficult to observe directly. Consumers must rely on other observable signals such as social information that have some correlation, however weak, with quality (Podolny, 2005). For products and services, common examples of social information include information on popular- 5 ity, endorsements or reviews. This information tends to be self-reinforcing; a marginal increase in the popularity or number of endorsements a product receives makes it more attractive to subsequent consumers and reviewers. The core difference between Merton’s cumulative advantage mechanism and the information cascade model is that in the latter, success can become decoupled from a product’s underlying quality. The primary concern regarding cumulative advantage in Merton’s example of resource accumulation is that arbitrary and negligible initial differences allow certain scientists to receive a disproportionate amount of success. This resulting inequality reflects real differences in quality. That said, the arbitrary processes that lead to the inequality may be deemed unfair. The concern of information cascades scholars is quite different; their concern is that the resulting inequality in success may be largely uncorrelated with the underlying features and quality of a product or service. The expanding role of social information in our everyday lives has increased concerns over the decoupling of social information from underlying quality (Muchnik, Aral and Taylor, 2013; Colombo, Franzoni and Rossi-Lamastra, 2014). For example, business owners can manipulate their online reputations by purchasing and then reviewing their own products, or more nefariously, by paying others to give perfect reviews. This is a commonplace activity; one scholar estimated that upwards of one-third of online reviews are fake.1 Even if they are eventually identified and removed, fake reviews or ratings may influence other consumers, leading them to alter their ratings, reviewing or purchasing behavior. Once a process of cumulative advantage begins, it may be difficult to correct. Even beyond outright fraud, natural variation in marketplace activity may generate cumulative advantage processes. For example, differences in the timing of initial endorsements or purchases, which may be effectively random, could lead to cumulative advantages. These sources of advantage may be even more difficult to address than outright fraud. Ideally, outcomes in marketplaces would be relatively 1 http://www.nytimes.com/2012/08/26/business/book-reviewers-for-hire-meet-a-demand-for-onlineraves.html 6 robust to these perturbations. However, research suggests that some marketplaces may not be especially robust to these effects. In their pioneering studies of online culture markets, Salganik and colleagues examined how such arbitrary differences in social information change a product’s success in a simplified online marketplace created by the researchers (Salganik, Dodds and Watts, 2006; Salganik and Watts, 2008). They randomly assigned participants into marketplaces that either displayed the popularity of a product —in their setting, the number of downloads of a song— or suppressed this information. Product success, as measured by the number of downloads of a song, was more variable in marketplaces with popularity information than those without it. In the popularity conditions, the songs that arbitrarily received a few initial downloads achieved greater market share than the same song in a marketplace without social information. This study demonstrates that markets with social information create cumulative advantages through endogenous processes that disproportionately benefit the most appealing songs. In turn, this increases inequality within the marketplace. In a follow-up study, Salganik and colleagues also show that these endogenous processes are not destiny. In the same marketplace, they directly manipulated the popularity of songs by reversing the endogenous ranking of songs (Salganik and Watts, 2008). The impact of this manipulation of social information varied across products within the same marketplace, as measured by their success in marketplaces without social information. It strongly benefited the songs with the lowest levels of expected success. In contrast, songs with the highest levels of expected success did regain position in the download rankings but only after hundreds of subsequent downloads. The combination of these effects implies that in this second experiment, the social influence manipulation (i.e., the ranking inversion) likely reduced the inequality in popularity. In sum, the results of Salgnik and colleagues’ studies suggest that social information can generate cumulative advantage processes that have the potential to magnify or 7 reduce inequality in success, even within the same marketplace. A flourishing body of research has extended this work by examining whether changes to social information produce cumulative advantage effects in field settings. For example, Zhang and Liu (2012) found evidence in an observational study of a peer-to-peer lending platforms that changes in social information lead to cumulative advantage effects. However, it is difficult to casually identify social influence effects with observation data. Changes to social information are usually endogenous to a product’s expected success; people tend to consume, review or rate products that are likely to be appealing and succeed in the marketplace. The combination of the difficulty of measuring a product’s appeal coupled with selection processes makes it extremely challenging to determine how much success is due to changes in social information. Several recent studies have adopted a field experiment approach and to address these concerns by randomizing changes to social information. This research stream has largely reached similar conclusions (cf. Burtch, Ghose and Wattal, 2013) with studies finding that changes to social information through exogenous positive endorsements or contributions leads to cumulative advantage in a wide variety of settings including wedding services marketplaces (Tucker and Zhang, 2011), online social news aggregators (Muchnik, Aral and Taylor, 2013), online petitions, online reviews, and crowdfunding platforms (van de Rijt et al., 2014). Therefore, we expect as our baseline hypothesis: Hypothesis 1: Exogenous endorsements of or contributions to products produce cumulative advantage effects If changes to social information produce cumulative advantage, what are the implications for the distribution of success in markets? Scholars have tended to erroneously equate evidence of cumulative advantage with evidence for increased inequality of success (van de Rijt et al., 2014). However, the relationship is not deterministic because changes in the inequality of success implies that success of products relative to one 8 another—not just the absolute levels of success of a given product—change. The research design of randomly assigning changes to social information in a market only allows a researcher to estimate changes to absolute levels of success. It does not allow a researcher to draw conclusions on how these effects shape the distribution of success across products. Consider the simplest case where the cumulative advantage effects are identical across all products (e.g., a 10% increase in success rates). This produces a general shift in absolute success rates but relative success is unchanged. However, when cumulative advantage effects systematically vary across different types of products, the implications are more complex. If products with high levels of expected success benefit more from changes to social information than products with low levels of expected success, this would increase the inequality of success. Conversely, inequality in the distribution of success may decrease if products with low levels of expected success benefit more from changes to social information than products with high levels of expected success. Of course more complex relationships are possible, such as products with high and low levels of expected success benefiting more than those with average levels of expected success. Overall, these cases illustrate how cumulative advantage effects may aggregate to have nearly any effect—or no effect at all—on the level of inequality in the distribution of success. To illustrate this more concretely, imagine two musicians who are trying to raise similar amounts of money to produce a new album. These musicians decide to list their fundraising efforts on a crowdfunding platform, a popular type of online marketplace. These platforms aggregate individual contributions towards a publicly stated fundraising goal. They also commonly feature social information, typically the number and size of prior contributions. The first musician is an established one and is likely to reach her fundraising goal because she can rely on her established fan base for support. The second musician is an inexperienced musician and is less likely to reach his goal. 9 Imagine that each one of them randomly receives an initial contribution of $50. The question for scholars of cumulative advantage is which album benefits more from this contribution? One possibility suggested by the cumulative advantage literature is a “successbreeds-success” dynamic where the experienced musician disproportionately benefits. Potential contributors may prefer--perhaps because they are risk averse—to fund an album that they perceive as very likely to succeed. In this case, the first contribution will lead to stronger cumulative advantage effects for the established musician that, in turn, will create a wider gap in fundraising success than one would expect without this contribution. An alternative possibility suggested by the research on status and market uncertainty is that the inexperienced musician may disproportionately benefit from the contribution. The rationale is that it is likely that potential contributors have more uncertainty regarding the success of the inexperienced musician relative to the established musician (Podolny, 2005; Azoulay, Stuart and Wang, 2013). If the contribution decreases the level of uncertainty regarding the quality of the inexperienced musician’s fundraising appeal more than the experienced musician, then the contribution would disproportionately help the inexperienced musician. This mechanism would suggest that the first contribution would reduce the gap in realized success relative to expected success. These mechanisms are meant to be illustrative rather than exhaustive. Either one or even a combination of each, is plausible. Moreover, the functional form of the interaction of expected success and the $50 contribution may change the implications for inequality as well. For instance, there could be ceiling or floor effects. If an extremely well-known musician like Paul McCartney were to try to fundraise for an album on a crowdfunding platform, an initial contribution would likely have little effect on the distribution of success. Understanding these considerations is important, but the focus of our study is not to quantify the size or form of these mechanisms. Instead, 10 we are concerned with how the effects produced by changes in social information, regardless of the mechanism, aggregate to affect the distribution of success. While we shown that cumulative advantage may affect the distribution of success in myriad of ways, prior work has suggested that cumulative advantage processes tend to increase inequality in the distribution of success (Salganik, Dodds and Watts, 2006; Tucker and Zhang, 2011; Muchnik, Aral and Taylor, 2013; van de Rijt et al., 2014). Therefore, we expect: Hypothesis 2: Exogenous endorsements of or contributions to products benefit products with high levels of expected success more than products with low levels of expected success We explicitly examine whether cumulative advantage leads to increased inequality by testing if social information has differential benefits across products and services with different levels of expected success. In order to map differential benefits from cumulative advantage to changes in inequality in success, one needs a counterfactual estimate of expected success. Following Salganik and colleagues, we propose that a product’s expected level of success without an exogenous change to social information provides a reasonable way to create a counterfactual (Salganik, Dodds and Watts, 2006). This counterfactual enables us overcome two measurement issues. First, it provides a basis for measuring cumulative advantage. Deviation from expected success can be used to quantify cumulative advantage effects. Second, it enables one to locate products on the distribution of expected success prior to being affected by social information. This placement enables us to draw conclusions on how a change in social information may affect the shape of the distribution of success. In the following section, we describe the philanthropic marketplace we use in order to demonstrate our methodology. 11 3 Study Context Our study takes place on one of the largest and oldest crowdfunding platforms: www.donorschoose.org (referred to as DonorsChoose henceforth). Since its founding in 2003, DonorsChoose has raised over $300 million dollars for classrooms by acting as an online marketplace that connects teachers in need of school supplies with donors interested in supporting public education. Teachers list a project in the marketplace by completing a standardized template which includes the supplies they want for their classroom, the total cost of the supplies and a brief description of how the supplies will be used. The projects range from basic classroom supplies such as textbooks to more novel requests such as educational technology. The majority of funding requests range from a few hundred up to one thousand dollars. Potential donors can search through tens of thousands of projects using standard internet searching features such as keywords and facets. The barriers to entry for support are extremely low: one can contribute little as one dollar. Figure 1 presents a screenshot taken at the time of our experiment of a randomly selected project’s website. DonorsChoose provides very detailed information, most of which they verify through third-party sources. This information includes time-invariant details like the shipping costs, the level of poverty of the school and a text description of the project created by a teacher. It also includes dynamically updated social information such as the number of previous donations, the total amount raised so far and text messages of support from prior donors. When making a donation, donors can either reveal their identity or make an anonymous donation. They are also prompted to write a message for other potential donors to view, though this step is optional. Immediately after a donation, the project page is updated to reflect the donated amount, number of past donors and any new messages of support written by the donor. DonorsChoose operates in a similar manner to other prominent crowdfunding sites such as Kickstarter or IndieGoGo. Projects are successful if they achieve their funding 12 goal. When a project reaches its goal, DonorsChoose ships the supplies directly to the school to be used. Projects cannot receive donations above their fundraising goal and are immediately closed to donations once it reaches its goal. If the project fails to meet its funding goal after five months of listing, DonorsChoose removes the project from the website and donors are refunded the amount they donated to use towards other projects in the marketplace. This marketplace, which over 200,000 teachers and two million donors have participated in, is a prototypical example of a philanthropic crowdfunding platform (Agrawal, Catalini and Goldfarb, 2011; Burtch, Ghose and Wattal, 2013; Meer, 2014). We selected a philanthropic crowdfunding platform for several reasons. First, there are clear metrics of success on the platform: whether a product is funded and how quickly it reaches its fundraising goal. Second, there is variation in outcomes and expected levels of success. Roughly two-thirds of projects are funded in the marketplace and this varies greatly based on geography, subject matter and other project characteristics. Third, social information is very prominent in this marketplace. Information on the number of donors and a progress bar, which displays how much money has already been contributed, is displayed at the top of a project’s website (see Figure 1). Donors to a project also have the option of writing short messages that are listed on the project page. Overall, the platform is designed to communicate social information to inform the decisions of potential donors. Fourth, philanthropic crowdfunding is a place where social information should matter because “quality” of charitable causes is often difficult to directly observe (Hansmann, 1987). In particular, on DonorsChoose the beneficiaries of the donations, the teacher and the students, are typically distinct groups from the donors.2 This makes it difficult to assess the potential use of the pedagogical resources in the classroom 2 DonorsChoose tracks donations made by teachers who posted the projects These are rare events. Through our qualitative examination of donor comments and conversations with DonorsChoose, it is clear that the majority of donors are direct beneficiaries 13 and, once acquired, to monitor their use. The lack of direct observability creates the necessary preconditions for social information to lead to cumulative advantage effects; it is possible for social information to influence evaluations such that there is a decoupling of evaluations from a product’s underlying features. Indeed, a large literature in behavioral economics on matching and seed donations show that initial or matching contributions lead to increased rates of subsequent donations (e.g., List and Reiley, 2002; Karlan and List, 2012). Based on this work and research on crowdfunding platforms (van de Rijt et al., 2014), we expect that cumulative advantage operates in this setting. This is critical to our study because in order for social information to affect the distribution of funding outcomes, it needs to be able to produce cumulative advantage effects. Beyond theoretical considerations, understanding cumulative advantage in the philanthropic domain is important in its own right. Individual philanthropy is economically significant as individuals are estimated to have donated over $250B to charities in 2014.3 According to Gallup surveys, the vast majority of Americans —over 80 % overall and over 95 % of Americans with incomes above 75 thousand dollars— are estimated to donate to charity in a given year.4 Online giving, in particular, is becoming increasingly important. Online donations grew at an estimated 8.9% in 2014 compared to the 2.1% increase of donations overall.5 As online donations and crowdfunding platforms become an increasingly important way for charities to fundraise, it is critical to understand how social information shapes these decisions and affects the fundraising process. In addition, our particular setting is well-suited for our approach. DonorsChoose has collected extremely detailed data on all the donation activity on its website since its inception. They have the exact time (up to the nearest second), amount and source for 3 Giving USA 2015 http://www.gallup.com/poll/166250/americans-practice-charitable-giving-volunteerism.aspx 5 Blackbaud Charitable Giving Report: How Nonprofit Fundraising Performed in 2014 4 14 every donation on the website. This data enables us to construct a linked longitudinal set of donations for every project and every donation on DonorsChoose. We leverage this rich historical data to construct accurate predictions on the expected success of a project based on its time-invariant features when it is first listed on the marketplace. While attractive for theoretical reason and empirical reasons, there is a reasonable concern that studying donation decisions may not be generalizable to choice decisions in other marketplaces, such as consumer purchasing decisions. Scholars have suggested that donations are motivated by factors such as social signaling (Bénabou and Tirole, 2006; Karlan and McConnell, 2014) and the intrinsic utility or “warm glow” of the act of donating rather than aspects of the cause (Andreoni, 1990). While these factors certainly affect some donations on this platform, there are several reasons to believe that a large subset of donors on DonorsChoose are basing their decisions on more technical aspects of a project and are making decisions which are more analogous to traditional models of choice. First, one of the main attractions of DonorsChoose as a resource to donors is the ability to search and compare projects through keyword search and faceting (hence the name of the platform). Donors also have many alternatives to support public education such as school fundraisers, parent-teacher organizations, scholarship funds and a plethora of education nonprofits. If a donor wanted to support public eduction for warm glow or social signaling reasons, these are widely available options that do not involve the costs of searching through the platform. Thus, we would expect some degree of sorting amongst donors. Second, empirical analyses of DonorsChoose suggest donors are sensitive to aspects of the projects. For instance, Meer (2014) and our own analysis of DonorsChoose have found that donors are less likely to fund projects that have high levels of so-called “overhead costs” such as sales tax and shipping costs. In addition, Meer (2014) found that projects with higher levels of competition from similar projects are less likely to be funded, suggesting that donors are choosing between projects. These results 15 are consistent with recent research on donations more generally, which suggests that some donors are concerned with how their charitable dollars are being used (Gneezy, Keenan and Gneezy, 2014). In sum, the empirical evidence suggests that donors on DonorsChoose are making decisions using criteria, such as competition and price, that we would expect would affect choice behavior in other marketplaces. 4 Data and Methods Identifying the effect of cumulative advantage is difficult even with the detailed data that DonorsChoose collects. Without some form of randomization, naturally occurring or induced by a third-party, one cannot confidently separate social influence effects from unobserved heterogeneity (Manski, 1993; Shalizi and Thomas, 2011). For example, one project in our data very quickly attracted numerous donors from across the country and was funded withing hours of being listed, far less time than a typical project. It was a request for a set of books from the very popular book series, The Hunger Games. The timing of the project also set it up for success as a popular movie based on the book had been released shortly before the project was listed. As a result, fans of this series from around the country were searching the marketplace for The Hunger Games-related projects to support. Based on our reading of the messages left by the donors, it is likely that most were making their decisions to support the project based on their interest in the series and not based on the activities of prior donors. However, this relationship would be difficult to discern statistically as the flurry of donations would induce a correlation between early donations and eventual fundraising success. This is certainly an extreme example, but it illustrates how difficult it is for an analyst to differentiate social influence from difficult to observe aspects of products. We follow recent studies of social influence and sidestep the problem of unobserved heterogeneity by taking an experimental approach. Specifically, we donated $7,200 16 in $5 and $40 increments to 320 randomly selected projects on DonorsChoose over a 30-day period. We randomized which projects received our anonymous donations to isolate the causal effects of an initial contribution on funding outcomes. We also account for the fact that in this marketplace our donation simultaneously increases the probability of a project being funded by reducing the amount a project needs to reach its funding goal by either five or forty dollars. We control for this “mechanical” reduction by using a non-parametric modeling strategy to isolate the social influence effect. Our experimental approach allows us to test if social influence leads to cumulative advantages. This alone does not inform us on how cumulative advantage process affect the inequality of outcomes. This is because we cannot say if projects with a greater chance of being funded disproportionately benefit from changes to social information. Randomization has washed away such differences. To overcome this hurdle, we construct a counterfactual measure of a project’s expected success in the absence of our randomized donation. The interaction between this predicted level of success and the randomized contribution enables us to assess if the change in unpredictability varies across projects in different parts of the distribution of success. If projects likely to succeed disproportionately benefit then we have evidence for success-breeds-success inequality dynamics. We take advantage of the fine-grained historical data available on DonorsChoose to build a measure of expected success. Specifically, we use data on the characteristics and outcomes of hundreds of thousands of past projects to create an estimate of the probability a project would be funded without our intervention. If our exogenous changes to social information through randomized donations lead to success-breedssuccess dynamics, then projects with a higher predicted probability of success should benefit more than other projects. Conversely, if projects with lower predicted probabilities of success disproportionately benefit, then our randomized donations may actually 17 be reducing inequality in outcomes. The next section describes how we generate these estimates. 4.1 Predicting Success We used data on all projects posted between January 1st, 2005 and August 21st, 2012 to train a prediction algorithm that estimates a newly posted project’s chance of success. We excluded data from before 2005 as the website was significantly smaller and the information concerning each project is much less detailed. This window leaves us with 426,790 projects after excluding a handful of outlier projects that requested either zero or more than ten-thousand dollars. These 426,790 projects comprise our training set. This data provides a large number of project characteristics with which we can predict funding success. For each project, we know the timing of when a project comes online, the sales tax amount, fulfillment costs, vendor shipping charges, number of students who benefit, resource type, subject matter and if the project is eligible for matching funds. For each teacher, we know the teacher’s salutation (a proxy for gender), the grade-level they teach, if the teacher is part of Teach for America, and if the teacher is a New York Teaching Fellow. For each school, we know the location of the school, the school poverty level, if it is a charter school, a year-round school, a magnet school, an New Leaders for New Schools school, and if the school is part of the KIPP system. Each of these variables may independently affect the probability that a project will reach its funding goal. In addition, it is likely that complex functions and combinations of these variables matter as well. For example, on average projects that request iPads may be less successful than projects that request that request textbooks. However, this difference may be even greater for low poverty schools than high-poverty schools. Donors may see technology at wealthier schools as especially unappealing. One might 18 also imagine that teacher characteristics have important interactions. Perhaps gender stereotypes matter and male teachers are more successful at fundraising for sports equipment projects than their female counterparts. Overall, we want to use a prediction method that is able to account for potentially complex interactions between the project, teacher and school characteristics. We account for these potential interactions in a computationally tractable manner using a widely used machine learning algorithm, Random Forests (Breiman, 2001; Friedman, Hastie and Tibshirani, 2008). Random Forests are an extension of Classification and Regression Trees (CART). Unlike standard regression analysis, the CART algorithm automatically determines which covariates to include and how to include them. It does this by scanning over the set of variables and picking the single variable that best predicts the outcome of interest when split into two groups. Within each of these groups, the algorithm again selects the single variable that best subdivides these groups when split. It iterates in this manner until it there are less than a pre-specified number of observations in each group. The tree allows for the modeling of complex interactions since each of the subgroups may split on different variables. For example, the tree may first split projects into high-poverty schools that are likely to be funded and low-poverty schools that are unlikely to be funded. Then the algorithm would try to split each of these groups. For the low-poverty group, the split may be on the fundraising goal, indicating that donors are price sensitive. For the high-poverty group, the next split may be on vendor shipping charges, indicating that donors to this group are averse to funding overhead costs. Despite the theoretical and interpretative clarity of CARTs, they often have poor predictive performance and are unstable. The Random Forests algorithm addresses this issue by averaging the predictions of many CARTs trained on different bootstrapped samples of the data and by considering a randomly selected set of variables at each 19 split in each tree. This second step decorrelates the trees, which results in less variability when making predictions using the entire forest of trees. Random Forests have performed extremely well in benchmark tests of machine learning prediction techniques (Friedman, Hastie and Tibshirani, 2008; James et al., 2013). We use the “randomForest” package in R to build a predictive model of project success and treat the 426,790 projects described above as our training set. In the training sample, 65.5% of projects reach their funding goal. Therefore, a naive algorithm that always predicts every project ends up funded would yield a classification error rate of 34.5%. We improve on this error rate by fitting a Random Forest using 28 variables (see Figure 2 for the complete list). Instead of including cities or zip codes for each school, we include the longitude and latitude of the school, allowing the Random Forest to inductively partition geographical variation. After an initial exploratory analysis, we found that building a forest with 250 trees and considering five randomly selected variables at each split yielded the greatest predictive performance. The performance of the model is evaluated using the out-of-bag error rate, a procedure conceptually similar to cross-validation. The final error rate is 24.2%, a 30% decrease from naively guessing that every project is funded. The model exhibits few false negatives. The Random Forest incorrectly classifies projects that are funded as unfunded only 8.7% of the time. In contrast, for projects that do not reach their funding goal, the algorithm is only correct 53.6% of the time. In sum, the Random Forest greatly improves upon the naive prediction and does an especially good job at predicting which projects are likely to be winners. We also assessed the performance of the algorithm against our qualitative understanding of what factors matter on the platform. We first examined which variables the algorithm determined were most important when predicting whether a project would be funded. Figure 2 shows the importance of each variable in fitting the Random Forest, from most important at the top to least important at the bottom. Intuitively, this 20 graphs shows which variables lead to the greatest gains in classification accuracy. While variable importance tends to be biased towards variables that are continuous or have more categories, the plot nonetheless provides a way to check for which variables matters for project success. Consistent with our expectations, the amount requested is the most important variable, followed by date posted, geography, and then the project’s subject. This accords with other analyses of this platform (Meer, 2014) and more generally, with analyses of descriptive statistics of crowdfunding platforms (Mollick, 2014). To further unpack what the algorithm is doing, we randomly selected projects from each decile of predicted funding probability and present the differences and similarities in Table 1. A cursory glance at this table would suggest that larger projects are less likely to get funded. But this relationship is not deterministic. For example, the project in the 5th row only requests $194.02 but the predicted probability of funding is 27.6%. However, the project in the 20th row is the most likely to get funded (95.6%) and requests an additional $201.84 for a total of $395.86. That said, reducing the amount requested for the project in the 1st row from $816.54 to $250 would increase the predicted probability of funding from 5.6% to 48.28%. Moving the date of posting for the project in the 15th row from March to June reduces the funding probability from 77.2% to 66.4%. This difference suggests that it may be harder to raise money in the summer months when most schools are not in session. For the project in the 16th row, removing the secondary focus subject reduces the predicted probability of funding from 97.6% to 66.8%, suggesting that having a secondary focus subject may be beneficial. In contrast, for the project in the 18th row adding a secondary focus subject lowers the predicted probability of funding from 88.8% to 82.4%. It may be that secondary focus subject interacts with other project characteristics in ways that would likely be difficult to capture without the aid of the Random Forest algorithm. 21 4.2 Experimental Design Our experiment made contributions to 320 randomly selected DonorsChoose projects as soon as they were listed in the marketplace: 160 five dollar donations and 160 forty dollar donations. These are the 10th percentile and 65th percentile of donation amounts on the site, respectively. We worked with the Chief Technology Officer of DonorsChoose to create a customized data feed for all newly listed projects in a given day. We only include projects that have not received any prior contribution in the few hours since they went live. Over 99% of projects were included in our feed. We also restricted our sample to projects with a primary or secondary subject as “Literacy,” “Literature & Writing,” or “ESL”. This second selection criterion reduces the natural variation that occurs across categories and increases our statistical power (Gerber and Green, 2012). The selection procedure worked as follows. For each of the twenty days on which we made contributions, we first generated a list of all projects that fit the criteria described above. We avoid altering macro-level market dynamics by only donating to a small number of the listed projects. Donating to a large number of projects in one day might change the probability of funding for both the treated and untreated projects. Therefore, we restrict the number of projects that we contribute to on any given day in order to ensure that we can meet the stable unit treatment value assumption (SUTVA). This assumption would be violated if we dramatically changed the dynamics of the marketplace (Morgan and Winship, 2007). Specifically, we limit the number of projects we donate to at most 16 on any given day, a small fraction of active projects. We donated to projects on twenty days from August 22nd to September 18th, 2012. To maintain strict comparability between our treatment and control groups, we only analyze projects listed on days in which we made a donation. Figure 3 shows the number of projects we donated to, and the total number of projects at risk at receiving our treatment. Our donations were anonymous and all contributions appear visually 22 identical to potential donors to minimize potential donor identity effects (Karlan and List, 2012). We worked with DonorsChoose to ensure that our initial donation did not immediately alter the project’s search rank.6 (Ghose and Yang, 2009; Ghose, Ipeirotis and Li, 2014). We construct a control group out of the 2,651 projects that were at risk for being selected into treatment but were randomly excluded. We cannot include the entire sample of non-treated projects in our models because the number of projects and funding probability vary greatly across days. From our prediction modeling, we know that timing effects are important and thus including the entire sample may lead us to bias our estimates by improperly weighting some days more than others. To address this issue, one can include fixed effects for each day to control for inter-day variation in funding probability or sample projects proportional to the number of projects treated on each day. We chose to randomly sample projects in proportion to the number treated on that day.7 Specifically, we sample 5 control projects per 1 treatment project to maximize our power. We have to drop two days where there are too few potential control projects to sample. This procedure leaves us with 144 projects that receive the $5 treatment, 144 the $40 treatment, and 1,440 randomly selected projects serving as our control group. 5 Results and Analysis We begin our analysis by comparing our treatment and control groups. Table 2 presents summary statistics for the control, $5 treatment, and $40 treatment groups. The vari6 By the design of the search algorithm, the amount contributed does not alter the search rank until the project is nearing its 5 month time frame for funding Even at this point, the primary characteristics used to rank search are the school characteristics, time remaining and amount outstanding to raise. Checking the search order qualitatively during our experiment revealed no difference between our treatment and control groups. Finally, the effects observed from our treatments occur before the algorithm meaningfully alters search results. 7 Results are qualitatively unchanged using the fixed-effect approach. 23 ables “Total Project Amount”, “Number of Students Reached”, and the “Random Forest Predicted Probability of Funding” are time invariant and are measured prior to our treatment donations. In Table 2 the italicized variables “Days to Funding” and “Project is Funded” are measured after our treatment. Correlations between these variables are presented in Table 3. The pre-treatment variables show no statistically significant differences between groups and provide first-order evidence that our randomization procedure was successful. More formally, Table 4 regresses the three pre-treatment variables on our treatment conditions. If our randomizations are unrelated to project characteristics, then it should be the case that the coefficients on our treatment variables should be very close to zero. Indeed, the coefficients are small, and all insignificant. This gives us confidence that our procedure resulted in randomized treatment assignments. The effects of our intervention on the post-treatment variables are presented in Table 2. We find that our randomized donations alter the time to funding and the probability that a project is funded. Projects in the $40 treatment group have a funding rate of 84%, 10% higher than the control group rate of 73%. They also reach their funding goal on average 13 days faster than the 70 days it takes the control group. Unexpectedly, projects in the $5 treatment arm appear to perform somewhat worse than the control being funded at a 72% rate as compared to the control rate of 74%. Moreover, projects that received the $5 donations take on average 9 days longer to be funded than projects in the control group. We examine the hazard of funding by fitting separate Kaplan-Meier survival curves by experimental condition in order to better understand how our treatments altered funding dynamics. This approach also enables us to account for the right-censoring that occurs when DonorsChoose removes projects that have been on the site for 5 months (approximately 150 days). Given enough time, some (or perhaps all) of these projects would be fully funded. Analyzing funding rates allows us to account for this censoring. 24 Figure 4 plots the probability of funding by experimental condition. The x-axis is the number of days since posting. Consistent with Table 2, the $40 treatment appears to have a greater hazard rate of funding than the control group and this difference appears to grow larger with time. This increase is consistent with models of social influence leading to cumulative advantages. The funding probability for projects that received the $5 treatment appears slightly lower, but this decrease does not appear to change with time. While the $40 Kaplan-Meier curve is suggestive of cumulative advantage, it does not take into account that our donation reduces the amount of outstanding money a teacher has to raise to reach their goal, which we refer to as the “amount outstanding”. For instance, projects in the $40 treatment arm have $40 less to raise. Therefore, even if donors were unaffected by our treatment, projects in the $40 treatment arm should still be more successful at reaching their funding goals, ceteras paribus. We account for the reduction in the amount outstanding by controlling for it in a non-parametric way. Specifically, we fit Generalized Additive Models with coefficients for each of our treatments and with a penalized regression spline in the amount outstanding. We set the number and location of knots for the spline by minimizing the error rate using cross-validation. Table 5 present linear probability models of funding and Table 6 presents hazard models of funding. Model 1 and Model 5 replicate the summary statistic and KaplanMeier analysis presented above in regression form. The results are consistent with the visual evidence in Figure 4. The probability of funding for the projects in the $40 treatment condition increases by 9.7 percent (SE = 0.038) and the hazard of funding by 0.29 (SE = 0.096). While the $5 treatment is negative, the coefficient size is small and statistically insignificant. Model 2 and Model 6 control for the mechanical reduction by incorporating a spline in the amount outstanding. Model 6, the Cox model, also 25 includes strata for each quintile of the amount outstanding.8 We find some evidence for cumulative advantage in Models 2 and 6. Once we account for the mechanical reduction in the amount outstanding, the coefficient for the $40 treatment drops from 0.097 to 0.069 and is only significant at the 10% level. In Model 6, the hazard drops from 0.291 to 0.217 but remains significant at the 5% level. Taking these results together, it appears that roughly one-third of our effect occurs because of the mechanical reduction in the amount outstanding and the other two-thirds by changing the hazard rate of future donations. The $5 donations do not appear to have any meaningful effect on outcomes. Overall, Hypothesis 1 is largely supported for our $40 treatment. Next, we investigate Hypothesis 2 by testing whether these cumulative advantage effects change the distribution of outcomes. We estimate these changes by interacting our treatments with our measure of predicted success. We begin by first including normalized predicted funding probability in Models 3 and 7 in Tables 5 and 6. The predicted funding probability coefficient is highly significant in the linear probability and Cox models. To get a sense of the effect size, it is useful to compare the coefficient in our regression to the overall variability of our measure. A one standard deviation increase corresponds to a 15% increase in predicted probability. In the linear probability model, the coefficient on our normalized measure is 14.5%. This implies that a one standard deviation increase in the predicted probability of funding (15% higher) leads to a 14.5% increase in the actual probability of funding in our data. This provides strong evidence that our predicted probability measure is capturing differences in the likelihood of project success. Moreover, we increase our power by capturing a substantial amount of project heterogeneity with the inclusion of the predicted funding 8 One concern with fitting Cox models is meeting the proportional hazards assumption. In unreported analyses, we find that larger projects are less likely to get funded in the early days than projects that request a smaller amount, though this difference dissipates over time. To account for this, we stratify models 6-8 on quintiles of project size. Testing the model with these strata reveals that we meet the proportional hazards assumption. 26 probability variable in Models 3 and 7. The magnitude and statistical significance of our $40 treatment increases in both models. This provides further evidence that our $40 treatment leads to cumulative advantage. Models 4 and 8 in Tables 5 and 6 interact our treatment indicators with the predicted probability of funding. We find little evidence that projects that have higher predicted probabilities of funding benefit more from a randomized $40 treatment. The coefficient on the interaction term is small and statistically insignificant in both models. The main effect of the $40 donation does not change in magnitude nor significance. Thus, we find little evidence for cumulative advantage distorting the baseline chance of funding success. All projects appear to benefit equally from arbitrary and exogenous variation in social information. A potential concern with our analysis is that the functional form of the interaction effect may be non-linear. Our assumption of linearity may be masking underlying effects that at the tails of the distribution. We account for these possibilities by again using Generalized Additive Models with a flexible spline specification. Specifically, we fit models in which we interact our $40 treatment with a spline of predicted funding probability. This allows us to see if our treatments have larger or small effects at different points in the predicted funding probability distribution. Since the $5 treatment has no effect thus far, we drop these observations from this analysis to ease interpretation. As above, we determine the location and number of knots using cross-validation and minimization of the the out-of-fold error rate. Interpreting the implications of these non-linear interactions using coefficient estimates directly is extremely difficult. In lieu of regression tables, we present the results of these non-linear interactions by plotting the marginal effects in Figure 5. The xaxis in both plots is the standardized predicted funding probability. The black lines represent changes in funding probability for projects in the control group. The blue line changes in the funding probability for the the projects in the $40 treatment group. 27 The light shaded areas are the 95 percent confidence intervals. While the spline in the linear probability model is slightly non-linear at the tails, the Cox hazard model is perfectly linear. For both groups and in both models, the realized funding probability increases linearly with the predicted funding probability. We find no evidence for non-linear interaction effects. In short, we find no evidence for Hypothesis 2. Our treatments are constant no matter a project’s expected level of success; the unlikely to succeed and the very likely to succeed appear to equally benefit. In summary, we find that our $40 treatment leads to cumulative advantage. However, we find little evidence that our treatment varies across projects with different predicted levels of success. These results taken together suggest that cumulative advantage operates in our setting but that random variation in social information plays no direct role creating wider differences in success than one would expect. Instead, it seems that exogenous changes in social information induced by our treatments benefits projects across the distribution of expected success in a relatively equally way. This suggests that exogenous changes in social information may simply lead to more unpredictable outcomes, even if the distribution of success in expectation is relatively similar. 6 Discussion Our study has shown that the existence of cumulative advantage induced by exogenous changes does not necessarily increase inequality of success. This existence proof is an important corrective to the assumption made by many scholars that cumulative advantage, often created by social influence processes, inherently increases the inequality of success in marketplaces (Salganik, Dodds and Watts, 2006; Muchnik, Aral and Taylor, 2013; van de Rijt et al., 2014). Rather than assume a direct mapping between cumulative advantage and changes in the inequality of success, we provide a replicable 28 methodology for assessing the existence and strength of this link. Quantifying these effects is necessary in order to design marketplaces that balance the benefits of social information with the potential costs of the distortions such information may introduce. Though we did not find evidence that exogenous changes to social information produce differential cumulative advantage effects, there are other mechanisms through which cumulative advantage processes may exacerbate inequalities. For example, products that have an innate appeal may be more likely to receive an initial review, endorsement or contribution. Receiving earlier support could provide an initial advantage for these products over comparable products. In addition, some products in certain marketplaces may be more likely to receive large contributions, which prior research (along with our results) suggests may be more likely to attract subsequent customers (List and Reiley, 2002). Future research should investigate the role of these endogenous processes. However, our analysis greatly reduces concerns that arbitrary and exogenous early differences in social information deterministically lead to increases in the inequality of success. In our setting, products that are unlikely to succeed and that are likely to succeed equally benefit from changes to social information that is uncorrelated with underlying features of a product. By focusing on the inequality of success, our approach also sidesteps the contentious debate over whether the “wisdom of crowds” exists when social information in a marketplace leads to potentially interdependent rather than independent judgments (Zhang and Liu, 2012). This is an important debate to have, but it ignores the fact that marketplaces are often designed in ways that shepherd crowds towards particular goals. For example, crowdfunding platforms like Kickstarter and IndieGoGo curate and promote products that would otherwise be difficult to discover. DonorsChoose explicitly highlights projects that serve high-poverty public schools precisely because one of its goals as a philanthropic marketplace is to help direct capital to needy students. In practice, these platforms tend to reduce the inequality of success by promoting products that 29 are aligned with the goals of the marketplace designers. An important limitation to our study is the generalizablilty of its context. While we argue that the behavior of donors on DonorsChoose is similar to what we would expect in other marketplaces, we cannot be certain without replication in other contexts. For instance, it is possible that our interventions affect only aspects of evaluation that are particular to a philanthropic context, such as social signaling or warm glow, rather than more general perceptions of project’s features. Although we cannot rule out these explanations with our current study, future research should attempt to study these possibilities. Moreover, understanding both commercial and philanthropic motivations is particularly timely as new products and services are increasingly combining the two (Battilana and Lee, 2014). For example, consider the case of so-called “rewards-based” crowdfunding platforms such as Kickstarter and IndieGoGo. While some contributors use these platforms to buy a product or service (i.e., the contributor’s reward is the actual product or service being developed), many contributors support the overall endeavor and do not receive the actual products and services created. Instead, they typically receive recognition or trinkets as their rewards, which is very similar to what donors normally receive in exchange for charitable contributions. 7 Conclusion One of the key findings of our study is that social information may increase unpredictability of success in marketplaces. But when would unpredictability be desirable and when might it be detrimental? Our view is that the value of unpredictability is that it is one way to promote diversity of success in a marketplace. Diversity is not always a good thing. For instance, unpredictability is especially harmful in marketplaces where consumers have common goals, a consensus on what constitutes “quality” and a consensus on how to measure it. One example of this type of marketplace would 30 be peer-to-peer lending. Lenders have similar goals; they are looking for the best risk adjusted rate of return they can get. A situation in which social influence increases the funding of poor performing loans is bad for the consumers of these marketplaces and, in the long run, may jeopardize the viability of these platforms if lenders lose money. It is also likely bad for people who are taking the loan, as failing to repay a loan may hurt their credit rating or ability to get future loans. Alternatively, many marketplaces have consumers with diverse motivations and where assessments of quality are more varied (Zuckerman, 2012). These marketplaces may benefit from unpredictability as it leads to diversity in the distribution of success. The chance of an unpredicted success may encourage risk-taking, innovation, and exploratory strategies (March, 1991). The diversity of success may be an explicit goal for marketplaces seeking to foster innovation, a goal of many crowdfunding platforms. Concerns about decoupling success from the underlying appeal of a product may be muted because the single ideal or metric of “quality” may not exist. In sum, the greater levels of unpredictability created through social information may, on average, help products with less inherently popular characteristics. But it may also enable less appealing but more innovative products to succeed. 31 References Agrawal, Ajay K., Christian Catalini and Avi Goldfarb. 2011. “The geography of crowdfunding.” National Bureau of Economic Research . Allison, Paul D, J Scott Long and Tad K Krauze. 1982. “Cumulative advantage and inequality in science.” American Sociological Review pp. 615–625. Andreoni, James. 1990. “Impure altruism and donations to public goods: a theory of warm-glow giving.” The economic journal pp. 464–477. Azoulay, Pierre, Toby Stuart and Yanbo Wang. 2013. “Matthew: Effect or fable?” Management Science 60(1):92–109. Banerjee, Abhijit V. 1992. “A simple model of herd behavior.” The Quarterly Journal of Economics pp. 797–817. Battilana, Julie and Matthew Lee. 2014. “Advancing research on hybrid organizing– Insights from the study of social enterprises.” The Academy of Management Annals 8(1):397–441. Bénabou, Roland and Jean Tirole. 2006. “Incentives and Prosocial Behavior.” American Economic Review 96(5):1652–1678. Breiman, Leo. 2001. “Random forests.” Machine Learning 45(1):5–32. Burtch, Gordon, Anindya Ghose and Sunil Wattal. 2013. “An empirical examination of the antecedents and consequences of contribution patterns in crowd-funded markets.” Information Systems Research 24(3):499–519. Chen, Yubo, Qi Wang and Jinhong Xie. 2011. “Online social interactions: A natural experiment on word of mouth versus observational learning.” Journal of Marketing Research 48(2):238–254. Cialdini, Robert B. 1993. Influence: The psychology of persuasion. Colombo, Massimo G, Chiara Franzoni and Cristina Rossi-Lamastra. 2014. “Internal Social Capital and the Attraction of Early Contributions in Crowdfunding.” Entrepreneurship Theory and Practice 39(1):75–100. DiPrete, Thomas A and Gregory M Eirich. 2006. “Cumulative Advantage as a Mechanism for Inequality: A Review of Theoretical and Empirical Developments.” Annual Review of Sociology 32(1):271–297. Friedman, Jerome, Trevor Hastie and Robert Tibshirani. 2008. The Elements of Statistical Learning. Springer series in statistics Springer, Berlin. Gerber, Alan S and Donald P Green. 2012. Field experiments: Design, analysis, and interpretation. WW Norton. 32 Ghose, Anindya, Panagiotis G Ipeirotis and Beibei Li. 2014. “Examining the Impact of Ranking on Consumer Behavior and Search Engine Revenue.” Management Science . Ghose, Anindya and Sha Yang. 2009. “An empirical analysis of search engine advertising: Sponsored search in electronic markets.” Management Science 55(10):1605– 1622. Gneezy, Uri, Elizabeth A Keenan and Ayelet Gneezy. 2014. “Avoiding overhead aversion in charity.” Science 346(6209):632–635. Hansmann, Henry. 1987. “Economic theories of nonprofit organization.” The nonprofit sector: A research handbook 1:27–42. James, Gareth, Daniela Witten, Trevor Hastie and Robert Tibshirani. 2013. An introduction to statistical learning. Springer. Karlan, Dean and John A. List. 2012. “How Can Bill and Melinda Gates Increase Other Peoples Donations to Fund Public Goods?” National Bureau of Economic Research . Karlan, Dean and Margaret A McConnell. 2014. “Hey look at me: The effect of giving circles on giving.” Journal of Economic Behavior & Organization 106:402–412. Kovács, Balázs and Amanda J Sharkey. 2014. “The Paradox of Publicity How Awards Can Negatively Affect the Evaluation of Quality.” Administrative Science Quarterly 59(1):1–33. List, John and David Reiley. 2002. “The Effects of Seed Money and Refunds on Charitable Giving: Experimental Evidence from a University Capital Campaign.” Journal of Political Economy 110(1):215–233. Manski, Charles F. 1993. “Identification of endogenous social effects: The reflection problem.” The Review of Economic Studies 60(3):531–542. March, James G. 1991. “Exploration and Exploitation in Organizational Learning.” Organization Science 2(1):71–87. Meer, Jonathan. 2014. “Effects of the price of charitable giving: Evidence from an online crowdfunding platform.” Journal of Economic Behavior & Organization 103:113–124. Merton, Robert K. 1968. “The Matthew Effect in Science.” Science 159:56–63. Mollick, Ethan. 2014. “The dynamics of crowdfunding: An exploratory study.” Journal of Business Venturing 29(1):1–16. Morgan, Stephen L. and Christopher Winship. 2007. Counterfactuals and causal inference: Methods and principles for social research. Cambridge University Press. 33 Muchnik, Lev, Sinan Aral and Sean J. Taylor. 2013. “Social influence bias: A randomized experiment.” Science 341(6146):647–651. Podolny, Joel M. 2005. Status Signals. A Sociological Study of Market Competition Princeton Univ Pr. Salganik, Matthew J. and Duncan J. Watts. 2008. “Leading the herd astray: An experimental study of self-fulfilling prophecies in an artificial cultural market.” Social Psychology Quarterly 71(4):338–355. Salganik, Matthew J, Peter Sheridan Dodds and Duncan J Watts. 2006. “Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market.” Science 311:854–856. Shalizi, Cosma Rohilla and Andrew C Thomas. 2011. “Homophily and contagion are generically confounded in observational social network studies.” Sociological Methods & Research 40(2):211–239. Simonsohn, Uri and Dan Ariely. 2008. “When Rational Sellers Face Nonrational Buyers: Evidence from Herding on eBay.” Management Science 54(9):1624–1637. Tucker, Catherine and Juanjuan Zhang. 2011. “How does popularity information affect choices? A field experiment.” Management Science 57(5):828–842. van de Rijt, Arnout, Soong Moon Kang, Michael Restivo and Akshay Patil. 2014. “Field experiments of success-breeds-success dynamics.” Proceedings of the National Academy of Sciences 111(19):6934–6939. Zhang, Juanjuan and Peng Liu. 2012. “Rational herding in microloan markets.” Management Science 58(5):892–912. Zuckerman, Ezra W. 2012. “Construction, concentration, and (dis) continuities in social valuations.” Annual Review of Sociology 38:223–245. 34 8 Figures and Tables Figure 1: Example of the DonorsChoose website at the time the experiment took place. 35 Variable Importance Tota Project Amount ● Day in Year ● School Longitude ● School Latitude ● Secondary Focus Subject ● Primary Focus Subject ● Vendor Shipping Charges ● Number Students Reached ● Day of Week ● Sales Tax ● Year ● Secondary Focus Area ● Resource Type ● Grade Level ● Fulfillment Costs ● Primary Focus Area ● Double Impact Match Eligible ● Teacher Mr, Mrs, or Ms ● School Poverty Level ● Almost Home Match Eligible ● TFA Teacher ● Charter School ● Year Round School ● Magnet School ● NLNS School ● NYTF Teacher ● Promise School ● KIPP School ● 0 5000 36 10000 15000 20000 MeanDecreaseGini Figure 2: Random Forest variable importance. Variables at the top of the list are more important in predicting a project’s probability of success than variables at the bottom of the list. Table 1: Two Randomly Selected Projects from each Decile of Predicted Funding Probability PID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 P(Funding) Project Title Date Project Amt Primary Subject 0.056 05-12 816.54 Literature & Writing A New Projector Would Make Presentations-Kids Brighter! 0.068 04-16 377.72 Literature & Writing Continuing To Expand Our Electronic Library 0.136 05-06 1, 899.96 Mathematics Teaching Through Technology 0.156 03-15 412.50 Mathematics Becoming Mathematicians With Technology 0.276 02-18 194.02 Special Needs Create And Learn 0.292 08-09 661.49 Environmental Science Geologists In The Making 0.316 07-21 210.02 Literature & Writing Stay Gold, Ponyboy 0.320 03-30 379.85 Literature & Writing Learning and Growing through Music 0.404 06-16 440.60 Literature & Writing Help Us Make Writing Wondrous! 0.452 03-14 252.47 Literacy iCan Learn with iPods! 0.536 05-10 735.32 Music Keyboards to Keep Kids Learning Part 2! 0.572 06-15 434.45 Literacy We Are Greatly in Need of General School Supplies 0.656 02-23 777.27 Literature & Writing New Technology for Our Classroom 2 0.696 06-12 142.32 Literacy Keeping Our Classroom Colorful in the New School Year! 0.772 03-07 598.26 Special Needs Technology for Special Needs Students 0.796 02-19 296.83 Literacy Where Are the Books? 0.816 05-25 339.35 Music Help Us Learn Guitar! 0.888 03-11 612.59 Literacy Empowering First Graders 0.928 03-16 250.84 Special Needs Read and Explore 0.956 08-11 395.86 Literacy Fill Our bookshelves! 37 Secondary Subject Shipping Students Literacy 12 175 Literacy 12 27 Other 158 24 31.29 100 Mathematics 0 30 Applied Sciences 0 70 0 24 Mathematics 12 18 Literacy 0 31 Mathematics 12 21 0 120 0 30 12 75 0 21 47.21 9 0 18 0 10 0 27 12 13 0 26 ESL Literature & Writing Special Needs Early Development Literacy Table 2: Summary Statistics by Experimental Condition Variable N Mean St. Dev. Min Median Max Control Projects Total Project Amount Number Students Reached Predicted Probability Days to Funding Project is Funded 1,440 1,440 1,440 1,440 1,440 534.80 68.1 0.66 69.9 0.74 393.55 111.4 0.15 56.1 0.44 133.47 5 0.22 1 0 434.77 30 0.67 49 1 4,512.88 999 0.96 150 1 $5 Treated Projects Total Project Amount Number Students Reached Predicted Probability Days to Funding Project is Funded 144 144 144 144 144 495.18 61.8 0.66 78.7 0.72 297.88 96.8 0.15 56.2 0.45 131.72 7 0.27 2 0 435.63 30 0.65 58 1 2,216.21 999 0.94 150 1 $40 Treated Projects Total Project Amount Number Students Reached Predicted Probability Days to Funding Project is Funded 144 144 144 144 144 491.74 74.5 0.67 56.7 0.84 391.95 145.1 0.15 52.2 0.37 134.95 12 0.29 1 0 418.36 29.5 0.69 39 1 3,297.28 999 0.92 150 1 Training Set Projects Italicized variables are measured post treament. Table 3: Correlations (1) (2) (3) (4) (5) (6) (7) (8) Total Project Amount Number Students Reached Predicted Probability Number of Natural Donors Days to Funding Reached Funding Goal Forty Dollar Treatment Five Dollar Treatment (1) (2) (3) (4) (5) (6) (7) 0.08 -0.63 -0.09 0.18 -0.14 -0.03 -0.03 -0.07 -0.02 -0.01 0.02 0.02 -0.02 0.09 -0.32 0.28 0.02 -0.02 -0.26 0.38 -0.01 -0.02 -0.78 -0.07 0.05 0.06 -0.02 -0.09 38 (8) Table 4: Balance Tests Dependent variable: PFP Log(Project Amount) Log(Num Students Reached) (1) (2) (3) $5 Treatment −0.008 (0.013) −0.046 (0.051) 0.004 (0.075) $40 Treatment 0.010 (0.013) −0.088 (0.051) 0.013 (0.075) Constant 0.663∗∗ (0.004) 6.098∗∗ (0.015) 3.705∗∗ (0.023) 1,728 3 −843.173 −0.001 1,728 3 −1,513.84 0.001 1,728 3 −2,195.23 −0.001 Observations Model D.F. Log Likelihood Adjusted R2 ∗ p<0.05; ∗∗ p<0.01 Linear Regression Models. Predicted Funding Probability (PFP). Note: 39 Table 5: Linear Probability Models Is the project funded? (1) (2) (3) (4) $5 Treatment −0.028 (0.038) −0.033 (0.037) −0.020 (0.036) −0.019 (0.036) $40 Treatment 0.097∗ (0.038) 0.069 (0.037) 0.087∗ (0.036) 0.084∗ (0.037) 0.145∗∗ (0.015) 0.141∗∗ (0.015) Predicted Funding Probability (PFP) $5 Treatment × PFP 0.022 (0.036) $40 Treatment × PFP 0.027 (0.036) Constant Project Amount Splines Observations Estimated Model D.F. Log Likelihood Adjusted R2 Note: 0.744∗∗ (0.011) 0.746∗∗ (0.011) 0.744∗∗ (0.011) 0.744∗∗ (0.011) No Yes Yes Yes 1,728 3 −1,007.198 0.003 1,728 6.87 −974.833 0.042 1,728 6.17 −934.405 0.085 1,728 8.12 −935.990 0.085 ∗ p<0.05; ∗∗ p<0.01; Linear probability models with penalized splines. All continuous variables standardized. Predicted Funding Probability (PFP). 40 Table 6: Cox-Proportional Hazard Models Days till project is funded (5) (6) (7) (8) $5 Treatment −0.138 (0.103) −0.186 (0.103) −0.145 (0.104) −0.149 (0.105) $40 Treatment 0.291∗∗ (0.096) 0.217∗ (0.097) 0.270∗ (0.097) 0.239∗ (0.102) 0.388∗∗ (0.045) 0.374∗∗ (0.047) Predicted Funding Probability (PFP) $5 Treatment × PFP 0.032 (0.106) $40 Treatment × PFP 0.160 (0.107) Project Amount Quintile Strata Project Amount Splines Observations Estimated Model D.F. Log Likelihood Note: No No Yes Yes Yes Yes Yes Yes 1,728 2 -8,936.613 1,728 3.17 -6,813.302 1,728 5.01 -6,773.990 1,728 7.91 -6,772.463 ∗ p<0.05; ∗∗ p<0.01; Cox-proportional hazard models with penalized splines. All continuous variables standardized. Predicted Funding Probability (PFP). 41 Count of new projects 300 Condition 200 40 5 0 100 0 9−18 9−17 9−16 9−15 9−14 9−13 9−12 9−11 9−10 9−09 9−08 9−07 9−06 9−05 9−04 9−03 9−02 9−01 8−31 8−30 8−29 8−28 8−27 8−26 8−25 8−24 8−23 8−22 Date Figure 3: Number of projects per day posted during experimental intervention period by condition. 42 1 0.8 0.6 0.4 Exp Condition 0 0.2 Control $5 $40 0 50 100 150 Days since project posted Figure 4: Kaplan-Meier Curves showing the probability after x days that a project is fully funded by condition. 43 Cox-Proportional Hazard 0 Change in Hazard of Funding -1 0.2 0.0 -0.2 -0.4 -2 -0.6 Change in Probability of Funding 1 0.4 0.6 Linear Probability Model -3 -2 -1 0 1 2 -3 Standardized Predicted Funding Probability -2 -1 0 1 2 Standardized Predicted Funding Probability Figure 5: Testing the Linearity of the Interaction Effect using Generalized Addative Models. 44