Identifying Social Influence in Viral Products Rodrigo Belo , Pedro Ferreira

advertisement
Identifying Social Influence in Viral Products
Using Randomization over a Large Mobile Network
Rodrigo Belo∗, Pedro Ferreira†
Carnegie Mellon University
November 27, 2014
Abstract
This paper analyzes the role of social influence in the diffusion of telecom related products
across social networks. We study a subset of the products deployed by a large European mobile
carrier and look at how their viral characteristics affect diffusion. We develop a theoretical
model of the consumers’ incentives to adopt these products and find that apparently similar viral
designs can result in very different adoption dynamics. We use randomization to identify social
influence from observational data. In short, we shuffle adoption dates to create pseudo worlds
where peer influence is inexistent, which allows us to capture, and thus separate, homophily. We
find that for some products more adopters lead to more adoption. However, for other products
more adopters lead to less adoption. These empirical results come in line with our theoretical
model. To the best of our knowledge, our paper is the first to provide empirical evidence of
social network effects that reduce adoption. These effects might limit considerably the diffusion
of some products countering the potential benefits from viral designs.
1
Introduction
There has been increasing interest in leveraging information available from social networks to promote the adoption of new products and services. Accordingly, much attention has been devoted
in the literature to study how different actors and network structures contribute to diffusion (e.g.,
Rogers, 2003; Watts and Dodds, 2007; Van den Bulte and Joshi, 2007; Bampo et al., 2008). Comparatively, little attention has been devoted to how product characteristics shape the adoption
∗
†
Rodrigo Belo, CMU, rbelo@cmu.edu.
Pedro Ferreira, CMU, pedrof@cmu.edu.
1
process over networks. Some studies look at this issue from a theoretical point of view (e.g.,
Jackson and Zenou, 2012; Galeotti et al., 2010; Kearns et al., 2001), but only a few studies try to
empirically assess the role of product characteristics on diffusion (e.g., Aral and Walker, 2011). This
is probably the case because it is hard to distinguish social influence from other phenomena, such
as heterogeneity in the propensity to adopt, homophily, correlated unobservables and simultaneity,
using only observational data (Shalizi and Thomas, 2011). Some recent studies use identification
strategies such as structural modeling (e.g., Ma et al., 2009), instrumental variables (e.g., Tucker,
2008), propensity score matching (e.g., Aral et al., 2009), and randomization (e.g., Anagnostopoulos et al., 2008; La Fond and Neville, 2010) to do so and find that influence plays a surprisingly
limited role in diffusion, in particular with online social networks (e.g., Anagnostopoulos et al.,
2008; Aral et al., 2009; Goel et al., 2012).
Randomization techniques consist in generating pseudo-samples based on the original sample
by randomly permuting the values of some variables among observations Noreen (1989) in a way
that breaks peer influence but not homophily. These permutations allow for estimating empirical
distributions for parameters of interest. Comparing these distributions to the parameters obtained
using the original data allows for learning whether peer influence plays a role in adoption or if
instead the latter is mostly driven by homophily (e.g., Anagnostopoulos et al., 2008; La Fond and
Neville, 2010; Belo and Ferreira, 2012). Randomization has not been much used in the literature,
in particular with large social networks. In this paper, we apply randomization to a large social
network inferred from mobile calls and we show that peer influence may play different roles depending on the viral characteristics of the products. We also show that randomization provides a
lower bound on the effect of peer influence. As such the contribution of this paper is twofold. One
the one hand, we improve our theoretical understanding of how randomization works and what it
can achieve. On the other and, we provide a large scale example of its empirical application in a
2
network context.
We use a comprehensive panel of data from a large European mobile carrier. The data comprise
detailed information about all subscribers, including all call and SMS detail records, pricing plans
and adoption of add-on products and promotions between August 2008 and June 2009. We look
for two types of products in this carrier. With type P products users pay a flat subscription fee and
can call for free subscribers in the same carrier that have also subscribed the same product. Type
N products are similar but one can instead call for free all users in the same carrier irrespective
of whether they also subscribed the product. Both types of products are common among mobile
network providers and are seen as part of their strategies to retain clients and compete in the
market.
We model the adoption of each of these types of products and find that despite their apparent
similarities their setups result in very different adoption incentives. Even though the adoption of
each of these products yields positive network effects to the adopters’ friends, for type P products
these effects are only realized upon friends’ adoption, while for type N products all friends can
immediately benefit from the ego’s adoption. Thus, and intuitively, while for type P products the
benefit of adopting grows with the number of friends that have adopted, for type N products the
incentives to adopt decrease with the number of friends that have adopted. Our empirical results
confirm this intuition. In particular, we find the expected ”negative” effect of social influence in
the case of the N products. Recall that homophily is, by definition, positive, which thus increases
our confidence in our identification of the effect of peer influence in the case of N products.
These results have important management implications. Social influence not always contributes
positively to product diffusion. Its role depends highly on the design of the viral features of the
products considered and seemingly similar products may end up exhibiting very different diffusion
dynamics due to potential different roles that social influence can play. Therefore, it is important
3
to carefully design these product features as to correctly anticipate demand and changes in demand
over time.
2
Related Work
2.1
Diffusion of Innovations and Social Influence
Social influence has been found to play an important role in the process diffusion of products and
services (e.g., Rogers, 2003) and as such it has been incorporated in many diffusion models, such
as epidemic models (including Susceptible-Infectious-Recovered (SIR) models (e.g., Kermack and
McKendrick, 1927) and the Bass Model Bass (1969)), where new adopters are influenced by the
proportion of previous adopters, threshold models (e.g., Granovetter, 1978), in which adoption
occurs when a given fraction of one’s friends has already adopted, and in hub models (e.g., Watts
and Dodds, 2007), in which a number of well-informed central agents adopt a product and then
lead others to adopt. Common to all these models is the fact that, under the right conditions,
a small number of initial adopters may lead to a large number of adoptions. Peres et al. (2010)
define innovation diffusion as “the process of the market penetration of new products and services,
which is driven by social influences”. These influences “include all of the interdependencies among
consumers that affect various market players with or without their explicit knowledge.”
Peres et al. (2010) provide a framework to classify factors that drive product adoption. They
draw a clear distinction between factors that stem only from heterogeneity among individuals
and factors that involve social interactions. The first group of factors includes only individual
characteristics that determine whether and when adoption occurs (e.g., Rogers, 2003; Watts and
Dodds, 2007; Van den Bulte and Joshi, 2007), while the latter group of factors represents all forms of
bidirectional and unidirectional communication across individuals, that is, social influence. Social
influence can be defined as the degree by which an action from an individual changes the behavior
of someone else, and includes all forms of consumer interactions, including network effects, social
4
signals and interpersonal communication (word-of-mouth)
2.2
Peer influence in large scale networks
Research on the role of peer influence in large-scale networks has seen a surge in the last decade
due to the increased availability of large data sets with social network information. There has been
a fair amount of work on the role of influential actors (e.g., Rogers, 2003; Watts and Dodds, 2007;
Van den Bulte and Joshi, 2007), on the role of network structure (e.g., Bampo et al., 2008), on the
characteristics of products that foster faster diffusion over networks (e.g., Aral and Walker, 2011;
Sun and Tang, 2011), and on the role of network effects (e.g., Goldenberg et al., 2010). Most of
this work has been in online networks (e.g., Anagnostopoulos et al., 2008; Aral et al., 2009; Goel
et al., 2012; Aral and Walker, 2011).
The ”influentials” hypothesis states that a small set of people trigger most of the diffusion, which
makes the identification of these people valuable for marketing purposes. Watts and Dodds (2007)
run a set of computer simulations to test this two-step hypothesis, finding that in most cases large
cascades of influence were driven not by influential individuals (opinion leaders) but rather by a
large number of easily influenced people. Van den Bulte and Joshi (2007) present a model in which
eventual adopters are split into two distinct groups: the ”influentials” and the ”imitators”. The
former are in touch with new developments and influence the ”imitators”. Adoption by ”imitators”
does not influence adoption by ”influentials”. They highlight the fact that these two-stage models
tend to fit better the data than standard mixed-influence models. Bampo et al. (2008) use data
from an actual viral marketing campaign to calibrate a computer model, and run several simulations
in different types of networks. They find that the network structure has a critical role in the spread
of a viral message.
Little attention has been given to the role of peer influence in real-world mobile network services.
One such exception is work by Godinho de Matos et al. (2012) where they study the effect of peer
5
influence in the adoption of the iPhone 3G. They use a multitude of methods to circumvent possible
biases in estimation. Namely, they use community detection algorithms and use the adoption by
friends of friends to instrument the adoption of friends. They find a consistent positive effect of
influence on the diffusion of this product. Ma et al. (2009) analyze the role of peer influence and
homophily on the diffusion of Call Ring-Back Tones (CRBT). CRBTs are songs chosen by the
receiver of a call that are played to the caller while she is waiting for the receiver to pick up the
call. They use 3 months of data from a large Indian telecom operator. Call data records along
with CRBT adoption data provide detailed information about the exposure to a given CRBT,
allowing for identifying influence under a relatively small set of assumptions. The authors develop
a structural model and conclude that both influence and homophily play a significant role in the
diffusion of CRBTs over this network.
Nevertheless, other authors have found that peer influence in online networks plays a limited
role in adoption. For example, Goel et al. (2012) analyze the diffusion patterns from seven online
domains, such as Twitter and Yahoo. They find similarities across all domains, namely they
find that most adoption is part of very simple cascades of only one hop, and that only a very
small fraction of adoptions are part of longer cascades. Aral et al. (2009) look at influence in an
instant messaging network and conclude that at half of the perceived contagion can be explained
by homophily.
2.3
Identifying Peer Influence
Identifying social influence in observational data is a hard task due to the confoundedness between
influence and unobserved effects such as homophily. Shalizi and Thomas (2011) argue that in
general homophily cannot be observationally distinguished from influence or contagion, unless some
causal structure is assumed. Consequently, the results obtained from exploring observational data
are only as good as these assumptions. Among other alternatives, the authors suggest placing
6
bounds on the causal effects of interest. If these bounds exclude zero, then it is possible to infer
the existence of an effect and its direction.
In this paper we use randomization to try separate homophily from peer influence and thus avoid
over-estimating the effect of the latter. We show that randomization applies nicely to network contexts and can be used in practice over large networks. Randomization tests are a technique that has
been used for non-parametric hypothesis testing based on permutations of values among observations Noreen (1989). The key idea is that under the null hypothesis these permutations correspond
only to random disturbances in the data and should not change the statistics of interest. The test
is conducted as follows. The original data are altered several times by permuting some attributes
among individuals, each permutation originating a pseudo-sample. A test score is calculated for
each pseudo-sample, and from these test scores an empirical distribution is estimated. The test
score can be any statistic calculated from the pseudo-sample, such as, for example, the sample
mean or a parameter resulting from a model estimation (as in the case of this paper). The test
score of the original data is then compared to the distribution of the test score calculated from the
pseudo-samples, and its significance is assessed.
Two recent studies outline randomization strategies to identify peer influence effects when information about the timing of actions is available. Anagnostopoulos et al. (2008) propose the shuffle
test to identify influence as a source of correlation in a social network. This test consists in shuffling
the adoption date among people that will eventually adopt. The test is based on the idea that
under the no-influence null hypothesis, adoption dates should be independent across people and
therefore the correlation coefficient between adoption and the number of previous adopters should
be the same no matter whether adoption dates have been shuffled. If the correlation coefficient
with the original data is different from the one obtained with shuffling, the null hypothesis can be
rejected and we can conclude that influence plays a role in the process of adoption. Note that this
7
shuffle requires a longitudinal view of the dataset, that is, the researcher needs to know exactly
which individuals ended up adopting the product so that shuffling can be performed among all
eventual adopters.
La Fond and Neville (2010) describe a general randomization method to identify homophily and
influence in a two-period setting. Their assumption is that if influence is present then the attributes
of connected individuals become more similar from one period to the next. Therefore, they define
a correlation measure that increases with both the number of connected individuals with similar
attribute values and the number of unconnected individuals with different attribute values. They
calculate empirical distributions for this auto-correlation statistic under the null hypothesis that
there is no influence (or homophily). This is obtained by selectively randomizing the creation of
links (homophily) or changes in attributes (influence) in a way that breaks the actual connections
but preserves all other network attributes, such as the number of links and attributes that change
and the degree of each individual. Finally, they compare the statistic obtained from the observed
change in the auto-correlation measure from one period to the next with the empirical distribution
of the same measure under the null hypothesis, and determine whether the null can be rejected or
not.
2.4
Network Effects and Network Games
While initial works on diffusion considered social influence and network effects as global phenomena
that occurred in a fully connected networks (e.g., Bass, 1969), more recent studies refine these
assumptions and analyze diffusion in partially connected networks with local network effects (e.g.,
Goldenberg et al., 2010). Galeotti et al. (2010) and Jackson and Zenou (2012) describe a set of
games, called “network games”, where outcomes depend on the network structure. There is a vast
literature on network games and on their theoretical properties (e.g., Galeotti et al., 2010; Jackson
and Zenou, 2012; Sundararajan, 2007; Kearns et al., 2001). For example, Goldenberg et al. (2010)
8
argue that some local network effects may slow down adoption because adopters tend to wait for
their early-adopter friends to adopt in order to get more utility from adoption. They use simulation
techniques to infer this “chilling effect”. This is a classical example of a game in which benefits
accrue only after one adopts. However, in our case, with the type N products benefits accrue even
without adoption. Hence, our setting is closer to that of “best-shot” public good games (Hirshleifer,
1983), in which the payoff of an individual depends on whether any of their neighbors takes an
action that implies a cost. This action entails a positive externality on neighbors, leading them
not to adopt if any of their neighbors has already adopted. To the best of our knowledge there
have been no empirical studies showing the existence of such behavior in real life and on actual
products. Our paper provides a first empirical example of such an effect.
3
Modeling Incentives to Adopt with Network Effects
We develop a model for how social influence can affect the adoption of products that exhibit network
effects. We focus on the specific case of products that provide free-calls within the same carrier
and look at the adoption incentives of two slightly different versions of these products. Type P
products allow calling for free subscribers that also adopt the same product. Type N products allow
calling for free any subscriber in the same carrier irrespective of whether the latter has adopted the
product. Subscribers can choose either of these versions of these products. Both of them require
subscribers to pay a fixed flat fee.
For sake of simplicity, we assume each user has a fixed number of friends, Fi , and that the network
structure does not change. Two subscribers are friends if they exchange a minimum number of
calls in a given time period. Subscriber i derives utility from calling friend j, uij , independently of
how much they talk to other friends. We assume quadratic pairwise utility in the number of calls
between user i and user j, cij :
9
ui =
X
uij =
j∈Fi
X
[a(cij + cji ) − b(cij + cji )2 − pcij ]
(1)
j∈Fi
where p represents the price per call. If user i adopts a type P product, she will pay a fixed fee,
f , but will not pay for calls to the other users that have also adopted this product:
ui |AP =
X
[a(cij + cji ) − b(cij + cji )2 − pcij 1{uj |AP ≥ uj }] − f
j∈Fi
where ui |AP represents her utility from adopting the product and 1{uj |AP ≥ uj } is an indicator
function for whether the utility from adopting the product is larger than the utility from not
adopting for user j, that is, whether user j adopts the product. Maximizing utility (as shown in
the appendix) yields that user i will adopt iff:
dAi ≥
4b
f
ap
where dAi represents the number of friends of user i that will eventually adopt the product. User
i will adopt this product if there is a minimum number of friends that will also adopt it. This is
consistent with positive peer influence: the higher the number of friend who adopt the higher the
likelihood of adopting.
The incentives to adopt a type N product are quite different. In this case, when user i adopts
this product she can call all friends within the same carrier for free:
ui |AP =
X
[a(cij + cji ) − b(cij + cji )2 ] − f
j∈Fi
In this case, adoption occurs as long as there are at least a minimum number of friends that will
not adopt (details of the derivation provided in the appendix):
10
dNi ≥
4b
f
ap
where dNi represents the number of friends of user i that will not adopt.
Thus, apparently similar products can generate different adoption incentives that can translate
to very different diffusion outcomes. These two types of products offer distinct incentives to users
that have not adopted them. On the one hand, as more people adopts a type P product, the higher
the incentive to adopt it because the number of people one can call for free increases. On the other
hand, type N products offer the opposite incentive: the more users that adopt such a product, the
smaller the incentives to adopt, given that adopters can already call for free.
4
Data
We use an 11-month anonymized panel of data comprising detailed information about all subscribers
in a large mobile European network provider. The data include records for all calls and SMS
placed by all subscribes in this carrier between August 2008 and June 2009. There are roughly
4 million subscribers active during our period of analysis. These records include the anonimize
identifiers of the caller and the callee, the start time and the duration of every call. On an average
day subscribers generate about 4 million calls and exchange 40 million SMS. Additionally, the
data contain information about the subscribers’ pricing plans and supplementary services. At each
point in time, each subscriber is associated with one pricing plan and possibly several supplementary
services. Supplementary services are a la carte add-on services that subscribers can acquire, such
as a pack of 1000 SMS at a discounted rate, free calls on the weekends or night for a given period of
time, or simply voicemail service. We limit our analysis to a random sample of 10,000 subscribers
and the subscribers they call and SMS – hereinafter called neighbors. As such we deal with
information from roughly 50,000 subscribers.
11
The definition of neighbor is based on the number of calls that users exchange during a given
period. Two subscribers are considered neighbors if they exchange a minimum number of calls in
the same calendar month (3 or 5 in our analysis) and there is at least one call in both directions.
Table 1 displays preliminary summary statistics on calls, SMSs and neighbors. As expected, calls
inside the network correspond to 75%-80% of the total calls. As usual in mobile networks, the
number of SMS is one order of magnitude larger than the number of calls. The share of SMS inside
the carrier is roughly 95%. Also, subscribers tend to have more neighbors inside the network,
roughly 70%-75%.
Table 1: Call, SMS and Neighbors Summary Statistics (N=10,000).
VARIABLES
Mean
SD
Calls Made Inside Network
Calls Made All
Calls Received Inside Network
Calls Received All
SMSs Made Inside Network
SMSs Made All
SMSs Received Inside Network
SMSs Received All
3-call Neighbors Inside Network
3-call Neighbors All
5-call Neighbors Inside Network
5-call Neighbors All
18.76
23.11
21.23
27.98
205.2
212.8
203.9
209.5
2.525
3.558
1.650
2.192
37.02
41.78
38.51
46.21
632.2
644.2
619.9
629.6
3.570
4.864
2.498
3.185
We estimate the effect of social influence on two products, one of type P and one of type N.
The type P product analyzed corresponds to a pricing plan in which users pay a fixed monthly fee
to call for free subscribers that have the same pricing plan. The type N product corresponds to
short-term promotion that was available during December 2008 for which users pays a one time
fixed fee and can call all users in the same carrier for free until the end of the year, independently
of whether the latter have adopted the same promotion. Our dataset registers 1126 adopters of
product P between the January 2008 and June 2009. Product N was adopted 534 types between
mid November and mid December 2008.
12
5
Empirical Strategy
We apply the shuffle test described in Anagnostopoulos et al. (2008) and use randomized versions
of the data to infer an empirical distribution for the effect of friends adoption on one’s adoption
under the null hypothesis of no influence. We then compare the coefficient obtained for this same
statistic using the original data with the average of this empirical distribution. We show in the
appendix that this procedure yields a lower bound for the magnitude of social influence. This is an
advantage regarding other methods such as instrumental variables and propensity score matching,
which often provide upper bounds for the potential effect of social influence. The reminder of this
section details our procedure.
Consider relational data represented as an undirected graph, G = (V, E), where V is a set of
users and E is a set of undirected edges. Edge eij connects users i and j and belongs in E iff users
i and j are neighbors. Each user v ∈ V has a time-changing attribute Wvt indicating whether she
has adopted the product at a time t̃ ≤ t.
Additionally, assume that at time t the probability of user i to adopt a given product follows
a distribution that depends on the number of users j connected to her that have already adopted
the product, ait =
P
j:eij ∈E
Wtj , and on her own characteristics, Xi .
Due to computational restrictions, and because our interest lies mainly in the first-order marginal
effects of having one more neighbor that adopted, we estimate a linear probability model (LPM)
of the form
p(yit = 1|yi,t,...,t−1 = 0, Xi ) = αait + Xi β + εit ,
(2)
where yit is the adoption of user i at time t, α corresponds to our parameter of interest, measuring the effect of friends’ adoption on i’s adoption, ait is the number of friends that have already
adopted a given product, Xi is a vector of covariates of user i and β is a parameter vector. As
13
mentioned before, α might capture not only the effect of peer influence but also that of other
sources of correlation between one’s decision to adopt and the number of one’s friends’ that have
previously adopted such as homophily and unobserved confounding variables. We run this model
on each randomized version of the data to estimate an empirical distribution for α under the null
hypothesis of no influence. This hypothesis can be stated as follows:
H0 : The probability of user i to adopt a given product at time t is not determined by the number
of neighbors that have already adopted, ait .
This means that without peer influence the adoption dates of one’s friends do not contribute to
the one’s adoption at any given point in time. We might, however, observe a positive correlation
between one’s adoption and the number of friends that have already adopted. This might be due
to the fact that friends have similar (unobserved) characteristics that have a similar effect on their
propensity to adopt.
We generate randomized versions of the original data that mimic them as much as possible but
assume that adoption dates are irrelevant for the process of diffusion. We shuffle adoption dates
among eventual adopters as suggested in Anagnostopoulos et al. (2008). This transformation preserves network-level statistics, such as the total number of adoptions in each period, and thus avoids
potential problems that could arise from significantly changing the original data. To calculate the
empirical distribution of α we run our LPM model to obtain an estimate for each randomized
version of the data. Then, we can reject the null hypothesis if the estimate for α obtained from estimating equation 3 with the original data falls outside the 95% confidence interval of the parameter
obtained from the empirical distribution. Moreover, we identify social influence as the difference
between the average of α from the empirical distribution across the randomized pseudo-sample and
14
the coefficient obtained with the original data. As discussed before this difference provides a lower
bound for this effect.
6
Results
Figure 1 shows our results for the type P product. The empirical distribution represents the
distribution of parameter α obtained from running the LPM model in equation (3) on 1,000 pseudodatasets with adoption dates shuffled among eventual adopters. This distribution has a positive
average, 0.0051, and a low standard deviation, 0.00007, so that we can reject the null hypothesis
that α = 0, meaning that confounding factors, such as homophily are at play. Despite these effects,
the coefficient obtained using the original data is higher and statistically different from the estimate
above. With the original data the coefficient obtained is 0.0053, outside the 95% confidence interval
of the empirical distribution. Therefore, we conclude that social influence plays a positive role in
the diffusion of this product. Its total effect corresponds to to at least 11% of the total adoption
observed.
Figure 2 shows the results for the type N product. In this case both the average of the empirical
distribution and the coefficient obtained with the original data are positive and statistically different
from zero. Still, the latter is statistically lower than the former. This means that without influence,
the coefficient associated with the role of friends’ adoption is higher than the coefficient obtained
using the original data, and thus – after accounting for unobserved phenomena such as homophily
– we find that social influence plays a negative role in the diffusion of this product. In this case,
social influence reduced the observed adoption by 9%. Table 3 summarizes these results.
The adoption of our type P product exhibits the typical positive effects of word-of-mouth. The
novelty in our empirical exercise is the result obtained with our type N product. Social influence
reduces the adoption of this product and this comes in line with our theoretical model. As a
robustness check, and given the novelty associated to this result, we identified five additional type
15
Type P Product
Frequency
75
50
25
x
0
0.0049
0.0050
0.0051
0.0052
Empirical Distribution
Figure 1: Distribution of coefficients for the Type P product over 1,000 shuffles of the adoption date.
The ‘×’ mark represents the coefficient obtained using the original data. Dashed lines represent
95% confidence intervals.
N products offered by this carrier and applied the same empirical strategy as above to them too.
The results are provided in appendix. In all these cases the effect of social influence is negative and
it is statistically significant in three of them.
7
Robustness check
In the procedure used before we shuffle adoption dates among all adopters to measure social influence. This assumes that social influence is coded in the users’ adoption dates. While this seems
a reasonable approach to think about peer influence – one cannot influence or change the past –
there is still a chance that adoption dates may conceal unobserved effects leading to adoption that
we may erroneously by taking up as peer influence.
For example, before we have implicitly assumed that adoption dates are all drawn from the same
distribution. However, if one’s propensity to adopt is correlated not only with the number of friends
16
Type N Product
100
Frequency
75
50
25
0
x
0.0045
0.0050
0.0055
0.0060
0.0065
Empirical Distribution
Figure 2: Distribution of coefficients for the Type N product over 1,000 shuffles of the adoption date.
The ‘×’ mark represents the coefficient obtained using the original data. Dashed lines represent
95% confidence intervals.
that eventually adopt, but also with whether one is an early or a late adopter (that is, if there is
temporal clustering in adoption), our procedure may yield unsatisfactory results. For instance, if
early adopters tend to be friends with each other then shuffling adoption dates among all adopters
would assing late adoption dates for early adopters’ friends, not reflecting the original data. Thus,
one way to improve our procedure is to shuffle adoption dates among friends of users that adopt
in the same period (or sufficiently close). This restriction controls for the heterogeneity across
consumers, in terms of early adopters vs laggards, allowing us to better identify social influence.
We proceed to shuffle adoption dates only among friends who adopted within the same week.
If a users’ friend is also friends with someone that adopted in another week, she is assigned to the
partition corresponding to the earlier week. This is not common in our dataset (10 cases only)
because we are using a sample of 10,000 users in a network with 4 million users. This decreases
17
Table 2: Influence estimates for type P and type N products.
Product
Type
Adopters
Original
Coefficient
Empirical
Distribtion
Marginal
Effect
Avg.
Exposure
Periods
Extra
Adoption
P
1126
5.3e-03***
(4.5e-04)
5.1e-03
(7.0e-05)
2.1e-04***
.871
68
124 (11%)
N
534
4.5e-03***
(4.4e-04)
5.7e-03
(2.6e-04)
-1.2e-03***
.131
32
-50 (-9%)
considerably the chance of someone’s friend being also friend with another focal user in our analysis.
Finally, friends of non-adopters are grouped together in a separate partition because there is no
way to assign them to any of the other partitions.
Figure 3 show the results obtained for the type N product, which is qualitatively similar to
the one obtained before without partitioning. Partitioning slightly increased the mean of the
empirical distribution keeping the same standard deviation. This increases both the magnitude
and confidence of our ”negative” influence result. Thus, this is a case in which partitioning allows
not only for controlling an extra potential source of homophily but also for strengthening our
results. Likewise, partitioning in the case of the type P product yields also results similar to the
ones showed before. These results are available upon request.
8
Conclusion
This paper complements the literature on product diffusion by studying how product characteristics
affect diffusion. In particular, we show how the viral characteristics of a product can change the
magnitude and the direction of the effect of social influence. We show that depending on the
design of these viral features, word-of-mouth from friends that have already adopted the product
can accelerate or slow down further adoption. We focus on two types of products that are only
slightly differently designed but that generate very different adoption incentives. We develop a
structural model that highlights this fact.
18
Type N Product
Frequency
90
60
30
0
x
0.0045
0.0050
0.0055
0.0060
Empirical Distribution
Figure 3: Distribution of coefficients for the type N product over 1,000 adoption date shuffles
shuffling only within the same week. The ‘×’ mark represents the coefficient obtained using the
original data. Dashed lines represent 95% confidence intervals.
We use randomization to identify social influence from observational data, and obtain empirical
results in line with the structural model developed. One of the products should be adopted when
enough friends have not yet adopted the product. In this case, the more friends that adopt the
product the more incentive one has to do so. The other product we consider should be adopted
when enough friends have not yet adopted the product. Therefore, in this case, one more friend
adopting the product decreases the incentive one has to adopt it. In line with these arguments, our
empirical results show that in fact the former product exhibits ”positive” social influence whereas
the latter exhibits ”negative” social influence. From another point of view, we observe ”positive”
social influence when benefits accrue only after adoption and we find ”negative” social influence
when benefits accrue even if adoption does not take place. Our theoretical model opens the black
box and pinpoints exactly the mechanism behind ”negative” social influence in the case of the
19
products we study. The emergence of the empirical results on ”negative” social influence may
explain why in some viral settings adoption remains lower than expected.
Finally, our findings have important managerial implications. While firms are increasingly trying
to design products with viral features, we show that not all designs result in widespread adoption.
In this paper we study one particular example of a design that seems to be viral at first glance but
turns out to produce only little adoption.
A
Appendices
The full version of the paper will include three appendices:
A.1 Proof of the conditions for adoption shown in section 3;
A.2 Proof that the randomization procedure yields a lower bound;
A.3 Empirical results for 5 more type N products.
The current version includes only draft versions for some of these appendices:
A.1 [DRAFT] Conditions for Adoption
In this section we detail the model presented in section 3 for a type N product. Derivations are
similar for type P products.
Setup
Consider users i, j, pij for the price user i pays to call user j, cij the number of calls from i to j,
Fi the set of friends of user i, and define di ≡
P
j∈Fi
1. Assume that the utility of calls is quadratic
in the number of calls: acij − bc2ij . ui is utility of user i, ui |AN utility of user i from adopting a
20
product of type N:
ui =
X
uij
j∈Fi
=
X
[a(cij + cji ) − b(cij + cji )2 − pij cij ]
j∈Fi
=
X
[acij + acji − bc2ij − 2bcij cji − bc2ji − pij cij ]
j∈Fi
For now assume that initiated calls (cij ) and received calls (cji ) contribute equally to utility.
The difference between them is that user i does not decide on received calls and does not pay for
them as well. If user i adopts the only difference is that she does not pay for outgoing calls:
ui |AN =
X
[a(cij + cji ) − b(cij + cji )2 ] − f
j∈Fi
Utility maximization with no friends that adopt:
∂
∂u
P i
j∈Fi cij
=
X ∂uij
∂cij
j∈Fi
∂uij
= a − 2bcij − 2bcji − pij = 0
∂cij
⇔ c∗ij =
a − pij
− c∗ji
2b
21
Assume for now that all prices are the same: pij = p, ∀i, j. Then, by symmetry,
c∗ij = c∗ji =
a−p
4b
Therefore,
u∗ij = a
=
a−p
a−p 2
a−p
− b(
) −p
2b
2b
4b
1
a2 − ap
[2a2 − 2ap − a2 + 2ap − p2 − ap + p2 ] =
4b
4b
So, aggregating all the u∗ij :
u∗i =
X
j∈Fi
u∗ij =
X a2 − ap
a2 − ap
= di
4b
4b
j∈Fi
If only user i adopts,
ui |AN =
X
uij |AN − f
j∈Fi
Maximizing,
X ∂uij |AN
∂u |A
Pi N =
∂ j∈Fi cij
∂cij
j∈Fi
∂uij |Ak
= a − 2bcij − 2bcji = 0
∂cij
⇔ c∗ij =
a
− c∗ji
2b
If user j does not adopt, then c∗ji = 0, because user j will maximize her utility only by the
received calls from i. Therefore,
22
u∗ij |AN = a
a
a
a2
− b( )2 =
2b
2b
4b
So, aggregating all the u∗ij |AN :
u∗i |AN =
X
u∗ij |AN − f = di
j∈Fi
a2
−f
4b
So, a user with no friends that have adopted will adopt iff:
u∗i |AN − u∗i > 0
⇔ [di
a2 − ap
a2
− f ] − [di
]>0
4b
4b
⇔ f < di
a2 − ap
a2
− di
4b
4b
⇔f <
di a
di a
a−
(a − p)
4b
4b
⇔f <
di a
di a
a−
(a − p)
4b
4b
⇔f <
di a
p
4b
Adoption with friends that have adopted
Assume now that user i expects that dAi friends adopt and that dN i friends do not adopt (di =
dAi + dN i ). In this case, she adopts iff:
E[u∗i |AN , dAi , dN i ] ≥ E[u∗i |dAi , dN i ]
⇔ di [
⇔
a2
f
a2
a2 − ap
− ] ≥ dAi + dN i
4b di
4b
4b
a2
f
dAi a2 dN i a2 − ap
−
≥
+
4b di
di 4b
di
4b
23
⇔a−
dN i
f 4b
dN i
)a +
(a − p)
≥ (1 −
di a
di
di
⇔a−
f 4b
dN i
dN i
dN i
a+
a−
p
≥a−
di a
di
di
di
⇔
f 4b
dN i
p
≤
di a
di
⇔ dN i ≥ 4
fb
ap
Thus, user i will adoption if there is a minimum number of friends that are not going to adopt.
This threshold increases with the adoption fee, f , and with b, and decreases with the price of calls,
p, and with the marginal utility of calls at cij + cji = 0, a. So, the adoption decision depends on
how many non-adopter friends one believes to have, and not on the amount of friends one has.
A.2 [DRAFT] Randomization Yields a Lower Bounds
In this section we show that for a special class of networks the difference between the average value
of α from the empirical distribution and the coefficient obtained using the original data is a lower
bound for the magnitude of the effect of influence.
Assume two partitions of a network, A and B, in which users are randomly connected. Furthermore, assume that there are no edges connecting users in A with users in B. This network structure
could be considered as two separate networks, A and B. Assume that in partition A adoption is
random. Every user in this partition adopts with probability p. Assume that in partition B there
is no adoption. By construction there is no influence in this model, i.e., adoption is not determined
by the number of neighbors that have adopted. The OLS estimator for peer influence, α, in the
model
yi = δ + αai + εi ,
where ai represents the number of neighbors that also adopt, is
24
(3)
α̂ =
ay − āȳ
a2 − ā2
Let NA and NB represent the number of users in partition A and B, respectively.Taking expectations we obtain an expression for the expected value of the OLS estimate for α, E[α̂]:
E[α̂] = p ·
NA
NA +NB EA [a]
NA
2
NA +NB (EA [a])
EA [a|y = 1] −
EA [a2 ] −
(4)
Note that expectations are only over partition A, so they are not affected by the number of
individuals in partition B.
Under the null hypothesis of no influence EA [a|y = 1] = EA [a|y = 0] = EA [a] so,
Ẽ[α̂] = p ·
NB
NA +NB EA [a]
A
EA [a]2
EA [a2 ] − NAN+N
B
(5)
This expected value increases with the proportion of adopters, p, with the relative size of partition B and with the average number of neighbors that had already adopted. It decreases with the
variance of the latter.
Assuming now that influence plays a role in partition A and that E[a|y = 1] 6= E[a|y = 0] we
get
E[α̂] = p ·
NA
NA +NB ]EA [a]
NA
2
NA +NB EA [a]
EA [a|y = 1] + [1 − 1 −
= p(1 − p) ·
EA [a2 ] −
EA [a|y = 1] − EA [a|y = 0]
+ Ẽ[α̂]
B
2
V arA [a] + NAN+N
E
[a]
A
B
(6)
The first term corresponds to the difference in expectations normalized by the percentage of
adopters, by the variance in the number of neighbors that have adopted, by the relative size of
25
partition B and EA [a]2 . The second term corresponds to the bias introduced by partition B in the
no-influence scenario, (Ẽ[α̂]).
If we can identify partitions A and B then we can correct for heterogeneity and get a consistent
estimate for the effect of influence. However, if we cannot identify these partitions we cannot
distinguish between homophiliy and influence with a simple OLS regression. Note that for the
influence scenario the expected value of the parameter we are trying to estimate, E[α], corresponds
to E[α̂] when there is no partition B:
E[α] = p(1 − p) ·
EA [a|y = 1] − EA [a|y = 0]
V arA [a]
We can, however, obtain an estimate for the bias introduced by heterogeneity (partition B) by
shuffling the data. By shuffling adoption dates repeatedly we get an empirical distribution for α
under the no-influence hypothesis. The expected value of this coefficient is Ẽ[α̂].
Thus, by subtracting the expected bias, Ẽ[α̂], from the OLS estimator, E[α̂], we get:
E[α̂] − Ẽ[α̂] = p(1 − p) ·
EA [a|y = 1] − EA [a|y = 0]
B
V arA [a] + NAN+N
EA [a]2
B
This quantity has always the same sign as E[α] and is always smaller, in magnitude, than E[α].
Therefore,
|E[α̂] − Ẽ[α̂]| ≤ |E[α]|
Thus, the difference between the coefficient obtained from running the regression in Equation
3 with the original data and the average value of α obtained from the empirical distribution,
E[α̂] − Ẽ[α̂], provides a lower bound for the magnitude of the role of peer influence in adoption.
Given our assumptions, this result applies only to very specific types of networks, where (1)
26
there are no adopters in partition B, and (2) there are no edges between users in different partitions. We believe, however, that these two assumptions can be relaxed in order to produce a more
general result. The first assumption can be relaxed by including a parameter for the probability
of adoption of users in partition B, while the second assumption can also be relaxed by including
a parameter for the probability of ties between users in the two partitions. We plan to relax these
assumptions by the time of the conference and merge them into one general solution.
A.3 [DRAFT] Empirical Results for Other type N products
Results for additional 5 type N products:
Product
Type
Adopters
Original
Coefficient
Empirical
Distribtion
Marginal
Effect
Avg.
Exposure
Periods
Extra
Adoption
(1)
357
.105
31
-22 (-6%)
341
-8.0e-4***
.054
57
-29 (-9%)
(3)
299
-6.0e-5
.085
27
(4)
205
-2.5e-4**
.076
28
(5)
110
.0036
(1.6e-04)
.0037
(2.0e-04)
.0032
(1.6e-04)
.0018
(1.4e-04)
.0011
(2.1e-04)
-7.0e-4***
(2)
.0029***
(4.0e-04)
.0028***
(5.0e-04)
.0032***
(4.7e-04)
.0015***
(4.0e-04)
.0015***
(4.1e-04)
-3.6e-4
.038
37
-6 (-3%)
Table 3: Influence estimates for type I and type II products.
References
Aris Anagnostopoulos, Ravi Kumar, and Mohammad Mahdian. Influence and correlation in social
networks. In KDD ’08: Proceeding of the 14th ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 7–15, New York, NY, USA, 2008. ACM.
S. Aral and D. Walker. Creating social contagion through viral product design: A randomized trial
of peer influence in networks. Management Science, 57(9):1623–1639, 2011.
S. Aral, L. Muchnik, and A. Sundararajan.
Distinguishing influence-based contagion from
27
homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51):21544, 2009.
M. Bampo, M.T. Ewing, D.R. Mather, D. Stewart, and M. Wallace. The effects of the social
structure of digital networks on viral marketing performance. Information Systems Research,
19(3):273, 2008.
F.M. Bass. A New Product Growth for Model Consumer Durables. Management Science, 15(5):215,
1969.
Rodrigo Belo and Pedro Ferreira. Using Randomization Methods to Identify Social Influence in
Mobile Networks. In The Fourth IEEE International Conference on Social Computing, SocialCom
2012, 2012.
Andrea Galeotti, Sanjeev Goyal, Matthew O Jackson, Fernando Vega-Redondo, and Leeat Yariv.
Network games. The review of economic studies, 77(1):218–244, 2010.
Miguel Godinho de Matos, Pedro A Ferreira, and David Krackhardt. Peer influence and homophily
in the diffusion of the iphone 3g in a very large social network. In Privacy, Security, Risk and
Trust (PASSAT), 2012 International Conference on and 2012 International Confernece on Social
Computing (SocialCom), pages 134–143. IEEE, 2012.
S. Goel, D.J. Watts, and D.G. Goldstein. The structure of online diffusion networks. In 13th ACM
Conference on Electronic Commerce, 2012.
J. Goldenberg, B. Libai, and E. Muller. The chilling effects of network externalities. International
Journal of Research in Marketing, 27(1):4–15, 2010.
M. Granovetter. Threshold models of collective behavior. American journal of sociology, pages
1420–1443, 1978.
28
Jack Hirshleifer. From weakest-link to best-shot: The voluntary provision of public goods. Public
Choice, 41(3):371–386, 1983.
Matthew Jackson and Yves Zenou. Games on networks. Handbook of Game Theory, 4, 2012.
Michael Kearns, Michael L Littman, and Satinder Singh. Graphical models for game theory. In
Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, pages 253–260.
Morgan Kaufmann Publishers Inc., 2001.
WO Kermack and AG McKendrick. A contribution to the mathematical theory of epidemics.
Proceedings of the Royal Society of London. Series A, 115(772):700–721, 1927.
T. La Fond and J. Neville. Randomization tests for distinguishing social influence and homophily
effects. In Proceedings of the 19th international conference on World wide web, pages 601–610.
ACM, 2010.
L. Ma, R. Krishnan, and A. Montgomery. Homophily or Influence? An Empirical Analysis of
Purchase within a Social Network, 2009.
E.W. Noreen. Computer Intensive Methods for Testing Hypothesis- An Introduction. JOHN
WILEY & SONS, (229), 1989.
R. Peres, E. Muller, and V. Mahajan. Innovation diffusion and new product growth models: A
critical review and research directions. International Journal of Research in Marketing, 27(2):91–
106, 2010.
Everett M. Rogers. Diffusion of Innovations, 5th Edition. Free Press, August 2003.
C.R. Shalizi and A.C. Thomas. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2):211–239, 2011.
29
J. Sun and J. Tang. A survey of models and algorithms for social influence analysis. Social Network
Data Analytics, pages 177–214, 2011.
Arun Sundararajan. Local network effects and complex network structure. The BE Journal of
Theoretical Economics, 7(1), 2007.
C. Tucker. Identifying formal and informal influence in technology adoption with network externalities. Management Science, 54(12):2024, 2008.
C. Van den Bulte and Y.V. Joshi. New product diffusion with influentials and imitators. Marketing
Science, 26(3):400–421, 2007.
D.J. Watts and P.S. Dodds. Influentials, networks, and public opinion formation. Journal of
Consumer Research, 34(4):441–458, 2007.
30
Download