AAAI Technical Report FS-12-08
Social Networks and Social Contagion
Location-Based Social Network Users Through a Lense:
Examining Temporal User Patterns
Konstantinos Pelechrinis and Prashant Krishnamurthy
School of Information Sciences
University of Pittsburgh
{kpele, prashk}@pitt.edu
Abstract
There has been a rapid proliferation of location-based social networks (LBSNs) in recent years. The spatial component of these systems provides a rich source of information that can be exploited by a number of novel services. However, to better design such services, it is important to understand the way people make use of these platforms and how this usage changes over time. While there exist studies that examine the motivations of people for adopting LBSNs and the temporal dynamics of these motivations, they are based on interviews and are mostly qualitative. Moreover, motivations can only indirectly reveal or help us infer user behavior. In this paper, we analyze data from two commercial LBSNs to examine the temporal evolution of usage patterns and to see what the data on their own reveal. We find that users of the two social networks that we examined increase their level of activity as they use the system. However, depending on the main purpose of the underlying LBSN, users may exhibit different behaviors over time. We believe that our findings can open new directions and stimulate further research in areas such as location prediction and its applications (e.g., urban and transportation planning and location-based advertisement).
Keywords: Location-based Social Networks, Temporal Evolution, Mining of Social Networks
Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
1 Introduction
Advancements in mobile handheld devices during the last several years and their increased capability to accurately estimate (and report) their position have brought another dimension to the already popular digital social media arena, that of location. People that are connected through these media not only have social ties (e.g., friendship) and/or interests in common (e.g., hiking), but they are also connected with regards to their location. In other words, location-based social networks bond the online, virtual social ties with those in the real world through location information of users. This connection can drive a number of novel, convenient and appealing services. People can track their children or their friends. Applications related to location coordination for scheduled meetings can be enabled. Others can explore new places to visit through a list of venues that are around their location. Such a list can be accompanied by tips and recommendations from people that have visited these places before. Even simply the number of people that have been at a locale or venue in the past or are present at the moment might be informative for making decisions.
An LBSN has two distinct components, a social and a spatial one. The social part of the system resembles any other existing online social network, where friendships are declared and people can interact with the “connected” people. What differentiates LBSNs from other online social networks (OSNs) is the type of interactions that are feasible between the members of the network. The main feature of this interaction is location sharing, which comprises the spatial component of the system. This is essentially a timestamped location log of places a user has visited. Location sharing can be realized either through continuous tracking, in the form of a temporal latitude/longitude trajectory (e.g., Loopt or Google Latitude), or via “check-ins”, where users voluntarily announce their presence in a place or venue at their convenience (e.g., Gowalla, Brightkite, Foursquare etc.). Each representation has its advantages and disadvantages. The second approach, where location is tagged with semantic information, as compared to a flat geographic trajectory, offers a richer set of information, but with coarse location granularity.
The reasons why users adopt LBSNs may vary. Commercial LBSNs can offer Groupon-like deals based on location, providing monetary incentives for someone to adopt their LBSN, while the gaming aspects of LBSNs form another important motivation for people to adopt their usage (Lindqvist et al. 2011). The LBSN service itself may have different “scopes”. The primary purpose of Gowalla, for example, was creating city guides, while Foursquare has focused on the social aspects, coupons and gaming.
We wish to observe the ways in which users may alter their behavior over time with regards to the usage of an LBSN. For understanding this temporal evolution of users’ behaviors, in this work, we consider systems in which spatial information is created via time-stamped check-ins. For instance, if Jack starts with primarily an interest in rewards and coupons offered by specific locales/venues, he might use the network only when he is at these venues. If he then gets interested in the gaming aspects of the system over time, he might start increasing his activity at other places as well. This will manifest as an increase in the rate of visits and/or the number of different places he has visited over time.
In this paper, we analyze data from two commercial LBSNs (Gowalla and Brightkite) to answer the following general question: “What are the temporal dynamics of LBSN users’ patterns?” Our main findings can be summarized as follows:
• Our results show that people are slow adopters. They start using the systems more often once they become familiar with them.
• The general “scope” of the LBSN under examination affects the temporal dynamics of the usage patterns.
Scope of our study: Spatial information has been identified as a critical factor for the success of many functionalities, such as local search (Teevan et al. 2011). Our work can be mainly viewed as a study of the temporal dynamics of LBSN users’ location sharing patterns. However, it can also provide guidelines on how functionalities that exploit location information can be tuned in a more efficient and perhaps accurate way with a better understanding of what the data means. For instance, one of our findings shows that users have a transient initial period, where they are hesitant to use an LBSN and share their location. A spatio-temporal prediction algorithm, which can drive applications such as localized targeted advertising, needs to take such dynamics into consideration. We will further discuss the implications of our findings later in the paper.
The rest of the paper is organized as follows. Section 2 discusses existing literature related to our study. Section 3 introduces the datasets we used to perform our analysis, while Section 4 presents the analysis of these data and our results. Finally, Section 5 elaborates on the scope of our work and possible applications of our findings, while Section 6 concludes our work.
2 Related Studies
In this section we will briefly discuss related studies and further differentiate our work.
Motivations for adopting LBSNs
There is a line of literature which examines specifically the reasons behind people using LBSNs. These studies mainly utilize user surveys in order to obtain their results. For instance, Lindqvist et al. (Lindqvist et al. 2011) recruit Foursquare users in order to identify their reasons behind using this system. They describe the motivations behind adoption of Foursquare and the continuation of its usage. They find that these motivations change over time. Users, in their initial stages, employ Foursquare for fun (the gaming aspect of the system is initially important). Over time, users are mainly interested in keeping track of places they have visited. They also observe specific bimodal distributions related to specific places where users check in. For instance, there are users that either check in at their homes all the time or do not at all. Surprisingly, this study reveals that privacy concerns are not as crucial as one might have expected. Li and Chen (Li & Chen 2010) interview people specifically with regards to their privacy concerns. They found that privacy concerns increase with the age of the users as well as with an increased activity in the system. Furthermore, Cramer et al. (Cramer, Rost, & Holmquist 2011), through another set of surveys and interviews, identify emerging but also conflicting norms on why people might not check in. Concerns related to “spamming” their friends with useless information and the way the people around them perceive the action of a check-in were the most cited reasons for not sharing locations, while support for a friend’s business forms an important reason to check in at a locale.
Understanding the motivations behind the usage of LBSNs can help in explaining the LBSN usage pattern data. However, surveys are limited by their small scale. Furthermore, they cannot quantitatively capture the behaviors of users, since they are mainly comprised of qualitative questions and answers. To the best of our knowledge this is the first large-scale study on the temporal dynamics of LBSN usage analyzing real data obtained from commercial networks.
Mining the structure of LBSNs
There is a set of studies that examines the structural properties of existing LBSNs. Here, structure refers not only to the properties of the social network graph (as in traditional OSNs) but also to the location component (e.g., physical distance to friends, time and type of check-ins etc.). For instance, Cheng et al. (Cheng et al. 2011) use data from Foursquare to examine (i) the spatio-temporal properties of users’ check-ins as well as (ii) their mobility patterns. One interesting finding is that a user’s social status, together with geographic and economic factors, is coupled with his mobility. Similarly, Noulas et al. (Noulas et al. 2011) study the geographical properties of users’ activities as captured through the type of places they visit and their transitions. They use this study for identifying universal features for human urban mobility (Noulas et al. 2012). Using data from Foursquare, they show that mobility does not exhibit a universal behavior when examined in terms of pure distance between two stay points. However, the density of the areas, and as a consequence the intervening opportunities (Stouffer 1940), dictates the mobility of people in urban areas through a fairly universal law. Scellato et al. (Scellato et al. 2011) try to identify the relation between friendship and distance using data from 3 different LBSNs (Gowalla, Foursquare and Brightkite). They find that the socio-spatial structure of these systems cannot be explained by only geographic factors or only social mechanisms. Li and Chen (Li & Chen 2009) analyze data from Brightkite and, after providing the structural properties of the underlying social graph, they try to identify correlations between different users’ profile features, activity updates, and mobility patterns. Scellato and Mascolo (Scellato & Mascolo 2011) show that the existence (or not) of heavy tails in the distributions of various activity attributes of LBSN users (e.g., degree distribution, check-in distribution etc.) is tightly connected with the users’ account age and activity span.
We would like to emphasize that our study is complementary to the above efforts. While we also analyze data from LBSNs, our goal is to study the temporal behavior patterns rather than the social and spatial network structure that is usually captured and analyzed from a (static) network snapshot.
3 LBSN Datasets
In this section we briefly describe the datasets we used for our analysis. These data have been made available by Cho et al. (Cho, Myers, & Leskovec 2011) and were obtained from two commercial LBSNs, namely Gowalla and Brightkite.
Gowalla dataset: The dataset consists of 6,442,892 public check-ins performed by 196,591 Gowalla users in 647,923 distinct places, during the period between February 2009 and October 2010. Every check-in log entry is a tuple of the form <User ID, Time, Latitude, Longitude, Venue ID>. Gowalla users also participate in a friendship network with reciprocal relations, which consists of 950,327 links.
Brightkite dataset: The dataset consists of 4,491,143 public check-ins performed by 58,228 Brightkite users in 772,966 distinct places, during the period between April 2008 and October 2010. The check-in information is in exactly the same format as above. Brightkite users also participate in a friendship network, which consists of 214,078 links¹.
The notion of time in our study is captured through the
check-in count of a user rather than an absolute value. In
other words, we examine the behavior of a user’s activity
as measured through the number of check-ins. For instance,
when we refer to time t = 5, we essentially refer to the 5th
check-in of a user. In this way we have a notion of relative
time (with respect to his first presence in the system). Hence,
even though the 5th check-in of Jack happened at a different
(absolute) time as compared to the 5th check-in of Bob, the
relative time captures the 5th interaction of both Jack and
Bob with the network. In this way, we are able to identify
any existing patterns and express the temporal dynamics of
an LBSN user’s behavior on average. To summarize, the notion
of time in our study is event (i.e., check-in) driven.
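The event-driven notion of time described above can be illustrated with a minimal sketch. It assumes the check-in tuples follow the <User ID, Time, Latitude, Longitude, Venue ID> layout from the dataset description; the function names and the toy records are hypothetical and are not part of the original study.

```python
from collections import defaultdict


def index_by_order(checkins):
    """Map each user to their check-ins sorted by timestamp.

    `checkins` is an iterable of (user_id, time, lat, lon, venue_id)
    tuples. Position k-1 in a user's list is that user's k-th check-in,
    i.e. relative time t = k.
    """
    per_user = defaultdict(list)
    for record in checkins:
        per_user[record[0]].append(record)
    for records in per_user.values():
        records.sort(key=lambda r: r[1])  # oldest check-in first
    return dict(per_user)


def kth_checkin(per_user, user_id, k):
    """Return the k-th check-in (1-based) of a user, or None if absent."""
    records = per_user.get(user_id, [])
    return records[k - 1] if 1 <= k <= len(records) else None


# Toy records (hypothetical values, not taken from the datasets).
toy = [
    ("jack", 300, 40.44, -79.99, "v1"),
    ("bob", 900, 40.45, -79.95, "v2"),
    ("jack", 100, 40.44, -79.99, "v3"),
    ("bob", 950, 40.45, -79.95, "v2"),
]
by_user = index_by_order(toy)
print(kth_checkin(by_user, "jack", 1))
print(kth_checkin(by_user, "bob", 1))
```

In this toy example the first check-ins of Jack and Bob happen at different absolute times (100 and 900), yet both are treated as relative time t = 1.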
Given that we are interested in this relative notion of time,
we calculate the cumulative distribution function of the number of check-ins for the users of the two datasets described
above. Figure 1 presents our results. We can observe that the
majority of the users have less than 1000 check-ins, while
only a very small percentage of them (around 5%) has over
1000 check-ins. In what follows, we will present the average
temporal dynamics of LBSN users as calculated over (i) the
entire user population, (ii) the set of users with more than
1000 check-ins and (iii) the set of users with less than 1000
check-ins. The reason behind this distinction is to examine
whether there exist intrinsic differences in the average behaviors of users with different total activity in the system.
We will elaborate more on this when we present our results
in the next section.
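As a rough illustration of the distribution in Figure 1 and of the split into activity groups, the following sketch builds an empirical CDF over per-user check-in totals. The helper names, the toy counts and the treatment of users with exactly 1000 check-ins are assumptions made only for this example.

```python
from bisect import bisect_right


def make_ecdf(samples):
    """Return F where F(x) is the fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def split_by_activity(totals, threshold=1000):
    """Split {user: total check-ins} into (lighter, heavier) user sets.

    Users with exactly `threshold` check-ins go to the lighter set; the
    text does not say how such ties are handled, so this is an assumption.
    """
    light = {user for user, count in totals.items() if count <= threshold}
    return light, set(totals) - light


totals = {"u1": 3, "u2": 10, "u3": 50, "u4": 1200}  # hypothetical counts
F = make_ecdf(totals.values())
light, heavy = split_by_activity(totals)
print(F(1000), sorted(heavy))
```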
[Figure 1 plot omitted: ECDF (y-axis) of the number of total check-ins per user (x-axis), one curve each for Gowalla and Brightkite.]
Figure 1: The majority of the users in our datasets have less than 1000 check-ins.
4 Temporal Behavior of Users
In this section we present the analysis of the two aforementioned datasets. In particular, we examine the following metrics with respect to the check-in count of the user (i.e., the relative time mentioned above): (i) the inter-checkin time, that is, the time that has elapsed between two consecutive check-in events, (ii) the number of unique venues visited by a user, (iii) the entropy of a user, which captures his diversity with regards to the places visited, and (iv) the inter-checkin distance, that is, the geographical distance between the venues of two consecutive check-ins.
Inter-checkin Time
In a check-in-based LBSN, users declare their presence at a location in a voluntary fashion. This means that the check-in history log of Jack does not necessarily include all the venues he has visited over this period of time. The reasons for not sharing a location can vary between users and can also be related to the stage of system adoption (e.g., those that have just started versus established adopters). We capture the change in users’ behavior in terms of the frequency of location sharing through the time lag between two consecutive check-ins for all users. Figure 2 presents the average inter-checkin time with respect to the order² of this check-in pair for both of the datasets. In other words, the average inter-checkin time of order i is computed as the mean of the time elapsed between check-ins i and i + 1 over all users. As we can see, users initially appear to be hesitant to use the LBSN system, but as time elapses (higher check-in order) they become more familiar with it and appear increasingly willing to make use of it.
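The per-order average described above can be sketched as follows. This is a minimal illustration that assumes timestamps in seconds and uses hypothetical function names and toy data; it also reports the median, which the later discussion uses as a more robust summary.

```python
from collections import defaultdict
from statistics import mean, median


def inter_checkin_times(times_by_user):
    """Group time gaps by check-in order.

    `times_by_user` maps a user id to a list of check-in timestamps in
    seconds. gaps[i] collects, over all users with at least i + 1
    check-ins, the time elapsed between check-ins i and i + 1 (1-based).
    """
    gaps = defaultdict(list)
    for times in times_by_user.values():
        ordered = sorted(times)
        for i in range(1, len(ordered)):
            gaps[i].append(ordered[i] - ordered[i - 1])
    return gaps


def summarize(gaps):
    """Return {order: (mean gap, median gap)}."""
    return {i: (mean(v), median(v)) for i, v in sorted(gaps.items())}


# Toy timestamps (hypothetical, in seconds since each user's first check-in).
toy = {
    "jack": [0, 1000, 1400, 1500],
    "bob": [0, 3000, 3200],
    "eve": [0, 200],
}
summary = summarize(inter_checkin_times(toy))
for order, (avg, med) in summary.items():
    print(order, avg, med)
```

Note that fewer users contribute to higher orders (only one user reaches order 3 in the toy data), which is the same small-sample effect the text reports for check-in counts above 2000.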
Next we consider the average inter-checkin time for the three different cases explained in Section 3: all users, users with less than 1000 check-ins and users with more than 1000 check-ins. While the trend is the same for every “sub-dataset”, users that have more than 1000 check-ins in total are more likely to use the system from the very beginning. Note also that for the larger check-in counts (more than 2000) there is no apparent trend, due to the small number of samples (only a few users have check-ins of order higher than 2000).
¹ The friendships are originally asymmetric in Brightkite.
² We will use the terms order and count interchangeably in the rest of the paper.
[Figure 2 plots omitted. Panels: (a) All users; (b) Users with less than 1000 checkins; (c) Users with more than 1000 checkins. Each panel plots the mean and median inter-checkin time (sec) against the checkin count.]
Figure 2: LBSN users become more familiar on average with the system as they use it (Gowalla: bottom figure, Brightkite: top
figure).
While the mean value of the inter-checkin time can describe the average behavior of the users, we further examine the distribution of the inter-checkin time for a given check-in order. Figure 3 presents the cumulative distribution function of the inter-checkin time for the 100th, 500th and 1000th order check-in of all users for both datasets. That is, we consider the inter-checkin time between the 99th and 100th check-ins for all users and so on, and determine the CDF. While the exact shape of the distribution for the two LBSNs is different, both of them exhibit a long tail. This means that extremely large values of inter-checkin time have a non-zero probability. As we can further observe, the distribution has a tail that is heavier for smaller check-in counts for both datasets. This means that the diversity among users is much higher during the early adoption phase.
Given that the distribution of the inter-checkin time for a given check-in count exhibits a heavy tail, the mean value might not be a robust metric for capturing the users’ dynamics. Thus, we have also plotted the median inter-checkin time in Figure 2. Even though the change in the median inter-checkin time is less pronounced compared to the mean, the trend is still similar (especially for the Gowalla users). Hence, despite the clear diversity among the users with regards to the time elapsed between two consecutive check-ins, the inter-checkin time is reduced on average with an increase in the user’s activity (i.e., with the increase in the check-in count).
Unique Places Visited
Next we examine the behavior of users with regards to the
locales/venues/places they visit. In particular, we want to examine how many new places people visit as they increase
their activity and use the LBSN system more. In other words,
do users tend to socialize in a small, closed set of venues or
do they tend to visit new places?
We have again used both of our datasets and calculated the average number of distinct venues that users have visited after x check-ins. Figure 4(a) depicts the results obtained when using the whole dataset. Note that the results obtained when considering only users with less than 1000 check-ins (Figure 4(b)) and users with more than 1000 check-ins (Figure 4(c)) are similar, and hence we will discuss only the ones acquired from the whole dataset. We observe that there is a linear relation between the number of unique venues visited and the check-in count for both Gowalla and Brightkite. However, the slope of the least-squares linear fit to the data is very different for the two datasets. In particular, for Gowalla, the slope is equal to 0.65 (with an R² value of 0.97), while for Brightkite the slope is much smaller, 0.085 (R² = 0.98). Essentially, this means that every check-in a Gowalla (Brightkite) user performs has, on average, a probability of 0.65 (0.085) of being at a venue not previously seen in his history log of locations. Clearly, these are two very different kinds of behaviors of users across the datasets.
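To make the slope interpretation concrete, the sketch below computes the unique-venue curve per user, averages it over users, and fits an ordinary least-squares line. The function names and the two toy users are hypothetical; the fitting routine is a plain textbook least-squares fit and is not necessarily the exact procedure used for Figure 4.

```python
def unique_venue_curve(venue_sequence):
    """Number of distinct venues seen after each check-in of one user."""
    seen, curve = set(), []
    for venue in venue_sequence:
        seen.add(venue)
        curve.append(len(seen))
    return curve


def average_curve(curves):
    """Average the per-user curves at every check-in count that occurs."""
    longest = max(len(c) for c in curves)
    averaged = []
    for k in range(longest):
        values = [c[k] for c in curves if len(c) > k]
        averaged.append(sum(values) / len(values))
    return averaged


def least_squares_line(ys):
    """Fit y = a + b*x for x = 1..len(ys); return (intercept, slope)."""
    n = len(ys)
    xs = range(1, n + 1)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Two hypothetical users: one always checks in somewhere new, one never does.
explorer = unique_venue_curve(["a", "b", "c", "d"])
regular = unique_venue_curve(["a", "a", "a", "a"])
intercept, slope = least_squares_line(average_curve([explorer, regular]))
print(intercept, slope)
```

In this toy case half of all check-ins after the first land at a new venue, and the fitted slope of 0.5 reflects exactly that fraction.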
In order to understand the reasons behind the different dynamics between the users of the two LBSNs with regards to the unique places visited, recall that the check-ins of a user do not reveal all of the actual places that they have been. They capture only the places that users are willing to share with the system. For instance, Jack might be willing to share his location when he is at work (e.g., check-in at his office), but he might not be eager to do so with regards to the place where he has had lunch (e.g., he does not want to share his presence in a fast-food joint for self-representation reasons). In general, there are many reasons that have been cited behind a user not sharing his location in LBSNs (Lindqvist et al. 2011) (Cramer, Rost, & Holmquist 2011). Even though the reasons behind such a behavior are not the focus of our study, they can partially help us understand the difference in the two slopes observed.
[Figure 3 plots omitted. Panels: (a) Brightkite; (b) Gowalla. Each panel plots the ECDF of the inter-checkin time (sec) on logarithmic axes for the 100th, 500th and 1000th check-in.]
Figure 3: The CDF of the inter-checkin times is skewed, especially during the early stages of usage (i.e., smaller checkin count).
[Figure 4 plots omitted. Panels: (a) All users; (b) Users with less than 1000 checkins; (c) Users with more than 1000 checkins. Each panel plots the unique venues visited against the checkin count, together with a least-squares line. Fits printed in the panels: y = 29.43 + 0.085x (R² = 0.98); y = −16.9 + 0.65x (R² = 0.97); y = 0.15x + 13.49 (R² = 0.96); y = 0.63x + 10.33 (R² = 0.96); y = 0.093x + 15.13 (R² = 0.98); y = 22.48 + 0.486x (R² = 0.97).]
Figure 4: Gowalla users (bottom figure) tend to share their locations when they visit a new place, while Brightkite users (top figure) remain active within a small, slowly increasing, group of venues.
An important factor that can affect the sharing attitudes of people is related to the objective and the nature of the underlying system, that is, its main application and purpose (Tang et al. 2010). Gowalla evolved to become a city guide application. People that visit a city for the first time could make use of the check-ins of Gowalla users (and possibly textual comments accompanying them) and explore locales in this new environment. Hence, Gowalla users may be tempted (perhaps even encouraged) to check in at new spots, in order to provide a more comprehensive guide of their city, which can explain the large slope of the linear fit in Figure 4.
On the contrary, Brightkite did not have a similar objective and it was mainly a socially driven application. It aimed at connecting people in the physical world through location sharing. Hence, Brightkite users have all of the privacy concerns that have been identified in the literature, causing them to be more skeptical when sharing their presence, which can account for the much smaller slope of the corresponding line.
We further examine the distribution of the unique venues among the users for a given check-in count. Again, we consider the 100th, 500th and 1000th order check-in of all users and we compute the CDF of their unique venues up to that point. The results are presented in Figure 5. As we can see, these distributions do not exhibit long tails like the ones for the inter-checkin times (Figure 3); they are closer to a uniform distribution. In addition, the number of distinct places visited is upper bounded by the number of total check-ins of a user. Therefore, the mean value calculated forms a robust statistic.
Based on the above results we can conclude that the actual
application that an LBSN is targeting can have implications
on the temporal behavior dynamics of the users.
[Figure 5 plots omitted. Panels: (a) Brightkite; (b) Gowalla. Each panel plots the ECDF of the unique venues visited on logarithmic axes for the 100th, 500th and 1000th check-in.]
Figure 5: The number of unique venues for a given check-in
count is close to uniformly distributed.
User Entropy
Following the above analyses, we examine a user’s entropy. In particular, the “entropy” of a user captures the diversity of a user with regards to the places he has visited. However, it does not only consider the number of distinct locations visited by him, but it also takes into account the frequency of these visits. The definition presented in what follows is based on similar definitions by Cranshaw et al. (Cranshaw et al. 2010).
Let us assume that $L_u$ is a set containing all the locations shared by user $u$. Furthermore, let $P_l(u)$, $l \in L_u$, be the fraction of check-ins of user $u$ that happened in location $l$. Then the entropy $e_u$ of user $u$ is defined as:
$$e_u = -\sum_{l \in L_u} P_l(u) \cdot \log(P_l(u)) \qquad (1)$$
From the above equation, we can notice that when the user visits many places in fairly equal proportions, his entropy will be large. On the contrary, when most of his activity is restricted to a few locales only, his entropy will be low. In other words, a diverse (non-diverse) user with respect to the places he visits will exhibit high (low) entropy.
[Figure 6 plots omitted. Panels: (a) All users; (b) Users with less than 1000 check-ins; (c) Users with more than 1000 check-ins. Each panel plots the user entropy (mean and median over users) against the checkin count.]
Figure 6: Gowalla users (bottom figures) spread their activity equally across a much larger number of venues resulting in a higher entropy as compared to the more stable, low entropy of Brightkite users (top figures).
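The entropy definition in Equation (1) can be sketched in a few lines. The text does not state the base of the logarithm, so the natural logarithm used here is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def user_entropy(venues):
    """Entropy of one user's check-in history, following Equation (1).

    `venues` is the sequence of venue ids the user checked in at. The
    natural logarithm is an assumption; the source leaves the base open.
    """
    counts = Counter(venues)
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())


def entropy_by_order(venues):
    """Entropy after the 1st, 2nd, ..., n-th check-in of one user."""
    return [user_entropy(venues[:k]) for k in range(1, len(venues) + 1)]


even = user_entropy(["a", "b", "c", "d"])    # four venues, equal shares
skewed = user_entropy(["a", "a", "a", "b"])  # activity concentrated on "a"
single = user_entropy(["a"] * 5)             # one venue only
print(even, skewed, single)
```

With equal shares over four venues the entropy equals log(4), the maximum for four venues, while a user who always checks in at the same venue has zero entropy.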
Using the two datasets, we have calculated the average user entropy as a function of the check-in order. The results are presented in Figure 6. Brightkite users exhibit much lower (average) entropy as compared to the Gowalla users. Recall that the latter have a much larger number of unique venues, which means that they “distribute” their activity over more places, exhibiting higher diversity. It is also interesting to observe that the entropy of a Brightkite user stabilizes fairly quickly, after only a few check-ins. On the contrary, the entropy of a Gowalla user slowly increases as the check-in count increases. This is in alignment with the results obtained for the unique venues visited by a user (Figure 4).
We have further computed the distribution of the entropy of all the users for a given check-in count (100th, 500th and 1000th check-in). Figure 7 presents the results. The distribution (especially that of Brightkite users) exhibits a slight negative skew. Even though this small left tail should not significantly affect the robustness of the mean value for the user entropy, we have further calculated the median user entropy as a function of the check-in count (Figure 6). We notice that the median aligns with the results for the mean value. For Brightkite users, the median value of the entropy is slightly larger than the average value, as expected, but it is still much smaller than that of Gowalla users.
To summarize, the above results for the user entropy verify and further support our previous claim, that the actual
purpose of the underlying LBSN has an implication on the
temporal behavior of its users.
Inter-checkin Distance
Finally, we examine the temporal dynamics of LBSN users
from the perspective of their geographical properties. In
particular, we calculate the distance between two consecutive check-ins of the users as a function of the check-in
count. Figure 8 presents our results. Consecutive check-ins
of Gowalla users are contained within a small distance, regardless of the check-in count. This might again be explained if we consider the purpose of the system. Users that
aim at creating a type of city guide will tend to check-in at
many places, which most probably will be located very close
to each other.
On the contrary, the average distances between consecutive check-ins for Brightkite users exhibit much larger values and an increasing trend as the activity level increases. The much larger values can be attributed to the fact that Brightkite users might be more tempted to check in at distant places (e.g., places they visit during their travels) that do not belong to their hometown. In addition, as we have observed previously, the number of unique venues shared by these users is small. Thus, the large distance between consecutive check-ins will be retained as there are not many new places added in between.
However, if we examine the probability distribution of
inter-checkin distances for a given check-in order, we see
that the distribution exhibits an extremely heavy tail similar
to that of a power law distribution (Figure 9). Thus, the average value for the inter-checkin distances is not a very robust
metric, and therefore, we also calculate the median value.
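As an illustration of how inter-checkin distances could be computed from the latitude/longitude pairs in the check-in tuples, the sketch below uses the haversine great-circle formula and reports miles. The text does not say which distance formula was used, so the haversine choice, the Earth-radius constant and the function names are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def inter_checkin_distances(points):
    """Distances between consecutive check-ins.

    `points` is a list of (lat, lon) pairs in check-in order; element i
    of the result is the distance between check-ins i + 1 and i + 2.
    """
    return [haversine_miles(*p, *q) for p, q in zip(points, points[1:])]


# Hypothetical trajectory: two check-ins at the same spot, then a far one.
trajectory = [(0.0, 0.0), (0.0, 0.0), (0.0, 90.0)]
distances = inter_checkin_distances(trajectory)
print(distances)
```

Because the resulting distribution is reported to be heavy-tailed, a single long trip such as the last hop in the toy trajectory can dominate the mean, which is why the median is the more informative summary here.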
[Figure 7 plots omitted. Panels: (a) Brightkite; (b) Gowalla. Each panel plots the ECDF of the user entropy on logarithmic axes for the 100th, 500th and 1000th check-in.]
Figure 7: The distribution of the user entropy for a
given check-in count is lightly left-skewed, especially for
Brightkite users.
[Figure 8 plots omitted. Panels: (a) All users; (b) Users with less than 1000 check-ins; (c) Users with more than 1000 check-ins. Each panel plots the mean and median inter-checkin distance (miles) against the checkin count.]
Figure 8: LBSN users manifest a stable behavior with regards to the distances of consecutive places shared (Gowalla: bottom
figure, Brightkite: top figure).
We can see that the median value is very similar for both datasets. In particular, users do not appear to change their behavior with regards to the distances traveled between two back-to-back sharings of location.
In the next section, we will discuss the implications of our data analysis and how we can exploit the observed dynamics.
lized to enhance existing or enable new services focusing
on two example applications: (a) location prediction and (b) friend recommendation.
Location Prediction: Location-based marketing and advertisement has been on the rise during the last years (Bruner & Kumar 2007). It has been primarily reactive, that is, users do not get exposed to offers unless they are present in the location of interest. However, a set of novel, proactive applications can be realized if we are able to predict with high accuracy the future location of a user. For instance, time-limited information about the places to be visited, weather and traffic reports in these areas as well as presentation of reviews for these locales are just some of the possible services to be offered.
The temporal dynamics of users can enable more accurate
location forecasting. For instance, the fact that users change
their level of activity over time indicates that, in general,
they do not follow regular patterns, or at least that there is a transient period (i.e., the initial hesitation phase) that should be
treated with care in a supervised learning predictor. In
addition, using the relation between the number of unique venues visited by a user and the user's check-in count, we can obtain the probability that the next location will be a new venue or one
already visited. Combining this information with
the dynamics of the inter-checkin distances can provide even
higher accuracy in future location estimation. Utilizing these
dynamics, in combination with the social dynamics present
in an LBSN (e.g., friendship relations, user spatial similarity,
etc.), to build an accurate spatial predictor is part of our future
work.
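The new-venue probability mentioned above can be estimated empirically. The sketch below is one possible estimator rather than a predictor used in this work: for each check-in order k, it counts the fraction of users whose k-th check-in was at a venue they had not shared before. The input format (one chronologically ordered list of venue identifiers per user) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def new_venue_probability(histories):
    """Empirical probability that the k-th check-in (1-based) is at a new venue.

    histories: iterable of per-user lists of venue IDs in chronological order
    (hypothetical format). Returns a dict mapping k to the fraction of users
    with at least k check-ins whose k-th check-in was at a venue they had not
    visited before.
    """
    new_counts = defaultdict(int)
    totals = defaultdict(int)
    for venues in histories:
        seen = set()
        for k, venue in enumerate(venues, start=1):
            totals[k] += 1
            if venue not in seen:
                new_counts[k] += 1
                seen.add(venue)
    return {k: new_counts[k] / totals[k] for k in sorted(totals)}


# Toy histories for three users.
histories = [
    ["cafe", "gym", "cafe", "park"],
    ["home", "home", "office"],
    ["bar", "cafe"],
]
probs = new_venue_probability(histories)
```

Here `probs[k]` equals the increase in the mean number of unique venues between check-in k-1 and check-in k among users who reached check-in k, which is how the estimate ties back to the unique-venue curve.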
Friend Recommendation: Social media help their
members find friends with similar interests or connect
with people they already know. Of critical importance
to this service is a friend recommendation engine. In traditional digital social networks, this functionality is based
primarily on the number of common friends and the level of
virtual interactions with these common friends.
However, virtual interactions are usually not a good
indicator of how close two people are in the real world. With the introduction of LBSNs, the spatial logs
of users bridge the virtual and physical worlds and
can be used to provide better friend recommendations. For instance, Scellato et al. (Scellato, Noulas, & Mascolo 2011),
using longitudinal data, show that about 30% of new social
links are created among people that have visited the same
[Plot: ECDF versus inter-checkin distance (miles) on logarithmic axes; legend: 100, 500, 1000. Panels: (a) Brightkite, (b) Gowalla.]
Figure 9: The distribution of the inter-checkin distances for
a given check-in count is positively skewed.
5 Discussion and Future Directions
LBSNs can enable a large number of novel applications. At
the same time, the rich spatial information present in these
systems can enhance existing important functionalities in social media. For instance, Lian and Xie (Lian & Xie 2011)
use the check-ins of similar users to identify
the activities they participate in at the locations of interest.
In this section we will discuss how our findings can be uti-
places. This finding can reduce the search space for new
friendships, while keeping accuracy as high as 66%. Taking the
users’ temporal dynamics into consideration can further improve friend recommendations. We are examining how to integrate the findings presented in this paper
with location-aware friend recommendation engines.
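To illustrate how shared places can shrink the search space, the following minimal sketch restricts candidate friend pairs to users who have visited at least one common venue and counts the shared venues per pair. It is not the method of Scellato et al. nor a recommendation engine; the input format, names, and ranking by shared-venue count are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(visits):
    """Candidate friend pairs restricted to users with a common venue.

    visits: dict mapping a user ID to the set of venue IDs the user has
    checked in at (hypothetical format). Returns a dict mapping each
    (user_a, user_b) pair, with user_a < user_b, to the number of venues
    both users have visited. Pairs with no common venue are omitted.
    """
    # Inverted index: venue -> users who visited it.
    venue_to_users = defaultdict(set)
    for user, venues in visits.items():
        for venue in venues:
            venue_to_users[venue].add(user)

    shared = defaultdict(int)
    for users in venue_to_users.values():
        for a, b in combinations(sorted(users), 2):
            shared[(a, b)] += 1
    return dict(shared)


visits = {
    "alice": {"cafe", "gym", "park"},
    "bob": {"cafe", "park"},
    "carol": {"office"},
    "dave": {"gym"},
}
pairs = candidate_pairs(visits)
# Rank candidates by the number of shared venues, highest first.
ranked = sorted(pairs.items(), key=lambda item: item[1], reverse=True)
```

In this toy input, only two of the six possible user pairs remain as candidates. Temporal features, such as how recently or how often the shared venues were visited, could be added as further ranking signals.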
6
Cramer, H.; Rost, M.; and Holmquist, L. 2011. Performing a check-in: Emerging practices, norms and conflicts in
location-sharing using foursquare. In ACM MobileHCI.
Cranshaw, J.; Toch, E.; Hong, J.; Kittur, A.; and Sadeh, N.
2010. Bridging the gap between physical location and online
social networks. In UBICOMP.
Li, N., and Chen, G. 2009. Analysis of a location-based
social network. In IEEE CSE.
Li, N., and Chen, G. 2010. Sharing location in online social
networks. IEEE Network 24(5):20–25.
Lian, D., and Xie, X. 2011. Collaborative activity recognition via check-in history. In ACM LBSN.
Lindqvist, J.; Cranshaw, J.; Wiese, J.; Hong, J.; and Zimmerman, J. 2011. I'm the mayor of my house: Examining
why people use foursquare - a social-driven location sharing
application. In ACM CHI.
Noulas, A.; Scellato, S.; Mascolo, C.; and Pontil, M. 2011.
An empirical study of geographic user activity patterns in
foursquare. In AAAI ICWSM (poster session).
Noulas, A.; Scellato, S.; Lambiotte, R.; Pontil, M.; and Mascolo, C. 2012. A tale of many cities: universal patterns in
human urban mobility. In PloS ONE (Forthcoming).
Scellato, S., and Mascolo, C. 2011. Measuring user activity
on an online location-based social network. In NetSciCom.
Scellato, S.; Noulas, A.; Lambiotte, R.; and Mascolo, C.
2011. Socio-spatial properties of online location-based social networks. In AAAI ICWSM.
Scellato, S.; Noulas, A.; and Mascolo, C. 2011. Exploiting place features in link prediction on location-based social
networks. In ACM KDD.
Stouffer, S. 1940. Intervening opportunities: A theory relating mobility and distance. In American Sociological Review.
Tang, K.; Lin, J.; Hong, J.; Siewiorek, D.; and Sadeh, N.
2010. Rethinking location sharing: Exploring the implications of social-driven vs. purpose-driven location sharing. In
UBICOMP.
Teevan, J.; Karlson, A.; Amini, S.; Brush, A. B.; and
Krumm, J. 2011. Understanding the importance of location, time, and people in mobile local search behavior. In
ACM MobileHCI.
Conclusions
In this work we have examined the temporal dynamics of
location-based social network users. Analyzing data from
two commercial LBSNs (Brightkite and Gowalla), we have
studied the time intervals and geographical distances between consecutive location sharings, as well as the number of unique places visited/shared and the user entropy,
as a function of the users’ activity level, which is captured through the number of check-ins. Our results show that
users use the system more often as they become
more familiar with it (i.e., as their check-in counts increase); this is expressed through smaller inter-checkin
times for higher-order check-ins. Furthermore, users of different LBSNs might exhibit different temporal dynamics
with regard to the locations visited, depending on the purpose of the specific network they participate in. Finally, we
have discussed ways in which these findings can enhance existing functionalities or even enable new ones. The
latter is part of our current and future efforts.
Acknowledgment
We would like to thank Prof. Christos Faloutsos for his valuable comments and discussions during the preparation of
this work.
References
Bruner, G., and Kumar, A. 2007. Attitude toward location-based advertising. Journal of Interactive Advertising 7(2).
Cheng, Z.; Caverlee, J.; Lee, K.; and Sui, D. 2011. Exploring
millions of footprints in location sharing services. In AAAI
ICWSM.
Cho, E.; Myers, S. A.; and Leskovec, J. 2011. Friendship
and mobility: User movement in
location-based social networks. In ACM KDD.