AAAI Technical Report FS-12-08
Social Networks and Social Contagion

Location-Based Social Network Users Through a Lense: Examining Temporal User Patterns

Konstantinos Pelechrinis and Prashant Krishnamurthy
School of Information Sciences
University of Pittsburgh
{kpele, prashk}@pitt.edu

Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

There has been a rapid proliferation of location-based social networks (LBSNs) during the last few years. The spatial component of these systems provides a rich source of information that can be exploited by a number of novel services. However, to better design such services, it is important to understand the way people make use of these platforms and how this usage changes over time. While there exist studies that examine the motivations of people for adopting LBSNs and the temporal dynamics of these motivations, they are based on interviews and are mostly qualitative. Moreover, motivations can only indirectly reveal user behavior or help us infer it. In this paper, we analyze data from two commercial LBSNs to examine the temporal evolution of usage patterns and to see what the data on their own reveal. We find that users of the two LBSNs we examined increase their level of activity as they use the system. However, depending on the main purpose of the underlying LBSN, users may exhibit different behaviors over time. We believe that our findings can open new directions and stimulate further research on areas such as location prediction and its applications (e.g., urban and transportation planning and location-based advertisement).

Keywords: Location-based Social Networks, Temporal Evolution, Mining of Social Networks

1 Introduction

Advancements in mobile handheld devices during the last several years and their increased capability to accurately estimate (and report) their position have brought another dimension to the already popular digital social media arena, that of location. People that are connected through LBSNs not only have social ties (e.g., friendship) and/or interests in common (e.g., hiking), but they are also connected with regard to their location. In other words, location-based social networks bond the online, virtual social ties with those in the real world through location information of users. This connection can drive a number of novel, convenient and appealing services. People can track their children or their friends. Applications related to location coordination for scheduled meetings can be enabled. Others can explore new places to visit through a list of venues that are around their location. Such a list can be accompanied by tips and recommendations from people that have visited these places before. Even simply the number of people that have been at a locale or venue in the past or are present at the moment might be informative for making decisions.

An LBSN has two distinct components, a social and a spatial one. The social part of the system resembles any other existing online social network, where friendships are declared and people can interact with the “connected” people. What differentiates LBSNs from other online social networks (OSNs) is the type of interactions that are feasible between the members of the network. The main feature of this interaction is location sharing, which comprises the spatial component of the system. This is essentially a timestamped location log of places a user has visited. Location sharing can be realized either through continuous tracking, in the form of a temporal latitude/longitude trajectory (e.g., Loopt or Google Latitude), or via “check-ins”, where users voluntarily announce their presence in a place or venue at their convenience (e.g., Gowalla, Brightkite, Foursquare etc.). Each representation has its advantages and disadvantages. The second approach, where location is tagged with semantic information, offers a richer set of information than a flat geographic trajectory, but with coarser location granularity.

The reasons why users adopt LBSNs vary. Commercial LBSNs can offer Groupon-like deals based on location, providing monetary incentives for someone to adopt their LBSN, while the gaming aspects of LBSNs form another important motivation for people to adopt them (Lindqvist et al. 2011). The LBSN service itself may have different “scopes”. The primary purpose of Gowalla, for example, was creating city guides, while Foursquare has focused on the social aspects, coupons and gaming.

We wish to observe the ways in which users may alter their behavior over time with regard to the usage of an LBSN. To understand this temporal evolution of users’ behaviors, in this work we consider systems in which spatial information is created via time-stamped check-ins. For instance, if Jack starts with primarily an interest in rewards and coupons offered by specific locales/venues, he might use the network only when he is at these venues. If he then gets interested in the gaming aspects of the system over time, he might start increasing his activity at other places as well. This will manifest as an increase in the rate of visits and/or the number of different places he has visited over time. In this paper, we analyze data from two commercial LBSNs (Gowalla and Brightkite) to answer the following general question: “What are the temporal dynamics of LBSN users’ patterns?” Our main findings can be summarized as follows:

• Our results show that people are slow adopters. They start using the systems more often once they become familiar with them.
• The general “scope” of the LBSN under examination affects the temporal dynamics of the usage patterns.

Scope of our study: Spatial information has been identified as a critical factor for the success of many functionalities, such as local search (Teevan et al. 2011). Our work can be mainly viewed as a study of the temporal dynamics of LBSN users’ location sharing patterns. However, it can also provide guidelines on how functionalities that exploit location information can be tuned in a more efficient and perhaps more accurate way, with a better understanding of what the data mean. For instance, one of our findings shows that users have a transient initial period, where they are hesitant to use an LBSN and share their location. A spatio-temporal prediction algorithm, which can drive applications such as localized targeted advertising, needs to take such dynamics into consideration. We will further discuss the implications of our findings later in the paper.

The rest of the paper is organized as follows. Section 2 discusses existing literature related to our study. Section 3 introduces the datasets we used to perform our analysis, while Section 4 presents the analysis of these data and our results. Finally, Section 5 elaborates on the scope of our work and possible applications of our findings, while Section 6 concludes our work.

2 Related Studies

In this section we briefly discuss related studies and further differentiate our work.

Motivations for adopting LBSNs

There is a line of literature that examines specifically the reasons behind people using LBSNs. These studies mainly utilize user surveys in order to obtain their results. For instance, Lindqvist et al. (Lindqvist et al. 2011) recruit Foursquare users in order to identify their reasons for using this system. They describe the motivations behind the adoption of Foursquare and the continuation of its usage. They find that these motivations change over time. Users, in their initial stages, employ Foursquare for fun (the gaming aspect of the system is initially important). Over time, users are mainly interested in keeping track of places they have visited. They also observe specific bimodal distributions related to specific places where users check in. For instance, there are users that either check in at their homes all the time or do not at all. Surprisingly, this study reveals that privacy concerns are not as crucial as one might have expected. Li and Chen (Li & Chen 2010) interview people specifically with regard to their privacy concerns. They found that privacy concerns increase with the age of the users as well as with increased activity in the system. Furthermore, Cramer et al. (Cramer, Rost, & Holmquist 2011), through another set of surveys and interviews, identify emerging but also conflicting norms on why people might not check in. Concerns related to “spamming” their friends with useless information and the way the people around them perceive the action of a check-in were the most cited reasons for not sharing locations, while support for a friend’s business forms an important reason to check in at a locale.

Understanding the motivations behind the usage of LBSNs can help in explaining the LBSN usage pattern data. However, surveys are limited by their small scale. Furthermore, they cannot quantitatively capture the behaviors of users, since they mainly consist of qualitative questions and answers. To the best of our knowledge, this is the first large-scale study on the temporal dynamics of LBSN usage that analyzes real data obtained from commercial networks.

Mining the structure of LBSNs

There is a set of studies that examines the structural properties of existing LBSNs. Here, structure refers not only to the properties of the social network graph (as in traditional OSNs) but also to the location component (e.g., physical distance to friends, time and type of check-ins etc.). For instance, Cheng et al. (Cheng et al. 2011) use data from Foursquare to examine (i) the spatio-temporal properties of users’ check-ins as well as (ii) their mobility patterns. One interesting finding is that a user’s social status, together with geographic and economic factors, is coupled with his mobility. Similarly, Noulas et al. (Noulas et al. 2011) study the geographical properties of users’ activities as captured through the type of places they visit and their transitions. They use this study for identifying universal features of human urban mobility (Noulas et al. 2012). Using data from Foursquare, they show that mobility does not exhibit a universal behavior when examined in terms of pure distance between two stay points. However, the density of the areas, and as a consequence the intervening opportunities (Stouffer 1940), dictates the mobility of people in urban areas through a fairly universal law. Scellato et al. (Scellato et al. 2011) try to identify the relation between friendship and distance using data from 3 different LBSNs (Gowalla, Foursquare and Brightkite). They find that the socio-spatial structure of these systems cannot be explained by only geographic factors or only social mechanisms. Li and Chen (Li & Chen 2009) analyze data from Brightkite and, after providing the structural properties of the underlying social graph, they try to identify correlations between different users’ profile features, activity updates, and mobility patterns. Scellato and Mascolo (Scellato & Mascolo 2011) show that the existence (or not) of heavy tails in the distributions of various activity attributes of LBSN users (e.g., degree distribution, check-in distribution etc.) is tightly connected with the users’ account age and activity span.

We would like to emphasize that our study is complementary to the above efforts. While we also analyze data from LBSNs, our goal is to study the temporal behavior patterns rather than the social and spatial network structure that is usually captured and analyzed from a (static) network snapshot.

3 LBSN Datasets

In this section we briefly describe the datasets we used for our analysis. These data have been made available by Cho et al. (Cho, Myers, & Leskovec 2011) and are obtained from two commercial LBSNs, namely Gowalla and Brightkite.

Gowalla dataset: The dataset consists of 6,442,892 public check-ins performed by 196,591 Gowalla users in 647,923 distinct places, during the period between February 2009 and October 2010. Every check-in log entry is a tuple of the form <User ID, Time, Latitude, Longitude, Venue ID>. Gowalla users also participate in a friendship network with reciprocal relations, which consists of 950,327 links.

Brightkite dataset: The dataset consists of 4,491,143 public check-ins performed by 58,228 Brightkite users in 772,966 distinct places, during the period between April 2008 and October 2010. The check-in information is in exactly the same format as above. Brightkite users also participate in a friendship network, which consists of 214,078 links.¹

The notion of time in our study is captured through the check-in count of a user rather than an absolute value. In other words, we examine the behavior of a user’s activity as measured through the number of check-ins. For instance, when we refer to time t = 5, we essentially refer to the 5th check-in of a user. In this way we have a notion of relative time (with respect to his first presence in the system). Hence, even though the 5th check-in of Jack happened at a different (absolute) time as compared to the 5th check-in of Bob, the relative time captures the 5th interaction of both Jack and Bob with the network.
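The relative, check-in-driven notion of time described above can be illustrated with a short Python sketch. It assumes check-ins arrive as tuples in the <User ID, Time, Latitude, Longitude, Venue ID> format, with Time already converted to a comparable number (e.g., seconds). The function names `assign_checkin_order` and `checkin_at` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def assign_checkin_order(checkins):
    """Group check-ins per user and sort them by time.

    `checkins` is an iterable of (user_id, time, lat, lon, venue_id) tuples.
    In the returned dict, the check-in at list position i - 1 is the user's
    i-th check-in, i.e., relative time t = i.
    """
    per_user = defaultdict(list)
    for user_id, time, lat, lon, venue_id in checkins:
        per_user[user_id].append((time, lat, lon, venue_id))
    # Sorting by timestamp turns absolute time into a per-user event order.
    return {user: sorted(events, key=lambda e: e[0])
            for user, events in per_user.items()}


def checkin_at(ordered, user_id, t):
    """Return the t-th check-in (1-based) of a user, or None if it does not exist."""
    events = ordered.get(user_id, [])
    return events[t - 1] if 1 <= t <= len(events) else None
```

With this indexing, the 5th check-in of Jack and the 5th check-in of Bob share the same relative time t = 5, regardless of when they happened in absolute terms.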
In this way, we are able to identify any existing patterns and express the temporal dynamics of an LBSN user’s behavior on average. To summarize, the notion of time in our study is event (i.e., check-in) driven.

Given that we are interested in this relative notion of time, we calculate the cumulative distribution function of the number of check-ins for the users of the two datasets described above. Figure 1 presents our results. We can observe that the majority of the users have fewer than 1000 check-ins, while only a very small percentage of them (around 5%) has over 1000 check-ins. In what follows, we will present the average temporal dynamics of LBSN users as calculated over (i) the entire user population, (ii) the set of users with more than 1000 check-ins and (iii) the set of users with fewer than 1000 check-ins. The reason behind this distinction is to examine whether there exist intrinsic differences in the average behaviors of users with different total activity in the system. We will elaborate more on this when we present our results in the next section.

[Figure 1 plot omitted: ECDF of the number of total check-ins per user, for Gowalla and Brightkite.]
Figure 1: The majority of the users in our datasets have less than 1000 check-ins.

4 Temporal Behavior of Users

In this section we present the analysis of the two aforementioned datasets. In particular, we examine the following metrics with respect to the check-in count of the user (i.e., the relative time mentioned above): (i) the inter-checkin time, that is, the time that has elapsed between two consecutive check-in events, (ii) the number of unique venues visited by a user, (iii) the entropy of a user, which captures his diversity with regard to the places visited, and (iv) the inter-checkin distance, that is, the geographical distance between the venues of two consecutive check-ins.

Inter-checkin Time

In a check-in-based LBSN, users declare their presence at a location in a voluntary fashion. This means that the check-in history log of Jack does not necessarily include all the venues he has visited over this period of time. The reasons for not sharing a location can vary between users and can also be related to the stage of system adoption (e.g., those that have just started versus established adopters). We capture the change in users’ behavior in terms of the frequency of location sharing through the time lag between two consecutive check-ins for all users. Figure 2 presents the average inter-checkin time with respect to the order² of this check-in pair for both of the datasets. In other words, the average inter-checkin time of order i is computed as the mean of the time elapsed between check-ins i and i + 1 for all users. As we can see, users initially appear to be hesitant to use the LBSN system, but as time elapses (higher check-in order) they become more familiar with it and appear to be increasingly willing to make use of it.

Next we consider the average inter-checkin time for the three different cases explained in Section 3: for all users, for users with fewer than 1000 check-ins and for users with more than 1000 check-ins. While the trend is the same for every “sub-dataset”, users that have more than 1000 check-ins in total are more likely to use the system from the very beginning. Note also that for the larger check-in counts (more than 2000) there is no apparent trend, due to the small number of samples (only a few users have check-ins of order higher than 2000).

¹ The friendships are originally asymmetric in Brightkite.
² We will use the terms order and count interchangeably in the rest of the paper.
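The average inter-checkin time of order i, defined above as the mean of the time elapsed between check-ins i and i + 1 over all users, can be sketched in Python as follows. This is a minimal illustration under the assumption that each user's check-in times are available as a sorted list of seconds. The names `inter_checkin_gaps` and `summarize` are hypothetical. The median is reported next to the mean because the text that follows relies on both.

```python
from collections import defaultdict
from statistics import mean, median


def inter_checkin_gaps(times_per_user):
    """Collect, for every order i, the gaps between check-ins i and i + 1.

    `times_per_user` maps a user to a sorted list of check-in times (seconds).
    Returns a dict: order i (1-based) -> list of gaps, one per user that has
    at least i + 1 check-ins.
    """
    gaps = defaultdict(list)
    for times in times_per_user.values():
        for i in range(len(times) - 1):
            gaps[i + 1].append(times[i + 1] - times[i])
    return gaps


def summarize(gaps):
    """Return order -> (mean gap, median gap), sorted by order."""
    return {order: (mean(values), median(values))
            for order, values in sorted(gaps.items())}
```

Note that users with few check-ins contribute only to the low orders, so the sample size shrinks as the order grows, which matches the remark about orders above 2000.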
[Figure 2 plots omitted: mean and median inter-checkin time (sec) versus check-in count. Panels: (a) All users, (b) Users with less than 1000 checkins, (c) Users with more than 1000 checkins.]
Figure 2: LBSN users become more familiar on average with the system as they use it (Gowalla: bottom figure, Brightkite: top figure).

While the mean value of the inter-checkin time can describe the average behavior of the users, we further examine the distribution of the inter-checkin time for a given check-in order. Figure 3 presents the cumulative distribution function of the inter-checkin time for the 100th, 500th and 1000th order check-in of all users for both datasets. That is, we consider the inter-checkin time between the 99th and 100th check-ins for all users and so on, and determine the CDF. While the exact shape of the distribution for the two LBSNs is different, both of them exhibit a long tail. This means that extremely large inter-checkin times occur with non-negligible probability. As we can further observe, the tail of the distribution is heavier for smaller check-in counts for both datasets. This means that the diversity among users is much higher during the early adoption phase.

Given that the distribution of the inter-checkin time for a given check-in count exhibits a heavy tail, the mean value might not be a robust metric for capturing the users’ dynamics. Thus, we have also plotted the median inter-checkin time in Figure 2. Even though the change in the median inter-checkin time is less pronounced compared to the mean, the trend is still similar (especially for the Gowalla users). Hence, despite the clear diversity among the users with regard to the time elapsed between two consecutive check-ins, the inter-checkin time is reduced on average with an increase in the user’s activity (i.e., with the increase in the check-in count).

Unique Places Visited

Next we examine the behavior of users with regard to the locales/venues/places they visit. In particular, we want to examine how many new places people visit as they increase their activity and use the LBSN system more. In other words, do users tend to socialize in a small, closed set of venues or do they tend to visit new places? We have again used both of our datasets and we have calculated the average number of distinct venues that users have visited after x check-ins. Figure 4(a) depicts the results obtained when using the whole dataset. Note that the results obtained when considering only users with fewer than 1000 check-ins (Figure 4(b)) and users with more than 1000 check-ins (Figure 4(c)) are similar, and hence we will discuss only the ones acquired from the whole dataset.

We observe that there is a linear relation between the number of unique venues visited and the check-in count for both Gowalla and Brightkite. However, the slope of the least-squares linear fit to the data is very different for the two datasets. In particular, for Gowalla the slope is equal to 0.65 (with an R^2 value of 0.97), while for Brightkite the slope is much smaller, 0.085 (R^2 = 0.98). Essentially, this means that every check-in a Gowalla (Brightkite) user performs has, on average, a probability of 0.65 (0.085) of being at a venue not previously seen in his history log of locations. Clearly, these are two very different kinds of behaviors of users across the datasets.
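The relation between unique venues and check-in count can be reproduced on toy data with the following Python sketch. It is an illustration only: the paper averages the unique-venue counts over all users before fitting, whereas the helper `unique_venue_curve` below handles a single user's venue sequence, and both function names are hypothetical. The fit is an ordinary least-squares line, with the coefficient of determination computed as R^2 = 1 - SS_res / SS_tot.

```python
def unique_venue_curve(venue_ids):
    """Number of distinct venues seen after each check-in of one user."""
    seen = set()
    curve = []
    for venue in venue_ids:
        seen.add(venue)
        curve.append(len(seen))
    return curve


def least_squares_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, with R^2.

    Assumes at least two distinct x values and non-constant ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Under the linear model, the fitted slope is the expected increase in unique venues per additional check-in, which is why it can be read as the average probability that a check-in happens at a new venue.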
[Figure 3 plots omitted: ECDF of the inter-checkin time (sec) for the 100th, 500th and 1000th check-in, on logarithmic axes. Panels: (a) Brightkite, (b) Gowalla.]
Figure 3: The CDF of the inter-checkin times is skewed, especially during the early stages of usage (i.e., smaller check-in count).

[Figure 4 plots omitted: unique venues visited versus check-in count, with data and least-squares fits. Panels: (a) All users, (b) Users with less than 1000 checkins, (c) Users with more than 1000 checkins. Fits legible in the legends: y = 29.43 + 0.085x (R^2 = 0.98), y = -16.9 + 0.65x (R^2 = 0.97), y = 0.15x + 13.49 (R^2 = 0.96), y = 0.63x + 10.33 (R^2 = 0.96), y = 0.093x + 15.13 (R^2 = 0.98), y = 22.48 + 0.486x (R^2 = 0.97).]
Figure 4: Gowalla users (bottom figure) tend to share their locations when they visit a new place, while Brightkite users (top figure) remain active within a small, slowly increasing, group of venues.

In order to understand the reasons behind the different dynamics between the users of the two LBSNs with regard to the unique places visited, recall that the check-ins of a user do not reveal all of the actual places that they have been. They capture only the places that users are willing to share with the system. For instance, Jack might be willing to share his location when he is at work (e.g., check in at his office), but he might not be eager to do so with regard to the place where he has had lunch (e.g., he does not want to share his presence in a fast-food joint for self-representation reasons). In general, there are many reasons that have been cited behind a user not sharing his location in LBSNs (Lindqvist et al. 2011) (Cramer, Rost, & Holmquist 2011). Even though the reasons behind such behavior are not the focus of our study, they can partially help us understand the difference in the two slopes observed.

An important factor that can affect the sharing attitudes of people is related to the objective and the nature of the underlying system, that is, its main application and purpose (Tang et al. 2010). Gowalla evolved to become a city guide application. People that visit a city for the first time could make use of the check-ins of Gowalla users (and possibly textual comments accompanying them) and explore locales in this new environment. Hence, Gowalla users may be tempted (perhaps even encouraged) to check in at new spots, in order to provide a more comprehensive guide of their city, which can explain the large slope of the linear curve in Figure 4. On the contrary, Brightkite did not have a similar objective and it was mainly a social-driven application. It aimed at connecting people in the physical world through location sharing. Hence, Brightkite users have all of the privacy concerns that have been identified in the literature, causing them to be more skeptical when sharing their presence, which can account for the much smaller slope of the corresponding line.

We further examine the distribution of the unique venues among the users for a given check-in count. Again, we consider the 100th, 500th and 1000th order check-in of all users and we compute the CDF of their unique venues up to that point. The results are presented in Figure 5. As we can see, these distributions do not exhibit long tails like the ones for the inter-checkin times (Figure 3), but they are closer to a uniform distribution. In addition, the number of distinct places visited is upper bounded by the number of total check-ins of a user. Therefore, the mean value calculated forms a robust statistic.
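The per-order distributions discussed above (the unique venues of all users at their 100th, 500th or 1000th check-in) can be computed with a sketch like the one below. It assumes each user's venue IDs are stored in check-in order. The names `unique_venues_at` and `ecdf` are hypothetical, and the paper does not state how its empirical CDFs were computed, so this shows only the standard definition F(x) = (number of samples <= x) / n.

```python
from bisect import bisect_right


def unique_venues_at(venues_per_user, k):
    """Distinct venues within each user's first k check-ins.

    Users with fewer than k check-ins are skipped, since they have no
    k-th check-in to report.
    """
    return [len(set(venues[:k]))
            for venues in venues_per_user.values()
            if len(venues) >= k]


def ecdf(values):
    """Return the empirical CDF of `values` as a function F(x)."""
    if not values:
        raise ValueError("ecdf needs at least one sample")
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        # Fraction of samples that are less than or equal to x.
        return bisect_right(ordered, x) / n

    return cdf
```

Because the count of distinct venues at order k can never exceed k, the resulting distribution has bounded support, which is the property the text uses to argue that the mean is robust here.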
Based on the above results we can conclude that the actual application that an LBSN is targeting can have implications for the temporal behavior dynamics of the users.

[Figure 5 plots omitted: ECDF of the number of unique venues visited for the 100th, 500th and 1000th check-in, on logarithmic axes. Panels: (a) Brightkite, (b) Gowalla.]
Figure 5: The number of unique venues for a given check-in count is close to uniformly distributed.

User Entropy

Following the above analyses, we examine a user’s entropy. In particular, the “entropy” of a user captures the diversity of a user with regard to the places he has visited. However, it does not only consider the number of distinct locations visited by him, but it also takes into account the frequency of these visits. The definition presented in what follows is based on similar definitions by Cranshaw et al. (Cranshaw et al. 2010). Let us assume that L_u is a set containing all the locations shared by user u. Furthermore, let P_l(u), l ∈ L_u, be the fraction of check-ins of user u that happened in location l. Then the entropy e_u of user u is defined as:

    e_u = -\sum_{l \in L_u} P_l(u) \cdot \log(P_l(u))    (1)

From the above equation, we can notice that when the user visits many places in fairly equal proportions, his entropy will be large. On the contrary, when most of his activity is restricted to a few locales only, his entropy will be low. In other words, a diverse (non-diverse) user with respect to the places he visits will exhibit high (low) entropy.

Using the two datasets, we have calculated the average user entropy as a function of the check-in order. The results are presented in Figure 6. Brightkite users exhibit much lower (average) entropy as compared to the Gowalla users. Recall that the latter have a much larger number of unique venues, which means that they “distribute” their activity over more places, exhibiting higher diversity. It is also interesting to observe that the entropy of a Brightkite user stabilizes fairly quickly, after only a few check-ins. On the contrary, the entropy of a Gowalla user slowly increases as the check-in count increases. This is in alignment with the results obtained for the unique venues visited by a user (Figure 4).

[Figure 6 plots omitted: mean and median average user entropy versus check-in count. Panels: (a) All users, (b) Users with less than 1000 check-ins, (c) Users with more than 1000 check-ins.]
Figure 6: Gowalla users (bottom figures) spread their activity equally across a much larger number of venues resulting in a higher entropy as compared to the more stable, low entropy of Brightkite users (top figures).

We have further computed the distribution of the entropy of all the users for a given check-in count (100th, 500th and 1000th check-in). Figure 7 presents the results. The distribution (especially that of Brightkite users) exhibits a slight negative skew.
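The user entropy of Equation (1) can be computed directly from a user's sequence of venue IDs, as in the Python sketch below. The function name `user_entropy` is hypothetical. The base of the logarithm is not specified in the text, so the natural logarithm is assumed here, and a different base would only rescale the values.

```python
import math
from collections import Counter


def user_entropy(venue_ids):
    """Entropy of one user's check-ins, following Equation (1).

    P_l(u) is the fraction of the user's check-ins at venue l.
    The natural logarithm is an assumption; the source does not give the base.
    """
    total = len(venue_ids)
    if total == 0:
        return 0.0
    counts = Counter(venue_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_by_order(venue_ids):
    """Entropy after each check-in, i.e., entropy as a function of check-in count."""
    return [user_entropy(venue_ids[:k]) for k in range(1, len(venue_ids) + 1)]
```

A user who always checks in at one venue gets entropy 0, while a user who spreads n check-ins evenly over n venues gets log(n), the maximum for that number of venues. Averaging `entropy_by_order` across users at each order gives a curve of the kind shown in Figure 6.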
Even though this small left tail should not significantly affect the robustness of the mean value for the user entropy, we have further calculated the median user entropy as a function of the check-in count (Figure 6). We notice that the median aligns with the results for the mean value. For Brightkite users, the median value of the entropy is slightly larger than the average value, as expected, but it is still much smaller than that of Gowalla users. To summarize, the above results for the user entropy verify and further support our previous claim that the actual purpose of the underlying LBSN has an implication for the temporal behavior of its users.

Inter-checkin Distance

Finally, we examine the temporal dynamics of LBSN users from the perspective of their geographical properties. In particular, we calculate the distance between two consecutive check-ins of the users as a function of the check-in count. Figure 8 presents our results. Consecutive check-ins of Gowalla users are contained within a small distance, regardless of the check-in count. This might again be explained if we consider the purpose of the system. Users that aim at creating a type of city guide will tend to check in at many places, which most probably will be located very close to each other. On the contrary, the average distances between consecutive check-ins for Brightkite users exhibit much larger values and an increasing trend as the activity level increases. The much larger values can be attributed to the fact that Brightkite users might be more tempted to check in at distant places (e.g., places they visit during their travels) that do not belong to their hometown. In addition, as we have observed previously, the number of unique venues shared by these users is small. Thus, the large distance between consecutive check-ins will be retained, as there are not many new places added in between.
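The inter-checkin distance can be derived from the latitude and longitude fields of consecutive check-ins. The paper does not say which distance formula was used, so the Python sketch below assumes the great-circle (haversine) distance on a spherical Earth with a mean radius of 3958.8 miles. The names `haversine_miles` and `inter_checkin_distances` are hypothetical.

```python
import math

# Mean Earth radius in miles; an assumption, not a value taken from the paper.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def inter_checkin_distances(coords):
    """Distances between consecutive check-ins of one user.

    `coords` is the user's list of (lat, lon) pairs in check-in order;
    element i of the result is the distance between check-ins i + 1 and i + 2.
    """
    return [haversine_miles(*coords[i], *coords[i + 1])
            for i in range(len(coords) - 1)]
```

Grouping these distances by check-in order across users, and taking the mean and median per order, gives curves of the kind plotted in Figure 8.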
However, if we examine the probability distribution of inter-checkin distances for a given check-in order, we see that the distribution exhibits an extremely heavy tail similar to that of a power law distribution (Figure 9). Thus, the average value for the inter-checkin distances is not a very robust metric, and therefore we also calculate the median value. We can see that the median value is very similar for both datasets. In particular, users do not appear to change their behavior with regard to the distances traveled between two back-to-back sharings of location. In the next section, we will discuss the implications of our data analysis and how we can exploit the observed dynamics.

[Figure 7 plots omitted: ECDF of the user entropy for the 100th, 500th and 1000th check-in, on logarithmic axes. Panels: (a) Brightkite, (b) Gowalla.]
Figure 7: The distribution of the user entropy for a given check-in count is slightly left-skewed, especially for Brightkite users.

[Figure 8 plots omitted: mean and median inter-checkin distance (miles) versus check-in count. Panels: (a) All users, (b) Users with less than 1000 check-ins, (c) Users with more than 1000 check-ins.]
Figure 8: LBSN users manifest a stable behavior with regards to the distances of consecutive places shared (Gowalla: bottom figure, Brightkite: top figure).

[Figure 9 plots omitted: ECDF of the inter-checkin distance (miles) for the 100th, 500th and 1000th check-in, on logarithmic axes. Panels: (a) Brightkite, (b) Gowalla.]
Figure 9: The distribution of the inter-checkin distances for a given check-in count is positively skewed.

5 Discussion and Future Directions

LBSNs can enable a large number of novel applications. At the same time, the rich spatial information present in these systems can enhance existing important functionalities in social media. For instance, Lian and Xie (Lian & Xie 2011) make use of check-ins of users that are similar to identify the activities they participate in at the locations of interest. In this section we discuss how our findings can be utilized to enhance existing services or enable new ones, focusing on two example applications: (a) location prediction and (b) friend recommendation.

Location Prediction: Location-based marketing and advertisement has been on the rise during the last years (Bruner & Kumar 2007). It has been primarily reactive, that is, users do not get exposed to offers unless they are present in the location of interest. However, a set of novel, proactive applications can be realized if we are able to predict with high accuracy the future location of a user. For instance, time-limited information about the places to be visited, weather and traffic reports in these areas, as well as presentation of reviews for these locales, are just some of the possible services to be offered. The temporal dynamics of users can enable more accurate location forecasting. For instance, the fact that users change their level of activity over time is an indicator that in general they do not follow regular patterns, or at least that there is a transient period (i.e., the initial hesitation phase) that should be considered with care in a supervised learning predictor. In addition, using the relation between the unique venues visited by a user and his check-in count, we can obtain the probability that his next location will be a new venue or one of those already visited. Further combining this information with the dynamics of the inter-checkin distances can provide even higher accuracy in future location estimation. Utilizing these dynamics, in combination with the social dynamics present in an LBSN (e.g., friendship relations, user spatial similarity etc.), for an accurate spatial predictor is part of our future work.

Friend Recommendation: Social media help their members find friends with similar interests, or connect them with people they already know. Of critical importance to this service is a friend recommendation engine. In traditional digital social networks, this functionality is based primarily on the number of common friends and the level of virtual interactions with these common friends. However, virtual interactions do not usually form a good indicator of how close two people are in the real world. With the introduction of LBSNs, the spatial logs of users bridge the virtual and physical worlds, and they can be used to provide better friend recommendations. For instance, Scellato et al. (Scellato, Noulas, & Mascolo 2011), using longitudinal data, show that about 30% of new social links are created among people that have visited the same places. This finding can reduce the search space for new friendships, while keeping accuracy as high as 66%. Taking into consideration the users’ temporal dynamics can further improve friend recommendations. We are working on examining the integration of the findings presented in this paper with location-aware friend recommendation engines.

6 Conclusions

In this work we have examined the temporal dynamics of location-based social network users. Analyzing data from two commercial LBSNs (Brightkite and Gowalla), we have studied the time intervals and geographical distances between two consecutive location sharings, as well as the number of unique places visited/shared and the user entropy, as a function of the users’ activity level. The latter is captured through the number of check-ins.

References

Cramer, H.; Rost, M.; and Holmquist, L. 2011. Performing a check-in: Emerging practices, norms and conflicts in location-sharing using foursquare. In ACM MobileHCI.
Cranshaw, J.; Toch, E.; Hong, J.; Kittur, A.; and Sadeh, N. 2010. Bridging the gap between physical location and online social networks. In UBICOMP.
Li, N., and Chen, G. 2009. Analysis of a location-based social network. In IEEE CSE.
Li, N., and Chen, G. 2010. Sharing location in online social networks. IEEE Network 24(5):20–25.
Lian, D., and Xie, X. 2011. Collaborative activity recognition via check-in history. In ACM LBSN.
Lindqvist, J.; Cranshaw, J.; Wiese, J.; Hong, J.; and Zimmerman, J. 2011. I’m the mayor of my house: Examining why people use foursquare - a social-driven location sharing application. In ACM CHI.
Noulas, A.; Scellato, S.; Mascolo, C.; and Pontil, M. 2011. An empirical study of geographic user activity patterns in foursquare. In AAAI ICWSM (poster session).
Noulas, A.; Scellato, S.; Lambiotte, R.; Pontil, M.; and Mascolo, C. 2012. A tale of many cities: universal patterns in human urban mobility. In PLoS ONE (Forthcoming).
Scellato, S., and Mascolo, C. 2011. Measuring user activity on an online location-based social network. In NetSciCom.
Scellato, S.; Noulas, A.; Lambiotte, R.; and Mascolo, C. 2011. Socio-spatial properties of online location-based social networks. In AAAI ICWSM.
Scellato, S.; Noulas, A.; and Mascolo, C. 2011. Exploiting place features in link prediction on location-based social networks. In ACM KDD.
Stouffer, S. 1940. Intervening opportunities: A theory relating mobility and distance. In American Sociological Review.
Tang, K.; Lin, J.; Hong, J.; Siewiorek, D.; and Sadeh, N. 2010. Rethinking location sharing: Exploring the implications of social-driven vs. purpose-driven location sharing. In UBICOMP.
Teevan, J.; Karlson, A.; Amini, S.; Brush, A. B.; and Krumm, J. 2011. Understanding the importance of location, time, and people in mobile local search behavior. In ACM MobileHCI.
From our results, users start using the system more often once they become more familiar with it (i.e., with the increase in their checkin counts). This is expressed through smaller inter-checkin times for higher order check-ins. Furthermore, users of different LBSNs might exhibit different temporal dynamics with regards to the locations visited depending on the purpose of the specific network they participate in. Finally, we have discussed ways through which these findings can enhance many existing or even enable new functionalities. The latter is part of our current and future efforts. Acknowledgment We would like to thank Prof. Christos Faloutsos for his valuable comments and discussions during the preparation of this work. References Bruner, G., and Kumar, A. 2007. Attitude toward locationbased advertising. In Journal of Interactive Advertising, Vol. 7, No 2. Cheng, Z.; Caverlee, J.; Lee, K.; and Sui, D. 2011. Exploring millions of footprints in location sharing services. In AAAI ICWSM. Cho, E.; Myers, S. A.; and Leskovec, J. 2011. Friendship and mobility: Friendship and mobility: User movement in location-based social networks. In ACM KDD. 68
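
As a supplementary illustration of two quantities discussed in this paper, the following minimal Python sketch shows (i) why the median is reported alongside the mean for heavy-tailed inter-checkin distances and (ii) one simple way to estimate, per check-in order, the chance that a check-in occurs at a previously unvisited venue. The function names (`summarize_distances`, `new_venue_rate_by_order`), the input layout (one venue-ID list per user), and the estimator itself are assumptions made for this example; they are not the procedure used in the analysis above.

```python
import statistics


def summarize_distances(distances):
    """Return (mean, median) of a list of inter-checkin distances.

    With a heavy-tailed sample, a few very long trips dominate the mean,
    while the median stays close to the typical distance.
    """
    return statistics.mean(distances), statistics.median(distances)


def new_venue_rate_by_order(user_sequences):
    """Empirical rate of first-time venue visits per check-in order.

    user_sequences: list of per-user venue-ID lists in chronological order
    (hypothetical input layout). Entry k of the result is the fraction of
    users with more than k check-ins whose check-in at 0-based position k
    was at a venue they had not visited before.
    """
    new_counts = []  # users whose check-in at position k was a new venue
    totals = []      # users that have a check-in at position k
    for sequence in user_sequences:
        seen = set()
        for k, venue in enumerate(sequence):
            if k == len(totals):
                totals.append(0)
                new_counts.append(0)
            totals[k] += 1
            if venue not in seen:
                new_counts[k] += 1
                seen.add(venue)
    return [n / t for n, t in zip(new_counts, totals)]


if __name__ == "__main__":
    # Toy data: five short hops and one very long trip (miles).
    print(summarize_distances([1, 1, 2, 2, 3, 1000]))
    # Toy data: two users with three and two check-ins respectively.
    print(new_venue_rate_by_order([["a", "b", "a"], ["a", "a"]]))
```

In this toy sample the single 1000-mile trip pulls the mean far above the median, which mirrors the robustness concern raised for Figure 9. The per-order rate is only one possible reading of "the relation between the unique venues visited by a user and his check-in count"; a per-user or smoothed estimate could be substituted.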