Finding Your Friends and Following Them to Where You Are
Adam Sadilek, Henry Kautz, Jeffrey Bigham
University of Rochester, Department of Computer Science

Overview

Location plays an essential role in our lives, bridging our online and offline worlds. We explore the interplay between people's location, interactions, and their social ties within a large real-world dataset. Our system, Flap, solves two intimately related tasks: link prediction and location prediction in online social networks. We evaluate Flap on a sample of 11 thousand highly active users from New York City and Los Angeles, and show that it

(1) reconstructs the entire friendship graph (60M possible edges) with high accuracy even when no edges are given; and
(2) infers people's fine-grained location, even when they keep their data private and we can only access the location of their friends.

Our models significantly outperform current comparable approaches to either task.

Friendship Prediction

Flap infers social ties by leveraging patterns in friendship formation, the content of people's messages, and user co-location. Friendships on Twitter are predicted by the following pipeline: text, location, and structure features are extracted from the Twitter feed, candidate edges are scored by a decision tree, and the scores are refined by belief propagation (Algorithm 1 below).

[Pipeline figure: Twitter Feed → Text / Location / Structure features → Decision Tree → Belief Propagation → Predicted Friendships.]

5.1.2 Learning and Inference

Our probabilistic model of the friendship network is a Markov random field that has a hidden node for each possible friendship. Since the friendship relationship is symmetric and irreflexive, our model contains n(n − 1)/2 hidden nodes, where n is the number of users. Each hidden node is connected to an observed node (DT) and to all other hidden nodes.

Ultimately, we are interested in the probability of existence of an edge (friendship) given the current graph structure and the pairwise features of the vertices (users) the edge is incident on. Applying Bayes' theorem while assuming mutual independence of the features DT and M, we can write

  P(E = 1 | DT = d, M = m)
    = P(DT = d | E = 1) P(M = m | E = 1) P(E = 1) / Z
    = P(DT = d | E = 1) P(E = 1 | M = m) / Z,        (5)

where

  Z = Σ_{i ∈ {0,1}} P(DT = d | E = i) P(E = i | M = m).

E, DT, and M are random variables that represent edge existence, DT score, and M score, respectively. In equation 5, we applied the equality P(M | E) = P(E | M) P(M) / P(E) and subsequent simplifications, so that we do not need to explicitly model P(E).
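Once the two conditional tables in equation 5 have been estimated, the edge posterior can be computed directly. The following minimal Python sketch assumes the DT and M scores have been discretized into B bins and that the tables are stored as arrays; the names and uniform placeholder values are ours, not Flap's.

```python
import numpy as np

B = 10  # assumed number of bins for the discretized DT and M scores

# Hypothetical learned tables (uniform placeholders shown here):
# p_dt_given_e[i, d] = P(DT = d | E = i); p_e_given_m[i, m] = P(E = i | M = m)
p_dt_given_e = np.full((2, B), 1.0 / B)
p_e_given_m = np.full((2, B), 0.5)

def edge_posterior(d: int, m: int) -> float:
    """P(E = 1 | DT = d, M = m) per equation (5), normalizing over E in {0, 1}."""
    unnorm = [p_dt_given_e[i, d] * p_e_given_m[i, m] for i in (0, 1)]
    return unnorm[1] / sum(unnorm)

print(edge_posterior(3, 7))  # 0.5 with the uniform placeholder tables
```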
At learning time, we first train a regression decision tree DT and prune it using ten-fold cross-validation to prevent overfitting. We also perform maximum likelihood learning of the parameters P(DT | E) and P(E | M). We chose the decision tree pre-processing step for several reasons. First, the text- and location-based features have very poor predictive power when considered individually, so models such as logistic regression tend to have low accuracy. Furthermore, the relationship between the observed attributes of a pair of users and their friendship is often quite complex. For example, it is not simply the case that a friendship becomes more and more likely as people spend larger and larger amounts of time near each other. Consider two strangers who happen to take the same train to work and tweet every time it goes through a station. Our dataset contains a number of instances of this sort. During the train ride their co-location could not be higher, and yet they are not friends on Twitter. This largely precludes the success of classifiers that look for a simple decision surface, whereas the DT and M scores jointly represent the largest information value. Finally, while calculating the features for all pairs of n users is an O(n²) operation, it can be significantly sped up via locality-sensitive hashing [8].

At inference time, we use DT to make preliminary predictions on the test data. Next, we execute a customized loopy belief propagation algorithm that is initialized with the probabilities estimated by DT (see Algorithm 1). Step 6 is where an edge receives belief updates from the other edges as well as the DT prior. Even though the graphical model is dense, our algorithm converges within several hundred iterations, due in part to the sufficiently accurate initialization and regularization provided by the decision tree. Note that the algorithm can also function in an online fashion: as new active users appear in the Twitter public timeline, they are processed by the decision tree and added to Q. This is an attractive mode, where the model is always up to date and takes advantage of all available data.

Algorithm 1: refineEdgeProbabilities(Q)
Input: Q: list containing all potential edges between pairs of vertices along with their preliminary probabilities
Output: Q: input list Q with refined probabilities
1: while Q has not converged do
2:   sort Q high to low by estimated edge probability
3:   for each ⟨e, P(e)⟩ in Q do
4:     dt ⇐ DT(e)
5:     m ⇐ M(e)
6:     P(e) ⇐ P(DT = dt | E = 1) P(E = 1 | M = m) / Σ_{i ∈ {0,1}} P(DT = dt | E = i) P(E = i | M = m)
7:   end for
8: end while
9: return Q
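The pseudocode maps almost one-to-one onto Python. In the sketch below, dt_score(e) and m_score(e, Q) are hypothetical callables standing in for the binned DT and M scores (the latter recomputed from the current beliefs in Q), and the max-change tolerance is our own convergence test, since the paper does not specify one.

```python
def refine_edge_probabilities(Q, dt_score, m_score, p_dt_given_e, p_e_given_m,
                              max_sweeps=500, tol=1e-4):
    """Algorithm 1: loopy belief propagation over candidate edges (sketch).

    Q : dict mapping each candidate edge to its preliminary probability
        as estimated by the decision tree.
    """
    for _ in range(max_sweeps):                       # step 1: until converged
        delta = 0.0
        for e in sorted(Q, key=Q.get, reverse=True):  # steps 2-3: high to low
            d = dt_score(e)                           # step 4
            m = m_score(e, Q)                         # step 5
            unnorm = [p_dt_given_e[i][d] * p_e_given_m[i][m] for i in (0, 1)]
            new_p = unnorm[1] / sum(unnorm)           # step 6: belief update
            delta = max(delta, abs(new_p - Q[e]))
            Q[e] = new_p
        if delta < tol:
            break
    return Q                                          # step 9
```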
Location Prediction

Flap implements a dynamic probabilistic model of human mobility, where we treat users with known GPS positions as noisy sensors of the location of their friends. We explore both supervised and unsupervised learning scenarios, and focus on the efficiency of both learning and inference. The dynamic Bayesian network shown in Figure 3 allows us to infer the most likely sequence of locations visited by a user u, given the locations of his friends (f1, ..., fn), the time of day, and the day type (work day vs. free day).

5.2 Location Prediction

The goal of Flap's location prediction component is to infer the most likely location of person u at any time. The input consists of a sequence of locations visited by u's friends (and, for supervised learning, locations of u himself over the training period), along with corresponding time information. The model outputs the most likely sequence of locations u visited over a given time period.

We model user location in a dynamic Bayesian network shown in Figure 3. In each time slice, we have one hidden node and a number of observed nodes, all of which are discrete. The hidden node represents the location of the target user (u). The node td represents the time of day, and w determines whether a given day is a work day or a free day (weekend or a national holiday). Each of the remaining observed nodes (f1 through fn) represents the location of one of the target user's friends. Since the average node degree of geo-active users is 9.2, we concentrate on n ∈ {0, 1, 2, ..., 9}, although our approach works for arbitrary nonnegative values of n. Each node is indexed by time slice.

Figure 3: Two consecutive time slices (u_t, f1_t, ..., fn_t, td_t, w_t and u_{t+1}, f1_{t+1}, ..., fn_{t+1}, td_{t+1}, w_{t+1}) of our dynamic Bayesian network for modeling motion patterns of Twitter users from n friends. All nodes are discrete; shaded nodes represent observed random variables, unfilled nodes denote hidden variables.

The domains of the random variables are generated from the Twitter dataset in the following way. First, for each user, we extract a set of distinct locations they tweet from. Then, we iteratively merge (cluster) all locations that are within 100 meters of each other in order to account for GPS sensor noise, which is especially severe in areas with tall buildings, such as Manhattan. The location merging is done separately for each user, and we call the resulting locations unique. We subsequently remove all merged locations that the user visited fewer than five times and assign a unique label to each remaining place. These labels are the domains of u and the fi's. We call such places significant. This place indexing yields a total of 89,077 unique locations, out of which 25,830 were visited at least five times by at least one user. There were 2,467,149 tweets total posted from the significant locations in the 4-week model evaluation period. Table 1 lists summary statistics.

We model each person's location in 20-minute increments, since more than 90% of the users tweet with lower frequency. Therefore, the domain of the time-of-day random variable td is {0, ..., 71} (a total of 24 / (1/3) = 72 intervals in any given day).
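The place-indexing step can be sketched in a few lines of Python. The greedy centroid-based merge below is our own approximation of the iterative 100-meter clustering (the exact merge order is not specified here); the five-visit significance threshold and the 20-minute time slicing follow the text directly.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))

def significant_places(tweet_coords, radius_m=100.0, min_visits=5):
    """Per-user place indexing: greedily fold each tweet's coordinates into
    the first cluster whose running centroid lies within radius_m, then keep
    only the clusters visited at least min_visits times."""
    clusters = []  # each entry: [centroid (lat, lon), visit count]
    for p in tweet_coords:
        for c in clusters:
            if haversine_m(p, c[0]) <= radius_m:
                n = c[1]  # update the running centroid (fine at city scale)
                c[0] = ((c[0][0] * n + p[0]) / (n + 1),
                        (c[0][1] * n + p[1]) / (n + 1))
                c[1] += 1
                break
        else:
            clusters.append([p, 1])
    return [c[0] for c in clusters if c[1] >= min_visits]

def time_slice(hour: int, minute: int) -> int:
    """Index of the 20-minute slice of the day: td in {0, ..., 71}."""
    return hour * 3 + minute // 20
```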
5.2.1 Learning

We explore both supervised and unsupervised learning of user mobility. In the former case, for each user, we train a DBN on the first three weeks of data with known hidden location values. In the latter case, the hidden labels are unknown to the system.

During supervised learning, we find a set of parameters (discrete probability distributions) θ that maximizes the log-likelihood of the training data. This is achieved by optimizing the following objective function:

  θ* = argmax_θ log Pr(x_{1:t}, y_{1:t} | θ),        (6)

where x_{1:t} and y_{1:t} represent the sequences of observed and hidden values, respectively, between times 1 and t, and θ* is the set of optimal model parameters. In our implementation, we represent probabilities and likelihoods with their log-counterparts to avoid arithmetic underflow.

For unsupervised learning, we perform expectation-maximization (EM) [9]. In the E step, the values of the hidden nodes are inferred using the current DBN parameter values (initialized randomly). In the subsequent M step, the inferred values of the hidden nodes are in turn used to update the parameters. This process is repeated until convergence, at which point the EM algorithm outputs a maximum-likelihood point estimate of the DBN parameters. The corresponding optimization problem can be written as

  θ* = argmax_θ log Σ_{y_{1:t}} Pr(x_{1:t}, y_{1:t} | θ),        (7)

where we sum over all possible values of the hidden nodes y_{1:t}. Since equation 7 is computationally intractable for sizable domains, we simplify by optimizing its lower bound instead, similar to [13].

The random initialization of the EM procedure has a profound influence on the final set of learned parameter values; as a result, EM is prone to getting "stuck" in a local optimum. To mitigate this problem, we perform deterministic simulated annealing [29]. The basic idea is to reduce the undesirable influence of the initial random set of parameters by "smoothing" the objective function so that it hopefully has fewer local optima. Mathematically, this is written as

  θ*(τ1, ..., τm) = argmax_θ τi log Σ_{y_{1:t}} Pr(x_{1:t}, y_{1:t} | θ)^{1/τi},        (8)

where τ1, ..., τm is a sequence of parameters, each of which corresponds to a different amount of smoothing of the original objective function (shown in equation 7), and the optimization is performed once for each τi in turn. This parameter is often called a temperature in the simulated annealing literature, by analogy with energy in physics. We therefore start with a high temperature τ1 and gradually decrease it; the final temperature τm = 1 recovers the original objective function.
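In the supervised case (equation 6), fitting the discrete CPDs reduces to counting co-occurrences, carried out in log space; in the unsupervised case, deterministic annealing wraps EM in a decreasing temperature schedule. The sketch below is illustrative only: learn_cpd, annealed_em, the add-alpha smoothing, and the e_step/m_step callables are our own constructions, not Flap's internals.

```python
import numpy as np

def learn_cpd(child_vals, parent_vals, n_child, n_parent, alpha=1.0):
    """Supervised case (eq. 6): (smoothed) maximum-likelihood estimate of a
    discrete CPD P(child | parent), returned as log-probabilities to avoid
    arithmetic underflow."""
    counts = np.full((n_parent, n_child), alpha)
    for c, p in zip(child_vals, parent_vals):
        counts[p, c] += 1
    return np.log(counts) - np.log(counts.sum(axis=1, keepdims=True))

def annealed_em(e_step, m_step, theta, temps=(4.0, 2.0, 1.4, 1.0), iters=25):
    """Unsupervised case (eq. 8): deterministic-annealing EM skeleton.
    e_step(theta, inv_temp) should return expected sufficient statistics with
    the complete-data likelihood raised to 1/tau, flattening the posterior
    over hidden states at high temperature; m_step(stats) re-estimates the
    parameters. The final temperature tau = 1 recovers plain EM (eq. 7)."""
    for tau in temps:
        for _ in range(iters):
            theta = m_step(e_step(theta, 1.0 / tau))
    return theta
```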
5.2.2 Inference

At inference time, we are interested in the most likely explanation of the observed data: the most likely sequence of locations visited by the target user, given the sequence of locations visited by one's friends and the corresponding time and day-type information. Flap runs a variant of Viterbi decoding to calculate the most likely states of the hidden nodes. The most likely sequence of hidden states given by Viterbi decoding is

  y*_{1:t} = argmax_{y_{1:t}} Pr(y_{1:t} | x_{1:t}),        (9)

where Pr(y_{1:t} | x_{1:t}) is the conditional probability of a sequence of hidden states y_{1:t} given a concrete sequence of observations x_{1:t} between times 1 and t.

In each time slice, we combine the likelihoods of the observed nodes conditioned on their hidden parent node. Since there is only one hidden node in each time slice, we can apply dynamic programming and achieve polynomial runtime of the Viterbi decoding; specifically, the time complexity is O(T·S²), where T is the number of time slices and S is the number of possible hidden state values. The overall time complexity of inference for any given target user is therefore modest, and the number of EM iterations required in the unsupervised setting remains small. This renders our model tractable even for mobility patterns that evolve over long periods of time. A minimal decoding sketch is shown below.
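The following log-space Viterbi sketch illustrates equation 9 under a simplifying assumption of ours: the per-slice evidence (friend locations, td, w) is collapsed into a single observation symbol per slice, and the table names are hypothetical; Flap itself multiplies the individual observation likelihoods at this point.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit, obs):
    """Most likely hidden sequence y*_{1:t} = argmax Pr(y_{1:t} | x_{1:t}).

    log_init[s]     : log P(y_1 = s)
    log_trans[s, t] : log P(y_{k+1} = t | y_k = s)
    log_emit[s, o]  : log P(x_k = o | y_k = s)
    Runs in O(T*S^2) for T time slices and S hidden states (locations).
    """
    T, S = len(obs), len(log_init)
    score = np.empty((T, S))             # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)   # backpointers for path recovery
    score[0] = log_init + log_emit[:, obs[0]]
    for k in range(1, T):
        cand = score[k - 1][:, None] + log_trans   # S x S candidate scores
        back[k] = cand.argmax(axis=0)
        score[k] = cand.max(axis=0) + log_emit[:, obs[k]]
    path = [int(score[-1].argmax())]
    for k in range(T - 1, 0, -1):        # follow backpointers from the end
        path.append(int(back[k, path[-1]]))
    return path[::-1]
```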
Next, we turn to our experiments.

6. EVALUATION

For clarity, we discuss experimental results for each of Flap's two tasks separately.

6.1 Friendship Prediction

We evaluate Flap on friendship prediction via cross-validation in which we train on one city's data and test on the other's, and vice versa, averaging over the two runs. We varied the amount of observed edges provided to the model at testing time from 0% to 50%.

[ROC figure: true positive rate (sensitivity) versus false positive rate (1 − specificity) for a random classifier, the decision tree baseline, and Flap with 0%, 10%, 25%, and 50% observed edges. The plot shows Flap's performance and compares it to alternative models as the proportion of friendships given to Flap varies from 0% to 50%.]

Flap reconstructs the friendship graph well under a wide range of conditions, even when no edges are given (see the ROC curves and Table 2). It far outperforms the baseline (decision tree), and its precision and recall are comparable to those of [28], even though our dataset is orders of magnitude larger.

We also compare our model with the alternative approaches summarized in Section 2. Following Crandall et al. [2], we conditioned the probability of friendship on the number of contemporaneous events in our Twitter data for various spatial and temporal resolutions. We see that in our dataset the relationship between co-location and friendship is much noisier, and the predictive performance of Crandall et al.'s model is correspondingly poor (0.001%, precision 0.008).

[Accuracy figure: location prediction accuracy [%] as a function of the number of friends leveraged (n = 0, ..., 9), comparing the supervised DBN, the unsupervised DBN, Cho et al.'s PSMM [1], a naive model, and a random baseline.]

Conclusions

We show that much information can be inferred about individuals from their interactions in online social media, without active user participation. Namely, we predict Twitter friendships with high accuracy even when no ties are given. Additionally, we infer fine-grained user location, even when users keep their data private and we can only access the locations of their friends. While each observation type is a weak predictor of friendship when considered in isolation, combining them results in a strong model that accurately identifies the majority of friendships.

References

1. E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: User movement in location-based social networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2011.
2. D. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, and J. Kleinberg. Inferring social ties from geographic coincidences. Proceedings of the National Academy of Sciences, 107(52):22436, 2010.