Finding Your Friends and Following Them to Where You Are

Adam Sadilek, Henry Kautz, Jeffrey Bigham
University of Rochester, Department of Computer Science

Overview

Location plays an essential role in our lives, bridging our online and offline worlds. We explore the interplay between people's location, interactions, and their social ties within a large real-world dataset. Our system, Flap, solves two intimately related tasks: link prediction and location prediction in online social networks. We evaluate Flap on a sample of 11 thousand highly active users from New York City and Los Angeles, and show that it

(1) reconstructs the entire friendship graph (60M possible edges) with high accuracy even when no edges are given; and
(2) infers people's fine-grained location, even when they keep their data private and we can only access the location of their friends.

Our models significantly outperform current comparable approaches to either task.
Friendship Prediction

Flap infers social ties by leveraging patterns in friendship formation, the content of people's messages, and user co-location. While each observation type is a weak predictor of friendship when considered in isolation, combining them results in a strong model that accurately identifies the majority of friendships. Friendships on Twitter are predicted by the belief propagation process summarized below.

Our probabilistic model of the friendship network is a Markov random field that has a hidden node for each possible friendship. Since the friendship relationship is symmetric and irreflexive, our model contains n(n − 1)/2 hidden nodes, where n is the number of users. Each hidden node is connected to an observed node (DT_E) and to all other hidden nodes.

Ultimately, we are interested in the probability of existence of an edge (friendship) given the current graph structure and the pairwise features of the vertices (users) the edge is incident on. Applying Bayes' theorem while assuming mutual independence of the features DT_E and M_E, we can write

    P(E = 1 | DT_E = d, M_E = m)
        = P(DT_E = d | E = 1) P(M_E = m | E = 1) P(E = 1) / Z
        = P(DT_E = d | E = 1) P(E = 1 | M_E = m) / Z,        (5)

where

    Z = Σ_{i ∈ {0,1}} P(DT_E = d | E = i) P(E = i | M_E = m).

E, DT_E, and M_E are random variables that represent edge existence, DT score, and M score, respectively. In equation 5, we applied the equality P(M_E | E) = P(E | M_E) P(M_E) / P(E) and subsequent simplifications so that we do not need to explicitly model P(E).
The DT_E and M_E scores jointly represent the largest information value among our features. Finally, while calculating the features for all pairs of n users is an O(n^2) operation, it can be significantly sped up via locality-sensitive hashing [8].
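Since the pairwise feature computation is quadratic, one cheap way to see the hashing idea is to bucket users on a coarse spatial grid and compute full features only for pairs that ever share a cell. The sketch below is a minimal stand-in for the scheme cited as [8], not the paper's implementation; the cell size, the candidate_pairs name, and the input format are our assumptions.

# Minimal stand-in for the locality-sensitive-hashing speedup cited as [8]:
# bucket users on a coarse spatial grid so that full pairwise features are
# computed only for pairs that ever share a cell, instead of all O(n^2) pairs.
from collections import defaultdict
from itertools import combinations

def candidate_pairs(user_fixes, cell_deg=0.01):
    """user_fixes: dict user_id -> iterable of (lat, lon) GPS fixes.
    Returns the set of user pairs sharing at least one ~1 km grid cell.
    (A real LSH scheme would also probe neighboring cells/bands.)"""
    buckets = defaultdict(set)
    for user, fixes in user_fixes.items():
        for lat, lon in fixes:
            buckets[(int(lat // cell_deg), int(lon // cell_deg))].add(user)
    pairs = set()
    for users in buckets.values():
        pairs.update(combinations(sorted(users), 2))
    return pairs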
5.1.2 Learning and Inference

At learning time, we first train a regression decision tree DT and prune it using ten-fold cross-validation to prevent overfitting. We also perform maximum likelihood learning of the parameters P(DT_E | E) and P(E | M_E). We chose the decision tree pre-processing step for several reasons. First, the text- and location-based features considered individually or independently have very poor predictive power; therefore, models such as logistic regression tend to have low accuracy. Furthermore, the relationship between the observed attributes of a pair of users and their friendship is often quite complex. For example, it is not simply the case that a friendship is more and more likely to exist as people spend larger and larger amounts of time near each other. Consider two strangers who happen to take the same train to work and tweet every time it goes through a station; our dataset contains a number of instances of this sort. During the train ride, their co-location could not be higher, and yet they are not friends on Twitter. This largely precludes the success of classifiers that look for a simple decision surface.

At inference time, we use DT to make preliminary predictions on the test data. Next, we execute a customized loopy belief propagation algorithm that is initialized with the probabilities estimated by DT (see Algorithm 1). Step 6 is where an edge receives belief updates from the other edges as well as the DT prior. Even though the graphical model is dense, our algorithm converges within several hundred iterations, due in part to the sufficiently accurate initialization and regularization provided by the decision tree. Note that the algorithm can also function in an online fashion: as new active users appear in the Twitter public timeline, they are processed by the decision tree and added to Q. This is an attractive mode, where the model is always up to date and takes advantage of all available data.

Algorithm 1: refineEdgeProbabilities(Q)
Input: Q: list containing all potential edges between pairs of vertices, along with their preliminary probabilities
Output: Q: input list Q with refined probabilities
1: while Q has not converged do
2:   sort Q high to low by estimated edge probability
3:   for each ⟨e, P(e)⟩ in Q do
4:     dt ⇐ DT(e)
5:     m ⇐ M(e)
6:     P(e) ⇐ P(DT_E = dt | E = 1) P(E = 1 | M_E = m) / Σ_{i ∈ {0,1}} P(DT_E = dt | E = i) P(E = i | M_E = m)
7:   end for
8: end while
9: return Q
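As a concrete illustration of Algorithm 1, here is a minimal Python sketch. The callables dt_score, m_score, p_dt_given_e, and p_e_given_m stand in for DT(e), M(e), P(DT_E | E), and P(E | M_E); the convergence test and sweep cap are our assumptions, since the excerpt does not specify them.

# Sketch of Algorithm 1 (refineEdgeProbabilities). dt_score, m_score,
# p_dt_given_e, and p_e_given_m are hypothetical stand-ins for DT(e), M(e),
# P(DT_E | E), and P(E | M_E).
def refine_edge_probabilities(Q, dt_score, m_score,
                              p_dt_given_e, p_e_given_m,
                              max_sweeps=500, tol=1e-6):
    """Q: dict mapping a candidate edge e -> its preliminary probability P(e)."""
    for _ in range(max_sweeps):
        max_delta = 0.0
        # Step 2: process edges from most to least probable.
        for e in sorted(Q, key=Q.get, reverse=True):
            dt = dt_score(e)        # step 4: decision-tree score of e
            m = m_score(e, Q)       # step 5: M score; recomputed from current
                                    # beliefs about the other edges, which is
                                    # what propagates information between them
            # Step 6: posterior update from equation (5).
            num = p_dt_given_e(dt, 1) * p_e_given_m(1, m)
            den = sum(p_dt_given_e(dt, i) * p_e_given_m(i, m) for i in (0, 1))
            new_p = num / den if den > 0 else Q[e]
            max_delta = max(max_delta, abs(new_p - Q[e]))
            Q[e] = new_p
        if max_delta < tol:         # step 1: stop once Q has converged
            break
    return Q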
Location Prediction

Flap implements a dynamic probabilistic model of human mobility, where we treat users with known GPS positions as noisy sensors of the location of their friends. The dynamic Bayesian network model shown below allows us to infer the most likely sequence of locations visited by a user, given the location of his friends (f_1, . . . , f_n), the time of day, and the day type (work day vs. free day). For supervised learning, the optimal set of parameters Θ given the observed (x) and hidden (y) random variables can be estimated directly from training data; in unsupervised learning, we apply an expectation-maximization method.

5.2 Location Prediction

The goal of Flap's location prediction component is to infer the most likely location of person u at any time. The input consists of a sequence of locations visited by u's friends (and, for supervised learning, locations of u himself over the training period), along with corresponding time information. The model outputs the most likely sequence of locations u visited over a given time period.

We model user location in a dynamic Bayesian network shown in Figure 3. In each time slice, we have one hidden node and a number of observed nodes, all of which are discrete. The hidden node represents the location of the target user (u). The node td represents the time of day, and w determines whether a given day is a work day or a free day (weekend or a national holiday). Each of the remaining observed nodes (f_1 through f_n) represents the location of one of the target user's friends. Since the average node degree of geo-active users is 9.2, we concentrate on n ∈ {0, 1, 2, . . . , 9}, although our approach works for arbitrary nonnegative values of n.

Figure 3: Two consecutive time slices (nodes u_t, f_1^t, . . . , f_n^t, td_t, w_t and their counterparts at time t+1) of our dynamic Bayesian network for modeling motion patterns of Twitter users from n friends. All nodes are discrete; shaded nodes represent observed random variables, unfilled ones denote hidden variables. Each node is indexed by time slice.
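To make Figure 3 concrete, here is one plausible parameterization of a slice: a transition table for the hidden location conditioned on the previous location and the (td, w) context, plus a per-friend emission table that treats f_i as a noisy sensor of u. The excerpt does not pin down the exact conditioning, so this factorization, the DBNParams name, and the dict layout are assumptions for illustration.

import math
from dataclasses import dataclass, field

@dataclass
class DBNParams:
    # Domain of the hidden node u: the user's significant places.
    locations: list
    # P(u_t | u_{t-1}, td, w) as transition[(u_prev, td, w)][u_t]; we keep
    # the initial distribution under the key (None, td, w).
    transition: dict = field(default_factory=dict)
    # Noisy-sensor model P(f_i^t | u_t) per friend i, as emission[i][u_t][f].
    emission: dict = field(default_factory=dict)

    def log_prob_slice(self, u_prev, u_cur, td, w, friend_locs):
        """Log-probability contributed by one time slice."""
        lp = math.log(self.transition.get((u_prev, td, w), {}).get(u_cur, 1e-12))
        for i, f in enumerate(friend_locs):
            if f is not None:  # friend i tweeted during this slice
                lp += math.log(self.emission.get(i, {}).get(u_cur, {}).get(f, 1e-12))
        return lp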
The domains of the random variables are generated from the Twitter dataset in the following way. First, for each user, we extract a set of distinct locations they tweet from. Then, we iteratively merge (cluster) all locations that are within 100 meters of each other in order to account for GPS sensor noise, which is especially severe in areas with tall buildings, such as Manhattan. The location merging is done separately for each user, and we call the resulting locations unique. We subsequently remove all merged locations that the user visited fewer than five times and assign a unique label to each remaining place. These labels are the domains of u and the f_i's. We call such places significant.
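One way to read the merging step is as greedy agglomeration: each GPS fix joins the first existing cluster whose centroid lies within 100 meters, and infrequent places are dropped at the end. The sketch below follows that reading; the haversine helper and the greedy merge order are our assumptions, not the paper's stated procedure.

import math

def _meters(p, q):
    """Approximate haversine distance in meters between (lat, lon) pairs."""
    r = 6371000.0
    la1, lo1, la2, lo2 = map(math.radians, (*p, *q))
    h = (math.sin((la2 - la1) / 2) ** 2
         + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))

def merge_locations(fixes, radius=100.0, min_visits=5):
    """Greedily cluster one user's GPS fixes; keep places visited >= 5 times.
    Returns a list of (centroid, visit_count) for the significant places."""
    clusters = []  # each cluster: [sum_lat, sum_lon, count]
    for p in fixes:
        for c in clusters:
            centroid = (c[0] / c[2], c[1] / c[2])
            if _meters(p, centroid) <= radius:
                c[0] += p[0]; c[1] += p[1]; c[2] += 1
                break
        else:
            clusters.append([p[0], p[1], 1])
    return [((c[0] / c[2], c[1] / c[2]), c[2])
            for c in clusters if c[2] >= min_visits]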
The above place indexing yields a total of 89,077 unique locations, out of which 25,830 were visited at least five times by at least one user. There were 2,467,149 tweets in total posted from the significant locations in the four-week model evaluation period. Table 1 lists summary statistics.

We model each person's location in 20-minute increments, since more than 90% of the users tweet with lower frequency. Therefore, the domain of the time-of-day random variable td is {0, . . . , 71} (a total of 72 twenty-minute intervals in any given day).
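Mapping a timestamp onto this discretization is mechanical; a small sketch (the workday rule below ignores national holidays, which the paper also counts as free days):

from datetime import datetime

def time_slot(ts: datetime) -> int:
    """Index of the 20-minute interval of the day: 0..71."""
    return (ts.hour * 60 + ts.minute) // 20

def is_work_day(ts: datetime) -> bool:
    """True for Mon-Fri; a national-holiday calendar lookup (also a free
    day in the paper's sense) is omitted here."""
    return ts.weekday() < 5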
5.2.1 Learning

We explore both supervised and unsupervised learning of user mobility. In the former case, for each user, we train a DBN on the first three weeks of data with known hidden location values. In the latter case, the hidden labels are unknown to the system.

During supervised learning, we find a set of parameters (discrete probability distributions) θ that maximize the log-likelihood of the training data. This is achieved by optimizing the following objective function:

    θ* = argmax_θ log Pr(x_{1:t}, y_{1:t} | θ),        (6)

where x_{1:t} and y_{1:t} represent the sequences of observed and hidden values, respectively, between times 1 and t, and θ* is the set of optimal model parameters. In our implementation, we represent probabilities and likelihoods with their log-counterparts to avoid arithmetic underflow.
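Because all nodes are observed in the supervised case, objective (6) decomposes per conditional probability table, and the maximizing parameters are just normalized counts. A minimal sketch under the slice layout assumed earlier (the function name and the add-alpha smoothing are ours, not the paper's):

from collections import Counter, defaultdict

def fit_transition_cpt(slices, alpha=1.0):
    """Maximum-likelihood estimate of P(u_t | u_{t-1}, td, w) from fully
    observed training slices (u_prev, u_cur, td, w), as in equation (6).
    Add-alpha smoothing is our addition. Downstream scoring takes logs of
    these entries, so sequence likelihoods become sums of log-probabilities
    rather than products, avoiding arithmetic underflow."""
    counts = defaultdict(Counter)
    for u_prev, u_cur, td, w in slices:
        counts[(u_prev, td, w)][u_cur] += 1
    domain = {u for c in counts.values() for u in c}
    cpt = {}
    for ctx, c in counts.items():
        total = sum(c.values()) + alpha * len(domain)
        cpt[ctx] = {u: (c[u] + alpha) / total for u in domain}
    return cpt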
For unsupervised learning, we perform expectation-maximization (EM) [9]. In the E step, the values of the hidden nodes are inferred using the current DBN parameter values (initialized randomly). In the subsequent M step, the inferred values of the hidden nodes are in turn used to update the parameters. This process is repeated until convergence, at which point the EM algorithm outputs a maximum-likelihood point estimate of the DBN parameters. The corresponding optimization problem can be written as

    θ* = argmax_θ log Σ_{y_{1:t}} Pr(x_{1:t}, y_{1:t} | θ),        (7)

where we sum over all possible values of the hidden nodes y_{1:t}. Since equation 7 is computationally intractable for sizable domains, we simplify by optimizing its lower bound instead, similar to [13].

The random initialization of the EM procedure has a profound influence on the final set of learned parameter values. As a result, EM is prone to getting "stuck" in a local optimum.
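Before turning to the fix for this problem, here is a compact sketch of the loop itself, reusing the pieces above: the E step fills in the hidden locations with the current parameters (e.g., the Viterbi decoder of Section 5.2.2), and the M step refits the CPTs from the completed data. Keeping only the single best decoding makes this hard EM, a simplification on our part; the paper's E step is not spelled out in this excerpt.

def hard_em(observations, init_params, decode, refit, max_iters=50):
    """observations: the observed node values for each time slice.
    decode(params, observations) -> most likely hidden location sequence.
    refit(observations, hidden) -> new parameters (cf. fit_transition_cpt).
    Hard-EM simplification: the E step keeps only the single best decoding."""
    params = init_params                        # random initialization
    hidden = None
    for _ in range(max_iters):
        new_hidden = decode(params, observations)   # E step
        if new_hidden == hidden:                    # converged
            break
        hidden = new_hidden
        params = refit(observations, hidden)        # M step
    return params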
To mitigate this problem, we perform deterministic simulated annealing [29]. The basic idea is to reduce the undesirable influence of the initial random set of parameters by "smoothing" the objective function so that it hopefully has fewer local optima. Mathematically, this is written as

    θ*(τ_1, . . . , τ_m) = argmax_θ τ_i log Σ_{y_{1:t}} Pr(x_{1:t}, y_{1:t} | θ)^{1/τ_i},        (8)

where τ_1, . . . , τ_m is a sequence of parameters, each of which corresponds to a different amount of smoothing of the original objective function (shown in equation 7). This sequence is often called a temperature in the simulated annealing literature, because it plays a role analogous to energy in physics. Therefore, we begin with a high temperature τ_1 and gradually decrease it until τ_m = 1, which recovers the original objective function.
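Operationally, the annealing schedule just wraps the EM loop: each temperature τ_i optimizes the smoothed objective while warm-starting from the previous stage, and the final τ = 1 recovers objective (7). Raising the per-sequence likelihood to 1/τ amounts to dividing all log-probabilities by τ before they compete inside the sum. The geometric schedule below is an assumed example; the paper does not state its schedule in this excerpt.

def annealed_em(observations, init_params, em_at_temperature,
                tau_start=8.0, steps=6):
    """Deterministic annealing wrapper around EM, per equation (8).
    em_at_temperature(params, observations, tau) runs EM on the smoothed
    objective, i.e., with every log Pr(x, y | theta) divided by tau.
    The geometric schedule ending at tau = 1 is an assumed example."""
    taus = [tau_start * (1.0 / tau_start) ** (i / (steps - 1))
            for i in range(steps)]          # tau_start, ..., 1.0
    params = init_params
    for tau in taus:                        # warm-start each stage
        params = em_at_temperature(params, observations, tau)
    return params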
5.2.2 Inference

At inference time, we are interested in the most likely explanation of the observed data: the sequence of locations visited by our target user, given the observed locations of his friends and the corresponding time and day-type information. To find the most likely sequence of locations, Flap runs a variant of Viterbi decoding, given by

    y*_{1:t} = argmax_{y_{1:t}} Pr(y_{1:t} | x_{1:t}),

where Pr(y_{1:t} | x_{1:t}) is the conditional probability of a sequence of hidden states y_{1:t} given a concrete sequence of observations x_{1:t} between times 1 and t.

In each time slice, the hidden node is conditioned on the observed nodes and on its hidden parent node from the previous slice. By retaining, for each possible value of the hidden node in each time slice, only the most likely sequence ending in that value, Viterbi decoding achieves polynomial runtime: specifically, the time complexity is polynomial in T, the number of time slices, and in the number of possible hidden state values. The overall cost of inference for any given target user therefore remains small, even when multiplied by the number of EM iterations during unsupervised learning. This renders our model tractable for large domains that evolve over long periods of time.
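A minimal log-space Viterbi decoder over the slice parameterization sketched after Figure 3 (that dict layout and the helper names are our earlier assumptions, not the paper's data structures):

def viterbi(params, obs):
    """Most likely hidden location sequence y*_{1:t} (Section 5.2.2).
    params: the DBNParams sketch from Section 5.2; obs: list of
    (td, w, friend_locs) tuples, one per 20-minute slice.
    Runtime is O(T * |locations|^2) for T slices, i.e., polynomial."""
    td, w, friends = obs[0]
    best = {u: params.log_prob_slice(None, u, td, w, friends)
            for u in params.locations}
    back = []  # back[t][u] = best predecessor of u in slice t
    for td, w, friends in obs[1:]:
        new_best, ptr = {}, {}
        for u in params.locations:
            new_best[u], ptr[u] = max(
                (best[v] + params.log_prob_slice(v, u, td, w, friends), v)
                for v in params.locations)
        back.append(ptr)
        best = new_best
    u = max(best, key=best.get)         # backtrack from the best final state
    path = [u]
    for ptr in reversed(back):
        u = ptr[u]
        path.append(u)
    return list(reversed(path))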
Next, we turn to our experimental evaluation.

6. EVALUATION

For clarity, we discuss experimental results for each of Flap's two tasks separately.

6.1 Friendship Prediction

We evaluate Flap on friendship prediction using cross-validation: the model is trained on one part of the data and tested on the rest, and vice versa, over a number of runs. We varied the amount of observed edges provided to the model, from 0% to 50% of the friendship graph. Flap reconstructs the friendship graph under a wide range of conditions, even when no edges are observed (see Table 2). It far outperforms the baseline models (including the decision tree alone) in both ROC performance and precision, and our results are comparable to those of [28], even though our dataset is orders of magnitude larger. We also examined the predictive power of the number of contemporaneous co-location events on its own and see that, in our dataset, its relationship to friendship is much weaker and its predictive performance poor. Finally, we compare our method to the alternative approaches summarized in Section 2.

[Figure: ROC curves (true positive rate / sensitivity vs. false positive rate / 1−specificity) for friendship prediction. The plot shows Flap's belief propagation model with 0%, 10%, 25%, and 50% observed edges, compared against decision tree, text, location, structure, and random baselines.]
[Figure: Location prediction accuracy [%] as a function of the number of friends leveraged (n = 0 through 9). Flap's supervised and unsupervised DBNs are compared against Cho et al.'s PSMM [1], Crandall et al.'s approach [2], a decision tree baseline, and a random classifier.]
Conclusions

We show that much information can be inferred about individuals from their interactions in online social media, without active user participation. Namely, we predict Twitter friendships with high accuracy even when no ties are given, and we additionally infer fine-grained user location, even when users keep their data private and we can only access the locations of their friends.
References

1. E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: User movement in location-based social networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2011.
2. D. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, and J. Kleinberg. Inferring social ties from geographic coincidences. Proceedings of the National Academy of Sciences, 107(52):22436, 2010.