Mining User Similarity Based on Location History

Yu Zheng, Quannan Li, Xing Xie
Microsoft Research Asia
Outline
• Introduction
• Architecture
– Modeling Location History
– Measuring User Similarity
• Experimental Results
• Conclusion
Introduction (1)
• Goals
– Infer the similarity (correlation) between users from their location histories
– Enable friend recommendation → personalized location recommendation
• Motivation
– The increasing availability of user-generated trajectories
• Life logging, travel experience sharing
• Sports activity analysis, multimedia content management, …
– People’s outdoor movements in the real world imply their interests
• Likes sports: frequently visits gyms and stadiums
• Likes travel: often visits mountains and lakes
– According to the first law of geography
• Everything is related to everything else, but near things are more related than distant things.
• People with similar location histories might share similar interests and preferences.
– Significance of user similarity in Web communities
• Generally, it helps users find more relevant information in a large-scale dataset
• In the GIS community: friend discovery and location recommendation
Introduction (2)
• Difficulty & Challenges
– How to model different users’ location histories uniformly
• Different users’ location histories are inconsistent and not directly comparable
• What is a shared location? Defining it by distance alone is rejected (X)
– How to measure the similarity between users
• By counting the number of shared locations?
• By the Pearson correlation or cosine similarity?
• These measures ignore two important properties of people’s outdoor movements: sequence and hierarchy
• Contribution and insights
– A step towards integrating social networking into GIS
– A hierarchical graph
• Uniformly models different users’ location histories at various scales of geographic space
– A similarity measure that considers
• The sequence property of users’ movement behavior
• The hierarchy property of geographic spaces
Preliminary
• GPS logs P and GPS trajectory
• Stay points S = {s1, s2, …, sn}
– A stay point stands for a geographic region where a user has stayed for a while
– E.g., a user spent more than 20 minutes within a distance of 200 meters
– Carries semantic meaning beyond a raw GPS point
• Location history: $LocH = (s_1 \xrightarrow{\Delta t_1} s_2 \xrightarrow{\Delta t_2} \cdots \xrightarrow{\Delta t_{n-1}} s_n)$
– Represented by a sequence of stay points
– With transition intervals
[Figure: A Stay Point S. A GPS log lists points p1: (Lat1, Lngt1, T1), p2: (Lat2, Lngt2, T2), …, pn: (Latn, Lngtn, Tn); the plot shows points p1 to p7 along a trajectory with a stay point S marked.]
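The stay point definition above can be illustrated with a minimal sketch. It assumes a simple greedy scan in which a run of consecutive GPS points counts as a stay if all of them lie within the distance threshold of the run's first point and the run lasts at least the time threshold. The names `GpsPoint`, `StayPoint`, `haversine_m` and `detect_stay_points` are hypothetical, and the default thresholds are only the example values from the slide (200 meters, 20 minutes); the slides do not specify the detection procedure itself.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class GpsPoint:
    lat: float  # degrees
    lng: float  # degrees
    t: float    # timestamp in seconds


@dataclass
class StayPoint:
    lat: float     # mean latitude of the grouped points
    lng: float     # mean longitude of the grouped points
    arrive: float  # timestamp of the first grouped point
    leave: float   # timestamp of the last grouped point


def haversine_m(a: GpsPoint, b: GpsPoint) -> float:
    """Great-circle distance between two points in meters."""
    earth_radius_m = 6371000.0
    dlat = radians(b.lat - a.lat)
    dlng = radians(b.lng - a.lng)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlng / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(h))


def detect_stay_points(points, dist_m=200.0, time_s=20 * 60):
    """Greedy scan (an assumed procedure): group consecutive points near an anchor."""
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the run while points stay within dist_m of the anchor points[i].
        while j < n and haversine_m(points[i], points[j]) <= dist_m:
            j += 1
        group = points[i:j]
        if group[-1].t - group[0].t >= time_s:
            stays.append(StayPoint(
                lat=sum(p.lat for p in group) / len(group),
                lng=sum(p.lng for p in group) / len(group),
                arrive=group[0].t,
                leave=group[-1].t,
            ))
            i = j       # continue after the detected stay
        else:
            i += 1      # no stay anchored here; try the next point
    return stays
```

The transition interval between two consecutive stay points in a location history could then be taken as the next stay's `arrive` minus the previous stay's `leave`, which is again an assumption of this sketch.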
Architecture (1)
[Figure: architecture overview. GPS logs of users 1, 2, …, i, i+1, …, n-1, n feed two stages: (1) Modeling Location History, which produces a hierarchical graph for each individual, and (2) Measuring Similarity, which produces a similarity score Sij for each pair of users.]
Modeling Location History (1)
[Figure: the architecture overview with the Modeling Location History stage expanded into three steps]
1. Stay point detection
2. Hierarchical clustering
3. Individual graph building
Modeling Location History (2)
1. Stay point detection
2. Hierarchical clustering
3. Individual graph building
[Figure: the stay points of all users are clustered hierarchically into a shared hierarchical framework with Layer 1 (cluster c10), Layer 2 (clusters c20, c21) and Layer 3 (clusters c30 to c34). On each layer, an individual graph (G1, G2, G3) is built for each user, shown for User 1 and User 2. Legend: one marker stands for a stay point S, another for a stay point cluster cij. The layer axis is labeled High and Low.]
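The individual graph building step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each of a user's stay points has already been assigned a cluster ID on every layer of the shared framework, that a graph node is a cluster the user visited, and that a directed edge links two different clusters visited consecutively. The function names and the input layout are hypothetical.

```python
def build_layer_graph(cluster_ids):
    """Build one layer's graph from a time-ordered list of cluster IDs.

    Returns (nodes, edges): the set of visited clusters and the set of
    directed (from, to) pairs for consecutive visits to different clusters.
    """
    nodes = set(cluster_ids)
    edges = set()
    for src, dst in zip(cluster_ids, cluster_ids[1:]):
        if src != dst:  # staying inside the same cluster adds no edge
            edges.add((src, dst))
    return nodes, edges


def build_hierarchical_graph(assignments):
    """Map each layer number to its (nodes, edges) graph.

    `assignments` maps a layer number to the user's time-ordered cluster IDs
    on that layer (a hypothetical input format).
    """
    return {layer: build_layer_graph(ids) for layer, ids in assignments.items()}
```

For example, `build_hierarchical_graph({2: ["c20", "c20", "c21"]})` gives layer 2 the nodes `c20` and `c21` and the single edge `("c20", "c21")`.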
Measuring User Similarity (1)
[Figure: the architecture overview with the Measuring Similarity stage expanded into three steps]
1. Sequence extraction
2. Sequence matching
3. Similarity score calculation
Measuring Similarity (2)
• Similar sequence extraction
[Figure: User 1's hierarchical graph HG1 and User n's personal hierarchical graph. In each, the user's stay points (s1 to s8) are ordered along a time axis and mapped to clusters c30 to c34 in graph G3 on layer 3. The movement on a layer is written as a sequence of clusters, each followed by a count in parentheses, for example c32(1) → c31(2) → c32(1) → c31(1) and c33(2) → c32(2) → c33(1) → c32(1).]
Measuring Similarity (3)
• Sequence matching
– We aim to find the maximum-length similar sequence
– A pair of similar sequences: two individuals visit the same sequence of places with similar transition times
• Same visiting order: $a_i = b_i$
• Similar transition time: $\frac{|\Delta t_j - \Delta t_j'|}{\max(\Delta t_j, \Delta t_j')} \le p$
[Figure: example movements of users u1 and u2 among places A, B and C, written as A(1) → C(1) → B(1) → A(2) and B(1) → A(1) → C(2) → B(2), with transition times between 5 h and 14 h. Of the candidate sequences shown (BA, AC, AB, ABC), ABC is marked as a match (√).]
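The two matching conditions above can be checked with a short sketch. It only tests whether two already aligned candidate sequences form a similar pair; it does not perform the search for the maximum-length similar sequence. The function names, the input layout (a list of places plus a list of transition times between them) and the sample values of the threshold p are assumptions made for illustration.

```python
def transition_similar(dt_a, dt_b, p):
    """Check |dt_a - dt_b| / max(dt_a, dt_b) <= p for two transition times."""
    longest = max(dt_a, dt_b)
    if longest == 0:
        return True  # both transitions are instantaneous
    return abs(dt_a - dt_b) / longest <= p


def is_similar_pair(places_a, gaps_a, places_b, gaps_b, p):
    """Return True if both sequences visit the same places in the same order
    and every pair of corresponding transition times is similar.

    gaps_x[j] is the transition time from places_x[j] to places_x[j + 1].
    """
    if places_a != places_b:
        return False
    if len(gaps_a) != len(places_a) - 1 or len(gaps_b) != len(places_b) - 1:
        return False
    return all(transition_similar(x, y, p) for x, y in zip(gaps_a, gaps_b))
```

With hypothetical transition times in hours, `is_similar_pair(["A", "B", "C"], [5, 8], ["A", "B", "C"], [6.5, 7], p=0.3)` holds, because the ratios are about 0.23 and 0.125, while replacing the second user's gaps with `[6, 14]` fails on the second transition (ratio about 0.43).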
Measuring Similarity (4)
• Similarity calculation
– Two factors
• The length of the matched similar sequence
• The level of the matched similar sequence
– Calculation
1. Calculate a similarity score for each sequence of length m (weighted by its length): $s(m) = \alpha(m) \sum_{i=1}^{m} \min(k_i, k_i')$
2. Add up the similarity scores of the n sequences found on a level: $S_l = \frac{1}{N_1 N_2} \sum_{i=1}^{n} s_i$
3. Take the weighted sum of the scores over the H levels: $S_{overall} = \sum_{l=1}^{H} \beta_l S_l$
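The three calculation steps can be traced with a small sketch. The slides give the shape of the formulas but not the weights, so the weighting functions used in the usage note, alpha(m) = 2^(m-1) and beta_l = 2^(l-1), are assumptions chosen only so that longer sequences and deeper levels count more. N1 and N2 are passed in as plain normalizing constants because the slides do not define them here; all function names are hypothetical.

```python
def sequence_score(counts_a, counts_b, alpha):
    """s(m) = alpha(m) * sum_i min(k_i, k_i') for one matched sequence.

    counts_a and counts_b hold the per-node counts k_i and k_i' of the two
    users along the matched sequence; m is the sequence length.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("matched sequences must have the same length")
    m = len(counts_a)
    return alpha(m) * sum(min(k, k2) for k, k2 in zip(counts_a, counts_b))


def level_score(seq_scores, n1, n2):
    """S_l = (1 / (N1 * N2)) * sum of the sequence scores found on one level."""
    return sum(seq_scores) / (n1 * n2)


def overall_score(level_scores, beta):
    """S_overall = sum over levels l of beta(l) * S_l.

    level_scores maps a level number l to its score S_l.
    """
    return sum(beta(level) * score for level, score in level_scores.items())
```

As a usage example with the assumed weights, a matched sequence with counts (1, 2, 1) and (1, 1, 1) scores 4 * 3 = 12; with N1 = N2 = 8 that gives a level score of 0.1875, and combining level 2 at 0.5 with level 3 at 0.1875 gives 2 * 0.5 + 4 * 0.1875 = 1.75.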
Measuring Similarity (5)
• Example: for User 1, User 3 is more similar than User 2
– Layer 2 (graph G2): User 1: A → B; User 2: A → B; User 3: A → B
– Layer 3 (graph G3): User 1: a → c → e; User 2: b → d; User 3: b → c → e
– Users 2 and 3 both share A → B with User 1 on layer 2, but only User 3 also shares a sequence (c → e) with User 1 on layer 3
Experiments (1)
• GPS Devices and Users
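– 112 users collected the data over the past year
[Figure: two pie charts describing the participants. Age groups: age<=22, 22<age<=25, 26<=age<29, age>=30. Occupations: Microsoft employees, employees of other companies, government staff, college students. Slice values as extracted, without their category assignment: 9%, 16%, 18%, 30%, 14%, 45%, 58%, 10%.]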
Experiments (2)
• GPS dataset
– More than 6 million GPS points
– More than 170,000 kilometers
– 36 cities in China and a few cities in the USA, Korea and Japan
Experiments (3)
• Evaluation approach
– Evaluated as an information retrieval problem
– Ground truth: users label their relationships with the ratings shown in this table

Rating  Relevance level   Suggested relationship
4       Strongly similar  Family members, intimate lovers, roommates
3       Similar           Good friends, workmates, classmates
2       Weakly similar    Ordinary friends, neighbors in a community
1       Different         Strangers in the same city
0       Quite different   Strangers in other cities

[Figure: evaluation procedure. For a query user, the method retrieves the top ten similar users (U2, U3, …, U4). Their ratings are looked up in the relationship matrix over U1, U2, …, Ui, …, Un to get the ground truth, giving a rating vector such as G = (4, 3, 2, 3, 0, 1, …, 0, 0). This vector is compared with the ideally ordered vector (4, 3, 3, 2, 2, 1, …, 0, 0) when calculating nDCG and MAP.]
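How a rating vector turns into an nDCG value can be shown with a short sketch. The slides do not say which nDCG variant was used, so this assumes one common form: linear gain, a log2(rank + 1) discount, and an ideal ordering obtained by sorting the same ratings in descending order. MAP is not covered here. The function names and the sample rating lists are illustrative only.

```python
from math import log2


def dcg(ratings, k):
    """Discounted cumulative gain of the first k ratings (linear gain)."""
    # enumerate() is 0-based, so rank r = i + 1 and the discount is log2(r + 1).
    return sum(r / log2(i + 2) for i, r in enumerate(ratings[:k]))


def ndcg(ratings, k):
    """DCG normalized by the DCG of the same ratings in ideal (sorted) order."""
    ideal = dcg(sorted(ratings, reverse=True), k)
    return dcg(ratings, k) / ideal if ideal > 0 else 0.0
```

A retrieved list whose ratings are already in descending order scores exactly 1.0, and any out-of-order list with the same ratings scores lower, which is what makes the measure sensitive to where the strongly similar users appear in the ranking.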
Experiments (4)
• Comparing with baselines
– Pearson correlation
– Cosine similarity
[Figure: bar chart of MAP (y-axis from 0.72 to 0.96) for each method (x-axis: Methods)]
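For reference, the two baselines can be sketched on plain vectors. The slides do not state how users were represented for the baselines; this sketch assumes each user is a vector of visit counts over the same ordered list of locations, which is an assumption made for illustration. Both functions are the standard textbook definitions, written without external libraries.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for a zero vector; treated as no similarity
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    return cosine_similarity(centered_u, centered_v)
```

Both measures depend only on how often each location was visited, so under this representation two users who visit the same places in a different order receive the same score.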
Experiments (5)
• nDCG comparison
[Figure: bar chart of nDCG@5 and nDCG@10 (y-axis from 0.78 to 0.94) for each method (x-axis: Methods)]
Conclusion
• A hierarchical graph
– A uniform framework for modeling different users’ location histories
– Effectively models users’ outdoor movements
• Sequentially
• Hierarchically
• Our similarity measure outperformed existing methods
– The Pearson correlation and
– The cosine similarity
– Hierarchy + Sequence achieved the best performance
Thanks!
Microsoft Research Asia
yuzheng@microsoft.com