trajectory tutorial.pptx

advertisement
Similarity-based Analysis
for Trajectory Data
Kevin Zheng
25/04/2014
DASFAA 2014 Tutorial
1
Outline
•  Background
–  What is trajectory
–  Where do they come from
–  Why are they useful
–  Characteristics
•  Trajectory similarity search
–  Query classification
–  Trajectory similarity measures
–  Trajectory index
•  Similarity-based trajectory mining
–  Popular route mining
–  Co-traveller discovery
–  Trajectory clustering
25/04/2014
DASFAA 2014 Tutorial
2
Outline
•  Background
–  What is trajectory
–  Where do they come from
–  Why are they useful
–  Characteristics
•  Trajectory similarity search
–  Query classification
–  Trajectory similarity measures
–  Trajectory index
•  Similarity-based trajectory mining
–  Popular route mining
–  Co-traveller discovery
–  Trajectory clustering
25/04/2014
DASFAA 2014 Tutorial
3
What is trajectory?
•  Historical location records of moving objects
•  In mathematics
–  Continuous function: time à location
–  Location can be any dimension
•  In real applications
–  Locations are sampled periodically
–  A finite sequence of time-stamped locations: <p1,
t1>, <p2, t2> …, <pn, tn>
–  p: two or three dimensions (longitude, latitude)
25/04/2014
DASFAA 2014 Tutorial
4
Where is it from?
25/04/2014
DASFAA 2014 Tutorial
5
Where is it from?
•  GPS module on moving objects
–  Vehicles, mobile phone users, animals
•  Online social network
–  Twitter, Flickr, Facebook, Weibo
•  Sensors
–  Surveillance cameras, RFID, WiFi
•  More …
25/04/2014
DASFAA 2014 Tutorial
6
Who cares about it?
•  Government
–  Traffic pattern analysis
–  Public transportation management
–  Urban planning
•  Business
–  Location-based service
–  Personalized advertisement & recommendation
–  Taxi company, logistic company
•  Scientists & Researchers
–  Zoologist, meteorologist, astronomer
–  Open problems, challenging tasks
•  More …
25/04/2014
DASFAA 2014 Tutorial
7
Trajectory data are BIG
•  Volume
•  Velocity
•  Variety
25/04/2014
DASFAA 2014 Tutorial
8
Volume
•  In 2010, 1 billion vehicles
–  Taxi, logistic companies keep tracking their
vehicles
–  Self-driving car in near future?
•  In 2012, 1.08 billion smartphone users
•  In 2013, 20 million surveillance cameras in
China
•  They are generator!
–  The data keep accumulated
25/04/2014
DASFAA 2014 Tutorial
9
Velocity
•  Not just huge, they’re being generated quickly
•  Vehicle tracking & navigation
–  Re-position every few seconds
•  Geo-tagged social media
–  2 million Flickr photos per day, 5% geo-tagged
–  100 million posts on Sina Weibo per day, 1-2%
geo-tagged
–  400 million tweets per day, 1% geo-tagged
•  Sensors
–  How many cars pass a road camera every day?
25/04/2014
DASFAA 2014 Tutorial
10
Geo-tagged tweets
Images courtesy of Twitter
25/04/2014
DASFAA 2014 Tutorial
11
Variety
•  Data source
•  Tracking devices
–  Car GPS, smartphones, sensors
•  Tracking methods
–  Sampling strategy, sampling rate,
•  Spatial length & temporal duration
•  Data quality
25/04/2014
DASFAA 2014 Tutorial
12
Research directions
• 
• 
• 
• 
• 
Scalable, real-time data processing
Flexible database storage and index
Effective similarity measures
Uncertainty management
Data compression
Key and fundamental research problem: similarity-based analysis
25/04/2014
DASFAA 2014 Tutorial
13
Outline
•  Background
–  What is trajectory
–  Where do they come from
–  Why are they useful
–  Characteristics
•  Trajectory similarity search
–  Query classification
–  Trajectory similarity measures
–  Trajectory index
•  Similarity-based trajectory mining
–  Popular route mining
–  Co-traveller discovery
–  Trajectory clustering
25/04/2014
DASFAA 2014 Tutorial
14
Similarity-based analysis for
trajectories
•  Core problem: trajectory similarity search
–  Input: a trajectory dataset D, a query Q
–  Output: a subset of D that are ‘similar’ to Q
•  Foundation
–  Trajectory similarity measures
•  Approach
–  Index and search algorithm
•  Application
–  Popular route mining (route recommendation)
–  co-traveller discovery, clustering, classification, etc…
25/04/2014
DASFAA 2014 Tutorial
15
Similarity query classification
•  P-query
–  Query: point(s)
•  R-query
–  Query: region (spatial & temporal dimension)
•  T-query
–  Query: trajectory
25/04/2014
DASFAA 2014 Tutorial
16
P-query (single point)
Query location: q
Temporal constraint (optional): tc = [ts, te]
ts
te
q
ts
𝐷(𝑞, 𝑇)=​𝑚𝑖𝑛⁠𝑑𝑖𝑠𝑡(𝑞,𝑝) 𝑝∈𝑇 and satisfy tc
dist(q,p):
-  Lp-norm
-  Network distance
te
[Tao2002] Tao Y., Papadias D. and Shen Q., Continuous nearest neighbour search, VLDB, 2002
25/04/2014
DASFAA 2014 Tutorial
17
P-query (multiple points)
q1
q2
Query locations Q: q1, q2, q3, q4
q3
D(Q,T) is an aggregate
function of D(q,T)
q4
[Chen2010] Chen Z., Shen HT., Zhou X., Zheng Y and Xie X., Searching trajectories by locations – an efficiency study. SIGMOD 2010
25/04/2014
DASFAA 2014 Tutorial
18
R-query
•  Spatial region: R
•  Temporal interval:[ts, te]
ts
Ask for trajectories in a given
region during a time interval
R
te
ts
te
[Pfoster 2000] Dieter Pfoster, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000
25/04/2014
DASFAA 2014 Tutorial
19
T-query
•  Query: Tq
How to measure their
distance?
Tq
25/04/2014
DASFAA 2014 Tutorial
20
Trajectory similarity measures
• 
• 
• 
• 
• 
• 
Many-to-many mapping
Different semantic/applications
Different lengths
Different sampling rates
Noises
Temporal dimension?
25/04/2014
DASFAA 2014 Tutorial
21
Classification
Consider location
only
Based on
location samples
Discrete
Continuous
Consider both
location and time
Spatial-only
Spatial-temporal
Lp-norm
DTW
LCSS
EDR
DTW, LCSS, EDR with time
constrain
OWD
LIP
Synchronous Euclidean
Distance
Based on line
segments or curves
25/04/2014
DASFAA 2014 Tutorial
22
Classification
Spatial-only
Discrete
Continuous
25/04/2014
Spatial-temporal
Lp-norm
DTW
LCSS
EDR
DTW, LCSS, EDR with time
constrain
OWD
LIP
Synchronous Euclidean
Distance
DASFAA 2014 Tutorial
23
Lp-norm
•  Average Lp-norm distance of all matched
locations
•  1-to-1 mapping
•  Trajectories are of the same length
25/04/2014
DASFAA 2014 Tutorial
24
Lp-norm
•  Cannot detect similar trajectories with different
sampling rates
•  Sensitive to noise
25/04/2014
DASFAA 2014 Tutorial
25
DTW
•  Dynamic Time Warping distance
–  Adaptation from time series distance measure
–  Used to handle time shift and scale in time series
•  Optimal order-aware alignment between two
sequences
–  Goal: minimize the aggregate distance between
matched points
•  1-to-many mapping
Yi, Byoung-Kee, Jagadish, HV and Faloutsos, Christos, Efficient retrieval of similar time sequences under time warping. ICDE 1998
25/04/2014
DASFAA 2014 Tutorial
26
DTW for trajectories
•  Nothing to do with ‘time’ at all
•  Useful when detecting similar trajectories with
different sampling rates
•  Sensitive to noise
25/04/2014
DASFAA 2014 Tutorial
27
LCSS
•  Longest Common Sub-Sequence
•  Adaptation of string similarity
–  Lcss(‘abcde’,’bd’) = 2
•  Threshold-based equality relationship
–  Two locations are regarded as equal if they’re
‘close’ (compared to a threshold)
•  1-to-(1 or null) mapping
VLACHOS, M., GUNOPULOS, D., AND KOLLIOS, G. Discovering similar multidimensional trajectories. ICDE 2002
25/04/2014
DASFAA 2014 Tutorial
28
LCSS
•  Insensitive to noise
•  Not easy to define threshold
•  May return dissimilar trajectories
p5
p3
p4
p2
p1
25/04/2014
p’1
p’3
p’2
DASFAA 2014 Tutorial
29
EDR
•  Edit Distance on Real sequence
•  Adaptation from Edit Distance on strings
–  Number of insert, delete, replace needed to convert
A into B
•  Threshold-based equality relationship
–  Two locations are regarded as equal if they’re
‘close’ (compared to a threshold)
Lei Chen, M. Tamer Ozsu, Vincent Oria, Robust and Fast Similarity Search for Moving Object Trajectories. SIGMOD 2005
25/04/2014
DASFAA 2014 Tutorial
30
EDR
•  Value means the number of operations, not
“distance between locations”
–  Insensitive to noise
replace
p5
insert
p3
p4
p2
insert
25/04/2014
p1
p’1
p’3
p’2
DASFAA 2014 Tutorial
31
LCSS and EDR
•  They are both count-based
–  LCSS counts the number of matched pairs
–  EDR counts the cost of operations needed to fix
the unmatched pairs
•  Higher LCSS, lower EDR
•  If cost(replace) = cost(insert) + cost(delete):
•  EDR(X,Y) = L(X)+L(Y) – 2LCSS(X,Y)
25/04/2014
DASFAA 2014 Tutorial
32
Classification
Spatial-only
Discrete
Continuous
25/04/2014
Spatial-temporal
Lp-norm
DTW
LCSS
EDR
DTW, LCSS, EDR with time
constrain
OWD
LIP
Synchronous Euclidean
Distance
DASFAA 2014 Tutorial
33
OWD
•  One Way Distance from T1 to T2 is:
–  Integral of the distance from points of T1 to T2
–  Divided by the length of T1
•  Make it into symmetric measure
Bin Lin, Jianwen Su, One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories. In Geoinformatica (2008)
25/04/2014
DASFAA 2014 Tutorial
34
OWD example
•  Consider one trajectory as piece-wise line
segment, and the other as discrete samples
25/04/2014
DASFAA 2014 Tutorial
35
LIP distance
•  Locality In-between Polylines
–  Polygon is the set of polygons formed between
intersection points
– 
Nikos Pelekis et al, Similarity Search in Trajectory Databases. Symposium on Temporal Representation and Reasoning 2007
25/04/2014
DASFAA 2014 Tutorial
36
LIP distance
•  Only work for 2-dimensional trajectories
•  Polygon à polyhedron: non-trivial change
25/04/2014
DASFAA 2014 Tutorial
37
Spatial-temporal similarity measure
•  All the spatial-only similarity measures can
incorporate time information in trajectories
•  Lp-norm and DTW can apply a temporal
constrain for more synchronized alignment
•  EDR and LCSS can apply a temporal threshold
on top of spatial threshold
25/04/2014
DASFAA 2014 Tutorial
38
Classification
Spatial-only
Discrete
Continuous
25/04/2014
Spatial-temporal
Lp-norm
DTW
LCSS
EDR
DTW, LCSS, EDR with time
constrain
OWD
LIP
Synchronous Euclidean
Distance
DASFAA 2014 Tutorial
39
DTW with temporal constrain
Time tolerance = 2
13
15
10
9
7
10
25/04/2014
DASFAA 2014 Tutorial
40
Spatial-temporal LCSS
•  A temporal threshold controls how far in time
we can go in order to match two locations
p5
t3=4
p3
t4=7
p4
t2=2 p2
t1=1 p1
p’1
t’1=1
t5=11
Time threshold=2
p’3 t’3=4
p’2 t’2=3
VLACHOS, M., GUNOPULOS, D., AND KOLLIOS, G. Discovering similar multidimensional trajectories. ICDE 2002
25/04/2014
DASFAA 2014 Tutorial
41
Classification
Spatial-only
Discrete
Continuous
25/04/2014
Spatial-temporal
Lp-norm
DTW
LCSS
EDR
DTW, LCSS, EDR with time
constrain
OWD
LIP
Synchronous Euclidean
Distance
DASFAA 2014 Tutorial
42
SED
•  Synchronous Euclidean Distance
–  Euclidean distance between locations at the same
time instance of two trajectories
•  Regard trajectory as continuous function of
time
Mirco Nanni, Dino Pedreschi, Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems (2006)
POTAMIAS, M., PATROUMPAS, K., AND SELLIS, T. K. Sampling trajectory streams with spatiotemporal criteria. SSDBM 2006
25/04/2014
DASFAA 2014 Tutorial
43
SED
Virtually create a
sample point at t=3
t=10
t=0
t=20
t=15
t=5
t=10
t=3
t=0
25/04/2014
t=25
t=12
t=7
DASFAA 2014 Tutorial
44
Trajectory index
•  Similarity measures give the way to calculate
the distance between two trajectories
•  However …
–  Huge amount of trajectories
–  Linear scan is inefficient
25/04/2014
DASFAA 2014 Tutorial
45
Trajectory index
• 
• 
• 
• 
3D R-tree
STR-tree (Spatio-temporal R-tree)
TB-tree (Trajecoty Bundle)
Multi-version R-tree
–  Partition temporal dimension
•  Grid-based index
–  Partition spatial dimension
25/04/2014
DASFAA 2014 Tutorial
46
3D R-tree
•  Indexing position samples only
–  Cannot answer queries about movements inbetween those samples
•  Indexing line segments
Time y x 25/04/2014
DASFAA 2014 Tutorial
47
Problem of R-tree
•  Large “dead space”
–  MBB covers a large portion of the space with no data
–  Low pruning power
•  Trajectory preservation
–  Line segments are grouped merely by spatial proximity
–  Regardless which trajectory they belong to
–  Retrieving a trajectory requires visits of different paths
in the tree
25/04/2014
DASFAA 2014 Tutorial
48
Augmented 3D R-tree
•  Augment leaf node with orientation
information
•  Distance to line segments can be approximated
more accurately
q
[Pfoster 2000] Dieter Pfoster, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000
25/04/2014
DASFAA 2014 Tutorial
49
STR-tree
•  Extension of augmented 3D R-tree
•  Insertion
–  Try to keep the line segments belonging to the
same trajectory together
–  Find the leaf node containing the predecessor
•  Node split
–  Put disconnected segments into new node
–  Put the most recent backward-connected segment
into new node
[Pfoster 2000] Dieter Pfoster, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000
25/04/2014
DASFAA 2014 Tutorial
50
TB-tree (Trajectory Bundle)
•  A leaf node only contains segments belonging
to the same trajectory
•  Leaf nodes containing same trajectory are
linked
•  Strictly preserve trajectories
[Pfoster 2000] Dieter Pfoster, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000
25/04/2014
DASFAA 2014 Tutorial
51
TB-tree (Trajectory Bundle)
25/04/2014
DASFAA 2014 Tutorial
52
Reflection
•  Previous index treat spatial and temporal
dimensions equally
–  3D tree structure
•  In trajectory database, temporal dimension is
more dynamic
–  New segments are appended to existing trajectories
–  Archived trajectories rarely update
•  Indexing spatial and temporal dimensions
separately
–  Partition temporal dimension first
–  Partition spatial dimension first
25/04/2014
DASFAA 2014 Tutorial
53
Multi-version R-tree
For each 2mestamp, an R-­‐tree is created. So, there are many R-­‐trees. These R-­‐trees are indexed. HR-­‐tree Query for trajectories in a given region and in a given time interval:
1.  The R-tree at the timestamp is found first
2.  The trajectories in the specified region are retrieved from the R-tree.
[Nascimento1998] Nascimento, M., Silva, J. Towards Historical R-trees. ACM SAC, 1998
[Tao2001a] Tao, Y., Papadias, D.: Efficient historical r-trees. In: ssdbm, p. 0223. Published by the IEEE Computer Society (2001)
[Xu2005]Xu, X., Han, J., Lu, W.: Rt-tree: An improved r-tree indexing structure for temporal spatial databases. In: Int. Symp. on Spatial Data Handling,
2005
[Tao2001b] Tao, Y., Papadias, D.: Mv3r-tree: A spatio-temporal access method for timestamp and interval queries. In: VLDB, pp. 431-440 (2001)
25/04/2014
DASFAA 2014 Tutorial
54
Grid-based index
•  Partition space into non-overlapping cells
•  Trajectory segments in each cell are indexed
on temporal dimension
•  Query processing:
–  Spatial filtering
–  Temporal filtering
[Prasad2003] V. Prasad Chakka Adam C. Everspaugh Jignesh M., Patel, Indexing Large Trajectory Data Sets With SETI, CIDR 2003
25/04/2014
DASFAA 2014 Tutorial
55
Grid-based index
[Prasad2003] V. Prasad Chakka Adam C. Everspaugh Jignesh M., Patel, Indexing Large Trajectory Data Sets With SETI, CIDR 2003
25/04/2014
DASFAA 2014 Tutorial
56
Outline
•  Background
–  What is trajectory
–  Where do they come from
–  Why are they useful
–  Characteristics
•  Trajectory similarity search
–  Query classification
–  Trajectory similarity measures
–  Trajectory index
•  Similarity-based trajectory mining
–  Popular route mining
–  Co-traveller discovery
–  Trajectory clustering
25/04/2014
DASFAA 2014 Tutorial
57
Popular route mining
•  Shortest path may not be the favourable
•  Find the most popular/desirable path using the
GPS trajectories of past travellers
•  Classification
–  No specific source/destination – hot route
discovery
–  Specific source/destination – popular route search
25/04/2014
DASFAA 2014 Tutorial
58
T-pattern
•  A set of individual trajectories that share the
property of visiting the same sequence of
places with similar travel times
1. Spatial discretization: discretize space into finite set of regions of interest (RoI)
2. Translate trajectories into sequence of RoIs
3. Adapt sequential pattern mining algorithms with time constrain
F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. In SIGKDD, pages 330–339, 2007
25/04/2014
DASFAA 2014 Tutorial
59
Periodic pattern
•  Find the repeat pattern for individual’s
trajectory
1.  Pre-defined and synchronized timestamps
2.  Adaptive region: density-based cluster
3.  Trajectories are translated to sequence of regions
at pre-defined time instances
N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. W. Cheung. Mining, indexing, and querying historical spatiotemporal
data. In SIGKDD, pages 236–245, 2004.
25/04/2014
DASFAA 2014 Tutorial
60
Interesting travel sequence
•  A sequence of interesting locations that is
travelled by experienced drivers frequently
•  Interestingness of a location?
–  How many experienced travellers visited it
•  Experience of a traveller?
–  How many interesting locations she has visited
Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining interesting locations and travel sequences from gps trajectories. In WWW, pages 791–800,
2009.
25/04/2014
DASFAA 2014 Tutorial
61
Interesting travel sequence
•  Hyperlink-Induced Topic Search (HITS)
25/04/2014
DASFAA 2014 Tutorial
62
Popular route search
•  Given source/destination, find/estimate the
most popular route in between
•  Ideally, we can just count the number of
trajectories on different paths connecting the
two locations
Z. Chen, H. T. Shen, and X. Zhou. Discovering popular routes from trajectories. In ICDE, pages 900–911, 2011
25/04/2014
DASFAA 2014 Tutorial
63
Popular route discovery
•  In real scenarios, it’s not easy to find such
well-divided groups
•  Even worse, there’s no trajectory connecting
two locations at all!
25/04/2014
DASFAA 2014 Tutorial
64
Popular route discovery
•  Construct a transfer network from raw trajectories
as an intermediate result to capture the moving
behaviors between locations
–  Node: cluster of turning points
–  Edge: trajectories passing two nodes
25/04/2014
DASFAA 2014 Tutorial
65
Popular route discovery
•  Transfer probability
•  Find the route with the highest joint transfer
probability w.r.t. destination
25/04/2014
DASFAA 2014 Tutorial
66
Find popular route from uncertain
trajectories
•  Low-sampling-rate trajectories have high
degree of uncertainty
•  Uncertain trajectories are prevalent!
•  Is it possible to recover the original route given
a low-sampling-rate trajectory?
•  What if…
–  There is a historical set of uncertain trajectories
Kai Zheng, Yu Zheng, Xing Xie and Xiaofang Zhou. Reducing Uncertainty of Low-Sampling-Rate Trajectories. ICDE 2012
25/04/2014
DASFAA 2014 Tutorial
67
Find popular route from uncertain
trajectories
•  Can we find popular routes for specific s/d
using low-sampling-rate trajectories
Infrequent samples on the same path can reinforce each other, and
they collectively form a more ‘dense’ trajectory
25/04/2014
DASFAA 2014 Tutorial
68
Find popular route from uncertain
trajectories
•  Find PR for a given sequence of locations
•  Two-phase approach
–  Local PR construction
–  Global PR search
25/04/2014
DASFAA 2014 Tutorial
69
Find popular route from uncertain
trajectories without road networks
•  A road network is not always available or
applicable
1.  Discretize space into disjoint cells
2.  Derive the transfer graph using cells
3.  Infer frequent ‘virtual edges’ on the graph
L.-Y. Wei, Y. Zheng, and W.-C. Peng. Constructing popular routes from uncertain trajectories. In ACM SIGKDD, pages 195–203, 2012.
25/04/2014
DASFAA 2014 Tutorial
70
Time period-based most frequent
path (TPMFP)
•  Find the most frequent path for specific s/d and
time period
•  Desired properties for a MFP
–  Suffix optimal: suffix of a MFP is also a MFP
–  Length insensitive: MFP shouldn’t favor long/short
route
–  Bottleneck free: MFP shouldn’t contain infrequent
edges
Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni. Finding Time Period-Based Most Frequent Path in Big Trajectory Data. SIGMOD 2013
25/04/2014
DASFAA 2014 Tutorial
71
Time period-based most frequent
path (TPMFP)
•  Footmark graph
–  A weighted sub-graph
–  Edge frequency: number of trajectories reaching vd
during T
–  Path frequency: non-decreasingly sorted sequence of
edge frequencies
v1 à v2: 14, v2 à v3: 10
v3 à v12: 10, v2 à v12: 8
V1 à v2 à v12: (8, 14)
V1àv2àv3àv12: (10,10,14)
V1àv10àv11àv12: (1,21,21)
25/04/2014
DASFAA 2014 Tutorial
72
Time period-based most frequent
path (TPMFP)
•  Compare path frequency
–  More-frequent-than relation (>)
–  F > F’ if their first different value fj > fj’
–  (10,10,14) > (8,14) > (1,21,21)
•  Property
–  A total order, so MFP always exist
–  Guarantee the suffix optimal
–  Length of path doesn’t matter a lot
–  Path with infrequent edge frequency be disadvantaged
25/04/2014
DASFAA 2014 Tutorial
73
PR search is more challenging
•  Specific source/destination (and time
constraint)
–  Data are sparse
–  How to utilize more relevant data?
•  Online performance
–  Efficiency is critical
25/04/2014
DASFAA 2014 Tutorial
74
Summary
•  Background
–  What is trajectory
–  Where do they come from
–  Why are they useful
–  Characteristics
•  Trajectory similarity search
–  Query classification
–  Trajectory similarity measures
–  Trajectory index
•  Similarity-based trajectory mining
–  Popular route mining
–  Co-traveller discovery
–  Trajectory clustering
25/04/2014
DASFAA 2014 Tutorial
75
Thank you
•  Questions
–  kevinz@itee.uq.edu.au
•  DKE group at UQ
–  http://www.itee.uq.edu.au/dke/dke-lab
•  My research
–  http://staff.itee.uq.edu.au/kevinz/
25/04/2014
DASFAA 2014 Tutorial
76
Download