TraClass: T

VLDB 2008


Motivation
TraClass: Trajectory Feature Generation
- Trajectory Partitioning
- Region-Based Clustering
- Trajectory-Based Clustering
Classification Strategy
Performance Evaluation
Related Work
Conclusions
2008-08-28
Scope of this paper
(Figure: Training data → Feature Generation → Features → Classifier; the classifier is applied to unseen data to make a prediction. The scope of this paper is the feature-generation step.)

Training data (class label: TENURED)
NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Unseen data: (Jeff, Professor, 4, ?) → Prediction: Tenured = Yes

A trajectory is a sequence of the locations and timestamps of a moving object
Examples: hurricanes, vessels, turtles, and vehicles
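To make the definition concrete, the following Python sketch models a trajectory as a time-ordered list of (x, y, t) samples with an optional class label. The `Trajectory` class and its methods are illustrative assumptions, not part of TraClass.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Trajectory:
    """Hypothetical container: time-ordered (x, y, t) samples of one moving object."""
    object_id: str
    label: Optional[str] = None  # class label, known only for training trajectories
    points: List[Tuple[float, float, float]] = field(default_factory=list)

    def add(self, x: float, y: float, t: float) -> None:
        # Enforce the "sequence" property: timestamps must strictly increase.
        if self.points and t <= self.points[-1][2]:
            raise ValueError("timestamps must be strictly increasing")
        self.points.append((x, y, t))

    def duration(self) -> float:
        return self.points[-1][2] - self.points[0][2] if self.points else 0.0

    def path_length(self) -> float:
        # Sum of Euclidean distances between consecutive locations.
        return sum(math.dist(p[:2], q[:2]) for p, q in zip(self.points, self.points[1:]))


vessel = Trajectory("vessel-1", label="tanker")
for x, y, t in [(0, 0, 0), (3, 4, 10), (3, 8, 20)]:
    vessel.add(x, y, t)
print(vessel.path_length(), vessel.duration())  # 9.0 20
```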

Definition: The process of predicting the class labels of moving objects based on their trajectories and other features

Applications: Homeland security, weather forecasting, law enforcement, etc.
- Example: Detection of vessel types (e.g., container ships, tankers, and fishing boats) from satellite images

Several trajectory classification methods have been proposed, mainly in the fields of pattern recognition, bioengineering, and video surveillance

A common characteristic of earlier methods is that they use the shapes of whole trajectories for classification, e.g., by using the hidden Markov model (HMM)
Note: Although a few methods partition trajectories, they do so only to approximate or smooth the trajectories

Problem Statement:
Given a set of labeled trajectories, generate
discriminative trajectory features that make a
specific class distinguishable from other classes

Observations:
(1) Discriminative features are likely to appear in parts of trajectories rather than in whole trajectories
(2) Discriminative features appear not only as common movement patterns but also as regions
(Figure: a sub-trajectory feature and a region feature)
Observation 1: Parts of trajectories near the container port and near the refinery enable us to distinguish between container ships and tankers, even if the two classes share long common paths
Observation 2: Parts of trajectories inside the fishery enable us to recognize fishing boats, even if the boats have no common path there
(Figure: overall shapes of the whole trajectories)

The classification accuracy of earlier methods may be low here, because the overall shapes of the whole trajectories are similar to one another
Our framework, TraClass, aims to discover both region features and sub-trajectory features

Extract features in a top-down fashion, first by region-based clustering and then by trajectory-based clustering
(Figure: Trajectory partitions → Recursively quantize non-homogeneous regions → Trajectory partitions in non-homogeneous regions → Repeatedly find finer-granularity clusters → Region-based and trajectory-based clusters)

Achieve high classification accuracy owing to the
collaboration between the two types of clustering


Region features ← Region-based clustering
Sub-trajectory features ← Trajectory partitioning and
trajectory-based clustering
Trajectory Partitioning
1. Trajectories are partitioned based on their shapes, as in the partition-and-group framework [12]
2. Trajectory partitions are further partitioned by the class labels
   - The real interest here is to guarantee that trajectory partitions do not span class boundaries
(Figure: Class A and Class B trajectories; additional partitioning points separate non-discriminative partitions from discriminative ones)

If the most prevalent class around one endpoint of a partition differs from that around the other endpoint, the partition is further partitioned
- Example: (Figure: a partition whose endpoints have different prevalent classes, Class B around one endpoint and Class A around the other, needs to be further partitioned)
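A minimal Python sketch of this endpoint test follows. The source does not say how "around an endpoint" is measured, so this assumes a fixed search radius over labeled sample points from the training trajectories; `radius`, `prevalent_class`, and `needs_further_partition` are hypothetical names, and the sketch only decides whether to split, not where.

```python
import math
from collections import Counter


def prevalent_class(point, labeled_points, radius):
    """Most common class among labeled sample points within `radius` of `point` (None if none)."""
    nearby = [label for x, y, label in labeled_points if math.dist(point, (x, y)) <= radius]
    return Counter(nearby).most_common(1)[0][0] if nearby else None


def needs_further_partition(start, end, labeled_points, radius):
    """True when the prevalent classes around the two endpoints of a partition differ."""
    a = prevalent_class(start, labeled_points, radius)
    b = prevalent_class(end, labeled_points, radius)
    return a is not None and b is not None and a != b


samples = [(0.0, 0.5, "A"), (0.5, 0.0, "A"), (0.0, -0.5, "B"),  # mostly class A near (0, 0)
           (10.0, 0.5, "B"), (9.5, 0.0, "B")]                    # class B near (10, 0)
print(needs_further_partition((0.0, 0.0), (10.0, 0.0), samples, radius=1.0))  # True
print(needs_further_partition((0.0, 0.0), (0.2, 0.0), samples, radius=1.0))   # False
```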
Region-Based Clustering

Discover regions that contain trajectories mostly of one class, regardless of their movement patterns
- A region-based cluster is a set of trajectory partitions of the same class within a rectangular region, regardless of their movement patterns


Homogeneity: The class distribution in each region should be as homogeneous as possible
Conciseness: The number of regions should be as small as possible
Note: The two properties conflict with each other: one large region favors conciseness, whereas many small regions favor homogeneity
Need to find a good tradeoff between the properties

The minimum description length (MDL) cost consists
of the description cost and the code cost
- The former measures conciseness, and the latter homogeneity

The best hypothesis is the one that minimizes the
sum of the description cost and the code cost

Finding a good quantization translates to finding the
best hypothesis using the MDL principle
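The tradeoff can be illustrated with a small Python sketch. The exact cost formulas of TraClass are not reproduced here, so this assumes a code cost of the sum over regions of n_r · H(r) bits (the entropy of the class labels in region r times the number of partitions in it) and a hypothetical flat description cost of `BITS_PER_REGION` bits per region.

```python
import math
from collections import Counter

BITS_PER_REGION = 4.0  # hypothetical description cost of one region (conciseness term)


def code_cost(regions):
    """Bits to encode every class label given its region: sum over regions of n_r * H(r)."""
    total = 0.0
    for labels in regions:
        n = len(labels)
        for count in Counter(labels).values():
            total -= count * math.log2(count / n)
    return total


def mdl_cost(regions):
    """MDL cost of a hypothesis = description cost + code cost."""
    return BITS_PER_REGION * len(regions) + code_cost(regions)


# Six class-A partitions on the left, six class-B partitions on the right.
labels = ["A"] * 6 + ["B"] * 6
hypotheses = {
    "one large region": [labels],
    "two regions": [labels[:6], labels[6:]],
    "many small regions": [[lab] for lab in labels],
}
for name, regions in hypotheses.items():
    print(f"{name}: {mdl_cost(regions):.1f} bits")  # 16.0, 8.0, and 48.0 bits
best = min(hypotheses, key=lambda name: mdl_cost(hypotheses[name]))
print("best:", best)  # best: two regions
```

One large region pays 12 bits of code cost for its mixed labels, while many small regions pay 48 bits of description cost; the two-region hypothesis balances both terms.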

Progressively find a better partitioning, alternating between the X axis and the Y axis, as long as the MDL cost decreases
- Select the partition that has the maximum code cost and divide it into two parts to decrease the MDL cost
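The greedy search can be sketched in Python as follows. This is an assumption-laden illustration, not the paper's algorithm: each trajectory partition is reduced to one labeled point, the grid is defined by cut positions on each axis, the description cost is a hypothetical `BITS_PER_CUT` bits per cut, and a split is placed at whichever midpoint between observed coordinates gives the lowest MDL cost.

```python
import math
from bisect import bisect_right
from collections import Counter

BITS_PER_CUT = 8.0  # hypothetical description cost of one axis boundary


def code_bits(labels):
    """n * H(labels): bits needed to encode the class labels inside one grid cell."""
    n = len(labels)
    return -sum(c * math.log2(c / n) for c in Counter(labels).values())


def cells(points, cuts):
    """Group labels by grid cell; a cell is (x-interval index, y-interval index)."""
    grid = {}
    for x, y, label in points:
        key = (bisect_right(cuts[0], x), bisect_right(cuts[1], y))
        grid.setdefault(key, []).append(label)
    return grid


def mdl(points, cuts):
    description = BITS_PER_CUT * (len(cuts[0]) + len(cuts[1]))
    return description + sum(code_bits(labs) for labs in cells(points, cuts).values())


def try_split(points, cuts, axis, current):
    """Split the interval on `axis` with the maximum code cost if that lowers the MDL cost."""
    per_interval = Counter()
    for key, labs in cells(points, cuts).items():
        per_interval[key[axis]] += code_bits(labs)
    if not per_interval:
        return current
    target, worst = per_interval.most_common(1)[0]
    if worst <= 0:
        return current  # every interval on this axis is already pure
    coords = sorted({p[axis] for p in points if bisect_right(cuts[axis], p[axis]) == target})
    best_cut, best_cost = None, current
    for lo, hi in zip(coords, coords[1:]):
        trial = dict(cuts)
        trial[axis] = sorted(cuts[axis] + [(lo + hi) / 2])
        cost = mdl(points, trial)
        if cost < best_cost:
            best_cut, best_cost = trial[axis], cost
    if best_cut is not None:
        cuts[axis] = best_cut
    return best_cost


def quantize(points):
    """Alternate between the X axis (0) and the Y axis (1) until neither split helps."""
    cuts = {0: [], 1: []}
    current = mdl(points, cuts)
    axis, stalled = 0, 0
    while stalled < 2:
        new = try_split(points, cuts, axis, current)
        stalled = 0 if new < current else stalled + 1
        current, axis = new, 1 - axis
    return cuts, current


points = [(x, 0.0, "A") for x in range(1, 6)] + [(x, 0.0, "B") for x in range(6, 11)]
cuts, cost = quantize(points)
print(cuts, cost)  # {0: [5.5], 1: []} 8.0
```

Every accepted split strictly lowers the MDL cost and only finitely many cuts exist, so the loop always terminates.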
Trajectory-Based Clustering

Discover sub-trajectories that indicate common movement patterns of each class
- A trajectory-based cluster is a set of trajectory partitions of the same class that share a common movement pattern

Similar to our trajectory clustering algorithm [12], but incorporates the class labels into clustering
- The algorithm is based on DBSCAN [5]
- If an ε-neighborhood contains trajectory partitions mostly of the same class, it is used for clustering; otherwise, it is discarded immediately
(Figure: the non-homogeneous ε-neighborhood of L1 is discarded (X); the homogeneous ε-neighborhood of L2 is used (O))
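The class-aware neighborhood test can be sketched as a DBSCAN variant in Python. For brevity, this assumes each trajectory partition is summarized by one labeled point with Euclidean distance, rather than the line-segment distance of [12]; `purity` is a hypothetical threshold for "mostly of the same class", and `min_pts` plays the role of DBSCAN's density threshold.

```python
import math
from collections import Counter


def neighborhood(items, i, eps):
    """Indices of items within eps of item i (item i included)."""
    xi, yi, _ = items[i]
    return [j for j, (x, y, _) in enumerate(items) if math.hypot(x - xi, y - yi) <= eps]


def homogeneous_core(items, i, eps, min_pts, purity):
    """Return (majority class, its members) if item i's eps-neighborhood is dense and
    homogeneous enough; otherwise None, i.e., the neighborhood is discarded."""
    nbrs = neighborhood(items, i, eps)
    if len(nbrs) < min_pts:
        return None
    label, count = Counter(items[j][2] for j in nbrs).most_common(1)[0]
    if count / len(nbrs) < purity:
        return None
    return label, [j for j in nbrs if items[j][2] == label]


def class_aware_dbscan(items, eps, min_pts, purity=0.8):
    """items: list of (x, y, label). Returns a list of (label, sorted member indices)."""
    assigned, clusters = set(), []
    for i in range(len(items)):
        if i in assigned:
            continue
        core = homogeneous_core(items, i, eps, min_pts, purity)
        if core is None:
            continue
        label, members = core
        cluster = {j for j in members if j not in assigned}
        queue = list(cluster)
        while queue:  # expand only through homogeneous cores of the same class
            j = queue.pop()
            nxt = homogeneous_core(items, j, eps, min_pts, purity)
            if nxt is None or nxt[0] != label:
                continue
            for k in nxt[1]:
                if k not in cluster and k not in assigned:
                    cluster.add(k)
                    queue.append(k)
        if cluster:
            assigned |= cluster
            clusters.append((label, sorted(cluster)))
    return clusters


square = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5)]
items = ([(x, y, "A") for x, y in square]
         + [(x + 10, y + 10, "B") for x, y in square]
         + [(x + 5, y + 5, lab) for (x, y), lab in zip(square, "ABAB")])  # mixed group
found = class_aware_dbscan(items, eps=1.0, min_pts=3)
print(found)  # [('A', [0, 1, 2, 3]), ('B', [4, 5, 6, 7])]
```

The mixed group at (5, 5) is dense enough but only 50% pure, so its neighborhoods are discarded, as in the L1 case of the figure.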

After trajectory-based clusters are found, discriminative clusters are selected for effective classification
- If the average distance from a cluster to the clusters of other classes is high, the discriminative power of the cluster is high
- Example: (Figure: clusters C1 and C2 among Class A and Class B trajectories) C1 is more discriminative than C2
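The selection criterion can be sketched in Python. The sketch assumes each cluster is summarized by a single representative point and that distance is Euclidean; `min_power` is a hypothetical cutoff, since the source does not specify how many clusters are kept.

```python
import math


def discriminative_power(clusters):
    """clusters: name -> (class label, representative (x, y) point).
    Score = average distance to the clusters of other classes."""
    scores = {}
    for name, (label, center) in clusters.items():
        dists = [math.dist(center, other) for lab, other in clusters.values() if lab != label]
        scores[name] = sum(dists) / len(dists) if dists else 0.0
    return scores


def select_discriminative(clusters, min_power):
    """Keep clusters whose score reaches min_power, most discriminative first."""
    scores = discriminative_power(clusters)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= min_power]


clusters = {
    "C1": ("A", (0.0, 10.0)),  # far from every class-B cluster
    "C2": ("A", (5.0, 1.0)),   # right next to the class-B clusters
    "B1": ("B", (5.0, 0.0)),
    "B2": ("B", (6.0, 0.0)),
}
print(select_discriminative(clusters, min_power=5.0))  # ['C1', 'B2', 'B1']
```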

A cluster link is a sequence of connectable (i.e., consecutive) trajectory-based clusters
- Two clusters are connectable if they share enough trajectories (more formally, if the ratio of common trajectories is higher than χ)

Cluster links also yield whole-trajectory features
- Cluster links are added to the set of trajectory-based clusters for use in classification
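A minimal Python sketch of cluster-link generation follows. Two details are assumptions, because the source does not fix them: the "ratio of common trajectories" is taken relative to the smaller cluster, and clusters are assumed to be listed in the order in which trajectories visit them, so links only run forward in that list.

```python
def connectable(c1, c2, chi):
    """Assumed ratio: shared trajectories over the smaller cluster's trajectory count."""
    if not c1 or not c2:
        return False
    return len(c1 & c2) / min(len(c1), len(c2)) > chi


def cluster_links(clusters, chi):
    """clusters: list of sets of trajectory ids, assumed ordered along the direction of
    movement. Returns maximal chains (lists of cluster indices) of length >= 2."""
    n = len(clusters)
    succ = {i: [j for j in range(i + 1, n) if connectable(clusters[i], clusters[j], chi)]
            for i in range(n)}
    has_pred = {j for js in succ.values() for j in js}
    links = []

    def extend(path):
        nexts = succ[path[-1]]
        if not nexts:
            if len(path) >= 2:
                links.append(path)
            return
        for j in nexts:
            extend(path + [j])

    for i in range(n):
        if i not in has_pred:  # start chains only at clusters nothing links into
            extend([i])
    return links


clusters = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 9}, {7, 8}]
print(cluster_links(clusters, chi=0.5))  # [[0, 1, 2]]
```

Clusters 0 and 2 share only half of their trajectories, which is not above χ = 0.5, so they are linked only through cluster 1; cluster 3 shares nothing and forms no link.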
1. Partition trajectories, taking the class labels into account
2. Perform region-based clustering
3. Perform trajectory-based clustering
4. Select discriminative trajectory-based clusters
5. Find cluster links among the trajectory-based clusters
6. Convert each trajectory into a feature vector
   - Each feature is either a region-based cluster or a trajectory-based cluster
   - The i-th entry of a feature vector is the frequency with which the i-th feature occurs in the trajectory
7. Feed the feature vectors to an SVM (support vector machine)
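Step 6 can be sketched in Python as below. It assumes that "frequency" means the number of a trajectory's partitions that fall into a feature's cluster, and that partitions are identified by hypothetical string ids. The resulting vectors, together with the class labels, could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`.

```python
def feature_vector(trajectory_partitions, features):
    """features: one set of partition ids per region-based or trajectory-based cluster.
    Entry i counts how many of the trajectory's partitions belong to feature i."""
    return [sum(1 for p in trajectory_partitions if p in members) for members in features]


features = [
    {"t1-p1", "t2-p1"},  # e.g., a region-based cluster
    {"t1-p2", "t3-p4"},  # e.g., a trajectory-based cluster
    {"t2-p3"},           # e.g., another trajectory-based cluster
]
training = {"t1": ["t1-p1", "t1-p2"], "t2": ["t2-p1", "t2-p2", "t2-p3"]}
X = {tid: feature_vector(parts, features) for tid, parts in training.items()}
print(X)  # {'t1': [1, 1, 0], 't2': [1, 0, 1]}
```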

Use three real trajectory data sets
- Animal movement data set
  - Movements of elk, deer, and cattle for the years 1993 through 1996
  - Three classes: Elk, Deer, and Cattle
  - Number of trajectories (points): 38 (7117), 30 (4333), and 34 (3540)
- Vessel navigation data set
  - Navigation paths of two vessels in August 2000
  - Two classes: Point Lobos and Point Sur
  - Number of trajectories (points): 600 (65500) and 550 (125750)
- Hurricane track data set
  - Atlantic hurricanes for the years 1950 through 2006
  - Two classes: Category 2 and Category 3
  - Number of trajectories (points): 61 (2459) and 72 (3126)

Randomly select 20% of trajectories for the test set

Measure classification accuracy, training time, and prediction time for the three data sets
Classification accuracy = (# of test trajectories correctly classified) / (total # of test trajectories)
Compare two versions of the algorithm
- TB-ONLY: Perform trajectory-based clustering only
- RB-TB: Perform both types of clustering
- TB-ONLY is expected to be no worse than earlier methods, since it also discovers whole-trajectory features through cluster-link generation
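The evaluation protocol (a random 20% test split, then classification accuracy as defined above) can be sketched in Python. The rounding rule and the fixed random seed are assumptions made for reproducibility; the source does not state them.

```python
import random


def train_test_split(trajectory_ids, test_fraction=0.2, seed=0):
    """Randomly hold out test_fraction of the trajectories (rounded) as the test set."""
    ids = list(trajectory_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def accuracy(predicted, actual):
    """# of test trajectories correctly classified / total # of test trajectories."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


train, test = train_test_split(range(10))
print(len(train), len(test))  # 8 2
print(accuracy(["A", "B", "B", "A"], ["A", "B", "A", "A"]))  # 0.75
```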
Data Set    Version   Accuracy (%)   Training Time (ms)   Prediction Time (ms)
Animal      TB-ONLY   50.0           3542                 104
Animal      RB-TB     83.3           2406                 98
Vessel      TB-ONLY   84.4           44683                722
Vessel      RB-TB     98.2           22902                608
Hurricane   TB-ONLY   65.4           331                  48
Hurricane   RB-TB     73.1           317                  46


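The classification accuracy of RB-TB is higher than that of TB-ONLY on all three data sets, by 7.7 to 33.3 percentage points (e.g., 83.3% vs. 50.0% on the animal data set)
The training time of RB-TB is also shorter than that of TB-ONLY on all three data sets; the gap is largest on the vessel data set (22902 ms vs. 44683 ms) and small on the hurricane data set (317 ms vs. 331 ms)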
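The size of the improvements can be checked directly from the table with a few lines of Python; the numbers are the reported values, and only the derived accuracy gains and training speedups are computed here.

```python
# Values copied from the results table: (accuracy %, training time ms, prediction time ms).
results = {
    "Animal":    {"TB-ONLY": (50.0, 3542, 104),  "RB-TB": (83.3, 2406, 98)},
    "Vessel":    {"TB-ONLY": (84.4, 44683, 722), "RB-TB": (98.2, 22902, 608)},
    "Hurricane": {"TB-ONLY": (65.4, 331, 48),    "RB-TB": (73.1, 317, 46)},
}

for data_set, runs in results.items():
    tb, rb = runs["TB-ONLY"], runs["RB-TB"]
    gain = round(rb[0] - tb[0], 1)     # accuracy gain in percentage points
    speedup = round(tb[1] / rb[1], 2)  # how many times faster RB-TB trains
    print(f"{data_set}: +{gain} points, {speedup}x faster training")
# Animal: +33.3 points, 1.47x faster training
# Vessel: +13.8 points, 1.95x faster training
# Hurricane: +7.7 points, 1.04x faster training
```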
(Figure: animal data set with three classes; red: elk, blue: deer, black: cattle)
Features: 10 region-based clusters and 37 trajectory-based clusters
Accuracy = 83.3%
(Figure: hurricane data set; red: Category 2, blue: Category 3; the Gulf of Mexico is marked)
Features: 1 region-based cluster and 15 trajectory-based clusters
These hurricanes entered the Gulf of Mexico and thus stayed at sea longer before landfall than the others. They are likely to become stronger, because hurricanes gain energy from the evaporation of warm ocean water. Stronger hurricanes also tend to travel farther than weaker ones.

Effect of region-based clustering

Effect of the data size (scalability test)

Pattern recognition [1], e.g., speech, handwriting, signature, and gesture recognition
- Classifying human motion trajectories
- Employing the hidden Markov model (HMM)

Bioengineering [16]
- Classifying biological motion trajectories

Video surveillance [15]
- Detecting suspicious behaviors of pedestrians

Time-series classification [20,21]
Moving-object anomaly detection [14]

A novel and comprehensive feature generation
framework for trajectories has been proposed

The primary advantage is the high classification
accuracy owing to the collaboration between the
two types of clustering

Various real-world applications, e.g., vessel
classification, can benefit from our framework