Lunch Talk A Brief Overview of Trajectory Data Mining May 26, 2009 Jae-Gil Lee * This work was done while I was at University of Illinois at UrbanaChampaign for the years from 2006 through 2008. 1 Outline Introduction to Trajectory Data Clustering Classification Outlier Detection 2 Trajectory Data A trajectory is a sequence of the location and timestamp of a moving object Hurricanes Vessels Turtles Vehicles 3 Importance of Analysis on Trajectory Data The world becomes more and more mobile Prevalence of mobile devices such as cell phones, smart phones, and PDAs Satellite, sensor, RFID, and wireless technologies have been improved rapidly Tremendous amounts of trajectory data of moving objects 4 Research Impacts Trajectory data mining has many important, realworld applications driven by the real need Homeland security (e.g., border monitoring) Law enforcement (e.g., video surveillance) Weather forecast Traffic control Location-based service … 5 Significance of Our Research Previous Frameworks of Trajectory Mining The whole trajectory is considered as an atomic unit Our Frameworks of Trajectory Mining A trajectory is processed after it is partitioned into a set of line segments (trajectory partitions) Our frameworks can take advantage of the knowledge of partial trajectories Open a new paradigm of trajectory mining 6 Major Categories in Data Mining Clustering Classification Discover common behaviors (i.e., motions) from trajectory data of moving objects Classify moving objects based on their trajectories and other information SIGMOD’07, SSTD’07 VLDB’08, Submitted to a journal Pattern Mining Outlier Detection Find synchronous (i.e., chase) patterns from trajectory data of moving objects Uncover suspicious behaviors from trajectory data of moving objects Submitted to a journal ICDE’08, ICDE’09 7 Example Applications Clustering Classification Classify vessel types (e.g., tankers and containers) in radar images Most systems use a size estimate for classifying vessels, so small vessels are very hard to classify Pattern Mining Find common behaviors of animal migration from animal movement data Find popular sequences and travel times of tourists in sightseeing places Outlier Detection Detect suspicious persons in parking garages from video surveillance data (trajectories can be extracted from video data) 8 Movebank Project For Animal Tracking and Photo Monitoring Data NSF-funded project initiated in the summer of 2007 Aim: Facilitate long-term comparisons of animal movement data making it possible to address pressing questions such as the effects of global climate change and human-caused landscape change A data repository, a live data pipeline, online mapping and analysis tools, an educational tool, a community of collaborators Homepage: http://www.movebank.org/ Participants: New York State Museum, Princeton University, San Diego Supercomputer Center, UIUC Data Mining Group Our clustering algorithm will be open to public via web services 9 Example: Oil Bird Study Study animal behavior and movement with new toys: “Franzmitters” 848 MHz radio GPS, 3D acceleration Optional: Hand tracking, power amplifier, solar power 10 Outline Introduction to Trajectory Data Clustering Classification Outlier Detection 15 Trajectory Clustering Group similar trajectories into the same cluster 1. Whole Trajectory Clustering Probabilistic Clustering Density-Based Clustering: TF-OPTICS 2. Partial Trajectory Clustering The Partition-and-Group Framework 16 Partition-and-Group Framework (Lee et al. 07) Existing algorithms group trajectories as a whole They might not be able to find similar portions of trajectories e.g.) The common behavior cannot be discovered since TR1~TR5 move to totally different directions TR3 TR4 TR5 A common sub-trajectory TR2 TR1 The partition-and-group framework is proposed to discover common sub-trajectories 17 Usefulness of Common Sub-Trajectories Discovering common sub-trajectories is very useful, especially if we have regions of special interest Hurricane Landfall Forecasts Meteorologists will be interested in the common behaviors of hurricanes near the coastline or at sea (i.e., before landing) Effects of Roads and Traffic on Animal Movements Zoologists will be interested in the common behaviors of animals near the road where the traffic rate has been varied 18 Overall Procedure Two phases: partitioning and grouping (1) Partition TR3 TR4 TR5 A set of trajectories TR2 TR1 A representative trajectory (2) Group A cluster A set of line segments Note: a representative trajectory is a common sub-trajectory 19 Partitioning Phase Identify the points where the behavior of a trajectory changes rapidly characteristic points An optimal set of characteristic points is found by using the minimum description length (MDL) principle pc pc 3 p6 p7 p8 p5 pc p1 p4 2 p3 : characteristic point 4 p2 pc TRi 1 : trajectory partition Partition a trajectory at every characteristic point 20 Overview of the MDL Principle The MDL principle has been widely used in information theory The MDL cost consists of two components: L(H) and L(D|H), where H means the hypothesis, and D the data L(H) is the length, in bits, of the description of the hypothesis L(D|H) is the length, in bits, of the description of the data when encoded with the help of the hypothesis The best hypothesis H to explain D is the one that minimizes the sum of L(H) and L(D|H) 21 MDL Formulation Finding the optimal partitioning translates to finding the best hypothesis using the MDL principle H → a set of trajectory partitions, D → a trajectory L(H) → the sum of the length of all trajectory partitions L(D|H) → the sum of the difference between a trajectory and a set of its trajectory partitions p5 p3 p2 p4 p1 pc 1 pc 2 L( H ) log 2(len ( p1 p 4)) L( D | H ) log 2(d ( p1 p 4, p1 p 2) d ( p1 p 4, p 2 p 3) d ( p1 p 4, p 3 p 4)) log 2(d ( p1 p 4, p1 p 2) d ( p1 p 4, p 2 p 3) d ( p1 p 4, p 3 p 4)) L(H) measures conciseness; L(D|H) preciseness 22 Grouping Phase (1/2) Find the clusters of trajectory partitions using density-based clustering (i.e., DBSCAN) A density-connect component forms a cluster, e.g., { L1, L2, L3, L4, L5, L6 } MinLns = 3 L5 L3 L4 L2 L1 L6 L6 L5 L3 L1 L2 L4 23 Grouping Phase (2/2) Describe the overall movement of the trajectory partitions that belong to the cluster A red line: a representative trajectory, A blue line: an average direction vector, Pink lines: line segments in a density-connected set 24 Sample Clustering Results 7 Clusters from Hurricane Data 570 Hurricanes (1950~2004) A red line: a representative trajectory 25 2 Clusters from Deer Data 26 Outline Introduction to Trajectory Data Clustering Classification Outlier Detection 27 Trajectory Classification Predict the class labels of moving objects based on their trajectories and other features 1. Machine learning techniques Studied mostly in pattern recognition, bioengineering, and video surveillance The hidden Markov model (HMM) 2. TraClass: Trajectory classification using hierarchical region-based and trajectory-based clustering 28 Classification Training data Each data element has its class label e.g., <deer, a sequence of points> <elk, a sequence of points> Extract features from the training data and build a classification model Predict a class label of a new data element e.g., <?, a sequence of points> → class label = deer 29 Common Characteristics of Previous Methods Use the shapes of whole trajectories to do classification Encode a whole trajectory into a feature vector; Convert a whole trajectory into a string or a sequence of the momentary speed; or Model a whole trajectory using the HMM Note: Although a few methods segment trajectories, the main purpose is to approximate or smooth trajectories before using the HMM 30 TraClass: Trajectory Classification Based on Clustering (Lee et al. 08) Motivation Discriminative features are likely to appear at parts of trajectories, not at whole trajectories Discriminative features appear not only as common movement patterns, but also as regions Solution Extract features in a top-down fashion, first by regionbased clustering and then by trajectory-based clustering 31 Intuition and Working Example Container Port Refinery Port A Port B Fishery Container Ships Tankers Fishing Boats Parts of trajectories near the container port and near the refinery enable us to distinguish between container ships and tankers even if they share common long paths Those in the fishery enable us to recognize fishing boats even if they have no common path there 32 Trajectory Partitions Region-Based Clustering Region-Based Clustering Trajectory-Based Clustering Features Trajectory-Based Clustering Class-Conscious Trajectory Partitioning 1. Trajectories are partitioned based on their shapes as in the partition-and-group framework 2. Trajectory partitions are further partitioned by the class labels The real interest here is to guarantee that trajectory partitions do not span the class boundaries Non-discriminative Discriminative Class A Class B Additional partitioning points 34 Region-Based Clustering Objective: Discover regions that have trajectories mostly of one class regardless of their movement patterns Algorithm: Find a better partitioning alternately for the X and Y axes as long as the MDL cost decreases The MDL cost is formulated to achieve both homogeneity and conciseness 35 Trajectory-Based Clustering Objective: Discover sub-trajectories that indicate common movement patterns of each class Algorithm: Extend the partition-and-group framework for classification purposes so that the class labels are incorporated into trajectory clustering If an ε-neighborhood contains trajectory partitions mostly of the same class, it is used for clustering; otherwise, it is discarded immediately 36 Selection of Trajectory-Based Clusters After trajectory-based clusters are found, highly discriminative clusters are selected for effective classification If the average distance from a specific cluster to other clusters of different classes is high, the discriminative power of the cluster is high e.g.) C1 C2 Class A Class B C1 is more discriminative than C2 37 Overall Procedure of TraClass 1. Partition trajectories 2. Perform region-based clustering 3. Perform trajectory-based clustering 4. Select discriminative trajectory-based clusters 5. Convert each trajectory into a feature vector Each feature is either a region-based cluster or a trajectory-based cluster The i-th entry of a feature vector is the frequency that the i-th feature occurs in the trajectory 6. Feed feature vectors to the SVM 38 Classification Results Datasets Methods Animal: Three classes ← three species: elk, deer, and cattle Vessel: Two classes ← two vessels Hurricane: Two classes ← category 2 and 3 hurricanes TB-ONLY: Perform trajectory-based clustering only RB-TB: Perform both types of clustering Results 39 Extracted Features Features: 10 Region-Based Clusters 37 Trajectory-Based Clusters Data (Three Classes) Accuracy = 83.3% 40 Outline Introduction to Trajectory Data Clustering Classification Outlier Detection 41 Trajectory Outlier Detection Detect trajectory outliers that are grossly different from or inconsistent with the remaining set of trajectories 1. Whole Trajectory Outlier Detection A unsupervised method A supervised method based on classification 2. Integration with multi-dimensional information 3. Partial Trajectory Outlier Detection The Partition-and-Detect Framework 42 Sample Trajectory Outliers Detect outliers from person trajectories in a room The entire data set The outliers only Our framework can detect the person whose behavior is only partially unusual, whereas previous frameworks cannot 43 Partition-and-Detect Framework (Lee et al. 08) Existing algorithms compare trajectories as a whole They might not be able to detect outlying portions of trajectories e.g.) TR3 is not detected as an outlier since its overall behavior is similar to those of neighboring trajectories TR5 TR TR3 4 TR TR1 2 An outlying sub-trajectory The partition-and-detect framework is proposed to detect outlying sub-trajectories 44 Usefulness of Outlying Sub-Trajectories Example: Sudden changes in hurricane’s path Usual trajectories Sudden change Since Hurricane Charley (Aug. 2004) was expected to hit the land closer to Tampa, many residents around Punta Gorda, Fla., were caught unprepared 45 Overall Procedure Two phases: partitioning and detection TR5 TR4 TR3 TR TR1 2 (1) Partition A set of trajectories A set of trajectory partitions (2) Detect TR3 An outlier Outlying trajectory partitions 46 Simple Trajectory Partitioning A trajectory is partitioned at a base unit: the smallest meaningful unit of a trajectory in a given application e.g.) The base unit can be every single point A trajectory TRout An outlying t-partition A t-partition Neighboring Trajectories Pros: High detection quality in general Cons: Poor performance due to a large number of tpartitions Propose a two-level partitioning strategy 47 Two-Level Trajectory Partitioning Objective Achieves much higher performance than the simple strategy Obtains the same result as that of the simple strategy; i.e., does not lose the quality of the result Basic idea 1. Partition a trajectory in coarse granularity first 2. Partition a coarse t-partition in fine granularity only when necessary Main benefit Narrows the search space that needs to be inspected in fine granularity Many portions of trajectories can be pruned early on 48 Intuition to Two-Level Trajectory Partitioning If the distance between coarse t-partitions is very large (or small), the distances between their fine t-partitions are also very large (or small) TRi Coarse-Granularity Partitioning Fine-Granularity Partitioning TRj The lower and upper bounds for fine t-partitions are derived in the paper 49 Outlier Detection Once trajectories are partitioned, trajectory outliers are detected based on both distance and density An outlying t-partition is defined as Not close Close TRi ≤ 1‒p Li Li is an outlying t-partition > 1‒p TRi Li Li is not an outlying t-partition A trajectory is an outlier if it contains a sufficient amount of outlying t-partitions 50 Incorporation of Density The number of close trajectories is adjusted by the density of a t-partition T-Partitions in dense regions are favored! Dense region Decreased, Sparse region Increased 51 Sample Detection Results 13 Outliers from Hurricane Data 52 3 Outliers from Elk Data 53 Conclusions Trajectory mining becomes increasingly important as the world becomes more and more mobile We have established a new paradigm of trajectory mining that enables us to extract knowledge of partial trajectories Trajectory Clustering Trajectory Outlier Detection Trajectory Classification Trajectory Pattern Mining 54