A Generalized (k, m)-Segment Mean Algorithm for
Long Term Modeling of Traversable Environments
by
Todd Samuel Layton
B.S., Massachusetts Institute of Technology (2013)
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Computer Science and Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2014
© Massachusetts Institute of Technology 2014. All rights reserved.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
May 23, 2014
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Daniela Rus
Andrew (1956) and Erna Viterbi Professor of EECS
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Albert R. Meyer
Chairman, Masters of Engineering Thesis Committee
A Generalized (k, m)-Segment Mean Algorithm for Long
Term Modeling of Traversable Environments
by
Todd Samuel Layton
Submitted to the Department of Electrical Engineering and Computer Science
on May 23, 2014, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Computer Science and Engineering
Abstract
We present an efficient algorithm for computing semantic environment models and
activity patterns in terms of those models from long-term value trajectories defined
as sensor data streams. We use an expectation-maximization approach to calculate
a locally optimal set of path segments with minimal total error from the given data
signal. This process reduces the raw data stream to an approximate semantic representation. The algorithm’s speed is greatly improved by the use of lossless coresets
during the iterative update step, as they can be calculated in constant amortized time and used to perform operations with otherwise linear runtimes.
We evaluate the algorithm for two types of data, GPS points and video feature
vectors, on several data sets collected from robots and human-directed agents. These
experiments demonstrate the algorithm’s ability to reliably and quickly produce a
model which closely fits its input data, at a speed which is empirically no more than
linear relative to the size of that data set. We analyze several topological maps and
representative feature sets produced from these data sets.
Thesis Supervisor: Prof. Daniela Rus
Title: Andrew (1956) and Erna Viterbi Professor of EECS
Acknowledgments
I would like to thank Professor Daniela Rus for finding the kindness in her heart to
supervise me. Week after week, she was always willing to help, never lacking advice on
how to pursue and improve my research, experimentation, and analysis. I’ve learned
much about how to clearly present and evaluate my ideas, a skill which will be of
great value in the future.
In addition, I am grateful to the entire Distributed Robotics Laboratory (DRL), for
being such a great place to hang my proverbial hat during the long slog. In particular,
I would like to thank Danny Feldman for his involvement in the development of this
research, as well as Guy Rosman and Mikhail Volkov for their patience throughout
the trying task of integrating our respective systems.
And finally, I would like to thank my parents, for their incessant weekly support
throughout the year. Especially since they will likely be the only people outside the
lab who actually see this thesis.
Contents

1 Introduction
   1.1 Overview
   1.2 Motivation
   1.3 Challenges
   1.4 Contributions
   1.5 Applications
   1.6 Organization

2 Problem Statement
   2.1 Input Structure
   2.2 Output Objective
   2.3 Data Types

3 Related Work
   3.1 SLAM
   3.2 Clustering
   3.3 GPS-Based Mapping
   3.4 Location Recognition

4 Formal Specification
   4.1 Background: Paths and Trajectories
      4.1.1 Segment Constraints
      4.1.2 Path Graphs
   4.2 Data Fitting
      4.2.1 Fitting Costs and Trajectory Means
      4.2.2 The (k, m)-Segment Mean
   4.3 Data Types
      4.3.1 GPS
      4.3.2 Video Features

5 (k, m)-Segment Mean Algorithm
   5.1 k-Partition Initialization
      5.1.1 Applying for Non-Linear Segment Models
      5.1.2 Selecting the k Parameter
   5.2 m-Clustering Initialization
      5.2.1 Selecting the m Parameter
   5.3 Segment m-Set Construction
   5.4 Expectation-Maximization Iteration
      5.4.1 Expectation: k-Partition and m-Clustering Improvement
      5.4.2 Maximization: Segment m-Set Reoptimization
      5.4.3 Termination

6 Optimization
   6.1 Fitting Cost Coresets
      6.1.1 GPS Coresets
      6.1.2 Feature Vector Coreset
   6.2 RDP Partition Initialization
   6.3 K-Means Clustering Initialization
   6.4 Path Segment Calculation
   6.5 2-Segment Trajectory Update

7 Experimental Evaluation
   7.1 Experimental Setup
      7.1.1 Datasets
      7.1.2 Processing Environment
      7.1.3 Proportional Fitting Cost
   7.2 Results
      7.2.1 Accuracy
      7.2.2 Speed
      7.2.3 Selected Parameters
      7.2.4 Sample Results

8 Conclusion
List of Figures

1-1 Left: A geographic view of a GPS point signal with a closely-fitted trajectory. Right: The same signal and trajectory, progressing up through time, along with the underlying topological map. Note that the trajectory and the map are geographically equivalent, indicating the high degree of path repetition present in the trajectory.

2-1 Left: A GPS point signal and the path segments of an approximate (k, m)-segment mean, viewed geographically. Right: The same signal and the same (k, m)-segment mean's trajectory segments, viewed progressing up through time, numbered by their place in the k-partition and color-matched to the underlying path segments. See Chapter 4 for a formal definition of this terminology.

7-1 A conceptual layout of the region traversed by the individual as recorded in the short phone video. In terms of this graph, the individual's trajectory would be labeled as ABCDBAEDCEBCA.

7-2 A conceptual layout of the region traversed by the individual as recorded in the long phone video. In terms of this graph, the individual's trajectory would be labeled as ABCABCABDABCABDABCABCABCABDABCA.

7-3 The proportional fitting cost drops off, at first quickly and then more slowly, as the map size increases. Each data set has a characteristic curve relation to m. Note that the feature vector data sets have noticeably higher proportional costs than the GPS data sets. This could be because their much larger dimensionality introduces greater structural cost into the data, or it could indicate that the 'natural' partition size for those sets is greater than k = 300.

7-4 The algorithm's run time varies significantly around its average relation to m. While some data sets' run times generally increase relative to m, others' are independent of it, or even decrease as it increases. This is likely due to the behavior of the EM loop: increasing m might cause the trajectory to be initialized closer to its local minimum, reducing the number of EM iterations needed, and therefore the total run time. Given these run times, the system could easily process in real time data streams of up to 5 Hz.

7-5 Two geographic maps of the ground robot data set, with the (300, 20)- and (300, 200)-segment trajectory outputs of the algorithm, respectively. With far fewer path segments to utilize, the m = 20 trajectory's fit to the GPS points is much rougher, compared to the m = 200 trajectory's close fit.

7-6 Two geographic maps of the quadrotor robot data set, with the (300, 20)- and (300, 200)-segment trajectory outputs of the algorithm, respectively. Since this data set contains a low degree of actual path repetition, the produced trajectories tend to simply align themselves to the GPS points.

7-7 Two geographic maps of the personal smartphone data set, with the (300, 20)- and (300, 200)-segment trajectory outputs of the algorithm, respectively. Because of the extended discontinuities in the data set, the algorithm has made a best-effort attempt to bridge these gaps. As a result, some parts of the map traverse areas lacking any input points. It may be valuable to 'repair' such signal gaps using data patching, as described in [18].

7-8 Two clustering maps of the short phone video data set. Because this video contains only a few repeated loops, the skew of the transition strength towards the most prominent clusters is relatively low, especially when a large number of clusters are allowed. Despite this, the qualitative appearances of the clusters' representative frames are widely varied, even amongst the most prominent clusters. In the left map, the most prominent cluster (cluster 1) occurs 92 times, while the least prominent (cluster 20) occurs only once. In the right map, the most prominent cluster occurs 7 times, while the least prominent (cluster 200) still occurs only once.

7-9 Two clustering maps of the long phone video data set. Unlike the short video, this video contains a larger number of repeated loops, and so the transitions' strengths tend to skew significantly towards the most prominent clusters. Even at m = 200, the green lines are noticeably thicker and denser around the top-right region of the map, where the larger blue circles are arranged. In the left map, the most prominent cluster (cluster 1) occurs 66 times, while the least prominent (cluster 20) occurs only once. In the right map, the most prominent cluster occurs 8 times, while the least prominent (cluster 200) still occurs only once.
List of Tables

7.1 If the parameters k and m are not provided as input, the algorithm attempts to select good values for them as part of the trajectory initialization. For each experimental data set, it selects a characteristic (k, m) pair. Deviation of these selected values from the ground-truth parameters is primarily a result of the imprecision of the initial assumptions made about the data, such as the linear or constant structure of the path segments. In particular, note that the GPS data sets tend to have low parameter values relative to their ground truths, while the feature vector sets tend to have high values.
List of Algorithms

1 Approximate (k, m)-Segment Mean EM Algorithm
2 RDP-Based Partition Initialization Algorithm
3 K-Means Section Clustering Algorithm
4 Optimal Path Segment Set Construction Algorithm
5 2-Segment Partition and Clustering Update
Chapter 1
Introduction
1.1 Overview
This thesis describes an algorithm for efficiently creating semantic event representations of raw sensor streams, which capture underlying patterns of the stream in
the form of an activity map. We intend for this algorithm to be applicable to the
creation of life-logging systems, which can track and summarize a person’s activities
using sensor data over an extended period of time. The simplest example of such a
system would be one which uses a GPS trace from a user’s smart phone to develop a
topological map of their daily routine. More complex implementations would have the
ability to analyze this data for the construction of textual descriptions of the user’s
patterns of movement. Towards the goal of developing such a system, we present an
algorithm capable of extracting semantic activities from GPS or other sensor data.
Specifically, the algorithm computes a sequence of instances of semantic activities
(in the form of trajectory segments) corresponding to a data stream produced by
a mobile source over an extended period of time. During this time, if the source’s
data trajectory contains a significant number of repeated representative paths, each
path can be interpreted as a distinct activity. The algorithm identifies and aggregates
these repetitions, as shown in Figure 1-1. The source can be a robot with autonomous
movement, or a machine under external human direction, such as a personal smartphone. The algorithm takes as input formatted sensor data, such as GPS or video
feature vectors, and computes both a semantic path graph and a trajectory through
that graph, in the respective forms of an activity set and a sequence within that set.
It is able to efficiently process very large data streams (inputs in the tens of thousands
of points have been tested with no difficulty) by computing trajectory coresets¹ [7].
Applications of this semantic representation algorithm include autonomous mapping,
surveillance, and activity recognition.
In this thesis, we describe a technique for developing an activity representation of
GPS or video feature data in the form of an activity graph, present an algorithm for
producing these graphs quickly and with consistently low error, and show the experimental results of applying this algorithm to several data sets of varied size and origin.
The input consists of a GPS stream (a series of geographic coordinates) or a feature
vector stream (a series of feature-to-frame correlation weights) with accompanying
timestamps. Despite uncertainty in the data, both from error in the data measurements and from the imprecise nature of semantic activity identification in the real
world, the algorithm aims to identify the repeated patterns of representative paths
that appear in the source’s trajectory over an extended period of time. For example,
the GPS stream produced by a car moving around a city would provide a partial road
map of the city, and that of a robot in long-term operation would reveal its common
movement paths. Identifying the best path graph under these constraints, the simultaneous division of the data stream into simple trajectories and clustering of these
trajectories into shared paths, is referred to as the (k, m)-segment mean problem.
1.2 Motivation
With recent advances in device technology, modern society now produces a surfeit of
data of all types. Thus, research focus has turned to the field of ‘big-data’ analysis,
the endeavor to extract meaningful results from these large-scale data sets. Due to
the inherent difficulty of processing large input sets, this field is still largely in its
¹ Coresets are a strategy for improving the computation runtime of an operation by preprocessing the inputs into a reduced form (the 'coreset'); see Section 6.1.
Figure 1-1: Left: A geographic view of a GPS point signal with a closely-fitted
trajectory. Right: The same signal and trajectory, progressing up through time,
along with the underlying topological map. Note that the trajectory and the map
are geographically equivalent, indicating the high degree of path repetition present in
the trajectory.
infancy, and most big-data systems are designed to solve relatively straightforward,
‘number-crunching’ mathematical problems. For analysis requiring a more complex,
nuanced approach, human involvement is often required, introducing variability and
significant slowdown into what could be an automatic process.
For example, the current standard approach to creating topological maps, such as
is used to produce Google Maps, relies primarily on information obtained through geographical surveys. Such surveys are expensive and infrequent, and so the data they
provide can easily become outdated as the result of changes to the surveyed environment, for example by construction and development. Alternatively, collaborative
approaches such as OpenStreetMap can be more quickly amended and updated, but
are dependent on the highly variable availability and quality of crowdsourced data,
possibly impacting their accuracy and coverage. Furthermore, these approaches both
focus on roads and other outdoor paths, neglecting building interiors and other indoor
spaces, for which data collection can be greatly complicated by privacy and ownership
issues.
This thesis aims to describe an approach to modeling such problems in mathematical form and to demonstrate an algorithmic approach to solving them. Although
several assumptions and approximations are necessarily made in reducing the conceptual problem to a quantitative objective, this algorithm shows how complex structural
content can be extracted from large data sets in a manageable processing time.
1.3 Challenges
Satisfying constraints Algorithmically developing a semantic model under any
non-negligible constraints, such as those involved in identifying patterns of repetition, can be challenging. Describing a progression of values through time as a
single smooth function is infeasible, so it must be constructed from some number
of piecewise components. The standard approach, using dynamic programming, is
asymptotically too slow for effectively handling large data sets. Furthermore, as
these values are constrained to match an underlying representative model with a
smaller number of elements, each component of the value trajectory must be optimized not individually, but instead in conjunction with all other pieces that traverse
along the same representative section. These constraints are necessary in order to
encourage the expression of the source’s patterns of behavior in the structure of the
resulting model.
Managing data noise Determining an agent’s location without first deriving its
relative position presents a very different set of challenges than using relative-position
sensor data. Agent-local data streams, such as vision or rangefinding, are more
complex and tend to have relatively low inherent error, but accumulate inaccuracy
over time. Long-term SLAM techniques based on those forms of data, such as
those described in [9] and [4], must contend primarily with this drift, by applying
loop closing or other correctional methods, which have high computation times.
Conversely, global location data, such as GPS, and non-geographic characterization
data, such as video feature vectors, have comparatively high error, but which does
not necessarily grow over time. Therefore, a GPS- or feature-based environment
modeling algorithm must be able to robustly handle significant error throughout
the entire data stream. As a result, this semantic algorithm is less accurate than
a SLAM approach, but also much faster and with more consistently bounded error
over long operation times.
Generalizing across data types This system is intended to implement the (k, m)-segment mean algorithm without restricting its application to a single type of data,
such as GPS or feature vectors. In order to achieve this, the generalized operation
framework of the algorithm (the overarching EM process) must be properly distinguished from logic specific to the data type being processed (input structure, error
model, etc.). This requires that each step of the algorithm be implemented in a sufficiently abstract manner, so as to allow it to potentially handle data of an arbitrary
type.
1.4 Contributions
This thesis contributes the following theoretical, systemic, and experimental results.
Trajectory and model creation A time-efficient (linear in the input signal size)
algorithm for recognizing repetitions in an agent’s trajectory and constructing a
semantic model of its behavior, as exemplified in Figure 1-1, by finding a locally
optimal approximate solution to the (k, m)-segment mean problem, for a GPS point
set or a video feature vector set.
Algorithm implementation A practical system implementing this algorithm, taking as input a GPS or video feature data set and producing a well-fitted trajectory
and underlying representative model.
Experimental evaluation A statistical analysis of the algorithm’s behavior when
applied to several different GPS and feature data input sets, numerically demonstrating its accuracy and speed, as well as a selection of constructed geographic
maps and representative feature sets, highlighting several qualitative features of the
algorithm and its output.
1.5 Applications
The work described in this paper can be gainfully applied to problems in the following
areas.
Map construction For a system operating long-term in an unknown environment,
developing a model of that environment is a foremost priority. By constructing and
applying a semantic data space, such as the geographic region map produced by the
(k, m)-segment mean algorithm with GPS input, the system can interpret sensor
information in a more meaningful context than in its raw form. This approach is
especially valuable when the data are noisy or their native format is particularly
opaque. Furthermore, by aggregating data from multiple agents within a region, the
constructed map’s accuracy and coverage can be drastically improved [12].
Pattern recognition By clustering discrete segments of the data stream according
to their structure, this algorithm creates a cluster signal, the sequence of cluster
assignment labels through time. Intelligently evaluating this signal to identify repeated subsequences can reveal semantic patterns underlying the original input signal. These patterns can then be compared to new data as the stream grows in order
to predict the source’s behavior [19].
Semantic compression As elaborated in [14] and [18], transforming a data set into
an activity-based semantic representation can massively improve its compressibility,
resulting in significant savings in data storage space. Moreover, because such semantic compression operates on a completely different basis than traditional text-based
compression, the two can be used in conjunction to reduce the compressed data’s
size even further.
1.6 Organization
Chapter 2 states the (k, m)-segment mean problem, laying out the inputs it receives
and the intended results it should provide. Chapter 3 discusses existing works related
to this problem and its various aspects, both in general and for particular types of
data. Chapter 4 formally defines the (k, m)-segment mean and other terminology
involved in this system. Chapter 5 describes the structure, operation, and behavior of this approximate (k, m)-segment mean algorithm. Chapter 6 elaborates the
performance optimizations and runtime improvements of the algorithm’s implementation. Chapter 7 presents and analyzes experimental results of this implementation’s
application to various data sets. Chapter 8 concludes this thesis.
Chapter 2
Problem Statement
In this chapter we describe the (k, m)-segment mean algorithm. This algorithm takes
as input a sensor data stream and recognizes repeating patterns in this stream. The
patterns correspond to a (k, m)-segmentation of the data.
2.1 Input Structure
The primary input of the (k, m)-segment mean algorithm is the data stream to be
processed. This consists of a sequence of data values, timestamped and ordered
by time. Additionally, the algorithm may receive settings for one or both of its
parameters: k, the number of sections into which to partition the data; and m, the
number of underlying representative elements to which to assign those sections. If
either of these parameters is not specified in the input, the algorithm performs a
parameter finding process as part of its initialization step.
2.2 Output Objective
The (k, m)-segment mean algorithm should produce three data structures: a partition
of the input data stream into k sections, a collection of m representative semantic elements, and a clustering assignment of those k sections to those m elements. Together,
these products should describe a high-accuracy approximation of the (k, m)-segment
mean of the input data.
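As a concrete illustration, the following Python sketch (with hypothetical names; the thesis does not prescribe a particular representation) shows one way the input of Section 2.1 and these three outputs could be structured:

```python
# A minimal sketch of the input and output structures described above.
# The names (DataStream, KMSegmentMean) are hypothetical, not from the thesis.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DataStream:
    """Section 2.1: timestamped values, in non-decreasing time order."""
    timestamps: List[float]           # t_1 <= t_2 <= ... <= t_n
    values: List[Tuple[float, ...]]   # one d-dimensional value per timestamp

@dataclass
class KMSegmentMean:
    """Section 2.2: the three products of the algorithm."""
    partition: List[Tuple[int, int]]  # k (start, end) index ranges over the stream
    segments: List[object]            # m representative path segments
    clustering: List[int]             # length-k map: section index -> segment index
```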
2.3 Data Types
The (k, m)-segment mean algorithm aims to be sufficiently general as to be easily
extensible to any arbitrary type of data. In our implementation, we focused on two
particular data types which help illustrate this principle: GPS and video features.
For GPS, the original motivational application of this algorithm, the input is a GPS
trace of an agent’s trajectory, and the resulting representative model is a topological
map traversed by that trajectory. For video features, the raw data is a video stream
of an agent’s viewpoint as it traverses the environment, but the algorithm’s input is a
stream of feature vectors, produced by applying image feature recognition techniques
to that video stream. The resulting model is a set of feature vectors, each representing
a local region of the environment as perceived by the agent through the video.
Figure 2-1: Left: A GPS point signal and the path segments of an approximate
(k, m)-segment mean, viewed geographically. Right: The same signal and the same
(k, m)-segment mean’s trajectory segments, viewed progressing up through time,
numbered by their place in the k-partition and color-matched to the underlying path
segments. See Chapter 4 for a formal definition of this terminology.
Chapter 3
Related Work
3.1 SLAM
SLAM (simultaneous localization and mapping) describes the class of problem wherein
an agent must jointly construct an understanding of its surrounding environment
(mapping) and of its position within that environment (localization), from the noisy
data produced by its sensors. The development of robust, generalized SLAM algorithms is an ongoing area of research [6]. While the (k, m)-segment mean problem
can generally be considered a type of SLAM, it differs in several significant ways
from the sort of SLAM systems that are the focus of most research. ‘Traditional’
SLAM techniques are based on data produced by local sensors, whose measurements
are relative to the pose of the agent which hosts them [5]. These sensors, such as
cameras and range-finders, tend to have high precision and low error in their actual
data. However, deriving a global pose from local data requires the aggregation of
those data, and so error tends to accumulate over time. For this reason, one of the
main challenges in most SLAM applications is loop closing, determining when the
agent has returned to a previously-visited position.
The (k, m)-segment mean, in contrast, does not attempt to integrate sensor data
in this way. Instead, it uses the inherent information in its input data (such as
global position for GPS, or qualitative environment description for detected video
features) to build a description of its surroundings in terms of the same characteristics
by which the data describes them. The (k, m)-segment mean algorithm thus does
not suffer from the loop closing problem. However, while the error in its sensor
data does not accumulate over time, that data does tend to natively have higher
error and lower precision than the kinds used in existing SLAM systems. Therefore,
whereas most SLAM applications must first and foremost contend with the difficulty
of building a self-consistent global environment model, the (k, m)-segment mean’s
primary challenge is discerning the environment’s underlying structure from its noisy
input data.
3.2 Clustering
The process of divvying up the input stream’s points by their mutual fit is a form
of clustering. Clustering is a well-studied domain, including problems with notable
parallels to ours, such as projective clustering [1]. However, viewed as a clustering
problem, the (k, m)-segment mean construction objective has several distinct challenges. It requires multiple layers of clustering, first of data values into trajectory
sections, and then of those into semantic paths. Furthermore, because each trajectory segment must span a continuous, unbroken range of the input stream, the time
order of the values is an additional constraint on a cluster assignment’s validity.
3.3 GPS-Based Mapping
Extracting geographic and topological maps from GPS data, which is the application of the (k, m)-segment mean algorithm to geographic input, is not a new idea.
A variety of algorithmic approaches to this problem have been explored [2]. Most
use a clustering strategy similar to ours, though some use different methods such
as trace merging [3] or kernel density estimation [16], in order to construct a path
representation from input streams.
3.4 Location Recognition
Similarly, analyzing video data to identify distinct locations or scenes has been attempted using a variety of approaches, each with its own strengths and weaknesses.
Some, such as [10], use existing comprehensive databases in order to globally identify the view location. Others, such as [11], focus on performing deep image analysis
to extract the semantic structure of the video, in order to facilitate inter-data comparisons. Some approaches specifically focus on the clustering problem [15] as ours
does, though they often still incorporate a specific image feature analysis model into
their processing. Our approach instead aims to be as general as possible. It makes
only the most basic assumptions about the semantic meaning of the input feature
data, allowing for its specific clustering behavior to be defined by the intentions of
the vector quantization algorithm which preprocesses the video.
Chapter 4
Formal Specification
The following sections precisely define the terminology used in this thesis to describe
the mathematical structures involved in the algorithm’s operation, building up to a
formal specification of the (k, m)-segment mean.
4.1 Background: Paths and Trajectories
Definition 1 (trajectory). A trajectory is a function $\vec{f}: T \to \mathbb{R}^d$, where T is a subinterval of $\mathbb{R}$. $\vec{f}(t)$ represents the value of the trajectory at time t.

Definition 2 (path). A path is a function $\vec{f}: T \to \mathbb{R}^d$, where T is a subinterval of $\mathbb{R}$. Unlike a trajectory, however, $\vec{f}(\tau)$ is parametric, and so does not give any particular significance to $\tau$. The 'velocity' $\frac{d\vec{f}}{d\tau}$ is not meaningful, as $\vec{f}(g(\tau'))$ is considered the same path regardless of the choice of monotonic non-decreasing function $g: T' \to T$.
Both paths and trajectories can be thought of as traces of an agent’s progression
through the value space Rd , where a trajectory includes an explicit measure of the
passage of time, but a path does not. It is easy, then, to see how any trajectory can
be transformed into a path, by simply removing its time component. Inversely, any
path can be expanded into an infinite number of possible trajectories, by applying to
it an arbitrary time component. Note that these definitions alone do not include any
constraints on the continuity, differentiability, or other forms of ‘neatness’ of paths
and trajectories. However, in order to practically construct and manipulate them,
some reasonable assumptions will need to be made.
4.1.1 Segment Constraints
Definition 3 (k-segment function). A k-segment function is a function $\vec{f}: T \to \mathbb{R}^d$, where T is a subinterval of $\mathbb{R}$, which is smooth on all intervals $T_i$, $i = 1 \to k$, where $\{T_i\}$ is a partition¹ of T. Each $T_i$ is called a section of the k-partition, and each subfunction $\vec{f}_i(t): T_i \to \mathbb{R}^d$ is called a segment.
Further constraints can be placed on a k-segment function by restricting its segments to a certain mathematical category, such as constants, lines, parabolae, etc.
Henceforth, we will only consider k-segment trajectories and paths, constrained to
line segments for GPS data and to constant segments for video feature data.
4.1.2 Path Graphs
Definition 4 (path graph). A path graph is a graph in which each vertex is associated
with a path.
In particular, we constrain the paths associated with a path graph’s vertices to
be 1-segment paths. A k-segment path, and thus a k-segment trajectory, can be
considered a sequence of 1-segment paths, so a directed path graph can be constructed
from the adjacencies of segments in that sequence. A path graph produced this way
is not particularly meaningful, unless there are segments which are repeated in the
path. This system is intended to work with paths which contain a significant degree
of segment repetition. This mapping of the trajectory to a path graph constitutes an
m-clustering of the trajectory segments to path segments, where m is the number of
path segments (i.e. vertices) in the graph. The edges of the graph can be assigned
weights representing the proportional occurrence of that transition in the k-segment path, resulting in a Markov process describing the probabilistic behavior of the agent whose activity the path describes.

¹ Unlike the usual mathematical definition, this thesis allows partitions to contain empty intervals. Thus, every k-partition is also a (k + c)-partition for any positive integer c.
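As a small illustration, the following Python sketch derives such edge weights from the label sequence of a k-segment trajectory, under the assumption (one reading of 'proportional occurrence') that weights are normalized per source vertex:

```python
from collections import Counter, defaultdict

def path_graph_weights(labels):
    """Estimate Markov transition weights of a directed path graph from the
    sequence of path-segment labels along a k-segment trajectory."""
    counts = Counter(zip(labels, labels[1:]))   # adjacent-segment transitions
    out_totals = defaultdict(int)
    for (src, _), c in counts.items():
        out_totals[src] += c
    return {(s, d): c / out_totals[s] for (s, d), c in counts.items()}

# e.g. the label sequence of Figure 7-1:
print(path_graph_weights("ABCDBAEDCEBCA"))      # {('A','B'): 0.5, ...}
```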
4.2 Data Fitting
Definition 5 (data stream). A data stream $S = (T, \vec{V}) = [(t_i \in \mathbb{R}, \vec{v}_i \in \mathbb{R}^d) : i = 1 \to n]$ is a sequence of timestamp-value pairs, in non-decreasing order by timestamp.
A data stream can be thought of as a possibly-noisy sampling of a trajectory. It
represents the actual measured sensor data which is available as input to the system.
The system’s goal is to reconstruct (a close approximation of) the original trajectory
from a data stream, by finding the trajectory which best ‘fits’ the stream.
4.2.1 Fitting Costs and Trajectory Means
Definition 6 (error model). An error model is a function $\mathrm{Err}: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_{\geq 0}$, representing the likelihood error between a model value (the prior) and a data value (the posterior).

To be a useful metric, an error model $\mathrm{Err}(\vec{x} \mid \vec{\lambda})$ should have three particular properties.

Property (additive joining). For all $\vec{X} = \{\vec{x}_1, \dots, \vec{x}_n\}$ and all $\vec{\lambda}$, $\mathrm{Err}(\vec{X} \mid \vec{\lambda}) = \sum_{i=1}^{n} \mathrm{Err}(\vec{x}_i \mid \vec{\lambda})$.

Property (likelihood maximization). For all $\vec{X} = \{\vec{x}_1, \dots, \vec{x}_n\}$, $\operatorname{argmin}_{\vec{\lambda}} \mathrm{Err}(\vec{X} \mid \vec{\lambda}) = \operatorname{argmax}_{\vec{\lambda}} L(\vec{\lambda} \mid \vec{X}) = \operatorname{argmax}_{\vec{\lambda}} p(\vec{X} \mid \vec{\lambda})$.

Property (null optimality). For all $\vec{x}$, $\min_{\vec{\lambda}} \mathrm{Err}(\vec{x} \mid \vec{\lambda}) = 0$.
Additive joining declares that the error from multiple points can be aggregated
by sum. Likelihood maximization declares that error corresponds to probability in
that their optima occur at the same model value. Null optimality guarantees that the
error of an individual data value can always reach zero, removing the data’s structural
likelihood from consideration.
Definition 7 (fitting cost). Given a trajectory $\vec{f}$, a timestamp-value pair $(t, \vec{v})$, where t is in the time range of $\vec{f}$ and $\vec{v}$ is of the same data type as $\vec{f}$, and an error model $\mathrm{Err}(\vec{v} \mid \vec{\lambda})$, the fitting cost of $\vec{f}$ to $(t, \vec{v})$, $C(t, \vec{v} \mid \vec{f})$, is equal to $\mathrm{Err}(\vec{v} \mid \vec{f}(t))$. Given a data stream $S = (T, \vec{V})$ instead of a single timestamp-value pair, the fitting cost of $\vec{f}$ to all of S, $C(S \mid \vec{f})$, is equal to the sum of its fitting costs to each pair in S.
The fitting cost describes how well-matched a trajectory is to a data stream, given
an error model applicable to the data type. Similarly, a path segment can be said to
have a fitting cost as well, equivalent to the fitting cost of a trajectory segment which
projects to that path segment and which spans the data stream’s time range with
some ‘reasonable’ speed function. In this thesis, we constrain that speed function to
be constant.
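As a sketch (with hypothetical helper names), the additive structure of Definition 7 makes the fitting cost a single sum over the stream:

```python
def fitting_cost(stream, trajectory, err):
    """C(S | f): sum of Err(v | f(t)) over all timestamp-value pairs in S,
    using the additive-joining property of the error model."""
    return sum(err(v, trajectory(t)) for t, v in stream)

# usage with the squared-distance error model of Section 4.3.1:
sq_err = lambda v, lam: sum((a - b) ** 2 for a, b in zip(v, lam))
stream = [(0.0, (0.0, 0.0)), (1.0, (1.0, 2.0))]
line = lambda t: (t, 2.0 * t)              # a 1-segment trajectory
print(fitting_cost(stream, line, sq_err))  # 0.0: the line fits exactly
```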
Definition 8 (trajectory mean). Given a data stream S, the trajectory mean $\vec{f}^*$ of S is the trajectory with the lowest fitting cost to S, $\vec{f}^* = \operatorname{argmin}_{\vec{f}} C(S \mid \vec{f})$.

Of course, the search space of all possible trajectories whose time ranges contain S is infeasibly large, so it is reasonable to constrain the optimization problem to
trajectories of a certain structure.
4.2.2 The (k, m)-Segment Mean
Definition 9 ((k, m)-segment trajectory). A (k, m)-segment trajectory is a k-segment trajectory which can be reduced to a path graph with m vertices.
In other words, a (k, m)-segment trajectory contains k trajectory segments, but
only m path segments, which are repeated so as to bridge the difference between those
two parameters.
Definition 10 ((k, m)-segment mean). Given a data stream S, the (k, m)-segment
mean is the trajectory mean of S among the category of (k, m)-segment trajectories.
The ultimate objective of this system is to find the (k, m)-segment mean of the
input stream, the (k, m)-segment trajectory best fitted to that stream. This trajectory corresponds to a hidden Markov model underlying the input data, obscured by
sampling noise.
4.3 Data Types
The data-type-specific components of the algorithm must also be clearly defined, and
specified for the particular types which are implemented and used by this system.
4.3.1 GPS
A GPS point is a geometric value in $\mathbb{R}^2$. It uses the geometric (squared distance) error model, $\mathrm{Err}(\vec{x} \mid \vec{\lambda}) = \|\vec{x} - \vec{\lambda}\|^2$. The 1-segment mean of a GPS data stream $S = (T, \vec{V})$ can thus be found by solving $\operatorname{argmin}_{\vec{f}} \sum_{i=1}^{n} \|\vec{v}_i - \vec{f}(t_i)\|^2$.
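For instance, the 1-segment mean reduces to a small linear least-squares problem (a Python sketch assuming numpy; names are illustrative, not the thesis's implementation):

```python
import numpy as np

def one_segment_mean(T, V):
    """Least-squares 1-segment mean of a GPS stream: fit
    f(t) = ((Te - t)*ls + (t - Ts)*le) / (Te - Ts)
    minimizing sum_i ||v_i - f(t_i)||^2; returns the endpoints (ls, le)."""
    T, V = np.asarray(T, float), np.asarray(V, float)
    Ts, Te = T[0], T[-1]
    A = np.column_stack([Te - T, T - Ts]) / (Te - Ts)  # n x 2 design matrix
    L, *_ = np.linalg.lstsq(A, V, rcond=None)          # 2 x d endpoint matrix
    return L[0], L[1]

# noisy samples of a straight geographic path:
t = np.linspace(0.0, 10.0, 50)
v = np.column_stack([t, 2.0 * t]) + 0.01 * np.random.randn(50, 2)
print(one_segment_mean(t, v))  # endpoints near (0, 0) and (10, 20)
```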
4.3.2 Video Features
Each frame in a video stream is represented by a set of recognized features. Each
frame’s feature vector is a non-geometric value in (R≥0 )d , where d is the number of
distinct features found throughout the video.
Error Model
Feature data is patently non-geometric, not least because its support does not encompass all of Rd . It thus requires a non-geometric error model to evaluate comparisons
between a model feature vector and an element of a data stream. Conceptually, we
need a distribution function to describe image feature data. This distribution should
reflect the idea of a feature value as the relative ‘strength’ of that feature in the
image, without relying on the specific definition or method of calculation of the feature values. To this end, we choose to base the error model on a Poisson distribution,
$\pi(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$, conceptualizing a feature value as the 'count', or number of instances, of that feature in the image. However, as feature data is not limited to integers, it must be thought of as an expected or approximate count. This necessitates the adaptation of the Poisson distribution (natively defined only for integers) to the continuous support $\mathbb{R}_{\geq 0}$.
This idea, of a 'continuous Poisson distribution', often arises in attempts to model occurrence data, but a single precise definition of this distribution is deceptively difficult to discern. [8] presents one definition, derived from the Poisson distribution's CDF $\Pi(x \mid \Lambda) = \frac{\Gamma(\lfloor x+1 \rfloor, \Lambda)}{\Gamma(\lfloor x+1 \rfloor)}$. However, this approach is not viable for our modeling purposes, most obviously because its continuous Poisson distribution has $\tilde{\pi}(0 \mid \lambda > 0) = 0$, meaning that a zero feature value could never be produced by a nonzero model segment. Instead, this system uses an alternate approach based on a non-uniform prior distribution of λ.
Consider the function $f(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{\Gamma(x+1)}$, which is the extension of the discrete Poisson distribution $\pi(x \mid \lambda)$ into the continuous domain.² This function is not normalized over x, which can be remedied by the addition of a normalization factor, resulting in the distribution $\tilde{\pi}(x \mid \lambda) = \frac{f(x \mid \lambda)}{K_\lambda}$, where $K_\lambda = \int_0^\infty f(x \mid \lambda)\,dx$. However, it is not feasible to include $K_\lambda$ in an optimization operation, so we include the prior $p(\lambda) = \frac{K_\lambda}{K}$, where $K = \int_0^\infty K_\lambda\,d\lambda$. This results in the distribution $\tilde{\pi}(x, \lambda) = \frac{f(x \mid \lambda)}{K}$. By the definition of the Γ-function, K is known to be infinite, which would also be infeasible, except that this term can be extracted from consideration, as will be shown.
Consider the negative logarithmic probability of this distribution, $-\log\tilde{\pi}(x, \lambda) = \lambda - x\log\lambda + \log\Gamma(x+1) + \log K$. By the mathematical properties of the logarithm, this term fulfills the first two requirements for an error model, additive joining and likelihood maximization. The third requirement, null optimality, is not fulfilled, since the value at the optimal model value $\lambda = x$ is $-\log\tilde{\pi}(x, x) = x - x\log x + \log\Gamma(x+1) + \log K > 0$. Instead, define the error function:

$$\mathrm{Err}(x, \lambda) = -\log\tilde{\pi}(x, \lambda) + \log\tilde{\pi}(x, x) = (\lambda - x) - x(\log\lambda - \log x)$$
Because this differs from the negative logarithmic probability only by an additive term that is independent of λ, the first two requirements are preserved, and the error function also achieves the third property, while also canceling out K from the function. This is the error model which will be used for each dimension of the image feature data.

² Note that $f(x \mid \lambda)$ has a hole discontinuity at $x = 0, \lambda = 0$. This hole can be 'plugged' by finding the function's limit, $f(0 \mid 0) = 1$, using l'Hôpital's rule.
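A direct per-dimension implementation of this error model might look as follows (a minimal sketch; the boundary cases follow the limits discussed above):

```python
import math

def poisson_err(x, lam):
    """Err(x, lam) = (lam - x) - x*(log lam - log x), the per-dimension
    feature error model; Err(x, x) = 0, satisfying null optimality."""
    if x == 0.0:
        return lam         # limit as x -> 0: the x*log terms vanish
    if lam == 0.0:
        return math.inf    # a zero model can never explain a nonzero value
    return (lam - x) - x * (math.log(lam) - math.log(x))
```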
In addition to an explicit model of error given the data and model values, it is also
necessary to be able to calculate the optimal model value given a set of data values.
Theorem 1. Given a video feature vector data set $\vec{X} = (\vec{X}_1, \dots, \vec{X}_n)$, the model element which minimizes the total error is $\vec{\lambda}^* = \operatorname{argmin}_{\vec{\lambda}} \mathrm{Err}(\vec{X}, \vec{\lambda}) = E[\vec{X}]$.

Proof. Consider a single dimension of a feature data set and the corresponding model value.

$$\frac{d\,\mathrm{Err}(X, \lambda^*)}{d\lambda^*} = \frac{d}{d\lambda^*} \sum_{i=1}^{n} \left((\lambda^* - X_i) - X_i(\log\lambda^* - \log X_i)\right) = 0$$

$$\sum_{i=1}^{n} \left(1 - \frac{X_i}{\lambda^*}\right) = 0$$

$$\lambda^* = \frac{1}{n}\sum_{i=1}^{n} X_i = E[X]$$

The model value can be optimized by simply setting it to the average of the data values. By extension, since the error model operates independently on each dimension, the optimal multi-dimensional model element is $\vec{\lambda}^* = E[\vec{X}]$, the average of the feature vector data set.
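Continuing the sketch above, a quick numeric check of Theorem 1 on hypothetical sample values:

```python
X = [0.5, 2.0, 3.5, 0.0]
total = lambda lam: sum(poisson_err(x, lam) for x in X)
mean = sum(X) / len(X)                       # E[X] = 1.5
candidates = [mean - 0.1, mean - 0.01, mean, mean + 0.01, mean + 0.1]
assert min(candidates, key=total) == mean    # the mean minimizes total error
```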
Production and Preprocessing
Because the (k, m)-segment mean algorithm takes feature vector data as its input
rather than a video frame sequence, it is first necessary to use a separate feature
recognition system to produce feature data from raw video. This data type and
its error model are applicable to any kind of feature vector which purports to signify
expected ‘counts’ of features in frames. To produce such data, this thesis relies on the
system described in [13]. That process uses an adaptive vector quantization across
the entire video sequence to produce a high-dimensional (d = 5000) bags-of-words
feature data set, and then applies median filtering to compress this data down to a
low-dimensional (d = 300) representation. The resulting feature data is provided as
input into the (k, m)-segment mean algorithm.
Chapter 5
(k, m)-Segment Mean Algorithm
ApproxKMSegmentMean (Algorithm 1) is a local-approximation, expectation-maximization algorithm for solving the (k, m)-segment mean problem, as defined in Chapter 2, using input as specified there. Given a data stream $S = (T, \vec{V})$, it initializes a reasonable first-guess (k, m)-segment trajectory, then repeatedly alternates
between locally improving the k-partition-and-m-clustering and re-optimizing the m
path segments. It terminates once a locally cost-minimal solution is reached, at which
point the patterns of movement in the original data should have sufficiently coalesced
so as to be expressed in the resulting semantic map.
Algorithm 1 Approximate (k, m)-Segment Mean EM Algorithm
1: procedure ApproxKMSegmentMean(S, k?, m?)
2:   initialize k-partition part with RDPPartition(S, k?)
3:   initialize m-clustering cluster with KMeansCluster(S, part, m?)
4:   construct segment m-set segments with OptimalSegments(S, part, cluster)
5:   loop
6:     TwoSegmentUpdate(S, part, segments) with odd boundaries fixed
7:     TwoSegmentUpdate(S, part, segments) with even boundaries fixed
8:     reconstruct segments with OptimalSegments(S, part, cluster)
9:     evaluate total fitting cost cost
10:    if this cost is no less than the previous cost then
11:      return the (k, m)-segment trajectory {part, cluster, segments, cost}
5.1 k-Partition Initialization
A modified version of the Ramer-Douglas-Peucker algorithm, RDPPartition (Algorithm 2), is used to partition the input stream into k complete, non-overlapping
sections, the k-partition of the initial (k, m)-segment trajectory.
Algorithm 2 RDP-Based Partition Initialization Algorithm
1: procedure RDPPartition(S, k?)
2:   initialize part as a size-|S| 1-partition
3:   for i = 2 → |S| do
4:     for all data points ($\vec{x}$, t) ∈ S do
5:       let $\vec{L}$ be the spanning segment of the section of part containing ($\vec{x}$, t)
6:       calculate the fitting cost from $\vec{L}$ to ($\vec{x}$, t)
7:     find the pair (($\vec{x}_i$, $t_i$), ($\vec{x}_{i+1}$, $t_{i+1}$)) with the greatest combined fitting cost
8:     insert a boundary between these points into part   ▷ part is an i-partition
9:     if input k is defined then
10:      if i = k then
11:        return part
12:    else
13:      calculate S's optimal i-trajectory with i-partition part
14:      record the fitting cost cost_i of this i-trajectory to S
15:      record the current part as part_i
16:  find the elbow of the (i, cost_i) curve
17:  select k to be the i found this way
18:  return part_k
There are three key modifications from typical RDP:
• The standard RDP algorithm uses a purely geometric definition of the ‘distance’
cost between the model segment and the input values. In the modified version,
at line 6, the stream’s timestamps are used, in conjunction with its data type’s
error model, to calculate the per-point fitting cost as the distance instead.
• The standard RDP algorithm uses each value’s individual cost to its containing
subsequence’s spanning segment (the line segment from the subsequence’s first
value to its last value) as the metric for determining whether to split that subsequence at that value. In the modified version, since the (k, m)-segment mean
allows discontinuities between consecutive segments, the metric, at line 7, is instead the combined costs of each consecutive pair of values within a subsignal.
• The standard RDP algorithm terminates once the maximum cost between a value and its containing subsequence's spanning segment falls within some given ε > 0. In this version, it is fixed that ε = 0, and termination instead occurs
once k − 1 splits have been performed, at line 11, resulting in a k-partition,
assuming the k parameter value is given.
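The following condensed Python sketch illustrates the modified splitting rule (an approximation of Algorithm 2 for the GPS case; span_cost is a hypothetical helper, and the automatic k-selection branch is omitted):

```python
import numpy as np

def span_cost(T, V, i, s, e):
    # squared distance from point i to the time-progressed segment spanning
    # the first and last values of its section [s, e)
    Ts, Te = T[s], T[e - 1]
    w = 0.0 if Te == Ts else (T[i] - Ts) / (Te - Ts)
    return float(np.sum((V[i] - ((1 - w) * V[s] + w * V[e - 1])) ** 2))

def rdp_partition(T, V, k):
    """Split until k sections, each time at the consecutive pair of points
    with the greatest combined fitting cost (lines 7-8 of Algorithm 2)."""
    bounds = [0, len(T)]
    while len(bounds) - 1 < k:
        best, split = -1.0, None
        for s, e in zip(bounds, bounds[1:]):
            costs = [span_cost(T, V, i, s, e) for i in range(s, e)]
            for j in range(len(costs) - 1):
                if costs[j] + costs[j + 1] > best:
                    best, split = costs[j] + costs[j + 1], s + j + 1
        if split is None:       # no section can be split further
            break
        bounds.append(split)
        bounds.sort()
    return bounds               # k+1 boundary indices

t = np.linspace(0.0, 1.0, 9)
v = np.array([[x, abs(x - 0.5)] for x in t])  # one corner at t = 0.5
print(rdp_partition(t, v, 2))                 # boundary lands near index 4
```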
5.1.1 Applying for Non-Linear Segment Models
The RDP algorithm places splits between segments at data values which deviate
the furthest from the existing bounding segments. The efficacy of this approach
is reliant on certain assumptions: namely, that the value discontinuities between
adjacent segment ends are negligible; and that individual segments cannot possess
internal local extrema, so an extremum must indicate a joint between two segments.
GPS data, being derived from a trace of a continuous geographic trajectory with
notionally linear segments, fulfills these requirements, but other data types might
not. Pertinently, video feature vector data does not, as its constant segments can have
arbitrarily large inter-segment discontinuities, and consist entirely of local extrema
by dint of each one having a single value across its time span.
Thus, the data must be transformed into a structure which does fulfill these assumptions before RDP is applied. For feature data, this transformation is simple; the
stream is cumulatively summed. The discontinuity between adjacent sums is of the
magnitude of a single input value, and constant segments in the original data space
correspond to linear segments in the transformed space. RDP can then be effectively
applied to this transformed data stream.
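For instance (a minimal sketch assuming numpy):

```python
import numpy as np

# Section 5.1.1's transformation: cumulative summation turns the constant
# segments of a feature stream into linear segments, restoring the
# assumptions that the RDP initialization relies on.
features = np.array([[1.0, 0.0]] * 4 + [[0.0, 2.0]] * 4)  # two constant segments
transformed = np.cumsum(features, axis=0)                  # two linear segments
print(transformed[:, 0])  # [1. 2. 3. 4. 4. 4. 4. 4.]: slope change at the joint
```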
5.1.2 Selecting the k Parameter
If k is not given to the algorithm as input, then the k-partition initialization step
attempts to determine the best value of k to fit the input stream. It uses a standard
elbow finding method in order to identify the k at which the cost improvements gained
from adding clusters shift from significant to incremental [17]. Moreover, because the
RDP algorithm is naturally progressive (i.e. it inductively calculates the result for k
from the result for (k − 1)), it is possible to calculate the fitting costs for all k up to
the elbow region, sufficient for finding the elbow at line 16, in the same asymptotic
runtime as calculating for a single k.
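One common realization of the elbow criterion (an assumption on our part; [17] surveys the method) takes the point of the (k, cost) curve farthest from the chord joining its endpoints:

```python
import numpy as np

def find_elbow(ks, costs):
    """Return the k whose (k, cost) point lies farthest from the chord
    between the curve's first and last points."""
    ks, costs = np.asarray(ks, float), np.asarray(costs, float)
    p0 = np.array([ks[0], costs[0]])
    chord = np.array([ks[-1], costs[-1]]) - p0
    chord /= np.linalg.norm(chord)
    rel = np.column_stack([ks, costs]) - p0
    dists = np.abs(rel[:, 0] * chord[1] - rel[:, 1] * chord[0])  # 2D cross product
    return int(ks[np.argmax(dists)])

print(find_elbow(range(1, 8), [100, 40, 18, 10, 8, 7, 6.5]))  # -> 3
```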
5.2 m-Clustering Initialization
Once the input stream has been partitioned, the k sections are grouped into m clusters
using the k-means algorithm, as described in KMeansCluster (Algorithm 3). Each
centroid is a model segment, and the appropriate error model is used to calculate the
cost of assigning a given section to a given centroid. However, it is important to note
that these sections may have internal cost: the fitting cost of a single section to its
own centroid is not necessarily zero. In keeping with the principle of null optimality,
therefore, each section’s fitting cost is subtracted from its centroid-assignment costs.
In this way, sections with higher internal cost are not unduly favored. The result of
this process is an m-clustering, which assigns the k sections to m clusters.
5.2.1 Selecting the m Parameter
Similar to the k parameter selection process, the m parameter is selected using the
elbow method. However, because k-means is not progressive in the same way, instead
a ternary search is used to locate the elbow value among m ∈ {1, ..., k}, performing k-means at each candidate value. This multiplies the asymptotic runtime of the process
by a factor of O(log k) compared to performing a single k-means with m as an input.
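A sketch of such a discrete ternary search (the unimodal 'elbow strength' criterion g is a hypothetical stand-in for the evaluation performed in Algorithm 3):

```python
def ternary_search_max(lo, hi, g):
    """Locate the maximizer of a unimodal g on the integers {lo..hi},
    evaluating g at only O(log(hi - lo)) candidate values."""
    while hi - lo > 2:
        ma = lo + (hi - lo) // 3         # two interior candidates
        mb = hi - (hi - lo) // 3
        if g(ma) < g(mb):
            lo = ma + 1                  # the maximum lies right of ma
        else:
            hi = mb                      # the maximum lies at or left of mb
    return max(range(lo, hi + 1), key=g)

# toy unimodal criterion peaking at m = 37 (hypothetical):
print(ternary_search_max(1, 300, lambda m: -(m - 37) ** 2))  # -> 37
```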
5.3 Segment m-Set Construction
Once the k-partition and m-clustering have been initialized, the set of m underlying
path segments can be produced by OptimalSegments (Algorithm 4). For each
cluster, a segment is calculated to minimize the total error from it to the partition
Algorithm 3 K-Means Section Clustering Algorithm
1: procedure KMeansCluster(S, part, m?)
2:   if m is defined then
3:     set search range to {m}
4:   else
5:     initialize discrete ternary search range {1, ..., part.k}
6:     calculate and record the 1-clustering cost cost_1
7:     calculate and record the (part.k)-clustering cost cost_part.k
8:   loop
9:     if search range contains at least two non-boundary values then
10:      select candidate non-boundary values m_a and m_b in search range
11:      let M be {m_a, m_b}
12:    else
13:      let M be the entire search range
14:    for all m_i ∈ M do
15:      initialize m_i-clustering cluster of part's sections with k-means++
16:      perform k-means on cluster
17:      record current cluster as cluster_{m_i}
18:      calculate and record the m_i-clustering cost cost_{m_i}
19:    find elbow in {(m_i, cost_{m_i})} with limits {(1, cost_1), (part.k, cost_part.k)}
20:    if search range contains at least two non-boundary values then
21:      update search according to whether the elbow is m_a or m_b
22:    else
23:      return cluster_m for elbow value m
sections assigned to that cluster. This set will be equivalent to the final centroid set
from the k-means process.
Algorithm 4 Optimal Path Segment Set Construction Algorithm
1: procedure OptimalSegments(S, part, cluster)
2:   for all clusters in cluster do
3:     collect all sections of part assigned to that cluster by cluster
4:     extract all substreams of S defined by those sections
5:     calculate the path segment with minimal total fitting cost to those substreams
6:   return the set of those m path segments
5.4 Expectation-Maximization Iteration
Once the components of the (k, m)-segment trajectory have been initialized, the algorithm performs an iterative improvement process using an expectation-maximization
approach.
5.4.1 Expectation: k-Partition and m-Clustering Improvement
While the set of m path segments is held fixed, the k-partition boundaries and m-clustering assignments are jointly improved. This is achieved with two substeps: in each one, either the even or odd partition boundaries are held fixed, and each two-section pair between them is independently updated using TwoSegmentUpdate
(Algorithm 5) by calculating its 2-segment optimum given the existing path segment
set. In this way, an NP-hard problem is reduced through approximation to a series
of parallelizable polynomial problems. Moreover, as long as the initialization process
was effective enough to place the (k, m)-segment trajectory in the correct local descent
region, the effect of this approximation on the resulting cost is negligible.
Algorithm 5 2-Segment Partition and Clustering Update
1: procedure TwoSegmentUpdate(S, part, segments)
2:   for all adjacent section pairs in part do
3:     join that pair into a single section
4:     for all possible boundary positions in that section do
5:       split that section into two at that position
6:       for all segments in segments do
7:         calculate the fitting cost of that segment to S within each section
8:       identify the lowest-cost segment for each section
9:     identify the boundary position with the lowest-cost pair of sections
10:  return the aggregate (k, m)-segment trajectory
5.4.2 Maximization: Segment m-Set Reoptimization
Once the k-partition and m-clustering have been updated, they are in turn held
fixed while the set of m path segments is recalculated based on them, again using
OptimalSegments. Unlike the expectation step, this optimization is not incremental; rather, the segments are completely and precisely refitted to their assigned
sections, in the same way that they were initially constructed.
5.4.3 Termination
This iterative process continues until an iteration occurs in which the total fitting
cost does not improve. At this point, the locally optimal (k, m)-segment trajectory
has been found, and the algorithm terminates, producing this trajectory as the approximate (k, m)-segment mean. See Chapter 7 for a full experimental analysis of the
resulting output.
Chapter 6
Optimization
Given this algorithm’s extensive use of the EM paradigm, it is nontrivial to realistically evaluate its runtime, even before the application of any optimization techniques.
Instead, we will identify and justify the various individual runtime improvements
which this implementation achieves, and evaluate their relative gains over the naive
alternative.
6.1 Fitting Cost Coresets
Many of the significant runtime reductions achieved by this implementation result from the use of fitting cost coresets. In general, coresets are a type of intermediate data structure, derived from an initial data set, which can reduce the runtime of performing a certain calculation in aggregate on that data set, possibly with a degree of approximation [7]. In this case, a lossless coreset is used to reduce the amortized runtime of a group of fitting cost calculations. Specifically, these coresets should have three particular properties.
Property (section-segment independence). The fitting cost coreset of a stream section should be constructible independently of any path segment. The fitting cost to a particular segment can then be calculated using only that segment and the previously constructed coreset. The runtime necessary to construct a coreset should be asymptotically no greater than that which is necessary to perform a traditional fitting cost calculation. The runtime necessary to calculate a fitting cost using an existing coreset should be asymptotically less than that which is necessary to perform a traditional fitting cost calculation.
In other words, using an intermediate coreset to calculate a fitting cost should
never have a worse runtime than calculating the cost directly, and should have a
better runtime when multiple costs are calculated for a single stream section.
Property (cumulative construction). Given a stream section and the corresponding
fitting cost coreset, it should be possible, after adding a single point to either end of the
section, to update the coreset to include this point in a runtime which is asymptotically
faster than completely recalculating the new coreset from the updated section.
Essentially, this allows the fitting cost coreset to be cumulatively calculated for a given stream section, producing the coresets for all O(n) left- or right-affixed subsections in a reduced runtime compared to what is generally necessary to construct O(n) different coresets.
Property (optimizability). The optimal model segment for a stream section should be calculable from that section's coreset in a runtime asymptotically no worse than that of calculating the segment directly from the stream.
For both GPS and feature data, a fitting cost coreset can be constructed in linear time relative to the stream section's size, at which point a fitting cost can be calculated in constant time relative to that size. A group of cumulative coresets can still be constructed in linear time, rather than the natural quadratic runtime. The optimal segment can also be calculated from the coreset in constant time per dimension.
6.1.1 GPS Coresets
The fitting cost from a trajectory line segment $\vec\lambda(t)$ to a GPS stream section $\vec{S} = (T, \vec{X})$ can be expressed in matrix terms as $\|\vec{Y}_T - \vec{X}\|_F^2$, where $\vec{Y}_T$ consists of the values of $\vec\lambda(t)$ at each $T_i \in T$ [18]. For a path line segment described by its endpoints $(\vec\lambda_s, \vec\lambda_e)$, then, the segment must first be progressed into the time domain so that it exactly spans the range $[T_1, T_{|T|}] = [T_s, T_e]$. The value of this time-progressed segment is $\vec\lambda_{[T_s,T_e]}(t) = \frac{1}{T_e - T_s}\left((T_e - t)\vec\lambda_s + (t - T_s)\vec\lambda_e\right)$. Therefore, $\vec{Y} = A\vec\Lambda$, where $A_{*i} = \frac{1}{T_e - T_s}\begin{bmatrix} T_e - T_i & T_i - T_s \end{bmatrix}$ for $i = 1 \to |S|$ and $\vec\Lambda = \begin{bmatrix} \vec\lambda_s \\ \vec\lambda_e \end{bmatrix}$. By the definition of the Frobenius norm, $\|\vec{Y}_T - \vec{X}\|_F^2 = \operatorname{tr}((\vec{Y}_T - \vec{X})^T(\vec{Y}_T - \vec{X}))$. Applying the value of $\vec{Y}_T$ and rearranging the terms, it is found that

$$C(\vec{S} \mid \vec\Lambda) = \begin{bmatrix} (A^TA)_{11} \\ (A^TA)_{12} \\ (A^TA)_{22} \\ (\vec{X}^TA)_{*1} \\ (\vec{X}^TA)_{*2} \\ \operatorname{tr}(\vec{X}^T\vec{X}) \end{bmatrix} \cdot \begin{bmatrix} \|\vec\lambda_s\|^2 \\ 2\,\vec\lambda_s \cdot \vec\lambda_e \\ \|\vec\lambda_e\|^2 \\ -2\vec\lambda_s \\ -2\vec\lambda_e \\ 1 \end{bmatrix}$$
Theorem 2. The fitting cost coreset of GPS data can be expressed in terms of eight
parameters of size O(1), each of which can be calculated cumulatively over the point
stream. These parameters can be calculated for a cumulative coreset group of size n
in O(n) time, and a fitting cost can be calculated from these parameters in O(1) time.
Proof. Let $B_{*i} = \begin{bmatrix} T_e - T_i & T_i - T_s \end{bmatrix}$ so that $A = \frac{1}{T_e - T_s}B$. As shown above, the fitting cost can be expressed in terms of six parameters derived from $A$ and $\vec{X}$, meaning that they can be derived from $B$, $\vec{X}$, $T_s$ and $T_e$. Given these parameters, the fitting cost expression can thus be calculated in constant time.

Moreover, each of these parameters can be updated with additional GPS points in constant time per point added. $T_s$ is simply replaced if the new point is at the beginning of the stream section, or $T_e$ if it is at the end. For the other six terms, cumulative sums must be used. First, we will show that this can be achieved for the progressive addition of points to the top of the stream section (i.e., for a set of left-affixed subsets), and then we will show that this calculation can be wrapped in a stream section transformation which allows it to apply to the bottom of the section as well (i.e., right-affixed subsets). Let $j = 0 \to n$ be the right bound index of the subset.

$$(B^TB)_{11} = \sum_{i=1}^{j} (T_j - T_i)^2 = jT_j^2 - 2T_j\sum_{i=1}^{j} T_i + \sum_{i=1}^{j} T_i^2$$

$$(B^TB)_{12} = \sum_{i=1}^{j} (T_i - T_1)(T_j - T_i) = -jT_1T_j + (T_1 + T_j)\sum_{i=1}^{j} T_i - \sum_{i=1}^{j} T_i^2$$

$$(B^TB)_{22} = \sum_{i=1}^{j} (T_i - T_1)^2 = jT_1^2 - 2T_1\sum_{i=1}^{j} T_i + \sum_{i=1}^{j} T_i^2$$

$$(\vec{X}^TB)_{*1} = \sum_{i=1}^{j} (T_j - T_i)\vec{X}_{i*} = T_j\sum_{i=1}^{j} \vec{X}_{i*} - \sum_{i=1}^{j} T_i\vec{X}_{i*}$$

$$(\vec{X}^TB)_{*2} = \sum_{i=1}^{j} (T_i - T_1)\vec{X}_{i*} = \sum_{i=1}^{j} T_i\vec{X}_{i*} - T_1\sum_{i=1}^{j} \vec{X}_{i*}$$

$$\operatorname{tr}(\vec{X}^T\vec{X}) = \sum_{i=1}^{j} \|\vec{X}_{i*}\|^2 = \|\vec{X}\|_F^2$$

These expressions can also be used to calculate the coreset parameters for the right-affixed subset group, by applying a transformation to the stream section before processing, and then reversing the transformation on the calculated values which are input into the coreset parameter expressions. Specifically, the order of the elements in $\vec{X}$ and $T$ must be reversed, and the timestamp values in $T$ must be negated (so as to invert the relative values of any two timestamps in the stream). Accordingly, $T_1$ and $T_j$ must be negated and switched with one another, and the cumulative terms $\sum_{i=1}^{j} T_i$ and $\sum_{i=1}^{j} T_i\vec{X}_{i*}$ must also be negated (these are the only cumulative terms which contain unsquared time values). Calculating the parameters from these modified intermediate values produces coresets corresponding to the group of right-affixed stream subsets.

Therefore, the left- and right-affixed subset group coresets of a size-$n$ stream section can be constructed in $O(n)$ time, at which point a fitting cost for any of these subsets can be calculated from the corresponding coreset in $O(1)$ time.
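As an illustration of Theorem 2, the following numpy sketch builds the cumulative parameters for the left-affixed subset group and evaluates an O(1) fitting cost from them. The array layout and names are assumptions for illustration, not the thesis's MATLAB code.

```python
import numpy as np

# Illustrative cumulative GPS coreset: O(n) prefix sums, then an O(1)
# fitting cost per left-affixed subset for any candidate segment.

def left_affixed_coresets(T, X):
    """T: (n,) timestamps, X: (n, 2) GPS points. Returns prefix sums."""
    return {
        "sum_T":  np.cumsum(T),                       # sum of T_i
        "sum_T2": np.cumsum(T ** 2),                  # sum of T_i^2
        "sum_X":  np.cumsum(X, axis=0),               # sum of X_i
        "sum_TX": np.cumsum(T[:, None] * X, axis=0),  # sum of T_i X_i
        "sum_XX": np.cumsum(np.sum(X ** 2, axis=1)),  # sum of ||X_i||^2
    }

def fit_cost(cs, T, j, lam_s, lam_e):
    """O(1) fitting cost of segment (lam_s, lam_e) to the subset T[0..j]."""
    T1, Tj, n = T[0], T[j], j + 1
    sT, sT2 = cs["sum_T"][j], cs["sum_T2"][j]
    sX, sTX, sXX = cs["sum_X"][j], cs["sum_TX"][j], cs["sum_XX"][j]
    BB11 = n * Tj**2 - 2 * Tj * sT + sT2              # (B^T B)_11
    BB12 = -n * T1 * Tj + (T1 + Tj) * sT - sT2        # (B^T B)_12
    BB22 = n * T1**2 - 2 * T1 * sT + sT2              # (B^T B)_22
    XB1 = Tj * sX - sTX                               # (X^T B)_*1
    XB2 = sTX - T1 * sX                               # (X^T B)_*2
    w = Tj - T1 if Tj != T1 else 1.0                  # A = B / (T_e - T_s)
    return (BB11 * lam_s @ lam_s + 2 * BB12 * lam_s @ lam_e
            + BB22 * lam_e @ lam_e) / w**2 \
           - 2 * (XB1 @ lam_s + XB2 @ lam_e) / w + sXX
```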
Theorem 3. Given the coreset for a GPS stream section of any size, the optimal line
path segment for that section can be calculated in constant time.
Proof. The objective is to calculate $\operatorname{argmin}_{\vec\Lambda} C(\vec{S} \mid \vec\Lambda)$. This must occur where all the $\lambda$-derivatives are equal to zero.

$$\begin{bmatrix} \frac{d}{d\vec\lambda_s} \\ \frac{d}{d\vec\lambda_e} \end{bmatrix} C(\vec{S} \mid \vec\Lambda) = \begin{bmatrix} \vec{0} \\ \vec{0} \end{bmatrix}$$

By applying the expression for $C(\vec{S} \mid \vec\Lambda)$ in terms of the parameters of the coreset, the optimal $\vec\Lambda$ is found.

$$\begin{bmatrix} \vec\lambda_s \\ \vec\lambda_e \end{bmatrix} = \frac{T_e - T_s}{(B^TB)_{11}(B^TB)_{22} - ((B^TB)_{12})^2} \begin{bmatrix} (B^TB)_{22}(\vec{X}^TB)_{*1} - (B^TB)_{12}(\vec{X}^TB)_{*2} \\ (B^TB)_{11}(\vec{X}^TB)_{*2} - (B^TB)_{12}(\vec{X}^TB)_{*1} \end{bmatrix}$$

$$= \frac{1}{(A^TA)_{11}(A^TA)_{22} - ((A^TA)_{12})^2} \begin{bmatrix} (A^TA)_{22}(\vec{X}^TA)_{*1} - (A^TA)_{12}(\vec{X}^TA)_{*2} \\ (A^TA)_{11}(\vec{X}^TA)_{*2} - (A^TA)_{12}(\vec{X}^TA)_{*1} \end{bmatrix}$$
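A corresponding sketch of Theorem 3's closed form, again illustrative rather than the thesis implementation; for clarity it builds $B$ explicitly, though in the cumulative setting these products would come from the prefix sums shown earlier.

```python
import numpy as np

# Sketch of Theorem 3: the optimal line segment endpoints from the GPS
# coreset parameters. Assumes at least two distinct timestamps, so the
# 2x2 determinant below is nonzero.

def optimal_segment(T, X):
    """T: (n,) timestamps, X: (n, 2) points. Returns (lam_s, lam_e)."""
    Ts, Te = T[0], T[-1]
    B = np.column_stack((Te - T, T - Ts))   # B_{*i} = [Te - Ti, Ti - Ts]
    BB, XB = B.T @ B, X.T @ B               # (B^T B), (X^T B)
    det = BB[0, 0] * BB[1, 1] - BB[0, 1] ** 2
    lam_s = (Te - Ts) * (BB[1, 1] * XB[:, 0] - BB[0, 1] * XB[:, 1]) / det
    lam_e = (Te - Ts) * (BB[0, 0] * XB[:, 1] - BB[0, 1] * XB[:, 0]) / det
    return lam_s, lam_e
```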
6.1.2 Feature Vector Coreset
The fitting cost coreset for video feature vector data is conceptually simpler than that
for GPS data. However, unlike GPS data’s fixed 2-dimensionality, feature data can
have an arbitrary dimension d, so its coreset’s behavior must be described in terms
of that value as well as n.
Theorem 4. The fitting cost coreset of d-dimensional feature data can be expressed in
terms of three parameters of size O(d), each of which can be calculated as a cumulative
sum over the vector stream. A fitting cost can be calculated from these parameters in
O(d) time, and these parameters can be calculated from a cumulative coreset group
of size n in O(nd) time. The optimal model vector can also be calculated from these
parameters in O(d) time.
Proof. Consider the fitting cost function for feature vectors:

$$C((T, \vec{X}), \vec\lambda) = \sum_{j=1}^{d} \sum_{i=1}^{n} \left( (\vec\lambda_j - \vec{X}_{ij}) - \vec{X}_{ij}(\log(\vec\lambda_j) - \log(\vec{X}_{ij})) \right)$$

$$C((T, \vec{X}), \vec\lambda) = \sum_{j=1}^{d} \left( n\vec\lambda_j - (1 + \log(\vec\lambda_j)) \sum_{i=1}^{n} \vec{X}_{ij} + \sum_{i=1}^{n} \vec{X}_{ij} \log(\vec{X}_{ij}) \right)$$

$$C((T, \vec{X}), \vec\lambda) = \sum_{j=1}^{d} \begin{bmatrix} \vec\lambda_j \\ -(1 + \log(\vec\lambda_j)) \\ 1 \end{bmatrix} \cdot \begin{bmatrix} n \\ \sum_{i=1}^{n} \vec{X}_{ij} \\ \sum_{i=1}^{n} \vec{X}_{ij} \log(\vec{X}_{ij}) \end{bmatrix}$$

A feature data coreset consists of three parameters: the scalar $n$, the size-$d$ vector $\sum_{i=1}^{n} \vec{X}_i$, and the size-$d$ vector $\sum_{i=1}^{n} \vec{X}_i \log(\vec{X}_i)$. Multiplying each parameter by its corresponding $\lambda$-derived coefficient takes $O(d)$ time, so a fitting cost can be calculated from these parameters in that runtime. Since each parameter can be expressed as a sum of terms formed from individual data values in $O(d)$ time, an entire cumulative coreset group of size $n$ can be constructed in $O(nd)$ time. The optimal model segment of a video feature data set is $E[\vec{X}] = \frac{1}{n} \sum_{i=1}^{n} \vec{X}_i$, so it can be calculated from the two parameters $n$ and $\sum_{i=1}^{n} \vec{X}_i$ in $O(d)$ time.
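For concreteness, a numpy sketch of this three-parameter coreset, assuming strictly positive feature values so that the logarithms are defined:

```python
import numpy as np

# Illustrative feature-vector coreset of Theorem 4: three parameters of
# size O(d), an O(d) fitting cost, and the O(d) optimal segment.

def feature_coreset(X):
    """X: (n, d) array of positive feature vectors."""
    n = X.shape[0]
    return n, X.sum(axis=0), (X * np.log(X)).sum(axis=0)

def feature_fit_cost(coreset, lam):
    """O(d) fitting cost of the constant segment lam to the summarized data."""
    n, sum_X, sum_XlogX = coreset
    return float(np.sum(n * lam - (1 + np.log(lam)) * sum_X + sum_XlogX))

def feature_optimal_segment(coreset):
    """Optimal constant segment: the mean vector, from two parameters."""
    n, sum_X, _ = coreset
    return sum_X / n
```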
6.2 RDP Partition Initialization
The standard RDP algorithm has an expected runtime of O(nd log n) with a worst
case of O(n2 d). Described purely in terms of the input stream’s size and dimension,
the modified RDP used to initialize the k-partition shares this runtime behavior.
However, it can be more precisely analyzed by also accounting for the parameter k.
If k is specified as input, then the algorithm can terminate once that k has been
achieved, giving an expected runtime of O(nd log k) with a worst case of O(nkd).
If k is not specified and must be automatically selected with the elbow finding method, then the algorithm cannot necessarily terminate at the selected k, because additional candidates beyond that value must be tested in order to determine that that k is in fact the optimal one. However, this does not mean that all candidates $k \in \{1, ..., n\}$ must always be tested. Since the fitting cost can never be reduced below 0, testing can cease once it reaches a $k = k_t$ where, even if its resulting cost were 0, it would not be preferable to the best k found so far. Given this, the runtime is $O(nd \log k_t)$ expected and $O(nk_td)$ worst case.
In theory, $k_t$ is not guaranteed to be lower than $n$. However, it is easy to see that, for reasonable input data, $k_t$ is significantly smaller than $n$. Let $k^*$ be the optimal $k$, which will be selected by the algorithm. Let $c_1$ and $c_{k^*}$ be, respectively, the fitting cost at $k = 1$ and $k = k^*$. Let $r_{k^*}$ be the cost value of the elbow method's reference line at $k = k^*$. The fractional reduction of $k_t$ from $n$ is equal to the fractional reduction of $c_{k^*}$ from $r_{k^*}$.

$$\frac{n - k_t}{n} = \frac{r_{k^*} - c_{k^*}}{c_1}$$

In other words, $k_t$'s improvement of the RDP process's runtime is proportional to the strength of $k^*$'s optimality, the degree to which it improves the fitting cost relative to the reference line. The only way for $k_t = n$ is if $c_{k^*} = r_{k^*}$, which is essentially impossible even with unreasonable input data. In practice, $k_t$ usually provides a runtime decrease of at least twofold, and often more.
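A sketch of this early-termination rule follows. The reference line here is assumed to fall linearly from $c_1$ at $k = 1$ to 0 at $k = n$; the exact reference used by the thesis's elbow method is defined earlier in the document, so this form, like the cost_at stand-in (one RDP run per candidate k), is purely illustrative.

```python
# Illustrative early-stopping loop for elbow-based selection of k.

def select_k(cost_at, n):
    c1 = cost_at(1)
    reference = lambda k: c1 * (n - k) / (n - 1)   # assumed reference line
    best_k, best_gap = 1, 0.0
    for k in range(1, n + 1):
        gap = reference(k) - cost_at(k)            # elbow strength at this k
        if gap > best_gap:
            best_k, best_gap = k, gap
        if reference(k) < best_gap:  # even a zero cost could not beat best_k,
            break                    # so k_t has been reached; stop testing
    return best_k
```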
6.3 K-Means Clustering Initialization
The behavior of the k-means algorithm is very well studied, and is understood to have a high worst-case runtime but to perform much better in practice. Though it is impractical to attempt to fully analyze the runtime of this step, we can still demonstrate significant increases by improving the performance of the process's core operation: the calculation of fitting costs between centroid segments and stream sections.

Calculating the fitting cost of a size-n stream section to a single path segment takes O(n) time, and so, naively, calculating those costs for m segments takes O(nm) time. Using fitting cost coresets, however, the O(n)-time processing of the stream sections can be separated from the particular path segment being fitted to them, allowing multiple fitting costs to be evaluated in only O(n + m) time. Since the k-partition boundaries are fixed throughout the entire m-clustering initialization process, the coresets for the k sections can be calculated once at the beginning, replacing the k-means process's runtime dependence on n with the (usually) much lower k. This improvement applies both to the k-means++ initialization step and to the iterative clustering steps.
Additionally, the k-means++ step can in its entirety be extracted from the O(log k)-iteration search for the parameter m, by simply creating a complete ordering of the k sections according to the usual k-means++ process, and then truncating this ordering as necessary for each individual k-means run. As well as reducing the total runtime of that step by a factor of O(log k), this modification also reduces the random variation between k-means runs. This stabilizes the reliability of meaningful comparisons between the clustering costs resulting from different values of m, thus improving the algorithm's ability to accurately identify the correct elbow value for that parameter.
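A sketch of this one-time ordering, with dist standing in for the coreset-based fitting distance between a section and a chosen seed; each k-means run for a candidate m then simply seeds from order[:m].

```python
import random

# Illustrative k-means++ ordering over the k section coresets, drawn once
# and truncated per candidate m. dist(a, b) is a stand-in for the
# fitting-cost-based distance used by the actual implementation.

def kmeanspp_ordering(sections, dist, rng=random.Random(0)):
    n = len(sections)
    order = [rng.randrange(n)]                     # first seed: uniform
    d2 = [dist(s, sections[order[0]]) ** 2 for s in sections]
    while len(order) < n:
        if sum(d2) == 0:                           # remaining sections coincide
            order += [i for i in range(n) if i not in order]
            break
        # sample the next seed with probability proportional to squared distance
        pick = rng.choices(range(n), weights=d2, k=1)[0]
        order.append(pick)
        d2 = [min(d2[i], dist(sections[i], sections[pick]) ** 2)
              for i in range(n)]
    return order
```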
6.4 Path Segment Calculation
The specific operation necessary to construct the optimal path segments, given a partition and clustering of a data stream, is dependent on the data type of that stream. For both GPS and video feature data, the optimal segment for a single cluster can be calculated in time linear in the number of data values in that cluster, using the fitting cost coreset, and can be calculated independently for each dimension. Therefore, the full segment set can be calculated in linear time, O(nd).
6.5 2-Segment Trajectory Update
For a single stream section of size $n_i$ and a set of $m$ path segments, there are $O(n_im^2)$ possible 2-segment trajectories. Calculating the fitting cost of each of those takes $O(n_i)$ time, so naively a single 2-segment trajectory update will take $O(n_i^2m^2)$ time. However, by calculating separately the optimal path segment for each of the two trajectory segments, the optimal 2-segment trajectory, given the location of the segment split, can be calculated in $O(n_im)$ time, resulting in a total $O(n_i^2m)$ runtime to find the optimal segment split location as well.

This runtime can be further reduced through the use of fitting cost coresets. The set of coresets for all left-affixed subsections and for all right-affixed subsections can be calculated in $O(n_i)$ time. The optimal path segment for each of these subsections can then be found in $O(n_im)$ time altogether. By then finding the best of the $n_i$ pairings of a left-affixed subsection and a right-affixed subsection with the same non-affixed boundary location, the optimal 2-segment trajectory can be found in $O(n_im)$ time. Therefore, the optimal 2-segment trajectories for all sections of a partitioned size-$n$ stream can be found in $O\left(\sum_{i=1}^{k=O(n)} n_im\right) = O(nm)$ time.
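The following sketch illustrates the prefix/suffix pairing for the simpler feature-vector case, using the three-parameter coreset introduced above; the same pattern applies to the GPS coreset. It is an illustration under assumed data layout, not the thesis implementation.

```python
import numpy as np

# Illustrative prefix/suffix split search: left- and right-affixed feature
# coresets are built once, after which each candidate segment is scored
# against every split from the summarized sums rather than the raw stream.

def best_two_segment_split(X, segments):
    """X: (n, d) positive features; segments: (m, d) candidate segments.
    Returns (cost, split, left-segment index, right-segment index)."""
    n = X.shape[0]
    XlogX = X * np.log(X)
    pre_X, pre_L = np.cumsum(X, axis=0), np.cumsum(XlogX, axis=0)
    suf_X = pre_X[-1] - pre_X       # suffix sums: right-affixed subsections
    suf_L = pre_L[-1] - pre_L

    def cost(cnt, sX, sL, lam):     # O(d) coreset fitting cost
        return float(np.sum(cnt * lam - (1 + np.log(lam)) * sX + sL))

    best = None
    for b in range(1, n):           # boundary between X[:b] and X[b:]
        c1, s1 = min((cost(b, pre_X[b - 1], pre_L[b - 1], lam), i)
                     for i, lam in enumerate(segments))
        c2, s2 = min((cost(n - b, suf_X[b - 1], suf_L[b - 1], lam), i)
                     for i, lam in enumerate(segments))
        if best is None or c1 + c2 < best[0]:
            best = (c1 + c2, b, s1, s2)
    return best
```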
Chapter 7
Experimental Evaluation
We use two metrics to evaluate the ApproxKMSegmentMean algorithm: the fitting cost (i.e., error) and the runtime. A resulting (k, m)-segment mean approximation is meaningless if it does not fit the input data set well, but the algorithm is not of any use if it cannot process large sets with reasonable speed. In order to evaluate the algorithm's effectiveness in these contexts, a series of experiments on several data sets was used to collect empirical measurements. These statistics have been analyzed to provide a quantitative and qualitative understanding of ApproxKMSegmentMean's behavior.
7.1 Experimental Setup
7.1.1 Datasets
Five data sets of varying size, frequency, and qualitative characteristics were used to
test the effectiveness of this algorithm. Some of these are the same sets used in [18],
though without the additional preprocessing steps described therein.
ground robot
72,273 points, produced by SLAM localization, first floor of an academic building,
indoors only, 4 hours.
Using a 30-meter-capable Hokuyo scanning laser rangefinder and the GMapping SLAM package for ROS (Robot Operating System), a custom-built omnidirectional ground robot was remotely operated to explore the first floor of an academic building, concurrently mapping its surroundings and localizing itself. The resulting path is high-noise, with loops and repeated sections.
quadrotor robot
12,000 points, produced by smartphone GPS, courtyard of an academic building,
outdoors only, 10 minutes.
Using a Samsung Galaxy SIII smartphone and a single onboard computer with ROS,
an Ascending Technologies Pelican quadrotor flying robot was remotely operated
above an outdoor courtyard, collecting filtered GPS data at a rate of 20 Hz.
personal smartphone
20,051 points, produced by smartphone GPS, greater Boston area, indoors and outdoors, 8 months.
Using the Travveler data logging smartphone application, GPS data was collected from an individual's phone at an approximate rate of 30 Hz, with frequent and significant gaps in collection. The data was sanitized to remove points with non-unique timestamps, but in contrast to the experiments in [18], it is not patched to remove discontinuities, as this (k, m)-segment mean algorithm does not rely on the point signal being of a near-constant rate.
short phone video
9,900 vectors, produced by smartphone video, third floor of an academic building,
indoors only, 5.5 minutes.
Using the built-in camera of a handheld Samsung Galaxy S4 smartphone, a 1920x1080
video was recorded at 30 frames per second over the course of about 5 minutes. This
video shows the forward perspective of an individual traversing a significantly varying path several times amongst several different locations, pausing to observe each
one upon arrival.
[Diagram: five nodes labeled A through E with connecting edges.]

Figure 7-1: A conceptual layout of the region traversed by the individual as recorded in the short phone video. In terms of this graph, the individual's trajectory would be labeled as ABCDBAEDCEBCA.
long phone video
19,800 vectors, produced by smartphone video, throughout an academic building,
indoors and outdoors, half an hour.
Using the built-in camera of a handheld Samsung Galaxy S4 smartphone, a 1920x1080
video was recorded at 10 frames per second over the course of about half an hour.
This video shows the forward perspective of an individual traversing a slightly varying path ten times amongst several very different locations, pausing to observe each
one upon arrival.
[Diagram: four nodes labeled A through D with connecting edges.]

Figure 7-2: A conceptual layout of the region traversed by the individual as recorded in the long phone video. In terms of this graph, the individual's trajectory would be labeled as ABCABCABDABCABDABCABCABCABDABCA.
7.1.2 Processing Environment

These results were produced using an implementation of the ApproxKMSegmentMean algorithm in MATLAB (R2013a), running in 64-bit Windows 8 on a 2 GHz Intel Core i7 four-core processor with 6 GB RAM.
7.1.3 Proportional Fitting Cost

The proportional fitting cost was used as the primary metric of a solution's fit to its input point signal.

Definition 11 (proportional fitting cost). Given an input signal $S = (T, \vec{V})$ of size $|S| = n$ and the total fitting cost $C_f$ of a (k, m)-segment trajectory to that signal, the proportional fitting cost is $\tilde{C}_f = \frac{C_f}{C_S}$, where $C_S$ is the fitting cost of $S$ to the single constant segment consisting of the mean value vector of $\vec{V}$, $\vec\mu_S = \frac{1}{n}\sum_{i=1}^{n} \vec{V}_i$.
Observation. For the (1, 1)-segment trajectory consisting of the single path segment along $\vec\mu_S$, $\tilde{C}_f = 1$. Therefore, the proportional fitting cost of a (k, m)-segment mean can never be greater than 1.

Observation. For the (n, n)-segment mean of $S$, $\tilde{C}_f = 0$. Therefore, the proportional fitting cost of a (k, m)-segment mean can never be less than zero.
Note that the bounds demonstrated by these observations apply strictly only
to true optimal solutions to the (k, m)-segment mean problem, not to the locally
approximate solutions produced by ApproxKMSegmentMean.
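For concreteness, a sketch of this metric under a squared-error cost; total_cost stands in for the trajectory's total fitting cost $C_f$, which the algorithm computes as described in the preceding chapters.

```python
import numpy as np

# Sketch of Definition 11: the trajectory's total fitting cost, normalized
# by the cost of the single constant segment at the signal's mean vector.

def proportional_fitting_cost(V, total_cost):
    """V: (n, d) signal values; total_cost: fitting cost C_f of a trajectory."""
    mu = V.mean(axis=0)                    # mean value vector of V
    c_s = float(np.sum((V - mu) ** 2))     # cost of the (1, 1) mean segment
    return total_cost / c_s
```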
7.2 Results
7.2.1 Accuracy
[Plot: "proportional fitting costs of experimental data sets (k = 300)"; proportional fitting cost (log scale, $10^0$ to $10^{-6}$) against m value (0 to 300), for all five data sets.]

Figure 7-3: The proportional fitting cost drops off, at first quickly and then more slowly, as the map size increases. Each data set has a characteristic curve relation to m. Note that the feature vector data sets have noticeably higher proportional costs than the GPS data sets. This could be because their much larger dimensionality introduces greater structural cost into the data, or it could indicate that the 'natural' partition size for those sets is greater than k = 300.
7.2.2 Speed
[Plot: "run times of experimental data sets (k = 300)"; run time (seconds per input value, up to 0.2) against m value (0 to 300), for all five data sets.]

Figure 7-4: The algorithm's run time per input value varies significantly around its average relation to m. While some data sets' run times generally increase relative to m, others' are independent of it, or even decrease as it increases. This is likely due to the behavior of the EM loop: increasing m might cause the trajectory to be initialized closer to its local minimum, reducing the number of EM iterations needed, and therefore the total run time. Given these run times, the system could easily process in real time data streams of up to 5 Hz.
7.2.3 Selected Parameters

data set                 | k    | m   | prop. fitting cost | run time (seconds per input)
ground robot GPS         | 62   | 6   | 0.0601831          | 0.0285913
quadrotor robot GPS      | 33   | 5   | 0.0709467          | 0.020304
personal smartphone GPS  | 161  | 9   | 0.0168283          | 0.0309489
short phone video        | 555  | 95  | 0.263151           | 0.192963
long phone video         | 1429 | 190 | 0.230193           | 1.14047

Table 7.1: If the parameters k and m are not provided as input, the algorithm attempts to select good values for them as part of the trajectory initialization. For each experimental data set, it selects a characteristic (k, m) pair. Deviation of these selected values from the ground-truth parameters is primarily a result of the imprecision of the initial assumptions made about the data, such as the linear or constant structure of the path segments. In particular, note that the GPS data sets tend to have low parameter values relative to their ground truths, while the feature vector sets tend to have high values.

The RDP-based partition initialization is reliant on assumptions made about the mathematical shape of the trajectory's segments, and so is rather sensitive to input streams which deviate from these assumptions. This helps to explain the selected parameters' divergence from their ground-truth values, as seen in Table 7.1.
7.2.4 Sample Results

GPS Geographic Maps
[Two maps: "Geographic map of (300, 20)-segment trajectory for ground robot GPS" and "Geographic map of (300, 200)-segment trajectory for ground robot GPS", each showing input data points and calculated path segments.]

Figure 7-5: Two geographic maps of the ground robot data set, with the (300, 20)- and (300, 200)-segment trajectory outputs of the algorithm, respectively. With far fewer path segments to utilize, the m = 20 trajectory's fit to the GPS points is much rougher, compared to the m = 200 trajectory's close fit.
[Two maps: "Geographic map of (300, 20)-segment trajectory for quadrotor robot GPS" and "Geographic map of (300, 200)-segment trajectory for quadrotor robot GPS", each showing input data points and calculated path segments.]

Figure 7-6: Two geographic maps of the quadrotor robot data set, with the (300, 20)- and (300, 200)-segment trajectory outputs of the algorithm, respectively. Since this data set contains a low degree of actual path repetition, the produced trajectories tend to simply align themselves to the GPS points.
[Two maps: "Geographic map of (300, 20)-segment trajectory for personal smartphone GPS" and "Geographic map of (300, 200)-segment trajectory for personal smartphone GPS", each showing input data points and calculated path segments.]

Figure 7-7: Two geographic maps of the personal smartphone data set, with the (300, 20)- and (300, 200)-segment trajectory outputs of the algorithm, respectively. Because of the extended discontinuities in the data set, the algorithm has made a best-effort attempt to bridge these gaps. As a result, some parts of the map traverse areas lacking any input points. It may be valuable to 'repair' such signal gaps using data patching, as described in [18].
Feature Vector Clustering Maps
The size of each blue circle is proportional to the relative prominence of a particular
cluster in the produced trajectory, the number of times which that cluster appears
in the sequence. The thickness of each green line is proportional to the relative
strength of linkage between two clusters, the number of times which one of the two
immediately precedes the other in the sequence, with the absence of a line indicating
that the trajectory never directly transits between two clusters. The images are
representative frames for the most prominent clusters, each one corresponding to the
vector with the lowest fitting cost to its assigned segment. Some of these frames
are manually labeled to identify the ground-truth elements with which their clusters
correspond, using the letters from the conceptual layouts in Section 7.1.1.
[Two maps: "Clustering map of (300, 20)-segment trajectory for short phone video" and "Clustering map of (300, 200)-segment trajectory for short phone video", each showing identified feature clusters, transitions between clusters, and representative frames for the most prominent clusters, some labeled with ground-truth letters A through E.]

Figure 7-8: Two clustering maps of the short phone video data set. Because this video contains only a few repeated loops, the skew of the transition strength towards the most prominent clusters is relatively low, especially when a large number of clusters are allowed. Despite this, the qualitative appearances of the clusters' representative frames are widely varied, even amongst the most prominent clusters. In the left map, the most prominent cluster (cluster 1) occurs 92 times, while the least prominent (cluster 20) only occurs once. In the right map, the most prominent cluster occurs 7 times, while the least prominent (cluster 200) still only occurs once.
[Two maps: "Clustering map of (300, 20)-segment trajectory for long phone video" and "Clustering map of (300, 200)-segment trajectory for long phone video", each showing identified feature clusters, transitions between clusters, and representative frames for the most prominent clusters, some labeled with ground-truth letters A through D.]

Figure 7-9: Two clustering maps of the long phone video data set. Unlike the short video, this video contains a larger number of repeated loops, and so the transitions' strengths tend to skew significantly towards the most prominent clusters. Even at m = 200, the green lines are noticeably thicker and denser around the top-right region of the map, where the larger blue circles are arranged. In the left map, the most prominent cluster (cluster 1) occurs 66 times, while the least prominent (cluster 20) only occurs once. In the right map, the most prominent cluster occurs 8 times, while the least prominent (cluster 200) still only occurs once.
These sample clustering maps demonstrate both the successes and shortcomings
of ApproxKMSegmentMean as applied to feature vector data. On the one hand,
the clear non-uniformity of cluster prominence (the number of sections assigned to the
cluster), as well as the relative skew of transitions towards the most prominent cluster,
demonstrate how the algorithm is able to identify repeated feature characteristics
across the input stream. On the other hand, the presence of a non-negligible number
of less prominent or outright trivial clusters shows cases where the algorithm has failed
to develop a sufficiently robust understanding of the stream’s underlying patterns of
repetition. To some degree, this is a result of the choice of the (k, m)-segment mean
as the mathematical model of these patterns: because the algorithm aims to reduce
the aggregate fitting costs of the (k, m)-segment trajectory, it will tend to fit in a way
which favors small outlier values over larger but less divergent regions of the stream.
It is thus not surprising that a certain fraction of the total clusters are consistently
given over to such outliers, regardless of whether m is large or small.
Chapter 8
Conclusion
The ApproxKMSegmentMean algorithm produces a semantic map representing the underlying patterns found in a long-term data trajectory with significant repetition. It uses a process of intelligent initialization followed by incremental improvement, in order to converge on a (k, m)-segment trajectory with a locally optimal fitting cost to the original data. This algorithm is sufficiently generalized to be applicable to a wide variety of input data types, such as GPS points and video feature vectors. Figure 7-3 shows that the maps produced are close matches to the input data, and Figure 7-4 shows that the algorithm is able to develop these maps quickly enough for real-time applications.
Beyond the objective of solving the (k, m)-segment mean problem to high approximation accuracy, however, these results show the qualitative limitations of this
algorithm as applied to the development of semantic activity maps, such as its reliance on structural assumptions and its sensitivity to outliers. It bears investigating
whether the algorithm or its implementation can be modified in order to dampen
these undesirable operative characteristics.
In addition to improving the robustness of the algorithm to remedy these issues, there are several promising avenues of potential future research stemming from this work. Most obviously, this algorithm could be applied to other types of data, requiring their own error models. More intriguing, however, is the possibility of processing multiple data streams at once. Compound input, synthesized from multiple sensors on a single agent, could massively improve the algorithm's ability to discern underlying patterns in that agent's behavior. Conversely, analyzing input from multiple agents with overlapping regions of experience could produce a much more detailed map of their shared region.
Bibliography

[1] Pankaj K. Agarwal and Nabil H. Mustafa. K-means projective clustering. In 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2004.

[2] James Biagioni and Jakob Eriksson. Inferring road maps from GPS traces: Survey and comparative evaluation. In Transportation Research Board 91st Annual Meeting, 2012.

[3] Lili Cao and John Krumm. From GPS traces to a routable road map. In 17th ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL GIS), 2009.

[4] Winston Churchill and Paul Newman. Experience-based navigation for long-term localisation. The International Journal of Robotics Research, December (Special Issue on Long-Term Autonomy) 2013.

[5] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localisation and mapping (SLAM): Part I the essential algorithms. IEEE Robotics & Automation Magazine, 2006. URL: http://www-personal.acfr.usyd.edu.au/tbailey/papers/slamtute1.pdf.

[6] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localisation and mapping (SLAM): Part II state of the art. IEEE Robotics & Automation Magazine, 2006. URL: http://www-personal.acfr.usyd.edu.au/tbailey/papers/slamtute2.pdf.

[7] Daniel Feldman, Cynthia Sung, and Daniela Rus. The single pixel GPS: Learning big data signals from tiny coresets. In 20th ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL GIS), 2012.

[8] Andrii Ilienko. Continuous counterparts of Poisson and binomial distributions and their properties. Annales Univ. Sci. Budapest, Sect. Comput., 39:137–147, 2013. URL: http://ac.inf.elte.hu/Vol_039_2013/137_39.pdf.

[9] Yasir Latif, César Cadena, and José Neira. Robust loop closing over time for pose graph SLAM. The International Journal of Robotics Research, December (Special Issue on Long-Term Autonomy) 2013.

[10] Yunpeng Li, Noah Snavely, and Daniel P. Huttenlocher. Location recognition using prioritized feature matching. In 11th European Conference on Computer Vision: Part II, 2010. URL: http://www.cs.cornell.edu/~dph/papers/localization.pdf.

[11] Kai Ni, Anitha Kannan, Antonio Criminisi, and John Winn. Epitomic location recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2008. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4587585.

[12] Brian Niehoefer, Ralf Burda, Christian Wietfeld, Franziskus Bauer, and Oliver Lueert. GPS community map generation for enhanced routing methods based on trace collection by mobile phones. In 1st International Conference on Advances in Satellite and Space Communications (SPACOMM), 2009.

[13] Guy Rosman, Mikhail Volkov, Daniel Feldman, and Daniela Rus. Segmentation of big data signals using coresets (provisional title), 2014.

[14] Falko Schmid, Kai-Florian Richter, and Patrick Laube. Semantic trajectory compression. Advances in Spatial and Temporal Databases, 2009.

[15] Florian Schroff, C. Lawrence Zitnick, and Simon Baker. Clustering videos by location. In British Machine Vision Conference, 2009. URL: http://research.microsoft.com/pubs/81738/bmvc09_cr.pdf.

[16] Wenhuan Shi, Shuhan Shen, and Yuncai Liu. Automatic generation of road network map from massive GPS vehicle trajectories. In 12th International IEEE Conference on Intelligent Transportation Systems (ITSC), 2009.

[17] Robert Tibshirani, Guenther Walther, and Trevor Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2002. URL: http://www.stanford.edu/~hastie/Papers/gap.pdf.

[18] Cathy Wu. GPSZip: Semantic representation and compression system for GPS using coresets. Master's thesis, Massachusetts Institute of Technology, 2013.

[19] J. J. C. Ying, W. C. Lee, T. C. Weng, and V. S. Tseng. Semantic trajectory mining for location prediction. In 19th ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL GIS), 2011.