Hindawi Publishing Corporation
Abstract and Applied Analysis
Volume 2013, Article ID 603629, 7 pages
http://dx.doi.org/10.1155/2013/603629
Research Article
Piecewise Trend Approximation: A Ratio-Based Time
Series Representation
Jingpei Dan,1 Weiren Shi,2 Fangyan Dong,3 and Kaoru Hirota3

1 College of Computer Science, Chongqing University, Chongqing 400044, China
2 School of Automation, Chongqing University, Chongqing 400044, China
3 Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259 Nagatsuta, Midoriku, Yokohama 226-8502, Japan
Correspondence should be addressed to Jingpei Dan; danjingpei@cqu.edu.cn
Received 13 March 2013; Accepted 27 April 2013
Academic Editor: Fuding Xie
Copyright © 2013 Jingpei Dan et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
A time series representation, piecewise trend approximation (PTA), is proposed to improve the efficiency of time series data mining in high-dimensional large databases. PTA represents a time series in a concise form while retaining the main trends of the original series; the dimensionality of the original data is therefore reduced, and the key features are maintained. Different from representations that are based on the original data space, PTA transforms the original data space into the feature space of ratios between any two consecutive data points in the original time series, of which the sign and magnitude indicate the changing direction and degree of the local trend, respectively. Based on this ratio-based feature space, segmentation is performed such that any two conjoint segments have different trends, and the piecewise segments are then approximated by the ratios between the first and last points within the segments. To validate the proposed PTA, it is compared with the classical time series representations PAA and APCA on two classical datasets by applying the commonly used K-NN classification algorithm. PTA outperforms them by 3.55% and 2.33% higher classification accuracy on the ControlChart dataset and by 8.94% and 7.07% on the Mixed-BagShapes dataset, respectively. This indicates that the proposed PTA is effective for high-dimensional time series data mining.
1. Introduction

Time series representation is one of the key issues in time series data mining, since a suitable choice of representation greatly affects the ease and efficiency of time series data mining. To address the high dimensionality of real-world time series data, a great number of time series representations applying dimensionality reduction have been proposed. Dimensionality reduction methods help to compare time series efficiently by modeling them in a more compact form, but significant information about the main trends of a time series, which is essential to effective similarity search, may be lost. To support accurate and fast similarity detection in time series, a number of special requirements that should be satisfied by any representation model are summarized as follows [1].

(i) Time Warping-Awareness. Time series should be modeled in a form that can be naturally mapped to the time domain. This makes it feasible to benefit from dynamic time warping (DTW), which can compare time series with local time shifting and different lengths for similarity detection.

(ii) Low Complexity. Due to the high dimensionality of time series data, modeling a time series should be performed with reasonably low complexity, possibly linear in the series length.

(iii) Sensitivity to Relevant Features. It is clearly desirable that a time series approximation preserve as much information in the original series as possible. For this purpose, the approximation should tailor itself to the local features of the series in order to capture its important trends.

(iv) Absence of Parameters. Most representation models and dimensionality reduction methods require the user to specify some input parameters, for example, the number of coefficients or symbols. However, prior domain knowledge is often unavailable, and sensitivity to input parameters can seriously affect the accuracy of the representation model or dimensionality reduction method.
From an empirical viewpoint, it has recently been observed that there is no absolute winner among time series representations across application domains. Therefore, it is critical for a time series representation to keep the features that are important for the corresponding application domain. Sensitivity to features can be considered according to three main subrequirements for the segments detected in an individual time series: (a) segments may have different lengths, (b) different segments may represent different slopes (trends) of subsequences of data points, and (c) segments capture the series trends [1].
In the literature, slopes [2] and derivative estimation [1] are commonly adopted to denote the trend of a time series. Owing to the behavior of the tangent function used to calculate slopes, it is difficult to distinguish two trends whose angles are close to ±90° when slope is used to represent trend. In the derivative time series segment approximation (DSA) representation [1], the original time series is first transformed into the first-derivative estimates of its points, and segmentation and approximation are based on these derivative estimates. It has been observed that relative variations, the ratios between any two consecutive data points in a given time series, are suitable for representing trend in time series [3]. The magnitude of the ratio reflects the degree of variation of the trend, and the sign of the ratio naturally represents the changing direction of the trend. Based on ratio-based time series, a time series representation, piecewise trend approximation (PTA), is proposed, which concisely retains the important feature of the main trends of the original time series through dimensionality reduction. In contrast to conventional representations based on raw time series data, the proposed PTA representation is based on the local trends of the raw data. That is, the raw data is first transformed into local trends (ratios), segmentation that separates the time series into segments of different trends is then performed on the ratios, and each segment is finally approximated by the ratio between its first and last data points.
PTA is able to satisfy the first three requirements mentioned earlier.

(i) PTA representations can be compared by using DTW directly.

(ii) The ratio-based feature generation allows a time series to be represented by focusing on its characteristic trends.

(iii) The computational complexity of PTA is linear in the length of the series, and the dimensionality of PTA adapts to the identified trends of the series.
To validate the proposed PTA, its performance for time series classification is compared to that of conventional representations. The experiments are based on two classical datasets, applying the K-nearest neighbor (K-NN) classification method. The comparative experimental results show that PTA outperforms conventional representations in classification accuracy.

In Section 2, time series representations based on different dimensionality reduction techniques are reviewed. The PTA representation is proposed in Section 3, and the experiments validating the proposed PTA for time series classification are presented in Section 4.
2. Time Series Representations

To reduce the dimensionality of a time series, either a piecewise discontinuous function or a low-order continuous function is usually applied to approximate it in a compact form. This study focuses on the former, and time series representations based on piecewise discontinuous functions are reviewed as follows.

The piecewise approximation-based representations include the discrete wavelet transform (DWT) [4, 5], swinging door (SD) [6], piecewise linear approximation (PLA) [7, 8], piecewise aggregate approximation (PAA) [9–11], adaptive piecewise constant approximation (APCA) [12], symbolic aggregate approximation (SAX) [13], and derivative time series segment approximation (DSA) [1].
Using DWT, a time series is represented in terms of a finite-length, fast-decaying, oscillating, discretely sampled wave form (the mother wavelet), which is scaled and translated in order to create an orthonormal wavelet basis. Each function in the wavelet basis is associated with a real coefficient; the original series is reconstructed by computing the weighted sum of all the functions in the basis, using the corresponding coefficients as weights. The Haar basis [14] is the most widely used in wavelet transformation. The DWT representation of a time series of length $n$ consists of $n$ wavelet coefficients, and dimensionality reduction is achieved by retaining only the first $p$ coefficients (with $p < n$).
SD is a data compression technique that belongs to the family of piecewise linear trending functions and has been compared to wavelet compression. The SD algorithm employs a heuristic to decide whether a value is to be stored within the segment being grown or is to be the beginning of a new segment. Given a pivot point, which indicates the beginning of a segment, two lines (the "doors") are drawn from it to envelop all the points up to the next one to be considered. The envelope has the form of a triangle according to a parameter that specifies the initial amplitude of the lines; the setting of this parameter affects the data compression level.
In the PLA method, a time series is represented by a piecewise linear function, that is, a set of line segments. Several
methods have been proposed to recognize PLA segments
(e.g., [7, 8]).
PAA transforms a time series of $n$ points into a new one composed of $p$ segments (with $p < n$), each of size $n/p$ and represented by the mean value of the data points falling within the segment.
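Since PAA serves as a baseline in Section 4, a minimal Python sketch may help fix ideas (our own illustration, not code from the paper; for simplicity it assumes that $p$ divides $n$):

def paa(y, p):
    """Piecewise aggregate approximation: split y into p equal-size
    frames and represent each frame by its mean value."""
    size = len(y) // p  # assumes len(y) is a multiple of p
    return [sum(y[k * size:(k + 1) * size]) / size for k in range(p)]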
Like PAA, APCA approximates a time series by a sequence of segments, each represented by the mean value of its data points. A major difference from PAA is that APCA can identify segments of variable length. Moreover, the APCA algorithm is able to produce high-quality approximations of a time series by resorting to solutions adopted in the wavelet domain.
In the SAX method, the dimensionality of the original time series is first reduced by applying PAA; the PAA coefficients are then quantized, and each quantization level is finally represented by a symbol, so that SAX is a symbolic representation of time series.
The DSA representation is based on the derivative version
of the original time series. DSA entails derivative estimation,
segmentation, and segment modeling to map a time series
into a different value domain which allows for maintaining
information on the significant features of the original series
in a dense and concise way.
A time series of $n$ points can be represented in $O(n)$ time by DWT, SD, (the fastest version of) PLA, PAA, SAX, and DSA, whereas the complexity of APCA is $O(n \log n)$.
There are other kinds of time series representations that apply continuous polynomial functions to approximate time series, including singular value decomposition (SVD) [15, 16], the discrete Fourier transform (DFT) [17, 18], splines, nonlinear regression, and Chebyshev polynomials [19, 20]; the reader is referred to the references for details.

In contrast to conventional representations based on raw data, a time series representation based on the ratios between any two consecutive data points in a given time series is proposed in Section 3, which applies piecewise segment approximation to reduce dimensionality.
3. PTA: Piecewise Trend Approximation

Given a time series $Y = \{(y_1, t_1), \ldots, (y_n, t_n)\}$, where $y_i$ is a real numeric value and $t_i$ is the timestamp, it can be represented as a PTA representation

$$T' = \{(R_1, Rt_1), \ldots, (R_m, Rt_m)\}, \quad m \le n,\ n \in \mathbb{N}, \tag{1}$$

where $Rt_i$ is the right end point of the $i$th segment, $R_i$ ($1 < i \le m$) is the ratio between the values at $Rt_{i-1}$ and $Rt_i$ in the $i$th segment, and $R_1$ is the ratio between the first point $(y_1, t_1)$ and the value at $Rt_1$. The length of the $i$th segment can be calculated as $Rt_i - Rt_{i-1}$.
PTA approximates a time series by applying a piecewise discontinuous function to reduce dimensionality. The PTA algorithm consists of three main steps:

(1) local trend transformation: the original time series is transformed into a new series whose data point values are the ratios between any two consecutive data points in the original series;

(2) segmentation: the transformed local trend series is divided into variable-length segments such that any two conjoint segments represent different trends;

(3) segment approximation: each segment is represented by the ratio between the first and last data points within the segment, which indicates the characteristic of its trend.
3.1. Local Trend Transform. Given a time series $Y = \{(y_1, t_1), \ldots, (y_n, t_n)\}$, $n \in \mathbb{N}$, a new series $T = \{(r_2, t_2), \ldots, (r_n, t_n)\}$ is obtained from $Y$ by the local trend transform, where $r_i$ is the ratio between $(y_{i-1}, t_{i-1})$ and $(y_i, t_i)$, $i = 2, \ldots, n$.

The ratios between each two consecutive data points in $Y$ are calculated as follows:

$$r_i = \frac{y_i - y_{i-1}}{y_{i-1}}, \quad i = 2, \ldots, n. \tag{2}$$
$T$ is indeed a feature space of local trends mapped from the original data space, with the dimension reduced by one. Although slope is often used to represent trend in the literature, it is difficult to distinguish two trends whose angles are close to ±90° owing to the behavior of the tangent function used to calculate slopes. Ratio, however, is more suitable for representing trend because the magnitude of the ratio reflects the degree of variation of the trend and its sign naturally represents the changing direction. Although $T$ has one dimension fewer, this is not enough for many real-world applications; hence, $T$ is compressed into a more concise form by the next two steps.
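To make the transform concrete, here is a minimal Python sketch (our own illustration; the function name is an assumption, and the raw values are assumed nonzero, as is typical for the smoothed, positive-valued inputs used in Section 4):

def local_trend_transform(y):
    """Local trend transform of equation (2): r_i = (y_i - y_{i-1}) / y_{i-1}.
    Returns [r_2, ..., r_n] for a raw value list y = [y_1, ..., y_n]."""
    # Assumes every y[i-1] != 0 (e.g., positive-valued, smoothed data).
    return [(y[i] - y[i - 1]) / y[i - 1] for i in range(1, len(y))]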
3.2. Segmentation. Given a time series $Y = \{(y_1, t_1), \ldots, (y_n, t_n)\}$, $n \in \mathbb{N}$, $Y$ is segmented into $S = \{S_1, \ldots, S_m\}$ ($m \le n$, $m \in \mathbb{N}$), where $S_k$ ($k = 1, \ldots, m$) is a subsequence of $Y$ determined by key points at which certain behavior changes occur in $Y$. In PTA, segmentation is based on the local trend series $T$ of the original series $Y$. That is, the sequence $T = \{(r_2, t_2), \ldots, (r_n, t_n)\}$ is divided into the sequence $S = \{S_1, \ldots, S_m\}$ ($m \le n$, $m \in \mathbb{N}$), which is composed of $m$ variable-length segments $S_k = \{(r_{k,1}, t_{k,1}), \ldots, (r_{k,j}, t_{k,j})\}$ ($k = 1, \ldots, m$). Each two consecutive segments represent different trends. Since segmentation in PTA is based on the ratios produced by the local trend transform, whose signs represent trend directions, the main idea is to separate $T$ by finding the first point whose sign differs from those of the previous points. Let $\varepsilon$ denote the threshold on the ratios and $\operatorname{sign}(r_i)$ denote the sign of $r_i$ in $T$; the sequence $S_k$ is identified as a segment if and only if $\operatorname{sign}(r_{k,1}) = \operatorname{sign}(r_{k,2}) = \cdots = \operatorname{sign}(r_{k,j})$, $|r_{k,j}| > \varepsilon$, and $\operatorname{sign}(r_{k,j}) \ne \operatorname{sign}(r_{k+1,1})$, $k = 1, \ldots, m$.

Accordingly, the raw data $Y$ is segmented as $S' = \{S'_1, \ldots, S'_m\}$, $S'_k = \{(y_{k,l}, t_{k,l}), \ldots, (y_{k,j}, t_{k,j})\}$ ($k = 1, \ldots, m$, $m \le n$, $m \in \mathbb{N}$).

This segmentation aggregates data points having the same changing direction, so that the subsequences intuitively represent fluctuations in the raw data. Thus, the reduced dimensionality is adaptive to the trend fluctuations and no parameter is needed.
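A segmentation sketch follows, under one plausible reading of the rule above (the function name and the treatment of sub-threshold ratios as "flat" are our assumptions, not details from the paper):

def segment_by_trend(ratios, eps):
    """Split the ratio series into maximal runs of equal trend sign.
    Returns (start, end) index pairs into `ratios`, inclusive.
    Ratios with |r| <= eps are treated as flat and absorbed into the
    current segment, so only a genuine sign change opens a new segment."""
    segments = []
    start, current = 0, 0            # current = sign of the running trend
    for i, r in enumerate(ratios):
        s = 0 if abs(r) <= eps else (1 if r > 0 else -1)
        if s != 0 and current != 0 and s != current:
            segments.append((start, i - 1))   # close the previous segment
            start = i
        if s != 0:
            current = s
    segments.append((start, len(ratios) - 1))  # close the last segment
    return segments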
3.3. Segment Approximation. To approximate the segments $S' = \{S'_1, \ldots, S'_m\}$, $m \le n$, $m \in \mathbb{N}$, the ratio between the first and last points within each segment is calculated to represent the main trend information of the segment. Finally, the PTA representation $T' = \{(R_1, Rt_1), \ldots, (R_m, Rt_m)\}$, $m \le n$, $n \in \mathbb{N}$, is yielded such that

$$R_k = \frac{y_{k,j} - y_{k,l}}{y_{k,l}}, \qquad Rt_k = t_{k,j}, \quad k = 1, \ldots, m. \tag{3}$$
The PTA representation maintains the important feature of trend variations in a concise form, and its computational complexity is linear in the length $n$ of the sequence, that is, $O(n)$. In addition, since the length of the PTA representation is determined by the fluctuations in the original time series, similarities between PTA representations can be compared by applying dynamic time warping.
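Combining the two previous sketches, the full pipeline of equations (1)-(3) can be assembled as follows (again an illustrative sketch under our assumptions, not the authors' code; it reuses local_trend_transform and segment_by_trend from above and assumes a series of at least two points):

def pta(y, t, eps):
    """Build T' = [(R_1, Rt_1), ..., (R_m, Rt_m)] from raw values y
    and timestamps t, per equations (1)-(3)."""
    ratios = local_trend_transform(y)
    representation = []
    for a, b in segment_by_trend(ratios, eps):
        # ratio index j corresponds to the raw pair (y[j], y[j+1]),
        # so segment (a, b) spans the raw points y[a] .. y[b+1]
        first, last = y[a], y[b + 1]
        representation.append(((last - first) / first,  # R_k, equation (3)
                               t[b + 1]))               # Rt_k: right end point
    return representation

For example, pta([10, 11, 12, 9, 8, 10], [0, 1, 2, 3, 4, 5], eps=0.01) yields three segments (up, down, up), approximately [(0.2, 2), (-0.33, 4), (0.25, 5)].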
3.4. Distance Measure. To compare two time series in similarity search tasks, various distance measures have been introduced. By far the most common distance measure for time series is the Euclidean distance [21, 22]. Given two time series $X$ and $Y$ of the same length $n$, the Euclidean distance between them is defined as

$$D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}. \tag{4}$$

In the PTA representation, the original time series is segmented according to changes of local trend, and the length of the transformed representation is thus adaptive to the trend variations in the original time series. The Euclidean distance is limited to comparing time series of equal length, and thus it cannot be applied directly to time series similarity search on PTA.
To address the limitation of Euclidean distance, dynamic
time warping (DTW) has been proposed to evaluate the similarity of variable-length time series [23]. Unlike Euclidean
distance, DTW allows elastic shifting of a sequence to provide
a better match with another sequence; hence, it can handle
time series with local shifting and different lengths. Therefore,
DTW can be directly applied to measure similarity of time
series in PTA form.
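For reference, a textbook dynamic-programming formulation of DTW [23] (a generic sketch, not code from this paper) that accepts sequences of different lengths, such as the R-value sequences of two PTA representations:

import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two value sequences,
    possibly of different lengths; O(len(a) * len(b)) time."""
    p, q = len(a), len(b)
    dp = [[math.inf] * (q + 1) for _ in range(p + 1)]  # dp[i][j]: a[:i] vs b[:j]
    dp[0][0] = 0.0
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # a[i-1] stretches over b
                                  dp[i][j - 1],      # b[j-1] stretches over a
                                  dp[i - 1][j - 1])  # the two points align
    return dp[p][q]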
4. Experiments on Time Series Classification
Classification of time series has attracted much interest in the data mining community [24–26]. To validate the performance of the proposed PTA representation for similarity search in time series data, we design a classification experiment based on two classical datasets, ControlChart and Mixed-BagShapes [27], applying the most common classification algorithm, K-nearest neighbor (K-NN) classification.
ControlChart is a synthetic dataset of six classes: normal, cyclic, increasing trend, decreasing trend, upward shift, and downward shift. Each class contains 100 instances. Figure 1 shows representative sample instances from each class of the ControlChart dataset.

[Figure 1: Sample instances from each class in the ControlChart dataset: (a) normal; (b) cyclic; (c) increasing; (d) decreasing; (e) upward shift; (f) downward shift.]

Mixed-BagShapes contains time series derived from 160 shapes in nine classes of objects: bone, cup, device, fork, glass, hand, pencil, rabbit, and tool. Sample instances from each class of Mixed-BagShapes are shown in Figure 2.

[Figure 2: Sample instances from each class in the Mixed-BagShapes dataset: (a) bone; (b) cup; (c) device; (d) fork; (e) glass; (f) hand; (g) pencil; (h) rabbit; (i) tool.]
The proposed PTA is compared to two classical representations, PAA and APCA, which were introduced in Section 2. The K-NN classification algorithm is briefly reviewed in Section 4.1, data preprocessing is introduced in Section 4.2, and the experimental results are presented in Section 4.3.
4.1. K-Nearest Neighbor (K-NN) Classification. K-NN is one of the most widely used instance-based learning methods [28]. Given a set of n training examples, upon receiving a new instance to predict, the K-NN classifier identifies the K nearest neighboring training examples of the new instance and then assigns to it the class label held by the greatest number of neighbors [29]. To classify time series data, it is straightforward to investigate the ability of time series representations to support similarity search by applying the K-NN algorithm, since time series can be compared to one another as instances in K-NN.
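A K-NN classifier over PTA representations can then be sketched as follows (our own illustration; it assumes the dtw_distance function from Section 3.4 and a training set given as (sequence, label) pairs):

from collections import Counter

def knn_classify(query, training_set, k=1):
    """Assign to `query` the majority label among its k nearest
    training examples under the DTW distance."""
    neighbors = sorted(training_set,
                       key=lambda example: dtw_distance(query, example[0]))
    labels = [label for _, label in neighbors[:k]]
    return Counter(labels).most_common(1)[0][0]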
4.2. Data Preprocessing. In order to reduce noise in the data, original time series are usually preprocessed by smoothing techniques in time series data mining. Denoising is essential to make the data amenable to further data mining tasks. In PTA, it is necessary to denoise time series data before the local trend transformation so that the main trends are not confounded with noise. Thus, smoothing is applied to the raw data before the local trend transformation in PTA.
Commonly used smoothing techniques are moving average models, including the simple moving average, weighted moving average, and exponential moving average. In our experiments, exponential smoothing is applied to preprocess the original data to reduce noise. Given a time series $X = \{x_t\}$ ($t = 0, \ldots, n$), the output $S = \{s_t\}$ ($t = 0, \ldots, n$) of the exponential smoothing algorithm is defined as

$$s_1 = x_0, \qquad s_t = \alpha x_{t-1} + (1 - \alpha) s_{t-1}, \quad t > 1, \tag{5}$$

where $\alpha$ is the smoothing factor and $0 < \alpha < 1$.
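A direct transcription of (5) into Python (an illustrative sketch; the function name and list-based interface are our own):

def exponential_smoothing(x, alpha):
    """Exponential smoothing per equation (5):
    s_1 = x_0 and s_t = alpha * x_{t-1} + (1 - alpha) * s_{t-1} for t > 1."""
    # 0 < alpha < 1; x is indexed from 0 as in the paper.
    s = [x[0]]
    for t in range(1, len(x)):
        s.append(alpha * x[t - 1] + (1 - alpha) * s[-1])
    return s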
4.3. Experimental Results of Time Series Classification. The most commonly used K-NN algorithm is utilized to facilitate independent confirmation of the proposed PTA representation. Concerning the neighborhood size K in the K-NN algorithm, the simple yet very competitive 1-NN algorithm is adopted in this experiment, that is, K-NN with K equal to 1. The sliding window parameter in the PAA representation and the threshold in PTA need to be predefined. The number of segments for PAA is decided by the sliding window, while those of PTA and APCA adapt to the fluctuations in the original data. To compare the representations fairly, the parameters are tuned over several trials such that the compression levels (i.e., numbers of segments) of the representations are equal or at least very close. Classification accuracy is defined as

$$\text{accuracy} = 1 - E, \tag{6}$$

where $E$ is the error rate.
The comparative results on ControlChart and Mixed-BagShapes using leave-one-out cross-validation are shown in Table 1. The results reported are the best obtained for each representation over trials of different parameters.

Table 1: Comparative results of classification accuracy for ControlChart and Mixed-BagShapes.

                   PAA      APCA     Proposed PTA
ControlChart       0.952    0.964    0.987
Mixed-BagShapes    0.876    0.894    0.962

For ControlChart,
the proposed PTA outperforms PAA and APCA by 3.55%
and 2.33% higher classification accuracy, respectively. For
Mixed-BagShapes, PTA yields 8.94% and 7.07% improvement
in classification accuracy compared with PAA and APCA, respectively. It is shown that PTA outperforms the competitive representations in classification accuracy, which indicates that PTA is effective for time series classification, representing the original data concisely while retaining the important feature of trend variation.
5. Conclusions
In order to improve the efficiency of time series data mining in high-dimensional, large databases, a time series representation, piecewise trend approximation (PTA), is proposed to represent an original time series in a concise form while retaining the important feature of trend variations. Different from representations based on the original data space, PTA transforms the original data space into the feature space of ratios
between any two consecutive data points in the original time series, of which the sign and magnitude indicate the changing direction and degree of the local trend, respectively. Based on this ratio-based feature space, segmentation is performed such that any two conjoint segments have different trends, and the piecewise segments are then approximated by the ratios between the first and last points within the segments; dimensionality is hence reduced while keeping the important feature of the main trends in the original data.
Based on two classical datasets, ControlChart and Mixed-BagShapes, and applying the commonly used time series classification algorithm K-NN, PTA is compared with the classical PAA and APCA representations using the DTW distance measure. The results for ControlChart show that PTA yields 3.55% and 2.33% improvements in classification accuracy compared to PAA and APCA, respectively. For Mixed-BagShapes, PTA outperforms PAA and APCA by 8.94% and 7.07%, respectively. The time complexity of the PTA algorithm is linear in the length of the original time series. The efficiency of time series data mining is hence enhanced by applying the PTA representation. Applications of PTA to time series clustering, indexing, and other similarity search tasks will be validated in future work, and a symbolic time series representation derived from PTA can be further developed.
Acknowledgment

This work was supported by the Fundamental Research Funds for the Central Universities in China under Project no. 0216005202035.
References
[1] F. Gullo, G. Ponti, A. Tagarelli, and S. Greco, “A time series
representation model for accurate and fast similarity detection,”
Pattern Recognition, vol. 42, no. 11, pp. 2998–3014, 2009.
[2] Y. Ding, X. Yang, A. J. Kavs, and J. Li, “A novel piecewise linear segmentation for time series,” in Proceedings of the 2nd International Conference on Computer and Automation Engineering
(ICCAE ’10), pp. 52–55, February 2010.
[3] K. Huarng and T. H. K. Yu, “Ratio-based lengths of intervals to
improve fuzzy time series forecasting,” IEEE Transactions on
Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no.
2, pp. 328–340, 2006.
[4] K. Chan and A. Fu, “Efficient time series matching by wavelets,”
in Proceedings of the International Conference on Data Engineering (ICDE ’99), pp. 126–133, 1999.
[5] Y.-L. Wu, D. Agrawal, and A. El Abbadi, "A comparison of DFT and DWT based similarity search in time-series databases," in Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM '00), pp. 488–495, 2000.
[6] E. H. Bristol, “Swinging door trending: adaptive trending recording,” in Proceedings of the ISA National Conference, pp. 749–
753, 1990.
[7] T. Pavlidis and S. L. Horowitz, “Segmentation of plane curves,”
IEEE Transactions on Computers, vol. 23, no. 8, pp. 860–870,
1974.
[8] E. Keogh and M. Pazzani, “An enhanced representation of time
series which allows fast and accurate classification, clustering
and relevance feedback,” in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining
(KDD ’98), pp. 239–241, 1998.
[9] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality reduction for fast similarity search in large time
series databases,” Knowledge and Information Systems, vol. 3, pp.
263–286, 2001.
[10] E. Keogh and M. Pazzani, “Scaling up dynamic time warping for
data mining applications,” in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining
(KDD ’00), pp. 285–289, 2000.
[11] X. Feng, C. Cheng, L. Changling, and S. Huihe, “An improved
process data compression algorithm,” in Proceedings of the International Conference on Intelligent Control and Automation,
pp. 2190–2193, 2002.
[12] K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani, “Locally
adaptive dimensionality reduction for indexing large time series
databases,” ACM Transactions on Database Systems, vol. 27, no.
2, pp. 188–228, 2002.
[13] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, "A symbolic representation of time series, with implications for streaming algorithms," in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD '03), pp. 2–11, 2003.
[14] C. S. Burrus, R. A. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms, a Primer, Prentice-Hall, Englewood Cliffs, NJ, USA, 1997.
[15] F. Korn, H. V. Jagadish, and C. Faloutsos, “Efficiently supporting
Ad Hoc queries in large datasets of time sequences,” in Proceedings of the ACM SIGMOD International Conference on
Management of Data, pp. 289–300, 1997.
[16] K. V. Kanth, D. Agrawal, and A. Singh, "Dimensionality reduction for similarity searching in dynamic databases," in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 166–176, 1998.
[17] D. Rafiei and A. Mendelzon, “Similarity-based queries for time
series data,” in Proceedings of the ACM SIGMOD International
Conference on Management of Data, pp. 13–25, 1997.
[18] D. Rafiei and A. Mendelzon, “Efficient retrieval of similar time
sequences using DFT,” in Proceedings of the International Conference on Foundations of Data Organization and Algorithms
(FODO ’98), pp. 249–257, 1998.
[19] J. C. Mason and D. C. Handscomb, Chebyshev Polynomials,
Chapman & Hall/CRC, Boca Raton, Fla, USA, 2003.
[20] Y. Cai and R. Ng, “Indexing spatio-temporal trajectories with
Chebyshev polynomials,” in Proceedings of the ACM SIGMOD
International Conference on Management of Data, pp. 599–610,
2004.
[21] G. Reinert, S. Schbath, and M. S. Waterman, “Probabilistic and
statistical properties of words: an overview,” Journal of Computational Biology, vol. 7, no. 1-2, pp. 1–46, 2000.
[22] E. Keogh and S. Kasetty, “On the need for time series data mining benchmarks: a survey and empirical demonstration,” Data
Mining and Knowledge Discovery, vol. 7, no. 4, pp. 349–371, 2003.
[23] D. Berndt and J. Clifford, “Using dynamic time warping to
find patterns in time series,” in Proceedings of the Workshop
on Knowledge Discovery in Databases, at the 12th International
Conference on Artificial Intelligence, pp. 359–370, 1994.
[24] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, 1967.
[25] A. Nanopoulos, R. Alcock, and Y. Manolopoulos, “Featurebased classification of time-series data,” in Information Processing and Technology, pp. 49–61, Nova Science Publishers, Commack, NY, USA, 2001.
[26] C. A. Ratanamahatana and E. Keogh, “Making time-series classification more accurate using learned constraints,” in Proceedings of the 4th SIAM International Conference on Data Mining
(SDM ’04), pp. 11–22, April 2004.
[27] E. Keogh and T. Folias, "The UCR time series data mining archive," 2002, http://www.cs.ucr.edu/~eamonn/time_series_data/.
[28] T. M. Mitchell, Machine Learning, Computer Science Series, McGraw-Hill, New York, NY, USA, 1997.
[29] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, pp. 21–27, 1967.