Discriminant analysis of time series using wavelets

Elizabeth A. Maharaj¹ and Andrés M. Alonso²
¹ Monash University, Australia ann.maharaj@buseco.monash.edu.au
² Universidad Carlos III de Madrid, Spain andres.alonso@uc3m.es
Summary. We consider the use of wavelets for the classification of time series.
In particular, we use time-independent wavelet variances as inputs into diagonal
quadratic discriminant analysis. Simulation studies show that our procedure performs very well, resulting in small training and hold-out sample classification errors.
Applications to real data show that our procedure performs as well as, and in some
cases better than, other classification methods.
Key words: Wavelet variances, diagonal quadratic discriminant analysis
1 Introduction
Discriminant analysis of time series has applications in geology, medicine and many
other areas. [HPP01] regard time series signals as continuous curves and use a functional data-analytic method for dimension reduction. They then apply both nonparametric and Gaussian-based discriminators to the dimension-reduced data. Authors
such as [KST98] and [Shu03] have used spectral analysis to transform time series
to enable the application of linear or quadratic discriminant analysis to these time
series.
In this paper we develop a methodology based on wavelets to apply discriminant
analysis to time series. The methods used by [KST98], which depend on spectral density
estimates over specific frequency bands, are applicable only to stationary time series.
[Shu03] uses methods based on time-varying spectra that are applicable to nonstationary series as well. However, in this case specific bandwidths and window
lengths still have to be selected. [HOD04] use methods based on Fourier-type bases
using local spectral features of the time series. Block lengths for the bases have to be
selected. The use of wavelets overcomes this selection issue because discrimination is
based on all data on all available scales. While a particular wavelet filter will have to
be chosen, we will demonstrate that filters of various lengths from the Daubechies,
Symmlets and Coiflets families produce comparable results.
In Section 2, using the notation of [PW00], we give a brief description of the
wavelet variance and its use in discriminant analysis. The design and results of the
simulation study are discussed and presented in Section 3 with applications given in
Section 4. We conclude in Section 5 and comment on further research.
2 Wavelet Variance and Discriminant Analysis
Wavelets are mathematical tools for analyzing signals and images in one or more
dimensions. In a time series context, the discrete wavelet transform (DWT) or the
maximal overlap discrete wavelet transform (MODWT) decomposes the time series into
filtered series at different time scales. Large time scales give more low-frequency
information about the series, while small time scales give more high-frequency information. Both the DWT and the MODWT can reconstruct a time
series perfectly from its transformed filtered series using an inverse transform. The
time-independent MODWT wavelet variance is defined as
$$\nu_X^2(\tau_j) \equiv \operatorname{var}\{W_{j,t}\},$$
where
$$W_{j,t} \equiv \sum_{l=0}^{L-1} \tilde{h}_{j,l} X_{t-l}, \qquad t = 0, 1, \ldots, N-1,$$
$\tilde{h}_{j,l}$ is the wavelet filter, $\{X_t\}$ is a time series with $N$ data points and $L$ is the length
of the wavelet filter. An unbiased estimator of $\nu_X^2(\tau_j)$ is
$$\hat{\nu}_X^2(\tau_j) \equiv \frac{1}{M_j} \sum_{t=L_j-1}^{N-1} W_{j,t}^2,$$
where $L_j$ is the number of boundary coefficients at level $j$ associated with scale
$\tau_j \equiv 2^{j-1}$ and $M_j = N - L_j + 1$.
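The estimator above can be sketched for the simplest case. The following is a minimal illustration, not the authors' implementation. It assumes the Haar filter, for which the level-j MODWT filter equals 1/2^j on its first 2^(j-1) taps and -1/2^j on the next 2^(j-1) taps, and it takes the level-j filter width to be L_j = (2^j - 1)(L - 1) + 1 as in [PW00], so that L_j = 2^j when L = 2. The function names are hypothetical.

```python
def haar_modwt_variance(x, level):
    """Unbiased Haar MODWT wavelet variance estimate at one level.

    Assumes the Haar filter, so the level-j filter has width 2**j and
    only coefficients unaffected by the series boundary are averaged.
    """
    n = len(x)
    half = 2 ** (level - 1)
    width = 2 * half              # L_j for the Haar filter
    m = n - width + 1             # M_j = N - L_j + 1
    if m <= 0:
        raise ValueError("series too short for this level")

    # Prefix sums give each block sum in constant time.
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)

    total = 0.0
    for t in range(width - 1, n):
        recent = prefix[t + 1] - prefix[t + 1 - half]           # X[t-half+1 .. t]
        older = prefix[t + 1 - half] - prefix[t + 1 - width]    # X[t-width+1 .. t-half]
        w = (recent - older) / width                            # W_{j,t}
        total += w * w
    return total / m


def wavelet_variances(x):
    """Wavelet variances at every level j with 2**j <= len(x)."""
    variances = []
    level = 1
    while 2 ** level <= len(x):
        variances.append(haar_modwt_variance(x, level))
        level += 1
    return variances
```

Under this Haar assumption a series of length 256 yields 8 values and a series of length 1024 yields 10, one feature per scale for the discriminant step.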
Suppose that $W_{j,t}$ is a Gaussian stationary process with mean 0 and spectral
density function $S_j(\cdot)$. If $S_j$ is finitely square integrable and strictly positive almost
everywhere then the estimator $\hat{\nu}_X^2(\tau_j)$ is asymptotically normal with mean $\nu_X^2(\tau_j)$
and large sample variance $2A_j/M_j$ (see [SWP00]) where
$$A_j = \int_{-1/2}^{1/2} S_j^2(f)\,df.$$
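As a worked consequence of this result (an illustration, not a formula stated in the source), the normal approximation gives an approximate 95% confidence interval for the wavelet variance at scale $\tau_j$. In practice $A_j$ is unknown and would have to be replaced by an estimate, which is an assumption of this sketch:

```latex
\[
\hat{\nu}_X^2(\tau_j) \;\pm\; 1.96 \sqrt{\frac{2 A_j}{M_j}}
\]
```

The interval narrows as $M_j$ grows, which is in line with the improvement in classification seen for longer series.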
Hence, given a number of time series, each belonging to one of q groups, the MODWT is
obtained for each series. The MODWT wavelet variances are then estimated at each scale and
used as inputs into a discriminant procedure. Since the wavelet variance estimators
are asymptotically normal, the quadratic discriminant is (asymptotically) optimal.
In our simulations and in the first real data example, we use the diagonal quadratic
discriminant procedure because the small number of time series does not permit the
estimation of the complete variance-covariance matrices.
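A diagonal quadratic discriminant rule of the kind described here can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: each group gets its own mean and its own variance per feature, covariances between features are ignored, and priors are taken as the training proportions (the source does not say which priors were used). The names `fit_dqda` and `predict_dqda` and the variance floor `min_var` are hypothetical.

```python
import math


def fit_dqda(features, labels, min_var=1e-12):
    """Estimate per-group means, per-feature variances and priors."""
    model = {}
    for group in sorted(set(labels)):
        rows = [f for f, y in zip(features, labels) if y == group]
        n = len(rows)
        if n < 2:
            raise ValueError("need at least two series per group")
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / (n - 1), min_var)
            for col, m in zip(columns, means)
        ]
        model[group] = (means, variances, n / len(labels))
    return model


def predict_dqda(model, x):
    """Assign x to the group with the largest diagonal Gaussian score."""
    def score(group):
        means, variances, prior = model[group]
        value = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            value -= 0.5 * (math.log(s2) + (v - m) ** 2 / s2)
        return value

    return max(sorted(model), key=score)
```

With q groups and J scales this requires only 2qJ parameters, which is why it remains usable when there are too few series to estimate full variance-covariance matrices.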
3 Simulation Study
For one study, we simulated 30 time series of lengths 256, 1024 and 2048 from X1 (t),
a white noise process, and X2 (t), an autoregressive process of order 1 (AR(1)) with
φ = -0.1, -0.3, -0.5, -0.7, -0.9, 0.1, 0.3, 0.5, 0.7, 0.9. For series generated from each
of X1 (t) and X2 (t), 20 series are used as the training sample and the remaining 10
form the hold-out sample. For series of length T = 256, 1024 and 2048, wavelet
variances were obtained on 8, 10 and 11 scales respectively (the maximum number of
scales for each series length). Training and hold-out sample misclassification rates
were obtained. The results using a wavelet filter of length eight from the Symmlets
family are given in Tables 1 to 3. The simulations were carried out 1000 times. As
expected, there is a clear improvement in the misclassification rate as series length
increases from 256 to 1024 to 2048. Figure 1 shows examples of patterns of X1 (t) and
X2 (t) with φ = 0.1 and with φ = 0.9, from which it can be clearly seen that the
first two series patterns are similar while the third series pattern differs considerably
from the other two. Our simulation results are consistent with these observations
from the graphs.
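One replication of this design can be sketched as below. This is an illustration only: the source does not state the innovation variance or any burn-in, so unit-variance Gaussian innovations and a burn-in of 200 observations are assumptions, and the function names are hypothetical.

```python
import random


def simulate_ar1(n, phi, rng, burn_in=200):
    """Simulate X_t = phi * X_{t-1} + e_t with standard normal e_t.

    phi = 0 gives white noise. The first burn_in values are discarded
    to reduce the effect of the arbitrary starting value 0.
    """
    x = 0.0
    series = []
    for i in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if i >= burn_in:
            series.append(x)
    return series


def simulate_groups(n, phi, n_series=30, n_train=20, seed=0):
    """Generate both groups and split each into training and hold-out sets."""
    rng = random.Random(seed)
    groups = {
        "white_noise": [simulate_ar1(n, 0.0, rng) for _ in range(n_series)],
        "ar1": [simulate_ar1(n, phi, rng) for _ in range(n_series)],
    }
    train = {name: s[:n_train] for name, s in groups.items()}
    hold_out = {name: s[n_train:] for name, s in groups.items()}
    return train, hold_out
```

Repeating `simulate_groups` with 1000 different seeds for each value of φ and each series length would mirror the replication count described above.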
Similar results were obtained using wavelets of other lengths from the Symmlets
family and wavelets of various lengths from the Daubechies and Coiflets families.
We also ran simulations using the scenarios from [HOD04] and obtained comparable results. We also conducted other simulation studies using series generated from
modulated autoregressive processes and obtained reasonably good results.
Table 1. Misclassification Rates for X1 (t) versus X2 (t): T = 256

φ          -.9   -.7   -.5   -.3   -.1    .1    .3    .5    .7    .9
Training   .00   .00   .00   .01   .14   .14   .01   .00   .00   .00
Hold-out   .00   .00   .00   .04   .33   .32   .04   .00   .00   .00
Table 2. Misclassification Rates for X1 (t) versus X2 (t): T = 1024

φ          -.9   -.7   -.5   -.3   -.1    .1    .3    .5    .7    .9
Training   .00   .00   .00   .00   .04   .03   .00   .00   .00   .00
Hold-out   .00   .00   .00   .00   .11   .12   .00   .00   .00   .00
Table 3. Misclassification Rates for X1 (t) versus X2 (t): T = 2048

φ          -.9   -.7   -.5   -.3   -.1    .1    .3    .5    .7    .9
Training   .00   .00   .00   .00   .01   .01   .00   .00   .00   .00
Hold-out   .00   .00   .00   .00   .04   .04   .01   .00   .00   .00
[Figure: three stacked panels titled "White Noise", "AR(1) phi = 0.1" and "AR(1) phi = 0.9"; vertical axes run from −10 to 10, horizontal axis from 0 to 1000.]
Fig. 1. Simulated Series of length T = 1024
4 Applications
4.1 Earthquake and Explosion Data
We consider the suite of eight earthquakes and eight mining explosions originating in
the Scandinavian peninsula, as well as an unknown event which originated in Novaya Zemlya, Russia. These data were also used by [KST98], [Shu03] and [HOD04].
Figure 2 shows a typical earthquake and explosion from this suite of 16 and the
unknown event. Notice that the patterns of earthquake and explosion are dissimilar,
with both series containing two phases of arrival: the initial P-wave and the later S-wave, which starts about midway through the series. The P-wave and S-wave each contain
1024 observations. The pattern of the unknown event appears to be closer to that
of an explosion than to that of an earthquake. For each series, the ratio of the wavelet variances
of the S-wave and P-wave was obtained at each scale and used as the inputs into
diagonal quadratic discriminant analysis. A justification for using ratios of the wavelet variances
in discriminant analysis is provided in the appendix.
Using a wavelet filter of length eight from the Symmlets family and the maximum
of ten levels, a hold-out-one (leave-one-out) classification procedure was applied to obtain the hold-out classification error rate. All earthquakes were correctly classified, while the third
explosion was classified as an earthquake. The unknown event was classified as an
explosion. This is consistent with results obtained by [KST98], [Shu03] and [HOD04].
Tables 4 and 5 show the variance ratios³ at each wavelet level, from which the reasons
for the misclassification of the third explosion and the classification of the unknown
event as an explosion are apparent. That is, the variance ratios for the third explosion
are more in line with one or more of the earthquakes at some of the levels, while the
variance ratios of the unknown event are more in line with those of the explosions
than those of the earthquakes on most of the levels. The same results are obtained using wavelets
of other lengths from the Symmlets family and wavelets of various lengths from the
Daubechies and Coiflets families.
³ eq: earthquake; ex: explosion; ue: unknown event
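The two steps described in this subsection, forming S-wave to P-wave variance ratios and classifying each event with a rule trained on the remaining events, can be sketched as follows. This is an illustration only. The `fit` and `predict` arguments stand for any classifier; the nearest-class-mean pair shown here is a simple stand-in chosen to keep the example short and is not the diagonal quadratic rule used in the paper. All function names are hypothetical.

```python
def variance_ratios(s_wave_variances, p_wave_variances):
    """Ratio of S-wave to P-wave wavelet variance at each level."""
    return [s / p for s, p in zip(s_wave_variances, p_wave_variances)]


def leave_one_out_errors(features, labels, fit, predict):
    """Indices of series misclassified when each is held out in turn."""
    errors = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, features[i]) != labels[i]:
            errors.append(i)
    return errors


def fit_nearest_mean(features, labels):
    """Stand-in classifier: store the feature mean of each group."""
    model = {}
    for group in sorted(set(labels)):
        rows = [f for f, y in zip(features, labels) if y == group]
        model[group] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_nearest_mean(model, x):
    """Assign x to the group whose mean is closest in squared distance."""
    def distance(group):
        return sum((v - m) ** 2 for v, m in zip(x, model[group]))

    return min(sorted(model), key=distance)
```

The hold-out error rate is then `len(errors) / len(features)`; an unlabeled series such as the unknown event would be classified with a model fitted to all labeled series.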
[Figure: three stacked panels titled "Earthquake", "Explosion" and "Unknown Event"; vertical axes run from −10 to 10, horizontal axis from 0 to 2000.]
Fig. 2. Seismic Recordings of an Earthquake, Explosion and the Unknown Event
Table 4. MODWT Variance Ratio (S-Wave/P-Wave): Earthquakes

Level     eq1      eq2      eq3      eq4      eq5      eq6      eq7      eq8
1        6.76     2.26     2.72     0.75     0.41     6.98     9.45     1.06
2       13.41     4.36    10.23     1.31     0.40     5.72    18.09     2.99
3       39.59    30.31    59.54     6.21     2.88    10.90    50.13    19.80
4       64.87    49.83   246.19    45.17    31.42    33.55    95.66    96.30
5       85.55   112.58    54.47    14.92    23.31    31.06    93.09   107.78
6       11.76    18.14     2.78     0.94     2.24    25.19    59.18     7.60
7        3.20     6.07     2.99     1.07     1.43    29.53    19.26     4.34
8       23.54     5.73    14.63     2.33     5.96     4.82    13.52     0.80
9       37.03     7.80   159.60    15.63     8.78     2.25    20.95     0.36
10      75.23     6.14   652.58    36.28    34.09     1.38    23.92     0.37
Table 5. MODWT Variance Ratio (S-Wave/P-Wave): Explosions and the Unknown Event

Level     ex1     ex2     ex3     ex4     ex5     ex6     ex7     ex8      ue
1        0.96    1.21    1.80    1.29    1.46    2.39    1.51    0.05    1.13
2        2.47    1.64    2.01    1.36    1.37    3.09    1.78    0.18    0.99
3        9.53    1.06    4.47    3.26    1.79    5.46    4.10    1.93    1.53
4        7.39    3.36   16.19    5.91    5.11    8.45    5.89    4.66    1.29
5        1.93   11.42   19.79    7.85    1.08    4.92   10.39    7.12    1.10
6        1.44    1.84    3.88    4.87    1.43    1.60    0.93    2.16    1.13
7        2.36    0.52    3.35    4.71    1.08    1.68    0.77    2.03    0.38
8        1.04    0.20    6.15    2.61    1.85    4.76    0.99    2.53    0.17
9        4.48    0.29    6.37    3.68    2.56    0.58    0.62    7.28    0.05
10       5.32    0.17   18.31    0.69    2.24    0.34    0.52    8.65    0.05
4.2 Control Chart Data
This dataset contains 600 control charts synthetically generated by the process in
[AM99]. There are six different classes of control charts (see [PC98]):⁴
1. Normal patterns (N): Xt = µ + σYt ,
2. Cyclic patterns (C): Xt = µ + σYt + a sin(2πt/T ),
3. Increasing trend patterns (IT): Xt = µ + σYt + gt,
4. Decreasing trend patterns (DT): Xt = µ + σYt − gt,
5. Upward shift patterns (US): Xt = µ + σYt + ks,
6. Downward shift patterns (DS): Xt = µ + σYt − ks,
where µ is the mean of the process (set to 80), σ is the standard deviation of the
process (set to 5), Yt is standard Gaussian white noise, a is the amplitude of the cyclic
variation (set in the range 0 < a < 15), T is the period of a cycle (set between 4
and 12), g is the gradient of the trend (set in the range 0.2 < g < 0.5), k is the
shift position function (k = 0 before the shift and k = 1 at the shift and thereafter)
and s is the magnitude of the shift (set between 7.5 and 20). These control chart
time series were sampled from t = 0 to t = 59. Two zeros were padded at each of
the beginning and the end of each time series, giving 64 = 2^6 observations, so that wavelet coefficients could be
obtained on the maximum of six scales.
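The six pattern equations and the zero padding can be sketched as a small generator. This is an illustration, not the process used to build the dataset in [AM99]: parameters are drawn uniformly from the stated ranges, and the time of the shift, which the text does not specify, is assumed to fall in the middle third of the series. The function names are hypothetical.

```python
import math
import random

PATTERNS = ("N", "C", "IT", "DT", "US", "DS")


def control_chart(kind, rng, n=60, mu=80.0, sigma=5.0):
    """Generate one control chart series of the requested pattern."""
    if kind not in PATTERNS:
        raise ValueError("unknown pattern: " + kind)
    a = rng.uniform(0.0, 15.0)          # amplitude of the cycle
    period = rng.uniform(4.0, 12.0)     # period T of the cycle
    g = rng.uniform(0.2, 0.5)           # gradient of the trend
    s = rng.uniform(7.5, 20.0)          # magnitude of the shift
    shift_time = rng.randint(n // 3, 2 * n // 3)  # assumed, not from the source

    series = []
    for t in range(n):
        x = mu + sigma * rng.gauss(0.0, 1.0)
        k = 1 if t >= shift_time else 0
        if kind == "C":
            x += a * math.sin(2.0 * math.pi * t / period)
        elif kind == "IT":
            x += g * t
        elif kind == "DT":
            x -= g * t
        elif kind == "US":
            x += k * s
        elif kind == "DS":
            x -= k * s
        series.append(x)
    return series


def pad_zeros(series, k=2):
    """Pad k zeros at each end, e.g. 60 observations become 64."""
    return [0.0] * k + list(series) + [0.0] * k
```

Setting `sigma=0.0` removes the noise term, which makes the deterministic component of each pattern easy to inspect.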
[PC98] use self-organizing neural networks to discriminate among the different
patterns. They present the results for ten different networks. Their training and
hold-out classification rates range between 62.6% and 95.4% and between 62.1%
and 95.1%, respectively. Table 6 shows our classification results for a hold-out-one
procedure. Here, we used the Haar wavelet filter and the wavelet variances were
obtained on six scales. The training and hold-out misclassification rates were 2.6%
and 3.2%, respectively, which improves on the best network results in [PC98]. Moreover, the main misclassifications are easily explained: they are IT classified as US, DT classified as DS
and vice versa. The results using other wavelets from the Daubechies, Coiflets and
Symmlets families were similar.
Table 6. Classification results for a hold-out-one procedure in the control charts dataset

Predicted              Real patterns
patterns      N     C    IT    DT    US    DS
N           100     0     0     0     0     1
C             0   100     0     0     0     0
IT            0     0    95     0     5     0
DT            0     0     0    96     0     4
US            0     0     5     0    95     0
DS            0     0     0     4     0    95

⁴ This dataset is available from the UCI KDD Archive, see [HB99].
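The hold-out rate of 3.2% quoted in the text can be checked directly from the counts in Table 6. The short sketch below does this arithmetic; the matrix is copied from the table with rows as predicted patterns and columns as real patterns, and the function names are hypothetical.

```python
LABELS = ["N", "C", "IT", "DT", "US", "DS"]

# Rows: predicted pattern; columns: real pattern (counts from Table 6).
CONFUSION = [
    [100, 0, 0, 0, 0, 1],
    [0, 100, 0, 0, 0, 0],
    [0, 0, 95, 0, 5, 0],
    [0, 0, 0, 96, 0, 4],
    [0, 0, 5, 0, 95, 0],
    [0, 0, 0, 4, 0, 95],
]


def misclassification_rate(matrix):
    """Share of all series that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


def per_class_accuracy(matrix, labels):
    """Share of each real pattern (column) that was predicted correctly."""
    accuracy = {}
    for j, label in enumerate(labels):
        column_total = sum(row[j] for row in matrix)
        accuracy[label] = matrix[j][j] / column_total
    return accuracy
```

There are 19 off-diagonal counts among 600 series, so `misclassification_rate(CONFUSION)` is 19/600, which rounds to 3.2%.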
5 Concluding Remarks
In this article we proposed a method using wavelet variances for discrimination of
time series. The time series can be either stationary or nonstationary. Our simulation
study as well as our applications show that the method does very well in discriminating between time series patterns. An extension of this method using time dependent
wavelet variances is currently being undertaken and will appear in [MA06].
Acknowledgements
The first author acknowledges the support of a grant from the Faculty of Business
and Economics, Monash University (Australia) and the second author acknowledges
the support of the grant MTM2004-00098 from CICYT (Spain) as well as a Juan
de La Cierva grant.
References
[AM99] Alcock, R.J., Manolopoulos, Y.: Time-Series Similarity Queries Employing a Feature-Based Approach. 7th Hellenic Conference on Informatics.
Ioannina, Greece (1999)
[HPP01] Hall, P., Poskitt, D.S., Presnell, B.: A functional data-analytic approach
to signal discrimination. Technometrics 43, 1-9 (2001)
[HB99] Hettich, S., Bay, S.D.: The UCI KDD Archive [http://kdd.ics.uci.edu].
Irvine, CA: University of California, Department of Information and Computer Science (1999)
[HOD04] Huang, H., Ombao, H., Stoffer, D.S.: Discrimination and Classification
of Nonstationary Time Series using the SLEX Model. J. Amer. Statist.
Assoc., 99, 763-774 (2004)
[KST98] Kakizawa, Y., Shumway, R.H., Taniguchi, M.: Discrimination and Clustering for Multivariate Time Series. J. Amer. Statist. Assoc., 93, 328-340
(1998)
[MA06] Maharaj, E.A., Alonso, A.M.: Discrimination of Locally Stationary Time
Series using Wavelets. Preprint, (2006)
[PC98] Pham, D.T., Chan, A.B.: Control Chart Pattern Recognition using a New
Type of Self Organizing Neural Network. Proc. Instn. Mech. Engrs., 212,
115-127 (1998)
[PW00] Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis.
Cambridge University Press, Cambridge (2000)
[SWP00] Serroukh, A., Walden, A.T., Percival, D.B.: Statistical Properties and Uses
of the Wavelet Variance Estimator for the Scale Analysis of Time Series.
J. Amer. Statist. Assoc., 95, 184-196 (2000)
[Shu03] Shumway, R.H.: Time-frequency Clustering and Discriminant Analysis.
Statist. Probab. Lett., 63, 307-314 (2003)
Appendix
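Justification for using the ratios of variances in discriminant analysis is based on
their asymptotic normality (see [SWP00]). Let $\hat{\nu}_{X_1}^2(\tau_j)$ and $\hat{\nu}_{X_2}^2(\tau_j)$ be estimators
of wavelet variances $\nu_{X_1}^2(\tau_j)$ and $\nu_{X_2}^2(\tau_j)$, respectively. Let $n$ be the length of each
wavelet series. Hence
$$\hat{\nu}^2(\tau_j) = [\hat{\nu}_{X_1}^2(\tau_j) \;\; \hat{\nu}_{X_2}^2(\tau_j)]^\top$$
is an estimator of
$$\nu^2(\tau_j) = [\nu_{X_1}^2(\tau_j) \;\; \nu_{X_2}^2(\tau_j)]^\top$$
which satisfies
$$\sqrt{n}\,(\hat{\nu}^2(\tau_j) - \nu^2(\tau_j)) \xrightarrow{d} N(0, \Sigma).$$
Let
$$f(\hat{\nu}^2(\tau_j)) = f(\hat{\nu}_{X_1}^2(\tau_j), \hat{\nu}_{X_2}^2(\tau_j)) = \frac{\hat{\nu}_{X_1}^2(\tau_j)}{\hat{\nu}_{X_2}^2(\tau_j)},$$
then, using the Cramér delta method, we obtain
$$\sqrt{n}\,(f(\hat{\nu}^2(\tau_j)) - f(\nu^2(\tau_j))) \xrightarrow{d} N\big(0, \nabla f(\nu^2(\tau_j))^\top \, \Sigma \, \nabla f(\nu^2(\tau_j))\big),$$
where
$$\nabla f(\nu^2(\tau_j)) = \left[ \frac{\partial}{\partial \nu_{X_1}^2(\tau_j)} f(\nu^2(\tau_j)), \;\; \frac{\partial}{\partial \nu_{X_2}^2(\tau_j)} f(\nu^2(\tau_j)) \right]^\top.$$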
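For the ratio, the gradient and the resulting asymptotic variance can be written out explicitly. The following worked step is an illustration rather than part of the original derivation; the entries $\sigma_{11}$, $\sigma_{12}$, $\sigma_{22}$ of $\Sigma$ and the shorthand $a$, $b$ are notation assumed here.

```latex
\[
a = \nu_{X_1}^2(\tau_j), \qquad b = \nu_{X_2}^2(\tau_j), \qquad
f = \frac{a}{b}, \qquad
\nabla f = \left[ \frac{1}{b}, \; -\frac{a}{b^2} \right]^\top,
\]
\[
\nabla f^\top \, \Sigma \, \nabla f
= \frac{\sigma_{11}}{b^2} - \frac{2 a \, \sigma_{12}}{b^3} + \frac{a^2 \sigma_{22}}{b^4},
\qquad
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}.
\]
```

The ratio is therefore asymptotically normal whenever $b > 0$, which is what the use of variance ratios as inputs to the quadratic discriminant relies on.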