Discriminant analysis of time series using wavelets

Elizabeth A. Maharaj (1) and Andrés M. Alonso (2)

(1) Monash University, Australia, ann.maharaj@buseco.monash.edu.au
(2) Universidad Carlos III de Madrid, Spain, andres.alonso@uc3m.es

Summary. We consider the use of wavelets for the classification of time series. In particular, we use time independent wavelet variances as inputs into diagonal quadratic discriminant analysis. Simulation studies show that our procedure performs very well, resulting in small training and hold-out sample classification errors. Applications to real data show that our procedure performs as well as, and in some cases better than, other classification methods.

Key words: Wavelet variances, diagonal quadratic discriminant analysis

1 Introduction

Discriminant analysis of time series has applications in geology, medicine and many other areas. [HPP01] regard time series signals as continuous curves and use a functional data-analytic method for dimension reduction. They then apply both nonparametric and Gaussian-based discriminators to the dimension-reduced data. Authors such as [KST98] and [Shu03] have used spectral analysis to transform time series so that linear or quadratic discriminant analysis can be applied to them.

In this paper we develop a methodology based on wavelets to apply discriminant analysis to time series. The methods used by [KST98], which depend on spectral density estimates over specific frequency bands, are applicable only to stationary time series. [Shu03] uses methods based on time varying spectra that are applicable to nonstationary series as well. However, in this case specific bandwidths and window lengths still have to be selected. [HOD04] use methods based on Fourier-type bases using local spectral features of the time series; here block lengths for the bases have to be selected. The use of wavelets overcomes this selection issue because discrimination is based on all data on all available scales.
While a particular wavelet filter has to be chosen, we will demonstrate that filters of various lengths from the Daubechies, Symmlets and Coiflets families produce comparable results.

In Section 2, using the notation of [PW00], we give a brief description of the wavelet variance and its use in discriminant analysis. The design and results of the simulation study are presented in Section 3, with applications given in Section 4. We conclude in Section 5 and comment on further research.

2 Wavelet Variance and Discriminant Analysis

Wavelets are mathematical tools for analyzing signals and images in one or more dimensions. In a time series context, the discrete wavelet transform (DWT) or the maximal overlap discrete wavelet transform (MODWT) decomposes the time series into filtered series at different time scales. Large time scales give more low frequency information about the series, while small time scales give more high frequency information. Both the DWT and the MODWT can reconstruct a time series perfectly from its transformed filtered series using an inverse transform.

The time independent MODWT wavelet variance is defined as

$$\nu_X^2(\tau_j) \equiv \operatorname{var}\{W_{j,t}\},$$

where

$$W_{j,t} \equiv \sum_{l=0}^{L_j-1} \tilde{h}_{j,l} X_{t-l}, \qquad t = 0, 1, \ldots, N-1,$$

$\tilde{h}_{j,l}$ is the level $j$ wavelet filter, $\{X_t\}$ is a time series with $N$ data points, $L$ is the length of the (level 1) wavelet filter and $L_j \equiv (2^j - 1)(L - 1) + 1$ is the width of the level $j$ filter. An unbiased estimator of $\nu_X^2(\tau_j)$ is

$$\hat{\nu}_X^2(\tau_j) \equiv \frac{1}{M_j} \sum_{t=L_j-1}^{N-1} W_{j,t}^2,$$

where level $j$ is associated with scale $\tau_j \equiv 2^{j-1}$, the first $L_j - 1$ coefficients are boundary coefficients and are excluded, and $M_j = N - L_j + 1$.

Suppose that $W_{j,t}$ is a Gaussian stationary process with mean 0 and spectral density function $S_j(\cdot)$. If $S_j$ is square integrable and strictly positive almost everywhere, then the estimator $\hat{\nu}_X^2(\tau_j)$ is asymptotically normal with mean $\nu_X^2(\tau_j)$ and large sample variance $2A_j/M_j$ (see [SWP00]), where

$$A_j = \int_{-1/2}^{1/2} S_j^2(f)\,df.$$
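The estimator above can be sketched in a few lines for the simplest case. The following is a minimal illustration, not the authors' implementation: it assumes the Haar filter ($L = 2$, so the level $j$ filter width is $L_j = 2^j$, following [PW00]), circular filtering via the pyramid algorithm, and a hypothetical function name `haar_modwt_wavelet_variance`.

```python
import numpy as np


def haar_modwt_wavelet_variance(x, n_levels):
    """Unbiased Haar MODWT wavelet variance estimates for levels 1..n_levels.

    Illustrative sketch: Haar filter only, circular (periodic) filtering.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    v = x.copy()  # level-0 scaling coefficients are the series itself
    variances = np.empty(n_levels)
    for j in range(1, n_levels + 1):
        l_j = 2 ** j        # width of the level-j Haar filter
        m_j = n - l_j + 1   # number of coefficients free of boundary effects
        if m_j < 1:
            raise ValueError(f"series too short for level {j}")
        # np.roll(v, s)[t] equals v[(t - s) mod n], the lagged scaling series
        v_lag = np.roll(v, 2 ** (j - 1))
        w = 0.5 * (v - v_lag)  # level-j wavelet coefficients
        v = 0.5 * (v + v_lag)  # level-j scaling coefficients
        # drop the first l_j - 1 boundary coefficients, average the squares
        variances[j - 1] = np.sum(w[l_j - 1:] ** 2) / m_j
    return variances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    white_noise = rng.standard_normal(1024)
    print(haar_modwt_wavelet_variance(white_noise, n_levels=6))
```

For unit-variance white noise the Haar wavelet variance at level $j$ is $1/2^j$, so the printed estimates should roughly halve from one level to the next. Longer filters such as the Symmlets used in this paper would need their own filter coefficients and the corresponding $L_j$.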
Given a number of time series, each belonging to one of $q$ groups, the MODWT is obtained for each series. The MODWT wavelet variance is then determined at each scale and these variances are used as inputs into a discriminant procedure. Since the wavelet variance estimators are asymptotically normal, the quadratic discriminant is (asymptotically) optimal. In our simulations and in the first real data example, we use the diagonal quadratic discriminant procedure because the small number of time series does not permit the estimation of the complete variance-covariance matrices.

3 Simulation Study

For one study, we simulated 30 time series from each of $X_1(t)$, a white noise process, and $X_2(t)$, an autoregressive process of order 1 (AR(1)) with $\phi$ = -0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9, for each of the series lengths 256, 1024 and 2048. Of the series generated from each of $X_1(t)$ and $X_2(t)$, 20 were used as the training sample and the remaining 10 formed the hold-out sample. For series of length $T$ = 256, 1024 and 2048, wavelet variances were obtained on 8, 10 and 11 scales respectively (the maximum number of scales for each series length). Training and hold-out sample misclassification rates were obtained. The results using a wavelet filter of length eight from the Symmlets family are given in Tables 1 to 3. The simulations were carried out 1000 times.

As expected, the misclassification rate clearly improves as the series length increases from 256 to 1024 to 2048. Figure 1 shows examples of patterns of $X_1(t)$ and of $X_2(t)$ with $\phi = 0.1$ and with $\phi = 0.9$. The first two series patterns are similar, while the third differs considerably from the other two. Our simulation results are consistent with these observations: misclassification is concentrated at $\phi = \pm 0.1$, where the AR(1) process is closest to white noise.
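The diagonal quadratic discriminant rule applied to the wavelet variances can be sketched as follows. This is a hedged illustration rather than the authors' code: the class name `DiagonalQDA` and its attribute names are hypothetical, each row of `X` is assumed to hold the wavelet variances of one series (one column per scale), and prior probabilities are assumed to be estimated from the training group proportions.

```python
import numpy as np


class DiagonalQDA:
    """Quadratic discriminant rule with group-specific diagonal covariances."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # per-group mean and variance of every feature (off-diagonals ignored)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array(
            [X[y == c].var(axis=0, ddof=1) for c in self.classes_]
        )
        # assumption: priors proportional to training group sizes
        self.log_priors_ = np.log(
            np.array([np.mean(y == c) for c in self.classes_])
        )
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (n_samples, n_groups, n_features)
        diff = X[:, None, :] - self.means_[None, :, :]
        # Gaussian log-density with diagonal covariance, up to a constant;
        # a zero training variance would make this score undefined
        scores = (
            self.log_priors_
            - 0.5 * np.sum(np.log(self.vars_), axis=1)
            - 0.5 * np.sum(diff ** 2 / self.vars_, axis=2)
        )
        return self.classes_[np.argmax(scores, axis=1)]


if __name__ == "__main__":
    # toy features standing in for wavelet variances on two scales
    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0],
                        [10.0, 10.0], [11.0, 11.0], [10.0, 11.0], [11.0, 10.0]])
    y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    clf = DiagonalQDA().fit(X_train, y_train)
    print(clf.predict(np.array([[0.5, 0.5], [10.5, 10.5]])))
```

Restricting each group covariance to its diagonal requires only one variance per scale and per group, which is why the rule remains usable with 20 training series per group and up to 11 scales.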
Similar results were obtained using wavelets of other lengths from the Symmlets family and wavelets of various lengths from the Daubechies and Coiflets families. We also ran simulations using the scenarios from [HOD04] and obtained comparable results. Further simulation studies using series generated from modulated autoregressive processes gave reasonably good results.

Table 1. Misclassification Rates for $X_1(t)$ versus $X_2(t)$: $T$ = 256

φ         -.9  -.7  -.5  -.3  -.1   .1   .3   .5   .7   .9
Training  .00  .00  .00  .01  .14  .14  .01  .00  .00  .00
Hold-out  .00  .00  .00  .04  .33  .32  .04  .00  .00  .00

Table 2. Misclassification Rates for $X_1(t)$ versus $X_2(t)$: $T$ = 1024

φ         -.9  -.7  -.5  -.3  -.1   .1   .3   .5   .7   .9
Training  .00  .00  .00  .00  .04  .03  .00  .00  .00  .00
Hold-out  .00  .00  .00  .00  .11  .12  .00  .00  .00  .00

Table 3. Misclassification Rates for $X_1(t)$ versus $X_2(t)$: $T$ = 2048

φ         -.9  -.7  -.5  -.3  -.1   .1   .3   .5   .7   .9
Training  .00  .00  .00  .00  .01  .01  .00  .00  .00  .00
Hold-out  .00  .00  .00  .00  .04  .04  .01  .00  .00  .00

[Fig. 1. Simulated Series of length $T$ = 1024. Panels: White Noise; AR(1) with φ = 0.1; AR(1) with φ = 0.9.]

4 Applications

4.1 Earthquake and Explosion Data

We consider the suite of eight earthquakes and eight mining explosions originating in the Scandinavian peninsula, as well as an unknown event which originated in Novaya Zemlya, Russia. These data were also used by [KST98], [Shu03] and [HOD04]. Figure 2 shows a typical earthquake and a typical explosion from this suite of 16, together with the unknown event. Notice that the patterns of the earthquake and the explosion are dissimilar, with both series containing two phases of arrival: the initial P-wave and the later S-wave, which starts about midway through the series. The P-wave and the S-wave each contain 1024 observations. The pattern of the unknown event appears to be closer to that of an explosion than to that of an earthquake.
For each series, the ratio of the wavelet variances of the S-wave and the P-wave was obtained at each scale and used as the inputs into diagonal quadratic discriminant analysis. A justification for using ratios of the wavelet variances in discriminant analysis is provided in the appendix. Using a wavelet filter of length eight from the Symmlets family and the maximum of ten levels, a hold-out-one classification procedure was applied to obtain the hold-out classification error rate. All earthquakes were correctly classified, while the third explosion was classified as an earthquake. The unknown event was classified as an explosion. This is consistent with the results obtained by [KST98], [Shu03] and [HOD04].

Tables 4 and 5 show the variance ratios at each wavelet level (eq: earthquake; ex: explosion; ue: unknown event), from which the reasons for the misclassification of the third explosion and for the classification of the unknown event as an explosion are apparent. The variance ratios for the third explosion are more in line with those of one or more of the earthquakes at some of the levels, while the variance ratios of the unknown event are more in line with those of the explosions than with those of the earthquakes on most of the levels. The same results are obtained using wavelets of other lengths from the Symmlets family and wavelets of various lengths from the Daubechies and Coiflets families.

[Fig. 2. Seismic Recordings of an Earthquake, Explosion and the Unknown Event. Panels: Earthquake; Explosion; Unknown Event.]

Table 4.
MODWT Variance Ratio (S-Wave/P-Wave): Earthquakes

Level  eq1    eq2     eq3     eq4    eq5    eq6    eq7    eq8
1      6.76   2.26    2.72    0.75   0.41   6.98   9.45   1.06
2      13.41  4.36    10.23   1.31   0.40   5.72   18.09  2.99
3      39.59  30.31   59.54   6.21   2.88   10.90  50.13  19.80
4      64.87  49.83   246.19  45.17  31.42  33.55  95.66  96.30
5      85.55  112.58  54.47   14.92  23.31  31.06  93.09  107.78
6      11.76  18.14   2.78    0.94   2.24   25.19  59.18  7.60
7      3.20   6.07    2.99    1.07   1.43   29.53  19.26  4.34
8      23.54  5.73    14.63   2.33   5.96   4.82   13.52  0.80
9      37.03  7.80    159.60  15.63  8.78   2.25   20.95  0.36
10     75.23  6.14    652.58  36.28  34.09  1.38   23.92  0.37

Table 5. MODWT Variance Ratio (S-Wave/P-Wave): Explosions and the Unknown Event

Level  ex1   ex2    ex3    ex4   ex5   ex6   ex7    ex8   ue
1      0.96  1.21   1.80   1.29  1.46  2.39  1.51   0.05  1.13
2      2.47  1.64   2.01   1.36  1.37  3.09  1.78   0.18  0.99
3      9.53  1.06   4.47   3.26  1.79  5.46  4.10   1.93  1.53
4      7.39  3.36   16.19  5.91  5.11  8.45  5.89   4.66  1.29
5      1.93  11.42  19.79  7.85  1.08  4.92  10.39  7.12  1.10
6      1.44  1.84   3.88   4.87  1.43  1.60  0.93   2.16  1.13
7      2.36  0.52   3.35   4.71  1.08  1.68  0.77   2.03  0.38
8      1.04  0.20   6.15   2.61  1.85  4.76  0.99   2.53  0.17
9      4.48  0.29   6.37   3.68  2.56  0.58  0.62   7.28  0.05
10     5.32  0.17   18.31  0.69  2.24  0.34  0.52   8.65  0.05

4.2 Control Chart Data

This dataset contains 600 control charts synthetically generated by the process in [AM99]. There are six different classes of control charts (see [PC98]):
1. Normal patterns (N): $X_t = \mu + \sigma Y_t$,
2. Cyclic patterns (C): $X_t = \mu + \sigma Y_t + a \sin(2\pi t/T)$,
3. Increasing trend patterns (IT): $X_t = \mu + \sigma Y_t + g t$,
4. Decreasing trend patterns (DT): $X_t = \mu + \sigma Y_t - g t$,
5. Upward shift patterns (US): $X_t = \mu + \sigma Y_t + k s$,
6. Downward shift patterns (DS): $X_t = \mu + \sigma Y_t - k s$,

where $\mu$ is the mean of the process (set to 80), $\sigma$ is the standard deviation of the process (set to 5), $Y_t$ is a standard Gaussian white noise, $a$ is the amplitude of the cyclic variation (set in the range $0 < a < 15$), $T$ is the period of a cycle (set between 4 and 12), $g$ is the gradient of the trend (set in the range $0.2 < g < 0.5$), $k$ is the shift position function ($k = 0$ before the shift and $k = 1$ at the shift and thereafter) and $s$ is the magnitude of the shift (set between 7.5 and 20). Each control chart time series was sampled at $t = 0, 1, \ldots, 59$. Two zeros were padded at the beginning and two at the end of each time series, giving a length of $64 = 2^6$, so that wavelet coefficients could be obtained on the maximum of six scales.

[PC98] use self-organizing neural networks to discriminate among the different patterns. They present the results for ten different networks. Their training and hold-out classification rates range between 62.6% and 95.4% and between 62.1% and 95.1%, respectively.

Table 6 shows our classification results for a hold-out-one procedure. Here, we used the Haar wavelet filter and the wavelet variances were obtained on six scales. The training and hold-out misclassification rates were 2.6% and 3.2%, improving on the best network results in [PC98]. Moreover, the main misclassifications are between patterns with the same direction of change: IT classified as US, DT classified as DS, and vice versa. The results using other wavelets from the Daubechies, Coiflets and Symmlets families were similar.

Table 6.
Classification results for a hold-out-one procedure in the control charts dataset (rows: real patterns; columns: predicted patterns)

Real   N    C    IT   DT   US   DS
N      100  0    0    0    0    0
C      0    100  0    0    0    0
IT     0    0    95   0    5    0
DT     0    0    0    96   0    4
US     0    0    5    0    95   0
DS     1    0    0    4    0    95

Note: the control chart dataset is available from the UCI KDD Archive, see [HB99].

5 Concluding Remarks

In this article we proposed a method that uses wavelet variances for the discrimination of time series. The time series can be either stationary or nonstationary. Our simulation study as well as our applications show that the method does very well in discriminating between time series patterns. An extension of this method using time dependent wavelet variances is currently being undertaken and will appear in [MA06].

Acknowledgements

The first author acknowledges the support of a grant from the Faculty of Business and Economics, Monash University (Australia), and the second author acknowledges the support of the grant MTM2004-00098 from CICYT (Spain) as well as a Juan de la Cierva grant.

References

[AM99] Alcock, R.J., Manolopoulos, Y.: Time-Series Similarity Queries Employing a Feature-Based Approach. 7th Hellenic Conference on Informatics. Ioannina, Greece (1999)
[HPP01] Hall, P., Poskitt, D.S., Presnell, B.: A functional data-analytic approach to signal discrimination. Technometrics, 43, 1-9 (2001)
[HB99] Hettich, S., Bay, S.D.: The UCI KDD Archive [http://kdd.ics.uci.edu]. Irvine, CA: University of California, Department of Information and Computer Science (1999)
[HOD04] Huang, H., Ombao, H., Stoffer, D.S.: Discrimination and Classification of Nonstationary Time Series using the SLEX Model. J. Amer. Statist. Assoc., 99, 763-774 (2004)
[KST98] Kakizawa, Y., Shumway, R.H., Taniguchi, M.: Discrimination and Clustering for Multivariate Time Series. J. Amer. Statist. Assoc., 93, 328-340 (1998)
[MA06] Maharaj, E.A., Alonso, A.M.: Discrimination of Locally Stationary Time Series using Wavelets.
Preprint (2006)
[PC98] Pham, D.T., Chan, A.B.: Control Chart Pattern Recognition using a New Type of Self Organizing Neural Network. Proc. Instn. Mech. Engrs., 212, 115-127 (1998)
[PW00] Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis. Cambridge University Press, Cambridge (2000)
[SWP00] Serroukh, A., Walden, A.T., Percival, D.B.: Statistical Properties and Uses of the Wavelet Variance Estimator for the Scale Analysis of Time Series. J. Amer. Statist. Assoc., 95, 184-196 (2000)
[Shu03] Shumway, R.H.: Time-frequency Clustering and Discriminant Analysis. Statist. Probab. Lett., 63, 307-314 (2003)

Appendix

The justification for using ratios of wavelet variances in discriminant analysis is based on their asymptotic normality (see [SWP00]). Let $\hat{\nu}^2_{X_1}(\tau_j)$ and $\hat{\nu}^2_{X_2}(\tau_j)$ be estimators of the wavelet variances $\nu^2_{X_1}(\tau_j)$ and $\nu^2_{X_2}(\tau_j)$, respectively. Let $n$ be the length of each wavelet series. Then

$$\hat{\nu}^2(\tau_j) = [\hat{\nu}^2_{X_1}(\tau_j),\ \hat{\nu}^2_{X_2}(\tau_j)]^{\top}$$

is an estimator of

$$\nu^2(\tau_j) = [\nu^2_{X_1}(\tau_j),\ \nu^2_{X_2}(\tau_j)]^{\top}$$

which satisfies

$$\sqrt{n}\,(\hat{\nu}^2(\tau_j) - \nu^2(\tau_j)) \xrightarrow{d} N(0, \Sigma).$$

Let

$$f(\hat{\nu}^2(\tau_j)) = f(\hat{\nu}^2_{X_1}(\tau_j), \hat{\nu}^2_{X_2}(\tau_j)) = \frac{\hat{\nu}^2_{X_1}(\tau_j)}{\hat{\nu}^2_{X_2}(\tau_j)};$$

then, using the Cramér delta method, we obtain

$$\sqrt{n}\,(f(\hat{\nu}^2(\tau_j)) - f(\nu^2(\tau_j))) \xrightarrow{d} N\big(0,\ \nabla f(\nu^2(\tau_j))^{\top}\, \Sigma\, \nabla f(\nu^2(\tau_j))\big),$$

where

$$\nabla f(\nu^2(\tau_j)) = \left[ \frac{\partial}{\partial \nu^2_{X_1}(\tau_j)} f(\nu^2(\tau_j)),\ \frac{\partial}{\partial \nu^2_{X_2}(\tau_j)} f(\nu^2(\tau_j)) \right]^{\top}.$$
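As a worked step, the gradient of the ratio can be evaluated explicitly, which gives the limiting variance in closed form. In the sketch below the argument $\tau_j$ is suppressed, and $\sigma_{11}$, $\sigma_{12}$ and $\sigma_{22}$ denote the entries of $\Sigma$; this entry notation is introduced here for illustration and does not appear in the text above.

```latex
\nabla f(\nu^2)
  = \left[ \frac{1}{\nu^2_{X_2}},\; -\frac{\nu^2_{X_1}}{(\nu^2_{X_2})^2} \right]^{\top},
\qquad
\nabla f(\nu^2)^{\top}\, \Sigma\, \nabla f(\nu^2)
  = \frac{\sigma_{11}}{(\nu^2_{X_2})^2}
  - \frac{2\,\nu^2_{X_1}\,\sigma_{12}}{(\nu^2_{X_2})^3}
  + \frac{(\nu^2_{X_1})^2\,\sigma_{22}}{(\nu^2_{X_2})^4}.
```

The ratio of wavelet variances at each scale is therefore itself asymptotically normal, provided $\nu^2_{X_2}(\tau_j) > 0$.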