International Journal of Information Technology & Computer Science (IJITCS) (ISSN No: 2091-1610), Volume 10, Issue No. 3, July/August 2013

K-means Clustering for Sleep Spindles Classification

Joao Caldas da Costa
EIM Training and UNINOVA, University Nova of Lisbon
Gold Coast, Australia
joao.caldas.costa@gmail.com

Manuel Duarte Ortigueira
UNINOVA and Department of Electrical Engineering, University Nova of Lisbon
Lisbon, Portugal
mdo@fct.unl.pt

Arnaldo Guimarães Batista
UNINOVA and Department of Electrical Engineering, University Nova of Lisbon
Lisbon, Portugal
agb@fct.unl.pt

Abstract: Changes in EEG sleep spindles constitute a promising indicator of sleep disorders. In this paper, sleep spindles are extracted from real EEG data from patients suffering from any kind of brain illness. A triple (STFT, WT and WMSD) algorithm for sleep spindle detection is used, and its performance is studied and quantified. After detection and isolation, an ARMA model is applied to each spindle. The mean of the ARMA model parameters over all the spindles detected for each patient is computed and, finally, these parameters are used in a k-means clustering classification algorithm to assign a given illness to each patient.

Keywords - ARMA; Sleep Spindles; EEG; k-means clustering

I. INTRODUCTION

Sleep spindles (SS) are particular EEG patterns which occur during the sleep cycle, with a center frequency in the band 11.5 to 15 Hz. They are used as one of the features to classify the sleep stages [1]. Sleep spindles are promising objective indicators of sleep disorders. In order to interpret them, their structure needs to be clarified or a suitable model needs to be found. The correct detection of human SS and their subsequent characterization can lead to early detection of changes in the brain and prevent or, at least, mitigate the influence of certain diseases [2]. Three methods have been used for SS detection.
The Short Time Fourier Transform (STFT) method relies on the fact that, after the transform has been applied to a signal containing an SS, a peak occurs in the SS frequency range. The Wavelet Transform (WT) method uses the normalized wavelet power to detect sleep spindles. Wave Morphology for Spindle Detection (WMSD) directly mimics manual visual scoring. The methods are combined using an AND algorithm.

In this work, an ARMA model for sleep spindles is used to detect meaningful differences when applied to spindles from people with pathologies. After an SS is correctly identified and isolated, an ARMA model is applied to it. Once the ARMA parameters are obtained, a k-means clustering classification algorithm is applied to the data in order to classify each patient.

The paper is organized as follows. Section 2 presents a brief description of sleep spindles and their characteristics, the detection methodology and the statistical measures. Section 3 introduces the ARMA model and Section 4 explains k-means clustering. Section 5 presents the experimental results. Finally, some conclusions are drawn.

This paper was presented at the 2nd International Conference on Computer Science, Information System & Communication Technologies (ICCSISCT 2013), Sydney, Australia, June 18-19, 2013.

II. SLEEP SPINDLES: DETECTION AND METHODS

A. Sleep Spindles

It is commonly stated in the literature that sleep spindles are the most interesting hallmark of stage 2 sleep electroencephalograms (EEG) [1]. A sleep spindle is a burst of brain activity visible on an EEG; it consists of 11-15 Hz waves with a duration between 0.5 s and 2 s in healthy adults and an amplitude of up to 30 μV. An SS is shown in Fig. 2, starting at 1.45 s with a duration of 0.6 s.
The spindle is characterized by a progressively increasing, then gradually decreasing amplitude, which gives the waveform its characteristic name. There is a consensus that SS originate in the thalamus and can be recorded as potential changes at the cortical surface [3]. Sleep spindles were first described in human EEG by Loomis in 1935, but the first commonly accepted definition of a sleep spindle was given by Rechtschaffen and Kales [4]: “The presence of a sleep spindle should not be defined unless it is of at least 0.5 sec duration, i.e., one should be able to count 6 or 7 distinct waves within the half-second period. Because the term “sleep spindle” has been widely used in sleep research, this term will be retained. The term should be used only to describe activity between 12 and 14 cps”.

B. The Detection Algorithm

Recent SS detectors are based on methods that include fuzzy logic, neural networks, bandpass filters, the fast time-frequency transform, the Fourier transform and the wavelet transform. The majority of the proposed algorithms are, directly or indirectly, based on amplitude-frequency analysis, thus relying on the spindle definition and mimicking visual analysis [5].

Figure 1. The detection algorithm.

In this work, SS are detected using a combination of the Wavelet Transform (WT), Short Time Fourier Transform (STFT) and Wave Morphology for Spindle Detection (WMSD) algorithms. A vector of the same length as the sampled signal is used to characterize it: this vector marks each point of the sampled signal as belonging to an SS or not.
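A per-sample marker vector of this kind maps naturally onto a boolean array. As a minimal sketch (not the authors' implementation), the Python function below intersects three such vectors with a logical AND, the combination named in the Introduction, and then discards runs shorter than a minimum duration. The function name `combine_detections`, the 0.5 s default (taken from the Rechtschaffen and Kales definition quoted above) and the toy masks are assumptions made purely for illustration.

```python
from typing import List


def combine_detections(wt: List[bool], stft: List[bool], wmsd: List[bool],
                       fs: float, min_duration_s: float = 0.5) -> List[bool]:
    """AND three per-sample masks, then drop runs shorter than min_duration_s."""
    if not (len(wt) == len(stft) == len(wmsd)):
        raise ValueError("all masks must have the same length")

    # A sample is kept only if every detector marked it.
    combined = [a and b and c for a, b, c in zip(wt, stft, wmsd)]

    min_samples = int(round(min_duration_s * fs))
    out = [False] * len(combined)
    start = None
    # The trailing False is a sentinel that closes a run ending at the last sample.
    for i, flag in enumerate(combined + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_samples:
                for j in range(start, i):
                    out[j] = True
            start = None
    return out


# Toy example with a hypothetical 10 Hz sampling rate, so 0.5 s = 5 samples.
T, F = True, False
wt_mask = [F, T, T, T, T, T, T, T, F, T, T, T]
stft_mask = [T, T, T, T, T, T, T, F, F, T, T, T]
wmsd_mask = [F, F, T, T, T, T, T, T, F, T, T, T]
result = combine_detections(wt_mask, stft_mask, wmsd_mask, fs=10.0)
```

In this toy case the three masks agree on samples 2 to 6 (five samples, kept) and on samples 9 to 11 (three samples, discarded as too short).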
The combined result is then computed: a point is considered to belong to an SS only if it is marked as SS by the WT, STFT and WMSD algorithms. Finally, runs of consecutive marked points that do not last at least 0.5 seconds are considered non-spindle. The method is summarized in Fig. 1.

The STFT is commonly used in signal processing [6]. The STFT of a discrete signal x[n] is defined as:

$$X(m,\omega) = \sum_{n=-\infty}^{\infty} x[n]\, g[n-m]\, e^{-j\omega n}$$

where g[n] is the analysis window. The magnitude squared of the STFT yields the spectrogram of the signal:

$$S(m,\omega) = \left| X(m,\omega) \right|^{2}$$

SS detection is based on the spectrogram: an SS is detected when peaks are found in the 11-15 Hz range. An example of SS detection using the STFT and the corresponding spectrogram can be seen in Fig. 2. The presence of a peak in the spectrogram (t = 1.4-2.0 s and f ≈ 12 Hz), corresponding to an SS, is clear.

Figure 2. SS detection using STFT, WT and WMSD algorithms.

SS detection with the WT employs the continuous wavelet transform of the EEG signal x(t), defined as:

$$W(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt$$

where ψ(t) is called the ‘mother wavelet’, the asterisk denotes complex conjugation, and a and b are the scale and translation parameters, respectively [7]. The corresponding normalized wavelet power is defined by:

$$P(a,b) = \frac{\left| W(a,b) \right|^{2}}{\sigma^{2}}$$

where σ is the standard deviation of the EEG segment used. The complex Morlet wavelet was used. An SS is detected when the normalized wavelet power is above a certain threshold. In Fig. 2 an SS is detected using the normalized wavelet power.

The WMSD algorithm used in this paper is based on the definition of a sleep spindle by Rechtschaffen and Kales [4].
The whole process mimics the visual detection mechanism; the algorithm was first published by the authors in [8]. The implemented algorithm consists of:

a) Detection of peaks in the signal (maxima and minima), based on a defined threshold, thus eliminating small peaks (in Fig. 2 they are marked with a “•”);
b) Determination of the extremum-to-extremum time distance and its conversion to a frequency;
c) Verification of whether the determined frequencies lie in the SS range (11-15 Hz) (peaks satisfying this condition are marked with an “*” in Fig. 2);
d) If there are more than 12 consecutive peaks (6 maxima and 6 minima) in the SS frequency band, a spindle is marked (peaks satisfying this condition are marked with “□” in Fig. 2).

C. Statistical Measures and algorithm performance

In order to assess the validity of the results, the algorithm was applied to the data and its output was compared with the visually scored signal. The following measures were taken: true positive (TP), false positive (FP), true negative (TN) and false negative (FN) events.

Figure 3. Statistical measures; TP: true positive, TN: true negative, FP: false positive and FN: false negative regions. Here the discrepancy was enhanced for demonstrative purposes.

A TP result is counted when a sample is scored as a spindle by both the automatic method and the expert. A TN result is counted when the absence of a spindle is correctly decided. If the automatic result indicates the presence of a spindle where there was no visually scored spindle, an FP result is counted. Conversely, if the output indicates no spindle where the expert scored one, an FN result is counted (Fig. 3).
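The sample-wise counting described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `confusion_counts` and `detection_metrics` and the toy scoring vectors are hypothetical, and the three ratios use the standard textbook definitions of sensitivity, specificity and accuracy.

```python
from typing import Sequence, Tuple


def confusion_counts(auto: Sequence[int], expert: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN sample by sample (1 = spindle, 0 = no spindle)."""
    if len(auto) != len(expert):
        raise ValueError("both scorings must have the same length")
    tp = fp = tn = fn = 0
    for a, e in zip(auto, expert):
        if a and e:
            tp += 1      # both mark a spindle
        elif a and not e:
            fp += 1      # detector marks a spindle, expert does not
        elif not a and e:
            fn += 1      # expert marks a spindle, detector misses it
        else:
            tn += 1      # both agree there is no spindle
    return tp, fp, tn, fn


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from the four counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Toy scorings, eight samples long (hypothetical data).
auto_scoring = [0, 1, 1, 1, 0, 0, 1, 0]
expert_scoring = [0, 1, 1, 0, 0, 1, 1, 0]
counts = confusion_counts(auto_scoring, expert_scoring)
metrics = detection_metrics(*counts)
```

For these toy vectors the counts are TP = 3, FP = 1, TN = 3 and FN = 1, so all three ratios equal 0.75. The sketch does not guard against an empty class (for example, a recording with no expert-scored spindles), which would cause a division by zero.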
Sensitivity, specificity and accuracy are defined as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

In [9] a comparison of threshold choices is presented, based on an EEG signal partly scored by a human expert. The optimal threshold value can be determined by analyzing the Sensitivity x Specificity curve (Fig. 4). As sensitivity improves towards the top and specificity improves towards the right, the optimal point on the curve is the point nearest to the top right corner. The best result is obtained with the combination of the 3 algorithms (black line with “*” marks), with a sensitivity and specificity of around 94%.

Figure 4. Sensitivity x Specificity curves for the implemented algorithms.

III. ARMA MODEL

In signal processing, autoregressive moving average (ARMA) models are typically applied to correlated time series data. A given time series can be considered as the output of an ARMA system driven by white noise. The ARMA model is a tool for understanding and, whenever necessary, predicting future values of a time series. The model consists of two parts, an autoregressive (AR) part and a moving average (MA) part. It is usually referred to as ARMA(p,q), where p is the order of the autoregressive part and q is the order of the moving average part. Compared with pure MA or AR models, ARMA models are more suitable for describing the characteristics of a given process with a minimum number of parameters, since they use both poles and zeros rather than just poles or just zeros [10].

As stated above, a stationary ARMA process of order (p,q) is considered as the output of a linear time-invariant (LTI) digital filter driven by white noise.
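To make the "filter driven by white noise" view concrete, the sketch below generates an ARMA process in plain Python with the recursion x(n) = -Σ_{k=1..p} a_k x(n-k) + Σ_{k=0..q} b_k w(n-k), taking a_0 = 1. It only illustrates the generative model, not the parameter estimation used in the paper. The function name `simulate_arma`, the illustrative ARMA(2,1) coefficients (a pole pair placed near 13 Hz for an assumed 256 Hz sampling rate) and the Gaussian noise are all assumptions.

```python
import math
import random
from typing import List, Sequence


def simulate_arma(a: Sequence[float], b: Sequence[float],
                  n_samples: int, seed: int = 0) -> List[float]:
    """Filter zero-mean Gaussian white noise through B(z)/A(z), with a[0] == 1."""
    if a[0] != 1.0:
        raise ValueError("a[0] must be 1")
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # white-noise input w(n)
    x = [0.0] * n_samples                                # output x(n), zero initial state
    for n in range(n_samples):
        # Moving-average part: weighted sum of current and past noise samples.
        ma = sum(b[k] * w[n - k] for k in range(len(b)) if n - k >= 0)
        # Autoregressive part: weighted sum of past output samples.
        ar = sum(a[k] * x[n - k] for k in range(1, len(a)) if n - k >= 0)
        x[n] = ma - ar
    return x


# Illustrative ARMA(2,1): complex pole pair of radius 0.95 at 13 Hz (fs = 256 Hz assumed).
fs = 256.0
r = 0.95
theta = 2.0 * math.pi * 13.0 / fs
a_coeffs = [1.0, -2.0 * r * math.cos(theta), r * r]
b_coeffs = [1.0, 0.5]
x_sim = simulate_arma(a_coeffs, b_coeffs, n_samples=512)
```

Because the pole radius is below 1, the recursion is stable and the output is a noisy oscillation concentrated around the chosen frequency, which is the kind of narrow-band behaviour an ARMA model is meant to capture with few parameters.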
The transfer function of the system is given by:

$$H(z) = \frac{B(z)}{A(z)} = \frac{\sum_{i=0}^{q} b_i z^{-i}}{\sum_{i=0}^{p} a_i z^{-i}}$$

with a0=1. The process corresponding to this model satisfies the difference equation:

$$\sum_{i=0}^{p} a_i\, x(n-i) = \sum_{i=0}^{q} b_i\, w(n-i)$$

where w(n) is the input sequence, a zero-mean white noise, and x(n) is the output sequence. The main task in the modelling can be formulated as: given a segment of a time series, x(n), n = 0, 1, 2, …, L-1, estimate the p+q+1 ARMA parameters. The “armax.m” command from the “System Identification Toolbox” in Matlab was used to perform the ARMA modelling, thus obtaining the A and C parameters of the equation:

$$A(q)\, y(t) = C(q)\, e(t)$$

Figure 5. Pole-Zero Map of a SS and corresponding ARMA model.

In Fig. 5 a pole-zero map of an identified SS is presented together with its ARMA(5,1) model. The model was chosen to be ARMA(5,1), as it was the one with the best results in [11].

IV. K-MEANS CLUSTERING

In data mining, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean [12]. Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k ≤ n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^{2}$$

where μi is the mean of the points in Si. The function “kMeansCluster.m” by Kardi Teknomo was used. For a complete description of the algorithm please refer to [13].

V. EXPERIMENTAL RESULTS

This study makes use of a sample representative of human sleep, obtained from 19 volunteers, males and females, with ages between 35 and 87 years old.
Briefly, all polysomnograms were obtained with a Nicolet EEG1A97 18-channel polygraph at a sampling rate of 256 Hz. From the group, 8 subjects were completely healthy and the remaining had one or more disorders, namely: REM sleep behavior disorder (RBD), periodic limb movement disorder (PLMD), insomnia or epilepsy. The patients’ disorders are:

o Rapid eye movement sleep behavior disorder (RBD): a sleep disorder that involves abnormal behavior during the sleep phase with rapid eye movement (REM sleep). It is characterized by the dreamer acting out dreams. These dreams often involve kicking, screaming, punching, grabbing and even jumping out of bed;
o Periodic limb movement disorder (PLMD): a sleep disorder where the patient moves limbs involuntarily during sleep, and has symptoms or problems related to the movement;
o Insomnia (or sleeplessness): a well known sleep disorder in which there is an inability to fall asleep or to stay asleep as long as desired;
o Epilepsy: a common and diverse set of chronic neurological disorders characterized by seizures.

The signals were unclassified, and the whole-night signal of the C3-A2 channel was used. A total of 14130 SS were detected by the algorithm.

The detection methods were applied with a combination of threshold parameters for the STFT, WMSD and WT algorithms. In the STFT case, the threshold value used corresponds to the cumulative value of peaks in the spectrogram. In the WMSD algorithm, a point is considered a maximum peak if it has the maximal value and was preceded (to the left) by a value lower than the defined threshold.
The normalized wavelet power amplitude is used as the threshold in the WT case.

K-means clustering was applied to the arithmetic means of the coefficients of the ARMA transfer functions. The number of clusters was set to N = 2 (i.e., k = 2 in the notation of Section IV), in order to determine whether a patient has any pathology or not. It was therefore expected that patients with any kind of pathology would all lie in the same group, whereas patients with no disorders would lie in the other. The majority of results were as expected (Table I). Discrepancies occurred for two patients (marked in red):

• Pat11, suffering from RBD and epilepsy, was classified in group 1 (healthy patients);
• Pat23, suffering from PLMD and insomnia, was classified in group 1 (healthy patients).

TABLE I. PATIENTS DISORDERS AND RESULTS FROM K-MEAN CLUSTERING

VI. CONCLUSION

In this work, sleep spindles that were automatically scored using a triple detection algorithm were modelled using an ARMA(5,1) model. After the model was applied to the signals, k-means clustering was used to distinguish patients with any sleep related disorder from healthy subjects. The ARMA model proved to be useful for distinguishing sleep spindles from patients suffering from sleep related disorders. Further modelling needs to be carried out in order to correctly distinguish between different pathologies. K-means clustering provided a powerful tool for patient differentiation.
ACKNOWLEDGMENT

The authors would like to acknowledge the sleep laboratory CENC, Centro de Electroencefalografia e Neurofisiologia Clínica, for providing the data used for this work. This work was funded by EIM Training (Queensland, Australia) and by Portuguese National Funds through the FCT, Foundation for Science and Technology, under the project PEst-OE/EEI/UI0066/2011.

REFERENCES

[1] L. De Gennaro and M. Ferrara, “Sleep spindles: an overview”, Sleep Med Rev 7:423–40, 2003.
[2] J.C. Costa, M.D. Ortigueira and A. Batista, “ARMA Modelling of Sleep Spindles”, Proceedings of the Doctoral Conference on Computing, Electrical and Industrial Systems, DoCEIS'11 - IFIP AICT 349, pp. 341-348, 2011.
[3] M. Steriade, E.G. Jones and R.R. Llinás, “Thalamic Oscillations and Signaling”, Neuroscience Institute Publications. New York: John Wiley & Sons, 1990.
[4] A. Rechtschaffen and A. Kales, “A manual of standardised terminology, techniques and scoring system for sleep stages of human subjects”, Washington, DC: Public Health Service, U.S. Government Printing Office, 1968.
[5] A. Nonclercq, C. Urbain, D. Verheulpen, C. Decaestecker, P. Van Bogaert and P. Peigneux, “Sleep spindle detection through amplitude-frequency normal modelling”, Journal of Neuroscience Methods, 2010.
[6] J. Proakis and D. Manolakis, “Digital Signal Processing”, 4th Ed., Prentice-Hall, 2006.
[7] I. Omerhodzic, S. Avdakovic, A. Nuhanovic, K. Dizdarevic and K. Rotim, “Energy Distribution of EEG Signal Components by Wavelet Transform”, pp. 45-60, InTech Publishing, 2012.
[8] J.C. Costa, M.D. Ortigueira, A. Batista and T. Paiva, “An Automatic Sleep Spindle detector based on WT, STFT and WMSD”, International Journal of the World Academy of Science, Engineering and Technology, issue 68, pp. 1298-1301, 2012.
[9] J.C. Costa, M.D. Ortigueira, A. Batista and T. Paiva, “Threshold choice for automatic spindle detection”, Proc. IWSSIP2012, 2012.
[10] A. Kizilkaya and A. H. Kayran, “ARMA model parameter estimation based on the equivalent MA approach”, Digital Signal Processing, Vol. 16, Issue 6, 2006.
[11] J.C. Costa, M.D. Ortigueira, A. Batista and T. Paiva, “ARMA Modelling of Sleep Spindles”, Proceedings of the Doctoral Conference on Computing, Electrical and Industrial Systems, DoCEIS'11 - IFIP AICT 349, pp. 341-348, 2011.
[12] K-means clustering. (2012, August 2). In Wikipedia, The Free Encyclopedia. Retrieved 16:50, December 15, 2013, from http://en.wikipedia.org/w/index.php?title=K-means_clustering&oldid=505438129
[13] K. Teknomo. (2013, March 12), in http://people.revoledu.c