Optimization and Data Mining in Epilepsy Research

W. Art Chaovalitwongse
Assistant Professor, Industrial and Systems Engineering
Rutgers, The State University of New Jersey

Acknowledgements
• Comprehensive Epilepsy Center, St. Peter's University Hospital: Rajesh C. Sachdeo, MD; Deepak Tikku, MD
• Brain Institute, University of Florida: Panos M. Pardalos, PhD; J. Chris Sackellares, MD; Paul R. Carney, MD
• Bioengineering, Arizona State University: Leonidas D. Iasemidis, PhD

Agenda
• Background: Epilepsy
• Electroencephalogram (EEG) Time Series
• Chaos Theory: Dimensionality Reduction
• Seizure Prediction
• Feature Selection
• Process Monitoring
• Concluding Remarks

Facts About Epilepsy
• At least 2 million Americans and another 40-50 million people worldwide (about 1% of the population) suffer from epilepsy.
• Epilepsy is the second most common brain disorder (after stroke).
• The hallmark of epilepsy is recurrent seizures.
• Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.

Epileptic Seizures
• Seizures usually occur spontaneously, in the absence of external triggers.
• Seizures cause temporary disturbances of brain functions such as motor control, responsiveness, and recall, typically lasting from seconds to a few minutes.
• Seizures may be followed by a post-ictal period of confusion or impaired sensorium that can persist for several hours.

Rationale
• Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion in the U.S. in associated health care costs and losses in employment, wages, and productivity.
• Cost per patient ranged from $4,272 for persons with remission after initial diagnosis and treatment to $138,602 for persons with intractable and frequent seizures.

How To Fight Epilepsy
• Anti-epileptic drugs (AEDs)
  • Mainstay of epilepsy treatment
  • Approximately 25 to 30% of patients remain unresponsive
• Epilepsy surgery
  • Requires long-term invasive EEG monitoring
  • 50% of pre-surgical candidates do not undergo resective surgery, because of multiple epileptogenic zones or an epileptogenic zone located in functional brain tissue
  • Only 60% of surgery cases result in seizure freedom
• Electrical stimulation (vagus nerve stimulator)
  • Parameters (amplitude and duration of stimulation) are arbitrarily adjusted
  • As effective as one additional AED dose
  • Side effects
• Seizure prediction?

[Figure: Vagus Nerve Stimulator]

Open Problems
• Is seizure occurrence random?
• If not, can seizures be predicted?
• If yes, are there seizure precursors preceding seizures?
• If yes, what measurement can be used to indicate these precursors?
• Does normal brain activity differ from abnormal brain activity?

Electroencephalogram (EEG)
• A tool for evaluating the physiological state of the brain.
• Offers excellent spatial and temporal resolution for characterizing the rapidly changing electrical activity of brain activation.
• Captures voltage potentials produced by brain cells while they communicate.
• Electrodes are implanted in the deep brain or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.

[Figure: From Microscopic to Macroscopic Level (Electroencephalogram - EEG)]
[Figure: Depth and subdural electrode placement for EEG recordings; electrode groups labeled ROF, LOF, RST, LST, RTD, LTD]
[Figure: Scalp EEG Data Acquisition]
[Figure: EEG Data Acquisition]
[Figure: Typical EEG Time Series Data]

Goals of Research
• Test the hypothesis that seizures are not a random process.
• Employ data mining techniques to differentiate normal and abnormal EEGs.
• Employ quantitative analysis to identify seizure precursors.
• Demonstrate that seizures can be predicted.
• Develop a closed-loop seizure control device (Brain Pacemaker).

[Figure: 10-second EEGs: Seizure Evolution; panels: Normal, Pre-Seizure, Seizure, Post-Seizure]

Dimensionality Reduction
• The brain is a non-stationary system, and the EEG time series is non-stationary.
• With 200 Hz sampling, 1 hour of EEGs comprises 200*60*60*30 = 21,600,000 data points = 43.2 MB (assuming a 16-bit ASCII format).
• 1 day = 24 × 1 hour; 1 week = 168 × 1 hour; 20 patients = 3,360 × 1 hour.
• Kilobytes → Megabytes → Gigabytes → Terabytes

Dimensionality Reduction Using Chaos Theory
• Chaos in the brain? Chaos in the stock market? Chaos in foreign exchange (Swedish currency)?
• Measure the brain dynamics from the EEG time series.
• Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points.
• The Maximum Short-Term Lyapunov Exponent (STLmax) measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space. It measures the chaoticity of the brain waves.

Estimating the Maximum Lyapunov Exponent
Embed the data set (EEG):
\[ X_i = \bigl(x(t_i),\, x(t_i+\tau),\, \ldots,\, x(t_i+(p-1)\tau)\bigr)^T \]
where $\tau$ is the selected time lag between the components of each vector in the phase space, $p$ is the selected dimension of the embedding phase space, and $t_i \in [1,\, T-(p-1)\tau]$.
1. Pick a point $x(t_0)$ somewhere in the middle of the trajectory.
2. Find that point's nearest neighbor. Call that point $z_0(t_0)$.
3. Compute $|z_0(t_0) - x(t_0)| = L_0$.
4. Follow the "difference trajectory" (the dashed line) forwards in time, computing $|z_0(t_i) - x(t_i)| = L_0(i)$ and incrementing $i$, until $L_0(i) > \varepsilon$. Call that value $L_0'$ and that time $t_1$.
5. Find $z_1(t_1)$, the "nearest neighbor" of $x(t_1)$, and go to step 3.
6. Repeat the procedure to the end of the fiduciary trajectory, $t = t_n$, keeping track of the $L_i$ and $L_i'$.
In the standard form of this procedure, the exponent is then estimated as
\[ \lambda_1 \approx \frac{1}{N\Delta t} \sum_{i=0}^{M-1} \log_2 \frac{L_i'}{L_i} \]
where $M$ is the number of times we went through the loop above, and $N$ is the number of time-steps in the fiduciary trajectory.
Here $N\Delta t = t_n - t_0$.

2-D Example: a circle of initial conditions evolves into an ellipse.
• $d_1 = d_0 e^{\lambda_1 t}$ is the major axis.
• $d_2 = d_0 e^{\lambda_2 t}$ is the minor axis.
• The $i$-th Lyapunov exponent after $n$ steps can be defined as
\[ \lambda_i = \frac{1}{n} \log \frac{d_i}{d_0} \]

[Figure: STLmax Profiles; phases: Pre-Ictal, Ictal, Post-Ictal]
[Figure: Hidden Synchronization Patterns]

How similar are they? Statistics to quantify the convergence of STLmax
By paired-T statistic: per electrode, for EEG signal epochs $i$ and $j$, suppose their STLmax values in the epochs (of length 60 points, 10 minutes) are
\[ L_i = \{STLmax_i^1, STLmax_i^2, \ldots, STLmax_i^{60}\}, \qquad L_j = \{STLmax_j^1, STLmax_j^2, \ldots, STLmax_j^{60}\}, \]
\[ D_{ij} = L_i - L_j = \{d_{ij}^1, d_{ij}^2, \ldots, d_{ij}^{60}\} = \{STLmax_i^1 - STLmax_j^1,\ STLmax_i^2 - STLmax_j^2,\ \ldots,\ STLmax_i^{60} - STLmax_j^{60}\}. \]
Then we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_d$, of $D_{ij}$. The T-index between EEG signal epochs $i$ and $j$ is defined as
\[ T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_d / \sqrt{60}} \]

Statistically Quantifying the Convergence: IID (Independent and Identically Distributed) Test
• Assumption 1: Within a window of 30 STLmax points, the differences of STLmax values ($D_{ij}$) between two electrode sites $i$ and $j$ are independent. To verify this assumption, we employ the "portmanteau" test of white noise developed by Ljung and Box.
• Assumption 2: Within a window of 60 points, the differences of STLmax values between two electrode sites $i$ and $j$ are normally distributed. To check this assumption, we employed the Shapiro-Wilk W test, a well-established and powerful test of departure from normality.

Convergence of STLmax Models
Homoclinic chaos (Silnikov's theorem): Rössler systems, Lorenz systems, population dynamical systems. For $N$ coupled oscillators:
\[ \frac{dx_i(t)}{dt} = -\omega_i y_i - z_i + \sum_{j=1,\, j \neq i}^{N} \bigl(\varepsilon_{i,j}\, x_j - \varepsilon'_{i,j}\, x_i\bigr) \tag{1} \]
\[ \frac{dy_i(t)}{dt} = \omega_i x_i + \alpha_i y_i \tag{2} \]
\[ \frac{dz_i(t)}{dt} = \beta_i x_i + z_i (x_i - \gamma_i) \tag{3} \]
• $\omega$, $\alpha$, $\beta$, and $\gamma$ are intrinsic parameters.
• $\varepsilon$ and $\varepsilon'$ are directional coupling strengths.
• $N$ = number of oscillators.

[Figure: STLmax versus time and coupling]

Why Feature Selection?
• Not every electrode site shows the convergence.
• Feature selection: select the electrodes that are most likely to show the convergence preceding the next seizure.

Optimization Problem
Optimization: we apply optimization techniques to find a group of electrode sites such that
• they are the most converged (in STLmax) electrode sites during the 10-minute window before the seizure, and
• they show dynamical resetting (divergence in STLmax) during the 10-minute window after the seizure.
Such electrode sites are defined as "critical electrode sites".
Hypothesis: the critical electrode sites should be the most likely to show the convergence in STLmax again before the next seizure.

Multi-Quadratic Integer Programming
To select critical electrode sites, we formulated this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with
• an objective function that minimizes the average T-index among electrode sites,
• a linear constraint that fixes the number of critical electrode sites, and
• a quadratic constraint that ensures the selected electrode sites show dynamical resetting.

Problem $P_1$:
\[
\begin{aligned}
\min\ & f(x) = x^T Q x \\
\text{s.t.}\ & \sum_{i=1}^{n} x_i = b, \\
& x^T D x \ge \alpha, \\
& x_i \in \{0,1\}, \quad i = 1, \ldots, n
\end{aligned}
\]

Notation and Modeling
• $x$ is an $n$-dimensional column vector of decision variables, where each $x_i$ represents electrode site $i$: $x_i = 1$ if electrode $i$ is selected as one of the critical electrode sites, and $x_i = 0$ otherwise.
• $Q$ is an $n \times n$ matrix whose element $q_{ij}$ is the T-index between electrodes $i$ and $j$ during the 10-minute window before a seizure.
• $b$ is an integer constant (the number of critical electrode sites).
• $D$ is an $n \times n$ matrix whose element $d_{ij}$ is the T-index between electrodes $i$ and $j$ during the 10-minute window after a seizure.
• $\alpha = 2.662 \cdot b \cdot (b-1)$, a constant.
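
To make Problem P1 concrete, the following is a minimal sketch that solves a tiny instance by exhaustive enumeration. It is not the solution method used in the talk (which reformulates the problem as a mixed-integer linear program); the function name `solve_mqip_brute_force` and the toy matrices are illustrative assumptions, and enumeration is only practical for very small n.

```python
from itertools import combinations

def solve_mqip_brute_force(Q, D, b, t_crit=2.662):
    """Enumerate all subsets of b electrodes (hypothetical helper, toy sizes only).

    Q[i][j]: T-index between electrodes i and j before the seizure.
    D[i][j]: T-index between electrodes i and j after the seizure.
    Minimizes x^T Q x subject to sum(x) = b and x^T D x >= alpha,
    with alpha = t_crit * b * (b - 1) as on the slide.
    """
    n = len(Q)
    alpha = t_crit * b * (b - 1)
    best_subset, best_value = None, float("inf")
    for subset in combinations(range(n), b):
        # x^T Q x for a 0-1 vector is the sum of q_ij over selected ordered pairs.
        pre = sum(Q[i][j] for i in subset for j in subset)
        post = sum(D[i][j] for i in subset for j in subset)
        if post >= alpha and pre < best_value:
            best_subset, best_value = subset, pre
    return best_subset, best_value

# Made-up symmetric T-index matrices for 4 electrodes (zero diagonal).
Q = [[0, 1, 5, 6], [1, 0, 2, 7], [5, 2, 0, 3], [6, 7, 3, 0]]
D = [[0, 1, 3, 3], [1, 0, 4, 3], [3, 4, 0, 5], [3, 3, 5, 0]]
sites, value = solve_mqip_brute_force(Q, D, b=2)
print(sites, value)
```

In this toy instance the pair (0, 1) is the most converged before the seizure but does not reset afterwards, so the quadratic constraint excludes it and the next-best pair is selected.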
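• 2.662 is the critical value of the T-index, as previously defined, for rejecting H0: "two brain sites acquire identical STLmax values within a 10-minute window".

Conventional Linearization Approach for Multi-Quadratic 0-1 Problem
For each product $x_i x_j$, we introduce a new 0-1 variable $x_{ij} = x_i x_j$ ($i \neq j$). Note that $x_{ii} = x_i^2 = x_i$ for $x_i \in \{0,1\}$. The equivalent linear 0-1 problem is given by:
\[
\begin{aligned}
\min\ & \sum_i \sum_j q_{ij} x_{ij} \\
\text{s.t.}\ & Ax = b, \\
& x_{ij} \le x_i, \quad i, j = 1, \ldots, n\ (i \neq j), \\
& x_{ij} \le x_j, \quad i, j = 1, \ldots, n\ (i \neq j), \\
& x_i + x_j - 1 \le x_{ij}, \quad i, j = 1, \ldots, n\ (i \neq j), \\
& \sum_i \sum_j d_{ij} x_{ij} \ge \alpha, \\
& x_i \in \{0,1\}, \quad 0 \le x_{ij} \le 1, \quad i, j = 1, \ldots, n
\end{aligned}
\]
• The number of continuous variables increases to $O(n^2)$.
• This formulation therefore becomes computationally inefficient as $n$ increases.

KKT Conditions Approach
Consider the quadratic 0-1 programming problem
\[ \min\ f(x) = x^T Q x \quad \text{s.t.}\quad Ax = b,\quad x_i \in \{0,1\},\ i = 1, \ldots, n \]
where $Q$ is an $n \times n$ matrix, $b$ is an integer constant, $x$ is an $n$-dimensional column vector, and $e^T = (1, 1, \ldots, 1)$.
Relaxing $x \in \{0,1\}$ to $x \ge 0$ gives the problem
\[ \min\ f(x) = x^T Q x \quad \text{s.t.}\quad Ax = b,\quad x_i \ge 0,\ i = 1, \ldots, n \]
(with $c = 0$, $A = e^T$, $v = 0$), whose KKT conditions are
\[ Qx - u\,e - y = 0, \qquad Ax = b, \qquad y^T x = 0, \qquad x \ge 0,\ u \ge 0,\ y \ge 0. \]

KKT Conditions Approach (continued)
Add slack variables $a$ and define $s = u\,e + a$. Minimizing the slack variables, we can formulate the problem as:
\[ \min\ e^T s \quad \text{s.t.}\quad Qx - y - s = 0,\quad Ax = b,\quad y^T x = 0,\quad x \ge 0,\ s \ge 0,\ y \ge 0. \]
Fixing $x \in \{0,1\}$, the complementarity condition $y^T x = 0$ can be replaced by $y \le M(1 - x)$:
\[ \min\ e^T s \quad \text{s.t.}\quad Qx - y - s = 0,\quad Ax = b,\quad y \le M(1 - x), \]
where $s \ge 0$, $y \ge 0$, $x \in \{0,1\}$, and $M = \max_i \sum_j |q_{ij}| = \|Q\|_\infty$.
This formulation remains efficient as $n$ increases, because it has the SAME number of 0-1 variables ($n$) and only $2n$ additional continuous variables.

Connections Between QIP Problems and MILP Problems
For any matrix $Q$ with $q_{ij} \ge 0$, we want to prove that $P$ and $\bar{P}$ are equivalent.

Problem $P$:
\[ \min\ f(x) = x^T Q x \quad \text{s.t.}\quad Ax = b,\quad x_i \in \{0,1\},\ i = 1, \ldots, n \]

Problem $\bar{P}$:
\[
\begin{aligned}
\min\ & e^T s \\
\text{s.t.}\ & Qx - y - s = 0 & (1) \\
& Ax = b & (2) \\
& y \le M(1 - x) \quad (\text{equivalently } y^T x = 0) & (3) \\
& s \ge 0,\ y \ge 0,\ x \in \{0,1\} & (4)
\end{aligned}
\]
where $M = \max_i \sum_j |q_{ij}|$.

Theorem 1: $P$ has an optimal solution $x^0$ if and only if there exist $y^0$, $s^0$ such that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.
Proof. Necessity.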
If $x^0$ is an optimal solution to $P$, then there exist $y \ge 0$, $s \ge 0$ such that $Qx^0 - y - s = 0$ (1) and $y^T x^0 = 0$ (3). Choose $y^0$ and $s^0$ from this set of $y$ and $s$ such that $e^T s^0$ is minimized. Let us show that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.
Multiplying (1) by $(x^0)^T$, we obtain
\[ (x^0)^T Q x^0 - (x^0)^T y^0 - (x^0)^T s^0 = 0. \]
From (3), $(x^0)^T y^0 = (y^0)^T x^0 = 0$, so
\[ (x^0)^T Q x^0 = (x^0)^T s^0. \]
We know that $x^0 = \arg\min x^T Q x$ s.t. $Ax = b$, $x \in \{0,1\}^n$. If we can prove that
\[ e^T s^0 = (x^0)^T s^0, \tag{5} \]
then $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$. Written out, (5) states that $s_1 + s_2 + \cdots + s_n = x_1 s_1 + x_2 s_2 + \cdots + x_n s_n$, so it is sufficient to show that, for any $i$, if $x_i^0 = 0$ then $s_i^0 = 0$. We prove this statement by contradiction.
Assume that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$ (so $e^T s^0$ is minimized), and that $x_i^0 = 0$ and $s_i^0 > 0$ for some $i$. For such $i$, define vectors $\tilde{y}$ and $\tilde{s}$ with $\tilde{y}_i = y_i^0 + s_i^0$ and $\tilde{s}_i = 0$, leaving the other components unchanged. It is clear that $(x^0, \tilde{y}, \tilde{s})$ satisfies all constraints (1)-(4) in $\bar{P}$. Thus $(x^0, \tilde{y}, \tilde{s})$ is feasible and $e^T \tilde{s} < e^T s^0$, so $e^T s^0$ is not minimal. This contradicts our initial assumption that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.
Sufficiency. The proof is similar.

Theoretical Results: MILP Formulation for the MQIP Problem
Consider the MQIP problem. We proved that the MQIP program is EQUIVALENT to a MILP problem with the SAME number of integer variables.

Problem $P_1$:
\[ \min\ f(x) = x^T Q x \quad \text{s.t.}\quad Ax = b,\quad x^T D x \ge \alpha,\quad x_i \in \{0,1\},\ i = 1, \ldots, n \]

Problem $\bar{P}_1$:
\[
\begin{aligned}
\min\ & e^T s \\
\text{s.t.}\ & Qx - y - s = 0 & (1) \\
& Ax = b & (2) \\
& y \le M(1 - x) & (3) \\
& Dx - z \ge 0 & (4) \\
& e^T z \ge \alpha & (5) \\
& z \le M' x & (6) \\
& s, y, z \ge 0,\ x \in \{0,1\} & (7)
\end{aligned}
\]
where $M = \max_i \sum_j |q_{ij}| = \|Q\|_\infty$ and $M' = \max_i \sum_j |d_{ij}| = \|D\|_\infty$.

Theorem 2: $P_1$ has an optimal solution $x^0$ if and only if there exist $y^0$, $s^0$, $z^0$ such that $(x^0, y^0, s^0, z^0)$ is an optimal solution to $\bar{P}_1$.
Proof. Necessity.
From the proof of Theorem 1, to prove Theorem 2 we only need to show that if $x^0$ is an optimal solution to problem $P_1$, then there exists a vector $z^0$ (with $z_i^0 \ge 0$) such that the following constraints, numbered locally for this proof, are satisfied:
\[ Dx^0 - z^0 \ge 0 \tag{1} \]
\[ e^T z^0 \ge \alpha \tag{2} \]
\[ z^0 \le M' x^0 \tag{3} \]
From (3), if $x_i^0 = 0$ then $z_i^0 = 0$ (the proof is similar to the one in Theorem 1), so we obtain
\[ e^T z^0 = (x^0)^T z^0. \tag{4} \]
Since $z_i^0$ is a real number and every element of the matrix $D$ is nonnegative, for all $i$ with $x_i^0 = 1$ we can choose $z_i^0 \ge 0$ such that $(Dx^0)_i = z_i^0$. This satisfies (1) and (3). Multiplying (1) by $(x^0)^T$ and using (4), we obtain $(x^0)^T D x^0 = (x^0)^T z^0 = e^T z^0$. Since $x^0$ is an optimal (hence feasible) solution to $P_1$, (2) is satisfied:
\[ (x^0)^T D x^0 = e^T z^0 \ge \alpha. \]
Sufficiency. The proof is similar.
Reference:
• P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004.

[Figure: Empirical Results: Performance on Larger Problems]
Reference:
• W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004.
[Figure: Empirical Results: Performance on Larger Problems (continued)]

Hypothesis Testing Simulation
Hypothesis: the critical electrode sites should be the most likely to show the convergence in STLmax (a drop in T-index below the critical value) again before the next seizure.
The critical electrode sites are the electrode sites that
• are the most converged (in STLmax) during the 10-minute window before the seizure, and
• show dynamical resetting (divergence in STLmax) during the 10-minute window after the seizure.
Simulation: based on 3 patients with 20 seizures, we compare the probability of showing the convergence in STLmax (a drop in T-index below the critical value) before the next seizure between
• critical electrode sites, and
• randomly selected electrode sites (5,000 times).

[Figure: Optimal vs. Non-Optimal]
[Figure: Simulation - Results]

How to automate the system

Automated Seizure Warning System (ASWS)
• Input: EEG signals.
• Continuously calculate STLmax from multichannel EEG.
• ASWA: select critical electrode sites after every subsequent seizure.
• Monitor the average T-index of the critical electrodes.
• Give a warning when the T-index value is greater than 5 and then drops to a value of 2.662 or less.

[Table: Data Characteristics]

Performance Evaluation for ASWS
To test this algorithm, a warning was considered true if a seizure occurred within 3 hours after the warning.
• Sensitivity = (# of accurately predicted seizures) / (# of analyzed seizures)
• False prediction rate = average number of false warnings per hour

[Table: Training Results. Performance characteristics of the automated seizure warning algorithm with the best parameter settings on the training data set.]

Receiver Operating Characteristics (ROC)
The ROC curve is used to indicate an appropriate trade-off between:
• the false positive rate (1-Specificity, plotted on the X-axis), which needs to be minimized, and
• the detection rate (Sensitivity, plotted on the Y-axis), which needs to be maximized.
[Figure: ROC curve analysis for the best parameter settings of 10 patients]

[Table: Test Results. Performance characteristics of the automated seizure warning algorithm with the best parameter settings on the testing data set.]
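
The warning rule and the two performance measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ASWS implementation: the function names, the re-arming behaviour after each warning, and the example numbers are hypothetical; only the thresholds (5 and 2.662) and the 3-hour horizon come from the slides.

```python
def detect_warnings(t_index, upper=5.0, lower=2.662):
    """Return the indices at which a warning is issued.

    A warning fires when the average T-index of the critical electrodes has
    been above `upper` and then drops to `lower` or less. Assumption: the
    detector must see the T-index exceed `upper` again before it can fire again.
    """
    warnings, armed = [], False
    for k, t in enumerate(t_index):
        if t > upper:
            armed = True
        elif armed and t <= lower:
            warnings.append(k)
            armed = False
    return warnings

def evaluate(warning_times_h, seizure_times_h, record_hours, horizon_h=3.0):
    """Sensitivity and false prediction rate (per hour); all times in hours."""
    def hit(w, s):
        # A warning is true if a seizure occurs within horizon_h after it.
        return 0 < s - w <= horizon_h
    predicted = [s for s in seizure_times_h
                 if any(hit(w, s) for w in warning_times_h)]
    false_warnings = [w for w in warning_times_h
                      if not any(hit(w, s) for s in seizure_times_h)]
    sensitivity = len(predicted) / len(seizure_times_h)
    false_rate = len(false_warnings) / record_hours
    return sensitivity, false_rate

# Made-up T-index profile and made-up warning/seizure times.
warn_idx = detect_warnings([6.0, 4.0, 2.5, 2.0, 6.0, 3.0, 2.6])
sens, fpr = evaluate([1.0, 10.0], [3.5, 20.0], record_hours=24.0)
print(warn_idx, sens, fpr)
```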

Validation of the ASWS Algorithm
• Temporal properties: surrogate seizure time data sets (100 surrogate data sets)
• Spatial properties: non-optimized ASWS, selecting non-optimal electrode sites (100 randomly selected electrodes)

[Figure: Prediction Scores: ASWS]
[Figure: Prediction Scores: Surrogate Data and Non-Optimized ASWS]
• W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
[Figure: Prediction Scores: Surrogate Data and Non-Optimal ASWS]

Concluding Remarks
• Overview of epilepsy research
• Applications of data mining and optimization techniques
• Interplay between theory and application
• The first online real-time seizure prediction system
• Seizure prediction
  • Predicting ~70% of temporal lobe seizures on average
  • Giving a false alarm rate of ~0.16 per hour on average

Ongoing and Future Research
• Classification of EEGs from normal and epileptic patients
• Classification of abnormal brain activity
• Cluster analysis of epileptic brains
• Analysis of scalp EEGs

Reference
• W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
• W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. EEG Classification in Epilepsy. To appear in Annals of Operations Research.
• W. Chaovalitwongse and P.M. Pardalos. Optimization Approaches to Characterize the Hidden Dynamics of the Epileptic Brain: Seizure Prediction and Localization. To appear in SIAG/OPT Views-and-News.
• W. Chaovalitwongse, P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C. Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods and Software, 20(2-3): 383-394, 2005.
• L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K. Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective On-Line Real-Time Seizure Prediction. Clinical Neurophysiology, 116(3): 532-544, 2005.
• P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004. (INFORMS Pierskalla Best Paper Award 2004)
• W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004. (Ranked 5th in Top 25 Articles in Operations Research Letters)

Questions? Thank you

Classification of Brain Activity
[Figure: Phase Profiles]
[Figure: Entropy H of Attractor]

Classification of Physiological States
[Figure: Nearest Neighbor Time Series Classification; unknown sample A compared against the Normal, Pre-Seizure, and Post-Seizure groups]

Similarity Measure for EEG Time Series: T-test
By paired-T statistic: per electrode, for EEG signal epochs $i$ and $j$, suppose their STLmax values in the epochs (of length 30 points, 5 minutes) are
\[ L_i = \{STLmax_i^1, STLmax_i^2, \ldots, STLmax_i^{30}\}, \qquad L_j = \{STLmax_j^1, STLmax_j^2, \ldots, STLmax_j^{30}\}, \]
\[ D_{ij} = L_i - L_j = \{d_{ij}^1, d_{ij}^2, \ldots, d_{ij}^{30}\} = \{STLmax_i^1 - STLmax_j^1,\ STLmax_i^2 - STLmax_j^2,\ \ldots,\ STLmax_i^{30} - STLmax_j^{30}\}. \]
Then we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_d$, of $D_{ij}$. The T-index between EEG signal epochs $i$ and $j$ is defined as
\[ T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_d / \sqrt{30}} \]

T-Statistics Distance
The T-index, $T_{xy}$, between the time series $x$ and $y$ is then defined as
\[ T_{xy} = \frac{|E[X] - E[Y]|}{\sigma_{xy} / \sqrt{n}} \]
where $E[\cdot]$ denotes the average of the values within an epoch of the time series, $n$ is the length of the time series epoch, and $\sigma_{xy}$ is the sample standard deviation of the difference between the values of $x$ and $y$. Asymptotically, the $T_{xy}$ index follows a t-distribution with $n-1$ degrees of freedom.
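
As a worked illustration of the T-index defined above, the following sketch computes the paired statistic for two equally long epochs using only the Python standard library. The function name `t_index` and the handling of a zero standard deviation are assumptions made for the example.

```python
import math

def t_index(x, y):
    """Paired T-index between two equally long epochs x and y.

    T = |mean(d)| / (sd(d) / sqrt(n)), where d = x - y element-wise and
    sd is the sample standard deviation (n - 1 in the denominator).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("epochs must have the same length (at least 2)")
    n = len(x)
    d = [a - b for a, b in zip(x, y)]
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    if var_d == 0.0:
        # Assumption: identical profiles give 0; a constant nonzero offset gives inf.
        return 0.0 if mean_d == 0.0 else math.inf
    return abs(mean_d) / (math.sqrt(var_d) / math.sqrt(n))

# Differences are 2, 4, 6, 8: mean 5, sample variance 20/3, so T = sqrt(15).
t = t_index([2.0, 4.0, 6.0, 8.0], [0.0, 0.0, 0.0, 0.0])
print(t)
```

Because the absolute value is taken, the statistic is symmetric in the two epochs, which is what lets it serve as a distance between STLmax profiles.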

Nearest Neighbor Classification Rules
• Given an unknown-state epoch of EEG signals A, we calculate statistical distances between that epoch and the groups of Normal, Pre-Seizure, and Post-Seizure EEGs in our database.
• EEG sample A is classified into the patient state (normal, pre-seizure, or post-seizure) that yields the minimum T-index distance.
• Multiple electrodes = multiple decisions
  • Averaging
  • Voting (majority voting: selects the action with the maximum number of votes)

Preliminary Data Set
• 132 5-minute epochs of pre-seizure EEGs
• 132 5-minute epochs of post-seizure EEGs
• 300 5-minute epochs of normal EEGs
• Pre-seizure = 0-30 minutes before seizure
• Post-seizure = 2-10 minutes after seizure
• Normal = 10 hours away from seizure

[Figure: Probability of Correct Classifications]
[Figure: Patient State Classification (Voting - Lmax+Phase) - Sensitivity. Bar chart; y-axis: Percentage of Classified Type; x-axis: States (Pre-ictal, Post-ictal, Inter-ictal); series: Pre-ictal, Post-ictal, Inter-ictal. Bar labels in the source, without a recoverable mapping to bars: 95.65%, 72.73%, 65.00%, 25.00%, 22.73%, 10.00%, 4.55%, 4.35%, 0.00%.]

Metrics for Performance Evaluation

                         PREDICTED CLASS
                         Class=Yes    Class=No
ACTUAL CLASS  Class=Yes  a (TP)       b (FN)
              Class=No   c (FP)       d (TN)

a: TP (true positive); b: FN (false negative); c: FP (false positive); d: TN (true negative)

Sensitivity and Specificity
• Sensitivity measures the fraction of positive cases that are classified as positive: Sensitivity = TP/(TP+FN).
• Specificity measures the fraction of negative cases that are classified as negative: Specificity = TN/(TN+FP).
• Sensitivity can be considered a detection (prediction or classification) rate that one wants to maximize, that is, maximizing the probability of correctly classifying patient states.
• The false positive rate can be considered as 1-Specificity, which one wants to minimize.
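
The classification rules and metrics above can be sketched as follows. This is an illustrative example under assumptions, not the code behind the reported results: the function names, the per-electrode distance dictionaries, and all numbers are hypothetical, and ties in the vote are broken by whichever state was encountered first.

```python
from collections import Counter

STATES = ("normal", "pre-seizure", "post-seizure")

def classify_by_voting(distances):
    """distances: one dict per electrode mapping state -> T-index distance.

    Each electrode votes for its nearest state; the majority wins.
    """
    votes = Counter(min(STATES, key=d.get) for d in distances)
    return votes.most_common(1)[0][0]

def classify_by_averaging(distances):
    """Average the distances over electrodes, then pick the nearest state."""
    avg = {s: sum(d[s] for d in distances) / len(distances) for s in STATES}
    return min(STATES, key=avg.get)

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Made-up distances for three electrodes: two are slightly closer to "normal",
# one is much closer to "pre-seizure", so the two rules disagree.
epoch_a = [
    {"normal": 1.0, "pre-seizure": 2.0, "post-seizure": 9.0},
    {"normal": 1.5, "pre-seizure": 2.0, "post-seizure": 9.0},
    {"normal": 20.0, "pre-seizure": 1.0, "post-seizure": 9.0},
]
print(classify_by_voting(epoch_a), classify_by_averaging(epoch_a))
print(sensitivity(tp=8, fn=2), specificity(tn=9, fp=1))
```

The example is chosen so that voting and averaging give different answers, which shows why the choice of combination rule matters when electrodes disagree.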

Receiver Operating Characteristics (ROC)
The ROC curve is used to indicate an appropriate trade-off between:
• the false positive rate (1-Specificity, plotted on the X-axis), which needs to be minimized, and
• the detection rate (Sensitivity, plotted on the Y-axis), which needs to be maximized.

[Figure: ROC - Performance Characteristics. ROC for Different Classification Methods; x-axis: 1-Specificity (0 to 1); y-axis: Sensitivity (0 to 1). Curves are added over four successive slides: Lmax, Phase, Entropy (each with Voting and Average variants), L+P+E Voting, and L+P Voting. The final slide annotates the Voting curve with Sensitivity = 95.7% and Specificity = 75.4%.]

Results

Any More Sophisticated Method?
[Figure: Support Vector Machines]
[Figure: 2-Class Linearly Separable Case]
[Figure: Mathematical Modeling]

Leave-one-out Cross Validation
• Cross-validation can be seen as a way of applying partial information about the applicability of alternative classification strategies.
• K-fold cross-validation:
  • Divide all the data into k subsets of equal size.
  • Train a classifier using k-1 groups of training data.
  • Test the classifier on the omitted subset.
  • Iterate k times.

[Table: Classification Results]

QP for Clustering
[Figure: Clustering Epileptic Brains]

Hierarchical Clustering
[Figure: dendrogram of points a, b, c, d, e. {a, b, c, d, e} splits into {b, c, e} and {a, d}; {b, c, e} splits into {b, c} and {e}; {a, d} into {a} and {d}; {b, c} into {b} and {c}. Agglomerative clustering reads the tree bottom-up, divisive clustering top-down.]

Clustering via Concave Quadratic Programming (CCQP)
Formulate a clustering problem as a Quadratic Integer Program (QIP), where
• $A$ is an $n \times n$ T-index matrix of pairwise distances,
• $\lambda$ is a parameter adjusting the degree of similarity within a cluster, and
• $x_i$ is a 0-1 decision variable indicating whether or not point $i$ is selected (assigned) to be in the cluster.
Advantages
• When $\lambda$ is large enough, the quadratic function becomes concave.
• The QIP can then be converted to a continuous problem (minimizing a concave quadratic function over a sphere).

CCQP Algorithm
[Figure: Patient 1: Box Plot of Average Solution, Lmax]
[Figure: Patient 1: Box Plots of Average Solution, Lmax and Phase]
[Figure: Patient 2: Box Plots of Average Solution, Lmax and Phase]

Kruskal-Wallis Test
• A nonparametric version of the one-way ANOVA.
• An extension of the Wilcoxon rank sum test to more than two groups.
• Compares samples from two or more groups.
• Compares the medians of the samples in X, and returns the p-value for the null hypothesis that all samples are drawn from the same population (or, equivalently, from different populations with the same distribution).
Assumptions
The Kruskal-Wallis test makes the following assumptions about the data in X:
• All samples come from populations having the same continuous distribution, apart from possibly different locations due to group effects.
• All observations are mutually independent.
The classical one-way ANOVA test replaces the first assumption with the stronger assumption that the populations have normal distributions.
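
For readers who want to see what the Kruskal-Wallis test computes, here is a minimal standard-library sketch of its H statistic. It is illustrative only: the function names are hypothetical, tied values receive average ranks but no tie correction is applied to H, and the p-value (asymptotically from a chi-square distribution with k-1 degrees of freedom for k groups) is left out.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Fully separated groups: rank sums 6, 15, 24, so H = 12/90 * 279 - 30 = 7.2.
h_separated = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Interleaved groups: rank sums 12, 15, 18, so H = 12/90 * 231 - 30 = 0.8.
h_interleaved = kruskal_wallis_h([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
print(h_separated, h_interleaved)
```

A larger H indicates that the groups' rank sums differ more than would be expected if all samples came from the same distribution.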

T-test
• Tests the hypothesis of a difference in the means of two samples.
• Determines whether two samples, x and y, could have the same mean when the standard deviations are unknown but assumed equal.
• Asymptotically, the $T_{xy}$ index follows a t-distribution with $n-1$ degrees of freedom.

[Table: Results - Significance Level]

Concluding Remarks
• Overview of epilepsy research
• Applications of data mining and optimization techniques
• Interplay between theory and application
• Quadratic programming for feature selection
• Quadratic programming for clustering
• Long-term monitoring analysis