Optimization and Data Mining in Epilepsy Research

THE STATE UNIVERSITY OF NEW JERSEY
RUTGERS
W. Art Chaovalitwongse
Assistant Professor
Industrial and Systems Engineering
Rutgers University
Acknowledgements

Comprehensive Epilepsy Center, St. Peter’s University Hospital
• Rajesh C. Sachdeo, MD
• Deepak Tikku, MD

Brain Institute, University of Florida
• Panos M. Pardalos, PhD
• J. Chris Sackellares, MD
• Paul R. Carney, MD

Bioengineering, Arizona State University
• Leonidas D. Iasemidis, PhD
Agenda




Background: Epilepsy
Electroencephalogram (EEG) Time Series
Chaos Theory: Dimensionality Reduction
Seizure Prediction



Feature Selection
Process Monitoring
Concluding Remarks
Facts About Epilepsy




At least 2 million Americans and another 40-50
million people worldwide (about 1% of the
population) suffer from epilepsy.
Epilepsy is the second most common brain
disorder (after stroke)
The hallmark of epilepsy is recurrent seizures.
Epileptic seizures occur when a massive group
of neurons in the cerebral cortex suddenly begin
to discharge in a highly organized rhythmic
pattern.
Epileptic Seizures



Seizures usually occur spontaneously, in the
absence of external triggers.
Seizures cause temporary disturbances of brain
functions such as motor control, responsiveness
and recall which typically last from seconds to a
few minutes.
Seizures may be followed by a post-ictal period
of confusion or impaired sensorium that can
persist for several hours.
Rationale


Based on 1995 estimates, epilepsy imposes an
annual economic burden of $12.5 billion in the
U.S. in associated health care costs and losses
in employment, wages, and productivity.
Cost per patient ranged from $4,272 for persons
with remission after initial diagnosis and
treatment to $138,602 for persons with
intractable and frequent seizures.
How To Fight Epilepsy




Anti-Epileptic Drugs (AEDs)
• Mainstay of epilepsy treatment
• Approximately 25 to 30% of patients remain unresponsive
Epilepsy surgery
• Requires long-term invasive EEG monitoring
• 50% of pre-surgical candidates do not undergo resective surgery
• Multiple epileptogenic zones
• Epileptogenic zone located in functional brain tissue
• Only 60% of surgery cases result in seizure freedom
Electrical Stimulation (Vagus nerve stimulator)
• Parameters (amplitude and duration of stimulation) arbitrarily adjusted
• As effective as one additional AED dose
• Side effects
Seizure Prediction?
Vagus Nerve Stimulator
Open Problems





Is seizure occurrence random?
If not, can seizures be predicted?
If yes, are there seizure precursors
preceding seizures?
If yes, what measurement can be used to
indicate these precursors?
Does normal brain activity differ from
abnormal brain activity?
Electroencephalogram (EEG)




…is a tool for evaluating the physiological state of
the brain.
…offers excellent spatial and temporal resolution to
characterize rapidly changing electrical activity of
brain activation
…captures voltage potentials produced by brain
cells while communicating.
In an EEG, electrodes are implanted deep in the brain
or placed on the scalp over multiple areas of the
brain to detect and record patterns of electrical
activity and check for abnormalities.
From Microscopic to Macroscopic
Level (Electroencephalogram - EEG)
Depth and Subdural electrode
placement for EEG recordings
[Figure: depth and subdural electrode placement; electrode labels ROF, LOF, RST, LST, RTD, LTD]
Scalp EEG Data Acquisition
EEG Data Acquisition
Typical EEG Time Series Data
Goals of Research





Test the hypothesis that seizures are not a
random process.
Employ data mining techniques to
differentiate normal and abnormal EEGs
Employ quantitative analysis to identify
seizure pre-cursors
Demonstrate that seizures could be predicted
Develop a closed-loop seizure control device
(Brain Pacemaker)
10-second EEGs: Seizure Evolution
Normal
Pre-Seizure
Seizure
Post-Seizure
Dimensionality Reduction
The brain is a non-stationary system.
• EEG time series is non-stationary.
• With 200 Hz sampling, 1 hour of EEG comprises
200 × 60 × 60 × 30 = 21,600,000 data points = 43.2 MB
(assuming 16 bits, i.e. 2 bytes, per data point)
• 1 day = 1 hour × 24
• 1 week = 1 hour × 168
• 20 patients = 1 hour × 3360

Kilobytes → Megabytes
→ Gigabytes → Terabytes
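The storage arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: it assumes the factor 30 is the number of recording channels and that each data point takes 2 bytes (16 bits); neither the constant names nor the function name come from the original slides.

```python
SAMPLING_HZ = 200       # samples per second, from the slide
CHANNELS = 30           # assumption: the factor 30 is the channel count
BYTES_PER_SAMPLE = 2    # assumption: 16 bits per data point


def eeg_bytes(hours):
    """Raw storage, in bytes, for `hours` hours of multichannel EEG."""
    samples = SAMPLING_HZ * 3600 * hours * CHANNELS
    return samples * BYTES_PER_SAMPLE


# Recording lengths listed on the slide, expressed in hours.
for label, hours in [("1 hour", 1), ("1 day", 24),
                     ("1 week", 168), ("20 patients, 1 week each", 3360)]:
    print(f"{label}: {eeg_bytes(hours) / 1e6:,.1f} MB")
```

Under these assumptions one hour is 43.2 MB, and the 20-patient, one-week data set is about 145 GB, which is why the raw signal is reduced to a few dynamical measures per epoch.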
Dimensionality Reduction
Using Chaos Theory






Chaos in Brain?
Chaos in Stock Market?
Chaos in Foreign Exchanges (Swedish Currency)?
Measure the brain dynamics from EEG time series.
Apply dynamical measures (based on chaos theory) to
non-overlapping EEG epochs of 10.24 seconds = 2048
points.
Maximum Short-Term Lyapunov Exponent
• Measures the average uncertainty along the local
eigenvectors and phase differences of an attractor in
the phase space
• Measures the chaoticity of the brain waves





1. Embed the data set (EEG): $X_i = (x(t_i), x(t_i+\tau), \ldots, x(t_i+(p-1)\tau))^T$, where $\tau$ is the selected time lag between the components of each vector in the phase space, $p$ is the selected dimension of the embedding phase space, and $t_i \in [1, T-(p-1)\tau]$.
2. Pick a point $x(t_0)$ somewhere in the middle of the trajectory. Find that point's nearest neighbor. Call that point $z_0(t_0)$.
3. Compute $|z_0(t_0) - x(t_0)| = L_0$.
4. Follow the "difference trajectory" (the dashed line) forwards in time, computing $|z_0(t_i) - x(t_i)| = L_0(i)$ and incrementing $i$, until $L_0(i) > \varepsilon$. Call that value $L_0'$ and that time $t_1$.
5. Find $z_1(t_1)$, the nearest neighbor of $x(t_1)$, and go to step 3. Repeat the procedure to the end of the fiducial trajectory $t = t_n$, keeping track of the $L_i$ and $L_i'$.

The exponent is then estimated as
$$\lambda = \frac{1}{N\Delta t} \sum_{i=0}^{M-1} \log_2 \frac{L_i'}{L_i},$$
where $M$ is the number of times we went through the loop above, and $N$ is the number of time steps in the fiducial trajectory, so that $N\Delta t = t_n - t_0$.
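The first steps of the procedure above (embedding, nearest-neighbor search, initial separation L0) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names, the `min_separation` parameter and the sine-wave test signal are all assumptions introduced here.

```python
import math


def delay_embed(x, p, tau):
    """Build vectors X_i = (x[i], x[i+tau], ..., x[i+(p-1)*tau])."""
    n_vectors = len(x) - (p - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this p and tau")
    return [tuple(x[i + k * tau] for k in range(p)) for i in range(n_vectors)]


def nearest_neighbor(vectors, idx, min_separation=1):
    """Return (index, distance) of the vector closest to vectors[idx].

    min_separation (hypothetical parameter) skips points that are close
    only because they are adjacent in time.
    """
    best_j, best_dist = None, math.inf
    for j, v in enumerate(vectors):
        if abs(j - idx) < min_separation:
            continue
        dist = math.dist(vectors[idx], v)  # Euclidean distance
        if dist < best_dist:
            best_j, best_dist = j, dist
    return best_j, best_dist


# Toy signal standing in for one EEG channel (not real data).
signal = [math.sin(0.3 * k) for k in range(200)]
vectors = delay_embed(signal, p=3, tau=4)
j, L0 = nearest_neighbor(vectors, idx=len(vectors) // 2, min_separation=10)
print(len(vectors), j, L0)
```

A full estimate would continue by tracking the separation forward in time and replacing the neighbor whenever it exceeds the threshold ε, as described in the remaining steps.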
2-D Example: a circle of initial conditions evolves into an ellipse.
• $d_1 = d_0 e^{\lambda_1 t}$ is the major axis.
• $d_2 = d_0 e^{\lambda_2 t}$ is the minor axis.
• The $i$-th Lyapunov exponent after $n$ steps can be defined as:
$$\lambda_i = \frac{1}{n} \log \frac{d_i}{d_0}$$
STLmax Profiles
Pre-Ictal
Ictal
Post-Ictal
Hidden Synchronization
Patterns
How similar are they?
Statistics to quantify the convergence of STLmax
By paired-T statistic:
Per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 60 points, 10 minutes) are
$$L_i = \{\mathrm{STLmax}_i^1, \mathrm{STLmax}_i^2, \ldots, \mathrm{STLmax}_i^{60}\}$$
$$L_j = \{\mathrm{STLmax}_j^1, \mathrm{STLmax}_j^2, \ldots, \mathrm{STLmax}_j^{60}\}$$
$$D_{ij} = L_i - L_j = \{d_{ij}^1, d_{ij}^2, \ldots, d_{ij}^{60}\} = \{\mathrm{STLmax}_i^1 - \mathrm{STLmax}_j^1, \ldots, \mathrm{STLmax}_i^{60} - \mathrm{STLmax}_j^{60}\}$$
Then, we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_d$, of $D_{ij} = \{d_{ij}^1, d_{ij}^2, \ldots, d_{ij}^{60}\}$.
The T-index between EEG signal epochs i and j is defined as
$$T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_d / \sqrt{60}}$$
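The T-index defined above is a paired t-statistic and takes only a few lines to compute. The sketch below uses the Python standard library; the function name `t_index` and the toy numbers are illustrative assumptions, and the absolute value reflects the fact that the index is compared against positive thresholds.

```python
import math
from statistics import mean, stdev


def t_index(stl_i, stl_j):
    """Paired T-index between two equally long STLmax windows."""
    if len(stl_i) != len(stl_j):
        raise ValueError("windows must have the same length")
    d = [a - b for a, b in zip(stl_i, stl_j)]  # pairwise differences D_ij
    sd = stdev(d)                              # sample standard deviation
    if sd == 0:
        # Degenerate case: constant difference between the two windows.
        return 0.0 if mean(d) == 0 else math.inf
    return abs(mean(d)) / (sd / math.sqrt(len(d)))


# Toy windows of 4 points (the slides use 60 points, about 10 minutes).
print(t_index([1, 2, 3, 4], [0, 0, 0, 0]))
```

A small T-index means the two STLmax profiles are statistically close (converged); a large value means they have diverged.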
Statistically Quantifying the
Convergence
IID (Independent and Identically
Distributed) Test
Assumption 1: Within a window of 30 STLmax
points, the differences of STLmax values (Dij)
between two electrode sites i and j are
independent.
To verify this assumption, we employed the “portmanteau” test
of white noise developed by Ljung and Box.
Assumption 2: Within a window of 60 points, the
differences of STLmax values between two
electrode sites i and j are normally distributed.
To verify this assumption, we employed the Shapiro-Wilk W test,
which is a well-established and powerful test of
departure from normality.
Convergence of STLmax
Models
Homoclinic Chaos (Silnikov’s Theorem):
Rössler systems, Lorenz systems, population dynamical
systems
$$\frac{dx_i(t)}{dt} = -\omega_i y_i - z_i + \sum_{j=1,\, j \ne i}^{N} (\varepsilon_{i,j} x_j - \varepsilon'_{i,j} x_i) \quad (1)$$
$$\frac{dy_i(t)}{dt} = \omega_i x_i + \alpha_i y_i \quad (2)$$
$$\frac{dz_i(t)}{dt} = \beta_i x_i + z_i (x_i - \gamma_i) \quad (3)$$
ω, α, β and γ are intrinsic parameters.
ε and ε′ are directional coupling strengths.
N = number of oscillators
STLmax versus time and coupling

Why Feature Selection?


Not every electrode site shows the convergence.
Feature Selection: Select the electrodes that are most
likely to show the convergence preceding the next seizure.
Optimization Problem


Optimization:
• We apply optimization techniques to find a group of
electrode sites such that:
• They are the most converged (in STLmax) electrode
sites during the 10-min window before the seizure.
• They show dynamical resetting (divergence in
STLmax) during the 10-min window after the seizure.
• Such electrode sites are defined as “critical electrode
sites”.
Hypothesis:
• The critical electrode sites should be the most likely to
show convergence in STLmax again before the
next seizure.
Multi-Quadratic Integer
Programming

To select critical electrode sites, we formulated this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with:
• an objective function to minimize the average T-index among electrode sites
• a linear constraint to identify the number of critical electrode sites
• a quadratic constraint to ensure that the selected electrode sites show the dynamical resetting

Problem $P_1$:
$$\min f(x) = x^T Q x$$
$$\text{s.t.} \quad \sum_{i=1}^{n} x_i = b$$
$$x^T D x \ge \alpha$$
$$x_i \in \{0,1\}, \quad i = 1, \ldots, n$$
Notation and Modeling





x is an n-dimensional column vector (decision variables), where
each xi represents electrode site i.
• xi = 1 if electrode i is selected to be one of the critical electrode
sites.
• xi = 0 otherwise.
Q is an (n×n) matrix, each element qij of which represents the T-index between electrodes i and j during the 10-minute window before a
seizure.
b is an integer constant (the number of critical electrode sites).
D is an (n×n) matrix, each element dij of which represents the T-index between electrodes i and j during the 10-minute window after a
seizure.
α = 2.662*b*(b-1) is a constant. 2.662 is the critical value of the
T-index, as previously defined, to reject H0: “two brain sites
acquire identical STLmax values within a 10-minute window”.
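For a handful of electrodes, the selection problem can be solved by plain enumeration, which makes the roles of Q, D, b and α concrete. The sketch below is not the solution method used in the talk (which relies on a mixed-integer reformulation); the function names and the 4-electrode matrices are made-up illustrations.

```python
from itertools import combinations


def quad_form(M, idx):
    """x^T M x for the 0-1 vector x that selects exactly the indices in idx."""
    return sum(M[i][j] for i in idx for j in idx)


def solve_mqip_bruteforce(Q, D, b, alpha):
    """Enumerate all subsets of size b; return (best value, selected sites)."""
    best = None
    for idx in combinations(range(len(Q)), b):
        if quad_form(D, idx) < alpha:      # resetting constraint x^T D x >= alpha
            continue
        value = quad_form(Q, idx)          # pre-seizure T-index to be minimized
        if best is None or value < best[0]:
            best = (value, idx)
    return best


# Hypothetical T-index matrices for 4 electrode sites (symmetric, zero diagonal).
Q = [[0, 1, 4, 6], [1, 0, 5, 2], [4, 5, 0, 3], [6, 2, 3, 0]]   # before seizure
D = [[0, 1, 3, 1], [1, 0, 1, 3], [3, 1, 0, 3], [1, 3, 3, 0]]   # after seizure
b = 2
alpha = 2.662 * b * (b - 1)
print(solve_mqip_bruteforce(Q, D, b, alpha))
```

In this toy instance the pair (0, 1) has the smallest pre-seizure T-index but fails the resetting constraint, so the pair (1, 3) is selected. Enumeration grows combinatorially with the number of electrodes, which motivates the reformulations on the following slides.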
Conventional Linearization Approach for
Multi-Quadratic 0-1 Problem
• For each product $x_i x_j$, we introduce a new 0-1 variable $x_{ij} = x_i x_j$ ($i \ne j$).
Note that $x_{ii} = x_i^2 = x_i$ for $x_i \in \{0,1\}$.
• The equivalent linear 0-1 problem is given by:
$$\min \sum_i \sum_j q_{ij} x_{ij}$$
$$\text{s.t.} \quad Ax = b$$
$$x_{ij} \le x_i, \quad \text{for } i, j = 1, \ldots, n \; (i \ne j)$$
$$x_{ij} \le x_j, \quad \text{for } i, j = 1, \ldots, n \; (i \ne j)$$
$$x_i + x_j - 1 \le x_{ij}, \quad \text{for } i, j = 1, \ldots, n \; (i \ne j)$$
$$\sum_i \sum_j d_{ij} x_{ij} \ge \alpha$$
$$x_i \in \{0,1\}, \quad 0 \le x_{ij} \le 1, \quad i, j = 1, \ldots, n$$
• Note that the number of continuous variables has been increased to $O(n^2)$.
• Note that this problem formulation becomes computationally inefficient as n increases.
KKT Conditions Approach

Consider the quadratic 0-1 programming problem
$$\min f(x) = x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x_i \in \{0,1\}, \; i = 1, \ldots, n$$
• Q is an (n×n) matrix.
• b is an integer constant.
• x is an n-dimensional column vector.
• $e^T = (1, 1, \ldots, 1)$

Relaxing the binary requirement to $x \ge 0$ gives the problem
$$\min f(x) = x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x_i \ge 0, \; i = 1, \ldots, n$$
with $c = 0$, $A = e^T$, $v = 0$. We then have the following KKT conditions:
$$Qx - u \cdot e - y = 0$$
$$Ax = b$$
$$y^T x = 0$$
$$x \ge 0, \quad u \ge 0, \quad y \ge 0$$
KKT Conditions Approach

Add slack variables $a$ and define $s = u \cdot e + a$.
Minimizing the slack variables, we can formulate this problem as:
$$\min e^T s$$
$$\text{s.t.} \quad Qx - y - s = 0$$
$$Ax = b$$
$$y^T x = 0$$
$$x \ge 0, \quad s \ge 0, \quad y \ge 0$$
Fixing $x \in \{0,1\}^n$, the condition $y^T x = 0$ is equivalent to $y \le M(1 - x)$, which gives:
$$\min e^T s$$
$$\text{s.t.} \quad Qx - y - s = 0$$
$$Ax = b$$
$$y \le M(1 - x)$$
where $s \ge 0$, $y \ge 0$, $x \in \{0,1\}^n$, and $M = \max_i \sum_j |q_{ij}| = \|Q\|_\infty$.

Note that this problem formulation remains efficient as n increases,
because it has the SAME number of 0-1 variables (n) and only 2n additional
continuous variables.
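The link between the two objectives can be checked numerically on a small instance. For a fixed 0-1 vector x and a non-negative Q, the constraints decouple by component: the objective is smallest when each y_i is as large as its bounds allow, so s_i ends up equal to (Qx)_i when x_i = 1 and to 0 when x_i = 0. The sketch below verifies that the resulting value of e^T s equals x^T Q x for every x. The function names and the 3×3 matrix are illustrative assumptions, and the constraint Ax = b is left out because it restricts x identically in both problems.

```python
from itertools import product


def quadratic_objective(Q, x):
    """x^T Q x."""
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def milp_objective_for_fixed_x(Q, x):
    """Smallest e^T s with Qx - y - s = 0, 0 <= y <= M(1 - x), s >= 0."""
    n = len(Q)
    M = max(sum(abs(q) for q in row) for row in Q)        # infinity norm of Q
    Qx = [sum(Q[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Pushing y_i to its upper limit makes s_i = (Qx)_i - y_i as small as possible.
    y = [min(Qx[i], M * (1 - x[i])) for i in range(n)]
    s = [Qx[i] - y[i] for i in range(n)]
    return sum(s)


Q = [[2, 1, 4], [1, 0, 5], [4, 5, 3]]   # hypothetical non-negative matrix
all_equal = all(
    milp_objective_for_fixed_x(Q, x) == quadratic_objective(Q, x)
    for x in product((0, 1), repeat=len(Q))
)
print(all_equal)
```

The check relies on q_ij ≥ 0, the same assumption under which the equivalence is stated.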
Connections Between QIP problems and
MILP problems

For any matrix Q where $q_{ij} \ge 0$, we want to prove that $P$ and $\bar{P}$ are equivalent.

Problem $P$:
$$\min f(x) = x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x_i \in \{0,1\}, \; i = 1, \ldots, n$$

Problem $\bar{P}$:
$$\min e^T s$$
$$\text{s.t.} \quad Qx - y - s = 0 \quad (1)$$
$$Ax = b \quad (2)$$
$$y \le M(1 - x) \;\Leftrightarrow\; y^T x = 0 \quad (3)$$
$$s \ge 0, \quad y \ge 0, \quad x \in \{0,1\}^n \quad (4)$$
where $M = \max_i \sum_j |q_{ij}|$.

Theorem 1: $P$ has an optimal solution $x^0$ if and only if there exist $y^0$, $s^0$ such that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.

PROOF: Necessity. If $x^0$ is an optimal solution to $P$, it is obvious that there exist $y$, $s$ with $y \ge 0$, $s \ge 0$ such that $Qx^0 - y - s = 0$ (1) and $y^T x^0 = 0$ (3).
Choose $y^0$ and $s^0$ from the above defined set of $y$ and $s$ such that $e^T s^0$ is minimized.
Let us show that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.
Multiplying (1) by $(x^0)^T$, we obtain $(x^0)^T Q x^0 - (x^0)^T y^0 - (x^0)^T s^0 = 0$.
Note that from (3), $(x^0)^T y^0 = (y^0)^T x^0 = 0$. We then have $(x^0)^T Q x^0 = (x^0)^T s^0$.
We know that $x^0 = \arg\min x^T Q x$, s.t. $Ax = b$, $x \in \{0,1\}^n$. If we can prove that
$$e^T s^0 = (x^0)^T s^0, \quad \text{i.e.} \quad s_1 + s_2 + \cdots + s_n = x_1 s_1 + x_2 s_2 + \cdots + x_n s_n \quad (5)$$
then $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.
To prove (5), it is sufficient to show that, for any $i$, if $x_i^0 = 0$, then $s_i^0 = 0$. We can prove this statement by contradiction.
Proof: Assume that, given $(x^0, y^0, s^0)$ that is an optimal solution to $\bar{P}$ (so $e^T s^0$ is minimized), $x_i^0 = 0$ and $s_i^0 > 0$ for some $i$.
For such an $i$, define $\tilde{y}_i = y_i^0 + s_i^0$ and $\tilde{s}_i = 0$, leaving all other components unchanged. It is clear that $(x^0, \tilde{y}, \tilde{s})$ satisfies all constraints (1)-(4) in $\bar{P}$. Thus, $(x^0, \tilde{y}, \tilde{s})$ is feasible and $e^T \tilde{s} < e^T s^0$, so $e^T s^0$ is not minimal.
This fact contradicts our initial assumption that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.
Sufficiency. The proof is similar.
Theoretical Results:
MILP formulation for MQIP problem

Consider the MQIP problem.
We proved that the MQIP program is EQUIVALENT to a MILP problem
with the SAME number of integer variables.

Problem $P_1$:
$$\min f(x) = x^T Q x$$
$$\text{s.t.} \quad Ax = b$$
$$x^T D x \ge \alpha$$
$$x_i \in \{0,1\}, \quad i = 1, \ldots, n$$

Problem $\bar{P}_1$:
$$\min e^T s$$
$$\text{s.t.} \quad Qx - y - s = 0 \quad (1)$$
$$Ax = b \quad (2)$$
$$y \le M(1 - x) \quad (3)$$
$$Dx - z \ge 0 \quad (4)$$
$$e^T z \ge \alpha \quad (5)$$
$$z \le M' x \quad (6)$$
$$s, y, z \ge 0, \quad x \in \{0,1\}^n \quad (7)$$
where $M = \max_i \sum_j |q_{ij}| = \|Q\|_\infty$ and $M' = \max_i \sum_j |d_{ij}| = \|D\|_\infty$.

Theorem 2: $P_1$ has an optimal solution $x^0$ if and only if there exist $y^0$, $s^0$, $z^0$ such that $(x^0, y^0, s^0, z^0)$ is an optimal solution to $\bar{P}_1$.

PROOF: Necessity. From the proof of Theorem 1, to prove Theorem 2 we only need to show
that if $x^0$ is an optimal solution to problem $P_1$, then there exists a vector $z^0$ (s.t. $z_i \ge 0$) for which the
following constraints are satisfied:
$$Dx^0 - z^0 \ge 0 \quad (1)$$
$$e^T z^0 \ge \alpha \quad (2)$$
$$z^0 \le M' x^0 \quad (3)$$
From (3), note that if $x_i^0 = 0$ then we have $z_i^0 = 0$ (the proof is similar to the one in Theorem 1).
Then we obtain
$$e^T z^0 = (x^0)^T z^0 \quad (4)$$
Since $z_i^0$ is a real number and every element of the matrix D is nonnegative, for all $i$ where
we have $x_i^0 = 1$, we can choose $z_i^0 \ge 0$ such that $(Dx^0)_i = z_i^0$. We then satisfy (1) and (3).
Multiplying (1) by $(x^0)^T$, from (4) we obtain $(x^0)^T D x^0 = (x^0)^T z^0 = e^T z^0$.
Since $x^0$ is an optimal solution to $P_1$, (2) is satisfied: $(x^0)^T D x^0 = e^T z^0 \ge \alpha$.
Sufficiency. The proof is similar.
Reference:
• P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney,
O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal
Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004.
Empirical Results:
Performance on Larger Problems
Reference:
• W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. Reduction of Multi-Quadratic 0-1
Programming Problems to Linear Mixed 0-1 Programming Problems. Operations Research
Letters, 32(6): 517-522, 2004.
Empirical Results:
Performance on Larger Problems
Hypothesis Testing Simulation


Hypothesis:
• The critical electrode sites should be the most likely to show
convergence in STLmax (a drop in T-index below the
critical value) again before the next seizure.
• The critical electrode sites are electrode sites that
• are the most converged (in STLmax) electrode sites
during the 10-min window before the seizure
• show dynamical resetting (divergence in STLmax)
during the 10-min window after the seizure
Simulation:
• Based on 3 patients with 20 seizures, we compare the
probability of showing convergence in STLmax (a drop in
T-index below the critical value) before the next seizure
between two groups of electrode sites:
• Critical electrode sites
• Randomly selected electrode sites (5,000 times)
Optimal VS Non-Optimal
Simulation - Results
How to automate the system
Automated Seizure Warning System
[Flowchart: Automated Seizure Warning Algorithm (ASWA)]
1. EEG signals: continuously calculate STLmax from multichannel EEG.
2. Select critical electrode sites after every subsequent seizure.
3. Monitor the average T-index of the critical electrodes.
4. Give a warning when the T-index value is greater than 5, then drops to a value of 2.662 or less.
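The warning rule in this flowchart behaves like a two-threshold trigger on the monitored T-index: it arms once the index has risen above 5 and fires when the index later falls to 2.662 or less. A minimal sketch, assuming the index is available as a list with one value per time step (the function name and the toy series are illustrative, not taken from the slides):

```python
def seizure_warnings(t_index_series, upper=5.0, lower=2.662):
    """Return the time steps at which a warning would be issued."""
    warnings = []
    armed = False                      # becomes True once the index exceeds `upper`
    for k, t in enumerate(t_index_series):
        if t > upper:
            armed = True
        elif armed and t <= lower:
            warnings.append(k)         # convergence after a disentrained state
            armed = False              # wait for the next rise above `upper`
    return warnings


# Toy average T-index profile of the critical electrode sites.
profile = [3.0, 6.0, 4.0, 2.5, 2.0, 5.5, 2.662]
print(seizure_warnings(profile))
```

Requiring the index to exceed the upper threshold first prevents repeated warnings while the index merely stays low.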
Data Characteristics
Performance Evaluation for
ASWS



To test this algorithm, a warning was
considered to be true if a seizure occurred
within 3 hours after the warning.
Sensitivity = (# of accurately predicted seizures) / (# of analyzed seizures)
False Prediction Rate = average number of
false warnings per hour
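These two measures can be computed directly from lists of warning times and seizure times. The sketch below assumes times are given in hours from the start of the recording and uses the 3-hour horizon stated above; the function name and the example numbers are illustrative assumptions.

```python
def evaluate_warnings(warning_times, seizure_times, total_hours, horizon=3.0):
    """Return (sensitivity, false predictions per hour)."""
    if not seizure_times:
        raise ValueError("at least one seizure is needed to compute sensitivity")
    # A seizure counts as predicted if some warning precedes it within `horizon`.
    predicted = sum(
        1 for s in seizure_times
        if any(0 < s - w <= horizon for w in warning_times)
    )
    # A warning is false if no seizure follows it within `horizon`.
    false_warnings = sum(
        1 for w in warning_times
        if not any(0 < s - w <= horizon for s in seizure_times)
    )
    return predicted / len(seizure_times), false_warnings / total_hours


sens, fpr = evaluate_warnings([1.0, 10.0, 20.0], [2.5, 30.0], total_hours=40.0)
print(sens, fpr)
```

In this toy run one of two seizures is preceded by a warning within 3 hours, and two of the three warnings are false over 40 hours of recording.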
Training Results
Performance characteristics of automated seizure warning
algorithm with the best parameter-settings of training data set.
RECEIVER OPERATING
CHARACTERISTICS (ROC)



ROC curve (receiver operating characteristic) is
used to indicate an appropriate trade-off that one
can achieve between:
• the false positive rate (1-Specificity, plotted on the X-axis), which needs to be minimized
• the detection rate (Sensitivity, plotted on the Y-axis), which
needs to be maximized.
ROC curve analysis for the best
parameter settings of 10 patients
Test Results
Performance characteristics of automated seizure warning
algorithm with the best parameter settings on testing data set.
Validation of the ASWS
algorithm

Temporal Properties



Surrogate Seizure Time Data Set
100 Surrogate Data Sets
Spatial Properties


Non-Optimized ASWS – Selecting non-optimal
electrode sites
100 Randomly Selected Electrodes
Prediction Scores: ASWS
Prediction Scores: Surrogate Data
and Non-Optimized ASWS
W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C.
Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application
to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
Prediction Scores: Surrogate
Data and Non-Optimal ASWS
Concluding Remarks






Overview of Epilepsy Research
Applications of Data Mining and Optimization Techniques
Interplay between theory and application
The first online real-time seizure prediction system
Seizure Prediction
• Predicting ~70% of temporal lobe seizures on average
• Giving a false alarm rate of ~0.16 per hour on average
Ongoing and Future Research
• Classification of EEGs from normal and epileptic patients
• Classification of abnormal brain activity
• Cluster analysis of epileptic brains
• Analysis on scalp EEGs
Reference







W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau,
and J.C. Sackellares. A Robust Method for Studying the Dynamics of the
Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. EEG Classification in
Epilepsy. To appear in Annals of Operations Research.
W. Chaovalitwongse and P.M. Pardalos. Optimization Approaches to
Characterize the Hidden Dynamics of the Epileptic Brain: Seizure Prediction and
Localization. To appear in SIAG/OPT Views-and-News.
W. Chaovalitwongse, P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C.
Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for
Seizure Prediction. Optimization Methods and Software, 20(2-3): 383-394, 2005.
L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan,
A. Prasad, K. Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective
On-Line Real-Time Seizure Prediction. Journal of Clinical Neurophysiology, 116
(3): 532-544, 2005.
P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S.
Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning
Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical
Programming, 101(2): 365-385, 2004. (INFORMS Pierskalla Best Paper Award
2004)
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization
Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research
Letters, 32(6): 517-522, 2004. (Rank 5th in Top 25 Articles in Operations
Research Letters)
Questions?
Thank you
Classification of Brain
Activity
Phase Profiles
Entropy H of Attractor
Classification of Physiological
States
Nearest Neighbor Time Series
Classification
Normal
Pre-Seizure
A
Post-Seizure
Similarity Measure for EEG
Time Series – T-test
By paired-T statistic:
Per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 30 points, 5 minutes) are
$$L_i = \{\mathrm{STLmax}_i^1, \mathrm{STLmax}_i^2, \ldots, \mathrm{STLmax}_i^{30}\}$$
$$L_j = \{\mathrm{STLmax}_j^1, \mathrm{STLmax}_j^2, \ldots, \mathrm{STLmax}_j^{30}\}$$
$$D_{ij} = L_i - L_j = \{d_{ij}^1, d_{ij}^2, \ldots, d_{ij}^{30}\} = \{\mathrm{STLmax}_i^1 - \mathrm{STLmax}_j^1, \ldots, \mathrm{STLmax}_i^{30} - \mathrm{STLmax}_j^{30}\}$$
Then, we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_d$, of $D_{ij} = \{d_{ij}^1, d_{ij}^2, \ldots, d_{ij}^{30}\}$.
The T-index between EEG signal epochs i and j is defined as
$$T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_d / \sqrt{30}}$$
T-Statistics Distance

The T-index, $T_{xy}$, between the time series x and y is then defined as:
$$T_{xy} = \frac{|E[X] - E[Y]|}{\sigma_{xy} / \sqrt{n}}$$
where E[ ] denotes the average of the value within an epoch of the time series, n is the length of the time series epoch, and $\sigma_{xy}$ is the sample standard deviation of the difference in value of x and y.
Asymptotically, the $T_{xy}$ index follows a t-distribution with n-1 degrees of freedom.
Nearest Neighbor
Classification Rules



Given an unknown-state epoch of EEG signals A,
we calculate statistical distances between the EEG
epoch and the groups of Normal, Pre-Seizure, and
Post-Seizure EEGs in our database.
EEG sample A will be classified in the group of
patient’s states (normal, pre-seizure, and post-seizure) that yields the minimum T-index distance.
Multiple Electrodes = Multiple Decisions


Averaging
Voting (Majority voting: selects action with maximum
number of votes)
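The classification rule and the voting step can be sketched as follows. This is a toy illustration under stated assumptions: the distance to a group is taken as the mean T-index over that group's reference epochs, each electrode is classified separately, and the reference data are invented. None of the names come from the original slides.

```python
import math
from collections import Counter
from statistics import mean, stdev


def t_index(x, y):
    """Paired T-index between two equally long epochs."""
    d = [a - b for a, b in zip(x, y)]
    sd = stdev(d)
    if sd == 0:
        return 0.0 if mean(d) == 0 else math.inf
    return abs(mean(d)) / (sd / math.sqrt(len(d)))


def classify_channel(epoch, references):
    """Return the state whose reference epochs are closest on average."""
    avg = {}
    for state, refs in references.items():
        distances = [t_index(epoch, r) for r in refs]
        avg[state] = sum(distances) / len(distances)
    return min(avg, key=avg.get)


def majority_vote(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical reference epochs (one per state) and three electrode epochs.
references = {
    "normal": [[1.0, 2.0, 1.0, 2.0]],
    "pre-seizure": [[5.0, 6.0, 5.0, 6.0]],
    "post-seizure": [[9.0, 8.0, 9.0, 8.0]],
}
channel_epochs = [[1.1, 2.3, 0.8, 2.1], [5.2, 5.9, 5.1, 6.3], [0.9, 2.2, 1.2, 1.8]]
votes = [classify_channel(e, references) for e in channel_epochs]
print(votes, majority_vote(votes))
```

Averaging would instead combine the per-electrode distances before taking the minimum; voting, as here, lets each electrode decide first and then counts the decisions.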
Preliminary Data Set



132 5-minute epochs of pre-seizure EEGs
132 5-minute epochs of post-seizure EEGs
300 5-minute epochs of normal EEGs



Pre-seizure = 0-30 minutes before seizure
Post-seizure = 2-10 minutes after seizure
Normal = 10 hours away from seizure
Probability of Correct
Classifications
Probability of Correct
Classifications
[Figure: bar chart, “Patient State Classification (Voting - Lmax+Phase) - Sensitivity”. X-axis: States (pre-ictal, post-ictal, inter-ictal); Y-axis: Percentage of Classified Type (0% to 100%); legend: pre-ictal, post-ictal, inter-ictal. Data labels: 95.65%, 72.73%, 65.00%, 25.00%, 22.73%, 10.00%, 4.55%, 4.35%, 0.00%.]
Metrics for Performance
Evaluation
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes    a           b
CLASS    Class=No     c           d

a: TP (true positive); b: FN (false negative);
c: FP (false positive); d: TN (true negative)
Sensitivity and Specificity


Sensitivity measures the fraction of positive cases that are
classified as positive.
Specificity measures the fraction of negative cases classified as
negative.
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)



Sensitivity can be considered as a detection (prediction or
classification) rate that one wants to maximize.
Maximize the probability of correctly classifying patient states.
False positive rate can be considered as 1-Specificity which one
wants to minimize.
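As a worked illustration of these definitions, the rates can be computed from the four confusion-matrix counts. The counts below are invented for the example and are not results from the study.

```python
def classification_rates(tp, fn, fp, tn):
    """Sensitivity, specificity and false positive rate from confusion counts."""
    sensitivity = tp / (tp + fn)     # fraction of positive cases classified positive
    specificity = tn / (tn + fp)     # fraction of negative cases classified negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
    }


# Hypothetical counts: 23 positive epochs and 20 negative epochs.
rates = classification_rates(tp=22, fn=1, fp=5, tn=15)
print(rates)
```

Each point on a ROC curve corresponds to one such (false positive rate, sensitivity) pair obtained for a particular decision threshold.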
RECEIVER OPERATING
CHARACTERISTICS (ROC)



ROC curve (receiver operating characteristic)
is used to indicate an appropriate trade-off
that one can achieve between:
• the false positive rate (1-Specificity, plotted
on the X-axis), which needs to be minimized
• the detection rate (Sensitivity, plotted on the Y-axis), which needs to be maximized.
ROC – Performance
Characteristics
[Figure: ROC for Different Classification Methods. X-axis: 1-Specificity (0 to 1); Y-axis: Sensitivity (0 to 1); legend labels: Lmax, Phase, Entropy, Voting.]
ROC – Performance
Characteristics
[Figure: ROC for Different Classification Methods. X-axis: 1-Specificity (0 to 1); Y-axis: Sensitivity (0 to 1); legend labels: Lmax, Phase, Entropy, Voting, Average.]
ROC – Performance
Characteristics
[Figure: ROC for Different Classification Methods. X-axis: 1-Specificity (0 to 1); Y-axis: Sensitivity (0 to 1); legend labels: Lmax, Phase, Entropy, L+P+E, Voting, Average.]
ROC – Performance
Characteristics
[Figure: ROC for Different Classification Methods. X-axis: 1-Specificity (0 to 1); Y-axis: Sensitivity (0 to 1); legend labels: Lmax, Phase, Entropy, L+P+E, L+P, Voting, Average. Annotated point (Voting): Sensitivity = 95.7%, Specificity = 75.4%.]
Results
Any More Sophisticated
Method?
Support Vector Machines
2-Class Linearly Separable Case
Mathematical Modeling
Leave-one-out Cross Validation


Cross-validation can be seen as a way of
applying partial information about the
applicability of alternative classification
strategies.
K-fold cross validation:




Divide all the data into k subsets of equal size.
Train a classifier using k-1 groups of training data.
Test a classifier on the omitted subset.
Iterate k times.
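The four steps above can be sketched in plain Python. The fold splitter below uses contiguous, unshuffled folds, and the 1-nearest-neighbor classifier on single numbers is only a stand-in for whatever classifier is being evaluated; all names and data are illustrative assumptions. Setting k equal to the number of samples gives leave-one-out cross validation.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds of n samples."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validate(samples, labels, k, train_fn, predict_fn):
    """Return the fraction of samples classified correctly across all folds."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
    return correct / len(samples)


# Stand-in classifier: 1-nearest neighbor on scalar features.
def train_1nn(xs, ys):
    return list(zip(xs, ys))


def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


samples = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]                 # toy feature values
labels = ["normal"] * 3 + ["pre-seizure"] * 3
accuracy = cross_validate(samples, labels, k=len(samples),
                          train_fn=train_1nn, predict_fn=predict_1nn)
print(accuracy)
```

With real data the samples are usually shuffled (or stratified by class) before being split into folds.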
Classification Results
QP for Clustering
Clustering Epileptic Brains
Hierarchical Clustering


Agglomerative
Divisive
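[Figure: dendrogram. {a, b, c, d, e} splits into {b, c, e} and {a, d}; {b, c, e} splits into {b, c} and {e}; {a, d} splits into {a} and {d}; {b, c} splits into {b} and {c}.]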
Clustering via Concave Quadratic
Programming (CCQP)

Formulate a clustering problem as a Quadratic
Integer Program (QIP)

where
• A is an n×n T-index matrix of pairwise distances
• λ is a parameter adjusting the degree of similarity
within a cluster
• xi is a 0-1 decision variable indicating whether or not
point i is selected (assigned) to be in the cluster

Advantages

• When λ is large enough, the quadratic function
becomes concave.
• The QIP can then be converted to a continuous problem (minimizing a
concave quadratic function over a sphere).
CCQP Algorithm
Patient 1: Box Plot of Average Solution
Lmax
Patient 1: Box Plots of Average Solution
Lmax
Phase
Patient 2: Box Plots of Average Solution
Lmax
Phase
Kruskal-Wallis Test




…is a nonparametric version of the one-way
ANOVA
…is an extension of the Wilcoxon rank sum test
to more than two groups
…compares samples from two or more groups.
…compares the medians of the samples in X,
and returns the p-value for the null hypothesis
that all samples are drawn from the same
population (or equivalently, from different
populations with the same distribution).
Assumptions

The Kruskal-Wallis test makes the following
assumptions about the data in X:



All samples come from populations having the
same continuous distribution, apart from possibly
different locations due to group effects.
All observations are mutually independent.
The classical one-way ANOVA test replaces
the first assumption with the stronger
assumption that the populations have normal
distributions.
T-test



Test the hypothesis of
the difference in means
of two samples
Determine whether two
samples, x and y, could
have the same mean
when the standard
deviations are unknown
but assumed equal.
Asymptotically, the Txy
index follows a t-distribution with n-1
degrees of freedom.
Results – Significance Level
Concluding Remarks






Overview of Epilepsy Research
Applications of Data Mining and Optimization
Techniques
Interplay between theory and application
Quadratic Programming for Feature Selection
Quadratic Programming for Clustering
Long-Term Monitoring Analysis