M.Tech. (CS), Semester III, Course B50
Functional Brain Signal Processing: EEG & fMRI
Lesson 7
Kaushik Majumdar
Indian Statistical Institute
Bangalore Center
kmajumdar@isibang.ac.in
EEG Coherence Measures
Cross-correlation.
Covariance:

$$\mathrm{cov}(x, y) = E\big[(x - E(x))\,(y - E(y))\big]$$
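As an illustration, here is a minimal sketch of computing the covariance and the lag-zero normalized cross-correlation of two channels; the synthetic signals, the seed, and the mixing weights are made-up assumptions, not data from the lecture.

```python
# Minimal sketch: covariance and lag-zero normalized cross-correlation
# between two synthetic "EEG" channels (made-up data).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                    # hypothetical channel 1
y = 0.6 * x + 0.4 * rng.standard_normal(1000)    # channel 2, partly correlated

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # E[(x - E(x))(y - E(y))]
corr_xy = cov_xy / (x.std() * y.std())             # normalize by the standard deviations

print(f"cov(x, y)  = {cov_xy:.4f}")
print(f"corr(x, y) = {corr_xy:.4f}")  # agrees with np.corrcoef(x, y)[0, 1]
```

Cross-correlation at nonzero lags repeats the same computation with one channel shifted in time.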
EEG Feature Extraction
Features of EEG signals can take many different forms, for example (a short extraction sketch follows the list):
Amplitude
Phase
Fourier coefficients
Wavelet coefficients, etc.
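A minimal sketch of extracting the first three feature types from one epoch; the sampling rate and the toy 10 Hz signal are illustrative assumptions, and the PyWavelets call mentioned in the comment is one possible choice for the wavelet features.

```python
# Minimal sketch: amplitude, phase, and Fourier-coefficient features
# from one synthetic epoch (made-up 10 Hz signal plus noise).
import numpy as np

fs = 256                                  # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
t = np.arange(fs) / fs                    # one second of samples
epoch = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(fs)

amplitude = np.abs(epoch).max()           # peak amplitude feature
spectrum = np.fft.rfft(epoch)             # Fourier coefficients
phase = np.angle(spectrum)                # phase at each frequency bin
freqs = np.fft.rfftfreq(fs, d=1 / fs)

# Wavelet coefficients could be obtained with, e.g., PyWavelets:
#   import pywt; coeffs = pywt.wavedec(epoch, "db4", level=4)
print("peak amplitude:", round(amplitude, 3))
print("dominant frequency (Hz):", freqs[np.argmax(np.abs(spectrum))])
```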
Two Most Fundamental Aspects of Machine Learning
Differentiation: decomposing the data into features, and
Integration: classification of those features.
Fisher’s Discriminant
Duda, Hart & Stork, 2006
Fisher’s Discriminant (cont.)

$$y = w^T x$$

The $n$ data vectors arranged as an $n \times d$ matrix, with $x_j^T$ as the $j$-th row:

$$\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nd} \end{pmatrix}$$
There are $n$ d-dimensional data vectors $x_1, \ldots, x_n$, out of which $n_1$ vectors belong to a set $D_1$ and $n_2$ vectors belong to another set $D_2$, with $n_1 + n_2 = n$. $w$ is a d-dimensional weight vector such that $\|w\| = 1$; that is, $w$ can apply rotation only. The rotation has to be such that $D_1$ and $D_2$ are optimally separable by a projection onto a straight line in the d-dimensional space.
Fisher’s Discriminant (cont.)
The sample mean is an unbiased estimate of the population mean, so a difference in sample means indicates a difference between the underlying populations.
$$m_i = \frac{1}{n_i} \sum_{x_j \in D_i} x_j$$

$$\tilde{m}_i = \frac{1}{n_i} \sum_{y_j \in Y_i} y_j = \frac{1}{n_i} \sum_{x_j \in D_i} w^T x_j = w^T m_i$$
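A quick numerical check of the last identity, on a made-up 2-D class and an arbitrary unit vector $w$ (both are assumptions for illustration):

```python
# Minimal check that the mean of the projections equals w^T m_i.
import numpy as np

rng = np.random.default_rng(1)
D_i = rng.normal(loc=[2.0, -1.0], scale=1.0, size=(50, 2))  # hypothetical class
w = np.array([0.6, 0.8])                                    # ||w|| = 1

m_i = D_i.mean(axis=0)                 # sample mean m_i in R^d
y = D_i @ w                            # projections y_j = w^T x_j
print(np.isclose(y.mean(), w @ m_i))   # True: projected mean = w^T m_i
```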
Fisher’s Discriminant (cont.)

The scatter of the projected class $i$ is

$$\tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{m}_i)^2,$$

with which the criterion function

$$J(w) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$$

is to be maximized.

[Figure: the two classes $D_1$ and $D_2$ projected onto the line $y = w^T x$.]
Fisher’s Discriminant (cont.)

Let us define

$$S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T, \qquad S_W = S_1 + S_2,$$

and $\tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{m}_i)^2$. Since $y = w^T x$ for $x \in D_i$, $i \in \{1, 2\}$,

$$\tilde{s}_i^2 = \sum_{x \in D_i} (w^T x - w^T m_i)^2 = \sum_{x \in D_i} w^T (x - m_i)(x - m_i)^T w = w^T S_i w.$$
Fisher’s Discriminant (cont.)

$$\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_1 w + w^T S_2 w = w^T S_W w.$$

Similarly,

$$(\tilde{m}_1 - \tilde{m}_2)^2 = (w^T m_1 - w^T m_2)^2 = w^T (m_1 - m_2)(m_1 - m_2)^T w = w^T S_B w,$$

where $S_B = (m_1 - m_2)(m_1 - m_2)^T$.

$S_W$ is called the within-class scatter matrix and $S_B$ is called the between-class scatter matrix.
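The sketch below, on toy 2-D classes and an arbitrary unit $w$ (all made up for illustration), computes $S_W$ and $S_B$, numerically verifies both identities above, and evaluates the criterion $J(w)$:

```python
# Minimal sketch: within- and between-class scatter matrices, plus
# numerical checks of s~_1^2 + s~_2^2 = w^T S_W w and
# (m~_1 - m~_2)^2 = w^T S_B w on made-up 2-D data.
import numpy as np

rng = np.random.default_rng(2)
D1 = rng.normal([0.0, 0.0], 1.0, size=(60, 2))   # hypothetical class 1
D2 = rng.normal([3.0, 1.0], 1.0, size=(40, 2))   # hypothetical class 2
w = np.array([1.0, 1.0]) / np.sqrt(2)            # arbitrary unit direction

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
S1 = (D1 - m1).T @ (D1 - m1)                     # sum of (x - m_i)(x - m_i)^T
S2 = (D2 - m2).T @ (D2 - m2)
S_W = S1 + S2                                    # within-class scatter
S_B = np.outer(m1 - m2, m1 - m2)                 # between-class scatter

proj_scatter = ((D1 @ w - w @ m1) ** 2).sum() + ((D2 @ w - w @ m2) ** 2).sum()
print(np.isclose(proj_scatter, w @ S_W @ w))             # True
print(np.isclose((w @ m1 - w @ m2) ** 2, w @ S_B @ w))   # True

J = (w @ S_B @ w) / (w @ S_W @ w)                # criterion to be maximized
print("J(w) =", round(J, 3))
```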
Fisher’s Discriminant (cont.)

$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

$J(w)$ is always a scalar quantity, and therefore $S_B w = f(w) S_W w$ must hold for some scalar-valued function $f$ of the vector variable $w$, because $w^T (S_B - f(w) S_W) w = 0$. Clearly, the maximum of $f(w)$ will make $J(w)$ maximum. Denoting that maximum by $\lambda$, we can write

$$S_B w = \lambda S_W w,$$

where $w$ is the vector for which $J(w)$ is maximum. $S_B w$ is in the direction of $m_1 - m_2$ (elaborated in the next slide). Also, the scale of $w$ does not matter, only its direction does. So we can write:
Fisher’s Discriminant (cont.)

$$S_W w = m_1 - m_2, \quad \text{or} \quad w = S_W^{-1}(m_1 - m_2).$$

Note that

$$S_B w = (m_1 - m_2)(m_1 - m_2)^T w = (m_1 - m_2)\{(m_1 - m_2)^T w\}.$$

Here all vectors are by default column vectors, if not stated otherwise, so all transpose operations give row vectors. $(m_1 - m_2)^T$ is a row vector and $w$ is a column vector; therefore the value within the curly brackets above is a scalar. That is, $S_B w = (m_1 - m_2)s$, where $s$ is a scalar. This implies $S_B w$ is in the direction of $m_1 - m_2$.
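A standalone sketch of the resulting closed form $w = S_W^{-1}(m_1 - m_2)$ on made-up 2-D classes (the data and seed are assumptions for illustration):

```python
# Minimal sketch: the closed-form Fisher direction w = S_W^{-1}(m_1 - m_2).
import numpy as np

rng = np.random.default_rng(3)
D1 = rng.normal([0.0, 0.0], 1.0, size=(60, 2))   # hypothetical class 1
D2 = rng.normal([3.0, 1.0], 1.0, size=(40, 2))   # hypothetical class 2

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
S_W = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)

w = np.linalg.solve(S_W, m1 - m2)    # solves S_W w = m_1 - m_2
w /= np.linalg.norm(w)               # rescale: only the direction matters

# The projected class means should be well separated on the line:
print("projected means:", (D1 @ w).mean(), (D2 @ w).mean())
```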
Dimensionality Reduction by Fisher’s Discriminant

$$S_W^{-1} S_B w = \lambda w, \quad \text{i.e.,} \quad (S_W^{-1} S_B - \lambda I) w = 0.$$

$S_B$ and $S_W$ are d-dimensional square matrices. For the purpose of classification (or pattern recognition) we only need those eigenvectors of $S_W^{-1} S_B$ whose associated eigenvalues are large enough. The rest of the vectors (and therefore dimensions) we can ignore.
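A sketch of this reduction on made-up 4-D data; note that with two classes $S_B$ has rank one, so only one eigenvalue is nonzero and a single direction survives.

```python
# Minimal sketch: dimensionality reduction via the eigenvectors of
# S_W^{-1} S_B, keeping only directions with large eigenvalues.
import numpy as np

rng = np.random.default_rng(4)
D1 = rng.normal([0.0, 0.0, 0.0, 0.0], 1.0, size=(80, 4))  # hypothetical class 1
D2 = rng.normal([2.0, 1.0, 0.0, 0.0], 1.0, size=(80, 4))  # hypothetical class 2

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
S_W = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)
S_B = np.outer(m1 - m2, m1 - m2)

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]        # largest eigenvalue first
W = eigvecs.real[:, order[:1]]                # keep the dominant direction only

reduced = np.vstack([D1, D2]) @ W             # 4-D data reduced to 1-D
print("eigenvalues:", np.round(eigvals.real[order], 4))
```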
Logistic Regression

$$p(y \mid x) = \frac{1}{1 + \exp\!\big(-y\,(w^T x + b)\big)}, \qquad y \in \{+1, -1\},$$

where $b$ is the bias term.
Parra et al., NeuroImage, 22: 342–452, 2005
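A minimal sketch of evaluating this model; the weights, bias, and feature vector are made-up values for illustration.

```python
# Minimal sketch: the logistic model p(y | x) = 1 / (1 + exp(-y (w^T x + b))).
import numpy as np

def p(y, x, w, b):
    """Probability of label y in {+1, -1} for feature vector x."""
    return 1.0 / (1.0 + np.exp(-y * (w @ x + b)))

w = np.array([0.8, -0.3])   # hypothetical weight vector
b = 0.1                     # hypothetical bias
x = np.array([1.2, 0.4])    # hypothetical feature vector

print(p(+1, x, w, b), p(-1, x, w, b))  # the two probabilities sum to 1
```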
Logistic Regression (cont.)

[Figure: the logistic probability $p(y)$ and its complement $1 - p(y)$.]
Logistic Regression vs. Fisher’s Discriminant

Theoretically, logistic regression has been shown to be between one half and two thirds as effective as normal discriminant analysis for statistically interesting values of the parameters (B. Efron, The efficiency of logistic regression compared to normal discriminant analysis, JASA, 70(352): 892–898, 1975).
Logistic Regression (cont.)

$$y_j = \sum_{i=1}^{D} w_i x_{ij} + b,$$

$$p_j = \frac{\exp(y_j t_j)}{1 + \exp(y_j t_j)}, \qquad 1 - p_j = \frac{1}{1 + \exp(y_j t_j)},$$

and the likelihood

$$\prod_{j=1}^{N} p_j$$

is to be maximized, where $N$ is the number of data points and $t_j \in \{+1, -1\}$ is the class label of the $j$-th point.
Logistic Regression (cont.)

$$L(w_1, \ldots, w_D, b) = \log \prod_{j=1}^{N} \frac{\exp(y_j t_j)}{1 + \exp(y_j t_j)}.$$

Note that $\frac{\exp(x)}{1 + \exp(x)}$ is a monotonically increasing function, and so any set of weights which increases $L$ will lead us closer to the optimal value of $w$. Even if we stop short of full convergence, the end result for EEG signal separation, for target versus non-target or for different targets, will be almost similar to the case when a convergence technique for $w$ as described is followed. The two classes of data will be separated by the hyperplane normal to $w$, and the perpendicular distance of the hyperplane from the origin is $|b| / \|w\|$. In other words, the equation of the hyperplane is $w^T x + b = 0$.
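A sketch, under assumed toy data and an assumed learning rate and iteration count, of increasing $L$ by plain gradient ascent and then reading off the separating hyperplane:

```python
# Minimal sketch: maximize L = sum_j log p_j by gradient ascent, then
# report the hyperplane w^T x + b = 0 and its distance from the origin.
import numpy as np

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),    # class t = -1
               rng.normal(+1.0, 1.0, (50, 2))])   # class t = +1
t = np.r_[-np.ones(50), np.ones(50)]

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):
    margin = t * (X @ w + b)              # t_j * y_j for every point
    g = t / (1.0 + np.exp(margin))        # dL/dy_j, from p_j = 1/(1+exp(-t_j y_j))
    w += lr * (g @ X) / len(t)            # ascend the gradient in w
    b += lr * g.mean()                    # ...and in b

print("w =", np.round(w, 3), " b =", round(b, 3))
print("distance of hyperplane from origin:", abs(b) / np.linalg.norm(w))
```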
Logistic Regression vs. Fisher’s Discriminant
FD projects the multidimensional data on a line, whose orientation is such that the separation of the projected data becomes maximum on that line.
LR assigns a probability distribution to the two data sets in such a way that the probability approaches 1 on one class and 0 on the other, exponentially fast.
This makes LR a better separator or classifier than FD.
References

R. Q. Quiroga, A. Kraskov, T. Kreuz and P. Grassberger, On performance of different synchronization measures in real data: a case study on EEG signals, Phys. Rev. E, 65(4): 041903, 2002.

R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd ed., John Wiley & Sons, New York, 2001, pp. 117–121.
THANK YOU
This lecture is available at http://www.isibang.ac.in/~kaushik