
M.Tech. (CS), Semester III, Course B50

Functional Brain Signal Processing: EEG & fMRI

Lesson 7

Kaushik Majumdar

Indian Statistical Institute, Bangalore Center

kmajumdar@isibang.ac.in

EEG Coherence Measures

 Cross-correlation.

Covariance:

$\operatorname{cov}(x, y) = E\bigl[(x - E[x])\,(y - E[y])\bigr]$
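As a concrete illustration, here is a minimal NumPy sketch (the coupled-channel data are hypothetical, not from the lecture) that estimates the covariance above by sample means and normalizes it into the zero-lag cross-correlation coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                    # hypothetical EEG channel
y = 0.6 * x + 0.4 * rng.normal(size=1000)    # hypothetical coupled channel

# Covariance as above: E[(x - E[x])(y - E[y])], estimated by sample means.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Dividing by the standard deviations gives the zero-lag
# cross-correlation coefficient, bounded in [-1, 1].
corr_xy = cov_xy / (x.std() * y.std())
print(cov_xy, corr_xy)
```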

EEG Feature Extraction

Features of EEG signals can take many different forms, such as (see the sketch after this list):

 Amplitude

 Phase

 Fourier coefficients

 Wavelet coefficients, etc.
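A minimal sketch of such feature extraction, assuming NumPy and a single-channel epoch (the epoch, sampling rate, and chosen frequency band are illustrative; wavelet coefficients would need an extra package such as PyWavelets):

```python
import numpy as np

def eeg_features(epoch, fs):
    """Return simple amplitude, phase and Fourier-coefficient features
    for a 1-D EEG epoch sampled at fs Hz."""
    coeffs = np.fft.rfft(epoch)                       # complex Fourier coefficients
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)   # bin frequencies in Hz
    alpha = (freqs >= 8) & (freqs <= 13)              # alpha band, 8-13 Hz
    return {
        "amplitude": np.abs(epoch).max(),             # peak amplitude of the epoch
        "phase": np.angle(coeffs[alpha]),             # phase in the alpha band
        "fourier": np.abs(coeffs[alpha]),             # magnitude in the alpha band
    }

# Example: a 1-second, 256 Hz epoch containing a 10 Hz rhythm.
t = np.arange(256) / 256.0
feats = eeg_features(np.sin(2 * np.pi * 10 * t), fs=256)
```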

Two Most Fundamental Aspects of Machine Learning

 Differentiation: decomposing the data into features, and

 Integration: classification of those features.

Fisher’s Discriminant

Duda, Hart & Stork, 2006

Fisher’s Discriminant (cont.)

$y_j = w^{\mathsf T} x_j, \qquad j = 1, \ldots, n$

where the data form the $n \times d$ matrix

$\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nd} \end{pmatrix}$

whose $j$-th row is $x_j^{\mathsf T}$.

There are $n$ d-dimensional data vectors $x_1, \ldots, x_n$, out of which $n_1$ vectors belong to a set $D_1$ and $n_2$ vectors belong to another set $D_2$, with $n_1 + n_2 = n$. $w$ is a d-dimensional weight vector such that $\|w\| = 1$; that is, $w$ can apply a rotation only. The rotation will have to be such that $D_1$ and $D_2$ are optimally separable by a projection on a straight line in the d-dimensional space.

Fisher’s Discriminant (cont.)

 Sample mean is an unbiased estimate of the population mean, so a difference in the sample means indicates a difference between the underlying populations.

$m_i = \frac{1}{n_i} \sum_{x_j \in D_i} x_j$

The mean of the projected samples is

$\tilde{m}_i = \frac{1}{n_i} \sum_{y_j \in Y_i} y_j = \frac{1}{n_i} \sum_{x_j \in D_i} w^{\mathsf T} x_j = w^{\mathsf T} m_i$

Fisher’s Discriminant (cont.)

The separation of the projected means is $|\tilde{m}_1 - \tilde{m}_2|$. Define the scatter of the projected class $i$ as

$\tilde{s}_i^{\,2} = \sum_{y \in Y_i} (y - \tilde{m}_i)^2$

which gives the criterion function

$J(w) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2}}$

that is to be maximized.

[Figure: the sets $D_1$ and $D_2$ and their projections $y = w^{\mathsf T} x$.]
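A hedged sketch of evaluating this criterion for a candidate $w$ (assuming NumPy, with X1 and X2 holding the rows of $D_1$ and $D_2$; the names are mine, not the lecture's):

```python
import numpy as np

def criterion_J(w, X1, X2):
    """Evaluate J(w) for data matrices X1 (n1 x d) and X2 (n2 x d)."""
    y1, y2 = X1 @ w, X2 @ w              # projections y = w^T x
    m1t, m2t = y1.mean(), y2.mean()      # projected means m~_1, m~_2
    s1t = np.sum((y1 - m1t) ** 2)        # projected scatter s~_1^2
    s2t = np.sum((y2 - m2t) ** 2)        # projected scatter s~_2^2
    return (m1t - m2t) ** 2 / (s1t + s2t)
```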

Fisher’s Discriminant (cont.)

Let us define

$S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^{\mathsf T}, \qquad S_W = S_1 + S_2, \qquad \tilde{s}_i^{\,2} = \sum_{y \in Y_i} (y - \tilde{m}_i)^2$

Since $y = w^{\mathsf T} x$ for $x \in D_i$, $i \in \{1, 2\}$,

$\tilde{s}_i^{\,2} = \sum_{x \in D_i} (w^{\mathsf T} x - w^{\mathsf T} m_i)^2 = \sum_{x \in D_i} w^{\mathsf T} (x - m_i)(x - m_i)^{\mathsf T} w = w^{\mathsf T} S_i w$

Fisher’s Discriminant (cont.)

$\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2} = w^{\mathsf T} S_1 w + w^{\mathsf T} S_2 w = w^{\mathsf T} S_W w$

Similarly,

$(\tilde{m}_1 - \tilde{m}_2)^2 = (w^{\mathsf T} m_1 - w^{\mathsf T} m_2)^2 = w^{\mathsf T} (m_1 - m_2)(m_1 - m_2)^{\mathsf T} w = w^{\mathsf T} S_B w$

where $S_B = (m_1 - m_2)(m_1 - m_2)^{\mathsf T}$.

$S_W$ is called the within-class scatter matrix and $S_B$ is called the between-class scatter matrix.

Fisher’s Discriminant (cont.)

$J(w) = \frac{w^{\mathsf T} S_B w}{w^{\mathsf T} S_W w}$

$J(w)$ is always a scalar quantity, and therefore $S_B w = f(w)\, S_W w$ must hold for some scalar-valued function $f$ of the vector variable $w$, because then $w^{\mathsf T} (S_B - f(w)\, S_W)\, w = 0$. Clearly, the maximum of $f(w)$ will make $J(w)$ maximum. Writing this maximum as $\lambda$, we can write

$S_B w = \lambda S_W w$

where $w$ is the vector for which $J(w)$ is maximum. $S_B w$ is in the direction of $m_1 - m_2$ (elaborated in the next slide). Also, the scale of $w$ does not matter, only its direction does. So we can write

Fisher’s Discriminant (cont.)

$S_W w = m_1 - m_2 \quad\text{or}\quad w = S_W^{-1} (m_1 - m_2)$

Note that

$S_B w = (m_1 - m_2)(m_1 - m_2)^{\mathsf T} w = (m_1 - m_2)\{(m_1 - m_2)^{\mathsf T} w\}$

Here all vectors are by default column vectors, if not stated otherwise, so all transpose operations give row vectors. $(m_1 - m_2)^{\mathsf T}$ is a row vector and $w$ is a column vector; therefore the value within the braces above is a scalar. That is, $S_B w = (m_1 - m_2)\,s$, where $s$ is a scalar. This implies $S_B w$ is in the direction of $m_1 - m_2$.
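Putting the derivation together, a minimal NumPy sketch (the toy data and the name `fisher_direction` are mine, not the lecture's) that computes $w = S_W^{-1}(m_1 - m_2)$ and normalizes it to unit length:

```python
import numpy as np

def fisher_direction(X1, X2):
    """w = S_W^{-1} (m1 - m2), scaled to ||w|| = 1."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter S_W = S_1 + S_2, where
    # S_i = sum over x in D_i of (x - m_i)(x - m_i)^T.
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(Sw, m1 - m2)   # solves S_W w = m1 - m2 directly
    return w / np.linalg.norm(w)       # scale does not matter, direction does

# Toy data: two Gaussian clouds in d = 4 dimensions.
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(100, 4))
X2 = rng.normal(1.0, 1.0, size=(120, 4))
w = fisher_direction(X1, X2)
y1, y2 = X1 @ w, X2 @ w                # well-separated 1-D projections
```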

Dimensionality Reduction by Fisher’s Discriminant

$S_W^{-1} S_B w = \lambda w, \quad\text{i.e.}\quad (S_W^{-1} S_B - \lambda I)\, w = 0$

$S_B$ and $S_W$ are d-dimensional square matrices. For the purpose of classification (or pattern recognition) we only need those eigenvectors of $S_W^{-1} S_B$ whose associated eigenvalues are large enough. The rest of the eigenvectors (and therefore dimensions) we can ignore.
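A sketch of this reduction under the same assumptions as before (NumPy, with the scatter matrices built as on the earlier slides). Note that for two classes $S_B$ has rank one, so only one eigenvalue is nonzero and the reduction recovers the single direction $w$ of the previous slides:

```python
import numpy as np

def fisher_subspace(Sw, Sb, k):
    """Return the k eigenvectors of S_W^{-1} S_B with the largest
    eigenvalues, as columns of a d x k projection matrix."""
    evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(-evals.real)        # largest eigenvalues first
    return evecs.real[:, order[:k]]

# Usage: with Sw and Sb computed from the class means and deviations,
# W = fisher_subspace(Sw, Sb, k=1), and the reduced data are X @ W.
```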

Logistic Regression

$p(y \mid x) = \frac{1}{1 + \exp(-y\,(w^{\mathsf T} x + b))}$

where $y \in \{+1, -1\}$ is the class label.

Parra et al., NeuroImage, 22: 342 – 452, 2005

Logistic Regression (cont.)

[Figure: the logistic function $p(y)$ and its complement $1 - p(y)$.]

Logistic Regression vs. Fisher’s Discriminant

 Theoretically, logistic regression has been shown to be between one half and two thirds as effective as normal discrimination for statistically interesting values of the parameters (B. Efron, The efficiency of logistic regression compared to normal discriminant analysis, JASA, 70: 892 – 898, 1975).

Logistic Regression (cont.)

$t_j = \sum_{i=1}^{D} w_i x_{ij} + b$

$p_j = \frac{\exp(y_j t_j)}{1 + \exp(y_j t_j)}, \qquad 1 - p_j = \frac{1}{1 + \exp(y_j t_j)}$

$\prod_{j=1}^{N} p_j$ is to be maximized, where $N$ is the number of data points.

Logistic Regression (cont.)

$L(w_1, \ldots, w_D, b) = \log \prod_{j=1}^{N} \frac{\exp(y_j t_j)}{1 + \exp(y_j t_j)}$

Note that $\frac{\exp(x)}{1 + \exp(x)}$ is a monotonically increasing function, and so any set $\{w_1, \ldots, w_D, b\}$ which increases $L$ will lead us closer to the optimal value of $L$. Even if we stop at such a suboptimal set, the end result for EEG signal separation, for target versus non-target or for different targets, will be almost similar to the case when a convergence technique for maximizing $L$ as described is followed. The two classes of data will be separated by the hyperplane normal to $w$, whose perpendicular distance from the origin is $|b| / \|w\|$; in other words, the equation of the hyperplane is $w^{\mathsf T} x + b = 0$.
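A minimal gradient-ascent sketch of maximizing this $L$ (assuming NumPy, labels $y_j \in \{+1, -1\}$, and an untuned, illustrative learning rate and step count):

```python
import numpy as np

def train_logistic(X, y, steps=500, lr=0.1):
    """Maximize L(w, b) = sum_j log[exp(y_j t_j) / (1 + exp(y_j t_j))]
    by gradient ascent, for X of shape (N, D) and labels y in {+1, -1}."""
    N, D = X.shape
    w, b = np.zeros(D), 0.0
    for _ in range(steps):
        t = X @ w + b                       # t_j = w^T x_j + b
        p = 1.0 / (1.0 + np.exp(-y * t))    # p_j as on the previous slide
        g = y * (1.0 - p)                   # dL/dt_j = y_j (1 - p_j)
        w += lr * (X.T @ g) / N             # ascend in w
        b += lr * g.mean()                  # ascend in b
    return w, b                             # hyperplane: w^T x + b = 0
```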

Logistic Regression vs. Fisher’s Discriminant

 FD projects the multidimensional data onto a line whose orientation is such that the separation of the projected data becomes maximum on that line.

 LR assigns a probability distribution to the two data sets in such a way that the distribution approaches 1 on one class and 0 on the other, exponentially fast.

This makes LR a better separator, or classifier, than FD.

References

 R. Q. Quiroga, A. Kraskov, T. Kreuz and P. Grassberger, On performance of different synchronization measures in real data: a case study on EEG signals, Phys. Rev. E, 65(4): 041903, 2002.

 R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 4e, John Wiley & Sons, New York, 2007, pp. 117 – 121.

THANK YOU

This lecture is available at http://www.isibang.ac.in/~kaushik
