M.Tech. (CS), Semester III, Course B50 Functional Brain Signal Processing: EEG & fMRI Lesson 2 Kaushik Majumdar Indian Statistical Institute Bangalore Center kmajumdar@isibang.ac.in EEG Processing Preprocessing Pattern recognition Benbadis and Rielo, 2008: http://emedicine.medscape.com/article/1140247-overview EEG Artifacts Benbadis and Rielo, 2008: http://emedicine.medscape.com/article/1140247-overview Eye Blink Artifact: Electrooculogram (EOG) Matrix Representation of MultiChannel EEG M is an m x n matrix, whose m rows represent m EEG channels and n columns represent n time points. Often during EEG processing we are to find a matrix W such that WM is the processed signal. Majumdar, under preparation, 2013 EOG Identification by Principal Component Analysis (PCA) PCA Algorithm (cont.) PCA Algorithm (cont.) PCA Rotation and (Stretching or Contracting) Wallstrom et al., Int. J. Psychophysiol., 53: 105-119, 2004 Performance of PCA in EOG Removal EOG Independent Component Analysis (ICA) In PCA data components are assumed to be mutually orthogonal, which is too restrictive. PCA components Original data sets ICA (cont.) PCA will give poor results if the covariance matrix has eigenvalues close to each other. ICA as Blind Source Separation (BSS) S1 2 S2 1 S4 Four musicians are playing in a room. From the outside only music can be heard through four microphones. 4 No one can be seen. How the music heard from outside can be decomposed into four sources? S3 3 Mathematical Formulation A is mixing matrix, x is sensor vector, s is source vector and n is noise, which is to be eliminated by filtering. Mathematical Formulation (cont.) Given find such that Any estimation technique of is called an ICA technique or BSS technique in general. Hyvarinen and Oja, Neural Networks, 13: 411-430, 2000 ICA Algorithm: FastICA Whitening: Normalization (make mean zero). Make variance one i.e., E expectation, x is the vector of signals and I is identity matrix. FastICA (cont.) B is orthogonal matrix and D is diagonal matrix of E will satisfy Whitening complete Non-Gaussianity ICA is appropriate only when probability distribution of the data set is non-Gaussian. Gaussian distribution is of the form Entropy of Gaussian Variable A Gaussian variable has the largest entropy among a class of random variables with equal variance (for a proof see Cover & Thomas, Elements of Information Theory). Here we will give an intuitive argument. Entropy of a Random Variable X En( X ) p( X ) log p( X )dX Less (zero) information More information Random 1 0.8 0.9 0.6 0.8 0.4 0.7 0.2 0.6 X = random(t) X = sin(10t) Deterministic 1 0 -0.2 0.5 0.4 -0.4 0.3 -0.6 0.2 -0.8 0.1 -1 0 0 1 2 3 4 t 5 6 7 0 100 200 300 400 t 500 600 700 Gaussian Random Variable Has Highest Entropy: Intuitive Proof By Central Limit Theorem (CLT) the mean of a class of random variables (class is signified by uniform variance) follows normal distribution as the number of members in the class tends to infinity (i.e., becomes very large). Infinite observations hold infinite or maximum amount of information. Intuitive Proof (cont.) Therefore a random variable with normal distribution has the highest information content. So it has the highest entropy. If each variable in a class of random variables admit only finite number of nonzero values, the one with uniform distribution will have the highest entropy. Non-Gaussianity as Negentropy H is entropy and J negentropy. J is to be maximized. When J is maximum y is reduced to a component. This can be shown by calculating the kurtosis 2 for component and sum of components including the said component (See Hyvarinen & Oja, 2000, P. 7). Steps of FastICA after Whitening g is in the form of either of the two Exercise FastICA has been implemented in EEGLAB (in runica function). Remove artifacts from sample EEG data using the ICA implementation in EEGLAB. Concept of Independence in PCA and ICA In PCA independence means orthogonality i.e., pairwise dot product is zero. In ICA independence is statistical independence. Let x, y be random variables, p(x) is probability distribution function of x and p(x,y) is joint probability distribution function of (x,y). If p(x,y) = p(x).p(y) holds we call x and y are statistically independent. Independence (cont.) If vectors v1 and v2 are orthogonal they are independent. Say not, then a1v1 + a2v2 = 0 implies, a1v1.v1 + a2v2.v1 = 0 or a1 = 0. Similarly a2 = 0. If v1 = cv2 then both of them must have same probability distribution or p(v1,v2) = p(v1) = p(v2). If v1 and v2 are linearly independent p(v1,v2) = p(v1).p(v2) may or may not hold. If p(v1,v2) = p(v1).p(v2) holds then v1 and v2 are linearly independent. Conditions for ICA Applicability Sources are statistically independent. Propagation delays in the mixing medium are negligible. Sources are time varying. Mixing medium delays may affect sources in different locations differently and thereby corrupting their temporal structures. Number of sources = number of sensors. References Benbadis and Rielo, EEG artifacts, eMedicine, available online at http://emedicine.medscape.com/article/1140 247-overview, 2008. Hyvarinen and Oja, Independent component analysis: algorithms and applications, Neural Networks, vol. 13, p. 411-431, 2000. Majumdar, A Brief Survey of Quantitative EEG Analysis, Chapter 2. THANK YOU This lecture is available at http://www.isibang.ac.in/~kaushik