Source Localization of MEG Generation Using Spatio-temporal Kalman Filter

by

Neil U Desai

B.S. Computer Science
Massachusetts Institute of Technology, 2004

SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ENGINEERING IN ELECTRICAL ENGINEERING AND COMPUTER SCIENCE AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2005

Copyright © 2005 Neil U Desai. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.

Signature of Author: Department of Electrical Engineering and Computer Science

Certified by: Dr. Emery Brown, Director, Neuroscience Statistics Research Laboratory, MGH, Thesis Supervisor

Certified by: Dr. Stephen Burns, Harvard-MIT Division of Health Sciences and Technology, Thesis Advisor

Accepted by: Professor Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Source Localization for Magnetoencephalography By Spatio-Temporal Kalman Filtering and Fixed-Interval Smoothing

by

Neil U Desai

Submitted to the Department of Electrical Engineering and Computer Science on May 19, 2005, in partial fulfillment of the requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science

ABSTRACT

The inverse problem for magnetoencephalography (MEG) involves estimating the magnitude and location of sources inside the brain that give rise to the magnetic field recorded on the scalp as subjects execute cognitive, motor and/or sensory tasks. Given a forward model which describes how the signals emanate from the brain sources, a standard approach for estimating the MEG sources from scalp measurements is to use regularized least squares approaches such as LORETA, MNE, and VARETA.
These regularization methods impose a spatial constraint on the MEG inverse solution, yet they do not consider the temporal dynamics inherent to the biophysics of the problem. To address this issue, we present a state-space formulation of the MEG inverse problem by specifying a state equation that describes the temporal dynamics of the MEG sources. Using a standard forward model as the observation equation, we derive spatio-temporal Kalman filter and fixed-interval smoothing algorithms for MEG source localization. To compare the methods analytically, we present a Bayesian derivation of the regularized least squares and Kalman filtering methods. This analysis reveals that the estimates computed from the static methods bias the location of the sources toward zero. We compare the static, Kalman filter and fixed-interval smoothing methods in a simulated study of MEG data designed to emulate somatosensory MEG sources with different signal-to-noise ratios (SNR) and mean offsets. The data were mixtures of sinusoids with SNR ranging from 1 to 10 and mean offset ranging from 0 to 20. As SNR decreased and mean offset increased, the Kalman filter and the fixed-interval smoothing methods gave uniformly more accurate estimates of source locations in terms of mean-squared error. Because the fixed-interval smoothing estimates were based on all recorded measurements, they had uniformly lower mean-squared errors than the Kalman estimates. These results suggest that state-space models can offer a more accurate approach to localizing brain sources from MEG recordings and that this approach may appreciably enhance the use of MEG as a non-invasive tool for studying brain function.

Thesis Supervisor: Emery N. Brown
Title: Director, Neuroscience Statistics Research Laboratory, MGH

Thesis Advisor: Stephen Burns
Title: Senior Lecturer, Harvard-MIT Division of Health Sciences and Technology

Acknowledgements

This research project was a collaborative effort.
I would like to thank Dr. Emery Brown for giving me the opportunity to work in his laboratory and for providing guidance throughout the project. Dr. Brown also gave me the opportunity to meet with other senior research scientists and attend many meetings with colleagues. I would also like to thank Dr. Chris Long, who helped me with the many signal processing and mathematical challenges I found along the way. Finally, I would like to thank Dr. Matti Hämäläinen, whose patience allowed me to understand the resources used to create realistic brain signals and forward solutions.

Table of Contents

Introduction
Inverse Methods
Simulation
Results
Conclusions
References

1. Introduction:

1.1 MEG

Measurements of the electrical currents flowing within the neurons of the brain can be used to aid in the diagnosis of many disorders, as well as to study basic brain function. One way to observe these tiny electrical currents is to measure the magnetic fields that they produce. Because magnetic fields pass through the skull and scalp as if they were transparent, they can be non-invasively measured outside the head, using a technique called magnetoencephalography, or MEG.

MEG has important advantages over other functional imaging techniques. First, the temporal resolution is excellent, allowing researchers to monitor brain activity on the millisecond timescale. Furthermore, MEG is completely non-invasive, requiring neither strong external magnetic fields (e.g.
fMRI) nor injections of radioactive substances (e.g. PET).

The magnetic fields that are evoked from neurons in the brain are very weak; they have strengths of 0.2 to 1 pT, less than a hundred millionth of the earth's magnetic field (Hamalainen, 1993). However, with the use of a sensitive magnetic-field sensor, the superconducting quantum interference device (SQUID), these small magnetic fields can be measured. The physics of how the SQUID works is not relevant for this research. Because the magnetic fields are so small, all MEG recordings must be conducted in a special magnetically shielded room to reduce the interference from larger magnetic fields. The patient sits in a chair and the MEG helmet is positioned directly above the patient's head (see Figure 1).

Figure 1: Patient situated under the MEG

Electroencephalography (EEG) is a similar non-invasive way to measure the brain's electrical activity. EEG requires a cap in which electrodes are placed. The cap must be fitted securely on the patient's head to ensure accurate measurements. There are two critical differences between EEG and MEG. First, EEG measures the electric field, while MEG measures the magnetic field. Second, MEG is insensitive to the conductivities of the scalp, skull, and brain, which can affect EEG measurements.

Neurons communicate through the generation of electrical potentials called action potentials. These travel from neuron to neuron by traversing the length of the neuronal axon. The progress of these potentials depends on the flow of a small quantity of ions down the center of the axon. Since ions have electrical charge, their flow in one direction qualifies as an electrical current. Though this current is small, it generates a very small magnetic field looping around the neuronal axon. When a number of neurons are simultaneously active, the magnetic field generated by these neurons is strong enough to be recorded outside the head using the SQUID.
Magnetic fields appear whenever an electrical current is generated. If the electrical current has a net flow along a line (like a DC electrical current down a wire), then circular magnetic fields will appear in loops around that line, following the 'right-hand rule' of electromagnetic theory. The figure below shows a current flow, $J_v$, and the associated magnetic field, $B$.

Figure 2: The current $J_v$ produces a magnetic field, $B$, in loops around the current.

The magnetic field is sensed on the surface after traveling through the skull and scalp without any distortion. Previous research (Hari and Salmelin, 1997) determined that the conductivity of the skull and scalp is uniform, thus allowing the magnetic field to propagate without needing to consider other parameters. By the nature of the electric field, radial currents (sources that travel perpendicular to the cortex) do not produce detectable magnetic fields, while tangential sources (parallel to the cortex) do produce magnetic fields (see Figure 3).

Figure 3: Only currents that run parallel to the cortex produce a measurable magnetic field ($B \neq 0$), where $B$ is the induced magnetic field and $V$ is the electric field associated with a current source.

The arrows parallel to the cortex emanate a detectable magnetic field, while those that are perpendicular to the cortex do not.

The MEG detects the magnetic field using the coils in the SQUID. The coils are configured into 306 MEG channels: 204 gradiometers and 102 magnetometers. The gradiometers have more accurate sensitivity than the magnetometers, and when analyzing MEG data the magnetometers are generally ignored. The coil configurations of the channels are shown below:

Figure 4: Configuration of the coils on the MEG

Depending on how one samples the surface of the brain, there are thousands of sources within the brain that elicit these magnetic fields.
The fields from these sources propagate to the surface according to Maxwell's equations and the Biot-Savart law. How the fields propagate is described by a suitable forward model. The forward problem in MEG source localization is to determine the MEG scalp measurements (at the coils) produced by the active sources inside the brain. The purpose of the inverse problem, then, is to estimate the location and intensity of these sources given only the measurements recorded at the scalp. The problem is ill-posed because the number of measurements is significantly smaller than the number of sources.

1.2 Forward Problem:

Before discussing methods of inverse localization, it is important to understand the forward model. This model describes the relationship between the current sources and the data coming from the MEG. It uses Maxwell's equations to predict how the fields of the current sources inside the brain radiate to the scalp.

As described above, the current sources in the brain generate a current flow, $J(r')$, where $r'$ is the location of the source that produces the current. The total current flow, $J(r')$, can be divided into primary currents $J^p(r')$ and volume currents $J^v(r')$. The primary currents directly reflect the activity of the corresponding neurons. These current flows exist in a conductive medium, i.e. the brain tissue. The volume currents are produced in other media, such as the skull and scalp. $J^v(r')$ results from the effect of the electric field on outside charge carriers and represents currents of the macroscopic field (Williamson and Kaufman, 1987):

$$J^v(r') = \sigma(r')\,E(r')$$

where $E$ is the electric field and $\sigma(r')$ is the conductivity, which we will here assume to be isotropic. By making approximations to Maxwell's equations:

$$E = -\nabla V$$

where $V$ is the electric potential.
The primary current flow $J^p(r')$ is the driving force of the total current flow and can be defined at the macroscopic level as the current that does not include the volume current:

$$J(r') = J^p(r') + J^v(r') = J^p(r') - \sigma(r')\,\nabla V(r')$$

The total current density may not have any volume currents (for example, when a current is a closed loop), but every total current must include a component that is directly associated with the primary current.

For MEG, it is the primary currents we are interested in, since they represent areas of the brain that relate to sensory and motor processes (Mosher et al., 1999). We will modify our current models to stress the importance of the primary current density. If we assume a simple head model, such as a single homogeneous sphere or a set of contiguous regions each of constant conductivity, then the relationship between a source inside the brain and a measurement point outside the brain is given by the Biot-Savart law:

$$B(r) = \frac{\mu_0}{4\pi} \int_G J(r') \times \frac{r - r'}{|r - r'|^3}\, dr'$$

where $B$ is the magnetic field, $\mu_0$ is the permeability of free space, $r$ is the location of the measurement point, $r'$ is the location of the current source, and $G$ is the closed volume containing the source. After distinguishing between primary and volume currents, this equation can be rewritten as:

$$B(r) = B_0(r) + \frac{\mu_0}{4\pi} \sum_{ij} (\sigma_i - \sigma_j) \int_{S_{ij}} V(r')\, \frac{r - r'}{|r - r'|^3} \times dS_{ij}$$

where the summation is over all boundaries $S_{ij}$ between regions of conductivity $\sigma_i$ and $\sigma_j$, and $B_0(r)$ is the magnetic field resulting from the primary currents.

The equations above are used in a forward model assuming a spherical homogeneous head model. However, the forward problem must in general be solved numerically for arbitrary head shapes. In this research, a boundary element method (BEM) was used for solving the forward problem. The theory and mathematics involved are very similar to those for a homogeneous sphere model. In the BEM, the brain volume is partitioned into millions of tiny tetrahedra (see figure below).
Figure 5: Under the BEM the brain surface is divided into thousands of triangles.

The mathematics describing how a magnetic field is detected at the measurement point applies as a general theory. The equations and relationships can be adapted to fit any particular model, such as the BEM. There are several other head models, such as the three-shell sphere or the finite-element method (FEM), which take alternate parameters into consideration. For the purpose of this report, we need not go into the specific differences between head models. The BEM, the method used for this research, is computed using the most current technology.

2. Inverse Methods:

2.1 Problem Formulation:

Determining the strength and location of a neuronal current based on the MEG scalp measurement data is an ill-conditioned inverse problem. After defining a model for source activation, the measurement data are used to specify the parameters of the model, revealing information about the strength of current in different parts of the brain.

The mathematical methods of estimation were first developed in the 18th and early 19th centuries. Bayesian statistics says that given the a priori probability distribution and a measurement value, one can obtain the a posteriori distribution describing the knowledge about possible values of the parameters. In 1777, Bernoulli described the method of maximum likelihood, stating that the parameter value should be chosen that makes the obtained values most probable. Finally, Gauss invented the method of least squares. Current estimation methods solve inverse problems using a regularized least-squares approach or by maximizing the a posteriori probability distribution. In the 1960s, Kalman published an effective method of updating parameter estimates given incoming measurement data, which is known as the Kalman filter.
The MEG inverse problem, as with many other inverse problems, is ill-posed, i.e., there may exist many combinations of parameters that produce the same results, so the solution is non-unique. We will examine the evolution of methods used in source localization.

Before we continue, it is important to set up the problem in terms of variables. Since most biomedical inverse solutions use linear algebraic formulations, a matrix setup is a good framework to follow. We call the measurement (observation) data vector $Y$ and the current source vector $J$. The vector $Y$ encompasses all the measurements from the 204 channels:

$$Y(t) = \big(y(1,t),\, y(2,t),\, \ldots,\, y(N_c,t)\big)', \qquad t = 1, \ldots, T$$

where $'$ denotes matrix transposition. Collected over all time points, the measurements have dimension $N_c \times T$, where $N_c$ is the number of channels and $T$ is the number of time points. The vector $J$ contains the sources, which in this research are gray matter voxels. At each voxel, there is a local three-dimensional current vector:

$$j(v,t) = \big(j_x(v,t),\, j_y(v,t),\, j_z(v,t)\big)'$$

where $v$ is the voxel label. The column vector of all current vectors (for all voxels) is denoted by:

$$J(t) = \big(j(1,t)',\, j(2,t)',\, \ldots,\, j(N_v,t)'\big)'$$

where $N_v$ is the number of voxels. Thus, collected over all time points, $J$ has dimension $3N_v \times T$. The problem is ill-posed because $N_c \ll N_v$. Depending on how the brain is divided into discrete segments, $N_v$ can vary from 400 to 4,000, with each voxel having an x-, y-, and z-current density. It is difficult to estimate the strength and location of all $3N_v$ current components given only the 204 channel measurements on the scalp. This leads to the non-uniqueness problem described above, since $N_c \ll 3N_v$.

The forward model, explained in section 1.2, condenses all of its information about the source propagation and current density into a matrix known as the lead-field matrix $K$, which maps the intracranial neuronal currents to the extracranial MEG signals.
Each row of $K$ represents a channel, and the values in that row are the weights with which each voxel contributes to the value of that channel. $K$ also acts as an order-transformation matrix, effectively transferring the order of the problem from the source space ($3N_v \times T$) to the measurement space ($N_c \times T$), which is significantly smaller. Thus, $K$ has dimensions $N_c \times 3N_v$. The linearity of the lead-field matrix allows the following relationship:

$$Y_t = K J_t$$

The above model can be easily extended to multiple time samples, thus creating a spatio-temporal model. Additionally, there exists an observation noise, $\varepsilon_t$, which is an additive random element. Thus, the equation above becomes:

$$Y_t = K J_t + \varepsilon_t$$

Having stated the problem in terms of variables, we can now turn to inverse methods. The overdetermined and underdetermined dipole models are both instantaneous solutions, i.e., they estimate the current densities separately at each instant of time and do not model how they vary with time. The dynamic solutions use previous time states and a new measurement to provide an a posteriori estimate of the state. This gives the estimate spatio-temporal characteristics.

2.1 Static models:

2.1.1 Overdetermined dipole model (least-squares):

The most basic model for MEG is the overdetermined dipole model. Here, the basic strategy is to fix the number of candidate sources and use a nonlinear estimation algorithm to minimize the squared error between the actual data and the estimated value calculated from the forward model. The assumption is that the activated source area is small compared to the distance to the measurement channels, so the measured magnetic field resembles that generated by a current dipole. The setup for this situation is as follows:

$$Y = K(r, q)\,J + \varepsilon$$

where $K(r, q)$ represents the signals generated by unit dipoles with given locations $r = (x_1, y_1, z_1, \ldots, x_{N_v}, y_{N_v}, z_{N_v})$ and orientations $q = (q_{1x}, q_{1y}, q_{1z}, \ldots, q_{N_v x}, q_{N_v y}, q_{N_v z})$. The observation noise, $\varepsilon$, has zero mean and covariance matrix $\Sigma_\varepsilon$.
The maximum likelihood estimate of the dipole parameters $(r, q)$ can be calculated by minimizing the least-squares cost function:

$$E(r, q, J) = \| Y - K(r, q)\,J \|_F^2$$

where the square of the Frobenius norm, $\|X\|_F^2$, is the sum of the squares of the elements of the matrix $X$. Taking into account the noise covariance, $\Sigma_\varepsilon$, the least-squares cost function becomes

$$E(r, q, J) = \big\| \Sigma_\varepsilon^{-1/2} \big(Y - K(r, q)\,J\big) \big\|_F^2$$

which, for fixed dipole parameters, can be written as:

$$f(J) = (Y - KJ)^T \Sigma_\varepsilon^{-1} (Y - KJ)$$

The MEG sensors are extremely sensitive, picking up noise from all active fields in the room. Thus the noise distribution of the measurement may include some non-Gaussian contributions, but on average the noise is approximately normally distributed.

The problem with this model is that true source areas are not single, point-like entities. Previous research has shown that an area of 1 cm² or larger can produce detectable signals (Hamalainen, 1993). Thus, the estimated source locations will not coincide exactly with the true locations, but will be close to the general area of activation.

In implementing the least-squares method, the most significant variable is the number of voxels to use, $N_v$. If one uses too many voxels, then the sources can be made to fit any data set, regardless of quality. However, if too few voxels are used, the intensity of the current dipole might not be strong enough to be detected at the sensor locations. Furthermore, as the number of voxels increases, the minimization of the cost function may become trapped in local minima, which is undesirable.

The solution to the problem is tractable because the number of dipole sources is much less than the number of channels. This is no longer the case for the underdetermined dipole model in the next section.

2.1.2 Underdetermined dipole model (least squares with regularization):

The previous least-squares solution is the simplest case and serves as a foundation for deriving more sophisticated inverse algorithms.
In the underdetermined dipole model, the dipole locations are predefined over the volume where voxels can be found with decent accuracy. Here, the number of voxels, $N_v$, is greater than the number of measurements, $N_c$. If we observe the system at a single time point, we can set up a Gaussian linear model as follows:

$$Y_t = K J_t + \varepsilon_t \tag{1}$$

$$J_t = Hp_t + R_t \tag{2}$$

where $J_t$ contains the voxel strengths, and $\varepsilon_t$ is the measurement noise, distributed as a Gaussian random vector with mean zero and covariance matrix $\Sigma_\varepsilon$. Similarly, $J_t$ has an additive noise element $R_t$, which has zero mean and covariance matrix $\Sigma_R$. The underdetermined problem results in minimizing the following cost function:

$$f(J) = \| Y - KJ \|^2_{\Sigma_\varepsilon} + \lambda\, \| J - Hp \|^2_{\Sigma_R} \tag{3}$$

where $\|x\|^2_{\Sigma} = x^T \Sigma^{-1} x$ and $\lambda$ is a regularization parameter that trades off the measurement error against the prior information. If $\lambda$ is small, the data term dominates, so the prediction $KJ$ must fit the data $Y$ closely and the misfit $\|Y - KJ\|$ is minimal. Conversely, if $\lambda$ is large, the estimate is pulled toward the prior mean $Hp$ and the model will not fit the data as well. Thus, $\lambda$ penalizes departures from the prior at the expense of the fit to the data. Written out, the corresponding least-squares error function is:

$$f(J) = (Y - KJ)^T \Sigma_\varepsilon^{-1} (Y - KJ) + \lambda\, (J - Hp)^T \Sigma_R^{-1} (J - Hp)$$

The first term in this equation, $(Y - KJ)^T \Sigma_\varepsilon^{-1} (Y - KJ)$, is the same as the least-squares cost function of the overdetermined method. The second term, $\lambda\, (J - Hp)^T \Sigma_R^{-1} (J - Hp)$, which is needed to identify a unique solution, carries information about the mean of the prior and the covariance structure of the source noise. The solution that minimizes the error function is:

$$\hat{J} = Hp + \Sigma_R K' \big(K \Sigma_R K' + \lambda \Sigma_\varepsilon\big)^{-1} \big[\,Y - K Hp\,\big] \tag{4}$$

where $\hat{J}$ is the estimate of $J$.

This form brings about some interesting insights. The solutions are instantaneous (static) because $\hat{J}$ is computed at a snapshot instant in time. This poses an issue because there is no continuity constraint on the temporal characteristics of the solution, i.e., the estimate of the source localization at time $t = t'$ depends only on information at time $t = t'$.
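As a small numerical illustration of the static regularized estimate, the following NumPy sketch evaluates the closed form $\hat{J} = Hp + \Sigma_R K'(K \Sigma_R K' + \lambda \Sigma_\varepsilon)^{-1}(Y - K Hp)$, which is the minimizer of the cost function with $\lambda$ weighting the prior term. The function name `regularized_ls`, the toy dimensions, and the random lead field are assumptions made for illustration only; they are not the software or data used in this research.

```python
import numpy as np

def regularized_ls(Y, K, Sigma_eps, Sigma_R, lam, prior_mean=None):
    """Static estimate: Hp + Sigma_R K' (K Sigma_R K' + lam Sigma_eps)^-1 (Y - K Hp).

    prior_mean plays the role of Hp; it defaults to zero (the 'naive' case).
    """
    n_src = K.shape[1]
    if prior_mean is None:
        prior_mean = np.zeros(n_src)
    # Matrix to invert lives in the (small) measurement space: Nc x Nc.
    S = K @ Sigma_R @ K.T + lam * Sigma_eps
    residual = Y - K @ prior_mean
    return prior_mean + Sigma_R @ K.T @ np.linalg.solve(S, residual)

# Toy problem (hypothetical sizes): 5 channels, 20 sources, one active source.
rng = np.random.default_rng(0)
n_chan, n_src = 5, 20
K = rng.standard_normal((n_chan, n_src))      # stand-in for a lead-field matrix
J_true = np.zeros(n_src)
J_true[3] = 2.0
Y = K @ J_true + 0.01 * rng.standard_normal(n_chan)

lam = 1e-3
J_hat = regularized_ls(Y, K, np.eye(n_chan), np.eye(n_src), lam)
```

Because the inverse is taken in the measurement space, the cost of each estimate scales with the number of channels rather than the number of sources, which is one reason this form is convenient when $N_c \ll N_v$.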
Another insight is that both static least-squares models are the special case where the prior distribution on the sources is the Gaussian distribution with $Hp = 0$. This reduces Equation 4 to

$$\hat{J} = \Sigma_R K' \big(K \Sigma_R K' + \lambda \Sigma_\varepsilon\big)^{-1} Y \tag{5}$$

which is the standard form of the least-squares solution with regularization. Currently, this solution is widely used and adapted to include other physiological constraints, but for the purpose of this paper we will compare all future estimates to this static or 'naive' estimate.

A major drawback with this model is that since the prior guess of the mean at every time $t$ is 0, each solution discards whatever previous knowledge was available. The lack of continuity in time may prevent accurate estimation of the current state.

2.2 Dynamic Models:

2.2.1 State-Space Model:

In order to develop the equations for the linear filter, a suitable state-space model for the system is presented as follows:

$$J_t = F J_{t-1} + \eta_t \tag{6}$$

$$Y_t = K J_t + \varepsilon_t \tag{7}$$

where $\eta_t$ is a source white noise with mean zero and covariance matrix $Q = E(\eta_t \eta_t^T)$. Consider the autoregressive process AR(1) in Equation 6. This equation states that the current vector at time $t$ is equal to some constant matrix multiplied by the current vector at time $t-1$, plus noise. This adds the idea of temporal continuity to the system. If we let $F = I_{3N_v}$, where $I_x$ is an identity matrix of size $x$, Equation 6 becomes:

$$J_t = I_{3N_v} J_{t-1} + \eta_t \tag{8}$$

so that the current of a voxel $v$ at time $t$ is equal to its value at $t-1$ plus some random noise.

An additional constraint can be placed so that the voxel at time $t$ depends on its own value at $t-1$ and is proportional to the values of its neighbors at time $t-1$. To account for neighborhood interactions, we can rewrite Equation 8 as:

$$j(v,t) = A\, j(v,t-1) + B \sum_{v' \in Z(v)} j(v',t-1) + \eta(v,t) \tag{9}$$

where $A$ and $B$ are constants and $Z(v)$ contains the neighbors of voxel $v$. The summation is over all the neighbors of $v$. The neighbors can be easily found by defining a spatial radius and determining which voxels fall within that radius.
The summed neighbor values are scaled by the factor $B$, which averages them by dividing the total by the number of neighbors. A different method is used to select the parameter $A$. Since we want the value of voxel $v$ at time $t-1$ to be emphasized more than the values of the neighbors of voxel $v$ at $t-1$, we initially set $A$ to 1 (so that the $A$ terms form the identity $I_{3N_v}$) and then rescale so that the summation over the entire row for voxel $v$ is 1. Consider the following two-dimensional spatial representation of a set of voxels.

Figure 6: Spatial map of a 2-D cross-section of the brain. There are 40 voxels. The following example assumes voxel 18 is being observed, with neighbors 11, 17, 19 and 26.

Assume that this is how the brain looks on a cross-section perpendicular to the z-axis. There are a total of 40 voxels, each numbered in the spatial configuration above. Suppose we are looking at voxel 18. The neighborhood of voxel 18, $Z(18)$, is $[11, 17, 19, 26]$: two neighbors in the x-direction and two in the y-direction. According to Equation 9, row 18 of the $J$ vector is

$$j(18,t) = A\, j(18,t-1) + B\,\big(j(11,t-1) + j(17,t-1) + j(19,t-1) + j(26,t-1)\big) + \eta(18,t)$$

The parameter $B$ is initially 0.25 since there are 4 neighbors, and the parameter $A$ is initially 1. The entire row is then normalized so that it sums to 1: the initial weights sum to $1 + 4 \times 0.25 = 2$, so after normalization $A = 0.5$ and $B = 0.125$. The result is 0.5 times voxel 18's previous value, plus 0.125 times voxel 11's previous value, plus 0.125 times voxel 17's previous value, and so on.

Since we are adding up quantities of the same dimension, it is possible to combine all the information about voxel $v$ and its neighbors, $Z(v)$, into a single matrix, $F$. In the example above, the 18th row of $F$, corresponding to voxel 18, is as follows:

Column:   1 ...   11    17    18    19    26   ... 40
Row 18:   0 ...   1/8   1/8   1/2   1/8   1/8  ... 0

Figure 7: Portion of the F matrix corresponding to voxel 18.
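The row construction described above can be sketched in a few lines of NumPy. This is an illustrative sketch under simplifying assumptions: each voxel is treated as a single scalar state (rather than three current components), the neighbor lists are supplied directly instead of being found with a spatial radius, and the names `build_transition_matrix` and `neighbors` are hypothetical. Indices are 0-based, so voxel 18 corresponds to row 17.

```python
import numpy as np

def build_transition_matrix(neighbors, n_voxels):
    """Build F with A = 1 on the diagonal and B = 1/|Z(v)| on neighbor columns,
    then normalize each row so that it sums to 1.

    neighbors: dict mapping a 0-based voxel index to a list of 0-based
               neighbor indices; voxels not listed have no neighbors.
    """
    F = np.zeros((n_voxels, n_voxels))
    for v in range(n_voxels):
        nbrs = neighbors.get(v, [])
        F[v, v] = 1.0                       # A before normalization
        if nbrs:
            F[v, nbrs] = 1.0 / len(nbrs)    # B before normalization
        F[v] /= F[v].sum()                  # row now sums to 1
    return F

# Voxel 18 with neighbors 11, 17, 19 and 26 (shifted to 0-based indices).
neighbors = {17: [10, 16, 18, 25]}
F = build_transition_matrix(neighbors, n_voxels=40)
row_18 = F[17]   # 0.5 on the diagonal, 0.125 on each of the four neighbors
```

For a realistic number of voxels most entries of $F$ are zero, so a sparse matrix type would likely be preferable to the dense array used in this sketch.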
This is an example of how $F$ is configured. For all other columns in row 18, the values are 0 because there are no neighborhood interactions to consider. Note that the diagonal of $F$ will always have a nonzero value because it refers to the voxel's own previous value at time $t-1$.

Figure 8: Image of F, with dimensions $N_v \times N_v$, showing the neighborhood interactions between all voxels.

On a full scale of $N_v = 2697$ voxels, the $F$ matrix looks like the above image captured from Matlab. The diagonal is apparent, and the banded lines show the neighbors. There are smaller bands around the diagonal that cannot be seen in the figure, since voxel $v$ will generally have $v+1$ and $v-1$ as neighbors.

There are many ways to implement the state-space model. One could use an AR(2) model, which would look at voxels not only at $t$ and $t-1$, but also at $t-2$. Additionally, we can expand the neighborhood by increasing the radius of possible neighboring voxels. There are other physiological variables we might be able to model inside the generalized state-space model, but this model serves as a good starting point for the dynamic model derivation.

2.2.2 Kalman Filtering:

In 1960, R.E. Kalman published his famous paper describing a set of mathematical equations that provides an efficient computational (recursive) method of estimating the state of a process. The Kalman filter is a powerful tool because it supports estimation of past, present, and even future states and can model these states without exact knowledge of the system. Since that paper was published, the Kalman filter has been applied to a wide range of filtering and estimation problems, from navigation control to forecasting with atmospheric models. In this paper, we develop a set of Kalman filter equations for the purpose of estimating the location and strength of current sources in the brain.
The Kalman filter can be derived directly from Bayes' rule, and a probabilistic approach to deriving the filter equations is provided in the appendix.

The Kalman filter is formulated as follows. Given the state-space model described in Equations 6 and 7:

$$J_t = F J_{t-1} + \eta_t$$

$$Y_t = K J_t + \varepsilon_t$$

in which the state noise, $\eta_t$, is a white Gaussian noise with covariance matrix $Q$, and the measurement noise, $\varepsilon_t$, is also a white Gaussian noise, with covariance matrix $R$ (the measurement noise covariance written $\Sigma_\varepsilon$ in the static models). We would like to formulate the estimation algorithm to satisfy the following statistical conditions:

1. The expected value of our estimate is equal to the expected value of the state; i.e., on average, our estimate of the state will equal the true state.
2. Compared with all other estimation algorithms, this algorithm minimizes the expected value of the mean squared error; i.e., on average, the algorithm gives the smallest possible estimation error.

The Kalman filter satisfies these two criteria. The filter estimates the state by using feedback control. First the filter estimates the value of the state at time $t$, and then it receives feedback from the measurements (including the noise). The Kalman filter equations fall into two categories: time update (prediction) equations and measurement update (filter) equations. The time update equations use knowledge of the temporal continuity to project the current state and error covariance forward in time, giving the a priori estimates for the next time step. The measurement update equations provide feedback, so that the new measurement is incorporated into the estimate, allowing the a priori estimate to improve to an a posteriori estimate.

The sequence of events in Kalman filtering is as follows. Start with the initial conditions $J_{0|0}$ and $V_{0|0}$, where $V$ is the error covariance of the state estimate $J$. The first step is to predict the next state, $J_1$. Predicting the next state $J_1$ is actually predicting the state $J_1$ given the state $J_0$.
Thus, the actual computation is estimating the state $J_{1|0}$. Then a new measurement, $Y_1$, is received and is used in a filtering stage to compute the value of the state at time $t = 1$. The result is represented by $J_{1|1}$. The process is recursive, yielding $J_{2|1}$ in the next step followed by $J_{2|2}$ at the filtering stage. The iterations continue until $J_{T|T}$ is computed, where $T$ is the number of time points. The Kalman filter equations are presented below:

Prediction (Time-Update)

$$J_{t|t-1} = F J_{t-1|t-1} \tag{10}$$

$$V_{t|t-1} = F V_{t-1|t-1} F^T + Q \tag{11}$$

Notice how in the time-update Equation 10 a new a priori estimate for the state is predicted: $J_{t|t-1}$. The prediction happens through the use of the $F$ matrix, which gives a new estimate of the state. The state error covariance, $V_t$, is also predicted forward from the previous time point to give an a priori estimate, $V_{t|t-1}$.

Filter (Measurement-Update)

$$G_t = V_{t|t-1} K^T \big(K V_{t|t-1} K^T + R\big)^{-1} \tag{12}$$

$$J_{t|t} = J_{t|t-1} + G_t \big(Y_t - K J_{t|t-1}\big) \tag{13}$$

$$V_{t|t} = (I - G_t K)\, V_{t|t-1} \tag{14}$$

The measurement-update equations first compute the Kalman gain, $G_t$. Equation 13 uses the calculated gain to get an a posteriori estimate of the state. The term $Y_t - K J_{t|t-1}$ is the residual, which measures the difference between the new incoming measurement and the prediction of that measurement from the a priori estimate. The gain is chosen to minimize the a posteriori error covariance, and it determines how much weight to give the residual. Finally, Equation 14 determines the a posteriori error covariance.

The Kalman filter can be viewed in the same framework used when working with the least-squares problem. Recall the least-squares solution with regularization from Equation 4:

$$\hat{J} = Hp + \Sigma_R K' \big(K \Sigma_R K' + \lambda \Sigma_\varepsilon\big)^{-1} \big[\,Y - K Hp\,\big]$$

Recall that $Hp$ was the mean of the state, and that in the regularized least-squares solution we typically set this value to 0, implying that at every new time point the a priori mean for the state is 0. Because the Kalman filter is a dynamic process, the a priori mean is instead given by Equation 10.
So if we were to assume that $Hp = J_{t|t-1}$, then we could set up a different cost function to minimize:

$$f(J_t) = \| Y_t - K J_t \|^2_{\Sigma_\varepsilon^{-1}} + \| J_t - F J_{t-1|t-1} \|^2_{V_{t|t-1}^{-1}}$$

and arrive at a new estimate:

$$\hat{J}_t = J_{t|t-1} + V_{t|t-1} K' (K V_{t|t-1} K' + \Sigma_\varepsilon)^{-1} [Y_t - K J_{t|t-1}]$$

We can identify this solution as the Kalman filter measurement-update equation that calculates the a posteriori estimate:

$$\hat{J}_t = \underbrace{J_{t|t-1}}_{\text{a priori estimate}} + \underbrace{V_{t|t-1} K' (K V_{t|t-1} K' + \Sigma_\varepsilon)^{-1}}_{\text{Kalman gain}} \underbrace{[Y_t - K J_{t|t-1}]}_{\text{residual}}$$

Thus, by using a similar framework for the static and dynamic solutions, it is easy to see how the solutions differ.

2.2.3 Fixed Interval (backwards) smoother:

The Kalman filter framework provides an accurate way to estimate a given state $J(v,t)$ at time $t$. Since this is a recursive algorithm, the estimate at time $t$ has taken all previous time points into consideration. Thus, an alternate way to write the estimate is $J(v, t \mid t = 0 \ldots t-1)$. In an MEG experiment, the data of every single trial are recorded and averaged over the entire epoch. Because all of the MEG measurements are therefore available before estimation begins, the estimate of a state can be improved by using not only the previous time points but future time points as well. This filter is known as the fixed-interval or backwards smoother. It provides an estimate of $J(v, t \mid t = 0 \ldots T-1)$, where $T$ is the number of time points. The backwards smoother can be derived using a statistical framework, which is given in the Appendix. The equations for the backwards smoother are as follows:

Gain (Correction term)

$$A_t = V_{t|t} F^T V_{t+1|t}^{-1} \tag{15}$$

Smoothed estimate

$$J_{t|T} = J_{t|t} + A_t (J_{t+1|T} - J_{t+1|t}) \tag{16}$$

$$V_{t|T} = V_{t|t} + A_t (V_{t+1|T} - V_{t+1|t}) A_t^T \tag{17}$$

The process is similar to the Kalman filter. The gain, $A_t$, is calculated first; it determines the weight placed on the residual $J_{t+1|T} - J_{t+1|t}$. The smoothed estimate $J_{t|T}$ adjusts the Kalman filter estimate $J_{t|t}$ by adding or subtracting the weighted residual. This smoother requires that the error covariances be saved for all time points; since each covariance is an $N_v \times N_v$ matrix, this can become a computational issue if $N_v$ grows very large.
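As an illustration of how the prediction, filter, and smoothing recursions of Equations 10 to 17 fit together, the following is a minimal NumPy sketch, not the implementation used in this thesis. It assumes time runs along the first array axis (arrays of shape T x Nv rather than Nv x T), takes the smoother gain to be the standard fixed-interval form $A_t = V_{t|t} F^T V_{t+1|t}^{-1}$, and uses a tiny toy system whose dimensions and values of F, K, Q, and R are hypothetical.

```python
import numpy as np

def kalman_filter(Y, F, K, Q, R, J0, V0):
    """Forward pass: Y has shape (T, Nc); returns predicted and filtered moments."""
    T, n = Y.shape[0], J0.shape[0]
    J_pred, J_filt = np.zeros((T, n)), np.zeros((T, n))
    V_pred, V_filt = np.zeros((T, n, n)), np.zeros((T, n, n))
    J_prev, V_prev = J0, V0
    for t in range(T):
        # Prediction (time update), Equations 10 and 11
        J_pred[t] = F @ J_prev
        V_pred[t] = F @ V_prev @ F.T + Q
        # Kalman gain, Equation 12
        S = K @ V_pred[t] @ K.T + R
        G = V_pred[t] @ K.T @ np.linalg.inv(S)
        # Measurement update, Equations 13 and 14
        J_filt[t] = J_pred[t] + G @ (Y[t] - K @ J_pred[t])
        V_filt[t] = (np.eye(n) - G @ K) @ V_pred[t]
        J_prev, V_prev = J_filt[t], V_filt[t]
    return J_pred, V_pred, J_filt, V_filt

def fixed_interval_smoother(F, J_pred, V_pred, J_filt, V_filt):
    """Backward pass over the stored filter output, Equations 15 to 17."""
    T = J_filt.shape[0]
    J_smooth, V_smooth = J_filt.copy(), V_filt.copy()  # last time point is unchanged
    for t in range(T - 2, -1, -1):
        A = V_filt[t] @ F.T @ np.linalg.inv(V_pred[t + 1])  # assumed standard gain
        J_smooth[t] = J_filt[t] + A @ (J_smooth[t + 1] - J_pred[t + 1])
        V_smooth[t] = V_filt[t] + A @ (V_smooth[t + 1] - V_pred[t + 1]) @ A.T
    return J_smooth, V_smooth

# Toy system (all values hypothetical): 2 sources, 3 channels, 50 time points.
rng = np.random.default_rng(0)
n_src, n_chan, n_time = 2, 3, 50
F = 0.95 * np.eye(n_src)
K = rng.standard_normal((n_chan, n_src))
Q = 0.1 * np.eye(n_src)
R = 0.5 * np.eye(n_chan)

# Simulate states and measurements from the state-space model itself.
Y = np.zeros((n_time, n_chan))
state = np.zeros(n_src)
for t in range(n_time):
    state = F @ state + np.sqrt(0.1) * rng.standard_normal(n_src)
    Y[t] = K @ state + np.sqrt(0.5) * rng.standard_normal(n_chan)

J_pred, V_pred, J_filt, V_filt = kalman_filter(
    Y, F, K, Q, R, np.zeros(n_src), np.eye(n_src))
J_smooth, V_smooth = fixed_interval_smoother(F, J_pred, V_pred, J_filt, V_filt)
```

The forward pass stores every predicted and filtered covariance because the backward pass needs them, which is the storage cost noted above. Explicit matrix inverses are used only to mirror the equations; a linear solve would usually be preferred for large $N_v$.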
In this research, we will test three inverse solutions:

Static solution (least squares): $J(v, t \mid t = t)$
Dynamic solution (Kalman filter): $J(v, t \mid t = 0 \ldots t)$
Backwards smoother: $J(v, t \mid t = 0 \ldots T)$

Simulated brain signals will be placed into an area of voxels in the brain, and the MEG measurements will be generated through the forward model. Then each of the three inverse solutions will be used to recover the signal, and a mean squared error will be calculated.

3. Simulations:

3.1 Region of Interest:

In order to test the effectiveness of each inverse solution, we decided to run the models on simulated MEG data. Using proprietary software, we digitized the brain into voxels using a 7.5 mm spacing. Then we chose a region of interest. The region of interest for this simulation was a section of the left hemisphere of the brain:

Figure 9: Region of interest on inflated surface

This view of the brain is known as an inflated surface. The brain's cortex is highly convoluted, with ridges known as gyri and grooves known as sulci. The inflated surface places all convolutions on the same plane and inflates them to the surface. Thus, what is circled in green is a selection of voxels on the gyri and sulci of the brain (see figure below).

Figure 10: Region of interest on normal surface

The brain imaging software enables us to select a region of interest on the brain and compute a forward model on only those voxels that lie in that region. The entire cortex consists of more than 8,000 gray matter voxels that can emanate a magnetic field. By restricting the problem to a region of interest, we reduce the computational burden on the inverse solutions. Because the simulated signals are placed only in this region, there is no loss of information. The region of interest here selects 837 voxels. The forward model calculation automatically creates a lead-field matrix, $K$, for the region of interest. For this simulation, the number of sources is $N_v = 837$,
and the number of measurement channels is $N_c = 204$. Thus, $K$ has dimensions of 204 x 837.

3.2 Brain Signals:

In each of the 837 voxels, we place a simulated brain signal. Two simulated signals were developed. The first signal is a sine wave with a frequency of 10 Hz. We denote this signal as $S_1 = \sin(2\pi 10 t)$, where $t$ is an array of time points. The signal is depicted below:

Figure 11: Signal S1 to be placed in all voxels to produce JTrue

For this simulation, we have a 'True' array of voxels, which is called $J_{True}$. $S_1$ is placed into each voxel.

The other simulated signal is more physiologically accurate. The signal is a modulated sine wave with a mixture of 10 Hz and 20 Hz frequencies. Additionally, there are bursts of activation followed by silence. Here $S_2 = \sin(2\pi 10 t) + \sin(2\pi 20 t)$. A graph of the signal $S_2$ is provided below.

Figure 12: Signal S2 to be placed in all voxels to produce JTrue.

The $J_{True}$ signals for $S_1$ and $S_2$ have a mean of zero. Because of this, the least-squares solution recovers these signals well, since it assumes that the mean of the estimate is 0 at each time point. For this reason, we add a DC offset to the $J_{True}$ signals, since the actual signals produced in the brain do not have zero means. Thus $J_{True}$ becomes $J_{True}$ + offset. We vary the offset from 0 to 20.

3.3 Selecting Regularization Parameter $\lambda$

Recall from the inverse methods that the static least-squares solution in an underdetermined dipole model has a regularization parameter $\lambda$. This parameter controls how strongly the regularization penalty is weighted against the fit of the model to the data. Typically, values for $\lambda$ are between 0.01 and 1; however, most values fall within 0.1 to 1. In order to find an approximate value for $\lambda$, we varied $\lambda$ from 0.1 to 1 and computed the mean-squared error between the true signal $S_1$ and the least-squares solution given by Equation 4. The resulting mean-squared error was smallest when $\lambda$ was equal to 0.5. For the remainder of the results, this value of $\lambda$ is held constant.
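The signal construction and the sweep over $\lambda$ described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the simulation code of this thesis: the lead field is a random stand-in rather than one computed from a head model, the dimensions are much smaller than 204 x 837, the sampling rate and noise level are hypothetical, $S_2$ omits the burst modulation, and the static solution is assumed to take the zero-mean minimum-norm form $K'(KK' + \lambda I)^{-1} Y$ with identity covariances.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_chan = 80, 20            # small stand-ins for Nv = 837 and Nc = 204
t = np.arange(1000) / 1000.0      # 1 s at a hypothetical 1 kHz sampling rate

S1 = np.sin(2 * np.pi * 10 * t)                                # 10 Hz sine
S2 = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 20 * t)   # 10 + 20 Hz mixture

K = rng.standard_normal((n_chan, n_src))             # random stand-in lead field
noise = 0.1 * rng.standard_normal((n_chan, t.size))  # hypothetical sensor noise

def simulate(signal, offset):
    """Place the same signal (plus a DC offset) in every voxel and project it."""
    J_true = np.tile(signal + offset, (n_src, 1))    # Nv x T
    Y = K @ J_true + noise                           # Nc x T
    return J_true, Y

def static_solution(Y, K, lam):
    """Assumed zero-mean regularized least-squares estimate K'(KK' + lam I)^-1 Y."""
    gram = K @ K.T + lam * np.eye(K.shape[0])
    return K.T @ np.linalg.solve(gram, Y)

def mse(J_est, J_true):
    return float(np.mean((J_est - J_true) ** 2))

# Sweep lambda from 0.1 to 1 and record the MSE against the true S1 sources.
lambdas = np.linspace(0.1, 1.0, 10)
J_true, Y = simulate(S1, offset=0.0)
errors = [mse(static_solution(Y, K, lam), J_true) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(errors))]
print(f"lambda with smallest MSE in this toy setup: {best_lambda:.1f}")
```

Because the lead field and noise here are synthetic, the value of $\lambda$ selected by this sketch need not match the 0.5 reported above; the point is only the shape of the procedure. Calling `simulate(S1, offset=20.0)` gives the offset variant of the same experiment.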
The format of the simulation is as follows:

1. Place signal S1 into all voxels to produce an Nv x T matrix JTrue.
2. Use the forward model to produce an Nc x T measurement matrix, Y.
3. Calculate the inverse solutions:
   a. JStatic for the regularized least-squares solution.
   b. JKalman for the Kalman filter solution.
   c. JSmooth for the fixed-interval smoother.
4. Calculate the mean-squared error (MSE) between each of the three inverse solutions and JTrue.
5. Vary the signal-to-noise ratio (SNR) and the offset values.
6. Perform the same analysis for signal S2.

4. Results

The first set of experiments focused on changing the offset value in order to determine what effect a non-zero mean has on the recovered signals. For the signal S1, the recovery for all solutions is shown in the following figure:

Figure 13: Inverse solution recovery for signal S1 at one voxel. Each plot shows a recovered signal for a given DC offset value. It is clear that as the offset increases, the dynamic solutions (Kalman and smoother) track and estimate the true signal better. The static solution is very noisy and fluctuates rapidly because of the prior assumption that the mean is 0.

The static solution does perform very well when there is no offset. As the offset gradually increases, the static solution fluctuates rapidly. The dynamic solutions do a much better job of tracking the signal despite the offset. This occurs because the dynamic mean enforces temporal continuity. For signal S2, a similar result occurs:

Figure 14: Inverse solution recovery for signal S2 at one voxel. As with S1, tracking improves when a dynamic inverse solution is used.

The SNR for the above simulations was 5. This is an experimentally suitable SNR for MEG data. The MSE for each signal under the different offset values was calculated for an SNR of 5. The results are shown below.
[Plot: "Signal S2: MSE with Variable Offset". Series: Static, Kalman, Smoother. Horizontal axis: Offset. Vertical axis: MSE.]

Figure 15: MSE versus Offset for signal S2. The MSE for the static solution increases significantly as the offset is increased.

The results above show that a non-zero mean has a significant effect on recovery performance. This is an important point because true brain signals do not always have zero mean. Recovery with the static solution becomes increasingly difficult as the mean departs from zero. The Kalman filter is significantly better than the static solution, and the smoother slightly improves on the Kalman filter estimates.

The SNR was then increased from 1 to 10, assuming a constant offset of 0. A graph of the effect of SNR on the recovered signal is shown below:

[Plot: "Signal S2: MSE with Variable SNR (offset = 0)". Series: Static, Kalman, Smoother. Horizontal axis: SNR (1, 2, 4, 5, 10). Vertical axis: MSE.]

Figure 16: MSE versus SNR with 0 offset.

The MSE of all inverse solutions follows a downward trend as the SNR is increased. At a high SNR (10), the static solution recovery is very similar to that of the dynamic solutions. MEG data are very noisy; a typical SNR for measurement data is around 5. At this SNR, the recovery still favors the dynamic solutions. If we assume an offset of 10, then the difference between the static and dynamic solutions is more significant. See the figure below.

[Plot: "Signal S2: MSE with Variable SNR (offset = 10)". Series: Static, Kalman, Smoother. Horizontal axis: SNR (1, 2, 4, 5, 10). Vertical axis: MSE.]

Figure 17: MSE versus SNR with offset of 10, demonstrating the importance of dynamic solutions.

5. Conclusions

The purpose of this report is to serve as a proof of principle for future research. The preliminary research here provides two important findings.
First, a dynamic solution to the inverse problem using spatio-temporal Kalman filtering results in a better overall recovery of the states. In neurophysiology, brain signals can be difficult to characterize, and the underlying activity is spontaneous and fluctuating. In order to simulate this environment, an offset was added to our simulated signals. The offset is an important variable in the inverse solutions because it forces the static solution to fluctuate constantly away from zero in order to find a solution. Secondly, the fixed-interval smoother allows for more detailed signal recovery. It is not certain whether the small improvement in performance can be clinically justified, but it merits further investigation.

In order to fully understand the benefits of dynamic inverse solutions, an analysis must be conducted on real data. Real data will provide the Y vector of measurements, on which we can then run the inverse solutions. This is the aim of our future research.

6. References

Galka, A., Yamashita, O., Ozaki, T., Biscay, R., Valdés-Sosa, P.A., 2004. A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering. NeuroImage 23: 435-453.

Mosher, J.C.; Leahy, R.M.; Lewis, P.S., "EEG and MEG: forward solutions for inverse methods", IEEE Transactions on Biomedical Engineering, Volume 46, Issue 3, March 1999, Page(s): 245-259.

M. Hämäläinen et al. (1993) Magnetoencephalography - theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics 65:413-497.

R. Hari and R. Salmelin (1997) Human cortical oscillations: a neuromagnetic view through the skull. Trends in Neurosciences 20:44-49.

S. J. Williamson and L. Kaufman, "Analysis of neuromagnetic signals," in A. S. Gevins and A. Remond (Eds.), Handbook of Electroencephalography and Clinical Neurophysiology: Volume 1. Methods of Analysis of Brain Electrical and Magnetic Signals (pp. 405-448).
New York: Elsevier, 1987.

G. Kitagawa and W. Gersch. Smoothness Priors Analysis of Time Series. Lecture Notes in Statistics #116, New York: Springer-Verlag, 1996.

Lecture