
Source Localization of MEG Generation Using Spatio-temporal Kalman Filter
by
Neil U Desai
B.S. Computer Science
Massachusetts Institute of Technology, 2004
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND
COMPUTER SCIENCE IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
MASTER OF ENGINEERING IN ELECTRICAL ENGINEERING AND
COMPUTER SCIENCE AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
JUNE 2005
Copyright © 2005 Neil U Desai. All rights reserved.
The author hereby grants to MIT permission to reproduce and to distribute publicly
paper and electronic copies of this thesis document in whole or in part.
Signature of Author
Department of Electrical Engineering and Computer Science
Certified by
Dr. Emery Brown
Director, Neuroscience Statistics Research Laboratory, MGH
Thesis Supervisor
Certified by
Dr. Stephen Burns
Harvard-MIT Division of Health Sciences and Technology
Thesis Advisor
Accepted by
Professor Arthur C. Smith
Chairman, Department Committee on Graduate Theses
Source Localization for Magnetoencephalography By Spatio-Temporal Kalman
Filtering and Fixed-Interval Smoothing
by
Neil U Desai
Submitted to the Department of Electrical Engineering and Computer Science on May
19, 2005, in partial fulfillment of the requirements for the Degree of Master of
Engineering in Electrical Engineering and Computer Science
ABSTRACT
The inverse problem for magnetoencephalography (MEG) involves estimating the
magnitude and location of sources inside the brain that give rise to the magnetic field
recorded on the scalp as subjects execute cognitive, motor and/or sensory tasks. Given
a forward model which describes how the signals emanate from the brain sources, a
standard approach for estimating the MEG sources from scalp measurements is to use
regularized least squares approaches such as LORETA, MNE, and VARETA. These
regularization methods impose a spatial constraint on the MEG inverse solution, yet
they do not consider the temporal dynamics inherent in the biophysics of the problem.
To address these issues, we present a state-space formulation of the MEG inverse
problem by specifying a state equation that describes temporal dynamics of the MEG
sources. Using a standard forward model system as the observation equation, we derive
spatio-temporal Kalman filter and fixed-interval smoothing algorithms for MEG source
localization. To compare the methods analytically, we present a Bayesian derivation of
the regularized least squares and Kalman filtering methods. This analysis reveals that
the estimates computed from the static methods bias the location of the sources toward
zero. We compare the static, Kalman filter and fixed-interval smoothing methods in a
simulated study of MEG data designed to emulate somatosensory MEG sources with
different signal-to-noise ratios (SNR) and mean offsets. The data were mixtures of
sinusoids with SNR ranging from 1 to 10 and mean offset ranging from 0 to 20. With
both decreasing SNR and increasing mean offset, the Kalman filter and the fixed-interval
smoothing methods gave uniformly more accurate estimates of source locations
in terms of mean squared error. Because the fixed-interval smoothing estimates were
based on all recorded measurements, they had uniformly lower mean squared errors
than the Kalman estimates. These results suggest that state-space models can offer a
more accurate approach to localizing brain sources from MEG recordings and that this
approach may enhance appreciably the use of MEG as a non-invasive tool for studying
brain function.
Thesis Supervisor: Emery N. Brown
Title: Director, Neuroscience Statistics Research Laboratory, MGH
Thesis Advisor: Stephen Burns
Title: Senior Lecturer, Harvard-MIT Division of Health Sciences and Technology
Acknowledgements
This research project was a collaborative effort. I would like to thank Dr. Emery Brown
for giving me the opportunity to work in his laboratory and for providing guidance
throughout the project. Dr. Brown also gave me the opportunity to meet with other
senior research scientists and to attend many meetings with colleagues. I would
also like to thank Dr. Chris Long, who helped me with the many signal processing and
mathematical challenges I encountered along the way. Finally, I would like to thank Dr. Matti
Hämäläinen, whose patience allowed me to understand the resources used to create
realistic brain signals and forward solutions.
Table of Contents
Introduction
Inverse Methods
Simulation
Results
Conclusions
References
1. Introduction:
1.1 MEG
Measurements of the electrical currents flowing within the neurons of the brain
can be used to aid in the diagnosis of many disorders, as well as to study basic brain
function. One way to observe these tiny electrical currents is to measure the magnetic
fields that they produce. Because magnetic fields pass through the skull and scalp as if
they were transparent, they can be non-invasively measured outside the head, using a
technique called magnetoencephalography, or MEG.
MEG has important advantages over other functional imaging techniques. First,
the temporal resolution is excellent, allowing researchers to monitor brain activity on the
millisecond timescale. Furthermore, MEG is completely non-invasive, requiring neither
strong external magnetic fields (as in fMRI) nor injections of radioactive substances (as in
PET).
The magnetic fields evoked by neurons in the brain are very weak;
they have strengths of 0.2 to 1 pT, less than a hundred-millionth of the earth's magnetic
field (Hämäläinen, 1993). However, with the use of a sensitive magnetic-field sensor,
the superconducting quantum interference device (SQUID), these small magnetic fields
can be measured. The physics of how the SQUID works is not relevant to this
research. Because the fields are so small, all MEG recordings must be conducted
in a special magnetically shielded room to reduce interference from larger ambient
magnetic fields. The patient sits in a chair and the MEG helmet is positioned directly
over the patient's head; see figure 1.
Figure 1: Patient situated under the MEG
Electroencephalography (EEG) is a similar non-invasive way to measure the
brain's electrical activity. EEG requires a cap in which electrodes are placed, and the cap
must be fitted securely on the patient's head to ensure accurate measurements. There are
two critical differences between EEG and MEG. First, EEG measures electric
potentials, while MEG measures the magnetic field. Secondly, MEG is insensitive to the
conductivities of the scalp, skull, and brain, which can affect EEG measurements.
Neurons communicate through the generation of electrical potentials, called
action potentials, which travel from neuron to neuron by traversing the length of
the neuronal axon. The propagation of these potentials depends on the flow of a small
quantity of ions down the center of the axon. Since ions have electrical charge, their
flow in one direction qualifies as an electrical current. Though this current is small, it
will generate a very small magnetic field looping around the neuronal axon. When a
number of neurons are simultaneously active, the magnetic field generated by these
neurons is strong enough to be recorded from the surface of the brain using the SQUID.
Magnetic fields appear whenever an electrical current flows. If the
electrical current has a net flow along a line (like a DC electrical current down a wire),
then circular magnetic fields appear in loops around that line, following the right-hand
rule of electromagnetic theory. The figure below shows a current flow, Jv, and the
associated magnetic field, B.
Figure 2: The current Jv produces a magnetic field, B, in loops around the current.
The magnetic field travels through the skull and scalp without distortion and is
sensed at the surface. Previous research (Hari and Salmelin, 1997) established
that the conductivities of the skull and scalp can be treated as uniform, allowing the
magnetic field to propagate without other parameters needing to be considered. By the
nature of the electric field, radial currents (sources oriented perpendicular to the cortex)
do not produce detectable magnetic fields, while tangential sources (parallel to the
cortex) do; see figure 3.

Figure 3: Only tangential currents (parallel to the cortex) produce a measurable
magnetic field (B ≠ 0). B is the induced magnetic field, and V is the electric potential
associated with a current source. The arrows parallel to the cortex emanate a detectable
magnetic field, while those perpendicular to the cortex do not.
The MEG system detects the magnetic field using the coils in the SQUID sensors. The coils are
configured into 306 MEG channels: 204 gradiometers and 102 magnetometers. The
gradiometers are less sensitive to distant noise sources than the magnetometers, and
when analyzing MEG data, the magnetometers are generally ignored. The coil
configurations of the channels are shown below:
Figure 4: Configuration of the coils on the MEG
Depending on how one samples the cortical surface, there are thousands of
sources within the brain that can elicit these magnetic fields. The fields these sources
produce propagate to the sensors according to Maxwell's equations and the Biot-Savart
law; this propagation is described by a suitable forward model.
The forward problem in MEG source localization is to predict the scalp
measurements (at the coils) produced by the active sources inside the brain. The
inverse problem, then, is to estimate the location and intensity of these sources given
only the measurements recorded at the scalp. The problem is ill-posed because the
number of measurements is significantly smaller than the number of sources.
1.2 Forward Problem:
Before discussing methods of inverse localization, it is important to understand
the forward model. This modeling depicts the relationship between the current sources
and the data coming from the MEG. It uses Maxwell's equations to predict how the
current sources inside the brain radiate to the scalp.
As described above, the current sources in the brain generate a current flow,
$J(r')$, where $r'$ is the location of the source. The total current flow, $J(r')$,
can be divided into primary, $J^p(r')$, and volume, $J^v(r')$, currents. The primary
currents reflect directly the activity of the corresponding neurons. These current flows
exist in a conductive medium, i.e., the brain tissue. The volume currents are produced in
other media, such as the skull and scalp. $J^v(r')$ results from the effect of the electric
field on outside charge carriers and represents currents of the macroscopic field
(Williamson and Kaufman, 1987):
$$J^v(r') = \sigma(r')\,E(r')$$
where $E$ is the electric field and $\sigma(r')$ is the conductivity, which we will here
assume to be isotropic. Under the quasi-static approximation of Maxwell's equations,
$$E = -\nabla V$$
where $V$ is the electric potential. The primary current flow $J^p(r')$ is the driving
force of the total current flow and can be defined at the macroscopic level as the current
that remains after the volume current is removed:
$$J(r') = J^p(r') + J^v(r') = J^p(r') - \sigma(r')\,\nabla V(r')$$
The total current density may not include any volume currents (for example, when
a current forms a closed loop), but every total current must include a component that is
directly associated with the primary current.
For MEG, it is the primary currents we are interested in, since they represent
areas of the brain that relate to sensory and motor processes (Mosher et al., 1999). We
will modify our current models to stress the importance of the primary current density.
If we assume a simple head model consisting of a single homogeneous sphere, or more
generally of contiguous regions each of constant conductivity, then the relationship
between a source inside the brain and a measurement point outside the brain is given by
the Biot-Savart law:
$$B(r) = \frac{\mu_0}{4\pi}\int_G \frac{J(r') \times (r - r')}{\|r - r'\|^3}\,dr'$$
where $B$ is the magnetic field, $r$ is the measurement location, $r'$ is the
location of the current source, and $G$ is the closed volume containing the source. After
distinguishing between primary and secondary current sources, this equation can be rewritten as:
$$B(r) = B_0(r) + \frac{\mu_0}{4\pi}\sum_{ij}(\sigma_i - \sigma_j)\int_{S_{ij}} V(r')\,\frac{(r - r')}{\|r - r'\|^3} \times dS'$$
where the summation is over all conductivity boundaries $S_{ij}$, and $B_0(r)$ is the
magnetic field resulting from the primary currents.
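To make the primary-current term $B_0(r)$ concrete, the following sketch evaluates the Biot-Savart expression for a single point dipole, for which the volume integral collapses to a single cross product. This is our illustration, not code from the thesis; the dipole moment and geometry are assumed values.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (T*m/A)

def primary_field(r, r_src, q):
    """Primary-current (Biot-Savart) field of a point current dipole.

    r     : (3,) sensor location (m)
    r_src : (3,) dipole location (m)
    q     : (3,) dipole moment (A*m)
    Returns B0(r) in tesla.
    """
    d = r - r_src
    return MU0 / (4.0 * np.pi) * np.cross(q, d) / np.linalg.norm(d) ** 3

# Example: a 10 nA*m tangential dipole 3 cm below a sensor.
b = primary_field(np.array([0.0, 0.0, 0.10]),
                  np.array([0.0, 0.0, 0.07]),
                  np.array([10e-9, 0.0, 0.0]))
print(b)  # magnitude on the order of a picotesla, consistent with the strengths quoted above
```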
The equations above apply to a forward model assuming a spherical
homogeneous head. In general, however, the forward problem must be solved
numerically for arbitrary head shapes. In this research, a boundary element method
(BEM) was used to solve the forward problem; the theory and mathematics
involved are very similar to those for the homogeneous sphere model. In the BEM, the
boundary surfaces are tessellated into thousands of small triangles (see figure below).
Figure 5: Under the BEM, the brain surface is divided into thousands of triangles.
The mathematics describing how a magnetic field is detected at a measurement point
is general; the equations and relationships can be adapted to fit any
particular model, such as the BEM. There are several other head models, such as the
three-shell sphere or the finite-element method (FEM), which take alternate parameters into
consideration. For the purposes of this report, we need not go into the specific
differences among head models; the BEM used in this research was computed with
current software tools.
2. Inverse Methods:
2.1 Problem Formulation:
Determining the strength and location of a neuronal current based on the MEG
scalp measurement data is an ill-conditioned inverse problem. After defining a model
for source activation, the measurement data is used to specify the parameters for the
model, revealing information about the strength of current in different parts of the brain.
The mathematical methods of estimation were first developed in the 18th and
early 19th centuries. Bayesian statistics says that, given the a priori probability
distribution and a measurement value, one can obtain the a posteriori distribution
describing the knowledge about possible values of the parameters. In 1777, Bernoulli
described the method of maximum likelihood, stating that the parameter value should be
chosen that makes the observed values most probable. Finally, Gauss invented the
method of least squares, and current estimation methods solve inverse problems using a
regularized least-squares approach or by maximizing the a posteriori probability
distribution. In the 1960s, Kalman published an efficient method of updating parameter
estimates given incoming measurement data, which is known as the Kalman filter.
The MEG inverse problem, as with many other inverse problems, is ill-posed;
i.e., there may exist many combinations of parameters that produce the same
measurements, so the solution is non-unique. We will examine the evolution of methods
used in source localization. Before we continue, it is important to set up the problem in
terms of variables.
Since most biomedical inverse solutions use linear algebraic formulations, a
matrix setup is a good framework to follow. Call the measurement
(observation) data vector Y and the current source vector J. The vector Y encompasses
the measurements from all 204 channels:
$$Y(t) = (y(1,t),\, y(2,t),\, \ldots,\, y(N_c,t))'$$
where $t = 1, \ldots, T$ and $'$ denotes transposition. The full measurement matrix has
dimension $N_c \times T$, where $N_c$ is the number of channels and $T$ is the number
of time points. The vector J contains the sources, which in this research correspond to
gray matter voxels. At each voxel, there is a local three-dimensional current vector:
$$j(v,t) = (j_x(v,t),\, j_y(v,t),\, j_z(v,t))'$$
where $v$ is the voxel label. The column vector stacking the current vectors of all voxels is
denoted by:
$$J(t) = (j(1,t)',\, j(2,t)',\, \ldots,\, j(N_v,t)')'$$
where $N_v$ is the number of voxels. Thus, $J(t)$ has dimension $3N_v$, and the full
source matrix has dimension $3N_v \times T$.
The problem is ill-posed because $N_c \ll N_v$. Depending on how the brain is
divided into discrete segments, $N_v$ can vary from 400 to 4,000, each voxel having an x-,
y-, and z-current density. It is difficult to estimate the strength and location of all $3N_v$
current components given only the 204 channel measurements on the scalp. This leads to the
non-uniqueness problem described above, since $N_c \ll 3N_v$.
The forward model, explained in section 1.2, encodes all of its information
about source propagation and current density in a matrix known as the lead-field
matrix K, which maps the intracranial neuronal currents to the extracranial MEG
signals. Each row of K represents a channel, and the values in that row are the weights
with which each voxel contributes to that channel. K also acts as an order-reducing
transformation, mapping the problem from the source space ($3N_v \times T$) to the
measurement space ($N_c \times T$), which is significantly smaller. Thus, K has
dimensions $N_c \times 3N_v$, or $N_c \times N_v$ when a fixed orientation is assumed
at each voxel.
The linearity of the lead field allows the following relationship:
$$Y_t = KJ_t$$
The above model is easily extended to multiple time samples, creating a
spatio-temporal model. Additionally, there exists observation noise, $\epsilon_t$, an
additive random element. Thus, the equation above evolves into:
$$Y_t = KJ_t + \epsilon_t$$
Having stated the problem in terms of variables, we can now examine the inverse
methods.
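As a quick illustration of the dimensions involved, the following sketch simulates the observation equation with assumed sizes ($N_c = 204$ channels, an arbitrary $N_v$ and $T$, and a random stand-in for the lead field; in practice K comes from the forward model):

```python
import numpy as np

rng = np.random.default_rng(0)
Nc, Nv, T = 204, 837, 100            # channels, voxels, time points (assumed)
K = rng.standard_normal((Nc, Nv))    # stand-in lead field; really from the forward model
J = rng.standard_normal((Nv, T))     # source currents, one column per time point

# Observation equation Y_t = K J_t + eps_t, applied to all T columns at once.
Y = K @ J + 0.1 * rng.standard_normal((Nc, T))
print(Y.shape)                       # (204, 100): Nc x T, far smaller than the source space
```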
The overdetermined and underdetermined dipole models are both
instantaneous solutions, i.e., they do not vary with time and estimate the current
densities at each instant of time independently. The dynamic solutions use previous
time states and a new measurement to provide an a posteriori estimate of the state. This
enables the estimate to have spatio-temporal characteristics.
2.2 Static Models:
2.2.1 Overdetermined dipole model (least squares):
The most basic model for MEG is the overdetermined dipole model. Here, the
basic strategy is to fix the number of candidate sources and use a nonlinear estimation
algorithm to minimize the squared error between the actual data and the estimated value
calculated from the forward model. The assumption is that the activated source area is
small compared to the distance to the measurement channels, so the measured magnetic
field resembles that generated by a current dipole. The setup for this situation is as
follows:
$$Y = K(r, q)\,J + \epsilon$$
where $K(r, q)$ represents the signals generated by unit dipoles with given locations
$r = (x_1, y_1, z_1, \ldots, x_{N_v}, y_{N_v}, z_{N_v})$ and orientations
$q = (q_{1x}, q_{1y}, q_{1z}, \ldots, q_{N_vx}, q_{N_vy}, q_{N_vz})$. The
observation noise, $\epsilon$, has zero mean and covariance matrix $\Sigma_\epsilon$.
The maximum likelihood estimate of the dipole parameters $(r, q)$ can be calculated by
minimizing the least-squares cost function:
$$E(r, q, J) = \|Y - K(r, q)J\|_F^2$$
where the square of the Frobenius norm, $\|X\|_F^2$, is the sum of the squares of the
elements of the matrix. Taking the noise covariance, $\Sigma_\epsilon$, into account,
the least-squares cost function becomes:
$$E(r, q, J) = (Y - K(r, q)J)'\,\Sigma_\epsilon^{-1}\,(Y - K(r, q)J)$$
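Minimizing this cost requires a nonlinear search over the dipole locations and orientations, since $K(r, q)$ must be recomputed from the forward model for each candidate. The sketch below (our illustration; the names are assumptions) evaluates the weighted cost for one candidate configuration, which is the inner step any such optimizer repeats:

```python
import numpy as np

def dipole_cost(Y, K_rq, J, sigma_eps_inv):
    """Weighted least-squares cost E(r, q, J) for one candidate configuration.

    Y             : (Nc, T) measured data
    K_rq          : (Nc, Np) forward gain for the candidate locations/orientations
    J             : (Np, T) dipole amplitudes
    sigma_eps_inv : (Nc, Nc) inverse noise covariance
    """
    resid = Y - K_rq @ J
    # Sum of (Y_t - K J_t)' Sigma^{-1} (Y_t - K J_t) over all time points t.
    return float(np.trace(resid.T @ sigma_eps_inv @ resid))
```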
The MEG sensors are extremely sensitive, intercepting noise from all active
fields in the room. The noise distribution of the measurements may therefore include
some non-Gaussian contributions, but on average the noise settles upon a normal
distribution.
The problem with this model is that true source areas are not single, point-like
entities. Previous research has shown that an area of 1 cm² or larger can produce
detectable signals (Hämäläinen, 1993). Thus, the estimated source
locations will not coincide exactly with the true locations, but will lie close to the
general area of activation.
In implementing the least-squares method, the most significant variable is the
number of voxels to use, $N_v$. If one uses too many voxels, then the
sources can be made to fit any data set, regardless of quality. However, if too few
voxels are used, the intensity of the current dipole might not be strong enough to be
detected at the sensor locations. Furthermore, as the number of voxels increases,
minimization of the cost function is more likely to become trapped in undesirable local minima.
The solution to the problem is tractable because the number of dipole sources is
much less than the number of channels. This is different when using the
underdetermined dipole model in the next section.
2.2.2 Underdetermined dipole model (least squares with regularization):
The previous least-squares solution is the simplest case and serves as a
foundation for deriving more sophisticated inverse algorithms. In the underdetermined
dipole model, the dipole locations are predefined over the source volume. Here, the
number of voxels, $N_v$, is greater than the number of measurements, $N_c$. If we
observe the system at a single time point, we can set up a Gaussian linear model as
follows:
$$Y_t = KJ_t + \epsilon_t \qquad (1)$$
$$J_t = H\beta_t + R_t \qquad (2)$$
where J contains the voxel strengths and $\epsilon$ is the measurement noise, distributed
as a Gaussian random vector with mean zero and covariance matrix $\Sigma_\epsilon$.
Similarly, J has an additive noise element R, which has zero mean and covariance
matrix $\Sigma_R$.
The underdetermined problem results in minimizing the following cost function:
$$f(J) = \|Y - KJ\|^2_{\Sigma_\epsilon^{-1}} + \lambda\,\|J - H\beta\|^2_{\Sigma_R^{-1}} \qquad (3)$$
where $\lambda$ is a regularization parameter that trades off the measurement error
against the prior information. If $\lambda$ is small, the estimate must fit the data
closely, so that the expression governing the fitness of the model, $\|Y - KJ\|$, is
minimal. Conversely, if $\lambda$ is large, the solution is drawn toward the prior mean
and the model will not predict the data as closely. Thus, $\lambda$ penalizes departures
from the prior.
The corresponding least-squares error function is:
$$f(J) = (Y - KJ)'\,\Sigma_\epsilon^{-1}\,(Y - KJ) + \lambda\,(J - H\beta)'\,\Sigma_R^{-1}\,(J - H\beta)$$
The first term, $(Y - KJ)'\Sigma_\epsilon^{-1}(Y - KJ)$, is the same as the least-squares
cost in the overdetermined method. The second term,
$\lambda(J - H\beta)'\Sigma_R^{-1}(J - H\beta)$, carries information about the mean of
the prior and the covariance structure of the source noise, which makes the solution
identifiable. The minimizer of the error function is:
$$\hat{J} = H\beta + \Sigma_R K'\,(K\Sigma_R K' + \lambda\Sigma_\epsilon)^{-1}\,[Y - KH\beta] \qquad (4)$$
where $\hat{J}$ is the estimate of J.
This form brings about some interesting insights. The solutions are instantaneous
(static) because J is computed from a single snapshot in time. This poses an issue
because there is no continuity constraint on the temporal characteristics of the solution;
i.e., the estimate of the source distribution at time $t = t'$ depends only on the
measurement taken at time $t = t'$.
Another insight is that both static least-squares models are the special case
in which the prior distribution on the sources is Gaussian with $H\beta = 0$.
This reduces Equation 4 to:
$$\hat{J} = \Sigma_R K'\,(K\Sigma_R K' + \lambda\Sigma_\epsilon)^{-1}\,Y \qquad (5)$$
which is the standard form of the least-squares solution with regularization. Currently,
this solution is widely used and adapted to include other physiological constraints, but
for the purpose of this paper we will compare all subsequent estimates to this static or
'naive' estimate. A sketch of this computation is given below.
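The following is a minimal sketch of Equation 5, assuming the covariance matrices and the regularization parameter are given; the function and variable names are ours. Only an $N_c \times N_c$ system is solved, which is what makes the static estimate computationally cheap.

```python
import numpy as np

def static_estimate(Y, K, sigma_R, sigma_eps, lam):
    """Regularized least squares with zero prior mean (Equation 5).

    Y         : (Nc, T) measurements
    K         : (Nc, Nv) lead field
    sigma_R   : (Nv, Nv) source covariance
    sigma_eps : (Nc, Nc) measurement noise covariance
    lam       : regularization parameter
    """
    G = K @ sigma_R @ K.T + lam * sigma_eps        # (Nc, Nc) system matrix
    # J_hat = Sigma_R K' (K Sigma_R K' + lam Sigma_eps)^{-1} Y, via a linear solve.
    return sigma_R @ K.T @ np.linalg.solve(G, Y)
```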
A major drawback of this model is that, since the prior guess of the mean at
every time point is 0, each solution discards whatever knowledge the previous time
points provided. This lack of continuity in time may prevent accurate estimation of the
current state.
2.3 Dynamic Models:
2.3.1 State-Space Model:
In order to develop the equations for the linear filter, a suitable state-space
model for the system is presented as follows:
$$J_t = FJ_{t-1} + \eta_t \qquad (6)$$
$$Y_t = KJ_t + \epsilon_t \qquad (7)$$
where $\eta_t$ is a source white noise with mean zero and covariance matrix
$Q = E(\eta_t\eta_t')$. Consider the autoregressive process AR(1) in Equation 6.
This equation states that the current vector at time t equals a constant matrix multiplied
by the current vector at time t − 1, which adds the idea of temporal continuity to the
system. If we let $F = I_{3N_v}$, where $I_x$ is an identity matrix of size x, Equation 6
becomes:
$$J_t = I_{3N_v}\,J_{t-1} + \eta_t \qquad (8)$$
so that the current estimate of a voxel v at time t is equal to its value at t − 1 plus some
random noise. An additional constraint can be placed so that the voxel at time t is equal
to its value at t − 1 plus a term proportional to its neighbors at time t − 1. To account for
these neighborhood interactions, we can rewrite Equation 8 as:
$$J(v,t) = A\,J(v, t-1) + B \sum_{v' \in Z(v)} J(v', t-1) + \eta_t \qquad (9)$$
where A and B are constants and Z(v) contains the neighbors of voxel v. The
summation is over all the neighbors of v, which can easily be found by defining
a spatial radius and determining which voxels fall within it. The neighbor values
for a voxel are scaled by the factor B, which averages them by dividing the total
sum by the number of neighbors. A different method is used to select the parameter A.
Since we want the value of voxel v itself at time t − 1 to be emphasized more than the values
of its neighbors at t − 1, we set A to 1 and then scale the entire row of the
transition matrix so that it sums to 1. Consider the
following 2-D spatial representation for a set of voxels.
Figure 6: Spatial map of a 2-D cross-section of the brain containing 40 voxels. The following example
assumes voxel 18 is being observed, with neighbors 11, 17, 19, and 26.
Assume this is how the brain looks on a cross-section perpendicular to the z-axis. There
are a total of 40 voxels, each numbered in the spatial configuration above. Let us assume
we are looking at voxel 18 for the moment. The neighborhood of voxels, Z(18), is [11,
17, 19, 26]: two in the x-direction and two in the y-direction. According to Equation 9,
row 18 of the state equation reads:
$$J(18,t) = A\,J(18,t-1) + B\,(J(11,t-1) + J(17,t-1) + J(19,t-1) + J(26,t-1)) + \eta(t)$$
The parameter B will initially be 0.25, since there are 4 neighbors, and the parameter A
will be set to 1. The entire row is then normalized so that it sums to 1; thus the
parameter B becomes 0.125 and the parameter A becomes 0.5. This way, the sum is
0.5 times voxel 18's previous value, plus 0.125 times voxel 11's previous value, plus
0.125 times voxel 17's previous value, and so on.
Since we are adding up quantities of the same dimension, it is possible to combine all the
information about each voxel v and its neighbors, Z(v), into a single matrix, F. In the
example provided above, the 18th row of F, corresponding to voxel 18, has the value
1/2 in column 18, the value 1/8 in columns 11, 17, 19, and 26, and zeros elsewhere.

Figure 7: Portion of the F matrix corresponding to voxel 18.
This is an example of how F is configured. All other columns in row 18 are 0
because there are no other neighborhood interactions to consider. Note that the
diagonal of F always has a nonzero value, because it refers to the voxel's own previous
value at time t − 1.
22
iijbi, =
V kvtnmml
- -
-
500-
1000-
2000-
2500
.
500
1500
1000
2000
2500
Figure 8: Image of F, dimensions of Nv x Nv, showing the neighborhood interactions between all
voxels.
On a full scale of Nv = 2697 voxels, the F matrix looks like the above image captured
from Matlab. The diagonals are apparent, and the banded lines show the neighbors.
There are smaller bands around the diagonals that cannot be seen in the figure, since
voxel v generally will have as neighbors' v + 1 and v - 1.
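The following sketch shows how such an F could be assembled from voxel coordinates, following the rule described above (start from the identity, give each neighbor an averaging weight B, then normalize each row to sum to 1). The dense-matrix implementation and the function name are our assumptions; at full scale a sparse representation would be preferable.

```python
import numpy as np

def build_F(coords, radius):
    """Build the state-transition matrix F from voxel coordinates.

    coords : (Nv, 3) voxel positions
    radius : spatial radius defining the neighborhood Z(v)
    Each row weights the voxel's own previous value (A) and its neighbors (B),
    normalized so that the row sums to 1.
    """
    Nv = coords.shape[0]
    F = np.eye(Nv)                              # A = 1 on the diagonal to start
    for v in range(Nv):
        d = np.linalg.norm(coords - coords[v], axis=1)
        nbrs = np.where((d > 0) & (d <= radius))[0]
        if nbrs.size:
            F[v, nbrs] = 1.0 / nbrs.size        # B averages over the neighbors
        F[v] /= F[v].sum()                      # normalize the row to sum to 1
    return F

# For a voxel with 4 neighbors this yields 0.5 on the diagonal and 0.125 per
# neighbor, matching the voxel-18 example above.
```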
There are many ways to implement the state-space model. One could use an
AR(2) process, which would model the state at time t in terms of its values at both
t − 1 and t − 2. Additionally, we could expand the neighborhood space by increasing the
radius of possible neighboring voxels. There are other physiological variables we might
be able to model inside the generalized state-space framework, but this model serves as a
good starting point from which to continue the dynamic model derivation.
2.3.2 Kalman Filtering:
In 1960, R.E. Kalman published his famous paper describing a set of
mathematical equations that provides an efficient computational (recursive) method of
estimating the state of a process. The Kalman filter is a powerful tool because it
supports estimation of past, present, and even future states and can model these states
without exact knowledge of the system.
Since that paper was published, the Kalman filter has been applied to filtering
and estimation problems ranging from navigation control to forecasting with
atmospheric models. In this paper, we develop a set of Kalman filter equations for
estimating the location and strength of current sources in the brain.
The Kalman filter can be derived directly from Bayes' rule, and a probabilistic
derivation of the filter equations is provided in the appendix. The Kalman filter
is formulated as follows. Given the state-space model described in Equations 6 and 7:
$$J_t = FJ_{t-1} + \eta_t$$
$$Y_t = KJ_t + \epsilon_t$$
in which the state noise, $\eta_t$, is white Gaussian noise with covariance matrix Q, and
the measurement noise, $\epsilon_t$, is also white Gaussian noise, with covariance matrix R,
we would like the estimation algorithm to satisfy the following statistical
conditions:
1. The expected value of our estimate is equal to the expected value of the
state; i.e., on average, our estimate of the state will equal the true state.
2. Among all estimation algorithms, this algorithm minimizes the expected
value of the squared estimation error; i.e., on average, the algorithm
gives the smallest possible estimation error.
The Kalman filter satisfies these two criteria. The filter estimates the state using a
form of feedback control: it first predicts the value of the state at time t, and then it
receives feedback from the (noisy) measurements. The Kalman filter
equations accordingly fall into two categories: time-update (prediction) equations and
measurement-update (filter) equations. The time-update equations use knowledge of the
temporal continuity to project the current state and error covariance forward in time,
giving the a priori estimates for the next time step. The measurement-update equations
provide the feedback, incorporating the new measurement so that the a priori estimate
improves to an a posteriori estimate.
The sequence of events in Kalman filtering is as follows. Start with the initial
conditions $J_{0|0}$ and $V_{0|0}$, where V is the error covariance of the state estimate
J. The first step is to predict the next state: predicting $J_1$ actually means predicting
the state at time 1 given the state at time 0, so the computation estimates $J_{1|0}$. Then
a new measurement, $Y_1$, is received and used in a filtering stage to compute the value
of the state at time t = 1, represented by $J_{1|1}$. The process is recursive,
yielding $J_{2|1}$ in the next step, followed by $J_{2|2}$ at the filtering stage. The
iterations continue until $J_{T|T}$ is computed.
The Kalman filter equations are presented below:
Prediction (Time Update)
$$J_{t|t-1} = FJ_{t-1|t-1} \qquad (10)$$
$$V_{t|t-1} = FV_{t-1|t-1}F' + Q \qquad (11)$$
Notice how time-update Equation 10 predicts a new a priori estimate of the state,
$J_{t|t-1}$; the prediction happens through the F matrix, which propagates the previous
estimate forward. The state error covariance is also projected forward from the
previous time point to give the a priori estimate $V_{t|t-1}$.
Filter (Measurement Update)
$$G_t = V_{t|t-1}K'\,(KV_{t|t-1}K' + R)^{-1} \qquad (12)$$
$$J_{t|t} = J_{t|t-1} + G_t\,(Y_t - KJ_{t|t-1}) \qquad (13)$$
$$V_{t|t} = (I - G_tK)\,V_{t|t-1} \qquad (14)$$
The measurement-update equations first compute the Kalman gain, $G_t$. Equation 13
uses the calculated gain to obtain an a posteriori estimate of the state. The term
$Y_t - KJ_{t|t-1}$ is the residual, which measures the difference between the new
incoming measurement and the a priori prediction. The gain, chosen to minimize the a
posteriori error covariance, determines how much weight the residual receives. Finally,
Equation 14 gives the a posteriori error covariance.
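The following sketch implements Equations 10-14 directly; the variable names are ours, and a production implementation would exploit structure rather than form dense covariances. The predicted and filtered covariances are stored because the fixed-interval smoother of the next section needs them.

```python
import numpy as np

def kalman_filter(Y, K, F, Q, R, J0, V0):
    """Spatio-temporal Kalman filter (Equations 10-14).

    Y : (Nc, T) measurements; K : (Nc, Nv) lead field;
    F : (Nv, Nv) state transition; Q : state noise covariance;
    R : measurement noise covariance; J0, V0 : initial state and covariance.
    Returns filtered states J[:, t] = J_{t|t}, plus the predicted and
    filtered covariances needed by the fixed-interval smoother.
    """
    Nv, T = F.shape[0], Y.shape[1]
    J = np.zeros((Nv, T))
    Vp, Vf = [], []                               # predicted / filtered covariances
    j, V = J0, V0
    for t in range(T):
        # Time update (prediction): Equations 10-11.
        j_pred = F @ j
        V_pred = F @ V @ F.T + Q
        # Measurement update (filter): Equations 12-14.
        S = K @ V_pred @ K.T + R
        G = V_pred @ K.T @ np.linalg.inv(S)       # Kalman gain, Eq. 12
        j = j_pred + G @ (Y[:, t] - K @ j_pred)   # a posteriori estimate, Eq. 13
        V = (np.eye(Nv) - G @ K) @ V_pred         # a posteriori covariance, Eq. 14
        J[:, t] = j
        Vp.append(V_pred)
        Vf.append(V)
    return J, Vf, Vp
```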
The Kalman filter can be viewed in the same framework used for
the least-squares problem. Recall the solution to the least-squares problem with
regularization from Equation 4:
$$\hat{J} = H\beta + \Sigma_R K'\,(K\Sigma_R K' + \lambda\Sigma_\epsilon)^{-1}\,[Y - KH\beta]$$
Recall that $H\beta$ was the mean of the state, and in the regularized least-squares
solution we typically set this value to 0, implying that at every new time point the a
priori mean of the state is 0. Because the Kalman filter is a dynamic process, the a priori
mean is instead propagated forward by Equation 10. So if we assume
$H\beta = FJ_{t-1|t-1} = J_{t|t-1}$, we can set up a different cost-minimizing function:
$$f(J_t) = \|Y_t - KJ_t\|^2_{R^{-1}} + \|J_t - FJ_{t-1|t-1}\|^2_{V_{t|t-1}^{-1}}$$
and arrive at a new estimate:
$$\hat{J}_t = J_{t|t-1} + V_{t|t-1}K'\,(KV_{t|t-1}K' + R)^{-1}\,[Y_t - KJ_{t|t-1}]$$
which we can identify as the Kalman filter measurement-update equation that calculates
the a posteriori estimate: the first term, $J_{t|t-1}$, is the a priori estimate, the factor
$V_{t|t-1}K'(KV_{t|t-1}K' + R)^{-1}$ is the Kalman gain, and the bracketed term
$[Y_t - KJ_{t|t-1}]$ is the residual.
Thus, using the same framework for the static and dynamic solutions, it is easy to
see how each solution differs.
2.3.3 Fixed-Interval (Backwards) Smoother:
The Kalman filter framework provides an accurate way to estimate a given state
J(v,t) at time t, and since the algorithm is recursive, the estimate at time t takes
all previous time points into consideration. Thus, an alternate way to write the estimate
is J(v, t | t = 0 ... t).
In an MEG experiment, the single-trial data are acquired and averaged
over the entire epoch, so all measurements are available before estimation begins. An
estimate of a state could therefore be better calculated using not only the previous time
points but the future time points as well. This filter is known as the fixed-interval or
backwards smoother. It provides an estimate of J(v, t | t = 0 ... T − 1), where T is the
number of time points.
The backwards smoother can be derived using a statistical framework, which is
given in the Appendix. The equations for the backwards smoother are as follows:
Gain (Correction term)
$$A_t = V_{t|t}\,F'\,V_{t+1|t}^{-1} \qquad (15)$$
Smoothed estimate
$$J_{t|T} = J_{t|t} + A_t\,(J_{t+1|T} - J_{t+1|t}) \qquad (16)$$
$$V_{t|T} = V_{t|t} + A_t\,(V_{t+1|T} - V_{t+1|t})\,A_t' \qquad (17)$$
The process is similar to the Kalman filter. The gain, $A_t$, is calculated, and it
determines the weight placed on the residual $J_{t+1|T} - J_{t+1|t}$. The smoothed
estimate $J_{t|T}$ adjusts the Kalman filter estimate $J_{t|t}$ by adding or subtracting
the weighted residual. This smoother requires that the error covariance values be saved
for all time points, which can become a computational issue if $N_v$ grows very large.
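A sketch of the backwards pass for Equations 15 and 16 follows, consuming the states and covariances saved by the Kalman filter sketch above; the covariance recursion of Equation 17 is omitted for brevity.

```python
import numpy as np

def fixed_interval_smoother(J, Vf, Vp, F):
    """Fixed-interval (backwards) smoother (Equations 15-16).

    J  : (Nv, T) Kalman-filtered states J_{t|t}
    Vf : list of filtered covariances, Vf[t] = V_{t|t}
    Vp : list of predicted covariances, Vp[t+1] = V_{t+1|t}
    F  : (Nv, Nv) state transition matrix
    """
    T = J.shape[1]
    Js = J.copy()                        # last column J_{T-1|T-1} is already smoothed
    for t in range(T - 2, -1, -1):
        # Gain (correction term), Eq. 15: A_t = V_{t|t} F' V_{t+1|t}^{-1}
        A = Vf[t] @ F.T @ np.linalg.inv(Vp[t + 1])
        # Smoothed estimate, Eq. 16; F @ J[:, t] is the prediction J_{t+1|t}.
        Js[:, t] = J[:, t] + A @ (Js[:, t + 1] - F @ J[:, t])
    return Js
```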
In this research, we will test three inverse solutions:
Static solution:
Least squares: J(v, t | t = t)
Dynamic solutions:
Kalman filter: J(v, t | t = 0 ... t)
Backwards smoother: J(v, t | t = 0 ... T)
Simulated brain signals will be placed into an area of voxels in the brain, and the MEG
measurements will be generated through the forward model. Each of the three inverse
solutions will then recover the signal, and a mean squared error will be calculated.
3. Simulations:
3.1 Region of Interest:
In order to test the effectiveness of each inverse solution we decided to run
models on simulated MEG data. Using proprietary software, we digitized the brain into
voxels using a 7.5mm spacing. Then, we chose a region of interest. The region of
interest for this simulation was this section of the left hemisphere of the brain:
Figure 9: Region of interest on inflated surface
This view of the brain is known as an inflated surface. The brain's cortex is
highly convoluted, with folds whose ridges are known as gyri and whose deeper grooves
are known as sulci. The inflated surface places all of these convolutions on the same
plane by inflating them to the surface. Thus, what is circled in green is a selection of
voxels on the gyri and sulci of the brain (see figure below).
Figure 10: Region of interest on normal surface
The brain imaging software enables us to select a region of interest on the brain
and compute a forward model on only those voxels which are in that region of interest.
The entire cortex consists of more than 8,000 gray matter voxels that can emanate a
magnetic field. By reducing the order of the problem to a region of interest, we can
reduce the computational burden on the inverse solutions. There is no loss of
information. The region of interest here selects 837 voxels.
The forward model that is calculated automatically creates a lead-field matrix,
K, for the region of interest. For this simulation, the number of sources is Nv = 837,
and the number of measurement channels is Nc = 204. Thus, K has dimensions of 204 ×
837.
3.2 Brain Signals:
In each of the 837 voxels, we place a simulated brain signal. Two simulated
signals were developed. The first signal is a sine wave with a frequency of 10 Hz. We
denote this signal as $S_1 = \sin(2\pi \cdot 10\,t)$, where t is an array of time points.
The signal is depicted below:

Figure 11: Signal S1, to be placed in all voxels to produce J_True.

For this simulation, we have a 'true' array of voxels, which is called J_True. S1 is placed
into each voxel.
The other simulated signal is more physiologically realistic. The signal is a
modulated sine wave mixing 10 Hz and 20 Hz frequencies, with
bursts of activation followed by silence. Here
$S_2 = \sin(2\pi \cdot 10\,t) + \sin(2\pi \cdot 20\,t)$. A graph of
the signal S2 is provided below.

Figure 12: Signal S2, to be placed in all voxels to produce J_True.
The J_True signals built from S1 and S2 have a mean of zero. Because of this, the least-squares
solution will recover the signal well, since it assumes that at each time
point the mean of the estimate is 0. For this reason, we will add a DC offset to the J_True
signals, since the actual signals produced in the brain do not have zero means. Thus,
J_True becomes J_True + offset, and we will vary the offset from 0 to 20. A sketch of the
signal construction is given below.
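In this sketch, the sampling rate and epoch length are assumed values, since they are not stated above.

```python
import numpy as np

fs, dur = 1000.0, 1.0                   # sampling rate (Hz) and duration (s); assumed
t = np.arange(0.0, dur, 1.0 / fs)

s1 = np.sin(2 * np.pi * 10 * t)                               # signal S1
s2 = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 20 * t)  # signal S2

# Replicate the signal in every voxel and add a DC offset (varied from 0 to 20).
Nv, offset = 837, 10.0
J_true = np.tile(s1, (Nv, 1)) + offset  # (Nv, T) true source matrix
```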
3.3 Selecting the Regularization Parameter λ
Recall from the inverse methods that the static least-squares solution in the
underdetermined dipole model has a regularization parameter λ, used
to penalize departures from the prior. Typical values for λ lie between 0.01 and
1, with most falling between 0.1 and 1. To find an appropriate value for λ,
we varied λ from 0.1 to 1 and computed the mean-squared error between the true signal
S1 and the least-squares solution given by Equation 5. The resulting mean-squared error
was smallest when λ was equal to 0.5, so this value is held constant for the remainder of
the results. A sweep of this kind might look like the sketch below.
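This sketch reuses static_estimate and the simulated quantities (Y, K, sigma_R, sigma_eps, J_true) assumed to be defined as in the earlier sketches.

```python
import numpy as np

# Sweep lambda over [0.1, 1] and keep the value with the smallest MSE.
lambdas = np.linspace(0.1, 1.0, 10)
mse = []
for lam in lambdas:
    J_hat = static_estimate(Y, K, sigma_R, sigma_eps, lam)
    mse.append(np.mean((J_hat - J_true) ** 2))
best_lam = lambdas[int(np.argmin(mse))]   # ~0.5 in the simulations reported here
```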
The format of the simulation is as follows (a sketch of the full pipeline follows the list):
1. Place signal S1 into all voxels to produce an Nv × T matrix J_True.
2. Produce an Nc × T measurement matrix, Y, using the forward model.
3. Calculate the inverse solutions:
a. J_Static for the least squares with regularization solution.
b. J_Kalman for the Kalman filter solution.
c. J_Smooth for the fixed-interval smoother.
4. Calculate the mean-squared error (MSE) between each of the three inverse solutions
and J_True.
5. Vary the signal-to-noise ratios (SNR) and offset values.
6. Perform the same analysis for signal S2.
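The sketch below strings the steps together for the static solution; the Kalman and smoother steps plug in the functions sketched earlier. The SNR is applied as a power ratio, which is our assumption, since the exact definition is not spelled out above.

```python
import numpy as np

def add_noise(Y_clean, snr, rng):
    """Add white Gaussian noise scaled to a target SNR (signal power / noise power)."""
    noise = rng.standard_normal(Y_clean.shape)
    scale = np.sqrt(np.mean(Y_clean ** 2) / (snr * np.mean(noise ** 2)))
    return Y_clean + scale * noise

rng = np.random.default_rng(0)
for snr in (1, 2, 4, 5, 10):
    for offset in (0, 10, 20):
        J_true_o = J_true + offset                 # step 1 (J_true from the earlier sketch)
        Y = add_noise(K @ J_true_o, snr, rng)      # step 2: forward model plus noise
        J_static = static_estimate(Y, K, sigma_R, sigma_eps, 0.5)   # step 3a
        # Steps 3b-3c: kalman_filter(...) and fixed_interval_smoother(...)
        mse = np.mean((J_static - J_true_o) ** 2)  # step 4
```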
4. Results
The first set of experiments focused on changing the offset value in order to
determine what effect a non-zero mean has on the recovered signals. For the signal S1,
the recovery for all solutions is shown in the following figure:

Figure 13: Inverse solution recovery for signal S1 at one voxel. Each plot shows a recovered signal
for a given DC offset value. As the offset grows, the dynamic solutions (Kalman and
smoother) track and estimate the true signal better. The static solution is very noisy and
fluctuates rapidly because of the prior assumption that the mean is 0.
The static solution does perform very well when there is no offset. As the offset
gradually increases, the static solution fluctuates rapidly. The dynamic
solutions do a much better job of tracking the signal despite the offset, because their
dynamic mean provides temporal continuity. For signal S2, a similar result
occurs:
Figure 14: Inverse solution recovery for signal S2 at one voxel. The results are similar to those for
S1: tracking improves using a dynamic inverse solution.
The SNR for the above simulations was 5, an experimentally realistic SNR
for MEG data. The MSE for each signal under the different offset values was calculated
at SNR 5. The results are shown below.
Figure 15: MSE versus offset for signal S2, for the static, Kalman, and smoother solutions. The MSE
for the static solution increases significantly as the offset is increased.
The results above show that a non-zero mean has a significant effect on the
recovery performance. This is an important point because true brain signals do not
always have zero mean. Recovery for the static solution becomes increasingly difficult as the
mean departs from zero. The Kalman filter is significantly better than the static solution,
and the smoother slightly improves upon the Kalman estimates.
Next, the SNR was varied from 1 to 10 with a constant offset of 0. A graph
of the effect of SNR on a recovered signal is shown below:
Figure 16: MSE versus SNR with zero offset, for the static, Kalman, and smoother solutions. The
inverse solutions follow a downward trend as SNR is increased. At a high SNR (10), the static
solution recovery is very similar to that of the dynamic solutions.
MEG data are very noisy; a typical SNR for measurement data is around 5. At this SNR,
the recovery still favors the dynamic solutions. If we assume an offset of 10, the
difference between the static and dynamic solutions becomes more significant; see the
figure below.
Figure 17: MSE versus SNR with an offset of 10, for the static, Kalman, and smoother solutions,
demonstrating the importance of the dynamic solutions.
5. Conclusions
The purpose of this report is to serve as a proof of principle for future research.
The preliminary research here provides two important findings. First, a dynamic
solution to the inverse problem using spatio-temporal Kalman filtering results in better
overall recovery of the states. In neurophysiology, brain signals can be intractable: the
underlying process is spontaneous and fluctuating. To simulate this environment,
an offset was added to our simulated signals. The offset is an important variable in the
inverse solutions because it forces the static solution, whose prior mean is zero, to
fluctuate constantly away from zero in order to find a solution.
Secondly, the fixed-interval smoother allows for more detailed signal recovery.
It is not certain whether the small improvement in performance can be clinically
justified, but for the purpose of this report it warrants further investigation.
To fully understand the benefits of dynamic inverse solutions, an analysis
must be conducted on real data. Real data will provide the Y vector of measurements,
on which we can then run the inverse solutions. This is the aim of
our future research.
References

Galka, A., Yamashita, O., Ozaki, T., Biscay, R., Valdés-Sosa, P.A. (2004). A solution to
the dynamical inverse problem of EEG generation using spatiotemporal Kalman
filtering. NeuroImage 23:435-453.

Mosher, J.C., Leahy, R.M., Lewis, P.S. (1999). EEG and MEG: forward solutions for
inverse methods. IEEE Transactions on Biomedical Engineering 46(3):245-259.

Hämäläinen, M., et al. (1993). Magnetoencephalography: theory, instrumentation, and
applications to noninvasive studies of the working human brain. Reviews of Modern
Physics 65:413-497.

Hari, R., Salmelin, R. (1997). Human cortical oscillations: a neuromagnetic view
through the skull. Trends in Neurosciences 20:44-49.

Williamson, S.J., Kaufman, L. (1987). Analysis of neuromagnetic signals. In A.S.
Gevins and A. Remond (Eds.), Handbook of Electroencephalography and Clinical
Neurophysiology, Volume 1: Methods of Analysis of Brain Electrical and Magnetic
Signals (pp. 405-448). New York: Elsevier.

Kitagawa, G., Gersch, W. (1996). Smoothness Priors Analysis of Time Series. Lecture
Notes in Statistics #116. New York: Springer-Verlag.