A Resampling Approach to Estimate the Stability
of One-Dimensional or Multidimensional
Independent Components
Frank Meinecke, Andreas Ziehe, Motoaki Kawanabe and Klaus-R. Müller

F. Meinecke is with the University of Potsdam, Department of Physics, Am Neuen Palais 10, 14469 Potsdam, Germany, and with Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany. E-mail: frank.meinecke@first.fraunhofer.de
A. Ziehe is with Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany. E-mail: andreas.ziehe@first.fraunhofer.de
M. Kawanabe is with Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany. E-mail: motoaki.kawanabe@first.fraunhofer.de
K.-R. Müller is with the University of Potsdam, Department of Computer Science, August-Bebel-Strasse 89, 14482 Potsdam, Germany, and with Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany. E-mail: klausrobert.mueller@first.fraunhofer.de
Correspondence should be addressed to K.-R. Müller.
Abstract— When applying unsupervised learning techniques in biomedical data analysis, a key question is whether the estimated parameters of the studied system are reliable. In other words, can we assess the quality of the result produced by our learning technique? We propose resampling methods to tackle this question and illustrate their usefulness for Blind Source Separation (BSS). We demonstrate that the proposed reliability estimation can be used to discover stable one- or multi-dimensional independent components, to choose the appropriate BSS model, to enhance significantly the separation performance and, most importantly, to flag components that carry physical meaning. Applications to different biomedical testbed data sets (MEG/ECG recordings) underline the usefulness of our approach.

Keywords— Blind Source Separation, bootstrap, electrocardiography (ECG), independent component analysis (ICA), magnetoencephalography (MEG), multidimensional ICA, reliability, resampling, stability, unsupervised learning
I. Introduction
Many typical biomedical applications consist of three main steps: a measurement, which produces raw data; a data analysis, which extracts parameters of interest from the data; and a decision or interpretation based on these extracted parameters. The last step is mostly done semi-automatically, i.e. it often involves medically trained staff. Since the data analysis algorithms used in the second step often rely on unsupervised learning, i.e. the desired parameters are estimated without the availability of labels, there is a fundamental dilemma: unsupervised learning algorithms always find 'some' answer within their model class, regardless of whether the chosen model is truly applicable. For example, the PCA algorithm will always provide a projection onto an orthogonal basis, whether or not the signal of interest can best be decomposed or interpreted within this basis. Thus, the user has so far had little clue about the certainty with which the answer of the unsupervised algorithm is correct.
Clearly, the medical staff always needs both aspects for a diagnosis: a decision and the certainty of that decision. Assessing the reliability of a data analysis result is therefore of fundamental importance, especially if the automatic decision or the subsequent human decision bears high risks or costs.
In this paper, we show how a reliability estimate for unsupervised learning algorithms can be computed using well-known resampling methods from statistics [3], [4] (Section II). Once we
are able to estimate the reliability of a solution, we can use this information for model/algorithm selection, for testing model validity and for improving the algorithm itself. Note that resampling is completely general and can be applied to assess the reliability of any unsupervised learning algorithm, e.g. projection techniques (independent component analysis (ICA), principal component analysis (PCA) (cf. [5]), kernel PCA [6], …), clustering (cf. [7]) and so on. In Section III we apply the proposed resampling techniques to blind source separation (BSS) problems. We show how these techniques enable us to select a good BSS algorithm, to improve the separation performance and to find potentially meaningful projection directions or subspaces. We give an algorithmic description of the resampling method (Section III), show excellent experimental results on toy data (Section IV) and on several real-world testbed data sets (MEG, ECG) (Section V), and conclude with a brief discussion in Section VI.
II. Resampling Techniques for Unsupervised Learning
A. Resampling Methods
In the typical unsupervised learning scenario, we want to learn or estimate a set of parameters θ = (θ_1, …, θ_p) from observed data x = (x_1, …, x_T) that characterize the generating law of the data. Usually, we consider a random variable X distributed according to a stochastic process F and regard x as one realization of it.

We will denote the estimated parameters by θ̂ = θ̂(x) = (θ̂_1(x), …, θ̂_p(x)), where the estimator is a function of the given data set. The important quantity for assessing stability is the Root-Mean-Squared Error (RMSE) of the estimator θ̂_i,

\sigma_i = \sqrt{ E_F\left[ (\theta_i - \hat{\theta}_i(X))^2 \right] } \qquad (1)
where E_F denotes the expectation with respect to F.¹ We remark that in our procedure for blind source separation, we measure the error componentwise, i.e. for each θ_i. Since we neither have access to the true parameter θ_i nor to more than one realization x from the distribution F, we cannot evaluate this quantity in a straightforward manner.
Resampling is a statistical method that, by virtue of modern computing power, yields e.g. the bias and the variance of estimators from only one data set x. Among such procedures, the Jackknife and the Bootstrap are the best known (see e.g. [3], [4]). The Jackknife produces surrogate data sets by deleting one datum at a time from the original data set; generalizations such as the delete-k Jackknife delete more than one datum at a time. The Bootstrap is a more general approach that has become widely used in data analysis. We give a brief explanation in the next section.
¹ Although a general distance measure d(θ_i, θ̂_i(X)) could be used to evaluate the error of the estimator, we consider the quadratic distance in our explanations for simplicity.
B. Bootstrap
Let us consider the case that we have T i.i.d. samples x_1, …, x_T from a distribution F; we collect the data in the vector x = (x_1, …, x_T). A scalar parameter θ_i is estimated with an estimator θ̂_i(x), and we want to evaluate the RMSE of this estimator. Let F̂ be the empirical distribution of the data x: a random variable drawn from F̂ takes the values x_t (t = 1, …, T) with equal probability 1/T. Then B new surrogate data sets x^{*b} = (x^{*b}_1, …, x^{*b}_T), b = 1, …, B, are generated by drawing T i.i.d. random variables x^{*b}_1, …, x^{*b}_T from the empirical distribution F̂ (these surrogate data sets are called bootstrap samples). We remark that some data points might occur several times in a particular bootstrap sample, while others might not occur at all. On each surrogate x^{*b}, the estimator θ̂_i^{*b} = θ̂_i(x^{*b}) is calculated, so we have B estimators θ̂_i^{*1}, …, θ̂_i^{*B}, the so-called bootstrap replications of θ̂_i. The bootstrap estimator of the RMSE is calculated as
\hat{\sigma}_i(B) = \sqrt{ \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\theta}_i - \hat{\theta}_i^{*b} \right)^2 } \qquad (2)
(see also the flowchart of the Bootstrap in Fig. 1). This quantity measures how robust our estimation is against small (resampling) changes to the data; in other words, how stable the learning algorithm is with respect to the estimated solution θ̂_i. Thus, Eq. (2) can be used as a measure of reliability that allows us to choose between different algorithmic solutions, to select between different algorithms, or to choose hyperparameters for a single algorithm. Furthermore, an assumption about the data-generating model can be accepted or rejected (in the sense of mathematical testing theory). In the following, we will employ the bootstrap error estimator σ̂_i(B) as a measure of reliability. If the data-generating process is not i.i.d., e.g. when there is time structure in the data, we have to use extensions of the Bootstrap.

There is a wide literature on statistical properties of the Bootstrap and its extensions, which supports the use of resampling procedures. For example, it can be shown that for i.i.d. data the bootstrap estimators of the distributions of many commonly used statistics are consistent [4], i.e. the bootstrap error estimator σ̂_i(B) of θ̂_i converges to the true σ_i in probability as B and T go to infinity:

\hat{\sigma}_i(B) \xrightarrow{\;p\;} \sigma_i \qquad (B \to \infty,\; T \to \infty).
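To make the procedure concrete, here is a minimal sketch (in Python/NumPy; the function name and the toy data are ours, not the authors' code) of the bootstrap error estimator of Eq. (2) for a scalar estimator, assuming i.i.d. data:

```python
# Minimal sketch of the bootstrap RMSE estimator of Eq. (2) for i.i.d. data.
import numpy as np

def bootstrap_rmse(x, estimator, B=1000, seed=0):
    """Return sigma_hat_i(B) of Eq. (2) for a scalar-valued estimator."""
    rng = np.random.default_rng(seed)
    T = len(x)
    theta_hat = estimator(x)                 # estimate on the original data
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, T, size=T)     # draw T points with replacement (F-hat)
        reps[b] = estimator(x[idx])          # bootstrap replication theta_hat_i^{*b}
    return np.sqrt(np.mean((reps - theta_hat) ** 2))

# Example: uncertainty of the sample mean of T = 1000 standard-normal points.
# The bootstrap value should be close to the theoretical 1/sqrt(1000) ~ 0.032.
x = np.random.default_rng(1).standard_normal(1000)
print(bootstrap_rmse(x, np.mean, B=500))
```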
Fig. 1. A schematic picture of the Bootstrap: from the data x = (x_1, …, x_T) the empirical distribution F̂ generates B bootstrap samples x^{*1}, …, x^{*B}; the bootstrap replications θ̂_i(x^{*b}) yield the bootstrap estimate σ̂_i(B) of the error (uncertainty).
III. Resampling applied to ICA

A. The BSS-Model

Blind source separation (BSS) techniques (e.g. [8], [9], [10], [11], [12], [13], [14], [15], [5]) have found wide-spread use in various application domains, e.g. acoustics (e.g. [16], [17], [18], [19]), telecommunication (cf. [5]) or biomedical signal processing (e.g. [20], [21], [22], [5]). BSS is an unsupervised statistical technique that reveals m unknown source signals s_j(t) when only mixtures of them can be observed. For a linear mixture model, each of the n ≥ m observed signals x_i(t) is assumed to be generated by

x_i(t) = \sum_{j=1}^{m} A_{ij}\, s_j(t). \qquad (3)

Independent Component Analysis (ICA) assumes that the source signals s_j(t) are statistically independent and that the matrix A has full column rank. The blind source separation problem is thus to identify the mixing matrix and/or the source signals using only the observed signals, assuming statistical independence of the source signals and linear independence of the columns of A. (Other BSS algorithms use slightly different assumptions, e.g. vanishing temporal cross-correlations instead of statistical independence.)

The BSS problem as stated above is clearly underdetermined: since only the observed signals x_i are known, a scalar factor can be exchanged between each source signal s_j and the corresponding column of A without changing the product. Likewise, the ordering of the source signals (and of the corresponding columns of A) has no meaning and is nothing but a notational device. Thus, the source signals can be recovered at best up to permutation, scaling and sign. In other words: we can only identify an unordered set of one-dimensional source signal subspaces.

B. Multi-dimensional Independent Components

Recently, some approaches have generalized the idea of ICA to the case of multi-dimensional independent components (see e.g. [23] or cf. subspace models [24]). In this case it is not assumed that all of the m source signals are mutually statistically independent, but that they form K higher-dimensional independent components. This means that there is a set of indices 1 = i_0 < i_1 < · · · < i_K = m + 1 that fulfills

p(s_1, s_2, \ldots, s_m) = \prod_{k=1}^{K} p_k(s_{i_{k-1}}, \ldots, s_{i_k - 1}) \qquad (4)

where p(·) denotes the joint probability density function of the whole data set. Each p_k(·) is a probability density function that cannot be further decomposed into a product of marginal densities. Standard ICA algorithms applied to such a data set will produce one-dimensional source estimates that are as independent as possible: they will still find the right decomposition given by Eq. (4), but they are forced to select in addition a decomposition of the actually multi-dimensional components. Thus, standard ICA techniques are able to find these multi-dimensional source signal subspaces [23], but they choose an (arbitrary) basis within each of these subspaces. The problem is then to decide which of the one-dimensional source space estimates given by the algorithm should be grouped together.

Figure 2(b) shows an example of a two-dimensional signal space. The two time series shown in this scatterplot are given by s_1 = (1 + q) sin(φ) and s_2 = (1 + q) cos(φ), where 0 ≤ φ ≤ 2π and q are uniformly distributed and Gaussian signals, respectively. This combination of sin/cos produces a rotationally symmetric joint probability density, so there is no basis within this space that could make these two time series independent.
Fig. 2. Scatterplot of two different two-dimensional time series: (a) shows a mixture of two one-dimensional sources; one can easily find the basis (axes) in which the time series become independent. (b) shows a distribution that cannot be written as a product of marginal densities; there is no basis in which the time series become independent.
C. Used source separation algorithms

In the next sections, we illustrate the resampling idea with two commonly used source separation algorithms: JADE and TDSEP. Both algorithms determine the mixing matrix by a joint approximate diagonalization of symmetric matrices. The difference between them is that JADE [13] computes those matrices from 'parallel slices' of the fourth-order cumulant tensor, whereas TDSEP [14] relies solely on second-order statistics and diagonalizes time-lagged correlation matrices; i.e. JADE maximizes the kurtosis of the output signals, whereas TDSEP minimizes temporal cross-correlations between the output signals.

To achieve an approximate simultaneous diagonalization of several symmetric matrices, the algorithms take two steps: (1) pre-whitening and (2) rotation. First, the whitening transformation W transforms the signals x(t) according to z(t) = W x(t) [25] such that the covariance matrix of z(t) becomes the identity matrix. The remaining set of matrices can then be diagonalized by an orthogonal transformation R, since in a white basis all remaining degrees of freedom are rotations [13]. For several matrices that share a common eigen-structure, a Jacobi-like algorithm proposed by Cardoso is used to determine R [26], [27]. The basic idea is that the rotation matrix R is formed by a product of elementary plane rotations R_k(φ_k), each trying to minimize the off-diagonal elements in a two-dimensional subspace, where the optimal rotation angle φ_k can be calculated in closed form (see [27] for details). Concatenating both transforms (whitening W and rotation R) yields an estimate of the mixing matrix Â = W⁻¹R⁻¹.
D. Distance Measure for ICA - Projections

Since ICA does not allow us to identify the mixing matrix A itself, but only an unordered set of one-dimensional source signal subspaces, a very natural distance measure between two sources is the angle between their respective subspaces. A mathematically simple but technically important fact is that we first have to define an orthonormal basis {e_i} in our data space in order to define the notion of an angle. Let U and V be two one-dimensional subspaces and u ∈ U, v ∈ V two vectors of non-zero length. The distance between U and V measured with respect to {e_i} then reads

d_{\{e_i\}}(u, v) = \arccos\!\left( \frac{\sum_i (u \cdot e_i)(v \cdot e_i)}{\sqrt{\sum_i (u \cdot e_i)^2}\, \sqrt{\sum_i (v \cdot e_i)^2}} \right). \qquad (5)
The mixing/demixing process can be described as a change of coordinates. This means we consider the sources s_i as the components of a vector s with respect to an orthonormal basis {e_i}, and the observations x_j as the components of the very same vector in terms of a different basis {f_j}:

s = \sum_i e_i\, s_i = \sum_j f_j\, x_j, \qquad (6)

with e_i · e_j = δ_ij, where δ_ij is the Kronecker symbol (δ_ij = 1 if i = j and δ_ij = 0 otherwise). Note that the basis {f_j} of the mixtures is then non-orthogonal in general. The linear transformation between these two coordinate systems is given by Eqs. (3) and (6):

f_j = \sum_{i=1}^{m} e_i\, A^{-1}_{ij}. \qquad (7)

If we denote the estimated sources given by our ICA algorithm by ŝ_i = \sum_j Â^{-1}_{ij} x_j and the corresponding basis by {ê_i} (i.e. ŝ = \sum_i ê_i ŝ_i), then we obtain

\hat{e}_i = \sum_{k,j} e_k\, A^{-1}_{kj}\, \hat{A}_{ji}, \qquad (8)

and the estimation error for the i-th component is given by

E_i = d_{\{e_i\}}(e_i, \hat{e}_i) = \arccos\!\left( \frac{(A^{-1}\hat{A})_{ii}}{\sqrt{\sum_k (A^{-1}\hat{A})^2_{ki}}} \right). \qquad (9)
E. Resampling Methods

It is straightforward to apply bootstrap resampling to i.i.d. data and to algorithms that do not use temporal structure. Less obvious is how to construct a time-structure-preserving bootstrap. A simple bootstrap approach would clearly destroy temporal structure, but it can be generalized such that methods like TDSEP that use temporal correlations remain applicable. Consider a time series of length T. The bootstrap resampling defines a series {a_t} with \sum_t a_t = T, where each a_{t'} tells how often the data point x(t') has been drawn. Using this, we can calculate the resampled time-lagged correlation matrices as

\hat{C}^{*}_{ij}(\tau) = \frac{1}{2T} \sum_{t=\tau+1}^{T} a_t \left[ x_i(t)\, x_j(t-\tau) + x_i(t-\tau)\, x_j(t) \right]. \qquad (10)
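As an illustration, the following sketch (Python/NumPy; function name and toy data are ours) computes the weighted, symmetrized time-lagged correlation matrix of Eq. (10) from a bootstrap count series {a_t}:

```python
# Sketch of the resampled time-lagged correlation matrices of Eq. (10).
# x is (n_channels, T); a is the bootstrap count series {a_t} with sum(a) = T.
import numpy as np

def resampled_lagged_corr(x, a, tau):
    """Symmetrized, a_t-weighted time-lagged correlation matrix C*_ij(tau)."""
    n, T = x.shape
    C = np.zeros((n, n))
    for t in range(tau, T):                       # t = tau+1, ..., T in 1-based notation
        C += a[t] * (np.outer(x[:, t], x[:, t - tau]) +
                     np.outer(x[:, t - tau], x[:, t]))
    return C / (2 * T)

rng = np.random.default_rng(0)
T = 1000
x = rng.standard_normal((3, T))                            # toy 3-channel signal
a = np.bincount(rng.integers(0, T, size=T), minlength=T)   # bootstrap draw counts
print(resampled_lagged_corr(x, a, tau=5).shape)            # -> (3, 3)
```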
Another way of generating time-structure-preserving surrogate data is, for example, to apply a (random) linear filter F to the measured (mixed) data:

F x_i(t) = \sum_{\tau=0}^{T} f_\tau\, x_i(t-\tau) = \sum_j A_{ij}\, F s_j(t). \qquad (11)

(This can be understood as giving different weights to different frequencies, whereas the normal bootstrap gives different weights to different time instants.) Since the mixing matrix A commutes with this filter operator ([F, A] = 0) and the filtered sources s'_j(t) = F s_j(t) are still mutually independent, the filtered signals x'_i(t) = F x_i(t) can be interpreted as a linear mixture of the filtered sources with the same mixing matrix A.
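A corresponding sketch of the filter surrogate of Eq. (11), under the assumption that a short random FIR filter is an acceptable choice for F (the filter length and normalization are our illustrative choices):

```python
# Sketch of the filter-based surrogate of Eq. (11): convolve every channel with
# one common random FIR filter; the mixing matrix A is untouched by this.
import numpy as np

def filter_surrogate(x, filter_len=20, seed=None):
    """Apply one random causal FIR filter f_tau to every channel of x (n, T)."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(filter_len)
    f /= np.linalg.norm(f)                      # keep the overall scale roughly fixed
    # 'full' convolution truncated to the original length keeps the filter causal
    return np.stack([np.convolve(xi, f)[: x.shape[1]] for xi in x])

x = np.random.default_rng(3).standard_normal((2, 500))   # toy mixed signals
print(filter_surrogate(x, seed=42).shape)                # -> (2, 500)
```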
Note that, in general, resampling procedures based on Equations (10) or (11) do not provide consistent estimators like the standard Bootstrap does on i.i.d. data sets. Nevertheless, the asymptotic bias can be bounded (Appendix B). Numerical simulations also show that they can still be used for the purposes of this paper, since they give good estimates as well (at least for small separation errors).
F. The Resampling Algorithm - Uncertainty Estimation

We now give a short description of our resampling algorithm. In principle, it is straightforward to obtain the bootstrap estimator of the RMSE as

\hat{\sigma}_i = \sqrt{ \frac{1}{B} \sum_{b=1}^{B} d^2_{\{\hat{e}_i\}}(\hat{e}_i, \hat{e}^{*b}_i) }, \qquad (12)

where B is the number of bootstrap replications and {ê^{*b}_i} is the basis estimated from the b-th surrogate data set.
This naive approach, however, has two problems in practice. The first one is merely technical: we assume that we already know which of the basis vectors ê^{*b}_1, ê^{*b}_2, ê^{*b}_3, … estimated from the surrogate data corresponds to a given ê_i. In general, this is not true, and finding the right permutation may become computationally very expensive. The second problem is more fundamental: if we allow higher-dimensional source signals, this estimator will no longer be able to assess the stability of a solution. To see this, consider a five-dimensional mixture of, say, two signals s_1 and s_2 with dim(s_1) = 2 and dim(s_2) = 3 that both cannot be written as linear combinations of independent one-dimensional signals but are mutually statistically independent (i.e. p(s_1, s_2) = p(s_1) · p(s_2)). Although the two subspaces containing s_1 and s_2 can be separated perfectly with standard ICA techniques, there is no preferred basis within each of these two subspaces that could be identified by ICA. This means that every ICA projection will be marked as unstable by the bootstrap estimator.
To simplify the permutation problem, we use the fact that the subspaces identified by our ICA algorithms do not depend on the initial basis {f_j} defined by our observations x_j, i.e. we are free to perform a linear transformation (a change of coordinate system) before applying the resampling procedure. In particular, it is highly convenient to resample after a prior blind source separation step, since then we have to consider only small deviations from the identity matrix in every bootstrap sample. Since the components of the data vector with respect to {ê_i} are white by construction, the separating matrices R̂^{*b} obtained from the surrogate data sets are approximately characterized by small rotations.⁴
The crucial idea for finding stable higher-dimensional source signal subspaces is not to calculate the overall rotation for each direction, but to decompose each rotation into N(N−1)/2 elementary rotations within all two-dimensional planes spanned by the coordinate axes. This can be carried out by taking the matrix logarithm

\hat{\alpha}^{*b} = \ln(\hat{R}^{*b}). \qquad (13)

Here, each component α_ij of α is the angle of a rotation in the i-j-plane (see Appendix A).
If we now calculate the variance of the α-matrix componentwise, we obtain a separability matrix

\hat{S}_{ij} = \sqrt{ \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\alpha}^{*b}_{ij} \right)^2 }. \qquad (14)

⁴ In order to obtain exact rotations, a re-whitening transformation, defined as x' = Vx with V = E[xxᵀ]^{−1/2}, is applied to the surrogate data.
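The decomposition of Eqs. (13)-(14) can be sketched as follows (Python with scipy.linalg.logm; the toy rotations are ours, not from the paper's experiments):

```python
# Sketch of Eqs. (13)-(14): decompose each bootstrap rotation into elementary
# plane-rotation angles via the matrix logarithm and take their RMS over b.
import numpy as np
from scipy.linalg import logm

def separability_matrix(rotations):
    """S_ij of Eq. (14) from a list of orthogonal matrices R*b (Eq. 13)."""
    alphas = [np.real(logm(R)) for R in rotations]   # antisymmetric angle matrices
    return np.sqrt(np.mean([a ** 2 for a in alphas], axis=0))

# Toy check: small random rotations in the 0-1 plane of a 3-D space should
# show up as instability only in the entry S_01 (and its mirror S_10).
def plane_rotation(n, i, j, angle):
    R = np.eye(n)
    R[i, i] = R[j, j] = np.cos(angle)
    R[i, j], R[j, i] = np.sin(angle), -np.sin(angle)
    return R

rng = np.random.default_rng(0)
rots = [plane_rotation(3, 0, 1, rng.normal(scale=0.1)) for _ in range(200)]
print(np.round(separability_matrix(rots), 3))
```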
The component Ŝ_ij of this matrix measures how unstable the ICA solution is with respect to a rotation in the i-j-plane, i.e. how reliably the respective one-dimensional subspaces ê_i and ê_j can be separated. Note that a low Ŝ_ij therefore corresponds to good separability.

If the BSS algorithm used was successful in separating the independent subspaces, the separability matrix should have a block structure that groups together one-dimensional ICA projections belonging to the same independent subspace. Thus, a reliable independent subspace should become clearly separated from every other subspace. Algorithmically, one can use the second eigenvector for detecting block structure in Ŝ, which is a very common technique in spectral clustering (see e.g. [28], [29], [30]).
We use this to define the uncertainty of an estimated one- or multi-dimensional source signal subspace. Let V be a k-dimensional subspace spanned by the ICA basis vectors ê_i with i ∈ I(V) = {i_1, i_2, …, i_k}. The uncertainty of an estimated multi-dimensional source signal subspace can be defined as

U(V) := \max_{i \in I(V),\; j \notin I(V)} \hat{S}_{ij}. \qquad (15)

In the case of one-dimensional independent components this reduces to

U_i := U(\hat{e}_i) = \max_j \hat{S}_{ij}. \qquad (16)
Let us summarize the resampling algorithm (a code sketch follows below):
1. Estimate the mixing matrix Â with some ICA/BSS algorithm. Calculate the projections y = Â⁻¹x.
2. Produce B surrogate data sets from y and whiten each of these data sets.
3. For each of the B surrogate data sets, perform BSS. This produces a set of rotation matrices R̂^{*b}, b = 1, …, B.
4. Calculate the matrices of elementary rotation angles α̂^{*b} = ln(R̂^{*b}).
5. Calculate the separability matrix Ŝ_ij from the rotation parameters (angles) α_ij.
6. Separate the data space into different one- or multi-dimensional subspaces according to the block structure of Ŝ.
7. For each subspace, calculate the uncertainty U(V).
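The following sketch summarizes steps 2-5 and Eq. (15) for step 7. Here `bss` is a placeholder for any rotation-based algorithm such as JADE or TDSEP operating on whitened data, and the helper names are ours; for time-structure methods like TDSEP, the surrogate correlation matrices of Eq. (10) would replace the plain resampling shown here:

```python
# End-to-end sketch of the resampling algorithm (steps 2-5 and Eq. 15).
import numpy as np
from scipy.linalg import logm, sqrtm

def whiten(y):
    V = np.real(sqrtm(np.linalg.inv(np.cov(y))))   # V = E[y y^T]^(-1/2)
    return V @ y

def separability(y, bss, B=50, seed=0):
    """Steps 2-5: separability matrix S_ij from B bootstrap surrogates of y."""
    rng = np.random.default_rng(seed)
    n, T = y.shape
    alphas = []
    for _ in range(B):
        idx = rng.integers(0, T, size=T)           # i.i.d. bootstrap sample
        z = whiten(y[:, idx])                      # step 2: re-whitening
        R = bss(z)                                 # step 3: rotation estimate
        alphas.append(np.real(logm(R)))            # step 4: elementary angles
    return np.sqrt(np.mean(np.array(alphas) ** 2, axis=0))   # step 5

def uncertainty(S, subspace):
    """Step 7, Eq. (15): largest coupling of `subspace` to the outside."""
    outside = [j for j in range(S.shape[0]) if j not in subspace]
    return max(S[i, j] for i in subspace for j in outside)
```

Step 1 (the initial separation y = Â⁻¹x) and step 6 (reading off the block structure, e.g. with a Fiedler-vector split as sketched earlier) wrap around these helpers.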
To wrap up: we compute the stability of independent subspaces found by ICA/BSS. Depending on the application (cf. next section), each subspace can in principle be one- or multi-dimensional. We would like to stress that our method allows us to pin down structural dependencies (say, e.g., a three-dimensional stable subspace), which provides highly important and relevant information for subsequent biomedical interpretation and modeling.
IV. Experiments on artificial data
In the experiments reported here, we used both the bootstrap and the filter technique. Remarkably, the results are almost identical. The following figures show the results of the bootstrap resampling (with JADE) and of Eq. (10) (with TDSEP).
A. Comparison of Separation Error and Uncertainty Estimate

To show the practical applicability of the resampling idea to BSS, the RMSE (Eq. 1) was compared with the uncertainty U_i (Eq. 16) for the case of one-dimensional independent components. The separation was performed on artificial 2D mixtures of signals produced by simple stochastic processes (1000 data points, unit variance). To achieve different separation qualities, the parameters of the stochastic processes were adjusted such that they produce time series with different kurtosis values and different strengths of time structure.

Figure 3 shows the result of the experiment for the algorithms JADE and TDSEP. Each point in this diagram stands for a specific parameter setting of the stochastic process. One can clearly see that the RMSE σ_i is nicely correlated with the uncertainty U_i. For low uncertainties (U ≤ 0.1) the uncertainty measure therefore allows a good prediction of the true separation error. For large uncertainties the points in the diagrams scatter over a large range of possible RMSE values; this means that it is no longer possible to predict the true separation error. (The systematic deviation of the points from the bisecting line in this regime is due to the fact that errors can only be measured up to π/4.)
Fig. 3. The RMSE vs. the uncertainty estimate U_i for the two algorithms used (TDSEP and JADE). For small values (U ≤ 0.1) the uncertainty allows prediction of the RMSE.
B. Blockwise uncertainty estimates

For a longer time series it may be interesting to know whether some parts of the time series can be separated more (or less) reliably than others. In this case, it may be better to estimate the mixing matrix not on the whole time series but only on certain "good" parts, where the BSS assumptions are properly fulfilled. To demonstrate these effects, we mixed two audio sources (8 kHz, 10 s, 80000 data points), where the mixtures are partly corrupted by white Gaussian noise. Reliability analysis is performed on windows of length 1000, shifted in steps of 250; the resulting variance estimates are smoothed. Fig. 4 shows again that the uncertainty measure is nicely correlated with the true separation error; furthermore, the variance goes up systematically within the noisy part, but also in other parts of the time series that do not seem to match the assumptions underlying the algorithm.⁵ Our reliability estimates can therefore be used to improve separation performance, either by removing all but the 'reliable' parts of the time series or by performing weighted averaging. For our example this reduces the overall separation error by two orders of magnitude, from 2.4 · 10⁻² to 1.7 · 10⁻⁴.
This moving-window resampling can detect instabilities of the projections in two different ways: besides the resampling variance that can be calculated for each window, one can also calculate the change of the projection directions between two windows. The latter has already been used successfully by Makeig et al. [31].

⁵ For example, the peak in the last third of the time series can be traced back to the fact that the original time series are correlated in this region.

Fig. 4. Upper panel: mixtures, partly corrupted by noise. Lower panel: the blockwise variance estimate (solid line) vs. the true separation error on this block (dotted line).
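The blockwise analysis is then a thin loop over windows; in this sketch, `separability_fn` stands for a callable like the pipeline sketched above with the BSS algorithm already bound in, and the window/step sizes match our experiment:

```python
# Sketch of the blockwise reliability analysis: slide a window over the
# projections y (n_channels, T) and record an uncertainty score per window.
import numpy as np

def blockwise_uncertainty(y, separability_fn, window=1000, step=250):
    """Max separability entry per window as a simple per-block reliability."""
    scores = []
    for start in range(0, y.shape[1] - window + 1, step):
        S = separability_fn(y[:, start:start + window])
        scores.append(S.max())        # a large entry means an unstable window
    return np.array(scores)
```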
C. Selecting the appropriate BSS algorithm and detecting multidimensional independent components

As our resampling algorithm behaves well in the case of one-dimensional independent components, the next logical step is to test it on mixtures of multi-dimensional independent components. Equally importantly, we can use it as a model selection criterion for (a) selecting hyperparameters of the BSS algorithm, e.g. choosing the lag values for TDSEP, or (b) choosing between a set of different algorithms that rely on different assumptions about the data, i.e. higher-order statistics (e.g. JADE, INFOMAX, FastICA, …) or second-order statistics (e.g. TDSEP). It could, in principle, be much better to extract one (one- or multi-dimensional) component with one assumption/algorithm and the next with another.
To illustrate the usefulness of the separability matrix, we study the following seven-channel mixture: two harmonic oscillations (sin and cos), two speech signals, two channels of white Gaussian noise and one channel of uniformly distributed noise. From Section III-B we know that this mixture consists of five independent subspaces: the audio signals and the uniformly distributed noise each define a one-dimensional source signal space, whereas the sin/cos and Gaussian-noise subspaces are two-dimensional.
Figure 5 shows the estimated sources and the separability matrix using TDSEP with time lags τ = 0, …, 20. The source estimates show that TDSEP is able to identify the audio sources and the sin/cos signals up to a phase shift. The Gaussian and uniform noise signals are still mixed, which is to be expected, because TDSEP can separate only sources with temporal structure. The separability matrix displays exactly this behaviour: source estimates 1 and 2 (audio signals) are one-dimensional components, source estimates 3, 4, 5 (noise) span a three-dimensional component, and source estimates 6 and 7 (sin and cos) span a two-dimensional component. Note that the grouping of source estimates into higher-dimensional subspaces reflects both the properties of the data and the properties of the source separation algorithm used.
Figure 6 shows the estimated sources and the separability matrix for the same data set using JADE. The main difference to Figure 5 (besides the permutation of the estimates, which has no meaning) is that JADE, in contrast to TDSEP, is able to separate the uniform noise from the Gaussian sources. This can be seen in the separability matrix as well.

For this data set, JADE is able to find smaller subspaces than TDSEP and can therefore be regarded as the more suitable algorithm in this case.
Fig. 5. Estimated sources and separability matrix using TDSEP (toy data set).
Fig. 6. Estimated sources and separability matrix using JADE (toy data set).
Fig. 7. The one-dimensional uncertainties for the source estimates from Figs. 5 and 6 using (a) JADE and (b) TDSEP. On this data set, JADE is able to identify three one-dimensional components with acceptably low uncertainties (3, 4, 7: the audio signals and the uniformly distributed source), whereas TDSEP can find only two stable one-dimensional components (1, 2: the audio signals). Note that TDSEP is able to extract the audio signals more reliably than JADE.
A more careful examination of the separability matrix, however, reveals (Figure 7) that the uncertainties of the one-dimensional audio signal estimates using TDSEP (U_1 = 0.004 and U_2 = 0.004) are much lower than the respective uncertainties using JADE (U_4 = 0.091 and U_7 = 0.023). Calculation of the true separation errors shows that TDSEP (E_1 = 0.017 and E_2 = 0.004) indeed does a better job of estimating these sources (JADE: E_4 = 0.117 and E_7 = 0.032). The two-dimensional sin/cos subspace is found equally well by both algorithms.
Knowing this, it is straightforward to combine the strengths of both algorithms for the source separation: a first application of TDSEP finds the audio sources and the sin/cos subspace; applying JADE to the orthogonal subspace then extracts the other components and yields the best solution that can be achieved by combining these two algorithms.
V. Application to Biomedical Data

We will now apply our algorithm to biomedical real-world data sets that serve as testbeds. A full analysis of the respective data sets is beyond the scope of this paper; we refer to the original papers for physiological details and interpretation.

A. Fetal ECG

We now illustrate our resampling approach on fetal ECG data. We use a data set [32] of 2500 points sampled at 500 Hz with 8 electrodes located on the abdomen and thorax of a pregnant woman. In previous examinations of this data set [33], [23], the necessity of generalizing ICA to multi-dimensional components became obvious.

Looking at the separability matrices (Figs. 8 and 9), one can clearly see that the JADE algorithm is more appropriate for separating this data than TDSEP, because JADE yields lower matrix entries, i.e. higher reliability. The TDSEP separability matrix does not show a clear block structure; the only component that can be reliably separated from the others is component 8. The JADE separability matrix, in contrast, shows two one-dimensional components (1 and 4) and three two-dimensional components (2/3, 5/6 and 7/8). Examination of the estimated source signals shows that in fact only JADE is able to separate the heartbeat of the fetus from the heartbeat of the mother (mother: 1, 2/3, 4; fetus: 7/8).
B. Removing a 150 Hz artifact in event-related MEG measurements

In the analysis of MEG data we often face the problem that noise of biological or technical origin corrupts the measurements. A previous study showed that blind source separation methods can be used to reduce such artifacts, which improves the source localization accuracy [34], [35].

We applied ICA combined with our resampling scheme to the MEG data sets from [34], [35], containing measurements of somatic evoked magnetic fields (SEFs) elicited by electrical stimulation of the right median nerve. In this experiment, the magnetic field above the contralateral somatosensory cortex was measured at a sampling rate of 2 kHz using the Berlin 49-channel planar SQUID gradiometer system operated in a magnetically shielded room. A stimulus-locked averaging over 12000 epochs with a duration of 333 ms had been performed.

The averaged data can be used to locate the exact position of the activated region of the somatosensory cortex by fitting an equivalent current dipole model. This is of high clinical importance, for example in the "presurgical assessment of functional brain areas" [36], [37]. However, such a technique is very sensitive to corruption by artifactual signals, like the omnipresent power line interferences.

Applying BSS and resampling to this data after compression to the 25 most powerful principal components (cf. [34], [38]) reveals that several low-dimensional subspaces can be reliably identified when using TDSEP. In particular, the two-dimensional subspace spanned by components TDSEP1 and TDSEP2 clearly corresponds to a 150 Hz harmonic due to the power line artifact; here we encounter again the previously discussed sin/cos component (Fig. 11).

Figure 10 shows that TDSEP works better on this data set and is able to find low-dimensional signal subspaces. The separability matrix for JADE does not show such a clear block structure and cannot identify the artifact component as well as TDSEP.
Fig. 8. Estimated source signals and separability matrix for the ECG time series using TDSEP.
Fig. 9. Estimated source signals and separability matrix for the ECG time series using JADE.
Fig. 10. The separability matrices of the event-related MEG measurement: (a) TDSEP, (b) JADE.
Fig. 11. The TDSEP source estimates 1 and 2 (left) and a scatterplot of the components (right).
C. A DC-MEG experiment with acoustic stimulation

We now apply our reliability analysis to a time series produced by a DC-MEG experiment with acoustic stimulation. DC-coupled brain monitoring is of high medical relevance because many pathophysiological processes have their main energy in the frequency range below 0.1 Hz. The biomagnetic recording technology employed here is based on a mechanical modulation of the head or body position relative to the sensor [39]. This technology has the potential to enable physicians to detect minute injury-related fields, e.g. from near-DC phenomena in stroke such as peri-infarct and anoxic depolarizations (see e.g. [40], [41], [42] for a detailed discussion).

The magnetic field was recorded with a planar SQUID gradiometer sensor array centered tangentially over the left auditory cortex. Acoustic stimulation was achieved by presenting alternating periods of music and silence, each of 30 s length, to the subject's right ear during 30 min of total recording time. This paradigm of externally controlled music-related
DC-activations of the auditory cortices defines a measurement and analysis scenario with almost complete knowledge about both the spatial pattern and the time course of a cerebral DC source, which on the other hand is fully embedded in the 'real' biological and ambient noise background (for details see [43], [42]). The measured DC magnetic field values, sampled at a frequency of 0.4 Hz, gave a total of 720 sample points for each of the 49 channels. In a previous analysis of these data [42], we found that many of the ICA components are seemingly meaningless, and it took some medical knowledge to find potentially meaningful projections for later close inspection. In the current experiment, BSS and resampling were performed on the 23 most powerful principal components.
Fig. 12. The separability matrices of TDSEP and JADE on the DC-MEG data set show that JADE fails to produce a stable source separation; TDSEP is able to find three one-dimensional and three higher-dimensional components.
The results in Fig. 12 show that JADE (b) fails completely to produce a stable source separation, whereas TDSEP (a) identifies three one-dimensional components (1, 22, 23) and three higher-dimensional components (2/3, 4-7 and 8-21) with a low uncertainty. In fact, at least the one-dimensional components have a clear physical meaning: component 1 is an internal very-low-frequency signal (drift) that is always present in DC measurements, and component 23 shows a typical artifact produced by the MEG measuring device (Figure 14).
Fig. 13. The one-dimensional uncertainties of the source estimates using (a) JADE and (b) TDSEP.
Fig. 14. The TDSEP components 1 and 23. Component 1 (upper curve)
is a drift and component 23 (lower curve) an artifact produced by the
DC-MEG measurement device.
Interestingly, component 22 shows a (noisy) rectangular waveform that clearly displays the 30 s on/off characteristics of the stimulus (correlation to the stimulus: 0.7; see Fig. 15).

Fig. 15. Spatial field pattern and time course of TDSEP channel 22.

The clear dipole structure of the spatial field pattern in Fig. 15 underlines the relevance of this projection. The components found by JADE do not show such a clear structure, and the strongest correlation of any component to the stimulus is about 0.3, which is of the same order of magnitude as the strongest correlated PCA component before applying JADE.
VI. Discussion and Conclusions

We proposed a simple method based on resampling techniques to estimate the reliability of results obtained from unsupervised learning algorithms. After briefly discussing the general resampling idea, we applied it to the BSS scenario and showed that our technique approximates the separation error well. Several directions are now open for applications.
First, we may use our reliability assessment for model selection purposes, to distinguish between algorithms or to choose good hyperparameters. Note that powerful algorithms exist that can be used in deflation mode, e.g. FastICA [44]. BSS can thus be applied component-wise, choosing the best, i.e. most reliable, algorithm for every one- or multi-dimensional component. Simulation experience clearly suggests that such a deflation strategy can give excellent results if different statistical assumptions underlie the respective sources, e.g. when the first source contains only temporal information, the second is of super-Gaussian nature, and so on.
Second, variances can be estimated on blocks of data, and separation performance can be enhanced by weighted averaging or by using only low-variance blocks where the model matches the data nicely. Possible breakdowns of reliability indicate a violation of the respective BSS/ICA model assumptions, thus revealing interesting structure in the data.
Finally, reliability estimates can be used to find stable, meaningful components. Here our assumption is that the more meaningful a component is, the more stably we should be able to estimate it. In this sense artifacts of course also appear as meaningful, whereas noisy directions are easily discarded due to their high uncertainty. By discarding meaningless⁶ components, we can relieve the medical staff from inspecting useless components and therefore reduce the necessary human interaction in a decision/diagnosis process. This is particularly important, for example, in MEG applications, where recent devices have more than 300 sensor channels.

⁶ The subspace spanned by the unstable component estimates could in principle also carry physical meaning, but within the assumptions of the used algorithm it is impossible to reveal any hidden structure. The estimated source signals are arbitrary in this case and should not be interpreted without further processing.

Note that the reliable components to be extracted can be one- or multi-dimensional, a finding which can provide highly useful information as a starting point for further understanding or modeling of physiological data. For example, the study on fetal ECG (Section V-A) underlines, in very nice agreement with previous work [23], that ECG signals can be most reliably
described and should be modeled best with multi-dimensional
components.
As indicated above, since resampling is a very general statistical method, it can also be used for assessing reliability in other unsupervised learning scenarios. Future research will therefore focus on applying resampling techniques to other unsupervised learning settings, e.g. clustering (cf. also [45]). We furthermore intend to consider Bayesian models (e.g. variational or ensemble models, cf. [46], [47], [15], [5]), where a variance estimate often comes for free along with the trained model, however often at high computational cost.
Appendix

A. Rotation Angles and Rotation Matrices

The rotation matrices in N-dimensional space form a representation of the special orthogonal group SO(N). This means that for all R ∈ SO(N)
R R^T = \mathbf{1}, \qquad (17)

\det(R) = +1. \qquad (18)
(18)
Any orthogonal matrix R can be written as the exponential
of a single antisymmetric matrix α
R = eα =
∞
X
1 n
α
n!
n=0
To see this, just transpose R = eα :
T
RT = (eα )T = eα = e−α = R−1
A basis {Mij } in the space of all antisymmetric matrices can
be defined by
“
”
Mij
= δai δbj − δaj δbi
(19)
ab
δia
where
denotes the Kronecker-Symbol: δai = 1 if i = a and
δai = 0 otherwise. Note, that each Mij is a N by N matrix,
a and b specify the matrix entries, i.e. a is the row- and b the
column-index. Using this, every orthogonal matrix can be written as
!
N
1 X
ij
αij M
(20)
R = exp
2 i,j=1
Due to the antisymmetry of the M ij , one can choose the αij
to be antisymmetric, too, without loss of generality. If we do so,
each αij corresponds to the angle of a basic rotation within the
i-j-plane. More precisely, exp(αij Mij ) is a orthogonal transformation that rotates the i-axis towards the j-axis by the angle
αij .
So, given an arbitrary rotation matrix R, it is always straightforward to decompose it into elementary rotations within different planes by taking the logarithm:
α = ln(R)
(21)
Example 1: In the 2-dimensional case one obtains the 2 by 2
antisymmetric basis matrices
` ij ´ «
„ ` ij ´
`Mij ´11 `Mij ´12
Mij =
M 21
M 22
With i, j = 1..2 these are 4 matrices:
„
«
0 0
M11 = M22 =
0 0
„
«
0
1
M12 = −M21 =
−1 0
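A quick numerical check of this parameterization (Python with scipy.linalg; the angles are arbitrary examples of ours):

```python
# Numerical check of Eqs. (20)-(21): build a rotation from plane angles via the
# matrix exponential and recover the angles with the matrix logarithm.
import numpy as np
from scipy.linalg import expm, logm

n = 3
alpha = np.zeros((n, n))
alpha[0, 1], alpha[1, 0] = 0.3, -0.3       # rotation by 0.3 rad in the 1-2 plane
alpha[1, 2], alpha[2, 1] = -0.1, 0.1       # rotation by -0.1 rad in the 2-3 plane

R = expm(alpha)                             # Eq. (20): R = exp(alpha)
print(np.allclose(R @ R.T, np.eye(n)))      # orthogonality, Eq. (17) -> True
print(np.round(np.real(logm(R)), 3))        # Eq. (21): alpha = ln(R), recovered
```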
B. Asymptotic Considerations for Resampling

Properties of resampling methods are typically studied in the limit where the number of bootstrap samples B → ∞ and the length of the signal T → ∞ [4]. In the case of bootstrap resampling, as B → ∞, the bootstrap variance estimator U_i^*(B) computed from the α^*_{ij} converges to U_i^*(\infty) := \max_j \sqrt{\mathrm{Var}_{\hat F}[\alpha^*_{ij}]}, where α^*_{ij} denotes the resampled angle deviation and F̂ denotes the distribution generating it. Furthermore, if F̂ → F, U_i^*(∞) converges to the true variance U_i = \max_j \sqrt{\mathrm{Var}_F[\alpha_{ij}]} as T → ∞. This is the case, for example, if the original signal is i.i.d. in time.

When the data has time structure, F̂ does not necessarily converge to the generating distribution F of the original signal anymore. Although we cannot neglect this difference completely, it is small enough to use our scheme for the purposes considered in this paper. For instance in TDSEP, where the α_{ij} depend on the variation of the time-lagged covariances C_{ii}(τ) of the signals, we can bound the difference \Delta_{ij} = \mathrm{Var}_{\hat F}[\hat C^*_{ij}(\tau)] - \mathrm{Var}_F[\hat C_{ij}(\tau)] between the real variation and its bootstrap estimator as

|\Delta_{ij}| \le \begin{cases} \frac{1}{T}\,\frac{2a^2}{1-a^2}\, M^2 \left( 1 + a^{2\tau} + 2\tau a^{2\tau} \right), & j = i \\ \frac{1}{T}\,\frac{2a^2}{1-a^2}\, M^2, & j \ne i \end{cases}

if there exist a < 1 and M ≥ 1 such that |C_{ii}(τ)| ≤ M a^τ |C_{ii}(0)| for all i. In our experiments, however, the bias is usually found to be much smaller than this upper bound.

For the filter resampling it is rather difficult to show theoretically whether it can be used as an unbiased variance estimator. Nevertheless, our experiments show numerically that the filter resampling typically provides good absolute variance estimates.
Acknowledgments

K.-R.M. thanks Guido Nolte and the members of the Oberwolfach Seminar September 2000, in particular Lutz Dümbgen and Enno Mammen, for helpful discussions and suggestions. K.-R.M. and A.Z. acknowledge partial funding by the EU project BLISS (IST-1999-14190). The studies were supported by a grant of the Bundesministerium für Bildung und Forschung (BMBF), FKZ 01IBB02A and 01IBB02B. We thank PTB for providing the MEG data. Special thanks go to the reviewers, who gave highly valuable advice in the revision process.
References

[1] F. Meinecke, A. Ziehe, M. Kawanabe, and K.-R. Müller, "Assessing reliability of ICA projections – a resampling approach," in Proc. ICA'01, T.-W. Lee, Ed., 2001.
[2] F. Meinecke, A. Ziehe, M. Kawanabe, and K.-R. Müller, "Estimating the reliability of ICA projections," in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds., Cambridge MA, 2002, MIT Press.
[3] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, first edition, 1993.
[4] J. Shao and D. Tu, The Jackknife and Bootstrap, Springer, New York, 1995.
[5] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley, 2001.
[6] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, pp. 1299–1319, 1998.
[7] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, second edition, 2001.
[8] Ch. Jutten and J. Herault, "Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture," Signal Processing, vol. 24, pp. 1–10, 1991.
[9] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[10] G. Deco and D. Obradovic, An Information-Theoretic Approach to Neural Computing, Springer, New York, 1996.
[11] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems (NIPS 95), D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. 1996, vol. 8, pp. 882–893, The MIT Press.
[12] A. J. Bell and T. J. Sejnowski, "An information maximisation approach to blind separation and blind deconvolution," Neural Computation, vol. 7, pp. 1129–1159, 1995.
[13] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non Gaussian signals," IEEE Proceedings-F, vol. 140, no. 6, pp. 362–370, 1993.
[14] A. Ziehe and K.-R. Müller, "TDSEP – an efficient algorithm for blind separation using time structure," in Proc. Int. Conf. on Artificial Neural Networks (ICANN'98), L. Niklasson, M. Bodén, and T. Ziemke, Eds., Skövde, Sweden, 1998, pp. 675–680, Springer Verlag.
[15] S. Roberts and R. Everson, Eds., Independent Component Analysis: Principles and Practice, Cambridge University Press, 2001.
[16] L. Parra and C. Spence, "Convolutive blind source separation of non-stationary sources," IEEE Trans. on Speech and Audio Processing, vol. 8, no. 3, pp. 320–327, May 2000.
[17] T.-W. Lee, A. Ziehe, R. Orglmeister, and T. J. Sejnowski, "Combining time-delayed decorrelation and ICA: Towards solving the cocktail party problem," in Proc. ICASSP98, Seattle, 1998, vol. 2, pp. 1249–1252.
[18] N. Murata, S. Ikeda, and A. Ziehe, "An approach to blind source separation based on temporal structure of speech signals," Neurocomputing, vol. 41, no. 1-4, pp. 1–24, 2001; also BSIS Technical Report No. 98-2.
[19] L. Parra, C. D. Spence, P. Sajda, A. Ziehe, and K.-R. Müller, "Unmixing hyperspectral data," in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. 2000, pp. 942–948, MIT Press.
[20] S. Makeig, T.-P. Jung, D. Ghahremani, A. J. Bell, and T. J. Sejnowski, "Blind separation of event-related brain responses into independent components," Proc. Natl. Acad. Sci. USA, vol. 94, pp. 10979–10984, 1997.
[21] R. Vigário, V. Jousmäki, M. Hämäläinen, R. Hari, and E. Oja, "Independent component analysis for identification of artifacts in magnetoencephalographic recordings," in Advances in Neural Information Processing Systems, M. I. Jordan, M. J. Kearns, and S. A. Solla, Eds. 1998, vol. 10, pp. 229–235, The MIT Press.
[22] A. Ziehe, K.-R. Müller, G. Nolte, B.-M. Mackert, and G. Curio, "Artifact reduction in magnetoneurography based on time-delayed second order correlations," IEEE Transactions on Biomedical Engineering, vol. 47, no. 1, pp. 75–87, 2000; also GMD Technical Report No. 31, 1998.
[23] J.-F. Cardoso, "Multidimensional independent component analysis," in Proceedings of ICASSP '98. 1998, IEEE.
[24] E. Oja, Subspace Methods of Pattern Recognition, Res. Studies Press, Hertfordshire, 1983.
[25] G. H. Golub and C. F. van Loan, Matrix Computation, The Johns Hopkins University Press, London, 1989.
[26] C. G. J. Jacobi, "Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen," Crelle J. reine angew. Mathematik, vol. 30, pp. 51–94, 1846.
[27] J.-F. Cardoso and A. Souloumiac, "Jacobi angles for simultaneous diagonalization," SIAM J. Mat. Anal. Appl., vol. 17, no. 1, pp. 161 ff., 1996.
[28] Y. Weiss, "Segmentation using eigenvectors: A unifying view," in ICCV (2), 1999, pp. 975–982.
[29] M. Meila and J. Shi, "Learning segmentation by random walks," in Advances in Neural Information Processing Systems, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. 2001, vol. 13, MIT Press.
[30] C. J. Alpert, A. B. Kahng, and S. Z. Yao, "Spectral partitioning with multiple eigenvectors," Discrete Applied Mathematics, vol. 90, pp. 3–26, 1999.
[31] S. Makeig, S. Enghoff, T.-P. Jung, and T. Sejnowski, "Moving-window ICA decomposition of EEG data reveals event-related changes in oscillatory brain activity," in Proc. 2nd Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA'2000), Helsinki, Finland, 2000, pp. 627–632.
[32] B. L. R. De Moor, Ed., "DaISy: Database for the identification of systems," http://www.esat.kuleuven.ac.be/sista/daisy, 1997.
[33] L. De Lathauwer, B. De Moor, and J. Vandewalle, "Fetal electrocardiogram extraction by source subspace separation," in Proceedings of HOS'95, Aiguablava, Spain, 1995.
[34] A. Ziehe, G. Nolte, T. Sander, K.-R. Müller, and G. Curio, “A comparison
of ICA-based artifact reduction methods for MEG,” in Recent Advances in
Biomagnetism, Proc. of the 12th International conference on Biomagnetism,
Jukka Nenonen, Ed., Espoo,Finland, 2001, pp. 895–898, Helsinki University of Technology.
[35] G. Nolte and G. Curio, “The effect of artifact rejection by signal-space
projection on source localization accuracy in MEG measurements,” IEEE
Trans. Biomed. Eng., vol. 46, pp. 400–408, 1999.
[36] O. Ganslandt, D. Ulbricht, H. Kober, J. Vieth, C. Strauss, and R. Fahlbusch, "SEF-MEG localization of somatosensory cortex as a method for presurgical assessment of functional brain area," Electroencephalogr. Clin. Neurophysiol. Suppl., vol. 46, pp. 209–213, 1996.
[37] M.A. Uusitalo and R.J. Ilmoniemi,
“The signal-space projection (SSP) method for separating MEG or EEG into components,”
Med.Biol.Eng.Comput., vol. 35, pp. 135–140, 1997.
[38] A. Hyvarinen, J. Sarela, and R. Vigario, “Spikes and bumps: Artefacts generated by independent component analysis with insufficient sample size,”
in Proc. Int. Workshop on Independent Component Analysis and Blind Source
Separation (ICA’99), Aussois, France, January 11–15, 1999, pp. 425–429.
[39] G. Wübbeler, B.-M. Mackert, F. Armbrust, M. Burghoff, P. Marx, G. Curio, and L. Trahms, "Measuring para-DC biomagnetic fields of the head using a horizontal modulated patient cot," Biomed Tech (Berl), no. 43, pp. 232–233, 1999, in German.
[40] B.-M. Mackert, J. Mackert, G. Wübbeler, F. Armbrust, K.-D. Wolff,
M. Burghoff, L. Trahms, and G. Curio, “Magnetometry of injury currents
from human nerve and muscle specimens using superconducting quantum
interferences devices,” Neuroscience Letters, vol. 262, no. 3, pp. 163–166,
Mar 1999.
[41] G. Curio, S.M. Erné, M. Burghoff, K.-D. Wolff, and A. Pilz, “Non-invasive
neuromagnetic monitoring of nerve and muscle injury currents,” Electroencephalography and clinical Neurophysiology, vol. 89, no. 3, pp. 154–160, 1993.
[42] G. Wübbeler, A. Ziehe, B.-M. Mackert, K.-R. Müller, L. Trahms, and
G. Curio, “Independent component analysis of non-invasively recorded
cortical magnetic DC-fields in humans,” IEEE Transactions on Biomedical
Engineering, vol. 47, no. 5, pp. 594–599, 2000.
[43] B.-M. Mackert, G. Wübbeler, P. Marx, L. Trahms, and G. Curio, “Noninvasive long-term recordings of cortical ’direct current’ (DC-) activity in
humans using magnetoencephalography,” Neuroscience Letters, vol. 273,
no. 3, pp. 159–162, Oct 1999.
[44] A. Hyvärinen and E. Oja, “A fast fixed-point algorithm for independent
component analysis,” Neural Computation, vol. 9, no. 7, pp. 1483–1492,
1997.
[45] V. Roth, T. Lange, M. Braun, and J. Buhmann, “A resampling based
approach to cluster validation,” 2001, unpublished manuscript.
[46] H. Attias, “Independent factor analysis,” Neural Computation, vol. 11, no.
4, pp. 803–851, 1999.
[47] H. Valpola, Bayesian Ensemble Learning for Nonlinear Factor Analysis, vol.
108 of Acta Polytechnica Scandinavica, Mathematics and Computing Series,
Finnish Academies of Technology, Espoo, Finnland, 2000.
Frank Meinecke is a master's student at the University of Potsdam and Fraunhofer FIRST Berlin. His research interest is focused on nonlinear dynamics and signal processing, especially blind source separation and independent component analysis.
Andreas Ziehe received the Diplom degree in computer science from Humboldt University Berlin in 1998. He is currently working towards his Ph.D. at the University of Potsdam and Fraunhofer FIRST Berlin. His research interest is focused on computational methods for data analysis and signal processing, especially blind source separation and independent component analysis.
Motoaki Kawanabe received both his master's and Ph.D. degrees in mathematical engineering from the University of Tokyo, in Prof. Amari's lab. In 2000 he joined the IDA group at Fraunhofer FIRST. His research interest is focused on statistics, information theory, information geometry and, recently, also on blind source separation and independent component analysis.
Klaus-Robert Müller received the Diplom degree in mathematical physics in 1989 and the Ph.D. in theoretical computer science in 1992, both from the University of Karlsruhe, Germany. From 1992 to 1994 he worked as a postdoctoral fellow at GMD FIRST in Berlin, where he started to build up the intelligent data analysis (IDA) group. From 1994 to 1995 he was a European Community STP Research Fellow at the University of Tokyo in Prof. Amari's lab. Since 1995 he has been department head of the IDA group at GMD FIRST (since 2001 Fraunhofer FIRST) in Berlin, and since 1999 he has held a joint associate professor position of GMD and the University of Potsdam. He has been lecturing at Humboldt University, Technical University Berlin and the University of Potsdam. In 1999 he received the annual national prize for pattern recognition (Olympus Prize) awarded by the German pattern recognition society DAGM. He serves on the editorial boards of Computational Statistics and IEEE Transactions on Biomedical Engineering and on program and organization committees of various international conferences. His research areas include statistical physics and statistical learning theory for neural networks, support vector machines and ensemble learning techniques. His present interests have expanded to time-series analysis, blind source separation techniques and statistical denoising methods for the analysis of biomedical data.