A Resampling Approach to Estimate the Stability of One-Dimensional or Multidimensional Independent Components

Frank Meinecke, Andreas Ziehe, Motoaki Kawanabe and Klaus-R. Müller

Abstract— When applying unsupervised learning techniques in biomedical data analysis, a key question is whether the estimated parameters of the studied system are reliable. In other words, can we assess the quality of the result produced by our learning technique? We propose resampling methods to tackle this question and illustrate their usefulness for Blind Source Separation (BSS). We demonstrate that our proposed reliability estimation can be used to discover stable one- or multi-dimensional independent components, to choose the appropriate BSS model, to enhance the separation performance significantly and, most importantly, to flag components that carry physical meaning. Applications to different biomedical testbed data sets (MEG/ECG recordings) underline the usefulness of our approach.

Keywords— Blind Source Separation, bootstrap, electrocardiography (ECG), independent component analysis (ICA), magnetoencephalography (MEG), multidimensional ICA, reliability, resampling, stability, unsupervised learning

I. Introduction

Many typical biomedical applications consist of three main steps: a measurement, which produces some raw data; a data analysis, which extracts parameters of interest from the data; and a decision or interpretation based on these extracted parameters. The latter step is mostly done semi-automatically, i.e. it often involves medically trained staff. Since the data analysis algorithms used in the second step often rely on unsupervised learning, i.e. the desired parameters are estimated without the availability of labels, there is a fundamental dilemma: unsupervised learning algorithms always find 'some' answer within their model class, regardless of whether the underlying model is truly applicable. For example, a PCA algorithm will always provide a projection onto an orthogonal basis, whether or not the signal of interest can best be decomposed or interpreted within this basis. Thus, the user has so far had little indication of the certainty with which the answer of the unsupervised algorithm is correct. Clearly, the medical staff always needs both aspects for a diagnosis: a decision and the certainty of the respective decision. So assessing the reliability of a data analysis result is of fundamental importance, especially if the automatic decision or a subsequent human decision bears high risks or costs. In this paper, we show how a reliability estimate for unsupervised learning algorithms can be computed using well-known resampling methods from statistics [3], [4] (section II).

F. Meinecke is with University of Potsdam, Department of Physics, Am Neuen Palais 10, 14469 Potsdam, Germany and with Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany. E-mail: frank.meinecke@first.fraunhofer.de

A. Ziehe is with Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany. E-mail: andreas.ziehe@first.fraunhofer.de

M. Kawanabe is with Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany. E-mail: motoaki.kawanabe@first.fraunhofer.de

K.-R. Müller is with University of Potsdam, Department of Computer Science, August-Bebel-Strasse 89, 14482 Potsdam, Germany and with Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany.
E-mail: klausrobert.mueller@first.fraunhofer.de. Correspondence should be addressed to K.-R. Müller.

Once we are able to estimate the reliability of a solution, we can use this information for model/algorithm selection purposes, for testing model validity, and for improving the algorithm used. Note that resampling is completely general and can be applied to assess the reliability of any unsupervised learning algorithm, e.g. projection techniques (for instance independent component analysis (ICA), principal component analysis (PCA) (cf. [5]), kernel PCA [6], ...), clustering (cf. [7]) and so on. In section III we apply the proposed resampling techniques to blind source separation (BSS) problems. We show how these techniques enable us to select a good BSS algorithm, to improve the separation performance and to find potentially meaningful projection directions or subspaces, respectively. We give an algorithmic description of the resampling method (section III), show excellent experimental results on toy data (section IV) and on several real-world testbed data sets (MEG, ECG) (section V), and conclude with a brief discussion in section VI.

II. Resampling Techniques for Unsupervised Learning

A. Resampling Methods

In the typical unsupervised learning scenario, we want to learn or estimate a set of parameters θ = (θ₁, ..., θ_p) from observed data x = (x₁, ..., x_T) that characterize the generating law of the data. Usually, we consider a random variable X distributed according to a stochastic process F and regard x as one realization of it. We denote the estimated parameters by θ̂ = θ̂(x) = (θ̂₁(x), ..., θ̂_p(x)), where the estimator is a function of the given data set. The important quantity for assessing stability is the root-mean-squared error (RMSE) of the estimator θ̂_i,

$$ \sigma_i = \sqrt{ E_F\big[ (\theta_i - \hat\theta_i(X))^2 \big] }, \qquad (1) $$

where E_F and var_F denote the expectation and the variance under F.¹ We remark that in our procedure for blind source separation we measure the error componentwise, i.e. for each θ_i. Since we have access neither to the true parameter θ_i nor to more than one realization x from the distribution F, we cannot evaluate these quantities in a straightforward manner. Resampling is a statistical method which, by virtue of modern computer power, gives e.g. the bias and the variance of estimators from only one data set x at hand. Among such procedures, the Jackknife and the Bootstrap are the most well-known (see e.g. [3], [4]). The Jackknife produces surrogate data sets by deleting one datum at a time from the original data set; there are generalizations of this approach, like the delete-k Jackknife, which delete more than one datum at a time. The bootstrap is a more general approach that has recently become widely used in data analysis. We give a brief explanation in the next section.

¹Although we could use a general distance measure d(θ_i, θ̂_i(X)) to evaluate the error of the estimator, we consider the quadratic distance in our explanations for simplicity.
B. Bootstrap

Consider the case where we have T i.i.d. samples x₁, ..., x_T from a distribution F; we write the data as the vector x = (x₁, ..., x_T). A scalar parameter θ_i is estimated with an estimator θ̂_i(x), and we want to evaluate the RMSE of this estimator. Let F̂ be the empirical distribution of the data x: a random variable drawn from F̂ takes the values x_t (t = 1, ..., T) with equal probability 1/T. Then B new surrogate data sets² x^{*b} = (x^{*b}₁, ..., x^{*b}_T), b = 1, ..., B, are generated by taking T i.i.d. random variables x^{*b}₁, ..., x^{*b}_T from the empirical distribution F̂. We remark that some data points might occur several times, while others might not occur at all, in a particular bootstrap sample. On each surrogate x^{*b}, the estimator θ̂_i^{*b} = θ̂_i(x^{*b}) is calculated, so we have B estimators³ θ̂_i^{*1}, ..., θ̂_i^{*B}. The bootstrap estimator of the RMSE is calculated as

$$ \hat\sigma_i(B) = \sqrt{ \frac{1}{B} \sum_{b=1}^{B} \big( \hat\theta_i - \hat\theta_i^{*b} \big)^2 } \qquad (2) $$

(see also the flowchart of the bootstrap in Fig. 1). This quantity measures how robust our estimation is against small (resampling) changes to the data; in other words, how stable the learning algorithm is with respect to the estimated solution θ̂_i. Thus, Eq. (2) can be used as a measure of reliability that allows us to select between different algorithmic solutions, to perform selection between different algorithms, or to choose hyperparameters for a single algorithm. Furthermore, an assumption about the data-generating model can be accepted or rejected (in the sense of mathematical testing theory). In the following, we employ the bootstrap error estimator σ̂_i(B) as a measure of reliability. If the data-generating process is not i.i.d., e.g. when there is time structure in the data, we have to use extensions of the bootstrap. There is a wide literature on the statistical properties of the bootstrap and its extensions, which supports the use of resampling procedures. For example, it can be shown that for i.i.d. data the bootstrap estimators of the distributions of many commonly used statistics are consistent [4], i.e. the bootstrap error estimator σ̂_i(B) of θ̂_i converges to the true σ_i in probability as B and T go to infinity:

$$ \hat\sigma_i(B) \xrightarrow{\;p\;} \sigma_i \qquad (B \to \infty,\; T \to \infty). $$

Fig. 1. A schematic picture of the bootstrap: from the data x = (x₁, ..., x_T) with empirical distribution F̂, B bootstrap samples x^{*1}, ..., x^{*B} are drawn; the bootstrap replications θ̂_i(x^{*b}) yield the bootstrap estimate of the error (uncertainty) σ̂_i(B).

²This surrogate data set is called a bootstrap sample.

³The estimators computed from the bootstrap samples are called bootstrap replications of θ̂.
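To make the procedure concrete, the following minimal sketch (ours, not from the paper) implements the bootstrap error estimator of Eq. (2) in Python/NumPy; the sample median serves as a stand-in for the ICA parameters considered later, and B = 500 replications is an illustrative choice.

```python
import numpy as np

def bootstrap_rmse(x, estimator, B=500, seed=None):
    """Bootstrap estimate of the RMSE of `estimator`, cf. Eq. (2)."""
    rng = np.random.default_rng(seed)
    T = len(x)
    theta_hat = estimator(x)                  # estimate on the original data
    reps = np.empty(B)                        # bootstrap replications
    for b in range(B):
        idx = rng.integers(0, T, size=T)      # T i.i.d. draws from F-hat
        reps[b] = estimator(x[idx])           # estimate on the surrogate x*b
    return np.sqrt(np.mean((reps - theta_hat) ** 2))

# Example: uncertainty of the sample median of 1000 standard normal points.
x = np.random.default_rng(0).standard_normal(1000)
print(bootstrap_rmse(x, np.median, seed=1))
```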
III. Resampling Applied to ICA

A. The BSS Model

Blind source separation (BSS) techniques (e.g. [8], [9], [10], [11], [12], [13], [14], [15], [5]) have found widespread use in various application domains, e.g. acoustics (e.g. [16], [17], [18], [19]), telecommunication (cf. [5]) and biomedical signal processing (e.g. [20], [21], [22], [5]). BSS is an unsupervised statistical technique to reveal m unknown source signals s_j(t) when only mixtures of them can be observed. For a linear mixture model, each of the n ≥ m observed signals x_i(t) is assumed to be generated by

$$ x_i(t) = \sum_{j=1}^{m} A_{ij}\, s_j(t). \qquad (3) $$

Independent Component Analysis (ICA) assumes that the source signals s_j(t) are statistically independent and that the matrix A has full column rank. The blind source separation problem is thus to identify the mixing matrix and/or the source signals using only the observed signals, while assuming statistical independence of the source signals and linear independence of the columns of A. (There are other BSS algorithms that use slightly different assumptions, e.g. vanishing temporal cross-correlations instead of statistical independence.) The BSS problem as stated above is clearly underdetermined: since only the observed signals x_i are known, a scalar factor can be exchanged between each source signal s_j and the corresponding column of A without changing the product. Also, the ordering of the source signals (and of the corresponding columns of A) has no meaning and is nothing but a notational device. Thus, the source signals can be recovered at best up to permutation, scales and signs. In other words, we can only identify an unordered set of one-dimensional source signal subspaces.

B. Multi-dimensional Independent Components

Recently, some approaches have tried to generalize the idea of ICA to the case of multi-dimensional independent components (see e.g. [23] or cf. subspace models [24]). In this case it is not assumed that all of the m source signals are mutually statistically independent, but that they form K higher-dimensional independent components. This means that there is a set of indices 1 = i₀ < i₁ < ··· < i_K = m + 1 that fulfills

$$ p(s_1, s_2, \ldots, s_m) = \prod_{k=1}^{K} p_k\big( s_{i_{k-1}}, \ldots, s_{i_k - 1} \big), \qquad (4) $$

where p(·) denotes the joint probability density function of the whole data set and each p_k(·) is a probability density function that cannot be further decomposed into a product of marginal densities. Standard ICA algorithms applied to such a data set will produce one-dimensional source estimates that are as independent as possible: they will still find the right decomposition given by Eq. (4), but they are forced to select, in addition, a decomposition of the actually multidimensional components. Thus standard ICA techniques are able to find these multidimensional source signal subspaces [23], but they choose an (arbitrary) basis within these subspaces. The problem is then to decide which of the one-dimensional source space estimates given by the algorithm should be grouped together.

Figure 2(b) shows an example of a two-dimensional signal space. The two time series shown in this scatterplot are given by s₁ = (1 + q)·sin(φ) and s₂ = (1 + q)·cos(φ), where 0 ≤ φ ≤ 2π and q are uniformly distributed and Gaussian signals, respectively. This combination of sin/cos produces a rotationally symmetric joint probability density, so there is no basis within this space that could make these two time series independent.

Fig. 2. Scatterplots of two different two-dimensional time series: (a) shows a mixture of two one-dimensional sources; one can easily find the basis (axes) in which the time series become independent. (b) shows a distribution that cannot be written as a product of marginal densities; there is no basis in which the time series become independent.
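The two-dimensional example of Fig. 2(b) is easy to reproduce numerically. The sketch below (our illustration; the noise scale 0.1 for the Gaussian amplitude jitter q is an arbitrary choice, not from the paper) generates the rotationally symmetric pair and confirms that the two signals are uncorrelated even though they are clearly dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
phi = rng.uniform(0.0, 2.0 * np.pi, size=T)   # uniformly distributed angle
q = 0.1 * rng.standard_normal(T)              # Gaussian amplitude jitter
s1 = (1.0 + q) * np.sin(phi)
s2 = (1.0 + q) * np.cos(phi)
# Any rotation of (s1, s2) leaves the joint density invariant, so no basis
# makes the pair independent; ICA can only recover the 2-D subspace.
print(np.corrcoef(s1, s2)[0, 1])              # approximately zero
```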
C. Used Source Separation Algorithms

In the next sections, we illustrate the resampling idea with two commonly used source separation algorithms: JADE and TDSEP. Both algorithms determine the mixing matrix by a joint approximate diagonalization of symmetric matrices. The difference between them is that JADE [13] computes those matrices from 'parallel slices' of the fourth-order cumulant tensor, while TDSEP [14] relies solely on second-order statistics and diagonalizes time-lagged correlation matrices; i.e. JADE maximizes the kurtosis of the output signals, whereas TDSEP minimizes temporal cross-correlations between the output signals. To achieve an approximate simultaneous diagonalization of several symmetric matrices, the algorithms take two steps: (1) pre-whitening and (2) rotation. First, the whitening transformation W transforms the signals x(t) according to z(t) = Wx(t) [25] such that the covariance matrix of z(t) becomes the identity matrix. The remaining set of matrices can then be diagonalized by an orthogonal transformation R, since in a white basis the only degrees of freedom left are rotations [13]. For several matrices that share a common eigen-structure, a Jacobi-like algorithm proposed by Cardoso is used to determine R [26], [27]. The basic idea is that the rotation matrix R is formed as a product of elementary plane rotations R_k(φ_k), each trying to minimize the off-diagonal elements in a two-dimensional subspace, where the optimal rotation angle φ_k can be calculated in closed form (see [27] for details). Concatenation of both transforms (whitening W and rotation R) yields an estimate of the mixing matrix Â = W⁻¹R⁻¹.

D. Distance Measure for ICA Projections

Since ICA does not allow us to identify the mixing matrix A itself, but only an unordered set of one-dimensional source signal subspaces, a very natural distance measure between two sources is the angle between their respective subspaces. A mathematically simple but technically important fact is that we first have to define an orthonormal basis {e_i} in our data space in order to define the notion of an angle. Let U and V be two one-dimensional subspaces and u ∈ U, v ∈ V two vectors of non-zero length. The distance between U and V measured with respect to {e_i} then reads

$$ d_{\{e_i\}}(u, v) = \arccos\left( \frac{ \sum_i (u \cdot e_i)(v \cdot e_i) }{ \sqrt{\sum_i (u \cdot e_i)^2}\, \sqrt{\sum_i (v \cdot e_i)^2} } \right). \qquad (5) $$

The mixing/demixing process can be described as a change of coordinates. This means we consider the sources s_i as components of a vector s with respect to an orthonormal basis {e_i}, and the observations x_j as the components of the very same vector in terms of a different basis {f_j}:

$$ s = \sum_i e_i s_i = \sum_j f_j x_j, \qquad (6) $$

with e_i · e_j = δ_ij, where δ_ij is the Kronecker symbol (δ_ij = 1 if i = j and δ_ij = 0 otherwise). Note that the basis {f_j} of the mixtures is then non-orthogonal in general. The linear transformation between these two coordinate systems is given by Eqs. (3) and (6):

$$ f_j = \sum_{i=1}^{m} e_i\, A^{-1}_{ij}. \qquad (7) $$

If we denote the estimated sources given by our ICA algorithm by ŝ_i = Σ_j Â⁻¹_ij x_j and the corresponding basis by {ê_i} (i.e. ŝ = Σ_i ê_i ŝ_i), then we obtain

$$ \hat e_i = \sum_{k,j} e_k\, A^{-1}_{kj}\, \hat A_{ji}, \qquad (8) $$

and the estimation error for the i-th component is given by

$$ E_i = d_{\{e_i\}}(e_i, \hat e_i) = \arccos\left( \frac{ (A^{-1}\hat A)_{ii} }{ \sqrt{ \sum_k (A^{-1}\hat A)^2_{ki} } } \right). \qquad (9) $$
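A possible NumPy rendering of this distance measure is sketched below (our code, not the authors'); the absolute value added in both functions folds in the sign indeterminacy of ICA, which Eqs. (5) and (9) leave implicit.

```python
import numpy as np

def angle_distance(u, v):
    """Eq. (5): angle between span(u) and span(v); u, v are component
    vectors w.r.t. an orthonormal basis, abs() removes the sign ambiguity."""
    c = np.abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, 0.0, 1.0))

def separation_errors(A, A_hat):
    """Eq. (9): component-wise errors E_i from the true and estimated mixing
    matrices; column i of inv(A) @ A_hat holds the components of e-hat_i."""
    P = np.linalg.inv(A) @ A_hat
    cos_e = np.abs(np.diag(P)) / np.linalg.norm(P, axis=0)
    return np.arccos(np.clip(cos_e, 0.0, 1.0))
```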
E. Resampling Methods

It is straightforward to apply bootstrap resampling to i.i.d. data and to algorithms that do not use temporal structure. Less obvious is how to construct a time-structure-preserving bootstrap: a simple bootstrap approach would clearly destroy temporal structure, but it can be generalized such that methods like TDSEP that use temporal correlations remain applicable. Consider a time series of length T. The bootstrap resampling defines a series {a_t} with Σ_t a_t = T, where each a_{t'} tells how often the data point x(t') has been drawn. Using this, we can calculate the resampled time-lagged correlation matrices as

$$ \hat C^*_{ij}(\tau) = \frac{1}{2T} \sum_{t=\tau+1}^{T} a_t \big[ x_i(t)\, x_j(t-\tau) + x_i(t-\tau)\, x_j(t) \big]. \qquad (10) $$

Another way of generating time-structure-preserving surrogate data is, for example, to apply a (random) linear filter F to the measured (mixed) data:

$$ F x_i(t) = \sum_{\tau=0}^{T} f_\tau\, x_i(t-\tau) = \sum_j A_{ij}\, F s_j(t). \qquad (11) $$

(This can be understood as giving different weights to different frequencies, whereas the normal bootstrap gives different weights to different time instants.) Since the mixing matrix A commutes with this filter operator ([F, A] = 0) and the filtered sources s'_j(t) = Fs_j(t) are still mutually independent, the filtered signals x'_i(t) = Fx_i(t) can be interpreted as a linear mixture of the filtered sources with the same mixing matrix A.

Note that, in general, resampling procedures based on Equations (10) or (11) do not provide consistent estimators like the true bootstrap on i.i.d. data sets. Nevertheless, the asymptotic bias can be bounded (Appendix B). Numerical simulations also show that they can still be used for the purposes of this paper, since they give good estimates as well (at least for small separation errors).
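Both surrogate constructions are straightforward to implement. The sketch below (ours; the FIR filter length of 50 Gaussian taps is an arbitrary illustrative choice) computes the weighted time-lagged correlations of Eq. (10) and a random-filter surrogate in the spirit of Eq. (11) for a signal array x of shape (channels, T).

```python
import numpy as np

def bootstrap_weights(T, rng):
    """Counts a_t of an ordinary bootstrap: how often sample t was drawn."""
    return np.bincount(rng.integers(0, T, size=T), minlength=T).astype(float)

def resampled_lagged_corr(x, tau, a):
    """Eq. (10): symmetrized time-lagged correlation with bootstrap weights."""
    n, T = x.shape
    C = np.zeros((n, n))
    for t in range(tau, T):
        C += a[t] * (np.outer(x[:, t], x[:, t - tau]) +
                     np.outer(x[:, t - tau], x[:, t]))
    return C / (2.0 * T)

def filter_surrogate(x, rng, n_taps=50):
    """Eq. (11): apply one random causal FIR filter to all channels."""
    f = rng.standard_normal(n_taps)
    return np.apply_along_axis(lambda c: np.convolve(c, f)[:x.shape[1]], 1, x)
```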
F. The Resampling Algorithm - Uncertainty Estimation

We now give a short description of our resampling algorithm. In principle, it is straightforward to obtain the bootstrap estimator of the RMSE as

$$ \hat\sigma_i = \sqrt{ \frac{1}{B} \sum_{b=1}^{B} d^2_{\{\hat e_i\}}\big( \hat e_i, \hat e_i^{*b} \big) }, \qquad (12) $$

where B is the number of bootstrap replications and {ê_i^{*b}} is the basis estimated from the b-th surrogate data set. This naive approach, however, has two problems in practice. The first is merely technical: we assume that we already know which of the basis vectors ê₁^{*b}, ê₂^{*b}, ê₃^{*b}, ... estimated from the surrogate data corresponds to a given ê_i. In general, this is not true, and finding the right permutation may become computationally very expensive. The second problem is more fundamental: if we allow higher-dimensional source signals, this estimator is no longer able to assess the stability of a solution. To see this, consider a five-dimensional mixture of, say, two signals s₁ and s₂ with dim(s₁) = 2 and dim(s₂) = 3 that cannot be written as linear combinations of independent one-dimensional signals but are mutually statistically independent (i.e. p(s₁, s₂) = p(s₁)·p(s₂)). Although the two subspaces containing s₁ and s₂ can be separated perfectly with standard ICA techniques, there is no preferred basis within either of these two subspaces that could be identified by ICA. This means that every ICA projection will be marked as unstable by the bootstrap estimator.

To simplify the permutation problem, we use the fact that the subspaces identified by our ICA algorithms do not depend on the initial basis {f_j} defined by our observations x_j, i.e. we are free to perform a linear transformation (a change of coordinate system) before applying the resampling procedure. In particular, it is highly convenient to resample after a prior blind source separation step, since then we only have to consider small deviations from the identity matrix in every bootstrap sample. Since the components of the data vector with respect to {ê_i} are white by construction, the separating matrices R̂^{*b} obtained from the surrogate data sets are approximately characterized by a small rotation.⁴

The crucial idea for finding stable higher-dimensional source signal subspaces is to calculate not the overall rotation for each direction, but to decompose each rotation into N(N−1)/2 elementary rotations within all two-dimensional planes spanned by the coordinate axes. This can be carried out by taking the matrix logarithm

$$ \hat\alpha^{*b} = \ln\big( \hat R^{*b} \big). \qquad (13) $$

Here, each component α_ij of α is the angle of a rotation in the i-j-plane (see Appendix A). If we now calculate the variance of the α-matrix componentwise, we obtain a separability matrix

$$ \hat S_{ij} = \sqrt{ \frac{1}{B} \sum_{b=1}^{B} \big( \hat\alpha^{*b}_{ij} \big)^2 }. \qquad (14) $$

⁴In order to obtain exact rotations, a re-whitening transformation, defined as x' = Vx with V = E[xxᵀ]^{−1/2}, is applied to the surrogate data.

The component Ŝ_ij of this matrix measures how unstable the ICA solution is with respect to a rotation in the i-j-plane, i.e. how reliably the respective one-dimensional subspaces ê_i and ê_j can be separated. Note that a low Ŝ_ij therefore corresponds to good separability. If the BSS algorithm used was successful in separating the independent subspaces, the separability matrix should have a block structure that groups together one-dimensional ICA projections belonging to the same independent subspace. Thus, a reliable independent subspace should become clearly separated from every other subspace. Algorithmically, one can use the second eigenvector for detecting block structure in Ŝ, which is a very common technique in spectral clustering (see e.g. [28], [29], [30]).

We use this to define the uncertainty of an estimated one- or multi-dimensional source signal subspace. Let V be a k-dimensional subspace spanned by the ICA basis vectors ê_i with i ∈ I(V) = {i₁, i₂, ..., i_k}. The uncertainty of an estimated multi-dimensional source signal subspace can be defined by

$$ U(V) := \max_{i \in I(V),\; j \notin I(V)} \hat S_{ij}. \qquad (15) $$

In the case of one-dimensional independent components this reduces to

$$ U_i := U(\hat e_i) = \max_j \hat S_{ij}. \qquad (16) $$

Let us summarize the resampling algorithm (a code sketch of steps 3 to 7 follows this list):
1. Estimate the mixing matrix Â with some ICA/BSS algorithm and calculate the projections y = Â⁻¹x.
2. Produce B surrogate data sets from y and whiten each of these data sets.
3. For each of the B surrogate data sets, perform BSS. This produces a set of rotation matrices R̂^{*b}, b = 1, ..., B.
4. Calculate the matrices of elementary rotation angles α̂^{*b} = ln(R̂^{*b}).
5. Calculate the separability matrix Ŝ_ij from the rotation parameters (angles) α_ij.
6. Separate the data space into different one- or multi-dimensional subspaces according to the block structure of Ŝ.
7. For each subspace, calculate the uncertainty.

To wrap up, we compute the stability of independent subspaces found by ICA/BSS. Depending on the application (cf. next section), each subspace can in principle be one- or multi-dimensional. We would like to stress that our method allows us to pin down structural dependencies (say, e.g., a three-dimensional stable subspace), which provides highly important and relevant information for subsequent biomedical interpretation and modeling.
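The sketch below condenses steps 3 to 7 into a few lines (our code; the BSS routine producing the surrogate rotations R̂^{*b} is assumed to be given and is not shown). It computes the elementary angles of Eq. (13), the separability matrix of Eq. (14) and the uncertainty of Eqs. (15)/(16). Block structure in Ŝ would then be detected, e.g. with the spectral-clustering heuristics cited above, before the uncertainty is evaluated per block.

```python
import numpy as np
from scipy.linalg import logm

def separability_matrix(rotations):
    """Eqs. (13)-(14): RMS elementary rotation angles over all surrogates.
    `rotations` is a list of B orthogonal matrices R*b near the identity."""
    alphas = np.array([np.real(logm(R)) for R in rotations])  # Eq. (13)
    return np.sqrt(np.mean(alphas ** 2, axis=0))              # Eq. (14)

def uncertainty(S, subspace):
    """Eq. (15): uncertainty of the subspace spanned by the listed components;
    for a single index [i] this is the one-dimensional U_i of Eq. (16)."""
    inside = set(subspace)
    outside = [j for j in range(S.shape[0]) if j not in inside]
    return max(S[i, j] for i in inside for j in outside)
```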
IV. Experiments on Artificial Data

In the experiments reported here, we used both the bootstrap and the filter technique. Remarkably, the results are almost identical. The following figures show the results of the bootstrap resampling (with JADE) and of Eq. (10) (with TDSEP).

A. Comparison of Separation Error and Uncertainty Estimate

To show the practical applicability of the resampling idea to BSS, the RMSE (Eq. 1) was compared with the uncertainty U_i (Eq. 16) for the case of one-dimensional independent components. The separation was performed on artificial 2D mixtures of signals produced by simple stochastic processes (1000 data points, unit variance). To achieve different separation qualities, the parameters of the stochastic processes were adjusted such that they produce time series of different kurtosis values and different strengths of time structure. Figure 3 shows the result of the experiment for the algorithms JADE and TDSEP. Each point in the diagram stands for a specific parameter setting of the stochastic process. One can clearly see that the RMSE σ_i is nicely correlated with the uncertainty U_i. For low uncertainties (U ≤ 0.1) the uncertainty measure therefore allows a good prediction of the true separation error. For large uncertainties the points in the diagrams scatter over a large range of possible RMSE values; this means that it is no longer possible to predict the true separation error. (The systematic deviation of the points from the bisecting line in this regime is due to the fact that one can measure errors only up to π/4.)

Fig. 3. The RMSE vs. the uncertainty estimate for the two algorithms (JADE and TDSEP). For small values (U ≤ 0.1) the uncertainty allows the RMSE to be predicted.

B. Blockwise Uncertainty Estimates

For a longer time series it may be interesting to know whether different parts of the time series are more (or less) reliable to separate than others. In this case, it may be better to estimate the mixing matrix not on the whole time series but only on certain "good" parts, where the BSS assumptions are properly fulfilled. To demonstrate these effects, we mixed two audio sources (8 kHz, 10 s, i.e. 80000 data points), where the mixtures are partly corrupted by white Gaussian noise. Reliability analysis is performed on windows of length 1000, shifted in steps of 250; the resulting variance estimates are smoothed. Fig. 4 shows again that the uncertainty measure is nicely correlated with the true separation error; furthermore, the variance goes up systematically within the noisy part, but also in other parts of the time series that do not seem to match the assumptions underlying the algorithm.⁵ So our reliability estimates can eventually be used to improve separation performance, either by removing all but the 'reliable' parts of the time series or by performing weighted averaging. For our example this reduces the overall separation error by two orders of magnitude, from 2.4·10⁻² to 1.7·10⁻⁴. This moving-window resampling can detect instabilities of the projections in two different ways: besides the resampling variance that can be calculated for each window, one can also calculate the change of the projection directions between two windows. The latter has already been used successfully by Makeig et al. [31].

Fig. 4. Upper panel: mixtures, partly corrupted by noise. Lower panel: the blockwise variance estimate (solid line) vs. the true separation error on each block (dotted line).

⁵For example, the peak in the last third of the time series can be traced back to the fact that the original time series are correlated in this region.
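The blockwise analysis can be organized as in the following sketch (ours); `uncertainty_of_block` is a hypothetical wrapper around the full resampling procedure of section III-F applied to one window, the window length 1000 and step 250 follow the text, and the moving-average smoothing is an illustrative choice.

```python
import numpy as np

def blockwise_uncertainty(x, uncertainty_of_block, win=1000, step=250):
    """Resampling uncertainty on sliding windows of the mixtures x (n, T)."""
    T = x.shape[1]
    u = np.array([uncertainty_of_block(x[:, s:s + win])
                  for s in range(0, T - win + 1, step)])
    return np.convolve(u, np.ones(5) / 5.0, mode="same")  # smoothed estimates
```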
C. Selecting the Appropriate BSS Algorithm and Detecting Multidimensional Independent Components

As our resampling algorithm behaves well in the case of one-dimensional independent components, the next logical step is to test it with mixtures of multidimensional independent components. Equally important, we can use it as a model selection criterion for (a) selecting some hyperparameter of the BSS algorithm, e.g. choosing the lag values for TDSEP, or (b) choosing between a set of different algorithms that rely on different assumptions about the data, i.e. higher-order statistics (e.g. JADE, INFOMAX, FastICA, ...) or second-order statistics (e.g. TDSEP). It could, in principle, be much better to extract one (one- or multi-dimensional) component under one assumption/algorithm and the next under another.

To illustrate the usefulness of the separability matrix, we study the following seven-channel mixture: two harmonic oscillations (sin and cos), two speech signals, two channels of white Gaussian noise and one channel of uniformly distributed noise (a construction sketch follows at the end of this section). From section III-B we know that this mixture consists of five independent subspaces: the audio signals and the uniformly distributed noise each define a one-dimensional source signal space, whereas the sin/cos and the Gaussian-noise subspaces are two-dimensional.

Figure 5 shows the estimated sources and the separability matrix using TDSEP with time lags τ = 0..20. The source estimates show that TDSEP is able to identify the audio sources and the sin/cos signals up to a phase shift. The Gaussian and uniform noise signals are still mixed, which is to be expected, because TDSEP can only separate sources with temporal structure. The separability matrix displays exactly this behaviour: source estimates 1 and 2 (audio signals) are one-dimensional components, source estimates 3, 4 and 5 (noise) span a three-dimensional component, and source estimates 6 and 7 (sin and cos) span a two-dimensional component. Note that the grouping of source estimates into higher-dimensional subspaces reflects both the properties of the data and the properties of the source separation algorithm used.

Fig. 5. Estimated sources and separability matrix using TDSEP (toy data set).

Figure 6 shows the estimated sources and the separability matrix for the same data set using JADE. The main difference to Figure 5 (besides the permutation of the estimates, which has no meaning) is that JADE, in contrast to TDSEP, is able to separate the uniform noise from the Gaussian sources. This can be seen in the separability matrix as well. For this data set, JADE is able to find smaller subspaces than TDSEP and can therefore be regarded as the more suitable algorithm in this case.

Fig. 6. Estimated sources and separability matrix using JADE (toy data set).

A more careful examination of the separability matrix, however, reveals (Figure 7) that the uncertainties of the one-dimensional audio signal estimates using TDSEP (U₁ = 0.004 and U₂ = 0.004) are much lower than the respective uncertainties using JADE (U₄ = 0.091 and U₇ = 0.023). Calculation of the true separation errors shows that TDSEP (E₁ = 0.017 and E₂ = 0.004) does in fact a better job of estimating these sources (JADE: E₄ = 0.117 and E₇ = 0.032). The two-dimensional sin/cos subspace is found equally well by both algorithms.

Fig. 7. The one-dimensional uncertainties for the source estimates from Figs. 6 and 5 using (a) JADE and (b) TDSEP. On this data set, JADE is able to identify three one-dimensional components with acceptably low uncertainties (3, 4, 7: the audio signals and the uniformly distributed source), whereas TDSEP finds only two stable one-dimensional components (1, 2: the audio signals). Note that TDSEP is able to extract the audio signals more reliably than JADE.

Knowing this, it is now straightforward to combine the strengths of both algorithms for the source separation. A first application of TDSEP finds the audio sources and the sin/cos subspace. Then applying JADE to the orthogonal subspace extracts the other components and yields the best solution that can be achieved by combining these two algorithms.
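For reference, the seven-channel toy mixture can be set up as in the following sketch (ours); since the original speech recordings are not available here, smoothed noise serves as a stand-in for the two audio signals, and the mixing matrix is drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)
T, fs = 10000, 8000.0
t = np.arange(T) / fs
audio1 = np.convolve(rng.standard_normal(T), np.ones(8) / 8, mode="same")
audio2 = np.convolve(rng.standard_normal(T), np.ones(3) / 3, mode="same")
sources = np.vstack([
    np.sin(2 * np.pi * 100.0 * t),     # \
    np.cos(2 * np.pi * 100.0 * t),     # / 2-D sin/cos subspace
    audio1, audio2,                    # two 1-D "speech" stand-ins
    rng.standard_normal(T),            # \
    rng.standard_normal(T),            # / 2-D Gaussian noise subspace
    rng.uniform(-1.0, 1.0, size=T),    # 1-D uniform noise
])
x = rng.standard_normal((7, 7)) @ sources   # observed seven-channel mixture
```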
V. Application to Biomedical Data

We will now apply our algorithm to biomedical real-world data sets that serve as testbeds. A full analysis of the respective data sets is beyond the scope of this paper; we refer to the original papers for physiological details and interpretation.

A. Fetal ECG

We first illustrate our resampling approach on fetal ECG data. We use a data set [32] of 2500 points sampled at 500 Hz with 8 electrodes located on the abdomen and thorax of a pregnant woman. In previous examinations of this data set [33], [23], the necessity of generalizing ICA to multi-dimensional components became obvious. Looking at the separability matrices (Figs. 8 and 9), one can clearly see that the JADE algorithm is more appropriate for separating this data than TDSEP, because JADE yields lower matrix entries, i.e. higher reliability. The TDSEP separability matrix does not show a clear block structure; the only component that can reliably be separated from the others is component 8. The JADE separability matrix, in contrast, shows two one-dimensional components (1 and 4) and three two-dimensional components (2/3, 5/6 and 7/8). Examination of the estimated source signals shows that in fact only JADE is able to separate the heartbeat of the fetus from the heartbeat of the mother (mother: 1, 2/3, 4; fetus: 7/8).

Fig. 8. Estimated source signals and separability matrix for the ECG time series using TDSEP.

Fig. 9. Estimated source signals and separability matrix for the ECG time series using JADE.

B. Removing a 150 Hz Artifact in Event-Related MEG Measurements

In the analysis of MEG data we often face the problem that noise of biological or technical origin corrupts the measurements. A previous study showed that blind source separation methods can be used to reduce such artifacts, which improves the source localization accuracy [34], [35]. We applied ICA combined with our resampling scheme to the MEG data sets from [34], [35], containing measurements of somatic evoked magnetic fields (SEFs) elicited by electrical stimulation of the right median nerve. In this experiment, the magnetic field above the contralateral somatosensory cortex was measured at a sampling rate of 2 kHz using the Berlin 49-channel planar SQUID gradiometer system operated in a magnetically shielded room. A stimulus-locked averaging over 12000 epochs with a duration of 333 ms had been performed. The averaged data can be used to locate the exact position of the activated region of the somatosensory cortex by fitting an equivalent current dipole model. This is of high clinical importance, for example in the "presurgical assessment of functional brain area" [36], [37]. However, such a technique is very sensitive to corruption by artifactual signals, like the omnipresent power line interference.

Applying BSS and resampling to this data after compression to the 25 most powerful principal components (cf. [34], [38]) reveals that several low-dimensional subspaces can be reliably identified when using TDSEP. Figure 10 shows that TDSEP works better on this data set and is able to find low-dimensional signal subspaces; the separability matrix for JADE does not show such a clear block structure and cannot identify the artifact component as well as TDSEP.

Fig. 10. The separability matrices of the event-related MEG measurement for (a) TDSEP and (b) JADE.
In particular, the two-dimensional subspace spanned by components TDSEP1 and TDSEP2 clearly corresponds to a 150 Hz harmonic of the power line artifact (here we encounter again the previously discussed sin/cos component, Fig. 11).

Fig. 11. The TDSEP source estimates 1 and 2 (left) and a scatterplot of the components (right).

C. A DC-MEG Experiment with Acoustic Stimulation

We now apply our reliability analysis to a time series produced by a DC-MEG experiment with acoustic stimulation. DC-coupled brain monitoring is of high medical relevance because many pathophysiological processes have their main energy in the frequency range below 0.1 Hz. The biomagnetic recording technology employed here is based on a mechanical modulation of the head or body position relative to the sensor [39]. This technology has the potential to enable physicians to detect minute injury-related fields, e.g. from near-DC phenomena in stroke such as peri-infarct and anoxic depolarizations (see e.g. [40], [41], [42] for a detailed discussion). The magnetic field was recorded with a planar SQUID gradiometer sensor array centered tangentially over the left auditory cortex. Acoustic stimulation was achieved by presenting alternating periods of music and silence, each of 30 s length, to the subject's right ear during 30 min of total recording time. This paradigm of externally controlled music-related DC activations of the auditory cortices defines a measurement and analysis scenario with almost complete knowledge about both the spatial pattern and the time course of a cerebral DC source, which on the other hand is fully embedded in the 'real' biological and ambient noise background (for details see [43], [42]).

The measured DC magnetic field values, sampled at a frequency of 0.4 Hz, give a total of 720 sample points for each of the 49 channels. In a previous analysis of this data [42], we found that many of the ICA components are seemingly meaningless, and it took some medical knowledge to find potentially meaningful projections for a later close inspection. In the current experiment, BSS and resampling were performed on the 23 most powerful principal components.

Fig. 12. The separability matrices of TDSEP and JADE on the DC-MEG data set show that JADE fails to produce a stable source separation; TDSEP is able to find three one-dimensional and three higher-dimensional components.
The results in Fig. 12 show that JADE (b) fails completely to produce a stable source separation, whereas TDSEP (a) identifies three one-dimensional components (1, 22, 23) and three higher-dimensional components (2/3, 4–7 and 8–21) with low uncertainty. In fact, at least the one-dimensional components have a clear physical meaning: component 1 is an internal very-low-frequency signal (drift) that is always present in DC measurements, and component 23 shows a typical artifact produced by the MEG measuring device (Figure 14). Interestingly, component 22 shows a (noisy) rectangular waveform that clearly displays the 30 s on/off characteristics of the stimulus (correlation to the stimulus: 0.7; see Fig. 15).

Fig. 13. The one-dimensional uncertainties of the source estimates using (a) JADE and (b) TDSEP.

Fig. 14. The TDSEP components 1 and 23. Component 1 (upper curve) is a drift and component 23 (lower curve) an artifact produced by the DC-MEG measurement device.

Fig. 15. Spatial field pattern and time course of TDSEP channel 22.

The clear dipole structure of the spatial field pattern in Fig. 15 underlines the relevance of this projection. The components found by JADE do not show such a clear structure, and the strongest correlation of any component to the stimulus is about 0.3, which is of the same order of magnitude as the strongest correlated PCA component before applying JADE.

VI. Discussion and Conclusions

We proposed a simple method based on resampling techniques to estimate the reliability of results obtained from unsupervised learning algorithms. After briefly discussing the general resampling idea, we applied it to the BSS scenario and showed that our technique approximates the separation error well. Several directions are now open for applications. First, we may use our reliability assessment for model selection purposes, to distinguish between algorithms or to choose good hyperparameters. Note that powerful algorithms exist that can be used in deflation mode, e.g. FastICA [44]. So BSS can be applied component-wise, choosing the best, i.e. most reliable, algorithm for every one- or multi-dimensional component. Simulation experience clearly suggests that such a deflation strategy can give excellent results if different statistical assumptions underlie the respective sources, e.g. when the first source contains only temporal information, the second is of super-Gaussian nature, and so on. Second, variances can be estimated on blocks of data, and separation performance can be enhanced by weighted averaging or by using only low-variance blocks where the model matches the data nicely. Possible breakdowns of reliability indicate a violation of the respective BSS/ICA model assumptions, thus revealing interesting structure in the data. Finally, reliability estimates can be used to find stable, meaningful components. Here our assumption is that the more meaningful a component is, the more stably we should be able to estimate it. In this sense artifacts of course also appear as meaningful, whereas noisy directions are discarded easily due to their high uncertainty. By discarding meaningless⁶ components, we can relieve the medical staff from inspecting useless components and therefore reduce the necessary human interaction in a decision/diagnosis process. This is particularly important, for example, in MEG applications where recent devices have more than 300 sensor channels.

⁶The subspace spanned by the unstable component estimates could in principle also carry physical meaning, but within the assumptions of the algorithm used it is impossible to reveal any hidden structure. The estimated source signals are arbitrary in this case and should not be interpreted without further processing.
Note that the reliable components to be extracted can be one- or multi-dimensional, a finding which can provide highly useful information as a starting point for further understanding or modeling of physiological data. For example, the study on fetal ECG (section V-A) underlines, in very nice agreement with previous work [23], that ECG signals can be most reliably described, and are best modeled, with multi-dimensional components. As indicated above, since resampling is a very general statistical method, it can also be used for assessing reliability in other unsupervised learning scenarios. Future research will therefore focus on applying resampling techniques to other unsupervised learning scenarios, e.g. clustering (cf. also [45]). We furthermore intend to consider Bayesian models (e.g. variational or ensemble models, cf. [46], [47], [15], [5]), where a variance estimate often comes for free along with the trained model, however often at high computational cost.

Appendix

A. Rotation Angles and Rotation Matrices

The rotation matrices in N-dimensional space form the representation of the special orthogonal group SO(N). This means that for all R ∈ SO(N)

$$ R R^T = \mathbf{1}, \qquad (17) $$

$$ \det(R) = +1. \qquad (18) $$

Any orthogonal matrix R can be written as the exponential of a single antisymmetric matrix α:

$$ R = e^{\alpha} = \sum_{n=0}^{\infty} \frac{1}{n!}\, \alpha^n. $$

To see this, just transpose R = e^α:

$$ R^T = (e^{\alpha})^T = e^{\alpha^T} = e^{-\alpha} = R^{-1}. $$

A basis {M^{ij}} in the space of all antisymmetric matrices can be defined by

$$ \big( M^{ij} \big)_{ab} = \delta^i_a \delta^j_b - \delta^j_a \delta^i_b, \qquad (19) $$

where δ^i_a denotes the Kronecker symbol: δ^i_a = 1 if i = a and δ^i_a = 0 otherwise. Note that each M^{ij} is an N×N matrix; a and b specify the matrix entries, i.e. a is the row index and b the column index. Using this, every orthogonal matrix can be written as

$$ R = \exp\left( \frac{1}{2} \sum_{i,j=1}^{N} \alpha_{ij} M^{ij} \right). \qquad (20) $$

Due to the antisymmetry of the M^{ij}, one can choose the α_ij to be antisymmetric too, without loss of generality. If we do so, each α_ij corresponds to the angle of a basic rotation within the i-j-plane. More precisely, exp(α_ij M^{ij}) is an orthogonal transformation that rotates the i-axis towards the j-axis by the angle α_ij. So, given an arbitrary rotation matrix R, it is always straightforward to decompose it into elementary rotations within different planes by taking the logarithm:

$$ \alpha = \ln(R). \qquad (21) $$

Example 1: In the two-dimensional case, the 2×2 antisymmetric basis matrices with i, j = 1..2 are the four matrices

$$ M^{11} = M^{22} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \qquad M^{12} = -M^{21} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. $$

The two-dimensional rotation matrix can then be written as

$$ R = \exp\left( \tfrac{1}{2}\big( \alpha_{12} M^{12} + \alpha_{21} M^{21} \big) \right) = \exp\left( \alpha_{12} M^{12} \right) = \mathbf{1}\cos\alpha_{12} + M^{12}\sin\alpha_{12} = \begin{pmatrix} \cos\alpha_{12} & \sin\alpha_{12} \\ -\sin\alpha_{12} & \cos\alpha_{12} \end{pmatrix}, $$

which is the best-known parameterization of the two-dimensional rotation matrices. Eq. (20) is the N-dimensional generalization of this formula.
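A quick numerical check of Eqs. (19)–(21) for N = 3 (our sketch, with arbitrarily chosen angles): assemble α from the basis matrices M^{ij}, exponentiate, and recover the angles with the matrix logarithm.

```python
import numpy as np
from scipy.linalg import expm, logm

def basis_matrix(i, j, N):
    """Eq. (19): antisymmetric generator of rotations in the i-j-plane."""
    M = np.zeros((N, N))
    M[i, j], M[j, i] = 1.0, -1.0
    return M

N = 3
angles = {(0, 1): 0.30, (0, 2): -0.10, (1, 2): 0.05}       # alpha_ij (i < j)
alpha = sum(a * basis_matrix(i, j, N) for (i, j), a in angles.items())
R = expm(alpha)                      # Eq. (20) with antisymmetric alpha
print(np.allclose(R @ R.T, np.eye(N)))          # Eq. (17): R is orthogonal
print(np.allclose(np.real(logm(R)), alpha))     # Eq. (21) recovers the angles
```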
B. Asymptotic Considerations for Resampling

Properties of resampling methods are typically studied in the limit where the number of bootstrap samples B → ∞ and the length of the signal T → ∞ [4]. In the case of bootstrap resampling, as B → ∞, the bootstrap variance estimator U_i^*(B) computed from the α^*_ij converges to

$$ U_i^*(\infty) := \max_j \sqrt{ \mathrm{Var}_{\hat F}\big[ \alpha^*_{ij} \big] }, $$

where α^*_ij denotes the resampled angle deviation and F̂ denotes the distribution generating it. Furthermore, if F̂ → F, then U_i^*(∞) converges to the true variance U_i = max_j √(Var_F[α_ij]) as T → ∞. This is the case, for example, if the original signal is i.i.d. in time. When the data has time structure, F̂ does not necessarily converge to the generating distribution F of the original signal anymore. Although we cannot neglect this difference completely, it is small enough to use our scheme for the purposes considered in this paper. For instance in TDSEP, where the α_ij depend on the variation of the time-lagged covariances C_ii(τ) of the signals, we can bound the difference Δ_ij = Var_F̂[Ĉ^*_ij(τ)] − Var_F[Ĉ_ij(τ)] between the real variation and its bootstrap estimator as

$$ |\Delta_{ij}| \le \begin{cases} \dfrac{1}{T}\, \dfrac{2 a^2 M^2}{1-a^2}\, \big\{ 1 + a^{2\tau} + 2\tau a^{2\tau} \big\}, & j = i, \\[6pt] \dfrac{1}{T}\, \dfrac{2 a^2 M^2}{1-a^2}, & j \ne i, \end{cases} $$

if there exist a < 1 and M ≥ 1 such that |C_ii(τ)| ≤ M a^τ |C_ii(0)| for all i. In our experiments, however, the bias is usually found to be much smaller than this upper bound. For the filter resampling it is rather difficult to show theoretically whether it can be used as an unbiased variance estimator. Nevertheless, our experiments show numerically that the filter resampling typically provides good absolute variance estimates.

Acknowledgments

K.-R.M. thanks Guido Nolte and the members of the Oberwolfach Seminar of September 2000, in particular Lutz Dümbgen and Enno Mammen, for helpful discussions and suggestions. K.-R.M. and A.Z. acknowledge partial funding by the EU project BLISS (IST-1999-14190). The studies were supported by a grant of the Bundesministerium für Bildung und Forschung (BMBF), FKZ 01IBB02A and 01IBB02B. We thank the PTB for providing the MEG data. Special thanks go to the reviewers, who gave highly valuable advice in the revision process.

References

[1] F. Meinecke, A. Ziehe, M. Kawanabe, and K.-R. Müller, "Assessing reliability of ICA projections – a resampling approach," in Proc. ICA'01, T.-W. Lee, Ed., 2001.
[2] F. Meinecke, A. Ziehe, M. Kawanabe, and K.-R. Müller, "Estimating the reliability of ICA projections," in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds., Cambridge, MA, 2002, MIT Press.
[3] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, first edition, 1993.
[4] J. Shao and D. Tu, The Jackknife and Bootstrap, Springer, New York, 1995.
[5] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley, 2001.
[6] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, pp. 1299–1319, 1998.
[7] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, second edition, 2001.
[8] Ch. Jutten and J. Herault, "Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture," Signal Processing, vol. 24, pp. 1–10, 1991.
[9] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[10] G. Deco and D. Obradovic, An Information-Theoretic Approach to Neural Computing, Springer, New York, 1996.
[11] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems (NIPS 95), D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. 1996, vol. 8, pp. 882–893, The MIT Press.
[12] A. J. Bell and T. J. Sejnowski, "An information maximisation approach to blind separation and blind deconvolution," Neural Computation, vol. 7, pp. 1129–1159, 1995.
[13] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non Gaussian signals," IEE Proceedings-F, vol. 140, no. 6, pp. 362–370, 1993.
[14] A. Ziehe and K.-R. Müller, "TDSEP – an efficient algorithm for blind separation using time structure," in Proc. Int. Conf. on Artificial Neural Networks (ICANN'98), L. Niklasson, M. Bodén, and T. Ziemke, Eds., Skövde, Sweden, 1998, pp. 675–680, Springer Verlag.
[15] S. Roberts and R. Everson, Eds., Independent Component Analysis: Principles and Practice, Cambridge University Press, 2001.
[16] L. Parra and C. Spence, "Convolutive blind source separation of non-stationary sources," IEEE Trans. on Speech and Audio Processing, vol. 8, no. 3, pp. 320–327, May 2000.
[17] T.-W. Lee, A. Ziehe, R. Orglmeister, and T. J. Sejnowski, "Combining time-delayed decorrelation and ICA: Towards solving the cocktail party problem," in Proc. ICASSP98, Seattle, 1998, vol. 2, pp. 1249–1252.
[18] N. Murata, S. Ikeda, and A. Ziehe, "An approach to blind source separation based on temporal structure of speech signals," Neurocomputing, vol. 41, no. 1–4, pp. 1–24, 2001; also BSIS Technical Report No. 98-2.
[19] L. Parra, C. D. Spence, P. Sajda, A. Ziehe, and K.-R. Müller, "Unmixing hyperspectral data," in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. 2000, pp. 942–948, MIT Press.
[20] S. Makeig, T.-P. Jung, D. Ghahremani, A. J. Bell, and T. J. Sejnowski, "Blind separation of event-related brain responses into independent components," Proc. Natl. Acad. Sci. USA, vol. 94, pp. 10979–10984, 1997.
[21] R. Vigário, V. Jousmäki, M. Hämäläinen, R. Hari, and E. Oja, "Independent component analysis for identification of artifacts in magnetoencephalographic recordings," in Advances in Neural Information Processing Systems, M. I. Jordan, M. J. Kearns, and S. A. Solla, Eds. 1998, vol. 10, pp. 229–235, The MIT Press.
[22] A. Ziehe, K.-R. Müller, G. Nolte, B.-M. Mackert, and G. Curio, "Artifact reduction in magnetoneurography based on time-delayed second order correlations," IEEE Transactions on Biomedical Engineering, vol. 47, no. 1, pp. 75–87, 2000; also GMD Technical Report No. 31, 1998.
[23] J.-F. Cardoso, "Multidimensional independent component analysis," in Proceedings of ICASSP '98, 1998, IEEE.
[24] E. Oja, Subspace Methods of Pattern Recognition, Res. Studies Press, Hertfordshire, 1983.
[25] G. H. Golub and C. F. van Loan, Matrix Computation, The Johns Hopkins University Press, London, 1989.
[26] C. G. J. Jacobi, "Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen," Crelle J. reine angew. Mathematik, vol. 30, pp. 51–94, 1846.
[27] J.-F. Cardoso and A. Souloumiac, "Jacobi angles for simultaneous diagonalization," SIAM J. Mat. Anal. Appl., vol. 17, no. 1, pp. 161 ff., 1996.
[28] Y. Weiss, "Segmentation using eigenvectors: A unifying view," in ICCV (2), 1999, pp. 975–982.
[29] M. Meila and J. Shi, "Learning segmentation by random walks," in Advances in Neural Information Processing Systems, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. 2001, vol. 13, MIT Press.
[30] C. J. Alpert, A. B. Kahng, and S. Z. Yao, "Spectral partitioning with multiple eigenvectors," Discrete Applied Mathematics, vol. 90, pp. 3–26, 1999.
[31] S. Makeig, S. Enghoff, T.-P. Jung, and T. Sejnowski, "Moving-window ICA decomposition of EEG data reveals event-related changes in oscillatory brain activity," in Proc. 2nd Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA'2000), Helsinki, Finland, 2000, pp. 627–632.
[32] B. L. R. De Moor, Ed., "DAISY: Database for the identification of systems, http://www.esat.kuleuven.ac.be/sista/daisy," 1997.
[33] L. De Lathauwer, B. De Moor, and J. Vandewalle, "Fetal electrocardiogram extraction by source subspace separation," in Proceedings of HOS'95, Aiguablava, Spain, 1995.
[34] A. Ziehe, G. Nolte, T. Sander, K.-R. Müller, and G. Curio, "A comparison of ICA-based artifact reduction methods for MEG," in Recent Advances in Biomagnetism, Proc. of the 12th International Conference on Biomagnetism, J. Nenonen, Ed., Espoo, Finland, 2001, pp. 895–898, Helsinki University of Technology.
[35] G. Nolte and G. Curio, "The effect of artifact rejection by signal-space projection on source localization accuracy in MEG measurements," IEEE Trans. Biomed. Eng., vol. 46, pp. 400–408, 1999.
[36] O. Ganslandt, D. Ulbricht, H. Kober, J. Vieth, C. Strauss, and R. Fahlbusch, "SEF-MEG localization of somatosensory cortex as a method for presurgical assessment of functional brain area," Electroencephalogr. Clin. Neurophysiol. Suppl., vol. 46, pp. 209–213, 1996.
[37] M. A. Uusitalo and R. J. Ilmoniemi, "The signal-space projection (SSP) method for separating MEG or EEG into components," Med. Biol. Eng. Comput., vol. 35, pp. 135–140, 1997.
[38] A. Hyvärinen, J. Särelä, and R. Vigário, "Spikes and bumps: Artefacts generated by independent component analysis with insufficient sample size," in Proc. Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA'99), Aussois, France, January 11–15, 1999, pp. 425–429.
[39] G. Wübbeler, B.-M. Mackert, F. Armbrust, M. Burghoff, P. Marx, G. Curio, and L. Trahms, "Measuring para-DC biomagnetic fields of the head using a horizontal modulated patient cot," Biomed. Tech. (Berl.), no. 43, pp. 232–233, 1999, in German.
[40] B.-M. Mackert, J. Mackert, G. Wübbeler, F. Armbrust, K.-D. Wolff, M. Burghoff, L. Trahms, and G. Curio, "Magnetometry of injury currents from human nerve and muscle specimens using superconducting quantum interference devices," Neuroscience Letters, vol. 262, no. 3, pp. 163–166, Mar 1999.
[41] G. Curio, S. M. Erné, M. Burghoff, K.-D. Wolff, and A. Pilz, "Non-invasive neuromagnetic monitoring of nerve and muscle injury currents," Electroencephalography and Clinical Neurophysiology, vol. 89, no. 3, pp. 154–160, 1993.
[42] G. Wübbeler, A. Ziehe, B.-M. Mackert, K.-R. Müller, L. Trahms, and G. Curio, "Independent component analysis of non-invasively recorded cortical magnetic DC-fields in humans," IEEE Transactions on Biomedical Engineering, vol. 47, no. 5, pp. 594–599, 2000.
[43] B.-M. Mackert, G. Wübbeler, P. Marx, L. Trahms, and G. Curio, "Non-invasive long-term recordings of cortical 'direct current' (DC-) activity in humans using magnetoencephalography," Neuroscience Letters, vol. 273, no. 3, pp. 159–162, Oct 1999.
[44] A. Hyvärinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Computation, vol. 9, no. 7, pp. 1483–1492, 1997.
[45] V. Roth, T. Lange, M. Braun, and J. Buhmann, "A resampling based approach to cluster validation," 2001, unpublished manuscript.
[46] H. Attias, "Independent factor analysis," Neural Computation, vol. 11, no. 4, pp. 803–851, 1999.
[47] H. Valpola, Bayesian Ensemble Learning for Nonlinear Factor Analysis, vol. 108 of Acta Polytechnica Scandinavica, Mathematics and Computing Series, Finnish Academies of Technology, Espoo, Finland, 2000.

Frank Meinecke is a master's student at the University of Potsdam and at Fraunhofer FIRST, Berlin. His research interest is focused on nonlinear dynamics and signal processing, especially blind source separation and independent component analysis.

Andreas Ziehe received the Diplom degree in computer science from Humboldt University Berlin in 1998. He is currently working towards his Ph.D. at the University of Potsdam and at Fraunhofer FIRST, Berlin. His research interest is focused on computational methods for data analysis and signal processing, especially blind source separation and independent component analysis.

Motoaki Kawanabe received both his master's degree and his Ph.D. in mathematical engineering from the University of Tokyo, in Prof. Amari's lab. Since 2000 he has been with the IDA group at Fraunhofer FIRST. His research interest is focused on statistics, information theory and information geometry, and recently also on blind source separation and independent component analysis.

Klaus-Robert Müller received the Diplom degree in mathematical physics in 1989 and the Ph.D. in theoretical computer science in 1992, both from the University of Karlsruhe, Germany. From 1992 to 1994 he worked as a postdoctoral fellow at GMD FIRST in Berlin, where he started to build up the intelligent data analysis (IDA) group. From 1994 to 1995 he was a European Community STP Research Fellow at the University of Tokyo in Prof. Amari's lab. Since 1995 he has been head of the IDA department at GMD FIRST (since 2001 Fraunhofer FIRST) in Berlin, and since 1999 he has held a joint associate professorship of GMD and the University of Potsdam. He has lectured at Humboldt University, Technical University Berlin and the University of Potsdam. In 1999 he received the annual national prize for pattern recognition (Olympus Prize) awarded by the German pattern recognition society DAGM. He serves on the editorial boards of Computational Statistics and IEEE Transactions on Biomedical Engineering, and on the program and organization committees of various international conferences. His research areas include statistical physics and statistical learning theory for neural networks, support vector machines and ensemble learning techniques. His present interests have expanded to time-series analysis, blind source separation techniques and statistical denoising methods for the analysis of biomedical data.