IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011

Component Analysis Approach to Estimation of Tissue Intensity Distributions of 3D Images

Vitali Zagorodnov*, Member, IEEE, and Arridhana Ciptadi, Student Member, IEEE

Abstract—Many segmentation algorithms in medical imaging rely on accurate modeling and estimation of tissue intensity probability density functions. Gaussian mixture modeling, currently the most common approach, has several drawbacks, such as reliance on a Gaussian model and the iterative local optimization used to estimate the model parameters. It also does not take advantage of the substantially larger amount of data provided by 3D acquisitions, which are becoming standard in clinical environments. We propose a novel and completely nonparametric algorithm to estimate the tissue intensity probabilities in 3D images. Instead of relying on the traditional framework of iterating between classification and estimation, we pose the problem as an instance of a blind source separation problem, where the unknown distributions are treated as sources and histograms of image subvolumes as mixtures. The new approach performed well on synthetic data and real magnetic resonance imaging (MRI) scans of the brain, robustly capturing intensity distributions of even small image structures and partial volume voxels.

Index Terms—Blind source separation, Gaussian mixtures, image segmentation, magnetic resonance imaging (MRI), tissue intensity distributions.

I. INTRODUCTION

MANY segmentation algorithms in medical imaging rely on accurate modeling and estimation of tissue intensity probability density functions (pdfs) [1]–[12], usually in the context of statistical region-based segmentation. Commonly, tissue intensity probabilities are modeled using the finite mixture (FM) model [2], [13], [14] and its special case, the finite Gaussian mixture (FGM) model [15], [16]. In these models the intensity pdf of each tissue class is represented by a parametric (e.g., Gaussian in the case of FGM) function called the component density, while the intensity pdf of the whole image is modeled by a weighted sum of the tissue component densities. The fitting is usually done using the Expectation Maximization (EM) algorithm [1], [3], [6], [9]–[11], [17]–[19], which iterates between classification and parameter estimation until a stable state is reached. The FM models combined with the EM algorithm have been incorporated into many existing image segmentation pipelines, usually in the context of brain tissue segmentation (FreeSurfer [20], SIENAX [21] part of FSL [22], SPM [16]).

Manuscript received October 19, 2010; accepted November 29, 2010. Date of publication December 17, 2010; date of current version March 02, 2011. This work was supported by SBIC C-012/2006 grant provided by A*STAR, Singapore (Agency for Science, Technology and Research). Asterisk indicates corresponding author.
*V. Zagorodnov is with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore.
A. Ciptadi is with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore.
Digital Object Identifier 10.1109/TMI.2010.2098417

The main drawback of FGM models is that the tissue intensity distributions do not always have a Gaussian form. The noise in magnetic resonance (MR) images is known to be Rician rather than Gaussian [23]. But the largest deviation from Gaussianity is due to the presence of partial volume (PV) voxels [6], [8], [24]–[27].
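To make this baseline concrete, the following minimal sketch (ours, not from the paper) fits the standard FGM model with EM, using scikit-learn's GaussianMixture as an assumed stand-in for the EM implementations cited above; the simulated tissue means and the flat band of PV voxels between two of them are illustrative choices, and show how such voxels distort a purely Gaussian fit.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated 1D intensities: three "pure" tissues plus a flat band of
# partial-volume (PV) voxels between the middle and upper tissue means.
pure = np.concatenate([rng.normal(m, 5.0, n)
                       for m, n in [(40, 3000), (105, 6000), (160, 5000)]])
pv = rng.uniform(105, 160, 800) + rng.normal(0, 5.0, 800)  # PV voxels
intensities = np.concatenate([pure, pv])[:, None]

# EM fit of a 3-component finite Gaussian mixture (FGM).
gmm = GaussianMixture(n_components=3, random_state=0).fit(intensities)
print("means:", np.sort(gmm.means_.ravel()))   # pulled toward the PV band
print("weights:", gmm.weights_)
```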
One way to model such voxels is by representing their distribution using a uniform mixture of "pure" distributions [8], [25], [28]. However, this complicates the optimization, as the functional involved becomes considerably more nonlinear. The uniform mixture assumption has also been questioned [27]. To simplify the problem, several researchers suggested modeling the intensity distributions of partial volume voxels as Gaussian, which appears to be a good approximation for sufficiently noisy images [6]. This approximation has been incorporated into the most recent versions of SPM [16]. However, a more recent study [29] revealed that the model of [28] can still achieve a better fit.

Another issue associated with this framework is potential convergence to a poor local optimum, which means a sufficiently close parameter initialization is usually required [4], [13], especially for the distribution means [29]. The convergence of the EM algorithm to a more meaningful optimum can be improved by including prior information in the classification step, such as pixel correlations [30], MRF priors [7], [25], [31], [32], or a probabilistic atlas [5], [9], [31]–[33]. Even though this approach is helpful, especially for very noisy images [7], it can also introduce bias in the estimation [30]. And while a probabilistic atlas is useful when segmenting brain tissues [5] or abdominal organs [9], its construction is not always possible, as is the case in the segmentation of brain lesions [34], tumor detection [35], or localization of fMRI activations. Finally, the approach often fails to take advantage of the substantially larger amount of data present in 3D images, such as those obtained by magnetic resonance (MR) and X-ray computed tomography (CT) scanning techniques.

We propose a novel nonparametric algorithm to estimate tissue intensity probabilities in 3D images that completely departs from the traditional classification-estimation framework. To illustrate the main idea behind our approach, consider the following introductory example. Shown in Fig. 1 are the histograms of a 3D T1-weighted MR image and two of its 2D slices. The observed variability in the shape of the 2D histograms is due to varying tissue proportions across the slices. For example, the slice in Fig. 1(b) contains a relatively small amount of cerebro-spinal fluid (CSF) and hence the lowest intensity peak (see Fig. 1(c) for reference) is practically missing from its histogram. The proportion of CSF is increased in the slice shown in Fig. 1(c) due to the presence of ventricles, leading to the reappearance of the lowest intensity peak. This slice, however, contains only a small amount of gray matter (GM), thus lowering the middle peak of the histogram.

Fig. 1. Histograms of a 3D brain image and several of its slices. (a) 3D image and its histogram. (b) Transverse slice 128 and its histogram. (c) Transverse slice 152 and its histogram.

While this variability can potentially provide useful information for mixture estimation, it is traditionally discarded by performing estimation directly on the histogram of the whole volume. The proposed approach treats the histograms of 2D slices (or any other subvolumes) as mixture realizations of the component densities. This allows stating the unmixing problem in a blind source separation (BSS) framework [37], [38]. To solve the problem we use a framework that is similar to that of independent component analysis (ICA) [39], but without relying on the independence assumption. Instead we use the fact that the underlying components must be valid probability distributions with different means, which results in a simple convex linear optimization problem that guarantees convergence to a global optimum (ICA's iterative procedure, in comparison, converges only to a local optimum). Our approach provides a promising alternative for estimating component densities in 3D images that is more accurate compared to state-of-the-art approaches.

We note that a preliminary version of this algorithm was previously reported in a conference paper [36]. The present paper provides a complete description of the algorithm, introduces a covariance matrix correction to improve its performance, and presents a more complete set of validation experiments, including evaluation of the effects of intensity nonuniformity and of individual algorithm components/settings (subvolume size, relaxation of the number of principal components, correction of the covariance matrix).

II. PROBLEM STATEMENT

Let $V$ be a 3D image volume partitioned into a set of subvolumes $V_1, \ldots, V_m$. We assume the voxel intensities of $V$ can take $n$ distinct values and are drawn from $k$ unknown probability mass functions (pmfs) $f_1, \ldots, f_k$. For example, a brain volume can be assumed to have three main tissues, white matter (WM), gray matter (GM), and cerebro-spinal fluid (CSF), so $k = 3$. For an 8-bit acquisition, $n = 256$. Subvolumes can be chosen arbitrarily, for example as coronal, sagittal, or transverse slices of the 3D volume.

Let $h_i$ be the $n$-bin histogram of $V_i$, normalized to sum to 1. Then

$$ h_i = \sum_{j=1}^{k} w_{ij} f_j + n_i, \qquad (1) $$

where $w_{ij}$ is the $j$th tissue proportion in the $i$th subvolume, $\sum_j w_{ij} = 1$, and $n_i$ is the noise term that reflects the difference between the actual probability distribution and its finite approximation by a histogram. Let $H$ be the $m \times n$ matrix whose rows are the subvolume histograms $h_i$, and let $F$ be the $k \times n$ matrix whose rows are the tissue pmfs $f_j$. Rewriting (1) in matrix form yields

$$ H = WF + N. \qquad (2) $$

This is identical to the blind source separation (BSS) formulation, with subvolume histograms as mixtures and unknown tissue pmfs as sources. Our goal is to estimate $f_1, \ldots, f_k$ as well as their mixing weights $W$ given $H$.

Our solution requires several assumptions, most of which are general to a BSS problem: $m \geq k$, $n \geq k$, and sufficient variability of the mixing proportions $w_{ij}$. These can be easily satisfied with a proper choice of partitioning and histogram dimensionality. We also assume that the distributions have different means and are sufficiently separated from each other, where the meaning of sufficient separation is detailed in Section III. These assumptions are not very restrictive and are generally satisfied for medical images [15], [40].
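As a concrete illustration of (1) and (2), the following sketch (ours, not the authors' code) assembles the mixture matrix H from transverse slices; it assumes intensities are already quantized to the range 0..n_bins-1 and that every slice is nonempty.

```python
import numpy as np

def histogram_matrix(volume, n_bins=256):
    """Stack normalized per-slice histograms into the m-by-n mixture matrix H.

    volume : 3D integer array; each transverse slice is one subvolume V_i.
    """
    slices = [volume[z] for z in range(volume.shape[0])]
    H = np.stack([np.bincount(np.clip(s.ravel(), 0, n_bins - 1),
                              minlength=n_bins) for s in slices])
    return H / H.sum(axis=1, keepdims=True)   # each row h_i sums to 1

# Each row of H is one realization h_i = sum_j w_ij f_j + n_i of (1).
```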
III. PROPOSED SOLUTION

The BSS problem has been studied extensively in recent years, with several solutions proposed for selected special cases, e.g., factor analysis (FA) [41] for Gaussian sources and independent factor analysis (IFA) [42] or independent component analysis (ICA) [39] for independent non-Gaussian sources. The general framework for performing decomposition (2), as exemplified by ICA, can be briefly summarized as follows.

1) Center the matrix $H$ so that all columns sum to zero, by subtracting the row vector $\mu = \frac{1}{m}\sum_i h_i$ from all rows of $H$. Let $\bar{H}$ designate the centered version of $H$.

2) Apply principal component analysis (PCA) to $\bar{H}$. Practically, this can be done by performing the SVD decomposition $\bar{H} = USV^T$ and setting $A = US$ and $P = V^T$. The rows of matrix $P$ contain a set of orthogonal principal components and matrix $A$ contains the weights.

3) A decomposition of the original (noncentered) matrix can be obtained by appending $\mu$ as an additional row of $P$ and changing $A$ to $[A \;\; \mathbf{1}]$, where $\mathbf{1}$ is a column vector where all elements are equal to 1.

4) Truncate $A$ and $P$ to preserve a smaller (specified) number of PCA components, corresponding to the largest eigenvalues of $\bar{H}\bar{H}^T$. The square roots of these eigenvalues are located on the diagonal of $S$. The resultant decomposition has the form $H = \hat{A}\hat{P} + E$, where the "hat" symbol refers to truncation and the appearance of the error term $E$ is due to the discarding of components.

5) Further decompose $\hat{P}$ as $\hat{P} = \Lambda F$, so that $H = \hat{A}\Lambda F + E$. In practice, this step is performed in the opposite direction, as

$$ F = B\hat{P}, \qquad (3) $$

where the rows of $B$ are estimated one-by-one by maximizing some measure of non-Gaussianity of the resultant components (rows of matrix $F$).

Note that this approach cannot be applied directly to our problem, because the rows of $F$ (step 5) are not constrained to represent valid probability distribution functions (non-negative and sum to 1). Furthermore, maximizing non-Gaussianity might not result in a meaningful solution, as our source components are neither Gaussian nor independent. However, the ICA framework can be adapted by preserving steps 1–4 and modifying step 5 as follows.

Let $\hat{p}_1, \ldots, \hat{p}_r$ be the set of preserved PCA components (including the mean row added in step 3) and let $f = \sum_t c_t \hat{p}_t$ be a candidate component. Let $\mathbf{1}$ be a column vector where all elements are equal to 1. Since the component densities represent valid pmfs, and hence are non-negative and sum to 1, this imposes equality and inequality constraints on the elements of $c$:

$$ f\mathbf{1} = \sum_t c_t \,(\hat{p}_t \mathbf{1}) = 1 \qquad (4) $$

and

$$ f(x) = \sum_t c_t \,\hat{p}_t(x) \geq 0 \quad \text{for all } x. \qquad (5) $$

These constraints can be combined with the assumptions that the component densities have different means and are sufficiently separated from each other to estimate $F$. This is done in the following sections.

Fig. 2. Explanation of definitions used in Lemma 1. (a) Two original distributions $f_1$ and $f_2$ with means $\mu_1$ and $\mu_2$, respectively. (b) Estimated PCA components $p_1$ and $p_2$. (c) Various linear combinations of PCA components.

A. Estimating Components With Largest and Smallest Means

Let $f_1$ and $f_k$ be the components with the smallest and largest mean, respectively. Then these can be estimated by minimizing (for $f_1$) or maximizing (for $f_k$) the mean of a linear combination of the $\hat{p}_t$'s, where the coefficients of the linear combination satisfy constraints (4) and (5). This follows from the following lemma, which proves this statement for $k = 2$, and its corollary, which extends the result to arbitrary $k$.

Lemma 1: Let $\mu_1 < \mu_2$ be the means of the underlying probability distributions $f_1$ and $f_2$. Let $f = \sum_t c_t \hat{p}_t$ be an arbitrary linear combination of the $\hat{p}_t$'s that satisfies (4) and (5). Then,

$$ \mu_1 - \varepsilon_1 \;\leq\; \mu(f) \;\leq\; \mu_2 + \varepsilon_2, \qquad (6) $$

where $\varepsilon_1, \varepsilon_2 \geq 0$ are small slack terms. The equalities are achieved when $f = f_1$ and $f = f_2$, respectively.

Proof: Explanations of the terms used in the lemma statement are given in Fig. 2. From (4) and (3) it follows that $f$ is also a linear combination of the $f_j$'s,

$$ f = a_1 f_1 + a_2 f_2, \qquad a_1 + a_2 = 1. \qquad (7) $$

Then $\mu(f) = a_1\mu_1 + a_2\mu_2$, and $\mu(f)$ is minimized by making $a_1$ as large as possible. However, the largest possible $a_1$ is controlled by the non-negativity of $f$: for $a_1 > 1$ the weight $a_2 = 1 - a_1$ is negative, and $f$ eventually turns negative near the peak of $f_2$. This leads to the left side of inequality (6). The right side of inequality (6) can be obtained in a similar fashion, by making $a_2$ as large as possible.

The intuition behind Lemma 1 can be derived from Fig. 2(c), which shows several linear combinations of principal components estimated on the basis of a mixture of two Gaussian distributions with means $\mu_1$ and $\mu_2$. Some linear combinations will be rejected because they contain negative values. One combination appears very similar to $f_1$, but will also be rejected because of the negative bump aligned with the peak of $f_2$. Further decreasing the coefficient in front of $p_2$ will lead to the disappearance of this bump and convergence to $f_1$, the component with minimum mean. Another combination is non-negative, but its mean is slightly less than $\mu_2$ because of the presence of a small positive bump aligned with the peak of $f_1$. Reducing the coefficient in front of $p_1$ will remove the bump and increase the mean of this combination, converging to $f_2$, the component with maximum mean.

In practice, the $\varepsilon$'s are usually small and can be discarded. For Gaussian components on an unbounded domain, it can be straightforwardly shown that $\varepsilon_1 = \varepsilon_2 = 0$. If we ignore $\varepsilon_1$ and $\varepsilon_2$, the equalities in (6) are achieved when $f = f_1$ or $f = f_2$. A similar statement can be made in the case of more than two components; see the following corollary.

Corollary 1: Let $\mu_1 < \mu_2 < \cdots < \mu_k$ be the means of the underlying probability distributions. Let $f = \sum_t c_t \hat{p}_t$ satisfy (4) and (5). Then

$$ \mu_1 - \varepsilon_1 \;\leq\; \mu(f) \;\leq\; \mu_k + \varepsilon_k. \qquad (8) $$

The equalities hold when $f = f_1$ or $f = f_k$.

In practice, the coefficients of the linear combinations of the $\hat{p}_t$'s corresponding to the components with smallest and largest mean can be estimated by solving the following linear programming optimization problems, respectively:

$$ \min_{c}\; \mu\Big(\sum_t c_t \hat{p}_t\Big) \quad \text{subject to (4) and (5)}, \qquad (9) $$

$$ \max_{c}\; \mu\Big(\sum_t c_t \hat{p}_t\Big) \quad \text{subject to (4) and (5)}. \qquad (10) $$

The next section discusses estimation of the remaining components, assuming their number is larger than two.

B. Estimating Remaining Components

In this section we show that all remaining components can be estimated iteratively, one-by-one, by minimizing their inner product (effectively, overlap) with the components that have already been estimated, where the optimization is reduced to solving another linear programming problem. The following lemma shows this for $k = 3$, while Lemma 3 extends the result to arbitrary $k$. Here and throughout, $\langle \cdot, \cdot \rangle$ designates the inner product and $\|\cdot\|$ is the norm.

Lemma 2: Let $f_1$, $f_2$, and $f_3$ be the underlying components, where $f_1$ and $f_3$ have been estimated using (9) and (10), and let

$$ \langle f_1, f_2 \rangle + \langle f_2, f_3 \rangle \;<\; \min\big(\|f_1\|^2, \|f_3\|^2\big). $$

Then the coefficients of $f_2$ are the solution of the following linear programming problem:

$$ \min_{c}\; \langle f, f_1 \rangle + \langle f, f_3 \rangle \quad \text{subject to (4) and (5)}, \quad f = \sum_t c_t \hat{p}_t. \qquad (11) $$

Proof: The sum of the inner products between $f$, an arbitrary linear combination $f = \sum_j a_j f_j$ of the $f_j$'s, and $f_1$ and $f_3$ is

$$ g(a) = \langle f, f_1 \rangle + \langle f, f_3 \rangle = \sum_j a_j \big( \langle f_j, f_1 \rangle + \langle f_j, f_3 \rangle \big). $$

The partial derivatives of this function are $\partial g / \partial a_j = \langle f_j, f_1 \rangle + \langle f_j, f_3 \rangle$. Since $g$ is a linear function of the $a_j$'s, and, up to the small slack terms, $\sum_j a_j = 1$ and $a_j \geq 0$, its minimum occurs at $a = e_{j^*}$, where $j^*$ is the coordinate along which $g$ has the smallest slope, i.e., $j^* = \arg\min_j \big( \langle f_j, f_1 \rangle + \langle f_j, f_3 \rangle \big)$. According to the lemma conditions, $\langle f_2, f_1 \rangle + \langle f_2, f_3 \rangle < \min(\|f_1\|^2, \|f_3\|^2) \leq \min\big( \|f_1\|^2 + \langle f_1, f_3 \rangle,\; \|f_3\|^2 + \langle f_3, f_1 \rangle \big)$, hence $j^* = 2$. Therefore, the minimum value of $g$ is achieved when $a_2 = 1$ and $a_1 = a_3 = 0$, i.e., $f = f_2$.

The necessary condition in Lemma 2 can be interpreted as sufficient separation between the underlying components, since these are non-negative. More specifically, it requires that the sum of the overlaps between $f_1$ and $f_2$ and between $f_2$ and $f_3$ is smaller than the squared norm of $f_1$ or of $f_3$. This translates into a minimum signal-to-noise ratio (SNR) of 1.66 in the case of Gaussian components. This requirement can be easily satisfied for most medical images, where typical SNRs usually exceed 3.

In the case of more than three components, starting from the first two estimated components, all other components can be estimated iteratively one-by-one by minimizing their overlap with all previously estimated components, as shown in the following lemma.

Lemma 3: Let $f_1, \ldots, f_k$ be the underlying components, of which the first $l$ are already known. Let

$$ \sum_{j=1}^{l} \langle f_u, f_j \rangle \;<\; \sum_{j=1}^{l} \langle f_v, f_j \rangle \quad \text{for all } u > l \text{ and } v \leq l. \qquad (12) $$

Then $f = \sum_t c_t \hat{p}_t$, where $c$ is the solution of the following linear programming problem, will coincide with one of the remaining unknown components:

$$ \min_{c}\; \sum_{j=1}^{l} \langle f, f_j \rangle \quad \text{subject to (4) and (5)}. \qquad (13) $$

Proof: Let $g(f) = \sum_{j=1}^{l} \langle f, f_j \rangle$. Then $g\big(\sum_u a_u f_u\big) = \sum_u a_u\, g(f_u)$. As mentioned in Lemma 2, the minimum of $g$ occurs when $a = e_{u^*}$, where $u^*$ is the coordinate along which $g$ has the smallest slope. According to (12), the minimum must correspond to one of the unknown components, i.e., $u^* > l$.

Note that, for each $v \leq l$, one of the terms on the right side of inequality (12), $\langle f_v, f_v \rangle = \|f_v\|^2$, is equal to the squared norm of a component, while all other terms on both sides correspond to overlaps between components. Hence condition (12) can again be interpreted as sufficient separation between the underlying components.

IV. REDUCING THE EFFECT OF NOISE ON ESTIMATION OF PRINCIPAL COMPONENTS

A histogram is an approximation of the underlying probability distribution (or probability mass function, for images). The difference between the two is due to the finite set of data used for histogram evaluation and can be treated as noise. When identically distributed, this approximation noise is unlikely to alter the directions of the principal components, a property widely exploited by PCA-based denoising. However, the histogram noise may not be identically distributed, as shown in the comparison between a piecewise uniform probability mass function and its approximation by a histogram evaluated using 2000 data points (Fig. 3). Note that the approximation noise has larger variance in the range [0.5, 1], where the original probability mass function is larger, and smaller variance in the range [0, 0.5], where the probability mass function is smaller.

Fig. 3. (a) Piecewise constant probability density function and (b) its approximation by a histogram.

The noise statistics can be derived analytically by noticing that histogram amplitudes have binomial distributions. Let $f_i$ be the underlying probability mass function of the $i$th image subvolume and $N$ be the number of data points (image voxels). The value of the histogram at the $x$th bin can be thought of as a sum of $N$ binary random variables, each representing a random draw that has an $f_i(x)$ chance of success, and hence has a binomial distribution with mean $N f_i(x)$ and variance $N f_i(x)(1 - f_i(x))$. The normalized histogram $h_i$ (the histogram scaled to sum to 1) then has mean $f_i(x)$ and variance $f_i(x)(1 - f_i(x))/N$, which means the approximation noise $n_i$ has zero mean and variance

$$ \operatorname{Var}[n_i(x)] = \frac{f_i(x)\,\big(1 - f_i(x)\big)}{N}. \qquad (14) $$

Note that the variance increases almost linearly with $f_i(x)$ when the latter is small, consistent with our observation in Fig. 3.

The influence of the histogram noise can be reduced by estimating its contribution to the covariance matrix and subsequently removing it. Let $h_i = g_i + n_i$, where $g_i = \sum_j w_{ij} f_j$ is the noise-free mixture; then the covariance matrix can be written as follows:

$$ C = \operatorname{Cov}(g + n) = \operatorname{Cov}(g) + \operatorname{Cov}(g, n) + \operatorname{Cov}(n, g) + \operatorname{Cov}(n). \qquad (15) $$

The first term corresponds to the covariance matrix that has not been affected by noise. Since $g_i$ and $n_i$ are uncorrelated, the second and third terms can be neglected. The last term is the covariance matrix of the histogram noise. Its nondiagonal entries are zero, because the noise values at different bins are zero mean and uncorrelated. The diagonal entries can be approximated using (14); when all subvolumes have equal size $N$,

$$ \operatorname{Cov}(n)_{xx} \approx \frac{\bar{h}(x)\,\big(1 - \bar{h}(x)\big)}{N}, \qquad (16) $$

where $\bar{h}$ is the average of the normalized subvolume histograms. Subtracting the estimated noise variance from (15) yields the corrected covariance matrix.
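A minimal NumPy sketch of this correction (ours, not the authors' Matlab code; it assumes equal-size subvolumes, with H the histogram matrix of Section II):

```python
import numpy as np

def corrected_covariance(H, N):
    """Bin-by-bin covariance of the subvolume histograms with the estimated
    binomial histogram-noise variance (14) removed from the diagonal (16).

    H : (m, n) matrix of normalized subvolume histograms.
    N : number of voxels per subvolume (assumed equal for all subvolumes).
    """
    m, n = H.shape
    Hc = H - H.mean(axis=0)                  # center each bin across subvolumes
    C = Hc.T @ Hc / m                        # raw n-by-n covariance matrix
    h_bar = H.mean(axis=0)                   # pooled estimate of f_i(x)
    noise_var = h_bar * (1.0 - h_bar) / N    # diagonal of the noise covariance
    return C - np.diag(noise_var)
```

The corrected matrix can then be fed to an eigendecomposition in place of the raw covariance, which is what motivates the use of a covariance-based PCA routine in the implementation described next.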
In other words, if intensity variation of a particular tissue gives raise to an additional principal component, this variation will be treated as a separate class of interest, albeit “inferior” to other classes. The large overlap between these inferior distributions and the main classes will guarantee that the former will not appear in the estimation loop output until all the latter ones are estimated, at which point the estimation process can be stopped. VI. EXPERIMENTAL RESULTS A. Estimating Tissue Intensity Distributions From Structural MRI Data Fig. 4. Algorithm implementation. th component (line 7) is implemented as described in (9) and (10) of Lemma 1. The algorithm to estimate each additional component (line 10) is implemented as described in (13) of Lemma 3. The total running time of the algorithm is on the order of a few seconds, for a typical 3D MR image. During the initial testing on simulated data we have discovered that the non-negative constraint imposed on estimated comwas too strict. The histogram noise and errors in ponents estimating of the principal subspace can lead to an infeasible optimization problem or a very narrow search space. To overto come this we relaxed the non-negativity constraint , where is the number of the histogram bins. This negative bound was small enough not to cause any visible estimation problems in our experiments. We have also found that intensity nonuniformity (smoothly varying intensity across the same tissue) and the presence of several tissues with slightly different intensities, e.g., subcortical structures in brain images, can increase the dimensionality of the original subspace. In this case some of the estimated principal components may represent slight variations in tissue appearance rather than a new tissue, resulting in a large overlap between the estimated components. This poses two problems. First, the appearance of two overlapping component representing the same tissue means that some other tissue will not be represented. Second, it stalls the optimization procedure, which requires sufficient separation between the components. To overcome these issues we implemented a simple failure detection in line 11, which measures the overlap between the current component and all previously estimated components. The overlap here was defined as the intersection area between components, i.e., area under the curve of . If the overlap is greater than 2/3, an empirically set threshold, the counter is reset to zero and the number of principal components is increased by 1 to better capture the original subspace. The new PCA subspace with increased dimensionality will contain a mixture of components corresponding to distinct tissue distributions as well as some components corresponding to their intensity variations. For performance evaluation we used T1-weighted brain images from two publicly available data sets, BrainWeb (http://www.bic.mni.mcgill.ca/brainweb/) and IBSR (http://www.cma.mgh.harvard.edu/ibsr/), and one data set acquired at our site (CNL). The BrainWeb data set contains realistic synthesized brain volumes with varying degrees of noise and intensity nonuniformity, and 1 1 1 mm resolution. The IBSR data set contains 18 real MR acquisitions made on a 1.5 T scanner with 1 1 1.5 mm resolution. The CNL data set contained times 1 1 mm brain scans of 14 healthy aging adults (age 55–82), acquired on Siemens Tim Trio 3 T scanner. 
VI. EXPERIMENTAL RESULTS

A. Estimating Tissue Intensity Distributions From Structural MRI Data

For performance evaluation we used T1-weighted brain images from two publicly available data sets, BrainWeb (http://www.bic.mni.mcgill.ca/brainweb/) and IBSR (http://www.cma.mgh.harvard.edu/ibsr/), and one data set acquired at our site (CNL). The BrainWeb data set contains realistic synthesized brain volumes with varying degrees of noise and intensity nonuniformity, at 1 x 1 x 1 mm resolution. The IBSR data set contains 18 real MR acquisitions made on a 1.5 T scanner with 1 x 1 x 1.5 mm resolution. The CNL data set contained 1 x 1 x 1 mm brain scans of 14 healthy aging adults (age 55–82), acquired on a Siemens Tim Trio 3 T scanner.

The BrainWeb and IBSR data sets contained ground truth labeling for GM and WM, by construction for BrainWeb and manually segmented by an expert for IBSR. To obtain ground truth labeling for the CNL data set, these images were processed using FreeSurfer 3.04 [43] and the resultant pial and WM surfaces were then edited by an expert and converted to volume masks. In addition, the BrainWeb data set contained ground truth labeling for CSF, which was not available for the IBSR and CNL data sets. To compensate for this we removed all non-brain tissues using GCUT, an in-house developed skull stripping technique [44]. Following this we defined the CSF label as the set of voxels preserved by the skull stripping but not belonging to the GM or WM labels. Note that the CSF region so defined may contain traces of non-brain tissues left behind by the skull stripping procedure.

We further augmented the ground truth to include mix classes of partial volume voxels (GM-WM and CSF-GM). The partial volume voxels were defined as the voxels located near the boundary between two tissues. Practically, these were identified by performing a one-voxel erosion of each tissue with a standard six-neighbor connectivity and selecting the eroded voxels as belonging to the mix classes.

All images were first skull stripped using GCUT [44] and then processed by our algorithm, set to estimate either three (only the main tissues) or five tissue intensity distributions. Shown in Fig. 5 are the results of applying our algorithm to images from the BrainWeb data set. Our algorithm exhibited excellent performance when only the main tissue classes were considered [Fig. 5(left)] and performed slightly worse when the mix classes were included, especially for large noise levels [Fig. 5(right)]. This decrease in estimation quality was primarily due to increased error in estimating the tissue proportions rather than the distribution shapes.

Fig. 5. Estimating 3 classes [CSF GM WM] (left) and 3 pure classes [CSF GM WM] + 2 mix classes [CSF-GM GM-WM] (right) on BrainWeb data. Ground truth distributions are shown using dotted lines. (a) Noise = 3. (b) Noise = 5. (c) Noise = 7.

It is remarkable that our algorithm was able to capture the two-peak shape of the CSF-GM mix class distribution, which appears at the lower noise levels [Fig. 5(a) and (b)]. This would not be possible with a Gaussian mixture model, which represents each distribution by a single-peak Gaussian. Our algorithm also compares favorably with several other approaches, which were evaluated on a BrainWeb volume in [29] (compare Fig. 5(c) here with Fig. 6 in [29]). Our algorithm also performed well on real MRI data from the IBSR and CNL data sets [Fig. 6(a)].

Fig. 6. Estimating three classes [CSF GM WM] on an IBSR volume 5 (left) and a CNL volume (right). Ground truth distributions are shown using dotted lines. (a) Our algorithm. (b) SPM 8.

For quantitative performance evaluation we used two measures. One was the average Pearson correlation between the estimated and true distributions of each class; its goal was to capture the quality of the distribution shape estimation. The other was the absolute difference between the estimated and the true proportions for each class.
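Both measures reduce to a few lines of code; a sketch under our reading, where f_est/f_true are a class's estimated and true pmfs over shared bins and w_est/w_true are the class proportions:

```python
import numpy as np

def shape_correlation(f_est, f_true):
    """Pearson correlation between estimated and true class distributions."""
    return np.corrcoef(f_est, f_true)[0, 1]

def proportion_error(w_est, w_true):
    """Absolute difference between estimated and true class proportions."""
    return abs(w_est - w_true)

# Per data set, both measures are averaged over classes and volumes
# (Tables I and II report these averages).
```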
The summary of the performance results for the BrainWeb and IBSR/CNL data sets is provided in Tables I and II, respectively. Overall, our approach achieved excellent average correlations of 0.88–0.97 and average proportion errors of less than 0.06.

TABLE I. Correlation and proportion error between estimated and true distributions, averaged over BrainWeb volumes (noise levels 1, 3, 5, and 7). * indicates p < 0.005.

TABLE II. Correlation and proportion error between estimated and true distributions, averaged over 18 IBSR volumes and 14 CNL volumes. (A) PC relaxation, (B) histogram error correction. ** indicates p < 0.001.

For reference, we compared these results with those obtained by the well-known SPM 8 segmentation software package, which models the image histogram using a mixture of Gaussians and iterates between estimating the mixture parameters, tissue classification, intensity nonuniformity, and registration [16]. We ran SPM with the default settings, using 10 Gaussian distributions: 2 for GM, 2 for WM, and 6 for CSF + other structures, which were then linearly combined to obtain the distributions of the three main brain tissues. Overall, SPM performed worse compared to our method, achieving correlations of 0.81–0.85 and proportion errors of 0.05–0.13 (Table II). Considering that the CSF class was not as clearly defined as the GM and WM classes, we have also included results restricted to the GM and WM classes in the last two columns of the table. A typical SPM output is shown in Fig. 6(b).

We have also evaluated the effects of intensity nonuniformity and of individual algorithm components/settings (subvolume size, relaxation of the number of principal components, correction of the covariance matrix) on the estimation performance. Changing the subvolume size from 3 x 3 x 3 cuboids to 20 x 20 x 20 cuboids to sagittal slices had little effect on performance (Table I). However, the average correlations decreased in the presence of intensity nonuniformity, and the change was statistically significant. As shown in Fig. 7, nonuniformity causes the estimated distributions with minimum and maximum means to become narrower than the true distributions, which explains the decreased correlations.

Fig. 7. Estimating three classes [CSF GM WM] on BrainWeb data with 5% noise and (a) 20% or (b) 40% nonuniformity. Ground truth distributions are shown using dotted lines.

The relaxation step led to a significant improvement in estimation performance; correlation increased from 0.52 to 0.88 for the IBSR data set and from 0.71 to 0.94 for the CNL data set, and the proportion error decreased from more than 0.2 to 0.06 or less (Table II). The correction of the covariance matrix for histogram noise had no discernible effect on the estimation performance.

B. Estimating Distribution of Activated Voxels From Simulated Functional MR Data

Activated regions in functional MRI experiments are typically detected using significance threshold testing, where voxels are declared activated on the basis of their unlikely intensity under the null hypothesis of no activation [45]. This allows controlling for Type I error but not for Type II error, since only the null distribution is modeled. Here by Type I error we mean false positives (declaring a nonactivated voxel activated), and by Type II error we mean false negatives (declaring an activated voxel nonactivated). A more appropriate threshold could be defined if the distribution of activated voxels were known [46], but the small size of the activated class makes the estimation difficult.
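Once both class pmfs are estimated, as our algorithm provides, the misclassification-minimizing threshold used in the experiment below can be computed directly. A small sketch of that computation (ours; the function and argument names are hypothetical):

```python
import numpy as np

def optimal_threshold(f_null, f_act, w_null, w_act):
    """Bin index minimizing Type I + Type II error, given the null and
    activated class pmfs (over the same bins) and the class proportions."""
    n = len(f_null)
    # Type I: nonactivated voxels at or above t; Type II: activated below t.
    errors = [w_null * f_null[t:].sum() + w_act * f_act[:t].sum()
              for t in range(n + 1)]
    return int(np.argmin(errors))
```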
To simulate functional MRI data we created a set of synthetic images of 200 x 200 x 200 resolution, where activated regions were modeled as uniform intensity cubes of size 3 x 3 x 3 voxels on a uniform background. The images were corrupted by Gaussian noise, thus creating two Gaussian distributions, one for the nonactivated class and the other for the activated class. We then varied the difference between the means of the two distributions and the proportion of the activated (smaller) class to obtain different samples for our experiments. The signal-to-noise ratio (SNR), defined as the ratio of the difference between the means to the standard deviation, was varied from 2 to 6. The proportion of the activated class was varied from 0.1% to 16%.
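A sketch of this data generation under our reading (the random cube placement and the activated intensity level are illustrative assumptions; overlapping cubes make the achieved proportion approximate):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_volume(dim=200, cube=3, proportion=0.01, snr=4.0, sigma=1.0):
    """Uniform background with 3x3x3 activated cubes plus Gaussian noise.

    SNR is the mean difference between the two classes divided by the
    noise standard deviation.
    """
    vol = np.zeros((dim, dim, dim))
    n_cubes = int(proportion * dim**3 / cube**3)   # target class proportion
    for _ in range(n_cubes):
        x, y, z = rng.integers(0, dim - cube, size=3)
        vol[x:x+cube, y:y+cube, z:z+cube] = snr * sigma   # activated intensity
    return vol + rng.normal(0.0, sigma, vol.shape)
```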
Instead of providing correlations and proportion errors as in Section VI-A, we used the estimated distributions to directly determine the threshold that minimizes the misclassification error (the sum of the Type I and Type II errors). We then recorded the percentage increase in misclassification error when using the estimated threshold versus the optimal one, derived using knowledge of the true distributions. For reference, we compared the results with those obtained using our own implementation of the standard EM algorithm. To give it the best possible performance, the EM algorithm was initialized with the true parameter values, i.e., the means, variances, and proportions derived from the true distributions.

The results are summarized in Fig. 8. The performance of our algorithm was the same throughout the chosen range of SNRs, and practically coincided with the optimal performance for proportions of 0.68%–1% and larger. There was a significant drop in performance below this 0.68%–1% critical failure point. Performance of the EM-based estimation was significantly worse than ours at the lower SNRs and was comparable to that of our approach at the highest SNR. However, considering that the EM algorithm was initialized with the true parameter values, and real-life performance with random initialization is expected to be worse, our approach clearly offers a superior alternative to the EM algorithm in this application.

Unlike in the case of structural MRI data, the relaxation of the number of principal components did not have any discernible effect on performance in this experiment. On the other hand, the histogram error correction had a significant impact: its omission made our algorithm perform worse than the EM algorithm (Fig. 8). Using larger subvolumes also decreased the performance, with the critical failure proportion moving from 0.68%–1% to 1.35% for 10 x 10 x 10 subvolumes and to 6% for 20 x 20 x 20 subvolumes.

Fig. 8. Percentage increase in misclassification error (performance drop) as a function of the smaller class proportion for (a) SNR = 2, (b) SNR = 4, (c) SNR = 6, for our approach (with and without error correction) versus the EM algorithm.

VII. DISCUSSION

We presented a novel nonparametric, noniterative approach to estimating underlying tissue probability distributions from 3D image volumes. The main idea is to treat estimation as an instance of a blind source separation problem, constraining the underlying components to be non-negative, sum to one, and be sufficiently separated from each other.

It is important to highlight the differences between the proposed approach and related techniques, such as ICA and non-negative matrix factorization (NMF) [47]. All three approaches seek a linear decomposition of the data matrix as $H = WF$, where the rows of $F$ contain the basis vectors and the rows of $W$ the mixing weights. The proposed approach is similar to ICA in using a two-stage decomposition, $H = \hat{A}\hat{P} + E$ (PCA) followed by $F = B\hat{P}$. Another similarity between ICA and the proposed approach is that the rows of $B$ are estimated one-by-one, by optimizing a certain objective function. The main novelty of our approach, compared to ICA, is the choice of this objective function, designed to minimize or maximize the means of the components and to minimize their overlap with previously estimated components, rather than maximizing some measure of non-Gaussianity. Our estimated components are also constrained to be non-negative and sum to 1. This results in a completely different optimization procedure, iterative and locally optimal for ICA, and noniterative and globally optimal for our approach.

NMF differs from ICA and the proposed approach in that the matrices $W$ and $F$ are estimated simultaneously, rather than row by row. Our approach has constraints similar to those of NMF: matrices $W$ and $F$ in NMF are required to be non-negative, and basis sparsity can be encouraged [48]. Nevertheless, the optimization procedure used in NMF is iterative and only locally optimal, and testing its suitability for the current application is left as future work.
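For readers who wish to attempt that test, the following is a minimal sketch using scikit-learn's NMF as an assumed stand-in (the paper itself does not evaluate NMF; whether its local optima land on meaningful tissue pmfs is exactly the open question):

```python
import numpy as np
from sklearn.decomposition import NMF

def nmf_components(H, k):
    """Factor the histogram matrix H ~ W F with non-negative W and F, then
    rescale each row of F to sum to 1 so it can be read as a pmf."""
    model = NMF(n_components=k, init="nndsvda", max_iter=1000, random_state=0)
    W = model.fit_transform(H)          # (m, k) mixing weights
    F = model.components_               # (k, n) non-negative basis rows
    scale = F.sum(axis=1, keepdims=True)
    return W * scale.T, F / scale       # rows of F now sum to 1
```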
Note that our approach does not use any regularization or smoothness constraints on the shape of the estimated tissue distributions. The resultant smoothness is due to enforcing the distributions to be linear combinations of the main principal components. These components are expected to be smooth due to the denoising effect of PCA: the noise spreads across many minor PCA components and is removed when these components are discarded.

The chosen experiments aimed to highlight the capabilities of our algorithm and its advantages over standard approaches based on the EM algorithm. In the first experiment (estimating brain tissue intensity probability distributions from structural MRI data) our algorithm outperformed SPM even though the latter relied on prior tissue probability maps and intermediate tissue classification. The underlying distributions were not Gaussian due to the presence of partial volume voxels, intensity nonuniformity, and natural tissue variability, and these non-Gaussian shapes were better captured by our method. Since our method does not require any prior information, this suggests its possible deployment in such applications as developmental studies on humans, segmentation of abdominal adipose tissue, tumor segmentation, functional MRI, etc.

In the second experiment (simulated functional MRI data) the goal was to test the robustness of our approach in a scenario where one class (activated voxels) is substantially smaller than the other. Even when initialized with the true parameter values, the EM algorithm performed worse than our method for low SNRs. This can be explained by the small peak of the activated voxels distribution being completely merged with the tail of the null distribution. This problem does not affect our method, because of the use of subvolume histograms rather than the whole image histogram. In subvolumes containing activated voxels, the numbers of activated and nonactivated voxels will be similar to each other, preventing the tail of the null distribution from overpowering the peak of activated voxels. For example, when a 5 x 5 x 5 subvolume fully covers a 3 x 3 x 3 activated region, the subvolume will contain 27 activated voxels and 98 nonactivated voxels.

The chosen experiments also highlighted the advantages of the histogram noise correction (Section IV) and of the relaxation of the number of PCA components (Section V). The latter was essential for structural MRI data, but not for functional MRI data, possibly due to the absence of the intensity nonuniformity artifact. Note that raw fMRI volumes do exhibit intensity nonuniformity, but the classification is applied to statistical parametric maps, where voxel intensities correspond to the likelihood of activation. Such maps are generated based on linear regression of the task time course from the voxel time courses and hence do not contain intensity nonuniformity. Histogram noise correction was beneficial for fMRI data only. This is likely due to the small size of the activated regions, which makes the proportion of their distribution so small that it becomes comparable to the histogram noise.

Increasing the subvolume size did not affect the performance for structural MRI data and reduced it for simulated functional MRI data. This is expected, considering that the estimated principal components can be thought of as weighted averages of the subvolume histograms. Increasing the subvolume size reduces the noise variance of each measurement (subvolume histogram) but also reduces the number of measurements (the number of subvolumes) by exactly the same factor, and the two changes cancel each other. The decrease in performance for functional MRI data was due to the increased difficulty of detecting a small peak inside the tail of the null distribution. For example, a 10 x 10 x 10 subvolume fully covering a 3 x 3 x 3 activated region will contain only 27 activated voxels and 973 nonactivated voxels, approximately 50 of which will be located in the tail of the null distribution (beyond two standard deviations).

While the presence of intensity nonuniformity does not lead to failure of our algorithm, thanks to the proposed relaxation of the number of PCA components, it has an adverse effect on performance by causing the estimated distributions with minimum and maximum means to become narrower than otherwise expected. This can be explained by nonuniformity causing two (or possibly more) principal components to represent a single tissue distribution. These components can be combined to yield distributions that are narrower than the true distribution and are aligned with its left or right border. Hence, maximizing the mean will yield a narrower distribution aligned with the right border of the true distribution, while minimizing the mean will lead to alignment with the left border, as shown in Fig. 7. Real-life applications, such as brain morphometry analysis, employ intensity nonuniformity correction as a mandatory first step in the data processing pipeline. The remaining (post-correction) nonuniformity is likely to be much smaller than the 20%–40% used in our testing and hence is expected to have a minimal effect on performance.

REFERENCES

[1] T. Lei and W. Sewchand, "Statistical approach to X-ray CT imaging and its applications in image analysis," IEEE Trans. Med. Imag., vol. 11, no. 1, pp. 62–69, Mar. 1992.
[2] F. O'Sullivan, "Imaging radiotracer model parameters in PET: A mixture analysis approach," IEEE Trans. Med. Imag., vol. 12, no. 3, pp. 399–412, Sep. 1993.
[3] A. Lundervold and G. Storvik, "Segmentation of brain parenchyma and cerebrospinal fluid in multispectral magnetic resonance images," IEEE Trans. Med. Imag., vol. 14, no. 2, pp. 339–349, Jun. 1995.
[4] W. M. Wells, W. L. Grimson, R. Kikinis, and F. A. Jolesz, "Adaptive segmentation of MRI data," IEEE Trans. Med. Imag., vol. 15, no. 4, pp. 429–442, Aug. 1996.
[5] J. Ashburner and K. Friston, "Multimodal image coregistration and partitioning—A unified framework," Neuroimage, vol. 6, no. 3, pp. 209–217, 1997.
[6] S. Ruan, C. Jaggi, J. Xue, J. Fadili, and D. Bloyet, "Brain tissue classification of magnetic resonance images using partial volume modeling," IEEE Trans. Med. Imag., vol. 19, no. 12, pp. 1179–1187, Dec. 2000.
[7] Y. Zhang, M. Brady, and S. Smith, "Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm," IEEE Trans. Med. Imag., vol. 20, no. 1, pp. 45–57, Jan. 2001.
[8] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, "A unifying framework for partial volume segmentation of brain MR images," IEEE Trans. Med. Imag., vol. 22, no. 1, pp. 105–119, Jan. 2003.
[9] H. Park, P. H. Bland, and C. R. Meyer, "Construction of an abdominal probabilistic atlas and its application in segmentation," IEEE Trans. Med. Imag., vol. 22, no. 4, pp. 483–492, Apr. 2003.
[10] G. Dugas-Phocion, M. A. G. Ballester, G. Malandain, C. Lebrun, and N. Ayache, "Improved EM-based tissue segmentation and partial volume effect quantification in multi-sequence brain MRI," in Int. Conf. Med. Image Computing Computer Assisted Intervent. (MICCAI), Saint-Malo, France, 2004, vol. 3216, pp. 26–33.
[11] M. H. Cardinal, J. Meunier, G. Soulez, R. L. Maurice, E. Therasse, and G. Cloutier, "Intravascular ultrasound image segmentation: A three-dimensional fast-marching method based on gray level distributions," IEEE Trans. Med. Imag., vol. 25, no. 5, pp. 590–601, May 2006.
[12] F. G. Meyer and X. Shen, "Classification of fMRI time series in a low-dimensional subspace with a spatial prior," IEEE Trans. Med. Imag., vol. 27, no. 1, pp. 87–98, Jan. 2008.
[13] M. Figueiredo and A. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 381–396, Mar. 2002.
[14] J. Tohka, E. Krestyannikov, I. D. Dinov, A. M. Graham, D. W. Shattuck, U. Ruotsalainen, and A. W. Toga, "Genetic algorithms for finite mixture model based voxel classification in neuroimaging," IEEE Trans. Med. Imag., vol. 26, no. 5, pp. 696–711, May 2007.
[15] P. Schroeter, J. M. Vesin, T. Langenberger, and R. Meuli, "Robust parameter estimation of intensity distributions for brain magnetic resonance images," IEEE Trans. Med. Imag., vol. 17, no. 2, pp. 172–186, Apr. 1998.
[16] J. Ashburner and K. J. Friston, "Unified segmentation," Neuroimage, vol. 26, no. 3, pp. 839–851, 2005.
[17] Z. Liang, J. R. Macfall, and D. P. Harrington, "Parameter estimation and tissue segmentation from multispectral MR images," IEEE Trans. Med. Imag., vol. 13, no. 3, pp. 441–449, Sep. 1994.
[18] C. A. Bouman and M. Shapiro, "A multiscale random field model for Bayesian image segmentation," IEEE Trans. Image Process., vol. 3, no. 2, pp. 162–177, Mar. 1994.
[19] H. Greenspan, A. Ruf, and J. Goldberger, "Constrained Gaussian mixture model framework for automatic segmentation of MR brain images," IEEE Trans. Med. Imag., vol. 25, no. 9, pp. 1233–1245, Sep. 2006.
[20] B. Fischl, D. H. Salat, E. Busa, M. Albert, M. Dieterich, C. Haselgrove, A. van der Kouwe, R. Killiany, D. Kennedy, S. Klaveness, A. Montillo, N. Makris, B. Rosen, and A. M. Dale, "Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain," Neuron, vol. 33, no. 3, pp. 341–355, 2002.
[21] Y. Zhang, M. Brady, and S. Smith, "Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm," IEEE Trans. Med. Imag., vol. 20, no. 1, pp. 45–57, Jan. 2001.
[22] S. M. Smith, M. Jenkinson, M. W. Woolrich, C. F. Beckmann, T. E. Behrens, H. Johansen-Berg, P. R. Bannister, M. D. Luca, I. Drobnjak, and D. E. Flitney, "Advances in functional and structural MR image analysis and implementation as FSL," NeuroImage, vol. 23, pp. S208–S219, 2004.
[23] H. Gudbjartsson and S. Patz, "The Rician distribution of noisy MRI data," Magn. Reson. Med., vol. 34, no. 6, pp. 910–914, 1995.
[24] P. Santago and H. D. Gage, "Quantification of MR brain images by mixture density and partial volume modeling," IEEE Trans. Med. Imag., vol. 12, no. 3, pp. 566–574, Sep. 1993.
[25] D. W. Shattuck, S. R. Sandor-Leahy, K. A. Schaper, D. A. Rottenberg, and R. M. Leahy, "Magnetic resonance image tissue classification using a partial volume model," Neuroimage, vol. 13, no. 5, pp. 856–876, 2001.
[26] D. H. Laidlaw, K. W. Fleischer, and A. H. Barr, "Partial-volume Bayesian classification of material mixtures in MR volume data using voxel histograms," IEEE Trans. Med. Imag., vol. 17, no. 1, pp. 74–86, Feb. 1998.
[27] J. Ashburner and K. J. Friston, "Voxel-based morphometry—The methods," Neuroimage, vol. 11, no. 6, pt. 1, pp. 805–821, 2000.
[28] P. Santago and H. D. Gage, "Statistical models of partial volume effect," IEEE Trans. Image Process., vol. 4, no. 11, pp. 1531–1540, Nov. 1995.
[29] M. B. Cuadra, L. Cammoun, T. Butz, O. Cuisenaire, and J.-P. Thiran, "Comparison and validation of tissue modelization and statistical classification methods in T1-weighted MR brain images," IEEE Trans. Med. Imag., vol. 24, no. 12, pp. 1548–1565, Dec. 2005.
[30] S. Sanjay-Gopal and T. J. Hebert, "Bayesian pixel classification using spatially variant finite mixtures and the generalized EM algorithm," IEEE Trans. Image Process., vol. 7, no. 7, pp. 1014–1028, Jul. 1998.
[31] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, "Automated model-based tissue classification of MR images of the brain," IEEE Trans. Med. Imag., vol. 18, no. 10, pp. 897–908, Oct. 1999.
[32] J. L. Marroquin, B. C. Vemuri, S. Botello, F. Calderon, and A. Fernandez-Bouzas, "An accurate and efficient Bayesian method for automatic segmentation of brain MRI," IEEE Trans. Med. Imag., vol. 21, no. 8, pp. 934–945, Aug. 2002.
[33] K. M. Pohl, W. M. Wells, A. Guimond, K. Kasai, M. E. Shenton, R. Kikinis, W. E. L. Grimson, and S. K. Warfield, "Incorporating non-rigid registration into expectation maximization algorithm to segment MR images," in Int. Conf. Med. Image Computing Computer Assist. Intervent. (MICCAI), 2002.
[34] M. Prastawa, "Automatic brain tumor segmentation by subject specific modification of atlas priors," Acad. Radiol., vol. 10, no. 12, pp. 1341–1348, 2003.
[35] J. Nuyts, P. Dupont, S. Stroobants, A. Maes, L. Mortelmans, and P. Suetens, "Evaluation of maximum-likelihood based attenuation correction in positron emission tomography," IEEE Trans. Nucl. Sci., vol. 46, no. 4, p. 1136, Aug. 1999.
[36] A. Ciptadi, C. Chen, and V. Zagorodnov, "Component analysis approach to estimation of tissue intensity distributions of 3D images," in IEEE Int. Conf. Comput. Vis. (ICCV), Kyoto, Japan, 2009, p. 1765.
[37] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, "A blind source separation technique using second-order statistics," IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, Feb. 1997.
[38] J.-F. Cardoso, "Infomax and maximum likelihood for blind source separation," IEEE Signal Process. Lett., vol. 4, no. 4, pp. 112–114, Apr. 1997.
[39] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley-Interscience, 2001.
[40] R. Nagarajan and C. A. Peterson, "Identifying spots in microarray images," IEEE Trans. Nanobiosci., vol. 1, no. 2, pp. 78–84, Jun. 2002.
[41] D. Lawley and A. Maxwell, "Factor analysis as a statistical method," Statistician, vol. 12, no. 3, pp. 209–229, 1962.
[42] H. Attias, "Independent factor analysis," Neural Computation, vol. 11, no. 4, 1999.
[43] A. M. Dale, B. Fischl, and M. I. Sereno, "Cortical surface-based analysis: I. Segmentation and surface reconstruction," NeuroImage, vol. 9, no. 2, pp. 179–194, 1999.
[44] S. Sadananthan, W. Zheng, M. Chee, and V. Zagorodnov, "Skull stripping using graph cuts," NeuroImage, vol. 49, no. 1, pp. 225–239, 2010.
[45] B. R. Logan and D. B. Rowe, "An evaluation of thresholding techniques in fMRI analysis," Neuroimage, vol. 22, no. 1, pp. 95–108, 2004.
[46] N. V. Hartvig and J. L. Jensen, "Spatial mixture modeling of fMRI data," Hum. Brain Mapp., vol. 11, no. 4, pp. 233–248, 2000.
[47] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[48] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," J. Mach. Learn. Res., vol. 5, pp. 1457–1469, 2004.