IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011
Component Analysis Approach to Estimation of
Tissue Intensity Distributions of 3D Images
Vitali Zagorodnov*, Member, IEEE, and Arridhana Ciptadi, Student Member, IEEE
Abstract—Many segmentation algorithms in medical imaging rely on accurate modeling and estimation of tissue intensity probability density functions. Gaussian mixture modeling, currently the most common approach, has several drawbacks, such as reliance on a Gaussian model and the iterative local optimization used to estimate the model parameters. It also does not take advantage of the substantially larger amount of data provided by 3D acquisitions, which are becoming standard in clinical environments. We propose a novel and completely nonparametric algorithm to estimate the tissue intensity probabilities in 3D images. Instead of relying on the traditional framework of iterating between classification and estimation, we pose the problem as an instance of a blind source separation problem, where the unknown distributions are treated as sources and histograms of image subvolumes as mixtures. The new approach performed well on synthetic data and real magnetic resonance imaging (MRI) scans of the brain, robustly capturing intensity distributions of even small image structures and partial volume voxels.
Index Terms—Blind source separation, Gaussian mixtures,
image segmentation, magnetic resonance imaging (MRI), tissue
intensity distributions.
I. INTRODUCTION
MANY segmentation algorithms in medical imaging rely
on accurate modeling and estimation of tissue intensity
probability density functions (pdfs) [1]–[12], usually in the context of statistical region-based segmentation. Commonly, tissue
intensity probabilities are modeled using the finite mixture (FM)
model [2], [13], [14], and its special case the finite Gaussian
mixture (FGM) model [15], [16]. In these models the intensity
pdf of each tissue class is represented by a parametric (e.g.,
Gaussian in the case of FGM) function called the component
density, while the intensity pdf of the whole image is modeled by
a weighted sum of the tissue component densities. The fitting is
usually done using the Expectation Maximization (EM) algorithm [1], [3], [6], [9]–[11], [17]–[19], which iterates between classification and parameter estimation until a stable state is reached.
The FM models combined with the EM algorithm have been incorporated into many existing image segmentation pipelines, usually in the context of brain tissue segmentation (FreeSurfer [20], SIENAX [21], part of FSL [22], SPM [16]).

Manuscript received October 19, 2010; accepted November 29, 2010. Date of publication December 17, 2010; date of current version March 02, 2011. This work was supported by SBIC C-012/2006 grant provided by A*STAR, Singapore (Agency for Science, Technology and Research). Asterisk indicates corresponding author.
*V. Zagorodnov is with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore.
A. Ciptadi is with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore.
Digital Object Identifier 10.1109/TMI.2010.2098417
The main drawback of FGM models is that the tissue intensity distributions do not always have a Gaussian form. The noise in magnetic resonance (MR) images is known to be Rician rather than Gaussian [23]. But the largest deviation from Gaussianity is due to the presence of partial volume (PV) voxels [6], [8], [24]–[27]. One way to model such voxels is by representing their distribution using a uniform mixture of “pure” distributions [8], [25], [28]. However, this complicates the optimization, as the functional involved becomes considerably more nonlinear. The uniform mixture assumption has also been questioned [27]. To simplify the problem, several researchers suggested modeling partial volume voxel intensity distributions as Gaussian, which appears to be a good approximation for sufficiently noisy images [6]. This approximation has been incorporated into the most recent versions of SPM [16]. However, a more recent study [29] revealed that the model of [28] can still achieve a better fit.
Another issue associated with the EM framework is potential convergence to a poor local optimum, which means a sufficiently close parameter initialization is usually required [4],
[13], especially for distribution means [29]. The convergence of
the EM algorithm to a more meaningful optimum can be improved by including prior information in the classification step,
such as pixel correlations [30], MRF priors [7], [25], [31], [32], or a probabilistic atlas [5], [9], [31]–[33]. Even though this approach is helpful, especially for very noisy images [7], it can also introduce bias in the estimation [30]. And while a probabilistic atlas is useful when segmenting brain tissues [5] or abdominal organs [9], its construction is not always possible, as is the case in the segmentation of brain lesions [34], tumor detection [35], or localization of fMRI activations.
Finally, the FM-EM approach often fails to take advantage of the substantially larger amount of data present in 3D images, such as those obtained by magnetic resonance (MR) and X-ray computed tomography (CT) scanning techniques. We propose a novel nonparametric algorithm to estimate tissue intensity probabilities in 3D images that completely departs from the traditional classification-estimation framework. To illustrate the main idea behind our approach, consider the following introductory example.
Shown in Fig. 1 are the histograms of a 3D T1-weighted MR image and two of its 2D slices. The observed variability in the shape of the 2D histograms is due to varying tissue proportions across the slices. For example, the slice in Fig. 1(b) contains a relatively small amount of cerebro-spinal fluid (CSF), and hence the lowest intensity peak (see Fig. 1(c) for reference) is practically missing from its histogram.
Fig. 1. Histograms of a 3D brain image and several of its slices. (a) 3D image and its histogram. (b) Transverse slice 128 and its histogram. (c) Transverse slice 152 and its histogram.
The proportion of CSF is increased in the slice shown in Fig. 1(c) due to the presence of ventricles, leading to the reappearance of the lowest intensity peak. This slice, however, contains only a small amount of gray matter (GM), thus lowering the middle peak of the histogram. While this variability can potentially provide useful information for mixture estimation, it is traditionally discarded by performing estimation directly on the histogram of the whole volume.
The proposed approach treats the histograms of 2D slices (or any other subvolumes) as mixture realizations of the component densities. This allows stating the unmixing problem in a blind source separation (BSS) framework [37], [38]. To solve the problem we use a framework similar to that of independent component analysis (ICA) [39], but without relying on the independence assumption. Instead we use the fact that the underlying components must be valid probability distributions with different means, which results in a simple convex linear optimization problem that guarantees convergence to a global optimum (ICA's iterative procedure, in comparison, converges only to a local optimum). Our approach provides a promising alternative for estimating component densities in 3D images that is more accurate than state-of-the-art approaches.
We note that a preliminary version of this algorithm was previously reported in a conference paper [36]. The present paper provides a complete description of the algorithm, introduces a covariance matrix correction to improve its performance, and presents a more complete set of validation experiments, including evaluation of the effects of intensity nonuniformity and of individual algorithm components/settings (subvolume size, relaxation of the number of principal components, correction of the covariance matrix).
II. PROBLEM STATEMENT

Let $Y$ be a 3D image volume partitioned into a set of $N$ subvolumes $Y_1, \ldots, Y_N$. We assume the voxel intensities of $Y$ can take $K$ distinct values and are drawn from $M$ unknown probability mass functions (pmf) $f_1, \ldots, f_M$. For example, a brain volume can be assumed to have three main tissues, white matter (WM), gray matter (GM), and cerebro-spinal fluid (CSF), so $M = 3$. For an 8-bit acquisition, $K = 256$. Subvolumes can be chosen arbitrarily, for example as coronal, sagittal, or transverse slices of the 3D volume.

Let $h_i$ be the $K$-bin histogram of $Y_i$, normalized to sum to 1. Then

$$h_i = \sum_{j=1}^{M} c_{ij} f_j + n_i \qquad (1)$$

where $c_{ij}$ is the $j$th tissue proportion in the $i$th subvolume, $\sum_{j=1}^{M} c_{ij} = 1$, and $n_i$ is the noise term that reflects the difference between the actual probability distribution and its finite approximation by a histogram.

Let $H$ be the $N \times K$ matrix whose $i$th row is $h_i$, $F$ the $M \times K$ matrix whose $j$th row is $f_j$, and $C = \{c_{ij}\}$. Rewriting (1) in matrix form yields

$$H = CF + E \qquad (2)$$

where $E$ is the matrix of noise terms $n_i$. This is identical to a blind source separation (BSS) formulation with subvolume histograms as mixtures and unknown tissue pmf's as sources. Our goal is to estimate $f_1, \ldots, f_M$ as well as their mixing weights $C$ given $H$.

Our solution requires several assumptions, most of which are general to a BSS problem: $M \le N$, $M \le K$, and sufficient variability of the mixing proportions $c_{ij}$. These can be easily satisfied with a proper choice of partitioning and histogram dimensionality. We also assume that the distributions $f_1, \ldots, f_M$ have different means and are sufficiently separated from each other, where the meaning of sufficient separation is detailed in Section III. These assumptions are not very restrictive and are generally satisfied for medical images [15], [40].
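To make the construction of (1) and (2) concrete, the sketch below builds the mixture matrix $H$ from the slices of a volume. It is a minimal NumPy illustration under our own assumptions (integer voxel intensities in $[0, K-1]$; the function name and interface are not from the paper's implementation).

```python
import numpy as np

def subvolume_histograms(volume, K=256, axis=0):
    """Build the N x K matrix H of (2): one normalized K-bin
    histogram h_i per slice of `volume` along `axis`."""
    slices = np.moveaxis(volume, axis, 0)
    H = np.stack([np.bincount(s.ravel(), minlength=K)[:K] for s in slices])
    return H / H.sum(axis=1, keepdims=True)  # each row h_i sums to 1
```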
III. PROPOSED SOLUTION
BSS problem has been studied extensively in recent years,
with several solutions proposed for selected special cases, e.g.,
factor analysis (FA) [41] for Gaussian sources and independent
factor analysis (IFA) [42] or independent component analysis
(ICA) [39] for independent non-Gaussian sources. The general
framework of performing decomposition (2), as exemplified by
ICA, can be briefly summarized as follows.
1) Center the matrix $H$ so that all columns sum to zero, by subtracting the row vector $\bar h = \frac{1}{N}\sum_{i=1}^{N} h_i$ from all rows of $H$. Let $\tilde H$ designate the centered version of $H$.
2) Apply principal component analysis (PCA) to $\tilde H$. Practically, this can be done by performing the SVD decomposition $\tilde H = USV^T$ and setting $W = US$ and $P = V^T$, so that $\tilde H = WP$. The rows of matrix $P$ contain a set of orthogonal principal components and matrix $W$ contains the weights.
3) Decomposition of the original (noncentered) matrix $H$ can be done by changing $\tilde H = WP$ to $H = WP + e\bar h$, where $e$ is a column vector where all elements are equal to 1.
4) Truncate $W$ and $P$ to preserve a smaller (specified) number of PCA components corresponding to the largest eigenvalues of $\tilde H^T \tilde H$. The square roots of these eigenvalues are located on the diagonal of $S$. The resultant decomposition has the form $H = \hat W \hat P + e\bar h + R$, where the "hat" symbol refers to truncation, and the appearance of the residual term $R$ is due to the discarding of the components.
5) Further decompose $\hat P$ as $\hat P = AF$, so that $H = \hat W A F + e\bar h + R$. In practice, this step is performed in the opposite direction, as

$$F = B\hat P \qquad (3)$$

where the rows of $B$ are estimated one-by-one by maximizing some measure of non-Gaussianity of the resultant components (rows of matrix $F$).

Note that this approach cannot be applied directly to our problem, because the rows of $F$ (step 5) are not constrained to represent valid probability distribution functions (non-negative and sum to 1). Furthermore, maximizing non-Gaussianity might not result in a meaningful solution, as our source components are neither Gaussian nor independent. However, the ICA framework can be adapted by preserving steps 1–4 and modifying step 5 as follows.

Let $\hat p_1, \ldots, \hat p_L$ be the set of preserved PCA components (we include the mean histogram $\bar h$ among them, per step 3, so that the equality constraint below can be satisfied); then any candidate component has the form $g = \sum_{k=1}^{L} b_k \hat p_k$. Let $e$ be a column vector where all elements are equal to 1. Since the component densities represent valid pmf's and hence are non-negative and sum to 1, this imposes equality and inequality constraints on the elements of $b$:

$$\sum_{k=1}^{L} b_k (\hat p_k e) = 1 \qquad (4)$$

and

$$\sum_{k=1}^{L} b_k \hat p_k(i) \ge 0, \quad i = 1, \ldots, K. \qquad (5)$$

These constraints can be combined with the assumptions that component densities have different means and are sufficiently separated from each other to estimate $F$. This is done in the following sections.
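The following sketch illustrates steps 1–4 and the constraint matrices of (4) and (5) in NumPy. It is our illustrative re-implementation, not the paper's code (which used Matlab); the mean histogram is appended to the preserved components so that the sum-to-one constraint (4) remains satisfiable.

```python
import numpy as np

def component_basis(H, L):
    """Steps 1-4: center H, keep the top-L principal directions,
    and append the mean histogram h_bar (step 3). Rows of the
    returned (L+1) x K matrix play the role of the p_hat_k."""
    h_bar = H.mean(axis=0)                                    # step 1
    U, s, Vt = np.linalg.svd(H - h_bar, full_matrices=False)  # step 2
    return np.vstack([Vt[:L], h_bar])                         # steps 3-4

def constraint_matrices(P):
    """Constraints on b for a candidate component g = b @ P:
    equality (4), sum(g) = 1, and inequality (5), g >= 0."""
    A_eq, b_eq = P.sum(axis=1)[None, :], np.array([1.0])
    A_ub, b_ub = -P.T, np.zeros(P.shape[1])  # -g(i) <= 0
    return A_eq, b_eq, A_ub, b_ub
```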
A. Estimating Components With Largest and Smallest Means

Let $f_{\min}$ and $f_{\max}$ be the components with the smallest and largest mean, respectively. These can be estimated by minimizing (for $f_{\min}$) or maximizing (for $f_{\max}$) the mean of a linear combination of $\hat p_k$'s, where the coefficients of the linear combination satisfy constraints (4) and (5). This follows from the following lemma, which proves this statement for $M = 2$, and its corollary, which extends the result to arbitrary $M$.

Lemma 1: Let $m_1 < m_2$ be the means of the underlying probability distributions $f_1$ and $f_2$. Let $g = a_1 f_1 + a_2 f_2$ be an arbitrary linear combination of $f_j$'s that satisfies $a_1 + a_2 = 1$ and $g(i) \ge 0$ for $i = 1, \ldots, K$. Then

$$m_1 - \epsilon_1 \le \mathrm{mean}(g) \le m_2 + \epsilon_2 \qquad (6)$$

where $\epsilon_1$ and $\epsilon_2$ are small non-negative quantities determined by the overlap between $f_1$ and $f_2$. The equalities are achieved when $a_2$ and $a_1$, respectively, take the most negative values allowed by the non-negativity of $g$.

Fig. 2. Explanation of definitions used in Lemma 1. (a) Two original distributions $f_1$ and $f_2$ with means $m_1$ and $m_2$, respectively. (b) Estimated PCA components $p_1$ and $p_2$. (c) Various linear combinations of PCA components.

Proof: Explanations of the terms used in the lemma statement are given in Fig. 2. From $\hat P = AF$ and (3) it follows that $g$ is also a linear combination of $f_j$'s, $g = a_1 f_1 + a_2 f_2$. Then

$$\mathrm{mean}(g) = a_1 m_1 + a_2 m_2 = m_1 + a_2 (m_2 - m_1), \quad a_1 + a_2 = 1. \qquad (7)$$

Let $a_2 = -\delta$ and $a_1 = 1 + \delta$, $\delta \ge 0$. Then $\mathrm{mean}(g) = m_1 - \delta(m_2 - m_1)$, which is minimized by making $\delta$ as large as possible. However, the largest possible $\delta$ is controlled by the non-negativity of $g$, which caps it at some $\delta_{\max}$ and gives $\epsilon_1 = \delta_{\max}(m_2 - m_1)$. This leads to the left side of inequality (6). The right side of inequality (6) can be obtained in a similar fashion, by setting $a_1 = -\delta$ and $a_2 = 1 + \delta$.

The intuition behind Lemma 1 can be derived from Fig. 2(c), which shows several linear combinations of the principal components estimated on the basis of a mixture of two Gaussian distributions with means $m_1$ and $m_2$. Some linear combinations in this example will be rejected because they contain negative values. One combination appears very similar to $f_1$, but will also be rejected because of the negative bump aligned with the peak of $f_2$. Further decreasing the corresponding coefficient will lead to the disappearance of this bump and convergence to $f_1$, the component with minimum mean. Another combination is non-negative, but its mean is slightly less than $m_2$ because of the presence of a small positive bump aligned with the peak of $f_1$. Reducing the corresponding coefficient will remove the bump and increase the mean of this combination, converging to $f_2$, the component with maximum mean.

In practice, the $\epsilon$'s are usually small and can be discarded. For Gaussian components on an unbounded domain, it can be straightforwardly shown that $\epsilon_1 = \epsilon_2 = 0$ and hence $m_1 \le \mathrm{mean}(g) \le m_2$. If we ignore $\epsilon_1$ and $\epsilon_2$, the equalities in (6) are achieved when $g = f_1$ or $g = f_2$. A similar statement can be made in the case of more than two components; see the following corollary.

Corollary 1: Let $m_1 < m_2 < \cdots < m_M$ be the means of the underlying probability distributions $f_1, \ldots, f_M$. Let $g = \sum_{j=1}^{M} a_j f_j$ satisfy $\sum_{j=1}^{M} a_j = 1$ and $g(i) \ge 0$ for $i = 1, \ldots, K$. Then

$$m_1 - \epsilon_1 \le \mathrm{mean}(g) \le m_M + \epsilon_M. \qquad (8)$$

The equalities hold when $g = f_1$ or $g = f_M$.

In practice, the coefficients of the linear combination of $\hat p_k$'s corresponding to the components with the smallest and largest mean can be estimated by solving the following linear programming optimization problems, respectively:

$$\hat b = \arg\min_b \, \mathrm{mean}\Big(\sum_k b_k \hat p_k\Big) \quad \text{subject to (4) and (5)} \qquad (9)$$

$$\hat b = \arg\max_b \, \mathrm{mean}\Big(\sum_k b_k \hat p_k\Big) \quad \text{subject to (4) and (5)}. \qquad (10)$$

The next section discusses estimation of the remaining components, assuming their number is larger than two.
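Problems (9) and (10) are ordinary linear programs. Below is a sketch using scipy.optimize.linprog (an assumed port of the approach; the paper's implementation used Matlab's linprog), reusing the basis and constraints from the previous sketch:

```python
import numpy as np
from scipy.optimize import linprog

def extreme_mean_component(P, maximize=False):
    """Solve (9), or (10) when maximize=True: min/max the mean of
    g = b @ P subject to sum(g) = 1 and g >= 0 elementwise."""
    K = P.shape[1]
    c = P @ np.arange(K)  # mean(g) = b . (P @ bins), since g sums to 1
    res = linprog(-c if maximize else c,
                  A_ub=-P.T, b_ub=np.zeros(K),               # (5)
                  A_eq=P.sum(axis=1)[None, :], b_eq=[1.0],   # (4)
                  bounds=[(None, None)] * P.shape[0])        # b may be negative
    return res.x @ P  # the estimated component
```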
B. Estimating Remaining Components

In this section we show that all remaining components can be estimated iteratively, one-by-one, by minimizing their inner product (effectively overlap) with the components that have already been estimated, where the optimization is reduced to solving another linear programming problem. The following lemma shows this for $M = 3$, and the result is then extended to arbitrary $M$ (Lemma 3). Here and throughout, $\langle \cdot, \cdot \rangle$ designates the inner product and $\|\cdot\|$ is the $L_2$ norm.

Lemma 2: Let $f_1$ and $f_3$ be the components with the smallest and largest means, estimated using (9) and (10), and let

$$\langle f_2, f_1 \rangle + \langle f_2, f_3 \rangle < \min\big(\|f_1\|^2, \|f_3\|^2\big).$$

Then the coefficients of $f_2 = \sum_k b_k \hat p_k$ are the solution of the following linear programming problem:

$$\hat b = \arg\min_b \big(\langle g, f_1 \rangle + \langle g, f_3 \rangle\big), \quad g = \sum_k b_k \hat p_k, \quad \text{subject to (4) and (5)}. \qquad (11)$$

Proof: The sum of the inner products between $g$, an arbitrary linear combination of $f_j$'s, and $f_1, f_3$ is

$$\phi(a) = \langle g, f_1 \rangle + \langle g, f_3 \rangle = \sum_{j=1}^{3} a_j \big(\langle f_j, f_1 \rangle + \langle f_j, f_3 \rangle\big).$$

Let $s_j = \langle f_j, f_1 \rangle + \langle f_j, f_3 \rangle$. The partial derivatives of this function are $\partial \phi / \partial a_j = s_j$. Since $\phi$ is a linear function of the $a_j$'s, its minimum occurs at $a_{j^*} = 1$, $a_j = 0$ for $j \ne j^*$, where $j^*$ is the coordinate along which $\phi$ has the smallest slope, i.e., $s_{j^*} = \min_j s_j$.

According to the lemma conditions, $s_2 = \langle f_2, f_1 \rangle + \langle f_2, f_3 \rangle < \min(\|f_1\|^2, \|f_3\|^2)$, hence $s_2 < s_1$ and $s_2 < s_3$, because $s_1 \ge \|f_1\|^2$ and $s_3 \ge \|f_3\|^2$ (all inner products between non-negative components are non-negative). Therefore, the minimum value of $\phi$ is achieved when $a_1 = 0$ and $a_3 = 0$ and $a_2 = 1$, i.e., $g = f_2$.

The necessary condition in Lemma 2, $\langle f_2, f_1 \rangle + \langle f_2, f_3 \rangle < \min(\|f_1\|^2, \|f_3\|^2)$, can be interpreted as sufficient separation between the underlying components, since these are non-negative. More specifically, it requires that the sum of overlaps between $f_1$ and $f_2$ and between $f_2$ and $f_3$ is smaller than the norm of $f_1$ or $f_3$. This translates into a minimum signal-to-noise ratio (SNR) of 1.66 in the case of Gaussian components. This requirement can be easily satisfied for most medical images, where typical SNRs usually exceed 3.

In the case of more than three components, starting from the first two estimated components, all other components can be estimated iteratively one-by-one by minimizing their overlap with all previously estimated components, as shown in the following lemma.
Lemma 3: Let $f_1, \ldots, f_M$ be the underlying components, of which the first $r$ are already known. Let

$$\sum_{l=1}^{r} \langle f_u, f_l \rangle < \min_{j \le r} \sum_{l=1}^{r} \langle f_j, f_l \rangle \quad \text{for every unknown component } f_u, \; u > r. \qquad (12)$$

Then $g = \sum_k b_k \hat p_k$, where the $b_k$ are solutions of the following linear programming problem, will coincide with one of the remaining unknown components:

$$\hat b = \arg\min_b \sum_{j=1}^{r} \langle g, f_j \rangle \quad \text{subject to (4) and (5)}. \qquad (13)$$

Proof: Let function $\phi(a) = \sum_{j=1}^{r} \langle g, f_j \rangle$, where $g = \sum_{j=1}^{M} a_j f_j$. Then $\phi(a) = \sum_{j=1}^{M} a_j s_j$ with $s_j = \sum_{l=1}^{r} \langle f_j, f_l \rangle$. As mentioned in Lemma 2, the minimum of $\phi$ occurs when $a_{j^*} = 1$, where $j^*$ is the coordinate along which $\phi$ has the smallest slope. According to (12), the minimum must correspond to one of the unknown components, i.e., $j^* > r$.

Note that one of the terms on the right side of inequality (12) is equal to the norm of a component, while all other terms on both sides correspond to overlaps between components. Hence condition (12) can again be interpreted as sufficient separation between the underlying components.
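Problem (13) keeps the same constraint set and swaps the objective for the summed overlap with the already-estimated components. A sketch under the same assumptions as the previous snippets:

```python
import numpy as np
from scipy.optimize import linprog

def next_component(P, estimated):
    """Solve (13): minimize sum_j <g, f_j> over the r estimated
    components f_j (rows of `estimated`), with g = b @ P subject
    to (4) and (5)."""
    K = P.shape[1]
    c = P @ np.sum(estimated, axis=0)  # sum_j <g, f_j> = b . (P @ sum_j f_j)
    res = linprog(c, A_ub=-P.T, b_ub=np.zeros(K),
                  A_eq=P.sum(axis=1)[None, :], b_eq=[1.0],
                  bounds=[(None, None)] * P.shape[0])
    return res.x @ P
```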
IV. REDUCING THE EFFECT OF NOISE ON ESTIMATION OF PRINCIPAL COMPONENTS

A histogram is an approximation of the underlying probability distribution (or probability mass function, for images). The difference between the two is due to the finite set of data used for histogram evaluation and can be treated as noise. When identically distributed, this approximation noise is unlikely to alter the directions of principal components, a property widely exploited by PCA-based denoising. However, the histogram noise may not be identically distributed, as shown in the comparison between a piecewise uniform probability mass function and its approximation by a histogram, evaluated using 2000 data points (Fig. 3). Note that the approximation noise has larger variance in the range [0.5, 1], where the original probability mass function is larger, and smaller variance in the range [0, 0.5], where the probability mass function is smaller.

Fig. 3. (a) Piecewise constant probability density function and (b) its approximation by a histogram.

The noise statistics can be derived analytically by noticing that histogram amplitudes have binomial distributions. Let $q_i$ be the underlying probability mass function of the $i$th image subvolume and $n$ be the number of its data points (image voxels). The value of the histogram at the $k$th point can be thought of as a sum of $n$ binary random variables, each representing a random draw that has a $q_i(k)$ chance of success, and hence has a binomial distribution with mean $n q_i(k)$ and variance $n q_i(k)(1 - q_i(k))$. The normalized histogram $h_i$ (the histogram scaled to sum to 1) then has mean $q_i(k)$ and variance $q_i(k)(1 - q_i(k))/n$, which means the approximation noise $n_i(k) = h_i(k) - q_i(k)$ has zero mean and variance

$$\sigma_i^2(k) = \frac{q_i(k)(1 - q_i(k))}{n}. \qquad (14)$$

Note that the variance increases almost linearly with $q_i(k)$, consistent with our observation in Fig. 3.

The influence of the histogram noise can be reduced by estimating its contribution to the covariance matrix and subsequently removing it. Let $H = \bar H + E$, where $\bar H$ contains the noise-free histograms and $E$ the approximation noise. Then the covariance matrix can be written as follows:

$$H^T H = \bar H^T \bar H + \bar H^T E + E^T \bar H + E^T E. \qquad (15)$$

The first term corresponds to the covariance matrix that has not been affected by noise. Since $\bar H$ and $E$ are uncorrelated, the second and third terms can be neglected. The last term is the covariance matrix of the histogram noise. Its nondiagonal entries are zero because the noise values in different bins are zero mean and uncorrelated. The diagonal entries can be approximated using (14); when all subvolumes have equal size $n$, this gives

$$(E^T E)_{kk} \approx \sum_{i=1}^{N} \frac{h_i(k)(1 - h_i(k))}{n}. \qquad (16)$$

Subtracting the estimated $E^T E$ from (15) yields the corrected covariance matrix.
V. IMPLEMENTATION

Our algorithm was implemented fully in Matlab, using the built-in functions pcacov and linprog to estimate principal components and perform linear programming optimization. Note that pcacov was used instead of SVD decomposition because it allows manipulation of the covariance matrix (Section IV). The implementation of our framework for component estimation is described in Fig. 4.

Fig. 4. Algorithm implementation.

The algorithm to estimate the 1st and the $M$th components (line 7) is implemented as described in (9) and (10) of Lemma 1. The algorithm to estimate each additional component (line 10) is implemented as described in (13) of Lemma 3. The total running time of the algorithm is on the order of a few seconds for a typical 3D MR image.

During the initial testing on simulated data we discovered that the non-negativity constraint imposed on the estimated components was too strict. The histogram noise and errors in estimating the principal subspace can lead to an infeasible optimization problem or a very narrow search space. To overcome this we relaxed the non-negativity constraint $g(i) \ge 0$ to allow small negative values, with the bound scaled inversely with $K$, the number of histogram bins. This negative bound was small enough not to cause any visible estimation problems in our experiments.

We also found that intensity nonuniformity (smoothly varying intensity across the same tissue) and the presence of several tissues with slightly different intensities, e.g., subcortical structures in brain images, can increase the dimensionality of the original subspace. In this case some of the estimated principal components may represent slight variations in tissue appearance rather than a new tissue, resulting in a large overlap between the estimated components. This poses two problems. First, the appearance of two overlapping components representing the same tissue means that some other tissue will not be represented. Second, it stalls the optimization procedure, which requires sufficient separation between the components.

To overcome these issues we implemented a simple failure detection in line 11, which measures the overlap between the current component and all previously estimated components. The overlap here was defined as the intersection area between components, i.e., the area under the curve of $\min(f_j, f_l)$. If the overlap is greater than 2/3, an empirically set threshold, the counter is reset to zero and the number of principal components is increased by 1 to better capture the original subspace. The new PCA subspace with increased dimensionality will contain a mixture of components corresponding to distinct tissue distributions as well as some components corresponding to their intensity variations.

Note that despite the increased dimensionality of the new PCA subspace, the number of principal components will still be equal to the number of underlying tissue distributions, a required condition for the proofs of the lemmas. In other words, if the intensity variation of a particular tissue gives rise to an additional principal component, this variation will be treated as a separate class of interest, albeit "inferior" to the other classes. The large overlap between these inferior distributions and the main classes will guarantee that the former will not appear in the estimation loop output until all the latter are estimated, at which point the estimation process can be stopped.
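The failure check reduces to an intersection-area test between the current candidate and the components estimated so far. A minimal sketch (the threshold and the overlap definition are from this section; the loop structure is our paraphrase of Fig. 4):

```python
import numpy as np

def overlap(f, g):
    """Intersection area between two estimated components,
    i.e., the area under the curve of min(f, g)."""
    return np.minimum(f, g).sum()

# Paraphrased failure check (line 11 of Fig. 4):
# if any(overlap(g, f) > 2.0 / 3.0 for f in estimated):
#     L += 1          # grow the PCA subspace by one component
#     estimated = []  # reset the counter and restart the loop
```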
VI. EXPERIMENTAL RESULTS
A. Estimating Tissue Intensity Distributions From Structural MRI Data
For performance evaluation we used T1-weighted brain images from two publicly available data sets, BrainWeb (http://www.bic.mni.mcgill.ca/brainweb/) and IBSR (http://www.cma.mgh.harvard.edu/ibsr/), and one data set acquired at our site (CNL). The BrainWeb data set contains realistic synthesized brain volumes with varying degrees of noise and intensity nonuniformity, at 1 × 1 × 1 mm resolution. The IBSR data set contains 18 real MR acquisitions made on a 1.5 T scanner with 1 × 1 × 1.5 mm resolution. The CNL data set contained 1 × 1 × 1 mm brain scans of 14 healthy aging adults (age 55–82), acquired on a Siemens Tim Trio 3 T scanner.
The BrainWeb and IBSR data sets contained ground truth labeling for GM and WM, by construction for BrainWeb and manually segmented by an expert for IBSR. To obtain ground truth labeling for the CNL data set, these images were processed using FreeSurfer 3.04 [43] and the resultant pial and WM surfaces were then edited by an expert and converted to volume masks. In addition, the BrainWeb data set contained ground truth labeling for CSF, which was not available for the IBSR and CNL data sets. To compensate for this we removed all non-brain tissues using GCUT, an in-house developed skull stripping technique [44]. Following this we defined the CSF label as the set of voxels preserved by the skull stripping but not belonging to the GM or WM labels. Note that the CSF region so defined may contain traces of non-brain tissues left behind by the skull stripping procedure.

We further augmented the ground truth to include mix classes of partial volume voxels (GM-WM and CSF-GM). The partial volume voxels were defined as the voxels located near the boundary between two tissues. Practically, these were identified by performing a one-voxel erosion of each tissue with a standard six-neighbor connectivity and selecting the eroded voxels as belonging to the mix classes, as sketched below.
All images were first skull stripped using GCUT [44] and then processed by our algorithm, set to estimate either three (only the main tissues) or five tissue intensity distributions. Shown in Fig. 5 are the results of applying our algorithm to images from the BrainWeb data set. Our algorithm exhibited excellent performance when only the main tissue classes were considered [Fig. 5(left)] and performed slightly worse when the mix classes were included, especially for large noise levels [Fig. 5(right)]. This decrease in the estimation quality was primarily due to the increased error in estimating tissue proportions rather than the distribution shapes.
Fig. 5. Estimating 3 classes [CSF GM WM] (left) and 3 pure classes [CSF GM WM] + 2 mix classes [CSF-GM GM-WM] (right) on BrainWeb data. Ground truth distributions are shown using dotted lines. (a) Noise = 3. (b) Noise = 5. (c) Noise = 7.
Fig. 6. Estimating three classes [CSF GM WM] on IBSR volume 5 (left) and a CNL volume (right). Ground truth distributions are shown using dotted lines. (a) Our algorithm. (b) SPM 8.

TABLE I. CORRELATION AND PROPORTION ERROR BETWEEN ESTIMATED AND TRUE DISTRIBUTIONS, AVERAGED OVER BRAINWEB VOLUMES (NOISE LEVELS 1, 3, 5, AND 7). * INDICATES p < 0.005.
It is remarkable that our algorithm was able to capture the two-peak shape of the CSF-GM mix class distribution, which appears at the lower noise levels [Fig. 5(a) and (b)]. This would not be possible with a Gaussian mixture model, which represents each distribution by a single-peak Gaussian. Our algorithm also compares favorably with several other approaches, which were evaluated on BrainWeb data in [29]—compare Fig. 5(c) here with Fig. 6 in [29]. Our algorithm also performed well on real MRI data from the IBSR and CNL data sets [Fig. 6(a)].
For quantitative performance evaluation we used two measures. One was the average Pearson correlation between the estimated and true distributions of each class; its goal was to capture the quality of distribution shape estimation. The other was the absolute difference between the estimated and true proportions of each class. A summary of the performance results for the BrainWeb and IBSR/CNL data sets is provided in Tables I and II, respectively. Overall, our approach achieved excellent average correlations of 0.88–0.97 and average proportion errors of less than 0.06.
For reference, we compared these results with those obtained by the well-known SPM 8 segmentation software package, which models the image histogram using a mixture of Gaussians and iterates between estimating mixture parameters, tissue classification, intensity nonuniformity, and registration [16]. We ran SPM with the default settings, using 10 Gaussian distributions for GM, 2 for WM, and 6 for CSF + other structures, which were then linearly combined to obtain the distributions of the three main brain tissues. Overall, SPM performed worse compared to our method, achieving correlations of 0.81–0.85 and proportion errors of 0.05–0.13 (Table II).

TABLE II. CORRELATION AND PROPORTION ERROR BETWEEN ESTIMATED AND TRUE DISTRIBUTIONS, AVERAGED OVER 18 IBSR VOLUMES AND 14 CNL VOLUMES. (A) = PC RELAXATION, (B) = HISTOGRAM ERROR CORRECTION. ** INDICATES p < 0.001.
Considering that the CSF class was not as clearly defined as the GM and WM classes, we have also included results restricted to the GM and WM classes in the last two columns of the table. A typical SPM output is shown in Fig. 6(b).

Fig. 7. Estimating three classes [CSF GM WM] on BrainWeb data with 5% noise and (a) 20% or (b) 40% nonuniformity. Ground truth distributions are shown using dotted lines.
We have also evaluated the effects of intensity nonuniformity and of individual algorithm components/settings (subvolume size, relaxation of the number of principal components, correction of the covariance matrix) on the estimation performance. Changing the subvolume size from 3 × 3 × 3 cuboids to 20 × 20 × 20 cuboids to sagittal slices had little effect on performance (Table I). However, the average correlations decreased in the presence of intensity nonuniformity, and the change was statistically significant. As shown in Fig. 7, nonuniformity causes the estimated distributions with minimum and maximum means to become narrower than the true distributions, which explains the decreased correlations. The relaxation step led to significant improvement in estimation performance; correlation increased from 0.52 to 0.88 for the IBSR data set and from 0.71 to 0.94 for the CNL data set, and proportion error decreased from more than 0.2 to 0.06 or less (Table II). The correction of the covariance matrix for histogram noise had no discernible effect on the estimation performance.
B. Estimating Distribution of Activated Voxels From Simulated
Functional MR Data
Activated regions in functional MRI experiments are typically detected using significance threshold testing, where voxels
are declared activated on the basis of their unlikely intensity
under the null hypothesis of no activation [45]. This allows controlling for Type I error but not for Type II error, since only the
null distribution is modeled. Here by Type I error we mean false
positives—declaring a nonactivated pixel activated, and by Type
II error we mean false negatives—declaring an activated pixel
nonactivated. A more appropriate threshold could be defined if
the distribution of activated voxels is known [46], but small size
of the activated class makes the estimation difficult.
To simulate functional MRI data we created a set of synthetic 200 × 200 × 200 images, where activated regions were modeled as uniform-intensity cubes of size 3 × 3 × 3 voxels on a uniform background. The images were corrupted by Gaussian noise, thus creating two Gaussian distributions, one for the nonactivated class and the other for the activated class. We then varied the difference between the means of the two distributions and the proportion of the activated (smaller) class to obtain different samples for our experiments. The signal-to-noise ratio (SNR), defined as the ratio of the difference between the means to the standard deviation, was varied from 2 to 6. The proportion of the activated class was varied from 0.1% to 16%.
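A sketch of this simulation (the defaults and names are ours; overlapping cubes make the achieved proportion approximate):

```python
import numpy as np

def simulate_fmri(shape=(200, 200, 200), cube=3, snr=4.0, prop=0.01, seed=0):
    """Uniform background plus uniform-intensity activated cubes,
    corrupted by unit-variance Gaussian noise; `snr` is the mean
    difference divided by the noise standard deviation."""
    rng = np.random.default_rng(seed)
    vol = np.zeros(shape)
    for _ in range(int(prop * vol.size / cube**3)):
        x, y, z = (int(rng.integers(0, s - cube)) for s in shape)
        vol[x:x+cube, y:y+cube, z:z+cube] = snr  # activated mean = snr * sigma
    return vol + rng.normal(0.0, 1.0, shape)
```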
Instead of providing correlations and proportion errors as in Section VI-A, we used the estimated distributions to directly determine the threshold that minimizes the misclassification error (the sum of Type I and Type II errors). We then recorded the percentage increase in misclassification error when using the estimated threshold versus the optimal one, derived using knowledge of the true distributions. For reference, we compared the results with those obtained using our own implementation of the standard EM algorithm. To achieve the best possible performance, the EM algorithm was initialized with the true parameter values, i.e., means, variances, and proportions derived from the true distributions.
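Given the two estimated pmfs and the activated-class proportion, the minimizing threshold can be read off directly; a small sketch of the evaluation we assume here (voxels above the threshold are declared activated):

```python
import numpy as np

def best_threshold(f_null, f_act, prop_act):
    """Bin index minimizing the total misclassification error
    (Type I + Type II) for the two-class decision."""
    p0, p1 = 1.0 - prop_act, prop_act
    # error(t) = p0 * P(null > t) + p1 * P(activated <= t)
    err = p0 * (f_null.sum() - np.cumsum(f_null)) + p1 * np.cumsum(f_act)
    return int(np.argmin(err))
```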
The results are summarized in Fig. 8. The performance of our algorithm was the same throughout the chosen range of SNRs, and practically coincided with the optimal performance for proportions of 0.68%–1% and larger. There was a significant drop in performance below this 0.68%–1% critical failure point. Performance of the EM-based estimation was significantly worse than that of our approach at SNR = 2–4, and was comparable to ours at SNR = 6. However, considering that the EM algorithm was initialized with the true parameter values, and real-life performance with random initialization is expected to be worse, our approach clearly offers a superior alternative to the EM algorithm in this application.
Unlike in the case of structural MRI data, the relaxation of the number of principal components did not have any discernible effect on the performance in this experiment. On the other hand, the histogram error correction had a significant impact—its omission made our algorithm perform worse than the EM algorithm (Fig. 8). Using larger subvolumes also decreased the performance, with the critical failure proportion moving from 0.68%–1% to 1.35% for 10 × 10 × 10 subvolumes and to 6% for 20 × 20 × 20 subvolumes.
VII. DISCUSSION

We presented a novel nonparametric, noniterative approach to estimating underlying tissue probability distributions from 3D image volumes. The main idea is to treat estimation as an instance of a blind source separation problem, constraining the underlying components to be non-negative, to sum to one, and to be sufficiently separated from each other.

It is important to highlight the differences between the proposed approach and related techniques, such as ICA and nonnegative matrix factorization (NMF) [47]. All three approaches seek a linear decomposition of the data matrix as $H \approx WF$, where the rows of $F$ contain the basis vectors and the rows of $W$ contain the mixing weights. The proposed approach is similar to ICA in using a two-stage decomposition, $\tilde H = WP$ (PCA) followed by $F = B\hat P$, so that $H \approx \hat W A F + e\bar h$. Another similarity between ICA and the proposed approach is that the rows of $F$ are estimated one-by-one, by optimizing a certain objective function. The main novelty of our approach, compared to ICA, is the choice of this objective function, designed to minimize or maximize the means of the components and to minimize their overlap with previously estimated components, rather than maximizing some measure of non-Gaussianity. Our estimated components are also constrained to be non-negative and to sum to 1. This results in a completely different optimization procedure: iterative and locally optimal for ICA, noniterative and globally optimal for our approach.

Fig. 8. Percentage increase in misclassification error (performance drop) as a function of the smaller class proportion for (a) SNR = 2, (b) SNR = 4, (c) SNR = 6, for our approach (with and without error correction) versus the EM algorithm.
NMF differs from ICA and the proposed approach in that the matrices $W$ and $F$ are estimated simultaneously, rather than row by row. Our approach has similar constraints to those of NMF—matrices $W$ and $F$ in NMF are required to be non-negative, and basis sparsity can be encouraged [48]. Nevertheless, the optimization procedure used in NMF is iterative and only locally optimal, and testing its suitability for the current application is left as future work.
Note that our approach does not use any regularization or smoothness constraints on the shape of the estimated tissue distributions. The resultant smoothness is due to enforcing the distributions to be linear combinations of the main principal components. These components are expected to be smooth due to the denoising effect of PCA—the noise spreads across many minor PCA components and is removed when these components are discarded.
The chosen experiments aimed to highlight the capabilities of our algorithm and its advantages over standard approaches based on the EM algorithm. In the first experiment (estimating brain tissue intensity probability distributions from structural MRI data) our algorithm outperformed SPM even though the latter relied on prior tissue probability maps and intermediate tissue classification. The underlying distributions were not Gaussian due to the presence of partial volume voxels, intensity nonuniformity, and natural tissue variability, and these non-Gaussian shapes were better captured by our method. Since our method does not require any prior information, this suggests its possible deployment in such applications as developmental studies on humans, segmentation of abdominal adipose tissue, tumor segmentation, functional MRI, etc.
In the second experiment (simulated functional MRI data) the goal was to test the robustness of our approach in a scenario where one class (activated voxels) is substantially smaller than the other. Even when initialized with the true parameter values, the EM algorithm performed worse than our method for low SNRs. This can be explained by the small peak of the activated voxel distribution being completely merged with the tail of the null distribution. This problem does not affect our method because of the use of subvolume histograms, rather than the whole image histogram. In subvolumes containing activated voxels, the numbers of activated and nonactivated voxels will be similar to each other, preventing the tail of the null distribution from overpowering the peak of activated voxels. For example, when a 5 × 5 × 5 subvolume fully covers a 3 × 3 × 3 activated region, the subvolume will contain 27 activated voxels and 98 nonactivated voxels.
The chosen experiments also highlighted the advantages of histogram noise correction (Section IV) and relaxation of the number of PCA components (Section V). The latter was essential for structural MRI data, but not for functional MRI data, possibly due to the absence of the intensity nonuniformity artifact. Note that raw fMRI volumes do exhibit intensity nonuniformity, but the classification is applied to statistical parametric maps, where voxel intensities correspond to the likelihood of activation. Such maps are generated based on linear regression of the task time course against the voxel time courses and hence do not contain intensity nonuniformity. Histogram noise correction was beneficial for fMRI data only. This is likely due to the small size of activated regions, which makes the proportion of their distribution so small that it becomes comparable to the histogram noise.
Increasing the subvolume size did not affect the performance for structural MRI data and reduced it for simulated functional MRI data. This is expected, considering that the estimated principal components can be thought of as weighted averages of subvolume histograms. Increasing the subvolume size reduces the noise variance of each measurement (subvolume histogram) but also reduces the number of measurements (the number of subvolumes) by exactly the same factor, and the two changes cancel each other. The decrease in performance for functional MRI data was due to the increased difficulty of detecting a small peak inside the tail of the null distribution. For example, a 10 × 10 × 10 subvolume fully covering a 3 × 3 × 3 activated region will contain only 27 activated voxels and 973 nonactivated voxels, approximately 50 of which will be located in the tail of the null distribution (beyond two standard deviations).
While the presence of intensity nonuniformity does not lead to failure of our algorithm, thanks to the proposed relaxation of the number of PCA components, it has an adverse effect on performance by causing the estimated distributions with minimum and maximum means to become narrower than otherwise expected. This can be explained by nonuniformity causing two (or possibly more) principal components to represent a single tissue distribution. These components can be combined to yield distributions that are narrower than the true distribution and are aligned with its left or right border. Hence, maximizing the mean will yield a narrower distribution aligned with the right border of the true distribution, while minimization of the mean will lead to alignment with the left border, as shown in Fig. 7. Real-life applications, such as brain morphometry analysis, employ intensity nonuniformity correction as a mandatory first step in the data processing pipeline. The remaining (after correction) nonuniformity is likely to be much smaller than the 20%–40% used in our testing and hence is expected to have minimal effect on performance.
REFERENCES
[1] T. Lei and W. Sewchand, “Statistical approach to X-ray CT imaging
and its applications in image analysis,” IEEE Trans. Med. Imag., vol.
11, no. 1, pp. 62–69, Mar. 1992.
[2] F. O’Sullivan, “Imaging radiotracer model parameters in PET: A mixture analysis approach,” IEEE Trans. Med. Imag., vol. 12, no. 3, pp.
399–412, Sep. 1993.
[3] A. Lundervold and G. Storvik, “Segmentation of brain parenchyma and
cerebrospinal fluid in multispectral magnetic resonance images,” IEEE
Trans. Med. Imag., vol. 14, no. 2, pp. 339–349, Jun. 1995.
[4] W. M. Wells, W. L. Grimson, R. Kikinis, and F. A. Jolesz, “Adaptive
segmentation of MRI data,” IEEE Trans. Med. Imag., vol. 15, no. 4, pp.
429–442, Aug. 1996.
[5] J. Ashburner and K. Friston, “Multimodal image coregistration and
partitioning—A unified framework,” Neuroimage, vol. 6, no. 3, pp.
209–217, 1997.
[6] S. Ruan, C. Jaggi, J. Xue, J. Fadili, and D. Bloyet, “Brain
tissue classification of magnetic resonance images using partial
volume modeling,” IEEE Trans. Med. Imag., vol. 19, no. 12,
pp. 1179–1187, Dec. 2000.
[7] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm,” IEEE Trans. Med. Imag., vol. 20, no. 1,
pp. 45–57, Jan. 2001.
[8] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “A unifying framework for partial volume segmentation of brain MR images,”
IEEE Trans. Med. Imag., vol. 22, no. 1, pp. 105–119, Jan. 2003.
[9] H. Park, P. H. Bland, and C. R. Meyer, “Construction of an abdominal
probabilistic atlas and its application in segmentation,” IEEE Trans.
Med. Imag., vol. 22, no. 4, pp. 483–492, Apr. 2003.
[10] G. Dugas-Phocion, M. A. G. Ballester, G. Malandain, C. Lebrun,
and N. Ayache, “Improved em-based tissue segmentation and partial
volume effect quantification in multi-sequence brain MRI,” in Int.
Conf. Med. Image Computing Computer Assisted Intervent. (MICCAI),
Saint-Malo, France, 2004, vol. 3216, pp. 26–33.
[11] M. H. Cardinal, J. Meunier, G. Soulez, R. L. Maurice, E. Therasse,
and G. Cloutier, “Intravascular ultrasound image segmentation:
A three-dimensional fast-marching method based on gray level
distributions,” IEEE Trans. Med. Imag., vol. 25, no. 5, pp. 590–601,
May 2006.
[12] F. G. Meyer and X. Shen, “Classification of fMRI time series in a lowdimensional subspace with a spatial prior,” IEEE Trans. Med. Imag.,
vol. 27, no. 1, pp. 87–98, Jan. 2008.
[13] M. Figueiredo and A. Jain, “Unsupervised learning of finite mixture
models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp.
381–396, Mar. 2002.
[14] J. Tohka, E. Krestyannikov, I. D. Dinov, A. M. Graham, D. W. Shattuck,
U. Ruotsalainen, and A. W. Toga, “Genetic algorithms for finite mixture model based voxel classification in neuroimaging,” IEEE Trans.
Med. Imag., vol. 26, no. 5, pp. 696–711, May 2007.
[15] P. Schroeter, J. M. Vesin, T. Langenberger, and R. Meuli, “Robust parameter estimation of intensity distributions for brain magnetic resonance images,” IEEE Trans. Med. Imag., vol. 17, no. 2, pp. 172–186,
Apr. 1998.
[16] J. Ashburner and K. J. Friston, “Unified segmentation,” Neuroimage,
vol. 26, no. 3, pp. 839–851, 2005.
[17] Z. Liang, J. R. Macfall, and D. P. Harrington, “Parameter estimation
and tissue segmentation from multispectral MR images,” IEEE Trans.
Med. Imag., vol. 13, no. 3, pp. 441–449, Sep. 1994.
[18] C. A. Bouman and M. Shapiro, “A multiscale random field model for
Bayesian image segmentation,” IEEE Trans. Image Process., vol. 3, no.
2, pp. 162–177, Mar. 1994.
[19] H. Greenspan, A. Ruf, and J. Goldberger, “Constrained Gaussian mixture model framework for automatic segmentation of MR brain images,” IEEE Trans. Med. Imag., vol. 25, no. 9, pp. 1233–1245, Sep.
2006.
[20] B. Fischl, D. H. Salat, E. Busa, M. Albert, M. Dieterich, C. Haselgrove,
A. van der Kouwe, R. Killiany, D. Kennedy, S. Klaveness, A. Montillo,
N. Makris, B. Rosen, and A. M. Dale, “Whole brain segmentation:
Automated labeling of neuroanatomical structures in the human brain,”
Neuron, vol. 33, no. 3, pp. 341–355, 2002.
[21] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm,” IEEE Trans. Med. Imag., vol. 20, no. 1,
pp. 45–57, Jan. 2001.
[22] S. M. Smith, M. Jenkinson, M. W. Woolrich, C. F. Beckmann, T. E.
Behrens, H. Johansen-Berg, P. R. Bannister, M. D. Luca, I. Drobnjak,
and D. E. Flitney, “Advances in functional and structural MR image
analysis and implementation as FSL,” NeuroImage Math. Brain Imag.,
vol. 23, pp. S208–S219, 2004.
[23] H. Gudbjartsson and S. Patz, “The Rician distribution of noisy MRI
data,” Magn. Reson. Med., vol. 34, no. 6, pp. 910–914, 1995.
[24] P. Santago and H. D. Gage, “Quantification of MR brain images by
mixture density and partial volume modeling,” IEEE Trans. Med.
Imag., vol. 12, no. 3, pp. 566–574, Sep. 1993.
[25] D. W. Shattuck, S. R. Sandor-Leahy, K. A. Schaper, D. A. Rottenberg, and R. M. Leahy, “Magnetic resonance image tissue classification
using a partial volume model,” Neuroimage, vol. 13, no. 5, pp. 856–876,
2001.
[26] D. H. Laidlaw, K. W. Fleischer, and A. H. Barr, “Partial-volume
Bayesian classification of material mixtures in MR volume data using
voxel histograms,” IEEE Trans. Med. Imag., vol. 17, no. 1, pp. 74–86,
Feb. 1998.
[27] J. Ashburner and K. J. Friston, “Voxel-based morphometry—The
methods,” Neuroimage, vol. 11, no. 6, pp. 805–821, 2000, Pt 1.
[28] P. Santago and H. D. Gage, “Statistical models of partial volume effect,” IEEE Trans. Image Process., vol. 4, no. 11, pp. 1531–1540, Nov.
1995.
[29] M. B. Cuadra, L. Cammoun, T. Butz, O. Cuisenaire, and J.-P. Thiran,
“Comparison and validation of tissue modelization and statistical classification methods in T1-weighted MR brain images,” IEEE Trans.
Med. Imag., vol. 24, no. 12, pp. 1548–1565, Dec. 2005.
[30] S. Sanjay-Gopal and T. J. Hebert, “Bayesian pixel classification using
spatially variant finite mixtures and the generalized EM algorithm,”
IEEE Trans. Image Process., vol. 7, no. 7, pp. 1014–1028, Jul. 1998.
[31] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “Automated model-based tissue classification of MR images of the brain,”
IEEE Trans. Med. Imag., vol. 18, no. 10, pp. 897–908, Oct. 1999.
[32] J. L. Marroquin, B. C. Vemuri, S. Botello, F. Calderon, and A. Fernandez-Bouzas, “An accurate and efficient bayesian method for automatic Segmentation of brain MRI,” IEEE Trans. Med. Imag., vol. 21,
no. 8, pp. 934–945, Aug. 2002.
[33] K. M. Pohl, W. M. Wells, A. Guimond, K. Kasai, M. E. Shenton, R.
Kikinis, W. E. L. Grimson, and S. K. Warfield, “Incorporating nonrigid registration into expectation maximization algorithm to segment
MR images,” in Int. Conf. Med. Image Computing Computer Assist.
Intervent. (MICCAI), 2002.
[34] M. Prastawa, “Automatic brain tumor segmentation by subject specific modification of atlas priors,” Acad. Radiol., vol. 10, no. 12, pp.
1341–1348, 2003.
[35] J. Nuyts, P. Dupont, S. Stroobants, A. Maes, L. Mortelmans, and P.
Suetens, “Evaluation of maximum-likelihood based attenuation correction in positron emission tomography,” IEEE Trans. Nucl. Sci., vol. 46,
no. 4, p. 1136, Aug. 1999.
[36] A. Ciptadi, C. Chen, and V. Zagorodnov, “Component analysis approach to estimation of tissue intensity distributions of 3D images,”
in IEEE Int. Conf. Comput. Vis. (ICCV), Kyoto, Japan, 2009, p. 1765.
[37] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, “A
blind source separation technique using second-order statistics,” IEEE
Trans. Signal Process., vol. 45, no. 2, pp. 434–444, Feb. 1997.
[38] J.-F. Cardoso, “Infomax and maximum likelihood for blind source separation,” IEEE Signal Process. Lett., vol. 4, no. 4, pp. 112–114, Apr.
1997.
[39] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley-Interscience, 2001.
[40] R. Nagarajan and C. A. Peterson, “Identifying spots in microarray images,” IEEE Trans. Nanobiosci., vol. 1, no. 2, pp. 78–84, Jun. 2002.
[41] D. Lawley and A. Maxwell, “Factor analysis as a statistical method,”
Statistician, vol. 12, no. 3, pp. 209–229, 1962.
[42] H. Attias, “Independent factor analysis,” Neural Computation, vol. 11,
no. 4, 1999.
[43] A. M. Dale, B. Fischl, and M. I. Sereno, “Cortical surface-based analysis: I. segmentation and surface reconstruction,” NeuroImage, vol. 9,
no. 2, pp. 179–194, 1999.
[44] S. Sadananthan, W. Zheng, M. Chee, and V. Zagorodnov, “Skull stripping using graph cuts,” NeuroImage, vol. 49, no. 1, pp. 225–239, 2010.
[45] B. R. Logan and D. B. Rowe, “An evaluation of thresholding techniques in fMRI analysis,” Neuroimage, vol. 22, no. 1, pp. 95–108, 2004.
[46] N. V. Hartvig and J. L. Jensen, “Spatial mixture modeling of fMRI
data,” Hum. Brain Mapp., vol. 11, no. 4, pp. 233–248, 2000.
[47] D. D. Lee and H. S. Seung, “Learning the parts of objects by nonnegative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791,
1999.
[48] P. O. Hoyer, “Non-negative matrix factorization with sparseness constraints,” J. Mach. Learn. Res., vol. 5, pp. 1457–1469, 2004.